Rule and Ruleset Reference

Rulesets

The top-level Fathom object is the ruleset, an unordered collection of rules. The plain old Ruleset() is what you typically construct, via the ruleset convenience function:

ruleset(rule[, rule, ...])

Return a new Ruleset() containing the given rules.

class Ruleset(rule[, rule, ...])

An unbound ruleset. Eventually, you’ll be able to add rules to these. Then, when you bind them by calling against(), the resulting BoundRuleset() will be immutable.

against(doc)

Commit this ruleset to running against a specific DOM tree.

This doesn’t actually modify the Ruleset but rather returns a fresh BoundRuleset, which contains caches and other stateful, per-DOM bric-a-brac.

rules()

Return all the rules (both inward and outward) that make up this ruleset.

From this, you can construct another ruleset like this one but with your own rules added.

Then you call Ruleset.against() to get back a BoundRuleset(), which is specific to a given DOM tree. From that, you pull answers.

class BoundRuleset()

A ruleset that is earmarked to analyze a certain DOM.

Carries a cache of rule results on that DOM. Typically comes from Ruleset.against().

Arguments:
  • inRules (Array) – Non-out() rules
  • outRules (Map) – Output key -> out() rule

get(thing)

Return an array of zero or more fnodes.

Arguments:
  • thing (string|Lhs|Node) – Can be...
      • A string which matches up with an “out” rule in the ruleset. If the out rule uses through(), the results of through’s callback (which might not be fnodes) will be returned.
      • An arbitrary LHS, which we calculate and return the results of
      • A DOM node, for which we will return the corresponding fnode
    Results are cached in the first and third cases.

Rules

These are the control structures which govern the flow of scores, types, and notes through a ruleset. You construct a rule by calling rule() and passing it a left-hand side and a right-hand side:

rule(lhs, rhs)

Construct and return the proper type of rule class based on the inwardness/outwardness of the RHS.

Left-hand Sides

Left-hand sides are currently a few special forms which select nodes to be fed to right-hand sides.

dom(selector)

Take nodes that match a given DOM selector. Example: dom('meta[property="og:title"]')

Every ruleset has at least one dom rule, as that is where nodes begin to flow into the system.

type(theType)

Take nodes that have the given type. Example: type('titley')

max()

Of the nodes selected by a type call to the left, constrain the LHS to return only the max-scoring one. If there is a tie, more than 1 node will be returned. Example: type('titley').max()
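
The tie behavior is easy to overlook, so here is a minimal plain-JavaScript sketch of the selection described above. It is a model only, not Fathom's implementation: maxScoring and the {name, score} objects are hypothetical stand-ins for max() and fnodes.

```javascript
// Hypothetical model of max(): keep every node tied for the top score.
function maxScoring(nodes) {
    if (nodes.length === 0) {
        return [];
    }
    const top = Math.max(...nodes.map(node => node.score));
    return nodes.filter(node => node.score === top);
}

const candidates = [
    {name: 'h1', score: 4},
    {name: 'title', score: 4},
    {name: 'h2', score: 2},
];
const winners = maxScoring(candidates).map(node => node.name);
console.log(winners);  // both 'h1' and 'title' survive, since they tie at 4
```

Code that consumes the result of a max() LHS should therefore be prepared to receive more than one node.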

bestCluster(options)

Take the nodes selected by a type call to the left, group them into clusters, and return the nodes in the cluster that has the highest total score (on the relevant type).

Nodes come out in arbitrary order, so, if you plan to emit them, consider using .out('whatever').allThrough(domSort). See domSort().

If multiple clusters have equally high scores, return an arbitrary one, because Fathom has no way to represent arrays of arrays in rulesets.

Arguments:
  • options (Object) – The same depth costs taken by distance(), plus splittingDistance, which is the distance beyond which 2 clusters will be considered separate. splittingDistance, if omitted, defaults to 3.

and(typeCall[, typeCall, ...])

Experimental. Pull nodes that conform to multiple conditions at once.

For example: and(type('title'), type('english'))

Caveats: and supports only simple type calls as arguments for now, and it may fire off more rules as prerequisites than strictly necessary. Corresponding not and or calls don’t exist yet, but you can express or the long way around by having 2 rules with identical RHSs.

when(predicate)

Further narrow the set of nodes selected by the call to the left, keeping only those for which the predicate returns true.

Can be chained with type() or dom().

Example: dom('p').when(fnode => fnode.element.id.length > 10)

Arguments:
  • predicate (function) – Accepts a fnode and returns a boolean
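
As a rough illustration of what such a predicate does, the sketch below applies the predicate from the example to a pair of mock objects. The mock objects are assumptions made for illustration: they model only the element.id field that the predicate reads, and Array.prototype.filter stands in for Fathom's own selection machinery.

```javascript
// Mock fnodes: only the `element.id` field used by the predicate is modeled.
const mockFnodes = [
    {element: {id: 'short'}},
    {element: {id: 'a-much-longer-id'}},
];

// The predicate from the example above: accepts a fnode, returns a boolean.
const hasLongId = fnode => fnode.element.id.length > 10;

// Plain filtering stands in for what when() does to the LHS's selection.
const selected = mockFnodes.filter(hasLongId);
console.log(selected.map(fnode => fnode.element.id));  // only the long id remains
```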

Right-hand Sides

A right-hand side takes the nodes chosen by the left-hand side and mutates them. Spelling-wise, a RHS is a strung-together series of calls like this:

type('smoo').props(someCallback).type('whee').score(2)

To facilitate factoring out repetition in right-hand sides, calls layer together like sheets of transparent acetate: if there are repeats, as with type in the above example, the rightmost takes precedence and the left becomes useless. Similarly, if props(), which can return multiple properties of a fact (element, note, score, and type), is missing any of these properties, we continue searching to the left for anything that provides them (excepting other props() calls; if you want that, write a combinator and use it to combine the 2 functions you want). To prevent this, return all properties explicitly from your props callback, even if they are no-ops (like {score: 1, note: undefined, type: undefined}). Aside from this layering precedence, the order of calls does not matter.
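
The layering rule can be sketched in plain JavaScript. This is a simplified model under stated assumptions, not Fathom's actual code: resolveFact and the {name, provide} call objects are hypothetical, and a property counts as "provided" whenever it is present on the returned object, even if its value is undefined.

```javascript
// Simplified model of RHS layering; not Fathom's actual implementation.
// Each call is {name, provide}, where provide(fnode) returns an object
// holding whichever of score/type/note/element that call supplies.
function resolveFact(calls, fnode) {
    const fact = {};
    const seenNames = new Set();
    // Walk right to left so the rightmost provider of a property wins.
    for (const call of [...calls].reverse()) {
        if (seenNames.has(call.name)) {
            continue;  // a repeated call further left is shadowed entirely
        }
        seenNames.add(call.name);
        const provided = call.provide(fnode);
        for (const key of Object.keys(provided)) {
            if (!(key in fact)) {
                fact[key] = provided[key];
            }
        }
    }
    return fact;
}

// Models type('smoo').props(someCallback).type('whee').score(2)
const layeredCalls = [
    {name: 'type', provide: () => ({type: 'smoo'})},
    {name: 'props', provide: () => ({score: 5, note: 'from props'})},
    {name: 'type', provide: () => ({type: 'whee'})},
    {name: 'score', provide: () => ({score: 2})},
];
const layeredFact = resolveFact(layeredCalls, {});
console.log(layeredFact);  // score 2 and type 'whee' win; the note comes from props
```

In this model, a props callback that returns {note: undefined} explicitly still claims the note property, which is why returning every property stops the leftward search.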

A good practice is to use more declarative calls—score(), note(), and type()—as much as possible and save props() for when you need it. The query planner can get more out of the more specialized calls without you having to tack on verbose hints like atMost() or typeIn().

atMost(score)

Declare that the maximum returned score multiplier is such and such, which helps the optimizer plan efficiently. This doesn’t force it to be true; it merely throws an error at runtime if it isn’t. To lift an atMost constraint, call atMost() (with no args). The reason atMost and typeIn apply until explicitly cleared is so that, if someone used them for safety reasons on a lexically distant rule you are extending, you won’t stomp on their constraint and break their invariants accidentally.
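
The "declare, then verify at runtime" behavior might be modeled as below. This is a hypothetical sketch rather than Fathom's code: checkAtMost is an invented helper, and an undefined bound is used here to represent a constraint lifted by calling atMost() with no args.

```javascript
// Hypothetical model of the atMost() runtime check; not Fathom's code.
function checkAtMost(scoreMultiplier, declaredMax) {
    // An undefined bound models a lifted constraint, i.e. atMost() with no args.
    if (declaredMax !== undefined && scoreMultiplier > declaredMax) {
        throw new Error(
            `Score multiplier ${scoreMultiplier} exceeds the declared atMost(${declaredMax}).`
        );
    }
    return scoreMultiplier;
}

console.log(checkAtMost(2, 3));          // within the declared bound
console.log(checkAtMost(5, undefined));  // no bound in force
// checkAtMost(4, 3) would throw, because 4 is greater than 3.
```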

conserveScore()

Base the scores this RHS applies on the scores of the input nodes rather than starting over from 1.

For now, there is no way to turn this back off (for example with a later application of props or conserveScore(false)).

props(callback)

Determine any of type, note, score, and element using a callback. This overrides any previous call to props and, depending on what properties of the callback’s return value are filled out, may override the effects of other previous calls as well.

The callback should return...

  • An optional score multiplier
  • A type (required on dom(...) rules, defaulting to the input one on type(...) rules)
  • Optional notes
  • An element, defaulting to the input one. Overriding the default enables a callback to walk around the tree and say things about nodes other than the input one.

For example...

function callback(fnode) {
    return [{score: 3,
             element: fnode.element,  // unnecessary, since this is the default
             type: 'texty',
             note: {suspicious: true}}];
}

If you use props, Fathom cannot look inside your callback to see what type you are emitting, so you must declare your output types with typeIn() or set a single static type with type. Fathom will complain if you don’t. (You can still opt not to return any type if the node turns out not to be a good match, even if you declare a typeIn().)

note(callback)

Whatever the callback returns (even undefined) becomes the note of the fact. This overrides any previous call to note.

Since every node can have multiple, independent notes (one for each type), this applies to the type explicitly set by the RHS or, if none, to the type named by the type call on the LHS. If the LHS has none because it’s a dom(...) LHS, an error is raised.

When you query for fnodes of a certain type, you can expect to find notes of any form you specified on any RHS with that type. If no note is specified, it will be undefined. However, if two RHSs emit a given type, one adding a note and the other not adding one (or adding an undefined one), the meaningful note overrides the undefined one. This allows elaboration on a RHS’s score (for example) without needing to repeat note logic.

Indeed, undefined is not considered a note. So, though notes cannot in general be overwritten, a note that is undefined can. Symmetrically, an undefined returned from a note or props() or the like will quietly decline to overwrite an existing defined note, where any other value would cause an error. Rationale: letting undefined be a valid note value would mean you couldn’t shadow a leftward note in a RHS without introducing a new singleton value to serve as a “no value” flag. It’s not worth the complexity and the potential differences between the (internal) fact and fnode note value semantics.
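
The overwrite rules in the preceding paragraphs can be summarized with a small model. The sketch below is an assumption-laden illustration, not Fathom's implementation: setNote and the per-type Map are hypothetical names chosen for this example.

```javascript
// Simplified model of the note rules above; not Fathom's implementation.
function setNote(notesByType, type, note) {
    if (note === undefined) {
        return;  // undefined is not a note, so it never overwrites anything
    }
    if (notesByType.get(type) !== undefined) {
        throw new Error(`A note for type "${type}" already exists and cannot be overwritten.`);
    }
    notesByType.set(type, note);
}

const notes = new Map();
setNote(notes, 'titley', undefined);        // nothing is recorded
setNote(notes, 'titley', {source: 'og'});   // fills in the missing note
setNote(notes, 'titley', undefined);        // quietly ignored
console.log(notes.get('titley'));           // still the {source: 'og'} note
// A second defined note for 'titley' would throw.
```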

Best practice: any rule adding a type should apply the same note. If only one rule of several type-foo-emitting ones did, it should be made to emit a different type instead so downstream rules can explicitly state that they require the note to be there. Otherwise, there is nothing to guarantee the note-adding rule will run before the note-needing one.

out(key)

Expose the output of this rule’s LHS as a “final result” to the surrounding program. It will be available by calling get() on the ruleset and passing the key. You can run each node through a callback function first by adding through(), or you can run the entire set of nodes through a callback function by adding allThrough().
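
The difference between the two callbacks is one of granularity, which the following plain-JavaScript sketch tries to show. The mock objects and the byPosition sorter are assumptions made for illustration; byPosition is a hypothetical stand-in for a whole-collection function such as domSort.

```javascript
// Mock fnodes carrying a title and a document position.
const emitted = [
    {title: 'Second heading', position: 2},
    {title: 'First', position: 1},
];

// through(callback): the callback sees one fnode at a time.
const throughResult = emitted.map(fnode => fnode.title.length);

// allThrough(callback): the callback sees the whole collection at once.
// byPosition is a hypothetical stand-in for a sorter such as domSort.
const byPosition = fnodes => [...fnodes].sort((a, b) => a.position - b.position);
const allThroughResult = byPosition(emitted);

console.log(throughResult);                             // one value per fnode
console.log(allThroughResult.map(fnode => fnode.title)); // reordered collection
```

A per-node transformation fits through(), while anything that needs to compare nodes with each other, such as sorting, fits allThrough().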

through(callback)

Append .through to out() to run each fnode emitted from the LHS through an arbitrary function before returning it to the containing program. Example:

out('titleLengths').through(fnode => fnode.noteFor('title').length)

allThrough(callback)

Append .allThrough to out() to run the entire iterable of emitted fnodes through an arbitrary function before returning them to the containing program. Example:

out('sortedTitles').allThrough(domSort)

score(scoreOrCallback)

Multiply the score of the input node by some number, which can be >1 to increase the score or <1 to decrease it (though negative scores are not recommended due to constant sign-flipping).

Since every node can have multiple, independent scores (one for each type), this applies to the type explicitly set by the RHS or, if none, to the type named by the type call on the LHS. If the LHS has none because it’s a dom(...) LHS, an error is raised.
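
Because scores are multipliers, successive rules compound rather than add. The sketch below models that arithmetic in plain JavaScript; applyScore and the mock paragraph object are hypothetical, and the model ignores the per-type bookkeeping described above.

```javascript
// Simplified model of score(scoreOrCallback); not Fathom's implementation.
function applyScore(currentScore, scoreOrCallback, fnode) {
    const multiplier = typeof scoreOrCallback === 'function'
        ? scoreOrCallback(fnode)
        : scoreOrCallback;
    return currentScore * multiplier;
}

const mockParagraph = {text: 'A fairly long paragraph of body text.'};

const boosted = applyScore(1, 2, mockParagraph);          // static number > 1 raises it
const damped = applyScore(boosted, 0.5, mockParagraph);   // a number < 1 lowers it again
const byLength = applyScore(
    1,
    fnode => (fnode.text.length > 20 ? 3 : 1),            // callback computes the multiplier
    mockParagraph
);
console.log(boosted, damped, byLength);
```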

Arguments:
  • scoreOrCallback (number|function) – Can either be a static number or else a callback which takes the fnode and returns a number.

type(theType)

Set the type applied to fnodes processed by this RHS.

typeIn(type[, type, ...])

Constrain this rule to emit 1 of a set of given types. Pass no args to lift a previous typeIn constraint, as you might do when basing a RHS on a common value to factor out repetition.

typeIn is mostly a hint for the query planner when you’re emitting types dynamically from props calls—in fact, an error will be raised if props is used without a typeIn or type to constrain it—but it also checks conformance at runtime to ensure validity.
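
The runtime conformance check might be modeled as follows. This is a hypothetical sketch, not Fathom's code: checkTypeIn is an invented helper, and it assumes (per the props() discussion above) that a callback may decline to return any type.

```javascript
// Hypothetical model of typeIn()'s runtime conformance check.
function checkTypeIn(emittedType, allowedTypes) {
    // Returning no type is allowed: the callback may decline to match.
    if (emittedType === undefined) {
        return undefined;
    }
    if (!allowedTypes.has(emittedType)) {
        throw new Error(`Type "${emittedType}" is not among the declared typeIn() types.`);
    }
    return emittedType;
}

const declaredTypes = new Set(['texty', 'titley']);
console.log(checkTypeIn('texty', declaredTypes));    // a declared type passes
console.log(checkTypeIn(undefined, declaredTypes));  // declining to emit is fine
// checkTypeIn('smoo', declaredTypes) would throw.
```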