Utility Functions

In addition to components intrinsically tied to rulesets, Fathom comes with a variety of utility procedures for building scoring and other callback functions or just for improving the imperative shell around your ruleset.

The utilities hang off a utils object in the top-level Fathom module. To import them, do something like this:

const {
  utils: { isBlock, isVisible },
} = require('fathom-web');

This will result in top-level isBlock and isVisible symbols.

ancestors(element)

Yield an element and each of its ancestors.

attributesMatch(element, predicate, attrs)

Check whether any of the element’s attribute values satisfy a condition.

Example:

rule(type('foo'),
     score(attributesMatch(element,
                           attr => attr.includes('good'),
                           ['id', 'alt']) ? 2 : 1))

Arguments
  • element (Node) – Element whose attributes you want to search

  • predicate (function) – A condition to check. Take a string and return a boolean. If an attribute has multiple values (e.g. the class attribute), attributesMatch will check each one.

  • attrs (Array.<string>) – An Array of attributes you want to search. If none are provided, search all.

Returns

Whether any of the attribute values satisfy the predicate function

best(iterable, by, isBetter)

From an iterable return the best item, according to an arbitrary comparator function. In case of a tie, the first item wins.

Arguments
  • by (function) – Given an item of the iterable, return a value to compare

  • isBetter (function) – Return whether its first arg is better than its second
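
The tie-breaking rule is easiest to see in a small standalone sketch. This is not Fathom's source: bestSketch is a hypothetical stand-in that assumes the iterable is passed as the first argument, and throwing on an empty iterable is an assumption of the sketch.

```javascript
// Illustrative stand-in for best(); not Fathom's source.
function bestSketch(iterable, by, isBetter) {
  let bestItem;
  let bestKey;
  let isFirst = true;
  for (const item of iterable) {
    const key = by(item);
    // A later item replaces the leader only if it is strictly better,
    // so the first of several tied items is kept.
    if (isFirst || isBetter(key, bestKey)) {
      bestItem = item;
      bestKey = key;
      isFirst = false;
    }
  }
  if (isFirst) {
    // Assumption of this sketch: an empty iterable is an error.
    throw new Error('bestSketch() needs at least one item');
  }
  return bestItem;
}

const words = ['pear', 'fig', 'plum', 'kiwi'];
const longest = bestSketch(words, w => w.length, (a, b) => a > b);
const shortest = bestSketch(words, w => w.length, (a, b) => a < b);
console.log(longest, shortest); // pear fig
```

Passing a "greater than" comparator gives max-like behaviour and a "less than" comparator gives min-like behaviour.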

collapseWhitespace(str)

Return a string with each run of whitespace collapsed to a single space.

domSort(fnodes)

Sort the elements by their position in the DOM.

Arguments
  • fnodes (iterable) – fnodes to sort

Returns

Array – sorted fnodes

first(iterable)

Return the first item of an iterable.

getDefault(map, key, defaultMaker)

Get a key of a map or, if it’s missing, a default value.

identity(x)

Return the passed-in arg. Useful as a default.

inlineTextLength(element, shouldTraverse)

Return the total length of the inline text within an element, with whitespace collapsed.

Arguments
  • shouldTraverse (function) – Specify additional elements to exclude by returning false

inlineTexts(element, shouldTraverse)

Yield strings of text nodes within a normalized DOM node and its children, without venturing into any contained block elements.

Arguments
  • shouldTraverse (function) – Specify additional elements to exclude by returning false

isBlock(element)

Return whether a DOM element is a block element by default (rather than by styling).

isVisible(fnodeOrElement)

Return whether an element is practically visible, considering things like zero size, zero opacity, visibility: hidden, and overflow: hidden.

Merely being scrolled off the page, either horizontally or vertically, doesn’t count as invisible; the result of this function is meant to be independent of viewport size.

Throws

NoWindowError – The element (or perhaps one of its ancestors) is not in a window, so we can’t find the getComputedStyle() routine to call. That routine is the source of most of the information we use, so you should pick a different strategy for non-window contexts.

isWhitespace(element)

Return whether an element is a text node that consists wholly of whitespace.

length(iterable)

Return the number of items in an iterable, consuming it as a side effect.
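
The side effect matters for one-shot iterables such as generators. The sketch below is a hypothetical stand-in (lengthSketch and countdown are illustrative names, not library exports) that shows a second count of the same generator coming back as zero.

```javascript
// Illustrative stand-in for length(); not Fathom's source.
function lengthSketch(iterable) {
  let count = 0;
  for (const _item of iterable) {
    count++;
  }
  return count;
}

// A generator can be iterated only once.
function* countdown() {
  yield 3;
  yield 2;
  yield 1;
}

const gen = countdown();
const firstCount = lengthSketch(gen);  // 3
const secondCount = lengthSketch(gen); // 0: the generator is already exhausted
console.log(firstCount, secondCount);
```

If the items are still needed afterward, collecting them into an Array first and reading its length property avoids the problem.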

linearScale(number, zeroAt, oneAt)

Scale a number to the range [0, 1] using a linear slope.

For a rising line, the result is 0 until the input reaches zeroAt, then increases linearly until oneAt, at which it becomes 1. To make a falling line, where the result is 1 to the left and 0 to the right, use a zeroAt greater than oneAt.
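
As a rough illustration of the rising and falling cases, here is a standalone sketch of the mapping described above. linearScaleSketch is a hypothetical name, and the code is a re-implementation from the description rather than the library's own.

```javascript
// Illustrative stand-in for linearScale(); not Fathom's source.
function linearScaleSketch(number, zeroAt, oneAt) {
  const isRising = zeroAt < oneAt;
  if (isRising) {
    if (number <= zeroAt) return 0;
    if (number >= oneAt) return 1;
  } else {
    if (number >= zeroAt) return 0;
    if (number <= oneAt) return 1;
  }
  // Between the two anchors, interpolate linearly.
  return (number - zeroAt) / (oneAt - zeroAt);
}

const rising = [-3, 5, 12].map(n => linearScaleSketch(n, 0, 10)); // [0, 0.5, 1]
const falling = linearScaleSketch(2.5, 10, 0);                    // 0.75
console.log(rising, falling);
```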

linkDensity(fnode, inlineLength)

Return the ratio of the inline text length of the links in an element to the inline text length of the entire element.

Arguments
  • inlineLength (number) – Optionally, the precalculated inline length of the fnode. If omitted, we will calculate it ourselves.

max(iterable, by)

Return the maximum item from an iterable, as defined by >.

Works with any type that works with >. If multiple items are equally great, return the first.

Arguments
  • by (function) – Given an item of the iterable, return a value to compare

maxes(iterable, by)

Return an Array of maximum items from an iterable, as defined by > and ===.

If an empty iterable is passed in, return [].
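
The difference from max is that every tied winner is returned. A minimal sketch of that behaviour follows; maxesSketch is a hypothetical stand-in, and defaulting by to the identity function is an assumption of the sketch.

```javascript
// Illustrative stand-in for maxes(); not Fathom's source.
// Assumption: `by` defaults to the identity function.
function maxesSketch(iterable, by = x => x) {
  let winners = [];
  let topKey;
  for (const item of iterable) {
    const key = by(item);
    if (winners.length === 0 || key > topKey) {
      // A strictly greater key starts a new list of winners.
      winners = [item];
      topKey = key;
    } else if (key === topKey) {
      // An equal key joins the current winners.
      winners.push(item);
    }
  }
  return winners;
}

const candidates = [
  { id: 'a', score: 2 },
  { id: 'b', score: 5 },
  { id: 'c', score: 5 },
];
const tiedIds = maxesSketch(candidates, c => c.score).map(c => c.id); // ['b', 'c']
const none = maxesSketch([]);                                        // []
console.log(tiedIds, none);
```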

min(iterable, by)

Return the minimum item from an iterable, as defined by <.

If multiple items are equally small, return the first.

class NiceSet()

A Set with the additional methods it ought to have had.

NiceSet.extend(otherSet)

Union another set or other iterable into myself.

Returns

myself, for chaining

NiceSet.minus(otherSet)

Subtract another set from a copy of me.

Returns

a copy of myself excluding the elements in otherSet.

NiceSet.pop()

Remove and return an arbitrary item. Throw an Error if I am empty.

NiceSet.toString()

Actually show the items in me.
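
To show how the four methods fit together, here is a small sketch of a Set subclass with the same method names. NiceSetSketch is a hypothetical stand-in written from the descriptions above. The exact toString format and the choice of which item pop returns are assumptions of the sketch.

```javascript
// Illustrative stand-in for NiceSet; not Fathom's source.
class NiceSetSketch extends Set {
  // Remove and return some item (here: the first in insertion order).
  pop() {
    for (const item of this) {
      this.delete(item);
      return item;
    }
    throw new Error('Tried to pop from an empty set.');
  }

  // Mutate in place and return this, so calls can be chained.
  extend(otherSet) {
    for (const item of otherSet) {
      this.add(item);
    }
    return this;
  }

  // Work on a copy; the receiver is left untouched.
  minus(otherSet) {
    const copy = new NiceSetSketch(this);
    for (const item of otherSet) {
      copy.delete(item);
    }
    return copy;
  }

  // Assumed format: items joined inside braces.
  toString() {
    return '{' + Array.from(this, x => String(x)).join(', ') + '}';
  }
}

const merged = new NiceSetSketch([1, 2]).extend([2, 3]);
const trimmed = merged.minus(new Set([1]));
console.log(merged.toString(), trimmed.toString()); // {1, 2, 3} {2, 3}
```

Note the asymmetry: extend changes the receiver, while minus returns a new set.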

numberOfMatches(regex, haystack)

Return the number of times a regex occurs within the string haystack.

Caller must make sure regex has the ‘g’ option set.
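
The reason for the ‘g’ requirement can be seen with plain String.prototype.match, which stops at the first occurrence when the flag is missing. The helper below is a hypothetical sketch (numberOfMatchesSketch is not the library export), and whether Fathom counts matches this way internally is an assumption.

```javascript
// Illustrative stand-in for numberOfMatches(); not Fathom's source.
function numberOfMatchesSketch(regex, haystack) {
  // match() returns null when nothing matches, hence the fallback.
  return (haystack.match(regex) || []).length;
}

const text = 'Log in or login';
const withGlobal = numberOfMatchesSketch(/log ?in/gi, text);   // 2
const withoutGlobal = numberOfMatchesSketch(/log ?in/i, text); // 1: only the first match is seen
const noMatch = numberOfMatchesSketch(/sign ?up/gi, text);     // 0
console.log(withGlobal, withoutGlobal, noMatch);
```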

page(scoringFunction)

Wrap a scoring callback, and set its element to the page root iff a score is returned.

This is used to build rulesets which classify entire pages rather than picking out specific elements.

For example, these rules might classify a page as a “login page”, influenced by whether it has login buttons or username fields:

rule(type('loginPage'), score(page(pageContainsLoginButton))),
rule(type('loginPage'), score(page(pageContainsUsernameField)))

reversed(array)

Return a backward iterator over an Array without reversing it in place.
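
The contrast with Array.prototype.reverse, which mutates its receiver, is sketched below. reversedSketch is a hypothetical generator written from the description, not the library's code.

```javascript
// Illustrative stand-in for reversed(); not Fathom's source.
function* reversedSketch(array) {
  for (let i = array.length - 1; i >= 0; i--) {
    yield array[i];
  }
}

const original = ['a', 'b', 'c'];
const backward = [...reversedSketch(original)];
console.log(backward); // [ 'c', 'b', 'a' ]
console.log(original); // [ 'a', 'b', 'c' ], unchanged
```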

rgbaFromString(str)

Return the extracted [r, g, b, a] values from a string like “rgba(0, 5, 255, 0.8)”, and scale them to 0..1. If no alpha is specified, return undefined for it.
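
As a sketch of the expected input and output shapes, the hypothetical helper below parses the same kind of string. The regular expression and the error thrown for unrecognized input are assumptions of the sketch, not the library's actual pattern.

```javascript
// Illustrative stand-in for rgbaFromString(); not Fathom's source.
function rgbaFromStringSketch(str) {
  // Assumed pattern: integer channels, optional decimal alpha.
  const pattern = /^rgba?\(\s*(\d+)\s*,\s*(\d+)\s*,\s*(\d+)\s*(?:,\s*(\d*\.?\d+)\s*)?\)$/i;
  const m = str.match(pattern);
  if (m === null) {
    // Assumption of this sketch: unparseable input is an error.
    throw new Error('Color ' + str + ' did not match rgb[a](r, g, b[, a]).');
  }
  const alpha = m[4] === undefined ? undefined : parseFloat(m[4]);
  // Color channels are scaled from 0..255 down to 0..1.
  return [Number(m[1]) / 255, Number(m[2]) / 255, Number(m[3]) / 255, alpha];
}

const withAlpha = rgbaFromStringSketch('rgba(0, 5, 255, 0.8)');
const withoutAlpha = rgbaFromStringSketch('rgb(255, 0, 0)');
console.log(withAlpha);    // r = 0, g = 5/255, b = 1, a = 0.8
console.log(withoutAlpha); // r = 1, g = 0, b = 0, a = undefined
```

The first three values are in the form that saturation(r, g, b) expects.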

rootElement(element)

Given any node in a DOM tree, return the root element of the tree, generally an HTML element.

saturation(r, g, b)

Return the saturation 0..1 of a color defined by RGB values 0..1.

setDefault(map, key, defaultMaker)

Get a key of a map, first setting it to a default value if it’s missing.
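
The difference between setDefault and getDefault is whether the default is stored. The sketch below illustrates both with hypothetical stand-ins. Calling defaultMaker with no arguments is an assumption based on its name.

```javascript
// Illustrative stand-ins; not Fathom's source.
// Assumption: defaultMaker is a zero-argument factory.
function setDefaultSketch(map, key, defaultMaker) {
  if (map.has(key)) {
    return map.get(key);
  }
  const value = defaultMaker();
  map.set(key, value); // the default is stored
  return value;
}

function getDefaultSketch(map, key, defaultMaker) {
  // The default is returned but never stored.
  return map.has(key) ? map.get(key) : defaultMaker();
}

// Group words by first letter without checking for missing keys by hand.
const byLetter = new Map();
for (const word of ['apple', 'avocado', 'banana']) {
  setDefaultSketch(byLetter, word[0], () => []).push(word);
}
const missing = getDefaultSketch(byLetter, 'z', () => []);
console.log(byLetter.get('a'), missing, byLetter.has('z'));
// [ 'apple', 'avocado' ] [] false
```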

sigmoid(x)

Return the sigmoid of the argument: 1 / (1 + exp(-x)). This is useful for crunching a feature value that may have a wide range into the range (0, 1) without a hard ceiling: the sigmoid of even a very large number will be a little larger than that of a slightly smaller one.

Arguments
  • x (Number) – a number to be compressed into the range (0, 1)
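
A quick numeric check of the formula, as a standalone sketch (sigmoidSketch is a hypothetical name, not the library export):

```javascript
// Direct transcription of 1 / (1 + exp(-x)).
function sigmoidSketch(x) {
  return 1 / (1 + Math.exp(-x));
}

const atZero = sigmoidSketch(0);  // exactly 0.5
const large = sigmoidSketch(10);
const larger = sigmoidSketch(12);
// No hard ceiling: the larger input still scores slightly higher,
// yet both stay below 1.
console.log(atZero, large < larger, larger < 1); // 0.5 true true
```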

sum(iterable)

Return the sum of an iterable, as defined by the + operator.

toDomElement(fnodeOrElement)

Return the DOM element contained in a passed-in fnode. Return passed-in DOM elements verbatim.

Arguments
  • fnodeOrElement (Node|Fnode) – The fnode or DOM element to unwrap

toposort(nodes, nodesThatNeed)

Return an Array, the reverse topological sort of the given nodes.

Arguments
  • nodes – An iterable of arbitrary things

  • nodesThatNeed (function) – Take a node and return an Array of nodes that depend on it
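
The direction of the nodesThatNeed callback is easy to get backward, so here is a depth-first sketch on a three-node chain. toposortSketch and the neededBy table are hypothetical. The sketch assumes that "reverse" means a node comes out after everything that depends on it, and that a cycle should raise an error.

```javascript
// Illustrative stand-in for toposort(); not Fathom's source.
function toposortSketch(nodes, nodesThatNeed) {
  const sorted = [];
  const todo = new Set(nodes);
  const inProgress = new Set();

  function visit(node) {
    if (inProgress.has(node)) {
      // Assumption of this sketch: cycles are reported as errors.
      throw new Error('The graph has a cycle.');
    }
    if (todo.has(node)) {
      inProgress.add(node);
      for (const needer of nodesThatNeed(node)) {
        visit(needer);
      }
      inProgress.delete(node);
      todo.delete(node);
      // Emitted only after every dependent has been emitted.
      sorted.push(node);
    }
  }

  while (todo.size > 0) {
    visit(todo.values().next().value);
  }
  return sorted;
}

// 'lib' needs 'core', and 'app' needs 'lib'.
const neededBy = { core: ['lib'], lib: ['app'], app: [] };
const order = toposortSketch(['lib', 'core', 'app'], node => neededBy[node]);
console.log(order); // [ 'app', 'lib', 'core' ]
```

Under these assumptions, iterating the result backward gives an order in which each node's prerequisites come first.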

walk(element, shouldTraverse)

Iterate, depth first, over a DOM node. Return the original node first.

Arguments
  • shouldTraverse (function) – Given a node, say whether we should include it and its children. Default: always true.