Utility Functions
In addition to components intrinsically tied to rulesets, Fathom comes with a variety of utility procedures for building scoring and other callback functions or just for improving the imperative shell around your ruleset.
The utilities hang off a utils object in the top-level Fathom module. To import them, do something like this:
const {
utils: { isBlock, isVisible },
} = require('fathom-web');
This will result in top-level isBlock and isVisible symbols.
-
ancestors
(element)¶ Yield an element and each of its ancestors.
-
attributesMatch
(element, predicate, attrs)¶ Checks whether any of the element’s attribute values satisfy some condition.
Example:
rule(type('foo'), score(fnode => attributesMatch(fnode.element, attr => attr.includes('good'), ['id', 'alt']) ? 2 : 1))
- Arguments
element (Node) – Element whose attributes you want to search
predicate (function) – A condition to check. Take a string and return a boolean. If an attribute has multiple values (e.g. the class attribute), attributesMatch will check each one.
attrs (Array.<string>) – An Array of attributes you want to search. If none are provided, search all.
- Returns
Whether any of the attribute values satisfy the predicate function
-
best
(iterable, by, isBetter)¶ From an iterable, return the best item, according to an arbitrary comparator function. In case of a tie, the first item wins.
- Arguments
by (function) – Given an item of the iterable, return a value to compare
isBetter (function) – Return whether its first arg is better than its second
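To illustrate the tie-breaking rule, here is a minimal standalone sketch. `bestSketch` is a hypothetical stand-in written for this example, not Fathom's own code; it assumes the iterable is the first argument and, unlike the real function may, simply returns undefined for an empty iterable.

```javascript
// Hypothetical stand-in for best(); not Fathom's own code.
function bestSketch(iterable, by, isBetter) {
    let bestSoFar;
    let bestKeySoFar;
    let isFirst = true;
    for (const item of iterable) {
        const key = by(item);
        // Replace the current champion only when the new key is strictly
        // better, so the earliest item wins a tie.
        if (isFirst || isBetter(key, bestKeySoFar)) {
            bestSoFar = item;
            bestKeySoFar = key;
            isFirst = false;
        }
    }
    return bestSoFar;
}

const words = ['pear', 'fig', 'plum', 'kiwi'];
// Three words tie at length 4; the first of them is returned.
const longest = bestSketch(words, word => word.length, (a, b) => a > b);
```

Here `by` extracts the value to compare and `isBetter` compares two such values, so the same helper can pick a longest, shortest, or otherwise "best" item.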
-
collapseWhitespace
(str)¶ Return a string with each run of whitespace collapsed to a single space.
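The described behavior can be sketched with a single regular-expression replacement. `collapseWhitespaceSketch` is a hypothetical name used here for illustration; Fathom's actual implementation may treat edge cases differently.

```javascript
// Hypothetical stand-in for collapseWhitespace(); not Fathom's own code.
function collapseWhitespaceSketch(str) {
    // \s+ matches each maximal run of spaces, tabs, and newlines.
    return str.replace(/\s+/g, ' ');
}

const collapsed = collapseWhitespaceSketch('Hello,\n\t  world  !');
```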
-
domSort
(fnodes)¶ Sort the elements by their position in the DOM.
- Arguments
fnodes (iterable) – fnodes to sort
- Returns
Array – sorted fnodes
-
first
(iterable)¶ Return the first item of an iterable.
-
getDefault
(map, key, defaultMaker)¶ Get a key of a map or, if it’s missing, a default value.
-
identity
(x)¶ Return the passed-in arg. Useful as a default.
-
inlineTextLength
(element, shouldTraverse)¶ Return the total length of the inline text within an element, with whitespace collapsed.
- Arguments
shouldTraverse (function) – Specify additional elements to exclude by returning false
-
inlineTexts
(element, shouldTraverse)¶ Yield strings of text nodes within a normalized DOM node and its children, without venturing into any contained block elements.
- Arguments
shouldTraverse (function) – Specify additional elements to exclude by returning false
-
isBlock
(element)¶ Return whether a DOM element is a block element by default (rather than by styling).
-
isVisible
(fnodeOrElement)¶ Return whether an element is practically visible, considering things like 0 size or opacity, visibility: hidden and overflow: hidden. Merely being scrolled off the page, either horizontally or vertically, doesn’t count as invisible; the result of this function is meant to be independent of viewport size.
- Throws
NoWindowError – The element (or perhaps one of its ancestors) is not in a window, so we can’t find the getComputedStyle() routine to call. That routine is the source of most of the information we use, so you should pick a different strategy for non-window contexts.
-
isWhitespace
(element)¶ Return whether an element is a text node that consists wholly of whitespace.
-
length
(iterable)¶ Return the number of items in an iterable, consuming it as a side effect.
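The "consuming it as a side effect" caveat matters for one-shot iterables such as generators. The sketch below uses a hypothetical `lengthSketch` stand-in (not Fathom's code) to show that counting a generator leaves nothing behind for a second pass.

```javascript
// Hypothetical stand-in for length(); not Fathom's own code.
function lengthSketch(iterable) {
    let count = 0;
    for (const _item of iterable) {
        count++;
    }
    return count;
}

function* threeItems() {
    yield 'a';
    yield 'b';
    yield 'c';
}

const generator = threeItems();
const firstCount = lengthSketch(generator);   // walks the generator to its end
const secondCount = lengthSketch(generator);  // the generator is already exhausted
```

Re-iterable collections such as Arrays and Sets are unaffected, since each loop over them starts a fresh iterator.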
-
linearScale
(number, zeroAt, oneAt)¶ Scale a number to the range [0, 1] using a linear slope.
For a rising line, the result is 0 until the input reaches zeroAt, then increases linearly until oneAt, at which it becomes 1. To make a falling line, where the result is 1 to the left and 0 to the right, use a zeroAt greater than oneAt.
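The rising and falling cases can be sketched as a clamped linear interpolation. `linearScaleSketch` is a hypothetical stand-in, not Fathom's implementation, and it assumes zeroAt and oneAt differ (equal values would divide by zero).

```javascript
// Hypothetical stand-in for linearScale(); not Fathom's own code.
// Assumes zeroAt !== oneAt.
function linearScaleSketch(number, zeroAt, oneAt) {
    // Position of number along the line from zeroAt (0) to oneAt (1).
    const fraction = (number - zeroAt) / (oneAt - zeroAt);
    // Clamp to [0, 1] so inputs beyond either end stay flat.
    return Math.min(1, Math.max(0, fraction));
}

const inputs = [-5, 0, 5, 10, 15];
// Rising line: 0 at or below 0, 1 at or above 10.
const rising = inputs.map(n => linearScaleSketch(n, 0, 10));
// Falling line: zeroAt (10) is greater than oneAt (0).
const falling = inputs.map(n => linearScaleSketch(n, 10, 0));
```

With these inputs, `rising` is 0, 0, 0.5, 1, 1 and `falling` is the mirror image: 1, 1, 0.5, 0, 0.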
-
linkDensity
(fnode, inlineLength)¶ Return the ratio of the inline text length of the links in an element to the inline text length of the entire element.
- Arguments
inlineLength (number) – Optionally, the precalculated inline length of the fnode. If omitted, we will calculate it ourselves.
-
max
(iterable, by)¶ Return the maximum item from an iterable, as defined by >.
Works with any type that works with >. If multiple items are equally great, return the first.
- Arguments
by (function) – Given an item of the iterable, return a value to compare
-
maxes
(iterable, by)¶ Return an Array of maximum items from an iterable, as defined by > and ===.
If an empty iterable is passed in, return [].
-
min
(iterable, by)¶ Return the minimum item from an iterable, as defined by <.
If multiple items are equally small, return the first.
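The three selection helpers differ only in the comparison used and in how ties are handled. The sketch below uses hypothetical stand-ins (`maxSketch`, `maxesSketch`, `minSketch`), not Fathom's code; making `by` default to the identity function is an assumption of this example.

```javascript
// Hypothetical stand-ins for max(), maxes(), and min(); not Fathom's own code.
// Defaulting `by` to the identity function is an assumption of this sketch.
function pickSketch(iterable, by, beats) {
    let winner;
    let winnerKey;
    let isFirst = true;
    for (const item of iterable) {
        const key = by(item);
        // Strict comparison: a later item that merely ties never displaces the winner.
        if (isFirst || beats(key, winnerKey)) {
            winner = item;
            winnerKey = key;
            isFirst = false;
        }
    }
    return winner;
}

const maxSketch = (iterable, by = x => x) => pickSketch(iterable, by, (a, b) => a > b);
const minSketch = (iterable, by = x => x) => pickSketch(iterable, by, (a, b) => a < b);

function maxesSketch(iterable, by = x => x) {
    let winners = [];
    let winnerKey;
    for (const item of iterable) {
        const key = by(item);
        if (winners.length === 0 || key > winnerKey) {
            winners = [item];       // a new maximum replaces all earlier winners
            winnerKey = key;
        } else if (key === winnerKey) {
            winners.push(item);     // a tie joins the current winners
        }
    }
    return winners;
}

const candidates = [
    {name: 'a', score: 2},
    {name: 'b', score: 5},
    {name: 'c', score: 5},
    {name: 'd', score: 2},
];
const top = maxSketch(candidates, c => c.score).name;
const allTop = maxesSketch(candidates, c => c.score).map(c => c.name);
const bottom = minSketch(candidates, c => c.score).name;
```

In this data, `top` is 'b' (the first of the two 5s), `allTop` holds both 'b' and 'c', and `bottom` is 'a'.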
-
class
NiceSet
()¶ A Set with the additional methods it ought to have had
-
NiceSet.
extend
(otherSet)¶ Union another set or other iterable into myself.
- Returns
myself, for chaining
-
NiceSet.
minus
(otherSet)¶ Subtract another set from a copy of me.
- Returns
a copy of myself excluding the elements in otherSet.
-
NiceSet.
pop
()¶ Remove and return an arbitrary item. Throw an Error if I am empty.
-
NiceSet.
toString
()¶ Actually show the items in me.
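The four methods can be sketched as a small subclass of the built-in Set. `NiceSetSketch` is a hypothetical stand-in, not Fathom's class; the exact toString format and the error message are assumptions of this example.

```javascript
// Hypothetical stand-in for NiceSet; not Fathom's own code.
class NiceSetSketch extends Set {
    // Union another set or other iterable into this one; return this for chaining.
    extend(otherSet) {
        for (const item of otherSet) {
            this.add(item);
        }
        return this;
    }

    // Return a copy of this set without the items in otherSet.
    minus(otherSet) {
        const copy = new NiceSetSketch(this);
        for (const item of otherSet) {
            copy.delete(item);
        }
        return copy;
    }

    // Remove and return an arbitrary item (here, the first in iteration order).
    pop() {
        for (const item of this) {
            this.delete(item);
            return item;
        }
        throw new Error('Tried to pop from an empty set.');
    }

    // Show the items themselves (output format is an assumption).
    toString() {
        return '{' + Array.from(this).join(', ') + '}';
    }
}

const colors = new NiceSetSketch(['red', 'green']);
colors.extend(['blue', 'green']).extend(new Set(['black']));  // chained unions
const withoutGreen = colors.minus(new Set(['green']));        // colors is untouched
const popped = withoutGreen.pop();
const shown = colors.toString();
```

Note the asymmetry: extend mutates the receiver, while minus leaves it alone and returns a new set.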
-
numberOfMatches
(regex, haystack)¶ Return the number of times a regex occurs within the string haystack.
Caller must make sure regex has the ‘g’ option set.
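One plausible way to count matches is via String.prototype.match, which also shows why the ‘g’ option matters. `numberOfMatchesSketch` is a hypothetical stand-in, not necessarily how Fathom implements it.

```javascript
// Hypothetical stand-in for numberOfMatches(); not Fathom's own code.
function numberOfMatchesSketch(regex, haystack) {
    // With the g flag, match() returns an Array of every match,
    // or null when there are none.
    return (haystack.match(regex) || []).length;
}

// 'an' occurs twice in 'banana'.
const withFlag = numberOfMatchesSketch(/(a)(n)/g, 'banana');
// Without g, match() returns the first match plus its capture groups,
// so the Array length no longer means "number of occurrences".
const withoutFlag = numberOfMatchesSketch(/(a)(n)/, 'banana');
const noMatches = numberOfMatchesSketch(/z/g, 'banana');
```

In this sketch the flagless call reports 3 (one match and two groups) rather than the true count of 2, which is the kind of silent miscount the caveat guards against.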
-
page
(scoringFunction)¶ Wrap a scoring callback, and set its element to the page root iff a score is returned.
This is used to build rulesets which classify entire pages rather than picking out specific elements.
For example, these rules might classify a page as a “login page”, based on whether it has login buttons or username fields:
rule(type('loginPage'), score(page(pageContainsLoginButton))),
rule(type('loginPage'), score(page(pageContainsUsernameField)))
-
reversed
(array)¶ Return a backward iterator over an Array without reversing it in place.
-
rgbaFromString
(str)¶ Return the extracted [r, g, b, a] values from a string like “rgba(0, 5, 255, 0.8)”, and scale them to 0..1. If no alpha is specified, return undefined for it.
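A minimal parsing sketch follows. `rgbaFromStringSketch` is a hypothetical stand-in, not Fathom's code; it assumes that the color channels are divided by 255, that alpha (already in 0..1) is passed through unchanged, and that unparseable input throws.

```javascript
// Hypothetical stand-in for rgbaFromString(); not Fathom's own code.
function rgbaFromStringSketch(str) {
    // Accept rgb(r, g, b) or rgba(r, g, b, a) with optional whitespace.
    const pattern = /^rgba?\s*\(\s*(\d+)\s*,\s*(\d+)\s*,\s*(\d+)\s*(?:,\s*(\d*\.?\d+)\s*)?\)$/i;
    const match = str.match(pattern);
    if (match === null) {
        // Assumption: reject anything that is not an rgb()/rgba() string.
        throw new Error('Could not parse color: ' + str);
    }
    const [, r, g, b, a] = match;
    return [
        Number(r) / 255,
        Number(g) / 255,
        Number(b) / 255,
        a === undefined ? undefined : parseFloat(a),
    ];
}

const translucent = rgbaFromStringSketch('rgba(0, 5, 255, 0.8)');
const opaque = rgbaFromStringSketch('rgb(255, 0, 0)');
```

Here `translucent` is [0, 5/255, 1, 0.8], and `opaque` ends with undefined because no alpha was given.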
-
rootElement
(element)¶ Given any node in a DOM tree, return the root element of the tree, generally an HTML element.
-
saturation
(r, g, b)¶ Return the saturation 0..1 of a color defined by RGB values 0..1.
-
setDefault
(map, key, defaultMaker)¶ Get a key of a map, first setting it to a default value if it’s missing.
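The difference between getDefault and setDefault is whether the default is stored. The sketch below uses hypothetical stand-ins, not Fathom's code, and assumes `map` is a Map and `defaultMaker` is a zero-argument function that builds the default.

```javascript
// Hypothetical stand-ins for getDefault() and setDefault(); not Fathom's own code.
// Assumes map is a Map and defaultMaker takes no arguments.
function getDefaultSketch(map, key, defaultMaker) {
    if (map.has(key)) {
        return map.get(key);
    }
    return defaultMaker();          // the map is left unchanged
}

function setDefaultSketch(map, key, defaultMaker) {
    if (map.has(key)) {
        return map.get(key);
    }
    const defaultValue = defaultMaker();
    map.set(key, defaultValue);     // the default is remembered for next time
    return defaultValue;
}

const tagsByPage = new Map([['home', ['nav']]]);

const peeked = getDefaultSketch(tagsByPage, 'about', () => []);
const sizeAfterGet = tagsByPage.size;    // still 1: nothing was stored

// Because the default is stored, it can be mutated in place.
setDefaultSketch(tagsByPage, 'about', () => []).push('footer');
const sizeAfterSet = tagsByPage.size;    // now 2
```

Passing a function rather than a value means each missing key gets its own fresh default, which matters for mutable defaults such as Arrays.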
-
sigmoid
(x)¶ Return the sigmoid of the argument: 1 / (1 + exp(-x)). This is useful for crunching a feature value that may have a wide range into the range (0, 1) without a hard ceiling: the sigmoid of even a very large number will be a little larger than that of a slightly smaller one.
- Arguments
x (Number) – a number to be compressed into the range (0, 1)
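The formula translates directly into code. `sigmoidSketch` is a hypothetical stand-in used to show the shape of the curve, not Fathom's implementation.

```javascript
// Hypothetical stand-in for sigmoid(); not Fathom's own code.
function sigmoidSketch(x) {
    return 1 / (1 + Math.exp(-x));
}

const middle = sigmoidSketch(0);      // the curve's midpoint
const large = sigmoidSketch(10);
const larger = sigmoidSketch(11);     // slightly above large, still below 1
const negative = sigmoidSketch(-10);  // mirrors large around 0.5
```

In floating-point arithmetic the "always a little larger" property holds only up to a point: for sufficiently large inputs the result rounds to exactly 1.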
-
sum
(iterable)¶ Return the sum of an iterable, as defined by the + operator.
-
toDomElement
(fnodeOrElement)¶ Return the DOM element contained in a passed-in fnode. Return passed-in DOM elements verbatim.
- Arguments
fnodeOrElement (Node|Fnode) –
-
toposort
(nodes, nodesThatNeed)¶ Return an Array, the reverse topological sort of the given nodes.
- Arguments
nodes – An iterable of arbitrary things
nodesThatNeed (function) – Take a node and return an Array of nodes that depend on it
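As an illustration of what "reverse topological sort" means here, the following depth-first sketch emits every node after all of the nodes that depend on it. `toposortSketch` is a hypothetical stand-in, not Fathom's code; throwing a plain Error on a cycle is an assumption of this example.

```javascript
// Hypothetical stand-in for toposort(); not Fathom's own code.
function toposortSketch(nodes, nodesThatNeed) {
    const sorted = [];
    const todo = new Set(nodes);
    const inProgress = new Set();

    function visit(node) {
        if (inProgress.has(node)) {
            // Assumption: report cycles with a plain Error.
            throw new Error('The graph has a cycle.');
        }
        if (todo.has(node)) {
            inProgress.add(node);
            // Emit everything that depends on this node first...
            for (const needer of nodesThatNeed(node)) {
                visit(needer);
            }
            inProgress.delete(node);
            todo.delete(node);
            // ...then the node itself.
            sorted.push(node);
        }
    }

    while (todo.size > 0) {
        visit(todo.values().next().value);
    }
    return sorted;
}

// flour is needed by dough, which is needed by bread.
const neededBy = new Map([
    ['flour', ['dough']],
    ['dough', ['bread']],
    ['bread', []],
]);
const order = toposortSketch(['flour', 'dough', 'bread'], node => neededBy.get(node));
```

In this example `order` is bread, dough, flour: dependents come first and the things they depend on come last.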
-
walk
(element, shouldTraverse)¶ Iterate, depth first, over a DOM node. Return the original node first.
- Arguments
shouldTraverse (function) – Given a node, say whether we should include it and its children. Default: always true.
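The traversal order and the pruning effect of shouldTraverse can be sketched with a recursive generator. `walkSketch` is a hypothetical stand-in, not Fathom's code, and the plain objects with a `childNodes` Array below merely imitate DOM nodes so the example runs outside a browser.

```javascript
// Hypothetical stand-in for walk(); not Fathom's own code.
function* walkSketch(node, shouldTraverse = () => true) {
    // The starting node always comes first.
    yield node;
    for (const child of node.childNodes) {
        // Rejecting a child skips its entire subtree.
        if (shouldTraverse(child)) {
            yield* walkSketch(child, shouldTraverse);
        }
    }
}

// Plain objects standing in for DOM nodes (an assumption of this sketch).
const tree = {
    name: 'div',
    childNodes: [
        {name: 'p', childNodes: [{name: 'em', childNodes: []}]},
        {name: 'script', childNodes: [{name: '#text', childNodes: []}]},
        {name: 'span', childNodes: []},
    ],
};

const everything = Array.from(walkSketch(tree), node => node.name);
const noScripts = Array.from(
    walkSketch(tree, node => node.name !== 'script'),
    node => node.name
);
```

Here `everything` visits div, p, em, script, #text, span in that order, while `noScripts` omits both the script node and the text inside it.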