========
Training
========

Training is the process by which Fathom combines your handwritten rules with your labeled example pages to create the most accurate possible recognizer. Training emits a set of numerical parameters:

* One *coefficient* per rule, which indicates the rule's relative weight
* One *bias* per type, which centers elements' scores so they can be construed as 0..1 confidences

Running the Trainer
===================

.. note::

   Fathom has had several trainers over its evolution. Both the Corpus Framework and the trainer built into old versions of FathomFox are obsoleted by :doc:`fathom train`, described herein.

Once your samples are collected and at least several rules are written, you're ready to do some initial training. Training is done for one type at a time. If you have types that depend on other types (an advanced case), train the other types first.

Run the trainer. A simple beginning, using just a training set, is... ::

    fathom train samples/training --ruleset rulesets.js --trainee yourTraineeId

...yielding something like... ::

    {"coeffs": [
         ['nextAnchorIsJavaScript', 1.1627885103225708],
         ['nextButtonTypeSubmit', 4.613410949707031],
         ['nextInputTypeSubmit', 4.374269008636475],
         ['nextInputTypeImage', 6.867544174194336],
         ['nextLoginAttrs', 0.07278082519769669],
         ['nextButtonContentContainsLogIn', -0.6560719609260559],
         ],
     "bias": -3.9029786586761475}

    Training precision: 0.9834   Recall: 1.0000                       Predicted
              Accuracy: 0.9889   95% CI: (0.9780, 0.9997)        ╭───┬── + ───┬── - ───╮
                   FPR: 0.0328   95% CI: (0.0012, 0.0644)   True │ + │    237 │      0 │
                   FNR: 0.0000   95% CI: (0.0000, 0.0000)        │ - │      4 │    118 │
                   MCC: 0.9916                                   ╰───┴────────┴────────╯

    Time per page (ms): 2 |▁▃█▅▂▁   | 34    Average per tag: 8

    Training per-tag results:
       AR_534.html
          ...

Evaluating Metrics
==================

Some notes on the statistics the trainer reports:

* Precision and recall are defined in the standard way and are provided for people familiar with them.
* MCC (Matthews Correlation Coefficient) tries to mix down all sources of error into one number. It's best to look at precision and recall instead, as they are usually not equally important. However, if you need a single number to roughly sort a bunch of candidate models, MCC is as good a choice as any. It ranges from -1 (getting exactly the wrong predictions all the time) through 0 (predictions having no correlation with the truth) to 1 (a perfect predictor).
* All of these statistics (and others, if you like) can be computed from the raw confusion matrix, contained in the bordered box to the right. It shows you raw numbers of false positives, false negatives, true positives, and true negatives. (A sketch of the formulas follows this list.)
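If you want to recompute or extend these statistics yourself, they all fall out of the four counts in the matrix. Here is a minimal JavaScript sketch; the ``metricsFromConfusionMatrix`` helper is invented for illustration and is not part of Fathom or its CLI::

    // Hypothetical helper (not part of Fathom): derive the summary statistics
    // from the four confusion-matrix counts.
    function metricsFromConfusionMatrix({tp, fp, fn, tn}) {
        return {
            precision: tp / (tp + fp),   // of the elements predicted positive, how many really were
            recall: tp / (tp + fn),      // of the truly positive elements, how many were found
            accuracy: (tp + tn) / (tp + fp + fn + tn),
            fpr: fp / (fp + tn),         // false positive rate
            fnr: fn / (fn + tp),         // false negative rate
            mcc: (tp * tn - fp * fn) /
                Math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
        };
    }

    // The example matrix above: 237 true positives, 0 false negatives,
    // 4 false positives, 118 true negatives.
    console.log(metricsFromConfusionMatrix({tp: 237, fp: 4, fn: 0, tn: 118}));
    // precision ≈ 0.9834, recall = 1.0000, accuracy ≈ 0.9889,
    // fpr ≈ 0.0328, fnr = 0.0000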
There are also speed histograms::

    Time per page (ms): 2 |  ▃█▃▁▁  | 35    Average per tag: 11

These show how much time Fathom is taking per page and per tag. The horizontal axis is milliseconds, and the vertical axis is page count. The histograms vary more from run to run than the other (convergent) statistics, and, of course, the absolute numbers depend on the speed of the machine. What you should look out for is the sudden appearance of large bars at the far right (indicating many slow outliers) or a drastic increase in the numbers (indicating you have slowed things down across the board).

Workflow
========

A sane authoring process is a feedback loop something like this:

#. Collect samples. Observe patterns in the :term:`target` nodes as you do.
#. Write a few rules based on your observations.
#. Run the trainer. Start with 10-20 training pages and an equal number of validation ones.
#. Examine *training* precision and recall. (See :ref:`Evaluating Metrics`.) If they are insufficient, examine the failing training pages. The trainer will point these out on the command line, but FathomFox's Evaluator will help you track down ones that are hard to distinguish from their tag excerpts. Remediate by changing or adding rules. If there are signals Fathom is missing, positive or negative, add rules that score based on them. You'll probably also need to do some :doc:`debugging`.
#. Go back to step 3.
#. Once *validation* precision and recall are sufficient, use :doc:`fathom test` on a fresh set of *testing* samples. These are your *testing metrics* and should reflect real-world performance, assuming your sample size is large and representative enough. The computed 95% confidence intervals should help you judge whether it is large enough.
#. If testing precision and recall are too low, fold the testing pages into your training set, and go back to step 3. As is typical in supervised learning, testing samples should be considered "burned" once you have measured against them even a single time; otherwise you are effectively training on them. Samples are precious.
#. If testing precision and recall are sufficient, you're done! Paste the trained coefficients and biases into your ruleset (a sketch of what that might look like follows this list), paste your ruleset into your application, and ship it.
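What that final paste looks like depends on how your ruleset file is laid out, so the following is only a rough sketch: the DOM selector, the ``next`` type name, the scoring callbacks, and the exact shape of the ``ruleset()`` call are illustrative assumptions rather than excerpts from a real ruleset. The rule names and numbers are taken from the example trainer output above::

    import {dom, out, rule, ruleset, score, type} from 'fathom-web';

    // Hypothetical scoring callbacks; a real ruleset computes real signals here.
    function nextAnchorIsJavaScript(fnode) {
        return fnode.element.matches('a[href^="javascript:"]') ? 1 : 0;
    }
    function nextButtonTypeSubmit(fnode) {
        return fnode.element.matches('button[type=submit]') ? 1 : 0;
    }

    const rules = ruleset(
        [
            // Feed candidate elements into the "next" type, then score them,
            // one named rule per trained coefficient:
            rule(dom('a, button, input'), type('next')),
            rule(type('next'), score(nextAnchorIsJavaScript), {name: 'nextAnchorIsJavaScript'}),
            rule(type('next'), score(nextButtonTypeSubmit), {name: 'nextButtonTypeSubmit'}),
            // ...one scoring rule for each remaining coefficient...
            rule(type('next').max(), out('next'))
        ],
        // Trained coefficients, keyed by the rules' names:
        new Map([
            ['nextAnchorIsJavaScript', 1.1627885103225708],
            ['nextButtonTypeSubmit', 4.613410949707031]
            // ...
        ]),
        // Trained bias, one per type:
        [['next', -3.9029786586761475]]
    );

However your file is organized, the invariant is the one stated at the top of this page: one coefficient per named scoring rule and one bias per type, so the trainer's output maps directly onto your ruleset.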