What is Dogear?

Dogear is a library that implements bookmark tree merging for Firefox Sync. It takes two trees—a valid, consistent local tree, and a possibly inconsistent remote tree—and produces a complete merged tree, with all conflicts and inconsistencies resolved.

Dogear implements the merge algorithm only; it doesn't handle syncing, storage, or application on its own. It's up to your crate to:

  • Persist local and remote bookmarks in a database.
  • Build trees from those bookmarks.
  • Update the local and remote trees to match the merged tree, and...
  • Upload records for changed bookmarks.

Bookmark syncing

Bookmarks are one of the more complicated data types to sync. Locally, they form a hierarchy, where each bookmark lives in exactly one folder, and has a unique position within that folder.

On the server, the hierarchy is flattened into a collection of unordered, encrypted records. Folders keep pointers to their children, and children back to their parents. This means that some changes, like moving a bookmark or deleting a folder, need to upload multiple records in lockstep.

Conflict resolution

Clients must also handle merge conflicts. Most are easy to resolve, like adding a bookmark to a folder on one side, or even adding bookmarks to the same folder on different devices. Ditto for deletions and moves.

Conflicts where a bookmark is deleted on one side and changed on the other, or where a folder is deleted on one side and has new children on the other, are harder. When resolving conflicts, it's important to flag all affected records for upload, so that the remote tree remains consistent.

Dogear manages this complexity, so that clients don't have to implement their own merging logic.

Integrating Dogear

The cornerstone of Dogear is the Store trait. A Store implements methods for:

  • Building local and remote trees.
  • Fetching content info for matching bookmarks with similar contents.
  • Applying the merged tree.

Exactly how this is done is up to your crate!

For example, Firefox Desktop stores local bookmarks in an SQLite database called Places, and remote bookmarks in an attached "mirror" database. During application, Desktop inserts the merged tree into a temporary, in-memory table, then uses triggers to update Places and stage outgoing items for upload.

The Rust Places library is similar to Desktop, but stores local and remote bookmarks in the same database.

However, nothing in Dogear is SQLite-specific. You can use a key-value store like LMDB, or even a JSON file. Also, while Dogear was developed specifically for Firefox Sync clients, you can use it to merge any two bookmark trees.

The second trait that you'll want to implement is Driver. The driver lets your crate customize merging behavior, including:

  • Fixing up invalid item GUIDs.
  • Logging.

Dogear includes a default driver that rejects invalid GUIDs, and calls the log crate's global logger. In Firefox Desktop, the merge driver posts log messages to the main thread, where they're sent to Sync's log manager. In Rust Places, the logger implementation sends logs to platform-specific logging backends on Android and iOS.

And that's it! Once you've implemented these two traits, you can use Store::merge_with_driver to run the merge and collect telemetry.

In the next section, we'll take a closer look at how to implement a Store.

Fetching local and remote trees

First, we need to teach our Store to inflate trees from the storage backend. As you might expect, fetch_local_tree builds the local tree, and fetch_remote_tree builds the remote tree. The trees should be complete, so don't leave out orphans, missing children, or bookmarks with invalid GUIDs or URLs. That way, Dogear has a full picture of the state of the world.

Let's assume we've stored the following records. The current time is T = 11, A was downloaded during the last sync (at T = 6), and the menu and B during this sync:

{ "id": "menu", "type": "folder", "parentid": "places", "children": ["bookmarkAAAA", "bookmarkBBBB"], "modified": 10 }
{ "id": "bookmarkAAAA", "type": "bookmark", "parentid": "menu", "modified": 5 }
{ "id": "bookmarkBBBB", "type": "bookmark", "parentid": "menu", "modified": 10 }

We can build a tree from those records like this, starting with the root:


# #![allow(unused_variables)]
#fn main() {
# extern crate dogear;
use dogear::{Item, Tree};
let mut builder = Tree::with_root(Item::root());
#}

For cases where we know the structure is valid, and the tree already contains the parent, we can use by_structure to indicate that the parent's children and child's parentid match. This is how we build the local tree on Desktop. Notice that we also set the needs_merge flag, to indicate that the menu has changes that we should merge:

# extern crate dogear;
# use dogear::{Result, Tree};
use dogear::{Guid, Item, Kind, MENU_GUID, ROOT_GUID};

# fn main() -> Result<()> {
let now_millis = 11;

# let mut builder = Tree::with_root(Item::root());
let mut menu = Item::new(MENU_GUID.clone(), Kind::Folder);
menu.age = now_millis - 10;
menu.needs_merge = true;

builder
    .item(menu)?
    .by_structure(&ROOT_GUID)?;
# Ok(())
# }

For cases where the parentid and children might disagree, we can set parents from parentid and children separately, using by_parent_guid and by_children. This is equivalent to by_structure, just slightly less efficient:

# extern crate dogear;
# use dogear::{Item, Kind, Tree, Result, MENU_GUID, ROOT_GUID};
# fn main() -> Result<()> {
# let now_millis = 11;
# let mut builder = Tree::with_root(Item::root());
# let mut menu = Item::new(MENU_GUID.clone(), Kind::Folder);
# builder.item(menu)?.by_structure(&ROOT_GUID)?;
let mut a = Item::new("bookmarkAAAA".into(), Kind::Bookmark);
a.age = now_millis - 5;

builder
    .item(a)?
    .by_parent_guid(MENU_GUID.clone())?;
builder.parent_for(&"bookmarkAAAA".into())
    .by_children(&MENU_GUID)?;
# Ok(())
# }

We can also insert an item without its parents, and set parents by parentid and children later, if they exist. This is also equivalent to the above:

# extern crate dogear;
# use dogear::{Item, Kind, Tree, Result, MENU_GUID, ROOT_GUID};
# fn main() -> Result<()> {
# let now_millis = 11;
# let mut builder = Tree::with_root(Item::root());
# let mut menu = Item::new(MENU_GUID.clone(), Kind::Folder);
# builder.item(menu)?.by_structure(&ROOT_GUID)?;
let mut b = Item::new("bookmarkBBBB".into(), Kind::Bookmark);
b.age = now_millis - 10;
b.needs_merge = true;

builder.item(b)?;
builder.parent_for(&"bookmarkBBBB".into())
    .by_parent_guid(MENU_GUID.clone())?;
builder.parent_for(&"bookmarkBBBB".into())
    .by_children(&MENU_GUID)?;
# Ok(())
# }

Finally, let's build and print out our tree!

# extern crate dogear;
# use dogear::{Item, Tree, Result};
use dogear::IntoTree;
# fn main() -> Result<()> {
# let mut builder = Tree::with_root(Item::root());
let tree = builder.into_tree()?;
println!("{}", tree);
# Ok(())
# }

And we see:

📂 root________ (Folder; Age = 0ms)
| 📂 menu________ (Folder; Age = 1ms; Unmerged)
| | 🔖 bookmarkAAAA (Bookmark; Age = 6ms)
| | 🔖 bookmarkBBBB (Bookmark; Age = 1ms; Unmerged)

Divergences

In the last section, the tree is totally consistent. Let's look at a case where we're not so lucky:

{ "id": "menu", "type": "folder", "parentid": "places", "children": ["bookmarkAAAA"], "modified": 5 }
{ "id": "toolbar", "type": "folder", "parentid": "places", "children": ["bookmarkBBBB", "bookmarkAAAA"], "modified": 10 }
{ "id": "unfiled", "type": "folder", "parentid": "places", "children": ["bookmarkEEEE"], "modified": 5 }
{ "id": "bookmarkAAAA", "type": "bookmark", "parentid": "folderCCCCCC", "modified": 5 }
{ "id": "bookmarkDDDD", "type": "bookmark", "parentid": "menu", "modified": 5 }
{ "id": "bookmarkEEEE", "type": "bookmark", "parentid": "unfiled", "modified": 10 }

What's going on here?

  • Both the menu and toolbar claim A as a child, but A says its parentid is C, which doesn't exist.
  • The toolbar references B in its children, but B is nowhere to be found. B is a missing child.
  • D says its parentid is the menu—which exists—but the menu doesn't mention D in its children.
  • E is the only bookmark where its parentid matches its parent's children.

Yikes!

Remember that, while we can enforce hierarchical invariants for the local tree, we can't guarantee anything about the remote tree. This is because the Firefox Sync storage server can't see the contents of bookmarks, and must rely on clients alone to upload consistent data.

Unfortunately, bugs and missing features in older clients caused them to miss changes, or not upload records at all. This leads to the structure divergences that we see here, like orphans, missing children, and parent-child disagreements. Newer Desktops shouldn't upload inconsistencies, but other platforms can, and many long-time Sync users have existing inconsistencies that confuse new clients when they sync for the first time.

Let's add this tree to Dogear:

# extern crate dogear;
use dogear::{Guid, IntoTree, Item, Kind, Tree, MENU_GUID, ROOT_GUID, TOOLBAR_GUID, UNFILED_GUID};
# use dogear::Result;
# fn main() -> Result<()> {
let now_millis = 11;

let mut builder = Tree::with_root(Item::root());

let mut menu = Item::new(MENU_GUID.clone(), Kind::Folder);
menu.age = now_millis - 5;
menu.needs_merge = true;
builder
    .item(menu)?
    .by_structure(&ROOT_GUID)?;

let mut toolbar = Item::new(TOOLBAR_GUID.clone(), Kind::Folder);
toolbar.age = 0;
toolbar.needs_merge = true;
builder
    .item(toolbar)?
    .by_structure(&ROOT_GUID)?;

let mut unfiled = Item::new(UNFILED_GUID.clone(), Kind::Folder);
unfiled.age = now_millis - 5;
unfiled.needs_merge = true;
builder
    .item(unfiled)?
    .by_structure(&ROOT_GUID)?;

let mut a = Item::new("bookmarkAAAA".into(), Kind::Bookmark);
a.age = now_millis - 5;
builder.item(a)?;

let mut d = Item::new("bookmarkDDDD".into(), Kind::Bookmark);
d.age = now_millis - 5;
builder.item(d)?;

let mut e = Item::new("bookmarkEEEE".into(), Kind::Bookmark);
e.age = 0;
builder.item(e)?;

// A is mentioned in both the menu's and toolbar's `children`, and its
// `parentid` is C.
builder.parent_for(&"bookmarkAAAA".into())
    .by_children(&MENU_GUID)?
    .parent_for(&"bookmarkAAAA".into())
    .by_children(&TOOLBAR_GUID)?
    .parent_for(&"bookmarkAAAA".into())
    .by_parent_guid("folderCCCCCC".into())?;

// B is mentioned in the toolbar's `children`.
builder.parent_for(&"bookmarkBBBB".into())
    .by_children(&TOOLBAR_GUID)?;

// D's `parentid` is the menu, even though it's not mentioned in the menu's
// `children`.
builder.parent_for(&"bookmarkDDDD".into())
    .by_parent_guid(MENU_GUID.clone())?;

// E's `parentid` is unfiled, and unfiled's `children` mention E.
builder.parent_for(&"bookmarkEEEE".into())
    .by_structure(&UNFILED_GUID)?;

let tree = builder.into_tree()?;
println!("{}", tree);
# Ok(())
# }

When we print this tree, we see:

📂 root________ (Folder; Age = 0ms)
| 📂 menu________ (Folder; Age = 6ms; Unmerged)
| | ❗️🔖 bookmarkDDDD (Bookmark; Age = 6ms)
| 📂 toolbar_____ (Folder; Age = 0ms; Unmerged)
| | ❗️🔖 bookmarkAAAA (Bookmark; Age = 6ms)
| 📂 unfiled_____ (Folder; Age = 6ms; Unmerged)
| | 🔖 bookmarkEEEE (Bookmark; Age = 0ms)

When Dogear built this tree, it saw inconsistencies, and decided where to keep each item based on its age. It also marked the tree structure as diverged, indicated with a ❗️, so that the merger can flag it for reupload.

Content matching

Store::fetch_new_local_contents and Store::fetch_new_remote_contents provide information that the merger uses to dedupe new local items to incoming remote items with different GUIDs and similar contents. They return HashMaps that map a candidate item's GUID to its content info.

Two items are a match if they have congruent parents and matching properties:

  • Bookmarks and queries match if they have the same title and URL.
  • Folders match if they have the same title.
  • Separators match if they have the same position in their respective parents.

Here are three suggestions for implementing fetch_new_local_contents and fetch_new_remote_contents:

  • Neither should return roots.
  • fetch_new_local_contents should only return local items that haven't been uploaded before. In Firefox Desktop and Rust Places, these are all items with a NEW or UNKNOWN sync status.
  • fetch_new_remote_contents should only return remote items that have changed since the last sync, and don't exist locally. In Desktop and Rust Places, these are items with needsMerge set.

The merger still does the right thing if you don't follow these suggestions, but, depending on your storage backend, it may be more efficient if you do.

For example, you can return content info for all items in your store, if you want! This makes sense if all the content info is in memory already, or where filtering contents is more costly. On the other hand, if you're using a store like SQLite, it's more efficient to filter out contents that you know won't match in the query.

You can disable deduping entirely by returning empty HashMaps from these methods.

Application

The last method that we need to implement is Store::apply. This method takes a merged root and a set of deletions, and updates the local tree to match the merged tree.

Merging

Dogear's Merger produces a complete, consistent merged tree from a local and remote bookmark tree. The merge algorithm examines each item, and resolves two kinds of changes:

  • Structure changes to an item's parent or children, like adding or deleting an item, moving an item to a different folder, or reordering a folder's children.
  • Value changes to an item's properties, like the title or URL.

Merge states

Each merged item has a merge state that describes how to resolve the change. The merge state determines:

  • Which side—local or remote—to prefer when resolving a conflict.
  • Whether or not to apply a remote change to the local tree.
  • Whether or not to upload the merged item to the server.

There are seven merge states:

  • Items that are unchanged (MergeState::Unchanged) on both sides don't need to be uploaded or applied.
  • Items that only exist remotely (MergeState::RemoteOnly), only changed remotely, or have newer remote changes (MergeState::Remote), must be applied to the local tree. This overwrites any local changes to the item.
  • Items that only exist remotely (MergeState::RemoteOnlyWithNewStructure) or have newer remote changes (MergeState::RemoteWithNewStructure), but conflict with local structure changes, must be applied to the local tree and reuploaded to the server. This resolves the conflict on both sides.
  • Items that only exist locally (MergeState::LocalOnly), only changed locally, or have newer local changes (MergeState::Local), must be uploaded to the server. This overwrites remote changes to the item. Since LocalOnly and Local always imply upload, there are no corresponding WithNewStructure states.

The merger starts at the roots of both trees, and recursively walks their children, eventually visiting every item.

Conflicts

Structure and value conflicts, where an item changes on both sides, are resolved using timestamps, picking the chronologically newer side. The merger handles conflicts at the item level, not the property level, using the needs_merge flag.

For example, consider changing a bookmark's title locally, and the same bookmark's URL remotely. Both items have needs_merge = true set.

  • If the local title change is newer (local_item.age < remote_item.age), the URL change will be reverted on the server.
  • If the remote URL change is newer (remote_item.age > local_item.age), the title change will be reverted locally.

Since the only indication that an item changed is its needs_merge flag, and there's no shared parent tree, Dogear can't know which fields changed, or if they're independent. In other words, the merger knows that the item changed, but not how. For this reason, the algorithm is a two-way merge, not a three-way merge.

This is a trade-off between simplicity and correctness: it removes a chunk of complexity from Dogear, and means that clients don't need to persist snapshots of the shared tree. However, it does mean that some conflicts will cause changes to revert on one side, which is a form of data loss.

Deletions

Conflicts where an item is deleted on one or both sides are handled specially:

  • If a non-folder item is changed on one side and deleted on the other, we ignore the deletion, and revive the item.
  • If a non-root folder is changed on one side and deleted on the other, we keep the folder deleted, and recursively move any new descendants that don't exist on the other side to the closest surviving parent folder.
  • If a root folder is deleted on one side, we ignore the deletion and revive the root. This should never happen except in cases of corruption on the server.

If an item is deleted on both sides, there's no conflict, so we keep the deletion.

Non-syncable items

A non-syncable item is:

  • Any item that's not a descendant of a user content root (menu, mobile, toolbar, or unfiled) on either side.
  • A remote livemarks.
  • A remote orphaned query.

If an item isn't syncable on either side—for example, a legacy organizer left pane query might be non-syncable locally, but orphaned and syncable remotely—it's deleted. Any syncable descendants of a non-syncable folder are relocated to the closest surviving parent folder.

Deduping

If a remote item with the same GUID doesn't exist locally, the merger first tries to find an item with similar contents. This is called deduping. An item is a candidate for deduping if:

  • It's not a root.
  • It hasn't synced before. As a consequence, items that are already on the server will never dedupe to one another, even if they match.

Invalid items

If an item on either side has an invalid GUID, the merger asks the Driver to generate a new one.