What is Dogear?
Dogear is a library that implements bookmark tree merging for Firefox Sync. It takes two trees—a valid, consistent local tree, and a possibly inconsistent remote tree—and produces a complete merged tree, with all conflicts and inconsistencies resolved.
Dogear implements the merge algorithm only; it doesn't handle syncing, storage, or application on its own. It's up to your crate to:
- Persist local and remote bookmarks in a database.
- Build trees from those bookmarks.
- Update the local and remote trees to match the merged tree, and...
- Upload records for changed bookmarks.
Bookmark syncing
Bookmarks are one of the more complicated data types to sync. Locally, they form a hierarchy, where each bookmark lives in exactly one folder, and has a unique position within that folder.
On the server, the hierarchy is flattened into a collection of unordered, encrypted records. Folders keep pointers to their children, and children back to their parents. This means that some changes, like moving a bookmark or deleting a folder, need to upload multiple records in lockstep.
Conflict resolution
Clients must also handle merge conflicts. Most are easy to resolve, like adding a bookmark to a folder on one side, or even adding bookmarks to the same folder on different devices. Ditto for deletions and moves.
Conflicts where a bookmark is deleted on one side and changed on the other, or where a folder is deleted on one side and has new children on the other, are harder. When resolving conflicts, it's important to flag all affected records for upload, so that the remote tree remains consistent.
Dogear manages this complexity, so that clients don't have to implement their own merging logic.
Integrating Dogear
The cornerstone of Dogear is the Store
trait. A Store
implements methods for:
- Building local and remote trees.
- Fetching content info for matching bookmarks with similar contents.
- Applying the merged tree.
Exactly how this is done is up to your crate!
For example, Firefox Desktop stores local bookmarks in an SQLite database called Places, and remote bookmarks in an attached "mirror" database. During application, Desktop inserts the merged tree into a temporary, in-memory table, then uses triggers to update Places and stage outgoing items for upload.
The Rust Places library is similar to Desktop, but stores local and remote bookmarks in the same database.
However, nothing in Dogear is SQLite-specific. You can use a key-value store like LMDB, or even a JSON file. Also, while Dogear was developed specifically for Firefox Sync clients, you can use it to merge any two bookmark trees.
The second trait that you'll want to implement is Driver
. The driver lets your crate customize merging behavior, including:
- Fixing up invalid item GUIDs.
- Logging.
Dogear includes a default driver that rejects invalid GUIDs, and calls the log
crate's global logger. In Firefox Desktop, the merge driver posts log messages to the main thread, where they're sent to Sync's log manager. In Rust Places, the logger implementation sends logs to platform-specific logging backends on Android and iOS.
And that's it! Once you've implemented these two traits, you can use Store::merge_with_driver
to run the merge and collect telemetry.
In the next section, we'll take a closer look at how to implement a Store
.
Fetching local and remote trees
First, we need to teach our Store
to inflate trees from the storage backend. As you might expect, fetch_local_tree
builds the local tree, and fetch_remote_tree
builds the remote tree. The trees should be complete, so don't leave out orphans, missing children, or bookmarks with invalid GUIDs or URLs. That way, Dogear has a full picture of the state of the world.
Let's assume we've stored the following records. The current time is T = 11, A was downloaded during the last sync (at T = 6), and the menu and B during this sync:
{ "id": "menu", "type": "folder", "parentid": "places", "children": ["bookmarkAAAA", "bookmarkBBBB"], "modified": 10 }
{ "id": "bookmarkAAAA", "type": "bookmark", "parentid": "menu", "modified": 5 }
{ "id": "bookmarkBBBB", "type": "bookmark", "parentid": "menu", "modified": 10 }
We can build a tree from those records like this, starting with the root:
# #![allow(unused_variables)] #fn main() { # extern crate dogear; use dogear::{Item, Tree}; let mut builder = Tree::with_root(Item::root()); #}
For cases where we know the structure is valid, and the tree already contains the parent, we can use by_structure
to indicate that the parent's children
and child's parentid
match. This is how we build the local tree on Desktop. Notice that we also set the needs_merge
flag, to indicate that the menu has changes that we should merge:
# extern crate dogear; # use dogear::{Result, Tree}; use dogear::{Guid, Item, Kind, MENU_GUID, ROOT_GUID}; # fn main() -> Result<()> { let now_millis = 11; # let mut builder = Tree::with_root(Item::root()); let mut menu = Item::new(MENU_GUID.clone(), Kind::Folder); menu.age = now_millis - 10; menu.needs_merge = true; builder .item(menu)? .by_structure(&ROOT_GUID)?; # Ok(()) # }
For cases where the parentid
and children
might disagree, we can set parents from parentid
and children
separately, using by_parent_guid
and by_children
. This is equivalent to by_structure
, just slightly less efficient:
# extern crate dogear; # use dogear::{Item, Kind, Tree, Result, MENU_GUID, ROOT_GUID}; # fn main() -> Result<()> { # let now_millis = 11; # let mut builder = Tree::with_root(Item::root()); # let mut menu = Item::new(MENU_GUID.clone(), Kind::Folder); # builder.item(menu)?.by_structure(&ROOT_GUID)?; let mut a = Item::new("bookmarkAAAA".into(), Kind::Bookmark); a.age = now_millis - 5; builder .item(a)? .by_parent_guid(MENU_GUID.clone())?; builder.parent_for(&"bookmarkAAAA".into()) .by_children(&MENU_GUID)?; # Ok(()) # }
We can also insert an item without its parents, and set parents by parentid
and children
later, if they exist. This is also equivalent to the above:
# extern crate dogear; # use dogear::{Item, Kind, Tree, Result, MENU_GUID, ROOT_GUID}; # fn main() -> Result<()> { # let now_millis = 11; # let mut builder = Tree::with_root(Item::root()); # let mut menu = Item::new(MENU_GUID.clone(), Kind::Folder); # builder.item(menu)?.by_structure(&ROOT_GUID)?; let mut b = Item::new("bookmarkBBBB".into(), Kind::Bookmark); b.age = now_millis - 10; b.needs_merge = true; builder.item(b)?; builder.parent_for(&"bookmarkBBBB".into()) .by_parent_guid(MENU_GUID.clone())?; builder.parent_for(&"bookmarkBBBB".into()) .by_children(&MENU_GUID)?; # Ok(()) # }
Finally, let's build and print out our tree!
# extern crate dogear; # use dogear::{Item, Tree, Result}; use dogear::IntoTree; # fn main() -> Result<()> { # let mut builder = Tree::with_root(Item::root()); let tree = builder.into_tree()?; println!("{}", tree); # Ok(()) # }
And we see:
📂 root________ (Folder; Age = 0ms)
| 📂 menu________ (Folder; Age = 1ms; Unmerged)
| | 🔖 bookmarkAAAA (Bookmark; Age = 6ms)
| | 🔖 bookmarkBBBB (Bookmark; Age = 1ms; Unmerged)
Divergences
In the last section, the tree is totally consistent. Let's look at a case where we're not so lucky:
{ "id": "menu", "type": "folder", "parentid": "places", "children": ["bookmarkAAAA"], "modified": 5 }
{ "id": "toolbar", "type": "folder", "parentid": "places", "children": ["bookmarkBBBB", "bookmarkAAAA"], "modified": 10 }
{ "id": "unfiled", "type": "folder", "parentid": "places", "children": ["bookmarkEEEE"], "modified": 5 }
{ "id": "bookmarkAAAA", "type": "bookmark", "parentid": "folderCCCCCC", "modified": 5 }
{ "id": "bookmarkDDDD", "type": "bookmark", "parentid": "menu", "modified": 5 }
{ "id": "bookmarkEEEE", "type": "bookmark", "parentid": "unfiled", "modified": 10 }
What's going on here?
- Both the menu and toolbar claim A as a child, but A says its
parentid
is C, which doesn't exist. - The toolbar references B in its
children
, but B is nowhere to be found. B is a missing child. - D says its
parentid
is the menu—which exists—but the menu doesn't mention D in its children. - E is the only bookmark where its
parentid
matches its parent'schildren
.
Yikes!
Remember that, while we can enforce hierarchical invariants for the local tree, we can't guarantee anything about the remote tree. This is because the Firefox Sync storage server can't see the contents of bookmarks, and must rely on clients alone to upload consistent data.
Unfortunately, bugs and missing features in older clients caused them to miss changes, or not upload records at all. This leads to the structure divergences that we see here, like orphans, missing children, and parent-child disagreements. Newer Desktops shouldn't upload inconsistencies, but other platforms can, and many long-time Sync users have existing inconsistencies that confuse new clients when they sync for the first time.
Let's add this tree to Dogear:
# extern crate dogear; use dogear::{Guid, IntoTree, Item, Kind, Tree, MENU_GUID, ROOT_GUID, TOOLBAR_GUID, UNFILED_GUID}; # use dogear::Result; # fn main() -> Result<()> { let now_millis = 11; let mut builder = Tree::with_root(Item::root()); let mut menu = Item::new(MENU_GUID.clone(), Kind::Folder); menu.age = now_millis - 5; menu.needs_merge = true; builder .item(menu)? .by_structure(&ROOT_GUID)?; let mut toolbar = Item::new(TOOLBAR_GUID.clone(), Kind::Folder); toolbar.age = 0; toolbar.needs_merge = true; builder .item(toolbar)? .by_structure(&ROOT_GUID)?; let mut unfiled = Item::new(UNFILED_GUID.clone(), Kind::Folder); unfiled.age = now_millis - 5; unfiled.needs_merge = true; builder .item(unfiled)? .by_structure(&ROOT_GUID)?; let mut a = Item::new("bookmarkAAAA".into(), Kind::Bookmark); a.age = now_millis - 5; builder.item(a)?; let mut d = Item::new("bookmarkDDDD".into(), Kind::Bookmark); d.age = now_millis - 5; builder.item(d)?; let mut e = Item::new("bookmarkEEEE".into(), Kind::Bookmark); e.age = 0; builder.item(e)?; // A is mentioned in both the menu's and toolbar's `children`, and its // `parentid` is C. builder.parent_for(&"bookmarkAAAA".into()) .by_children(&MENU_GUID)? .parent_for(&"bookmarkAAAA".into()) .by_children(&TOOLBAR_GUID)? .parent_for(&"bookmarkAAAA".into()) .by_parent_guid("folderCCCCCC".into())?; // B is mentioned in the toolbar's `children`. builder.parent_for(&"bookmarkBBBB".into()) .by_children(&TOOLBAR_GUID)?; // D's `parentid` is the menu, even though it's not mentioned in the menu's // `children`. builder.parent_for(&"bookmarkDDDD".into()) .by_parent_guid(MENU_GUID.clone())?; // E's `parentid` is unfiled, and unfiled's `children` mention E. builder.parent_for(&"bookmarkEEEE".into()) .by_structure(&UNFILED_GUID)?; let tree = builder.into_tree()?; println!("{}", tree); # Ok(()) # }
When we print this tree, we see:
📂 root________ (Folder; Age = 0ms)
| 📂 menu________ (Folder; Age = 6ms; Unmerged)
| | ❗️🔖 bookmarkDDDD (Bookmark; Age = 6ms)
| 📂 toolbar_____ (Folder; Age = 0ms; Unmerged)
| | ❗️🔖 bookmarkAAAA (Bookmark; Age = 6ms)
| 📂 unfiled_____ (Folder; Age = 6ms; Unmerged)
| | 🔖 bookmarkEEEE (Bookmark; Age = 0ms)
When Dogear built this tree, it saw inconsistencies, and decided where to keep each item based on its age. It also marked the tree structure as diverged, indicated with a ❗️, so that the merger can flag it for reupload.
Content matching
Store::fetch_new_local_contents
and Store::fetch_new_remote_contents
provide information that the merger uses to dedupe new local items to incoming remote items with different GUIDs and similar contents. They return HashMap
s that map a candidate item's GUID to its content info.
Two items are a match if they have congruent parents and matching properties:
- Bookmarks and queries match if they have the same title and URL.
- Folders match if they have the same title.
- Separators match if they have the same position in their respective parents.
Here are three suggestions for implementing fetch_new_local_contents
and fetch_new_remote_contents
:
- Neither should return roots.
fetch_new_local_contents
should only return local items that haven't been uploaded before. In Firefox Desktop and Rust Places, these are all items with aNEW
orUNKNOWN
sync status.fetch_new_remote_contents
should only return remote items that have changed since the last sync, and don't exist locally. In Desktop and Rust Places, these are items withneedsMerge
set.
The merger still does the right thing if you don't follow these suggestions, but, depending on your storage backend, it may be more efficient if you do.
For example, you can return content info for all items in your store, if you want! This makes sense if all the content info is in memory already, or where filtering contents is more costly. On the other hand, if you're using a store like SQLite, it's more efficient to filter out contents that you know won't match in the query.
You can disable deduping entirely by returning empty HashMap
s from these methods.
Application
The last method that we need to implement is Store::apply
. This method takes a merged root and a set of deletions, and updates the local tree to match the merged tree.
Merging
Dogear's Merger
produces a complete, consistent merged tree from a local and remote bookmark tree. The merge algorithm examines each item, and resolves two kinds of changes:
- Structure changes to an item's parent or children, like adding or deleting an item, moving an item to a different folder, or reordering a folder's children.
- Value changes to an item's properties, like the title or URL.
Merge states
Each merged item has a merge state that describes how to resolve the change. The merge state determines:
- Which side—local or remote—to prefer when resolving a conflict.
- Whether or not to apply a remote change to the local tree.
- Whether or not to upload the merged item to the server.
There are seven merge states:
- Items that are unchanged (
MergeState::Unchanged
) on both sides don't need to be uploaded or applied. - Items that only exist remotely (
MergeState::RemoteOnly
), only changed remotely, or have newer remote changes (MergeState::Remote
), must be applied to the local tree. This overwrites any local changes to the item. - Items that only exist remotely (
MergeState::RemoteOnlyWithNewStructure
) or have newer remote changes (MergeState::RemoteWithNewStructure
), but conflict with local structure changes, must be applied to the local tree and reuploaded to the server. This resolves the conflict on both sides. - Items that only exist locally (
MergeState::LocalOnly
), only changed locally, or have newer local changes (MergeState::Local
), must be uploaded to the server. This overwrites remote changes to the item. SinceLocalOnly
andLocal
always imply upload, there are no correspondingWithNewStructure
states.
The merger starts at the roots of both trees, and recursively walks their children, eventually visiting every item.
Conflicts
Structure and value conflicts, where an item changes on both sides, are resolved using timestamps, picking the chronologically newer side. The merger handles conflicts at the item level, not the property level, using the needs_merge
flag.
For example, consider changing a bookmark's title locally, and the same bookmark's URL remotely. Both items have needs_merge = true
set.
- If the local title change is newer (
local_item.age < remote_item.age
), the URL change will be reverted on the server. - If the remote URL change is newer (
remote_item.age > local_item.age
), the title change will be reverted locally.
Since the only indication that an item changed is its needs_merge
flag, and there's no shared parent tree, Dogear can't know which fields changed, or if they're independent. In other words, the merger knows that the item changed, but not how. For this reason, the algorithm is a two-way merge, not a three-way merge.
This is a trade-off between simplicity and correctness: it removes a chunk of complexity from Dogear, and means that clients don't need to persist snapshots of the shared tree. However, it does mean that some conflicts will cause changes to revert on one side, which is a form of data loss.
Deletions
Conflicts where an item is deleted on one or both sides are handled specially:
- If a non-folder item is changed on one side and deleted on the other, we ignore the deletion, and revive the item.
- If a non-root folder is changed on one side and deleted on the other, we keep the folder deleted, and recursively move any new descendants that don't exist on the other side to the closest surviving parent folder.
- If a root folder is deleted on one side, we ignore the deletion and revive the root. This should never happen except in cases of corruption on the server.
If an item is deleted on both sides, there's no conflict, so we keep the deletion.
Non-syncable items
A non-syncable item is:
- Any item that's not a descendant of a user content root (menu, mobile, toolbar, or unfiled) on either side.
- A remote livemarks.
- A remote orphaned query.
If an item isn't syncable on either side—for example, a legacy organizer left pane query might be non-syncable locally, but orphaned and syncable remotely—it's deleted. Any syncable descendants of a non-syncable folder are relocated to the closest surviving parent folder.
Deduping
If a remote item with the same GUID doesn't exist locally, the merger first tries to find an item with similar contents. This is called deduping. An item is a candidate for deduping if:
- It's not a root.
- It hasn't synced before. As a consequence, items that are already on the server will never dedupe to one another, even if they match.
Invalid items
If an item on either side has an invalid GUID, the merger asks the Driver
to generate a new one.