Application Services Rust Components
Application Services is a collection of Rust components. The components are used to enable Firefox and related applications to integrate with Firefox Accounts and Sync, and to enable experimentation. Each component is built using a core of shared code written in Rust, wrapped with native language bindings for different platforms.
Contact us
To contact the Application Services team you can:
- Find us in the chat #rust-components:mozilla.org (How to connect)
- To report issues with sync on Firefox Desktop, file a bug in Bugzilla for Firefox :: Sync
- To report issues with our components, file an issue in the GitHub issue tracker
The source code is available on GitHub.
License
The Application Services Source Code is subject to the terms of the Mozilla Public License v2.0. You can obtain a copy of the MPL at https://mozilla.org/MPL/2.0/.
Contributing to Application Services
Anyone is welcome to help with the Application Services project. Feel free to get in touch with other community members on Matrix or through issues on GitHub.
Participation in this project is governed by the Mozilla Community Participation Guidelines.
Bug Reports
You can file issues on GitHub. Please include as much information as you can, including the conditions under which you saw the issue.
Building the project
Build instructions are available on the building page. Please let us know if you encounter any pain points setting up your environment.
Finding issues
Below are a few different queries you can use to find appropriate issues to work on. Feel free to reach out if you need any additional clarification before picking up an issue.
- good first issues - If you are a new contributor, search for issues labeled good-first-issue.
- good second issues - Once you've got that first PR approved and you are looking for something a little more challenging, we are keeping a list of next-level issues. Search for the good-second-issue label.
- papercuts - A collection of smaller-sized issues that may be a bit more advanced than a first or second issue.
- important, but not urgent - For more advanced contributors, we have a collection of issues that we consider important and would like to resolve sooner, but work isn't currently prioritized by the core team.
Sending Pull Requests
Patches should be submitted as pull requests (PRs).
When submitting PRs, we expect external contributors to push patches to a fork of application-services. For more information about submitting PRs from forks, read GitHub's guide.
Before submitting a PR:
- Your patch should include new tests that cover your changes, or be accompanied by an explanation of why it doesn't need any. It is your and your reviewer's responsibility to ensure your patch includes adequate tests.
- Consult the testing guide for some tips on writing effective tests.
- Your code should pass all the automated tests before you submit your PR for review.
- Before pushing your changes, run ./automation/tests.py changes. The script will calculate which components were changed and run test suites, linters and formatters against those components. Because the script runs a limited set of tests, it should execute in a fairly reasonable amount of time.
  - If you have modified any Swift code, also run swiftformat --swiftversion 5 on the modified code.
- Your patch should include a changelog entry in CHANGELOG.md or an explanation of why it does not need one. Any breaking changes to Swift or Kotlin binding APIs should be noted explicitly.
- If your patch adds new dependencies, they must follow our dependency management guidelines. Please include a summary of the due diligence applied in selecting new dependencies.
- After you open a PR, our Continuous Integration system will run a full test suite. This step may surface errors that the script did not catch, so make sure to check the results.
- "Work in progress" pull requests are welcome, but should be clearly labeled as such and should not be merged until all tests pass and the code has been reviewed.
- You can label a pull request as "Work in progress" by marking it as a draft in the GitHub PR UI (learn more about draft PRs).
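The list above does not say how ./automation/tests.py changes decides which components were changed. Purely as a hypothetical illustration (not the script's actual logic), the Python sketch below maps changed file paths, such as the output of git diff --name-only, to component names. It assumes repository-relative POSIX paths and that each component lives under a top-level components/<name>/ directory; the function name is made up for this example.

```python
from pathlib import PurePosixPath


def changed_components(changed_files):
    """Return the sorted names of components touched by `changed_files`.

    Hypothetical helper: assumes repository-relative POSIX paths and that
    every component lives under a top-level `components/<name>/` directory.
    """
    names = set()
    for path in changed_files:
        parts = PurePosixPath(path).parts
        # Only paths of the form components/<name>/<something> name a component.
        if len(parts) >= 3 and parts[0] == "components":
            names.add(parts[1])
    return sorted(names)


if __name__ == "__main__":
    print(changed_components([
        "components/places/src/db.rs",
        "components/logins/Cargo.toml",
        "docs/building.md",
    ]))
```

Files outside components/ (docs, automation) are ignored here; the real script may treat them differently.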
When submitting a PR:
- You agree to license your code under the project's open source license (MPL 2.0).
- Base your branch off the current main branch.
- Add both your code and new tests if relevant.
- Please do not include merge commits in pull requests; include only commits with the new relevant code.
- We encourage you to GPG sign your commits.
Code Review
This project is production Mozilla code and subject to our engineering practices and quality standards. Every patch must be peer reviewed by a member of the Application Services team.
Building Application Services
When working on Application Services, it's important to set up your environment for building the Rust code and the Android or iOS code needed by the application.
First time builds
Building for the first time is more complicated than a typical Rust project. To build for an end-to-end experience that enables you to test changes in client applications like Firefox for Android (Fenix) and Firefox iOS, there are a number of build systems required for all the dependencies. The initial setup is likely to take a number of hours to complete.
Building the Rust Components
Complete this section before moving on to the Android/iOS build instructions.
- Make sure you cloned the repository:
$ git clone https://github.com/mozilla/application-services # (or use the ssh link)
$ cd application-services
$ git submodule update --init --recursive
- Install Rust: install via rustup
- Install your system dependencies:
Linux
- Install the system dependencies required for building NSS:
  - Install gyp (required for NSS):
    apt install gyp
  - Install ninja-build:
    apt install ninja-build
  - Install python3 (at least 3.6):
    apt install python3
  - Install zlib:
    apt install zlib1g-dev
  - Install perl (needed to build openssl):
    apt install perl
  - Install patch (to build the libs):
    apt install patch
- Install the system dependencies required for SQLcipher:
  - Install tcl (required for SQLcipher):
    apt install tclsh
- Install the system dependencies required for bindgen:
  - Install libclang:
    apt install libclang-dev
MacOS
- Install Xcode: check the CI config for the correct version.
- Install Xcode tools:
  xcode-select --install
- Install homebrew via its installation instructions (it's what we use for CI).
- Install the system dependencies required for building NSS:
  - Install ninja and python:
    brew install ninja python
  - Make sure which python3 maps to the freshly installed homebrew python.
    - If it isn't, add the following to your bash/zsh profile and source the profile before continuing:
      alias python3=$(brew --prefix)/bin/python3
    - Ensure python maps to the same Python version. You may have to create a symlink:
      PYPATH=$(which python3); ln -s $PYPATH `dirname $PYPATH`/python
  - Install gyp:
    wget https://bootstrap.pypa.io/ez_setup.py -O - | python3 -
    git clone https://chromium.googlesource.com/external/gyp.git ~/tools/gyp
    cd ~/tools/gyp
    python3 setup.py install
    - Add ~/tools/gyp to your path (use $HOME rather than ~, since a tilde inside double quotes is not expanded):
      export PATH="$HOME/tools/gyp:$PATH"
    - If you have additional questions, consult this guide.
  - Make sure your homebrew python's bin folder is on your path by updating your bash/zsh profile with the following:
    export PATH="$PATH:$(brew --prefix)/opt/python@3.9/Frameworks/Python.framework/Versions/3.9/bin"
Windows
Install windows build tools
Why Windows Subsystem for Linux (WSL)?
It's currently tricky to get some of these builds working on Windows, primarily due to our use of SQLcipher. By using WSL it is possible to get builds working, but still have them published to your "native" local maven cache so it's available for use by a "native" Android Studio.
- Install WSL (recommended over native tooling)
- Install unzip:
  sudo apt install unzip
- Install python3 (note: must be Python 3.6 or later):
  sudo apt install python3
- Install system build tools:
  sudo apt install build-essential
- Install zlib:
  sudo apt-get install zlib1g-dev
- Install tcl:
  sudo apt install tcl-dev
- Check dependencies and environment variables by running:
  ./libs/verify-desktop-environment.sh
  Note that this script might instruct you to set some environment variables. Set those by adding them to your .zshrc or .bashrc so they are set by default in your terminal. If the script does instruct you to do this, you must run it again after setting them so the libraries are built.
- Run cargo test:
  cargo test
Once you have successfully run ./libs/verify-desktop-environment.sh and cargo test, you can move on to the Building for Fenix and Building for iOS sections below to set up your local environment for testing with our client applications.
Building for Fenix
The following instructions assume that you are building application-services for Fenix, and want to take advantage of the Fenix Auto-publication workflow for android-components and application-services.
- Install Android SDK, Java, NDK and set required env vars
  - Clone the firefox-android repository (not inside the Application Services repository).
  - Install Java 17 for your system
  - Set JAVA_HOME to point to the JDK 17 installation directory.
  - Download and install Android Studio.
  - Set ANDROID_SDK_ROOT and ANDROID_HOME to the Android Studio SDK location and add them to your rc file (either .zshrc or .bashrc depending on the shell you use for your terminal).
  - Configure the required versions of NDK:
    Configure menu > System Settings > Android SDK > SDK Tools > NDK > Show Package Details > NDK (Side by side)
    - 26.2.11394342 (required by Fenix; note: a specific NDK version isn't configured, so this maps to the default NDK version for the AGP version)
    - 27.0.12077973 (required by Application Services, as configured)
  - If you are on Windows using WSL, skip to the Windows setup for Android (via WSL) section below before proceeding.
- Check dependencies and environment variables:
  - Run ./libs/verify-android-environment.sh
  - Follow the instructions and rerun until it is successful.
Windows setup for Android (via WSL)
Note: For non-Ubuntu Linux versions, it may be necessary to execute $ANDROID_HOME/tools/bin/sdkmanager "build-tools;26.0.2" "platform-tools" "platforms;android-26" "tools". See also this gist for additional information.
Configure Maven
Configure maven to use the native Windows maven repository; then, when you run ./gradlew install from WSL, the artifacts end up in the Windows maven repo. This means we can do a number of things with Android Studio in "native" Windows and have them work correctly with what we built in WSL.
- Install maven:
  sudo apt install maven
- Confirm existence of (or create) a ~/.m2 folder
- In the ~/.m2 folder, create a file called settings.xml
- Add the content below, replacing {username} with your username:
<settings>
<localRepository>/mnt/c/Users/{username}/.m2/repository</localRepository>
</settings>
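If you set this up on more than one machine, a small script can create the folder and file for you. The Python sketch below is a hypothetical helper (not part of application-services) that writes equivalent content, assuming the /mnt/c WSL mount point shown above. It refuses to overwrite an existing settings.xml so any local customizations are not lost silently.

```python
from pathlib import Path

# Equivalent to the settings.xml shown above; {username} is filled in below.
SETTINGS_TEMPLATE = """<settings>
  <localRepository>/mnt/c/Users/{username}/.m2/repository</localRepository>
</settings>
"""


def render_settings(username):
    """Return settings.xml content pointing Maven at the Windows-side repository."""
    return SETTINGS_TEMPLATE.format(username=username)


def write_settings(username, home=None):
    """Create ~/.m2 if needed and write ~/.m2/settings.xml (never overwrites)."""
    base = Path(home) if home is not None else Path.home()
    m2_dir = base / ".m2"
    m2_dir.mkdir(parents=True, exist_ok=True)
    settings = m2_dir / "settings.xml"
    if settings.exists():
        raise FileExistsError(f"{settings} already exists; edit it by hand")
    settings.write_text(render_settings(username))
    return settings
```

For example, write_settings("alice") run inside WSL would create ~/.m2/settings.xml pointing at /mnt/c/Users/alice/.m2/repository.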
Building for Firefox iOS
- Install xcpretty:
gem install xcpretty
- Run ./libs/verify-ios-environment.sh to check your setup and environment variables.
- Make any corrections recommended by the script and re-run.
- Next, run ./megazords/ios-rust/build-xcframework.sh to build all the binaries needed to consume application-services in iOS.
Once the script passes, you should be able to run the Xcode project.
Note: The built Xcode project is located at megazords/ios-rust/MozillaTestServices.xcodeproj.
Note: This is mainly for testing the Rust components; the artifact generated in the steps above should be all you need for building an application with application-services.
Locally building Firefox iOS against a local Application Services
Detailed steps to build Firefox iOS against a local application-services can be found in this document.
Using locally published components in Fenix
It's often important to test work-in-progress changes to Application Services components against a real-world consumer project. The most reliable method of performing such testing is to publish your components to a local Maven repository, and adjust the consuming project to install them from there.
With support from the upstream project, it's possible to do this in a single step using our auto-publishing workflow.
rust.targets
Both the auto-publishing and manual workflows can be sped up significantly by using the rust.targets property, which limits which architectures the Rust code gets built against. You can set this property by creating/editing the local.properties file in the repository root and adding a line like rust.targets=x86,linux-x86-64. The trick is knowing which targets to put in that comma-separated list:
- Use x86 for running the app on most emulators (in rare cases, when you have a 64-bit emulator, you'll want x86_64)
- If you're running the android-components or fenix unit tests, then you'll need the architecture of your machine:
  - OSX running Intel chips: darwin-x86-64
  - OSX running M1 chips: darwin-aarch64
  - Linux: linux-x86-64
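The rules above fit in a few lines of code. The following Python sketch uses hypothetical helper names and is not part of the build; it picks the host target from platform.system() and platform.machine(), assuming an Intel Mac reports ("Darwin", "x86_64"), an M1 Mac reports ("Darwin", "arm64") and x86-64 Linux reports ("Linux", "x86_64"), then formats the local.properties line.

```python
import platform

# Host targets from the list above, keyed by (platform.system(), platform.machine()).
HOST_TARGETS = {
    ("Darwin", "x86_64"): "darwin-x86-64",  # OSX on Intel chips
    ("Darwin", "arm64"): "darwin-aarch64",  # OSX on M1 chips
    ("Linux", "x86_64"): "linux-x86-64",
}


def host_target(system=None, machine=None):
    """Return the rust.targets entry for the current (or given) machine."""
    key = (system or platform.system(), machine or platform.machine())
    if key not in HOST_TARGETS:
        raise ValueError(f"no known rust.targets entry for {key[0]}/{key[1]}")
    return HOST_TARGETS[key]


def rust_targets_line(emulator_64bit=False, unit_tests=True, system=None, machine=None):
    """Build a local.properties line: the emulator target plus, optionally, the host."""
    targets = ["x86_64" if emulator_64bit else "x86"]
    if unit_tests:
        targets.append(host_target(system, machine))
    return "rust.targets=" + ",".join(targets)
```

On an x86-64 Linux machine, rust_targets_line() yields the same line as the example above, rust.targets=x86,linux-x86-64. Unknown hosts raise an error instead of guessing a target.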
Using the auto-publishing workflow
mozilla-central has support for automatically publishing and including a local development version of application-services in the build. This is supported for most of the Android targets available in mozilla-central including Fenix - this doc will focus on Fenix, but the same general process is used for all. The workflow is:
- Ensure you have a regular build of Fenix working from mozilla-central and that you've done a ./mach build
- Ensure you have a regular build of application-services working.
- Edit (or create) the file local.properties (this can be in the root of the mozilla-central checkout, or in the project-specific directory, e.g. mobile/android/fenix) and tell it where to find your local checkout of application-services by adding a line like:
  autoPublish.application-services.dir=path/to/your/checkout/of/application-services
  Note that the path can be absolute or relative to local.properties. For example, if application-services and mozilla-central are at the same level, and you are using a local.properties in the root of mozilla-central, the relative path would be ../application-services
- Build your target normally, e.g. in Android Studio or using gradle.
If all goes well, this should automatically build your checkout of application-services, publish it to a local maven repository, and configure the consuming project to install it from there instead of from our published releases.
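To double-check a relative path, you can compute it rather than counting ../ segments by hand. This Python sketch is a hypothetical helper, assuming POSIX-style paths as used on macOS, Linux and WSL; it resolves the path relative to the directory that contains local.properties.

```python
import os


def auto_publish_line(local_properties, app_services_dir):
    """Return the autoPublish line for the given local.properties file.

    The path is made relative to the directory containing local.properties.
    """
    base = os.path.dirname(os.path.abspath(local_properties))
    rel = os.path.relpath(os.path.abspath(app_services_dir), start=base)
    return f"autoPublish.application-services.dir={rel}"


if __name__ == "__main__":
    print(auto_publish_line(
        "/src/mozilla-central/local.properties",
        "/src/application-services",
    ))
```

With the sibling layout from the example above, this prints autoPublish.application-services.dir=../application-services. A local.properties in mobile/android/fenix needs four ../ segments instead.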
Using Windows/WSL
Good luck! This implies you are also building mozilla-central in a Windows/WSL environment; please contribute docs if you got this working.
However, there's an excellent chance that you will need to execute ./automation/publish_to_maven_local_if_modified.py from your local application-services root.
Caveats
- This assumes you are able to build both Fenix and application-services directly before following any of these instructions.
- Make sure you're fully up to date in all repos, unless you know you need to not be.
- Contact us if you get stuck.
How to locally test Swift Package Manager components on Firefox iOS
This is a guide on testing the Swift Package Manager component locally against a local build of Firefox iOS. For more information on our Swift Package Manager design, read the ADR that introduced it.
This guide assumes the component you want to test is already distributed with the rust-components-swift repository; read the guide for adding a new component if you would like to distribute a new component.
The goal for this document is to be able to build a local Firefox iOS against a local application-services. At a high level, that requires the following:
- Build an xcframework in a local checkout of application-services
- Include the xcframework in a local checkout of rust-components-swift
- Run the generate script in rust-components-swift using a local checkout of application-services
- Include the local checkout of rust-components-swift in firefox-ios
Prerequisites:
- A local checkout of firefox-ios that is ready to build
- A local checkout of rust-components-swift
- A local checkout of application-services that is ready to build for iOS
Using the automated flow
For convenience, there is a script that will do all the necessary steps to configure your local firefox-ios build with a local application-services repository. If you use the script, you do not need to follow the manual steps.
- Run the following to execute the script. The example below assumes that firefox-ios, rust-components-swift and application-services are all in the same directory; adjust the paths according to where they are on your filesystem.
  $ cd firefox-ios # This is your local checkout of firefox-ios
  $ ./rust_components_local.sh -a ../application-services ../rust-components-swift
- Using Xcode, open Client.xcodeproj in firefox-ios
- Then, make sure to reset the package caches in Xcode. This forces Xcode to remove any previously cached versions of the Rust components.
  - You can reset package caches by going to File -> Packages -> Reset Package Caches
- If this is not the first time you run the script, make sure to also update package versions. This forces Xcode to pull the latest changes in the rust-components-swift branch.
  - You can update package versions by going to File -> Packages -> Update To Latest Package Versions
  - If this step fails, it's possible that the Reset Package Caches step above left some cruft behind. You can force this step by manually removing ~/Library/Caches/org.swift.swiftpm and ~/Library/Developer/Xcode/DerivedData/Client-{some-long-string}
- Once the above steps are done, attempt building Firefox iOS. If you face problems, feel free to contact us.
Disabling local development
The easiest way to disable local development is to simply revert any changes to firefox-ios/Client.xcodeproj/project.pbxproj.
However, if there are other changes to the file that you would like to preserve, you can use the same script. To use the same script, you will need to:
- Know what version of rust-components-swift was used beforehand. You can find this by checking the git diff on firefox-ios/Client.xcodeproj/project.pbxproj.
- Run:
  $ ./rust_components_local.sh --disable <VERSION> ../rust-components-swift
- Then, make sure to reset the package caches in Xcode. This forces Xcode to remove any previously cached versions of the Rust components.
  - You can reset package caches by going to File -> Packages -> Reset Package Caches
If you happen to change branches in rust-components-swift, you will need to disable and then re-enable local development. The script is not currently smart enough to switch branches. Alternatively, keep the branch in rust-components-swift the same. rust-components-swift serves only as a release surface, so there is little use in switching branches and pushing changes to it, unless you are changing something related to the release process.
Using the manual flow
Note that the automated flow runs through all the necessary steps in a script, so use the script if possible; the manual process is tedious.
However, if the script is failing or you would like to run the manual process for any other reason, use the following steps.
Building the xcframework
To build the xcframework do the following:
- In your local checkout of application-services, navigate to megazords/ios-rust/
- Run the build-xcframework.sh script:
$ ./build-xcframework.sh
This will produce a file named MozillaRustComponents.xcframework.zip that contains the following, built for all our target iOS platforms:
- The compiled Rust code for all the crates listed in Cargo.toml as a static library
- The C header files and Swift module maps for the components
Include the xcframework in a local checkout of rust-components-swift
After you generate the MozillaRustComponents.xcframework.zip in the previous step, do the following to include it in a local checkout of rust-components-swift. The file will be in the megazords/ios-rust directory.
- Unzip the MozillaRustComponents.xcframework.zip into the rust-components-swift repository (assuming you are in the root of the rust-components-swift directory and application-services is a neighbor directory):
  unzip -o ../application-services/megazords/ios-rust/MozillaRustComponents.xcframework.zip -d .
- Change the Package.swift's reference to the xcframework to point to the unzipped MozillaRustComponents.xcframework that was created in the previous step. You can do this by uncommenting the following line:
  path: "./MozillaRustComponents.xcframework"
  and commenting out the following lines:
  url: url,
  checksum: checksum,
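If you repeat this edit often, it can be scripted. The Python sketch below is a hypothetical helper, not part of rust-components-swift. It assumes the local line appears commented out as //path: "./<Name>.xcframework" and the remote lines appear exactly as url: <var>, and checksum: <var> (with or without a trailing comma), so check the result against your actual Package.swift before committing. Passing FocusRustComponents, focusUrl and focusChecksum would cover the Focus variant.

```python
import re


def use_local_xcframework(package_swift, framework="MozillaRustComponents",
                          url_var="url", checksum_var="checksum"):
    """Uncomment the local `path:` line and comment out the `url:`/`checksum:` lines."""
    local_path = re.compile(
        r'^(\s*)//\s*(path:\s*"\./' + re.escape(framework) + r'\.xcframework",?)\s*$'
    )
    remote = re.compile(
        rf"^(\s*)(url:\s*{re.escape(url_var)},?|checksum:\s*{re.escape(checksum_var)},?)\s*$"
    )
    out = []
    for line in package_swift.splitlines():
        m = local_path.match(line)
        if m:
            out.append(m.group(1) + m.group(2))  # drop the leading //
            continue
        m = remote.match(line)
        if m:
            out.append(m.group(1) + "// " + m.group(2))  # comment the line out
            continue
        out.append(line)
    return "\n".join(out) + "\n"
```

Running it twice is harmless: once the path line is uncommented and the remote lines are commented out, neither pattern matches again.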
Run the generation script with a local checkout of application services
For this step, run the following script from inside the rust-components-swift repository (assuming that application-services is a neighboring directory to rust-components-swift).
./generate.sh ../application-services
Once that is done, stage and commit the changes the script made. Xcode can only pick up committed changes.
Include the local checkout of rust-components-swift in firefox-ios
This is the final step to include your local changes into firefox-ios. Do the following steps:
- Open Client.xcodeproj in Xcode
- Navigate to the Swift Packages in Xcode:
- Remove the dependency on rust-components-swift as listed in Xcode: click the dependency, then click the -
- Add a new Swift package by clicking the +:
  - On the top right, enter the full path to your rust-components-swift checkout, preceded by file://. If you don't know what that is, run pwd while in rust-components-swift. For example: file:///Users/tarikeshaq/code/rust-components-swift
  - Change the branch to be the checked-out branch of rust-components-swift you have locally. This is what the dialog should look like:
    Note: If Xcode prevents you from adding the dependency to reference a local package, you will need to manually modify the Client.xcodeproj/project.pbxproj and replace every occurrence of https://github.com/mozilla/rust-components-swift with the full path to your local checkout.
  - Click Add Package
  - Now include the packages you would like to include; choose MozillaAppServices
- Finally, attempt to build firefox-ios, and if all goes well it should launch with your code. If you face problems, feel free to contact us.
How to locally test Swift Package Manager components on Focus iOS
This is a guide on testing the Swift Package Manager component locally against a local build of Focus iOS. For more information on our Swift Package Manager design, read the ADR that introduced it.
This guide assumes the component you want to test is already distributed with the rust-components-swift repository; read the guide for adding a new component if you would like to distribute a new component.
To test a component locally, you will need to do the following:
- Build an xcframework in a local checkout of application-services
- Include the xcframework in a local checkout of rust-components-swift
- Run the make-tag script in rust-components-swift using a local checkout of application-services
- Include the local checkout of rust-components-swift in Focus
Below are more detailed instructions for each step.
Building the xcframework
To build the xcframework do the following:
- In a local checkout of application-services, navigate to megazords/ios-rust/
- Run the build-xcframework.sh script:
$ ./build-xcframework.sh --focus
This will produce a file named FocusRustComponents.xcframework.zip in the focus directory that contains the following, built for all our target iOS platforms:
- The compiled Rust code for all the crates listed in Cargo.toml as a static library
- The C header files and Swift module maps for the components
Include the xcframework in a local checkout of rust-components-swift
After you generate the FocusRustComponents.xcframework.zip in the previous step, do the following to include it in a local checkout of rust-components-swift:
- Clone a local checkout of rust-components-swift, not inside the application-services repository:
  git clone https://github.com/mozilla/rust-components-swift.git
- Unzip the FocusRustComponents.xcframework.zip into the rust-components-swift repository (assuming you are in the root of the rust-components-swift directory and application-services is a neighbor directory):
  unzip -o ../application-services/megazords/ios-rust/focus/FocusRustComponents.xcframework.zip -d .
- Change the Package.swift's reference to the xcframework to point to the unzipped FocusRustComponents.xcframework that was created in the previous step. You can do this by uncommenting the following line:
  path: "./FocusRustComponents.xcframework"
  and commenting out the following lines:
  url: focusUrl,
  checksum: focusChecksum,
Run the generation script with a local checkout of application services
For this step, run the following script from inside the rust-components-swift repository (assuming that application-services is a neighboring directory to rust-components-swift).
./generate.sh ../application-services
Once that is done, stage and commit the changes the script made. Xcode can only pick up committed changes.
Include the local checkout of rust-components-swift in Focus
This is the final step to include your local changes into Focus. Do the following steps:
- Clone a local checkout of Focus if you haven't already. Make sure you also install the project dependencies; more information is in their build instructions.
- Open Blockzilla.xcodeproj in Xcode
- Navigate to the Swift Packages in Xcode:
- Remove the dependency on rust-components-swift as listed in Xcode: click the dependency, then click the -
- Add a new Swift package by clicking the +:
  - On the top right, enter the full path to your rust-components-swift checkout, preceded by file://. If you don't know what that is, run pwd while in rust-components-swift. For example: file:///Users/tarikeshaq/code/rust-components-swift
  - Change the branch to be the checked-out branch of rust-components-swift you have locally. This is what the dialog should look like:
  - Click Add Package
  - Now include the FocusAppServices library.
  Note: If Xcode prevents you from adding the dependency to reference a local package, you will need to manually modify the Blockzilla.xcodeproj/project.pbxproj and replace every occurrence of https://github.com/mozilla/rust-components-swift with the full path to your local checkout.
- Finally, attempt to build Focus, and if all goes well it should launch with your code. If you face any problems, feel free to contact us.
Building and using a locally-modified version of JNA
Java Native Access is an important dependency for the Application Services components on Android, as it provides the low-level interface from the JVM into the natively-compiled Rust code.
If you need to work with a locally-modified version of JNA (e.g. to investigate an apparent JNA bug) then you may find these notes helpful.
The JNA docs do have an Android Development Environment guide that is a good starting point, but the instructions did not work for me and appear a little out of date. Here are the steps that worked for me:
- Modify your environment to specify $NDK_PLATFORM, and to ensure the Android NDK tools for each target platform are in your $PATH. On my Mac with Android Studio the config was as follows:
  export NDK_ROOT="$HOME/Library/Android/sdk/ndk/27.0.12077973"
  export NDK_PLATFORM="$NDK_ROOT/platforms/android-25"
  export PATH="$PATH:$NDK_ROOT/toolchains/llvm/prebuilt/darwin-x86_64/bin"
  export PATH="$PATH:$NDK_ROOT/toolchains/aarch64-linux-android-4.9/prebuilt/darwin-x86_64/bin"
  export PATH="$PATH:$NDK_ROOT/toolchains/arm-linux-androideabi-4.9/prebuilt/darwin-x86_64/bin"
  export PATH="$PATH:$NDK_ROOT/toolchains/x86-4.9/prebuilt/darwin-x86_64/bin"
  export PATH="$PATH:$NDK_ROOT/toolchains/x86_64-4.9/prebuilt/darwin-x86_64/bin"
  You will probably need to tweak the paths and version numbers based on your operating system and the details of how you installed the Android NDK.
- Install the ant build tool (using brew install ant worked for me).
- Check out the JNA source from GitHub. Try doing a basic build via ant dist and ant test. This won't build for Android but will test the rest of the tooling.
- Adjust ./native/Makefile for compatibility with your Android NDK install. Here's what I had to do for mine:
  - Adjust the $CC variable to use clang instead of gcc: CC=aarch64-linux-android21-clang.
  - Adjust the $CPP variable to use the version from your system: CPP=cpp.
  - Add -landroid -llog to the list of libraries to link against in $LIBS.
- Build the JNA native libraries for the target platforms of interest:
  ant -Dos.prefix=android-aarch64
  ant -Dos.prefix=android-armv7
  ant -Dos.prefix=android-x86
  ant -Dos.prefix=android-x86-64
- Package the newly-built native libraries into a JAR/AAR using ant dist. This should produce ./dist/jna.aar.
- Configure build.gradle for the consuming application to use the locally-built JNA artifact:
  // Tell gradle where to look for local artifacts.
  repositories {
      flatDir {
          dirs "/PATH/TO/YOUR/CHECKOUT/OF/jna/dist"
      }
  }
  // Tell gradle to exclude the published version of JNA.
  configurations {
      implementation {
          exclude group: "net.java.dev.jna", module: "jna"
      }
  }
  // Take a direct dependency on the local JNA AAR.
  dependencies {
      implementation name: "jna", ext: "aar"
  }
- Rebuild and run your consuming application, and it should be using the locally-built JNA!
If you're trying to debug some unexpected JNA behaviour (and if you favour old-school printf-style debugging) then you can use this code snippet to print to the Android log from the compiled native code:
#ifdef __ANDROID__
#include <android/log.h>
#define HACKY_ANDROID_LOG(...) __android_log_print(ANDROID_LOG_VERBOSE, "HACKY-DEBUGGING-FOR-ALL", __VA_ARGS__)
#else
#define HACKY_ANDROID_LOG(...) /* variadic no-op so printf-style calls also compile off-Android */
#endif
HACKY_ANDROID_LOG("this will go to the android logcat output");
HACKY_ANDROID_LOG("it accepts printf-style format sequences, like this: %d", 42);
Branch builds
Branch builds are a way to build and test Fenix using branches from application-services and firefox-android.
iOS is not currently supported, although we may add it in the future (see #4966).
Breaking changes in an application-services branch.
When we make breaking changes in an application-services branch, we typically make corresponding changes in an android-components branch. Branch builds allow combining those branches together in order to run CI tests and to produce APKs for manual testing. To trigger a branch build for this:
- Create the PR for the application-services branch you're working on
- Add [firefox-android: branch-name] to the PR title
- The branch build tasks will be listed as checks on the GitHub PR. In particular:
  - branch-build-fenix-test and branch-build-ac-test will run the android-components/Fenix unit tests
  - branch-build-fenix-build will contain the Fenix APK.
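The marker is ordinary text in the PR title. Purely to illustrate its format (this is not the CI's actual parsing code), the following Python sketch extracts the branch name, assuming the name contains no whitespace or closing bracket; the function name is hypothetical.

```python
import re

# Matches a marker like "[firefox-android: my-branch]" anywhere in a PR title.
BRANCH_MARKER = re.compile(r"\[firefox-android:\s*([^\]\s]+)\s*\]")


def firefox_android_branch(pr_title):
    """Return the firefox-android branch named in the title, or None if absent."""
    match = BRANCH_MARKER.search(pr_title)
    return match.group(1) if match else None
```

Branch names containing slashes, such as user/feature, are accepted, since only whitespace and ] end the name.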
Application-services nightlies
When we make non-breaking changes, we typically merge them into main and let them sit there until the next release. In order to check that the current main really does only have non-breaking changes, we run a nightly branch build from the main branch of application-services.
- To view the latest branch builds:
  - Open the latest decision task from the task index.
  - Click the "View Task" link
  - Click "Task Group" in the top-left
  - You should now see a list of tasks from the latest nightly
- *-build tasks are for building the application. A failure here indicates there's probably a breaking change that needs to be resolved.
  - To get the APK, navigate to branch-build-fenix-build and download app-x86-debug.apk from the artifacts list
- branch-build-ac-test.* are the android-components test tasks. These are split up by gradle project, which matches how the android-components CI handles things. Running all the tests together often leads to failures.
- branch-build-fenix-test is the Fenix tests. These are not split up per-project.
- These builds are triggered by our .cron.yml file.
Guide to Testing a Rust Component
This document gives a high-level overview of how we test components in application-services. It will be useful to you if you're adding a new component, or working on increasing the test coverage of an existing component.
If you are only interested in running the existing test suite, please consult the contributor docs and the tests.py script.
Unit and Functional Tests
Rust code
Since the core implementations of our components live in Rust, so does the core of our testing strategy.
Each Rust component should be accompanied by a suite of unit tests, following the guidelines for writing tests from the Rust Book. Some additional tips:
- Where possible, it's better to use the Rust type system to make bugs impossible than to write tests to assert that they don't occur in practice. But given that the ultimate consumers of our code are not in Rust, that's sometimes not possible. The best idiomatic Rust API for a feature is not necessarily the best API for consuming it over an FFI boundary.
- Rust's builtin assertion macros are sparse; we use the more_asserts crate for some additional helpers.
- Rust's strict typing can make test mocks difficult. If there's something you need to mock out in tests, make it a trait and use the mockiato crate to mock it.
The Rust tests for a component should be runnable via cargo test.
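A minimal sketch of the trait-based approach, using a hypothetical Clock trait and is_expired function (neither exists in this repository). The mock is hand-written here so the example needs only the standard library; a crate such as mockiato can generate mocks from the trait instead. The tests sit in a #[cfg(test)] module, following the Rust Book's convention, and run with cargo test.

```rust
/// Hypothetical trait wrapping something awkward to control in tests
/// (here, the current time), so tests can substitute a fake.
pub trait Clock {
    fn now_millis(&self) -> u64;
}

/// Hypothetical production code: it depends on the trait, not a concrete clock.
pub fn is_expired<C: Clock>(clock: &C, expires_at_millis: u64) -> bool {
    clock.now_millis() >= expires_at_millis
}

#[cfg(test)]
mod tests {
    use super::*;

    // Hand-written mock that always reports the same time.
    struct FixedClock(u64);

    impl Clock for FixedClock {
        fn now_millis(&self) -> u64 {
            self.0
        }
    }

    #[test]
    fn test_expiry_boundary() {
        assert!(is_expired(&FixedClock(1_000), 1_000));
        assert!(!is_expired(&FixedClock(999), 1_000));
    }
}
```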
FFI Layer code
We are currently using uniffi to generate most (and soon all!) of our FFI code, and thus the FFI code itself does not need to be extensively tested.
Kotlin code
The Kotlin wrapper code for a component should have its own test suite, which should follow the general guidelines for testing Android code in Mozilla projects. In practice that means we use JUnit as the test framework and Robolectric to provide implementations of Android-specific APIs.
The Kotlin tests for a component should be runnable via ./gradlew <component>:test.
The tests at this layer are designed to ensure that the API binding code is working as intended, and should not repeat tests for functionality that is already well tested at the Rust level. But given that the Kotlin bindings involve a non-trivial amount of hand-written boilerplate code, it's important to exercise that code thoroughly.
One complication with running Kotlin tests is that the code needs to run on your local development machine, but the Kotlin code's native dependencies are typically compiled and packaged for Android devices. The tests need to ensure that an appropriate version of JNA and of the compiled Rust code is available in their library search path at runtime. Our build.gradle files contain a collection of hackery that ensures this, which should be copied into any new components.
The majority of our Kotlin bindings are autogenerated using uniffi and do not need extensive testing.
Swift code
The Swift wrapper code for a component should have its own test suite, using Apple's XCTest framework.
Due to the way that all Rust components need to be compiled together into a single "megazord" framework, this entire repository is a single Xcode project. The Swift tests for each component thus need to live under megazords/ios-rust/MozillaTestServicesTests/ rather than in the directory for the corresponding component. (XXX TODO: is this true? It would be nice to find a way to avoid having them live separately because it makes them easy to overlook.)
The tests at this layer are designed to ensure that the API binding code is working as intended, and should not repeat tests for functionality that is already well tested at the Rust level. But given that the Swift bindings involve a non-trivial amount of hand-written boilerplate code, it's important to exercise that code thoroughly.
The majority of our Swift bindings are autogenerated using uniffi and do not need extensive testing.
Integration tests
End-to-end Sync Tests
⚠️ These tests were disabled because of how flaky the stage server was. See #3909 ⚠️
The testing/sync-test directory contains a test harness for running sync-related Rust components against a live Firefox Sync infrastructure, so that we can verify the functionality end-to-end.
Each component that implements a sync engine should have a corresponding suite of tests in this directory.
- XXX TODO: places doesn't.
- XXX TODO: send-tab doesn't (not technically a sync engine, but still, it's related)
- XXX TODO: sync-manager doesn't
Android Components Test Suite
It's important that changes in application-services are tested against upstream consumer code in the android-components repo. This is currently a manual process involving:
- Configuring your local checkout of android-components to use your local application-services build.
- Running the android-components test suite via ./gradlew test.
- Manually building and running the android-components sample apps to verify that they're still working.
Ideally some or all of this would be automated and run in CI, but we have not yet invested in such automation.
Test Coverage
We currently have code coverage reporting on GitHub using codecov. However, our code coverage does not tell us how much additional coverage our consumers' tests provide.
Ideas for Improvement
- ASan, MSan, and maybe other sanitizer checks, especially around the points where we cross FFI boundaries.
- General-purpose fuzzing, such as via https://github.com/jakubadamw/arbitrary-model-tests
- We could consider making a mocking backend for viaduct, which would also be mockable from Kotlin/Swift.
- Add more end-to-end integration tests!
- Live device tests, e.g. actual Fenixes running in an emulator and syncing to each other.
- Run consumer integration tests in CI against main.
Smoke testing Application Services against end-user apps
This is a great way of finding integration bugs with application-services. The testing can be done manually using substitution scripts, but we also have scripts that will do the smoke-testing for you.
Dependencies
Run pip3 install -r automation/requirements.txt to install the required Python packages.
Android Components
The automation/smoke-test-android-components.py script will clone (or use a local version of) android-components and run a subset of its tests against the current application-services worktree. It tries to only run tests that might be relevant to application-services functionality.
Fenix
The automation/smoke-test-fenix.py script will clone (or use a local version of) Fenix and run tests against the current application-services worktree.
Firefox iOS
The automation/smoke-test-fxios.py script will clone (or use a local version of) Firefox iOS and run tests against the current application-services worktree.
Testing faster: How to avoid making compile times worse by adding tests
Background
We'd like to keep cargo test, cargo build, cargo check, ... reasonably fast, and we'd really like to keep them fast if you pass -p for a specific project. Unfortunately, there are a few ways this can become unexpectedly slow.
The easiest of these problems for us to combat at the moment is the unfortunate placement of dev-dependencies in our build graph.
If you perform a cargo test -p foo, all dev-dependencies of foo must be compiled before foo's tests can start. This includes dependencies only used by non-test targets, such as examples or benchmarks.
In an ideal world, cargo could run your tests as soon as it finished with the dependencies it needs for those tests, instead of waiting for your benchmark suite, or the arg-parser your examples use, etc.
Unfortunately, all cargo knows is that these are dev-dependencies, and not which targets actually use them.
Additionally, unqualified invocations of cargo (that is, without -p) might have an even worse time if we aren't careful. If I run cargo test, cargo knows every crate in the workspace needs to be built with all dev-dependencies. If places depends on fxa-client, all of fxa-client's dev-dependencies must be compiled, ready, and linked in at least to the lib target before we can even think about starting on places.
We have not been careful about what shape the dependency graph ends up as when example code is taken into consideration (as it is by cargo during certain builds), and as a result, we have this problem. Which isn't really a problem we want to fix: Example code can and should depend on several different components, and use them together in interesting ways.
So, because we don't want to change what our examples do, or make major architectural changes of the non-test code for something like this, we need to do something else.
The Solution
To fix this, we manually insert "cuts" into the dependency graph to help cargo out. That is, we pull some of these build targets (e.g. examples, benchmarks, tests if they cause a substantial compile overhead) into their own dedicated crates so that:
- They can be built in parallel with each other.
- Crates depending on the component itself are not waiting on the test/bench/example build in order for their test build to begin.
- A potentially smaller set of our crates need to be rebuilt -- and a smaller set of possible configurations exist meaning fewer items to add pressure to caches.
- ...
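To make the effect of a "cut" concrete, here is a toy model, not Cargo's actual resolver. It assumes that the crates needed before cargo test -p <crate> can start are the crate's normal-dependency closure plus the closure of every one of its dev-dependencies, and it ignores features, build-dependencies and build scripts. The Manifest type and all crate names are hypothetical.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical, simplified view of one crate's manifest.
pub struct Manifest {
    pub deps: Vec<&'static str>,
    pub dev_deps: Vec<&'static str>,
}

/// Adds `name` and everything it transitively depends on (normal deps only).
/// Crates missing from `ws` are treated as leaves, e.g. external crates.
fn normal_closure(
    ws: &BTreeMap<&'static str, Manifest>,
    name: &'static str,
    out: &mut BTreeSet<&'static str>,
) {
    if !out.insert(name) {
        return; // already visited
    }
    if let Some(m) = ws.get(name) {
        for &dep in &m.deps {
            normal_closure(ws, dep, out);
        }
    }
}

/// In this model: everything that must be built before `name`'s tests can run.
/// Every dev-dependency is included, because Cargo can't tell which target
/// (test, bench, example) actually uses it.
pub fn needed_for_tests(
    ws: &BTreeMap<&'static str, Manifest>,
    name: &'static str,
) -> BTreeSet<&'static str> {
    let mut out = BTreeSet::new();
    normal_closure(ws, name, &mut out);
    if let Some(m) = ws.get(name) {
        for &dev in &m.dev_deps {
            normal_closure(ws, dev, &mut out);
        }
    }
    out
}
```

For example, if mycrate lists a benchmark harness and an argument parser as dev-dependencies, needed_for_tests pulls both in (plus their own dependencies) even for a plain unit-test run. Once they move into a separate mycrate-bench crate that depends on mycrate, they drop out of mycrate's set.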
Some rules of thumb for when / when not to do this:
- All Rust examples should be put in examples/*.
- All Rust benchmarks should be put in testing/separated/*. See the section below on how to set your benchmark up to avoid redundant compiles.
- Rust tests which bring in heavyweight dependencies should be evaluated on an ad-hoc basis. If you're concerned, measure how long compilation takes with/without, and consider how many crates depend on the crate where the test lives (e.g. a slow test in support/foo might be far worse than one in a leaf crate), etc.
Appendix: How to avoid redundant compiles for benchmarks and integration tests
To be clear, this is way more important for benchmarks (which always compile as release and have a costly link phase).
Say you have a directory structure like the following:
mycrate
├── src
│   └── lib.rs
│   ...
├── benches
│   ├── bench0.rs
│   ├── bench1.rs
│   └── bench2.rs
├── tests
│   ├── test0.rs
│   ├── test1.rs
│   └── test2.rs
└── ...
When you run your integration tests or benchmarks, each of test0, test1, test2 or bench0, bench1, bench2 is compiled as its own crate that runs the tests in question and exits.
That means 3 benchmark executables are built on release settings, and 3 integration test executables.
If you've ever tried to add a piece of shared utility code into your integration tests, only to have cargo (falsely) complain that it is dead code: this is why. Even if test0.rs and test2.rs both use the utility function, unless every test crate uses every shared utility, the crate that doesn't will complain.
(Aside: This turns out to be an unintentional secondary benefit of this approach -- easier shared code among tests, without having to put a #![allow(dead_code)] in your utils.rs. We haven't hit that very much here, since we tend to stick to unit tests, but it came up in mentat several times, and is a frequent complaint people have.)
Anyway, the solution here is simple: create a new crate. If you were working in components/mycrate and you want to add some integration tests or benchmarks, you should do cargo new --lib testing/separated/mycrate-test (or .../mycrate-bench).
Delete .../mycrate-test/src/lib.rs. Yep, really, we're making a crate that only has integration tests/benchmarks (see the "FAQ0" section at the bottom of the file if you're getting incredulous).
Now, add a src/tests.rs or a src/benches.rs. This file should contain mod foo; declarations for each submodule containing tests/benchmarks, if any.
For benches, this is also where you set up the benchmark harness (refer to benchmark library docs for how).
Now, for a test, add this to your Cargo.toml:
[[test]]
name = "mycrate-test"
path = "src/tests.rs"
and for a benchmark, add:
[[bench]]
name = "mycrate-benches"
path = "src/benches.rs"
harness = false
Because we aren't using src/lib.rs, this is what declares which file is the root of the test/benchmark crate. Because there's only one target (unlike with tests/* / benches/* under default settings), this will compile more quickly.
Additionally, src/tests.rs and src/benches.rs will behave like a normal crate, the only difference being that they don't produce a lib, and that they're triggered by cargo test/cargo bench respectively.
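A self-contained sketch of what such a src/tests.rs root might look like. In a real crate the modules would be separate files declared with mod util;, mod test_storage; and so on; they are written inline here only so the example compiles on its own, and all module and helper names are hypothetical. Because every module belongs to the same test crate, a shared helper only needs to be used by one of them to avoid dead-code warnings.

```rust
// src/tests.rs: the single root of a hypothetical `mycrate-test` crate.

mod util {
    /// Shared test helper: builds a 12-character fake guid.
    pub fn make_guid(n: u32) -> String {
        format!("guid{:08}", n)
    }
}

mod test_storage {
    use super::util::make_guid;

    #[test]
    fn guids_are_twelve_chars() {
        assert_eq!(make_guid(7).len(), 12);
    }
}

mod test_sync {
    // This module never touches `util`, and that's fine: the helper is
    // still used elsewhere in the same crate, so no dead-code warning.
    #[test]
    fn trivial() {
        assert_eq!(1 + 1, 2);
    }
}
```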
FAQ0: Why put tests/benches in src/* instead of disabling autotests/autobenches?
Instead of putting tests/benchmarks inside src, we could just delete the src dir outright, and place everything in tests/benches.
Then, to get the same one-rebuild-per-file behavior that we'll get in src, we would need to add autotests = false or autobenches = false to our Cargo.toml, add a root tests/tests.rs (or benches/benches.rs) containing mod decls for all submodules, and finally reference that "root" in the Cargo.toml [[test]] / [[bench]] list, exactly the same way we did for using src/*.
This would work, and on the surface, using tests/*.rs and benches/*.rs seems more consistent, so it seems weird to use src/*.rs for these files.
My reasoning is as follows: almost universally, tests/*.rs, examples/*.rs, benches/*.rs, etc. are automatic. If you add a test into the tests folder, it will run without anything else.
If we're going to set up one-build-per-{test,bench}suite as I described, this fundamentally cannot be true. In this paradigm, if you add a test file named blah.rs, you must add a mod blah declaration to the parent module.
It seems both confusing and error-prone to use tests/*, but have it behave that way; however, this is absolutely the normal behavior for files in src/*.rs -- when you add a file, you then need to add it to its parent module, and this is something Rust programmers are pretty used to.
(In fact, we even replicated this behavior (for no reason) in the places integration tests, and added the mod declarations to a "controlling" parent module -- it seems weird to be in an environment where this isn't required.)
So, that's why. This way, we make it way less likely that you add a test file to some directory, and have it get ignored because you didn't realize that in this one folder, you need to add a mod mytest into a neighboring tests.rs.
Debugging SQL
It can be quite tricky to debug what is going on with a SQL statement, especially once the SQL gets complicated or many triggers are involved.
The sql_support crate provides some utilities to help. Note that these utilities are gated behind a debug-tools feature. The module provides docstrings, so you should read them before you start.
This document describes how to use these capabilities, using places as an example.
First, we must enable the feature:
--- a/components/places/Cargo.toml
+++ b/components/places/Cargo.toml
@@ -22,7 +22,7 @@ lazy_static = "1.4"
url = { version = "2.1", features = ["serde"] }
percent-encoding = "2.1"
caseless = "0.2"
-sql-support = { path = "../support/sql" }
+sql-support = { path = "../support/sql", features=["debug-tools"] }
and we probably need to make the debug functions available:
--- a/components/places/src/db/db.rs
+++ b/components/places/src/db/db.rs
@@ -108,6 +108,7 @@ impl ConnectionInitializer for PlacesInitializer {
";
conn.execute_batch(initial_pragmas)?;
define_functions(conn, self.api_id)?;
+ sql_support::debug_tools::define_debug_functions(conn)?;
We now have a Rust function print_query() and a SQL function dbg() available.
Let's say we were trying to debug a test such as test_bookmark_tombstone_auto_created.
We might want to print the entire contents of a table, then instrument a query to check what values it sees. We might end up with a patch something like:
index 28f19307..225dccbb 100644
--- a/components/places/src/db/schema.rs
+++ b/components/places/src/db/schema.rs
@@ -666,7 +666,8 @@ mod tests {
[],
)
.expect("should insert regular bookmark folder");
- conn.execute("DELETE FROM moz_bookmarks WHERE guid = 'bookmarkguid'", [])
+ sql_support::debug_tools::print_query(&conn, "select * from moz_bookmarks").unwrap();
+ conn.execute("DELETE FROM moz_bookmarks WHERE dbg('CHECKING GUID', guid) = 'bookmarkguid'", [])
.expect("should delete");
// should have a tombstone.
assert_eq!(
There are 2 things of note:
- We used the print_query function to dump the entire moz_bookmarks table before executing the query.
- We instrumented the query to print the guid every time sqlite reads a row and compares it against a literal.
The output of this test now looks something like:
running 1 test
query: select * from moz_bookmarks
+----+------+------+--------+----------+---------+---------------+---------------+--------------+------------+-------------------+
| id | fk | type | parent | position | title | dateAdded | lastModified | guid | syncStatus | syncChangeCounter |
+====+======+======+========+==========+=========+===============+===============+==============+============+===================+
| 1 | null | 2 | null | 0 | root | 1686248350470 | 1686248350470 | root________ | 1 | 1 |
+----+------+------+--------+----------+---------+---------------+---------------+--------------+------------+-------------------+
| 2 | null | 2 | 1 | 0 | menu | 1686248350470 | 1686248350470 | menu________ | 1 | 1 |
+----+------+------+--------+----------+---------+---------------+---------------+--------------+------------+-------------------+
| 3 | null | 2 | 1 | 1 | toolbar | 1686248350470 | 1686248350470 | toolbar_____ | 1 | 1 |
+----+------+------+--------+----------+---------+---------------+---------------+--------------+------------+-------------------+
| 4 | null | 2 | 1 | 2 | unfiled | 1686248350470 | 1686248350470 | unfiled_____ | 1 | 1 |
+----+------+------+--------+----------+---------+---------------+---------------+--------------+------------+-------------------+
| 5 | null | 2 | 1 | 3 | mobile | 1686248350470 | 1686248350470 | mobile______ | 1 | 1 |
+----+------+------+--------+----------+---------+---------------+---------------+--------------+------------+-------------------+
| 6 | null | 3 | 1 | 0 | null | 1 | 1 | bookmarkguid | 2 | 1 |
+----+------+------+--------+----------+---------+---------------+---------------+--------------+------------+-------------------+
test db::schema::tests::test_bookmark_tombstone_auto_created ... FAILED
failures:
---- db::schema::tests::test_bookmark_tombstone_auto_created stdout ----
CHECKING GUID root________
CHECKING GUID menu________
CHECKING GUID toolbar_____
CHECKING GUID unfiled_____
CHECKING GUID mobile______
CHECKING GUID bookmarkguid
It's unfortunate that the output of print_query() goes to the tty while the output of dbg goes to stderr, so you might find the output isn't quite intermingled as you would expect, but it's better than nothing!
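To see why the instrumented query still deletes the right row, it can help to model dbg() in plain Rust. This is a hypothetical model inferred from the output above (it logs its arguments to stderr and hands back its last one, so the comparison is unchanged), not the sql_support implementation, and it takes just a label and a value for simplicity.

```rust
/// Hypothetical model of the dbg() SQL function: log, then return the value
/// untouched so the surrounding expression behaves exactly as before.
fn dbg_model<'a>(label: &str, value: &'a str) -> &'a str {
    eprintln!("{} {}", label, value);
    value
}

/// Mimics `... WHERE dbg('CHECKING GUID', guid) = 'bookmarkguid'`: the
/// predicate runs once per row, so every guid is logged, but only
/// matching rows are selected.
fn matching_rows<'a>(guids: &[&'a str], target: &str) -> Vec<&'a str> {
    guids
        .iter()
        .copied()
        .filter(|&g| dbg_model("CHECKING GUID", g) == target)
        .collect()
}
```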
Dependency Management Guidelines
This repository uses third-party code from a variety of sources, so we need to be mindful of how these dependencies will affect our consumers. Considerations include:
- General code quality.
- Licensing compatibility.
- Handling of security vulnerabilities.
- The potential for supply-chain compromise.
We're still evolving our policies in this area, but these are the guidelines we've developed so far.
Rust Code
Unlike Firefox, we do not vendor third-party source code directly into the repository. Instead we rely on Cargo.lock and its hash validation to ensure that each build uses an identical copy of all third-party crates. These are the measures we use for ongoing maintenance of our existing dependencies:
- Check Cargo.lock into the repository.
- Generate built artifacts using the --locked flag to cargo build, as an additional assurance that the existing Cargo.lock will be respected.
- Regularly run cargo-audit in CI to alert us to security problems in our dependencies.
  - It runs on every PR, and once per hour on the main branch.
- Use a home-grown tool to generate a summary of dependency licenses and to check them for compatibility with MPL-2.0.
  - Check these summaries into the repository and have CI alert on unexpected changes, to guard against pulling in new versions of a dependency under a different license.
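The home-grown license tool isn't described in detail here. As a hedged sketch of the kind of check such a tool might perform, here is a hypothetical allowlist check over (crate, SPDX license expression) pairs. The allowlist is illustrative only and is not the project's actual policy; real MPL-2.0 compatibility decisions need a proper review.

```rust
/// Hypothetical allowlist, for illustration only (NOT the project's policy).
const ALLOWED: &[&str] = &["MPL-2.0", "Apache-2.0", "MIT", "BSD-3-Clause"];

/// True if at least one alternative in a simple SPDX "A OR B" expression is
/// allowed. Full SPDX expressions can also use AND, WITH and parentheses;
/// this sketch deliberately doesn't handle those.
fn license_ok(expr: &str) -> bool {
    expr.split(" OR ").map(str::trim).any(|l| ALLOWED.contains(&l))
}

/// Returns the crates whose declared license fails the check, so CI can flag them.
fn unexpected_licenses<'a>(deps: &[(&'a str, &str)]) -> Vec<&'a str> {
    deps.iter()
        .filter(|(_, lic)| !license_ok(lic))
        .map(|(name, _)| *name)
        .collect()
}
```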
Adding a new dependency, whether we like it or not, is a big deal - that dependency and everything it brings with it will become part of Firefox-branded products that we ship to end users. We try to balance this responsibility against the many benefits of using existing code, as follows:
- In general, be conservative in adding new third-party dependencies.
  - For trivial functionality, consider just writing it yourself. Remember the cautionary tale of left-pad.
  - Check if we already have a crate in our dependency tree that can provide the needed functionality.
- Prefer crates that have a high level of due-diligence already applied, such as:
  - Crates that are already vendored into Firefox.
  - Crates from rust-lang-nursery.
  - Crates that appear to be widely used in the Rust community.
- Check that it is clearly licensed and is MPL-2.0 compatible.
- Take the time to investigate the crate's source and ensure it is suitably high-quality.
  - Be especially wary of uses of unsafe, or of code that is unusually resource-intensive to build.
- Dev dependencies do not require as much scrutiny as dependencies that will ship in consuming applications, but should still be given some thought.
  - There is still the potential for supply-chain compromise with dev dependencies!
- As part of the PR that introduces the new dependency:
  - Regenerate dependency summary files using the regenerate_dependency_summaries.sh script.
  - Explicitly describe your consideration of the above points.
Updating to new versions of existing dependencies is a normal part of software development and is not accompanied by any particular ceremony.
Android/Kotlin Code
We currently depend only on the following Kotlin dependencies:
We currently depend on the following developer dependencies in the Kotlin codebase, but they do not get included in built distribution files:
- detekt
- ktlint
No additional Kotlin dependencies should be added to the project unless absolutely necessary.
iOS/Swift Code
We currently do not depend on any Swift dependencies, and no Swift dependencies should be added to the project unless absolutely necessary.
Other Code
We currently depend on local builds of the following system dependencies:
No additional system dependencies should be added to the project unless absolutely necessary.
Adding a new component to Application Services
Each component in the Application Services repository has three parts (the Rust code, the Kotlin wrapper, and the Swift wrapper) so there are quite a few moving parts involved in adding a new component. This is a rapid-fire list of all the things you'll need to do if adding a new component from scratch.
The Rust Code
Your component should live under ./components in this repo. Use cargo new --lib ./components/<your_crate_name> to create a new library crate, and please try to avoid using hyphens in the crate name.
See the Guide to Building a Rust Component for general advice on designing and structuring the actual Rust code, and follow the Dependency Management Guidelines if your crate introduces any new dependencies.
Use UniFFI to define how your crate's API will get exposed to foreign-language bindings. By convention, put the interface definition file at ./components/<your_crate_name>/<your_crate_name>.udl. Use the builtin-bindgen feature of UniFFI to simplify the build process, by putting the following in your Cargo.toml:
[build-dependencies]
uniffi_build = { version = "<latest version here>", features=["builtin-bindgen"] }
Include your new crate in the application-services workspace, by adding it to the members and default-members lists in the Cargo.toml at the root of the repository.
In order to be published to consumers, your crate must be included in the "megazord" crate for each target platform:
- For Android, add it as a dependency in ./megazords/full/Cargo.toml and add a pub use <your_crate_name> to ./megazords/full/src/lib.rs.
- For iOS, add it as a dependency in ./megazords/ios-rust/rust/Cargo.toml and add a pub use <your_crate_name> to ./megazords/ios-rust/src/lib.rs.
Run cargo check -p <your_crate_name> in the repository root to confirm that things are configured properly. This will also have the side-effect of updating Cargo.lock to contain your new crate and its dependencies.
The Kotlin Bindings
Make a ./components/<your_crate_name>/android subdirectory to contain Kotlin- and Android-specific code. This directory will contain a gradle project for building your Kotlin bindings.
Copy the build.gradle file from ./components/crashtest/android/ into your own component's directory, and edit it to replace the references to crashtest.udl with your own component's .udl file.
Create a file ./components/<your_crate_name>/uniffi.toml with the following contents:
[bindings.kotlin]
package_name = "mozilla.appservices.<your_crate_name>"
cdylib_name = "megazord"
Create a file ./components/<your_crate_name>/android/src/main/AndroidManifest.xml
with the following contents:
<manifest xmlns:android="http://schemas.android.com/apk/res/android"
package="org.mozilla.appservices.<your_crate_name>" />
In the root of the repository, edit .buildconfig-android.yml to add your component's metadata. This will cause it to be included in the gradle workspace and in our build and publish pipeline. Check whether it builds correctly by running:
./gradlew <your_crate_name>:assembleDebug
You can include hand-written Kotlin code alongside the automatically generated bindings, by placing .kt files in a directory named:
./android/src/main/java/mozilla/appservices/<your_crate_name>/
You can write Kotlin-level tests that consume your component's API, by placing .kt files in a directory named:
./android/src/test/java/mozilla/appservices/<your_crate_name>/
So you would end up with a directory structure something like this:
components/<your_crate_name>/
  Cargo.toml
  uniffi.toml
  src/
    - Rust code here.
  android/
    build.gradle
    src/
      main/
        AndroidManifest.xml
        java/mozilla/appservices/<your_crate_name>/
          - Hand-written Kotlin code here.
      test/java/mozilla/appservices/<your_crate_name>/
        - Kotlin test-cases here.
Run your component's Kotlin tests with ./gradlew <your_crate_name>:test to confirm that this is all working correctly.
The Swift Bindings
Creating the directory structure
Make a ./components/<your_crate_name>/ios subdirectory to contain Swift- and iOS-specific code. The UniFFI-generated Swift bindings will be written to a subdirectory named Generated.
You can include hand-written Swift code alongside the automatically generated bindings, by placing .swift files in a directory named:
./ios/<your_crate_name>/
So you would end up with a directory structure something like this:
components/<your_crate_name>/
  Cargo.toml
  uniffi.toml
  src/
    - Rust code here.
  ios/
    <your_crate_name>/
      - Hand-written Swift code here.
    Generated/
      - Generated Swift code will be written into this directory.
Adding your component to the Swift Package Manager Megazord
For more information on how we ship components using the Swift Package Manager, check the ADR that introduced the Swift Package Manager.
You will need to do the following steps to include the component in the megazord:
- Update its uniffi.toml to include the following settings:
  [bindings.swift]
  ffi_module_name = "MozillaRustComponents"
  ffi_module_filename = "<crate_name>FFI"
- Add the component as a dependency to the Cargo.toml in megazords/ios-rust/
- Add a pub use declaration for the component in megazords/ios-rust/src/lib.rs
- Add logic to megazords/ios-rust/build-xcframework.sh to copy or generate its header file into the build
- Add an #import for its header file to megazords/ios-rust/MozillaRustComponents.h
- Add your component into the iOS "megazord" through the Xcode project. This can only really be done using the Xcode application, which in turn means you need to be on a Mac.
  - Open megazords/ios-rust/MozillaTestServices/MozillaTestServices.xcodeproj in Xcode.
  - In the Project navigator, add a new Group for your new component, pointing to the ./ios/ directory you created above. Add the following entries to the Group:
    - The .udl file for your component, from ../src/<your_crate_name>.udl.
    - Any hand-written .swift files for your component.
  - Make sure that the "Copy items if needed" option is unchecked, and that nothing is checked in the "Add to targets" list.
The result should look something like this:
Click on the top-level "MozillaTestServices" project in the navigator, then go to "Build Phases".
Double-check that <your_crate_name>.udl does not appear in the "Copy Bundle Resources" section.
Add <your_crate_name>.udl to the list of "Compile Sources". This will trigger an Xcode Build Rule that generates the Swift bindings automatically. Also include any hand-written .swift files in this list.
Finally, in the Project navigator, add a sub-group named "Generated", pointing to the ./Generated/ subdirectory, and containing entries for the files generated by UniFFI:
* <your_crate_name>.swift
* <your_crate_name>FFI.h
Make sure that "Copy items if needed" is unchecked, and that nothing is checked in "Add to targets".
Double-check that <your_crate_name>.swift does not appear in the "Compile Sources" section.
The result should look something like this:
Build the project in Xcode to check whether that all worked correctly.
To add Swift tests for your component API, create them in a file under megazords/ios-rust/MozillaTestServicesTests/. Use this syntax to import your component's bindings from the compiled megazord:
@testable import MozillaTestServices
In Xcode, navigate to the MozillaTestServicesTests Group and add your new test file as an entry. Select the corresponding target, click on "Build Phases", and add your test file to the list of "Compile Sources".
The result should look something like this:
Use the Xcode Test Navigator to run your tests and check whether they're passing.
Distribute your component with rust-components-swift
The Swift source code and generated UniFFI bindings are distributed to consumers (e.g. Firefox iOS) through rust-components-swift.
A nightly taskcluster job prepares the rust-components-swift packages from the source code in the application-services repository. To distribute your component with rust-components-swift, add the following to the taskcluster script in taskcluster/scripts/build-and-test-swift.py:
- Add the path to the <your_crate_name>.udl file to BINDINGS_UDL_PATHS
  - Optionally also to FOCUS_UDL_PATHS if your component is also targeting Firefox Focus
- Add the path to the directory containing any hand-written Swift code to SOURCE_TO_COPY
  - Optionally also to FOCUS_SOURCE_TO_COPY if your component is also targeting Firefox Focus
Your component should now automatically get included in the next rust-components-swift nightly release.
Guide to Building a Syncable Rust Component
This is a guide to creating a new Syncable Rust Component like many of the components in this repo. If you are looking for information on how to build (i.e. compile, etc.) the existing components, see our build documentation.
Welcome!
It's great that you want to build a Rust Component - this guide should help get you started. It documents some nomenclature, best-practices and other tips and tricks.
This document is just for general guidance - every component will be different and we are still learning how to make these components. Please update this document with these learnings.
To repeat with emphasis - please consider this a living document.
General design and structure of the component
We think components should be structured as described here.
We build libraries, not frameworks
Think of building a "library", not a "framework" - the application should be in control and calling functions exposed by your component, not providing functions for your component to call.
The "store" is the "entry-point"
[Note that some of the older components use the term "store" differently; we should rename them! In Places, it's called an "API"; in Logins an "engine". See webext-storage for a more recent component that uses the term "Store" as we think it should be used.]
The "Store" is the entry-point for the consuming application - it provides the core functionality exposed by the component and manages your databases and other singletons. The responsibilities of the "Store" will include things like creating the DB if it doesn't exist, doing schema upgrades etc.
The functionality exposed by the "Store" will depend on the complexity of the API being exposed. For example, for webext-storage, where there are only a handful of simple public functions, it just directly exposes all the functionality of the component. However, for Places, which has a much more complex API, the (logical) Store instead supplies "Connection" instances which expose the actual functionality.
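A minimal sketch of the simple "Store directly exposes everything" shape, in the spirit of webext-storage. Everything here is hypothetical: ExampleStore, StorageConnection and the method names are illustrative, and an in-memory HashMap stands in for the real Rusqlite connection so the sketch runs on its own. Note that the application calls into the Store (a library, not a framework), and the connection is a private field behind a Mutex.

```rust
use std::collections::HashMap;
use std::sync::Mutex;

/// Hypothetical stand-in for a Rusqlite connection. A real component would
/// open the SQLite database here.
struct StorageConnection {
    rows: HashMap<String, String>,
}

/// The entry point handed to the consuming application.
pub struct ExampleStore {
    // Private: raw connections are never exposed to consumers.
    conn: Mutex<StorageConnection>,
}

impl ExampleStore {
    /// A real Store would create the DB if it doesn't exist and run any
    /// schema upgrades here.
    pub fn new() -> Self {
        ExampleStore {
            conn: Mutex::new(StorageConnection { rows: HashMap::new() }),
        }
    }

    /// The application calls this; the Store never calls back into the app.
    pub fn set(&self, key: &str, value: &str) {
        let mut conn = self.conn.lock().unwrap();
        conn.rows.insert(key.to_string(), value.to_string());
    }

    pub fn get(&self, key: &str) -> Option<String> {
        let conn = self.conn.lock().unwrap();
        conn.rows.get(key).cloned()
    }
}
```

For a component with a larger API, such as Places, the Store would instead hand out "Connection"-style objects that carry these methods.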
Using sqlite
We prefer sqlite instead of (say) JSON files or RKV.
Always put sqlite into WAL mode, then have exactly 1 writer connection and as many reader connections as you need - which will depend on your use-case. For example, webext_storage has 1 reader, while places has many.
(Note that places has 2 writers (one for sync, one for the API), but we believe this was a mistake; we should have been able to make things work better with exactly 1 writer shared between sync and the API.)
We typically have a "DB" abstraction which manages the database itself - it contains the logic for handling schema upgrades etc. and enforces the "only 1 writer" rule.
However, this is just a convenience - the DB abstractions aren't really passed around - we just pass raw connections (or transactions) around. For example, if there's a utility function that reads from the DB, it will just take a Rusqlite connection as a parameter. (Again, older components don't really do this well, but webext-storage does.)
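A minimal sketch of that split: a "DB" abstraction that owns the single writer and hands out readers, plus a utility function that only takes a raw connection. The types are hypothetical stand-ins (not rusqlite), and the read_only flag is an illustrative way to show the rule being enforced at runtime.

```rust
use std::sync::{Mutex, MutexGuard};

/// Hypothetical stand-in for a rusqlite::Connection.
pub struct Connection {
    pub read_only: bool,
    pub log: Vec<String>,
}

/// Hypothetical "DB" abstraction: exactly one writer, any number of readers.
pub struct Db {
    writer: Mutex<Connection>,
}

impl Db {
    pub fn open() -> Self {
        // A real implementation would open the file, set
        // PRAGMA journal_mode = WAL and run schema upgrades here.
        Db {
            writer: Mutex::new(Connection { read_only: false, log: Vec::new() }),
        }
    }

    /// Every caller shares the single writer through the Mutex.
    pub fn writer(&self) -> MutexGuard<Connection> {
        self.writer.lock().unwrap()
    }

    /// Readers are independent connections.
    pub fn new_reader(&self) -> Connection {
        Connection { read_only: true, log: Vec::new() }
    }
}

/// Utility functions take a raw connection, not the Db abstraction.
pub fn record_visit(conn: &mut Connection, url: &str) -> Result<(), String> {
    if conn.read_only {
        return Err("attempted to write through a reader connection".to_string());
    }
    conn.log.push(url.to_string());
    Ok(())
}
```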
We try to leverage Rust to ensure transactions are enforced at the correct boundaries - for example, functions which write data but which must be done as part of a transaction will accept a Rusqlite Transaction reference as the param, whereas something that only reads the DB will accept a Rusqlite Connection. Note that because Transaction supports Deref<Target = Connection>, you can pass a &Transaction wherever a &Connection is needed - but not vice-versa.
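The Deref relationship can be sketched with mock types so it compiles without rusqlite. These Connection and Transaction structs are hypothetical stand-ins that only mirror the shape of the real ones: they have no commit/rollback, and the real rusqlite Connection::transaction takes &mut self rather than &self. The point is the signatures: count_items accepts either type, while insert_item only accepts a transaction.

```rust
use std::cell::RefCell;
use std::ops::Deref;

/// Mock connection; RefCell stands in for the database's own mutability.
pub struct Connection {
    items: RefCell<Vec<String>>,
}

/// Mock transaction borrowing a connection.
pub struct Transaction<'conn> {
    conn: &'conn Connection,
}

impl Connection {
    pub fn open_in_memory() -> Self {
        Connection { items: RefCell::new(Vec::new()) }
    }

    pub fn transaction(&self) -> Transaction {
        Transaction { conn: self }
    }
}

// Like rusqlite, a Transaction derefs to its Connection.
impl<'conn> Deref for Transaction<'conn> {
    type Target = Connection;
    fn deref(&self) -> &Connection {
        self.conn
    }
}

/// Reads only need a connection.
pub fn count_items(conn: &Connection) -> usize {
    conn.items.borrow().len()
}

/// Writes that must happen inside a transaction demand one, so passing a
/// bare &Connection here is a compile error.
pub fn insert_item(tx: &Transaction, item: &str) {
    tx.items.borrow_mut().push(item.to_string());
}
```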
Meta-data
You are likely to have a table just for key/value metadata, and this table will be used by sync (and possibly other parts of the component) to track the sync IDs, lastModified timestamps etc.
Schema management
The schemas are stored in the tree in .sql files and pulled into the source at build time via include_str!. Depending on the complexity of your component, there may be a need for different Connections to have different SQL (for example, it may be that only your 'write' connection requires the SQL to define triggers or temp tables, so these might be in their own file.)
Because requirements evolve, there will be a need to support schema upgrades. This is done by way of sqlite's PRAGMA user_version - which can be thought of as simple metadata for the database itself. In short, immediately after opening the database for the first time, we check this version and if it's less than expected we perform the schema upgrades necessary, then re-write the version to the new version.
This is easier to read than explain, so read the upgrade() function in the Places schema code.
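The upgrade flow can be sketched as follows. This is not the Places code: Db, init, VERSION and the SQL strings are hypothetical, and an in-memory struct stands in for the connection so the logic runs on its own. A real component would read and write the version with PRAGMA user_version and run each step inside a transaction. Refusing a database newer than the code is an assumption of this sketch.

```rust
/// Hypothetical stand-in for a database: `user_version` mirrors SQLite's
/// PRAGMA user_version, and `executed` records the SQL we would have run.
pub struct Db {
    pub user_version: u32,
    pub executed: Vec<&'static str>,
}

/// The schema version this build of the component expects.
pub const VERSION: u32 = 3;

/// A brand new database gets the latest schema directly.
const CREATE_LATEST: [&str; 2] = [
    "CREATE TABLE items (guid TEXT PRIMARY KEY, title TEXT)",
    "CREATE INDEX idx_items_title ON items(title)",
];

/// UPGRADES[i] upgrades from version i + 1 to version i + 2.
const UPGRADES: [&str; 2] = [
    "ALTER TABLE items ADD COLUMN title TEXT",      // 1 -> 2
    "CREATE INDEX idx_items_title ON items(title)", // 2 -> 3
];

/// Run immediately after opening the database.
pub fn init(db: &mut Db) -> Result<(), String> {
    match db.user_version {
        0 => db.executed.extend(CREATE_LATEST.iter().copied()),
        v if v > VERSION => {
            // Assumption: don't silently "downgrade" a newer database.
            return Err(format!("database version {} is newer than {}", v, VERSION));
        }
        v => {
            // Apply each upgrade step in order until we reach VERSION.
            for from in v..VERSION {
                db.executed.push(UPGRADES[(from - 1) as usize]);
            }
        }
    }
    // Re-write the version (PRAGMA user_version = VERSION).
    db.user_version = VERSION;
    Ok(())
}
```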
You will need to be a bit careful here, because schema upgrades will block the calling application immediately after it upgrades to a new version, so if your schema change requires a table scan of a massive table, you are going to have a bad time. Apart from that, though, you are largely free to do whatever sqlite lets you do!
Note that most of our components have very similar schema and database management code - these are screaming out to be refactored so common logic can be shared. Please be brave and have a go at this!
Triggers
We tend to like triggers for encompassing application logic - for example, if updating one row means a row in a different table should be updated based on that data, we'd tend to prefer, e.g., an AFTER UPDATE trigger over having our code manually implement the logic.
However, you should take care here, because functionality based on triggers is difficult to debug (e.g., logging in a trigger is difficult) and the functionality can be difficult to locate (e.g., people unfamiliar with the component may wonder why they can't find certain functionality in the Rust code and may not think to look in the sqlite triggers).
You should also be careful when creating triggers on persistent main tables. For example, bumping the change counter isn't a good use for a trigger, because it'll run for all changes on the table—including those made by Sync. This means Sync will end up tracking its own changes, and getting into infinite syncing loops. Triggers on temporary tables, or ones that are used for bookkeeping where the caller doesn't matter, like bumping the foreign reference count for a URL, are generally okay.
General structure of the rust code
We prefer flatter module hierarchies where possible. For example, in Places we ended up with sync_history and sync_bookmarks sub-modules rather than a sync submodule itself with history and bookmarks.
Note that the raw connections are never exposed to consumers - for example, they will tend to be stored as private fields in, e.g., a Mutex.
Syncing
The traits you need to implement to sync aren't directly covered here.
All meta-data related to sync must be stored in the same database as the data itself - often in a meta table.
All logic for knowing which records need to be synced must be part of the application logic, and will often be implemented using triggers. It's quite common for components to use a "change counter" strategy, which can be summarized as:
- Every table which defines the "top level" items being synced will have a column called something like 'sync_change_counter' - the app will probably track this counter manually instead of using a trigger, because sync itself will need different behavior when it updates the records.
- At sync time, items with a non-zero change counter are candidates for syncing.
- As the sync starts, for each item, the current value of the change counter is remembered. At the end of the sync, the counter is decremented by this value. Thus, items which were changed between the time the sync started and completed will be left with a non-zero change counter at the end of the sync.
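The change counter strategy above can be sketched in a few lines. Items and its methods are hypothetical, and a HashMap from GUID to counter stands in for the sync_change_counter column; a real component would do the same arithmetic in SQL.

```rust
use std::collections::HashMap;

/// Hypothetical in-memory model of a synced table: guid -> sync_change_counter.
pub struct Items {
    pub change_counters: HashMap<String, u32>,
}

impl Items {
    /// Local application writes bump the counter (Sync's own writes would not).
    pub fn local_update(&mut self, guid: &str) {
        *self.change_counters.entry(guid.to_string()).or_insert(0) += 1;
    }

    /// As the sync starts: remember the counter of every changed item.
    pub fn start_sync(&self) -> Vec<(String, u32)> {
        self.change_counters
            .iter()
            .filter(|&(_, &counter)| counter > 0)
            .map(|(guid, &counter)| (guid.clone(), counter))
            .collect()
    }

    /// At the end of the sync: decrement by the remembered value, so items
    /// changed while the sync was running keep a non-zero counter.
    pub fn finish_sync(&mut self, snapshot: &[(String, u32)]) {
        for (guid, synced) in snapshot {
            if let Some(counter) = self.change_counters.get_mut(guid) {
                *counter -= *synced;
            }
        }
    }
}
```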
Syncing FAQs
This section is stolen from this document
What’s the global sync ID and the collection sync ID?
Both are GUIDs, and both are used to identify when the data on the server has changed radically underneath us (e.g., when looking at lastModified is no longer a sane thing to do).
The "global sync ID" changing means that every collection needs to be assumed as having changed radically, whereas just the "collection sync ID" changing means just that one collection.
These global IDs are most likely to change on a node reassignment (which should be rare now with durable storage), a password reset, etc. An example of when the collection ID will change is a "bookmarks restore" - handling an old version of a database re-appearing is why we store these IDs in the database itself.
What’s get_sync_assoc, why is it important? What is StoreSyncAssociation?
They are all used to track the guids above. It’s vitally important we know when these guids change.
StoreSyncAssociation is a simple enum which reflects the state a sync engine can be in - either Disconnected (i.e., we have no idea what the GUIDs are) or Connected, where we know what we think the IDs are (but the server may or may not match with this).
These GUIDs will typically be stored in the DB in the metadata table.
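A minimal sketch of how such an association might be checked against the server. The enum name follows this FAQ, but the SyncIds fields, Action and reconcile are hypothetical; the real sync15 types differ in detail. Because each collection remembers both the global and the collection ID, a global ID change makes every collection's comparison fail, while a collection ID change only affects that one collection.

```rust
/// The sync IDs we last saw for one collection (both are GUIDs).
#[derive(Clone, Debug, PartialEq)]
pub struct SyncIds {
    pub global: String,
    pub coll: String,
}

#[derive(Clone, Debug, PartialEq)]
pub enum StoreSyncAssociation {
    /// We have no idea what the GUIDs are.
    Disconnected,
    /// What we *think* the IDs are; the server may disagree.
    Connected(SyncIds),
}

/// What to do after comparing our stored IDs with the server's.
#[derive(Debug, PartialEq)]
pub enum Action {
    /// IDs match, so an incremental sync based on lastModified is sane.
    Incremental,
    /// IDs differ or are unknown: reset (treat as a first sync), then
    /// store the server's IDs in the metadata table.
    ResetAndStore(SyncIds),
}

pub fn reconcile(local: &StoreSyncAssociation, server: &SyncIds) -> Action {
    match local {
        StoreSyncAssociation::Connected(ids) if ids == server => Action::Incremental,
        _ => Action::ResetAndStore(server.clone()),
    }
}
```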
What is apply_incoming versus sync_finished?
apply_incoming is where any records incoming from the server (i.e., possibly all records on the server if this is a first sync, otherwise records with a timestamp later than our last sync) are processed.
sync_finished is where we've done all the sync work other than uploading new changes to the server.
What's the diff between reset and wipe?
- Reset means “I don’t know what’s on the server - I need to reconcile everything there with everything I have”. IOW, a “first sync”
- Wipe means literally “wipe all server data”
Exposing to consumers
You will need an FFI or some other way of exposing stuff to your consumers.
We use a tool called UniFFI to automatically generate FFI bindings from the Rust code.
If UniFFI doesn't work for you, then you'll need to hand-write the FFI layer. Here are some earlier blog posts on the topic which might be helpful:
- Building and Deploying a Rust library on Android
- Building and Deploying a Rust library on iOS
- Blog post re: lessons in binding to Rust code from iOS
The above are likely to be superseded by uniffi docs, but for now, good luck!
Naming Conventions
All names in this project should adhere to the guidelines outlined in this document.
Rust Code
TL;DR: do what Rust's builtin warnings and clippy lints tell you (and CI will fail if there are any unresolved warnings or clippy lints).
Overview
- All variable names, function names, module names, and macros in Rust code should follow typical snake_case conventions.
- All Rust types, traits, structs, and enum variants must follow UpperCamelCase.
- Static and constant variables should be written in SCREAMING_SNAKE_CASE.
For more in-depth Rust conventions, see the Rust Style Guide.
Examples:
fn sync15_passwords_get_all()
struct PushConfiguration{...}
const COMMON_SQL
Swift Code
Overview
- Names of types and protocols are UpperCamelCase.
- All other uses are lowerCamelCase.
For more in-depth Swift conventions, check out the Swift API Design Guidelines.
Examples:
enum CheckChildren{...}
func checkTree()
public var syncKey: String
Kotlin Code
If a source file contains only a top-level class, the source file should reflect the case-sensitive name of the class plus the .kt extension. Otherwise, if the source contains multiple top-level declarations, choose a name that describes the contents of the file, apply UpperCamelCase and append the .kt extension.
Overview
- Names of packages are always lower case and do not include underscores. Using multi-word names should be avoided. However, if used, they should be concatenated or use lowerCamelCase.
- Names of classes and objects use UpperCamelCase.
- Names of functions, properties, and local variables use lowerCamelCase.
For more in-depth Kotlin Conventions, see the Kotlin Style Guide.
Examples:
//FooBar.kt
class FooBar{...}
fun fromJSONString()
package mozilla.appservices.places
Converting an existing Component to use UniFFI
When we started building the components in this repo, exposing Rust code to Kotlin and Swift was a manual process and each component had its own hand-written FFI layer and foreign-language bindings.
As we've gained more experience with building components in this way, we've started to automate bindings generation and capture best practices in a tool called UniFFI, which is the currently recommended approach when adding a new component from scratch.
We expect that existing components will gradually be ported over to use UniFFI, and this document is a guide to doing that port.
First, get familiar with UniFFI
First, make sure you've perused the UniFFI guide to understand the overall architecture of a UniFFI component, and take a look at the guide to adding a new component to understand how such components fit into this repo. The aim of porting will be to have a component that looks like it was added by the process described therein.
Next, get familiar with the target component
Pre-UniFFI components typically consist of four main parts:
- A Rust crate implementing the core functionality of the component
- A separate Rust crate that exposes the core functionality over a C-style FFI.
- An Android package that imports the C-style FFI into idiomatic Kotlin.
- A Swift module that imports the C-style FFI into idiomatic Swift.
The code for these parts will be laid out something like this:
components/<component_name>/
  Cargo.toml
  src/
    - Rust code for the core functionality of the component goes here.
  ffi/
    Cargo.toml
    src/
      - Rust code specifically for exposing the C-style FFI goes here.
  android/
    build.gradle
    src/
      main/
        AndroidManifest.xml
        java/mozilla/appservices/<component_name>/
          Lib<ComponentName>FFI.kt (low-level bindings to the C-style FFI)
          - Higher-level hand-written Kotlin that wraps the FFI.
  ios/
    <component_name>/
      Rust<ComponentName>API.h (low-level bindings to the C-style FFI)
      - Higher-level hand-written Swift that wraps the FFI.
The goal here is to replace much of the hand-written wrapper layers with autogenerated code:
- The ./ffi/ crate will disappear entirely; its work is automated by UniFFI.
  - If you still need some hand-written pub extern "C" functions, perhaps to implement features not currently supported by UniFFI, then they should move into lib.rs of the main component crate.
- The low-level Lib<ComponentName>FFI.kt file will disappear entirely, as will some of the code that converts it back into nice high-level Kotlin classes and interfaces.
  - Some of the hand-written Kotlin code may remain, if it provides functionality that cannot be implemented in Rust.
- The low-level Rust<ComponentName>API.h file will disappear entirely, as will some of the code that converts it back into nice high-level Swift classes and interfaces.
  - Some of the hand-written Swift code may remain, if it provides functionality that cannot be implemented in Rust.
You'll aim to end up with a simplified file structure that looks like this:
components/<component_name>/
  Cargo.toml
  uniffi.toml
  src/
    <component_name>.udl (abstract interface definition)
    - Rust code here.
  android/
    build.gradle
    src/
      main/
        AndroidManifest.xml
        java/mozilla/appservices/<component_name>/
          - Optional hand-written Kotlin code here.
  ios/
    <component_name>/
      - Optional hand-written Swift code here.
Write a first draft of the .udl file for the component's interface
Make sure you've got the uniffi-bindgen command available; cargo install uniffi_bindgen will ensure you have the latest version.
Create ./src/<component_name>.udl and try to describe the intended interface for the component using UniFFI's interface definition language. You'll probably need to reverse-engineer it a little bit from the existing hand-written Kotlin and/or Swift code.
Don't spend too much time on trying to match every minute detail of the existing hand-written API. There are likely to be small differences between how UniFFI likes to do things and how the hand-written APIs were structured, and it's in everyone's best long-term interests to just push ahead and update consumers to accommodate any breaking API changes, rather than e.g. trying to convince UniFFI to capitalize enum variant names in the same style that the hand-written code was using.
To check whether the .udl file is syntactically valid, you can use uniffi-bindgen to generate the Rust FFI scaffolding like so:
uniffi-bindgen scaffolding ./src/<component_name>.udl
If this succeeds, it will generate a file ./src/<component_name>.uniffi.rs with a bunch of thorny auto-generated Rust code. If it fails, it will likely fail with an inscrutable error message. Unfortunately the error reporting in UniFFI is currently a known pain point, and it can take a bit of trial-and-error to identify what part of the file is causing the issue. Sorry :-(
The aim at this point is to ensure that the intended interface of the component can be expressed in terms that UniFFI understands. Most cases should be supported, but you may find some aspect of the existing component that is hard to express in UniFFI, perhaps even uncovering new functionality that needs to be added to UniFFI itself!
The .udl file is definitely a first draft at this point. It is normal and expected to need to iterate on this file as you port over the underlying Rust code.
Restructure the Rust code to introduce UniFFI
You will now restructure the existing Rust crate so that its public API surface and overall "shape" match what you defined in the .udl file.
Start by deleting the ./ffi sub-crate, because you're going to use UniFFI to generate all of that code. You'll also need to remove it from the workspace in the top-level Cargo.toml file, as well as change the crates under /megazords to import the core Rust crate for the component rather than importing the FFI sub-crate.
Add UniFFI to the crate's dependencies and configure its build.rs script to invoke the UniFFI scaffolding generator, as described in "adding a new component".
Now, edit ./lib.rs so that it matches the interface defined in the .udl file as closely as possible. If the .udl has an interface Example, then lib.rs should contain a pub struct Example; if the .udl contains an enum ExampleItem, then lib.rs should contain a pub enum ExampleItem; and so on.
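A minimal sketch of what that correspondence looks like, with the hypothetical UDL shown in a comment. The names and methods are illustrative only, and the build.rs setup that includes the generated scaffolding is omitted so the sketch compiles on its own. One detail worth knowing: UniFFI hands interface objects to foreign code behind an Arc, so methods take &self and any mutable state needs interior mutability such as a Mutex.

```rust
use std::sync::Mutex;

// Hypothetical UDL that this lib.rs is shaped to match:
//
//   namespace example {};
//   enum ExampleItem { "Bookmark", "Folder" };
//   interface Example {
//     constructor();
//     void add(ExampleItem item);
//     u32 count();
//   };

/// Matches `enum ExampleItem` in the UDL.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum ExampleItem {
    Bookmark,
    Folder,
}

/// Matches `interface Example` in the UDL.
pub struct Example {
    items: Mutex<Vec<ExampleItem>>,
}

impl Example {
    /// Matches `constructor()`.
    pub fn new() -> Self {
        Example { items: Mutex::new(Vec::new()) }
    }

    /// Methods take &self because the object is shared behind an Arc.
    pub fn add(&self, item: ExampleItem) {
        self.items.lock().unwrap().push(item);
    }

    pub fn count(&self) -> u32 {
        self.items.lock().unwrap().len() as u32
    }
}
```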
The details of this step will depend heavily on the specific crate, but some tips include:
- You may find it useful to move all of the existing code into a sub-module named internal, and then make a brand new lib.rs that imports or re-defines just the pieces it needs in order to implement the interface from the .udl file. The fxa-client crate is an example of a case where this worked out well, though of course your mileage may vary.
- If the existing crate contains a file named like <component_name>_msg_types.proto, then it was using Protocol Buffers to serialize data to pass over the FFI. The message types defined in the .proto file will need to be converted into dictionary or enum definitions in your .udl file. See the section below for more details.
As noted above, don't be afraid to accept some API churn during the conversion process. We're willing to accept some breaking API changes as the cost of getting bindings generated for free, as long as the core functionality and mental model of the component remain intact.
At this point, in theory the crate should be buildable with UniFFI, although it's likely to require some iteration to get it all working! Run cargo check to check for any compilation errors without having to do a full build.
Removing Protobuf Messages
Passing rich structured data over the FFI is the most complex part of our hand-written bindings, and was previously done by serializing data via Protocol Buffers. This is something that UniFFI tries to make as simple as possible.
Start by locating the <component_name>_msg_types.proto file for the component. This file defines the structured messages that can be passed over the FFI, and you should see that they correspond to various types of structured data that the component wants to receive from, or return to, the foreign-language code.
Find the places in your .udl interface that correspond to these message types and make sure that you've got a similarly-shaped dictionary or enum for each one. You should find that representing this structured data in UDL is simpler than protobuf in many cases - for example, many of our .proto files need to use a separate ExampleStructs message in order to pass a list of ExampleStruct messages over the FFI, but in UniFFI this is represented directly as sequence<ExampleStruct>.
Find the places in the Rust code that are using these message types to return structured data.
In simple cases, you may be able to directly replace uses of msg_types::ExampleStruct with the corresponding crate::ExampleStruct from your public API.
For more complex cases, you may find it helpful to define an Into mapping between the UniFFI dictionary/enum in the crate's public interface and a more complex struct designed for internal use.
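A minimal sketch of such a mapping. ExampleStruct stands for the UniFFI dictionary, while InternalRecord, its extra field and get_all are hypothetical (in a real crate the internal type would live in a private module). Implementing From gives you the Into direction for free, and returning a Vec corresponds directly to sequence<ExampleStruct>, with no wrapper message.

```rust
/// Public, UniFFI-facing type, matching a hypothetical UDL
/// `dictionary ExampleStruct { string guid; string title; };`.
#[derive(Debug, Clone, PartialEq)]
pub struct ExampleStruct {
    pub guid: String,
    pub title: String,
}

/// Richer internal record the rest of the crate works with. The extra
/// bookkeeping field is illustrative and never reaches consumers.
pub struct InternalRecord {
    guid: String,
    title: String,
    sync_change_counter: u32,
}

/// Implementing From also provides `Into<ExampleStruct> for InternalRecord`.
impl From<InternalRecord> for ExampleStruct {
    fn from(record: InternalRecord) -> Self {
        ExampleStruct { guid: record.guid, title: record.title }
    }
}

/// `Vec<ExampleStruct>` maps directly to `sequence<ExampleStruct>` in UDL.
pub fn get_all(records: Vec<InternalRecord>) -> Vec<ExampleStruct> {
    records.into_iter().map(ExampleStruct::from).collect()
}
```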
As noted above, don't be afraid to accept some API churn during this conversion process.
Once you have replaced all uses of the msg_types structs in the Rust code:
- Delete ./src/<component_name>_msg_types.proto.
- Delete ./src/mozilla.appservices.<component_name>.protobuf.rs, which is generated from the .proto file.
- Remove prost and prost-derive from the crate's dependencies.
- Delete the crate from the list in /tools/protobuf_files.toml.
If you happen to find that you've deleted the last crate from the list in protobuf_files.toml, congratulations! You've successfully removed protocol buffers from this repo entirely, and should file a bug to track the complete removal of protobuf from our tooling and dependency chain.
Document the Public API in the Rust code
Write consumer-facing documentation on the public API in lib.rs using Rust's standard rustdoc conventions and tools. The fxa-client crate may serve as a good example.
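For a quick feel of the conventions, here is a small hypothetical type documented in the usual rustdoc style: a summary line, a longer explanation, and an "# Errors" section for fallible methods. ExampleStore and its methods are illustrative only and are not taken from fxa-client.

```rust
/// A store of example items.
///
/// Consumers never see the underlying database; they only call the
/// methods documented below.
pub struct ExampleStore {
    titles: Vec<String>,
}

impl ExampleStore {
    /// Creates an empty store.
    pub fn new() -> Self {
        ExampleStore { titles: Vec::new() }
    }

    /// Adds an item with the given `title` and returns its zero-based index.
    ///
    /// # Errors
    ///
    /// Returns an error if `title` is empty.
    pub fn add(&mut self, title: &str) -> Result<usize, String> {
        if title.is_empty() {
            return Err("title must not be empty".to_string());
        }
        self.titles.push(title.to_string());
        Ok(self.titles.len() - 1)
    }
}
```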
You can view the generated documentation by running:
cargo doc --no-deps --open
In the future, we intend to automatically extract documentation from the Rust code and make it easily available to consumers of the generated bindings.
(In fact there is some work-in-progress code in uniffi-rs#416 that can read docs from the Rust code and write them back into the .udl file, which you're welcome to try out if you're feeling adventurous. But it's just a very hacky prototype.)
Set up the Kotlin wrapper
It's easiest to start by removing all of the hand-written Kotlin code under android/src/main/java and then restoring parts of it later if necessary. Leave the AndroidManifest.xml file and any tests in place.
Delete the android/build.gradle file and then follow the instructions for adding Kotlin bindings for a new component to create a new build.gradle file and a corresponding uniffi.toml.
This should be all that's required to set up UniFFI to build the Kotlin bindings. Try building the Android package to confirm:
./gradlew <component_name>:assembleDebug
The UniFFI-generated Kotlin code will be under ./android/build/generated/source/uniffi/ and may be useful for debugging.
If there are existing Kotlin tests for the component, the next step is to get those passing:
./gradlew <component_name>:test
As noted above, it is normal and expected for the autogenerated bindings to be subtly different from the previous hand-written ones. For example, UniFFI insists on using SHOUTY_SNAKE_CASE variant names in Kotlin enums while the hand-written code may have used CamelCase. Some components also have small naming differences between the Rust code and the hand-written Kotlin bindings, which UniFFI will not allow.
If the component had functionality in its Kotlin layer that was not part of the Rust API, then you'll need to add some hand-written Kotlin code under android/src/main/java to implement it. The fxa-client component may be a good example here: its Rust layer exposes a FirefoxAccount struct that the Kotlin code wraps into a PersistedFirefoxAccount class, adding the ability to set a persistence callback.
Finally, you will need to try out the new bindings with a consuming app. For Kotlin code you should make a local build of android-components and Fenix, updating them to accommodate any changes in the component's public API.
Set up the Swift wrapper
It's easiest to start by removing all of the hand-written Swift code under ./ios and then restoring parts of it later if necessary.
Edit /megazords/ios-rust/MozillaTestServices.h to remove any references to Rust<ComponentName>API.h, replacing them with the UniFFI-generated header file name <component_name>FFI.h.
Open /megazords/ios-rust/MozillaTestServices.xcodeproj in Xcode and follow the instructions for adding Swift bindings for a new component to configure Xcode to build your UniFFI-generated bindings.
While you are in the Xcode Project Navigator, you should also delete any references to Rust<ComponentName>API.h or to the old hand-written Swift wrappers. (They should be highlighted in red in the Project Navigator, because the files will be missing from disk after you deleted them above.)
This should be all that's required to set up UniFFI to build the Swift bindings. Try building the project in Xcode to confirm.
The UniFFI-generated Swift code will be under ios/Generated and may be useful for debugging.
If there are existing Swift tests for the component, the next step is to get those passing:
./automation/run_ios_tests.sh
- (or run them from the Xcode GUI)
As noted above, it is normal and expected for the autogenerated bindings to be subtly different from the previous hand-written ones. Many existing components have small naming differences between the Rust code and the hand-written Swift bindings, which UniFFI will not allow.
If the component had functionality in its Swift layer that was not part of the Rust API, then you'll need to add some hand-written Swift code under ./ios/<ComponentName> to implement it. The fxa-client component may be a good example here: its Rust layer exposes a FirefoxAccount struct that the Swift code wraps into a PersistedFirefoxAccount class, adding the ability to set a persistence callback.
You will need to add any such file to the "Compile Sources" list in Xcode, in the same way that you added the .udl file.
Finally, you will need to try out the new bindings with a consuming app. For Swift code, you should make a local build of Firefox iOS; you can do that by following the steps in this document.
Rust + Android FAQs
How do I expose Rust code to Kotlin?
Use UniFFI, which can produce Kotlin bindings for your Rust code from an interface definition file.
If UniFFI doesn't currently meet your needs, please open an issue to discuss how the tool can be improved.
As a last resort, you can make hand-written bindings from Rust to Kotlin, essentially manually performing the steps that UniFFI tries to automate for you: flatten your Rust API into a bunch of pub extern "C" functions, then use JNA to call them from Kotlin. The details of how to do that are well beyond the scope of this document.
How should I name the package?
Published packages should be named org.mozilla.appservices.$NAME where $NAME is the name of your component, such as logins. The Java namespace in which your package defines its classes etc. should be mozilla.appservices.$NAME.*.
How do I publish the resulting package?
Add it to .buildconfig-android.yml in the root of this repository. This will cause it to be automatically included as part of our release publishing pipeline.
How do I know what library name to load to access the compiled rust code?
Assuming that you're building the Rust code as part of the application-services build and release process, your pub extern "C" API should always be available from a file named libmegazord.so.
What challenges exist when calling back into Kotlin from Rust?
There are a number of them. The issue boils down to the fact that you need to be completely certain that a JVM is associated with a given thread in order to call java code on it. The difficulty is that the JVM can GC its threads and will not let rust know about it.
JNA can work around this for us to some extent, at the cost of some complexity. The approach it takes is essentially to spawn a thread for each callback invocation. If you are certain you’re going to do a lot of callbacks and they all originate on the same thread, you can have them all run on a single thread by using the CallbackThreadInitializer.
With the help of JNA's workarounds, calling back from Rust into Kotlin isn’t too bad so long as you ensure that Kotlin cannot GC the callback while rust code holds onto it (perhaps by stashing it in a global variable), and so long as you can either accept the overhead of extra threads being instantiated on each call or are willing to manage the threads explicitly.
Note that the situation would be somewhat better if we used JNI directly (and not JNA), but this would cause us to need to generate different Rust FFI code for Android than for iOS.
Ultimately, in any case where there is an alternative to using a callback, you should probably pursue that alternative.
For example, if you're using callbacks to implement async I/O, it's likely better to move to doing a blocking call, and have the calling code dispatch it on a background thread. Running such calls on a background thread is very easy in Kotlin, is in line with the Android documentation on JNI usage, and in our experience is vastly simpler and less painful than using callbacks.
(Of course, not every case is solvable like this).
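To make the "blocking call, caller picks the thread" advice concrete, here is a small sketch. In practice the caller would be Kotlin code dispatching onto a background thread; to keep the example self-contained, std::thread and a channel stand in for that here. fetch_profile and fetch_in_background are hypothetical names.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

/// Hypothetical blocking API exposed by the component: it does the work
/// and returns, instead of taking a callback to invoke later.
pub fn fetch_profile(user_id: u32) -> String {
    // Stand-in for slow I/O.
    thread::sleep(Duration::from_millis(10));
    format!("profile for user {}", user_id)
}

/// The *caller* decides where the blocking work runs. Here a std thread
/// plus a channel plays the role of a Kotlin background dispatcher.
pub fn fetch_in_background(user_id: u32) -> mpsc::Receiver<String> {
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        // Ignore the send error if the receiver has already gone away.
        let _ = tx.send(fetch_profile(user_id));
    });
    rx
}
```

No Rust code ever holds a reference to foreign code here, so there is nothing for the JVM to GC out from under it.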
Why are we using JNA rather than JNI, and what tradeoffs does that involve?
We get a couple of things from using JNA that we wouldn't get with JNI.
- We are able to use the same Rust FFI code on all platforms. If we used JNI we'd need to generate an Android-specific Rust FFI crate that used the JNI APIs, and a separate Rust FFI crate for exposing to Swift.
- JNA provides a mapping of threads to callbacks for us, making callbacks over the FFI possible. That said, in practice this is still error-prone and easy to misuse in ways that cause memory safety bugs, but it's required for cases like logging, among others, so it is a nontrivial piece of complexity we'd otherwise have to reimplement.
However, it comes with the following downsides:
- JNA has bugs. In particular, it's not safe to use bools with it: it thinks they are 32 bits, when on most platforms (every platform Rust supports) they are 8 bits. They've been unwilling to fix the issue due to it breaking backwards compatibility (which is... somewhat fair, there is a lot of C89 code out there that uses bool as a typedef for a 32-bit int).
- JNA makes it really easy to do the wrong thing and have it work but corrupt memory. Several of the caveats around this are documented in the ffi_support docs, but a major one is when to use Pointer vs String (getting this wrong will often work, but may corrupt memory).
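The bool size mismatch is easy to see from the Rust side. In hand-written bindings, one common workaround (an assumption here, not something this guide prescribes) is to pass a fixed-size integer across the FFI and convert on each side. example_is_enabled is a hypothetical function; a real export would also need #[no_mangle] (spelled #[unsafe(no_mangle)] in the 2024 edition), which is left off so the sketch compiles under any edition.

```rust
use std::mem::size_of;

/// Returns a u8 rather than a bool, so a 1-byte value is never read by
/// JNA as a 4-byte one. The foreign side treats non-zero as true.
pub extern "C" fn example_is_enabled() -> u8 {
    let enabled: bool = true; // hypothetical internal state
    enabled as u8
}
```

Rust guarantees size_of::<bool>() is 1, which is exactly what JNA's default mapping gets wrong.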
We aim to avoid triggering these bugs by auto-generating the JNA bindings rather than writing them by hand.
How do I debug Rust code with the step-debugger in Android Studio?
- Uncomment the packagingOptions { doNotStrip "**/*.so" } line from the build.gradle file of the component you want to debug.
- In the Rust code, either:
  - Cause something to crash where you want the breakpoint. Note: Panics don't work here, unfortunately. (I have not found a convenient way to set a breakpoint in Rust code, so unsafe { std::ptr::write_volatile(0 as *mut u8, 1u8) } is usually what I do.)
  - If you manage to get an LLDB prompt, you can set a breakpoint using breakpoint set --name foo, or breakpoint set --file foo.rs --line 123. I don't know how to bring up this prompt reliably, so I often do step 1 to get it to appear, delete the crashing code, and then set the breakpoint using the CLI. This is admittedly suboptimal.
- Click the Debug button in Android Studio, to display the "Select Deployment Target" window.
- Make sure the debugger selection is set to "Both". This tends to unset itself, so double-check it.
- Click "Run", and debug away.
Breaking changes in application-services code
Application-services components are consumed by multiple consumers including Firefox Android, Firefox iOS, Focus Android, and Focus iOS. To minimize the disruption to those projects when making breaking API changes, we follow a simple rule: Have approved PRs ready to land that fix the breakage in the other repos before merging the PR into application-services.
This means writing code for the firefox-android and firefox-ios repositories that resolves any breaking changes, creating a PR in those repositories, and waiting for it to be approved.
You can test this code locally using the autopublish flow (Android, iOS) and use the branch build system to run CI tests.
Merging
Do not merge any PRs until all are approved. Once they are all approved then:
- Merge the application-services PR into main
- Manually trigger a new nightly build using the taskcluster hook: https://firefox-ci-tc.services.mozilla.com/hooks/project-releng/cron-task-mozilla-application-services%2Fnightly
- Once the nightly task completes, trigger a new rust-components-swift build using the github action: https://github.com/mozilla/rust-components-swift/actions/workflows/update-as-nightly.yml
- Update the firefox-android and firefox-ios PRs to use the newly built nightly:
  - Ideally, get the PRs merged before the next day's firefox-android/firefox-ios nightly bump. If you don't get these merged, then the nightly bump PR will fail. Add a link to your PR in the nightly bump PR so the mobile teams know how to fix this.
Vendoring Application Services into mozilla-central
Some of these components are used in mozilla-central. This document describes how to update existing components or add new components.
The general process for vendoring rust code into mozilla-central has its own documentation - please make sure you read that before continuing.
When to vendor
We want to keep our versions in moz-central relatively up-to-date, but it takes some manual effort to do. The main possibility of breakage is from a dependency mismatch, so our current vendoring policy is:
- Whenever a 3rd-party dependency is added or updated, the dev who made the change is responsible for vendoring.
- At the start of the release cycle the triage owner is responsible for vendoring.
Updating existing components
To update components which are already in mozilla-central, follow these steps:
- Ensure your mozilla-central build environment is set up correctly to make "non-artifact" builds - check you can get a full working build before starting this process.
- Run ./tools/update-moz-central-vendoring.py [path-to-moz-central] from the application-services root directory.
- If this generates errors regarding duplicate crates, you will enter a world of pain, and probably need to ask for advice from the application-services team, and/or the #build channel on matrix.
- Run ./mach cargo vet to check if there are any new dependencies that need to be vetted. If there are, ask for advice from the application-services team.
- Build and test your tree. Ideally make a try run.
- Put your patch up on phabricator, requesting review from, at least, someone on the application-services team and one of the "build peers" - asking on #build on matrix for a suitable reviewer might help. Alternatively, try and find the bug which made the most recent update and ask the reviewer of that patch.
- Profit!
Adding a new component
Follow the Uniffi documentation on mozilla-central to understand where you'll need to add your crate path and UDL. In general:
- The consuming component will specify the dependency as a nominal "version 0.1"
- The top-level Cargo.toml will override that dependency with a specific git revision.
For example, consider the webext-storage crate:
- The consuming crate specifies version 0.1
- The top-level Cargo.toml specifies the exact revision.
Adding a new component implies there will be related mozilla-central changes which leverage it. The best practice here is to land both the vendoring of the new component and the related mozilla-central changes in the same bug, but in different phabricator patches. As noted above, the best practice is that all application-services components are on the same revision, so adding a new component implies you will generally also be updating all the existing components.
For an example of a recently added component, see the tabs component, which was recently added to mozilla-central with UniFFI and shows the general process to follow.
Vendoring an unreleased version for testing purposes
Sometimes you will need to make changes in application-services and in mozilla-central simultaneously - for example, you may need to add new features or capabilities to a component, and matching changes in mozilla-central to use that new feature.
In that scenario, you don't want to check your changes in and re-vendor as you iterate - it would be far better to use a local checkout of application-services with uncommitted changes with your mozilla-central tree which also has uncommitted changes.
To do this, you can edit the top-level Cargo.toml to specify a path. Note, however, that in this scenario you need to specify the path to the individual component rather than to the top level of the repo.
For example, you might end up with something like:
# application-services overrides to make updating them all simpler.
interrupt-support = { path = "../application-services/components/support/interrupt" }
relevancy = { path = "../application-services/components/relevancy" }
suggest = { path = "../application-services/components/suggest" }
sql-support = { path = "../application-services/components/support/sql" }
sync15 = { path = "../application-services/components/sync15" }
tabs = { path = "../application-services/components/tabs" }
viaduct = { path = "../application-services/components/viaduct" }
webext-storage = { path = "../application-services/components/webext-storage" }
Note that when you first do this, it will still be necessary to run ./mach vendor rust and to re-build.
After you make a change to the local repository, you do not need to run ./mach vendor rust, but you do still obviously need to rebuild.
Once you are happy with all the changes, you would:
- Open a PR up in application-services and land your changes there.
- Follow the process above to re-vendor your new changes, and in that same bug (although not necessarily the same phabricator patch), include the other mozilla-central changes which rely on the new version.
Application Services Logging
When writing code in application-services, code implemented in Rust, Kotlin, Java, or Swift might have to write debug logs. To do so, one should generally log using the normal logging facilities for the language. Where the logs go depends on the application which is embedding the components.
Accessing logs when running Fenix
On Android, logs currently go to logcat. (This may change in the future.) Android Studio can be used to view the logcat logs; connect the device over USB and view the Logcat tab at the bottom of Android Studio. Check to make sure you have the right device selected at the top left of the Logcat pane, and the correct process to the right of that. One trick to avoid having to select the correct process (as there are main and content processes) is to choose "No Filters" from the menu on the top right of the Logcat pane. Then, use the search box to search for the log messages you are trying to find.
There are also many other utilities, command line and graphical, that can be used to view logcat logs from a connected Android device in a more flexible manner.
Changing the loglevel in Fenix
If you need more verbose logging, after the call to RustLog.enable() in FenixApplication, you may call RustLog.setMaxLevel(Log.Priority.DEBUG, true).
Accessing logs when running iOS
[TODO]
UniFFI object destruction on Kotlin
UniFFI supports interface objects, which are implemented by Boxing a Rust object and sending the raw pointer to the foreign code. Once the objects are no longer in use, the foreign code needs to destroy the object and free the underlying resources.
This is slightly tricky on Kotlin. The prevailing Java wisdom is to use explicit destructors and avoid using finalizers for destruction, which means we can't simply rely on the garbage collector to free the pointer. The wisdom seems simple to follow, but in practice it can be difficult to know how to apply it to specific situations. This document provides guidelines for handling UniFFI objects.
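To make the pointer handoff described above concrete, here is a minimal Rust sketch of the ownership pattern: a constructor boxes the object and returns a raw pointer, method calls borrow through that pointer, and a destructor, which the foreign side must call exactly once, rebuilds the Box so that Rust can free it. The functions some_object_new, some_object_get_value and some_object_free are hypothetical, and this is not the scaffolding UniFFI actually generates (which differs in its details); it only shows why skipping the destructor leaks the object.

```rust
use std::cell::Cell;
use std::rc::Rc;

/// Hypothetical stand-in for a Rust object exposed to Kotlin as a UniFFI
/// interface object. `drops` counts how many times an instance was freed.
pub struct SomeObject {
    value: u32,
    drops: Rc<Cell<u32>>,
}

impl Drop for SomeObject {
    fn drop(&mut self) {
        self.drops.set(self.drops.get() + 1);
    }
}

/// Hypothetical constructor called from foreign code: box the object and hand
/// out a raw pointer. Rust will not free it until the destructor is called.
pub fn some_object_new(value: u32, drops: Rc<Cell<u32>>) -> *mut SomeObject {
    Box::into_raw(Box::new(SomeObject { value, drops }))
}

/// Hypothetical method call: borrow through the pointer without taking ownership.
///
/// # Safety
/// `ptr` must come from `some_object_new` and must not have been freed yet.
pub unsafe fn some_object_get_value(ptr: *const SomeObject) -> u32 {
    unsafe { (*ptr).value }
}

/// Hypothetical destructor the foreign code must call exactly once. Rebuilding
/// the Box hands ownership back to Rust, so `Drop` runs and memory is released.
///
/// # Safety
/// `ptr` must come from `some_object_new` and must not be used after this call.
pub unsafe fn some_object_free(ptr: *mut SomeObject) {
    drop(unsafe { Box::from_raw(ptr) });
}
```

If the foreign code never calls the destructor, Drop never runs and the memory leaks; if it calls it twice, the behavior is undefined. The guidelines below are about who makes that call, and when.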
You can create objects in a function if you also destroy them there
The simplest way to get destruction right is to create an object and destroy it in the same function. The use function makes this really easy:
SomeUniFFIObject()
.use { obj ->
obj.doSomething()
obj.doSomethingElse()
}
You can create and store objects in singletons
If we are okay with UniFFI objects living for the entire application lifetime, then they can be stored in singletons. This is how we handle our database connections, for example SyncableLoginsStorage and PlacesReaderConnection.
You can create and store objects in a class, then destroy them in a corresponding lifecycle method
UniFFI objects can be stored in classes like the Android Fragment class that have a defined lifecycle, with methods called at different stages. Classes can construct UniFFI objects in one of the lifecycle methods, then destroy them in the corresponding one. For example, an object can be created in Fragment.onCreate() and destroyed in Fragment.onDestroy().
You can share objects
Several classes can hold references to an object, as long as (exactly) one class is responsible for managing it and destroying it when it's not used. A good example is the GeckoLoginStorageDelegate. The LoginStorage is initialized and managed by another object, and GeckoLoginStorageDelegate is passed a (lazy) reference to it.
Care should be taken to ensure that once the managing class destroys the object, no other class attempts to use it. If they do, then the generated code will raise an IllegalStateException. This clearly should be avoided, although it won't result in memory corruption.
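The sharing rule (exactly one owner destroys the object, and any later use fails cleanly instead of corrupting memory) can be modelled with a small Rust analogy. In this sketch all names are hypothetical: clones of SharedHandle play the classes holding a reference, destroy plays the managing class's cleanup, and UseError::AlreadyDestroyed stands in for the IllegalStateException raised by the generated Kotlin code. It is an illustration of the pattern, not UniFFI's implementation.

```rust
use std::cell::RefCell;
use std::rc::Rc;

/// Hypothetical stand-in for the IllegalStateException thrown by generated code.
#[derive(Debug, PartialEq)]
pub enum UseError {
    AlreadyDestroyed,
}

/// Hypothetical resource, playing the part of the LoginStorage object.
pub struct LoginStorage {
    pub name: String,
}

/// Every holder gets a clone; all clones share one slot, which is emptied
/// when the managing holder calls `destroy`.
#[derive(Clone)]
pub struct SharedHandle {
    slot: Rc<RefCell<Option<LoginStorage>>>,
}

impl SharedHandle {
    pub fn new(storage: LoginStorage) -> Self {
        SharedHandle { slot: Rc::new(RefCell::new(Some(storage))) }
    }

    /// Any holder may call this; once the object is gone it returns an error.
    pub fn name(&self) -> Result<String, UseError> {
        match &*self.slot.borrow() {
            Some(storage) => Ok(storage.name.clone()),
            None => Err(UseError::AlreadyDestroyed),
        }
    }

    /// Only the managing holder should call this. Taking the value out of the
    /// slot drops it immediately, releasing the underlying resource.
    pub fn destroy(&self) {
        drop(self.slot.borrow_mut().take());
    }
}
```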
Destruction may not always happen
Destructors may not run when a process is killed, which can easily happen on Android. This is especially true of lifecycle methods. This is normally fine, since the OS will close resources like file handles and network connections on its own. However, be aware that custom code in the destructor may not run.
Architectural Decision Log
This log lists the architectural decisions for MADR.
- ADR-0000 - Use Markdown Architectural Decision Records
- ADR-0001 - Update Logins API
- ADR-0002 - Handling Database Corruption
- ADR-0003 - Distributing Swift Packages
- ADR-0004 - Running experiments on first run early startup
- ADR-0005 - A remote-settings client for our mobile browsers.
- ADR-0007 - Limit Visits Migrated to Places History in Firefox iOS
For new ADRs, please use template.md as basis. More information on MADR is available at https://adr.github.io/madr/. General information about architectural decision records is available at https://adr.github.io/.
Use Markdown Architectural Decision Records
Context and Problem Statement
We want to record architectural decisions made in this project. Which format and structure should these records follow?
Considered Options
- MADR 2.1.2 – The Markdown Architectural Decision Records
- Michael Nygard's template – The first incarnation of the term "ADR"
- Sustainable Architectural Decisions – The Y-Statements
- Other templates listed at https://github.com/joelparkerhenderson/architecture_decision_record
- Formless – No conventions for file format and structure
Decision Outcome
Chosen option: "MADR 2.1.2", because
- Implicit assumptions should be made explicit. Design documentation is important to enable people to understand the decisions later on. See also A rational design process: How and why to fake it.
- The MADR format is lean and fits our development style.
- The MADR structure is comprehensible and facilitates usage & maintenance.
- The MADR project is vivid.
- Version 2.1.2 is the latest one available when starting to document ADRs.
Update Logins API
- Status: accepted
- Date: 2021-06-17
Technical Story: #4101
Context and Problem Statement
We no longer want to depend on SQLCipher; we want to use SQLite directly because of build complexity and concerns over the long-term future of the Rust bindings. The encryption approach taken by SQLCipher means that in practice, the entire database is decrypted at startup, even if the logins functionality is not interacted with, defeating some of the benefits of using an encrypted database.
The per-field encryption in autofill, which we are planning to replicate in logins, separates the storage and encryption logic by limiting the storage layer to the management of encrypted data. Applying this approach in logins will break the existing validation and deduping code so we need a way to implement per-field encryption while supporting the validation and de-duping behavior.
Decision Drivers
- Addressing previously identified deficiencies in the logins API while we are breaking the API for the encryption work
- Continuing to support the existing logins validation and deduping logic
- Avoiding the implementation of new security approaches that may require additional time and security resources
- Establishing a standard encryption approach across components
Considered Options
- Option 1 - Reduce the API functions that require the encryption key and pass the key to the remaining functions
- Option 2 - Keep the general shape of the API that is in place now - the app can pass the encryption key at any time to "unlock" the API, and re-lock it at any time, but the API in its entirety is only available when unlocked
Decision Outcome
Chosen Option: "Reduce the API functions that require the encryption key and pass the key to the remaining functions" because it will not require a security review, as it is similar to the approach we have already established in the codebase.
Pros and Cons of the Options
Option 1 - Reduce the API functions that require the encryption key and pass the key to the remaining functions
-
Description
Currently the below logins API functions would require the per-field encryption key:
-
Note:
- Functions related to sync have been omitted as it is assumed they will have access to decrypted data.
- The get_all, get_by_base_domain, and get_by_id functions will require the encryption key because they call the validate and fixup logic, not because we want to return logins with decrypted data.
Proposed changes:
- Combine the add and update functions into a new add_or_update function
  - This will allow the removal of consumer code that distinguishes when a login record should be created or updated
  - Note: This function needs the encryption key for the fixup and deduping logic and for continued support of the accurate population of the time_password_changed field
- Pass the per-field encryption key to the import_multiple function
  - This function will be removed once the Fennec to Fenix migration period ends
- Remove both the potential_dupes_ignoring_username and check_valid_with_no_dupes functions from the API
  - Neither function is called in Firefox iOS
  - Android Components uses both to provide validation and de-duping before logins are added or updated, so we can eliminate the need to externalize these functions by replicating this logic in the new add_or_update function
- Create a decrypt_and_fixup_login function that both decrypts a login and performs the validate and fixup logic
  - This will eliminate the need for the get_all, get_by_base_domain, and get_by_id API functions to perform the fixup logic
Making the above changes will reduce the API functions requiring the encryption key to the following:
- add_or_update
- decrypt_and_fixup_login
- import_multiple
-
Pros
- Improves the logins API for consumers by combining add/update functionality (see #3899 for details)
- Removes redundant validation and de-duping logic in consumer code
- Uses the same encryption model as autofill so there is consistency in our approaches
-
Cons
- Requires consumer code to both encrypt login fields and pass the encryption key when calling either add_or_update or import_multiple
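A rough Rust sketch of why the proposed add_or_update still needs the key even though callers pass in already-encrypted fields: de-duping against existing logins and deciding whether to update time_password_changed both require comparing plaintext. Every type and function here is hypothetical and simplified (this is not the logins component's real API), and EncKey is a toy XOR "cipher" used only so the example runs with the standard library; it is not real encryption.

```rust
/// Hypothetical toy key. XOR is NOT real encryption. A real scheme uses random
/// nonces, so equal plaintexts give different ciphertexts and records can only
/// be compared after decrypting them, which is why the key is required.
pub struct EncKey(pub Vec<u8>);

impl EncKey {
    fn xor(&self, data: &[u8]) -> Vec<u8> {
        data.iter().zip(self.0.iter().cycle()).map(|(b, k)| b ^ k).collect()
    }
    pub fn encrypt(&self, plain: &str) -> Vec<u8> {
        self.xor(plain.as_bytes())
    }
    pub fn decrypt(&self, cipher: &[u8]) -> String {
        String::from_utf8_lossy(&self.xor(cipher)).into_owned()
    }
}

/// Hypothetical input from the consumer: origin in the clear, secure fields encrypted.
pub struct LoginEntry {
    pub origin: String,
    pub username_enc: Vec<u8>,
    pub password_enc: Vec<u8>,
}

/// Hypothetical stored record, including the timestamp the ADR wants kept accurate.
pub struct StoredLogin {
    pub entry: LoginEntry,
    pub time_password_changed: u64,
}

pub struct LoginStore {
    pub logins: Vec<StoredLogin>,
}

impl LoginStore {
    /// Sketch of add_or_update: returns the index of the added or updated record.
    pub fn add_or_update(&mut self, new: LoginEntry, key: &EncKey, now: u64) -> usize {
        let username = key.decrypt(&new.username_enc);
        // De-duping: decrypt existing usernames to find a login for the same account.
        let existing = self.logins.iter().position(|l| {
            l.entry.origin == new.origin && key.decrypt(&l.entry.username_enc) == username
        });
        match existing {
            Some(i) => {
                let stored = &mut self.logins[i];
                // Only bump the timestamp if the plaintext password really changed.
                if key.decrypt(&stored.entry.password_enc) != key.decrypt(&new.password_enc) {
                    stored.time_password_changed = now;
                }
                stored.entry = new;
                i
            }
            None => {
                self.logins.push(StoredLogin { entry: new, time_password_changed: now });
                self.logins.len() - 1
            }
        }
    }
}
```

Because a single call both creates and updates, the consumer no longer needs its own add-versus-update and de-duping logic, which is the benefit listed under Pros.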
Option 2 - Implement a different key management approach
-
Description
Unlike the first option, the publicly exposed login API would only handle decrypted login records and all encryption is internal (which works because we always have the key). Any attempt to use the API will fail as the login records are not encrypted or decrypted if the key is not available.
Proposed changes:
- Combine the add and update functions into add_or_update
- Remove both the potential_dupes_ignoring_username and check_valid_with_no_dupes functions from the API
-
Pros
- Prevents the consumer from having to encrypt or decrypt login records
- Maintains our current fixup and validation approach
- Improves the logins API for consumers by combining add/update functionality
- Removes redundant validation and de-duping logic in consumer code
-
Cons
- Makes us responsible for securing the encryption key and will most likely require a security review
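For contrast, a minimal Rust sketch of the Option 2 shape: the store itself holds the key after an unlock call, callers only ever pass and receive plaintext, and every operation fails while the store is locked. The type LockableStore, its methods, and the single-byte XOR "key" are all hypothetical and for illustration only; XOR is not real encryption.

```rust
/// Hypothetical error returned by every operation while no key is held.
#[derive(Debug, PartialEq)]
pub enum StoreError {
    Locked,
}

/// Hypothetical store: callers see plaintext only; encryption is internal.
#[derive(Default)]
pub struct LockableStore {
    key: Option<u8>, // toy single-byte XOR key, NOT real encryption
    passwords_enc: Vec<Vec<u8>>,
}

impl LockableStore {
    pub fn unlock(&mut self, key: u8) {
        self.key = Some(key);
    }

    pub fn lock(&mut self) {
        self.key = None;
    }

    pub fn add(&mut self, password: &str) -> Result<(), StoreError> {
        let key = self.key.ok_or(StoreError::Locked)?;
        self.passwords_enc.push(password.bytes().map(|b| b ^ key).collect());
        Ok(())
    }

    pub fn get_all(&self) -> Result<Vec<String>, StoreError> {
        let key = self.key.ok_or(StoreError::Locked)?;
        Ok(self
            .passwords_enc
            .iter()
            .map(|enc| {
                let plain: Vec<u8> = enc.iter().map(|b| b ^ key).collect();
                String::from_utf8_lossy(&plain).into_owned()
            })
            .collect())
    }
}
```

The con above follows directly from this shape: the library, not the consumer, keeps the key in memory for as long as the store is unlocked.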
Links
Handling Database Corruption
- Status: accepted
- Date: 2021-06-08
Context and Problem Statement
Some of our users have corrupt SQLite databases and this makes the related component unusable. The best way to deal with corrupt databases is to simply delete the database and start fresh (#2628). However, we only want to do this for persistent errors, not transient errors like programming logic errors, disk full, etc. This ADR deals with 2 related questions:
- A) When and how do we identify corrupted databases?
- B) What do we do when we identify corrupted databases?
Decision Drivers
- Deleting valid user data should be avoided at almost any cost
- Keeping a corrupted database around is almost as bad. It currently prevents the component from working at all.
- We don't currently have a good way to distinguish between persistent and transient errors, but this can be improved by reviewing telemetry and sentry data.
Considered Options
- A) When and how do we identify corrupted databases?
- 1: Assume all errors when opening a database are from corrupt databases
- 2: Check for errors when opening a database and compare against known corruption error types
- 3: Check for errors for all database operations and compare against known corruption error types
- B) What do we do when we identify corrupted databases?
- 1: Delete the database file and recreate the database
- 2: Move the database file and recreate the database
- 3: Have the component fail
Decision Outcome
- A2: Check for errors when opening a database and compare against known corruption error types
- B1: Delete the database file and recreate the database
Decision B follows from the choice of A. Since we're being conservative in identifying errors, we can delete the database file with relative confidence.
"Check for errors for all database operations and compare against known corruption error types" also seems like a reasonable solution that we may pursue in the future, but we decided to wait for now. Checking for errors during opening time is the simpler solution to implement and should fix the issue in many cases. The plan is to implement that first, then monitor sentry/telemetry to decide what to do next.
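The chosen combination (A2 + B1) can be sketched in Rust using only the standard library. Here a hypothetical try_open stands in for opening a real connection: it treats a file that lacks the 16-byte SQLite header ("SQLite format 3\0") as known corruption and reports every other I/O failure as a separate, non-corruption error. A real implementation would instead inspect the error codes returned by SQLite (for example SQLITE_CORRUPT or SQLITE_NOTADB); the point of the sketch is that only the corruption case leads to deleting and recreating the file, and only at open time.

```rust
use std::fs;
use std::io::{self, Read};
use std::path::Path;

/// Every SQLite database file starts with this 16-byte header string.
const SQLITE_HEADER: &[u8; 16] = b"SQLite format 3\0";

/// Hypothetical error type for this sketch.
#[derive(Debug)]
pub enum OpenError {
    /// A known corruption signature: the file exists but is not a database.
    Corrupt,
    /// Everything else (permissions, disk full, ...) is treated as possibly transient.
    Io(io::Error),
}

/// Hypothetical stand-in for opening a connection: creates a fresh database
/// file if none exists, otherwise checks that the existing file has a valid header.
fn try_open(path: &Path) -> Result<(), OpenError> {
    if !path.exists() {
        return fs::write(path, SQLITE_HEADER).map_err(OpenError::Io);
    }
    let mut file = fs::File::open(path).map_err(OpenError::Io)?;
    let mut header = [0u8; 16];
    match file.read_exact(&mut header) {
        Ok(()) if &header == SQLITE_HEADER => Ok(()),
        Ok(()) => Err(OpenError::Corrupt),
        // A file too short to hold the header cannot be a valid database either.
        Err(e) if e.kind() == io::ErrorKind::UnexpectedEof => Err(OpenError::Corrupt),
        Err(e) => Err(OpenError::Io(e)),
    }
}

/// A2 + B1: returns Ok(true) if the database had to be deleted and recreated.
/// Only OpenError::Corrupt triggers deletion; other errors are returned as-is.
pub fn open_or_recreate(path: &Path) -> Result<bool, OpenError> {
    match try_open(path) {
        Ok(()) => Ok(false),
        Err(OpenError::Corrupt) => {
            fs::remove_file(path).map_err(OpenError::Io)?;
            try_open(path)?;
            Ok(true)
        }
        Err(other) => Err(other),
    }
}
```

Passing non-corruption errors straight through is what keeps a transient failure, such as a full disk, from ever deleting valid user data.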
Pros and Cons of the Options
A1: Assume all errors when opening a database are from corrupt databases
- Good, because the sentry data indicates that many errors happen during opening time
- Good, because migrations are especially likely to trigger corruption errors
- Good, because it's a natural time to delete the database -- the consumer code hasn't run any queries yet and doesn't have any open connections.
- Bad, because it will delete valid user data in several situations that are relatively common: migration logic errors, OOM errors, disk full.
A2: Check for errors when opening a database and compare against known corruption error types (Decided)
- Good, because it should eliminate the possibility of deleting valid user data.
- Good, because the sentry data indicates that many errors happen during opening time
- Good, because it's a natural time to delete the database -- the consumer code hasn't run any queries yet and doesn't have any open connections.
- Bad, because we don't currently have a good list of corruption errors
A3: Check for errors for all database operations and compare against known corruption error types
- Good, because the sentry data indicates that many errors happen outside of opening time
- Good, because it should eliminate the possibility of deleting valid user data.
- Bad, because the consumer code probably doesn't expect the database to be deleted and recreated in the middle of a query. However, this is just an extreme case of normal database behavior -- for example any given row can be deleted during a sync.
- Bad, because we don't currently have a good list of corruption errors
B1: Delete the database file and recreate the database (Decided)
- Good, because it would allow users with corrupted databases to use the affected components again
- Bad, because any misidentification will lead to data loss.
B2: Move the database file and recreate the database
This option would be similar to B1, but instead of deleting the file we would move it to a backup location. On startup, we could look for backup files and try to import lost data.
- Good, because if we misidentify corrupt databases, then we have the possibility of recovering the data
- Good, because it allows a way for users to delete their data (in theory). If the consumer code executed a wipe() on the database, we could also delete any backup data.
- Bad, because it's very difficult to write a recovery function that merges the backed-up data with any new data. This function would be fairly hard to test and it would be easy to introduce a new logic error.
- Bad, because it adds significant complexity to the database opening code
- Bad, because the user experience would be strange. A user would open the app, discover that their data was gone, then sometime later discover that the data is back again.
B3: Return a failure code
- Good, because this option leaves no chance of user data being deleted
- Good, because it's the simplest to implement
- Bad, because the component will not be usable if the database is corrupt
- Bad, because the user's data is potentially exposed in the corrupted database file and we don't provide any way for them to delete it.
Distributing Swift Packages
- Status: accepted
- Deciders: rfkelly
- Date: 2021-07-22
Context and Problem Statement
Our iOS consumers currently obtain application-services as a pre-compiled .framework bundle distributed via Carthage. The current setup is not compatible with building on new M1 Apple Silicon machines and has a number of other problems.
As part of a broader effort to modernize the build process of iOS applications at Mozilla, we have been asked to re-evaluate how application-services components are distributed for iOS.
See Problems with the current setup for more details.
Decision Drivers
- Ease-of-use for iOS consumers.
- Compatibility with M1 Apple Silicon machines.
- Consistency with other iOS components being developed at Mozilla.
- Ability for the Nimbus Swift bindings to easily depend on Glean.
- Ease of maintainability for application-services developers.
Considered Options
- (A) Do Nothing
- Keep our current build and distribution setup as-is.
- (B) Use Carthage to build XCFramework bundles
- Make a minimal change to our Carthage setup so that it builds the newer XCFramework format, which can support M1 Apple Silicon.
- (C) Distribute a single pre-compiled Swift Package
  - Convert the all-in-one MozillaAppServices Carthage build to a similar all-in-one Swift Package, distributed as a binary artifact.
- (D) Distribute multiple source-based Swift Package targets, with pre-compiled Rust code
  - Split the all-in-one MozillaAppServices Carthage build into a separate Swift Package target for each component, with a shared dependency on pre-compiled Rust code as a binary artifact.
Decision Outcome
Chosen option: (D) Distribute multiple source-based Swift Packages, with pre-compiled Rust code.
This option will provide the best long-term consumer experience for iOS developers, and has the potential to simplify maintenance for application-services developers after an initial investment of effort.
Positive Consequences
- Swift packages are very convenient to consume in newer versions of Xcode.
- Different iOS apps can choose to import a different subset of the available components, potentially helping keep application size down.
- Avoids issues with mis-matched Swift version between application-services build and consumers, since Swift files are distributed in source form.
- Encourages better conceptual separation between Swift code for different components; e.g. it will make it possible for two Swift components to define an item of the same name without conflicts.
- Reduces the need to use Xcode as part of application-services build process, in favour of command-line tools.
Negative Consequences
- More up-front work to move to this new setup.
- We may be less likely to notice if our build setup breaks when used from within Xcode, because we're not exercising that code path ourselves.
- May be harder to concurrently publish a Carthage framework for current consumers who aren't able to move to Swift packages.
- There is likely to be some amount of API breakage for existing consumers, if only in having to replace a single import MozillaAppServices with independent imports of each component.
Implementation Sketch
We will maintain the existing Carthage build infrastructure in the application-services repo and continue publishing a pre-built Carthage framework, to support firefox-ios until they migrate to Swift Packages.
We will add an additional iOS build task in the application-services repo that builds just the Rust code as a .xcframework bundle. An initial prototype shows that this can be achieved using a relatively straightforward shell script, rather than requiring a second Xcode project. It will be published as a .zip artifact on each release in the same way as the current Carthage framework.
The Rust code will be built as a static library, so that the linking process of the consuming application can pull in just the subset of the Rust code that is needed for the components it consumes.
We will initially include only Nimbus and its dependencies in the .xcframework bundle, but will eventually expand it to include all Rust components (including Glean, which will continue to be included in the application-services repo as a git submodule).
We will create a new repository rust-components-swift to serve as the root of the new Swift Package distribution. It will import the application-services repository as a git submodule. This will let us iterate quickly on the Swift packaging setup without impacting existing consumers.
We will initially include only Nimbus and its dependencies in this new repository, and the Nimbus Swift code will depend on Glean via the external glean-swift package. In the future we will publish all application-services components that have a Swift interface through this repository, as well as Glean and any future Rust components. (That's why the repository is being given a deliberately generic name.)
The rust-components-swift repo will contain a Package.swift file that defines:
- A single binary target that references the pre-built .xcframework bundle of Rust code.
- One Swift target for each component, that references the Swift code from the git submodule and depends on the pre-built Rust code.
We will add automation to the rust-components-swift repo so that it automatically tracks releases made in the application-services repo and creates a corresponding git tag for the Swift package.
At some future date when all consumers have migrated to using Swift packages, we will remove the Carthage build setup from the application-services repo.
At some future date, we will consider whether to move the Package.swift definition into the application-services repo, or whether it's better to keep it separate. (Attempting to move it into the application-services repo will involve non-trivial changes to the release process, because the checksum of the released .xcframework bundle needs to be included in the release tagged version of the Package.swift file.)
Pros and Cons of the Options
(A) Do Nothing
In this option, we would make no changes to our iOS build and publishing process.
- Good, because it's the least amount of work.
- Neutral, because it doesn't change the maintainability of the system for appservices developers.
- Neutral, because it doesn't change the amount of separation between Swift code for our various components.
- Neutral, because it doesn't address the Swift version incompatibility issues around binary artifacts.
- Bad, because it will frustrate consumers who want to develop on M1 Apple Silicon.
- Bad, because it may prevent consumers from migrating to a more modern build setup.
- Bad, because it would prevent consumers from consuming Glean as a Swift package; we would require them to use the Glean that is bundled in our build.
This option isn't really tractable for us, but it's included for completeness.
(B) Use Carthage to build XCFramework bundles
In this option, we would try to change our iOS build and publishing process as little as possible, but use Carthage's recent support for building platform-independent XCFrameworks in order to support consumers running on M1 Apple Silicon.
- Good, because the size of the change is small.
- Good, because we can support development on newer Apple machines.
- Neutral, because it doesn't change the maintainability of the system for appservices developers.
- Neutral, because it doesn't change the amount of separation between Swift code for our various components.
- Neutral, because it doesn't address the Swift version incompatibility issues around binary artifacts.
- Bad, because our iOS consumers have expressed a preference for moving away from Carthage.
- Bad, because other iOS projects at Mozilla are moving to Swift Packages, making us inconsistent with perceived best practice.
- Bad, because it would prevent consumers from consuming Glean as a Swift package; we would require them to use the Glean that is bundled in our build.
- Bad, because consumers don't get to choose which components they want to use (without us building a whole new "megazord" with just the components they want).
Overall, current circumstances feel like a good opportunity to invest a little more time in order to set ourselves up for better long-term maintainability and happier consumers. The main benefit of this option (it's quicker!) is less attractive under those circumstances.
(C) Distribute a single pre-compiled Swift Package
In this option, we would compile the Rust code and Swift code for all our components into a single .xcframework bundle, and then distribute that as a binary artifact via Swift Package. This is similar to the approach currently taken by Glean (ref Bug 1711447) except that they only have a single component.
- Good, because Swift Packages are the preferred distribution format for new iOS consumers.
- Good, because we can support development on newer Apple machines.
- Good, because it aligns with what other iOS component developers are doing at Mozilla.
- Neutral, because it doesn't change the maintainability of the system for appservices developers.
  - (We'd need to keep the current Xcode project broadly intact.)
- Neutral, because it doesn't change the amount of separation between Swift code for our various components.
- Neutral, because it doesn't address the Swift version incompatibility issues around binary artifacts.
- Neutral, because it would prevent consumers from consuming Glean as a separate Swift package; they'd have to get it as part of our all-in-one Swift package.
- Bad, because it's a larger change and we have to learn about a new package manager.
- Bad, because consumers don't get to choose which components they want to use (without building a whole new "megazord" with just the components they want).
Overall, this option would be a marked improvement on the status quo, but leaves out some potential improvements. For not that much more work, we can make some of the "Neutral" and "Bad" points here into "Good" points.
(D) Distribute multiple source-based Swift Packages, with pre-compiled Rust code
In this option, we would compile just the Rust code for all our components into a single .xcframework bundle and distribute that as a binary artifact via Swift Package. We would then declare a separate Swift source target for the Swift wrapper of each component, each depending on the compiled Rust code but appearing as a separate item in the Swift package definition.
- Good, because Swift Packages are the preferred distribution format for new iOS consumers.
- Good, because we can support development on newer Apple machines.
- Good, because it aligns with what other iOS component developers are doing at Mozilla.
- Good, because it can potentially simplify the maintenance of the system for appservices developers, by removing Xcode in favour of some command-line scripts.
- Good, because it introduces strict separation between the Swift code for each component, instead of compiling them all together in a single shared namespace.
- Good, because the Nimbus Swift package could cleanly depend on the Glean Swift package.
- Good, because consumers can choose which components they want to include.
- Good, because it avoids issues with Swift version incompatibility in binary artifacts.
- Bad, because it's a larger change and we have to learn about a new package manager.
The only downside to this option appears to be the amount of work involved, but an initial prototype has given us some confidence that the change is tractable and that it may lead to a system that is easier to maintain over time. It is thus our preferred option.
Appendix
Further Reading
- Bug 1711447 has good historical context on the work to move Glean to using a Swift Package.
- Some material on Swift Packages:
- Managing dependencies using the Swift Package Manager was a useful overview.
- Understanding Swift Packages and Dependency Declarations gives a bit of a deeper dive into having multiple targets with different names in a single package.
- Outputs of initial prototype:
- A prototype of Option (C): Nimbus + Glean as a pre-built XCFramework Swift Package
- A prototype of Option (D): Rust code as XCFramework plus a Multi-product Swift Package that depends on it.
- A video demo of the resulting consumer experience.
Problems with the current setup
It doesn't build for M1 Apple Silicon machines, because it's not possible to support both arm64 device builds and arm64 simulator builds in a single binary .framework.
Carthage is dispreferred by our current iOS consumers.
We don't have much experience with the setup on the current Application Services team, and many of its details are under-documented. Changing the build setup requires Xcode and some baseline knowledge of how to use it.
All components are built as a single Swift module, meaning they can see each other's internal symbols and may accidentally conflict when naming things. For example, we can't currently have two components that define a structure of the same name.
Consumers can only use the pre-built binary artifacts if they are using the same version of Xcode as was used during the application-services build. We are not able to use Swift's BUILD_LIBRARY_FOR_DISTRIBUTION flag to overcome this, because some of our dependencies do not support this flag (specifically, the Swift protobuf library).
Running experiments on first run early startup
- Status: rejected
- Deciders: teshaq, travis, k88hudson, jhugman, jaredlockhart
- Date: 2021-08-16
Technical Story: https://mozilla-hub.atlassian.net/browse/SDK-323
Context and Problem Statement
As an experimenter, I would like to run experiments early on a user's first run of the application. However, the experiment data is only available on the second run. We would like to have that experiment data available before the user's first run. For more information: https://docs.google.com/document/d/1Qw36_7G6XyHvJZdM-Hxh4nqYZyCsYajG0L5mO33Yd5M/edit
Decision Drivers
- Availability of experiments early on the first run
- No impact on experimentation data analysis
- Flexibility in creating experiments
- Ability to quickly disable experiments
- Simplicity of releases
- Mobile's expectations of Nimbus (the SDK should be idempotent)
Considered Options
- (A) Do Nothing
- Keep everything the way it is, preventing us from experimenting on users early on their first run
- (B) Bundle Experiment data with app on release
- On release, have an initial_experiments.json that defines the experiments that will be applied early on the first run
- Later on the first run, the client would retrieve the actual experiment data from remote-settings and overwrite the bundled data
- (C) Retrieve Experiment data on first run, and deal with delay
- We can retrieve the experiment data on the first run, experiment data however will not be available until after a short delay (network I/O + some disk I/O)
Decision Outcome
None of the options were feasible, so for now we are sticking with option (A) Do Nothing until there are experiments planned that are expected to run on early startup on the first run; then we will re-evaluate our options.
The (B) Bundle Experiment data with app on release option was rejected mainly due to difficulty in disabling experiments and pausing enrollments. This can create a negative user experience as it prevents us from disabling any problematic experiments. Additionally, it ties experiment creation with application release cycles.
The (C) Retrieve Experiment data on first run, and deal with delay option was rejected because the Nimbus SDK would no longer be idempotent, and because of the possibility of introducing undesirable UI.
Pros and Cons of the Options
Do nothing
- Good, because it keeps the flexibility in experiment creation
- Good, because disabling experiments can still be done remotely for all experiments
- Good, because it keeps the Nimbus SDK idempotent.
- Bad, because it doesn't address the main problem of exposing experiments to users on their first run
Bundle Experiment data with app on release
- Good, because it allows us to run experiments early on a user's first run
- Good, because it prevents us from having to wait for experiments, especially if a user has a slow network connection
- Bad, because it ties experiment creation with release cycles
- Bad, because it prevents us from disabling problematic first-run experiments without a dot release
- Bad, because it prevents us from pausing enrollment on first-run experiments without a dot release
- Bad, because it requires investment from the console team, and can modify existing flows.
Retrieve Experiment data on first run, and deal with delay
- Good, because it enables us to retrieve experiments for users on their first run
- Good, because it keeps the flexibility in experiment creation
- Good, because disabling experiments can still be done remotely for all experiments
- Bad, because experiments may not be ready early in the user's experience
- Bad, because it forces the customer application to deal with either the delay, or the configuration changing shortly after startup. For example, a loading spinner or a pre-onboarding screen not under experimental control, or delaying initialization of onboarding screens until after experiments have been loaded.
- Bad, because it changes the programming model from Nimbus being an idempotent configuration store to configuration changing non-deterministically.
- Bad, because the experimentation platform could force the app to add unchangeable user interface for the entire population. This itself may have an effect on key metrics.
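The idempotence concern in the last few points can be made concrete with a minimal Rust sketch. It assumes a fetch-now, apply-on-next-launch model, in which newly fetched experiment data is staged and only becomes visible when the app applies it early at startup. ExperimentStore, stage_fetched, apply_pending and active_branch are hypothetical names used for illustration, not the Nimbus SDK's API.

```rust
use std::collections::HashMap;

/// Hypothetical experiment store: fetched data is staged as "pending" and only
/// becomes visible when `apply_pending` is called (assumed: once, at startup).
#[derive(Default)]
pub struct ExperimentStore {
    // Experiment slug -> enrolled branch, as seen by the app this session.
    active: HashMap<String, String>,
    // Data fetched from the server, waiting to be applied on the next launch.
    pending: Option<HashMap<String, String>>,
}

impl ExperimentStore {
    /// Stage freshly fetched data without touching the active configuration.
    pub fn stage_fetched(&mut self, fetched: HashMap<String, String>) {
        self.pending = Some(fetched);
    }

    /// Promote pending data to active; a no-op if nothing has been fetched yet.
    pub fn apply_pending(&mut self) {
        if let Some(pending) = self.pending.take() {
            self.active = pending;
        }
    }

    /// Reads return the same answer until the next call to `apply_pending`.
    pub fn active_branch(&self, slug: &str) -> Option<&str> {
        self.active.get(slug).map(String::as_str)
    }
}
```

Under this model, data fetched during a first run only takes effect at the next launch, which matches the situation described in the problem statement. Option (C) amounts to calling apply_pending mid-session, which is what lets configuration reads change non-deterministically.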
Links
- RFC for bundling into iOS and Fenix
- Document presented to product managers about (C) Retrieve Experiment data on first run, and deal with delay: https://docs.google.com/document/d/1X1hC3t5zC7-Rp0OPIoiUr_ueLOAI0ez_jqslaNzOHjY/edit
- Demo presenting option (C) Retrieve Experiment data on first run, and deal with delay: https://drive.google.com/file/d/19HwnlwrabmSNsB7tjW2l4kZD3PWABi4u/view?usp=sharing
A remote-settings client for our mobile browsers.
- Status: proposed
- Discussion: https://github.com/mozilla/application-services/pull/5302
- Deciders:
- csadilek for the mobile teams ✔️
- leplatrem for the remote-settings team ✔️
- mhammond for the application-services team ✔️
- Date: 2022-12-16
Context and Problem Statement
Mozilla’s mobile browsers have a requirement to access the remote settings service, but currently lack any libraries or tools which are suitable without some work. A concrete use case is the management of search engine configurations, which are stored in Remote Settings for Firefox Desktop, but shipped as individual files on our mobile browsers, requiring application releases for all changes.
A constraint on any proposed solutions is that this work will be performed by Mozilla's mobile team, who have limited experience with Rust, and that it is required to be completed in Q1 2023.
This document identifies the requirements, then identifies tools which already exist and are close to being suitable, then identifies all available options we can take, and outlines our decision.
Requirements
The requirements are for a library which is able to access Mozilla’s Remote Settings service and return the results to our mobile browsers. This list of requirements is not exhaustive, but instead focuses on the requirements which will drive our decision making process. As such, it identifies the non-requirements first.
Non-requirements
The following items all may have some degree of desirability, but they are not hard requirements for the initial version:
- While the https connection to the server must be validated, there is no requirement to verify the content received from the server - ie, there’s no requirement to check the signature of the body itself.
- There’s no requirement to validate that the server’s response conforms to a pre-defined schema - we trust the server data.
- There’s no requirement to return strongly-typed data to the applications - returning a JSON string/object is suitable.
- There’s no requirement to cache server responses to the file-system - if the app requests content, it’s fine for the library to always hit the server.
- There’s no requirement for any kind of scheduling or awareness of network state - when content is requested, we fetch it immediately and return an appropriate error if it cannot be fetched.
- There’s no requirement to support publishing records, requesting reviews or providing approvals via this new library.
- There’s no requirement that push be used to communicate changes to the application (eg, to enable rapid-enrolment type features)
- There’s no requirement to manage buckets, groups and collections via this new library.
Initial Requirements
The requirements we do have for the initial version are:
- The library should allow fetching records from Mozilla’s Remote Settings servers. This includes support for attachments, and fetching incremental changes.
- The library should not create threads or run any event loops - the mobile apps themselves are responsible for all threading requirements. While this might change in the future, considering this kind of change to our mobile applications is out of scope for this project.
- We must use Necko for all networking on Android, and must enforce that all connections are made to valid https hosts (although some test-only exceptions might be helpful for QA, such as allowing http connections to localhost).
- The library should be the only remote-settings library used in the browser. Specifically, this means that Nimbus must also be capable of using the library, and the work to move Nimbus to the library must be considered as part of the project.
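The https requirement can be enforced in one place before any request is handed to the networking layer. The following is a minimal Rust sketch, assuming a hypothetical check_server_url helper and treating only localhost and 127.0.0.1 as the QA exception. The URL parsing is intentionally naive (for example, it does not handle IPv6 literals); a real implementation would use a proper URL parser such as the url crate.

```rust
/// Reasons a server URL can be rejected (hypothetical error type).
#[derive(Debug, PartialEq)]
pub enum UrlError {
    Malformed,
    DisallowedScheme(String),
}

/// Split `scheme://[user@]host[:port][/path]` into (scheme, host).
fn split_scheme_and_host(url: &str) -> Option<(&str, &str)> {
    let (scheme, rest) = url.split_once("://")?;
    // The authority ends at the first path, query or fragment delimiter.
    let end = rest
        .find(|c: char| matches!(c, '/' | '?' | '#'))
        .unwrap_or(rest.len());
    let authority = &rest[..end];
    // Drop any "user@" prefix, then any ":port" suffix.
    let host_port = authority.rsplit('@').next()?;
    let host = host_port.split(':').next()?;
    if host.is_empty() {
        None
    } else {
        Some((scheme, host))
    }
}

/// Accept only https URLs; optionally allow plain http to localhost for QA.
pub fn check_server_url(url: &str, allow_http_localhost: bool) -> Result<(), UrlError> {
    let (scheme, host) = split_scheme_and_host(url).ok_or(UrlError::Malformed)?;
    let is_localhost = host == "localhost" || host == "127.0.0.1";
    match scheme.to_ascii_lowercase().as_str() {
        "https" => Ok(()),
        "http" if allow_http_localhost && is_localhost => Ok(()),
        other => Err(UrlError::DisallowedScheme(other.to_string())),
    }
}
```

Keeping the exception behind an explicit flag means production callers pass false and never reach the http branch, while a QA build can opt in.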
Existing Libraries
We have identified the following libraries which may be suitable for this project.
Remote-settings on desktop
There is a version of the remote settings client in desktop, written in Javascript. It has been in use and relatively stable since at least 2018, so it can be considered very capable. The roadblock to it being suitable for use by our mobile browsers is that it is written in Javascript: while it might be possible to expose it to Android via geckoview, there’s no reasonable path to make it available to iOS.
Rust Remote Settings Client
There is an existing remote settings client on GitHub. This client is written in Rust and has evolved over a number of years. The most recent changes were made to support being used in Merino, which was re-written in Python, so there are no known consumers of this library left.
The main attributes of this library relevant to this discussion are:
- It’s written in Rust, but has no FFI - ie, it’s currently only consumable by other Rust code.
- It has recently been updated to use async Rust, so requires an internal event loop.
- It includes the capability to verify the signatures of the content.
The Nimbus-sdk Client
The nimbus-sdk is a component in the application-services repository written in Rust. It has client code which talks to the remote-settings server and while this has only actually been used with the "Nimbus" collection there's no reason to believe it can't be used in the more general case. The main attributes of this library relevant to this discussion are:
- It’s consumed by a component which is already consumed by our mobile browsers via UniFFI.
- It does not verify the signatures of the content - while this could be done, there hasn’t been sufficient justification made for this (ie, there are no realistic threat models which would be solved by this capability.)
- The client itself does not persist a local cache of remote resources, but instead delegates this responsibility to the consuming application (in this case, nimbus itself, which does persist them via the rkv library)
- It does not use async Rust, but instead everything is blocking and run on threads exclusively created by the app itself.
- It has good test support; the tests run against a Docker image.
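The attributes above (blocking calls, no internal threads, persistence left to the consumer) can be illustrated with a minimal Rust sketch. The Fetcher trait stands in for the app-provided HTTP stack, and Client, records_url and get_records_raw are hypothetical names. The records path and the _since query parameter assume the Kinto-style HTTP API that the Remote Settings server exposes; this is not the actual nimbus-sdk client code.

```rust
/// Abstraction over the app-provided HTTP stack (hypothetical trait).
pub trait Fetcher {
    /// Perform a blocking GET and return the response body.
    fn get(&self, url: &str) -> Result<String, String>;
}

/// A minimal blocking client: no async, no internal threads, no on-disk cache.
pub struct Client<F: Fetcher> {
    fetcher: F,
    base_url: String,
    bucket: String,
}

impl<F: Fetcher> Client<F> {
    pub fn new(fetcher: F, base_url: &str, bucket: &str) -> Self {
        Self {
            fetcher,
            base_url: base_url.trim_end_matches('/').to_string(),
            bucket: bucket.to_string(),
        }
    }

    /// Build the records URL for a collection, optionally asking only for
    /// changes newer than `since` (a server timestamp) for incremental sync.
    pub fn records_url(&self, collection: &str, since: Option<u64>) -> String {
        let mut url = format!(
            "{}/buckets/{}/collections/{}/records",
            self.base_url, self.bucket, collection
        );
        if let Some(ts) = since {
            url.push_str(&format!("?_since={}", ts));
        }
        url
    }

    /// Fetch records as a raw JSON string. This blocks the calling thread;
    /// the caller chooses the thread and decides whether to persist the result.
    pub fn get_records_raw(&self, collection: &str, since: Option<u64>) -> Result<String, String> {
        self.fetcher.get(&self.records_url(collection, since))
    }
}
```

Because get_records_raw simply blocks, a mobile app would call it from a background thread it already owns, and a consumer such as Nimbus remains free to persist the returned JSON however it likes.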
Considered Options
Option 1: Writing a new library
The requirements of this client are such that writing new libraries in Kotlin and Swift is currently a realistic option. However, we are rejecting this option because we don’t want to duplicate the effort required to write and maintain two libraries - inevitably, the features and capabilities will diverge. Future requirements such as supporting content signature verification would lead to significant duplication.
Writing a new library from scratch in Rust and exposing it via UniFFI so it can be used by both platforms is also a possibility. However, we are rejecting this option because suitable Rust libraries already exist, so we would be better served by modifying or forking one of them.
Option 2: Use the existing remote settings client
Modifying or forking the existing client is an attractive option. It would require a number of changes - the async capabilities would probably need to be removed (using a Rust event loop in our mobile browsers is something we are trying to avoid until we better understand the implications given these browsers already have an event loop and their own threading model).
The persistence model used by this library is not a requirement for the new library. That isn’t itself a problem, but it would probably prevent Nimbus from using this library, so the end result is that we would effectively have two remote-settings clients written in Rust and used by our browsers.
Some API changes would probably be required to make it suitable for use with UniFFI, but these would be tractable.
We would need to update nimbus to use this client, which would almost certainly require moving this client into the application-services repository to avoid the following issues:
- Marrying the persistence model of this client with the existing rkv-based persistence used by nimbus would be required.
- Ensuring the upstream version changes continued to work for us.
- Managing the circular dependency which exists due to this library needing to use viaduct.
- Complication of our build process because the library needs to end up in our “megazord”. These are the exact reasons why Nimbus itself is in the application-services repo.
Option 3: Use the existing nimbus client
Splitting the existing client out from Nimbus in a way that allows Nimbus to continue to use it, while also making it available for stand-alone use is also an attractive option.
In particular, the feature set of that client overlaps with the requirements of the new library - no local persistence is necessary and no signature verification is required. It is already used by a component which is exposed via UniFFI.
Note that this option does not preclude both Nimbus and this new crate from moving to the existing remote settings client at some point in the future. A key benefit of this decision is that it keeps nimbus and the new crate using the same client, so updating both to use a different client in the future will always remain an option.
Chosen Option
We have chosen Option 3 because it allows us to reuse the new client in Nimbus, as well as on iOS and on Android with minimal initial development effort. If the new library ends up growing requirements that are already met by the existing remote settings client, we remain able to copy that functionality from that library into the new one.
Specific Plans
This section is non-normative - ie, is not strictly part of the ADR, but exists for context.
This is a very high-level view of the tasks required here.
- Create a new top-level component in the application-services repository, identify the exact API we wish to expose for this new library, describe this API using UniFFI, then implement the API with “stubs” (eg, using Rust todo!() or similar). This is depicted as RemoteSettings in the diagram.
- Identify which parts of Nimbus should be factored out into a shared component (depicted as rs-client in the diagram below) and move that functionality to the new shared component. Of note:
- This component probably will not have a UniFFI .udl file, but is just for consumption by the new component above and the existing nimbus component.
- There is still some uncertainty here - if it is a requirement that nimbus and the new component share some configuration or initialization code, we might need to do something more complex here. This seems unlikely, but possible, so is included here for completeness.
- Identify which of the nimbus tests should move to the new client and move them.
- Update Nimbus to take a dependency on the new shared component and use it, including tests.
- Flesh out the API of the new top-level component using the new shared component (ie, replace the todo!() macros with real code.)
- Identify any impact on the Nimbus android/swift code - in particular, any shared configuration and initialization code identified above in the application-services repo.
- Implement the Android and iOS code in the application-services repo desired to make this an ergonomic library for the mobile platforms.
- Update the mobile code for the UniFFI changes made to Nimbus, if any.
- Implement the mobile code which consumes the new library, including tests.
- Profit?
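As an illustration of the first step, here is a minimal Rust sketch of a stubbed top-level API: the public surface compiles and can be wired up, while unimplemented bodies panic via todo!() until the shared client exists. The struct, method names and signatures are hypothetical placeholders, not the API that was eventually chosen.

```rust
/// Hypothetical top-level component; the real API is decided in step one.
pub struct RemoteSettings {
    collection: String,
}

impl RemoteSettings {
    pub fn new(collection: &str) -> Self {
        Self {
            collection: collection.to_string(),
        }
    }

    /// Simple accessors can be implemented immediately.
    pub fn collection(&self) -> &str {
        &self.collection
    }

    /// Returns the collection's records as a JSON string once implemented.
    pub fn get_records(&self) -> Result<String, String> {
        // Stub: replaced with a call into the shared rs-client in a later step.
        todo!("fetch records for {}", self.collection)
    }
}
```

Stubbing this way lets the bindings and the mobile integration work start in parallel with the refactoring of the Nimbus client.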
This diagram attempts to depict this final layout. Note:
- rs-client and RemoteSettings are both new components; everything else already exists. Please do not consider these names as suggestions! Names are hard, I'm sure we can do better.
- Dashed lines are normal Rust dependencies (ie, dependencies listed in Cargo.toml)
- Solid lines are where the component uses UniFFI
- Viaduct is a little odd in that it is consumed by the mobile applications indirectly (eg, via Glean), hence it's not in support, but please ignore that anomaly.
flowchart RL
  subgraph app-services-support[Shared components in application-services/components/support]
    rs-client
    other-support-components
  end
  subgraph app-services-components[Top-level application-services components, in application-services/components]
    Nimbus
    RemoteSettings
    Viaduct
  end
  subgraph mobile [Code in the mobile repositories]
    Fenix
    Firefox-iOS
  end
  Nimbus -- nimbus.udl --> mobile
  RemoteSettings -- remote_settings.udl --> mobile
  rs-client -.-> Nimbus
  other-support-components -.-> Nimbus
  rs-client -.-> RemoteSettings
  other-support-components -.-> RemoteSettings
  Viaduct -.-> rs-client
  other-support-components -.-> rs-client
Content Signatures
This section is non-normative - ie, is not strictly part of the ADR, but exists for context.
Content Signatures have been explicitly called out as a non-requirement. Because this capability was a sticking point in the desktop version of the remote settings client, and because significant effort was spent on it, it's worth expanding on this here.
Because https will be enforced for all network requests, the consumers of this library can have a high degree of confidence that:
- The servers hit by this client are the servers we expect to hit (ie, no man-in-the-middle attacks will be able to redirect to a different server).
- The response from the server is exactly what was sent by the Mozilla controlled server (ie, no man-in-the-middle attacks will be able to change the content in-flight)
- Therefore, the content received must be exactly as sent by the Mozilla controlled servers.
Content signatures offer an additional capability: checking that the content of a remote settings response matches a signature generated with a secret key owned by Mozilla, independently of the https certificates used for the request itself.
This capability was added to the desktop version primarily to protect the integrity of the data at rest. Because the Desktop client cached the responses on disk, there was a risk that this data could be tampered with - so it was effectively impossible to guarantee that the data finally presented to the application is what was initially sent.
The main threat-model that required this capability was 3rd party applications installed on the same system where Firefox was installed. Because of the security model enforced by Desktop operating systems (most notably Windows), there was evidence that these 3rd-party applications would locate the cache of remote-settings responses and modify it in a way that benefited them and caused revenue harm to Mozilla - the most obvious example is changing the search provider settings.
The reason we are declaring this capability a non-requirement in the initial version is two-fold:
- We have also declared caching of responses a non-requirement, meaning there's no data at rest managed by this library which is vulnerable to this kind of attack.
- The mobile operating systems have far stronger application isolation - in the general case, a 3rd party mobile application is prevented from touching any of the files used by other applications.
Obviously though, things may change in the future - for example, we might add response caching, so we must be sure to reevaluate this requirement as other requirements change.
Limit Visits Migrated to Places History in Firefox iOS
- Status: accepted
- Deciders: teshaq, mhammond, lougeniaC64, dnarcese
- Date: 2023-01-06
Context and Problem Statement
A significant part of the project is migrating users’ history from the old database to a new one. To measure risk, we ran a dry-run migration. A dry-run migration runs a background thread in the user’s application and attempts to migrate to a fake database. The dry-run was implemented purely to collect telemetry on the migration to evaluate risk. The results can be found in the following Looker dashboard. Below is a list of observations.
Observations from Dry-Run Experiment
The following is a list of observations from the experiment:
- 5-6% of migrations do not end. This means for 5-6% of users, the application was terminated before migration ended. For a real migration, this would mean those users lose all of their history unless we attempt the migration multiple times.
- Of the migrations that did not end (the 5-6% mentioned above), 97% were for users with over 10,000 history visits.
- Out of migrations that do end, over 99% of migrations are successful.
- This means that we are not experiencing many errors with the migration beyond the time it takes.
- The average for visits migrated is around 25,000 - 45,000 visits.
- The median for visits migrated is around 5,000-15,000 visits.
- The difference between the average and the median suggests that a subset of our users has a very large number of visits.
- For migrations that did end, the following are the percentiles for how long they took (in milliseconds). We would like to emphasize that the following only includes migrations that did end:
- 10th percentile: 37 ms
- 25th percentile: 80 ms
- 50th percentile: 400 ms
- 75th percentile: 2,500 ms (2.5 seconds)
- 90th percentile: 6,400 ms (6.4 seconds)
- 95th percentile: 11,000 ms (11 seconds)
- 99th percentile: 25,000 ms (25 seconds)
- 99.9th percentile: 50,000 ms (50 seconds)
Problem Statement
Given the observations from the dry-run experiment, the rest of the document examines an approach to answer the question: How can we increase the rate at which migrations end, while keeping the user’s database at a reasonable size?
The user implication of keeping the rate of ended migrations high is that users keep their history, and can interact normally with the URL bar to search history, search history in the history panel, and navigate to sites they visited in the past.
The user implication of keeping a reasonable database size is that the database is less likely to lock on long queries. This means we reduce performance issues when users use the search bar, the history panel, and when navigating to sites.
Finally, it’s important to note that power users and daily active users will be more likely to have large histories and thus:
- Power users are more likely to fail their migration.
- Power users are more likely to have performance issues with history and the search bar.
- We saw a version of this with Favicons in an earlier incident, where users were coming across a significant number of database locks, crashing the app. This isn’t to say that the incident is directly related to this, however, large histories could contribute to the state we saw in the incident as it would take longer to run the queries.
Decision Drivers
- We must not lose users’ recent history.
- What is considered “recent” is not concretely defined. There is prior art, however, in the history limits used by Firefox Android, Firefox Desktop and Chrome, listed in the Links section.
- User’s experience with History must not regress, and ideally should improve.
- User experience is tightly coupled with the size of the database. The larger the database, the longer queries take. The longer queries take, the longer it takes for a user to see results when searching history and in the history panel.
Considered Options
- Keep the migration as-is.
- This option means that we have no limit. We will attempt to migrate all history for our users.
- Introduce a date-based limit on visits for the migration
- This option means that we only migrate visits that occurred in the past X days/weeks/months etc
- Introduce a visit number-based limit for the migration
- This option means we only migrate the latest X visits
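The difference between the two limiting strategies can be sketched in a few lines of Rust. Visit, latest_visits and visits_since are hypothetical names, and timestamps are assumed to be milliseconds since the Unix epoch; an actual migration would more likely express this as an ORDER BY ... LIMIT query against the old database.

```rust
/// A history visit with a timestamp in milliseconds since the Unix epoch.
#[derive(Debug, Clone, PartialEq)]
pub struct Visit {
    pub url: String,
    pub visited_at_ms: i64,
}

/// Visit number-based limit: keep only the `limit` most recent visits.
pub fn latest_visits(mut visits: Vec<Visit>, limit: usize) -> Vec<Visit> {
    // Sort newest first, then drop everything past the limit.
    visits.sort_by(|a, b| b.visited_at_ms.cmp(&a.visited_at_ms));
    visits.truncate(limit);
    visits
}

/// Date-based limit: keep visits that happened within `window_ms` of `now_ms`.
pub fn visits_since(visits: Vec<Visit>, now_ms: i64, window_ms: i64) -> Vec<Visit> {
    let cutoff = now_ms - window_ms;
    visits.into_iter().filter(|v| v.visited_at_ms >= cutoff).collect()
}
```

The count-based version bounds the amount of work regardless of how active a user is: a user with Y visits keeps min(Y, X) and loses Y.saturating_sub(X). The date-based version's output size depends entirely on activity within the window, and can be empty for a user who has not visited anything recently.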
Decision Outcome
Chosen option: Introduce a visit number-based limit for the migration. This option was chosen because given our decision drivers:
- We must not lose users’ recent history:
- We have established in the results of the dry-run, that the majority of failed migrations were for users with a large number of visits.
- By setting a reasonable limit, we can increase the likelihood the migration succeeds. We can set the limit to encompass “recent history” while choosing a limit that has an over 99% success rate.
- User’s experience with History must not regress, and ideally should improve.
- We have established in our decision driver that the user’s experience with history is coupled with the database size.
- By setting a reasonable limit, we can keep the size of the history database controlled.
- It's also worth noting that with the switch to the new implementation of history, we are also introducing a target size for the database. This means that we have maintenance logic that would compact the database and prune it if it grows beyond the target size.
Positive Consequences
- The migration runs in a shorter time.
- This means a higher chance of the migration succeeding, thus keeping the user’s recent history without loss.
- Users with fewer visits than the selected limit will still keep all their history. More on this in the Suggested Limit section.
- We keep the size of the history database low.
- This way, users with more visits than the limit will only keep their recent history.
- Additionally, when we delete users’ history from the old database, the size of the data the app keeps will decrease dramatically.
- Keeping the database size low means we lower the chance a user has performance issues with the database.
Negative Consequences
The biggest negative consequence is that users with more visits than the limit will lose visits.
- Since we would only keep the latest X visits for a user, if a user has Y visits, they would lose their oldest Y-X visits (assuming Y is greater than X).
- The effect here is mitigated by the observation that recent history is more important to users than older history. Unfortunately, we do not have any telemetry to demonstrate this claim, but it’s an assumption based on the existing limits on history imposed in Android and Desktop mentioned in the decision drivers section.
Pros and Cons of the Other Options
Keep the migration as-is
- Good, because if the migration succeeds, users keep all their history.
- Bad, because it’s less likely for migrations to succeed.
- Bad, because even if the migration succeeds it causes the size of the database to be large if a user has a lot of history.
- Large databases can cause a regression in performance.
- Users with a lot of history will now have two large databases (the old and new ones) since we won’t delete the data in the old database right away to support downgrades.
- Bad, because it can take a long time for the migration to finish.
- Bad, because until the migration is over, users will experience the app without history.
Introduce a date-based limit on visits
- Good, because we match users’ usage of the app.
- Users that use the app more, will keep more of their history.
- Good, because it’s more likely that the migration ends because we set a limit
- Bad, because it’s hard to predict how large users’ databases will be.
- This is particularly important for Sync users: Firefox-iOS syncs all of a user’s history, so if a user has many visits within the limit across multiple platforms, a large number of visits will be migrated.
- Bad, because users who haven’t used the app within the limit will lose all their history.
- For example, if the limit is 3 months, a user who last used the app more than 3 months ago will suddenly lose all their history.
Suggested Limit
This section describes a suggested limit for the visits. Although it’s backed up with telemetry, the specific number is up for discussion. It’s also important to note that statistical significance was not a part of the analysis. The dry-run migration has run for over 16,000 users, and although that may not be a statistically significant representation of our population, it’s good enough input to make an educated suggestion.
- First, we start by observing the distribution of visit counts. This will tell us how many of our users have between 0-10000 visits, 10000-20000, etc. We will identify that most of our users have less than 10,000 visits.
- Then, we will observe the dry-run migration ended rate based on the above buckets. We will observe that users with under 10,000 visits have a high chance of migration success.
- Finally, based on the analysis and prior art we’ll suggest 10,000 visits.
User History Distribution
We will look at https://mozilla.cloud.looker.com/looks/1078 which demonstrates a distribution of our users based on the number of history visits. Note that the chart is based on our release population.
Observations
- 67% of Firefox-iOS users have fewer than 10,000 visits.
- There is a very long tail to the distribution, with 6% of users having over 100,000 visits.
Dry-run Ending Rate by Visits
We will look at https://mozilla.cloud.looker.com/looks/1081. The chart shows the rate at which migrations end by number of visits, with users grouped into buckets of 10,000 visits.
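The bucketing can be sketched as integer division. This assumes half-open buckets such as [0, 10,000) and [10,000, 20,000). The exact boundary handling in the Looker chart is not stated, so treat it as an assumption.

```rust
/// Width of each bucket used in the dry-run analysis.
const BUCKET_SIZE: u64 = 10_000;

/// Returns the half-open range [lower, upper) of the bucket a user falls into.
fn bucket_for(visit_count: u64) -> (u64, u64) {
    // Integer division rounds down to the start of the bucket.
    let lower = (visit_count / BUCKET_SIZE) * BUCKET_SIZE;
    (lower, lower + BUCKET_SIZE)
}
```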
Observations
- For users with fewer than 10,000 visits, the success rate is over 99.6%.
- For users with over 100,000 visits, the success rate drops to 75-80%.
- Users in between have intermediate success rates. For example, users with between 10,000 and 20,000 visits have a 98-99% success rate.
- For every bucket beyond 20,000 visits, the success rate drops below 96%.
Suggestion
Based on the above, we’re suggesting a limit of 10,000 visits because:
- 10,000 visits encompass the full history of 67% of our users.
- Migrations with under 10,000 visits have a success rate of over 99%.
- Users with over 10,000 visits still keep their latest 10,000 visits. The choice is reasonable considering:
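"Keep the latest 10,000 visits" amounts to sorting newest-first and truncating. The sketch below assumes visits are represented only by their timestamps. A real migration would more likely push this into the SQL query itself, for example with an ORDER BY ... LIMIT clause.

```rust
/// Suggested cap on the number of visits to migrate.
const VISIT_LIMIT: usize = 10_000;

/// Return the `limit` most recent visit timestamps, newest first.
fn latest_visits(mut visited_at: Vec<u64>, limit: usize) -> Vec<u64> {
    // Sort newest first so that truncation drops the oldest visits.
    visited_at.sort_unstable_by(|a, b| b.cmp(a));
    visited_at.truncate(limit);
    visited_at
}
```

Users under the limit keep everything. Users over it lose only their oldest visits, which is the property the suggestion relies on.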
Links
- Epic for moving iOS’s history implementation to application-services places
- Dry-run migration experiment
- Overall dry-run migration looker dashboard
- Firefox iOS User distribution by history
- Migration Ended rate by User History
- Firefox Sync on Android only syncs 5000 sites
- Firefox Desktop limits import from Chrome to 2000 visits
- Firefox Android limits the size of its places.db to 75MiB
- Chrome only keeps 90 days of history
- Performance incident in Firefox iOS
Design Documents
- Megazords - Megazords and how we ship code
- Sync Manager - Our Sync Manager and how Sync works using it
- Shipping Rust Components as Swift Packages - High level design of how we use the Swift Package Manager to distribute our Rust components to iOS
- Sync overview - High level overview of how Firefox sync works
- Rust Component's Strategy - High level description of our Rust components strategy
- Metrics - (Glean Telemetry)
- Rust Version Policy
Megazording
Each Rust component published by Application Services is conceptually a stand-alone library, but for distribution we compile all the Rust code for all components together into a single .so file. This has a number of advantages:
- Easy and direct interoperability between different components at the Rust level
- Cross-component optimization of generated code
- Reduced code size thanks to distributing a single copy of the rust stdlib, low-level dependencies, etc.
This process is affectionately known as "megazording" and the resulting artifact as a megazord library.
On Android, the situation is quite complex due to the way packages and dependencies are managed. We need to distribute each component as a separate Android ARchive (AAR) that can be managed as a dependency via Gradle, we need to provide a way for the application to avoid shipping Rust code for components that it isn't using, and we need to do it in a way that maintains the advantages listed above.
This document describes our current approach to meeting all those requirements on Android. Other platforms such as iOS are not considered.
AAR Dependency Graph
We publish a separate AAR for each component (e.g. fxaclient, places, logins) which contains just the Kotlin wrappers that expose the relevant functionality to Android. Each of these AARs depends on a separate shared "megazord" AAR in which all the Rust code has been compiled together into a single .so file.
The application's dependency graph thus looks like this:
This generates a kind of strange inversion of dependencies in our build pipeline:
- Each individual component defines both a rust crate and an Android AAR.
- There is a special "full-megazord" component that also defines a rust crate and an Android AAR.
- The full-megazord rust crate depends on the rust crates for each individual component.
- But the Android AAR for each component depends on the Android AAR of the full-megazord!
It's a little odd, but it has the benefit that we can use gradle's dependency-replacement features to easily manage the rust code that is shipping in each application.
Custom Megazords
By default, an application that uses any appservices component will include the compiled rust code for all appservices components.
To reduce its overall code size, the application can use gradle's module replacement rules to replace the "full-megazord" AAR with a custom-built megazord AAR containing only the components it requires. Such an AAR can be built in the same way as the "full-megazord", and simply avoid depending on the rust crates for components that are not required.
To help ensure this replacement is done safely at runtime, the mozilla.appservices.support.native package provides helper functions for loading the correct megazord .so file. The Kotlin wrapper for each component should load its shared library by calling mozilla.appservices.support.native.loadIndirect, specifying both the name of the component and the expected version number of the shared library.
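The sketch below models why passing both a component name and an expected version helps. It is written in Rust rather than Kotlin and is not the real `loadIndirect` helper. The `Megazord`, `LoadError` and `load_indirect` names are hypothetical, and the exact checks the real helper performs are assumptions. The idea it shows: a custom megazord that omits a component, or was built from a different version, fails loudly instead of silently running mismatched code.

```rust
use std::collections::HashMap;

/// Hypothetical description of a megazord library's contents:
/// component name -> version of the Rust code compiled into it.
struct Megazord {
    components: HashMap<&'static str, &'static str>,
}

#[derive(Debug, PartialEq)]
enum LoadError {
    /// A custom megazord was built without this component.
    MissingComponent(String),
    /// The megazord was built from a different version than the wrapper expects.
    VersionMismatch { expected: String, found: String },
}

/// Model of an indirect load: the wrapper names itself and the version it expects.
fn load_indirect(mz: &Megazord, component: &str, expected_version: &str) -> Result<(), LoadError> {
    match mz.components.get(component) {
        None => Err(LoadError::MissingComponent(component.to_string())),
        Some(found) if *found != expected_version => Err(LoadError::VersionMismatch {
            expected: expected_version.to_string(),
            found: found.to_string(),
        }),
        Some(_) => Ok(()),
    }
}
```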
Unit Tests
The full-megazord AAR contains compiled Rust code that targets various Android platforms, and is not suitable for running on a Desktop development machine. In order to support integration with unit test suites such as Robolectric, each megazord has a corresponding Java ARchive (JAR) distribution named e.g. full-megazord-forUnitTests.jar. This contains the Rust code compiled for various Desktop architectures, and consumers can add it to their classpath when running tests on a Desktop machine.
Gotchas and Rough Edges
This setup mostly works, but has a handful of rough edges.
The build.gradle for each component needs to declare an explicit dependency on project(":full-megazord"), otherwise the resulting AAR will not be able to locate the compiled Rust code at runtime. It also needs to declare a dependency between its build task and that of the full-megazord, for reasons. Typically this looks something like:
tasks["generate${productFlavor}${buildType}Assets"].dependsOn(project(':full-megazord').tasks["cargoBuild"])
In order for unit tests to work correctly, the build.gradle for each component needs to add the rustJniLibs directory of the full-megazord project to its srcDirs, otherwise the unit tests will not be able to find and load the compiled Rust code. Typically this looks something like:
test.resources.srcDirs += "${project(':full-megazord').buildDir}/rustJniLibs/desktop"
The above also means that unit tests will not work correctly when doing local composite builds, because it's unreasonable to expect the main project (e.g. Fenix) to include the above in its build scripts.
Sync manager
We've identified the need for a "sync manager" (although we are yet to identify a good name for it!). This manager will be responsible for managing "global sync state" and coordinating each engine.
At a very high level, the sync manager is responsible for all syncing. So far, so obvious. However, given our architecture, it's possible to identify a key architectural split.
- The embedding application will be responsible for most high-level operations. For example, the app itself will choose how often regular syncs should happen, what environmental concerns apply (eg, should I only sync on WiFi?), how the user chooses exactly what to sync, and so forth.
- A lower-level component will be responsible for the direct interaction with the engines and with the various servers needed to perform a sync. It will also have the ultimate responsibility to not cause harm to the service (for example, it will likely enforce some kind of rate limiting and ensure that service requests for backoff are honored).
Because all application-services engines are written in Rust, it's tempting to suggest that this lower-level component also be written in Rust and everything "just works", but there are a couple of complications here:
- For iOS, we hope to integrate with older engines which aren't written in Rust, even if iOS does move to the new Sync Manager.
- For Desktop, we hope to start by reusing the existing "sync manager" implemented by Desktop, and start moving individual engines across.
- There may be some cross-crate issues even for the Rust implemented engines. Or more specifically, we'd like to avoid assuming any particular linkage or packaging of Rust implemented engines.
Even with these complications, we expect there to be a number of high-level components, each written in a platform-specific language (eg, Kotlin or Swift), and a single lower-level component to be implemented in Rust and delivered as part of the application-services library - but that's not a free pass.
Why "a number of high-level components"? Because that is the thing which understands the requirements of the embedding application. For example, Android may end up with a single high-level component in the android-components repo and shared between all Android components. Alternatively, the Android teams may decide the sync manager isn't generic enough to share, so each app will have their own. iOS will probably end up with its own and you could imagine a future where Desktop does too - but they should all be able to share the low level component.
The responsibilities of the Sync Manager.
The primary responsibilities of the "high level" portion of the sync manager are:
- Manage all FxA interaction. The low-level component will have a way to communicate auth related problems, but it is the high-level component which takes concrete FxA action.
- Expose all UI for the user to choose what to sync and coordinate this with the low-level component. Note that because these choices can be made on any connected device, these choices must be communicated in both directions.
- Implement timers or any other mechanism to fully implement the "sync scheduler", including any policy decisions such as only syncing on WiFi, etc.
- Implement a UI so the user can "sync now".
- Collect telemetry from the low-level component, probably augment it, then submit it to the telemetry pipeline.
The primary responsibilities of the "low level" portion of the sync manager are:
- Manage the meta/global, crypto/keys and info/collections resources, and interact with each engine as necessary based on the content of these resources.
- Manage interaction with the token server.
- Enforce constraints necessary to ensure the entire ecosystem is not subject to undue load. For example, this component should ignore attempts to sync continuously, or to sync when the services have requested backoff.
- Manage the "clients" collection - we probably can't ignore this any longer, especially for bookmarks (as desktop will send a wipe command on bookmark restore, and things will "be bad" if we don't see that command).
- Define a minimal "service state" so certain things can be coordinated with the high-level component. Examples of these states are "everything seems ok", "the service requested we backoff for some period", "an authentication error occurred", and possibly others.
- Perform, or coordinate, the actual sync of the Rust implemented engines - from the containing app's POV, there's a single "sync now" entry-point (in practice there might be a couple, but conceptually there's a single way to sync). Note that, as discussed below, how non-Rust implemented engines are managed is TBD.
- Manage the collection of (but not the submission of) telemetry from the various engines.
- Expose APIs and otherwise coordinate with the high-level component.
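Here is one way the "ignore attempts to sync continuously, or when backoff was requested" constraint might look. It is a minimal sketch using a hypothetical `SyncGate` type, and the minimum-interval policy is an assumption. It does not describe how the real low-level component decides.

```rust
use std::time::{Duration, Instant};

/// Hypothetical guard the low-level component could use to protect the service.
struct SyncGate {
    /// Earliest time the server allows us to sync again (from a backoff request).
    backoff_until: Option<Instant>,
    /// Time the last sync started.
    last_sync: Option<Instant>,
    /// Minimum spacing between syncs, so continuous sync attempts are ignored.
    min_interval: Duration,
}

impl SyncGate {
    /// Returns true if a sync may proceed at `now`.
    fn may_sync(&self, now: Instant) -> bool {
        if let Some(until) = self.backoff_until {
            if now < until {
                return false; // the service asked us to back off
            }
        }
        match self.last_sync {
            Some(last) => now.duration_since(last) >= self.min_interval,
            None => true,
        }
    }
}
```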
Things we aren't quite sure where they fit include:
- Coordination with non-rust implemented engines. These engines are almost certainly going to be implemented in the same language as the high-level component, which will make integration simpler. However, the low-level component will almost certainly need some information about these engines for populating info/collections etc. For now, we are punting on this until things become a bit clearer.
Implementation Details.
The above has been carefully written to try and avoid implementation details - the intent is that it's an overview of the architecture without any specific implementation decisions.
These next sections start getting specific, so implementation choices need to be made, and thus will possibly be more contentious.
In other words, get your spray-cans ready because there's a bikeshed being built!
However, let's start small and make some general observations.
Current implementations and challenges with the Rust components
- Some apps only care about a subset of the engines - lockbox is one such app and only cares about a single collection/engine. It might be the case that lockbox uses a generic application-services library with many engines available, even though it only wants logins. Thus, the embedding application is the only thing which knows which engines should be considered to "exist". It may be that the app layer passes an engine to the sync manager, or the sync manager knows via some magic how to obtain these handles.
- Some apps will use a combination of Rust components and "legacy" engines. For example, iOS is moving some of the engines to using Rust components, while other engines will be ported after delivery of the sync manager, if they are ported at all. We also plan to introduce some Rust engines into desktop without integrating the "sync manager".
- The Rust components themselves are designed to be consumed as individual components - the "logins" component doesn't know anything about the "bookmarks" component.
There are a couple of gotchas in the current implementations too - there's an issue when certain engines don't yet appear in meta/global - see bug 1479929 for all the details.
The tl;dr of the above is that each rust component should be capable of working with different sync managers. That said though, let's not over-engineer this and pretend we can design a single, canonical thing that will not need changing as we consider desktop and iOS.
State, state and more state. And then some state.
There's loads of state here. The app itself has some state. The high-level Sync Manager component will have state, the low-level component will have state, and each engine has state. Some of this state will need to be persisted (either on the device or on the storage servers) and some of this state can be considered ephemeral and lives only as long as the app.
A key challenge will be defining this state in a coherent way with clear boundaries between them, in an effort to allow decoupling of the various bits so Desktop and iOS can fit into this world.
This state management should also provide the correct degree of isolation for the various components. For example, each engine should only know about state which directly impacts how it works: the keys used to encrypt a collection should only be exposed to that specific engine, and there's no need for one engine to know what info/collections returns for other engines, nor whether the device is currently connected to WiFi.
A thorn here is for persisted state - it would be ideal if the low-level component could avoid needing to persist any state, so it can avoid any kind of storage abstraction. We have a couple of ways of managing this:
- The state which needs to be persisted is quite small, so we could delegate state storage to the high-level component in an opaque way, as this high-level component almost certainly already has storage requirements, such as storing the "choose what to sync" preferences.
- The low-level component could add its own storage abstraction. This would isolate the high-level component from this storage requirement, but would add complexity to the sync manager - for example, it would need to be passed a directory where it should create a file or database.
We'll probably go with the former.
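The first option can be sketched as an opaque round trip. The app stores whatever string the low-level component hands back and supplies it on the next sync, without ever interpreting it. `AppPrefs`, `low_level_sync` and `run_sync` are hypothetical names. The "state" here is just a counter, purely for illustration.

```rust
/// Hypothetical app-side storage for the low-level component's opaque state.
/// The app never parses the blob; it only stores it and hands it back.
#[derive(Default)]
struct AppPrefs {
    sync_state_blob: Option<String>,
}

/// Stand-in for the low-level component: takes the previous blob (if any)
/// and returns a new one. Here the "state" is just a sync counter.
fn low_level_sync(previous: Option<&str>) -> String {
    let count: u32 = previous.and_then(|s| s.parse().ok()).unwrap_or(0);
    (count + 1).to_string()
}

/// High-level side: pass the stored blob in and persist exactly what comes back.
fn run_sync(prefs: &mut AppPrefs) {
    let new_state = low_level_sync(prefs.sync_state_blob.as_deref());
    prefs.sync_state_blob = Some(new_state);
}
```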
Implementation plan for the low-level component.
Let's try and move into actionable decisions for the implementation. We expect the implementation of the low-level component to happen first, followed very closely by the implementation of the high-level component for Android. So we focus first on these.
Clients Engine
The clients engine includes some meta-data about each client. We've decided we can't replace the clients engine with the FxA device record and we can't simply drop this engine entirely.
Of particular interest is "commands" - these involve communicating with the engine regarding commands targeting it, and accepting commands to be sent to other devices. Note that outgoing commands are likely to not originate from a sync, but instead from other actions, such as "restore bookmarks".
However, because the only current requirement for commands is to wipe the store, and because you could anticipate "wipe" also being used when remotely disconnecting a device (eg, when a device is lost or stolen), our lives would probably be made much simpler by initially supporting only per-engine wipe commands.
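Supporting only per-engine wipes keeps the incoming command surface tiny. The sketch below assumes a hypothetical `Command` type and a `parse_command` helper. The command name "wipeEngine" is modelled on Desktop's clients engine and should be treated as an assumption. Unknown commands are ignored rather than failing the sync.

```rust
/// Hypothetical initial command set: only per-engine wipes are supported.
#[derive(Debug, Clone, PartialEq)]
enum Command {
    WipeEngine(String),
}

/// Parse a raw command name and arguments, ignoring anything unsupported.
fn parse_command(name: &str, args: &[&str]) -> Option<Command> {
    match (name, args) {
        // Exactly one argument: the engine to wipe.
        ("wipeEngine", [engine]) => Some(Command::WipeEngine(engine.to_string())),
        // Unknown commands (or malformed args) are skipped, not treated as errors.
        _ => None,
    }
}
```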
Note that there has been some discussion about not implementing the client engine and replacing "commands" with some other mechanism. However, we have decided to not do that because the implementation isn't considered too difficult, and because desktop will probably require a number of changes to remove it (eg, "synced tabs" will not work correctly without a client record with the same guid as the clients engine.)
Note however that unlike desktop, we will use the FxA device ID as the client ID. Because FxA device IDs are more ephemeral than sync IDs, it will be necessary for engines using this ID to track the most-recent ID they synced with so the old record can be deleted when a change is detected.
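Tracking the most-recent client ID could look like the following minimal sketch. `ClientIdTracker` is a hypothetical name, and where the last ID is persisted is left open. When the FxA device ID changes, the tracker returns the stale ID so that the engine can delete the old record.

```rust
/// Hypothetical per-engine record of the client ID it last synced with.
struct ClientIdTracker {
    last_synced_id: Option<String>,
}

impl ClientIdTracker {
    /// Call at the start of each sync with the current FxA device ID.
    /// Returns the stale ID whose record should be deleted, if the ID changed.
    fn check(&mut self, current_device_id: &str) -> Option<String> {
        let stale = match &self.last_synced_id {
            Some(prev) if prev != current_device_id => Some(prev.clone()),
            _ => None,
        };
        self.last_synced_id = Some(current_device_id.to_string());
        stale
    }
}
```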
Collections vs engines vs stores vs preferences vs Apis
For the purposes of the sync manager, we define:
- An engine is the unit exposed to the user - an "engine" can be enabled or disabled. There is a single set of canonical "engines" used across the entire sync ecosystem - ie, desktop and mobile devices all need to agree about what engines exist and what the identifier for an engine is.
- An Api is the unit exposed to the application layer for general application functionality. Application Services has 'places' and 'logins' Apis, which the application uses to store and fetch items. Each 'Api' may have one or more 'stores' (although the application layer will generally not interact directly with a store).
- A store is the code which actually syncs. This is largely an implementation detail. There may be multiple stores per engine (eg, the "history" engine may have "history" and "forms" stores) and a single 'Api' may expose multiple stores (eg, the "places Api" will expose history and bookmarks stores).
- A collection is a unit of storage on a server. It's even more of an implementation detail than a store. For example, you might imagine a future where the "history" store uses multiple "collections" to help with containers.
In practice, this means that the high-level component should only need to care about an engine (for exposing a choice of what to sync to the user) and an api (for interacting with the data managed by that api). The low-level component will manage the mapping of engines to stores.
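The engine-to-store mapping owned by the low-level component can be pictured as a simple lookup. The store names are illustrative assumptions. Only the history to {history, forms} split comes from the example above.

```rust
/// Engines as exposed to the user (canonical across the ecosystem).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Engine {
    History,
    Bookmarks,
    Passwords,
}

/// Hypothetical mapping from a user-visible engine to the stores that sync it.
fn stores_for(engine: Engine) -> &'static [&'static str] {
    match engine {
        Engine::History => &["history", "forms"],
        Engine::Bookmarks => &["bookmarks"],
        Engine::Passwords => &["logins"],
    }
}
```

The high-level component only ever deals with `Engine` values. Changing which stores back an engine stays an internal detail of the low-level component.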
The declined list
This document isn't going to outline the history of how "declined" is used, nor talk about how this might change in the future. For the purposes of the sync manager, we have the following hard requirements:
- The low-level component needs to know what the currently declined set of engines is for the purposes of re-populating meta/global.
- The low-level component needs to know when the declined set needs to change based on user input (ie, when the user opts in to or out of a particular engine on this device).
- The high-level component needs to be aware that the set of declined engines may change on every sync (ie, when the user opts in to or out of a particular engine on another device).
A complication is that due to networks being unreliable, there's an inherent conflict between "what is the current state?" and "what state changes are requested?". For example, if the user changes the state of an engine while there's no network, then exits the app, how do we ensure the user's new state is updated next time the app starts? What if the user has since made a different state request on a different device? Is the state as last-known on this device considered canonical?
To clarify, consider:
- User on this device declines logins. This device now believes logins is disabled but history is enabled, but is unable to write this to the server due to no network.
- The user declines history on a different device, but doesn't change logins. This device does manage to write the new list to the server.
- This device restarts and the network is up. It believes history is enabled but logins is not - however, the state on the server is the exact opposite.
How does this device react?
(On the plus side, this is an extreme edge-case which none of our existing implementations handle "correctly" - which is easy to say, because there's no real definition for "correctly")
Regardless, the low-level component will not pretend to hide this complexity (ie, it will ignore it!). The low-level component will allow the app to ask for state changes as part of a sync, and will return what's on the server at the end of every sync. The app is then free to build whatever experience it desires around this.
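One possible reading of "apply the requested changes, then return what's on the server" is sketched below. Only the explicit changes the app asks for are applied, on top of the declined list read from the server. This is an assumed policy, not a decision the document makes, and `apply_state_changes` is a hypothetical name. In the scenario above, the result declines both history and logins.

```rust
use std::collections::BTreeSet;

/// Apply the app's requested engine state changes on top of the server's
/// declined list, returning the new declined list to write back and report.
/// The device's other (possibly stale) beliefs are not pushed to the server.
fn apply_state_changes(server_declined: &BTreeSet<String>, changes: &[(&str, bool)]) -> BTreeSet<String> {
    let mut declined = server_declined.clone();
    for &(engine, enabled) in changes {
        if enabled {
            declined.remove(engine);
        } else {
            declined.insert(engine.to_string());
        }
    }
    declined
}
```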
Disconnecting from Sync
The low-level component needs to have the ability to disconnect all engines from Sync. Engines which are declined should also be reset.
Because we will need wipe() functionality to implement the clients engine, and because Lockbox wants to wipe on disconnect, we will provide disconnect and wipe functionality.
Specific deliverables for the low-level component.
Breaking the above down into actionable tasks which can be done somewhat concurrently, we will deliver:
The API
A straw-man for the API we will expose to the high-level components. This probably isn't too big, but we should do this as thoroughly as we can. In particular, ensure we have covered:
- Declined management - how the app changes the declined list and how it learns of changes from other devices.
- How telemetry gets handed from the low-level to the high-level.
- The "state" - in particular, how the high-level component learns that the auth state is wrong, and whether the service is in a degraded mode (eg, server requested backoff).
- How the high-level component might specify "special" syncs, such as "just one engine" or "this is a pre-sleep, quick-as-possible sync", etc.
There's a straw-man proposal for this at the end of the document.
A command-line (and possibly Android) utility.
We should build a utility (or 2) which can stand in for the high-level component, for testing and demonstration purposes.
This is something like places-utils.rs and the little utility Grisha has been using. This utility should act like a real client (ie, it should have an FxA device record, etc) and it should use the low-level component in exactly the same way we expect real products to use it.
Because it is just a consumer of the low-level component, it will force us to confront some key issues, such as how to get references to engines stored in multiple crates, how to present a unified "state" for things like auth errors, etc.
The "clients" engine
The initial work for the clients engine can probably be done without too much regard for how things are tied together - for example, much work could be done without caring how we get a reference to engines across crates.
State work
Implementing the things needed so we can expose the correct state to the high-level manager for things like auth errors, backoff semantics, etc.
Tie it together and other misc things.
There will be lots of loose ends to clean up - things like telemetry, etc.
Followup with non-rust engines.
We have identified that iOS will, at least in the short term, want the sync manager to be implemented in Swift. This will be responsible for syncing both the Swift and Rust implemented engines.
At some point in the future, Desktop may do the same - we will have both Rust and JS implemented engines which need to be coordinated. We ignore this requirement for now.
This approach still has a fairly easy time coordinating with the Rust implemented engines - the FFI will need to expose the relevant sync entry-points to be called by Swift, but the Swift code can hard-code the Rust engines it has and make explicit calls to these entry-points.
This Swift code will need to create the structures identified below, but this shouldn't be too much of a burden as it already has the information necessary to do so (ie, it already has info/collections etc)
TODO: dig into the Swift code and make sure this is sane.
Details
While we use Rust struct definitions here, it's important to keep in mind that, as mentioned above, we'll need to support the manager being written in something other than Rust, and to support engines written in languages other than Rust.
The structures below are a straw-man, but hopefully capture all the information that needs to be passed around.
```rust
use std::collections::HashMap;

// Placeholder types so the straw-man compiles; the real types are still TBD.
type Url = String;
type Timestamp = u64;
type JSONValue = String;
#[derive(Debug)]
struct Error;
type Result<T> = std::result::Result<T, Error>;

// We want to define a list of "engine IDs" - ie, canonical strings which
// refer to what the user perceives as an "engine" - but as above, these
// *do not* correspond 1:1 with either "stores" or "collections" (eg, "history"
// refers to 2 stores, and in a future world, might involve 3 collections).
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
enum Engine {
    History,   // The "History" and "Forms" stores.
    Bookmarks, // The "Bookmark" store.
    Passwords,
}

impl Engine {
    fn as_str(&self) -> &'static str {
        match self {
            Engine::History => "history",
            Engine::Bookmarks => "bookmarks",
            Engine::Passwords => "passwords",
        }
    }
}

// A struct which reflects engine declined states.
struct EngineState {
    engine: Engine,
    enabled: bool,
}

// A straw-man for the reasons why a sync is happening.
enum SyncReason {
    Scheduled,
    User,
    PreSleep,
    Startup,
}

// A straw man for the general status.
enum ServiceStatus {
    Ok,
    // Some general network issue.
    NetworkError,
    // Some apparent issue with the servers.
    ServiceError,
    // Some external FxA action needs to be taken.
    AuthenticationError,
    // We declined to do anything for backoff or rate-limiting reasons.
    BackedOff,
    // Something else - you need to check the logs for more details.
    OtherError,
}

// Info we need from FxA to sync. This is roughly our Sync15StorageClientInit
// structure with the FxA device ID.
struct AccountInfo {
    key_id: String,
    access_token: String,
    tokenserver_url: Url,
    device_id: String,
}

// Instead of massive param and result lists, we use structures.
// This structure is passed to each and every sync.
struct SyncParams {
    // The engines to Sync. None means "sync all"
    engines: Option<Vec<Engine>>,
    // Why this sync is being done.
    reason: SyncReason,
    // Any state changes which should be done as part of this sync.
    engine_state_changes: Vec<EngineState>,
    // An opaque state "blob". This should be persisted by the app so it is
    // reused next sync.
    persisted_state: Option<String>,
}

struct SyncResult {
    // The general health.
    service_status: ServiceStatus,
    // The result for each engine.
    engine_results: HashMap<Engine, Result<()>>,
    // The list of declined engines, or None if we failed to get that far.
    declined_engines: Option<Vec<Engine>>,
    // When we are allowed to sync again. If > now() then there's some kind
    // of back-off. Note that it's not strictly necessary for the app to
    // enforce this (ie, it can keep asking us to sync, but we might decline).
    // But we might not too - eg, we might try a user-initiated sync.
    next_sync_allowed_at: Timestamp,
    // New opaque state which should be persisted by the embedding app and supplied
    // the next time Sync is called.
    persisted_state: String,
    // Telemetry. Nailing this down is tbd.
    telemetry: Option<JSONValue>,
}

struct SyncManager {}

// Method bodies are todo!() because this is an API straw-man, not an implementation.
impl SyncManager {
    // Initialize the sync manager with the set of Engines known by this
    // application without regard to the enabled/declined states.
    // XXX - still TBD is how we will pass "stores" around - it may be that
    // this function ends up taking an `impl Store`
    fn init(&self, engines: Vec<&str>) -> Result<()> {
        todo!()
    }

    fn sync(&self, params: SyncParams) -> Result<SyncResult> {
        todo!()
    }

    // Interrupt any current syncs. Note that this can be called from a different
    // thread.
    fn interrupt() -> Result<()> {
        todo!()
    }

    // Disconnect this device from sync. This may "reset" the stores, but will
    // not wipe local data.
    fn disconnect(&self) -> Result<()> {
        todo!()
    }

    // Wipe all local data for all local stores. This can be done after
    // disconnecting.
    // There's no exposed way to wipe the remote store - while it's possible
    // stores will want to do this, there's no need to expose this to the user.
    fn wipe(&self) -> Result<()> {
        todo!()
    }
}
```
Sync Overview
This document provides a high-level overview of how syncing works. Note: each component has its own quirks and will handle sync slightly differently than the general process described here.
General flow and architecture
- Crates involved:
  - The sync15 and support/sync15-traits crates handle the general syncing logic and define the SyncEngine trait.
  - Individual component crates (logins, places, autofill, etc). These implement SyncEngine.
  - sync_manager manages the overall syncing process.
- High level sync flow:
  - Sync is initiated by the application that embeds application-services.
  - The application calls SyncManager.sync() to start the sync process.
  - SyncManager creates SyncEngine instances to sync the individual components. Each SyncEngine corresponds to a collection on the sync server.
Sync manager
SyncManager is responsible for performing the high-level parts of the sync process:
- The consumer code calls its sync() function to start the sync, passing in a SyncParams object, which describes what should be synced.
- SyncManager performs all network operations on behalf of the individual engines. It's also responsible for tracking the general authentication state (primarily by inspecting the responses from these network requests) and fetching tokens from the token server.
- SyncManager checks if we are currently in a backoff period and should wait before contacting the server again.
- Before syncing any engines, the sync manager checks the state of the meta/global collection and compares it with the enabled engines specified in the SyncParams. This handles the cases when the user has requested an engine be enabled or disabled on this device, or when it was requested on a different device. (Note that the enabled and disabled states of engines are stored on the account itself and are not a per-device setting.) Part of this process is comparing the collection's GUID on the server with the GUID known locally - if they are different, it implies some other device has "reset" the collection, so the engine drops all metadata and attempts to reconcile with every record on the server (ie, acts as though this is the very first sync this engine has ever done).
- SyncManager instantiates a SyncEngine for each enabled component. We currently use 2 different methods for this:
  - The older method is for the SyncManager to hold a weakref to a Store and use that to create the SyncEngine (tabs and places). The SyncEngine uses the Store for database access; see the TabsStore for an example.
  - The newer method is for the components to provide a function to create the SyncEngine, hiding the details of how that engine gets created (autofill/logins). These components also define a Store instance for the SyncEngine to use, but it's all transparent to the SyncManager. (See autofill::get_registered_sync_engine() and autofill::db::store::Store.)
- For components that use local encryption, SyncManager passes the local encryption key to their SyncEngine.
- Finally, SyncManager calls the sync_multiple() function from the sync15 crate, passing it the SyncEngine instances. sync_multiple() then calls the sync() function for each individual SyncEngine.
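The GUID comparison in the meta/global step can be sketched as a small decision function. `SyncAction` and `check_sync_id` are hypothetical names and do not correspond to the sync15 crate's actual API. Treating a missing local GUID the same as a mismatch is an assumption.

```rust
/// What an engine should do after comparing collection GUIDs.
#[derive(Debug, PartialEq)]
enum SyncAction {
    /// GUIDs match: sync incrementally from the last-known server timestamp.
    Incremental,
    /// GUIDs differ (or nothing is known locally): drop sync metadata and
    /// reconcile against every server record, as on a very first sync.
    FullReconcile { new_sync_id: String },
}

fn check_sync_id(local: Option<&str>, server: &str) -> SyncAction {
    match local {
        Some(id) if id == server => SyncAction::Incremental,
        _ => SyncAction::FullReconcile { new_sync_id: server.to_string() },
    }
}
```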
Sync engines
- SyncEngine is defined in the support/sync15-traits crate and defines the interface for syncing a component.
- A new SyncEngine instance is created for each sync.
- SyncEngine.apply_incoming() does the main work. It is responsible for processing incoming records from the server in order to update the local records and calculating which local records should be synced back.
The apply_incoming pattern
SyncEngine instances are free to implement apply_incoming() any way they want, but most components follow a general pattern.
Database Tables
- The local table stores records for the local application
- The mirror table stores the last known record from the server
- The staging temporary table stores the incoming records that we're currently processing
- The local/mirror/staging tables contain a guid as their primary key. A record shares the same guid across the local/mirror/staging tables.
- The metadata table stores the GUID for the collection as a whole and the last-known server timestamp of the collection.
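Because a record shares one guid across tables, a component can line up all versions of a record by guid. The sketch below assumes in-memory BTreeMaps keyed by guid in place of the real SQL tables (a real component would more likely do this with a SQL join), and the Record, RecordState and collect_states() names are hypothetical.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical record payload; real components store many more fields.
#[derive(Clone, Debug, PartialEq)]
struct Record {
    title: String,
}

/// Every version of one record, looked up by its shared guid.
#[derive(Debug, PartialEq)]
struct RecordState {
    guid: String,
    local: Option<Record>,
    mirror: Option<Record>,
    staging: Option<Record>,
}

/// In-memory stand-in for the local, mirror or staging table (guid -> row).
type Table = BTreeMap<String, Record>;

fn collect_states(local: &Table, mirror: &Table, staging: &Table) -> Vec<RecordState> {
    // Every guid that appears in at least one table, in sorted order.
    let guids: BTreeSet<&String> = local.keys().chain(mirror.keys()).chain(staging.keys()).collect();
    guids
        .into_iter()
        .map(|guid| RecordState {
            guid: guid.clone(),
            local: local.get(guid).cloned(),
            mirror: mirror.get(guid).cloned(),
            staging: staging.get(guid).cloned(),
        })
        .collect()
}
```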
apply_incoming stages
- stage incoming: write out each incoming server record to the staging table
- fetch states: take the rows from all 3 tables and combine them into a single struct containing Options for the local/mirror/staging records.
- iterate states: loop through each state, decide how to change the local records, then execute that plan.
  - reconcile/plan: For each state we create an action plan for it. The action plan is a low-level description of what to change (add this record, delete this one, modify this field, etc.). Here are some common situations:
    - A record only appears in the staging table. It's a new record from the server and should be added to the local DB.
    - A record only appears in the local table. It's a new record on the local instance and should be synced back to the server.
    - Identical records appear in the local/mirror tables and a changed record is in the staging table. The record was updated remotely and the changes should be propagated to the local DB.
    - A record appears in the mirror table and changed records appear in both the local and staging tables. The record was updated both locally and remotely and we should perform a 3-way merge.
  - apply plan: After we create the action plan, we execute it.
- fetch outgoing:
  - Calculate which records need to be sent back to the server
  - Update the mirror table
  - Return those records to the sync15 code so that it can upload them to the server.
  - The sync15 code returns the timestamp reported by the server in the POST response and hands it back to the engine. The engine persists this timestamp in the metadata table, so the next sync can use it to fetch only the records that other devices have changed since then.
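The four situations listed under reconcile/plan map naturally onto a match over which versions of a record exist. The sketch below is hypothetical: the Action enum, plan() and the field-wise merge() are invented for illustration, and its conflict policy (the local value wins when both sides changed the same field) is an arbitrary choice, not necessarily what any application-services engine does.

```rust
/// Hypothetical record with two mergeable fields.
#[derive(Clone, Debug, PartialEq)]
struct Record {
    title: String,
    url: String,
}

/// A simplified action plan for a single record.
#[derive(Debug, PartialEq)]
enum Action {
    /// New record from the server: add it to the local DB.
    InsertLocal(Record),
    /// New or locally changed record: sync it back to the server.
    Upload,
    /// Changed only remotely: propagate the server version locally.
    UpdateLocal(Record),
    /// Changed on both sides: in this sketch, store and upload the merge result.
    Merge(Record),
    /// Nothing to do.
    Nothing,
}

/// Field-wise 3-way merge: if the local value differs from the mirror (the
/// common ancestor), keep it; otherwise take the remote value.
fn merge_field(local: &str, mirror: &str, remote: &str) -> String {
    if local != mirror { local.to_string() } else { remote.to_string() }
}

fn merge(local: &Record, mirror: &Record, remote: &Record) -> Record {
    Record {
        title: merge_field(&local.title, &mirror.title, &remote.title),
        url: merge_field(&local.url, &mirror.url, &remote.url),
    }
}

fn plan(local: Option<&Record>, mirror: Option<&Record>, staging: Option<&Record>) -> Action {
    match (local, mirror, staging) {
        // Only in staging: a new record from the server.
        (None, None, Some(s)) => Action::InsertLocal(s.clone()),
        // Only local: a new local record.
        (Some(_), None, None) => Action::Upload,
        // In all three tables: compare each side against the mirror.
        (Some(l), Some(m), Some(s)) => match (l != m, s != m) {
            (false, false) => Action::Nothing,
            (false, true) => Action::UpdateLocal(s.clone()),
            (true, false) => Action::Upload,
            (true, true) => Action::Merge(merge(l, m, s)),
        },
        // Other combinations (e.g. deletions/tombstones) are out of scope here.
        _ => Action::Nothing,
    }
}
```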
syncChangeCounter
The local table has an integer column syncChangeCounter which is incremented every time the embedding app makes a change to a local record (e.g., updating a field). Thus, any local record with a non-zero change counter will need to be updated on the server (with either the local record being used, or after it being merged if the record also changed remotely). At the start of the sync, when we are determining what action to take, we take a copy of the change counter, typically in a temp staging table. After we have uploaded the record to the server, we decrement the counter by whatever it was when the sync started. This means that if a record is changed in between staging the record and uploading it, the change counter will not drop to zero, and so it will correctly be seen as locally modified on the next sync.
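The change-counter bookkeeping can be illustrated with a small in-memory model. LocalRecord, touch(), snapshot() and mark_uploaded() are hypothetical names; in a real component the snapshot would presumably live in a temp table and the subtraction would be an SQL UPDATE.

```rust
/// Hypothetical local row with its syncChangeCounter.
struct LocalRecord {
    guid: String,
    sync_change_counter: u32,
}

impl LocalRecord {
    /// Called whenever the embedding app modifies the record.
    fn touch(&mut self) {
        self.sync_change_counter += 1;
    }
}

/// At the start of a sync, copy the counter of every locally modified record.
fn snapshot(records: &[LocalRecord]) -> Vec<(String, u32)> {
    records
        .iter()
        .filter(|r| r.sync_change_counter > 0)
        .map(|r| (r.guid.clone(), r.sync_change_counter))
        .collect()
}

/// After a successful upload, subtract the snapshotted value instead of
/// resetting to zero, so edits made mid-sync are not forgotten.
fn mark_uploaded(records: &mut [LocalRecord], snapshot: &[(String, u32)]) {
    for (guid, counter_at_start) in snapshot {
        if let Some(r) = records.iter_mut().find(|r| &r.guid == guid) {
            r.sync_change_counter = r.sync_change_counter.saturating_sub(*counter_at_start);
        }
    }
}
```

In the usage below, record "b" is edited again while the sync is in flight, so its counter stays at 1 after the upload and it will be picked up by the next sync.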
High level design for shipping Rust Components as Swift Packages
This is a high-level description of the decision highlighted in the ADR that introduced Swift Packages as a strategy to ship our Rust components. That document includes the tradeoffs and why we chose this approach.
The strategy includes two main parts:
- The xcframework that is built from a megazord. The xcframework contains the following, built for all our target iOS platforms:
  - The compiled Rust code for all the crates listed in Cargo.toml as a static library
  - The C header files and Swift module maps for the components
- The rust-components-swift repository, which has a Package.swift that includes the xcframework and acts as the Swift package the consumers import
The xcframework and application-services
In application-services, in the megazords/ios-rust directory, we have the following:
- A Rust crate that serves as the megazord for our iOS distributions. The megazord depends on all the Rust Component crates and re-exports their public APIs.
- Some skeleton files for building an xcframework:
  1. module.modulemap: The module map tells the Swift compiler how to use C APIs.
  2. MozillaRustComponents.h: The header is used by the module map as a shortcut to specify all the available header files.
  3. Info.plist: The plist file specifies metadata about the resulting xcframework, for example, architectures and subdirectories.
- The build-xcframework.sh script that stitches things together into a full xcframework bundle:
  - The xcframework format is not well documented; briefly:
    - The xcframework is a directory containing the resources compiled for multiple target architectures. The xcframework is distributed as a .zip file.
    - The top-level directory contains a subdirectory per architecture and an Info.plist. The Info.plist describes what lives in which directory.
    - Each subdirectory represents an architecture and contains a .framework directory for that architecture.
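The megazord's role of re-exporting every component can be pictured with Rust's pub use. In this minimal sketch, inline modules stand in for the real component crates (which in practice are Cargo dependencies of the megazord crate); the module and function names are hypothetical.

```rust
// Hypothetical stand-ins for component crates such as logins or places.
pub mod logins {
    pub fn component_name() -> &'static str {
        "logins"
    }
}

pub mod places {
    pub fn component_name() -> &'static str {
        "places"
    }
}

/// The megazord re-exports every component so that one static library,
/// built once per target, carries all of their public APIs.
pub mod megazord {
    pub use crate::logins;
    pub use crate::places;
}
```

Bundling everything into one library means consumers link a single binary rather than one per component.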
It's a little unusual that we're building the xcframework by hand, rather than defining it as the build output of an Xcode project. It turns out to be simpler for our purposes, but does risk diverging from the expected format if Apple changes the details of xcframeworks in future Xcode releases.
The rust-components-swift repository
The repository is a Swift Package for distributing releases of Mozilla's various Rust-based application components. It provides the Swift source code packaged in a format understood by the Swift Package Manager, and depends on a pre-compiled binary release of the underlying Rust code published from application-services.
The rust-components-swift repo mainly includes the following:
- Package.swift: Defines all the targets and products the package exposes. Package.swift also specifies where the package gets the xcframework that application-services builds.
- make_tag.sh: A script that does the following:
  - Generates any dynamically generated Swift code, mainly:
    - The uniffi generated Swift bindings
    - The Glean metrics
  - Creates and commits a git tag that can be pushed to cut a release
Consumers then import the rust-components-swift Swift package by specifying the URL of the package on GitHub (i.e., https://github.com/mozilla/rust-components-swift) and selecting a version using the git tag.
High level firefox sync interactions
At a high level, Firefox Sync has three main components:
- The Firefox Account Server: uses OAuth to authenticate users and provide them with scoped access. The FxA Server also stores input that will be used by the clients to generate the sync keys.
- Firefox: the Firefox app itself, which implements the client logic to communicate with the Firefox Account servers, generate sync keys, use them to encrypt data, and send/receive encrypted data to/from the sync storage servers.
- Sync Storage Server: the server that stores encrypted sync data. The clients retrieve the encrypted data and decrypt it client-side.
Additionally, the token server assists in providing metadata to Firefox, so that it knows which sync server to communicate with.
Multi-platform sync diagram
Since we have multiple Firefox apps (Desktop, iOS, Android, Focus, etc.), Firefox Sync can sync across platforms, allowing users to access their up-to-date data across apps and devices.
Before: How sync was
Before our Rust Components came to life, each application had its own implementation of the sync and FxA client protocols. This led to duplicate logic across platforms. This was problematic since any change to the sync or FxA client business logic had to be made in every implementation, and the likelihood of errors was high.
Now: Sync is starting to streamline its components
Currently, we are in the process of migrating many of the sync implementations to use our Rust Component strategy. Fenix primarily uses our Rust Components and iOS has some integrated as well. Additionally, Firefox Desktop uses one Rust component (Web Extension Storage).
The Rust components not only unify the different implementations of sync, they also provide a convenient local storage for the apps. In other words, the apps can use the components for storage, with or without syncing to the server.
Current Status
The following table has the status of each of our sync Rust Components
Application\Component | Bookmarks | History | Tabs | Passwords | Autofill | Web Extension Storage | FxA Client |
---|---|---|---|---|---|---|---|
Fenix | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | |
Firefox iOS | ✔️ | ✔️ | ✔️ | ✔️ | |||
Firefox Desktop | | | | | | ✔️ | |
Focus | | | | | | | |
Future: Only one implementation for each sync engine
In an aspirational future, all the applications would use the same implementation for Sync. However, it's unlikely that we would migrate everything to use the Rust components since some implementations may not be prioritized, this is especially true for desktop which already has stable implementations. That said, we can get close to this future and minimize duplicate logic and the likelihood of errors.
You can edit the diagrams in the following lucid chart (Note: Currently only Mozilla Employees can edit those diagrams): https://lucid.app/lucidchart/invitations/accept/inv_ab72e218-3ad9-4604-a7cd-7e0b0c259aa2
Once they are edited, you can re-import them here by replacing the old diagrams in the docs/diagrams directory on GitHub. As long as the names are the same, you shouldn't need to edit those docs!
Metrics collected by Application Services components
Some application-services components collect telemetry using the Glean SDK.
Products that send telemetry via Glean must request a data-review following the Firefox Data Collection process before integrating any of the components listed below.
Rust Versions
Like almost all Rust projects, the entire point of the application-services components is that they be used by external projects. If these components use Rust features available in only the very latest Rust version, this will cause problems for projects which aren't always able to be on that latest version.
Given application-services is currently developed and maintained by Mozilla staff, it should be no surprise that an important consideration is mozilla-central (aka, the main Firefox repository).
Mozilla-central Rust policies
It should also come as no surprise that the Rust policy for mozilla-central is somewhat flexible. There is an official Rust Update Policy Document but everything in the future is documented as "estimated".
Ultimately though, that page defines 2 Rust versions - "Uses" and "Requires", and our policy revolves around these.
To discover the current, actual "Uses" version, there is a Meta bug on Bugzilla that keeps track of the latest versions as they are upgraded.
To discover the current, actual "Requires" version, see searchfox
application-services Rust version policy
Our official Rust version policy is:
- All components will ship using, have all tests passing, and have clippy emit no warnings, with the same version mozilla-central currently "uses".
- All components must be capable of building (although not necessarily with all tests passing nor without clippy errors or other warnings) with the same version mozilla-central currently "requires".
- This policy only applies to the "major" and "minor" versions; a different patch level is still considered compliant with this policy.
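The "major and minor only" rule can be expressed as a small check. This is a hypothetical helper, not part of the project's CI, and it assumes version strings of the form MAJOR.MINOR[.PATCH].

```rust
/// Parse "MAJOR.MINOR[.PATCH]" into (major, minor); the patch level is ignored.
fn major_minor(version: &str) -> Option<(u32, u32)> {
    let mut parts = version.trim().split('.');
    let major: u32 = parts.next()?.parse().ok()?;
    let minor: u32 = parts.next()?.parse().ok()?;
    Some((major, minor))
}

/// True if `toolchain` matches the mozilla-central version on major and minor only.
fn is_policy_compliant(toolchain: &str, mozilla_central: &str) -> bool {
    match (major_minor(toolchain), major_minor(mozilla_central)) {
        (Some(a), Some(b)) => a == b,
        _ => false,
    }
}
```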
Implications of this
All CI for this project will try to pin itself to this same version. At the time of writing, this means that our CircleCI integration and rust-toolchain configuration will specify the versions (and where possible, the CI configuration file will avoid duplicating the information in rust-toolchain).
We should maintain CI to ensure we still build with the "Requires" version.
As versions inside mozilla-central change, we will bump these versions accordingly. While newer versions of Rust can be expected to work correctly with our existing code, it's likely that clippy will complain in various ways with the new version. Thus, a PR to bump the minimum version is likely to also require a PR to make changes which keep clippy happy.
In the interests of avoiding redundant information which will inevitably become stale, the circleci and rust-toolchain configuration links above should be considered the canonical source of truth for the currently supported official Rust version.
Sqlite Database Pragma Usage
The data below has been added as a tool for future pragma analysis work and is expected to be useful so long as our pragma usage remains stable or this doc is kept up-to-date. This should help us understand our current pragma usage and where we may be able to make improvements.
Pragma | Value | Component | Notes |
---|---|---|---|
cache_size | -6144 | places | |
foreign_keys | ON | autofill, places, tabs, webext-storage | |
journal_mode | WAL | autofill, places, tabs, webext-storage | |
page_size | 32768 | places | |
secure_delete | true | logins | |
temp_store | 2 | autofill, logins, places, tabs, webext-storage | Setting temp_store to 2 (MEMORY) is necessary to avoid SQLITE_IOERR_GETTEMPPATH errors on Android (see here for details) |
wal_autocheckpoint | 62 | places | |
wal_checkpoint | PASSIVE | places | Used in the sync finished step in history and bookmarks syncing and in the places run_maintenance function |
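As a way of reading the table, the sketch below turns its rows into the PRAGMA statements a component might issue when opening a connection. The pragma_statements() helper is hypothetical and not application-services code; wal_checkpoint is left out because, per the table, it runs at specific points rather than when a connection is opened. The ordering (page_size first) is this sketch's own choice, based on SQLite only applying a new page size before the database is initialized and not allowing it to change in WAL mode.

```rust
/// (pragma, value, components) rows copied from the table above.
const PRAGMAS: &[(&str, &str, &[&str])] = &[
    // page_size first: SQLite only applies it before the database file is
    // initialized, and cannot change it once the database is in WAL mode.
    ("page_size", "32768", &["places"]),
    ("cache_size", "-6144", &["places"]),
    ("foreign_keys", "ON", &["autofill", "places", "tabs", "webext-storage"]),
    ("journal_mode", "WAL", &["autofill", "places", "tabs", "webext-storage"]),
    ("secure_delete", "true", &["logins"]),
    ("temp_store", "2", &["autofill", "logins", "places", "tabs", "webext-storage"]),
    ("wal_autocheckpoint", "62", &["places"]),
];

/// Hypothetical helper: the PRAGMA statements a component would run on open.
fn pragma_statements(component: &str) -> Vec<String> {
    PRAGMAS
        .iter()
        .filter(|(_, _, components)| components.contains(&component))
        .map(|(name, value, _)| format!("PRAGMA {} = {};", name, value))
        .collect()
}
```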
- The user_version pragma is excluded because the value varies and sqlite does not do anything with the value.
- The push component does not implement any of the commonly used pragmas noted above.
- The sqlcipher pragmas that we set have been excluded from this list as we are trying to remove sqlcipher and do not want to encourage future use.
Application Services Release Process
Nightly builds
Nightly builds are automatically generated using a taskcluster cron task.
- The results of the latest successful nightly build are listed here: https://firefox-ci-tc.services.mozilla.com/tasks/index/project.application-services.v2.nightly/latest
- The latest nightly decision task should be listed here: https://firefox-ci-tc.services.mozilla.com/tasks/index/project.application-services.v2.branch.main.latest.taskgraph/decision-nightly
- If you don't see a decision task from the day before, then contact releng. It's likely that the cron decision task is broken.
Release builds
Release builds are generated from the release-vXXX branches and triggered in Ship-it.
- Whenever a commit is pushed to a release branch, we build candidate artifacts. These artifacts are shippable: if we decide that the release is ready, they just need to be copied to the correct location.
- The push phase of release-promotion copies the candidate artifacts to a staging location where they can be tested.
- The ship phase of release-promotion copies the candidate artifacts to their final, published location.
[Release management] Creating a new release
This part is 100% covered by the Release Management team. The dev team should not perform these steps.
On Merge Day we take a snapshot of the current main and prepare a release. See Firefox Release Calendar.
- Create a branch named release-v[release_version] off of the main branch (for example, release-v118) through the GitHub UI. [release_version] should follow the Firefox release number. See Firefox Release Calendar.
- Create a PR against the release branch that updates version.txt and updates the CHANGELOG.md as follows:
- In version.txt, update the version from [release_version].0a1 to [release_version].0.
diff --git a/version.txt b/version.txt
--- a/version.txt
+++ b/version.txt
@@ -1 +1 @@
-118.0a1
+118.0
- In CHANGELOG.md, change In progress to _YYYY-MM-DD_ to match the Merge Day date and add a URL to the release version change log.
diff --git a/CHANGELOG.md b/CHANGELOG.md
index 7f2c07a1a8..06688fdcab 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -1,8 +1,7 @@
-# v118.0 (In progress)
-
-[Full Changelog](In progress)
+# v118.0 (_2023-08-28_)
## General
+
### 🦊 What's Changed 🦊
- Backward-incompatible changes to the Suggest database schema to accommodate custom details for providers ([#5745](https://github.com/mozilla/application-services/pull/5745)) and future suggestion types ([#5766](https://github.com/mozilla/application-services/pull/5766)). This only affects prototyping, because we aren't consuming Suggest in any of our products yet.
@@ -16,7 +15,6 @@
- The Remote Settings client has a new `Client::get_records_with_options()` method ([#5764](https://github.com/mozilla/application-services/pull/5764)). This is for Rust consumers only; it's not exposed to Swift or Kotlin.
- `RemoteSettingsRecord` objects have a new `deleted` property that indicates if the record is a tombstone ([#5764](https://github.com/mozilla/application-services/pull/5764)).
-
## Rust log forwarder
### 🦊 What's Changed 🦊
@@ -34,6 +32,8 @@
- Removed previously deprecated commands `experimenter`, `ios`, `android`, `intermediate-repr` ([#5784](https://github.com/mozilla/application-services/pull/5784)).
+[Full Changelog](https://github.com/mozilla/application-services/compare/v117.0...v118.0)
+
# v117.0 (_2023-07-31_)
- Create a commit named 'Cut release v[release_version].0' and a PR for this change.
- See example PR
- Create a PR against the main branch that updates version.txt and updates the CHANGELOG.md as follows:
- In version.txt, update the version from [release_version].0a1 to [next_release_version].0a1.
diff --git a/version.txt b/version.txt
--- a/version.txt
+++ b/version.txt
@@ -1 +1 @@
-118.0a1
+119.0a1
- In CHANGELOG.md, change the in progress version from [release_version].0 to [next_release_version].0, add a header for the previous release version, and add a URL to the previous release version change log.
diff --git a/CHANGELOG.md b/CHANGELOG.md
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -1,8 +1,7 @@
-# v118.0 (In progress)
+# v119.0 (In progress)
[Full Changelog](In progress)
+# v118.0 (_2023-08-28_)
@@ -34,6 +36,8 @@
+[Full Changelog](https://github.com/mozilla/application-services/compare/v117.0...v118.0)
+
# v117.0 (_2023-07-31_)
- Create a commit named 'Start release v[next_release_version].0' and a PR for this change.
- See example PR
- Once all of the above PRs have landed, create a new Application Services release in Ship-It.
- Promote and Ship the release.
- Tag the release in the Application Services repo.
- Inform the Application Services team to cut a release of rust-components-swift
- The team will tag the repo and let you know the git hash to use when updating the consumer applications
- Update consumer applications
- firefox-android: Follow the directions in the release checklist
- firefox-ios: Follow the directions in the release checklist
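The version.txt and CHANGELOG.md edits in the steps above follow a mechanical pattern. The sketch below is a hypothetical helper, not part of the repository's automation, and it assumes main always carries a [release_version].0a1 version.

```rust
/// Given main's version.txt contents (e.g. "118.0a1"), return the version for
/// the release branch and the next nightly version for main.
fn release_versions(version_txt: &str) -> Option<(String, String)> {
    let major: u32 = version_txt.trim().strip_suffix(".0a1")?.parse().ok()?;
    Some((format!("{}.0", major), format!("{}.0a1", major + 1)))
}

/// The "Full Changelog" link added for a release, comparing with the previous one.
fn full_changelog_link(release_major: u32) -> String {
    format!(
        "[Full Changelog](https://github.com/mozilla/application-services/compare/v{}.0...v{}.0)",
        release_major - 1,
        release_major
    )
}
```

For example, "118.0a1" yields "118.0" for the release-v118 branch and "119.0a1" for main, matching the diffs shown in the steps.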
[Release management] Creating a new release via scripts:
- Run pip3 install -r automation/requirements.txt to install the required Python packages.
- Run the automation/prepare-release.py script. This should:
  - Create a new branch named release-vXXX
  - Create a PR against that branch that updates version.txt like this:
diff --git a/version.txt b/version.txt
index 8cd923873..6482018e0 100644
--- a/version.txt
+++ b/version.txt
@@ -1,4 +1,4 @@
-114.0a1
+114.0
  - Create a PR on main that starts a new CHANGELOG header.
- Tag the release with automation/tag-release.py [major-version-number]
Cutting patch releases for uplifted changes (dot-release)
If you want to uplift changes into a previous release:
- Make sure the changes are present in main and have been thoroughly tested.
- Find the PR for the changes and add this comment: @mergify backport release-vXXX
- Find the Bugzilla bug with the changes and add an uplift request:
  - Find the attachment corresponding to the new PR created from the @mergify comment.
  - Click the "details" link.
  - Set approval-mozilla-beta or approval-mozilla-release to ?
  - Save the form.
- Release management will then:
- Arrange for the backport to be merged
- Create a new Application Services release in Ship-It for the release branch. Promote & ship the release
- Tag the release in the Application Services repo
- Notify the Application Services team in case there is a need to cut a new release of rust-components-swift
- Notify any affected consumer applications teams.
What gets built in a release?
We build several artifacts for both nightlies and releases:
- nightly.json / release.json. This is a JSON file containing metadata from successful builds. The metadata for the latest successful build can be found from a taskcluster index (https://firefox-ci-tc.services.mozilla.com/api/index/v1/task/project.application-services.v2.release.latest/artifacts/public%2Fbuild%2Frelease.json). The JSON file contains:
  - The version number for the nightly/release
  - The git commit ID
  - The maven channel for Kotlin packages:
    - maven-production: https://maven.mozilla.org/?prefix=maven2/org/mozilla/appservices/
    - maven-nightly-production: https://maven.mozilla.org/?prefix=maven2/org/mozilla/appservices/nightly/
    - maven-staging: https://maven-default.stage.mozaws.net/?prefix=maven2/org/mozilla/appservices/
    - maven-nightly-staging: https://maven-default.stage.mozaws.net/?prefix=maven2/org/mozilla/appservices/nightly/
  - Links to nimbus-fml.*: used to build Firefox/Focus on Android and iOS
  - Links to *RustComponentsSwift.xcframework.zip: XCFramework archives used to build Firefox/Focus on iOS
  - Link to swift-components.tar.xz: UniFFI-generated swift files which get extracted into the rust-components-swift repository for each release.
Nightly builds
For nightly builds, consumers get the artifacts directly from Taskcluster.
- For firefox-android, the nightlies are handled by relbot.
- For firefox-ios, the nightlies are consumed by rust-components-swift. rust-components-swift makes a GitHub release, which is picked up by a GitHub action in firefox-ios.
Release promotion
For real releases, we use the taskcluster release-promotion action. Release promotion happens in two phases:
- promote copies the artifacts from taskcluster and moves them to a staging area. This allows for testing the consumer apps with the artifacts.
- ship copies the artifacts from the staging area to archive.mozilla.org, which serves as their permanent storage area.
Application Services Build and Publish Pipeline
This document provides an overview of the build-and-publish pipeline used to make our work in this repo available to consuming applications. It's intended both to document the pipeline for development and maintenance purposes, and to serve as a basic analysis of the integrity protections that it offers (so you'll notice there are notes and open questions in place where we haven't fully hashed out all those details).
The key points:
- We use "stable" Rust. CI is pinned to whatever version is currently used on mozilla-central to help with vendoring into that repository. You should check what current values are specified for CircleCI and for TaskCluster.
- We use Cargo for building and testing the core Rust code in isolation, Gradle with rust-android-gradle for combining Rust and Kotlin code into Android components and running tests against them, and XCFrameworks driving Xcode for combining Rust and Swift code into iOS components.
- TaskCluster runs on every pull-request, release, and push to main, to ensure Android artifacts build correctly and to execute their tests via gradle.
- CircleCI runs on every branch, pull-request (including forks), and release, to execute lint checks and automated tests at the Rust and Swift level.
- Releases align with the Firefox release schedule, and nightly releases are automated to run daily; see the releases for more information.
- Notifications about build failures are sent to a mailing list at a-s-ci-failures@mozilla.com
- Our Taskcluster implementation is almost entirely maintained by the Release Engineering team. The proper way to contact them in case of emergency or for new developments is to ask on the #releaseduty-mobile Slack channel. Our main point of contact is @mihai.
For Android consumers these are the steps by which Application Services code becomes available, and the integrity-protection mechanisms that apply at each step:
1. Code is developed in branches and lands on main via pull request.
   - GitHub branch protection prevents code being pushed to main without review.
   - CircleCI and TaskCluster run automated tests against the code, but do not have the ability to push modified code back to GitHub thanks to the above branch protection.
   - TaskCluster jobs do not run against PRs opened by the general public, only for PRs from repo collaborators.
   - Contra the github org security guidelines, signing of individual commits is encouraged but is not required. Our experience in practice has been that this adds friction for contributors without sufficient tangible benefit.
2. Developers manually create a release from latest main.
   - The ability to create new releases is managed entirely via GitHub's permission model.
   - TODO: the github org security guidelines recommend signing tags, and auditing all included commits as part of the release process. We should consider some tooling to support this. I don't think there's any way to force GitHub to only accept signed releases in the same way it can enforce signed commits.
3. TaskCluster checks out the release tag, builds it for all target platforms, and runs automated tests.
   - These tasks run in a pre-built docker image, helping assure integrity of the build environment.
   - TODO: could this step check for signed tags as an additional integrity measure?
4. TaskCluster uploads symbols to Socorro.
   - The access token for this is currently tied to @eoger's LDAP account.
5. TaskCluster uploads built artifacts to maven.mozilla.org.
   - Secret key for uploading to maven is provisioned via TaskCluster, guarded by a scope that's only available to this task.
   - TODO: could a malicious dev dependency from step (3) influence the build environment here?
   - TODO: talk about how TC's "chain of trust" might be useful here.
6. Consumers fetch the published artifacts from maven.mozilla.org.
For iOS consumers the corresponding steps are:
1. Code is developed in branches and lands on main via pull request, as above.
2. Developers manually create a release from latest main, as above.
3. CircleCI checks out the release tag, builds it, and runs automated tests.
   - TODO: These tasks bootstrap their build environment by fetching software over https. Could we do more to ensure the integrity of the build environment?
   - TODO: could this step check for signed tags as an additional integrity measure?
   - TODO: can we prevent these steps from being able to see the tokens used for publishing in subsequent steps?
4. CircleCI builds a binary artifact:
   - An XCFramework containing just Rust code and header files, as a zipfile, for use by Swift Packages.
   - TODO: could a malicious dev dependency from step (3) influence the build environment here?
5. CircleCI uses dpl to publish to GitHub as a release artifact.
6. Consumers add Application Services as a dependency from the Rust Components Swift repo using Apple's Swift Package Manager.
For consuming in mozilla-central, see how to vendor components into mozilla-central
This is a diagram of the pipeline as it exists (and is planned) for the Nimbus SDK, one of the libraries in Application Services: (Source: https://miro.com/app/board/o9J_lWx3jhY=/)
Authentication and secrets
@appsvc-moz account
There's an appsvc-moz GitHub account owned by one of the application-services team members (currently markh, but we should consider rotating ownership). Given that only one 2FA device can be connected to a GitHub account, having multiple owners doesn't seem practical. In most cases, whenever a GitHub account needs to own a secret for any CI, it will be owned by this account.
CircleCI
CircleCI config requires a GitHub token (owned by @appsvc-moz). This is a "personal access token" obtained via GitHub's Settings -> Developer Settings -> Personal Access Tokens -> Classic Token. This token:
- Should be named something like "circleci"
- Should have "no expiration" (XXX - this seems wrong, should we adjust?)
Once you have generated the token, it must be added to https://app.circleci.com/settings/project/github/mozilla/application-services/environment-variables as the environment variable GITHUB_TOKEN
Guide to upgrading NSS
Our components rely on cryptographic primitives provided by NSS. Every month or so, a new version of NSS is published and we should try to keep our version as up-to-date as possible.
Because it makes unit testing easier on Android, and helps startup performance on iOS, we compile NSS ourselves and link to it statically. Note that NSS is mainly used by Mozilla as a dynamic library and the NSS project is missing related CI jobs (iOS builds, windows cross-compile builds etc.) so you should expect breakage when updating the library (hence this guide).
Updating the Version
The build code is located in the libs/ folder.
The version string is located at the beginning of build-all.sh.
For most NSS upgrades, you'll need to bump the version number in this file and update the downloaded archive checksum. Then follow the steps for Updating the Cross-Compiled NSS Artifacts below. The actual build invocations are located in platform-specific script files (e.g. build-nss-ios.sh) but usually don't require any changes.
To test out updating NSS version:
- Ensure you've bumped the NSS version in build-all.sh
- Clear any old NSS build artifacts: rm -rf ./libs/desktop && cargo clean
- Install the updated version: ./libs/verify-desktop-environment.sh
- Try it out: cargo test
Updating the Cross-Compiled NSS Artifacts
We use a Linux TC worker for cross-compiling NSS for iOS, Android and Linux desktop machines. However, due to the complexity of the NSS build process, there is no easy way for cross-compiling MacOS and Windows -- so we currently use pre-built artifacts for MacOS desktop machines (ref #5210).
- Look for the tagged version from the NSS CI (usually a description with something like Added tag NSS_3_90_RTM)
- Select the build for the following system(s) (first task with the title "B"):
  - For Intel MacOS: mac opt-static
- Update taskcluster/ci/fetch/kind.yml, specifically the nss-artifact task, to the appropriate url, checksum, and size.
  Note: To get the checksum, you can run shasum -a 256 {path-to-artifact} or you can make a PR and see the output of the failed log.
- Update the SHA256 value for darwin cross-compile in libs/build-nss-desktop.sh to the same checksum as above.
- Once the pull request lands, build-nss-desktop.sh should be updated once more using the L3 cache Taskcluster artifact.
Exposing new functions
If the new version of NSS comes with new functions that you want to expose, you will need to:
- Add low-level bindings for those functions in the nss_sys crate; follow the instructions in that crate's README.
- Expose a safe wrapper API for the functions from the nss crate.
- Expose a convenient high-level API for the functions from the rc_crypto crate.
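To make the three layers concrete, here is a runnable sketch in which HYPO_XorBuffers stands in for a newly exposed NSS function. The symbol, the module contents, the error types and the mask function are all hypothetical. In particular, the real nss_sys crate declares extern "C" bindings against NSS rather than defining a Rust stub; the stub exists here only so the example runs without linking NSS.

```rust
/// Layer 1 (nss_sys-style): raw FFI surface. Hypothetical symbol, stubbed locally.
mod nss_sys {
    use std::os::raw::{c_int, c_uint};

    /// Mirrors NSS's SECStatus convention (SECSuccess = 0, SECFailure = -1).
    pub type SECStatus = c_int;
    #[allow(non_upper_case_globals)]
    pub const SECSuccess: SECStatus = 0;
    #[allow(non_upper_case_globals)]
    pub const SECFailure: SECStatus = -1;

    /// In the real crate this would be an `extern "C"` declaration of an NSS symbol.
    #[allow(non_snake_case)]
    pub unsafe extern "C" fn HYPO_XorBuffers(
        out: *mut u8,
        a: *const u8,
        b: *const u8,
        len: c_uint,
    ) -> SECStatus {
        if out.is_null() || a.is_null() || b.is_null() {
            return SECFailure;
        }
        for i in 0..len as usize {
            unsafe { *out.add(i) = *a.add(i) ^ *b.add(i) };
        }
        SECSuccess
    }
}

/// Layer 2 (nss-style): safe wrapper that validates input, owns buffers, maps SECStatus.
mod nss {
    use super::nss_sys;
    use std::os::raw::c_uint;

    #[derive(Debug, PartialEq)]
    pub enum Error {
        LengthMismatch,
        InputTooLong,
        NssFailure,
    }

    pub fn xor_buffers(a: &[u8], b: &[u8]) -> Result<Vec<u8>, Error> {
        if a.len() != b.len() {
            return Err(Error::LengthMismatch);
        }
        if a.len() > c_uint::MAX as usize {
            return Err(Error::InputTooLong);
        }
        let mut out = vec![0u8; a.len()];
        // SAFETY: `a`, `b` and `out` are all valid for `a.len()` bytes.
        let status = unsafe {
            nss_sys::HYPO_XorBuffers(out.as_mut_ptr(), a.as_ptr(), b.as_ptr(), a.len() as c_uint)
        };
        if status == nss_sys::SECSuccess { Ok(out) } else { Err(Error::NssFailure) }
    }
}

/// Layer 3 (rc_crypto-style): the convenient API components actually call.
mod rc_crypto {
    use super::nss;

    #[derive(Debug, PartialEq)]
    pub enum Error {
        EmptyKey,
        Nss(nss::Error),
    }

    impl From<nss::Error> for Error {
        fn from(e: nss::Error) -> Self {
            Error::Nss(e)
        }
    }

    /// XORs `data` with `key`, repeating the key to cover the whole input.
    pub fn mask(data: &[u8], key: &[u8]) -> Result<Vec<u8>, Error> {
        if key.is_empty() {
            return Err(Error::EmptyKey);
        }
        let expanded: Vec<u8> = key.iter().cycle().take(data.len()).copied().collect();
        Ok(nss::xor_buffers(data, &expanded)?)
    }
}

fn main() {
    let masked = rc_crypto::mask(b"hello", b"k").expect("mask failed");
    let unmasked = rc_crypto::mask(&masked, b"k").expect("unmask failed");
    assert_eq!(unmasked, b"hello".to_vec());
    println!("round trip ok");
}
```

Each layer owns one concern: nss_sys only declares the C surface, nss turns unsafe pointer calls into a safe Result-returning function, and rc_crypto adds the ergonomics that components want.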
Tips for Fixing Bustage
On top of the primitives provided by NSS, we have built a safe Rust wrapper named rc_crypto that links to NSS and makes these cryptographic primitives available to our components.
The linkage is done by the nss_build_common crate. Note that it supports an is_gecko feature to link to NSS dynamically on Desktop.
Because the NSS static build process does not output a single .a file (it would be great if it did), nss_build_common must describe, for each architecture, which modules we should link against. It is mostly a duplication of logic from the NSS gyp build files. Note that this logic is also duplicated in our NSS lib build steps (e.g. build-nss-desktop.sh).
One of the most common build failures we get when upgrading NSS comes from NSS adding new vectorized/asm versions of a crypto algorithm for a specific architecture in order to improve performance. This new optimized code gets implemented as a new gyp target/module that is emitted only for the supported architectures. When we upgrade our copy of NSS we notice the linking step failing on CI jobs because of undefined symbols.
This PR shows how we update nss_build_common and the build scripts to accommodate these new modules. Checking the changelog for any suspect commit relating to hardware acceleration is rumored to help.
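As a rough illustration of what a per-architecture module description looks like, the sketch below builds Cargo link directives from a common module list plus an architecture-specific list, which is where a newly added vectorized/asm module would go. The module names are placeholders and the structure is our own simplification, not a copy of nss_build_common; the cargo:rustc-link-lib directive and the CARGO_CFG_TARGET_ARCH variable are standard Cargo build-script features.

```rust
use std::env;

// Placeholder module names; the real lists live in nss_build_common and mirror NSS's gyp files.
const COMMON_MODULES: &[&str] = &["example_core", "example_util"];

/// Architecture-specific modules (e.g. optimized asm variants) emitted only for some targets.
/// When an NSS upgrade fails with undefined symbols on one architecture, the new module
/// usually needs to be added to the matching arm here.
fn arch_modules(target_arch: &str) -> &'static [&'static str] {
    match target_arch {
        "x86_64" => &["example_x86_64_asm"],
        "aarch64" => &["example_aarch64_asm"],
        _ => &[],
    }
}

/// Builds the `cargo:rustc-link-lib` directives for a given target architecture.
fn link_directives(target_arch: &str) -> Vec<String> {
    COMMON_MODULES
        .iter()
        .chain(arch_modules(target_arch).iter())
        .map(|module| format!("cargo:rustc-link-lib=static={}", module))
        .collect()
}

fn main() {
    // In a build script, Cargo sets CARGO_CFG_TARGET_ARCH to the *target* architecture;
    // fall back to the host architecture when this is run as a plain program.
    let arch = env::var("CARGO_CFG_TARGET_ARCH")
        .unwrap_or_else(|_| env::consts::ARCH.to_string());
    for directive in link_directives(&arch) {
        println!("{}", directive);
    }
}
```

Keying on the target architecture rather than the host matters for cross-compilation, which is exactly the situation the NSS build is in.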
Firefox Accounts Client
The fxa-client component lets applications integrate with the Firefox Accounts identity service. A typical integration looks something like this:
- Out-of-band, register your application with the Firefox Accounts service, providing an OAuth redirect_uri controlled by your application and obtaining an OAuth client_id.
- On application startup, create a FirefoxAccount object to represent the signed-in state of the application.
  - On first startup, a new FirefoxAccount can be created by calling FirefoxAccount::new and passing the application's client_id.
  - For subsequent startups, the object can be persisted using the to_json method and re-created by calling FirefoxAccount::from_json.
- When the user wants to sign in to your application, direct them through a web-based OAuth flow using begin_oauth_flow or begin_pairing_flow; when they return to your registered redirect_uri, pass the resulting authorization state back to complete_oauth_flow to sign them in.
- Display information about the signed-in user by using the data from get_profile.
- Access account-related services on behalf of the user by obtaining OAuth access tokens via get_access_token.
- If the user opts to sign out of the application, call disconnect and then discard any persisted account data.
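For the sign-in step, the application has to pick the authorization state out of the URL it receives at its redirect_uri. A minimal sketch, assuming the redirect carries the standard OAuth 2.0 code and state query parameters and that those two values are what you hand on to complete_oauth_flow; parse_redirect, percent_decode and RedirectParams are hypothetical helpers, not part of fxa-client.

```rust
/// Hypothetical container for the values extracted from the redirect URL.
#[derive(Debug, PartialEq)]
struct RedirectParams {
    code: String,
    state: String,
}

/// Decodes `%XX` escapes and `+` (form-encoded space); returns None on malformed input.
fn percent_decode(s: &str) -> Option<String> {
    let bytes = s.as_bytes();
    let mut out = Vec::with_capacity(bytes.len());
    let mut i = 0;
    while i < bytes.len() {
        match bytes[i] {
            b'%' => {
                let hex = s.get(i + 1..i + 3)?;
                if !hex.bytes().all(|c| c.is_ascii_hexdigit()) {
                    return None;
                }
                out.push(u8::from_str_radix(hex, 16).ok()?);
                i += 3;
            }
            b'+' => {
                out.push(b' ');
                i += 1;
            }
            b => {
                out.push(b);
                i += 1;
            }
        }
    }
    String::from_utf8(out).ok()
}

/// Hypothetical helper (not part of fxa-client): extracts `code` and `state`
/// from the URL the browser was redirected to.
fn parse_redirect(url: &str) -> Result<RedirectParams, String> {
    let (_, query) = url.split_once('?').ok_or("redirect has no query string")?;
    // Ignore any fragment after the query.
    let query = query.split('#').next().unwrap_or("");
    let (mut code, mut state) = (None, None);
    for pair in query.split('&') {
        let (key, value) = pair.split_once('=').unwrap_or((pair, ""));
        let value = percent_decode(value).ok_or("malformed percent-encoding")?;
        match key {
            "code" => code = Some(value),
            "state" => state = Some(value),
            _ => {}
        }
    }
    Ok(RedirectParams {
        code: code.ok_or("missing `code` parameter")?,
        state: state.ok_or("missing `state` parameter")?,
    })
}

fn main() {
    let url = "https://example.com/oauth/callback?code=abc123&state=xyz%2D789";
    match parse_redirect(url) {
        // These two values are what the app would pass along to complete the OAuth flow.
        Ok(params) => println!("code={} state={}", params.code, params.state),
        Err(e) => eprintln!("error: {}", e),
    }
}
```

In a real app a proper URL library is preferable; the point here is only that both values must be present before the flow can be completed.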
Structs
- An OAuth access token, with its associated keys and metadata.
- A client connected to the user’s account.
- Information about the authorization state of the application.
- Parameters provided in an incoming OAuth request.
- The payload sent when invoking a “close tabs” command.
- A device connected to the user’s account.
- Device configuration
- Details of a web-push subscription endpoint.
- Object representing the signed-in state of an application.
- Local device that’s connecting to FxA
- Information about the user that controls a Firefox Account.
- A cryptographic key associated with an OAuth scope.
- The payload sent when invoking a “send tab” command.
- An individual entry in the navigation history of a sent tab.
- User data provided by the web content, meant to be consumed by user agents
Enums
- An event that happened on the user’s account.
- A “capability” offered by a device.
- Enumeration for the different types of device.
- FxA internal error type. These are used in the internal code. This error type is never returned to the consumer.
- Public error type thrown by many FirefoxAccount operations.
- Fxa event
- High-level view of the authorization state
- Fxa state
- Internal state machine events
- State passed to the state checker; this is exactly the same as internal_machines::State except the Complete variant uses a named field for UniFFI compatibility.
- A command invoked by another device.
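The state-checker entry above describes two copies of one state enum that differ only in how the Complete variant carries its payload. A minimal sketch of that mirroring, assuming a simplified two-variant state; every type, variant and field name below (AuthState, InternalState, CheckerState, new_state) is hypothetical, not fxa-client's. The internal enum keeps a tuple-style Complete, the checker-facing copy uses a named field, and From conversions go both ways.

```rust
/// Placeholder payload type (hypothetical).
#[derive(Debug, Clone, Copy, PartialEq)]
enum AuthState {
    Disconnected,
    Connected,
}

/// Internal-style state: `Complete` is a tuple variant.
#[derive(Debug, Clone, PartialEq)]
enum InternalState {
    Start,
    Complete(AuthState),
}

/// Checker-facing copy: identical, except `Complete` uses a named field.
#[derive(Debug, Clone, PartialEq)]
enum CheckerState {
    Start,
    Complete { new_state: AuthState },
}

impl From<InternalState> for CheckerState {
    fn from(state: InternalState) -> Self {
        match state {
            InternalState::Start => CheckerState::Start,
            InternalState::Complete(new_state) => CheckerState::Complete { new_state },
        }
    }
}

impl From<CheckerState> for InternalState {
    fn from(state: CheckerState) -> Self {
        match state {
            CheckerState::Start => InternalState::Start,
            CheckerState::Complete { new_state } => InternalState::Complete(new_state),
        }
    }
}

fn main() {
    let states = [
        InternalState::Start,
        InternalState::Complete(AuthState::Disconnected),
        InternalState::Complete(AuthState::Connected),
    ];
    for internal in states {
        let for_checker = CheckerState::from(internal.clone());
        // Converting back must give the original value: the two enums carry the same data.
        assert_eq!(InternalState::from(for_checker.clone()), internal);
        println!("{:?} -> {:?}", internal, for_checker);
    }
}
```

Keeping the conversions exhaustive `match` expressions means the compiler flags any variant added to one enum but not the other.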
Type Aliases
- Result returned by public-facing API functions
- Result returned by internal functions
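These two aliases pair with the internal and public error enums listed above: internal code returns the detailed internal error, and public entry points convert it into the public error before it reaches the consumer. A minimal sketch of that boundary, assuming the aliases are named Result and ApiResult; the error variants and the get_access_token stand-in (which takes an HTTP status purely for illustration) are made up and do not mirror fxa-client's actual definitions.

```rust
use std::fmt;

/// Internal error type (placeholder variants): detailed, never handed to consumers.
#[derive(Debug)]
enum Error {
    Network(String),
    UnexpectedStatus(u16),
}

impl fmt::Display for Error {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Error::Network(msg) => write!(f, "network error: {}", msg),
            Error::UnexpectedStatus(s) => write!(f, "unexpected HTTP status {}", s),
        }
    }
}

/// Public error type (placeholder variants): a coarse set consumers can match on.
#[derive(Debug, PartialEq)]
enum FxaError {
    Network,
    Authentication,
    Other(String),
}

/// Result returned by internal functions.
type Result<T> = std::result::Result<T, Error>;
/// Result returned by public-facing API functions.
type ApiResult<T> = std::result::Result<T, FxaError>;

/// The single place where internal errors are collapsed into public ones.
impl From<Error> for FxaError {
    fn from(err: Error) -> Self {
        match err {
            Error::Network(_) => FxaError::Network,
            Error::UnexpectedStatus(401) => FxaError::Authentication,
            other => FxaError::Other(other.to_string()),
        }
    }
}

/// Internal helper: uses the internal alias.
fn fetch_token(status: u16) -> Result<String> {
    match status {
        200 => Ok("token".to_string()),
        0 => Err(Error::Network("connection reset".to_string())),
        s => Err(Error::UnexpectedStatus(s)),
    }
}

/// Hypothetical public entry point: `?` converts Error into FxaError via From.
fn get_access_token(status: u16) -> ApiResult<String> {
    Ok(fetch_token(status)?)
}

fn main() {
    for status in [200, 0, 401, 500] {
        println!("{} -> {:?}", status, get_access_token(status));
    }
}
```

Funnelling every conversion through one From impl keeps the public error surface small and stable while internal errors stay free to change.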
Developing documentation
The documentation in this repository pertains to the application-services library, primarily the sync and storage components, the Firefox Accounts client and the nimbus-sdk experimentation client.
The markdown is converted to static HTML using mdbook. To add a new document, you need to add it to the SUMMARY.md file, which produces the sidebar table of contents.
Building documentation
Building the narrative (book) documentation
The mdbook crate is required in order to build the documentation:
cargo install mdbook mdbook-mermaid mdbook-open-on-gh
The repository documents are built with:
./tools/build.docs.sh
The built documentation is saved in build/docs/book.