MozMEAO SRE Status Report - June 20, 2017

Here’s what happened on the MozMEAO SRE team from June 13th - June 20th.

Current work

Static site hosting

  • The irlpodcast site now has a staging environment also hosted in S3 with CloudFront. Additionally, Jenkins has been updated to deploy to staging and production via git push.

  • We’re going to move viewsourceconf.org from Kubernetes to S3 and CloudFront hosting. Production and staging environments have been provisioned, but we’ll need to update Jenkins to push changes to these new environments.

Basket move to Kubernetes

Kubernetes (general)

Our DataDog, New Relic and MIG DaemonSets have been configured to use Kubernetes tolerations to schedule pods on master nodes. This allows us to capture metrics from K8s master nodes in addition to worker nodes.
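As a rough illustration of why tolerations are needed here, the sketch below models the rule Kubernetes uses to decide whether a pod may be scheduled on a tainted node. The taint key `node-role.kubernetes.io/master` is an assumption about how the masters are tainted, and the class and function names are hypothetical; this is a simplified model, not the scheduler's code or our actual DaemonSet configuration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Taint:
    key: str
    effect: str
    value: str = ""


@dataclass(frozen=True)
class Toleration:
    key: str = ""            # empty key + "Exists" tolerates every taint key
    operator: str = "Equal"  # "Equal" or "Exists"
    value: str = ""
    effect: str = ""         # empty effect tolerates every effect


def tolerates(toleration: Toleration, taint: Taint) -> bool:
    """Return True if a single toleration matches a single taint."""
    if toleration.effect and toleration.effect != taint.effect:
        return False
    if toleration.key and toleration.key != taint.key:
        return False
    if toleration.operator == "Exists":
        return True
    # "Equal" requires a key and an identical value.
    return (
        bool(toleration.key)
        and toleration.operator == "Equal"
        and toleration.value == taint.value
    )


def schedulable(tolerations: List[Toleration], taints: List[Taint]) -> bool:
    """A pod fits if every scheduling-blocking taint is tolerated."""
    blocking = [t for t in taints if t.effect in ("NoSchedule", "NoExecute")]
    return all(any(tolerates(tol, t) for tol in tolerations) for t in blocking)


# Assumed taint on master nodes (not confirmed by this report).
MASTER_TAINT = Taint(key="node-role.kubernetes.io/master", effect="NoSchedule")

# Hypothetical toleration for a monitoring DaemonSet.
MONITORING_TOLERATION = Toleration(
    key="node-role.kubernetes.io/master", operator="Exists", effect="NoSchedule"
)
```

Without a matching toleration, `schedulable([], [MASTER_TAINT])` is false, which is why a DaemonSet with no tolerations would only run on worker nodes.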

Frankfurt Kubernetes cluster provisioning

Work continues to enable our apps in the new Frankfurt Kubernetes cluster. In addition, we’re working on automating our app installs as much as possible.

MDN

  • ElasticSearch will be upgraded to 2.4 in SCL3 production on June 21 at 11 AM PST.

  • We may reconsider self-hosting ElasticSearch.

MozMEAO SRE Status Report - June 13, 2017

Here’s what happened on the MozMEAO SRE team from June 6th - June 13th.

Current work

Frankfurt Kubernetes cluster provisioning

We’re provisioning a new Kubernetes 1.6.4 cluster in Frankfurt (eu-central-1). This cluster takes advantage of features in new versions of kops, helm, and kubectl.

We’ve modified our New Relic, Datadog, and mig DaemonSets with tolerations so we can gather system metrics from both K8s master and worker nodes.

The first apps to be installed in this cluster will be bedrock and basket.

Basket move to Kubernetes

Basket has been moved to Kubernetes! We experienced some networking issues in our Virginia Kubernetes cluster, so traffic has been routed away from this cluster for the time being.

Snippets

The Firefox 56 activity stream will ship to some users, with some form of snippets integration.

MozMEAO SRE Status Report - 6/6/2017

Here’s what happened on the MozMEAO SRE team from May 30th - June 6th.

Current work

Scale down Deis 1 clusters

Now that bedrock, nucleus, surveillance, and viewsourceconf have been deployed to Kubernetes, we scaled down our Portland Deis 1 cluster from 10 nodes to 6 to save on AWS costs. Additionally, the Deis 1 ELBs for surveillance, viewsourceconf, and nucleus have been decommissioned.

Cloudflare to Datadog service running in Kubernetes

The Cloudflare to Datadog service that was previously running in Deis 1 is now running in Kubernetes. Additionally, an external contributor has submitted a pull request to add this service to the Kubernetes charts repo. The PR looks to be abandoned, so it may be closed without being merged within a few days. If this happens, we’ll open a new PR with any requested changes from the current PR.

Cloudfront Provisioning

We’ve started work on provisioning Cloudfront, a global content delivery network service, for our bedrock staging environment. Once we iron out the wrinkles with bedrock stage, we’ll continue on to bedrock prod.

Preparing to move basket.mozilla.org to Kubernetes

Work has started to move Basket to Kubernetes.

pmac has completed work to build and deploy Basket with Jenkins, similar to how our Bedrock deployment works.

Evaluation of Kubewatch

We tried Kubewatch, a service to watch Kubernetes events and report them to Slack. However, this doesn’t seem like the right tool for us, as it currently doesn’t allow us to filter the many notifications that we get.

Future work

Decommission openwebdevice.org

We are waiting on some internal communications before moving forward.

Kuma Report, May 2017

Here’s what happened in May in Kuma, the engine of MDN:

  • Refactored zone CSS
  • Improved drafts
  • Moved redirects into Kuma
  • Retired old features
  • Let data be data
  • Shipped tweaks and fixes

Here’s the plan for June:

  • Ship on-site interactive examples
  • Ship brand updates to beta users
  • Add KumaScript macro tests
  • Ship the sample database

Done in May

Refactored Zone CSS

Some MDN sections look different, like the archive of old pages. Others also appear at non-standard URLs, like the Firefox pages. Kuma uses manually maintained Zones to accomplish this, and it is a source of bugs and inconsistent experiences.

We took a big step toward better zones by refactoring the custom styles. escattone did the backend work (PR 4209) so that styles are automatically applied across translations. stephaniehobson did the front-end work, moving the CSS from the database to the repository (PR 4206), then splitting them into per-zone CSS files (PR 4224, PR 4229).

The zone CSS is now up to the quality standard of the rest of our CSS, and the experience across translations is more consistent. It wasn’t easy, taking 10 total PRs, but Sass and other front-end tools made the transition smoother than it would have been a year ago. Custom Zone URLs are still painful, but we’ll tackle those soon.

Improved Drafts

We have a papercut process to determine the most annoying bugs. Recently, bugs around the drafts feature rose to the top. The draft feature saves the editor content to local storage, to add a layer of safety from browser crashes and session timeouts.

stephaniehobson has been working on PR 4186 for a few weeks, and it was recently merged to master. This PR fixes 6 known bugs, including the document_saved query parameter. This code will be deployed next week.

Moved Redirects into Kuma

In production, many basic redirects are handled using Apache RewriteRules. This helped with the transition from DekiWiki to Kuma in 2012. As we move to AWS, we’d like to move this functionality into Kuma. This makes it easier to test and modify redirects, reduces differences between development and deployment, and reduces or eliminates the need for Apache or another web server.

pmac recently released django-redirect-urls, which packages the redirects code used by bedrock. metadave integrated this library (PR 4217), and translated production Apache rules into Kuma code (PR 4220). The functional tests exposed an Apache configuration difference between staging and production, which our WebOps team fixed. The work continues in PR 4231.
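To give a feel for what moving a RewriteRule into application code looks like, here is a minimal sketch of a pattern-based redirect table. It does not use the library's actual API, and the two rules are hypothetical examples rather than real MDN redirects; the point is only that rules expressed as data can be exercised by ordinary unit tests.

```python
import re
from typing import List, Optional, Pattern, Tuple

RedirectRule = Tuple[Pattern[str], str]


def rule(pattern: str, target: str) -> RedirectRule:
    """Compile one redirect rule: a path regex and a target template."""
    return (re.compile(pattern), target)


# Hypothetical rules, for illustration only.
RULES: List[RedirectRule] = [
    rule(r"^/en/docs/(?P<slug>.+)$", r"/en-US/docs/\g<slug>"),
    rule(r"^/old-feed/?$", "/feed"),
]


def resolve(path: str, rules: List[RedirectRule]) -> Optional[str]:
    """Return the redirect target for a path, or None if no rule matches."""
    for pattern, target in rules:
        match = pattern.match(path)
        if match:
            # Substitute named groups from the path into the target template.
            return match.expand(target)
    return None
```

Because the first matching rule wins, the order of the table matters in the same way the order of Apache rules does.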

Now that we have a redirects framework in Kuma, we may use it to help retire the custom zone URLs.

Retired Old Features

I removed some features that have been deprecated in the last year:

The changes removed 7,600 lines from the Kuma project, and mean that we don’t have to explain this bit of history to new contributors. We’re using more of the native services of TravisCI, which makes our py27 build 30% faster, and lets us experiment with alternate environments and services.

Let Data be Data

There’s a lot of data on MDN, contributed over more than 10 years. A lot of that data is trapped in formats like HTML that made it easy to contribute, but hard to maintain and remix. We want to formalize this data in machine-parsable formats, so that MDN and others can use it in new and exciting ways.

mdn/browser-compat-data is a growing repository of Browser Compatibility data extracted from MDN. There were 36 merged PRs in May, and we’re using it on some of the compatibility tables on MDN.

mdn/data contains general data for Web technologies, starting with CSS data such as properties, selectors, and types. There were 12 merged PRs in May, and after some recent updates (PR 162 by jwhitlock and PR 183 by Elchi3) we’re using the master branch on MDN again.

With these data sources rapidly changing, there is pressure on KumaScript to move quickly and break fewer things. They can be loaded as npm packages (npm install mdn/browser-compat-data and npm install mdn/data), and with escattone’s PR 183, we’re loading some of the data this way. He also has switched from nodeunit to Mocha (PR 188), in preparation for automated testing of KumaScript macros.

Shipped Tweaks and Fixes

Here are some other highlights from the 37 merged Kuma PRs in May:

Here are some other highlights from the 19 merged KumaScript PRs in May:

Planned for June

Mozilla is gathering in San Francisco for an All-Hands meeting at the end of June, which leaves 3 weeks for development work. Here’s what we’re planning to ship in June:

Ship On-site Interactive Examples

We ran an A/B test on popular pages, showing half the users pages with small examples on top, and half without. We looked at the analytics, and we did not see a significant change in user behavior. We did get feedback that the samples are useful, especially for those reminding themselves how a familiar technology works.
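For readers unfamiliar with how "no significant change" is usually judged, the sketch below shows one common approach, a two-sided two-proportion z-test. It is an assumption that a test of this kind applies here; the report does not say which metric or method was used, and the function name is hypothetical.

```python
import math
from typing import Tuple


def two_proportion_z_test(
    success_a: int, total_a: int, success_b: int, total_b: int
) -> Tuple[float, float]:
    """Compare two conversion rates; return (z statistic, two-sided p-value)."""
    p_a = success_a / total_a
    p_b = success_b / total_b
    # Pooled rate under the null hypothesis that both groups behave the same.
    pooled = (success_a + success_b) / (total_a + total_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
    if std_err == 0:
        raise ValueError("rates of exactly 0 or 1 in both groups cannot be tested")
    z = (p_a - p_b) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value
```

A p-value above the chosen threshold (0.05 is conventional) means the observed difference between the two groups could plausibly be noise, which is consistent with relying on qualitative feedback instead.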

We’re going ahead with the next phase. We’re going to make the new version the default, and start experimenting with interactive examples. Instead of looking for changes in site usage, we’ll focus on interaction and performance. schalkneethling is leading this next phase, and you can follow the work at mdn/interactive-examples.

Ship Brand Updates to Beta Users

Mozilla had an open design process to develop a new brand identity, and has a website detailing the results. This new brand is rolling out across Mozilla websites. We’ve also been thinking about the brand, mission, and focus of MDN, which has evolved over the last five years.

In June, we’ll start talking about the MDN brand, and will start shipping some of the new elements to beta users, such as updated logos, headers, and footers.

Add KumaScript Macro Tests

Currently, maintainers review KumaScript macro changes by manually testing them in development environments. This works for small changes, but big changes and complex macros are hard to test manually. In June, escattone will start adding regression tests for some key macros. When we have a working framework and some good examples, we’ll start asking staff and contributors to add tests for other macros, and to submit updated tests with PRs.

Ship the Sample Database

The Sample Database has been promised every month since October 2016, and has slipped every month. We don’t want to break the tradition, so we’ll bend it a little. The first bit of the supporting code, a scrape_user command, has been merged, and the rest of the code will ship in July. See PR 4248 for the scrape_document command, and PR 4076 for the remaining tasks.

MozMEAO SRE Status Report - 5/30/2017

Here’s what happened on the MozMEAO SRE team from May 23rd - May 30th.

Current work

Bedrock (mozilla.org)

Bedrock has been stable in production on Kubernetes for 7 days. The current traffic policy includes Virginia (K8s), Tokyo (K8s), Portland (Deis 1) and Ireland (Deis 1).

  • application limits/requests were increased to deal with initial performance issues.

  • we’re discussing replacing the usage of assets.mozilla.org on www.mozilla.org.

nucleus.mozilla.org

Nucleus has been moved from our Deis 1 infrastructure to Kubernetes in Virginia.

surveillance.mozilla.org

The surveillance site has been moved from our Deis 1 infrastructure to Kubernetes in Virginia.

snippets.mozilla.org

Web QA tests have been added by Stephen Donner to the snippets service.

Future work

Move basket.mozilla.org to K8s

We’re planning to move basket to Kubernetes shortly after the nucleus migration, and then proceed to decommission existing infrastructure.

Scale down Deis 1 clusters

Now that we’re serving a large portion of production traffic via Kubernetes, we can safely scale down the Portland and Ireland Deis 1/Fleet clusters to reduce AWS costs. We’ll also be provisioning a Portland Kubernetes cluster in the near future.

Decommission openwebdevice.org

We are waiting on some internal communications before moving forward.