MozMEAO SRE Status Report - October 31, 2017

Here’s what happened on the MozMEAO SRE team from October 24th - October 31st.

Current work

SUMO

An initial infra project structure for SUMO, along with S3 + Cloudfront distributions for dev, stage and production environments, has been created in this PR.

Future SUMO migration tasks will be tracked here, while infrastructure code will be stored here.

MDN

MDN attachments are now behind a Cloudfront CDN to help reduce load on the MDN web pods.

SRE misc

MozMEAO SRE Status Report - October 24, 2017

Here’s what happened on the MozMEAO SRE team from October 17th - October 24th.

Current work

SUMO

@glogiotatidis, @jgmize, @metadave and @pmac are starting to plan a SUMO migration from the SCL3 datacenter to AWS. Work will be tracked in a forthcoming project in the https://github.com/mozilla/kitsune repo.

MDN

SRE misc

MozMEAO SRE Status Report - October 17, 2017

Here’s what happened on the MozMEAO SRE team from October 3rd - October 17th.

Current work

MDN Migration to AWS

MDN is now running in AWS on Kubernetes.

Our migration project has been closed, and a new post-migration project has been opened.

Migration work
Migration testing
Post migration

Upcoming Portland Deis 1 cluster decommissioning

Applications are being moved off Deis 1 to support decommissioning the Deis 1 cluster in Portland.

Kuma Report, September 2017

Here’s what happened in September in Kuma, the engine of MDN Web Docs:

  • Ran Maintenance Mode Tests in AWS
  • Updated Article Styling
  • Continued Conversion to Browser Compat Data
  • Shipped Tweaks and Fixes

Here’s the plan for October:

  • Move MDN to AWS
  • Improve Performance of the Interactive Editor

Done in September

Ran Maintenance Mode Tests in AWS

Back in March 2017, we added Maintenance Mode to Kuma, which allows the site content to be available when we can’t write to the database. This mode got its first workout this month, as we put MDN into Maintenance Mode in SCL3, and then sent an increasing percentage of public traffic to an MDN deployment in AWS.
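To make the idea concrete, here is a minimal sketch of a read-only switch in plain Python. It is not Kuma's implementation: the `MAINTENANCE_MODE` flag, the `writes_database` decorator, and the two example functions are hypothetical names chosen for illustration.

```python
from functools import wraps

# Hypothetical flag; a real Django project would likely read this from settings.
MAINTENANCE_MODE = True


class ReadOnlyError(Exception):
    """Raised when a write is attempted while the site is read-only."""


def writes_database(func):
    """Mark a function as needing database writes and refuse it in maintenance mode."""
    @wraps(func)
    def wrapper(*args, **kwargs):
        if MAINTENANCE_MODE:
            raise ReadOnlyError(f"{func.__name__} is disabled in maintenance mode")
        return func(*args, **kwargs)
    return wrapper


def view_article(slug):
    # Reads are always allowed, so site content stays available.
    return f"content of {slug}"


@writes_database
def save_article(slug, body):
    # Placeholder for a database write; blocked while the flag is set.
    return f"saved {slug}"
```

With the flag set, `view_article("Web/HTML")` still returns content, while `save_article("Web/HTML", "...")` raises `ReadOnlyError` instead of touching the database.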

We ran 3 tests in September. In the first, we just tried Maintenance Mode with production traffic in SCL3. In the second test we sent 5% of traffic to AWS, and in the third test we ramped it up to 15%, then 50%, and finally 100%. The most recent test, on October 3, included New Relic monitoring, which gave us useful data and pretty charts.
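The traffic ramp can be pictured with a small simulation. This is only a sketch of weighted routing, assuming each request is independently sent to AWS with the configured probability; the real split was done through DNS configuration, and the `route_requests` function is a hypothetical helper, not part of our tooling.

```python
import random


def route_requests(n_requests, aws_percent, seed=0):
    """Simulate splitting requests between SCL3 and AWS by percentage."""
    rng = random.Random(seed)  # seeded so repeated runs give the same split
    counts = {"scl3": 0, "aws": 0}
    for _ in range(n_requests):
        # rng.random() is in [0, 1), so 100 percent always routes to AWS
        # and 0 percent never does.
        target = "aws" if rng.random() * 100 < aws_percent else "scl3"
        counts[target] += 1
    return counts


# Walk through the percentages used in the tests.
for percent in (5, 15, 50, 100):
    print(percent, route_requests(10_000, percent))
```

In practice DNS caching makes the real split less exact than this, since clients keep using a record until it expires.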

Web Transactions Time shows how the average request is handled by the different services. For the SCL3 side, you can see a steady improvement in transaction time from 125 to 75 ms, as more traffic is handled by AWS.

SCL3 transaction time

On the AWS side, the response time grows from 40 to 90 ms, as the DNS configuration sends 100% of traffic to the new cluster.

AWS transaction time

The Web Transaction Percentiles chart shows useful statistics beyond the average. For example, 99% of users see a response time of 375 ms or less, and the median is at 50 ms.

SCL3 transaction percent

On the AWS side, 99% of users see a response time of 350 ms or less (slightly better), and the median is at 100 ms (slightly worse).

AWS transaction percent
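For readers less familiar with percentiles, here is a short sketch of how a median and a 99th percentile fall out of a list of response times. The sample values are made up for illustration and are not data from the test, and the nearest-rank method used here is one common definition, not necessarily the one New Relic uses.

```python
import math
import statistics


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


# Made-up response times in milliseconds.
response_ms = [40, 45, 50, 50, 55, 60, 80, 120, 200, 375]

median_ms = statistics.median(response_ms)  # middle of the distribution
p99_ms = percentile(response_ms, 99)        # 99% of samples are at or below this
```

A single slow request barely moves the median but sets the 99th percentile, which is why the two numbers can move in opposite directions between SCL3 and AWS.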

Finally, Throughput measures the requests handled per minute. SCL3 continued handling over 500 requests per minute during the test. This may be due to clients using old DNS records, or because KumaScript continues making requests to render out-of-date pages.

SCL3 throughput

AWS ramped up to over 2000 requests per minute during the test, easily handling the load of a US afternoon.

AWS throughput

We consider this a successful test. Our AWS environment can easily handle regular, read-only MDN traffic, with capacity to spare. We don’t expect MDN users to notice much of a difference when we make the change.

Updated Article Styling

We’re working on the next phase of redesigning MDN. We’re looking at ways to present MDN articles, to make them easier to read, to scan quickly, and to emphasize the most useful information. We’re testing some ideas with users, and some of the adjustments showed up on the site this month.

For example, MDN documents a lot of code in prose, such as HTML element and attribute names. In PR 4400, Stephanie Hobson added a highlight background to make these stand out.

Before PR 4400, a fixed-width font was used to display literals:

Before 4400 no highlight

After PR 4400, the literals stand out with a light grey background:

After 4400 highlight

There’s a lot that goes into making text on the web readable (see Stephanie’s slides from her talk at #a11yTOConf for some suggestions). One of the things we can do with the default style is to try to make lines about 50-75 characters wide. On the other hand, code examples don’t wrap well, and we want to make them stand out. We’re experimenting with style changes for line length with beta testers, using some of the ideas from blog.mozilla.org. For example, PR 4402 expands the sample output, making the examples stand out from the rest of the page.

Before PR 4402, the examples shared the text’s narrow width:

Before 4402 narrow

After PR 4402, the example is as wide as the code samples, and the buttons are restyled:

After 4402 narrow

We’ll test more adjustments with beta testers and in individual user tests. Some of these we’ll ship immediately, and others will inform the article redesign.

Continued Conversion to Browser Compat Data

The Browser Compat Data (BCD) project now includes all the HTML and JavaScript compatibility data from MDN. 1,500 MDN pages now generate their compatibility tables from this data. Only 4,500 more to go!

The BCD project was the most active MDN project in September. There were 159 commits over 90 pull requests. These PRs came from 18 different contributors, bringing the total to 50 contributors. There are over 58,000 additional lines in the project. 13 of these PRs are from Daniel D. Beck, who is joining the MDN team as a contractor.

This progress was made possible by Florian Scholz, Jean-Yves Perrier, and wbamberg, who quickly and accurately reviewed the PRs, working out issues and getting them merged. Florian has also started a weekly release of the npm package, and we’re up to mdn-browser-compat-data 0.0.8.

Shipped Tweaks and Fixes

There were many PRs merged in September:

Here are some of the highlights:

Planned for October

Work will continue to migrate to Browser Compat Data, and to fix issues with the redesign and the new interactive examples.

Move MDN to AWS

This week, we’ll complete our functional testing of MDN, making sure that page editing and other read/write tests are working, and that the rarely used features continue to work.

On Tuesday October 10, we’ll put SCL3 in Maintenance Mode again, move the database, and come back with MDN in AWS.

We’ve done a lot of preparation, but we expect something to break, so we’re planning on fixing AWS-related bugs in October. The AWS move will also allow us to improve our deployment processes, helping us ship features faster. If things go smoothly, we have plenty of other work lined up, such as style improvements, SEO-related tweaks, updating to Django 1.11, and getting KumaScript UI strings into Pontoon.

Improve Performance of the Interactive Editor

We’re continuing the beta test for the interactive editor. The feedback has been overwhelmingly positive, but we’re not happy with the page speed impact. We’ll continue work in October to improve performance. In the meantime, contractor Mark Boas is preparing examples for the launch, such as 26 examples for JavaScript expressions and operators (PR 286).

MozMEAO SRE Status Report - October 3, 2017

Here’s what happened on the MozMEAO SRE team from September 26th - October 3rd.

Current work

MDN Migration to AWS

We’ve successfully completed a series of tests against MDN hosted in AWS, but we have a few more to complete before moving to AWS.

Testing

  • A successful MDN maintenance mode test was performed on Tuesday October 3rd 2017, at 2pm Eastern / 11am Pacific.

Migration work

  • Restrict URLs for untrusted (files / samples) and CDN domains. PR 529

  • New Relic support has been added to the MDN Kubernetes deployments in these PRs: 549, 548, 547, 542

  • MDN K8s crontasks have been updated to change the process user:group to kuma, add Deadmanssnitch support, and include some optimizations to prevent aws s3 sync from timing out. PR 533

  • Unused MDN S3 buckets have been deleted, with some manual cleanup due to versioning enabled on the buckets. PR 531
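The Deadmanssnitch pattern mentioned in the crontask item above can be sketched as a small wrapper: run the job, and check in only when it succeeds, so a missed check-in signals a failed or hung job. This is an illustration, not the code from PR 533. The `run_and_check_in` function and the `SNITCH_URL` placeholder are hypothetical.

```python
import subprocess
import urllib.request

# Placeholder check-in URL; a real one is issued per snitch.
SNITCH_URL = "https://nosnch.in/your-snitch-token"


def run_and_check_in(command, ping=None, timeout=None):
    """Run command and call ping only if it exits with status 0.

    Returns True when the command succeeded and the check-in was sent.
    """
    if ping is None:
        # Default check-in: a plain HTTP GET to the snitch URL.
        ping = lambda: urllib.request.urlopen(SNITCH_URL, timeout=10)
    # subprocess.run raises TimeoutExpired if the job exceeds timeout seconds,
    # so a hung job never reaches the check-in.
    result = subprocess.run(command, timeout=timeout)
    if result.returncode == 0:
        ping()
        return True
    return False
```

A cron entry could then call something like `run_and_check_in(["aws", "s3", "sync", source, destination], timeout=3600)`, where `source`, `destination`, and the one-hour limit are assumed values.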

Upcoming Portland Deis 1 cluster decommissioning

Applications are being moved off Deis 1 to support decommissioning the Deis 1 cluster in Portland.