MozMEAO SRE Status Report - February 16, 2018

Here’s what happened on the MozMEAO SRE team from January 23 - February 16.

Current work

SRE general

Load Balancers

Cloudflare to Datadog service

  • The Cloudflare to Datadog service has been converted to use a non-helm based install, and is running in our new Oregon-B cluster.

Oregon-A cluster

  • We have a new Kubernetes cluster running in the us-west-2 AWS region that will run support.mozilla.org (SUMO) services as well as many of our other services.

Bedrock

  • Bedrock is moving to a “sqlitened” version in our Oregon-B Kubernetes cluster that removes the dependency on an external database.

MDN

  • The cronjob that performs backups on attachments and other static media broke due to a misconfigured LANG environment variable. The base image for the cronjob was updated and deployed. We’ve also added some cron troubleshooting documentation as part of the same pull request.

  • Safwan Rahman submitted an excellent PR to optimize Kuma document views 🎉🎉🎉.

support.mozilla.org (SUMO)

  • SUMO now uses AWS Simple Email Service (SES) to send email.
  • We’re working on establishing a secure link between SCL3 and AWS for MySQL replication, which will help us significantly reduce the amount of time needed in our migration window.
  • SUMO is now using a CDN to host static media.
  • We’re working on Python-based Kubernetes automation for SUMO based on the Invoke library. Automation includes web, cron and celery deployments, as well as rollout and rollback functionality.
  • Using the Python automation above, SUMO now runs in “vanilla Kubernetes” without Deis Workflow.
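
The rollout and rollback automation described above could look roughly like the sketch below. It is not SUMO's actual code: the function names, the `sumo-web` deployment, and the `sumo-prod` namespace are hypothetical, and plain functions stand in for Invoke tasks so the example has no third-party dependency. Only the `kubectl set image`, `kubectl rollout status`, and `kubectl rollout undo` subcommands come from kubectl itself.

```python
import subprocess


def set_image_cmd(deployment, container, image, namespace):
    """Build the kubectl command that points a deployment at a new image."""
    return ["kubectl", "--namespace", namespace, "set", "image",
            f"deployment/{deployment}", f"{container}={image}"]


def rollout_status_cmd(deployment, namespace):
    """Build the command that blocks until the rollout finishes or fails."""
    return ["kubectl", "--namespace", namespace, "rollout", "status",
            f"deployment/{deployment}"]


def rollback_cmd(deployment, namespace):
    """Build the command that reverts a deployment to its previous revision."""
    return ["kubectl", "--namespace", namespace, "rollout", "undo",
            f"deployment/{deployment}"]


def deploy(deployment, container, image, namespace, run=subprocess.run):
    """Roll out a new image, and roll back if the rollout does not complete."""
    run(set_image_cmd(deployment, container, image, namespace), check=True)
    try:
        run(rollout_status_cmd(deployment, namespace), check=True)
    except subprocess.CalledProcessError:
        run(rollback_cmd(deployment, namespace), check=True)
        raise
```

Passing `run` as a parameter keeps the command construction testable without a cluster. An Invoke task would wrap these same commands.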

MDN Changelog for January 2018

Here’s what happened in January to the code, data, and tools that support MDN Web Docs:

Here’s the plan for February:

Done in January

Completed CSS Compatibility Data Migration and More

Thanks to Daniel D. Beck and his 83 Pull Requests, the CSS compatibility data is migrated to the browser-compat-data repository. This finishes Daniel’s current contract, and we hope to get his help again soon.

The newly announced MDN Product Advisory Board supports the Browser Compatibility Data project, and members are working to migrate more data. In January, we saw an increase in contributions, many from first-time contributors. The migration work jumped from 39% to 43% complete in January. See the contribution guide to learn how to help.

On January 23, we turned on the new browser compatibility tables for all users. The new presentation provides a good overview of feature support across desktop and mobile browsers, as well as JavaScript run-time environments like Node.js, while still letting implementors dive into the details.

Florian Scholz promoted the project with a blog post, and highlighted the compat-report addon by Eduardo Bouças that uses the data to highlight compatibility issues in a developer tools tab. Florian also gave a talk about the project on February 3 at FOSDEM 18. We’re excited to tell people about this new resource, and see what people will do with this data.

Shipped a New Method for Declaring Language Preference

If you use the language switcher on MDN, you’ll now be asked if you want to always view the site in that language. This was added by Safwan Rahman in PR 4321.

This preference goes into effect for our “locale-less” URLs. If you access https://developer.mozilla.org/docs/Web/HTML, MDN uses your browser’s preferred language, as set by the Accept-Language header. If it is set to Accept-Language: en-US,en;q=0.5, then you’ll get the English page at https://developer.mozilla.org/en-US/docs/Web/HTML, while Accept-Language: de-CH will send you to the German page at https://developer.mozilla.org/de/docs/Web/HTML. If you’ve set a preference with this new dialog box, the Accept-Language header will be ignored and you’ll get your preferred language for MDN.
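
The selection logic described above can be sketched as follows. This is an illustration under stated assumptions, not Kuma's implementation: the function names, the list of supported locales, and the fall-back from a regional tag such as `de-CH` to its primary language `de` are all assumptions made for the example.

```python
def parse_accept_language(header):
    """Return the language tags in an Accept-Language header, best first."""
    weighted = []
    for position, part in enumerate(header.split(",")):
        tag, _, params = part.strip().partition(";")
        tag = tag.strip()
        if not tag:
            continue
        quality = 1.0  # a tag without a q-value has full weight
        params = params.strip()
        if params.startswith("q="):
            try:
                quality = float(params[2:])
            except ValueError:
                quality = 0.0
        if quality > 0:  # q=0 means "not acceptable"
            weighted.append((-quality, position, tag))
    return [tag for _, _, tag in sorted(weighted)]


def choose_locale(accept_language, supported, saved_preference=None,
                  default="en-US"):
    """Pick a locale: saved preference first, then the header, then the default."""
    if saved_preference in supported:
        return saved_preference
    by_lowercase = {locale.lower(): locale for locale in supported}
    for tag in parse_accept_language(accept_language):
        tag = tag.lower()
        # Try the full tag first ("de-ch"), then its primary language ("de").
        for candidate in (tag, tag.split("-")[0]):
            if candidate in by_lowercase:
                return by_lowercase[candidate]
    return default
```

With `supported = ["en-US", "de", "fr"]`, the header `de-CH` selects `de`, and the same header with `saved_preference="en-US"` selects `en-US`.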

This is useful for MDN visitors who like to browse the web in their native language but read MDN in English. It doesn’t fix the issue entirely, though. If a search engine thinks you prefer German, for instance, it will pick the German translations of MDN pages, and send you to https://developer.mozilla.org/de/docs/Web/HTML. MDN respects the link and shows the German page, and the new language preference is not used.

We hope this makes MDN a little easier to use, but more will be needed to satisfy those who get the “wrong” page. I’m not convinced there is a solution that will work for everyone. I’ve suggested a web extension in bug 1432826, to allow configurable redirects, but it is unclear if this is the right solution. We’ll keep thinking about translations, and adjusting to visitors’ preferences.

Increased Availability of MDN

MDN easily serves millions of visitors a month, but struggles under some traffic patterns, such as a single visitor requesting every page on the site. We continue to make MDN more reliable despite these traffic spikes, using several different strategies.

The most direct method is to limit the number of requests. We’ve updated our rate limiting to return the HTTP 429 “Too Many Requests” code (PR 4614), to more clearly communicate when a client hits these limits. Dave Parfitt automated bans for users making thousands of requests a minute, which is far more than legitimate scrapers make.
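
To illustrate the idea, here is a minimal sliding-window rate limiter that answers with 429 once a client exceeds its allowance. It is a sketch, not the code from PR 4614: the class name, the in-memory storage, and the limit and window values are assumptions, and a real deployment would need storage shared across servers.

```python
import time
from collections import defaultdict, deque


class RateLimiter:
    """Allow at most `limit` requests per client in any `window` seconds."""

    def __init__(self, limit, window, clock=time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock
        self.hits = defaultdict(deque)  # client id -> timestamps of recent requests

    def status_for(self, client_id):
        """Return 200 if the request is allowed, or 429 if the client is over the limit."""
        now = self.clock()
        recent = self.hits[client_id]
        # Forget requests that have aged out of the window.
        while recent and now - recent[0] >= self.window:
            recent.popleft()
        if len(recent) >= self.limit:
            return 429
        recent.append(now)
        return 200
```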

Another strategy is to reduce the database load for each request, so that high traffic doesn’t slow down the database and all the page views. We’re reducing database usage by changing how async processes store state (PR 4615) and using long-lasting database connections to reduce time spent establishing per-request connections (PR 4644).
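
In Django, long-lasting connections are controlled by the `CONN_MAX_AGE` database setting, which defaults to 0 (close the connection at the end of each request). The fragment below is an illustrative sketch, not the configuration from PR 4644; the database name, the host, and the 60-second value are assumptions.

```python
# settings.py (fragment)
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.mysql",
        "NAME": "kuma",            # hypothetical database name
        "HOST": "mysql.example",   # hypothetical host
        # Reuse each connection for up to 60 seconds instead of
        # opening a new one for every request (the default is 0).
        "CONN_MAX_AGE": 60,
    }
}
```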

Safwan Rahman took a close look at the database usage for wiki pages, and made several changes to reduce both the number of queries and the size of the data transmitted from the database (PR 4630). This last change has significantly reduced the network traffic to the database.

All of these add up to a 10% to 15% improvement in server response time from December’s performance.

Ryan Johnson continued work on the long-term solution, to serve MDN content from a CDN. This requires getting our caching headers just right (PR 4638). We hope to start shipping this in February. At that point, a high-traffic user may still slow down the servers, but most people will quickly get their content from the CDN instead.

Shipped Tweaks and Fixes

There were 326 PRs merged in January, 67 of them from first-time contributors.

Planned for February

Continue Development Projects

In February, we’ll continue working on our January projects. Our plans include:

  • Converting more compatibility data
  • Serving developer.mozilla.org from a CDN
  • Updating third-party libraries for compatibility with Django 1.11
  • Designing interactive examples for more complex scenarios
  • Preparing for a team meeting and “Hack on MDN” event in March

See the December report for more information on these projects.

How to make a chart of your users' window sizes

In preparation for the MDN redesign, I examined our analytics to get an idea of how wide our users’ browser windows were. I wanted window widths, not screen sizes, and I thought a chart would tell a more compelling story than a table.

Here’s the chart I made:

Chart of MDN window widths showing spikes at 1350 and 1900 pixels and very
little between 420 and 930 pixels.

I found this view useful because it shows us “clumps” of window sizes.

How to make a chart of browser window widths

The basic idea is:

  1. Create and export a Custom Report for Browser Size.
  2. Filter the Browser Size to just include widths.
  3. Aggregate the number of users for each width.
  4. Make a chart.

Working with Google Analytics and Google Sheets, the specific steps I used were:

  1. Create a custom report for browser sizes.
    1. Customization > Custom Reports > New Custom Report
    2. Set the Metric Group to Users
    3. Set Dimension Drilldowns to Browser Size
    4. Save
  2. View the custom report.
  3. Set Show rows: to 5000.
  4. Export to Google Sheets.
  5. Delete the extra stuff from the top and bottom of the export; you just want two columns: Browser Size and Users.
  6. Create a new column (C) called Width. Add this formula to it and fill down: =REGEXEXTRACT(A2, "^[0-9]+"). This gives you a column with just the width part of the browser size.
  7. Create a new column (D) called Unique List. Add this formula to it (you don’t need to fill down): =UNIQUE(C2:C5001). This gives you a list of widths with no repeating values. That means 1900x950 and 1900x970 will be treated the same in our final chart.
  8. Create a new column (E) called Conditional Sums. Add this formula and fill down the height of your Unique List: =SUMIF(C$2:C$5001,D2,B$2:B$5001).
  9. Copy the Unique List and Conditional Sums columns.
  10. Create a new sheet in your document.
  11. Use Edit > Paste special > Paste values only to paste only the computed values of these columns.
  12. Rename Unique List to Width and Conditional Sums to Total Users.
  13. Find the (not set) row and delete it.
  14. Make sure both columns are being treated as numbers (a hint this is happening properly is that they are right aligned). If you have headings on the columns make sure they’re frozen (View > Freeze > 1 row).
  15. Sort on Width from A→Z.
  16. Select both columns and create a chart (Insert > Chart). (I made a “Stepped area chart”)
  17. Set Width as the X-axis.
  18. Done!
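
The same aggregation can be done in a few lines of Python if you export the report as CSV instead. This is a sketch under assumptions: the function names are hypothetical, and the CSV is assumed to have already been trimmed to the two columns Browser Size and Users.

```python
import csv
import io
import re
from collections import Counter


def users_by_width(csv_text):
    """Sum users per window width from rows like "1900x950,1234"."""
    totals = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        match = re.match(r"[0-9]+", row["Browser Size"])
        if match is None:
            continue  # skips rows without a width, such as "(not set)"
        users = int(row["Users"].replace(",", ""))
        totals[int(match.group())] += users
    return sorted(totals.items())


def share_at_least(width_totals, min_width):
    """Fraction of users whose window is at least `min_width` pixels wide."""
    total = sum(users for _, users in width_totals)
    wide = sum(users for width, users in width_totals if width >= min_width)
    return wide / total if total else 0.0
```

`users_by_width` returns (width, total users) pairs sorted by width, ready to feed to a charting tool, and `share_at_least(pairs, 1900)` gives the kind of percentage quoted below.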

This answered a question I’ve been curious about for ages: Do people with large monitors use MDN full screen? About 40% of our users have a screen resolution of 1900px or wider and 25% of our users use MDN at 1900px or wider.

MozMEAO SRE Status Report - January 23, 2018

Here’s what happened on the MozMEAO SRE team from December 2017 - January 23.

Current work

SRE general

We’re busy setting up multiple Kubernetes 1.8 clusters in us-west-2 to serve SUMO, Bedrock and other MozMEAO applications. These new clusters will replace our Deis 1 cluster in the same region.

www.mozilla.org

The Bedrock team will be moving from an RDS database to updates distributed via S3. This will make Bedrock hosting cheaper and easier to manage.

Additionally, Bedrock needs to be tweaked to run on Kubernetes natively as Deis Workflow is being discontinued.

MDN

MDN is switching its Celery results backend from MySQL to Redis, to avoid database reads and writes. We have not noticed a significant difference in throughput.
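
With Celery configured through Django settings, a change like this usually comes down to one setting. The fragment below is a sketch, not Kuma's configuration: the Redis host, port, and database number are placeholders, and the setting name assumes the uppercase, `CELERY_`-prefixed style.

```python
# settings.py (fragment)
# Store task results in Redis rather than in the MySQL database.
# The host, port, and database number are placeholders.
CELERY_RESULT_BACKEND = "redis://redis.example:6379/0"
```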

support.mozilla.org (SUMO)

Work is progressing quickly on the SUMO move from SCL3 to AWS.

User media is now hosted by S3 and CloudFront in SCL3 as of 2018-01-17 14:14 UTC. This makes our migration easier as it’s one less component we have to plan for on go-live day.

We’ve been discussing SUMO database architecture with a focus on high uptime. We’re also discussing our Elasticsearch architecture.

Work on a new CDN is in-progress for hosting static media in AWS. We’ll need to request a few DNS changes and certificates in order to proceed.

The team is also working on establishing MySQL replication between SCL3 and RDS, in order to significantly decrease the deployment window on migration day.

Kuma Report, December 2017

Here’s what happened in December in Kuma, the engine of MDN Web Docs:

Here’s the plan for January:

Done in December

Purged 162 KumaScript Macros

We moved the KumaScript macros to GitHub in November 2016, and added a new macro dashboard. This gave us a clearer view of macros across MDN, and highlighted that there were still many macros that were unused or lightly used. Reducing the total macro count is important as we change the way we localize the output and add automated tests to prevent bugs.

We scheduled some time to remove these old macros at our Austin work week, when multiple people could quickly double-check macro usage and merge the 86 Macro Massacre pull requests. Thanks to Florian Scholz, Ryan Johnson, and wbamberg, we’ve removed 162 old macros, or 25% of the total at the start of the month.

Increased Availability of MDN

We made some additional changes to keep MDN available and to reduce alerts. Josh Mize added rate limiting to several public endpoints, including the homepage and wiki documents (PR 4591). The limits should be high enough for all regular visitors, and only high-traffic scrapers should be blocked.

I adjusted our liveness tests, but kept the database query for now (PR 4579). We added new thresholds for liveness and readiness in November, and these appear to be working well.

We continue to get alerts about MDN during high-traffic spikes. We’ll continue to work on availability in 2018.

Improved Kuma Deployments

Ryan Johnson worked to make our Jenkins-based tests more reliable. For example, Jenkins now confirms that MySQL is ready before running tests that use the database (PR 4581). This helped find an issue with the database being reused, and we’re doing a better job of cleaning up after tests (PR 4599).
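
One common way to wait for a database before running tests is to poll its TCP port. The sketch below shows that technique only. It is not the check from PR 4581, the function name and defaults are hypothetical, and a successful connection shows that something is listening, not that MySQL is ready to answer queries.

```python
import socket
import time


def wait_for_port(host, port, timeout=60.0, interval=1.0):
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```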

Ryan continued developing branch-based deployments, making them more reliable (PR 4587) and expanding to production deployments (PR 4588). We can now deploy to staging and production by merging to stage-push and prod-push for Kuma as well as KumaScript, and we can monitor the deployment with bot notifications in #mdndev. This makes pushes easier and more reliable, and gets us closer to an automated deployment pipeline.

Added Browser Compatibility Data

Daniel D. Beck continued to convert CSS compatibility data from the wiki to the repository, and wrote 35 of the 57 PRs merged in December. Thanks to Daniel for doing the conversion work, and thanks to Jean-Yves Perrier for many reviews and merges over the holiday break.

Stephanie Hobson continued to refine the design of the new compatibility tables, including an icon for the Samsung Internet Browser and an updated Firefox icon (Kuma PR 4605). Florian Scholz added a legend, to explain the notation (KumaScript PR 437). We’re getting closer to shipping these to all users. Please give any feedback at Beta Testing New Compatibility Table on Discourse.

Said Goodbye to Stephanie Hobson

Stephanie Hobson is moving to the bedrock team in January, where she’ll help maintain and improve www.mozilla.org. Schalk Neethling will take over as the primary front-end developer for MDN Web Docs.

Over the past 3½ years, Stephanie has had a huge impact on MDN. She shared her expertise on accessibility, multi-language support, readable HTML tables and all things Google Analytics. She advocated for the users during the spam mitigations and Persona shutdown. She’d argue for design changes from a web developer’s perspective, and back it up with surveys and interviews.

She’s also a talented developer, authoring over 400 PRs. She’s responsible for a lot of the changes on MDN in 2017:

mdn-one-year

Schalk has been working on MDN for most of 2017. He’s been focused on the interactive examples project that fully shipped in December. He’s also been reviewing front-end PRs, and his feedback and suggestions have improved the front-end code for months. In December, Stephanie and Schalk worked closely to make a smooth transition, which included getting all the JavaScript to pass eslint tests (PR 4596 and PR 4597).

We look forward to seeing what Stephanie will do on bedrock, and we look forward to Schalk’s work and fresh perspective on MDN Web Docs.

Shipped Tweaks and Fixes

There were 209 PRs merged in December (which was supposed to be a light month), several of them from first-time contributors.

Planned for January

We’re continuing on existing projects like BCD in January, and starting some larger projects that will start to ship in February.

Prepare for a CDN

We’ve exhausted the easy solutions for increasing availability on MDN. We believe the next step is to put developer.mozilla.org behind a Content Distribution Network, or CDN. Once we have everything set up, most requests won’t even hit the Kuma engine, but instead will be handled by caching servers around the world. We expect it to take 1 - 2 months before we can get the majority of requests served by the CDN.

A first step is to reduce the page variants sent to anonymous users, so that the CDN edge servers can handle most requests. Schalk Neethling has been removing waffle flags or migrating them to switches over many PRs, such as PR 4561.

In January, Ryan Johnson will start adding the caching headers needed for the CDN to store and serve the pages without contacting Kuma.
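
As an illustration of what such headers do, the sketch below picks a `Cache-Control` value per response. It is not Kuma's code: the function name and the five-minute lifetime are assumptions. It relies on the standard `s-maxage` directive, which applies only to shared caches such as a CDN.

```python
def cache_control_for(is_authenticated, shared_max_age=300):
    """Choose a Cache-Control value for a page response.

    Anonymous pages may be stored by shared caches such as a CDN;
    pages rendered for a logged-in user must not be.
    """
    if is_authenticated:
        return "private, no-store"
    # s-maxage applies only to shared caches; max-age=0 makes browsers revalidate.
    return f"public, max-age=0, s-maxage={shared_max_age}"
```

This is also why reducing page variants for anonymous users matters: the fewer ways an anonymous page can differ, the more often the CDN can reuse a stored copy.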

We believe a CDN will reduce downtime and alerts from increased traffic. More importantly, we expect it will speed up MDN Web Docs for visitors outside the US.

Ship More Interactive Examples

We launched the interactive example editor on a dozen pilot pages, and the analytics look good. Just before the holiday break, we decided we can ship the interactive example editor to any MDN page. You can see it on CSS background-size, JavaScript Array.slice(), and more.

We have many more interactive examples ready to publish, including many JavaScript examples by Mark Boas. We’ll roll these and more out to MDN. We’ll also start on HTML interactive examples, and we’re planning to ship them in February. Follow mdn/interactive-examples to see the progress and learn how to help.

Update Django to 1.11

MDN Web Docs is built on top of Django. We’re currently using Django 1.8, first released in 2015. It is a long-term support (LTS) release that will be supported with security updates until at least April 2018. Django 1.11, released in 2017, is the new LTS release, and will be supported until at least April 2020. In January, we’ll focus on updating our code and third-party libraries so that we can quickly make the transition to 1.11.

For now, our plan is to stay on Django 1.11 until April 2019, when Django 2.2, the next LTS release, is shipped. Django 2 requires Python 3, and it may take a lot of effort to update Kuma and switch to third-party libraries that support Python 3. We’ll make a lot of progress during the 1.11 transition, and we’ll monitor our Django 2 and Python 3 compatibility in 2018.

Plan for 2018

We have a lot of things we have to do in Q1 2018, such as the CDN and Django 1.11 update. We postponed a detailed plan for 2018, and instead will spend some of Q1 discussing goals and priorities. During our discussions in December, a few themes came up.

For the MDN Web Docs product, the 2018 theme is Reach. We want to reach more web developers with MDN Web Docs data, and earn a key place in developers’ workflows. Sometimes this means making developer.mozilla.org the best place to find the information, and sometimes it means delivering the data where the developer works. We’re using interviews and surveys to learn more and design the best experience for web developers.

For the technology side, the 2018 theme is Simplicity. There are many seldom-used Kuma features that require a history lesson to explain. These make it more complicated to maintain and improve the web site. We’d like to retire some of these features, simplify others, and make it easier to work on the code and data. We have ideas around zone redirects, asset pipelines, and translations, and we hope to implement these in 2018.

One thing that has gotten more complex in 2017 is code contribution. We’re implementing new features like browser-compat-data and interactive-examples as their own projects. Kuma is usually not the best place to contribute, and it can be challenging to discover where to contribute. We’re thinking through ways to improve this in 2018, and to steer contributors’ effort and enthusiasm where it will have the biggest impact.