During his keynote at this month’s AWS re:Invent virtual conference, AWS CTO Werner Vogels introduced a number of organizations that detailed how they adapted to the pandemic and other market shifts by leveraging the cloud. That included toy giant The Lego Group’s move to serverless computing to support its online presence, particularly after customer demand outstripped its on-premise resources.
Much like others in ecommerce, Lego sees extremely spiky traffic patterns, said Nicole Yip, engineering manager in direct shopper technology at The Lego Group. She discussed how the team behind Lego.com deals with sudden increases in demand for its services, usually tied to product launches and sales events that encourage throngs of customers to access the site at the same time. “Imagine trying to tackle all of that spikiness and year-on-year growth with an on-premise monolith tied to back-end systems with limited scale,” Yip said.
Sometimes such monoliths stumble over themselves. In 2017, Lego faced a high-profile sales event, she said, for the Star Wars Millennium Falcon set -- the company’s biggest set to date. “On the launch day, we experienced a huge spike in traffic that resulted in our back-end services being overwhelmed,” Yip said. “All our customers could see was the maintenance page.”
The culprit service that failed the hardest was a small piece of functionality that calculated sales tax, she said. It made a call back to the on-premise tax calculation system that quickly reached its limits. “At that point, we knew that we were on a trajectory for growth that could no longer be sustained with an on-premise system,” Yip said.
Three key drivers, she said, led to Lego’s decision to move to the cloud:
- Instead of maintaining infrastructure that was not a differentiating factor for the company, Lego could focus on customer experiences.
- It became critical for the company to possess flexibility to respond to spikes in demand and access to the precise capacity it needed.
- Having composable architecture at the most granular levels increased Lego’s speed to market and ability to continue innovating.
With cloud resources in place, Lego could focus on its business logic, Yip said, and spread it across several layers of serverless services. “We batched them by carefully selected, third-party vendors who provide specialized services like payment providers and content management systems,” she said. Each layer was designed to scale automatically and independently, Yip said, to support constantly changing traffic profiles.
Lego’s journey to the cloud began in 2018, she said, by migrating a single baked-in service, sales tax calculation, followed by three back-end processing services. After 10 months, Yip said Lego matched its existing on-premise capabilities through a completely serverless platform that handled the same level of traffic and transactions. Soon the serverless platform exceeded those rates of transaction and traffic, setting new records every month, she said.
As Lego started 2020 with a cloud roadmap, a growing team, and a platform that was just a few months old, Yip said a question was raised in response to the pandemic and changes in consumer behavior. “Could we deliver on that ambitious roadmap with twice the number of engineers all onboarded remotely and keep the platform stable?” she asked. This all occurred while the site was also seeing high levels of traffic. Yip said Lego went on to double the number of its services while handling increasingly busy sales periods.
In the past year and a half, Lego also tripled the engineers on its team and launched another 36 serverless services, she said. “The growing team meant we had to distribute many tasks previously held centrally by an infrastructure team,” Yip said. “Automation has been key to supporting the ever-growing squads and application engineers to get their features and services into production.”
Lego’s ultimate goal on this front is to develop its application engineers into DevOps engineers, she said, who own and operate their services in production. One step toward that objective includes introducing a standard where all serverless services implement canary deployments of software updates -- a strategy of rolling out changes to a small subset of servers for testing before a wider release. The serverless operations also include on-call teams monitoring key, high-level metrics centrally, Yip said, with default alerts in place based on the profile of each service.
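To make the canary idea concrete, here is a minimal sketch of the two decisions involved: which requests go to the new version, and whether the new version is healthy enough to promote. It is an illustration only and does not describe Lego's implementation. The function names, the hash-based bucketing, and the 1% error tolerance are all assumptions chosen for the example; on a managed serverless platform the traffic split is usually handled by the platform itself rather than by application code.

```python
import hashlib


def route_version(request_id: str, canary_percent: int) -> str:
    """Return 'canary' for roughly canary_percent of request IDs, else 'stable'.

    Hypothetical helper: hashing the ID keeps routing deterministic, so the
    same request ID always lands on the same version.
    """
    if not 0 <= canary_percent <= 100:
        raise ValueError("canary_percent must be between 0 and 100")
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100  # bucket in 0..99
    return "canary" if bucket < canary_percent else "stable"


def should_promote(
    canary_errors: int,
    canary_requests: int,
    stable_errors: int,
    stable_requests: int,
    tolerance: float = 0.01,  # assumed threshold, not taken from the talk
) -> bool:
    """Promote only if the canary error rate is within tolerance of stable."""
    if canary_requests == 0 or stable_requests == 0:
        return False  # not enough data to compare the two versions
    canary_rate = canary_errors / canary_requests
    stable_rate = stable_errors / stable_requests
    return canary_rate <= stable_rate + tolerance
```

In this sketch a rollout would start with a small `canary_percent`, collect error counts for both versions, and widen the release only when `should_promote` returns `True`; otherwise the update is rolled back.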
“This is giving our engineering team a starting point of how to monitor their services in production and not only detect but react to issues that are happening in their space quickly,” she said. The rapid growth of the team, Yip said, with engineers possessing different levels of experience, meant tacit standards would no longer suffice. Guidelines that Lego introduced for deployment and monitoring of services are intended to make it easier for team members to take ownership, she said. Next up, she said, Lego plans to focus on standards for its remaining reliability and performance pillars and increase visibility. “We want to show our engineers the services they own, what state they are in, all in one place,” Yip said.