As a startup company, Lyft wasn't so different from many other young companies. It skipped the data center building phase, never bought servers, and instead established its business in the cloud.
"It was 100% based on Amazon Web Services. We had three servers in US East (Ashburn, Va.), each in a different availability zone," recalled Chris Lambert, the CTO of Lyft, during a recent talk during the Amazon Summit in New York on Aug. 11.
That decision was made in 2012, when the company got started. Four years later, "We have thousands of servers today, powering over 150 different microservices," said Lambert in an email exchange with InformationWeek. Lyft is on pace for $1 billion in revenue. But, says Lambert, it still doesn't have a data center.
"We're still 100% on Amazon but we're taking advantage of many more Amazon technologies than just EC2," Lambert said, after being invited to the stage during a keynote by AWS CTO Werner Vogels.
Lyft is an example of how a company can start out small with Amazon and within a few years find it needs some of the business IT services that Amazon has since grown to provide.
For example, Lyft uses Auto Scaling, DynamoDB, Load Balancing, and Kinesis data streaming.
The company began using Amazon's DynamoDB service for the data it gathered on each ride, including the GPS data, after outgrowing its own server-hosted database. The data built up quickly, but Lyft managers liked the ease with which DynamoDB scaled with the growth.
"It was so simple to scale out. We had two knobs. One was for reads and one was for writes," Lambert recalled.
The engineering team didn't have to worry about "chunk migration" or watch for an alert when things went awry. "It just works," Lambert said at the summit.
In an additional email comment, Lambert wrote:
"We now have hundreds of different DynamoDB tables in production, across many different production services. We're not releasing any numbers on storage size, but it's fair to say that we haven't been constrained by any capacity limits on DynamoDB."
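The two knobs Lambert describes most likely correspond to DynamoDB's provisioned read and write capacity, which are set per table. As a rough illustration only, the Python sketch below builds the parameters for a DynamoDB UpdateTable request. The table name and capacity values are hypothetical, and nothing here reflects Lyft's actual configuration.

```python
def throughput_update(table_name, read_units, write_units):
    """Build the parameters for a DynamoDB UpdateTable request.

    The two "knobs" are the provisioned read and write capacity units.
    """
    if read_units < 1 or write_units < 1:
        raise ValueError("capacity units must be at least 1")
    return {
        "TableName": table_name,
        "ProvisionedThroughput": {
            "ReadCapacityUnits": read_units,    # knob one: reads
            "WriteCapacityUnits": write_units,  # knob two: writes
        },
    }


# Hypothetical table name and capacity values, for illustration only.
params = throughput_update("rides", read_units=500, write_units=2000)
print(params)
```

With the boto3 library installed and AWS credentials configured, a dictionary like this could be passed to the DynamoDB client as `update_table(**params)`. The sketch stops short of making that call so it can run anywhere.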
Keeping the focus on the rides, not the compute infrastructure, is a constant goal. Lambert recalls that Lyft's first application on EC2 was so simple that the two engineers who conceived it could see data indicating the first passenger had arrived at his destination, but they weren't sure they should believe it.
"They said, 'Maybe it worked ... we actually don't know,'" he told the summit audience. So they got the driver's phone number and called him.
"'Did you just drop off Darrell?' He said, 'yes.' I don't know who was more surprised: The engineers that it worked or the driver that we had called him," Lambert recounted. The driver probably figured that would be his experience with every drop-off, he joked.
Expansion in Real-Time
That first passenger drop-off occurred May 31, 2012. By March 2014, Lyft was active in San Francisco, Los Angeles, and a few other cities. However, it decided it was time to make a big move.
Again, it wanted to maintain simplicity of operations but at a much larger scale.
"We decided to expand into 24 cities. In engineering, we said, 'Let's get out of the way. Our infrastructure should scale up seamlessly,'" Lambert said. At that point, the engineers invoked Amazon's Auto Scaling service.
Auto Scaling let Lyft's operation expand to cope with its busiest period of the week -- Saturday night -- and it helped Lyft in another way.
"We do eight times the rides at our peak time on Saturday night compared to Sunday morning," Lambert noted. Auto Scaling allowed Lyft operations to automatically shut down servers that it no longer needed. "Scaling down is equally important. You save money when you don't use all this capacity that you don't need."
Lambert added to his summit comments in an email message:
"From day one, Lyft has always had a solid track record with service orchestration and configuration management. This got us a lot of the way towards being prepared for autoscaling, but there were some subtleties around log rotation and draining event queues that we wanted to button up, and we were able to do that before the 24-city launch in March.
"Once we made the move to autoscaling, scaling our service tier was no longer an operational task that our engineering teams had to think about, freeing us up to invest more in other areas of the business."
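To make the scale-up and scale-down arithmetic concrete, here is a minimal Python sketch of the kind of calculation an autoscaling policy performs. It is not Amazon's Auto Scaling logic or Lyft's configuration. The request rates, per-server capacity, and fleet limits are hypothetical, and only the eight-to-one ratio between peak and quiet periods comes from Lambert's remarks.

```python
import math


def desired_servers(requests_per_sec, per_server_capacity, min_servers, max_servers):
    """Return how many servers a fleet needs for the current load.

    The result is clamped so the fleet never shrinks below min_servers
    or grows beyond max_servers.
    """
    needed = math.ceil(requests_per_sec / per_server_capacity)
    return max(min_servers, min(max_servers, needed))


# Hypothetical load figures: Saturday night carries eight times the
# traffic of Sunday morning, matching the ratio Lambert cites.
sunday_morning = desired_servers(1000, per_server_capacity=100, min_servers=3, max_servers=200)
saturday_night = desired_servers(8000, per_server_capacity=100, min_servers=3, max_servers=200)
print(sunday_morning, saturday_night)
```

The same function covers both directions: when load falls, the desired count falls with it, which is where the savings Lambert mentions come from.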
Lyft continued to expand its use of Amazon IT services over the following months, building up to August 2014, when it had its "single biggest launch, Lyft Line, our shared ride service," Lambert said.
Lyft had been collecting reams of data on its rides, including all the GPS points recorded, and turned that data over to its data scientists. Every few months they ran simulations on it in Amazon Redshift, asking, "What might be possible in the way of new transportation modes?" Lambert said.
"We found often during peak times, if two users request rides within a few minutes of each other over similar routes, we can offer them the option of sharing the ride, with up to a 50% savings possible," Lambert said. "We realized at peak times, 90% of our rides were co-similar enough that we could actually build a shared rides product... "
Getting More From Data
The data mining in Redshift and resulting application proved extremely valuable to the young firm. It helped differentiate it from competitors, such as Uber. "It's our biggest driver of growth in our biggest markets, and is a feature of the company," said Lambert.
"We have a number of teams that are working on data driven approaches to improve the Lyft Line experience, even today," Lambert added in an email message.
To some extent, Lyft appears to have become habituated to the services that Amazon was offering and willing to experiment with ways to make them useful to its business. One such service, Kinesis, is a streaming service that many people tend to think of as something you would use to collect data from devices on the internet of things.
Lyft engineers realized it might fit something else they wanted to do.
By August 2015, a year after the Lyft Line application started running, Lyft was experiencing a build-out of small services designed to operate independently in the cloud. By August 2016, it had 100 microservices in operation.
"We had a proliferation of microservices, all serving production traffic, and we wanted an easy, simple way for them to communicate with each other. We built this pub/sub system on top of Kinesis that funnels every single production event through this system," he said.
"Every time you open the app, every time you request a ride, every time the car actually moves, it goes through this system," he told the crowd.
As software events stream through Kinesis, other applications can specify which events they wish to be notified of.
"Any other service at Lyft can subscribe to it," Lambert said. "I care about credit card adds (new credit cards added to personal accounts). So when a customer adds a card, our fraud system can say: 'Every time a user adds a new payment system or removes a payment system, let me know because I want to rescore that user's fraud risk.'"
Kinesis event and data streams will allow Lyft to constantly check its pricing against passenger traffic and implement demand-based pricing during peaks. That's something Lyft doesn't want to get wrong, since a mistake would yield an advantage to competitors, he said.
Lyft's systems appear to be doing the job. As the company creeps up on $1 billion in revenue, it's carrying 10,000 people at any given moment and delivering 14 million rides a month. "Downtime is not an option for us," said Lambert. So far, AWS has been "reliable and available."
What he likes most is the way Lyft concentrates on delivering rides and the customer experience of these rides, while Amazon concentrates on the infrastructure. "This is a complex infrastructure with a lot of moving parts -- but it's simple for us to maintain," he said.