Evernote, the online service that lets millions of users store their notes and related content for free, is putting the finishing touches to its move into the Google Compute Cloud and closure of its two main data centers.
With 200 million users as of July 2016, that's no small task. Evernote has 350 TB of note content, entered by customers. In addition, it has 2.58 PB of related data that is stored in customer accounts as attachments. Users might file notes for a talk at a conference in Evernote. And they might attach their travel itinerary, references cited in the talk, and recent news clippings as attachments. The volume of attachments ends up far outstripping the volume of "notes."
Evernote CTO Anirban Kundu, formerly a distinguished engineer at GoDaddy, said in an interview that it was crucial to migrate both types of content without losing any data and with as little interference with customer access as possible. Even so, there would have to be a short period of two minutes or so when customers would not have access to their data as the master data store in an Evernote data center switched over to a master in the Google Cloud.
The downtime was necessary because users accessing their accounts during the final migration phase might write something new to their account after the automated migration process had concluded that the data transfer was complete. By banning user connections briefly, Evernote could complete a segment of the migration by "flipping over" from master data in its own data center to master data in the Google cloud.
In most cases, that blackout was limited to two minutes. In some cases it stretched as long as four minutes, still a relatively brief interval for non-transactional and occasionally-used "notes" data, Kundu said.
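The reason for the brief blackout can be illustrated with a minimal Python sketch. It is not Evernote's tooling: the `Store` class and the `cut_over` function are hypothetical stand-ins, and the sketch assumes one shard small enough to hold in memory. It shows only the ordering described above: stop writes to the old master, copy whatever changed after the bulk transfer, then let the cloud copy take over as master.

```python
class Store:
    """Hypothetical in-memory stand-in for one shard's master data store."""

    def __init__(self, name):
        self.name = name
        self.data = {}
        self.accepting_writes = True

    def write(self, key, value):
        if not self.accepting_writes:
            raise RuntimeError(f"{self.name} is not accepting writes")
        self.data[key] = value


def cut_over(old_master, new_master):
    """Flip a shard from old_master to new_master and return the new master."""
    # 1. Start the blackout: no further user writes can land on the old master.
    old_master.accepting_writes = False

    # 2. Catch up on anything written after the bulk transfer finished.
    for key, value in old_master.data.items():
        if new_master.data.get(key) != value:
            new_master.data[key] = value
    # Drop keys that were deleted on the old master after the bulk transfer.
    for key in list(new_master.data):
        if key not in old_master.data:
            del new_master.data[key]

    # 3. End the blackout with the cloud copy serving as master.
    new_master.accepting_writes = True
    return new_master
```

If writes were still allowed during step 2, a note saved after the catch-up pass would exist only on the old master, which is the situation the blackout is meant to rule out.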
During the move, Evernote maintained three copies of the data and could lose one without jeopardizing the overall durability of customer data, he said. The smaller, customer notes data set was migrated into the cloud in 2.5 weeks without mishap. The attachment data took 60 days to make the move.
For eight years, Evernote had built up its customer base, including 20,000 paying business customers, by storing data in a Santa Clara, Calif., data center, with a second site outside Silicon Valley. "We require every piece of information to be stored in two different, geographical locations," said Kundu. Evernote will continue that practice by using different data center locations in the Google cloud. The distinct geographical locations make it less likely that a natural disaster, such as a wildfire, a Hurricane Sandy or a California earthquake, would take out both locations.
Kundu divided the data into 800 GB shards, and as he surveyed the task before him in November, he had 750 shards to migrate. The seven members of the IT staff dedicated to the task started cautiously, migrating one shard through a process that they had largely automated. But completing it still required manual intervention by three IT staffers at the end of the migration. By the end of the first week, they were up to 16 shards a day. By the end of the second week, they were migrating 32 shards a day. At their peak, they were able to migrate 318 shards in a single day, but that proved the exception.
The first shards of data established at Evernote tended to have the most engaged customers using them and took longer to migrate. Kundu's team only attempted four of them at a time.
"We wanted to interfere at a minimum with existing traffic," he said. Upon completion of a shard, a final check was run to make sure all the data in the new master in the cloud matched what was still resident in the former master in the Evernote data center.
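The article does not say how that final check was implemented. One common way to compare two copies of a data set is to compute a digest over each and compare the results, sketched below in Python. The function names and the representation of a shard as a dictionary mapping string keys to byte values are assumptions made for illustration only.

```python
import hashlib


def shard_digest(records):
    """Return a SHA-256 hex digest over a shard's records.

    `records` is assumed to map str keys to bytes values. Keys are
    processed in sorted order so the digest does not depend on the
    order in which the records were inserted.
    """
    h = hashlib.sha256()
    for key in sorted(records):
        k = key.encode("utf-8")
        v = records[key]
        # Length prefixes keep ("ab", "c") distinct from ("a", "bc").
        h.update(len(k).to_bytes(8, "big"))
        h.update(k)
        h.update(len(v).to_bytes(8, "big"))
        h.update(v)
    return h.hexdigest()


def shards_match(old_records, new_records):
    """True if both copies of the shard hold identical records."""
    return shard_digest(old_records) == shard_digest(new_records)
```

Comparing digests rather than raw records means each site only has to send a short fixed-size value over the network for the check.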
Kundu said he had misgivings about how the migration would turn out right up until Evernote migrated the first shard, ran it in the Google data center, and migrated it back to Santa Clara to prove they could do so. That test occurred the day before Thanksgiving. That shard of Evernote data performed in the cloud "with the level of latency we like to have. That was a big, big win," Kundu said.
"By the beginning of December, I was 80% confident it was going to happen" as planned, and Kundu's confidence grew as the team's ability to migrate multiple shards a day grew through the first two weeks.
"We had to build our own tools to look at the server logs and watch to see if any errors were occurring," he said. By monitoring the process, the team could be more confident that the final check of the new master against the old one would yield the right answer.
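Evernote's tools are not described in any detail, so the following Python sketch is only a guess at the general shape of such monitoring. The log line format (`LEVEL shard=N message`) and the function name `errors_by_shard` are invented for this example; they are not Evernote's format or code.

```python
import re
from collections import Counter

# Hypothetical log format: "<LEVEL> shard=<number> <message>"
LINE = re.compile(r"^(?P<level>[A-Z]+) shard=(?P<shard>\d+) (?P<msg>.*)$")


def errors_by_shard(lines):
    """Count ERROR-level log lines per shard number."""
    counts = Counter()
    for line in lines:
        match = LINE.match(line.strip())
        if match and match.group("level") == "ERROR":
            counts[int(match.group("shard"))] += 1
    return counts
```

A team could run a check like this while a shard is in flight and hold the switchover for any shard whose count is not zero.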
Through the automated process, the monitoring tools and the procedure for final switchover to master data in the cloud, all the migrations occurred as expected and checked out. At the start of the process, he had been able to envision glitches that would prompt a repeat attempt.
Evernote is slated to close its primary data center this week and is 50 days from being able to shut down its second center as well, except for a few residual management processes that will be kept running there, Kundu said.
"We didn't have to migrate any shard back into our data center. That was the thing I was most proud of," he said.
In addition, Evernote will be able to rely on the multiple data centers and scalable storage of the Google Cloud Platform while taking advantage of Google's machine-learning algorithms and other services being made available in the cloud, according to this account in Fortune.
Charles Babcock is an editor-at-large for InformationWeek and author of Management Strategies for the Cloud Revolution, a McGraw-Hill book. He is the former editor-in-chief of Digital News, former software editor of Computerworld and former technology editor of Interactive ...