Google on Friday blamed a recent Google Docs outage on a real-time collaboration update that exposed a memory management bug in its system.
Google Docs was inaccessible for about an hour on Wednesday afternoon, meaning users could not view or edit document lists, documents, drawings, or Apps Scripts.
“So what happened? The outage was caused by a change designed to improve real time collaboration within the document list,” Alan Warren, a Google engineering director, wrote in a blog post. “Unfortunately this change exposed a memory management bug which was only evident under heavy usage.”
Warren said that every time someone modifies a Google Doc, a machine works behind the scenes to look up the server that needs to be updated. The memory management bug, however, meant that these machines didn't recycle their memory correctly after those lookups. As a result, they ran out of memory and restarted. While they restarted, other machines had to pick up the slack, which caused them to run out of memory as well.
“This meant that eventually the servers couldn’t properly process a large fraction of the requests to access document lists, documents, drawings, and scripts which led to the outage you saw on Wednesday,” Warren wrote.
Google was alerted about 60 seconds after the failure rate spiked, and engineers started rolling back the feature change 23 minutes later. That took another half hour, at which point Google Docs was restored, he said.
“We use Google Docs ourselves every day, so we feel your pain and are very sorry,” Warren wrote.
In the wake of the outage, Google has been examining its response and working on how to fix similar problems faster in the future, he said. “We’re committed to keeping Google’s services exceptionally reliable.”
Google was not the only tech company to experience an outage this week; Microsoft’s cloud services also went down for a few hours yesterday.