The App Engine team's RCA explanation stops at this point:
The Datastore relies on Bigtable to store data (read more about
Bigtable here: http://labs.google.com/papers/bigtable.html). One of
the components of the Bigtable is a repository for determining where a
specific entity is located in the distributed system. Due to
instability in the cluster, this component became overloaded.
Instability in the cluster? That sounds like a bug in the system that Google does not want to discuss in detail. But then why go through this whole ritual of trying to be transparent with the community? As an engineer, I do not find this an acceptable RCA. Who are they kidding? We don’t know whether the problem is really fixed or still lurking out there. I know a thing or two about clustering, and instabilities of any kind are not that easy to fix. Let us wait and watch.
Still a fan of GAE.