A btree is a btree is a btree. No matter how many servers you distribute the workload across, it is still going to take the same amount of time to find the record or records you are looking for. The same can be said of a hash lookup. And an unconstrained mapreduce produces the same results as any full database scan. So we have gone from O(log n) to O(1) to O(n).
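To make those three costs concrete, here is a quick sketch (my own illustration, not from the post) counting roughly how many probes each access pattern needs on the same dataset:

```python
import math

n = 1_000_000  # records in the datastore

# Balanced-tree descent touches about log2(n) nodes: O(log n)
btree_steps = math.ceil(math.log2(n))

# A hash lookup probes one bucket (ignoring collisions): O(1)
hash_steps = 1

# An unconstrained scan (or mapreduce with no filter) touches every record: O(n)
scan_steps = n

print(btree_steps, hash_steps, scan_steps)  # → 20 1 1000000
```

Whether the datastore calls itself SQL or NoSQL, these are the only three shapes the lookup can take.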
When you realize that the same numbers apply to SQL and NoSQL systems alike, you have to start thinking critically. For example, if the search time is the same between a SQL and a NoSQL datastore, then where are the differences?
- Network latency
- Merging results
- Distributed search on smaller subsets of the data
- Disk latency
I’m probably missing a few things here, but the point I want to make is this:
- if you have a search that takes 100 units of work
- and that search is initiated by 100 users
- this creates 10,000 user-units of work.
- Now distribute that work over 100 or 1,000 compute nodes: each node does a smaller share, but the aggregate number of work units stays the same.
Assuming that everything else is fair and equal, it should take almost exactly the same amount of time, plus or minus a little of the overhead I mention above.
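That back-of-the-envelope arithmetic can be checked in a few lines (the numbers are the post's; the variable names are mine):

```python
units_per_search = 100
users = 100

# Aggregate demand: 100 units * 100 users = 10,000 user-units of work
total_units = units_per_search * users

# Distributing changes each node's share, not the total
for nodes in (100, 1_000):
    per_node = total_units / nodes
    print(f"{nodes} nodes -> {per_node:g} units each, {total_units} total")
```

Spreading the work thins each node's slice, but the cluster as a whole still burns 10,000 units either way.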
Where the real difference between NoSQL and SQL lies is that NoSQL uses CAP as its guiding principle and SQL uses ACID. ACID clearly has more overhead than CAP: CAP makes very few actual promises, whereas ACID requires complete adherence.
So the next time you start thinking about the database you want to build… first decide whether CAP or ACID is more important. Then choose your brand.
PS: I’ve watched this pseudo-video a couple of times and I have no idea what the author is really promoting… but taking the message at face value is what interested me. At face value it is in line with my comments, just from the other side of the same stream.