Tag Archives: nosql

Common Sense – NoSQL

A btree is a btree is a btree. No matter how many servers you distribute the workload across, it is still going to take the same amount of time to find the record or records that you are looking for. By extension, the very same thing can be said about a hash function lookup. And finally, an unconstrained mapreduce does the same work as any database scan. So we have gone from O(log(n)) for the btree to O(1) for the hash to O(n) for the scan.
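To make the three costs concrete, here is a minimal Python sketch. It is not taken from any particular datastore: the `keys` list, the `hash_index` dict and the three function names are made up for illustration, and a binary search over a sorted list stands in for a real btree descent.

```python
import bisect

# Hypothetical data set: 50,000 sorted integer keys.
keys = list(range(0, 100_000, 2))
hash_index = {k: True for k in keys}

def btree_style_lookup(sorted_keys, key):
    """O(log n): binary search, standing in for a btree descent."""
    i = bisect.bisect_left(sorted_keys, key)
    return i < len(sorted_keys) and sorted_keys[i] == key

def hash_lookup(index, key):
    """O(1) on average: a single hash probe."""
    return key in index

def scan_lookup(rows, key):
    """O(n): look at every row until a match is found."""
    for row in rows:
        if row == key:
            return True
    return False

print(btree_style_lookup(keys, 4242),
      hash_lookup(hash_index, 4242),
      scan_lookup(keys, 4242))
```

All three find the same record; only the amount of work per lookup differs, and that amount is the same whether the index lives in a SQL or a NoSQL engine.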

When you realize that the same numbers apply to SQL and NoSQL systems alike, you have to start thinking critically. For example, if the search time is the same between a SQL and a NoSQL datastore, then where are the differences?

  • Network latency
  • Merging results
  • Distributed search on smaller subsets of the data
  • Disk latency

I’m probably missing a few things here, but the point I want to make is this:

  • if you have a search that takes 100 units of work
  • and that search is initiated by 100 users
  • this creates 10,000 units of work in total.
  • Now distribute that work over 100 or 1000 compute nodes and the total number of work units that has to be done stays the same.

Assuming that everything else is fair and equal, it should take the same amount of time, give or take a little of the overhead I mention above.
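The arithmetic behind the list above can be sketched in a few lines of Python. The function name `work_per_node` is made up for illustration, and the sketch assumes the work divides evenly across nodes with no coordination overhead.

```python
def work_per_node(units_per_search, users, nodes):
    """Return (total work units, work units landing on each node)."""
    total = units_per_search * users
    return total, total / nodes

for nodes in (1, 100, 1000):
    total, share = work_per_node(100, 100, nodes)
    print(f"{nodes:>5} nodes: total={total} units, per node={share:g}")
```

The total never changes with the node count; adding nodes only changes how the same work is divided up.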

The real difference between NoSQL and SQL is that NoSQL uses CAP as its guiding principle and SQL uses ACID. ACID clearly has more overhead than CAP, and CAP makes very few actual promises where ACID requires complete adherence.

So the next time you start thinking about the database you want to build… first decide whether CAP or ACID is more important. Then choose your brand.

PS: I’ve watched this pseudo-video a couple of times and I have no idea what the author is really promoting… but taking the message at face value is what interested me. At face value it is in line with my comments, but from the other side of the same stream.

 

Posted by on 2012/04/16 in database

 


NoSQL != NoDBA

For the reader who is not familiar, the title of this article reads: NoSQL not equal NoDBA. What I mean by it is that while the traditional function of the DBA is different in the NoSQL environment, one still needs a subject matter expert (SME) on the payroll in order to keep the “engine” running smoothly. NoSQL is just another specialty.

Many years ago I was caught up in Sleepycat’s BDB libraries. They worked, they were fast, and, as promised, you could forgo a DBA. I developed a few proof-of-concept applications using BDB and they worked great. They delivered speed, big data, ACID and everything else that was promised. Luckily for me, the projects never ran long enough for a disaster to occur. I know now that, at the time, I did not know enough about BDB to recover from even a moderate system failure.

Today we are inundated with NoSQL alternatives: Riak, MongoDB, Redis, Cassandra, Volt, Orient, just to name a few. To my knowledge, none of them actually states that a DBA is not required; however, they all seem to imply that your developers are going to assume the responsibility. At least Riak and MongoDB have enterprise consoles for the NOC (network operations center), suggesting that they realize otherwise.

Let’s start with the schema. Most developers will knock out their first or second iteration of the schema over lunch. And in most cases it’s probably pretty simple. It’s not until you get into production that “you” realize the warts in your perfect parochial schema. I’ve implemented several payment systems. The first holds 12B active accounts and processes 12M sale transactions a day (333 TPS). The second had a hard time at 25 TPS. The first contained only 5 tables and the second was a beautiful 100-table constraint nightmare.
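As a quick sanity check on those throughput numbers, here is a small Python sketch. The ten-hour processing window it arrives at is an inference from the two figures, not something stated in the post, and the helper name `average_tps` is made up for illustration.

```python
SECONDS_PER_HOUR = 60 * 60
SECONDS_PER_DAY = 24 * SECONDS_PER_HOUR

def average_tps(transactions, seconds):
    """Average transactions per second over a window of the given length."""
    return transactions / seconds

daily_transactions = 12_000_000

# Spread evenly over a full 24-hour day.
tps_over_24h = average_tps(daily_transactions, SECONDS_PER_DAY)

# How long a window makes 12M transactions average out to 333 TPS?
# (Assumption: the quoted TPS is an average over the busy hours.)
window_hours = daily_transactions / 333 / SECONDS_PER_HOUR

print(f"24h average: {tps_over_24h:.1f} TPS")
print(f"window for 333 TPS: {window_hours:.1f} hours")
```

Read this way, the 333 TPS figure fits a day's volume arriving in roughly a ten-hour window rather than spread evenly around the clock.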

And then there is “real world” data. For example, when you’re doing 12M transactions a day in Oracle, it’s still a challenge to export the data so that it can be warehoused and reported upon. ETL is going to take time. That’s when one might consider sharding and other approaches to optimization, even normalization (all functions that should be performed by a DBA). However, in the NoSQL/NoDBA world, this function is going to fall on the developer… who is no longer working on new functions or revenue generating opportunities but is instead sandbagging the dam.

As far as SMEs go, they tend to know vertical markets or applications very well. They tend not to know every last detail about the data store.

For example, there was a time when my DOS-based PC would crash and I’d have to fix my hard disk. I could and would repair the filesystem by hand; however, once Norton Utilities performed that function in a fraction of the time, I had to turn in my keys. And now, when that type of failure occurs on my Linux machine, I simply reinstall. I do not have the time or the inclination to repair the data.

That function was always left to the DBA when it came to the traditional RDBMS, and to the sysadmin when the filesystem went bad. I just cannot imagine that anyone would want to perform that function when there are people who specialize in it.

So just because you have read the docs for the client libraries, and maybe the source code, none of that makes you an SME. And there is nothing that is going to replace the SME. Just because you’re not calling him/her a DBA does not mean that the function is not being performed.

 

Posted by on 2011/08/23 in database

 


Reported my First Bug to MongoDB

I have a client that generates several million Asterisk CDRs (call detail records). These CDRs are not perfect. In fact, they are formatted as TSV and not CSV, and they have a leading TAB character. Since the CDRs are generated in 5 minute intervals and the files contain a few thousand CDRs each, it does not make sense to load the DB a record at a time. It makes more sense to bulk load so that the data is processed at as low a level in the DB engine as possible.

My first attempt to load data into MongoDB failed. The data was all askew. The problem is/was the leading TAB in the TSV file. The first column of my data was mostly empty, so most lines started with a TAB character. During the normal processing of the input file, the import utility was stripping all leading whitespace regardless of the filetype.

Since TAB counts as whitespace, that leading TAB was deleted before the record was processed, and every remaining field shifted one column to the left.
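The effect is easy to reproduce outside of MongoDB. The following Python sketch is only an illustration of the failure mode, not the import utility's actual code; the function name `parse_tsv_line` and the sample CDR values are made up.

```python
def parse_tsv_line(line, strip_leading_whitespace):
    """Split one TSV line into fields.

    With strip_leading_whitespace=True this mimics the buggy behaviour:
    a leading TAB (an empty first column) is thrown away before splitting.
    """
    line = line.rstrip("\r\n")      # drop only the line terminator
    if strip_leading_whitespace:
        line = line.lstrip()        # TAB is whitespace, so it goes too
    return line.split("\t")

# Hypothetical CDR line whose first column is empty.
row = "\t2011-06-17 10:00:00\t5551234\t42\n"

print(parse_tsv_line(row, strip_leading_whitespace=True))
print(parse_tsv_line(row, strip_leading_whitespace=False))
```

With stripping enabled the line yields three fields and the timestamp lands in the first column; without it the line yields four fields, the first one empty, which is what the file actually says.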

So I did what any open source guy would do. I opened a ticket, fixed the bug, and presented my patch in the ticket. I hope they will accept it.

 

Posted by on 2011/06/17 in beta, database, nosql, Tools

 
