Monday, February 7, 2011

About notes on document-oriented NoSQL by Curt Monash

I feel that a couple of observations need to be added to the Notes on document-oriented NoSQL by Curt Monash.

MongoDB does not store JSON documents, but rather JSON-style documents - specifically BSON (http://bsonspec.org/). This has important performance benefits for mostly numeric flexible-schema stores (read - health and social statistics, finance). Effectively, numeric data does not need to be serialized into and parsed out of a character stream on its way between application objects and the store.
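To make the difference concrete, here is a minimal sketch that hand-encodes a flat, numeric-only document following the layout described at bsonspec.org (int32 total length, typed elements, terminating zero byte). The function name `encode_bson_numbers` and the sample field names are made up for illustration; this is not MongoDB's own encoder, and it handles only int32 and double values.

```python
import json
import struct

def encode_bson_numbers(doc):
    """Encode a flat dict of int32 and float values as a BSON document."""
    body = b""
    for name, value in doc.items():
        key = name.encode("utf-8") + b"\x00"  # element name is a C string
        if isinstance(value, float):
            body += b"\x01" + key + struct.pack("<d", value)  # 0x01 = double
        elif isinstance(value, int):
            body += b"\x10" + key + struct.pack("<i", value)  # 0x10 = int32
        else:
            raise TypeError("this sketch handles only int32 and float")
    # int32 total length (counts itself) + elements + terminating zero byte
    return struct.pack("<i", 4 + len(body) + 1) + body + b"\x00"

reading = {"systolic": 120, "weight_kg": 70.5}  # hypothetical sample data
as_bson = encode_bson_numbers(reading)
as_json = json.dumps(reading).encode("utf-8")

# The double is stored in binary form, so reading it back is a
# fixed-width unpack rather than parsing decimal digits from text.
offset = as_bson.index(b"weight_kg\x00") + len(b"weight_kg\x00")
weight = struct.unpack_from("<d", as_bson, offset)[0]
```

With JSON the same value has to be rendered as decimal text by `json.dumps` and parsed back by `json.loads` on every round trip.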

That also allows MongoDB to manage storage as a set of memory-mapped files, so the DB server carries little of the burden and overhead of managing data persistence on disk itself. A side effect of the efficiency of memory-mapped files is that objects are capped in size. I believe the current limit is 8MB, but do not quote me on that.
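As a rough illustration of the memory-mapped idea (not of MongoDB's actual storage engine), the sketch below maps a file into memory and updates a binary record in place, leaving it to the operating system to write the pages back to disk. The record layout `RECORD` and the file name are assumptions made up for this example.

```python
import mmap
import os
import struct
import tempfile

# Hypothetical fixed-width record: int32 id followed by a double value.
RECORD = struct.Struct("<id")

path = os.path.join(tempfile.mkdtemp(), "data.bin")
with open(path, "wb") as f:
    f.write(b"\x00" * RECORD.size * 4)  # preallocate room for 4 records

with open(path, "r+b") as f:
    with mmap.mmap(f.fileno(), 0) as mem:
        # Write record number 2 directly into the mapped memory.
        RECORD.pack_into(mem, RECORD.size * 2, 7, 98.6)
        mem.flush()  # ask the OS to write the dirty pages back to the file

# Read the same bytes back through ordinary file I/O.
with open(path, "rb") as f:
    f.seek(RECORD.size * 2)
    record = RECORD.unpack(f.read(RECORD.size))
```

The application code never issues an explicit write for the record; it only touches memory, which is the convenience the paragraph above refers to.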

Many RDBMS implementations can have explicit (foreign keys) and implicit (joins) references between data items. That allows one to build an arbitrary, albeit complex, data graph and have it persisted in the data's meta-data, or at least somewhere between the application and the DB - for example in queries, views, and stored procedures.
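A small sketch of both kinds of reference, using SQLite through Python's standard `sqlite3` module. The table and view names are invented for the example. The foreign key is the explicit reference kept in the schema; the view persists a join, which is the implicit one.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
db.executescript("""
    CREATE TABLE patient (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE reading (
        id INTEGER PRIMARY KEY,
        patient_id INTEGER NOT NULL REFERENCES patient(id),  -- explicit edge
        systolic INTEGER
    );
    -- The join lives in the database as a view: an implicit edge.
    CREATE VIEW patient_reading AS
        SELECT p.name, r.systolic
        FROM patient p JOIN reading r ON r.patient_id = p.id;
    INSERT INTO patient VALUES (1, 'Ann');
    INSERT INTO reading VALUES (10, 1, 120);
""")
rows = db.execute("SELECT name, systolic FROM patient_reading").fetchall()

# A reading that points at a non-existent patient is refused by the engine.
try:
    db.execute("INSERT INTO reading VALUES (11, 99, 130)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
```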

BSON, like JSON, represents inherently acyclic data graphs - effectively directed trees. It has no built-in mechanism to keep a record of any relationships in the data except for containment at or below the object level. That seems to be consistent with MongoDB's philosophy of disclaiming any significant responsibility for meta-data. If schema management is not in the DB engine, then why should the meta-schema be in the DB engine? That is a blessing and a curse, as one needs to use a proprietary format to persist the structural information in the data store.
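The tree-only nature is easy to see with plain JSON in Python: a cyclic object graph cannot be serialized, so a relationship between two documents has to be expressed through an application-level convention. The `friend_ids` field and the `friends_of` helper below are hypothetical names chosen for this sketch, not a MongoDB feature.

```python
import json

# Two objects that reference each other form a cycle, not a tree.
alice = {"_id": "alice", "friends": []}
bob = {"_id": "bob", "friends": [alice]}
alice["friends"].append(bob)

try:
    json.dumps(alice)
    serializable = True
except ValueError:  # json reports a circular reference
    serializable = False

# Application-level convention: store identifiers instead of objects.
store = {
    "alice": {"_id": "alice", "friend_ids": ["bob"]},
    "bob": {"_id": "bob", "friend_ids": ["alice"]},
}

def friends_of(doc_id):
    """Follow the id-based edges; the store itself knows nothing about them."""
    return [store[fid] for fid in store[doc_id]["friend_ids"]]

encoded = json.dumps(store["alice"])  # each document alone is a plain tree
```

The meaning of `friend_ids` exists only in the application code, which is the "proprietary format" trade-off described above.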

In XML-based stores one has the option of using the family of XML-related standards to record and query the edges of a data graph. One can check an XML document for validity against a schema. I doubt MarkLogic does it natively today; however, it is very conceivable to have XPath references from inside one document into the innards of another and have the relationship followed as part of a query. This is a bit more than a "true document" notion, as it is a cross-document relationship.
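A toy sketch of such a cross-document reference, using the limited XPath subset in Python's standard `xml.etree.ElementTree`. The `doc` and `path` attributes are a convention invented for this example; they are not XInclude, XLink, or anything MarkLogic-specific, and the in-memory `documents` dictionary merely stands in for a document store.

```python
import xml.etree.ElementTree as ET

# Stand-in for a document store: document name -> parsed root element.
documents = {
    "catalog.xml": ET.fromstring(
        """<catalog><item id="b2"><title>BSON notes</title></item></catalog>"""
    ),
    # The order line points into the innards of another document.
    "order.xml": ET.fromstring(
        """<order><line doc="catalog.xml" path=".//item[@id='b2']/title"/></order>"""
    ),
}

def follow(ref):
    """Resolve a hypothetical doc/path reference to the element it points at."""
    target_root = documents[ref.get("doc")]
    return target_root.find(ref.get("path"))

line = documents["order.xml"].find("line")
title = follow(line).text
```

Here the edge is recorded in the data itself, using a standard path language, rather than in application code.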

Same thing - it is a blessing and a curse, as it brings to the table a character serialization penalty and an easy way to make a very convoluted data design. And I do not even want to go into the performance issues of random graph traversal.

5 comments :

  1. MarkLogic does XInclude.

    While I can assume all your assumptions might be right for MongoDB's internal architecture, they do not hold for MarkLogic. If you want to know more, you can start by reading the Inside MarkLogic white paper, and I'm available for any questions.

  2. Nuno, interesting. What are my assumptions about MarkLogic?

  3. It just seems like you are extremely familiar with Mongo and BSON and not so much with MarkLogic. I can't follow your assumptions, because they probably relate to how Mongo works internally and to your experience with it and with graphs, so it's very likely they don't hold for a solution as different as MarkLogic.

  4. I see. So I will check my assumptions with MarkLogic before speaking in public, when I have them. http://dilbert.com/strips/comic/2011-02-05/

  5. Funny Dilbert, Vlad. But that wasn't my intention, for sure! :)

