At the following link, I have seen that a MongoDB implementation for HistoricalTimeSeriesMaster was tried out at some point.
LINK - http://docs.opengamma.com/display/DOC/Sources%2C+Masters%2C+and+Databases
Please point to that code.
It will help a lot.
Also, were there any particular difficulties while using MongoDB to store time series?
There was an attempt to implement the Master interfaces using MongoDB. What we found was that MongoDB (and probably other similar NoSQL databases) could not implement the Master interfaces accurately. The problem is that the interfaces represent data with both versions and corrections. In order to make that work, the SQL implementation uses transactions (to “end date” the old row when the new row is added). Since the MongoDB implementations did not work fully, they have been deleted. We would recommend the SQL versions for production use.
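To make the "end date" mechanism concrete, here is a rough sketch of the pattern using plain JDBC. The hts_point table and its columns are made up for illustration and are not the actual OpenGamma schema; the point is only that closing off the old row and inserting the corrected one must happen in a single transaction, so the old value stays queryable by correction time:

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.Timestamp;
import java.time.Instant;

public class EndDateCorrectionSketch {
    // Hypothetical schema: hts_point(series_id, point_date, value, corr_from, corr_to)
    public static void correctPoint(Connection conn, long seriesId,
                                    java.sql.Date pointDate, double newValue) throws Exception {
        conn.setAutoCommit(false);          // both statements must succeed or neither
        try {
            Timestamp now = Timestamp.from(Instant.now());

            // 1. "End date" the currently active row for this data point
            try (PreparedStatement end = conn.prepareStatement(
                    "UPDATE hts_point SET corr_to = ? " +
                    "WHERE series_id = ? AND point_date = ? AND corr_to IS NULL")) {
                end.setTimestamp(1, now);
                end.setLong(2, seriesId);
                end.setDate(3, pointDate);
                end.executeUpdate();
            }

            // 2. Insert the corrected value as the new active row
            try (PreparedStatement ins = conn.prepareStatement(
                    "INSERT INTO hts_point (series_id, point_date, value, corr_from, corr_to) " +
                    "VALUES (?, ?, ?, ?, NULL)")) {
                ins.setLong(1, seriesId);
                ins.setDate(2, pointDate);
                ins.setDouble(3, newValue);
                ins.setTimestamp(4, now);
                ins.executeUpdate();
            }

            conn.commit();                  // the old value remains visible "as of" earlier correction times
        } catch (Exception e) {
            conn.rollback();                // without atomicity the version/correction history can be corrupted
            throw e;
        }
    }
}
```

Without transactional guarantees around those two statements, a concurrent writer or a half-completed update can leave the history in an inconsistent state, which is exactly the problem we hit with MongoDB.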
I didn't fully understand the issue with using MongoDB.
What I understand from your post is that we won't be able to track changes made to a time series. For example, if a value for a particular day is corrected later, we won't be able to track the old value.
Do I have the right idea here?
Also, can't we implement a lock on the HistoricalTimeSeriesMaster implementation here?
That is, make sure that only one of the update/remove/correct/get functions runs at a time for a particular index.
What we found was that to store data in the complex versioned format that we've adopted (with both versions and corrections to versions), we required the use of more than one collection. Because Mongo doesn't support transactions, or any real locking (or at least didn't when we last used it), we stopped using it. Now theoretically you could add your own locking layer, and we did look into that, but we came to the conclusion it was more trouble than it was worth. In particular, my experience with time series databases that require global locking (MySQL required internal global locks on its auto-increment id columns, as sequences were not supported) showed that this was a very bad idea. Insert performance became a serious problem as the database grew. There are strategies for overcoming this, but as I said, we didn't think it was worth the trouble.
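For what it's worth, the kind of locking layer mentioned above (and suggested in your question) would look roughly like the sketch below. The interface and class names are hypothetical, not the real OpenGamma API, and as noted we concluded this approach wasn't worth pursuing: it only serialises writers within a single JVM, does nothing across processes, and adds contention as the system grows.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Illustrative per-series locking wrapper around a hypothetical time-series master.
public class LockingTimeSeriesMaster {

    public interface TimeSeriesMaster {          // hypothetical, not the real OpenGamma interface
        double[] get(String seriesId);
        void correct(String seriesId, java.time.LocalDate date, double value);
    }

    private final TimeSeriesMaster underlying;
    private final ConcurrentHashMap<String, ReentrantReadWriteLock> locks = new ConcurrentHashMap<>();

    public LockingTimeSeriesMaster(TimeSeriesMaster underlying) {
        this.underlying = underlying;
    }

    private ReentrantReadWriteLock lockFor(String seriesId) {
        return locks.computeIfAbsent(seriesId, id -> new ReentrantReadWriteLock());
    }

    public double[] get(String seriesId) {
        ReentrantReadWriteLock lock = lockFor(seriesId);
        lock.readLock().lock();                  // readers may proceed concurrently
        try {
            return underlying.get(seriesId);
        } finally {
            lock.readLock().unlock();
        }
    }

    public void correct(String seriesId, java.time.LocalDate date, double value) {
        ReentrantReadWriteLock lock = lockFor(seriesId);
        lock.writeLock().lock();                 // one writer at a time per series, in this JVM only
        try {
            underlying.correct(seriesId, date, value);
        } finally {
            lock.writeLock().unlock();
        }
    }
}
```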
To be clear about the older version, this was a much simpler implementation that didn’t support proper versioning and wouldn’t now be compatible with the architecture of the rest of the system.
Got it.
Thanks Jim and Stephen.
Quite apart from the problems with using MongoDB (or any other NoSQL system) as the primary backing store behind the Master interfaces (correct storage of which requires traditional ACID-style semantics in the OpenGamma Platform), that is not the only potential use of these systems in our architecture.
The rigorous use of Source and Master interfaces throughout the system means that in any real-world use case you will have a cascade of implementations between client code and the ultimate data source. A typical production client might have:
- A CachingSource, on top of
- A RemoteSource, communicating over a network connection with
- A Servlet/other container, communicating with
- A CachingSource, on top of
- A DatabaseSource
There are many layers in between the ultimate client code and the actual database code. This gives maximum flexibility in configuring your particular application for your particular runtime topology.
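As a rough illustration of that cascade (with hypothetical types, not the actual OpenGamma classes), each layer implements the same Source interface and simply wraps the one below it, so client code never knows how many hops sit underneath:

```java
public class SourceCascadeSketch {

    public interface Source {                                  // hypothetical common interface
        Object get(String uniqueId);
    }

    /** Delegates to another Source, caching results locally. */
    public static class CachingSource implements Source {
        private final Source underlying;
        private final java.util.concurrent.ConcurrentHashMap<String, Object> cache =
                new java.util.concurrent.ConcurrentHashMap<>();

        public CachingSource(Source underlying) { this.underlying = underlying; }

        @Override
        public Object get(String uniqueId) {
            return cache.computeIfAbsent(uniqueId, underlying::get);
        }
    }

    /** Stand-in for a Source that calls a remote servlet over the network. */
    public static class RemoteSource implements Source {
        private final String baseUrl;
        public RemoteSource(String baseUrl) { this.baseUrl = baseUrl; }
        @Override
        public Object get(String uniqueId) {
            // in reality: an HTTP call to baseUrl + "/" + uniqueId
            throw new UnsupportedOperationException("network call omitted in sketch");
        }
    }

    /** Stand-in for the Source backed directly by the relational master. */
    public static class DatabaseSource implements Source {
        @Override
        public Object get(String uniqueId) {
            // in reality: SQL queries against the versioned master tables
            throw new UnsupportedOperationException("database call omitted in sketch");
        }
    }

    public static void main(String[] args) {
        // Server side: caching layer over the database-backed source
        Source serverSide = new CachingSource(new DatabaseSource());
        // Client side: caching layer over a remote source talking to that server
        Source clientSide = new CachingSource(new RemoteSource("http://example-host/source"));
        // Client code only ever sees the outermost Source
    }
}
```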
Where we believe NoSQL technologies are likely to have the greatest impact is in the layers between the ultimate client code and the RDBMS implementation (see the CachingSources above). There, we think that systems like Voldemort or Cassandra may prove superior as caching layers above the canonical RDBMS-based Master implementations in large-scale deployments, with RDBMS sharding underneath.