Couple of basic questions

Thanks for the nice product and all the time you all take to share your knowledge here; we really appreciate it. I am in the early stages of understanding how the OG platform works, and here are some of my questions.

Appreciate your replies as always.

  1. Is it possible to detach views and portfolios in the analytics tab? In other words, we typically have a set of risk measures we want to see for more than one portfolio, or we want to load different portfolios into the same “view”, just as reports used to work: we save the report and then run it against multiple portfolios to see or compare them.

  2. We typically have multiple historical fundamental data sets saved out to a database. For example, for EPS across all companies we have one column for the time and another for the company id, but all of the EPS values sit in one single table. So in the OG platform, would we have to save EPS for all companies as separate time series, even though the data only changes quarterly? For a typical case of 1,000 companies that would be 1,000 time series, and with 100 data items like this it would be 1,000 × 100 time series in this minimal case?

(I had seen one of your earlier replies about testing with 20,000 time series, but won’t that be too small for a real case like ours?)

  3. Is there any way the risk measures that are calculated in real time on the analytics tab can be saved out to a database and read back later? In other words, we would have a time series of HistoricalVaR on the portfolio saved out, so we can see how it varied in the past.

  4. What is the difference between the data source and the data provider? For example, would Bloomberg be the data provider, and realtime or reference (a sub-classification on the Bloomberg side) the data source? Could you provide examples here, so the context is clearer on the equity side?

  5. Is there a concept of a data life for the data anywhere? For example, if we are looking for the last price, and the last price on the time series is from two months ago, is there a way to specify that a one-week-old price is acceptable but a two-month-old price is not?

  6. In an earlier post Jim mentioned a dependency graph analyser tool:
    http://forums.opengamma.com/forum/discussion/28/market-data-implementation
    Sorry, where do we find this tool? Which tab is it on?

Also, he talked about turning on an option in ViewDefinitionCompiler, a boolean called OUTPUT_LIVE_DATA_REQUIREMENTS.
Where do we find this?

If I want to see how much data needs to be loaded for HistoricalVaR, where do I get all the details? If I load only half of it, will it still run? Where do we see all the bells and whistles of the data engine workflow?


Sorry to have asked so many questions. I hope getting clarity will help many current and future users of OG and speed up their adoption.

Thanks again for your help and replies.
Regards.

Also, the question about Sources and Masters is somewhat confusing at present. Would it be possible to explain it with some examples, so we understand it fully?

http://docs.opengamma.com/display/DOC090/Sources%2C+Masters%2C+and+Databases

Appreciate your replies as always. Thanks again.

Hi,

To address your questions one by one:

  1. Here I'm assuming you mean that you want to be able to view and compare two or more views at once. We currently just expect users to open a new browser tab or window to look at separate views, but we're working on a new version of the web UI that will allow tearing off of views and various pop-up items. It's also possible to configure the engine so that certain views always run in the background. This allows faster connection from clients (web, .NET, etc.) and also allows for things like alert monitoring, e.g. to text you when certain conditions are met.
  2. Yes, but it's stored in essentially the same way: we store all the time series data points in a single table; we just use our own id to separate out the time series at a slightly different level, so the performance should be the same or similar. We're aware that we need to provide larger-scale performance testing numbers for time series, but we don't see any reason why our schema won't scale. If nothing else, it's a pretty simple addition to shard the data over multiple nodes if necessary. We're also planning to add Vertica support, which, as a column store, is better suited to time series data with its relatively low cardinality.
  3. We've had a batch database component in the system for a while now, but it's undergoing a substantial overhaul at the moment. This will allow you to dump results into a database with a relatively standard structure that you can project into your favorite OLAP reporting or analysis tools. It should be merged into the master branch in the next couple of weeks. You can also store result data sets in memory in both R and Excel.
  4. The data source is e.g. Bloomberg. The data provider is where the data originally came from: CMPL is Bloomberg London Composite, ICPL is ICAP London, etc. It's generally most useful for OTC quotes, where you use the @PROVIDER syntax in Bloomberg.
  5. Yes: on the HistoricalTimeSeriesSource interface, when you call getLatestDataPoint(), you can specify start and end dates. If the latest available data point falls before the start date or after the end date, it will not be returned. Therefore, you simply set your start date to the oldest date you are prepared to accept (e.g. one week ago) and you'll get what you're looking for (there's a sketch of this after the list).
  6. The tool is activated by left-clicking on a value in the analytics page of the web UI. If you need debugging information, set the flag in DependencyGraphBuilder.java (as referred to by that article); the output now appears under java.io.tmpdir (which is usually your local tmp directory).
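
To make point 5 concrete, here is a minimal sketch. The exact getLatestDataPoint() overload and date classes vary between releases (this assumes the javax.time-based API and a Pair return type), so verify the signature against the HistoricalTimeSeriesSource interface in your checkout:

```java
import javax.time.calendar.LocalDate;

import com.opengamma.core.historicaltimeseries.HistoricalTimeSeriesSource;
import com.opengamma.id.UniqueId;
import com.opengamma.util.tuple.Pair;

public class LastPriceWithDataLife {

  /**
   * Returns the latest (date, value) point no older than maxAgeDays before
   * the valuation date, or null if the newest available point is staler than
   * that. This is the 'data life' cut-off: a week-old price can pass while a
   * two-month-old price is rejected, simply by narrowing the query range.
   */
  public static Pair<LocalDate, Double> lastAcceptablePrice(
      HistoricalTimeSeriesSource source, UniqueId seriesId,
      LocalDate valuationDate, int maxAgeDays) {
    LocalDate oldestAcceptable = valuationDate.minusDays(maxAgeDays);
    // Overload assumed here: (id, start, includeStart, end, includeEnd)
    return source.getLatestDataPoint(seriesId, oldestAcceptable, true, valuationDate, true);
  }
}
```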
You then had a couple of unnumbered questions:
  • If you load only some of the time series needed for HVaR, only the positions for which you have data will show results; the others will show 'Loading...'. The engine log output should tell you which time series are missing. We should really provide a tool to do this automatically; the main reason we haven't is that we currently only provide the Bloomberg module as a commercial component. With Bloomberg announcing the open-sourcing of their API, we're planning to open-source this component and will then be able to add this kind of tool to the system. We're currently awaiting confirmation on certain aspects of Bloomberg's new approach before we can proceed.
  • The difference between sources and masters is that sources are generally read-only, use-case-driven APIs, while the Master interfaces are lower-level, document-oriented APIs (we treat results as 'documents' with metadata) for managing the read/write/update/correct operations required by system-level tools.

    For example, the SecurityMaster API requires the construction of a SecuritySearchRequest, on which you set properties such as a search query or a unique id, or a template document with various parameters, and which you then send to the SecurityMaster interface via the search() method. This returns a SecuritySearchResult object containing a list of zero or more SecurityDocument objects, each of which contains metadata plus the matching security. This API is meant to be very flexible and uniform across all of our masters. In contrast, the SecuritySource interface has methods like getSecurities(ExternalIdBundle), which return a completely unpacked collection of securities, with the source doing any necessary conflict resolution between versions, etc., depending on the context. These are what you’d usually want to use from within, e.g., your analytics functions that plug into the engine.
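
To make the contrast concrete, here is a rough sketch of the two styles side by side. The property and accessor names (setName(), getDocuments(), getSecurity()) are written from memory and may differ slightly between versions, so treat them as assumptions to check against your source tree:

```java
import java.util.Collection;

import com.opengamma.core.security.Security;
import com.opengamma.core.security.SecuritySource;
import com.opengamma.id.ExternalIdBundle;
import com.opengamma.master.security.SecurityDocument;
import com.opengamma.master.security.SecurityMaster;
import com.opengamma.master.security.SecuritySearchRequest;
import com.opengamma.master.security.SecuritySearchResult;

public class SourceVsMaster {

  // Master: document-oriented; each result carries version/correction
  // metadata alongside the security itself.
  static void searchViaMaster(SecurityMaster master) {
    SecuritySearchRequest request = new SecuritySearchRequest();
    request.setName("Vodafone*"); // assumed wildcard search property
    SecuritySearchResult result = master.search(request);
    for (SecurityDocument doc : result.getDocuments()) {
      Security security = doc.getSecurity();
      System.out.println(security.getName());
    }
  }

  // Source: use-case driven; hands back plain securities with any
  // version/conflict resolution already done for you.
  static void lookupViaSource(SecuritySource source, ExternalIdBundle bundle) {
    Collection<Security> securities = source.getSecurities(bundle);
    for (Security security : securities) {
      System.out.println(security.getName());
    }
  }
}
```

In short: use the Source from analytics functions, and the Master from data-management tools.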

I hope that helps.

Jim

Thanks a lot, Jim, for taking the time to explain all the questions. Much appreciated.

This definitely helped resolve some of my questions. I will add here if more questions come up. Thanks again.

Are there any operations available on portfolios? For example, adding two portfolios (merge), intersect, exclude, etc., or combining all of the firm's portfolios.

Thanks for all your help

It’s certainly supported by the data structures, but there aren’t any pre-built tools to do that. Adding them is a good idea, so I’ve opened a JIRA issue. It probably won’t make 1.0 now, but we’ll try to get it in soon.
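
In the meantime, a naive merge is only a few lines on top of the existing data structures. A minimal sketch, assuming the SimplePortfolio / SimplePortfolioNode implementation classes from com.opengamma.core.position.impl (check the exact constructors and method names in your checkout):

```java
import com.opengamma.core.position.Portfolio;
import com.opengamma.core.position.impl.SimplePortfolio;
import com.opengamma.core.position.impl.SimplePortfolioNode;

public class PortfolioOps {

  // Builds a new portfolio whose root node simply holds the two input
  // portfolios' root nodes as children; intersect/exclude would instead
  // walk the node trees and filter positions.
  static SimplePortfolio merge(String name, Portfolio a, Portfolio b) {
    SimplePortfolioNode root = new SimplePortfolioNode(name);
    root.addChildNode(a.getRootNode());
    root.addChildNode(b.getRootNode());
    return new SimplePortfolio(name, root);
  }
}
```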

Thanks again, Jim, for being very helpful and for opening the JIRA issue.

Hello @jim,
I didn’t quite understand how the data life feature is achieved using OpenGamma.
Let me elaborate on the requirement first.
Let’s say the data life for a time series is 2,
and the time series has value changes on the 1st, 5th and 10th of the month,
with no changes on the remaining days.
So when someone asks for values for that month, what I expect is a value on the 1st, 2nd, 3rd, 5th, 6th, 7th, 10th, 11th and 12th, with the rest of the days NA or absent from the output time series.
Is it possible to achieve this in OG with the current setup?

Thanks
Vineeth

Hi @vineeth,

Sorry, I don’t really understand the question. The ‘data life’ being referred to above is simply the oldest value that is acceptable to return as a ‘last price’. As I said, if you set the start date in the call to getLatestDataPoint(), it will not return a point older than that cut-off date. It sounds like you’re asking about versions and corrections. If you change existing values in a series, that’s classified as a ‘correction’; new values are classified as ‘updates’. Whether corrections are applied when you query is determined by the correction date you provide (if any). By default, all corrections are applied before returning data, giving you the most up-to-date version of the data available. The only reason you’d not want this behaviour is if you want to exactly reproduce older reports using the original input data (even if it was wrong).
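
If reproducing an old report is what you’re after, the key type is VersionCorrection. A sketch under assumptions (the generic master get(objectId, versionCorrection) lookup and the HistoricalTimeSeriesInfoDocument name are from memory; verify both against your source tree):

```java
import javax.time.Instant;

import com.opengamma.id.ObjectId;
import com.opengamma.id.VersionCorrection;
import com.opengamma.master.historicaltimeseries.HistoricalTimeSeriesInfoDocument;
import com.opengamma.master.historicaltimeseries.HistoricalTimeSeriesMaster;

public class AsOfQuery {

  // Fetches the series document as it looked at versionAsOf, with only the
  // corrections known by correctedTo applied. Passing the original report's
  // timestamps for both reproduces that report's input data, errors and all;
  // omitting them gives the fully corrected, up-to-date view.
  static HistoricalTimeSeriesInfoDocument asOf(
      HistoricalTimeSeriesMaster master, ObjectId seriesId,
      Instant versionAsOf, Instant correctedTo) {
    return master.get(seriesId, VersionCorrection.of(versionAsOf, correctedTo));
  }
}
```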

Does that answer your question?

Jim

For completeness on the “data life” concept, let me summarize two situations:

-> When we need a single value returned, say requesting data for 20101231: as I understand it, getLatestDataPoint() lets me specify a start date so that only an “acceptable” value is returned. So if the acceptable age (data life) is the last 30 days, and there is any data between 20101130 and 20101231, it will return the last value available in that range; if there is no data in that range, it will return NA.

-> Second, when we need a range of values, say daily values for the same range 20101130 to 20101231, where the actual existing (date, value) pairs are (20101130, 10), (20101210, 12) and (20101215, 11): with a data life of 10, what we would like to see is

20101130 to 20101209: 10
20101210 to 20101214: 12
20101215 to 20101225: 11
20101226 to 20101231: NA

I think this is achieved by a sampling function in OG with padding:
PreviousValuePaddingTimeSeriesSamplingFunction
Maybe for this specific case we will have to write a padding function that meets this particular need. Is this understanding correct? Thanks, Jim, for the time and effort; I really appreciate it.

Ahhh, okay, you’re talking about time series padding. Now I understand. We don’t have anything specifically to do that, but you’re on the right track looking at that time series function. All you need to do is create a new one, based on that one, that stops padding after a set number of days.
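
For the record, here is a standalone sketch of that padding rule in plain Java collections (not the actual sampling-function interface, which you would adapt this into): each observation is carried forward for at most dataLife extra days, then the series goes absent, which reproduces the ranges in the worked example above.

```java
import java.util.SortedMap;
import java.util.TreeMap;

import javax.time.calendar.LocalDate;

public class DataLifePadding {

  // observations: sparse date -> value points, e.g. 20101130 -> 10.
  // Returns a daily series over [start, end] where each value lives for at
  // most dataLife extra days; later days are simply absent (the 'NA').
  static SortedMap<LocalDate, Double> padWithDataLife(
      SortedMap<LocalDate, Double> observations,
      LocalDate start, LocalDate end, int dataLife) {
    SortedMap<LocalDate, Double> out = new TreeMap<LocalDate, Double>();
    LocalDate lastDate = null;
    Double lastValue = null;
    for (LocalDate d = start; !d.isAfter(end); d = d.plusDays(1)) {
      if (observations.containsKey(d)) {
        lastDate = d;
        lastValue = observations.get(d);
      }
      // Pad only while the last observation is still within its data life.
      if (lastDate != null && !d.isAfter(lastDate.plusDays(dataLife))) {
        out.put(d, lastValue);
      }
    }
    return out;
  }
}
```

With dataLife = 10 and the observations above, this yields 10 from 20101130 to 20101209, 12 from 20101210 to 20101214, 11 from 20101215 to 20101225, and nothing from 20101226 onwards.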

Thanks Jim as always for the prompt reply.