What is DependencyGraph while invoking Function execution


#1

Hi,

I am seeing the mention of a DependencyGraph while invoking Function execution.
I do not fully understand what it is used for and how a graph comes into the picture here.
Kindly elaborate.

~Shyam


#2

The computation engine in the view processor works by building a data structure called a dependency graph. You can find general background information on Wikipedia or in this book a colleague of mine recommends.

The basic idea is that each ‘node’ in the graph represents a function that has data that it depends on (inputs, represented as lines with arrows pointing into the node when drawing it) and data that it produces (outputs, represented as lines with arrows pointing away from the node when drawing it). The system builds up this data structure by taking the value that the user has asked for and searching for a function that satisfies that requirement. It then checks whether the chosen function in turn has requirements, and tries to satisfy those from the function repository (which is populated in DemoStandardFunctionConfiguration). When all that remain are market data requirements (which constitute the ‘leaf nodes’ of the graph), the system subscribes to that market data and starts executing.
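
To make the resolution step concrete, here is a minimal Python sketch of the idea, not the actual engine code. The repository contents, the value names and `build_graph` are all made up for illustration; the real function repository is configured in Java and is considerably richer.

```python
# Hypothetical function repository: each key is an output value, each list
# holds the input values the producing function requires.
# None of these names come from the real OpenGamma API.
FUNCTION_REPOSITORY = {
    "PresentValue": ["YieldCurve", "SwapCashflows"],
    "YieldCurve": ["DepositRate", "SwapRate"],
    "SwapCashflows": ["SwapRate"],
}


def build_graph(requirement, graph=None, market_data=None):
    """Resolve a requirement recursively into graph nodes and leaf inputs."""
    if graph is None:
        graph = {}
    if market_data is None:
        market_data = set()
    if requirement in graph or requirement in market_data:
        # Already resolved earlier in the search; reuse the existing node.
        return graph, market_data
    inputs = FUNCTION_REPOSITORY.get(requirement)
    if inputs is None:
        # No function produces this value, so treat it as a market data leaf.
        market_data.add(requirement)
        return graph, market_data
    graph[requirement] = list(inputs)
    for name in inputs:
        build_graph(name, graph, market_data)
    return graph, market_data


graph, market_data = build_graph("PresentValue")
print(graph)                # function nodes and the inputs they depend on
print(sorted(market_data))  # leaf requirements to subscribe to
```

Note that "SwapRate" is required by two functions but ends up as a single leaf, so it only needs to be subscribed to once.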

The dependency graph data structure now represents a ‘plan’ of the order in which functions need to be executed, and how data flows from one to the other. The engine packages each node/function on the graph up as a ‘job’ and sends them off to be run on Calculation Nodes (which may be a local thread, or a remote process or machine over a network) in the appropriate order. As each node/function is run, its inputs are provided from a shared ‘value cache’ and its outputs are also placed into this shared ‘value cache’ (which you can think of as a big shared hash map) for use by the next function up the graph. When all functions/nodes have been executed, the view processor is notified and can then pull the result out of the value cache and return it to the user.
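
As a rough illustration of the execution phase, the Python sketch below runs a tiny graph in dependency order against a shared value cache. It is an assumption-laden toy: the node names, the three functions and `execute` are hypothetical, the arithmetic is placeholder rather than real pricing, and everything runs sequentially in one process rather than being dispatched as jobs to Calculation Nodes. It uses `graphlib.TopologicalSorter` from the Python standard library (3.9+) to get a valid ordering.

```python
from graphlib import TopologicalSorter


# Placeholder functions; rates are in percent to keep the arithmetic simple.
def build_curve(deposit_rate, swap_rate):
    return (deposit_rate + swap_rate) / 2


def price_swap(curve, swap_rate):
    return swap_rate - curve


def price_bond(curve):
    return 100 / (1 + curve / 100)


# Hypothetical graph: node name -> (function, names of its inputs).
GRAPH = {
    "YieldCurve": (build_curve, ["DepositRate", "SwapRate"]),
    "SwapPV": (price_swap, ["YieldCurve", "SwapRate"]),
    "BondPV": (price_bond, ["YieldCurve"]),
}


def execute(graph, market_data):
    """Run every node once, inputs first, sharing results via a value cache."""
    value_cache = dict(market_data)  # leaf values must be supplied up front
    executed = []
    dependencies = {name: set(inputs) for name, (_, inputs) in graph.items()}
    for name in TopologicalSorter(dependencies).static_order():
        if name in value_cache:
            continue  # market data leaf, nothing to compute
        function, inputs = graph[name]
        value_cache[name] = function(*(value_cache[i] for i in inputs))
        executed.append(name)
    return value_cache, executed


cache, executed = execute(GRAPH, {"DepositRate": 2.0, "SwapRate": 4.0})
print(executed)
print(cache["SwapPV"], cache["BondPV"])
```

Both "SwapPV" and "BondPV" read "YieldCurve" from the cache, so the curve is built once even though two downstream functions need it.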

There’s some general information about it in the documentation as well.

I hope that helps.


#3

Woww - that was a real eye opener.
The idea is quite clear, and I believe that by using this architecture we can avoid redundant computation and achieve parallelism.

I am more curious about this one - “the system then subscribes to that market data, and starts executing.”

How can we define market data, and how can we connect it to the function?

~Shyam


#4

Yes, that’s the idea - we can automatically eliminate redundant calculations (e.g. build a yield curve, surface, etc. only once) and farm out jobs in parallel.

There is a Bloomberg live data adapter included in the Open Source distribution if you have a Bloomberg subscription (and terminal or Managed B-Pipe). If you don’t, you can either see the documentation covering how to write a custom live data server, or use historical end-of-day data (the UI currently only supports the latest point value, although the ViewClient interface lets you source data from specific historical dates). If you’re not processing live tick data (e.g. it’s more of a time series processing application), you can read time series in directly within the function.

Lastly, there’s a structured market data snapshot mechanism. It currently requires a live market data source to sample, although theoretically you could hook it up to historical data. It’s really designed to be used with our commercial market data snapshot editing/sampling tools, but there is a command line tool to sample a snapshot too.

Commercial customers get access to other market data adapters (Reuters, Activ Financial, Tullett Prebon, ICAP) and also to our Excel integration.