If anyone ever asks, "How do clusters make money?", you can answer simply, "the old-fashioned way, they earn it." Clusters have been used in the finance sector for quite a while, and their use continues to increase.

One way clusters earn their keep is by helping to forecast and predict risk. As with forecasting the weather, timeliness is important, because yesterday's forecast is of no value if we get the answer tomorrow. Similarly, financial institutions need to perform near real-time analysis on market derivatives to determine the Value at Risk (VAR) (see below). In the late eighties and early nineties, institutions realized that they could divide up large portfolios of derivative positions and use parallel computers to perform the VAR calculations, thereby providing the near real-time analysis they desired.

Many organizations found, however, that the computing resources to perform these tasks were used only at certain times. Typically the risk determinations were done three times a day during the trading period, and the machines were left idle the rest of the time. There was still a need for very quick turnaround on trading days, because federal regulations require that financial institutions "know" the amount of risk they hold. Given the nature of the workload and the cost of a dedicated supercomputer, companies like Bear Stearns and Lehman Brothers started using clustered Sun workstations to run their derivative pricing and VAR calculations. This approach was developed in the early 1990s and can be considered the first use of clusters on Wall Street.

One of the key issues with financial computing is time to solution. Unlike a large-scale simulation of an aircraft design that may take days to run, financial information is usually needed in real time. Again, as with weather prediction, the more complex the model, the better the results, the longer it takes to run, and the more difficult it becomes to provide timely answers. Clusters represent an affordable answer to the real-time nature of financial markets. As an aside, weather prediction on clusters can also be considered a financial application: long-term weather models are of great interest to commodities traders.

Clusters continue to solve these types of "embarrassingly parallel" problems, but they are not limited to this type of application. There are other areas that are not quite as simple to implement but are nonetheless useful to the financial market. Some of the more interesting applications are discussed below.

Derivatives and Value at Risk (VAR)

A derivative instrument is a method to buy and sell risk. It is a security whose value is derived from one or more other securities (the price of a share of stock), commodities (the price of corn), or events. The value is influenced by the features of the derivative contract, including the timing of the contract fulfillment, the value of the underlying security or commodity, and other factors like volatility.

Value at Risk (VAR) is basically the largest amount of money that an instrument (a derivative) or portfolio can lose with probability X over a time period Y. Put another way, a portfolio manager would like to know that there is only a 1% chance the loss on a portfolio will exceed $30 million over the next month; that $30 million figure is the VAR. These calculations are typically performed using Monte Carlo methods, which are easily parallelized and run on clusters.
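The Monte Carlo approach fits in a few lines of code. The sketch below is illustrative only: it assumes a single normally distributed risk factor with made-up parameters, and it uses NumPy; none of it comes from the commercial VAR systems described in this article.

# Minimal Monte Carlo VAR sketch (illustrative assumptions only).
import numpy as np

def monte_carlo_var(value, mu, sigma, horizon_days, confidence=0.99, n_paths=100_000):
    """Estimate the loss that is exceeded with probability (1 - confidence)."""
    rng = np.random.default_rng(42)
    dt = horizon_days / 252.0
    # Simulate portfolio returns over the horizon (simple geometric Brownian motion).
    returns = rng.normal((mu - 0.5 * sigma**2) * dt, sigma * np.sqrt(dt), n_paths)
    losses = value - value * np.exp(returns)
    # VAR is the loss at the chosen percentile of the simulated loss distribution.
    return np.percentile(losses, confidence * 100)

# Example: a $30M position, 10% drift, 25% annual volatility, one-month horizon.
print(monte_carlo_var(30e6, 0.10, 0.25, horizon_days=21))

Because each path is independent, the paths can be split across cluster nodes and the loss distributions merged at the end, which is exactly why this workload parallelizes so well.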

Prism is Watching

The National Stock Exchange (NSE) of India is a growing securities marketplace. It is considered one of the top five exchanges in the world. The NSE was also one of the first exchanges in India to introduce derivatives that could be traded like stocks and bonds.

The NSE uses a Linux cluster to implement an advanced risk management system that enables on-line risk and position monitoring of members. If a trade crosses a Value at Risk (VAR) limit, the next trade will be automatically rejected. Such a monitoring process requires real-time throughput, high scalability, and the ability to work under high loads. In the course of normal trading, if a broker goes past his acceptable risk limits, his account is disabled in real-time and the system sends out alerts to the trading system and risk management team.
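In outline, the limit check amounts to something like the following sketch. The member record, field names, and alerting stub are hypothetical stand-ins; Prism's actual logic is not public.

def send_alert(member_id):
    # Placeholder: a real system would notify the trading system and the risk management team.
    print("ALERT: member %s exceeded its VAR limit" % member_id)

def check_trade(member, trade_var):
    """Reject the trade and disable the member when the VAR limit is crossed."""
    if member["exposure"] + trade_var > member["var_limit"]:
        member["enabled"] = False        # account disabled in real time
        send_alert(member["id"])
        return False                     # this and subsequent trades are rejected
    member["exposure"] += trade_var
    return True

member = {"id": "BRK042", "exposure": 0.0, "var_limit": 30e6, "enabled": True}
print(check_trade(member, 5e6))    # accepted
print(check_trade(member, 28e6))   # rejected, account disabled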

The software system used for the VAR calculations is called Prism (Parallel Risk Management System). Prism is a cluster application that uses MPI to farm out jobs to worker nodes. The cluster uses a master/worker model and consists of a centralized head node connected via Fast Ethernet to the worker nodes. The initial system handled 50 trades/second but now accommodates the 500 trades/second seen by the NSE. Each trade calls for two VAR computations, and the NSE estimated that each VAR calculation needs about 30 milliseconds. If the monitoring system were to be real-time and scalable, then distributing the load was the only way to deliver this performance at a reasonable price. The head node used was a dual-CPU Intel Xeon running at 1 GHz. This machine receives trade information and was designed to be fault tolerant so that data recovery is possible in case of a service outage. The worker machines were Intel Pentium IIIs running at 800 MHz. The workers do all the VAR computations and then send the results back to the head node. The current system can be scaled up to 1,000 trades/second, and the NSE believes it can continue to scale the cluster with faster machines and interconnects as trade volume increases. Worker nodes can be easily added (or removed), providing another level of fault tolerance.
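The master/worker pattern itself is straightforward with MPI. The fragment below is a bare-bones sketch using the mpi4py Python bindings; the message format and the compute_var() stand-in are invented for the example and are not Prism's actual code.

# Minimal MPI master/worker farm (mpi4py assumed).
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

def compute_var(trade):
    # Stand-in for the ~30 ms VAR computation each trade requires.
    return trade["qty"] * trade["price"] * 0.01

if rank == 0:
    # Head node: farm trades out to workers round-robin and collect the results.
    trades = [{"id": i, "qty": 100, "price": 50.0 + i} for i in range(16)]
    nworkers = comm.Get_size() - 1
    for i, trade in enumerate(trades):
        comm.send(trade, dest=1 + i % nworkers, tag=1)
    for _ in trades:
        result = comm.recv(source=MPI.ANY_SOURCE, tag=2)
        print("trade", result["id"], "VAR", result["var"])
    for w in range(1, nworkers + 1):
        comm.send(None, dest=w, tag=1)       # shut the workers down
else:
    # Worker node: compute VAR for each trade and send it back to the head node.
    while True:
        trade = comm.recv(source=0, tag=1)
        if trade is None:
            break
        comm.send({"id": trade["id"], "var": compute_var(trade)}, dest=0, tag=2)

Run it under mpiexec with at least two ranks (for example, mpiexec -n 5 python farm.py, where farm.py is just the example file name); rank 0 plays the head node and the remaining ranks act as workers.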

Due to the economics and availability of cluster technology, the Prism system is probably the most cost-effective monitoring system available today.

VAR from the Desktop

An interesting approach to desktop risk analysis comes from the Cornell Theory Center. In this application, VAR calculations are driven from a desktop computer running a Microsoft Excel spreadsheet. The bulk of the calculations are done on a dedicated Windows cluster or a collection of idle desktop machines with the help of .NET web services. The application works as follows. An Excel interface allows the user to set parameters, which are used to send jobs to worker machines. Each processor on a worker node works independently on all the instruments comprising the portfolio. A Monte Carlo simulation is used to construct a set of interest rate paths. The problem is partitioned over the number of interest rate paths specified by the user. Therefore, each processor is responsible for the total number of paths specified by the user divided by the number of processors. The processor then prices the entire portfolio on each of the interest rate paths it was assigned.
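A rough sketch of that partitioning is shown below. The interest rate model and pricing function are placeholders chosen for brevity, and the workers here are plain Python functions rather than the Excel/.NET web-service machinery the Cornell application uses.

# Sketch of partitioning Monte Carlo interest rate paths across workers.
import numpy as np

def price_portfolio_on_path(instruments, rate_path):
    # Placeholder pricer: discount each instrument's single cash flow along the path.
    discounts = np.exp(-np.cumsum(rate_path) * (1.0 / 12.0))   # monthly steps
    return sum(inst["cashflow"] * discounts[inst["month"] - 1] for inst in instruments)

def worker(instruments, n_paths, seed):
    # Each worker builds its own share of the interest rate paths (a simple
    # random walk around 5% here) and prices the whole portfolio on each one.
    rng = np.random.default_rng(seed)
    values = []
    for _ in range(n_paths):
        path = 0.05 + np.cumsum(rng.normal(0.0, 0.002, 120))   # 120 monthly steps
        values.append(price_portfolio_on_path(instruments, path))
    return values

# Hypothetical portfolio of three zero-coupon style cash flows at 1, 5, and 10 years.
instruments = [{"cashflow": 1_000.0, "month": m} for m in (12, 60, 120)]
total_paths, n_workers = 10_000, 4
# Each "processor" gets the total paths divided by the number of processors, as described above.
results = [worker(instruments, total_paths // n_workers, seed) for seed in range(n_workers)]
values = np.concatenate(results)
print("estimated portfolio value:", round(values.mean(), 2))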

This type of divide-and-conquer problem is very effective in a cluster or distributed environment because there is no communication between the worker processes. It is powerful for two reasons. First, it works transparently within a well-known desktop application (Excel), and second, it is highly scalable. Scalability means that more worker nodes can be added as the problem size grows without increasing the time to solution.

Digging for Data

As the amount of on-line financial data increases each year, data mining has become one method for learning trading rules from this data. Once a trading rule has been established from historical data, it can assist with forecasting the S&P 500, exchange rates, and stock direction, and with rating stocks for a portfolio. Rules are generated in a number of ways, including statistical analysis, neural networks, rule-based systems, decision-tree systems, and fuzzy-logic methods. Clearly, the large amount of data and the large amount of processing require high-end resources like clusters. Parallel data mining can sometimes be done using SQL, but more often advanced tools are used to look for relationships that are not apparent in the data. For those interested, there are open-source applications available from the Australian National University that provide tools for parallel data mining on Linux clusters (see Resources).
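As a toy example of the decision-tree variety, the sketch below "learns" a rule from three lagged returns using scikit-learn (an assumption of this example, not a tool named in the article) on synthetic prices; with real data the feature engineering and validation would of course be far more involved.

# Toy decision-tree "trading rule" learned from lagged returns (scikit-learn assumed).
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
prices = 100 + np.cumsum(rng.normal(0, 0.5, 1000))   # synthetic daily price series
returns = np.diff(prices) / prices[:-1]

# Features: the previous three daily returns.  Label: did the next day close up?
X = np.array([returns[t - 3:t] for t in range(3, len(returns))])
y = (returns[3:] > 0).astype(int)

tree = DecisionTreeClassifier(max_depth=3, random_state=0)
tree.fit(X[:800], y[:800])                 # learn the "rule" from the first 800 days
print("hold-out accuracy:", tree.score(X[800:], y[800:]))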

Tic, Tic, Tic

High-performance databases are another area where clusters can provide answers to hard questions. An interesting application in this area is the use of Kdb from Kx Systems to scan stock ticks. Every time a stock is traded, the price is recorded on the "ticker". What was once a mechanical printer and paper tape is now a huge database that can easily exceed four million trades each day for the NYSE and the NASD alone. Searching this data is a monumental task and is important to "time series analysis," where a stock's past trade and quote prices can assist in future trades.

Kx has, in the past, demonstrated Kdb running on a Linux cluster consisting of 50 CPUs, 50 GB of RAM, and 300 GB of storage. They loaded two years of NYSE tick data (2.5 billion trades and quotes) onto the cluster and were able to achieve sub-second query response on all publicly traded stocks. In addition, multi-dimensional aggregations were produced in 5 to 20 seconds.
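For flavor, the fragment below performs a typical tick aggregation (per-symbol, per-minute volume-weighted average price) using pandas on synthetic data. This is an analogue only; it is not Kdb's q language and makes no claim about how Kdb implements such queries.

# Per-symbol, per-minute VWAP over synthetic tick data (pandas assumed).
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
n = 100_000
ticks = pd.DataFrame({
    "sym": rng.choice(["IBM", "MSFT", "GE"], n),
    "time": pd.to_datetime("2004-06-01 09:30")
            + pd.to_timedelta(rng.integers(0, 23_400, n), unit="s"),
    "price": 50 + 0.01 * np.cumsum(rng.normal(0, 1, n)),
    "size": rng.integers(100, 1_000, n),
})

# Volume-weighted average price per symbol per minute -- the kind of
# multi-dimensional aggregation a tick database is asked to produce.
ticks["notional"] = ticks["price"] * ticks["size"]
groups = ticks.groupby(["sym", pd.Grouper(key="time", freq="1min")])
vwap = groups["notional"].sum() / groups["size"].sum()
print(vwap.head())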

QuantLib Anyone?

If you are interested in learning how to become an options trading baron, or would like to play with some computational finance tools, you can take a look at QuantLib. QuantLib is a library for modeling, trading, and risk management in real life. It is released under the modified BSD License. QuantLib is written in C++ with a clean object model and is then exported to different languages such as Python, Ruby, and Scheme. An initial Excel add-in is also available, and there are ports to the .NET framework in C#. Bindings to other languages (including Java), and ports to Gnumeric, Matlab/Octave, S-PLUS/R, Mathematica, COM/CORBA/SOAP architectures, and FpML, are under consideration.

QuantLib offers tools that can be used for building your own applications. Some of the components include lattice methods, finite differences, Monte Carlo methods, short-rate models, currencies and FX (exchange) rates, and instruments and pricers. There are also example applications, with source code, that use the QuantLib library.
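To give a feel for the library, here is a small example that prices a European call with the QuantLib Python bindings. The class names follow the SWIG-exported interface and may differ slightly between QuantLib versions, and the dates and market data are made up.

# Pricing a European call with the QuantLib Python bindings (version-dependent API).
import QuantLib as ql

# Market data and contract terms (made-up numbers).
today = ql.Date(15, ql.May, 2024)
ql.Settings.instance().evaluationDate = today
spot     = ql.QuoteHandle(ql.SimpleQuote(100.0))
riskfree = ql.YieldTermStructureHandle(ql.FlatForward(today, 0.05, ql.Actual365Fixed()))
dividend = ql.YieldTermStructureHandle(ql.FlatForward(today, 0.00, ql.Actual365Fixed()))
vol      = ql.BlackVolTermStructureHandle(
               ql.BlackConstantVol(today, ql.TARGET(), 0.20, ql.Actual365Fixed()))

# A six-month European call struck at 105, priced with the analytic Black-Scholes engine.
payoff   = ql.PlainVanillaPayoff(ql.Option.Call, 105.0)
exercise = ql.EuropeanExercise(ql.Date(15, ql.November, 2024))
option   = ql.EuropeanOption(payoff, exercise)
process  = ql.BlackScholesMertonProcess(spot, dividend, riskfree, vol)
option.setPricingEngine(ql.AnalyticEuropeanEngine(process))
print("option value:", option.NPV())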

Only the Beginning

Clusters are only now coming into their own in the finance industry. In addition to being the workhorse of derivative pricing and risk calculation for large and small institutions, they are finding their way into many other areas, including desktop analysis, real-time trade monitoring, data mining, large database analysis, neural nets, and genetic algorithms. As clusters continue to grow and develop, they will continue to earn their keep and provide the financial markets with levels of analysis never thought possible.

This article was originally published in ClusterWorld Magazine. It has been updated and formatted for the web. If you want to read more about HPC clusters and Linux, you may wish to visit Linux Magazine.

Sidebar One: The Black-Scholes Formula

You will often hear the Black-Scholes equation mentioned when people discuss the financial markets. The Black-Scholes equation is the workhorse of the finance community and provides a method to determine how much a call option is worth at any given time. A call option is a contract between two parties that gives the buyer the right, but not the obligation, to buy an agreed quantity of a particular commodity or financial instrument at an agreed-upon price within an agreed-upon time.

The power of the Black-Scholes model is that it lets you calculate the value of an option at any given time. Using Black-Scholes, the price of the call option is a fraction of the stock's current price minus a fraction of the exercise price. These fractions depend on five factors: the price of the stock, the exercise price of the option, the risk-free interest rate, the time to maturity of the option, and the volatility of the underlying stock price. The last factor is the only one that is not directly observable.
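For a European call on a non-dividend-paying stock, the model reduces to a closed-form expression, written out below in Python (SciPy is assumed for the normal CDF); the example numbers are arbitrary.

# Closed-form Black-Scholes price for a European call (SciPy assumed).
from math import exp, log, sqrt
from scipy.stats import norm

def black_scholes_call(S, K, r, T, sigma):
    """S: stock price, K: exercise price, r: risk-free rate, T: time to maturity, sigma: volatility."""
    d1 = (log(S / K) + (r + 0.5 * sigma**2) * T) / (sigma * sqrt(T))
    d2 = d1 - sigma * sqrt(T)
    # A fraction of the stock price minus a fraction of the discounted exercise price.
    return S * norm.cdf(d1) - K * exp(-r * T) * norm.cdf(d2)

# Example: stock at $100, strike $105, 5% rate, six months to maturity, 20% volatility.
print(black_scholes_call(100.0, 105.0, 0.05, 0.5, 0.20))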

Financial institutions use Black-Scholes and other methods to calculate the value of options, from which they can understand the risk and set prices to help manage a portfolio.

While the simplest case has a closed-form solution, using the Black-Scholes model more generally requires solving partial differential equations by numerical means. Depending on the number of assets taken into account, obtaining a solution can be highly computationally intensive.

Sidebar Two: Resources
Computational Financial Derivatives Laboratory

Financial Engineering News

PRISM

Cornell Theory Center

Kx Systems

KD Nuggets (Data Mining)

Data Mining

Quantlib

The QuantNotes

Douglas Eadline is the swinging Head Monkey at ClusterMonkey.net.