The big data effort, which centers on a system known as the Market Information Data Analytics System (Midas), is being used not only to understand market trends and rapidly emerging modes of trading such as high-frequency trading, but also, according to the SEC, to inform future policy making.
Midas, which is costing the SEC $2.5 million a year, captures data such as time, price, trade type and order number on every order posted on national stock exchanges, every cancellation and modification, and every trade execution, including some off-exchange trades. Combined, that adds up to billions of records a day.
Although the system has been live for only a few months, the data in some cases goes back as far as four or five years. Uncompressed, the archive of Midas data would amount to about 1 petabyte; it has been compressed to just over 100 terabytes.
According to Walter and the top official overseeing Midas, Gregg Berman, who was recently named associate director of the Office of Analytics and Research in the SEC's Division of Trading and Markets, Midas has potentially wide application for the SEC.
"The downpour of data generated by the markets every hour will lead to better regulation and better investor protection," Walter said in a speech Tuesday, adding that Midas will "dramatically improve our understanding of the way today's markets function."
Much of the initial public information about Midas has centered on its ability to monitor and analyze high-frequency trading. "It will give us dramatically better insight into the function of a market that moves many millions of dollars in millionths of a second," Walter said in her speech. "It will be like the first time scientists used high-speed photography and strobe lighting to see how a hummingbird's wings actually move."
In more tangible terms, such information will help the SEC rapidly analyze the causes of so-called flash crashes, in which the market drops significantly in a brief period. It will also facilitate study of the need for, and possible impact of, potential regulations such as requiring high-frequency traders to hold quotes for minimum time periods.
Much of the big data push stems from the May 6, 2010, "flash crash," in which the Dow Jones Industrial Average dropped about 600 points within five minutes before recovering those losses. High-frequency trading would later shoulder much of the blame. The SEC investigation, led by Berman, took months and required a lot of custom software development; Midas could greatly accelerate such analysis.
"We realized we needed this data all the time, and not just in an emergency, to understand detailed patterns, monitor the markets, and inform us on how orders flow, interactions with trades, rates of cancellation, volatility, and other things that are really central to understanding market structure," Berman, a former hedge fund manager and Princeton-trained nuclear physics PhD, said in a recent interview with InformationWeek.