I keep hearing about HFT, and have read about it a little. But there's a surprising dearth of data.
Can someone come up with a data set? For example: here are N inputs to the program. If the program can make a decision D in time T, then it can make money. Or something to that effect. I have no idea right now what the inputs to these HFT programs are; what the expected actions under the time constraints are; and how the effectiveness is measured (in terms of the given data). Someone please enlighten me.
1. Data set - the order book of your chosen stock exchange, with events streamed in at rates up to 1Gb/s [1]. This is the entire set of actions affecting the order book - bids, offers, cancellations, adjustments, etc. - and it's huge. If you want to get fancy (most do), you would typically pull in several of these feeds, or subscribe to a consolidated feed covering several exchanges, and look to arb any price/book inefficiencies between them.
2. Actions - given the changes to the order book, figure out how you need to change your positioning to make money. Time is everything here - individual steps are counted in tenths of milliseconds, and overall response times in low milliseconds. HFT firms even try to minimise cable lengths within the colocated datacentres to shave the response time down even further.
3. Effectiveness - first, whether you avoided blowing up; if you did, whether you made money. Typically you'd examine intraday PnL volatility, returns, etc. and assess yourself on those and other metrics.
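To make the three points above a bit more concrete, here is a toy sketch in Python. Everything in it is an assumption for illustration: the `Book` class, the event shape (side, price, size change), the integer tick prices and the PnL marks are all made up, and a real feed handler looks nothing like this in terms of speed or detail. It only shows the shape of the problem: apply book events, check whether two venues are crossed, and summarise intraday PnL.

```python
from dataclasses import dataclass, field
from statistics import pstdev


@dataclass
class Book:
    """Toy price-level book: price (in ticks) -> total resting size."""
    bids: dict = field(default_factory=dict)
    asks: dict = field(default_factory=dict)

    def apply(self, side, price, size_delta):
        """Positive delta adds size at a price; negative delta cancels or reduces it."""
        levels = self.bids if side == "bid" else self.asks
        new_size = levels.get(price, 0) + size_delta
        if new_size > 0:
            levels[price] = new_size
        else:
            levels.pop(price, None)  # level is empty, drop it

    def best_bid(self):
        return max(self.bids) if self.bids else None

    def best_ask(self):
        return min(self.asks) if self.asks else None


def cross_venue_edge(book_a, book_b):
    """Per-share edge from buying on one venue and selling on the other (0 if none).

    Ignores fees, queue position and latency, which matter a great deal in practice.
    """
    edges = [0]
    for buy_venue, sell_venue in ((book_a, book_b), (book_b, book_a)):
        ask, bid = buy_venue.best_ask(), sell_venue.best_bid()
        if ask is not None and bid is not None:
            edges.append(bid - ask)
    return max(edges)


def pnl_summary(pnl_marks):
    """Total PnL and the volatility of changes between successive intraday marks."""
    changes = [b - a for a, b in zip(pnl_marks, pnl_marks[1:])]
    return {"total": pnl_marks[-1] - pnl_marks[0], "vol": pstdev(changes)}


# Hypothetical events on two venues, prices in integer ticks.
venue_a, venue_b = Book(), Book()
venue_a.apply("bid", 10000, 300)
venue_a.apply("ask", 10002, 200)
venue_b.apply("bid", 10001, 100)
venue_b.apply("ask", 10003, 400)
edge_before = cross_venue_edge(venue_a, venue_b)  # books not crossed yet

venue_b.apply("bid", 10004, 50)  # an aggressive bid shows up on venue B
edge_after = cross_venue_edge(venue_a, venue_b)   # buy on A at 10002, sell on B at 10004

summary = pnl_summary([0, 5, 3, 8])  # made-up intraday PnL marks
```

The decision step here is deliberately trivial (is one venue's best bid above the other's best ask?). The hard part described in point 2 is doing something like this on every event within the time budget, which this sketch makes no attempt at.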
How about releasing some data? Do some feature generation and publish the features, along with the desired action for each point (the best possible action in hindsight, given that you know how things turned out). Then we clueless types can see what all the hubbub is about :-)
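For what it's worth, here is a rough sketch in Python of what such a data set could look like. The feature names, the `horizon` and `threshold` parameters and the buy/sell/hold labels are my own assumptions for illustration, not anything an actual firm is known to use. Features are computed from the top of the book, and the "desired action" is labelled in hindsight from where the mid price ends up a few events later.

```python
def top_of_book_features(bid, bid_size, ask, ask_size):
    """A few simple features from the best bid and offer (all hypothetical choices)."""
    return {
        "mid": (bid + ask) / 2,
        "spread": ask - bid,
        # +1 means all resting size is on the bid, -1 means all on the offer
        "imbalance": (bid_size - ask_size) / (bid_size + ask_size),
    }


def hindsight_labels(mids, horizon, threshold):
    """Label each point using the future: 'buy' if the mid rises by more than
    threshold over the next `horizon` events, 'sell' if it falls by more than
    threshold, otherwise 'hold'. The last `horizon` points get no label."""
    labels = []
    for i in range(len(mids) - horizon):
        move = mids[i + horizon] - mids[i]
        if move > threshold:
            labels.append("buy")
        elif move < -threshold:
            labels.append("sell")
        else:
            labels.append("hold")
    return labels


# Made-up mid prices, one per book event.
mids = [100.0, 100.0, 100.5, 100.5, 100.0]
labels = hindsight_labels(mids, horizon=2, threshold=0.25)
row = top_of_book_features(bid=100, bid_size=300, ask=101, ask_size=100)
```

A released data set would then be rows of features with one label each. Note the label ignores fees, fill probability and the time constraint T from the original question, so it is an upper bound on what a real program could do with the same inputs.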