The Roadblocks of AI Backtesting

The following text is a response from the EquBot leadership team to the media on AI backtesting.

Institutional investors must satisfy various manager selection criteria before funding a new strategy. A historical investment track record is very important in these processes, but backtesting is often substituted to accommodate newer quantitative strategies. Complex investment management systems offer various factor criteria to reconstruct portfolio constituents and analyze relative performance over specified periods to satisfy backtesting requirements. Because we have perfect hindsight about what has transpired in the markets, many technical investment products suffer from the “overfitting” associated with backtesting: a strategy tuned to fit past data can look excellent historically without holding up going forward. Talented quantitative investment professionals can aggregate multiple signals and generate amazing backtest returns for strategies going back decades. Sophisticated investors need to question whether this is the right approach for a perpetually changing investment environment, compared with more complex investment solutions that leverage real-time data.

Investment research has indicated that the majority of data has been created in the past two years. Given the growth of data, we believe this statement will still hold true two years from now. The vast majority of the data growth will come from alternative data sources. Alternative data sources such as announcements distributed across business wires, social media, and videos can have different degrees of impact on financial markets. AI investment strategies that analyze sentiment, intent, magnitude, and credibility from digital media seem to offer a compelling value proposition: they can analyze large volumes of data and derive conclusions in a timelier manner than traditional investment managers. Many instinctively feel the importance of these newer alternative data sources, but how can they be validated?

Web data is archived, so why not overlay the same backtesting methods on these unstructured alternative data sets relative to the holdings in the aforementioned tests?

As a result of the explosion of data, archived information loses detail. Time stamps become muddled once you go back a handful of years. When seconds can move the market and affect the profitability of a trade, applying unstructured data signals to backtest results can be misleading. The most reasonable expectation of backtesting for newer AI investment strategies is to match historical data characteristics to the frequency of trading. As such, if you see an AI strategy that actively rebalances daily, a 30-year backtest should be heavily scrutinized. If you are using daily signals, then a reasonable backtest should use datasets only as far back as daily detail can be derived.
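To make the idea of matching backtest length to data detail concrete, here is a minimal sketch in Python. It assumes a hypothetical archive in which each record carries a label for how precise its time stamp is, and it finds how far back a backtest could reasonably start for a given signal frequency. The function name `earliest_usable_start`, the `GRANULARITY_RANK` labels, and the sample dates are all illustrative assumptions, not part of any EquBot system.

```python
from datetime import date

# Hypothetical granularity ranks: a smaller number means finer time stamp detail.
GRANULARITY_RANK = {"second": 0, "minute": 1, "hour": 2, "day": 3, "month": 4}


def earliest_usable_start(archive, required):
    """Return the start date of the most recent unbroken run of records
    whose time stamp detail is at least as fine as `required`.

    `archive` is a list of (date, granularity) pairs sorted by date.
    Returns None if even the newest record is too coarse.
    """
    needed = GRANULARITY_RANK[required]
    start = None
    # Walk backward from the newest record and stop at the first coarse one.
    for day, granularity in reversed(archive):
        if GRANULARITY_RANK[granularity] > needed:
            break
        start = day
    return start


# Made-up archive: older records only retain month-level time stamps.
archive = [
    (date(2012, 1, 1), "month"),
    (date(2014, 6, 1), "month"),
    (date(2016, 3, 1), "day"),
    (date(2017, 9, 1), "second"),
    (date(2018, 5, 1), "second"),
]

daily_start = earliest_usable_start(archive, "day")
second_start = earliest_usable_start(archive, "second")
print(daily_start, second_start)
```

In this made-up example, a strategy driven by daily signals could be backtested from the first record with day-level detail, while a strategy that depends on second-level timing would have a much shorter defensible window. A real archive would need a more careful definition of time stamp quality than a single label per record.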

The alternative data explosion is being driven by new sources of information. Similarly, we will see many historical data sources disappear. Understanding this turnover is critical for any investor looking to allocate money to the AI investment space. It is important to understand how an investment manager is handling the explosion of new data and monitoring the ongoing credibility of existing information. A flexible solution should be in place and should be openly discussed to confirm whether the manager's technological advantage will hold.

The jury is still out on the optimal measure of the potential efficacy of a highly quantitative AI investment strategy. Just as they seek to understand an investment manager's process, investors should add data management process criteria to their quantitative AI investment manager searches. We understand firsthand how track records and backtests can cause friction in the current institutional investment environment, but we remain optimistic. EquBot is an advocate of backtesting investment strategies over periods commensurate with the duration of observed signals and trading activity, and of data management discussions. No one can say for sure what the investment information environment will look like 30 years in the future, but we can say with confidence that portfolios 30 years ago were not facing the impact of social media posts and other alternative data as they are today.