Backtesting is the process of recreating historical trades from the information that was available at the time, in order to see how those trades would have performed. The process may seem easy, but it can go wrong in many ways.
These are some of the backtesting pitfalls you need to steer clear of:
Look-Ahead Bias
Look-ahead bias arises when a backtest uses information that only became available after the time of the trade. For instance, if your entry rule is to buy the stock when it is within 1% of the day’s low, you have introduced look-ahead bias, because you cannot know the day’s low until the market closes that day. Here is another example. Assume that a model involves a linear regression fit of two price series. If you determine your daily trading signals using regression coefficients generated from the complete data set, you have introduced look-ahead bias again.
To avoid look-ahead bias, calculate the signal at every opportunity using lagged historical data. Lagging a data series means that you calculate each quantity, such as moving averages, highs and lows, and volume, from data up to the close of the previous trading period only. However, you do not need to lag the data if the strategy enters only at the close of the period.
It is easier to avoid look-ahead bias in Excel or other WYSIWYG programs than in MATLAB, because the columns of data can be aligned easily in Excel and you can make sure that every cell is computed from the rows above the current row. If the current day’s data is used in generating a signal, Excel’s cell-highlighting feature makes it visually obvious. In MATLAB, you need to be more careful and run a lag function on each series used to generate a signal.
However, even if you are careful, look-ahead bias sometimes slips into a backtest program. Some forms of look-ahead bias are quite subtle and hard to avoid, especially in MATLAB. So you should run a final check of your MATLAB backtest program.
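The lagging described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a complete backtest: the function name `lagged_moving_average` and the sample prices are hypothetical, and the value for each period is built only from closes up to the previous period.

```python
def lagged_moving_average(closes, window):
    """Moving average that is known at the START of each period.

    Entry i uses closes[i - window:i], i.e. only data up to the
    close of period i - 1. Entries without enough history are None.
    """
    result = []
    for i in range(len(closes)):
        if i < window:
            result.append(None)  # not enough lagged history yet
        else:
            past = closes[i - window:i]  # deliberately excludes closes[i]
            result.append(sum(past) / window)
    return result


closes = [10.0, 11.0, 12.0, 13.0, 14.0]  # hypothetical daily closes
lagged_ma = lagged_moving_average(closes, window=2)
# The value for day 2 uses days 0 and 1 only: (10 + 11) / 2
```

A signal that compares the current price with `lagged_ma[i]` can be acted on during period i, because every input to it was already known when the period began.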
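No specific procedure for this final check is given here. One possible form, offered as an assumption rather than as the prescribed method, is to recompute the signals on a truncated copy of the data and confirm that the overlapping signals are unchanged; if they change, some signal depended on later data. The sketch below is in Python rather than MATLAB, and the function names and the deliberately flawed `leaky_signal` are hypothetical.

```python
def has_look_ahead(signal_fn, data, cut):
    """Return True if signals for data[:cut] change once later data is added."""
    full = signal_fn(data)[:cut]
    truncated = signal_fn(data[:cut])
    return full != truncated


def clean_signal(closes):
    # Buy (1) when the previous close rose versus the one before it.
    head = [0] * min(2, len(closes))
    return head + [1 if closes[i - 1] > closes[i - 2] else 0
                   for i in range(2, len(closes))]


def leaky_signal(closes):
    # Flawed on purpose: compares each close with the full-sample maximum,
    # which is not known until the whole series has been observed.
    top = max(closes)
    return [1 if c < 0.9 * top else 0 for c in closes]


closes = [10.0, 11.0, 10.5, 12.0, 20.0]  # hypothetical closes
clean_flag = has_look_ahead(clean_signal, closes, cut=4)
leaky_flag = has_look_ahead(leaky_signal, closes, cut=4)
```

With this data the lagged signal is unaffected by truncation, while the signal built on the full-sample maximum changes when the last bar is removed.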
Data-Snooping Bias
Data-snooping bias inflates backtest performance relative to future performance because the model’s parameters have been over-optimized to fit transient noise in the historical data. Data-snooping bias is common whenever predictive statistical models are built on historical data, but it is especially serious in finance because the amount of independent data is limited.
We can access stock market data going back to the early twentieth century, but only the past ten years of data is suitable for developing a predictive model. Moreover, because of regime shifts, data that is only a few years old can already be obsolete for backtesting purposes. The less independent data you have, the fewer adjustable parameters you should employ in your trading model.
As a rule of thumb, you should not employ more than five parameters. These include quantities such as entry and exit thresholds, the holding period, and the lookback period used to compute moving averages. However, data-snooping bias does not always come from optimizing parameters. You make many choices in creating a trading model, and these choices can be shaped by repeated backtesting on the same set of data.
Decisions affected in this way include whether to enter at the open or the close, whether to trade mid-cap or large-cap stocks, whether to hold positions overnight, and more. These qualitative decisions are often made to optimize backtest performance, but they may not remain optimal going forward. Eliminating data-snooping bias completely is almost impossible as long as you develop data-driven models. However, you can take some measures to reduce it.
Sample Size
To guard against data-snooping bias, make sure you have enough backtest data for the number of free parameters you want to optimize. As a rule of thumb, assume that the number of data points required to optimize your parameters is 252 times the number of free parameters in your model. For example, a daily trading model with three parameters calls for at least three years’ worth of backtest data with daily prices. However, if your three-parameter trading model updates positions every minute, you need at least 252/390 year, or about seven months, of one-minute backtest data.
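The daily-bar case of this rule of thumb is simple arithmetic, sketched below in Python. The function names are hypothetical, and the sketch covers daily data only, at one data point per trading day and 252 trading days per year.

```python
TRADING_DAYS_PER_YEAR = 252  # figure used in the rule of thumb


def min_data_points(n_params, points_per_param=252):
    """Data points suggested by the rule of thumb: 252 per free parameter."""
    return points_per_param * n_params


def min_years_of_daily_data(n_params):
    """Years of daily bars needed, at one data point per trading day."""
    return min_data_points(n_params) / TRADING_DAYS_PER_YEAR


points = min_data_points(3)         # three-parameter model
years = min_years_of_daily_data(3)  # same model, in years of daily bars
```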
Out-of-Sample Testing
Divide the historical data into two parts and save the more recent part for out-of-sample testing. While developing the model, optimize the parameters and other qualitative decisions on the first portion, the training set. Then test the resulting model on the second portion, the test set. The two portions are usually roughly equal in size, but if training data is scarce, you should still have at least one-third as much test data as training data.
The minimum size of the training set is determined by the sample-size rule of thumb above. Ideally, the parameters and decisions that are optimal for the first part are also optimal for the second part of the backtest period, but things rarely go this perfectly. At the very least, the model’s performance on the second part should be reasonable; otherwise the model suffers from data-snooping bias. You can cure this by simplifying the model and eliminating some parameters.
Another method of out-of-sample testing is moving optimization of the parameters, that is, re-optimizing them on a moving window of recent data. In that case, the parameters constantly adapt to the changing historical data, which eliminates data-snooping bias with respect to the parameters.
Backtest results matter a great deal in trading, so be careful to avoid these mistakes if you want to improve as a trader.
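The split described above can be sketched in Python as follows. This is a minimal illustration: `chronological_split`, its `test_fraction` parameter, and the sample prices are hypothetical, and the one-third minimum is enforced as a test fraction of at least 0.25 of the whole data set.

```python
def chronological_split(data, test_fraction=0.5):
    """Split time-ordered data into (train, test), oldest data first.

    test_fraction=0.5 gives roughly equal halves; 0.25 gives one-third
    as much test data as training data, the stated minimum.
    """
    if not 0.25 <= test_fraction < 1:
        raise ValueError("test set should be at least 1/3 of the training set")
    cut = round(len(data) * (1 - test_fraction))
    return data[:cut], data[cut:]


prices = list(range(100, 120))  # 20 hypothetical daily closes
train, test = chronological_split(prices, test_fraction=0.25)
```

The split is by time rather than random, so the test set always contains the most recent data and nothing from it can leak into the optimization on the training set.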
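One way to picture moving optimization is the Python sketch below. It is an illustration under stated assumptions, not a prescribed procedure: the single parameter is a moving-average lookback, the selection criterion (lowest mean squared error of a lagged-mean forecast) is a hypothetical stand-in for whatever performance measure your strategy uses, and all names are invented for the example.

```python
def best_lookback(closes, candidates):
    """Pick the lookback whose lagged-mean forecast had the lowest squared error."""
    def error(n):
        total = sum((closes[i] - sum(closes[i - n:i]) / n) ** 2
                    for i in range(n, len(closes)))
        return total / (len(closes) - n)
    return min(candidates, key=error)


def walk_forward_lookbacks(closes, window, candidates):
    """For each period t >= window, re-optimize on closes[t - window:t] only."""
    if window <= max(candidates):
        raise ValueError("window must be longer than every candidate lookback")
    return [best_lookback(closes[t - window:t], candidates)
            for t in range(window, len(closes))]


# Hypothetical closes; the parameter is re-chosen before every period.
closes = [10, 11, 12, 13, 14, 13, 12, 11, 10, 11, 12, 13]
chosen = walk_forward_lookbacks(closes, window=6, candidates=(1, 3))
```

Each entry of `chosen` depends only on the six closes that precede its period, so the parameter used at any point was selected without seeing the data it is then applied to.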