Optimizing parameters of a trading strategy via backtesting has one major problem: there are typically not enough historical trades to achieve statistical significance. This talk will discuss a variety of methods for overcoming that problem, including stochastic control theory and simulations. Simulations may involve either linear or nonlinear time series models, such as recurrent neural networks.
3. Optimizing Trading Signals
• Optimize trading strategy ≈ Optimize sum(PLs) by tweaking trading signals.
• Number(trading signals) << Number(prices), typically.
– Easy to cherry-pick trading signals for optimization.
– Overfitting / data-snooping bias.
– No predictive power on unseen (out-of-sample) data!
4. Remedies for Overfitting
• Increase length of historical backtest period.
– Subject to data availability.
– Regime changes ⇒ old prices may be irrelevant.
• Create mathematical model of historical prices, then analytically find optimal trading signals.
– Effectively infinite backtest period.
– Historical price models tend to be oversimplified.
– Only analytically solvable when trading signals and performance objective are linearly related to prices.
5. Remedies for Overfitting
• Simulate price series with the same statistics as actual historical prices.
– As many price series as practical.
– Can capture as many quirks of actual historical prices as necessary.
• E.g. serial correlation, volatility clustering, tail events, …
– Can be used to optimize nonlinear trading signals and performance objectives.
6. Analytical Optimization
• Example: a mean-reverting log price series $x$.
• Ornstein-Uhlenbeck (OU) equation:
$dx(t) = \kappa\,(\theta - x(t))\,dt + \sigma\,dW(t)$
$\kappa$: rate of mean reversion
$\theta$: mean log price level
$\sigma$: conditional volatility of $x$
$W$: Wiener process (Brownian motion), the continuous-time limit of a random walk
• What are the optimal entry/exit levels?
– Optimal ≡ maximum expected (discounted) profit for a single round-trip trade.
– Similar to optimal Bollinger bands.
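To experiment with the OU model numerically, one can simulate it. The sketch below assumes NumPy and uses the exact transition density of the OU process rather than an Euler step, so any step size `dt` is valid; it also makes visible that an OU process sampled at a fixed interval $\Delta t$ is an AR(1) process with coefficient $e^{-\kappa \Delta t}$. The function name `simulate_ou` and all parameter values are hypothetical, chosen only for illustration.

```python
import numpy as np

def simulate_ou(kappa, theta, sigma, x0, dt, n_steps, n_paths, seed=None):
    """Simulate log-price paths of dx = kappa*(theta - x) dt + sigma dW.

    Uses the exact OU transition density, so there is no discretization
    error for any dt. Returns an array of shape (n_paths, n_steps + 1).
    """
    rng = np.random.default_rng(seed)
    decay = np.exp(-kappa * dt)  # equals the AR(1) coefficient at this sampling interval
    # Standard deviation of x(t + dt) conditional on x(t).
    step_sd = sigma * np.sqrt((1.0 - decay**2) / (2.0 * kappa))
    x = np.empty((n_paths, n_steps + 1))
    x[:, 0] = x0
    for t in range(n_steps):
        eps = rng.standard_normal(n_paths)
        x[:, t + 1] = theta + (x[:, t] - theta) * decay + step_sd * eps
    return x

# Hypothetical parameters: daily steps over 10 years.
paths = simulate_ou(kappa=2.0, theta=0.0, sigma=0.1, x0=0.0,
                    dt=1 / 252, n_steps=2520, n_paths=2000, seed=42)
# Theoretical long-run std is sigma / sqrt(2 * kappa) = 0.05.
print(paths[:, -1].std())
```

The cross-sectional standard deviation of the final values should land close to the theoretical unconditional level, since $\kappa T = 20$ is long enough for the process to forget its starting point.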
7. Solving HJB
• Cartea, 2015 demonstrated a solution using the Hamilton-Jacobi-Bellman (HJB) equation (a PDE), familiar from stochastic control theory.
• Numerical solution of the equation shows:
– Entry and exit levels are asymmetric w.r.t. the mean, due to the discount factor.
– Entry level is closer to the mean level than the exit level.
– Distance of entry / exit levels to the mean increases with decreasing $\kappa$.
– Distance of entry / exit levels to the mean increases with increasing $\sigma$.
– (Last 2 points are expected because the unconditional volatility is $\sqrt{\sigma^2/(2\kappa)}$ ⇔ width of Bollinger bands.)
– Long exit = short entry, and vice versa.
– Position is path-dependent.
– Always in either a long or a short position.
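The claim about unconditional volatility follows from the explicit solution of the OU equation. A short derivation, using the Itô isometry for the stochastic integral and treating $x(0)$ as known:

```latex
\begin{aligned}
x(t) &= \theta + \bigl(x(0) - \theta\bigr)\,e^{-\kappa t} + \sigma \int_0^t e^{-\kappa (t-s)}\,dW(s) \\
\operatorname{Var}[x(t)] &= \sigma^2 \int_0^t e^{-2\kappa (t-s)}\,ds
  = \frac{\sigma^2}{2\kappa}\left(1 - e^{-2\kappa t}\right)
  \;\xrightarrow{\,t \to \infty\,}\; \frac{\sigma^2}{2\kappa}
\end{aligned}
```

So the unconditional standard deviation is $\sigma/\sqrt{2\kappa}$: it grows with $\sigma$ and shrinks as $\kappa$ grows, which is consistent with the entry/exit levels moving away from the mean for larger $\sigma$ or smaller $\kappa$.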
8. Optimal Entry and Exit
[Figure: optimal long entry, long exit, short exit and short entry levels around the mean log price; band width ∼ $\sqrt{\sigma^2/(2\kappa)}$]
9. Analytical Optimization
• What if underlying prices are not described by a simple SDE like the OU process?
– Jumps, volatility clustering, long-range correlations, etc.
• What if the objective function is not discounted profit but a nonlinear function of PL?
– Sharpe ratio, Calmar ratio, etc.
• What if the objective function is total PL, not PL per trade?
• Even setting up the HJB equation is too difficult.
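Objectives like these are nonlinear in the P&L series, which is why they resist an HJB formulation but are easy to evaluate on simulated paths. A minimal sketch, assuming NumPy and a per-period P&L series in log-return units. The Calmar version measures drawdown on cumulative (additive) P&L rather than as a percentage of peak equity, a simplification of the usual convention; CVaR here is the plain historical estimator. All function names are hypothetical.

```python
import numpy as np

def sharpe_ratio(pl, periods_per_year=252):
    """Annualized Sharpe ratio of per-period P&L (zero risk-free rate assumed)."""
    pl = np.asarray(pl, dtype=float)
    sd = pl.std(ddof=1)
    if sd == 0:
        return 0.0
    return np.sqrt(periods_per_year) * pl.mean() / sd

def calmar_ratio(pl, periods_per_year=252):
    """Annualized mean P&L divided by the maximum drawdown of cumulative P&L."""
    pl = np.asarray(pl, dtype=float)
    equity = np.cumsum(pl)
    # Running peak, counting the starting equity level of 0 as the first peak.
    peak = np.maximum.accumulate(np.concatenate(([0.0], equity)))[1:]
    max_dd = np.max(peak - equity)
    if max_dd == 0:
        return np.inf
    return periods_per_year * pl.mean() / max_dd

def cvar(pl, alpha=0.05):
    """Historical CVaR: average loss over the worst alpha fraction of periods."""
    pl = np.sort(np.asarray(pl, dtype=float))
    n_tail = max(1, int(np.floor(alpha * len(pl))))
    return -pl[:n_tail].mean()  # reported as a positive loss
```

Any of these can replace the Sharpe ratio as the quantity averaged over simulated paths; nothing else in the simulation procedure changes.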
10. Simulation for optimization
• We can simulate as many copies of price series as we like.
– All follow the same time series model, e.g. AR(p).
• Find trading parameters that maximize the average Sharpe ratio over all simulated price series.
– Similar to solving the HJB equation.
• Alternatively, find trading parameters that most often maximize the Sharpe ratio of a simulated price series.
– Similar to maximum likelihood estimation.
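The two selection rules differ only in how the per-path results are aggregated. A minimal sketch, assuming the backtests have already produced a matrix `sharpe` of shape `(n_paths, n_k)` holding the Sharpe ratio of each simulated path for each candidate value on a hypothetical grid `k_grid`. The toy matrix is invented purely to show that the two rules can disagree.

```python
import numpy as np

def optimal_k_mean_sharpe(sharpe, k_grid):
    """k that maximizes the Sharpe ratio averaged over all simulated paths."""
    return k_grid[np.argmax(sharpe.mean(axis=0))]

def optimal_k_most_frequent(sharpe, k_grid):
    """k that is most often the Sharpe-maximizing value on an individual path."""
    best_idx = np.argmax(sharpe, axis=1)  # index of the best k on each path
    counts = np.bincount(best_idx, minlength=len(k_grid))
    return k_grid[np.argmax(counts)]

# Toy example: 3 paths x 3 candidate k values (made-up numbers).
k_grid = np.array([0.5, 1.0, 1.5])
sharpe = np.array([[1.0, 0.9, 0.0],
                   [1.0, 0.9, 0.0],
                   [-2.0, 0.8, 0.7]])
print(optimal_k_mean_sharpe(sharpe, k_grid))    # -> 1.0
print(optimal_k_most_frequent(sharpe, k_grid))  # -> 0.5
```

The "most frequent" rule picks $k = 0.5$ because it wins on two of three paths, ignoring how badly it does on the third; the averaging rule uses every Sharpe value and so penalizes that loss.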
12. Example: AUDCAD
• ADF test indicates hourly AUDCAD prices are stationary, with p-value below 1%.
• Assume AR(1) model on daily log prices $x$:
$x(t) = a_1\,x(t-1) + a_0 + \sigma_0\,\epsilon(t), \quad \epsilon \sim \mathcal{N}(0, 1)$
– For illustrative purposes only.
– Fit $(a_0, a_1, \sigma_0)$ on the first half of the data using MLE.
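With Gaussian errors, the MLE of the AR(1) model conditional on the first observation coincides with an ordinary least-squares regression of $x(t)$ on $x(t-1)$, with $\sigma_0$ estimated from the residuals (dividing by $n$, as MLE does). The sketch below assumes that conditional-MLE variant and NumPy; `fit_ar1` and `half_life` are hypothetical helper names. The half-life, $-\ln 2 / \ln a_1$ bars, is the quantity behind the "~10 × half-life" path length on the next slide.

```python
import numpy as np

def fit_ar1(x):
    """Conditional MLE of x[t] = a1*x[t-1] + a0 + sigma0*eps[t], eps ~ N(0, 1).

    Under Gaussian errors this equals OLS of x[t] on (x[t-1], 1).
    """
    x = np.asarray(x, dtype=float)
    X = np.column_stack([x[:-1], np.ones(len(x) - 1)])
    (a1, a0), *_ = np.linalg.lstsq(X, x[1:], rcond=None)
    resid = x[1:] - (a1 * x[:-1] + a0)
    sigma0 = np.sqrt(np.mean(resid**2))  # MLE divides by n, not n - 2
    return a1, a0, sigma0

def half_life(a1):
    """Half-life of mean reversion, in bars, for an AR(1) coefficient 0 < a1 < 1."""
    return -np.log(2) / np.log(a1)
```

In the AUDCAD setting, `fit_ar1` would be called on the first half of the log-price series only, leaving the second half for out-of-sample checks.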
13. Optimal trading of AUDCAD
• Simulate 10,000 log price series based on the fitted AR(1).
– Each series is about 3.7 years (~10 × half-life).
• On each series, backtest a simple strategy:
Buy if expected log return > $k\sigma_0$.
Sell if expected log return < $-k\sigma_0$.
Flatten otherwise.
• Apply a transaction cost of 1.8 bps per side.
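Under the AR(1) model, the expected next-bar log return is $E[x(t+1) \mid x(t)] - x(t) = (a_1 - 1)\,x(t) + a_0$, so the rule can be backtested in a few vectorized lines. A minimal sketch, assuming NumPy, a unit position in $\{-1, 0, +1\}$, and the 1.8 bps cost charged in log-return units per unit of position change (a long-to-short flip counts as two sides). The function name, the example AR(1) parameters, and the $\sqrt{252}$ annualization (daily bars) are assumptions for illustration.

```python
import numpy as np

def backtest_ar1_signal(x, a1, a0, sigma0, k, cost_per_side=1.8e-4):
    """Per-bar net P&L of the threshold rule on one log-price path x."""
    x = np.asarray(x, dtype=float)
    expected_ret = (a1 - 1.0) * x[:-1] + a0  # E[x(t+1) | x(t)] - x(t)
    pos = np.where(expected_ret > k * sigma0, 1.0,
                   np.where(expected_ret < -k * sigma0, -1.0, 0.0))
    gross = pos * np.diff(x)  # position decided at t, held from t to t+1
    # One side of cost per unit of position change, starting from flat.
    turnover = np.abs(np.diff(np.concatenate(([0.0], pos))))
    return gross - cost_per_side * turnover

# Usage on one simulated path (hypothetical parameters, daily bars).
rng = np.random.default_rng(1)
a1, a0, sigma0 = 0.95, 0.0, 0.002
x = np.zeros(1000)
for t in range(1, len(x)):
    x[t] = a1 * x[t - 1] + a0 + sigma0 * rng.standard_normal()
pl = backtest_ar1_signal(x, a1, a0, sigma0, k=0.01)
sharpe = np.sqrt(252) * pl.mean() / pl.std(ddof=1)
```

Repeating this for every simulated path and every $k$ on a grid fills the `(n_paths, n_k)` Sharpe matrix that the optimization step needs.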
14. Simulation Results
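• Maximizing the average Sharpe ratio gives optimal $k = 0.0088 \pm 0.0002$:
$\arg\max_k \, E_{\text{path}}\left[\text{Sharpe}(\text{path}) \mid k\right]$
• In contrast, $k = 0.01 \pm 0.006$ maximizes the likelihood that a path has its highest Sharpe ratio at $k$:
$\arg\max_k \, P_{\text{path}}\left[k = \arg\max_{k'} \text{Sharpe}(k', \text{path})\right]$
• In general, the first method is more accurate, since every path contributes its full Sharpe ratio to $E_{\text{path}}$, rather than only a vote for its single best $k$.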
18. Suboptimal > optimal?
• Backtest of the “optimal” parameter underperforms that of a “suboptimal” parameter out-of-sample.
• AR(1) model may need refitting periodically.
• Nobody promises that for a particular realized path, our optimal $k$ will maximize Sharpe!
– It is worth trading a range of $k$ in the vicinity of the optimum for diversification.
• See similar work by Carr and Lopez de Prado, 2014.
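Trading a range of $k$ around the optimum amounts to averaging the positions that each threshold would take. A minimal sketch, assuming NumPy, equal weights across a hypothetical set of `k_values`, and the same threshold rule as the AUDCAD strategy; the combined position lies in $[-1, 1]$ and scales in gradually instead of flipping at one threshold.

```python
import numpy as np

def ensemble_position(expected_ret, sigma0, k_values):
    """Equal-weight average of threshold-rule positions over several k values."""
    r = np.asarray(expected_ret, dtype=float)[:, None]  # shape (n_bars, 1)
    k = np.asarray(k_values, dtype=float)[None, :]      # shape (1, n_k)
    pos = np.where(r > k * sigma0, 1.0,
                   np.where(r < -k * sigma0, -1.0, 0.0))
    return pos.mean(axis=1)  # one combined position per bar
```

For example, an expected return of 0.015 with $\sigma_0 = 0.01$ and $k \in \{1, 2\}$ triggers only the looser threshold, giving a half-size long position.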
19. Further work
• Can easily optimize other nonlinear functions of prices instead:
– Calmar ratio.
– CVaR.
• Can easily extend this to more complicated time series models:
– AR+GARCH.
– Nonlinear generative models, e.g. LSTM (recurrent neural network).
• Can easily extend this to more complicated trading strategies, with multiple parameters.
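Swapping in a richer time series model only changes the simulator; the backtest and optimization steps stay the same. A minimal sketch of an AR(1) mean with GARCH(1,1) innovations, which adds volatility clustering to the simulated paths. The GARCH(1,1) order, the Gaussian innovations, the function name, and all parameter values are assumptions for illustration, not the fitted AUDCAD model.

```python
import numpy as np

def simulate_ar1_garch11(a1, a0, omega, alpha, beta, n_steps, n_paths,
                         x0=0.0, seed=None):
    """Simulate x[t] = a1*x[t-1] + a0 + e[t], with e[t] = s[t]*z[t], z ~ N(0, 1),
    and s[t]^2 = omega + alpha*e[t-1]^2 + beta*s[t-1]^2.

    Returns an array of shape (n_paths, n_steps + 1).
    """
    if alpha + beta >= 1:
        raise ValueError("alpha + beta must be < 1 for a finite unconditional variance")
    rng = np.random.default_rng(seed)
    x = np.empty((n_paths, n_steps + 1))
    x[:, 0] = x0
    s2 = np.full(n_paths, omega / (1.0 - alpha - beta))  # start at unconditional variance
    for t in range(1, n_steps + 1):
        e = np.sqrt(s2) * rng.standard_normal(n_paths)
        x[:, t] = a1 * x[:, t - 1] + a0 + e
        s2 = omega + alpha * e**2 + beta * s2  # conditional variance for the next step
    return x

paths = simulate_ar1_garch11(a1=0.9, a0=0.0, omega=1e-6, alpha=0.05, beta=0.9,
                             n_steps=1000, n_paths=200, seed=3)
```

The innovations have unconditional variance $\omega / (1 - \alpha - \beta)$, here $2 \times 10^{-5}$, but large shocks raise the next bar's variance, producing the clustered volatility that a plain AR(1) cannot reproduce.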
20. Conclusion
• Optimizing trading strategy parameters on historical data invites overfitting.
• More robust to fit time series (not trading) models to historical data instead.
• The fitted time series model can be used to simulate an arbitrary number of time series.
• Can find optimal trading parameters on simulated time series to arbitrary precision.
21. Thank you for your time!
www.qtscm.com
Twitter: @chanep
Blog: epchan.blogspot.com