Evaluating with Bootstrap Metrics
Bootstrap metrics can help us to more thoroughly evaluate a trading strategy, as we will see in this notebook.
In the previous notebook, we implemented a trading strategy and backtested it. Here is the implementation again:
[1]:
import pybroker
from pybroker import Strategy, StrategyConfig, YFinance
pybroker.enable_data_source_cache('my_strategy')

def buy_low(ctx):
    if ctx.long_pos():
        return
    if ctx.bars >= 2 and ctx.close[-1] < ctx.low[-2]:
        ctx.buy_shares = ctx.calc_target_shares(0.25)
        ctx.buy_limit_price = ctx.close[-1] - 0.01
        ctx.hold_bars = 3

def short_high(ctx):
    if ctx.short_pos():
        return
    if ctx.bars >= 2 and ctx.close[-1] > ctx.high[-2]:
        ctx.sell_shares = 100
        ctx.hold_bars = 2
As before, we create a new Strategy instance with the given configurations:
[2]:
config = StrategyConfig(initial_cash=500_000, bootstrap_sample_size=100)
strategy = Strategy(YFinance(), '3/1/2017', '3/1/2022', config)
strategy.add_execution(buy_low, ['AAPL', 'MSFT'])
strategy.add_execution(short_high, ['TSLA'])
This time, the Strategy is configured with a bootstrap_sample_size of 100 because of the small amount of historical data being used. Next, we run the backtest with bootstrap metrics enabled:
[3]:
result = strategy.backtest(calc_bootstrap=True)
result.metrics_df
Backtesting: 2017-03-01 00:00:00 to 2022-03-01 00:00:00
Loaded cached bar data.
Test split: 2017-03-01 00:00:00 to 2022-02-28 00:00:00
100% (1259 of 1259) |####################| Elapsed Time: 0:00:00 Time: 0:00:00
Calculating bootstrap metrics: sample_size=100, samples=10000...
Calculated bootstrap metrics: 0:00:03
Finished backtest: 0:00:05
[3]:
 | name | value |
---|---|---|
0 | trade_count | 388.000000 |
1 | initial_market_value | 500000.000000 |
2 | end_market_value | 655753.670000 |
3 | total_pnl | 156575.000000 |
4 | unrealized_pnl | -821.330000 |
5 | total_return_pct | 31.315000 |
6 | total_profit | 383032.400000 |
7 | total_loss | -226457.400000 |
8 | total_fees | 0.000000 |
9 | max_drawdown | -30181.580000 |
10 | max_drawdown_pct | -4.554816 |
11 | win_rate | 52.577320 |
12 | loss_rate | 47.422680 |
13 | winning_trades | 204.000000 |
14 | losing_trades | 184.000000 |
15 | avg_pnl | 403.543814 |
16 | avg_return_pct | 0.279639 |
17 | avg_trade_bars | 2.414948 |
18 | avg_profit | 1877.609804 |
19 | avg_profit_pct | 3.168775 |
20 | avg_winning_trade_bars | 2.465686 |
21 | avg_loss | -1230.746739 |
22 | avg_loss_pct | -2.923533 |
23 | avg_losing_trade_bars | 2.358696 |
24 | largest_win | 20797.970000 |
25 | largest_win_pct | 14.490000 |
26 | largest_win_bars | 3.000000 |
27 | largest_loss | -10831.630000 |
28 | largest_loss_pct | -6.490000 |
29 | largest_loss_bars | 3.000000 |
30 | max_wins | 7.000000 |
31 | max_losses | 7.000000 |
32 | sharpe | 0.054488 |
33 | sortino | 0.061320 |
34 | profit_factor | 1.312935 |
35 | ulcer_index | 0.627821 |
36 | upi | 0.035531 |
37 | equity_r2 | 0.893202 |
38 | std_error | 63596.828230 |
When we look at the total_pnl metric above, it seems that we have a profitable trading strategy on our first try. However, we cannot be sure that these results are repeatable and not just due to chance. To gain more confidence in our results, we can use the bootstrap method to compute metrics.
The bootstrap method works by repeatedly drawing random samples, with replacement, from the backtest’s returns. The metric is computed on each random sample, and the average is taken. By doing this on thousands of random samples, we obtain a more robust estimate of the metric, along with a sense of how much it varies from sample to sample.
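The procedure described above can be sketched in a few lines of plain Python. This is a minimal illustration and not PyBroker's implementation: the helper names `bootstrap_metric` and `sharpe` are hypothetical, the per-bar returns are randomly generated stand-ins, and the Sharpe ratio here is a simple unannualized mean over standard deviation.

```python
import random
import statistics


def bootstrap_metric(returns, metric, sample_size=100, n_samples=1000, seed=0):
    """Average `metric` over random samples drawn with replacement."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_samples):
        # Draw one random sample of per-bar returns, with replacement.
        sample = rng.choices(returns, k=sample_size)
        estimates.append(metric(sample))
    return statistics.fmean(estimates), estimates


def sharpe(returns):
    """Unannualized Sharpe ratio: mean return over its standard deviation."""
    sd = statistics.stdev(returns)
    return statistics.fmean(returns) / sd if sd > 0 else 0.0


# Hypothetical per-bar returns, generated only for illustration.
data_rng = random.Random(42)
bar_returns = [data_rng.gauss(0.0005, 0.01) for _ in range(1259)]

mean_sharpe, sharpe_samples = bootstrap_metric(bar_returns, sharpe)
print(f"bootstrapped Sharpe: {mean_sharpe:.4f}")
print(f"range across samples: {min(sharpe_samples):.4f} to {max(sharpe_samples):.4f}")
```

The spread between the smallest and largest sampled values gives a first impression of how sensitive the metric is to which bars happen to be included.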
Confidence Intervals
PyBroker applies the bootstrap method to calculate confidence intervals for two performance metrics, the Profit Factor and Sharpe Ratio:
[4]:
result.bootstrap.conf_intervals
[4]:
name | conf | lower | upper |
---|---|---|---|
Profit Factor | 97.5% | 0.594243 | 4.400753 |
Profit Factor | 95% | 0.719539 | 3.715684 |
Profit Factor | 90% | 0.877060 | 3.153457 |
Sharpe Ratio | 97.5% | -0.136541 | 0.243573 |
Sharpe Ratio | 95% | -0.100146 | 0.220099 |
Sharpe Ratio | 90% | -0.060583 | 0.193326 |
PyBroker uses the bias-corrected and accelerated (BCa) bootstrap method to calculate the confidence intervals for these metrics. The returns are sampled per-bar rather than per-trade to capture more information in the metrics.
The resulting table shows the lower and upper bounds of the confidence interval at each confidence level. The lower bound provides a more conservative estimate of the strategy’s performance. For example, we can be 97.5% confident that the Sharpe Ratio is at or above -0.136541.
In this example, the Sharpe Ratio has negative lower bounds, and the lower bounds of the Profit Factor are less than 1, which suggests that the strategy is not reliable.
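To make the idea of a bootstrapped bound concrete, here is a rough sketch that computes a plain percentile interval for the Profit Factor. It is deliberately simpler than the BCa method PyBroker uses, so its numbers would not match the table above. The function names, the two-sided definition of the confidence level, and the generated returns are all assumptions made for illustration.

```python
import random


def profit_factor(returns):
    """Sum of gains divided by the absolute sum of losses."""
    gains = sum(r for r in returns if r > 0)
    losses = -sum(r for r in returns if r < 0)
    return gains / losses if losses > 0 else float("inf")


def percentile_interval(returns, metric, conf=0.95, sample_size=100,
                        n_samples=2000, seed=0):
    """Two-sided percentile bootstrap interval (not BCa)."""
    rng = random.Random(seed)
    # Compute the metric on each resample, then sort the results.
    stats = sorted(metric(rng.choices(returns, k=sample_size))
                   for _ in range(n_samples))
    tail = (1.0 - conf) / 2.0
    lower = stats[int(tail * (n_samples - 1))]
    upper = stats[int((1.0 - tail) * (n_samples - 1))]
    return lower, upper


# Hypothetical per-bar returns, generated only for illustration.
data_rng = random.Random(7)
bar_returns = [data_rng.gauss(0.0005, 0.01) for _ in range(1259)]

lower, upper = percentile_interval(bar_returns, profit_factor, conf=0.95)
print(f"95% interval for Profit Factor: {lower:.3f} to {upper:.3f}")
print("lower bound above 1:", lower > 1.0)
```

A lower bound below 1 means that, in a meaningful share of resamples, losses outweighed profits, which is the same warning sign discussed above.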
Maximum Drawdown
In this section, we examine the maximum drawdown of the strategy using the bootstrap method. The probabilities of the drawdown not exceeding certain values, represented in cash and percentage of portfolio equity, are displayed below:
[5]:
result.bootstrap.drawdown_conf
[5]:
conf | amount | percent |
---|---|---|
99.9% | -66401.25 | -10.462527 |
99% | -50062.49 | -7.963632 |
95% | -35794.82 | -5.848482 |
90% | -29931.10 | -4.912087 |
These confidence levels were obtained using per-bar returns from the backtest’s out-of-sample results, similar to how the Profit Factor and Sharpe Ratio were calculated.
We can observe that the bootstrapped max drawdown of -10.46% at a 99.9% confidence level is worse than the -4.55% we saw in our original results. This highlights the importance of using randomized tests to evaluate the performance of your trading strategy.
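As a rough illustration of how drawdown confidence levels like these could be derived, the sketch below resamples per-bar returns, computes the maximum drawdown of each resampled equity curve, and reads off quantiles. It is an assumption-laden approximation rather than PyBroker's actual procedure: the function names are hypothetical, the returns are randomly generated, and only the percentage drawdown is computed.

```python
import random


def max_drawdown_pct(returns):
    """Largest peak-to-trough decline of the compounded equity curve, in percent."""
    equity = peak = 1.0
    worst = 0.0
    for r in returns:
        equity *= 1.0 + r
        peak = max(peak, equity)
        worst = min(worst, (equity / peak - 1.0) * 100.0)
    return worst  # always <= 0


def drawdown_at_confidence(returns, confs=(0.999, 0.99, 0.95, 0.90),
                           sample_size=100, n_samples=2000, seed=0):
    """For each level, the drawdown that resamples did not exceed that often."""
    rng = random.Random(seed)
    # Sorted ascending, so the most severe drawdowns come first.
    draws = sorted(max_drawdown_pct(rng.choices(returns, k=sample_size))
                   for _ in range(n_samples))
    return {c: draws[int((1.0 - c) * (n_samples - 1))] for c in confs}


# Hypothetical per-bar returns, generated only for illustration.
data_rng = random.Random(11)
bar_returns = [data_rng.gauss(0.0005, 0.01) for _ in range(1259)]

dd = drawdown_at_confidence(bar_returns)
for conf, value in dd.items():
    print(f"{conf:.1%}: {value:.2f}%")
```

Higher confidence levels reach further into the tail of the resampled drawdowns, which is why the 99.9% row is the most severe.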