Evaluating with Bootstrap Metrics

Bootstrap metrics can help us evaluate a trading strategy more thoroughly, as we will see in this notebook.

In the previous notebook, we implemented a trading strategy and backtested it. Here is the implementation again:

[1]:
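import pybroker
from pybroker import Strategy, StrategyConfig, YFinance

# Cache downloaded bar data so repeated runs don't fetch it again.
pybroker.enable_data_source_cache('my_strategy')

def buy_low(ctx):
    # Skip if we already hold a long position in this symbol.
    if ctx.long_pos():
        return
    # Buy when the last close is below the previous bar's low.
    if ctx.bars >= 2 and ctx.close[-1] < ctx.low[-2]:
        ctx.buy_shares = ctx.calc_target_shares(0.25)  # target a 25% position size
        ctx.buy_limit_price = ctx.close[-1] - 0.01     # limit just below the last close
        ctx.hold_bars = 3                              # exit after 3 bars

def short_high(ctx):
    # Skip if we already hold a short position in this symbol.
    if ctx.short_pos():
        return
    # Short when the last close is above the previous bar's high.
    if ctx.bars >= 2 and ctx.close[-1] > ctx.high[-2]:
        ctx.sell_shares = 100
        ctx.hold_bars = 2  # cover after 2 bars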

As before, we create a new Strategy instance with the given configuration:

[2]:
config = StrategyConfig(initial_cash=500_000, bootstrap_sample_size=100)
strategy = Strategy(YFinance(), '3/1/2017', '3/1/2022', config)
strategy.add_execution(buy_low, ['AAPL', 'MSFT'])
strategy.add_execution(short_high, ['TSLA'])

This time, the Strategy is configured with a bootstrap_sample_size of 100, so each bootstrap sample contains 100 returns. A small sample size is used because the backtest covers only a small amount of historical data (1,259 bars). Next, we run the backtest with bootstrap metrics enabled:

[3]:
result = strategy.backtest(calc_bootstrap=True)
result.metrics_df
Backtesting: 2017-03-01 00:00:00 to 2022-03-01 00:00:00

Loaded cached bar data.

Test split: 2017-03-01 00:00:00 to 2022-02-28 00:00:00
100% (1259 of 1259) |####################| Elapsed Time: 0:00:00 Time:  0:00:00

Calculating bootstrap metrics: sample_size=100, samples=10000...
Calculated bootstrap metrics: 0:00:03

Finished backtest: 0:00:05
[3]:
name value
0 trade_count 388.000000
1 initial_market_value 500000.000000
2 end_market_value 655753.670000
3 total_pnl 156575.000000
4 unrealized_pnl -821.330000
5 total_return_pct 31.315000
6 total_profit 383032.400000
7 total_loss -226457.400000
8 total_fees 0.000000
9 max_drawdown -30181.580000
10 max_drawdown_pct -4.554816
11 win_rate 52.577320
12 loss_rate 47.422680
13 winning_trades 204.000000
14 losing_trades 184.000000
15 avg_pnl 403.543814
16 avg_return_pct 0.279639
17 avg_trade_bars 2.414948
18 avg_profit 1877.609804
19 avg_profit_pct 3.168775
20 avg_winning_trade_bars 2.465686
21 avg_loss -1230.746739
22 avg_loss_pct -2.923533
23 avg_losing_trade_bars 2.358696
24 largest_win 20797.970000
25 largest_win_pct 14.490000
26 largest_win_bars 3.000000
27 largest_loss -10831.630000
28 largest_loss_pct -6.490000
29 largest_loss_bars 3.000000
30 max_wins 7.000000
31 max_losses 7.000000
32 sharpe 0.054488
33 sortino 0.061320
34 profit_factor 1.312935
35 ulcer_index 0.627821
36 upi 0.035531
37 equity_r2 0.893202
38 std_error 63596.828230

When we look at the total_pnl metric above, it seems that we have a profitable trading strategy on our first try. However, we cannot be sure that these results are repeatable and not just due to chance. To gain more confidence in our results, we can use the bootstrap method to compute metrics.

The bootstrap method works by repeatedly drawing random samples, with replacement, from the backtest's returns and computing the metric on each sample. Repeating this thousands of times (10,000 samples in the run above) produces a distribution of the metric rather than a single value, from which we can derive more robust estimates, such as confidence intervals, than a single backtest provides.
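A minimal sketch of this resampling loop, using only the Python standard library. The returns are synthetic placeholders, and `bootstrap_metric`, `sample_size`, and `n_samples` are hypothetical names chosen to mirror the `sample_size=100, samples=10000` values in the log above; this is an illustration of the technique, not PyBroker's internal implementation.

```python
import random
import statistics


def bootstrap_metric(returns, metric, sample_size=100, n_samples=10_000, seed=0):
    """Draw n_samples random samples (with replacement) of sample_size
    returns each, and compute `metric` on every sample."""
    rng = random.Random(seed)
    return [metric(rng.choices(returns, k=sample_size)) for _ in range(n_samples)]


# Synthetic per-bar returns standing in for a real backtest (assumption).
rng = random.Random(42)
returns = [rng.gauss(0.0005, 0.01) for _ in range(1259)]

# Bootstrap distribution of the mean per-bar return.
dist = bootstrap_metric(returns, statistics.fmean)
print(f"average of bootstrapped means: {statistics.fmean(dist):.6f}")

# The spread of the distribution shows how uncertain a single backtest value is.
cuts = statistics.quantiles(dist, n=40)  # 39 cut points: 2.5%, 5%, ..., 97.5%
print(f"2.5th percentile: {cuts[0]:.6f}, 97.5th percentile: {cuts[-1]:.6f}")
```

Reading the 2.5th and 97.5th percentiles off the distribution is the simplest (percentile) way to form a confidence interval; PyBroker uses the more refined BCa method for its intervals.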

Confidence Intervals

PyBroker applies the bootstrap method to calculate confidence intervals for two performance metrics: the Profit Factor and the Sharpe Ratio.
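For reference, here is a sketch of both metrics computed on a list of per-bar returns. These are common textbook definitions assumed for illustration (Profit Factor as total gains divided by the absolute value of total losses; Sharpe Ratio as mean return over standard deviation, not annualized and with a zero risk-free rate). PyBroker's exact formulas may differ, and the function names are hypothetical.

```python
import statistics


def profit_factor(returns):
    """Sum of positive returns divided by the absolute sum of negative returns."""
    gains = sum(r for r in returns if r > 0)
    losses = -sum(r for r in returns if r < 0)
    return gains / losses if losses > 0 else float("inf")


def sharpe_ratio(returns):
    """Mean per-bar return divided by its sample standard deviation
    (not annualized; risk-free rate assumed to be zero)."""
    sd = statistics.stdev(returns)
    return statistics.fmean(returns) / sd if sd > 0 else 0.0


bar_returns = [0.02, -0.01, 0.015, -0.005, 0.0, -0.02, 0.03]
print(f"profit factor: {profit_factor(bar_returns):.4f}")
print(f"sharpe ratio:  {sharpe_ratio(bar_returns):.4f}")
```

Under this definition, a Profit Factor above 1 means gains outweigh losses, and a positive Sharpe Ratio means the average return is positive.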

[4]:
result.bootstrap.conf_intervals
[4]:
                        lower     upper
name          conf
Profit Factor 97.5%  0.594243  4.400753
              95%    0.719539  3.715684
              90%    0.877060  3.153457
Sharpe Ratio  97.5% -0.136541  0.243573
              95%   -0.100146  0.220099
              90%   -0.060583  0.193326

PyBroker uses the bias-corrected and accelerated (BCa) bootstrap method to calculate the confidence intervals for these metrics. The returns are sampled per bar rather than per trade, which gives far more observations (1,259 bars versus 388 trades in this backtest) and so captures more information in the metrics.
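The sketch below shows how a BCa interval is computed, using only the standard library. It follows the standard BCa recipe: a bias-correction term `z0` from the share of bootstrap estimates that fall below the original estimate, and an acceleration term `a` from jackknife (leave-one-out) estimates. It is an illustration, not PyBroker's code: `bca_interval` and `sharpe` are hypothetical names, the returns are synthetic, and unlike PyBroker it resamples the full series instead of fixed-size samples.

```python
import random
import statistics
from statistics import NormalDist


def bca_interval(data, stat, conf=0.95, n_boot=2000, seed=0):
    """Bias-corrected and accelerated (BCa) bootstrap confidence interval.
    Resamples the full series with replacement on every draw."""
    rng = random.Random(seed)
    n = len(data)
    theta_hat = stat(data)
    boot = sorted(stat(rng.choices(data, k=n)) for _ in range(n_boot))
    norm = NormalDist()

    # Bias correction z0: z0 = 0 when exactly half the estimates fall below theta_hat.
    below = sum(b < theta_hat for b in boot) / n_boot
    below = min(max(below, 1 / n_boot), 1 - 1 / n_boot)  # keep inv_cdf finite
    z0 = norm.inv_cdf(below)

    # Acceleration a: skewness of the jackknife (leave-one-out) estimates.
    jack = [stat(data[:i] + data[i + 1:]) for i in range(n)]
    jack_mean = statistics.fmean(jack)
    num = sum((jack_mean - j) ** 3 for j in jack)
    den = 6 * sum((jack_mean - j) ** 2 for j in jack) ** 1.5
    a = num / den if den > 0 else 0.0

    def adjusted(alpha):
        # Shift and stretch the nominal quantile using z0 and a.
        z = norm.inv_cdf(alpha)
        return norm.cdf(z0 + (z0 + z) / (1 - a * (z0 + z)))

    alpha = (1 - conf) / 2
    lo_idx = min(int(adjusted(alpha) * n_boot), n_boot - 1)
    hi_idx = min(int(adjusted(1 - alpha) * n_boot), n_boot - 1)
    return boot[lo_idx], boot[hi_idx]


def sharpe(xs):
    # Unannualized Sharpe Ratio of per-bar returns (assumed definition).
    m = sum(xs) / len(xs)
    var = sum((x - m) ** 2 for x in xs) / (len(xs) - 1)
    return m / var ** 0.5


rng = random.Random(7)
bar_returns = [rng.gauss(0.0005, 0.01) for _ in range(300)]
low, high = bca_interval(bar_returns, sharpe)
print(f"95% BCa interval for the Sharpe Ratio: [{low:.4f}, {high:.4f}]")
```

When `z0` and `a` are both zero, the adjusted quantiles reduce to the plain 2.5% and 97.5% percentiles; the corrections matter when the bootstrap distribution is biased or skewed, as metrics like the Profit Factor often are.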

The resulting table shows the lower and upper bounds of the confidence interval at each confidence level. The lower bound provides a more conservative estimate of the strategy's performance. For example, we can be 97.5% confident that the Sharpe Ratio is at or above -0.136541, its lower bound at that level.

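In this example, the Sharpe Ratio has a negative lower bound at every confidence level, and the Profit Factor's lower bounds are all below 1 (a Profit Factor below 1 means losses exceed profits). This suggests that the strategy's apparent profitability is not reliable.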
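This reading can be expressed as a simple check. The sketch copies the lower bounds from the table above into a plain dictionary; the helper name `lower_bounds_pass` and the thresholds (Sharpe Ratio above 0, Profit Factor above 1) are illustrative assumptions, not part of PyBroker's API.

```python
# Lower bounds copied from the conf_intervals table above.
lower_bounds = {
    "Profit Factor": {"97.5%": 0.594243, "95%": 0.719539, "90%": 0.877060},
    "Sharpe Ratio": {"97.5%": -0.136541, "95%": -0.100146, "90%": -0.060583},
}

# Minimum acceptable lower bound for each metric (assumed thresholds).
thresholds = {"Profit Factor": 1.0, "Sharpe Ratio": 0.0}


def lower_bounds_pass(bounds, limits, level):
    """Return {metric: bool} telling whether each metric's lower bound
    at the given confidence level exceeds its threshold."""
    return {name: bounds[name][level] > limits[name] for name in limits}


for level in ("97.5%", "95%", "90%"):
    print(level, lower_bounds_pass(lower_bounds, thresholds, level))
```

With a live result, the same numbers come from `result.bootstrap.conf_intervals`; assuming its index levels are the `name` and `conf` labels shown in the table, a single value could be looked up with something like `result.bootstrap.conf_intervals.loc[('Sharpe Ratio', '97.5%'), 'lower']`.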

Maximum Drawdown

In this section, we examine the maximum drawdown of the strategy using the bootstrap method. The table below lists, for each confidence level, the drawdown that is not expected to be exceeded, expressed both as a cash amount and as a percentage of portfolio equity:

[5]:
result.bootstrap.drawdown_conf
[5]:
          amount     percent
conf
99.9%  -66401.25  -10.462527
99%    -50062.49   -7.963632
95%    -35794.82   -5.848482
90%    -29931.10   -4.912087

These confidence levels were obtained using per-bar returns from the backtest’s out-of-sample results, similar to how the Profit Factor and Sharpe Ratio were calculated.
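A rough sketch of how such a drawdown table can be produced: resample the per-bar returns, compound each sample into an equity curve, record its maximum drawdown, and read off quantiles of the resulting distribution. The synthetic returns, the starting equity, and the helper names `max_drawdown_pct` and `drawdown_conf` are assumptions for illustration; PyBroker's own procedure may differ in detail.

```python
import random


def max_drawdown_pct(returns, start_equity=500_000.0):
    """Largest peak-to-trough decline (as a negative percent) of the
    equity curve produced by compounding the given per-bar returns."""
    equity = peak = start_equity
    worst = 0.0
    for r in returns:
        equity *= 1 + r
        peak = max(peak, equity)
        worst = min(worst, (equity - peak) / peak * 100)
    return worst


def drawdown_conf(returns, levels=(0.999, 0.99, 0.95, 0.90),
                  sample_size=100, n_samples=10_000, seed=0):
    """Drawdown that is not exceeded at each confidence level."""
    rng = random.Random(seed)
    dds = sorted(max_drawdown_pct(rng.choices(returns, k=sample_size))
                 for _ in range(n_samples))  # most negative first
    # At confidence c, only a (1 - c) fraction of samples have a worse drawdown.
    return {c: dds[int((1 - c) * n_samples)] for c in levels}


rng = random.Random(1)
bar_returns = [rng.gauss(0.0003, 0.005) for _ in range(1259)]
table = drawdown_conf(bar_returns)
for level, dd in table.items():
    print(f"{level:.1%}: {dd:.2f}%")
```

Higher confidence levels reach further into the tail of the distribution, which is why the 99.9% row is always the most severe.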

We can observe that the bootstrapped max drawdown of -10.46% at a 99.9% confidence level is more than twice the -4.55% max drawdown from the original backtest results. This highlights the importance of using randomized tests to evaluate the performance of your trading strategy.

In the next notebook, we will discuss how to incorporate ranking and position sizing in your trading strategies.