Evaluating with Bootstrap Metrics

Bootstrap metrics let us evaluate a trading strategy more thoroughly, as we will see in this notebook.

In the previous notebook, we implemented a trading strategy and backtested it. Here is the implementation again:

[1]:

import pybroker
from pybroker import Strategy, StrategyConfig, YFinance

pybroker.enable_data_source_cache('my_strategy')

def buy_low(ctx):
    if ctx.long_pos():
        return
    if ctx.bars >= 2 and ctx.close[-1] < ctx.low[-2]:
        ctx.buy_shares = ctx.calc_target_shares(0.25)
        ctx.buy_limit_price = ctx.close[-1] - 0.01
        ctx.hold_bars = 3

def short_high(ctx):
    if ctx.short_pos():
        return
    if ctx.bars >= 2 and ctx.close[-1] > ctx.high[-2]:
        ctx.sell_shares = 100
        ctx.hold_bars = 2

As before, we create a new Strategy instance with the given configurations:

[2]:

config = StrategyConfig(initial_cash=500_000, bootstrap_sample_size=100)
strategy = Strategy(YFinance(), '3/1/2017', '3/1/2022', config)
strategy.add_execution(buy_low, ['AAPL', 'MSFT'])
strategy.add_execution(short_high, ['TSLA'])

This time, the Strategy is configured with a bootstrap_sample_size of 100 because only a small amount of historical data is being used. Next, we run the backtest with bootstrap metrics enabled:

[3]:

result = strategy.backtest(calc_bootstrap=True)
result.metrics_df

Backtesting: 2017-03-01 00:00:00 to 2022-03-01 00:00:00

Loaded cached bar data.

Test split: 2017-03-01 00:00:00 to 2022-02-28 00:00:00

100% (1259 of 1259) |####################| Elapsed Time: 0:00:00 Time:  0:00:00


Calculating bootstrap metrics: sample_size=100, samples=10000...
Calculated bootstrap metrics: 0:00:03

Finished backtest: 0:00:05

[3]:

	name	value
0	trade_count	388.000000
1	initial_market_value	500000.000000
2	end_market_value	655753.670000
3	total_pnl	156575.000000
4	unrealized_pnl	-821.330000
5	total_return_pct	31.315000
6	total_profit	383032.400000
7	total_loss	-226457.400000
8	total_fees	0.000000
9	max_drawdown	-30181.580000
10	max_drawdown_pct	-4.554816
11	win_rate	52.577320
12	loss_rate	47.422680
13	winning_trades	204.000000
14	losing_trades	184.000000
15	avg_pnl	403.543814
16	avg_return_pct	0.279639
17	avg_trade_bars	2.414948
18	avg_profit	1877.609804
19	avg_profit_pct	3.168775
20	avg_winning_trade_bars	2.465686
21	avg_loss	-1230.746739
22	avg_loss_pct	-2.923533
23	avg_losing_trade_bars	2.358696
24	largest_win	20797.970000
25	largest_win_pct	14.490000
26	largest_win_bars	3.000000
27	largest_loss	-10831.630000
28	largest_loss_pct	-6.490000
29	largest_loss_bars	3.000000
30	max_wins	7.000000
31	max_losses	7.000000
32	sharpe	0.054488
33	sortino	0.061320
34	profit_factor	1.312935
35	ulcer_index	0.627821
36	upi	0.035531
37	equity_r2	0.893202
38	std_error	63596.828230

Judging by the total_pnl metric above, we appear to have a profitable trading strategy on our first try. However, we cannot be sure that these results are repeatable and not just due to chance. To gain more confidence in our results, we can use the bootstrap method to compute metrics.

The bootstrap method works by repeatedly drawing random samples, with replacement, from the backtest’s returns and computing the metric on each sample. Doing this on thousands of random samples yields a distribution of values for the metric, which gives a more robust estimate than the single value produced by the backtest.
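The resampling idea described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the helper names (bootstrap_estimates, percentile_lower_bound) and the example returns are hypothetical, and the lower bound uses a simple percentile rule rather than the method PyBroker uses.

```python
import random
import statistics

def bootstrap_estimates(returns, metric, n_samples=1000, sample_size=None, seed=0):
    """Return the metric computed on n_samples random resamples of returns."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    size = sample_size or len(returns)
    estimates = []
    for _ in range(n_samples):
        # Draw `size` returns with replacement, then score the resample.
        sample = rng.choices(returns, k=size)
        estimates.append(metric(sample))
    return sorted(estimates)

def percentile_lower_bound(sorted_estimates, conf):
    """Value that roughly a fraction `conf` of the estimates are at or above."""
    idx = round((1 - conf) * (len(sorted_estimates) - 1))
    return sorted_estimates[idx]

# Hypothetical per-bar returns, made up for illustration only.
example_returns = [0.004, -0.002, 0.001, -0.006, 0.003, 0.002, -0.001, 0.005]
estimates = bootstrap_estimates(example_returns, statistics.mean, n_samples=2000)
mean_estimate = statistics.mean(estimates)           # average over all resamples
lower_95 = percentile_lower_bound(estimates, 0.95)   # simple percentile bound
```

Any function that maps a list of returns to a number can be passed as metric, so the same loop works for other performance measures.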

Confidence Intervals

PyBroker applies the bootstrap method to calculate confidence intervals for two performance metrics, the Profit Factor and Sharpe Ratio:

[4]:

result.bootstrap.conf_intervals

[4]:

name	conf	lower	upper
Profit Factor	97.5%	0.594243	4.400753
Profit Factor	95%	0.719539	3.715684
Profit Factor	90%	0.877060	3.153457
Sharpe Ratio	97.5%	-0.136541	0.243573
Sharpe Ratio	95%	-0.100146	0.220099
Sharpe Ratio	90%	-0.060583	0.193326

PyBroker uses the bias corrected and accelerated (BCa) bootstrap method to calculate the confidence intervals for these metrics. The returns are sampled per-bar rather than per-trade to capture more information in the metrics.

The resulting table shows the lower and upper bounds of the confidence interval at each confidence level. The lower bound provides a more conservative estimate of the strategy’s performance. For example, we can be 97.5% confident that the Sharpe Ratio is at or above the lower bound in the 97.5% row, which is -0.136541 here.

In this example, the Sharpe Ratio has negative lower bounds, and the lower bounds of the Profit Factor are less than 1, which suggests that the strategy is not reliable.
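To make the two metrics and the reliability check concrete, here is a minimal sketch that computes them from per-bar values. The function names are hypothetical, and these are textbook definitions (gross gains over gross losses, and mean over standard deviation with no annualization), which may differ in detail from PyBroker's own implementation.

```python
import math
import statistics

def profit_factor(changes):
    """Sum of positive per-bar changes divided by the sum of losses."""
    gains = sum(c for c in changes if c > 0)
    losses = -sum(c for c in changes if c < 0)
    if losses == 0:
        return math.inf if gains > 0 else 0.0
    return gains / losses

def sharpe_ratio(returns):
    """Mean return divided by its sample standard deviation (not annualized)."""
    std = statistics.stdev(returns)
    if std == 0:
        return 0.0
    return statistics.mean(returns) / std

def lower_bounds_look_reliable(profit_factor_lower, sharpe_lower):
    """Apply the rule of thumb from the text to a pair of lower bounds."""
    # A Profit Factor below 1 means losses outweigh gains;
    # a negative Sharpe Ratio means the average return is negative.
    return profit_factor_lower > 1 and sharpe_lower > 0

# Lower bounds copied from the 97.5% rows of the table above.
reliable = lower_bounds_look_reliable(0.594243, -0.136541)
```

With the 97.5% lower bounds from the table, reliable evaluates to False, which matches the conclusion above.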

Maximum Drawdown

In this section, we examine the maximum drawdown of the strategy using the bootstrap method. The probabilities of the drawdown not exceeding certain values, represented in cash and percentage of portfolio equity, are displayed below:

[5]:

result.bootstrap.drawdown_conf

[5]:

conf	amount	percent
99.9%	-66401.25	-10.462527
99%	-50062.49	-7.963632
95%	-35794.82	-5.848482
90%	-29931.10	-4.912087

These confidence levels were obtained using per-bar returns from the backtest’s out-of-sample results, similar to how the Profit Factor and Sharpe Ratio were calculated.

We can observe that the bootstrapped max drawdown of -10.46% at a 99.9% confidence level is worse than the -4.55% we saw in our original results. This highlights the importance of using randomized tests to evaluate the performance of your trading strategy.
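One way to picture how such drawdown confidence levels can arise is sketched below: resample the per-bar returns, compute the maximum drawdown of each resampled equity curve, and read off quantiles. The helper names and example returns are hypothetical, and the simple quantile rule is an assumption rather than PyBroker's exact procedure.

```python
import random

def max_drawdown_pct(returns):
    """Largest peak-to-trough decline, in percent, of a compounded equity curve."""
    equity = peak = 1.0
    worst = 0.0
    for r in returns:
        equity *= 1 + r
        peak = max(peak, equity)
        worst = min(worst, (equity - peak) / peak * 100)
    return worst  # zero or negative

def bootstrap_drawdown(returns, conf_levels=(0.9, 0.95, 0.99),
                       n_samples=2000, sample_size=None, seed=0):
    """Drawdown that resampled curves stay above with probability conf."""
    rng = random.Random(seed)
    size = sample_size or len(returns)
    # Sort ascending, so the most severe (most negative) drawdowns come first.
    drawdowns = sorted(
        max_drawdown_pct(rng.choices(returns, k=size)) for _ in range(n_samples)
    )
    result = {}
    for conf in conf_levels:
        idx = round((1 - conf) * (n_samples - 1))
        result[conf] = drawdowns[idx]
    return result

# Hypothetical per-bar returns, made up for illustration only.
example_returns = [0.004, -0.002, 0.001, -0.006, 0.003, 0.002, -0.001, 0.005]
drawdown_conf = bootstrap_drawdown(example_returns, sample_size=100)
```

Because resampling reorders the returns, some resampled curves string losing bars together, which is why the tail drawdowns can be worse than the single drawdown observed in the backtest.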

In the next notebook, we will discuss how to incorporate ranking and position sizing in your trading strategies.