Writing Indicators

This notebook explains how to create and integrate custom stock indicators in PyBroker. Indicators in PyBroker are written with NumPy, a library for numerical computing. For performance, we also use Numba, a JIT compiler that translates Python code into efficient machine code and is especially helpful for accelerating loops over NumPy arrays. Here’s how we import these libraries:

[1]:
import numpy as np
from numba import njit

The following code shows an indicator function that calculates close prices minus a moving average (CMMA), which can be used for a mean reversion strategy:

[2]:
def cmma(bar_data, lookback):

    @njit  # Enable Numba JIT.
    def vec_cmma(values):
        # Initialize the result array.
        n = len(values)
        out = np.full(n, np.nan)

        # For all bars starting at lookback:
        for i in range(lookback, n):
            # Calculate the moving average for the lookback.
            ma = 0
            for j in range(i - lookback, i):
                ma += values[j]
            ma /= lookback
            # Subtract the moving average from value.
            out[i] = values[i] - ma
        return out

    # Calculate with close prices.
    return vec_cmma(bar_data.close)

The cmma function takes two arguments: bar_data, which is an instance of the BarData class that holds OHLCV data and custom fields, and lookback, which is a user-defined argument for the lookback of the moving average.

The vec_cmma function is JIT-compiled by Numba and nested inside cmma. The nesting is necessary because a Numba-compiled function accepts a NumPy array as an argument but not an instance of a Python class like BarData. Note that the indicator is computed in a vectorized manner: the function runs once over all of the historical data rather than once per bar, which significantly speeds up backtesting.
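
To make the arithmetic concrete, here is a minimal pure-NumPy sketch of the same calculation, checked on a small made-up price array. The name cmma_reference and the sample prices are illustrative assumptions, not part of PyBroker. It uses cumulative sums instead of nested loops and, like vec_cmma, averages the lookback bars that precede the current bar:

```python
import numpy as np


def cmma_reference(values, lookback):
    """Close minus the mean of the previous `lookback` values (hypothetical helper)."""
    values = np.asarray(values, dtype=np.float64)
    n = len(values)
    out = np.full(n, np.nan)
    if n <= lookback:
        return out
    # csum[k] holds the sum of values[:k], so a window sum is a difference.
    csum = np.concatenate(([0.0], np.cumsum(values)))
    window_sums = csum[lookback:n] - csum[:n - lookback]
    out[lookback:] = values[lookback:] - window_sums / lookback
    return out


# Made-up prices, used only to illustrate the calculation.
prices = np.array([10.0, 11.0, 12.0, 13.0, 14.0, 15.0])
result = cmma_reference(prices, lookback=3)
# result[3] is 13 - mean(10, 11, 12) = 2.0; the first 3 entries are NaN.
```

A reference like this can be handy for spot-checking the Numba version on a few bars.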

The next step is to register the indicator function with PyBroker using the following code:

[3]:
import pybroker

cmma_20 = pybroker.indicator('cmma_20', cmma, lookback=20)

Here, we name the indicator cmma_20 and set the lookback parameter to 20 bars. Any keyword arguments passed to pybroker.indicator after the function are forwarded to the indicator function as its user-defined arguments, that is, the parameters that come after bar_data. The call to pybroker.indicator returns a new Indicator instance that references the indicator function we defined.
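
The way user-defined arguments are bound at registration time can be pictured with a toy helper. This is only an illustrative sketch: register_indicator and window_mean are hypothetical names, plain lists stand in for BarData, and PyBroker's actual implementation is not shown here. The point is that one indicator function can be registered several times under different names with different parameters:

```python
def register_indicator(name, fn, **kwargs):
    """Toy stand-in (hypothetical): binds keyword arguments to fn under a name."""
    def bound(bar_data):
        # Forward the stored keyword arguments on every call.
        return fn(bar_data, **kwargs)
    bound.__name__ = name
    return bound


def window_mean(bar_data, lookback):
    """Hypothetical indicator body: mean of the last `lookback` values."""
    recent = bar_data[-lookback:]
    return sum(recent) / len(recent)


# The same function registered twice with different parameters.
mean_2 = register_indicator('mean_2', window_mean, lookback=2)
mean_4 = register_indicator('mean_4', window_mean, lookback=4)

closes = [1.0, 2.0, 3.0, 4.0]
short_mean = mean_2(closes)  # mean of [3.0, 4.0]
long_mean = mean_4(closes)   # mean of [1.0, 2.0, 3.0, 4.0]
```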

The following is an example of how to use the registered Indicator in PyBroker with some data downloaded from Yahoo Finance:

[4]:
from pybroker import YFinance

pybroker.enable_data_source_cache('yfinance')

yfinance = YFinance()
df = yfinance.query('PG', '4/1/2020', '4/1/2022')
Loading bar data...
[*********************100%***********************]  1 of 1 completed
Loaded bar data: 0:00:01

[5]:
cmma_20(df)
[5]:
2020-04-01         NaN
2020-04-02         NaN
2020-04-03         NaN
2020-04-06         NaN
2020-04-07         NaN
                ...
2022-03-25    1.967502
2022-03-28    3.288005
2022-03-29    4.968507
2022-03-30    3.790999
2022-03-31    2.171002
Length: 505, dtype: float64

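As you can see, the Indicator instance is callable. Calling it with a DataFrame of bar data returns the computed indicator values as a Pandas Series. The leading values are NaN because cmma needs lookback bars of history before it can produce a value.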

The Indicator class also provides functions for measuring its information content. For example, you can compute the interquartile range (IQR):

[6]:
cmma_20.iqr(df)
[6]:
4.655495452880842

Or compute the relative entropy:

[7]:
cmma_20.relative_entropy(df)
[7]:
0.7495800114455111
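
For intuition about what these two measures capture, the sketch below computes them by hand with NumPy. It assumes one common formulation: the IQR as the 75th minus the 25th percentile, and relative entropy as the Shannon entropy of a histogram divided by its maximum, log(n_bins). The helper names and the choice of 10 bins are assumptions; PyBroker's own iqr and relative_entropy may use different percentile or binning conventions, so identical numbers should not be expected:

```python
import numpy as np


def iqr(values):
    """Interquartile range of the non-NaN values."""
    values = np.asarray(values, dtype=np.float64)
    q75, q25 = np.nanpercentile(values, [75, 25])
    return q75 - q25


def relative_entropy(values, n_bins=10):
    """Histogram entropy scaled to [0, 1]; n_bins=10 is an arbitrary choice."""
    values = np.asarray(values, dtype=np.float64)
    clean = values[~np.isnan(values)]
    counts, _ = np.histogram(clean, bins=n_bins)
    # Empty bins contribute nothing to the entropy.
    probs = counts[counts > 0] / counts.sum()
    entropy = -np.sum(probs * np.log(probs))
    return entropy / np.log(n_bins)


# Made-up data: leading NaN mimics an indicator's warmup bars.
sample = np.array([np.nan, 1.0, 2.0, 3.0, 4.0, 5.0])
sample_iqr = iqr(sample)

# Evenly spread values fill every bin equally, giving the maximum entropy.
evenly_spread = np.arange(100, dtype=np.float64)
spread_entropy = relative_entropy(evenly_spread)
```

Under this definition, a relative entropy near 1 means the indicator's values are spread evenly across their range, while a value near 0 means they are concentrated in a few bins.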

Using the Indicator in a Strategy

After implementing our indicator, the next step is to integrate it into a trading strategy. The following example shows a simple strategy that goes long when the 20-day CMMA is less than 0, i.e. when the last close price drops below the 20-day moving average:

[8]:
def buy_cmma_cross(ctx):
    if ctx.long_pos():
        return
    # Place a buy order if the most recent value of the 20 day CMMA is < 0:
    if ctx.indicator('cmma_20')[-1] < 0:
        ctx.buy_shares = ctx.calc_target_shares(1)
        ctx.hold_bars = 3

The indicator values are retrieved by calling ctx.indicator on the ExecContext and passing in the registered name of the cmma_20 indicator.

(Note that you can also retrieve indicator data for another symbol by passing the symbol to ExecContext#indicator().)
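
The entry rule in buy_cmma_cross can be traced by hand on a plain list of indicator values. The sketch below is a deliberate simplification with made-up numbers: entry_signals is a hypothetical helper, it treats a position as occupying the hold_bars bars after each signal, and it ignores fill timing, position sizing, and fees, all of which PyBroker handles during a real backtest:

```python
import math


def entry_signals(cmma_values, hold_bars):
    """Return bar indices where the rule would place a buy order (simplified)."""
    signals = []
    bars_in_position = 0
    for i, value in enumerate(cmma_values):
        if bars_in_position > 0:
            # Still holding: skip this bar, mirroring the long-position check.
            bars_in_position -= 1
            continue
        # NaN marks bars where the indicator has no value yet.
        if not math.isnan(value) and value < 0:
            signals.append(i)
            bars_in_position = hold_bars
    return signals


# Made-up CMMA values; the first two bars have no indicator value.
cmma_values = [math.nan, math.nan, -1.0, -2.0, 0.5, -0.3, -0.1, -0.2, -0.4, 1.0]
signals = entry_signals(cmma_values, hold_bars=3)
# Bars 3 to 5 are skipped after the signal at bar 2, so the next signal is bar 6.
```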

[9]:
from pybroker import Strategy

strategy = Strategy(yfinance, '4/1/2020', '4/1/2022')
strategy.add_execution(buy_cmma_cross, 'PG', indicators=cmma_20)

Here, the buy_cmma_cross function is added to the Strategy along with the cmma_20 indicator. We can enable caching of the computed indicator values to disk with the following:

[10]:
pybroker.enable_indicator_cache('my_indicators')
[10]:
<diskcache.core.Cache at 0x7f45b0a73bb0>

Finally, we can run the backtest. The warmup argument specifies that 20 bars must pass before the execution function starts running, which gives the 20-bar lookback time to produce values that are not NaN:

[11]:
result = strategy.backtest(warmup=20)
result.metrics_df.round(4)
Backtesting: 2020-04-01 00:00:00 to 2022-04-01 00:00:00

Loaded cached bar data.

Computing indicators...
100% (1 of 1) |##########################| Elapsed Time: 0:00:00 Time:  0:00:00

Test split: 2020-04-01 00:00:00 to 2022-03-31 00:00:00
100% (505 of 505) |######################| Elapsed Time: 0:00:00 Time:  0:00:00

Finished backtest: 0:00:01
[11]:
    name                          value
 0  trade_count                 60.0000
 1  initial_market_value    100000.0000
 2  end_market_value        100759.3600
 3  total_pnl                  759.3600
 4  unrealized_pnl               0.0000
 5  total_return_pct             0.7594
 6  total_profit             41596.7500
 7  total_loss              -40837.3900
 8  total_fees                   0.0000
 9  max_drawdown            -13446.9300
10  max_drawdown_pct           -11.9774
11  win_rate                    53.3333
12  loss_rate                   46.6667
13  winning_trades              32.0000
14  losing_trades               28.0000
15  avg_pnl                     12.6560
16  avg_return_pct               0.0293
17  avg_trade_bars               3.0000
18  avg_profit                1299.8984
19  avg_profit_pct               1.2609
20  avg_winning_trade_bars       3.0000
21  avg_loss                 -1458.4782
22  avg_loss_pct                -1.3782
23  avg_losing_trade_bars        3.0000
24  largest_win               4263.4500
25  largest_win_pct              4.1000
26  largest_win_bars             3.0000
27  largest_loss             -4675.6700
28  largest_loss_pct            -4.1700
29  largest_loss_bars            3.0000
30  max_wins                     7.0000
31  max_losses                   4.0000
32  sharpe                       0.0023
33  profit_factor                1.0092
34  ulcer_index                  1.8823
35  upi                          0.0019
36  equity_r2                    0.0015
37  std_error                 3385.1968

When the backtest runs, PyBroker computes the indicator values. If there are multiple indicators added to the Strategy, then PyBroker will compute them in parallel across multiple CPU cores.

Vectorized Helpers

The PyBroker library provides vectorized helper functions to make the process of computing indicators easier. One of these helper functions is highv, which calculates the highest value for every period of n bars.

In the following example, an indicator function called hhv uses highv to calculate the highest high price for every period of 5 bars:

[12]:
from pybroker import highv

def hhv(bar_data, period):
    return highv(bar_data.high, period)

hhv_5 = pybroker.indicator('hhv_5', hhv, period=5)
hhv_5(df)
[12]:
2020-04-01           NaN
2020-04-02           NaN
2020-04-03           NaN
2020-04-06           NaN
2020-04-07    120.059998
                 ...
2022-03-25    153.919998
2022-03-28    153.919998
2022-03-29    156.470001
2022-03-30    156.470001
2022-03-31    156.470001
Length: 505, dtype: float64

The pybroker.vect module also includes other vectorized helpers such as lowv, sumv, returnv, and cross, the last of which is used to compute crossovers.
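
To make the semantics of two of these helpers concrete, here is a plain NumPy sketch of a rolling high and a crossover test. These are not PyBroker's implementations: rolling_high and crossed_above are hypothetical names, the inputs are made up, and details such as NaN handling or the treatment of the first bar may differ from highv and cross:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def rolling_high(values, period):
    """Highest value over each window of `period` bars, current bar included."""
    values = np.asarray(values, dtype=np.float64)
    out = np.full(len(values), np.nan)
    if len(values) >= period:
        # One row per window; the first full window ends at index period - 1.
        out[period - 1:] = sliding_window_view(values, period).max(axis=1)
    return out


def crossed_above(a, b):
    """True where a is above b on this bar but was not above it on the prior bar."""
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    above = a > b
    out = np.zeros(len(a), dtype=bool)
    out[1:] = above[1:] & ~above[:-1]
    return out


# Made-up inputs for illustration.
highs = rolling_high([1.0, 3.0, 2.0, 5.0, 4.0, 6.0, 0.0], period=3)
crosses = crossed_above([1.0, 2.0, 3.0, 2.0, 4.0], [2.0, 2.0, 2.0, 3.0, 3.0])
```

As in the hhv_5 output above, the first period - 1 entries of the rolling high are NaN because no full window is available yet.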

Additionally, PyBroker includes convenient wrappers for highest and lowest indicators. Our hhv indicator can be rewritten as:

[13]:
from pybroker import highest

hhv_5 = highest('hhv_5', 'high', period=5)
hhv_5(df)
[13]:
2020-04-01           NaN
2020-04-02           NaN
2020-04-03           NaN
2020-04-06           NaN
2020-04-07    120.059998
                 ...
2022-03-25    153.919998
2022-03-28    153.919998
2022-03-29    156.470001
2022-03-30    156.470001
2022-03-31    156.470001
Length: 505, dtype: float64

Computing Multiple Indicators

An IndicatorSet can be used to calculate multiple indicators. The cmma_20 and hhv_5 indicators can be computed together by adding them to the IndicatorSet. The resulting output will be a Pandas DataFrame containing both:

[14]:
from pybroker import IndicatorSet

indicator_set = IndicatorSet()
indicator_set.add(cmma_20, hhv_5)
indicator_set(df)
Computing indicators...
100% (2 of 2) |##########################| Elapsed Time: 0:00:01 Time:  0:00:01

[14]:
     symbol        date   cmma_20       hhv_5
  0      PG  2020-04-01       NaN         NaN
  1      PG  2020-04-02       NaN         NaN
  2      PG  2020-04-03       NaN         NaN
  3      PG  2020-04-06       NaN         NaN
  4      PG  2020-04-07       NaN  120.059998
...     ...         ...       ...         ...
500      PG  2022-03-25  1.967502  153.919998
501      PG  2022-03-28  3.288005  153.919998
502      PG  2022-03-29  4.968507  156.470001
503      PG  2022-03-30  3.790999  156.470001
504      PG  2022-03-31  2.171002  156.470001

505 rows × 4 columns

Using TA-Lib

TA-Lib is a widely used technical analysis library that implements many financial indicators. Integrating TA-Lib with PyBroker is straightforward. Here is an example:

[15]:
import talib

rsi_20 = pybroker.indicator('rsi_20', lambda data: talib.RSI(data.close, timeperiod=20))
rsi_20(df)
[15]:
2020-04-01          NaN
2020-04-02          NaN
2020-04-03          NaN
2020-04-06          NaN
2020-04-07          NaN
                ...
2022-03-25    49.373093
2022-03-28    51.014810
2022-03-29    53.407971
2022-03-30    51.610544
2022-03-31    49.029540
Length: 505, dtype: float64
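
An indicator function does not have to delegate to TA-Lib; it can return values computed by hand. As a rough sketch of what an RSI with Wilder smoothing computes, the block below follows the standard textbook formulation. The helper wilder_rsi is hypothetical, returning 0 for a completely flat window is a choice made for this sketch, and its output is not guaranteed to match talib.RSI bar for bar:

```python
import numpy as np


def wilder_rsi(close, period):
    """Textbook RSI with Wilder smoothing (hypothetical helper)."""
    close = np.asarray(close, dtype=np.float64)
    out = np.full(len(close), np.nan)
    if len(close) <= period:
        return out
    changes = np.diff(close)
    gains = np.where(changes > 0, changes, 0.0)
    losses = np.where(changes < 0, -changes, 0.0)
    # Seed the averages with a simple mean over the first `period` changes.
    avg_gain = gains[:period].mean()
    avg_loss = losses[:period].mean()
    for i in range(period, len(close)):
        if i > period:
            # Wilder smoothing: blend the previous average with the new change.
            avg_gain = (avg_gain * (period - 1) + gains[i - 1]) / period
            avg_loss = (avg_loss * (period - 1) + losses[i - 1]) / period
        total = avg_gain + avg_loss
        # A flat window has no gains or losses; this sketch reports 0 there.
        out[i] = 100.0 * avg_gain / total if total > 0 else 0.0
    return out


# Made-up prices that alternate up and down by one unit.
rsi_values = wilder_rsi([1.0, 2.0, 1.0, 2.0, 1.0, 2.0], period=2)
```

A function like this could be wrapped in an indicator function that passes bar_data.close, in the same way as the cmma example at the start of this notebook.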

In the next tutorial, you will learn how to train a model using custom indicators in PyBroker.