Creating a Custom Data Source
PyBroker comes with pre-built DataSources for Yahoo Finance, Alpaca, and AKShare, which you can use right away without any additional setup. If you have a specific need or want to use a different data source, PyBroker also lets you create your own DataSource class.
Extending DataSource
The example below implements a new DataSource called CSVDataSource, which loads data from a CSV file. CSVDataSource reads the file data/prices.csv into a Pandas DataFrame and returns the rows of that DataFrame that match the requested symbols and date range:
[1]:
import pandas as pd
import pybroker
from pybroker.data import DataSource

class CSVDataSource(DataSource):

    def __init__(self):
        super().__init__()
        # Register custom columns in the CSV.
        pybroker.register_columns('rsi')

    def _fetch_data(self, symbols, start_date, end_date, _timeframe, _adjust):
        df = pd.read_csv('data/prices.csv')
        df = df[df['symbol'].isin(symbols)]
        df['date'] = pd.to_datetime(df['date'])
        return df[(df['date'] >= start_date) & (df['date'] <= end_date)]
To make the custom rsi column from the CSV file available to PyBroker, we register it using pybroker.register_columns. This allows PyBroker to use the custom column when it processes the data.
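The filtering done inside _fetch_data can be tried on its own, without PyBroker. The following is a minimal sketch under stated assumptions: filter_bars is a hypothetical helper that mirrors the method body, the rows are made-up placeholder values rather than real prices, and the date bounds are assumed to arrive as datetime-like objects.

```python
import io

import pandas as pd

# Made-up rows standing in for data/prices.csv; all values are placeholders.
csv_text = """symbol,date,open,high,low,close,rsi
AAA,2021-05-28,10.0,11.0,9.5,10.5,55.0
AAA,2021-06-01,10.5,11.5,10.0,11.0,60.0
BBB,2021-06-01,20.0,21.0,19.5,20.5,40.0
CCC,2021-06-02,30.0,31.0,29.5,30.5,45.0
"""


def filter_bars(df, symbols, start_date, end_date):
    """Hypothetical helper mirroring the body of _fetch_data."""
    # Keep only the requested symbols; copy so the next assignment
    # does not write into a view of the original frame.
    df = df[df['symbol'].isin(symbols)].copy()
    # Parse the date strings so they compare against datetime bounds.
    df['date'] = pd.to_datetime(df['date'])
    # Keep rows inside the inclusive [start_date, end_date] range.
    in_range = (df['date'] >= start_date) & (df['date'] <= end_date)
    return df[in_range]


bars = pd.read_csv(io.StringIO(csv_text))
result = filter_bars(
    bars,
    ['AAA', 'BBB'],
    pd.Timestamp('2021-06-01'),
    pd.Timestamp('2021-12-01'),
)
print(result[['symbol', 'date', 'rsi']])
```

In this sketch the 2021-05-28 row falls before the start date and the CCC row is not among the requested symbols, so only two rows remain.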
Note that the data returned from your custom DataSource must include the following columns, as PyBroker expects them: symbol, date, open, high, low, and close.
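A quick check for the required columns can catch a malformed file before it reaches a backtest. This is only a sketch: missing_columns is a hypothetical helper and not part of PyBroker, and the sample frame holds made-up placeholder values.

```python
import pandas as pd

# Columns the text above lists as required by PyBroker.
REQUIRED_COLUMNS = ('symbol', 'date', 'open', 'high', 'low', 'close')


def missing_columns(df):
    """Hypothetical helper: list the required columns absent from df."""
    return [col for col in REQUIRED_COLUMNS if col not in df.columns]


# Placeholder frame with made-up values that lacks the 'close' column.
sample = pd.DataFrame({
    'symbol': ['AAA'],
    'date': pd.to_datetime(['2021-06-01']),
    'open': [1.0],
    'high': [2.0],
    'low': [0.5],
})

print(missing_columns(sample))
```

Such a check could be run on the DataFrame just before it is returned from _fetch_data, raising an error if the list is not empty.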
Now we can query the CSV data from an instance of CSVDataSource:
[2]:
csv_data_source = CSVDataSource()
df = csv_data_source.query(['MCD', 'NKE', 'DIS'], '6/1/2021', '12/1/2021')
df
Loading bar data...
Loaded bar data: 0:00:00
[2]:
 | date | symbol | open | high | low | close | rsi |
---|---|---|---|---|---|---|---|
0 | 2021-06-01 | DIS | 180.179993 | 181.009995 | 178.740005 | 178.839996 | 46.321532 |
1 | 2021-06-01 | MCD | 235.979996 | 235.990005 | 232.740005 | 233.240005 | 46.522926 |
2 | 2021-06-01 | NKE | 137.850006 | 138.050003 | 134.210007 | 134.509995 | 53.308085 |
3 | 2021-06-02 | DIS | 179.039993 | 179.100006 | 176.929993 | 177.000000 | 42.635256 |
4 | 2021-06-02 | MCD | 233.970001 | 234.330002 | 232.809998 | 233.779999 | 48.051484 |
... | ... | ... | ... | ... | ... | ... | ... |
382 | 2021-11-30 | MCD | 247.380005 | 247.899994 | 243.949997 | 244.600006 | 40.461178 |
383 | 2021-11-30 | NKE | 168.789993 | 171.550003 | 167.529999 | 169.240005 | 51.505558 |
384 | 2021-12-01 | DIS | 146.699997 | 148.369995 | 142.039993 | 142.149994 | 16.677555 |
385 | 2021-12-01 | MCD | 245.759995 | 250.899994 | 244.110001 | 244.179993 | 39.853689 |
386 | 2021-12-01 | NKE | 170.889999 | 173.369995 | 166.679993 | 166.699997 | 46.704527 |
387 rows × 7 columns
To use CSVDataSource in a backtest, we create a new Strategy object and pass it the custom DataSource:
[3]:
from pybroker import Strategy

def buy_low_sell_high_rsi(ctx):
    pos = ctx.long_pos()
    if not pos and ctx.rsi[-1] < 30:
        ctx.buy_shares = 100
    elif pos and ctx.rsi[-1] > 70:
        ctx.sell_shares = pos.shares

strategy = Strategy(csv_data_source, '6/1/2021', '12/1/2021')
strategy.add_execution(buy_low_sell_high_rsi, ['MCD', 'NKE', 'DIS'])
result = strategy.backtest()
result.orders
Backtesting: 2021-06-01 00:00:00 to 2021-12-01 00:00:00
Loading bar data...
Loaded bar data: 0:00:00
Test split: 2021-06-01 00:00:00 to 2021-12-01 00:00:00
100% (129 of 129) |######################| Elapsed Time: 0:00:00 Time: 0:00:00
Finished backtest: 0:00:02
[3]:
id | type | symbol | date | shares | limit_price | fill_price | fees |
---|---|---|---|---|---|---|---|
1 | buy | NKE | 2021-09-21 | 100 | NaN | 154.86 | 0.0 |
2 | sell | NKE | 2021-11-04 | 100 | NaN | 173.82 | 0.0 |
3 | buy | DIS | 2021-11-16 | 100 | NaN | 159.40 | 0.0 |
Note that because we registered the custom rsi column with PyBroker, it can be accessed in the ExecContext using ctx.rsi.
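The threshold rule used in buy_low_sell_high_rsi can also be isolated as a plain function, which makes it easy to check without running a backtest. This is a sketch only: rsi_signal is a hypothetical helper, not a PyBroker API, and its default thresholds of 30 and 70 are taken from the execution function above.

```python
def rsi_signal(has_position, rsi, low=30, high=70):
    """Hypothetical helper mirroring the rule in buy_low_sell_high_rsi."""
    # Enter only when flat and the latest RSI is below the lower threshold.
    if not has_position and rsi < low:
        return 'buy'
    # Exit only when holding and the latest RSI is above the upper threshold.
    if has_position and rsi > high:
        return 'sell'
    # Otherwise leave the position unchanged.
    return 'hold'


print(rsi_signal(False, 25.0))
print(rsi_signal(True, 75.0))
print(rsi_signal(True, 25.0))
```

Inside an execution function, the latest value ctx.rsi[-1] would be the rsi argument, and the truthiness of ctx.long_pos() would supply has_position.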
Using a Pandas DataFrame
If you do not need the flexibility of implementing your own DataSource, you can pass a Pandas DataFrame to a Strategy instead.
To demonstrate, the earlier example can be re-implemented as follows:
[4]:
df = pd.read_csv('data/prices.csv')
df['date'] = pd.to_datetime(df['date'])
pybroker.register_columns('rsi')
strategy = Strategy(df, '6/1/2021', '12/1/2021')
strategy.add_execution(buy_low_sell_high_rsi, ['MCD', 'NKE', 'DIS'])
result = strategy.backtest()
result.orders
Backtesting: 2021-06-01 00:00:00 to 2021-12-01 00:00:00
Test split: 2021-06-01 00:00:00 to 2021-12-01 00:00:00
100% (129 of 129) |######################| Elapsed Time: 0:00:00 Time: 0:00:00
Finished backtest: 0:00:00
[4]:
id | type | symbol | date | shares | limit_price | fill_price | fees |
---|---|---|---|---|---|---|---|
1 | buy | NKE | 2021-09-21 | 100 | NaN | 154.86 | 0.0 |
2 | sell | NKE | 2021-11-04 | 100 | NaN | 173.82 | 0.0 |
3 | buy | DIS | 2021-11-16 | 100 | NaN | 159.40 | 0.0 |