Getting Started with Data Sources
Welcome to PyBroker! The best place to start is to learn about DataSources. A DataSource
is a class that can fetch data from external sources, which you can then use to backtest your trading strategies.
Yahoo Finance
One of the built-in DataSources in PyBroker is Yahoo Finance. To use it, you can import YFinance:
[1]:
from pybroker import YFinance
yfinance = YFinance()
df = yfinance.query(['AAPL', 'MSFT'], start_date='3/1/2021', end_date='3/1/2022')
df
Loading bar data...
[*********************100%%**********************] 2 of 2 completed
Loaded bar data: 0:00:00
[1]:
| | date | symbol | open | high | low | close | volume | adj_close |
---|---|---|---|---|---|---|---|---|
0 | 2021-03-01 | AAPL | 123.750000 | 127.930000 | 122.790001 | 127.790001 | 116307900 | 125.599655 |
1 | 2021-03-01 | MSFT | 235.899994 | 237.470001 | 233.149994 | 236.940002 | 25324000 | 230.847702 |
2 | 2021-03-02 | AAPL | 128.410004 | 128.720001 | 125.010002 | 125.120003 | 102260900 | 122.975403 |
3 | 2021-03-02 | MSFT | 237.009995 | 237.300003 | 233.449997 | 233.869995 | 22812500 | 227.856628 |
4 | 2021-03-03 | AAPL | 124.809998 | 125.709999 | 121.839996 | 122.059998 | 112966300 | 119.967857 |
... | ... | ... | ... | ... | ... | ... | ... | ... |
501 | 2022-02-24 | MSFT | 272.510010 | 295.160004 | 271.519989 | 294.589996 | 56989700 | 289.353271 |
502 | 2022-02-25 | AAPL | 163.839996 | 165.119995 | 160.869995 | 164.850006 | 91974200 | 162.987427 |
503 | 2022-02-25 | MSFT | 295.140015 | 297.630005 | 291.649994 | 297.309998 | 32546700 | 292.024872 |
504 | 2022-02-28 | AAPL | 163.059998 | 165.419998 | 162.429993 | 165.119995 | 95056600 | 163.254364 |
505 | 2022-02-28 | MSFT | 294.309998 | 299.140015 | 293.000000 | 298.790009 | 34627500 | 293.478607 |
506 rows × 8 columns
The above code queries data for the AAPL and MSFT stocks and returns the results as a pandas DataFrame.
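Because the result is an ordinary DataFrame in long format (one row per date and symbol), the usual pandas operations apply. As a minimal sketch, the snippet below rebuilds the first four rows of the output above by hand, keeping only the date, symbol, and close columns, then selects a single symbol and pivots the closing prices into one column per symbol. The names aapl and closes are illustrative and not part of PyBroker.

```python
import pandas as pd

# First four rows of the output above, reduced to three columns.
df = pd.DataFrame({
    "date": pd.to_datetime(
        ["2021-03-01", "2021-03-01", "2021-03-02", "2021-03-02"]
    ),
    "symbol": ["AAPL", "MSFT", "AAPL", "MSFT"],
    "close": [127.790001, 236.940002, 125.120003, 233.869995],
})

# Keep only the AAPL bars.
aapl = df[df["symbol"] == "AAPL"]

# Reshape to one row per date and one column per symbol.
closes = df.pivot(index="date", columns="symbol", values="close")
print(closes)
```

The same two operations should work unchanged on the full DataFrame returned by query, assuming it has the date, symbol, and close columns shown above.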
Caching Data
If you want to speed up your data retrieval, you can cache your queries using PyBroker’s caching system. You can enable caching by calling pybroker.enable_data_source_cache('name'), where name is the name of the cache you want to use:
[2]:
import pybroker
pybroker.enable_data_source_cache('yfinance')
[2]:
<diskcache.core.Cache at 0x7f3884390d60>
The next call to query will cache the returned data to disk. Each unique combination of ticker symbol and date range will be cached separately:
[3]:
yfinance.query(['TSLA', 'IBM'], '3/1/2021', '3/1/2022')
Loading bar data...
[*********************100%%**********************] 2 of 2 completed
Loaded bar data: 0:00:00
[3]:
| | date | symbol | open | high | low | close | volume | adj_close |
---|---|---|---|---|---|---|---|---|
0 | 2021-03-01 | IBM | 115.057358 | 116.940727 | 114.588913 | 115.430206 | 5977367 | 100.173241 |
1 | 2021-03-01 | TSLA | 230.036667 | 239.666672 | 228.350006 | 239.476669 | 81408600 | 239.476669 |
2 | 2021-03-02 | IBM | 115.430206 | 116.539200 | 114.971321 | 115.038239 | 4732418 | 99.833076 |
3 | 2021-03-02 | TSLA | 239.426666 | 240.369995 | 228.333328 | 228.813339 | 71196600 | 228.813339 |
4 | 2021-03-03 | IBM | 115.200768 | 117.237091 | 114.703636 | 116.978966 | 7744898 | 101.517288 |
... | ... | ... | ... | ... | ... | ... | ... | ... |
501 | 2022-02-24 | TSLA | 233.463333 | 267.493347 | 233.333328 | 266.923340 | 135322200 | 266.923340 |
502 | 2022-02-25 | IBM | 122.050003 | 124.260002 | 121.449997 | 124.180000 | 4460900 | 113.041489 |
503 | 2022-02-25 | TSLA | 269.743347 | 273.166656 | 260.799988 | 269.956665 | 76067700 | 269.956665 |
504 | 2022-02-28 | IBM | 122.209999 | 123.389999 | 121.040001 | 122.510002 | 6757300 | 111.521271 |
505 | 2022-02-28 | TSLA | 271.670013 | 292.286682 | 271.570007 | 290.143341 | 99006900 | 290.143341 |
506 rows × 8 columns
Calling query again with the same ticker symbols and date range returns the cached data:
[4]:
df = yfinance.query(['TSLA', 'IBM'], '3/1/2021', '3/1/2022')
df
Loaded cached bar data.
[4]:
| | date | symbol | open | high | low | close | volume | adj_close |
---|---|---|---|---|---|---|---|---|
0 | 2021-03-01 | IBM | 115.057358 | 116.940727 | 114.588913 | 115.430206 | 5977367 | 100.173241 |
1 | 2021-03-02 | IBM | 115.430206 | 116.539200 | 114.971321 | 115.038239 | 4732418 | 99.833076 |
2 | 2021-03-03 | IBM | 115.200768 | 117.237091 | 114.703636 | 116.978966 | 7744898 | 101.517288 |
3 | 2021-03-04 | IBM | 116.634796 | 117.801147 | 113.537285 | 114.827919 | 8439651 | 99.650551 |
4 | 2021-03-05 | IBM | 115.334610 | 118.307838 | 114.961761 | 117.428299 | 7268968 | 101.907227 |
... | ... | ... | ... | ... | ... | ... | ... | ... |
248 | 2022-02-22 | TSLA | 278.043335 | 285.576660 | 267.033325 | 273.843323 | 83288100 | 273.843323 |
249 | 2022-02-23 | TSLA | 276.809998 | 278.433319 | 253.520004 | 254.679993 | 95256900 | 254.679993 |
250 | 2022-02-24 | TSLA | 233.463333 | 267.493347 | 233.333328 | 266.923340 | 135322200 | 266.923340 |
251 | 2022-02-25 | TSLA | 269.743347 | 273.166656 | 260.799988 | 269.956665 | 76067700 | 269.956665 |
252 | 2022-02-28 | TSLA | 271.670013 | 292.286682 | 271.570007 | 290.143341 | 99006900 | 290.143341 |
506 rows × 8 columns
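To make the idea of caching each ticker symbol and date range separately more concrete, here is a minimal sketch of that keying scheme. It is not PyBroker's implementation: PyBroker stores its cache on disk, while this sketch uses an in-memory dict, and the names cached_query and fake_fetch are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

CacheKey = Tuple[str, str, str]  # (symbol, start_date, end_date)


def cached_query(
    cache: Dict[CacheKey, list],
    fetch: Callable[[str, str, str], list],
    symbols: List[str],
    start_date: str,
    end_date: str,
) -> Dict[str, list]:
    """Return bars per symbol, fetching only symbols missing from the cache."""
    result = {}
    for symbol in symbols:
        key = (symbol, start_date, end_date)
        if key not in cache:
            cache[key] = fetch(symbol, start_date, end_date)
        result[symbol] = cache[key]
    return result


# Hypothetical stand-in for a remote request; it records what was fetched.
fetched = []


def fake_fetch(symbol: str, start_date: str, end_date: str) -> list:
    fetched.append(symbol)
    return [f"{symbol} bars {start_date}..{end_date}"]


cache: Dict[CacheKey, list] = {}
cached_query(cache, fake_fetch, ["TSLA", "IBM"], "3/1/2021", "3/1/2022")
cached_query(cache, fake_fetch, ["TSLA", "AAPL"], "3/1/2021", "3/1/2022")
print(fetched)  # ['TSLA', 'IBM', 'AAPL']
```

In this sketch the second call reuses the TSLA entry and fetches only AAPL, because the key is built per symbol rather than per list of symbols.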
You can clear your cache using pybroker.clear_data_source_cache:
[5]:
pybroker.clear_data_source_cache()
Or disable caching altogether using pybroker.disable_data_source_cache:
[6]:
pybroker.disable_data_source_cache()
Note that these calls should be made after first calling pybroker.enable_data_source_cache.
Alpaca
PyBroker also includes an Alpaca DataSource for fetching stock data. To use it, you can import Alpaca and provide your API key and secret:
[7]:
from pybroker import Alpaca
import os
alpaca = Alpaca(os.environ['ALPACA_API_KEY'], os.environ['ALPACA_API_SECRET'])
You can query Alpaca for stock data using the same syntax as with Yahoo Finance, but Alpaca also supports querying data by different timeframes. For example, to query 1-minute data:
[8]:
df = alpaca.query(
    ['AAPL', 'MSFT'],
    start_date='3/1/2021',
    end_date='4/1/2021',
    timeframe='1m'
)
df
Loading bar data...
Loaded bar data: 0:00:05
[8]:
| | date | symbol | open | high | low | close | volume | vwap |
---|---|---|---|---|---|---|---|---|
0 | 2021-03-01 04:00:00-05:00 | AAPL | 124.30 | 124.56 | 124.30 | 124.50 | 12267.0 | 124.433365 |
1 | 2021-03-01 04:00:00-05:00 | MSFT | 235.87 | 236.00 | 235.87 | 236.00 | 1429.0 | 235.938887 |
2 | 2021-03-01 04:01:00-05:00 | AAPL | 124.56 | 124.60 | 124.30 | 124.30 | 9439.0 | 124.481323 |
3 | 2021-03-01 04:01:00-05:00 | MSFT | 236.17 | 236.17 | 236.17 | 236.17 | 104.0 | 236.161538 |
4 | 2021-03-01 04:02:00-05:00 | AAPL | 124.00 | 124.05 | 123.78 | 123.78 | 4834.0 | 123.935583 |
... | ... | ... | ... | ... | ... | ... | ... | ... |
33340 | 2021-03-31 19:57:00-04:00 | MSFT | 237.28 | 237.28 | 237.28 | 237.28 | 507.0 | 237.367870 |
33341 | 2021-03-31 19:58:00-04:00 | AAPL | 122.36 | 122.39 | 122.33 | 122.39 | 3403.0 | 122.360544 |
33342 | 2021-03-31 19:58:00-04:00 | MSFT | 237.40 | 237.40 | 237.35 | 237.35 | 636.0 | 237.378066 |
33343 | 2021-03-31 19:59:00-04:00 | AAPL | 122.39 | 122.45 | 122.38 | 122.45 | 5560.0 | 122.402606 |
33344 | 2021-03-31 19:59:00-04:00 | MSFT | 237.40 | 237.53 | 237.40 | 237.53 | 1163.0 | 237.473801 |
33345 rows × 8 columns
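The timestamps in this output run from 04:00 to 19:59 Eastern time, so the minute bars include trading outside the regular 09:30 to 16:00 session. If you only want regular-session bars, one option is to filter by time of day with pandas. The sketch below uses synthetic timestamps for a single symbol rather than real Alpaca data, and the names bars and regular are illustrative.

```python
import pandas as pd

# Synthetic 1-minute timestamps for one day, 04:00 through 19:59 Eastern.
# They only mimic the time range seen in the output above; they are not real bars.
index = pd.date_range(
    "2021-03-01 04:00",
    "2021-03-01 19:59",
    freq="1min",
    tz="America/New_York",
)
bars = pd.DataFrame({"symbol": "AAPL"}, index=index)
bars.index.name = "date"

# between_time compares against the local wall-clock time of the index
# and includes both endpoints by default.
regular = bars.between_time("09:30", "15:59")

print(len(bars), len(regular))  # 960 390
```

With a real query result, where date is a column rather than the index, you would first call df.set_index('date'), assuming the column is timezone-aware as the output above suggests.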
Alpaca Crypto
If you are interested in fetching cryptocurrency data, you can use AlpacaCrypto. Here’s an example of how to use it:
[9]:
from pybroker import AlpacaCrypto
crypto = AlpacaCrypto(
    os.environ['ALPACA_API_KEY'],
    os.environ['ALPACA_API_SECRET']
)
df = crypto.query('BTC/USD', start_date='1/1/2021', end_date='2/1/2021', timeframe='1h')
df
Loading bar data...
Loaded bar data: 0:00:06
[9]:
| | symbol | date | open | high | low | close | volume | vwap | trade_count |
---|---|---|---|---|---|---|---|---|---|
0 | BTC/USD | 2021-01-01 01:00:00-05:00 | 29255.71 | 29338.25 | 29153.55 | 29234.15 | 42.244289 | 29237.240312 | 1243.0 |
1 | BTC/USD | 2021-01-01 02:00:00-05:00 | 29235.61 | 29236.95 | 28905.00 | 29162.50 | 34.506038 | 29078.423185 | 1070.0 |
2 | BTC/USD | 2021-01-01 03:00:00-05:00 | 29162.50 | 29248.52 | 28948.86 | 29076.77 | 27.596804 | 29091.465155 | 1110.0 |
3 | BTC/USD | 2021-01-01 04:00:00-05:00 | 29075.31 | 29372.32 | 29058.05 | 29284.92 | 20.694200 | 29248.730924 | 880.0 |
4 | BTC/USD | 2021-01-01 05:00:00-05:00 | 29291.54 | 29400.00 | 29232.16 | 29286.63 | 16.617646 | 29338.609132 | 742.0 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
735 | BTC/USD | 2021-01-31 15:00:00-05:00 | 32837.67 | 32964.87 | 32528.54 | 32882.87 | 40.631122 | 32818.132855 | 2197.0 |
736 | BTC/USD | 2021-01-31 16:00:00-05:00 | 32889.01 | 32935.98 | 32554.59 | 32586.68 | 26.673190 | 32737.975296 | 1625.0 |
737 | BTC/USD | 2021-01-31 17:00:00-05:00 | 32599.00 | 33126.32 | 32599.00 | 32998.35 | 25.422568 | 32923.438893 | 1770.0 |
738 | BTC/USD | 2021-01-31 18:00:00-05:00 | 33000.00 | 33263.94 | 32957.10 | 33134.86 | 31.072017 | 33147.086803 | 2203.0 |
739 | BTC/USD | 2021-01-31 19:00:00-05:00 | 33134.03 | 33134.03 | 32303.44 | 32572.03 | 60.460424 | 32552.937863 | 2665.0 |
740 rows × 9 columns
In the above example, we’re querying for hourly data for the BTC/USD currency pair.