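Getting Started with Data Sources
Welcome to PyBroker! The best place to start is with DataSources. A DataSource is a class that fetches data from an external source, which you can then use to backtest your trading strategies.
Yahoo Finance
One of the DataSources built into PyBroker is Yahoo Finance. To use it, import YFinance: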
[1]:
from pybroker import YFinance
yfinance = YFinance()
df = yfinance.query(['AAPL', 'MSFT'], start_date='3/1/2021', end_date='3/1/2022')
df
Loading bar data...
[*********************100%%**********************] 2 of 2 completed
Loaded bar data: 0:00:00
[1]:
| | date | symbol | open | high | low | close | volume | adj_close |
---|---|---|---|---|---|---|---|---|
0 | 2021-03-01 | AAPL | 123.750000 | 127.930000 | 122.790001 | 127.790001 | 116307900 | 125.599655 |
1 | 2021-03-01 | MSFT | 235.899994 | 237.470001 | 233.149994 | 236.940002 | 25324000 | 230.847702 |
2 | 2021-03-02 | AAPL | 128.410004 | 128.720001 | 125.010002 | 125.120003 | 102260900 | 122.975403 |
3 | 2021-03-02 | MSFT | 237.009995 | 237.300003 | 233.449997 | 233.869995 | 22812500 | 227.856628 |
4 | 2021-03-03 | AAPL | 124.809998 | 125.709999 | 121.839996 | 122.059998 | 112966300 | 119.967857 |
... | ... | ... | ... | ... | ... | ... | ... | ... |
501 | 2022-02-24 | MSFT | 272.510010 | 295.160004 | 271.519989 | 294.589996 | 56989700 | 289.353271 |
502 | 2022-02-25 | AAPL | 163.839996 | 165.119995 | 160.869995 | 164.850006 | 91974200 | 162.987427 |
503 | 2022-02-25 | MSFT | 295.140015 | 297.630005 | 291.649994 | 297.309998 | 32546700 | 292.024872 |
504 | 2022-02-28 | AAPL | 163.059998 | 165.419998 | 162.429993 | 165.119995 | 95056600 | 163.254364 |
505 | 2022-02-28 | MSFT | 294.309998 | 299.140015 | 293.000000 | 298.790009 | 34627500 | 293.478607 |
506 rows × 8 columns
The code above queries data for the AAPL and MSFT stocks and returns a Pandas DataFrame with the results.
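Because the result is a regular Pandas DataFrame in long format (one row per date and symbol), ordinary Pandas operations apply to it. The sketch below uses a few rows copied from the output above as a stand-in for the full query result; the variable names `aapl` and `closes` are only illustrative.

```python
import pandas as pd

# A few rows copied from the output above, standing in for the full result.
df = pd.DataFrame({
    'date': pd.to_datetime(
        ['2021-03-01', '2021-03-01', '2021-03-02', '2021-03-02']
    ),
    'symbol': ['AAPL', 'MSFT', 'AAPL', 'MSFT'],
    'close': [127.790001, 236.940002, 125.120003, 233.869995],
    'adj_close': [125.599655, 230.847702, 122.975403, 227.856628],
})

# Select the bars of a single symbol.
aapl = df[df['symbol'] == 'AAPL']

# Reshape the long format into one close column per symbol, indexed by date.
closes = df.pivot(index='date', columns='symbol', values='close')

print(aapl)
print(closes)
```

The same two operations work unchanged on the DataFrame returned by `yfinance.query`.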
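Caching Data
If you want to speed up data loading, you can cache your queries with PyBroker's caching system. Enable caching by calling pybroker.enable_data_source_cache('name'), where name is the name of the cache you want to use: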
[2]:
import pybroker
pybroker.enable_data_source_cache('yfinance')
[2]:
<diskcache.core.Cache at 0x7f3884390d60>
The next call to query will cache the returned data to disk. Each unique combination of ticker symbol and date range is cached separately:
[3]:
yfinance.query(['TSLA', 'IBM'], '3/1/2021', '3/1/2022')
Loading bar data...
[*********************100%%**********************] 2 of 2 completed
Loaded bar data: 0:00:00
[3]:
| | date | symbol | open | high | low | close | volume | adj_close |
---|---|---|---|---|---|---|---|---|
0 | 2021-03-01 | IBM | 115.057358 | 116.940727 | 114.588913 | 115.430206 | 5977367 | 100.173241 |
1 | 2021-03-01 | TSLA | 230.036667 | 239.666672 | 228.350006 | 239.476669 | 81408600 | 239.476669 |
2 | 2021-03-02 | IBM | 115.430206 | 116.539200 | 114.971321 | 115.038239 | 4732418 | 99.833076 |
3 | 2021-03-02 | TSLA | 239.426666 | 240.369995 | 228.333328 | 228.813339 | 71196600 | 228.813339 |
4 | 2021-03-03 | IBM | 115.200768 | 117.237091 | 114.703636 | 116.978966 | 7744898 | 101.517288 |
... | ... | ... | ... | ... | ... | ... | ... | ... |
501 | 2022-02-24 | TSLA | 233.463333 | 267.493347 | 233.333328 | 266.923340 | 135322200 | 266.923340 |
502 | 2022-02-25 | IBM | 122.050003 | 124.260002 | 121.449997 | 124.180000 | 4460900 | 113.041489 |
503 | 2022-02-25 | TSLA | 269.743347 | 273.166656 | 260.799988 | 269.956665 | 76067700 | 269.956665 |
504 | 2022-02-28 | IBM | 122.209999 | 123.389999 | 121.040001 | 122.510002 | 6757300 | 111.521271 |
505 | 2022-02-28 | TSLA | 271.670013 | 292.286682 | 271.570007 | 290.143341 | 99006900 | 290.143341 |
506 rows × 8 columns
Calling query again with the same ticker symbols and date range returns the cached data:
[4]:
df = yfinance.query(['TSLA', 'IBM'], '3/1/2021', '3/1/2022')
df
Loaded cached bar data.
[4]:
| | date | symbol | open | high | low | close | volume | adj_close |
---|---|---|---|---|---|---|---|---|
0 | 2021-03-01 | IBM | 115.057358 | 116.940727 | 114.588913 | 115.430206 | 5977367 | 100.173241 |
1 | 2021-03-02 | IBM | 115.430206 | 116.539200 | 114.971321 | 115.038239 | 4732418 | 99.833076 |
2 | 2021-03-03 | IBM | 115.200768 | 117.237091 | 114.703636 | 116.978966 | 7744898 | 101.517288 |
3 | 2021-03-04 | IBM | 116.634796 | 117.801147 | 113.537285 | 114.827919 | 8439651 | 99.650551 |
4 | 2021-03-05 | IBM | 115.334610 | 118.307838 | 114.961761 | 117.428299 | 7268968 | 101.907227 |
... | ... | ... | ... | ... | ... | ... | ... | ... |
248 | 2022-02-22 | TSLA | 278.043335 | 285.576660 | 267.033325 | 273.843323 | 83288100 | 273.843323 |
249 | 2022-02-23 | TSLA | 276.809998 | 278.433319 | 253.520004 | 254.679993 | 95256900 | 254.679993 |
250 | 2022-02-24 | TSLA | 233.463333 | 267.493347 | 233.333328 | 266.923340 | 135322200 | 266.923340 |
251 | 2022-02-25 | TSLA | 269.743347 | 273.166656 | 260.799988 | 269.956665 | 76067700 | 269.956665 |
252 | 2022-02-28 | TSLA | 271.670013 | 292.286682 | 271.570007 | 290.143341 | 99006900 | 290.143341 |
506 rows × 8 columns
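To make the idea of caching each symbol and date range separately concrete, here is a minimal sketch that uses a plain dictionary in place of the on-disk cache. It is not PyBroker's implementation: `cached_query`, `fake_fetch`, and the key layout are hypothetical names chosen for illustration.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical cache key: one entry per (symbol, start_date, end_date).
CacheKey = Tuple[str, str, str]


def cached_query(
    cache: Dict[CacheKey, list],
    fetch: Callable[[str, str, str], list],
    symbols: List[str],
    start_date: str,
    end_date: str,
) -> Dict[str, list]:
    """Fetch only the symbols that are not cached for this date range."""
    result = {}
    for symbol in symbols:
        key = (symbol, start_date, end_date)
        if key not in cache:
            cache[key] = fetch(symbol, start_date, end_date)
        result[symbol] = cache[key]
    return result


calls = []


def fake_fetch(symbol: str, start_date: str, end_date: str) -> list:
    """Stand-in for a remote request; records which symbols were fetched."""
    calls.append(symbol)
    return [f'{symbol} bars']


cache: Dict[CacheKey, list] = {}
cached_query(cache, fake_fetch, ['TSLA', 'IBM'], '3/1/2021', '3/1/2022')
cached_query(cache, fake_fetch, ['TSLA', 'AAPL'], '3/1/2021', '3/1/2022')
print(calls)  # ['TSLA', 'IBM', 'AAPL']
```

In this sketch the second call fetches only AAPL, because TSLA is already stored under the same date range.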
You can clear the cache with pybroker.clear_data_source_cache:
[5]:
pybroker.clear_data_source_cache()
Or disable caching entirely with pybroker.disable_data_source_cache:
[6]:
pybroker.disable_data_source_cache()
Note that pybroker.enable_data_source_cache must be called before either of these functions.
Alpaca
PyBroker also includes an Alpaca DataSource for fetching stock data. To use it, import Alpaca and provide your API key and secret:
[7]:
from pybroker import Alpaca
import os
alpaca = Alpaca(os.environ['ALPACA_API_KEY'], os.environ['ALPACA_API_SECRET'])
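Indexing `os.environ` directly raises a bare `KeyError` when a variable is missing. One way to get a clearer message is a small helper like the sketch below; `require_env` is a hypothetical name, not part of PyBroker.

```python
import os


def require_env(name: str) -> str:
    """Return an environment variable, or fail with a readable message."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(
            f'Set the {name} environment variable before creating '
            'the data source.'
        )
    return value


# Intended use, assuming the same variable names as in the cell above:
# alpaca = Alpaca(require_env('ALPACA_API_KEY'),
#                 require_env('ALPACA_API_SECRET'))
```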
You can query Alpaca for stock data using the same syntax as Yahoo Finance, but Alpaca also supports querying data by different timeframes. For example, to query 1-minute data:
[8]:
df = alpaca.query(
    ['AAPL', 'MSFT'],
    start_date='3/1/2021',
    end_date='4/1/2021',
    timeframe='1m'
)
df
Loading bar data...
Loaded bar data: 0:00:05
[8]:
| | date | symbol | open | high | low | close | volume | vwap |
---|---|---|---|---|---|---|---|---|
0 | 2021-03-01 04:00:00-05:00 | AAPL | 124.30 | 124.56 | 124.30 | 124.50 | 12267.0 | 124.433365 |
1 | 2021-03-01 04:00:00-05:00 | MSFT | 235.87 | 236.00 | 235.87 | 236.00 | 1429.0 | 235.938887 |
2 | 2021-03-01 04:01:00-05:00 | AAPL | 124.56 | 124.60 | 124.30 | 124.30 | 9439.0 | 124.481323 |
3 | 2021-03-01 04:01:00-05:00 | MSFT | 236.17 | 236.17 | 236.17 | 236.17 | 104.0 | 236.161538 |
4 | 2021-03-01 04:02:00-05:00 | AAPL | 124.00 | 124.05 | 123.78 | 123.78 | 4834.0 | 123.935583 |
... | ... | ... | ... | ... | ... | ... | ... | ... |
33340 | 2021-03-31 19:57:00-04:00 | MSFT | 237.28 | 237.28 | 237.28 | 237.28 | 507.0 | 237.367870 |
33341 | 2021-03-31 19:58:00-04:00 | AAPL | 122.36 | 122.39 | 122.33 | 122.39 | 3403.0 | 122.360544 |
33342 | 2021-03-31 19:58:00-04:00 | MSFT | 237.40 | 237.40 | 237.35 | 237.35 | 636.0 | 237.378066 |
33343 | 2021-03-31 19:59:00-04:00 | AAPL | 122.39 | 122.45 | 122.38 | 122.45 | 5560.0 | 122.402606 |
33344 | 2021-03-31 19:59:00-04:00 | MSFT | 237.40 | 237.53 | 237.40 | 237.53 | 1163.0 | 237.473801 |
33345 rows × 8 columns
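If you already hold 1-minute bars and want a coarser view without querying again, plain Pandas resampling is one option. The sketch below is ordinary Pandas rather than a PyBroker feature; it uses the first three AAPL rows from the output above and combines them into a single 5-minute bar.

```python
import pandas as pd

# The first three 1-minute AAPL bars from the output above.
bars = pd.DataFrame({
    'date': pd.to_datetime([
        '2021-03-01 04:00:00-05:00',
        '2021-03-01 04:01:00-05:00',
        '2021-03-01 04:02:00-05:00',
    ]),
    'open': [124.30, 124.56, 124.00],
    'high': [124.56, 124.60, 124.05],
    'low': [124.30, 124.30, 123.78],
    'close': [124.50, 124.30, 123.78],
    'volume': [12267.0, 9439.0, 4834.0],
}).set_index('date')

# Aggregate each 5-minute window: first open, highest high, lowest low,
# last close, and total volume.
five_min = bars.resample('5min').agg({
    'open': 'first',
    'high': 'max',
    'low': 'min',
    'close': 'last',
    'volume': 'sum',
})
print(five_min)
```

For a DataFrame that contains several symbols, group by the symbol column before resampling so that bars from different symbols are not mixed.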
Alpaca Crypto
If you want to fetch cryptocurrency data, you can use AlpacaCrypto. Here is an example:
[9]:
from pybroker import AlpacaCrypto
crypto = AlpacaCrypto(
    os.environ['ALPACA_API_KEY'],
    os.environ['ALPACA_API_SECRET']
)
df = crypto.query('BTC/USD', start_date='1/1/2021', end_date='2/1/2021', timeframe='1h')
df
Loading bar data...
Loaded bar data: 0:00:06
[9]:
| | symbol | date | open | high | low | close | volume | vwap | trade_count |
---|---|---|---|---|---|---|---|---|---|
0 | BTC/USD | 2021-01-01 01:00:00-05:00 | 29255.71 | 29338.25 | 29153.55 | 29234.15 | 42.244289 | 29237.240312 | 1243.0 |
1 | BTC/USD | 2021-01-01 02:00:00-05:00 | 29235.61 | 29236.95 | 28905.00 | 29162.50 | 34.506038 | 29078.423185 | 1070.0 |
2 | BTC/USD | 2021-01-01 03:00:00-05:00 | 29162.50 | 29248.52 | 28948.86 | 29076.77 | 27.596804 | 29091.465155 | 1110.0 |
3 | BTC/USD | 2021-01-01 04:00:00-05:00 | 29075.31 | 29372.32 | 29058.05 | 29284.92 | 20.694200 | 29248.730924 | 880.0 |
4 | BTC/USD | 2021-01-01 05:00:00-05:00 | 29291.54 | 29400.00 | 29232.16 | 29286.63 | 16.617646 | 29338.609132 | 742.0 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
735 | BTC/USD | 2021-01-31 15:00:00-05:00 | 32837.67 | 32964.87 | 32528.54 | 32882.87 | 40.631122 | 32818.132855 | 2197.0 |
736 | BTC/USD | 2021-01-31 16:00:00-05:00 | 32889.01 | 32935.98 | 32554.59 | 32586.68 | 26.673190 | 32737.975296 | 1625.0 |
737 | BTC/USD | 2021-01-31 17:00:00-05:00 | 32599.00 | 33126.32 | 32599.00 | 32998.35 | 25.422568 | 32923.438893 | 1770.0 |
738 | BTC/USD | 2021-01-31 18:00:00-05:00 | 33000.00 | 33263.94 | 32957.10 | 33134.86 | 31.072017 | 33147.086803 | 2203.0 |
739 | BTC/USD | 2021-01-31 19:00:00-05:00 | 33134.03 | 33134.03 | 32303.44 | 32572.03 | 60.460424 | 32552.937863 | 2665.0 |
740 rows × 9 columns
In the example above, we query hourly data for the BTC/USD currency pair.