Getting Started with Data Sources

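Welcome to PyBroker! The best place to start is by learning about DataSources. A DataSource is a class that fetches data from an external source, which you can then use to backtest your trading strategies.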

Yahoo Finance

One of the DataSources built into PyBroker is Yahoo Finance. To use it, import YFinance:

[1]:
from pybroker import YFinance

yfinance = YFinance()
df = yfinance.query(['AAPL', 'MSFT'], start_date='3/1/2021', end_date='3/1/2022')
df
Loading bar data...
[*********************100%%**********************]  2 of 2 completed
Loaded bar data: 0:00:00


[1]:
date symbol open high low close volume adj_close
0 2021-03-01 AAPL 123.750000 127.930000 122.790001 127.790001 116307900 125.599655
1 2021-03-01 MSFT 235.899994 237.470001 233.149994 236.940002 25324000 230.847702
2 2021-03-02 AAPL 128.410004 128.720001 125.010002 125.120003 102260900 122.975403
3 2021-03-02 MSFT 237.009995 237.300003 233.449997 233.869995 22812500 227.856628
4 2021-03-03 AAPL 124.809998 125.709999 121.839996 122.059998 112966300 119.967857
... ... ... ... ... ... ... ... ...
501 2022-02-24 MSFT 272.510010 295.160004 271.519989 294.589996 56989700 289.353271
502 2022-02-25 AAPL 163.839996 165.119995 160.869995 164.850006 91974200 162.987427
503 2022-02-25 MSFT 295.140015 297.630005 291.649994 297.309998 32546700 292.024872
504 2022-02-28 AAPL 163.059998 165.419998 162.429993 165.119995 95056600 163.254364
505 2022-02-28 MSFT 294.309998 299.140015 293.000000 298.790009 34627500 293.478607

506 rows × 8 columns

The code above queries data for the AAPL and MSFT stocks and returns a Pandas DataFrame containing the results.
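The returned DataFrame is in long format, with one row per date and symbol. As a minimal sketch of how such a result can be sliced with plain pandas, the block below rebuilds a few rows from the output above by hand (only the date, symbol, and close columns, so that it runs without a network call) and then selects one symbol and pivots the closing prices. The variable names aapl and closes are illustrative and not part of PyBroker.

```python
import pandas as pd

# A few rows copied from the output above, in the same long format
# (one row per date and symbol). Only three columns are kept for brevity.
df = pd.DataFrame(
    {
        "date": pd.to_datetime(
            ["2021-03-01", "2021-03-01", "2021-03-02", "2021-03-02"]
        ),
        "symbol": ["AAPL", "MSFT", "AAPL", "MSFT"],
        "close": [127.790001, 236.940002, 125.120003, 233.869995],
    }
)

# Select the rows that belong to a single symbol.
aapl = df[df["symbol"] == "AAPL"]

# Pivot to one column of closing prices per symbol, indexed by date.
closes = df.pivot(index="date", columns="symbol", values="close")
print(closes)
```

The same two operations should apply unchanged to the full DataFrame returned by yfinance.query, since it has the same date and symbol columns.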

Caching Data

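If you want to speed up data loading, you can cache your queries with PyBroker's caching system. Enable caching by calling pybroker.enable_data_source_cache('name'), where name is the name of the cache you want to use: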

[2]:
import pybroker

pybroker.enable_data_source_cache('yfinance')
[2]:
<diskcache.core.Cache at 0x7f3884390d60>

The next call to query will cache the returned data to disk. Each unique combination of ticker symbol and date range is cached separately:

[3]:
yfinance.query(['TSLA', 'IBM'], '3/1/2021', '3/1/2022')
Loading bar data...
[*********************100%%**********************]  2 of 2 completed
Loaded bar data: 0:00:00


[3]:
date symbol open high low close volume adj_close
0 2021-03-01 IBM 115.057358 116.940727 114.588913 115.430206 5977367 100.173241
1 2021-03-01 TSLA 230.036667 239.666672 228.350006 239.476669 81408600 239.476669
2 2021-03-02 IBM 115.430206 116.539200 114.971321 115.038239 4732418 99.833076
3 2021-03-02 TSLA 239.426666 240.369995 228.333328 228.813339 71196600 228.813339
4 2021-03-03 IBM 115.200768 117.237091 114.703636 116.978966 7744898 101.517288
... ... ... ... ... ... ... ... ...
501 2022-02-24 TSLA 233.463333 267.493347 233.333328 266.923340 135322200 266.923340
502 2022-02-25 IBM 122.050003 124.260002 121.449997 124.180000 4460900 113.041489
503 2022-02-25 TSLA 269.743347 273.166656 260.799988 269.956665 76067700 269.956665
504 2022-02-28 IBM 122.209999 123.389999 121.040001 122.510002 6757300 111.521271
505 2022-02-28 TSLA 271.670013 292.286682 271.570007 290.143341 99006900 290.143341

506 rows × 8 columns

Calling query again with the same ticker symbols and date range returns the cached data:

[4]:
df = yfinance.query(['TSLA', 'IBM'], '3/1/2021', '3/1/2022')
df
Loaded cached bar data.

[4]:
date symbol open high low close volume adj_close
0 2021-03-01 IBM 115.057358 116.940727 114.588913 115.430206 5977367 100.173241
1 2021-03-02 IBM 115.430206 116.539200 114.971321 115.038239 4732418 99.833076
2 2021-03-03 IBM 115.200768 117.237091 114.703636 116.978966 7744898 101.517288
3 2021-03-04 IBM 116.634796 117.801147 113.537285 114.827919 8439651 99.650551
4 2021-03-05 IBM 115.334610 118.307838 114.961761 117.428299 7268968 101.907227
... ... ... ... ... ... ... ... ...
248 2022-02-22 TSLA 278.043335 285.576660 267.033325 273.843323 83288100 273.843323
249 2022-02-23 TSLA 276.809998 278.433319 253.520004 254.679993 95256900 254.679993
250 2022-02-24 TSLA 233.463333 267.493347 233.333328 266.923340 135322200 266.923340
251 2022-02-25 TSLA 269.743347 273.166656 260.799988 269.956665 76067700 269.956665
252 2022-02-28 TSLA 271.670013 292.286682 271.570007 290.143341 99006900 290.143341

506 rows × 8 columns
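To make the idea of caching each symbol and date range combination separately more concrete, here is a toy in-memory sketch. It is not PyBroker's implementation (PyBroker caches to disk, as the diskcache object returned above suggests); the ToyBarCache class, its misses counter, and the fake_fetch function are all hypothetical names invented for this illustration.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical cache key: (symbol, start_date, end_date).
CacheKey = Tuple[str, str, str]


class ToyBarCache:
    """Illustrative in-memory cache keyed by symbol and date range."""

    def __init__(self, fetch: Callable[[str, str, str], List[float]]) -> None:
        self._fetch = fetch
        self._store: Dict[CacheKey, List[float]] = {}
        self.misses = 0  # number of times the underlying fetch was called

    def query(
        self, symbols: List[str], start_date: str, end_date: str
    ) -> Dict[str, List[float]]:
        result = {}
        for symbol in symbols:
            key = (symbol, start_date, end_date)
            if key not in self._store:
                # Cache miss: fetch the data and remember it under this key.
                self.misses += 1
                self._store[key] = self._fetch(symbol, start_date, end_date)
            result[symbol] = self._store[key]
        return result

    def clear(self) -> None:
        self._store.clear()


def fake_fetch(symbol: str, start_date: str, end_date: str) -> List[float]:
    # Stand-in for a network request; returns placeholder prices.
    return [1.0, 2.0, 3.0]


cache = ToyBarCache(fake_fetch)
cache.query(["TSLA", "IBM"], "3/1/2021", "3/1/2022")  # two misses
cache.query(["TSLA", "IBM"], "3/1/2021", "3/1/2022")  # served from the cache
cache.query(["TSLA"], "3/1/2021", "4/1/2021")  # new date range: one more miss
print(cache.misses)
```

In this sketch, repeating a query costs nothing extra, while changing either the symbol or the date range triggers a fresh fetch.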

You can clear the cache with pybroker.clear_data_source_cache:

[5]:
pybroker.clear_data_source_cache()

Or disable caching entirely with pybroker.disable_data_source_cache:

[6]:
pybroker.disable_data_source_cache()

Note that these functions should only be called after pybroker.enable_data_source_cache has been called.

Alpaca

PyBroker also includes an Alpaca DataSource for fetching stock data. To use it, import Alpaca and provide your API key and secret:

[7]:
from pybroker import Alpaca
import os

alpaca = Alpaca(os.environ['ALPACA_API_KEY'], os.environ['ALPACA_API_SECRET'])

You can query Alpaca for stock data with the same syntax as Yahoo Finance, but Alpaca also supports querying data at different timeframes. For example, to query 1-minute data:

[8]:
df = alpaca.query(
    ['AAPL', 'MSFT'],
    start_date='3/1/2021',
    end_date='4/1/2021',
    timeframe='1m'
)
df
Loading bar data...
Loaded bar data: 0:00:05

[8]:
date symbol open high low close volume vwap
0 2021-03-01 04:00:00-05:00 AAPL 124.30 124.56 124.30 124.50 12267.0 124.433365
1 2021-03-01 04:00:00-05:00 MSFT 235.87 236.00 235.87 236.00 1429.0 235.938887
2 2021-03-01 04:01:00-05:00 AAPL 124.56 124.60 124.30 124.30 9439.0 124.481323
3 2021-03-01 04:01:00-05:00 MSFT 236.17 236.17 236.17 236.17 104.0 236.161538
4 2021-03-01 04:02:00-05:00 AAPL 124.00 124.05 123.78 123.78 4834.0 123.935583
... ... ... ... ... ... ... ... ...
33340 2021-03-31 19:57:00-04:00 MSFT 237.28 237.28 237.28 237.28 507.0 237.367870
33341 2021-03-31 19:58:00-04:00 AAPL 122.36 122.39 122.33 122.39 3403.0 122.360544
33342 2021-03-31 19:58:00-04:00 MSFT 237.40 237.40 237.35 237.35 636.0 237.378066
33343 2021-03-31 19:59:00-04:00 AAPL 122.39 122.45 122.38 122.45 5560.0 122.402606
33344 2021-03-31 19:59:00-04:00 MSFT 237.40 237.53 237.40 237.53 1163.0 237.473801

33345 rows × 8 columns
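To illustrate what a timeframe means for bar data, the sketch below aggregates the first three 1-minute AAPL bars from the output above into a single 5-minute bar using plain pandas. This is only an illustration of how bars at different timeframes relate to each other; it is not how Alpaca or PyBroker builds its bars, the rows are typed in by hand so the block runs without API credentials, and the vwap column is left out because it cannot be aggregated with a simple first/max/min/last/sum rule.

```python
import pandas as pd

# Three 1-minute AAPL bars copied from the output above.
bars = pd.DataFrame(
    {
        "date": pd.to_datetime(
            [
                "2021-03-01 04:00:00",
                "2021-03-01 04:01:00",
                "2021-03-01 04:02:00",
            ]
        ).tz_localize("America/New_York"),
        "open": [124.30, 124.56, 124.00],
        "high": [124.56, 124.60, 124.05],
        "low": [124.30, 124.30, 123.78],
        "close": [124.50, 124.30, 123.78],
        "volume": [12267.0, 9439.0, 4834.0],
    }
).set_index("date")

# Aggregate into 5-minute bars: first open, highest high, lowest low,
# last close, and total volume within each 5-minute window.
five_min = bars.resample("5min").agg(
    {
        "open": "first",
        "high": "max",
        "low": "min",
        "close": "last",
        "volume": "sum",
    }
)
print(five_min)
```

In practice it is simpler to request the desired timeframe directly through the timeframe argument, as in the query above.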

Alpaca Crypto

If you want to fetch cryptocurrency data, you can use AlpacaCrypto. Here is an example:

[9]:
from pybroker import AlpacaCrypto

crypto = AlpacaCrypto(
    os.environ['ALPACA_API_KEY'],
    os.environ['ALPACA_API_SECRET']
)
df = crypto.query('BTC/USD', start_date='1/1/2021', end_date='2/1/2021', timeframe='1h')
df
Loading bar data...
Loaded bar data: 0:00:06

[9]:
symbol date open high low close volume vwap trade_count
0 BTC/USD 2021-01-01 01:00:00-05:00 29255.71 29338.25 29153.55 29234.15 42.244289 29237.240312 1243.0
1 BTC/USD 2021-01-01 02:00:00-05:00 29235.61 29236.95 28905.00 29162.50 34.506038 29078.423185 1070.0
2 BTC/USD 2021-01-01 03:00:00-05:00 29162.50 29248.52 28948.86 29076.77 27.596804 29091.465155 1110.0
3 BTC/USD 2021-01-01 04:00:00-05:00 29075.31 29372.32 29058.05 29284.92 20.694200 29248.730924 880.0
4 BTC/USD 2021-01-01 05:00:00-05:00 29291.54 29400.00 29232.16 29286.63 16.617646 29338.609132 742.0
... ... ... ... ... ... ... ... ... ...
735 BTC/USD 2021-01-31 15:00:00-05:00 32837.67 32964.87 32528.54 32882.87 40.631122 32818.132855 2197.0
736 BTC/USD 2021-01-31 16:00:00-05:00 32889.01 32935.98 32554.59 32586.68 26.673190 32737.975296 1625.0
737 BTC/USD 2021-01-31 17:00:00-05:00 32599.00 33126.32 32599.00 32998.35 25.422568 32923.438893 1770.0
738 BTC/USD 2021-01-31 18:00:00-05:00 33000.00 33263.94 32957.10 33134.86 31.072017 33147.086803 2203.0
739 BTC/USD 2021-01-31 19:00:00-05:00 33134.03 33134.03 32303.44 32572.03 60.460424 32552.937863 2665.0

740 rows × 9 columns

The example above queries hourly data for the BTC/USD currency pair.

AKShare

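PyBroker also includes an AKShare DataSource for fetching Chinese stock data. AKShare is a widely used open-source package for retrieving financial data, with a focus on the Chinese market. Compared with yfinance, this free tool provides higher-quality data for the Chinese market. To use it, import AKShare: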

[10]:
from pybroker.ext.data import AKShare

akshare = AKShare()
# You can substitute 000001 for 000001.SZ and it will still work.
# start_date can also be given in the "20210301" format.
# Set adjust to 'qfq' or 'hfq' to get price-adjusted data,
# and set timeframe to '1d' or '1w' for daily or weekly data.
df = akshare.query(
    symbols=['000001.SZ', '600000.SH'],
    start_date='3/1/2021',
    end_date='3/1/2023',
    adjust="",
    timeframe="1d",
)
df
Loading bar data...
Loaded bar data: 0:00:10

[10]:
date symbol open high low close volume
0 2021-03-01 000001.SZ 21.54 21.68 21.18 21.45 1125387
1 2021-03-01 600000.SH 10.59 10.64 10.50 10.58 547461
2 2021-03-02 000001.SZ 21.62 22.15 21.26 21.65 1473425
3 2021-03-02 600000.SH 10.61 10.70 10.36 10.47 747631
4 2021-03-03 000001.SZ 21.58 23.08 21.46 23.01 1919635
... ... ... ... ... ... ... ...
969 2023-02-27 600000.SH 7.16 7.20 7.16 7.16 158006
970 2023-02-28 000001.SZ 13.75 13.85 13.61 13.78 607936
971 2023-02-28 600000.SH 7.18 7.20 7.14 7.18 174481
972 2023-03-01 000001.SZ 13.80 14.19 13.74 14.17 1223452
973 2023-03-01 600000.SH 7.17 7.27 7.17 7.26 256613

974 rows × 7 columns

Note: If the import above fails with a Native library not available error and you still want to use AKShare, see this issue for details on how to resolve it.

In the next article, we will look at how to use DataSources to backtest a simple trading strategy.