diff --git "a/B-\345\233\240\345\255\220\346\236\204\345\273\272\347\261\273/\345\233\240\345\255\220\346\213\251\346\227\266/\345\205\211\345\244\247\350\257\201\345\210\270-Alpha1.0&Alpha2.pdf" "b/B-\345\233\240\345\255\220\346\236\204\345\273\272\347\261\273/\345\233\240\345\255\220\346\213\251\346\227\266/\345\205\211\345\244\247\350\257\201\345\210\270-Alpha1.0&Alpha2.pdf"
new file mode 100644
index 0000000..27558aa
Binary files /dev/null and "b/B-\345\233\240\345\255\220\346\236\204\345\273\272\347\261\273/\345\233\240\345\255\220\346\213\251\346\227\266/\345\205\211\345\244\247\350\257\201\345\210\270-Alpha1.0&Alpha2.pdf" differ
diff --git "a/B-\345\233\240\345\255\220\346\236\204\345\273\272\347\261\273/\345\233\240\345\255\220\346\213\251\346\227\266/\345\233\240\345\255\220\346\213\251\346\227\266\347\240\224\347\251\266.ipynb" "b/B-\345\233\240\345\255\220\346\236\204\345\273\272\347\261\273/\345\233\240\345\255\220\346\213\251\346\227\266/\345\233\240\345\255\220\346\213\251\346\227\266\347\240\224\347\251\266.ipynb"
new file mode 100644
index 0000000..e257a70
--- /dev/null
+++ "b/B-\345\233\240\345\255\220\346\236\204\345\273\272\347\261\273/\345\233\240\345\255\220\346\213\251\346\227\266/\345\233\240\345\255\220\346\213\251\346\227\266\347\240\224\347\251\266.ipynb"
@@ -0,0 +1,4895 @@
+{
+ "cells": [
+ {
+ "cell_type": "code",
+ "execution_count": 16,
+ "metadata": {
+ "code_folding": [
+ 0
+ ]
+ },
+ "outputs": [],
+ "source": [
+ "# Import JoinQuant libraries\n",
+ "from jqdata import *\n",
+ "from jqfactor import get_factor_values\n",
+ "\n",
+ "# Date handling\n",
+ "import datetime\n",
+ "import calendar\n",
+ "from dateutil.relativedelta import relativedelta\n",
+ "\n",
+ "# Common libraries\n",
+ "import numpy as np\n",
+ "import pandas as pd\n",
+ "\n",
+ "\n",
+ "# For web scraping\n",
+ "import json\n",
+ "import time\n",
+ "import re\n",
+ "import requests\n",
+ "\n",
+ "# Miscellaneous\n",
+ "from alphalens.utils import print_table\n",
+ "from tqdm import * # progress bar\n",
+ "import itertools\n",
+ "import copy\n",
+ "import pickle\n",
+ "\n",
+ "# Linear models\n",
+ "import statsmodels.api as sm\n",
+ "\n",
+ "# Statistics\n",
+ "import scipy.stats as ss\n",
+ "from scipy.stats import zscore,spearmanr\n",
+ "\n",
+ "# Machine learning\n",
+ "from sklearn import linear_model,svm \n",
+ "from sklearn.tree import DecisionTreeClassifier\n",
+ "from sklearn.ensemble import RandomForestClassifier\n",
+ "\n",
+ "# Plotting\n",
+ "from pylab import mpl\n",
+ "import seaborn as sns\n",
+ "import matplotlib.pyplot as plt\n",
+ "%matplotlib inline\n",
+ "plt.rcParams['font.family'] = 'serif'\n",
+ "\n",
+ "# Render minus signs correctly\n",
+ "plt.rcParams['axes.unicode_minus'] = False\n",
+ "\n",
+ "plt.style.use('seaborn')"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
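+ "# 1. Timing models: classification vs. conditional expectation\n",
+ "\n",
+ "\n",
+ "\n",
+ "Factor-timing models fall into two main families: `classification models` and `conditional-expectation models`. \n",
+ "\n",
+ "**1. Classification models:** \n",
+ "\n",
+ "+ predict the `direction` of a factor's future return;\n",
+ "\n",
+ "\n",
+ "+ typical `models`: decision trees (random forest, GBDT, etc.), logistic regression, and support vector machines (SVM).\n",
+ "\n",
+ "\n",
+ "**2. Conditional-expectation models:**\n",
+ "\n",
+ "+ assume that factor returns and the conditioning variables follow a joint normal distribution;\n",
+ "\n",
+ "+ solve for the conditional expectation and conditional covariance of the factor returns;\n",
+ "\n",
+ "+ `Drawback:` the joint-normality assumption is hard to satisfy in practice."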
+ ]
+ },
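+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "For reference, the conditional expectation and conditional covariance mentioned above have a closed form under joint normality. The notation is introduced only for this note and is an assumption, not taken from the original report: $f$ denotes the factor returns, $x$ the conditioning variables, $\\mu_f$ and $\\mu_x$ their means, and $\\Sigma_{ff}$, $\\Sigma_{fx}$, $\\Sigma_{xf}$, $\\Sigma_{xx}$ the blocks of their joint covariance matrix.\n",
+ "\n",
+ "$$E[f \\mid x]=\\mu_{f}+\\Sigma_{fx}\\Sigma_{xx}^{-1}(x-\\mu_{x})$$\n",
+ "\n",
+ "$$Cov[f \\mid x]=\\Sigma_{ff}-\\Sigma_{fx}\\Sigma_{xx}^{-1}\\Sigma_{xf}$$\n",
+ "\n",
+ "Both expressions are linear-algebra consequences of the Gaussian assumption, which is why the approach breaks down when that assumption fails."
+ ]
+ },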
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
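+ "# 2. Overview of the classification models\n",
+ "\n",
+ "\n",
+ "## 2.1 Logistic regression: simple to use, but requires the classes to be linearly separable\n",
+ "\n",
+ "+ Logistic regression passes the linear predictor through the sigmoid function so that $P(x)$ stays within the interval (0, 1), which maps the classification problem onto a regression equation:\n",
+ "\n",
+ "$$P(x)=\\frac{e^{w_{0}+w_{1}x_{1}+...+w_{p}x_{p}}}{1+e^{w_{0}+w_{1}x_{1}+...+w_{p}x_{p}}}$$\n",
+ "\n",
+ "+ The regression equation can therefore also be written in log-odds form:\n",
+ "\n",
+ "$$\\log\\left (\\frac{P(x)}{1-P(x)} \\right )=w_{0}+w_{1}x_{1}+...+w_{p}x_{p}$$"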
+ ]
+ },
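+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "The two forms of the logistic model in section 2.1 can be checked numerically. The cell below is a minimal sketch and not part of the original study: the weights `w` and the input `x` are made-up values chosen only for illustration."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import numpy as np\n",
+ "\n",
+ "\n",
+ "def sigmoid(z):\n",
+ "    # e^z / (1 + e^z), written in the equivalent form 1 / (1 + e^-z)\n",
+ "    return 1.0 / (1.0 + np.exp(-z))\n",
+ "\n",
+ "\n",
+ "# Hypothetical weights (w0, w1, w2) and one observation (x1, x2)\n",
+ "w = np.array([0.5, 1.0, -2.0])\n",
+ "x = np.array([0.3, 0.1])\n",
+ "\n",
+ "z = w[0] + w[1:] @ x  # linear predictor w0 + w1*x1 + w2*x2\n",
+ "p = sigmoid(z)  # predicted probability, always inside (0, 1)\n",
+ "log_odds = np.log(p / (1.0 - p))  # should recover the linear predictor\n",
+ "\n",
+ "print('P(x) =', p)\n",
+ "print('log-odds equals linear predictor:', np.isclose(log_odds, z))"
+ ]
+ },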
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
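+ "## 2.2 Decision trees: an intuitive decision process, widely used\n",
+ "\n",
+ "A decision tree is a basic classification algorithm. The key step in building one is choosing, at each node, which feature to use as the splitting rule. The **guiding principle** is to pick the split that maximizes the information gain."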
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
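+ "## 2.3 SVM: from low to high dimensions, suited to small samples\n",
+ "\n",
+ "An SVM is a binary classifier. When the inputs are not linearly separable in the original low-dimensional space, a kernel function maps them into a high-dimensional feature space, and the optimal separating hyperplane is constructed there."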
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
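+ "## 2.4 Strengths and weaknesses of the three classifiers\n",
+ "\n",
+ "**1. Logistic regression:**\n",
+ "\n",
+ "+ Strengths: outputs predicted probabilities;\n",
+ "\n",
+ "+ Weaknesses: performs poorly when there are many features; struggles with nonlinear classification problems (it relies on a linear model of the transformed inputs).\n",
+ "\n",
+ "**2. Decision trees:**\n",
+ "\n",
+ "+ Strengths: the decision process is intuitive and easy to understand; can solve nonlinear classification problems; can handle interactions between features;\n",
+ "\n",
+ "+ Weaknesses: prone to overfitting (a random forest can reduce the risk of overfitting).\n",
+ "\n",
+ "**3. Support vector machines:**\n",
+ "\n",
+ "+ Strengths: can handle classification problems with a large feature space; can handle nonlinear dependence between features; works with small training sets;\n",
+ "\n",
+ "+ Weaknesses: runs slowly when there are too many predictors; there is no standard rule for choosing the kernel function."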
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
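+ "# 3. Selected factors and their correlation matrix\n",
+ " \n",
+ "From the Everbright Securities alpha-factor framework (**valuation, quality, growth, size, volatility, turnover, liquidity, and momentum factors**), the following 12 commonly used factors are chosen as test subjects for SVM-based factor timing:\n",
+ "\n",
+ "|Category|Factor|Description| \n",
+ "|---|---|---|\n",
+ "|Valuation|book_to_price_ratio|Book-to-price ratio| \n",
+ "|Valuation|PEG|Price/earnings-to-growth ratio| \n",
+ "|Size|ln(market_cap)|Log of total market capitalization| \n",
+ "|Quality|cfo_to_ev|Net cash flow from operating activities to enterprise value (TTM)|\n",
+ "|Growth|total_asset_growth_rate|Total asset growth rate|\n",
+ "|Growth|roe_ttm|Return on equity (TTM)|\n",
+ "|Leverage|LVGI|Financial leverage index|\n",
+ "|Momentum|ROC20|20-day rate of change| \n",
+ "|Momentum|ROC60|60-day rate of change| \n",
+ "|Volatility|sharpe_ratio_20|20-day Sharpe ratio| \n",
+ "|Liquidity|VOL20|20-day average turnover rate|\n",
+ "|Liquidity|Volume1M|Current trading volume relative to the average daily volume of the past month, multiplied by the average daily return of the past 20 days|\n",
+ " \n",
+ "**Sample-selection rules:**\n",
+ "+ Backtest period: January 1, 2014 to December 31, 2019 (later steps use the mean factor return over the past 36 months, so the start of the data is pushed back 36 months, to January 1, 2011); \n",
+ "+ Rebalancing date: the last trading day of each month, buying and selling at that day's closing price; \n",
+ "+ Universe: CSI 300 (HS300) constituents, excluding stocks flagged ST on the selection date, stocks listed for less than 6 months, and stocks that cannot be bought on the selection date because of a trading halt or similar reasons."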
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
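+ "---\n",
+ "\n",
+ "**Preliminary screening of the factors above:**\n",
+ "\n",
+ "From each factor category, keep the factors with significant returns, high IC and IR, and a high monotonicity score.\n",
+ "\n",
+ "The monotonicity score is computed as:\n",
+ "\n",
+ "$$Mono\\_Score = \\frac{R_5-R_1}{R_4-R_2}$$\n",
+ "Note: $R_i$ is the annualized return of the $i$-th group in the quantile (layered) backtest.\n",
+ "\n",
+ "---\n",
+ "\n",
+ "***Factor scoring***\n",
+ "\n",
+ "|Metric|Description|Threshold (absolute value)|\n",
+ "|--|--|--|\n",
+ "|Factor_Ret|Mean factor return over the last 60 months|>0.002|\n",
+ "|Factor_Ret_tvalue|t-value of the factor return over the last 60 months|>2|\n",
+ "|IC|Information coefficient|>0.02|\n",
+ "|IR|Information ratio|>0.2|\n",
+ "|Monotony|Monotonicity score|>2|"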
+ ]
+ },
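+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "The monotonicity score above can be computed directly from the five group returns. The cell below is a minimal sketch: the helper name `mono_score` and the sample returns are hypothetical and are not taken from the original report."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import numpy as np\n",
+ "\n",
+ "\n",
+ "def mono_score(group_returns):\n",
+ "    # group_returns: annualized returns R_1..R_5 of the five quantile groups\n",
+ "    r = np.asarray(group_returns, dtype=float)\n",
+ "    if r.size != 5:\n",
+ "        raise ValueError('expected exactly five group returns')\n",
+ "    denom = r[3] - r[1]  # R_4 - R_2\n",
+ "    if denom == 0:\n",
+ "        return np.nan  # the score is undefined when R_4 equals R_2\n",
+ "    return (r[4] - r[0]) / denom  # (R_5 - R_1) / (R_4 - R_2)\n",
+ "\n",
+ "\n",
+ "# Made-up, steadily increasing group returns: the score is roughly 3,\n",
+ "# which would pass the |Mono_Score| > 2 threshold in the table above\n",
+ "print(round(mono_score([0.02, 0.05, 0.07, 0.09, 0.14]), 4))"
+ ]
+ },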
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## 3.1 Data extraction\n",
+ "\n",
+ "Pull the raw timing-factor data from the jqdata API according to the sample-selection rules."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 2,
+ "metadata": {
+ "code_folding": [
+ 1,
+ 17,
+ 48
+ ]
+ },
+ "outputs": [],
+ "source": [
+ "## Step 1: get the last trading day of each month in the backtest period\n",
+ "def GetTradePerid(start_date: str, end_date: str, freq: str = 'M') -> list:\n",
+ "    '''\n",
+ "    start_date/end_date: str, YYYY-MM-DD\n",
+ "    freq: 'M' month-end, 'Q' quarter-end, 'Y' year-end; default 'M'\n",
+ "    ================\n",
+ "    return: list of datetime.date\n",
+ "    '''\n",
+ "    days = [x.date() for x in pd.date_range(start_date, end_date, freq=freq)]\n",
+ "\n",
+ "    # Keep a period end if it is a trading day; otherwise fall back to the\n",
+ "    # most recent trading day on or before it\n",
+ "    return [\n",
+ "        d if d in get_trade_days(start_date, end_date) else get_trade_days(\n",
+ "            end_date=d, count=1)[0] for d in days\n",
+ "    ]\n",
+ "\n",
+ "\n",
+ "## Step 2: build the filtered stock universe for each month-end trading day\n",
+ "def GetStocks(symbol: str, trDate: datetime.date, limit: int = 6) -> list:\n",
+ "    '''\n",
+ "    symbol: index code\n",
+ "    trDate: trading date\n",
+ "    limit: minimum number of months since listing; default 6\n",
+ "    ================\n",
+ "    return: list of stock codes\n",
+ "    '''\n",
+ "    stocks = get_index_stocks(symbol, date=trDate)\n",
+ "\n",
+ "    # 1. Drop ST stocks\n",
+ "    is_st = get_extras('is_st', stocks, end_date=trDate, count=1).iloc[-1]\n",
+ "\n",
+ "    stocks = is_st[is_st == False].index.tolist()\n",
+ "\n",
+ "    # 2. Drop stocks listed for less than `limit` months (a month is\n",
+ "    #    approximated as 30 days); the listing date is looked up per stock\n",
+ "    stocks = [\n",
+ "        s for s in stocks\n",
+ "        if get_security_info(s, date=trDate).start_date < trDate -\n",
+ "        datetime.timedelta(limit * 30)\n",
+ "    ]\n",
+ "\n",
+ "    # 3. Drop stocks that did not trade on the day\n",
+ "    pause = get_price(\n",
+ "        stocks, end_date=trDate, fields='paused', count=1, panel=False)\n",
+ "    stocks = pause.query('paused==0')['code'].values.tolist()\n",
+ "\n",
+ "    return stocks\n",
+ "\n",
+ "\n",
+ "## Step 3: download the factors\n",
+ "def GetFactors(dates: list) -> pd.DataFrame:\n",
+ "\n",
+ "    factors_list = []\n",
+ "\n",
+ "    for date in tqdm(dates, desc='Download Factors'):\n",
+ "\n",
+ "        ## Stock universe at this month end\n",
+ "        stocks = GetStocks('000300.XSHG', date)\n",
+ "\n",
+ "        ## Factor values at this month end\n",
+ "        factors = [\n",
+ "            'book_to_price_ratio', 'PEG', 'market_cap', 'cfo_to_ev',\n",
+ "            'total_asset_growth_rate', 'roe_ttm', 'LVGI', 'ROC20', 'ROC60',\n",
+ "            'sharpe_ratio_20', 'VOL20', 'Volume1M'\n",
+ "        ]\n",
+ "\n",
+ "        f_dict = get_factor_values(\n",
+ "            stocks, factors, end_date=date, count=1)\n",
+ "\n",
+ "        # Columns become factors, rows become (date, code)\n",
+ "        f_df = pd.concat(f_dict, axis=1).stack()\n",
+ "\n",
+ "        ## Auxiliary field: SW level-1 industry name, used for industry neutralization\n",
+ "        ind = get_industry(security=stocks, date=date)\n",
+ "\n",
+ "        # Map each stock code to its own SW level-1 industry name\n",
+ "        ind = {\n",
+ "            code: info['sw_l1'].get('industry_name', np.nan)\n",
+ "            for code, info in ind.items() if 'sw_l1' in info\n",
+ "        }\n",
+ "\n",
+ "        f_df['INDUSTRY'] = list(\n",
+ "            map(lambda x: ind.get(x, np.nan), f_df.index.get_level_values(1)))\n",
+ "\n",
+ "        factors_list.append(f_df) # collect each month-end cross-section\n",
+ "\n",
+ "    factors_df = pd.concat(factors_list)\n",
+ "    factors_df.index.names = ['date','code']\n",
+ "    return factors_df"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 3,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "## Set the backtest period\n",
+ "start_date = '2014-01-01'\n",
+ "end_date = '2019-12-31'\n",
+ "\n",
+ "# Push the start back 36 months\n",
+ "begin_date = pd.date_range(end=start_date,periods=36,freq='M')[0].strftime('%Y-%m-%d')\n",
+ "dates = GetTradePerid(begin_date, end_date, freq='M')"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 12,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stderr",
+ "output_type": "stream",
+ "text": [
+ "Download Factors: 100%|██████████| 108/108 [01:52<00:00, 1.30it/s]\n"
+ ]
+ },
+ {
+ "data": {
+ "text/html": [
+ "<div>\n",
+ "<table border=\"1\" class=\"dataframe\">\n",
+ " <thead>\n",
+ " <tr><th>date</th><th>code</th><th>LVGI</th><th>PEG</th><th>ROC20</th><th>ROC60</th><th>VOL20</th><th>Volume1M</th><th>book_to_price_ratio</th><th>cfo_to_ev</th><th>market_cap</th><th>roe_ttm</th><th>sharpe_ratio_20</th><th>total_asset_growth_rate</th><th>INDUSTRY</th></tr>\n",
+ " </thead>\n",
+ " <tbody>\n",
+ " <tr><th rowspan=\"5\">2011-01-31</th><th>000001.XSHE</th><td>0.999736</td><td>NaN</td><td>-3.047619</td><td>-19.077901</td><td>0.7905</td><td>-0.000787</td><td>1.659492</td><td>NaN</td><td>5.335556e+10</td><td>0.190618</td><td>-1.218289</td><td>0.217945</td><td>金融服务I</td></tr>\n",
+ " <tr><th>000002.XSHE</th><td>0.993440</td><td>0.809803</td><td>-0.310078</td><td>-15.837696</td><td>1.0840</td><td>0.000219</td><td>0.590631</td><td>-0.016202</td><td>9.016072e+10</td><td>0.140209</td><td>-0.152238</td><td>0.451210</td><td>金融服务I</td></tr>\n",
+ " <tr><th>000009.XSHE</th><td>1.020497</td><td>1.314595</td><td>5.889015</td><td>6.613455</td><td>3.0240</td><td>0.005626</td><td>-1.613173</td><td>-0.031213</td><td>1.937173e+10</td><td>0.129520</td><td>1.432914</td><td>0.355307</td><td>金融服务I</td></tr>\n",
+ " <tr><th>000012.XSHE</th><td>0.906201</td><td>0.163172</td><td>-0.330852</td><td>-4.968454</td><td>2.2345</td><td>-0.001619</td><td>-1.432409</td><td>0.049135</td><td>3.940520e+10</td><td>0.217347</td><td>-0.763299</td><td>0.066277</td><td>金融服务I</td></tr>\n",
+ " <tr><th>000021.XSHE</th><td>1.357714</td><td>1.437908</td><td>-6.384977</td><td>-21.619497</td><td>0.4765</td><td>-0.004083</td><td>-0.606767</td><td>0.040464</td><td>1.412947e+10</td><td>0.092667</td><td>-2.018794</td><td>0.225302</td><td>金融服务I</td></tr>\n",
+ " </tbody>\n",
+ "</table>\n",
+ "</div>"
+ ],
+ "text/plain": [
+ " LVGI ... INDUSTRY\n",
+ "date code ... \n",
+ "2011-01-31 000001.XSHE 0.999736 ... 金融服务I\n",
+ " 000002.XSHE 0.993440 ... 金融服务I\n",
+ " 000009.XSHE 1.020497 ... 金融服务I\n",
+ " 000012.XSHE 0.906201 ... 金融服务I\n",
+ " 000021.XSHE 1.357714 ... 金融服务I\n",
+ "\n",
+ "[5 rows x 13 columns]"
+ ]
+ },
+ "execution_count": 12,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "# Download the data\n",
+ "datas = GetFactors(dates)\n",
+ "datas.to_csv('../Data/SVM.csv')\n",
+ "\n",
+ "datas.head()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
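+ "## 3.2 Data cleaning\n",
+ "\n",
+ "Clean the raw factor data in four steps: **outlier handling**, **missing-value handling**, **standardization**, and **orthogonalization**. The method used in each step is as follows:\n",
+ "\n",
+ "+ Step 1, outliers: the robust `median absolute deviation (MAD)` method;\n",
+ " \n",
+ " \n",
+ "+ Step 2, missing values: the industry label is rarely missing, so rows with a missing industry are simply dropped; missing timing-factor values are replaced with that day's `industry median`; factors with a very high missing rate (for example above 20%) are dropped outright;\n",
+ "\n",
+ "\n",
+ "+ Step 3, standardization: `z-score standardization`;\n",
+ "\n",
+ "\n",
+ "+ Step 4, orthogonalization: `symmetric orthogonalization`, because the orthogonalized factor values stay closer to the original values than with other orthogonalization methods, no orthogonalization order has to be chosen, and the computation is efficient."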
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 17,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "<div>\n",
+ "<table border=\"1\" class=\"dataframe\">\n",
+ " <thead>\n",
+ " <tr><th>date</th><th>code</th><th>LVGI</th><th>PEG</th><th>ROC20</th><th>ROC60</th><th>VOL20</th><th>Volume1M</th><th>book_to_price_ratio</th><th>cfo_to_ev</th><th>market_cap</th><th>roe_ttm</th><th>sharpe_ratio_20</th><th>total_asset_growth_rate</th><th>INDUSTRY</th></tr>\n",
+ " </thead>\n",
+ " <tbody>\n",
+ " <tr><th rowspan=\"5\">2011-01-31</th><th>000001.XSHE</th><td>0.999736</td><td>NaN</td><td>-3.047619</td><td>-19.077901</td><td>0.7905</td><td>-0.000787</td><td>1.659492</td><td>NaN</td><td>5.335556e+10</td><td>0.190618</td><td>-1.218289</td><td>0.217945</td><td>金融服务I</td></tr>\n",
+ " <tr><th>000002.XSHE</th><td>0.993440</td><td>0.809803</td><td>-0.310078</td><td>-15.837696</td><td>1.0840</td><td>0.000219</td><td>0.590631</td><td>-0.016202</td><td>9.016072e+10</td><td>0.140209</td><td>-0.152238</td><td>0.451210</td><td>金融服务I</td></tr>\n",
+ " <tr><th>000009.XSHE</th><td>1.020497</td><td>1.314595</td><td>5.889015</td><td>6.613455</td><td>3.0240</td><td>0.005626</td><td>-1.613173</td><td>-0.031213</td><td>1.937173e+10</td><td>0.129520</td><td>1.432914</td><td>0.355307</td><td>金融服务I</td></tr>\n",
+ " <tr><th>000012.XSHE</th><td>0.906201</td><td>0.163172</td><td>-0.330852</td><td>-4.968454</td><td>2.2345</td><td>-0.001619</td><td>-1.432409</td><td>0.049135</td><td>3.940520e+10</td><td>0.217347</td><td>-0.763299</td><td>0.066277</td><td>金融服务I</td></tr>\n",
+ " <tr><th>000021.XSHE</th><td>1.357714</td><td>1.437908</td><td>-6.384977</td><td>-21.619497</td><td>0.4765</td><td>-0.004083</td><td>-0.606767</td><td>0.040464</td><td>1.412947e+10</td><td>0.092667</td><td>-2.018794</td><td>0.225302</td><td>金融服务I</td></tr>\n",
+ " </tbody>\n",
+ "</table>\n",
+ "</div>"
+ ],
+ "text/plain": [
+ " LVGI ... INDUSTRY\n",
+ "date code ... \n",
+ "2011-01-31 000001.XSHE 0.999736 ... 金融服务I\n",
+ " 000002.XSHE 0.993440 ... 金融服务I\n",
+ " 000009.XSHE 1.020497 ... 金融服务I\n",
+ " 000012.XSHE 0.906201 ... 金融服务I\n",
+ " 000021.XSHE 1.357714 ... 金融服务I\n",
+ "\n",
+ "[5 rows x 13 columns]"
+ ]
+ },
+ "execution_count": 17,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "# Load the data\n",
+ "factors = pd.read_csv('../Data/SVM.csv',index_col=[0,1],parse_dates=True)\n",
+ "factors.head()"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 18,
+ "metadata": {
+ "code_folding": [
+ 2,
+ 26,
+ 45,
+ 61
+ ]
+ },
+ "outputs": [],
+ "source": [
+ "## Step 1: winsorize with the median absolute deviation (MAD)\n",
+ "# Values further than num * 1.4826 * MAD from the median are clipped to that bound\n",
+ "def extreme_process_MAD(df:pd.DataFrame, num:int=3)->pd.DataFrame:\n",
+ "\n",
+ "    # Copy first so the original data is left untouched\n",
+ "    df_ = df.copy()\n",
+ "    feature_names = [\n",
+ "        i for i in df_.columns.tolist() if i not in ['INDUSTRY']\n",
+ "    ] # names of the factor columns to process\n",
+ "\n",
+ "    # Column medians\n",
+ "    median = df_[feature_names].median(axis=0)\n",
+ "\n",
+ "    # Align on column labels and broadcast down the rows\n",
+ "    MAD = abs(df_[feature_names].sub(median, axis=1)).median(axis=0)\n",
+ "\n",
+ "    # clip() limits each factor to [lower, upper], replacing outliers with the bounds\n",
+ "    df_.loc[:, feature_names] = df_.loc[:, feature_names].clip(\n",
+ "        lower=median - num * 1.4826 * MAD,\n",
+ "        upper=median + num * 1.4826 * MAD,\n",
+ "        axis=1)\n",
+ "\n",
+ "    return df_\n",
+ "\n",
+ "\n",
+ "## Step 2: fill missing values\n",
+ "def factors_null_process(df:pd.DataFrame)->pd.DataFrame:\n",
+ "\n",
+ "    # Drop rows whose industry is missing\n",
+ "    df = df[df['INDUSTRY'].notnull()]\n",
+ "\n",
+ "    # Re-index with industry as the first level and stock code as the second\n",
+ "    df_ = df.reset_index().set_index(['INDUSTRY', 'code']).sort_index()\n",
+ "\n",
+ "    # Fill missing values with the industry median\n",
+ "    df_ = df_.groupby(\n",
+ "        level=0).apply(lambda factor: factor.fillna(factor.median()))\n",
+ "\n",
+ "    # Restore the stock code as the index\n",
+ "    df_ = df_.reset_index().set_index('code').sort_index()\n",
+ "\n",
+ "    return df_.drop('date', axis=1)\n",
+ "\n",
+ "\n",
+ "## Step 3: z-score standardization\n",
+ "def data_scale_Z_Score(df:pd.DataFrame)->pd.DataFrame:\n",
+ "    # Copy first so the original data is left untouched\n",
+ "\n",
+ "    df_ = df.copy()\n",
+ "    feature_names = [\n",
+ "        i for i in df_.columns.tolist() if i not in ['INDUSTRY']\n",
+ "    ] # names of the factor columns to process\n",
+ "\n",
+ "    df_.loc[:, feature_names] = (\n",
+ "        df_.loc[:, feature_names] -\n",
+ "        df_.loc[:, feature_names].mean()) / df_.loc[:, feature_names].std()\n",
+ "\n",
+ "    return df_\n",
+ "\n",
+ "\n",
+ "## Step 4: symmetric (Lowdin) orthogonalization\n",
+ "def lowdin_orthogonal(df:pd.DataFrame)->pd.DataFrame:\n",
+ "\n",
+ "    df_ = df.copy()\n",
+ "    # Exclude the industry column and convert the factor columns to a matrix\n",
+ "    col = [i for i in df_.columns.tolist() if i not in ['INDUSTRY']]\n",
+ "    F = np.array(df_[col])\n",
+ "    M = np.dot(F.T, F)\n",
+ "    # M is symmetric, so eigh returns real eigenvalues (a) and orthonormal eigenvectors (U)\n",
+ "    a, U = np.linalg.eigh(M)\n",
+ "    D_inv_sqrt = np.diag(1.0 / np.sqrt(a)) # diagonal matrix D^(-1/2)\n",
+ "    S = U.dot(D_inv_sqrt).dot(U.T) # transition matrix S = U D^(-1/2) U^T\n",
+ "    df_[col] = F.dot(S)\n",
+ "    return df_"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 19,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "factors1 = factors.groupby(level='date').apply(extreme_process_MAD) # winsorize\n",
+ "factors2 = factors1.groupby(level='date').apply(factors_null_process) # fill missing values\n",
+ "factors3 = factors2.groupby(level='date').apply(data_scale_Z_Score) # standardize\n",
+ "factors4 = factors3.groupby(level='date').apply(lowdin_orthogonal) # symmetric orthogonalization"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 20,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "<class 'pandas.core.frame.DataFrame'>\n",
+ "MultiIndex: 31286 entries, (2011-01-31 00:00:00, 000001.XSHE) to (2019-12-31 00:00:00, 603993.XSHG)\n",
+ "Data columns (total 13 columns):\n",
+ "LVGI 30203 non-null float64\n",
+ "PEG 19212 non-null float64\n",
+ "ROC20 31286 non-null float64\n",
+ "ROC60 31286 non-null float64\n",
+ "VOL20 31286 non-null float64\n",
+ "Volume1M 31286 non-null float64\n",
+ "book_to_price_ratio 31286 non-null float64\n",
+ "cfo_to_ev 30309 non-null float64\n",
+ "market_cap 31286 non-null float64\n",
+ "roe_ttm 31268 non-null float64\n",
+ "sharpe_ratio_20 31282 non-null float64\n",
+ "total_asset_growth_rate 30632 non-null float64\n",
+ "INDUSTRY 31286 non-null object\n",
+ "dtypes: float64(12), object(1)\n",
+ "memory usage: 4.4+ MB\n",
+ "None\n",
+ "<class 'pandas.core.frame.DataFrame'>\n",
+ "MultiIndex: 31286 entries, (2011-01-31 00:00:00, 000001.XSHE) to (2019-12-31 00:00:00, 603993.XSHG)\n",
+ "Data columns (total 13 columns):\n",
+ "INDUSTRY 31286 non-null object\n",
+ "LVGI 31286 non-null float64\n",
+ "PEG 31286 non-null float64\n",
+ "ROC20 31286 non-null float64\n",
+ "ROC60 31286 non-null float64\n",
+ "VOL20 31286 non-null float64\n",
+ "Volume1M 31286 non-null float64\n",
+ "book_to_price_ratio 31286 non-null float64\n",
+ "cfo_to_ev 31286 non-null float64\n",
+ "market_cap 31286 non-null float64\n",
+ "roe_ttm 31286 non-null float64\n",
+ "sharpe_ratio_20 31286 non-null float64\n",
+ "total_asset_growth_rate 31286 non-null float64\n",
+ "dtypes: float64(12), object(1)\n",
+ "memory usage: 4.4+ MB\n",
+ "去缺失值:\n",
+ "None\n"
+ ]
+ }
+ ],
+ "source": [
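+ "# DataFrame.info() prints its report itself and returns None, so call it directly\n",
+ "factors.info()\n",
+ "print('After filling missing values:')\n",
+ "factors2.info()"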
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
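+ "# 4. Using factor returns after symmetric orthogonalization as the prediction target\n",
+ "\n",
+ "Factor timing allocates factor weights dynamically: factors expected to work receive larger weights, while factors expected to fail receive smaller weights or are dropped, with the aim of raising portfolio returns. The choice of effectiveness measure is therefore especially important. IC, IR, and factor returns can all be used to measure factor effectiveness. However, the factor IC is a rank correlation coefficient, and it is hard to find explanatory variables that drive a correlation coefficient. Factor returns, by contrast, are analogous to the return of a long-short stock portfolio, so useful explanatory variables are more likely to be found. We therefore **measure factor effectiveness by the factor return (the slope of the cross-sectional regression on the factor)**. Before the regression, the factors are symmetrically orthogonalized so that the cross-sectional factor values are pairwise orthogonal. **This ensures that, when each factor's return is estimated by regression, no style exposure is repeated across factors, so the estimated factor returns are more representative and better suited as the target variable of a classification model.**\n",
+ " \n",
+ "## 4.1 Cross-sectional factor correlations before and after symmetric orthogonalization"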
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 23,
+ "metadata": {
+ "code_folding": [
+ 1
+ ]
+ },
+ "outputs": [],
+ "source": [
+ "# Mean of the cross-sectional correlation matrices of the factor exposures\n",
+ "def get_relations(datas: pd.DataFrame)->pd.DataFrame:\n",
+ "    '''\n",
+ "    datas: MultiIndex (date, code); columns are factor names\n",
+ "    '''\n",
+ "    dates = set(datas.index.get_level_values(0))\n",
+ "\n",
+ "    relations = 0\n",
+ "\n",
+ "    for date,data in datas.groupby(level='date'):\n",
+ "\n",
+ "        # data is one cross-section; keep numeric columns only (drops INDUSTRY)\n",
+ "        relations = relations + data.select_dtypes(include=[np.number]).corr()\n",
+ "\n",
+ "    return relations / len(dates) # average over all cross-sections\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 24,
+ "metadata": {
+ "code_folding": [
+ 4
+ ]
+ },
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ ""
+ ]
+ },
+ "execution_count": 24,
+ "metadata": {},
+ "output_type": "execute_result"
+ },
+ {
+ "data": {
+ "image/png": "\n",
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "# Heatmap of the factor correlations before orthogonalization\n",
+ "fig = plt.figure(figsize=(26, 7))\n",
+ "relations = get_relations(factors3) # correlation matrix before symmetric orthogonalization\n",
+ "\n",
+ "sns.heatmap(\n",
+ "    relations,\n",
+ "    annot=True,\n",
+ "    linewidths=0.05,\n",
+ "    linecolor='white',\n",
+ "    annot_kws={\n",
+ "        'size': 8,\n",
+ "        'weight': 'bold'\n",
+ "    })"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 25,
+ "metadata": {
+ "code_folding": [
+ 0
+ ]
+ },
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ ""
+ ]
+ },
+ "execution_count": 25,
+ "metadata": {},
+ "output_type": "execute_result"
+ },
+ {
+ "data": {
+ "image/png": "\n",
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "# Heatmap of the factor correlations after orthogonalization\n",
+ "fig = plt.figure(figsize=(18, 8))\n",
+ "relations = get_relations(factors4) # correlation matrix after symmetric orthogonalization\n",
+ "\n",
+ "sns.heatmap(\n",
+ "    relations,\n",
+ "    annot=True,\n",
+ "    linewidths=0.05,\n",
+ "    linecolor='white',\n",
+ "    annot_kws={\n",
+ "        'size': 8,\n",
+ "        'weight': 'bold'\n",
+ "    })"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
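+ "The heatmap of the original cross-sectional factor exposures (after winsorization, missing-value filling, and standardization) shows clear collinearity between some of the factors. After symmetric orthogonalization, the cross-sectional correlations drop to essentially zero and the collinearity is largely removed."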
+ ]
+ },
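+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "The effect described above can be reproduced on synthetic data. The cell below is a minimal sketch under stated assumptions: the matrix `F` is random data standing in for one standardized cross-section (it is not the notebook's factor data), and `np.linalg.eigh` is used because $F^TF$ is symmetric. With $S = U D^{-1/2} U^T$, the transformed matrix $FS$ should have an identity Gram matrix up to floating-point error."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import numpy as np\n",
+ "\n",
+ "# Synthetic cross-section: 300 stocks, 4 factors, with deliberate collinearity\n",
+ "rng = np.random.default_rng(0)\n",
+ "n_stocks, n_factors = 300, 4\n",
+ "F = rng.standard_normal((n_stocks, n_factors))\n",
+ "F[:, 1] += 0.8 * F[:, 0]  # make factor 1 correlated with factor 0\n",
+ "F = (F - F.mean(axis=0)) / F.std(axis=0)  # z-score each column\n",
+ "\n",
+ "# Symmetric orthogonalization: S = U D^(-1/2) U^T, where F^T F = U D U^T\n",
+ "M = F.T @ F\n",
+ "a, U = np.linalg.eigh(M)\n",
+ "S = U @ np.diag(1.0 / np.sqrt(a)) @ U.T\n",
+ "F_orth = F @ S\n",
+ "\n",
+ "# Before: the off-diagonal correlation between factors 0 and 1 is clearly nonzero\n",
+ "print('corr(f0, f1) before:', np.corrcoef(F, rowvar=False)[0, 1])\n",
+ "\n",
+ "# After: the Gram matrix of the transformed factors is the identity\n",
+ "gram = F_orth.T @ F_orth\n",
+ "print('Gram matrix is identity:', np.allclose(gram, np.eye(n_factors)))"
+ ]
+ },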
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
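+ "## 4.2 Computing factor returns after symmetric orthogonalization\n",
+ "\n",
+ "Factor returns are estimated by `cross-sectional regression`. When regressing on each factor, SW level-1 industry dummies and the market-cap factor are added to strip out industry and size effects; for the size factor itself (log market cap), only the industry effect is removed. Note that the `Neu_Ret` function below includes only the industry dummies as controls: the `cap` column is merged into the dataset but is not used as a regressor."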
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 11,
+ "metadata": {
+ "code_folding": [
+ 1
+ ]
+ },
+ "outputs": [],
+ "source": [
+ "# Get the market cap and compute the next-period log return\n",
+ "def GetRetCap(factors: pd.DataFrame) -> pd.DataFrame:\n",
+ "    '''\n",
+ "    factors: factor DataFrame with MultiIndex (date, code)\n",
+ "    ==================\n",
+ "    return: DataFrame with columns cap, log_ret\n",
+ "    '''\n",
+ "\n",
+ "    dates = [x.date() for x in factors.index.levels[0]]\n",
+ "    end_date = max(dates)\n",
+ "\n",
+ "    # Append the last calendar day of the month after the final rebalancing\n",
+ "    # date, so that the final period also has an end point\n",
+ "    target = end_date + relativedelta(months=1)\n",
+ "    monthCountDay = calendar.monthrange(target.year, target.month)[1]\n",
+ "    offset_day = datetime.date(target.year, target.month, day=monthCountDay)\n",
+ "    dates.append(offset_day)\n",
+ "\n",
+ "    datas_dic = {}\n",
+ "    n = len(dates) - 1\n",
+ "\n",
+ "    for i in tqdm(range(n), desc='DownLoad NetRet'):\n",
+ "\n",
+ "        date = dates[i]\n",
+ "\n",
+ "        date_next = dates[i + 1]\n",
+ "\n",
+ "        stocks = factors.loc[date].index.tolist() # stock universe on this date\n",
+ "\n",
+ "        # Closing prices at the start and end of the period\n",
+ "        close_df = get_price(\n",
+ "            stocks, end_date=date, count=1, fields='close',\n",
+ "            panel=False).set_index('code')['close']\n",
+ "\n",
+ "        next_df = get_price(\n",
+ "            stocks, end_date=date_next, count=1, fields='close',\n",
+ "            panel=False).set_index('code')['close']\n",
+ "\n",
+ "        df = np.log(next_df / close_df) # log return over the period\n",
+ "        df = df.to_frame('log_ret')\n",
+ "\n",
+ "        # Total market capitalization\n",
+ "        df['cap'] = get_valuation(\n",
+ "            stocks, end_date=date, fields='market_cap',\n",
+ "            count=1).set_index('code')['market_cap']\n",
+ "\n",
+ "        # Store this cross-section\n",
+ "        datas_dic[date] = df[['cap', 'log_ret']]\n",
+ "\n",
+ "    datas_df = pd.concat(datas_dic)\n",
+ "    datas_df.index.names = ['date','code']\n",
+ "    datas_df = datas_df.reset_index()\n",
+ "    datas_df['date'] = pd.to_datetime(datas_df['date'])\n",
+ "    datas_df.set_index(['date','code'],inplace=True)\n",
+ "\n",
+ "    return datas_df"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 67,
+ "metadata": {
+ "scrolled": false
+ },
+ "outputs": [
+ {
+ "name": "stderr",
+ "output_type": "stream",
+ "text": [
+ "DownLoad NetRet: 100%|██████████| 108/108 [00:22<00:00, 4.60it/s]\n"
+ ]
+ }
+ ],
+ "source": [
+ "ret_cap = GetRetCap(factors) # computing the log returns is slow, so this cell takes a while to run"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 70,
+ "metadata": {
+ "code_folding": []
+ },
+ "outputs": [],
+ "source": [
+ "# Save the data after symmetric orthogonalization\n",
+ "datas_all = pd.concat([factors4, ret_cap], axis=1, join='inner') # merge factors with cap and returns\n",
+ "datas_all.index.names = ['date', 'code']\n",
+ "datas_all.to_csv('../Data/SVM_timing_datas.csv') # write to the data file\n",
+ "\n",
+ "# Save the data before symmetric orthogonalization\n",
+ "datas_all = pd.concat([factors3, ret_cap], axis=1, join='inner') # merge factors with cap and returns\n",
+ "datas_all.index.names = ['date', 'code']\n",
+ "datas_all.to_csv('../Data/noorth_SVM_timing_datas.csv') # write to the data file"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 12,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "<div>\n",
+ "<table border=\"1\" class=\"dataframe\">\n",
+ " <thead>\n",
+ " <tr><th>date</th><th>code</th><th>INDUSTRY</th><th>LVGI</th><th>PEG</th><th>ROC20</th><th>ROC60</th><th>VOL20</th><th>Volume1M</th><th>book_to_price_ratio</th><th>cfo_to_ev</th><th>market_cap</th><th>roe_ttm</th><th>sharpe_ratio_20</th><th>total_asset_growth_rate</th><th>cap</th><th>log_ret</th></tr>\n",
+ " </thead>\n",
+ " <tbody>\n",
+ " <tr><th rowspan=\"5\">2019-12-31</th><th>603501.XSHG</th><td>有色金属I</td><td>0.022625</td><td>-0.036422</td><td>-0.167140</td><td>0.168606</td><td>0.106633</td><td>-0.024207</td><td>-0.075915</td><td>-0.039279</td><td>0.063812</td><td>-0.143146</td><td>-0.024140</td><td>0.127055</td><td>1238.4915</td><td>0.257686</td></tr>\n",
+ " <tr><th>603833.XSHG</th><td>有色金属I</td><td>-0.081008</td><td>0.048392</td><td>0.117325</td><td>-0.033240</td><td>0.031121</td><td>-0.071707</td><td>-0.017307</td><td>0.012956</td><td>-0.041966</td><td>0.041899</td><td>0.001852</td><td>0.119194</td><td>491.5991</td><td>-0.034786</td></tr>\n",
+ " <tr><th>603899.XSHG</th><td>有色金属I</td><td>0.068465</td><td>0.020370</td><td>0.013386</td><td>0.025082</td><td>-0.057924</td><td>-0.005205</td><td>-0.045873</td><td>-0.011351</td><td>-0.053812</td><td>0.078045</td><td>-0.045636</td><td>0.077171</td><td>448.4080</td><td>0.027533</td></tr>\n",
+ " <tr><th>603986.XSHG</th><td>有色金属I</td><td>0.011945</td><td>0.145880</td><td>0.091352</td><td>0.130843</td><td>0.078486</td><td>-0.002135</td><td>-0.007030</td><td>-0.011422</td><td>-0.011045</td><td>-0.031203</td><td>-0.104375</td><td>0.134126</td><td>657.8523</td><td>0.322953</td></tr>\n",
+ " <tr><th>603993.XSHG</th><td>有色金属I</td><td>-0.029949</td><td>-0.020104</td><td>0.084894</td><td>0.019199</td><td>-0.024283</td><td>0.086690</td><td>-0.033563</td><td>-0.043170</td><td>0.025330</td><td>-0.067603</td><td>0.075067</td><td>0.006625</td><td>941.7269</td><td>-0.056619</td></tr>\n",
+ " </tbody>\n",
+ "</table>\n",
+ "</div>"
+ ],
+ "text/plain": [
+ " INDUSTRY LVGI ... cap log_ret\n",
+ "date code ... \n",
+ "2019-12-31 603501.XSHG 有色金属I 0.022625 ... 1238.4915 0.257686\n",
+ " 603833.XSHG 有色金属I -0.081008 ... 491.5991 -0.034786\n",
+ " 603899.XSHG 有色金属I 0.068465 ... 448.4080 0.027533\n",
+ " 603986.XSHG 有色金属I 0.011945 ... 657.8523 0.322953\n",
+ " 603993.XSHG 有色金属I -0.029949 ... 941.7269 -0.056619\n",
+ "\n",
+ "[5 rows x 15 columns]"
+ ]
+ },
+ "execution_count": 12,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "# Load the orthogonalized dataset used to compute the factor returns\n",
+ "datas_all = pd.read_csv(\n",
+ "    '../Data/SVM_timing_datas.csv', index_col=[0, 1]) \n",
+ "datas_all.tail()"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 13,
+ "metadata": {
+ "code_folding": [
+ 1
+ ]
+ },
+ "outputs": [],
+ "source": [
+ "# Compute the factor returns\n",
+ "def Neu_Ret(datas_all:pd.DataFrame)->pd.DataFrame:\n",
+ "\n",
+ "    '''\n",
+ "    datas_all: DataFrame with MultiIndex (date, code)\n",
+ "    ==========\n",
+ "    return: DataFrame of factor returns\n",
+ "            index: date, columns: factor names\n",
+ "    '''\n",
+ "\n",
+ "    # Factor column names\n",
+ "    factors = [\n",
+ "        i for i in datas_all.columns.tolist()\n",
+ "        if i not in ['INDUSTRY', 'cap', 'log_ret']\n",
+ "    ]\n",
+ "\n",
+ "    dates = datas_all.index.get_level_values(0).unique()\n",
+ "\n",
+ "    # Empty factor-return matrix\n",
+ "    ret = pd.DataFrame(index=dates, columns=factors)\n",
+ "\n",
+ "    for trDate,data in datas_all.groupby(level='date'):\n",
+ "\n",
+ "        data = data.fillna(data.mean(numeric_only=True)) # fill missing values with the column mean\n",
+ "        dfswsdummies = pd.get_dummies(data['INDUSTRY']) # SW level-1 industry dummies\n",
+ "        Y = data[['log_ret']] # dependent variable\n",
+ "\n",
+ "        for label,factor_df in data.loc[:,factors].items():\n",
+ "\n",
+ "            X = pd.concat([factor_df,dfswsdummies],axis=1) # regressors: the factor, then the dummies\n",
+ "            # The factor is the first regressor, so its coefficient is the factor return\n",
+ "            ret.loc[trDate,label] = sm.regression.linear_model.OLS(\n",
+ "                Y, X, missing='drop').fit().params.iloc[0]\n",
+ "\n",
+ "    return ret"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 14,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "<div>\n",
+ "<table border=\"1\" class=\"dataframe\">\n",
+ " <thead>\n",
+ " <tr><th>date</th><th>LVGI</th><th>PEG</th><th>ROC20</th><th>ROC60</th><th>VOL20</th><th>Volume1M</th><th>book_to_price_ratio</th><th>cfo_to_ev</th><th>market_cap</th><th>roe_ttm</th><th>sharpe_ratio_20</th><th>total_asset_growth_rate</th></tr>\n",
+ " </thead>\n",
+ " <tbody>\n",
+ " <tr><th>2019-08-30</th><td>0.0316327</td><td>-0.0973586</td><td>-0.0374093</td><td>-0.0431525</td><td>0.231561</td><td>-0.122654</td><td>0.150164</td><td>0.0259805</td><td>-0.0801106</td><td>0.0379536</td><td>-0.0885412</td><td>0.118475</td></tr>\n",
+ " <tr><th>2019-09-30</th><td>0.148354</td><td>-0.0264201</td><td>0.0633897</td><td>0.211222</td><td>-0.223375</td><td>-0.110421</td><td>0.0286347</td><td>0.0115868</td><td>0.136627</td><td>0.215342</td><td>0.178958</td><td>0.0819617</td></tr>\n",
+ " <tr><th>2019-10-31</th><td>-0.0367791</td><td>0.011632</td><td>-0.11648</td><td>0.124841</td><td>-0.0176064</td><td>0.122933</td><td>0.0141714</td><td>0.130662</td><td>-0.19441</td><td>-0.0734863</td><td>-0.00182098</td><td>0.0441613</td></tr>\n",
+ " <tr><th>2019-11-29</th><td>-0.19865</td><td>-0.102524</td><td>0.151257</td><td>0.0527727</td><td>0.326576</td><td>0.109926</td><td>-0.0871529</td><td>0.0921143</td><td>-0.104464</td><td>-0.124047</td><td>-0.0897981</td><td>0.248464</td></tr>\n",
+ " <tr><th>2019-12-31</th><td>0.295504</td><td>0.173699</td><td>-0.0557015</td><td>0.256275</td><td>0.340828</td><td>-0.101772</td><td>-0.391574</td><td>-0.0209924</td><td>-0.0792407</td><td>-0.0346713</td><td>-0.297258</td><td>0.275343</td></tr>\n",
+ " </tbody>\n",
+ "</table>\n",
+ "</div>"
+ ],
+ "text/plain": [
+ " LVGI ... total_asset_growth_rate\n",
+ "date ... \n",
+ "2019-08-30 0.0316327 ... 0.118475\n",
+ "2019-09-30 0.148354 ... 0.0819617\n",
+ "2019-10-31 -0.0367791 ... 0.0441613\n",
+ "2019-11-29 -0.19865 ... 0.248464\n",
+ "2019-12-31 0.295504 ... 0.275343\n",
+ "\n",
+ "[5 rows x 12 columns]"
+ ]
+ },
+ "execution_count": 14,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "Ret_mat = Neu_Ret(datas_all)  # factor returns computed from the symmetrically orthogonalized factors\n",
+ "Ret_mat.tail()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
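+ "# 5. Selecting the timing variables\n",
+ "\n",
+ "We look for variables that can explain and predict factor returns along four dimensions: `macroeconomic environment`, `monetary policy`, `market state variables`, and `the factor's own return and volatility`. The timing variables finally selected are listed below, where:\n",
+ "\n",
+ "+ Macroeconomic environment: measures whether the economy is healthy and whether there is inflation risk;\n",
+ "\n",
+ "\n",
+ "+ Monetary policy: reflects the direction of the state's macroeconomic regulation;\n",
+ "\n",
+ "\n",
+ "+ Market state variables: measure the state of the stock or bond market;\n",
+ "\n",
+ "\n",
+ "+ The factor's own return and volatility: changes in a factor's own return also matter for predicting whether it will remain effective.\n",
+ "\n",
+ "\n",
+ "**Timing variable details:**\n",
+ "\n",
+ "|Type|Code|Name|Notes|\n",
+ "|:---:|:---:|:---:|:---:|\n",
+ "|Monetary policy|Bond_yield_3M|3-month government bond yield|-|\n",
+ "|Monetary policy|M1|M1 money supply, year-over-year growth|-|\n",
+ "|Economic environment|CPI|CPI, year-over-year growth|-|\n",
+ "|Economic environment|PPI|PPI, year-over-year growth|-|\n",
+ "|Market state|Risk|Risk premium|Dividend yield - 10-year government bond yield to maturity|\n",
+ "|Market state|TS|Term spread|10-year government bond yield to maturity - 1-year government bond yield to maturity|\n",
+ "|Market state|CS|Credit spread|1-year ChinaBond medium/short-term note yield to maturity - 1-year government bond yield to maturity|\n",
+ "|Market state|RET_300|CSI 300 return|Monthly|\n",
+ "|Market state|RET_1000|CSI 1000 return|Monthly|\n",
+ "|Market state|STD_300|CSI 300 volatility|Monthly|\n",
+ "|Market state|STD_1000|CSI 1000 volatility|Monthly|\n",
+ "|Market state|RET_Spread|Large-cap minus small-cap return|RET_300-RET_1000|\n",
+ "|Market state|STD_Spread|Large-cap minus small-cap volatility|STD_300-STD_1000|\n",
+ "|Derived from factor returns|Ret_Factor|Factor return|6-month weighted moving average|\n",
+ "|Derived from factor returns|Std_Factor|Factor volatility|6-month weighted moving average|\n"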
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 15,
+ "metadata": {
+ "code_folding": [
+ 1,
+ 29,
+ 48,
+ 68,
+ 79,
+ 93,
+ 105,
+ 124,
+ 143,
+ 155,
+ 166,
+ 193,
+ 226,
+ 259
+ ]
+ },
+ "outputs": [],
+ "source": [
+ "# Fetch the timing variables\n",
+ "def get_Timing_variables(begdate: str, enddate: str) -> pd.DataFrame:\n",
+ "\n",
+ "    Bond_yield_3M = Get_Bond_yield_3M(begdate, enddate)\n",
+ "    M1 = GetMacro_M1(begdate, enddate)\n",
+ "    CPI = GetMacro_CPI(begdate, enddate)\n",
+ "    PPI = GetMacro_PPI(begdate, enddate)\n",
+ "\n",
+ "    RISK = GetRiskIndex(begdate, enddate)\n",
+ "    TS = Get_TS(begdate, enddate)\n",
+ "    CS = Get_CS(begdate, enddate)\n",
+ "    RET_300 = GetIndexMRet('000300.XSHG', begdate, enddate, 'RET_300')\n",
+ "    RET_1000 = GetIndexMRet('000852.XSHG', begdate, enddate, 'RET_1000')\n",
+ "\n",
+ "    STD_300 = GetIndexMSTD('000300.XSHG', begdate, enddate, 'STD_300')\n",
+ "    STD_1000 = GetIndexMSTD('000852.XSHG', begdate, enddate, 'STD_1000')\n",
+ "\n",
+ "    RET_Spread = RET_300['RET_300'] - RET_1000['RET_1000']\n",
+ "    RET_Spread.name = 'RET_Spread'\n",
+ "\n",
+ "    STD_Spread = STD_300['STD_300'] - STD_1000['STD_1000']\n",
+ "    STD_Spread.name = 'STD_Spread'\n",
+ "\n",
+ "    return pd.concat([\n",
+ "        Bond_yield_3M, M1, CPI, PPI, TS, CS, RET_300, RET_1000, STD_300,\n",
+ "        STD_1000, RET_Spread, STD_Spread, RISK\n",
+ "    ],\n",
+ "                     axis=1)\n",
+ " \n",
+ "def GetMacro_M1(start_date: str, end_date: str) -> pd.DataFrame:\n",
+ "    '''\n",
+ "    start_date/end_date: YYYY-MM-DD\n",
+ "    '''\n",
+ "    # Fetch M1 year-over-year growth\n",
+ "    q = query(macro.MAC_MONEY_SUPPLY_MONTH.stat_month,\n",
+ "              macro.MAC_MONEY_SUPPLY_MONTH.m1_yoy).filter(\n",
+ "        macro.MAC_MONEY_SUPPLY_MONTH.stat_month >= start_date[:7],\n",
+ "        macro.MAC_MONEY_SUPPLY_MONTH.stat_month <= end_date[:7])\n",
+ "\n",
+ "    df = macro.run_query(q)\n",
+ "\n",
+ "    df['stat_month'] = pd.to_datetime(df['stat_month'])\n",
+ "\n",
+ "    dates = GetTradePerid(start_date, end_date)\n",
+ "    idx = pd.DataFrame({'stat_month': pd.to_datetime(dates)})\n",
+ "\n",
+ "    return pd.merge_asof(idx, df.sort_values('stat_month'), on='stat_month').set_index('stat_month')\n",
+ "\n",
+ "def GetMacro_CPI(start_date: str, end_date: str) -> pd.DataFrame:\n",
+ "    '''\n",
+ "    start_date/end_date: YYYY-MM-DD\n",
+ "    '''\n",
+ "    # Fetch CPI year-over-year growth\n",
+ "    q = query(macro.MAC_CPI_MONTH.stat_month,\n",
+ "              macro.MAC_CPI_MONTH.yoy).filter(\n",
+ "        macro.MAC_CPI_MONTH.stat_month >= start_date[:7],\n",
+ "        macro.MAC_CPI_MONTH.stat_month <= end_date[:7])\n",
+ "\n",
+ "    df = macro.run_query(q)\n",
+ "    df.rename(columns={'yoy': 'cpi_yoy'}, inplace=True)\n",
+ "    df['stat_month'] = pd.to_datetime(df['stat_month'])\n",
+ "\n",
+ "    dates = GetTradePerid(start_date, end_date)\n",
+ "    idx = pd.DataFrame({'stat_month': pd.to_datetime(dates)})\n",
+ "\n",
+ "    return pd.merge_asof(idx, df.sort_values('stat_month'), on='stat_month').set_index('stat_month')\n",
+ "\n",
+ "\n",
+ "def GetMacro_PPI(start_date: str, end_date: str) -> pd.DataFrame:\n",
+ "\n",
+ "    df = macro_china_ppi_yearly()\n",
+ "    df = df.loc[start_date:end_date]\n",
+ "    df.index.names = ['date']\n",
+ "\n",
+ "    dates = GetTradePerid(start_date, end_date)\n",
+ "    idx = pd.DataFrame({'date': pd.to_datetime(dates)})\n",
+ "\n",
+ "    return pd.merge_asof(idx, df.reset_index(), on='date').set_index('date')\n",
+ "\n",
+ "def GetRiskIndex(start_date: str, end_date: str) -> pd.DataFrame:\n",
+ "\n",
+ "    # 10-year government bond yield to maturity\n",
+ "    yeild = get_bond_yield(start_date, end_date, 10, 'hzsylqx')\n",
+ "    yeild.columns = ['n', 'date', 'yeild']\n",
+ "    yeild.index = pd.to_datetime(yeild['date'])\n",
+ "\n",
+ "    # Dividend yield: the CSI Dividend index (000922) stands in for the whole market\n",
+ "    DividendRatio = getIndexDividendRatio('000922', start_date, end_date)\n",
+ "    risk = DividendRatio['DividendRatio'] - yeild['yeild']\n",
+ "    dates = GetTradePerid(start_date, end_date)\n",
+ "    risk = risk.reindex(dates).to_frame('Risk')\n",
+ "    return risk.fillna(0)\n",
+ "\n",
+ "def Get_Bond_yield_3M(start_date: str, end_date: str) -> pd.DataFrame:\n",
+ "    '''\n",
+ "    Unit: %\n",
+ "    '''\n",
+ "    df = get_bond_yield(start_date, end_date, 0.25, 'hzsylqx')\n",
+ "    df.columns = ['n', 'date', 'yeild']\n",
+ "    df.index = pd.to_datetime(df['date'])\n",
+ "    dates = GetTradePerid(start_date, end_date)\n",
+ "\n",
+ "    return df.reindex(dates)['yeild'].to_frame('Bond_yield_3M')\n",
+ "\n",
+ "\n",
+ "def Get_TS(start_date: str, end_date: str) -> pd.DataFrame:\n",
+ "    '''\n",
+ "    Term spread (10-year minus 1-year government bond yield). Unit: %\n",
+ "    '''\n",
+ "    long = get_bond_yield(start_date, end_date, 10, 'hzsylqx')\n",
+ "    long.columns = ['n', 'date', 'yeild']\n",
+ "    short = get_bond_yield(start_date, end_date, 1, 'hzsylqx')\n",
+ "    short.columns = ['n', 'date', 'yeild']\n",
+ "\n",
+ "    long = long.set_index('date')\n",
+ "    short = short.set_index('date')\n",
+ "\n",
+ "    ts_ser = long['yeild'] - short['yeild']\n",
+ "    ts_ser.index = pd.to_datetime(ts_ser.index)\n",
+ "    dates = GetTradePerid(start_date, end_date)\n",
+ "\n",
+ "    return ts_ser.reindex(dates).to_frame('TS')\n",
+ "\n",
+ "\n",
+ "def Get_CS(start_date: str, end_date: str) -> pd.DataFrame:\n",
+ "    '''\n",
+ "    Credit spread (1-year 'syyhsylqx' curve minus 1-year government bond yield). Unit: %\n",
+ "    '''\n",
+ "    long = get_bond_yield(start_date, end_date, 1, 'syyhsylqx')\n",
+ "    long.columns = ['n', 'date', 'yeild']\n",
+ "    short = get_bond_yield(start_date, end_date, 1, 'hzsylqx')\n",
+ "    short.columns = ['n', 'date', 'yeild']\n",
+ "\n",
+ "    long = long.set_index('date')\n",
+ "    short = short.set_index('date')\n",
+ "\n",
+ "    ts_ser = long['yeild'] - short['yeild']\n",
+ "    ts_ser.index = pd.to_datetime(ts_ser.index)\n",
+ "    dates = GetTradePerid(start_date, end_date)\n",
+ "\n",
+ "    return ts_ser.reindex(dates).to_frame('CS')\n",
+ "\n",
+ "\n",
+ "def GetIndexMRet(symbol: str, start_date: str, end_date: str, name: str) -> pd.DataFrame:\n",
+ "    '''\n",
+ "    name: column name for the return series\n",
+ "    '''\n",
+ "    begin_date = pd.date_range(end=start_date, periods=1, freq='M')[0].strftime('%Y-%m-%d')\n",
+ "    dates = GetTradePerid(begin_date, end_date)\n",
+ "    index_price = get_price(symbol, begin_date, end_date, fields='close', panel=False)\n",
+ "\n",
+ "    index_price = index_price.reindex(dates)\n",
+ "\n",
+ "    return index_price['close'].pct_change().dropna().to_frame(name)\n",
+ "\n",
+ "def GetIndexMSTD(symbol: str, start_date: str, end_date: str, name: str) -> pd.DataFrame:\n",
+ "\n",
+ "    index_price = get_price(symbol, start_date, end_date, fields='close', panel=False)\n",
+ "    std_df = index_price.groupby(pd.Grouper(freq='M'), as_index=False).apply(lambda x: np.std(x.pct_change()) * np.sqrt(20))\n",
+ "    dates = GetTradePerid(start_date, end_date)\n",
+ "    std_df.index = dates\n",
+ "    return std_df.rename(columns={'close': name})\n",
+ " \n",
+ "\n",
+ "# Returns the full history in a single call\n",
+ "# Jin10 Data Center > Economic indicators > China > National economy > Price level > China PPI year-over-year report\n",
+ "def macro_china_ppi_yearly():\n",
+ "    \"\"\"\n",
+ "    China PPI year-over-year data, available from 1995-08-01 to the present\n",
+ "    https://datacenter.jin10.com/reportType/dc_chinese_ppi_yoy\n",
+ "    :return: pandas.Series\n",
+ "    \"\"\"\n",
+ "    t = time.time()\n",
+ "\n",
+ "    JS_CHINA_PPI_YEARLY_URL = (\n",
+ "        \"https://cdn.jin10.com/dc/reports/dc_chinese_ppi_yoy_all.js?v={}&_={}\")\n",
+ "\n",
+ "    res = requests.get(\n",
+ "        JS_CHINA_PPI_YEARLY_URL.format(\n",
+ "            str(int(round(t * 1000))), str(int(round(t * 1000)) + 90)\n",
+ "        )\n",
+ "    )\n",
+ "    json_data = json.loads(res.text[res.text.find(\"{\") : res.text.rfind(\"}\") + 1])\n",
+ "    date_list = [item[\"date\"] for item in json_data[\"list\"]]\n",
+ "    value_list = [item[\"datas\"][\"中国PPI年率报告\"] for item in json_data[\"list\"]]\n",
+ "    value_df = pd.DataFrame(value_list)\n",
+ "    value_df.columns = json_data[\"kinds\"]\n",
+ "    value_df.index = pd.to_datetime(date_list)\n",
+ "    temp_df = value_df[\"今值(%)\"]\n",
+ "    temp_df.name = \"ppi\"\n",
+ "    return temp_df\n",
+ "\n",
+ "\n",
+ "def bond_china_yield(start_date: str, end_date: str, gjqx: float, qxId: str = \"hzsylqx\"):\n",
+ "    \"\"\"\n",
+ "    ChinaBond (chinabond.com.cn): yield curves for government and other bonds\n",
+ "    https://www.chinabond.com.cn/\n",
+ "    http://yield.chinabond.com.cn/cbweb-pbc-web/pbc/historyQuery?startDate=2019-02-07&endDate=2020-02-04&gjqx=0&qxId=ycqx&locale=cn_ZH\n",
+ "    Note: end_date - start_date should be less than one year\n",
+ "    :param start_date: query start date; data within one year after this date is returned\n",
+ "    gjqx: tenor of the yield, in years\n",
+ "    qxId: hzsylqx = ChinaBond government bond yield curve, syyhsylqx = ChinaBond commercial bank ordinary bond yield curve, zdqpjsylqx = ChinaBond short-term note yield curve\n",
+ "    :type start_date: str\n",
+ "    :param end_date: query end date; data within one year before this date is returned\n",
+ "    :type end_date: str\n",
+ "    :return: data between the given dates (at most one year)\n",
+ "    :rtype: pandas.DataFrame\n",
+ "    \"\"\"\n",
+ "    url = \"http://yield.chinabond.com.cn/cbweb-pbc-web/pbc/historyQuery\"\n",
+ "    params = {\n",
+ "        \"startDate\": start_date,\n",
+ "        \"endDate\": end_date,\n",
+ "        \"gjqx\": str(gjqx),\n",
+ "        \"qxId\": qxId,\n",
+ "        \"locale\": \"cn_ZH\",\n",
+ "    }\n",
+ "    headers = {\n",
+ "        \"User-Agent\":\n",
+ "        \"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.130 Safari/537.36\",\n",
+ "    }\n",
+ "    res = requests.get(url, params=params, headers=headers)\n",
+ "    data_text = res.text.replace(\"&nbsp\", \"\")  # strip non-breaking-space entities before parsing\n",
+ "    data_df = pd.read_html(data_text, header=0)[1]\n",
+ "    return data_df\n",
+ "\n",
+ "\n",
+ "def get_bond_yield(start_date: str, end_date: str, periods: float, bond_type: str):\n",
+ "    '''\n",
+ "    periods: bond tenor in years (0.25 for 3 months)\n",
+ "    bond_type: yield-curve id passed to bond_china_yield as qxId\n",
+ "    '''\n",
+ "    dates = get_trade_days(start_date, end_date)\n",
+ "    n_days = len(dates)\n",
+ "    limit = 244\n",
+ "\n",
+ "    if n_days > limit:\n",
+ "\n",
+ "        n = n_days // limit  # used below as the step, so each request covers n trading days\n",
+ "        df_list = []\n",
+ "        i = 0\n",
+ "        pos1, pos2 = n * i, n * (i + 1) - 1\n",
+ "        while pos2 < n_days:\n",
+ "            #print(pos2)\n",
+ "            df = bond_china_yield(start_date=dates[pos1], end_date=dates[pos2], gjqx=periods, qxId=bond_type)\n",
+ "            df_list.append(df)\n",
+ "            i += 1\n",
+ "            pos1, pos2 = n * i, n * (i + 1) - 1\n",
+ "\n",
+ "        if pos1 < n_days:\n",
+ "            df = bond_china_yield(start_date=dates[pos1], end_date=dates[-1], gjqx=periods, qxId=bond_type)\n",
+ "            df_list.append(df)\n",
+ "        df = pd.concat(df_list, axis=0)\n",
+ "    else:\n",
+ "        df = bond_china_yield(start_date=start_date, end_date=end_date, gjqx=periods, qxId=bond_type)\n",
+ "\n",
+ "    return df.dropna(axis=1)\n",
+ "\n",
+ "# Query an index's dividend yield\n",
+ "# (valuation-derived data)\n",
+ "def getIndexDividendRatio(symbol: str, start_date: str, end_date: str):\n",
+ "\n",
+ "    # Look up the internal code of the index\n",
+ "    InnerCode_id = jy.run_query(\n",
+ "        query(jy.LC_IndexRelationship.InnerCode).filter(\n",
+ "            jy.LC_IndexRelationship.SecuCode == symbol))['InnerCode'][0]\n",
+ "\n",
+ "    # Query the data\n",
+ "    q = query(jy.LC_IndexDerivative.TradingDay,\n",
+ "              jy.LC_IndexDerivative.DividendRatio).filter(\n",
+ "        jy.LC_IndexDerivative.IndexCode == InnerCode_id,\n",
+ "        jy.LC_IndexDerivative.TradingDay >= start_date,\n",
+ "        jy.LC_IndexDerivative.TradingDay <= end_date)\n",
+ "\n",
+ "    return jy.run_query(q).set_index('TradingDay')"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 189,
+ "metadata": {
+ "code_folding": [
+ 0
+ ]
+ },
+ "outputs": [],
+ "source": [
+ "## Set the backtest period\n",
+ "start_date = '2014-01-01'\n",
+ "end_date = '2019-12-31'\n",
+ "\n",
+ "# Extend the start back by 36 months\n",
+ "begin_date = pd.date_range(end=start_date, periods=36, freq='M')[0].strftime('%Y-%m-%d')\n",
+ "\n",
+ "df_m_shifted_ = get_Timing_variables(begin_date, end_date)\n",
+ "df_m_shifted_.to_csv('../Data/Timing_variables.csv')"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 16,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "<table border=\"1\" class=\"dataframe\">\n",
+ " <thead>\n",
+ " <tr><th></th><th>Bond_yield_3M</th><th>m1_yoy</th><th>cpi_yoy</th><th>ppi</th><th>TS</th><th>CS</th><th>RET_300</th><th>RET_1000</th><th>STD_300</th><th>STD_1000</th><th>RET_Spread</th><th>STD_Spread</th><th>Risk</th></tr>\n",
+ " </thead>\n",
+ " <tbody>\n",
+ " <tr><th>2019-08-30</th><td>2.4480</td><td>3.4</td><td>0.031</td><td>-0.3</td><td>0.4394</td><td>0.3891</td><td>-0.009326</td><td>-0.003294</td><td>0.047808</td><td>0.056339</td><td>-0.006032</td><td>-0.008531</td><td>1.3108</td></tr>\n",
+ " <tr><th>2019-09-30</th><td>2.2846</td><td>3.4</td><td>0.028</td><td>-0.8</td><td>0.5783</td><td>0.4551</td><td>0.003932</td><td>0.009537</td><td>0.034507</td><td>0.060117</td><td>-0.005605</td><td>-0.025610</td><td>1.1671</td></tr>\n",
+ " <tr><th>2019-10-31</th><td>2.4819</td><td>3.3</td><td>0.046</td><td>-1.2</td><td>0.6232</td><td>0.4486</td><td>0.018933</td><td>0.004107</td><td>0.028837</td><td>0.050755</td><td>0.014826</td><td>-0.021918</td><td>0.8783</td></tr>\n",
+ " <tr><th>2019-11-29</th><td>2.4335</td><td>3.5</td><td>0.055</td><td>-1.6</td><td>0.5219</td><td>0.4465</td><td>-0.014943</td><td>-0.024224</td><td>0.031677</td><td>0.040715</td><td>0.009281</td><td>-0.009037</td><td>1.0280</td></tr>\n",
+ " <tr><th>2019-12-31</th><td>2.0074</td><td>4.4</td><td>0.053</td><td>-1.4</td><td>0.7734</td><td>0.6224</td><td>0.069975</td><td>0.085909</td><td>0.031468</td><td>0.042280</td><td>-0.015934</td><td>-0.010813</td><td>1.1030</td></tr>\n",
+ " </tbody>\n",
+ "</table>"
+ ],
+ "text/plain": [
+ " Bond_yield_3M m1_yoy ... STD_Spread Risk\n",
+ "2019-08-30 2.4480 3.4 ... -0.008531 1.3108\n",
+ "2019-09-30 2.2846 3.4 ... -0.025610 1.1671\n",
+ "2019-10-31 2.4819 3.3 ... -0.021918 0.8783\n",
+ "2019-11-29 2.4335 3.5 ... -0.009037 1.0280\n",
+ "2019-12-31 2.0074 4.4 ... -0.010813 1.1030\n",
+ "\n",
+ "[5 rows x 13 columns]"
+ ]
+ },
+ "execution_count": 16,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "df_m_shifted_=pd.read_csv('../Data/Timing_variables.csv',index_col=[0])\n",
+ "df_m_shifted_.tail()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
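+ "# 6. External variables explain factor returns markedly better after the orthogonal transformation\n",
+ "\n",
+ "In principle, each classification or regression model should use the variable-selection criterion that suits it. To keep a single standard and for simplicity, **we use the most common measure, the `coefficient of determination (R_square) of a linear regression`, for a first comparison of the explanatory power of the external variables**. Because an OLS linear regression is sensitive to collinearity among its features, this test includes only 8 of the timing variables: the term spread (TS), the credit spread (CS), CPI year-over-year growth (CPI), PPI year-over-year growth (PPI), the monthly return difference between the CSI 300 and the CSI 1000 (RET_Spread), the monthly volatility difference between the CSI 300 and the CSI 1000 (STD_Spread), M1 money supply year-over-year growth (M1), and the 3-month government bond yield (Bond_yield_3M).\n"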
+ ]
+ },
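+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "A note on which R_square is being computed, offered as a reading aid rather than as part of the original study. The regressions in the next cell pass the timing variables to `OLS` without adding a constant column. Assuming none of the regressors is constant within a window, statsmodels then reports the uncentered coefficient of determination. Writing $y_t$ for a factor's return in month $t$, $x_t$ for the vector of the 8 timing variables and $\\hat\\beta$ for the fitted coefficients (symbols introduced here for illustration):\n",
+ "\n",
+ "$$R^2 = 1 - \\frac{\\sum_t \\left(y_t - x_t^\\top \\hat\\beta\\right)^2}{\\sum_t y_t^2}$$\n",
+ "\n",
+ "The denominator is not demeaned, so these values are not directly comparable with the centered R_square of a regression that includes an intercept."
+ ]
+ },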
+ {
+ "cell_type": "code",
+ "execution_count": 17,
+ "metadata": {
+ "code_folding": [
+ 2
+ ]
+ },
+ "outputs": [],
+ "source": [
+ "# Goodness-of-fit function\n",
+ "# Inputs: df_m_shifted_i (timing variables for one window), Ret_mat (factor returns covering the same dates)\n",
+ "def R_squared(df_m_shifted_i: pd.DataFrame, Ret_mat: pd.DataFrame) -> pd.DataFrame:\n",
+ "\n",
+ "    # Dates covered by the timing variables (the time dimension)\n",
+ "    datelist = df_m_shifted_i.index.unique()\n",
+ "\n",
+ "    # Keep the factor returns on those same dates\n",
+ "    Ret_mat = Ret_mat.loc[datelist]\n",
+ "\n",
+ "    # Number of factors\n",
+ "    num_factor = len(Ret_mat.columns)\n",
+ "\n",
+ "    # Timing variables used as regressors\n",
+ "    macro_factor_selected = [\n",
+ "        'TS', 'CS', 'cpi_yoy', 'ppi', 'STD_Spread', 'RET_Spread', 'm1_yoy',\n",
+ "        'Bond_yield_3M'\n",
+ "    ]\n",
+ "\n",
+ "    # Empty frame that will hold one R_square per factor\n",
+ "    R2_mat = pd.DataFrame(index=Ret_mat.columns, columns=['R_square'])\n",
+ "\n",
+ "    for label, factor in Ret_mat.items():\n",
+ "\n",
+ "        # Dependent variable: the factor's return\n",
+ "        factor_ret = factor.reset_index(drop=True)\n",
+ "\n",
+ "        # Independent variables: the timing variables\n",
+ "        macro_df = df_m_shifted_i.loc[datelist, macro_factor_selected].reset_index(drop=True)\n",
+ "\n",
+ "        # Fit the regression and read off its R_square\n",
+ "        r2 = sm.regression.linear_model.OLS(\n",
+ "            factor_ret.astype(float), macro_df.astype(float)).fit().rsquared\n",
+ "\n",
+ "        R2_mat.loc[label, :] = r2\n",
+ "\n",
+ "    return R2_mat  # a frame indexed by factor name with a single column 'R_square'\n",
+ "\n",
+ "\n",
+ "# Rolling goodness of fit (averaged later)\n",
+ "# Inputs: df_m_shifted_ (timing variables over the whole test period), Ret_mat (factor returns over the whole period), window=24 (rolling length)\n",
+ "def rolling_R2(\n",
+ "    df_m_shifted_: pd.DataFrame, Ret_mat: pd.DataFrame, window: int = 24\n",
+ ") -> pd.DataFrame:\n",
+ "\n",
+ "    # Dates in the sample\n",
+ "    datelist = df_m_shifted_.index.unique()\n",
+ "\n",
+ "    # Result frame, indexed by factor name\n",
+ "    R2_mat_all = pd.DataFrame(\n",
+ "        index=Ret_mat.columns.tolist())\n",
+ "\n",
+ "    for i in range(window - 1, len(df_m_shifted_)):\n",
+ "\n",
+ "        df_m_shifted_i = df_m_shifted_.iloc[(i - window + 1):i, :]  # rows i-window+1 .. i-1 (the slice end is exclusive)\n",
+ "\n",
+ "        # R_square of each factor for this window\n",
+ "        R2_mat_i = R_squared(df_m_shifted_i, Ret_mat)\n",
+ "\n",
+ "        # Store the window's result as the column for date i\n",
+ "        R2_mat_all[datelist[i]] = R2_mat_i['R_square']\n",
+ "\n",
+ "    return R2_mat_all.T"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 18,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "<table border=\"1\" class=\"dataframe\">\n",
+ " <thead>\n",
+ " <tr><th></th><th>LVGI</th><th>PEG</th><th>ROC20</th><th>ROC60</th><th>VOL20</th><th>Volume1M</th><th>book_to_price_ratio</th><th>cfo_to_ev</th><th>market_cap</th><th>roe_ttm</th><th>sharpe_ratio_20</th><th>total_asset_growth_rate</th></tr>\n",
+ " </thead>\n",
+ " <tbody>\n",
+ " <tr><th>2012-12-31</th><td>0.0510165</td><td>0.173187</td><td>0.300352</td><td>0.213769</td><td>0.288701</td><td>0.397229</td><td>0.461012</td><td>0.134817</td><td>0.37171</td><td>0.458406</td><td>0.155312</td><td>0.428146</td></tr>\n",
+ " <tr><th>2013-01-31</th><td>0.0579054</td><td>0.0901847</td><td>0.33236</td><td>0.169299</td><td>0.287314</td><td>0.357015</td><td>0.443073</td><td>0.110358</td><td>0.301963</td><td>0.412385</td><td>0.43722</td><td>0.435519</td></tr>\n",
+ " <tr><th>2013-02-28</th><td>0.05469</td><td>0.161903</td><td>0.33188</td><td>0.150858</td><td>0.270569</td><td>0.334043</td><td>0.316359</td><td>0.113004</td><td>0.21819</td><td>0.396523</td><td>0.463417</td><td>0.386656</td></tr>\n",
+ " <tr><th>2013-03-29</th><td>0.137092</td><td>0.183353</td><td>0.35711</td><td>0.146261</td><td>0.189245</td><td>0.332555</td><td>0.354691</td><td>0.0673482</td><td>0.217751</td><td>0.455631</td><td>0.573672</td><td>0.417879</td></tr>\n",
+ " <tr><th>2013-04-26</th><td>0.137166</td><td>0.164621</td><td>0.379014</td><td>0.214478</td><td>0.163499</td><td>0.357639</td><td>0.344958</td><td>0.0776939</td><td>0.229596</td><td>0.438512</td><td>0.553267</td><td>0.465659</td></tr>\n",
+ " </tbody>\n",
+ "</table>"
+ ],
+ "text/plain": [
+ " LVGI ... total_asset_growth_rate\n",
+ "2012-12-31 0.0510165 ... 0.428146\n",
+ "2013-01-31 0.0579054 ... 0.435519\n",
+ "2013-02-28 0.05469 ... 0.386656\n",
+ "2013-03-29 0.137092 ... 0.417879\n",
+ "2013-04-26 0.137166 ... 0.465659\n",
+ "\n",
+ "[5 rows x 12 columns]"
+ ]
+ },
+ "execution_count": 18,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "# Rolling goodness of fit for the orthogonalized factors\n",
+ "R2_mat_all = rolling_R2(df_m_shifted_.fillna(0), Ret_mat.fillna(0), window=24)\n",
+ "R2_mat_all.head()"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 19,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "<table border=\"1\" class=\"dataframe\">\n",
+ " <thead>\n",
+ " <tr><th></th><th>LVGI</th><th>PEG</th><th>ROC20</th><th>ROC60</th><th>VOL20</th><th>Volume1M</th><th>book_to_price_ratio</th><th>cfo_to_ev</th><th>market_cap</th><th>roe_ttm</th><th>sharpe_ratio_20</th><th>total_asset_growth_rate</th></tr>\n",
+ " </thead>\n",
+ " <tbody>\n",
+ " <tr><th>2012-12-31</th><td>0.0451179</td><td>0.198055</td><td>0.216762</td><td>0.222028</td><td>0.325545</td><td>0.260314</td><td>0.46209</td><td>0.182527</td><td>0.389111</td><td>0.367629</td><td>0.0877617</td><td>0.444364</td></tr>\n",
+ " <tr><th>2013-01-31</th><td>0.0490992</td><td>0.140642</td><td>0.183815</td><td>0.153531</td><td>0.327116</td><td>0.173851</td><td>0.444924</td><td>0.1413</td><td>0.317838</td><td>0.330239</td><td>0.206881</td><td>0.438815</td></tr>\n",
+ " <tr><th>2013-02-28</th><td>0.0393381</td><td>0.140113</td><td>0.174367</td><td>0.122871</td><td>0.279662</td><td>0.13169</td><td>0.311479</td><td>0.135058</td><td>0.270224</td><td>0.318173</td><td>0.232153</td><td>0.428219</td></tr>\n",
+ " <tr><th>2013-03-29</th><td>0.1463</td><td>0.182197</td><td>0.211951</td><td>0.120783</td><td>0.176479</td><td>0.127474</td><td>0.360628</td><td>0.0752018</td><td>0.248244</td><td>0.405733</td><td>0.322435</td><td>0.456936</td></tr>\n",
+ " <tr><th>2013-04-26</th><td>0.157487</td><td>0.175895</td><td>0.205708</td><td>0.164188</td><td>0.150821</td><td>0.138844</td><td>0.357861</td><td>0.0722729</td><td>0.252051</td><td>0.427718</td><td>0.297726</td><td>0.538786</td></tr>\n",
+ " </tbody>\n",
+ "</table>"
+ ],
+ "text/plain": [
+ " LVGI ... total_asset_growth_rate\n",
+ "2012-12-31 0.0451179 ... 0.444364\n",
+ "2013-01-31 0.0490992 ... 0.438815\n",
+ "2013-02-28 0.0393381 ... 0.428219\n",
+ "2013-03-29 0.1463 ... 0.456936\n",
+ "2013-04-26 0.157487 ... 0.538786\n",
+ "\n",
+ "[5 rows x 12 columns]"
+ ]
+ },
+ "execution_count": 19,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "# Likewise, compute the goodness of fit before the symmetric orthogonalization\n",
+ "noorth_factors = pd.read_csv('../Data/noorth_SVM_timing_datas.csv', index_col=[0, 1])  # factor data before symmetric orthogonalization\n",
+ "noorth_Ret_mat = Neu_Ret(noorth_factors)  # factor returns\n",
+ "noorth_R2_mat_all = rolling_R2(df_m_shifted_.fillna(0), noorth_Ret_mat.fillna(0), window=24)  # rolling goodness of fit\n",
+ "noorth_R2_mat_all.head()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
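+ "\n",
+ "Next we compare how well the external (timing) variables explain the selected factors before and after the orthogonal transformation, using the average R_square over rolling n-period windows (n = 24 months here). We find:\n",
+ "\n",
+ "\n",
+ "+ After cross-sectional symmetric orthogonalization of the factor values, the average coefficient of determination R_square rises for 7 of the 12 selected factors."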
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 20,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ ""
+ ]
+ },
+ "execution_count": 20,
+ "metadata": {},
+ "output_type": "execute_result"
+ },
+ {
+ "data": {
+ "image/png": "\n",
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {
+ "needs_background": "light"
+ },
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
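+ "# Bar chart comparing the timing variables' explanatory power before and after symmetric orthogonalization\n",
+ "df = pd.DataFrame()\n",
+ "df['before_orthogonal'] = noorth_R2_mat_all.mean()  # average goodness of fit before orthogonalization\n",
+ "df['after_orthogonal'] = R2_mat_all.mean()  # average goodness of fit after orthogonalization\n",
+ "df.plot.bar(figsize=(18, 7))"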
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
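+ "# 7. Backtest procedure and parameter settings\n",
+ "\n",
+ "\n",
+ "`Main parameter settings for the classification-model backtest:`\n",
+ "+ Rebalancing frequency: monthly (at each month end);\n",
+ "+ In-sample period: 2011-01-01 ~ 2015-12-31;\n",
+ "+ Out-of-sample period: 2016-01-01 ~ 2018-10-31;\n",
+ "+ Note: the in-sample period is used to train each classification model and find its best parameters. Since the best parameters have already been found, the backtests below use them directly over the full period.\n",
+ "\n",
+ "`For any single factor:`\n",
+ "+ Use the weighted moving average of the orthogonalized factor return, w_0, as the base weight (weighted-moving-average parameters: half-life h, minimum number of periods m);\n",
+ "+ Train on a rolling sample of the past n periods and predict the direction of the factor return one period ahead;\n",
+ "+ Let p be the direction the model predicts for next period's factor return, p ∈ {1,-1};\n",
+ "+ Let q be the direction of the factor's mean return over the past 36 months, q ∈ {1,-1};\n",
+ "+ If p=q, the factor's weight for this period is unchanged;\n",
+ "+ If p≠q, the factor's weight for this period is adjusted to w_0 × z, z ∈ (0,1), where z is the weight-adjustment coefficient;\n",
+ "+ Adjusting z by explanatory power: set a threshold r2. When the average R_square over the past n rolling periods is below r2, the factor keeps its base weight with no adjustment, i.e. z = 1; when it is greater than or equal to r2, the factor's weight for this period is adjusted to w_0 × z, z ∈ (0,1).\n",
+ "\n"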
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 21,
+ "metadata": {},
+ "outputs": [],
+ "source": [
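+ "# step1: weighted moving average of the factor returns, used as the base weight\n",
+ "# Inputs: factor returns and the half-life\n",
+ "def Ret_mat_emw(Ret_mat: pd.DataFrame, period: int) -> pd.DataFrame:\n",
+ "\n",
+ "    # Exponentially weighted moving average\n",
+ "    return Ret_mat.ewm(halflife=period, min_periods=2).mean()"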
+ ]
+ },
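+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "The function in the previous cell is pandas' exponentially weighted mean parameterized by a half-life. As a reading aid (the symbols $\\alpha$, $x_t$ and $y_t$ are introduced here, and the second formula assumes pandas' default `adjust=True`), a half-life $h$ corresponds to the smoothing factor\n",
+ "\n",
+ "$$\\alpha = 1 - \\exp\\left(-\\frac{\\ln 2}{h}\\right)$$\n",
+ "\n",
+ "and the smoothed factor return in month $t$ is\n",
+ "\n",
+ "$$y_t = \\frac{\\sum_{i=0}^{t} (1-\\alpha)^i \\, x_{t-i}}{\\sum_{i=0}^{t} (1-\\alpha)^i}$$\n",
+ "\n",
+ "With `period=3`, as used in the next cell, $\\alpha = 1 - 0.5^{1/3} \\approx 0.206$, so a return observed three months ago carries half the weight of the latest one. `min_periods=2` leaves the first month as NaN."
+ ]
+ },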
+ {
+ "cell_type": "code",
+ "execution_count": 22,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "<table border=\"1\" class=\"dataframe\">\n",
+ " <thead>\n",
+ " <tr><th></th><th>LVGI</th><th>PEG</th><th>ROC20</th><th>ROC60</th><th>VOL20</th><th>Volume1M</th><th>book_to_price_ratio</th><th>cfo_to_ev</th><th>market_cap</th><th>roe_ttm</th><th>sharpe_ratio_20</th><th>total_asset_growth_rate</th></tr>\n",
+ " <tr><th>date</th><th></th><th></th><th></th><th></th><th></th><th></th><th></th><th></th><th></th><th></th><th></th><th></th></tr>\n",
+ " </thead>\n",
+ " <tbody>\n",
+ " <tr><th>2011-01-31</th><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td></tr>\n",
+ " <tr><th>2011-02-28</th><td>0.037460</td><td>-0.186316</td><td>-0.191417</td><td>-0.170819</td><td>-0.088914</td><td>-0.241016</td><td>0.078811</td><td>-0.099524</td><td>-0.147187</td><td>0.046746</td><td>-0.061546</td><td>0.145623</td></tr>\n",
+ " <tr><th>2011-03-31</th><td>0.021255</td><td>-0.111134</td><td>-0.113005</td><td>-0.126262</td><td>-0.177534</td><td>-0.208031</td><td>0.195582</td><td>-0.002379</td><td>-0.050085</td><td>0.031667</td><td>-0.028889</td><td>0.144849</td></tr>\n",
+ " <tr><th>2011-04-29</th><td>0.015537</td><td>-0.067098</td><td>-0.034323</td><td>-0.093758</td><td>-0.169844</td><td>-0.130743</td><td>0.122009</td><td>0.002279</td><td>0.003492</td><td>0.047828</td><td>-0.002334</td><td>0.076055</td></tr>\n",
+ " <tr><th>2011-05-31</th><td>0.000837</td><td>-0.055413</td><td>-0.049605</td><td>-0.063075</td><td>-0.061680</td><td>-0.165503</td><td>0.054365</td><td>-0.049604</td><td>0.016985</td><td>0.071120</td><td>0.069063</td><td>0.083549</td></tr>\n",
+ " </tbody>\n",
+ "</table>"
+ ],
+ "text/plain": [
+ " LVGI ... total_asset_growth_rate\n",
+ "date ... \n",
+ "2011-01-31 NaN ... NaN\n",
+ "2011-02-28 0.037460 ... 0.145623\n",
+ "2011-03-31 0.021255 ... 0.144849\n",
+ "2011-04-29 0.015537 ... 0.076055\n",
+ "2011-05-31 0.000837 ... 0.083549\n",
+ "\n",
+ "[5 rows x 12 columns]"
+ ]
+ },
+ "execution_count": 22,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "weight_in = Ret_mat_emw(Ret_mat,period=3)\n",
+ "weight_in.head()"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 23,
+ "metadata": {
+ "code_folding": [
+ 0
+ ]
+ },
+ "outputs": [],
+ "source": [
+ "# step2: build the classification model (the classifier is chosen via `method`) and predict the direction of factor returns\n",
+ "def run_predict_models(Ret_mat: pd.DataFrame,\n",
+ "                       df_m_shifted_: pd.DataFrame,\n",
+ "                       method: str,\n",
+ "                       window: int = 24) -> pd.DataFrame:\n",
+ "    '''\n",
+ "    Inputs: factor returns Ret_mat and timing variables df_m_shifted_.\n",
+ "    window: rolling window, 24 by default (each prediction uses 24 periods of data,\n",
+ "    the first 23 as the training set and the last one as the test set).\n",
+ "    method: SVM, Logistic, DecisionTree, RandomForest\n",
+ "    '''\n",
+ "    ## Initialization\n",
+ "    # Dates for which a prediction is made\n",
+ "    datelist = df_m_shifted_.index.tolist()[window - 1:]\n",
+ "\n",
+ "    # The factor returns span a longer period than the timing variables, so to keep\n",
+ "    # the two aligned, keep only the factor returns on the timing variables' dates\n",
+ "    Ret_mat_ = Ret_mat.loc[df_m_shifted_.index.tolist()]\n",
+ "\n",
+ "    # Positive factor returns become 1, negative ones become -1\n",
+ "    Ret_mat_sign = np.sign(Ret_mat_)\n",
+ "\n",
+ "    # Number of factors to predict\n",
+ "    num_factor = len(Ret_mat.columns)\n",
+ "\n",
+ "    # Timing variables used as features\n",
+ "    macro_df = df_m_shifted_[[\n",
+ "        'TS', 'CS', 'cpi_yoy', 'ppi', 'STD_Spread', 'RET_Spread', 'm1_yoy',\n",
+ "        'Bond_yield_3M'\n",
+ "    ]]\n",
+ "\n",
+ "    ## Predict with the selected classifier\n",
+ "    # Empty prediction frame, one row per prediction date\n",
+ "    predicted_mat = pd.DataFrame(index=datelist, columns=Ret_mat_.columns)\n",
+ "\n",
+ "    for j in range(num_factor):\n",
+ "\n",
+ "        # Target of the classifier: the sign of the j-th factor's return\n",
+ "        factor_sign_i = Ret_mat_sign.iloc[:, j]\n",
+ "\n",
+ "        # Predictions for this single factor\n",
+ "        predict = pd.Series(0, index=predicted_mat.index.tolist())\n",
+ "\n",
+ "        for i in range(len(predict)):\n",
+ "\n",
+ "            # No variable reduction: take the window of timing variables ending at the prediction date\n",
+ "            x_design = macro_df.iloc[i:i + window, :]\n",
+ "\n",
+ "            # Standardize each column within the window\n",
+ "            x_std = x_design.apply(lambda x: (x - x.mean()) / x.std())\n",
+ "\n",
+ "            # The first window-1 periods (23 by default) form the training set\n",
+ "            x_train = x_std.iloc[:-1, :]\n",
+ "\n",
+ "            # The last period is the test set\n",
+ "            x_test = x_std.iloc[-1:, :]\n",
+ "\n",
+ "            # Training labels\n",
+ "            y_design_logit = factor_sign_i.iloc[i:i + window - 1]\n",
+ "\n",
+ "            # Choose the classifier\n",
+ "            if method == 'SVM':\n",
+ "                regr = svm.SVC(kernel='rbf')  # support vector machine with an RBF kernel\n",
+ "\n",
+ "            elif method == 'Logistic':\n",
+ "                regr = linear_model.LogisticRegression()  # logistic regression\n",
+ "\n",
+ "            elif method == 'DecisionTree':\n",
+ "                regr = DecisionTreeClassifier(max_depth=3)  # decision tree\n",
+ "\n",
+ "            elif method == 'RandomForest':\n",
+ "                regr = RandomForestClassifier(\n",
+ "                    n_estimators=20, max_depth=3,\n",
+ "                    random_state=0)  # random forest: max depth 3, n_estimators=20\n",
+ "\n",
+ "            regr.fit(x_train.fillna(0), y_design_logit.astype(float))\n",
+ "            predict.iloc[i] = regr.predict(x_test.fillna(0))[0]\n",
+ "\n",
+ "        print(Ret_mat.columns[j], \"success\")  # report that this factor finished\n",
+ "\n",
+ "        predicted_mat.iloc[:, j] = predict  # store this factor's rolling predictions\n",
+ "\n",
+ "    return predicted_mat"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 24,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "LVGI success\n",
+ "PEG success\n",
+ "ROC20 success\n",
+ "ROC60 success\n",
+ "VOL20 success\n",
+ "Volume1M success\n",
+ "book_to_price_ratio success\n",
+ "cfo_to_ev success\n",
+ "market_cap success\n",
+ "roe_ttm success\n",
+ "sharpe_ratio_20 success\n",
+ "total_asset_growth_rate success\n"
+ ]
+ },
+ {
+ "data": {
+ "text/html": [
+ "<table border=\"1\" class=\"dataframe\">\n",
+ " <thead>\n",
+ " <tr><th></th><th>LVGI</th><th>PEG</th><th>ROC20</th><th>ROC60</th><th>VOL20</th><th>Volume1M</th><th>book_to_price_ratio</th><th>cfo_to_ev</th><th>market_cap</th><th>roe_ttm</th><th>sharpe_ratio_20</th><th>total_asset_growth_rate</th></tr>\n",
+ " </thead>\n",
+ " <tbody>\n",
+ " <tr><th>2012-12-31</th><td>1.0</td><td>-1.0</td><td>-1.0</td><td>1.0</td><td>-1.0</td><td>-1.0</td><td>-1.0</td><td>-1.0</td><td>1.0</td><td>1.0</td><td>1.0</td><td>1.0</td></tr>\n",
+ " <tr><th>2013-01-31</th><td>1.0</td><td>-1.0</td><td>-1.0</td><td>-1.0</td><td>-1.0</td><td>-1.0</td><td>1.0</td><td>1.0</td><td>1.0</td><td>1.0</td><td>1.0</td><td>1.0</td></tr>\n",
+ " <tr><th>2013-02-28</th><td>1.0</td><td>1.0</td><td>-1.0</td><td>-1.0</td><td>-1.0</td><td>-1.0</td><td>-1.0</td><td>-1.0</td><td>1.0</td><td>1.0</td><td>1.0</td><td>1.0</td></tr>\n",
+ " <tr><th>2013-03-29</th><td>1.0</td><td>1.0</td><td>1.0</td><td>-1.0</td><td>-1.0</td><td>-1.0</td><td>-1.0</td><td>1.0</td><td>-1.0</td><td>1.0</td><td>1.0</td><td>1.0</td></tr>\n",
+ " <tr><th>2013-04-26</th><td>1.0</td><td>1.0</td><td>-1.0</td><td>1.0</td><td>-1.0</td><td>-1.0</td><td>-1.0</td><td>-1.0</td><td>1.0</td><td>1.0</td><td>1.0</td><td>1.0</td></tr>\n",
+ " </tbody>\n",
+ "</table>"
+ ],
+ "text/plain": [
+ " LVGI ... total_asset_growth_rate\n",
+ "2012-12-31 1.0 ... 1.0\n",
+ "2013-01-31 1.0 ... 1.0\n",
+ "2013-02-28 1.0 ... 1.0\n",
+ "2013-03-29 1.0 ... 1.0\n",
+ "2013-04-26 1.0 ... 1.0\n",
+ "\n",
+ "[5 rows x 12 columns]"
+ ]
+ },
+ "execution_count": 24,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "predict_mat = run_predict_models(\n",
+ " Ret_mat, df_m_shifted_, method='SVM', window=24)\n",
+ "predict_mat.head()"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 25,
+ "metadata": {
+ "code_folding": []
+ },
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "<div>\n",
+ "<table border=\"1\" class=\"dataframe\">\n",
+ "  <thead>\n",
+ "    <tr style=\"text-align: right;\"><th></th><th>LVGI</th><th>PEG</th><th>ROC20</th><th>ROC60</th><th>VOL20</th><th>Volume1M</th><th>book_to_price_ratio</th><th>cfo_to_ev</th><th>market_cap</th><th>roe_ttm</th><th>sharpe_ratio_20</th><th>total_asset_growth_rate</th></tr>\n",
+ "  </thead>\n",
+ "  <tbody>\n",
+ "    <tr><th>2012-12-31</th><td>0.0510165</td><td>0.173187</td><td>0.300352</td><td>0.213769</td><td>0.288701</td><td>0.397229</td><td>0.461012</td><td>0.134817</td><td>0.37171</td><td>0.458406</td><td>0.155312</td><td>0.428146</td></tr>\n",
+ "    <tr><th>2013-01-31</th><td>0.0579054</td><td>0.0901847</td><td>0.33236</td><td>0.169299</td><td>0.287314</td><td>0.357015</td><td>0.443073</td><td>0.110358</td><td>0.301963</td><td>0.412385</td><td>0.43722</td><td>0.435519</td></tr>\n",
+ "    <tr><th>2013-02-28</th><td>0.05469</td><td>0.161903</td><td>0.33188</td><td>0.150858</td><td>0.270569</td><td>0.334043</td><td>0.316359</td><td>0.113004</td><td>0.21819</td><td>0.396523</td><td>0.463417</td><td>0.386656</td></tr>\n",
+ "    <tr><th>2013-03-29</th><td>0.137092</td><td>0.183353</td><td>0.35711</td><td>0.146261</td><td>0.189245</td><td>0.332555</td><td>0.354691</td><td>0.0673482</td><td>0.217751</td><td>0.455631</td><td>0.573672</td><td>0.417879</td></tr>\n",
+ "    <tr><th>2013-04-26</th><td>0.137166</td><td>0.164621</td><td>0.379014</td><td>0.214478</td><td>0.163499</td><td>0.357639</td><td>0.344958</td><td>0.0776939</td><td>0.229596</td><td>0.438512</td><td>0.553267</td><td>0.465659</td></tr>\n",
+ "  </tbody>\n",
+ "</table>\n",
+ "</div>"
+ ],
+ "text/plain": [
+ " LVGI ... total_asset_growth_rate\n",
+ "2012-12-31 0.0510165 ... 0.428146\n",
+ "2013-01-31 0.0579054 ... 0.435519\n",
+ "2013-02-28 0.05469 ... 0.386656\n",
+ "2013-03-29 0.137092 ... 0.417879\n",
+ "2013-04-26 0.137166 ... 0.465659\n",
+ "\n",
+ "[5 rows x 12 columns]"
+ ]
+ },
+ "execution_count": 25,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "# Step 4: compute the average coefficient of determination (R²) over the trailing n periods\n",
+ "r2 = rolling_R2(df_m_shifted_.fillna(0), Ret_mat.fillna(0), window=24)\n",
+ "r2.head()"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 40,
+ "metadata": {
+ "code_folding": [
+ 0
+ ]
+ },
+ "outputs": [],
+ "source": [
+ "# Step 5: weight-adjustment function\n",
+ "def weight_timing_threshold(Ret_mat: pd.DataFrame,\n",
+ "                            predict_mat: pd.DataFrame,\n",
+ "                            weight_in: pd.DataFrame,\n",
+ "                            r2: pd.DataFrame,\n",
+ "                            th: float,\n",
+ "                            z: float = 0.1) -> pd.DataFrame:\n",
+ "    '''\n",
+ "    Args:\n",
+ "        Ret_mat: factor returns\n",
+ "        predict_mat: predicted direction of each factor return\n",
+ "        weight_in: initial factor weights\n",
+ "        r2: rolling goodness of fit (R²)\n",
+ "        th: R² threshold below which a factor's weight is left unchanged\n",
+ "        z: weight adjustment coefficient applied when the prediction disagrees with the trend\n",
+ "    '''\n",
+ "    # Drop dates without a prediction; work on a copy so the caller's frame is not modified\n",
+ "    predict_mat = predict_mat.dropna()\n",
+ "\n",
+ "    # Sign of the trailing 36-month mean factor return (the factor's usual direction)\n",
+ "    factor_sign = np.sign(Ret_mat.rolling(min_periods=1, window=36).mean())\n",
+ "\n",
+ "    # Initialise the new factor weights\n",
+ "    weight_new = pd.DataFrame(\n",
+ "        0.0, index=predict_mat.index, columns=weight_in.columns)\n",
+ "\n",
+ "    # Original weights on the dates that have a prediction\n",
+ "    weight_in_chunk = weight_in.loc[predict_mat.index, :]\n",
+ "\n",
+ "    for j, date in enumerate(weight_new.index):\n",
+ "        # Does the predicted direction agree with the usual direction? If not, scale the weight by z\n",
+ "        iftrue = pd.Series(predict_mat.loc[date, :] == factor_sign.loc[date, :])\n",
+ "\n",
+ "        time_weight = iftrue.apply(lambda x: 1 if x else z)\n",
+ "        # Adjust the factor weights\n",
+ "        weight_new.iloc[j, :] = weight_in_chunk.iloc[j, :] * time_weight\n",
+ "\n",
+ "        # Only factors whose trailing R² reaches the threshold are allowed to change weight\n",
+ "        aa = r2.loc[date, :]  # trailing n-period R² on this date\n",
+ "        for name in weight_new.columns:\n",
+ "            if aa[name] < th:\n",
+ "                # Reset this date only; resetting the whole column would undo earlier adjustments\n",
+ "                weight_new.loc[date, name] = float(weight_in_chunk.loc[date, name])\n",
+ "\n",
+ "    return weight_new"
+ ]
+ },
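+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "The rule implemented by `weight_timing_threshold` can be illustrated on toy data. The sketch below is a vectorised reading of the docstring, not a drop-in replacement: `adjust_weights` and every toy value are made up for illustration, and it assumes the R² gate is applied per date and per factor.\n",
+ "\n",
+ "```python\n",
+ "import numpy as np\n",
+ "import pandas as pd\n",
+ "\n",
+ "\n",
+ "def adjust_weights(weight_in, predicted_sign, trend_sign, r2, th, z=0.1):\n",
+ "    # Hypothetical helper: shrink a weight by z only where the forecast\n",
+ "    # disagrees with the trend sign AND the trailing R2 reaches the threshold.\n",
+ "    agree = predicted_sign.eq(trend_sign)\n",
+ "    trusted = r2.ge(th)\n",
+ "    scale = pd.DataFrame(1.0, index=weight_in.index, columns=weight_in.columns)\n",
+ "    scale = scale.mask(trusted & ~agree, z)\n",
+ "    return weight_in * scale\n",
+ "\n",
+ "\n",
+ "# Toy inputs: two dates, two made-up factors A and B\n",
+ "idx = pd.to_datetime(['2013-01-31', '2013-02-28'])\n",
+ "cols = ['A', 'B']\n",
+ "weight_in = pd.DataFrame([[0.5, 0.5], [0.5, 0.5]], index=idx, columns=cols)\n",
+ "pred = pd.DataFrame([[1.0, -1.0], [1.0, -1.0]], index=idx, columns=cols)\n",
+ "trend = pd.DataFrame([[1.0, 1.0], [1.0, 1.0]], index=idx, columns=cols)\n",
+ "r2 = pd.DataFrame([[0.2, 0.2], [0.2, 0.01]], index=idx, columns=cols)\n",
+ "\n",
+ "print(adjust_weights(weight_in, pred, trend, r2, th=0.05, z=0.1))\n",
+ "```\n",
+ "\n",
+ "Factor B is cut to a tenth of its weight on the first date, where the forecast contradicts the trend and R² clears the threshold. It keeps its full weight on the second date, where R² falls below the threshold."
+ ]
+ },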
+ {
+ "cell_type": "code",
+ "execution_count": 29,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "<div>\n",
+ "<table border=\"1\" class=\"dataframe\">\n",
+ "  <thead>\n",
+ "    <tr style=\"text-align: right;\"><th></th><th>LVGI</th><th>PEG</th><th>ROC20</th><th>ROC60</th><th>VOL20</th><th>Volume1M</th><th>book_to_price_ratio</th><th>cfo_to_ev</th><th>market_cap</th><th>roe_ttm</th><th>sharpe_ratio_20</th><th>total_asset_growth_rate</th></tr>\n",
+ "  </thead>\n",
+ "  <tbody>\n",
+ "    <tr><th>2012-12-31</th><td>-0.029785</td><td>0.025974</td><td>-0.071259</td><td>-0.004482</td><td>-0.036303</td><td>-0.050228</td><td>0.013749</td><td>-0.032974</td><td>0.051352</td><td>0.021932</td><td>0.121678</td><td>0.030551</td></tr>\n",
+ "    <tr><th>2013-01-31</th><td>-0.003086</td><td>0.041586</td><td>-0.035746</td><td>-0.044222</td><td>-0.018383</td><td>-0.061970</td><td>0.080987</td><td>-0.005590</td><td>0.004063</td><td>0.042738</td><td>0.131450</td><td>0.020949</td></tr>\n",
+ "    <tr><th>2013-02-28</th><td>0.026298</td><td>0.070942</td><td>0.009301</td><td>-0.025911</td><td>-0.099144</td><td>-0.072432</td><td>0.003545</td><td>-0.027458</td><td>-0.014554</td><td>0.054978</td><td>0.177616</td><td>0.041666</td></tr>\n",
+ "    <tr><th>2013-03-29</th><td>0.016106</td><td>0.053079</td><td>-0.002810</td><td>-0.010726</td><td>-0.081144</td><td>-0.094617</td><td>0.002742</td><td>-0.003835</td><td>0.002229</td><td>0.026993</td><td>0.150334</td><td>0.067063</td></tr>\n",
+ "    <tr><th>2013-04-26</th><td>0.016179</td><td>0.031828</td><td>0.008957</td><td>0.002916</td><td>0.007852</td><td>-0.076144</td><td>-0.002356</td><td>-0.044393</td><td>-0.007112</td><td>0.009735</td><td>0.174176</td><td>0.044186</td></tr>\n",
+ "  </tbody>\n",
+ "</table>\n",
+ "</div>"
+ ],
+ "text/plain": [
+ " LVGI ... total_asset_growth_rate\n",
+ "2012-12-31 -0.029785 ... 0.030551\n",
+ "2013-01-31 -0.003086 ... 0.020949\n",
+ "2013-02-28 0.026298 ... 0.041666\n",
+ "2013-03-29 0.016106 ... 0.067063\n",
+ "2013-04-26 0.016179 ... 0.044186\n",
+ "\n",
+ "[5 rows x 12 columns]"
+ ]
+ },
+ "execution_count": 29,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "weight_new = weight_timing_threshold(\n",
+ " Ret_mat, predict_mat, weight_in, r2, th=0.05, z=0.1)\n",
+ "weight_new.head()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
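+ "# 8. Parameter selection for the three classification models\n",
+ "We build three classification models, a support vector machine (SVM), a random forest and logistic regression, to predict the direction of factor returns, and then adjust the factor weights according to the rolling predictions.\n",
+ "\n",
+ "The parameters chosen by in-sample optimisation are:\n",
+ "\n",
+ "|SVM|Value|Random Forest|Value|Logistic|Value|\n",
+ "|:---:|:---:|:---:|:---:|:---:|:---:|\n",
+ "|Training window n|24|Training window n|20|Training window n|36|\n",
+ "|Half-life h|3|Half-life h|3|Half-life h|3|\n",
+ "|Weight adjustment coefficient z|0.1|Weight adjustment coefficient z|0.1|Weight adjustment coefficient z|0.2|\n",
+ "|R² threshold|0.05|R² threshold|0|R² threshold|0.1|\n",
+ "|Kernel|rbf|Max depth|3|-|-|\n",
+ "|-|-|n_estimators|20|-|-|\n",
+ "\n",
+ " *Testing showed that a longer training window is not necessarily better; only the final choices are listed here.*"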
+ ]
+ },
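+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "The parameter table could be expressed in code roughly as follows. This is a minimal sketch and not the notebook's `run_predict_models`: `MODEL_PARAMS`, `make_classifier` and `half_life_weights` are hypothetical names, and the exponential form of the half-life weighting is an assumption, since the table only states h = 3.\n",
+ "\n",
+ "```python\n",
+ "import numpy as np\n",
+ "from sklearn.ensemble import RandomForestClassifier\n",
+ "from sklearn.linear_model import LogisticRegression\n",
+ "from sklearn.svm import SVC\n",
+ "\n",
+ "# Hypothetical lookup table mirroring the parameter table\n",
+ "MODEL_PARAMS = {\n",
+ "    'SVM': {'window': 24, 'half_life': 3, 'z': 0.1, 'th': 0.05},\n",
+ "    'RandomForest': {'window': 20, 'half_life': 3, 'z': 0.1, 'th': 0.0},\n",
+ "    'Logistic': {'window': 36, 'half_life': 3, 'z': 0.2, 'th': 0.1},\n",
+ "}\n",
+ "\n",
+ "\n",
+ "def make_classifier(method):\n",
+ "    # Build an unfitted scikit-learn classifier for the given method name\n",
+ "    if method == 'SVM':\n",
+ "        return SVC(kernel='rbf')\n",
+ "    if method == 'RandomForest':\n",
+ "        return RandomForestClassifier(n_estimators=20, max_depth=3, random_state=0)\n",
+ "    if method == 'Logistic':\n",
+ "        return LogisticRegression()\n",
+ "    raise ValueError('unknown method: %s' % method)\n",
+ "\n",
+ "\n",
+ "def half_life_weights(n, h):\n",
+ "    # Assumed weighting: a sample h periods older counts half as much;\n",
+ "    # the most recent sample comes last and the weights sum to 1.\n",
+ "    ages = np.arange(n - 1, -1, -1)\n",
+ "    w = 0.5 ** (ages / h)\n",
+ "    return w / w.sum()\n",
+ "```\n",
+ "\n",
+ "All three estimators accept `sample_weight` in `fit`, so one rolling step might look like `make_classifier('SVM').fit(X, y, sample_weight=half_life_weights(24, 3))`."
+ ]
+ },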
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "# 9. SVM has the stronger predictive power; random forest is the most stable"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 30,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Prediction accuracy: share of dates on which the predicted sign matches the realised sign\n",
+ "def get_accuracy(predict_mat: pd.DataFrame, Ret_mat: pd.DataFrame,\n",
+ "                 method: str) -> pd.DataFrame:\n",
+ "\n",
+ "    # Realised direction of each factor return on the predicted dates\n",
+ "    ret_mat_sign = np.sign(Ret_mat.loc[predict_mat.index, :])\n",
+ "    accuracy = pd.DataFrame(index=[method])\n",
+ "\n",
+ "    for name in predict_mat.columns:\n",
+ "        iftrue = pd.Series(predict_mat[name] == ret_mat_sign[name])\n",
+ "        accuracy[name] = iftrue.sum() / len(predict_mat)\n",
+ "    return accuracy.T"
+ ]
+ },
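+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "As a quick sanity check of the accuracy definition, the same hit rate can be computed in vectorised form on toy data. `hit_rate` and the numbers below are made up for illustration; a realised return of exactly zero has sign 0 and therefore counts as a miss.\n",
+ "\n",
+ "```python\n",
+ "import numpy as np\n",
+ "import pandas as pd\n",
+ "\n",
+ "\n",
+ "def hit_rate(predicted_sign, realized_ret):\n",
+ "    # Fraction of dates on which the predicted sign equals the realised sign\n",
+ "    realized_sign = np.sign(realized_ret.loc[predicted_sign.index])\n",
+ "    return predicted_sign.eq(realized_sign).mean()\n",
+ "\n",
+ "\n",
+ "idx = pd.to_datetime(['2013-01-31', '2013-02-28', '2013-03-29', '2013-04-26'])\n",
+ "pred = pd.DataFrame({'A': [1.0, -1.0, 1.0, 1.0],\n",
+ "                     'B': [-1.0, -1.0, 1.0, 1.0]}, index=idx)\n",
+ "ret = pd.DataFrame({'A': [0.02, -0.01, -0.03, 0.04],\n",
+ "                    'B': [0.01, 0.02, 0.03, -0.01]}, index=idx)\n",
+ "\n",
+ "print(hit_rate(pred, ret))\n",
+ "```\n",
+ "\n",
+ "Factor A is right on three of the four dates and factor B on one, so the hit rates are 0.75 and 0.25."
+ ]
+ },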
+ {
+ "cell_type": "code",
+ "execution_count": 31,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "LVGI success\n",
+ "PEG success\n",
+ "ROC20 success\n",
+ "ROC60 success\n",
+ "VOL20 success\n",
+ "Volume1M success\n",
+ "book_to_price_ratio success\n",
+ "cfo_to_ev success\n",
+ "market_cap success\n",
+ "roe_ttm success\n",
+ "sharpe_ratio_20 success\n",
+ "total_asset_growth_rate success\n"
+ ]
+ }
+ ],
+ "source": [
+ "# Prediction accuracy of the SVM\n",
+ "predict_mat = run_predict_models(\n",
+ " Ret_mat, df_m_shifted_, method='SVM', window=24)\n",
+ "SVM_accuracy = get_accuracy(predict_mat, Ret_mat, method='SVM')"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 32,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "LVGI success\n",
+ "PEG success\n",
+ "ROC20 success\n",
+ "ROC60 success\n",
+ "VOL20 success\n",
+ "Volume1M success\n",
+ "book_to_price_ratio success\n",
+ "cfo_to_ev success\n",
+ "market_cap success\n",
+ "roe_ttm success\n",
+ "sharpe_ratio_20 success\n",
+ "total_asset_growth_rate success\n"
+ ]
+ }
+ ],
+ "source": [
+ "# Prediction accuracy of the random forest\n",
+ "predict_mat = run_predict_models(\n",
+ " Ret_mat, df_m_shifted_, method='RandomForest', window=20)\n",
+ "RF_accuracy = get_accuracy(predict_mat, Ret_mat, method='RandomForest')"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 33,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "LVGI success\n",
+ "PEG success\n",
+ "ROC20 success\n",
+ "ROC60 success\n",
+ "VOL20 success\n",
+ "Volume1M success\n",
+ "book_to_price_ratio success\n",
+ "cfo_to_ev success\n",
+ "market_cap success\n",
+ "roe_ttm success\n",
+ "sharpe_ratio_20 success\n",
+ "total_asset_growth_rate success\n"
+ ]
+ }
+ ],
+ "source": [
+ "# Prediction accuracy of the logistic regression\n",
+ "predict_mat = run_predict_models(\n",
+ " Ret_mat, df_m_shifted_, method='Logistic', window=36)\n",
+ "LOG_accuracy = get_accuracy(predict_mat, Ret_mat, method='Logistic')"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 34,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ ""
+ ]
+ },
+ "execution_count": 34,
+ "metadata": {},
+ "output_type": "execute_result"
+ },
+ {
+ "data": {
+ "image/png": "\n",
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {
+ "needs_background": "light"
+ },
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "# Bar chart comparing prediction accuracy across the three models\n",
+ "df = pd.concat([SVM_accuracy, RF_accuracy, LOG_accuracy], axis=1)\n",
+ "df.plot.bar(figsize=(18, 8))"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "# 10. Random forest and SVM deliver the clearest return improvement"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 35,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "<div>\n",
+ "<table border=\"1\" class=\"dataframe\">\n",
+ "  <thead>\n",
+ "    <tr style=\"text-align: right;\"><th></th><th></th><th>LVGI</th><th>PEG</th><th>ROC20</th><th>ROC60</th><th>VOL20</th><th>Volume1M</th><th>book_to_price_ratio</th><th>cfo_to_ev</th><th>market_cap</th><th>roe_ttm</th><th>sharpe_ratio_20</th><th>total_asset_growth_rate</th></tr>\n",
+ "    <tr><th>date</th><th>code</th><th></th><th></th><th></th><th></th><th></th><th></th><th></th><th></th><th></th><th></th><th></th><th></th></tr>\n",
+ "  </thead>\n",
+ "  <tbody>\n",
+ "    <tr><th rowspan=\"5\" valign=\"top\">2011-01-31</th><th>000001.XSHE</th><td>-0.005361</td><td>-0.004834</td><td>0.001217</td><td>-0.025758</td><td>0.000809</td><td>0.020066</td><td>0.113213</td><td>-0.001390</td><td>0.043196</td><td>0.049163</td><td>-0.065174</td><td>-0.016647</td></tr>\n",
+ "    <tr><th>000002.XSHE</th><td>-0.020450</td><td>0.016854</td><td>0.007252</td><td>-0.030432</td><td>-0.000450</td><td>0.008200</td><td>0.036316</td><td>-0.041084</td><td>0.107653</td><td>-0.013657</td><td>-0.008253</td><td>0.060002</td></tr>\n",
+ "    <tr><th>000009.XSHE</th><td>-0.003415</td><td>0.049266</td><td>0.002487</td><td>0.041332</td><td>0.100920</td><td>0.089511</td><td>-0.060232</td><td>-0.043439</td><td>-0.036879</td><td>-0.010946</td><td>0.021705</td><td>0.030863</td></tr>\n",
+ "    <tr><th>000012.XSHE</th><td>-0.042559</td><td>-0.061833</td><td>0.062131</td><td>0.013991</td><td>0.073872</td><td>-0.042822</td><td>-0.054426</td><td>0.027782</td><td>0.017037</td><td>0.037832</td><td>-0.036528</td><td>-0.073062</td></tr>\n",
+ "    <tr><th>000021.XSHE</th><td>0.136461</td><td>0.057748</td><td>0.044247</td><td>-0.025915</td><td>-0.048909</td><td>-0.042097</td><td>-0.017216</td><td>0.039846</td><td>-0.051565</td><td>-0.039835</td><td>-0.057085</td><td>0.005247</td></tr>\n",
+ "  </tbody>\n",
+ "</table>\n",
+ "</div>"
+ ],
+ "text/plain": [
+ " LVGI ... total_asset_growth_rate\n",
+ "date code ... \n",
+ "2011-01-31 000001.XSHE -0.005361 ... -0.016647\n",
+ " 000002.XSHE -0.020450 ... 0.060002\n",
+ " 000009.XSHE -0.003415 ... 0.030863\n",
+ " 000012.XSHE -0.042559 ... -0.073062\n",
+ " 000021.XSHE 0.136461 ... 0.005247\n",
+ "\n",
+ "[5 rows x 12 columns]"
+ ]
+ },
+ "execution_count": 35,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "datas_all = pd.read_csv(\n",
+ "    '../Data/SVM_timing_datas.csv', index_col=[0, 1])  # load the symmetrically orthogonalised dataset\n",
+ "\n",
+ "datas = datas_all[[\n",
+ "    i for i in datas_all.columns if i not in ['cap', 'log_ret', 'INDUSTRY']\n",
+ "]]  # keep only the factor columns\n",
+ "\n",
+ "datas.head()"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 36,
+ "metadata": {
+ "code_folding": [
+ 0
+ ]
+ },
+ "outputs": [],
+ "source": [
+ "# Saved for use by the backtest\n",
+ "Ret_mat.to_csv('../Data/Ret_mat.csv')\n",
+ "datas.to_csv('../Data/datas.csv')\n",
+ "df_m_shifted_.to_csv('../Data/df_m_shifted_.csv')\n",
+ "weight_in.to_csv('../Data/weight_in.csv')"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 45,
+ "metadata": {
+ "code_folding": [
+ 0
+ ]
+ },
+ "outputs": [],
+ "source": [
+ "# Parameter-analysis helper class\n",
+ "class parameter_analysis(object):\n",
+ "\n",
+ "    # Initialise the containers used by the analysis\n",
+ "    def __init__(self, algorithm_id=None):\n",
+ "        self.algorithm_id = algorithm_id  # backtest strategy id\n",
+ "\n",
+ "        # All candidate parameter values; column names are the tuned variables, i.e. g.XXXX in the strategy\n",
+ "        self.params_df = pd.DataFrame()\n",
+ "        # Backtest returns; key is the row number in params_df, value is the output of get_results()\n",
+ "        self.results = {}\n",
+ "        # Backtest risk metrics; key is the row number in params_df, value is a dataframe\n",
+ "        self.evaluations = {}\n",
+ "        self.backtest_ids = {}  # backtest ids\n",
+ "\n",
+ "        # Backtest id of an additional benchmark; may be left as '' to use the benchmark set in the strategy\n",
+ "        self.benchmark_id = 'f16629492d6b6f4040b2546262782c78'\n",
+ "\n",
+ "        self.benchmark_returns = []  # returns of the additional benchmark\n",
+ "        self.returns = {}  # all strategy returns\n",
+ "        self.excess_returns = {}  # excess returns\n",
+ "        self.log_returns = {}  # log returns\n",
+ "        self.log_excess_returns = {}  # log excess returns\n",
+ "        self.dates = []  # all dates covered by the backtest\n",
+ "        self.excess_max_drawdown = {}  # maximum drawdown of the excess return\n",
+ "        self.excess_annual_return = {}  # annualised excess return\n",
+ "        self.evaluations_df = pd.DataFrame()  # all backtest metrics except daily returns\n",
+ "        self.failed_list = []\n",
+ "        self.nav_df = pd.DataFrame()\n",
+ " \n",
+ "    # Queue and run one backtest per parameter combination\n",
+ "    def run_backtest(\n",
+ "            self,\n",
+ "            algorithm_id=None,  # backtest strategy id\n",
+ "            running_max=10,  # maximum number of backtests running at once\n",
+ "            start_date='2006-01-01',  # backtest start date\n",
+ "            end_date='2016-11-30',  # backtest end date\n",
+ "            frequency='day',  # backtest frequency\n",
+ "            initial_cash='1000000',  # initial capital\n",
+ "            param_names=[],  # names of the variables being tuned\n",
+ "            param_values=[],  # candidate values for each variable\n",
+ "            python_version=2,  # python version used by the backtest\n",
+ "            use_credit=False  # whether credits may be spent to keep backtesting\n",
+ "    ):\n",
+ "        # Fall back to the strategy id given to the class when none is passed here\n",
+ "        if algorithm_id is None:\n",
+ "            algorithm_id = self.algorithm_id\n",
+ "\n",
+ "        # Every combination of candidate values, as a list of tuples\n",
+ "        param_combinations = list(itertools.product(*param_values))\n",
+ "        # One row per combination, one column per tuned variable\n",
+ "        to_run_df = pd.DataFrame(param_combinations, dtype='object')\n",
+ "        to_run_df.columns = param_names\n",
+ "\n",
+ "        # Start time\n",
+ "        start = time.time()\n",
+ "        # Backtests that have ended (done or failed)\n",
+ "        finished_backtests = {}\n",
+ "        # Backtests currently running\n",
+ "        running_backtests = {}\n",
+ "        # Index of the next combination to launch\n",
+ "        pointer = 0\n",
+ "        # Total number of backtests, one per combination\n",
+ "        total_backtest_num = len(param_combinations)\n",
+ "        # Returns of each backtest\n",
+ "        all_results = {}\n",
+ "        # Risk metrics of each backtest\n",
+ "        all_evaluations = {}\n",
+ "\n",
+ "        # Progress header\n",
+ "        print(('[finished|running|pending]:'), end=' ')\n",
+ "        # Loop until every backtest has ended\n",
+ "        while len(finished_backtests) < total_backtest_num:\n",
+ "            # Show the number of finished, running and pending backtests\n",
+ "            print(('[%s|%s|%s].' %\n",
+ "                   (len(finished_backtests), len(running_backtests),\n",
+ "                    (total_backtest_num - len(finished_backtests) -\n",
+ "                     len(running_backtests)))),\n",
+ "                  end=' ')\n",
+ "            # Number of free slots that can be filled now\n",
+ "            to_run = min(\n",
+ "                running_max - len(running_backtests), total_backtest_num -\n",
+ "                len(running_backtests) - len(finished_backtests))\n",
+ "            # Launch a backtest in each free slot\n",
+ "            for i in range(pointer, pointer + to_run):\n",
+ "                # Row i of the combinations as a dict: key is the variable name, value its candidate value\n",
+ "                params = to_run_df.iloc[i].to_dict()\n",
+ "                # Create the backtest and pass the parameters through extras\n",
+ "                backtest = create_backtest(\n",
+ "                    algorithm_id=algorithm_id,\n",
+ "                    start_date=start_date,\n",
+ "                    end_date=end_date,\n",
+ "                    frequency=frequency,\n",
+ "                    initial_cash=initial_cash,\n",
+ "                    extras=params,\n",
+ "                    # Name the backtest after its parameter values\n",
+ "                    name=str(params),\n",
+ "                    python_version=python_version,\n",
+ "                    use_credit=use_credit)\n",
+ "                # Record the backtest id of run i\n",
+ "                running_backtests[i] = backtest\n",
+ "            # Advance the pointer past the launched runs\n",
+ "            pointer = pointer + to_run\n",
+ "\n",
+ "            # Collect results\n",
+ "            failed = []\n",
+ "            finished = []\n",
+ "            # Keys are row numbers of to_run_df\n",
+ "            for key in list(running_backtests.keys()):\n",
+ "                # Look up the backtest by the id stored when it was launched\n",
+ "                back_id = running_backtests[key]\n",
+ "                bt = get_backtest(back_id)\n",
+ "                # Status is 'done' or 'failed' only once the run has ended\n",
+ "                status = bt.get_status()\n",
+ "                if status == 'failed':\n",
+ "                    # Report the failed backtest and remember its key\n",
+ "                    print('')\n",
+ "                    print((\n",
+ "                        'Backtest failed: https://www.joinquant.com/algorithm/backtest/detail?backtestId='\n",
+ "                        + back_id))\n",
+ "                    failed.append(key)\n",
+ "                elif status == 'done':\n",
+ "                    # finished only records successful runs\n",
+ "                    finished.append(key)\n",
+ "                    # List of dicts holding time, returns and benchmark returns\n",
+ "                    all_results[key] = bt.get_results()\n",
+ "                    # Risk metrics of this backtest\n",
+ "                    all_evaluations[key] = bt.get_risk()\n",
+ "            # Move failed runs out of the running dict so the loop can terminate\n",
+ "            for key in failed:\n",
+ "                finished_backtests[key] = running_backtests.pop(key)\n",
+ "            # Move successful runs out of the running dict\n",
+ "            for key in finished:\n",
+ "                finished_backtests[key] = running_backtests.pop(key)\n",
+ "            # Report timing whenever a full batch has ended\n",
+ "            if len(finished_backtests) != 0 and len(\n",
+ "                    finished_backtests) % running_max == 0 and to_run != 0:\n",
+ "                middle = time.time()\n",
+ "                # Remaining time, assuming every backtest takes equally long\n",
+ "                remain_time = (middle - start) * (\n",
+ "                    total_backtest_num -\n",
+ "                    len(finished_backtests)) / len(finished_backtests)\n",
+ "                print(\n",
+ "                    ('[%s h elapsed, about %s h remaining; do not close the browser].' %\n",
+ "                     (str(round((middle - start) / 60.0 / 60.0,\n",
+ "                                3)), str(round(remain_time / 60.0 / 60.0, 3)))),\n",
+ "                    end=' ')\n",
+ "            self.failed_list += failed\n",
+ "            # Poll again in 5 seconds\n",
+ "            time.sleep(5)\n",
+ "        # End time\n",
+ "        end = time.time()\n",
+ "        print('')\n",
+ "        print(\n",
+ "            ('[Backtests complete] Total time: %s s (%s h).' %\n",
+ "             (str(int(end - start)), str(round(\n",
+ "                 (end - start) / 60.0 / 60.0, 2)))),\n",
+ "            end=' ')\n",
+ "        # Store the outcome on the instance\n",
+ "        self.params_df = to_run_df\n",
+ "        self.results = all_results\n",
+ "        self.evaluations = all_evaluations\n",
+ "        self.backtest_ids = finished_backtests\n",
+ "\n",
+ "    # Maximum drawdown of a cumulative return series\n",
+ "    def find_max_drawdown(self, returns):\n",
+ "        # Largest drawdown seen so far\n",
+ "        result = 0\n",
+ "        # Highest cumulative return seen so far\n",
+ "        historical_return = 0\n",
+ "        # Walk through all dates\n",
+ "        for i in range(len(returns)):\n",
+ "            # Update the running peak\n",
+ "            historical_return = max(historical_return, returns[i])\n",
+ "            # Drawdown relative to the running peak\n",
+ "            drawdown = 1 - (returns[i] + 1) / (historical_return + 1)\n",
+ "            # Keep the largest drawdown\n",
+ "            result = max(drawdown, result)\n",
+ "        return result\n",
+ "\n",
+ "    # Log returns, excess returns over the (new) benchmark, and excess-return drawdown\n",
+ "    def organize_backtest_results(self, benchmark_id=None):\n",
+ "        if benchmark_id is None:\n",
+ "            # No new benchmark given: use the benchmark set in the strategy\n",
+ "            self.benchmark_returns = [\n",
+ "                x['benchmark_returns'] for x in self.results[0]\n",
+ "            ]\n",
+ "        else:\n",
+ "            # Use the returns of the backtest supplied as the new benchmark\n",
+ "            self.benchmark_returns = [\n",
+ "                x['returns'] for x in get_backtest(benchmark_id).get_results()\n",
+ "            ]\n",
+ "        # Dates are taken from the first recorded backtest\n",
+ "        self.dates = [x['time'] for x in self.results[0]]\n",
+ "\n",
+ "        # For each backtest (key), convert records of the form\n",
+ "        # {u'benchmark_returns': 0.022480100091729405,\n",
+ "        #  u'returns': 0.03184566700000002,\n",
+ "        #  u'time': u'2006-02-14'}\n",
+ "        # into {key: [...]}, a list of returns ordered by date\n",
+ "        for key in list(self.results.keys()):\n",
+ "            self.returns[key] = [x['returns'] for x in self.results[key]]\n",
+ "        # Excess return over the benchmark (or the new benchmark)\n",
+ "        for key in list(self.results.keys()):\n",
+ "            self.excess_returns[key] = [\n",
+ "                (x + 1) / (y + 1) - 1\n",
+ "                for (x, y) in zip(self.returns[key], self.benchmark_returns)\n",
+ "            ]\n",
+ "        # Log returns\n",
+ "        for key in list(self.results.keys()):\n",
+ "            self.log_returns[key] = [np.log(x + 1) for x in self.returns[key]]\n",
+ "        # Log excess returns\n",
+ "        for key in list(self.results.keys()):\n",
+ "            self.log_excess_returns[key] = [\n",
+ "                np.log(x + 1) for x in self.excess_returns[key]\n",
+ "            ]\n",
+ "        # Maximum drawdown of the excess return\n",
+ "        for key in list(self.results.keys()):\n",
+ "            self.excess_max_drawdown[key] = self.find_max_drawdown(\n",
+ "                self.excess_returns[key])\n",
+ "        # Annualised excess return, assuming 252 trading days per year\n",
+ "        for key in list(self.results.keys()):\n",
+ "            self.excess_annual_return[key] = (self.excess_returns[key][-1] +\n",
+ "                                              1)**(252. /\n",
+ "                                                   float(len(self.dates))) - 1\n",
+ "        # Join the parameter combinations with their risk metrics\n",
+ "        self.evaluations_df = pd.concat(\n",
+ "            [self.params_df, pd.DataFrame(self.evaluations).T], axis=1)\n",
+ "\n",
+ "\n",
+ "    # Run the queued backtests, organise the results and save them\n",
+ "    def get_backtest_data(\n",
+ "            self,\n",
+ "            algorithm_id=None,  # backtest strategy id\n",
+ "            benchmark_id=None,  # backtest id of the new benchmark\n",
+ "            file_name='results.pkl',  # pickle file in which the results are saved\n",
+ "            running_max=10,  # maximum number of backtests running at once\n",
+ "            start_date='2006-01-01',  # backtest start date\n",
+ "            end_date='2016-11-30',  # backtest end date\n",
+ "            frequency='day',  # backtest frequency\n",
+ "            initial_cash='1000000',  # initial capital\n",
+ "            param_names=[],  # names of the variables being tuned\n",
+ "            param_values=[],  # candidate values for each variable\n",
+ "            python_version=2,\n",
+ "            use_credit=False):\n",
+ "        # Run the queued backtests with the given settings\n",
+ "        self.run_backtest(\n",
+ "            algorithm_id=algorithm_id,\n",
+ "            running_max=running_max,\n",
+ "            start_date=start_date,\n",
+ "            end_date=end_date,\n",
+ "            frequency=frequency,\n",
+ "            initial_cash=initial_cash,\n",
+ "            param_names=param_names,\n",
+ "            param_values=param_values,\n",
+ "            python_version=python_version,\n",
+ "            use_credit=use_credit,\n",
+ "        )\n",
+ "        # Add log returns, excess returns and related metrics\n",
+ "        self.organize_backtest_results(benchmark_id)\n",
+ "        # Collect everything in one dict\n",
+ "        results = {\n",
+ "            'returns': self.returns,\n",
+ "            'excess_returns': self.excess_returns,\n",
+ "            'log_returns': self.log_returns,\n",
+ "            'log_excess_returns': self.log_excess_returns,\n",
+ "            'dates': self.dates,\n",
+ "            'benchmark_returns': self.benchmark_returns,\n",
+ "            'evaluations': self.evaluations,\n",
+ "            'params_df': self.params_df,\n",
+ "            'backtest_ids': self.backtest_ids,\n",
+ "            'excess_max_drawdown': self.excess_max_drawdown,\n",
+ "            'excess_annual_return': self.excess_annual_return,\n",
+ "            'evaluations_df': self.evaluations_df,\n",
+ "            \"failed_list\": self.failed_list\n",
+ "        }\n",
+ "        # Save to the pickle file; the with block closes it even on error\n",
+ "        with open(file_name, 'wb') as pickle_file:\n",
+ "            pickle.dump(results, pickle_file)\n",
+ "\n",
+ "    # Load a saved pickle file and restore the corresponding attributes\n",
+ "    def read_backtest_data(self, file_name='results.pkl'):\n",
+ "        with open(file_name, 'rb') as pickle_file:\n",
+ "            results = pickle.load(pickle_file)\n",
+ " self.returns = results['returns']\n",
+ " self.excess_returns = results['excess_returns']\n",
+ " self.log_returns = results['log_returns']\n",
+ " self.log_excess_returns = results['log_excess_returns']\n",
+ " self.dates = results['dates']\n",
+ " self.benchmark_returns = results['benchmark_returns']\n",
+ " self.evaluations = results['evaluations']\n",
+ " self.params_df = results['params_df']\n",
+ " self.backtest_ids = results['backtest_ids']\n",
+ " self.excess_max_drawdown = results['excess_max_drawdown']\n",
+ " self.excess_annual_return = results['excess_annual_return']\n",
+ " self.evaluations_df = results['evaluations_df']\n",
+ " self.failed_list = results['failed_list']\n",
+ " self.nav_df = self.GetNavDf()\n",
+ "\n",
+ "    # Line chart of cumulative returns\n",
+ "    def plot_returns(self):\n",
+ "        # figsize sets the width and height of the figure in inches\n",
+ "        fig = plt.figure(figsize=(20, 8))\n",
+ "        ax = fig.add_subplot(111)\n",
+ "        # One line per backtest\n",
+ "        for key in list(self.returns.keys()):\n",
+ "            ax.plot(\n",
+ "                list(range(len(self.returns[key]))),\n",
+ "                self.returns[key],\n",
+ "                label=key)\n",
+ "        # Benchmark curve\n",
+ "        ax.plot(\n",
+ "            list(range(len(self.benchmark_returns))),\n",
+ "            self.benchmark_returns,\n",
+ "            label='benchmark',\n",
+ "            c='k',\n",
+ "            linestyle='--')\n",
+ "        ticks = [int(x) for x in np.linspace(0, len(self.dates) - 1, 11)]\n",
+ "        plt.xticks(ticks, [self.dates[i] for i in ticks])\n",
+ "        # Legend\n",
+ "        ax.legend(loc=2, fontsize=10)\n",
+ "        # y-axis label\n",
+ "        ax.set_ylabel('returns', fontsize=20)\n",
+ "        # Show y ticks as percentages\n",
+ "        ax.set_yticklabels([str(x * 100) + '% ' for x in ax.get_yticks()])\n",
+ "        # Title\n",
+ "        ax.set_title(\n",
+ "            \"Strategy's performances with different parameters\", fontsize=21)\n",
+ "        plt.xlim(0, len(self.returns[0]))\n",
+ "\n",
+ "    # Line chart of excess returns\n",
+ "    def plot_excess_returns(self):\n",
+ "        # figsize sets the width and height of the figure in inches\n",
+ "        fig = plt.figure(figsize=(20, 8))\n",
+ "        ax = fig.add_subplot(111)\n",
+ "        # One line per backtest\n",
+ "        for key in list(self.returns.keys()):\n",
+ "            ax.plot(\n",
+ "                list(range(len(self.excess_returns[key]))),\n",
+ "                self.excess_returns[key],\n",
+ "                label=key)\n",
+ "        # The benchmark is the zero line\n",
+ "        ax.plot(\n",
+ "            list(range(len(self.benchmark_returns))),\n",
+ "            [0] * len(self.benchmark_returns),\n",
+ "            label='benchmark',\n",
+ "            c='k',\n",
+ "            linestyle='--')\n",
+ "        ticks = [int(x) for x in np.linspace(0, len(self.dates) - 1, 11)]\n",
+ "        plt.xticks(ticks, [self.dates[i] for i in ticks])\n",
+ "        # Legend\n",
+ "        ax.legend(loc=2, fontsize=10)\n",
+ "        # y-axis label\n",
+ "        ax.set_ylabel('excess returns', fontsize=20)\n",
+ "        # Show y ticks as percentages\n",
+ "        ax.set_yticklabels([str(x * 100) + '% ' for x in ax.get_yticks()])\n",
+ "        # Title\n",
+ "        ax.set_title(\n",
+ "            \"Strategy's performances with different parameters\", fontsize=21)\n",
+ "        plt.xlim(0, len(self.excess_returns[0]))\n",
+ "\n",
+ " \n",
+ "    # Four headline metrics: total return, volatility, Sharpe ratio and maximum drawdown\n",
+ "    def get_eval4_bar(self, sort_by=[]):\n",
+ "\n",
+ "        sorted_params = self.params_df\n",
+ "        for by in sort_by:\n",
+ "            # DataFrame.sort was removed from pandas; sort_values replaces it\n",
+ "            sorted_params = sorted_params.sort_values(by)\n",
+ "        # Keep the sorted order while dropping failed backtests\n",
+ "        indices = [i for i in sorted_params.index if i not in self.failed_list]\n",
+ "        fig = plt.figure(figsize=(20, 7))\n",
+ "\n",
+ "        # (subplot position, key in the risk metrics, label)\n",
+ "        panels = [\n",
+ "            (221, 'algorithm_return', 'Algorithm_return'),\n",
+ "            (224, 'max_drawdown', 'Max_drawdown'),\n",
+ "            (223, 'sharpe', 'Sharpe'),\n",
+ "            (222, 'algorithm_volatility', 'Algorithm_volatility'),\n",
+ "        ]\n",
+ "        for position, metric, label in panels:\n",
+ "            ax = fig.add_subplot(position)\n",
+ "            # x axis: parameter combination; y axis: the metric\n",
+ "            ax.bar(\n",
+ "                list(range(len(indices))),\n",
+ "                [self.evaluations[x][metric] for x in indices],\n",
+ "                0.6,\n",
+ "                label=label)\n",
+ "            plt.xticks([x + 0.3 for x in range(len(indices))], indices)\n",
+ "            # Legend\n",
+ "            ax.legend(loc='best', fontsize=15)\n",
+ "            # y-axis label\n",
+ "            ax.set_ylabel(label, fontsize=15)\n",
+ "            # Show y ticks as percentages\n",
+ "            ax.set_yticklabels([str(x * 100) + '% ' for x in ax.get_yticks()])\n",
+ "            # Title\n",
+ "            ax.set_title(\n",
+ "                \"Strategy's of %s performances of different quantile\" % label,\n",
+ "                fontsize=15)\n",
+ "            # x-axis range\n",
+ "            plt.xlim(0, len(indices))\n",
+ "\n",
+ "    # Annualised return and maximum drawdown, drawn in two colours\n",
+ "    def get_eval(self, sort_by=[]):\n",
+ "\n",
+ "        sorted_params = self.params_df\n",
+ "        for by in sort_by:\n",
+ "            # DataFrame.sort was removed from pandas; sort_values replaces it\n",
+ "            sorted_params = sorted_params.sort_values(by)\n",
+ "        # Keep the sorted order while dropping failed backtests\n",
+ "        indices = [i for i in sorted_params.index if i not in self.failed_list]\n",
+ "        # Figure size\n",
+ "        fig = plt.figure(figsize=(20, 8))\n",
+ "        # Single panel\n",
+ "        ax = fig.add_subplot(111)\n",
+ "        # Maximum drawdown, drawn downwards\n",
+ "        ax.bar([x + 0.3 for x in range(len(indices))],\n",
+ "               [-self.evaluations[x]['max_drawdown'] for x in indices],\n",
+ "               color='#32CD32',\n",
+ "               width=0.6,\n",
+ "               label='Max_drawdown',\n",
+ "               zorder=10)\n",
+ "        # Annualised return\n",
+ "        ax.bar([x for x in range(len(indices))],\n",
+ "               [self.evaluations[x]['annual_algo_return'] for x in indices],\n",
+ "               color='r',\n",
+ "               width=0.6,\n",
+ "               label='Annual_return')\n",
+ "        plt.xticks([x + 0.3 for x in range(len(indices))], indices)\n",
+ "        # Legend\n",
+ "        ax.legend(loc='best', fontsize=15)\n",
+ "        # Zero line\n",
+ "        plt.plot([0, len(indices)], [0, 0], c='k', linestyle='--', label='zero')\n",
+ "        # Refresh the legend so it includes the zero line\n",
+ "        ax.legend(loc='best', fontsize=15)\n",
+ "        # y-axis label\n",
+ "        ax.set_ylabel('Max_drawdown', fontsize=15)\n",
+ "        # Show y ticks as percentages\n",
+ "        ax.set_yticklabels([str(x * 100) + '% ' for x in ax.get_yticks()])\n",
+ "        # Title\n",
+ "        ax.set_title(\n",
+ "            \"Strategy's performances of different quantile\", fontsize=15)\n",
+ "        # x-axis range\n",
+ "        plt.xlim(0, len(indices))\n",
+ "\n",
+    "    #14 Annualized excess return and max drawdown of the excess return,\n",
+    "    # measured against the newly added benchmark\n",
+    "    def get_excess_eval(self, sort_by=[]):\n",
+    "\n",
+    "        sorted_params = self.params_df\n",
+    "        for by in sort_by:\n",
+    "            # DataFrame.sort was removed from pandas; sort_values is its replacement\n",
+    "            sorted_params = sorted_params.sort_values(by)\n",
+    "        # Drop failed backtests while keeping the sorted order (a set would lose it)\n",
+    "        indices = [x for x in sorted_params.index if x not in self.failed_list]\n",
+    "        # Figure size\n",
+    "        fig = plt.figure(figsize=(20, 8))\n",
+    "        # Single subplot\n",
+    "        ax = fig.add_subplot(111)\n",
+    "        # Bars: max drawdown of the excess return, plotted as a negative value\n",
+    "        ax.bar([x + 0.3 for x in range(len(indices))],\n",
+    "               [-self.excess_max_drawdown[x] for x in indices],\n",
+    "               color='#32CD32',\n",
+    "               width=0.6,\n",
+    "               label='Excess_max_drawdown')\n",
+    "        # Bars: annualized excess return\n",
+    "        ax.bar([x for x in range(len(indices))],\n",
+    "               [self.excess_annual_return[x] for x in indices],\n",
+    "               color='r',\n",
+    "               width=0.6,\n",
+    "               label='Excess_annual_return')\n",
+    "        plt.xticks([x + 0.3 for x in range(len(indices))], indices)\n",
+    "        # Zero reference line\n",
+    "        plt.plot([0, len(indices)], [0, 0], c='k', linestyle='--', label='zero')\n",
+    "        # Legend style (created after the zero line so that it is included)\n",
+    "        ax.legend(loc='best', fontsize=15)\n",
+    "        # y-axis label style\n",
+    "        ax.set_ylabel('Max_drawdown', fontsize=15)\n",
+    "        # y-axis tick labels as percentages\n",
+    "        ax.set_yticklabels([str(x * 100) + '% ' for x in ax.get_yticks()])\n",
+    "        # Title style\n",
+    "        ax.set_title(\n",
+    "            \"Strategy performance for different quantiles\", fontsize=15)\n",
+    "        # x-axis range\n",
+    "        plt.xlim(0, len(indices))\n",
+ " \n",
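+    "    def GetNavDf(self):\n",
+    "        \n",
+    "        # Cumulative returns + 1 give the net asset value (NAV) of each run\n",
+    "        df = 1 + pd.DataFrame(self.returns,index=self.dates)\n",
+    "        # NOTE: these labels assume the columns of self.returns are already in this order\n",
+    "        df.columns = ['SVM','Logistic','RandomForest','not_timing']\n",
+    "        # The benchmark needs the same +1 shift; without it the series starts near zero,\n",
+    "        # which distorts the daily returns and triggers the log1p warnings\n",
+    "        df['benchmark'] = 1 + np.asarray(self.benchmark_returns)\n",
+    "        df.index=pd.to_datetime(df.index)\n",
+    "        return df\n",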
+ " \n",
+    "    # Portfolio return analysis: annualized return, return volatility, Sharpe ratio, max drawdown\n",
+    "    def strategy_performance(self,nav_df=None):\n",
+    "        \n",
+    "        # Fall back to the stored NAV table when no DataFrame is passed in\n",
+    "        if not isinstance(nav_df,pd.DataFrame):\n",
+    "            nav_df = self.nav_df\n",
+    "        ##part1: prepare the daily data needed for the metrics from the backtest NAV\n",
+    "        nav_prev = nav_df.shift(1)\n",
+    "        return_df = (nav_df - nav_prev) / nav_prev  # daily return = relative NAV change, for strategies and benchmark\n",
+    "        return_df = return_df.dropna()  # the first day has no previous NAV, so its NaN row is dropped\n",
+    "\n",
+    "        analyze = pd.DataFrame()  # holds the computed metrics\n",
+    "\n",
+    "        ##part2: annualized return\n",
+    "        cum_return = np.exp(np.log1p(return_df).cumsum()) - 1  # compounded return over the whole backtest\n",
+    "        annual_return_df = (1 + cum_return)**(252 / len(return_df)) - 1  # annualize assuming 252 trading days\n",
+    "        analyze['annual_return'] = annual_return_df.iloc[-1]  # the last row is the full-period annualized return\n",
+    "\n",
+    "        #part3: annualized return volatility\n",
+    "        analyze['return_volatility'] = return_df.std() * np.sqrt(\n",
+    "            252)  # return_df holds daily returns, so scale by np.sqrt(252) to annualize\n",
+    "\n",
+    "        #part4: Sharpe ratio (daily, not annualized; multiply by np.sqrt(252) to annualize)\n",
+    "        risk_free = 0\n",
+    "        return_risk_adj = return_df - risk_free\n",
+    "        analyze['sharpe_ratio'] = return_risk_adj.mean() / np.std(\n",
+    "            return_risk_adj, ddof=1)\n",
+    "\n",
+    "        #part5: max drawdown\n",
+    "        cumulative = np.exp(np.log1p(return_df).cumsum()) * 100  # cumulative NAV, rebased to 100\n",
+    "        max_return = cumulative.cummax()  # running peak of the cumulative NAV\n",
+    "        analyze['max_drawdown'] = cumulative.sub(max_return).div(\n",
+    "            max_return).min()  # max drawdown is <= 0; the more negative, the further the NAV fell below its peak\n",
+    "\n",
+    "        #part6: metrics relative to the benchmark\n",
+    "        analyze['relative_return'] = analyze['annual_return'] - analyze.loc[\n",
+    "            'benchmark', 'annual_return']  # relative annualized return\n",
+    "        analyze['relative_volatility'] = analyze['return_volatility'] - analyze.loc[\n",
+    "            'benchmark', 'return_volatility']  # relative volatility\n",
+    "        analyze['relative_drawdown'] = analyze['max_drawdown'] - analyze.loc[\n",
+    "            'benchmark', 'max_drawdown']  # relative max drawdown\n",
+    "\n",
+    "        #part7: information ratio\n",
+    "        return_diff = return_df.sub(\n",
+    "            return_df['benchmark'], axis=0).std() * np.sqrt(\n",
+    "                252)  # annualized standard deviation of the daily strategy-minus-benchmark return\n",
+    "        analyze['info_ratio'] = analyze['relative_return'].div(return_diff)\n",
+    "\n",
+    "        return analyze.T\n",
+ "\n",
+ "\n",
+    "    # Year-by-year performance for one method\n",
+    "    def get_return_year(self,method):\n",
+    "        \n",
+    "        nav = self.nav_df[['benchmark',method]]\n",
+    "        result_dic = {}  # metrics computed for each calendar year\n",
+    "        for y,nav_df in nav.groupby(pd.Grouper(level=0,freq='Y')):\n",
+    "\n",
+    "            result = self.strategy_performance(nav_df)\n",
+    "            # The last column is the method; the first is the benchmark\n",
+    "            result_dic[str(y)[:4]] = result.iloc[:, -1]\n",
+    "\n",
+    "        result_df = pd.DataFrame(result_dic)\n",
+    "        \n",
+    "        return result_df.T\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 46,
+ "metadata": {
+ "code_folding": []
+ },
+ "outputs": [],
+ "source": [
+    "#2 Set the strategy ID to backtest\n",
+ "pa = parameter_analysis('a46bf5ddd29f2ce407d284e4bb01a6ee')"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 16,
+ "metadata": {
+ "code_folding": [],
+ "scrolled": false
+ },
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "【已完成|运行中|待运行】: [0|0|4]. [0|2|2]. [0|2|2]. [0|2|2]. [0|2|2]. [0|2|2]. [0|2|2]. [0|2|2]. [0|2|2]. [0|2|2]. [0|2|2]. [0|2|2]. [0|2|2]. [0|2|2]. [0|2|2]. [0|2|2]. [0|2|2]. [0|2|2]. [0|2|2]. [0|2|2]. [0|2|2]. [0|2|2]. [0|2|2]. [0|2|2]. [0|2|2]. [0|2|2]. [0|2|2]. [0|2|2]. [0|2|2]. [0|2|2]. [0|2|2]. [0|2|2]. [0|2|2]. [0|2|2]. [0|2|2]. [0|2|2]. [0|2|2]. [0|2|2]. [1|1|2]. [1|2|1]. [1|2|1]. [1|2|1]. [1|2|1]. [1|2|1]. [1|2|1]. [1|2|1]. [2|1|1]. [已用0.072时,尚余0.072时,请不要关闭浏览器]. [2|2|0]. [2|2|0]. [2|2|0]. [2|2|0]. [2|2|0]. [2|2|0]. [2|2|0]. [2|2|0]. [2|2|0]. [2|2|0]. [2|2|0]. [2|2|0]. [2|2|0]. [2|2|0]. [2|2|0]. [2|2|0]. [2|2|0]. [2|2|0]. [2|2|0]. [2|2|0]. [2|2|0]. [2|2|0]. [2|2|0]. [2|2|0]. [2|2|0]. [2|2|0]. [2|2|0]. [2|2|0]. [3|1|0]. [3|1|0]. [3|1|0]. [3|1|0]. [3|1|0]. [3|1|0]. [3|1|0]. [3|1|0]. [3|1|0]. [3|1|0]. [3|1|0]. [3|1|0]. [3|1|0]. \n",
+ "【回测完成】总用时:488秒(即0.14小时)。 "
+ ]
+ }
+ ],
+ "source": [
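+    "#3 Run the backtests\n",
+    "pa.get_backtest_data(file_name = 'results.pkl',  # pickle file in which the backtest results are saved\n",
+    "                          running_max = 2,      # maximum number of concurrent backtests (more can be redeemed in the points store)\n",
+    "                          benchmark_id = None,   # backtest ID of the benchmark (a backtest ID, not a strategy ID); None uses the strategy's own benchmark\n",
+    "                          start_date = '2014-01-01', # backtest start date\n",
+    "                          end_date = '2019-12-31',   # backtest end date\n",
+    "                          frequency = 'day',         # backtest frequency: day, minute, or tick\n",
+    "                          initial_cash = '5000000',  # initial capital\n",
+    "                          param_names = ['method'],  # parameter names\n",
+    "                          param_values = [['SVM','Logistic','RandomForest','not_timing']], # values for each parameter\n",
+    "                          python_version = 3   # Python version used for the backtest\n",
+    "                          )"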
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 47,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+    "#4 Load the data; if the backtests have already been run, reading the saved file is enough\n",
+ "pa.read_backtest_data('results.pkl')"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 48,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stderr",
+ "output_type": "stream",
+ "text": [
+ "/opt/conda/lib/python3.6/site-packages/ipykernel_launcher.py:588: RuntimeWarning: invalid value encountered in log1p\n",
+ "/opt/conda/lib/python3.6/site-packages/ipykernel_launcher.py:603: RuntimeWarning: invalid value encountered in log1p\n"
+ ]
+ },
+ {
+ "data": {
+ "text/html": [
+      "<div>\n",
+      "<table border=\"1\" class=\"dataframe\">\n",
+      "  <thead>\n",
+      "    <tr><th></th><th>SVM</th><th>Logistic</th><th>RandomForest</th><th>not_timing</th><th>benchmark</th></tr>\n",
+      "  </thead>\n",
+      "  <tbody>\n",
+      "    <tr><th>annual_return</th><td>0.371384</td><td>0.395206</td><td>0.081977</td><td>0.368583</td><td>3.236285</td></tr>\n",
+      "    <tr><th>return_volatility</th><td>0.258946</td><td>0.257669</td><td>0.271958</td><td>0.255917</td><td>33.531337</td></tr>\n",
+      "    <tr><th>sharpe_ratio</th><td>0.085086</td><td>0.089640</td><td>0.026892</td><td>0.085392</td><td>-0.011165</td></tr>\n",
+      "    <tr><th>max_drawdown</th><td>-0.387147</td><td>-0.418825</td><td>-0.552289</td><td>-0.380277</td><td>-0.981639</td></tr>\n",
+      "    <tr><th>relative_return</th><td>-2.864900</td><td>-2.841078</td><td>-3.154308</td><td>-2.867701</td><td>0.000000</td></tr>\n",
+      "    <tr><th>relative_volatility</th><td>-33.272391</td><td>-33.273668</td><td>-33.259379</td><td>-33.275420</td><td>0.000000</td></tr>\n",
+      "    <tr><th>relative_drawdown</th><td>0.594492</td><td>0.562815</td><td>0.429350</td><td>0.601363</td><td>0.000000</td></tr>\n",
+      "    <tr><th>info_ratio</th><td>-0.085443</td><td>-0.084733</td><td>-0.094067</td><td>-0.085526</td><td>NaN</td></tr>\n",
+      "  </tbody>\n",
+      "</table>\n",
+      "</div>"
+ ],
+ "text/plain": [
+ " SVM Logistic ... not_timing benchmark\n",
+ "annual_return 0.371384 0.395206 ... 0.368583 3.236285\n",
+ "return_volatility 0.258946 0.257669 ... 0.255917 33.531337\n",
+ "sharpe_ratio 0.085086 0.089640 ... 0.085392 -0.011165\n",
+ "max_drawdown -0.387147 -0.418825 ... -0.380277 -0.981639\n",
+ "relative_return -2.864900 -2.841078 ... -2.867701 0.000000\n",
+ "relative_volatility -33.272391 -33.273668 ... -33.275420 0.000000\n",
+ "relative_drawdown 0.594492 0.562815 ... 0.601363 0.000000\n",
+ "info_ratio -0.085443 -0.084733 ... -0.085526 NaN\n",
+ "\n",
+ "[8 rows x 5 columns]"
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ },
+ {
+ "data": {
+ "text/html": [
+      "<div>\n",
+      "<table border=\"1\" class=\"dataframe\">\n",
+      "  <thead>\n",
+      "    <tr><th></th><th>annual_return</th><th>return_volatility</th><th>sharpe_ratio</th><th>max_drawdown</th><th>relative_return</th><th>relative_volatility</th><th>relative_drawdown</th><th>info_ratio</th></tr>\n",
+      "  </thead>\n",
+      "  <tbody>\n",
+      "    <tr><th>2014</th><td>1.099792</td><td>0.224587</td><td>0.215394</td><td>-0.111590</td><td>-3863.881091</td><td>-81.977843</td><td>0.870049</td><td>-47.001906</td></tr>\n",
+      "    <tr><th>2015</th><td>0.484994</td><td>0.421962</td><td>0.072478</td><td>-0.361505</td><td>0.414161</td><td>-0.622279</td><td>0.408428</td><td>0.618584</td></tr>\n",
+      "    <tr><th>2016</th><td>0.050648</td><td>0.258483</td><td>0.020246</td><td>-0.219820</td><td>0.195008</td><td>-0.554913</td><td>0.347267</td><td>0.341932</td></tr>\n",
+      "    <tr><th>2017</th><td>0.375174</td><td>0.102547</td><td>0.199046</td><td>-0.052014</td><td>-0.337757</td><td>-0.171973</td><td>0.099371</td><td>-1.762492</td></tr>\n",
+      "    <tr><th>2018</th><td>-0.078880</td><td>0.192486</td><td>-0.020834</td><td>-0.160097</td><td>0.548713</td><td>-0.491310</td><td>0.519260</td><td>1.077565</td></tr>\n",
+      "    <tr><th>2019</th><td>0.635615</td><td>0.225380</td><td>0.144691</td><td>-0.169371</td><td>-1.232669</td><td>-0.327385</td><td>0.141103</td><td>-3.420030</td></tr>\n",
+      "  </tbody>\n",
+      "</table>\n",
+      "</div>"
+ ],
+ "text/plain": [
+ " annual_return ... info_ratio\n",
+ "2014 1.099792 ... -47.001906\n",
+ "2015 0.484994 ... 0.618584\n",
+ "2016 0.050648 ... 0.341932\n",
+ "2017 0.375174 ... -1.762492\n",
+ "2018 -0.078880 ... 1.077565\n",
+ "2019 0.635615 ... -3.420030\n",
+ "\n",
+ "[6 rows x 8 columns]"
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ },
+ {
+ "data": {
+ "text/html": [
+      "<div>\n",
+      "<table border=\"1\" class=\"dataframe\">\n",
+      "  <thead>\n",
+      "    <tr><th></th><th>0</th><th>1</th><th>2</th><th>3</th></tr>\n",
+      "  </thead>\n",
+      "  <tbody>\n",
+      "    <tr><th>method</th><td>SVM</td><td>Logistic</td><td>RandomForest</td><td>not_timing</td></tr>\n",
+      "    <tr><th>__version</th><td>101</td><td>101</td><td>101</td><td>101</td></tr>\n",
+      "    <tr><th>algorithm_return</th><td>5.91364</td><td>5.25583</td><td>5.18202</td><td>0.579988</td></tr>\n",
+      "    <tr><th>algorithm_volatility</th><td>0.256558</td><td>0.257829</td><td>0.254813</td><td>0.270784</td></tr>\n",
+      "    <tr><th>alpha</th><td>0.289332</td><td>0.265174</td><td>0.263378</td><td>-0.0250981</td></tr>\n",
+      "    <tr><th>annual_algo_return</th><td>0.391209</td><td>0.367658</td><td>0.364889</td><td>0.0812425</td></tr>\n",
+      "    <tr><th>annual_bm_return</th><td>0.101153</td><td>0.101153</td><td>0.101153</td><td>0.101153</td></tr>\n",
+      "    <tr><th>avg_excess_return</th><td>0.000952917</td><td>0.000883705</td><td>0.00087605</td><td>-5.69881e-05</td></tr>\n",
+      "    <tr><th>avg_position_days</th><td>NaN</td><td>NaN</td><td>NaN</td><td>165.23</td></tr>\n",
+      "    <tr><th>avg_trade_return</th><td>NaN</td><td>NaN</td><td>NaN</td><td>0.0899963</td></tr>\n",
+      "    <tr><th>benchmark_return</th><td>0.758166</td><td>0.758166</td><td>0.758166</td><td>0.758166</td></tr>\n",
+      "    <tr><th>benchmark_volatility</th><td>0.23633</td><td>0.23633</td><td>0.23633</td><td>0.23633</td></tr>\n",
+      "    <tr><th>beta</th><td>1.01185</td><td>1.02178</td><td>1.00586</td><td>1.08483</td></tr>\n",
+      "    <tr><th>day_win_ratio</th><td>NaN</td><td>NaN</td><td>NaN</td><td>0.506831</td></tr>\n",
+      "    <tr><th>excess_return</th><td>2.9323</td><td>2.55816</td><td>2.51617</td><td>-0.101343</td></tr>\n",
+      "    <tr><th>excess_return_max_drawdown</th><td>0.0907727</td><td>0.0828414</td><td>0.084735</td><td>0.303404</td></tr>\n",
+      "    <tr><th>excess_return_max_drawdown_period</th><td>[2015-08-17, 2015-09-30]</td><td>[2015-08-24, 2015-09-30]</td><td>[2015-06-30, 2015-07-09]</td><td>[2015-03-25, 2019-07-04]</td></tr>\n",
+      "    <tr><th>excess_return_sharpe</th><td>2.40526</td><td>2.23159</td><td>2.17414</td><td>-0.64979</td></tr>\n",
+      "    <tr><th>information</th><td>3.11929</td><td>2.94462</td><td>2.87357</td><td>-0.222655</td></tr>\n",
+      "    <tr><th>lose_count</th><td>NaN</td><td>NaN</td><td>NaN</td><td>1101</td></tr>\n",
+      "    <tr><th>max_drawdown</th><td>0.418825</td><td>0.387147</td><td>0.380277</td><td>0.552289</td></tr>\n",
+      "    <tr><th>max_drawdown_period</th><td>[2015-06-12, 2016-01-28]</td><td>[2015-06-12, 2016-01-28]</td><td>[2015-06-12, 2015-08-26]</td><td>[2015-06-02, 2019-01-03]</td></tr>\n",
+      "    <tr><th>max_leverage</th><td>0</td><td>0</td><td>0</td><td>0</td></tr>\n",
+      "    <tr><th>period_label</th><td>2019-12</td><td>2019-12</td><td>2019-12</td><td>2019-12</td></tr>\n",
+      "    <tr><th>profit_loss_ratio</th><td>NaN</td><td>NaN</td><td>NaN</td><td>1.20456</td></tr>\n",
+      "    <tr><th>sharpe</th><td>1.36893</td><td>1.27084</td><td>1.27501</td><td>0.152308</td></tr>\n",
+      "    <tr><th>sortino</th><td>1.64223</td><td>1.53232</td><td>1.52263</td><td>0.180824</td></tr>\n",
+      "    <tr><th>trading_days</th><td>1464</td><td>1464</td><td>1464</td><td>1464</td></tr>\n",
+      "    <tr><th>treasury_return</th><td>0.23989</td><td>0.23989</td><td>0.23989</td><td>0.23989</td></tr>\n",
+      "    <tr><th>turnover_rate</th><td>NaN</td><td>NaN</td><td>NaN</td><td>0.0231716</td></tr>\n",
+      "    <tr><th>win_count</th><td>NaN</td><td>NaN</td><td>NaN</td><td>1629</td></tr>\n",
+      "    <tr><th>win_ratio</th><td>NaN</td><td>NaN</td><td>NaN</td><td>0.596703</td></tr>\n",
+      "  </tbody>\n",
+      "</table>\n",
+      "</div>"
+ ],
+ "text/plain": [
+ " 0 ... 3\n",
+ "method SVM ... not_timing\n",
+ "__version 101 ... 101\n",
+ "algorithm_return 5.91364 ... 0.579988\n",
+ "algorithm_volatility 0.256558 ... 0.270784\n",
+ "alpha 0.289332 ... -0.0250981\n",
+ "annual_algo_return 0.391209 ... 0.0812425\n",
+ "annual_bm_return 0.101153 ... 0.101153\n",
+ "avg_excess_return 0.000952917 ... -5.69881e-05\n",
+ "avg_position_days NaN ... 165.23\n",
+ "avg_trade_return NaN ... 0.0899963\n",
+ "benchmark_return 0.758166 ... 0.758166\n",
+ "benchmark_volatility 0.23633 ... 0.23633\n",
+ "beta 1.01185 ... 1.08483\n",
+ "day_win_ratio NaN ... 0.506831\n",
+ "excess_return 2.9323 ... -0.101343\n",
+ "excess_return_max_drawdown 0.0907727 ... 0.303404\n",
+ "excess_return_max_drawdown_period [2015-08-17, 2015-09-30] ... [2015-03-25, 2019-07-04]\n",
+ "excess_return_sharpe 2.40526 ... -0.64979\n",
+ "information 3.11929 ... -0.222655\n",
+ "lose_count NaN ... 1101\n",
+ "max_drawdown 0.418825 ... 0.552289\n",
+ "max_drawdown_period [2015-06-12, 2016-01-28] ... [2015-06-02, 2019-01-03]\n",
+ "max_leverage 0 ... 0\n",
+ "period_label 2019-12 ... 2019-12\n",
+ "profit_loss_ratio NaN ... 1.20456\n",
+ "sharpe 1.36893 ... 0.152308\n",
+ "sortino 1.64223 ... 0.180824\n",
+ "trading_days 1464 ... 1464\n",
+ "treasury_return 0.23989 ... 0.23989\n",
+ "turnover_rate NaN ... 0.0231716\n",
+ "win_count NaN ... 1629\n",
+ "win_ratio NaN ... 0.596703\n",
+ "\n",
+ "[32 rows x 4 columns]"
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+    "#6 View the backtest performance metrics\n",
+ "print_table(pa.strategy_performance())\n",
+ "print_table(pa.get_return_year('SVM'))\n",
+ "print_table(pa.evaluations_df.T)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 21,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "image/png": "\n",
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {
+ "needs_background": "light"
+ },
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+    "#7 Line chart of returns\n",
+ "pa.plot_returns()"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": []
+ }
+ ],
+ "metadata": {
+ "kernelspec": {
+ "display_name": "Python 3",
+ "language": "python",
+ "name": "python3"
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.6.7"
+ },
+ "toc": {
+ "base_numbering": 1,
+ "nav_menu": {},
+ "number_sections": false,
+ "sideBar": true,
+ "skip_h1_title": false,
+ "title_cell": "MarkDown菜单",
+ "title_sidebar": "Contents",
+ "toc_cell": false,
+ "toc_position": {},
+ "toc_section_display": true,
+ "toc_window_display": true
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+}
diff --git a/README.md b/README.md
index bde586b..3f1c36b 100644
--- a/README.md
+++ b/README.md
@@ -1,3 +1,11 @@
+
# Quantitative-analysis
 ## Reproducing the quantitative research reports of major Chinese brokerages in Python
@@ -30,6 +38,8 @@
 13. [Stock trend prediction based on price-level efficiency theory](https://www.joinquant.com/view/community/detail/f5d05b8233169adbbf44fb7522b2bf53)
 14. [Technical pattern recognition](https://www.joinquant.com/view/community/detail/1636a1cadab86dc65c65355fe431380c)
     - Reproduces "Foundations of Technical Analysis"
+    - Technical Pattern Recognition file: daily tracking of Shenwan industries (Technical Pattern Recognition)
+
 **Factors**
@@ -45,7 +55,11 @@
 10. [The hidden structure of the amplitude factor](https://www.joinquant.com/view/community/detail/a35fe484e3164893d4e48fafd3e08fd2)
 11. [Stock selection with a high-quality momentum factor](https://www.joinquant.com/view/community/detail/f72c599da7d4ca155b25bff4b281e2e6)
 12. [An improved APM factor model](https://www.joinquant.com/view/community/detail/992fe40cc06c0bde50aa4aaf93fa042c)
-12. [High-frequency price-volume correlation: an unexpected stock-selection factor](https://www.joinquant.com/view/community/detail/539e74507dbf571f2be21d8fa4ebb8e6)
+13. [High-frequency price-volume correlation: an unexpected stock-selection factor](https://www.joinquant.com/view/community/detail/539e74507dbf571f2be21d8fa4ebb8e6)
+14. ["Adapting to the Times" research series, part 2: factor effectiveness analysis based on the corporate life cycle]()
+    1. The composition_factor algorithm is taken from 《20190104-华泰证券-因子合成方法实证分析》 (Huatai Securities, an empirical analysis of factor combination methods)
+    2. [IPCA](https://github.com/bkelly-lab/ipca) is taken from ["Instrumented Principal Component Analysis"](https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2983919)
+15. [Factor timing](https://www.joinquant.com/view/community/detail/a873b8ba2b510a228eac411eafb93bea)
 **Quantitative value**