A Handy Tool for Quick Theme Research

Source: https://uqer.io/community/share/551e5160f9f06c8f33904513

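This notebook is for quickly researching a theme. It returns the following information:

  • The constituent stocks associated with the theme
  • The theme's return over the past year, the past 3 months, and the past 5 trading days
  • The theme's leading stocks in each of those three windows, selected by return and by turnover, together with each leader's return over the window
  • The stocks most closely related to the theme according to the DataYes (通联) relevance algorithm, together with their returns over the same three windows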

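How to use the code

  • Step 1: Enter the name of the theme to research (for example, "新能源汽车", new-energy vehicles) at "Input 1" and run that cell. It lists the theme IDs that match. Several themes may contain the text you entered, so pick the one you want to study.
  • Step 2: Once you have the theme ID, set theme_id in the cell marked "Input 2". Note that the value must be a string.
  • Step 3: Run all cells to get the information for the theme.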

    # First, look up the theme ID from the theme name
    themeName = u'生物医药'  # Input 1: the name of the theme to research
    field1 = ['themeID', 'themeName']
    thms_id = DataAPI.ThemesContentGet(themeName=themeName, field=field1)
    thmid2nm_dic = dict(zip(thms_id['themeID'], thms_id['themeName']))  # map theme ID -> theme name
    thms_id

   themeID  themeName
0     4462  生物医药股
1   120419  生物医药
2   120420  生物医药产业
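
Because the lookup matches on part of the name, several themes can come back, as in the result above. The sketch below shows one possible way to pick the exact match in code rather than reading it off the table. It rebuilds the three rows above as a plain pandas DataFrame so that it runs without DataAPI, and `pick_theme_id` is a hypothetical helper introduced here, not something the platform provides.

```python
import pandas as pd

# Stand-in for the DataAPI.ThemesContentGet result shown above (assumption:
# the real call returns a DataFrame with these two columns).
thms_id = pd.DataFrame({
    'themeID': [4462, 120419, 120420],
    'themeName': [u'生物医药股', u'生物医药', u'生物医药产业'],
})

def pick_theme_id(df, name):
    """Return, as a string, the themeID whose themeName equals `name` exactly."""
    exact = df[df['themeName'] == name]
    if len(exact) != 1:
        raise ValueError('expected exactly one exact match, got %d' % len(exact))
    return str(exact['themeID'].iloc[0])

theme_id = pick_theme_id(thms_id, u'生物医药')
print(theme_id)
```

Returning a string matches the note in Step 2 that theme_id must be a string.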

    # Input cell
    theme_id = '120419'  # Input 2: the theme ID found above; must be a string
    field2 = ['themeID', 'themeName', 'ticker', 'secShortName', 'returnScore', 'textContributionScore', 'industryScore']
    thm_tks = DataAPI.TickersByThemesGet(themeID=theme_id, field=field2)  # securities related to the theme, with their relevance scores
    tk2nm_dic = dict(zip(thm_tks['ticker'], thm_tks['secShortName']))  # map ticker -> short name

    import pandas as pd
    from CAL.PyCAL import *
    from datetime import datetime  # imported last so that `datetime` is the class

    cal = Calendar('China.SSE')

    def CountTime():
        """Return the latest trading day whose session has finished, as a datetime."""
        today = datetime.today()
        today_str = today.strftime("%Y%m%d")
        cal_date = Date.fromDateTime(today)
        # Treat the session as finished five minutes after the 15:00 close
        ben_time = datetime.strptime(today_str + " 15:05:00", "%Y%m%d %H:%M:%S")
        if cal.isBizDay(cal_date):
            if today > ben_time:
                return today
            # Trading day, but the session is not over yet: step back one trading day.
            # (adjustDate with Preceding would return today itself here.)
            cal_wd = cal.advanceDate(cal_date, '-1B', BizDayConvention.Preceding)  # Date
        else:
            # Not a trading day: use the most recent trading day
            cal_wd = cal.adjustDate(cal_date, BizDayConvention.Preceding)  # Date
        return cal_wd.toDateTime()

    def GetMktEqud(tk_list, **kargs):
        """Fetch daily quotes. The API limits request size, so query 100 tickers at a time."""
        num = 100
        if len(tk_list) <= num:
            return DataAPI.MktEqudGet(ticker=tk_list, **kargs)
        frames = [DataAPI.MktEqudGet(ticker=tk_list[i:i + num], **kargs)
                  for i in range(0, len(tk_list), num)]
        return pd.concat(frames)

    def GetReturn(Mkt_Info_df):
        """Return (theme return, per-stock DataFrame) for the rows passed in.

        Each stock's return is its compounded daily close/preClose ratio minus 1.
        The theme return is the average of the stock returns, weighted by each
        stock's mean daily turnover value.
        """
        tk_inc_dic = {'ticker': [], 'return': [], 'turnoverValue': []}
        for tk, sub_info in Mkt_Info_df.groupby('ticker'):
            rtn = sub_info['increase'].prod() - 1
            tnv = sub_info['turnoverValue'].sum() / len(sub_info)  # mean daily turnover value
            tk_inc_dic['ticker'].append(tk)
            tk_inc_dic['return'].append(rtn)
            tk_inc_dic['turnoverValue'].append(tnv)
        tk_inc_df = pd.DataFrame(tk_inc_dic)
        tk_inc_df['secShortName'] = tk_inc_df['ticker'].apply(lambda x: tk2nm_dic[x])
        # Turnover-weighted return of the theme over the period
        rtn_together = (tk_inc_df['return'] * tk_inc_df['turnoverValue']).sum() / tk_inc_df['turnoverValue'].sum()
        return rtn_together, tk_inc_df
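
To make the weighting in GetReturn concrete, here is a minimal sketch on toy data. The two tickers 'AAA' and 'BBB' and all their prices are made up for illustration, and the groupby aggregation is just a compact way to express the same arithmetic, not the notebook's own code.

```python
import pandas as pd

# Two trading days for each of two hypothetical tickers.
mkt = pd.DataFrame({
    'ticker':        ['AAA', 'AAA', 'BBB', 'BBB'],
    'preClosePrice': [10.0, 11.0, 20.0, 20.0],
    'closePrice':    [11.0, 12.1, 20.0, 19.0],
    'turnoverValue': [300.0, 100.0, 50.0, 150.0],
})
mkt['increase'] = mkt['closePrice'] / mkt['preClosePrice']  # daily gross return

grouped = mkt.groupby('ticker')
per_stock = pd.DataFrame({
    # Compound the daily ratios, then subtract 1 to get the period return.
    'return': grouped['increase'].prod() - 1,
    # Mean daily turnover value is the weight of each stock.
    'turnoverValue': grouped['turnoverValue'].mean(),
})

# Turnover-weighted average of the per-stock returns.
theme_return = ((per_stock['return'] * per_stock['turnoverValue']).sum()
                / per_stock['turnoverValue'].sum())
print(round(theme_return, 4))
```

With these numbers 'AAA' gains 21% on a weight of 200 and 'BBB' loses 5% on a weight of 100, so the heavily traded stock dominates the theme return.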

    print('Stocks associated with the theme')
    thm_tks

Stocks associated with the theme
    themeID  themeName  ticker  secShortName  returnScore  textContributionScore  industryScore
 0   120419  生物医药   000004  国农科技         0.935363               0.000000       0.785714
 1   120419  生物医药   000403  ST生化           0.927900               0.000000       0.714286
 2   120419  生物医药   000513  丽珠集团         0.963505               0.030303       0.714286
 3   120419  生物医药   000538  云南白药         0.985011               0.260606       0.714286
 4   120419  生物医药   000597  东北制药         0.988989               0.103030       0.714286
 5   120419  生物医药   000661  长春高新         0.938084               0.193939       0.714286
 6   120419  生物医药   000739  普洛药业         0.954498               0.042424       0.714286
 7   120419  生物医药   000790  华神集团         0.816360               0.006061       0.714286
 8   120419  生物医药   000820  金城股份         0.630109               0.000000       0.017857
 9   120419  生物医药   000931  中关村           0.927900               1.000000       0.062500
10   120419  生物医药   000963  华东医药         0.693950               0.193939       0.714286
11   120419  生物医药   002004  华邦颖泰         0.791938               0.078788       0.750000
12   120419  生物医药   002007  华兰生物         0.942944               0.406061       0.714286
13   120419  生物医药   002019  亿帆鑫富         0.982201               0.121212       0.750000
14   120419  生物医药   002020  京新药业         0.915740               0.018182       0.714286
15   120419  生物医药   002030  达安基因         0.142927               0.545455       0.714286
16   120419  生物医药   002038  双鹭药业         0.680201               0.012121       0.714286
17   120419  生物医药   002102  冠福股份         0.847786               0.000000       0.053571
18   120419  生物医药   002107  沃华医药         0.000000               0.248485       0.714286
19   120419  生物医药   002219  恒康医疗         0.930044               0.169697       0.714286
20   120419  生物医药   002286  保龄宝           0.904069               0.000000       0.017857
21   120419  生物医药   002287  奇正藏药         0.897739               0.012121       0.714286
22   120419  生物医药   002294  信立泰           0.785857               0.169697       0.714286
23   120419  生物医药   002317  众生药业         0.927900               0.115152       0.714286
24   120419  生物医药   002349  精华制药         0.927900               0.012121       0.714286
25   120419  生物医药   002432  九安医疗         0.804717               0.333333       0.714286
26   120419  生物医药   002462  嘉事堂           0.835883               0.036364       0.714286
27   120419  生物医药   002550  千红制药         0.961297               0.012121       0.714286
28   120419  生物医药   002581  万昌科技         0.772591               0.078788       0.035714
29   120419  生物医药   002653  海思科           0.900234               0.054545       0.714286
..      ...  ...        ...     ...                   ...                    ...            ...
52   120419  生物医药   600220  江苏阳光         0.754740               0.000000       0.035714
53   120419  生物医药   600222  太龙药业         0.866747               0.018182       0.714286
54   120419  生物医药   600249  两面针           0.944427               0.006061       0.035714
55   120419  生物医药   600252  中恒集团         0.907264               0.072727       0.776786
56   120419  生物医药   600267  海正药业         0.967912               0.048485       0.714286
57   120419  生物医药   600272  开开实业         0.995495               0.000000       0.035714
58   120419  生物医药   600276  恒瑞医药         0.935974               0.751515       0.714286
59   120419  生物医药   600297  美罗药业         0.833323               0.078788       0.714286
60   120419  生物医药   600332  白云山           0.956238               0.309091       0.714286
61   120419  生物医药   600340  华夏幸福         0.881892               0.381818       0.062500
62   120419  生物医药   600381  贤成矿业         0.921978               0.012121       0.107143
63   120419  生物医药   600385  ST金泰           0.765946               0.000000       0.714286
64   120419  生物医药   600422  昆药集团         0.956965               0.060606       0.714286
65   120419  生物医药   600503  华丽家族         0.927900               0.096970       0.062500
66   120419  生物医药   600521  华海药业         0.982925               0.012121       0.714286
67   120419  生物医药   600535  天士力           0.983813               0.521212       0.714286
68   120419  生物医药   600557  康缘药业         0.988432               0.236364       0.714286
69   120419  生物医药   600587  新华医疗         0.967148               0.030303       0.714286
70   120419  生物医药   600594  益佰制药         0.836619               0.230303       0.714286
71   120419  生物医药   600624  复旦复华         0.977262               0.115152       0.017857
72   120419  生物医药   600645  中源协和         0.599070               0.521212       0.750000
73   120419  生物医药   600666  西南药业         0.831056               0.090909       0.714286
74   120419  生物医药   600783  鲁信创投         0.878917               0.236364       0.026786
75   120419  生物医药   600789  鲁抗医药         0.993466               0.115152       0.714286
76   120419  生物医药   600826  兰生股份         0.913197               0.121212       0.035714
77   120419  生物医药   600867  通化东宝         0.822112               0.078788       0.714286
78   120419  生物医药   600873  梅花生物         0.958417               0.103030       0.026786
79   120419  生物医药   600895  张江高科         0.627730               0.296970       0.062500
80   120419  生物医药   601607  上海医药         0.519610               0.442424       0.714286
81   120419  生物医药   603168  莎普爱思         0.994970               0.012121       0.714286

82 rows × 7 columns

    # Date windows used to measure the theme's return
    # End date of the study; before today's close, this is the previous trading day
    endDate_dt = CountTime()
    endDate_CAL = Date.fromDateTime(endDate_dt)
    # Start of the 3-month window
    beginDate_3M_CAL = cal.advanceDate(endDate_CAL, Period('-3M'), BizDayConvention.Following)
    beginDate_3M_dt = beginDate_3M_CAL.toDateTime()
    # Start of the 5-trading-day window
    period_day = 5  # Input: number of trading days to look back
    period_CAL = '-' + str(period_day) + 'B'
    beginDate_5B_CAL = cal.advanceDate(endDate_CAL, period_CAL, BizDayConvention.Following)
    beginDate_5B_dt = beginDate_5B_CAL.toDateTime()
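
The CAL calendar used above is only available on the platform. For readers who want to see what the two window starts look like, the following rough sketch reproduces the idea with pandas offsets. It is an approximation under a clear assumption: pandas' BDay only skips weekends, so exchange holidays on the Shanghai calendar are ignored, and the end date is an arbitrary example.

```python
import pandas as pd

end_date = pd.Timestamp('2015-04-03')  # a Friday, chosen only for illustration
bday = pd.offsets.BDay()               # weekday-only calendar (no exchange holidays)

# Start of the 3-month window: three calendar months back, rolled forward
# to the next weekday when it lands on a weekend ("Following" convention).
begin_3m = bday.rollforward(end_date - pd.DateOffset(months=3))

# Start of the 5-day window: five weekdays back.
begin_5b = end_date - pd.offsets.BDay(5)

print(begin_3m.date())
print(begin_5b.date())
```

Three months before the end date is Saturday 2015-01-03, so the window start rolls forward to the following Monday.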

    # Theme return over the past year, 3 months, and 5 trading days
    tk_list = thm_tks['ticker'].tolist()  # tickers associated with the theme
    field = ['ticker', 'secShortName', 'tradeDate', 'preClosePrice', 'closePrice', 'turnoverValue', 'marketValue']
    # Past year
    Mkt_Info_df_1Y = GetMktEqud(tk_list=tk_list, field=field)  # beginDate and endDate are omitted, which fetches the past year of quotes
    Mkt_Info_df_1Y['tradeDate'] = pd.to_datetime(Mkt_Info_df_1Y['tradeDate'])  # convert tradeDate from string to datetime
    Mkt_Info_df_1Y['increase'] = Mkt_Info_df_1Y['closePrice'] / Mkt_Info_df_1Y['preClosePrice']  # daily gross return
    (rtn_1Y, tk_rt_df_1Y) = GetReturn(Mkt_Info_df_1Y)
    # Past 3 months
    Mkt_Info_df_3M = Mkt_Info_df_1Y[Mkt_Info_df_1Y['tradeDate'] > beginDate_3M_dt]
    (rtn_3M, tk_rt_df_3M) = GetReturn(Mkt_Info_df_3M)
    # Past 5 trading days
    Mkt_Info_df_5B = Mkt_Info_df_1Y[Mkt_Info_df_1Y['tradeDate'] > beginDate_5B_dt]
    (rtn_5B, tk_rt_df_5B) = GetReturn(Mkt_Info_df_5B)
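
The cell above downloads one year of quotes once and then slices it for the shorter windows. The sketch below illustrates that slicing on toy data. The ticker 'AAA', the dates, and the daily ratios are invented, and `period_return` is a hypothetical helper used only here.

```python
import pandas as pd

# Four trading days of daily gross returns for one hypothetical ticker.
mkt_1y = pd.DataFrame({
    'ticker':    ['AAA'] * 4,
    'tradeDate': ['2015-03-30', '2015-03-31', '2015-04-01', '2015-04-02'],
    'increase':  [1.10, 1.00, 0.90, 1.05],
})
mkt_1y['tradeDate'] = pd.to_datetime(mkt_1y['tradeDate'])

def period_return(df, begin):
    """Compound the daily ratios of the rows dated strictly after `begin`."""
    window = df[df['tradeDate'] > begin]
    return window.groupby('ticker')['increase'].prod() - 1

full = period_return(mkt_1y, pd.Timestamp('2015-01-01'))
last_two = period_return(mkt_1y, pd.Timestamp('2015-03-31'))
print(round(full['AAA'], 4))
print(round(last_two['AAA'], 4))
```

Because the comparison is strict, the begin date's own row is left out, so the window return runs from the close of the begin date to the close of the last day.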

    def add_nm_rtn(mkt_df):
        """Join each stock's short name with its return (rounded to 3 places) for display."""
        add_info_list = []
        for i in range(len(mkt_df)):
            add_info = mkt_df['secShortName'].iloc[i] + str(round(mkt_df['return'].iloc[i], 3))
            add_info_list.append(add_info)
        return add_info_list
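
The same labels can also be built without an explicit loop. The following is a small sketch of a vectorised alternative on made-up rows (the names 'AAA' and 'BBB' and their returns are placeholders), offered as one option rather than a replacement for add_nm_rtn.

```python
import pandas as pd

df = pd.DataFrame({
    'secShortName': ['AAA', 'BBB'],
    'return': [1.23456, 0.5],
})

# Round the returns, turn them into strings, and concatenate column-wise.
labels = (df['secShortName'] + df['return'].round(3).astype(str)).tolist()
print(labels)
```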

    # Leading stocks of the theme over the past year, 3 months, and 5 trading days, with their returns
    df_list = [tk_rt_df_1Y, tk_rt_df_3M, tk_rt_df_5B]
    bigstk_dic = {'bigstk_by_rtn': [], 'bigstk_by_tnv': []}
    for df_i in df_list:
        # sort_values replaces DataFrame.sort(columns=...), which recent pandas no longer
        # provides; on pandas older than 0.17 use df_i.sort(columns=...) instead
        df_sort_rtn = df_i.sort_values(by='return', ascending=False)[0:3]  # top 3 by return
        df_sort_tnv = df_i.sort_values(by='turnoverValue', ascending=False)[0:3]  # top 3 by mean daily turnover value
        bigstk_dic['bigstk_by_rtn'].append(add_nm_rtn(df_sort_rtn))
        bigstk_dic['bigstk_by_tnv'].append(add_nm_rtn(df_sort_tnv))
    bigstk_dic['thm_rtn'] = [round(rtn_1Y, 3), round(rtn_3M, 3), round(rtn_5B, 3)]
    bigstk_df = pd.DataFrame(bigstk_dic)
    bigstk_df = bigstk_df.loc[:, ['thm_rtn', 'bigstk_by_rtn', 'bigstk_by_tnv']]
    bigstk_df.index = ['Past year', 'Past 3 months', 'Past 5 trading days']
    bigstk_df.columns = ['Theme return', 'Leaders by return', 'Leaders by turnover']
    print('Theme: ' + thmid2nm_dic[int(theme_id)])
    bigstk_df

Theme: 生物医药

                     Theme return  Leaders by return                               Leaders by turnover
Past year            0.983         [沃华医药5.498, 莎普爱思4.354, 达安基因3.13]   [云南白药0.268, 达安基因3.13, 白云山0.344]
Past 3 months        0.518         [沃华医药1.938, 达安基因1.348, 博腾股份1.149]  [达安基因1.348, 张江高科0.548, 上海医药0.418]
Past 5 trading days  0.091         [江苏阳光0.266, 恒康医疗0.221, 兰生股份0.22]   [达安基因0.198, 华夏幸福0.122, 上海医药0.088]
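
Picking the leaders amounts to a top-N selection on two different columns. As a minimal sketch, pandas' `nlargest` does this directly, without sorting the whole frame first. The four tickers and their figures below are invented for illustration.

```python
import pandas as pd

per_stock = pd.DataFrame({
    'ticker': ['AAA', 'BBB', 'CCC', 'DDD'],
    'return': [0.10, 0.45, -0.05, 0.30],
    'turnoverValue': [5e8, 1e8, 9e8, 2e8],
})

# Three best performers, and the three most heavily traded stocks.
top_by_return = per_stock.nlargest(3, 'return')['ticker'].tolist()
top_by_turnover = per_stock.nlargest(3, 'turnoverValue')['ticker'].tolist()
print(top_by_return)
print(top_by_turnover)
```

The two lists can differ a lot, which is why the table above reports both: a stock can lead on return while trading thinly, or dominate turnover with a modest gain.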

    # Study by relevance: take the most relevant stocks on each score and look at their returns
    # (sort_values replaces DataFrame.sort(columns=...); on pandas older than 0.17 use sort instead)
    tks_rtnscore = thm_tks.sort_values(by='returnScore', ascending=False)[0:3]['ticker'].tolist()  # top 3 by returnScore
    tks_textscore = thm_tks.sort_values(by='textContributionScore', ascending=False)[0:3]['ticker'].tolist()  # top 3 by textContributionScore
    tks_indscore = thm_tks.sort_values(by='industryScore', ascending=False)[0:3]['ticker'].tolist()  # top 3 by industryScore
    tks_score_list = [tks_rtnscore, tks_textscore, tks_indscore]
    score_keys = ['rtn_score', 'text_score', 'ind_score']
    bigstk_score_dic = {}

    def noname(df, lt):
        """Return the rows of df for the tickers in lt, in the order given by lt
        rather than the order of the market-data DataFrame."""
        return pd.concat([df[df['ticker'] == i] for i in lt])

    for key, tk_score_list in zip(score_keys, tks_score_list):
        # Label the selected stocks with their 1-year, 3-month, and 5-trading-day returns
        bigstk_score_dic[key] = [add_nm_rtn(noname(df_i, tk_score_list))
                                 for df_i in (tk_rt_df_1Y, tk_rt_df_3M, tk_rt_df_5B)]
    bigstk_score_dic['thm_rtn'] = [round(rtn_1Y, 3), round(rtn_3M, 3), round(rtn_5B, 3)]
    bigstk_score_df = pd.DataFrame(bigstk_score_dic)
    bigstk_score_df = bigstk_score_df.loc[:, ['thm_rtn', 'text_score', 'ind_score', 'rtn_score']]
    bigstk_score_df.index = ['Past year', 'Past 3 months', 'Past 5 trading days']
    bigstk_score_df.columns = ['Theme return', 'Most related by text', 'Most related by industry', 'Most related by return']
    bigstk_score_df

                     Theme return  Most related by text                        Most related by industry                      Most related by return
Past year            0.983         [中关村0.986, 恒瑞医药0.642, 达安基因3.13]   [国农科技1.028, 中恒集团0.599, 华邦颖泰1.034]  [开开实业0.697, 莎普爱思4.354, 鲁抗医药1.183]
Past 3 months        0.518         [中关村0.35, 恒瑞医药0.258, 达安基因1.348]   [国农科技0.648, 中恒集团0.224, 华邦颖泰0.902]  [开开实业0.241, 莎普爱思0.487, 鲁抗医药0.612]
Past 5 trading days  0.091         [中关村0.097, 恒瑞医药0.096, 达安基因0.198]  [国农科技0.073, 中恒集团0.028, 华邦颖泰0.148]  [开开实业0.086, 莎普爱思0.037, 鲁抗医药0.197]
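
The `noname` helper exists to return rows in the order of a given ticker list. A possible alternative, sketched here on invented data, is to index by ticker and select with `.loc`, which preserves the order of the list passed in.

```python
import pandas as pd

per_stock = pd.DataFrame({
    'ticker': ['AAA', 'BBB', 'CCC', 'DDD'],
    'return': [0.10, 0.45, -0.05, 0.30],
})
wanted = ['DDD', 'AAA', 'CCC']  # desired output order

# Selecting with a list of labels returns the rows in that list's order.
ordered = per_stock.set_index('ticker').loc[wanted].reset_index()
print(ordered)
```

One difference to keep in mind: recent pandas versions raise a KeyError from `.loc` if a requested ticker is missing, whereas `noname` silently skips it.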

    thm_tks_text = thm_tks.sort_values(by='textContributionScore', ascending=False)[0:5]
    print('Ranked by textContributionScore (text relevance: similarity between the theme and the security in news text; range [0, 1], higher means more closely related)')
    thm_tks_text

Ranked by textContributionScore (text relevance: similarity between the theme and the security in news text; range [0, 1], higher means more closely related)

    themeID  themeName  ticker  secShortName  returnScore  textContributionScore  industryScore
 9   120419  生物医药   000931  中关村           0.927900               1.000000       0.062500
58   120419  生物医药   600276  恒瑞医药         0.935974               0.751515       0.714286
15   120419  生物医药   002030  达安基因         0.142927               0.545455       0.714286
72   120419  生物医药   600645  中源协和         0.599070               0.521212       0.750000
67   120419  生物医药   600535  天士力           0.983813               0.521212       0.714286

    thm_tks_ind = thm_tks.sort_values(by='industryScore', ascending=False)[0:5]
    print('Ranked by industryScore (industry relevance: similarity between the theme and the security in industry distribution; range [0, 1], higher means more closely related)')
    thm_tks_ind

Ranked by industryScore (industry relevance: similarity between the theme and the security in industry distribution; range [0, 1], higher means more closely related)

    themeID  themeName  ticker  secShortName  returnScore  textContributionScore  industryScore
 0   120419  生物医药   000004  国农科技         0.935363               0.000000       0.785714
55   120419  生物医药   600252  中恒集团         0.907264               0.072727       0.776786
11   120419  生物医药   002004  华邦颖泰         0.791938               0.078788       0.750000
72   120419  生物医药   600645  中源协和         0.599070               0.521212       0.750000
13   120419  生物医药   002019  亿帆鑫富         0.982201               0.121212       0.750000

    thm_tks_rtn = thm_tks.sort_values(by='returnScore', ascending=False)[0:5]
    print('Ranked by returnScore (return relevance: similarity between the theme and the security in short-term returns; range [0, 1], higher means more closely related)')
    thm_tks_rtn

Ranked by returnScore (return relevance: similarity between the theme and the security in short-term returns; range [0, 1], higher means more closely related)

    themeID  themeName  ticker  secShortName  returnScore  textContributionScore  industryScore
57   120419  生物医药   600272  开开实业         0.995495               0.000000       0.035714
81   120419  生物医药   603168  莎普爱思         0.994970               0.012121       0.714286
75   120419  生物医药   600789  鲁抗医药         0.993466               0.115152       0.714286
 4   120419  生物医药   000597  东北制药         0.988989               0.103030       0.714286
68   120419  生物医药   600557  康缘药业         0.988432               0.236364       0.714286
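
The last three cells repeat one pattern with a different score column each time. As a sketch of how that could be folded into a single loop, the snippet below ranks a small frame on all three scores at once. The five rows are copied from the constituent table earlier in this notebook. The dictionary `top_by_score` is a name introduced here for illustration.

```python
import pandas as pd

# Five rows taken from the constituent table above.
thm_tks = pd.DataFrame({
    'ticker': ['000931', '600276', '600272', '600252', '000004'],
    'returnScore': [0.927900, 0.935974, 0.995495, 0.907264, 0.935363],
    'textContributionScore': [1.000000, 0.751515, 0.000000, 0.072727, 0.000000],
    'industryScore': [0.062500, 0.714286, 0.035714, 0.776786, 0.785714],
})

score_cols = ['textContributionScore', 'industryScore', 'returnScore']

# For each score, keep the tickers of the two highest-scoring stocks.
top_by_score = {col: thm_tks.nlargest(2, col)['ticker'].tolist()
                for col in score_cols}

for col in score_cols:
    print(col + ': ' + ', '.join(top_by_score[col]))
```

The three rankings barely overlap in this sample, which is consistent with the full tables above: a stock can be closely tied to the theme in news text while having little industry or return similarity.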