Explanations of the purpose of each file can be found in ReadMe.md.
Configure the list of desired stocks; it must stay the same across all steps:
CSV_NAME = "@FOLO3"
list_stocks = DICT_COMPANYS[CSV_NAME]
You also have to set the parameter:
opion = _KEYS_DICT.Option_Historical.MONTH_3
If run with MONTH_3_AD instead, it retrieves more than 3 months of data and requires running the 1.0 (Recommended) alphavantage API step first.
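Put together, the configuration step looks roughly like this. A minimal sketch; the assumption that DICT_COMPANYS is importable from _KEYS_DICT is mine, based on the identifiers used in this document:

```python
# Configuration sketch (the import location of DICT_COMPANYS is an assumption).
import _KEYS_DICT
from _KEYS_DICT import DICT_COMPANYS

CSV_NAME = "@FOLO3"                    # key of the desired stock list
list_stocks = DICT_COMPANYS[CSV_NAME]  # e.g. ["GOOG", ...]

# MONTH_3 downloads ~3 months of history; MONTH_3_AD retrieves more and
# requires the 1.0 (Recommended) alphavantage API step to have run first.
opion = _KEYS_DICT.Option_Historical.MONTH_3
```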
This execution has to generate the following files for each stock:
- d_price/GOOG_PLAIN_stock_history_MONTH_3.csv
- plots_relations/best_selection_GOOG_pos.json
- plots_relations/best_selection_GOOG_both.json
- plots_relations/best_selection_GOOG_neg.json
Generate all pattern combinations by combining the multidimensional models; the .h5 files are the TensorFlow models (the list of models is defined in Model_TF_definitions.ModelDefinition.get_dicts_models_multi_dimension). The naming pattern is sketched after the list:
- Models/TF_multi/TFm_GOOG_pos_reg4_mult_28.h5
- Models/TF_multi/TFm_GOOG_pos_good9_mult_28.h5
- Models/TF_multi/TFm_GOOG_neg_good9_mult_64.h5
- Models/TF_multi/TFm_GOOG_pos_reg4_mult_dense2.h5
- Models/TF_multi/TFm_GOOG_pos_reg4_mult_lstm.h5
- Models/TF_multi/TFm_GOOG_pos_low1_mult_conv2.h5
- Models/TF_multi/TFm_GOOG_pos_good9__simp_cordo.h5
- Models/TF_multi/TFm_GOOG_neg_reg4_mult_linear.h5
- Models/TF_multi/TFm_GOOG_neg_good9_mult_28.h5
- Models/TF_multi/TFm_GOOG_pos_reg4_mult_128.h5
- Models/TF_multi/TFm_GOOG_pos_reg4_mult_linear.h5
- Models/TF_multi/TFm_GOOG_neg_good9__simp_cordo.h5
- Models/TF_multi/TFm_GOOG_neg_good9_mult_dense2.h5
- Models/TF_multi/TFm_GOOG_neg_reg4_mult_28.h5
- Models/TF_multi/TFm_GOOG_pos_good9_mult_dense2.h5
- Models/TF_multi/TFm_GOOG_pos_reg4_mult_64.h5
- Models/TF_multi/TFm_GOOG_pos_good9_mult_lstm.h5
- Models/TF_multi/TFm_GOOG_neg_good9_mult_conv.h5
- Models/TF_multi/TFm_GOOG_neg_reg4_mult_128.h5
- Models/TF_multi/TFm_GOOG_pos_low1_mult_128.h5
- Models/TF_multi/TFm_GOOG_neg_vgood16__mult_gru.h5
- Models/TF_multi/TFm_GOOG_neg_good9_mult_128.h5
- Models/TF_multi/TFm_GOOG_neg_reg4_mult_dense2.h5
- Models/TF_multi/TFm_GOOG_pos_reg4__simp_cordo.h5
- Models/TF_multi/TFm_GOOG_neg_vgood16_mult_linear.h5
- Models/TF_multi/TFm_GOOG_neg_reg4__simp_cordo.h5
- Models/TF_multi/TFm_GOOG_neg_vgood16__simp_cordo.h5
- Models/TF_multi/TFm_GOOG_pos_low1_mult_28.h5
- Models/TF_multi/TFm_GOOG_neg_good9_mult_conv2.h5
- Models/TF_multi/TFm_GOOG_neg_good9_mult_linear.h5
- Models/TF_multi/TFm_GOOG_neg_vgood16_mult_dense2.h5
- Models/TF_multi/TFm_GOOG_pos_low1_mult_64.h5
- Models/TF_multi/TFm_GOOG_neg_reg4_mult_conv.h5
- Models/TF_multi/TFm_GOOG_neg_low1_mult_64.h5
- Models/TF_multi/TFm_GOOG_neg_reg4_mult_conv2.h5
- Models/TF_multi/TFm_GOOG_pos_reg4__mult_gru.h5
- Models/TF_multi/TFm_GOOG_pos_good9_mult_conv.h5
- Models/TF_multi/TFm_GOOG_neg_vgood16_mult_64.h5
- Models/TF_multi/TFm_GOOG_neg_vgood16_mult_128.h5
- Models/TF_multi/TFm_GOOG_pos_good9_mult_conv2.h5
- Models/TF_multi/TFm_GOOG_neg_vgood16_mult_conv.h5
- Models/TF_multi/TFm_GOOG_neg_vgood16_mult_28.h5
- Models/TF_multi/TFm_GOOG_pos_reg4_mult_conv2.h5
- Models/TF_multi/TFm_GOOG_neg_reg4_mult_lstm.h5
- Models/TF_multi/TFm_GOOG_neg_good9_mult_lstm.h5
- Models/TF_multi/TFm_GOOG_neg_vgood16_mult_conv2.h5
- Models/TF_multi/TFm_GOOG_pos_good9_mult_linear.h5
- Models/TF_multi/TFm_GOOG_pos_good9_mult_128.h5
- Models/TF_multi/TFm_GOOG_neg_low1_mult_28.h5
- Models/TF_multi/TFm_GOOG_neg_low1_mult_128.h5
- Models/TF_multi/TFm_GOOG_pos_good9__mult_gru.h5
- Models/TF_multi/TFm_GOOG_pos_good9_mult_64.h5
- Models/TF_multi/TFm_GOOG_pos_reg4_mult_conv.h5
- Models/TF_multi/TFm_GOOG_neg_reg4__mult_gru.h5
- Models/TF_multi/TFm_GOOG_pos_vgood16_mult_28.h5
- Models/TF_multi/TFm_GOOG_neg_reg4_mult_64.h5
- Models/TF_multi/TFm_GOOG_neg_vgood16_mult_lstm.h5
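The file names follow a fixed pattern: ticker, pos/neg direction, column-quality "league" (low1, reg4, good9, vgood16), and architecture suffix. A minimal sketch of how the combinations expand; the tier and architecture lists below are read off the file names above, not from the repo's source, and not every combination is trained:

```python
from itertools import product

STOCK = "GOOG"
TYPES = ["pos", "neg"]                        # buy / sell direction
TIERS = ["low1", "reg4", "good9", "vgood16"]  # column-quality "leagues"
ARCHS = ["mult_28", "mult_64", "mult_128", "mult_dense2", "mult_linear",
         "mult_conv", "mult_conv2", "mult_lstm", "_mult_gru", "_simp_cordo"]

for t, tier, arch in product(TYPES, TIERS, ARCHS):
    print(f"Models/TF_multi/TFm_{STOCK}_{t}_{tier}_{arch}.h5")
```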
It also generates the _per_score files. These record the scores at which each predictive model is considered valid: some models answer with values between 0 and 0.5 and others between 0.4 and 0.9, so these files are generated to know how to weight each model's predictions.
- Models/TF_multi/GOOG_pos_pos_vgood16__per_score.csv
- Models/TF_multi/GOOG_neg_good9__per_score.csv
- Models/TF_multi/GOOG_pos_reg4__per_score.csv
- Models/TF_multi/GOOG_pos_good9__per_score.csv
- Models/TF_multi/GOOG_neg_vgood16__per_score.csv
- Models/TF_multi/GOOG_neg_reg4__per_score.csv
- Models/TF_multi/GOOG_pos_low1__per_score.csv
- Models/TF_multi/GOOG_neg_low1__per_score.csv
These are combined to produce the file Models/TF_multi/_SCORE_ALL_T_multi_all.csv, which contains the hit thresholds for each model.
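Since every model answers in its own range, a shared cut-off is meaningless; a per-model quantile table is one way to express such thresholds. A minimal sketch; the quantile levels and column names are illustrative assumptions, not the repo's exact logic:

```python
import numpy as np
import pandas as pd

def per_score_thresholds(df_preds: pd.DataFrame, quantiles=(0.88, 0.93, 0.95)):
    """df_preds holds one column of raw predictions per .h5 model.
    Return one threshold row per model, one column per quantile."""
    return pd.DataFrame({f"per_{int(q * 100)}": df_preds.quantile(q)
                         for q in quantiles})

# Two toy models with different output ranges (0-0.5 vs 0.4-0.9).
df = pd.DataFrame({
    "TFm_GOOG_pos_reg4_mult_28":  np.random.uniform(0.0, 0.5, 1000),
    "TFm_GOOG_pos_good9_mult_64": np.random.uniform(0.4, 0.9, 1000),
})
print(per_score_thresholds(df))
```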
TODO: this may not be necessary at all. With some refactoring it could be avoided, because "Models/TF_multi/_RESULTS_profit_multi_all.csv" and "Models/TF_multi/_SCORE_ALL_T_multi_all.csv" contain the same values in different formats.
For each desired stock, execute the function eval_profits_per_of_all_models():
for S in list_stocks: eval_profits_per_of_all_models(S)
This function generates the file Models/Eval_multi/Eval_profict_GOOG.csv, which contains the predictions of all the .h5 models on the most recent months of data.
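Conceptually, that evaluation loop loads every saved model for the stock and collects one prediction column per model. A minimal sketch; the glob pattern, the tab separator, and the column naming are assumptions:

```python
import glob
import os
import pandas as pd
import tensorflow as tf

def eval_all_models_sketch(stock: str, X_3d) -> pd.DataFrame:
    """Run every saved .h5 model for `stock` on the recent data
    and store one raw-prediction column per model."""
    df_eval = pd.DataFrame()
    for path in sorted(glob.glob(f"Models/TF_multi/TFm_{stock}_*.h5")):
        model = tf.keras.models.load_model(path)
        name = os.path.basename(path).removesuffix(".h5")
        df_eval[name] = model.predict(X_3d).reshape(-1)
    df_eval.to_csv(f"Models/Eval_multi/Eval_profict_{stock}.csv", sep="\t")
    return df_eval
```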
For each desired stock, execute the function add_models_colum_by_best_Profits():
for S in list_stocks:
    path_csv_file_EVAL = "Models/Eval_multi/Eval_profict_" + S + ".csv"
    add_models_colum_by_best_Profits(path_csv_file_EVAL, NUMBER_MACHTING_TEST_BEST=3)
Select the models whose average profit per operation is greater than 2.2%; models that do not meet this threshold are discarded:
m_highs_dif_per[m_highs_dif_per > 2.2]
The list of these valid models is used to generate the file:
df_valids.to_csv("Models/TF_multi/_RESULTS_profit_multi_all.csv", sep='\t')
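The selection step, sketched with toy numbers. It assumes m_highs_dif_per is a pandas Series of average profit (%) per operation, indexed by model name, as the filter expression above suggests:

```python
import pandas as pd

# Toy values: average profit (%) per operation, indexed by model name.
m_highs_dif_per = pd.Series({
    "TFm_GOOG_pos_reg4_mult_28":      3.1,
    "TFm_GOOG_pos_good9_mult_64":     1.4,  # below 2.2% -> discarded
    "TFm_GOOG_neg_vgood16_mult_lstm": 2.8,
})

df_valids = m_highs_dif_per[m_highs_dif_per > 2.2].to_frame("avg_profit_per_op")
df_valids.to_csv("Models/TF_multi/_RESULTS_profit_multi_all.csv", sep='\t')
```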
TODO: make the execution in Model_predictions_Multi_N_eval_profits.py cleaner and clearer.
Run it to get real-time predictions, which are recorded in the file d_result/predi_MULTI_real_time_2022_12_29.csv.
There are 2 threads:
One is in charge of downloading the data in real time and queueing it; it is associated with the producer() function.
producer_thr = Thread(target=producer, args=(), name='PROD')
The other is the consumer, which dequeues the data, evaluates it with the best models obtained from Models/TF_multi/_RESULTS_profit_multi_all.csv, and records the results in the d_result/predi_MULTI_real_time_2022_12_29.csv file.
consumer_thr_1 = threading.Thread(target=consumer, args=(1,), name='CONS_1')
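A minimal self-contained sketch of that two-thread pattern; queue.Queue stands in for the real-time feed, and the print stands in for model scoring and the CSV append:

```python
import queue
import threading
import time

data_queue: queue.Queue = queue.Queue()

def producer():
    """Download real-time candles and enqueue them (simulated here)."""
    for i in range(5):
        data_queue.put({"stock": "GOOG", "candle": i})
        time.sleep(0.1)
    data_queue.put(None)  # sentinel: no more data

def consumer(cons_id: int):
    """Dequeue candles, score them with the valid models, log the result."""
    while (item := data_queue.get()) is not None:
        print(f"CONS_{cons_id} scoring {item}")  # real code: predict + CSV append

producer_thr = threading.Thread(target=producer, args=(), name='PROD')
consumer_thr_1 = threading.Thread(target=consumer, args=(1,), name='CONS_1')
producer_thr.start(); consumer_thr_1.start()
producer_thr.join(); consumer_thr_1.join()
```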
Analyze, debug, and look for small bugs; everything described below is a critical point of the code.
- Debug with Model_creation_models_for_a_stock.py: how the ground truth (GT) is generated. IMPORTANT: a second opinion is wanted here, as we have debated it a lot (a sketch of the idea follows).
def rolling_get_sell_price_POS_next_value(rolling_col_slection):
def rolling_get_sell_price_NEG_next_value(rolling_col_slection):
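A hedged sketch of the idea behind those rolling helpers. The interpretation that the target is the best sell price reachable inside a look-ahead window is my assumption; the exact logic is precisely what is under debate:

```python
import pandas as pd

LOOK_AHEAD = 8  # illustrative window size

def gt_pos_sketch(close: pd.Series) -> pd.Series:
    """POS ground-truth sketch: % gain to the highest close reachable
    within the next LOOK_AHEAD candles (quality of a buy signal)."""
    future_max = close[::-1].rolling(LOOK_AHEAD, min_periods=1).max()[::-1]
    return (future_max - close) / close * 100

def gt_neg_sketch(close: pd.Series) -> pd.Series:
    """NEG ground-truth sketch: % drop to the lowest close reachable
    within the next LOOK_AHEAD candles (quality of a sell signal)."""
    future_min = close[::-1].rolling(LOOK_AHEAD, min_periods=1).min()[::-1]
    return (future_min - close) / close * 100
```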
- Debug with Model_predictions_Multi_N_eval_profits.py: how to evaluate the quality of the models in €€€€ IMPORTANT.
LUIS: take the best data from df_test with get_best_proftis_models(df_eval, S, type_buy_sell : Op_buy_sell, df_molds_S_NP_columns, NUMBER_MACHTING_TEST_BEST = 3).
SEBAS: take and calculate the buy-sell price and net profit, in Model_predictions_Multi_N_eval_profits_2.py.
- Debug with Model_creation_models_for_a_stock.py: how the data is split (df_train, df_val, df_test) and balanced with SMOTE, in Data_multidimension.Data_multidimension.load_split_data_multidimension() (sketched below).
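A minimal sketch of that split-and-balance step. It assumes imblearn's SMOTE, a chronological split, and an already-binarized target; SMOTE works on 2D data, so balancing happens before the reshape to 3D:

```python
import numpy as np
import pandas as pd
from imblearn.over_sampling import SMOTE

def split_and_balance_sketch(df: pd.DataFrame, y_col: str):
    # Chronological split: ~70% train, 15% validation, 15% test.
    n = len(df)
    df_train, df_val, df_test = np.split(df, [int(0.7 * n), int(0.85 * n)])

    # Balance only the training set; validation and test stay untouched.
    X_train = df_train.drop(columns=[y_col])
    y_train = df_train[y_col]          # assumed binarized (0/1)
    X_bal, y_bal = SMOTE(random_state=42).fit_resample(X_train, y_train)
    return (X_bal, y_bal), df_val, df_test
```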
- Debug with Model_creation_models_for_a_stock.py: how training works, with the early-stopping callback early_stopping = Utils_model_predict.CustomEarlyStopping(patience=8), and how the neural model is obtained: model = multi_data.get_dicts_models_multi_dimension(model_type). A sketch of such a callback follows.
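A hedged sketch of what a custom early-stopping callback with patience=8 can look like in tf.keras; the repo's Utils_model_predict.CustomEarlyStopping may monitor a different metric:

```python
import tensorflow as tf

class CustomEarlyStoppingSketch(tf.keras.callbacks.Callback):
    """Stop training when val_loss has not improved for `patience` epochs."""

    def __init__(self, patience: int = 8):
        super().__init__()
        self.patience = patience
        self.best = float("inf")
        self.wait = 0

    def on_epoch_end(self, epoch, logs=None):
        current = (logs or {}).get("val_loss", self.best)
        if current < self.best:
            self.best, self.wait = current, 0
        else:
            self.wait += 1
            if self.wait >= self.patience:
                self.model.stop_training = True

early_stopping = CustomEarlyStoppingSketch(patience=8)
# model.fit(..., validation_data=..., callbacks=[early_stopping])
```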
- Debug with Model_creation_models_for_a_stock.py: how to evaluate the quality of the models with df_test (step 4.0). The model is evaluated with test_features; this data was split off beforehand and the .h5 model has never seen it (a metric sketch follows):
predit_test = model.predict(test_features).reshape(-1,)
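After predicting, the quality check is ordinary held-out evaluation. A sketch with toy arrays; the 0.5 cut-off and the chosen metrics are illustrative, since the real cut-off is per-model (see the _per_score files):

```python
import numpy as np
from sklearn.metrics import precision_score, recall_score

# Toy values: raw model outputs on data the .h5 model never saw.
predit_test = np.array([0.1, 0.8, 0.45, 0.9])
y_test      = np.array([0,   1,   0,    1])

y_hat = (predit_test >= 0.5).astype(int)  # illustrative cut-off
print("precision:", precision_score(y_test, y_hat))
print("recall:   ", recall_score(y_test, y_hat))
```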
- Debug with Model_creation_models_for_a_stock.py (the code is commented out): sklearn and XGB were removed when switching from 2D mono-dimensional to 3D multidimensional data. What is left of them, and should it still be used?
def train_model_with_custom_columns(name_model, columns_list, csv_file_SCALA, op_buy_sell : _KEYS_DICT.Op_buy_sell):
- Debug with Model_creation_models_for_a_stock.py: how data in .csv format is converted to 3D for TF (sketched below).
def df_to_df_multidimension_array_2D(dataframe, BACHT_SIZE_LOOKBACK, will_check_reshaped = True):
# must be reshaped to 3D for TF, following the shape given by the shape_imput_3d variable
train_features = np.array(train_cleaned_df_target.drop(columns=[Y_TARGET]) ).reshape(shape_imput_3d)
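A minimal sketch of the 2D-to-3D windowing idea. BACHT_SIZE_LOOKBACK is the repo's name for the look-back depth; the sliding-window construction below is my assumption about its intent:

```python
import numpy as np

def to_3d_lookback(features_2d: np.ndarray, lookback: int) -> np.ndarray:
    """Stack `lookback` consecutive rows per sample:
    (n_rows, n_cols) -> (n_rows - lookback + 1, lookback, n_cols)."""
    windows = [features_2d[i:i + lookback]
               for i in range(len(features_2d) - lookback + 1)]
    return np.stack(windows)

X_2d = np.arange(20, dtype=float).reshape(10, 2)  # 10 rows, 2 indicators
X_3d = to_3d_lookback(X_2d, lookback=4)
print(X_3d.shape)  # (7, 4, 2) -- the shape_imput_3d that TF expects
```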
- Debug with gen_technical_indicators.py: when the technical patterns are generated, 1068 indicators are created, which are then screened down to roughly 120, split across different "leagues" (class MODEL_TYPE_COLM(Enum)); see the screening sketch below.
def get_best_columns_to_train(cleaned_df, op_buy_sell : a_manage_stocks_dict.Op_buy_sell , num_best , CSV_NAME, path = None):
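A hedged sketch of the screening idea, using a plain correlation ranking; the repo's get_best_columns_to_train may score columns differently, and the counts 1068 and ~120 come from the text above:

```python
import pandas as pd

def best_columns_sketch(cleaned_df: pd.DataFrame, y_target: str,
                        num_best: int = 120) -> list:
    """Rank the generated indicator columns by absolute correlation
    with the target and keep the top `num_best`."""
    corr = cleaned_df.corr(numeric_only=True)[y_target].drop(y_target)
    return corr.abs().sort_values(ascending=False).head(num_best).index.tolist()
```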
### This is a sociological problem
You have to think about the principle of the self-fulfilling prophecy: use the same time frames that the people in the City of London use.
The traders I know use 15-minute and 1-hour candles, never smaller. If the people of the City of London do not use 5-minute or 1-minute candles, we should not use 5-minute candles either (I have thought a lot about building the models with 1-hour candles, since I understand that is more common among the people of London).
How many candles back do the people in the City of London look? I don't think they look at more than 20; putting in more than 20 just adds noise to the model.
Ask your trader friends; mine say 15 minutes, 1 hour, and daily.
This is a sociological problem: to interpret the painting the same way London does, you have to look at it from the same position as London.