A question for the experts about training data #2

Open
26597925 opened this issue Jan 7, 2019 · 4 comments

26597925 commented Jan 7, 2019

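I added my own data for three domains, roughly a few thousand lines in total, then set the thread count to 64 and ran training on a machine with 32 cores and 16 GB of RAM. After two days it still had not finished. I checked: memory was not exhausted and CPU usage was only at 100%, so it feels like the machine is not being fully used. Even after two days no model was produced. What could be the cause?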
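A quick arithmetic check on that CPU figure, as a sketch only. It assumes the 100% was read in the per-core convention used by tools such as `top`, where one fully busy core shows as 100%; the post does not say how it was measured. The function name `machine_utilization` is made up for illustration.

```python
def machine_utilization(reported_percent, n_cores):
    """Convert a per-core percentage (one busy core = 100%) into a
    fraction of the whole machine's CPU capacity.

    Assumption: reported_percent follows the top-style convention.
    """
    return reported_percent / (100.0 * n_cores)


if __name__ == "__main__":
    # 100% reported on a 32-core machine, as described in the post.
    fraction = machine_utilization(100, 32)
    print(f"{fraction:.3%} of total CPU capacity in use")
```

Under that assumption, only about one core out of 32 is busy, which would fit a training step that runs mostly single-threaded regardless of the configured thread count.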

howl-anderson (Owner) commented

At which stage does training get stuck, and what does your pipeline look like?


26597925 commented Jan 8, 2019

`language: "zh"

pipeline:
- name: "nlp_mitie"
  model: "data/total_word_feature_extractor_zh.dat"
- name: "tokenizer_jieba"
  default_dict: "./default_dict.big"
  user_dicts: "./jieba_userdict"
  # you can put in file path or directory path as the "user_dicts" value
  # user_dicts: "./jieba_userdict/jieba_userdict.txt"
- name: "ner_mitie"
- name: "ner_synonyms"
- name: "intent_entity_featurizer_regex"
- name: "intent_featurizer_mitie"
- name: "intent_classifier_sklearn"`

Part II: train segment classifier
now do training
num training samples: 178

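That is the configuration. However, yesterday I rented an Alibaba Cloud machine for a few days, one with very high specs, and there the training did finish overnight.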

howl-anderson (Owner) commented

On average, how many training examples do you have per intent?
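One way to get these numbers is to count the examples per intent directly from the training data. The sketch below assumes the data is in Rasa NLU's JSON format, with a top-level `rasa_nlu_data` object holding a `common_examples` list; the thread does not say which format is actually in use. The function name and the inline sample are hypothetical and only illustrate the idea.

```python
import json
from collections import Counter


def count_examples_per_intent(data):
    """Count training examples per intent.

    Assumption: `data` is a dict in Rasa NLU JSON format, i.e.
    {"rasa_nlu_data": {"common_examples": [{"text": ..., "intent": ...}]}}.
    """
    examples = data.get("rasa_nlu_data", {}).get("common_examples", [])
    return Counter(ex["intent"] for ex in examples if "intent" in ex)


if __name__ == "__main__":
    # Hypothetical sample standing in for a real training file.
    # For a real file, load it with json.load(open(path, encoding="utf-8")).
    sample = json.loads("""
    {"rasa_nlu_data": {"common_examples": [
        {"text": "hello", "intent": "greet", "entities": []},
        {"text": "hi there", "intent": "greet", "entities": []},
        {"text": "bye", "intent": "goodbye", "entities": []}
    ]}}
    """)
    counts = count_examples_per_intent(sample)
    for intent, n in counts.most_common():
        print(intent, n)
    print("average per intent:", sum(counts.values()) / len(counts))
```

Printing the per-intent counts alongside the average makes it easy to spot intents with very few examples.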


26597925 commented Jan 8, 2019

55, 38, and 9. Those are the figures for the three domains.
