-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathnotes.txt
36 lines (30 loc) · 1.9 KB
/
notes.txt
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
1. No offer in training data.
2. Preprocessing -
- Remove split, PII and original fields
- Select some topics in which help will be offered
- Selected columns are - food, shelter and water
3. Column Description
- id: Unique ID for each individual row
- split: Test, tune, validation split
- message: English text of actual messages related to disaster
- original: Text of column 3 in native language as originally written
- genre: Type of message, including direct messages, social posting, and news stories or bulletins
- related: Is the message disaster related? 1= yes, 2=no
- PII: Does the message contain PII? 1= yes, 2=no
- request: Does the message contain a request? 1= yes, 2=no
- offer: Does the message contain an offer? 1= yes, 2=no
- aid_related: Is the message aid related? 1= yes, 2=no
- medical_help: Does the message concern medical help? 1= yes, 2=no
- medical_products: Does the message concern medical products? 1= yes, 2=no
- search_and_rescue: Does the message concern search and rescue? 1= yes, 2=no
- security: Does the message concern security? 1= yes, 2=no
- military: Does the message concern military? 1= yes, 2=no
- child_alone: Does the message mention a child alone? 1= yes, 2=no
- water: Does the message concern water? 1= yes, 2=no
- food: Does the message concern food? 1= yes, 2=no
- shelter: Does the message concern shelter? 1= yes, 2=no
- clothing: Does the message concern clothing? 1= yes, 2=no
- Build various feature matrices - count matrix and tfidf matrix on message, message_stem, message_lemma. Play around with values of max feature size.
4. Train LogisticRegression models on all feature matrices for all chosen columns
5. Initial Training Conclusion: TFIDF feature matrices are performing better than Count matrices
6. Use SVM Classifier instead of LogisticRegression since the accuracy was poor