Commit

Created using Colaboratory
Rajspeaks committed Feb 7, 2023
1 parent 54489e4 commit a6f9753
Showing 1 changed file with 82 additions and 4 deletions.
86 changes: 82 additions & 4 deletions Bengali POS Tagging/Bengali_POS_Tagging_project.ipynb
@@ -7,9 +7,19 @@
"colab_type": "text"
},
"source": [
"<a href=\"https://colab.research.google.com/github/Rajspeaks/Bengali-POS-Tagging-using-BNLP/blob/main/Bengali%20POS%20Tagging/Bengali_POS_Tagging_project.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
"<a href=\"https://colab.research.google.com/github/Rajspeaks/Machine-Learning-approach-to-Bengali-POS-Tagging-using-BNLP/blob/main/Bengali%20POS%20Tagging/Bengali_POS_Tagging_project.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
]
},
{
"cell_type": "markdown",
"source": [
"First, we used the Natural Language Toolkit (NLTK) library to apply basic POS tagging to an English corpus."
],
"metadata": {
"id": "nhT_ER8KSARN"
},
"id": "nhT_ER8KSARN"
},
{
"cell_type": "code",
"execution_count": null,
@@ -228,6 +238,16 @@
"nltk.pos_tag(text)"
]
},
{
"cell_type": "markdown",
"source": [
"Next, we install the bnlp_toolkit package."
],
"metadata": {
"id": "YBLBPoj-SPeY"
},
"id": "YBLBPoj-SPeY"
},
{
"cell_type": "code",
"execution_count": null,
@@ -281,6 +301,16 @@
"pip install bnlp_toolkit"
]
},
{
"cell_type": "markdown",
"source": [
"In the next step, we took a small Bengali corpus and split its sentences into individual words using BNLP's rule-based BasicTokenizer. We then applied the same tokenizer to two larger Bengali corpora.\n"
],
"metadata": {
"id": "K85canWmSkNT"
},
"id": "K85canWmSkNT"
},
{
"cell_type": "code",
"execution_count": null,
@@ -331,6 +361,16 @@
"print(tokens)"
]
},
{
"cell_type": "markdown",
"source": [
"In the next step, we used BNLP's rule-based NLTKTokenizer to tokenize the small Bengali corpus in two ways: word tokenization, which splits the text into individual words, and sentence tokenization, which splits it into separate sentences. We then applied the same tokenizer to two larger Bengali corpora."
],
"metadata": {
"id": "pQVUZkemSqKR"
},
"id": "pQVUZkemSqKR"
},
{
"cell_type": "code",
"execution_count": null,
@@ -393,6 +433,16 @@
"print(sentence_tokens)"
]
},
{
"cell_type": "markdown",
"source": [
"In the next step, we used BNLP's POS function with a pre-trained model to tag the words of a small Bengali corpus with their parts of speech, following the Conditional Random Field (CRF) based approach."
],
"metadata": {
"id": "64DBokj_S-Ru"
},
"id": "64DBokj_S-Ru"
},
{
"cell_type": "code",
"execution_count": null,
@@ -445,6 +495,16 @@
"print(res)"
]
},
{
"cell_type": "markdown",
"source": [
"Next, we used SentencePieceTokenizer to apply unsupervised tokenization to two Bengali corpora."
],
"metadata": {
"id": "Cwg77f3-UZqa"
},
"id": "Cwg77f3-UZqa"
},
{
"cell_type": "code",
"execution_count": null,
@@ -499,6 +559,16 @@
"print(tokens)"
]
},
{
"cell_type": "markdown",
"source": [
"Next, we embedded the Bengali words of a corpus using BengaliWord2Vec with a pre-trained model from BNLP to obtain the word vectors and inspect their shape and values, following the deep learning approach."
],
"metadata": {
"id": "nCm5xh8aUgzm"
},
"id": "nCm5xh8aUgzm"
},
{
"cell_type": "code",
"execution_count": null,
@@ -544,6 +614,16 @@
"print(vector)"
]
},
{
"cell_type": "markdown",
"source": [
"Finally, we applied BasicTokenizer, NLTKTokenizer, and the POS function again to different corpora to test them further."
],
"metadata": {
"id": "MbYg1_xvUmZY"
},
"id": "MbYg1_xvUmZY"
},
{
"cell_type": "code",
"execution_count": null,
@@ -634,9 +714,7 @@
"id": "7ef58636"
},
"outputs": [],
"source": [
""
]
"source": []
}
],
"metadata": {
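
The rule-based word and sentence tokenization described in the markdown cells added by this commit can be sketched in plain Python. This is only a minimal illustration using the standard library; it is not BNLP's BasicTokenizer or NLTKTokenizer implementation. The function names `word_tokenize_bn` and `sentence_tokenize_bn` and the punctuation set are assumptions made for this example.

```python
import re

# Hypothetical punctuation set for this sketch: the Bengali danda plus common marks.
_PUNCT = "।,;:!?()\"'-"


def sentence_tokenize_bn(text):
    """Split text into sentences, breaking after a danda (।), '?' or '!'."""
    parts = re.split(r"(?<=[।?!])\s*", text.strip())
    # Drop the empty string left over after the final terminator.
    return [p for p in parts if p]


def word_tokenize_bn(text):
    """Split on whitespace and emit each punctuation mark as its own token."""
    punct_class = re.escape(_PUNCT)
    pattern = "[" + punct_class + "]|[^\\s" + punct_class + "]+"
    return re.findall(pattern, text)


if __name__ == "__main__":
    sample = "আমি ভাত খাই। তুমি কি খাও?"
    print(sentence_tokenize_bn(sample))
    print(word_tokenize_bn(sample))
```

A real rule-based tokenizer would likely handle more cases (abbreviations, numerals, quoted speech); the point here is only that the rules are fixed in advance and no model is trained, in contrast to the CRF and SentencePiece steps.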

0 comments on commit a6f9753