From a6f9753dea8bbcb9279200510949dc4a73af8553 Mon Sep 17 00:00:00 2001
From: Rajdeep Das
Date: Tue, 7 Feb 2023 20:12:47 +0530
Subject: [PATCH] Created using Colaboratory

---
 .../Bengali_POS_Tagging_project.ipynb | 86 ++++++++++++++++++-
 1 file changed, 82 insertions(+), 4 deletions(-)

diff --git a/Bengali POS Tagging/Bengali_POS_Tagging_project.ipynb b/Bengali POS Tagging/Bengali_POS_Tagging_project.ipynb
index 16955a3..6804d59 100644
--- a/Bengali POS Tagging/Bengali_POS_Tagging_project.ipynb
+++ b/Bengali POS Tagging/Bengali_POS_Tagging_project.ipynb
@@ -7,9 +7,19 @@
         "colab_type": "text"
       },
       "source": [
-        "\"Open"
+        "\"Open"
       ]
     },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "We first use the Natural Language Toolkit (NLTK) library to apply basic POS tagging to an English corpus."
+      ],
+      "metadata": {
+        "id": "nhT_ER8KSARN"
+      },
+      "id": "nhT_ER8KSARN"
+    },
     {
       "cell_type": "code",
       "execution_count": null,
@@ -228,6 +238,16 @@
         "nltk.pos_tag(text)"
       ]
     },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "Next, we install the bnlp_toolkit package."
+      ],
+      "metadata": {
+        "id": "YBLBPoj-SPeY"
+      },
+      "id": "YBLBPoj-SPeY"
+    },
     {
       "cell_type": "code",
       "execution_count": null,
@@ -281,6 +301,16 @@
         "pip install bnlp_toolkit"
       ]
     },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "In the next step, we take a small Bengali corpus and tokenize each Bengali word in its sentences using BNLP's rule-based BasicTokenizer. We then apply the same tokenizer to two larger Bengali corpora.\n"
+      ],
+      "metadata": {
+        "id": "K85canWmSkNT"
+      },
+      "id": "K85canWmSkNT"
+    },
     {
       "cell_type": "code",
       "execution_count": null,
@@ -331,6 +361,16 @@
         "print(tokens)"
       ]
     },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "In the next step, we use BNLP's rule-based NLTKTokenizer on the small Bengali corpus in two phases: word tokenization and sentence tokenization. The word tokenizer splits the text into individual Bengali words, while the sentence tokenizer splits it into separate sentences. We then apply both to the two larger Bengali corpora."
+      ],
+      "metadata": {
+        "id": "pQVUZkemSqKR"
+      },
+      "id": "pQVUZkemSqKR"
+    },
     {
       "cell_type": "code",
       "execution_count": null,
@@ -393,6 +433,16 @@
         "print(sentence_tokens)"
       ]
     },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "In the next step, we use BNLP's POS tagging function with its pre-trained model on a small Bengali corpus to tag each Bengali word and categorize it into a part of speech, a Conditional Random Field (CRF) based approach."
+      ],
+      "metadata": {
+        "id": "64DBokj_S-Ru"
+      },
+      "id": "64DBokj_S-Ru"
+    },
     {
       "cell_type": "code",
       "execution_count": null,
@@ -445,6 +495,16 @@
         "print(res)"
       ]
     },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "Next, we use the SentencePieceTokenizer, an unsupervised subword tokenizer, on the two Bengali corpora."
+      ],
+      "metadata": {
+        "id": "Cwg77f3-UZqa"
+      },
+      "id": "Cwg77f3-UZqa"
+    },
     {
       "cell_type": "code",
       "execution_count": null,
@@ -499,6 +559,16 @@
         "print(tokens)"
       ]
     },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "Next, we embed the Bengali words of a corpus with BengaliWord2Vec, using a pre-trained model from BNLP, to obtain each word's vector and its shape, a deep-learning-based approach."
+      ],
+      "metadata": {
+        "id": "nCm5xh8aUgzm"
+      },
+      "id": "nCm5xh8aUgzm"
+    },
     {
       "cell_type": "code",
       "execution_count": null,
@@ -544,6 +614,16 @@
         "print(vector)"
       ]
     },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "Finally, we run the BasicTokenizer, the NLTKTokenizer, and the POS tagging function again on different corpora to test them further."
+      ],
+      "metadata": {
+        "id": "MbYg1_xvUmZY"
+      },
+      "id": "MbYg1_xvUmZY"
+    },
     {
       "cell_type": "code",
       "execution_count": null,
@@ -634,9 +714,7 @@
         "id": "7ef58636"
       },
       "outputs": [],
-      "source": [
-        ""
-      ]
+      "source": []
     }
   ],
   "metadata": {