Fix errors reported by CRAN
- The CRAN URLs in the overview.Rmd, features.Rmd and README.Rmd files
  need to be in canonical form.
- Remove the link to the wordpredictor demo from README.Rmd.
- Add Shiny demo to the wordpredictor package.
- Add instructions on how to access the demo to README.Rmd.
- Remove the bold style from wordpredictor in README.Rmd.
pakjiddat committed Jun 14, 2021
1 parent ddb67e3 commit 4473e3e
Showing 15 changed files with 224 additions and 82 deletions.
30 changes: 18 additions & 12 deletions README.Rmd
@@ -49,7 +49,7 @@ clean_up <- function(ve) {
[![test-coverage](https://github.com/pakjiddat/word-predictor/workflows/test-coverage/badge.svg)](https://github.com/pakjiddat/word-predictor/actions)
<!-- badges: end -->

The goal of the **wordpredictor** package is to provide a flexible and easy to use framework for generating [n-gram models](https://en.wikipedia.org/wiki/N-gram) for word prediction.
The goal of the wordpredictor package is to provide a flexible and easy to use framework for generating [n-gram models](https://en.wikipedia.org/wiki/N-gram) for word prediction.

The package allows generating n-gram models from input text files. It also allows exploring n-grams using plots. Additionally, it provides methods for measuring n-gram model performance using [Perplexity](https://en.wikipedia.org/wiki/Perplexity) and accuracy.

@@ -71,7 +71,7 @@ devtools::install_github("pakjiddat/word-predictor")
```

## Package structure
The **wordpredictor** package is based on **R6 classes**. It is easy to customize and improve. It provides the following classes:
The wordpredictor package is based on **R6 classes**. It is easy to customize and improve. It provides the following classes:

1. **DataAnalyzer**. It allows analyzing n-grams.
2. **DataCleaner**. It allows cleaning text files. It supports several data cleaning options.
@@ -150,7 +150,7 @@ clean_up(ve)

## Analyzing N-grams

The **wordpredictor** package includes a class called **DataAnalyzer**, which can be used to get an idea of the frequency distribution of n-grams in a model. The model generation process described above creates an n-gram file in the model directory.
The wordpredictor package includes a class called **DataAnalyzer**, which can be used to get an idea of the frequency distribution of n-grams in a model. The model generation process described above creates an n-gram file in the model directory.

For each n-gram number less than or equal to the n-gram size of the model, an n-gram file is generated. In the example above, the n-gram size of the model is 4, so 4 n-gram files are generated in the model folder. These files are **n1.RDS, n2.RDS, n3.RDS and n4.RDS**. The **n2.RDS** file contains n-grams of size 2.
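
For example, the generated n-gram files are plain RDS files, so they can be inspected directly from R. The following is a minimal sketch; the `./model` directory path is an assumption, and the exact structure of the stored object may differ between package versions:

```r
# A sketch of inspecting a generated n-gram file with base R.
# Assumes the model was generated in "./model" and that each n*.RDS file
# stores a table of n-grams with their frequencies.
ngrams <- readRDS("./model/n2.RDS")
# Peek at the first few 2-grams
head(ngrams)
# Count the distinct 2-grams in the model
nrow(ngrams)
```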

@@ -284,7 +284,7 @@ tg_opts = list(

## Evaluating model performance

The **wordpredictor** package allows evaluating n-gram model performance. It can measure the performance of a single model as well as compare the performance of multiple models. When evaluating the performance of a model, intrinsic and extrinsic evaluation is performed.
The wordpredictor package allows evaluating n-gram model performance. It can measure the performance of a single model as well as compare the performance of multiple models. When evaluating the performance of a model, intrinsic and extrinsic evaluation is performed.

Intrinsic evaluation measures the Perplexity score for each sentence in a validation text file. It returns the minimum, maximum and mean Perplexity score for the sentences.
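
For reference, the Perplexity of a sentence of $N$ words is the inverse probability of the sentence normalized by its length. This is the textbook definition; how the package handles unseen n-grams before taking logarithms is not shown in this excerpt:

$$
PP(w_1 \dots w_N) = P(w_1 \dots w_N)^{-\frac{1}{N}} = \exp\left(-\frac{1}{N}\sum_{i=1}^{N} \log P(w_i \mid w_1, \dots, w_{i-1})\right)
$$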

@@ -314,32 +314,38 @@ clean_up(ve)

## Demo

A [DEMO](https://pakjiddat.shinyapps.io/word-predictor/) application demonstrates how to make word predictions. It is based on the Shiny package. It allows predicting the next word based on the given set of words. It displays the 10 most likely words along with their respective probabilities.
The wordpredictor package includes a demo called "word-predictor". The demo is a Shiny application that displays the ten most likely words for a given set of words. To access the demo, run the following command from the R shell:

The demo app is based on Shiny platform. It consists of two files. [server.r](https://gist.github.com/pakjiddat/43c61c54b645e5bd0096d6fd75e58127) and [ui.r](https://gist.github.com/pakjiddat/96727c1df77755e5bcf8a7d4ff731dea). The n-gram model file must be present in the same folder as the two files. It can be generated using the ModelGenerator class.
**`demo("word-predictor", package = "wordpredictor", ask = F)`**.

The following is a screenshot of the demo:

```{r demo, out.width="70%", out.height="70%", echo=F}
knitr::include_graphics("man/figures/README-demo.png")
```
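
The prediction behind the demo can also be reproduced from the R console. The sketch below is distilled from the demo source added in this commit; it assumes the package ships with the `def-model.RDS` model file used by the demo:

```r
library(wordpredictor)

# Path to the model file bundled with the package, as used by the demo
mfp <- system.file("extdata", "def-model.RDS", package = "wordpredictor")
# Create a ModelPredictor from the saved model
mp <- ModelPredictor$new(mfp)
# Predict the ten most likely next words for the given text
p <- mp$predict_word("where is", 10)
# p$found indicates whether a prediction was made; p$words and p$probs
# hold the candidate words and their probabilities
if (p$found) {
    print(p$words)
    print(p$probs)
}
```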

## Website

The [wordpredictor website](https://pakjiddat.github.io/word-predictor/) provides details about how the packages works. It includes code samples and details of all the classes and methods.
The [wordpredictor website](https://pakjiddat.github.io/word-predictor/) provides details about how the package works. It includes code samples and details of all the classes and methods.

## Benefits

The **wordpredictor** package provides an easy to use framework for working with n-gram models. It allows n-gram model generation, performance evaluation and word prediction.
The wordpredictor package provides an easy to use framework for working with n-gram models. It allows n-gram model generation, performance evaluation and word prediction.

## Limitations

The n-gram language model requires a lot of memory for storing the n-grams. The **wordpredictor** package has been tested on a machine with a dual-core processor and 4 GB of RAM. It works well for input data files smaller than 40 MB and an n-gram size of 4. For larger data files and n-gram sizes, more memory and CPU power will be needed.
The n-gram language model requires a lot of memory for storing the n-grams. The wordpredictor package has been tested on a machine with a dual-core processor and 4 GB of RAM. It works well for input data files smaller than 40 MB and an n-gram size of 4. For larger data files and n-gram sizes, more memory and CPU power will be needed.

## Future Work

The **wordpredictor** package may be extended by adding support for different smoothing techniques such as [Good-Turing](https://en.wikipedia.org/wiki/Good%E2%80%93Turing_frequency_estimation), [Katz-Back-off](https://en.wikipedia.org/wiki/Katz%27s_back-off_model) and handling of [Out Of Vocabulary Words](https://en.wikipedia.org/wiki/N-gram#Out-of-vocabulary_words).
The wordpredictor package may be extended by adding support for different smoothing techniques such as [Good-Turing](https://en.wikipedia.org/wiki/Good%E2%80%93Turing_frequency_estimation), [Katz-Back-off](https://en.wikipedia.org/wiki/Katz%27s_back-off_model) and handling of [Out Of Vocabulary Words](https://en.wikipedia.org/wiki/N-gram#Out-of-vocabulary_words).

Support may also be added for different types of n-gram models, such as [Skip-Grams](https://en.wikipedia.org/wiki/N-gram#Skip-gram) and [Syntactic n-grams](https://en.wikipedia.org/wiki/N-gram#Syntactic_n-grams).

The **wordpredictor** package is used for predicting words. It may be extended to support other use cases such as spelling correction, biological sequence analysis, data compression and more. This will require further performance optimization.
The wordpredictor package is used for predicting words. It may be extended to support other use cases such as spelling correction, biological sequence analysis, data compression and more. This will require further performance optimization.

The source code is organized using R6 classes. It is easy to extend. Contributions are welcome!

## Acknowledgments

I was motivated to develop the **wordpredictor** package after taking the courses in the [Data Science Specialization](https://www.coursera.org/specializations/jhu-data-science) offered by Johns Hopkins University on Coursera. I would like to thank the course instructors for making the courses interesting and motivating for the students.
I was motivated to develop the wordpredictor package after taking the courses in the [Data Science Specialization](https://www.coursera.org/specializations/jhu-data-science) offered by Johns Hopkins University on Coursera. I would like to thank the course instructors for making the courses interesting and motivating for the students.
60 changes: 29 additions & 31 deletions README.md
@@ -10,8 +10,8 @@
[![test-coverage](https://github.com/pakjiddat/word-predictor/workflows/test-coverage/badge.svg)](https://github.com/pakjiddat/word-predictor/actions)
<!-- badges: end -->

The goal of the **wordpredictor** package is to provide a flexible and
easy to use framework for generating [n-gram
The goal of the wordpredictor package is to provide a flexible and easy
to use framework for generating [n-gram
models](https://en.wikipedia.org/wiki/N-gram) for word prediction.

The package allows generating n-gram models from input text files. It
@@ -39,7 +39,7 @@ devtools::install_github("pakjiddat/word-predictor")

## Package structure

The **wordpredictor** package is based on **R6 classes**. It is easy to
The wordpredictor package is based on **R6 classes**. It is easy to
customize and improve. It provides the following classes:

1. **DataAnalyzer**. It allows analyzing n-grams.
@@ -135,10 +135,10 @@ clean_up(ve)

## Analyzing N-grams

The **wordpredictor** package includes a class called **DataAnalyzer**,
which can be used to get an idea of the frequency distribution of n-grams
in a model. The model generation process described above creates an
n-gram file in the model directory.
The wordpredictor package includes a class called **DataAnalyzer**, which
can be used to get an idea of the frequency distribution of n-grams in a
model. The model generation process described above creates an n-gram
file in the model directory.

For each n-gram number less than or equal to the n-gram size of the
model, an n-gram file is generated. In the example above, the n-gram size
@@ -294,10 +294,10 @@ tg_opts = list(

## Evaluating model performance

The **wordpredictor** package allows evaluating n-gram model
performance. It can measure the performance of a single model as well as
compare the performance of multiple models. When evaluating the
performance of a model, intrinsic and extrinsic evaluation is performed.
The wordpredictor package allows evaluating n-gram model performance. It
can measure the performance of a single model as well as compare the
performance of multiple models. When evaluating the performance of a
model, intrinsic and extrinsic evaluation is performed.

Intrinsic evaluation measures the Perplexity score for each sentence in
a validation text file. It returns the minimum, maximum and mean
@@ -333,42 +333,40 @@ clean_up(ve)

## Demo

A [DEMO](https://pakjiddat.shinyapps.io/word-predictor/) application
demonstrates how to make word predictions. It is based on the Shiny
package. It allows predicting the next word based on the given set of
words. It displays the 10 most likely words along with their respective
probabilities.
The wordpredictor package includes a demo called “word-predictor”. The
demo is a Shiny application that displays the ten most likely words for
a given set of words. To access the demo, run the following command from
the R shell:

The demo app is based on Shiny platform. It consists of two files.
[server.r](https://gist.github.com/pakjiddat/43c61c54b645e5bd0096d6fd75e58127)
and
[ui.r](https://gist.github.com/pakjiddat/96727c1df77755e5bcf8a7d4ff731dea).
The n-gram model file must be present in the same folder as the two
files. It can be generated using the ModelGenerator class.
**`demo("word-predictor", package = "wordpredictor", ask = F)`**.

The following is a screenshot of the demo:

<img src="man/figures/README-demo.png" width="70%" height="70%" />

## Website

The [wordpredictor website](https://pakjiddat.github.io/word-predictor/)
provides details about how the packages works. It includes code samples
provides details about how the package works. It includes code samples
and details of all the classes and methods.

## Benefits

The **wordpredictor** package provides an easy to use framework for
working with n-gram models. It allows n-gram model generation,
performance evaluation and word prediction.
The wordpredictor package provides an easy to use framework for working
with n-gram models. It allows n-gram model generation, performance
evaluation and word prediction.

## Limitations

The n-gram language model requires a lot of memory for storing the
n-grams. The **wordpredictor** package has been tested on a machine with
n-grams. The wordpredictor package has been tested on a machine with
a dual-core processor and 4 GB of RAM. It works well for input data files
smaller than 40 MB and an n-gram size of 4. For larger data files and
n-gram sizes, more memory and CPU power will be needed.

## Future Work

The **wordpredictor** package may be extended by adding support for
The wordpredictor package may be extended by adding support for
different smoothing techniques such as
[Good-Turing](https://en.wikipedia.org/wiki/Good%E2%80%93Turing_frequency_estimation),
[Katz-Back-off](https://en.wikipedia.org/wiki/Katz%27s_back-off_model)
@@ -380,7 +378,7 @@ Support for different types of n-gram models such as
[Syntactic
n-grams](https://en.wikipedia.org/wiki/N-gram#Syntactic_n-grams).

The **wordpredictor** package is used for predicting words. It may be
The wordpredictor package is used for predicting words. It may be
extended to support other use cases such as spelling correction,
biological sequence analysis, data compression and more. This will
require further performance optimization.
@@ -390,8 +388,8 @@ Contributions are welcome \!.

## Acknowledgments

I was motivated to develop the **wordpredictor** package after taking
the courses in the [Data Science
I was motivated to develop the wordpredictor package after taking the
courses in the [Data Science
Specialization](https://www.coursera.org/specializations/jhu-data-science)
offered by Johns Hopkins University on Coursera. I would like to thank
the course instructors for making the courses interesting and motivating
1 change: 1 addition & 0 deletions demo/00Index
@@ -0,0 +1 @@
word-predictor A shiny application that shows the top ten predicted words
120 changes: 120 additions & 0 deletions demo/word-predictor.R
@@ -0,0 +1,120 @@
# This is the demo word-predictor application. You can run the application by
# clicking 'Run App' above.
#
# The application allows users to enter a set of words. For the given words the
# application attempts to predict the top ten most likely words. These words are
# presented in a bar plot along with the respective probabilities.
#
# Find out more about building applications with Shiny here:
#
# http://shiny.rstudio.com/

library(shiny)
library(ggplot2)
library(wordpredictor)

# Define the UI for the word predictor application
ui <- fluidPage(

# Application title
titlePanel("Word Predictor"),
# Horizontal rule
hr(),

# Sidebar with a text input for the n-gram entered by the user
sidebarLayout(
sidebarPanel(
# The input field
textInput("ngram", "Enter a n-gram:", value = "where is")
),

# Show a plot of the possible predicted words
mainPanel(
# The predicted word
uiOutput("next_word"),
# The predicted word probability
uiOutput("word_prob"),
# Horizontal rule
hr(),
# The bar plot of possible next words
plotOutput("next_word_plot")
)
)
)

# Define the server logic that predicts the next word and renders the plot
server <- function(input, output) {

# The model file path
sfp <- system.file("extdata", "def-model.RDS", package = "wordpredictor")
# The ModelPredictor object is created
mp <- ModelPredictor$new(sfp)
# The predicted word information
p <- NULL

# The next word is predicted
output$next_word <- renderUI({
# If the user entered some text
if (trimws(input$ngram) != "") {
# The text entered by the user is trimmed of surrounding whitespace
w <- trimws(input$ngram)
# The next word is predicted
p <- mp$predict_word(w, 10)
# If the next word was not found
if (!p$found) {
# The next word is set to an information message
nw <- span("Not Found", style = "color:red")
# The next word probability is set to an information
# message
nwp <- span("N.A", style = "color:red")
# The plot is set to empty
output$next_word_plot <- renderPlot({})
# The predicted next word
nw <- tags$div("Predicted Word: ", tags$strong(nw))
# The predicted next word probability
nwp <- tags$div("Word Probability: ", tags$strong(nwp))
# The next word probability is updated
output$word_prob <- renderUI(nwp)
}
else {
# The next word
nw <- p$words[[1]]
# The next word probability
nwp <- p$probs[[1]]
# The plot is updated
output$next_word_plot <- renderPlot({
# A data frame containing the data to plot
df <- data.frame("word" = p$words, "prob" = p$probs)
# The data frame is sorted in descending order
df <- (df[order(df$prob, decreasing = T),])
# The words and their probabilities are plotted
g <- ggplot(data = df, aes(x = reorder(word, prob), y = prob)) +
geom_bar(stat = "identity", fill = "red") +
ggtitle("Predicted words and their probabilities") +
ylab("Probability") +
xlab("Word")
print(g)
})
# The predicted next word
nw <- tags$div("Predicted Word: ", tags$strong(nw))
# The predicted next word probability
nwp <- tags$div("Word Probability: ", tags$strong(nwp))
# The next word probability is updated
output$word_prob <- renderUI(nwp)
}
}
else {
# The next word is set to ""
nw <- tags$span()
# The next word probability text is set to ""
output$word_prob <- renderUI(tags$span())
# The plot is set to empty
output$next_word_plot <- renderPlot({})
}
return(nw)
})
}

# Run the application
shinyApp(ui = ui, server = server)