rtools-ngrams

R code for querying and parsing results from Google n-grams. This code is very much a work in progress and should only be used as a reference point for writing your own.

I've used these scripts to query noun-verb co-occurrence patterns, with the final goal of creating pseudo-sentences that span a range of probability values. Here's a brief overview:

run_queries.R: specify the server address or local path for your rotated ngrams database, then run this script to execute the queries. Variables not declared in this script are found in ngrams.RData.
preproc_query.R: adds line breaks to a continuous stream of text.
parse_query_output.R: counts the occurrences of target words in query results and saves these counts in some data frames.
count_subj_verb_pairs.R: co-occurrence counts for subject/verb pairs.
calc_freq.R: co-occurrence counts and pointwise mutual information (PMI) for noun-verb pairs.
make_proto_sentences.R: put together sentences with desired
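
As a rough illustration of the PMI step that calc_freq.R is described as performing, here is a minimal sketch in R. It is not taken from the repository: the `calc_pmi` function, the `pairs` data frame, its column names, and the toy counts are all hypothetical, and it assumes PMI is computed as log2(p(noun, verb) / (p(noun) * p(verb))) with marginals estimated from the pair counts themselves.

```r
# Toy co-occurrence counts (hypothetical data, for illustration only).
pairs <- data.frame(
  noun  = c("dog", "dog", "cat", "cat"),
  verb  = c("barks", "sleeps", "barks", "sleeps"),
  count = c(8, 2, 1, 9),
  stringsAsFactors = FALSE
)

# Hypothetical helper: adds a pmi column to a data frame of pair counts.
calc_pmi <- function(pairs) {
  total <- sum(pairs$count)

  # Marginal counts for each noun and each verb, summed over all pairs.
  noun_totals <- tapply(pairs$count, pairs$noun, sum)
  verb_totals <- tapply(pairs$count, pairs$verb, sum)

  # Joint and marginal probabilities, aligned row by row with `pairs`.
  p_joint <- pairs$count / total
  p_noun  <- as.numeric(noun_totals[pairs$noun]) / total
  p_verb  <- as.numeric(verb_totals[pairs$verb]) / total

  # PMI in bits; a pair with a zero count would give -Inf.
  pairs$pmi <- log2(p_joint / (p_noun * p_verb))
  pairs
}

result <- calc_pmi(pairs)
print(result)
```

Pairs that co-occur more often than their marginal frequencies would predict get a positive PMI, and pairs that co-occur less often get a negative one, which is one way to pick noun-verb combinations across a range of probabilities.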

jeffrey-phillips/rtools-ngrams
