A Study on Memory-Efficient Algorithms
Determining the most frequent items in a data stream has many applications and is an active research topic. The challenges of processing data streams in a memory-efficient way are well worth exploring, and several existing solutions already offer effective optimization strategies.
In this project, we use one of the best-known approximate counters to estimate the most frequent words in literary works by several authors, written in several languages, and compare its results against those of an exact counter. We also present some conclusions drawn from applying the study to the dataset.
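The README does not name the specific approximate counter used, so as one illustrative possibility, here is a minimal sketch of the Misra-Gries frequent-items algorithm, a classic memory-bounded counter that keeps at most k-1 candidates and guarantees that any word occurring more than n/k times in a stream of n words survives. This is an assumption for illustration only, not this repository's implementation:

```python
from collections import Counter

def misra_gries(stream, k):
    """Approximate heavy hitters using at most k-1 counters.

    Any item occurring more than n/k times in a stream of length n
    is guaranteed to remain in the returned dictionary.
    """
    counters = {}
    for item in stream:
        if item in counters:
            counters[item] += 1
        elif len(counters) < k - 1:
            counters[item] = 1
        else:
            # All slots taken: decrement every counter and
            # drop any that reach zero, freeing slots.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters

# Toy stream; "the" is a true heavy hitter here.
words = "the quick brown fox jumps over the lazy dog the fox".split()
approx = misra_gries(words, k=4)   # at most 3 candidates kept
exact = Counter(words)             # the exact baseline for comparison
```

The stored counts are underestimates of the true frequencies, which is exactly the kind of trade-off an exact counter (such as `collections.Counter` above) can be compared against.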
/dataset - literary works from Project Gutenberg, used as input data
/out - contains the programs' output
/report - contains the written report on the study conducted
/src - contains the source code, written in Python
First, install all required packages:
$ pip3 install -r requirements
To run the word counting program, execute the following command:
$ python frequentWordFinder.py -d 1 -m 100 aliceInput/
The authors of this repository are Filipe Pires and João Alegria. The project was developed for the Advanced Algorithms course of the Master's in Informatics Engineering at the University of Aveiro.
For further information, please read our report or contact us at [email protected] or [email protected].