Code for the Massive Data Mining course at SJTU.
A basic problem implemented in PySpark: read in massive text data and use Map and Reduce to count the occurrences of each word efficiently. Additional tasks include finding the most frequently mentioned word, counting the occurrences of a given word, and counting the words that start with a given letter (a minimal sketch follows below). Got the full score.
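A minimal sketch of the word-count pipeline, assuming a hypothetical input path `input.txt` and using `spark` and the letter `s` as example queries (not the assignment's actual inputs):

```python
# Minimal PySpark word count sketch; the file path and query choices are
# assumptions for illustration, not the exact assignment setup.
from pyspark import SparkContext

sc = SparkContext(appName="WordCount")

counts = (
    sc.textFile("input.txt")                 # hypothetical input path
      .flatMap(lambda line: line.split())    # map: split lines into words
      .map(lambda word: (word, 1))           # map: emit (word, 1) pairs
      .reduceByKey(lambda a, b: a + b)       # reduce: sum counts per word
)

# The additional tasks mentioned above:
most_frequent = counts.max(key=lambda kv: kv[1])   # most frequent word
count_of_word = counts.lookup("spark")             # [count] or [] if absent
starting_with_s = (counts
    .filter(lambda kv: kv[0].startswith("s"))      # words starting with 's'
    .map(lambda kv: kv[1])
    .sum())

sc.stop()
```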
Implemented DGIM to approximate the count of 1s in a streaming data file, and compared the approximate count with the exact one. Implemented LSH to estimate the similarity between documents. DGIM gives the right answer; LSH runs without errors, but its results are off. Sketches of both follow.
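A minimal sketch of the DGIM bucket logic under the standard formulation (power-of-two bucket sizes, at most two buckets per size); the class name, window handling, and toy stream are illustrative assumptions:

```python
# DGIM sketch: approximate the count of 1s in the last N bits of a stream.
class DGIM:
    def __init__(self, window_size):
        self.N = window_size
        self.t = 0               # current timestamp
        self.buckets = []        # (end_timestamp, size), newest first

    def update(self, bit):
        self.t += 1
        # drop buckets that have fallen entirely out of the window
        while self.buckets and self.buckets[-1][0] <= self.t - self.N:
            self.buckets.pop()
        if bit != 1:
            return
        self.buckets.insert(0, (self.t, 1))
        # merge: whenever three buckets share a size, combine the two oldest
        i = 0
        while i + 2 < len(self.buckets):
            if self.buckets[i][1] == self.buckets[i + 2][1]:
                ts, size = self.buckets[i + 1]   # newer of the two oldest
                self.buckets[i + 1] = (ts, size * 2)
                del self.buckets[i + 2]
            i += 1

    def estimate(self):
        # sum all bucket sizes, but count only half of the oldest bucket
        if not self.buckets:
            return 0
        return sum(s for _, s in self.buckets) - self.buckets[-1][1] // 2

dgim = DGIM(window_size=100)
for bit in [1, 0, 1, 1, 0, 1] * 50:   # toy stream, not the assignment file
    dgim.update(bit)
print(dgim.estimate())                # approximate count of 1s in the window
```

And a minimal sketch of document LSH via MinHash signatures and banding; the hash construction, toy shingle sets, and band/row parameters are assumptions, not the assignment's exact setup:

```python
# MinHash + LSH sketch: build short signatures that preserve Jaccard
# similarity, then band the signatures so similar documents collide.
import random
from collections import defaultdict

def make_hash_funcs(n, prime=4294967311, seed=0):
    rnd = random.Random(seed)
    params = [(rnd.randrange(1, prime), rnd.randrange(prime)) for _ in range(n)]
    # note: Python's built-in hash() of str is salted per process, so
    # signatures are only consistent within a single run
    return [lambda s, a=a, b=b: (a * hash(s) + b) % prime for a, b in params]

def minhash_signature(shingles, hash_funcs):
    # signature[i] is the minimum of h_i over the document's shingle set
    return [min(h(s) for s in shingles) for h in hash_funcs]

def lsh_candidates(signatures, bands, rows):
    # documents agreeing on every row of some band become candidate pairs
    buckets = defaultdict(list)
    for doc_id, sig in signatures.items():
        for b in range(bands):
            buckets[(b, tuple(sig[b * rows:(b + 1) * rows]))].append(doc_id)
    pairs = set()
    for ids in buckets.values():
        pairs.update((ids[i], ids[j])
                     for i in range(len(ids)) for j in range(i + 1, len(ids)))
    return pairs

hs = make_hash_funcs(20)
sigs = {"d1": minhash_signature({"a b", "b c", "c d"}, hs),
        "d2": minhash_signature({"a b", "b c", "c e"}, hs),
        "d3": minhash_signature({"x y", "y z"}, hs)}
print(lsh_candidates(sigs, bands=10, rows=2))   # likely {("d1", "d2")}
```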
A template written by myself. Validated PageRank against the built-in function; Node2vec is implemented by following the instructions. Got the full score. A sketch of the validation idea is below.
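A minimal sketch of validating a hand-written PageRank against a built-in, assuming the built-in refers to `networkx.pagerank`; the toy graph and tolerances are illustrative:

```python
# PageRank by power iteration, checked against the networkx built-in.
import networkx as nx

def pagerank(G, alpha=0.85, tol=1e-10, max_iter=200):
    nodes = list(G.nodes())
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    out_deg = dict(G.out_degree())
    for _ in range(max_iter):
        new = {v: (1.0 - alpha) / n for v in nodes}   # teleport term
        for u in nodes:
            if out_deg[u] == 0:                       # dangling node: spread evenly
                for v in nodes:
                    new[v] += alpha * rank[u] / n
            else:
                share = alpha * rank[u] / out_deg[u]
                for v in G.successors(u):
                    new[v] += share
        done = sum(abs(new[v] - rank[v]) for v in nodes) < tol
        rank = new
        if done:
            break
    return rank

G = nx.DiGraph([(1, 2), (2, 3), (3, 1), (1, 3)])      # toy graph
mine = pagerank(G)
builtin = nx.pagerank(G, alpha=0.85)                  # the built-in reference
assert all(abs(mine[v] - builtin[v]) < 1e-4 for v in G)
```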
A heterogeneous network embedded with Metapath2vec (metapath-guided random walks + Node2vec-style skip-gram + negative sampling; see the sketch below). Placed 3rd in the class's Kaggle competition.
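A minimal sketch of the metapath-guided walk plus skip-gram training, assuming gensim's `Word2Vec` for the skip-gram / negative-sampling stage; the toy author-paper graph, the (A, P, A) metapath, and all parameters are illustrative:

```python
# Metapath2vec sketch: metapath-guided random walks on a heterogeneous
# graph, then skip-gram with negative sampling via gensim.
import random
from gensim.models import Word2Vec

def metapath_walk(neighbors, node_type, start, metapath, walk_len, rng):
    cycle = metapath[:-1]                          # ("A","P","A") -> ("A","P")
    walk = [start]
    for step in range(walk_len - 1):
        want = cycle[(step + 1) % len(cycle)]      # node type required next
        cands = [v for v in neighbors[walk[-1]] if node_type[v] == want]
        if not cands:                              # dead end for this metapath
            break
        walk.append(rng.choice(cands))
    return walk

# Toy author(A)-paper(P) graph.
neighbors = {"a1": ["p1"], "a2": ["p1", "p2"], "a3": ["p2"],
             "p1": ["a1", "a2"], "p2": ["a2", "a3"]}
node_type = {"a1": "A", "a2": "A", "a3": "A", "p1": "P", "p2": "P"}

rng = random.Random(0)
walks = [metapath_walk(neighbors, node_type, a, ("A", "P", "A"), 20, rng)
         for a in ("a1", "a2", "a3") for _ in range(10)]

# Skip-gram (sg=1) with negative sampling (negative=5).
model = Word2Vec(sentences=walks, vector_size=64, window=3,
                 sg=1, negative=5, min_count=0, seed=0)
emb = model.wv["a1"]                               # learned embedding
```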
A homogeneous network embedded with node2vec. The results were not good.
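For reference, a minimal sketch of the node2vec biased walk (return parameter p, in-out parameter q); the graph and parameters are illustrative, and embeddings would again come from skip-gram on the walks as above:

```python
# node2vec biased walk sketch: p penalizes returning to the previous node,
# q controls how far the walk explores outward.
import random
import networkx as nx

def node2vec_walk(G, start, walk_len, p, q, rng):
    walk = [start]
    while len(walk) < walk_len:
        cur = walk[-1]
        nbrs = list(G.neighbors(cur))
        if not nbrs:
            break
        if len(walk) == 1:                    # first step is uniform
            walk.append(rng.choice(nbrs))
            continue
        prev = walk[-2]
        weights = []
        for x in nbrs:
            if x == prev:                     # going back: weight 1/p
                weights.append(1.0 / p)
            elif G.has_edge(x, prev):         # stays near prev: weight 1
                weights.append(1.0)
            else:                             # moving outward: weight 1/q
                weights.append(1.0 / q)
        walk.append(rng.choices(nbrs, weights=weights)[0])
    return walk

G = nx.karate_club_graph()                    # toy graph
rng = random.Random(0)
walks = [node2vec_walk(G, v, 40, p=1.0, q=0.5, rng=rng)
         for v in G for _ in range(5)]
```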