Added Plagarism Checker #1364

Closed
wants to merge 4 commits into from
35 changes: 35 additions & 0 deletions Python/Plagarism_Checker/Plagarism_Checker.py
@@ -0,0 +1,35 @@
"""Print pairwise TF-IDF cosine similarity for .txt files in this folder."""

import os
from itertools import combinations
from pathlib import Path

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Collect every .txt file in the current directory. requirements.txt is
# skipped so the dependency list is not scored as a document.
student_files = sorted(
    doc for doc in os.listdir()
    if doc.endswith(".txt") and doc != "requirements.txt"
)
student_notes = [Path(doc).read_text(encoding="utf-8") for doc in student_files]

# Turn each document into a TF-IDF vector (one row per file).
vectors = TfidfVectorizer().fit_transform(student_notes).toarray()
s_vectors = list(zip(student_files, vectors))


def check_plagarism():
    """Return a set of (file_a, file_b, cosine similarity), one per pair."""
    plagarism_results = set()
    # combinations() yields each unordered pair once, so no copy/delete step.
    for (student_a, vector_a), (student_b, vector_b) in combinations(s_vectors, 2):
        sim_score = cosine_similarity([vector_a, vector_b])[0][1]
        plagarism_results.add((student_a, student_b, sim_score))
    return plagarism_results


# Print one (file_a, file_b, score) tuple per pair.
for data in check_plagarism():
    print(data)
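Review note: for readers unfamiliar with the scoring, here is a minimal pure-Python sketch of the idea behind the script: weight each term by TF-IDF, normalise each document vector to unit length, then take the dot product (cosine similarity). It is not the scikit-learn implementation. The helper names `tfidf_vectors` and `cosine` and the three example sentences are assumptions made for illustration, and the tokenisation is simplified to lowercase words of two or more characters.

```python
import math
import re
from collections import Counter


def tfidf_vectors(docs):
    """Return one {term: weight} dict per document, L2-normalised."""
    tokenized = [re.findall(r"\w\w+", doc.lower()) for doc in docs]
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        # Smoothed IDF, ln((1 + n) / (1 + df)) + 1, so no weight is zero.
        weights = {
            term: tf * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, tf in counts.items()
        }
        norm = math.sqrt(sum(w * w for w in weights.values())) or 1.0
        vectors.append({term: w / norm for term, w in weights.items()})
    return vectors


def cosine(vec_a, vec_b):
    """Dot product of two unit-length sparse vectors."""
    return sum(w * vec_b.get(term, 0.0) for term, w in vec_a.items())


# Hypothetical example sentences, not the sample files from this PR.
docs = [
    "Looking for a job can be very stressful",
    "Searching for a job can be very stressful",
    "Gardening in mixed beds is productive",
]
vecs = tfidf_vectors(docs)
job_vs_job = cosine(vecs[0], vecs[1])
job_vs_garden = cosine(vecs[0], vecs[2])
print(job_vs_job > job_vs_garden)
```

The two job sentences share most of their terms and score well above zero, while the gardening sentence shares none with the first and scores exactly zero. Scores from the actual script may differ from this sketch because scikit-learn applies its own tokenisation and options.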
25 changes: 25 additions & 0 deletions Python/Plagarism_Checker/Readme.md
@@ -0,0 +1,25 @@
# Plagiarism Checker

- The Python script scores how similar text files are to one another, using TF-IDF vectors and cosine similarity.
- It reads every `.txt` file in the current directory and prints one `(file_a, file_b, score)` tuple per pair of files.

## Tech Stack:
- Python

## Setup Environment

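- Create and activate a virtual environment.
- Install the dependencies: `pip install -r requirements.txt`
- Put the `.txt` files to compare in the directory you run the script from; it picks them up automatically.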


## Running the script:


```sh
$ python Plagarism_Checker.py
```
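
The script collects its results in a set, so the tuples print in no particular order. A minimal sketch for listing the most similar pairs first, assuming the `(file_a, file_b, score)` tuple shape described above; `rank_pairs` is a hypothetical helper and the scores are made up for illustration:

```python
def rank_pairs(results):
    """Sort (file_a, file_b, score) tuples from most to least similar."""
    return sorted(results, key=lambda item: item[2], reverse=True)


# Made-up scores for illustration only; real values come from the script.
example_results = {
    ("sample_1.txt", "sample_2.txt", 0.5),
    ("sample_1.txt", "sample_3.txt", 0.1),
    ("sample_3.txt", "sample_4.txt", 0.75),
}
for file_a, file_b, score in rank_pairs(example_results):
    print(f"{file_a} vs {file_b}: {score:.2f}")
```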

## Working screenshots:

![Plagarism Checker](https://user-images.githubusercontent.com/72568715/161045972-034079db-b328-4fd5-a8ea-753e4249ab5e.PNG)
4 changes: 4 additions & 0 deletions Python/Plagarism_Checker/requirements.txt
@@ -0,0 +1,4 @@
Numpy == 1.22.3
Vectorize == 0.2.0
Scikit-learn == 1.0.2
Sklearn == 0.0
5 changes: 5 additions & 0 deletions Python/Plagarism_Checker/sample_1.txt
@@ -0,0 +1,5 @@
The process of searching for a job can be very stressful but it doesn't have to be.

start with a well-written resume has appropriate keywords on yout occupation. Next, conduct a targeted

job search for positions that meet your needs.
3 changes: 3 additions & 0 deletions Python/Plagarism_Checker/sample_2.txt
@@ -0,0 +1,3 @@
Looking for a job can be very stressful , but it doesn't have to be.
Begin by writing a good resume with appropriate keywords for yout occupation.
Second, target yout job search for positions that match yout needs.
3 changes: 3 additions & 0 deletions Python/Plagarism_Checker/sample_3.txt
@@ -0,0 +1,3 @@
Gardening in mixed beds is a great way to get the most productivity from a
small space. Some investment is required, to purchase materials for the beds themselves,
as well as soil and compost. The investment will likely pay-off in terms of increased productivity.
5 changes: 5 additions & 0 deletions Python/Plagarism_Checker/sample_4.txt
@@ -0,0 +1,5 @@
If you don’t have a lot of space for a garden, raised beds can be a great option.
Gardening in mixed beds is a great way to get the most productivity from a small area. Some
investment is required. You’ll need to purchase materials for the raised beds themselves, as well as soil
and compost.
The investment will pay off, though, in the form of increased productivity.