HarshCasper · Sulagna-Dutta-Roy · Mar 31, 2022
diff --git a/Python/Plagarism_Checker/Plagarism_Checker.py b/Python/Plagarism_Checker/Plagarism_Checker.py
@@ -0,0 +1,35 @@
+import os
+from pathlib import Path
+from sklearn.feature_extraction.text import TfidfVectorizer
+from sklearn.metrics.pairwise import cosine_similarity
+
+# ======= collect every .txt file except requirements.txt =======
+student_files = sorted(doc for doc in os.listdir() if doc.endswith('.txt') and doc != 'requirements.txt')
+student_notes = [Path(File).read_text() for File in student_files]
+# === helpers: TF-IDF vectors and cosine similarity ============
+vectorize = lambda Text: TfidfVectorizer().fit_transform(Text).toarray()
+similarity = lambda doc1, doc2: cosine_similarity([doc1, doc2])
+
+vectors = vectorize(student_notes)
+s_vectors = list(zip(student_files, vectors))
+
+# ====== function for checking every pair of text files ======
+def check_plagarism():
+    """Return a set of (file_a, file_b, similarity) tuples, one per unordered pair."""
+    plagarism_results = set()
+    for index, (student_a, text_vector_a) in enumerate(s_vectors):
+        # Compare only against later entries so each unordered pair
+        # is scored once and a file is never compared with itself.
+        for student_b, text_vector_b in s_vectors[index + 1:]:
+            # cosine_similarity returns a 2x2 matrix; [0][1] is the
+            # similarity between the first and the second vector.
+            sim_score = similarity(text_vector_a, text_vector_b)[0][1]
+            # Sort the names so the tuple does not depend on pair order.
+            student_pair = sorted((student_a, student_b))
+            score = (student_pair[0], student_pair[1], sim_score)
+            plagarism_results.add(score)
+    return plagarism_results
+
+# === checking the data ===========
+for data in check_plagarism():
+    print(data)
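The script above delegates the scoring to scikit-learn. As a rough illustration of what those two calls compute, here is a minimal standard-library sketch of TF-IDF weighting followed by cosine similarity. It assumes the default `TfidfVectorizer` behaviour (lowercasing, tokens of two or more word characters, smoothed IDF, L2 normalisation). The function names `tokenize`, `tfidf_vectors`, `cosine`, and `pairwise_scores`, and the in-memory documents, are hypothetical and not part of this PR.

```python
import math
import re
from collections import Counter
from itertools import combinations


def tokenize(text):
    # Lowercase and keep runs of two or more word characters.
    return re.findall(r"\b\w\w+\b", text.lower())


def tfidf_vectors(documents):
    """Return one L2-normalised TF-IDF vector (as a dict) per document."""
    counts = [Counter(tokenize(doc)) for doc in documents]
    n_docs = len(documents)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for count in counts for term in count)
    vectors = []
    for count in counts:
        # Smoothed IDF: ln((1 + n) / (1 + df)) + 1.
        weights = {
            term: tf * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, tf in count.items()
        }
        norm = math.sqrt(sum(w * w for w in weights.values())) or 1.0
        vectors.append({term: w / norm for term, w in weights.items()})
    return vectors


def cosine(vec_a, vec_b):
    # Vectors are already unit length, so the dot product is the cosine.
    return sum(w * vec_b.get(term, 0.0) for term, w in vec_a.items())


def pairwise_scores(named_docs):
    """Map each unordered pair of names to its similarity score."""
    names = sorted(named_docs)
    vectors = dict(zip(names, tfidf_vectors([named_docs[n] for n in names])))
    return {
        (a, b): cosine(vectors[a], vectors[b])
        for a, b in combinations(names, 2)
    }


# Hypothetical in-memory documents standing in for the .txt files.
docs = {
    "a.txt": "Looking for a job can be very stressful.",
    "b.txt": "Searching for a job can be very stressful.",
    "c.txt": "Raised beds are a great option for a small garden.",
}
scores = pairwise_scores(docs)
for pair, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(pair, round(score, 3))
```

The two job-search sentences share most of their terms and should score well above either pairing with the gardening sentence, which overlaps only on a common word.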
diff --git a/Python/Plagarism_Checker/Readme.md b/Python/Plagarism_Checker/Readme.md
@@ -0,0 +1,25 @@
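+# Plagiarism Checker
+
+- The Python script measures the similarity between every pair of text files using TF-IDF vectors and cosine similarity.
+- Each result is printed to the console as a `(file_a, file_b, score)` tuple; a score closer to 1 means the texts are more alike.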
+
+## Tech Stack:
+<ul> <li> Python </li></ul>
+
+## Setup Environment
+
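+- Create and activate a virtual environment.
+- Install the dependencies with `pip install -r requirements.txt`.
+- Place the `.txt` files to compare in the directory you run the script from; the script picks them up automatically.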
+
+
+## Running the script:
+
+
+```sh
+    $ python Plagarism_Checker.py
+```
+
+## Working screenshots:
+
+![Plagiarism Checker](https://user-images.githubusercontent.com/72568715/161045972-034079db-b328-4fd5-a8ea-753e4249ab5e.PNG)
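The setup notes say the text files are fetched automatically, which in the script means every `.txt` file in the working directory. Since `requirements.txt` lives in the same folder and also ends in `.txt`, one possible way to gather only the documents is sketched below. The helper name `collect_documents` and its `exclude` parameter are hypothetical, and UTF-8 encoding is an assumption about the sample files.

```python
import tempfile
from pathlib import Path


def collect_documents(directory, exclude=("requirements.txt",)):
    """Return {file name: contents} for the .txt files in `directory`."""
    documents = {}
    for path in sorted(Path(directory).glob("*.txt")):
        if path.name in exclude:
            continue  # skip non-essay files such as the pip requirements
        # Assumption: the files are UTF-8 encoded.
        documents[path.name] = path.read_text(encoding="utf-8")
    return documents


# Usage with a throwaway directory so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "sample_1.txt").write_text("first essay", encoding="utf-8")
    Path(tmp, "sample_2.txt").write_text("second essay", encoding="utf-8")
    Path(tmp, "requirements.txt").write_text("numpy", encoding="utf-8")
    found = collect_documents(tmp)
    print(sorted(found))  # ['sample_1.txt', 'sample_2.txt']
```

Passing the directory explicitly also means the script no longer depends on where it is launched from.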
diff --git a/Python/Plagarism_Checker/requirements.txt b/Python/Plagarism_Checker/requirements.txt
@@ -0,0 +1,4 @@
+# numpy is required by scikit-learn; pinned here for reproducibility.
+numpy==1.22.3
+# scikit-learn provides TfidfVectorizer and cosine_similarity.
+scikit-learn==1.0.2
diff --git a/Python/Plagarism_Checker/sample_1.txt b/Python/Plagarism_Checker/sample_1.txt
@@ -0,0 +1,5 @@
+The process of searching for a job can be very stressful but it doesn't have to be.
+
+start with a well-written resume has appropriate keywords on yout occupation. Next, conduct a targeted
+
+job search for positions that meet your needs.
diff --git a/Python/Plagarism_Checker/sample_2.txt b/Python/Plagarism_Checker/sample_2.txt
@@ -0,0 +1,3 @@
+Looking for a job can be very stressful , but it doesn't have to be.
+Begin by writing a good resume with appropriate keywords for yout occupation.
+Second, target yout job search for positions that match yout needs.
diff --git a/Python/Plagarism_Checker/sample_3.txt b/Python/Plagarism_Checker/sample_3.txt
@@ -0,0 +1,3 @@
+Gardening in mixed beds is a great way to get the most productivity from a
+small space. Some investment is required, to purchase materials for the beds themselves,
+as well as soil and compost. The investment will likely pay-off in terms of increased productivity.
diff --git a/Python/Plagarism_Checker/sample_4.txt b/Python/Plagarism_Checker/sample_4.txt
@@ -0,0 +1,5 @@
+If you don’t have a lot of space for a garden, raised beds can be a great option.
+ Gardening in mixed beds is a great way to get the most productivity from a small area. Some
+ investment is required. You’ll need to purchase materials for the raised beds themselves, as well as soil
+ and compost.
+The investment will pay off, though, in the form of increased productivity.