Ranjan Paudel Task4: IMDb Scraper (Flask App) #26

Open
wants to merge 19 commits into
base: master
7 changes: 6 additions & 1 deletion Python/RanjanPaudel/.gitignore
@@ -1,10 +1,15 @@
#python caches
*/__pycache__
*/*/__pycache__
*/*/*/__pycache__

.pytest_cache

#vscode configs
.vscode

#environment files
.env
*/.env

#python virtual environments
*/*venv
24 changes: 24 additions & 0 deletions Python/RanjanPaudel/scraper/.env.example
@@ -0,0 +1,24 @@
#MYSQL_DB
MYSQL_DB_HOST=<your_host>
MYSQL_DB_PORT=<your_mysql_db_port>
MYSQL_DB_USER=<your_mysql_db_user>
MYSQL_DB_PASSWORD=<your_mysql_db_user_password>

#MYSQL_TEST_DB
MYSQL_TEST_DB_HOST=<your_test_host>
MYSQL_TEST_DB_PORT=<your_mysql_test_db_port>
MYSQL_TEST_DB_USER=<your_mysql_test_db_user>
MYSQL_TEST_DB_PASSWORD=<your_mysql_test_db_user_password>

#JWT
JWT_SECRET_A=<your_jwt_secret_a>
JWT_SECRET_B=<your_jwt_secret_b>
JWT_ALGORITHM=<your_jwt_algorithm>
JWT_ACCESS_TOKEN_LIFE=<access_token_life_in_seconds>
JWT_REFRESH_TOKEN_LIFE=<refresh_token_life_in_seconds>

#COOKIE
COOKIE_LIFE=<in_seconds>

#SESSION
SESSION_SECRET=""
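Review note: `app.py` reads `config.SESSION_SECRET` and `config.COOKIE_LIFE`, so these keys are presumably loaded by `scraper_app/config.py`, which is not shown in this part of the diff. The following is only a minimal sketch of how such values could be read and validated at startup. It assumes a hypothetical `load_settings` helper and plain `os.environ`; it is not the PR's actual config module.

```python
import os


def load_settings(env=None):
    """Read a few of the settings named in .env.example from a mapping.

    `load_settings` is a hypothetical helper, not part of this PR.
    `env` defaults to os.environ; pass a plain dict to make testing easy.
    """
    env = os.environ if env is None else env

    def required(key):
        # Treat a missing key and an empty value the same way.
        value = env.get(key, "")
        if not value:
            raise KeyError(f"missing required setting: {key}")
        return value

    return {
        "SESSION_SECRET": required("SESSION_SECRET"),
        "JWT_ALGORITHM": required("JWT_ALGORITHM"),
        # The example file documents these lifetimes in seconds,
        # so they are converted to int here.
        "JWT_ACCESS_TOKEN_LIFE": int(required("JWT_ACCESS_TOKEN_LIFE")),
        "JWT_REFRESH_TOKEN_LIFE": int(required("JWT_REFRESH_TOKEN_LIFE")),
        "COOKIE_LIFE": int(required("COOKIE_LIFE")),
    }
```

Failing fast on an empty value is a design choice in this sketch: it would surface a `.env` copied from the example without filling in `SESSION_SECRET` before the app starts serving requests.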
15 changes: 15 additions & 0 deletions Python/RanjanPaudel/scraper/.gitignore
@@ -0,0 +1,15 @@
#python caches
*/__pycache__
*/*/__pycache__
*/*/*/__pycache__
.pytest_cache

#vscode configs
.vscode

#environment files
.env
*/.env

#python virtual environments
*/*venv
61 changes: 61 additions & 0 deletions Python/RanjanPaudel/scraper/README.md
@@ -0,0 +1,61 @@
# IMDb Scraper: A Flask App
A simple server-side-rendered web app written in Python with Flask. An authorized user can scrape (or refresh the previously scraped data of) four IMDb pages:<br />
[Top Rated Movies](https://www.imdb.com/chart/top/?ref_=nv_mv_250)<br />
[Most Popular Movies](https://www.imdb.com/chart/moviemeter/?ref_=nv_mv_mpm)<br />
[Top Rated TV Shows](https://www.imdb.com/chart/toptv/?ref_=nv_tvv_250)<br />
[Most Popular TV Shows](https://www.imdb.com/chart/tvmeter/?ref_=nv_tvv_mptv)

## Install
Clone the repo.
```
$ git clone https://github.com/mrranjan31paudel/lf-training.git
$ cd lf-training/Python/RanjanPaudel/scraper
```
Create a virtual environment and activate it (recommended).
```
$ python3 -m venv scraper_app_venv
$ source scraper_app_venv/bin/activate

# or, in Windows cmd:
> scraper_app_venv\Scripts\activate.bat
```
Install the packages listed in `requirements.txt`.
```
$ python3 -m pip install -r requirements.txt
```
## Setup
Install MySQL on your system (follow [this guide](https://dev.mysql.com/doc/mysql-installation-excerpt/5.7/en/)).<br />
After the installation is complete, copy `.env.example` to `.env`:<br />
```
$ cp .env.example .env
```
Then set the values to match your environment.<br />
To run the migrations, first create the databases `scraper_app` (for development) and `scraper_app_test` (for tests). Then use the migrator CLI:
```
$ python3 scraper_app/db_migrator.py --env {test|development} --action {create|drop}
```
*You can use: `$ python3 scraper_app/db_migrator.py --help` for detailed info about the CLI migrator-app.*
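
The command-line interface described above can be sketched with `argparse`. `scraper_app/db_migrator.py` itself is not shown here, so this is only an illustration of the documented `--env` and `--action` flags. The function names `build_parser` and `database_name`, and the choice to make both flags required, are assumptions.

```python
import argparse


def build_parser():
    """Build a parser mirroring the flags documented in this README.

    Hypothetical sketch: the real db_migrator.py may differ.
    """
    parser = argparse.ArgumentParser(
        description="Create or drop the scraper app tables.")
    parser.add_argument("--env", choices=["test", "development"],
                        required=True, help="which database to target")
    parser.add_argument("--action", choices=["create", "drop"],
                        required=True, help="what to do with the tables")
    return parser


def database_name(env):
    # These are the database names the README asks you to create.
    return {"development": "scraper_app", "test": "scraper_app_test"}[env]


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(f"{args.action} tables in {database_name(args.env)}")
```

Using `choices` lets `argparse` reject any other value for either flag and list the valid ones in the `--help` output.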
## Run
### Mode: Development
In a terminal, run:
```
$ export FLASK_ENV=development
$ export FLASK_APP=scraper_app/app.py
$ flask run
```
### Mode: Test
This mode runs the unit tests in the `tests` folder:
```
$ pytest -s
# '-s' disables output capturing, so anything the tests print is shown.
```
## Some screenshots
![Screenshot-1](./readme_pics/sc1.png)
![Screenshot-2](./readme_pics/sc2.png)
![Screenshot-3](./readme_pics/sc3.png)
![Screenshot-4](./readme_pics/sc4.png)
![Screenshot-5](./readme_pics/sc5.png)
![Screenshot-6](./readme_pics/sc6.png)
![Screenshot-7](./readme_pics/sc7.png)
![Screenshot-8](./readme_pics/sc8.png)
Binary file added Python/RanjanPaudel/scraper/readme_pics/sc1.png
Binary file added Python/RanjanPaudel/scraper/readme_pics/sc2.png
Binary file added Python/RanjanPaudel/scraper/readme_pics/sc3.png
Binary file added Python/RanjanPaudel/scraper/readme_pics/sc4.png
Binary file added Python/RanjanPaudel/scraper/readme_pics/sc5.png
Binary file added Python/RanjanPaudel/scraper/readme_pics/sc6.png
Binary file added Python/RanjanPaudel/scraper/readme_pics/sc7.png
Binary file added Python/RanjanPaudel/scraper/readme_pics/sc8.png
11 changes: 11 additions & 0 deletions Python/RanjanPaudel/scraper/requirements.txt
@@ -0,0 +1,11 @@
Flask==1.1.2
PyJWT==1.7.1
pytest==6.0.1
requests==2.24.0
watchdog==0.10.3
cryptography==3.1
Flask-WTF==0.14.3
SQLAlchemy==1.3.19
mysqlclient==2.0.1
beautifulsoup4==4.9.1
python-dotenv==0.14.0
Empty file.
197 changes: 197 additions & 0 deletions Python/RanjanPaudel/scraper/scraper_app/app.py
@@ -0,0 +1,197 @@
from flask import Flask, request, Markup, render_template, redirect, url_for, make_response, flash
from jinja2 import Environment, PackageLoader, select_autoescape
from functools import wraps

import scraper_app.config as config
import scraper_app.validators as validators
import scraper_app.services as services
from scraper_app.app_constants import (
    empty_signin_form,
    empty_login_form,
    tab_list,
    list_table_columns,
    list_table_column_keys,
    tab_label_map)

app = Flask(__name__)
app.secret_key = bytes(config.SESSION_SECRET, encoding='utf8')


@app.route('/')
def root_page():
    return redirect(url_for('home_page'))


@app.route('/home', methods=['GET'])
@app.route('/home/<tab_name>', methods=['GET'])
def home_page(tab_name=None):
    authentication_info = services.authenticate_user(
        request.cookies.copy().to_dict(flat=True))

    if authentication_info == 'token_invalid':
        return redirect(url_for('login_page'))

    if ('status' in authentication_info) and (
            authentication_info['status'] == 'token_expired'):
        return redirect(url_for('refresh_tokens', tab_name=tab_name))

    if request.method == 'GET':
        scraped_list = ''
        if tab_name:
            scraped_list = services.get_scraped_list(tab_name)
            if ('movie_list' not in scraped_list) or len(scraped_list['movie_list']) < 1:
                flash(
                    f'Could not load the list for {tab_label_map[tab_name]}', 'error')
        return render_template('home.html',
                               user_is_logged_in=True,
                               user=authentication_info['user'],
                               tab_list=tab_list,
                               selected_tab=tab_name,
                               tab_label_map=tab_label_map,
                               list_table_columns=list_table_columns,
                               list_table_column_keys=list_table_column_keys,
                               scraped_list=scraped_list)


@app.route('/scrape/<list_name>', methods=['GET'])
def scrape_list(list_name=None):
    authentication_info = services.authenticate_user(
        request.cookies.copy().to_dict(flat=True))

    if authentication_info == 'token_invalid':
        return redirect(url_for('login_page'))

    # Mirror home_page: an expired access token must be refreshed
    # before the user is allowed to scrape.
    if ('status' in authentication_info) and (
            authentication_info['status'] == 'token_expired'):
        return redirect(url_for('refresh_tokens', tab_name=list_name))

    if list_name in tab_label_map.keys():
        try:
            services.scrape(list_name)
            flash(
                f'{tab_label_map[list_name]} scraped successfully!', 'success')
        except Exception as error:
            flash(
                f'Could not scrape {tab_label_map[list_name]} due to some internal errors!', 'error')

        return redirect(url_for('home_page', tab_name=list_name))
    else:
        flash('Invalid request!', 'error')
        return redirect(url_for('home_page'))


@app.route('/refresh', methods=['GET'])
def refresh_tokens():
    if request.method == 'GET':
        new_tokens_info = services.refresh_tokens(
            request.cookies.copy().to_dict(flat=True))

        if new_tokens_info in ['token_expired', 'token_invalid', 'token_refresh_error']:
            # Refresh failed: clear both cookies and send the user to login.
            refresh_response = make_response(redirect(url_for('login_page')))
            refresh_response.set_cookie('access_token', '',
                                        path='/', httponly=True, max_age=0)
            refresh_response.set_cookie('refresh_token', '',
                                        path='/refresh', httponly=True, max_age=0)

            return refresh_response

        # Use None (not '') when no tab was requested, so that url_for
        # builds '/home' instead of a URL with an empty tab segment.
        tab_name = request.args.get('tab_name') or None
        response = make_response(
            redirect(url_for('home_page', tab_name=tab_name)))
        response.set_cookie('access_token', new_tokens_info['access_token'],
                            path='/', httponly=True, max_age=config.COOKIE_LIFE)
        response.set_cookie('refresh_token', new_tokens_info['refresh_token'],
                            path='/refresh', httponly=True, max_age=config.COOKIE_LIFE)

        return response


@app.route('/login', methods=['GET', 'POST'])
def login_page():
    authentication_info = services.authenticate_user(
        request.cookies.copy().to_dict(flat=True))

    if ('status' in authentication_info) and (
            authentication_info['status'] == 'token_expired'):
        return redirect(url_for('refresh_tokens'))

    if 'user' in authentication_info:
        return redirect(url_for('home_page'))

    if request.method == 'GET':
        return render_template('login.html', form_data=empty_login_form)

    if request.method == 'POST':
        validation = validators.validate_login_form(request.form)

        if validation['has_error']:
            return render_template('login.html', form_data=request.form, error=validation['error'])

        try:
            tokens = services.log_in_user(
                request.form.copy().to_dict(flat=True))

            response = make_response(redirect(url_for('home_page')))
            response.set_cookie('access_token', tokens['access_token'],
                                path='/', httponly=True, max_age=config.COOKIE_LIFE)
            response.set_cookie('refresh_token', tokens['refresh_token'],
                                path='/refresh', httponly=True, max_age=config.COOKIE_LIFE)

            return response
        except Exception as error:
            error_dict = error.args[0]
            resp = make_response(render_template(
                'login.html', form_data=request.form, error=error_dict), error_dict['code'])
            return resp


@app.route('/signin', methods=['GET', 'POST'])
def signin_page():
    authentication_info = services.authenticate_user(
        request.cookies.copy().to_dict(flat=True))

    if ('status' in authentication_info) and (
            authentication_info['status'] == 'token_expired'):
        return redirect(url_for('refresh_tokens'))

    if 'user' in authentication_info:
        return redirect(url_for('home_page'))

    if request.method == 'GET':
        return render_template('signin.html', form_data=empty_signin_form)

    if request.method == 'POST':
        validation = validators.validate_signin_form(request.form)

        if validation['has_error']:
            resp = make_response(render_template(
                'signin.html', form_data=request.form, error=validation['error']), 400)
            return resp

        try:
            services.create_new_user(
                request.form.copy().to_dict(flat=True))
            flash('You were signed in successfully!', 'success')

            return redirect(url_for('login_page'))
        except Exception as error:
            error_dict = error.args[0]
            resp = make_response(render_template(
                'signin.html', form_data=request.form, error=error_dict), error_dict['code'])
            return resp


@app.route('/logout', methods=['GET'])
def logout_page():
    authentication_info = services.authenticate_user(
        request.cookies.copy().to_dict(flat=True))

    if 'user' in authentication_info:
        services.log_out_user(authentication_info['user'])

    response = make_response(redirect(url_for('login_page')))
    response.set_cookie('access_token', '',
                        path='/', httponly=True, max_age=0)
    response.set_cookie('refresh_token', '',
                        path='/refresh', httponly=True, max_age=0)

    return response
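
Review note: several views above repeat the same token checks, and `functools.wraps` is imported but never used. One possible way to factor the checks out is a decorator, sketched below without Flask so it can run on its own. `require_user` and its three callable parameters are hypothetical names, and the return shapes of `authenticate` are inferred from how `app.py` uses `services.authenticate_user`, not confirmed by the source.

```python
from functools import wraps


def require_user(authenticate, on_invalid, on_expired):
    """Decorator factory that runs the shared token checks before a view.

    All three arguments are hypothetical callables:
      authenticate() -> 'token_invalid', {'status': 'token_expired'},
                        or a dict containing 'user'
      on_invalid()   -> response for an invalid token (e.g. go to login)
      on_expired(**kwargs) -> response for an expired token (e.g. refresh)
    """
    def decorator(view):
        @wraps(view)  # keep the view's name, which Flask uses as endpoint
        def wrapper(*args, **kwargs):
            info = authenticate()
            if info == 'token_invalid':
                return on_invalid()
            if isinstance(info, dict) and info.get('status') == 'token_expired':
                # Forward the route arguments (such as tab_name) so the
                # refresh step can send the user back to the same page.
                return on_expired(**kwargs)
            # Authenticated: pass the user to the view as first argument.
            return view(info['user'], *args, **kwargs)
        return wrapper
    return decorator
```

In the Flask app, `authenticate` could wrap `services.authenticate_user(request.cookies.copy().to_dict(flat=True))`, and the two handlers could return the same `redirect(url_for(...))` responses the views build today. That wiring is left out here because it depends on modules not shown in this diff.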