waedbara/BigDataUrbanProject

Purpose

This project demonstrates part of the work described in the paper "Urban Planning and Smart City Decision Management Empowered by Real-Time Data Processing Using Big Data Analytics" and was built as a "Big Data" course project.

The paper studies how to utilize "Big Data" technologies and frameworks in implementing smart city and urban planning systems, and how doing so could improve the performance and throughput of the data to be analyzed.

This project demonstrates the batch-processing part described in the paper using Hadoop and HBase; our project group decided to work on the "Water Meters" data.

Water Meters

The research paper lists the sources from which the data sets were fetched. Water Meters is a CSV file containing all meters installed in the City of Surrey; it only describes where each meter is installed and whether it is still in service.

We were looking for a source of readings for those meters and found the City of Surrey Mapping Online System, which provides REST endpoints for retrieving the readings and other attributes related to the meters.

Expected Outcome

The behavior we wanted to demonstrate is importing the "normalized" meter readings into Hadoop and HBase in order to run analysis tasks against those readings using Apache Spark.

We were able to use a Hadoop MapReduce job to do the initial mapping and normalization of the readings data and then import it into HBase; the analysis part is left for future work.
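As an illustration only, the PySpark sketch below reads the normalized reducer output from HDFS and computes a simple aggregate; the HDFS URL, output path, and column layout are assumptions rather than values taken from the repository.

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("water-meters-analysis").getOrCreate()

# Assumed HDFS location and layout of the normalized reducer output:
# one tab-separated line per reading holding the row key and the LAST_READ value.
readings = (
    spark.read
    .option("sep", "\t")
    .csv("hdfs://namenode:9000/output/water_meters/part-*")
    .toDF("row_key", "last_read")
)

# Example analysis: number of readings and the average last reading value.
readings.select(
    F.count("*").alias("readings"),
    F.avg(F.col("last_read").cast("double")).alias("avg_last_read"),
).show()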

Project components

  • Docker: runs the Hadoop and HBase cluster, plus a JupyterLab container used to demonstrate the code and to upload CSV files.
  • Python 3.8: implements the Hadoop MapReduce jobs.
  • Shell scripts: run the back-end jobs that listen for file changes and launch the Hadoop and HBase jobs (a rough Python equivalent of this watcher is sketched below).
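The project itself implements the watcher in shell; purely for illustration, a rough Python equivalent under assumed paths (upload folder, HDFS directories, and Hadoop Streaming jar location) could look like this:

import subprocess
import time
from pathlib import Path

UPLOADS = Path("/work/uploads")  # assumed upload folder shared with the notebook container
# Assumed location of the Hadoop Streaming jar inside the namenode container.
STREAMING_JAR = "/opt/hadoop-3.2.1/share/hadoop/tools/lib/hadoop-streaming-3.2.1.jar"

def run_job(csv_file: Path) -> None:
    hdfs_input = f"/input/{csv_file.name}"
    hdfs_output = f"/output/{csv_file.stem}"
    # Copy the uploaded file into HDFS, then launch the streaming MapReduce job.
    subprocess.run(["hdfs", "dfs", "-put", "-f", str(csv_file), hdfs_input], check=True)
    subprocess.run([
        "hadoop", "jar", STREAMING_JAR,
        "-files", "mapper.py,reducer.py",
        "-mapper", "python3 mapper.py",
        "-reducer", "python3 reducer.py",
        "-input", hdfs_input,
        "-output", hdfs_output,
    ], check=True)

def watch() -> None:
    seen = set()
    while True:
        # Simple polling loop; the real shell scripts may use a different mechanism.
        for csv_file in UPLOADS.glob("*.csv"):
            if csv_file not in seen:
                seen.add(csv_file)
                run_job(csv_file)
        time.sleep(5)

if __name__ == "__main__":
    watch()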

Docker images

The Docker images used in this demo are based on the images built in the project https://github.com/big-data-europe/docker-hbase; additionally, the jupyter/pyspark-notebook image was used to run JupyterLab.

Running the application

A Docker Compose file is provided to spin up the project containers; the command below starts them:

docker-compose -f docker-compose.yml up -d

You can shut down the containers by executing the command below; you can additionally pass the -v and --rmi all options to remove all volumes and created images.

docker-compose -f docker-compose.yml down

Demo Steps

Once the Docker containers are started:

  • Trace the logs of the Jupyter notebook container to get the notebook URL; the last line displays the required URL:
docker logs -t pyspark_notebook

You should see the notebook main page.

  • Navigate through the notebook file explorer to /work/uploads; you can also have a look at the shell scripts and Python MapReduce jobs under /work/hadoop and /work/hbase.
  • Download a water meters dataset from the Water Meters source described above.
  • In the notebook file explorer, upload the downloaded CSV to the /work/uploads folder.
  • Uploading a file should trigger a Hadoop MapReduce job.
  • You can trace the job by tailing a log file inside the namenode container:
docker exec -it namenode bash

Then, inside the container, tail the log file:

tail -f upload_logs.txt
  • You can trace the HBase import job logs:
docker exec -it hbase bash

Then tail the import log file:

tail -f import_logs.txt
  • You can check the records imported into the HBase table; bash into the hbase container, then open the HBase shell to scan the table:
docker exec -it hbase bash
hbase shell
>scan 'water_meters', {LIMIT => 1}

You should see the first record as below:

ROW                                          COLUMN+CELL
 1000928957-2020-10-28                       column=basic:ACCOUNT_NO, timestamp=1610903090138, value=440853
 1000928957-2020-10-28                       column=basic:FACILITYID, timestamp=1610903090138, value=1000928957
 1000928957-2020-10-28                       column=location:HOUSE_NO, timestamp=1610903090138, value=11576
 1000928957-2020-10-28                       column=location:Latitude, timestamp=1610903090138, value=49.178048
 1000928957-2020-10-28                       column=location:Longitude, timestamp=1610903090138, value=-122.901761\x09
 1000928957-2020-10-28                       column=location:STREET_NO, timestamp=1610903090138, value=96A AVE
 1000928957-2020-10-28                       column=reading:LAST_READ, timestamp=1610903090138, value=1435
 1000928957-2020-10-28                       column=reading:LAST_READ_DT, timestamp=1610903090138, value=2020-10-28
1 row(s) in 1.1870 seconds

HBase Table Structure

The final table "water_meters", which holds the meter readings, has the following structure:

Column Family   Columns                                    Description
basic           ACCOUNT_NO, FACILITYID                     Holds the meter ID
location        HOUSE_NO, Latitude, Longitude, STREET_NO   Meter location
reading         LAST_READ, LAST_READ_DT                    Last reading value and date
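
For reference only, the same structure can be created and a row written from Python with the happybase client; this assumes the HBase Thrift server is enabled and reachable on its default port, whereas the project itself performs the import through shell scripts.

import happybase

# Assumes the HBase Thrift server is reachable inside the cluster on its default port.
connection = happybase.Connection(host="hbase", port=9090)

# Create the table with the three column families described above.
connection.create_table(
    "water_meters",
    {"basic": {}, "location": {}, "reading": {}},
)

# Write one reading keyed by FACILITYID-LAST_READ_DT, mirroring the scan output shown earlier.
table = connection.table("water_meters")
table.put(
    b"1000928957-2020-10-28",
    {
        b"basic:ACCOUNT_NO": b"440853",
        b"basic:FACILITYID": b"1000928957",
        b"location:HOUSE_NO": b"11576",
        b"location:Latitude": b"49.178048",
        b"location:Longitude": b"-122.901761",
        b"location:STREET_NO": b"96A AVE",
        b"reading:LAST_READ": b"1435",
        b"reading:LAST_READ_DT": b"2020-10-28",
    },
)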

Map-Reduce Python scripts

  • Mapper: accepts each line of the water meters CSV file and reads all of its fields except FOLIO, STATUS, GPS, IMAGE, and LOTLINK. It adds the last reading value and last reading date, fetched from the REST web service by facility ID, and generates the final line to be passed to the reduce job. It also adds an extra key, used as the row key of the record in HBase, formed by concatenating FACILITYID and LAST_READ_DT; this prevents the same reading from being inserted into the HBase table multiple times.
  • Reducer: passes a record through as output only if the min-max normalized value of its reading, (x - min) / (max - min), falls between 0 and 1.
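
A minimal Hadoop Streaming sketch of the two scripts is shown below; the REST endpoint URL, the CSV field layout, and the min/max bounds are illustrative assumptions, not the exact values used in the repository scripts.

import csv
import sys

import requests

# Hypothetical readings endpoint; the real Mapping Online System URL is not reproduced here.
READINGS_URL = "https://example.org/cosmos/meters/{facility_id}/last-reading"

DROPPED_FIELDS = {"FOLIO", "STATUS", "GPS", "IMAGE", "LOTLINK"}

# Assumed bounds for the min-max normalization; the real job uses its own values.
READ_MIN, READ_MAX = 0.0, 100000.0

def mapper():
    for row in csv.DictReader(sys.stdin):
        facility_id = row["FACILITYID"]
        # Fetch the last reading value and date for this meter from the REST service.
        reading = requests.get(READINGS_URL.format(facility_id=facility_id), timeout=10).json()
        row["LAST_READ"] = reading["LAST_READ"]
        row["LAST_READ_DT"] = reading["LAST_READ_DT"]
        # Row key = FACILITYID-LAST_READ_DT, so the same reading is never inserted twice.
        row_key = f"{facility_id}-{row['LAST_READ_DT']}"
        kept = {k: v for k, v in row.items() if k not in DROPPED_FIELDS}
        print(row_key + "\t" + "\t".join(f"{k}={v}" for k, v in sorted(kept.items())))

def reducer():
    for line in sys.stdin:
        row_key, values = line.rstrip("\n").split("\t", 1)
        fields = dict(pair.split("=", 1) for pair in values.split("\t"))
        x = float(fields["LAST_READ"])
        # Min-max normalization: keep the record only if the result lies in [0, 1].
        normalized = (x - READ_MIN) / (READ_MAX - READ_MIN)
        if 0.0 <= normalized <= 1.0:
            print(line.rstrip("\n"))

if __name__ == "__main__":
    # e.g. python water_meters_mr.py map < meters.csv | sort | python water_meters_mr.py reduce
    mapper() if sys.argv[1] == "map" else reducer()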
