Kinesis is a service recently launched by AWS for real-time processing of streaming big data.
This application shows how to use Kinesis to build a very simple real-time web analytics platform. It has two main components: the collector and the dashboard.
The collector is a web application that exposes a single path: /collect.
Every hit on this path results in a PUT request, injecting data (records) into Kinesis. Each record contains a few pieces of data about the hit being tracked: the Source IP (a best-effort match), the User-Agent, the Referer, a Session ID and a Timestamp.
The Session ID is stored as a cookie in the browser that made the hit, which allows a sequence of hits from the same session to be grouped together.
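As a rough illustration of the record described above, the following Python sketch assembles the five fields for one hit. It is not the collector's actual code (which runs on the Play Framework). The cookie name `session_id`, the use of the `X-Forwarded-For` header for the best-effort source IP, the JSON encoding, and the millisecond timestamp are all assumptions made for the example.

```python
import json
import time
import uuid

SESSION_COOKIE = "session_id"  # hypothetical cookie name, not taken from the project


def build_hit_record(headers, cookies, remote_addr, now=None):
    """Return (record, session_id) for one hit on /collect."""
    # Reuse the session ID from the cookie, or mint a new one on a first visit.
    session_id = cookies.get(SESSION_COOKIE) or uuid.uuid4().hex
    # Best-effort source IP (assumption): prefer the first X-Forwarded-For
    # entry, which proxies and load balancers set, else the socket address.
    forwarded = headers.get("X-Forwarded-For", "")
    source_ip = forwarded.split(",")[0].strip() or remote_addr
    record = {
        "source_ip": source_ip,
        "user_agent": headers.get("User-Agent", ""),
        "referer": headers.get("Referer", ""),
        "session_id": session_id,
        # Milliseconds since the epoch (assumed unit).
        "timestamp": int((now if now is not None else time.time()) * 1000),
    }
    return record, session_id


def encode_record(record):
    """Serialize a record to bytes, the form a Kinesis data blob takes."""
    return json.dumps(record, sort_keys=True).encode("utf-8")
```

The caller would send the returned session ID back to the browser as a cookie, so that the next hit from the same browser carries it and lands in the same session.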
The dashboard is another web application that consumes the Kinesis stream described above and creates real-time visualizations of the data.
In the current version, it simply shows line graphs with the number of page views aggregated in three different ways:
- per second (for the last minute),
- per minute (for the last hour), and
- per hour (for the last day)
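The three aggregations above differ only in bucket size and window length, which the following Python sketch makes explicit. It is an illustration under assumptions, not the dashboard's implementation: the names `WINDOWS` and `aggregate` are hypothetical, and timestamps are taken to be seconds since the epoch.

```python
from collections import Counter

# (bucket size in seconds, number of buckets kept) for each view
WINDOWS = {
    "per_second": (1, 60),    # last minute
    "per_minute": (60, 60),   # last hour
    "per_hour": (3600, 24),   # last day
}


def aggregate(timestamps, now, bucket_seconds, bucket_count):
    """Count hits per bucket over the trailing window that ends at `now`.

    Returns (bucket_start, count) pairs, oldest first. Empty buckets are
    included with a count of zero so a line graph gets one point per bucket.
    """
    current = int(now) // bucket_seconds * bucket_seconds
    oldest = current - (bucket_count - 1) * bucket_seconds
    counts = Counter(
        int(ts) // bucket_seconds * bucket_seconds
        for ts in timestamps
        if oldest <= ts < current + bucket_seconds
    )
    return [
        (start, counts.get(start, 0))
        for start in range(oldest, current + bucket_seconds, bucket_seconds)
    ]
```

For example, `aggregate(hits, now, *WINDOWS["per_minute"])` yields 60 points, one per minute, ending with the minute that contains `now`.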
When you access /dashboard on this application, JavaScript code interacts with the server through AJAX calls, pulling data from three other endpoints: /api/minute, /api/hour, and /api/day.
To keep the focus on Kinesis and keep the code as simple as possible, the Dashboard application doesn't persist data. It makes all calculations in-memory.
Let's suppose you want to track the visits on a website, for example, www.tracked.com. Let's also suppose you have already deployed the two components of this sample application on collector.example.com and dashboard.example.com.
To start tracking pageviews, insert the following tag on each page served by www.tracked.com:
<script src="http://collector.example.com/collect"></script>
Each page load will then hit the /collect URL described above, which injects a record into the Kinesis stream.
Less than 10 seconds after you insert the tracking tag on the tracked website, records will be available for consumption by the Dashboard application.
By pointing your browser to http://dashboard.example.com/dashboard, you should start seeing real-time pageviews on the tracked website.
To deploy this sample application, you first need to create a Kinesis stream. To do so, point your browser to https://console.aws.amazon.com/kinesis and create a stream called hits with 1 shard. If you wish to use another stream name, you will have to modify the file modules/common/conf/application.conf.
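If you would rather script this step than use the console, a minimal Python sketch follows. It assumes a boto3 Kinesis client is passed in; boto3 is not part of this project, and the helper name `ensure_stream` is hypothetical. The defaults mirror the stream name and shard count given above.

```python
def ensure_stream(client, name="hits", shard_count=1):
    """Create the stream unless it already exists; return True if created.

    `client` is expected to behave like boto3.client("kinesis").
    """
    try:
        client.describe_stream(StreamName=name)
        return False  # the stream is already there
    except client.exceptions.ResourceNotFoundException:
        client.create_stream(StreamName=name, ShardCount=shard_count)
        return True


# Usage, assuming boto3 is installed and AWS credentials are configured:
#   import boto3
#   ensure_stream(boto3.client("kinesis"))
```

Stream creation is asynchronous, so the stream may take a short while to become active after the call returns.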
This application is built on the Play Framework 2.2.1. Please visit http://playframework.com and follow the installation instructions for your operating system.
Once you have the Play Framework 2.2.1 up and running, cd into the project root directory (i.e., the one with this file) and run the command:
play war
It should take less than a minute to build the two .WAR files:
- modules/collector/target/collector-0.1-DATE-TIME.war
- modules/dashboard/target/dashboard-0.1-DATE-TIME.war
You'll need to create two Elastic Beanstalk applications: one for the collector and one for the dashboard.
Point your browser to https://console.aws.amazon.com/elasticbeanstalk and create two new Tomcat 7 Java 7 applications.
Note: please ensure that your Elastic Beanstalk environments have an associated IAM role with Kinesis-related permissions.
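This README does not list the exact permissions, so the following Python sketch is only a plausible starting point. It builds one policy document for the collector (which writes records) and one for the dashboard (which reads them through the Kinesis Client Library). The action lists are assumptions based on what a producer and a Kinesis Client Library consumer typically need, including the DynamoDB table the library creates and the CloudWatch metrics it publishes. They may need adjusting for your library version, and the `"*"` resource should be narrowed to your own stream and table.

```python
import json


def allow(actions, resource="*"):
    """One IAM statement; "*" is a placeholder to be narrowed to real ARNs."""
    return {"Effect": "Allow", "Action": actions, "Resource": resource}


def policy(statements):
    """Wrap statements in an IAM policy document."""
    return {"Version": "2012-10-17", "Statement": statements}


# Collector: only needs to put records on the stream (assumed action list).
collector_policy = policy([allow(["kinesis:PutRecord"])])

# Dashboard: reads the stream via the Kinesis Client Library (assumed lists).
dashboard_policy = policy([
    allow(["kinesis:DescribeStream", "kinesis:GetShardIterator",
           "kinesis:GetRecords"]),
    allow(["dynamodb:CreateTable", "dynamodb:DescribeTable",
           "dynamodb:GetItem", "dynamodb:PutItem", "dynamodb:UpdateItem",
           "dynamodb:DeleteItem", "dynamodb:Scan"]),
    allow(["cloudwatch:PutMetricData"]),
])

print(json.dumps(collector_policy, indent=2))
print(json.dumps(dashboard_policy, indent=2))
```

The printed JSON can be pasted into the policy editor of the IAM role used by each environment.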
Make a few pageviews on the tracked website (or, alternatively, point your browser to the /collect endpoint on the collector application and hit refresh a few times).
Then go to the Dashboard endpoint (i.e., /dashboard on the dashboard application) and see the pageviews appearing in real time!
Please be aware that running this sample application will incur costs, including:
- the Kinesis shards;
- DynamoDB tables created by the Kinesis Client Library;
- the EC2 instances created by Elastic Beanstalk;
- some storage on S3 for the WAR files;
- data transfer.
You are responsible for these costs.