- Reproduce the architecture of the example serverless data engineering project, or build something similar using only serverless technologies.
- Enhance the project by extending the NLP analysis (entity extraction, key phrase extraction, or another NLP feature) or by applying computer vision.
In this project, I will learn to build a serverless data engineering pipeline.
- DynamoDB: A fast and flexible NoSQL database service for any scale
- Table: lambda-dynamodb-stream
- partition key: id (String)
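
As a rough sketch of the table setup above, the snippet below builds the arguments for a boto3 `create_table` call. The table name and the `id` string partition key come from these notes. On-demand billing, the enabled stream, and the `NEW_AND_OLD_IMAGES` view type are assumptions: a stream is implied by the trigger on `ProcessDynamoDBRecords`, but its settings are not stated here.

```python
def build_table_definition(table_name="lambda-dynamodb-stream"):
    """Return keyword arguments for DynamoDB's create_table call."""
    return {
        "TableName": table_name,
        # Partition key: id (String)
        "AttributeDefinitions": [{"AttributeName": "id", "AttributeType": "S"}],
        "KeySchema": [{"AttributeName": "id", "KeyType": "HASH"}],
        # Assumption: on-demand billing, so no capacity planning is needed.
        "BillingMode": "PAY_PER_REQUEST",
        # Assumption: a stream is enabled so a Lambda trigger can read changes.
        "StreamSpecification": {
            "StreamEnabled": True,
            "StreamViewType": "NEW_AND_OLD_IMAGES",
        },
    }


def create_table(client=None):
    """Create the table; needs AWS credentials, so it is not called here."""
    if client is None:
        import boto3  # imported lazily so the sketch loads without boto3

        client = boto3.client("dynamodb")
    return client.create_table(**build_table_definition())
```

Calling `create_table()` from a shell with configured AWS credentials should produce the same table as the console steps, assuming the stream settings above match what was chosen in the console.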
- AWS Lambda
- Function1: ProcessDynamoDBRecords
- add a DynamoDB trigger; remember to add the DynamoDB permission to the function's IAM role
- Function2: updateTable
- choose API: AWS Lambda and `invoke`
- set the schedule to invoke the `updateTable` Lambda function every 60 seconds
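
A minimal sketch of what `updateTable` and its schedule could look like, assuming the schedule is created with EventBridge Scheduler. The `created_at` attribute, the schedule name, and the ARNs passed to `build_schedule` are hypothetical; only the table name, the `id` key, and the 60-second interval come from these notes. The optional `table` argument exists only so the handler can be exercised without AWS.

```python
import uuid
from datetime import datetime, timezone

TABLE_NAME = "lambda-dynamodb-stream"


def build_item():
    """Build a new item; only `id` is required by the table's key schema."""
    return {
        "id": str(uuid.uuid4()),
        # Hypothetical attribute, added so each item carries some payload.
        "created_at": datetime.now(timezone.utc).isoformat(),
    }


def lambda_handler(event, context, table=None):
    """Insert one new item into the table on every scheduled invocation."""
    if table is None:
        import boto3  # imported lazily so the sketch loads without boto3

        table = boto3.resource("dynamodb").Table(TABLE_NAME)
    item = build_item()
    table.put_item(Item=item)
    return {"statusCode": 200, "id": item["id"]}


def build_schedule(function_arn, role_arn):
    """Arguments for boto3.client("scheduler").create_schedule(...)."""
    return {
        "Name": "invoke-updateTable-every-minute",  # hypothetical name
        "ScheduleExpression": "rate(1 minute)",  # i.e. every 60 seconds
        "FlexibleTimeWindow": {"Mode": "OFF"},
        # role_arn must allow the scheduler to invoke the function.
        "Target": {"Arn": function_arn, "RoleArn": role_arn},
    }
```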
- EventBridge invokes the `updateTable` function every minute, and the function inserts a new item into the DynamoDB table.
- `ProcessDynamoDBRecords` detects the change in the DynamoDB table and writes the result to an S3 bucket.
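
The second half of this flow might be sketched as follows: a handler that reads the records of a DynamoDB stream event and writes one JSON object per change to S3. The bucket name, the key prefix, and the shape of the stored summary are assumptions, not the project's actual output format. The optional `s3` argument is there only so the handler can be tried with a stand-in client.

```python
import json

BUCKET_NAME = "my-stream-output-bucket"  # hypothetical bucket name


def summarize_record(record):
    """Pick the fields of one stream record that are worth logging."""
    change = record.get("dynamodb", {})
    return {
        "event_id": record.get("eventID"),
        "event_name": record.get("eventName"),  # INSERT, MODIFY or REMOVE
        "keys": change.get("Keys", {}),
        "new_image": change.get("NewImage"),  # absent for REMOVE events
    }


def lambda_handler(event, context, s3=None):
    """Write one JSON object to S3 for every record in the stream event."""
    if s3 is None:
        import boto3  # imported lazily so the sketch loads without boto3

        s3 = boto3.client("s3")
    written = []
    for record in event.get("Records", []):
        summary = summarize_record(record)
        key = f"dynamodb-changes/{summary['event_id']}.json"
        s3.put_object(
            Bucket=BUCKET_NAME,
            Key=key,
            Body=json.dumps(summary).encode("utf-8"),
        )
        written.append(key)
    return {"written": written}
```

Writing one object per record keeps the handler simple; batching all records of an invocation into a single object would reduce the number of S3 requests if the table sees heavier traffic.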
1. AWS Lambda
- CloudWatch Timer: schedules a 30-second job to call the Lambda function;
- Lambda function: updates data in the database;
- Log Lambda function: detects changes in the database, then writes logs to S3.
1. AWS S3
2. Lambda function
- Revisit project 3.