adding architecture, reference, rewrite introduction section
hdthinh1012 committed Sep 30, 2024
1 parent 34f2b25 commit 6a54707
Showing 12 changed files with 579 additions and 471 deletions.
160 changes: 0 additions & 160 deletions .$Untitled Diagram.drawio.dtmp

This file was deleted.

@@ -0,0 +1,14 @@
---
title : "HTTP Live Streaming"
date : "`r Sys.Date()`"
weight : 1
chapter : false
pre : " <b> 1.1.1 </b> "
---

## HTTP Live Streaming
An adaptive streaming protocol proposed by Apple that provides the ability to stream video at different bitrates, chosen automatically or manually by the client.

HLS breaks videos down into smaller chunks and delivers them over the HTTP protocol. A client front end compliant with the HLS protocol downloads the video chunks and serves them to users. The HLS protocol provides options to generate and deliver chunks with different video encodings, bitrates, resolutions, and audio qualities.
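
For illustration, a minimal sketch of the client side, assuming the `hls.js` library and a hypothetical manifest URL (the workshop itself may use a different player):

```js
import Hls from "hls.js";

// Hypothetical URL of a master playlist that lists the available bitrates.
const manifestUrl = "https://example.com/streams/sintel_trailer/master.m3u8";
const video = document.querySelector("video");

if (Hls.isSupported()) {
  const hls = new Hls();
  hls.loadSource(manifestUrl); // download the playlist and start fetching chunks
  hls.attachMedia(video);      // feed the downloaded chunks into the <video> element
} else if (video.canPlayType("application/vnd.apple.mpegurl")) {
  video.src = manifestUrl;     // Safari supports HLS natively
}
```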

You can learn more about HTTP Live Streaming in this brilliant guide [here](https://duthanhduoc.com/blog/hls-streaming-nodejs "HLS Streaming với Node.js. Tạo server phát video như Youtube") (Vietnamese)
85 changes: 85 additions & 0 deletions content/1-Introduce/1.1-concepts/1.1.2-aws-services/_index.md
@@ -0,0 +1,85 @@
---
title : "AWS Services in this workshop"
date : "`r Sys.Date()`"
weight : 2
chapter : false
pre : " <b> 1.1.2 </b> "
---

## AWS Elastic Compute Cloud (EC2)
A general-purpose virtual server service provided by AWS, offering multiple features such as:
- Amazon Machine Images (AMIs): preconfigured image packages covering a variety of operating systems, development environments, and software components.
- Instance types: multiple configurations of CPU cores, memory, disk size, network capacity, and hardware graphics acceleration.
- Amazon EBS volumes: persistent storage volumes
- Instance store volumes: storage volumes for temporary data that is deleted when you stop, hibernate, or terminate your instance.
- Key pairs: secure login information for your instances. AWS stores the public key and you store the private key in a secure place.

## AWS VPC Security Group
Acts as a virtual firewall that controls inbound/outbound traffic with allow rules based on protocol, port, source IP ranges, destination IP ranges, etc. (traffic that does not match any rule is denied).
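
For illustration only, an inbound rule allowing HTTP traffic from anywhere could be expressed with the AWS SDK v3 EC2 client as sketched below (the group ID is hypothetical; in the workshop the rules can equally be configured from the AWS console):

```js
import { EC2Client, AuthorizeSecurityGroupIngressCommand } from "@aws-sdk/client-ec2";

const ec2 = new EC2Client({ region: "REGION" });
await ec2.send(
  new AuthorizeSecurityGroupIngressCommand({
    GroupId: "sg-0123456789abcdef0", // hypothetical security group ID
    IpPermissions: [
      {
        IpProtocol: "tcp",
        FromPort: 80,
        ToPort: 80,
        IpRanges: [{ CidrIp: "0.0.0.0/0", Description: "allow HTTP from anywhere" }],
      },
    ],
  })
);
```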

## AWS Simple Storage Service (S3)
A cloud file storage service provided by AWS for general-purpose file storage. AWS S3 uses a flat namespace, meaning all files are stored at the same level, but they are abstracted into a folder-like structure for easier management.
For example, a bucket displayed as below in the AWS S3 management console:

```
uploads/
    chunks/
        sintel_trailer.mp4.part_0
        sintel_trailer.mp4.part_1
    videos/
        sintel_trailer.mp4
```

is stored physically as:
```
- uploads/chunks/sintel_trailer.mp4.part_0
- uploads/chunks/sintel_trailer.mp4.part_1
- uploads/videos/sintel_trailer.mp4
```
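
The folder-like view is just a prefix/delimiter query over these flat keys. A minimal sketch with the AWS SDK v3 (the bucket name is assumed):

```js
import { S3Client, ListObjectsV2Command } from "@aws-sdk/client-s3";

const client = new S3Client({ region: "REGION" });
const { Contents, CommonPrefixes } = await client.send(
  new ListObjectsV2Command({
    Bucket: "my-bucket", // hypothetical bucket name
    Prefix: "uploads/",  // only keys starting with this prefix
    Delimiter: "/",      // group deeper keys into "sub-folders"
  })
);
// Contents       -> objects stored directly under uploads/
// CommonPrefixes -> [{ Prefix: "uploads/chunks/" }, { Prefix: "uploads/videos/" }]
```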

## AWS SDK v3 for S3
This workshop uses AWS SDK v3 to upload and download large files from an AWS S3 bucket; the code examples can be found in the "Upload or download large files" section of this [Developer Guide](https://docs.aws.amazon.com/sdk-for-javascript/v3/developer-guide/javascript_s3_code_examples.html "Amazon S3 examples using SDK for JavaScript (v3)").

Commands for uploading files (a complete flow is sketched below this list):
- `CreateMultipartUploadCommand`: Initializes a multipart upload process and returns an `UploadId` value for subsequent use.
- `UploadPartCommand`: The actual upload command, called multiple times to upload multiple chunks to the destination defined by `CreateMultipartUploadCommand`.
- `CompleteMultipartUploadCommand`: Finalizes the multipart upload process.
- `AbortMultipartUploadCommand`: Deletes the uploaded parts and aborts the multipart upload process (used in case of error).
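
Putting these commands together, a minimal sketch of the upload flow might look like the following (the bucket, key, and chunking are assumed for illustration; each chunk except the last must satisfy S3's minimum part size):

```js
import {
  S3Client,
  CreateMultipartUploadCommand,
  UploadPartCommand,
  CompleteMultipartUploadCommand,
  AbortMultipartUploadCommand,
} from "@aws-sdk/client-s3";

const client = new S3Client({ region: "REGION" });
const target = { Bucket: "my-bucket", Key: "uploads/videos/sintel_trailer.mp4" };

async function uploadParts(chunks) {
  // Start the multipart upload and remember its UploadId.
  const { UploadId } = await client.send(new CreateMultipartUploadCommand(target));
  try {
    const parts = [];
    for (let i = 0; i < chunks.length; i++) {
      // Upload each chunk as a numbered part and keep its ETag for the completion step.
      const { ETag } = await client.send(
        new UploadPartCommand({ ...target, UploadId, PartNumber: i + 1, Body: chunks[i] })
      );
      parts.push({ ETag, PartNumber: i + 1 });
    }
    // Assemble the parts into the final object.
    await client.send(
      new CompleteMultipartUploadCommand({ ...target, UploadId, MultipartUpload: { Parts: parts } })
    );
  } catch (err) {
    // Clean up the already-uploaded parts if anything fails.
    await client.send(new AbortMultipartUploadCommand({ ...target, UploadId }));
    throw err;
  }
}
```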

Commands for downloading files:
- `GetObjectCommand`: Gets all or part of the data bytes of an AWS S3 object.
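
For example, a minimal sketch that downloads only the first mebibyte of an object via the `Range` parameter (the bucket and key are assumed):

```js
import { S3Client, GetObjectCommand } from "@aws-sdk/client-s3";

const client = new S3Client({ region: "REGION" });
const { Body, ContentRange } = await client.send(
  new GetObjectCommand({
    Bucket: "my-bucket", // hypothetical bucket name
    Key: "uploads/videos/sintel_trailer.mp4",
    Range: "bytes=0-1048575", // request only the first 1 MiB
  })
);
// In Node.js, Body is a readable stream that can be piped or buffered.
```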

The AWS S3 client SDK is provided through an NPM package [here](https://www.npmjs.com/package/@aws-sdk/client-s3 "aws-sdk/client-s3"). To send a request, you:
- Initialize the client with configuration (e.g. credentials, region):
```js
import { S3Client } from "@aws-sdk/client-s3";
const client = new S3Client({
region: "REGION",
credentials: {
accessKeyId: process.env.AWS_ACCESS_KEY_ID,
secretAccessKey: process.env.AWS_SECRET_ACCESS_KEY,
},
});
```

- Initialize a command with its input parameters:
```js
import { ListBucketsCommand } from "@aws-sdk/client-s3";
const params = {
/** input parameters */
};
const command = new ListBucketsCommand(params);
```

- Execute the command with the `send` method:
```js
// async/await.
try {
const data = await client.send(command);
// process data.
} catch (error) {
// error handling.
} finally {
// finally.
}
```
256 changes: 256 additions & 0 deletions content/1-Introduce/1.1-concepts/1.1.3-nodejs-concepts/_index.md
@@ -0,0 +1,256 @@
---
title : "NodeJS Classes in this workshop"
date : "`r Sys.Date()`"
weight : 3
chapter : false
pre : " <b> 1.1.3 </b> "
---

## NodeJS Stream
A stream is a method of handling data that works best for transmitting large files. Instead of reading or writing data all at once, NodeJS streams read or write data chunk by chunk into their internal buffers.
When a buffer reaches full capacity (determined by the `highWaterMark` property of a NodeJS stream instance), the stream must release its data by pushing it onto the next stream or writing it to the final file destination, as in the sketch below.
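
A minimal sketch of this chunk-by-chunk behaviour with the built-in `fs` streams (the file names and buffer size are assumed for illustration):

```js
const fs = require("fs");
const { pipeline } = require("node:stream");

// Read 64 KiB at a time into the internal buffer instead of loading the whole file.
const source = fs.createReadStream("input.mp4", { highWaterMark: 64 * 1024 });
const destination = fs.createWriteStream("copy.mp4");

pipeline(source, destination, (err) => {
  if (err) console.error("Pipeline failed:", err);
  else console.log("Pipeline succeeded");
});
```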

### NodeJS base classes (Writable, Readable)
The application utilizes `stream.Writable` and `stream.Readable` from the `node:stream` library to implement custom read and write streams to S3 buckets using the AWS SDK v3.

You can learn more about implementing custom streams in this brilliant guide: [Node.js: How to implement own streams?](https://medium.com/@vaibhavmoradiya/how-to-implement-own-streams-in-node-js-edd9ab54a59b)
#### stream.Writable
To implement a custom write stream, we extend the `Writable` class from the `node:stream` library. The important methods to override are:
- `constructor(opts: {})`: constructor for passing configuration arguments to the stream
- `_construct(callback: (error?: Error | null) => void): void`: called right after the constructor to initialize data, like opening a local file or setting up an AWS SDK command
- `_write(chunk: any, encoding: BufferEncoding, callback: (error?: Error | null) => void): void`: called whenever the user calls `writeStreamObj.write(<some-bytes...>)` or the preceding stream in a NodeJS pipeline pushes a chunk into it
- `_final(callback: (err: Error | undefined) => void): void`: called after the user calls `writeStreamObj.end(<some-bytes...>)` or the preceding stream in a NodeJS pipeline emits the 'end' event
- `_destroy(err: Error | undefined, callback: (err: Error | undefined) => void)`: called after `_final`, to clean up unused resources

An example of reimplementing `fs.WriteStream` by extending the `stream.Writable` class:
```js
const { Writable } = require("node:stream");
const fs = require("fs");

class FileWriteStream extends Writable {
constructor({ highWaterMark, fileName }) {
super({ highWaterMark });

this.fileName = fileName;
this.fd = null;
this.chunks = [];
this.chunkSize = 0;
this.writesCount = 0;
}

  // This method runs after the constructor has been called, and it defers calls to the
  // other methods until we invoke the callback function.
  // We can use this method to initialize data, like opening a new file.
_construct(callback) {
fs.open(this.fileName, "w", (err, fd) => {
if (err) {
// if we call the callback with an arguments means that we have an error
// and we should not proceed
callback(err);
} else {
this.fd = fd;
        // no arguments means it was successful
callback();
}
});
}

_write(chunk, encoding, callback) {
this.chunks.push(chunk);
this.chunkSize += chunk.length;
    if (this.chunkSize > this.writableHighWaterMark) {
fs.write(this.fd, Buffer.concat(this.chunks), (err) => {
if (err) return callback(err);
this.chunks = [];
this.chunkSize = 0;
++this.writesCount;
callback();
});
} else {
// when we are done, we should call the callback function
callback();
}
}

  // this will run after our stream has finished
_final(callback) {
fs.write(this.fd, Buffer.concat(this.chunks), (err) => {
if (err) {
return callback(err);
}

this.chunks = [];
callback();
});
}

// this method is called when we are done with the final method
_destroy(error, callback) {
console.log("Write Count:", this.writesCount);
if (this.fd) {
fs.close(this.fd, (err) => {
        callback(err || error);
});
} else {
callback(error);
}
}
}

const stream = new FileWriteStream({
highWaterMark: 1800,
fileName: "text.txt",
});

stream.write(Buffer.from("abc"));
stream.end(Buffer.from("xyz"));

stream.on("finish", () => {
console.log("stream was finished");
});
```
#### stream.Readable
To implement a custom read stream, we extend the `Readable` class from the `node:stream` library. The important methods to override are:
- `constructor(opts: {})`: constructor for passing configuration arguments to the stream
- `_construct(callback: (error?: Error | null) => void): void`: called right after the constructor to initialize data, like opening a local file or setting up an AWS SDK command
- `_read(size: number): void`: responsible for reading data from the input and pushing it to the following stream; the method body must call `this.push(chunk: any)`. The `_read` method is first called when a 'data' event listener is added to the read stream or when the read stream is piped into another write stream.
  - if `this.push(chunk)` is called with a non-null chunk, the `_read` method will be called again and again until `this.push(null)` signals the end of the data
- `_destroy(err: Error | undefined, callback: (err: Error | undefined) => void)`: called after the last `_read` call, to clean up unused resources

An example of reimplementing `fs.ReadStream` by extending the `stream.Readable` class:
```js
const { Readable } = require("node:stream");
const fs = require("fs");

class FileReadStream extends Readable {
constructor({ highWaterMark, fileName }) {
super({ highWaterMark });
this.fileName = fileName;
this.fd = null;
}

_construct(callback) {
fs.open(this.fileName, "r", (err, fd) => {
if (err) return callback(err);
this.fd = fd;
callback();
});
}

_read(size) {
const buff = Buffer.alloc(size);
fs.read(this.fd, buff, 0, size, null, (err, byteRead) => {
      if (err) return this.destroy(err);
// null means that we are at end of the stream
this.push(byteRead > 0 ? buff.subarray(0, byteRead) : null);
});
}

_destroy(error, callback) {
if (this.fd) {
      fs.close(this.fd, (err) => callback(err || error));
} else {
callback(error);
}
}
}

const stream = new FileReadStream({ fileName: "text.txt" });

stream.on("data", (chunk) => {
console.log(chunk.toString("utf-8"));
});

stream.on("end", () => {
console.log("Stream read complete");
});
```
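
As a usage sketch, the two custom classes above can be combined in a pipeline, so the read stream pushes chunks into the write stream exactly as described earlier (the file names are assumed):

```js
const { pipeline } = require("node:stream");

pipeline(
  new FileReadStream({ highWaterMark: 1800, fileName: "text.txt" }),
  new FileWriteStream({ highWaterMark: 1800, fileName: "copy.txt" }),
  (err) => {
    if (err) console.error("Copy failed:", err);
    else console.log("Copy finished");
  }
);
```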

### Multer storage engine
`multer` is a popular library package used for parsing multipart POST requests, especially for file uploads. `multer` provides `multer.diskStorage()` for easily parsing uploaded files and temporarily storing them on disk before they are processed in the request handler. In this workshop, a custom Multer storage engine is implemented to upload files directly to AWS S3 without writing them to the server disk, saving upload time. (You can learn about implementing a custom Multer engine in this [article](https://javascript.plainenglish.io/custom-storage-engine-for-multer-in-typescript-613ebd35d61e).)

We will implement the `multer.StorageEngine` interface from the `multer` package. The important methods to implement are:
- `_handleFile = (req: Request, file: Express.Multer.File, cb: (error?: any, info?: CustomFileResult) => void): void`: responsible for writing the uploaded file to the desired destination, whether a local file storage or a cloud bucket
- `_removeFile = (_req: Request, file: Express.Multer.File & { name: string }, cb: (error: Error | null) => void): void`: removes the uploaded file in case an error occurred

An example of a custom Multer storage engine that uploads to Google Cloud Storage by implementing the `multer.StorageEngine` interface:
```ts
import { Bucket, CreateWriteStreamOptions } from "@google-cloud/storage";
import { Request } from "express";
import multer from "multer";
import path from "path";

type nameFnType = (req: Request, file: Express.Multer.File) => string;

type Options = {
bucket: Bucket;
options?: CreateWriteStreamOptions;
nameFn?: nameFnType;
};

const defaultNameFn: nameFnType = (
_req: Request,
file: Express.Multer.File
) => {
const fileExt = path.extname(file.originalname);

return `${file.fieldname}_${Date.now()}${fileExt}`;
};

interface CustomFileResult extends Partial<Express.Multer.File> {
name: string;
}

class CustomStorageEngine implements multer.StorageEngine {
private bucket: Bucket;
private options?: CreateWriteStreamOptions;
private nameFn: nameFnType;

constructor(opts: Options) {
this.bucket = opts.bucket;
this.options = opts.options || undefined;
this.nameFn = opts.nameFn || defaultNameFn;
}

_handleFile = (
req: Request,
file: Express.Multer.File,
cb: (error?: any, info?: CustomFileResult) => void
): void => {
if (!this.bucket) {
cb(new Error("bucket is a required field."));
return;
}

const fileName = this.nameFn(req, file);

const storageFile = this.bucket.file(fileName);
const fileWriteStream = storageFile.createWriteStream(this.options);
const fileReadStream = file.stream;

fileReadStream
.pipe(fileWriteStream)
.on("error", (err) => {
fileWriteStream.end();
storageFile.delete({ ignoreNotFound: true });
cb(err);
})
.on("finish", () => {
cb(null, { name: fileName });
});
};

_removeFile = (
_req: Request,
file: Express.Multer.File & { name: string },
cb: (error: Error | null) => void
): void => {
this.bucket.file(file.name).delete({ ignoreNotFound: true });
cb(null);
};
}

export default (opts: Options) => {
return new CustomStorageEngine(opts);
};
```
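
As a usage sketch, the exported factory can then be wired into an Express route (the bucket name, import path, and field name are assumptions for illustration):

```ts
import express from "express";
import multer from "multer";
import { Storage } from "@google-cloud/storage";
import customStorage from "./custom-storage-engine"; // hypothetical path to the engine above

const app = express();
const bucket = new Storage().bucket("my-upload-bucket"); // hypothetical bucket name
const upload = multer({ storage: customStorage({ bucket }) });

app.post("/upload", upload.single("video"), (req, res) => {
  // The `name` returned by _handleFile is merged into req.file by multer.
  res.json({ uploaded: (req.file as Express.Multer.File & { name: string }).name });
});

app.listen(3000);
```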