diff --git a/README.md b/README.md
index 3a1cb73..1dc5fae 100644
--- a/README.md
+++ b/README.md
@@ -5,37 +5,30 @@ This project provides a URL shortening service that converts long URLs into shor
- Users submit a long URL to receive a shortened URL.
- Users can use the shortened URL to redirect to the original URL.
-# Functional Requirements
+## Functional Requirements
-1. URL Shortening
- - Users submit a long URL to receive a shortened URL.
-2. URL Redirection
- - Users can use the shortened URL to be redirected to the original URL.
+1. **URL Shortening**: Users submit a long URL to receive a shortened URL.
+2. **URL Redirection**: Users can use the shortened URL to be redirected to the original URL.
-> Unconsidered Requirements:
->
-> - Users can optionally specify a custom alias.
-> - Users can optionally specify an expiration date.
-
-# Non-Functional Requirements
+## Non-Functional Requirements
1. High Scalability: Load balancer, DB replication, Redis cluster.
-2. High Availability: 99.99%
-3. Low Latency: Hybrid cache, DB replication.
-4. Uniqueness Guarantee: Hash keys.
+2. High Availability: Load balancer, Auto scaling, DB replication, Redis cluster, Health check
+3. Low Latency: Hybrid cache, DB replication, Message Queue
+4. Uniqueness Guarantee: Base58, ID Generator
-# Technology Stack
+## Technology Stack
- Java 21
- Kotlin 1.9.25
- Spring Boot 3.4.1
- PostgreSQL 17
- Redis 8
-- EhCache
+- Caffeine
- Docker
- JUnit 5
-# Installation and Execution
+## Installation and Execution
Run using Docker:
@@ -43,9 +36,10 @@ Run using Docker:
make run
```
-# API Specifications
+## API Specifications
### 1. URL Shortening
+Shortens a long URL and returns the shortened URL.
```shell
POST /api/v1/shorten
@@ -54,9 +48,9 @@ POST /api/v1/shorten
Request Body:
```json
{
- "longUrl": "https://example.com/long-url"
+ "longUrl": "https://example.com/long-url",
+ "userId": 1
}
-
```
Response:
@@ -70,6 +64,7 @@ Status:
- `400` Bad Request: Invalid URL input.
### 2. URL Redirection
+Redirects a shortened URL to its original URL.
```shell
GET /{hash}
@@ -82,155 +77,70 @@ HTTP/1.1 301 Moved Permanently
Location: "https://example.com/long-url"
```
-# Database Schema
-
-| Field Name | Type | Description |
-|--------------| --- |--------------------|
-| id | SERIAL | Primary key |
-| long_url | VARCHAR | Original URL |
-| short_url | VARCHAR | Shortened URL |
-| hash | VARCHAR | Hash for short URL |
-| createdDate | TIMESTAMP | Creation date |
-| modifiedDate | TIMESTAMP | Modification date |
-
-# Testing
-
-Run tests using JUnit 5:
+### 3. Find URLs by userId
+Returns the shortened and original URLs that belong to a user.
```shell
-make test
+GET /api/v1/users/{userId}/urls
+```
+Response:
+```json
+[
+  {
+    "id": 1,
+    "longUrl": "http://amazon.com",
+    "shortUrl": "http://localhost:8080/3gXe",
+    "hash": "3gXe",
+    "userId": 1,
+    "createdAt": "2025-01-06T00:14:12.429006",
+    "updatedAt": "2025-01-06T00:14:12.429006"
+  }
+]
```
-# High-Level Design
-
-
-
-# Details
-
-
-Response Code
-
-🟢 **Status Code `301`** 🟢
-
-- Prevents traffic loss through browser caching.
-- Generally used for permanent URL redirection.
-- Adjust `Cache-Control` and `Expires` headers when changing URLs.
-
-### 301 Moved Permanently
-
-- Permanently redirects the URL.
-- Internally utilizes browser caching.
-- Advantages:
- - SEO-friendly: Prompts search engines to update the indexed URL.
- - Prevents traffic loss: Cached URL reduces server traffic.
-- Disadvantages:
- - Difficult to change: Permanent setting can complicate updates.
- - Caching: Requires additional work to update redirection.
-
-### 302 Found
-
-- Temporarily redirects the URL.
-- Advantages:
- - Temporary redirection: Suitable for event or promotion pages.
- - No impact on search engines: Original URL remains indexed.
-- Disadvantages:
- - Traffic loss: URL redirection occurs every time.
-
-
-
-Unique URL
-
-🟢 **Auto-Generated ID + Base58** 🟢
-
-- Combines uppercase letters, lowercase letters, and 58 digits.
-- Easy for humans to read.
-- Allows generation of diverse URLs (e.g., 6 characters can create 38 billion URLs).
-- Uses auto-generated database keys.
-
-### 1. Base58
-
-Uses 58 combinations of uppercase letters, lowercase letters, and digits (excluding 0, O, l, I).
-
-- Advantages:
- - Prevents confusion: Easy for humans to read, reducing errors (e.g., avoiding 0/O/l/I confusion).
- - Shorter URLs: More efficient than Base62.
-- Disadvantages:
- - Smaller character set: Fewer combinations than Base62.
- - Limited special characters.
-
-### 2. Base62
-
-Uses 62 combinations of uppercase letters, lowercase letters, and digits.
-
-- Advantages:
- - Larger combinations: Utilizes all 62 characters.
- - Short URLs: Efficient and widely compatible.
- - Excludes special characters: Suitable for various systems.
-- Disadvantages:
- - Similar characters may cause confusion (e.g., 0/O/l/I).
-
-### 3. Hash
-
-- Advantages:
- - Guarantees consistent output length.
- - Low collision probability.
- - Produces the same result for identical inputs.
-- Disadvantages:
- - Potential collisions.
- - Long URLs may require trimming hash values.
-
-### 4. UUID
+## Database Schema
+URL Table
-- Advantages:
- - High uniqueness.
- - Extremely low collision probability.
-- Disadvantages:
- - Long URLs.
- - Hard to read.
+| Field Name | Type | Description |
+|------------|-----------|------------------------|
+| id | SERIAL | Primary key |
+| long_url | VARCHAR | Original URL |
+| short_url | VARCHAR | Shortened URL |
+| hash | VARCHAR | Hash for short URL |
+| userId | BIGINT | User Identity |
+| createdAt | TIMESTAMP | Creation date time |
+| updatedAt | TIMESTAMP | Modification date time |
-
-
-Database
+## Testing
-🟢 **DB Replication** 🟢
+Run tests using JUnit 5:
-- Improved read performance: Master for writes, replicas for reads.
-- Scalability and availability: Backup in case of failures.
-- Load distribution: Spreads read and write operations across replicas.
+```shell
+make test
+```
-
-
-Cache
-
-🟢 **Hybrid Cache** 🟢
-Uses both local and remote cache.
-* Local Cache: Caffeine
-* Remote Cache: Redis (Lettuce)
-
-- Low latency: Local cache is faster than remote.
-- Prevents cache stampede: Minimizes backend load when cache is missing.
-- Cache warm-up: Updates local cache during server startup.
-
-### Lettuce
-* pros:
- * Asynchronous and non-blocking for high-concurrency environments.
- * Thread-safe, supports multi-threaded applications.
- * Built-in Redis cluster and sharding support.
- * Supports reactive programming.
-* cons:
- * More complex to use (requires understanding of async programming).
- * May use more memory due to async I/O model.
-
-### Jedis
-* pros:
- * Simple and easy to use (synchronous).
- * Low memory overhead.
- * Ideal for small-scale or single-instance Redis setups.
-* Cons:
- * Not thread-safe by default (requires separate connections per thread).
- * Limited or more complex cluster and sharding support.
- * Synchronous, which can be less efficient for high-concurrency use cases.
-
-
-
-# Performance Test
\ No newline at end of file
+## Current Architecture
+
+High-level design of current architecture
+
+## Future Architecture
+
+High-level design of future architecture
+
+- [Response Code](docs/ResponseCode.md)
+- [Unique URL](docs/UniqueURL.md)
+- [Database](docs/Database.md)
+- [Cache](docs/Cache.md)
+- [ID Generator](docs/IDGenerator.md)
+- [Message Queue](docs/MessageQueue.md)
+- [Rate Limiter](docs/RateLimiter.md)
+
+## Performance Test
+Test Machine Specifications
+- Processor: Apple M3 Pro
+- Cores: 11 cores
+- Memory: 36GB
+
+Load Test APIs
+1. [URL Shortening API](docs/test/ShortenUrlPerformanceTest.md): generates a shortened version of a provided URL
+2. [Short URL resolution API](docs/test/ResolveUrlPerformanceTest.md): resolves a shortened URL and redirects the user to the original URL
diff --git a/docs/Cache.md b/docs/Cache.md
new file mode 100644
index 0000000..c20a6ea
--- /dev/null
+++ b/docs/Cache.md
@@ -0,0 +1,36 @@
+# Cache
+
+
+**Hybrid Cache**
+
+Uses both local and remote cache.
+* Local Cache: Caffeine
+* Remote Cache: Redis (Lettuce)
+
+- Low latency: Local cache is faster than remote.
+- Memory management: Bounds memory usage with an LRU eviction policy.
+- Prevents cache stampede: Minimizes backend load on a cache miss (local cache -> remote cache -> db).
+- Clustering: Scales under high traffic and replicates data across nodes.
+- Cache warm-up: Updates local cache during server startup.
+
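The lookup order above (local cache -> remote cache -> db) can be sketched roughly as follows. This is a minimal illustration, not the project's implementation: plain dictionaries stand in for Caffeine and Redis, and `HybridCache` and `load_from_db` are hypothetical names.

```python
from typing import Callable, Dict, Optional


class HybridCache:
    """Read-through lookup: local cache -> remote cache -> database."""

    def __init__(self, remote: Dict[str, str],
                 load_from_db: Callable[[str], Optional[str]]):
        self.local: Dict[str, str] = {}    # stand-in for Caffeine (assumption)
        self.remote = remote               # stand-in for Redis (assumption)
        self.load_from_db = load_from_db   # hypothetical database accessor

    def get(self, key: str) -> Optional[str]:
        # 1. Local cache: no network hop.
        value = self.local.get(key)
        if value is not None:
            return value
        # 2. Remote cache: one network hop, shared by all instances.
        value = self.remote.get(key)
        if value is None:
            # 3. Database: slowest path; fill the remote cache on the way back.
            value = self.load_from_db(key)
            if value is None:
                return None
            self.remote[key] = value
        # Fill the local cache so the next read stays in-process.
        self.local[key] = value
        return value
```

After the first read of a key, later reads are served from the local dictionary and never reach the database.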
+### Redis
+* pros:
+  * rich data structures: strings, lists, sorted sets, etc.
+  * persistence: RDB snapshots and AOF logs for backup and recovery
+  * replication and clustering: scalability, fault tolerance
+  * atomic operations
+  * high performance
+* cons:
+  * single-threaded command execution: can become a bottleneck
+
+### Memcached
+* pros:
+  * simplicity: easy to set up, simple key-value model
+  * performance: fast get/set operations
+  * multi-threaded: makes use of multiple cores
+  * low memory overhead
+* cons:
+  * no persistence: no backup, data is lost on restart or failure
+  * limited data structures
+  * no built-in clustering
+  * only LRU eviction
\ No newline at end of file
diff --git a/docs/Database.md b/docs/Database.md
new file mode 100644
index 0000000..6a4a09c
--- /dev/null
+++ b/docs/Database.md
@@ -0,0 +1,8 @@
+# Database
+
+**DB Replication**
+
+- Improved read performance: Master for writes, replicas for reads.
+- Scalability and availability: Replicas serve as backups in case of failure and allow automatic recovery.
+- Load distribution: Spreads read operations across replicas while the master handles writes.
+- Real-time replication of transaction logs.
\ No newline at end of file
diff --git a/docs/IDGenerator.md b/docs/IDGenerator.md
new file mode 100644
index 0000000..f83b988
--- /dev/null
+++ b/docs/IDGenerator.md
@@ -0,0 +1,24 @@
+# ID Generator
+
+Creates unique, short IDs for URLs.
+
+### Redis `INCR`
+The Redis `INCR` command generates auto-incrementing IDs.
+* pros:
+  * Low latency: fast because the counter is held in memory
+  * Atomicity: Redis executes commands on a single thread, so `INCR` never returns the same ID twice.
+  * Scalability: clustering and sharding, master and replica
+* cons:
+  * SPOF: Data can be lost if the Redis server crashes.
+  * Replication Lag: delays in replication can cause ID synchronization issues.
+
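The atomicity argument for `INCR` can be illustrated with an in-memory stand-in. The sketch below does not talk to Redis: a lock plays the role of the atomic increment, and `CounterIdGenerator` and `collect_ids` are hypothetical names used only for illustration.

```python
import threading


class CounterIdGenerator:
    """In-memory stand-in for a Redis INCR counter (hypothetical helper)."""

    def __init__(self, start: int = 0):
        self._value = start
        self._lock = threading.Lock()

    def next_id(self) -> int:
        # The lock makes read-increment-write one indivisible step,
        # which is the guarantee INCR provides on the Redis side.
        with self._lock:
            self._value += 1
            return self._value


def collect_ids(generator: CounterIdGenerator, count: int, out: list) -> None:
    for _ in range(count):
        out.append(generator.next_id())


# Four concurrent workers draw 1000 IDs each from the same counter.
generator = CounterIdGenerator()
ids: list = []
threads = [
    threading.Thread(target=collect_ids, args=(generator, 1000, ids))
    for _ in range(4)
]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()
```

Because every increment is indivisible, the four workers together receive 4000 distinct IDs with no gaps.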
+### Database `Auto Increment ID`
+Uses a database auto-increment column (`AUTO_INCREMENT` in MySQL, `SERIAL` in PostgreSQL) to generate unique IDs automatically.
+* pros:
+  * Built-in Functionality: minimal setup
+  * Durability: IDs are managed by the database system
+  * Failover: supported through replication
+* cons:
+  * Performance: Slower than Redis
+  * Scalability: Inefficient in distributed databases
+  * Locking Issues: contention under concurrent writes due to transaction locking.
\ No newline at end of file
diff --git a/docs/MessageQueue.md b/docs/MessageQueue.md
new file mode 100644
index 0000000..1447bb2
--- /dev/null
+++ b/docs/MessageQueue.md
@@ -0,0 +1,32 @@
+# Message Queue
+
+
+Under high write traffic, a message queue lets requests be handled asynchronously.
+
+* Prevent Overload: Manages high write traffic by queuing requests
+* Loose Coupling: Enables independent system components
+* Asynchronous: Ensures low response time for clients
+* Scalability: Scales easily by adding more consumers
+* Retry Queue: Retries requests that are retriable (reliability)
+* Dead Letter Queue (DLQ): Holds non-retriable requests for manual handling
+
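The retry queue and dead letter queue behaviour listed above can be sketched with an in-memory queue. This is only an illustration under stated assumptions: `MAX_ATTEMPTS`, `RetriableError`, and `consume` are hypothetical names, and a real broker would persist messages rather than hold them in a `deque`.

```python
from collections import deque
from typing import Callable, Deque, List, Tuple

MAX_ATTEMPTS = 3  # assumed retry budget, not taken from the project


class RetriableError(Exception):
    """Raised by a handler when a later attempt might succeed."""


def consume(messages: List[str],
            handler: Callable[[str], None]) -> Tuple[List[str], List[str]]:
    """Process messages and return (processed, dead_letters)."""
    queue: Deque[Tuple[str, int]] = deque((m, 1) for m in messages)
    processed: List[str] = []
    dead_letters: List[str] = []
    while queue:
        message, attempt = queue.popleft()
        try:
            handler(message)
            processed.append(message)
        except RetriableError:
            if attempt < MAX_ATTEMPTS:
                # Retry queue: put the message back with a higher attempt count.
                queue.append((message, attempt + 1))
            else:
                # Retries exhausted: park the message for manual handling.
                dead_letters.append(message)
        except Exception:
            # Non-retriable failure: straight to the dead letter queue.
            dead_letters.append(message)
    return processed, dead_letters
```

A handler signals a transient failure by raising `RetriableError`; any other exception sends the message to the dead letter list without a retry.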
+### Kafka
+A distributed streaming platform for handling large-scale data.
+
+* pros:
+  * durability: stores data on disk and replicates it (retention)
+  * ordered processing: messages within a partition are consumed in order
+  * high availability: data is replicated across servers (automatic recovery)
+* cons:
+  * consumer lag: consumers can fall behind producers, delaying processing
+  * complexity
+
+### RabbitMQ
+A general-purpose message queue designed for asynchronous communication.
+
+* pros:
+  * simplicity: simpler setup than Kafka
+  * lightweight: suitable for lower traffic
+* cons:
+  * limited durability by default: messages held in memory can be lost unless persistence is enabled
+
\ No newline at end of file
diff --git a/docs/RateLimiter.md b/docs/RateLimiter.md
new file mode 100644
index 0000000..8270c9f
--- /dev/null
+++ b/docs/RateLimiter.md
@@ -0,0 +1,9 @@
+# Rate Limiter
+
+Protects servers from overload, improves stability, and strengthens security by mitigating malicious attacks such as DDoS.
+
+* User-based: limits how many short URLs each user can create
+* IP-based: limits how many URL resolutions each IP can request (DDoS mitigation)
+* Token bucket algorithm: easy to implement; handles both steady traffic and traffic spikes
+* 429 Too Many Requests: requests over the limit are blocked or delayed
+* Redis: counts requests with `INCR` and resets the count with `EXPIRE`
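The token bucket algorithm named above can be sketched as follows. This is a single-process illustration, not the Redis-backed limiter the project describes: `TokenBucket`, its parameters, and the injectable `now` clock are assumptions made for the example.

```python
import time
from typing import Callable


class TokenBucket:
    """Allows bursts up to `capacity` and a steady `refill_per_second` rate."""

    def __init__(self, capacity: int, refill_per_second: float,
                 now: Callable[[], float] = time.monotonic):
        self.capacity = capacity
        self.refill_per_second = refill_per_second
        self.now = now                      # injectable clock, handy for tests
        self.tokens = float(capacity)       # start with a full bucket
        self.last_refill = now()

    def allow(self) -> bool:
        # Add the tokens earned since the last call, capped at capacity.
        current = self.now()
        elapsed = current - self.last_refill
        self.tokens = min(self.capacity,
                          self.tokens + elapsed * self.refill_per_second)
        self.last_refill = current
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        # Bucket is empty: the caller would answer 429 Too Many Requests.
        return False
```

With one bucket per user ID or per client IP, a full bucket absorbs a short spike while the refill rate caps sustained traffic.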
\ No newline at end of file
diff --git a/docs/ResponseCode.md b/docs/ResponseCode.md
new file mode 100644
index 0000000..2555acd
--- /dev/null
+++ b/docs/ResponseCode.md
@@ -0,0 +1,27 @@
+# Response Code
+
+**Status Code `301`**
+
+- Reduces server traffic through browser caching.
+- Generally used for permanent URL redirection.
+- Adjust `Cache-Control` and `Expires` headers when changing URLs.
+
+### 301 Moved Permanently
+
+- Permanently redirects the URL.
+- Internally utilizes browser caching.
+- Advantages:
+ - SEO-friendly: Prompts search engines to update the indexed URL.
+ - Prevents traffic loss: Cached URL reduces server traffic.
+- Disadvantages:
+ - Difficult to change: Permanent setting can complicate updates.
+ - Caching: Requires additional work to update redirection.
+
+### 302 Found
+
+- Temporarily redirects the URL.
+- Advantages:
+ - Temporary redirection: Suitable for event or promotion pages.
+ - No impact on search engines: Original URL remains indexed.
+- Disadvantages:
+  - Higher server traffic: the redirect is requested from the server every time.
\ No newline at end of file
diff --git a/docs/UniqueURL.md b/docs/UniqueURL.md
new file mode 100644
index 0000000..c9a286b
--- /dev/null
+++ b/docs/UniqueURL.md
@@ -0,0 +1,49 @@
+# Unique URL
+
+**Auto-Generated ID + Base58**
+
+- Combines uppercase letters, lowercase letters, and digits into a 58-character alphabet.
+- Easy for humans to read.
+- Allows generation of diverse URLs (e.g., 6 characters yield 58^6, about 38 billion, URLs).
+- Uses auto-generated database keys.
+
+### 1. Base58
+
+Uses 58 characters: uppercase letters, lowercase letters, and digits, excluding 0, O, l, and I.
+
+- Advantages:
+ - Prevents confusion: Easy for humans to read, reducing errors (e.g., avoiding 0/O/l/I confusion).
+  - Short URLs: nearly as compact as Base62.
+- Disadvantages:
+ - Smaller character set: Fewer combinations than Base62.
+ - Limited special characters.
+
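A minimal sketch of encoding an auto-generated numeric ID with Base58 follows. The alphabet order is an assumption (the commonly used Bitcoin ordering); the project's own `base58_encode` may order characters differently, and `base58_decode` is added here only to show the round trip.

```python
# 58 characters: digits, uppercase, and lowercase without 0, O, I, and l.
# The ordering is an assumption, not taken from the project.
BASE58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def base58_encode(number: int) -> str:
    """Encode a non-negative integer ID as a Base58 string."""
    if number < 0:
        raise ValueError("number must be non-negative")
    if number == 0:
        return BASE58_ALPHABET[0]
    digits = []
    while number > 0:
        # Peel off the least significant Base58 digit each round.
        number, remainder = divmod(number, 58)
        digits.append(BASE58_ALPHABET[remainder])
    return "".join(reversed(digits))


def base58_decode(text: str) -> int:
    """Decode a Base58 string back into the integer ID."""
    number = 0
    for char in text:
        number = number * 58 + BASE58_ALPHABET.index(char)
    return number
```

Every ID below 58^6 = 38,068,692,544 fits in at most 6 characters, which is where the figure of about 38 billion URLs comes from.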
+### 2. Base62
+
+Uses 62 characters: uppercase letters, lowercase letters, and digits.
+
+- Advantages:
+ - Larger combinations: Utilizes all 62 characters.
+ - Short URLs: Efficient and widely compatible.
+ - Excludes special characters: Suitable for various systems.
+- Disadvantages:
+ - Similar characters may cause confusion (e.g., 0/O/l/I).
+
+### 3. Hash
+
+- Advantages:
+ - Guarantees consistent output length.
+ - Low collision probability.
+ - Produces the same result for identical inputs.
+- Disadvantages:
+ - Potential collisions.
+  - Long output: hash values must be truncated to keep URLs short, which raises the collision risk.
+
+### 4. UUID
+
+- Advantages:
+ - High uniqueness.
+ - Extremely low collision probability.
+- Disadvantages:
+ - Long URLs.
+ - Hard to read.
\ No newline at end of file
diff --git a/docs/images/advanced_architecture.png b/docs/images/advanced_architecture.png
new file mode 100644
index 0000000..76d34dc
Binary files /dev/null and b/docs/images/advanced_architecture.png differ
diff --git a/docs/images/architecture.png b/docs/images/architecture.png
new file mode 100644
index 0000000..62637b8
Binary files /dev/null and b/docs/images/architecture.png differ
diff --git a/src/main/resources/static/overview.png b/docs/images/overview.png
similarity index 100%
rename from src/main/resources/static/overview.png
rename to docs/images/overview.png
diff --git a/docs/images/resolve_db_locus.png b/docs/images/resolve_db_locus.png
new file mode 100644
index 0000000..4cf966e
Binary files /dev/null and b/docs/images/resolve_db_locus.png differ
diff --git a/docs/images/resolve_db_metric.png b/docs/images/resolve_db_metric.png
new file mode 100644
index 0000000..249f4fe
Binary files /dev/null and b/docs/images/resolve_db_metric.png differ
diff --git a/docs/images/resolve_local_locus.png b/docs/images/resolve_local_locus.png
new file mode 100644
index 0000000..8552666
Binary files /dev/null and b/docs/images/resolve_local_locus.png differ
diff --git a/docs/images/resolve_local_metric.png b/docs/images/resolve_local_metric.png
new file mode 100644
index 0000000..0022252
Binary files /dev/null and b/docs/images/resolve_local_metric.png differ
diff --git a/docs/images/resolve_redis_locus.png b/docs/images/resolve_redis_locus.png
new file mode 100644
index 0000000..70c6e26
Binary files /dev/null and b/docs/images/resolve_redis_locus.png differ
diff --git a/docs/images/resolve_redis_metric.png b/docs/images/resolve_redis_metric.png
new file mode 100644
index 0000000..e882092
Binary files /dev/null and b/docs/images/resolve_redis_metric.png differ
diff --git a/docs/images/shorten_locus.png b/docs/images/shorten_locus.png
new file mode 100644
index 0000000..bdf4409
Binary files /dev/null and b/docs/images/shorten_locus.png differ
diff --git a/docs/images/shorten_metric.png b/docs/images/shorten_metric.png
new file mode 100644
index 0000000..0c96bad
Binary files /dev/null and b/docs/images/shorten_metric.png differ
diff --git a/docs/test/ResolveUrlPerformanceTest.md b/docs/test/ResolveUrlPerformanceTest.md
new file mode 100644
index 0000000..a0f7ac9
--- /dev/null
+++ b/docs/test/ResolveUrlPerformanceTest.md
@@ -0,0 +1,25 @@
+# Short URL resolution API
+
+### Test Configuration
+- Users: 500 test users
+- Ramp-up: 5 minutes gradual ramp-up
+
+### 1. Database Integration
+
+
+tomcat
+- max.thread: 200
+
+hikari
+- max.pool.size: 10
+- connection.timeout: 1000
+
+(The same configuration is used for the following tests.)
+
+### 2. Redis Integration
+
+
+
+### 3. LocalCache Integration
+
+
\ No newline at end of file
diff --git a/docs/test/ShortenUrlPerformanceTest.md b/docs/test/ShortenUrlPerformanceTest.md
new file mode 100644
index 0000000..df0186a
--- /dev/null
+++ b/docs/test/ShortenUrlPerformanceTest.md
@@ -0,0 +1,15 @@
+# URL Shortening API
+
+### Test Configuration
+- Users: 500 test users
+- Ramp-up: 5 minutes gradual ramp-up
+
+### Test
+
+
+tomcat
+- max.thread: 200
+
+hikari
+- max.pool.size: 10
+- connection.timeout: 1000
diff --git a/http/url.http b/http/url.http
new file mode 100644
index 0000000..dddbdb0
--- /dev/null
+++ b/http/url.http
@@ -0,0 +1,14 @@
+### shorten URL
+POST localhost:8080/api/v1/shorten
+Content-Type: application/json
+
+{
+ "longUrl": "http://amazon.com",
+ "userId": 1
+}
+
+### resolve URL
+GET localhost:8080/2
+
+### find urls by userId
+GET localhost:8080/api/v1/users/1/urls
diff --git a/load-test/url-shortener-v2.py b/load-test/url-shortener-v2.py
index 7d453e4..0287ab3 100644
--- a/load-test/url-shortener-v2.py
+++ b/load-test/url-shortener-v2.py
@@ -6,7 +6,7 @@ class UrlShortener(HttpUser):
@task
def url_shorten(self):
- random_number = random.randint(1000000, 2000000)
+ random_number = random.randint(1, 500000)
hash = base58_encode(random_number)
res = self.client.get(f"/{hash}", allow_redirects=False)
print(f"response: {res}, hash: {hash}")