diff --git a/README.md b/README.md index 3a1cb73..1dc5fae 100644 --- a/README.md +++ b/README.md @@ -5,37 +5,30 @@ This project provides a URL shortening service that converts long URLs into shor - Users submit a long URL to receive a shortened URL. - Users can use the shortened URL to redirect to the original URL. -# Functional Requirements +## Functional Requirements -1. URL Shortening - - Users submit a long URL to receive a shortened URL. -2. URL Redirection - - Users can use the shortened URL to be redirected to the original URL. +1. **URL Shortening** : Users submit a long URL to receive a shortened URL. +2. **URL Redirection** : Users can use the shortened URL to be redirected to the original URL. -> Unconsidered Requirements: -> -> - Users can optionally specify a custom alias. -> - Users can optionally specify an expiration date. - -# Non-Functional Requirements +## Non-Functional Requirements 1. High Scalability: Load balancer, DB replication, Redis cluster. -2. High Availability: 99.99% -3. Low Latency: Hybrid cache, DB replication. -4. Uniqueness Guarantee: Hash keys. +2. High Availability: Load balancer, Auto scaling, DB replication, Redis cluster, Health check +3. Low Latency: Hybrid cache, DB replication, Message Queue +4. Uniqueness Guarantee: Base58, ID Generator -# Technology Stack +## Technology Stack - Java 21 - Kotlin 1.9.25 - Spring Boot 3.4.1 - PostgreSQL 17 - Redis 8 -- EhCache +- Caffeine - Docker - JUnit 5 -# Installation and Execution +## Installation and Execution Run using Docker: @@ -43,9 +36,10 @@ Run using Docker: make run ``` -# API Specifications +## API Specifications ### 1. URL Shortening +URL shortening API to shorten a long URL ```shell POST /api/v1/shorten @@ -54,9 +48,9 @@ POST /api/v1/shorten Request Body: ```json { - "longUrl": "https://example.com/long-url" + "longUrl": "https://example.com/long-url", + "userId": 1 } - ``` Response: @@ -70,6 +64,7 @@ Status: - `400` Bad Request: Invalid URL input. ### 2. 
URL Redirection +Redirects a shortened URL to its original URL ```shell GET /{hash} @@ -82,155 +77,70 @@ HTTP/1.1 301 Moved Permanently Location: "https://example.com/long-url" ``` -# Database Schema - -| Field Name | Type | Description | -|--------------| --- |--------------------| -| id | SERIAL | Primary key | -| long_url | VARCHAR | Original URL | -| short_url | VARCHAR | Shortened URL | -| hash | VARCHAR | Hash for short URL | -| createdDate | TIMESTAMP | Creation date | -| modifiedDate | TIMESTAMP | Modification date | - -# Testing - -Run tests using JUnit 5: +### 3. Find URLs by userId +Find the shortened and original URLs that belong to a user ```shell -make test +GET /api/v1/users/{userId}/urls +``` +Response: +```json +[ + { + "id": 1, + "longUrl": "http://amazon.com", + "shortUrl": "http://localhost:8080/3gXe", + "hash": "3gXe", + "userId": 1, + "createdAt": "2025-01-06T00:14:12.429006", + "updatedAt": "2025-01-06T00:14:12.429006" + } +] ``` -# High-Level Design - -![overview.png](src/main/resources/static/overview.png) - -# Details - -
-Response Code - -🟢 **Status Code `301`** 🟢 - -- Prevents traffic loss through browser caching. -- Generally used for permanent URL redirection. -- Adjust `Cache-Control` and `Expires` headers when changing URLs. - -### 301 Moved Permanently - -- Permanently redirects the URL. -- Internally utilizes browser caching. -- Advantages: - - SEO-friendly: Prompts search engines to update the indexed URL. - - Prevents traffic loss: Cached URL reduces server traffic. -- Disadvantages: - - Difficult to change: Permanent setting can complicate updates. - - Caching: Requires additional work to update redirection. - -### 302 Found - -- Temporarily redirects the URL. -- Advantages: - - Temporary redirection: Suitable for event or promotion pages. - - No impact on search engines: Original URL remains indexed. -- Disadvantages: - - Traffic loss: URL redirection occurs every time. - -
-
-Unique URL - -🟢 **Auto-Generated ID + Base58** 🟢 - -- Combines uppercase letters, lowercase letters, and 58 digits. -- Easy for humans to read. -- Allows generation of diverse URLs (e.g., 6 characters can create 38 billion URLs). -- Uses auto-generated database keys. - -### 1. Base58 - -Uses 58 combinations of uppercase letters, lowercase letters, and digits (excluding 0, O, l, I). - -- Advantages: - - Prevents confusion: Easy for humans to read, reducing errors (e.g., avoiding 0/O/l/I confusion). - - Shorter URLs: More efficient than Base62. -- Disadvantages: - - Smaller character set: Fewer combinations than Base62. - - Limited special characters. - -### 2. Base62 - -Uses 62 combinations of uppercase letters, lowercase letters, and digits. - -- Advantages: - - Larger combinations: Utilizes all 62 characters. - - Short URLs: Efficient and widely compatible. - - Excludes special characters: Suitable for various systems. -- Disadvantages: - - Similar characters may cause confusion (e.g., 0/O/l/I). - -### 3. Hash - -- Advantages: - - Guarantees consistent output length. - - Low collision probability. - - Produces the same result for identical inputs. -- Disadvantages: - - Potential collisions. - - Long URLs may require trimming hash values. - -### 4. UUID +## Database Schema +URL Table -- Advantages: - - High uniqueness. - - Extremely low collision probability. -- Disadvantages: - - Long URLs. - - Hard to read. +| Field Name | Type | Description | +|------------|-----------|------------------------| +| id | SERIAL | Primary key | +| long_url | VARCHAR | Original URL | +| short_url | VARCHAR | Shortened URL | +| hash | VARCHAR | Hash for short URL | +| userId | BIGINT | User Identity | +| createdAt | TIMESTAMP | Creation date time | +| updatedAt | TIMESTAMP | Modification date time | -
-
-Database +## Testing -🟢 **DB Replication** 🟢 +Run tests using JUnit 5: -- Improved read performance: Master for writes, replicas for reads. -- Scalability and availability: Backup in case of failures. -- Load distribution: Spreads read and write operations across replicas. +```shell +make test +``` -
-
-Cache - -🟢 **Hybrid Cache** 🟢 -Uses both local and remote cache. -* Local Cache: Caffeine -* Remote Cache: Redis (Lettuce) - -- Low latency: Local cache is faster than remote. -- Prevents cache stampede: Minimizes backend load when cache is missing. -- Cache warm-up: Updates local cache during server startup. - -### Lettuce -* pros: - * Asynchronous and non-blocking for high-concurrency environments. - * Thread-safe, supports multi-threaded applications. - * Built-in Redis cluster and sharding support. - * Supports reactive programming. -* cons: - * More complex to use (requires understanding of async programming). - * May use more memory due to async I/O model. - -### Jedis -* pros: - * Simple and easy to use (synchronous). - * Low memory overhead. - * Ideal for small-scale or single-instance Redis setups. -* Cons: - * Not thread-safe by default (requires separate connections per thread). - * Limited or more complex cluster and sharding support. - * Synchronous, which can be less efficient for high-concurrency use cases. - -
- -# Performance Test \ No newline at end of file +## Current Architecture +![architecture.png](docs/images/architecture.png) +High-level design of current architecture + +## Future Architecture +![advanced_architecture.png](docs/images/advanced_architecture.png) +High-level design of future architecture + +- [Response Code](docs/ResponseCode.md) +- [Unique URL](docs/UniqueURL.md) +- [Database](docs/Database.md) +- [Cache](docs/Cache.md) +- [ID Generator](docs/IDGenerator.md) +- [Message Queue](docs/MessageQueue.md) +- [Rate Limiter](docs/RateLimiter.md) + +## Performance Test +Test Machine Specifications +- Processor: Apple M3 Pro +- Cores: 11 cores +- Memory: 36GB + +Load Test APIs +1. [URL Shortening API](docs/test/ShortenUrlPerformanceTest.md): generates a shortened version of a provided URL +2. [Short URL resolution API](docs/test/ResolveUrlPerformanceTest.md): resolves a shortened URL and redirects the user to the original URL diff --git a/docs/Cache.md b/docs/Cache.md new file mode 100644 index 0000000..c20a6ea --- /dev/null +++ b/docs/Cache.md @@ -0,0 +1,36 @@ +# Cache + + +**Hybrid Cache** + +Uses both local and remote cache. +* Local Cache: Caffeine +* Remote Cache: Redis (Lettuce) + +- Low latency: Local cache is faster than remote. +- Memory management: bounds memory use with an LRU eviction policy +- Prevents cache stampede: Minimizes backend load on a cache miss (local cache -> remote cache -> db) +- Clustering: scales under high traffic and replicates data across nodes +- Cache warm-up: Updates local cache during server startup. 
+ +### Redis +* pros: + * rich data structures: string, list, sorted set, etc. + * persistence: RDB snapshots and AOF logs for backup and recovery + * replication and clustering: scalability, fault tolerance + * atomic operations + * high performance +* cons: + * single thread: commands run on one thread, which might become a bottleneck + +### Memcache +* pros: + * simplicity: easy to set up, simple key-value + * performance: fast get/set operations + * multi thread: uses multiple cores + * low memory +* cons: + * no persistence: no backup, data is lost on restart or failure + * limited data structures + * no built-in clustering + * only LRU eviction \ No newline at end of file diff --git a/docs/Database.md b/docs/Database.md new file mode 100644 index 0000000..6a4a09c --- /dev/null +++ b/docs/Database.md @@ -0,0 +1,8 @@ +# Database + +**DB Replication** + +- Improved read performance: Master for writes, replicas for reads. +- Scalability and availability: Backup in case of failures, with automatic recovery. +- Load distribution: Spreads read operations across replicas. +- Real-time replication of transaction logs \ No newline at end of file diff --git a/docs/IDGenerator.md b/docs/IDGenerator.md new file mode 100644 index 0000000..f83b988 --- /dev/null +++ b/docs/IDGenerator.md @@ -0,0 +1,24 @@ +# ID Generator + +Creates unique, short IDs for URLs + +### Redis `INCR` +The `INCR` command in Redis generates auto-incrementing IDs +* pros: + * Low latency: fast because data is held in memory + * Atomicity: Redis executes commands on a single thread, so no two callers receive the same ID. + * Scalability: clustering and sharding, master and replica +* cons: + * SPOF: Data can be lost if the Redis server crashes. + * Replication Lag: delays in replication cause ID synchronization issues. + +### Database `Auto Increment ID` +Uses `AUTO_INCREMENT` to automatically generate unique IDs. 
+* pros: + * Built-in Functionality: minimal setup + * Durability: IDs managed by DB system + * Fail over: replication +* cons: + * Performance: Slower than Redis + * Scalability: Inefficient in distributed databases + * Locking Issues: concurrency issues due to transaction locking. \ No newline at end of file diff --git a/docs/MessageQueue.md b/docs/MessageQueue.md new file mode 100644 index 0000000..1447bb2 --- /dev/null +++ b/docs/MessageQueue.md @@ -0,0 +1,32 @@ +# Message Queue + + +Under high write traffic, a message queue lets requests be handled asynchronously + +* Prevent Overload: Manages high write traffic by queuing requests +* Loose Coupling: Enables independent system components +* Asynchronous: Ensures low response time for clients +* Scalability: easy scaling by adding more consumers +* Retry Queue: retries requests that are retriable (reliability) +* Dead Letter Queue (DLQ): Holds non-retriable requests for manual handling + +### Kafka +distributed streaming platform for handling large-scale data + +* pros: + * durability: stores data on disk and replicates it (retention) + * sequential processing: ordering is preserved within a partition + * high availability: data is replicated across servers (automatic recovery) +* cons: + * consumer lag: processing delay + * complexity + +### RabbitMQ +general-purpose message queue designed for asynchronous communication + +* pros: + * simplicity: simpler setup than Kafka + * lightweight: suitable for smaller traffic +* cons: + * weaker durability by default: messages can be lost unless queues are durable and messages are persistent + \ No newline at end of file diff --git a/docs/RateLimiter.md b/docs/RateLimiter.md new file mode 100644 index 0000000..8270c9f --- /dev/null +++ b/docs/RateLimiter.md @@ -0,0 +1,9 @@ +# Rate Limiter + +Protects servers from overload, enhances stability, and strengthens security by preventing malicious attacks (DDoS) + +* User based: limits how many short URLs each user can create +* IP based: limits how many URLs each IP can resolve (DDoS) +* Token bucket algorithm: easy to implement, handles both steady traffic and traffic spikes +* 429 Too Many Requests: block or delay requests +* Redis: counts requests using `INCR`, resets the count using `EXPIRE` \ No newline at end of file diff --git a/docs/ResponseCode.md b/docs/ResponseCode.md new file mode 100644 index 0000000..2555acd --- /dev/null +++ b/docs/ResponseCode.md @@ -0,0 +1,27 @@ +# Response Code + +**Status Code `301`** + +- Prevents traffic loss through browser caching. +- Generally used for permanent URL redirection. +- Adjust `Cache-Control` and `Expires` headers when changing URLs. + +### 301 Moved Permanently + +- Permanently redirects the URL. +- Internally utilizes browser caching. +- Advantages: + - SEO-friendly: Prompts search engines to update the indexed URL. + - Prevents traffic loss: Cached URL reduces server traffic. +- Disadvantages: + - Difficult to change: Permanent setting can complicate updates. + - Caching: Requires additional work to update redirection. + +### 302 Found + +- Temporarily redirects the URL. +- Advantages: + - Temporary redirection: Suitable for event or promotion pages. + - No impact on search engines: Original URL remains indexed. +- Disadvantages: + - Traffic loss: URL redirection occurs every time. \ No newline at end of file diff --git a/docs/UniqueURL.md b/docs/UniqueURL.md new file mode 100644 index 0000000..c9a286b --- /dev/null +++ b/docs/UniqueURL.md @@ -0,0 +1,49 @@ +# Unique URL + +**Auto-Generated ID + Base58** + +- Combines uppercase letters, lowercase letters, and digits into a 58-character alphabet. +- Easy for humans to read. +- Allows generation of diverse URLs (e.g., 6 characters can create 38 billion URLs). +- Uses auto-generated database keys. + +### 1. Base58 + +Uses 58 combinations of uppercase letters, lowercase letters, and digits (excluding 0, O, l, I). + +- Advantages: + - Prevents confusion: Easy for humans to read, reducing errors (e.g., avoiding 0/O/l/I confusion). + - Short URLs: Nearly as compact as Base62. 
+- Disadvantages: + - Smaller character set: Fewer combinations than Base62. + - Limited special characters. + +### 2. Base62 + +Uses 62 combinations of uppercase letters, lowercase letters, and digits. + +- Advantages: + - Larger combinations: Utilizes all 62 characters. + - Short URLs: Efficient and widely compatible. + - Excludes special characters: Suitable for various systems. +- Disadvantages: + - Similar characters may cause confusion (e.g., 0/O/l/I). + +### 3. Hash + +- Advantages: + - Guarantees consistent output length. + - Low collision probability. + - Produces the same result for identical inputs. +- Disadvantages: + - Potential collisions. + - Long URLs may require trimming hash values. + +### 4. UUID + +- Advantages: + - High uniqueness. + - Extremely low collision probability. +- Disadvantages: + - Long URLs. + - Hard to read. \ No newline at end of file diff --git a/docs/images/advanced_architecture.png b/docs/images/advanced_architecture.png new file mode 100644 index 0000000..76d34dc Binary files /dev/null and b/docs/images/advanced_architecture.png differ diff --git a/docs/images/architecture.png b/docs/images/architecture.png new file mode 100644 index 0000000..62637b8 Binary files /dev/null and b/docs/images/architecture.png differ diff --git a/src/main/resources/static/overview.png b/docs/images/overview.png similarity index 100% rename from src/main/resources/static/overview.png rename to docs/images/overview.png diff --git a/docs/images/resolve_db_locus.png b/docs/images/resolve_db_locus.png new file mode 100644 index 0000000..4cf966e Binary files /dev/null and b/docs/images/resolve_db_locus.png differ diff --git a/docs/images/resolve_db_metric.png b/docs/images/resolve_db_metric.png new file mode 100644 index 0000000..249f4fe Binary files /dev/null and b/docs/images/resolve_db_metric.png differ diff --git a/docs/images/resolve_local_locus.png b/docs/images/resolve_local_locus.png new file mode 100644 index 0000000..8552666 Binary files 
/dev/null and b/docs/images/resolve_local_locus.png differ diff --git a/docs/images/resolve_local_metric.png b/docs/images/resolve_local_metric.png new file mode 100644 index 0000000..0022252 Binary files /dev/null and b/docs/images/resolve_local_metric.png differ diff --git a/docs/images/resolve_redis_locus.png b/docs/images/resolve_redis_locus.png new file mode 100644 index 0000000..70c6e26 Binary files /dev/null and b/docs/images/resolve_redis_locus.png differ diff --git a/docs/images/resolve_redis_metric.png b/docs/images/resolve_redis_metric.png new file mode 100644 index 0000000..e882092 Binary files /dev/null and b/docs/images/resolve_redis_metric.png differ diff --git a/docs/images/shorten_locus.png b/docs/images/shorten_locus.png new file mode 100644 index 0000000..bdf4409 Binary files /dev/null and b/docs/images/shorten_locus.png differ diff --git a/docs/images/shorten_metric.png b/docs/images/shorten_metric.png new file mode 100644 index 0000000..0c96bad Binary files /dev/null and b/docs/images/shorten_metric.png differ diff --git a/docs/test/ResolveUrlPerformanceTest.md b/docs/test/ResolveUrlPerformanceTest.md new file mode 100644 index 0000000..a0f7ac9 --- /dev/null +++ b/docs/test/ResolveUrlPerformanceTest.md @@ -0,0 +1,25 @@ +# Short URL resolution API + +### Test Configuration +- Users: 500 test users +- Ramp-up: 5-minute gradual ramp-up + +### 1. Database Integration +![img.png](../images/resolve_db_metric.png) +![img_1.png](../images/resolve_db_locus.png) +tomcat +- max.thread: 200 + +hikari +- max.pool.size: 10 +- connection.timeout: 1000 + +(The same configuration is used for the following tests) + +### 2. Redis Integration +![img_2.png](../images/resolve_redis_metric.png) +![img_3.png](../images/resolve_redis_locus.png) + +### 3. 
LocalCache Integration +![img_4.png](../images/resolve_local_metric.png) +![img_5.png](../images/resolve_local_locus.png) \ No newline at end of file diff --git a/docs/test/ShortenUrlPerformanceTest.md b/docs/test/ShortenUrlPerformanceTest.md new file mode 100644 index 0000000..df0186a --- /dev/null +++ b/docs/test/ShortenUrlPerformanceTest.md @@ -0,0 +1,15 @@ +# URL Shortening API + +### Test Configuration +- Users: 500 test users +- Ramp-up: 5 minutes gradual ramp-up + +### Test +![img.png](../images/shorten_metric.png) +![img_1.png](../images/shorten_locus.png) +tomcat +- max.thread: 200 + +hikari +- max.pool.size: 10 +- connection.timeout: 1000 diff --git a/http/url.http b/http/url.http new file mode 100644 index 0000000..dddbdb0 --- /dev/null +++ b/http/url.http @@ -0,0 +1,14 @@ +### shorten URL +POST localhost:8080/api/v1/shorten +Content-Type: application/json + +{ + "longUrl": "http://amazon.com", + "userId": 1 +} + +### resolve URL +GET localhost:8080/2 + +### find urls by userId +GET localhost:8080/api/v1/users/1/urls diff --git a/load-test/url-shortener-v2.py b/load-test/url-shortener-v2.py index 7d453e4..0287ab3 100644 --- a/load-test/url-shortener-v2.py +++ b/load-test/url-shortener-v2.py @@ -6,7 +6,7 @@ class UrlShortener(HttpUser): @task def url_shorten(self): - random_number = random.randint(1000000, 2000000) + random_number = random.randint(1, 500000) hash = base58_encode(random_number) res = self.client.get(f"/{hash}", allow_redirects=False) print(f"response: {res}, hash: {hash}")
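
The load test above calls `base58_encode`, whose definition lies outside this hunk. For readers following the diff, here is a minimal sketch of what such a helper might look like. It assumes the commonly used Base58 alphabet that drops `0`, `O`, `I`, and `l`, as docs/UniqueURL.md describes; the character order the service actually uses is not shown in this change, so `ALPHABET` and `base58_decode` are illustrative names, not the project's code.

```python
# Hypothetical sketch: the real base58_encode used by the load test is not part of this diff.
# Assumed alphabet: digits and letters without the look-alike characters 0, O, I and l.
ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"
BASE = len(ALPHABET)  # 58


def base58_encode(number: int) -> str:
    """Encode a non-negative integer ID as a Base58 string."""
    if number < 0:
        raise ValueError("number must be non-negative")
    if number == 0:
        return ALPHABET[0]
    digits = []
    while number > 0:
        # Peel off the least significant base-58 digit on each pass.
        number, remainder = divmod(number, BASE)
        digits.append(ALPHABET[remainder])
    # Digits were collected least significant first, so reverse them.
    return "".join(reversed(digits))


def base58_decode(text: str) -> int:
    """Decode a Base58 string back to the integer ID (inverse of base58_encode)."""
    number = 0
    for char in text:
        number = number * BASE + ALPHABET.index(char)
    return number
```

Under this assumption, every ID in the new `random.randint(1, 500000)` range encodes to at most 4 characters, because 58^3 = 195,112 < 500,000 < 58^4 = 11,316,496. Six characters cover 58^6 = 38,068,692,544 IDs, which matches the "38 billion URLs" figure in docs/UniqueURL.md.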