doc: update docs
Peter Park committed Jan 6, 2025
1 parent 07cd949 commit ff34acc
Showing 23 changed files with 313 additions and 164 deletions.
236 changes: 73 additions & 163 deletions README.md
@@ -5,47 +5,41 @@ This project provides a URL shortening service that converts long URLs into shor
- Users submit a long URL to receive a shortened URL.
- Users can use the shortened URL to redirect to the original URL.

# Functional Requirements
## Functional Requirements

1. URL Shortening
- Users submit a long URL to receive a shortened URL.
2. URL Redirection
- Users can use the shortened URL to be redirected to the original URL.
1. **URL Shortening**: Users submit a long URL to receive a shortened URL.
2. **URL Redirection**: Users can use the shortened URL to be redirected to the original URL.

> Unconsidered Requirements:
>
> - Users can optionally specify a custom alias.
> - Users can optionally specify an expiration date.
# Non-Functional Requirements
## Non-Functional Requirements

1. High Scalability: Load balancer, DB replication, Redis cluster.
2. High Availability: 99.99%
3. Low Latency: Hybrid cache, DB replication.
4. Uniqueness Guarantee: Hash keys.
2. High Availability: Load balancer, Auto scaling, DB replication, Redis cluster, Health check
3. Low Latency: Hybrid cache, DB replication, Message Queue
4. Uniqueness Guarantee: Base58, ID Generator

# Technology Stack
## Technology Stack

- Java 21
- Kotlin 1.9.25
- Spring Boot 3.4.1
- PostgreSQL 17
- Redis 8
- EhCache
- Caffeine
- Docker
- JUnit 5

# Installation and Execution
## Installation and Execution

Run using Docker:

```shell
make run
```

# API Specifications
## API Specifications

### 1. URL Shortening
Shortens a long URL.

```shell
POST /api/v1/shorten
@@ -54,9 +48,9 @@ POST /api/v1/shorten
Request Body:
```json
{
"longUrl": "https://example.com/long-url"
"longUrl": "https://example.com/long-url",
"userId": 1
}

```

Response:
@@ -70,6 +64,7 @@ Status:
- `400` Bad Request: Invalid URL input.

### 2. URL Redirection
Redirects a shortened URL to the original URL.

```shell
GET /{hash}
@@ -82,155 +77,70 @@ HTTP/1.1 301 Moved Permanently
Location: https://example.com/long-url
```

# Database Schema

| Field Name | Type | Description |
|--------------| --- |--------------------|
| id | SERIAL | Primary key |
| long_url | VARCHAR | Original URL |
| short_url | VARCHAR | Shortened URL |
| hash | VARCHAR | Hash for short URL |
| createdDate | TIMESTAMP | Creation date |
| modifiedDate | TIMESTAMP | Modification date |

# Testing

Run tests using JUnit 5:
### 3. Find URLs by userId
Finds the shortened and original URLs that belong to a user.

```shell
make test
GET /api/v1/users/{userId}/urls
```
Response:
```json
[
  {
    "id": 1,
    "longUrl": "http://amazon.com",
    "shortUrl": "http://localhost:8080/3gXe",
    "hash": "3gXe",
    "userId": 1,
    "createdAt": "2025-01-06T00:14:12.429006",
    "updatedAt": "2025-01-06T00:14:12.429006"
  }
]
```

# High-Level Design

![overview.png](src/main/resources/static/overview.png)

# Details

<details>
<summary>Response Code</summary>

🟢 **Status Code `301`** 🟢

- Prevents traffic loss through browser caching.
- Generally used for permanent URL redirection.
- Adjust `Cache-Control` and `Expires` headers when changing URLs.

### 301 Moved Permanently

- Permanently redirects the URL.
- Internally utilizes browser caching.
- Advantages:
  - SEO-friendly: Prompts search engines to update the indexed URL.
  - Reduces server load: Browsers cache the redirect, so repeat visits do not reach the server.
- Disadvantages:
  - Difficult to change: Permanent setting can complicate updates.
  - Caching: Requires additional work to update redirection.

### 302 Found

- Temporarily redirects the URL.
- Advantages:
  - Temporary redirection: Suitable for event or promotion pages.
  - No impact on search engines: Original URL remains indexed.
- Disadvantages:
  - Higher server load: The redirect request reaches the server on every visit.

</details>
<details>
<summary>Unique URL</summary>

🟢 **Auto-Generated ID + Base58** 🟢

- Uses a 58-character alphabet drawn from uppercase letters, lowercase letters, and digits.
- Easy for humans to read.
- Allows generation of diverse URLs (e.g., 6 characters can create 38 billion URLs).
- Uses auto-generated database keys.

### 1. Base58

Uses 58 combinations of uppercase letters, lowercase letters, and digits (excluding 0, O, l, I).

- Advantages:
  - Prevents confusion: Easy for humans to read, reducing errors (e.g., avoiding 0/O/l/I confusion).
  - Short URLs: Nearly as compact as Base62 for the same ID.
- Disadvantages:
  - Smaller character set: Fewer combinations than Base62.
  - Limited special characters.
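
The Base58 scheme described above can be sketched as follows. This is a minimal illustration, not the project's actual encoder: the class name `Base58`, its methods, and the choice of the Bitcoin-style alphabet ordering are assumptions made for the example. It converts a non-negative numeric ID (such as an auto-generated database key) into a short string and back.

```java
/** Illustrative Base58 codec for numeric IDs (hypothetical helper, not taken from the project). */
final class Base58 {
    // Assumed alphabet: digits and letters with 0, O, I and l removed (58 characters).
    private static final String ALPHABET =
            "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz";
    private static final int BASE = ALPHABET.length();

    private Base58() {}

    /** Encodes a non-negative ID by repeated division by 58. */
    static String encode(long id) {
        if (id < 0) {
            throw new IllegalArgumentException("id must be non-negative");
        }
        if (id == 0) {
            return String.valueOf(ALPHABET.charAt(0));
        }
        StringBuilder sb = new StringBuilder();
        while (id > 0) {
            sb.append(ALPHABET.charAt((int) (id % BASE))); // least significant digit first
            id /= BASE;
        }
        return sb.reverse().toString();
    }

    /** Decodes a Base58 string back to the numeric ID; rejects characters outside the alphabet. */
    static long decode(String hash) {
        long id = 0;
        for (char c : hash.toCharArray()) {
            int digit = ALPHABET.indexOf(c);
            if (digit < 0) {
                throw new IllegalArgumentException("invalid Base58 character: " + c);
            }
            id = Math.addExact(Math.multiplyExact(id, BASE), digit); // fails loudly on overflow
        }
        return id;
    }
}
```

With this alphabet, `Base58.encode(58)` yields `"21"`, and every ID below 58^6 = 38,068,692,544 fits in at most six characters, which matches the "38 billion" figure quoted above.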

### 2. Base62

Uses 62 combinations of uppercase letters, lowercase letters, and digits.

- Advantages:
  - Larger combinations: Utilizes all 62 characters.
  - Short URLs: Efficient and widely compatible.
  - Excludes special characters: Suitable for various systems.
- Disadvantages:
  - Similar characters may cause confusion (e.g., 0/O/l/I).

### 3. Hash

- Advantages:
  - Guarantees consistent output length.
  - Low collision probability.
  - Produces the same result for identical inputs.
- Disadvantages:
  - Potential collisions.
  - Long URLs may require trimming hash values.

### 4. UUID
## Database Schema
URL Table

- Advantages:
  - High uniqueness.
  - Extremely low collision probability.
- Disadvantages:
  - Long URLs.
  - Hard to read.
| Field Name | Type | Description |
|------------|-----------|------------------------|
| id | SERIAL | Primary key |
| long_url | VARCHAR | Original URL |
| short_url | VARCHAR | Shortened URL |
| hash | VARCHAR | Hash for short URL |
| userId | BIGINT | User Identity |
| createdAt | TIMESTAMP | Creation date time |
| updatedAt | TIMESTAMP | Modification date time |

</details>
<details>
<summary>Database</summary>
## Testing

🟢 **DB Replication** 🟢
Run tests using JUnit 5:

- Improved read performance: Master for writes, replicas for reads.
- Scalability and availability: Backup in case of failures.
- Load distribution: Spreads read operations across replicas while the master handles writes.
```shell
make test
```

</details>
<details>
<summary>Cache</summary>

🟢 **Hybrid Cache** 🟢
Uses both local and remote cache.
* Local Cache: Caffeine
* Remote Cache: Redis (Lettuce)

- Low latency: Local cache is faster than remote.
- Prevents cache stampede: Minimizes backend load when cache is missing.
- Cache warm-up: Updates local cache during server startup.

### Lettuce
* pros:
  * Asynchronous and non-blocking for high-concurrency environments.
  * Thread-safe, supports multi-threaded applications.
  * Built-in Redis cluster and sharding support.
  * Supports reactive programming.
* cons:
  * More complex to use (requires understanding of async programming).
  * May use more memory due to async I/O model.

### Jedis
* pros:
  * Simple and easy to use (synchronous).
  * Low memory overhead.
  * Ideal for small-scale or single-instance Redis setups.
* cons:
  * Not thread-safe by default (requires separate connections per thread).
  * Limited or more complex cluster and sharding support.
  * Synchronous, which can be less efficient for high-concurrency use cases.

</details>

# Performance Test
## Current Architecture
![architecture.png](docs/images/architecture.png)
High-level design of current architecture

## Future Architecture
![advanced_architecture.png](docs/images/advanced_architecture.png)
High-level design of future architecture

- [Response Code](docs/ResponseCode.md)
- [Unique URL](docs/UniqueURL.md)
- [Database](docs/Database.md)
- [Cache](docs/Cache.md)
- [ID Generator](docs/IDGenerator.md)
- [Message Queue](docs/MessageQueue.md)
- [Rate Limiter](docs/RateLimiter.md)

## Performance Test
Test Machine Specifications
- Processor: Apple M3 Pro
- Cores: 11 cores
- Memory: 36GB

Load Test APIs
1. [URL Shortening API](docs/test/ShortenUrlPerformanceTest): generates a shortened version of a provided URL
2. [Short URL resolution API](docs/test/ResolveUrlPerformanceTest.md): resolves a shortened URL and redirects the user to the original URL
36 changes: 36 additions & 0 deletions docs/Cache.md
@@ -0,0 +1,36 @@
# Cache


**Hybrid Cache**

Uses both local and remote cache.
* Local Cache: Caffeine
* Remote Cache: Redis (Lettuce)

- Low latency: Local cache is faster than remote.
- Memory management: Bounds memory use with an LRU eviction policy.
- Prevents cache stampede: Minimizes backend load on a cache miss (local cache -> remote cache -> db).
- Clustering: Scales under high traffic and replicates data across nodes.
- Cache warm-up: Updates local cache during server startup.
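
The lookup order above (local cache -> remote cache -> db) can be sketched as follows. This is a simplified illustration under stated assumptions, not the project's implementation: an access-ordered `LinkedHashMap` stands in for Caffeine, a plain `Map` stands in for Redis, and the database is a lookup function. The class name `HybridCacheSketch` and its method `resolve` are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;

/** Two-tier read-through lookup: local LRU map, then remote map, then database. */
final class HybridCacheSketch {
    private final Map<String, String> local;                    // stand-in for Caffeine
    private final Map<String, String> remote;                   // stand-in for Redis
    private final Function<String, Optional<String>> database;  // stand-in for the URL table

    HybridCacheSketch(int localCapacity,
                      Map<String, String> remote,
                      Function<String, Optional<String>> database) {
        // Access-ordered LinkedHashMap that evicts the least recently used entry when full.
        this.local = new LinkedHashMap<>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
                return size() > localCapacity;
            }
        };
        this.remote = remote;
        this.database = database;
    }

    /** Resolves a hash to its long URL, filling the faster tiers on the way back. */
    synchronized Optional<String> resolve(String hash) {
        String url = local.get(hash);
        if (url != null) {
            return Optional.of(url);            // local hit
        }
        url = remote.get(hash);
        if (url == null) {
            Optional<String> fromDb = database.apply(hash);
            if (fromDb.isEmpty()) {
                return Optional.empty();        // unknown hash: nothing to cache
            }
            url = fromDb.get();
            remote.put(hash, url);              // fill the remote tier
        }
        local.put(hash, url);                   // fill the local tier
        return Optional.of(url);
    }
}
```

The `synchronized` keyword is a deliberately crude way to keep concurrent misses within one instance from all reaching the database at once; a real deployment would more likely rely on the cache library's own loading mechanism.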

### Redis
* pros:
  * Rich data structures: string, list, sorted set, etc.
  * Persistence: RDB snapshots and AOF logs for backup and recovery.
  * Replication and clustering: scalability and fault tolerance.
  * Atomic operations.
  * High performance.
* cons:
  * Single-threaded command execution: can become a bottleneck.

### Memcached
* pros:
  * Simplicity: easy to set up, simple key-value model.
  * Performance: fast get/set operations.
  * Multi-threaded: makes use of multiple cores.
  * Low memory overhead.
* cons:
  * No persistence: no backup, data is lost on restart or failure.
  * Limited data structures.
  * No built-in clustering.
  * Only LRU eviction.
8 changes: 8 additions & 0 deletions docs/Database.md
@@ -0,0 +1,8 @@
# Database

**DB Replication**

- Improved read performance: Master for writes, replicas for reads.
- Scalability and availability: Replicas act as backups in case of failure and allow automatic recovery.
- Load distribution: Spreads read operations across replicas while the master handles writes.
- Real-time replication of transaction logs.
24 changes: 24 additions & 0 deletions docs/IDGenerator.md
@@ -0,0 +1,24 @@
# ID Generator

Creates unique, short IDs for URLs.

### Redis `INCR`
The Redis `INCR` command generates auto-incrementing IDs.
* pros:
  * Low latency: fast because data is held in memory.
  * Atomicity: Redis executes commands on a single thread, so no two callers receive the same ID.
  * Scalability: clustering and sharding, master and replica.
* cons:
  * SPOF: Data can be lost if the Redis server crashes.
  * Replication lag: delays in replication can cause ID synchronization issues.

### Database `Auto Increment ID`
Uses the database's auto-increment feature (`AUTO_INCREMENT` in MySQL; `SERIAL` or identity columns in PostgreSQL) to generate unique IDs automatically.
* pros:
  * Built-in functionality: minimal setup.
  * Durability: IDs are managed by the database system.
  * Failover: supported through replication.
* cons:
  * Performance: slower than Redis.
  * Scalability: inefficient in a distributed database.
  * Locking issues: concurrency problems due to transaction locking.
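
Both options above offer the same contract: each call returns a new number that is never handed out twice. A minimal sketch of that contract follows, using an in-process `AtomicLong` purely as a stand-in for Redis `INCR` or a database sequence. The names `IdGenerator` and `InMemoryIdGenerator` are hypothetical and not taken from the project.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Contract shared by the ID sources discussed above (hypothetical interface). */
interface IdGenerator {
    long nextId();
}

/** In-memory stand-in: atomically increments a counter, as INCR does on a Redis key. */
final class InMemoryIdGenerator implements IdGenerator {
    private final AtomicLong counter;

    InMemoryIdGenerator(long start) {
        this.counter = new AtomicLong(start);
    }

    @Override
    public long nextId() {
        // incrementAndGet is atomic, so concurrent callers never receive the same value.
        return counter.incrementAndGet();
    }
}
```

Like `INCR` on a missing key, a generator started at 0 returns 1 on the first call. Unlike Redis or a database, this counter is lost on restart and is not shared between application instances, so it only illustrates the interface.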