ClimdexDash

A Dash Leaflet-based dashboard for extreme climate indices

This project demonstrates the Strategy Design Pattern to handle multiple data sources (CSV, PostgreSQL, etc.) for retrieving and visualizing total precipitation time series data. The implementation ensures scalability, maintainability, and flexibility by decoupling data retrieval logic from the dashboard setup.

A demo webapp can be found here.

Overview

When a user clicks on the map, the dashboard retrieves total precipitation time series data for the selected location.
The data retrieval is handled through strategies, currently supporting:
- CSV-based source (mock data)
- PostgreSQL-based source (real data from a weather database)

Project Structure

./src/data/source.py contains the Strategy Pattern implementation for handling different data sources.
The dashboard is built using Dash (Plotly) for interactive data visualization.

Running the Dashboard

1. Clone the repository

git clone https://github.com/jojo0094/ClimdexDash.git
cd ClimdexDash

2. Install the dependencies
This project uses uv (a Python package and project manager, similar in spirit to Rust's Cargo) to speed up installation.

uv sync

or, for a fresh installation:

uv init
uv venv
uv sync

Data Source Strategies

1. Utilize ABCs to define the interface for data sources

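from abc import ABC, abstractmethod

import pandas as pd

class DataSource(ABC):
    @abstractmethod
    def get_precipitation_data(self, lat: float, lon: float) -> pd.DataFrame:
        """Return the total precipitation time series for the given location."""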
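To illustrate what the abstract base class enforces, here is a minimal sketch showing that Python refuses to instantiate a subclass that does not implement get_precipitation_data. To stay dependency-free, this sketch returns a list of (date, value) tuples rather than the pd.DataFrame used in the project. IncompleteSource, ConstantSource, and can_instantiate are hypothetical names used only for this example.

```python
from abc import ABC, abstractmethod


class DataSource(ABC):
    """Interface every data source strategy must satisfy."""

    @abstractmethod
    def get_precipitation_data(self, lat: float, lon: float) -> list[tuple[str, float]]:
        """Return (date, total precipitation) pairs for the given location."""


class IncompleteSource(DataSource):
    """Hypothetical subclass that forgets to implement the abstract method."""


class ConstantSource(DataSource):
    """Hypothetical subclass that returns a fixed two-day series."""

    def get_precipitation_data(self, lat: float, lon: float) -> list[tuple[str, float]]:
        return [("2020-01-01", 1.5), ("2020-01-02", 0.0)]


def can_instantiate(cls: type) -> bool:
    """Report whether cls can be instantiated without arguments."""
    try:
        cls()
    except TypeError:
        # Raised by ABC machinery when abstract methods are left unimplemented.
        return False
    return True
```

Because the failure happens at instantiation time, a source with a missing method is caught when the dashboard starts rather than on the first map click.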

2. Implement the concrete dataclasses for each data source

from dataclasses import dataclass

# @dataclass is required for __post_init__ to be called on construction.
@dataclass
class CSVSource(DataSource):

    def __post_init__(self):
        """Step that handles data retrieval from the CSV file(s)."""

    def get_precipitation_data(self, lat: float, lon: float) -> pd.DataFrame:
        ...

@dataclass
class PostgreSQLSource(DataSource):

    def __post_init__(self):
        """Step that handles data retrieval from the PostgreSQL database."""

    def get_precipitation_data(self, lat: float, lon: float) -> pd.DataFrame:
        ...
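As a rough illustration of how such a dataclass could load its data once in __post_init__ and then answer lookups from memory, here is a dependency-free sketch of a CSV-backed source. Everything specific in it is an assumption made for the example: the column names (lat, lon, date, precip_mm), the inline MOCK_CSV text standing in for a file on disk, the nearest-grid-point lookup, and the list-of-tuples return type in place of pd.DataFrame. It also omits the DataSource base class for brevity.

```python
import csv
import io
from dataclasses import dataclass, field

# Hypothetical mock data: one row per grid point and day.
MOCK_CSV = """lat,lon,date,precip_mm
-36.75,174.75,2020-01-01,1.2
-36.75,174.75,2020-01-02,0.0
-41.25,174.75,2020-01-01,5.4
-41.25,174.75,2020-01-02,3.1
"""


@dataclass
class CSVSource:
    """Sketch of a CSV-backed source; csv_text stands in for a file on disk."""

    csv_text: str = MOCK_CSV
    rows: list[dict] = field(init=False, default_factory=list)

    def __post_init__(self) -> None:
        # Parse once at construction so each map click only filters in memory.
        reader = csv.DictReader(io.StringIO(self.csv_text))
        self.rows = [
            {
                "lat": float(r["lat"]),
                "lon": float(r["lon"]),
                "date": r["date"],
                "precip_mm": float(r["precip_mm"]),
            }
            for r in reader
        ]

    def get_precipitation_data(self, lat: float, lon: float) -> list[tuple[str, float]]:
        # Snap the clicked location to the nearest grid point
        # (squared distance in degrees, good enough for a sketch).
        nearest = min(
            self.rows,
            key=lambda r: (r["lat"] - lat) ** 2 + (r["lon"] - lon) ** 2,
        )
        return [
            (r["date"], r["precip_mm"])
            for r in self.rows
            if (r["lat"], r["lon"]) == (nearest["lat"], nearest["lon"])
        ]
```

Calling CSVSource().get_precipitation_data(-41.0, 175.0) would snap to the grid point at (-41.25, 174.75) and return its two-day series.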

3. Use the data source in the dashboard setup

# Initialize the data source
data_source = CSVSource()

# Dashboard setup code 
.......

To switch to the PostgreSQL source, change only the data source initialization:

# Initialize the data source
data_source = PostgreSQLSource()

# Dashboard setup code
.......
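The swap works because the dashboard code depends only on the DataSource interface. The sketch below shows that idea with a hypothetical click handler, handle_map_click, which accepts any source and returns plot-ready x/y lists. MockCSVSource and MockPostgresSource are stand-ins invented for this example, and the list-of-tuples return type replaces pd.DataFrame to keep it dependency-free; none of this is the project's actual callback code.

```python
from abc import ABC, abstractmethod


class DataSource(ABC):
    @abstractmethod
    def get_precipitation_data(self, lat: float, lon: float) -> list[tuple[str, float]]:
        """Return (date, total precipitation) pairs for the given location."""


class MockCSVSource(DataSource):
    """Hypothetical stand-in for the CSV strategy."""

    def get_precipitation_data(self, lat: float, lon: float) -> list[tuple[str, float]]:
        return [("2020-01-01", 1.0), ("2020-01-02", 2.0)]


class MockPostgresSource(DataSource):
    """Hypothetical stand-in for the PostgreSQL strategy."""

    def get_precipitation_data(self, lat: float, lon: float) -> list[tuple[str, float]]:
        return [("2020-01-01", 10.0), ("2020-01-02", 20.0)]


def handle_map_click(source: DataSource, lat: float, lon: float) -> dict:
    """Hypothetical click handler: turn a series into plot-ready lists."""
    series = source.get_precipitation_data(lat, lon)
    return {
        "title": f"Total precipitation at ({lat:.2f}, {lon:.2f})",
        "x": [date for date, _ in series],
        "y": [value for _, value in series],
    }


# The handler is identical for both strategies; only the injected source changes.
csv_figure = handle_map_click(MockCSVSource(), -36.85, 174.76)
pg_figure = handle_map_click(MockPostgresSource(), -36.85, 174.76)
```

Passing the source in as an argument, rather than constructing it inside the handler, is what keeps the handler untouched when a new strategy is added.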

As shown above, once the data retrieval logic is decoupled from the dashboard setup, new data sources can be added without modifying the existing codebase. For example, adding a NetCDF source only requires a new dataclass that implements the DataSource interface; the dashboard setup stays untouched:

@dataclass
class NetCDFSource(DataSource):

    def __post_init__(self):
        """Step that handles data retrieval from the NetCDF file(s)."""

    def get_precipitation_data(self, lat: float, lon: float) -> pd.DataFrame:
        ...

4. Currently, only PostgresDataSource is implemented; the other sources are mocked for demonstration purposes.

5. Other considerations

- NetCDF Source: Loads the dataset into RAM at initialization (__post_init__) to provide faster analytical access.
- DuckDB Source: An embedded, file-based database that needs no separate server process and may speed up analytical queries compared with a client-server database.
- Distributed Computing: Exploring parallelization for large-scale climate data processing through BigQuery, Spark, etc.
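To give a feel for the embedded, file-based approach, here is a small sketch of a source that opens its database once in __post_init__ and answers lookups with a parameterized query. It deliberately uses the standard-library sqlite3 module as a stand-in for DuckDB, so it is not DuckDB code. The class name EmbeddedSQLSource, the precipitation table layout, the insert_rows helper, and the list-of-tuples return type are all assumptions made for this example.

```python
import sqlite3
from dataclasses import dataclass, field


@dataclass
class EmbeddedSQLSource:
    """Hypothetical embedded-database source; sqlite3 stands in for DuckDB."""

    db_path: str = ":memory:"
    conn: sqlite3.Connection = field(init=False, repr=False)

    def __post_init__(self) -> None:
        # Open the connection once; later lookups reuse it.
        self.conn = sqlite3.connect(self.db_path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS precipitation "
            "(lat REAL, lon REAL, date TEXT, precip_mm REAL)"
        )

    def insert_rows(self, rows: list[tuple[float, float, str, float]]) -> None:
        # Demo-only helper: seed the table with a few records.
        with self.conn:
            self.conn.executemany(
                "INSERT INTO precipitation VALUES (?, ?, ?, ?)", rows
            )

    def get_precipitation_data(self, lat: float, lon: float) -> list[tuple[str, float]]:
        # Parameterized query; exact-match lookup keeps the sketch short.
        cur = self.conn.execute(
            "SELECT date, precip_mm FROM precipitation "
            "WHERE lat = ? AND lon = ? ORDER BY date",
            (lat, lon),
        )
        return cur.fetchall()


source = EmbeddedSQLSource()
source.insert_rows(
    [
        (-36.75, 174.75, "2020-01-02", 0.0),
        (-36.75, 174.75, "2020-01-01", 1.2),
        (-41.25, 174.75, "2020-01-01", 5.4),
    ]
)
series = source.get_precipitation_data(-36.75, 174.75)
```

A real implementation would likely match the clicked location to the nearest grid point instead of requiring exact coordinates.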

Key Features

✅ Strategy Design Pattern: Allows seamless switching between different data sources.
✅ Decoupled Logic: Enhances maintainability and reusability of the codebase.
✅ Extensible Framework: Easily add more data sources without modifying existing logic.
✅ Interactive Visualization: Dash + Plotly integration for extreme climate indices exploration.

Notes

🚀 The database is not included in this repository. You can set up your own using the SQL scripts provided in the ReanalysisIngestion repository.

🖥️ This project runs on a low-capacity remote compute unit, so the dataset is currently limited to 1-5 years.
⏳ Loading is still slow, but in-memory processing (NetCDF/DuckDB) should significantly improve performance.
🔧 Type hints: I plan to refine these further.

Future Plans

✅ Implement NetCDF and DuckDB data sources for enhanced performance.
✅ Improve distributed computing capabilities for large-scale climate data processing.
✅ Refine type hinting and optimizations for better readability and robustness.

Stay tuned for upcoming updates! 🚀
