feat: Add geospatial-python-urban-analysis-with-postgis project #107

Open
wants to merge 13 commits into
base: main
@@ -0,0 +1,34 @@
# Base image with Python
FROM python:3.12-slim AS builder

# Set the working directory
WORKDIR /app

# Install build dependencies
RUN apt-get update && apt-get install -y \
    libgdal-dev \
    && rm -rf /var/lib/apt/lists/*

# Create non-root user
RUN groupadd -r appuser && useradd -r -g appuser appuser

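FROM python:3.12-slim

# Set the working directory
WORKDIR /app

# Copy built dependencies
COPY --from=builder /usr/lib/libgdal.so /usr/lib/libgdal.so
# Reuse the non-root user and group created in the builder stage
COPY --from=builder /etc/passwd /etc/passwd
COPY --from=builder /etc/group /etc/group

# Install Python dependencies (as root, before dropping privileges)
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy the script
COPY transport.py .

# Switch to the non-root user for runtime
USER appuser

# Add healthcheck (requires curl in the image and an HTTP /health endpoint on port 8000)
HEALTHCHECK --interval=30s --timeout=30s --start-period=5s --retries=3 \
    CMD curl -f http://localhost:8000/health || exit 1

# Run the script when the container starts
CMD ["python", "transport.py"]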
139 changes: 139 additions & 0 deletions examples/geospatial-python-urban-analysis-with-postgis/README.md
@@ -0,0 +1,139 @@
# 🌍 Geospatial Urban Analysis Project

## πŸ“Œ Overview

This project focuses on **geospatial data analysis** for urban environments, particularly analyzing **pedestrian zones, transportation networks, census data, and geographic boundaries**. The dataset includes **shapefiles, GeoJSON, Parquet, and raster files**, allowing advanced **spatial processing and visualization**.

The project uses **PostgreSQL with PostGIS**, **Docker**, and **GeoPandas**, enabling **efficient spatial queries, ETL pipelines, and geospatial machine learning models**.

### ✨ **Key Features**

- 🏙 **Urban Infrastructure Analysis**: Analyzes bike paths, subway entrances, and school locations.
- 📊 **Geospatial Data Processing**: Supports various spatial formats (Shapefile, GeoJSON, Parquet, Raster).
- 🔄 **ETL Pipelines**: Extract, transform, and load urban data into **PostGIS**.
- 🤖 **Geospatial Machine Learning**: Clustering models to optimize urban planning decisions.
- 🗺 **Interactive Mapping**: Generates visualizations using **Folium and Matplotlib**.

---

## 🛠 **Requirements**

Before running the project, ensure you have the following dependencies installed:

### 💻 **System Requirements**

- 🐳 **Docker** (for PostgreSQL with PostGIS)
- 🐍 **Python 3.8+**

### 📦 **Python Dependencies**

All required Python libraries are listed in `requirements.txt`. Install them using:

```sh
pip install -r requirements.txt
```

Main dependencies:

- 🌍 **GeoPandas**: Geospatial data processing.
- πŸ—„ **PostgreSQL & PostGIS**: Geospatial database support.
- πŸ“ˆ **Matplotlib & Folium**: Data visualization.
- πŸ€– **Scikit-learn**: Clustering and machine learning models.

---

## 🚀 **Setup & Installation**

### 📂 **1. Clone the Repository**

```sh
git clone git@github.com:nanlabs/backend-reference.git
cd backend-reference/examples/geospatial-python-urban-analysis-with-postgis
```

### 🏗 **2. Set Up a Virtual Environment**

Create and activate a Python virtual environment:

```sh
python -m venv env
source env/bin/activate # On macOS/Linux
env\Scripts\activate # On Windows
```

Once activated, install dependencies:

```sh
pip install -r requirements.txt
```

### 🐳 **3. Set Up Docker with PostgreSQL and PostGIS**

Ensure that **Docker** is installed and running. Then, start the database with:

```sh
docker-compose up -d
```

This will:

- 🛢 Start a **PostgreSQL database** with **PostGIS** extensions enabled.
- 📌 Create the necessary **database schema** for storing geospatial data.
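
Once the container is running, spatial queries can be sent to PostGIS from Python. The sketch below only builds a parameterised SQL string. It assumes a hypothetical `bus_stops` table with a `geom` column stored in EPSG:4326, neither of which is confirmed by this README. `ST_DWithin`, `ST_SetSRID`, and `ST_MakePoint` are standard PostGIS functions, and the `%(name)s` placeholders follow the psycopg2 parameter style.

```python
def stops_within_radius_query(table="bus_stops", geom_column="geom"):
    """Return SQL selecting rows within `radius_m` metres of a lon/lat point.

    `table` and `geom_column` are hypothetical names. They are interpolated
    directly, so pass only trusted constants, never user input.
    """
    return (
        f"SELECT * FROM {table} "
        f"WHERE ST_DWithin({geom_column}::geography, "
        "ST_SetSRID(ST_MakePoint(%(lon)s, %(lat)s), 4326)::geography, "
        "%(radius_m)s)"
    )


# Values are passed separately so the database driver can escape them.
params = {"lon": -58.3816, "lat": -34.6037, "radius_m": 500}
query = stops_within_radius_query()
print(query)
```

With a driver such as psycopg2, the query would be executed as `cursor.execute(query, params)`.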

## 📖 **Working with Notebooks**

To start the analysis and visualization:

```sh
jupyter notebook
```

Then, open one of the notebooks in the `notebooks/` directory.

The notebooks cover:

- 🌍 **Geospatial Data Exploration**: Loading and visualizing spatial datasets.
- πŸš‡ **Urban Accessibility Analysis**: Assessing accessibility of public transport.
- πŸ€– **Clustering and Machine Learning**: Applying spatial clustering algorithms.

### 📌 **Pipelines Overview**

The project includes **several geospatial data processing pipelines**, located in `src/pipelines/`:

- 🚌 **`bus_stop_analysis.py`**: Analyzes bus stops and their spatial distribution.
- 📍 **`optimal_stop_pipeline.py`**: Computes the best locations for public transportation stops.
- 🗺 **`shapefile_to_raster.py`**: Converts vector-based shapefiles into raster format for GIS applications.
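
To make the vector-to-raster idea concrete, the following standard-library sketch bins point features into a regular grid and counts how many fall in each cell. It is not the implementation of `shapefile_to_raster.py`. The bounds, cell size, and sample points are assumptions chosen for illustration, and coordinates are taken to be in a projected CRS with metre units.

```python
import math


def rasterize_points(points, bounds, cell_size):
    """Count points per grid cell.

    points: iterable of (x, y) in projected units (e.g. metres)
    bounds: (min_x, min_y, max_x, max_y) of the raster extent
    Returns a list of rows; row 0 is the northernmost row.
    """
    min_x, min_y, max_x, max_y = bounds
    n_cols = math.ceil((max_x - min_x) / cell_size)
    n_rows = math.ceil((max_y - min_y) / cell_size)
    grid = [[0] * n_cols for _ in range(n_rows)]
    for x, y in points:
        if not (min_x <= x < max_x and min_y <= y < max_y):
            continue  # skip points outside the raster extent
        col = int((x - min_x) // cell_size)
        row_from_bottom = int((y - min_y) // cell_size)
        row = n_rows - 1 - row_from_bottom  # flip so north is at the top
        grid[row][col] += 1
    return grid


# Hypothetical sample points; the last one lies outside the extent.
sample_points = [(50, 50), (150, 150), (160, 170), (250, 20), (400, 50)]
grid = rasterize_points(sample_points, bounds=(0, 0, 300, 200), cell_size=100)
for row in grid:
    print(row)
```

Each inner list is one raster row, so the result can be written out as a single-band image by any raster library.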

### ⚙️ **Running Pipelines**

To execute a pipeline, use the following command:

```sh
PYTHONPATH=. python -m src.pipelines.bus_stop_analysis
```

Replace `bus_stop_analysis` with the pipeline you want to run.

Each pipeline processes geospatial data **efficiently**, ensuring the data is ready for **urban planning and visualization**.

---

## 🏗 **Project Structure**

```sh
.
├── Dockerfile             # 🐳 Docker configuration for Python environment
├── docker-compose.yml     # 🛢 PostgreSQL + PostGIS setup
├── requirements.txt       # 📦 Python dependencies
├── config.py              # ⚙️ Configuration settings
├── data/                  # 🌍 Raw geospatial datasets
├── notebooks/             # 📖 Jupyter Notebooks for geospatial analysis
├── scripts/               # 🔄 Data processing scripts
├── src/                   # 🏗 Source code
│   ├── database/          # 🗄 Database connection and queries
│   ├── etl/               # 🔄 ETL pipeline for spatial data
│   ├── ml/                # 🤖 Machine learning models for clustering
│   ├── pipelines/         # 📌 Spatial data processing workflows
│   ├── visualization/     # 🗺 Map and data visualization modules
```

---
@@ -0,0 +1,14 @@
from pathlib import Path

# Project root: the directory containing this config file
PROJECT_ROOT = Path(__file__).resolve().parent

# Population density raster (GeoTIFF)
POPULATION_DENSITY_RASTER = PROJECT_ROOT / "data/census/caba/census.tif"

# API URLs
BUS_STOPS_URL = "https://cdn.buenosaires.gob.ar/datosabiertos/datasets/transporte-y-obras-publicas/colectivos-paradas/paradas-de-colectivo.geojson"

DISTRICTS_URL = "https://cdn.buenosaires.gob.ar/datosabiertos/datasets/ministerio-de-educacion/comunas/comunas.geojson"
# Standard coordinate reference system (CRS)
EPSG_TARGET = 3857
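
EPSG:3857 is the Web Mercator projection used by most web map tiles. For reference, the forward transform from longitude/latitude to Web Mercator metres can be sketched with the standard library as below. In practice a library call such as GeoPandas `GeoDataFrame.to_crs(epsg=3857)` would do this. The function name is illustrative and not part of this project.

```python
import math

WEB_MERCATOR_RADIUS_M = 6378137.0  # WGS 84 semi-major axis, used by EPSG:3857


def lonlat_to_web_mercator(lon_deg, lat_deg):
    """Project lon/lat in degrees to EPSG:3857 x/y in metres."""
    x = WEB_MERCATOR_RADIUS_M * math.radians(lon_deg)
    y = WEB_MERCATOR_RADIUS_M * math.log(
        math.tan(math.pi / 4 + math.radians(lat_deg) / 2)
    )
    return x, y


# Hypothetical sample point in central Buenos Aires.
x, y = lonlat_to_web_mercator(-58.3816, -34.6037)
print(f"x={x:.1f} m, y={y:.1f} m")
```

Distances measured in Web Mercator are stretched by roughly 1/cos(latitude), which is about 1.2 at Buenos Aires latitudes, so analyses that depend on accurate metres may be better served by a local projected CRS such as the POSGAR 94 / Argentina 3 definition shipped with the data.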
Binary file not shown.
@@ -0,0 +1 @@
PROJCS["POSGAR_94_Argentina_3",GEOGCS["GCS_POSGAR 94",DATUM["D_POSGAR_1994",SPHEROID["WGS_1984",6378137,298.257223563]],PRIMEM["Greenwich",0],UNIT["Degree",0.017453292519943295]],PROJECTION["Transverse_Mercator"],PARAMETER["latitude_of_origin",-90],PARAMETER["central_meridian",-66],PARAMETER["scale_factor",1],PARAMETER["false_easting",3500000],PARAMETER["false_northing",0],UNIT["Meter",1]]
@@ -0,0 +1 @@
PROJCS["POSGAR 94 / Argentina 3",GEOGCS["POSGAR 94",DATUM["Posiciones_Geodesicas_Argentinas_1994",SPHEROID["WGS 84",6378137,298.257223563,AUTHORITY["EPSG","7030"]],TOWGS84[0,0,0,0,0,0,0],AUTHORITY["EPSG","6694"]],PRIMEM["Greenwich",0,AUTHORITY["EPSG","8901"]],UNIT["degree",0.0174532925199433,AUTHORITY["EPSG","9122"]],AUTHORITY["EPSG","4694"]],PROJECTION["Transverse_Mercator"],PARAMETER["latitude_of_origin",-90],PARAMETER["central_meridian",-66],PARAMETER["scale_factor",1],PARAMETER["false_easting",3500000],PARAMETER["false_northing",0],UNIT["metre",1,AUTHORITY["EPSG","9001"]],AUTHORITY["EPSG","22183"]]
Binary file not shown.
Binary file not shown.
Binary file not shown.