first commit
Jorrit Sandbrink committed Dec 4, 2023
0 parents commit 7f408a7
Showing 15 changed files with 111 additions and 0 deletions.
1 change: 1 addition & 0 deletions .gitignore
@@ -0,0 +1 @@
json2parquet.py
72 changes: 72 additions & 0 deletions README.md
@@ -0,0 +1,72 @@
# Apache Superset – DuckDB Docker image
Follow the steps below to set up a local Apache Superset instance with DuckDB support using Docker.

## Build the image
```Shell
docker build -t jorritsandbrink/superset-duckdb docker
```

## Run the container
```Shell
docker run -d -p 8080:8088 \
-e "SUPERSET_SECRET_KEY=your_secret_key" \
--mount type=bind,source=/$(pwd)/data,target=/data \
--name superset-duckdb \
jorritsandbrink/superset-duckdb
```
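The `SUPERSET_SECRET_KEY` value in the command above is a placeholder. A minimal sketch for generating a random value is shown here; `generate_secret_key` is a hypothetical helper name, and the sketch assumes either `openssl` or `base64` is available on the host.

```shell
# Sketch: generate a random value for SUPERSET_SECRET_KEY.
# `generate_secret_key` is a hypothetical helper, not part of this repository.
# Assumption: `openssl` is installed; otherwise fall back to /dev/urandom + base64.
generate_secret_key() {
  if command -v openssl >/dev/null 2>&1; then
    openssl rand -base64 42
  else
    head -c 42 /dev/urandom | base64
  fi
}

# Store the key in a variable so it can be passed to `docker run`.
SUPERSET_SECRET_KEY="$(generate_secret_key)"
export SUPERSET_SECRET_KEY
```

The variable can then replace the placeholder, for example `-e "SUPERSET_SECRET_KEY=$SUPERSET_SECRET_KEY"`.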
> Note: the local `data` folder is mounted at `/data` in the container to make the data files accessible from within the container.
## Set up Superset
```Shell
./docker/setup.sh
```
The script creates an admin user, upgrades the Superset metadata database, initializes roles, and configures a DuckDB database connection.

## Navigate to UI
Go to http://localhost:8080/login/ and log in with `username=admin` and `password=admin`.
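
If the login page is not reachable yet, the container may still be starting. The following is a rough sketch of a retry helper for waiting until a check succeeds; `wait_for` is a hypothetical name, not something this repository provides.

```shell
# Sketch: retry a command until it succeeds or the attempts run out.
# `wait_for` is a hypothetical helper, not part of this repository.
# Usage: wait_for <max_attempts> <delay_seconds> <command...>
wait_for() {
  max_attempts="$1"
  delay="$2"
  shift 2
  attempt=1
  while [ "$attempt" -le "$max_attempts" ]; do
    # Run the check quietly; stop as soon as it succeeds.
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    attempt=$((attempt + 1))
    sleep "$delay"
  done
  # Every attempt failed.
  return 1
}
```

Assuming `curl` is installed and Superset's `/health` endpoint is reachable on the mapped port, a call could look like `wait_for 30 2 curl -fsS http://localhost:8080/health`.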

## Check database connection
Go to _Database Connections_ (http://localhost:8080/databaseview/list/) to verify that the database connection has been created:

![Overview of database connections in Superset UI](database-connection-overview.png)

Click the _Edit_ button to see the connection details:

<img src='duckdb-database-connection.png' alt='DuckDB database connection configuration in Superset UI' width='300'/>

SQLAlchemy URI:
```
duckdb:///:memory:
```

Click `TEST CONNECTION` and make sure you see this popup message:

![Popup message indicating a good connection](connection-looks-good.png)
# Querying files from Superset using DuckDB
Go to _SQL Lab_ (http://localhost:8080/sqllab/) to query `Parquet`, `JSON`, or `CSV` files as follows:

![Apache Superset DuckDB SQL Lab](sql-lab-duckdb-parquet.png)

The queries use glob syntax to read multiple files, as documented at https://duckdb.org/docs/data/multiple_files/overview.html.
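
To illustrate what a `*.json` pattern selects, here is a small sketch using shell globbing in a throwaway directory. The file names mirror `data/json_table` from this commit, and `notes.txt` is an invented non-matching file. Shell globbing is used here only as an analogy; DuckDB applies its own glob rules, described on the page linked above.

```shell
# Sketch: show which files a `*.json` glob selects, using a temporary directory.
# File names mirror data/json_table; notes.txt is an invented non-matching file.
demo_dir="$(mktemp -d)"
touch "$demo_dir/multi_row_1.json" "$demo_dir/multi_row_2.json" \
      "$demo_dir/single_row.json" "$demo_dir/notes.txt"

# Count the files the glob expands to; notes.txt is skipped.
match_count=0
for f in "$demo_dir"/*.json; do
  if [ -e "$f" ]; then
    match_count=$((match_count + 1))
  fi
done
echo "matched $match_count files"

# Clean up the temporary directory.
rm -rf "$demo_dir"
```

In the same way, a query over `'/data/json_table/*.json'` reads every `.json` file in that folder as one table.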

## Parquet
```sql
SELECT *
FROM '/data/parquet_table/*.parquet'
```

## JSON
```sql
SELECT *
FROM '/data/json_table/*.json'
```

## CSV
```sql
SELECT *
FROM '/data/csv_table/*.csv'
```

# References
- [Portable Data Stack](https://github.com/cnstlungu/portable-data-stack-dagster/tree/main)
- [Preparing Apache Superset to working with Delta Lake, DuckDB, Prophet, Python LDAP Active Directory, Jinja and MS-SQL driver using Ubuntu](https://medium.com/@syarifz.id/preparing-apache-superset-to-working-with-delta-lake-duckdb-prophet-python-ldap-active-d9da7a9a68c3)
- [Create DuckDB Connection and Dataset using Delta Lake Parquet File in Apache Superset](https://medium.com/@syarifz.id/create-duckdb-connection-and-create-dataset-using-parquet-file-in-apache-superset-8765e5772342)
3 changes: 3 additions & 0 deletions data/csv_table/multi_row.csv
@@ -0,0 +1,3 @@
c1,c2,c3
bar,False,2
baz,True,10
2 changes: 2 additions & 0 deletions data/csv_table/single_row.csv
@@ -0,0 +1,2 @@
c1,c2,c3
foo,True,5
4 changes: 4 additions & 0 deletions data/json_table/multi_row_1.json
@@ -0,0 +1,4 @@
[
{"c1": "bar", "c2": false, "c3": 2},
{"c1": "baz", "c2": true, "c3": 10}
]
2 changes: 2 additions & 0 deletions data/json_table/multi_row_2.json
@@ -0,0 +1,2 @@
{"c1": "bar", "c2": false, "c3": 2}
{"c1": "baz", "c2": true, "c3": 10}
1 change: 1 addition & 0 deletions data/json_table/single_row.json
@@ -0,0 +1 @@
{"c1": "foo", "c2": true, "c3": 5}
Binary file added data/parquet_table/multi_row.parquet
Binary file not shown.
Binary file added data/parquet_table/single_row.parquet
Binary file not shown.
8 changes: 8 additions & 0 deletions docker/Dockerfile
@@ -0,0 +1,8 @@
FROM apache/superset:3.1.0rc1-py310

USER root

RUN pip install duckdb==0.9.2
RUN pip install duckdb-engine==0.9.2

USER superset
18 changes: 18 additions & 0 deletions docker/setup.sh
@@ -0,0 +1,18 @@
# Create admin user.
docker exec -it superset-duckdb superset fab create-admin \
--username admin \
--firstname Superset \
--lastname Admin \
--email [email protected] \
--password admin

# Upgrade database to latest.
docker exec -it superset-duckdb superset db upgrade

# Setup roles.
docker exec -it superset-duckdb superset init

# Create database connection for DuckDB.
docker exec -it superset-duckdb superset set_database_uri \
-d DuckDB-memory \
-u duckdb:///:memory:
Binary file added docs/img/connection-looks-good.png
Binary file added docs/img/database-connection-overview.png
Binary file added docs/img/duckdb-database-connection.png
Binary file added docs/img/sql-lab-duckdb-parquet.png
