Last updated 2021-03-03 by Asheesh Laroia
The product team wants a way to query more data.
We'll install Metabase and give it access to a collection of SQL VIEWs that remove sensitive information from our database tables. Engineering will maintain these SQL VIEWs.
Metabase will read from a Postgres database replica to avoid excessive load on the primary database. It will need a password to access Postgres.
We'll deploy Metabase using the Aptible Docker image deploy strategy along with an Aptible public endpoint. This is similar to the Get Your Refund app except that it uses Docker image deploy rather than git-based deploy.
Metabase users will be manually created by an admin, and then log in with their GSuite accounts.
In an Aptible environment like vita-min-staging, we will create:
- A replica of the main app database called vita-min-staging-replica
- A database called vita-min-staging-metabase, in which Metabase will store its own configuration
- An app called vita-min-staging-metabase
Run each step in its own terminal tab so it's easy to copy & paste from the output of one into another step.
aptible db:replicate vita-min-staging vita-min-staging-replica
aptible db:create --environment vita-min-staging --type postgresql --disk-size 5 vita-min-staging-metabase
Create the app.
aptible apps:create --environment vita-min-staging vita-min-staging-metabase
Configure it to be able to access its own database. You'll need to change passWord
and hostName
and portNumber
to be based on the output you saw when creating Metabase's own database.
aptible config:set --app "vita-min-staging-metabase" "MB_DB_CONNECTION_URI=postgresql://aptible:passWord@hostName:portNumber/db?ssl=true&sslfactory=org.postgresql.ssl.NonValidatingFactory"
Note that you must add ?ssl=true&sslfactory=org.postgresql.ssl.NonValidatingFactory
to the end of the Aptible
database URL for Metabase to use it. We're using TLS without validating the certificate, which is sketchy in general,
but acceptable within our Aptible private network.
Deploy the current version of Metabase.
aptible deploy --app vita-min-staging-metabase --docker-image metabase/metabase:latest
Open its logs.
aptible logs --app vita-min-staging-metabase
Make sure you see a line like this:
2021-02-16T23:03:29.478Z [vita-min-staging-metabase-cmd 3ac580e5c9b9]: 2021-02-16 23:03:29,478 INFO metabase.db :: Successfully verified PostgreSQL 12.5 (Debian 12.5-1.pgdg90+1) application database connection. ✅
Create an endpoint to reach the Metabase app
aptible endpoints:https:create --environment vita-min-staging --app vita-min-staging-metabase --default-domain cmd
This will print a URL like: https://app-25318.on-aptible.com
Make sure this worked properly by visiting Metabase at that URL.
Work through the initial setup process using your real name and email address. Set a strong password. Tell it you will set up your data later.
If you want a better URL for the Metabase instance, that is out of scope of this document for now, but it should be very doable with AWS Route 53 and Aptible's documentation.
Restart the app to make sure the settings are preserved and you can still log in.
aptible restart --app vita-min-staging-metabase
SSH into the app.
aptible ssh --app vita-min-staging
Create Metabase's Postgres user & schema.
rake analytics:create_views analytics:create_metabase_user
Visit the Metabase interface and click the gear in the top right to visit the admin area.
Use these values always:
- Name: app
- Database: db
Copy these values from the Postgres replica:
- Host
- Port
Copy these values from the output of SSHing into the app & creating Metabase's Postgres user & schema
- Username
- Password
Check the boxes to Use a secure connection (SSL) and Automatically run queries when doing simple filtering and summarizing.
If everything is working properly, you should start to see X-Rays in the Metabase home view that show tables from the analytics schema.
Create a new Google API token for use with Metabase. The details tend to change over time, but this worked at the time of writing:
-
Visit https://console.cloud.google.com/ and make sure you are in the getyourrefund project. It's OK to use this one project for all Aptible environments.
-
Click APIs and Services, then Credentials.
-
Click Create Credentials, then OAuth client ID. Enter an appropriate name like Metabase (Staging) for Get Your Refund. Be sure to enter a URL like https://app-25318.on-aptible.com/ into the URIs under Allowed JavaScript origins. Click save, then copy the client ID to your clipboard.
-
Create a Consent Screen if necessary. Make it Internal and use an app name of Metabase for Get Your Refund.
-
Configure an app domain of e.g. app-25318.on-aptible.com and enter your email address as needed.
-
Click Add or Remove Scopes, then check the boxes for ".../auth/userinfo.email", ".../auth/userinfo.profile", and "openid".
-
Click Save And Continue.
Now, in Metabase:
-
Visit the admin area
-
Visit Authentication, then Google Sign-In. Paste the client ID into the box. Hit save.
In Metabase, visit the admin area.
Enter an email address of a user you want to invite. Create their account and let Metabase assign them a random password. Ignore the password. Tell the person to sign in with their Google Account.
For authentication, one option was to put this behind an Amazon AWS Application Load Balancer (ALB) and require an @codeforamerica.org email address. I skipped that for now because I expect our Metabase users to create dashboards that they might share with VITA personnel, and I don't have an easy way to whitelist some URLs to require ALB login and others not to.
I considered Metabase's hosted service. It seemed marginally more secure to host it ourselves since it ensures that no external parties can connect to our Postgres database directly. This is subtle, and it's not 100% clear to me that self-hosted is fundamentally better.
I considered asking product team members to use the Metabase desktop app or another desktop SQL app along with an Aptible DB tunnel. That would be nice because it would decrease our external attack surface, but it would be not as useful because there is no way to make a shared dashboard this way, as far as I know.
I learned from SHIBA that their data science team members use an EC2 instance to process all sorts of data and optionally place a copy in Metabase for viewing by product. Our approach permits us to do that later.
I learned from GetCalFresh that the engineering team members sometimes are uncertain if their database schema changes will interfere with the data science team. I learned from the data science team that they are OK with product schema changes that force them to adjust how they are computing some data. The SQL VIEWs do create some linkage between what we show for analysis vs. what's used in production. This may be helpful or may be a hindrance.
I considered using Metabase's password authentication. I declined because we get 2FA by relying on Google authentication.
I considered making the full Postgres replica data available to Metabase. I felt it would be more secure to limit what Metabase can see.