Skip to content

Latest commit

 

History

History
204 lines (130 loc) · 7.65 KB

2021-02-08-metabase.md

File metadata and controls

204 lines (130 loc) · 7.65 KB

Metabase for analytics

Last updated 2021-03-03 by Asheesh Laroia

Summary

The product team wants a way to query more data.

We'll install Metabase and give it access to a collection of SQL VIEWs that remove sensitive information from our database tables. Engineering will maintain these SQL VIEWs.

Metabase will read from a Postgres database replica to avoid excessive load on the primary database. It will need a password to access Postgres.

We'll deploy Metabase using the Aptible Docker image deploy strategy along with an Aptible public endpoint. This is similar to the Get Your Refund app except that it uses Docker image deploy rather than git-based deploy.

Metabase users will be manually created by an admin, and then log in with their GSuite accounts.

Aptible deployment details

In an Aptible environment like vita-min-staging, we will create:

  • A replica of the main app database called vita-min-staging-replica
  • A database called vita-min-staging-metabase, in which Metabase will store its own configuration
  • An app called vita-min-staging-metabase

Run each step in its own terminal tab so it's easy to copy & paste from the output of one into another step.

Create the replica

aptible db:replicate vita-min-staging vita-min-staging-replica

Create Metabase's own database

aptible db:create --environment vita-min-staging  --type postgresql --disk-size 5 vita-min-staging-metabase

Create and deploy Metabase app

Create the app.

aptible apps:create --environment vita-min-staging vita-min-staging-metabase

Configure it to be able to access its own database. You'll need to change passWord and hostName and portNumber to be based on the output you saw when creating Metabase's own database.

aptible config:set --app "vita-min-staging-metabase" "MB_DB_CONNECTION_URI=postgresql://aptible:passWord@hostName:portNumber/db?ssl=true&sslfactory=org.postgresql.ssl.NonValidatingFactory"

Note that you must add ?ssl=true&sslfactory=org.postgresql.ssl.NonValidatingFactory to the end of the Aptible database URL for Metabase to use it. We're using TLS without validating the certificate, which is sketchy in general, but acceptable within our Aptible private network.

Deploy the current version of Metabase.

aptible deploy --app vita-min-staging-metabase --docker-image metabase/metabase:latest

Open its logs.

aptible logs --app vita-min-staging-metabase

Make sure you see a line like this:

2021-02-16T23:03:29.478Z [vita-min-staging-metabase-cmd 3ac580e5c9b9]: 2021-02-16 23:03:29,478 INFO metabase.db :: Successfully verified PostgreSQL 12.5 (Debian 12.5-1.pgdg90+1) application database connection. ✅

Create an endpoint to reach the Metabase app

Create an endpoint to reach the Metabase app

aptible endpoints:https:create --environment vita-min-staging --app vita-min-staging-metabase --default-domain cmd

This will print a URL like: https://app-25318.on-aptible.com

Make sure this worked properly by visiting Metabase at that URL.

Work through the initial setup process using your real name and email address. Set a strong password. Tell it you will set up your data later.

If you want a better URL for the Metabase instance, that is out of scope of this document for now, but it should be very doable with AWS Route 53 and Aptible's documentation.

Restart the app to make sure the settings are preserved and you can still log in.

aptible restart --app vita-min-staging-metabase 

SSH into the app and create Metabase's Postgres user & schema

SSH into the app.

aptible ssh --app vita-min-staging

Create Metabase's Postgres user & schema.

rake analytics:create_views analytics:create_metabase_user

Configure Metabase to read data from the replica

Visit the Metabase interface and click the gear in the top right to visit the admin area.

Use these values always:

  • Name: app
  • Database: db

Copy these values from the Postgres replica:

  • Host
  • Port

Copy these values from the output of SSHing into the app & creating Metabase's Postgres user & schema

  • Username
  • Password

Check the boxes to Use a secure connection (SSL) and Automatically run queries when doing simple filtering and summarizing.

If everything is working properly, you should start to see X-Rays in the Metabase home view that show tables from the analytics schema.

Configure Metabase for Google login

Create a new Google API token for use with Metabase. The details tend to change over time, but this worked at the time of writing:

  • Visit https://console.cloud.google.com/ and make sure you are in the getyourrefund project. It's OK to use this one project for all Aptible environments.

  • Click APIs and Services, then Credentials.

  • Click Create Credentials, then OAuth client ID. Enter an appropriate name like Metabase (Staging) for Get Your Refund. Be sure to enter a URL like https://app-25318.on-aptible.com/ into the URIs under Allowed JavaScript origins. Click save, then copy the client ID to your clipboard.

  • Create a Consent Screen if necessary. Make it Internal and use an app name of Metabase for Get Your Refund.

  • Configure an app domain of e.g. app-25318.on-aptible.com and enter your email address as needed.

  • Click Add or Remove Scopes, then check the boxes for ".../auth/userinfo.email", ".../auth/userinfo.profile", and "openid".

  • Click Save And Continue.

Now, in Metabase:

  • Visit the admin area

  • Visit Authentication, then Google Sign-In. Paste the client ID into the box. Hit save.

Adding new users to Metabase

In Metabase, visit the admin area.

Enter an email address of a user you want to invite. Create their account and let Metabase assign them a random password. Ignore the password. Tell the person to sign in with their Google Account.

Rationale, research, and alternatives considered

For authentication, one option was to put this behind an Amazon AWS Application Load Balancer (ALB) and require an @codeforamerica.org email address. I skipped that for now because I expect our Metabase users to create dashboards that they might share with VITA personnel, and I don't have an easy way to whitelist some URLs to require ALB login and others not to.

I considered Metabase's hosted service. It seemed marginally more secure to host it ourselves since it ensures that no external parties can connect to our Postgres database directly. This is subtle, and it's not 100% clear to me that self-hosted is fundamentally better.

I considered asking product team members to use the Metabase desktop app or another desktop SQL app along with an Aptible DB tunnel. That would be nice because it would decrease our external attack surface, but it would be not as useful because there is no way to make a shared dashboard this way, as far as I know.

I learned from SHIBA that their data science team members use an EC2 instance to process all sorts of data and optionally place a copy in Metabase for viewing by product. Our approach permits us to do that later.

I learned from GetCalFresh that the engineering team members sometimes are uncertain if their database schema changes will interfere with the data science team. I learned from the data science team that they are OK with product schema changes that force them to adjust how they are computing some data. The SQL VIEWs do create some linkage between what we show for analysis vs. what's used in production. This may be helpful or may be a hindrance.

I considered using Metabase's password authentication. I declined because we get 2FA by relying on Google authentication.

I considered making the full Postgres replica data available to Metabase. I felt it would be more secure to limit what Metabase can see.