version: v1.3
The context for this challenge is that you work at a company that powers a marketplace app for healthcare facilities to hire healthcare professionals (a.k.a. workers).
Your role is that of a senior software engineer in charge of the open-shift backend service. This service stores information on the `Shift`, `Facility`, `Worker`, `Document`, `FacilityRequirement`, and `DocumentWorker` entities.
Your task is to complete the following User Story: As a worker, I want to get all available shifts across all active facilities where I'm eligible to work.
For a Worker to be eligible for a facility's shift:
- The `Facility` must be active
- The `Shift` must be active (i.e., not deleted)
- The `Worker` must be active
- The `Shift` must not be claimed by someone else
- The `Worker` must have all of the facility's required documents
- The professions of the `Shift` and `Worker` must match
We provide a PostgreSQL database and a seed file. The seed data is randomized such that:
- Some `Shifts` are claimed
- Some `Workers` are inactive
- Some `Facilities` are inactive
- Some `Workers` don't have all of a facility's required documents
Provide a RESTful HTTP server (or another interchange format if you think it's a better match) with the following:
- Risk mitigation through proper testing
- Proper error handling and logging
- A brief writeup on how you would improve the performance of the endpoint with a justification of why it would perform better than your submission
- (Bonus) Measure the performance of your endpoint and provide a brief report in a `PERFORMANCE.md` file
We provide a folder called `seed`, which contains a `docker-compose.yaml` file that helps you set up a database. It is a PostgreSQL database seeded with about 2 million records. To set it up, go into the `seed` folder and execute the command `docker compose up --build`. Once seeded, do not stop `docker-compose`. Keep the database running and use your framework of choice to connect to it using the database URL `postgres://postgres:postgres@localhost:5432/postgres`.
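As a sanity check, a minimal connection sketch using the `pg` client (any framework works; the table name `"Shift"` is an assumption about the seed schema):

```typescript
// Minimal connectivity check using the 'pg' client; any framework works.
import { Client } from 'pg';

async function main() {
  const client = new Client({
    connectionString: 'postgres://postgres:postgres@localhost:5432/postgres',
  });
  await client.connect();
  // Sanity-check that the seed ran by counting a seeded table.
  // The table name "Shift" is an assumption based on the entity list above.
  const { rows } = await client.query('SELECT COUNT(*) FROM "Shift"');
  console.log(`shifts seeded: ${rows[0].count}`);
  await client.end();
}

main().catch(console.error);
```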
The seed script inserts a lot of workers. Among those workers, three fulfill all document requirements; they all have one of the professions. The seed script prints their IDs and professions at the end so you can verify them against your query.
Please submit your solution by creating a pull request (PR) on this repository. Do not merge your PR. Instead, please return to your Hatchways assessment page to confirm your submission.
I modified `seed.ts` so that:
- It generates 200k records instead of 2 million; `pg` in Docker was failing with `code: SqlState(E53100), message: could not resize shared memory segment`
- It distributes shifts across a 5-year time span, which is better for UI display; previously it generated shifts in the past, fixed to a specific month
I modified `docker-compose.yml` so that:
- It doesn't run `seed.ts` every time the container is (re)created, which duplicated records. It now resets and seeds.
- Note: I had to run `prisma generate` from `/seed` to generate the Prisma client types; this might be worth including in the README.
Finds available shifts for a given `workerId`. Internally, there are two implementations/strategies (a sketch of the raw one follows this list):
- Memory: fetches worker docs and facility requirements independently and diffs them in memory. Triggered by passing `strategy: 'memory'` (default).
- Raw: performs a raw SQL query using a join + intersect to diff docs. Triggered by passing `strategy: 'raw'`.
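As a rough illustration, here is what the raw strategy could look like with Prisma. Table and column names (`is_active`, `worker_id`, and so on) are assumptions about the seed schema, and this sketch expresses the document diff with `NOT EXISTS` + `EXCEPT` instead of the submission's join + `INTERSECT`; both compute the same set difference.

```typescript
// Hypothetical sketch of the 'raw' strategy. Table/column names are
// assumptions based on the entity names in the challenge description.
import { PrismaClient, Shift } from '@prisma/client';

const prisma = new PrismaClient();

export async function findAvailableShiftsRaw(workerId: number): Promise<Shift[]> {
  return prisma.$queryRaw<Shift[]>`
    SELECT s.*
    FROM "Shift" s
    JOIN "Facility" f ON f.id = s.facility_id AND f.is_active = true
    JOIN "Worker" w ON w.id = ${workerId}
                   AND w.is_active = true
                   AND w.profession = s.profession
    WHERE s.is_deleted = false
      AND s.worker_id IS NULL -- not claimed by someone else
      AND NOT EXISTS (
        -- an empty difference means the worker holds every required document
        SELECT fr.document_id FROM "FacilityRequirement" fr
        WHERE fr.facility_id = f.id
        EXCEPT
        SELECT dw.document_id FROM "DocumentWorker" dw
        WHERE dw.worker_id = ${workerId}
      )`;
}
```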
I believe the exercise can't be deterministically tested as presented. Even though there are in fact 3 deterministic workers holding all documents, that does not imply there will be shifts available to them. One could expect shifts to be available given the sheer number of shifts generated, but it's not guaranteed, so such a test would be flaky.
Thus, I seed/teardown additional test data before each suite:
- Use case 1: `Shift1` for `Facility1`, which requires no docs; `Worker1` has no docs -> `Shift1` should appear in the results for `Worker1`
- Use case 2: `Shift2` for `Facility2`, which requires `Doc1`; `Worker2` has `Doc1` -> `Shift1` and `Shift2` should appear in the results for `Worker2`

Run with `npm test`; this runs both the service unit tests and the server e2e tests. It doesn't require the API server to be running, as it injects requests (see the sketch below).
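A minimal sketch of an injected e2e test, assuming Fastify's `inject` and hypothetical helper names (`buildServer`, `seedTestData`, `teardownTestData`); the actual suite may use a different runner and structure:

```typescript
// Hypothetical e2e sketch: buildServer, seedTestData, and teardownTestData
// are assumed helpers; the real suite may be organized differently.
import { test, beforeEach, afterEach } from 'node:test';
import assert from 'node:assert/strict';
import { buildServer } from '../src/server';
import { seedTestData, teardownTestData } from './helpers';

beforeEach(() => seedTestData());
afterEach(() => teardownTestData());

test('Worker1 (no docs) sees Shift1 at Facility1 (no requirements)', async () => {
  const app = await buildServer();
  // inject() dispatches the request in-process, so no listening server
  // is needed.
  const res = await app.inject({ method: 'GET', url: '/api/shifts?workerId=1' });
  assert.equal(res.statusCode, 200);
  const { shifts } = res.json();
  assert.ok(shifts.some((s: { id: number }) => s.id === 1));
});
```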
Fastify server exposing the endpoint (see the sketch below). TODO: properly separate it into modules/routes/controllers.
- Run with `npm run api:dev` -> http://localhost:3000/api/shifts?workerId=101
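For reference, a minimal sketch of how the route could look; `findAvailableShifts` and its options object are assumptions about this repo's shift service:

```typescript
// Hypothetical route sketch; findAvailableShifts and its options are
// assumptions about this repo's shift service.
import Fastify from 'fastify';
import { findAvailableShifts } from './services/shifts';

const app = Fastify({ logger: true });

app.get<{ Querystring: { workerId: string; strategy?: 'memory' | 'raw' } }>(
  '/api/shifts',
  async (req, reply) => {
    const workerId = Number(req.query.workerId);
    if (!Number.isInteger(workerId)) {
      return reply.code(400).send({ error: 'workerId must be an integer' });
    }
    const shifts = await findAvailableShifts(workerId, {
      strategy: req.query.strategy ?? 'memory',
    });
    return { shifts };
  },
);

app.listen({ port: 3000 }).catch((err) => {
  app.log.error(err);
  process.exit(1);
});
```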
A short demo to showcase finding shifts by worker ID and displaying them in calendar format.
- `cd` into `/web` and run `npm run dev` -> http://localhost:5173/
There's a simple explicit measurement when hitting the `/api/shifts` endpoint: it measures the time spent around the invoked shift service and adds it as `{ meta: { ts: number } }` in the response. Of course, this is not scalable, as performance is really a cross-cutting concern.
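The measurement amounts to a small wrapper along these lines (a sketch; the actual helper and names may differ):

```typescript
// Sketch of the explicit timing described above: wrap the service call
// and attach the elapsed milliseconds as response metadata.
async function timed<T>(fn: () => Promise<T>): Promise<{ result: T; ts: number }> {
  const start = performance.now();
  const result = await fn();
  return { result, ts: performance.now() - start };
}

// Usage inside the route handler (names are assumptions):
// const { result: shifts, ts } = await timed(() => findAvailableShifts(workerId));
// return { shifts, meta: { ts } };
```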
Measures the `/api/shifts` endpoint using both the memory and raw implementations. The memory implementation seems to win on latency/requests/throughput.
- Ensure the API server is running
- Run with `npm run benchmark` (see the sketch below)
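The latency/requests/throughput wording matches autocannon's output, so the benchmark script could look roughly like this (an assumption; the actual script may differ):

```typescript
// Hypothetical benchmark sketch using autocannon, comparing the two
// strategies against a locally running API server.
import autocannon from 'autocannon';

async function bench(strategy: 'memory' | 'raw') {
  const result = await autocannon({
    url: `http://localhost:3000/api/shifts?workerId=101&strategy=${strategy}`,
    connections: 10,
    duration: 10,
  });
  console.log(
    `${strategy}: p99 latency ${result.latency.p99}ms, ` +
      `${result.requests.average} req/s, ${result.throughput.average} bytes/s`,
  );
}

async function main() {
  await bench('memory');
  await bench('raw');
}

main().catch(console.error);
```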
- Add real observability via instrumentation; measure function calls and chokepoints, e.g. with OpenTelemetry
- Add caching at the application level, e.g. Redis (see the sketch after this list)
- Optimize the SQL queries
- Use materialized views in the db: entities with `is_deleted: true` or `is_active: false` would not even be considered when querying
- Use stored procedures in the db
- Partition the db on distinct fields that could help segment the data, e.g. facility location
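For the caching idea above, a minimal sketch with ioredis; the key shape, TTL, and invalidation policy are assumptions, and claimed shifts would need invalidation or a short TTL to avoid serving stale results:

```typescript
// Hypothetical sketch of application-level caching with ioredis. The key
// shape and TTL are assumptions; claims/cancellations would require
// invalidation or a short TTL to avoid stale results.
import Redis from 'ioredis';

const redis = new Redis('redis://localhost:6379');
const TTL_SECONDS = 30;

export async function findAvailableShiftsCached(
  workerId: number,
  find: (id: number) => Promise<unknown[]>,
): Promise<unknown[]> {
  const key = `shifts:worker:${workerId}`;
  const hit = await redis.get(key);
  if (hit) return JSON.parse(hit);

  const shifts = await find(workerId);
  await redis.set(key, JSON.stringify(shifts), 'EX', TTL_SECONDS);
  return shifts;
}
```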