Add documentation
eliotjordan committed Feb 18, 2025
1 parent a5a71d2 commit 1704da2
Showing 4 changed files with 68 additions and 0 deletions.
59 changes: 59 additions & 0 deletions architecture-decisions/0004-deleting-records.md
@@ -0,0 +1,59 @@
# 4. Deleting Records

Date: 2025-02-18

## Status

Accepted

## Context

When resources are deleted in Figgy, a DeletionMarker resource is created at the
same time. The DeletionMarker stores the deleted resource's identifier,
resource type, and a serialized copy of the metadata (in the `deleted_object`
field). We need a method for processing DeletionMarkers in DPUL-C to remove the
corresponding record from the Solr index.

## Decision

#### Hydration Consumer

1. We will process only DeletionMarkers whose deleted resource has a resource
type that we currently index into DPUL-C. We will also check whether a hydration
cache entry exists for the deleted resource and discard the DeletionMarker if it does not.
1. A special CacheMarker is created from the DeletionMarker that uses the
deleted resource's id as the id and the updated_at value from the
DeletionMarker as the timestamp.
1. Special hydration cache entry attributes are generated. The hydration cache
entry created from these attributes will replace the hydration cache entry of
the deleted resource.
   - Existing metadata is replaced with a simple `deleted => true` key-value pair
   - The entry id is set to the deleted resource's id
   - The entry internal_resource type is set to that of the deleted resource
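
The steps above can be sketched roughly as follows. This is a minimal illustration only, assuming the DeletionMarker arrives as a plain map with `updated_at` and `metadata` keys; the module name `DeletionSketch`, the function `from_deletion_marker/1`, and the shape of the returned map are hypothetical and are not taken from the DPUL-C codebase.

```elixir
defmodule DeletionSketch do
  # Hypothetical helper (not the project's API): derives a cache marker and
  # replacement hydration cache entry attributes from a DeletionMarker-shaped map.
  def from_deletion_marker(%{
        updated_at: updated_at,
        metadata: %{
          "resource_id" => [%{"id" => deleted_resource_id}],
          "resource_type" => [resource_type]
        }
      }) do
    %{
      # The marker uses the deleted resource's id and the DeletionMarker's
      # updated_at value as its timestamp.
      cache_marker: %{id: deleted_resource_id, timestamp: updated_at},
      entry_attrs: %{
        # The entry takes over the deleted resource's id and type.
        id: deleted_resource_id,
        internal_resource: resource_type,
        # Existing metadata collapses to a single deleted flag.
        metadata: %{"deleted" => true}
      }
    }
  end
end
```

Because the id and internal_resource type match those of the deleted resource, writing these attributes to the cache would overwrite the existing entry rather than add a second one.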

#### Transformation Consumer

1. Messages with the `deleted => true` key-value pair are handled separately.
1. A special Solr document is generated from the deleted object's hydration cache
entry with the following structure:
```
%{ id: "id", deleted: true }
```
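
One way to handle deleted entries separately is a dedicated function clause that matches on the deleted flag before the general clause runs. The sketch below assumes a simplified entry shape (a map with `id` and `metadata` keys); the module name `TransformSketch`, the function `to_solr_document/1`, and the `title` field in the fallback clause are hypothetical placeholders, not the project's actual transformation.

```elixir
defmodule TransformSketch do
  # Deleted entries are matched first and reduced to an id plus a deleted flag.
  def to_solr_document(%{id: id, metadata: %{"deleted" => true}}) do
    %{id: id, deleted: true}
  end

  # Placeholder for the normal transformation; the `title` field is an
  # assumption made only so the fallback clause returns something.
  def to_solr_document(%{id: id, metadata: metadata}) do
    %{id: id, title: Map.get(metadata, "title", [])}
  end
end
```

Clause order matters here: Elixir tries clauses top to bottom, so the deleted clause has to come before the general one.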

#### Indexing Consumer

1. Messages with the `deleted: true` field are handled separately and assigned to
the `delete` batcher.
1. The delete batcher sends the deleted record ids to the `Solr.delete_batch`
function, which iterates over them, deletes each record, and then commits the
batch of deletes.
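
The routing and batch delete described above might look something like the following. This is a sketch under stated assumptions: the module `IndexingSketch` and both functions are hypothetical, and the delete and commit operations are passed in as anonymous functions because the real Solr client calls are not shown in this commit.

```elixir
defmodule IndexingSketch do
  # Route documents carrying the deleted flag to the :delete batcher;
  # everything else goes to :default.
  def batcher_for(%{deleted: true}), do: :delete
  def batcher_for(_doc), do: :default

  # Delete each id in turn, then commit once for the whole batch.
  # delete_fun and commit_fun stand in for the real Solr calls.
  def delete_batch(ids, delete_fun, commit_fun) do
    Enum.each(ids, delete_fun)
    commit_fun.()
  end
end
```

Grouping a list of documents with `Enum.group_by(docs, &IndexingSketch.batcher_for/1)` would then yield one list per batcher. Committing once per batch, rather than once per record, keeps the number of Solr commits proportional to the number of batches.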

## Consequences

- DeletionMarkers will stay in the Figgy database unless the resource is
restored. This means that DPUL-C will have to triage an ever-increasing number
of DeletionMarkers over time.

- Deleted resource hydration and transformation cache entries will stay in the cache after the
resource is removed from Solr until the next full reindex or, in the case of the
transformation cache, the next partial reindex.
@@ -27,6 +27,8 @@ defmodule DpulCollections.IndexingPipeline.Figgy.HydrationCacheEntry do
"metadata" => %{"deleted" => true}
}
}) do
# Generate a small json document for deleted resources that indicates that
# the Solr record with that id should be deleted from the index.
%{
id: id,
deleted: true
@@ -103,6 +103,9 @@ defmodule DpulCollections.IndexingPipeline.Figgy.HydrationConsumer do
)
when internal_resource in ["DeletionMarker"] and
resource_type in ["EphemeraFolder", "EphemeraTerm"] do
# Only process messages where the deleted resource has an existing
# hydration cache entry. If one does not exist, it means that the resource
# has not been indexed into DPUL-C.
resource = IndexingPipeline.get_hydration_cache_entry!(resource_id, cache_version)

cond do
4 changes: 4 additions & 0 deletions lib/dpul_collections/indexing_pipeline/figgy/resource.ex
@@ -76,6 +76,10 @@ defmodule DpulCollections.IndexingPipeline.Figgy.Resource do
%{"resource_id" => [%{"id" => deleted_resource_id}], "resource_type" => [resource_type]} =
resource.metadata

# Create attributes specifically for deletion markers.
# 1. Replace existing metadata with a simple deleted => true kv pair
# 2. Set the entry id to the deleted resource's id
# 3. Set the entry internal_resource type to that of the deleted resource
resource
|> Map.from_struct()
|> Map.delete(:__meta__)