Add some documentation on validation
Paul Sandoz authored and PaulSandoz committed Nov 12, 2018
1 parent 5785712 commit 2d0d050
Showing 2 changed files with 159 additions and 0 deletions.
158 changes: 158 additions & 0 deletions docs/validation.md
@@ -0,0 +1,158 @@
# Validation

Hollow supports user-defined validation of data state.
Validation can ensure that bad data does not propagate to consumers.

Validation occurs after data state is published (as blobs) and before the availability of
the data state is announced, thereby allowing inspection of erroneous but published data state.

## Implementing and registering a validator

A validator is created by implementing the interface `com.netflix.hollow.api.producer.validation.ValidatorListener`.

The following is an example validator that checks if there are two or more objects that
have the same primary key:


```java
public static class DuplicateValidator implements ValidatorListener {

    private final String dataTypeName;

    public DuplicateValidator(String dataTypeName) {
        this.dataTypeName = Objects.requireNonNull(dataTypeName);
    }

    @Override
    public String getName() {
        // Include the data type so that multiple instances can be told apart
        return this.getClass().getName() + ":" + dataTypeName;
    }

    @Override
    public ValidationResult onValidate(HollowProducer.ReadState dataState) {
        ValidationResult.ValidationResultBuilder vrb = ValidationResult.from(this);
        vrb.detail("Type", dataTypeName);

        // Only object types can declare a primary key
        HollowSchema schema = dataState.getStateEngine().getSchema(dataTypeName);
        if (schema.getSchemaType() != HollowSchema.SchemaType.OBJECT) {
            return vrb.failed("Bad configuration: data type is not an Object");
        }

        HollowObjectSchema oSchema = (HollowObjectSchema) schema;
        PrimaryKey primaryKey = oSchema.getPrimaryKey();
        if (primaryKey == null) {
            return vrb.failed("Bad configuration: data type does not have a primary key");
        }
        vrb.detail("PrimaryKey", primaryKey);

        // Report the offending keys as a detail so the bad data can be tracked down
        Collection<Object[]> duplicateKeys = getDuplicateKeys(dataState.getStateEngine(), primaryKey);
        if (!duplicateKeys.isEmpty()) {
            return vrb
                    .detail("Objects",
                            duplicateKeys.stream().map(Arrays::toString).collect(Collectors.joining(",")))
                    .failed("Duplicate objects with the same primary key");
        }

        return vrb.passed();
    }

    private Collection<Object[]> getDuplicateKeys(HollowReadStateEngine stateEngine, PrimaryKey primaryKey) {
        ...
    }
}

```
The following highlights some important aspects (without going into the specific details of how
this validator detects duplicate objects with the same primary key):

- A validator has state, in this case the name of the data type whose object instances it checks.
The name of the validator, returned from the call to `getName`, should include pertinent state to
differentiate between two or more registered instances of the same validator class.

- When validation is to be performed the producer will emit a validate event by calling
the `onValidate` method, with the data state, for all registered validators.

- A validator builds and returns a `ValidationResult` reporting the name of the validator and
whether validation passed or failed.
Details may be included to provide additional information, such as the duplicate objects,
to help resolve the problem of the bad data.

- If the validator throws an unexpected runtime exception then it is as if a
`ValidationResult` had been returned reporting an error with that exception.

The example validator presented above may be registered when building a `HollowProducer`:

```java
HollowProducer producer = HollowProducer.withPublisher(publisher)
        .withAnnouncer(announcer)
        .withValidator(new DuplicateValidator("Movie"))
        .build();

producer.runCycle(state -> {
    for (Movie movie : movies)
        state.add(movie);
});
```
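
The event flow described in this section can be illustrated with a small, self-contained sketch. It does not use Hollow: `Validator`, `Result`, `Status` and `runAll` are hypothetical stand-ins, and continuing with the remaining validators after an exception is an assumption of the sketch. It only shows the idea that every registered validator is called with the data state and that an unexpected runtime exception is turned into an error result.

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in types for illustration only; none of these are Hollow classes.
final class ValidationRunSketch {

    enum Status { PASSED, FAILED, ERROR }

    static final class Result {
        final String validatorName;
        final Status status;
        final String message;

        Result(String validatorName, Status status, String message) {
            this.validatorName = validatorName;
            this.status = status;
            this.message = message;
        }
    }

    interface Validator {
        String getName();
        Result onValidate(List<String> dataState);
    }

    // Calls each validator in turn. A runtime exception is recorded as an
    // ERROR result for that validator (an assumption of this sketch).
    static List<Result> runAll(List<Validator> validators, List<String> dataState) {
        List<Result> results = new ArrayList<>();
        for (Validator v : validators) {
            try {
                results.add(v.onValidate(dataState));
            } catch (RuntimeException e) {
                results.add(new Result(v.getName(), Status.ERROR, String.valueOf(e)));
            }
        }
        return results;
    }

    // Hypothetical validator: fails when the data state holds no records.
    // Its name includes the type name to distinguish instances.
    static Validator nonEmpty(final String typeName) {
        return new Validator() {
            @Override public String getName() { return "NonEmpty:" + typeName; }
            @Override public Result onValidate(List<String> dataState) {
                return dataState.isEmpty()
                        ? new Result(getName(), Status.FAILED, "no records")
                        : new Result(getName(), Status.PASSED, "ok");
            }
        };
    }

    // Hypothetical validator that always throws, to exercise the ERROR path.
    static Validator alwaysThrows() {
        return new Validator() {
            @Override public String getName() { return "AlwaysThrows"; }
            @Override public Result onValidate(List<String> dataState) {
                throw new IllegalStateException("unexpected");
            }
        };
    }
}
```

Calling `runAll` with both validators and a non-empty list yields one passed result and one error result, in registration order.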


## Pre-defined validators

Writing a validator that operates directly on the data state through an
instance of `ReadState` is not always easy. To make this easier, Hollow provides a few
pre-defined validators for common use cases:

1. A duplicate object validator (similar to the example above), `DuplicateDataDetectionValidator`.
2. A record count variance validator, `RecordCountVarianceValidator`.
This validator can be configured to check if cardinality of objects varies within a required
percentage. This is useful to detect if the number of objects of a particular data type
unexpectedly decreases or increases.
3. An object modification validator, `ObjectModificationValidator`.
This validator can be configured to compare the state of objects (with the same primary key)
that have been modified (not added or removed).
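
To make the idea behind a record count variance check concrete, here is a minimal arithmetic sketch. It is not the implementation or the configuration API of `RecordCountVarianceValidator`; the class and method names are hypothetical, and measuring the change relative to the previous count is an assumption of the sketch.

```java
// Illustrative only: a hypothetical helper, not part of Hollow.
final class RecordCountVarianceSketch {

    // Absolute change of the current count relative to the previous count,
    // expressed as a percentage.
    static double percentChange(long previous, long current) {
        if (previous == 0) {
            // No baseline to compare against: no change, or unbounded change
            return current == 0 ? 0.0 : Double.POSITIVE_INFINITY;
        }
        return 100.0 * Math.abs(current - previous) / previous;
    }

    // True when the change stays within the allowed percentage.
    static boolean withinThreshold(long previous, long current, double allowedPercent) {
        return percentChange(previous, current) <= allowedPercent;
    }
}
```

With an allowed variance of 10 percent, going from 1000 to 950 records would pass under this sketch (a 5 percent change), while dropping from 1000 to 700 would not (a 30 percent change).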


## Using the generated object API

Ordinarily the generated object API is utilized by consumers, but there is no inherent reason
why it cannot also be utilized by the producer. This can make it significantly easier to write
a custom validator. For example, the `MovieAPI` may be created from data state and all movies
traversed as follows:

```java
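public static class MyValidator implements ValidatorListener {

    @Override
    public String getName() { ... }

    @Override
    public ValidationResult onValidate(HollowProducer.ReadState dataState) {
        // The generated API is constructed from the read state engine of the data state
        MovieAPI api = new MovieAPI(dataState.getStateEngine());

        for (Movie m : api.getAllMovie()) {
            ...
        }

        // Report success if none of the checks above returned a failure
        return ValidationResult.from(this).passed();
    }
}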

```

The `ObjectModificationValidator` is designed to be used with a generated object API.

## Listening to events emitted by the producer for other stages

A registered validator, implementing `ValidatorListener`, may also receive events for other
stages by implementing other listener interfaces. For example, implementing `CycleListener`
enables the validator to receive events when a production cycle is started and completed.
Receiving an event when the cycle starts (a call to the `onCycleStart` method) may enable
the validator to load validator configuration state dynamically and freeze that state for the
duration of the cycle.
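
The "load at cycle start, then freeze" pattern can be sketched without Hollow as follows. The two hook interfaces below are simplified, hypothetical stand-ins for the listener interfaces (the real `CycleListener` and `ValidatorListener` have different method signatures), and the minimum record count is an invented example of configuration state.

```java
import java.util.function.Supplier;

// Illustrative only: simplified stand-ins, not Hollow's listener interfaces.
final class FrozenConfigSketch {

    interface CycleStartHook { void onCycleStart(long version); }

    interface ValidateHook { boolean onValidate(int recordCount); }

    static final class MinCountValidator implements CycleStartHook, ValidateHook {
        private final Supplier<Integer> configSource; // dynamic configuration
        private int frozenMinCount;                   // snapshot used during a cycle

        MinCountValidator(Supplier<Integer> configSource) {
            this.configSource = configSource;
        }

        @Override
        public void onCycleStart(long version) {
            // Read the configuration once; later changes are ignored until the next cycle
            frozenMinCount = configSource.get();
        }

        @Override
        public boolean onValidate(int recordCount) {
            // Validate against the snapshot, not the live configuration
            return recordCount >= frozenMinCount;
        }
    }
}
```

Because validation reads only the snapshot, a configuration change made in the middle of a cycle takes effect at the start of the next cycle. Thread-safety is not addressed in this sketch.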


## Integrity checking

Hollow additionally supports a special form of validation, integrity checking of data state,
that occurs before the user-defined validation of data state.

Integrity checking ensures that the data state is not corrupted, perhaps due to a bug.
Failure is likely rare, but the check is an important safeguard that ensures corrupt data is
not propagated to consumers.
1 change: 1 addition & 0 deletions mkdocs.yml
@@ -14,6 +14,7 @@ pages:
- Diving Deeper: 'diving-deeper.md'
- Interacting with a Hollow Dataset: 'interacting-with-a-dataset.md'
- Data Ingestion: 'data-ingestion.md'
- Validation: 'validation.md'
- Advanced Topics: 'advanced-topics.md'
- Testing: 'testing.md'
- Glossary: 'glossary.md'