Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for object change detection in filters #2

Merged
merged 15 commits into from
Jan 8, 2019

Conversation

amajedi
Copy link
Contributor

@amajedi amajedi commented Nov 13, 2018

NOTE: This PR is currently unstable and should not be merged. It suffers from issue #4.
Update: Issue #4 is not exclusive to this PR, removing [UNSTABLE] tag.

Why this PR is being opened:
This PR fixes issue #5 by allowing filters to detect changes. Given a One-To-Many relationship with frequent updates to a single record in the first relation, may cause an excessive number of output records to be produced if a large number of records corresponding to it exists in the second relation. This PR introduces the capability for filters to detect changes in input records and decide the course of action that should be taken, one of: UPDATE, DELETE, or SKIP in an effort to combat this problem.

Why existing functionality does not work:
The existing filter functionality only allows records to be treated as deleted (or tombstoned). This may cause output records to exclude necessary data.

What this PR changes:

  • BaseFilter requires a filter function which now returns FilterMode
  • BaseFilter.filter() additionally accepts the old BaseRecord as the last argument
  • Existing DefaultFilter class updated to be more explicit about previously implicit filtering logic
  • The next record is cached in a staging variable, this may have slight memory implications

Notes:

  • See Commit 5dbb83a Filtering happens in the KafkaTopicIterator.next() call. An important caveat is that hasNext() may return True if records exist in the input topic, however, if those records are all skipped, the return value of next() will be NULL. In an upcoming commit, I will be changing hasNext() to perform filtering as well to avoid this discrepancy

  • Commit 25f3212 updates BaseFilter to support two filter() signatures, one which accepts the old record and one which doesn't. This allows users who only care about the new record in their filters to avoid having an ignored argument.

  • Commit 415f0b0 corrects [known issue] Since KafkaTopic.hasNext() and KafkaTopic.next() can now theoretically consume many input topic messages as they skip along, they should also now increment records.consumed and records.consumed.by.entity. Previously, this was only done here. Without any changes, we will be under reporting these metrics.

  • Commit c0d719f addresses PR feedback related to large number of arguments necessary for BaseTopic configuration. Created a TopicConfig class, an instance can be passed to BaseTopic.configure instead.

  • Commit 168dcec addresses a bug which fails to null the value field on deletes, see: https://github.com/jwplayer/southpaw/blob/master/src/main/java/com/jwplayer/southpaw/topic/KafkaTopic.java#L91 this is used by the engine to avoid generating denorm docs

  • 0b72cf1 removes deprecated function signature from BaseFilter. This was initially intended to be backwards compatible with existing filters but isn't due to change in return value.

@amajedi amajedi changed the title [WIP] Support for change detection in filters Support for change detection in filters Nov 14, 2018
@@ -25,6 +25,17 @@
* in any denormalized records. They are effectively treated as a tombstone.
*/
public abstract class BaseFilter {

/*

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Docstrings should start with /**.
There are some other cases of missing * in this PR.

*/
public abstract boolean isFiltered(String entity, BaseRecord record);
public FilterMode filter(String entity, BaseRecord record, BaseRecord oldRecord) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be good to keep that last parameter optional.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is optional in the sense that a class extending the BaseFilter has the option of implementing either function.

protected Iterator<ConsumerRecord<byte[], byte[]>> iter;
protected KafkaTopic<K, V> topic;

// Information about the next valid record

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be good to describe what null values mean.

/*
* Internal helper to obtain and stage the next non-skipped record
*/
private ConsumerRecord<byte[], byte[]> getAndStageNextRecord() {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

getAndStage is a bit confusing to me. Maybe findAndStage or something, implying that this is more than a simple getter.

}

@Override
public boolean hasNext() {
return iter.hasNext();
return (getAndStageNextRecord() != null);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

() are superfluous

@@ -69,49 +76,125 @@
private KafkaTopicIterator(Iterator<ConsumerRecord<byte[], byte[]>> iter, KafkaTopic<K, V> topic) {
this.iter = iter;
this.topic = topic;
this.nextRecord = null;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this.resetStagedRecord() ?

Copy link

@ksindi ksindi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great!

Copy link

@TBoshoven TBoshoven left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Went over the changes, LGTM.
Naturally, I could not approve without a comment about a newline ;)

@@ -34,10 +45,26 @@ public void configure(Map<String, Object> config) {
}

/**
* Determines if the given record should be filtered based on its entity
* DEPRECATED

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use this instead.

public void update(T value) {
this.value = value;
}
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing newline, looks like your editor needs to be configured.

@amajedi amajedi changed the title Support for change detection in filters [UNSTABLE] Support for change detection in filters Nov 30, 2018
@amajedi amajedi changed the title [UNSTABLE] Support for change detection in filters Support for change detection in filters Dec 14, 2018
@amajedi amajedi changed the title Support for change detection in filters Fix for issue #5: Add support for object change detection in filters Jan 2, 2019
@amajedi amajedi changed the title Fix for issue #5: Add support for object change detection in filters Fix for issue #4: Add support for object change detection in filters Jan 2, 2019
@amajedi amajedi changed the title Fix for issue #4: Add support for object change detection in filters Add support for object change detection in filters Jan 2, 2019
@amajedi amajedi force-pushed the change-detection-filtering branch from 168dcec to a0678a3 Compare January 2, 2019 16:51
*/
public abstract boolean isFiltered(String entity, BaseRecord record);
public FilterMode filter(String entity, BaseRecord record) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Continuing from @moritastic comment, this signature isn't deprecated. It's a breaking change since the return value changes. This PR already introduces a number of backwards compatibility changes. Lets just remove this and I'll make a note of it in the release

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, 0b72cf1 removes the deprecated signature

@eric-weaver eric-weaver merged commit 6cdd913 into master Jan 8, 2019
@eric-weaver eric-weaver deleted the change-detection-filtering branch January 8, 2019 18:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants