
Add ingest/processed/bytes metric #17581

Merged
merged 15 commits into from
Jan 24, 2025

Conversation

Contributor

@neha-ellur neha-ellur commented Dec 17, 2024

A new metric ingest/processed/bytes has been introduced to track the total number of bytes processed during ingestion tasks, including native batch ingestion, streaming ingestion, and multi-stage query (MSQ) ingestion tasks. This metric helps provide a unified view of data processing across different ingestion pathways.

Key changed/added classes in this PR

This metric was added in four key ingestion task classes:

  • IndexTask: A sequential ingestion task. The processed bytes were retrieved from the RowIngestionMetersTotals object (buildSegmentsMeters) and emitted directly after segment publication.
  • ParallelIndexSupervisorTask: A task that supervises parallel ingestion. Processed bytes were aggregated from subtasks' ingestion metrics (RowIngestionMetersTotals).
  • SeekableStreamIndexTaskRunner: A runner for ingestion tasks that consume data from seekable streams (e.g., Kafka). The processed bytes were calculated based on the size of the data buffers (e.g., ByteEntity buffers) being processed for each record. The metric was emitted for each processed record.
  • MSQControllerImpl: The ingest/processed/bytes metric is emitted by aggregating bytes processed across all stages and workers during MSQ task execution. This involves fetching counters from the CounterSnapshotsTree, summing bytes from all input channels for each worker and stage via stream-based aggregation, and emitting the total as the ingest/processed/bytes metric for the entire MSQ task.
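The MSQ aggregation described above can be sketched as a stream over a per-stage, per-worker byte count. This is a simplified, hypothetical stand-in for the CounterSnapshotsTree; the class name and nested-map shape below are illustrative, not Druid's actual types:

```java
import java.util.Map;

// Illustrative sketch (not the actual Druid code): summing processed bytes
// across all stages and workers, mirroring the stream-based aggregation
// described for the MSQ controller.
public class ProcessedBytesAggregator
{
  /** stageToWorkerBytes: stageId -> (workerId -> bytes read on input channels) */
  public static long totalProcessedBytes(Map<Integer, Map<Integer, Long>> stageToWorkerBytes)
  {
    return stageToWorkerBytes.values().stream()
        .flatMap(workerBytes -> workerBytes.values().stream())
        .mapToLong(Long::longValue)
        .sum();
  }

  public static void main(String[] args)
  {
    // Two stages: stage 0 has two workers, stage 1 has one.
    Map<Integer, Map<Integer, Long>> counters = Map.of(
        0, Map.of(0, 100L, 1, 250L),
        1, Map.of(0, 50L)
    );
    System.out.println(totalProcessedBytes(counters)); // prints 400
  }
}
```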

This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • a release note entry in the PR description.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added or updated version, license, or notice information in licenses.yaml
  • added comments explaining the "why" and the intent of the code wherever it would not be obvious to an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • added integration tests.
  • been tested in a test Druid cluster.

Testing (locally)

[Three screenshots of local test runs, taken 2025-01-22, attached]

@github-actions github-actions bot added Area - Batch Ingestion Area - MSQ For multi stage queries - https://github.com/apache/druid/issues/12262 labels Dec 18, 2024
@kfaraz kfaraz self-requested a review December 18, 2024 03:52
@apache apache deleted a comment from neha-ellur Dec 18, 2024
@@ -329,6 +331,27 @@ public void run(final QueryListener queryListener) throws Exception
}
// Call onQueryComplete after Closer is fully closed, ensuring no controller-related processing is ongoing.
queryListener.onQueryComplete(reportPayload);

long totalProcessedBytes = reportPayload.getCounters().copyMap().values().stream()
Contributor

@cryptoe cryptoe Jan 7, 2025

This seems like the wrong place to put this logic.
ingest/processed/bytes seems like an ingestion-only metric, no?
If that is the case, we should emit the metric only if the query is an ingestion query.

You could probably expose a method here https://github.com/apache/druid/blob/9bebe7f1e5ab0f40efbff620769d0413c943683c/extensions-core/multi-stage-query/src/main/java/org/apache/druid/msq/exec/ControllerImpl.java#L517 that emits summary metrics and takes the task report and the query as inputs.

Contributor Author

Moved the logic

Contributor

I think the place where it has moved is correct.
Rather than ingest in the metric name, can we rename the metric to input/processed/bytes or something similar, since we would want that metric for MSQ SELECTs as well?

Also, the MSQ code might need to be adjusted so that only leaf nodes contribute to this metric, no? Otherwise an equivalent batch ingest with range partitioning will show fewer processed bytes, since the shuffle-stage input is not being counted. A simple test should be sufficient to rule this out.

Try a query like `replace bar all using select * from extern(http) partitioned by day clustered by col1`, and an equivalent range-partitioning spec for batch ingestion with the same HTTP input source.
cc @kfaraz

Contributor Author

@neha-ellur neha-ellur Jan 11, 2025

@cryptoe This metric will be used in a fronting UI and should be named ingest/processed/bytes.
Regarding the MSQ code changes for leaf nodes, where would that be? Regarding the test, any pointers to existing tests would be helpful; this is my first time in this area of the code.

Contributor

I am not sure ingest/processed/bytes makes sense for a SELECT query.

MSQ runs using a DAG of stages.

The query definition has a
`private final Map<StageId, StageDefinition> stageDefinitions;`
and each stage definition has a `private final List<InputSpec> inputSpecs;`.
I think the metric makes sense only when the input spec is not a StageInputSpec; only then should we plumb the input bytes into the final summary metric.

A unit test like this can help you debug:

`public void testInsertOnExternalDataSource(String contextName, Map<String, Object> context) throws IOException`

Attach a breakpoint to see the query definition in action.

Hope it helps.
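The leaf-stage filtering suggested in this review can be sketched as follows. InputSpec, StageInputSpec, and the other types here are minimal hypothetical stand-ins for the Druid MSQ classes of the same names, not the real implementations:

```java
import java.util.List;
import java.util.Map;

// Sketch of the "leaf stages only" idea: count input bytes only for stages
// whose inputs are external, i.e. not the output of another stage.
public class LeafStageBytes
{
  interface InputSpec {}
  static class StageInputSpec implements InputSpec {}      // input is another stage's output
  static class ExternalInputSpec implements InputSpec {}   // input is an external source

  static class StageDefinition
  {
    final List<InputSpec> inputSpecs;
    final long inputBytes;

    StageDefinition(List<InputSpec> inputSpecs, long inputBytes)
    {
      this.inputSpecs = inputSpecs;
      this.inputBytes = inputBytes;
    }
  }

  /** Sum input bytes only over stages that read no other stage's output. */
  public static long leafInputBytes(Map<String, StageDefinition> stageDefinitions)
  {
    return stageDefinitions.values().stream()
        .filter(stage -> stage.inputSpecs.stream().noneMatch(spec -> spec instanceof StageInputSpec))
        .mapToLong(stage -> stage.inputBytes)
        .sum();
  }

  public static void main(String[] args)
  {
    // The shuffle stage reads the scan stage's output, so its bytes are excluded.
    Map<String, StageDefinition> stages = Map.of(
        "scan", new StageDefinition(List.of(new ExternalInputSpec()), 1000L),
        "shuffle", new StageDefinition(List.of(new StageInputSpec()), 1000L)
    );
    System.out.println(leafInputBytes(stages)); // prints 1000
  }
}
```

Without the filter, the shuffle stage's re-read of the scan output would be double-counted, which is exactly the discrepancy against batch range partitioning described above.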

@cryptoe
Contributor

cryptoe commented Jan 7, 2025

Also there are some static check failures which need to be looked at.

@neha-ellur
Contributor Author

neha-ellur commented Jan 9, 2025

Also there are some static check failures which need to be looked at.

@cryptoe fixed

@kfaraz
Contributor

kfaraz commented Jan 15, 2025

@neha-ellur, I just found PR #14582.
I wonder if the changes here are even needed, since the ingest/input/bytes metric already contains the processed bytes for index, index_kafka, and some other task types.

We probably just need to wire up things for MSQ tasks.

@kyhtsang

Even though ingest/processed/bytes is the same as ingest/input/bytes for some task types, it isn't for all of them, so we'd still like to emit ingest/processed/bytes.

}

log.debug("Processed bytes[%d] for query[%s].", totalProcessedBytes, querySpec.getQuery());
context.emitMetric("ingest/processed/bytes", totalProcessedBytes);
Contributor

Suggested change
context.emitMetric("ingest/processed/bytes", totalProcessedBytes);
context.emitMetric("ingest/input/bytes", totalProcessedBytes);

@neha-ellur, if this is indeed what is reported in the task reports (i.e., what we meter on, after you confirm the table in the Jira), then we can use this name and either expose it as ingest/processed/bytes in the cube or hide it behind a measure in Detailed Metrics.

Contributor

@kfaraz kfaraz left a comment

Minor comments, rest looks good to me.
Please also verify that the metrics are being emitted correctly for the compact task type.
If possible, please add some unit tests to verify the values of the emitted metric.

IngestionState.COMPLETED,
taskStatus.getErrorMsg(),
segmentsRead,
segmentsPublished
);
final var totalProcessedBytes = indexGenerateRowStats.lhs.get("processedBytes");
Contributor

Since we are going to cast this later in the code anyway, let's just do the cast here and avoid the var.

Suggested change
final var totalProcessedBytes = indexGenerateRowStats.lhs.get("processedBytes");
final Number totalProcessedBytes = (Number) indexGenerateRowStats.lhs.get("processedBytes");
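The suggested change can be illustrated with a small self-contained sketch: cast to Number once at retrieval, then call longValue() when emitting. The rowStats map and the "processedBytes" key mirror the task's row-stats report, but the class and method here are hypothetical stand-ins, not the actual Druid code:

```java
import java.util.Map;

// Sketch of the reviewer's suggestion: a single cast to Number at the
// retrieval site avoids an untyped var and a second cast later.
public class RowStatsExample
{
  public static long processedBytes(Map<String, Object> rowStats)
  {
    // Report values are boxed numbers (Integer or Long); Number covers both.
    final Number totalProcessedBytes = (Number) rowStats.getOrDefault("processedBytes", 0L);
    return totalProcessedBytes.longValue();
  }

  public static void main(String[] args)
  {
    Map<String, Object> stats = Map.of("processedBytes", 123L);
    System.out.println(processedBytes(stats)); // prints 123
  }
}
```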

@neha-ellur
Contributor Author

neha-ellur commented Jan 23, 2025

Please also verify that the metrics are being emitted correctly for the compact task type.

Compact task:
[Three screenshots of compact-task metric emission, taken 2025-01-22, attached]

Contributor

@kfaraz kfaraz left a comment

Looks good to me. Thanks for the contribution, @neha-ellur !

@kfaraz kfaraz merged commit 0aadf1e into apache:master Jan 24, 2025
79 checks passed
Labels
Area - Batch Ingestion Area - Ingestion Area - MSQ For multi stage queries - https://github.com/apache/druid/issues/12262