pageserver: add page_trace API for debugging #10293
base: main
Conversation
LGTM.
    .await?;

let (page_trace, mut trace_rx) = PageTrace::new(event_limit);
timeline.page_trace.store(Arc::new(Some(page_trace)));
Should this error if there's already a trace in progress?
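One possible guard, as a minimal sketch: it assumes timeline.page_trace is an ArcSwap<Option<PageTrace>> (as the store() calls above suggest) and that ApiError has a Conflict-style variant; neither is confirmed by this diff.

if timeline.page_trace.load().is_some() {
    // A trace is already running; reject rather than silently replacing it.
    // (Load-then-store leaves a small race window, likely acceptable for a debug API.)
    return Err(ApiError::Conflict("page trace already in progress".into()));
}
let (page_trace, mut trace_rx) = PageTrace::new(event_limit);
timeline.page_trace.store(Arc::new(Some(page_trace)));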
// Above code is infallible, so we guarantee to switch the trace off when done
timeline.page_trace.store(Arc::new(None));
nit: we could also stream to the client, and cancel if the client goes away.
pub(crate) fn new(
    size_limit: u64,
) -> (Self, tokio::sync::mpsc::UnboundedReceiver<PageTraceEvent>) {
    let (trace_tx, trace_rx) = tokio::sync::mpsc::unbounded_channel();
nit: we could also use a buffered channel with the max size here, to avoid the size accounting.
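A rough sketch of that alternative; deriving the capacity from a precomputed event_size is my assumption, not something in the diff:

// A bounded channel sized in events rather than bytes: once it is full, try_send()
// on the producer side fails and the event is dropped, so the reader no longer
// needs to do its own byte accounting on the serialized buffer.
let max_events = (size_limit / event_size).max(1) as usize;
let (trace_tx, trace_rx) = tokio::sync::mpsc::channel::<PageTraceEvent>(max_events);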
7227 tests run: 6875 passed, 0 failed, 352 skipped (full report)
Code coverage* (full report)
* collected from Rust tests only
The comment gets automatically updated with the latest test results.
6bf00a6 at 2025-01-07T15:04:52.449Z
Neat!
I think this is safe to deploy, barring the check_permission problem. Nits can be addressed in a follow-up.
async fn timeline_page_trace_handler(
    request: Request<Body>,
    _cancel: CancellationToken,
) -> Result<Response<Body>, ApiError> {
check_permission is missing
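A sketch of the missing check, assuming this handler follows the same pattern as the other timeline handlers (the exact parse_request_param/check_permission call shape is an assumption, not copied from the diff):

let tenant_shard_id: TenantShardId = parse_request_param(&request, "tenant_shard_id")?;
// Reject tokens that are not scoped to this tenant.
check_permission(&request, Some(tenant_shard_id.tenant_id))?;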
let size_limit =
    parse_query_param::<_, u64>(&request, "size_limit_bytes")?.unwrap_or(1024 * 1024);
let time_limit_secs = parse_query_param::<_, u64>(&request, "time_limit_secs")?.unwrap_or(5);
nit: Why not parse a humantime::Duration?
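For illustration, a minimal sketch of parsing such a value with humantime (whether parse_query_param's trait bounds would accept humantime::Duration directly is an assumption):

fn parse_time_limit(raw: &str) -> Result<std::time::Duration, humantime::DurationError> {
    // Accepts human-friendly values like "5s", "500ms", or "2m30s".
    Ok(raw.parse::<humantime::Duration>()?.into())
}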
loop {
    let timeout = deadline.saturating_duration_since(Instant::now());
    tokio::select! {
        event = trace_rx.recv() => {
            buffer.extend(bincode::serialize(&event).unwrap());

            if buffer.len() >= size_limit as usize {
                // Size threshold reached
                break;
            }
        }
        _ = tokio::time::sleep(timeout) => {
            // Time threshold reached
            break;
        }
    }
}
nit: instead of doing a repeated select!(), I think it's better style to declare one async block that does the loop { trace_rx.recv().await; }, then poll that block inside a timeout. Roughly like so:

tokio::time::timeout(time_limit_secs, async {
    loop {
        let event = trace_rx.recv().await;
        ...
    }
}).await;
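A slightly fuller sketch of that shape; buffer, trace_rx, size_limit and time_limit_secs are assumed to be in scope from the surrounding handler:

let _ = tokio::time::timeout(std::time::Duration::from_secs(time_limit_secs), async {
    // recv() returning None means the sender was dropped; stop collecting either way.
    while let Some(event) = trace_rx.recv().await {
        buffer.extend(bincode::serialize(&event).unwrap());
        if buffer.len() >= size_limit as usize {
            break; // Size threshold reached
        }
    }
})
.await; // An Err(Elapsed) here simply means the time limit was hit first.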
event = trace_rx.recv() => {
    buffer.extend(bincode::serialize(&event).unwrap());
I first thought event is always Some(..), but it isn't if this handler is called concurrently on the same timeline.
We should
- only write the Some(..) value to the buffer, and
- bail out of the loop as soon as recv() returns None
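A sketch of what that could look like inside the existing select! arm (structure assumed from the diff above):

event = trace_rx.recv() => {
    match event {
        Some(event) => {
            // Only serialize the event itself, not the Option wrapper.
            buffer.extend(bincode::serialize(&event).unwrap());
            if buffer.len() >= size_limit as usize {
                break; // Size threshold reached
            }
        }
        // The sender was dropped (e.g. a concurrent trace replaced this one); stop collecting.
        None => break,
    }
}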
This is going to busyloop if the timeline is dropped, but seems fine to deploy temporarily for now.
let event_size = bincode::serialized_size(&PageTraceEvent {
    key: (0 as i128).into(),
    effective_lsn: Lsn(0),
    time: SystemTime::now(),
})?;
You're deserializing PageTraceEvent here, but we need to deserialize Option<PageTraceEvent> with the current impl.
I think we're getting away with it because of the event_size + 1 below. But yeah, we have to decode the actual bytes as Option for now to get the proper values.
        return Err(e.into());
    }
}
let event = bincode::deserialize::<PageTraceEvent>(&event_bytes)?;
Same here.
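As an interim workaround on the decode side (a sketch only, inside the existing read loop; the follow-up may instead change what the writer serializes), the buffer could be decoded as the Option it currently contains:

let Some(event) = bincode::deserialize::<Option<PageTraceEvent>>(&event_bytes)? else {
    // A None entry means the sender was dropped mid-trace; nothing more to read.
    break;
};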
Problem
When a pageserver is receiving high rates of requests, we don't have a good way to efficiently discover what the client's access pattern is.
Closes: #10275
Summary of changes
Adds a /v1/tenant/x/timeline/y/page_trace?size_limit_bytes=...&time_limit_secs=... API, which returns a binary buffer. A tool to decode and report on the output will follow separately.