pageserver: add page_trace API for debugging #10293

Draft
wants to merge 4 commits into base: main

Conversation

jcsp
Collaborator

@jcsp jcsp commented Jan 7, 2025

Problem

When a pageserver is receiving high rates of requests, we don't have a good way to efficiently discover what the client's access pattern is.

Closes: #10275

Summary of changes

  • Add /v1/tenant/x/timeline/y/page_trace?size_limit_bytes=...&time_limit_secs=... API, which returns a binary buffer. A tool to decode and report on the output will follow separately.
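
As a rough illustration of the endpoint's shape, the sketch below builds the request path from its parts. The helper name `page_trace_path` is hypothetical and not part of the PR. It assumes both query parameters are optional, which the `unwrap_or` defaults in the handler quoted in this review (1 MiB and 5 seconds) suggest.

```rust
/// Hypothetical helper: builds the request path for the page trace endpoint.
/// Passing `None` omits a parameter so that the server-side default applies.
fn page_trace_path(
    tenant_id: &str,
    timeline_id: &str,
    size_limit_bytes: Option<u64>,
    time_limit_secs: Option<u64>,
) -> String {
    let mut path = format!("/v1/tenant/{tenant_id}/timeline/{timeline_id}/page_trace");
    let mut params = Vec::new();
    if let Some(size) = size_limit_bytes {
        params.push(format!("size_limit_bytes={size}"));
    }
    if let Some(secs) = time_limit_secs {
        params.push(format!("time_limit_secs={secs}"));
    }
    if !params.is_empty() {
        path.push('?');
        path.push_str(&params.join("&"));
    }
    path
}
```

The response body is a binary buffer, so a caller would save it to a file rather than print it.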

Contributor

@erikgrinaker erikgrinaker left a comment

LGTM.

.await?;

let (page_trace, mut trace_rx) = PageTrace::new(event_limit);
timeline.page_trace.store(Arc::new(Some(page_trace)));
Contributor

Should this error if there's already a trace in progress?
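
One way to reject a second trace is sketched below. It swaps the `store(Arc::new(...))` call in the diff for a plain `std::sync::Mutex<Option<T>>`, purely to keep the example self-contained. `TraceSlot`, `TraceInProgress`, `try_start` and `stop` are hypothetical names, not part of the PR.

```rust
use std::sync::Mutex;

/// Hypothetical stand-in for the timeline's `page_trace` slot.
struct TraceSlot<T> {
    active: Mutex<Option<T>>,
}

/// Error returned when a trace is already running on this slot.
#[derive(Debug, PartialEq)]
struct TraceInProgress;

impl<T> TraceSlot<T> {
    fn new() -> Self {
        Self { active: Mutex::new(None) }
    }

    /// Installs `trace` only if the slot is empty; otherwise reports a conflict.
    fn try_start(&self, trace: T) -> Result<(), TraceInProgress> {
        let mut guard = self.active.lock().unwrap();
        if guard.is_some() {
            return Err(TraceInProgress);
        }
        *guard = Some(trace);
        Ok(())
    }

    /// Clears the slot so that a later request can start a new trace.
    fn stop(&self) -> Option<T> {
        self.active.lock().unwrap().take()
    }
}
```

A handler built on this would map `TraceInProgress` to a conflict-style HTTP error and call `stop` on every exit path.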

Comment on lines +1570 to +1571
// Above code is infallible, so we guarantee to switch the trace off when done
timeline.page_trace.store(Arc::new(None));
Contributor

nit: we could also stream to the client, and cancel if the client goes away.

pub(crate) fn new(
    size_limit: u64,
) -> (Self, tokio::sync::mpsc::UnboundedReceiver<PageTraceEvent>) {
    let (trace_tx, trace_rx) = tokio::sync::mpsc::unbounded_channel();
Contributor

nit: we could also use a buffered channel with the max size here, to avoid the size accounting.
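
A minimal sketch of the bounded-channel idea follows. It uses `std::sync::mpsc::sync_channel` so that it runs without an async runtime; in the handler, the counterpart would presumably be `tokio::sync::mpsc::channel(capacity)` with `try_send`. The `Event` type and both function names are hypothetical.

```rust
use std::sync::mpsc::{sync_channel, Receiver, SyncSender, TrySendError};

/// Hypothetical event type; the real `PageTraceEvent` carries a key, an LSN and a timestamp.
#[derive(Debug, PartialEq)]
struct Event(u64);

/// Creates a channel that holds at most `max_events` queued events.
fn bounded_trace_channel(max_events: usize) -> (SyncSender<Event>, Receiver<Event>) {
    sync_channel(max_events)
}

/// Records an event without blocking the request path.
/// Returns `false` when the event was dropped because the channel is full or closed.
fn record(tx: &SyncSender<Event>, event: Event) -> bool {
    match tx.try_send(event) {
        Ok(()) => true,
        Err(TrySendError::Full(_)) | Err(TrySendError::Disconnected(_)) => false,
    }
}
```

The channel capacity then bounds the number of buffered events, so the sender needs no separate byte counter. The trade-off is that the limit is expressed in events rather than bytes.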

github-actions bot commented Jan 7, 2025

7227 tests run: 6875 passed, 0 failed, 352 skipped (full report)


Flaky tests (1)

Postgres 14

Code coverage* (full report)

  • functions: 31.2% (8411 of 26998 functions)
  • lines: 47.9% (66784 of 139358 lines)

* collected from Rust tests only


The comment gets automatically updated with the latest test results
6bf00a6 at 2025-01-07T15:04:52.449Z

Contributor

@problame problame left a comment

Neat!

I think this is safe to deploy, barring the check_permission problem.

Nits can be addressed in a follow-up.

Comment on lines +1526 to +1529
async fn timeline_page_trace_handler(
    request: Request<Body>,
    _cancel: CancellationToken,
) -> Result<Response<Body>, ApiError> {
Contributor

check_permission is missing


let size_limit =
    parse_query_param::<_, u64>(&request, "size_limit_bytes")?.unwrap_or(1024 * 1024);
let time_limit_secs = parse_query_param::<_, u64>(&request, "time_limit_secs")?.unwrap_or(5);
Contributor

nit: Why not parse a humantime::Duration?

Comment on lines +1552 to +1568
loop {
    let timeout = deadline.saturating_duration_since(Instant::now());
    tokio::select! {
        event = trace_rx.recv() => {
            buffer.extend(bincode::serialize(&event).unwrap());

            if buffer.len() >= size_limit as usize {
                // Size threshold reached
                break;
            }
        }
        _ = tokio::time::sleep(timeout) => {
            // Time threshold reached
            break;
        }
    }
}
Contributor

nit: instead of repeating select!() on every iteration, I think it's better style to declare one async block that runs loop { trace_rx.recv().await; }, then poll that block inside a timeout.
Roughly like so:

tokio::time::timeout(Duration::from_secs(time_limit_secs), async {
    loop {
        let event = trace_rx.recv().await;
        ...
    }
}).await;

Comment on lines +1555 to +1556
event = trace_rx.recv() => {
buffer.extend(bincode::serialize(&event).unwrap());
Contributor

I first thought event is always Some(), but it isn't if this handler is called concurrently on the same timeline.

We should

  1. only write the Some() value to the buffer and
  2. bail out of the loop as soon as recv() returns None
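
The two points above can be sketched with a synchronous analogue of the handler's loop. This is an illustration under stated assumptions, not the PR's code: `std::sync::mpsc` with `recv_timeout` stands in for the tokio channel and `select!`, events arrive as pre-encoded byte vectors instead of being serialized with bincode, and `collect_trace` and `StopReason` are hypothetical names.

```rust
use std::sync::mpsc::{Receiver, RecvTimeoutError};
use std::time::{Duration, Instant};

/// Why the collection loop stopped.
#[derive(Debug, PartialEq)]
enum StopReason {
    SizeLimit,
    TimeLimit,
    SenderGone,
}

/// Appends received events to a buffer until a limit is hit or the sender goes away.
fn collect_trace(
    rx: &Receiver<Vec<u8>>,
    size_limit: usize,
    time_limit: Duration,
) -> (Vec<u8>, StopReason) {
    let deadline = Instant::now() + time_limit;
    let mut buffer = Vec::new();
    loop {
        let remaining = deadline.saturating_duration_since(Instant::now());
        match rx.recv_timeout(remaining) {
            // Only successfully received events are written to the buffer.
            Ok(event) => {
                buffer.extend_from_slice(&event);
                if buffer.len() >= size_limit {
                    return (buffer, StopReason::SizeLimit);
                }
            }
            Err(RecvTimeoutError::Timeout) => return (buffer, StopReason::TimeLimit),
            // The sender was dropped: stop immediately instead of spinning.
            Err(RecvTimeoutError::Disconnected) => return (buffer, StopReason::SenderGone),
        }
    }
}
```

Returning the stop reason is a design choice for the sketch; it makes it easy to tell a full buffer from a trace that was cut short because the sender disappeared.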

Contributor

This is going to busy-loop if the timeline is dropped, but that seems fine to deploy temporarily for now.

Comment on lines +25 to +29
let event_size = bincode::serialized_size(&PageTraceEvent {
    key: (0 as i128).into(),
    effective_lsn: Lsn(0),
    time: SystemTime::now(),
})?;
Contributor

You're deserializing PageTraceEvent here, but we need to deserialize Option<PageTraceEvent> with the current impl.

cf #10293 (comment)

Contributor

@erikgrinaker erikgrinaker Jan 7, 2025

I think we're getting away with it because of the event_size + 1 below. But yeah, we have to decode the actual bytes as Option for now to get the proper values.
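
To make the `event_size + 1` remark concrete, here is a hand-rolled sketch of a tagged record layout. It assumes, as that remark suggests, that a serialized `Option` adds a single leading tag byte ahead of the payload. It does not call bincode. The 8-byte payload and the names `PAYLOAD_SIZE`, `encode_optional` and `decode_records` are hypothetical and do not reflect the real `PageTraceEvent` layout.

```rust
/// Size of the hypothetical fixed-width event payload used in this sketch.
const PAYLOAD_SIZE: usize = 8;

/// Encodes an optional payload as a one-byte tag, followed by the payload when present.
fn encode_optional(payload: Option<[u8; PAYLOAD_SIZE]>) -> Vec<u8> {
    match payload {
        None => vec![0],
        Some(bytes) => {
            let mut out = Vec::with_capacity(1 + PAYLOAD_SIZE);
            out.push(1);
            out.extend_from_slice(&bytes);
            out
        }
    }
}

/// Decodes a buffer of tagged records, stopping at the first tag that is not 1
/// (which includes the `None` tag) or at a truncated record.
fn decode_records(mut buf: &[u8]) -> Vec<[u8; PAYLOAD_SIZE]> {
    let mut events = Vec::new();
    while let Some((&tag, rest)) = buf.split_first() {
        if tag != 1 || rest.len() < PAYLOAD_SIZE {
            break;
        }
        let (payload, tail) = rest.split_at(PAYLOAD_SIZE);
        let mut event = [0u8; PAYLOAD_SIZE];
        event.copy_from_slice(payload);
        events.push(event);
        buf = tail;
    }
    events
}
```

Reading fixed chunks of payload size plus one keeps record boundaries aligned, but the tag byte still has to be consumed before the payload is interpreted, which is why decoding as the optional type gives the proper values.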

return Err(e.into());
}
}
let event = bincode::deserialize::<PageTraceEvent>(&event_bytes)?;
Contributor

Same here.

Development

Successfully merging this pull request may close these issues.

Access pattern observation in keyspace ("pagetrace")
3 participants