fastapi: fix wrapping of middlewares #3012
base: main
nice comment and test!
@adriangb Please take a look at the failing tests.
# are handled.
# This should not happen unless there is a bug in OpenTelemetryMiddleware, but if there is we don't want that
# to impact the user's application just because we wrapped the middlewares in this order.
stack = ServerErrorMiddleware(stack)
Just to clarify, is the purpose of this PR to handle exceptions raised by `OpenTelemetryMiddleware` itself (if there is a bug), or unhandled exceptions raised while handling the request?
Yep! If there is a bug in `OpenTelemetryMiddleware`, then without this double wrapping the server might not send a 500 to the client and might instead disconnect abruptly. In practice, I believe every ASGI server I know of also has its own handling of unhandled exceptions, so this would not actually happen, but I'm not sure we want to rely on that. There would also still be a noticeable change (at least different response content, maybe different headers, etc.).
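A framework-free toy sketch of why the extra outer wrapper matters. `error_middleware` is a hypothetical stand-in for `ServerErrorMiddleware` (it sends a 500 only if no response has started and always re-raises, which is my understanding of what Starlette's version does), and `buggy_telemetry` is a hypothetical telemetry middleware with a deliberate bug. Neither is the real class.

```python
import asyncio


def error_middleware(app):
    """Toy stand-in for ServerErrorMiddleware: send a 500 if nothing was sent yet, then re-raise."""
    async def wrapped(scope, receive, send):
        started = False

        async def tracking_send(message):
            nonlocal started
            if message["type"] == "http.response.start":
                started = True
            await send(message)

        try:
            await app(scope, receive, tracking_send)
        except Exception:
            if not started:
                await send({"type": "http.response.start", "status": 500, "headers": []})
                await send({"type": "http.response.body", "body": b"Internal Server Error"})
            raise
    return wrapped


def buggy_telemetry(app):
    """Toy telemetry middleware with a bug: it crashes before ever calling the app."""
    async def wrapped(scope, receive, send):
        raise RuntimeError("bug in the telemetry middleware")
    return wrapped


async def hello_app(scope, receive, send):
    await send({"type": "http.response.start", "status": 200, "headers": []})
    await send({"type": "http.response.body", "body": b"hello"})


async def call(app):
    """Drive one request through `app`; return (messages the client got, exception the server saw)."""
    sent = []

    async def receive():
        return {"type": "http.request", "body": b"", "more_body": False}

    async def send(message):
        sent.append(message)

    try:
        await app({"type": "http"}, receive, send)
    except Exception as exc:
        return sent, exc
    return sent, None


inner = error_middleware(hello_app)  # roughly what the framework builds
single = buggy_telemetry(inner)  # telemetry outermost, nothing around it
double = error_middleware(buggy_telemetry(inner))  # the extra outer wrapper

print(asyncio.run(call(single)))  # nothing sent: what the client sees is up to the server
print(asyncio.run(call(double)))  # a proper 500, and the error is still re-raised for the server to log
```

The outer wrapper changes nothing when the telemetry layer works; it only decides what happens when it does not.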
Since ASGI middleware does not have its own exception handling for requests, wouldn't legitimate exceptions raised from the request itself (which we would want to collect telemetry from) be swallowed by the new middleware? Or does `ServerErrorMiddleware` ONLY handle 500s? Apologies, I'm not very familiar with its inner workings.
`ServerErrorMiddleware` catches unhandled exceptions and turns them into 500 responses. If an exception has made it up that far, it's going to be a 500 response. I'm not sure if that's what you're asking. Maybe you could give an example test case to integrate, and we can see how it behaves.
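A toy sketch of those semantics, which may help with the "does anything get swallowed?" question. It assumes `ServerErrorMiddleware` behaves like the hypothetical `error_middleware` below: any unhandled exception (not only ones that are "already 500s") becomes a 500 if no response has started yet, and the exception is re-raised afterwards, so layers outside it (telemetry, the server) still see it. Responses the app produced itself, such as a 404, pass through untouched.

```python
import asyncio


def error_middleware(app):
    """Toy stand-in for ServerErrorMiddleware (not the real class)."""
    async def wrapped(scope, receive, send):
        started = False

        async def tracking_send(message):
            nonlocal started
            if message["type"] == "http.response.start":
                started = True
            await send(message)

        try:
            await app(scope, receive, tracking_send)
        except Exception:
            if not started:  # a 500 can only be sent if no response has begun
                await send({"type": "http.response.start", "status": 500, "headers": []})
                await send({"type": "http.response.body", "body": b"Internal Server Error"})
            raise  # always re-raised: nothing is swallowed
    return wrapped


async def fails_early(scope, receive, send):
    raise ValueError("raised before any response")


async def fails_late(scope, receive, send):
    await send({"type": "http.response.start", "status": 200, "headers": []})
    raise ValueError("raised mid-response")


async def handles_own_error(scope, receive, send):
    # An error the endpoint turned into a response itself never reaches the middleware.
    await send({"type": "http.response.start", "status": 404, "headers": []})
    await send({"type": "http.response.body", "body": b"not found"})


async def call(endpoint):
    """Return (statuses the client saw, name of the exception the server saw or None)."""
    statuses = []

    async def receive():
        return {"type": "http.request", "body": b"", "more_body": False}

    async def send(message):
        if message["type"] == "http.response.start":
            statuses.append(message["status"])

    try:
        await error_middleware(endpoint)({"type": "http"}, receive, send)
    except Exception as exc:
        return statuses, type(exc).__name__
    return statuses, None


for endpoint in (fails_early, fails_late, handles_own_error):
    print(endpoint.__name__, asyncio.run(call(endpoint)))
```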
fixes #795
e406be8 to 3d8dd4f
@@ -170,7 +170,8 @@ def setUp(self):
self._instrumentor = otel_fastapi.FastAPIInstrumentor()
self._app = self._create_app()
self._app.add_middleware(HTTPSRedirectMiddleware)
self._client = TestClient(self._app)
self._client = TestClient(self._app, base_url="https://testserver:443")
This was the root cause of the test failures. It turns out every request was being made twice because it was getting redirected to https. Before this PR the redirect wasn't being instrumented correctly, so this was never caught! I think that's another major bug this PR fixes.
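A toy sketch of the doubled requests: a redirect middleware in the spirit of `HTTPSRedirectMiddleware`, a client that follows redirects (the `TestClient` default, as far as I know), and an instrumentation layer around the whole stack that counts calls. All three are hypothetical stand-ins; only the two base URLs come from the diff.

```python
import asyncio
from urllib.parse import urlsplit


def https_redirect(app):
    """Toy stand-in for HTTPSRedirectMiddleware: answer plain-http requests with a 307 to https."""
    async def wrapped(scope, receive, send):
        if scope["scheme"] == "http":
            location = f"https://{scope['server'][0]}{scope['path']}"
            await send({"type": "http.response.start", "status": 307,
                        "headers": [(b"location", location.encode())]})
            await send({"type": "http.response.body", "body": b""})
            return
        await app(scope, receive, send)
    return wrapped


def counting_telemetry(app, spans):
    """Toy instrumentation wrapping the whole stack: one 'span' per ASGI call."""
    async def wrapped(scope, receive, send):
        spans.append(scope["scheme"])
        await app(scope, receive, send)
    return wrapped


async def ok_app(scope, receive, send):
    await send({"type": "http.response.start", "status": 200, "headers": []})
    await send({"type": "http.response.body", "body": b"ok"})


async def get(app, url, max_redirects=5):
    """Tiny client that follows redirects and returns the final status code."""
    for _ in range(max_redirects + 1):
        parts = urlsplit(url)
        scope = {"type": "http", "scheme": parts.scheme, "path": parts.path or "/",
                 "server": (parts.hostname, parts.port)}
        sent = []

        async def receive():
            return {"type": "http.request", "body": b"", "more_body": False}

        async def send(message):
            sent.append(message)

        await app(scope, receive, send)
        start = sent[0]
        if start["status"] in (301, 302, 303, 307, 308):
            url = dict(start["headers"])[b"location"].decode()
            continue
        return start["status"]
    raise RuntimeError("too many redirects")


for base_url in ("http://testserver", "https://testserver:443"):
    spans = []
    status = asyncio.run(get(counting_telemetry(https_redirect(ok_app), spans), base_url + "/foobar"))
    print(base_url, status, spans)
```

With the plain-http base URL one logical request produces two recorded calls, which is exactly the kind of mismatch that shows up once the instrumentation wraps the redirect middleware too.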
@xrmx see #3012 (comment)
@Kludex could you update the Starlette instrumentations after we finish up this PR? I'm assuming they have the same bugs.
The pipeline has been running for some hours... 👀
app = ServerErrorMiddleware(app)
return app

app._original_build_middleware_stack = app.build_middleware_stack
Question: do you think it would be possible to use wrapt for monkeypatching instead?
I'd rather not. This is simpler and works just fine.
@xrmx genuine question: what would be the benefit of that in this case?
> @xrmx genuine question: what would be the benefit of that in this case?

wrapt monkeypatching tends to work better than shuffling classes under the hood. I had to move the httpx instrumentation to wrapt because otherwise it did not patch anything that had been loaded before the instrumentation itself.
Unless there's a specific reason, I don't think introducing more complexity and more code to execute is helpful here. There are plenty of tests checking that the monkeypatching works correctly. This is also not a method users would ever call; in fact, as someone responsible for significant changes to this very method, I would have made it private if it had not already been public for years.
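A minimal sketch of the plain-assignment approach being defended here, on a hypothetical `ToyApp` class rather than the real `Starlette`/`FastAPI` classes. Only the `_original_build_middleware_stack` attribute name comes from the diff; the wrapper body, the idempotency check and `uninstrument_app` are illustrative assumptions, and the "stack" is just a list of names.

```python
import types


class ToyApp:
    """Hypothetical stand-in for a Starlette app; not the real class."""

    def build_middleware_stack(self):
        return ["ServerErrorMiddleware", "user middlewares", "router"]


def instrument_app(app):
    """Patch one instance: keep the original bound method, swap in a wrapper."""
    if "_original_build_middleware_stack" in vars(app):
        return  # already instrumented (assumed behaviour)
    app._original_build_middleware_stack = app.build_middleware_stack

    def build_middleware_stack(self):
        inner = self._original_build_middleware_stack()
        # outer error handler -> telemetry -> everything the framework built
        return ["ServerErrorMiddleware", "OpenTelemetryMiddleware"] + inner

    app.build_middleware_stack = types.MethodType(build_middleware_stack, app)


def uninstrument_app(app):
    if "_original_build_middleware_stack" in vars(app):
        del app._original_build_middleware_stack
        del app.build_middleware_stack  # the class's own method is visible again


app = ToyApp()
instrument_app(app)
print(app.build_middleware_stack())
uninstrument_app(app)
print(app.build_middleware_stack())
```

Because the patch lives on the instance, other app instances and the class itself stay untouched. One caveat: a patch like this only has an effect if it is in place before the app builds its middleware stack for the first time.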
instrumentation/opentelemetry-instrumentation-fastapi/tests/test_fastapi_instrumentation.py
Hello! Just wondering if you'd expect this to enable getting the […]

This has been discussed in more detail in open-telemetry/opentelemetry-python#3477, but currently `OpenTelemetryMiddleware`'s context seems to be removed by the time an exception gets to the […]

It would be great to enable users to report back a […]

[Update]: I have now verified this achieves the above behaviour. This fix is greatly appreciated!
Fixed an issue with the test shutdown hanging. All checks are passing now.
I trust you on the FastAPI knowledge, but it would be nice to add a test checking that the change does what it's supposed to fix.
Oops, looks like that got lost; I had added it in 4220b09.
Co-authored-by: Riccardo Magliocchetti <[email protected]>
7497d32 to f825c76
Added it back and updated the branch.
.../opentelemetry-instrumentation-fastapi/src/opentelemetry/instrumentation/fastapi/__init__.py
ICYMI, the added test is red.
Thanks. I hadn't understood how the tests are parametrized and that…
# run the lifespan, initialize the middleware stack
# this is more in-line with what happens in a real application when the server starts up
self._exit_stack = ExitStack()
self._exit_stack.enter_context(self._client)
FYI: the `build_middleware_stack` method this PR patches is called on the startup event, which the `TestClient` only runs when it is used as a context manager. That's why @adriangb is using `ExitStack` here.
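A toy sketch of why the timing matters. As far as I can tell from Starlette's source, `Starlette.__call__` builds the stack lazily (`if self.middleware_stack is None: self.middleware_stack = self.build_middleware_stack()`) and then reuses it, so whatever triggers the first ASGI call (normally the lifespan startup) decides which stack is used. `LazyApp` and `patch` are hypothetical and only mirror that pattern.

```python
import asyncio


class LazyApp:
    """Hypothetical toy mirroring the lazy build; not the real Starlette class."""

    def __init__(self):
        self.middleware_stack = None
        self.served_by = []  # which stack handled each ASGI call

    def build_middleware_stack(self):
        return "plain stack"

    async def __call__(self, scope, receive, send):
        if self.middleware_stack is None:  # built once, on the first call
            self.middleware_stack = self.build_middleware_stack()
        self.served_by.append(self.middleware_stack)


def patch(app):
    """Hypothetical instrumentation: swap the builder on this instance."""
    app.build_middleware_stack = lambda: "instrumented stack"


async def receive():  # unused by this toy
    return {"type": "lifespan.startup"}


async def send(message):  # unused by this toy
    pass


async def startup_then_request(app):
    await app({"type": "lifespan"}, receive, send)  # startup: the stack gets built here
    await app({"type": "http"}, receive, send)  # a later request reuses the cached stack


early = LazyApp()
patch(early)  # instrumented before startup
asyncio.run(startup_then_request(early))

late = LazyApp()
asyncio.run(late({"type": "lifespan"}, receive, send))
patch(late)  # instrumented after startup: too late, the stack is cached
asyncio.run(late({"type": "http"}, receive, send))

print(early.served_by)  # ['instrumented stack', 'instrumented stack']
print(late.served_by)  # ['plain stack', 'plain stack']
```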
MRE:
Run this and you'll see that there is no `"http.status_code": 500` in the logs. With this change it shows up properly.
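A framework-free toy sketch of the before/after effect described here. Assumption: before this PR the telemetry middleware sat inside Starlette's single `ServerErrorMiddleware` (as a user middleware), and after it, it sits between an outer and an inner one. `error_middleware` and `recording_telemetry` are hypothetical stand-ins, not the real classes, and the "span" is just a dict.

```python
import asyncio


def error_middleware(app):
    """Toy stand-in for ServerErrorMiddleware: send a 500 if nothing was sent yet, then re-raise."""
    async def wrapped(scope, receive, send):
        started = False

        async def tracking_send(message):
            nonlocal started
            if message["type"] == "http.response.start":
                started = True
            await send(message)

        try:
            await app(scope, receive, tracking_send)
        except Exception:
            if not started:
                await send({"type": "http.response.start", "status": 500, "headers": []})
                await send({"type": "http.response.body", "body": b"Internal Server Error"})
            raise
    return wrapped


def recording_telemetry(app, span):
    """Toy telemetry middleware: copies the response status and any exception into `span`."""
    async def wrapped(scope, receive, send):
        async def recording_send(message):
            if message["type"] == "http.response.start":
                span["http.status_code"] = message["status"]
            await send(message)

        try:
            await app(scope, receive, recording_send)
        except Exception as exc:
            span["exception"] = type(exc).__name__
            raise
    return wrapped


async def failing_endpoint(scope, receive, send):
    raise RuntimeError("boom")


async def serve(stack):
    """Toy server: run one request and swallow the re-raised error (a real server would log it)."""
    async def receive():
        return {"type": "http.request", "body": b"", "more_body": False}

    async def send(message):
        pass

    try:
        await stack({"type": "http"}, receive, send)
    except RuntimeError:
        pass


before, after = {}, {}
# Before: telemetry sits inside the only error handler, so it sees the exception but no response.
asyncio.run(serve(error_middleware(recording_telemetry(failing_endpoint, before))))
# After: an inner error handler produces the 500 within the telemetry layer.
asyncio.run(serve(error_middleware(recording_telemetry(error_middleware(failing_endpoint), after))))

print(before)  # {'exception': 'RuntimeError'}
print(after)  # {'http.status_code': 500, 'exception': 'RuntimeError'}
```

In both cases the client gets a 500; only in the second does the telemetry layer see it go out.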