Make sure to call close() on Scope returned from io.opentelemetry.context.Context.makeCurrent() #924

joost-de-vries · 2024-12-30T10:21:55Z

The documentation for io.opentelemetry.context.Context.makeCurrent() states:
"Every makeCurrent() must be followed by a Scope#close(). Breaking these rules may lead to memory leaks and incorrect scoping."

Running zio-opentelemetry with -Dio.opentelemetry.context.enableStrictContext=true leads to this error:

Dec 30, 2024 11:08:42 AM io.opentelemetry.context.StrictContextStorage$PendingScopes run
SEVERE: Scope garbage collected before being closed.
java.lang.AssertionError: Thread [ZScheduler-Worker-2] opened a scope of OtelContext{currentSpan=datadog.opentelemetry.shim.trace.OtelSpan$NoopSpanContext@1f7a631e, rootSpan=datadog.opentelemetry.shim.trace.OtelSpan$NoopSpanContext@1f7a631e} here:
	at datadog.opentelemetry.shim.context.OtelContext.makeCurrent(OtelContext.java:90)
	at zio.telemetry.opentelemetry.context.ContextStorage$Native$.set$$anonfun$1(ContextStorage.scala:63)
	at zio.internal.FiberRuntime.runLoop(FiberRuntime.scala:1027)
	at zio.internal.FiberRuntime.runLoop(FiberRuntime.scala:1063)
	at zio.internal.FiberRuntime.runLoop(FiberRuntime.scala:1090)
	at zio.internal.FiberRuntime.evaluateEffect(FiberRuntime.scala:412)
	at zio.internal.FiberRuntime.evaluateMessageWhileSuspended(FiberRuntime.scala:488)
	at zio.internal.FiberRuntime.drainQueueOnCurrentThread(FiberRuntime.scala:249)
	at zio.internal.FiberRuntime.run(FiberRuntime.scala:137)
	at zio.internal.ZScheduler$$anon$3.run(ZScheduler.scala:380)

This PR uses zio.Scope to make sure that the otel Scope returned by Context.makeCurrent is always closed.

CLAassistant · 2024-12-30T10:22:01Z

All committers have signed the CLA.

grouzen · 2024-12-30T11:50:11Z

@joost-de-vries Thanks for spotting this! I'll look into it when I have time and tell you what I think about your solution. The only concern I have is that these changes are not backward-compatible.

joost-de-vries · 2024-12-30T13:18:13Z

@grouzen thank you for your quick response.

Yes, too bad that the Baggage API is affected. I couldn't think of a solution that avoids it. Maybe you have an idea?

I've tested this fix on Datadog.
In our code we call Tracing.setAttribute("http.status_code", response.status.code.toString) after an HTTP response.
That status code was always missing from the span.
When I deployed this fix it started showing up. You can see the exact moment of the deploy.

joost-de-vries · 2024-12-30T13:31:35Z

There's still a failing assertion: java.lang.IllegalStateException: Thread [ZScheduler-Worker-7] opened scope, but thread [ZScheduler-Worker-3] closed it
The assertion fires at the end of the second argument (the zio effect) of Tracing.root(...)(zio).
So the otel Scope is getting closed, but from a different thread.
I'm not sure whether that is an issue for ZIO code or not. The otel strict check assumes it is called from blocking Java code, where a scope is opened and closed on the same thread.
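
To make the cross-thread close concrete, here is a minimal sketch in plain Java. It does not use the OpenTelemetry API: the `CrossThreadClose` class, its `Scope` interface and its `makeCurrent` method are hypothetical stand-ins that model a ThreadLocal-backed context storage, and the two single-thread executors stand in for two scheduler workers. Under those assumptions, closing a scope from another thread restores the "previous" value on the closing thread, while the thread that opened the scope keeps seeing the new context.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class CrossThreadClose {
    // Hypothetical stand-in for an OpenTelemetry-style Scope; close() throws nothing.
    public interface Scope extends AutoCloseable {
        @Override
        void close();
    }

    // Hypothetical ThreadLocal-backed storage; every thread starts at "root".
    static final ThreadLocal<String> CURRENT = ThreadLocal.withInitial(() -> "root");

    // Sets the context on the calling thread. The returned scope restores the
    // previous value on whichever thread happens to call close().
    static Scope makeCurrent(String ctx) {
        String previous = CURRENT.get();
        CURRENT.set(ctx);
        return () -> CURRENT.set(previous);
    }

    // Opens a scope on worker A, closes it on worker B, then reports what each sees.
    public static String demo() {
        ExecutorService workerA = Executors.newSingleThreadExecutor();
        ExecutorService workerB = Executors.newSingleThreadExecutor();
        try {
            Callable<Scope> open = () -> makeCurrent("span");
            Scope scope = workerA.submit(open).get();

            Runnable close = scope::close;
            workerB.submit(close).get();

            Callable<String> read = () -> CURRENT.get();
            String seenByA = workerA.submit(read).get();
            String seenByB = workerB.submit(read).get();
            return "A=" + seenByA + ", B=" + seenByB;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            workerA.shutdown();
            workerB.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

In this model worker A still reports "span" after the close, which is the kind of stale thread-local state the strict check is meant to catch. Whether the same happens with ZIO fibers depends on how the context is propagated between workers, which this sketch does not model.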

grouzen · 2024-12-30T15:56:10Z

Now I realize that it is required to change the Baggage API this way to reflect the .makeCurrent semantics.

grouzen · 2024-12-30T20:52:56Z

@joost-de-vries Please pull the changes from series/2.x branch to fix the CI error.

joost-de-vries · 2025-01-02T07:41:45Z

@grouzen I wonder how to test the fix. One way would be to mock the Java OpenTelemetry Context.makeCurrent and check that close() is called.

joost-de-vries · 2025-01-02T08:36:01Z

Thinking about the changed API of Baggage: I guess I can introduce a new trait and add deprecation warnings to the existing one.
Of course that raises the question of what to call the new trait, and what to call the alternative to the OpenTelemetry function

  def baggage(logAnnotated: Boolean = false): URLayer[ContextStorage, Baggage] =
    Baggage.live(logAnnotated)

I guess we could call the existing trait BaggageDeprecated.

grouzen · 2025-01-02T14:15:33Z

> Thinking about the changed api of Baggage: I guess I can introduce a new trait. And add deprecation warnings to the existing. Of course that raises the question what to call the new trait. And what to call the alternative to the OpenTelemetry function
>   def baggage(logAnnotated: Boolean = false): URLayer[ContextStorage, Baggage] =
>     Baggage.live(logAnnotated)
> I guess we could call the existing trait BaggageDeprecated

Frankly, I'm still thinking about how to avoid having a scoped Baggage API. I need to find some time to read the OTEL Baggage spec. I assume it is fine to have nested OTEL scopes when dealing with baggage data because, as far as I remember, according to the spec the baggage data is stored in the trace context rather than in the span, which makes perfect sense to me.
Sorry for this stream of consciousness if it doesn't make any sense ;) I just don't have enough time at the moment to dig deeply into it.

grouzen · 2025-01-02T14:17:05Z

> @grouzen I wonder how to test the fix. One way would be to mock the Java OpenTelemetry Context.makeCurrent. And check that close() is called.

Sounds good. I think it is a sane way of testing this kind of stuff.

joost-de-vries · 2025-01-03T06:26:26Z

> > Thinking about the changed api of Baggage: I guess I can introduce a new trait. And add deprecation warnings to the existing. Of course that raises the question what to call the new trait. And what to call the alternative to the OpenTelemetry function
> >   def baggage(logAnnotated: Boolean = false): URLayer[ContextStorage, Baggage] =
> >     Baggage.live(logAnnotated)
> > I guess we could call the existing trait BaggageDeprecated
>
> Frankly, I'm still thinking about how to avoid having scoped baggage API. I need to find some time to read the Baggage OTEL spec. I assume it is fine to have nested OTEL scopes when dealing with baggage data because, as far as I remember, according to the spec, it stores the baggage data in the trace context rather than the span, which makes perfect sense to me. Sorry for this stream of consciousness if it doesn't make any sense ;) Just don't have enough time at the moment to dig deeply into it.

Yes, I'm chewing on that too.
In my mind, Tracing working correctly is much more important than Baggage. Thankfully that API is not affected, so I would like the fix to Tracing not to get held up by the Baggage API.
We can leave the Baggage API as is (unless we think of a transparent solution) and introduce a ScopedBaggage trait. Or add a scoped variant for the affected methods, like def setScoped(name: String, value: String)(implicit trace: Trace): URIO[Scope, Unit]. That way current usage is unaffected.

Come to think of it: the latter solution is probably the simplest.

joost-de-vries · 2025-01-03T06:34:24Z

> > @grouzen I wonder how to test the fix. One way would be to mock the Java OpenTelemetry Context.makeCurrent. And check that close() is called.
>
> Sounds good. I think it is a sane way of testing this kind of stuff.

I've looked into this:
It seems the test for ContextStorage.native doesn't initialize the otel OpenTelemetry instance, so I don't know if the strict JVM argument will work.
Also, for a JVM argument to be applied to the test, the test probably needs to run forked in a separate JVM.
I don't think we can mock the ThreadLocal.

joost-de-vries · 2025-01-03T11:25:16Z

@grouzen I've implemented the latter approach. Existing code will compile unchanged.

(Of course that raises the question of what the implementation of the existing methods should be. I've implemented them with a local ZIO.scoped. If we want to keep the old behaviour, we should provide a scope that is never finalized.)

joost-de-vries · 2025-01-07T14:19:42Z

I've added a test to verify that ContextStorage.native adheres to the contract of otel Context.makeCurrent:

Context prevCtx = Context.current(); 
try (Scope ignored = ctx.makeCurrent()) {   
  assert Context.current() == ctx; 
  ... 
} 
assert Context.current() == prevCtx;

Most of those tests fail on master. But not all.
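
The contract quoted above can be exercised without OpenTelemetry on the classpath. The following is a minimal sketch under stated assumptions: `ScopeContract`, its `Scope` interface and its `makeCurrent` method are hypothetical stand-ins for a ThreadLocal-backed storage, not the OpenTelemetry classes and not the test added in this PR. It records the current context at each step, first with properly nested try-with-resources blocks and then with a scope that is dropped without close().

```java
import java.util.ArrayList;
import java.util.List;

public class ScopeContract {
    // Hypothetical stand-in for an OpenTelemetry-style Scope; close() throws nothing.
    public interface Scope extends AutoCloseable {
        @Override
        void close();
    }

    // Hypothetical ThreadLocal-backed storage; every thread starts at "root".
    static final ThreadLocal<String> CURRENT = ThreadLocal.withInitial(() -> "root");

    // Makes ctx current and returns a scope that restores the previous value.
    static Scope makeCurrent(String ctx) {
        String previous = CURRENT.get();
        CURRENT.set(ctx);
        return () -> CURRENT.set(previous);
    }

    // Returns the context observed at each step, in order.
    public static List<String> trace() {
        List<String> seen = new ArrayList<>();
        seen.add(CURRENT.get());                  // before any scope
        try (Scope outer = makeCurrent("outer")) {
            seen.add(CURRENT.get());              // inside the outer scope
            try (Scope inner = makeCurrent("inner")) {
                seen.add(CURRENT.get());          // inside the nested scope
            }
            seen.add(CURRENT.get());              // nested scope closed
        }
        seen.add(CURRENT.get());                  // outer scope closed

        makeCurrent("leaked");                    // scope is never closed
        seen.add(CURRENT.get());                  // the context stays set

        CURRENT.remove();                         // reset so repeated calls start clean
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(trace());
    }
}
```

With matched close() calls each step sees the context it expects, and the previous context is back after every block. The last step shows what an unclosed scope does in this model: the thread keeps the leaked context until something else overwrites it.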

grouzen · 2025-01-07T14:28:34Z

Hey! Sorry for being silent, I don't have time to work on this at the moment.
I want to take some time to think more, since I don't quite like the breaking changes and the scoped API in general. I still hope it may be possible to avoid them.
Here are some links to have a look at if you are interested:

joost-de-vries · 2025-01-07T14:43:02Z

@grouzen I understand.
I agree it would be nice if we don't need to change the api.

Are you on zio discord? I can share our experiences with zio + datadog.

grouzen · 2025-01-07T16:20:26Z

> @grouzen I understand. I agree it would be nice if we don't need to change the api.
>
> Are you on zio discord? I can share our experiences with zio + datadog.

Yeah, you can find me on the zio-telemetry channel as well. It would be lovely to hear about your experience!

joost-de-vries · 2025-02-10T08:18:43Z

@grouzen we can split this issue in two: 1. the above fix for Tracing only. That won't affect current api 2. a fix for Baggage. We can then take a bit more time in coming up with an appropriate api.
What do you think?

It would be nice to get the fix for Tracing merged.

grouzen · 2025-02-10T09:59:40Z

> @grouzen we can split this issue in two: 1. the above fix for Tracing only. That won't affect current api 2. a fix for Baggage. We can then take a bit more time in coming up with an appropriate api. What do you think?
>
> It would be nice to get the fix for Tracing merged.

Yes, let's do the first PR for Tracing. I spent lots of time researching this, and now I think we need to completely change how we manage the context and the context storage.

joost-de-vries · 2025-02-10T11:32:05Z

opentelemetry/src/main/scala/zio/telemetry/opentelemetry/baggage/Baggage.scala

          injectLogAnnotations *> modifyBuilder(_.put(name, value)).unit

+        // to preserve the existing behavior where the otel Scope returned from Context.makeCurrent is not closed
+        // see io.opentelemetry.context.Context.makeCurrent
+        private def unclosedScope[A](io: URIO[Scope, A]): UIO[A] =


joost-de-vries · 2025-02-10

@grouzen this seems to me the way to preserve existing behaviour. What do you think?

Also: perhaps good to link to an issue in that comment?

grouzen · 2025-02-10

It needs to be checked with some end-to-end test. One of the options could be a manual verification using the opentelemetry-instrumentation-example module. Additionally, it would be good to check how it works with dd-trace-java, because that is the reason this issue was created in the first place.

grouzen · 2025-02-10

I'm just afraid we close the current span this way, and I believe this is not what we want to do while modifying baggage data, right? That's why I think we need to do manual tests. Also, adding a unit test to prove or disprove my assumption would be great.

joost-de-vries · 2025-02-10

What kind of implementation would you propose?

grouzen · 2025-02-10

It can't be fixed this way, as I said before. I think the whole idea of having specific ZIO2 instrumentation in opentelemetry-java-instrumentation and dd-trace-java was a mistake, hence the current implementation of ContextStorage.native was a mistake too. It must be done differently and I'm working on this right now.

grouzen · 2025-02-10

I think the better solution in your situation would be to use your fork with the fix for your case while I'm figuring out how to fix it properly to make everyone happy.

joost-de-vries · 2025-02-10

Using the previous Supervisor-based approach?
For us it would be nice not to depend on the ZIO support in dd-trace-java. I get the sense that it's not actively maintained.

grouzen · 2025-02-10

The solution is quite straightforward: manually map between the FiberRef-based and ThreadLocal-based ContextStorage implementations when needed. This automatically allows getting rid of the finicky Supervisor-based implementations.

joost-de-vries · 2025-02-10T11:35:06Z

A different way of managing context storage. Interesting!

I removed the scoped methods from Baggage.
There's the question what to do with the implementation of existing methods. Can you have a look at that?

grouzen · 2025-02-10T14:31:14Z

One more thing. I should have asked about it in the very beginning. Could you please provide a repo with a minimal example of reproducing this issue? I'm curious about how your setup differs from everyone else's and why nobody has experienced this issue before.
I think it is legit to have such a repo due to the severe backward incompatible changes you are asking to make.
What do you think?

joost-de-vries · 2025-02-11T08:05:54Z

I've looked through the telemetry tests in zio-telemetry for one that involves 1. global otel 2. JVM context storage 3. auto instrumentation 4. ZIO auto instrumentation, but didn't see one.
I do have a test like that with Datadog auto instrumentation (dd-trace-java). That was quite a bit of work to set up.
An expected span attribute does show up with the above fix, but not without it. So that shows the issue.
Unfortunately that's in a private repo. I guess I can show you in a call.

grouzen · 2025-02-14T15:51:55Z

> I've looked into the telemetry tests in zio-telemetry for one that involves 1. global otel 2. jvm context storage 3. auto instrumentation 4. zio auto instrumentation. But didn't see one. I do have a test like that with datadog auto instrumentation (dd-trace-java). That was quite a bit of work to set up. An expected span attribute does show up with the above fix. But not without. So that shows the issue. Unfortunately that's in a private repo. I guess I can show you in a call.

The opentelemetry-instrumentation-example module may be considered an end-to-end test involving all the parts you mentioned.
In order to fix your case, we need either a complete example that reproduces your issue with dd-trace-java or a general understanding of your current setup. In any case, someone will need to implement such an e2e test at the end of the day :) I think jumping on a quick call could be a good start.

Make sure to call close() on Scope returned from io.opentelemetry.context.Context.makeCurrent()

82bc169

joost-de-vries requested a review from a team as a code owner December 30, 2024 10:21

cleanup

6b0baf1

joost-de-vries and others added 3 commits December 31, 2024 11:03

Merge remote-tracking branch 'origin' into opentelemetry-jvm-storage-close-scope

77cb707

imports

af8ff0c

Merge branch 'series/2.x' into opentelemetry-jvm-storage-close-scope

8fe46f2

joost-de-vries added 4 commits January 3, 2025 11:48

scoped

b6109c9

revert

a0039fb

revert

5723420

revert

45f82a1

test ContextStorage.native

a24b703

fix destruct tuple

03e8835

linting

6527c89

joost-de-vries and others added 2 commits January 19, 2025 10:11

Merge remote-tracking branch 'origin' into opentelemetry-jvm-storage-close-scope

fdf8652

Merge branch 'series/2.x' into opentelemetry-jvm-storage-close-scope

e4447d0

joost-de-vries added 2 commits February 10, 2025 12:29

remove scoped Baggage methods

65e1f68

Merge remote-tracking branch 'origin/opentelemetry-jvm-storage-close-scope' into opentelemetry-jvm-storage-close-scope

3ba0275

Merge branch 'series/2.x' into opentelemetry-jvm-storage-close-scope

5f16959
