Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add interrupts for closeScope0 #13262

Closed

Conversation

EricYangIBM
Copy link
Contributor

closeScope0 should interrupt scope accessor threads with a
Scope.ScopedAccessError.

Closes: #13256
Signed-off-by: Eric Yang [email protected]

@EricYangIBM
Copy link
Contributor Author

@tajila is this the idea?

@EricYangIBM EricYangIBM force-pushed the closeScopeInterrupts branch from acde4b9 to 0a76487 Compare July 30, 2021 15:43
@EricYangIBM
Copy link
Contributor Author

Test is passing, previously e.printStackTrace() printed

java.lang.reflect.InvocationTargetException
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:78)
        at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.base/java.lang.reflect.Method.invoke(Method.java:567)
        at org.openj9.test.foreignMemoryAccess.TestCloseScope0.lambda$closeScopeDuringAccess$1(TestCloseScope0.java:157)
        at org.openj9.test.foreignMemoryAccess.TestCloseScope0$$Lambda$54/0x00000000243a69a0.run(Unknown Source)
        at java.base/java.lang.Thread.run(Thread.java:883)
Caused by: jdk.internal.misc.ScopedMemoryAccess$Scope$ScopedAccessError: Attempt to access an already released memory resource

@EricYangIBM EricYangIBM marked this pull request as ready for review July 30, 2021 15:45
@EricYangIBM EricYangIBM changed the title [WIP] Add interrupts for closeScope0 Add interrupts for closeScope0 Jul 30, 2021
@EricYangIBM
Copy link
Contributor Author

When running TestHandshake I am getting * ** ASSERTION FAILED ** at /root/openj9-openjdk-jdk16/openj9/runtime/vm/VMAccess.cpp:477: (!(vmThread->inNative))
Do I have to acquire vm access for the scopeThread before throwing?

@tajila
Copy link
Contributor

tajila commented Aug 4, 2021

I dont think its that simple, we need to make sure the thread we are attempting to interrupt has landed at a safe point before setting the exception. And by safe point I mean, not holding on to vmaccess.

@tajila
Copy link
Contributor

tajila commented Aug 4, 2021

Take a look at how Thread::interrupt works.

Specifically

if (J9VMJAVALANGTHREAD_STARTED(currentThread, receiverObject) && (NULL != targetThread)) {
		void (*sidecarInterruptFunction)(J9VMThread*) = vm->sidecarInterruptFunction;
		if (NULL != sidecarInterruptFunction) {
			sidecarInterruptFunction(targetThread);
		}
		omrthread_interrupt(targetThread->osThread);
	}

@EricYangIBM EricYangIBM force-pushed the closeScopeInterrupts branch from 0a76487 to 5e19150 Compare August 6, 2021 16:36
@EricYangIBM
Copy link
Contributor Author

EricYangIBM commented Aug 6, 2021

I tried to ensure that the top frame of the scope accessor thread was interpreted but I am still getting TEST RESULT: Failed. Execution failed: `main' threw exception: jdk.internal.misc.ScopedMemoryAccess$Scope$ScopedAccessError: Attempt to access an already released memory resource. I'm not sure if it has anything to do with interpreted/compiled frames. The printf output always gave Interp and setException so I think the top frame was always interpreted.
This is for TestHandshake.java. TestCloseScope0.java passes.

Also, commented out is my other attempt at determining if the top frame is interpreted, but walkThread->literals was always null.

@tajila
Copy link
Contributor

tajila commented Aug 6, 2021

Can you run the test with -Xint just to understand if this only occurs with jit frames. Perhaps run in a grinder.

Assuming it fails with -Xint try setting currentThread->stopThrowable = exception. Then set vmFuncs->setHaltFlag(targetThread, J9_PUBLIC_FLAGS_STOP);. Note youll need to acquire the threads publicFlags mutex.

Then repeat the test with -Xint again

@tajila
Copy link
Contributor

tajila commented Aug 6, 2021

You can remove this code:

vmFuncs->haltThreadForInspection(currentThread, scopeThread);
		VM_VMHelpers::setExceptionPending(scopeThread, J9_JNI_UNWRAP_REFERENCE(exception));
		vmFuncs->resumeThreadForInspection(currentThread, scopeThread);

@tajila
Copy link
Contributor

tajila commented Aug 6, 2021

Also, set the haltFlags, etc. while the currentThread still has exclusive access

@EricYangIBM EricYangIBM force-pushed the closeScopeInterrupts branch 2 times, most recently from 17b8bf2 to 01dd3f7 Compare August 26, 2021 16:14
@EricYangIBM
Copy link
Contributor Author

@tajila This is consistently passing locally but not in ppc64le in Grinder (the exception thrown is still not caught). I think it may have to do with the exception being thrown in a native method? Is there a way to check this?

@tajila
Copy link
Contributor

tajila commented Aug 30, 2021

The romMethod->modifiers will have the J9AccNative bit set if its a native method

@EricYangIBM
Copy link
Contributor Author

@EricYangIBM EricYangIBM force-pushed the closeScopeInterrupts branch 2 times, most recently from 9a94802 to ab95cf4 Compare September 1, 2021 13:47
@babsingh
Copy link
Contributor

babsingh commented Sep 1, 2021

Problem

TestHandshake.java times out because scope accessor threads are not interrupted so they keep holding the scope, preventing close. This issue is intermittent.

How to investigate?

To diagnose the above problem, we need to know

  • is the exception being thrown and caught properly?
  • are the threads working correctly? time out sounds like a hang. we should study the state of the threads.

Running the test with verbose output

make test TEST="java/foreign/TestHandshake.java" JTREG="VERBOSE=all"

The test is suspending, terminating and resuming threads. The test output may give the hint why the threads are not interrupted.

Java and system cores

Java and system cores can be used to investigate if the exception is correctly thrown and caught, and the state of the threads:

make test TEST="java/foreign/TestHandshake.java" JTREG="VERBOSE=all;VM_OPTIONS=-Xdump:java+system:events=uncaught,filter=<EXCEPTION_NAME>"
  • -Xdump documentation: https://www.eclipse.org/openj9/docs/xdump/#class-loading-and-exception-events
  • You can also try other exception events: throw, catch, uncaught, systhrow.
  • If a user signal (kill -QUIT) is sent to the VM, a brief listing of the Java threads including their stacks, status, and monitor information is written to stderr. This can help study the state of the threads in the absence of an interrupt.

Trace points

  • You can add new tracepoints to your code in order to verify if it is functioning correctly.
  • Currently, there are no tracepoints in your implementation of ScopedMemoryAccess_closeScope0.
  • Also, you can enable existing tracepoints to study thread and exception behaviour.
  • Refer to the -Xtrace documentation: https://www.eclipse.org/openj9/docs/xtrace.

@EricYangIBM EricYangIBM force-pushed the closeScopeInterrupts branch 3 times, most recently from caab3e5 to b73e110 Compare September 8, 2021 16:56
J9VMThread *scopeThread = current->thread;
LinkedThreads *prev = NULL;

if (NULL == scopeThread->currentException) {
Copy link
Contributor

@babsingh babsingh Sep 9, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

stopThrowable will not be set if currentException is not null. On PPCLE, the JIT is capable of setting exceptions for hardware events. The same applies to S390 (zOS/zLinux). Refer to jitPPCHandler:

switch (trapType) {
case TRAP_TYPE_NULL_CHECK:

This will set the currentException and prevent stopThrowable from being set in your code. Most probably, this is the cause for the intermittent TestHandshake timeout failures on PPCLE. For fixing these failures, you will need to clear currentException and always set stopThrowable.

@gacholio Do we need to clear currentException while setting stopThrowable? If so, can we set currentException to null for clearing it?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

currentException is only set from the current thread, and should only be read from the current thread. Re-using stopThrowable is going to cause problems interacting with stop(), which is deprecated but still hasn't been removed.

@EricYangIBM
Copy link
Contributor Author

We're thinking of decompiling all frames of threads that are accessing the scope during close (should be rare in practice). Is this possible in normal jit code? fyi @gacholio

@gacholio
Copy link
Contributor

gacholio commented Sep 9, 2021

I haven't seen a detailed spec, but from what I understand, any thread in the closing scope needs to be made to throw a ScopedAccessError. Before getting into the implementation problems, I see several issues with the spec itself, which does not say that it waits for the other threads to be kicked out of the scope before freeing the underlying memory. This implies that the throwing threads could never again access the scoped memory, which seems clearly untrue in these situations:

Native method on top of stack

The exception could be thrown upon return to the VM, so perhaps this isn't really a problem.

Though errors are not supposed to be caught, they regularly are, particularly in the try/finally case (which is implemented as try/catch).

For example:

try {
   something();
} catch(ScopedAccessError) {
   use the scoped memory somehow
}

If the closing thread frees the memory, the above access would obviously be illegal.

@gacholio
Copy link
Contributor

gacholio commented Sep 9, 2021

Part of the problem with implementing this is that the JIT does not allow exceptions to be thrown from async checks (other than the implicit one on method entry), so it's possible that there could be a loop in the compiled code which uses the scoped memory which will never be broken by the exception throw.

I do not recall the reasoning behind this - perhaps @0xdaryl or @fjeremic might recall.

Also note that the stop() exception is also not thrown on return from JIT helpers, again for no reason I can think of (since other exceptions can be thrown from them).

@gacholio
Copy link
Contributor

gacholio commented Sep 9, 2021

A third suggestion is that the exception threads be decompiled before the throw. This is done for breakpoints in FSD mode, but is currently never done in non-FSD mode.

This may work, assuming that every GC point has an OSR decompile block. If we can't force it from the closing thread, it might be possible to inject an async event and do the decompile on the local thread before throwing the exception. This would require some modifications to the async message mechanism.

@gacholio
Copy link
Contributor

OK, assuming the JIT doesn't want to change that, decompilation seems the only viable solution for compiled frames. This would require that asynccheck be an OSR point in all cases (which I'm pretty sure is already true).

Unless there's a specific reason why the exception needs to be allocated in the closing thread, I suggest this be implemented like frame pop, with an interrupt bit in the publicFlags that triggers the allocation and throwing of the exception, decompiling the top frame in the compiled case.

@gacholio
Copy link
Contributor

@tajila Is this still under discussion? The async exception cannot be implemented reliably.

@gacholio
Copy link
Contributor

Discussed with @tajila - the @Scoped annotation is only allowed in base library code, so we will just have to assume it's correct, though I'm sure there are situations where we will go wrong.

@gacholio
Copy link
Contributor

Much like stop and frame pop, we will need to interrupt the target thread to break it out of wait/sleep/park.

@gacholio
Copy link
Contributor

We'll need to check for this upon return from native (currently it does not check for frame pop or stop). This does not cover JIT direct JNI (not currently sure what to do about that).

@EricYangIBM
Copy link
Contributor Author

To clarify, we use popFrameCheckIterator and set J9_PUBLIC_FLAGS_POP_FRAMES_INTERRUPT and an additional new flag, and handle it with an async event?

@tajila
Copy link
Contributor

tajila commented Sep 17, 2021

To clarify, we use popFrameCheckIterator

Not quite. I think we need similar behaviour to popFrameCheckIterator.

We need to check if the top frame is a JIT frame, if so trigger decompilation (ie. jitAddDecompilationForFramePop)

We also need to check if the top frame is wait/sleep/park, if so we need to interrupt the thread.

@gacholio
Copy link
Contributor

As mentioned above, we will need to do an async check on return from JNI native calls.

The VM already has these checks in the sleep and wait INLs. I believe park is a JNI native.

The JIT will also need to do this, and the inserted asynccheck must not be moved or elided. Is that going to be a problem? @fjeremic

`closeScope0` should interrupt scope accessor threads with a
`Scope.ScopedAccessError`.

Closes: eclipse-openj9#13256
Signed-off-by: Eric Yang <[email protected]>
@fjeremic
Copy link
Contributor

As mentioned above, we will need to do an async check on return from JNI native calls.

The VM already has these checks in the sleep and wait INLs. I believe park is a JNI native.

The JIT will also need to do this, and the inserted asynccheck must not be moved or elided. Is that going to be a problem? @fjeremic

Tagging @0xdaryl and @jdmpapin for this one. I'm not sure if we can guarantee the optimizer won't touch such asyncchecks if they are inserted, unless we do it at tree lowering time or something. Also do we need an asynccheck after every JNI call? I'm unsure about the performance implications.

@tajila
Copy link
Contributor

tajila commented Sep 23, 2021

We also need to check if the top frame is wait/sleep/park, if so we need to interrupt the thread.

@gacholio I no longer think this is needed. Since we are restricting this to threads that have methods with the @scoped frame (and are part of the jdk.foreign package) on the stack, we essentially have the bounded list of methods that could possibly be called. None of those methods are calling wait|sleep|park so its not a possible scenario and we could assert this. The @scoped methods do call a lot of unsafe natives so we will need the async check after jni native calls.

@gacholio
Copy link
Contributor

That would certainly simplify things. I suspect we will still need to add extra checks in various places in the interpreter, but for the first pass, we can just assume the async check will be hit and handle the throw.

For the JIT case, it may be sufficient to detect the compiled frame on TOS, add a decompilation for it, then perform the throw (the exception throw code already handles the decompile cases). I will need to think about how exactly to do this, so perhaps it's best to proceed ignoring the JIT case for now.

@EricYangIBM
Copy link
Contributor Author

Our test currently has a wait to ensure that the scope is closed as a thread is in a @Scoped method. So we would have to modify this test or add the interrupt. https://github.com/eclipse-openj9/openj9/pull/13262/files#diff-9758c2d1a02d66db3df3f7663ecbc4250c78ad374e47820a1216881554a78383R147

@tajila
Copy link
Contributor

tajila commented Sep 24, 2021

Our test currently has a wait to ensure that the scope is closed as a thread

Ya those tests will need to be modified

@EricYangIBM
Copy link
Contributor Author

With the VM_VMHelpers::setExceptionPending(currentThread, J9_JNI_UNWRAP_REFERENCE(userData)); in the async handler, the test crashes (without that line it just times out like before). The crash happens after closeScope and during a System.out.println. In gdb backtrace I found

#10 0x00007f7eb269e1e5 in vmSignalHandler (portLibrary=0x7f7eb289b340 <j9portLibrary>, gpType=24, 
    gpInfo=<optimized out>, userData=<optimized out>) at /root/openj9-openjdk-jdk16/openj9/runtime/vm/gphandle.c:848
#11 0x00007f7eb2494c4a in mainSynchSignalHandler (signal=11, sigInfo=0x7f7eb00f3db0, contextInfo=0x7f7eb00f3c80)
    at /root/openj9-openjdk-jdk16/omr/port/unix/omrsignal.c:1066
#12 <signal handler called>
#13 isExceptionTypeCaughtByHandler (currentThread=<optimized out>, thrownExceptionClass=0xac001400, 
    constantPool=<optimized out>, handlerIndex=<optimized out>, walkState=<optimized out>)
    at /root/openj9-openjdk-jdk16/openj9/runtime/vm/exceptionsupport.c:509
#14 0x00007f7eb2694554 in exceptionHandlerSearch (currentThread=0x31a800, walkState=0x31ab10)
    at /root/openj9-openjdk-jdk16/openj9/runtime/vm/exceptionsupport.c:292
#15 0x00007f7eb26dc079 in walkFrame (walkState=walkState@entry=0x31ab10)
    at /root/openj9-openjdk-jdk16/openj9/runtime/vm/swalk.c:487
#16 0x00007f7eb26dcdb0 in walkStackFrames (currentThread=0x31a800, walkState=0x31ab10)
    at /root/openj9-openjdk-jdk16/openj9/runtime/vm/swalk.c:324
#17 0x00007f7eb2692860 in walkStackForExceptionThrow (currentThread=<optimized out>, exception=<optimized out>, 
    walkOnly=<optimized out>) at /root/openj9-openjdk-jdk16/openj9/runtime/vm/exceptionsupport.c:847
#18 0x00007f7eb26ef62a in VM_BytecodeInterpreterCompressed::throwException (_pc=<optimized out>, 
    _sp=<optimized out>, this=<optimized out>)
    at /root/openj9-openjdk-jdk16/openj9/runtime/vm/BytecodeInterpreter.hpp:2507
#19 VM_BytecodeInterpreterCompressed::run (this=0x7f7eb00f5910, vmThread=0x7f7eac016400)

so maybe I didn't set the exception correctly?

@gacholio
Copy link
Contributor

Some time (perhaps next week) we should have a call about this. I don't believe the current changes are getting us to where we need to be.

@EricYangIBM EricYangIBM deleted the closeScopeInterrupts branch January 24, 2022 16:14
@babsingh
Copy link
Contributor

@EricYangIBM This PR was closed due to the deletion of closeScopeInterrupts. Did you intend to close this PR? Can you post a comment describing the future for this work?

@EricYangIBM EricYangIBM restored the closeScopeInterrupts branch January 26, 2022 18:40
@EricYangIBM EricYangIBM reopened this Jan 26, 2022
@EricYangIBM
Copy link
Contributor Author

I accidentally deleted my branches when trying to push a renamed branch

@EricYangIBM
Copy link
Contributor Author

Closing as per #13256 (comment)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Jep393 closeScope0 native does not interrupt accessor threads
5 participants