testSoftMxDisclaimMemory_GC_3_FAILED : Segmentation error vmState=0x0002000f #14396

Closed
JasonFengJ9 opened this issue Jan 28, 2022 · 23 comments · Fixed by #14577

Comments

@JasonFengJ9
Member

Failure link

From an internal build job/Test_openjdk18_j9_extended.functional_x86-64_linux_testList_0/3/(ub20x64rt3-8):

openjdk version "18-beta" 2022-03-22
IBM Semeru Runtime Open Edition 18.0.0.0 (build 18-beta+32-202201270250)
Eclipse OpenJ9 VM 18.0.0.0 (build master-5e9bd4ea0, JRE 18 Linux amd64-64-Bit Compressed References 20220127_21 (JIT enabled, AOT enabled)
OpenJ9   - 5e9bd4ea0
OMR      - 4a009d2fe
JCL      - 0764ffcb69 based on jdk-18+32)

Rerun in Grinder - Change TARGET to run only the failed test targets.

Optional info

Failure output (captured from console output)

[2022-01-27T05:34:00.406Z] Running test testSoftMxDisclaimMemory_GC_3 ...
[2022-01-27T05:34:00.406Z] ===============================================

[2022-01-27T05:34:00.819Z] variation: Mode501
[2022-01-27T05:34:00.819Z] JVM_OPTIONS:  -Xjit -Xgcpolicy:balanced -Xnocompressedrefs 

[2022-01-27T05:34:02.105Z] Unhandled exception
[2022-01-27T05:34:02.105Z] Type=Segmentation error vmState=0x0002000f
[2022-01-27T05:34:02.105Z] J9Generic_Signal_Number=00000018 Signal_Number=0000000b Error_Value=00000000 Signal_Code=00000001
[2022-01-27T05:34:02.105Z] Handler1=00007F684EF631D0 Handler2=00007F684ECC0C70 InaccessibleAddress=0000000000000000
[2022-01-27T05:34:02.105Z] RDI=00007F67E8961A10 RSI=00007F68071E1030 RAX=0000000000000000 RBX=00007F6807EDDC18
[2022-01-27T05:34:02.105Z] RCX=0000000099669966 RDX=0000000000000000 R8=0000000000000002 R9=0000000000000005
[2022-01-27T05:34:02.105Z] R10=0000000000000005 R11=00007F6807EDDC10 R12=0000000000000001 R13=000000000000003E
[2022-01-27T05:34:02.105Z] R14=00007F6808618E20 R15=00007F6808618E20
[2022-01-27T05:34:02.105Z] RIP=00007F684C3A9A2F GS=0000 FS=0000 RSP=00007F67E8961970
[2022-01-27T05:34:02.105Z] EFlags=0000000000010246 CS=0033 RBP=0000000000000001 ERR=0000000000000004
[2022-01-27T05:34:02.105Z] TRAPNO=000000000000000E OLDMASK=0000000000000000 CR2=0000000000000000
[2022-01-27T05:34:02.105Z] xmm0 4237ef3784400000 (f: 2218786816.000000, d: 1.027976e+11)
[2022-01-27T05:34:02.105Z] xmm1 4058000000000000 (f: 0.000000, d: 9.600000e+01)
[2022-01-27T05:34:02.105Z] xmm2 7c6f2f0bdadef0f9 (f: 3672043776.000000, d: 2.431165e+291)
[2022-01-27T05:34:02.105Z] xmm3 8e80d92457731b67 (f: 1467161472.000000, d: -8.085507e-239)
[2022-01-27T05:34:02.105Z] xmm4 000000003f693c00 (f: 1063861248.000000, d: 5.256173e-315)
[2022-01-27T05:34:02.105Z] xmm5 bff0000000000000 (f: 0.000000, d: -1.000000e+00)
[2022-01-27T05:34:02.105Z] xmm6 bfba4e76ce8c0e5e (f: 3465285120.000000, d: -1.027598e-01)
[2022-01-27T05:34:02.105Z] xmm7 000000000000006e (f: 110.000000, d: 5.434722e-322)
[2022-01-27T05:34:02.105Z] xmm8 002f003c000a003e (f: 655422.000000, d: 8.622416e-308)
[2022-01-27T05:34:02.105Z] xmm9 0000000000000000 (f: 0.000000, d: 0.000000e+00)
[2022-01-27T05:34:02.105Z] xmm10 0000000000000000 (f: 0.000000, d: 0.000000e+00)
[2022-01-27T05:34:02.105Z] xmm11 0000000000000000 (f: 0.000000, d: 0.000000e+00)
[2022-01-27T05:34:02.105Z] xmm12 0000000000000000 (f: 0.000000, d: 0.000000e+00)
[2022-01-27T05:34:02.105Z] xmm13 0000000000000000 (f: 0.000000, d: 0.000000e+00)
[2022-01-27T05:34:02.105Z] xmm14 0000000000000000 (f: 0.000000, d: 0.000000e+00)
[2022-01-27T05:34:02.105Z] xmm15 0000000000000000 (f: 0.000000, d: 0.000000e+00)
[2022-01-27T05:34:02.105Z] Module=/home/jenkins/workspace/Test_openjdk18_j9_extended.functional_x86-64_linux_testList_0/openjdkbinary/j2sdk-image/lib/default/libj9gc_full29.so
[2022-01-27T05:34:02.105Z] Module_base_address=00007F684C2D0000
[2022-01-27T05:34:02.105Z] Target=2_90_20220127_21 (Linux 5.4.0-91-generic)
[2022-01-27T05:34:02.105Z] CPU=amd64 (4 logical CPUs) (0xf5fa5000 RAM)
[2022-01-27T05:34:02.105Z] ----------- Stack Backtrace -----------
[2022-01-27T05:34:02.105Z] (0x00007F684C3A9A2F [libj9gc_full29.so+0xd9a2f])
[2022-01-27T05:34:02.105Z] (0x00007F684C3BAB25 [libj9gc_full29.so+0xeab25])
[2022-01-27T05:34:02.105Z] (0x00007F684C3BB73B [libj9gc_full29.so+0xeb73b])
[2022-01-27T05:34:02.105Z] (0x00007F684C3BDD09 [libj9gc_full29.so+0xedd09])
[2022-01-27T05:34:02.105Z] (0x00007F684C3FDDD7 [libj9gc_full29.so+0x12ddd7])
[2022-01-27T05:34:02.105Z] (0x00007F684C3FD5E9 [libj9gc_full29.so+0x12d5e9])
[2022-01-27T05:34:02.105Z] (0x00007F684ECC19D3 [libj9prt29.so+0x2a9d3])
[2022-01-27T05:34:02.105Z] (0x00007F684C3FD0EF [libj9gc_full29.so+0x12d0ef])
[2022-01-27T05:34:02.105Z] (0x00007F684EA8A4B2 [libj9thr29.so+0xe4b2])
[2022-01-27T05:34:02.105Z] (0x00007F684FF16609 [libpthread.so.0+0x9609])
[2022-01-27T05:34:02.105Z] clone+0x43 (0x00007F684FE35293 [libc.so.6+0x122293])
[2022-01-27T05:34:02.105Z] ---------------------------------------
[2022-01-27T05:34:02.105Z] JVMDUMP039I Processing dump event "gpf", detail "" at 2022/01/26 21:34:01 - please wait.
[2022-01-27T05:34:02.105Z] JVMDUMP032I JVM requested System dump using '/home/jenkins/workspace/Test_openjdk18_j9_extended.functional_x86-64_linux_testList_0/aqa-tests/TKG/output_16432552049195/testSoftMxDisclaimMemory_GC_3/core.20220126.213401.267050.0001.dmp' in response to an event
[2022-01-27T05:34:04.128Z] JVMDUMP010I System dump written to /home/jenkins/workspace/Test_openjdk18_j9_extended.functional_x86-64_linux_testList_0/aqa-tests/TKG/output_16432552049195/testSoftMxDisclaimMemory_GC_3/core.20220126.213401.267050.0001.dmp
[2022-01-27T05:34:04.128Z] JVMDUMP032I JVM requested Java dump using '/home/jenkins/workspace/Test_openjdk18_j9_extended.functional_x86-64_linux_testList_0/aqa-tests/TKG/output_16432552049195/testSoftMxDisclaimMemory_GC_3/javacore.20220126.213401.267050.0002.txt' in response to an event
[2022-01-27T05:34:04.128Z] JVMDUMP010I Java dump written to /home/jenkins/workspace/Test_openjdk18_j9_extended.functional_x86-64_linux_testList_0/aqa-tests/TKG/output_16432552049195/testSoftMxDisclaimMemory_GC_3/javacore.20220126.213401.267050.0002.txt
[2022-01-27T05:34:04.128Z] JVMDUMP032I JVM requested Snap dump using '/home/jenkins/workspace/Test_openjdk18_j9_extended.functional_x86-64_linux_testList_0/aqa-tests/TKG/output_16432552049195/testSoftMxDisclaimMemory_GC_3/Snap.20220126.213401.267050.0003.trc' in response to an event
[2022-01-27T05:34:04.546Z] JVMDUMP010I Snap dump written to /home/jenkins/workspace/Test_openjdk18_j9_extended.functional_x86-64_linux_testList_0/aqa-tests/TKG/output_16432552049195/testSoftMxDisclaimMemory_GC_3/Snap.20220126.213401.267050.0003.trc
[2022-01-27T05:34:04.546Z] JVMDUMP032I JVM requested JIT dump using '/home/jenkins/workspace/Test_openjdk18_j9_extended.functional_x86-64_linux_testList_0/aqa-tests/TKG/output_16432552049195/testSoftMxDisclaimMemory_GC_3/jitdump.20220126.213401.267050.0004.dmp' in response to an event
[2022-01-27T05:34:04.546Z] JVMDUMP051I JIT dump occurred in 'GC Worker' thread 0x00007F6794003C00
[2022-01-27T05:34:04.546Z] JVMDUMP010I JIT dump written to /home/jenkins/workspace/Test_openjdk18_j9_extended.functional_x86-64_linux_testList_0/aqa-tests/TKG/output_16432552049195/testSoftMxDisclaimMemory_GC_3/jitdump.20220126.213401.267050.0004.dmp
[2022-01-27T05:34:04.546Z] JVMDUMP013I Processed dump event "gpf", detail "".
[2022-01-27T05:34:04.546Z] 
[2022-01-27T05:34:04.546Z] testSoftMxDisclaimMemory_GC_3_FAILED
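
For readers less familiar with these crash reports, here is a small sketch of how the register values above can be read. It assumes the standard x86 page-fault error-code layout (bit 0 = page present, bit 1 = write access, bit 2 = user mode); the helper names are made up for illustration and are not part of any OpenJ9 tooling.

```python
# Values copied from the crash report above.
rip = 0x00007F684C3A9A2F          # RIP
module_base = 0x00007F684C2D0000  # Module_base_address of libj9gc_full29.so
err = 0x04                        # ERR, reported alongside TRAPNO=0xE (page fault)


def module_offset(address, base):
    """Offset of an absolute address inside a loaded module."""
    return address - base


def decode_page_fault_error(code):
    """Decode the low three bits of an x86 page-fault error code."""
    return {
        "page_present": bool(code & 0x1),
        "write_access": bool(code & 0x2),
        "user_mode": bool(code & 0x4),
    }


offset = module_offset(rip, module_base)
fault = decode_page_fault_error(err)

print(f"libj9gc_full29.so+{offset:#x}")
print(fault)
```

The computed offset matches the first frame of the stack backtrace, and the decoded error code reads as a user-mode read of an unmapped page, which fits CR2 and InaccessibleAddress both being 0.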

fyi @dmitripivkine

@dmitripivkine
Contributor

This issue looks like a duplicate of #14382.
I will double-check in the core and close this if confirmed.

@dmitripivkine
Contributor

The crash occurs during an attempt to scan a mixed object; one of its slots points to a "hole":

> !j9object 0x7F6807EDDC00
!J9Object 0x00007F6807EDDC00 {
	struct J9Class* clazz = !j9class 0x7F6848819200 // jdk/internal/math/FDBigInteger
	Object flags = 0x00000000;
	J lockword = 0x0000000000000000 (offset = 0) (java/lang/Object) <hidden>
	[I data = !fj9object 0x7f6808618e20 (offset = 8) (jdk/internal/math/FDBigInteger) <--- points to the "hole"
	I offset = 0x00000000 (offset = 16) (jdk/internal/math/FDBigInteger)
	I nWords = 0x00000011 (offset = 20) (jdk/internal/math/FDBigInteger)
	Z isImmutable = 0x00000001 (offset = 24) (jdk/internal/math/FDBigInteger)
}
0x7F6808618E20 :  0000000000000001 00000000000671e0 [ .........q...... ] <---
0x7F6808618E30 :  00007f6808618e38 df544342213246f1 [ 8.a.h....F2!BCT. ]

> !markmap ismarked 0x7F6807EDDC00
Object 0x00007F6807EDDC00 is marked

@dmitripivkine
Contributor

Also, I noticed that a pointer to an object of type jdk/internal/math/FDBigInteger was in the registers for #14382.

@pshipton
Member

pshipton commented Feb 3, 2022

Another failure, like #14399, which was closed as a duplicate of this issue.

https://openj9-jenkins.osuosl.org/job/Test_openjdk11_j9_extended.functional_x86-64_linux_Nightly_testList_1/189 - cent6-x64-6
testSoftMxLocal_LP4k_3 -Xjit -Xgcpolicy:balanced -Xnocompressedrefs

https://openj9-artifactory.osuosl.org/artifactory/ci-openj9/Test/Test_openjdk11_j9_extended.functional_x86-64_linux_Nightly_testList_1/189/functional_test_output.tar.gz

Unhandled exception
Type=Segmentation error vmState=0x0002000f
J9Generic_Signal_Number=00000018 Signal_Number=0000000b Error_Value=00000000 Signal_Code=00000001
Handler1=00007F6A2894D7E0 Handler2=00007F6A286AABF0 InaccessibleAddress=0000000000000000
RDI=00007F69B8FE79D0 RSI=00007F69DB76E030 RAX=0000000000000000 RBX=00007F69DC28EE38
RCX=0000000099669966 RDX=0000000000000000 R8=0000000000000002 R9=0000000000000005
R10=0000000000000005 R11=00007F69DC28EE30 R12=0000000000000001 R13=000000000000003E
R14=00007F69DCB0E8F8 R15=00007F69DCB0E8F8
RIP=00007F6A1C8DEB1F GS=0000 FS=0000 RSP=00007F69B8FE7930
EFlags=0000000000010246 CS=0033 RBP=0000000000000001 ERR=0000000000000004
TRAPNO=000000000000000E OLDMASK=0000000000000000 CR2=0000000000000000
xmm0 4207a00000000000 (f: 0.000000, d: 1.268358e+10)
xmm1 405e000000000000 (f: 0.000000, d: 1.200000e+02)
xmm2 0000000000000000 (f: 0.000000, d: 0.000000e+00)
xmm3 00000000be40290c (f: 3191875840.000000, d: 1.576996e-314)
xmm4 00000000bdcbce27 (f: 3184250368.000000, d: 1.573229e-314)
xmm5 00000000b9347648 (f: 3107223040.000000, d: 1.535172e-314)
xmm6 000000003c318000 (f: 1009876992.000000, d: 4.989455e-315)
xmm7 0000000038c73bcc (f: 952581056.000000, d: 4.706376e-315)
xmm8 00000000b624e000 (f: 3055869952.000000, d: 1.509800e-314)
xmm9 000000003c075284 (f: 1007112832.000000, d: 4.975799e-315)
xmm10 0000000040400000 (f: 1077936128.000000, d: 5.325712e-315)
xmm11 bd63d96e3e52c44a (f: 1045611584.000000, d: -5.641521e-13)
xmm12 0000000000000000 (f: 0.000000, d: 0.000000e+00)
xmm13 0000000000000000 (f: 0.000000, d: 0.000000e+00)
xmm14 0000000000000000 (f: 0.000000, d: 0.000000e+00)
xmm15 0000000000000000 (f: 0.000000, d: 0.000000e+00)
Module=/home/jenkins/workspace/Test_openjdk11_j9_extended.functional_x86-64_linux_Nightly_testList_1/openjdkbinary/j2sdk-image/lib/default/libj9gc_full29.so
Module_base_address=00007F6A1C805000
Target=2_90_20220202_207 (Linux 2.6.32-754.35.1.el6.x86_64)
CPU=amd64 (4 logical CPUs) (0x1f40ff000 RAM)
----------- Stack Backtrace -----------
(0x00007F6A1C8DEB1F [libj9gc_full29.so+0xd9b1f])
(0x00007F6A1C8EFF05 [libj9gc_full29.so+0xeaf05])
(0x00007F6A1C8F0B1B [libj9gc_full29.so+0xebb1b])
(0x00007F6A1C8F30E9 [libj9gc_full29.so+0xee0e9])
(0x00007F6A1C933237 [libj9gc_full29.so+0x12e237])
(0x00007F6A1C932A49 [libj9gc_full29.so+0x12da49])
(0x00007F6A286AB953 [libj9prt29.so+0x2a953])
(0x00007F6A1C93254F [libj9gc_full29.so+0x12d54f])
(0x00007F6A284744F6 [libj9thr29.so+0xe4f6])
(0x00007F6A2A4CEAA1 [libpthread.so.0+0x7aa1])
clone+0x6d (0x00007F6A29E06C4D [libc.so.6+0xe8c4d])
---------------------------------------

@dmitripivkine
Contributor

Yes, it is the same issue. Just for the record:

> !j9object 0x7F69DC28EE20
!J9Object 0x00007F69DC28EE20 {
	struct J9Class* clazz = !j9class 0x7F6A247C6E00 // jdk/internal/math/FDBigInteger
	Object flags = 0x00000000;
	J lockword = 0x0000000000000000 (offset = 0) (java/lang/Object) <hidden>
	[I data = !fj9object 0x7f69dcb0e8f8 (offset = 8) (jdk/internal/math/FDBigInteger) <--- points to the hole
	I offset = 0x00000000 (offset = 16) (jdk/internal/math/FDBigInteger)
	I nWords = 0x00000017 (offset = 20) (jdk/internal/math/FDBigInteger)
	Z isImmutable = 0x00000001 (offset = 24) (jdk/internal/math/FDBigInteger)
}

0x7F69DCB0E8E0 :  00007f69dcb0e8e8 0000000100000000 [ ....i........... ]
0x7F69DCB0E8F0 :  00007f6900000002 0000000000000001 [ ....i........... ] <---
0x7F69DCB0E900 :  0000000000071708 00007f69dcb0e910 [ ............i... ]
0x7F69DCB0E910 :  7d3f6e75f0f94991 e3ec3f33b2acf32b [ .I..un?}+...3?.. ]

@pshipton
Member

pshipton commented Feb 4, 2022

https://openj9-jenkins.osuosl.org/job/Test_openjdk17_j9_extended.functional_x86-64_linux_Nightly_testList_0/135 - ub20-x86-1
testSoftMxLocal_LP4k_3

https://openj9-artifactory.osuosl.org/artifactory/ci-openj9/Test/Test_openjdk17_j9_extended.functional_x86-64_linux_Nightly_testList_0/135/functional_test_output.tar.gz

Unhandled exception
Type=Segmentation error vmState=0x00020002
J9Generic_Signal_Number=00000018 Signal_Number=0000000b Error_Value=00000000 Signal_Code=00000001
Handler1=00007F7577C3E1A0 Handler2=00007F757C173CB0 InaccessibleAddress=0000000000000000
RDI=00007F75780880A0 RSI=00007F74C8005968 RAX=0000000099669966 RBX=0000000000000000
RCX=0000000000000001 RDX=0000000000000003 R8=00007F74C8005968 R9=0000000000000005
R10=00007F753568D760 R11=00007F75780880A0 R12=0000000000000001 R13=00007F7575839360
R14=00007F75780880A0 R15=00007F74C8005968
RIP=00007F7575468B41 GS=0000 FS=0000 RSP=00007F7511A9EB30
EFlags=0000000000010206 CS=0033 RBP=00007F7535699E10 ERR=0000000000000004
TRAPNO=000000000000000E OLDMASK=0000000000000000 CR2=0000000000000000
xmm0 00000000000004f6 (f: 1270.000000, d: 6.274634e-321)
xmm1 00007f757801b668 (f: 2013378176.000000, d: 6.923960e-310)
xmm2 00000000000004f5 (f: 1269.000000, d: 6.269693e-321)
xmm3 796c746867694e5f (f: 1734954624.000000, d: 7.881345e+276)
xmm4 632e656d65686353 (f: 1701340032.000000, d: 5.735706e+169)
xmm5 706f432f6367686c (f: 1667721344.000000, d: 3.882841e+233)
xmm6 746e75722f396a6e (f: 792291968.000000, d: 6.978447e+252)
xmm7 67694e5f78756e69 (f: 2020961920.000000, d: 1.409397e+190)
xmm8 0000000000000000 (f: 0.000000, d: 0.000000e+00)
xmm9 0000000000000000 (f: 0.000000, d: 0.000000e+00)
xmm10 0000000000000000 (f: 0.000000, d: 0.000000e+00)
xmm11 0000000000000000 (f: 0.000000, d: 0.000000e+00)
xmm12 0000000000000000 (f: 0.000000, d: 0.000000e+00)
xmm13 0000000000000000 (f: 0.000000, d: 0.000000e+00)
xmm14 0000000000000000 (f: 0.000000, d: 0.000000e+00)
xmm15 0000000000000000 (f: 0.000000, d: 0.000000e+00)
Module=/home/jenkins/workspace/Test_openjdk17_j9_extended.functional_x86-64_linux_Nightly_testList_0/openjdkbinary/j2sdk-image/lib/default/libj9gc_full29.so
Module_base_address=00007F757536F000
Target=2_90_20220203_150 (Linux 5.4.0-81-generic)
CPU=amd64 (4 logical CPUs) (0x5df59a000 RAM)
----------- Stack Backtrace -----------
(0x00007F7575468B41 [libj9gc_full29.so+0xf9b41])
(0x00007F757546A5B0 [libj9gc_full29.so+0xfb5b0])
(0x00007F757546BD74 [libj9gc_full29.so+0xfcd74])
(0x00007F757549D277 [libj9gc_full29.so+0x12e277])
(0x00007F757549CA89 [libj9gc_full29.so+0x12da89])
(0x00007F757C174A13 [libj9prt29.so+0x2aa13])
(0x00007F757549C58F [libj9gc_full29.so+0x12d58f])
(0x00007F75779F14F2 [libj9thr29.so+0xe4f2])
(0x00007F757CFCB609 [libpthread.so.0+0x9609])
clone+0x43 (0x00007F757CEEA293 [libc.so.6+0x122293])
---------------------------------------

@pshipton
Member

pshipton commented Feb 4, 2022

As per #14382 (comment) setting as a blocker.

@dmitripivkine
Contributor

@pshipton
I am sure this item is a duplicate of #14382.
We can close one of the defects, or keep both if you like.

@pshipton
Member

pshipton commented Feb 4, 2022

Closing one is fine, but the remaining issue should be in the 0.31 milestone plan.

@LinHu2016
Contributor

1. The issue cannot be reproduced by running a single test (testSoftMxLocal_LP4k_3 or testSoftMxDisclaimMemory_GC_3) in Grinder x 100 on either Java 17 or Java 18, but it can be reproduced with the whole test list (extended.functional test_list_0, 2/10) on Java 17 and Java 18. It has never been reproduced on Java 8 (axxon).

2. The tests run under an extreme corner case for balanced GC (-Xmx1024m -Xsoftmx512m -Xmn1m, -XX:+DisclaimVirtualMemory, region size = 512K, eden region count = 2). The test case uses a couple of large arrays, and the largest array has 40 leaf regions. As a result, more than half of the GCs did almost nothing: both eden regions had been allocated for arraylet leaves, so there were zero regions left for the collection set. I am not sure whether the tests intend to check this environment.

3. These tenure regions were marked and swept by the last GMP, but after that GMP the regions had not been used as survivor regions. The crash happened when the GC tried to scan a reference that points to a freeEntry header. The reference might come from the remembered set or from a live object. It looks like the GMP missed marking some traceable object, but no clue to the cause has been found yet.

4. It started failing after Jan 26th, 2022, so it seems to be a regression. During January there was only one related GC change (eclipse-omr/omr#6303, merged on Jan 20). It might not be related, but I am trying to create a latest build that excludes the change, to see whether eclipse-omr/omr#6303 triggers the issue.

the latest personal build with the change
https://hyc-runtimes-jenkins.swg-devops.com/view/OpenJ9%20-%20Personal/job/Pipeline-Build-Test-Personal/11732/
grinder x 5
https://hyc-runtimes-jenkins.swg-devops.com/view/Test_grinder/job/Grinder/20723/
the latest personal build without the change
https://hyc-runtimes-jenkins.swg-devops.com/view/OpenJ9%20-%20Personal/job/Pipeline-Build-Test-Personal/11736/
grinder x 10
https://hyc-runtimes-jenkins.swg-devops.com/view/Test_grinder/job/Grinder/20728/
https://hyc-runtimes-jenkins.swg-devops.com/view/Test_grinder/job/Grinder/20744/ --java18
https://hyc-runtimes-jenkins.swg-devops.com/view/Test_grinder/job/Grinder/20743/ --java17
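
To make the corner case in point 2 concrete, the arithmetic can be sketched as below. The option values and the region size come from the comment; the variable names are illustrative only, and the array size assumes each of the 40 arraylet leaves fills a whole region, so it is an upper bound rather than a measured value.

```python
KIB = 1024
MIB = 1024 * KIB

region_size = 512 * KIB        # region size reported in the comment
xmx = 1024 * MIB               # -Xmx1024m
softmx = 512 * MIB             # -Xsoftmx512m
xmn = 1 * MIB                  # -Xmn1m
largest_array_leaves = 40      # leaf regions used by the largest array

total_regions = xmx // region_size
softmx_regions = softmx // region_size
eden_regions = xmn // region_size
# Upper bound: assumes every leaf occupies a full region.
largest_array_bytes = largest_array_leaves * region_size

print("regions under -Xmx:", total_regions)
print("regions under -Xsoftmx:", softmx_regions)
print("eden regions:", eden_regions)
print("largest array (MiB, upper bound):", largest_array_bytes // MIB)
```

With only two eden regions available, a single array that needs 40 leaf regions can consume all of eden on its own, which matches the observation that many collections had nothing in the collection set.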

@pshipton
Member

pshipton commented Feb 7, 2022

https://openj9-jenkins.osuosl.org/job/Test_openjdk17_j9_extended.functional_x86-64_linux_Nightly_testList_0/137 - ub20-x86-1
testSoftMxLocal_LP4k_3

04:43:36.193 0x7f67c4003c00    j9mm.316    *   ** ASSERTION FAILED ** at /home/jenkins/workspace/Build_JDK17_x86-64_linux_Nightly/openj9/runtime/gc_vlhgc/WriteOnceCompactor.cpp:971: ((uintptr_t)0x99669966 == (((J9Class*)((((env)->compressObjectReferences()) ? (UDATA)(((J9ObjectCompressed*)((walk)))->clazz) : (UDATA)(((J9ObjectFull*)((walk)))->clazz)) & (~((UDATA)((UDATA)(0x100 - 1)))))))->eyecatcher)

https://openj9-artifactory.osuosl.org/artifactory/ci-openj9/Test/Test_openjdk17_j9_extended.functional_x86-64_linux_Nightly_testList_0/137/functional_test_output.tar.gz
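
As a rough illustration of how this assertion relates to the earlier crashes, the sketch below applies the same low-bit mask (0x100 - 1) to two values taken from the first debugger dump in this thread. It is only a model of the arithmetic, not the GC code itself: the function name is hypothetical, and the reading that a "hole" header word of 0x1 leads to a null class pointer is an inference from the dump, not something confirmed in the thread.

```python
CLASS_FLAG_MASK = 0x100 - 1   # low bits stripped by the assertion expression
EYECATCHER = 0x99669966       # value the assertion expects in J9Class->eyecatcher


def class_pointer(header_word):
    """Strip the low flag bits from an object header word (hypothetical helper)."""
    return header_word & ~CLASS_FLAG_MASK


# First word at 0x7F6808618E20 in the dump, the address marked as the "hole".
hole_header = 0x0000000000000001
# Class pointer shown by !j9object for the live FDBigInteger object.
live_header = 0x7F6848819200

print(hex(class_pointer(hole_header)))
print(hex(class_pointer(live_header)))
```

Under these assumptions the hole's header masks down to address 0, so reading the eyecatcher through it would touch address 0. That would be consistent with InaccessibleAddress=0 in the segmentation errors, and with 0x99669966 appearing in RCX or RAX in the register dumps.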

@LinHu2016
Contributor

LinHu2016 commented Feb 7, 2022

Personal builds both with and without the allocation hint change show similar failures, so this issue wasn't caused by the allocation hint change. It looks like the failures started between Jan 26 and Jan 27 for the internal build.

the latest personal build with allocation hint change(eclipse-omr/omr#6303)
https://hyc-runtimes-jenkins.swg-devops.com/view/Test_grinder/job/Grinder/20743/tapResults/
the latest personal build without allocation hint change
https://hyc-runtimes-jenkins.swg-devops.com/view/Test_grinder/job/Grinder/20728/tapResults/

@LinHu2016
Contributor

Update:
I have rerun a couple of Grinders x10 with the extended.functional tests on Java 17 and Java 18 (x86-64 Linux). Based on the results, some changes merged on Jan 26 might have triggered the issue.
on java 18
build 20 -- good build
03:05:10 openjdk version "18-beta" 2022-03-22
03:05:10 IBM Semeru Runtime Open Edition 18.0.0.0 (build 18-beta+32-202201260740)
03:05:10 Eclipse OpenJ9 VM 18.0.0.0 (build master-729fc428f, JRE 18 Linux amd64-64-Bit Compressed References 20220126_20 (JIT enabled, AOT enabled)
03:06:03 OpenJ9 - 729fc42
03:06:03 OMR - 81bc4828b
03:06:03 JCL - d3986cddfd based on jdk-18+32)
https://hyc-runtimes-jenkins.swg-devops.com/view/Test_grinder/job/Grinder/20770/
build 21 -- bad build
22:18:55 openjdk version "18-beta" 2022-03-22
22:18:55 IBM Semeru Runtime Open Edition 18.0.0.0 (build 18-beta+32-202201270250)
22:18:55 Eclipse OpenJ9 VM 18.0.0.0 (build master-5e9bd4ea0, JRE 18 Linux amd64-64-Bit Compressed References 20220127_21 (JIT enabled, AOT enabled)
22:18:55 OpenJ9 - 5e9bd4e
22:18:55 OMR - 4a009d2fe
22:18:55 JCL - 0764ffcb69 based on jdk-18+32)
eclipse-omr/omr@81bc482...4a009d2
729fc42...5e9bd4e
on java 17
build 112 -- good build
openjdk version "17.0.2" 2022-01-18
IBM Semeru Runtime Open Edition 17.0.2.0-rc1 (build 17.0.2+8)
Eclipse OpenJ9 VM 17.0.2.0-rc1 (build openj9-0.30.0-rc1, JRE 17 Linux amd64-64-Bit Compressed References 20220126_112 (JIT enabled, AOT enabled)
OpenJ9 - f441547
OMR - dac962a28
JCL - 7a0f6b5186d based on jdk-17.0.2+8)
build 113 -- bad build
11:52:11 openjdk version "17.0.2-beta" 2022-01-18
11:52:11 IBM Semeru Runtime Open Edition 17.0.2+8-202201261627 (build 17.0.2-beta+8-202201261627)
11:52:11 Eclipse OpenJ9 VM 17.0.2+8-202201261627 (build master-301473370, JRE 17 Linux amd64-64-Bit Compressed References 20220126_113 (JIT enabled, AOT enabled)
11:52:11 OpenJ9 - 3014733
11:52:11 OMR - 4a009d2fe
11:52:11 JCL - 4d901a6aa5 based on jdk-17.0.2+8)
https://hyc-runtimes-jenkins.swg-devops.com/view/Test_grinder/job/Grinder/20778/
build 114
00:32:50 openjdk version "17.0.2-beta" 2022-01-18
00:32:50 IBM Semeru Runtime Open Edition 17.0.2+8-202201270505 (build 17.0.2-beta+8-202201270505)
00:32:50 Eclipse OpenJ9 VM 17.0.2+8-202201270505 (build master-bdf0b3d46, JRE 17 Linux amd64-64-Bit Compressed References 20220127_114 (JIT enabled, AOT enabled)
00:32:50 OpenJ9 - bdf0b3d
00:32:50 OMR - 4a009d2fe
00:32:50 JCL - 148d7b1285 based on jdk-17.0.2+8)
https://hyc-runtimes-jenkins.swg-devops.com/view/Test_grinder/job/Grinder/20780/
Since the issue occurs in both Java 18 and Java 17, the related change is unlikely to be in JCL. Java 17 build 112 is a release build and build 113 is a regular build, so we cross-compare Java 18 and Java 17 for OpenJ9. The related change is likely within the ranges below.
eclipse-omr/omr@81bc482...4a009d2
729fc42...3014733
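
The cross-comparison above can be sketched as intersecting two "after last good, up to first bad" ranges. This is only an illustration: the three OpenJ9 hashes are from the comment, but the linear history, the placeholder commits "x1" and "x2", and the ordering (inferred from the build timestamps) are assumptions.

```python
def commits_after(history, good, bad):
    """Commits in (good, bad]: after the good build, up to and including the bad one."""
    start = history.index(good)
    end = history.index(bad)
    return history[start + 1 : end + 1]


# Hypothetical linear history, oldest first. "x1" and "x2" are placeholders
# standing in for whatever commits really sit between the known builds.
history = ["729fc42", "x1", "3014733", "x2", "5e9bd4e"]

# Java 18: build 20 (729fc42) is good, build 21 (5e9bd4e) is bad.
java18_range = commits_after(history, "729fc42", "5e9bd4e")
# Java 17: build 113 (3014733) is already bad, so later commits are ruled out.
java17_range = commits_after(history, "729fc42", "3014733")

suspects = [c for c in java18_range if c in java17_range]
print(suspects)
```

In this model anything merged after 3014733 drops out, which is why the narrower range 729fc42...3014733 is the one worth reviewing.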

JasonFengJ9 changed the title from "JDK18 testSoftMxDisclaimMemory_GC_3_FAILED : Segmentation error vmState=0x0002000f" to "testSoftMxDisclaimMemory_GC_3_FAILED : Segmentation error vmState=0x0002000f" on Feb 8, 2022
@pshipton
Member

pshipton commented Feb 8, 2022

https://openj9-jenkins.osuosl.org/job/Test_openjdk11_j9_extended.functional_x86-64_linux_Nightly_testList_0/194 - ub16x64j98
testDefaultDisclaimMemory_3
-Xjit -Xgcpolicy:balanced -Xnocompressedrefs

https://openj9-artifactory.osuosl.org/artifactory/ci-openj9/Test/Test_openjdk11_j9_extended.functional_x86-64_linux_Nightly_testList_0/194/functional_test_output.tar.gz

Unhandled exception
Type=Segmentation error vmState=0x00020002
J9Generic_Signal_Number=00000018 Signal_Number=0000000b Error_Value=00000000 Signal_Code=00000080
Handler1=00007F2F54FA1990 Handler2=00007F2F54CFEBF0 InaccessibleAddress=0000000000000000
RDI=00007F2F50087060 RSI=00007F2E940056B8 RAX=0000000099669966 RBX=0074007300650000
RCX=0000000000000001 RDX=0000000000000003 R8=00007F2E940056B8 R9=0000000000000005
R10=00007F2F0E313EA8 R11=00007F2F50087060 R12=0000000000000001 R13=00007F2F4E4EE340
R14=00007F2F50087060 R15=00007F2E940056B8
RIP=00007F2F4E11DF41 GS=0000 FS=0000 RSP=00007F2EB3B94B90
EFlags=0000000000010206 CS=0033 RBP=00007F2F0E315910 ERR=0000000000000000
TRAPNO=000000000000000D OLDMASK=0000000000000000 CR2=00007FC5550FAD80
xmm0 0000000000000f9d (f: 3997.000000, d: 1.974780e-320)
xmm1 0000000000000000 (f: 0.000000, d: 0.000000e+00)
xmm2 0000000000000f9c (f: 3996.000000, d: 1.974286e-320)
xmm3 0000000000000000 (f: 0.000000, d: 0.000000e+00)
xmm4 0000000000000000 (f: 0.000000, d: 0.000000e+00)
xmm5 0000000000000000 (f: 0.000000, d: 0.000000e+00)
xmm6 0000000000000001 (f: 1.000000, d: 4.940656e-324)
xmm7 0000000000000001 (f: 1.000000, d: 4.940656e-324)
xmm8 0000000000000001 (f: 1.000000, d: 4.940656e-324)
xmm9 0000000000000001 (f: 1.000000, d: 4.940656e-324)
xmm10 000000003e8c05e2 (f: 1049363968.000000, d: 5.184547e-315)
xmm11 000000003c075284 (f: 1007112832.000000, d: 4.975799e-315)
xmm12 0000000040400000 (f: 1077936128.000000, d: 5.325712e-315)
xmm13 3f0b77584b155748 (f: 1259689856.000000, d: 5.238760e-05)
xmm14 0000000000000000 (f: 0.000000, d: 0.000000e+00)
xmm15 3fe0000000000000 (f: 0.000000, d: 5.000000e-01)
Module=/home/jenkins/workspace/Test_openjdk11_j9_extended.functional_x86-64_linux_Nightly_testList_0/openjdkbinary/j2sdk-image/lib/default/libj9gc_full29.so
Module_base_address=00007F2F4E024000
Target=2_90_20220208_212 (Linux 4.4.0-170-generic)
CPU=amd64 (4 logical CPUs) (0x5e2f07000 RAM)
----------- Stack Backtrace -----------
(0x00007F2F4E11DF41 [libj9gc_full29.so+0xf9f41])
(0x00007F2F4E11F9B0 [libj9gc_full29.so+0xfb9b0])
(0x00007F2F4E121174 [libj9gc_full29.so+0xfd174])
(0x00007F2F4E1526B7 [libj9gc_full29.so+0x12e6b7])
(0x00007F2F4E151EC9 [libj9gc_full29.so+0x12dec9])
(0x00007F2F54CFF953 [libj9prt29.so+0x2a953])
(0x00007F2F4E1519CF [libj9gc_full29.so+0x12d9cf])
(0x00007F2F54AC84F6 [libj9thr29.so+0xe4f6])
(0x00007F2F56CBC6BA [libpthread.so.0+0x76ba])
clone+0x6d (0x00007F2F565DD41D [libc.so.6+0x10741d])
---------------------------------------

@dmitripivkine
Contributor

@LinHu2016 repeated testing to narrow down the problem and has provided diffs between last good and first bad builds:
eclipse-omr/omr@81bc482...4a009d2
729fc42...5e9bd4e

@LinHu2016
Contributor

I have more or less narrowed down the issue.
The issue can be reproduced with the latest build using Grinder x10 of the extended.functional tests (on both Java 17 and Java 18). The two Grinders below ran against the latest personal build that excludes the change (#14357), and both passed without the failure, so it looks like #14357 triggered this issue.

java 17 grinder x10
https://hyc-runtimes-jenkins.swg-devops.com/job/Grinder/20888/
java 18 grinder x10
https://hyc-runtimes-jenkins.swg-devops.com/job/Grinder/20883/
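
As a sanity check on how much weight clean Grinder runs carry for an intermittent failure, here is a minimal sketch. It assumes the "2/10" figure mentioned earlier in this thread means roughly a 20% failure rate per run of the test list, and that runs are independent; both are assumptions, not measured properties of these jobs.

```python
def prob_all_pass(failure_rate, runs):
    """Chance that every one of `runs` independent runs passes."""
    return (1.0 - failure_rate) ** runs


assumed_failure_rate = 2 / 10   # assumption based on the "2/10" figure above

one_grinder = prob_all_pass(assumed_failure_rate, 10)   # a single Grinder x10
two_grinders = prob_all_pass(assumed_failure_rate, 20)  # both Grinders x10 together

print(f"one clean Grinder x10:  {one_grinder:.3f}")
print(f"two clean Grinders x10: {two_grinders:.3f}")
```

Under those assumptions a single clean x10 run would not be rare even with the bug still present, while two clean x10 runs together make that much less likely, so having both the Java 17 and the Java 18 Grinders pass is the more meaningful signal.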

@pshipton
Member

That means #14471 didn't fix it; it was still failing in last night's nightly build, as per #14396 (comment).

@pshipton
Member

The failures continue to occur but I'm not going to keep reporting them.

@pshipton
Member

Is there any update on fixing this problem?

@dmitripivkine
Contributor

This issue is still being investigated. We know it is caused by #14357 but don't yet understand exactly how. We are running a number of Grinders to narrow down the scope of the issue.
#14357 cannot be reverted easily because it fixes a Java 18 issue.

@LinHu2016
Contributor

A fix is in #14577; I still need to run Grinders to confirm it.
