Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

NPE in ConnectorXAResource.getResourceHandle causes connection pool problems #24805

Open
escay opened this issue Feb 12, 2024 · 17 comments · May be fixed by #25131
Open

NPE in ConnectorXAResource.getResourceHandle causes connection pool problems #24805

escay opened this issue Feb 12, 2024 · 17 comments · May be fixed by #25131

Comments

@escay
Copy link
Contributor

escay commented Feb 12, 2024

Environment Details

  • GlassFish Version (and build number): 7.0.10
  • JDK version: jdk-17.0.8.1_1-openjdk-adoptium
  • OS: Windows 10
  • Database: Postgresql

Problem Description

I am running integration tests against a large ear file. During the tests database connection pool exceptions occur.
These tests have been running against Payara 5 + 6 and Weblogic 12 and 14. Now I am testing against Glassfish 7.

I ran into the same issue which I logged in the past for Payara 5:
payara/Payara#3025

Which was fixed / improved in:
payara/Payara#3344

Note: exact cause was never found. This fix was more preventing the npe.

Now I am testing in Glassfish 7 and it shows a similar stacktrace and debugging shows same point as error.

image

Caused by: java.lang.RuntimeException: Got exception during XAResource.end: javax.transaction.xa.XAException: com.sun.appserv.connectors.internal.api.PoolingException: java.lang.NullPointerException: Cannot invoke "com.sun.enterprise.resource.ResourceHandle.getResourceState()" because "resourceHandle" is null
	at com.sun.enterprise.transaction.JavaEETransactionManagerSimplified.delistResource(JavaEETransactionManagerSimplified.java:522) ~[jta.jar:?]
	at com.sun.enterprise.resource.rm.ResourceManagerImpl.unregisterResource(ResourceManagerImpl.java:248) ~[connectors-runtime.jar:?]
	at com.sun.enterprise.resource.rm.ResourceManagerImpl.delistResource(ResourceManagerImpl.java:209) ~[connectors-runtime.jar:?]
	at com.sun.enterprise.resource.pool.PoolManagerImpl.resourceClosed(PoolManagerImpl.java:353) ~[connectors-runtime.jar:?]
	at com.sun.enterprise.resource.listener.LocalTxConnectionEventListener.connectionClosed(LocalTxConnectionEventListener.java:58) ~[connectors-runtime.jar:?]
	at com.sun.gjc.spi.ManagedConnectionImpl.connectionClosed(ManagedConnectionImpl.java:735) ~[jdbc-ra-ds.jar:?]
	at com.sun.gjc.spi.base.ConnectionHolder.close(ConnectionHolder.java:193) ~[jdbc-ra-ds.jar:?]
	at com.sun.gjc.spi.jdbc40.ConnectionHolder40.close(ConnectionHolder40.java:590) ~[jdbc-ra-ds.jar:?]
	at com.mycompany.database.scheduler.dbbase.TaskDbBase.insertTask(TaskDbBase.java:605) ~[database.interface-3.0.2-SNAPSHOT_jar/:?]
	at jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:?]
	at jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77) ~[?:?]
	at jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:?]
	at java.lang.reflect.Method.invoke(Method.java:568) ~[?:?]
	at org.glassfish.exousia.AuthorizationService.lambda$invokeBeanMethod$5(AuthorizationService.java:505) ~[exousia.jar:2.1.1]
	at org.glassfish.exousia.AuthorizationService.runInScope(AuthorizationService.java:581) ~[exousia.jar:2.1.1]
	at org.glassfish.exousia.AuthorizationService.invokeBeanMethod(AuthorizationService.java:505) ~[exousia.jar:2.1.1]
	at org.glassfish.ejb.security.application.EJBSecurityManager.invoke(EJBSecurityManager.java:295) ~[ejb-container.jar:?]
	at com.sun.ejb.containers.BaseContainer.invokeBeanMethod(BaseContainer.java:4193) ~[ejb-container.jar:?]
	at com.sun.ejb.EjbInvocation.invokeBeanMethod(EjbInvocation.java:604) ~[ejb-container.jar:?]
	at com.sun.ejb.containers.interceptors.AroundInvokeChainImpl.invokeNext(InterceptorManager.java:706) ~[ejb-container.jar:?]
	at com.sun.ejb.EjbInvocation.proceed(EjbInvocation.java:553) ~[ejb-container.jar:?]
	at com.sun.ejb.containers.interceptors.SystemInterceptorProxy.doCall(SystemInterceptorProxy.java:122) ~[ejb-container.jar:?]
	at com.sun.ejb.containers.interceptors.SystemInterceptorProxy.aroundInvoke(SystemInterceptorProxy.java:99) ~[ejb-container.jar:?]
	at jdk.internal.reflect.GeneratedMethodAccessor100.invoke(Unknown Source) ~[?:?]
	at jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:?]
	at java.lang.reflect.Method.invoke(Method.java:568) ~[?:?]
	at com.sun.ejb.containers.interceptors.AroundInvokeInterceptor.intercept(InterceptorManager.java:766) ~[ejb-container.jar:?]
	at com.sun.ejb.containers.interceptors.AroundInvokeChainImpl.invokeNext(InterceptorManager.java:706) ~[ejb-container.jar:?]
	at com.sun.ejb.containers.interceptors.InterceptorManager.intercept(InterceptorManager.java:307) ~[ejb-container.jar:?]
	at com.sun.ejb.containers.BaseContainer.__intercept(BaseContainer.java:4168) ~[ejb-container.jar:?]
	at com.sun.ejb.containers.BaseContainer.intercept(BaseContainer.java:4158) ~[ejb-container.jar:?]
	at com.sun.ejb.containers.EJBLocalObjectInvocationHandler.invoke(EJBLocalObjectInvocationHandler.java:169) ~[ejb-container.jar:?]
	at com.sun.ejb.containers.EJBLocalObjectInvocationHandlerDelegate.invoke(EJBLocalObjectInvocationHandlerDelegate.java:61) ~[ejb-container.jar:?]
	at jdk.proxy43.$Proxy356.insertTask(Unknown Source) ~[?:?]
	at com.mycompany.task.TaskMainInfBean.justCreateTaskTask(TaskMainInfBean.java:689) ~[scheduler-3.0.2-SNAPSHOT_jar/:?]
	at com.mycompany.task.TaskMainInfBean.createTask(TaskMainInfBean.java:266) ~[scheduler-3.0.2-SNAPSHOT_jar/:?]
	at jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:?]
	at jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77) ~[?:?]
	at jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:?]
	at java.lang.reflect.Method.invoke(Method.java:568) ~[?:?]
	at org.glassfish.exousia.AuthorizationService.lambda$invokeBeanMethod$5(AuthorizationService.java:505) ~[exousia.jar:2.1.1]
	at org.glassfish.exousia.AuthorizationService.runInScope(AuthorizationService.java:581) ~[exousia.jar:2.1.1]
	at org.glassfish.exousia.AuthorizationService.invokeBeanMethod(AuthorizationService.java:505) ~[exousia.jar:2.1.1]
	at org.glassfish.ejb.security.application.EJBSecurityManager.invoke(EJBSecurityManager.java:295) ~[ejb-container.jar:?]
	at com.sun.ejb.containers.BaseContainer.invokeBeanMethod(BaseContainer.java:4193) ~[ejb-container.jar:?]
	at com.sun.ejb.EjbInvocation.invokeBeanMethod(EjbInvocation.java:604) ~[ejb-container.jar:?]
	at com.sun.ejb.containers.interceptors.AroundInvokeChainImpl.invokeNext(InterceptorManager.java:706) ~[ejb-container.jar:?]
	at com.sun.ejb.EjbInvocation.proceed(EjbInvocation.java:553) ~[ejb-container.jar:?]
	at org.jboss.weld.module.ejb.AbstractEJBRequestScopeActivationInterceptor.aroundInvoke(AbstractEJBRequestScopeActivationInterceptor.java:78) ~[weld-osgi-bundle.jar:5.1.2.Final]
	at org.jboss.weld.module.ejb.SessionBeanInterceptor.aroundInvoke(SessionBeanInterceptor.java:52) ~[weld-osgi-bundle.jar:5.1.2.Final]
	at jdk.internal.reflect.GeneratedMethodAccessor99.invoke(Unknown Source) ~[?:?]
	at jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:?]
	at java.lang.reflect.Method.invoke(Method.java:568) ~[?:?]
	at com.sun.ejb.containers.interceptors.AroundInvokeInterceptor.intercept(InterceptorManager.java:766) ~[ejb-container.jar:?]
	at com.sun.ejb.containers.interceptors.AroundInvokeChainImpl.invokeNext(InterceptorManager.java:706) ~[ejb-container.jar:?]
	at com.sun.ejb.EjbInvocation.proceed(EjbInvocation.java:553) ~[ejb-container.jar:?]
	at com.sun.ejb.containers.interceptors.SystemInterceptorProxy.doCall(SystemInterceptorProxy.java:122) ~[ejb-container.jar:?]
	at com.sun.ejb.containers.interceptors.SystemInterceptorProxy.aroundInvoke(SystemInterceptorProxy.java:99) ~[ejb-container.jar:?]
	at jdk.internal.reflect.GeneratedMethodAccessor100.invoke(Unknown Source) ~[?:?]
	at jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:?]
	at java.lang.reflect.Method.invoke(Method.java:568) ~[?:?]
	at com.sun.ejb.containers.interceptors.AroundInvokeInterceptor.intercept(InterceptorManager.java:766) ~[ejb-container.jar:?]
	at com.sun.ejb.containers.interceptors.AroundInvokeChainImpl.invokeNext(InterceptorManager.java:706) ~[ejb-container.jar:?]
	at com.sun.ejb.containers.interceptors.InterceptorManager.intercept(InterceptorManager.java:307) ~[ejb-container.jar:?]
	at com.sun.ejb.containers.BaseContainer.__intercept(BaseContainer.java:4168) ~[ejb-container.jar:?]
	at com.sun.ejb.containers.BaseContainer.intercept(BaseContainer.java:4158) ~[ejb-container.jar:?]
	at com.sun.ejb.containers.EJBLocalObjectInvocationHandler.invoke(EJBLocalObjectInvocationHandler.java:169) ~[ejb-container.jar:?]
	at com.sun.ejb.containers.EJBLocalObjectInvocationHandlerDelegate.invoke(EJBLocalObjectInvocationHandlerDelegate.java:61) ~[ejb-container.jar:?]
	at jdk.proxy43.$Proxy396.createTask(Unknown Source) ~[?:?]
	at com.mycompany.useraction.UserActionScheduler.createTaskWithoutActivation(UserActionScheduler.java:331) ~[useraction-3.0.2-SNAPSHOT_jar/:?]
	at com.mycompany.useraction.UserActionScheduler.startUserActionTargetProcessing(UserActionScheduler.java:242) ~[useraction-3.0.2-SNAPSHOT_jar/:?]
	at com.mycompany.useraction.UserActionInfBean.startUserActionTargetProcessing(UserActionInfBean.java:373) ~[useraction-3.0.2-SNAPSHOT_jar/:?]
	at jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:?]
	at jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77) ~[?:?]
	at jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:?]
	at java.lang.reflect.Method.invoke(Method.java:568) ~[?:?]
	at org.glassfish.exousia.AuthorizationService.lambda$invokeBeanMethod$5(AuthorizationService.java:505) ~[exousia.jar:2.1.1]
	at org.glassfish.exousia.AuthorizationService.runInScope(AuthorizationService.java:581) ~[exousia.jar:2.1.1]
	at org.glassfish.exousia.AuthorizationService.invokeBeanMethod(AuthorizationService.java:505) ~[exousia.jar:2.1.1]
	at org.glassfish.ejb.security.application.EJBSecurityManager.invoke(EJBSecurityManager.java:295) ~[ejb-container.jar:?]
	at com.sun.ejb.containers.BaseContainer.invokeBeanMethod(BaseContainer.java:4193) ~[ejb-container.jar:?]
	at com.sun.ejb.EjbInvocation.invokeBeanMethod(EjbInvocation.java:604) ~[ejb-container.jar:?]
	at com.sun.ejb.containers.interceptors.AroundInvokeChainImpl.invokeNext(InterceptorManager.java:706) ~[ejb-container.jar:?]
	at com.sun.ejb.EjbInvocation.proceed(EjbInvocation.java:553) ~[ejb-container.jar:?]
	at com.sun.ejb.containers.interceptors.SystemInterceptorProxy.doCall(SystemInterceptorProxy.java:122) ~[ejb-container.jar:?]
	at com.sun.ejb.containers.interceptors.SystemInterceptorProxy.aroundInvoke(SystemInterceptorProxy.java:99) ~[ejb-container.jar:?]
	at jdk.internal.reflect.GeneratedMethodAccessor100.invoke(Unknown Source) ~[?:?]
	at jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:?]
	at java.lang.reflect.Method.invoke(Method.java:568) ~[?:?]
	at com.sun.ejb.containers.interceptors.AroundInvokeInterceptor.intercept(InterceptorManager.java:766) ~[ejb-container.jar:?]
	at com.sun.ejb.containers.interceptors.AroundInvokeChainImpl.invokeNext(InterceptorManager.java:706) ~[ejb-container.jar:?]
	at com.sun.ejb.containers.interceptors.InterceptorManager.intercept(InterceptorManager.java:307) ~[ejb-container.jar:?]
	at com.sun.ejb.containers.BaseContainer.__intercept(BaseContainer.java:4168) ~[ejb-container.jar:?]
	at com.sun.ejb.containers.BaseContainer.intercept(BaseContainer.java:4158) ~[ejb-container.jar:?]
	at com.sun.ejb.containers.EJBLocalObjectInvocationHandler.invoke(EJBLocalObjectInvocationHandler.java:169) ~[ejb-container.jar:?]
	... 131 more

The bean code is:

@Stateless
@TransactionAttribute(TransactionAttributeType.REQUIRED)
TaskMainInfBean {
    @EJB
    private TaskDbLocal taskDb;
    
    @Override
    @TransactionAttribute(TransactionAttributeType.REQUIRES_NEW)
    public long createTask(BaseTaskValue task) {
        taskDb.insertTask(task);
    }
}

and database code is:

@Stateless
TaskDbBase {
    @EJB
    protected DbUtilsLocal dbUtils;
    
    @Override
    @TransactionAttribute(TransactionAttributeType.REQUIRED)
    public void insertTask(BaseTaskValue taskValue) {
        log.debug("@>insertTask(taskValue={})", taskValue);

        try (Connection conn = dbUtils.getConnection();
                PreparedStatement ps = conn.prepareStatement(INSERT_INTO_TASK_QUERY)) {

  ...
            ps.executeUpdate();
        } catch (SQLException ex) {
            log.error("insertTask(taskValue={})->failed", taskValue, ex);
            throw new MycompanySqlRuntimeException(ex);
        } catch (IOException e) {
            log.error("insertTask(taskValue={})->failed", taskValue, e);
            throw new MycompanyApplicationException(e);
        }

        log.debug("@<insertTask(taskValue={})", taskValue);
    }
}     

with:

@Singleton
DbUtilsBase {
    @Resource(name = SysDef.JNDI_DIR + SysDef.JDBC_DMPMAIN)
    private DataSource mycompanyDatabase;

    @Override
    @TransactionAttribute(TransactionAttributeType.SUPPORTS)
    public Connection getConnection() throws SQLException {
        long numberOfCalls = callCounter.incrementAndGet();
        
        Connection connection = mycompanyDatabase.getConnection();
        // For PostgreSQL it is required to set the autocommit to false to be able to import masterdata in
        // parts and undo the commit when too many failures occurred. For Oracle it is not needed, but it does not hurt.
        connection.setAutoCommit(false);
        return connection;
    }
}

One thing special: the TaskMainInfBean is looked up via jndi via "java:global"

Steps to reproduce

Start Glassfish 7.
Deploy ear file.
Run integration test, which makes REST api calls to test code in the ear file.

Impact of Issue

The test itself is inserting new records in the database. The connection pool is 25 connections. The test assumes the inserts into the database work and should succeed. The rest calls made to the server are not in parallel, only one call at a time: not expecting any connection pool sizing issues.

Exepectation was: test succeeds, but fails due to above exception.
Extra problem: Due to exceptions the connections from the connection pool get lost / are not immediately available again. Without ConnectionLeakReclaim it seems they stay 'lost'.

With ConnectionLeakReclaim=true and ConnectionLeakTimeoutInSeconds=60 you see reclaims:

[#|2024-02-12T10:31:56.786289+01:00|INFO|GF 7.0.10|jakarta.enterprise.resource.resourceadapter.com.sun.enterprise.resource.pool|_ThreadID=163;_ThreadName=connector-timer-proxy;_LevelValue=800;|
  Reclaiming the leaked connection of pool [mycompanyAmmDSPool] and destroying it so as to avoid both the application that leaked the connection and any other request that can potentially acquire the same connection from the pool end up using the connection at the same time|#]

[#|2024-02-12T10:31:56.787285+01:00|WARNING|GF 7.0.10|jakarta.enterprise.resource.resourceadapter.com.sun.enterprise.resource.pool|_ThreadID=163;_ThreadName=connector-timer-proxy;_LevelValue=900;|
  A potential connection leak detected for connection pool org.glassfish.resourcebase.resources.api.PoolInfo@11cca80c[jndiName=mycompanyAmmDSPool, applicationName=null, moduleName=null]. The stack trace of the thread is provided below:
com.sun.enterprise.resource.pool.ConnectionPool.setResourceStateToBusy(ConnectionPool.java:302)
com.sun.enterprise.resource.pool.ConnectionPool.getResourceFromPool(ConnectionPool.java:719)
com.sun.enterprise.resource.pool.ConnectionPool.getUnenlistedResource(ConnectionPool.java:590)
com.sun.enterprise.resource.pool.ConnectionPool.internalGetResource(ConnectionPool.java:475)
com.sun.enterprise.resource.pool.ConnectionPool.getResource(ConnectionPool.java:344)
com.sun.enterprise.resource.pool.PoolManagerImpl.getResourceFromPool(PoolManagerImpl.java:223)
com.sun.enterprise.resource.pool.PoolManagerImpl.getResource(PoolManagerImpl.java:152)
com.sun.enterprise.connectors.ConnectionManagerImpl.getResource(ConnectionManagerImpl.java:355)
com.sun.enterprise.connectors.ConnectionManagerImpl.internalGetConnection(ConnectionManagerImpl.java:298)
com.sun.enterprise.connectors.ConnectionManagerImpl.allocateConnection(ConnectionManagerImpl.java:183)
com.sun.enterprise.connectors.ConnectionManagerImpl.allocateConnection(ConnectionManagerImpl.java:157)
com.sun.enterprise.connectors.ConnectionManagerImpl.allocateConnection(ConnectionManagerImpl.java:151)
com.sun.gjc.spi.base.AbstractDataSource.getConnection(AbstractDataSource.java:99)
com.mycompany.database.util.DbUtilsBase.getConnection(DbUtilsBase.java:55)
jdk.internal.reflect.GeneratedMethodAccessor104.invoke(Unknown Source)
java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
java.base/java.lang.reflect.Method.invoke(Method.java:568)
org.glassfish.exousia.AuthorizationService.lambda$invokeBeanMethod$5(AuthorizationService.java:505)
org.glassfish.exousia.AuthorizationService.runInScope(AuthorizationService.java:581)
org.glassfish.exousia.AuthorizationService.invokeBeanMethod(AuthorizationService.java:505)
org.glassfish.ejb.security.application.EJBSecurityManager.invoke(EJBSecurityManager.java:295)
com.sun.ejb.containers.BaseContainer.invokeBeanMethod(BaseContainer.java:4193)
com.sun.ejb.EjbInvocation.invokeBeanMethod(EjbInvocation.java:604)
com.sun.ejb.containers.interceptors.AroundInvokeChainImpl.invokeNext(InterceptorManager.java:706)
com.sun.ejb.EjbInvocation.proceed(EjbInvocation.java:553)
com.sun.ejb.containers.interceptors.SystemInterceptorProxy.doCall(SystemInterceptorProxy.java:122)
com.sun.ejb.containers.interceptors.SystemInterceptorProxy.aroundInvoke(SystemInterceptorProxy.java:99)
jdk.internal.reflect.GeneratedMethodAccessor100.invoke(Unknown Source)
java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
java.base/java.lang.reflect.Method.invoke(Method.java:568)
com.sun.ejb.containers.interceptors.AroundInvokeInterceptor.intercept(InterceptorManager.java:766)
com.sun.ejb.containers.interceptors.AroundInvokeChainImpl.invokeNext(InterceptorManager.java:706)
com.sun.ejb.containers.interceptors.InterceptorManager.intercept(InterceptorManager.java:307)
com.sun.ejb.containers.BaseContainer.__intercept(BaseContainer.java:4168)
com.sun.ejb.containers.BaseContainer.intercept(BaseContainer.java:4158)
com.sun.ejb.containers.EJBLocalObjectInvocationHandler.invoke(EJBLocalObjectInvocationHandler.java:169)
com.sun.ejb.containers.EJBLocalObjectInvocationHandlerDelegate.invoke(EJBLocalObjectInvocationHandlerDelegate.java:61)
jdk.proxy43/jdk.proxy43.$Proxy373.getConnection(Unknown Source)
com.mycompany.database.scheduler.dbbase.TaskDbBase.insertTask(TaskDbBase.java:567)

Investigation

What I will do:

@escay
Copy link
Contributor Author

escay commented Feb 12, 2024

Testing using the fix of: payara/Payara#3344 solves the issue when applied to 7.0.10 codeline in class ConnectorXAResource: No exceptions. Integration test succeeds.

Postgres connection pool connections are all state: 'idle' after the test, validated using postgresql query:
select application_name, query_start, state_change, state, left(query, 70) from pg_stat_activity;

NOTE I did not apply the fix of payara/Payara#769 to ConnectorXAResource, but it might be needed in the future in Glassfish as well. I am only testing working connections and transactions at this moment, no negative testing which might show issue 769 at this moment.

@arjantijms
Copy link
Contributor

I reviewed that fix back then and tried to find some root cause, but it's illusive indeed. As the simple NPE guard has been in use for years now, maybe we should just apply it too. Although I still like to find that actual cause.

@escay
Copy link
Contributor Author

escay commented Feb 13, 2024

I have been able to reproduce it with: jakarta.enterprise.resource.jta.level=FINE enabled.

After 1946 times --Created new JavaEETransactionImpl logs it happens. Log contains at that moment about 4173 log lines with In JavaEETransactionManagerSimplified.delistResource. The log file is 575740 lines long.
glassfish-issue-24805-jta-log-until-first-failure.zip

Tips where to look or focus at are welcome.

One pattern I see is that all records before the failure show a not null value for nonXAResource, like:
txId=1945 nonXAResource=9 and suddenly there is txId=1946 nonXAResource=null and the syncs= does not contain PoolManagerImpl$SynchronizationListener

Example line 572522 differs from the ones before it:

Line 572104: In JavaEETransactionManagerSimplified.delistResource, h=11 h.xares=com.sun.enterprise.resource.ConnectorXAResource@79c1612b tran=JavaEETransactionImpl: txId=1945 nonXAResource=9 jtsTx=null localTxStatus=0 syncs=[com.sun.ejb.containers.ContainerSynchronization@6b11826c, com.sun.ejb.containers.SimpleEjbResourceHandlerImpl@11e3e28d, com.sun.enterprise.resource.pool.PoolManagerImpl$SynchronizationListener@3030ec9f]]]
Line 572130: In JavaEETransactionManagerSimplified.delistResource, h=9 h.xares=com.sun.enterprise.resource.ConnectorXAResource@389683f4 tran=JavaEETransactionImpl: txId=1945 nonXAResource=9 jtsTx=null localTxStatus=0 syncs=[com.sun.ejb.containers.ContainerSynchronization@6b11826c, com.sun.ejb.containers.SimpleEjbResourceHandlerImpl@11e3e28d, com.sun.enterprise.resource.pool.PoolManagerImpl$SynchronizationListener@3030ec9f]]]
Line 572320: In JavaEETransactionManagerSimplified.delistResource, h=9 h.xares=com.sun.enterprise.resource.ConnectorXAResource@389683f4 tran=JavaEETransactionImpl: txId=1945 nonXAResource=9 jtsTx=null localTxStatus=0 syncs=[com.sun.ejb.containers.ContainerSynchronization@6b11826c, com.sun.ejb.containers.SimpleEjbResourceHandlerImpl@11e3e28d, com.sun.enterprise.resource.pool.PoolManagerImpl$SynchronizationListener@3030ec9f]]]
Line 572328: In JavaEETransactionManagerSimplified.delistResource, h=10 h.xares=com.sun.enterprise.resource.ConnectorXAResource@27ed6f23 tran=JavaEETransactionImpl: txId=1945 nonXAResource=9 jtsTx=null localTxStatus=0 syncs=[com.sun.ejb.containers.ContainerSynchronization@6b11826c, com.sun.ejb.containers.SimpleEjbResourceHandlerImpl@11e3e28d, com.sun.enterprise.resource.pool.PoolManagerImpl$SynchronizationListener@3030ec9f]]]
Line 572336: In JavaEETransactionManagerSimplified.delistResource, h=11 h.xares=com.sun.enterprise.resource.ConnectorXAResource@79c1612b tran=JavaEETransactionImpl: txId=1945 nonXAResource=9 jtsTx=null localTxStatus=0 syncs=[com.sun.ejb.containers.ContainerSynchronization@6b11826c, com.sun.ejb.containers.SimpleEjbResourceHandlerImpl@11e3e28d, com.sun.enterprise.resource.pool.PoolManagerImpl$SynchronizationListener@3030ec9f]]]
Line 572362: In JavaEETransactionManagerSimplified.delistResource, h=9 h.xares=com.sun.enterprise.resource.ConnectorXAResource@389683f4 tran=JavaEETransactionImpl: txId=1945 nonXAResource=9 jtsTx=null localTxStatus=0 syncs=[com.sun.ejb.containers.ContainerSynchronization@6b11826c, com.sun.ejb.containers.SimpleEjbResourceHandlerImpl@11e3e28d, com.sun.enterprise.resource.pool.PoolManagerImpl$SynchronizationListener@3030ec9f]]]
Line 572522: In JavaEETransactionManagerSimplified.delistResource, h=10 h.xares=com.sun.enterprise.resource.ConnectorXAResource@27ed6f23 tran=JavaEETransactionImpl: txId=1946 nonXAResource=null jtsTx=null localTxStatus=0 syncs=[com.sun.ejb.containers.ContainerSynchronization@1ef62a62, com.sun.ejb.containers.SimpleEjbResourceHandlerImpl@2c0ec22c]]]
Line 572532: 	at com.sun.enterprise.transaction.JavaEETransactionManagerSimplified.delistResource(JavaEETransactionManagerSimplified.java:520)

Why is nonXAResource null in this case?
Application code is just plain EJB transactions + database code, sometimes called via REST api, sometimes called from managed thread pool. I would not expect "XA / two-phase commit calls" in our code.

The only other situation where I see delistResource with nonXAResource=null is at the very beginning of server startup, but then the jtsTx is set. Right after EJB5109:EJB Timer Service started successfully for data source [jdbc/__TimerPool]:

Line   1287: In JavaEETransactionManagerSimplified.delistResource, h=1 h.xares=com.sun.gjc.spi.XAResourceImpl@67c07864 tran=JavaEETransactionImpl: txId=1 nonXAResource=null jtsTx=com.sun.jts.jta.TransactionImpl@de0690a7 localTxStatus=0 syncs=[com.sun.ejb.containers.ContainerSynchronization@1b1a70a6]]]

Another thing I see is:

In JavaEETransactionManagerSimplified.enlistResource, h=11 h.xares=com.sun.enterprise.resource.ConnectorXAResource@79c1612b tran=JavaEETransactionImpl: txId=1945 nonXAResource=9 jtsTx=null localTxStatus=0 syncs=[com.sun.ejb.containers.ContainerSynchronization@6b11826c, com.sun.ejb.containers.SimpleEjbResourceHandlerImpl@11e3e28d, com.sun.enterprise.resource.pool.PoolManagerImpl$SynchronizationListener@3030ec9f]]]

is called for transaction txId=1945, but not for txId = 1946, this is probably causing nonXAResource to be null.

@escay
Copy link
Contributor Author

escay commented Feb 14, 2024

Enabled more debugging and added some debug info myself.
glassfish-issue-24805-jta-log-until-first-failure-v2.zip

Enabled:

jakarta.enterprise.resource.jta.level=FINE
jakarta.enterprise.resource.resourceadapter.level=FINE

When comparing the successful transaction before the failing one I see:

  • enlistResource is not called, thus nonXAResource is never set in the failing situation
  • enlistResource is not called, because com.sun.enterprise.resource.rm.ResourceManagerImpl.registerResource(ResourceHandle) is not called
  • registerResource(ResourceHandle) is not called because com.sun.enterprise.resource.pool.PoolManagerImpl.getResource(ResourceSpec, ResourceAllocator, ClientSecurityInfo) does not call PoolManagerImpl -> getResourceManager(resourceSpec).enlistResource(resourceHandle);

Changed toString of com.sun.enterprise.resource.ResourceHandle to list the state:

    @Override
    public String toString() {
        return String.valueOf(id) + ", state=" + state;
    }

In the PoolManagerImpl.getResource(ResourceSpec, ResourceAllocator, ClientSecurityInfo) method I added logging:

    public Object getResource(ResourceSpec resourceSpec, ResourceAllocator resourceAllocator, ClientSecurityInfo clientSecurityInfo) throws PoolingException, RetryableUnavailableException {
        if (LOG.isLoggable(Level.FINE)) {
            LOG.log(Level.FINE, "getResource, resourceSpec=" + resourceSpec + " resourceAllocator=" + resourceAllocator + " clientSecurityInfo=" + clientSecurityInfo);
        }
        
        ...
        
        if (LOG.isLoggable(Level.FINE)) {
            LOG.log(Level.FINE, "getResource, transaction=" + transaction + " resourceHandle=" + resourceHandle + " isUnenlisted=" + resourceHandle.getResourceState().isUnenlisted());
        }

        try {
            if (resourceHandle.getResourceState().isUnenlisted()) {
                // The spec being used here is the spec with the updated lazy enlistment info.

                // Here's the real place where we care about the correct resource manager
                // (which in turn depends upon the ResourceSpec)
                // and that's because if lazy enlistment needs to be done we need to get the
                // LazyEnlistableResourceManager

                getResourceManager(resourceSpec).enlistResource(resourceHandle);
            } else {
                if (LOG.isLoggable(Level.FINE)) {
                    LOG.log(Level.FINE, "getResource DO NOT ENLIST RESOURCE: "+ resourceHandle);
                }
            }
        } catch (Exception e) {

Which shows in the successful situation Enlisted :false eventually results in nonXAResource to be assigned and is no longer null:

[2024-02-14T11:14:57.219388+01:00] [GF 7.0.10] [FINE] [] [jakarta.enterprise.resource.resourceadapter.com.sun.enterprise.resource.pool] [tid: _ThreadID=77 _ThreadName=http-listener-1(2)] [levelValue: 500] [[
  getResource, transaction=JavaEETransactionImpl: txId=1956 nonXAResource=null jtsTx=null localTxStatus=0 syncs=[com.sun.ejb.containers.ContainerSynchronization@3f68400, com.sun.ejb.containers.SimpleEjbResourceHandlerImpl@22f98538] resourceHandle=9, state=Enlisted :false Busy :true isUnenlisted=true]]

and in the failing situation:


[2024-02-14T11:14:57.677185+01:00] [GF 7.0.10] [FINE] [] [jakarta.enterprise.resource.resourceadapter.com.sun.enterprise.resource.pool] [tid: _ThreadID=77 _ThreadName=http-listener-1(2)] [levelValue: 500] [[
  getResource, transaction=JavaEETransactionImpl: txId=1957 nonXAResource=null jtsTx=null localTxStatus=0 syncs=[com.sun.ejb.containers.ContainerSynchronization@5370d7cc, com.sun.ejb.containers.SimpleEjbResourceHandlerImpl@1f2be94b] resourceHandle=10, state=Enlisted :true Busy :true isUnenlisted=false]]

[2024-02-14T11:14:57.677185+01:00] [GF 7.0.10] [FINE] [] [jakarta.enterprise.resource.resourceadapter.com.sun.enterprise.resource.pool] [tid: _ThreadID=77 _ThreadName=http-listener-1(2)] [levelValue: 500] [[
  getResource DO NOT ENLIST RESOURCE: 10, state=Enlisted :true Busy :true]]

The state Enlisted :true results in nonXAResource=null and never links to a resource.
I do not yet understand why Enlisted is true at that moment.

(Note to myself: there are other places in the log with the getResource DO NOT ENLIST RESOURCE line. In those places it seems the nonXAResource is already filled. Perhaps I should compare these to understand Enlisted behavior.)

Also the unregister is called in the failing situation:

[2024-02-14T11:14:57.678183+01:00] [GF 7.0.10] [FINE] [] [jakarta.enterprise.resource.resourceadapter.com.sun.enterprise.resource.rm] [tid: _ThreadID=77 _ThreadName=http-listener-1(2)] [levelValue: 500] [[
  unregisterResource: handle=10, state=Enlisted :true Busy :true, resource=com.sun.gjc.spi.ManagedConnectionImpl@5f8bedda, transactional=true]]

and then the NPE exception occurs.
Here is also some other mismatch: if the resource was not registered via registerResource, why does unregisterResource happen, should it not always be a 1:1 relation?

And this relates to:

  • how does register and unregister work by design? Why is unregister called more times than the amount of register calls?

I see more calls to unregister than to register, example:

	Line 686134:   registerResource: handle=9, state=Enlisted :false Busy :true, resource=com.sun.gjc.spi.ManagedConnectionImpl@522b4e52, transactional=true]]
	Line 688476:   unregisterResource: handle=9, state=Enlisted :true Busy :true, resource=com.sun.gjc.spi.ManagedConnectionImpl@522b4e52, transactional=true]]
	Line 691958:   unregisterResource: handle=9, state=Enlisted :true Busy :true, resource=com.sun.gjc.spi.ManagedConnectionImpl@522b4e52, transactional=true]]
	
	Line 696584:   registerResource: handle=9, state=Enlisted :false Busy :true, resource=com.sun.gjc.spi.ManagedConnectionImpl@522b4e52, transactional=true]]
	Line 697749:   unregisterResource: handle=9, state=Enlisted :true Busy :true, resource=com.sun.gjc.spi.ManagedConnectionImpl@522b4e52, transactional=true]]
	Line 701456:   unregisterResource: handle=9, state=Enlisted :true Busy :true, resource=com.sun.gjc.spi.ManagedConnectionImpl@522b4e52, transactional=true]]
	Line 701698:   unregisterResource: handle=9, state=Enlisted :true Busy :true, resource=com.sun.gjc.spi.ManagedConnectionImpl@522b4e52, transactional=true]]
	Line 701912:   unregisterResource: handle=9, state=Enlisted :true Busy :true, resource=com.sun.gjc.spi.ManagedConnectionImpl@522b4e52, transactional=true]]
	
	Line 702302:   registerResource: handle=9, state=Enlisted :false Busy :true, resource=com.sun.gjc.spi.ManagedConnectionImpl@522b4e52, transactional=true]]
	Line 702476:   registerResource: handle=10, state=Enlisted :false Busy :true, resource=com.sun.gjc.spi.ManagedConnectionImpl@5f8bedda, transactional=true]]
	Line 702688:   registerResource: handle=11, state=Enlisted :false Busy :true, resource=com.sun.gjc.spi.ManagedConnectionImpl@7454e799, transactional=true]]
	Line 702799:   unregisterResource: handle=11, state=Enlisted :true Busy :true, resource=com.sun.gjc.spi.ManagedConnectionImpl@7454e799, transactional=true]]
	Line 702847:   unregisterResource: handle=10, state=Enlisted :true Busy :true, resource=com.sun.gjc.spi.ManagedConnectionImpl@5f8bedda, transactional=true]]
	Line 702895:   unregisterResource: handle=9, state=Enlisted :true Busy :true, resource=com.sun.gjc.spi.ManagedConnectionImpl@522b4e52, transactional=true]]
	Line 703644:   unregisterResource: handle=11, state=Enlisted :true Busy :true, resource=com.sun.gjc.spi.ManagedConnectionImpl@7454e799, transactional=true]]
	Line 703687:   unregisterResource: handle=10, state=Enlisted :true Busy :true, resource=com.sun.gjc.spi.ManagedConnectionImpl@5f8bedda, transactional=true]]
	Line 703730:   unregisterResource: handle=9, state=Enlisted :true Busy :true, resource=com.sun.gjc.spi.ManagedConnectionImpl@522b4e52, transactional=true]]
	Line 703973:   unregisterResource: handle=9, state=Enlisted :true Busy :true, resource=com.sun.gjc.spi.ManagedConnectionImpl@522b4e52, transactional=true]]
	Line 704588:   unregisterResource: handle=9, state=Enlisted :true Busy :true, resource=com.sun.gjc.spi.ManagedConnectionImpl@522b4e52, transactional=true]]
	Line 704880:   unregisterResource: handle=9, state=Enlisted :true Busy :true, resource=com.sun.gjc.spi.ManagedConnectionImpl@522b4e52, transactional=true]]
	Line 705200:   unregisterResource: handle=9, state=Enlisted :true Busy :true, resource=com.sun.gjc.spi.ManagedConnectionImpl@522b4e52, transactional=true]]
	Line 705695:   unregisterResource: handle=9, state=Enlisted :true Busy :true, resource=com.sun.gjc.spi.ManagedConnectionImpl@522b4e52, transactional=true]]
	Line 706316:   unregisterResource: handle=9, state=Enlisted :true Busy :true, resource=com.sun.gjc.spi.ManagedConnectionImpl@522b4e52, transactional=true]]
	Line 706706:   unregisterResource: handle=9, state=Enlisted :true Busy :true, resource=com.sun.gjc.spi.ManagedConnectionImpl@522b4e52, transactional=true]]
	Line 707229:   unregisterResource: handle=9, state=Enlisted :true Busy :true, resource=com.sun.gjc.spi.ManagedConnectionImpl@522b4e52, transactional=true]]
	Line 707591:   unregisterResource: handle=9, state=Enlisted :true Busy :true, resource=com.sun.gjc.spi.ManagedConnectionImpl@522b4e52, transactional=true]]
	
	## missing register for handle 10
	Line 707836:   unregisterResource: handle=10, state=Enlisted :true Busy :true, resource=com.sun.gjc.spi.ManagedConnectionImpl@5f8bedda, transactional=true]]

@escay
Copy link
Contributor Author

escay commented Feb 17, 2024

Based on what I can make of my logging, versus the code I suspect the problem is related to state of one transaction influencing the next transaction.

I added class instance ids in the logging for the ResourceState object.
I added logging in the ResourceState.enlisted setter, to see when and who changes the value.

Below my notes from the failing situation (perhaps I will later add it with the log file) but just to give an idea of what I see:

  • two different transactions
  • same Thread (not sure if that is relevant, I suspect not I assume the resource handles can be reused by any thread)
  • The first transaction succeeds, but leaves the ResourceState as enlisted=true.
  • The second transaction happens to reuse the Resource handle with same ResourceState id 2ddbcffd and enlisted is already true.
Line 2983378: ##### start of txId=2525 - ThreadID=78
Line 2983379: ##### Created new JavaEETransactionImpl txId = 2525, ThreadID=78
Line 2988293: ##### resource 2ddbcffd, enlisted false, txId=2525
Line 2988697: ##### resource 2ddbcffd, enlisted true,  txId=2525
Line 2990367: ##### resource 2ddbcffd, enlisted true,  txId=2525
Line 2992701: ##### resource 2ddbcffd, enlisted false, txId=2525
Line 2993053: ##### resource 2ddbcffd, enlisted true,  txId=2525
Line 2993327: ##### resource 2ddbcffd, enlisted true,  txId=2525
Line 2993595: ##### resource 2ddbcffd, enlisted true,  txId=2525
Line 2993857: ##### resource 2ddbcffd, enlisted false, txId=2525
Line 2994002: ##### resource 2ddbcffd, enlisted true,  txId=2525
Line 2994577: ##### resource 2ddbcffd, enlisted true,  txId=2525
Line 2995007: ##### resource 2ddbcffd, enlisted true,  txId=2525
Line 2995420: ##### resource 2ddbcffd, enlisted true,  txId=2525
Line 2995891: ##### resource 2ddbcffd, enlisted true,  txId=2525
Line 2996255: ##### resource 2ddbcffd, enlisted true,  txId=2525
Line 2996650: ##### resource 2ddbcffd, enlisted true,  txId=2525
Line 2997292: ##### resource 2ddbcffd, enlisted true,  txId=2525
Line 2997572: ##### resource 2ddbcffd, enlisted true,  txId=2525
Line 2998140: ##### resource 2ddbcffd, enlisted true,  txId=2525
Line 2998539: ##### resource 2ddbcffd, enlisted true,  txId=2525
Line 2998947: ##### resource 2ddbcffd, enlisted true,  txId=2525
Line 2999509: ##### resource 2ddbcffd, enlisted true,  txId=2525
Line 2999634: ##### delistResource END (3) - ### AFTER DELIST STILL enlisted=true -> <ResourceState enlisted=true, busy=false, instance@2ddbcffd/> - txId=2525, ThreadID=78
Line 2999812: ##### postInvokeTx END (2) end of Bean call - end of txId=2525, ThreadID=78

Line 2999813: ##### starts with UserActionInfBean call - this is the beginning of txId=2546, ## important: same _ThreadID=78 ##
Line 2999844: ##### Created new JavaEETransactionImpl txId = 2546, ThreadID=78 - which eventually fails
Line 3000108: ##### enlisted is already true for 2ddbcffd, txId=2546 ### ERROR: Why already enlisted for new transsaction #####
Line 3000157: ##### postInvokeTx END (2) - DbUtilsLocal.getConnection(), ThreadID=78
Line 3000181: ##### nonXAResource=null causing exception below. Note also: syncs= does not contain a sync for PoolManagerImpl, while in a sinple tranaction I see PoolManagerImpl in the list.

I wonder: should ResourceState enlisted not be altered to false in:
com.sun.enterprise.transaction.JavaEETransactionManagerSimplified.delistResource(Transaction, TransactionalResource, int)

Same for: com.sun.enterprise.transaction.JavaEETransactionManagerSimplified.delistJTSResource(Transaction, TransactionalResource, int) where h.isEnlisted() is called to delist the resource from the transaction, but state is not altered to reflect it?

Also the logic in 'com.sun.enterprise.transaction.JavaEETransactionManagerSimplified.delistComponentResources(ComponentInvocation, boolean)':

if (h.isEnlisted()) {
    delistResource(tran, h, flag);
}

but it is not updating the handle state there if the delistResource succeeded. The delistResource success/false result it not even looked at.

To test this I have a dependency problem: jta module does not know about the connectors-runtime com.sun.enterprise.resource.ResourceHandle, it only has com.sun.enterprise.transaction.spi.TransactionalResource available with isEnlisted.

Another bit I do not understand, the interface has:

  • a way to ask if the resource is enlisted
  • a way to enlist the resource
  • but no way to unenlist the resource?? only a destroy, but this is only to be used in error cases.
public interface TransactionalResource {
    public boolean isEnlisted();

    public void destroyResource();

    /**
     * Indicates that a resource has been enlisted in the transaction.
     * @param tran Transaction to which the resource is enlisted
     * @throws IllegalStateException when unable to enlist the resource
     */
    void enlistedInTransaction(Transaction tran) throws IllegalStateException;

as if this was created before Resources could be pooled / reused ?

@escay
Copy link
Contributor Author

escay commented Feb 18, 2024

One workaround "pleister" that seems to work is:

  • log WARNING
  • repair enlisted state to be false
com.sun.enterprise.resource.pool.ConnectionPool {
...
    /**
     * To provide an unenlisted, valid, matched resource from pool.
     *
     * @param resourceSpec ResourceSpec
     * @param resourceAllocator ResourceAllocator
     * @param transaction Transaction
     * @return ResourceHandle resource from pool
     * @throws PoolingException Exception while getting resource from pool
     */
    protected ResourceHandle getUnenlistedResource(ResourceSpec resourceSpec, ResourceAllocator resourceAllocator, Transaction transaction) throws PoolingException {
        ResourceHandle resource = getResourceFromPool(resourceAllocator, resourceSpec);
        if (resource.getResourceState().isEnlisted()) {
            LOG.log(Level.WARNING, "Resource: " + resource + ", state is enlisted, while resource is coming from pool. Previous thread did not cleanup enlisted state.");
            resource.getResourceState().setEnlisted(false);
        }
        return resource;
    }
}

Result:

  • Test succeeds, without the NPE fix in
  • Log contains my new warnings:
[#|2024-02-18T17:25:25.108106+01:00|WARNING|GF 7.0.10|jakarta.enterprise.resource.resourceadapter.com.sun.enterprise.resource.pool|_ThreadID=79;_ThreadName=http-listener-1(3);_LevelValue=900;|
  Resource: <ResourceHandle id=10, state=<ResourceState enlisted=true, busy=true, instance@41f12c00/>, instance@a/>, state is enlisted, while resource is coming from pool. Previous thread did not cleanup enlisted state.|#]

[#|2024-02-18T17:25:25.120073+01:00|WARNING|GF 7.0.10|jakarta.enterprise.resource.resourceadapter.com.sun.enterprise.resource.pool|_ThreadID=79;_ThreadName=http-listener-1(3);_LevelValue=900;|
  Resource: <ResourceHandle id=11, state=<ResourceState enlisted=true, busy=true, instance@626779d9/>, instance@b/>, state is enlisted, while resource is coming from pool. Previous thread did not cleanup enlisted state.|#]

[#|2024-02-18T17:25:25.620174+01:00|WARNING|GF 7.0.10|jakarta.enterprise.resource.resourceadapter.com.sun.enterprise.resource.pool|_ThreadID=79;_ThreadName=http-listener-1(3);_LevelValue=900;|
  Resource: <ResourceHandle id=10, state=<ResourceState enlisted=true, busy=true, instance@41f12c00/>, instance@a/>, state is enlisted, while resource is coming from pool. Previous thread did not cleanup enlisted state.|#]

This is a patch up / repair. Not a clean fix. Just like preventing the NPE in Connector was not a clean fix.
But also shows: state is busy when the resource comes from the pool. Shouldn't the state of the resource be 'fresh' ?!
Is the state busy perhaps also not always set back to false correctly?

This could be related to issue:

Caused by: java.lang.IllegalStateException: state.isBusy() : false
at com.sun.enterprise.resource.pool.ConnectionPool.resourceClosed(ConnectionPool.java:1008)

as I have seen in Payara in:
payara/Payara#769
payara/Payara#3029
myself, and also ran into this Glassfish 7.0.10 a few times last week.

Note: is see the changes made in Payara, but Glassfish still contains the old implementation. Thread safeness of LocalTxConnectionEventListener and closeConnection, which I logged in Payara issue: 3029 in the past was fixed by:
payara/Payara@c09e8aa

@escay
Copy link
Contributor Author

escay commented Feb 19, 2024

Stress tested my "fix" by running the same test I used to reproduce this issue multiple times.
And I run now into:

Caused by: java.lang.NullPointerException: Cannot invoke "com.sun.enterprise.resource.ResourceHandle.getResourceSpec()" because "resource" is null
        at com.sun.enterprise.resource.pool.PoolManagerImpl.resourceClosed(PoolManagerImpl.java:367)
        at com.sun.enterprise.resource.listener.LocalTxConnectionEventListener.connectionClosed(LocalTxConnectionEventListener.java:58)
        at com.sun.gjc.spi.ManagedConnectionImpl.connectionClosed(ManagedConnectionImpl.java:735)
        at com.sun.gjc.spi.base.ConnectionHolder.close(ConnectionHolder.java:193)
        at com.sun.gjc.spi.jdbc40.ConnectionHolder40.close(ConnectionHolder40.java:590)

which is equivalent to my logged Payara issue:
payara/Payara#3029
where I figured out in the past:

        if (associatedHandles.containsKey(connectionHandle)) {
            handle = (ResourceHandle) associatedHandles.get(connectionHandle);
        }

was not thread safe.

@escay
Copy link
Contributor Author

escay commented Sep 5, 2024

Update: when using the related fixes mentioned in this issue I still run into issues like
IllegalStateException: state.isBusy()

I still suspect there are some threading issues in the code, but I have not been able to proof it in the last few months. I do have a workaround, that avoids having to implement the fix mentioned in:
#24805 (comment)

The workaround for me is:

  • make sure leak detection is enabled
  • investigate the exception -> focus on the part where the leak is detected
  • in all cases I have seen it is 1 transaction with many many sub transactions logic within the application server code (not in the application code) due to 1 bean calling multiple other beans calling multiple other beans to store data in the database - and multiple threads at the same time doing similar bean calls.
  • the fix for me is to mark relevant bean methods in this transaction tree as NotSupported in the application code, where no transaction is needed for my business logic (for example when a database read query is ok with older data from the database which was not altered within this transaction). This seems to avoid some transaction overhead in the transaction code and results in avoiding the problem to occur in my application in that piece of code. (Drawback is: problem sometimes arrives at another location -> which requires the same approach).

I still want to keep this issue open, and I still do not want to implement the fix mentioned in #24805 (comment) at this moment in time

I did local testing with the fix:

    private ResourceHandle getResourceHandle() throws PoolingException {
        try {
            ResourceHandle resourceHandle;
            JavaEETransaction javaEETransaction = getCurrentTransaction();
            if (javaEETransaction == null || javaEETransaction.getNonXAResource() == null) { // Only if some thing is wrong with tx manager.
                if (javaEETransaction != null && javaEETransaction.getNonXAResource() == null) {
                    _logger.warning("Non XA resource not found for transaction: " + javaEETransaction + ", returning local handle: " + localHandle_);
                }

to see a warning.

@OndroMih
Copy link
Contributor

OndroMih commented Sep 5, 2024

If you want to try with virtual threads if the issue is related to thread locals not properly cleaned, you can try with this experimental virtual threads pool for Grizzly/GlassFish: https://github.com/OmniFish-EE/glassfish-grizzly-virtual-threads-pool

@OndroMih
Copy link
Contributor

OndroMih commented Sep 5, 2024

There's also Thread Local cleaner component:

. Currently, it clears thread locals only when WAR is undeployed. But I think it could be used also when a thread s added back to the pool, to detect and clear stale thread locals.

@OndroMih
Copy link
Contributor

OndroMih commented Sep 5, 2024

Just to clarify the terms used in the source code, correct me if I'm wrong:

  • resource - represents a DB connection
  • enlisted - value true means that the connection is associated to a transaction
  • closed - a connection is not used by the application anymore and can be put back to the connection pool
  • free - a connection is in the pool and free to be reused

If I understand correctly, the problem is that sometimes a connection is marked as enlisted and stays enlisted even after it's put back to the pool (it's closed and free). When reused, the connection is not enlisted with the transaction because it's already marked as enlisted. This means that the enlistResource method on JavaEETransactionManagerSimplified is not called, and therefore the nonXAResource is not set on JavaEETransactionImpl.

It seems that a resource is delisted only in transactionCompleted method of PoolTxHelper, which is called from SynchronizationListener.afterCompletion. The fact that, according to one of the above comments, this listener is not registered with the problematic transaction, means that this method is not called. And thus the resource is not delisted.

The question now is:

Why SynchronizationListener is not registered for the problematic transaction?

@OndroMih
Copy link
Contributor

OndroMih commented Sep 5, 2024

@escay , can you see messages "JavaEETransactionImpl.registerSynchronization" ? They should be visible if you set jakarta.enterprise.resource.jta logger to level FINE, which you did previously. This message is logged when the SynchronizationListener is registered with a transaction.

@escay
Copy link
Contributor Author

escay commented Sep 5, 2024

In the log file above:
glassfish-issue-24805-jta-log-until-first-failure-v2.zip

the JavaEETransactionImpl.registerSynchronization is logged:

...
[2024-02-14T11:14:57.673198+01:00] [GF 7.0.10] [FINE] [] [jakarta.enterprise.resource.jta.com.sun.enterprise.transaction] [tid: _ThreadID=77 _ThreadName=http-listener-1(2)] [levelValue: 500] [[
  --Created new JavaEETransactionImpl, txId = 1957]]

[2024-02-14T11:14:57.673198+01:00] [GF 7.0.10] [FINE] [] [jakarta.enterprise.resource.jta.com.sun.enterprise.transaction] [tid: _ThreadID=77 _ThreadName=http-listener-1(2)] [levelValue: 500] [[
  TM: getTransaction: tx=JavaEETransactionImpl: txId=1957 nonXAResource=null jtsTx=null localTxStatus=0 syncs=[], tm=null]]

[2024-02-14T11:14:57.674193+01:00] [GF 7.0.10] [FINE] [] [jakarta.enterprise.resource.jta.com.sun.enterprise.transaction] [tid: _ThreadID=77 _ThreadName=http-listener-1(2)] [levelValue: 500] [[
  TM: enlistComponentResources]]

[2024-02-14T11:14:57.674193+01:00] [GF 7.0.10] [FINE] [] [jakarta.enterprise.resource.jta.com.sun.enterprise.transaction] [tid: _ThreadID=77 _ThreadName=http-listener-1(2)] [levelValue: 500] [[
  TM: getTransaction: tx=JavaEETransactionImpl: txId=1957 nonXAResource=null jtsTx=null localTxStatus=0 syncs=[], tm=null]]

[2024-02-14T11:14:57.674193+01:00] [GF 7.0.10] [FINE] [] [jakarta.enterprise.resource.jta.com.sun.enterprise.transaction] [tid: _ThreadID=77 _ThreadName=http-listener-1(2)] [levelValue: 500] [[
  

In JavaEETransactionManagerSimplified.getExistingResourceList,  ResourceHandler type: com.sun.ejb.containers.SessionContextImpl ResourceHandler: TaskMainInfBean; id: [B@48ef7fb5 return list: [] list size: 0]]

[2024-02-14T11:14:57.674193+01:00] [GF 7.0.10] [FINE] [] [jakarta.enterprise.resource.jta.com.sun.enterprise.transaction] [tid: _ThreadID=77 _ThreadName=http-listener-1(2)] [levelValue: 500] [[
  TM: enlistComponentResources inv=EjbInvocation  componentId=mycompany-test.payara.postgresql_scheduler-3.0.2-SNAPSHOT.jar_TaskMainInfBean_111929274229850425,isLocal=true,isRemote=false,isBusinessInterface=true,isWebService=false,isMessageDriven=false,isHome=false,clientInterface=interface com.mycompany.serverlib.task.TaskMainInfLocal,method=public abstract long com.mycompany.serverlib.task.TaskMainInfLocal.createTask(com.mycompany.lib.server.scheduler.BaseTaskValue),ejb=com.mycompany.task.TaskMainInfBean@2daba204,exception=null,exceptionFromBeanMethod=null,invId=0,wasCancelCalled=false,yetToSubmitStatus=true, tran=JavaEETransactionImpl: txId=1957 nonXAResource=null jtsTx=null localTxStatus=0 syncs=[], l = []]]

[2024-02-14T11:14:57.674193+01:00] [GF 7.0.10] [FINE] [] [jakarta.enterprise.resource.jta.com.sun.enterprise.transaction] [tid: _ThreadID=77 _ThreadName=http-listener-1(2)] [levelValue: 500] [[
  --In JavaEETransactionImpl.registerSynchronization, jtsTx=null nonXAResource=null]]
...

Note that ThreadID=77 seems to be reused.
Note that I run tests where transactions also might fail and are rolled back on purpose -> could mean these fail situations do not clean up correctly?

I have made some attempts to write some code that recognizes the already "marked as enlisted and stays enlisted" situation when the resource (which indeed represents a DB connection) is retreived from the database connection pool. But I have failed so far. I would like to turn the situation around: already figure out that something is wrong at the start, to understand all expected states I started with issue 24900: document the code to understand it myself.

In commit bb41ac5#diff-21f318ff963fcb278c1231bc78adad08cb850c3527d071d5c4753e101633ab5d -> PoolManagerImplTest.java (maybe not related), I have added some unit test code and TODOs in PoolManagerImpl where I do not trust or not understand the error handling code:

        if (resourceAbortedOccured) {
            // Return the resource to the pool using resourceAbortOccurred
            poolManagerImpl.resourceAbortOccurred(resource);
            // Related transaction delist should be called for the resource
            assertTrue(javaEETransactionManager.isDelistIsCalled(resource));

            // TODO: connection error occurred flag on the resource is only set via badConnectionClosed
            // why isn't it also called for resourceAbortedOccured? Bug?!
            assertResourceHasNoConnectionErrorOccured(resource);
        } else {
            // Return the resource to the pool using resourceErrorOccurred
            poolManagerImpl.resourceErrorOccurred(resource);
            // Related transaction delist should be called for the resource, shouldn't it?
            // resourceErrorOccurred does not seem to remove the resource from the transaction, possible bug?
            // TODO: should be assertTrue
            assertFalse(javaEETransactionManager.isDelistIsCalled(resource));

            // TODO: connection error occurred flag on the resource is only set via badConnectionClosed
            // why isn't it also called for resourceErrorOccurred? Bug?!
            assertResourceHasNoConnectionErrorOccured(resource);
        }

@OndroMih
Copy link
Contributor

OndroMih commented Sep 6, 2024

@escay , can you please try this PR: #25131?

@bvfalcon
Copy link

bvfalcon commented Jan 6, 2025

I suppose, the importance of this issue is underestimated, because of described problem exists not only for PostgreSQL, but also for Oracle (v 19.21) and MS SQL (Microsoft SQL Server 2019).
The appearance of problem is sporadic, that makes it difficult to reproduce.

@escay
Copy link
Contributor Author

escay commented Jan 6, 2025

I can confirm I can reproduce the issue both on PostgreSQL and Oracle without code changes, using the same application.
More info in PR #25131.

@OndroMih
Copy link
Contributor

OndroMih commented Jan 7, 2025

Hi @bvfalcon, I tried to fix this in #25131 but it's not easy to fix it properly for all scenarios. Since then, my team didn't have time to work in this.

Issues like these require a significant time and effort by experts in the codebase, it's not easy to fix them for free. My team at OmniFish has the experts but currently no time.

I encourage you to contact us at https://omnifish.ee/contact-us/ if you'd like to get dedicated support for this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants