Avoid system freeze by increasing SLSB instance pool limit

Summary

In certain system configurations, the default settings for the stateless session bean (SLSB) instance pool have been observed to cause the system to freeze temporarily and repeatedly. One affected configuration is the use of ImageMaster Content Services for ERP (based on SAP ArchiveLink) in combination with retention management and the stamping feature.

Symptoms

In this situation, only ImageMaster freezes temporarily, while the administration console of the application server remains accessible and responsive. A message like the following is written to the application server log file:

WFLYEJB0378: Failed to acquire a permit within 5 MINUTES
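To see how often this message has occurred, the log file can be searched for the message ID. The following is a minimal sketch; the function name `count_permit_timeouts` is hypothetical, and the log file location depends on your installation, so it is passed as an argument rather than assumed.

```shell
#!/usr/bin/env bash
# Count the permit-timeout messages (WFLYEJB0378) in a server log file.
# The log path is supplied by the caller; no default location is assumed.
count_permit_timeouts() {
    local log_file="$1"
    # grep -c prints the number of matching lines. It exits non-zero when
    # nothing matches, so "|| true" keeps the function's exit status at 0.
    grep -c 'WFLYEJB0378' "$log_file" || true
}

# Usage:
#   count_permit_timeouts /path/to/server.log
```

A count that keeps growing over time suggests that the freezes are recurring rather than a one-off event.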

The situation is also characterized by many threads in the state “waiting on condition”, whose Java stack traces contain the class “StrictMaxPool”:

"default task-2262" #17575 prio=5 os_prio=0 tid=0x0000000017058800 nid=0xef5 waiting on condition [0x00007f3e977c9000]
   java.lang.Thread.State: TIMED_WAITING (parking)
	at sun.misc.Unsafe.park(Native Method)
	- parking to wait for  <0x00000003c86301d0> (a java.util.concurrent.Semaphore$NonfairSync)
	at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1037)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1328)
	at java.util.concurrent.Semaphore.tryAcquire(Semaphore.java:409)
	at org.jboss.as.ejb3.pool.strictmax.StrictMaxPool.get(StrictMaxPool.java:108)
	at org.jboss.as.ejb3.component.pool.PooledInstanceInterceptor.processInvocation(PooledInstanceInterceptor.java:47)
	at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:422)
	at org.jboss.as.ejb3.tx.CMTTxInterceptor.invokeInCallerTx(CMTTxInterceptor.java:185)
	at org.jboss.as.ejb3.tx.CMTTxInterceptor.supports(CMTTxInterceptor.java:420)
	at org.jboss.as.ejb3.tx.CMTTxInterceptor.processInvocation(CMTTxInterceptor.java:148)

To confirm whether this situation applies in your case, you can use the jstack utility:

Find the process ID of the application server process, e.g. with jps (Java Virtual Machine Process Status Tool):

[wildfly@edmd126-10 ~]$ jps -v
18694 jboss-modules.jar -D[Standalone] -Xms8g -Xmx8g -XX:MetaspaceSize=512m -XX:MaxMetaspaceSize=512m -Djboss.modules.system.pkgs=org.jboss.byteman -Djava.awt.headless=true ...
67084 Jps -Dapplication.home=/opt/java/jdk1.8.0_221 -Xms8m

The entry that contains “jboss-modules.jar” represents the application server process. In the example above, the corresponding process ID is “18694”.
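The lookup of the process ID can also be scripted. The sketch below assumes the `jps -v` output format shown above (process ID in the first column); the function name `wildfly_pids` is hypothetical.

```shell
#!/usr/bin/env bash
# Print the PID of every JVM whose jps line mentions jboss-modules.jar.
# Reads "jps -v" style output on stdin, so it can also be tried on saved output.
wildfly_pids() {
    awk '/jboss-modules\.jar/ { print $1 }'
}

# Usage (requires the JDK tools on the PATH):
#   jps -v | wildfly_pids
```

If more than one application server instance runs on the host, this prints one process ID per line, and you need to pick the relevant one.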

Count the waiting threads, e.g. by running the jstack utility against the identified process:
```
[wildfly@edmd126-10 ~]$ jstack -l 18694 | grep "StrictMaxPool" | wc -l
54
```
In the example above 54 threads are counted, which indicates that the described situation applies.
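A single thread dump is only a snapshot, so it can help to repeat the count a few times. The following sketch wraps the counting step in a function that reads a thread dump from stdin; the name `count_pool_waiters` and the sampling interval are assumptions for illustration. It counts `StrictMaxPool.get` frames, which appear once per waiting thread in the stack trace shown earlier.

```shell
#!/usr/bin/env bash
# Count the stack frames that sit in StrictMaxPool.get within a thread dump
# read from stdin.
count_pool_waiters() {
    # grep -c exits non-zero when there is no match; "|| true" hides that.
    grep -c 'StrictMaxPool\.get' || true
}

# Usage, sampling three times at 10-second intervals
# (18694 is the process ID from the example above):
#   for i in 1 2 3; do jstack -l 18694 | count_pool_waiters; sleep 10; done
```

A count that stays high across several samples is a stronger indication than a single high value.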

Related setting – “derive-size”

If this issue is identified, the stateless session bean (SLSB) instance pool is most likely too small. The pool in question is “slsb-strict-max-pool”, which is referenced by the “default-slsb-instance-pool” setting and whose size is derived according to its “derive-size” attribute. You can check the current settings as follows:

/opt/wildfly/bin/jboss-cli.sh --connect
/subsystem=ejb3/strict-max-bean-instance-pool=slsb-strict-max-pool:read-resource(recursive=true,include-runtime=true)
{
    "outcome" => "success",
    "result" => {
        "derive-size" => "from-worker-pools",
        "derived-size" => 192,
        "max-pool-size" => 20,
        "timeout" => 5L,
        "timeout-unit" => "MINUTES"
    }
}

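The example above shows the “derived-size” on a system with 12 CPUs (cores), which is calculated as 192 (= number of CPUs multiplied by 16). As long as “derive-size” is set, this derived value is the effective pool limit, not the “max-pool-size” of 20.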
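The calculation can be reproduced for any CPU count. The sketch below only applies the rule stated above (CPUs multiplied by 16); the function name `estimate_derived_size` is hypothetical, and the authoritative value is always the “derived-size” reported by the server itself.

```shell
#!/usr/bin/env bash
# Estimate the derived pool size using the rule stated above:
# number of CPUs multiplied by 16.
estimate_derived_size() {
    local cpus="$1"
    echo $(( cpus * 16 ))
}

# Usage:
#   estimate_derived_size 12            # the 12-CPU example above
#   estimate_derived_size "$(nproc)"    # nproc is part of GNU coreutils
```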

Resolution – increase max pool size

You can increase the pool size by following two steps:

Undefine the derive-size.
Set “max-pool-size” to an explicit value such as 512 or 1024, which should be sufficient for a performance environment.

Do not set an arbitrarily large value, because this can cause other issues depending on the available system resources.

To find a suitable value in a system where the issue already occurs, increase the value step by step from the default and monitor the system until the issue no longer occurs.
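One way to plan such a stepwise increase is to list the candidate values in advance. The sketch below doubles the current size until an upper limit is reached; doubling is an assumption chosen for illustration, not a rule from this article, and the function name `candidate_sizes` is hypothetical.

```shell
#!/usr/bin/env bash
# Print candidate pool sizes: keep doubling the starting value and clamp the
# last step to the given upper limit.
candidate_sizes() {
    local size="$1" cap="$2"
    # Doubling only makes progress from a positive starting value.
    [ "$size" -gt 0 ] || return 1
    while [ "$size" -lt "$cap" ]; do
        size=$(( size * 2 ))
        if [ "$size" -gt "$cap" ]; then
            size="$cap"
        fi
        echo "$size"
    done
}

# Usage: start from the derived size of 192 and stop at 1024.
#   candidate_sizes 192 1024
```

Each candidate would then be applied and observed under load before moving on to the next one.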

The commands to achieve this are illustrated below:

/subsystem=ejb3/strict-max-bean-instance-pool=slsb-strict-max-pool:undefine-attribute(name=derive-size)
/subsystem=ejb3/strict-max-bean-instance-pool=slsb-strict-max-pool:write-attribute(name=max-pool-size, value=1024)
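If the change has to be applied to several systems or with different values, the two commands can be generated and run non-interactively. The following is a minimal sketch: the function name `pool_resize_script` and the file name `/tmp/slsb-pool.cli` are hypothetical, and it relies on the `--file` option of `jboss-cli.sh` to execute the commands in order.

```shell
#!/usr/bin/env bash
# Print the two CLI commands from above for a given pool size.
pool_resize_script() {
    local size="$1"
    local pool='/subsystem=ejb3/strict-max-bean-instance-pool=slsb-strict-max-pool'
    printf '%s:undefine-attribute(name=derive-size)\n' "$pool"
    printf '%s:write-attribute(name=max-pool-size, value=%s)\n' "$pool" "$size"
}

# Usage (the file name is only an example):
#   pool_resize_script 1024 > /tmp/slsb-pool.cli
#   /opt/wildfly/bin/jboss-cli.sh --connect --file=/tmp/slsb-pool.cli
```

Writing the commands to a file first also makes it easy to review them before they are executed.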

You must reload or restart the server for the changes to take effect. To restart it using the CLI, run:

/opt/wildfly/bin/jboss-cli.sh --connect --command=":shutdown(restart=true)"

After the restart, the new settings look as follows:

/subsystem=ejb3/strict-max-bean-instance-pool=slsb-strict-max-pool:read-resource(recursive=true,include-runtime=true)
{
    "outcome" => "success",
    "result" => {
        "derive-size" => undefined,
        "derived-size" => 1024,
        "max-pool-size" => 1024,
        "timeout" => 5L,
        "timeout-unit" => "MINUTES"
    }
}
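The check can also be scripted, for example to confirm the effective pool size after each change. The sketch below assumes that a `read-attribute` call prints its value on a line of the form `"result" => 1024`, in the same style as the output shown above; the function name `extract_result` is hypothetical.

```shell
#!/usr/bin/env bash
# Extract a numeric "result" value from CLI output read on stdin.
# Assumes a line of the form:  "result" => 1024
extract_result() {
    sed -n 's/^ *"result" => \([0-9][0-9]*\).*$/\1/p'
}

# Usage:
#   /opt/wildfly/bin/jboss-cli.sh --connect \
#     --command="/subsystem=ejb3/strict-max-bean-instance-pool=slsb-strict-max-pool:read-attribute(name=derived-size)" \
#     | extract_result
```

If the function prints nothing, the output did not contain a numeric result, for example because the operation failed or the connection could not be established.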