Add support for tests with intentional app deployment failures #137

brideck · 2024-09-10T21:29:10Z

Some TCKs contain tests that are intentionally causing and testing deployment failures. These have no hope of passing currently because any deployed app that fails to start causes an exception to be thrown from the plugin:

[ERROR] com.sun.ts.tests.websocket.negdep.invalidpathparamtype.srv.onclose.WSCClientIT -- Time elapsed: 193.6 s <<< ERROR!
org.jboss.arquillian.container.spi.client.container.DeploymentException: Timeout while waiting for "wsc_negdep_invalidpathparamtype_srv_onclose_web" ApplicationMBean to reach STARTED. Actual state: INSTALLED.
	at io.openliberty.arquillian.managed.WLPManagedContainer.waitForApplicationTargetState(WLPManagedContainer.java:1280)
	at io.openliberty.arquillian.managed.WLPManagedContainer.deploy(WLPManagedContainer.java:539)
	at org.jboss.arquillian.container.impl.client.container.ContainerDeployController$3.call(ContainerDeployController.java:151)
	at org.jboss.arquillian.container.impl.client.container.ContainerDeployController$3.call(ContainerDeployController.java:118)
	at org.jboss.arquillian.container.impl.client.container.ContainerDeployController.executeOperation(ContainerDeployController.java:239)
	at org.jboss.arquillian.container.impl.client.container.ContainerDeployController.deploy(ContainerDeployController.java:118)
        ...

I'm not sure if there's a better way to check for this particular scenario, but we could add a configuration property (e.g. allowAppDeployFailures) that makes it so the expected/desired state of deployed applications to just be INSTALLED instead of STARTED.

The text was updated successfully, but these errors were encountered:

Azquelt · 2024-09-11T14:15:25Z

There are already many TCK tests which intentionally cause deployment failures in the CDI TCK which have no issues, so we need to see what's different with these tests.

Azquelt · 2024-09-11T14:21:24Z

Assuming we're talking about this test: https://github.com/jakartaee/platform-tck/blob/c4d7da5b46aae3acaf8f9f2eed1ac2b07bb25468/websocket/spec-tests/src/main/java/com/sun/ts/tests/websocket/negdep/malformedpath/WSCClientIT.java#L46-L47

Comparing to LookupAmbiguousTest from the CDI TCK, the CDI test has @ShouldThrowException(DeploymentException.class) to indicate that deployment of the application is expected to fail.

The platform TCK test does not have this annotation so I'm not sure how the arquillian container is supposed to know that that deployment is not expected to succeed.

brideck · 2024-09-11T14:29:36Z

That test among others, yeah. Thanks for sharing the mechanism for this. I can update all of the failing tests locally and see if that gets OL a cleaner looking result. If so, this becomes an obvious TCK challenge.

Curious how Tomcat might be passing this TCK in its current state, unless they either a) aren't failing the deployments or b) are compensating for it in their own Arquillian plugin.

Azquelt · 2024-09-11T14:37:31Z

Note that DeploymentException in this case is a CDI spec exception. If the websocket spec doesn't specify a specific exception, I would use @ShouldThrowException(Exception.class).

The arqillian support feature looks for and reports the exceptions thrown during app startup so that we can check the correct exception was thrown so you should have it installed and enabled if you're running tests expecting deployments to fail. (The arquillian container will fall back to searching FFDC logs if the support feature is not installed, but you don't always get FFDCs out if several tests cause similiar exceptions to be thrown).

brideck · 2024-09-11T17:21:30Z

Noted. There is likewise a jakarta.websocket.DeploymentException that is expected here per the comments in the various tests, so I'll give that a go first. I'll make sure that's an appropriate specific expectation of the spec before pursuing any changes in the TCK.

volosied · 2024-09-12T17:18:35Z

Hey @Azquelt,

Brian tried the @ShouldThrowException, but said it didn't work. I looked at the implementation and found this code:

liberty-arquillian/liberty-managed/src/main/java/io/openliberty/arquillian/managed/WLPManagedContainer.java

Line 1484 in 3aec059

    
           if ( (line.contains("CWWKZ0002") || line.contains("CWWKZ0001I")) && line.contains(applicationName)) {

It looks for the CWWKZ0002 error message and then throws a DeploymentException to Arquillian. However, in our websocket scenario, we don't log CWWKZ0002.

We only log:

[ERROR   ] CWWKH0022E: Exception occurred during WebSocket application deployment because @PathParam parameter arg defined in method onClose has invalid type in Annotated endpoint class

This looks like it might cause issues with how @ShouldThrowException is expected to work?

Azquelt · 2024-09-17T13:06:18Z

@volosied CWWKZ0002 is logged when the application fails to start, does the application not fail to start in this instance?

Why is CWWKZ0002 not logged?

Does CWWKZ0001I get logged? (Application started successfully)

volosied · 2024-09-17T13:46:09Z

The app does not start, and these are the only messages from the console.log. I don't know why CWWKZ0002 is not logged.

Launching defaultServer (Open Liberty 24.0.0.10/wlp-1.0.94.202409101606) on OpenJDK 64-Bit Server VM, version 17.0.7+7 (en_US)
[AUDIT   ] CWWKE0001I: The server defaultServer has been launched.
[AUDIT   ] CWWKZ0058I: Monitoring dropins for applications.
[AUDIT   ] CWWKT0016I: Web application available (default_host): http://localhost:9080/wsc_negdep_invalidpathparamtype_srv_onclose_web/
[ERROR   ] CWWKH0022E: Exception occurred during WebSocket application deployment because @PathParam parameter arg defined in method onClose has invalid type in Annotated endpoint class com.sun.ts.tests.websocket.negdep.invalidpathparamtype.srv.onclose.OnCloseStringHolderServerEndpoint.
[AUDIT   ] CWWKZ0012I: The application wsc_negdep_invalidpathparamtype_srv_onclose_web was not started.
[AUDIT   ] CWWKT0017I: Web application removed (default_host): http://localhost:9080/wsc_negdep_invalidpathparamtype_srv_onclose_web/
[AUDIT   ] CWWKF0012I: The server installed the following features: [servlet-6.1, websocket-2.2].
[AUDIT   ] CWWKF0011I: The defaultServer server is ready to run a smarter planet. The defaultServer server started in 1.842 seconds.

volosied · 2024-09-17T16:23:58Z

I look a look at one CDI application (LegacyNonEmpty that reports the CWWKZ0002E error), and I found one difference.

CDI has an implementation for applicationStarting:
https://github.com/OpenLiberty/open-liberty/blob/662ce6274ec2cee5476283980fe61d7bd8872f25/dev/com.ibm.ws.cdi.weld/src/com/ibm/ws/cdi/liberty/CDIRuntimeImpl.java#L453

WebContainer does not:
https://github.com/OpenLiberty/open-liberty/blob/662ce6274ec2cee5476283980fe61d7bd8872f25/dev/com.ibm.ws.webcontainer/src/com/ibm/ws/webcontainer/osgi/WebContainerListener.java#L165

However, I don't know all the various paths the CWWKZ0002 can be reported.

UPDATE:

I was looking at the webcontainer initialization process and found this code:

https://github.com/OpenLiberty/open-liberty/blob/662ce6274ec2cee5476283980fe61d7bd8872f25/dev/com.ibm.ws.webcontainer/src/com/ibm/ws/webcontainer/osgi/WebContainer.java#L987-L1011

If I set <webContainer deferServletLoad="false"/> then the StateChangeException("startWebApplication"); is thrown and the CWWKZ0002E message occurs:

[ERROR   ] CWWKZ0002E: An exception occurred while starting the application wsc_negdep_invalidpathparamtype_srv_onclose_web. The exception message was: com.ibm.ws.container.service.state.StateChangeException: com.ibm.ws.container.service.state.StateChangeException: startWebApplication

It's not the original websocket deployment error, but it looks like this is where we need to look further. It seems like we need to throw a StateChangeException to get the CWWKZ0002 message?

However, I'm worried about updating any code here. It could cause a regression. I'm also not familiar with the deployment code either.

The default code is executed from the Runnable (run doesn't throw any exceptions), so I don't think we can just simply throw an exception here. Likely why it was coded this way.

Azquelt · 2024-09-18T10:46:51Z

It's not the original websocket deployment error, but it looks like this is where we need to look further. It seems like we need to throw a StateChangeException to get the CWWKZ0002 message?

Possibly? I'm not sure. For CDI it looks like we do throw a StateChangeException with the actual exception as the cause.

Arquillian will check the exception thrown and all of its causes for the expected exception when a deployment fails.

However, I'm worried about updating any code here. It could cause a regression. I'm also not familiar with the deployment code either.

I don't think we should be changing any liberty code here, but we need to understand how the startup failure is being reported.

At the moment, the liberty-arquillian container assumes that an application will either start successfully (and log CWWKZ0001) or fail to start (and log CWWKZ0002).

It then has various ways of getting the exception that caused the failure (see the classes in the exceptions package), but if you have the arquillian-support-feature, it will call that which uses IncidentListener which listens for a StateChangeException being recorded by the FFDC mechanism.

It sounds like in your case, both of those things aren't true?

It looks like the message that reports that the app fails to start is

[AUDIT   ] CWWKZ0012I: The application wsc_negdep_invalidpathparamtype_srv_onclose_web was not started.

so we probably need to look for that as well as CWWKZ0002.

I'm not familiar with the deployment code either, so I'm not sure when you would get CWWKZ0002 and when you would get CWWKZ0012.

We then also need a way of getting hold of the actual exception that occurred (ideally we would enable arquillian support feature to report the exact exception thrown, though we could also make up an exception based on the logs as the fallback logic in CDILogExceptionLocator does). I would suggest you discuss with the liberty kernel team on what they think is the best way to do this.

volosied · 2024-10-03T19:35:15Z

@Azquelt I was able to get the following message logged ( with the help of <webContainer deferServletLoad="false"/>) :

[ERROR ] CWWKZ0002E: An exception occurred while starting the application wsc_negdep_invalidpathparamtype_srv_onclose_web (2). The exception message was: com.ibm.ws.container.service.state.StateChangeException: jakarta.websocket.DeploymentException: CWWKH0022E: Exception occurred during WebSocket application deployment because @PathParam parameter arg defined in method onClose has invalid type in Annotated endpoint class com.sun.ts.tests.websocket.negdep.invalidpathparamtype.srv.onclose.OnCloseStringHolderServerEndpoint.

From here, arquillian should be able to pick up the original exception -- jakarta.websocket.DeploymentException when we use@ShouldThrowException(DeploymentException.class)?

Azquelt · 2024-10-04T12:03:55Z

Yes, if you get that message and an FFDC for StateChangeException which has your DeploymentException somewhere in the cause chain, it should work with @ShouldThrowException.

Ideally though, the liberty-arquillian container would pick up the exception however it's thrown.

jhanders34 · 2024-10-14T23:24:56Z

I thought we did work to make <webContainer deferServletLoad="false"/> be the default, but looking at the metatype I am wrong. I think it is worth having Brian run with that setting in his server.xml files.

volosied · 2024-10-15T19:43:18Z

Hey @jhanders34

@brideck was able to make some progress with this issue through deferServletLoad=false. (I also updated the webcontainer code slightly to have the StateChangeException log the jakarta.websocket.DeploymentException as seen above).

However, some more issues were discovered in the websocket TCK. It's likely going to be challenge, and we'll see where the conversation goes from there.

brideck · 2024-12-11T18:44:23Z

@volosied has opened a challenge against the WebSocket TCK -- jakartaee/websocket#465

We'll see if that makes any headway for us.

volosied · 2024-12-15T02:45:58Z

Mark isn't against improving the websocket tests, but he thinks it's a liberty / arquillian integration problem.

The Tomcat and Glassfish Arquillian plugins have a simpler deploy method, so Brian will see what he can do with our code. Any changes will be hidden behind a property, of course.

Azquelt · 2024-12-16T15:10:16Z

I'm not sure why he thinks it's a liberty / arquillian integration problem. As far as I can see, the test expects an application to start. If it's valid for the application to not start in response to the exception that the spec requires, then the test is wrong.

The issue that you originally raised where an app can fail to start without logging CWWKZ0002 is a bug (or at least a missing feature) in the liberty arquillian integration, but I don't understand how the issue you've reported upstream can be.

brideck · 2024-12-18T21:03:51Z

Found that we actually have a special case for these very tests in the legacy porting package for the Platform TCK. That code prevents the deployer from throwing a deployment exception when the app fails to start, and allows it to proceed.

For better or worse, I've added a PR here that will essentially provide the equivalent behavior option to this plugin, as it matches our precedent. I suspect that trying to force a TCK change would require everyone else to update their plugins in order to handle the deployment exception case properly.

Azquelt · 2024-12-19T10:26:35Z

Found that we actually have a special case for these very tests in the legacy porting package for the Platform TCK. That code prevents the deployer from throwing a deployment exception when the app fails to start, and allows it to proceed.

Good detective work, at least this means we know it's not a new problem.

I suspect that trying to force a TCK change would require everyone else to update their plugins in order to handle the deployment exception case properly.

I suspect it wouldn't, since they would need to implement it at least somewhat correctly to pass the CDI TCK.

Practically I can understand not fixing it until the next release.

brideck · 2024-12-19T17:15:43Z

I suspect it wouldn't, since they would need to implement it at least somewhat correctly to pass the CDI TCK.

This is likely true for full EE implementations like Glassfish that have had to concern themselves with the CDI TCK. Potentially less likely true for other implementers of WebSocket, like Tomcat, since these tests are used for them, too.

brideck · 2024-12-20T00:28:12Z

I've continued pushing on the TCK side as well. They handed me a Glassfish + Tyrus runner, and interestingly they don't seem to fail app deployment for these at all. Maybe we can get them to throw these tests out for EE 11/WS 2.2 and address in future.

Azquelt · 2024-12-20T12:03:22Z

Certainly it's difficult to test if the spec isn't clear on whether the app should deploy successfully or not. The plot thickens :)

brideck added the jakartaEE label Sep 10, 2024

brideck changed the title ~~Add support for tests with intentional deployment failures~~ Add support for tests with intentional app deployment failures Sep 10, 2024

brideck mentioned this issue Dec 18, 2024

Add allowAppDeployFailures boolean property #144

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add support for tests with intentional app deployment failures #137

Add support for tests with intentional app deployment failures #137

brideck commented Sep 10, 2024

Azquelt commented Sep 11, 2024

Azquelt commented Sep 11, 2024

brideck commented Sep 11, 2024

Azquelt commented Sep 11, 2024 •

edited

Loading

brideck commented Sep 11, 2024

volosied commented Sep 12, 2024

Azquelt commented Sep 17, 2024 •

edited

Loading

volosied commented Sep 17, 2024

volosied commented Sep 17, 2024 •

edited

Loading

Azquelt commented Sep 18, 2024 •

edited

Loading

volosied commented Oct 3, 2024

Azquelt commented Oct 4, 2024 •

edited

Loading

jhanders34 commented Oct 14, 2024

volosied commented Oct 15, 2024 •

edited

Loading

brideck commented Dec 11, 2024

volosied commented Dec 15, 2024

Azquelt commented Dec 16, 2024

brideck commented Dec 18, 2024

Azquelt commented Dec 19, 2024 •

edited

Loading

brideck commented Dec 19, 2024

brideck commented Dec 20, 2024

Azquelt commented Dec 20, 2024

Add support for tests with intentional app deployment failures #137

Add support for tests with intentional app deployment failures #137

Comments

brideck commented Sep 10, 2024

Azquelt commented Sep 11, 2024

Azquelt commented Sep 11, 2024

brideck commented Sep 11, 2024

Azquelt commented Sep 11, 2024 • edited Loading

brideck commented Sep 11, 2024

volosied commented Sep 12, 2024

Azquelt commented Sep 17, 2024 • edited Loading

volosied commented Sep 17, 2024

volosied commented Sep 17, 2024 • edited Loading

Azquelt commented Sep 18, 2024 • edited Loading

volosied commented Oct 3, 2024

Azquelt commented Oct 4, 2024 • edited Loading

jhanders34 commented Oct 14, 2024

volosied commented Oct 15, 2024 • edited Loading

brideck commented Dec 11, 2024

volosied commented Dec 15, 2024

Azquelt commented Dec 16, 2024

brideck commented Dec 18, 2024

Azquelt commented Dec 19, 2024 • edited Loading

brideck commented Dec 19, 2024

brideck commented Dec 20, 2024

Azquelt commented Dec 20, 2024

Azquelt commented Sep 11, 2024 •

edited

Loading

Azquelt commented Sep 17, 2024 •

edited

Loading

volosied commented Sep 17, 2024 •

edited

Loading

Azquelt commented Sep 18, 2024 •

edited

Loading

Azquelt commented Oct 4, 2024 •

edited

Loading

volosied commented Oct 15, 2024 •

edited

Loading

Azquelt commented Dec 19, 2024 •

edited

Loading