Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for tests with intentional app deployment failures #137

Open
brideck opened this issue Sep 10, 2024 · 22 comments
Open

Add support for tests with intentional app deployment failures #137

brideck opened this issue Sep 10, 2024 · 22 comments

Comments

@brideck
Copy link
Member

brideck commented Sep 10, 2024

Some TCKs contain tests that are intentionally causing and testing deployment failures. These have no hope of passing currently because any deployed app that fails to start causes an exception to be thrown from the plugin:

[ERROR] com.sun.ts.tests.websocket.negdep.invalidpathparamtype.srv.onclose.WSCClientIT -- Time elapsed: 193.6 s <<< ERROR!
org.jboss.arquillian.container.spi.client.container.DeploymentException: Timeout while waiting for "wsc_negdep_invalidpathparamtype_srv_onclose_web" ApplicationMBean to reach STARTED. Actual state: INSTALLED.
	at io.openliberty.arquillian.managed.WLPManagedContainer.waitForApplicationTargetState(WLPManagedContainer.java:1280)
	at io.openliberty.arquillian.managed.WLPManagedContainer.deploy(WLPManagedContainer.java:539)
	at org.jboss.arquillian.container.impl.client.container.ContainerDeployController$3.call(ContainerDeployController.java:151)
	at org.jboss.arquillian.container.impl.client.container.ContainerDeployController$3.call(ContainerDeployController.java:118)
	at org.jboss.arquillian.container.impl.client.container.ContainerDeployController.executeOperation(ContainerDeployController.java:239)
	at org.jboss.arquillian.container.impl.client.container.ContainerDeployController.deploy(ContainerDeployController.java:118)
        ...

I'm not sure if there's a better way to check for this particular scenario, but we could add a configuration property (e.g. allowAppDeployFailures) that makes it so the expected/desired state of deployed applications to just be INSTALLED instead of STARTED.

@brideck brideck changed the title Add support for tests with intentional deployment failures Add support for tests with intentional app deployment failures Sep 10, 2024
@Azquelt
Copy link
Member

Azquelt commented Sep 11, 2024

There are already many TCK tests which intentionally cause deployment failures in the CDI TCK which have no issues, so we need to see what's different with these tests.

@Azquelt
Copy link
Member

Azquelt commented Sep 11, 2024

Assuming we're talking about this test: https://github.com/jakartaee/platform-tck/blob/c4d7da5b46aae3acaf8f9f2eed1ac2b07bb25468/websocket/spec-tests/src/main/java/com/sun/ts/tests/websocket/negdep/malformedpath/WSCClientIT.java#L46-L47

Comparing to LookupAmbiguousTest from the CDI TCK, the CDI test has @ShouldThrowException(DeploymentException.class) to indicate that deployment of the application is expected to fail.

The platform TCK test does not have this annotation so I'm not sure how the arquillian container is supposed to know that that deployment is not expected to succeed.

@brideck
Copy link
Member Author

brideck commented Sep 11, 2024

That test among others, yeah. Thanks for sharing the mechanism for this. I can update all of the failing tests locally and see if that gets OL a cleaner looking result. If so, this becomes an obvious TCK challenge.

Curious how Tomcat might be passing this TCK in its current state, unless they either a) aren't failing the deployments or b) are compensating for it in their own Arquillian plugin.

@Azquelt
Copy link
Member

Azquelt commented Sep 11, 2024

Note that DeploymentException in this case is a CDI spec exception. If the websocket spec doesn't specify a specific exception, I would use @ShouldThrowException(Exception.class).

The arqillian support feature looks for and reports the exceptions thrown during app startup so that we can check the correct exception was thrown so you should have it installed and enabled if you're running tests expecting deployments to fail. (The arquillian container will fall back to searching FFDC logs if the support feature is not installed, but you don't always get FFDCs out if several tests cause similiar exceptions to be thrown).

@brideck
Copy link
Member Author

brideck commented Sep 11, 2024

Noted. There is likewise a jakarta.websocket.DeploymentException that is expected here per the comments in the various tests, so I'll give that a go first. I'll make sure that's an appropriate specific expectation of the spec before pursuing any changes in the TCK.

@volosied
Copy link
Member

Hey @Azquelt,

Brian tried the @ShouldThrowException, but said it didn't work. I looked at the implementation and found this code:

if ( (line.contains("CWWKZ0002") || line.contains("CWWKZ0001I")) && line.contains(applicationName)) {

It looks for the CWWKZ0002 error message and then throws a DeploymentException to Arquillian. However, in our websocket scenario, we don't log CWWKZ0002.

We only log:

[ERROR   ] CWWKH0022E: Exception occurred during WebSocket application deployment because @PathParam parameter arg defined in method onClose has invalid type in Annotated endpoint class 

This looks like it might cause issues with how @ShouldThrowException is expected to work?

@Azquelt
Copy link
Member

Azquelt commented Sep 17, 2024

@volosied CWWKZ0002 is logged when the application fails to start, does the application not fail to start in this instance?

Why is CWWKZ0002 not logged?

Does CWWKZ0001I get logged? (Application started successfully)

@volosied
Copy link
Member

The app does not start, and these are the only messages from the console.log. I don't know why CWWKZ0002 is not logged.

Launching defaultServer (Open Liberty 24.0.0.10/wlp-1.0.94.202409101606) on OpenJDK 64-Bit Server VM, version 17.0.7+7 (en_US)
[AUDIT   ] CWWKE0001I: The server defaultServer has been launched.
[AUDIT   ] CWWKZ0058I: Monitoring dropins for applications.
[AUDIT   ] CWWKT0016I: Web application available (default_host): http://localhost:9080/wsc_negdep_invalidpathparamtype_srv_onclose_web/
[ERROR   ] CWWKH0022E: Exception occurred during WebSocket application deployment because @PathParam parameter arg defined in method onClose has invalid type in Annotated endpoint class com.sun.ts.tests.websocket.negdep.invalidpathparamtype.srv.onclose.OnCloseStringHolderServerEndpoint.
[AUDIT   ] CWWKZ0012I: The application wsc_negdep_invalidpathparamtype_srv_onclose_web was not started.
[AUDIT   ] CWWKT0017I: Web application removed (default_host): http://localhost:9080/wsc_negdep_invalidpathparamtype_srv_onclose_web/
[AUDIT   ] CWWKF0012I: The server installed the following features: [servlet-6.1, websocket-2.2].
[AUDIT   ] CWWKF0011I: The defaultServer server is ready to run a smarter planet. The defaultServer server started in 1.842 seconds.

@volosied
Copy link
Member

volosied commented Sep 17, 2024

I look a look at one CDI application (LegacyNonEmpty that reports the CWWKZ0002E error), and I found one difference.

CDI has an implementation for applicationStarting:
https://github.com/OpenLiberty/open-liberty/blob/662ce6274ec2cee5476283980fe61d7bd8872f25/dev/com.ibm.ws.cdi.weld/src/com/ibm/ws/cdi/liberty/CDIRuntimeImpl.java#L453

WebContainer does not:
https://github.com/OpenLiberty/open-liberty/blob/662ce6274ec2cee5476283980fe61d7bd8872f25/dev/com.ibm.ws.webcontainer/src/com/ibm/ws/webcontainer/osgi/WebContainerListener.java#L165

However, I don't know all the various paths the CWWKZ0002 can be reported.

UPDATE:

I was looking at the webcontainer initialization process and found this code:

https://github.com/OpenLiberty/open-liberty/blob/662ce6274ec2cee5476283980fe61d7bd8872f25/dev/com.ibm.ws.webcontainer/src/com/ibm/ws/webcontainer/osgi/WebContainer.java#L987-L1011

If I set <webContainer deferServletLoad="false"/> then the StateChangeException("startWebApplication"); is thrown and the CWWKZ0002E message occurs:

[ERROR   ] CWWKZ0002E: An exception occurred while starting the application wsc_negdep_invalidpathparamtype_srv_onclose_web. The exception message was: com.ibm.ws.container.service.state.StateChangeException: com.ibm.ws.container.service.state.StateChangeException: startWebApplication

It's not the original websocket deployment error, but it looks like this is where we need to look further. It seems like we need to throw a StateChangeException to get the CWWKZ0002 message?

However, I'm worried about updating any code here. It could cause a regression. I'm also not familiar with the deployment code either.

The default code is executed from the Runnable (run doesn't throw any exceptions), so I don't think we can just simply throw an exception here. Likely why it was coded this way.

@Azquelt
Copy link
Member

Azquelt commented Sep 18, 2024

It's not the original websocket deployment error, but it looks like this is where we need to look further. It seems like we need to throw a StateChangeException to get the CWWKZ0002 message?

Possibly? I'm not sure. For CDI it looks like we do throw a StateChangeException with the actual exception as the cause.

Arquillian will check the exception thrown and all of its causes for the expected exception when a deployment fails.

However, I'm worried about updating any code here. It could cause a regression. I'm also not familiar with the deployment code either.

I don't think we should be changing any liberty code here, but we need to understand how the startup failure is being reported.

At the moment, the liberty-arquillian container assumes that an application will either start successfully (and log CWWKZ0001) or fail to start (and log CWWKZ0002).

It then has various ways of getting the exception that caused the failure (see the classes in the exceptions package), but if you have the arquillian-support-feature, it will call that which uses IncidentListener which listens for a StateChangeException being recorded by the FFDC mechanism.

It sounds like in your case, both of those things aren't true?

It looks like the message that reports that the app fails to start is

[AUDIT   ] CWWKZ0012I: The application wsc_negdep_invalidpathparamtype_srv_onclose_web was not started.

so we probably need to look for that as well as CWWKZ0002.

I'm not familiar with the deployment code either, so I'm not sure when you would get CWWKZ0002 and when you would get CWWKZ0012.

We then also need a way of getting hold of the actual exception that occurred (ideally we would enable arquillian support feature to report the exact exception thrown, though we could also make up an exception based on the logs as the fallback logic in CDILogExceptionLocator does). I would suggest you discuss with the liberty kernel team on what they think is the best way to do this.

@volosied
Copy link
Member

volosied commented Oct 3, 2024

@Azquelt I was able to get the following message logged ( with the help of <webContainer deferServletLoad="false"/>) :

[ERROR ] CWWKZ0002E: An exception occurred while starting the application wsc_negdep_invalidpathparamtype_srv_onclose_web (2). The exception message was: com.ibm.ws.container.service.state.StateChangeException: jakarta.websocket.DeploymentException: CWWKH0022E: Exception occurred during WebSocket application deployment because @PathParam parameter arg defined in method onClose has invalid type in Annotated endpoint class com.sun.ts.tests.websocket.negdep.invalidpathparamtype.srv.onclose.OnCloseStringHolderServerEndpoint.

From here, arquillian should be able to pick up the original exception -- jakarta.websocket.DeploymentException when we use@ShouldThrowException(DeploymentException.class)?

@Azquelt
Copy link
Member

Azquelt commented Oct 4, 2024

Yes, if you get that message and an FFDC for StateChangeException which has your DeploymentException somewhere in the cause chain, it should work with @ShouldThrowException.

Ideally though, the liberty-arquillian container would pick up the exception however it's thrown.

@jhanders34
Copy link
Member

I thought we did work to make <webContainer deferServletLoad="false"/> be the default, but looking at the metatype I am wrong. I think it is worth having Brian run with that setting in his server.xml files.

@volosied
Copy link
Member

volosied commented Oct 15, 2024

Hey @jhanders34

@brideck was able to make some progress with this issue through deferServletLoad=false. (I also updated the webcontainer code slightly to have the StateChangeException log the jakarta.websocket.DeploymentException as seen above).

However, some more issues were discovered in the websocket TCK. It's likely going to be challenge, and we'll see where the conversation goes from there.

@brideck
Copy link
Member Author

brideck commented Dec 11, 2024

@volosied has opened a challenge against the WebSocket TCK -- jakartaee/websocket#465

We'll see if that makes any headway for us.

@volosied
Copy link
Member

Mark isn't against improving the websocket tests, but he thinks it's a liberty / arquillian integration problem.

The Tomcat and Glassfish Arquillian plugins have a simpler deploy method, so Brian will see what he can do with our code. Any changes will be hidden behind a property, of course.

@Azquelt
Copy link
Member

Azquelt commented Dec 16, 2024

I'm not sure why he thinks it's a liberty / arquillian integration problem. As far as I can see, the test expects an application to start. If it's valid for the application to not start in response to the exception that the spec requires, then the test is wrong.

The issue that you originally raised where an app can fail to start without logging CWWKZ0002 is a bug (or at least a missing feature) in the liberty arquillian integration, but I don't understand how the issue you've reported upstream can be.

@brideck
Copy link
Member Author

brideck commented Dec 18, 2024

Found that we actually have a special case for these very tests in the legacy porting package for the Platform TCK. That code prevents the deployer from throwing a deployment exception when the app fails to start, and allows it to proceed.

For better or worse, I've added a PR here that will essentially provide the equivalent behavior option to this plugin, as it matches our precedent. I suspect that trying to force a TCK change would require everyone else to update their plugins in order to handle the deployment exception case properly.

@Azquelt
Copy link
Member

Azquelt commented Dec 19, 2024

Found that we actually have a special case for these very tests in the legacy porting package for the Platform TCK. That code prevents the deployer from throwing a deployment exception when the app fails to start, and allows it to proceed.

Good detective work, at least this means we know it's not a new problem.

I suspect that trying to force a TCK change would require everyone else to update their plugins in order to handle the deployment exception case properly.

I suspect it wouldn't, since they would need to implement it at least somewhat correctly to pass the CDI TCK.

Practically I can understand not fixing it until the next release.

@brideck
Copy link
Member Author

brideck commented Dec 19, 2024

I suspect it wouldn't, since they would need to implement it at least somewhat correctly to pass the CDI TCK.

This is likely true for full EE implementations like Glassfish that have had to concern themselves with the CDI TCK. Potentially less likely true for other implementers of WebSocket, like Tomcat, since these tests are used for them, too.

@brideck
Copy link
Member Author

brideck commented Dec 20, 2024

I've continued pushing on the TCK side as well. They handed me a Glassfish + Tyrus runner, and interestingly they don't seem to fail app deployment for these at all. Maybe we can get them to throw these tests out for EE 11/WS 2.2 and address in future.

@Azquelt
Copy link
Member

Azquelt commented Dec 20, 2024

Certainly it's difficult to test if the spec isn't clear on whether the app should deploy successfully or not. The plot thickens :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants