-detailed-exitcode causes a wrong exit code to be emitted when there was a retry #3845

Open
juljaeg opened this issue Feb 6, 2025 · 4 comments
Labels
bug Something isn't working preserved Preserved issues never go stale

Comments


juljaeg commented Feb 6, 2025

Describe the bug

When -detailed-exitcode is passed, Terragrunt emits the wrong exit code if a retry occurred as configured in the Terragrunt config. If the retry succeeds, the exit code should be 0.

Steps To Reproduce

Configure retry values suitable for testing:

# File: terragrunt.hcl
retry_max_attempts       = 2
retry_sleep_interval_sec = 1
retryable_errors         = ["(?s).*marker.*"]

Add a dummy data source that executes the script. When the script fails, the error output includes the script name, which matches the retryable_errors pattern and triggers a retry:

# File: main.tf
data "external" "script" {
  program = ["/bin/bash", "-c", "./marker.sh"]
}
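
To illustrate why a failing run is eligible for a retry: Terraform's error output echoes the program line from main.tf, which contains the word marker and so matches the configured pattern. The following is a minimal Go sketch, assuming the patterns are applied as unanchored regular expressions to the command output. The helper matchesRetryable is hypothetical and is not Terragrunt's actual implementation.

```go
package main

import (
	"fmt"
	"regexp"
)

// matchesRetryable reports whether output matches any of the given patterns.
// Hypothetical helper for illustration only; not taken from Terragrunt.
func matchesRetryable(patterns []string, output string) bool {
	for _, p := range patterns {
		if regexp.MustCompile(p).MatchString(output) {
			return true
		}
	}
	return false
}

func main() {
	// Same pattern as in terragrunt.hcl; (?s) lets "." match newlines too.
	patterns := []string{"(?s).*marker.*"}

	// Shortened excerpts of the failing and the successful plan output.
	failed := "Error: External Program Execution Failed\n" +
		"  2:   program = [\"/bin/bash\", \"-c\", \"./marker.sh\"]\n"
	succeeded := "No changes. Your infrastructure matches the configuration.\n"

	fmt.Println(matchesRetryable(patterns, failed))    // true
	fmt.Println(matchesRetryable(patterns, succeeded)) // false
}
```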

The script will fail on the first run but succeed on the second:

# File: marker.sh
#!/usr/bin/env bash
set -euo pipefail

if [[ -f .file ]]; then
  echo "{}"
  rm .file
  exit 0
else
  touch .file
  exit 1
fi

Expected behavior

Without -detailed-exitcode, the retry works as it should and the exit code is 0:

$ terragrunt plan
13:59:05.069 STDOUT terraform: data.external.script: Reading...
13:59:05.099 STDOUT terraform: Planning failed. Terraform encountered an error while generating this plan.
13:59:05.099 STDOUT terraform: 
13:59:05.099 STDERR terraform: ╷
13:59:05.099 STDERR terraform: │ Error: External Program Execution Failed
13:59:05.099 STDERR terraform: │ 
13:59:05.099 STDERR terraform: │   with data.external.script,
13:59:05.099 STDERR terraform: │   on main.tf line 2, in data "external" "script":
13:59:05.099 STDERR terraform: │    2:   program = ["/bin/bash", "-c", "./marker.sh"]
13:59:05.099 STDERR terraform: │ 
13:59:05.099 STDERR terraform: │ The data source received an unexpected error while attempting to execute
13:59:05.099 STDERR terraform: │ the program.
13:59:05.099 STDERR terraform: │ 
13:59:05.099 STDERR terraform: │ The program was executed, however it returned no additional error
13:59:05.099 STDERR terraform: │ messaging.
13:59:05.099 STDERR terraform: │ 
13:59:05.099 STDERR terraform: │ Program: /bin/bash
13:59:05.099 STDERR terraform: │ State: exit status 1
13:59:05.099 STDERR terraform: ╵
13:59:05.101 INFO   Encountered an error eligible for retrying. Sleeping 1s before retrying.

13:59:06.221 STDOUT terraform: data.external.script: Reading...
13:59:06.242 STDOUT terraform: data.external.script: Read complete after 0s [id=-]
13:59:06.243 STDOUT terraform: No changes. Your infrastructure matches the configuration.
13:59:06.243 STDOUT terraform: Terraform has compared your real infrastructure against your configuration
13:59:06.243 STDOUT terraform: and found no differences, so no changes are needed.
$ echo $?
0

With -detailed-exitcode, the exit code is 1 even though the retry succeeded. It should be 0, as the terminal output itself shows a successful plan with no changes:

$ terragrunt plan -detailed-exitcode
13:59:33.243 STDOUT terraform: data.external.script: Reading...
13:59:33.258 STDOUT terraform: Planning failed. Terraform encountered an error while generating this plan.
13:59:33.258 STDOUT terraform: 
13:59:33.258 STDERR terraform: ╷
13:59:33.258 STDERR terraform: │ Error: External Program Execution Failed
13:59:33.258 STDERR terraform: │ 
13:59:33.258 STDERR terraform: │   with data.external.script,
13:59:33.258 STDERR terraform: │   on main.tf line 2, in data "external" "script":
13:59:33.258 STDERR terraform: │    2:   program = ["/bin/bash", "-c", "./marker.sh"]
13:59:33.258 STDERR terraform: │ 
13:59:33.258 STDERR terraform: │ The data source received an unexpected error while attempting to execute
13:59:33.258 STDERR terraform: │ the program.
13:59:33.258 STDERR terraform: │ 
13:59:33.258 STDERR terraform: │ The program was executed, however it returned no additional error
13:59:33.258 STDERR terraform: │ messaging.
13:59:33.258 STDERR terraform: │ 
13:59:33.258 STDERR terraform: │ Program: /bin/bash
13:59:33.258 STDERR terraform: │ State: exit status 1
13:59:33.258 STDERR terraform: ╵
13:59:33.261 INFO   Encountered an error eligible for retrying. Sleeping 1s before retrying.

13:59:34.388 STDOUT terraform: data.external.script: Reading...
13:59:34.404 STDOUT terraform: data.external.script: Read complete after 0s [id=-]
13:59:34.405 STDOUT terraform: No changes. Your infrastructure matches the configuration.
13:59:34.405 STDOUT terraform: Terraform has compared your real infrastructure against your configuration
13:59:34.405 STDOUT terraform: and found no differences, so no changes are needed.
$ echo $?
1

Versions

  • Terragrunt version: 0.72.6
  • OpenTofu/Terraform version: 1.10.5
  • Environment details: MacOS Sonoma 14.7

PS

Thank you for your great work! 😄

@juljaeg juljaeg added the bug Something isn't working label Feb 6, 2025
Author

juljaeg commented Feb 6, 2025

I suspect the bug is here: https://sourcegraph.com/github.com/gruntwork-io/terragrunt@f9eb618e4a8a41facbee47d3c9b6042752596f0f/-/blob/shell/run_shell_cmd.go?L75-78. The exit code is only processed when there is an actual error, so the exit code set by the failed first attempt probably remains in the context and is used once the function exits.
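
The suspected mechanism can be reproduced in isolation. The following Go sketch is a simulation under stated assumptions, not Terragrunt code: exitCodeHolder stands in for whatever value is stored in the context, and recordOnlyOnError and recordAlways are hypothetical names for the two update strategies.

```go
package main

import (
	"errors"
	"fmt"
)

// exitCodeHolder is a hypothetical stand-in for the exit code kept in the context.
type exitCodeHolder struct{ code int }

func (h *exitCodeHolder) Set(c int) { h.code = c }
func (h *exitCodeHolder) Get() int  { return h.code }

// recordOnlyOnError updates the holder only when the attempt failed,
// which is the behaviour suspected in this issue.
func recordOnlyOnError(h *exitCodeHolder, code int, err error) {
	if err != nil {
		h.Set(code)
	}
}

// recordAlways updates the holder after every attempt, so a later
// successful attempt overwrites the code of an earlier failure.
func recordAlways(h *exitCodeHolder, code int, _ error) {
	h.Set(code)
}

// runWithRetry simulates a first attempt failing with exit code 1 and a
// retry succeeding with exit code 0, then returns the recorded exit code.
func runWithRetry(record func(*exitCodeHolder, int, error)) int {
	h := &exitCodeHolder{}
	attempts := []struct {
		code int
		err  error
	}{
		{1, errors.New("exit status 1")},
		{0, nil},
	}
	for _, a := range attempts {
		record(h, a.code, a.err)
		if a.err == nil {
			break
		}
	}
	return h.Get()
}

func main() {
	fmt.Println(runWithRetry(recordOnlyOnError)) // 1: stale code from the failed attempt
	fmt.Println(runWithRetry(recordAlways))      // 0: code of the successful retry
}
```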

Collaborator

yhakbar commented Feb 6, 2025

Thanks for reporting this, @juljaeg !

It seems like you're also familiar with the codebase, and have found the source of the bug. Are you interested in contributing a pull request to patch the bug?

Author

juljaeg commented Feb 6, 2025

I can try, but my past attempts at working on the Terragrunt code base were not very successful 😆 I imagine it's mainly a matter of rearranging the condition block to look something like this:

	output, err := RunShellCommandWithOutput(ctx, opts, "", false, needsPTY, opts.TerraformPath, args...)

	code, _ := util.GetExitCode(err)
	if exitCode := DetailedExitCodeFromContext(ctx); err != nil && exitCode != nil {
		exitCode.Set(code)
	}

	if util.ListContainsElement(args, terraform.FlagNameDetailedExitCode) && code != 1 {
		return output, nil
	}

	return output, err

Does that sound right? But I am definitely not familiar with possible side effects of this, let alone writing a proper unit/integration test for this 😕
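
For context on the code != 1 check above: with -detailed-exitcode, terraform plan exits with 0 when it succeeds with no changes, 1 on error, and 2 when it succeeds with changes present. So an exit code of 2 is not a failure. A small Go sketch of that mapping follows; the function names are hypothetical and only illustrate the semantics.

```go
package main

import "fmt"

// describePlanExitCode explains the exit codes of
// `terraform plan -detailed-exitcode`.
func describePlanExitCode(code int) string {
	switch code {
	case 0:
		return "succeeded, no changes"
	case 1:
		return "error"
	case 2:
		return "succeeded, changes present"
	default:
		return "unexpected exit code"
	}
}

// isFailure treats everything except 0 and 2 as a failed plan.
func isFailure(code int) bool {
	return code != 0 && code != 2
}

func main() {
	for _, c := range []int{0, 1, 2} {
		fmt.Printf("%d: %s (failure: %t)\n", c, describePlanExitCode(c), isFailure(c))
	}
}
```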

I am not sure whether this is the right place to do the evaluation. Shouldn't it be somewhere around here:

runTerraformError := RunTerraformWithRetry(ctx, terragruntOptions)

At that point one can be more certain that all retries have completed and the final result is available, so there is no need to "revert" the exit code recorded by a retried run. I imagine such reverting might be unreliable in a concurrency scenario.
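
The idea of evaluating the exit code only once all retries are done could look roughly like the following Go sketch. It is a simplification under stated assumptions: exitError, exitCodeOf, retry, and finalExitCode are hypothetical names, every error is treated as retryable, and none of this mirrors the real RunTerraformWithRetry.

```go
package main

import (
	"errors"
	"fmt"
)

// exitError is a hypothetical error type carrying a process exit code.
type exitError struct{ code int }

func (e *exitError) Error() string { return fmt.Sprintf("exit status %d", e.code) }

// exitCodeOf maps an error to an exit code: nil means 0, an exitError
// yields its code, and any other error is reported as 1.
func exitCodeOf(err error) int {
	if err == nil {
		return 0
	}
	var ee *exitError
	if errors.As(err, &ee) {
		return ee.code
	}
	return 1
}

// retry calls fn up to maxAttempts times and returns the error of the
// last attempt only. Simplification: every error is treated as retryable.
func retry(maxAttempts int, fn func(attempt int) error) error {
	var err error
	for i := 1; i <= maxAttempts; i++ {
		if err = fn(i); err == nil {
			return nil
		}
	}
	return err
}

// finalExitCode derives the exit code once, after all retries finished,
// so codes from earlier failed attempts never need to be reverted.
func finalExitCode(maxAttempts int, fn func(attempt int) error) int {
	return exitCodeOf(retry(maxAttempts, fn))
}

func main() {
	// First attempt fails with exit code 1, the retry succeeds.
	code := finalExitCode(2, func(attempt int) error {
		if attempt == 1 {
			return &exitError{code: 1}
		}
		return nil
	})
	fmt.Println(code) // 0
}
```

Because the code is computed from the final result rather than stored per attempt, there is no shared state that a failed attempt could leave behind.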

EDIT 6: Also thank you for the (very) fast response 😉

Collaborator

yhakbar commented Feb 6, 2025

If you're not comfortable with cutting the pull request, don't worry! We'll mark the issue as preserved so that it doesn't go stale, and we'll get to it when we have bandwidth.

If you are interested in ramping up for contributing to Terragrunt, join the Terragrunt Discord if you haven't already, and read the Contribution docs. Especially when it comes to edge cases like this, it's extremely valuable to the maintainers to have community members contributing to help us cover all our edge cases.

You've already written a viable fixture, which could go here (though I'd recommend that you try the retry block instead of the deprecated attributes like retryable_errors). These integration tests test those fixtures.

@yhakbar yhakbar added the preserved Preserved issues never go stale label Feb 6, 2025