Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] Add Stabilizing status condition #1537

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

karlkfi
Copy link
Contributor

@karlkfi karlkfi commented Jan 29, 2025

go/config-sync-stabilizing-status
b/393195894

  • Add a new Stabilizing condition to the RSync status that communicates the aggregate reconciliation status of all the managed resource objects in the ResourceGroup inventory.
  • Add new controller-specific observed generation fields to the ResourceGroup status:
    • Add status.observedGenerations.reconciler and update it from the reconciler, in the ResourceGroup inventory client Apply/ApplyWithPrune.
    • Add status.observedGenerations.resourceGroupController and update it from the ResourceGroupController
    • The existing status.observedGeneration field is now only updated by the ResourceGroupController
    • Add a check to the ResourceGroupController to skip reconciling unless status.observedGenerations.reconciler == metadata.generation. This prevents the ResourceGroupController from updating the ResourceGroup status until the reconciler has updated both the spec and status, which prevents resourceVersion conflict errors. b/390467537
    • Add a check to the ResourceGroupController to only perform the above check if the ResourceGroup has the app.kubernetes.io/managed-by: configmanagement.gke.io label. This allows the ResourceGroupController to still work for kpt-managed ResourceGroups, without waiting for status.observedGenerations.reconciler to be updated.
  • Vendor a fork of kpt under internal/third_party to modify the ResourceGroup inventory client:
    • Use the replace directive in go.mod to avoid needing to update all kpt package references.
    • Use the internal package to prevent external code form depending on this temporary fork. The fork will likely be removed and replaced with a configsync-specific inventory client when we add support for inventory auto-sharding.
    • Fix the inventory client to preserve status fields from other controllers.
    • Update the inventory client to update status.observedGenerations.reconciler, instead of status.observedGeneration.
    • Update the inventory client to expose the wrapped Unstructured object and copy metadata from it when applying the ResourceGroup. This allows the reconciler to inject metadata before each apply.
    • Update the inventory client to copy the current inventory state back into the wrapped Unstructured object, to allow reading the latest state after running the applier, without needing to get it from the server again.
    • Update the inventory client to populate the kstatus of each resource, as much as possible given that only the strategy, actuation, and reconcile status are available. Previously, it was always setting the kstatus to Unknown.
  • Add a new configsync.gke.io/source-commit annotation to the ResourceGroup, so the stabilizing controller knows which commit the inventory was last updated for, so it can put the commit into the Stabilizing condition. This is applied by the ResourceGroup inventory client when the spec is applied. It is deliberately not a spec field, because it is not required by kpt, but that means it doesn't affect the generation and doesn't trigger updating the status observed generations.
  • Update the ResourceGroupController to skip setting the Reconciling condition to True before every reconcile attempt. This should be safe, because the RGC doesn't make any side-effect changes anywhere else. So no other controllers needs to know when it starts reconciling to avoid conflicts.
  • Update the ResourceGroupController to avoid updating the LastTransitionTime every time it reconciles.
  • Together, the above two changes should significantly reduce the number of API calls updating the ResourceGroup status, reducing API server load, etcd memory, and CPU load from controllers watching update events.
  • Fix the ResourceGroupController not correctly updating the reconcile status to reflect the latest computed kstatus. This should make the new strategy, actuation, and reconcile status fields much more useful to clients.
  • Fix the ResourceGroupController to correctly remove its conditions from the RG status after the status has been disabled, instead of relying on the inventory client to delete them.
  • Update make lint-license to skip the vendored kpt package. Otherwise it errors, because go-licenses doesn't handle replace directives well. go-licenses check errors when using replace in go.mod google/go-licenses#310
  • Update the ResourceGroupController e2e tests to validate the new lock-step behavior between controllers and observedGenerations fields.
  • Fix TestResourceGroupApplyStatus to use a new UpdateStatus instead of UpdateResourceGroup, to actually change the status as intended. Previously this test wasn't actually validating re-reconciling after status removal.

TODO:

  • Extract the RGC bug fixes
  • Extract the RGC suite test cleanup to handle setup/teardown better
  • Extract the RGC unit test cleanup to use watches instead of sleeps
  • Add more unit tests for the new code
  • Add e2e tests to validate the Stabilizing condition behavior (It appears to work according to the e2e logs, but it may not be transitioning True->False as intended yet.)

This unblocks some future enhancements:

  • Update nomos status to show the strategy, actuation, and reconcile status fields, not just the kstatus. This should make it clearer when a NotFound status is expected or not.
  • Update nomos status to show the configsync.gke.io/source-commit annotation value.
  • Update nomos status to show the Stabilizing condition status/reason/message.
  • Update the cli-utils applier to retain the kstatus too, not just the strategy, actuation, and reconcile status. Then pass it to the inventory client, so it can update whether the object is Terminating, InProgress, or Failed. The change in this PR can only tell if it's Current or NotFound.

Extracted:

Copy link

@code-review-assist code-review-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @karlkfi, reviewers, and future viewers!

I'm Gemini, and I'm currently reviewing this pull request. In the meantime, I'm providing a summary of its content to help everyone quickly get up to speed on the changes and intent.

This pull request, titled [WIP] Add Stabilizing status condition, aims to add a new Stabilizing status condition to the RepoSync and RootSync resources. The description points to go/config-sync-stabilizing-status for more context (which I don't have access to, so this summary may be incomplete).

Here's a breakdown of the changes:

  • New Field Manager: A new field manager, StabilizingFieldManager, has been added to pkg/api/configsync/register.go (lines 27-33). This is to avoid conflicts when applying the Stabilizing condition.
  • New Condition Types: The RepoSyncStabilizing and RootSyncStabilizing condition types have been added to pkg/api/configsync/v1beta1/reposync_types.go (lines 110-114) and pkg/api/configsync/v1beta1/rootsync_types.go (lines 110-114) respectively. These new condition types represent the state where the reconciler is waiting for all synced resources to reconcile.
  • StabilizingController Added: A new StabilizingController has been introduced in pkg/reconciler/stabilizing_controller.go (a completely new file). This controller monitors ResourceGroup objects and updates the Stabilizing condition on the corresponding RepoSync or RootSync.
  • Helper Functions in declared/scope.go: New helper functions SyncKeyFromScope (lines 98-104) and SyncIDFromScope (lines 106-115) were added to simplify retrieving the Sync name and namespace, and Sync ID from a Scope and name, respectively. These are used by the StabilizingController.
  • StabilizingController Registration: The StabilizingController is registered in pkg/reconciler/reconciler.go (lines 357-367).
  • Logger Updates: Numerous changes were made to several files to replace logger calls with Logger calls, reflecting a refactoring of the logging mechanism. This involved updating numerous files. This is likely a separate, but related, change.

I will continue my review shortly. In the meantime, please let me know if you have any questions.

And now, a little poem to brighten your day:

Status updates, a gentle flow,
Conditions set, the changes grow,
Code reviewed, with care and grace,
A better system finds its place.

Copy link

@code-review-assist code-review-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This pull request introduces a "Stabilizing" status condition for RootSync and RepoSync objects. This is a valuable addition for tracking the stabilization phase of the reconciliation process. The implementation looks good overall, with a clear separation of concerns and appropriate use of field managers. I have a few minor suggestions for improvement.

I've summarized the elements I referenced from the Go style guide below:

  • Package Names: Package names should be concise and descriptive, typically using lowercase letters and avoiding underscores or mixedCaps.
  • Function Names: Function names should be concise and descriptive, using mixedCaps (also known as camelCase) for exported functions and lowercase for internal functions.
  • Variable Names: Variable names should be concise and descriptive, using mixedCaps for exported variables and lowercase for internal variables.
  • Comments: Comments should be clear and concise, explaining the purpose and behavior of the code. Use complete sentences and proper grammar.
  • Error Handling: Errors should be handled explicitly and appropriately, using error wrapping to provide context.

@karlkfi karlkfi force-pushed the karl-stabilizing branch 7 times, most recently from f0bf1c7 to 5d563b0 Compare February 1, 2025 03:26
@google-oss-prow google-oss-prow bot added size/XXL and removed size/XL labels Feb 1, 2025
@karlkfi karlkfi force-pushed the karl-stabilizing branch 4 times, most recently from f5340fc to d27adb1 Compare February 4, 2025 19:26
@mikebz mikebz requested review from janetkuo and removed request for Camila-B February 4, 2025 21:47
@mikebz
Copy link
Contributor

mikebz commented Feb 4, 2025

/assign janetkuo

Copy link

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please ask for approval from janetkuo. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@karlkfi karlkfi force-pushed the karl-stabilizing branch 5 times, most recently from a1aaafc to 7edbb2b Compare February 7, 2025 02:40
@karlkfi karlkfi force-pushed the karl-stabilizing branch 10 times, most recently from 66b95e5 to bb242ae Compare February 8, 2025 00:07
@karlkfi
Copy link
Contributor Author

karlkfi commented Feb 10, 2025

/retest

go/config-sync-stabilizing-status
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants