test: EKS e2e test using eksctl #667
base: main
Conversation
linters are flagging the exec cmds. IMO shelling out commands is not ideal here. I know that AWS has their own SDK that is able to interact with EKS https://pkg.go.dev/github.com/aws/aws-sdk-go-v2/service/eks so maybe this could be worth looking into as an alternative? |
Yeah, to do it for real we will want to use aws-sdk, but shelling out to eksctl is fine while we're just trying to say hey, Retina E2E could work on EKS |
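For context, a minimal sketch of the shell-out approach being discussed, assuming eksctl is on PATH and AWS credentials are in the environment; the cluster name, region, and node count shown are illustrative, not the PR's exact invocation.

```go
package main

import (
	"context"
	"fmt"
	"os"
	"os/exec"
	"time"
)

// createCluster shells out to eksctl, which blocks until the CloudFormation
// stacks are ready, so we stream its output straight to the test logs.
func createCluster(ctx context.Context, name, region string) error {
	ctx, cancel := context.WithTimeout(ctx, 30*time.Minute)
	defer cancel()

	cmd := exec.CommandContext(ctx, "eksctl", "create", "cluster",
		"--name", name,
		"--region", region,
		"--nodes", "2")
	cmd.Stdout = os.Stdout
	cmd.Stderr = os.Stderr
	if err := cmd.Run(); err != nil {
		return fmt.Errorf("eksctl create cluster failed: %w", err)
	}
	return nil
}

func main() {
	if err := createCluster(context.Background(), "retina-e2e", "us-west-2"); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}
```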
@whatnick this is great, thanks for putting it together so fast! |
The overall aim is to bolster Retina's **ANY CNI** claim with a public
demonstration.
I just wanted to start the ball rolling, and from past e2e building
experience I expect this to be slow. This is just a POC showing that it is
possible, while also flagging corner cases and other setup necessary:
- ECR repo for pulling into cluster OR managing GHCR Credentials in cluster
- AWS-Github OIDC pairing for securely logging into account for GHA,
setting up policies and roles on AWS side.
- Setting up OIDC in EKS during cluster provisioning to hook AWS VPC CNI
and demonstrate using that + retina.
- Possible issues with parallel execution of Azure and AWS tests and
clobbering of the kubeconfig.
The slowness will also give me an opportunity to figure out pulling in
eksctl go code via a `require` + `import` to avoid shelling out.
On Wed, Aug 28, 2024, 07:37 Evan Baker wrote:
@whatnick <https://github.com/whatnick> this is great, thanks for putting
it together so fast!
While we review/discuss I do want to set the expectation appropriately
that us getting an AWS account provisioned will likely be the slow/hard
part of this 😓
|
good stuff, thanks for taking a look into this @whatnick |
This has added a lot of requires to go.mod, but I have updated the PoC to consume eksctl as a package and run its cobra commands. It can be slimmed down to remove fancy things like coloured logging, which are not really relevant for this use case. |
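A rough sketch, not the PR's actual code, of what consuming eksctl as a package can look like: build the relevant cobra command and execute it in-process with programmatic arguments. The `newEksctlCreateClusterCmd` constructor below is hypothetical; the real constructor and import path come from the eksctl module added via the new requires.

```go
package main

import (
	"fmt"
	"os"

	"github.com/spf13/cobra"
)

// newEksctlCreateClusterCmd is a stand-in for the command constructor that the
// eksctl packages (pulled in via go.mod requires) would provide.
func newEksctlCreateClusterCmd() *cobra.Command {
	cmd := &cobra.Command{
		Use: "create-cluster",
		RunE: func(cmd *cobra.Command, args []string) error {
			name, _ := cmd.Flags().GetString("name")
			region, _ := cmd.Flags().GetString("region")
			fmt.Printf("would provision cluster %q in %q here\n", name, region)
			return nil
		},
	}
	cmd.Flags().String("name", "", "cluster name")
	cmd.Flags().String("region", "", "AWS region")
	return cmd
}

func main() {
	cmd := newEksctlCreateClusterCmd()
	// Same flags as the CLI, but passed in-process, so the test does not
	// depend on an eksctl binary being installed on the runner.
	cmd.SetArgs([]string{"--name", "retina-e2e", "--region", "us-west-2"})
	if err := cmd.Execute(); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}
```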
With proper environment variable settings the e2e cluster creation and deletion work as expected. Some test scenarios are failing and I will investigate further. Will pause work here while we wait for the account provisioning to take place. AWS VPC CNI is also provisioned as a managed addon during cluster creation and can be tested against.

export AWS_ACCOUNT_ID=XXXXXXXXXXX
export AWS_REGION=us-west-2
export TAG=v0.0.16
export IMAGE_REGISTRY=ghcr.io
export IMAGE_NAMESPACE=microsoft/retina
go test -run TestE2ERetinaAWS ./test/e2e/ -timeout 30m

2024/09/14 17:00:08 found chart at /home/whatnick/dev/retina/deploy/legacy/manifests/controller/helm/retina
#################### InstallHelmChart ###################################################################
2024/09/14 17:00:11 creating 1 resource(s)
2024/09/14 17:00:11 creating 1 resource(s)
2024/09/14 17:00:12 creating 1 resource(s)
2024/09/14 17:00:12 creating 1 resource(s)
2024/09/14 17:00:12 beginning wait for 4 resources with timeout of 1m0s
2024/09/14 17:00:14 Clearing REST mapper cache
2024/09/14 17:00:19 creating 8 resource(s)
2024/09/14 17:00:21 beginning wait for 8 resources with timeout of 4m0s
2024/09/14 17:00:21 DaemonSet is not ready: kube-system/retina-agent. 0 out of 2 expected pods are ready
2024/09/14 17:00:23 DaemonSet is not ready: kube-system/retina-agent. 0 out of 2 expected pods are ready
2024/09/14 17:00:25 DaemonSet is not ready: kube-system/retina-agent. 0 out of 2 expected pods are ready
2024/09/14 17:00:27 DaemonSet is not ready: kube-system/retina-agent. 0 out of 2 expected pods are ready
2024/09/14 17:00:29 DaemonSet is not ready: kube-system/retina-agent. 0 out of 2 expected pods are ready
2024/09/14 17:00:31 DaemonSet is not ready: kube-system/retina-agent. 0 out of 2 expected pods are ready
2024/09/14 17:00:33 DaemonSet is not ready: kube-system/retina-agent. 0 out of 2 expected pods are ready
2024/09/14 17:00:35 DaemonSet is not ready: kube-system/retina-agent. 0 out of 2 expected pods are ready
2024/09/14 17:00:38 installed chart from path: retina in namespace: kube-system
2024/09/14 17:00:38 chart values: map[agent:map[name:retina-agent] agent_win:map[name:retina-agent-win] apiServer:map[host:0.0.0.0 port:10093] azure:map[appinsights:map[instrumentation_key:app-insights-instrumentation-key]] bypassLookupIPOfInterest:false capture:map[aadClientId: aadClientSecret: debug:true enableManagedStorageAccount:false jobNumLimit:0 location: managedIdentityClientId: resourceGroup: subscriptionId: tenantId:] daemonset:map[container:map[retina:map[args:[--config /retina/config/config.yaml] command:[/retina/controller] healthProbeBindAddress::18081 metricsBindAddress::18080 ports:map[containerPort:10093]]]] dataAggregationLevel:low enableAnnotations:false enablePodLevel:false enableTelemetry:false enabledPlugin_linux:["dropreason","packetforward","linuxutil","dns"] enabledPlugin_win:["hnsstats"] fullnameOverride:retina-svc image:map[initRepository:ghcr.io/microsoft/retina/retina-init pullPolicy:Always repository:ghcr.io/microsoft/retina/retina-agent tag:v0.0.16] imagePullSecrets:[map[name:acr-credentials]] logLevel:debug metrics:map[podMonitor:map[additionalLabels:map[] enabled:false interval:30s namespace:<nil> relabelings:[] scheme:http scrapeTimeout:30s tlsConfig:map[]] serviceMonitor:map[additionalLabels:map[] enabled:false interval:30s metricRelabelings:[] namespace:<nil> relabelings:[] scheme:http scrapeTimeout:30s tlsConfig:map[]]] metricsIntervalDuration:10s nameOverride:retina namespace:kube-system nodeSelector:map[] operator:map[container:map[args:[--config /retina/operator-config.yaml] command:[/retina-operator]] enableRetinaEndpoint:false enabled:false installCRDs:true repository:ghcr.io/microsoft/retina/retina-operator resources:map[limits:map[cpu:500m memory:128Mi] requests:map[cpu:10m memory:128Mi]] tag:v0.0.16] os:map[linux:true windows:true] remoteContext:false resources:map[limits:map[cpu:500m memory:300Mi] requests:map[cpu:500m memory:300Mi]] retinaPort:10093 securityContext:map[capabilities:map[add:[SYS_ADMIN SYS_RESOURCE NET_ADMIN IPC_LOCK]] privileged:false windowsOptions:map[runAsUserName:NT AUTHORITY\SYSTEM]] service:map[name:retina port:10093 targetPort:10093 type:ClusterIP] serviceAccount:map[annotations:map[] name:retina-agent] tolerations:[] volumeMounts:map[bpf:/sys/fs/bpf cgroup:/sys/fs/cgroup config:/retina/config debug:/sys/kernel/debug tmp:/tmp trace:/sys/kernel/tracing] volumeMounts_win:map[retina-config-win:retina]]
#################### CreateDenyAllNetworkPolicy (scenario: Drop Metrics) ################################
2024/09/14 17:00:38 Creating/Updating NetworkPolicy "deny-all" in namespace "kube-system"...
#################### CreateAgnhostStatefulSet (scenario: Drop Metrics) ##################################
2024/09/14 17:00:38 Creating/Updating StatefulSet "agnhost-a" in namespace "kube-system"...
2024/09/14 17:00:39 pod "agnhost-a-0" is not in Running state yet. Waiting...
2024/09/14 17:00:44 pod "agnhost-a-0" is in Running state
2024/09/14 17:00:44 all pods in namespace "kube-system" with label "app=agnhost-a" are in Running state
#################### ExecInPod (scenario: Drop Metrics) #################################################
2024/09/14 17:00:44 executing command "curl -s -m 5 bing.com" on pod "agnhost-a-0" in namespace "kube-system"...
#################### Sleep (scenario: Drop Metrics) #####################################################
2024/09/14 17:00:52 sleeping for 5s...
#################### ExecInPod (scenario: Drop Metrics) #################################################
2024/09/14 17:00:57 executing command "curl -s -m 5 bing.com" on pod "agnhost-a-0" in namespace "kube-system"...
#################### PortForward (scenario: Drop Metrics) ###############################################
2024/09/14 17:01:03 attempting to find pod with label "k8s-app=retina", on a node with a pod with label "app=agnhost-a"
2024/09/14 17:01:04 attempting port forward to pod name "retina-agent-zwvdl" with label "k8s-app=retina", in namespace "kube-system"...
2024/09/14 17:01:06 port forward validation HTTP request to "http://localhost:10093" succeeded, response: 200 OK
2024/09/14 17:01:06 successfully port forwarded to "http://localhost:10093"
#################### ValidateRetinaDropMetric (scenario: Drop Metrics) ##################################
2024/09/14 17:01:06 checking for metrics on http://localhost:10093/metrics
2024/09/14 17:01:07 failed to find metric matching networkobservability_drop_count: map[direction:unknown reason:IPTABLE_RULE_DROP]
2024/09/14 17:01:12 checking for metrics on http://localhost:10093/metrics
2024/09/14 17:01:12 failed to find metric matching networkobservability_drop_count: map[direction:unknown reason:IPTABLE_RULE_DROP]
...
2024-09-14 17:19:26 [✔] all cluster resources were deleted
2024/09/14 17:19:26 Cluster deleted successfully!
--- FAIL: TestE2ERetinaAWS (2109.29s)
runner.go:27:
Error Trace: /home/whatnick/dev/retina/test/e2e/framework/types/runner.go:27
/home/whatnick/dev/retina/test/e2e/retina_e2e_test.go:107
Error: Received unexpected error:
did not expect error from step ValidateRetinaDropMetric but got error: failed to verify prometheus metrics networkobservability_drop_count: failed to get prometheus metrics: no metric found
Test: TestE2ERetinaAWS
FAIL
FAIL github.com/microsoft/retina/test/e2e 2109.530s
FAIL |
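For reference, a minimal sketch of the kind of check the ValidateRetinaDropMetric step performs, assuming the agent's metrics endpoint is port-forwarded to localhost:10093 as in the log above. It uses the Prometheus text-format parser; the label set is the one the failing step was looking for (direction=unknown, reason=IPTABLE_RULE_DROP). This is illustrative, not the framework's actual implementation.

```go
package main

import (
	"fmt"
	"net/http"

	dto "github.com/prometheus/client_model/go"
	"github.com/prometheus/common/expfmt"
)

// hasDropMetric scrapes the metrics endpoint and reports whether any sample of
// networkobservability_drop_count carries all of the wanted labels.
func hasDropMetric(url string, want map[string]string) (bool, error) {
	resp, err := http.Get(url)
	if err != nil {
		return false, err
	}
	defer resp.Body.Close()

	var parser expfmt.TextParser
	families, err := parser.TextToMetricFamilies(resp.Body)
	if err != nil {
		return false, err
	}

	family, ok := families["networkobservability_drop_count"]
	if !ok {
		return false, nil // metric not exported at all
	}
	for _, m := range family.GetMetric() {
		labels := map[string]string{}
		for _, lp := range m.GetLabel() {
			labels[lp.GetName()] = lp.GetValue()
		}
		matched := true
		for k, v := range want {
			if labels[k] != v {
				matched = false
				break
			}
		}
		if matched {
			return true, nil
		}
	}
	return false, nil
}

func main() {
	found, err := hasDropMetric("http://localhost:10093/metrics",
		map[string]string{"direction": "unknown", "reason": "IPTABLE_RULE_DROP"})
	fmt.Println(found, err)
}
```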
More progress by enabling AWS VPC-CNI in Network Policy enforcement mode.

kubectl --kubeconfig=/home/whatnick/dev/retina/test/e2e/test.pem get pods -n kube-system
NAME READY STATUS RESTARTS AGE
agnhost-a-0 1/1 Running 0 16s
aws-node-77sfp 2/2 Running 0 2m19s
aws-node-ssjsk 2/2 Running 0 2m15s
aws-node-xs2kv 2/2 Running 0 2m17s
coredns-787cb67946-lrxxh 1/1 Running 0 6m34s
coredns-787cb67946-qr4xx 1/1 Running 0 6m34s
kube-proxy-7h7vk 1/1 Running 0 2m15s
kube-proxy-qxwb7 1/1 Running 0 2m19s
kube-proxy-xcbcf 1/1 Running 0 2m17s
retina-agent-22mcl 1/1 Running 0 34s
retina-agent-cc8b4 1/1 Running 0 34s
retina-agent-hwj5t 1/1 Running 0 34s

Network policy is enabled:

kubectl --kubeconfig=/home/whatnick/dev/retina/test/e2e/test.pem get networkpolicy -n kube-system
NAME POD-SELECTOR AGE
deny-all app=agnhost-a 66s |
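For reference, a hedged sketch of the deny-all policy the CreateDenyAllNetworkPolicy step applies, expressed with client-go types. The name, namespace, and pod selector mirror the kubectl output above; the kubeconfig path and the exact spec (empty ingress/egress rules) are assumptions, not the framework's code.

```go
package main

import (
	"context"
	"fmt"

	networkingv1 "k8s.io/api/networking/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// The e2e run above writes its kubeconfig to test/e2e/test.pem.
	cfg, err := clientcmd.BuildConfigFromFlags("", "test/e2e/test.pem")
	if err != nil {
		panic(err)
	}
	client, err := kubernetes.NewForConfig(cfg)
	if err != nil {
		panic(err)
	}

	policy := &networkingv1.NetworkPolicy{
		ObjectMeta: metav1.ObjectMeta{Name: "deny-all", Namespace: "kube-system"},
		Spec: networkingv1.NetworkPolicySpec{
			PodSelector: metav1.LabelSelector{
				MatchLabels: map[string]string{"app": "agnhost-a"},
			},
			// No ingress or egress rules: all traffic to/from the selected
			// pods is denied once the CNI enforces NetworkPolicy.
			PolicyTypes: []networkingv1.PolicyType{
				networkingv1.PolicyTypeIngress,
				networkingv1.PolicyTypeEgress,
			},
		},
	}

	_, err = client.NetworkingV1().NetworkPolicies("kube-system").
		Create(context.Background(), policy, metav1.CreateOptions{})
	fmt.Println(err)
}
```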
This PR will be closed in 7 days due to inactivity. |
Will merge to upstream soon. |
Description
NOTE: Since this will take a bit of CI and account provisioning planning, I intend to keep this synced with upstream once a week until it is across the line or I run out of juice.
Add EKS-based e2e tests by exec'ing eksctl to provision and delete a temporary cluster. Currently at POC stage since account setup etc. is needed to run this in practice, in conjunction with the secrets and variables associated with this repository.
The AWS integration should be set up via OIDC as shown here: https://docs.github.com/en/actions/security-for-github-actions/security-hardening-your-deployments/configuring-openid-connect-in-amazon-web-services
with roles relevant to eksctl as shown here:
https://eksctl.io/usage/minimum-iam-policies/
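Assuming the OIDC role assumption happens in the workflow (for example via the aws-actions/configure-aws-credentials action, which exports temporary credentials and AWS_REGION into the job environment), the Go side would only need the default credential chain. A hedged sketch of creating an EKS client with aws-sdk-go-v2, not code from this PR:

```go
package main

import (
	"context"
	"fmt"
	"log"

	"github.com/aws/aws-sdk-go-v2/config"
	"github.com/aws/aws-sdk-go-v2/service/eks"
)

func main() {
	ctx := context.Background()

	// Picks up credentials exported by the workflow's OIDC step
	// (or a local profile) plus AWS_REGION.
	cfg, err := config.LoadDefaultConfig(ctx)
	if err != nil {
		log.Fatal(err)
	}

	client := eks.NewFromConfig(cfg)
	out, err := client.ListClusters(ctx, &eks.ListClustersInput{})
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println("clusters:", out.Clusters)
}
```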
Related Issue
Partially addresses #451
Checklist
I have signed my commits (git commit -S -s ...). See this documentation on signing commits.
Screenshots (if applicable) or Testing Completed
With the drop packet metrics scenario disabled as per #746, the AWS e2e test suite runs successfully.
go test -run TestE2ERetinaAWS ./test/e2e/ -timeout 40m
ok github.com/microsoft/retina/test/e2e 1866.467s
For failing test runs, cluster creation and teardown proceed as shown below.
Additional Notes
The helm chart install portion of this test fails in practice, presumably due to an unreachable image registry. We may need to push images to a corresponding ECR repository or debug GHCR access.
Opening this PR for feedback and discussion on the AWS e2e testing approach. In practice I have successfully deployed the Retina legacy charts on EKS.