
test: Retina e2e scale test #720

Open

alexcastilio wants to merge 6 commits into main from create-scale-test-cluster

Conversation

alexcastilio (Contributor)

Description

Create Retina E2E Scale tests using test/e2e framework.

Related Issue

If this pull request is related to any issue, please mention it here. Additionally, make sure that the issue is assigned to you before submitting this pull request.

Checklist

  • I have read the contributing documentation.
  • I signed and signed-off the commits (git commit -S -s ...). See this documentation on signing commits.
  • I have correctly attributed the author(s) of the code.
  • I have tested the changes locally.
  • I have followed the project's style guidelines.
  • I have updated the documentation, if necessary.
  • I have added tests, if applicable.

Screenshots (if applicable) or Testing Completed

Please add any relevant screenshots or GIFs to showcase the changes made.

Additional Notes

Add any additional notes or context about the pull request here.


Please refer to the CONTRIBUTING.md file for more information on how to contribute to this project.

@alexcastilio alexcastilio added the area/infra Test, Release, or CI Infrastructure label Sep 10, 2024
@alexcastilio alexcastilio self-assigned this Sep 10, 2024
@alexcastilio alexcastilio requested a review from a team as a code owner September 10, 2024 16:09
@alexcastilio alexcastilio marked this pull request as draft September 10, 2024 16:10
@timraymond (Member) left a comment:

I realize that this is a draft, and I ordinarily don't review drafts since things are subject to change... but I figured these would be good suggestions early-on if you're planning to write some more here.

Resolved (outdated) review threads on: test/e2e/framework/types/runner.go (2), test/e2e/scale_test.go (2)
@@ -62,6 +62,20 @@ func DeleteTestInfra(subID, clusterName, location string) *types.Job {
return job
}

func InstallRetina(kubeConfigFilePath, chartPath string) *types.Job {
Contributor:

Why do you need this? Can you refactor InstallAndTestRetinaBasicMetrics?

Contributor Author:

I preferred not to change InstallAndTestRetinaBasicMetrics, since it might be in use by someone else and its scope seems to be different.

The plan for the new pipeline is to Create Infra, Install Retina, Scale Up, and Run Tests, so my thinking is to create a different job for each stage so that each job can be reused elsewhere. In the future, instead of using InstallAndTestRetinaBasicMetrics, one could [re-]use InstallRetina, Test Basic Metrics, and perhaps add some other set of tests (see the rough sketch below). Does that make sense?
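
To illustrate the staging idea in isolation, a minimal, framework-agnostic sketch in Go follows; the stage type, names, and no-op run functions are purely illustrative and are not the e2e framework's actual Job/Step API:

package main

import "log"

// stage is a reusable unit of work; several stages compose into a pipeline
// (illustrative only, not the e2e framework's actual Job/Step types).
type stage struct {
	name string
	run  func() error
}

// runPipeline executes each stage in order and stops on the first failure.
func runPipeline(stages []stage) error {
	for _, s := range stages {
		log.Printf("running stage: %s", s.name)
		if err := s.run(); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	// The scale pipeline reuses an InstallRetina-style stage instead of the
	// monolithic InstallAndTestRetinaBasicMetrics, so the same stage can be
	// shared with other pipelines later.
	scalePipeline := []stage{
		{name: "create infra", run: func() error { return nil }},
		{name: "install retina", run: func() error { return nil }},
		{name: "scale up", run: func() error { return nil }},
		{name: "run tests", run: func() error { return nil }},
	}
	if err := runPipeline(scalePipeline); err != nil {
		log.Fatal(err)
	}
}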

Resolved (outdated) review threads on: test/e2e/framework/types/background_test.go, test/e2e/framework/types/runner.go, test/e2e/framework/types/scenarios_test.go
@alexcastilio alexcastilio force-pushed the create-scale-test-cluster branch 5 times, most recently from 6662faf to 69189ce Compare September 11, 2024 09:39

This PR will be closed in 7 days due to inactivity.

@github-actions github-actions bot added the meta/waiting-for-author Blocked and waiting on the author label Oct 12, 2024
@alexcastilio alexcastilio force-pushed the create-scale-test-cluster branch 7 times, most recently from 83abb41 to 6ef0d3b Compare October 16, 2024 13:24
@alexcastilio alexcastilio removed the meta/waiting-for-author Blocked and waiting on the author label Oct 16, 2024
@alexcastilio alexcastilio marked this pull request as ready for review October 16, 2024 14:12
@alexcastilio alexcastilio force-pushed the create-scale-test-cluster branch 2 times, most recently from 2904623 to 84b6bd2 Compare October 16, 2024 14:30
@alexcastilio alexcastilio force-pushed the create-scale-test-cluster branch 3 times, most recently from 695817d to dce97a3 Compare October 16, 2024 16:03
"k8s.io/client-go/tools/clientcmd"
)

type CreateNetworkPolicies struct {
Member:

There is a create-network-policy step in the stdlib; can we either move this there or converge on one way of creating netpols?
https://github.com/microsoft/retina/blob/main/test/e2e/framework/kubernetes/create-network-policy.go

Contributor Author:

Yes, that could be done with some refactoring.

The kubernetes.CreateDenyAllNetworkPolicy step creates a specific NetworkPolicy (NP) and is encapsulated so that only the Namespace and a single podSelector label can be provided.

The scaletest.CreateNetworkPolicies step creates a number of NPs that are also specific, with the podSelector labels needed for the scale tests.

The first one would need more fields to allow more customization of the generated NP, but that would also require checking and changing the code everywhere it is currently used.

If it were done that way, how should this step that generates a single NP be reused to generate many NPs? Should the ScaleTest function that generates the job (file test/e2e/jobs/scale.go) add a loop that adds this same Step several times, as in the sketch below? Or should a new step be created that encapsulates these multiple NP creations? (I'm not sure whether the framework currently allows creation/reuse of steps inside of steps.)

Anyway, if this is the best approach, maybe it could be done in a separate refactoring task. Do you agree?
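
For concreteness, a minimal sketch of the loop option, building N single-label deny-all policies from one helper; the helper name, namespace, labels, and count are illustrative and are not the framework's actual steps:

package main

import (
	"fmt"

	netv1 "k8s.io/api/networking/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// denyAllForLabel builds one deny-all NetworkPolicy scoped to a single
// podSelector label, similar in spirit to the existing
// kubernetes.CreateDenyAllNetworkPolicy step.
func denyAllForLabel(namespace, labelKey, labelValue string) netv1.NetworkPolicy {
	return netv1.NetworkPolicy{
		ObjectMeta: metav1.ObjectMeta{
			Name:      fmt.Sprintf("deny-all-%s", labelValue),
			Namespace: namespace,
		},
		Spec: netv1.NetworkPolicySpec{
			PodSelector: metav1.LabelSelector{
				MatchLabels: map[string]string{labelKey: labelValue},
			},
			PolicyTypes: []netv1.PolicyType{
				netv1.PolicyTypeIngress,
				netv1.PolicyTypeEgress,
			},
		},
	}
}

func main() {
	// The "loop" option: reuse the single-NP builder N times with the
	// scale-test labels instead of adding a dedicated multi-NP step.
	var policies []netv1.NetworkPolicy
	for i := 0; i < 10; i++ {
		policies = append(policies, denyAllForLabel("scale-test", "app", fmt.Sprintf("kapinger-%d", i)))
	}
	fmt.Printf("generated %d network policies\n", len(policies))
}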

Member:

I see the reasoning. While there might be a way to accommodate both scenarios, that may be out of scope for this PR given current limitations. The existing cases verify granular drop behavior, unlike this suite, so we can shelve this for later.

"k8s.io/client-go/tools/clientcmd"
)

type CreateResources struct {
Member:

Is it possible to reuse something like this?

https://github.com/microsoft/retina/blob/main/test/e2e/framework/kubernetes/create-resource.go#L23

Pigeonholing these relatively common actions into specific tests is going to result in a lot of redundant code.

Contributor Author:

It is being reused inside the Run() method (line 66). This step puts together the generation of all the objects required for this scale test and then reuses the CreateResource function you mention to apply them to the cluster.

How would you suggest reusing it in a better way?

Member:

My thought stems from the fact that creating/destroying Kubernetes resources is a core component of these suites, and I'm wondering whether there is a way of making this CreateResources step shareable amongst other suites.

In other such cases we've seen people create resources that require a pod/job/daemonset to be ready, but the resource-creation part of the test just creates the resource in the apiserver and moves on without waiting for the job or pod, which then results in faulty tests. It is then necessary to add a wait-for-pod-ready step, like the one added in this PR, and/or verify the cluster is stable, etc. (see the create-then-wait sketch below).

I don't have a clear solution for this at the moment, so I won't block, but it seems everybody has a different way of creating test resources, so I'm just starting a dialogue around the topic.
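
For reference, a sketch of the create-then-wait pattern using client-go; the package name, function name, and timeouts are illustrative, and this is not the framework's actual step implementation:

package scaletest

import (
	"context"
	"fmt"
	"time"

	appsv1 "k8s.io/api/apps/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/util/wait"
	"k8s.io/client-go/kubernetes"
)

// createAndWait creates a Deployment and then blocks until its replicas
// report ready, so later steps do not run against a half-started workload.
func createAndWait(ctx context.Context, client kubernetes.Interface, dep *appsv1.Deployment) error {
	if _, err := client.AppsV1().Deployments(dep.Namespace).Create(ctx, dep, metav1.CreateOptions{}); err != nil {
		return fmt.Errorf("create deployment %q: %w", dep.Name, err)
	}
	// Poll until the observed ready replicas match the desired count.
	return wait.PollUntilContextTimeout(ctx, 5*time.Second, 5*time.Minute, true,
		func(ctx context.Context) (bool, error) {
			d, err := client.AppsV1().Deployments(dep.Namespace).Get(ctx, dep.Name, metav1.GetOptions{})
			if err != nil {
				return false, err
			}
			want := int32(1)
			if d.Spec.Replicas != nil {
				want = *d.Spec.Replicas
			}
			return d.Status.ReadyReplicas == want, nil
		})
}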


var (
kapingerPort = int32(8080)
KapingerDeployment = appsv1.Deployment{
Member:

Is there a reason this is a dupe of this?
https://github.com/microsoft/retina/blob/main/test/e2e/framework/kubernetes/create-kapinger-deployment.go

It has already proven difficult, when we tweak kapinger, to track down all of the places that use it; having another place where a DeploymentSpec is defined makes that more difficult.

Contributor Author:

I think I just had not seen it before. Reusing the previous implementation.

// Returning an error will cause the test to fail
func (po *ValidateAndPrintOptions) Run() error {

log.Printf(`Starting to scale with folowing options:
Member:

Use the %+v formatting directive to print struct keys and values.

Suggested change:
- log.Printf(`Starting to scale with folowing options:
+ log.Printf("Starting to scale with following options %+v", po.Options)

Contributor Author:

ok

KubeConfigFilePath string
}

// Useful when wanting to do parameter checking, for example
Member:

This comment is a bit redundant when copy-pasted onto every struct :)

Contributor Author:

This comment comes from the interface. I use a plugin to automatically create the methods from the interface that I want to implement, so it also gets the comment from it :)

)

var (
NetworkPolicy = netv1.NetworkPolicy{
Member:

+1, this doesn't necessarily seem scale-test specific; let's put it in a more common directory and/or reuse it with the existing create-deny-netpol steps.

Contributor Author:

I can put it in the folder test/e2e/framework/kubernetes/templates, and the existing create-deny-netpol step can be changed in another task to reuse it and make the changes needed. Is that OK?

)

var (
KapingerClusterRole = rbacv1.ClusterRole{
Member:

Same for this: let's stick to one way of deploying kapinger and tweak that if we need to.

Contributor Author:

Done

type CreateNamespace struct {
Namespace string
KubeConfigFilePath string
DryRun bool
Member:

Do we have any strong use cases for dry run? In a lot of these cases so far, all the core step code is gated by this conditional, and when dry run is set to true the step just pulls the kubeconfig.

For what it's worth, the prevalidate step is effectively a dry run.

Contributor Author:

The script this code is based on has a flag to stop execution after generating the objects and before applying them to the cluster, to allow inspection of the generated objects. The idea here is to print the manifests to the logs without applying them (sketched below). Perhaps this could be removed. @vipul-21, what do you think about this?
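
For context, the DryRun gate under discussion looks roughly like this; the struct mirrors the fields from the diff above, while the package name and the client-go/yaml calls are just one illustrative way to do it, not the final step code:

package scaletest

import (
	"context"
	"fmt"
	"log"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
	"sigs.k8s.io/yaml"
)

// CreateNamespace mirrors the struct in the diff above. Run gates the apply
// behind DryRun so the generated manifest is only printed for inspection.
type CreateNamespace struct {
	Namespace          string
	KubeConfigFilePath string
	DryRun             bool
}

func (c *CreateNamespace) Run() error {
	ns := &corev1.Namespace{ObjectMeta: metav1.ObjectMeta{Name: c.Namespace}}

	if c.DryRun {
		out, err := yaml.Marshal(ns)
		if err != nil {
			return fmt.Errorf("marshal namespace: %w", err)
		}
		log.Printf("dry run, generated manifest:\n%s", out)
		return nil // stop before touching the cluster
	}

	config, err := clientcmd.BuildConfigFromFlags("", c.KubeConfigFilePath)
	if err != nil {
		return fmt.Errorf("build kubeconfig: %w", err)
	}
	client, err := kubernetes.NewForConfig(config)
	if err != nil {
		return fmt.Errorf("create clientset: %w", err)
	}
	_, err = client.CoreV1().Namespaces().Create(context.TODO(), ns, metav1.CreateOptions{})
	return err
}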

Member:

cc: @huntergregory for his thoughts

Contributor:

I think we can remove this for simplicity

Contributor:

This was primarily used to calculate how many IP sets, et cetera, NPM will create.

Contributor Author:

DryRun removed

Labels
area/infra Test, Release, or CI Infrastructure
5 participants