Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add CFP for flow log aggregation #65

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

anubhabMajumdar
Copy link

Adding a proposal for flowlog aggregation in Hubble.

Signed-off-by: Anubhab Majumdar <[email protected]>
@xmulligan
Copy link
Member

cc: @cilium/sig-hubble

@rolinh rolinh self-assigned this Feb 7, 2025
@rolinh rolinh added the sig/hubble CFP is related to Hubble label Feb 7, 2025
Copy link
Member

@rolinh rolinh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the proposal makes a lot of sense and the design is sane. The approach of using fieldAggregate is both intuitive (for users who are used to fieldMask at least) and flexible.

The only caveat I can think of at the moment is potential pitfalls when combining fieldMask and fieldAggregate (e.g. exclude a field via fieldMask but attempt to use it in fieldAggregate). However, I think this is a problem that can be solved with specific checks.

hubble/CFP-37472-flowlog-aggregation.md Outdated Show resolved Hide resolved

### Will this impact existing clusters?

No, this is not a breaking change. By default, the new option won't aggregate any logs. Adding fields to `fieldAggregate` will still write the same `Flow` json structure in the same location - only the number of entries written will differ. Any tooling built to ingest these logs will keep working as expected.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From the Motivation section:

| source_pod | source_port | destination_pod | destination_port | protocol | ingress_flow_count | egress_flow_count |
|------------|-------------|-----------------|------------------|----------|--------------------|-------------------|
| client-1   | 12345       | server          | 80               | TCP      | 4                 | 6                 |
| client-2   | 23456       | server          | 80               | TCP      | 4                 | 6                 |

How are the ingress_flow_count and egress_flow_count computed with the current JSON flow structure?

Copy link
Author

@anubhabMajumdar anubhabMajumdar Feb 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we may not be able to count exactly how many "packets" were sent/received between two pods for x seconds. Since we are aggregating the flows, I was thinking of conveying the information to user as to how many "flows" Hubble processed between the two pods in a period of time.

I was thinking of extending Flow using the extensions field to convey this information.

@anubhabMajumdar
Copy link
Author

I think the proposal makes a lot of sense and the design is sane. The approach of using fieldAggregate is both intuitive (for users who are used to fieldMask at least) and flexible.

The only caveat I can think of at the moment is potential pitfalls when combining fieldMask and fieldAggregate (e.g. exclude a field via fieldMask but attempt to use it in fieldAggregate). However, I think this is a problem that can be solved with specific checks.

@rolinh I was thinking the fieldMask and fieldAggregate can work independently. We use fieldAggregate first to group the flows we see from dataplane, then use fieldMask when writing to file. I can test this out during implementation to make sure there's no error.

Co-authored-by: Bill Mulligan <[email protected]>
Signed-off-by: Anubhab Majumdar <[email protected]>
Signed-off-by: Anubhab Majumdar <[email protected]>
@anubhabMajumdar anubhabMajumdar force-pushed the topic/anmajumdar/aggregate-cfp branch from fcee6dd to fde9c92 Compare February 7, 2025 18:52
@anubhabMajumdar
Copy link
Author

@rolinh @xmulligan @kaworu Thanks for the reviews!
Addressed your comments and updated the proposal. Let me know if this can be merged.

@neaggarwMS
Copy link

I think the proposal makes a lot of sense and the design is sane. The approach of using fieldAggregate is both intuitive (for users who are used to fieldMask at least) and flexible.
The only caveat I can think of at the moment is potential pitfalls when combining fieldMask and fieldAggregate (e.g. exclude a field via fieldMask but attempt to use it in fieldAggregate). However, I think this is a problem that can be solved with specific checks.

@rolinh I was thinking the fieldMask and fieldAggregate can work independently. We use fieldAggregate first to group the flows we see from dataplane, then use fieldMask when writing to file. I can test this out during implementation to make sure there's no error.

Do we need checks? if fieldMask is empty, shouldnt we add all fields, no?

@rolinh
Copy link
Member

rolinh commented Feb 12, 2025

Let me know if this can be merged.

Let's way a couple more days to give time to other Hubble maintainers to have a look at the proposal.

Do we need checks?

I think we just need to ensure the relation from fieldMask to fieldAggregate is intuitive.

if fieldMask is empty, shouldnt we add all fields, no?

Yes

Comment on lines +49 to +52
| source_pod | source_port | destination_pod | destination_port | protocol | ingress_flow_count | egress_flow_count |
|------------|-------------|-----------------|------------------|----------|--------------------|-------------------|
| client-1 | 12345 | server | 80 | TCP | 4 | 6 |
| client-2 | 23456 | server | 80 | TCP | 4 | 6 |
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this assume that we take the traffic direction into account, or just ignore the reply flows?

- name: "test003"
filePath: "/var/run/cilium/hubble/test003.log"
fieldMask: ["source", "destination","verdict"]
fieldAggregate: ["source.pod_name", "destination.pod_name", "l4.TCP.destination_port"]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To expand a little bit on the https://github.com/cilium/design-cfps/pull/65/files#r1946717206 question:

If we use SQL group_by as something that has similar functionality to aggregation, what happens to all the other fields not aggregated?

When a group_by is added to the sql query, all the other fields not included in the group_by need an aggregation function, so integers can be summed, values concatenated, etc.

Using your provided table as an example, lets say we don't include the destination port into the aggregation and just use source_pod, source_port, how are destination_pod, destination_port, protocol and flags are going to be handled?

| source_pod | source_port | destination_pod | destination_port | protocol | flags   |
|------------|-------------|-----------------|------------------|----------|---------|
| client-1   | 12345       | server          | 80               | TCP      | SYN     |
| server     | 80          | client-1        | 12345            | TCP      | SYN-ACK |
| client-1   | 12345       | server          | 80               | TCP      | ACK     |
| client-1   | 12345       | server          | 80               | TCP      | PSH     |
| server     | 80          | client-1        | 12345            | TCP      | ACK     |
| server     | 80          | client-1        | 12345            | TCP      | PSH     |
| client-1   | 12345       | server          | 80               | TCP      | ACK     |
| client-1   | 12345       | server          | 80               | TCP      | FIN     |
| server     | 80          | client-1        | 12345            | TCP      | FIN-ACK |
| client-1   | 12345       | server          | 80               | TCP      | ACK     |
| client-2   | 23456       | server          | 80               | TCP      | SYN     |
| server     | 80          | client-2        | 23456            | TCP      | SYN-ACK |
| client-2   | 23456       | server          | 80               | TCP      | ACK     |
| client-2   | 23456       | server          | 80               | TCP      | PSH     |
| server     | 80          | client-2        | 23456            | TCP      | ACK     |
| server     | 80          | client-2        | 23456            | TCP      | PSH     |
| client-2   | 23456       | server          | 80               | TCP      | ACK     |
| client-2   | 23456       | server          | 80               | TCP      | FIN     |
| server     | 80          | client-2        | 23456            | TCP      | FIN-ACK |
| client-2   | 23456       | server          | 80               | TCP      | ACK     |

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
sig/hubble CFP is related to Hubble
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants