
[Bug]: Incorrect default value of llq_policy #331

Closed
3 tasks done
Binary-Vanguard-12138 opened this issue Dec 1, 2024 · 17 comments
Labels
bug (Report errors or unexpected behavior) · DPDK driver

Comments

Binary-Vanguard-12138 commented Dec 1, 2024

Preliminary Actions

Driver Type

DPDK PMD for Elastic Network Adapter (ENA)

Driver Tag/Commit

ena_dpdk_2.11.0

Custom Code

No

OS Platform and Distribution

4.14.336-257.568.amzn2.x86_64 #1 SMP Sat Mar 23 09:49:55 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux

Bug description

According to section 18 (ENA Poll Mode Driver) of the official DPDK documentation, the runtime configuration for the LLQ policy was refactored in DPDK 24.11, and llq_policy is supposed to default to 1.
But during testing, the driver emits the following warning if I don't specify the llq_policy devarg:

ENA_DRIVER: ena_get_metrics_entries(): 0x6 customer metrics are supported
ENA_DRIVER: ena_set_queues_placement_policy(): NOTE: LLQ has been disabled as per user's request. This may lead to a huge performance degradation!
ENA_DRIVER: ena_get_metrics_entries(): 0x6 customer metrics are supported
ENA_DRIVER: ena_set_queues_placement_policy(): NOTE: LLQ has been disabled as per user's request. This may lead to a huge performance degradation!

That means llq_policy was applied as 0, which disables LLQ.

Reproduction steps

1. Download the latest version of DPDK (24.11).
2. Write a test UDP application using DPDK and run it on AWS `c6in.metal` instance. (I believe it can be reproduced on other instance types.)

Expected Behavior

The driver should not emit the warnings above; LLQ should stay enabled by default.

Actual Behavior

The warning appears, and the total PPS decreased by about 10%.

Additional Data

No response

Relevant log output

No response

Contact Details

@Binary-Vanguard-12138 added the bug (Report errors or unexpected behavior) and triage (Determine the priority and severity) labels on Dec 1, 2024
@Binary-Vanguard-12138 (Author)

@shaibran This bug is not directly related to this repository, since the DPDK driver here is outdated, as pointed out in #317.
But it appears in the latest DPDK release.

@Binary-Vanguard-12138 (Author)

I also checked the git history of the ena driver in the DPDK repository, and, if I understand correctly, I could not find code that sets the default llq_policy to 1.
Please feel free to correct me if I am wrong. @shaibran

@Binary-Vanguard-12138 changed the title from "[Bug]:" to "[Bug]: Incorrect default value of llq_policy" on Dec 2, 2024

shaibran commented Dec 2, 2024

Thank you for contacting us.
The default should be 1, so that the driver follows the device recommendation.
We will investigate this and update.

selevit commented Dec 6, 2024

Also facing this, on the same DPDK version.

Distro:

Linux dpdk-pg1 6.1.115-126.197.amzn2023.x86_64 #1 SMP PREEMPT_DYNAMIC Tue Nov  5 17:36:57 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux

@Binary-Vanguard-12138 (Author)

> Also facing this, on the same DPDK version.
>
> Distro:
>
> Linux dpdk-pg1 6.1.115-126.197.amzn2023.x86_64 #1 SMP PREEMPT_DYNAMIC Tue Nov  5 17:36:57 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux

Thanks for confirming. I'd like to exchange experiences using DPDK on AWS.
I have been running into some problems recently that the AWS support team could not answer satisfactorily.
I sent a message to your TG.


amitbern-aws commented Dec 9, 2024

@Binary-Vanguard-12138

Thank you for reaching out and sharing your concerns. I’d like to address the three issues you reported:

Performance issue of ENA PMD driver for DPDK
This report was not created by AWS. We recommend contacting the publisher of the report directly for detailed technical assistance, as they are best positioned to address your concerns.

Outdated ENA PMD driver
The merge process is still in progress. Since this repository supports multiple kernels, including DPDK, the update requires careful consideration across all environments. We will notify you as soon as this task is complete.

Incorrect default value of llq_policy
Our team is actively investigating this matter and will provide updates as soon as new information becomes available.

We value your feedback and are committed to providing a resolution to each of these issues.
Please feel free to share any additional details or concerns, and we’ll do our best to address them.
Thanks


amitbern-aws commented Dec 9, 2024

@Binary-Vanguard-12138 / @selevit,

Regarding this issue [Bug]: Incorrect default value of llq_policy
we’re actively working on a solution for the DPDK upstream repository.
In the meantime, you can use one of the following workarounds:

  1. Set llq_policy devarg:
    Add -a "0000:28:00.0,llq_policy=1" to the EAL options, where 0000:28:00.0 represents the PCI BDF ID of the device.
  2. Apply this patch to DPDK 24.11: 0001-net-ena-set-default-LLQ-header-policy.patch
    git apply ./0001-net-ena-set-default-LLQ-header-policy.patch

Let us know if you need further assistance
Thanks

@Binary-Vanguard-12138 (Author)

> @Binary-Vanguard-12138 / @selevit,
>
> Regarding this issue, [Bug]: Incorrect default value of llq_policy, we're actively working on a solution for the DPDK upstream repository. In the meantime, you can use one of the following workarounds:
>
> 1. Set the llq_policy devarg: add -a "0000:28:00.0,llq_policy=1" to the EAL options, where 0000:28:00.0 is the PCI BDF of the device.
> 2. Apply this patch to DPDK 24.11: 0001-net-ena-set-default-LLQ-header-policy.patch (git apply ./0001-net-ena-set-default-LLQ-header-policy.patch)
>
> Let us know if you need further assistance. Thanks

Thanks for the patch.
I ended up resolving this by passing a devargs string that sets llq_policy to the rte_eal_init function.
I hope this patch will be merged into the DPDK upstream repository soon.


Binary-Vanguard-12138 commented Dec 9, 2024

> @Binary-Vanguard-12138
>
> Thank you for reaching out and sharing your concerns. I'd like to address the three issues you reported:
>
> Performance issue of ENA PMD driver for DPDK: This report was not created by AWS. We recommend contacting the publisher of the report directly for detailed technical assistance, as they are best positioned to address your concerns.
>
> Outdated ENA PMD driver: The merge process is still in progress. Since this repository supports multiple kernels, including DPDK, the update requires careful consideration across all environments. We will notify you as soon as this task is complete.
>
> Incorrect default value of llq_policy: Our team is actively investigating this matter and will provide updates as soon as new information becomes available.
>
> We value your feedback and are committed to providing a resolution to each of these issues. Please feel free to share any additional details or concerns, and we'll do our best to address them. Thanks

  1. I mentioned the DPDK AWS testing result here because I got a very similar result in my own test.
  2. Good to know; I thought it was as simple as copying the code from DPDK upstream. Apologies if that's not the case.


Binary-Vanguard-12138 commented Dec 9, 2024

@amitbern-aws or @shaibran, I wonder if I should open another issue for a performance problem, but here is a brief summary of what I am currently facing.
I used DPDK 24.11 with the llq_policy devarg set for all ENA devices, on two c6in.metal instances with 2 internal NICs each.
I tried to send 64-byte UDP packets, using a single CPU core per NIC, from one metal instance to the other.
With only one Rx/Tx pair active (e.g., eth1 of instance #1 to eth1 of instance #2), the receiver handled about 4.7 Mpps with almost no packet loss.
But with both NICs sending and receiving (eth1 of instance #1 to eth1 of instance #2, and eth2 of instance #1 to eth2 of instance #2), I see about 15% packet loss: the sender transmits 4.7 Mpps per NIC, but the receiver only gets about 4.2 Mpps on each NIC.

@shaibran (Contributor)

@Binary-Vanguard-12138, it is recommended to use separate cores for Tx and Rx. If that is already the case, then it might be related to the underlying implementation, which is not exposed to the host.

@shaibran removed the triage (Determine the priority and severity) label on Dec 12, 2024
@Binary-Vanguard-12138 (Author)

> @Binary-Vanguard-12138, it is recommended to use separate cores for Tx and Rx. If that is already the case, then it might be related to the underlying implementation, which is not exposed to the host.

Thanks for your response.
Yes, that is already the case.
I enabled 2 NICs on a single AWS instance and used a single core for each NIC.
The 1st instance only sends packets, while the 2nd instance only receives.
I am not clear on what you mean by "underlying implementation".

@shaibran (Contributor)

  1. Ensure that PPS is measured on the receiving side during all tests.
  2. Try running both flows in the same direction:
    • Scenario 1: Instance #1 ENI#1 → Instance #2 ENI#1
    • Scenario 2: Instance #1 ENI#2 → Instance #2 ENI#2
  3. As a general observation, when additional flows are introduced, PPS is not expected to scale linearly.


shaibran commented Dec 15, 2024

BTW, we already upstreamed the LLQ policy bug fix; it is expected to be included in DPDK 24.11.1.
https://inbox.dpdk.org/dev/[email protected]/

Thank you again for letting us know about the problem.


shaibran commented Dec 17, 2024

@Binary-Vanguard-12138, I am resolving this issue, as the fix discussed is staged for the upcoming 25.03 release and the 24.11.1 LTS.
Please contact me directly via email [email protected] to further investigate the performance issue you described.

@Binary-Vanguard-12138 (Author)

> BTW, we already upstreamed the LLQ policy bug fix; it is expected to be included in DPDK 24.11.1. https://inbox.dpdk.org/dev/[email protected]/
>
> Thank you again for letting us know about the problem.

Hey, I downloaded the latest DPDK 24.11.1 and checked the code; it looks like your patch was not included in that stable release.
Can you check and let me know when I can expect it to be merged into the 24.11 LTS branch?

@shaibran (Contributor)

Hi, DPDK 24.11.1 was released only a few days after 24.11, to include a fix for CVE-2024-11614. The ENA patch was accepted and will be included in the upcoming release.
