Migrate to a Linux 6.12 kernel #1808
Are the sources already out? Can I get a link to the source? By the way, where can I meet the Tempesta Tech authors in a chat? The Telegram channel doesn't exist. IRC, maybe? According to the Linux kernel maintainer, the newest LTS kernel will be 6.1. Thanks and best regards
Hi @osevan, so far we have only https://github.com/tempesta-tech/linux-5.10.35-tfw, which should be replaced with a newer longterm kernel in the next release. Unfortunately, we don't have a public chat yet.
Can we create one on Telegram?
Hi @osevan, at one point we had a public chat on Slack, but it faded away due to lack of traction. It still makes sense to create a chat (BTW, Telegram looks like a good platform) and a Reddit group, but not before we reach GA.
It probably makes sense to migrate to 6.8 or later, even if it is not stable yet, to get the TCP performance optimizations.
I have some questions:
I created a new repo for the kernel, https://github.com/tempesta-tech/linux-6.8.9-tfw, so please
I divided this issue into 7 parts:
1. FPU invalid opcode: 0000
Unlike the old version, the new kernel raises softirqs even during the boot phase, when FPU-related state (such as the registers or the process FPU state) is not yet ready to be manipulated (touching it raises an exception), so it should be enabled in the first
2. paged-skb-patch
The skb code has been heavily refactored in the latest kernel; for example, the kernel now uses
3. Assembly problem: endbr64 disallows indirect jumps
We have the assembly functions below:
Solution
As an aside, interestingly, if a user-mode C program uses a switch statement that meets the conditions for generating a jump table (gcc uses
4. Kernel crash triggered by test cases
4.1 cryptd_queue_worker
This issue is related to the FPU. Because FPU save/restore in softirq context does not work yet, some assumptions in our code are broken:
workaround:
modified: crypto/simd.c
@@ -317,10 +317,10 @@ static int simd_aead_encrypt(struct aead_request *req)
subreq = aead_request_ctx(req);
*subreq = *req;
- if (!crypto_simd_usable() ||
- (in_atomic() && cryptd_aead_queued(ctx->cryptd_tfm)))
- child = &ctx->cryptd_tfm->base;
- else
+ //if (!crypto_simd_usable() ||
+ // (in_atomic() && cryptd_aead_queued(ctx->cryptd_tfm)))
+ // child = &ctx->cryptd_tfm->base;
+ //else
child = cryptd_aead_child(ctx->cryptd_tfm);
aead_request_set_tfm(subreq, child);
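To make the effect of this workaround concrete, here is a small userspace model of the dispatch rule it comments out. It is a sketch, not kernel code: `enum aead_path`, `pick_path()` and `pick_path_workaround()` are hypothetical names, and the three boolean parameters only stand in for `crypto_simd_usable()`, `in_atomic()` and `cryptd_aead_queued(ctx->cryptd_tfm)`. The queued check is presumably there to keep requests in order when cryptd already holds earlier ones.

```c
#include <assert.h>
#include <stdbool.h>

/* Which transform the sub-request is handed to. */
enum aead_path {
	PATH_CHILD,  /* synchronous SIMD implementation, runs immediately */
	PATH_CRYPTD, /* deferred to the cryptd workqueue */
};

/*
 * Model of the original condition in simd_aead_encrypt():
 *   simd_usable       ~ crypto_simd_usable()
 *   atomic            ~ in_atomic()
 *   cryptd_has_queued ~ cryptd_aead_queued(ctx->cryptd_tfm)
 */
static enum aead_path pick_path(bool simd_usable, bool atomic,
				bool cryptd_has_queued)
{
	/* Defer if SIMD cannot be used here, or if cryptd already holds
	 * queued requests and we must not overtake them. */
	if (!simd_usable || (atomic && cryptd_has_queued))
		return PATH_CRYPTD;
	return PATH_CHILD;
}

/* With the branch commented out, the child is always selected. */
static enum aead_path pick_path_workaround(void)
{
	return PATH_CHILD;
}
```

Under the original rule, a softirq running where SIMD is not usable is routed to cryptd, which is exactly the path whose queue was observed to be corrupted; the workaround bypasses cryptd entirely.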
log
The cryptd worker queue contains invalid list items whose
4.2 ipv6_dup_options
workaround:
modified: net/ipv6/tcp_ipv6.c
@@ -564,8 +564,8 @@ static int tcp_v6_send_synack(const struct sock *sk, struct dst_entry *dst,
rcu_read_lock();
opt = ireq->ipv6_opt;
- if (!opt)
- opt = rcu_dereference(np->opt);
+ //if (!opt)
+ // opt = rcu_dereference(np->opt);
err = ip6_xmit(sk, skb, fl6, skb->mark ? : READ_ONCE(sk->sk_mark),
opt, tclass, READ_ONCE(sk->sk_priority));
rcu_read_unlock();
@@ -1489,8 +1489,8 @@ static struct sock *tcp_v6_syn_recv_sock(const struct sock *sk, struct sk_buff *
to newsk.
*/
opt = ireq->ipv6_opt;
- if (!opt)
- opt = rcu_dereference(np->opt);
+ //if (!opt)
+ // opt = rcu_dereference(np->opt);
if (opt) {
opt = ipv6_dup_options(newsk, opt);
RCU_INIT_POINTER(newnp->opt, opt);
log
5. Userspace segmentation fault triggered by test cases
6. Test failures
if (!skb_queue_empty(&sk->sk_error_queue)) {
test_cached_data_equal_to_original (cache.test_cache.TestChunkedResponse) ...
tempesta_lib: loading out-of-tree module taints kernel.
tempesta_lib: module verification failed: signature and/or required key missing - tainting kernel
[tdb] Start Tempesta DB
[tempesta fw] Initializing Tempesta FW kernel module...
[tempesta fw] Warning: Vhost default doesn't have certificate with matching SAN/CN.
 Maybe that's fine, but it's worth checking the
 config - if there is no relations between the
 names, then host name confusion attack is possible.
[tempesta fw] Configuration processing is completed.
[tdb] Opened table /opt/tempesta/db/filter0.tdb: size=16777216 rec_size=20 base=00000000f2e6a053
[tdb] Opened table /opt/tempesta/db/cache0.tdb: size=268435456 rec_size=0 base=00000000356bc22b
[tdb] Opened table /opt/tempesta/db/sessions0.tdb: size=16777216 rec_size=312 base=0000000087b8ee1a
[tdb] Opened table /opt/tempesta/db/client0.tdb: size=16777216 rec_size=624 base=00000000b49ee93e
[tempesta fw] Open listen socket on: 0.0.0.0:443
[tempesta fw] Open listen socket on: 0.0.0.0
[tempesta fw] Tempesta FW is ready
[tempesta fw] ERROR: error data in socket 00000000b7242a90
[tdb] Close table 'client0.tdb'
[tdb] Close table 'sessions0.tdb'
[tdb] Close table 'cache0.tdb'
[tdb] Close table 'filter0.tdb'
[tempesta fw] modules are stopped
[tempesta fw] exiting...
[tdb] Shutdown Tempesta DB
ERROR
test_h2_cached_data_equal_to_original (cache.test_cache.TestChunkedResponse) ... ok
======================================================================
ERROR: test_cached_data_equal_to_original (cache.test_cache.TestChunkedResponse)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/home/kingluo/tempesta-test/framework/tester.py", line 411, in cleanup_check_dmesg
raise Exception(f"{err} happened during test on Tempesta")
Exception: ERROR happened during test on Tempesta
solution
In the new kernel, the TX timestamp is looped back to the socket together with the original packet content, and our code prints an error message for it. This fails the test case, even though the test actually passes all its assertions.
We should filter out such non-error skbs even though they are appended to sk_error_queue.
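A minimal sketch of such a filter, written as userspace C so it builds without kernel headers. The `ee_errno`/`ee_origin` pair mirrors `struct sock_extended_err` from `<linux/errqueue.h>` (inside the kernel it is reachable as `SKB_EXT_ERR(skb)->ee`), and the origin values mirror the uapi `SO_EE_ORIGIN_*` constants. `struct ext_err`, the `EE_ORIGIN_*` names and `is_real_error()` are hypothetical, and this is not necessarily how the actual fix is written. The kernel queues TX timestamps with origin `SO_EE_ORIGIN_TIMESTAMPING` and errno `ENOMSG`, so checking the origin is enough to tell them apart from genuine errors.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Values mirror SO_EE_ORIGIN_* in <linux/errqueue.h>. */
enum {
	EE_ORIGIN_LOCAL = 1,
	EE_ORIGIN_ICMP = 2,
	EE_ORIGIN_ICMP6 = 3,
	EE_ORIGIN_TIMESTAMPING = 4, /* same value as SO_EE_ORIGIN_TXSTATUS */
	EE_ORIGIN_ZEROCOPY = 5,
};

/* Hypothetical, trimmed-down mirror of struct sock_extended_err. */
struct ext_err {
	uint32_t ee_errno;
	uint8_t ee_origin;
};

/*
 * Return true only for entries that describe a real socket error.
 * TX timestamps and zerocopy completions are notifications that share
 * the error queue, so they are skipped instead of being logged.
 */
static bool is_real_error(const struct ext_err *ee)
{
	switch (ee->ee_origin) {
	case EE_ORIGIN_TIMESTAMPING:
	case EE_ORIGIN_ZEROCOPY:
		return false;
	default:
		return ee->ee_errno != 0;
	}
}
```

In the kernel, the same check would run on each skb found in `sk->sk_error_queue` before deciding whether to print the error message.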
[ 276.898889] Call Trace:
[ 276.898893] <IRQ>
[ 276.898898] dump_stack_lvl+0x70/0x90
[ 276.898908] dump_stack+0x14/0x20
[ 276.898914] ss_tcp_data_ready+0xfe/0x160 [tempesta_fw]
[ 276.898937] tcp_data_ready+0x35/0xe0
[ 276.899097] tcp_data_queue+0x8d5/0xe20
[ 276.899235] tcp_rcv_established+0x244/0x790
[ 276.899366] ? tcp_inbound_hash.constprop.0+0x4e/0x3e0
[ 276.899493] tcp_v4_do_rcv+0x16a/0x2a0
[ 276.899613] tcp_v4_rcv+0xf01/0xf70
[ 276.899730] ? raw_local_deliver+0xcd/0x240
[ 276.899847] ip_protocol_deliver_rcu+0x37/0x180
[ 276.899962] ip_local_deliver_finish+0x8a/0xb0
[ 276.900073] ip_local_deliver+0x73/0x120
[ 276.900184] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 276.900295] ip_rcv+0x18f/0x1b0
[ 276.900408] ? __pfx_ip_rcv_finish+0x10/0x10
[ 276.900518] __netif_receive_skb_one_core+0x8a/0xa0
[ 276.900629] __netif_receive_skb+0x15/0x60
[ 276.900739] process_backlog+0x9a/0x140
[ 276.900843] __napi_poll+0x31/0x1d0
[ 276.900945] net_rx_action+0x29d/0x310
[ 276.901048] __do_softirq+0xcd/0x2a0
[ 276.901151] do_softirq.part.0+0x41/0x60
[ 276.901307] </IRQ>
[ 276.901454] <TASK>
[ 276.901601] __local_bh_enable_ip+0x6e/0x70
[ 276.901751] __dev_queue_xmit+0x33d/0xde0
[ 276.901897] ? mas_alloc_nodes+0x16a/0x200
[ 276.902045] ? hash_conntrack_raw+0x6b/0xe0 [nf_conntrack]
[ 276.902202] ? __pte_offset_map+0x20/0x190
[ 276.902348] ip_finish_output2+0x2dc/0x550
[ 276.902495] ? nf_conntrack_in+0xeb/0x6c0 [nf_conntrack]
[ 276.902644] __ip_finish_output+0xb7/0x190
[ 276.902786] ip_finish_output+0x2d/0xe0
[ 276.902926] ip_output+0x63/0xf0
[ 276.903062] ? __pfx_ip_finish_output+0x10/0x10
[ 276.903199] ip_local_out+0x62/0x70
[ 276.903334] __ip_queue_xmit+0x19b/0x4f0
[ 276.903471] ? set_ptes.constprop.0+0x2b/0x90
[ 276.903605] ip_queue_xmit+0x19/0x20
[ 276.903739] __tcp_transmit_skb+0xada/0xc90
[ 276.903867] tcp_write_xmit+0x5d0/0x1420
[ 276.903992] __tcp_push_pending_frames+0x3b/0x110
[ 276.904114] tcp_send_fin+0x52/0x190
[ 276.904237] __tcp_close+0x2eb/0x3f0
[ 276.904360] tcp_close+0x29/0xa0
[ 276.904482] inet_release+0x4c/0x90
[ 276.904601] __sock_release+0x40/0xc0
[ 276.904716] sock_close+0x19/0x30
[ 276.904827] __fput+0xa8/0x2f0
[ 276.904940] __fput_sync+0x1e/0x30
[ 276.905051] __x64_sys_close+0x42/0x90
[ 276.905162] x64_sys_call+0x18ea/0x20c0
[ 276.905276] do_syscall_64+0x54/0x120
[ 276.905388] entry_SYSCALL_64_after_hwframe+0x78/0x80
[ 276.905503] RIP: 0033:0x7f0e86914f67
[ 276.905616] Code: ff e8 0d 16 02 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 03 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 41 c3 48 83 ec 18 89 7c 24 0c e8 73 ba f7 ff
[ 276.905871] RSP: 002b:00007ffea09c97f8 EFLAGS: 00000246 ORIG_RAX: 0000000000000003
[ 276.906003] RAX: ffffffffffffffda RBX: 0000562c8829ab60 RCX: 00007f0e86914f67
[ 276.906136] RDX: 0000000000000006 RSI: 0000000000000006 RDI: 0000000000000006
[ 276.906270] RBP: 0000000000000006 R08: 0000000000000000 R09: 0000000000000000
[ 276.906515] R10: 00007f0e8680fb40 R11: 0000000000000246 R12: 0000562c8829b950
[ 276.906657] R13: 0000000000000000 R14: 00007ffea09c9e70 R15: 0000000000000000
[ 276.906797] </TASK>
6.2 frang_resp_fwd_process() not called
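Bug: type mismatch. The value `-1` of type `int` becomes `255` when stored in a `char`, producing an invalid frang index when compiling with the new kernel's compiler options (since Linux 6.2 the kernel is built with `-funsigned-char`, so plain `char` is unsigned on every architecture).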
Line 167 in de0a8a3:
char curr;
Line 137 in de0a8a3:
return -1;
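The failure mode can be reproduced in plain C. This sketch spells the type as `unsigned char`, which is what plain `char` becomes under `-funsigned-char`, so the result does not depend on the host compiler's default. `FRANG_NUM`, `frang_find_index()`, `store_as_char()`, `store_as_int()` and `is_not_found()` are hypothetical names standing in for the code at the two lines referenced above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical table size; the real number of frang handlers differs. */
#define FRANG_NUM 8

/* Like the function at line 137: returns -1 when nothing matches. */
static int frang_find_index(int key)
{
	return (key >= 0 && key < FRANG_NUM) ? key : -1;
}

/* Buggy pattern: with unsigned char, -1 is stored as 255. */
static unsigned char store_as_char(int key)
{
	unsigned char curr = frang_find_index(key);
	return curr;
}

/* Fixed pattern: an int keeps the -1 sentinel intact. */
static int store_as_int(int key)
{
	int curr = frang_find_index(key);
	return curr;
}

/* Caller-side check that relies on the -1 sentinel. */
static bool is_not_found(int v)
{
	return v == -1;
}
```

With the `char` variant the sentinel check never fires (`255 != -1`) and 255 is later used as an index; declaring `curr` as `int` (or `signed char`) restores the intended behaviour.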
test log
======================================================================
FAIL: test_block_action_attack_reply_not_on_req_rcv_event (http_general.test_block_action.BlockActionReply)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/home/kingluo/tempesta-test/http_general/test_block_action.py", line 204, in test_block_action_attack_reply_not_on_req_rcv_event
self.check_last_error_response(client, expected_status_code="403")
File "/home/kingluo/tempesta-test/http_general/test_block_action.py", line 98, in check_last_error_response
self.assertEqual(client.last_response.status, expected_status_code)
AssertionError: '200' != '403'
- 200
+ 403
======================================================================
FAIL: test_reaching_the_limit_2 (t_frang.test_http_resp_code_block.HttpRespCodeBlockOneClientHttp)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/home/kingluo/tempesta-test/t_frang/test_http_resp_code_block.py", line 146, in test_reaching_the_limit_2
self.assertTrue(client.wait_for_connection_close())
AssertionError: False is not true
======================================================================
FAIL: test_timeout_invalid (t_frang.test_client_body_and_header_timeout.ClientBodyTimeout)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/home/kingluo/tempesta-test/t_frang/test_client_body_and_header_timeout.py", line 58, in test_timeout_invalid
self.check_last_response(self.get_client("deproxy-1"), "403", self.error)
File "/home/kingluo/tempesta-test/t_frang/frang_test_case.py", line 122, in check_last_response
self.assertEqual(
AssertionError: '200' != '403'
- 200
+ 403
: HTTP response status codes mismatch.
======================================================================
FAIL: test_timeout_invalid (t_frang.test_client_body_and_header_timeout.ClientBodyTimeoutH2)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/home/kingluo/tempesta-test/t_frang/test_client_body_and_header_timeout.py", line 58, in test_timeout_invalid
self.check_last_response(self.get_client("deproxy-1"), "403", self.error)
File "/home/kingluo/tempesta-test/t_frang/frang_test_case.py", line 122, in check_last_response
self.assertEqual(
AssertionError: '200' != '403'
- 200
+ 403
: HTTP response status codes mismatch.
======================================================================
FAIL: test_body_len (t_frang.test_length.FrangLengthH2)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/home/kingluo/tempesta-test/t_frang/test_length.py", line 283, in test_body_len
self.check_response(
File "/home/kingluo/tempesta-test/t_frang/frang_test_case.py", line 136, in check_response
self.assertEqual(
AssertionError: '200' != '403'
- 200
+ 403
: HTTP response status codes mismatch.
======================================================================
FAIL: test_two_clients_two_ip (t_frang.test_request_rate_burst.FrangRequestRateH2)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/home/kingluo/tempesta-test/t_frang/test_request_rate_burst.py", line 109, in test_two_clients_two_ip
self.assert_reset_socks(self.sniffer.packets)
File "/home/kingluo/tempesta-test/helpers/asserts.py", line 40, in assert_reset_socks
self.assertTrue(
AssertionError: False is not true : Ports must be reset: {39561}, but the actual state is: set()
======================================================================
FAIL: test_chunk_cnt_invalid (t_frang.test_http_body_and_header_chunk_cnt.HttpHeaderChunkCnt)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/home/kingluo/tempesta-test/t_frang/test_http_body_and_header_chunk_cnt.py", line 63, in test_chunk_cnt_invalid
self.check_response(client, "403", self.error)
File "/home/kingluo/tempesta-test/t_frang/frang_test_case.py", line 136, in check_response
self.assertEqual(
AssertionError: '200' != '403'
- 200
+ 403
: HTTP response status codes mismatch.
7. TSO
Do not forget to remove the commit from #2299: it is already in the new kernel.
The new LTS kernel 6.12 has been released. It would be better to migrate to this version.
#1808 (comment)
Problems:
1. In the new kernel, assembly functions uniformly return through `__x86_return_thunk`. However, our assembly code uses the plain `ret` instruction, so the kernel's objtool flags it as a naked return during compilation.
2. `SYM_FUNC_START` in the new kernel adds `endbr64` at the head of the assembly function, and every indirect jump to a target that does not start with an ENDBR instruction (i.e. a code snippet within the same function) faults. We use jump tables inside the assembly functions to perform exactly such indirect jumps, so they raise a CET exception (https://en.wikipedia.org/wiki/X86_instruction_listings#Added_with_Intel_CET).
Solutions:
1. Substitute `ret` with `RET`, the new kernel's macro that ensures the correct return sequence.
2. Use `notrack jmp` and enable no-track in the CPU setting: `wrmsrl(MSR_IA32_S_CET, CET_ENDBR_EN | CET_NO_TRACK_EN)`.
As an aside, interestingly, if a user-mode C program uses a switch statement that meets the conditions for generating a jump table (gcc uses `-fcf-protection=full` by default), the generated jump table uses a `jmp` with the `notrack` prefix, and IBT is marked as `true` in the `.note.gnu.property` section of the compiled ELF file, so `NO_TRACK_EN` in the MSR is set to `true` for user mode when the kernel loads the binary. So user mode can use `notrack` to bypass CET without caring whether `NO_TRACK_EN` is set.
bignum_x86-64.S: replace ret with RET, to use __x86_return_thunk
#1808 (comment) bignum_x86-64.S: replace ret with RET, to use __x86_return_thunk. Use endbr64 on each switch label
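The user-mode aside about switch statements can be checked with a small example. This is an illustrative x86-64 sketch, not Tempesta code: `dispatch()` is a hypothetical function whose dense cases GCC typically lowers to a jump table at `-O2`, and with `-fcf-protection=full` the dispatch through that table is usually emitted as a `notrack` indirect jump. Exact code generation depends on the GCC version and flags, so treat the comment in the block as something to verify on your toolchain.

```c
#include <assert.h>

/*
 * Dense switch that GCC typically compiles into a jump table.
 * To inspect the generated code (x86-64):
 *   gcc -O2 -fcf-protection=full -S jt.c && grep notrack jt.s
 * Expect something like `notrack jmp *%rax` for the table dispatch,
 * which lets the in-function case targets skip ENDBR64.
 */
int dispatch(int op, int x)
{
	switch (op) {
	case 0: return x + 1;
	case 1: return x - 1;
	case 2: return x * 2;
	case 3: return x / 2;
	case 4: return -x;
	case 5: return x * x;
	case 6: return x % 3;
	default: return 0;
	}
}
```

The kernel-side alternatives discussed above are the same two options: allow `notrack` jumps via `CET_NO_TRACK_EN`, or put `endbr64` on every jump-table target so ordinary tracked jumps are accepted.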
We have been living with 5.10 for too long; it's time to migrate to the 6.1 longterm kernel.
Please update all the https://github.com/tempesta-tech/tempesta/wiki pages referencing the old kernel.
Please also grep for and fix all `TODO #1808` comments.