RFE: Add support for maximum supported kernel version #457
base: main
According to my system calls table, there are holes in the syscall numbering on several architectures (I looked at arm64, arm, armoabi, x86-64, x32 and i386). New-style architectures share syscall numbering, and new entries are added at the end of the table. Your syscalls.csv showed me that I had missed the "parisc64" architecture, so I will have to add support for it. (Edit: DONE)

When it comes to LTS/stable kernels, I think one of their rules is "no new stuff", which in this case means no new system calls. Distribution kernels may add them, and many have done so in the past, so the check "is this syscall present?" may need to be more complex than "is the kernel version high enough?".

As you already have support for syscall.tbl on the x86 variants, that can be extended to other architectures as a start. It will not cover all system calls, but you will get data for many of them.

I used these scripts for a quick check with my syscalls-table project:

```bash
#!/bin/bash
KERNELDIR=~/devel/sources/linux/
for kernel_version in 3.{0..19} 4.{0..20} 5.{0..19} 6.{0..13}
do
    echo $kernel_version
    (cd $KERNELDIR; git checkout v${kernel_version})
    bash scripts/update-tables.sh $KERNELDIR
    pip install .
    python examples/tables-to-yaml.py $kernel_version
    cp -r data/tables data/tables-${kernel_version}
    cp syscalls.yml syscalls-${kernel_version}.yml
done
```

and examples/tables-to-yaml.py:

```python
#!/usr/bin/python3
import sys

import system_calls
import yaml

kernel_version = ""
if len(sys.argv) > 1:
    kernel_version = sys.argv[1]

syscalls = system_calls.syscalls()

with open("syscalls.yml", "r") as yf:
    yml = yaml.safe_load(yf)

for syscall_name in yml["syscalls"]:
    # Record the first kernel version this syscall name was seen in
    if not yml["syscalls"][syscall_name]["from"]:
        yml["syscalls"][syscall_name]["from"] = kernel_version
    for arch in syscalls.archs():
        try:
            number = syscalls.get(syscall_name, arch)
        except system_calls.NotSupportedSystemCall:
            number = ""  # not available on this architecture
        yml["syscalls"][syscall_name]["archs"][arch]["number"] = number
        # Compare against "" so that syscall number 0 still counts as present
        if number != "" and not yml["syscalls"][syscall_name]["archs"][arch]["from"]:
            yml["syscalls"][syscall_name]["archs"][arch]["from"] = kernel_version

with open("syscalls.yml", "w") as yf:
    yaml.dump(yml, yf)
```

I haven't checked the result for correctness yet.
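The per-version snapshots produced by that loop are enough to work out when each syscall first appears on each architecture. A minimal Python sketch, assuming a simplified snapshot layout of {version: {arch: {syscall_name: number}}} (an assumption, not the actual syscalls-table file format); first_seen() and version_key() are hypothetical helpers:

```python
# Hypothetical sketch: find the earliest kernel version in which each
# syscall has a number on each architecture. The snapshot layout is an
# assumption made for illustration.

def version_key(version):
    """Turn a version such as "5.10" into (5, 10) so versions sort numerically."""
    return tuple(int(part) for part in version.split("."))


def first_seen(snapshots):
    """Return {(syscall_name, arch): earliest version with a number}."""
    result = {}
    for version in sorted(snapshots, key=version_key):
        for arch, table in snapshots[version].items():
            for name in table:
                # setdefault keeps the first (oldest) version we saw
                result.setdefault((name, arch), version)
    return result


snapshots = {
    "5.1": {"x86_64": {"read": 0, "pidfd_send_signal": 424}},
    "3.0": {"x86_64": {"read": 0}},
    "5.10": {"x86_64": {"read": 0, "pidfd_send_signal": 424,
                        "process_madvise": 440}},
}
print(first_seen(snapshots))
```

As noted above, an upstream "first seen" version does not prove that an older distribution or stable kernel lacks a syscall it may have backported.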
Promote the scmp_kver enumeration to the public header file, seccomp.h.in. Add enumerations for all kernel versions from 4.0 to 6.12. Signed-off-by: Tom Hromatka <[email protected]>
A placeholder, KV_UNDEF, was used for the kernel version in which each syscall was added on each architecture, but the C code defines this enum value as SCMP_KV_UNDEF. Find and replace all instances of KV_UNDEF with SCMP_KV_UNDEF. Signed-off-by: Tom Hromatka <[email protected]>
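The commit does not say how the find-and-replace was done. A plain substring replace would be risky, because it would also turn any existing SCMP_KV_UNDEF into SCMP_SCMP_KV_UNDEF. A minimal Python sketch that matches only the bare token with a word-boundary regex (fix_placeholders() is a hypothetical helper and the CSV line is illustrative):

```python
import re


def fix_placeholders(text):
    # \b does not match between "_" and "K", so an existing
    # SCMP_KV_UNDEF is left untouched; only the bare token is rewritten.
    return re.sub(r"\bKV_UNDEF\b", "SCMP_KV_UNDEF", text)


line = "accept4,364,KV_UNDEF,288,SCMP_KV_UNDEF"
print(fix_placeholders(line))
```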
The branch was updated from 78a0a6e to 9db8052.
Moved the discussion list to the v3 comment. Here's a side-by-side diff between v1 and v2 of this patchset's syscalls.csv.
Isn't it the case that "kernel-wide" new system calls are added at the end of the table, while ones that are only "new on this architecture" are added where they were supposed to be? I remember system calls that were added on a subset of architectures in kernel X (and got the highest number), and then kernels X+1 and X+2 added them for the other architectures. If any new "kernel-wide" system calls were added in the meantime, it looked as if some were added in the middle of the table.
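If numbering follows the pattern described here, one architecture's table can contain a lower-numbered syscall that was added in a later kernel than a higher-numbered one. A small Python sketch that flags such pairs; the syscall names and versions are invented placeholders, not real kernel data, and out_of_order() is a hypothetical helper:

```python
# Hypothetical sketch: for one architecture, report (lower, higher) pairs
# where the lower-numbered syscall was added in a LATER kernel than the
# higher-numbered one. Any such pair breaks a "number < N means older
# than version V" assumption.

def version_key(version):
    return tuple(int(part) for part in version.split("."))


def out_of_order(table):
    """table: {syscall_name: (number, version_added)}."""
    problems = []
    for low_name, (low_num, low_ver) in table.items():
        for high_name, (high_num, high_ver) in table.items():
            if low_num < high_num and version_key(low_ver) > version_key(high_ver):
                problems.append((low_name, high_name))
    return sorted(problems)


table = {
    "sys_a": (400, "5.0"),
    "sys_b": (401, "5.3"),  # wired up late on this arch, at its "expected" slot
    "sys_c": (402, "5.1"),
}
print(out_of_order(table))
```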
Please note that "afs_syscall, break, fattach, fdetach, ftime, getmsg, getpmsg, gtty, isastream, lock, madvise1, mpx, prof, profil, putmsg, putpmsg, security, stty, tuxcall, ulimit, vserver" are officially unimplemented system calls. My syscalls-table has them on an ignore list, which may explain some of the differences in your diff. The problem with x32 is that you need the x32 headers installed on the system to handle it properly; otherwise you get the x86-64 numbers. The GitHub Action that updates the syscalls-table data has an extra step to make sure those headers are present.
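Dropping that list before diffing two tables avoids spurious differences. A minimal Python sketch, assuming each table is a plain dict mapping syscall name to number (the sample uses i386 numbers); the list itself is copied from the comment above, and drop_unimplemented() is a hypothetical helper:

```python
# The officially unimplemented syscalls listed above.
UNIMPLEMENTED = frozenset({
    "afs_syscall", "break", "fattach", "fdetach", "ftime", "getmsg",
    "getpmsg", "gtty", "isastream", "lock", "madvise1", "mpx", "prof",
    "profil", "putmsg", "putpmsg", "security", "stty", "tuxcall",
    "ulimit", "vserver",
})


def drop_unimplemented(table):
    """Return a copy of {syscall_name: number} without unimplemented entries."""
    return {name: num for name, num in table.items() if name not in UNIMPLEMENTED}


sample = {"read": 3, "break": 17, "stty": 31, "write": 4}
print(drop_unimplemented(sample))
```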
Posted about it on Mastodon, as other people may find it useful too: https://society.oftrolls.com/@hrw/114030254556485861
Yes, that was my recollection as well, but I wanted data to back it up. I expect this model to continue going forward. For libseccomp I think that means that we can't rely on a "less than" rule for unknown syscalls. We'll either need an explicit rule for each syscall or a series of ranges. Thanks for the verification, @hrw
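One way to picture the "series of ranges" option: collapse the syscall numbers that should get the unknown action into contiguous ranges, so the filter compares against a few ranges instead of carrying one rule per syscall. A minimal Python sketch; the numbers are illustrative and to_ranges() is a hypothetical helper, not libseccomp code:

```python
def to_ranges(numbers):
    """Collapse syscall numbers into sorted, inclusive (low, high) ranges."""
    ranges = []
    for n in sorted(set(numbers)):
        if ranges and n == ranges[-1][1] + 1:
            ranges[-1][1] = n  # extend the current run
        else:
            ranges.append([n, n])  # start a new run
    return [tuple(r) for r in ranges]


print(to_ranges([440, 441, 442, 335, 436, 437]))
```

Whether ranges actually yield a smaller BPF program than per-syscall rules depends on how the numbers cluster on each architecture; the holes discussed above would split them into more ranges.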
https://gpages.juszkiewicz.com.pl/syscalls-table/syscalls.html lets you disable and reorder columns, which can be handy when you want to compare numbers between architectures. I recommend sorting by the arm64 or riscv64 column to see how new system calls appear on each architecture. Note that every column from "avr32" rightwards covers architectures that no longer exist in the current Linux kernel; they are kept for historical purposes.
Add a tool to populate the syscalls.csv table. It parses the data output by the syscalls-table [1] tool. The following script was used to build the directories and files with the relevant syscall data:

```bash
#!/bin/bash
KERNELDIR=~/devel/sources/linux/
for kernel_version in 3.{0..19} 4.{0..20} 5.{0..19} 6.{0..13}
do
    echo $kernel_version
    (cd $KERNELDIR; git checkout v${kernel_version})
    bash scripts/update-tables.sh $KERNELDIR
    pip install .
    python examples/tables-to-yaml.py $kernel_version
    cp -r data/tables data/tables-${kernel_version}
    cp syscalls.yml syscalls-${kernel_version}.yml
done
```

Note that the inlined script above takes quite a bit of time to run :)

[1] https://github.com/hrw/syscalls-table

Signed-off-by: Tom Hromatka <[email protected]>
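A hypothetical sketch of the last step such a tool performs: turning per-architecture data into one CSV row with a number column and a kernel-version column per architecture. The column layout, the arch names, and the "PNR" not-present marker are assumptions; the SCMP_KV_UNDEF placeholder and the SCMP_KV_<major>_<minor> naming (e.g. SCMP_KV_6_5) come from this patchset.

```python
import csv
import io


def kver_enum(version):
    """Map "6.5" to SCMP_KV_6_5 and an empty version to SCMP_KV_UNDEF."""
    if not version:
        return "SCMP_KV_UNDEF"
    return "SCMP_KV_" + version.replace(".", "_")


def csv_row(name, archs, data):
    """data: {arch: {"number": int or "", "from": version or ""}} (assumed)."""
    row = [name]
    for arch in archs:
        entry = data.get(arch, {"number": "", "from": ""})
        # Compare with "" so that syscall number 0 is kept, not treated as missing
        row.append(entry["number"] if entry["number"] != "" else "PNR")
        row.append(kver_enum(entry["from"]))
    return row


archs = ["x86_64", "aarch64"]
data = {"x86_64": {"number": 500, "from": "6.5"}}  # invented example values
buf = io.StringIO()
csv.writer(buf).writerow(csv_row("example", archs, data))
print(buf.getvalue().strip())
```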
Using the script from the previous commit, populate the syscalls.csv table for all architectures. Signed-off-by: Tom Hromatka <[email protected]>
Add a tool, scmp_get_max_syscall_num.py, that can calculate the largest current syscall number. As of this commit, the largest syscall number is 547 via pwritev2() in the x32 architecture. Signed-off-by: Tom Hromatka <[email protected]>
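The commit does not show how scmp_get_max_syscall_num.py works. A hypothetical sketch of the same calculation over a small inline table, assuming alternating number and kernel-version columns per architecture (the real syscalls.csv layout may differ, and the sample rows are illustrative):

```python
import csv
import io

SAMPLE = """\
#syscall,x86_64,x86_64_kver,x32,x32_kver
read,0,SCMP_KV_UNDEF,0,SCMP_KV_UNDEF
pwritev2,328,SCMP_KV_4_6,547,SCMP_KV_4_6
"""


def max_syscall(text):
    """Return (number, syscall, arch) for the largest syscall number."""
    rows = csv.reader(io.StringIO(text))
    header = next(rows)
    best = (-1, None, None)
    for row in rows:
        # Number columns are assumed to be every other column from index 1
        for col in range(1, len(header), 2):
            value = row[col]
            if value.isdigit() and int(value) > best[0]:
                best = (int(value), row[0], header[col])
    return best


print(max_syscall(SAMPLE))
```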
Add two new filter attributes, SCMP_FLTATR_ACT_ENOSYS and SCMP_FLTATR_CTL_KVERMAX. When SCMP_FLTATR_CTL_KVERMAX is set, libseccomp will handle syscalls as follows:
* syscalls with explicit actions set by the user will behave as before
* syscalls that are not explicitly called out by the user's filter but are valid for the specified kernel version will return the default filter action (SCMP_FLTATR_ACT_DEFAULT)
* syscalls that are newer than the specified kernel version will return the unknown filter action (SCMP_FLTATR_ACT_ENOSYS)

Note that setting SCMP_FLTATR_CTL_KVERMAX can result in large seccomp BPF filters. It's recommended to also enable the binary tree optimization (SCMP_FLTATR_CTL_OPTIMIZE = 2) to speed up filter traversal in the kernel.

Signed-off-by: Tom Hromatka <[email protected]>
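The three cases above can be modeled in a few lines. This is a toy Python model, not libseccomp code: the action strings, the single added_in map (the real data is per architecture), and resolve() are all hypothetical.

```python
def version_key(version):
    """Compare versions as integer tuples, e.g. "2.6.16" -> (2, 6, 16)."""
    return tuple(int(part) for part in version.split("."))


def resolve(syscall, rules, added_in, kver_max,
            default_action="DEFAULT", unknown_action="ENOSYS"):
    # 1. An explicit user rule always wins
    if syscall in rules:
        return rules[syscall]
    # 2. Syscalls that existed as of the maximum version get the default action
    if version_key(added_in[syscall]) <= version_key(kver_max):
        return default_action
    # 3. Anything newer gets the unknown action
    return unknown_action


rules = {"openat": "ALLOW"}
added_in = {"openat": "2.6.16", "clone3": "5.3", "openat2": "5.6"}
for name in ("openat", "clone3", "openat2"):
    print(name, resolve(name, rules, added_in, "5.4"))
```

Versions are compared as tuples because comparing them as strings would rank "6.10" below "6.9".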
Add support for an application to specify the maximum kernel version it currently supports. Any syscalls that were added in a kernel version newer than this specified version will return the unknown action. The unknown action defaults to returning ENOSYS, but it can be overridden via the filter attribute SCMP_FLTATR_ACT_ENOSYS.

When the maximum supported kernel version is enabled, libseccomp will create a filter as follows:
* Users explicitly declare rules for syscalls. No changes here from previous behavior.
* The default action provided via seccomp_init() will still be used for all syscalls that existed as of the user-specified supported kernel.
* Any syscalls that did not exist at the time of the user-specified supported kernel will return the unknown action. By default libseccomp sets this to return ENOSYS, but it can be overridden via the filter attribute SCMP_FLTATR_ACT_ENOSYS.

Below is a rough pseudo-code outline of a typical usage of this feature:

```
seccomp_init()
seccomp_rule_add()   (once per rule)
(optional but recommended) seccomp_attr_set( binary tree )
seccomp_attr_set( max supported kernel version, e.g. SCMP_KV_6_5 )
(optional) seccomp_attr_set( default unknown action )
seccomp_load()
seccomp_release()
```

Fixes: seccomp#11
Signed-off-by: Tom Hromatka <[email protected]>
Add a test, 63-sim-kernel_version.[c|py], to test the kernel version logic. Signed-off-by: Tom Hromatka <[email protected]>
Add documentation for SCMP_FLTATR_ACT_ENOSYS and SCMP_FLTATR_CTL_KVERMAX. Signed-off-by: Tom Hromatka <[email protected]>
The branch was updated from 9db8052 to 425defc.
Changes for v3:
Discussion
As I wrote above, afs_syscall() and a bunch of others are listed in the kernel's system call tables but are not implemented. My code ignores them.
Ack. That's on my todo list :)
This patchset proposes to solve issue #11 - RFE: support "maximum kernel version".
Significant changes in this patchset:
* SCMP_FLTATR_ACT_ENOSYS and SCMP_FLTATR_CTL_KVERMAX, for managing the maximum supported kernel version and what to do with syscalls that are newer than that version
* DEFAULT action. (See the discussion below for more info.)

Fixes: #11
CC: @kolyshkin @cyphar
Finally, I am hoping to discuss this issue at Linux Security Summit 2025 in Denver, Colorado, USA on June 26th and 27th. I would love to get community feedback about the problem, the proposed solution, etc.