Linux fah-client user in video/render groups, but no effect? #323

Open
marcosfrm opened this issue Jan 30, 2025 · 5 comments
@marcosfrm
Contributor

See:
https://foldingforum.org/viewtopic.php?p=367199#p367199

Maybe when fah-client.service starts, udev hasn't adjusted the device node permissions yet. DRM drivers are huge these days and take a while to load, especially amdgpu. One way to test is to add ExecStartPre=-/usr/bin/ls -l /dev/dri/ to fah-client.service and comment out StandardOutput=null. Use systemctl edit --full fah-client.service for that (and systemctl revert fah-client.service to undo). Pro tip: set your EDITOR environment variable so you don't get trapped in vim! haha. Then reboot the machine and check the output of journalctl -b -u fah-client.service. When the rules are applied on time, it'll look something like this:

Jan 30 13:16:59 fedora ls[942]: drwxr-xr-x. 2 root root         80 Jan 30 13:16 by-path
Jan 30 13:16:59 fedora ls[942]: crw-rw----. 1 root video  226,   0 Jan 30 13:16 card0
Jan 30 13:16:59 fedora ls[942]: crw-rw-rw-. 1 root render 226, 128 Jan 30 13:16 renderD128
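
A possible alternative to editing the full unit is a drop-in override, sketched below. It relies on ExecStartPre= lines in a drop-in being added to the unit's existing ones, and on StandardOutput=journal in the drop-in overriding the unit's StandardOutput=null. The helper name write_debug_dropin and the file name 10-debug-dri.conf are made up for illustration.

```shell
#!/bin/sh
# Sketch: write a diagnostic drop-in for fah-client.service.
# The helper name and the file name 10-debug-dri.conf are hypothetical.
write_debug_dropin() {
    # $1: drop-in directory (defaults to the usual location for unit overrides)
    dir="${1:-/etc/systemd/system/fah-client.service.d}"
    mkdir -p "$dir" || return 1
    cat > "$dir/10-debug-dri.conf" <<'EOF'
[Service]
# The leading "-" tells systemd to ignore a failure of ls
ExecStartPre=-/usr/bin/ls -l /dev/dri/
# Override StandardOutput=null so the listing reaches the journal
StandardOutput=journal
EOF
}

# Usage (as root):
#   write_debug_dropin && systemctl daemon-reload
#   reboot, then: journalctl -b -u fah-client.service
# Undo: delete the file and run systemctl daemon-reload again.
```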

If it's a real issue, is there a simple way to get the service synced up after udev runs its rules, without messing with libsystemd/libdrm?

@muziqaz
Contributor

muziqaz commented Jan 30, 2025

To give more context on this:
This affects AMD only. I even mentioned it to AMD, but got no reaction from the ROCm crowd.
I came to this conclusion by setting up eight or so different Linux distros to work out the easiest and most consistent way to get AMD GPUs working with OpenCL in FAH, since before this, getting an AMD GPU to fold on Linux was a complete joke and a lottery.
The distros I set up were:
Fedora (38 and 40-41),
Debian,
openSUSE (latest),
Kubuntu (24.04),
Manjaro (later EndeavourOS),
Pop!_OS,
CentOS (later nuked),
Linux Mint (21.3 and 22).
rocm-opencl-sdk was the most consistent package: it would get AMD GPUs up and running with the OpenCL platform (clinfo would print the platform and device successfully once it was installed).
However, FAHClient would still show my GPUs as not supported.

sudo usermod -a -G render muziqaz
sudo usermod -a -G video muziqaz

were the commands that would let FAHClient see AMD GPUs as supported. A reboot was required after running them. In most cases I could get away with adding my user to the render group only.
This was also confirmed by multiple other users across the internet who were having issues getting AMD GPUs to fold. They got things going after their users were added to the render and video groups.

The way I look at it, this is not a massive issue, since AMD users have to set up their systems manually to be able to fold anyway. Running a couple of extra commands in the console is not a big deal at all, so I never bothered anyone with it :)
At the moment, getting an AMD GPU to run FAH is a relatively painless procedure:
add the Radeon repo to the distro sources, install rocm-opencl-sdk, run the usermod commands, reboot and fold. A far cry from the dark-magic lottery that was required a few years back. Obviously nothing close to how Nvidia works, which only needs drivers installed, but that's AMD's issue, not FAH's. The incoming AMD HIP platform might complicate things a little, but we will see.

@marcosfrm
Contributor Author

I can't imagine how adding the machine's regular user to the video and render groups could help in any way. The FAH client process and its descendants, including the WUs, have no connection to that user. They run under the fah-client system user. Another thing, which doesn't seem to matter in this case: systemd-logind applies ACLs to the device nodes, granting access to the logged-in user while a local session exists:

$ getfacl /dev/dri/card0
getfacl: Removing leading '/' from absolute path names
# file: dev/dri/card0
# owner: root
# group: video
user::rw-
user:marcos:rw-
group::rw-
mask::rw-
other::---

That's why adding the user who will log in locally to such groups has not been necessary for a long time with systemd-logind. systemd must have been compiled with ACL support (+ACL in the output of systemctl --version); however, I think that's the default in all distributions.
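
As a rough sketch, the +ACL check could be scripted like this. It assumes the feature flags are on the second line of systemctl --version, as in current systemd releases; the helper name has_acl_support is hypothetical.

```shell
#!/bin/sh
# Sketch: report whether a systemd feature string advertises ACL support.
# The helper name is hypothetical.
has_acl_support() {
    # $1: feature line, e.g. "+PAM +AUDIT +SELINUX +ACL +BLKID"
    case " $1 " in
        *" +ACL "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Assumption: the feature flags are printed on the second line.
features=$(systemctl --version 2>/dev/null | sed -n '2p' || true)
if has_acl_support "$features"; then
    echo "ACL support: yes"
else
    echo "ACL support: no (or systemctl not available)"
fi
```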

From what I could infer, the operation is more or less like this:

  • FAH client will load libOpenCL.so
  • libOpenCL.so is a shim, which in turn will load the implementation of AMD, Intel or Nvidia
  • this implementation will do its job under the hood and eventually open /dev/dri/card<n> and/or /dev/dri/renderD<n> (this one is accessible to everyone) to talk to the GPU
  • all this will be done running as the fah-client user
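
To see what the list above means in practice, a small sketch like the following prints the access the current user has to each device node. It only checks permissions (including ACLs); it says nothing about whether the OpenCL implementation actually works. The helper name dri_access is hypothetical.

```shell
#!/bin/sh
# Sketch: print the read/write access the *current* user has to each
# character device in a directory. The helper name is hypothetical.
dri_access() {
    # $1: directory to scan (defaults to /dev/dri)
    dir="${1:-/dev/dri}"
    for node in "$dir"/*; do
        # Skip by-path/ and anything else that is not a device node
        [ -c "$node" ] || continue
        r=-; w=-
        [ -r "$node" ] && r=r
        [ -w "$node" ] && w=w
        echo "$node $r$w"
    done
}

# Usage: append a final "dri_access" call, save the script under any name
# (dri-access.sh here is just an example) and run it as the service user:
#   sudo -u fah-client sh ./dri-access.sh
```

Comparing the output for the fah-client user with the output for the logged-in user should show whether group membership or the logind ACL is what grants access.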

At the same time, we have:

  • the GPU driver is loaded on demand by udev when the kernel announces the device via uevent
  • this loading may not be instantaneous, particularly in the case of the amdgpu module, which is huge (it's a problem in other circumstances too: https://hansdegoede.dreamwidth.org/28552.html)
  • the module creates the device nodes (Nvidia, in some cases, depends on a SUID root binary for this; see "Disable NoNewPrivileges", #315)
  • udev processes the permission/owner adjustment on the nodes on demand, as they appear
  • there is no synchronization between the start of fah-client.service and the above sequence being finished -- the default DefaultDependencies=yes means that sysinit.target will be started first, which in turn only ensures that udevd was started, without guaranteeing any processing of kernel uevents

If you've got a machine where this always happens, it'd be interesting to edit fah-client.service, as I mentioned earlier, this time adding ExecStartPre=-/usr/bin/udevadm wait --timeout 10 /dev/dri/card0 to the [Service] section to see if it helps. This is intended to ensure that udev has fully processed the device. It is not a solution generic enough to cover all cases, such as more than one GPU: it is possible to specify multiple devices to wait for; however, if any of them does not exist, the command will block until the timeout (in seconds) expires. Besides that, the wait verb is relatively recent, available since systemd 251.
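
The experiment could be sketched as below: check that systemd is new enough for udevadm wait, then add the ExecStartPre= line through a drop-in instead of editing the unit itself. The helper names and the file name 20-wait-dri.conf are hypothetical, and card0 is only the example device from this thread.

```shell
#!/bin/sh
# Sketch: version check plus drop-in for the udevadm wait experiment.
# Helper names and the file name 20-wait-dri.conf are hypothetical.

systemd_major() {
    # $1: first line of `systemctl --version`, e.g. "systemd 252 (252.5-2.fc37)"
    echo "$1" | awk '{print $2}'
}

supports_udevadm_wait() {
    # The wait verb is available from systemd 251
    major=$(systemd_major "$1")
    [ "${major:-0}" -ge 251 ] 2>/dev/null
}

write_wait_dropin() {
    # $1: drop-in directory (defaults to the usual location for unit overrides)
    dir="${1:-/etc/systemd/system/fah-client.service.d}"
    mkdir -p "$dir" || return 1
    cat > "$dir/20-wait-dri.conf" <<'EOF'
[Service]
# Wait up to 10 seconds for udev to finish with card0; "-" ignores a failure
ExecStartPre=-/usr/bin/udevadm wait --timeout 10 /dev/dri/card0
EOF
}

# Usage (as root):
#   if supports_udevadm_wait "$(systemctl --version | head -n 1)"; then
#       write_wait_dropin && systemctl daemon-reload
#   fi
```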

@muziqaz
Contributor

muziqaz commented Jan 31, 2025

All the recent attempts were done on Debian-based systems, where I used the fahclient installer from the FAH website. Each of them (Mint 22, Mint 21.3 and Kubuntu) required the usermod magic.
I'll see if I can find an untouched OS and try it from scratch, to really confirm whether usermod is the catalyst.
I never questioned it because it used to solve issues with FAH, and it has been one of the suggestions for ages. But I will try with a clean system; even just removing my user from the render/video groups should be enough.

@muziqaz
Contributor

muziqaz commented Feb 1, 2025

So I tried an untouched Debian 12 system.
To my surprise, after I installed fahclient, then rocm-opencl-sdk, and then ran sudo systemctl restart fah-client, fahclient shows the GPU as supported and I can fold on it. fah-client is in the render group, while my username is not.

There might be a couple of explanations in my defense (and madness) :D
First, sudo usermod might have been a habit of mine from waaay back in the v7 days, and the pre-ROCm or early ROCm days, when sudo usermod was a must and one of many things required to get AMD folding on Linux, so it stayed in my guides as a precaution. And I used clinfo output as the sign that the OpenCL platform was fully functional on the system.
Second, it might be distro-dependent too, since I went through many distros with these guides, and many times when I forgot to run the usermod commands, things would not show as supported and would not fold. Fedora, Manjaro, probably even Kubuntu.
Since I encountered so many weird and random issues while trying to put together a common install guide, I just stuck with the usermod command as a requirement without double-checking whether anything had changed.
Because you questioned this, I can now move the usermod command to the "In case sh*t doesn't work" section instead of the required section. Checking which groups fah-client is in might be the priority now. To be honest, I knew fah-client acted as a separate user, but I didn't even realise that fah-client would add itself to the render group.
Either way, thanks for tagging this, I can now simplify my guides even more, until HIP hits the streets...
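
For anyone wanting to check which groups the fah-client user is in, a minimal sketch follows. It assumes the service user is named fah-client, as in this thread; the helper name in_group_list is hypothetical.

```shell
#!/bin/sh
# Sketch: test whether a group appears in a space-separated group list,
# such as the output of `id -nG fah-client`. The helper name is hypothetical.
in_group_list() {
    # $1: group list, $2: group name
    case " $1 " in
        *" $2 "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Empty if the fah-client user does not exist on this machine
fah_groups=$(id -nG fah-client 2>/dev/null || true)
for g in video render; do
    if in_group_list "$fah_groups" "$g"; then
        echo "fah-client is in $g"
    else
        echo "fah-client is NOT in $g"
    fi
done
```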

@marcosfrm
Contributor Author

Thanks for checking. In theory, the permission/owner adjustments on the device nodes might not be applied in time. I think it's unlikely, at least while the DRM drivers are loaded in the initramfs, which is pretty much universal, I believe.
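
Whether the DRM driver is actually in the initramfs can be checked with a sketch like this one. It assumes lsinitramfs (initramfs-tools, Debian/Ubuntu) or lsinitrd (dracut, Fedora and others) is available; the helper name listing_has_module is hypothetical, and amdgpu is just the example module.

```shell
#!/bin/sh
# Sketch: does an initramfs file listing (read from stdin) contain a given
# kernel module? The helper name is hypothetical.
listing_has_module() {
    # $1: module name, e.g. amdgpu; matches name.ko and compressed variants
    grep -Eq "/$1\.ko(\.[a-z0-9]+)?\$"
}

# Usage on Debian/Ubuntu (initramfs-tools):
#   lsinitramfs "/boot/initrd.img-$(uname -r)" | listing_has_module amdgpu \
#       && echo "amdgpu is in the initramfs"
# Usage on dracut-based systems such as Fedora (usually as root):
#   lsinitrd | listing_has_module amdgpu && echo "amdgpu is in the initramfs"
```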
