Systemd units, timers and podman as a cron replacement
I was building a server to replace cron jobs, using systemd user units, systemd user timer units and podman containers started from scripts.
To have units and timers activated automatically after boot, it was necessary to run the following as root:
loginctl enable-linger testme
To let users check the output of their jobs with journalctl, I added them to the systemd-journal group.
The plan was to use a mailto systemd service to mail the developers whenever a timer-triggered job failed, including the status of the service and the last lines of journal output that come with systemctl status.
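A minimal sketch of what such a mailto helper could look like, assuming a mailx-style mail command is installed. The function names, the recipient argument and the subject line are hypothetical; only the "Show group memberships" header, id and systemctl status are taken from the output shown later in this post.

```shell
#!/bin/sh
# Hypothetical mailto helper (a sketch, not the script used in this post).

# Print the report body: group memberships plus the unit status,
# which already includes the last journal lines for the unit.
build_report() {
    unit="$1"
    echo "Show group memberships"
    id
    # systemctl status exits non-zero for failed units, so ignore the exit code.
    systemctl --user status --no-pager "$unit" 2>&1 || true
}

# Mail the report. Assumes a mailx-compatible "mail" command (an assumption).
send_report() {
    unit="$1"
    rcpt="$2"
    build_report "$unit" | mail -s "Unit $unit failed on $(uname -n)" "$rcpt"
}
```

A call such as send_report testme.service dev@example.org (address made up) would then be what the failure handler runs.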
The problem
Checking the journalctl output as the user, first without and then with membership of the systemd-journal group:
[testme@somehostname ~]$ journalctl --user-unit testme
Hint: You are currently not seeing messages from the system.
Users in groups 'adm', 'systemd-journal', 'wheel' can see all messages.
Pass -q to turn off this notice.
Oct 07 22:47:19 somehostname systemd[37628]: Started test me.
Oct 07 22:47:19 somehostname systemd[37628]: testme.service: Main process exited, code=exited, status=1/FAILURE
Oct 07 22:47:19 somehostname systemd[37628]: testme.service: Failed with result 'exit-code'.
[testme@somehostname ~]$
logout
[root@somehostname ~]# usermod -aG systemd-journal testme
[root@somehostname ~]# su - testme
[testme@somehostname ~]$ journalctl --user-unit testme
Oct 07 22:47:19 somehostname systemd[37628]: Started test me.
Oct 07 22:47:19 somehostname systemd[37628]: testme.service: Main process exited, code=exited, status=1/FAILURE
Oct 07 22:47:19 somehostname systemd[37628]: testme.service: Failed with result 'exit-code'.
Once the user was in the group, this worked perfectly fine from a fresh login shell.
It did not work, however, when journalctl or systemctl status was run in a script started by systemd, i.e. by the mailto service triggered through OnFailure.
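For context, the wiring between the job, its timer and the mailto service might look roughly like the following. This is a sketch, not the actual configuration: the timer schedule, the mailto@.service template name and the mailto.sh path are assumptions; only testme.service, its description and the location of testme.sh come from the output in this post.

```shell
#!/bin/sh
# Write hypothetical example units into the directory given as $1.
write_units() {
    dir="$1"
    mkdir -p "$dir" || return 1

    # The job itself. OnFailure= starts the mail unit when the job fails;
    # %n expands to the full unit name (testme.service).
    cat > "$dir/testme.service" <<'EOF'
[Unit]
Description=test me
OnFailure=mailto@%n.service

[Service]
ExecStart=%h/bin/testme.sh
EOF

    # Template unit for the failure mail; %i is the name of the failed unit.
    # The script path is an assumption.
    cat > "$dir/mailto@.service" <<'EOF'
[Unit]
Description=Mail status of %i

[Service]
Type=oneshot
ExecStart=%h/bin/mailto.sh %i
EOF

    # The timer that replaces the crontab entry; the schedule is an assumption.
    cat > "$dir/testme.timer" <<'EOF'
[Unit]
Description=Run test me on a schedule

[Timer]
OnCalendar=daily
Persistent=true

[Install]
WantedBy=timers.target
EOF
}
```

After write_units "$HOME/.config/systemd/user", running systemctl --user daemon-reload followed by systemctl --user enable --now testme.timer would activate the timer.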
Diagnosis
After some head scratching, I decided to check the user's group memberships when the mailto service was executed from a failed systemd user unit.
Logged in directly as the user:
Show group memberships
uid=1002(testme) gid=1002(testme) groups=1002(testme),983(systemd-journal)
× testme.service - test me
Loaded: loaded (/home/testme/.config/systemd/user/testme.service; disabled; preset: enabled)
Active: failed (Result: exit-code) since Sat 2023-10-07 22:47:19 CEST; 22h ago
Duration: 36ms
Process: 37965 ExecStart=/home/testme/bin/testme.sh (code=exited, status=1/FAILURE)
Main PID: 37965 (code=exited, status=1/FAILURE)
CPU: 17ms
Oct 07 22:47:19 somehostname systemd[37628]: Started test me.
Oct 07 22:47:19 somehostname systemd[37628]: testme.service: Main process exited, code=exited, status=1/FAILURE
Oct 07 22:47:19 somehostname systemd[37628]: testme.service: Failed with result 'exit-code'.
Running the mailto script from systemd:
Show group memberships
uid=1002(testme) gid=1002(testme) groups=1002(testme)
× testme.service - test me
Loaded: loaded (/home/testme/.config/systemd/user/testme.service; disabled; preset: enabled)
Active: failed (Result: exit-code) since Sun 2023-10-08 21:43:13 CEST; 37ms ago
Duration: 16ms
Process: 115072 ExecStart=/home/testme/bin/testme.sh (code=exited, status=1/FAILURE)
Main PID: 115072 (code=exited, status=1/FAILURE)
CPU: 14ms
Oct 08 21:43:13 somehostname systemd[37628]: Started test me.
Oct 08 21:43:13 somehostname systemd[37628]: testme.service: Main process exited, code=exited, status=1/FAILURE
Oct 08 21:43:13 somehostname systemd[37628]: testme.service: Failed with result 'exit-code'.
Oct 08 21:43:13 somehostname systemd[37628]: testme.service: Triggering OnFailure= dependencies.
Notice the missing systemd-journal group membership in the second output.
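One way to confirm this kind of mismatch directly is to compare the groups of the running per-user systemd process with what the group database says. The following is a sketch assuming Linux's /proc layout; both function names are made up for illustration.

```shell
#!/bin/sh
# Print the group IDs listed in $2 (wanted) that are absent from $1 (actual).
# Both arguments are whitespace-separated lists of numeric group IDs.
missing_groups() {
    # Normalise whitespace so every ID is surrounded by single spaces.
    actual=" $(printf '%s ' $1)"
    for gid in $2; do
        case "$actual" in
            *" $gid "*) ;;                   # present, nothing to report
            *) printf '%s\n' "$gid" ;;       # missing from the process
        esac
    done
}

# Print the group list of a running process (Linux only, read from /proc).
process_groups() {
    sed -n 's/^Groups:[[:space:]]*//p' "/proc/$1/status"
}

# Example, run as root and assuming exactly one user manager for testme:
#   pid=$(pgrep -u testme -x systemd)
#   missing_groups "$(process_groups "$pid")" "$(id -G testme)"
# Any ID printed is a group the user manager has not picked up yet.
```

The comparison assumes that the Groups line in /proc also contains the primary group, which is usually the case for processes started through a normal login path.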
The Fix
The fix was rather simple in hindsight. Since loginctl enable-linger had been issued for the user, the user had been continuously logged in from systemd's perspective, and hence the new systemd-journal group membership had never been picked up by that session.
Disabling and re-enabling linger with loginctl fixed the problem:
[root@somehostname ~]# loginctl disable-linger testme
[root@somehostname ~]# loginctl enable-linger testme
[root@somehostname ~]# su - testme
[testme@somehostname ~]$ export XDG_RUNTIME_DIR=/run/user/1002
[testme@somehostname ~]$ systemctl --user start testme
[testme@somehostname ~]$ cat yo.out
Show group memberships
uid=1002(testme) gid=1002(testme) groups=1002(testme),983(systemd-journal)
× testme.service - test me
Loaded: loaded (/home/testme/.config/systemd/user/testme.service; disabled; preset: enabled)
Active: failed (Result: exit-code) since Sun 2023-10-08 21:52:31 CEST; 38ms ago
Duration: 18ms
Process: 115614 ExecStart=/home/testme/bin/testme.sh (code=exited, status=1/FAILURE)
Main PID: 115614 (code=exited, status=1/FAILURE)
CPU: 16ms
Oct 08 21:52:31 somehostname systemd[115572]: Started test me.
Oct 08 21:52:31 somehostname systemd[115572]: testme.service: Main process exited, code=exited, status=1/FAILURE
Oct 08 21:52:31 somehostname systemd[115572]: testme.service: Failed with result 'exit-code'.
Oct 08 21:52:31 somehostname systemd[115572]: testme.service: Triggering OnFailure= dependencies.
This basically logged the user out and in again. The new systemd PID in the log (115572 instead of 37628) shows that the user manager was restarted.
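A side note on the export in the transcript: after su, XDG_RUNTIME_DIR is often not set for the target user, which is presumably why it was set by hand before calling systemctl --user. A small sketch that derives the path instead of hard-coding the UID; the function name is hypothetical, and /run/user/UID is the usual systemd default location rather than something guaranteed.

```shell
#!/bin/sh
# Print the conventional runtime directory for the given user name.
runtime_dir_for() {
    uid="$(id -u "$1")" || return 1
    printf '/run/user/%s\n' "$uid"
}

# Example, as the target user after "su - testme":
#   export XDG_RUNTIME_DIR="$(runtime_dir_for "$(id -un)")"
#   systemctl --user start testme
```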