@hpk42
Contributor

@hpk42 hpk42 commented Jan 10, 2026

Found this fix in omidz4t's fork:
main...omidz4t:relay-ir:main

@hpk42
Contributor Author

hpk42 commented Jan 10, 2026

Hm, CI fails. It seems postfix doesn't actually have a systemd service unit, or something like that?
My manual deploy to c2.testrun.org worked, however, and did not give this error.

@link2xt
Contributor

link2xt commented Jan 10, 2026

There is a postfix systemd unit at /lib/systemd/system/postfix.service, this path is displayed with systemctl status postfix.

@link2xt
Contributor

link2xt commented Jan 10, 2026

The error in CI log:

--> Starting operation: Start and enable Postfix 
    [staging2.testrun.org] Failed to restart postfix.service: Unit postfix.service has a bad unit file setting.
    [staging2.testrun.org] See system logs and 'systemctl status postfix.service' for details.
    [staging2.testrun.org] Error: executed 1 commands

@j-g00da
Collaborator

j-g00da commented Jan 12, 2026

More context:

Service has Restart= set to either always or on-success, which isn't allowed for Type=oneshot services. Refusing.
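
The rule behind this message can be checked before deploying. Below is a minimal sketch, not the deployer's actual code: the helper names `service_settings` and `restart_is_rejected` are hypothetical, and the parser only handles plain `Key=Value` lines as printed by `systemctl cat`, with later fragments (such as drop-ins) overriding earlier ones.

```python
def service_settings(unit_text: str) -> dict[str, str]:
    """Collect Key=Value pairs from all [Service] sections; later lines win."""
    settings: dict[str, str] = {}
    section = None
    for raw in unit_text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")):
            continue  # skip blanks and comments (e.g. the "# /path" headers)
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1]
        elif section == "Service" and "=" in line:
            key, _, value = line.partition("=")
            settings[key.strip()] = value.strip()
    return settings


def restart_is_rejected(settings: dict[str, str]) -> bool:
    """True for the combination named in the log: oneshot + always/on-success."""
    return (
        settings.get("Type") == "oneshot"
        and settings.get("Restart") in {"always", "on-success"}
    )
```

Feeding it a oneshot unit followed by a drop-in that sets `Restart=always` reports the conflict, while the same drop-in on a `Type=forking` unit does not.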

@adbenitez
Contributor

Btw, does dovecot already have a similar restarting service? It happened to me in the past that dovecot got killed on a small VPS during a peak/bottleneck while I was sleeping, and the service stayed down until I noticed. That meant some hours of unnecessary downtime, which could have been avoided if it had just restarted.

@missytake
Contributor

> Btw, does dovecot already have a similar restarting service? It happened to me in the past that dovecot got killed on a small VPS during a peak/bottleneck while I was sleeping, and the service stayed down until I noticed. That meant some hours of unnecessary downtime, which could have been avoided if it had just restarted.

Yes, dovecot has had it since #765, mostly because you reported that problem.

@missytake
Contributor

missytake commented Jan 13, 2026

> Hm, CI fails. It seems postfix doesn't actually have a systemd service unit, or something like that? My manual deploy to c2.testrun.org worked, however, and did not give this error.

postfix has a more complex systemd unit architecture, which is not easily transferable from dovecot.

postfix.service is a oneshot service that triggers many smaller processes and then exits. That's why its working state is active (exited), while dovecot's is active (running). The main postfix unit will not really fail during operation like that, because it isn't actually running during operation. So restarting it on failure makes less sense.

We should remove the restarting logic from the postfix deployer, which anyway only tries to restart dovecot because of an earlier oversight, and then think about which problem we really want to solve with restarting postfix.
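
To see the difference between active (exited) and active (running) on a given host, the unit properties can be queried with `systemctl show`. This is a rough sketch: the function names `parse_show_output` and `unit_state` are hypothetical, and it assumes `systemctl` is on the PATH of the machine where it runs.

```python
import subprocess


def parse_show_output(output: str) -> dict[str, str]:
    """Turn the Key=Value lines printed by `systemctl show` into a dict."""
    props: dict[str, str] = {}
    for line in output.splitlines():
        key, sep, value = line.partition("=")
        if sep:  # ignore lines without "="
            props[key] = value
    return props


def unit_state(unit: str) -> dict[str, str]:
    """Query Type, ActiveState and SubState of a unit (needs systemd)."""
    result = subprocess.run(
        ["systemctl", "show", unit, "--property=Type,ActiveState,SubState"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_show_output(result.stdout)
```

Going by the description above, `unit_state("postfix.service")` would be expected to report a oneshot unit with `SubState` `exited`, whereas dovecot would report `running`; that is an expectation taken from this thread, not a verified result.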

@missytake missytake temporarily deployed to staging-ipv4.testrun.org January 13, 2026 22:41 — with GitHub Actions Inactive
@missytake missytake temporarily deployed to staging2.testrun.org January 13, 2026 22:41 — with GitHub Actions Inactive
@missytake
Contributor

Looks like this is the systemd unit we actually want to restart:

root@staging2:~# systemctl status postfix@-
● postfix@-.service - Postfix Mail Transport Agent (instance -)
     Loaded: loaded (/lib/systemd/system/postfix@.service; enabled-runtime; preset: enabled)
    Drop-In: /etc/systemd/system/postfix@.service.d
             └─10_restart.conf
     Active: active (running) since Tue 2026-01-13 22:52:40 UTC; 8min ago
       Docs: man:postfix(1)
      Tasks: 6 (limit: 4530)
     Memory: 8.7M
        CPU: 2.755s
     CGroup: /system.slice/system-postfix.slice/postfix@-.service
             ├─31184 /usr/lib/postfix/sbin/master -w
             ├─31185 pickup -l -t unix -u -c
             ├─31186 qmgr -l -t unix -u
             ├─31407 tlsmgr -l -t unix -u -c
             ├─32447 smtpd -n smtp -t inet -u -c -o stress= -s 2 -o smtpd_tls_security_level=encrypt -o "smtpd_tls_mandatory_protocols=>=TLSv1.2" -o smtpd_proxy_filter=127.0.0.1:10081
             └─32449 anvil -l -t unix -u -c

Jan 13 22:54:15 staging2 postfix/smtps/smtpd[31408]: disconnect from unknown[172.182.226.233] ehlo=1 auth=1 mail=62 rcpt=61/62 data=61 rset=1 commands=187/188
Jan 13 22:54:17 staging2 postfix/smtps/smtpd[31405]: disconnect from unknown[172.182.226.233] ehlo=1 auth=1 mail=1 rcpt=1 data=1 commands=5
Jan 13 22:57:37 staging2 postfix/anvil[31410]: statistics: max connection rate 51/60s for (smtps:172.182.226.233) at Jan 13 22:54:04
Jan 13 22:57:37 staging2 postfix/anvil[31410]: statistics: max connection count 12 for (smtps:172.182.226.233) at Jan 13 22:53:24
Jan 13 22:57:37 staging2 postfix/anvil[31410]: statistics: max cache size 3 at Jan 13 22:53:52
Jan 13 22:59:10 staging2 postfix/smtpd[32447]: connect from unknown[158.94.210.39]
Jan 13 22:59:10 staging2 postfix/smtpd[32447]: disconnect from unknown[158.94.210.39] ehlo=1 auth=0/1 rset=0/1 quit=1 commands=2/4
Jan 13 23:00:45 staging2 postfix/smtpd[32447]: connect from unknown[31.129.22.226]
Jan 13 23:00:45 staging2 postfix/smtpd[32447]: lost connection after EHLO from unknown[31.129.22.226]
Jan 13 23:00:45 staging2 postfix/smtpd[32447]: disconnect from unknown[31.129.22.226] ehlo=0/1 commands=0/1
root@staging2:~# systemctl cat postfix@-
# /lib/systemd/system/postfix@.service
[Unit]
Description=Postfix Mail Transport Agent (instance %i)
Documentation=man:postfix(1)
PartOf=postfix.service
Before=postfix.service
ReloadPropagatedFrom=postfix.service
After=network-online.target nss-lookup.target
Wants=network-online.target

[Service]
Type=forking
GuessMainPID=no
ExecStartPre=/usr/lib/postfix/configure-instance.sh %i
ExecStart=/usr/sbin/postmulti -i %i -p start
ExecStop=/usr/sbin/postmulti -i %i -p stop
ExecReload=/usr/sbin/postmulti -i %i -p reload

[Install]
WantedBy=multi-user.target

# /etc/systemd/system/postfix@.service.d/10_restart.conf
[Service]
Restart=always
RestartSec=30

Now it should work. Can a fresh pair of eyes look over it?
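
For anyone reviewing, the drop-in shown in the `systemctl cat` output could be produced along these lines. This is only an illustrative sketch and not the code in this PR: `dropin_path` and `render_restart_dropin` are hypothetical helpers, and the defaults simply mirror the `10_restart.conf` contents printed above.

```python
from pathlib import PurePosixPath


def dropin_path(unit: str, name: str = "10_restart.conf") -> PurePosixPath:
    """Location of a drop-in file for `unit` under /etc/systemd/system."""
    return PurePosixPath("/etc/systemd/system") / f"{unit}.d" / name


def render_restart_dropin(restart: str = "always", restart_sec: int = 30) -> str:
    """Contents of a drop-in that only overrides the restart behaviour."""
    return f"[Service]\nRestart={restart}\nRestartSec={restart_sec}\n"


# The drop-in targets the template unit (postfix@.service), which is
# Type=forking, rather than the oneshot postfix.service.
print(dropin_path("postfix@.service"))
print(render_restart_dropin(), end="")
```

After such a file is written, systemd needs a `systemctl daemon-reload` before the new setting takes effect.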

Contributor

@ccclxxiii ccclxxiii left a comment

@missytake I pulled the branch and went through the smoke test; no regressions seen so far.
