generate_vrts_aliases failing on mx-in1001
Open, Medium, Public

Description

Despite recent changes in T284145 ("Clean up OTRS/Znuny addresses handles by gsuite"), generate_vrts_aliases is still failing on mx-in1001 almost daily, at different times.

[2024-06-24 10:53:48] <jinxer-wm> FIRING: [2x] SystemdUnitFailed: generate_vrts_aliases.service on mx-in1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[2024-06-24 11:08:48] <jinxer-wm> FIRING: [2x] SystemdUnitFailed: generate_vrts_aliases.service on mx-in1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[2024-06-24 11:15:49] <jinxer-wm> RESOLVED: [2x] SystemdUnitFailed: generate_vrts_aliases.service on mx-in1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed

Event Timeline

jhathaway triaged this task as Medium priority. Mon, Jun 24, 2:45 PM

I was able to capture this traceback:

Traceback (most recent call last):
  File "/home/jhathaway/./vrts_aliases", line 162, in main
    if verify_email(row[0], config["DEFAULT"]["smtp_server"]):
       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jhathaway/./vrts_aliases", line 41, in verify_email
    smtp.connect(smtp_server)
  File "/usr/lib/python3.11/smtplib.py", line 343, in connect
    (code, msg) = self.getreply()
                  ^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/smtplib.py", line 405, in getreply
    raise SMTPServerDisconnected("Connection unexpectedly closed")
smtplib.SMTPServerDisconnected: Connection unexpectedly closed

I'm not sure why our connection is getting closed, or why this would only appear on mx-in hosts. Perhaps the bookworm upgrade? We may want to add some retry logic.
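
For discussion, here is a minimal sketch of what retry logic around the failing smtp.connect() call might look like. This is not the actual vrts_aliases code: the helper name connect_with_retry and its parameters (attempts, base_delay, smtp_factory, sleep) are hypothetical, and it assumes the disconnect is transient, so that a fresh connection after a short backoff is likely to succeed.

```python
import smtplib
import time


def connect_with_retry(smtp_server, attempts=3, base_delay=1.0,
                       smtp_factory=smtplib.SMTP, sleep=time.sleep):
    """Return a connected SMTP object, retrying on transient disconnects.

    Hypothetical helper: the name and parameters are illustrative and
    are not taken from the real vrts_aliases script.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")

    last_error = None
    for attempt in range(attempts):
        # Build a fresh, unconnected client for every attempt so a
        # half-open socket from a failed try is never reused.
        smtp = smtp_factory()
        try:
            smtp.connect(smtp_server)
            return smtp
        except (smtplib.SMTPServerDisconnected, ConnectionError,
                TimeoutError) as exc:
            last_error = exc
            # Exponential backoff between attempts: base_delay, then
            # 2 * base_delay, and so on. No sleep after the final try.
            if attempt < attempts - 1:
                sleep(base_delay * 2 ** attempt)

    # Every attempt failed: surface the last error so the systemd unit
    # still fails (and alerts) when the problem is persistent.
    raise last_error
```

verify_email could then call connect_with_retry(smtp_server) in place of creating the client and calling smtp.connect(smtp_server) directly. Re-raising after the last attempt is deliberate: a persistent outage should still trip SystemdUnitFailed, while a single dropped connection would not.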