There is a major outage at our FRA3 site. We're investigating.
Recovery is done; systems are up and running.
The power to the rack has been restored; we are now in the process of bringing the systems back up.
One of our racks at FRA3 has lost power. We are in contact with the on-site technicians of the data centre provider to figure out how to restore it.
Mail to Microsoft accounts is being rejected; we are not given a reason, and our request to unblock our mail gateways has not been responded to. We are switching mail gateways where we can, but unfortunately the block seems to take effect on each changed gateway after a few hours.
Google Mail (googlemail.com; gmail.com) no longer accepts mail when the sender domain does not have valid SPF and/or DKIM set up. To set up SPF and/or DKIM for your domain hosted at Uberspace.de, please refer to our manual.
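For illustration, both SPF and DKIM are published as DNS TXT records on your domain. A minimal sketch for a hypothetical domain example.com (the include target, the DKIM selector "default", and the truncated key are placeholders; take the actual values from our manual and from your DKIM setup):

    example.com.                     TXT  "v=spf1 include:spf.example.net ~all"
    default._domainkey.example.com.  TXT  "v=DKIM1; k=rsa; p=MIIBIjANBgkq..."

The SPF record declares which servers may send mail for the domain; the DKIM record publishes the public key that receiving servers use to verify message signatures.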
Currently, we cannot deliver mail to any t-online servers.
We can connect to t-online's MX servers again.
While we have no detailed information at this time, we suspect an error in t-online's infrastructure, as reported elsewhere on the net over the last hour. We are continuing to investigate.
See, for example: https://www.ifun.de/telekom-kunden-berichten-von-flaechendeckenden-e-mail-stoerungen-238410/
Since the end of last week we have been experiencing increasing problems with mail delivery to certain destinations from all uberspace.de hosts.
The exact cause is not yet known, but we are investigating and working on workarounds.
Mail delivery has been rerouted from the individual hosts to our mail gateway; delivery problems should now be resolved.
We are checking our logs on a daily basis to identify affected domains and set up workarounds for them on the next business day. So if you get a bounce today for a domain you haven't sent mail to within the last two weeks, please try again the next day; we'll have that domain on our list by then.
Since we still haven’t received any helpful response from the reputation service provider, we have now implemented a permanent workaround for all domains we know of whose mail servers sit behind IronPort.
After a brief recovery, a large proportion of our hosts are now blocked again. While we continue to wait for answers as to why, we have now set up smtproutes via working hosts for all blocked hosts.
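For those interested in the mechanics: smtproutes is qmail's static routing table, mapping a destination domain to a relay host, one per line. A sketch with made-up names (blocked.example is an affected destination, gate1.example.net one of the hosts that is still accepted); assuming a stock qmail layout, the file lives at /var/qmail/control/smtproutes:

    blocked.example:gate1.example.net

Mail for blocked.example is then handed to gate1.example.net instead of being delivered directly from the blocked host.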
Apparently, a major provider of e-mail security solutions had classified us as “poor” (contrary to its publicly available reputation data and its initial responses to our inquiries) and blocked us on its IronPort appliances.
Unfortunately, despite several attempts, we still have no information about which patterns might have been responsible, but at least the block seems to have been lifted. Nevertheless, we will continue to monitor this closely.
While ananke itself is working fine, its MariaDB instance seems to hang. We're checking things out.
Recovery is done; the system is up and running and MariaDB works fine again. Since the MariaDB process didn't even react to a SIGKILL and completely blocked even a clean shutdown, we had to hard-reset ananke. Afterwards, MariaDB started up fine; its tables are undergoing a consistency check. There is no indication of circumstances that could have led to this situation, so we suspect a rare bug in MariaDB itself. No other hosts were affected, and the underlying storage system is completely healthy with no errors.
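For reference, the consistency check we are running boils down to mysqlcheck over all databases; a sketch, assuming credentials are picked up from the usual option files:

    mysqlcheck --all-databases --check

Independently of that, InnoDB performs standard crash recovery (replaying its redo log) on the first start after a hard reset.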
The system needs to be rebooted and will reappear in a few minutes.
A planned routing failover had unexpected side effects. We are already debugging; stable operation should be restored in a moment.
We noticed a network outage at our FRA4 datacentre. Based on information from our provider, this was reportedly caused by a power outage at a major carrier.
Our datacenter immediately switched to backup connections, but it took a few minutes until the routing tables had been adapted accordingly. During this time, we observed partial connectivity that varied over time (some peerings recovered faster than others). The majority of connections were back at 12:06 GMT+2, three minutes after the incident began. A few peerings needed up to ten more minutes to reconnect cleanly. Between 12:21 GMT+2 and 12:23 GMT+2 there was another short outage due to work needed to stabilize the situation; it resolved quickly.
As MariaDB was unresponsive to signals, we decided to reboot the whole system. We will check data integrity afterwards.
One of our nodes crashed; we are rebooting the following servers:
We have enabled mitigations for the Terrapin SSH vulnerability. If you are experiencing trouble connecting to Uberspace hosts via SSH, please make sure your SSH client is up to date and supports more secure ciphers.
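For context, Terrapin (CVE-2023-48795) targets the chacha20-poly1305@openssh.com cipher and the CBC encrypt-then-MAC modes; OpenSSH 9.6 and later additionally negotiate a strict key-exchange countermeasure. If you cannot update your client right away, here is a client-side sketch for ~/.ssh/config that removes the affected algorithms from the default set (the host pattern is an example; the leading "-" means "subtract from the defaults" and requires OpenSSH 7.5 or newer):

    Host *.uberspace.de
        Ciphers -chacha20-poly1305@openssh.com
        MACs -*-etm@openssh.com

Updating to a current OpenSSH release remains the preferred fix, as it enables the strict key-exchange handshake on both sides.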