Archive for the ‘Downtime’ Category

Group share woes

Friday, December 8th, 2017

Update 20.12.: the strange intermittent permission problems some of you experienced could be traced back to a kernel regression. We're now back to using an older kernel.

Update 13.12.: we're cautiously optimistic that the problems have been fixed. Since Monday the file server has survived everything we threw at it. The culprit seems to be an Infiniband switch that sporadically disconnected under heavy load. We're now also turning on some performance improvements again, so you should see a speed increase when browsing files.

Update 06:45: group shares are back. Please let us know if you encounter any problems.

As some of you might have noticed, we've had some service quality issues with our group share server in the last few months. While not all interruptions are under our control (Informatikdienste have lately been very busy upgrading the ETH network, causing various network disruptions), we do have a problem with the group share server: it runs fine for weeks on end until it suddenly doesn't. To this day we have not been able to pinpoint the underlying problem, despite having changed a lot of parameters, both software and hardware. Our next step will be replacing the kernel on the disk backends and switching some hardware. For that we need a scheduled downtime on

Monday, December 11, starting at 06:00

during which the group shares will be unavailable for about 90 minutes. This affects all D-PHYS and IGP shares except the Astro and newly migrated IPA ones. We will post an update when the system is back.

We do apologize for the inconvenience these service issues might have caused you. Please bear with us while we try to locate and eliminate the root cause. We're monitoring the situation 24/7 and trying to react as quickly as possible whenever a problem occurs. But wait! You can help! There seems to be a correlation between crash probability and large-scale small-file I/O. This means you should, whenever possible, avoid reading or writing a lot of small files and instead bundle your data into fewer, larger files. This also increases performance!
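If you are unsure how to bundle small files, here is a minimal sketch of one possible approach, assuming Python 3 is available on your machine. It packs a directory tree into a single compressed tar archive using the standard library's tarfile module. The function names bundle_directory and list_bundle are our own, chosen for illustration only, and the paths in the usage note are hypothetical.

```python
import tarfile
from pathlib import Path


def bundle_directory(src_dir, archive_path):
    """Pack every regular file under src_dir into one gzip-compressed tar archive.

    Returns the number of files that were added.
    """
    src = Path(src_dir)
    archive = Path(archive_path)
    count = 0
    with tarfile.open(archive, "w:gz") as tar:
        for path in sorted(src.rglob("*")):
            # Skip directories, and skip the archive itself in case it
            # was created inside the directory being bundled.
            if not path.is_file() or path.resolve() == archive.resolve():
                continue
            # Store paths relative to src_dir so the archive unpacks cleanly.
            tar.add(path, arcname=str(path.relative_to(src)))
            count += 1
    return count


def list_bundle(archive_path):
    """Return the member names stored in the archive, without extracting them."""
    with tarfile.open(archive_path, "r:gz") as tar:
        return tar.getnames()
```

For example, bundle_directory("results", "results.tar.gz") would write a single file that you can then copy to your group share in place of the individual small files. Ideally, create the archive on local disk first and copy only the finished archive to the share.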

Server room migration on Wed, Aug 23

Tuesday, July 25th, 2017

Update Thursday 01:45: we hit some unexpected problems with the non-Astro group shares. Everything is back now; please let us know if you experience any problems.

Some months ago, we were informed by Informatikdienste that we would have to migrate our two water cooled racks in the HIT server room due to upcoming remodeling. This move will take place on

Wednesday, August 23, starting at 16:00

and last for several hours. During this time, all our IT services will be unavailable, including login, e-mail, storage and ISG-hosted websites. Incoming e-mail will be held back and delivered afterwards. We will do our best to have login and e-mail back up within the first two hours, but group drives will take a bit longer due to the sheer amount of hardware we have to move.
We apologize for any inconvenience. Unfortunately, this migration cannot be performed on a weekend as we might have to interact with our colleagues at Informatikdienste, but it will ensure secure and enduring operation of our servers in the future.

some impressions from the migration - thanks to the whole team!

Web server upgrade on Jan 19

Thursday, January 12th, 2017

On Thursday, January 19, starting at 08:00, we will upgrade the operating system of the main D-PHYS web server. All websites hosted on zwoelfi*.ethz.ch will be down for several hours and will gradually come back as we progress. This does not affect the department website, the institutes and many of the group websites. However, our groupware, the wikis and many special interest sites will be inaccessible. Note that if you're using the ActiveSync connector via groupware to sync email to your cell phone or Outlook, this won't work either. Please use webmail in the meantime; groupware will be one of the first services we bring back.

Update 17:30: due to an inordinate amount of user files on the web server, the upgrade took a bit longer than anticipated. Almost all websites should be back online now. Please let us know if you encounter any problems.

Network downtime 15th of September 05:30 – 07:30

Tuesday, September 13th, 2016

Informatikdienste are upgrading the routers in our HIT/D/13 server room, causing a network downtime of about one hour on Thursday morning, 15th September, between 05:30 and 07:30. Please note that various services will not be available during that time.

Maintenance window on Monday, September 5, 17:00

Tuesday, August 30th, 2016

In order to perform some core service upgrades, we are scheduling a server maintenance window on

Monday, September 5, starting at 17:00 and lasting for approximately 3 hours.

Most D-PHYS IT services will be affected by that downtime, including logins, file servers and e-mail services.
E-mails coming in during the downtime will be held on the sender’s side and will arrive at D-PHYS with a delay. Sending e-mails won’t be possible during the window.

We’ll update this posting as soon as things are back to normal.

Update Monday 18:30: We managed to complete the migration ahead of time and everything should be back to normal. If you still encounter any problems, please let us know.

Scheduled Maintenance Downtime Starting on Thursday, 14th of April, 5pm

Thursday, April 7th, 2016

Due to required changes to our network infrastructure and some hardware maintenance, we're scheduling a maintenance downtime for most D-PHYS servers starting on Thursday, 14th of April 2016, at 5pm. The downtime will last several hours. Individual services may be down longer than others or may go down several times in a row.

We'll update this posting as soon as things are back to normal.

Most D-PHYS services will be affected by that downtime, especially file servers and e-mail services, but also some virtual machines and most websites hosted by ISG D-PHYS. (http://www.phys.ethz.ch/ and other AEM-hosted websites are not affected.)

E-mails coming in during the downtime will be held on the sender's side and will arrive at D-PHYS with a delay. Sending e-mails won't be possible during the downtime either.

After the migration we will benefit from a faster and more reliable network connection to our servers.

Update at 19:30: Most services are back to normal. Expect further downtimes for home directories and mail later this evening.

Update at 23:00: All services are available again.

Update Fri 09:00: After Thursday's network migration a defective patch cable caused network problems on Friday morning.

Emergency Downtime of Mail Server

Wednesday, December 2nd, 2015

We had to shut down the D-PHYS mail server on short notice to replace faulty hardware. Mail service should be back in the evening.

Mails sent to D-PHYS during the downtime will be on hold on the sending side.

Update, 19:30: The dust is settling. We'll soon be back to normal.

Hardware Maintenance Downtime of Mail Server Today 5pm

Thursday, July 2nd, 2015

In order to perform system-level maintenance work, we are scheduling a maintenance downtime of the D-PHYS mail server today (Thursday, 2nd of July 2015) starting at 17:00. We expect the downtime to take about one hour.

During the downtime all mail services (sending mail, receiving mail, accessing mailboxes, webmail, etc.) will be unavailable. Mails sent to D-PHYS users during the downtime will be held back on the sending side and will be delivered after the downtime.

We will post an update as soon as mail services are back.

Update 19:30: Took a little bit longer than expected, but everything is back to normal now.