Posts Tagged ‘maintenance’

home server maintenance

Wednesday, July 5th, 2023

Scheduled maintenance will be taking place on our home.phys.ethz.ch file server on Wednesday, July 12, starting at 16:00. The service will be down for approximately 4 hours. We will be replacing the hardware with all-flash storage and upgrading the base system.

Update 18:15: the new home server is open for business. Most SMB + NFS clients will not have survived the 2h downtime and will have to be rebooted. We'll go through the most obvious ones, but if yours doesn't work, try restarting it.

All home directories (Linux, Windows and Mac, SMB and NFS) will be unavailable during this time.

For emergency cases, you'll have read-only access to the backups as described here.

This migration will mark the end of the huge storage migration project of 2023. Thanks for your patience.

group-data server maintenance

Wednesday, May 31st, 2023

Scheduled maintenance will be taking place on our group-data.phys.ethz.ch server on Wednesday, June 7, starting at 16:00. The service will be down for approximately 4 hours. We will be replacing some hardware and upgrading the base system.

All group shares will be affected except IPA, IGP and Galaxy.

For emergency cases, you'll have read-only access to the backups as described here.

Web server upgrade

Tuesday, February 8th, 2022

This Thursday 2022-02-10 starting at 07:00 we will upgrade the server hosting most of our websites.

Affected websites

The following websites are unavailable during the downtime:

Important changes for website owners

All website owners: If you are a website owner/admin, please join our new Matrix room #web:phys.ethz.ch to get support and news. After the upgrade, please check your websites for problems.

Python WSGI app owners: All WSGI apps have been switched to use a virtual environment to pin the currently used Python package versions. We encourage you to review and upgrade your dependency versions (via requirements.txt) after the server upgrade. Please read our new WSGI documentation for details.
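As a rough aid for the dependency review mentioned above, the following sketch compares the versions pinned in a requirements.txt against what is actually installed in the active virtual environment. It is only an illustration, not part of the server setup: the function names are made up for this example, and the parser deliberately handles only plain "name==version" lines (no extras, environment markers or include directives).

```python
from importlib.metadata import PackageNotFoundError, version


def parse_pins(lines):
    """Return {package: pinned_version} for simple 'name==version' lines."""
    pins = {}
    for raw in lines:
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if "==" not in line:
            continue  # skip blank lines, options and unpinned requirements
        name, pinned = line.split("==", 1)
        pins[name.strip()] = pinned.strip()
    return pins


def find_mismatches(pins, get_version=version):
    """Return {package: (pinned, installed)} for every pin that differs.

    installed is None when the package is not installed at all.
    """
    mismatches = {}
    for name, pinned in pins.items():
        try:
            installed = get_version(name)
        except PackageNotFoundError:
            installed = None
        if installed != pinned:
            mismatches[name] = (pinned, installed)
    return mismatches


def check_requirements(path="requirements.txt"):
    """Print one line per package whose installed version differs from its pin."""
    with open(path) as fh:
        pins = parse_pins(fh)
    for name, (pinned, installed) in find_mismatches(pins).items():
        print(f"{name}: pinned {pinned}, installed {installed}")
```

Running check_requirements() with the app's virtual environment activated lists every package that is missing or installed in a different version than the one pinned.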

Versions

  • OS: Debian 10 -> 11
  • Python: 3.7 -> 3.9
  • PHP: 7.3 -> 7.4

Partial Network Downtime on Mon 6th Dec after 19h00

Monday, November 29th, 2021

The central Informatikdienste will have a scheduled downtime of all networking (cable and wireless) in the buildings HPK, HEZ, HPM, HPL and HPW on Monday 6th Dec 2021 in the evening between 19h00 and 23h00.

This is the second of three downtimes for the ongoing project to split the current networks into smaller chunks. This major undertaking will also induce a short downtime for some computers in the dynamic DHCP pool in other buildings (as some of our IP ranges are being moved to the listed buildings).

Users don’t need to do anything and their computers should come back online automatically. Otherwise try to reboot or get in touch with us.

In order to prepare for the migration, Informatikdienste will forbid all changes to their DHCP servers between Friday 3rd Dec 13:00 and Tuesday morning. As a consequence we will not be able to register new devices or hostnames during this period.

Partial Network Downtime on Mon 8th Nov after 19h00

Monday, November 1st, 2021

The central Informatikdienste will have a scheduled downtime of all networking (cable and wireless) in the buildings HPH, HPP, HPR, HPS, HPV and HPZ on Monday 8th Nov 2021 in the evening between 19h00 and 23h00.

This is the first of three downtimes for the ongoing project to split the current networks into smaller chunks. This major undertaking will also induce a short downtime for some computers in the dynamic DHCP pool in other buildings (as some of our IP ranges are being moved to the listed buildings).

Users don't need to do anything and their computers should come back online automatically. Otherwise try to reboot or get in touch with us.

In order to prepare for the migration, Informatikdienste will forbid all changes to their DHCP servers between Friday 5th Nov 13:00 and Tuesday morning. As a consequence we will not be able to register new devices or hostnames during this period.

Hardware maintenance of storage front-end servers.

Thursday, July 30th, 2020

Update 23:50: we ran into severe problems and the migration took longer than expected. Everything is back online now. Sorry we're late.


Planned maintenance will be taking place on all shared-storage front-end servers on Thursday, August 6th, starting at 17:00. The service will be down for approximately 2-3 hours. Should we finish earlier than expected, this post will be updated as soon as work is completed. We will be upgrading the network switch and replacing hardware in several machines.

All group shares will be affected, i.e. group-data, IPA, IGP and Galaxy. Only the home and backup servers will be accessible during this time.

For emergency cases, there will be read-only access to last night’s backup as described here.

Group-data server hardware maintenance.

Tuesday, June 16th, 2020

Update 18:15 group-data is back!

Planned maintenance will be taking place on our group-data.phys.ethz.ch server on Friday, June 19, starting at 17:00. The service will be down for approximately 2 hours. We will be replacing the network interface card to improve service stability.

All group shares will be affected except IPA, IGP and Galaxy.

For emergency cases, there will be read-only access to last night’s backup as described here.

Home server maintenance on Tue, July 9, 17:00

Wednesday, July 3rd, 2019

Update 20:10 Migration finished! Everything should work as normal.

In order to guarantee sustained performance and availability of our storage system, we schedule a maintenance downtime of our home directory server on

Tuesday, July 09, starting at 17:00

This only affects the home shares (technically: smb://home.phys.ethz.ch & /home/USERNAME). Email and group shares will have no interruption.

Since the server also needs a file system check, the downtime will take several hours.

For emergency cases, there will be read-only access to last night’s backup as described here.

We will update this posting once the home server is back online.

Mail server maintenance on Tue, March 27

Friday, March 23rd, 2018

Update 07:25 The migration is complete and our mail server is back online. Please let us know if you notice anything peculiar. This concludes our multi-step migration to the new mail server hardware.

---

In order to finalize the upgrade of the D-PHYS mail server, we schedule a maintenance downtime on

Tuesday, March 27, between 06:30 and 08:00 in the morning

During that time it will not be possible to send or receive emails. Incoming external emails will not be lost: they will be held on the sender’s side and delivered after the migration. Outgoing mail will be kept in your mail client until the connection is restored.

We will update this posting once the mail server is back online.

New location for mail filtering rules, forwarding and vacation auto-replies

After the migration, all mail-related settings will be consolidated into the Roundcube Webmail interface:

  • spam filtering rules (whitelist, blacklist)
  • forwarding of your emails to a different account
  • setting a vacation or out-of-office auto-reply message
  • defining rules to automatically file incoming mails into specific folders

This will make configuring your email settings easier and also give you more options than before (for example, the out-of-office auto-reply can now be configured to automatically terminate at the end of your absence).

Please refer to our readme for details on how to customize these settings in the future. Feel free to contact us if you have any questions.

The current settings of all active users have been converted and imported.

In technical terms we are migrating from procmail to sieve. In particular the hidden text file ~/.procmailrc in the user's home folder will be ignored after the migration.

Group share woes

Friday, December 8th, 2017

Update 20.12.: the strange intermittent permission problems some of you experienced could be traced back to a kernel regression. We're now back to using an older kernel.

Update 13.12.: we're cautiously optimistic that the problems have been fixed. Since Monday the file server has survived everything we threw at it. The culprit seems to be an Infiniband switch that sporadically disconnected under heavy load. We're now also turning on some performance improvements again, so you should see a speed increase when browsing files.

Update 06:45: group shares are back. Please let us know if you encounter any problems.

As some of you might have noticed, we've had some service quality issues with our group share server in the last few months. While not all interruptions are under our control (Informatikdienste lately have been very busy upgrading the ETH network, causing various network disruptions), we do have a problem with the group share server: it runs fine for weeks on end until it suddenly doesn't. To this day we have not been able to pinpoint the underlying problem, despite having changed a lot of parameters, both software and hardware. Our next step will be replacing the kernel on the disk backends and switching some hardware. For that we need a scheduled downtime on

Monday, December 11, starting at 06:00

during which the group shares will be unavailable for about 90 minutes. This affects all D-PHYS and IGP shares except the Astro and newly migrated IPA ones. We will post an update when the system is back.

We do apologize for the inconvenience these service issues might have caused you. Please bear with us while we're trying to locate and eliminate the root cause. We're monitoring the situation 24/7 and try to react as quickly as possible whenever a problem occurs. But wait! You can help! There seems to be a correlation between crash probability and large-scale small-file I/O. This means you should, whenever possible, avoid reading or writing a lot of small files and instead bundle your data into fewer, larger files. This also increases performance!
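One possible way to follow the advice about bundling small files is sketched below with Python's standard tarfile module. This is only an illustration under our own assumptions: the function names and paths are hypothetical, the archive is written uncompressed, and it should be created outside the directory being packed so that it does not end up inside itself.

```python
import tarfile
from pathlib import Path


def bundle_directory(src_dir, archive_path):
    """Pack every file below src_dir into one tar archive; return the file count."""
    src = Path(src_dir)
    count = 0
    # "w" writes an uncompressed archive; "w:gz" would compress it as well
    with tarfile.open(archive_path, "w") as tar:
        for path in sorted(src.rglob("*")):
            if path.is_file():
                # store paths relative to src_dir, with forward slashes
                tar.add(path, arcname=path.relative_to(src).as_posix())
                count += 1
    return count


def read_member(archive_path, member_name):
    """Read one file back out of the archive without unpacking the rest."""
    with tarfile.open(archive_path, "r") as tar:
        extracted = tar.extractfile(member_name)
        if extracted is None:
            raise ValueError(f"{member_name} is not a regular file in the archive")
        return extracted.read()
```

For example, bundle_directory("results", "results.tar") would turn a directory of many small result files into a single file on the share, and read_member("results.tar", "run1/output.txt") would fetch one of them again later.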