Author Archive

2018 in review

Tuesday, December 18th, 2018

This post is meant to give you a short overview of what has been accomplished in D-PHYS IT by ISG this year. We’ve been hard at work to further improve and extend our services for you, our customers. Some highlights of 2018:

  • New mail server: between January and March, the virtual machines that make up the D-PHYS mail server were migrated to new hardware. We're now running on a state-of-the-art server with SSD storage that will serve the department's needs for many years to come.
  • New LDAP servers: in late 2017 we started a big migration to a cluster of new LDAP servers. This move was completed in the spring of 2018 and the old server was turned off.
  • Group membership editing: one of the benefits of the LDAP migration is that group memberships can now be managed directly by a group's designated owners. If you're responsible for such a group and would like to manage its members yourself without having to go through us each time, please get in touch.
  • New web server: we purchased new D-PHYS web server hardware to replace the 10-year-old system. Since we're also planning to change the setup of your web hosting, migrating the existing websites to the new hardware will be a long process that will extend well into 2019.
  • Network migration: we were at an advanced planning stage of segmenting the D-PHYS network and had already started implementing the first changes when Informatikdienste announced that the underlying network layout of the whole Hönggerberg campus would be redesigned in 2018/19. Since this deeply affects our work as well, we're now on hold until we know the details of ID's technical implementation.
  • Storage: in 2018 the disk space occupied by data and backup grew from 1.6 PiB to 2.1 PiB, which means that growth in storage has picked up steam again after two slow years.
  • Outages: apart from the above-mentioned pre-announced migration windows and some short-term network interruptions, our systems have been very stable in 2018.
  • OS upgrades: the Windows 10 rollout has been largely completed and most Linux workstations have been upgraded to Ubuntu 18.04.
  • WiFi change: we accompanied and supported ETH's WiFi change project in November.
  • UCC: the UCC rollout, which will replace the existing ETH telephony system with an all-IP based solution, has been put on hold by Informatikdienste because the service quality was severely lacking. We'll know more in Q2 2019.
  • IT security: we participate in and support the ETH-wide IT security initiative.

I would like to take this opportunity to thank my whole team for their hard and dedicated work all year long.

Happy Holidays and see you in 2019!

Storage migration

Monday, December 3rd, 2018

Update 21:00 - IGP shares are back. Welcome to igp-data!
Update 19:30 - the D-PHYS shares are back. IGP will take a little more time.

In order to guarantee sustained performance and availability of our storage system, we need to schedule a few storage maintenance windows. The first one will take place on Wednesday, 12.12.2018 at 16:00 and affect all D-PHYS and IGP group shares, but not IPA or galaxy (technically: windata/macdata, but not astrogate or ipa-data). The relevant shares will be offline for at least 3 hours.

For emergency cases, there will be read-only access to last night's backup as described here.

Please note that these migrations will bring some overall changes to the D-PHYS storage setup:

  • the SMBv1 protocol will be disabled on all file servers. It has a long history of security issues and we've migrated all clients to newer versions, so this should not affect anyone. However, there's a small chance that we didn't catch all connections, so please contact us if you experience any issues after the migration.
  • SMB access (all protocol versions) will be restricted to the ETH-internal network. This step has been long overdue, and since most ISPs block the necessary ports anyway, it shouldn't affect too many users. It does mean, however, that in the future, file server access from outside ETH will require VPN.
  • IGP/D-BAUG will get their own front-end server igp-data. If you're with IGP and have already switched your file server mounts from windata to igp-data, you're good and don't have to do anything. If you haven't, you should do so before Dec 12 in order to get a seamless migration experience.

We'll update this post as the migration progresses and as soon as the systems are back.

Groupware migration

Thursday, September 27th, 2018

On Tuesday, October 2, starting at 07:00, we will migrate our groupware instance to another server. For about 1 hour you won't have access to your calendar. If you're one of the few people who also sync their email via groupware, mail will be offline too (you can always use webmail). After the migration your clients should just reconnect and resume syncing. If you notice any issues after we're done, please get in touch.

Update Wed 07:45: migration completed, please let us know if you experience any problems.

Advance information: network migration

Thursday, July 12th, 2018

After a long (11-year) period of stability in the D-PHYS network, we are preparing a fairly extensive network reorganization for 2018. This is mainly driven by ever-increasing information security requirements mandated by ETH. The D-PHYS network has traditionally been very open and we will try to keep it that way, but we need to implement some modifications. The basic idea is to partition our current /21 network (2048 IP addresses) into smaller groups that better represent the types of machines in those networks. This will then allow us to tailor each group's firewall rules to the services needed by those machines. The roadmap looks like this:

  • Rearrange hosts in current /21 net to align with future VLAN boundaries
  • Partition the /21 net into smaller VLANs
  • Migrate individual subnets from our DHCP server to ID's DHCP server. This will also allow us to assign IPv6 addresses
  • Migrate the subnets into different virtual private zones (VPZ)
  • Assign and fine-tune firewall settings for the different VPZs
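The partitioning idea can be illustrated with Python's standard `ipaddress` module. This is a minimal sketch, not our actual plan: the /24 target size is a hypothetical choice, since the final VLAN sizes depend on the machine groups and haven't been fixed yet. It shows that the D-PHYS network 192.33.96.0/21 holds 2048 addresses and splits evenly into eight /24 blocks of 256 addresses each.

```python
import ipaddress

# The D-PHYS /21 network.
DPHYS_NET = ipaddress.ip_network("192.33.96.0/21")


def partition(network: ipaddress.IPv4Network, new_prefix: int) -> list[ipaddress.IPv4Network]:
    """Split a network into equally sized subnets with the given prefix length.

    new_prefix is a hypothetical parameter for illustration: the real
    VLAN sizes are not specified here.
    """
    return list(network.subnets(new_prefix=new_prefix))


if __name__ == "__main__":
    print(f"{DPHYS_NET}: {DPHYS_NET.num_addresses} addresses")
    for vlan in partition(DPHYS_NET, 24):
        print(f"  {vlan} ({vlan.num_addresses} addresses)")
```

Real VLANs need not all be the same size; groups with few machines could get smaller blocks (e.g. /26) while larger groups keep a /24 or more.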

As usual, we'll try to implement these steps as smoothly as possible. However, a migration on this scale will not go entirely without issues. Step 1 will entail an IP address change for quite a number of hosts. We'll make sure that our dyndns host names (foobar.dhcp.phys.ethz.ch) will be in sync with the new addresses, but this only works for properly configured DHCP hosts. Here's how you can help: if you have any hosts in the 192.33.96.0/21 D-PHYS network that are statically configured (non-DHCP), please get in touch with us ASAP. The same is true if you're using hard-coded IP addresses from that range instead of host names. We'll need to deal with those hosts individually.
In the course of 2018 we'll keep you updated on project progress and announce specific dates when we implement changes.
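If you're unsure whether your scripts or configuration files contain hard-coded addresses from the D-PHYS range, a quick scan can help you find them before contacting us. The following is a minimal sketch, assuming plain-text files; the function name `find_dphys_addresses` is hypothetical, and the regular expression only catches literal dotted-quad IPv4 addresses, not addresses assembled at runtime.

```python
import ipaddress
import re
import sys

DPHYS_NET = ipaddress.ip_network("192.33.96.0/21")

# Matches dotted-quad candidates; ipaddress validates them afterwards
# (so e.g. 300.33.96.1 is rejected).
IPV4_CANDIDATE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def find_dphys_addresses(text: str) -> list[tuple[int, str]]:
    """Return (1-based line number, address) pairs for every literal IPv4
    address in text that falls inside 192.33.96.0/21."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for candidate in IPV4_CANDIDATE.findall(line):
            try:
                addr = ipaddress.ip_address(candidate)
            except ValueError:
                continue  # not a valid IPv4 address
            if addr in DPHYS_NET:
                hits.append((lineno, candidate))
    return hits


if __name__ == "__main__":
    # Usage: python find_dphys.py file1 file2 ...
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8", errors="replace") as fh:
            for lineno, addr in find_dphys_addresses(fh.read()):
                print(f"{path}:{lineno}: {addr}")
```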

Update: since Informatikdienste are currently drafting an even more comprehensive Hönggerberg network reorganization that will deeply impact our plans as well, this project is currently on hold until we know more. Stay tuned.

Edit group share memberships yourself

Wednesday, April 18th, 2018

Until now, owners of our group shares have always had to contact us to have members added to or removed from the underlying LDAP group. One of the benefits of the recent LDAP migration is that we can now offer a web interface for LDAP group member management.

group-edit
If you're the owner of a group share and would like to manage its members yourself, please get in touch with me. You can also use this interface to edit your group report settings.

Removal of old LDAP server

Tuesday, March 6th, 2018

As described in this earlier post, we have rebuilt our LDAP server infrastructure and will now retire the old server. For the last 4 weeks we've been monitoring the network for LDAP queries that still go to the old server, and we've addressed each of those clients individually. Since we can't guarantee that we caught every single query, now is your last chance to migrate to the new servers if you haven't done so already. The old server will go offline on

Friday, March 16

Please let us know if you have any questions.

2017 in review

Monday, December 18th, 2017

This post is meant to give you a short overview of what has been accomplished in D-PHYS IT by ISG this year. We’ve been hard at work to further improve and extend our services for you, our customers. Some highlights of 2017:

  • Account expiry: in early 2017 we finished assessing all ~7600 D-PHYS accounts and blocked the expired ones. We also tied all D-PHYS accounts to their nethz counterparts wherever possible. This allows us to make use of ETH's employment information from now on. While we were at it:
  • New LDAP servers: Since implementing account expiration meant touching most aspects of our identity management infrastructure anyway, we decided to completely overhaul our LDAP user database. We reworked the LDAP schema (the original one dating back to the early 90s) and set up a 3-way replicating OpenLDAP cluster.
  • Windows Server Cluster: Several mission-critical Windows Server instances have been moved to a newly created Windows Cluster. This complements last year's Linux cluster.
  • Storage: in 2017 the disk space occupied by data and backup grew from 1.3 PiB to 1.6 PiB, making this a very slow year as far as storage growth is concerned.
  • Server room migration: in August we had to move most of D-PHYS's servers three rack rows down in the HIT D 13 server room. We now have a solid foundation for our servers for years to come.
  • Outages: apart from the above-mentioned migration, some short-term network interruptions and the unfortunate file server issues of late, our systems have been very stable in 2017.
  • Web server upgrade: in January we upgraded the operating system on the D-PHYS web server. We also used the occasion to clean up a lot of legacy cruft.
  • OS upgrades: 2017 brought new OS versions for almost every system: the Windows 10 rollout picked up steam, High Sierra arrived on the Macs and Ubuntu 16.04 on the remaining Linux workstations.
  • eXile: we migrated the configuration management from Puppet to Ansible and then re-installed all eXile gateways in a fully automated way with the latest Debian release.
  • UCC: we laid the technical groundwork and performed implementation tests for the upcoming UCC rollout which will replace the existing ETH telephony system with an all-IP based solution.
  • IT security: we participate in and support the ETH-wide IT security initiative.

I would like to take this opportunity to thank my whole team for their hard and dedicated work all year long.

Happy Holidays and see you in 2018!

Group share woes

Friday, December 8th, 2017

Update 20.12.: the strange intermittent permission problems some of you experienced could be traced back to a kernel regression. We're now back to using an older kernel.

Update 13.12.: we're cautiously optimistic that the problems have been fixed. Since Monday the file server has survived everything we threw at it. The culprit seems to be an Infiniband switch that sporadically disconnected under heavy load. We're now also turning on some performance improvements again, so you should see a speed increase when browsing files.

Update 06:45: group shares are back. Please let us know if you encounter any problems.

As some of you might have noticed, we've had some service quality issues with our group share server in the last few months. While not all interruptions are under our control (Informatikdienste have lately been very busy upgrading the ETH network, causing various network disruptions), we do have a problem with the group share server: it runs fine for weeks on end until it suddenly doesn't. To this day we have not been able to pinpoint the underlying problem, despite having changed a lot of parameters, both software and hardware. Our next step will be replacing the kernel on the disk backends and switching some hardware. For that we need a scheduled downtime on

Monday, December 11, starting at 06:00

during which the group shares will be unavailable for about 90 minutes. This affects all D-PHYS and IGP shares except the Astro and newly migrated IPA ones. We will post an update when the system is back.

We do apologize for the inconvenience these service issues might have caused you. Please bear with us while we're trying to locate and eliminate the root cause. We're monitoring the situation 24/7 and are trying to react as quickly as possible whenever a problem occurs. But wait! You can help! There seems to be a correlation between crash probability and large-scale small-file I/O. This means you should, whenever possible, avoid reading or writing a lot of small files and bundle your data into fewer and larger files. This also increases performance!
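One way to follow this advice is to pack a directory of many small files into a single archive before copying it to a group share. Here is a minimal sketch using Python's standard `tarfile` module; the function name `bundle_directory` and the choice of gzip compression are assumptions for illustration, not an ISG-provided tool.

```python
import tarfile
from pathlib import Path


def bundle_directory(src_dir: str, archive_path: str) -> int:
    """Pack every regular file below src_dir into one gzip-compressed tar archive.

    Returns the number of files written. Paths inside the archive are stored
    relative to src_dir, so the archive can be unpacked anywhere.
    """
    src = Path(src_dir)
    archive = Path(archive_path).resolve()
    count = 0
    with tarfile.open(archive, "w:gz") as tar:
        for path in sorted(src.rglob("*")):
            # Skip directories and the archive itself (in case it lives inside src_dir).
            if path.is_file() and path.resolve() != archive:
                tar.add(path, arcname=str(path.relative_to(src)))
                count += 1
    return count
```

For example, `bundle_directory("results/run42", "run42.tar.gz")` (hypothetical paths) produces one file on the share instead of thousands; `tar -xzf run42.tar.gz` unpacks it again.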

New D-PHYS LDAP servers

Monday, November 13th, 2017

executive summary: you only need to read this if you run a service or tool that uses our LDAP server

A surprisingly large number of people at D-PHYS run services or use tools that connect to our LDAP server to obtain user information. If you are among those, this post is meant to inform you that our LDAP infrastructure is about to change and you need to take action in order to keep your service up and running. You can read about the details and technical background here. The situation right now is:

  • The new servers are running and sync with the current master.
  • We have started migrating services from the old server to the new ones.
  • The old server will be turned off in 2018.
  • You can now start to migrate your service / tool to the new LDAP infrastructure.
  • In early 2018 we will start searching for clients that still use the old server and address them individually.

So if you're affected, please change your LDAP connection according to the documentation or get in touch if you have any questions.

Server room migration on Wed, Aug 23

Tuesday, July 25th, 2017

Update Thursday 01:45: we hit some unexpected problems with the non-Astro group shares. Everything is back now; please let us know if you experience any problems.

Some months ago, we were informed by Informatikdienste that we would have to migrate our two water cooled racks in the HIT server room due to upcoming remodeling. This move will take place on

Wednesday, August 23, starting at 16:00

and last for several hours. During this time, all our IT services will be unavailable, including login, e-mail, storage and ISG-hosted websites. Incoming e-mail will be held back and delivered afterwards. We will do our best to have login and e-mail back up within the first two hours, but group drives will take a bit longer due to the sheer amount of hardware we have to move.
We apologize for any inconvenience. Unfortunately, this migration cannot be performed on a weekend as we might have to interact with our colleagues at Informatikdienste, but it will ensure secure and enduring operation of our servers in the future.

some impressions from the migration - thanks to the whole team!