Group share woes

Update 20.12.: the strange intermittent permission problems some of you experienced could be traced back to a kernel regression. We're now back to using an older kernel.

Update 13.12.: we're cautiously optimistic that the problems have been fixed. Since Monday, the file server has survived everything we have thrown at it. The culprit seems to have been an InfiniBand switch that sporadically disconnected under heavy load. We are now also re-enabling some performance optimizations, so browsing files should become noticeably faster.

Update 06:45: group shares are back. Please let us know if you encounter any problems.

As some of you might have noticed, we've had some service quality issues with our group share server in the last few months. While not all interruptions are under our control (Informatikdienste have lately been very busy upgrading the ETH network, causing various network disruptions), we do have a problem with the group share server: it runs fine for weeks on end until it suddenly doesn't. To this day we have not been able to pinpoint the underlying problem, despite having changed many software and hardware parameters. Our next step is to replace the kernel on the disk backends and swap out some hardware. For that we need a scheduled downtime on

Monday, December 11, starting at 06:00

during which the group shares will be unavailable for about 90 minutes. This affects all D-PHYS and IGP shares except the Astro and newly migrated IPA ones. We will post an update when the system is back.

We apologize for the inconvenience these service issues may have caused you. Please bear with us while we try to locate and eliminate the root cause. We're monitoring the situation 24/7 and try to react as quickly as possible whenever a problem occurs. But wait! You can help! There seems to be a correlation between crash probability and large-scale small-file I/O. This means you should, whenever possible, avoid reading or writing lots of small files and instead bundle your data into fewer, larger files. This also increases performance!
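One way to follow this advice is to pack a directory full of small files into a single archive before putting it on the group share. Below is a minimal sketch using Python's standard `tarfile` module; the helper names `bundle_directory` and `list_bundle` are hypothetical and meant for illustration only, not an ISG-provided tool (the plain `tar` command achieves the same thing).

```python
import os
import tarfile


def bundle_directory(src_dir: str, archive_path: str) -> int:
    """Pack every regular file below src_dir into one gzip-compressed tar archive.

    Returns the number of files added. Paths are stored relative to src_dir,
    so the archive can be unpacked anywhere. Keep archive_path OUTSIDE
    src_dir, otherwise the walk could pick up the archive being written.
    """
    count = 0
    with tarfile.open(archive_path, "w:gz") as tar:
        for root, _dirs, files in os.walk(src_dir):
            for name in sorted(files):
                full_path = os.path.join(root, name)
                # Store a path relative to src_dir instead of the absolute one
                arcname = os.path.relpath(full_path, src_dir)
                tar.add(full_path, arcname=arcname)
                count += 1
    return count


def list_bundle(archive_path: str) -> list[str]:
    """Return the names of the files stored in an archive without extracting it."""
    with tarfile.open(archive_path, "r:gz") as tar:
        return [member.name for member in tar.getmembers() if member.isfile()]
```

For example, `bundle_directory("run_042", "/tmp/run_042.tar.gz")` (hypothetical paths) turns thousands of small result files into a single file that can then be copied to the share in one go. If the data is already compressed, opening the archive with mode `"w"` instead of `"w:gz"` skips the extra compression step.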

New D-PHYS LDAP servers

executive summary: you only need to read this if you run a service or tool that uses our LDAP server

A surprisingly large number of people at D-PHYS run services or use tools that connect to our LDAP server to obtain user information. If you are among those, this post is meant to inform you that our LDAP infrastructure is about to change and you need to take action in order to keep your service up and running. You can read about the details and technical background here. The situation right now is:

  • The new servers are running and sync with the current master.
  • We have started migrating services from the old server to the new ones.
  • The old server will be turned off in 2018.
  • You can now start to migrate your service / tool to the new LDAP infrastructure.
  • In early 2018 we will start searching for clients that still use the old server and address them individually.

So if you're affected, please change your LDAP connection according to the documentation or get in touch if you have any questions.

Used hardware bargain bin / yard sale, part II

ISG sits on a pile of old hardware that for various reasons can no longer be used in our setup but might still be useful in certain scenarios (e.g. lab use or tinkering at home), and various people have expressed interest in it. We will therefore host a grab-your-used-piece-of-hardware session with mostly TFT monitors (15" - 19"), a few Mac Pros (2010) and printers, free of charge for ETH-internal use and with prices for private use according to the rules: Wed Oct 18, on the H floor of HPT, between 11:00 and 13:00.
As usual, some rules apply:

  • this goes to all D-PHYS members
  • no registration necessary. Just come by and take whatever is left.
  • all items come as they are. We do not have any details or specifications
  • there’s no warranty or service whatsoever. All devices have successfully been turned on, but that’s it
  • if your item doesn’t turn on, you can bring it back within 5 days and get a full refund (if it wasn’t free in the first place)
  • no OS, no software, no manual, no keyboard, often no cables. You get one piece of hardware. All HDs are blank
  • all proceeds go to the D-PHYS funds, not ISG
  • bring cash

Server room migration on Wed, Aug 23

Update Thursday 01:45: we hit some unexpected problems with the non-Astro group shares. Everything is back now, please let us know if you experience any problems.

Some months ago, we were informed by Informatikdienste that we would have to move our two water-cooled racks in the HIT server room due to upcoming remodeling. This move will take place on

Wednesday, August 23, starting at 16:00

and last for several hours. During this time, all our IT services will be unavailable, including login, e-mail, storage and ISG-hosted websites. Incoming e-mail will be held back and delivered afterwards. We will do our best to have login and e-mail back up within the first two hours, but group drives will take a bit longer due to the sheer amount of hardware we have to move.
We apologize for any inconvenience. Unfortunately, this migration cannot be performed on a weekend as we might need to coordinate with our colleagues at Informatikdienste, but it will ensure the secure and reliable operation of our servers in the future.

Thanks to the whole team for the migration!

Expiration of D-PHYS accounts

As announced previously, about a year ago ISG was tasked by the department board to devise a workflow to expire D-PHYS accounts, which so far had a life expectancy of ∞. In summer we started blocking accounts that were virtually unused, which almost by definition went very smoothly. Now we will start addressing accounts of users who are still using our services but no longer have an affiliation with D-PHYS. They will receive an email informing them of a one-month grace period before the account gets blocked. This post is meant as a reminder to everybody that this process is underway, in case questions arise.
The project is explained in more detail in our readme.

Web server upgrade on Jan 19

On Thursday, January 19, starting at 08:00, we will upgrade the operating system of the main D-PHYS web server. All websites hosted on zwoelfi*.ethz.ch will be down for several hours and will gradually come back as we progress. This does not affect the department website, the institute websites or many of the group websites. However, our groupware, the wikis and many special interest sites will be inaccessible. Note that if you're using the ActiveSync connector via groupware to sync email to your cell phone or Outlook, this won't work either. Please use webmail in the meantime; groupware will be one of the first services we bring back.

Update 17:30: due to the very large number of user files on the web server, the upgrade took a bit longer than anticipated. Almost all websites should now be back online; please let us know if you encounter any problems.

Access to Windows Remote Desktop blocked from outside ETH

In the last few weeks we discovered some attempted attacks on the Windows Remote Desktop feature from sources outside of ETH.

In order to protect both your machines and our network, we decided to block RDP access from ETH-external networks. If you still need access from outside the ETH network (e.g. from home), you first have to open a VPN connection to ETH and then start the Remote Desktop client.

More information about installing the VPN client is available here.

2016 in review

This post is meant to give you a short overview of what has been accomplished in D-PHYS IT by ISG this year. We’ve been hard at work to further improve and extend our services for you, our customers. Some highlights of 2016:

  • New team member: Sven Mäder joined ISG this year to replace Axel in our Linux server team.
  • Account expiry: you might have heard that D-PHYS decided to phase out old accounts in the future. We spent the last year laying the technical groundwork for a smooth and painless implementation of this policy change. One first visible result is our new account portal.
  • Printing: in summer we integrated student printing into the pia printing system which means that we now have a comprehensive printing solution for the whole department. The D-PHYS print server will be shut down in early 2017.
  • Storage: in 2016 the disk space occupied by data and backup grew from 929 TiB to 1.3 PiB, again increasing the yearly growth rate. We are now using 60-disk toploader chassis to maximize storage space-per-volume.
  • Outages: we scheduled two maintenance windows, on April 14 and September 5, in order to perform hardware and system upgrades. Together with a network upgrade by Informatikdienste on September 15, these were the only noteworthy downtimes in 2016.
  • Docking network: in fall 2016 we migrated most of the department's network sockets to the 802.1X-enabled docking network. While there is little immediate benefit for most of us, this is a prerequisite for future network projects like the upcoming Unified Collaboration & Communication (UCC) project.
  • Wifi: in early 2016 we developed and installed a portable wifi probe that eventually led to the discovery of one of the underlying problems causing ETH's wifi woes. Since then, wifi has been much more stable.
  • OS upgrades: 2016 brought new OS versions for almost every system: the Windows 10 rollout picked up steam, Sierra arrived on the Macs and Ubuntu 16.04 on the Linux workstations.
  • Cluster: we built and deployed a new high-availability cluster setup for our virtual servers this year.
  • Core services: a lot of infrastructure work has happened in the background to ensure smooth operation and seamless growth of our services in the future. Examples are new Active Directory servers for our Windows users, the migration of our web server certificates to Let's Encrypt, a facelift for most of our websites to match the AEM design and an upgrade of our iPXE boot screen.
  • IT security: we participate in and support the ETH-wide IT security initiative and also worked hard to make the mandated n.ethz password change as humane as possible.

I would like to take this opportunity to thank my whole team for their hard and dedicated work all year long.

Happy Holidays and see you in 2017!

short maintenance downtime of astrogate on Monday, Dec 12, starting at 07:00

We have scheduled a short (~30 min) maintenance downtime of astrogate on Monday, Dec 12, starting at 07:00 in order to replace a network card. During this time, the astro SAN data will not be accessible.

new ISG staff member

Sven Mäder

It is my pleasure to welcome Sven Mäder into our group. He joins us to replace Axel Beckert in the Linux team.

Welcome Sven!