Short maintenance downtime of LDAP server on Mon Aug 2

On Monday, August 2nd, starting at 18:00, we need to modify our LDAP user database to incorporate structural changes required by a new service we're currently setting up. This will cause a downtime of about 1 hour, probably less, that will affect user logins, email and file server access. We will post an update when things are back to normal.

Update, 18:30: Things are now back to normal. 🙂

We apologize in advance for any inconvenience this service interruption might cause.

Short downtime for plempy, plompy and plumpy on Monday Aug 2

On Monday, August 2nd, starting at 07:00, our terminal server and computation nodes plempy, plompy and plumpy will be moved into the water-cooled racks in the HIT server room. These machines will be down for about 30 minutes. If your thin client connects to plompy or if you're performing calculations on plempy or plumpy, please make sure your data has been saved by Monday morning. After the move, the trio will enjoy the amenities of our most advanced server room that only another thunderstorm could disrupt.

Update, Mon Aug 2, 08:40: All servers have reached their final destination.

Home Directories Outage

For reasons still unknown, our home directory server fulen stopped serving files via NFS at about 5:20 pm. This stopped most active Linux logins. To restore functionality, we had to reboot the file server. As of 5:35 pm, fulen is functioning again.

Unfortunately, due to the necessary reboot, we cannot fully assess the reason for the incident. It follows a history of poor performance that we have been investigating intensively for the last couple of weeks. We are still trying to find the ultimate cause.

We apologise for the inconvenience.

Major outage due to water ingress


This morning around 03:00, water ingress in our HIT server room shut down most of our essential infrastructure servers. As soon as power was back around 08:00, we started to bring our services back online.
Please let us know if you still experience any problems. We apologize for the inconvenience. I guess water and servers just don't mix very well.

Status, 12:14: Apart from the BackupPC server, everything should be working again.

Plimpy maintenance reboot

Our terminal server plimpy (uptime: 47 days) is slowly clogging up with runaway processes that eat up memory and CPU. Since we cannot tell good processes from bad ones, we have scheduled a maintenance reboot for the weekend, on Saturday night, June 19th, to give the system a fresh start. We ask all users to save their data and log out of their thin clients beforehand.

BTW: Users of second-generation thin clients (named tc00xx) are kindly invited to test our new Ubuntu-based terminal server plompy. If you're interested in giving it a try, please let us know and we will configure your client.

No more Mailman password reminders in the future

Since the monthly password reminders sent by our mailing list software Mailman have caused more confusion and unnecessary work than they were worth, they are now globally disabled on our mailing list server.

If you have lost or forgotten your subscription password for any of our mailing lists, you can always go to https://lists.phys.ethz.ch/listinfo, choose the appropriate list, scroll down to the bottom of that page, enter your e-mail address, click on "Unsubscribe or edit options", and then click on "Remind". Your password will be sent to your e-mail address.

Homeserver Maintenance Downtime


Because of performance problems on our home file server, we need to reboot the server tomorrow, Wednesday, 28th of April 2010. This will cause a downtime between 07:00 and approximately 07:30.

This will result in a short service interruption for the home directories!

To protect you from losing or corrupting any of your files, it is best to close all open files on the home directories.

Update, 07:20: the homes are back...

Update, 09:45: various people are experiencing login problems. We're working on it.

Update, Friday 10:00: problems resolved!

Printing problems

Printing currently doesn't work on most Macs. We're still trying to find the source of the problem.

Maintenance Downtime of IDL License and Condor Master Server on 14. April, 5 pm

Because of hardware problems with one of our infrastructure servers, we couldn't perform the planned software upgrade on the IDL license server and Condor master server during the big maintenance downtime last week.

Those hardware problems have now been fixed, so we will install the software upgrade on the IDL license server and Condor master server tomorrow, Wednesday, 14th of April 2010, starting at 5 pm. The maintenance downtime will last approximately two hours.

Update, 20:20: The IDL license server and Condor master server are both back online.

New computation node plumpy

To relieve our terminal server plimpy of some of its computational burden, we have assigned it a new sidekick specifically targeted at number-crunching tasks: plumpy.ethz.ch.

So if you've been using plimpy to perform calculations in the past, please ssh into plumpy from now on and do your work there. This should make both you and the plimpy users happy. Thank you.