Yesterday ECF was offline for several hours due to a combination of failures at our website hosts. There were also some staff issues at ECF, which meant that updates on the situation were not issued.
The hard drive on the core/loadbalancer webserver crashed, but before it died it also corrupted the RAID disk, which is the main backup. The host's backup of the drive then proved faulty, leaving us to reconstruct the data from other backups. Their DNS redirect to a backup page also failed. Our database server and other servers were not affected, but the webhost failed to bypass the faulty box and get us back up that way. All in all, not really a win.
On top of that, some of our people were out, so recovering from all that was not smooth.
We apologize for the inconvenience caused, and hope that in the future we will cope better. Some procedures will be changed as a result of yesterday's experience.