What happened ??

Status
Not open for further replies.

rolygate

Vaping Master
Supporting Member
ECF Veteran
Verified Member
Sep 24, 2009
8,354
12,405
ECF Towers
Explanation time...

The main server with the load balancer crashed due to a hard disk crash that also corrupted the data on the RAID disk, which is the main backup. For some reason not yet explained, the host's backup was faulty and could not be restored, so data had to be recreated from other sources.

I was ill today and did not do several things that should have been done - and for that I deeply apologize.

Facebook is a good idea; we should have posted there. However, there is an unresolved issue: we currently have two FB pages, and the two admins need to sit down and work something out, since each page has grown up independently of the other. I think one should close, as it must be very confusing to people looking for ECF's page. The one that can be found under ECF is not the 'official' page, which is at e.cigarette.forum, but the official one is less busy than the other. Ho hum. Add SJ's FB page to that and it is very confusing at the moment.

Let's hope tomorrow is a better day; today was a bit grim.
 

jlauro

Moved On
Apr 6, 2011
38
43
57
Owosso
DNS has been screwed all day. Is the ECF DNS server on the same RAID as the database?

Just interested...


Sort of (don't want to give details away)... which made it impractical to point people to a working system, as updating root servers could take 24 hours, and the system took significantly longer than expected to get back online.


Steps will be taken so we can handle a similar outage better in the future. It's only been about 3-4 months since there was another server to make splitting the load for DNS and other services practical. No excuse, but it mostly fell into the "if it isn't broke..." category for the last few months.

Having the DNS split probably wouldn't have reduced the amount of downtime, but it would have made putting a status page up more practical. It may also have made it feasible to restore and reroute services to one of the remaining servers instead of waiting for the rebuild and recovery, which could mean lower downtime at the expense of some data loss.

During the entire outage the estimate would probably always have been 2 hours, so having a status page probably wouldn't have helped much.
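
For anyone curious about the timing trade-off, here is a rough sketch of the arithmetic, not a description of how ECF is actually set up. It assumes the 24-hour figure above behaves like a cache lifetime (TTL) and that the outage estimate is the 2 hours mentioned. The function names `worst_case_cutover` and `redirect_worthwhile` are made up for illustration, and the date is arbitrary.

```python
from datetime import datetime, timedelta


def worst_case_cutover(change_time: datetime, ttl_seconds: int) -> datetime:
    """Latest moment a resolver could still be handing out the old record.

    A resolver that cached the old answer just before the change may keep
    it for one full TTL, so the worst case is change_time + TTL.
    """
    if ttl_seconds < 0:
        raise ValueError("TTL must be non-negative")
    return change_time + timedelta(seconds=ttl_seconds)


def redirect_worthwhile(ttl_seconds: int, estimated_outage_seconds: int) -> bool:
    """Pointing DNS elsewhere only pays off if caches expire before the
    outage was expected to end anyway."""
    return ttl_seconds < estimated_outage_seconds


if __name__ == "__main__":
    change = datetime(2011, 1, 1, 12, 0, 0)  # arbitrary example time
    day = 24 * 60 * 60                       # assumed: the 24 hours quoted above
    two_hours = 2 * 60 * 60                  # assumed: the standing outage estimate

    print(worst_case_cutover(change, day))
    print(redirect_worthwhile(day, two_hours))
    print(redirect_worthwhile(300, two_hours))  # hypothetical 5-minute TTL
```

With a 24-hour lifetime and a standing 2-hour estimate, redirecting never looks worthwhile, which matches the reasoning above. A much shorter lifetime on a separately hosted DNS record is what would change that.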
 

stravaigin

Vaping Master
ECF Veteran
Verified Member
Dec 15, 2010
5,531
8,892
Australia
Sorry to hear you're not feeling well. Thanks for the explanation. Being without ECF and my Reo friends made me realise how much I rely on this place :) There are good people here.
 

Olef

Super Member
ECF Veteran
Apr 22, 2011
689
298
UK
Thanks for the info, as I say just interested as IT is my day job. I imagine my day without ECF has been a lot more restful than everyone who keeps the site running so thanks all for your efforts. They are appreciated :thumbs:
 

ime5000

Ultra Member
ECF Veteran
Verified Member
Aug 15, 2011
2,401
1,955
maryland
Glad that you're back up

(ECF withdrawal is awful)
No kidding! I checked like every half hour at least!

I thought something bad had happened... Before e-cigs I was doing real good with snus, till I logged in to buy some and that day MD was banned from ordering snus online...
 

Tendril

Super Member
ECF Veteran
Jul 21, 2010
479
283
USA - Illinois
Hey gang,

Thanks for working so hard to get the site back up :thumbs:

You don't have to apologize, but it was nice of The Team. My professional opinion would be that you don't need to disclose any information about your network beyond the internet-facing IP address, though. Saying it was "a hard drive issue" or a "hard drive failure" is more than adequate.

Repeat: Thanks for working so hard to get the site back up! :thumbs:
 