>After a really bad day yesterday, the CRM-DEVEL mailing list is back to
>normal operations. I will have a full report later on what happened, why,
>and how we will prevent it in the future.
OK, gang. Here's the story. We'll make this into a case study soon. :-)
Shortly after 9:30 pm CST on Monday, the mailbox of one of our list members
became full and a message from the CRM-DEVEL list server was "bounced".
Normally when this happens, the "bounce" is sent to the list owners (in our
case, Vince Mancuso and me). We note the problem and no further
action is required. List members never know it happened.
Monday night, however, the "bounce" went back to the mailing list address
instead of the list owners' addresses. From there, it was immediately sent
out to everyone on the list, including the account with the full mailbox.
This generated another "bounce" which went to the entire list, which
generated another "bounce" which went ... Well, you get the idea and you
all experienced the result. A textbook feedback loop.
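For anyone who wants to see the mechanics, here is a minimal sketch of the loop in Python. It is only a toy model, not the actual listserv software: the member names, the max_rounds cap, and the function name are all made up for illustration.

```python
def simulate_bounce_loop(members, full_mailbox, bounce_to_list, max_rounds=5):
    """Count how many copies each member receives (toy model, hypothetical names)."""
    received = {m: 0 for m in members}
    pending = 1   # the original message waiting to be distributed
    rounds = 0
    while pending and rounds < max_rounds:
        pending -= 1
        rounds += 1
        for m in members:
            if m == full_mailbox:
                # Delivery fails and a bounce is generated. If the bounce is
                # addressed to the list itself, it becomes a new list message.
                if bounce_to_list:
                    pending += 1
            else:
                received[m] += 1
    return received


members = ["alice", "bob", "full"]
# Normal case: the bounce goes to the list owners, so the list sees one message.
print(simulate_bounce_loop(members, "full", bounce_to_list=False))
# Monday night's case: every bounce is redistributed, so copies pile up
# until something (here, max_rounds) breaks the cycle.
print(simulate_bounce_loop(members, "full", bounce_to_list=True))
```

In the real event there was no cap on rounds; the cycle only ended when the problem account was unsubscribed.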
I became aware of the problem about 8:00 am CST on Tuesday and, with
Vince's help, we put out the alert to the administrators of the list at
Embry Riddle in Daytona Beach, Florida. We identified the problem account,
and at around 10:00 pm CST Tuesday that account was unsubscribed and the
runaway stopped. During that time, several list members unsubscribed to
stop the flow of unwanted messages. Others had their mailboxes filled to
the limit.
Our response to this crisis will focus on several areas.
1. Improving our ability to delete "problem" accounts in a timely manner.
Vince and I now have a higher level of access allowing us to
directly unsubscribe members. This also allows us more freedom to
conduct other housekeeping tasks, improving service to you.
2. Determining why "bounces" from this particular account were directed
differently than normal. There may be a configuration problem with
either the system which had the full mailbox, or the listserv system
itself. We are investigating to find out and will report when we
know more.
3. Configuring the systems involved to prevent a repeat performance.
The problem could have been with either the Embry Riddle system, or
the system which had the full mailbox. THE PROBLEM WAS NOT THE
FAULT OF THE PERSON WHO USES THE FULL MAILBOX.
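As one illustration of the kind of safeguard item 3 has in mind, here is a rough sketch in Python of a check a list server could run before redistributing a message. This is an assumption about a possible approach, not a description of how the Embry Riddle listserv works: the function name looks_like_bounce and the sample messages are hypothetical. It relies on common mail conventions (bounces usually carry an empty return path, come from MAILER-DAEMON or postmaster, or are marked with an Auto-Submitted header).

```python
from email import message_from_string
from email.utils import parseaddr


def looks_like_bounce(raw_message: str) -> bool:
    """Heuristic check (hypothetical helper): is this an automated bounce?"""
    msg = message_from_string(raw_message)

    # Bounces are normally sent with an empty return path: "Return-Path: <>".
    return_path = msg.get("Return-Path")
    if return_path is not None and parseaddr(return_path)[1] == "":
        return True

    # Many mail systems send bounces from MAILER-DAEMON or postmaster.
    sender = parseaddr(msg.get("From", ""))[1]
    if sender.split("@")[0].lower() in {"mailer-daemon", "postmaster"}:
        return True

    # Automated replies may be flagged with "Auto-Submitted: auto-replied" etc.
    auto = msg.get("Auto-Submitted", "no").strip().lower()
    return auto != "no"


bounce = "Return-Path: <>\nFrom: MAILER-DAEMON@example.com\nSubject: Undeliverable\n\nMailbox full\n"
normal = "Return-Path: <alice@example.com>\nFrom: alice@example.com\nSubject: Hi\n\nHello\n"
print(looks_like_bounce(bounce), looks_like_bounce(normal))
```

A message flagged this way would be routed to the list owners instead of the list. A heuristic like this can misfire, so it is a starting point rather than a complete fix.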
If you have any information about this event that you think is
relevant, drop me an email. We will do our best to ensure that it never
happens again.
On with the discussions!!!
Best regards,
Neil Krey
neilkrey_at_why.net
http://users.why.net/neilkrey/