It just crashed…

I was horrified when I looked in detail at the Windows Event Viewer on our server.

It wasn’t just that there were errors; it was the sheer number of them. The following error, each time paired with an error with Event ID 2401, occurred more than 2,500 times over the course of a year.

Event Type: Error
Event Source: SAS
Event Category: None
Event ID: 2400
Date: 30/12/2008
Time: 10:24:08
User: N/A
Computer: STASAS01
SAS Error Event: FATAL ERROR: WRCODE=8000583d, MODULE='VQTCP': readcon #2 failed on only link.

What did this error mean? Each pair of errors corresponds to a SAS® session crash. The error is caused by an issue with SAS/Connect that had long since been fixed globally, but not locally. Often the crash occurs as soon as a user tries to log in; sometimes, however, it appears to happen spontaneously mid-session.

Experienced users had assumed that it was caused by some odd sequence of keystrokes and, having accepted it as part of life, had given up reporting it. I first heard about it after a new member of staff suffered this nineteen times in one day. I raised a call with SAS UK Tech Support but, at the time, had nothing for them to go on: when SAS crashed, the client application simply disappeared from the screen and the user received no error or warning message.

A couple of weeks later somebody else described the same phenomenon. This time I went to the event log on the server, found the error messages and Googled them, updating the SAS Support track as I went. The SAS technician emailed back almost immediately with a couple of Knowledge Base articles: Problem Notes 4499 and 6975. A hotfix was available, and we will soon have it installed (see Hotfixes for more details).

What struck me, though, was how much time this error must have cost us. For an individual user it might happen a couple of times a fortnight, each time costing a few minutes, sometimes more. If each of those 2,500 crashes cost ten minutes of lost work, then we had lost fifty-two working days over the course of a year! And no one had noticed because, to each individual, it was small change!
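The fifty-two-day figure can be checked with a quick back-of-the-envelope calculation. This is a minimal sketch: the crash count comes from the event log, but the ten minutes per crash is the estimate used above, and the eight-hour working day is an assumption (it is what makes the arithmetic come out at fifty-two days). The function name is purely illustrative.

```python
# Rough estimate of working time lost to SAS session crashes in one year.
# Assumptions: ~10 minutes lost per crash, and an 8-hour working day.
CRASHES_PER_YEAR = 2500      # Event ID 2400 count from the event log (a lower bound)
MINUTES_LOST_PER_CRASH = 10  # estimated average cost of one crash
HOURS_PER_WORKING_DAY = 8    # assumed length of a working day


def working_days_lost(crashes, minutes_per_crash, hours_per_day):
    """Convert a number of crashes into working days of lost time."""
    total_minutes = crashes * minutes_per_crash  # 2,500 x 10 = 25,000 minutes
    total_hours = total_minutes / 60             # about 416.7 hours
    return total_hours / hours_per_day           # about 52.1 working days


days = working_days_lost(CRASHES_PER_YEAR, MINUTES_LOST_PER_CRASH, HOURS_PER_WORKING_DAY)
print(f"{days:.1f} working days")  # prints "52.1 working days"
```

Because 2,500 is a lower bound and ten minutes is a guess, the result is best read as an order of magnitude rather than a precise figure.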