(ITIL Service Operation) In ITIL, the Known Error Database (KEDB) is a database containing all Known Error Records. This database is created as part of the Problem Management process and is used by the Incident and Problem Management processes. The KEDB can be part of a configuration management system or part of a service knowledge management system.
To understand what a KEDB is and how important it is to an IT team and its customers, let's look at some ITIL terms. (Remember, ITIL was formerly known as the Information Technology Infrastructure Library. ITIL provides detailed best practices for IT service management, known as ITSM.)
An incident is an unplanned interruption of an IT service. This could mean that the email service went down without warning, or that one piece of software can no longer connect to another, and so on.
A problem is the underlying cause of an incident, although after the incident occurs it may take some time to identify the problem.
Once the problem is identified, it is no longer just a problem but a known error: the IT team knows what is causing the incident, but the cause has not yet been fixed.
The difference between incident and problem is significant: many users report outages or failures, but IT may not know the problem and the underlying cause. When IT is able to discover the problem that caused the incident, it can begin resolving it, either with a short-term workaround or a long-term solution.
A known error database therefore tracks all known errors within the purview of IT, which is typically an entire system or even an organization. Ideally, the KEDB includes:
Once IT is able to determine the problem behind an incident, it has two options.
The first is to find a long-term, permanent solution. Depending on how complicated the problem is and whether it has occurred before, IT must weigh the time and resources required to find a permanent solution against how widespread and severe the problem is. This can mean that some problems are not prioritized.
The second is to find a short-term workaround. A workaround is a temporary solution that allows work to continue until the problem is permanently resolved. Workarounds are critical because IT must prioritize which problems it spends time and money solving.
When a long-term solution is deprioritized, users can continue to experience the incident. If users encounter the incident repeatedly, a workaround ensures that they face minimal disruption to productive work.
How exactly does a company justify the capital and operating costs of such a database?
To return to the email incident, let's assume that a series of diagnoses and tests revealed that a critical service was in sleep mode. Once the cause was identified, the fix itself may have been as quick as stopping and restarting the service. But it took a lot of effort to get to that fix and, more importantly, it cost valuable time. The email service was down while the diagnosis and resolution were carried out. This could result in customer penalties and intangible losses such as missed future business opportunities and reduced customer satisfaction.
Now suppose that this organization, which provides email services to its customers, maintains a KEDB and that this particular incident has been recorded. If the email service goes down again, the technical support team can simply refer to the previous outage in the KEDB and start diagnostics with the service that caused the problem last time. If that same service is again the cause, the outage will be fixed in a fraction of the time. As you can see, this greatly reduces downtime and all the other negative effects of service outages. This is the KEDB in action!
A KEDB record contains the details of the incident, when the error occurred and what was done to fix it. For quick resolution, however, the KEDB needs to be powerful enough to retrieve relevant records using filters and search terms. Without a KEDB, service management organizations tend to keep reinventing the wheel instead of working to build a mature organization that dedicates its resources to improving services.
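As a rough illustration of the record contents and the filter-and-search retrieval described above, here is a minimal Python sketch. The class name `KnownErrorRecord`, its field names, the `search` helper, the record IDs, and the dates are all assumptions made up for this example; they do not come from ITIL or from any particular ITSM tool.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class KnownErrorRecord:
    """One KEDB entry; the field names are illustrative assumptions."""
    record_id: str
    service: str        # affected IT service, e.g. "email"
    symptoms: str       # details of the incident as reported
    occurred_on: date   # when the error occurred
    root_cause: str     # the identified problem
    workaround: str     # what was done to restore the service


def search(records, service=None, term=None):
    """Return records matching an optional service filter and search term."""
    results = []
    for rec in records:
        if service is not None and rec.service != service:
            continue
        haystack = f"{rec.symptoms} {rec.root_cause} {rec.workaround}".lower()
        if term is not None and term.lower() not in haystack:
            continue
        results.append(rec)
    return results


# Sample data invented for the example (IDs and dates are placeholders).
kedb = [
    KnownErrorRecord("KE-001", "email", "Users cannot send or receive mail",
                     date(2024, 1, 15), "Critical service stuck in sleep mode",
                     "Stop and restart the service"),
    KnownErrorRecord("KE-002", "printing", "Desk printer does not power on",
                     date(2024, 2, 3), "Cause not yet confirmed",
                     "Send jobs to the shared printer in the lobby"),
]

for rec in search(kedb, service="email", term="restart"):
    print(rec.record_id, "->", rec.workaround)
```

A real KEDB would normally sit behind a database or an ITSM tool with indexed search; the linear scan here only shows the idea of combining a filter with a search term.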
In the event of a service outage, there are two ways to restore the service. The first and most ideal is a permanent solution. A permanent solution involves a fix that ensures the failure does not recur, at least to some degree. The second and most common type of recovery is the workaround, which looks for an alternate way to restore the service. After a workaround, a permanent solution is usually identified and implemented at a later time.
If the email service is not working, restarting the service is a workaround. The technical staff know that this will fix the problem on the spot (which is of great importance), but that the incident will recur in the future. Before it recurs, the technical team must investigate why the service stops responding and find a permanent solution.
Let's take a look at another classic example I've used over and over in training courses - this really brings home the workaround and permanent solution concepts. Imagine that the printer in your cubicle has stopped working and you need it right away. You log an incident with your technical staff, informing them that you are about to enter a meeting with a customer and need to print some documents. The support agent finds that the printer cannot be fixed in time and offers you a workaround: send your files to a shared printer in the lobby.
The workaround helps because your goal is to get the printouts and attend the meeting. However, you don't want to do this every time you need to print. So when the meeting is over, you push for a permanent solution. When you come back, your printer is working and there is a notification from the support staff that the power cord was faulty and has been replaced. This is a permanent solution. And while there is a chance that the new cable will fail as well, the odds are in your favor.
In short, the workaround is a temporary solution. As the term suggests, the permanent solution is permanent.
Why did I discuss workarounds and permanent solutions in a post about the KEDB? Known errors exist because the fix in place is temporary. The known error database consists of records for which there is no permanent solution yet, only a workaround. Once a permanent solution is implemented for a known error record, the record can be deleted or archived for reference. Known error records with a permanent solution implemented do not need to be part of the KEDB.
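The record lifecycle just described can be sketched in a few lines of Python. This is only an illustration under assumed names (`KnownError`, `KEDB`, `apply_permanent_fix`, and the record ID are hypothetical): a record stays active while only a workaround exists and moves to an archive once a permanent solution is implemented.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class KnownError:
    record_id: str
    workaround: str
    permanent_fix: Optional[str] = None  # None while only a workaround exists


class KEDB:
    """Hypothetical in-memory KEDB with an archive for resolved records."""

    def __init__(self):
        self.active = {}    # known errors that still rely on a workaround
        self.archive = {}   # resolved records kept for reference

    def add(self, record):
        self.active[record.record_id] = record

    def apply_permanent_fix(self, record_id, fix):
        """Record the fix and move the entry out of the active KEDB."""
        record = self.active.pop(record_id)
        record.permanent_fix = fix
        self.archive[record_id] = record
        return record


kedb = KEDB()
kedb.add(KnownError("KE-002", "Send jobs to the shared printer in the lobby"))
kedb.apply_permanent_fix("KE-002", "Replaced the faulty power cord")
print(sorted(kedb.active), sorted(kedb.archive))
```

Archiving rather than deleting is a design choice in this sketch; it keeps the history available if a similar incident appears later, at the cost of storing records that are no longer known errors.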
This concept is explored further in the next section, which discusses the various process trees for creating, using, and storing known error records.