Process - Problem Control
The main objective of Problem Control is to turn problems into Known Errors so that Error Control can propose the relevant solutions.
Problem Control consists of three phases:
1. Identification and Logging
One of the main tasks of Problem Management is to identify problems. The main sources of information used are:
- The incident database: in principle, any incident whose cause is unknown and which was closed by means of a work-around is potentially a problem. Before raising it to the category of a problem, however, it is necessary to check whether the incident is isolated and to assess its impact on the IT structure.
- Analysis of IT infrastructure: in collaboration with Availability Management and Capacity Management, Problem Management needs to analyse the different processes and determine the aspects in which IT systems and structures need to be bolstered in order to avoid future problems.
- Service Level Degradation: a decline in performance may be an indication of underlying problems that have not manifested themselves explicitly as incidents.
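The incident-database check described in the first item above can be sketched roughly as follows. This is a minimal illustration, not a prescribed method: the `Incident` fields, the `candidate_problems` function and the `min_occurrences` threshold (used here as a simple stand-in for "the incident is not isolated") are all hypothetical names and assumptions.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Incident:
    # Hypothetical, simplified incident record.
    incident_id: str
    ci: str                       # configuration item affected
    root_cause: Optional[str]     # None when the cause is unknown
    closed_with_workaround: bool


def candidate_problems(incidents: List[Incident], min_occurrences: int = 2) -> List[str]:
    """Return the CIs that have at least `min_occurrences` incidents of
    unknown cause that were closed by means of a work-around."""
    counts = Counter(
        inc.ci
        for inc in incidents
        if inc.root_cause is None and inc.closed_with_workaround
    )
    return sorted(ci for ci, n in counts.items() if n >= min_occurrences)
```

In practice the threshold would be weighed together with the impact of each incident; a single incident with severe impact may justify opening a problem on its own.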
All IT infrastructure areas need to work with Problem Management to identify real and potential problems, and to report any symptoms that may signal a deterioration in the IT service.
The problem log is similar to the incident log, except that the emphasis is not on the specific details of the associated incidents but on the nature and possible impact of the problem.
Among other things, the log should include information about:
- The CIs involved.
- Causes of the problem.
- Associated symptoms.
- Temporary solutions.
- Services involved.
- Levels of urgency, priority and impact.
- Status: active, known error, closed.
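As an illustration, the fields listed above could be held in a record like the one below. This is only a sketch: the class and field names, the numeric scales and the default values are assumptions, not part of any standard schema.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class Status(Enum):
    ACTIVE = "active"
    KNOWN_ERROR = "known error"
    CLOSED = "closed"


@dataclass
class ProblemRecord:
    # Hypothetical problem-log entry covering the items listed above.
    problem_id: str
    cis: List[str]                 # configuration items involved
    services: List[str]            # services involved
    symptoms: List[str]            # associated symptoms
    causes: List[str] = field(default_factory=list)       # empty until diagnosed
    workarounds: List[str] = field(default_factory=list)  # temporary solutions
    urgency: int = 3               # assumed scale: 1 (high) to 3 (low)
    impact: int = 3                # assumed scale: 1 (high) to 3 (low)
    priority: int = 5              # assumed scale: 1 (highest) to 5 (lowest)
    status: Status = Status.ACTIVE
```

A new entry would typically start with only the CIs, services and symptoms filled in, for example `ProblemRecord("P-001", ["mail-srv"], ["e-mail"], ["slow delivery"])`.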
2. Classification and Allocation of Resources
Problems are classified according to their general characteristics, such as whether they are hardware or software problems, the functional areas affected and details of the various configuration items (CIs) involved.
An essential factor is determining the priority of the problem, which, as in the case of incidents, is based on its urgency (the acceptable delay in solving the problem) and the impact (degree of deterioration in the quality of service).
As in the case of Incident Management, the priority may change over the life cycle of the problem, for example if a temporary solution is found that considerably reduces its impact.
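One simple way to derive a priority from urgency and impact is sketched below. The three-level scales and the additive rule are assumptions chosen for illustration; organisations usually define their own priority matrix.

```python
def priority(urgency: int, impact: int) -> int:
    """Combine urgency and impact (each 1 = high, 2 = medium, 3 = low)
    into a priority from 1 (highest) to 5 (lowest).

    The additive rule is an illustrative assumption, not a standard.
    """
    if urgency not in (1, 2, 3) or impact not in (1, 2, 3):
        raise ValueError("urgency and impact must each be 1, 2 or 3")
    return urgency + impact - 1


# A high-urgency, high-impact problem starts at the top priority.
before = priority(urgency=1, impact=1)

# After a work-around reduces the impact to low, the priority drops.
after = priority(urgency=1, impact=3)
```

Recomputing the priority whenever urgency or impact is updated keeps the problem log consistent with the current state of the problem.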
Once the problem has been classified and its priority defined, the resources necessary to solve the problem should be assigned. These resources need to be sufficient to ensure that the associated problems are dealt with effectively and the impact on the IT infrastructure minimised.
3. Analysis and Diagnosis: Known Error
The main objectives of the process of analysis are:
- Determining the causes of the problem.
- Providing work-arounds for Incident Management to minimise the impact of the problem until the changes needed to resolve it definitively have been made.
It is essential to bear in mind that the source of a problem is not always a hardware or software fault. It is commonplace for problems to be caused by:
- Errors of procedure.
- Incorrect documentation.
- A lack of coordination between different areas.
It is also possible for the cause of the problem to be a well-known bug in one of the applications used. It is therefore a good idea to establish direct contact with the development environment, in the case of applications developed in-house, or to search the Internet for known errors applicable to the problem in question.
Once the causes of the problem have been determined, it becomes a Known Error and is forwarded to Error Control for processing.
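The hand-off described above can be sketched as follows. The dictionary layout, the `record_diagnosis` function and the list standing in for the Error Control queue are all hypothetical; a real tool would use its own data model and workflow engine.

```python
from typing import Dict, List


def record_diagnosis(problem: Dict, causes: List[str], error_control_queue: List[Dict]) -> Dict:
    """Attach the diagnosed causes to a problem. If at least one cause was
    found, return a copy marked as a known error and append that copy to
    the Error Control queue; otherwise return the problem unchanged."""
    if not causes:
        return problem  # diagnosis still open, nothing to forward
    updated = {**problem, "causes": list(causes), "status": "known error"}
    error_control_queue.append(updated)
    return updated
```

Returning a copy rather than mutating the input keeps the original log entry available for audit, which is a design choice of this sketch rather than a requirement of the process.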