This text has grown out of an answer I provided in an on-line discussion group, posted on 16 November 2012. The issue I would like to address is the evident confusion in many minds between risk and impact. Often, you ask people for a list of risks and they provide, instead, a list of impacts, and vice versa.
Changes and the Management of Changes
We might start by distinguishing changes from the management of changes. Altering the start-up parameters of a database manager is a change. Filing a request for change, assessing and approving it, scheduling the change and evaluating the results is the management of changes. Changes are performed as activities in disciplines such as release management or capacity management. The change management discipline controls those changes and seeks means by which they may be facilitated.
We will see that certain impacts, and their attendant risks, concern the changes themselves, whereas other risks concern the management of those changes.
Measuring impact
The impact of a change may be measured from three perspectives:
- the impact that it will have on the users and customers of what has been changed
- the impact that the change, once completed, will have on the system used to create or deliver goods and services
- the impact that the change will have on that system during the implementation of the change
There are no fixed units for measuring impact, although all impact can ultimately be reduced to financial terms. (Many organizations do not have the know-how to make such measurements, but that does not mean it is not possible.) Depending on the nature of the impact, it might be measured in time (e.g., a feature becomes available sooner, or later); in productivity (e.g., users will be able to work 20% more quickly); in money (e.g., the change will save $10 million); in reliability or availability (e.g., the change will reduce the number of outages by 20%); and so forth.
An example will help illustrate the concepts. Suppose it is proposed to change a database by adding an index. The impact on users will presumably be an increase in productivity. Because the change will help certain queries to complete much more quickly, users will get their reports sooner and will see lower response times in the applications using that database. The impact on the system will include such factors as the greater volume of storage required for the database, the increased times required for ETL operations and the greater complexity in the database structure, leading to a slight increase in database management effort. During the implementation of the change, it might be necessary to back up the table concerned, drop that table, create the index and then restore the data to the table. This requires a period of downtime for any applications using the table that has been dropped.
Each impact has a risk
Associated with each impact is a specific risk. Risk is defined as uncertainty of outcome. For example, a change may be intended to increase productivity by 20%. This impact is attended by two types of uncertainty: will the desired and agreed change be implemented as agreed and desired; and if so, will the increase in productivity really be 20%? For the latter uncertainty, all we might be able to do is to estimate that in 80% of similar cases, the productivity will be increased anywhere from 10% to 30%. Risk is therefore expressed as a probability that an expected or desired value will fall within a given range, typically as a fraction of 100% probability. 100% probability means there is no uncertainty, that the impact is sure to occur.
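To make the idea of "a probability that a value will fall within a given range" concrete, here is a minimal sketch that estimates such a probability from the outcomes of similar past changes. The function name and the list of observed gains are hypothetical, invented purely for illustration; a real organization would draw these figures from its own change records.

```python
def probability_within(outcomes, low, high):
    """Return the fraction of observed outcomes that fall inside [low, high]."""
    if not outcomes:
        raise ValueError("need at least one observed outcome")
    hits = sum(1 for value in outcomes if low <= value <= high)
    return hits / len(outcomes)


# Hypothetical productivity gains (in %) observed after ten similar changes.
observed_gains = [12, 18, 25, 31, 9, 22, 15, 28, 20, 11]

# Estimated probability that the gain lands between 10% and 30%.
p = probability_within(observed_gains, 10, 30)
print(f"Estimated probability of a 10%-30% gain: {p:.0%}")
```

With only ten observations the estimate is itself quite uncertain, which is one reason why many organizations find such measurements hard to make.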
If we apply this concept of risk to the example cited above, we see the following examples of risk. Suppose the new index is designed to increase user productivity by 5% with a margin of ±2%. The downside risk of the change is that the productivity increase is lower than 3%, meaning that the payback on the investment in making and operating the change will not be high enough, or might even be negative. The upside risk is that the benefits will be higher than expected, which only means that there was an opportunity cost. The change probably should have been done much sooner. The risks to the system include such factors as the incorrect estimation of the additional space required in database storage or the additional time required for ETL operations. Suppose, for example, that there is a nightly ETL operation on the database lasting 7 hours, performed during a window of 7.5 hours. It is assumed that the new index will extend the data load time by 15 minutes, ±5 minutes depending on the number of updates to the indexed table. What is the risk that this estimate will be wrong, that there is an exceptionally large number of updates, and that the additional load time will be more than 30 minutes, thereby provoking an incident?
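The question about the ETL window can be given a rough numerical answer, but only by making assumptions that the example does not state. The sketch below assumes that the additional load time is normally distributed with a mean of 15 minutes and that "±5 minutes" represents one standard deviation. Both assumptions are mine, chosen for illustration; a different reading of the margin would give a very different result.

```python
from statistics import NormalDist

# Figures taken from the example: a 7-hour load in a 7.5-hour window.
window_minutes = 7.5 * 60
current_load_minutes = 7 * 60
slack_minutes = window_minutes - current_load_minutes  # 30 minutes of slack

# Assumption (not from the example): extra load time is normally distributed,
# mean 15 minutes, standard deviation 5 minutes.
extra_load = NormalDist(mu=15, sigma=5)

# Probability that the extra load time exceeds the available slack.
p_overrun = 1 - extra_load.cdf(slack_minutes)
print(f"Slack: {slack_minutes:.0f} min, probability of overrun: {p_overrun:.4%}")
```

Under these assumptions the overrun is a three-standard-deviation event, so it would occur on roughly one night in several hundred. A normal distribution may well understate the likelihood of an exceptionally large number of updates, so the figure is better treated as a lower bound than as a prediction.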
Types of change management risks
The preceding sections concerned changes themselves. In this section, we address the management of those changes. There is one top-level risk for the management of all changes: what is the probability that it will not achieve its agreed outcomes? In fact, this top-level risk concerns both the changes themselves and how they are managed. If we limit ourselves here to the management aspect, this top-level risk should be broken down into some more useful types, which may be examined in more detail. These risks include:
- Failure to identify a key impact
- Incorrect implicit or explicit assumptions
- Conflicts during change implementation
- Incorrect implementation
- Ineffective mitigation in case of failure
- Acts by a deity
If we look at the types of risks related to the changes themselves, as opposed to the management of change, every different type of change may have its own impacts and therefore its own risks. These types of risks are perhaps best managed by a combination of experience, skills and documented change models.
Failure to identify a key impact
Effective change management attempts to identify significant impacts for each change. The types of impacts have been described above. Failure to identify a significant impact may result in the change not achieving its objectives, because its planning did not take into account that impact, or because necessary mitigation was not implemented, or because the change itself turns out to be wrong-headed. The failure to identify such impacts may be due to the wrong composition of the CAB, or incorrect configuration information and other documentation, or due to poor performance by the CAB members. What if you decide NOT to perform a change, but you have incorrectly assessed the impact of not doing it?
Incorrect implicit or explicit assumptions
It is not feasible to identify and assess every possible impact and risk associated with a change. Therefore, when a change is authorized, it is done so on the basis of certain explicit, untested assumptions, as well as a large number of implicit assumptions.
For example, most of the risks that are managed by IT service continuity management are not explicitly managed by change management. Similarly, if a change is authorized on the basis of information coming from a configuration management system, there is always a risk that such information might be inaccurate or incomplete. Unless such issues are endemic to an organization, it makes little sense to reassess them for every change.
Conflicts during change implementation
Conflicts during the implementation of changes may lead to a delay in the implementation, a partial implementation only, or a complete failure to implement the change. These three results are in increasing order of loss of value.
Incorrect implementation, or implementation of something that differs from what was approved
Change management acts as a front end for configuration management. It provides to configuration management the definitions of the authorized states of the components under control. In principle, these components may be in only one of two states: the configuration before the change, and the authorized configuration after the change.
The risk of incorrect implementation of an authorized change is essentially that the controlled components might end up in a state that is neither the authorized state before the change, nor the one after the change. There is the additional sub-risk of ending up in a state that is quite unknown.
In addition to this risk, such changes have the additional impact of wasting effort, insofar as the change is likely to need redoing.
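The possible end states described in this section can be sketched as a small classification. The names below (EndState, classify_end_state) and the configuration dictionaries are hypothetical illustrations, not part of any configuration management tool. The sketch assumes that a configuration can be compared for equality and that None stands for a state that could not be determined.

```python
from enum import Enum


class EndState(Enum):
    BEFORE_CHANGE = "authorized state before the change"
    AFTER_CHANGE = "authorized state after the change"
    UNAUTHORIZED = "known state, but neither authorized configuration"
    UNKNOWN = "state could not be determined"


def classify_end_state(observed, before, after):
    """Compare the observed configuration with the two authorized ones."""
    if observed is None:  # assumption: None means the audit found nothing usable
        return EndState.UNKNOWN
    if observed == after:
        return EndState.AFTER_CHANGE
    if observed == before:
        return EndState.BEFORE_CHANGE
    return EndState.UNAUTHORIZED


# Hypothetical configurations for the index example used earlier.
before = {"orders_date_index": False, "orders_table_present": True}
after = {"orders_date_index": True, "orders_table_present": True}
half_done = {"orders_date_index": False, "orders_table_present": False}

print(classify_end_state(half_done, before, after).value)
```

Only the first two states are acceptable outcomes; the last two are the incorrect implementation and the unknown state discussed above.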
Ineffective mitigation in case of failure
As a reminder, the mitigation of change failure refers to reducing the impact of possible failure in a planned way. The typical mitigation of change failure is to roll back the change, restoring the system to its state before the change implementation started. Sometimes, however, a roll-back is not practical, or not possible, or not worthwhile. Therefore, other types of mitigation might be planned. The mitigation of failed changes may be ineffective due to inadequate or unrealistic planning, or due to untested or incorrectly implemented mitigation.
Here are some examples of the types of risk described above, ending with ineffective mitigation, each phrased as a question of likelihood:
1. You release a new application version, assuming that all target platforms have the prerequisite underpinning hardware and software. How likely is this assumption to be correct?
2. How likely is it that an unplanned or unapproved conflicting change will occur at the same time? How likely is it that a change on which the current change depends will not have been finished on time, or not done at all, or will have failed?
3. Do the implementers know their jobs? Do they have the requisite skills? Is the implementation plan coherent? Has the change been practiced or tested?
4. You back up your data before a change; the change fails; and the restoration of the backed up data fails. How likely is that?
Acts by a deity
This category covers all the impacts and risks that are so unlikely to occur, and over which it is so nearly impossible to have any control, that the risks remain unmanaged. They are similar to what the ancient Greeks called ἀπὸ μηχανῆς θεός, a deus ex machina for the Romans. Certain completely unpredictable events may occur that have significant impact on the planned change. These events are the same as the threats that contribute to the calculation of risk, as per the Management of Risk® approach. Any analysis of risk has the potential for identifying an unlimited number of threats, ranging from the sublime to the absurd. Most of these threats are so unlikely or so unmanageable that we do nothing about them.
Thus, rather than resolving the underlying cause of the error, a system was established that radically reduced productivity, without any evidence that the error would ever be repeated. Every single error resulted in a similar layer of control, creating an exceedingly slow process, heavily dependent on the physical presence of managers.
For example, a change might fail because an airplane crashes into the data center while the change is in progress. Unless the data center is located at the end of a runway at a major airport, this sort of event is extremely unlikely. But even more important is the fact that the realization of such an event will not change our way of managing risk. A meteorite might crash through the ceiling of our building. That does not mean that we will create anti-meteorite protection for our building. Indeed, we are not even likely to put in place a change freeze during the Perseid or Leonid meteor showers. That being said, one of the great dangers of change management is the continuous accretion of new controls that are supposed to limit risks that have been realized. See the example in the inset.
Another issue is that it may be difficult to distinguish between a genuine act of a deity and a case where ignorance or a misunderstanding of the principles of probability—the so-called Black Swan effect—has led to a failure to manage a risk that was both manageable and not at all improbable.
What does all this mean for your management of changes?
The effective and efficient management of risk during changes requires a clear understanding of what might threaten both the change itself and the methods used to manage the change. For each of those threats, the likelihood of its being realized also requires assessment. As we have seen above, the failure to perform these analyses adequately is itself a risk to the management of changes.
In summary:
- use a systematic approach to identifying the types of impacts related both to changes and to the management of changes
- the size of impact alone is not sufficient cause for implementing controls on a change; the probability of that impact occurring and the degree of uncertainty should also be taken into account
- creating systematic controls for risks in the management of change is a very effective way of preventing both the efficient handling of changes and the acceleration of the volume of changes implemented, which is one of the main goals of change management. At the same time, such controls may have doubtful value in reducing uncertainty and limited value in reducing the probability that a negative impact will be realized.