Wednesday, December 8, 2010

Study notes on Cobit: Assessing and managing IT risks

Life could be at stake

Suppose that you are the CIO of a hospital. To let staff quickly locate expensive medical equipment, you implemented a WiFi tag tracking system, and it had been working fine. Then one day a patient was rushed to the ICU for an urgent operation, and the surgeon needed a critical but rarely used piece of equipment that simply couldn't be found. As a result, the patient died, and the hospital was sued and ordered to pay a million dollars in damages. The incident was also widely reported in the media. The investigation showed that the WiFi tag on that piece of equipment had simply run out of battery. Because the hospital had no procedure for checking for low-battery WiFi tags, this was considered negligence.
The above example shows a risk (medical equipment not being found when it is urgently needed) that has materialized. It had a very serious impact (loss of human life, damage to the hospital's public image, financial loss). So the question is: how do you mitigate such risks?

Identifying and assessing the risks

The most important thing is to identify the risks first; a risk that nobody has identified will never be considered at all. How do you identify them? As risks are not just about IT, your organization should appoint someone senior to take overall accountability for risk management in the company. This role could be the chief risk officer. He or she should form an enterprise risk management committee whose members include the chief risk officer, the CIO (or someone in IT responsible for IT risks), the compliance officer (for legal risks), and probably other business executives. The members then work together to identify the risks, for example in a brainstorming session.
As there are so many possible risks to consider (the Earth being hit by a meteor from outer space?), it is impractical to list them all. Instead, you should focus on those applicable to your organization's activities and environment (treating patients, laws for hospitals, local labor laws, whether you are in an earthquake zone, etc.). This is called establishing the risk context. Then, you should focus on those risks with a high impact and/or a high probability (so you're assessing the impacts and probabilities of the risks at the same time). For example:

  1. High impact and high probability, such as a baby being stolen (assuming you have very poor security) or a patient dying due to a wrong dose of medication (assuming your staff are very careless). Usually this kind of risk does not exist for very long, as the organization would soon go out of business.

  2. High impact and medium probability such as the WiFi tag case above.

  3. High impact and low probability, such as an avian flu pandemic.

  4. Medium impact and high probability, such as the hardware failure of a server.

Note that there is no strict definition of what counts as a high impact (is MOP10,000 a high impact or just a medium one?) or a high probability (once a month or once a year?); it all depends on a rough consensus among the risk management committee members.
In addition, it is usually very difficult to quantify the impact as a dollar amount (how much is a human life worth in dollars?) or to quantify the probability as a number (is the chance of an avian flu pandemic this year 0.1%?). So, in most cases you will just roughly classify them as high, medium or low.
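
As a rough illustration, the four example risks above can be rated on a three-step scale and ranked. This is only a sketch: the `Level` scale, the `Risk` class and the idea of multiplying the two ratings into a score are assumptions made for this example, not something prescribed by Cobit.

```python
from dataclasses import dataclass
from enum import IntEnum


class Level(IntEnum):
    """Rough three-step scale agreed on by the committee."""
    LOW = 1
    MEDIUM = 2
    HIGH = 3


@dataclass(frozen=True)
class Risk:
    name: str
    impact: Level
    probability: Level

    @property
    def score(self) -> int:
        # Assumption: impact x probability is used only as a ranking heuristic.
        return int(self.impact) * int(self.probability)


def prioritize(risks):
    """Return risks from most to least urgent; ties go to the higher impact."""
    return sorted(risks, key=lambda r: (r.score, r.impact), reverse=True)


risks = [
    Risk("Avian flu pandemic", Level.HIGH, Level.LOW),
    Risk("Server hardware failure", Level.MEDIUM, Level.HIGH),
    Risk("WiFi tag battery runs out", Level.HIGH, Level.MEDIUM),
    Risk("Baby stolen", Level.HIGH, Level.HIGH),
]

for risk in prioritize(risks):
    print(f"{risk.score}  {risk.name} "
          f"(impact={risk.impact.name}, probability={risk.probability.name})")
```

The score has no meaning beyond ordering the list; the committee's judgment still decides what counts as high, medium or low in the first place.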

Mitigating the risks

Once you have the list of risks (called the risk register), the risk management committee can prioritize them (by their impact and probability) and consider how to deal with each of them. How? It is simple: either reduce the probability or reduce the impact. For example:

  • For the risk of babies being stolen due to poor security, just enhance the security to reduce the probability of losing a baby. Put security guards at various spots in the hospital (in particular, the baby room). Attach WiFi or RFID tags to the babies for checking whenever they are transported.

  • For the risk of WiFi tags running out of battery, locate all the equipment every day to detect low-battery tags and replace the batteries as required. This reduces the probability of a tag running out of battery when the equipment is needed.

  • For the risk of an avian flu pandemic, while you can't reduce the probability, you can reduce the impact. For example, set up a remote interaction system between the hospital and a quarantined facility for avian flu patients that supports remote diagnosis and automated delivery of medication, so that the medical staff don't need to physically enter that facility and won't get infected.

  • For the risk of a server hardware failure, you can reduce the impact by having a stand-by server, a 2-hour emergency repair service contract, a pool of spare parts, etc.

As you can see above, to mitigate the risks you may need to change your procedures (e.g., locating all the equipment every day), your IT infrastructure (a stand-by server) and your contracts (emergency repair). In other words, you integrate risk management into all elements of your enterprise.
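
The daily equipment sweep is the kind of procedure that is easy to automate. Below is a minimal sketch, assuming the tracking system can report a battery percentage for each tag, or nothing at all if the tag does not respond. The `TagReading` structure and the 20% threshold are hypothetical; a real tag system will have its own interface.

```python
from dataclasses import dataclass
from typing import Optional

LOW_BATTERY_PERCENT = 20  # assumed threshold, tune it to the tags you use


@dataclass(frozen=True)
class TagReading:
    equipment: str
    battery_percent: Optional[int]  # None means the tag did not respond


def tags_needing_attention(readings, threshold=LOW_BATTERY_PERCENT):
    """Return (equipment, reason) pairs for tags that are silent or low on battery."""
    problems = []
    for reading in readings:
        if reading.battery_percent is None:
            problems.append((reading.equipment, "tag not found"))
        elif reading.battery_percent <= threshold:
            problems.append((reading.equipment, "low battery"))
    return problems


# Made-up readings from one day's sweep.
todays_sweep = [
    TagReading("Infusion pump 3", 85),
    TagReading("Portable ventilator", 12),
    TagReading("Rarely used surgical kit", None),
]

for equipment, reason in tags_needing_attention(todays_sweep):
    print(f"{equipment}: {reason}")
```

A tag that does not respond is reported as well, since a dead battery looks exactly like that to the tracking system.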

Transferring the risks

Sometimes it is just impossible or too difficult to reduce the probability or the impact of a risk by yourself. For example, a patient may die due to a mistake by the doctor. In that case, you can buy insurance: you pay a certain small amount (a small impact but 100% probability). Essentially, you are trading with the insurance company: you give up a risk with a high impact (could be millions) and a low probability, and take on one with a small impact and a high probability. For a smaller organization with limited cash such as your hospital, the latter is easier to absorb. For a larger organization with a lot of cash such as the insurance company, taking on the former is profitable.
In addition to insuring, outsourcing is another way to trade the risk: you just pay someone else to do the risky task for you. You pay more but the outsourcer takes the risk.
In either case, the payment has a 100% probability, so in the everyday sense it is no longer a risk: it will definitely happen. Therefore, you may view this as transferring the risk to the insurance company or the outsourcer by paying them.
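
The trade can be made concrete with a little arithmetic. Every figure below is made up purely for illustration (the payout, the odds and the premium are all assumptions), but it shows why both sides can be happy with the deal.

```python
# All figures are made-up illustrations, not real actuarial data.
loss_if_it_happens = 1_000_000   # payout for one fatal mistake
one_in_n_years = 1_000           # assumed odds: once in 1,000 years
annual_premium = 1_500           # what the insurer charges per year

# Average loss per year if the hospital keeps the risk itself.
expected_annual_loss = loss_if_it_happens / one_in_n_years

# The insurer's average yearly margin on this one policy.
insurer_margin = annual_premium - expected_annual_loss

print(f"Expected annual loss without insurance: {expected_annual_loss:,.0f}")
print(f"Worst single-year loss without insurance: {loss_if_it_happens:,}")
print(f"Certain yearly cost with insurance: {annual_premium:,}")
print(f"Insurer's average margin per year: {insurer_margin:,.0f}")
```

With these numbers the hospital pays more than its average loss, but it replaces a possible million-dollar hit with a small, predictable cost, while the insurer earns its margin by spreading the same risk over many policies.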

Monitoring and reviewing the risks

So, you have taken some measures to mitigate the risks, but do they work? You should review them regularly to see whether they really reduce the probability or the impact enough. If yes, fine. If not, you need to come up with more mitigation measures. To allow for such reviews, you should record the corresponding mitigation measures for each risk in the risk register.
Will a certain existing or new risk become more prominent? For example, are there indications that an avian flu pandemic is coming? Is a new law coming into effect? To deal with these cases, the risk management committee should review and update the risk register regularly.
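
A risk register that records the mitigation measures and the date of the last review makes such regular reviews easy to drive. The sketch below is one possible shape for it; the field names, the 90-day review interval and the sample entries are all assumptions for illustration.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta

REVIEW_INTERVAL = timedelta(days=90)  # assumed quarterly review cycle


@dataclass
class RegisterEntry:
    risk: str
    residual_level: str            # "high", "medium" or "low" after mitigation
    last_reviewed: date
    mitigations: list = field(default_factory=list)


def needs_attention(entry, today, interval=REVIEW_INTERVAL):
    """Return the reasons an entry should go back to the committee."""
    reasons = []
    if today - entry.last_reviewed > interval:
        reasons.append("review overdue")
    if entry.residual_level == "high":
        reasons.append("still high after mitigation")
    if not entry.mitigations:
        reasons.append("no mitigation recorded")
    return reasons


# Made-up register entries.
register = [
    RegisterEntry("WiFi tag battery runs out", "low", date(2010, 11, 1),
                  ["Daily equipment sweep"]),
    RegisterEntry("Server hardware failure", "medium", date(2010, 6, 1),
                  ["Stand-by server", "2-hour repair contract"]),
    RegisterEntry("Avian flu pandemic", "high", date(2010, 11, 15)),
]

today = date(2010, 12, 8)
for entry in register:
    reasons = needs_attention(entry, today)
    if reasons:
        print(f"{entry.risk}: {', '.join(reasons)}")
```

Entries that come back with no reasons can wait for the next cycle; the others go onto the committee's agenda.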