How to detect Stuxnet, Irongate and other similar malware in Automation & Control Systems (PLC/DCS/SCADA) or Safety Instrumented Systems (SIS/ESD)
Stuxnet is arguably the world’s best known piece of Industrial Control System malware. It was apparently used to sabotage uranium enrichment equipment at Iran’s Natanz nuclear facility, where it damaged a lot of critical equipment. It did not stay inside Natanz, however. It spread widely, and within a couple of years it was found on hundreds of other Siemens control systems in Asia, Europe and elsewhere. After it was discovered, somewhat accidentally, researchers analyzed it, inferred its objectives and revealed their findings to the world. They also put the code online (presumably so that other security researchers could analyze it and take steps to harden other control systems).
The entire episode put Industrial Cybersecurity back into focus. Several cybersecurity experts and others had been warning about industrial malware for quite a long time, but nobody in a position of authority seriously believed that something like this could actually happen in the real world. In that respect, Stuxnet was a big wake-up call to everybody in the Industrial Instrumentation, Control Systems, Automation Systems, Safety Instrumented Systems and Functional Safety fraternity. It also got noticed by those who were warning about cyberwar. Until then, cyberwar experts had been studying the implications of hostile agents attacking banking, internet, telecom and other infrastructure, including the electrical SCADA systems used in power distribution. Cyber attacks on industrial plants, however, were considered to be more in the realm of science fiction than reality. Stuxnet shattered that myth.
II. IRONGATE: STUXNET IN A NEW AVATAR?
Once the Stuxnet code was out in the open, cybersecurity and cyberwar experts warned about copycat attacks using similar code. Any malicious actor could simply copy it and use it to build similar attacks. This apparently has been done: a malware family named “Irongate” has been discovered by FireEye, who have released a report about it on their blog.
III. MALWARE COMPONENTS
Note that any well-written malware has several components. One part may use so-called “Zero Day” exploits to get into an unsuspecting control system surreptitiously, another part may use obfuscation techniques to hide itself, and yet another part may be the main payload (the part that actually carries out the tasks set by the designers). Stuxnet, for example, used as many as four zero-day exploits to spread. However, one of the main components of the Stuxnet payload was its MITM (Man In The Middle) component, which intercepted the commands and data passing between the PLC and the Control System’s operator and engineering stations. This MITM component observed and recorded normal equipment behavior and used the recording to generate fake data, which it sent to the operator consoles. It played back the same data again and again, which lulled operators into thinking that everything was running normally. It also intercepted Operator setpoint commands and other On/Off commands and changed them in a way that would damage the equipment under control. For example, the code increased the speed of a centrifuge by 150% of the setpoint entered by the operator, and it displayed the Process Value (the same centrifuge speed) as lower than the actual value, so as to keep the operator in the dark about the real speed of the centrifuge. The same kind of MITM component has been found in Irongate.
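To make the MITM behavior described above concrete, here is a minimal toy simulation in Python. It is only an illustration of the record, replay and tamper pattern and is not taken from Stuxnet or Irongate. The class name ToyMitm, its methods and the tampering factor of 1.5 are all assumptions made for this sketch.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ToyMitm:
    """Hypothetical toy model of a record/replay/tamper component.

    It sits between the operator station and the PLC in this simulation only.
    """
    recorded: List[float] = field(default_factory=list)  # values seen during normal operation
    replay_index: int = 0
    attacking: bool = False

    def forward_setpoint(self, operator_setpoint: float) -> float:
        """Return the setpoint that the simulated PLC actually receives."""
        if self.attacking:
            return operator_setpoint * 1.5  # illustrative tampering factor (assumption)
        return operator_setpoint

    def displayed_value(self, actual_value: float) -> float:
        """Return the process value that the operator console shows."""
        if not self.attacking or not self.recorded:
            # Recording phase: pass real data through and remember it.
            self.recorded.append(actual_value)
            return actual_value
        # Attack phase: replay old "normal" data instead of the real value.
        value = self.recorded[self.replay_index % len(self.recorded)]
        self.replay_index += 1
        return value


# Normal operation: the component silently records what it sees.
mitm = ToyMitm()
for speed in [1000.0, 1002.0, 998.0]:
    mitm.displayed_value(speed)

# Attack: the operator asks for 1000, the PLC is told 1500,
# and the console keeps showing a recorded "normal" value.
mitm.attacking = True
plc_setpoint = mitm.forward_setpoint(1000.0)
shown_to_operator = mitm.displayed_value(plc_setpoint)
print(plc_setpoint, shown_to_operator)
```

The point of the sketch is that the operator console and the real equipment disagree, while nothing on the console itself looks abnormal. Only an independent observation of the equipment can reveal the difference.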
IV. USING FUNCTIONAL SAFETY TO DETECT MALWARE
So how can we detect and stop this kind of malware? Conventional antivirus systems will not be able to distinguish it from other pieces of control system software. Indeed, “Irongate” may not raise any red flags at all if it is present on a Control or Safety system. The only way to detect it is by using methods related to Functional Safety. How?
The International Functional Safety standard IEC 61511 says that Proof Testing is essential to detect dangerous faults in Safety Instrumented Systems. We can use proof testing to find all kinds of dangerous faults, including the presence of such malware.
Proof Testing can determine whether the Safety Instrumented System (or any non-safety system such as a DCS, PLC or SCADA system) is really working as designed and has not been hijacked by malicious code. In a proof test, the inputs are simulated and we check whether the Control System (or SIS, as the case may be) behaves as expected. For example, suppose a temperature control loop cools a reactor by opening a control valve on a coolant line whenever the temperature inside the reactor exceeds the set point. We can simulate a high temperature in the proof test and check whether the valve actually opens as expected.
In the case of Safety Instrumented Systems, the proof test interval is decided when the Safety Function (safety loop) is designed. Typically, proof tests may be run every year or every 6 months, depending on the SIL that has to be achieved. Proof tests meant to catch Stuxnet-like malware, however, need to be run more frequently. In fact, they should be designed so that every day or every shift, at least one loop in a particular controller can be proof tested by a field operator and/or Instrument technician, who verifies at the equipment that the desired action, such as opening or closing a valve or starting a motor, really did take place. You can rotate the controllers or loops to be tested, depending on their criticality. So, for example, if our temperature control loop were critical, it could be tested every day.
The proof test protocol has to be well thought out and detailed, so that all behavior, including the generation of alarms and logging, is checked against what is expected. If any Stuxnet-like component is present, the proof test will fail, because the action observed in the field will not match what was commanded or what is displayed to the operator. This should immediately raise the alarm (in a managerial sense), so that the Plant engineer or manager can analyze the failure and take appropriate action.
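The pass/fail logic of such a proof test can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not a prescribed protocol: the function evaluate_proof_test, the ProofTestResult record, the loop tag TIC-101 and the three checks are hypothetical names chosen for illustration. The key assumption is that field_valve_open is observed independently at the valve by the technician and is not read back from the control system.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ProofTestResult:
    loop_tag: str
    passed: bool
    reasons: List[str]


def evaluate_proof_test(
    loop_tag: str,
    expected_valve_open: bool,  # what the design says should happen for the simulated input
    field_valve_open: bool,     # what the technician sees at the valve (independent observation)
    hmi_valve_open: bool,       # what the operator console reports
    alarm_expected: bool,       # whether the protocol expects an alarm/log entry
    alarm_logged: bool,         # whether an alarm/log entry was actually recorded
) -> ProofTestResult:
    """Compare the design intent, the field observation and the operator display."""
    reasons = []
    if field_valve_open != expected_valve_open:
        reasons.append("final element did not reach the expected state")
    if hmi_valve_open != field_valve_open:
        reasons.append("operator display disagrees with the field observation")
    if alarm_logged != alarm_expected:
        reasons.append("alarm/log record does not match the protocol")
    return ProofTestResult(loop_tag, not reasons, reasons)


# Simulated high temperature: the coolant valve should open and an alarm should be logged.
healthy = evaluate_proof_test("TIC-101", True, True, True, True, True)

# The console claims the valve opened, but the technician sees it closed.
suspect = evaluate_proof_test("TIC-101", True, False, True, True, True)

print(healthy.passed, suspect.passed, suspect.reasons)
```

A failed result does not by itself prove a malware infection. It only flags the loop for the kind of analysis described in the next section.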
Note that some plants (especially multi-purpose batch plants) already have such proof test protocols, in which the safety interlocks or main control loops are tested before the start of every batch. These protocols were not designed to catch Stuxnet-like malware, but to catch failures in transmitters, control valves or other equipment that was prone to failure. The same tests can also catch these malware injections. In addition, many plants have periodic calibration and bump tests for their critical instruments. Simply modify these tests and their frequencies so that they will also reveal the presence of malicious code that acts as an MITM entity.
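Rotating the loops to be tested according to their criticality, as suggested earlier, amounts to a simple scheduling rule. The sketch below is one possible way to express it in Python. The function loops_due, the dictionary keys, the loop tags and the intervals are all hypothetical and would have to come from your own criticality assessment.

```python
from datetime import date, timedelta


def loops_due(loops, today):
    """Return the tags of loops whose proof test is due, most overdue first.

    Each loop is a dict with the (assumed) keys:
      tag           - loop identifier
      interval_days - test interval chosen from the loop's criticality
      last_tested   - date of the last successful proof test
    """
    due = []
    for loop in loops:
        days_overdue = (today - loop["last_tested"]).days - loop["interval_days"]
        if days_overdue >= 0:
            due.append((days_overdue, loop["tag"]))
    # Most overdue first; ties are broken alphabetically by tag.
    due.sort(key=lambda item: (-item[0], item[1]))
    return [tag for _, tag in due]


today = date(2024, 1, 15)
loops = [
    # Critical reactor temperature loop: tested every day.
    {"tag": "TIC-101", "interval_days": 1, "last_tested": today - timedelta(days=1)},
    # Less critical loops: tested weekly in this example.
    {"tag": "FIC-205", "interval_days": 7, "last_tested": today - timedelta(days=3)},
    {"tag": "LIC-310", "interval_days": 7, "last_tested": today - timedelta(days=10)},
]

print(loops_due(loops, today))
```

In this example the daily loop and the overdue weekly loop are listed for testing, while the weekly loop tested three days ago is skipped.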
V. WHAT NEXT
So now, one fine morning, you discover that proof testing has revealed a failure. The failure is not necessarily due to malware. It may be due to other faults in the DCS, PLC or SIS logic solver. Carefully eliminate these possibilities first. If the finding is that it is indeed a malware infection, proceed with whatever emergency plan has been put in place for such incidents, including reporting to the Computer Emergency Response Team (CERT) or ICS-CERT, as the case may be (say, in the US). Needless to say, you will have to carry out a safe shutdown of the plant or facility, as the behavior of the system may be unpredictable. Ask your DCS or control system/SIS vendor about THEIR emergency response plans for such cases and follow them. You may also have to take steps to locate the source of the infection, to avoid further problems down the line, and inform your organization's top management about it.
With a properly designed periodic proof testing and calibration checking plan, the likelihood of detecting industrial malware such as Stuxnet or Irongate increases dramatically. Note, however, that this is not the only step to be taken to protect your plant or facility from an industrial cyber attack. There are possibly many more types of industrial malware out there, and of course not all of them will use this Man In The Middle attack method. Hence please appreciate that there are many more aspects to this issue, including following a set of design and maintenance/operational guidelines that you will find in our upcoming e-learning course on Industrial Cybersecurity.