Big Data is a technology that is growing in importance across a wide variety of business sectors. It is about applying special techniques to large amounts of data in order to gain insights and actionable information. Its importance can be gauged from the fact that it is considered a second Industrial Revolution, meaning that it will completely change how business is conducted across a wide swathe of enterprises and markets. Until recently, Big Data was mainly used in Retail (both bricks and mortar and E-commerce), Online Advertising, Healthcare & Medical Insurance, Banking and Financial markets. However, it is now also being used in traditional industries such as Oil and Gas, for example in smart Reservoir Management of oilfields.
In this White paper, we will examine how Big Data techniques can also be used fruitfully in the downstream industries related to Oil and Gas. These include industries such as Petrochemical & Chemical manufacturing, Pharmaceutical manufacturing, Power Generation and so on.
Note that as far as “Big Data Techniques and software” are concerned, problems and solutions need NOT really be industry specific. So the same Data Analysis techniques that are used to find patterns in Online buying behavior can be used to find patterns in Alarm Floods that happen during Startup. Domain knowledge is necessary only to know which of a particular domain's problems could be solved by Big Data techniques. In that respect, the same tools can be used again and again in different domains to solve different problems.
What is Big Data
If you recall the early days of computing history, data was considered “big” even if it was in Megabytes. Early PC models used to have RAM only of the order of 256MB (I still remember using one of those machines). At that point of time, data of the order of Gigabytes was considered “Big”. As computing power and memory densities increased exponentially over the years, 1 GB is no longer considered big; in fact, phone memories are of that order and PCs and servers are processing even bigger chunks of data.
Thus what was considered big just a few years ago is pretty small by today’s standards. Now when we talk about Big Data, we are talking about data sizes of the order of Petabytes, Exabytes and so on. Big Data is not just about the storage and processing of large amounts of data, but also about the fact that large amounts of data are being generated every day, and that this volume is growing exponentially as well. This is because in the last decade, large numbers of devices became “smart” due to embedded processors like microcontrollers and microprocessors. These devices started contributing large numbers of data points to their owners. These data points can be of a variety of types, be they clicks on a particular ad on Google, or pageviews on a Facebook ad, or the number of SKUs of a particular brand of cola in a Wal Mart warehouse, or the number of people passing through a busy Metro ticket turnstile in a city.
Traditional ways of number crunching and analysis could not cope with the amount of data that started being generated, and hence the science of Big Data evolved.
Note that when we refer to Big Data, we are not referring to just the volume of data, but also to the analysis techniques used in crunching this data. These are completely different from what we are normally used to. We do not use Excel, for example.
Big Data Generators in the Process Industries
The generation of data in the Process industries, on a really large scale, started with the gradual introduction of Smart Transmitters and Distributed Control Systems. In previous years, when pneumatic transmitters were used, data was of course being collected, but the scan rates were low. For example, a pneumatic pressure transmitter connected to a strip chart recorder would record the pressure value once every 30 seconds. When the pneumatic transmitter was replaced with an electronic one (analog electronic to start with and then a digital “smart” one equipped with a microprocessor), which was in turn connected to a Distributed Control System (again microprocessor based), the scan rate increased to one reading per second. So the data generated suddenly increased by 30 times. Then the DCS itself started generating data about process conditions in the form of various alarms. In the older plants a single pressure measurement had maybe one alarm point, say a High Pressure alarm, wired to inform the operator if the pressure increased above a certain value. The newer transmitter generated 6 alarms by default: High, High High, Low, Low Low, Diagnostic and Rate of change alarms. This was just regarding the Process Pressure.
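The 30-fold increase follows directly from the two scan intervals quoted above. A minimal Python sketch of the arithmetic (the function and variable names are illustrative only):

```python
SECONDS_PER_DAY = 24 * 60 * 60

def readings_per_day(seconds_between_readings):
    """Number of values recorded per day for a given scan interval."""
    return SECONDS_PER_DAY // seconds_between_readings

pneumatic = readings_per_day(30)  # strip chart: one value every 30 seconds
smart = readings_per_day(1)       # DCS: one value every second

print("pneumatic readings/day:", pneumatic)
print("smart readings/day:", smart)
print("increase factor:", smart // pneumatic)
```

This counts only the process value of a single measurement point; alarms and diagnostic data come on top of it.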
When the older 4-20 mA signals gave way to HART, Fieldbus and other technologies, the data generation increased even more. Today the transmitter can not only give us all these alarm values but also other information, such as when it was last calibrated, who did the calibration, when it is next due, how many times the process value went beyond a certain value, whether any surges were experienced in the pressure, whether the impulse lines are choked and so on.
This is about just one pressure sensing point in the plant. The declining cost of smart transmitters (in inflation adjusted terms) and the increase in (relatively) newer technology based instruments such as Mass Flow Meters, Fire and Gas detectors, Weight transmitters and various Analyzers has meant that the absolute number of instruments in plants today is far higher than it was, say, 30 years ago, when we had predominantly Differential Pressure transmitters (to measure Flow and Level), Pressure transmitters and Temperature transmitters.
Thus every sensor/transmitter in the process plant today is a data generator, which throws out large amounts of data.
Taken together, the larger number of transmitters, the higher scan rates and the new data parameters beyond just the Process Value imply that the data being generated would easily be 100 times what it was 30 years ago. And this is just from the field instrument side.
On the Control System side, we have other data about operator actions, operator logs, inventory receipts, dispatches and so on. Typically this kind of data was never available before. In earlier days the plant operator simply used a physical key to gain access to the Control System. There was no differentiation between operators; they all used the same key. Ditto for the Plant Manager and the Controls engineer; each used his own physical key to get privileged access for viewing logs, and for configuring and programming the system. Today’s systems have the facility of individual logins, so that we can track a particular operator's actions during the shift. So now this is yet another data generator.
Are there other data generators? Yes. Many plants these days have LIMS (short for Laboratory Information Management Systems) that collate lab sample analysis data. We may also have CMMS (Computerized Maintenance Management Systems) that collate data about maintenance work orders, equipment downtime, failure history and so on. There is a wealth of data in CMMS that can be used to gauge the reliability of the plant. Some plants may also have separate Inventory Management Systems for their tank farms, which store a lot of data that could be useful. Each of these is another data generator.
Note that these systems may or may not be connected to the plant's Control System or to the Enterprise systems.
Other data generators, which may not be directly connected to the Control System, are Safety Systems. These include not only Safety Instrumented Systems, which act as Emergency Shutdown Systems and Sequence of Events recorders, but also Incident Reporting systems, which record undesirable incidents such as accidents and near misses, as well as spurious trips.
Some plants may have other data stored somewhere on site, such as records of Safety Studies like HAZOP or What-If analysis, Hazardous Area Classification studies, MSDS (Material Safety Data Sheets) and plant modifications done over the years. These are not data generators, but data warehouses.
In short, we have a much larger amount of data today than we did 30 years ago, of the order of say 200X or even more.
How can we use this ginormous amount of data?
How to use this Big Data fruitfully
The data generators are installed by different companies, people and departments. For example, we have various kinds of instruments procured from various companies. Then we have various systems such as CMMS and LIMS, installed by different companies. As the owner/operator, the production company's management has to collate all the data from these disparate sources and then glean actionable information from it. Can we use Big Data techniques to do so? Can we find correlations (which may or may not be cause and effect) between these various chunks of information? Yes, we can certainly use Big Data techniques to derive valuable insights from this sea of data that has now started becoming available.
Alarm Management Systems
Due to the huge number of alarms being generated, literally every minute, in a process plant, alarm management has become a priority. Can we reduce the number of alarms? Can we build in a better hierarchy of alarms? Can we use Big Data techniques to do so? Certainly.
For example, we can gather data about near miss incidents from the Incident Reporting System and correlate it with the HAZOP study for the respective nodes. This can then be correlated with the alarms that were generated, and in turn with the operator actions taken, using the Sequence of Events recorder. We could then find out which alarms did act as early warnings, and also identify the spurious alarms that only caused operator confusion and information overload and can therefore be eliminated.
This requires gathering data from multiple data warehouses and then extracting meaningful information from this big data, using Big Data techniques.
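As a rough illustration of the alarm-to-incident correlation described above, the following Python sketch counts how often each alarm fired shortly before a recorded near miss. The tag names, timestamps and the 15-minute window are all hypothetical assumptions; real data would come from the alarm journal and the Incident Reporting System, and a real study would also bring in the HAZOP nodes and operator actions.

```python
from collections import Counter
from datetime import datetime, timedelta

# Hypothetical sample records (tag, time the alarm fired).
alarms = [
    ("PT-101.HI", datetime(2024, 1, 5, 10, 0)),
    ("LT-205.LO", datetime(2024, 1, 5, 10, 2)),
    ("PT-101.HI", datetime(2024, 2, 9, 14, 50)),
    ("FT-310.ROC", datetime(2024, 3, 1, 8, 0)),
    ("LT-205.LO", datetime(2024, 3, 1, 9, 0)),
]
# Hypothetical near miss times from the incident reports.
near_misses = [datetime(2024, 1, 5, 10, 10), datetime(2024, 2, 9, 15, 0)]

def alarm_hit_rates(alarms, incidents, window=timedelta(minutes=15)):
    """Map each tag to (times fired, times fired within `window` before an incident)."""
    fired = Counter()
    preceded = Counter()
    for tag, ts in alarms:
        fired[tag] += 1
        if any(timedelta(0) <= inc - ts <= window for inc in incidents):
            preceded[tag] += 1
    return {tag: (fired[tag], preceded[tag]) for tag in fired}

rates = alarm_hit_rates(alarms, near_misses)
for tag, (n_fired, n_before) in sorted(rates.items()):
    print(f"{tag}: fired {n_fired} times, {n_before} shortly before a near miss")
```

Alarms that regularly precede incidents are candidates for early warnings, while alarms that fire often but never precede anything are candidates for review. A count like this shows correlation only, not cause and effect.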
Advanced Process Control
Some process plants these days have advanced process control systems (APC for short). Traditional controls rely on individual closed loops; for example, in a distillation column we control the composition of the distillate by controlling the level at the bottom of the column and the reflux ratio at the top. The assumption is that these loops do not interact, but in fact they do, and so we may not get an optimum composition. APC attempts to construct a mathematical model of the process (say the distillation column) and then generates setpoints for the optimum column bottom level, reflux ratio and so on.
We can use big data to correlate the behavior of bigger units of the plant. So for example APC implementation on a particular Distillation Column may be causing disturbances in some downstream units. We can find this by using Big Data techniques.
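One simple way to look for such a cross-unit effect is a lagged correlation between an upstream variable and a downstream one. The Python sketch below is an illustration under stated assumptions, not a recommended method: the two series are made-up numbers in which the downstream value follows the upstream one with a delay of two samples, and `best_lag` is a hypothetical helper name.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def best_lag(upstream, downstream, max_lag):
    """Return (lag, r) for the delay in samples with the strongest correlation."""
    results = []
    for lag in range(max_lag + 1):
        x = upstream[: len(upstream) - lag]  # upstream values
        y = downstream[lag:]                 # downstream values `lag` samples later
        results.append((lag, pearson(x, y)))
    return max(results, key=lambda item: abs(item[1]))

# Hypothetical data: column reflux flow and a downstream feed temperature
# that responds two samples later.
reflux = [10, 12, 15, 11, 9, 14, 16, 12, 10, 13]
feed_temp = [80, 80] + [80 + 0.5 * (r - 10) for r in reflux[:-2]]

lag, r = best_lag(reflux, feed_temp, max_lag=4)
print(f"strongest correlation at a lag of {lag} samples (r = {r:.3f})")
```

On real plant data the relationship would be noisy and the candidate pairs of variables numerous, which is where large-scale analysis tools come in.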
For process plants that have hazardous processes using materials which may be toxic or explosive, we have Safety Instrumented Systems to ensure that undesirable incidents such as leaks, explosions and other such events are prevented. To design and maintain these systems, we use Reliability Engineering to calculate values such as the PFDavg (average Probability of Failure on Demand) for different Safety Functions. For example, we may need to calculate the reliability of a particular Safety Function that is used to prevent overfilling of a storage tank. To do so, one needs reliability data about each of the components that comprise this Safety Function, e.g. a transmitter and/or a level switch, coupled with logic solvers, solenoid valves, actuators and so on. Many times, actual field data about a particular instrument (its failure rate data) may not be available, and hence it is estimated on the basis of its design, the components used in it, an FMEA (Failure Modes and Effects Analysis) and so on. This gives us a predicted PFDavg of the device, from which we calculate the PFDavg of the Safety Function. Note that these are predicted values and not actual ones.
Now, with the advent of all these modern Control Systems, maintenance systems and incident reporting systems, we can have actual failure rate data from the field. Using this data is much better than using the predicted data. Thus if a plant has all these systems and is, say, 10 years old, then we would have much of the reliability data that we need for the various instruments. We can then use this historical data to analyze the reliability of the Safety Functions. The result will have a much higher confidence level than the predicted reliability figures. Using this data can then help increase the Safety Performance and even reduce costs.
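To make the calculation concrete, here is a minimal Python sketch using the common simplified approximation for a single (1oo1) component, PFDavg ≈ λDU × TI / 2, where λDU is the dangerous undetected failure rate and TI is the proof test interval. The PFDavg of the Safety Function is then approximated as the sum of the component values. The failure rates below are hypothetical placeholders, not data for any real device, and the sketch ignores redundancy, diagnostics, common cause failures and repair times.

```python
HOURS_PER_YEAR = 8760

def pfd_avg_1oo1(lambda_du_per_hour, proof_test_interval_hours):
    """Simplified PFDavg of a single component: lambda_DU * TI / 2."""
    return lambda_du_per_hour * proof_test_interval_hours / 2

# Hypothetical dangerous undetected failure rates (per hour) for the
# components of a tank overfill protection function.
components = {
    "level transmitter": 1.0e-6,
    "logic solver": 1.0e-7,
    "solenoid valve": 2.0e-6,
    "actuator and valve": 3.0e-6,
}

proof_test_interval = HOURS_PER_YEAR  # assumed yearly proof test

pfd_by_component = {
    name: pfd_avg_1oo1(lam, proof_test_interval)
    for name, lam in components.items()
}
# For components in series, the function's PFDavg is roughly the sum.
pfd_sif = sum(pfd_by_component.values())

for name, pfd in pfd_by_component.items():
    print(f"{name}: PFDavg = {pfd:.2e}")
print(f"Safety Function: PFDavg = {pfd_sif:.2e}")
```

The final elements usually dominate the result, which is why good failure rate data for valves and actuators matters so much.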
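A field failure rate can be estimated from maintenance records as the number of observed failures divided by the cumulative operating time of the devices. The Python sketch below shows this point estimate; the device count, years in service, failure count and the predicted rate are all hypothetical numbers chosen for illustration.

```python
HOURS_PER_YEAR = 8760

def field_failure_rate(failures, device_count, years_in_service):
    """Point estimate of the failure rate per hour: failures / cumulative device-hours."""
    device_hours = device_count * years_in_service * HOURS_PER_YEAR
    return failures / device_hours

# Hypothetical CMMS extract: 40 level transmitters in service for 10 years,
# with 3 dangerous failures found during proof tests.
lam_field = field_failure_rate(failures=3, device_count=40, years_in_service=10)

# Hypothetical predicted value from the design-based analysis.
lam_predicted = 1.0e-6

print(f"field estimate:  {lam_field:.2e} per hour")
print(f"predicted value: {lam_predicted:.2e} per hour")
print(f"ratio field/predicted: {lam_field / lam_predicted:.2f}")
```

With only a handful of failures the estimate carries wide uncertainty, so in practice a confidence bound would normally be calculated alongside it.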
Many process plants that have batch processes see different outcomes for different batches. This may be because the product or process is new. Sometimes, a process plant manager may observe that a certain batch on a certain date had the best performance compared to other batches, perhaps because it used less energy, raw material or catalyst. The manager can then collect all kinds of data from this so-called “Ideal batch”, such as the performance of the various control loops, pressures, temperatures, operator actions and timings, analytical data about the material compositions and so on. This data can then be compared with that of other, non-ideal batches, using Big Data techniques, to find optimum setpoints and recipes for upcoming batches, so that over a period of time all subsequent batches could meet or even exceed the last ideal batch. In fact, one could create a model that either predicts how good the current batch would be as compared to the ideal batch or, even better, predicts what you need to do to make EVERY batch an ideal batch!
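A very simple starting point for comparing batches against the ideal batch is to measure how far each batch deviates from it, variable by variable. The Python sketch below does this with relative deviations; the batch IDs, variable names and values are hypothetical, and a real model would use full time series and many more variables.

```python
from math import sqrt

# Hypothetical summary values for the ideal batch.
ideal_batch = {"reactor_temp_C": 85.0, "pressure_bar": 2.0,
               "catalyst_kg": 12.0, "energy_kWh": 400.0}

# Hypothetical summary values for other batches.
batches = {
    "B-101": {"reactor_temp_C": 86.0, "pressure_bar": 2.1,
              "catalyst_kg": 12.5, "energy_kWh": 430.0},
    "B-102": {"reactor_temp_C": 85.5, "pressure_bar": 2.0,
              "catalyst_kg": 12.0, "energy_kWh": 404.0},
    "B-103": {"reactor_temp_C": 90.0, "pressure_bar": 2.4,
              "catalyst_kg": 14.0, "energy_kWh": 480.0},
}

def relative_deviations(batch, ideal):
    """Deviation of each variable from the ideal batch, as a fraction of the ideal value."""
    return {name: (batch[name] - ideal[name]) / ideal[name] for name in ideal}

def distance_from_ideal(batch, ideal):
    """Single score: root of the summed squared relative deviations."""
    return sqrt(sum(d * d for d in relative_deviations(batch, ideal).values()))

# Batches ordered from closest to the ideal batch to furthest from it.
ranking = sorted(batches, key=lambda b: distance_from_ideal(batches[b], ideal_batch))
for batch_id in ranking:
    score = distance_from_ideal(batches[batch_id], ideal_batch)
    print(f"{batch_id}: distance from ideal = {score:.3f}")
```

The per-variable deviations show where a batch drifted, which is the first step towards adjusting setpoints and recipes.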
In HAZOP studies, typically we break down the plant (conceptually of course), into nodes and every node is analyzed using Guide words to evaluate possible dangerous deviations. These deviations are then tackled using different mitigations, to ensure that the deviation cannot happen, or, if it does, the impact is less severe.
However, in this technique, the nodes are almost always analyzed one at a time. Further, within one node, every deviation is analyzed one at a time. Combinations of guide words and nodes are not easy to analyze manually, or even using conventional software. This can be done using Big Data techniques. Even better, we can compile the near misses, undesirable events and actual accidents that occur and correlate them with the previous HAZOP studies to find out which mitigations did not work or which hazards were not considered, and even which alarms sounded and how the operators responded.
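The reason combinations are hard to handle manually is that their number grows very quickly. The Python sketch below enumerates single deviations and then pairs of deviations occurring on different nodes. The node names are hypothetical, and only four guide words are used to keep the example small.

```python
from itertools import combinations, product

# Hypothetical nodes and a small subset of guide words.
nodes = ["feed pump", "reactor", "distillation column"]
guide_words = ["NO", "MORE", "LESS", "REVERSE"]

# Classic HAZOP view: one (node, guide word) deviation at a time.
single_deviations = list(product(nodes, guide_words))

# Combined view: two deviations occurring together on different nodes.
cross_node_pairs = [
    (a, b)
    for a, b in combinations(single_deviations, 2)
    if a[0] != b[0]
]

print("single deviations:", len(single_deviations))
print("cross-node pairs:", len(cross_node_pairs))
print("example pair:", cross_node_pairs[0])
```

Even this toy example has four times as many pairs as single deviations; a real plant with hundreds of nodes and several process parameters per node would run into very large numbers, so the combinations need to be screened automatically.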
Startup and Shutdown Sequence Simulation
It is said that Startup and Shutdown of any process plant have the most potential for errors, undesirable events and accidents. We can use Big Data techniques to simulate various things that could go wrong (in particular, combinations of circumstances). This gives us a better idea of planning startup and shutdown sequences that have the least potential to do harm.
We have listed here only a few applications in the Process Industry that can use Big Data techniques to gain insights. There may be many more, such as Supply Chain and Inventory Management Systems or even Cybersecurity (for example, analysis of network data traffic that could point to an attack on a Control System).
Big Data thus will have an increasingly important role to play in all process industries in the years to come. Stay tuned for more!