Short paper based on discussions and presentations regarding the importance of semantic modelling in the process industries.
M. J. Neuer
Introduction
In recent years, the term digitalisation has been used with increasing frequency. Technically, it refers to the conversion of analogue, in other words continuous, signals into discrete sample points, which can then be interpreted by computers. Let us therefore put the term digitalisation in even simpler words, defining it as a means of providing computers with information. This is a weak definition, but for the considerations that follow it is quite convenient.
Sampling of signals as an example of the "First Digitalisation"
An obvious first example of data is signals. Over the years, all kinds of signals have been digitalised, and this has affected our daily lives: our music moved from vinyl records to tapes, both analogue media, and later to compact discs and MP3 files, both digital media. The same happened to video information, though later, owing to the larger data volumes involved in motion pictures. Let us call this type of conversion the “first digitalisation”: in the most physical sense, the sampling of data points.
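To make this notion of sampling concrete, the following minimal sketch digitises a continuous signal into discrete sample points. The signal (a 5 Hz sine wave), the sampling rate and the 8-bit quantisation are illustrative assumptions, not taken from the text above.

```python
import numpy as np

# Illustrative continuous signal: a 5 Hz sine wave (an assumption for this sketch).
f_signal = 5.0      # signal frequency in Hz
f_sample = 100.0    # sampling rate in Hz (must exceed 2 * f_signal, per Nyquist)
duration = 1.0      # observation window in seconds

# "First digitalisation": evaluate the continuous signal at discrete time points.
t = np.arange(0.0, duration, 1.0 / f_sample)
samples = np.sin(2.0 * np.pi * f_signal * t)

# Quantise the amplitudes to 8-bit integers, much as an audio pipeline would
# quantise to a fixed bit depth (the bit depth is chosen for illustration).
quantised = np.round((samples + 1.0) / 2.0 * 255).astype(np.uint8)

print(f"{len(quantised)} discrete sample points obtained from a continuous signal")
```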
This first digitalisation applies to much of the information in our daily lives, but it is essential for industrial data. With the expansion of the world wide web, social media arose, spawning data volumes of extreme size. In the wake of these technologies, it became obvious that concepts were needed to analyse and understand the relations hidden in these data. The term Big Data [1] was coined, defined in terms of the big “V”s: data that is big in volume, exchanged with high velocity, exhibits a certain veracity, and contains reasonable value to be extracted. Methods to handle such data were rapidly developed [2], [3], eventually leading to ways of analysing them [4] and of extracting impact and value [5].
Compared to social media data, e.g. originating from Twitter, common industrial data is much smaller. Even when pictures, video and audio information are included, industrial data streams tend to be smaller than social media traffic. Moreover, much industrial data simply goes unused: recorded for “in-case-of” scenarios, it is never systematically parsed and is merely stored away. Upon realising this drawback, activities were initiated to bring the exploitation of data into focus, and today several research projects are dedicated to the evaluation of large data streams coming from process chains. We may conclude that, in a certain sense, the first digitalisation was successful: data science is prospering as a field, and all industrial sectors have realised its potential impact.
Context
Let us now come back to our definition of digitalisation. Implicitly, many people imagine data as tables with rows and columns; sometimes one may think of vectors or matrices. Keep exactly this idea in mind and consider some vector of numbers, for instance a series of velocities that you reached with your car. Once these velocities are recorded, they are nothing but numbers in a table. You can plot them in a diagram, look at them or use them for any kind of posterior analysis. For your computer, however, they are still just numbers. Numbers without any context. The computer does not know that they are velocities. This knowledge is added by you when you analyse the data, because you know that these values are recorded velocities.
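A minimal sketch makes the point tangible. The velocity values below are invented for illustration; to the computer they are an anonymous sequence of floating-point numbers.

```python
# Recorded car velocities in km/h -- but only we know that.
# To the computer this is just a list of numbers: no unit, no physical
# meaning, and no relation to any "car".
values = [48.3, 52.1, 50.7, 61.9, 58.4]

mean = sum(values) / len(values)
print(f"mean of some numbers: {mean:.1f}")  # the computer cannot say "mean velocity"
```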
We proceed by asking: can we provide such context information to the computer? The answer is yes. First, we define the term “car” abstractly. Next, we associate so-called state variables with the “car”, namely “position” and “velocity”, where “velocity” can be calculated by differentiating “position” with respect to “time”. Once we have these kinds of objects available, we can tell the computer that there are multiple cars, each car being an instance of the abstract object “car” we just defined. The computer now knows that each of the cars must have a “velocity” associated with it. We can go one step further and define the object “street”. “Streets” may have several “cars” on them. “Cars” have “engines” that influence the state variables; thus, without an “engine”, a “car” has the “velocity” zero. In this way, we can supply context to the computer, and the computer can start to deduce and perform inference, as sketched below.
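The following is a minimal sketch of such a domain model, assuming plain Python classes; the names Engine, Car and Street mirror the abstract objects above and are purely illustrative, as are the numeric values.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Engine:
    """An 'engine' influences a car's state variables."""
    power_kw: float


@dataclass
class Car:
    """Abstract object 'car' with state variables 'position' and 'velocity'."""
    position: float = 0.0
    engine: Optional[Engine] = None

    def velocity(self, prev_position: float, dt: float) -> float:
        # Velocity is the derivative of position with respect to time,
        # approximated here as a finite difference.
        if self.engine is None:
            return 0.0  # without an "engine", the "car" has the "velocity" zero
        return (self.position - prev_position) / dt


@dataclass
class Street:
    """A 'street' may have several 'cars' on it."""
    cars: List[Car] = field(default_factory=list)


# Each concrete car is an instance of the abstract object "car".
street = Street(cars=[Car(position=10.0, engine=Engine(power_kw=90.0)), Car()])
for i, car in enumerate(street.cars):
    print(f"car {i}: velocity = {car.velocity(prev_position=0.0, dt=1.0)} m/s")
```

With these definitions the computer can already perform a simple inference: the second car has no engine, so its velocity must be zero, a conclusion it draws from the model rather than from any measured value.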
Semantic modelling as "Second Digitalisation"
Of course, what we have described here is semantic modelling. It is a second digitalisation, one that introduces a new type of information to the computer: information that does not originate from sampling signals and does not come from sensors or any other automatic acquisition. Semantic information must be modelled, and this is essentially different from merely sampling data. Once the computer has a domain model of a “street” and a “car”, it can start to work with these terms. Especially in industrial data applications, semantic modelling and contextual sampling are currently still underrepresented.
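In practice, such domain models are often expressed as semantic triples. The sketch below uses the rdflib library as one possible toolkit; the namespace, the class and property names, and the velocity value are illustrative assumptions, not prescribed by the text.

```python
from rdflib import Graph, Literal, Namespace, RDF

EX = Namespace("http://example.org/")  # illustrative namespace, an assumption

g = Graph()
# Second digitalisation: modelled context, not automatically acquired data.
g.add((EX.car1, RDF.type, EX.Car))
g.add((EX.car1, EX.velocity, Literal(13.9)))   # a sampled value, now with context
g.add((EX.street1, RDF.type, EX.Street))
g.add((EX.street1, EX.hasCar, EX.car1))

# The computer can now "work with these terms": find cars on streets
# together with their velocities.
query = """
PREFIX ex: <http://example.org/>
SELECT ?car ?v WHERE {
    ?street ex:hasCar ?car .
    ?car ex:velocity ?v .
}
"""
for row in g.query(query):
    print(f"{row.car} is on a street and has velocity {row.v}")
```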
References
[1] S. Yin and O. Kaynak, “Big Data for Modern Industry,” Proceedings of the IEEE, vol. 103, no. 2, pp. 143–146, February 2015.
[2] N. Marz and J. Warren, Big Data: Principles and best practices of scalable realtime data systems, ISBN 9781617290343, Manning Publications, 2015.
[3] M. Hausenblas and N. Bijnens, “Lambda Architecture Repository,” [Online]. Available: http://lambda-architecture.net/. [Accessed 07 Sept. 2016].
[4] S. Aggarwal and N. Manuel, “Big data analytics should be driven by business needs, not technology,” McKinsey & Co., June 2016.
[5] D. Court, “Getting big impact from big data,” McKinsey Quarterly, January 2016.