Data & systems management

The data & systems management curriculum spans many disciplines to give students the knowledge needed to face the challenges of complexity and interdependency, which arise increasingly in areas such as energy, transportation, health, finance, and social networks.

Systems are sets of interdependent and dynamic parts that must be properly analyzed to be understood and properly designed to be useful. As technology develops, the size and complexity of systems are increasing dramatically, along with the amount of data they generate. Handling these data well can lead to a better understanding of phenomena and to more efficient strategies for governing them. Understanding data and developing suitable models (that is, models complex enough to capture the phenomenon yet simple enough to be of practical use) is crucial for facing the emerging challenges.

Data & systems management is essential for addressing complex problems efficiently and properly. Five key phases are usually implemented. First, data from possibly different sources are gathered to retrieve all the information needed to formulate a good solution. Second, the problem is described in natural language, making explicit all decisions to be made and all aspects to be taken into consideration. Third, the problem is formulated in mathematical terms through an analytical model, such as an optimization model described by a set of decision variables, constraints, and an objective function; simulation models are designed when the analytical approach is too cumbersome or unsuitable. Fourth, once the data and the appropriate model are satisfactorily integrated, a solution to the problem is computed, if one exists. Finally, the solution is validated to ensure that it is appropriate in practice, and not only on paper.
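
As a minimal sketch of the modelling and solution phases, the toy problem below is written as an optimization model with explicit decision variables, constraints, and an objective function, and is solved by plain enumeration. The products, profits, and resource limits are invented purely for illustration; a realistic instance would rely on a dedicated optimization solver rather than enumeration.

```python
from itertools import product

# Hypothetical data: two products, their unit profit and resource usage.
PROFIT = {"A": 3, "B": 5}
HOURS = {"A": 1, "B": 2}       # machine hours per unit (assumed)
MATERIAL = {"A": 3, "B": 2}    # kg of material per unit (assumed)
MAX_HOURS = 14
MAX_MATERIAL = 18


def is_feasible(x_a, x_b):
    """Constraints: stay within the available hours and material."""
    hours = HOURS["A"] * x_a + HOURS["B"] * x_b
    material = MATERIAL["A"] * x_a + MATERIAL["B"] * x_b
    return hours <= MAX_HOURS and material <= MAX_MATERIAL


def objective(x_a, x_b):
    """Objective function: total profit, to be maximized."""
    return PROFIT["A"] * x_a + PROFIT["B"] * x_b


def solve(max_units=20):
    """Enumerate the integer decision variables (x_a, x_b) and keep the best."""
    best = None
    for x_a, x_b in product(range(max_units + 1), repeat=2):
        if not is_feasible(x_a, x_b):
            continue
        value = objective(x_a, x_b)
        if best is None or value > best[0]:
            best = (value, x_a, x_b)
    return best


# Validation phase: re-check the returned solution against the constraints.
value, x_a, x_b = solve()
assert is_feasible(x_a, x_b)
```

Keeping the constraints and the objective in separate functions mirrors the structure of the model itself, and lets the validation step reuse the same feasibility check that the solver relies on.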

These five steps are, inter alia, carefully introduced and described in this curriculum. Courses such as algorithms and data structures, databases, and machine learning and data mining will provide you with the tools and methodologies to collect, organize, and analyze data. Courses such as complexity and cryptography will also enable you to manage data that require encryption, an issue of great importance in modern ICT. Topics such as systems and control theory, optimal and robust control, optimization, identification, and adaptive control will introduce you to the field of decision theory and control of dynamical systems.

The data & systems management curriculum is inherently cross-disciplinary and cross-application. Its nature is methodological and its applications are many, ranging from energy management to traffic control and robotics, to mention just a few. To give you a flavour of the projects developed in our department that are strongly related to data and systems management, we now present two examples.

The first research project is an application of optimization techniques to air traffic management. Under the SATURN European project (www.saturn-sesar.eu), carried out in collaboration with the universities of Westminster, Brussels (ULB) and Belgrade, different mathematical models are developed to study how air traffic may be redistributed months before the day of operations in order to reduce airspace congestion and, consequently, flight delays. One approach is based on congestion pricing: traffic can be redistributed "naturally" by increasing the price that an airline must pay to fly through a congested area. This approach is fully consistent with European regulations, which require each airline to pay air navigation service charges for all its flights in European airspace. Another approach selects centrally the flights to be redistributed, minimizing either the displacement of flights with respect to their requested schedule or their operational costs. These models, however, would be little more than a nice mathematical study if they were not applied to real flight data. Thus, relying on real flight data from EUROCONTROL (the European Organisation for the Safety of Air Navigation), which describe the structure of the airspace and airports as well as flight plans, we created a geographic database containing all this information. The raw data were then subjected to complex manipulation and transformation to be used as input for our models. Real problem instances involve all flights operated in Europe in one day, i.e., around 30,000 flights. This leads to models with approximately 50,000 constraints and 6 million variables, which are solved in about five minutes on a standard server. Hence, optimal or near-optimal decisions for all these flights are obtained in reasonable computational time.
The large size of these instances requires proper data structures and modelling decisions that are neither too detailed nor too simple. Since the models must run on multiple instances representing different days, the geographic database needs to contain an even larger amount of data, representing the complete trajectories of hundreds of thousands of flights. Once the problem instances are solved, the results themselves involve large quantities of data that need proper management and analysis. In this complex environment, we were able to propose solutions that may reduce airspace congestion, flight delays, and related costs. This work was presented at many international conferences across Europe and the United States.
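
To make the centralized approach more concrete, here is a deliberately tiny sketch of redistributing flights over time slots so that a capacity limit is respected while the total displacement from the requested schedule is minimized. It is not the SATURN model: the flight names, slots, and capacity are invented, and exhaustive enumeration is used only because the instance is so small. Instances with tens of thousands of flights, as described above, call for proper mathematical programming solvers.

```python
from itertools import product

# Hypothetical toy instance: requested time slot for each flight.
REQUESTED = {"F1": 0, "F2": 0, "F3": 0, "F4": 1}
SLOTS = range(3)   # available time slots 0, 1, 2 (assumed)
CAPACITY = 2       # maximum number of flights per slot (assumed)


def total_displacement(assignment):
    """Objective: sum of |assigned slot - requested slot| over all flights."""
    return sum(abs(slot - REQUESTED[f]) for f, slot in assignment.items())


def respects_capacity(assignment):
    """Constraint: no slot hosts more than CAPACITY flights."""
    return all(
        sum(1 for s in assignment.values() if s == slot) <= CAPACITY
        for slot in SLOTS
    )


def redistribute():
    """Try every slot assignment and return (cost, assignment) of a best one."""
    flights = sorted(REQUESTED)
    best = None
    for slots in product(SLOTS, repeat=len(flights)):
        assignment = dict(zip(flights, slots))
        if not respects_capacity(assignment):
            continue
        cost = total_displacement(assignment)
        if best is None or cost < best[0]:
            best = (cost, assignment)
    return best


cost, assignment = redistribute()
print(cost, assignment)
```

In this instance three flights request the same slot, so exactly one of them has to move by one slot. Several assignments achieve that minimum, and the sketch simply returns the first one it encounters.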

The second example is the identification of a model of an Electric Arc Furnace (EAF). EAFs are widely employed in steelmaking, and their proper regulation is of fundamental importance for limiting energy consumption and pollution, as well as for guaranteeing high-quality production (green melting). From the thermo-chemical point of view, an EAF is very complicated. Even though the mechanisms and chemical reactions are well understood, many physical parameters specific to each furnace are unknown, which makes it almost impossible to build an accurate physical model suitable for simulation and prediction. Still, a model is necessary for regulation. Model identification is the process of generating dynamical models from data collected during plant operation. In this particular case, the goal of the model is to predict the gas emissions (O2, CO, CO2 and H2O) based on some measured variables.

The data consist of 50 time series corresponding to 50 castings of an EAF, all belonging to the same family (i.e., with similar material and production). For each of them, 19 variables were acquired (power, temperatures, concentrations, and so on).

During a casting, successive operational phases are performed in the EAF (preparation, melting, refining and tapping). Each of these phases must be well described by the model, so in this study we proposed a multi-model approach: a different linear dynamic model is used for each phase, and together they describe the entire dynamic evolution of the furnace. By employing so-called black-box identification techniques, it was possible to identify the model (or, better, the family of models) able to predict the behaviour of the outputs (gas emissions) with satisfactory accuracy. The family of models may be employed for regulation, for instance by means of Model Predictive Control.
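
The multi-model idea can be illustrated with a small sketch in which each phase gets its own first-order linear model y[k+1] = a*y[k] + b*u[k], fitted by least squares. This is not the model used in the study, which involves 19 measured variables and several outputs. The single input u, the single output y, the parameter values, and the data below are all made up, and the data are noise-free so that the fit recovers the generating parameters.

```python
def fit_first_order(u, y):
    """Least-squares fit of y[k+1] = a*y[k] + b*u[k]; returns (a, b)."""
    samples = list(zip(y[:-1], u[:-1], y[1:]))  # (y[k], u[k], y[k+1])
    s_yy = sum(yk * yk for yk, _, _ in samples)
    s_yu = sum(yk * uk for yk, uk, _ in samples)
    s_uu = sum(uk * uk for _, uk, _ in samples)
    r_y = sum(yk * yn for yk, _, yn in samples)
    r_u = sum(uk * yn for _, uk, yn in samples)
    # Solve the 2x2 normal equations by Cramer's rule.
    det = s_yy * s_uu - s_yu * s_yu
    if abs(det) < 1e-12:
        raise ValueError("data are not informative enough to identify a and b")
    a = (r_y * s_uu - r_u * s_yu) / det
    b = (r_u * s_yy - r_y * s_yu) / det
    return a, b


def simulate(a, b, u, y0=0.0):
    """Generate an output sequence of the same length as u from the model."""
    y = [y0]
    for k in range(len(u) - 1):
        y.append(a * y[-1] + b * u[k])
    return y


def fit_multi_model(segments):
    """Fit one linear model per phase; segments maps phase -> (u, y)."""
    return {phase: fit_first_order(u, y) for phase, (u, y) in segments.items()}


# Synthetic data generated from known, made-up parameters for two phases.
u_refining = [1, 0, 2, 1, 3, 0, 1, 2]
u_tapping = [0, 1, 1, 0, 2, 1, 0, 1]
segments = {
    "refining": (u_refining, simulate(0.8, 0.5, u_refining)),
    "tapping": (u_tapping, simulate(0.5, 1.0, u_tapping)),
}
models = fit_multi_model(segments)
print(models)
```

With real, noisy measurements the estimates would only approximate the underlying dynamics, and higher-order models with several inputs would typically be needed. The per-phase dictionary of models is the part of the sketch that corresponds to the family of models mentioned above.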