Development of a neural network system for forecasting air accidents and managing safety risks, based on retrospective data that include many parameters and textual event descriptions.
Development of a method for applying deep neural networks to build multidimensional dynamic event models and to forecast the probability of occurrence of events characterized by a parameter vector and text descriptions, using extended samples.
Development of special software for multidimensional modelling, factor analysis, and forecasting of events that affect economic indicators, using large amounts of retrospective data (Big Data), including unstructured text information.
deep neural networks; recurrent neural networks; LSTM networks; multidimensional forecasting; factor analysis; curse of dimensionality; Big Data; economic forecasting; event forecasting
Problem description, research relevance substantiation:
At present, the Customers of the Industrial Partner (mostly air companies) have no modern system that forecasts risk events and supports factor analysis for risk management on the basis of historical observations, that is, numerical and textual descriptions of past events. As a result, they incur economic and production losses and find it difficult to develop methods for reducing the risk of potentially catastrophic events with high public resonance.
Existing forecasting solutions (the software used by the Customers of the Industrial Partner, including the forecasting components of SAP, one of the most popular ERP systems) are oriented toward one-dimensional time series such as sales counts. Air accidents, by contrast, are characterized by many parameters (type of aircraft, time of day, flight stage, season, flight duration, weather data, etc.). In addition, existing solutions cannot process textual data, although air accident logs contain textual event descriptions with useful semantic information. This information can potentially be generalized to estimate the similarity of events at the semantic level, thus increasing forecast reliability.
The theory of function approximation and extrapolation, including neural network methods, is currently well developed for the task of forecasting the behavior of an objective function that depends on a multidimensional vector of characteristics and, in particular, on time. However, methods for forecasting the probability of occurrence of events that are characterized by many parameters and textual descriptions are poorly developed.
In the task of forecasting event occurrence, the historical observations are only facts that events took place, together with the parameters that describe them (including text descriptions). As a rule, there are many historical event records, but every event is characterized by a unique combination of parameter values (possibly including a unique text description), so the occurrence frequency cannot be calculated directly: every event occurs only once. Solving this problem by discarding part of the data, for example with the histogram method, inevitably loses information. Moreover, if an event is characterized by a large number of attributes, multidimensional histograms have to be built, and these are very difficult to fill with data (the “curse of dimensionality”). Nevertheless, events are similar to one another, so a person (an expert, an employee of a risk department) intuitively understands “what happens more often, and what happens rarely”. The expert does this by generalizing information, that is, by constructing a multidimensional dynamic event model with natural intelligence. While understanding the situation at a qualitative level, the expert finds it difficult to make a quantitative forecast that could be used in economic analysis. This project proposes to use a neurobiologically inspired model, built on modern artificial neural network methods, to create a quantitative system for forecasting the probability of risk events.
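The difficulty of filling multidimensional histograms can be illustrated with a small simulation. The sketch below is only an illustration under assumed conditions: the function name `histogram_occupancy`, the uniformly random attribute values, and the sample sizes are hypothetical and are not taken from the project data. It bins synthetic events into a multidimensional histogram and reports what fraction of the cells is occupied and what fraction of the events is alone in its cell.

```python
import random
from collections import Counter


def histogram_occupancy(n_events, n_attributes, n_levels, seed=0):
    """Bin synthetic events into a multidimensional histogram.

    Each event is a tuple of n_attributes values, each drawn uniformly
    from n_levels discrete levels (an assumption made for illustration).
    """
    rng = random.Random(seed)
    cells = Counter(
        tuple(rng.randrange(n_levels) for _ in range(n_attributes))
        for _ in range(n_events)
    )
    total_cells = n_levels ** n_attributes
    # Events whose attribute combination occurs exactly once in the sample.
    singletons = sum(1 for count in cells.values() if count == 1)
    return {
        "total_cells": total_cells,
        "occupied_fraction": len(cells) / total_cells,
        "singleton_fraction": singletons / n_events,
    }


# One attribute versus ten attributes, with the same number of events.
print(histogram_occupancy(10_000, n_attributes=1, n_levels=5))
print(histogram_occupancy(10_000, n_attributes=10, n_levels=5))
```

With a single attribute every cell is well populated, whereas with ten attributes nearly every event ends up alone in its cell, so per-cell frequencies carry almost no information.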
Mathematically, the described problem lacks one essential component of a machine learning task: there is no data on the target value (the event occurrence frequency), only individual examples of event realizations. The problem can be formulated as modelling a dynamic multidimensional probability density function from sparse data, which standard statistical methods solve poorly. With a large number of dimensions, the classical approach (the EM algorithm, Expectation-Maximization) runs into the curse of dimensionality and is unable to capture periodic trends over long time periods. For this reason, the project plans to research neural network methods, including LSTM recurrent architectures, which have proven effective in recent research on event forecasting, including forecasting from text-based information (see the publications on the research topic).
Publications on research topic, including foreign:
- Pichotta K., Mooney R. J. Learning Statistical Scripts with LSTM Recurrent Neural Networks // AAAI. – 2016. – pp. 2800-2806. https://pdfs.semanticscholar.org/1ceb/038d8b4838120e0dc0a11c949d032cebf5dd.pdf
- Ahooyi T. M. et al. Estimation of complete discrete multivariate probability distributions from scarce data with application to risk assessment and fault detection // Industrial & Engineering Chemistry Research. – 2014. – Vol. 53. – pp. 7538-7547. http://pubs.acs.org/doi/abs/10.1021/ie404232v?journalCode=iecred
- Tax N. et al. Predictive business process monitoring with LSTM neural networks // arXiv preprint arXiv:1612.02130. – 2016. https://arxiv.org/pdf/1612.02130.pdf
- Shiga M., Tangkaratt V., Sugiyama M. Direct conditional probability density estimation with sparse feature selection // Machine Learning. – 2015. – Vol. 100. – No. 2-3. – pp. 161-182. http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.686.5765&rep=rep1&type=pdf
- Granroth-Wilding M., Clark S. What Happens Next? Event Prediction Using a Compositional Neural Network Model // AAAI. – 2016. – pp. 2727-2733.
Tasks and possible solutions:
The following tasks are formulated to solve the mathematical problems of this project with modern neural network methods:
- developing deep neural network algorithms for modelling a dynamic multidimensional distribution density function, which serves as the intensity model of a dynamic Poisson distribution of the number of events, from the available set of historical observations (realizations of this dynamic distribution);
- developing and researching regularization components of the error function that ensure convergence of the learning algorithm to the most reliable solution when the training sequence contains no data on the target output value of the neural network;
- researching the possibilities and developing a methodology for applying LSTM (Long Short-Term Memory) architectures to learn long-term recurrent trends in events;
- forming vector descriptions of event attributes, as compact as possible, from textual descriptions;
- developing a method for calculating forecast reliability (confidence intervals for the calculated probability of event occurrence);
- developing illustrative methods for visualizing the constructed multidimensional prognostic and retrospective model and the factor analysis, so that the user can get an answer to the question of which factors influence or will influence the occurrence of certain events;
- developing a method for formalizing and solving multidimensional economic optimization problems and risk management problems with the constructed neural network models.
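The first task in the list relies on an intensity model of a Poisson distribution, which can be fitted when only event occurrences are observed. The following sketch is a minimal illustration, assuming a one-dimensional time axis and the standard log-likelihood of an inhomogeneous Poisson process. The function names, the midpoint-rule integration, and the constant example intensity are assumptions made for illustration and do not represent the project's actual algorithm; in the project, a neural network could play the role of the `intensity` function.

```python
import math


def integrated_intensity(intensity, t_start, t_end, n_grid=1000):
    """Approximate the integral of intensity(t) over [t_start, t_end] (midpoint rule)."""
    step = (t_end - t_start) / n_grid
    return step * sum(intensity(t_start + (i + 0.5) * step) for i in range(n_grid))


def poisson_process_nll(intensity, event_times, t_start, t_end, n_grid=1000):
    """Negative log-likelihood of an inhomogeneous Poisson process.

    Only the observed event times are needed; no target frequency is required.
    """
    expected_count = integrated_intensity(intensity, t_start, t_end, n_grid)
    log_term = sum(math.log(intensity(t)) for t in event_times)
    return expected_count - log_term


def prob_at_least_one_event(intensity, t_start, t_end, n_grid=1000):
    """Probability that one or more events occur in [t_start, t_end]."""
    return 1.0 - math.exp(-integrated_intensity(intensity, t_start, t_end, n_grid))


# Hypothetical data: three events observed during five time units.
events = [1.0, 2.0, 3.0]
for rate in (0.1, 0.6, 2.0):
    loss = poisson_process_nll(lambda t: rate, events, 0.0, 5.0)
    print(rate, round(loss, 3))
```

For a constant intensity, this loss is minimized when the rate equals the number of observed events divided by the length of the interval, which is the usual frequency estimate. A flexible model generalizes this to intensities that depend on time and on event attributes.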
Expected results are the following:
- Intermediate and final reports on ASREX
- Reports on patent research, prepared in accordance with State Standard 15.011-96
- Neural network algorithms for modelling a dynamic multidimensional distribution density function, which serves as the intensity model of a dynamic Poisson distribution of the number of events. The algorithms work from the available set of historical observations (realizations of this dynamic distribution) and include regularization components of the error function that ensure convergence of the learning algorithm to the most reliable solution when the training sequence contains no data on the target output value of the neural network;
- Method of applying LSTM architectures to learn long-term recurrent trends in events;
- Algorithm for forming vector descriptions of event attributes, as compact as possible, from textual descriptions;
- Method of calculating forecast reliability (confidence intervals for the calculated probability of event occurrence);
- Illustrative methods for visualizing the constructed multidimensional prognostic and retrospective model and the factor analysis, so that the user can get an answer to the question of which factors influence or will influence the occurrence of certain events;
- Method for formalizing and solving multidimensional economic optimization problems and risk management problems with the constructed neural network models;
- Experimental software sample;
- Depersonalized datasets for formation of learning and test samples of machine learning algorithms for event forecasting system;
- Draft specification for design and development work on the theme “Developing a universal neural network system for multidimensional modelling, factor analysis, and forecasting of events that can affect economic indicators, using large amounts of historical data (Big Data), including unstructured textual information”.
Additional results are the following:
- Improved long-term aircraft safety, reduced probability of risk events, and increased economic efficiency of aircraft manufacturing and maintenance;
- Positive impact on the economic efficiency of commercial and government organizations through new tools for forecasting, factor analysis, and optimization of event occurrence probabilities, based on generalization of historical observations of events characterized by various parameters and textual descriptions;
- Positive impact on the reputation of Russia in developing innovative neural network methods for analyzing large amounts of unstructured data, through publication of research results in refereed international publications (Scopus, Web of Science);
- Training young local specialists in the field of data analysis and practical application of modern neural network technologies.
Application of expected results:
- Forecasting the probability of air accidents of various types and danger categories (technical failures, human factor), characterized by a set of numerical parameters and textual descriptions as well as by links to past events, in order to improve flight safety;
- Manufacturing: forecasting and factor analysis of important events in the manufacturing process (failures, successful project completion, etc.), and analysis of the effectiveness of measures for preventing risk events;
- Economics: forecasting events with quantitative and textual characteristics that can affect the economic indicators of the state and of large enterprises (deals, political events, reports on market segments or particular enterprises);
- Social and humanitarian field: forecasting events that have social or public resonance; the appearance of publications and opinions on given topics; forecasting unwanted (criminal) events characterized by region or textual data;
- Ecology: forecasting the probability of man-made and natural events, and analysis of trends and factors that can affect the probability of event occurrence.
Possible consumers of expected results:
- Russian and foreign air companies (“Aeroflot” OJSC, S7, etc.);
- Aircraft manufacturers (“Sukhoy”, etc.);
- Insurance companies;
- Other large manufacturing companies;
- Info-analytical companies that monitor and forecast socially significant events;
- Environmental companies;
- Ministry of Internal Affairs (forecasting and factor analysis of events related to offences).
Possible activities to demonstrate the expected results to the consumer:
The main way to demonstrate the expected results to consumers is through licensing of the results to the industrial partner. The industrial partner of this project, “Sh-Air-S” LLC (www.aviasoft.ru), develops software and implements a set of solutions that provide risk management for leading aircraft manufacturers and air companies (Sukhoy, “Aeroflot” OJSC, etc.).
The industrial partner, “Sh-Air-S”, plans to inform consumers about the development process through active sales and participation in relevant events.
The company also plans to take an active part in industry events, which will help demonstrate the features of the product under development to potential consumers.
The industrial partner will receive an exclusive license for the results of intellectual activity of this ASREX. Based on the results of pilot projects that apply these results to real data from the industrial partner’s clients, the software solution will be integrated into the industrial partner’s products. When the industrial partner sells solutions that integrate the results of intellectual activity, “PAWLIN Technologies” LLC will receive royalty payments as specified in the Preliminary contract on co-financing and further use of results of ASREX between the Tenderer and the Industrial Partner (appended to the application).
In addition, “PAWLIN Technologies” LLC will be able to use the results of intellectual activity for further research and independent commercialization, under the terms of the agreement with the industrial partner.
Demonstration of the expected results of ASREX to other possible consumers will also be done in the following ways:
- Direct sales: finding customers and offering them the use of the results of intellectual activities. Customers can be found on the Internet, among visitors to relevant exhibitions and conferences, etc.;
- Running pilot projects with potential consumers of the results of intellectual activities.
A pilot project is a trial, experimental project carried out to study the positive and negative aspects of a concept and to decide whether it is feasible to put that concept into practice.
Pilot projects focus on demonstrating the capabilities of the results of intellectual activities to potential consumers. The result of a successful pilot project is a profitable contract for supplying the software. A pilot project helps potential consumers decide whether to purchase the product, because the demonstration shows the capabilities of the software applied directly to the consumer’s own tasks.
Information about project executors:
The Industrial Partner is “Sh-Air-S” LLC (www.aviasoft.ru), a company that develops, implements and maintains specialized IT systems for small, medium and large air companies, airports and related infrastructure. “Sh-Air-S” has more than 10 years of experience in the market for analytical software in aviation. “Sh-Air-S” specializes in the following services:
- Developing recommendations on optimizing processes and decisions, determining quantitative and qualitative indicators of air company operations from the point of view of aviation security, and building effective accounting and reporting documentation systems;
- Auditing threat assessment and risk management processes in aviation management, including aviation security and flight safety in air companies;
- Automation of billing services for ground maintenance of aircraft;
- Auditing and analyzing business process management and document management;
- Integration with leading booking systems (Sabre, Amadeus);
- Development of integration protocols with SAP and Oracle BI systems;
- SMS alert gateways for passengers and staff;
- Report management;
- Building cloud infrastructure both based on customer’s own data centers, and using remote Oracle infrastructure;
- Consolidation of hardware resources, creation of IaaS/DBaaS/SaaS. The possibility of using clouds like Oracle, Amazon AWS and MS Azure;
- Creating and configuring single monitoring and database management center based on Enterprise Manager Cloud Control;
- Migration to Oracle 12c version with minimal downtime;
- Migration from Oracle Forms software, which is not supported by Oracle. The unique technology from “Sh-Air-S” converts the forms and reports of Oracle Forms and Reports 6-10 simply and quickly into a more modern web application format, while significantly reducing the resources required;
- The centralized system for managing and generating report documentation from “Aviasoft”. It allows creating and editing SAP Crystal Reports (among others) while keeping the current licensing conditions (no additional licenses are required).
“Sh-Air-S” company products
Information system of insurance risks management “IMSD”. The system provides the following:
- maintaining, uploading and storing the data on aviation events and losses related to them;
- maintaining, uploading and storing the data on aircraft insurance coverage;
- calculation of annual reduction of agreed aircraft insurance amount and additional insurance payments;
- control and storage of documents;
- reports generation on aviation events for a scalable period of time;
- maintaining of directories related to aviation events.
Automated aviation security control system:
- control of travel and shipping documents;
- prevention of violations of aviation security;
- control of flight schedule;
- access to a unified base of violators;
- assessment of risks and threats on the course.
During the implementation period, about 150 agencies have been connected; the database contains more than 400 users.
Information system for optimization of aircraft turnover:
- schedule optimization based on seasonal air flights schedule and its modifications;
- harmonization of modifications;
- preparing report documentation.
Information System of Automation of Fee Service. The solution provides management of financial information on aircraft maintenance:
- maintaining the service and fees rates databases in accordance with current price lists and agreements;
- supporting multicurrency and maintaining the directories of exchange rates;
- preparing acts on maintenance of aircraft including all required fees, rates, prices and taxes;
- calculation of relevant fees in accordance with the rates and prices in rubles and foreign currencies that are valid at flight date;
- forming the register for airport maintenance for reporting period;
- forming consolidated accounts for air companies for reporting period;
- forming the statements distributed by air companies (or types of aircraft) in rubles and other currency for reporting period;
- forming the statement of material consumption distributed by air companies (or types of aircraft) for reporting period;
- supporting schedule management for regular flights;
- access rights management for the System on individual level of needed operations;
- integration with SAP BI.
The scientific adviser of the project is Lev Semenovich Kuravskiy, Ph.D., Professor, Dean of the Faculty of Information Technologies of Moscow State University of Psychology and Education, Head of the Department of Applied Informatics and Multimedia Technologies at the same faculty, and laureate of the Russian Federation Prize in Education (2011).