Databases for AI

Data environments are evolving towards schemas of ever greater size, complexity, dynamicity and decentralisation (SCoDD), a shift that drastically impacts contemporary data management. The SCoDD trend is a central data management concern in Big Data scenarios, where users and applications demand more complete data that is produced by independent data sources under different semantic assumptions and contexts of use.

This demand catalyzed the creation and use of Knowledge Graphs. Knowledge Graph data models emerged as tools to integrate structured data from different data sources and to support the representation of schema-less, sparse and heterogeneous data. Knowledge Graphs are also used to bridge the representation gap between text and structured data, serving as a lightweight semantic representation layer in which entities and relations extracted from text can be represented in a lightly integrated way.
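As a minimal sketch of this idea, assume a simple subject-predicate-object triple encoding. The entity names, predicates, values and the `build_graph` helper below are hypothetical illustrations and are not taken from any particular Knowledge Graph system. The sketch shows how facts from a structured source and facts extracted from text can sit side by side without a shared fixed schema.

```python
from collections import defaultdict

# Hypothetical facts from a structured source (for example, a table row).
structured_source = [
    ("School_A", "located_in", "Munich"),
    ("School_A", "performance_score", 2.1),
]

# Hypothetical facts extracted from text; note the different attributes.
text_source = [
    ("School_A", "mentioned_in", "Report_7"),
    ("School_B", "located_in", "Passau"),
]


def build_graph(*sources):
    """Merge triples from several sources into one subject-indexed graph."""
    graph = defaultdict(list)
    for triples in sources:
        for subject, predicate, obj in triples:
            graph[subject].append((predicate, obj))
    return graph


graph = build_graph(structured_source, text_source)

# Entities are sparse: each one carries only the attributes known for it.
for subject, facts in graph.items():
    print(subject, facts)
```

Because no schema is enforced, `School_A` and `School_B` can carry different sets of attributes, which is the sparse, heterogeneous character described above.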

This ability to integrate data at scale and represent it as a Knowledge Graph supports an exponential growth in the number of associations expressed in the data, which become accessible for answering complex data analysis needs and building AI-driven applications.

 

Knowledge Graphs

Consuming data from Knowledge Graphs supports access to knowledge at a new scale and granularity. As an exemplar application scenario, Knowledge Graphs are emerging as a fundamental part of Google’s and Facebook’s strategies to support more direct information access and to transform these platforms into intelligent assistants. Additionally, platforms such as Apple Siri have Knowledge Graphs as their underlying data substrate.

However, there is no de facto data management infrastructure to support interaction with Knowledge Graphs. Existing databases operate under the assumption that data consumers know exactly how the information is expressed within the data, down to the character level. While this assumption holds for smaller databases (a few small tables), it breaks down for large heterogeneous datasets such as Knowledge Graphs.
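The character-level assumption can be illustrated with a small sketch. The data, the predicate names and the `exact_lookup` helper are hypothetical; the point is only that an exact-match query returns nothing unless the user already knows how each source spells its attributes and values.

```python
# Hypothetical triples from two sources that describe location differently.
triples = [
    ("School_A", "locatedIn", "München"),
    ("School_B", "city", "Munich"),
]


def exact_lookup(triples, predicate, value):
    """Return subjects whose predicate and value match character for character."""
    return [s for s, p, o in triples if p == predicate and o == value]


# The user's own vocabulary matches neither source, so nothing is found.
print(exact_lookup(triples, "located_in", "Munich"))  # prints []

# Only a query phrased exactly like one source finds that source's record.
print(exact_lookup(triples, "city", "Munich"))  # prints ['School_B']
```

With two sources the user can still learn both spellings; with thousands of independently produced sources this no longer scales, which is the gap the text describes.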

The emergence of this new data environment demands revisiting the semantic assumptions behind data consumption and designing data access mechanisms that can support semantically heterogeneous data environments.

 

Databases of the Future

StarGraph is a database platform which allows users to interact with large-scale, heterogeneous data and Knowledge Graphs using natural language queries and search operations. StarGraph works as a “Google” for structured and semi-structured data, in which users can search and query heterogeneous data.

Users can access information with queries such as:

“Give me all schools with the worst performance indicators in the Munich area.”

“How many road construction sites were there in Passau in November 2014?”

“Which construction contractors were employed by the State Government of Bavaria, and what was the value of their contracts?”

“Show me the biological pathways and the drug interactions associated with the BetaTub3 tubulin.”

“List all production oil wells in the Marlin field which have an overall volume of more than 300,000 barrels and which have a fault sealing composition of type D5.”

“Which CEOs from baking companies served on the boards of Oil & Gas companies?”