What is StarGraph?

StarGraph (aka *graph) is a graph database for querying large Knowledge Graphs. Working with Knowledge Graphs can be useful if you are developing AI applications or doing data analysis over complex domains. To learn more about Knowledge Graphs and their applications in AI and advanced data analysis, please check here.

Features

These are the features of StarGraph:

  • Natural language interface.
  • Easy integration of structured and unstructured data (built-in named entity linking).
  • Performs semantic approximations.
  • NoSQL open source database.
  • Compliance with Semantic Web standards (RDF/SPARQL).
  • Scalable for large Knowledge Graphs.
  • Multi-lingual support.

Knowledge Graphs

But what is a Knowledge Graph anyway? Knowledge Graphs are data models that focus on representing complex and sparse conceptual models. If you want to represent a simple, relatively closed domain, you use tables and relational databases. Tables give you a framed view of a certain part of the domain you are representing. Think of the “Customer” and “Purchase” tables in a typical organizational database: you have names, addresses, phone numbers, etc. Tables are very good at representing sets of attributes which are regular and well focused. As the set of attributes grows larger, the tables tend to become sparse.

Now imagine modeling a much more complex domain, such as describing a Person for a social network or a dating website. Thinking with the relational (table) mindset, the model designer might constrain the concept of a person to a few dimensions, such as profession and hobbies. This process of constraining or slicing the domain introduces an artificial limitation in the model.

Knowledge Graph is a generic name for data models which can grow organically in the number of attributes. In essence, a Knowledge Graph is a labelled graph which can grow unconstrained. These models allow the continuous capture and growth of the domain that a system can interpret. This ability to reflect the world's complexity in the database is fundamental for two emerging types of contemporary demands: (i) Artificial Intelligence (AI) systems and (ii) complex data analysis systems.
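The idea of a labelled graph that grows without a fixed schema can be sketched in a few lines of Python. This is a minimal illustration, not how StarGraph stores data: the `LabelledGraph` class, its methods and the sample people and attributes are all hypothetical.

```python
from collections import defaultdict


class LabelledGraph:
    """Toy labelled graph stored as (subject, predicate, object) triples."""

    def __init__(self):
        self.triples = set()
        self.by_subject = defaultdict(set)

    def add(self, subject, predicate, obj):
        """Add one labelled edge; no schema has to be declared first."""
        self.triples.add((subject, predicate, obj))
        self.by_subject[subject].add((predicate, obj))

    def attributes(self, subject):
        """Return the sorted attribute names currently used for a subject."""
        return sorted({p for p, _ in self.by_subject.get(subject, ())})


graph = LabelledGraph()
graph.add("alice", "profession", "engineer")
graph.add("alice", "hobby", "climbing")
# A brand-new kind of attribute needs no table or column change.
graph.add("alice", "favouriteFilm", "Blade Runner")
# Other nodes are free to use a completely different set of attributes.
graph.add("bob", "speaks", "Portuguese")
```

In a relational design, "favouriteFilm" and "speaks" would each need a new column (mostly empty) or a new table. Here they are just additional edges.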

If you go for Knowledge Graphs you need to be Schema-agnostic

But playing with Knowledge Graphs can be hard. Because knowledge can be introduced freely (that is, the model is schema-less), the schema of a Knowledge Graph can grow very large. Unlike with relational databases, knowing what is inside a Knowledge Graph can be hard. This means that, ideally, database management systems (DBMS) for Knowledge Graphs need to be schema-agnostic: they shield users from the specific representation of the data (e.g. whether ‘John Smith’ was described as ‘Customer’, ‘Client’ or ‘Consumer’).

Schema-agnostic databases provide an additional layer of abstraction for users: users can query the data with their own terms and it is the work of the database engine to semantically match what was asked with the data, independent of the design choices of the schema.
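To make the matching step concrete, here is a deliberately simple sketch that maps a user's term to a schema term through a hand-written table of related words. It is only an assumption-laden toy: StarGraph's actual semantic approximation is not described here, and the `SCHEMA_TERMS` table and the `match_schema_term` function are hypothetical names invented for this example.

```python
# Hypothetical schema terms, each with hand-written related words.
SCHEMA_TERMS = {
    "Customer": {"customer", "client", "consumer", "buyer"},
    "Purchase": {"purchase", "order", "sale"},
}


def match_schema_term(user_term):
    """Return the schema term whose related words contain user_term, or None."""
    needle = user_term.strip().lower()
    for schema_term, related in SCHEMA_TERMS.items():
        if needle in related:
            return schema_term
    return None


# The user never needs to know that the schema says "Customer".
resolved = match_schema_term("Client")
```

A real engine would replace the lookup table with a semantic similarity measure, but the contract is the same: user vocabulary in, schema vocabulary out.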

Querying StarGraph

Imagine you have a database containing data about the people, places, films and objects described in Wikipedia (a version of this database is called DBpedia). Now suppose you want to know “Who is the daughter of Bill Clinton married to?”. In SPARQL, the query would be:

PREFIX : <http://dbpedia.org/resource/>
PREFIX dbpedia2: <http://dbpedia.org/property/>
PREFIX dbpedia: <http://dbpedia.org/ontology/>

SELECT ?y WHERE {
  :Bill_Clinton dbpedia:child ?x .
  ?x dbpedia2:spouse ?y .
}

This means that, before issuing this query, you need to know that the database expresses the concepts ‘Bill Clinton’, ‘daughter’ and ‘married to’ as the ids ‘http://dbpedia.org/resource/Bill_Clinton’, ‘http://dbpedia.org/ontology/child’ and ‘http://dbpedia.org/property/spouse’. Finding the exact terms to match can be a problem if you have a schema with millions of attributes.
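The cost of exact matching can be illustrated with a small Python sketch that evaluates the same two-step pattern over a couple of hand-entered triples. The triples are toy data typed in for this example, not fetched from DBpedia. The helper names `objects` and `spouses_of_children` are hypothetical, as are the "guessed" ids `ontology/daughter` and `property/marriedTo`, which stand in for a user's plausible but wrong guess.

```python
RES = "http://dbpedia.org/resource/"
ONT = "http://dbpedia.org/ontology/"
PROP = "http://dbpedia.org/property/"

# Hand-entered toy triples (assumption: illustrative only).
TRIPLES = {
    (RES + "Bill_Clinton", ONT + "child", RES + "Chelsea_Clinton"),
    (RES + "Chelsea_Clinton", PROP + "spouse", RES + "Marc_Mezvinsky"),
}


def objects(subject, predicate):
    """Objects of all triples whose subject and predicate match exactly."""
    return sorted(o for s, p, o in TRIPLES if s == subject and p == predicate)


def spouses_of_children(person, child_pred, spouse_pred):
    """Join two triple patterns: person -child-> ?x, then ?x -spouse-> ?y."""
    return [
        y
        for x in objects(person, child_pred)
        for y in objects(x, spouse_pred)
    ]


# Using the exact ids from the schema finds the answer.
exact = spouses_of_children(RES + "Bill_Clinton", ONT + "child", PROP + "spouse")

# Using the user's own vocabulary as ids silently finds nothing.
guessed = spouses_of_children(
    RES + "Bill_Clinton", ONT + "daughter", PROP + "marriedTo"
)
```

With exact matching, a wrong guess does not produce an error, only an empty result, which is what makes large schemas hard to query by hand.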

A possible schema-agnostic way to express this query using StarGraph is:

SELECT ?y {
  BillClinton hasDaughter ?x .
  ?x marriedTo ?y .
}

Note that in this example the user chose their own terms while still adhering to the syntax of the SPARQL language.

However, StarGraph can simplify this process further by allowing users to issue the same query in keyword form:

"Bill Clinton daughter married to"

or as the natural language question from the beginning:

"Who is the daughter of Bill Clinton married to?"