ABSTRACT

DBpedia has existed for almost over a decade now. Although the data and the community created ontology have received immense investment, good tools to query and explore the data are still rare. In this paper, we present a prototype that attempts to break down complexity of graph querying into simple and guided steps. During the implementation of our DBpedia Explorer, we faced many barriers that can be traced back to a lack of data quality as well as the design of the DBpedia Ontology. We investigated these problems in detail based on a small sub-graph of DBpedia and gained valuable insights that will hopefully allow us to apply data transformation and ﬁxes to DBpedia that will be beneﬁcial for browsing and querying of the data in the future.

IMPLEMENTATION

Overview and Initial Search . We restricted ourselves to a single text input ﬁeld to keep the tool simple and to make it resemble a search engine rather than a query builder. The user interface of the tool can be seen in Figure 2 and Figure 3 . The text input of the user is then translated to an element of the DBpedia knowledge graph using an auto-complete feature. For this we used the DBpedia Lookup ² service providing a preﬁx search for all the instances of the graph. In order to include also both classes and properties of the DBpedia ontology into the tool, we added these to a custom Solr index and merged the output of both services. Using the results of the auto-complete, the user can create a search query that generates a SPARQL query which consults the oﬃcial endpoint.

This initial query is deﬁned as the ﬁrst filter on the class Thing and creates the ﬁrst tab. While additional filters limit the result set of the current tab, the expand operation will open in a new dependent tab, allowing the user to keep history, freely switch between sets and make changes to a set which will also propagate to sets expanding from it. The tab view also provides the options to close a set and to create a new empty one. Figure 1 shows the high-level architecture of our tool.

**Figure 1.** DBpedia Explorer High-Level Architecture

Filters . A filter is either based on an instance, a class or a property from the underlying knowledge graph and can be added to a result set to generate its query. Filters based on instances are referred to as instance filters, the ones based on classes and properties as class filters and property filters respectively. Filters are rendered in natural language, so the user is always able to keep track of his current query. Class ﬁlters are displayed with their label, e.g. ”Actor”. When adding an instance as a ﬁlter, the ﬁlter will search for instances with an arbitrary relation to the previous ﬁlter. This arbitrary relation with the label related to (e.g. ”Actor related to Germany”) can be clicked on and changed to a speciﬁc property in a drop-down menu. Properties in the drop-down are ordered by the number of occurrences of the property with the ﬁlter instance. Each ﬁlter implements a function F ( i ) that generates a SPARQL snippet that is then inserted into the main query. A class ﬁlter with class C creates the snippet “?s _i a C.”, a property ﬁlter with property P produces “?s _i P ?o _k .” and an Instance ﬁlter with instance I produces ”?s _i ?p _k I .”, where k is an integer that is increased when iterating over the set ﬁlters to create unique variables and i is an integer passed to the snippet generation function.

Expand . Expands are used to create a relation between two sets (opening a new dependent tab). This relation can be arbitrary or a ﬁxed property from the ontology. Each result set s _n may have a set s _m that it is expanding from. This set s _m is referred to as the base set. Query generation is a base query with snippets S ₁ to S _j with S _n being the snippet generated for set s _n and with s _{n + 1} being the base set of s _n .

“SELECT ?S ₀ WHERE " + S ₀ + S ₁ + … + S _j + " LIMIT 1000”

with S _n being:

f ₁ . F ( n ) + f ₂ . F ( n ) + … + f _p . F ( n ) + Er ( n )

with p being the number of ﬁlters in the result set. The expand function Er ( i ) produces an empty snippet, if the set s _r has no base set. Otherwise it yields the snippet “?s _i ?p _k ?s _{i + 1} .” when describing an arbitrary relation between two sets. Expands can be added by clicking the expand button and will result in a new set searching for instances related to the previous set. This arbitrary expand can then also be reﬁned by clicking the “ related to ” label in the set description and selecting a property from the drop-down.

Search Restrictions . Entering a string into the search ﬁeld will suggest one or more elements from the knowledge graph using auto-completion. The best match is then selected and used to suggest one or more possible operations the user can execute. These operations can be the addition of an expand, the addition of a ﬁlter or a combination of the two. The user can now submit his input to simply add the selected element as a ﬁlter or click on a suggested operation to execute it. When selecting properties and classes we excluded operations from the suggestion when they did not make sense (disjoint sets, such as Persons that are also Castles or Places being birth place of Castle ). The collection of instances in the result set all have, per deﬁnition, the common type owl:Thing . This common type of all instances can be more speciﬁc as a class ﬁlter will force all instances of the set to be of a speciﬁc type. This common type is referred to as the set class. The set class prevents the user from adding any class or property ﬁlter that would change the set class to any class not being a sub- or super-class of the current one. This can occur when adding class or property ﬁlters as well as property expands. Checking for the validity of property ﬁlters and expands is done using the domain and range of the property.

**Figure 2.** A step-by-step search example

EVALUATION

We targeted our evaluation to identify data quality issues in DBpedia that serve as requirements for future development of the ontology, which could allow easier exploration. We ran evaluation queries based on the following classes and their properties: dbo:Castle , dbo:Building , dbo:ArchitecturalStructure , dbo:Place , dbo:Settlement , dbo:PopulatedPlace , dbo:Person , dbo:Monarch and dbo:Royalty . This class set gave us good results with meaningful properties like dbo:birthPlace , dbo:builder or dbo:architect connecting the Building and Person domain. The operation suggestions work as expected, only providing meaningful suggestion when it comes to property and class ﬁlters. With our search tool, we found it easy to create both simple and complex queries by following the operation suggestions. However, we ran into diﬃculties when dealing with inconsistencies in the underlying data. In the following, we make use of concepts introduced by OntoClean [5] such as rigidity and identity; while the former can help improving recall, the latter is more interesting for exploration purposes as primary keys can serve as a natural partition of the data to explore.

**Figure 3.** A query search expand example

Typing into the search ﬁeld (upper left) will retrieve auto-complete suggestions and select the best match (”Catholic Church” colored blue). Below, the tool will provide operation suggestions. The current set already contains the ﬁlters ”Monarch” (class ﬁlter) and ”Germany” (instance ﬁlter). The ”related to” relation to Germany can be further reﬁned by selecting a more speciﬁc property from the drop-down menu. The light grey information next to the results shows the exact relation(s) of the result to the ﬁlters.

In some cases, there are ambivalences between classes, properties and instances. For instance, searching for Monarch will suggest the class dbo:Monarch and also the instance http://dbpedia.org/resource/Monarch . In addition, there is a property labeled monarch . To hide this issue in the tool, we merged elements sharing the same label into one auto-completion result and created operation suggestions for each element. This way, all operations are available to the user; however, this behavior might be confusing for the user.

In many cases, the DBpedia knowledge graph contains empty classes. According to our analysis only 61.14% of DBpedia classes have instances (461 out of 754 classes). Searching for a class like dbo:Biologist will yield an empty result, which is unintended behaviour caused by the underlying data. In practice Biologists are usually linked to the http://dbpedia.org/resource/Biologist instance using the dbo:field property. In the case of dbo:Actors , the search does have results, though it is incomplete (6695 results). Most actors are linked to the http://dbpedia.org/resource/Actor instance with the dbo:occupation property (19802 results). It would make sense for the dbo:occupation property to be rigid, with the http://dbpedia.org/resource/Actor and similar instances being of type dbo:Profession . This would then facilitate the retrieval of a result set containing all instances of the same profession, no matter if it is Actor or Biologist. To get around this issue, the tool would require a custom handwritten rule set or a linked ontology that has equivalent terms and OntoClean annotations. Furthermore, we ran into problems regarding transitivity and missing OWL property chains. Data inconsistencies where, for example, some castles are linked to http://dbpedia.org/reource/Germany directly; some are only linked to a city in Germany and not the Germany instance itself, which causes incompleteness of data. In this particular case we found only 8 castles being related to the Germany instance with the dbo:location property and 51 when searching for castles with locations within Germany. Furthermore, Wikipedia states that the actual number of German castles is in the thousands ³ , so there are still a lot of castles missed even by our second query. Such issues can only ﬁxed by custom inference rules in DBpedia itself as inclusion in our tool would make the queries a lot less eﬃcient and yield many irrelevant results. This would as well require a custom rule set containing property chains for additional inference. We further tested for completeness of country assignment of dbo:City and dbo:AdministrativeRegions via FILTER NOT EXISTS and found that around 82% of these links are missing, thus rendering application-level property chains less eﬀective.

In Figure 2 , a full query example is shown. After searching for Castle and Person and clicking on the respective operation suggestions, the result set contained the list of Persons related to Castles . The search was then further reﬁned by adding Japan as an instance ﬁlter, resulting in Persons related to Japan related to Castles . The relation between Persons and Castles could then be changed to something more speciﬁc by selecting a relation in the drop-down menu ( being owner of , birth place being , etc.)

When allowing the user to specify the generic “ related to ” property in the expand and ﬁlter operation, we realized that property occurrence counts are insuﬃcient to rank relevant properties accordingly as the general count lacks context speciﬁcity. Multiplying the count with a pre-calculated TF-IDF measure, where classes are documents and property occurrence of instances of this class are terms, yielded much better results, however, the measure is class-speciﬁc and can only be applied in some situations. Using the tool properly requires no knowledge of the SPARQL syntax or the structure of a query, however it currently requires a rough knowledge of the class and property names. This issue has to be tackled by a more sophisticated auto-completion using alternative labels of the ontology (for instance, the input ”born in” should be resolved to “ dbo:birthPlace ” by the auto-completion). While approaches to lexicalize DBpedia exist [9] , results have not yet been consolidated into the main knowledge graph and no synonyms exist.

Investigating Explorability of DBpedia and its Ontology

ABSTRACT

INTRODUCTION

RELATED WORK

IMPLEMENTATION

EVALUATION

CONCLUSION

ACKNOWLEDGEMENTS

REFERENCES

Footnotes