SSSW 2015 Sense Making

SSSW 2015
Bertinoro–Italy
July 10, 2015
Sense Making
Axel-Cyrille Ngonga Ngomo
&
Philippe Cudré-Mauroux

On Making Sense
•  ½ of Computer Science is about making sense of
some input data
–  KDD (cf. Claudia & Laura tutorial)
–  NLP (cf. Roberto’s talk)
–  Multimedia Analysis
–  Social Media / Big Data Analytics
–  Visualization
–  etc.

On the Menu Today
•  Making Sense of Semantic Data
–  Making sense of SPARQL & Semantic Web predicates
–  Trust on Semantic Web data
–  Emergent Semantics
•  Leveraging Semantic Data for Sense Making
–  Making sense of textual entities
–  Making sense of relational data
–  Making sense of webtables

Introduction
At some point in the early
twenty-ﬁrst century, all of mankind
was united in celebration. We
marveled at our own magniﬁcence as
we gave birth to AI.
– Morpheus, The Matrix
Axel-Cyrille Ngonga Ngomo (AKSW) Sense Making July 10th, 2015 2 / 52

Introduction

Linked Data Web

Linked Data Web
Sense Making
Helping end users to make sense of the Semantic Web.

Gaps
Language Gap
Semantic Web speaks languages that normal users do not understand

Language Gap
Problem
What does it mean?

Language Gap

Language Gap
Problem
What does it mean?
1 PREFIX dbo: <http :// dbpedia.org/ontology/>
2 PREFIX res: <http :// dbpedia.org/resource/>
3 SELECT DISTINCT ?person WHERE {
4 ?person dbo:team ?sportsTeam .
5 ?sportsTeam dbo:league res: Premier_League .
6 ?person dbo:birthDate ?date .
7 ?person dbo:birthPlace ?place .
8 { ?place dbo:locatedIn res:Africa .}
9 UNION
10 { ?place dbo:locatedIn res:Asia .}
11 }
12 ORDER BY DESC (? date)
13 OFFSET 0 LIMIT 1

Language Gap
Problem
What does it mean?
1 PREFIX dbo: <http :// dbpedia.org/ontology/>
2 PREFIX res: <http :// dbpedia.org/resource/>
3 SELECT DISTINCT ?person WHERE {
4 ?person dbo:team ?sportsTeam .
5 ?sportsTeam dbo:league res: Premier_League .
6 ?person dbo:birthDate ?date .
7 ?person dbo:birthPlace ?place .
8 { ?place dbo:locatedIn res:Africa .}
9 UNION
10 { ?place dbo:locatedIn res:Asia .}
11 }
12 ORDER BY DESC (? date)
13 OFFSET 0 LIMIT 1
Give me the youngest person who plays in a Premier League
team and was born in Africa or Asia.

Language Gap
Solution
Verbalization frameworks for the Semantic Web
Document planner MicroplannerRealizer
http://github.com/AKSW/SemWeb2NL

Language Gap: Triple2NL/BGP2NL
Approach
1 ρ(s p o) ⇒ poss(ρ(p),ρ(s)) ∧
subj(BE,ρ(p)) ∧ dobj(BE,ρ(o))
2 ρ(s p o) ⇒ subj(ρ(p),ρ(s)) ∧ dobj(ρ(p),ρ(o))

Approach
1 :Momo :author :Ende
⇒ Momo’s author is Michael Ende.
2 ?x :author :Ende
⇒ ?x ’s author is Michael Ende.

Approach
1 :Momo :author :Ende
⇒ Momo’s author is Michael Ende.
2 ?x :author :Ende
⇒ ?x ’s author is Michael Ende.
3 :Momo :writtenBy :Ende
⇒ Momo was written by Michael Ende.
4 ?x :writtenBy :Ende
⇒ ?x was written by Michael Ende.

Language Gap: SPARQL2NL/RDF2NL
Approach
Combination rules
1 ρ((s, p, o1).(s, p, o2))
⇒ poss(ρ(p),ρ(s))∧ subj(BE,ρ(p)) ∧ dobj(BE, cc(ρ(o1), ρ(o1))
?x’s author is Paul Erd¨os and ?x’s author is Kevin Bacon.
⇒ ?x’s authors are Paul Erd¨os and Kevin Bacon.

Language Gap: SPARQL2NL/RDF2NL
?place is Shakespeare’s birth place or ?place is Shakespeare’s death
place.
⇒ ?place is Shakespeare’s birth or death place.
This query retrieves values ?height such that ?height is Claudia
Schiﬀer’s height.
⇒ This query retrieves Claudia Schiﬀer’s height.
?person’s team is ?sportsTeam. ?person’s birth date is ?date.
?sportsTeam’s league is Premier League.
⇒ ?person’s team is ?sportsTeam, ?person’s birth date is ?date, and
?sportsTeam’s league is Premier League.

Language Gap: Evaluation
125 participants, 49 SPARQL experts, 3 tasks
94% of verbalizations were understandable
5.31 ± 1.08 average adequacy score
0 50 100 150 200 250
Number of Survey Answers
1
2
3
4
5
6
Adequacy
0 20 40 60 80 100 120
Number of Survey Answers
1
2
3
4
5
6
Fluency
Figure : Adequacy and ﬂuency results in survey

Slightly larger error with NL for experts
Non-experts enabled understand the meaning of queries
0 0,2 0,4 0,6 0,8 1 1,2 1,4
error rate
SPARQL
NL
NL (SPARQL experts)
Figure : Error rate over the three tasks

Non-experts faster with NL than experts with SPARQL
Experts faster with NL than experts with SPARQL
0 5 10 15 20
time in minutes (purple = standard deviation)
SPARQL
SPARQL (filtered)
NL
NL (filtered)
NL (SPARQL experts)
NL (SPARQL experts, filtered)
Figure : Average time needed

Language Gap: Application

Language Gap: Challenges
Complex queries
Sacriﬁce adequacy for ﬂuency
Other languages
Hybrid approach
Personalization

Gaps
Semantic Gap
Decentralized content generation
Contextualization mismatch

Semantic Gap
Problem
How do I communicate with it?

Semantic Gap
Solution
Question Answering Systems
Example:
Where did Abraham Lincoln die?
SELECT ?x WHERE {
res:Abraham Lincoln dbo:deathPlace ?x .
}
PowerAqua:
Triple representation:
state/place, die, Abraham Lincoln
Ontology mappings:
Place, deathPlace, Abraham Lincoln

Semantic Gap: Mismatch
Triples do not always provide a faithful representation of the semantic
structure of the question
Thus more expressive queries cannot be answered
Example 1:
Which cities have more than three universities?
SELECT ?y WHERE {
?x rdf:type dbo:University .
?x dbo:city ?y .
}
HAVING (COUNT(?x) > 3)
cities, more than, universities three

Semantic Gap: Mismatch
Triples do not always provide a faithful representation of the semantic
structure of the question
Thus more expressive queries cannot be answered
Example 2:
Who produced the most ﬁlms?
SELECT ?y WHERE {
?x rdf:type dbo:Film .
?x dbo:producer ?y .
}
ORDER BY DESC(COUNT(?x)) LIMIT 1
person/organization, produced, most ﬁlms

Semantic Gap: Approach
To understand a user question, we need to understand:
The words
Abraham Lincoln → res:Abraham Lincoln
died in → dbo:deathPlace
The semantic structure
the most N → ODER BY DESC(COUNT(?n)) LIMIT 1
more than three N → HAVING (COUNT(?n) > 3)
Template-Based Question Answering
1 Template generation: Understanding the semantic structure)
2 Template instantiation: Understanding the words)

Semantic Gap: Example
Query: Who produced the most ﬁlms?

1 SPARQL template:
SELECT ?x WHERE {
?y rdf:type ?c .
?y ?p ?x .
}
ORDER BY DESC(COUNT(?y)) LIMIT 1
?c CLASS [ﬁlms]
?p PROPERTY [produced]

1 SPARQL template:
SELECT ?x WHERE {
?y rdf:type ?c .
?y ?p ?x .
}
?c CLASS [ﬁlms]
2 Instantiations:
?c = <http://dbpedia.org/ontology/Film>
?p = <http://dbpedia.org/ontology/producer>

Semantic Gap: Architecture
Natural
Language
Question
Semantic
Representaion
SPARQL
Query
Templates
Templates
with URI slots
Ranked SPARQL
Queries
Answer
LOD
Entity identiﬁcation
Entity and Query Ranking
Query
Selection
Resources
and Classes
SPARQL
Endpoint
Type Checking
and Prominence
BOA Pattern
Library
Properties
Tagged
Question
Domain Independent
Lexicon
Domain Dependent
Lexicon
Parsing
Corpora?
!
Loading
State
Process
Uses

Semantic Gap: Template Generation

1 Natural language question is tagged
with part-of-speech information.

2 Based on POS tags, lexical entries
are built on the ﬂy.

3 These lexical entries, together with
domain-independent lexical entries,
are used for parsing the question.

4 The resulting semantic
representation is translated into a
SPARQL template.

Semantic Gap: Who produced the most films?
domain-independent: who, the most
domain-dependent: produced/VBD, films/NNS
SPARQL template 1:
SELECT ?x WHERE {
?x ?p ?y .
?y rdf:type ?c .
}
?c CLASS [films]

domain-independent: who , the most
SPARQL template 1:
SELECT ?x WHERE {
?x ?p ?y .
?y rdf:type ?c .
}
?c CLASS [ﬁlms]

domain-dependent: produced/VBD , ﬁlms/NNS
SPARQL template 1:
SELECT ?x WHERE {
?x ?p ?y .
?y rdf:type ?c .
}
?c CLASS [ﬁlms]

SPARQL template 2:
SELECT ?x WHERE {
?x ?p ?y .
}
?p PROPERTY [ﬁlms]

Semantic Gap: Template instantiation

1 For resources and classes:
Identify synonyms of the label using WordNet.
Retrieve entities with a label similar to the slot label
based on string similarities (trigram, Levenshtein,
substring).

2 For property labels, the label is
additionally compared to natural
language expressions stored in the
BOA pattern library.

3 The highest ranking entities are
returned as candidates for ﬁlling the
query slots.

BOA
The BOA pattern library is a repository of natural language
representations of Semantic Web predicates.
Idea:
For each predicate P in a data repository (e.g. DBpedia), collect the
set of entities S and O connected through P.
Search a text corpus (e.g. Wikipedia) for all sentences containing the
labels of S and O.
For all retrieved sentences, the natural language predicate is a
potential pattern for P. The potential patterns are then scored by a
neural network (e.g. according to frequency) and ﬁltered.

BOA: Example
Predicate:
http://dbpedia.org/ontology/subsidiary
RDF snippet:
<http://dbpedia.org/resource/Google>
<http://dbpedia.org/ontology/subsidiary>
<http://dbpedia.org/resource/YouTube> .
<http://dbpedia.org/resource/Google> rdfs:label ‘Google’@en .
<http://dbpedia.org/resource/YouTube> rdfs:label ‘Youtube’@en .
Sentences:
Google’s acquisition of Youtube comes as online video is really starting
to hit its stride.
Youtube, a division of Google, is exploring a new way to get more
high-quality clips on its site: ﬁnancing amateur video creators.
Patterns:
subsidiary: S’s acquisition of O
subsidiary: O, a division of S

BOA
The use of BOA patterns allows us to match natural language expressions
and ontology concepts even if they are not string similar and not covered
by WordNet.
Examples:
married to → http://dbpedia.org/ontology/spouse
was born in → http://dbpedia.org/ontology/birthPlace
graduated from → http://dbpedia.org/ontology/almaMater
write → http://dbpedia.org/ontology/author

Example: Who produced the most films?
Candidates for filling query slots:
?c CLASS [films]
<http://dbpedia.org/ontology/Film>
<http://dbpedia.org/ontology/FilmFestival>
. . .
<http://dbpedia.org/ontology/producer>
<http://dbpedia.org/property/producer>
<http://dbpedia.org/ontology/wineProduced>
. . .

Semantic Gap: Query ranking and selection

1 Every entity receives a score
considering string similarity and
prominence
2 The score of a query is then
computed as the average of the
scores of the entities used to ﬁll its
slots

3 In addition, type checks are
performed

4 Of the remaining queries, the one
with highest score that returns a
result is chosen to retrieve an
answer.

Example: Who produced the most ﬁlms?
SELECT ?x WHERE {
?x <http://dbpedia.org/ontology/producer> ?y .
?y rdf:type <http://dbpedia.org/ontology/Film> .
}
Score: 0.7592425075864263
SELECT ?x WHERE {
?x <http://dbpedia.org/ontology/film> ?y .
}
Score: 0.6264001353183296
SELECT ?x WHERE {
?x <http://dbpedia.org/ontology/producer> ?y .
?y rdf:type <http://dbpedia.org/ontology/FilmFestival>.
}
Score: 0.6012584940627768

Evaluation Setup
Question set: 39 DBpedia training questions from QALD-1
5 could not be parsed due to unknown syntactic constructions or
uncovered domain-independent expressions
19 were answered exactly as required by the benchmark (with
precision and recall 1.0)
Another 2 are answered almost correctly (with precision and recall
greater than 0.8)
Mean precision: 0.61
Mean recall: 0.63
F-measure: 0.62

Main Sources of Error
Incorrect templates
Template structure does not coincide with structure of the data:
When did Germany join the EU?
res:Germany dbp:accessioneudate ?x .
Predicate detection fails
inhabitants dbp:population, dbp:populationTotal
owns dbo:keyPerson
higher dbp:elevationM
Wrong query is selected
Who wrote The pillars of the Earth?
res:The Pillars of the Earth (TV Miniseries) dbo:writer ?x .
res:The Pillars of the Earth dbo:author ?x .

Language Gap: Challenges
Schema-agnostic QA
Query Ranking
Relation Extraction
Ontology Lexicalization
Extraction of surface forms

Justiﬁcation Gap
Problem
Are you sure? Prove it to me.

Justiﬁcation Gap
Solution
Gathering natural-language evidence?
http://aksw.org/Projects/defacto

Justiﬁcation Gap: Automatic Query Generation
Solution
Gathering natural-language evidence?
⇓

Justiﬁcation Gap: Evidence Generation
(s, p, o) = “ρ(s)” “ρ(p)” “ρ(s)”
:Momo :author :Ende
1 “Momo” “author” “Michael Ende”
2 “Momo” “written by” “Michael Ende”
3 “Momo” “book by” “Michael Ende”

Justiﬁcation Gap: Proof Scoring
Combination of features including
1 Score of BOA pattern
2 Token distance
3 Total occurrence of resource labels
4 Similarity to title

Justiﬁcation Gap: Trustworthiness
1 Topic majority on the Web
2 Topic majority in results
3 Topic terms
4 Page rank

Justiﬁcation Gap: Fact Conﬁrmation
1 Combined trustworthiness and proof score
2 Number of proofs
3 Total hit count
4 Domain/Range

Justiﬁcation Gap: Evaluation
10 triples/property
Top-60 most used properties
473 from 600 manually veriﬁed to be true

Justiﬁcation Gap: Evaluation
J48 is overall best classiﬁer (78.8% - 87.6%)
Easiest data set: random
Mixed dataset hardest

Challenges

Summary
Language Gap
Semantic Gap
Justiﬁcation Gap
Access Gap
Data Gap
Noise Gap
. . .

Goal

The End
Thank you! Questions?
Axel Ngonga
http://aksw.org/AxelNgonga
ngonga@informatik.uni-leipzig.de
AKSW Research Group
University of Leipzig, Germany
@akswgroup
@NgongaAxel

The Semantics of the Semantic Web
•  A priori: top-down semantics
–  Logical assertions
–  Crisp reuse of conceptualization
•  In practice: hybrid bottom-up/top-down approach
–  (Human/software) agents are sloppy/ignorant
–  Agents do not agree (for various reasons)
=> Centralized view on decentralized construct ?

Semantic Grounding
The meaning of symbols can be explained by its
semantic correspondences to other symbols alone
[“Understanding understanding” Rapaport 93]
•  Type 1 semantics: understanding in terms of something else
•  Problem: how to ground semantics?
•  Type 2 semantics: understanding something in terms of itself
•  “syntactic semantics”: grounding through recursive
understanding

Emergent Semantics
Emergent Semantics:
•  Semantics as a posteriori agreement on conceptualizations
=> Don’t believe / enforce the schema !
•  Semantics of symbols as recursive correspondences to other
symbols
•  Analyzing transitive closures of mappings
•  Self-organizing, bottom-up approach
•  Global semantics (stable states) emerging from multiple
local interactions
•  Syntactic semantics
•  Studying semantics from a syntactic perspective

3 Concrete Examples
1.  Emergence of Semantic Interoperability
2.  Entity disambiguation using same-as networks
3.  A posteriori schema for LOD properties

•  How many links do you need to make a semantic network
interoperable?
•  Semantic interoperability as an emergent property!
⇒  Connectivity indicator: ci = ∑j,k (jk-j(bc+cc)-k) pjk
•  Necessary condition for semantic interoperability in the
large: ci ≥ 0
Semantic Connectivity
Philippe Cudré-Mauroux, Karl Aberer: A Necessary Condition for Semantic
Interoperability in the Large. CoopIS/DOA/ODBASE 2004: 859-872.

Graph-Based Disambiguation
•  The great thing about unique identifiers is that there are
so many to choose from
–  URI jungle
–  Disambiguation based on transitive closures on equality links
Philippe Cudré-Mauroux, Parisa Haghani, Michael Jost, Karl Aberer, Hermann de Meer:
idMesh: graph-based disambiguation of linked data. WWW 2009: 591-600.

A Posteriori Schema
•  Instance data use schema constructs in creative
ways!
⇒ Retro-engineering of schema constructs based on the
deployment of instance data
⇒ Context-dependent, retro-compatible
Alberto Tonon, Michele Catasta, Gianluca Demartini, Philippe Cudré-Mauroux: Fixing the
Domain and Range of Properties in Linked Data by Context Disambiguation. LDOW 2015.

•  Tons of research opportunities in this field
•  Understanding the emergent properties of LOD
networks (and how to exploit them)
•  Analyzing the deployment / use of semantic data (a
priori VS a posteriori views)
•  Capturing user disagreement (e.g., multi-views
ontologies, fuzzy ontologies, results diversification)
Research Directions

Leveraging
Semantic Data
for
Sense Making

Volume
■  amount of data
Velocity
■  speed of data in and out
Variety
■  range of data types and sources
[Gartner 2012] "Big Data are high-volume, high-velocity, and/or high-variety
information assets that require new forms of processing to enable enhanced
decision making, insight discovery and process optimization"
Opportunity: The 3-Vs of Big Data

Information Management
•  The story so far:
–  Strict separation between unstructured and structured
data management infrastructures
DBMS
JDBC
SQL
Inverted Index
Keywords
HTTP

Information Integration
•  Information integration is still one of the biggest
CS problem out there (according to many e.g., Gartner)
•  Information integration typically requires some
sort of mediation
1.  Unstructured Data: keywords, synsets
2.  Structured Data: global schema, transitive closure of
schemas (mostly syntactic)
⇒ nightmarish if 1 and 2 taken separately, horror
marathon if considered together

Entities as Mediation
•  Rising paradigm
–  Store information at the entity granularity
–  Integrate information by inter-linking entities
•  Advantages?
–  Coarser granularity compared to keywords
•  More natural, e.g., brain functions similarly (or is it the
other way around?)
–  Denormalized information compared to RDBMSs
•  Schema-later, heterogeneity, sparsity
•  Pre-computed joins, “Semantic” linking
•  Drawbacks?

Entity-Centric Data Management
Higher-level apps

Exposing Textual Data
•  The XI Pipeline
•  Runs on massive amounts of data (Spark)
Mention
Extraction
NER
Entity
Linking
Entity
Typing

Named Entity Recognition (NER)
Text
extraction
(Apache Tika)
List of
extracted
n-grams
n-gram
Indexing
foreach
Candidate
Selection
List of
selected
n-grams
Supervised
Classi!er
Ranked
list of
n-grams
Lemmat
ization
n+1 grams
merging
Feature
extractionFeature
extractionFeatures
POS
Tagging
frequency
reweighting
Roman Prokofyev, Gianluca Demartini, Philippe Cudré-Mauroux:
Effective named entity recognition for idiosyncratic web collections. WWW 2014: 397-408

Entity Linking
•  Linking entities to text is an old problem…
–  … and is extremely hard, esp. for machines
•  Dozens of approaches have been suggested
•  What if
–  We want to combine approaches / frameworks?
–  We want to leverage both human computations &
algorithms?

ZenCrowd
•  Integrate textual data w/ the Web of Data
•  Uses sets of algorithmic matchers to match
entities to online concepts
•  Uses dynamic templating to create micro-
matching-tasks and publish them on MTurk
•  Combines both algorithmic and human matchers
using probabilistic networks
Gianluca Demartini, Djellel Eddine Difallah, Philippe Cudré-Mauroux:
ZenCrowd: leveraging probabilistic reasoning and crowdsourcing
techniques for large-scale entity linking. WWW 2012: 469-478

ZenCrowd Architecture
Micro
Matching
Tasks
HTML
Pages
HTML+ RDFa
Pages
LOD Open Data Cloud
Crowdsourcing
Platform
ZenCrowd
Entity
Extractors
LOD Index Get Entity
Input Output
Probabilistic
Network
Decision Engine
Micro-
TaskManager
Workers Decisions
Algorithmic
Matchers

Probabilistic Inference
•  Probabilistic network to integrate a priori & a
posteriori information
–  Agreement of good turkers & algorithms
•  Learning process
–  Constraints
•  Unicity
•  Equality (SameAs)
–  Giant probabilistic graph
•  Instantiated selectively
w1
w2
l1
l2
pw1( ) pw2( )
lf1( ) lf2( )
pl1( ) pl2( )
l
lf3
pl
c11
c22
c12
c21
c13
u2-3( )sa1-2( )

Does it Work?
•  Improves avg. prec. by 0.14 on average!
–  Minimal crowd involvement
–  Embarrassingly parallel problem
Top$US$
Worker$
0$
0.5$
1$
0$ 250$ 500$
Worker&Precision&
Number&of&Tasks&
US$Workers$
IN$Workers$
0.6$
0.62$
0.64$
0.66$
0.68$
0.7$
0.72$
0.74$
0.76$
0.78$
0.8$
1$ 2$ 3$ 4$ 5$ 6$ 7$ 8$ 9$
Precision)
Top)K)workers)

Entity Typing
•  Entities can have many types (facets)
•  Which fine-grained types are most relevant given the context?
Thing

American

Billionaires

People

from
King

County

People

from

Sea:le

Windows

People

Agent

Person

Living

People

American

People
of

Sco@sh

Descent

Harvard

University

People

American

Computer

Programmers

American

Philanthropists

TRank
•  Fine-grained Typing
•  Tree of 447’260 types
•  Rooted on <owl:Thing>
•  Depth of 19
•  Ranks relevant types by analyzing the context
•  Textual context
•  Graph context
•  Decision trees
•  Linear regression
Alberto Tonon, Michele Catasta, Gianluca Demartini, Philippe Cudré-Mauroux, Karl Aberer:
TRank: Ranking Entity Types Using the Web of Data. ISWC 2013: 640-656.

Exposing Relational Data
•  Mapping language file
describes the relation between
ontology and RDB
•  Server provides HTML and
linked data views and a
SPARQL 1.1 endpoint
•  Rewriting engine uses map-
pings to rewrite Jena &
Sesame API calls to SQL
queries and generates RDF
dumps in various formats
http://d2rq.org/ ,
http://aksw.org/Projects/Sparqlify.html , etc.

Exposing Webtables
•  Wealth of data in (HTML) tables
•  Yet another type of content to expose
Sreeram Balakrishnan, Alon Y. Halevy, Boulos Harb, Hongrae Lee, Jayant Madhavan, Afshin
Rostamizadeh, Warren Shen, Kenneth Wilder, Fei Wu, Cong Yu: Applying WebTables in
Practice. CIDR 2015
Tao, Cui, and David W. Embley. "Automatic hidden-web table interpretation, conceptualization,
and semantic annotation." Data & Knowledge Engineering 68.7 (2009): 683-703.

Application 1: Enterprise Search
•  How can end-users reach entities?
⇒ Structured search
⇒ Keyword search
•  On their names or attributes
–  Obviously not ideal
•  BM25 on TREC 2011 AOR: MAP=0.15, P@10=0.20
•  Query extension, query completion or pseudo-relevance
feedback yield comparable (or worse) results

Hybrid Entity Search
The Descendants
TheDescendants
type
title
GeorgeClooney
George Clooney
name
May 6, 1961
dateOfBirth
type
ShaileneW
Shailene Woodley
name
Nov. 15, 1991
dateOfBirth
type
playsIn
playsIn
•  Main idea: combine unstructured and structured
search
–  Inverted index to locate first candidates
–  Graph queries to refine the results
•  Graph traversals (queries on object properties)
•  Graph neighborhoods (queries on
data type properties)
Inverted Index
Keywords
HTTP
DBMS
SPARQL

Architecture
LOD Cloud
index()
User
Query Annotation
and Expansion
Inverted Index
RDF
Store
Ranking
FunctionsRanking
FunctionsRanking
Functions
query()
Entity Search
Keyword Query
intermediate
top-k results
Graph-Enriched
Results
Graph Traversals
(queries on object
properties)
Neighborhoods
(queries on datatype
properties)
Structured
Inverted Index
WordNet
3rd party
search engines
Final Ranking
Function
Pseudo-Relevance Feedback
Alberto Tonon, Gianluca Demartini, Philippe Cudré-Mauroux: Combining inverted indices and
structured search for ad-hoc object retrieval. SIGIR 2012: 125-134

Application 2:
Literature Browsing/Recommendation

Application 3: Co-Reference Resolution
•  Better co-reference resolution through the
knowledge base
Roman Prokofyev, Alberto Tonon, Michael Luggen, Loic Vouilloz, Djellel Eddine Difallah, and
Philippe Cudre-Mauroux: SANAPHOR: Ontology-Based Coreference Resolution. ISWC 2015.
Barack Obama
called
Angela Merkel last week;
the president asked
the chancellor whether…

•  NER in vertical domains
•  Crowdsourcing parts of the processing
•  Predicate extraction
•  Summarization
•  Exposing further types of content
•  Updates / transactions
•  Parallelization
•  Higher-level applications
Research Opportunities

1.  Analyzing emergent properties of LOD
2.  Crowdsourcing predicate extraction
3.  SPARQL verbalization
4.  Hybrid question answering
5.  Source selection
6.  Ranking SPARQL queries
Research Tasks

SSSW 2015 Sense Making

Recommended

Recommended

More Related Content

Similar to SSSW 2015 Sense Making

Similar to SSSW 2015 Sense Making (20)

More from eXascale Infolab

More from eXascale Infolab (20)

Recently uploaded

Recently uploaded (20)

SSSW 2015 Sense Making