SSSW 2015
Bertinoro–Italy
July 10, 2015
Sense Making
Axel-Cyrille Ngonga Ngomo
&
Philippe Cudré-Mauroux
On Making Sense
•  ½ of Computer Science is about making sense of
some input data
–  KDD (cf. Claudia & Laura tutorial)
–  NLP (cf. Roberto’s talk)
–  Multimedia Analysis
–  Social Media / Big Data Analytics
–  Visualization
–  etc.
On the Menu Today
•  Making Sense of Semantic Data
–  Making sense of SPARQL & Semantic Web predicates
–  Trust on Semantic Web data
–  Emergent Semantics
•  Leveraging Semantic Data for Sense Making
–  Making sense of textual entities
–  Making sense of relational data
–  Making sense of webtables
Making Sense
of
Semantic Data
Introduction
At some point in the early
twenty-first century, all of mankind
was united in celebration. We
marveled at our own magnificence as
we gave birth to AI.
– Morpheus, The Matrix
Axel-Cyrille Ngonga Ngomo (AKSW) Sense Making July 10th, 2015 2 / 52
Linked Data Web
Sense Making
Helping end users to make sense of the Semantic Web.
Gaps
Language Gap
Semantic Web speaks languages that normal users do not understand
Language Gap
Problem
What does it mean?
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX res: <http://dbpedia.org/resource/>
SELECT DISTINCT ?person WHERE {
  ?person dbo:team ?sportsTeam .
  ?sportsTeam dbo:league res:Premier_League .
  ?person dbo:birthDate ?date .
  ?person dbo:birthPlace ?place .
  { ?place dbo:locatedIn res:Africa . }
  UNION
  { ?place dbo:locatedIn res:Asia . }
}
ORDER BY DESC(?date)
OFFSET 0 LIMIT 1
Give me the youngest person who plays in a Premier League
team and was born in Africa or Asia.
Language Gap
Solution
Verbalization frameworks for the Semantic Web
Document planner → Microplanner → Realizer
http://github.com/AKSW/SemWeb2NL
Language Gap: Triple2NL/BGP2NL
Approach
Rule 1: ρ(s p o) ⇒ poss(ρ(p),ρ(s)) ∧ subj(BE,ρ(p)) ∧ dobj(BE,ρ(o))
Rule 2: ρ(s p o) ⇒ subj(ρ(p),ρ(s)) ∧ dobj(ρ(p),ρ(o))
Examples for rule 1 (noun predicate):
:Momo :author :Ende
⇒ Momo’s author is Michael Ende.
?x :author :Ende
⇒ ?x’s author is Michael Ende.
Examples for rule 2 (verb predicate):
:Momo :writtenBy :Ende
⇒ Momo was written by Michael Ende.
?x :writtenBy :Ende
⇒ ?x was written by Michael Ende.
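The two realization rules above can be illustrated with a minimal Python sketch. It is not the SemWeb2NL implementation: the function name `verbalize_triple` and the `predicate_is_noun` flag are hypothetical, and the labels ρ(s), ρ(p), ρ(o) are assumed to be given as plain strings.

```python
def verbalize_triple(subject, predicate, obj, predicate_is_noun):
    """Verbalize one triple from its labels (hypothetical helper)."""
    if predicate_is_noun:
        # Rule 1: possessive construction, "<s>'s <p> is <o>."
        return f"{subject}'s {predicate} is {obj}."
    # Rule 2: the predicate label itself is used as the verb phrase.
    return f"{subject} {predicate} {obj}."


print(verbalize_triple("Momo", "author", "Michael Ende", predicate_is_noun=True))
print(verbalize_triple("?x", "was written by", "Michael Ende", predicate_is_noun=False))
```

Deciding whether a predicate label is a noun or a verb is left to the caller here; a real system would need a part-of-speech decision at that point.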
Language Gap: SPARQL2NL/RDF2NL
Approach
Combination rules
1 ρ((s, p, o1).(s, p, o2))
⇒ poss(ρ(p),ρ(s)) ∧ subj(BE,ρ(p)) ∧ dobj(BE, cc(ρ(o1), ρ(o2)))
?x’s author is Paul Erdős and ?x’s author is Kevin Bacon.
⇒ ?x’s authors are Paul Erdős and Kevin Bacon.
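As a rough illustration of this combination rule, the sketch below groups triples that share subject and predicate and coordinates their objects. The function name `combine_objects` is hypothetical, and pluralizing the predicate by appending "s" is a simplifying assumption that a real realizer would replace with proper morphology.

```python
from collections import defaultdict


def combine_objects(triples):
    """Group (s, p, o) label triples by (s, p) and coordinate the objects."""
    grouped = defaultdict(list)
    for s, p, o in triples:
        grouped[(s, p)].append(o)

    sentences = []
    for (s, p), objects in grouped.items():
        if len(objects) == 1:
            sentences.append(f"{s}'s {p} is {objects[0]}.")
        else:
            # Assumption: naive plural formed by appending "s" to the predicate.
            head = ", ".join(objects[:-1])
            sentences.append(f"{s}'s {p}s are {head} and {objects[-1]}.")
    return sentences


print(combine_objects([("?x", "author", "Paul Erdős"), ("?x", "author", "Kevin Bacon")]))
```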
Language Gap: SPARQL2NL/RDF2NL
?place is Shakespeare’s birth place or ?place is Shakespeare’s death
place.
⇒ ?place is Shakespeare’s birth or death place.
This query retrieves values ?height such that ?height is Claudia
Schiffer’s height.
⇒ This query retrieves Claudia Schiffer’s height.
?person’s team is ?sportsTeam. ?person’s birth date is ?date.
?sportsTeam’s league is Premier League.
⇒ ?person’s team is ?sportsTeam, ?person’s birth date is ?date, and
?sportsTeam’s league is Premier League.
Language Gap: Evaluation
125 participants, 49 SPARQL experts, 3 tasks
94% of verbalizations were understandable
5.31 ± 1.08 average adequacy score
Figure: Adequacy and fluency results in survey (two panels, Adequacy and Fluency; scores 1 to 6 against number of survey answers)
Slightly larger error with NL for experts
Non-experts are enabled to understand the meaning of queries
Figure: Error rate over the three tasks (conditions: SPARQL, NL, NL (SPARQL experts))
Non-experts faster with NL than experts with SPARQL
Experts faster with NL than experts with SPARQL
Figure: Average time needed, in minutes, with standard deviation (conditions: SPARQL, NL, NL (SPARQL experts), each also shown filtered)
Language Gap: Application
Language Gap: Challenges
Complex queries
Sacrifice adequacy for fluency
Other languages
Hybrid approach
Personalization
Gaps
Semantic Gap
Decentralized content generation
Contextualization mismatch
Semantic Gap
Problem
How do I communicate with it?
Semantic Gap
Solution
Question Answering Systems
Example:
Where did Abraham Lincoln die?
SELECT ?x WHERE {
res:Abraham_Lincoln dbo:deathPlace ?x .
}
PowerAqua:
Triple representation:
state/place, die, Abraham Lincoln
Ontology mappings:
Place, deathPlace, Abraham Lincoln
Semantic Gap: Mismatch
Triples do not always provide a faithful representation of the semantic
structure of the question
Thus more expressive queries cannot be answered
Example 1:
Which cities have more than three universities?
SELECT ?y WHERE {
?x rdf:type dbo:University .
?x dbo:city ?y .
}
GROUP BY ?y
HAVING (COUNT(?x) > 3)
Triple representation:
cities, more than, universities three
Example 2:
Who produced the most films?
SELECT ?y WHERE {
?x rdf:type dbo:Film .
?x dbo:producer ?y .
}
GROUP BY ?y
ORDER BY DESC(COUNT(?x)) LIMIT 1
Triple representation:
person/organization, produced, most films
Semantic Gap: Approach
To understand a user question, we need to understand:
The words
Abraham Lincoln → res:Abraham_Lincoln
died in → dbo:deathPlace
The semantic structure
the most N → ORDER BY DESC(COUNT(?n)) LIMIT 1
more than three N → HAVING (COUNT(?n) > 3)
Template-Based Question Answering
1 Template generation: Understanding the semantic structure
2 Template instantiation: Understanding the words
Semantic Gap: Example
Query: Who produced the most films?
1 SPARQL template:
SELECT ?x WHERE {
?y rdf:type ?c .
?y ?p ?x .
}
ORDER BY DESC(COUNT(?y)) LIMIT 1
?c CLASS [films]
?p PROPERTY [produced]
2 Instantiations:
?c = <http://dbpedia.org/ontology/Film>
?p = <http://dbpedia.org/ontology/producer>
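Template instantiation as shown above amounts to substituting URIs for the slot variables. The following is a minimal sketch under that reading; the `instantiate` helper and the `TEMPLATE` constant are hypothetical names, and the template text is copied from the slide without checking it against a SPARQL engine.

```python
import re

# Template text as shown on the slide; slots ?c and ?p are still unbound.
TEMPLATE = """SELECT ?x WHERE {
  ?y rdf:type ?c .
  ?y ?p ?x .
}
ORDER BY DESC(COUNT(?y)) LIMIT 1"""


def instantiate(template, bindings):
    """Replace each slot variable (e.g. '?c') with the URI bound to it."""
    query = template
    for slot, uri in bindings.items():
        # \b keeps '?c' from matching a longer variable name such as '?city'.
        pattern = re.escape(slot) + r"\b"
        query = re.sub(pattern, lambda _match, u=uri: f"<{u}>", query)
    return query


print(instantiate(TEMPLATE, {
    "?c": "http://dbpedia.org/ontology/Film",
    "?p": "http://dbpedia.org/ontology/producer",
}))
```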
Semantic Gap: Architecture
Figure: pipeline architecture. Components shown: natural language question, tagged question, parsing (domain-independent lexicon, domain-dependent lexicon), semantic representation, SPARQL query templates, entity identification (resources and classes, properties, BOA pattern library, corpora, LOD), templates with URI slots, entity and query ranking (type checking and prominence), ranked SPARQL queries, query selection (SPARQL endpoint), answer. Legend: state, process, uses, loading.
Semantic Gap: Template Generation
1 Natural language question is tagged with part-of-speech information.
2 Based on POS tags, lexical entries are built on the fly.
3 These lexical entries, together with domain-independent lexical entries, are used for parsing the question.
4 The resulting semantic representation is translated into a SPARQL template.
Semantic Gap: Who produced the most films?
domain-independent: who, the most
domain-dependent: produced/VBD, films/NNS
SPARQL template 1:
SELECT ?x WHERE {
?x ?p ?y .
?y rdf:type ?c .
}
ORDER BY DESC(COUNT(?y)) LIMIT 1
?c CLASS [films]
?p PROPERTY [produced]
SPARQL template 2:
SELECT ?x WHERE {
?x ?p ?y .
}
ORDER BY DESC(COUNT(?y)) LIMIT 1
?p PROPERTY [films]
Semantic Gap: Template instantiation
1 For resources and classes:
Identify synonyms of the label using WordNet.
Retrieve entities with a label similar to the slot label based on string similarities (trigram, Levenshtein, substring).
2 For property labels, the label is additionally compared to natural language expressions stored in the BOA pattern library.
3 The highest ranking entities are returned as candidates for filling the query slots.
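The string-similarity step can be sketched as follows. The slides name trigram and Levenshtein measures but not their exact definitions, so this sketch assumes Jaccard overlap of padded character trigrams and the textbook edit distance; `rank_candidates` and its `top_k` parameter are hypothetical.

```python
def trigrams(text):
    """Character trigrams of the lower-cased, space-padded text."""
    padded = f"  {text.lower()} "
    return {padded[i:i + 3] for i in range(len(padded) - 2)}


def trigram_similarity(a, b):
    """Jaccard overlap of trigram sets (an assumed definition)."""
    ta, tb = trigrams(a), trigrams(b)
    return len(ta & tb) / len(ta | tb)


def levenshtein(a, b):
    """Edit distance computed row by row with dynamic programming."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def rank_candidates(slot_label, labels, top_k=3):
    """Return the top_k (similarity, label) pairs for a slot label."""
    scored = [(trigram_similarity(slot_label, label), label) for label in labels]
    return sorted(scored, reverse=True)[:top_k]


print(rank_candidates("films", ["Film", "FilmFestival", "Wine"]))
print(levenshtein("produced", "producer"))
```

Trigram overlap is used for the ranking here only because it yields a score in [0, 1] directly; the edit distance could be normalized and mixed in just as well.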
BOA
The BOA pattern library is a repository of natural language
representations of Semantic Web predicates.
Idea:
For each predicate P in a data repository (e.g. DBpedia), collect the
set of entities S and O connected through P.
Search a text corpus (e.g. Wikipedia) for all sentences containing the
labels of S and O.
For all retrieved sentences, the natural language predicate is a
potential pattern for P. The potential patterns are then scored by a
neural network (e.g. according to frequency) and filtered.
BOA: Example
Predicate:
http://dbpedia.org/ontology/subsidiary
RDF snippet:
<http://dbpedia.org/resource/Google>
<http://dbpedia.org/ontology/subsidiary>
<http://dbpedia.org/resource/YouTube> .
<http://dbpedia.org/resource/Google> rdfs:label "Google"@en .
<http://dbpedia.org/resource/YouTube> rdfs:label "Youtube"@en .
Sentences:
Google’s acquisition of Youtube comes as online video is really starting
to hit its stride.
Youtube, a division of Google, is exploring a new way to get more
high-quality clips on its site: financing amateur video creators.
Patterns:
subsidiary: S’s acquisition of O
subsidiary: O, a division of S
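The pattern extraction in this example can be approximated in a few lines. This is a sketch of the idea only, not the BOA code: `extract_pattern` is a hypothetical helper that takes the text between the first occurrences of the two labels and replaces the labels with the placeholders S and O; scoring and filtering of patterns are omitted.

```python
import re


def extract_pattern(sentence, s_label, o_label):
    """Return a pattern such as "S's acquisition of O", or None if a label is missing."""
    s_match = re.search(re.escape(s_label), sentence, flags=re.IGNORECASE)
    o_match = re.search(re.escape(o_label), sentence, flags=re.IGNORECASE)
    if s_match is None or o_match is None:
        return None
    # Order the two mentions by position so the pattern reads left to right.
    first, second = sorted([(s_match, "S"), (o_match, "O")],
                           key=lambda pair: pair[0].start())
    between = sentence[first[0].end():second[0].start()]
    return f"{first[1]}{between}{second[1]}"


sentences = [
    "Google's acquisition of Youtube comes as online video is really starting to hit its stride.",
    "Youtube, a division of Google, is exploring a new way to get more high-quality clips on its site.",
]
for sentence in sentences:
    print(extract_pattern(sentence, "Google", "YouTube"))
```

Matching is case-insensitive so that the label "YouTube" still finds the surface form "Youtube" used in the sentences.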
BOA
The use of BOA patterns allows us to match natural language expressions
and ontology concepts even if they are not string similar and not covered
by WordNet.
Examples:
married to → http://dbpedia.org/ontology/spouse
was born in → http://dbpedia.org/ontology/birthPlace
graduated from → http://dbpedia.org/ontology/almaMater
write → http://dbpedia.org/ontology/author
Example: Who produced the most films?
Candidates for filling query slots:
?c CLASS [films]
<http://dbpedia.org/ontology/Film>
<http://dbpedia.org/ontology/FilmFestival>
. . .
?p PROPERTY [produced]
<http://dbpedia.org/ontology/producer>
<http://dbpedia.org/property/producer>
<http://dbpedia.org/ontology/wineProduced>
. . .
Semantic Gap: Query ranking and selection
1 Every entity receives a score considering string similarity and prominence
2 The score of a query is then computed as the average of the scores of the entities used to fill its slots
3 In addition, type checks are performed
4 Of the remaining queries, the one with highest score that returns a result is chosen to retrieve an answer.
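A minimal sketch of the ranking and selection steps follows. How string similarity and prominence are combined is not stated on the slides, so the linear mix in `entity_score` and its `weight` parameter are assumptions; `run_query` is a hypothetical callable standing in for a SPARQL endpoint, and the type checks of step 3 are left out.

```python
def entity_score(similarity, prominence, weight=0.5):
    """Assumed combination: linear mix of string similarity and prominence."""
    return weight * similarity + (1 - weight) * prominence


def query_score(entity_scores):
    """Score of a query: average of the scores of its slot-filling entities."""
    return sum(entity_scores) / len(entity_scores)


def select_query(candidates, run_query):
    """Pick the highest-scoring query that returns a non-empty result.

    candidates: list of (query, [entity scores]) pairs.
    run_query:  callable returning a list of results for a query.
    """
    ranked = sorted(candidates, key=lambda c: query_score(c[1]), reverse=True)
    for query, _scores in ranked:
        if run_query(query):
            return query
    return None


candidates = [("q1", [0.9, 0.8]), ("q2", [0.6, 0.6])]
# Stand-in endpoint: q1 yields nothing, so the lower-ranked q2 is selected.
print(select_query(candidates, lambda q: [] if q == "q1" else ["answer"]))
```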
Example: Who produced the most films?
SELECT ?x WHERE {
?x <http://dbpedia.org/ontology/producer> ?y .
?y rdf:type <http://dbpedia.org/ontology/Film> .
}
ORDER BY DESC(COUNT(?y)) LIMIT 1
Score: 0.7592425075864263
SELECT ?x WHERE {
?x <http://dbpedia.org/ontology/film> ?y .
}
ORDER BY DESC(COUNT(?y)) LIMIT 1
Score: 0.6264001353183296
SELECT ?x WHERE {
?x <http://dbpedia.org/ontology/producer> ?y .
?y rdf:type <http://dbpedia.org/ontology/FilmFestival> .
}
ORDER BY DESC(COUNT(?y)) LIMIT 1
Score: 0.6012584940627768
Evaluation Setup
Question set: 39 DBpedia training questions from QALD-1
5 could not be parsed due to unknown syntactic constructions or
uncovered domain-independent expressions
19 were answered exactly as required by the benchmark (with
precision and recall 1.0)
Another 2 are answered almost correctly (with precision and recall
greater than 0.8)
Mean precision: 0.61
Mean recall: 0.63
F-measure: 0.62
Main Sources of Error
Incorrect templates
Template structure does not coincide with structure of the data:
When did Germany join the EU?
res:Germany dbp:accessioneudate ?x .
Predicate detection fails
inhabitants → dbp:population, dbp:populationTotal
owns → dbo:keyPerson
higher → dbp:elevationM
Wrong query is selected
Who wrote The pillars of the Earth?
res:The Pillars of the Earth (TV Miniseries) dbo:writer ?x .
res:The Pillars of the Earth dbo:author ?x .
Semantic Gap: Challenges
Schema-agnostic QA
Query Ranking
Relation Extraction
Ontology Lexicalization
Extraction of surface forms
Justification Gap
Problem
Are you sure? Prove it to me.
Justification Gap
Solution
Gathering natural-language evidence?
http://aksw.org/Projects/defacto
Justification Gap: Automatic Query Generation
Solution
Gathering natural-language evidence?
⇓
Justification Gap: Evidence Generation
(s, p, o) ⇒ “ρ(s)” “ρ(p)” “ρ(o)”
:Momo :author :Ende
1 “Momo” “author” “Michael Ende”
2 “Momo” “written by” “Michael Ende”
3 “Momo” “book by” “Michael Ende”
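The query generation above can be sketched as a one-line mapping from a triple's labels to quoted search strings. The helper name `evidence_queries` is hypothetical, and the list of predicate phrases is assumed to come from a source such as the BOA pattern library; this is not the DeFacto implementation.

```python
def evidence_queries(subject_label, object_label, predicate_phrases):
    """Build one quoted search-engine query per natural-language predicate phrase."""
    return [f'"{subject_label}" "{phrase}" "{object_label}"'
            for phrase in predicate_phrases]


for query in evidence_queries("Momo", "Michael Ende", ["author", "written by", "book by"]):
    print(query)
```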
Justification Gap: Proof Scoring
Combination of features including
1 Score of BOA pattern
2 Token distance
3 Total occurrence of resource labels
4 Similarity to title
Justification Gap: Trustworthiness
Combination of features including
1 Topic majority on the Web
2 Topic majority in results
3 Topic terms
4 Page rank
Justification Gap: Fact Confirmation
Combination of features including
1 Combined trustworthiness and proof score
2 Number of proofs
3 Total hit count
4 Domain/Range
Justification Gap: Evaluation
10 triples/property
Top-60 most used properties
473 of 600 manually verified to be true
Justification Gap: Evaluation
J48 is overall best classifier (78.8% - 87.6%)
Easiest data set: random
Mixed dataset hardest
Challenges
Summary
Language Gap
Semantic Gap
Justification Gap
Access Gap
Data Gap
Noise Gap
. . .
Goal
The End
Thank you! Questions?
Axel Ngonga
http://aksw.org/AxelNgonga
ngonga@informatik.uni-leipzig.de
AKSW Research Group
University of Leipzig, Germany
@akswgroup
@NgongaAxel
Emergent
Semantics
The Semantics of the Semantic Web
•  A priori: top-down semantics
–  Logical assertions
–  Crisp reuse of conceptualization
•  In practice: hybrid bottom-up/top-down approach
–  (Human/software) agents are sloppy/ignorant
–  Agents do not agree (for various reasons)
=> Centralized view on decentralized construct ?
Semantic Grounding
The meaning of symbols can be explained by its
semantic correspondences to other symbols alone
[“Understanding understanding” Rapaport 93]
•  Type 1 semantics: understanding in terms of something else
•  Problem: how to ground semantics?
•  Type 2 semantics: understanding something in terms of itself
•  “syntactic semantics”: grounding through recursive
understanding
Emergent Semantics
Emergent Semantics:
•  Semantics as a posteriori agreement on conceptualizations
=> Don’t believe / enforce the schema !
•  Semantics of symbols as recursive correspondences to other
symbols
•  Analyzing transitive closures of mappings
•  Self-organizing, bottom-up approach
•  Global semantics (stable states) emerging from multiple
local interactions
•  Syntactic semantics
•  Studying semantics from a syntactic perspective
3 Concrete Examples
1.  Emergence of Semantic Interoperability
2.  Entity disambiguation using same-as networks
3.  A posteriori schema for LOD properties
Semantic Connectivity
•  How many links do you need to make a semantic network
interoperable?
•  Semantic interoperability as an emergent property!
⇒  Connectivity indicator: $ci = \sum_{j,k} \bigl(jk - j(bc + cc) - k\bigr)\, p_{jk}$
•  Necessary condition for semantic interoperability in the
large: $ci \geq 0$
Philippe Cudré-Mauroux, Karl Aberer: A Necessary Condition for Semantic
Interoperability in the Large. CoopIS/DOA/ODBASE 2004: 859-872.
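The connectivity indicator can be evaluated numerically with a short sketch. The slide does not define its symbols, so the reading used here is an assumption: p_jk is taken as the probability that a node has in-degree j and out-degree k, and bc and cc as two graph coefficients passed in as plain numbers. The function name and the toy distribution are hypothetical.

```python
def connectivity_indicator(degree_distribution, bc, cc):
    """ci = sum over (j, k) of (j*k - j*(bc + cc) - k) * p_jk.

    degree_distribution maps (j, k) pairs to probabilities p_jk.
    """
    return sum((j * k - j * (bc + cc) - k) * p
               for (j, k), p in degree_distribution.items())


# Toy distribution: half the nodes have degrees (1, 1), half have (2, 2).
p = {(1, 1): 0.5, (2, 2): 0.5}
print(connectivity_indicator(p, bc=0.0, cc=0.0))  # satisfies ci >= 0
print(connectivity_indicator(p, bc=0.5, cc=0.5))  # violates ci >= 0
```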
Graph-Based Disambiguation
•  The great thing about unique identifiers is that there are
so many to choose from
–  URI jungle
–  Disambiguation based on transitive closures on equality links
Philippe Cudré-Mauroux, Parisa Haghani, Michael Jost, Karl Aberer, Hermann de Meer:
idMesh: graph-based disambiguation of linked data. WWW 2009: 591-600.
A Posteriori Schema
•  Instance data use schema constructs in creative
ways!
⇒ Retro-engineering of schema constructs based on the
deployment of instance data
⇒ Context-dependent, retro-compatible
Alberto Tonon, Michele Catasta, Gianluca Demartini, Philippe Cudré-Mauroux: Fixing the
Domain and Range of Properties in Linked Data by Context Disambiguation. LDOW 2015.
Research Directions
•  Tons of research opportunities in this field
•  Understanding the emergent properties of LOD
networks (and how to exploit them)
•  Analyzing the deployment / use of semantic data (a
priori VS a posteriori views)
•  Capturing user disagreement (e.g., multi-views
ontologies, fuzzy ontologies, results diversification)
Leveraging
Semantic Data
for
Sense Making
Opportunity: The 3-Vs of Big Data
Volume
■  amount of data
Velocity
■  speed of data in and out
Variety
■  range of data types and sources
[Gartner 2012] "Big Data are high-volume, high-velocity, and/or high-variety
information assets that require new forms of processing to enable enhanced
decision making, insight discovery and process optimization"
Entities
Information Management
•  The story so far:
–  Strict separation between unstructured and structured
data management infrastructures
Figure: structured stack (DBMS, JDBC, SQL) next to unstructured stack (inverted index, keywords, HTTP)
Information Integration
•  Information integration is still one of the biggest
CS problems out there (according to many, e.g., Gartner)
•  Information integration typically requires some
sort of mediation
1.  Unstructured Data: keywords, synsets
2.  Structured Data: global schema, transitive closure of
schemas (mostly syntactic)
⇒ nightmarish if 1 and 2 taken separately, horror
marathon if considered together
Entities as Mediation
•  Rising paradigm
–  Store information at the entity granularity
–  Integrate information by inter-linking entities
•  Advantages?
–  Coarser granularity compared to keywords
•  More natural, e.g., brain functions similarly (or is it the
other way around?)
–  Denormalized information compared to RDBMSs
•  Schema-later, heterogeneity, sparsity
•  Pre-computed joins, “Semantic” linking
•  Drawbacks?
Entity-Centric Data Management
Higher-level apps
Exposing Textual Data
•  The XI Pipeline
•  Runs on massive amounts of data (Spark)
Pipeline stages: Mention Extraction, NER, Entity Linking, Entity Typing
Named Entity Recognition (NER)
Figure: NER pipeline. Components shown: text extraction (Apache Tika), list of extracted n-grams, n-gram indexing, candidate selection (for each n-gram), list of selected n-grams, feature extraction (features), supervised classifier, ranked list of n-grams. Further steps shown: lemmatization, POS tagging, n+1 grams merging, frequency reweighting.
Roman Prokofyev, Gianluca Demartini, Philippe Cudré-Mauroux:
Effective named entity recognition for idiosyncratic web collections. WWW 2014: 397-408
Entity Linking
•  Linking entities to text is an old problem…
–  … and is extremely hard, esp. for machines
•  Dozens of approaches have been suggested
•  What if
–  We want to combine approaches / frameworks?
–  We want to leverage both human computations &
algorithms?
ZenCrowd
•  Integrate textual data w/ the Web of Data
•  Uses sets of algorithmic matchers to match
entities to online concepts
•  Uses dynamic templating to create micro-
matching-tasks and publish them on MTurk
•  Combines both algorithmic and human matchers
using probabilistic networks
Gianluca Demartini, Djellel Eddine Difallah, Philippe Cudré-Mauroux:
ZenCrowd: leveraging probabilistic reasoning and crowdsourcing
techniques for large-scale entity linking. WWW 2012: 469-478
ZenCrowd Architecture
Figure: ZenCrowd architecture. Input: HTML pages. Output: HTML + RDFa pages. Components shown: entity extractors, algorithmic matchers, LOD index (get entity) over the LOD Open Data Cloud, probabilistic network, decision engine, micro-task manager, micro matching tasks, crowdsourcing platform, workers' decisions.
Probabilistic Inference
•  Probabilistic network to integrate a priori & a
posteriori information
–  Agreement of good turkers & algorithms
•  Learning process
–  Constraints
•  Unicity
•  Equality (SameAs)
–  Giant probabilistic graph
•  Instantiated selectively
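Figure: probabilistic network over workers (w1, w2), links (l1, l2) and variables c11, c12, c13, c21, c22, with factors pw1( ), pw2( ), lf1( ), lf2( ), lf3, pl1( ), pl2( ), and constraint factors u2-3( ) and sa1-2( )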
Does it Work?
•  Improves avg. prec. by 0.14 on average!
–  Minimal crowd involvement
–  Embarrassingly parallel problem
Figure: worker precision against number of tasks (US workers, IN workers, top US worker)
Figure: precision against top-K workers (K from 1 to 9)
Entity Typing
•  Entities can have many types (facets)
•  Which fine-grained types are most relevant given the context?
Example types: Thing, American Billionaires, People from King County, People from Seattle, Windows People, Agent, Person, Living People, American People of Scottish Descent, Harvard University People, American Computer Programmers, American Philanthropists
TRank
•  Fine-grained Typing
•  Tree of 447,260 types
•  Rooted on <owl:Thing>
•  Depth of 19
•  Ranks relevant types by analyzing the context
•  Textual context
•  Graph context
•  Decision trees
•  Linear regression
Alberto Tonon, Michele Catasta, Gianluca Demartini, Philippe Cudré-Mauroux, Karl Aberer:
TRank: Ranking Entity Types Using the Web of Data. ISWC 2013: 640-656.
Exposing Relational Data
•  Mapping language file describes the relation between ontology and RDB
•  Server provides HTML and linked data views and a SPARQL 1.1 endpoint
•  Rewriting engine uses mappings to rewrite Jena & Sesame API calls to SQL queries and generates RDF dumps in various formats
http://d2rq.org/ ,
http://aksw.org/Projects/Sparqlify.html , etc.
Exposing Webtables
•  Wealth of data in (HTML) tables
•  Yet another type of content to expose
Sreeram Balakrishnan, Alon Y. Halevy, Boulos Harb, Hongrae Lee, Jayant Madhavan, Afshin
Rostamizadeh, Warren Shen, Kenneth Wilder, Fei Wu, Cong Yu: Applying WebTables in
Practice. CIDR 2015
Tao, Cui, and David W. Embley. "Automatic hidden-web table interpretation, conceptualization,
and semantic annotation." Data & Knowledge Engineering 68.7 (2009): 683-703.
Application 1: Enterprise Search
•  How can end-users reach entities?
⇒ Structured search
⇒ Keyword search
•  On their names or attributes
–  Obviously not ideal
•  BM25 on TREC 2011 AOR: MAP=0.15, P@10=0.20
•  Query extension, query completion or pseudo-relevance
feedback yield comparable (or worse) results
Hybrid Entity Search
Figure: example graph. TheDescendants (type; title "The Descendants"); GeorgeClooney (type; name "George Clooney"; dateOfBirth May 6, 1961); ShaileneW (type; name "Shailene Woodley"; dateOfBirth Nov. 15, 1991); both persons are linked to TheDescendants via playsIn.
•  Main idea: combine unstructured and structured
search
–  Inverted index to locate first candidates
–  Graph queries to refine the results
•  Graph traversals (queries on object properties)
•  Graph neighborhoods (queries on
data type properties)
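The two-stage idea above can be illustrated on the example graph with a toy sketch: an inverted index over entity labels finds seed entities, and a one-hop traversal over object properties adds related entities. All names (`LABELS`, `EDGES`, `build_inverted_index`, `hybrid_search`) are hypothetical, and ranking is left out entirely.

```python
from collections import defaultdict

# Toy data mirroring the example graph; identifiers are illustrative only.
LABELS = {
    "TheDescendants": "The Descendants",
    "GeorgeClooney": "George Clooney",
    "ShaileneW": "Shailene Woodley",
}
EDGES = [
    ("GeorgeClooney", "playsIn", "TheDescendants"),
    ("ShaileneW", "playsIn", "TheDescendants"),
]


def build_inverted_index(labels):
    """Map each lower-cased label token to the entities whose label contains it."""
    index = defaultdict(set)
    for entity, label in labels.items():
        for token in label.lower().split():
            index[token].add(entity)
    return index


def hybrid_search(keywords, index, edges):
    """Return (seed entities from the index, seeds plus one-hop graph neighbours)."""
    seeds = set()
    for token in keywords.lower().split():
        seeds |= index.get(token, set())
    expanded = set(seeds)
    for subject, _predicate, obj in edges:
        if subject in seeds:
            expanded.add(obj)
        if obj in seeds:
            expanded.add(subject)
    return seeds, expanded


index = build_inverted_index(LABELS)
print(hybrid_search("descendants", index, EDGES))
```

In a real deployment the traversal would be a SPARQL query against the RDF store rather than a loop over an in-memory edge list.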
Figure: inverted index (keywords, HTTP) combined with DBMS (SPARQL)
Architecture
Figure: hybrid search architecture. index(): LOD Cloud into inverted index, structured inverted index and RDF store. query(): user keyword query, query annotation and expansion (WordNet, 3rd party search engines, pseudo-relevance feedback), entity search over the inverted index with ranking functions, intermediate top-k results, graph traversals (queries on object properties) and neighborhoods (queries on datatype properties) over the RDF store, graph-enriched results, final ranking function.
Alberto Tonon, Gianluca Demartini, Philippe Cudré-Mauroux: Combining inverted indices and
structured search for ad-hoc object retrieval. SIGIR 2012: 125-134
Application 2:
Literature Browsing/Recommendation
Application 3: Co-Reference Resolution
•  Better co-reference resolution through the
knowledge base
Roman Prokofyev, Alberto Tonon, Michael Luggen, Loic Vouilloz, Djellel Eddine Difallah, and
Philippe Cudre-Mauroux: SANAPHOR: Ontology-Based Coreference Resolution. ISWC 2015.
Example: Barack Obama called Angela Merkel last week; the president asked the chancellor whether…
Research Opportunities
•  NER in vertical domains
•  Crowdsourcing parts of the processing
•  Predicate extraction
•  Summarization
•  Exposing further types of content
•  Updates / transactions
•  Parallelization
•  Higher-level applications
Research Tasks
1.  Analyzing emergent properties of LOD
2.  Crowdsourcing predicate extraction
3.  SPARQL verbalization
4.  Hybrid question answering
5.  Source selection
6.  Ranking SPARQL queries

SSSW 2015 Sense Making

  • 1. SSSW 2015 Bertinoro–Italy July 10, 2015 Sense Making Axel-Cyrille Ngonga Ngomo & Philippe Cudré-Mauroux
  • 2. On Making Sense •  ½ of Computer Science is about making sense of some input data –  KDD (cf. Claudia & Laura tutorial) –  NLP (cf. Roberto’s talk) –  Multimedia Analysis –  Social Media / Big Data Analytics –  Visualization –  etc.
  • 3. On the Menu Today •  Making Sense of Semantic Data –  Making sense of SPARQL & Semantic Web predicates –  Trust on Semantic Web data –  Emergent Semantics •  Leveraging Semantic Data for Sense Making –  Making sense of textual entities –  Making sense of relational data –  Making sense of webtables
  • 5. Introduction At some point in the early twenty-first century, all of mankind was united in celebration. We marveled at our own magnificence as we gave birth to AI. – Morpheus, The Matrix
  • 6. Introduction
  • 7. Linked Data Web
  • 8. Linked Data Web Sense Making Helping end users to make sense of the Semantic Web.
  • 9. Gaps Language Gap Semantic Web speaks languages that normal users do not understand
  • 10. Language Gap Problem What does it mean?
  • 11. Language Gap
  • 12. Language Gap Problem What does it mean?
      PREFIX dbo: <http://dbpedia.org/ontology/>
      PREFIX res: <http://dbpedia.org/resource/>
      SELECT DISTINCT ?person WHERE {
        ?person dbo:team ?sportsTeam .
        ?sportsTeam dbo:league res:Premier_League .
        ?person dbo:birthDate ?date .
        ?person dbo:birthPlace ?place .
        { ?place dbo:locatedIn res:Africa . }
        UNION
        { ?place dbo:locatedIn res:Asia . }
      }
      ORDER BY DESC(?date)
      OFFSET 0 LIMIT 1
  • 13. Language Gap Problem What does it mean?
      PREFIX dbo: <http://dbpedia.org/ontology/>
      PREFIX res: <http://dbpedia.org/resource/>
      SELECT DISTINCT ?person WHERE {
        ?person dbo:team ?sportsTeam .
        ?sportsTeam dbo:league res:Premier_League .
        ?person dbo:birthDate ?date .
        ?person dbo:birthPlace ?place .
        { ?place dbo:locatedIn res:Africa . }
        UNION
        { ?place dbo:locatedIn res:Asia . }
      }
      ORDER BY DESC(?date)
      OFFSET 0 LIMIT 1
    Give me the youngest person who plays in a Premier League team and was born in Africa or Asia.
  • 14. Language Gap Solution Verbalization frameworks for the Semantic Web Document planner, Microplanner, Realizer http://github.com/AKSW/SemWeb2NL
  • 15. Language Gap: Triple2NL/BGP2NL Approach 1 ρ(s p o) ⇒ poss(ρ(p),ρ(s)) ∧ subj(BE,ρ(p)) ∧ dobj(BE,ρ(o)) 2 ρ(s p o) ⇒ subj(ρ(p),ρ(s)) ∧ dobj(ρ(p),ρ(o))
  • 16. Language Gap: Triple2NL/BGP2NL Approach 1 ρ(s p o) ⇒ poss(ρ(p),ρ(s)) ∧ subj(BE,ρ(p)) ∧ dobj(BE,ρ(o)) 2 ρ(s p o) ⇒ subj(ρ(p),ρ(s)) ∧ dobj(ρ(p),ρ(o)) 1 :Momo :author :Ende ⇒ Momo’s author is Michael Ende. 2 ?x :author :Ende ⇒ ?x’s author is Michael Ende.
  • 17. Language Gap: Triple2NL/BGP2NL Approach 1 ρ(s p o) ⇒ poss(ρ(p),ρ(s)) ∧ subj(BE,ρ(p)) ∧ dobj(BE,ρ(o)) 2 ρ(s p o) ⇒ subj(ρ(p),ρ(s)) ∧ dobj(ρ(p),ρ(o)) 1 :Momo :author :Ende ⇒ Momo’s author is Michael Ende. 2 ?x :author :Ende ⇒ ?x’s author is Michael Ende. 3 :Momo :writtenBy :Ende ⇒ Momo was written by Michael Ende. 4 ?x :writtenBy :Ende ⇒ ?x was written by Michael Ende.
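The two Triple2NL rules on this slide can be illustrated with a minimal Python sketch. It is not the SemWeb2NL implementation: the lexicon `PREDICATE_LEXICON`, the label table `LABELS`, and the function names are hypothetical, and the noun/verb decision is hard-coded here rather than derived from linguistic analysis.

```python
# Hypothetical lexicon: says whether a predicate reads as a noun ("author")
# or as a verb phrase ("was written by"). A real system would derive this.
PREDICATE_LEXICON = {
    "author": ("noun", "author"),
    "writtenBy": ("verb", "was written by"),
}

# Hypothetical label lookup for resources.
LABELS = {"Momo": "Momo", "Ende": "Michael Ende"}


def label(term):
    """Return a readable label; variables such as ?x are kept as they are."""
    if term.startswith("?"):
        return term
    name = term.lstrip(":")
    return LABELS.get(name, name)


def verbalize_triple(s, p, o):
    """Apply rule 1 (possessive) to noun predicates, rule 2 to verb predicates."""
    kind, text = PREDICATE_LEXICON[p.lstrip(":")]
    subj, obj = label(s), label(o)
    if kind == "noun":
        # Rule 1: poss(p, s), subj(BE, p), dobj(BE, o)
        return f"{subj}'s {text} is {obj}."
    # Rule 2: subj(p, s), dobj(p, o)
    return f"{subj} {text} {obj}."


print(verbalize_triple(":Momo", ":author", ":Ende"))
print(verbalize_triple("?x", ":writtenBy", ":Ende"))
```

The sketch prints the first and fourth example sentences from the slide; the other two follow by swapping subject and predicate.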
  • 18. Language Gap: SPARQL2NL/RDF2NL Approach Combination rules 1 ρ((s, p, o1).(s, p, o2)) ⇒ poss(ρ(p),ρ(s)) ∧ subj(BE,ρ(p)) ∧ dobj(BE, cc(ρ(o1), ρ(o2))) ?x’s author is Paul Erdős and ?x’s author is Kevin Bacon. ⇒ ?x’s authors are Paul Erdős and Kevin Bacon.
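The combination rule on this slide merges triples that share subject and predicate into one sentence with coordinated objects. A rough Python sketch follows; the function name `aggregate` is hypothetical, the inputs are assumed to be already-resolved labels, and the plural is formed by naively appending "s", which a real realizer would not do.

```python
from collections import OrderedDict


def aggregate(triples):
    """Group (subject, predicate, object) label triples by subject and
    predicate, then emit one possessive sentence per group."""
    grouped = OrderedDict()
    for s, p, o in triples:
        grouped.setdefault((s, p), []).append(o)

    sentences = []
    for (s, p), objs in grouped.items():
        if len(objs) == 1:
            sentences.append(f"{s}'s {p} is {objs[0]}.")
        else:
            # Coordinate the objects: "A and B", or "A, B and C".
            joined = ", ".join(objs[:-1]) + " and " + objs[-1]
            # Naive plural (assumption): append "s" to the predicate label.
            sentences.append(f"{s}'s {p}s are {joined}.")
    return sentences


print(aggregate([
    ("?x", "author", "Paul Erdős"),
    ("?x", "author", "Kevin Bacon"),
]))
```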
  • 19. Language Gap: SPARQL2NL/RDF2NL ?place is Shakespeare’s birth place or ?place is Shakespeare’s death place. ⇒ ?place is Shakespeare’s birth or death place. This query retrieves values ?height such that ?height is Claudia Schiffer’s height. ⇒ This query retrieves Claudia Schiffer’s height. ?person’s team is ?sportsTeam. ?person’s birth date is ?date. ?sportsTeam’s league is Premier League. ⇒ ?person’s team is ?sportsTeam, ?person’s birth date is ?date, and ?sportsTeam’s league is Premier League.
  • 20. Language Gap: Evaluation 125 participants, 49 SPARQL experts, 3 tasks. 94% of verbalizations were understandable. 5.31 ± 1.08 average adequacy score. [Figure: Adequacy and fluency results in survey; number of survey answers per score, 1 to 6]
  • 21. Language Gap: Evaluation 125 participants, 49 SPARQL experts, 3 tasks. Slightly larger error with NL for experts. Non-experts enabled to understand the meaning of queries. [Figure: Error rate over the three tasks, for SPARQL, NL, and NL (SPARQL experts)]
  • 22. Language Gap: Evaluation 125 participants, 49 SPARQL experts, 3 tasks. Non-experts faster with NL than experts with SPARQL. Experts faster with NL than experts with SPARQL. [Figure: Average time needed, in minutes, with standard deviation, for SPARQL, NL, and NL (SPARQL experts), each unfiltered and filtered]
  • 23. Language Gap: Application
  • 24. Language Gap: Application
  • 25. Language Gap: Challenges Complex queries Sacrifice adequacy for fluency Other languages Hybrid approach Personalization
  • 26. Gaps Semantic Gap Decentralized content generation Contextualization mismatch
  • 27. Semantic Gap Problem How do I communicate with it?
  • 28. Semantic Gap Solution Question Answering Systems Example: Where did Abraham Lincoln die? SELECT ?x WHERE { res:Abraham_Lincoln dbo:deathPlace ?x . } PowerAqua: Triple representation: state/place, die, Abraham Lincoln Ontology mappings: Place, deathPlace, Abraham Lincoln
  • 29. Semantic Gap: Mismatch Triples do not always provide a faithful representation of the semantic structure of the question Thus more expressive queries cannot be answered Example 1: Which cities have more than three universities? SELECT ?y WHERE { ?x rdf:type dbo:University . ?x dbo:city ?y . } HAVING (COUNT(?x) > 3) Triple representation: cities, more than, universities three
  • 30. Semantic Gap: Mismatch Triples do not always provide a faithful representation of the semantic structure of the question Thus more expressive queries cannot be answered Example 2: Who produced the most films? SELECT ?y WHERE { ?x rdf:type dbo:Film . ?x dbo:producer ?y . } ORDER BY DESC(COUNT(?x)) LIMIT 1 Triple representation: person/organization, produced, most films
  • 31. Semantic Gap: Approach To understand a user question, we need to understand: The words: Abraham Lincoln → res:Abraham_Lincoln; died in → dbo:deathPlace. The semantic structure: the most N → ORDER BY DESC(COUNT(?n)) LIMIT 1; more than three N → HAVING (COUNT(?n) > 3). Template-Based Question Answering 1 Template generation (understanding the semantic structure) 2 Template instantiation (understanding the words)
  • 34. Semantic Gap: Example Query: Who produced the most films? 1 SPARQL template: SELECT ?x WHERE { ?y rdf:type ?c . ?y ?p ?x . } ORDER BY DESC(COUNT(?y)) LIMIT 1 ?c CLASS [films] ?p PROPERTY [produced] 2 Instantiations: ?c = <http://dbpedia.org/ontology/Film> ?p = <http://dbpedia.org/ontology/producer>
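The slot-filling step on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the `instantiate` helper and the dictionary layout for slot bindings are hypothetical names introduced here.

```python
import re

# Template from the slide; ?c and ?p are slots, ?x and ?y stay query variables.
TEMPLATE = (
    "SELECT ?x WHERE { ?y rdf:type ?c . ?y ?p ?x . } "
    "ORDER BY DESC(COUNT(?y)) LIMIT 1"
)


def instantiate(template: str, bindings: dict) -> str:
    """Replace every bound slot variable with its URI in angle brackets.

    Hypothetical helper: variables without a binding are left untouched.
    """
    def fill(match: re.Match) -> str:
        var = match.group(0)
        return f"<{bindings[var]}>" if var in bindings else var

    # Match whole variable names so that '?c' never rewrites part of '?city'.
    return re.sub(r"\?\w+", fill, template)


query = instantiate(TEMPLATE, {
    "?c": "http://dbpedia.org/ontology/Film",
    "?p": "http://dbpedia.org/ontology/producer",
})
print(query)
```

Keeping the template as plain text with named slots means the same structure can be reused with every candidate URI pair produced by the instantiation step.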
  • 35. Semantic Gap: Architecture [Figure: architecture diagram. Components shown: Natural Language Question, Tagged Question, Parsing, Domain Independent Lexicon, Domain Dependent Lexicon, Semantic Representation, SPARQL Query Templates, Templates with URI slots, Entity identification, Resources and Classes, Properties, BOA Pattern Library, Corpora, Entity and Query Ranking, Type Checking and Prominence, Ranked SPARQL Queries, Query Selection, SPARQL Endpoint, LOD, Answer. Legend: Loading, State, Process, Uses.]
  • 37. Semantic Gap: Template Generation 1 Natural language question is tagged with part-of-speech information.
  • 38. Semantic Gap: Template Generation 2 Based on POS tags, lexical entries are built on the fly.
  • 39. Semantic Gap: Template Generation 3 These lexical entries, together with domain-independent lexical entries, are used for parsing the question.
  • 40. Semantic Gap: Template Generation 4 The resulting semantic representation is translated into a SPARQL template.
  • 41. Semantic Gap: Who produced the most films? domain-independent: who, the most domain-dependent: produced/VBD, films/NNS SPARQL template 1: SELECT ?x WHERE { ?x ?p ?y . ?y rdf:type ?c . } ORDER BY DESC(COUNT(?y)) LIMIT 1 ?c CLASS [films] ?p PROPERTY [produced]
  • 44. Semantic Gap: Who produced the most films? domain-independent: who, the most domain-dependent: produced/VBD, films/NNS SPARQL template 2: SELECT ?x WHERE { ?x ?p ?y . } ORDER BY DESC(COUNT(?y)) LIMIT 1 ?p PROPERTY [films]
  • 46. Semantic Gap: Template instantiation 1 For resources and classes: Identify synonyms of the label using WordNet. Retrieve entities with a label similar to the slot label based on string similarities (trigram, Levenshtein, substring).
  • 47. Semantic Gap: Template instantiation 2 For property labels, the label is additionally compared to natural language expressions stored in the BOA pattern library.
  • 48. Semantic Gap: Template instantiation 3 The highest ranking entities are returned as candidates for filling the query slots.
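The string-similarity part of the instantiation steps above can be sketched as follows. This is a minimal illustration, not the system's actual scoring: the equal weighting of trigram and Levenshtein similarity, the `rank_candidates` helper, and the candidate labels (derived here from the URI local names rather than fetched from DBpedia) are all assumptions, and the WordNet synonym lookup is left out.

```python
def trigrams(text: str) -> set:
    """Character trigrams of the lower-cased, padded text."""
    padded = f"  {text.lower()} "
    return {padded[i:i + 3] for i in range(len(padded) - 2)}


def trigram_similarity(a: str, b: str) -> float:
    """Jaccard overlap of the two trigram sets, in [0, 1]."""
    ta, tb = trigrams(a), trigrams(b)
    return len(ta & tb) / len(ta | tb)


def levenshtein(a: str, b: str) -> int:
    """Classic edit distance (insert, delete, substitute), row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute
        prev = curr
    return prev[-1]


def rank_candidates(slot_label: str, candidates: dict) -> list:
    """Return (uri, score) pairs, best first. Equal weights are an assumption."""
    scored = []
    for uri, label in candidates.items():
        a, b = slot_label.lower(), label.lower()
        edit_sim = 1 - levenshtein(a, b) / max(len(a), len(b), 1)
        scored.append((uri, 0.5 * trigram_similarity(a, b) + 0.5 * edit_sim))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Illustrative labels only (assumed, not retrieved from the knowledge base).
ranked = rank_candidates("films", {
    "http://dbpedia.org/ontology/FilmFestival": "film festival",
    "http://dbpedia.org/ontology/Film": "film",
})
for uri, score in ranked:
    print(f"{score:.3f}  {uri}")
```

With these assumed labels, the shorter label "film" scores higher for the slot "films" than "film festival", which mirrors the candidate order shown on the later example slide.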
  • 49. BOA The BOA pattern library is a repository of natural language representations of Semantic Web predicates. Idea: For each predicate P in a data repository (e.g. DBpedia), collect the set of entities S and O connected through P. Search a text corpus (e.g. Wikipedia) for all sentences containing the labels of S and O. For all retrieved sentences, the natural language predicate is a potential pattern for P. The potential patterns are then scored by a neural network (e.g. according to frequency) and filtered.
  • 50. BOA: Example Predicate: http://dbpedia.org/ontology/subsidiary RDF snippet: <http://dbpedia.org/resource/Google> <http://dbpedia.org/ontology/subsidiary> <http://dbpedia.org/resource/YouTube> . <http://dbpedia.org/resource/Google> rdfs:label "Google"@en . <http://dbpedia.org/resource/YouTube> rdfs:label "Youtube"@en . Sentences: Google’s acquisition of Youtube comes as online video is really starting to hit its stride. Youtube, a division of Google, is exploring a new way to get more high-quality clips on its site: financing amateur video creators. Patterns: subsidiary: S’s acquisition of O subsidiary: O, a division of S
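The pattern-extraction idea on this slide can be illustrated with a short Python sketch. It assumes that a pattern is simply the text between the two label mentions; the `extract_patterns` helper is a hypothetical name, and the scoring and filtering steps of BOA are not modelled.

```python
import re


def extract_patterns(sentences, subject_label, object_label):
    """Return patterns such as "S's acquisition of O" (hypothetical helper).

    A sentence contributes a pattern only if it mentions both labels.
    """
    patterns = []
    for sentence in sentences:
        s = re.search(re.escape(subject_label), sentence, re.IGNORECASE)
        o = re.search(re.escape(object_label), sentence, re.IGNORECASE)
        if s is None or o is None:
            continue
        # Keep the text between the two mentions, whichever comes first.
        if s.start() < o.start():
            patterns.append("S" + sentence[s.end():o.start()] + "O")
        else:
            patterns.append("O" + sentence[o.end():s.start()] + "S")
    return patterns


sentences = [
    "Google's acquisition of Youtube comes as online video is really "
    "starting to hit its stride.",
    "Youtube, a division of Google, is exploring a new way to get more "
    "high-quality clips on its site.",
    "Online video keeps growing.",  # mentions neither label, so it is skipped
]
patterns = extract_patterns(sentences, "Google", "Youtube")
print(patterns)
```

Replacing the labels with the placeholders S and O is what makes a pattern reusable for other entity pairs connected by the same predicate.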
  • 51. BOA The use of BOA patterns allows us to match natural language expressions and ontology concepts even if they are not string similar and not covered by WordNet. Examples: married to → http://dbpedia.org/ontology/spouse was born in → http://dbpedia.org/ontology/birthPlace graduated from → http://dbpedia.org/ontology/almaMater write → http://dbpedia.org/ontology/author
  • 52. Example: Who produced the most films? Candidates for filling query slots: ?c CLASS [films] <http://dbpedia.org/ontology/Film> <http://dbpedia.org/ontology/FilmFestival> . . . ?p PROPERTY [produced] <http://dbpedia.org/ontology/producer> <http://dbpedia.org/property/producer> <http://dbpedia.org/ontology/wineProduced> . . .
  • 54. Semantic Gap: Query ranking and selection 1 Every entity receives a score considering string similarity and prominence. 2 The score of a query is then computed as the average of the scores of the entities used to fill its slots.
  • 55. Semantic Gap: Query ranking and selection 3 In addition, type checks are performed.
  • 56. Semantic Gap: Query ranking and selection 4 Of the remaining queries, the one with highest score that returns a result is chosen to retrieve an answer.
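The ranking and selection steps above can be sketched in Python. This is a simplified illustration under stated assumptions: the equal weighting inside `entity_score`, the `select_query` helper, the query identifiers, and the stand-in endpoint are hypothetical, and the type checks of step 3 are omitted.

```python
from statistics import mean


def entity_score(similarity: float, prominence: float, weight: float = 0.5) -> float:
    """Combine string similarity and prominence (the weighting is an assumption)."""
    return weight * similarity + (1 - weight) * prominence


def query_score(entity_scores) -> float:
    """Score of a query: average score of the entities filling its slots."""
    return mean(entity_scores)


def select_query(candidates, run_query):
    """Return (query, results) for the best-scoring query with a non-empty result.

    candidates: list of (query, [entity scores]); run_query: callable that
    executes a query and returns a list of results.
    """
    ranked = sorted(candidates, key=lambda c: query_score(c[1]), reverse=True)
    for query, _scores in ranked:
        results = run_query(query)
        if results:
            return query, results
    return None


# Stand-in for a SPARQL endpoint: maps hypothetical query ids to results.
fake_endpoint = {"Q_film": ["some producer"], "Q_film_festival": [], "Q_empty": []}
candidates = [
    ("Q_film_festival", [0.55, 0.65]),
    ("Q_film", [0.80, 0.72]),
    ("Q_empty", [0.95, 0.90]),  # best score, but returns no result
]
best = select_query(candidates, lambda q: fake_endpoint.get(q, []))
print(best)
```

The top-scoring candidate is skipped here because it returns nothing, which is the behaviour step 4 describes.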
  • 57. Example: Who produced the most films? SELECT ?x WHERE { ?x <http://dbpedia.org/ontology/producer> ?y . ?y rdf:type <http://dbpedia.org/ontology/Film> . } ORDER BY DESC(COUNT(?y)) LIMIT 1 Score: 0.7592425075864263 SELECT ?x WHERE { ?x <http://dbpedia.org/ontology/film> ?y . } ORDER BY DESC(COUNT(?y)) LIMIT 1 Score: 0.6264001353183296 SELECT ?x WHERE { ?x <http://dbpedia.org/ontology/producer> ?y . ?y rdf:type <http://dbpedia.org/ontology/FilmFestival>. } ORDER BY DESC(COUNT(?y)) LIMIT 1 Score: 0.6012584940627768
  • 58. Evaluation Setup Question set: 39 DBpedia training questions from QALD-1. 5 could not be parsed due to unknown syntactic constructions or uncovered domain-independent expressions. 19 were answered exactly as required by the benchmark (with precision and recall 1.0). Another 2 were answered almost correctly (with precision and recall greater than 0.8). Mean precision: 0.61 Mean recall: 0.63 F-measure: 0.62
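The reported F-measure is consistent with the harmonic mean of the mean precision and mean recall. The slide does not state how it was computed, so the formula below is an assumption that happens to reproduce the number:

```latex
F = \frac{2 \cdot P \cdot R}{P + R}
  = \frac{2 \cdot 0.61 \cdot 0.63}{0.61 + 0.63}
  = \frac{0.7686}{1.24}
  \approx 0.62
```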
  • 59. Main Sources of Error Incorrect templates: Template structure does not coincide with structure of the data: When did Germany join the EU? res:Germany dbp:accessioneudate ?x . Predicate detection fails: inhabitants dbp:population, dbp:populationTotal owns dbo:keyPerson higher dbp:elevationM Wrong query is selected: Who wrote The pillars of the Earth? res:The_Pillars_of_the_Earth_(TV_Miniseries) dbo:writer ?x . res:The_Pillars_of_the_Earth dbo:author ?x .
  • 60. Language Gap: Challenges Schema-agnostic QA Query Ranking Relation Extraction Ontology Lexicalization Extraction of surface forms
  • 61. Justification Gap Problem Are you sure? Prove it to me.
  • 62. Justification Gap Solution Gathering natural-language evidence? http://aksw.org/Projects/defacto
  • 63. Justification Gap: Automatic Query Generation Solution Gathering natural-language evidence? ⇓
  • 64. Justification Gap: Evidence Generation (s, p, o) = “ρ(s)” “ρ(p)” “ρ(o)” :Momo :author :Ende 1 “Momo” “author” “Michael Ende” 2 “Momo” “written by” “Michael Ende” 3 “Momo” “book by” “Michael Ende”
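The query generation on this slide amounts to combining the verbalizations of subject, predicate and object. A minimal Python sketch follows, assuming each element of the triple comes with a list of natural-language forms (labels for resources, patterns for the predicate); the `evidence_queries` helper is a hypothetical name.

```python
from itertools import product


def evidence_queries(subject_forms, predicate_forms, object_forms):
    """Build one quoted search query per combination of verbalizations."""
    return [
        f'"{s}" "{p}" "{o}"'
        for s, p, o in product(subject_forms, predicate_forms, object_forms)
    ]


# Verbalizations taken from the slide for the triple (:Momo, :author, :Ende).
queries = evidence_queries(
    ["Momo"],
    ["author", "written by", "book by"],
    ["Michael Ende"],
)
for q in queries:
    print(q)
```

Each resulting string can be sent to a web search engine, and the pages returned are the candidate evidence that the later scoring slides refer to.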
  • 65. Justification Gap: Proof Scoring Combination of features including 1 Score of BOA pattern 2 Token distance 3 Total occurrence of resource labels 4 Similarity to title
  • 66. Justification Gap: Trustworthiness Combination of features including 1 Topic majority on the Web 2 Topic majority in results 3 Topic terms 4 Page rank
  • 67. Justification Gap: Fact Confirmation Combination of features including 1 Combined trustworthiness and proof score 2 Number of proofs 3 Total hit count 4 Domain/Range
  • 68. Justification Gap: Evaluation 10 triples per property. Top-60 most used properties. 473 of 600 triples manually verified to be true.
  • 69. Justification Gap: Evaluation J48 is the best classifier overall (78.8% - 87.6%). Easiest data set: random. Hardest data set: mixed.
  • 70. Challenges
  • 74. Summary Language Gap Semantic Gap Justification Gap Access Gap Data Gap Noise Gap . . .
  • 75. Goal
  • 77. The End Thank you! Questions? Axel Ngonga http://aksw.org/AxelNgonga ngonga@informatik.uni-leipzig.de AKSW Research Group University of Leipzig, Germany @akswgroup @NgongaAxel
  • 79. The Semantics of the Semantic Web •  A priori: top-down semantics –  Logical assertions –  Crisp reuse of conceptualization •  In practice: hybrid bottom-up/top-down approach –  (Human/software) agents are sloppy/ignorant –  Agents do not agree (for various reasons) => Centralized view on decentralized construct ?
  • 80. Semantic Grounding The meaning of symbols can be explained by their semantic correspondences to other symbols alone [“Understanding understanding” Rapaport 93] •  Type 1 semantics: understanding in terms of something else •  Problem: how to ground semantics? •  Type 2 semantics: understanding something in terms of itself •  “syntactic semantics”: grounding through recursive understanding
  • 81. Emergent Semantics Emergent Semantics: •  Semantics as a posteriori agreement on conceptualizations => Don’t believe / enforce the schema ! •  Semantics of symbols as recursive correspondences to other symbols •  Analyzing transitive closures of mappings •  Self-organizing, bottom-up approach •  Global semantics (stable states) emerging from multiple local interactions •  Syntactic semantics •  Studying semantics from a syntactic perspective
  • 82. 3 Concrete Examples 1.  Emergence of Semantic Interoperability 2.  Entity disambiguation using same-as networks 3.  A posteriori schema for LOD properties
  • 83. Semantic Connectivity •  How many links do you need to make a semantic network interoperable? •  Semantic interoperability as an emergent property! ⇒  Connectivity indicator: $ci = \sum_{j,k} (jk - j(bc + cc) - k)\, p_{jk}$ •  Necessary condition for semantic interoperability in the large: $ci \geq 0$ Philippe Cudré-Mauroux, Karl Aberer: A Necessary Condition for Semantic Interoperability in the Large. CoopIS/DOA/ODBASE 2004: 859-872.
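The connectivity indicator can be evaluated directly once its inputs are known. The sketch below is an illustration under assumptions the slide does not spell out: p_jk is read as the probability that a node has in-degree j and out-degree k, and bc and cc are read as the bidirectional and clustering coefficients of the mapping network. The function name and the toy distributions are hypothetical.

```python
def connectivity_indicator(p_jk: dict, bc: float, cc: float) -> float:
    """ci = sum over (j, k) of (j*k - j*(bc + cc) - k) * p_jk.

    p_jk maps (j, k) degree pairs to probabilities (assumed to sum to 1).
    """
    return sum((j * k - j * (bc + cc) - k) * p for (j, k), p in p_jk.items())


# Toy distribution: half the nodes have degrees (1, 1), half have (2, 2).
dense = {(1, 1): 0.5, (2, 2): 0.5}
ci_dense = connectivity_indicator(dense, bc=0.0, cc=0.0)

# Every node has degrees (1, 1) and half of the links are bidirectional.
sparse = {(1, 1): 1.0}
ci_sparse = connectivity_indicator(sparse, bc=0.5, cc=0.0)

print(ci_dense, ci_dense >= 0)    # necessary condition met
print(ci_sparse, ci_sparse >= 0)  # necessary condition violated
```

Since ci ≥ 0 is only a necessary condition, a non-negative value does not by itself guarantee interoperability in the large.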
  • 84. Graph-Based Disambiguation •  The great thing about unique identifiers is that there are so many to choose from –  URI jungle –  Disambiguation based on transitive closures on equality links Philippe Cudré-Mauroux, Parisa Haghani, Michael Jost, Karl Aberer, Hermann de Meer: idMesh: graph-based disambiguation of linked data. WWW 2009: 591-600.
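The transitive closure over equality links mentioned on this slide can be sketched with a union-find structure. This is a deliberately simplified illustration, not the method of the cited paper: it treats every link as correct, which real same-as networks do not guarantee. The `same_as_clusters` helper and the `ex:` URIs are hypothetical.

```python
def same_as_clusters(links):
    """Group URIs into equivalence classes implied by (uri, uri) equality links."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b in links:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b  # merge the two classes

    clusters = {}
    for uri in list(parent):
        clusters.setdefault(find(uri), set()).add(uri)
    return list(clusters.values())


links = [("ex:a", "ex:b"), ("ex:b", "ex:c"), ("ex:d", "ex:e")]
clusters = same_as_clusters(links)
print(sorted(sorted(c) for c in clusters))
```

A single wrong link merges two classes entirely, which is one reason to weigh links rather than trust them blindly.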
  • 85. A Posteriori Schema •  Instance data use schema constructs in creative ways! ⇒ Retro-engineering of schema constructs based on the deployment of instance data ⇒ Context-dependent, retro-compatible Alberto Tonon, Michele Catasta, Gianluca Demartini, Philippe Cudré-Mauroux: Fixing the Domain and Range of Properties in Linked Data by Context Disambiguation. LDOW 2015.
  • 86. Research Directions •  Tons of research opportunities in this field •  Understanding the emergent properties of LOD networks (and how to exploit them) •  Analyzing the deployment / use of semantic data (a priori VS a posteriori views) •  Capturing user disagreement (e.g., multi-views ontologies, fuzzy ontologies, results diversification)
  • 88. Opportunity: The 3-Vs of Big Data Volume ■  amount of data Velocity ■  speed of data in and out Variety ■  range of data types and sources [Gartner 2012] "Big Data are high-volume, high-velocity, and/or high-variety information assets that require new forms of processing to enable enhanced decision making, insight discovery and process optimization"
  • 90. Information Management •  The story so far: –  Strict separation between unstructured and structured data management infrastructures [Figure: a DBMS (labels: JDBC, SQL) shown next to an inverted index (labels: Keywords, HTTP)]
  • 91. Information Integration •  Information integration is still one of the biggest CS problems out there (according to many, e.g., Gartner) •  Information integration typically requires some sort of mediation 1.  Unstructured Data: keywords, synsets 2.  Structured Data: global schema, transitive closure of schemas (mostly syntactic) ⇒ nightmarish if 1 and 2 are taken separately, horror marathon if considered together
  • 92. Entities as Mediation •  Rising paradigm –  Store information at the entity granularity –  Integrate information by inter-linking entities •  Advantages? –  Coarser granularity compared to keywords •  More natural, e.g., brain functions similarly (or is it the other way around?) –  Denormalized information compared to RDBMSs •  Schema-later, heterogeneity, sparsity •  Pre-computed joins, “Semantic” linking •  Drawbacks?
  • 94. Exposing Textual Data •  The XI Pipeline •  Runs on massive amounts of data (Spark) [Figure: pipeline stages labelled Mention Extraction, NER, Entity Linking, Entity Typing]
  • 95. Named Entity Recognition (NER) [Figure: NER pipeline. Components shown: Text extraction (Apache Tika), List of extracted n-grams, n-gram Indexing, Candidate Selection, Lemmatization, POS Tagging, n+1 grams merging, frequency reweighting, List of selected n-grams, Feature extraction, Features, Supervised Classifier, Ranked list of n-grams] Roman Prokofyev, Gianluca Demartini, Philippe Cudré-Mauroux: Effective named entity recognition for idiosyncratic web collections. WWW 2014: 397-408
  • 96. Entity Linking •  Linking entities to text is an old problem… –  … and is extremely hard, esp. for machines •  Dozens of approaches have been suggested •  What if –  We want to combine approaches / frameworks? –  We want to leverage both human computations & algorithms?
  • 97. ZenCrowd •  Integrate textual data w/ the Web of Data •  Uses sets of algorithmic matchers to match entities to online concepts •  Uses dynamic templating to create micro-matching tasks and publish them on MTurk •  Combines both algorithmic and human matchers using probabilistic networks Gianluca Demartini, Djellel Eddine Difallah, Philippe Cudré-Mauroux: ZenCrowd: leveraging probabilistic reasoning and crowdsourcing techniques for large-scale entity linking. WWW 2012: 469-478
  • 98. ZenCrowd Architecture [Figure: architecture diagram. Labels shown: Input, HTML Pages, Output, HTML+RDFa Pages, ZenCrowd, Entity Extractors, LOD Index, Get Entity, LOD Open Data Cloud, Algorithmic Matchers, Probabilistic Network, Decision Engine, Micro-Task Manager, Micro Matching Tasks, Crowdsourcing Platform, Workers, Decisions]
  • 99. Probabilistic Inference •  Probabilistic network to integrate a priori & a posteriori information –  Agreement of good turkers & algorithms •  Learning process –  Constraints •  Unicity •  Equality (SameAs) –  Giant probabilistic graph •  Instantiated selectively [Figure: probabilistic graph with nodes labelled w1, w2, l1, l2, c11, c12, c13, c21, c22 and factors labelled pw, lf, pl, u2-3, sa1-2]
  • 100. Does it Work? •  Improves avg. prec. by 0.14 on average! –  Minimal crowd involvement –  Embarrassingly parallel problem [Figures: worker precision against number of tasks for US and IN workers, with the top US worker marked; precision against top K workers]
  • 101. Entity Typing •  Entities can have many types (facets) •  Which fine-grained types are most relevant given the context? [Figure: example type hierarchy with the types Thing, Agent, Person, Living People, American People of Scottish Descent, American Billionaires, American Computer Programmers, American Philanthropists, Harvard University People, People from King County, People from Seattle, Windows People]
  • 102. TRank •  Fine-grained Typing •  Tree of 447’260 types •  Rooted on <owl:Thing> •  Depth of 19 •  Ranks relevant types by analyzing the context •  Textual context •  Graph context •  Decision trees •  Linear regression Alberto Tonon, Michele Catasta, Gianluca Demartini, Philippe Cudré-Mauroux, Karl Aberer: TRank: Ranking Entity Types Using the Web of Data. ISWC 2013: 640-656.
  • 103. Exposing Relational Data •  Mapping language file describes the relation between ontology and RDB •  Server provides HTML and linked data views and a SPARQL 1.1 endpoint •  Rewriting engine uses mappings to rewrite Jena & Sesame API calls to SQL queries and generates RDF dumps in various formats http://d2rq.org/ , http://aksw.org/Projects/Sparqlify.html , etc.
  • 104. Exposing Webtables •  Wealth of data in (HTML) tables •  Yet another type of content to expose Sreeram Balakrishnan, Alon Y. Halevy, Boulos Harb, Hongrae Lee, Jayant Madhavan, Afshin Rostamizadeh, Warren Shen, Kenneth Wilder, Fei Wu, Cong Yu: Applying WebTables in Practice. CIDR 2015 Tao, Cui, and David W. Embley. "Automatic hidden-web table interpretation, conceptualization, and semantic annotation." Data & Knowledge Engineering 68.7 (2009): 683-703.
  • 105. Application 1: Enterprise Search •  How can end-users reach entities? ⇒ Structured search ⇒ Keyword search •  On their names or attributes –  Obviously not ideal •  BM25 on TREC 2011 AOR: MAP=0.15, P@10=0.20 •  Query extension, query completion or pseudo-relevance feedback yield comparable (or worse) results
  • 106. Hybrid Entity Search •  Main idea: combine unstructured and structured search –  Inverted index to locate first candidates –  Graph queries to refine the results •  Graph traversals (queries on object properties) •  Graph neighborhoods (queries on data type properties) [Figure: example graph. TheDescendants has title "The Descendants"; GeorgeClooney has name "George Clooney" and dateOfBirth "May 6, 1961"; ShaileneW has name "Shailene Woodley" and dateOfBirth "Nov. 15, 1991"; the nodes carry type edges and are connected by two playsIn edges. Also shown: an inverted index (Keywords, HTTP) and a DBMS (SPARQL).]
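The two-stage idea on this slide (keyword lookup first, graph expansion second) can be sketched in Python using the entities from the slide's example. Everything beyond those entity names is an assumption made for illustration: the in-memory data layout, the direction of the playsIn edges (actor to film), and the `build_index` and `hybrid_search` helpers. Ranking is left out.

```python
from collections import defaultdict

# Literal values per entity (data type properties), as shown on the slide.
LITERALS = {
    "TheDescendants": ["The Descendants"],
    "GeorgeClooney": ["George Clooney", "May 6, 1961"],
    "ShaileneW": ["Shailene Woodley", "Nov. 15, 1991"],
}
# Object-property edges; the actor-to-film direction is an assumption.
EDGES = [
    ("GeorgeClooney", "playsIn", "TheDescendants"),
    ("ShaileneW", "playsIn", "TheDescendants"),
]


def build_index(literals):
    """Inverted index: lower-cased token -> set of entities mentioning it."""
    index = defaultdict(set)
    for entity, values in literals.items():
        for value in values:
            for token in value.lower().replace(",", " ").split():
                index[token].add(entity)
    return index


def hybrid_search(keywords, index, edges):
    """Stage 1: keyword hits. Stage 2: one-hop graph neighbours of the hits."""
    seeds = set()
    for token in keywords.lower().split():
        seeds |= index.get(token, set())
    neighbours = set()
    for subj, _pred, obj in edges:
        if subj in seeds:
            neighbours.add(obj)
        if obj in seeds:
            neighbours.add(subj)
    return seeds, neighbours - seeds


index = build_index(LITERALS)
seeds, related = hybrid_search("descendants", index, EDGES)
print(seeds, related)
```

The keyword stage alone would return only the film; the graph stage adds the two actors, which is the kind of refinement the slide attributes to graph traversals.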
  • 107. Architecture [Figure: architecture diagram. Labels shown: LOD Cloud, index(), query(), User, Query Annotation and Expansion, WordNet, 3rd party search engines, Pseudo-Relevance Feedback, Keyword Query, Entity Search, Inverted Index, Structured Inverted Index, Ranking Functions, intermediate top-k results, RDF Store, Graph Traversals (queries on object properties), Neighborhoods (queries on datatype properties), Graph-Enriched Results, Final Ranking Function] Alberto Tonon, Gianluca Demartini, Philippe Cudré-Mauroux: Combining inverted indices and structured search for ad-hoc object retrieval. SIGIR 2012: 125-134
  • 109. Application 3: Co-Reference Resolution •  Better co-reference resolution through the knowledge base Roman Prokofyev, Alberto Tonon, Michael Luggen, Loic Vouilloz, Djellel Eddine Difallah, and Philippe Cudre-Mauroux: SANAPHOR: Ontology-Based Coreference Resolution. ISWC 2015. Barack Obama called Angela Merkel last week; the president asked the chancellor whether…
  • 110. Research Opportunities •  NER in vertical domains •  Crowdsourcing parts of the processing •  Predicate extraction •  Summarization •  Exposing further types of content •  Updates / transactions •  Parallelization •  Higher-level applications
  • 111. Research Tasks 1.  Analyzing emergent properties of LOD 2.  Crowdsourcing predicate extraction 3.  SPARQL verbalization 4.  Hybrid question answering 5.  Source selection 6.  Ranking SPARQL queries