SWI-Prolog for the semantic web (swi-prolog.org)
64 points by amkk on Sept 29, 2014 | 31 comments



For anyone interested, I did my Master's thesis on SWI-Prolog as a semantic querying tool, integrating it with the Eclipse RCP platform (Bioclipse.net in particular), and compared its querying performance with the Java-based Jena SPARQL engine on some typical tasks in cheminformatics.

The title was "SWI-Prolog as a Semantic Web Tool for semantic querying in Bioclipse: Integration and performance benchmarking", and it is available for download here: https://www.researchgate.net/publication/50313589_SWI-Prolog...

In this particular task, SWI-Prolog totally knocked out Jena, and it was also more amenable to some heuristic optimizations, where the running time really became infinitesimal in comparison to other tools.


I read through the paper quickly; it seems like a nice representation and solution of the problem. I have a couple of questions, if you wouldn't mind.

You say this in the paper:

> It is an interesting observation that writing the Prolog query on the simpler form (Figure 18) made it amenable to heuristic optimization by sorting the values searched for, while this was not possible in the longer Prolog program

I'm afraid I didn't read carefully and will go back, but could you clarify this a bit? I didn't understand the difference between the longer vs. the shorter Prolog code. Would this optimization always be required to get better performance than Jena or Pellet?

And then this:

> Additionally, a drawback of SWI-Prolog specifically, against Jena and Pellet, is that since it is not written in Java, it is not as portable (i.e. the same code can not easily be executed) to different platforms such as Mac, Windows, Linux etc. Instead the source code has to be compiled separately for each platform. This also has the result that the SWI-Prolog Bioclipse integration plugin will not be as portable as Bioclipse itself.

Really, though, Bioclipse is dependent on the portability of Eclipse, which probably doesn't support any more platforms than SWI-Prolog (and most probably fewer), so I wouldn't really see that as a limitation. I would think you could provide the binaries in the distribution itself.


I might not have used the optimal wording there. The important difference between the two Prolog implementations is that the "longer" version is implemented by (recursive) iteration over a list of [peak] values (comparing it to a reference list), whereas in the shorter one the reference values are specified as separate conditions in the Prolog rule, combined with logical AND.

The separate conditions enable a "shortcut" in the match testing, in a way that the recursive list parsing did not: as soon as one of the conditions is not met, the current spectrum is rejected and backtracking moves on to the next item, while with the recursive list-parsing version each item of the list of values is compared to the reference [value] list regardless of whether the current spectrum has already been rejected or not.

The "shorter" prolog version, with separate conditions, thus enables to order the conditions according so that statistically rare peak values come first (thus, "heuristically"), so that a spectrum can be rejected as soon as possible.

Being a heuristic solution, the performance would of course depend on the prior knowledge of the values in the data.

But as can be seen from Figure 15, both Prolog versions beat Jena and Pellet by a large margin, though the "shorter" version did so well that it is hard even to notice, in the diagram, its running time increasing linearly with the number of triples in the triplestore.

Hope that made it a tad clearer!


Yes, that helps a lot, thanks!


Regarding the portability, I tend to agree. To the end user it is indeed possible to provide a portable solution; it just requires more work on the developer side, to include multiple binaries and set up the required Eclipse plugins/fragments to support that.


E.g., just have a look at the graph at the top here; you can hardly see the dotted green line for Prolog, creeping along the bottom of the graph: http://saml.rilspace.org/prolog-query-much-faster-when-mimic...


How does the closed world assumption of Prolog (if it's not explicitly specified it is assumed false) mesh with the open world assumption (if it is not explicitly specified it is unknown) of RDF?
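
(For reference, on the Prolog side the closed-world reading typically shows up as negation as failure over whatever triples happen to be loaded. A minimal sketch, assuming library(semweb/rdf_db); the foaf prefix registration and the example predicate are invented:)

    :- use_module(library(semweb/rdf_db)).
    :- rdf_register_prefix(foaf, 'http://xmlns.com/foaf/0.1/').

    % \+ is negation as failure: "not asserted in the store" counts as false,
    % whereas under the open-world assumption it would merely be unknown.
    stranger_to(X, Y) :-
        rdf(X, rdf:type, foaf:'Person'),
        rdf(Y, rdf:type, foaf:'Person'),
        X \== Y,
        \+ rdf(X, foaf:knows, Y).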


In all the practical applications I've seen of RDF or OWL, all the queries rely on some form of the closed-world assumption.


OWA exists only on paper. Practically all implementations assume CWA.


That's false. Several practical implementations of OWL and RDF are based on OWA. At least one is based on CWA and OWA.


I used to use the swi-prolog semantic web tools heavily. Very good stuff.

The ClioPatria semantic web server is interesting, but I have only played with it. The core RDF storage and inference libraries were solid and nice to use.
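
For anyone who hasn't tried them, basic use of those libraries looks roughly like this (a sketch; the file name is hypothetical):

    :- use_module(library(semweb/rdf_db)).
    :- use_module(library(semweb/rdfs)).

    % Load a file (the parser is picked from the extension) and list the
    % classes, using the RDFS-level inference from library(semweb/rdfs).
    list_classes :-
        rdf_load('ontology.rdf'),                 % hypothetical input file
        forall(rdfs_individual_of(C, rdfs:'Class'),
               format('~w~n', [C])).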


The opening paragraph is structurally identical to "buy our nuclear power plant, it'll generate a billion terawatts, oh and it also comes with this bike shed". Prolog (the submission says) handles the gruesome problem of dealing with RDF, oh and it also generates HTML pages and JSON!

That pattern's a red flag whenever I see it. Like an ostensible proof of P!=NP that begins with a 30 page history written for laymen. Who is that paragraph aimed at? Is there really a sizeable population casually using RDF and Prolog but losing sleep over HTML and JSON?


> a population casually using RDF and Prolog but losing sleep over HTML and JSON?

Have you met academics?

Just kidding ;), but by exaggerating, not by lying. (Also, I know for a fact that you have met academics.) Of course any comp-sci academic can pick up HTML and JSON in half an hour, but it's not mad to imagine that some of them were never interested in actual web technologies (working on RDF can be done from a purely theoretical point of view), and seeing these sentences on HTML and JSON may be useful information to them, just as mentioning a practical use case at the end of a paper's abstract can be.


There are a lot of non-developer types who need to deal with metadata and for whom the mechanics of websites are irrelevant, e.g. scientific researchers or art historians or statisticians.

Librarians invented metadata, not computer scientists.


I'm pretty sure philosophers invented metadata, even though they did not give it that name. The question of how to distinguish between the properties an object has and the properties we attach to it is sort of central in epistemology. Higher category theorists sometimes distinguish between stuff, structure and property http://nlab.mathforge.org/nlab/show/stuff,+structure,+proper..., which makes the definition of forgetful functors more precise. Conversely, attaching structure or properties gives (not necessarily unique) adjoint functors to those forgetful ones. In mathematics this comes up, for example, if you consider the category of abelian groups, from which there is both a forgetful functor to the category of groups and one to the category of sets, but the general idea should be applicable to metadata attached to text as well.


The Aristotelian distinction between essential and accidental properties is probably what you're thinking of here, but that's not really the same distinction as the one between data and metadata.


There is also a really interesting biomedical toolkit for SWI-Prolog which, IIRC, uses or integrates with the semantic capabilities (for ontologies etc.), though it was a while since I looked at it, so I might be recalling wrong:

* BlipKit - Biomedical Logic Programming : http://www.blipkit.org


Is semantic web tech being reliably employed to solve any big problems? (RDF, RDFa, OWL, SPARQL, triple stores, graph dbs...?) Is it fast?


Large-scale, complex information integration at big companies every day. The forgotten "V" of Big Data, Variety, i.e., schema complexity, is the sweet spot of semantic technology.


It is not at all fast.

I don't even think it's computationally possible for SPARQL to be fast.


SPARQL query evaluation is PSPACE-complete in the worst case. Worst-case complexity and "fast in practice" aren't really the same thing at all. I suspect the average-case complexity for SPARQL is much better, which is backed up by several reasonably performant implementations.


What are these reasonably performant implementations? Have you tried them with a billion edges?

Also, if your data store is "fast in practice" but has worst cases that are PSPACE-complete, how do you prevent worst-case queries from DOSing it?


Well, I'm a vendor in the space so I like my implementation: http://stardog.com/ -- and yes we've "tried it" with 10s of billions of edges.

Worst cases are prevented from DOSing by having query management features like auto-killing queries that run too long, etc.


It is always nice to see Prolog in the news. SWI was the chosen implementation at my university back in the mid-'90s.


SPARQL, which is the standard query language for RDF, is based on Prolog. So why go back to Prolog?


Prolog is a full programming language; SPARQL is only a query language.


SPARQL is similar to non-recursive Datalog with negation (Datalog^not), but that's a subset of full Prolog. SPARQL is a query language, not a full-on programming language as Prolog is.


True. Most notably, I badly miss the ability to encapsulate queries in named "functions". That is one of the things I really like about Prolog, since it lets you quickly raise your level of abstraction by building up a "language" of facts and rules.

If anyone knows a way to do something similar in SPARQL, I'm highly interested to know.
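
For concreteness, the kind of Prolog-side abstraction being described looks roughly like this (a sketch; the ex: prefix and the property names are made up):

    :- use_module(library(semweb/rdf_db)).
    :- rdf_register_prefix(ex, 'http://example.org/').

    % Name a triple-pattern query once ...
    parent(X, Y) :- rdf(X, ex:hasParent, Y).

    % ... then build a higher-level "language" of rules on top of it.
    grandparent(X, Z) :- parent(X, Y), parent(Y, Z).
    sibling(X, Y)     :- parent(X, P), parent(Y, P), X \== Y.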


You can do this with user-defined rules in some RDF databases. Stardog(.com) supports it nicely.


Interesting, thanks for the hint!


Is it possible to port it to Mercury?



