For anyone interested, I did my Masters thesis on SWI-prolog as a Semantic querying tool, integrating it with the Eclipse RCP platform (Bioclipse.net in particular), and compared its querying performance with Java based Jena SPARQL parser for some typical tasks in cheminformatics.
In this particular task, SWI-Prolog totally knocked out Jena, and it was also more amenable to some heuristic optimizations, with which the running time became infinitesimal in comparison to other tools.
Read through the paper quickly, seems like a nice representation and solution of the problem. I have a couple of questions if you wouldn't mind.
You say this in the paper:
> It is an interesting observation that writing the Prolog query on the simpler form (Figure 18) made it amenable to heuristic optimization by sorting the values searched for, while this was not possible in the longer Prolog program
I'm afraid I didn't read carefully and will go back, but could you clarify this a bit? I didn't understand the difference between the longer and the shorter Prolog code. Would this optimization always be required to get better performance than Jena or Pellet?
And then this:
> Additionally, a drawback of SWI-Prolog specifically, against Jena and Pellet, is that since it is not written in Java, it is not as portable (i.e. the same code can not easily be executed) to different platforms such as Mac, Windows, Linux etc. Instead the source code has to be compiled separately for each platform. This also has the result that the SWI-Prolog Bioclipse integration plugin will not be as portable as Bioclipse itself.
Really, though, Bioclipse is dependent on the portability of Eclipse, which probably doesn't support any more platforms than SWI-Prolog (and most probably fewer), so I wouldn't really see that as a limitation. I would think you could provide the binaries in the distribution itself.
I might not have used the optimal wording there. The important difference between the two Prolog implementations is that the "longer" version is implemented by (recursive) iteration over a list of [peak] values (comparing it to a reference list), whereas in the shorter one the reference values are specified as separate conditions in the Prolog rule, combined with logical AND.
The separate conditions enable "short-circuiting" of the match-testing in a way that the recursive list-parsing did not: as soon as one condition is not met, the current spectrum is rejected and backtracking moves on to the next item, while in the recursive list-parsing version each item in the list of values is compared to the reference [value] list regardless of whether the current spectrum has already been rejected.
The "shorter" Prolog version, with separate conditions, thus makes it possible to order the conditions so that statistically rare peak values come first (hence "heuristically"), so that a spectrum can be rejected as early as possible.
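To sketch what I mean by the "shorter" form (the predicate name and peak values here are made up for illustration, not the ones from the paper):

```prolog
% Hypothetical example facts: peaks observed in two spectra.
has_peak(spectrum1, 137.2).
has_peak(spectrum1, 81.1).
has_peak(spectrum1, 55.0).
has_peak(spectrum2, 81.1).
has_peak(spectrum2, 55.0).

% "Shorter" form: each reference peak is a separate condition in the
% conjunction. As soon as one condition fails, the whole rule fails
% for that spectrum and backtracking moves on -- so putting the
% statistically rarest peak first rejects non-matches as early as
% possible.
matches(Spectrum) :-
    has_peak(Spectrum, 137.2),   % rarest value first
    has_peak(Spectrum, 81.1),
    has_peak(Spectrum, 55.0).
```

Here `matches(spectrum2)` fails already on the first condition, without ever testing the remaining peaks.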
Being a heuristic solution, the performance would of course depend on the prior knowledge of the values in the data.
But as can be seen in figure 15, both Prolog versions beat Jena and Pellet by a large margin, though the "shorter" version did so well that it is hard even to notice, in the diagram, an increase in running time linear in the number of triples in the triplestore.
Regarding the portability, I tend to agree. To the end user it is indeed possible to provide a portable solution; it just requires more work on the developer side, to include multiple binaries and set up the required Eclipse plugins/fragments to support that.
How does the closed world assumption of Prolog (if it's not explicitly specified it is assumed false) mesh with the open world assumption (if it is not explicitly specified it is unknown) of RDF?
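For example (a made-up fact just to make the contrast concrete):

```prolog
% One explicitly stated fact.
author(paper1, alice).

% Prolog's negation as failure gives a closed-world reading:
% anything that cannot be proven is treated as false.
not_author(Paper, Person) :- \+ author(Paper, Person).
```

Here `not_author(paper1, bob)` succeeds, i.e. Prolog concludes "false", whereas under RDF's open-world assumption the absence of that triple would only mean "unknown".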
I used to use the swi-prolog semantic web tools heavily. Very good stuff.
The ClioPatria semantic web server is interesting, but I have just played with it. The core RDF storage and inference libraries were solid and nice to use.
The opening paragraph is structurally identical to "buy our nuclear power plant, it'll generate a billion terawatts, oh and it also comes with this bike shed". Prolog (the submission says) handles the gruesome problem of dealing with RDF, oh and it also generates HTML pages and JSON!
That pattern's a red flag whenever I see it. Like an ostensible proof of P!=NP that begins with a 30 page history written for laymen. Who is that paragraph aimed at? Is there really a sizeable population casually using RDF and Prolog but losing sleep over HTML and JSON?
> a population casually using RDF and Prolog but losing sleep over HTML and JSON?
Have you met academics?
Just kidding ;), but by exaggerating, not by lying. (Also I know for a fact that you have met academics.) Of course any compsci academic can pick up HTML and JSON in half an hour, but it's not mad to imagine that some of them were never interested in actual web technologies (working on RDF can be done from a purely theoretical point of view), and seeing these sentences on HTML and JSON may be useful information to them, just as mentioning a practical use-case at the end of a paper's abstract can be.
There are a lot of non-developer types who need to deal with metadata and for whom the mechanics of websites are irrelevant, e.g. scientific researchers or art historians or statisticians.
Librarians invented metadata not computer scientists.
I'm pretty sure philosophers invented metadata, even though they did not give it that name. The question of how to distinguish between the properties an object has and the properties we attach to it is sort of central in epistemology. Higher category theorists sometimes distinguish between stuff, structure and property http://nlab.mathforge.org/nlab/show/stuff,+structure,+proper..., which makes the definition of forgetful functors more precise. Conversely, attaching structure or properties are (not necessarily unique) adjoint functors to those forgetful ones. In mathematics this comes up, for example, if you consider the category of abelian groups, from which there is both a forgetful functor to the category of groups and one to the category of sets, but the general idea should be applicable to metadata attached to text as well.
The Aristotelian distinction between essential and accidental properties is probably what you're thinking of here, but that's not really the same distinction as the one between data and metadata.
There is also a really interesting biomedical toolkit for SWI-Prolog which, IIRC, uses or integrates with the semantic capabilities (for ontologies etc.), although it has been a while since I looked at it, so I might recall wrong:
Large-scale, complex information integration at big companies every day. The forgotten "V" of Big Data, Variety, i.e., schema complexity, is the sweet spot of semantic technology.
SPARQL is PSPACE-complete. Worst-case complexity and "fast in practice" aren't really the same thing at all. I suspect average-case complexity for SPARQL is much better, which is backed up by several reasonably performant implementations.
SPARQL is similar to non-recursive Datalog with negation (Datalog^not), but that's a subset of full Prolog. SPARQL is a query language, not a full-on programming language as Prolog is.
True. Most notably, I badly miss the ability to encapsulate queries in named "functions". That is one of the things I really like in Prolog, since it lets one quickly raise one's level of abstraction by building up a "language" of facts and rules.
If anyone knows a way to do something similar in SPARQL, I'm highly interested to know.
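To show the kind of encapsulation I mean (predicate names and values here are hypothetical, just for illustration):

```prolog
% Example facts.
compound_mass(aspirin, 180.16).
compound_mass(caffeine, 194.19).
soluble_in(aspirin, ethanol).
soluble_in(caffeine, water).

% Named "query functions": each rule encapsulates a query and can be
% reused and composed in further rules, building up a small domain
% language step by step.
light_compound(C) :- compound_mass(C, M), M < 190.
candidate(C)      :- light_compound(C), soluble_in(C, ethanol).
```

`candidate(C)` reads like a domain concept, while in plain SPARQL the underlying pattern would have to be repeated in every query that needs it.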
The title was "SWI-Prolog as a Semantic Web Tool for semantic querying in Bioclipse: Integration and performance benchmarking", and it is available for download here: https://www.researchgate.net/publication/50313589_SWI-Prolog...