I'm doing a lot of research right now on the Semantic Web and on complex data models that represent relationships between individuals and organizations.
I knew a little about semantic ontologies, although I never understood what they were used for other than making graphs.
I saw on a university wiki that the language for querying an ontology is SPARQL (tell me if I'm wrong).
But recently I saw that a company which had created a semantic ontology exposed it in the form of GraphQL, which I did not know (https://diffuseur.datatourisme.gouv.fr/graphql/voyager/).
It seems to me that semantic ontologies are made for finding information more easily, for example to build a chatbot (which is what I want to do), but here they turned a semantic ontology into an API. Is that right? To offer a GraphQL API, should I first build a semantic ontology?
Could you explain the difference between all of this? Honestly, it's a little vague for me.
Context
Datatourisme is a platform that allows publishing (via the Producteur component) and consuming (via the Diffuseur component) POI-related open data.
It seems you have linked to a particular application built on top of Diffuseur with the help of GraphQL Voyager. The application illustrates the capabilities of the GraphQL API exposed by Diffuseur.
The API documentation is available here (in French):
datatourisme.frama.io/api
framagit.org/datatourisme/api
Problem
Datatourisme stores data in the RDF format (presumably using the Blazegraph triplestore)
Datatourisme provides access via GraphQL (not SPARQL)
Why RDF
Partly due to being somewhat "schemaless", RDF is convenient for heterogeneous data integration tasks:
A national tourism ontology structuring, in a common shared format, all the tourism data extracted from different official French databases: entertainment and events, natural and cultural sites, leisure and sports activities, touristic products, tours, accommodations, shops, restaurants.
RDF is semantic: in particular, RDF is self-describing.
SPARQL
SPARQL is the W3C-standardized language for querying RDF. Other RDF query languages have also been proposed.
BTW, it is also possible to query non-RDF sources with SPARQL, e.g. by defining R2RML mappings.
RDF's self-describing nature and SPARQL's standardization remove the need to create or learn a new (shitty) API every day.
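To give a concrete feel for what querying an RDF store looks like from application code, here is a minimal Python sketch using SPARQLWrapper; the endpoint URL and the query are hypothetical placeholders, not Datatourisme's actual endpoint or vocabulary:

from SPARQLWrapper import SPARQLWrapper, JSON

sparql = SPARQLWrapper("http://example.org/sparql")  # hypothetical endpoint URL
sparql.setQuery("""
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    SELECT ?poi ?label WHERE { ?poi rdfs:label ?label } LIMIT 10
""")
sparql.setReturnFormat(JSON)

results = sparql.query().convert()
for binding in results["results"]["bindings"]:
    print(binding["poi"]["value"], binding["label"]["value"])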
GraphQL
Similar to SPARQL, GraphQL allows you to avoid multiple round-trip requests.
GraphQL can wrap different kinds of data sources, but typically these are REST APIs.
As you can see, it is possible to wrap a SPARQL endpoint (there is also HyperGraphQL).
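For comparison, calling a GraphQL endpoint is just an HTTP POST with the query as a string. The following Python sketch is illustrative only; the endpoint URL, API key header, and field names are made up rather than taken from the real Diffuseur schema:

import requests

# Hypothetical query: the field names below are placeholders, not the real
# Diffuseur schema (which you can browse in GraphQL Voyager).
query = """
{
  poi(first: 5) {
    label
  }
}
"""

response = requests.post(
    "https://example.org/graphql",        # hypothetical endpoint URL
    json={"query": query},
    headers={"x-api-key": "YOUR_KEY"},    # hypothetical API key header
)
response.raise_for_status()
print(response.json()["data"])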
Why GraphQL
Why did Datatourisme prefer GraphQL?
GraphQL is closer to developers and the technologies they use en masse. In the past, JSON-LD had the same motivation (however, see my note about JSON-LD here).
It seems that Diffuseur's GraphQL layer provides API key support and prevents overly complex SPARQL queries.
Are data still semantic
The answer depends on what you mean by semantic. There is an opinion that even the relational model is quite semantic...
I'd answer affirmatively if it were possible to extract, for example, the comment on the :rcs property with GraphQL (and the answer seems to be negative).
Conclusion
Answering your direct question:
it is not necessary (though possible) to create a semantic ontology first in order to use GraphQL;
it is not necessary (though possible) to use GraphQL after creating a semantic ontology.
Answering your indirect question:
probably you need a semantic ontology in order to build such chatbot;
probably you need something else in addition.
See also: How is the knowledge represented in Siri – is it an ontology or something else?
Update
In addition to HyperGraphQL, there are other interesting convergence projects:
in Stardog
in Topbraid
in Ontotext Platform
GraphQL and SPARQL are different languages for different purposes. SPARQL is a language for working with triple stores, graph datasets, and RDF nodes. GraphQL is an API language, preferably for working with JSON structures. As for your specific case, I would recommend clarifying your goal in using AI in your application. If you need a graph dataset in your application and want to perform more advanced knowledge discovery, like reasoning over the dataset, then you may need a Semantic Web approach and SPARQL on top of your dataset. The Semantic Web stack defines different layers for performing knowledge discovery and reasoning, through ontology design and RDF-izing datasets.
See here to read more.
If your AI application does not have such requirements and you can accomplish your data analysis using a JSON-based database, GraphQL is probably a good choice to create your API, as it is widely used by different Web and Mobile applications these days. In particular, it is used to share your data through different platforms and microservices. See here for more information.
Quickly, the differences are:
SPARQL (SPARQL Protocol and RDF Query Language) is a language dedicated to querying RDF graph databases (CRUD and more). It is a standard among Semantic Web tools, defined by a W3C Recommendation.
GraphQL is a language created by Facebook, strongly resembling JSON, for communicating with APIs. It is a communication tool between clients and server endpoints. The request itself defines the structure of the answer. Its use is not limited to SQL or NoSQL or ...
"Graph" here doesn't mean "a structure made of triples" as it does for RDF.
These are two different languages for different applications.
A very important difference, which I didn't see mentioned in the previous answers, is that, while SPARQL is the more powerful query language in general, its SELECT queries produce only tabular output, while GraphQL gives tree structures, which is important in some implementation cases.
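To illustrate the shape difference, here is roughly the same data as each would return it; these are purely illustrative Python literals with made-up names and fields:

# A SPARQL SELECT comes back as flat rows, one per variable binding...
sparql_select_rows = [
    {"person": "Alice", "friend": "Bob"},
    {"person": "Alice", "friend": "Carol"},
]

# ...while a GraphQL response mirrors the nested shape of the query.
graphql_response = {
    "data": {
        "person": {
            "name": "Alice",
            "friends": [{"name": "Bob"}, {"name": "Carol"}],
        }
    }
}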
Related
I'm new to openEHR and SNOMED. I want to store the pack definition information for a tobacco summary. How do I go about storing the measurement units (grams, oz, number of cigarettes)? Is there a reference list of these in either of the standards?
Thanks
Your question should not be about storing, it should be about modeling with openEHR. Storage of openEHR data is a separate issue.
For modeling, you will first need to understand the information model, the structure, the datatypes, etc. You will find some types that might be useful in your case, for instance DV_COUNT for storing the number of something (this is for counting, like the number of cigarettes); it doesn't have units of measure since it is a count. If you want to store volume or weight, the openEHR information model has DV_QUANTITY. For standard units, as Bert says, you can use UCUM. For non-standard units, you might need to choose a different datatype, since the recommendation for DV_QUANTITY.units is to use UCUM (Unified Code for Units of Measure).
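As a rough illustration only (hand-written Python dicts, not output from any openEHR library, and the serialization details vary), the two datatypes mentioned above carry values along these lines:

# A DV_COUNT has only a magnitude, since a count carries no unit of measure...
cigarettes_per_day = {
    "_type": "DV_COUNT",
    "magnitude": 10,
}

# ...while a DV_QUANTITY carries a magnitude plus a unit, recommended to be a UCUM code.
tobacco_per_day = {
    "_type": "DV_QUANTITY",
    "magnitude": 12.5,
    "units": "g",   # "g" is the UCUM code for gram
}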
When you have that figured out, you need to follow the openEHR methodology for modeling, using archetypes and templates. A template would be the final form of your structure that can be used in software. At that moment you can worry about storage.
Storing today is a solved problem. There are many solutions, using relational, document and mixed databases. My implementation, the EHRServer, uses a pure relational approach. But you can create your own: just map the openEHR information model structures to your database of preference, starting from the datatypes.
And of course, start with the openEHR specs: https://www.openehr.org/programs/specification/workingbaseline
BTW, SNOMED doesn't play any role here, not sure why you mentioned that in the title. You need to understand the standards before trying to implement them.
openEHR has its own unit list from which you could choose a unit in a DV_QUANTITY, but since a short time ago, the newest version of the specs describes that you must use a unit from the UCUM standard. Check the description of the data types in the specifications.
You can find the UCUM standard here. The link is published by the Regenstrief Institute (the same institute that serves the LOINC standard), so it is stable.
http://unitsofmeasure.org/ucum.html
There is a Golang-UCUM-library:
https://github.com/BertVerhees/ucum
I have developed a tool that enables searching of an ontology I authored. It submits the searches as SPARQL queries.
I have received some feedback that my search implementation is all-or-none, or "binary". In other words, if a user's input doesn't exactly match a term in the ontology, they won't get any hit at all.
I have been asked to add some more flexible, or "advanced" search algorithms. Indexing and bag-of-words searching were suggested.
Can anyone give some examples of implementing search methods on an ontology that don't require a literal match?
First of all, what kind of entities are you trying to match (literals, or string casts of URIs?), and what kind of SPARQL queries are you running now? Something like this?
?term ?predicate "user input" .
If you are searching across literals, you can make the search more flexible right off the bat by using case-insensitive regular expression filtering, although this will probably make your searches slower, and it won't catch cases where some of the word tokens are present but in a different order. In the following example, you should probably constrain the types of ?term and ?predicate first, or even filter on a string datatype on ?userInput:
?term ?predicate ?someLiteral .
FILTER(regex(?someLiteral, "user input", "i"))
Several triplestores offer support for full-text searching and result scoring. These are often extensions to the SPARQL language.
For example, Virtuoso and some others offer a bif:contains predicate. Virtuoso also offers the faceted search web interface (plus a service, I think.) I have been pleased with the web-based full text search in Blazegraph and Stardog, but I can't say anything at this point about using them with a SPARQL query to get a score on a search pattern. Some (GraphDB) even support explicit integration with Lucene or Solr*, so you may be able to take advantage of their search languages.
Finally... are you using a library like the OWL API or RDF4J to access your ontology? If so, you could certainly save the relationships between your terms and any literals in a Java native data structure, and then directly use a fuzzy search component like Lucene to index each literal as a "document" and then search the user input across the index.
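The same idea works outside Java too. As a hedged sketch of that approach in Python, this uses rdflib in place of the OWL API/RDF4J and difflib in place of Lucene; the ontology file name and its format are assumptions:

import difflib
from rdflib import Graph, Literal

g = Graph()
g.parse("my_ontology.owl", format="xml")   # hypothetical file, assumed to be RDF/XML

# Collect every literal in the ontology together with where it occurs.
literals = {}
for subject, predicate, obj in g:
    if isinstance(obj, Literal):
        literals.setdefault(str(obj), []).append((subject, predicate))

# Rank the closest literals to the user's input instead of requiring an exact hit.
user_input = "user input"
for match in difflib.get_close_matches(user_input, literals.keys(), n=5, cutoff=0.6):
    print(match, "->", literals[match])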
Why don't you post your ontology and give an example of a search you would like to perform in a non-binary way? I (or someone else) can try to show you a minimal implementation.
*Solr integration only appears to be offered in the commercially-licensed version of GraphDB
I am currently using the Neo4j Python REST client and I would like to visualise the graph and be able to amend it, add new nodes and relationships, etc. I would also like the changes to be reflected in the Neo4j database. Is that possible? Also, can self-loops be visualised? I have read about D3.js, Neoclipse and Gephi at http://www.neo4j.org/develop/visualize but I am not sure which one to use.
Thanks in advance.
You can manipulate the graph in Neo4j using Cypher, in particular via the REST API.
Any kind of tool that allows you to interface with Cypher is potentially able to do what you are asking: it is a matter of combining some Cypher queries with the GUI.
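As a rough illustration of that combination from Python, here is a sketch that sends Cypher to Neo4j's transactional HTTP endpoint with requests; the endpoint path, parameter syntax, and credentials correspond to the Neo4j 2.x era and are placeholders that may differ in your version:

import requests

# One transaction with a single Cypher statement: MERGE a node and give it a
# self-loop relationship. The {name} parameter syntax is the Neo4j 2.x style
# (newer versions use $name), and the credentials are placeholders.
payload = {
    "statements": [{
        "statement": "MERGE (a:Person {name: {name}}) MERGE (a)-[:KNOWS]->(a) RETURN a",
        "parameters": {"name": "Alice"},
    }]
}

response = requests.post(
    "http://localhost:7474/db/data/transaction/commit",
    json=payload,
    auth=("neo4j", "password"),
)
print(response.json())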
That said, creating the right visualization for what you are doing might be tricky, and a general approach might not satisfy your needs: while Neoclipse can let you manipulate nodes and links in Neo4j (for free), you might want to do it in a particular way (for example restricting the editing options or the fields/properties that can be changed).
Linkurious offers a solution to do that as well, but it's a commercial license.
Other solutions such as KeyLines, d3.js, and sigma.js let you personalize that experience: note that they will require you to create the interface yourself, but the result will be much better for a specific product IMHO.
So value your time and requirements and go with the proper solution.
For more tools have a look at the Neo4J visualization page: http://www.neo4j.org/develop/visualize
About self loops:
that's a tricky bit and there is not one right way to do them - imagine a scenario with hundreds of multi-self-loops.
Personally I would recommend NOT drawing them on the chart as links/edges, but representing them in some other way: e.g. glyphs, notes, bubbles on the node...
I believe the only tool that allows this today is Neoclipse, but I don't think it's updated to use the Labels and Indexing features released in 2.0.
As such, your best bet will be using the Neo4j Browser to visualize and Cypher to mutate your graph. If you want richer functionality and want a fun project to hack on, it shouldn't be super hard to build a basic visualization for Neo that allows mutating the graph. I would have a look at sigma.js: http://linkurio.us/sigma-js-1-0-next-gen-graph-drawing-lib-web/
I have a ruby project where part of the operation is to select entities given user-specified constraints. So far, I've been hacking my own filter language, using regular expressions and specifying inclusion/exclusion based on the fields in the entities.
If you are interested in my current approach, here's an example. Given this list of entities:
[{"type":"dog", "name":"joe"}, {"type":"dog", "name":"fuzz"}, {"type":"cat", "name":"meow"}]
A user could specify a filter like so:
{"filter":{
"type":{"included":["dog"] },
"name":{"excluded":["^f.*"] }
}}
Would match all dogs but exclude fuzz.
This is sort of working now. However, I am starting to require more sophisticated selection parameters, and I am thinking that rather than continuing to hack on my own filter language, there might be a more general-purpose filter language I can just embed in my application. For instance, is there a parser that can filter in-app using a SQL WHERE clause? Or are there other general, simple filter languages that I'm not aware of? I would especially like to move away from regexps, since I want to do range queries on numbers (like: is entity["size"] < 50?)
It is a little bit of an extrapolation, but I think you may be looking for a search engine, or at least enough of one that you may as well use one just for the query language.
If so, you might want to look at Elasticsearch, which does have Ruby client bindings and could be a good fit for what you are trying to do, especially if you want or need to express the data you want to search as JSON for use by client code, as that format is natively supported by the search engine.
The query language is quite expressive, and there are a variety of built-in and plugin tools available to explore and use it.
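To make that concrete, here is a hedged sketch (in Python rather than Ruby, purely for illustration; the answer above notes Ruby bindings exist) of indexing the example entities and expressing "dogs whose size is under 50" in the Elasticsearch query DSL. The index name and the size field are made up, and the client call signatures vary between client versions:

from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# Index the example entities (index name and "size" values are made up).
for i, doc in enumerate([
    {"type": "dog", "name": "joe", "size": 30},
    {"type": "dog", "name": "fuzz", "size": 10},
    {"type": "cat", "name": "meow", "size": 60},
]):
    es.index(index="entities", id=i, document=doc)
es.indices.refresh(index="entities")

# "dogs whose size is under 50", expressed in the query DSL.
result = es.search(index="entities", query={
    "bool": {
        "filter": [
            {"term": {"type": "dog"}},
            {"range": {"size": {"lt": 50}}},
        ]
    }
})
print([hit["_source"] for hit in result["hits"]["hits"]])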
In the end, I ended up implementing a Ruby DSL. It's easy, fun, and powerful.
Can you suggest a lightweight fuzzy text search library?
What I want to do is allow users to find the correct data for search terms with typos.
I could use a full-text search engine like Lucene, but I think it's overkill.
Edit:
To make the question clearer, here is the main scenario for that library:
I have a large list of strings. I want to be able to search in this list (something like MSVS's IntelliSense), but it should also be possible to filter this list by a string which is not present in it but close enough to some string which is in the list.
Example:
Red
Green
Blue
When I type 'Gren' or 'Geen' in a text box, I want to see 'Green' in the result set.
Main language for indexed data will be English.
I think that Lucene is too heavy for that task.
Update:
I found one product matching my requirements. It's ShuffleText.
Do you know any alternatives?
Lucene is very scalable, which means it's good for little applications too. You can create an index in memory very quickly if that's all you need.
For fuzzy searching, you really need to decide what algorithm you'd like to use. With information retrieval, I use an n-gram technique with Lucene successfully. But that's a special indexing technique, not a "library" in itself.
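To give an idea of the n-gram approach without Lucene, here is a small illustrative Python sketch that scores candidates by shared character trigrams, so a typo like 'Gren' can still find 'Green':

def ngrams(text, n=3):
    text = f"  {text.lower()} "          # pad so short words still produce grams
    return {text[i:i + n] for i in range(len(text) - n + 1)}

def ngram_score(query, candidate, n=3):
    q, c = ngrams(query, n), ngrams(candidate, n)
    return len(q & c) / len(q | c)       # Jaccard overlap of the gram sets

candidates = ["Red", "Green", "Blue"]
print(max(candidates, key=lambda c: ngram_score("Gren", c)))  # -> Green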
Without knowing more about your application, it won't be easy to recommend a suitable library. How much data are you searching? What format is the data? How often is the data updated?
I'm not sure how well Lucene is suited for fuzzy searching; a custom library might be a better choice. For example, this search is done in Java and works pretty fast, but it is custom made for such a task:
http://www.softcorporation.com/products/people/
Soundex is very 'English' in its encoding - Daitch-Mokotoff works better for many names, especially European (Germanic) and Jewish names. In my UK-centric world, it's what I use.
Wiki here.
You didn't specify your development platform, but if it's PHP then I suggest you look at the Zend Lucene library:
http://ifacethoughts.net/2008/02/07/zend-brings-lucene-to-php/
http://framework.zend.com/manual/en/zend.search.lucene.html
As it is LAMP, it's far lighter than Lucene on Java, and can easily be extended for other file types, provided you can find a conversion library or command-line converter - there are lots of OSS solutions around to do this.
Try Walnutil, based on the Lucene API and integrated with SQL Server and Oracle databases. You can create any type of index and then use it. For simple searches you can use some methods from Walnutilsoft; for more complicated search cases you can use the Lucene API. See the web-based example which uses indexes created with the Walnutil tools. You can also see some code examples written in Java and C# which you can use for creating different types of searches.
This tool is free.
http://www.walnutilsoft.com/
If you can choose to use a database, I recommend using PostgreSQL and its fuzzy string matching functions.
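As a hedged sketch of what that looks like from code, assuming the fuzzystrmatch extension is installed, and with a made-up table, column, and connection string:

import psycopg2

conn = psycopg2.connect("dbname=mydb user=me")   # hypothetical connection string
with conn, conn.cursor() as cur:
    # levenshtein() comes from the fuzzystrmatch extension
    # (CREATE EXTENSION fuzzystrmatch); the colors table is made up.
    cur.execute(
        "SELECT name FROM colors ORDER BY levenshtein(name, %s) LIMIT 1",
        ("Gren",),
    )
    print(cur.fetchone())   # likely ('Green',)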
If you can use Ruby, I suggest looking into the amatch library.
#aku - links to working soundex libraries are right there at the bottom of the page.
As for Levenshtein distance, the Wikipedia article on that also has implementations listed at the bottom.
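For reference, Levenshtein distance is small enough to sketch directly; this illustrative Python version counts the single-character edits needed to turn one string into the other:

def levenshtein(a: str, b: str) -> int:
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # deletion
                current[j - 1] + 1,            # insertion
                previous[j - 1] + (ca != cb),  # substitution
            ))
        previous = current
    return previous[-1]

print(levenshtein("Gren", "Green"))  # -> 1 (one missing 'e')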
A powerful, lightweight solution is Sphinx.
It's smaller than Lucene and it supports disambiguation.
It's written in C++, it's fast, battle-tested, has libraries for every environment, and it's used by large companies like craigslist.org.