sparql delete query optimization

sparql delete query optimization - insert

I have a DELETE/INSERT query that I'd like to optimize if possible. It deletes and inserts up to 50 objects at a time. My JMeter tests show that the DELETE clause takes roughly four times as long as the INSERT: about 3300 ms versus about 860 ms. I'd like to speed up the DELETE clause. I was thinking of using FILTER, but was told it would not scale well. Any recommendations are much appreciated.
What I have right now is:
DELETE {
  ?s ?p ?o .
  ?collection dc:identifier ?cid ;
              rdf:type ?ct ;
              rdf:li ?list .
  ?list rdf:first ?first ;
        rdf:rest ?rest .
}
WHERE
{
  { ?s dc:identifier "11111"^^xsd:int ; ?p ?o . }
  UNION { ?s dc:identifier "22222"^^xsd:int ; ?p ?o . }
  UNION { ?s dc:identifier "33333"^^xsd:int ; ?p ?o . }
  UNION{} UNION{}.......
  OPTIONAL {
    ?s dc:hasPart ?collection .
    ?collection dc:identifier ?cid ;
                rdf:type ?ct ;
                rdf:li ?list .
    ?list rdf:first ?first ;
          rdf:rest ?rest .
  }
}
;
INSERT DATA
{
  GRAPH <http://test.org/>
  {.....}
  GRAPH <http://test.org/>
  {.....}
  GRAPH....
}

Without your data, or even knowing which triple store you're using, we can't help much with optimization. It may simply be that deletes are more expensive than insertions. That said, one thing that might help is to use VALUES rather than UNION in your WHERE block. That is, instead of:
{ ?s dc:identifier "11111"^^xsd:int; ?p ?o. }
UNION { ?s dc:identifier "22222"^^xsd:int; ?p ?o.}
UNION {?s dc:identifier "33333"^^xsd:int; ?p ?o.}
UNION{} UNION{}.......
do:
values ?identifier { "11111"^^xsd:int "22222"^^xsd:int "33333"^^xsd:int "44444"^^xsd:int }
?s dc:identifier ?identifier ; ?p ?o .
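
Put together, the whole DELETE might look like the sketch below. The prefix bindings are assumptions, since the question doesn't show them (dc is bound here to the Dublin Core terms namespace because it defines both identifier and hasPart), and only three identifiers are listed for brevity. Whether this is actually faster depends on the triple store.

```sparql
# Assumed prefix bindings; the question does not show them.
PREFIX dc:  <http://purl.org/dc/terms/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

DELETE {
  ?s ?p ?o .
  ?collection dc:identifier ?cid ;
              rdf:type ?ct ;
              rdf:li ?list .
  ?list rdf:first ?first ;
        rdf:rest ?rest .
}
WHERE {
  # One VALUES entry per identifier replaces the chain of UNION branches.
  VALUES ?identifier { "11111"^^xsd:int "22222"^^xsd:int "33333"^^xsd:int }
  ?s dc:identifier ?identifier ;
     ?p ?o .
  # Collection triples are removed only where they exist.
  OPTIONAL {
    ?s dc:hasPart ?collection .
    ?collection dc:identifier ?cid ;
                rdf:type ?ct ;
                rdf:li ?list .
    ?list rdf:first ?first ;
          rdf:rest ?rest .
  }
}
```

For solutions where the OPTIONAL part doesn't match, the template triples that mention the unbound variables are simply skipped, so the first template line still deletes everything about ?s.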

Related

Sort the Inserted data on basis of Time in Ontology Model

I have an ontology model. I am inserting integer data into one of the class instances through SPARQL Update. The model stores the data in no particular order. When I extract this data with a SPARQL query, I want it ordered by time of insertion. How can I achieve this? Any ideas?
P.S.: My ontology model was built in Protege.
My query for inserting data is:
PREFIX test:<http://www.semanticweb.org/muhammad/ontologies/2017/2/untitled-ontology-14#>
INSERT {
?KPI_Variables test:hasValue_ROB1 10
} WHERE {
?KPI_Variables test:hasValue_ROB1 ?Newvalue
FILTER(?KPI_Variables= test:Actual_Production_Time)
}
To get the data I am using the following query:
PREFIX test:<http://www.semanticweb.org/muhammad/ontologies/2017/2/untitled-ontology-14#>
SELECT ?KPI_Variables ?Newvalue WHERE {
?KPI_Variables test:hasValue_ROB1 ?Newvalue
FILTER(?KPI_Variables = test:Actual_Production_Time)
} LIMIT 25

Data in RDF is simply triples. There's no notion of when a triple was added to a graph. If you want that kind of information, you'll need to make it explicit in your data model. SPARQL does include a now() function that returns a timestamp for when the query is run, so you could do something like this:
prefix : <urn:ex:>
insert {
[] :hasSubject ?s ;
:hasPredicate ?p ;
:hasObject ?o ;
:hasTime ?now .
}
where {
#-- Fake a couple of triples
values (?s ?p ?o) {
(:a :p :b)
(:c :q :d)
}
#-- Get the current time
bind (now() as ?now)
}
Now your graph contains data like:
@prefix : <urn:ex:> .
[ :hasObject :d ;
:hasPredicate :q ;
:hasSubject :c ;
:hasTime "2017-04-28T13:32:11.482+00:00"^^<http://www.w3.org/2001/XMLSchema#dateTime>
] .
[ :hasObject :b ;
:hasPredicate :p ;
:hasSubject :a ;
:hasTime "2017-04-28T13:32:11.482+00:00"^^<http://www.w3.org/2001/XMLSchema#dateTime>
] .
Which you can query like:
prefix : <urn:ex:>
select ?s ?p ?o ?time {
[] :hasSubject ?s ;
:hasPredicate ?p ;
:hasObject ?o ;
:hasTime ?time
}
order by ?time
s,p,o,time
urn:ex:c,urn:ex:q,urn:ex:d,2017-04-28T13:32:11.482+00:00
urn:ex:a,urn:ex:p,urn:ex:b,2017-04-28T13:32:11.482+00:00
Once you've inserted things at different times, you'll have different time values, so sorting will be meaningful. I'd suggest that you don't just reify the triples like I did (and if you do go with straightforward reification, you should probably use the standard vocabulary for it), but rather use some meaningful structure that has timestamps as part of it.
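
As a rough sketch of what such a structure could look like for the data in the question: each inserted value becomes its own observation node carrying the value and a timestamp. The ex: properties (ex:hasObservation, ex:value, ex:recordedAt) are made up for illustration and are not part of the asker's ontology.

```sparql
PREFIX test: <http://www.semanticweb.org/muhammad/ontologies/2017/2/untitled-ontology-14#>
# Hypothetical vocabulary, for illustration only.
PREFIX ex: <urn:ex:>

INSERT {
  # A fresh blank node per update holds the value and its insertion time.
  test:Actual_Production_Time ex:hasObservation [
    ex:value 10 ;
    ex:recordedAt ?now
  ] .
}
WHERE {
  BIND (NOW() AS ?now)
}
```

The values can then be read back in insertion order:

```sparql
PREFIX test: <http://www.semanticweb.org/muhammad/ontologies/2017/2/untitled-ontology-14#>
# Same hypothetical vocabulary as in the update above.
PREFIX ex: <urn:ex:>

SELECT ?value ?time
WHERE {
  test:Actual_Production_Time ex:hasObservation [
    ex:value ?value ;
    ex:recordedAt ?time
  ] .
}
ORDER BY ?time
LIMIT 25
```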

SPARQL DBpedia filter out specific results

I am working on a small component that receives all types of a resource. The thing is, I don't want all types, only those in the "http://dbpedia.org/ontology" namespace. How do I filter them within a SPARQL query? I don't really care how, as long as I receive only the ontology types.
In this query I need only the DBpedia ontology classes "Country", "Location", "PopulatedPlace" and "Place".
SPARQL endpoint: http://de.dbpedia.org/sparql
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?type WHERE {
  ?i rdfs:label "Deutschland"@de ; a ?type .
}
I set up a FILTER that keeps only the ontology types. But that's not a solution, as it is static and only works for this example. It also returns duplicates, but that's a minor problem.
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?type WHERE {
  ?i rdfs:label "Deutschland"@de ; a ?type .
  FILTER (?type = <http://dbpedia.org/ontology/Country> ||
          ?type = <http://dbpedia.org/ontology/PopulatedPlace> ||
          ?type = <http://dbpedia.org/ontology/Place> ||
          ?type = <http://dbpedia.org/ontology/Location>)
}
Need some suggestions or help. Thx in advance.

Okay, I thought I shouldn't have asked, but this took me some time to realize: there is a function that tests whether a string starts with a given prefix:
strstarts
Here is the solution. I hope it helps someone.
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT DISTINCT ?type WHERE {
  ?i rdfs:label "Deutschland"@de ; a ?type .
  FILTER (strstarts(str(?type), "http://dbpedia.org/ontology/"))
}

Well, what you did works, but it is probably not the correct way of doing it. What you're looking for are called OWL classes, so you just need to check whether the type is an owl:Class.
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
SELECT ?type WHERE {
  ?i rdfs:label "Deutschland"@de ; a ?type .
  ?type a owl:Class .
}
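
The owl:Class check by itself says nothing about the namespace, and nothing in this thread establishes whether the endpoint also types classes from other vocabularies as owl:Class. If it does, the two approaches can be combined. A sketch:

```sparql
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX owl:  <http://www.w3.org/2002/07/owl#>

SELECT DISTINCT ?type WHERE {
  ?i rdfs:label "Deutschland"@de ;
     a ?type .
  # Keep only types that are declared as OWL classes ...
  ?type a owl:Class .
  # ... and, of those, only the ones in the DBpedia ontology namespace.
  FILTER (STRSTARTS(STR(?type), "http://dbpedia.org/ontology/"))
}
```

DISTINCT removes the duplicate rows mentioned in the question.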

SPARQL selecting MAX value of a counter

I'm new to SPARQL and trying to figure out the best way to select the max value of a counter computed by another query, without creating a new table, using only sub-queries. I'm working on a relatively small dataset, so computation time is not a problem.
SELECT ?p1 ?c1 (MAX(?cnt1) AS ?maxc)
WHERE {
  {
    SELECT ?p1 ?c1 (COUNT(?s1) AS ?cnt1)
    WHERE {
      ?s1 ?p1 ?o1 ;
          a ?c1 .
    }
    GROUP BY ?p1 ?c1
    #ORDER BY ?p1 DESC(?cnt1)
  }
}
GROUP BY ?p1
So I'm expecting one row for every value of ?p1, with the max ?cnt1 and the corresponding ?c1.
I'm pretty sure this is the way to do it, but for some reason it causes my endpoint to crash. The inner query works fine, and when grouping by both ?p1 and ?c1 the query produces just one row and the max value is empty.
Thanks,
Omri

Your query will fail unless you group by both ?p1 and ?c1.
When grouping, every variable in the SELECT clause must either appear in the GROUP BY clause or be wrapped in an aggregate function (MAX, COUNT, etc.).
The following query will give you the maximum value of your counter, but without the corresponding ?p1 ?c1. To have those, you will likely need another sub-query with a FILTER in it...
SELECT (MAX(?cnt1) AS ?maxc)
WHERE {
  {
    SELECT ?p1 ?c1 (COUNT(?s1) AS ?cnt1)
    WHERE {
      ?s1 ?p1 ?o1 ;
          a ?c1 .
    }
    GROUP BY ?p1 ?c1
  }
}
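
One possible shape for that extra sub-query, as a sketch rather than a tested solution: compute the per-(?p1, ?c1) counts once, compute the maximum count per ?p1 in a second sub-query, and keep only the rows whose count equals that maximum. The inner variable names ?s, ?o, ?c and ?cnt are just renamed copies to keep the two sub-queries apart.

```sparql
SELECT ?p1 ?c1 ?cnt1
WHERE {
  # Count per (?p1, ?c1) pair, as in the original inner query.
  {
    SELECT ?p1 ?c1 (COUNT(?s1) AS ?cnt1)
    WHERE {
      ?s1 ?p1 ?o1 ;
          a ?c1 .
    }
    GROUP BY ?p1 ?c1
  }
  # Maximum of those counts for each ?p1.
  {
    SELECT ?p1 (MAX(?cnt) AS ?maxc)
    WHERE {
      {
        SELECT ?p1 ?c (COUNT(?s) AS ?cnt)
        WHERE {
          ?s ?p1 ?o ;
             a ?c .
        }
        GROUP BY ?p1 ?c
      }
    }
    GROUP BY ?p1
  }
  # Keep only the pair(s) whose count reaches the maximum.
  FILTER (?cnt1 = ?maxc)
}
```

If several ?c1 values tie for the maximum under the same ?p1, each of them is returned as its own row.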

Accelerate SPARQL query - filtering out rows which contain

I am currently working with SPARQL (and TopBraid Composer). I have a query that returns only matching literals and then filters out the literals in certain unwanted categories.
Currently, this query takes a long time to run, and I think my FILTER is causing the delay. Is there a better and faster way to filter out (NOT return) rows that contain one of a set of keywords (e.g. cat1, cat2, cat3)?
As of now, I am using:
SELECT ?category
WHERE {
?s1 ?p ?category .
?s2 ?p ?category .
FILTER (str(?category) != "Cat1") .
FILTER (str(?category) != "Cat2") .
FILTER (str(?category) != "Cat3") .
FILTER (str(?category) != "Cat4") .
FILTER (str(?category) != "Cat6") .
FILTER (str(?category) != "Cat8") .
}

It's not clear how much you've trimmed down your example, but the code you presented is doing more work than it needs to.
SELECT ?category
WHERE {
?s1 ?p ?category .
?s2 ?p ?category .
FILTER (str(?category) != "Cat1") .
FILTER (str(?category) != "Cat2") .
FILTER (str(?category) != "Cat3") .
FILTER (str(?category) != "Cat4") .
FILTER (str(?category) != "Cat6") .
FILTER (str(?category) != "Cat8") .
}
Suppose your data has
:a :p "Cat0" .
:b :p "Cat0" .
Then the bindings for ?s1, ?s2, ?p, and ?category can be
?s1 ?s2 ?p ?category
--------------------
:a :a :p "Cat0"
:a :b :p "Cat0"
:b :b :p "Cat0"
:b :a :p "Cat0"
That's four ways to select "Cat0". You said that you want literals, but right now you're hitting every kind of ?category and applying str to it multiple times. You might do this instead:
SELECT DISTINCT ?category
WHERE {
?s ?p ?category .
FILTER( isLiteral(?category) &&
!(str(?category) in ("Cat1", "Cat2", "Cat3",
"Cat4", "Cat6", "Cat8")) )
}

How to improve slow query using FILTER (?id IN ( … ) )

I just started using SPARQL, and I'm trying to create a query that retrieves all information where an id has one of a number of predefined values. I have something like this:
SELECT *
WHERE {
?id ?property ?value .
?value a ?type .
?type rdfs:label ?type_value .
FILTER ( ?id IN (<id1>,<idi>,<idn> ) )
}
The problem I've been running into is the query gets really slow when the list of ids gets increasingly large. I intuitively think there's a better way to write this query, but I'm having trouble figuring out how to create this kind of query. I'm thinking along the lines of something like this:
SELECT *
WHERE {
<id_value> ?property ?value .
?value a ?type .
?type rdfs:label ?type_value .
}
where it retrieves values only for the given ids, eliminating the filtering of results at the end. But I can't figure out how to write the query so that it returns all values for each id_value. When I add another line for another id_value, it filters out other values I'm expecting, so I think I'm writing it incorrectly. How can I do this?

Using values, you can write:
SELECT * WHERE {
values ?id { <id1> <idi> <idn> }
?id ?property ?value .
?value a ?type .
?type rdfs:label ?type_value .
}
The SPARQL 1.1 specification says this about VALUES:
Data can be directly written in a graph pattern or added to a query
using VALUES. VALUES provides inline data as a solution sequence which
are combined with the results of query evaluation by a join operation.
It can be used by an application to provide specific requirements on
query results and also by SPARQL query engine implementations that
provide federated query through the SERVICE keyword to send a more
constrained query to a remote query service.
One of the examples is actually very close to what you've already got:
PREFIX dc: <http://purl.org/dc/elements/1.1/>
PREFIX : <http://example.org/book/>
PREFIX ns: <http://example.org/ns#>
SELECT ?book ?title ?price
{
VALUES ?book { :book1 :book3 }
?book dc:title ?title ;
ns:price ?price .
}
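
The quoted passage also mentions adding data "to a query": SPARQL 1.1 allows a VALUES block after the closing brace of the WHERE pattern as well, where it is joined with the pattern's results. A sketch using the placeholder ids from the question (as relative IRIs they would need a base IRI to resolve); whether this placement performs differently from the inline form depends on the engine:

```sparql
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT *
WHERE {
  ?id ?property ?value .
  ?value a ?type .
  ?type rdfs:label ?type_value .
}
# Inline data, joined with the solutions of the WHERE pattern.
VALUES ?id { <id1> <idi> <idn> }
```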

Try using the VALUES clause instead like so:
SELECT *
WHERE {
VALUES ?id { ...list of ids... }
?id ?property ?value .
?value a ?type .
?type rdfs:label ?type_value .
}
This should hopefully be much more efficient than using the FILTER approach.
