Neo4j Query Performance Issue

I have a performance issue with a specific Cypher query.
I am looking for R nodes that are not directly connected to a specific set of nodes of type I (here, the nodes whose index field is "79" and "4"), and I want to maximize the "score" field:
MATCH (r:R), (i0:I { index:"79" }), (i1:I { index:"4" })
WHERE NOT r--i0 AND NOT r--i1
RETURN r.index
ORDER BY r.score DESC
LIMIT 5
The query generally executes in about 1250 ms.
If I remove the ORDER BY clause, the request time drops to 130 ms.
The ORDER BY clause iterates over nearly 3300 elements.
Any idea how I can speed up this query? I am sure there is another syntax for performing this search.

I think this is normal: without the ORDER BY, it returns the first 5 nodes it can match.
Adding the ORDER BY forces it to load all possible matching nodes before it can pick the top 5, so the time increases with the number of "R" nodes.
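The difference described above can be sketched outside the database. The following plain JavaScript is only an illustration, not how Neo4j executes the query: the candidate array, its scores, and the helper names firstN and topN are hypothetical. It counts how many rows each strategy has to touch.

```javascript
// Hypothetical stand-in for the ~3300 candidate R nodes from the question.
const candidates = Array.from({ length: 3300 }, (_, i) => ({
  index: String(i),
  score: (i * 7919) % 3301, // arbitrary deterministic scores
}));

// LIMIT without ORDER BY: stop as soon as `limit` rows have been produced.
function firstN(rows, limit) {
  let visited = 0;
  const out = [];
  for (const row of rows) {
    visited += 1;
    out.push(row);
    if (out.length === limit) break;
  }
  return { out, visited };
}

// ORDER BY score DESC LIMIT n: every row must be inspected, but only the
// current best `limit` rows need to be kept.
function topN(rows, limit) {
  let visited = 0;
  const best = [];
  for (const row of rows) {
    visited += 1;
    best.push(row);
    best.sort((a, b) => b.score - a.score);
    if (best.length > limit) best.pop();
  }
  return { out: best, visited };
}

const unordered = firstN(candidates, 5);
const ordered = topN(candidates, 5);
console.log(unordered.visited, ordered.visited); // prints "5 3300"
```

In this sketch the unordered variant stops after 5 rows, while the ordered one has to look at all 3300 before it can know which 5 score highest.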
Now:
Did you profile your query with PROFILE?
Do you have indexes/constraints on I:index?
Can you change your query slightly to:
MATCH (r:R), (i0:I { index:"79" }), (i1:I { index:"4" })
WHERE NOT EXISTS((r)--(i0))
AND NOT EXISTS((r)--(i1))
RETURN r.index
ORDER BY r.score DESC
LIMIT 5

Which version do you use? Try to update to the latest one. Also, please share your visual query plan by prefixing your query with `PROFILE`.
Change it to:
MATCH (i0:I { index:"79" }), (i1:I { index:"4" })
MATCH (r:R)
WHERE NOT r--i0 AND NOT r--i1
WITH r
ORDER BY r.score DESC
LIMIT 5
RETURN r.index

Related

ArangoDb - How to count number of filtered results before limiting them

db.query(aql`
  FOR doc IN ${collection}
    FILTER REGEX_TEST(CONCAT(VALUES(doc, true)), ${queryStr}, true)
    SORT doc[${sortBy}] ${dir}
    LIMIT ${start}, ${count}
    RETURN doc._key
`)
  .then(cursor => {
    cb(cursor._result)
  }, err => console.log(err))
I have the AQL query above.
I want to count the total number of filtered results before limiting the results per page (for pagination purposes).
I think the issue is similar to "MySQL - How to count rows before pagination?" and "Find total number of results in mySQL query with offset+limit",
but I want to do it in ArangoDB with AQL,
and part of the solution may be "How to count number of elements with AQL?".
So, what is the most efficient solution for my requirement with AQL?
You can set the flag fullCount in the options for creating the cursor to true. Then the result will have an extra attribute with the sub-attributes stats and fullCount.
You can then get the fullCount attribute via cursor.extra.stats.fullCount. This attribute contains the number of documents in the result before the last LIMIT in the query was applied. See the HTTP documentation.
In addition, you should use the explain feature to analyse your query. In your case, your query will always make a full collection scan, thus won't scale well.
Update:
I added the fullCount flag to your code. Keep in mind that the fullCount attribute only appears if the number of results before LIMIT is higher than the number of results after it.
db.query(aql`
  FOR doc IN ${collection}
    FILTER REGEX_TEST(CONCAT(VALUES(doc, true)), ${queryStr}, true)
    SORT doc[${sortBy}] ${dir}
    LIMIT ${start}, ${count}
    RETURN {family: doc.family, group: doc.group}
`, {count: true, options: {fullCount: true}})
  .then(cursor => { console.log(cursor) }, err => console.log(err))
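To make the meaning of fullCount concrete, here is a minimal in-memory sketch in plain JavaScript. It does not talk to ArangoDB; the function name filterSortPage and the sample documents are hypothetical. It mimics FILTER, SORT and LIMIT start, count, and also reports how many documents passed the filter before the limit was applied, which is the number the answer above reads from cursor.extra.stats.fullCount.

```javascript
// In-memory imitation of FILTER / SORT / LIMIT start, count.
// `fullCount` is the number of rows that passed the filter, before paging.
function filterSortPage(docs, predicate, sortBy, start, count) {
  const filtered = docs.filter(predicate);
  const sorted = [...filtered].sort((a, b) =>
    a[sortBy] < b[sortBy] ? -1 : a[sortBy] > b[sortBy] ? 1 : 0
  );
  return {
    page: sorted.slice(start, start + count),
    fullCount: filtered.length,
  };
}

// Hypothetical sample data.
const docs = [
  { _key: "a", family: "rose", group: 1 },
  { _key: "b", family: "rosemary", group: 2 },
  { _key: "c", family: "tulip", group: 1 },
  { _key: "d", family: "primrose", group: 3 },
  { _key: "e", family: "orchid", group: 2 },
];

// First page of size 2 for documents whose family contains "ro".
const result = filterSortPage(docs, d => /ro/i.test(d.family), "family", 0, 2);
console.log(result.page.map(d => d._key), result.fullCount);
```

A pager would divide fullCount by the page size to get the number of pages, without issuing a second counting query.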

How can I descending sort a grouping based on the count of the reduction array in rethinkdb

Importing this dataset as a table:
https://data.cityofnewyork.us/Housing-Development/Registration-Contacts/feu5-w2e2#revert
I use the following query to perform an aggregation and then attempt to sort in descending order based on the reduction field. My intention is to sort based on the count of that field, or to have the aggregation create a second field called count and sort the grouping results in descending order of the reduction array's count or length. How can this be done in RethinkDB?
query:
r.table("contacts").filter({"Type": "Agent","ContactDescription" : "CONDO"}).hasFields("CorporationName").group("CorporationName").ungroup().orderBy(r.desc('reduction'))
I don't quite understand what you're going for, but does this do what you want? If not, what do you want to be different in the output?
r.table("contacts")
.filter({"Type": "Agent","ContactDescription" : "CONDO"})
.hasFields("CorporationName")
.group("CorporationName")
.ungroup()
.merge(function(row){ return {count: row('reduction').count()}; })
.orderBy(r.desc('count'))
You are almost there:
r.table("contacts").filter({"Type": "Agent","ContactDescription" : "CONDO"}).hasFields("CorporationName").group("CorporationName").count().ungroup().orderBy(r.desc('reduction'))
See that .count()? That is a map-reduce operation to get the count of each group.
I haven't tested the query on your dataset. Please comment in case you had problems with it.
EDIT:
If you want to add a count field and preserve the original document, you need to use map and reduce. In your case, it should be something like:
r.table("contacts").filter({"Type": "Agent","ContactDescription" : "CONDO"})
.hasFields("CorporationName")
.group("CorporationName")
.map(r.row.merge({count:1}))
.reduce(function(left, right){
return {
count: left('count').add(right('count')),
<YOUR_OTHER_FIELDS>: left('<YOUR_OTHER_FIELDS>'),
...
};
})
.ungroup().orderBy(r.desc(r.row('reduction')('count')))
EDIT:
I am not sure if this can do the trick, but it is worth a try:
.reduce(function(left, right){
return left.merge({count: left('count').add(right('count'))})
})
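What the group, count, ungroup and descending orderBy steps in these answers compute can be sketched in plain JavaScript on an in-memory array. This is an illustration only, not ReQL: the helper name countByGroup and the sample rows are hypothetical, and the output field names group, reduction and count simply mirror the ones used in the answers.

```javascript
// Group rows by `field`, attach a count per group, sort by count descending.
function countByGroup(rows, field) {
  const groups = new Map();
  for (const row of rows) {
    if (!(field in row)) continue; // comparable to hasFields(field)
    const key = row[field];
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(row);
  }
  return [...groups.entries()]
    .map(([group, reduction]) => ({ group, reduction, count: reduction.length }))
    .sort((a, b) => b.count - a.count);
}

// Hypothetical sample rows.
const contacts = [
  { id: 1, CorporationName: "ACME LLC" },
  { id: 2, CorporationName: "BETA CORP" },
  { id: 3, CorporationName: "ACME LLC" },
  { id: 4, CorporationName: "GAMMA INC" },
  { id: 5 }, // no CorporationName, skipped
  { id: 6, CorporationName: "GAMMA INC" },
  { id: 7, CorporationName: "ACME LLC" },
];

const ranked = countByGroup(contacts, "CorporationName");
console.log(ranked.map(g => `${g.group}: ${g.count}`));
```

Keeping both reduction and count in each output row corresponds to the merge-based answer, where the original documents are preserved alongside the count.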

Order by random in RethinkDB

I want to order documents randomly in RethinkDB. The reason for this is that I return n groups of documents and each group must appear in order in the results (so all documents belonging to a group should be placed together); and I need to randomly pick a document, belonging to the first group in the results (you don't know which is the first group in the results - the first ones could be empty, so no documents are retrieved for them).
The solution I found to this is to randomly order each of the groups before concat-ing to the result, then always pick the first document from the results (as it will be random). But I'm having a hard time ordering these groups randomly. Would appreciate any hint or even a better solution if there is one.
If you want to order a selection of documents randomly you can just use .orderBy and return a random number using r.random.
r.db('test').table('table')
.orderBy(function (row) { return r.random(); })
If these documents are in a group and you want to randomize them inside the group, you can just call orderBy after the group statement.
r.db('test').table('table')
.groupBy('property')
.orderBy(function (row) { return r.random(); })
If you want to randomize the order of the groups, you can just call orderBy after calling .ungroup
r.db('test').table('table')
.groupBy('property')
.ungroup()
.orderBy(function (row) { return r.random(); })
The accepted answer here should not be possible: as John mentioned, the sorting function must be deterministic, which r.random() is not.
The r.sample() function could be used to return a random order of the elements:
If the sequence has less than the requested number of elements (i.e., calling sample(10) on a sequence with only five elements), sample will return the entire sequence in a random order.
So, count the number of elements you have, and set that number as the sample number, and you'll get a randomized response.
Example:
var res = r.db("population").table("europeans")
.filter(function(row) {
return row('age').gt(18)
});
var num = res.count();
res.sample(num)
I'm not getting this to work. I tried to sort a table randomly and I'm getting the following error:
e: Sorting by a non-deterministic function is not supported in:
r.db("db").table("table").orderBy(function(var_33) { return r.random(); })
Also I have read in the rethink documentation that this is not supported. This is from the rethinkdb orderBy documentation:
Sorting functions passed to orderBy must be deterministic. You cannot, for instance, order rows using the random command. Using a non-deterministic function with orderBy will raise a ReqlQueryLogicError.
Any suggestions on how to get this to work?
One simple solution would be to give each document a random number:
r.db('db').table('table')
  .merge(doc => ({
    random: r.random(1, 10)
  }))
  .orderBy('random')
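If the groups are small enough to fetch, another option, which is not one of the answers above, is to do the shuffling on the client. The sketch below is plain JavaScript using a Fisher-Yates shuffle; the helper names shuffle and shuffleWithinGroups and the sample groups are hypothetical. Each group is shuffled on its own and the groups are then concatenated in order, so the first element is a random document from the first non-empty group.

```javascript
// Fisher-Yates shuffle; returns a new array and leaves the input untouched.
function shuffle(items, random = Math.random) {
  const out = [...items];
  for (let i = out.length - 1; i > 0; i -= 1) {
    const j = Math.floor(random() * (i + 1));
    [out[i], out[j]] = [out[j], out[i]];
  }
  return out;
}

// Shuffle each group separately, then concatenate so groups stay together.
function shuffleWithinGroups(groups) {
  return groups.flatMap(group => shuffle(group));
}

// Hypothetical groups; the first one is empty, as the question allows.
const groups = [
  [],
  [{ id: 1, g: "b" }, { id: 2, g: "b" }, { id: 3, g: "b" }],
  [{ id: 4, g: "c" }, { id: 5, g: "c" }],
];

const shuffled = shuffleWithinGroups(groups);
const pick = shuffled[0]; // random document from the first non-empty group
console.log(pick.g);
```

This trades a server-side sort for transferring whole groups to the client, so it only makes sense when the groups are small.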

mongoDB geoNear command with count

I am using the geoNear command with mongoid in order to retrieve a document collection ordered by distance. I need the distance for each document in the collection, which is why I am having to resort to the geoNear command.
Given the following command:
category_ids = ["list", "of", "ids"]
cmd = Hash.new
cmd[:geoNear] = :poi
cmd[:near] = [params[:location][:x], params[:location][:y]]
cmd[:query] = {
"$or" => [
{primary_category_id: {"$in" => category_ids}},
{category_ids: {"$in" => category_ids}}
]
}
cmd[:spherical] = true
cmd[:num] = num
res = Poi.collection.database.command cmd
My problem is that I require the total number of results in the collection. Sure I could just run another query that just counts the number of items that satisfy the query part of the command, however that would be pretty inefficient and also not very extendible as every change I make in the command would have to be reflected in the count query. Just adding a maxDistance would land me in a whole heap of trouble.
Another option would be to go with find and calculate the distance manually but again I would like to avoid that.
So my question is: is there a clever way of getting the number of documents matched by the command (without the num limit), without having to run a separate query or having to calculate the distance manually and go with find?
You can use $facet for this in an aggregation pipeline. After the $geoNear stage, add a $facet stage with two sub-pipelines: one projects the documents, and the other uses $group with _id: null and a count in the group to get the total number of documents.
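A rough sketch of such a pipeline, written as a plain JavaScript object so it can be inspected without a server. The near, query and spherical values are copied from the command in the question; the function name buildGeoNearPipeline, the distance field name "distance", and the facet names results and total are assumptions made for illustration.

```javascript
// Builds an aggregation pipeline: $geoNear first, then $facet to get both
// one page of documents and the total count in a single round trip.
function buildGeoNearPipeline({ x, y, categoryIds, num }) {
  return [
    {
      $geoNear: {
        near: [x, y],
        distanceField: "distance", // assumed name for the computed distance
        spherical: true,
        query: {
          $or: [
            { primary_category_id: { $in: categoryIds } },
            { category_ids: { $in: categoryIds } },
          ],
        },
      },
    },
    {
      $facet: {
        // the documents themselves, limited to `num`
        results: [{ $limit: num }],
        // the total number of documents that reached this stage
        total: [{ $group: { _id: null, count: { $sum: 1 } } }],
      },
    },
  ];
}

const pipeline = buildGeoNearPipeline({
  x: 13.4,
  y: 52.5,
  categoryIds: ["list", "of", "ids"],
  num: 20,
});
console.log(JSON.stringify(pipeline, null, 2));
```

In the mongo shell this would be passed to db.poi.aggregate(pipeline). Two caveats worth checking against your server version: older versions applied a default limit inside $geoNear itself, which would cap the total, and the whole $facet result is returned as a single document, so very large result sets should be limited inside the results facet.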

How can this query be improved?

I am using LINQ to write a query: one query shows all active customers, and another shows all active as well as inactive customers.
if (showall)
{
    var prod = Dataclass.Customers.Where(multiple factors); // all inactive + active
}
else
{
    var prod = Dataclass.Customers.Where(multiple factors & active=true); // only active
}
Can I do this using only one query? The issue is that the multiple factors are repeated in both queries.
thanks
var customers = Dataclass.Customers.Where(multiple factors);
var activeCust = customers.Where(x => x.active);
I really don't understand the question either. I wouldn't want to make this a one-liner because it would make the code unreadable
I'm assuming you are trying to minimize the number of roundtrips?
If "multiple factors" is the same, you can just filter for active users after your first query:
var onlyActive = prod.Where(p => p.active == true);
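Another way to avoid repeating the shared conditions is to fold the flag into a single predicate. The sketch below shows the idea in plain JavaScript rather than LINQ so that it can be run as is; the function name selectCustomers, the matchesFactors callback standing in for "multiple factors", and the sample customers are all hypothetical. In LINQ the analogous form would be a single Where whose lambda also contains showall || x.active.

```javascript
// One filter for both cases: the shared conditions appear once, and the
// `showAll` flag decides whether inactive customers are let through.
function selectCustomers(customers, matchesFactors, showAll) {
  return customers.filter(c => matchesFactors(c) && (showAll || c.active));
}

// Hypothetical sample data and a stand-in for "multiple factors".
const customers = [
  { name: "Ann", region: "EU", active: true },
  { name: "Bob", region: "EU", active: false },
  { name: "Cy", region: "US", active: true },
];
const inEurope = c => c.region === "EU";

const everyone = selectCustomers(customers, inEurope, true);
const activeOnly = selectCustomers(customers, inEurope, false);
console.log(everyone.map(c => c.name), activeOnly.map(c => c.name));
```

Whether this reads better than composing two Where calls, as in the answers above, is a matter of taste; both keep the shared conditions in one place.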
Wouldn't you just use your first query to return all customers?? If not you'd be returning the active users twice.
Options I'd consider
Bring all customers once, order by 'status' column so you can easily split them into two sets
Focus on minimizing DB roundtrips. Whatever you do in the front end costs an order of magnitude less than going to the DB.
Revise user requirements. For example, consider paging the results: it's unlikely that the end user will need all customers at once.

Resources