PART-I:
I have a Lucene index on property a1 of a node n, and I have a Cypher query with
ORDER BY n.a1 DESC
Will it take advantage of the lucene index while sorting the results?
PART-II:
Let's assume I have similar indexes on a1, a2, a3, ..., aN (individually), and I have a Cypher query with
ORDER BY n.a1, n.a2 DESC, n.a3 ... n.aN DESC
Will it take advantage of the indexes, or do I have to define some kind of multi-field index separately for this particular combination of fields and ASC/DESC?
Part I.
No. From the Java API you can add Lucene sort query objects.
Part II
No, see above.
The sorting happens without using any indexes; only the results that are part of your query are sorted.
The index is only used to look up nodes for starting points.
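For illustration, the in-memory sort that Cypher performs over the query results behaves like an ordinary multi-key sort over plain records. A minimal Python sketch (not Neo4j code; the field names a1/a2 and sample values are taken from the question) of ORDER BY n.a1, n.a2 DESC, using the fact that stable sorts compose:

```python
# Hypothetical node rows already fetched by the query.
nodes = [
    {"a1": 2, "a2": "x"},
    {"a1": 1, "a2": "y"},
    {"a1": 2, "a2": "a"},
]

# ORDER BY n.a1, n.a2 DESC: sort by the secondary descending key first,
# then by the primary ascending key; Python's sort is stable, so ties on
# a1 keep the a2-descending order.
nodes.sort(key=lambda n: n["a2"], reverse=True)
nodes.sort(key=lambda n: n["a1"])

print(nodes)
```

No index is consulted here; the cost is paid over whatever rows the query already produced, which matches the answer above.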
Related
Good day, everyone. I have a slightly unusual Elasticsearch use case.
There are two different indices, and each index contains one data type.
The first type contains the following data that matters for this case:
keyword (text, keyword),
URL (text, keyword),
position (number).
The second type contains these data fields:
keyword (text, keyword),
numberValue (number).
I need to do the following:
1. Group data from the first index by URL.
2. For each object in a group, calculate a new metric (metric A) with the simple formula: position * numberValue * Param.
3. For each group, calculate the sum of the metric A values computed in step 2.
4. Order the groups descending by the sums computed in step 3.
5. Take some interval of the resulting groups.
Param is a parameter I need to supply for the calculation; it is not stored in Elasticsearch.
The algorithm is not difficult, but the data lives in different indices, and I don't know how to do this fast; I would prefer to do it at the Elasticsearch level.
I don't know how to build an efficient search or data-processing pipeline that would implement this.
I use ES version 6.2.3, if that matters.
Please give me some advice on how to implement this algorithm.
Reading step 2, you seem to assume that keyword is some sort of primary key. Elasticsearch is not a relational database and can only reason over one document at a time, so unless numberValue and position are (indexed) fields of the same document, you can't combine them.
The rest of the steps should be achievable with the help of aggregations.
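If denormalising into one document is not an option, the five steps can also be done client-side after fetching both result sets. A minimal Python sketch of that fallback (illustrative only, not the Elasticsearch API; the sample values and the join on keyword are assumptions from the question):

```python
# Documents fetched from the first index: keyword, URL, position.
first_index = [
    {"keyword": "k1", "URL": "u1", "position": 2},
    {"keyword": "k2", "URL": "u1", "position": 3},
    {"keyword": "k1", "URL": "u2", "position": 5},
]
# Lookup built from the second index: keyword -> numberValue.
second_index = {"k1": 10, "k2": 4}
PARAM = 1.5  # supplied by the caller, not stored in Elasticsearch

# Steps 1-3: group by URL and sum metric A = position * numberValue * Param.
sums = {}
for doc in first_index:
    metric_a = doc["position"] * second_index[doc["keyword"]] * PARAM
    sums[doc["URL"]] = sums.get(doc["URL"], 0) + metric_a

# Steps 4-5: order groups descending by the sum, then take an interval.
ranked = sorted(sums.items(), key=lambda kv: kv[1], reverse=True)
page = ranked[0:2]
print(page)
```

This trades network round-trips for simplicity; doing it inside Elasticsearch would require the values to live in one document, as the answer above notes.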
I'm using Lucene.Net to index and search. My data contains some numeric fields.
How can I sort search results by the sum/product of multiple numeric fields?
The simplest solution is to index an extra field with the calculated value, then sort by that.
This is a very common technique in "NoSQL" stores, i.e. denormalise and store whatever extra values are needed to optimise query-time performance/capabilities.
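The denormalisation idea can be sketched in a few lines of Python (illustrative, not Lucene.Net code; the field names are made up): compute the combined value once at index time, store it as its own field, and sort on that single field at query time.

```python
# Documents about to be indexed.
docs = [
    {"price": 10, "weight": 3},
    {"price": 5, "weight": 8},
]

# Index time: precompute and store the combined value as its own field.
for d in docs:
    d["price_times_weight"] = d["price"] * d["weight"]

# Query time: a plain single-field sort, which Lucene handles cheaply.
docs.sort(key=lambda d: d["price_times_weight"])
print([d["price_times_weight"] for d in docs])
```

The cost moves from every query to a one-time computation per document at index time.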
I'm trying to programmatically locate all duplicate nodes in a Neo4j 1.8 database.
The nodes that need examination all have a (non-indexed) property externalId for which I want to find duplicates. This is the Cypher query I've got:
START n=node(*), dup=node(*) WHERE
HAS(n.externalId) AND HAS(dup.externalId) AND
n.externalId=dup.externalId AND
ID(n) < ID(dup)
RETURN dup
There are fewer than 10K nodes in the data and fewer than 1K nodes with an externalId.
The query above is working but seems to perform badly. Is there a less memory consuming way to do this?
Try this query:
START n=node(*)
WHERE HAS(n.externalId)
WITH n.externalId AS extId, COLLECT(n) AS cn
WHERE LENGTH(cn) > 1
RETURN extId, cn;
It avoids taking the Cartesian product of your nodes. It finds the distinct externalId values, collects all the nodes with the same id, and then filters out the non-duplicated ids. Each row in the result will contain an externalId and a collection of the duplicate nodes with that id.
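What the COLLECT-based query does can be sketched in plain Python (illustrative, not Neo4j code; the sample nodes are made up): bucket nodes by externalId in a single pass, then keep only the buckets with more than one member.

```python
from collections import defaultdict

nodes = [
    {"id": 1, "externalId": "A"},
    {"id": 2, "externalId": "B"},
    {"id": 3, "externalId": "A"},
    {"id": 4},  # no externalId: filtered out, like WHERE HAS(n.externalId)
]

# One pass: group node ids by externalId.
groups = defaultdict(list)
for n in nodes:
    if "externalId" in n:
        groups[n["externalId"]].append(n["id"])

# Keep only the ids that occur more than once (LENGTH(cn) > 1).
duplicates = {ext: ids for ext, ids in groups.items() if len(ids) > 1}
print(duplicates)
```

A single scan plus hash grouping replaces the pairwise comparison of every node against every other node.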
The START clause performs a full graph scan, assembles the Cartesian product of the entire set of nodes (10K * 10K = 100M pairs to start from), and then narrows that very large list down based on the criteria in the WHERE clause. (Maybe there are Cypher optimizations here? I'm not sure.)
I think adding an index on externalId would be a clear win and may provide enough of a performance gain for now, but you could also look at finding duplicates in a different way, perhaps something like this:
START n=node(*)
WHERE HAS(n.externalId)
WITH n
ORDER BY ID(n) ASC
WITH count(*) AS occurrences, n.externalId AS externalId, collect(ID(n)) AS ids
WHERE occurrences > 1
RETURN externalId, TAIL(ids)
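A sketch of what that query returns, in plain Python (illustrative, not Neo4j code; sample ids are made up): the ids are collected in ascending order, and TAIL drops the first one, so each row lists the duplicate nodes to act on while the lowest id is kept.

```python
from collections import defaultdict

# (node id, externalId) pairs; sorting mirrors ORDER BY ID(n) ASC.
pairs = [(7, "A"), (3, "A"), (5, "B"), (9, "A")]

ids_by_external = defaultdict(list)
for node_id, external_id in sorted(pairs):
    ids_by_external[external_id].append(node_id)

# TAIL(ids): everything after the first (lowest) id in each group.
to_remove = {ext: ids[1:]
             for ext, ids in ids_by_external.items() if len(ids) > 1}
print(to_remove)
```

This is handy when the goal is to delete or merge duplicates while preserving the oldest node.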
Is it possible to implement reliable paging of elasticsearch search results if multiple documents have equal scores?
I'm experimenting with custom scoring in elasticsearch. Many of the scoring expressions I try yield result sets where many documents have equal scores. They seem to come in the same order each time I try, but can it be guaranteed?
AFAIU it can't be guaranteed, especially not if there is more than one shard in the cluster. Documents with equal scores for a given Elasticsearch query are returned in a non-deterministic order that can change between invocations of the same query, even if the underlying database does not change (and therefore paging is unreliable), unless one of the following holds:
I use function_score to guarantee that the score is unique for each document (e.g. by using a unique number field).
I use sort and guarantee that the sorting defines a total order (e.g. by using a unique field as fallback if everything else is equal).
Can anyone confirm (and maybe point at some reference)?
Does this change if I know that there is only one primary shard without any replicas (see this other, similar question: Inconsistent ordering of results across primary/replica for documents with equivalent score)? E.g. if I guarantee that there is one shard AND there is no change in the database between two invocations of the same query, will that query return results in the same order?
What are other alternatives (if any)?
I ended up using an additional sort in cases where equal scores are likely to happen, for example searching by product category. This additional sort could be an id, a creation date, or similar. The setup is 2 servers, 3 shards and 1 replica.
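The tie-breaker idea can be illustrated outside Elasticsearch with a plain sort (field names and values are made up): score descending, then a unique id ascending as the fallback. Because the key is unique per document, the resulting order is total, and paging over it is deterministic.

```python
# Hypothetical hits with colliding scores.
hits = [
    {"id": "doc3", "score": 1.0},
    {"id": "doc1", "score": 1.0},
    {"id": "doc2", "score": 2.0},
]

# Score descending, then the unique id ascending as the tie-breaker.
hits.sort(key=lambda h: (-h["score"], h["id"]))

print([h["id"] for h in hits])
```

Without the second key, the relative order of doc1 and doc3 would be whatever the engine happened to produce on that invocation.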
I'm currently using Spatial for my queries as follows:
START b=node:LocationIndex('withinDistance:[70.67,12.998,6.0]')
RETURN b
ORDER BY b.score
b is an entity that has a score, and I'd like to order by this score, but I found a case in which all the entities with score 0 were not ordered by distance. I know Spatial automatically orders by distance, but once I force the order by another field, I lose this ordering.
Is there any way of forcing this order as a second order field like:
START b=node:LocationIndex('withinDistance:[70.67,12.998,6.0]')
RETURN b
ORDER BY b.score, ?distance?
Unfortunately in the current spatial plugin, there is no cypher support at all, so the distance function (or distance result) cannot be accessed by the ORDER BY.
As you already noticed, the withinDistance function in the index itself will return results ordered by distance. If you do not add an extra ORDER BY in the Cypher query, the distance order should be maintained. However, when adding the extra ORDER BY, the original order is lost. It would be an interesting feature request to the Cypher developers to maintain the original order for elements that compare as equal in the ORDER BY.
There is also a separate plan to develop spatial functions within cypher itself, and that will solve the problem the way you want. But there is not yet any information on a development or release schedule for this.
One additional option that might help you in a shorter time frame, and that is independent of the Neo4j development plans, is to add an order-by extension to the spatial index query. Right now you are specifying the index query as 'withinDistance:[70.67,12.998,6.0]', but you could edit the Spatial plugin code to support passing extra parameters to this query, one of which could be an order-by parameter. Then you would have complete control of the order.
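A further workaround, assuming each returned node carries its coordinates, is to recompute the distance client-side and use it as the secondary sort key after score. A Python sketch (the coordinates, field names, and haversine helper are all illustrative assumptions, not part of the Spatial plugin):

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    # Great-circle distance between two points, in kilometres.
    r = 6371.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

# Centre of the withinDistance query (illustrative values).
query_lat, query_lon = 70.67, 12.998

results = [
    {"name": "b1", "score": 0, "lat": 70.68, "lon": 12.998},
    {"name": "b2", "score": 0, "lat": 70.72, "lon": 12.998},
    {"name": "b3", "score": 1, "lat": 70.70, "lon": 12.998},
]

# ORDER BY b.score, distance: score first, recomputed distance as tie-break.
results.sort(key=lambda b: (b["score"],
                            haversine_km(query_lat, query_lon, b["lat"], b["lon"])))
print([b["name"] for b in results])
```

This keeps the query unchanged and only adds a cheap post-processing step in the application.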