Get neighboring results for an ElasticSearch query - elasticsearch

Is it possible to get nearby results for an Elasticsearch query?
Example 1: if I have items named:
one
two
three
four
then a search for "three", ordered (for example) by name ascending, should return something like this (given that the number of neighbors is 1):
one
*three*
two
Example 2: I have query results and the ID of a document in them; I want the IDs of the next and previous documents. The order is set in the query.
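For illustration, one possible client-side approach to Example 2 (a sketch only, not a built-in Elasticsearch feature): run the sorted query once, locate the target ID in the returned hits, and slice out its neighbors. This assumes the Python elasticsearch client; the index name "items" and the helper name neighbor_ids are hypothetical, and fetching all hits up front only makes sense for reasonably small result sets.

from elasticsearch import Elasticsearch

es = Elasticsearch()  # assumes a locally reachable cluster

def neighbor_ids(index, query, sort, target_id, neighbors=1, max_hits=1000):
    """Return the IDs surrounding target_id in the sorted result list (sketch only)."""
    # body= works on 7.x clients; newer clients also accept query=/sort=/size= keyword arguments
    resp = es.search(index=index, body={"query": query, "sort": sort, "size": max_hits})
    ids = [hit["_id"] for hit in resp["hits"]["hits"]]
    if target_id not in ids:
        return []
    pos = ids.index(target_id)
    start = max(pos - neighbors, 0)
    return ids[start:pos + neighbors + 1]

# e.g. neighbor_ids("items", {"match_all": {}}, [{"name.keyword": "asc"}], target_id="3")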

Related

Count largest duplicate in a list of cells in Google Sheets

I have a list of cells which contain values (A,B,C,D,E...). I'd like to count the largest duplicate.
I thought about combining multiple MAX with COUNTIF, but that would be very long since my list of values has 60+ items.
Example file: https://docs.google.com/spreadsheets/d/1ZUnSokdPsEPVJw9S8DfGHvJE1L1Ng8litbCXeA9QWzI/edit?usp=sharing
I'm new at this, but I think the following formula does what you've asked.
=INDEX(A2:G2,MODE(MATCH(A2:G2,A2:G2,0)))
Your sheet is view only, so I can't place it there.
Replace my A2:G2 range in the formula with the range with your long list (either column or row) of values.
This searches the range and returns the first value that is duplicated the most.
Note: it doesn't flag whether there are other values with the same number of duplicates as the first value.
Here is a sample sheet, with examples using values in a row, or in a column:
https://docs.google.com/spreadsheets/d/1mIxTKXjED9kpqAGV55pf8Yjq102Sc5Ymn_yY6qh3XmE/edit?usp=sharing
Let me know if this doesn't achieve what you want.

Elasticsearch field collapsing with minimum inner hits count

When using field collapsing, is there a way to filter out results whose inner hits count is less than a given threshold?
In a hotel database I want to find hotels with the three cheapest available rooms cheaper than X. Each document has a hotel_id, room_id and price. If a hotel does not have 3 available rooms cheaper than X, I cannot do anything with it.
So I search for rooms cheaper than X, sorted by price, collapsing on hotel_id, but I only want to see groups that contain 3 rooms in the inner hits; otherwise that hotel result is unusable. With the size parameter I can define a maximum, but I cannot find a way to define a minimum.
Aggregation is not an option due to performance constraints.
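For illustration, the collapse request plus a client-side minimum check could look roughly like this (a sketch assuming the Python elasticsearch client; the index name "rooms" and the price limit are hypothetical, the hotel_id and price fields come from the question). Since collapse only caps inner hits via size, the "at least 3" condition is applied after the response arrives.

from elasticsearch import Elasticsearch

es = Elasticsearch()
MAX_PRICE = 100  # "X" from the question; placeholder value

body = {
    "query": {"range": {"price": {"lt": MAX_PRICE}}},
    "sort": [{"price": "asc"}],
    "collapse": {
        "field": "hotel_id",
        "inner_hits": {
            "name": "cheapest_rooms",
            "size": 3,                      # at most three rooms per hotel
            "sort": [{"price": "asc"}],
        },
    },
}

resp = es.search(index="rooms", body=body)

# Keep only hotels whose inner-hit total reaches 3 (total is a dict on ES 7+,
# a plain number on older versions).
def inner_total(hit):
    total = hit["inner_hits"]["cheapest_rooms"]["hits"]["total"]
    return total["value"] if isinstance(total, dict) else total

hotels = [hit for hit in resp["hits"]["hits"] if inner_total(hit) >= 3]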

Find out if any aggregations will have buckets without doing aggregations

Aggregations in Elasticsearch are pretty expensive. Before actually computing aggregations, I would like to find out whether an aggregation would have a non-zero count.
For example, say that based on my query, N documents are returned. Now I want to find out: if I aggregate these N documents over a certain field, will that aggregation have any buckets? If the field is null or an empty string for all documents, it should return false or 0. If even a single document has a non-empty string or non-null value for the field, it should return true or a non-zero number. I don't really care about the count.
Is it possible to do this in a way that is much faster than computing the aggregation?
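One lightweight check worth sketching (an illustration under assumptions, not a guaranteed speed-up): combine the original query with an exists filter, request zero hits, and let terminate_after stop the search after the first match. If the count is non-zero, a terms aggregation on that field would have at least one bucket. Note that exists also matches indexed empty strings, so the empty-string case is only approximated. The index, query and field names below are hypothetical; this assumes the Python elasticsearch client.

from elasticsearch import Elasticsearch

es = Elasticsearch()

def would_have_buckets(index, query, field):
    """Return True if at least one matching document has an indexed value for field."""
    body = {
        "size": 0,
        "terminate_after": 1,  # stop counting after the first matching document
        "query": {
            "bool": {
                "must": query,
                "filter": {"exists": {"field": field}},
            }
        },
    }
    resp = es.search(index=index, body=body)
    total = resp["hits"]["total"]
    count = total["value"] if isinstance(total, dict) else total  # ES 7+ vs older
    return count > 0

# e.g. would_have_buckets("my-index", {"match": {"status": "active"}}, "category")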

Efficient cypher query matching subgraph connecting two groups of nodes

My problem is the following: I have a small but dense network in Neo4j (~280 nodes, ~3600 relationships). There is only one type of node and one type of edge (i.e. a single label for each). Now, I'd like to specify two distinct groups of nodes, given by the values of their "group" property, and match the subgraph consisting of all paths up to a certain length connecting the two groups. In addition, I would like to add constraints on the relationships. So, at the moment I have this:
MATCH (n1) WHERE n1.group={group1}
MATCH (n2) WHERE n2.group={group2}
MATCH p=(n1)-[r*1..3]-(n2)
WHERE ALL(c IN r WHERE c.weight > {w})
AND ALL(n in NODES(p) WHERE 1=length(filter(m in NODES(p) WHERE m=n)))
WITH DISTINCT r AS dr, NODES(p) AS ns
UNWIND dr AS udr UNWIND ns AS uns
RETURN COLLECT(DISTINCT udr), COLLECT(DISTINCT uns)
which achieves what I want but in some cases seems to be too slow. Here the WHERE statement filters out paths with relationships whose weight property is below a threshold as well as those containing cycles.
The last three lines have to do with the desired output format. Given the matching subgraph (paths), I want all unique relationships in one list, and all unique nodes in another (for visualization with d3.js). The only way I found to do this is to UNWIND all elements and then COLLECT them as DISTINCT.
Also note that the group properties and the weight limit are passed in as query parameters.
Now, is there any way to achieve the same result faster? E.g., with paths up to a length of 3 the query takes about 5-10 seconds on my local machine (depending on the connectedness of the chosen node groups) and returns on the order of ~50 nodes and a few hundred relationships. This seems to be within reach of acceptable performance. Paths up to length 4, however, are already prohibitive (several minutes, or it never returns).
Bonus question: is there any way to specify the upper limit on path length as a parameter? Or does a different limit imply a totally different query plan?
This probably won't work at all, but it might give you something to play with. I tried changing a few things that may or may not work.
MATCH (n1) WHERE n1.group={group1}
MATCH (n2) WHERE n2.group={group2}
MATCH p=(n1)-[r*1..3]-(n2)
WHERE ALL(c IN r WHERE c.weight > {w})
WITH DISTINCT n1, n2, NODES(p) AS ns, r AS dr
WHERE ALL(n IN ns WHERE SINGLE(m IN ns WHERE m = n))
UNWIND dr AS udr UNWIND ns AS uns
RETURN COLLECT(DISTINCT udr), COLLECT(DISTINCT uns)
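Regarding the bonus question: as far as I know, the bounds of a variable-length pattern cannot be passed as query parameters, so a common workaround is to splice a validated limit into the query string on the client (each distinct limit may then get its own query plan). A rough sketch with the Python neo4j driver, using the driver's $-style parameters for everything else and SINGLE() in place of the length(filter(...)) cycle check; connection details are placeholders:

from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

def subgraph(group1, group2, w, max_hops=3):
    max_hops = int(max_hops)  # validated, then formatted into the query text
    query = (
        "MATCH (n1) WHERE n1.group = $group1 "
        "MATCH (n2) WHERE n2.group = $group2 "
        f"MATCH p=(n1)-[r*1..{max_hops}]-(n2) "
        "WHERE ALL(c IN r WHERE c.weight > $w) "
        "AND ALL(n IN NODES(p) WHERE SINGLE(m IN NODES(p) WHERE m = n)) "
        "WITH DISTINCT r AS dr, NODES(p) AS ns "
        "UNWIND dr AS udr UNWIND ns AS uns "
        "RETURN COLLECT(DISTINCT udr) AS rels, COLLECT(DISTINCT uns) AS nodes"
    )
    with driver.session() as session:
        record = session.run(query, group1=group1, group2=group2, w=w).single()
        return record["rels"], record["nodes"]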

Neo4j traversal performance

I want to perform an undirected traversal to extract all the ids connected through a certain type of relationship.
When I perform the following query, it returns the values fast enough:
MATCH path=(s:Node {entry:"a"})-[:RelType*1..10]-(x:Node)
RETURN collect(distinct ID(x))
However, doing
MATCH path=(s:Node {entry:"a"})-[:RelType*]-(x:Node)
RETURN collect(distinct ID(x))
takes a huge amount of time. I suspect that by using * it searches every path from s to x, but since I want only the ids, these paths can be discarded. What I really want is a BFS or DFS search to find the connected nodes from s.
Both queries return the exact same result, since there are no elements with a shortest path longer than 5 (only in the test example!).
Did you add an index, i.e. create index on :Node(entry)?
Also, depending on the number of rels per node in your path, you get rels^10 (or in general rels^steps) paths through your graph that are potentially returned.
Can you try first with a smaller upper limit like 3 and work from there?
Also leaving off the direction really hurts as you then get cycles.
What you can also try to do is:
MATCH path=(s:Node {entry:"a"})-[:RelType*]->(x:Node)
RETURN ID(x)
and stream the results and do the uniqueness in the client
Or this if you don't want to do uniqueness in the client
MATCH path=(s:Node {entry:"a"})-[:RelType*]->(x:Node)
RETURN distinct ID(x)
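For reference, the "do the uniqueness in the client" suggestion could look roughly like this with the Python neo4j driver (a sketch; connection details and the entry value are placeholders). Records are streamed and a set handles the deduplication:

from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

query = """
MATCH (s:Node {entry: $entry})-[:RelType*]->(x:Node)
RETURN ID(x) AS id
"""

with driver.session() as session:
    # Iterate the result lazily and deduplicate client-side.
    ids = {record["id"] for record in session.run(query, entry="a")}

print(len(ids), "distinct connected nodes")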
