How could I remove items from another search? - elasticsearch

In Elasticsearch we run two searches: one for exact items and another for non-exact items.
When we search with input = dev, the exact search returns this item:
{
  "_id" : "users-USER#1-name",
  "_source" : {
    "pk" : "USER#1",
    "entity" : "users",
    "field" : "name",
    "input" : "dev"
  }
}
Then we run a second search for the non-exact results and get this item:
{
  "_id" : "users-USER#1-description",
  "_source" : {
    "pk" : "USER#1",
    "entity" : "users",
    "field" : "name",
    "input" : "Dev1"
  }
}
We want to remove the exact results of the first search from the second, non-exact search by pk: any item whose pk appeared in the first search should be removed from the second search's results.
I'd greatly appreciate any idea.
For example, in the first search we got the item:
"_id" : "users-USER#1-name"
"pk" : "USER#1"
Since we got this item in the first search, we want to remove all items with that pk from the second search.
So the second search result would be empty.
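One way to do this (a sketch, not from the original question; it assumes the second query is built on the backend and that pk is indexed as a keyword field, so exact terms matching works) is to collect the pks from the exact search's hits and exclude them in the non-exact query with a bool/must_not terms clause:

```python
# Hits from the exact (first) search, shaped like the document above.
exact_hits = [
    {"_id": "users-USER#1-name",
     "_source": {"pk": "USER#1", "entity": "users",
                 "field": "name", "input": "dev"}},
]

# Collect the pk of every exact hit.
exact_pks = [hit["_source"]["pk"] for hit in exact_hits]

# Non-exact query that excludes those pks server-side.
# The "match" clause is a placeholder for whatever non-exact clause
# the second search already uses.
non_exact_query = {
    "query": {
        "bool": {
            "must": [{"match": {"input": "dev"}}],
            "must_not": [{"terms": {"pk": exact_pks}}],
        }
    }
}
```

With the must_not filter applied, the second search never returns documents for USER#1, so no client-side post-processing is needed.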

Related

Elasticsearch fuzziness with multi_match and bool_prefix type

I have a set of search_as_you_type fields I need to search against. Here is my mapping:
"mappings" : {
  "properties" : {
    "description" : {
      "type" : "search_as_you_type",
      "doc_values" : false,
      "max_shingle_size" : 3
    },
    "questions" : {
      "properties" : {
        "content" : {
          "type" : "search_as_you_type",
          "doc_values" : false,
          "max_shingle_size" : 3
        },
        "tags" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword"
            }
          }
        }
      }
    },
    "title" : {
      "type" : "search_as_you_type",
      "doc_values" : false,
      "max_shingle_size" : 3
    }
  }
}
I am using a multi_match query with bool_prefix type.
"query": {
  "multi_match": {
    "query": "triangle",
    "type": "bool_prefix",
    "fields": [
      "title",
      "title._2gram",
      "title._3gram",
      "description",
      "description._2gram",
      "description._3gram",
      "questions.content",
      "questions.content._2gram",
      "questions.content._3gram",
      "questions.tags",
      "questions.tags._2gram",
      "questions.tags._3gram"
    ]
  }
}
So far this works fine. Now I want to add typo tolerance, which is fuzziness in ES. However, it looks like bool_prefix has some conflicts with it. If I modify my query, add "fuzziness": "AUTO", and make an error in a word ("triangle" -> "triangld"), I get no results at all.
However, if I am looking for a phrase like "right triangle", the behavior is different:
even if no typo is made, I get more results with just "fuzziness": "AUTO" (1759 vs 1267)
if I add a typo to the 2nd word ("right triangdd"), it seems to work, but it now pushes results containing "right" without "triangle" ("The Bill of Rights", "Due process and right to privacy", etc.) to the front.
If I make a typo in the 1st word ("righd triangle") or in both ("righd triangdd"), the results seem to be just fine. So this is probably the only correct behavior.
I've seen a couple of articles and even GitHub issues saying that fuzziness does not work properly with a multi_match query of type bool_prefix, but I can't find a workaround. I've tried changing the query type, but it looks like bool_prefix is the only one that supports search-as-you-type, and I need to get search results as the user starts typing.
Since I make all the requests to ES from our backend, I can also manipulate the query string to build different search query types if needed. For example, use one query type for one-word searches and another for multi-word ones. But I basically need to maintain the current behavior.
I've also tried appending "~" or "~1[2]" to the string, which seems to be another way of specifying fuzziness, but the results are rather unclear and performance (search speed) seems to be worse.
My questions are:
How can I achieve fuzziness for one-word searches, so that the query "triangld" returns documents containing "triangle", etc.?
How can I achieve correct search results when the typo is in the 2nd (last?) word of the query? As I mentioned above it works, but see point 2 above.
Why does just adding fuzziness (see p. 1) return more results even if the phrase is correct?
Is there anything I need to change in my analyzers, etc.?
So, to achieve the desired behavior, we did the following:
changed the query type to "query_string"
added query-string preprocessing on the backend. We split the query string by whitespace and append "~1" or "~2" to each word if it is longer than 4 or 8 characters respectively (~ is the fuzziness syntax in ES). However, we don't append this to the word currently being typed until the user types a whitespace. For example, while the user types [t, tr, tri, ..., triangle] => no fuzziness, but once "triangle " => "triangle~2". This is because fuzziness on the last, still-incomplete word produces unexpected results.
we also removed all ngram fields from the search fields, as we get the same results and performance is a bit better.
added "default_operator": "AND" to the query so that phrase queries only return documents matching all terms
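The preprocessing step above can be sketched in Python. This is a minimal sketch; the exact length thresholds are my reading of the description and of the "triangle " => "triangle~2" example (a word of 8+ characters gets ~2, a word of more than 4 characters gets ~1):

```python
def add_fuzziness(query: str) -> str:
    """Append ~1/~2 fuzziness markers to each completed word.

    The word currently being typed (no trailing whitespace after it yet)
    is left untouched, per the behavior described above.
    """
    ends_with_space = query.endswith(" ")
    words = query.split()
    out = []
    for i, word in enumerate(words):
        last = (i == len(words) - 1)
        if last and not ends_with_space:
            out.append(word)          # still being typed: no fuzziness
        elif len(word) >= 8:
            out.append(word + "~2")   # long word: allow 2 edits
        elif len(word) > 4:
            out.append(word + "~1")   # medium word: allow 1 edit
        else:
            out.append(word)          # short word: exact only
    return " ".join(out)
```

For example, add_fuzziness("right triangle") leaves the last word exact ("right~1 triangle"), while add_fuzziness("triangle ") yields "triangle~2" because the trailing space marks the word as complete.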

Finding most commonly coincident values within an array with Elasticsearch

I would like to display the most commonly ordered pairs of products for a set of orders placed. An abbreviated version of my search index would look something like this:
{
  "_type" : "order",
  "_id" : "10",
  "_score" : 1.0,
  "_source" : {
    ...
    "product_ids" : [
      1, 2
    ]
    ...
  }
},
{
  "_type" : "order",
  "_id" : "11",
  "_score" : 1.0,
  "_source" : {
    ...
    "product_ids" : [
      1, 2, 3
    ]
    ...
  }
}
Given that my search index contains a set of orders, each with a product_ids field holding an array of the product ids in that order, is it possible to put together an Elasticsearch aggregation that will return either:
The most common pairs of product ids (which may be members of an arbitrarily long list of product ids) that occur most frequently together in orders.
The most common sets of product ids of arbitrary length that occur most frequently together in orders.
I've been reading the documentation, and I'm not sure if an adjacency matrix might be appropriate for this problem. My current hunch is to write a scripted cardinality query that orders and joins the product_ids in the search document to get results in line with #2, since #1 seems like it might involve too many permutations of product ids to be reasonably efficient.
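For comparison, option #1 is cheap to compute client-side if the order set is small enough to page through. This sketch (not an ES aggregation, just the pair-counting logic, using the two sample orders above) shows what the expected output would be:

```python
from collections import Counter
from itertools import combinations

# The two sample orders from above.
orders = [
    {"product_ids": [1, 2]},
    {"product_ids": [1, 2, 3]},
]

pair_counts = Counter()
for order in orders:
    # sorted + set so (2, 1) and duplicate ids collapse to one pair key
    for pair in combinations(sorted(set(order["product_ids"])), 2):
        pair_counts[pair] += 1

top_pairs = pair_counts.most_common(3)
```

Here the pair (1, 2) occurs in both orders, so it ranks first with a count of 2.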

Count of elements on kibana visualization

I have inserted the JSON records below into my Elastic index. How do I get the count of all the elements present in the "devices" array, so that the count can be visualized on a Kibana dashboard?
With a filter condition - the device count should be displayed as "4" for the SAMPLE application and "2" for the SAMPLE2 application on Kibana.
Without a filter condition - the device count should be displayed as "6".
{
  "status" : "SUCCESS",
  "request" : ["ABC"],
  "applicationName" : "SAMPLE",
  "endTime" : 1478772517736,
  "devices" : ["d1","d2","d3","d4"]
},
{
  "status" : "FAILED",
  "request" : ["EDF"],
  "applicationName" : "SAMPLE2",
  "endTime" : 1478772517736,
  "devices" : ["d5","d12"]
}
You should create a scripted field in Kibana in order to get the length of an array field. Your script could look something like this:
doc['devices'].values.size()
OR
doc['devices'].values.length
Then you can have a Data Table visualization showing the array count per applicationName by using the terms aggregation. Or you could apply filters such as:
applicationName:"SAMPLE"
applicationName:"SAMPLE2"
which will display the array count for the given filter criteria. This SO answer could be helpful.
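The expected numbers can be sanity-checked against the two sample documents. This sketch mirrors what the scripted field combined with a terms aggregation on applicationName would produce:

```python
docs = [
    {"applicationName": "SAMPLE",  "devices": ["d1", "d2", "d3", "d4"]},
    {"applicationName": "SAMPLE2", "devices": ["d5", "d12"]},
]

# Per-application device count (the filtered case).
per_app = {d["applicationName"]: len(d["devices"]) for d in docs}

# Overall device count (the unfiltered case).
total = sum(len(d["devices"]) for d in docs)
```

This yields 4 for SAMPLE, 2 for SAMPLE2, and 6 overall, matching the counts requested in the question.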

Elasticsearch - Getting multiple documents with multiple custom offset and size 1

Currently, the way I get multiple documents for the exact same query but at different offsets, each with size 1, is to use the Elasticsearch Multi Search API. I wonder if there is any better way to do this that would give better performance.
An example of the current query I am using:
{"index" : "test"}
{"query" : {"term" : { "user" : "Kimchy" }}, "from" : a, "size" : 1}
{}
{"query" : {"term" : { "user" : "Kimchy" }}, "from" : b, "size" : 1}
{}
{"query" : {"term" : { "user" : "Kimchy" }}, "from" : c, "size" : 1}
{}
{"query" : {"term" : { "user" : "Kimchy" }}, "from" : d, "size" : 1}
{}
{"query" : {"term" : { "user" : "Kimchy" }}, "from" : e, "size" : 1}
....
where a, b, c, d, e are parameters given at query time.
If I understand you correctly, a, b, c, d, e will all be numbers, right? So you basically want to be able to ask Elasticsearch for, say, the 3rd, 4th, and 7th documents that show up in a specific query?
I'm not sure if it is the best way to do things, but it would certainly be faster to find the smallest and largest numbers in a through e, then use "from" : smallest and "size" : largest - smallest + 1. Then take the results that ES returns and go through them yourself to pick out the specific documents.
Every time you do a from/size query, Elasticsearch has to find all the documents before that offset anyway, so you are currently basically redoing the same search over and over.
This approach does get sketchy if there is a large difference between your smallest and biggest numbers though, and you may end up trying to send back thousands of documents.
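The single-window idea above can be sketched as follows. pick_offsets is a hypothetical helper, and hits stands in for what one search with "from" : smallest and "size" : largest - smallest + 1 would return:

```python
def pick_offsets(hits, window_start, offsets):
    """From one contiguous result window fetched with from=window_start,
    select the hits at the requested absolute offsets."""
    return {off: hits[off - window_start]
            for off in offsets
            if 0 <= off - window_start < len(hits)}

offsets = [3, 4, 7]
window_start = min(offsets)
window_size = max(offsets) - min(offsets) + 1  # +1 so the last offset is included

# Fake hits for illustration; in reality this is the single search's hit list.
hits = [f"doc{window_start + i}" for i in range(window_size)]
selected = pick_offsets(hits, window_start, offsets)
```

One search over the window [3, 7] replaces five size-1 searches; the trade-off, as noted above, is that a wide spread between the smallest and largest offsets makes the window large.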

How to index the following for multifaceting in elasticsearch?

If I have a People collection, each person may have multiple hobbies (e.g. Running, Climbing, Swimming, Jumping Jacks).
How would I index a single person with all those attributes such that I could apply a facet to them? Could someone provide a sample of how the data should be indexed, given the following:
Person | Hobbies
Joe | Chess, Jumping Jacks, Swimming
Person | Hobbies
Bob | Rowing
And how would I go about getting facets for the "hobbies" key? (Note that "Jumping Jacks" is a single value, but a whitespace-separated phrase.)
If you want to both search on the hobbies field and make a facet on it, you need to use a multi_field. That's how you can index the same field in different ways. Usually the version for search needs to be tokenized and at least lowercased, plus language-dependent analysis if you want, while the facet version doesn't even need to be analyzed, since the facet entries need to be the same as the values in your source documents.
{
  "people" : {
    "properties" : {
      "hobbies" : {
        "type" : "multi_field",
        "fields" : {
          "hobbies" : {"type" : "string", "index" : "analyzed"},
          "facet" : {"type" : "string", "index" : "not_analyzed"}
        }
      }
    }
  }
}
The above mapping creates two different fields from the same hobbies input. The first one, which you can refer to in your queries just by the name hobbies, is analyzed with the default standard analyzer; the second one is not analyzed and can be used for the facet. You can refer to it as hobbies.facet.
As a result you can search for jumping and find a match, but your facet will look like the following:
Chess (1)
Jumping Jacks (1)
Swimming (1)
Rowing (1)
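The difference between the two sub-fields can be illustrated by simulating what each side counts. This is a sketch, not an ES query; lowercased whitespace tokenization stands in for the standard analyzer:

```python
from collections import Counter

people = {
    "Joe": ["Chess", "Jumping Jacks", "Swimming"],
    "Bob": ["Rowing"],
}

# hobbies.facet (not_analyzed): whole values, exactly as indexed,
# which is what the facet counts.
facet = Counter(h for hobbies in people.values() for h in hobbies)

# hobbies (analyzed): individual lowercased tokens, which is what
# a search for "jumping" matches against.
search_terms = {t for hobbies in people.values()
                for h in hobbies
                for t in h.lower().split()}
```

The facet side keeps "Jumping Jacks" as one entry, while the search side splits it into the tokens "jumping" and "jacks", which is why the query jumping matches but the facet entry stays whole.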
