Elasticsearch - Getting multiple documents with multiple custom offsets and size 1

Currently, the way I get multiple documents for the exact same query but at different offsets, each with size 1, is the Elasticsearch Multi Search API. I wonder if there is a better way to do this that would give better performance.
An example of the query I am currently using:
{"index" : "test"}
{"query" : {"term" : { "user" : "Kimchy" }}, "from" : a, "size" : 1}
{}
{"query" : {"term" : { "user" : "Kimchy" }}, "from" : b, "size" : 1}
{}
{"query" : {"term" : { "user" : "Kimchy" }}, "from" : c, "size" : 1}
{}
{"query" : {"term" : { "user" : "Kimchy" }}, "from" : d, "size" : 1}
{}
{"query" : {"term" : { "user" : "Kimchy" }}, "from" : e, "size" : 1}
....
where a, b, c, d, e are parameters supplied at query time.

If I understand you correctly, a, b, c, d, e will all be numbers, right? So you basically want to be able to ask Elasticsearch for, say, the 3rd, 4th, and 7th documents that show up for a specific query?
I'm not sure it is the best way to do things, but it would certainly be faster to find the smallest and largest numbers among a through e, then do "from" : smallest and "size" : largest - smallest + 1. Then take the results that ES returns and go through them yourself to pick out the specific documents.
Every time you do a from/size query, Elasticsearch has to collect all the hits before that offset anyway, so you are currently redoing essentially the same search over and over.
This approach does get sketchy if there is a large gap between your smallest and largest numbers though, as you may end up pulling back thousands of documents.
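For example, if the requested positions were a = 2 and e = 7 (made-up numbers purely for illustration), a single search roughly along these lines would cover all of them, and you would then pick out the hits at the offsets you actually need from the returned list:
curl -XGET 'localhost:9200/test/_search?pretty=1' -d '
{
  "query" : { "term" : { "user" : "Kimchy" } },
  "from" : 2,
  "size" : 6
}
'
Here size is 7 - 2 + 1 = 6 so that the document at position 7 is still included.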

Related

How could I remove items from another search?

In Elasticsearch we run two searches, one for exact items and another for non-exact items.
We search for input = dev, and in the exact results we get this item:
{"_id" : "users-USER#1-name",
"_source" : {
"pk" : "USER#1",
"entity" : "users",
"field" : "name",
"input" : "dev",
}}
Then we do a second search for the non-exact results, and we get this item:
{"_id" : "users-USER#1-description",
"_source" : {
"pk" : "USER#1",
"entity" : "users",
"field" : "name",
"input" : "Dev1",
}}
We want to remove the exact results of the first search from the second, non-exact search by pk; in other words, any item whose pk appeared in the first search should be dropped from the second search's results.
I'll greatly appreciate any ideas.
For example, in the first search we got this item:
"_id" : "users-USER#1-name"
"pk" : "USER#1"
Since we got this item in the first search, we want to remove all items with that pk from the second search.
So the second search result would be empty.
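One possible approach (just a sketch, not taken from the original setup): run the exact search first, collect the pk values from its hits in your application code, and then exclude those pks from the non-exact search with a bool / must_not clause. Assuming the index is called users, pk is a keyword (not-analyzed) field, and the non-exact search is a match on input, the second request could look roughly like this:
# index name, field types and the match clause are assumptions; the pk list is built from the first response
curl -XGET 'localhost:9200/users/_search?pretty=1' -d '
{
  "query" : {
    "bool" : {
      "must" : [
        { "match" : { "input" : "dev" } }
      ],
      "must_not" : [
        { "terms" : { "pk" : ["USER#1"] } }
      ]
    }
  }
}
'
If pk is mapped as text rather than keyword, the terms clause would need to target an exact sub-field such as pk.keyword instead.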

Searching multiple types in elasticsearch

I have a use case where there are two different types in the same index. Both types have a different structure and mapping.
I need to query both types at the same time, using a different query DSL for each.
How can I build my query DSL to simultaneously query more than one type of the same index?
I looked into the Elasticsearch guide at https://www.elastic.co/guide/en/elasticsearch/guide/current/multi-index-multi-type.html but there is no proper explanation there. According to it, even if I set two different types in my request:
/index/type1,type2/_search
I will have to send the same query DSL.
You need to use the multi-search API and the _msearch endpoint:
curl -XGET localhost:9200/index/_msearch -d '
{"type": "type1"}
{"query" : {"match_all" : {}}, "from" : 0, "size" : 10}
{"type": "type2"}
{"query" : {"match_all" : {}}, "from" : 0, "size" : 10}
'
Note: make sure each line is terminated by a newline (including the last one).
You'll get two responses, in the same order as the requests.
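The reply body is a single object with a responses array, one entry per request and in the same order (heavily abbreviated sketch, most fields omitted):
{
  "responses" : [
    { "hits" : { "hits" : [ "... hits of the type1 query ..." ] } },
    { "hits" : { "hits" : [ "... hits of the type2 query ..." ] } }
  ]
}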

Aggregation distinct values in ElasticSearch

I'm trying to get the distinct values and their counts in Elasticsearch.
This can be done via:
"distinct_publisher": {
"terms": {
"field": "publisher", "size": 0
}
}
The problem I have is that it counts individual terms, so if a publisher value contains a space, e.g.:
"Chicken Dog"
and 5 documents have this value in the publisher field, then I get a count of 5 for Chicken and 5 for Dog:
"buckets" : [
{
"key" : "chicken",
"doc_count" : 5
},
{
"key" : "dog",
"doc_count" : 5
},
...
]
But I want to get as the result:
"buckets" : [
{
"key" : "Chicken Dog",
"doc_count" : 5
}
]
The reason you're getting a separate bucket for each of chicken and dog (each with a count of 5) is that your documents were analyzed at the time you indexed them.
This means Elasticsearch did some light processing to turn Chicken Dog into chicken and dog (lowercasing and tokenizing on whitespace). You can see how Elasticsearch will analyze a given piece of text into searchable tokens by using the Analyze API, for example:
curl -XGET 'localhost:9200/_analyze?text=Chicken+Dog'
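For Chicken Dog with the default analyzer the output looks roughly like this (abbreviated; the real response also includes type and position for each token):
{
  "tokens" : [
    { "token" : "chicken", "start_offset" : 0, "end_offset" : 7 },
    { "token" : "dog", "start_offset" : 8, "end_offset" : 11 }
  ]
}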
In order to aggregate over the "raw" distinct values, you need to use a not_analyzed mapping so Elasticsearch doesn't do its usual processing. You may need to reindex your data with the not_analyzed mapping applied to get the result you want.
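One common way to do this (a sketch; the index and type names below are placeholders) is to add a not_analyzed sub-field and aggregate on that instead:
# my_index / my_type are placeholders; existing documents must be reindexed for the new sub-field to be populated
curl -XPUT 'localhost:9200/my_index' -d '
{
  "mappings" : {
    "my_type" : {
      "properties" : {
        "publisher" : {
          "type" : "string",
          "fields" : {
            "raw" : { "type" : "string", "index" : "not_analyzed" }
          }
        }
      }
    }
  }
}
'
The aggregation then targets the raw sub-field:
"distinct_publisher": {
  "terms": {
    "field": "publisher.raw",
    "size": 0
  }
}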

Elastic Search Query for multiple conditions

I want to build a query in Elasticsearch which has 3 sub-conditions:
1. It must satisfy at least one value out of a provided list of values.
2. After 1, condition 2 must be satisfied, and then the 3rd condition.
(1 must be satisfied; 2 and 3 must also be satisfied, but only once 1 is satisfied.)
1 is a list of values, so matching any one of them will suffice.
Please give an outline of how to frame the Elasticsearch query using boolean clauses.
Thanks in advance.
{
  "query" : {
    "filtered" : {
      "filter" : {
        "bool" : {
          "must" : [
            { "term" : { "sessionId" : "-ShAwL2KlnVeo6nMMNX3ycVlc0kdikOWPC8vShyvpRpdmOQJkbBo-FiLJymsuZp36gcQs1I" } }
          ],
          "should" : [
            { "term" : { "visitorId" : "b090606f-968d-fef4-33e3-3341f3a04265" } },
            { "term" : { "clientIp" : "192.168.8.100" } }
          ]
        }
      }
    }
  }
}
For the terms specified under must, a document must match all of them.
For the terms specified under should, matching any one of them is enough.
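If all three conditions from the question are mandatory and condition 1 means "match at least one value from a list", a sketch closer to that wording would put everything under must and handle the list with a terms clause (field names and values below are placeholders, not taken from any real mapping):
# field1 / field2 / field3 and their values are placeholders
curl -XGET 'localhost:9200/index/_search?pretty=1' -d '
{
  "query" : {
    "filtered" : {
      "filter" : {
        "bool" : {
          "must" : [
            { "terms" : { "field1" : ["value_a", "value_b", "value_c"] } },
            { "term" : { "field2" : "value2" } },
            { "term" : { "field3" : "value3" } }
          ]
        }
      }
    }
  }
}
'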

elasticsearch custom_score multiplication is inaccurate

I've inserted some documents which are all identical except for one floating-point field, called a.
When the script of a custom_score query is set to just _score, the resulting score is 0.40464813 for a particular query matching some fields. When the script is then changed to _score * a (mvel) for the same query, where a is 9.908349251612433, the final score becomes 4.0619955.
Now, if I run this calculation via Chrome's JS console, I get 4.009394996051871.
4.0619955 (elasticsearch)
4.009394996051871 (Chrome)
This is quite a difference and produces an incorrect ordering of results. Why could this be happening, and is there a way to correct it?
If I run a simple calculation using the numbers you provided, then I get the result that you expect.
curl -XPOST 'http://127.0.0.1:9200/test/test?pretty=1' -d '
{
  "a" : 9.90834925161243
}
'
curl -XGET 'http://127.0.0.1:9200/test/test/_search?pretty=1' -d '
{
  "query" : {
    "custom_score" : {
      "script" : "0.40464813 * doc[\u0027a\u0027].value",
      "query" : {
        "match_all" : {}
      }
    }
  }
}
'
# {
# "hits" : {
# "hits" : [
# {
# "_source" : {
# "a" : 9.90834925161243
# },
# "_score" : 4.009395,
# "_index" : "test",
# "_id" : "lPesz0j6RT-Xt76aATcFOw",
# "_type" : "test"
# }
# ],
# "max_score" : 4.009395,
# "total" : 1
# },
# "timed_out" : false,
# "_shards" : {
# "failed" : 0,
# "successful" : 5,
# "total" : 5
# },
# "took" : 1
# }
I think what you are running into here is testing with too little data spread across multiple shards.
Doc frequencies are calculated per shard by default. So if you have two identical docs on shard_1 and one doc on shard_2, then the docs on shard_1 will score lower than the docs on shard_2.
With more data, the document frequencies tend to even out over shards. But when testing small amounts of data you either want to create an index with only one shard, or to add search_type=dfs_query_then_fetch to the query string params.
This calculates global doc frequencies across all involved shards before calculating the scores.
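For example, a one-shard index (created before indexing any documents; the index name is just an example) could be set up like this:
curl -XPUT 'http://127.0.0.1:9200/test' -d '
{
  "settings" : { "index" : { "number_of_shards" : 1 } }
}
'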
If you set explain to true in your query, you can see exactly how your scores are being calculated.
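For instance, combining both suggestions with the query from above (just a sketch):
curl -XGET 'http://127.0.0.1:9200/test/test/_search?search_type=dfs_query_then_fetch&pretty=1' -d '
{
  "explain" : true,
  "query" : {
    "custom_score" : {
      "script" : "_score * doc[\u0027a\u0027].value",
      "query" : {
        "match_all" : {}
      }
    }
  }
}
'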
