Elasticsearch distinct multi search result

I use Elasticsearch 2.2. When using the Multi Search API, is it possible to deduplicate the results between the first search and the second search?
I have a multi search query DSL like this:
{}
{"query": {"filtered": {"filter": {"bool": {"must": [{"terms": {"mtart": ["roh"]}},{"terms": {"werks": ["f230","f232"]}}]}},"query": {"query_string": {"query": "roh"}}}}}
{}
{"query":{"filtered":{"filter":{"bool":{"must":[{"terms":{"mtart":["roh","nlag"]}},{"terms":{"werks":["f230","f231"]}}]}},"query":{"query_string":{"query":"roh"}}}}}
The first search returns 12 hits and the second returns 39 hits, but some documents appear in both result sets. I want to deduplicate the hits and get only the unique documents. Is it possible to do this?
Thanks a lot
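One client-side option is to merge the hits after the multi search returns. The sketch below assumes the usual _msearch response shape (a "responses" array whose items each carry hits.hits) and treats two hits as the same document when their _index, _type and _id agree. The helper name unique_hits and the sample data are made up for illustration.

```python
def unique_hits(msearch_response):
    """Merge hits from every sub-response, keeping the first copy of each document."""
    seen = set()
    unique = []
    for response in msearch_response.get("responses", []):
        for hit in response.get("hits", {}).get("hits", []):
            # Assumption: (_index, _type, _id) identifies a document.
            key = (hit.get("_index"), hit.get("_type"), hit.get("_id"))
            if key not in seen:
                seen.add(key)
                unique.append(hit)
    return unique


# Hand-written stand-in for an _msearch response body (not real output).
sample = {
    "responses": [
        {"hits": {"total": 2, "hits": [
            {"_index": "materials", "_type": "doc", "_id": "1"},
            {"_index": "materials", "_type": "doc", "_id": "2"},
        ]}},
        {"hits": {"total": 2, "hits": [
            {"_index": "materials", "_type": "doc", "_id": "2"},
            {"_index": "materials", "_type": "doc", "_id": "3"},
        ]}},
    ]
}

print([hit["_id"] for hit in unique_hits(sample)])
```

Note that this only deduplicates the hits that were actually returned, so each search needs a size large enough to cover the hits you care about.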

Related

scoring of Term vs. Terms query different

I am retrieving documents by filtering and using a term query to apply a score.
The query should match all animals having a specified color - the more colors are matched, the higher the score of a doc. Strange thing is, term and terms query result in a different scoring.
{
"query": {
"bool": {
"should": [
{"terms": {"color": ["brown","darkbrown"] } }
]
}
}
}
should be the same as using
{"term": {"color": {"value": "brown"} } },
{"term": {"color": {"value": "darkbrown"} } }
Query no. 1 gives me the exact same score for a document whether 1 or 2 terms are matched. The latter of course returns a higher score, if more colors are matched.
According to the coordination factor, the returned score should be higher if more terms are matched. Therefore these two queries should result in the same score. Or is it because term queries do not analyze the search term?
My field is indexed as text. Strings are indexed as an "array" of strings, e.g. "brown","darkbrown"
Difference between the term and terms queries:
The term query returns documents that contain an exact term in a provided field.
The terms query is the same as the term query, except that you can search for multiple values.
Warning: Avoid using the term query for text fields.
As far as this part of your question is concerned:
or is it because term queries do not analyze the search term?
Yes. The term query does not analyze the search term; it only matches the exact term.
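For reference, the second form from the question (one term clause per color, which the asker reports scores higher as more colors match) can be generated from a list of values. This is only a sketch; the helper name should_per_term is made up, and the field name and values are the ones from the question.

```python
import json


def should_per_term(field, values):
    """Build a bool query with one separate term clause per value."""
    return {
        "query": {
            "bool": {
                "should": [{"term": {field: {"value": value}}} for value in values]
            }
        }
    }


query = should_per_term("color", ["brown", "darkbrown"])
print(json.dumps(query, indent=2))
```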

Elastic Search Multimatch: Is there a way to search all fields except one?

We have an Elastic Search structure that specifies fields in a multi_match query like this:
"multi_match": {
"query": "find this string",
"fields": ["*_id^20", "*_name^20", "*"]
}
This works great - except under certain circumstances like when query is "Find NOWAK". This is because "NOW" is a reserved word for date searching and field "*" matches fields that are defined as dates.
So what I would like to do is ignore fields that match "*_at".
Is there a way to tell Elastic Search to ignore certain fields in a multi_match query?
If the answer to that is "no", then the follow-up question is how to escape the search term so that it won't trigger keywords.
Running version 6.7
Try this (from "Exclude a field on an Elasticsearch query"):
curl -XGET 'localhost:9200/testidx/items/_search?pretty=true' -d '{
"query" : {
"query_string": {
"fields": ["title", "field2", "field3"], <-- add this
"query": "Titulo"
}},
"_source" : {
"exclude" : ["*.body"]
}
}'
Apparently the answer is "No: there is not a way to tell ElasticSearch to ignore certain fields in a multi_match query"
For my particular issue I found an inexpensive way to find the necessary white-listed fields (this is performed outside the scope of ElasticSearch otherwise I would post it here) and list those in place of the "*" when building the query.
I am hopeful someone will tell me I'm wrong, but I don't think I am.
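The author's own whitelist logic is not shown, so the following is not their method, only one possible way to build such a list. It assumes you already have the "properties" section of the index mapping as a dict, skips fields whose type is date or whose name matches *_at, and returns the remaining field paths to use in place of "*". The function name and the sample mapping are made up.

```python
from fnmatch import fnmatchcase


def searchable_fields(properties, exclude_patterns=("*_at",),
                      exclude_types=("date",), prefix=""):
    """Collect field paths from a mapping's properties, skipping excluded ones."""
    fields = []
    for name, spec in properties.items():
        path = prefix + name
        if "properties" in spec:
            # Object field: recurse into its sub-fields.
            fields.extend(searchable_fields(
                spec["properties"], exclude_patterns, exclude_types, path + "."))
            continue
        if spec.get("type") in exclude_types:
            continue
        if any(fnmatchcase(path, pattern) for pattern in exclude_patterns):
            continue
        fields.append(path)
    return fields


# Hypothetical mapping properties, for illustration only.
sample_properties = {
    "user_id": {"type": "keyword"},
    "user_name": {"type": "text"},
    "created_at": {"type": "date"},
    "notes": {"type": "text"},
    "owner": {"properties": {
        "full_name": {"type": "text"},
        "updated_at": {"type": "date"},
    }},
}

whitelist = searchable_fields(sample_properties)
multi_match = {"multi_match": {"query": "Find NOWAK", "fields": whitelist}}
print(multi_match)
```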

Elasticsearch hits.total different with OR

When I use the following search (/posts/_search) my hits.total is 1400:
{"query": {"query_string": {"query": "Bitcoin"}}}
When I use the following search (/posts/_search) my hits.total is 500:
{"query": {"query_string": {"query": "Ethereum"}}}
When I use an OR in my search, the hits.total is 1400, where I expected it to be 1900.
{"query": {"query_string": {"query": "(Ethereum) OR (Bitcoin)"}}}
Why is my hits.total number different when I am using an "OR"? I am using the hits.total as a counter to display and the number should be the same, right?
I am pretty new with ElasticSearch and hopefully, someone could point me in the right direction. Thanks!
Most probably there are some documents where _all contains both terms, i.e. Bitcoin and Ethereum. Those documents are selected by each query when you run the queries independently, but when you run the OR query, the common documents are counted only once.
Maybe this Venn diagram explains it better:
A U B = (7+2+5) + (8+1+2+5) - (2+5) = 23
A + B = (7+2+5) + (8+1+2+5) = 30
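The same counting rule can be checked in a few lines. The sketch below reuses the Venn diagram numbers (14 and 16 documents with 7 in common) and then applies the rule backwards to the totals from the question; the function names are made up.

```python
def union_size(size_a, size_b, size_both):
    """|A U B| = |A| + |B| - |A and B|."""
    return size_a + size_b - size_both


def overlap_size(size_a, size_b, size_union):
    """Rearranged: |A and B| = |A| + |B| - |A U B|."""
    return size_a + size_b - size_union


# Venn diagram example: A = 7+2+5, B = 8+1+2+5, common part = 2+5.
print(union_size(7 + 2 + 5, 8 + 1 + 2 + 5, 2 + 5))

# Totals from the question: Bitcoin 1400, Ethereum 500, OR query 1400.
print(overlap_size(1400, 500, 1400))
```

If the totals in the question are exact, the second call gives an overlap of 500, which would mean every document matching Ethereum also matches Bitcoin.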
If you are sure that a particular field can never contain both values, try adding "default_field" to the query and compare the results. When you don't pass "default_field", it defaults to the index.query.default_field index setting, which in turn defaults to _all.
{
"query": {
"query_string": {
"default_field": "CRYPTOCURRENCY_TYPE",
"query": "as"
}
}
}
More details can be found here: https://www.elastic.co/guide/en/elasticsearch/reference/5.5/query-dsl-query-string-query.html

How to show exact match (either word or sentence) result first and then others in elastic search?

Hi, is there any query in Elasticsearch that will display exact-match results (either a word or a sentence) first and then partial-match results? Please help me with this.
You can use multi-match queries and boost exact matches. Check https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-multi-match-query.html
You can use a match query as follows.
Suppose you are looking for a word "tim"
GET /index/type/_search
{
"query": {
"bool": {
"should": [{"match": {"field_name": "tim"}}
]
}
}
}
This will automatically return the best-scoring results first. For partial matches, you can read up on fuzzy queries:
https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-fuzzy-query.html
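One common way to push exact matches to the top, offered here as a sketch rather than as what either answer prescribes, is to combine a boosted match_phrase clause with a plain match clause in a bool should. The boost value of 2.0 is an arbitrary assumption, and "field_name" is the placeholder used in the answer above.

```python
import json


def exact_first_query(field, text, phrase_boost=2.0):
    """Score phrase matches higher than loose term matches on the same field."""
    return {
        "query": {
            "bool": {
                "should": [
                    # Whole phrase in order: boosted so it ranks first.
                    {"match_phrase": {field: {"query": text, "boost": phrase_boost}}},
                    # Any of the analyzed terms: still matches, scores lower.
                    {"match": {field: text}},
                ]
            }
        }
    }


query = exact_first_query("field_name", "tim")
print(json.dumps(query, indent=2))
```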

Elasticsearch how to match documents for which the field tokens are a sub-set of the query tokens

I have a keyword/key-phrase field that I tokenize using the standard analyser. I want this field to match if the search phrase contains all of the field's tokens.
For example, if the field value is "veni, vidi, vici" and the search phrase is "Ceaser veni,vidi,vici", I want this search phrase to match, but the search phrase "veni, vidi" should not match.
I also need "vidi, veni, vici" (weird!) to match. So the positions and ordering of the terms are not really important. A phrase match would not quite work for me, I think.
I can use "bool query" with "minimum_should_match" parameter for this specific example but that is not really what I want as minimum should match is about ratio/number of tokens in the search phrase.
Pure ES solution would go like this. You will need two requests.
1) First you need to pass user query through analyze api to get all the search tokens.
curl -XGET 'localhost:9200/_analyze' -d '
{
"analyzer" : "standard",
"text" : "Ceaser veni,vidi,vici"
}'
You will get 4 tokens: ceaser, veni, vidi, vici. You need to pass these tokens as an array to the next search request.
2) We need to search for documents whose tokens are subset of search tokens.
{
"query": {
"filtered": {
"filter": {
"bool": {
"must": [
{
"query": {
"match": {
"title": "Ceaser veni,vidi,vici"
}
}
},
{
"script": {
"script": "search_tokens.containsAll(doc['title'].values)",
"params": {
"search_tokens": [
"ceaser",
"veni",
"vidi",
"vici"
]
}
}
}
]
}
}
}
}
}
Here the job of the first match query inside the filter is to narrow down the documents on which the script should run. The containsAll method checks whether the document's tokens are a subset of the search tokens. This will be slow, but it will do the job with your current setup. One big improvement would be to store the tokens as an array in a separate field, so that doc['title'].values can be replaced with that field, which will speed up the script.
Hope this helps!
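The subset test that containsAll performs can be tried outside Elasticsearch. In the sketch below, rough_tokens is only a crude stand-in for the standard analyzer (lowercase, split on non-word characters), not a reimplementation of it, and both function names are made up.

```python
import re


def rough_tokens(text):
    """Crude approximation of the standard analyzer: lowercase word tokens."""
    return re.findall(r"\w+", text.lower())


def field_is_subset(field_value, search_phrase):
    """True when every token of the field also appears in the search phrase."""
    return set(rough_tokens(field_value)) <= set(rough_tokens(search_phrase))


field = "veni, vidi, vici"
for phrase in ["Ceaser veni,vidi,vici", "veni, vidi", "vidi, veni, vici"]:
    print(phrase, "->", field_is_subset(field, phrase))
```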
No built-in solution but this works:
1) Add an extra field with the number of terms in the field for each document. So in your "veni, vidi, vici" example, you would have a field like "field_term_count" : 3.
2) Perform a separate match search for each token in the search query.
3) Sum the number of searches that matched for each document with at least one match (e.g. a hashtable with key of document ID and value of count).
4) Compare the number of matches from step 3 to the "field_term_count" field for each of the documents with matches. If they are equal then the document is a match.
Then "Ceaser veni,vidi,vici" will match but the search phrases "veni, vidi" will not, as desired. It should be quite fast for reasonable numbers of matches.
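The counting steps above can be sketched client-side as follows. The sketch assumes the per-token searches have already been run and their matching document IDs collected into a dict keyed by token (so repeated tokens in the search phrase count once), and that field_term_count holds the number of distinct terms per document. The function name, document IDs and data are invented for illustration.

```python
from collections import Counter


def subset_matches(docs_per_token, field_term_count):
    """Return IDs of documents whose every term was hit by some search token."""
    counts = Counter()
    for doc_ids in docs_per_token.values():
        # Count each document at most once per search token.
        counts.update(set(doc_ids))
    return sorted(
        doc_id for doc_id, matched in counts.items()
        if matched == field_term_count.get(doc_id)
    )


# Hypothetical results of one match search per token of "Ceaser veni,vidi,vici".
docs_per_token = {
    "ceaser": [],
    "veni": ["doc_a", "doc_b"],
    "vidi": ["doc_a", "doc_b"],
    "vici": ["doc_a"],
}
# doc_a holds "veni, vidi, vici"; doc_b holds three terms, one of them not searched.
field_term_count = {"doc_a": 3, "doc_b": 3}

print(subset_matches(docs_per_token, field_term_count))
```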
