match query returns exact values only in elasticsearch - elasticsearch

I have following documents:
{
"_index": "testrest",
"_type": "testrest",
"_id": "sadfasdfw1",
"_score": 1,
"_source": {,
"s_item_name": "Create",
"request_id": "35",
"confidence": "0.5",
}
},
{
"_index": "testrest",
"_type": "testrest",
"_id": "asdfds",
"_score": 1,
"_source": {,
"s_item_name": "Update",
"request_id": "35",
"confidence": "0.3333",
}
},
I am trying to get results for request_id of 35 and their confidence values.
For eg. if input is only 0. then both results should be displayed.
And if input is 0.5 then only first doc., and if 0.3 only second doc.
Here's what I tried:
{
"query": {
"bool": {
"must": [
{ "match": { "confidence_score": "0.33" }}
],
"filter": {
"term": {
"request_id": "35"
}
}
}
}
}
This gives 0 results. Since it requires exact values only, like 0.5 or 0.3333.
I thought match works for this instead of term.
How do I make the query similar to LIKE operator in SQL?

For like I should suggest you have a look at wildcard, prefix or match_phrase type in elastic search or if you are using the latest version you can write SQL statement using ES plugin.

Related

Elasticsearch associating exact match terms

I have a search index of filenames containing over 100,000 entries that share about 500 unique variations of the main filename field. I have recently made some modifications to certain filename values that are being generated from my data. I was wondering if there is a way to link certain queries to return an exact match. In the following query:
"query": {
"bool": {
"must": [
{
"match": {
"filename": "foo-bar"
}
}
],
}
}
how would it be possible to modify the index and associate the results so that above query will also match results foo-bar-baz, but not foo-bar-foo or any other variation?
Thanks in advance for your help
You can use a term query instead of a match query. Perfect to use on a keyword:
https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-term-query.html
Adding a working example with index data and search query. (Using the default mapping)
Index Data:
{
"fileName": "foo-bar"
}
{
"fileName": "foo-bar-baz"
}
{
"fileName": "foo-bar-foo"
}
Search Query:
{
"query": {
"bool": {
"should": [
{
"match": {
"fileName.keyword": "foo-bar"
}
},
{
"match": {
"fileName.keyword": "foo-bar-baz"
}
}
]
}
}
}
Search Result:
"hits": [
{
"_index": "test",
"_type": "_doc",
"_id": "1",
"_score": 0.9808291,
"_source": {
"fileName": "foo-bar"
}
},
{
"_index": "test",
"_type": "_doc",
"_id": "2",
"_score": 0.9808291,
"_source": {
"fileName": "foo-bar-baz"
}
}
]

Elastic search ---: MUST_NOT query not working

I have a query in which i want to add a must_not clause that would discard all records that have blank data for a some field. I tried a lot of ways but none worked. when I issue the same query (mentioned below) with other specific fields then it works fine.
this query should get all records that do not have "registrationType1" field empty/blank
query:
{
"size": 20,
"_source": [
"registrationType1"
],
"query": {
"bool": {
"must_not": [
{
"term": {
"registrationType1": ""
}
}
]
}
}
}
the results below still contains "registrationType1" with empty values
results:
**"_source": {
"registrationType1": ""}}
, * {
"_index": "oh_animal",
"_type": "animals",
"_id": "3842002",
"_score": 1,
"_source": {
"registrationType1": "A&R"}}
, * {
"_index": "oh_animal",
"_type": "animals",
"_id": "3842033",
"_score": 1,
"_source": {
"registrationType1": "AMHA"}}
, * {
"_index": "oh_animal",
"_type": "animals",
"_id": "3842213",
"_score": 1,
"_source": {
"registrationType1": "AMHA"}}
, * {
"_index": "oh_animal",
"_type": "animals",
"_id": "3842963",
"_score": 1,
"_source": {
"registrationType1": ""}}
, * {
"_index": "oh_animal",
"_type": "animals",
"_id": "3869063",
"_score": 1,
"_source": {
"registrationType1": ""}}**
PFB mappings for the field above
"registrationType1": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword"
}
}
}
You need to use the keyword subfield in order to do this:
{
"size": 20,
"_source": [
"registrationType1"
],
"query": {
"bool": {
"must_not": [
{
"term": {
"registrationType1.keyword": "" <-- change this
}
}
]
}
}
}
If you do not specify any text value on the text fields, there is basically nothing to analyze and return the documents accordingly.
In similar way, if you remove must_not and replace it with must, it would show empty results.
What you can do is, looking at your mapping, query must_not on keyword field. Keyword fields won't be analysed and in that way your query would return the results as you expect.
Query
POST myemptyindex/_search
{
"query": {
"bool": {
"must_not": [
{
"term": {
"registrationType1.keyword": ""
}
}
]
}
}
}
Hope this helps!
I am using elasticsearch version 7.2,
I replicated your data and ingested in my elastic index,and tried querying with and without .keyword.
I am getting the desired result when using the ".keyword" in the field name.It is not returning the docs which have registrationType1="".
Note - The query does not works when not using the ".keyword"
I have added my sample code below, have a look if that helps.
from elasticsearch import Elasticsearch
es = Elasticsearch()
es.indices.create(index="test", ignore=400, body={
"mappings": {
"_doc": {
"properties": {
"registrationType1": {
"type": "text",
"field": {
"keyword": {
"type": "keyword"
}
}
}
}
}
}
})
data = {
"registrationType1": ""
}
es.index(index="test",doc_type="_doc",body=data,id=1)
search = es.search(index="test", body={
"size": 20,
"_source": [
"registrationType1"
],
"query": {
"bool": {
"must_not": [
{
"term": {
"registrationType1.keyword": ""
}
}
]
}
}
})
print(search)
Executing the above should not return any results as we are inserting empty for the field
There was some issue with the mappings itself, I deleted the index and re-indexed it with new mappings and its working now.

Elasticsearch query prefer exact match over partial match on multiple fields

I am doing a free text search on documents with multiple fields. When I perform a search I want the documents that have a perfect match on any of the labels to have a higher scoring. Is there any way I can do this from the query?
For example the documents have two fields called label-a and label-b and when I perform the following multi-match query:
{
"query": {
"bool": {
"must": [
{
"multi_match": {
"query": "apple",
"type": "most_fields",
"fields": [
"label-a",
"label-b"
]
}
}
]
}
}
}
I get the following results (only the relevant part):
"hits": [
{
"_index": "salad",
"_type": "fruit",
"_id": "4",
"_score": 0.581694,
"_source": {
"label-a": "apple pie and pizza",
"label-b": "pineapple with apple juice"
}
},
{
"_index": "salad",
"_type": "fruit",
"_id": "2",
"_score": 0.1519148,
"_source": {
"label-a": "grape",
"label-b": "apple"
}
},
{
"_index": "salad",
"_type": "fruit",
"_id": "1",
"_score": 0.038978107,
"_source": {
"label-a": "apple apple apple apple apple apple apple apple apple apple apple apple",
"label-b": "raspberry"
}
},
{
"_index": "salad",
"_type": "fruit",
"_id": "3",
"_score": 0.02250402,
"_source": {
"label-a": "apple pie and pizza",
"label-b": "raspberry"
}
}
]
I want the second document, the one with the value grape for label-a and value apple for label-b, to have the highest score as I am searching for the value apple and one of the labels has that exact value. This should work regardless of which label the exact term appears.
Because Elasticsearch uses tf/idf model for scoring you are getting these results. Try to specify in your index fields "label-a" and "label-b" additionally as not-analyzed(raw) fields. Then rewrite your query someth like this:
{
"query": {
"bool": {
"should": {
"match": {
"label-a.raw": {
"query": "apple",
"boost": 2
}
}
},
"must": [
{
"multi_match": {
"query": "apple",
"type": "most_fields",
"fields": [
"label-a",
"label-b"
]
}
}
]
}
}
}
The should clause will boost documents with exact match and you will probably get them in the first place. Try to play with the boost number and pls check th equery before running. This is just and idea what you can do

ElasticSearch : How can I boost score depending on field value?

I am trying to get rid of sorting in elasticsearch by boosting the _score based on field value. Here is my scenario:
I have a field in my document: applicationDate. This is time elapsed since EPOC. I want record having greater applicationDate (most recent) to have higer score.
If score of two documents are same, I want to sort them on another field that is of type String. Say "status" is another field that can have value (Available, in progress, closed ). So, documents having same applicationDate should have _score based on status.
Available should have more score , In Progress a less, Closed, least. So by this means, I wont have to sort the documents after getting results.
Please give me some pointers.
You should be able to achieve this using Function Score .
Depending on your requirements it could be as simple as the following
Example:
put test/test/1
{
"applicationDate" : "2015-12-02",
"status" : "available"
}
put test/test/2
{
"applicationDate" : "2015-12-02",
"status" : "progress"
}
put test/test/3
{
"applicationDate" : "2016-03-02",
"status" : "progress"
}
post test/_search
{
"query": {
"function_score": {
"functions": [
{
"field_value_factor" : {
"field" : "applicationDate",
"factor" : 0.001
}
},
{
"filter": {
"term": {
"status": "available"
}
},
"weight": 360
},
{
"filter": {
"term": {
"status": "progress"
}
},
"weight": 180
}
],
"boost_mode": "multiply",
"score_mode": "sum"
}
}
}
**Results:**
"hits": [
{
"_index": "test",
"_type": "test",
"_id": "3",
"_score": 1456877060,
"_source": {
"applicationDate": "2016-03-02",
"status": "progress"
}
},
{
"_index": "test",
"_type": "test",
"_id": "1",
"_score": 1449014780,
"_source": {
"applicationDate": "2015-12-02",
"status": "available"
}
},
{
"_index": "test",
"_type": "test",
"_id": "2",
"_score": 1449014660,
"_source": {
"applicationDate": "2015-12-02",
"status": "progress"
}
}
]
Have you looked at function scores?
https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-function-score-query.html
Specifically look at decay functions in the above documentation.
There is a new field called rank_feature_field that can be useful for this usecase:
https://www.elastic.co/guide/en/elasticsearch/reference/current/rank-feature.html

Is it possible to perform user count / cardinality with logical relationship in ElasticSearch?

I have documents of Users with the following format:
{
userId: "<userId>",
userAttributes: [
"<Attribute1>",
"<Attribute2>",
...
"<AttributeN>"
]
}
I want to be able to get the number of unique users that answer a logic statement, for example How many users have attribute1 AND attribute2 OR attribute3?
I've read about the cardinality function in cardinality-aggregation but it seems to work for a single value, lacking the logic abilities of "AND" and "OR".
Note that I have around 1,000,000,000 documents and I need the results as fast as possible, this why I was looking at the cardinality estimation.
What about this attempt, considering the userAttributes as a simple array of strings (analyzed in my case, but single lowercase terms):
POST /users/user/_bulk
{"index":{"_id":1}}
{"userId":123,"userAttributes":["xxx","yyy","zzz"]}
{"index":{"_id":2}}
{"userId":234,"userAttributes":["xxx","yyy","aaa"]}
{"index":{"_id":3}}
{"userId":345,"userAttributes":["xxx","yyy","bbb"]}
{"index":{"_id":4}}
{"userId":456,"userAttributes":["xxx","ccc","zzz"]}
{"index":{"_id":5}}
{"userId":567,"userAttributes":["xxx","ddd","ooo"]}
GET /users/user/_search
{
"query": {
"query_string": {
"query": "userAttributes:(((xxx AND yyy) NOT zzz) OR ooo)"
}
},
"aggs": {
"unique_ids": {
"cardinality": {
"field": "userId"
}
}
}
}
which gives the following:
"hits": [
{
"_index": "users",
"_type": "user",
"_id": "2",
"_score": 0.16471066,
"_source": {
"userAttributes": [
"xxx",
"yyy",
"aaa"
]
}
},
{
"_index": "users",
"_type": "user",
"_id": "3",
"_score": 0.04318809,
"_source": {
"userAttributes": [
"xxx",
"yyy",
"bbb"
]
}
},
{
"_index": "users",
"_type": "user",
"_id": "5",
"_score": 0.021594046,
"_source": {
"userAttributes": [
"xxx",
"ddd",
"ooo"
]
}
}
]

Resources