No matches when querying Elasticsearch

I'm trying to run a query against Elasticsearch. When I run this query
GET accounts/_search/
{
"query": {
"term": {
"address_line_1": "1000"
}
}
}
I get back multiple records like
"hits" : [
{
"_index" : "accounts",
"_type" : "_doc",
"_id" : "...",
"_score" : 8.355149,
"_source" : {
"state_id" : 35,
"first_name" : "...",
"last_name" : "...",
"middle_name" : "P",
"dob" : "...",
"status" : "ACTIVE",
"address_line_1" : "1000 BROADROCK CT",
"address_line_2" : "",
"address_city" : "PARMA",
"address_zip" : "",
"address_zip_plus_4" : ""
}
},
But when I extend the search value as shown below, I don't get any matches
GET accounts/_search/
{
"query": {
"term": {
"address_line_1": "1000 B"
}
}
}
The response is
{
"took" : 0,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 0,
"relation" : "eq"
},
"max_score" : null,
"hits" : [ ]
}
}

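The term query looks for exact matches against the indexed terms and does not analyze the search value. Your address_line_* fields were most probably indexed with the standard analyzer, which splits the text into tokens and lowercases all the letters, so "1000 BROADROCK CT" is stored as the terms 1000, broadrock and ct. The value 1000 equals one of those terms, but 1000 B does not equal any of them, which prevents the query from matching.
So either use a match query
GET accounts/_search/
{
"query": {
"match": {
"address_line_1": "1000 B"
}
}
}
which analyzes the search text in the same way and so does not really 'care' about B being lower/upper case, or adjust your field analyzers such that the capitalization is preserved.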
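To make the difference concrete, here is a minimal Python sketch of the two behaviours. It is only a toy model, not Elasticsearch code: analyze approximates the standard analyzer by lowercasing and splitting on non-alphanumeric characters, and term_query and match_query are hypothetical helper names introduced for illustration.

```python
import re

def analyze(text):
    """Toy stand-in for the standard analyzer: lowercase, split on non-alphanumerics."""
    return [tok for tok in re.split(r"[^0-9a-z]+", text.lower()) if tok]

def term_query(indexed_tokens, value):
    """term: the search value is NOT analyzed and must equal one indexed token."""
    return value in indexed_tokens

def match_query(indexed_tokens, text):
    """match: the search text is analyzed first; any shared token counts as a hit."""
    return any(tok in indexed_tokens for tok in analyze(text))

# Tokens this model stores for the address_line_1 value from the question.
indexed = analyze("1000 BROADROCK CT")

print(indexed)
print(term_query(indexed, "1000"))     # value equals a stored token
print(term_query(indexed, "1000 B"))   # no stored token equals the whole string
print(match_query(indexed, "1000 B"))  # analyzed into tokens before comparing
```

In this model the match query hits because of the token 1000; b on its own is not one of the stored tokens.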

Related

Elastic Search Query for Relevancy Given a Phrase Rather Than Just One Word

Elastic Search querying/boosting is not working as I would expect it to...
I have an index where documents look like this:
{
"entity_id" : "x",
"entity_name" : "y",
"description": "search engine",
"keywords" : [
"Google"
]
}
I'm trying to get the document returned with a relevancy score when querying with a search phrase that contains one of the keywords,
like this:
{
"query": {
"bool": {
"should": [
{
"query_string": {
"query": "What are some of products for Google?",
"boost": 10,
"fields": ["keywords"]
}
}
],
"filter": {
"term" : { "entity_name" : "y" }
}
}
}
}
The problem is that my results are not as expected for three reasons:
The result contains hits that have no relevance to "Google", "Products", or any other word in the search phrase.
The document that I expect to be returned has a _score of 0.0.
The document that I expect to be returned has a mysterious "_ignored" : [ "description.keyword" ].
The response looks like this:
{
"took" : 3,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 2,
"relation" : "eq"
},
"max_score" : 0.0,
"hits" : [
{
"_score" : 0.0,
"_source": {
"entity_id" : "a",
"entity_name" : "y",
"description": "some other entity",
"keywords": ["Other"]
}
},
{
"_score" : 0.0,
"_ignored" : [
"description.keyword"
],
"_source": {
"entity_id" : "x",
"entity_name" : "y",
"description": "search engine",
"keywords": ["Google"]
}
}
]
}
}
What am I doing wrong?
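TL;DR
You are using the wrong query type: query_string is not suited to free-form text like this, so use match instead.
To understand
First and foremost:
_ignored is a metadata field that tracks every field that was ignored at index time, for example because its value was malformed or, for a keyword field, longer than ignore_above. Such a field is not indexed for that document and therefore cannot be searched. [doc]
Why is my score 0:
It is because of the query_string query, which parses the text with its own operator syntax instead of treating it as plain words. For instance, ? is a single-character wildcard in that syntax, so Google? does not match the indexed term google. The documents are still returned because they satisfy the filter clause, which does not contribute to the score. [doc]
Returns documents based on a provided query string, using a parser with a strict syntax.
e.g.:
"query": "(new york city) OR (big apple)"
The query_string query splits (new york city) OR (big apple) into two parts: new york city and big apple.
To illustrate my point, look at the example below: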
POST /so_relevance_score/_doc
{
"entity_id" : "x",
"entity_name" : "y",
"description": "search engine",
"keywords" : [
"Google"
]
}
POST /so_relevance_score/_doc
{
"entity_id" : "x",
"entity_name" : "y",
"description": "consumer electronic",
"keywords" : [
"Apple"
]
}
GET /so_relevance_score/_search
{
"query": {
"bool": {
"should": [
{
"query_string": {
"query": "What are some of products for Google?",
"boost": 10,
"fields": ["keywords"]
}
}
],
"filter": {
"term" : { "entity_name" : "y" }
}
}
}
}
will return the following results:
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 2,
"relation" : "eq"
},
"max_score" : 0.0,
"hits" : [
{
"_index" : "so_relevance_score",
"_type" : "_doc",
"_id" : "0uYgP34Bpf2xEaYqLYai",
"_score" : 0.0,
"_source" : {
"entity_id" : "x",
"entity_name" : "y",
"description" : "search engine",
"keywords" : [
"Google"
]
}
},
{
"_index" : "so_relevance_score",
"_type" : "_doc",
"_id" : "1eYmP34Bpf2xEaYquoZC",
"_score" : 0.0,
"_source" : {
"entity_id" : "x",
"entity_name" : "y",
"description" : "consumer electronic",
"keywords" : [
"Apple"
]
}
}
]
}
}
The score is 0 for both documents, which means that Elasticsearch considers both documents equally relevant to this query.
But if you change the query type to match:
GET /so_relevance_score/_search
{
"query": {
"bool": {
"should": [
{
"match": {
"keywords": "What are some of products for Google?"
}
}
],
"filter": {
"term" : { "entity_name" : "y" }
}
}
}
}
I get:
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 2,
"relation" : "eq"
},
"max_score" : 0.6931471,
"hits" : [
{
"_index" : "so_relevance_score",
"_type" : "_doc",
"_id" : "0uYgP34Bpf2xEaYqLYai",
"_score" : 0.6931471,
"_source" : {
"entity_id" : "x",
"entity_name" : "y",
"description" : "search engine",
"keywords" : [
"Google"
]
}
},
{
"_index" : "so_relevance_score",
"_type" : "_doc",
"_id" : "1eYmP34Bpf2xEaYquoZC",
"_score" : 0.0,
"_source" : {
"entity_id" : "x",
"entity_name" : "y",
"description" : "consumer electronic",
"keywords" : [
"Apple"
]
}
}
]
}
}
Now the expected document has a relevance score!
If you want to fine-tune your results, I suggest diving into the documentation for query types. [doc]
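The difference between the two query types can be sketched in a few lines of Python. This is a deliberately simplified model under stated assumptions, not the real parser: analyze approximates the standard analyzer, query_string_hits only models whitespace splitting plus the ? and * wildcards of the query_string syntax, and both function names are hypothetical.

```python
import re
from fnmatch import fnmatchcase

def analyze(text):
    """Toy stand-in for the standard analyzer: lowercase, split on non-alphanumerics."""
    return [tok for tok in re.split(r"[^0-9a-z]+", text.lower()) if tok]

def query_string_hits(indexed_tokens, query):
    """Toy query_string: a word containing ? or * is a wildcard pattern, not plain text."""
    for raw in query.lower().split():
        if "?" in raw or "*" in raw:
            # ? stands for exactly one character, * for any run of characters.
            if any(fnmatchcase(tok, raw) for tok in indexed_tokens):
                return True
        elif any(tok in indexed_tokens for tok in analyze(raw)):
            return True
    return False

def match_hits(indexed_tokens, text):
    """Toy match: punctuation is dropped by analysis before tokens are compared."""
    return any(tok in indexed_tokens for tok in analyze(text))

keywords = analyze("Google")  # tokens stored for the keywords field
phrase = "What are some of products for Google?"

print(query_string_hits(keywords, phrase))  # "google?" needs one more character
print(match_hits(keywords, phrase))         # "google" survives analysis
```

In this model, dropping the trailing question mark from the phrase is enough for the query_string variant to hit as well.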

Query consecutive words using match_phrase elasticsearch works unexpected

I have the field name mapped as text:
{
"mappings": {
"_doc": {
"properties": {
"name": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
}
}
}
}
}
Because of the nature of the text type in Elasticsearch, a match query matches on every word of the phrase. That's why in some cases I get results like the following:
POST /example-tags/_search
{
"query": {
"match": {
"name": "Jordan Rudess was born in 1956"
}
}
}
// Results
{
"took" : 28,
"timed_out" : false,
"_shards" : {
"total" : 2,
"successful" : 2,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 2.1596613,
"hits" : [
{
"_index" : "example-tags",
"_type" : "_doc",
"_id" : "6101e538bc8ec610aff699e4",
"_score" : 4.1596613,
"_source" : {
"name" : "Jordan Rudess"
}
},
{
"_index" : "example-tags",
"_type" : "_doc",
"_id" : "610123538bc8ec61034ff699e4",
"_score" : 4.1796613,
"_source" : {
"name" : "Alice in Chains"
}
}
]
}
}
As you can see, for the text Jordan Rudess was born in 1956 I get the result Alice in Chains just because of the word in. I want to avoid this behaviour.
If I try:
POST /example-tags/_search
{
"query": {
"match_phrase": {
"name": "Dream Theater keyboardist's Jordan Rudess was born in 1956"
}
}
}
// Results
{
"took" : 2,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 0,
"relation" : "eq"
},
"max_score" : null,
"hits" : [ ]
}
}
So, in the previous example I was expecting to get the Jordan Rudess tag name, but I get empty results.
I need to get the maximum occurrences in tag.name of consecutive words in a phrase. How can I achieve that?
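For reference, the two behaviours seen above can be modelled with a short Python sketch. It is a toy approximation, not Elasticsearch code: analyze stands in for the standard analyzer (the real one tokenizes apostrophes differently), and match_hits and match_phrase_hits are hypothetical names. The point it illustrates is the direction of the check: match_phrase looks for the whole query phrase inside the field value, not for the field value inside the query.

```python
import re

def analyze(text):
    """Toy stand-in for the standard analyzer: lowercase, split on non-alphanumerics."""
    return [tok for tok in re.split(r"[^0-9a-z]+", text.lower()) if tok]

def match_hits(field_text, query_text):
    """match: one shared token between field and query is enough."""
    field = set(analyze(field_text))
    return any(tok in field for tok in analyze(query_text))

def match_phrase_hits(field_text, query_text):
    """match_phrase: ALL query tokens must occur in the field, adjacent and in order."""
    field, query = analyze(field_text), analyze(query_text)
    n = len(query)
    return n > 0 and any(field[i:i + n] == query for i in range(len(field) - n + 1))

# A single common word is enough for match.
print(match_hits("Alice in Chains", "Jordan Rudess was born in 1956"))

# The field holds only two tokens, so it cannot contain the whole query phrase.
long_query = "Dream Theater keyboardist's Jordan Rudess was born in 1956"
print(match_phrase_hits("Jordan Rudess", long_query))

# The phrase check succeeds only when the query itself fits inside the field.
print(match_phrase_hits("Jordan Rudess", "Jordan Rudess"))
```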

elasticsearch does not return expected returns

I'm completely new to Elasticsearch. I tried the search API, but it's not returning what I expected.
What I did:
POST /test/_doc/1
{
"name": "Hello World"
}
GET /test/_doc/1
Response:
{
"_index" : "test",
"_type" : "_doc",
"_id" : "1",
"_version" : 5,
"_seq_no" : 28,
"_primary_term" : 1,
"found" : true,
"_source" : {
"name" : "Hello World"
}
}
GET /test/_mapping
Response:
{
"test" : {
"mappings" : {
"properties" : {
"name" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"query" : {
"properties" : {
"term" : {
"properties" : {
"name" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
}
}
}
}
}
}
}
}
}
GET /test/_search
{
"query": {
"term": {
"name": "Hello"
}
}
}
Response:
{
"took" : 4,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 0,
"relation" : "eq"
},
"max_score" : null,
"hits" : [ ]
}
}
GET /test/_search
{
"query": {
"term": {
"name": "Hello World"
}
}
}
Response:
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 0,
"relation" : "eq"
},
"max_score" : null,
"hits" : [ ]
}
}
My Elasticsearch version is 7.3.2.
The last two searches should return document 1, is that correct? Why do they hit nothing?
The problem is that you are using term queries. Term queries are not analysed, hence Hello doesn't match the term hello in your index. Note the case difference. Likewise, Hello World doesn't match because the text field stores the separate terms hello and world, not the whole string.
Unlike full-text queries, term-level queries do not analyze search terms. Instead, term-level queries match the exact terms stored in a field.
Reference
Match queries, on the other hand, analyse the search term as well:
{
"query": {
"match": {
"name": "Hello"
}
}
}
You can use the _analyze API to check how your terms are indexed.
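As a rough illustration of what gets stored under the mapping shown in the question, here is a small Python sketch. It is a toy model with assumptions: analyze only approximates the standard analyzer, and index_document and term_query are hypothetical helpers. It models name as analyzed tokens and name.keyword as the raw string, skipped when longer than ignore_above.

```python
import re

IGNORE_ABOVE = 256  # taken from the name.keyword mapping in the question

def analyze(text):
    """Toy stand-in for the standard analyzer: lowercase, split on non-alphanumerics."""
    return [tok for tok in re.split(r"[^0-9a-z]+", text.lower()) if tok]

def index_document(name):
    """`name` is analyzed into tokens; `name.keyword` keeps the string verbatim."""
    raw = [name] if len(name) <= IGNORE_ABOVE else []  # too-long values are not indexed
    return {"name": analyze(name), "name.keyword": raw}

def term_query(doc, field, value):
    """term: compare the unanalyzed search value with the stored terms of one field."""
    return value in doc[field]

doc = index_document("Hello World")

print(doc["name"])
print(term_query(doc, "name", "Hello"))                # stored token is lowercase
print(term_query(doc, "name", "hello"))
print(term_query(doc, "name", "Hello World"))          # no single token is the full string
print(term_query(doc, "name.keyword", "Hello World"))  # the subfield keeps it verbatim
```

Under this model, a term query only makes sense against name.keyword with the exact original string, or against name with an already lowercased single word.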

how to Compare two fields in ElasticSearch

I need to compare two fields in Elasticsearch. I tried the query below using Kibana, but it gave a runtime exception. Kindly help me compare these fields.
GET /eps/_search
{
"query": {
"bool" : {
"filter" : {
"script" : {
"script" : {
"source": "doc['output_record_count'].value < doc['input_record_count'].value",
"lang" : "painless"
}
}
}
}
}
}
Note:
For a match_all query I get the successful response below.
Query:
GET /eps/_search
{
"query": {
"match_all": {}
}
}
The response is:
{
"took" : 0,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 5,
"relation" : "eq"
},
"max_score" : 1.0,
"hits" : [
{
"_index" : "eps",
"_type" : "_doc",
"_id" : "9bNkeXEBLNJ-eURYKdv1",
"_score" : 1.0,
"_source" : {
"experience" : "EPS",
"#version" : "1",
"sdcids" : "013bb234-0840-11ea-8e7d-515f88cf3efa",
"output_record_count" : 13,
"input_record_count" : 10,
"#timestamp" : "2020-04-14T15:52:19.582Z",
"SDC_Ids" : "013bb234-0840-11ea-8e7d-515f88cf3efa"
}
}
]
}
}

How to compare 2 field in elasticsearch

OK, I have this example result from my data in Elasticsearch:
"hits" : [
{
"_index" : "solutionpedia_data",
"_type" : "doc",
"_id" : "nyODP24BA840z5O6WguE",
"_score" : 46.63439,
"_source" : {
"ID" : "1",
"PRODUCT_NAME" : "ATM",
"UPDATEDATE" : "13-FEB-18",
"PROPOSAL" : [
{
}
],
"MARKETING_KIT" : [ ],
"VIDEO" : [ ]
}
},
{
"_index" : "classification",
"_type" : "doc",
"_id" : "5M-r5m4BNYha4zuWalJa",
"_score" : 39.25268,
"_source" : {
"productId" : "1",
"productName" : "ATM",
"productIconUrl" : "media/8ae0f0c3-1402-4559-901e-7ec9b874ce68-prod032.webp",
"type" : "nonconnectivity",
"businessLineId" : "",
"subsidiaries" : "",
"segment" : [],
"productType" : "Efisien",
"tariff" : null,
"tags" : [ ],
"contact" : [],
"mediaId" : [
"Med391"
],
"documentId" : [
"doc260",
"doc261"
],
"createdAt" : "2019-09-22T05:22:46.956Z",
"updatedAt" : "2019-09-22T05:22:46.956Z",
"totalClick" : 46
}
}
]
This is a result from my alias. Can we search for the same data based on two different fields? In the example above these are the ID and productId fields. Can we put these two objects into one bucket, or compare them?
I tried an aggregation, but got nothing:
{
"query": {
"match_all": {}
},
"size": 0,
"aggregations": {
"product catalog": {
"terms": {
"field": "productId.keyword",
"min_doc_count": 2,
"size": 100
},
"aggregations": {
"product solped": {
"terms": {
"field": "ID.keyword",
"min_doc_count": 2
}
}
}
}
}
}
Result:
{
"took" : 9,
"timed_out" : false,
"_shards" : {
"total" : 10,
"successful" : 10,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : 1276,
"max_score" : 0.0,
"hits" : [ ]
},
"aggregations" : {
"product catalog" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [ ]
}
}
}
You can achieve this with a Scripted Bucket Aggregation, using script logic to define your buckets (pseudocode: if field a exists, use the value of field a; if field b exists, use the value of field b).
Another (and better) way to achieve this is to change your data model and indexing logic on the Elasticsearch side and store the information in a field of the same name.
You could also consider the alias data type to make fields with different names in different indices accessible under one common field name. This is also the approach Elastic takes with the Elastic Common Schema specification.
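The bucket-key pseudocode from the first suggestion can be sketched in plain Python. This only models the logic on documents that have already been fetched; it is not an Elasticsearch script, and bucket_key and bucket_documents are hypothetical names. The two sample documents are trimmed versions of the hits in the question.

```python
from collections import defaultdict

def bucket_key(source):
    """Use ID when the document has it, otherwise fall back to productId."""
    if "ID" in source:
        return source["ID"]
    if "productId" in source:
        return source["productId"]
    return None  # document carries neither field

def bucket_documents(sources):
    """Group documents under the common key, whichever field it came from."""
    buckets = defaultdict(list)
    for source in sources:
        key = bucket_key(source)
        if key is not None:
            buckets[key].append(source)
    return dict(buckets)

docs = [
    {"ID": "1", "PRODUCT_NAME": "ATM"},        # from solutionpedia_data
    {"productId": "1", "productName": "ATM"},  # from classification
]

buckets = bucket_documents(docs)
print({key: len(members) for key, members in buckets.items()})
```

Storing the value under one common field name, as suggested above, would make this fallback logic unnecessary.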
