Searching in an array with Elasticsearch

I want to search in Elasticsearch, but I am getting a hit even though the condition does not match.
For example, given this document:
{
"tweet": [
{
"firstname": "Lav",
"lastname": "byebye"
},
{
"firstname": "pointto",
"lastname": "ihadcre"
},
{
"firstname": "letssearch",
"lastname": "sarabhai"
}
]
}
Now consider the following two conditions:
1)
must: firstname: Lav
must: lastname: byebye
required: there should be a hit
getting: hit
2)
must: firstname: Lav
must: lastname: ihadcre
required: there should not be a hit
getting: hit
The problem is the second condition: I should not be getting a hit there.
Thanks for the help.

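By default, Elasticsearch flattens an array of objects into multi-valued fields (tweet.firstname, tweet.lastname), so the association between the firstname and lastname of the same object is lost. That is why the second condition matches. To achieve the behavior that you are describing, tweets should be indexed as nested objects and searched using a nested query or filter. For example: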
curl -XDELETE localhost:9200/test-idx
curl -XPUT localhost:9200/test-idx -d '{
"settings": {
"index.number_of_shards": 1,
"index.number_of_replicas": 0
},
"mappings": {
"doc": {
"properties": {
"tweet": {"type": "nested"}
}
}
}
}'
curl -XPUT "localhost:9200/test-idx/doc/1" -d '{
"tweet": [{
"firstname": "Lav",
"lastname": "byebye"
}, {
"firstname": "pointto",
"lastname": "ihadcre"
}, {
"firstname": "letssearch",
"lastname": "sarabhai"
}]
}
'
echo
curl -XPOST "localhost:9200/test-idx/_refresh"
echo
curl "localhost:9200/test-idx/doc/_search?pretty=true" -d '{
"query": {
"nested" : {
"path" : "tweet",
"score_mode" : "avg",
"query" : {
"bool" : {
"must" : [
{
"match" : {"tweet.firstname" : "Lav"}
},
{
"match" : {"tweet.lastname" : "byebye"}
}
]
}
}
}
}
}'
echo
curl "localhost:9200/test-idx/doc/_search?pretty=true" -d '{
"query": {
"nested" : {
"path" : "tweet",
"score_mode" : "avg",
"query" : {
"bool" : {
"must" : [
{
"match" : {"tweet.firstname" : "Lav"}
},
{
"match" : {"tweet.lastname" : "ihadcre"}
}
]
}
}
}
}
}'
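The difference between the two mappings can be sketched in plain Python. This is only a toy model, not Elasticsearch code: the function names flattened_match and nested_match are made up for illustration, and text analysis (lowercasing, tokenizing) is ignored.

```python
# Toy model of how an array of objects is matched; not Elasticsearch code.
doc = {
    "tweet": [
        {"firstname": "Lav", "lastname": "byebye"},
        {"firstname": "pointto", "lastname": "ihadcre"},
        {"firstname": "letssearch", "lastname": "sarabhai"},
    ]
}


def flattened_match(doc, firstname, lastname):
    """Default object mapping: each field is one multi-valued bag of values."""
    firstnames = {t["firstname"] for t in doc["tweet"]}
    lastnames = {t["lastname"] for t in doc["tweet"]}
    return firstname in firstnames and lastname in lastnames


def nested_match(doc, firstname, lastname):
    """Nested mapping: both conditions must hold within the same object."""
    return any(
        t["firstname"] == firstname and t["lastname"] == lastname
        for t in doc["tweet"]
    )


print(flattened_match(doc, "Lav", "ihadcre"))  # True: the unwanted hit
print(nested_match(doc, "Lav", "ihadcre"))     # False: no single object has both
print(nested_match(doc, "Lav", "byebye"))      # True
```

With the nested mapping, the second search above should behave like nested_match and return no hit.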

Related

Range in elasticsearch not really working

I run this query:
curl -X GET "localhost:9200/mydocs/_search?pretty" -H 'Content-Type: application/json' -d'
{
"query": {
"bool" : { "must" : [{"wildcard": {"guid": "14744*"}}, {"range": {"availability.start": {"lt": "now"}}}] }
}
}
'
I then get this response:
"hits" : [
{
"_index" : "mydocs",
"_type" : "_doc",
"_id" : "14744",
"_score" : 2.0,
"_source" : {
"guid" : "14744",
"availability" : {
"start" : "2021-03-28T22:00:00.000Z",
"end" : "2021-12-31T22:59:00.000Z"
},
"title" : "Some title"
}
}
]
What I actually want are results where today falls between the availability's start and end.
The above result says the document is available between
2021-03-28T22:00:00.000Z
and
2021-12-31T22:59:00.000Z
Today is 2021-04-15:15:00.000Z
So, what I should do is add:
{"range": {"availability.end": {"gt": "now"}}}
Isn't that correct? But when I run:
curl -X GET "localhost:9200/mydocs/_search?pretty" -H 'Content-Type: application/json' -d'
{
"query": {
"bool" : { "must" : [{"wildcard": {"guid": "14744*"}}, {"range": {"availability.start": {"lt": "now"}}}, {"range": {"availability.end": {"gt": "now"}}}] }
}
}
'
I got an empty hits list.
Partial mapping:
{
mappings: {
_doc: {
properties: {
availability: {
properties: {
end: {
type: "keyword"
},
start: {
type: "keyword"
}
}
},
guid: {
type: "keyword"
}
}
}
}
}
Your query is perfectly correct! Good job with that!
The problem is that the availability.* fields are defined as keyword.
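They MUST be of type date for range queries on date values to deliver accurate results. Otherwise, the range queries just perform a lexical (i.e. string) comparison of the literal string now against the date values expressed as strings. Since digits sort before letters, every date string is "lt" now and none is "gt" now, which is why your first query matched and your second one returned nothing: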
availability: {
properties: {
end: {
type: "date" <--- change this
},
start: {
type: "date" <--- and this
}
}
},
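To see why the keyword mapping misbehaves, here is a small Python sketch of the two comparisons. It only imitates the comparison semantics, it does not call Elasticsearch. The helper names parse and is_available are hypothetical, and the value of now is an assumed moment on 2021-04-15, the day mentioned in the question.

```python
from datetime import datetime, timezone

start = "2021-03-28T22:00:00.000Z"
end = "2021-12-31T22:59:00.000Z"

# keyword fields: the bound "now" is compared as a plain string.
print(start < "now")  # True, because "2" sorts before "n"
print(end > "now")    # False, for the same reason


def parse(ts):
    """Parse an ISO-8601 UTC timestamp such as 2021-03-28T22:00:00.000Z."""
    return datetime.strptime(ts.replace("Z", "+0000"), "%Y-%m-%dT%H:%M:%S.%f%z")


def is_available(start, end, now):
    """date fields: values are compared as points in time."""
    return parse(start) < now < parse(end)


# Assumed "now" for illustration only.
now = datetime(2021, 4, 15, 15, 0, tzinfo=timezone.utc)
print(is_available(start, end, now))  # True
```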
You can't change the mapping of existing fields, but you can always create new fields. So, you can change your mapping to create new date sub-fields for both start and end, like this:
PUT mydocs/_mapping
{
"properties": {
"availability": {
"properties": {
"end": {
"type": "keyword",
"fields": {
"date": {
"type": "date"
}
}
},
"start": {
"type": "keyword",
"fields": {
"date": {
"type": "date"
}
}
}
}
}
}
}
Then you simply need to run the following command in order to update your index:
POST mydocs/_update_by_query
And then modify your query to use the new sub-fields and that will work:
POST mydocs/_search
{
"query": {
"bool": {
"must": [
{
"wildcard": {
"guid": "14744*"
}
},
{
"range": {
"availability.start.date": {
"lt": "now"
}
}
},
{
"range": {
"availability.end.date": {
"gt": "now"
}
}
}
]
}
}
}

How to find out what is my index sorted by in elasticsearch?

I created a new index in Elasticsearch (v6) using this command:
curl -XPUT -H 'Content-Type: application/json' http://localhost:9200/sorttest -d '
{
"settings" : {
"index" : {
"sort.field" : ["username", "date"],
"sort.order" : ["asc", "desc"]
}
},
"mappings": {
"_doc": {
"properties": {
"username": {
"type": "keyword",
"doc_values": true
},
"date": {
"type": "date"
}
}
}
}
}
'
The response was
{"acknowledged":true,"shards_acknowledged":true,"index":"sorttest"}
Next I checked the generated mapping:
curl -XGET localhost:9200/sorttest/_mapping?pretty
And the result was
{
"sorttest" : {
"mappings" : {
"_doc" : {
"properties" : {
"date" : {
"type" : "date"
},
"username" : {
"type" : "keyword"
}
}
}
}
}
}
The question is: how can I find out what kind of sorting is set for my index?
Just request the index itself with
curl -XGET localhost:9200/sorttest?pretty
and you will see:
"settings" : {
"index" : {
...
"sort" : {
"field" : [
"username",
"date"
],
"order" : [
"asc",
"desc"
]
},
...
}
}
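If you want to read the sort programmatically, a minimal Python sketch could look like this. It assumes you have already parsed the JSON response of the request above into a dict. The function name index_sort is made up, and the fallback to "asc" when no order is listed is an assumption based on the documented default.

```python
def index_sort(response, index):
    """Return [(field, order), ...] from a parsed GET /<index> response."""
    sort = response[index]["settings"]["index"].get("sort")
    if not sort:
        return []  # the index has no index sorting configured
    fields = sort["field"]
    orders = sort.get("order", [])
    # A single value may come back as a plain string instead of a list.
    if isinstance(fields, str):
        fields = [fields]
    if isinstance(orders, str):
        orders = [orders]
    # Assumption: a missing order means ascending.
    orders = list(orders) + ["asc"] * (len(fields) - len(orders))
    return list(zip(fields, orders))


# Trimmed copy of the response shown above.
response = {
    "sorttest": {
        "settings": {
            "index": {
                "sort": {"field": ["username", "date"], "order": ["asc", "desc"]}
            }
        }
    }
}
print(index_sort(response, "sorttest"))  # [('username', 'asc'), ('date', 'desc')]
```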

Query issue with parent-children relation in Elasticsearch

Given the following parent-child mapping:
curl -XPUT 'localhost:9200/my_index' -d '{
"mappings": {
"my_parent": {
"dynamic": "strict",
"properties" : {
"title" : { "type": "string" },
"body" : { "type": "string" },
"source_id" : { "type": "integer" }
}
},
"my_child": {
"_parent": {"type": "my_parent" },
"properties" : {
"user_id" : { "type": "string" }
}}}}'
... these two parents with ids 10 and 11:
curl -X PUT 'localhost:9200/my_index/my_parent/10' -d '{
"title" : "Microsiervos - Discos duros de 10TB",
"body" : "Empiezan a sacar DD de 30GB en el mercado",
"source_id" : "27"
}'
curl -X PUT 'localhost:9200/my_index/my_parent/11' -d '{
"title" : "Microsiervos - En el 69 llegamos a la luna",
"body" : "Se cumplen 3123 anos de la llegada a la luna",
"source_id" : "27"
}'
... and these two children:
curl -XPUT 'localhost:9200/my_index/my_child/1234_10?parent=10' -d '{
"user_id": "1234"
}'
curl -XPUT 'localhost:9200/my_index/my_child/1234_11?parent=11' -d '{
"user_id": "1234"
}'
With the following query, I want to get the _id of every parent that has a child with user_id = 1234.
curl -XGET 'localhost:9200/my_index/my_parent/_search?pretty=true' -d '{
"_source" : "_id",
"query": {
"has_child": {
"type": "my_child",
"query" : {
"query_string" : {
"default_field" : "user_id",
"query" : "1234"
}}}}}'
This outputs the two ids, 10 and 11.
Now I want to search on parent on those specific ids only, something like this:
curl -XGET 'localhost:9200/my_index/my_parent/_search?pretty=true' -d '{
"query": {
"bool": {
"must": [
{
"terms": {
"_id": ["10", "11"]
}},
{
"query_string": {
"default_field": "body",
"query": "mercado"
}}]}}}'
As you can see, the "_id": ["10", "11"] part is written by hand. I would like to know if there is a way to combine these two queries into a single query, so that the ids returned by the first query are used automatically in the second.
So the output to this should be:
},
"hits" : {
"total" : 1,
"max_score" : 0.69177496,
"hits" : [ {
"_index" : "my_index",
"_type" : "my_parent",
"_id" : "10",
"_score" : 0.69177496,
"_source":{
"title" : "Microsiervos - Discos duros de 10TB",
"body" : "Empiezan a sacar DD de 30GB en el mercado",
"source_id" : "27"
}}]}}
Use a bool query and put both conditions in must:
curl -XGET "http://localhost:9200/my_index/my_parent/_search" -d'
{
"query": {
"bool": {
"must": [
{
"query_string": {
"default_field": "body",
"query": "mercado"
}
},
{
"has_child": {
"type": "my_child",
"query": {
"query_string": {
"default_field": "user_id",
"query": "1234"
}
}
}
}
]
}
}
}'
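What the combined query asks for can be modeled in a few lines of Python. This is only a sketch of the matching logic over the sample data from the question, not Elasticsearch code. The helper names are hypothetical, and the word split is a crude stand-in for real text analysis.

```python
# Sample data from the question, reduced to the fields the query uses.
parents = {
    "10": {"body": "Empiezan a sacar DD de 30GB en el mercado"},
    "11": {"body": "Se cumplen 3123 anos de la llegada a la luna"},
}
children = [
    {"_id": "1234_10", "parent": "10", "user_id": "1234"},
    {"_id": "1234_11", "parent": "11", "user_id": "1234"},
]


def body_contains(parent, term):
    """Stand-in for the query_string clause on body."""
    return term.lower() in parent["body"].lower().split()


def has_child(parent_id, user_id):
    """Stand-in for the has_child clause on user_id."""
    return any(
        c["parent"] == parent_id and c["user_id"] == user_id for c in children
    )


def search(term, user_id):
    """bool/must: a parent is returned only if both clauses match."""
    return [
        pid
        for pid, parent in parents.items()
        if body_contains(parent, term) and has_child(pid, user_id)
    ]


print(search("mercado", "1234"))  # ['10']
```

Because both clauses are evaluated per parent, there is no need to fetch the ids first and paste them into a second request.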

How to see which of the queries in boolean is matched?

I have given multiple queries using the bool query. Now it can happen that some of them might have matches and some queries might not have matches in the database. How can I know which of the queries had a match?
For example, here I have a bool query with two should conditions against the field landMark.
{
"query": {
"bool": {
"should": [
{
"match": {
"landMark": "wendys"
}
},
{
"match": {
"landMark": "starbucks"
}
}
]
}
}
}
How can I know which one of them matched in the above query if only one of them matches the documents?
You can use named queries for this purpose. Try this:
{
"query": {
"bool": {
"should": [
{
"match": {
"landMark": {
"query": "wendys",
"_name": "wendy match"
}
}
},
{
"match": {
"landMark": {
"query": "starbucks",
"_name": "starbucks match"
}
}
}
]
}
}
}
You can use any value for _name. In the response, each hit will contain something like this:
"matched_queries": ["wendy match"]
so you will be able to tell which query matched that specific document.
Named queries are certainly the way to go.
LINK - https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-named-queries-and-filters.html
The idea of a named query is simple: you tag each of your queries with a name, and the result shows which tags matched for each document. For example, index these documents:
curl -XPOST 'http://localhost:9200/data/data' -d ' { "landMark" : "wendys near starbucks" }'
curl -XPOST 'http://localhost:9200/data/data' -d ' { "landMark" : "wendys" }'
curl -XPOST 'http://localhost:9200/data/data' -d ' { "landMark" : "starbucks" }'
Then write your query like this:
curl -XPOST 'http://localhost:9200/data/_search?pretty' -d '{
"query": {
"bool": {
"should": [
{
"match": {
"landMark": {
"query": "wendys",
"_name": "wendy_is_a_match"
}
}
},
{
"match": {
"landMark": {
"query": "starbucks",
"_name": "starbuck_is_a_match"
}
}
}
]
}
}
}'
This returns:
{
"took" : 7,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
},
"hits" : {
"total" : 3,
"max_score" : 0.581694,
"hits" : [ {
"_index" : "data",
"_type" : "data",
"_id" : "AVMCNNCY3OZJfBZCJ_tO",
"_score" : 0.581694,
"_source": { "landMark" : "wendys near starbucks" },
"matched_queries" : [ "starbuck_is_a_match", "wendy_is_a_match" ] <--- matched tags
}, {
"_index" : "data",
"_type" : "data",
"_id" : "AVMCNS0z3OZJfBZCJ_tQ",
"_score" : 0.1519148,
"_source": { "landMark" : "starbucks" },
"matched_queries" : [ "starbuck_is_a_match" ]
}, {
"_index" : "data",
"_type" : "data",
"_id" : "AVMCNRsF3OZJfBZCJ_tP",
"_score" : 0.04500804,
"_source": { "landMark" : "wendys" },
"matched_queries" : [ "wendy_is_a_match" ]
} ]
}
}
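On the client side, the matched_queries arrays can be turned into a per-query overview. A minimal Python sketch, assuming the response has been parsed into a dict (the function name hits_by_named_query is made up, and the sample below is trimmed to the fields it needs):

```python
from collections import defaultdict


def hits_by_named_query(response):
    """Map each query name to the ids of the hits it matched."""
    grouped = defaultdict(list)
    for hit in response["hits"]["hits"]:
        # Hits matched by no named query have no matched_queries key.
        for name in hit.get("matched_queries", []):
            grouped[name].append(hit["_id"])
    return dict(grouped)


# Trimmed copy of the response shown above.
response = {
    "hits": {
        "hits": [
            {
                "_id": "AVMCNNCY3OZJfBZCJ_tO",
                "matched_queries": ["starbuck_is_a_match", "wendy_is_a_match"],
            },
            {"_id": "AVMCNS0z3OZJfBZCJ_tQ", "matched_queries": ["starbuck_is_a_match"]},
            {"_id": "AVMCNRsF3OZJfBZCJ_tP", "matched_queries": ["wendy_is_a_match"]},
        ]
    }
}

for name, ids in hits_by_named_query(response).items():
    print(name, ids)
```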

Multiple properties in facet (elasticsearch)

I have the following index:
curl -XPUT "http://localhost:9200/test/" -d '
{
"mappings": {
"files": {
"properties": {
"name": {
"type": "string",
"index": "not_analyzed"
},
"owners": {
"type": "nested",
"properties": {
"name": {
"type":"string",
"index":"not_analyzed"
},
"mail": {
"type":"string",
"index":"not_analyzed"
}
}
}
}
}
}
}
'
With sample documents:
curl -XPUT "http://localhost:9200/test/files/1" -d '
{
"name": "first.jpg",
"owners": [
{
"name": "John Smith",
"mail": "js#example.com"
},
{
"name": "Joe Smith",
"mail": "joes#example.com"
}
]
}
'
curl -XPUT "http://localhost:9200/test/files/2" -d '
{
"name": "second.jpg",
"owners": [
{
"name": "John Smith",
"mail": "js#example.com"
},
{
"name": "Ann Smith",
"mail": "as#example.com"
}
]
}
'
curl -XPUT "http://localhost:9200/test/files/3" -d '
{
"name": "third.jpg",
"owners": [
{
"name": "Kate Foo",
"mail": "kf#example.com"
}
]
}
'
And I need to find all owners that match some query, let's say "mit":
curl -XGET "http://localhost:9200/test/files/_search" -d '
{
"facets": {
"owners": {
"terms": {
"field": "owners.name"
},
"facet_filter": {
"query": {
"query_string": {
"query": "*mit*",
"default_field": "owners.name"
}
}
},
"nested": "owners"
}
}
}
'
This gives me the following result:
{
"facets" : {
"owners" : {
"missing" : 0,
"_type" : "terms",
"other" : 0,
"total" : 4,
"terms" : [
{
"count" : 2,
"term" : "John Smith"
},
{
"count" : 1,
"term" : "Joe Smith"
},
{
"count" : 1,
"term" : "Ann Smith"
}
]
}
},
"timed_out" : false,
"hits" : {...}
}
And that's fine.
But what I need exactly is to get the owners together with their email addresses (for each entry in the facet I need an additional field in the results).
Is this achievable?
I don't think this is possible directly. Depending on your needs, I would do one of the following:
1) Create a composite field containing both name and email, and do the facet on that field.
2) Run the query in addition to the facet and extract the emails from the query result, though this is obviously not scalable.
3) Use a two-step operation: get the facet, build the needed queries, and merge the results.
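The composite-field option can be sketched in Python as follows. This only models the idea on the sample documents, it is not Elasticsearch code. The separator "|" and the function names are assumptions made for illustration, and the separator must never occur in a name. In practice the composite value would be written at index time and split again on the client.

```python
from collections import Counter

SEPARATOR = "|"  # assumed never to appear in an owner name

# Owners of the three sample files from the question.
files = [
    {"owners": [{"name": "John Smith", "mail": "js#example.com"},
                {"name": "Joe Smith", "mail": "joes#example.com"}]},
    {"owners": [{"name": "John Smith", "mail": "js#example.com"},
                {"name": "Ann Smith", "mail": "as#example.com"}]},
    {"owners": [{"name": "Kate Foo", "mail": "kf#example.com"}]},
]


def composite(owner):
    """Value that would be stored in the composite field."""
    return owner["name"] + SEPARATOR + owner["mail"]


def owner_facet(files, needle):
    """Count composite values whose name part contains needle."""
    counts = Counter(
        composite(owner)
        for f in files
        for owner in f["owners"]
        if needle.lower() in owner["name"].lower()
    )
    result = []
    for key, count in counts.most_common():
        name, mail = key.split(SEPARATOR, 1)  # split back into both parts
        result.append((name, mail, count))
    return result


for row in owner_facet(files, "mit"):
    print(row)
```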
