Elasticsearch - different searchstring for different fields - elasticsearch

is it possible to query different fields with different searchstrings in one query?
For example:
{
"query": [
{ "match": { "name": "Bob" }},
{ "match": { "title": "Awesome Title" }}]
}
where "name" and "title" are fields of documents.
I know there's a multi_match query, but there I query a list of fields all with the same string...

You can try this query to search for documents that match both the conditions ("name": "Bob" and "title": "Awesome Title"). Replace the <index_name> with the name of the index.
$ curl -XGET 'localhost:9200/<index_name>/_search?pretty' -H 'Content-Type: application/json' -d'
{
"query": {
"bool": {
"should": [
{
"match_phrase": {
"name": "Bob"
}
},
{
"match_phrase": {
"title": "Awesome Title"
}
}
],
"minimum_should_match": 2
}
}
}
'
Illustration:
(a) Index 4 documents
# Doc 1 - Only "name" matches
$ curl -X PUT "localhost:9200/office/doc/o001" -H 'Content-Type: application/json' -d'
{
"name" : "Bob",
"title" : "Senior Staff",
"description" : "Developing new products"
}
'
# Doc 2 - None of the criteria match
$ curl -X PUT "localhost:9200/office/doc/o002" -H 'Content-Type: application/json' -d'
{
"name" : "Tom",
"title" : "Marketing Manager",
"description" : "Shows and events"
}
'
# Doc 3 - Only "title" matches
$ curl -X PUT "localhost:9200/office/doc/o003" -H 'Content-Type: application/json' -d'
{
"name" : "Liz",
"title" : "Awesome Title",
"description" : "Recruiting talent"
}
'
# Doc 4 - Both "name" and "title" match - Expected in result
$ curl -X PUT "localhost:9200/office/doc/o004" -H 'Content-Type: application/json' -d'
{
"name" : "Bob",
"title" : "Awesome Title",
"description" : "Finance Operations"
}
'
(b) Verify that the documents were indexed
$ curl 'localhost:9200/office/_search?q=*'
# Output
{"took":19,"timed_out":false,"_shards":{"total":5,"successful":5,"failed":0},"hits":{"total":4,"max_score":1.0,"hits":[{"_index":"office","_type":"doc","_id":"o003","_score":1.0,"_source":
{
"name" : "Liz",
"title" : "Awesome Title",
"description" : "Recruiting talent"
}
},{"_index":"office","_type":"doc","_id":"o001","_score":1.0,"_source":
{
"name" : "Bob",
"title" : "Senior Staff",
"description" : "Developing new products"
}
},{"_index":"office","_type":"doc","_id":"o004","_score":1.0,"_source":
{
"name" : "Bob",
"title" : "Awesome Title",
"description" : "Finance Operations"
}
},{"_index":"office","_type":"doc","_id":"o002","_score":1.0,"_source":
{
"name" : "Tom",
"title" : "Marketing Manager",
"description" : "Shows and events"
}
(c) Run the query. Result is the one doc (with id=o004) that matches both criteria:
$ curl -XGET 'localhost:9200/office/_search?pretty' -H 'Content-Type: application/json' -d'
{
"query": {
"bool": {
"should": [
{
"match_phrase": {
"name": "Bob"
}
},
{
"match_phrase": {
"title": "Awesome Title"
}
}
],
"minimum_should_match": 2
}
}
}
'
(d) Get the query result
{
"took" : 27,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
},
"hits" : {
"total" : 1,
"max_score" : 1.4261419,
"hits" : [
{
"_index" : "office",
"_type" : "doc",
"_id" : "o004",
"_score" : 1.4261419,
"_source" : {
"name" : "Bob",
"title" : "Awesome Title",
"description" : "Finance Operations"
}
}
]
}
}

Related

How to order search results with related to slop in elasticsearch?

I have an index in ES:
curl -XGET 'http://127.0.0.1:9200/so/_settings?pretty=true'
{
"so" : {
"settings" : {
"index" : {
"number_of_shards" : "1",
"provided_name" : "so",
"creation_date" : "1594912442805",
"analysis" : {
"analyzer" : {
"my_simple_analyzer" : {
"type" : "simple",
"tokenizer" : "lowercase"
}
}
},
"number_of_replicas" : "1",
"uuid" : "8YVu4zU_Sdylr3KhOIwu9Q",
"version" : {
"created" : "7080099"
}
}
}
}
}
It has around 1.5M data.
curl -XGET 'http://127.0.0.1:9200/so/_count?pretty=true'
{
"count" : 15426942,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
}
}
I wanted to perform a full text search, so that the query string first does the phrase match and then followed by results which has slop of 1, then slop of 2 and so on.
So I came up with the below query for the same:
curl -XGET 'http://127.0.0.1:9200/so/_search?pretty=true' -H 'Content-Type: application/json' -d '{
"query": {
"bool": {
"should": [
{
"match_phrase": {
"posts": {
"query": "get the scanner on a specific family like this",
"_name": "exact_match"
}
}
},
{
"match": {
"posts": {
"query": "get the scanner on a specific family like this",
"_name": "partial_match"
}
}
}
]
}
}
}'
Is this the correct query? Because I do see the partial_match doesnt sort from slop of distance 1 and so on. How to achieve it?

Search by exact match in all fields in Elasticsearch

Let's say I have 3 documents, each of them only contains one field (but let's imagine that there are more, and we need to search through all fields).
Field value is "first second"
Field value is "second first"
Field value is "first second third"
Here is a script that can be used to create these 3 documents:
# drop the index completely, use with care!
curl -iX DELETE 'http://localhost:9200/test'
curl -H 'content-type: application/json' -iX PUT 'http://localhost:9200/test/_doc/one' -d '{"name":"first second"}'
curl -H 'content-type: application/json' -iX PUT 'http://localhost:9200/test/_doc/two' -d '{"name":"second first"}'
curl -H 'content-type: application/json' -iX PUT 'http://localhost:9200/test/_doc/three' -d '{"name":"first second third"}'
I need to find the only document (document 1) that has exactly "first second" text in one of its fields.
Here is what I tried.
A. Plain search:
curl -H 'Content-Type: application/json' -iX POST 'http://localhost:9200/test/_search' -d '{
"query": {
"query_string": {
"query": "first second"
}
}
}'
returns all 3 documents
B. Quoting
curl -H 'Content-Type: application/json' -iX POST 'http://localhost:9200/test/_search' -d '{
"query": {
"query_string": {
"query": "\"first second\""
}
}
}'
gives 2 documents: 1 and 3, because both contain 'first second'.
Here https://stackoverflow.com/a/28024714/7637120 they suggest to use 'keyword' analyzer to analyze the fields when indexing, but I would like to avoid any customizations to the mapping.
Is it possible to avoid them and still only find document 1?
Yes, you can do that by declaring name mapping type as keyword. The key to solve your problem is just simple -- declare name mapping type:keyword and off you go
to demonstrate it, I have done these
1) created mapping with `keyword` for `name` field`
2) indexed the three documents
3) searched with a `match` query
mappings
PUT so_test16
{
"mappings": {
"_doc":{
"properties":{
"name": {
"type": "keyword"
}
}
}
}
}
Indexing the documents
POST /so_test16/_doc
{
"id": 1,
"name": "first second"
}
POST /so_test16/_doc
{
"id": 2,
"name": "second first"
}
POST /so_test16/_doc
{
"id": 3,
"name": "first second third"
}
The query
GET /so_test16/_search
{
"query": {
"match": {"name": "first second"}
}
}
and the result
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : 1,
"max_score" : 0.2876821,
"hits" : [
{
"_index" : "so_test16",
"_type" : "_doc",
"_id" : "m1KXx2sB4TH56W1hdTF9",
"_score" : 0.2876821,
"_source" : {
"id" : 1,
"name" : "first second"
}
}
]
}
}
Adding second solution
( if the name is not a keyword type but a text type. Only thing here is fielddata:true also needed to be added for name field)
Mappings
PUT so_test18
{
"mappings" : {
"_doc" : {
"properties" : {
"id" : {
"type" : "long"
},
"name" : {
"type" : "text",
"fielddata": true
}
}
}
}
}
and the search query
GET /so_test18/_search
{
"query": {
"bool": {
"must": [
{"match_phrase": {"name": "first second"}}
],
"filter": {
"script": {
"script": {
"lang": "painless",
"source": "doc['name'].values.length == 2"
}
}
}
}
}
}
and the response
{
"took" : 3,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : 1,
"max_score" : 0.3971361,
"hits" : [
{
"_index" : "so_test18",
"_type" : "_doc",
"_id" : "o1JryGsB4TH56W1hhzGT",
"_score" : 0.3971361,
"_source" : {
"id" : 1,
"name" : "first second"
}
}
]
}
}
In Elasticsearch 7.1.0, it seems that you can use keyword analyzer even without creating a special mapping. At least I didn't, and the following query does what I need:
curl -H 'Content-Type: application/json' -iX POST 'http://localhost:9200/test/_search' -d '{
"query": {
"query_string": {
"query": "first second",
"analyzer": "keyword"
}
}
}'

Query issue with parent-children relation in Elasticsearch

Having the following children-father mapping:
curl -XPUT 'localhost:9200/my_index' -d '{
"mappings": {
"my_parent": {
"dynamic": "strict",
"properties" : {
"title" : { "type": "string" },
"body" : { "type": "string" },
"source_id" : { "type": "integer" },
}
},
"my_child": {
"_parent": {"type": "my_parent" },
"properties" : {
"user_id" : { "type": "string" },
}}}}'
... this two parents with ids 10 and 11:
curl -X PUT 'localhost:9200/my_index/my_parent/10' -d '{
"title" : "Microsiervos - Discos duros de 10TB",
"body" : "Empiezan a sacar DD de 30GB en el mercado",
"source_id" : "27",
}'
curl -X PUT 'localhost:9200/my_index/my_parent/11' -d '{
"title" : "Microsiervos - En el 69 llegamos a la luna",
"body" : "Se cumplen 3123 anos de la llegada a la luna",
"source_id" : "27",
}'
... and this two childrens:
curl -XPUT 'localhost:9200/my_index/my_child/1234_10?parent=10' -d '{
"user_id": "1234",
}'
curl -XPUT 'localhost:9200/my_index/my_child/1234_11?parent=11' -d '{
"user_id": "1234",
}'
With the following query, I want to get the _id of the father with user_id = 1234.
curl -XGET 'localhost:9200/my_index/my_parent/_search?pretty=true' -d '{
"_source" : "_id",
"query": {
"has_child": {
"type": "my_child",
"query" : {
"query_string" : {
"default_field" : "user_id",
"query" : "1234"
}}}}}'
This outputs the two ids, 10 and 11.
Now I want to search on parent on those specific ids only, something like this:
curl -XGET 'localhost:9200/my_index/my_parent/_search?pretty=true' -d '{
"query": {
"bool": {
"must": [
{
"terms": {
"_id": ["10", "11"]
}},
{
"query_string": {
"default_field": "body",
"query": "mercado"
}}]}}}'
As you can notice, the "_id": ["10", "11"] part is written by hand. I would like to know if there's a way to combine this two queries in one single query putting the ids returned in the first query automatically on the second query.
So the output to this should be:
},
"hits" : {
"total" : 1,
"max_score" : 0.69177496,
"hits" : [ {
"_index" : "my_index",
"_type" : "my_parent",
"_id" : "10",
"_score" : 0.69177496,
"_source":{
"title" : "Microsiervos - Discos duros de 10TB",
"body" : "Empiezan a sacar DD de 30GB en el mercado",
"source_id" : "27"
}}]}}
Use bool Query and put both conditions in must:
curl -XGET "http://localhost:9200/my_index/my_parent/_search" -d'
{
"query": {
"bool": {
"must": [
{
"query_string": {
"default_field": "body",
"query": "mercado"
}
},
{
"has_child": {
"type": "my_child",
"query": {
"query_string": {
"default_field": "user_id",
"query": "1234"
}
}
}
}
]
}
}
}'

Elasticsearch children-father issue with 'has_parent'

Having the following mapping...
curl -XPUT 'localhost:9200/myindex' -d '{
"mappings": {
"my_parent": {},
"my_child": {
"_parent": {
"type": "my_parent"
}}}}'
... the following parent:
curl -X PUT localhost:9200/myindex/my_parent/1?pretty=true' -d '{
"title" : "Microsiervos - Discos duros de 10TB",
"body" : "Empiezan a sacar DD de 30GB en el mercado"
}'
and the following children:
curl -XPUT 'localhost:9200/myindex/my_child/2?parent=1' -d '{
"user": "Pepe"
}'
If I do the following has_child query:
curl -XGET 'localhost:9200/myindex/my_parent/_search?pretty=true' -d '{
"query": {
"has_child": {
"type": "my_child",
"query" : {
"query_string" : {
"default_field" : "user",
"query" : "Pepe"
}}}}}'
I get the desired output. Pepe is found, and his father is shown:
"hits" : {
"total" : 1,
"max_score" : 1.0,
"hits" : [ {
"_index" : "myindex",
"_type" : "my_parent",
"_id" : "2",
"_score" : 1.0,
"_source":{
"title" : "Microsiervos - En el 69 llegamos a la luna",
"body" : "Se cumplen 3123 anos de la llegada a la luna"
}
But if I try to do it in reverse trying to get the children using has_parent:
curl -XGET 'localhost:9200/myindex/my_parent/_search?pretty=true' -d '{
"query": {
"has_parent": {
"parent_type": "my_parent",
"query" : {
"query_string" : {
"default_field" : "body",
"query" : "mercado"
}}}}}'
I dont get any hits. I was supposing to get the Pepe children as output. What I'm missing or doing wrong?
PS: I'm using Elasticsearch 2.1.1
You have made a mistake above.You are searching in my_parent type. If you want to fetch children using parent, you should fetch it from child_type.
Change query to:
curl -XGET 'localhost:9200/myindex/my_child/_search?pretty=true' -d
'{
"query": {
"has_parent": {
"parent_type": "my_parent",
"query" : {
"query_string" : {
"default_field" : "body",
"query" : "mercado"
}
}
}
}
}'
Please note I have used
curl -XGET 'localhost:9200/myindex/my_child/_search?pretty=true'
instead of
curl -XGET 'localhost:9200/myindex/my_parent/_search?pretty=true'

ElasticSearch - searching different doc_types with the same field name but different analyzers

Let's say I make a simple ElasticSearch index:
curl -XPUT 'http://localhost:9200/test/' -d '{
"settings": {
"analysis": {
"char_filter": {
"de_acronym": {
"type": "mapping",
"mappings": [".=>"]
}
},
"analyzer": {
"analyzer1": {
"type": "custom",
"tokenizer": "keyword",
"char_filter": ["de_acronym"]
}
}
}
}
}'
And I make two doc_types that have the same property name but they are analyzed slightly differently from one another:
curl -XPUT 'http://localhost:9200/test/_mapping/docA' -d '{
"docA": {
"properties": {
"name": {
"type": "string",
"analyzer": "simple"
}
}
}
}'
curl -XPUT 'http://localhost:9200/test/_mapping/docB' -d '{
"docB": {
"properties": {
"name": {
"type": "string",
"analyzer": "analyzer1"
}
}
}
}'
Next, let's say I put a document in each doc_type with the same name:
curl -XPUT 'http://localhost:9200/test/docA/1' -d '{ "name" : "U.S. Army" }'
curl -XPUT 'http://localhost:9200/test/docB/1' -d '{ "name" : "U.S. Army" }'
Let's try to search for "U.S. Army" in both doc types at the same time:
curl -XGET 'http://localhost:9200/test/_search?pretty' -d '{
"query": {
"match_phrase": {
"name": {
"query": "U.S. Army"
}
}
}
}'
{
"took" : 2,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
},
"hits" : {
"total" : 1,
"max_score" : 1.5,
"hits" : [ {
"_index" : "test",
"_type" : "docA",
"_id" : "1",
"_score" : 1.5,
"_source":{ "name" : "U.S. Army" }
} ]
}
}
I only get one result! I get the other result when I specify docB's analyzer:
curl -XGET 'http://localhost:9200/test/_search?pretty' -d '
{
"query": {
"match_phrase": {
"name": {
"query": "U.S. Army",
"analyzer": "analyzer1"
}
}
}
}'
{
"took" : 2,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
},
"hits" : {
"total" : 1,
"max_score" : 1.0,
"hits" : [ {
"_index" : "test",
"_type" : "docB",
"_id" : "1",
"_score" : 1.0,
"_source":{ "name" : "U.S. Army" }
} ]
}
}
I was under the impression that ES would search each doc_type with the appropriate analyzer. Is there a way to do this?
The ElasticSearch docs say that precedence for search analyzer goes:
1) The analyzer defined in the query itself, else
2) The analyzer defined in the field mapping, else
...
In this case, is ElasticSearch arbitrarily choosing which field mapping to use?
Take a look at this issue in github, which seems to have started from this post in ES google groups. I believe it answers your question:
if its in a filtered query, we can't infer it, so we simply pick one of those and use its analysis settings

Resources