Issue with a basic Elasticsearch "terms" query

I am trying to run a simple Elasticsearch terms query as follows (using the Sense Chrome extension):
GET _search
{
"query": {
"terms": {
"childcareTypes": [
"SHARED_CHARGE",
"OUT_OF_SCHOOL"
],
"minimum_match": 2
}
}
}
This returns 0 hits:
{
"took": 2,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 0,
"max_score": null,
"hits": []
}
}
I am not sure why, because a match_all query shows that two of the three records should match:
GET _search
{
"query": {
"match_all": {}
}
}
yields:
{
"took": 3,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 3,
"max_score": 1,
"hits": [
{
"_index": "bignibou",
"_type": "advertisement",
"_id": "1",
"_score": 1,
"_source": {
"id": 1,
"childcareWorkerType": "AUXILIAIRE_PARENTALE",
"childcareTypes": [
"SHARED_CHARGE",
"OUT_OF_SCHOOL"
],
"giveBath": "YES"
}
},
{
"_index": "bignibou",
"_type": "advertisement",
"_id": "2",
"_score": 1,
"_source": {
"id": 2,
"childcareWorkerType": "AUXILIAIRE_PARENTALE",
"childcareTypes": [
"SHARED_CHARGE",
"OUT_OF_SCHOOL"
],
"giveBath": "EMPTY"
}
},
{
"_index": "bignibou",
"_type": "advertisement",
"_id": "3",
"_score": 1,
"_source": {
"id": 3,
"childcareWorkerType": "AUXILIAIRE_PARENTALE",
"childcareTypes": [
"SHARED_CHARGE"
],
"giveBath": "YES"
}
}
]
}
}
and my mapping does show that the field childcareTypes is analyzed:
{
"advertisement": {
"dynamic": "false",
"properties": {
"id": {
"type": "long",
"store": "yes"
},
"childcareWorkerType": {
"type": "string",
"store": "yes",
"index": "analyzed"
},
"childcareTypes": {
"type": "string",
"store": "yes",
"index": "analyzed"
},
"giveBath": {
"type": "string",
"store": "yes",
"index": "analyzed"
}
}
}
}
Can someone please explain why my terms query returns 0 hits?

This happens because a terms query does not analyze its input, so it searches for exactly SHARED_CHARGE and OUT_OF_SCHOOL (in capital letters). Your field, however, is mapped with "index": "analyzed", which means ES uses the standard analyzer to index the data, and that analyzer lowercases every token.
For SHARED_CHARGE, ES stores shared_charge.
For OUT_OF_SCHOOL, ES stores out_of_school.
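The mismatch can be reproduced without a cluster. The following is a minimal sketch, assuming a very simplified stand-in for the standard analyzer (split on non-word characters, then lowercase); the helper names are made up for illustration and are not part of any Elasticsearch client.

```python
import re


def standard_like_analyze(text):
    """Rough stand-in for the standard analyzer: split on characters that
    are not letters, digits, or underscores, then lowercase each token."""
    return [token.lower() for token in re.findall(r"\w+", text)]


def build_inverted_index(docs, field):
    """Map each indexed term to the set of document ids that contain it."""
    index = {}
    for doc_id, source in docs.items():
        for value in source[field]:
            for term in standard_like_analyze(value):
                index.setdefault(term, set()).add(doc_id)
    return index


def terms_lookup(index, terms):
    """Like a terms query: look the terms up verbatim, with no analysis."""
    hits = set()
    for term in terms:
        hits |= index.get(term, set())
    return hits


docs = {
    1: {"childcareTypes": ["SHARED_CHARGE", "OUT_OF_SCHOOL"]},
    2: {"childcareTypes": ["SHARED_CHARGE", "OUT_OF_SCHOOL"]},
    3: {"childcareTypes": ["SHARED_CHARGE"]},
}
index = build_inverted_index(docs, "childcareTypes")

# Upper-case terms are not in the index; lower-case terms are.
upper_hits = terms_lookup(index, ["SHARED_CHARGE", "OUT_OF_SCHOOL"])
lower_hits = terms_lookup(index, ["shared_charge", "out_of_school"])
```

The index only ever contains the lowercased terms, so the verbatim upper-case lookup comes back empty while the lower-case one finds the documents.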

and my mapping does show that the field childcareTypes is analyzed:
This is exactly where your problem is: the field is analyzed, but a terms query looks up its terms directly, without analyzing them.
To be more precise, the indexed values look like this:
shared_charge
out_of_school
And your terms query searches for:
SHARED_CHARGE
OUT_OF_SCHOOL
You can verify this behavior: if you try this query...
POST /bignibou/_search
{
"query": {
"terms": {
"childcareTypes": [
"shared_charge",
"out_of_school"
]
}
}
}
...you will find your docs.
You should either use your previous query on a not_analyzed version of the field, or a query from the match family.
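Both suggestions can be sketched as request bodies built in Python. This is only an illustration under assumptions: the sub-field name raw is hypothetical, the mapping syntax is the old string / not_analyzed style used in the question, and documents that are already indexed would have to be reindexed before a new sub-field becomes searchable.

```python
import json

RAW_SUBFIELD = "raw"  # hypothetical sub-field name, not from the question


def mapping_with_raw_subfield(field):
    """Analyzed field plus a not_analyzed sub-field that keeps values verbatim."""
    return {
        field: {
            "type": "string",
            "index": "analyzed",
            "fields": {
                RAW_SUBFIELD: {"type": "string", "index": "not_analyzed"}
            },
        }
    }


def terms_on_raw(field, values):
    """Terms query against the verbatim sub-field, using the original casing."""
    return {"query": {"terms": {field + "." + RAW_SUBFIELD: list(values)}}}


def match_every_value(field, values):
    """Match query on the analyzed field; the query text is analyzed too."""
    return {
        "query": {
            "match": {field: {"query": " ".join(values), "operator": "and"}}
        }
    }


values = ["SHARED_CHARGE", "OUT_OF_SCHOOL"]
print(json.dumps(terms_on_raw("childcareTypes", values), indent=2))
print(json.dumps(match_every_value("childcareTypes", values), indent=2))
```

The first body keeps the exact-value semantics of the original query; the second lets Elasticsearch analyze the input the same way it analyzed the field.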

Related

Normalizing keyword field: ascii should match diacritic, but not vice versa

I have a keyword field that can contain characters with diacritics. Queries without diacritics should return results with those diacritics, but not vice versa. The first part can be resolved by using a normalizer, the configuration for which is also described in a related question. If I use that for e.g. {"title": "Sulgi"} and {"title": "Šulgi"}, searching for "Sulgi" will (correctly) return both documents. However, searching for "Šulgi" also returns both documents, instead of just the one with the diacritic. It seems ES is also normalizing the query input, which is generally good, but is it possible to change that behavior?
PUT _template/test
{
"index_patterns": ["*"],
"settings": {
"analysis": {
"normalizer": {
"exact": {
"type": "custom",
"filter": [
"lowercase",
"asciifolding"
]
}
}
}
},
"mappings": {
"properties": {
"title": {
"type": "keyword",
"normalizer": "exact"
}
}
}
}
POST test/_doc/1
{
"title": "Sulgi"
}
POST test/_doc/2
{
"title": "Šulgi"
}
Example search query:
POST test/_search
{
"query": {
"term": {
"title":"Šulgi"
}
}
}
{
"took": 294,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 2,
"relation": "eq"
},
"max_score": 0.18232156,
"hits": [
{
"_index": "test",
"_type": "_doc",
"_id": "1",
"_score": 0.18232156,
"_source": {
"title": "Šulgi"
}
},
{
"_index": "test",
"_type": "_doc",
"_id": "2",
"_score": 0.18232156,
"_source": {
"title": "Sulgi"
}
}
]
}
}
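The observed behavior is consistent with the normalizer running on both the indexed value and the term query input. Here is a small sketch of that, assuming Unicode decomposition as an approximation of the asciifolding filter; the function names are illustrative only.

```python
import unicodedata


def ascii_fold(text):
    """Approximate asciifolding: decompose, then drop combining marks."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def normalize_exact(text):
    """Mimic the 'exact' normalizer above: lowercase, then ASCII-fold."""
    return ascii_fold(text.lower())


# Index time: each title is stored as its normalized form.
indexed = {1: normalize_exact("Sulgi"), 2: normalize_exact("Šulgi")}


def term_query(value):
    """Search time: the query input goes through the same normalizer."""
    needle = normalize_exact(value)
    return sorted(doc_id for doc_id, term in indexed.items() if term == needle)


print(term_query("Sulgi"))
print(term_query("Šulgi"))
```

Because both titles collapse to the same stored term, no query input can tell the two documents apart on this field alone.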

Is there a way to exclude a particular term from elastic search highlights?

I'm trying out a query in Elasticsearch (version 6.0) where I have a base query and, on top of that, filters applied to narrow down the search. It is as follows:
GET target_index/_search
{
"from": {start},
"size": {offset},
"_source": [
"id",
"name",
"email",
"company",
"created_at"
],
"query": {
"bool": {
"filter": {
"bool": {
"filter": [
{ "terms":{"name.raw": ["test","test2"] }},
{ "terms":{"email.raw": ["test#test.com","test2#test.com"] }}
]
}
},
"must": {
"query_string": {
"query": "test"
}
}
}
},
"highlight": {
"fields": {
"*":{
"type":"plain"
}
}
}
}
Current result -
{
"took": 5,
"timed_out": false,
"_shards": {
"total": 3,
"successful": 3,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 1,
"max_score": 1.90374,
"hits": [
{
"_index": "index_name",
"_id": "my_id",
"_score": 1.90374,
"_source": {
"id": 2,
"name": "test",
"email": "test#test.com",
"company": "test company"
},
"highlight": {
"name.raw": [
"<em>test</em>"
],
"name": [
"<em>test</em>"
],
"company": [
"<em>test</em> company"
]
}
}
]
}
}
Desired result -
{
"took": 5,
"timed_out": false,
"_shards": {
"total": 3,
"successful": 3,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 1,
"max_score": 1.90374,
"hits": [
{
"_index": "index_name",
"_id": "my_id",
"_score": 1.90374,
"_source": {
"id": 2,
"name": "test",
"email": "test#test.com",
"company": "test company"
},
"highlight": {
"company": [
"<em>test</em> company"
]
}
}
]
}
}
In the highlights of the desired result, I don't want the data for "name" and "name.raw". These fields should be left out only for this particular query, so I cannot disable them from searching entirely.
I have a lot of terms and cannot specify every term to include in the query. Is there a way to exclude only a few fields from query search?
Related ES doc:
https://www.elastic.co/guide/en/elasticsearch/reference/6.0/index.html
Instead of excluding certain fields, you could include only those that you need:
{
"query": {
...
},
"highlight": {
"fields": {
"company":{ <---
"type":"plain"
}
}
}
}
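If the list of wanted fields is known in application code, the highlight section can be generated instead of written by hand. A minimal sketch, assuming the same "plain" highlighter type as above; the helper names are hypothetical.

```python
def highlight_only(fields, highlighter_type="plain"):
    """Build a highlight section that lists only the wanted fields."""
    return {"fields": {name: {"type": highlighter_type} for name in fields}}


def with_highlight(search_body, fields):
    """Return a copy of the search body with its highlight section replaced."""
    body = dict(search_body)
    body["highlight"] = highlight_only(fields)
    return body


base = {
    "query": {"query_string": {"query": "test"}},
    "highlight": {"fields": {"*": {"type": "plain"}}},
}
restricted = with_highlight(base, ["company"])
print(restricted["highlight"])
```

The original body is left untouched, so the wildcard version can still be used for other queries.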

Update Elasticsearch data with a new key-value pair

{
"took": 0,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 25,
"max_score": 1,
"hits": [
{
"_index": "surtest1",
"_type": "catalog",
"_id": "prod_9876561740",
"_score": 1,
"_source": {
"id": "01",
"type": "product"
}
},
{
"_index": "surtest1",
"_type": "catalog",
"_id": "prod_9876543375",
"_score": 1,
"_source": {
"id": "02",
"type": "product"
}
}
]
}
}
This is a sample JSON response from a search in Elasticsearch.
We need to add one more key-value pair ("spec": "4gb") to every JSON object, like this:
"_source": {
"id": "01",
"type": "product" ,
"spec": "4gb"
},
"_source": {
"id": "02",
"type": "product" ,
"spec": "4gb"
}
This update should be done in a single command. Please guide us on how to perform this operation.
Try
POST /surtest1/_update_by_query?refresh
{
"script": {
"source": "ctx._source['spec']='4gb'"
}
}
Take a look at the Update By Query API. You can prepare a query that matches all documents and use scripting to add the property you want.
Example:
POST twitter/_update_by_query
{
"script": {
"source": "ctx._source.likes++",
"lang": "painless"
},
"query": {
"term": {
"user": "kimchy"
}
}
}
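For the "spec": "4gb" case, the request body can be assembled in code, and the intended effect can be checked locally first. This is a sketch under assumptions: passing the field name and value through script params is one possible way to write the script, and the helper names are invented for illustration.

```python
def add_field_body(field, value, query=None):
    """Build an _update_by_query body that sets one field on each matched doc."""
    body = {
        "script": {
            "lang": "painless",
            # Assumed script form: field name and value arrive via params.
            "source": "ctx._source[params.field] = params.value",
            "params": {"field": field, "value": value},
        }
    }
    if query is not None:
        body["query"] = query
    return body


def apply_locally(sources, field, value):
    """Simulate the script on plain dicts to preview the resulting _source."""
    return [{**source, field: value} for source in sources]


sources = [
    {"id": "01", "type": "product"},
    {"id": "02", "type": "product"},
]
body = add_field_body("spec", "4gb", query={"match_all": {}})
updated = apply_locally(sources, "spec", "4gb")
print(updated)
```

Leaving the query out updates every document in the index, which is what the first answer's request does.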

What is the query required for fetching full-text with delimiter in elasticsearch

Assuming I have a document like this in Elasticsearch:
{
"videoName": "taylor.mp4",
"type": "long"
}
I tried full-text search using the DSL query:
{
"query": {
"match":{
"videoName": "taylor"
}
}
}
I need to get the above document, but I don't get it. If I specify taylor.mp4, it returns the document.
So, I would like to know how to do a full-text search with delimiters.
Edit after KARTHEEK's answer:
The regexp fetches the taylor.mp4 document. Take the situation where the documents in the video index are:
{
"videoName": "Akon - smack that.mp4",
"type": "long"
}
So, the query for retrieving this document can be:
{
"query": {
"match":{
"videoName": "smack that"
}
}
}
In this case the document is retrieved, since smack is in the query string: match does the full-text search and returns the document. But say I only know the keyword that; then match does not return the document, and I need to use regexp for that.
{
"query": {
"regexp":{
"videoName": "smack.* that.*"
}
}
}
On the other hand, if I switch to regexp and turn all my query strings into the form smack.* that.*, this also does not retrieve any documents. And we don't know which word will carry the .mp4 suffix. So my question is: we need to do the full-text search with match, and it should also handle the delimiters. Is there any other way?
Edit after Richa asked for the mapping of the index
(from http://localhost:9200/example/videos/_mapping):
{
"example": {
"mappings": {
"videos": {
"properties": {
"query": {
"properties": {
"match": {
"properties": {
"videoName": {
"type": "string"
}
}
}
}
},
"type": {
"type": "string"
},
"videoName": {
"type": "string"
}
}
}
}
}
}
Based on the query you mentioned, we can use a regular expression to get the result. Please find the result below, and let me know if there is anything else you need.
curl -XGET "http://localhost:9200/test/sample/_search" -d'
{
"query": {
"regexp":{
"videoName": "taylor.*"
}
}
}'
Result:
{
"took": 22,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 1,
"max_score": 1,
"hits": [
{
"_index": "test",
"_type": "sample",
"_id": "1",
"_score": 1,
"_source": {
"videoName": "taylor.mp4",
"type": "long"
}
}
]
}
}
Please use this mapping
PUT /test_index
{
"settings": {
"number_of_shards": 1
},
"mappings": {
"doc": {
"properties": {
"videoName": {
"type": "string",
"term_vector": "yes"
}
}
}
}
}
After that you need to index a document that you mentioned earlier:
PUT test_index/doc/1
{
"videoName": "Akon - smack that.mp4",
"type": "long"
}
Output:
{
"took": 1,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"failed": 0
},
"hits": {
"total": 1,
"max_score": 0.15342641,
"hits": [
{
"_index": "test_index",
"_type": "doc",
"_id": "1",
"_score": 0.15342641,
"_source": {
"videoName": "Akon - smack that.mp4",
"type": "long"
}
}
]
}
}
Query to get the term vectors:
GET /test_index/doc/1/_termvector?fields=videoName
Results:
{
"_index": "test_index",
"_type": "doc",
"_id": "1",
"_version": 1,
"found": true,
"took": 1,
"term_vectors": {
"videoName": {
"field_statistics": {
"sum_doc_freq": 3,
"doc_count": 1,
"sum_ttf": 3
},
"terms": {
"akon": {
"term_freq": 1
},
"smack": {
"term_freq": 1
},
"that.mp4": {
"term_freq": 1
}
}
}
}
}
Using this, we can search for "smack":
POST /test_index/_search
{
"query": {
"match": {
"_all": "smack"
}
}
}
Result:
{
"took": 1,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"failed": 0
},
"hits": {
"total": 1,
"max_score": 0.15342641,
"hits": [
{
"_index": "test_index",
"_type": "doc",
"_id": "1",
"_score": 0.15342641,
"_source": {
"videoName": "Akon - smack that.mp4",
"type": "long"
}
}
]
}
}
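The term vectors above explain why smack matches while that and taylor do not: the dot stays inside the token. The sketch below mimics that tokenization with a regular expression and compares it with a hypothetical alternative that splits on every non-alphanumeric character (roughly what a custom analyzer with a pattern-based tokenizer might produce). Both tokenizers are approximations written for illustration, not Elasticsearch's actual implementation.

```python
import re


def standard_like_tokens(text):
    """Approximate the term vectors above: lowercase, and keep a dot that
    sits between two alphanumeric runs inside the token."""
    return re.findall(r"[a-z0-9]+(?:\.[a-z0-9]+)*", text.lower())


def split_on_punctuation_tokens(text):
    """Hypothetical alternative: any non-alphanumeric character separates
    tokens, so 'that.mp4' becomes 'that' and 'mp4'."""
    return re.findall(r"[a-z0-9]+", text.lower())


def matches(query, document, tokenize):
    """Very rough match query: a hit if any query token is an indexed token."""
    doc_tokens = set(tokenize(document))
    return any(token in doc_tokens for token in tokenize(query))


video = "Akon - smack that.mp4"
print(standard_like_tokens(video))
print(split_on_punctuation_tokens(video))
print(matches("that", video, standard_like_tokens))
print(matches("that", video, split_on_punctuation_tokens))
```

With the second tokenizer, a plain match on that or taylor would find the documents, at the cost of also indexing mp4 as a separate term.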

How to index only specific fields of documents using Elasticsearch

My requirement is to index only specific fields of a document in Elasticsearch.
Example:
My document is
{
"name":"stev",
"age":26,
"salary":25000
}
This is my document, but I don't want to index the whole document. I want to store only the name field.
I created an index emp and wrote a mapping like the one below:
"person" : {
"_all" : {"enabled" : false},
"properties" : {
"name" : {
"type" : "string", "store" : "yes"
}
}
}
When I look at the indexed documents:
{
"took": 1,
"timed_out": false,
"_shards": {
"total": 2,
"successful": 2,
"failed": 0
},
"hits": {
"total": 2,
"max_score": 1,
"hits": [
{
"_index": "test",
"_type": "test",
"_id": "AU1_p0xAq8r9iH00jFB_",
"_score": 1,
"_source": { }
}
,
{
"_index": "test",
"_type": "test",
"_id": "AU1_lMDCq8r9iH00jFB-",
"_score": 1,
"_source": { }
}
]
}
}
The name field is not returned. Why?
Can anyone help me?
It's hard to tell what you're doing wrong from what you posted, but I can give you an example that works.
Elasticsearch will, by default, index whatever source documents you give it. Every time it sees a new document field, it will create a mapping field with sensible defaults, and it will index them by default as well. If you want to exclude fields, you can set "index": "no" and "store": "no" in the mapping for each field you want to exclude. If you want that behavior to be the default for every field, you can use the "_default_" property for specifying that fields not be stored (though I couldn't get it to work for not indexing).
You probably also will want to disable "_source", and use the "fields" parameter in your search queries.
Here is an example. The index definition looks like this:
PUT /test_index
{
"mappings": {
"person": {
"_all": {
"enabled": false
},
"_source": {
"enabled": false
},
"properties": {
"name": {
"type": "string",
"index": "analyzed",
"store": "yes"
},
"age": {
"type": "integer",
"index": "no",
"store": "no"
},
"salary": {
"type": "integer",
"index": "no",
"store": "no"
}
}
}
}
}
Then I can add a few documents with the bulk api:
POST /test_index/person/_bulk
{"index":{"_id":1}}
{"name":"stev","age":26,"salary":25000}
{"index":{"_id":2}}
{"name":"bob","age":30,"salary":28000}
{"index":{"_id":3}}
{"name":"joe","age":27,"salary":35000}
Since I disabled "_source", a simple query will return only ids:
POST /test_index/_search
...
{
"took": 1,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"failed": 0
},
"hits": {
"total": 3,
"max_score": 1,
"hits": [
{
"_index": "test_index",
"_type": "person",
"_id": "1",
"_score": 1
},
{
"_index": "test_index",
"_type": "person",
"_id": "2",
"_score": 1
},
{
"_index": "test_index",
"_type": "person",
"_id": "3",
"_score": 1
}
]
}
}
But if I specify that I want the "name" field, I'll get it:
POST /test_index/_search
{
"fields": [
"name"
]
}
...
{
"took": 1,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"failed": 0
},
"hits": {
"total": 3,
"max_score": 1,
"hits": [
{
"_index": "test_index",
"_type": "person",
"_id": "1",
"_score": 1,
"fields": {
"name": [
"stev"
]
}
},
{
"_index": "test_index",
"_type": "person",
"_id": "2",
"_score": 1,
"fields": {
"name": [
"bob"
]
}
},
{
"_index": "test_index",
"_type": "person",
"_id": "3",
"_score": 1,
"fields": {
"name": [
"joe"
]
}
}
]
}
}
You can prove to yourself that the other fields were not stored by running:
POST /test_index/_search
{
"fields": [
"name", "age", "salary"
]
}
which will return the same result. I can also prove that the "age" field wasn't indexed by running this query, which would return a document if "age" had been indexed:
POST /test_index/_search
{
"fields": [
"name", "age"
],
"query": {
"term": {
"age": {
"value": 27
}
}
}
}
...
{
"took": 1,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"failed": 0
},
"hits": {
"total": 0,
"max_score": null,
"hits": []
}
}
Here is a bunch of code I used for playing around with this. I wanted to use a _default_ mapping and/or field to handle this without having to specify the settings for each field. I was able to make it work in terms of not storing data, but each field was still indexed.
http://sense.qbox.io/gist/d84967923d6c0757dba5f44240f47257ba2fbe50
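Since the per-field settings have to be spelled out, one option is to generate the mapping from a list of field names. The following is a rough sketch, assuming the same old-style mapping options as the example above ("index": "no", "store": "no"); the function name and its arguments are hypothetical.

```python
def selective_mapping(doc_type, field_types, keep):
    """Build a mapping where only the fields in `keep` are indexed and stored."""
    properties = {}
    for name, es_type in field_types.items():
        if name in keep:
            # Kept fields use the default indexing and are stored.
            properties[name] = {"type": es_type, "store": "yes"}
        else:
            # Every other field is neither indexed nor stored.
            properties[name] = {"type": es_type, "index": "no", "store": "no"}
    return {
        "mappings": {
            doc_type: {
                "_all": {"enabled": False},
                "_source": {"enabled": False},
                "properties": properties,
            }
        }
    }


field_types = {"name": "string", "age": "integer", "salary": "integer"}
mapping = selective_mapping("person", field_types, keep={"name"})
print(mapping["mappings"]["person"]["properties"])
```

This only saves typing; it still emits one explicit entry per field, which matches the limitation described above.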
