Elasticsearch: How to achieve a case sensitive term query?

I am trying to query for items by a field called unit, whose values are case sensitive (like kWh), but my term query matches only when I query for kwh (lowercase w). From what I have seen in the docs, term should be the right query for case-sensitive matching, so I am not sure what I am doing wrong.
## Create an item
curl -X POST "localhost:9200/my_index/my_type/my_id" -H 'Content-Type: application/json' -d'{"point_name" : "my_point_name", "unit" : "kWh"}'
=> {"_index":"my_index","_type":"my_type","_id":"my_id","_version":1,"result":"created","_shards":{"total":2,"successful":1,"failed":0},"created":true}
## Try to query it by unit with exact match (kWh)
curl -X GET "localhost:9200/my_index/my_type/_search" -H 'Content-Type: application/json' -d'{"query" : { "bool" : {"must" : [{ "term" : {"unit" : "kWh"}}]}}}'
=> {"took":36,"timed_out":false,"_shards":{"total":5,"successful":5,"skipped":0,"failed":0},"hits":{"total":0,"max_score":null,"hits":[]}}
## Query with lower case unit kwh
curl -X GET "localhost:9200/my_index/my_type/_search" -H 'Content-Type: application/json' -d'{"query" : { "bool" : {"must" : [{ "term" : {"unit" : "kwh"}}]}}}'
=> {"took":12,"timed_out":false,"_shards":{"total":5,"successful":5,"skipped":0,"failed":0},"hits":{"total":1,"max_score":0.2876821,"hits":[{"_index":"my_index","_type":"my_type","_id":"my_id","_score":0.2876821,"_source":{"point_name" : "my_point_name", "unit" : "kWh"}}]}}
I don't want to use match here, since I build these queries for other fields as well and I want to ensure exact-match behaviour. Can anyone point out what the correct query would be, and why this term query does not work?
I am using this Docker image as my server:
docker.elastic.co/elasticsearch/elasticsearch:6.2.4
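A likely explanation, assuming the index was created by dynamic mapping as in the commands above: Elasticsearch 6.x maps a new string field as text with a keyword sub-field. The text field goes through the standard analyzer, which lowercases tokens, so the index holds kwh, while a term query does not analyze its input and looks for kWh verbatim. The sub-field unit.keyword keeps the original value unchanged. The sketch below is not a confirmed fix for this index. The helper names term_query and search_unit and the ES variable are made up for illustration, and unit.keyword only exists if the mapping really came from dynamic mapping.

```shell
#!/bin/bash
# Assumption: Elasticsearch 6.x reachable at localhost:9200 (override with ES=...).
ES="${ES:-localhost:9200}"

# Build a bool/must term query body for a field name ($1) and an exact value ($2).
# The value is inserted verbatim, so it must not contain quotes or backslashes.
term_query() {
  printf '{"query":{"bool":{"must":[{"term":{"%s":"%s"}}]}}}' "$1" "$2"
}

# Send the query to my_index/my_type (names taken from the question).
search_unit() {
  curl -s -X GET "$ES/my_index/my_type/_search" \
    -H 'Content-Type: application/json' \
    -d "$(term_query "$1" "$2")"
}

# Usage (needs a running node):
#   search_unit unit.keyword kWh   # exact, case-sensitive value of the keyword sub-field
#   search_unit unit kwh           # lowercased token of the analyzed text field
```

If exact matching is all that is ever needed for unit, mapping the field as keyword up front (and reindexing) would avoid the sub-field altogether.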

Related

Elasticsearch Document search related

I have an index in Elasticsearch with one document, say with doc ID 01. I then updated the document using a new doc ID, say 02, so now I have two documents.
My question: I want only the latest document (doc ID 02) to be returned by a search query (index/_search).
What would the query be for this scenario?
If you want to get the document with the maximum doc_id (assuming, as in your example, that doc IDs are created in increasing numerical order), you can use the query below. Note that _id is compared as a string, so this relies on the IDs being zero-padded to the same width, as 01 and 02 are.
curl "https://{es_endpoint}/sample_index/_search?pretty" -H 'Content-Type: application/json' -d'
{
"sort" : [
{ "_id" : {"order" : "desc"}}
],
"size": 1
}'

Elasticsearch: what my index contains: docs or positions?

I've created ES index using the following command:
curl -X PUT -H 'Content-Type: application/json' -H 'Accept: application/json' -d '{"settings" :{"number_of_shards" : 10, "number_of_replicas" : 0, "analysis":{"analyzer": {"my_analyzer": {"type": "custom", "tokenizer":"whitespace","filter":["lowercase","porter_stem"],"stopwords":[...stopwords here ...]}}}}, "mappings" : {"html" : {"properties" : {"head" : { "type" : "text", "analyzer": "my_analyzer" }, "body" : { "type" : "text", "analyzer": "my_analyzer"}}}}}' localhost:9200/docs
I read here that:
Analyzed string fields use positions as the default, and all other fields use docs as the default.
Since my fields are of text type, are they considered string fields?
My main issue is how to find out what my index contains (docs or positions) for each field. I used the /docs/_settings endpoint to get the index settings, but it didn't give a useful answer.
Any hints?
EDIT:
In addition to the answer of @ibexit below, I verified this in practice by issuing phrase queries against the ES indices.
You defined the fields as text without specifying index_options in your mapping. In this case the default for text fields is applied (index_options=positions), so the inverted index contains doc numbers, term frequencies, and term positions (order) for the text fields.
For more in-depth information about inverted indices, please have a look at https://www.elastic.co/blog/found-elasticsearch-from-the-bottom-up or https://youtu.be/x37B_lCi_gc
This should be a good starting point for your research.
Cheers!
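To make the choice visible instead of relying on the default, index_options can be set explicitly in the mapping and then read back through the _mapping endpoint. The following is only a sketch: the index name docs_explicit and the helpers mapping_body and create_and_show are hypothetical, and the custom analyzer from the question is left out for brevity.

```shell
#!/bin/bash
# Assumption: Elasticsearch 6.x reachable at localhost:9200 (override with ES=...).
ES="${ES:-localhost:9200}"

# Build a mapping for the "html" type with explicit index_options for head ($1)
# and body ($2). Valid values are docs, freqs, positions and offsets.
mapping_body() {
  printf '{"mappings":{"html":{"properties":{"head":{"type":"text","index_options":"%s"},"body":{"type":"text","index_options":"%s"}}}}}' "$1" "$2"
}

# Create the hypothetical index docs_explicit, then read its mapping back.
create_and_show() {
  curl -s -X PUT "$ES/docs_explicit" -H 'Content-Type: application/json' \
    -d "$(mapping_body "$1" "$2")"
  curl -s -X GET "$ES/docs_explicit/_mapping?pretty"
}

# Usage (needs a running node):
#   create_and_show docs positions
```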

Elastic search simple query to find all ids

I am trying to get all IDs for a type, but I am pulling my hair out.
Please see my attachment.
Here is the cURL call:
curl -XGET 'localhost:9200/_search?pretty' -H 'Content-Type: application/json' \
-d'{ "query": { "wildcard" : { "id" : "Account*" } }}'
The call returns no results.
I would guess there is an issue with the way your id field is analyzed. You can retrieve the mapping by using the _mapping endpoint (described in the docs). A wildcard query is not analyzed: the pattern is compared as-is against the terms stored in the index. If id is an analyzed string field, the analyzer has most likely lowercased and split the value, so a pattern starting with an uppercase A finds nothing. In that case, either use a lowercase pattern or change the mapping so that id is not analyzed and reindex your data.
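A quick way to test this guess is to look at the mapping and then retry the wildcard with a lowercase pattern. This is a sketch under assumptions: my_index is a placeholder for the real index name, the helpers show_mapping, wildcard_body and wildcard_search are made up for illustration, and the lowercase pattern only helps if id is analyzed with an analyzer that lowercases.

```shell
#!/bin/bash
# Assumption: Elasticsearch reachable at localhost:9200 (override with ES=...).
ES="${ES:-localhost:9200}"

# Print the mapping of an index ($1) to see how the id field is defined.
show_mapping() {
  curl -s -X GET "$ES/$1/_mapping?pretty"
}

# Build a wildcard query body for a field ($1) and a pattern ($2).
# The pattern is inserted verbatim and is not analyzed by Elasticsearch.
wildcard_body() {
  printf '{"query":{"wildcard":{"%s":"%s"}}}' "$1" "$2"
}

# Run the wildcard query across all indices, as in the question.
wildcard_search() {
  curl -s -X GET "$ES/_search?pretty" -H 'Content-Type: application/json' \
    -d "$(wildcard_body "$1" "$2")"
}

# Usage (needs a running node; my_index is a placeholder):
#   show_mapping my_index
#   wildcard_search id 'account*'
```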

Elasticsearch: is bulk search possible?

I know there is support for bulk index operations, but is it possible to do the same for search queries? I want to send many different, unrelated queries (to do precision/recall testing), and it would probably be faster using a bulk query.
Yes, you can use the multi search API and the /_msearch endpoint to send as many queries as you wish in one shot.
curl -XPOST localhost:9200/_msearch -H 'Content-Type: application/x-ndjson' -d '
{"index" : "test1"}
{"query" : {"match_all" : {}}, "from" : 0, "size" : 10}
{"index" : "test2"}
{"query" : {"match_all" : {}}}
'
You'll get a responses array with the response of each query in the same order as in the request.
Note:
make sure to separate each line by a newline character
make sure to add the extra newline after the last query.
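Since the newline rules are easy to get wrong by hand, the body can also be generated. The sketch below builds the header/query line pairs with a trailing newline after every line and pipes them to curl with --data-binary @-, which sends stdin without stripping newlines. The helper names msearch_body and msearch are made up for illustration, and the index names are the ones from the example above.

```shell
#!/bin/bash
# Assumption: Elasticsearch reachable at localhost:9200 (override with ES=...).
ES="${ES:-localhost:9200}"

# Build an _msearch body from argument pairs: index name, then query JSON.
# Every line, including the last one, is terminated by a newline.
msearch_body() {
  while [ "$#" -ge 2 ]; do
    printf '{"index":"%s"}\n%s\n' "$1" "$2"
    shift 2
  done
}

# Send the generated body; --data-binary @- reads it from stdin unchanged.
msearch() {
  msearch_body "$@" | curl -s -X POST "$ES/_msearch" \
    -H 'Content-Type: application/x-ndjson' --data-binary @-
}

# Usage (needs a running node):
#   msearch test1 '{"query":{"match_all":{}},"from":0,"size":10}' \
#           test2 '{"query":{"match_all":{}}}'
```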

Determining which words were matched in a fuzzy search

I'm running a fuzzy search, and need to see which words were matched. For example, if I am searching for the query testing, and it matches a field with the sentence The boy was resting, I need to be able to know that the match was due to the word resting.
I tried setting the parameter explain = true, but it doesn't seem to contain the information I need. Any thoughts?
Alright, this is what I was looking for:
After a bit of research, I found the Highlighting feature of elasticsearch.
By default it returns a snippet of context surrounding the match, but you can set the fragment size to the query length to return only the exact match. For example:
{
query : query,
highlight : {
"fields" : {
'text' : {
"fragment_size" : query.length
}
}
}
}
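As a complement, here is a sketch that leaves the fragment size alone and instead pulls the matched words out of the highlight fragments, relying on the default <em>...</em> highlight tags. The index name play and the field text are borrowed from the example script further down. The helper names highlight_body, matched_words and fuzzy_search are made up for illustration, and the text-based extraction assumes the tags appear literally in the response.

```shell
#!/bin/bash
# Assumption: Elasticsearch reachable at localhost:9200 (override with ES=...).
ES="${ES:-localhost:9200}"

# Build a fuzzy match query on "text" for the search word ($1),
# with highlighting enabled for the same field.
highlight_body() {
  printf '{"query":{"match":{"text":{"query":"%s","fuzziness":1}}},"highlight":{"fields":{"text":{}}}}' "$1"
}

# Read highlighted text on stdin and print one matched word per line
# by extracting whatever sits between the default <em> and </em> tags.
matched_words() {
  grep -o '<em>[^<]*</em>' | sed 's/<[^>]*>//g'
}

# Run the search against the play index and list the matched words.
fuzzy_search() {
  curl -s -X POST "$ES/play/_search" -H 'Content-Type: application/json' \
    -d "$(highlight_body "$1")" | matched_words
}

# Usage (needs a running node with the play index populated):
#   fuzzy_search testing
```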
Using explain should give you some clues, although they are not easy to pick out.
If you run the following, also available at https://www.found.no/play/gist/daa46f0e14273198691a , you should see e.g. description: "weight(text:nesting^0.85714287 in 1) […], description: "weight(text:testing in 1) [PerFieldSimilarity] […] and so on in the hit's _explanation.
#!/bin/bash
export ELASTICSEARCH_ENDPOINT="http://localhost:9200"
# Create indexes
curl -XPUT "$ELASTICSEARCH_ENDPOINT/play" -H 'Content-Type: application/json' -d '{}'
# Index documents
curl -XPOST "$ELASTICSEARCH_ENDPOINT/_bulk?refresh=true" -H 'Content-Type: application/x-ndjson' -d '
{"index":{"_index":"play","_type":"type"}}
{"text":"The boy was resting"}
{"index":{"_index":"play","_type":"type"}}
{"text":"The bird was testing while nesting"}
'
# Do searches
curl -XPOST "$ELASTICSEARCH_ENDPOINT/_search?pretty" -H 'Content-Type: application/json' -d '
{
"query": {
"match": {
"text": {
"query": "testing",
"fuzziness": 1
}
}
},
"explain": true
}
'
