Elasticsearch bool must query

I am trying to write an Elasticsearch bool query, and I am having an issue querying a field (DATE) inside a bool must clause.
The Elasticsearch data looks like this:
{
"took": 5,
"timed_out": false,
"_shards": {
"total": 15,
"successful": 15,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 1,
"max_score": 11.519888,
"hits": [
{
"_index": "test-2019.06.27",
"_type": "test",
"_id": "pa6gmGsByDlvLvAyiRF-",
"_score": 11.519888,
"_source": {
"DATE": "01/06/19"
}
}
]
}
}
My Elasticsearch query looks like this:
{
"query":
{
"bool" : {
"must" : [
{
"match" : {
"DATE" : {
"query" : "01/06/19",
"operator" : "AND",
"prefix_length" : 0,
"max_expansions" : 50,
"fuzzy_transpositions" : true,
"lenient" : false,
"zero_terms_query" : "NONE",
"auto_generate_synonyms_phrase_query" : true,
"boost" : 1.0
}
}
}
],
"adjust_pure_negative" : true,
"boost" : 1.0
}
}
}
The query returns no results.
Any ideas, please?

For date fields, I usually write a range query:
https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-range-query.html

Try the code below:
{
"query": {
"range" : {
"DATE" : {
"gte" : "now-1d/d",
"lt" : "now/d"
}
}
}
}
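Since the stored DATE values in the question look like "01/06/19" (apparently dd/MM/yy), the range query may also need an explicit format parameter so Elasticsearch parses the bounds the same way the field was indexed. A minimal sketch of such a request body, with illustrative bounds (the field name DATE comes from the question; the date pattern is my assumption):

```python
import json

# Sketch: a range query body assuming the DATE field uses a dd/MM/yy pattern.
# The bounds below are illustrative, not from the original question.
query = {
    "query": {
        "range": {
            "DATE": {
                "gte": "01/06/19",
                "lte": "30/06/19",
                "format": "dd/MM/yy",  # tell Elasticsearch how to parse the bounds
            }
        }
    }
}

print(json.dumps(query, indent=2))
```

If the mapping does not declare DATE as a date type with a matching format, a match query on the raw string will behave like ordinary text search, which is likely why the original query found nothing.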

Related

Get elasticsearch to ignore diacritics and accents in search hit

I want to search data in Elasticsearch across different languages, and I expect the data to be retrieved whether or not it contains diacritics or accents.
For example, I have this data:
POST ابجد/_doc/31
{
"name":"def",
"city":"Tulkarem"
}
POST ابجٌد/_doc/31
{
"name":"def",
"city":"Tulkarem"
}
PUT /abce
{
"settings" : {
"analysis" : {
"analyzer" : {
"default" : {
"tokenizer" : "standard",
"filter" : ["my_ascii_folding"]
}
},
"filter" : {
"my_ascii_folding" : {
"type" : "asciifolding",
"preserve_original" : true
}
}
}
}
}
The difference between the two indexes is the diacritics.
Trying to get data:
GET ابجد/_search
I need it to retrieve documents from both indices, but currently it returns only this:
{
"took": 2,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 1,
"relation": "eq"
},
"max_score": 1,
"hits": [
{
"_index": "ابجد",
"_id": "31",
"_score": 1,
"_source": {
"name": "def",
"city": "Tulkarem"
}
}
]
}
}
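Note that analyzers apply to field content, not to index names, so the asciifolding settings above cannot make `GET ابجد/_search` also hit the index named with a diacritic. One option (my suggestion, not from the question) is to normalize names and search terms before they reach Elasticsearch. A minimal Python sketch of diacritic stripping via Unicode decomposition:

```python
import unicodedata

def strip_marks(text: str) -> str:
    """Remove combining marks (Arabic diacritics, Latin accents) from text."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

print(strip_marks("ابجٌد"))  # the dammatan is stripped -> ابجد
print(strip_marks("café"))   # -> cafe
```

For field content inside Elasticsearch itself, the built-in arabic_normalization token filter plays a similar role for Arabic text.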

elasticsearch - only return specific fields without _source?

I've found some answers, like
Make elasticsearch only return certain fields?
but they all need the _source field.
In my system, disk and network are both scarce resources.
I can't store the _source field, and I don't need the _index or _score fields.
ElasticSearch Version: 5.5
The index mapping looks like this:
{
"index_2020-04-08": {
"mappings": {
"type1": {
"_all": {
"enabled": false
},
"_source": {
"enabled": false
},
"properties": {
"rank_score": {
"type": "float"
},
"first_id": {
"type": "keyword"
},
"second_id": {
"type": "keyword"
}
}
}
}
}
}
My query:
GET index_2020-04-08/type1/_search
{
"query": {
"bool": {
"filter": {
"term": {
"first_id": "hello"
}
}
}
},
"size": 1000,
"sort": [
{
"rank_score": {
"order": "desc"
}
}
]
}
The search results I got :
{
"took": 1,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 2,
"max_score": null,
"hits": [
{
"_index": "index_2020-04-08",
"_type": "type1",
"_id": "id_1",
"_score": null,
"sort": [
0.06621722
]
},
{
"_index": "index_2020-04-08",
"_type": "type1",
"_id": "id_2",
"_score": null,
"sort": [
0.07864579
]
}
]
}
}
The results I want:
{
"took": 1,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 2,
"max_score": null,
"hits": [
{
"_id": "id_1"
},
{
"_id": "id_2"
}
]
}
}
Can I implement it?
To return specific fields of a document, you must normally do one of two things:
Include the _source field in your documents, which is enabled by default.
Store specific fields with the stored-fields feature, which must be enabled manually.
Since you essentially want only the document IDs and some metadata, you can use the filter_path feature instead.
Here's an example that's close to what you want (just change the field list):
$ curl -X GET "localhost:9200/metricbeat-7.6.1-2020.04.02-000002/_search?filter_path=took,timed_out,_shards,hits.total,hits.max_score,hits.hits._id&pretty"
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 10000,
"relation" : "gte"
},
"max_score" : 1.0,
"hits" : [
{
"_id" : "8SEGSHEBzNscjCyQ18cg"
},
{
"_id" : "8iEGSHEBzNscjCyQ18cg"
},
{
"_id" : "8yEGSHEBzNscjCyQ18cg"
},
{
"_id" : "9CEGSHEBzNscjCyQ18cg"
},
{
"_id" : "9SEGSHEBzNscjCyQ18cg"
},
{
"_id" : "9iEGSHEBzNscjCyQ18cg"
},
{
"_id" : "9yEGSHEBzNscjCyQ18cg"
},
{
"_id" : "-CEGSHEBzNscjCyQ18cg"
},
{
"_id" : "-SEGSHEBzNscjCyQ18cg"
},
{
"_id" : "-iEGSHEBzNscjCyQ18cg"
}
]
}
}
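To illustrate what filter_path does on the server side, here is a small Python sketch (my own illustration, not Elasticsearch code) that keeps only a whitelisted dotted path in a response:

```python
def keep_path(obj, path):
    """Keep only the parts of a nested dict/list reachable via a dotted path,
    mimicking Elasticsearch's filter_path response filtering."""
    if not path:
        return obj
    head, *rest = path
    if isinstance(obj, list):
        # filter_path applies the same path to every array element
        return [keep_path(item, path) for item in obj]
    if isinstance(obj, dict) and head in obj:
        return {head: keep_path(obj[head], rest)}
    return {}

# A trimmed-down search response, shaped like the ones in this question
response = {
    "took": 1,
    "hits": {
        "total": 2,
        "hits": [
            {"_index": "index_2020-04-08", "_id": "id_1", "_score": None},
            {"_index": "index_2020-04-08", "_id": "id_2", "_score": None},
        ],
    },
}

print(keep_path(response, ["hits", "hits", "_id"]))
# -> {'hits': {'hits': [{'_id': 'id_1'}, {'_id': 'id_2'}]}}
```

The real filter_path is applied before the response leaves the node, so it also saves network bandwidth, which matters here since network is one of the scarce resources.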
Just to clarify, based on the SO question you linked: you're not storing the _source, you're requesting it from ES. It's usually used to limit what you want to have retrieved, i.e.
...
"_source": ["only", "fields", "I", "need"]
...
_score, _index, etc. are meta fields that are going to be retrieved no matter what. You can "hack" around this a bit by setting the size to 0 and aggregating, i.e.
{
"size": 0,
"aggs": {
"by_ids": {
"terms": {
"field": "_id"
}
}
}
}
which will save you a few bytes:
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 2,
"relation" : "eq"
},
"max_score" : null,
"hits" : [ ]
},
"aggregations" : {
"terms" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "Ac76WXEBnteqn982smh_",
"doc_count" : 1
},
{
"key" : "As77WXEBnteqn982EGgq",
"doc_count" : 1
}
]
}
}
}
but performing aggregations has a cost of its own.

Elasticsearch query does not work with # value

When I execute a simple search query on an email address, it does not return anything unless I remove everything after the "#". Why?
I want to run fuzzy and autocomplete queries on the emails.
ELASTICSEARCH INFOS:
{
"name" : "ZZZ",
"cluster_name" : "YYY",
"cluster_uuid" : "XXX",
"version" : {
"number" : "6.5.2",
"build_flavor" : "default",
"build_type" : "tar",
"build_hash" : "WWW",
"build_date" : "2018-11-29T23:58:20.891072Z",
"build_snapshot" : false,
"lucene_version" : "7.5.0",
"minimum_wire_compatibility_version" : "5.6.0",
"minimum_index_compatibility_version" : "5.0.0"
},
"tagline" : "You Know, for Search"
}
MAPPING :
PUT users
{
"mappings":
{
"_doc": { "properties": { "mail": { "type": "text" } } }
}
}
ALL DATA :
[
{ "mail": "firstname.lastname#company.com" },
{ "mail": "john.doe#company.com" }
]
QUERY WORKS :
A term query for "firstname.lastname" matches, even though the stored value is "firstname.lastname#company.com" and not "firstname.lastname"...
QUERY :
GET users/_search
{ "query": { "term": { "mail": "firstname.lastname" } }}
RETURN :
{
"took": 7,
"timed_out": false,
"_shards": { "total": 6, "successful": 6, "skipped": 0, "failed": 0 },
"hits": {
"total": 1,
"max_score": 4.336203,
"hits": [
{
"_index": "users",
"_type": "_doc",
"_id": "H1dQ4WgBypYasGfnnXXI",
"_score": 4.336203,
"_source": {
"mail": "firstname.lastname#company.com"
}
}
]
}
}
QUERY NOT WORKS :
QUERY :
GET users/_search
{ "query": { "term": { "mail": "firstname.lastname#company.com" } }}
RETURN :
{
"took": 0,
"timed_out": false,
"_shards": { "total": 6, "successful": 6, "skipped": 0, "failed": 0 },
"hits": {
"total": 0,
"max_score": null,
"hits": []
}
}
SOLUTION :
Change mapping (reindex after mapping changes) with uax_url_email analyzer for mails.
PUT users
{
"settings":
{
"index": { "analysis": { "analyzer": { "mail": { "tokenizer":"uax_url_email" } } } }
},
"mappings":
{
"_doc": { "properties": { "mail": { "type": "text", "analyzer":"mail" } } }
}
}
If you specify no other tokenizer for your indexed text field, Elasticsearch uses the standard tokenizer, which splits on the # symbol (see the _analyze output below).
If you use a term query rather than a match query, that exact term is looked up as-is in the inverted index (see elasticsearch match vs term query).
Your inverted index looks like this
GET users/_analyze
{
"text": "firstname.lastname#company.com"
}
{
"tokens": [
{
"token": "firstname.lastname",
"start_offset": 0,
"end_offset": 18,
"type": "<ALPHANUM>",
"position": 0
},
{
"token": "company.com",
"start_offset": 19,
"end_offset": 30,
"type": "<ALPHANUM>",
"position": 1
}
]
}
To resolve this you could specify your own analyzer for the mail field or you could use the match query, which will analyze your searched text just like how it analyzes the indexed text.
GET users/_search
{
"query": {
"match": {
"mail": "firstname.lastname#company.com"
}
}
}
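The term-vs-match difference can be sketched in plain Python. This is a rough approximation of the standard tokenizer for this specific example (my own simplification; the real tokenizer follows Unicode word-boundary rules), splitting only on '#' and whitespace:

```python
import re

def tokenize(text):
    """Rough stand-in for the standard tokenizer in this example:
    split on '#' and whitespace, lowercase the tokens."""
    return [t.lower() for t in re.split(r"[#\s]+", text) if t]

# What ends up in the inverted index for the mail field
indexed = tokenize("firstname.lastname#company.com")

# term query: the raw query string is looked up as a single token -> no hit
term_hit = "firstname.lastname#company.com" in indexed

# match query: the query text is analyzed the same way -> tokens overlap
match_hit = bool(set(tokenize("firstname.lastname#company.com")) & set(indexed))

print(indexed)             # ['firstname.lastname', 'company.com']
print(term_hit, match_hit)
```

This is why the term query only finds "firstname.lastname": that token exists in the index, while the full address never does.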

Using the returned value of sum aggregation - elasticsearch

I made this query to sum all my "practiceValue" values, which are doubles:
{
"size" : 0,
"query" : {
"bool" : {
"must_not" : [
{
"missing" : { "field" : "practiceObj.practiceValue" }
}
],
"must" : [
{
"match" : { "entityObj.description" : "FIRST" }
}
]
}
},
"aggs" : {
"total" : {
"sum" : { "script" : "(doc['practiceObj.practiceValue'].value)"
}
}
}
}
My query returned the following:
{
"took": 32,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 11477,
"max_score": 0,
"hits": []
},
"aggregations": {
"total": {
"value": 1593598.7499999984
}
}
}
How can I use that "total" value in order to round it?
"value" equals to 1593598.7499999984 and I want to make it 1593598.75
Thanks!
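One simple option (an assumption on my part, since the question doesn't say where the value is consumed) is to round the aggregation result client-side after the response comes back:

```python
# Value taken from the "total" sum aggregation in the response above
total = 1593598.7499999984

# Round to 2 decimal places client-side
rounded = round(total, 2)
print(rounded)  # -> 1593598.75
```

Alternatively, the rounding could be done inside the sum script itself (e.g. with Math.round in the script language), but rounding on the client is the simplest approach.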

ElasticSearch 'range' query returns inappropriate results

Let's take this query:
{
"timeout": 10000,
"from": 0,
"size": 21,
"sort": [
{
"view_avg": {
"order": "desc"
}
}
],
"query": {
"bool": {
"must": [
{
"range": {
"price": {
"from": 10,
"to": 20
}
}
},
{
"terms": {
"category_ids": [
16405
]
}
}
]
}
}
}
On the data set I am running against, this query should return no results (all prices are in the 100s-1000s range). However, it returns results with prices such as:
"price": "1399.00"
"price": "1299.00"
"price": "1089.00"
And so on. Any ideas how I can modify the query so that it returns the correct results?
I'm 99% sure your mapping is wrong and price is declared as a string. Elasticsearch uses different Lucene range queries based on the field type, as you can see in their documentation. The TermRangeQuery used for string fields behaves exactly like your output: it uses lexicographical ordering (i.e. "1100" sorts between "10" and "20").
To test it you can try the following mapping/search:
PUT tests/
PUT tests/test/_mapping
{
"test": {
"_source" : {"enabled" : false},
"_all" : {"enabled" : false},
"properties" : {
"num" : {
"type" : "float", // <-- HERE IT'S A FLOAT
"store" : "no",
"index" : "not_analyzed"
}
}
}
}
PUT tests/test/1
{
"test" : {
"num" : 100
}
}
POST tests/test/_search
{
"query": {
"bool": {
"must": [
{
"range": {
"num": {
"from": 10,
"to": 20
}
}
}
]
}
}
}
Result:
{
"took": 12,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 0,
"max_score": null,
"hits": []
}
}
If you delete the index and try to recreate it changing the num type to a string:
PUT tests/test/_mapping
{
"test": {
"_source" : {"enabled" : false},
"_all" : {"enabled" : false},
"properties" : {
"num" : {
"type" : "string", // <-- HERE IT'S A STRING
"store" : "no",
"index" : "not_analyzed"
}
}
}
}
You'll see a different result:
{
"took": 2,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 1,
"max_score": 1,
"hits": [
{
"_index": "tests",
"_type": "test",
"_id": "1",
"_score": 1
}
]
}
}
price needs to be a numeric field for that must clause to work. If it's a string, the comparison is lexicographic and documents will still be returned. Make sure the mapping is correct; if the field had been a float, the query would have worked.
You can check the mapping of the index with GET /index_name/_mapping.
If you had used the following range instead (with price still a string):
"range": {
"price": {
"from": 30,
"to": 40
}
}
that shouldn't return the docs, because "1" (as a string) sorts before "3" or "4", even though numerically 30 is smaller than 1399.
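The string-ordering behaviour is easy to reproduce in plain Python, where string comparison is also lexicographic:

```python
prices = ["1399.00", "1299.00", "1089.00"]

# As strings, every one of these falls "between" "10" and "20":
# the first character '1' matches, and the second character decides the order.
string_hits = [p for p in prices if "10" <= p <= "20"]

# As numbers, none of them is in the range [10, 20].
numeric_hits = [p for p in prices if 10 <= float(p) <= 20]

print(string_hits)   # -> ['1399.00', '1299.00', '1089.00']
print(numeric_hits)  # -> []
```

This mirrors what TermRangeQuery does against a string-mapped price field, and why remapping the field to a numeric type fixes the query.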