Range filter not working - elasticsearch

I'm using the Elasticsearch search engine, and when I run the code below, the returned results don't match the range criteria (I get items with a published date below the desired limit):
#!/bin/bash
curl -X GET 'http://localhost:9200/newsidx/news/_search?pretty' -d '{
"fields": [
"art_text",
"title",
"published",
"category"
],
"query": {
"bool": {
"should": [
{
"fuzzy": {"art_text": {"boost": 89, "value": "google" }}
},
{
"fuzzy": {"art_text": {"boost": 75, "value": "twitter" }}
}
],
"minimum_number_should_match": 1
}
},
"filter" : {
"range" : {
"published" : {
"from" : "2013-04-12 00:00:00"
}
}
}
}
'
I also tried putting the range clause in a must clause inside the bool query, but the results were the same.
Edit: I use Elasticsearch to search a MongoDB database through a river plugin. This is the script I ran to index the MongoDB database into ES:
#!/bin/bash
curl -X PUT localhost:9200/_river/mongodb/_meta -d '{
"type":"mongodb",
"mongodb": {
"db": "newsful",
"collection": "news"
},
"index": {
"name": "newsidx",
"type": "news"
}
}'
Besides this, I didn't create any other indexes.
Edit 2:
A view of the ES mappings:
http://localhost:9200/newsidx/news/_mapping
published: {
type: "string"
}

The reason is in your mapping. The published field, which you are using as a date, is indexed as a string. That's probably because the date format you are using is not the default one in Elasticsearch, so the field type is not auto-detected and it's indexed as a plain string.
You should change your mapping using the put mapping API. You need to define the published field as a date there, specifying the format you're using (there can be more than one), and reindex your data.
After that your range filter should work!
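As a sketch of that fix, the put mapping call might look like the following (index and type names are taken from the question; note that an existing string field cannot be changed in place, so you may have to delete and recreate the index, then re-run the river, before this mapping takes effect):

```shell
# Hedged sketch: map "published" as a date using the question's format.
# If "published" was already indexed as a string, delete and recreate
# the index (and re-run the river) before applying this mapping.
curl -X PUT 'http://localhost:9200/newsidx/news/_mapping' -d '{
  "news": {
    "properties": {
      "published": {
        "type": "date",
        "format": "yyyy-MM-dd HH:mm:ss"
      }
    }
  }
}'
```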

Related

Elastic Search Date Range Query

I am new to Elasticsearch and I am struggling with date range queries. I have to query the records which fall between particular dates. The JSON records pushed into the Elasticsearch database are as follows:
"messageid": "Some message id",
"subject": "subject",
"emaildate": "2020-01-01 21:09:24",
"starttime": "2020-01-02 12:30:00",
"endtime": "2020-01-02 13:00:00",
"meetinglocation": "some location",
"duration": "00:30:00",
"employeename": "Name",
"emailid": "abc#xyz.com",
"employeecode": "141479",
"username": "username",
"organizer": "Some name",
"organizer_email": "cde#xyz.com",
I have to query the records whose start time is between "2020-01-02 12:30:00" and "2020-01-10 12:30:00". I have written a query like this:
{
"query":
{
"bool":
{
"filter": [
{
"range" : {
"starttime": {
"gte": "2020-01-02 12:30:00",
"lte": "2020-01-10 12:30:00"
}
}
}
]
}
}
}
This query is not giving the expected results. I assume that the person who pushed the data into the Elasticsearch database at my office did not set the mapping, and Elasticsearch dynamically decided the data type of "starttime" as "text". Hence I am getting inconsistent results.
I can set the mapping like this :
PUT /meetings
{
"mappings": {
"dynamic": false,
"properties": {
.
.
.
.
"starttime": {
"type": "date",
"format":"yyyy-MM-dd HH:mm:ss"
}
.
.
.
}
}
}
And the query would work, but I am not allowed to do so (office policies). What alternatives do I have to achieve my task?
Update:
I assumed the data type to be "text", but by default Elasticsearch applies both "text" and "keyword" so that both full-text and keyword-based searches can be implemented. Since it is also indexed as "keyword", will this benefit me in any way? I do not have access to a lot of things at the office, which is why I am unable to debug the query; I only have the search API, for which I have to build the query.
GET /meetings/_mapping output :
'
'
'
"starttime" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
}
'
'
'
Date range queries will not work on a text field; for that, you have to use a date field.
Since you are working with dates, best practice is to use the date type.
I would suggest reindexing your index into another index so that you can change the type of your text field to a date field:
Step 1: Create index2 using index1's mapping, making sure to change the type of your date field from text to date.
Step 2: Run the Elasticsearch reindex API to copy all your data from index1 to index2. Since you have changed the field type, Elasticsearch will now recognize this field as a date:
POST _reindex
{
"source":{ "index": "index1" },
"dest": { "index": "index2" }
}
Now you can run your normal date queries on index2.
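Step 1 might be sketched as follows (assuming a recent Elasticsearch with typeless mappings; index2 is a placeholder name, and the rest of index1's mapping would be copied alongside the corrected field):

```shell
# Hedged sketch: create index2 with starttime typed as a date in the
# question's format; carry over the remaining fields from index1's
# mapping unchanged.
curl -X PUT 'http://localhost:9200/index2' \
  -H 'Content-Type: application/json' -d '{
  "mappings": {
    "properties": {
      "starttime": {
        "type": "date",
        "format": "yyyy-MM-dd HH:mm:ss"
      }
    }
  }
}'
```

After this, the _reindex call copies the documents across, and range queries on starttime in index2 behave as date queries.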
As @jzzfs suggested, the idea is to add a date sub-field to the starttime field. You first need to modify the mapping like this:
PUT meetings/_mapping
{
"properties": {
"starttime" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
},
"date": {
"type" : "date",
"format" : "yyyy-MM-dd HH:mm:ss",
}
}
}
}
}
When done, you need to reindex your data using the update by query API so that the starttime.date field gets populated and indexed:
POST meetings/_update_by_query
When the update is done, you'll be able to leverage the starttime.date sub-field in your query:
{
"query": {
"bool": {
"filter": [
{
"range": {
"starttime.date": {
"gte": "2020-01-02 12:30:00",
"lte": "2020-01-10 12:30:00"
}
}
}
]
}
}
}
There are ways of parsing text fields as dates at search time, but the overhead is impractical. You could, however, keep starttime as text by default but make it a multi-field and query it using starttime.as_date, for example.

Elasticsearch mappings doesn't seem to apply for query

I use Elasticsearch with a Spring Boot application. In this application I have an index customer, and customer contains a field secretKey. This secret key is a string built from numbers and letters in the form FOOBAR-000.
My goal was to select exactly one customer by his secret key, so I changed the mapping to NOT ANALYZE that field, but it doesn't seem to work. What am I doing wrong?
Here's my mapping
curl -X GET 'http://localhost:9200/customer/_mapping'
{
"customer": {
"mappings": {
"customer": {
"properties": {
"secretKey": {
"type": "string",
"index": "not_analyzed"
}
}
}
}
}
}
but after I run the query
curl -XGET "http://localhost:9200/customer/_validate/query?explain" -d'
{
"query": {
"query_string": {
"query": "FOOBAR-3121"
}
}
}'
I get following explanation:
"explanations": [
{
"index": "customer",
"valid": true,
"explanation": "_all:foobar _all:3121"
},
]
From my understanding you have an index called "customer" and, within this index, a document containing a "customer" field. In your case the secretKey would be nested in the "customer" field. For some reason, Elasticsearch has surprising behaviour if you encapsulate objects without specifying that they are of nested type. This article from the docs explains the behaviour in detail. If you specify it with the following:
{
"customer": {
"mappings": {
"_doc": {
"properties": {
"customer": {
"type": "nested"
}
}
}
}
}
}
Then it should work with your query.
You need to specify the field name in your query; without it, Elasticsearch executes the query against all fields, which is why you see _all. Try this one:
curl -XGET "http://localhost:9200/customer/_validate/query?explain" -d'
{
"query": {
"term": {
"secretKey": {
"value": "FOOBAR-3121"
}
}
}
}'
My goal was to select exactly one customer by his secret key
Your requirement is strict, so use a match query to select ONLY the matching customer!
curl -XGET "http://localhost:9200/customer/_validate/query?explain" -d'
{
"query": {
"match": {
"secretKey": "FOOBAR-3121"
}
}
}'

Elasticsearch: match exact keywords with special characters

I am storing tags as an array of keywords:
...
Tags: {
type: "keyword"
},
...
Resulting in arrays like this:
Tags: [
"windows",
"opengl",
"unicode",
"c++",
"c",
"cross-platform",
"makefile",
"emacs"
]
I thought that since I am using the keyword type I could easily do exact search terms, as it is not supposed to use any analyzer.
Apparently I was wrong! This gives me results:
body.query.bool.must.push({term: {"_all": "c"}}); # 38 results
But this doesn't:
body.query.bool.must.push({term: {"_all": "c++"}}); # 0 results
Although there are obviously instances of this tag, as seen above.
If I use body.query.bool.must.push({match: {"_all": search}}); instead (match instead of term), then "c" and "c++" return the exact same results, which is wrong as well.
The problem here is that you are using the _all field, which uses an analyzer (standard by default). Make a small test with your data to be sure:
Test 1:
curl -X POST http://127.0.0.1:9200/script/test/_search \
-d '{
"query": {
"term" : { "_all": "c++"}
}
}'
Test 2:
curl -X POST http://127.0.0.1:9200/script/test/_search \
-d '{
"query": {
"term" : { "tags": "c++"}
}
}'
In my test, the second query returns documents; the first does not.
Do you really need to search over multiple fields? If so, you can override the default analyzer of the _all field. For a quick test, I created an index with settings like this:
{
"settings": {
"number_of_shards": 1,
"number_of_replicas": 0
},
"mappings": {
"test" : {
"_all" : {"type" : "string", "index" : "not_analyzed", "analyzer" : "keyword"},
"properties": {
"tags": {
"type": "keyword"
}
}
}
}
}
Or you can create a custom _all field.
Solutions like the multi-match query, which lets you define a list of fields to be searched over, would rather behave like your example with body.query.bool.must.push({match: {"_all": search}});.
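If the goal is to search several fields at once while keeping the tags field exact, one option is to name the fields explicitly instead of relying on _all. A sketch, using the index/type names from the tests above; the title field here is a hypothetical second field, not one from the question:

```shell
# Hedged sketch: a term query on the un-analyzed "tags" field combined
# with a match query on an analyzed field. Because term queries are not
# analyzed, "c++" matches the tag verbatim.
curl -X POST 'http://127.0.0.1:9200/script/test/_search' -d '{
  "query": {
    "bool": {
      "should": [
        { "term":  { "tags": "c++" } },
        { "match": { "title": "c++" } }
      ],
      "minimum_should_match": 1
    }
  }
}'
```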

Elasticsearch match exact term

I have an Elasticsearch repo and an application that creates documents for what we call 'assets'. I need to prevent users from creating 'assets' with the same 'title'.
When the user tries to create an 'asset' I am querying the repo with the title and if there is a match an error message is shown to the user.
My problem is that when I query the title I get multiple results (for similar matches).
This is my query so far:
GET assets-1/asset/_search
{
"query": {
"match": {
"title": {
"query": "test",
"operator": "and"
}
}
}
}
I have many records with title: 'test 1', 'test 2', 'test bla' and only one with the title 'test'.
But I am getting all of the above.
Is there any condition or property I have to add to the query so I will exact match the term?
Your title field is probably analyzed, and thus the test token will match any title containing that token.
In order to implement an exact match, you need a not_analyzed field and a term query on it.
You need to change the mapping of your title field to this:
curl -XPUT localhost:9200/assets-1/_mapping/asset -d '{
"asset": {
"properties": {
"title": {
"type": "string",
"fields": {
"raw": {
"type": "string",
"index": "not_analyzed"
}
}
}
}
}
}'
Then you need to reindex your data, after which you'll be able to run an exact match query like this:
curl -XPOST localhost:9200/assets-1/asset/_search -d '{
"query": {
"term": {
"title.raw": "test"
}
}
}'

How to avoid cross object search behavior with nested types in elastic search

I am trying to determine the best way to index a document in Elasticsearch. I have a document, Doc, which has some fields:
Doc
created_at
updated_at
field_a
field_b
But Doc will also have some fields specific to individual users. For example, field_x will have value 'A' for user 1, and field_x will have value 'B' for user 2. For each doc, there will be a very limited number of users (typically 2, up to ~10). When a user searches on field_x, they must search on the value that belongs to them. I have been exploring nested types in ES.
Doc
created_at
updated_at
field_x: [{
user: 1
field_x: A
},{
user: 2
field_x: B
}]
When user 1 searches on field_x for value 'A', this doc should result in a hit. However, it should not when user 1 searches by value 'B'.
However, according to the docs:
One of the problems when indexing inner objects that occur several
times in a doc is that “cross object” search match will occur
Is there a way to avoid this behavior with nested types or should I explore another type?
Additional information regarding the performance of such queries would be very valuable. The docs state that nested queries are not too different in performance from regular queries. If anyone has real experience with this, I would love to hear it.
Nested type is what you are looking for, and don't worry too much about performance.
Before indexing your documents, you need to set the mapping for your documents:
curl -XDELETE localhost:9200/index
curl -XPUT localhost:9200/index
curl -XPUT localhost:9200/index/type/_mapping -d '{
"type": {
"properties": {
"field_x": {
"type": "nested",
"include_in_parent": false,
"include_in_root": false,
"properties": {
"user": {
"type": "string"
},
"field_x": {
"type": "string",
"index" : "not_analyzed" // NOTE*
}
}
}
}
}
}'
*Note: if your field really contains only single letters like "A" and "B", you don't want to analyze the field; otherwise Elasticsearch will remove these single-letter "words".
If that was just your example and your real documents contain proper words, remove this line and let Elasticsearch analyze the field.
Then, index your documents:
curl -XPUT http://localhost:9200/index/type/1 -d '
{
"field_a": "foo",
"field_b": "bar",
"field_x" : [{
"user" : "1",
"field_x" : "A"
},
{
"user" : "2",
"field_x" : "B"
}]
}'
And run your query:
curl -XGET localhost:9200/index/type/_search -d '{
"query": {
"nested" : {
"path" : "field_x",
"score_mode" : "avg",
"query" : {
"bool" : {
"must" : [
{
"term": {
"field_x.user": "1"
}
},
{
"term": {
"field_x.field_x": "A"
}
}
]
}
}
}
}
}';
This will result in
{"took":13,"timed_out":false,"_shards":{"total":5,"successful":5,"failed":0},"hits":{"total":1,"max_score":1.987628,"hits":[{"_index":"index","_type":"type","_id":"1","_score":1.987628, "_source" :
{
"field_a": "foo",
"field_b": "bar",
"field_x" : [{
"user" : "1",
"field_x" : "A"
},
{
"user" : "2",
"field_x" : "B"
}]
}}]}}
However, querying
curl -XGET localhost:9200/index/type/_search -d '{
"query": {
"nested" : {
"path" : "field_x",
"score_mode" : "avg",
"query" : {
"bool" : {
"must" : [
{
"term": {
"field_x.user": "1"
}
},
{
"term": {
"field_x.field_x": "B"
}
}
]
}
}
}
}
}';
won't return any results:
{"took":6,"timed_out":false,"_shards":{"total":5,"successful":5,"failed":0},"hits":{"total":0,"max_score":null,"hits":[]}}
