how to use filter in ElasticSearch? - elasticsearch

I'm trying to implement filter using ElasticSearch I'm simply want to implement range filter I've the following data:
{
"result": [
{
"Id": "144039",
"posted_dt": 1506951883637,
"submit_dt": 1507609800000,
"title": "Request for Information (RFI) # 306-18-0018",
"fname": "RODRI",
"email": "",
"desc": "dummy Text"
}
]
}
I want to get data from last 3 or 5 days I'm using this :
query = {
"bool": {
"must": [
{
"range" : {
"posted_dt" : {
"gte" : "now-3d/d",
"lt" : "now/d"
}
}
} ]
}
}
My mapping for posted_dt is :
"posted_dt": {
"type": "long"
},
I did try the filter as well but didn't succeed.
Please help.
Thanks
Randheer

Your mapping of "posted_dt" field is incorrect. You intend to store date which is in epoch in millis but you are storing it as long type. So the date range filter won't work on long datatype. Update your "posted_dt" field's mapping like :
PUT my_index
{
"mappings": {
"my_type": {
"properties": {
"posted_dt": {
"type": "date",
"format": "epoch_millis"
}
}
}
}
}
Refer Date datatype in Elasticsearch.

First you need to share your mapping. Actually make sure that posted_dt and submit_dt are defined as date in your mapping. Here you are using a long which is incorrect to deal with dates.
A side note is that you should use filter instead of must in your case. Will be faster IMO.

Related

Elastic Search Date Range Query

I am new to elastic search and I am struggling with date range query. I have to query the records which fall between some particular dates.The JSON records pushed into elastic search database are as follows:
"messageid": "Some message id",
"subject": "subject",
"emaildate": "2020-01-01 21:09:24",
"starttime": "2020-01-02 12:30:00",
"endtime": "2020-01-02 13:00:00",
"meetinglocation": "some location",
"duration": "00:30:00",
"employeename": "Name",
"emailid": "abc#xyz.com",
"employeecode": "141479",
"username": "username",
"organizer": "Some name",
"organizer_email": "cde#xyz.com",
I have to query the records which has start time between "2020-01-02 12:30:00" to "2020-01-10 12:30:00". I have written a query like this :
{
"query":
{
"bool":
{
"filter": [
{
"range" : {
"starttime": {
"gte": "2020-01-02 12:30:00",
"lte": "2020-01-10 12:30:00"
}
}
}
]
}
}
}
This query is not giving results as expected. I assume that the person who has pushed the data into elastic search database at my office has not set the mapping and Elastic Search is dynamically deciding the data type of "starttime" as "text". Hence I am getting inconsistent results.
I can set the mapping like this :
PUT /meetings
{
"mappings": {
"dynamic": false,
"properties": {
.
.
.
.
"starttime": {
"type": "date",
"format":"yyyy-MM-dd HH:mm:ss"
}
.
.
.
}
}
}
And the query will work but I am not allowed to do so (office policies). What alternatives do I have so that I can achieve my task.
Update :
I assumed the data type to be "Text" but by default Elastic Search applies both "Text" and "Keyword" so that we can implement both Full Text and Keyword based searches. If it is also set as "Keyword" . Will this benefit me in any case. I do not have access to lots of stuff in the office that's why I am unable to debug the query.I only have the search API for which I have to build the query.
GET /meetings/_mapping output :
'
'
'
"starttime" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
}
'
'
'
Date range queries will not work on text field, for that, you have to use the date field
Since you are working on date fields , best practice is to use the date field.
I would suggest you to reindex your index to another index so that you can change the type of your text field to date field
Step1-: Create index2 using index1 mapping and make sure to change the type of your date field which is text to date type
Step 2-: Run the elasticsearch reindex and reindex all your data from index1 to index2. Since you have changed your field type to date field type. Elasticsearch will now recognize this field as date
POST _reindex
{
"source":{ "index": "index1" },
"dest": { "index": "index2" }
}
Now you can run your Normal date queries on index2
As #jzzfs suggested the idea is to add a date sub-field to the starttime field. You first need to modify the mapping like this:
PUT meetings/_mapping
{
"properties": {
"starttime" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
},
"date": {
"type" : "date",
"format" : "yyyy-MM-dd HH:mm:ss",
}
}
}
}
}
When done, you need to reindex your data using the update by query API so that the starttime.date field gets populated and index:
POST meetings/_update_by_query
When the update is done, you'll be able to leverage the starttime.date sub-field in your query:
{
"query": {
"bool": {
"filter": [
{
"range": {
"starttime.date": {
"gte": "2020-01-02 12:30:00",
"lte": "2020-01-10 12:30:00"
}
}
}
]
}
}
}
There are ways of parsing text fields as dates at search time but the overhead is impractical... You could, however, keep the starttime as text by default but make it a multi-field and query it using starttime.as_date, for example.

How to index date ranges with ElasticSearch 5.1

I have documents that I want to index/search with ElasticSearch. These documents may contain multiple dates, and in some cases, the dates are actually date ranges. I'm wondering if someone can help me figure out how to write a query that does the right thing (or how to properly index my document so I can query it).
An example is worth a thousand words. Suppose the document contains two marriage date ranges: 2005-05-05 to 2007-07-07 and 2012-12-012 to 2014-03-03.
If I index each date range in start and end date fields, and write a typical range query, then a search for 2008-01-01 will return this record because one marriage will satisfy one of the inequalities and the other will satisfy the other. I don't know how to get ES to keep the two date ranges separate. Obviously, having marriage1 and marriage2 fields would resolve this particular problem, but in my actual data set I have an unbounded number of dates.
I know that ES 5.2 supports the date_range data type, which I believe would resolve this issue, but I'm stuck with 5.1 because I'm using AWS's managed ES.
Thanks in advance.
You can use nested objects for this purpose.
PUT /records
{
"mappings": {
"record": {
"properties": {
"marriage": {
"type": "nested",
"properties": {
"start": { "type": "date" },
"end": { "type": "date" },
"person1": { "type": "string" },
"person2": { "type": "string" }
}
}
}
}
}
}
PUT /records/record/1
{
"marriage": [ { "start" : "2005-05-05","end" :"2007-07-07" , "person1" : "","person2" :"" },{"start": "2012-12-12","end": "2014-03-03","person1" : "","person2" :"" }]
}
POST /records/record/_search
{
"query": {
"nested": {
"path": "marriage",
"query": {
"range": {
"marriage.start": { "gte": "2008-01-01", "lte": "2015-02-03"}
}
}
}
}

How to specify or target a field from a specific document type in queries or filters in Elasticsearch?

Given:
Documents of two different types, let's say 'product' and 'category', are indexed to the same Elasticsearch index.
Both document types have a field 'tags'.
Problem:
I want to build a query that returns results of both types, but the documents of type 'product' are allowed to have tags 'X' and 'Y', and the documents of type 'category' are only allowed to have tag 'Z'. How can I achieve this? It appears I can't use product.tags and category.tags since then ES will look for documents' product/category field, which is not what I intend.
Note:
While for the example above there might be some kind of workaround, I'm looking for a general way to target or specify fields of a specific document type when writing queries. I basically want to 'namespace' the field names used in my query so only documents of the type I want to work with are considered.
I think field aliasing would be the best answer for you, but it's not possible.
Instead you can use "copy_to" but I it probably affects index size:
DELETE /test
PUT /test
{
"mappings": {
"product" : {
"properties": {
"tags": { "type": "string", "copy_to": "ptags" },
"ptags": { "type": "string" }
}
},
"category" : {
"properties": {
"tags": { "type": "string", "copy_to": "ctags" },
"ctags": { "type": "string" }
}
}
}
}
PUT /test/product/1
{ "tags":"X" }
PUT /test/product/2
{ "tags":"Y" }
PUT /test/category/1
{ "tags":"Z" }
And you can query one of fields or many of them:
GET /test/product,category/_search
{
"query": {
"term": {
"ptags": {
"value": "x"
}
}
}
}
GET /test/product,category/_search
{
"query": {
"multi_match": {
"query": "x",
"fields": [ "ctags", "ptags" ]
}
}
}

Elasticsearch mapping - different data types in same field

I am trying to to create a mapping that will allow me to have a document looking like this:
{
"created_at" : "2014-11-13T07:51:17+0000",
"updated_at" : "2014-11-14T12:31:17+0000",
"account_id" : 42,
"attributes" : [
{
"name" : "firstname",
"value" : "Morten",
"field_type" : "string"
},
{
"name" : "lastname",
"value" : "Hauberg",
"field_type" : "string"
},
{
"name" : "dob",
"value" : "1987-02-17T00:00:00+0000",
"field_type" : "datetime"
}
]
}
And the attributes array must be of type nested, and dynamic, so i can add more objects to the array and index it by the field_type value.
Is this even possible?
I have been looking at the dynamic_templates. Can i use that?
You actually can index multiple datatypes into the same field using a multi-field mapping and the ignore_malformed parameter, if you are willing to query the specific field type if you want to do type specific queries (like comparisons).
This will allow elasticsearch to populate the fields that are pertinent for each input, and ignore the others. It also means you don’t need to do anything in your indexing code to deal with the different types.
For example, for a field called user_input that you want to be able to do date or integer range queries over if that is what the user has entered, or a regular text search if the user has entered a string, you could do something like the following:
PUT multiple_datatypes
{
"mappings": {
"_doc": {
"properties": {
"user_input": {
"type": "text",
"fields": {
"numeric": {
"type": "double",
"ignore_malformed": true
},
"date": {
"type": "date",
"ignore_malformed": true
}
}
}
}
}
}
}
We can then add a few documents with different user inputs:
PUT multiple_datatypes/_doc/1
{
"user_input": "hello"
}
PUT multiple_datatypes/_doc/2
{
"user_input": "2017-02-12"
}
PUT multiple_datatypes/_doc/3
{
"user_input": 5
}
And when you search for these, and have ranges and other type-specific queries work as expected:
// Returns only document 2
GET multiple_datatypes/_search
{
"query": {
"range": {
"user_input.date": {
"gte": "2017-01-01"
}
}
}
}
// Returns only document 3
GET multiple_datatypes/_search
{
"query": {
"range": {
"user_input.numeric": {
"lte": 9
}
}
}
}
// Returns only document 1
GET multiple_datatypes/_search
{
"query": {
"term": {
"user_input": {
"value": "hello"
}
}
}
}
I wrote about this as a blog post here
No - you cannot have different datatypes for the same field within the same type.
e.g. the field index/type/value can not be both a string and a date.
A dynamic template can be used to set the datatype and analyzer based on the format of the field name
For example:
set all fields with field names ending in "_dt" to type datetime.
But this won't help in your scenario, once the datatype is set you can't change it.

Indexing a field for both phrase searching and partial matches

I am creating an index on an object, and wanting to be able to do both full phrase searches as well as partial matches. The type is called "deponent", and a simplified index creation is shown below:
{
"deponent": {
"properties": {
"name": {
"type": "multi_field",
"fields": {
"name": {
"type": "string"
},
"full": {
"type": "string",
"index": "not_analyzed",
"omit_norms": true,
"index_options": "docs",
"include_in_all": false
}
}
}
}
}
}
The intent of this is to index the values in the "name" field twice: once where the individual words within the field are not broken up (name.full) and once where the words are broken up (name.name).
I have a document which has been indexed whose name field is set to "Dr. Danny Watson". I would expect the following behaviors to occur when executing a term query (whose query string is not analyzed according to the documentation):
When searching name.full using "Dr. Danny Watson", the record
should be returned
When searching name.full using "Watson", the record should not be returned
When searching name.name using "Dr. Danny Watson", the record should not be returned
When searching name.name using "Watson", the record should be returned
The queries for the four points above:
1 - works as expected (returns the record)
{
"query" : {
"term": {
"name.full": {
"value": "Dr. Danny Watson"
}
}
}
}
2 - works as expected (does not return the record)
{
"query" : {
"term": {
"name.full": {
"value": "Watson"
}
}
}
}
3 - works as expected (does not return the record)
{
"query" : {
"term": {
"name.name": {
"value": "Dr. Danny Watson"
}
}
}
}
4 - does NOT work as expected - the record is not returned
{
"query" : {
"term": {
"name.name": {
"value": "Watson"
}
}
}
}
So it seems my understanding of something is flawed. What am I missing?
You don't need to call the field "name.name". The multi-field with the original name is used as the default, so you should use just "name" for that.
Also it's always good to make sure the index and search analyzers are in order (so for instance both your indexed terms and the search term are changed to lower case).

Resources