To Select documents having same startDate and endDate - elasticsearch

Each of my documents has a startDate and an endDate date field. I need all documents in which these two values are the same, but I couldn't find a query that does this.

Elasticsearch supports script filters, which you can use in this case.
Something like this is what you will need:
POST /<yourIndex>/<yourType>/_search
{
"query": {
"filtered": {
"filter": {
"script": {
"script": "doc['startDate'].value == doc['endDate'].value"
}
}
}
}
}
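The filtered query used above was deprecated in Elasticsearch 2.0 and removed in 5.0. On later versions the same filter is usually written as a bool query with a filter clause. The following is a minimal Python sketch that only builds the request body as a dict. The helper name same_date_query is hypothetical, and the Painless expression assumes a recent version where date doc values are date-time objects, so it may need adjusting for your version.

```python
import json


def same_date_query(start_field="startDate", end_field="endDate"):
    """Build a search body matching docs whose two date fields are equal."""
    # Assumption: doc[...].value is a date-time object on recent versions,
    # so both sides are converted to epoch milliseconds before comparing.
    source = (
        f"doc['{start_field}'].value.toInstant().toEpochMilli() == "
        f"doc['{end_field}'].value.toInstant().toEpochMilli()"
    )
    return {
        "query": {
            "bool": {
                # Filter clauses restrict the result set without scoring.
                "filter": [
                    {"script": {"script": {"lang": "painless", "source": source}}}
                ]
            }
        }
    }


body = same_date_query()
print(json.dumps(body, indent=2))
```

The printed JSON can be sent as the body of a search request against your index.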

This can be achieved in two ways:
Index solution - While indexing, add an additional field called isDateSame and set it to true or false depending on whether startDate and endDate are equal. You can then simply query on that field. This is the most efficient solution.
Script solution - Elasticsearch keeps the indexed data in field data, which works like the reverse of the inverted index (document to values rather than value to documents). Using a script you can access any indexed field and do the comparison. This is pretty fast, but not as fast as the first option. A script filter that compares the two fields, like the query above, does this.
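The index solution can be sketched roughly as follows. This is only an illustration: it assumes the dates arrive as ISO 8601 strings, and the helper name with_is_date_same is hypothetical. Only the field name isDateSame comes from the answer above.

```python
from datetime import datetime


def with_is_date_same(doc):
    """Return a copy of doc with an isDateSame flag added before indexing."""
    start = datetime.fromisoformat(doc["startDate"])
    end = datetime.fromisoformat(doc["endDate"])
    enriched = dict(doc)
    enriched["isDateSame"] = start == end
    return enriched


# Term query on the precomputed flag; no script is needed at search time.
SAME_DATE_QUERY = {"query": {"term": {"isDateSame": True}}}

docs = [
    {"startDate": "2015-03-01T00:00:00", "endDate": "2015-03-01T00:00:00"},
    {"startDate": "2015-03-01T00:00:00", "endDate": "2015-03-02T00:00:00"},
]
enriched_docs = [with_is_date_same(d) for d in docs]
print([d["isDateSame"] for d in enriched_docs])  # [True, False]
```

The trade-off is that the flag has to be recomputed whenever either date changes.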

Related

Exact match over decimal values

I want to perform an exact match over decimal values.
I have submitted two applications, the first with an annual salary of 99999868.10 and the other with 99999868.99.
When I query for 99999868 or for 99999868.10, both documents are returned, whereas I expect only the exact match.
The query I am executing is:
GET index/_search
{"query": {
"term": {
"Annual Salary": {
"value": "99999868"
}
}
}
}
Change the mapping of the salary field to a numeric type and reindex the data.
Numeric type reference: https://www.elastic.co/guide/en/elasticsearch/reference/current/number.html
Alternatively, try a match_phrase query and see whether it solves your problem.
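The choice of numeric type matters here. One possible explanation for both documents matching (an assumption, since the question does not show the mapping) is that the field was stored as a 32-bit float, which cannot distinguish these values. The sketch below demonstrates the rounding in plain Python and then shows a hypothetical mapping using scaled_float with a scaling factor of 100 to keep two decimal places.

```python
import struct


def to_float32(value):
    """Round a 64-bit Python float to the nearest 32-bit float."""
    return struct.unpack("f", struct.pack("f", value))[0]


salaries = [99999868.10, 99999868.99]
query_value = 99999868.0

# Near 1e8 a 32-bit float can only represent multiples of 8,
# so the two salaries and the query value collapse to one number.
as_float32 = [to_float32(s) for s in salaries]
print(as_float32, to_float32(query_value))

# As 64-bit doubles the two salaries remain distinct.
print(salaries[0] == salaries[1])  # False

# Hypothetical mapping (typeless, as in Elasticsearch 7+).
salary_mapping = {
    "mappings": {
        "properties": {
            "Annual Salary": {"type": "scaled_float", "scaling_factor": 100}
        }
    }
}
```

A double mapping would also keep the two values apart. scaled_float is shown only because salaries have a fixed number of decimal places.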

Select and Update all matching Documents

We are trying to do the following and any help would be appreciated.
Say you make a search and 100,000 documents match.
We would like to increment a counter in each document that matched and, at the same time, select the first page, say the first 50 documents.
Can this be done in one operation, or perhaps in parallel?
You could try a multi search request for this kind of operation:
https://www.elastic.co/guide/en/elasticsearch/reference/current/search-multi-search.html
Basically, you add several queries to a single multi search request, which is parallelized on the ES side and returns one response per query.
You can use update by query through NEST. Let me know if you still face any issues.
You can use update by query to do this:
POST /<yourIndex>/_update_by_query
{
"script": {
// fieldName is the field you want to increment in each matching document
"source": "ctx._source.fieldName += params.val",
"lang": "painless",
"params": {
"val": 1
}
},
"query": {
// your match condition
}
}
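Since the update and the page fetch are two separate requests, one way to keep them consistent is to build both bodies from the same match condition. The sketch below only constructs the bodies; the helper names, the counter field, and the placeholder match condition are all hypothetical, and it assumes the counter field already exists in every matching document.

```python
def increment_counter_body(query, field="counter", step=1):
    """Body for POST <index>/_update_by_query that adds step to field."""
    return {
        "script": {
            "lang": "painless",
            "source": f"ctx._source.{field} += params.step",
            "params": {"step": step},
        },
        "query": query,
    }


def first_page_body(query, page_size=50):
    """Body for POST <index>/_search returning the first page of hits."""
    return {"from": 0, "size": page_size, "query": query}


# Placeholder condition; replace with your real query.
match_condition = {"match": {"title": "example"}}
update_body = increment_counter_body(match_condition)
search_body = first_page_body(match_condition)
```

Both bodies share the same query object, so the documents that get incremented are the ones the search pages through.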

Get records for particular day of the week in ElasticSearch

I have an ES cluster that has some summarized numerical data such that there is exactly 1 record per day. I want to write a query that will return the documents for a specific day of the week. For example, all records for Tuesdays. Currently I am doing this by getting all records for the required date range and then filtering out the ones for the day that I need. Is there a way to do that with a query?
You can do it using a script like this (here 2 means Tuesday, counting Monday as 1):
POST my_index/_search
{
"query": {
"script": {
"script": {
"source": "doc.my_date.value.dayOfWeek == 2"
}
}
}
}
If you're going to run this query often, you would probably be better off creating another field, dayOfWeek, in your documents that holds the day of the week, which you can then query with a simple term query. That would be more efficient than a script.
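A rough sketch of that precomputed field, assuming my_date is an ISO date string and using Monday = 1 numbering to match the script above. The helper names and the total field are hypothetical.

```python
from datetime import date


def with_day_of_week(doc, date_field="my_date"):
    """Return a copy of doc with an ISO day-of-week number (Monday = 1)."""
    enriched = dict(doc)
    enriched["dayOfWeek"] = date.fromisoformat(doc[date_field]).isoweekday()
    return enriched


def day_of_week_query(day):
    """Term query on the precomputed dayOfWeek field."""
    return {"query": {"term": {"dayOfWeek": day}}}


record = with_day_of_week({"my_date": "2024-01-02", "total": 17})
print(record["dayOfWeek"])  # 2, since 2024-01-02 is a Tuesday
tuesdays = day_of_week_query(2)
```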

ElasticSearch 2.4 date range histogram using the difference between two date fields

I haven't been able to find anything about this for ES 2.*, either here or in the docs, so sorry if this is a duplicate.
What I am trying to do is create an aggregation in an ElasticSearch query that creates buckets based on the difference between 2 date fields in a record.
For example, if I had data in ES for a shop, I might like to see the time difference between a purchase_date field and a shipped_date field.
So in that instance I'd want to create an aggregate that had buckets to give me the hits for when shipped_date - purchase_date is < 1 day, 1-2 days, 3-4 days or 5+ days.
Ideally I was hoping this was possible in an ES query. Is that the case or would the best approach be to process the results into my own array based on the time difference for each hit?
I was able to achieve this by using the built-in expression language, which is enabled by default in ES 2.4. The functionality I wanted was to group my results by the difference between EndDate and DateProcessed in increments of 15 days. The relevant part of the query is:
{
...,
"aggs": {
"reason": {
"date_histogram": {
"min_doc_count": 1,
"interval": "1296000000ms", // 15 days
"format": "epoch_millis",
"script": {
"lang": "expression",
"inline": "doc['DateProcessed'] > doc['EndDate'] ? doc['DateProcessed'] - doc['EndDate'] : -1"
}
}
...
}
}
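To make the bucket arithmetic concrete, here is a small Python sketch of the same idea: take the difference in milliseconds (or -1 when DateProcessed is not after EndDate) and floor it to a 15-day boundary. It illustrates the calculation only and is not how Elasticsearch implements the aggregation; the function name and sample dates are made up.

```python
from datetime import datetime, timedelta

FIFTEEN_DAYS_MS = 15 * 24 * 60 * 60 * 1000  # 1296000000, the interval used above


def bucket_key(end_date, date_processed):
    """Return the start (in ms) of the 15-day bucket for one record."""
    end = datetime.fromisoformat(end_date)
    processed = datetime.fromisoformat(date_processed)
    if processed > end:
        diff_ms = (processed - end) // timedelta(milliseconds=1)
    else:
        diff_ms = -1  # same sentinel as the expression script
    # Floor the difference to the start of its 15-day bucket.
    return (diff_ms // FIFTEEN_DAYS_MS) * FIFTEEN_DAYS_MS


print(bucket_key("2016-01-01", "2016-01-11"))  # 10 days -> 0
print(bucket_key("2016-01-01", "2016-01-21"))  # 20 days -> 1296000000
```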

Query that works on difference of dates

Consider a doc which has createdDate and closedDate. I want to find all docs where (closedDate - createdDate) > 2 (in days). I am not able to use a script in a range query. Any clue how to proceed?
I think this may be possible using scripts, but isn't there any way to do it with a plain query?
Isn't there a way to do something like this:
{
"range" : {
"date" : {
"gt" : "{createdDate} - {closedDate}/d > 2"
}
}
}
The only way to do that with a plain query is to index an additional duration field into your JSON document beforehand. Personally, I would store the duration in milliseconds and use filters for queries.
If this is not acceptable, you will have to use script fields, which are described in the Elasticsearch documentation.
IMO saving the duration in each document is preferable, especially if you frequently use the duration for further analysis. The additional field does not cost much memory, and it removes the need for calculations at query time (and is therefore likely to speed up queries). Memory in particular shouldn't be a big issue in Elasticsearch.
Yes, you can do this via script
{
"query": {
"bool": {
"filter": [
{
"script": {
"script": "(doc.closedDate.value - doc.createdDate.value)/86400000 > 2"
}
}
]
}
}
}
Note: make sure to enable dynamic scripting in order to try this.
However, it'd be best to already compute that difference at indexing time and then use a range query on that difference field.
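Computing the difference at indexing time might look roughly like this. The sketch assumes ISO 8601 date strings; the field name durationMillis and both helper names are hypothetical.

```python
from datetime import datetime, timedelta

MILLIS_PER_DAY = 86400000


def with_duration(doc):
    """Return a copy of doc with closedDate - createdDate in milliseconds."""
    created = datetime.fromisoformat(doc["createdDate"])
    closed = datetime.fromisoformat(doc["closedDate"])
    enriched = dict(doc)
    enriched["durationMillis"] = (closed - created) // timedelta(milliseconds=1)
    return enriched


def longer_than_days_query(days):
    """Range filter on the precomputed duration; no script required."""
    threshold = days * MILLIS_PER_DAY
    return {
        "query": {
            "bool": {"filter": [{"range": {"durationMillis": {"gt": threshold}}}]}
        }
    }


ticket = with_duration(
    {"createdDate": "2016-05-01T09:00:00", "closedDate": "2016-05-04T09:00:00"}
)
print(ticket["durationMillis"])  # 259200000, i.e. 3 days
query = longer_than_days_query(2)
```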
