In Elasticsearch, how to use a range query on a text field?

There is a 'remark' field in an Elasticsearch index that contains various remarks, each with the date on which it was given. For example:
remark
------
14/02/2023 To be updated ; 15/02/2023 Further action is needed ; 16/02/2023 Looks good
For implementation-specific reasons, I can't split the date out into a separate field. I need to query all the records whose 'remark' field matches a given date range, for example all records in the date range 15/02/2023 to 16/02/2023.
I have written the following query in Elasticsearch:
GET myindex/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "range": {
            "remark": {
              "gte": "2023-02-15",
              "lte": "2023-02-16"
            }
          }
        }
      ]
    }
  },
  "highlight": {
    "fields": {
      "content": {
        "type": "unified",
        "fragment_size": 150,
        "number_of_fragments": 3,
        "pre_tags": [""],
        "post_tags": [""]
      }
    }
  },
  "size": 1000
}
The above query doesn't work since the field 'remark' is not of type datetime. Is there any workaround to this issue?

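Yes, a range query can be run against text or keyword fields, but Elasticsearch treats it as an expensive query. The setting that controls this, search.allow_expensive_queries, defaults to true, so the query is only rejected on clusters where it has been switched off. From the documentation:
Using the range query with text and keyword fields
Range queries on text or keyword fields will not be executed if search.allow_expensive_queries is set to false.
I wouldn't recommend relying on it, but if the setting has been disabled on your cluster you can turn it back on: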
PUT _cluster/settings
{
"transient": {
"search.allow_expensive_queries": "true"
}
}
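After you update the cluster settings the query will execute. Keep in mind that a range on a text field compares the indexed terms as strings, not as dates, so it will not behave like a true date range over dd/MM/yyyy values.
Recommendation: add a new date field: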
PUT index_name/_mapping
{
"properties": {
"remark_date": {
"type": "date"
}
}
}
Then update the data. An update by query can add the new field and its value to each document:
POST index_name/_update_by_query?wait_for_completion=false
https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-range-query.html#ranges-on-text-and-keyword
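The recommendation above does not show how the remark_date values would be derived. As a rough sketch, assuming the remarks always use the dd/MM/yyyy layout shown in the question, the dates could be extracted on the client side before the documents are updated. The function name extract_remark_dates is hypothetical and not part of any Elasticsearch API.

```python
import re
from datetime import datetime
from typing import List

# Matches dd/MM/yyyy tokens such as 14/02/2023 (layout assumed from the question).
DATE_PATTERN = re.compile(r"\b(\d{2}/\d{2}/\d{4})\b")


def extract_remark_dates(remark: str) -> List[str]:
    """Return every dd/MM/yyyy date found in `remark` as an ISO yyyy-MM-dd string."""
    iso_dates = []
    for raw in DATE_PATTERN.findall(remark):
        try:
            parsed = datetime.strptime(raw, "%d/%m/%Y")
        except ValueError:
            continue  # skip impossible dates such as 31/02/2023
        iso_dates.append(parsed.date().isoformat())
    return iso_dates


remark = "14/02/2023 To be updated ; 15/02/2023 Further action is needed ; 16/02/2023 Looks good"
print(extract_remark_dates(remark))
```

A date field accepts an array of values, so all extracted dates can be stored in remark_date, and a range query on that field matches when any one of them falls inside the range.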

Related

How to find similar documents in Elasticsearch

My documents are made up of various fields. Given an input document, I want to find similar documents using the input document's fields. How can I achieve this?
{
"query": {
"more_like_this" : {
"ids" : ["12345"],
"fields" : ["field_1", "field_2"],
"min_term_freq" : 1,
"max_query_terms" : 12
}
}
}
This returns documents similar to the document with id 12345. You only need to specify the ids and the field names (title, category, name, etc.), not their values.
Here is another way that works without ids, but you need to supply the text to compare against. Example: get documents whose title is similar to:
elasticsearch is fast
{
"query": {
"more_like_this" : {
"fields" : ["title"],
"like" : "elasticsearch is fast",
"min_term_freq" : 1,
"max_query_terms" : 12
}
}
}
You can add more fields and their values.
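The two variants above can also be combined in a single request body. The following is a minimal sketch, assuming a recent Elasticsearch version in which like accepts both free text and document references of the form {"_index": ..., "_id": ...}. The helper name build_mlt_query and the index name my_index are hypothetical.

```python
import json
from typing import Any, Dict, List, Optional


def build_mlt_query(
    fields: List[str],
    like_texts: Optional[List[str]] = None,
    like_doc_ids: Optional[List[str]] = None,
    index: str = "my_index",  # hypothetical index name
    min_term_freq: int = 1,
    max_query_terms: int = 12,
) -> Dict[str, Any]:
    """Build a more_like_this request body from free text and/or document ids."""
    like: List[Any] = list(like_texts or [])
    # Existing documents are referenced by index and id rather than by value.
    like += [{"_index": index, "_id": doc_id} for doc_id in (like_doc_ids or [])]
    if not like:
        raise ValueError("provide at least one text or document id to compare against")
    return {
        "query": {
            "more_like_this": {
                "fields": fields,
                "like": like,
                "min_term_freq": min_term_freq,
                "max_query_terms": max_query_terms,
            }
        }
    }


body = build_mlt_query(["title"], like_texts=["elasticsearch is fast"], like_doc_ids=["12345"])
print(json.dumps(body, indent=2))
```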
You haven't mentioned the types of your fields. A general approach is to use a catch all field (using copy_to) with the more like this query.
{
"query": {
"more_like_this" : {
"fields" : ["first name", "last name", "address", "etc"],
"like" : "your_query",
"min_term_freq" : 1,
"max_query_terms" : 12
}
}
}
Put everything in your_query. You can increase or decrease min_term_freq and max_query_terms to tune the results.

Elasticsearch slow results with IN query and Scoring

I have text document data (approximately 500k documents) saved in Elasticsearch, where each document's text is mapped to its corresponding document number.
I am trying to fetch results in batches for "Sample Text" within a particular set of document numbers (approximately 300k), with scoring, and I am facing extreme slowness.
Here is the mapping:
PUT my_index
{
"mappings" : {
"doc_repo" : {
"properties" : {
"doc_number" : {
"type" : "integer"
},
"document" : {
"type" : "string",
"term_vector" : "with_positions_offsets_payloads"
}
}
}
}
}
Here is the request query
{
"query" : {
"bool" : {
"must" : [
{
"terms" : {
"document" : [
"sample text"
]
}
},
{
"terms" : {
"doc_number" : [1,2,3....,300K] //ArrayOf_300K_DocNumbers
}
}
]
}
},
"fields" : [
"doc_number"
],
"size" : 500,
"from" : 0
}
I tried fetching results in two other ways:
Results without scoring, restricted to a particular set of document numbers (I used filtering for this)
Results with scoring, but not restricted to any particular set of document numbers (in batches)
Both of these were pretty quick; the problem only appears when I try to achieve both at once.
Do I need to change the mapping, the search query, or something else to achieve this?
Thanks in advance.
The issue was specific to Elasticsearch 2.x; upgrading Elasticsearch solved it.
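Independently of the upgrade, the two fast variants described in the question suggest a request shape worth trying: score only the text clause and move the document-number restriction into filter context, which does not contribute to the score. This is a sketch under assumptions, not the fix reported above. It uses match instead of the question's terms clause on the analyzed document field, and the helper name build_scored_subset_query is hypothetical.

```python
import json
from typing import Any, Dict, Iterable


def build_scored_subset_query(
    text: str,
    doc_numbers: Iterable[int],
    size: int = 500,
    offset: int = 0,
) -> Dict[str, Any]:
    """Score on the text only; restrict to the given doc numbers without scoring them."""
    return {
        "query": {
            "bool": {
                # Scoring clause: relevance is computed from the text match alone.
                "must": [{"match": {"document": text}}],
                # Filter clause: narrows the result set, adds nothing to the score.
                "filter": [{"terms": {"doc_number": list(doc_numbers)}}],
            }
        },
        "_source": ["doc_number"],
        "size": size,
        "from": offset,
    }


body = build_scored_subset_query("sample text", [1, 2, 3])
print(json.dumps(body, indent=2))
```

Recent Elasticsearch versions cap the number of values in a terms query through index.max_terms_count (65536 by default), so a list of 300K numbers may need to be split into several requests or the limit raised.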

elasticsearch percolator filter fails

I'm using a document query against a percolator, and that works fine. When I try to restrict which percolator queries the document is percolated against by filtering on the query ids, no results are returned. For example:
{
"doc" : {
"text" : "This is the text within my document"
},
"highlight" : {
"order" : "score",
"pre_tags" : ["<example>"],
"post_tags" : ["</example>"],
"fields" : {
"text" : { "number_of_fragments" : 0 }
}
},
"filter" : { "ids" : { "values" : [11, 15] } },
"size" : 100
}
I know for sure that those ids are correct, but I always get "matches" : [ ]. When I don't use the filter, ES retrieves the correct matches.
Thanks for your help.
I think I've solved it. It seems that the filter only works on the "metadata" fields, meaning that you have to add customized fields to the queries indexed in the percolator in order to use them to filter when you need.
Using my previous example, I would have to index in percolator queries like:
{
"query" : {
"match_phrase" : {
"text" : "document"
}
},
"id" : 11
}
This manually adds a redundant id field to each percolator query so that it can be used later as a filter reference.
At percolation time, you have to use something like:
{
"doc" : {
"text" : "This is the text within my document"
},
"filter":{"match":{"id":11}},
"highlight" : {
"order" : "score",
"pre_tags" : ["<example>"],
"post_tags" : ["</example>"],
"fields" : {
"text" : { "number_of_fragments" : 0 }
}
},
"size" : 100
}
This restricts percolation to that single percolator query. Complementary information can be found here.

How to make use of `gt` and `fields` in the same query in Elasticsearch

In my previous question, I was introduced to the fields in a query_string query and how it can help me to search nested fields of a document.
{
"query": {
"query_string": {
"fields": ["*.id","id"],
"query": "2"
}
}
}
But it only works for matching, what if I want to do some comparison? After some reading and testing, it seems queries like range do not support fields. Is there any way I can perform a range query, e.g. on a date, over a field that can be scattered anywhere in the document hierarchy?
i.e. considering the following document:
{
"id" : 1,
"Comment" : "Comment 1",
"date" : "2016-08-16T15:22:36.967489",
"Reply" : [ {
"id" : 2,
"Comment" : "Inner comment",
"date" : "2016-08-16T16:22:36.967489"
} ]
}
Is there a query searching over the date field (like date > '2016-08-16T16:00:00.000000') which matches the given document, because of the nested field, without explicitly giving the address to Reply.date? Something like this (I know the following query is incorrect):
{
"query": {
"range" : {
"date" : {
"gte" : "2016-08-16T16:00:00.000000",
},
"fields": ["date", "*.date"]
}
}
}
The range query itself doesn't support it, however, you can leverage the query_string query (again) and the fact that you can wildcard fields and that it supports range queries in order to achieve what you need:
{
"query": {
"query_string": {
"query": "\\*date:[2016-08-16T16:00:00.000Z TO *]"
}
}
}
The above query will return your document because Reply.date matches *date.
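When the request body is built in code rather than typed by hand, the escaping needs care: query_string expects a backslash before the * in the field pattern, and JSON in turn requires that backslash to be doubled. A small sketch of this, where the helper name build_wildcard_range_query is hypothetical:

```python
import json
from typing import Any, Dict


def build_wildcard_range_query(field_pattern: str, lower: str, upper: str = "*") -> Dict[str, Any]:
    """Build a query_string range query over every field matching `field_pattern`."""
    # query_string syntax needs a literal backslash in front of * in a field name.
    escaped_pattern = field_pattern.replace("*", "\\*")
    return {
        "query": {
            "query_string": {
                "query": f"{escaped_pattern}:[{lower} TO {upper}]"
            }
        }
    }


body = build_wildcard_range_query("*date", "2016-08-16T16:00:00.000Z")
# json.dumps doubles the backslash, producing the form Elasticsearch expects on the wire.
print(json.dumps(body))
```

Serializing with a JSON library avoids having to think about the second level of escaping at all.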

Elastic Search - Filter value by latest date

In Elasticsearch we have a document with different values. Each value has a period, which tells whether the value is still current and over which interval it applies.
Example below:
"": [
  {
    "memberType": "user",
    "period": {
      "validFrom": "1964-08-23",
      "validTo": "2008-12-31"
    }
  },
  {
    "memberType": "admin",
    "period": {
      "validFrom": "2008-12-31",
      "validTo": null
    }
  }
]
In our query, I want to filter by memberType, but only consider the member's most recent type. So if I filter by memberType "user", the document above should not match, because the current memberType is admin.
In the above example, I could use a boolean filter that combines memberType with a missing filter on the validTo field.
But if the person is no longer valid at all, both entries will have a validTo date defined, and I then have to look at the newest date.
How can I achieve that? I'm thinking of a nested query or a custom script filter, but I don't know how to express the query.
Thanks in advance.
Provided that your field name for this array is memberDetails and that it is mapped as nested, you can use this query to achieve what you need. Fields inside a nested clause are referenced by their full path.
{
  "query" : { "match_all" : {} },
  "filter" : {
    "nested" : {
      "path" : "memberDetails",
      "filter" : {
        "bool" : {
          "must" : [
            { "term" : { "memberDetails.memberType" : "user" } },
            { "missing" : { "field" : "memberDetails.period.validTo" } }
          ]
        }
      }
    }
  }
}
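The query above uses the older top-level filter and the missing filter, which later Elasticsearch versions no longer accept. A rough equivalent for newer versions is sketched below, assuming memberDetails is mapped as nested and memberType is indexed as an exact (keyword) value. The helper name build_current_member_query is hypothetical. Like the query above, it only covers the case where the current entry has no validTo; it does not pick the newest validTo when every entry is closed.

```python
import json
from typing import Any, Dict


def build_current_member_query(member_type: str, path: str = "memberDetails") -> Dict[str, Any]:
    """Match documents whose open-ended (no validTo) nested entry has the given memberType."""
    return {
        "query": {
            "bool": {
                "filter": [
                    {
                        "nested": {
                            "path": path,
                            "query": {
                                "bool": {
                                    # Both conditions must hold on the SAME nested entry.
                                    "filter": [
                                        {"term": {f"{path}.memberType": member_type}}
                                    ],
                                    # must_not + exists replaces the old missing filter.
                                    "must_not": [
                                        {"exists": {"field": f"{path}.period.validTo"}}
                                    ],
                                }
                            },
                        }
                    }
                ]
            }
        }
    }


print(json.dumps(build_current_member_query("user"), indent=2))
```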
