When sending a script call request, move Data Raw content to URL - elasticsearch

The simplest example:
GET /_search
{
"from" : 0, "size" : 10,
"query" : {
"term" : { "user" : "kimchy" }
}
}
Rewrite without data raw Search URI:
GET /_search?from=0&size=10&q=user:kimchy
Is it possible to rewrite the example for Search Template like this:
GET /_search/template
{
"id": "sample_id_script",
"params": {
"gte": "2020-10-15 00:00:00",
"lte": "2020-10-15 23:59:59"
}
}

Yes, it's possible via the source query string parameter!! You simply need to inline your JSON body and add the other &source_content_type=application/json query string parameter, and voilĂ !
GET /_search/template?source={"id": "sample_id_script","params": {"gte": "2020-10-15 00:00:00","lte": "2020-10-15 23:59:59"}}&source_content_type=application/json
Please note, though, that it's not the same concept as the example you're showing. In your example, we're hitting the _search endpoint and sending a query (i.e. using q=) expressed in the Lucene Expression language. It's basically the equivalent of what you would send in a query_string query.
The second case is different, because you're sending a search template via the _search/template endpoint. So even though the effect is the same (i.e. sending a payload via the query string), the concept semantic is different.

Related

Elasticsearch 7 number_format_exception for input value as a String

I have field in index with mapping as :
"sequence_number" : {
"type" : "long",
"copy_to" : [
"_custom_all"
]
}
and using search query as
POST /my_index/_search
{
"query": {
"term": {
"sequence_number": {
"value": "we"
}
}
}
}
I am getting error message :
,"index_uuid":"FTAW8qoYTPeTj-cbC5iTRw","index":"my_index","caused_by":{"type":"number_format_exception","reason":"For input string: \"we\""}}}]},"status":400}
at org.elasticsearch.client.RestClient.convertResponse(RestClient.java:260) ~[elasticsearch-rest-client-7.1.1.jar:7.1.1]
at org.elasticsearch.client.RestClient.performRequest(RestClient.java:238) ~[elasticsearch-rest-client-7.1.1.jar:7.1.1]
at org.elasticsearch.client.RestClient.performRequest(RestClient.java:212) ~[elasticsearch-rest-client-7.1.1.jar:7.1.1]
at org.elasticsearch.client.RestHighLevelClient.internalPerformRequest(RestHighLevelClient.java:1433) ~[elasticsearch-rest-high-level-client-7.1.1.jar:7.1.1]
at
How can i ignore number_format_exception errors, so the query just doesn't return anything or ignores this filter in particular - either is acceptable.
Thanks in advance.
What you are looking for is not possible, ideally, you should have coherce enabled on your numeric fields so that your index doesn't contain dirty data.
The best solution is that in your application which generated the Elasticsearch query(you should have a check for NumberFormatExcepton if you are searching for numeric fields as your index doesn't contain the dirty data in the first place and reject the query if you get an exception in your application).
Edit: Another interesting approach is to validate the data before inserting into ES, using the Validate API as suggested by #prakash, only thing is that it would add another network call but if your application is not latency-sensitive, it can be used as a workaround.

Elasticsearch integer range query is not working

I have field hcc_member_id as of Integer type. I want to perform range query on this field. I tried queries given in the ES documentation, but it does not seem to work. No matter what the query is it always returns same response.
I think I am doing things in a wrong way but not able to identify the problem. Any help is good.
You should use POST instead of GET. Otherwise your Json will be ignored.
Furtermore you should add a "query" field to our json:
(without query you will get something like No parser for element [range]])
{
"query": {
"range": {
"hc_member_id": {
"gte": 1000
}
}
}
}
this is a working (for me) query
//EDIT // IT WORK ONLY IN POST NOT GET
{
"query" : {
"range" : {
"hcc_member_id" : {
"gte" : 1000
}
}
}
}

Elasticsearch bulk or search

Background
I am working on an API that allows the user to pass in a list of details about a member (name, email addresses, ...) I want to use this information to match up with account records in my Elasticsearch database and return a list of potential matches.
I thought this would be as simple as doing a bool query on the fields I want, however I seem to be getting no hits.
I'm relatively new to Elasticsearch, my current _search request looks like this.
Example Query
POST /member/account/_search
{
"query" : {
"filtered" : {
"filter" : {
"bool" : {
"should" [{
"term" : {
"email": "jon.smith#gmail.com"
}
},{
"term" : {
"email": "samy#gmail.com"
}
},{
"term" : {
"email": "bo.blog#gmail.com"
}
}]
}
}
}
}
}
Question
How should I update this query to return records that match any of the email addresses?
Am I able to prioritise records that match email and another field? Example "family_name".
Will this be a problem if I need to do this against a few hundred emails addresses?
Well , you need to make the change in the index side rather than query side.
By default your email ID is broken into
jon.smith#gmail.com => [ jon , smith , gmail , com]
While indexing.
Now when you are searching using term query , it does not apply the analyzer and it tries to get the exact match of jon.smith#gmail.com , which as you can see , wont work.
Even if you use match query , then you will end up getting all document as matches.
Hence you need to change the mapping to index email ID as a single token , rather than tokenizing it.
So using not_analyzed would be the best solution here.
When you define email field as not_analyzed , the following happens while indexing.
jon.smith#gmail.com => [ jon.smith#gmail.com]
After changing the mapping and indexing all your documents , now you can freely run the above query.
I would suggest to use terms query as following -
{
"query": {
"terms": {
"email": [
"jon.smith#gmail.com",
"samy#gmail.com",
"bo.blog#gmail.com"
]
}
}
}
To answer the second part of your question - You are looking for boosting and would recommend to go through function score query

Elasticsearch doesn't return results

I am facing a strange issue in elasticsearch query. I don't know much about elasticsearch. My query is:
{
"query":
{
"bool":
{
"must":
[
{
"text":
{
"countryCode2":"DE"
}
}
],
"must_not":[],
"should":[]
}
},"from":0,"size":1,"sort":[],"facets":{}
}
The issues is for "DE". It is giving me results but for "BE" or "IN" it returns empty result.
You are indexing using the default mapping, which by default removes english stopwords. The country codes "IN", "BE", and many more are stopwords which don't even get indexed, therefore it's not possible to have matching documents, nor get back those country codes when faceting on that field.
The solution is to reindex after having submitted your own mapping for the country code field:
{
"your_type_name" : {
"country" : {
"type" : "string", "index" : "not_analyzed"
}
}
}
If you already tried to do this but nothing changed, the mapping didn't get submitted properly. I would suggest to double check that its json structure is correct and that you can actually get it back using the get mapping api.
As this is a common problem the defaults are probably going to change in the future to be less intrusive and avoid applying any language dependent text analysis.

How to perform a date range elasticsearch query given multiple dates per document?

I'm using ElasticSearch to index forum threads and reply posts. Each post has a date field associated with it. I'd like to perform a query that includes a date range which will return threads that contain posts matching a date range. I've looked at using a nested mapping but the docs say the feature is experimental and may lead to inaccurate results.
What's the best way to accomplish this? I'm using the Java API.
You haven't said much about your data structure, but I'm inferring from your question that you have post objects which contain a date field, and presumably a thread_id field, ie some way of identifying which thread a post belongs to?
Do you also have a thread object, or is your thread_id sufficient?
Either way, your stated goal is to return a list of threads which have posts in a particular date range. This means that you need to group your threads (rather than returning the same thread_id multiple times for each post in the date range).
This grouping can be done by using facets.
So the query in JSON would look like this:
curl -XGET 'http://127.0.0.1:9200/posts/post/_search?pretty=1&search_type=count' -d '
{
"facets" : {
"thread_id" : {
"terms" : {
"size" : 20,
"field" : "thread_id"
}
}
},
"query" : {
"filtered" : {
"query" : {
"text" : {
"content" : "any keywords to match"
}
},
"filter" : {
"numeric_range" : {
"date" : {
"lt" : "2011-02-01",
"gte" : "2011-01-01"
}
}
}
}
}
}
'
Note:
I'm using search_type=count because I don't actually want the posts returned, just the thread_ids
I've specified that I want the 20 most frequently encountered thread_ids (size: 20). The default would be 10
I'm using a numeric_range for the date field because dates typically have many distinct values, and the numeric_range filter uses a different approach to the range filter, making it perform better in this situation
If your thread_ids look like how-to-perform-a-date-range-elasticsearch-query then you can use these values directly. But if you have a separate thread object, then you can use the multi-get API to retrieve these
your thread_id field should be mapped as { "index": "not_analyzed" } so that the whole value is treated as a single term, rather than being analyzed into separate terms

Resources