Elasticsearch integer range query is not working - elasticsearch

I have a field hcc_member_id of type integer, and I want to run a range query on it. I tried the queries given in the ES documentation, but they do not seem to work: no matter what the query is, it always returns the same response.
I think I am doing something wrong, but I cannot identify the problem. Any help is appreciated.

You should use POST instead of GET; otherwise your JSON body may be ignored (many HTTP clients drop the body of a GET request).
Furthermore, you should add a "query" field to your JSON:
(without "query" you will get something like No parser for element [range])
{
  "query": {
    "range": {
      "hcc_member_id": {
        "gte": 1000
      }
    }
  }
}

This query works for me.
EDIT: it works only with POST, not GET.
{
  "query" : {
    "range" : {
      "hcc_member_id" : {
        "gte" : 1000
      }
    }
  }
}
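As a rough illustration of the two points above (send the body with POST, and wrap the range clause in "query"), here is a minimal Python sketch that uses only the standard library. The host http://localhost:9200, the index name members, and the helper names are assumptions made for the example; only the field hcc_member_id comes from the question.

```python
import json
import urllib.request


def build_range_query(field, gte=None, lte=None):
    """Build a search body whose range clause is wrapped in "query"."""
    bounds = {}
    if gte is not None:
        bounds["gte"] = gte
    if lte is not None:
        bounds["lte"] = lte
    return {"query": {"range": {field: bounds}}}


def build_search_request(base_url, index, body):
    """Prepare (but do not send) a POST to <base_url>/<index>/_search."""
    return urllib.request.Request(
        url=f"{base_url}/{index}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical host and index name, used only for illustration.
body = build_range_query("hcc_member_id", gte=1000)
request = build_search_request("http://localhost:9200", "members", body)

# To run it against a live cluster, uncomment:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response)["hits"]["total"])
```

Building the request separately from sending it makes it easy to print request.data and check that the body really is what you intended before it goes over the wire.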

Related

When sending a script call request, move Data Raw content to URL

The simplest example:
GET /_search
{
  "from" : 0, "size" : 10,
  "query" : {
    "term" : { "user" : "kimchy" }
  }
}
Rewritten without a request body, as a URI search:
GET /_search?from=0&size=10&q=user:kimchy
Is it possible to rewrite a Search Template request like this one in the same way?
GET /_search/template
{
  "id": "sample_id_script",
  "params": {
    "gte": "2020-10-15 00:00:00",
    "lte": "2020-10-15 23:59:59"
  }
}
Yes, it's possible via the source query string parameter! You simply need to inline your JSON body and add the source_content_type=application/json query string parameter, and voilà!
GET /_search/template?source={"id": "sample_id_script","params": {"gte": "2020-10-15 00:00:00","lte": "2020-10-15 23:59:59"}}&source_content_type=application/json
Please note, though, that this is not the same concept as the example you're showing. In your example, you're hitting the _search endpoint and sending a query (i.e. using q=) expressed in the Lucene query syntax. It's basically the equivalent of what you would send in a query_string query.
The second case is different, because you're sending a search template via the _search/template endpoint. So even though the effect is the same (i.e. sending a payload via the query string), the semantics are different.
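The URL above is shown unencoded for readability; when the request is sent from code, the inlined JSON generally needs to be URL-encoded. The following is a minimal Python sketch of that step, using only the standard library. The host and the helper name template_search_url are assumptions for the example; the template id and parameters are the ones from the question.

```python
import json
from urllib.parse import parse_qs, quote, urlencode, urlsplit


def template_search_url(base_url, template_id, params):
    """Inline a search template call into the `source` query string parameter."""
    body = {"id": template_id, "params": params}
    query = urlencode(
        {
            "source": json.dumps(body, separators=(",", ":")),
            "source_content_type": "application/json",
        },
        quote_via=quote,  # encode spaces as %20 rather than "+"
    )
    return f"{base_url}/_search/template?{query}"


# Hypothetical host, used only for illustration.
url = template_search_url(
    "http://localhost:9200",
    "sample_id_script",
    {"gte": "2020-10-15 00:00:00", "lte": "2020-10-15 23:59:59"},
)
print(url)

# Decoding the query string gives back the original JSON payload.
decoded = parse_qs(urlsplit(url).query)
print(json.loads(decoded["source"][0]))
```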

Elasticsearch. Painless script to search based on the last result

Let's see if someone can shed some light on this one, which seems to be a little hard.
We need to correlate data from multiple indices and various fields. We are trying a Painless script.
Example:
We run a search in an index to gather the queueids of mails sent by someone@domain.
Once we have the queueids, we need to store them in an array and iterate over it, running new searches to gather data such as email receivers, spam checks, postfix results and so on.
Problem: how can we store the data from one search and use it later in a second search?
We are testing something like:
GET here_an_index/_search
{
  "query": {
    "bool" : {
      "must": [
        {
          "range": {
            "@timestamp": {
              "gte": "now-15m",
              "lte": "now"
            }
          }
        }
      ],
      "filter" : {
        "script" : {
          "script" : {
            "source" : "doc['postfix_from'].value == params.from; qu = doc['postfix_queueid'].value; return qu",
            "params" : {
              "from" : "someona@mdomain"
            }
          }
        }
      }
    }
  }
}
And, of course, it throws an error.
"doc['postfix_from'].value ...",
"^---- HERE"
So, in a nutshell: is there any way to execute a search for some field values based on a filter (like from:someone@domain) and use those values in later searches?
We have evaluated using script fields or nested documents, but for architectural reasons, and because of what those changes would entail, they cannot be used right now.
Thank you very much!

How can we do a case-insensitive cardinality aggregation?

We can use cardinality to get a distinct count on a field. However, cardinality is case sensitive, meaning that if we have emails like user@x.com, User@x.com and USER@x.com, these count as 3 emails, whereas I need them to count as a single email.
This is the aggregation I am using:
"aggs" : {
  "emails" : {
    "cardinality" : {
      "field" : "emails.keyword"
    }
  }
}
I would need something like:
"aggs" : {
  "emails" : {
    "cardinality" : {
      "field" : "emails.keyword",
      "casesensitive": false ????
    }
  }
}
How can we make a cardinality aggregation case insensitive?
Although I would go with Val's suggestion, here is a query that may be useful if you do not control the mapping. It uses a custom script in the cardinality aggregation.
Aggregation Query:
POST <your_index_name>/_search
{
  "size":0,
  "aggs":{
    "email_count":{
      "cardinality":{
        "script":{
          "source":"doc['emails.keyword'].toString().toLowerCase()"
        }
      }
    }
  }
}
Note that you would find more details on Scripting in the aforementioned link.
Hope this helps!
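To see why lowercasing inside the script changes the count, here is a toy Python model of a distinct count. It is not Elasticsearch code, only a sketch of the idea: the script normalizes each value before it is counted. Note also that the real cardinality aggregation is approximate, while this toy version is exact. The function name distinct_count is made up for the example.

```python
def distinct_count(values, case_sensitive=True):
    """Count distinct values, optionally lowercasing each one first."""
    keys = values if case_sensitive else (value.lower() for value in values)
    return len(set(keys))


emails = ["user@x.com", "User@x.com", "USER@x.com"]

print(distinct_count(emails))                        # 3
print(distinct_count(emails, case_sensitive=False))  # 1
```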

Elasticsearch query returns 10 when expecting > 10,000

I want to retrieve all the JSON objects in Elasticsearch that have a null value for awsKafkaTimestamp. This is the query I have set up:
{
  "query": {
    "bool": {
      "must_not": {
        "exists": {
          "field": "tracer.awsKafkaTimestamp"
        }
      }
    }
  }
}
When I curl my Elasticsearch endpoint with this DSL, I only get a few values back. I am expecting all (10,000+) of them, because I know for sure that all the awsKafkaTimestamp values are null.
This is the response I get when I use Postman. As you can see, only 10 JSON objects are returned:
This is the expected behaviour of Elasticsearch. By default, it returns only 10 records and reports the total number of documents matching the search criteria in the hits.total field. To retrieve more than 10 documents, specify the size field in your query as shown below (you can read more about it here: https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-from-size.html):
{
  "from" : 0, "size" : 10,
  "query" : {
    "term" : { "user" : "kimchy" }
  }
}
By default Elasticsearch will give you 10 results, even if the query matches 10212 documents. You can set the size parameter, but from + size is limited to 10000 by default (the index.max_result_window setting), so to retrieve everything you should use the scroll API.
Example from the Elasticsearch site, Scroll API:
curl -XGET 'localhost:9200/twitter/tweet/_search?scroll=1m' -d '
{
  "query": {
    "match" : {
      "title" : "elasticsearch"
    }
  }
}
'
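The curl call above only opens the scroll and returns the first page. A rough Python sketch of the full loop follows, using only the standard library: it keeps posting the returned _scroll_id to _search/scroll until a page comes back empty. The helper names, the page size of 1000 and the injectable post argument are assumptions made for the example; the endpoints and the scroll / scroll_id body fields are the ones the scroll API uses.

```python
import json
import urllib.request


def http_post_json(url, body):
    """Default transport: POST a JSON body and decode the JSON response."""
    request = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def scroll_all(base_url, index, query, page_size=1000, keep_alive="1m",
               post=http_post_json):
    """Yield every hit matching `query`, fetching one scroll page at a time."""
    page = post(
        f"{base_url}/{index}/_search?scroll={keep_alive}",
        {"size": page_size, "query": query},
    )
    while page["hits"]["hits"]:
        yield from page["hits"]["hits"]
        # Ask for the next page using the scroll id from the previous one.
        page = post(
            f"{base_url}/_search/scroll",
            {"scroll": keep_alive, "scroll_id": page["_scroll_id"]},
        )


# Example usage against a live cluster (hypothetical host and index name):
# missing = {"bool": {"must_not": {"exists": {"field": "tracer.awsKafkaTimestamp"}}}}
# for hit in scroll_all("http://localhost:9200", "my_index", missing):
#     print(hit["_id"])
```

Passing the transport in as post keeps the loop itself easy to exercise with canned pages, without a running cluster.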

Filter facet returns count of all documents and not range

I'm using Elasticsearch and Nest to create a query for documents within a specific time range as well as doing some filter facets. The query looks like this:
{
  "facets": {
    "notfound": {
      "query": {
        "term": {
          "statusCode": {
            "value": 404
          }
        }
      }
    }
  },
  "filter": {
    "bool": {
      "must": [
        {
          "range": {
            "time": {
              "from": "2014-04-05T05:25:37",
              "to": "2014-04-07T05:25:37"
            }
          }
        }
      ]
    }
  }
}
In this specific case, the total hits of the search is 21 documents, which matches the documents within that time range in Elasticsearch. But the "notfound" facet returns 38, which matches the total number of ErrorDocuments with a StatusCode value of 404.
As I understand the documentation, facets collect data from within the search. In this case, the "notfound" facet should never be able to return a count higher than 21.
What am I doing wrong here?
There's a distinct difference between filter, query, filtered query and facet filter, which is good to know.
Top level filter
{
  filter: {}
}
This acts as a post-filter, meaning it filters the results after the query phase has ended. Since facets are part of the query phase, a top-level filter does not influence the documents that are faceted over. Filters do not alter the score and are therefore very cacheable.
Top level query
{
  query: {}
}
Queries influence the score of a document and are therefore less cacheable than filters. Queries run in the query phase and thus also influence the documents that are faceted over.
Filtered query
{
  query: {
    filtered: {
      filter: {},
      query: {}
    }
  }
}
This allows you to run filters in the query phase, taking advantage of their better cacheability while having them influence the documents that are faceted over.
Facet filter
"facets" : {
  "<FACET NAME>" : {
    "<FACET TYPE>" : {
      ...
    },
    "facet_filter" : {
      "term" : { "user" : "kimchy"}
    }
  }
}
This allows you to apply a filter to the documents that the facet is run over. Remember that it will be a combination of the query phase and the facet filter unless you also specify global: true on the facet.
Query Facet/Filter Facet
{
  "facets" : {
    "wow_facet" : {
      "query" : {
        "term" : { "tag" : "wow" }
      }
    }
  }
}
This is the one that @thomasardal is using in this case, which is perfectly fine: it's a facet type that returns a single value, the query hit count.
Your query facet returns 38 and not 21 because you use a top-level filter for your time range.
You can fix this either by moving the filter into a filtered query, so that it runs in the query phase, or by applying a facet filter (not a filter facet) to your query facet. Because filters are cached better, you are better off using a facet filter inside a filter facet.
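The ordering described above can be mimicked with a toy in-memory model in Python. This is not Elasticsearch or NEST code; it only sketches the rule that facets are computed over the query-phase documents while a top-level filter is applied afterwards. The function names and the sample documents are made up, with counts chosen so that the totals match the 21 and 38 from the question.

```python
def run_search(docs, query=None, post_filter=None, facet_query=None):
    """Toy model: facets see query-phase docs; the top-level filter runs after."""
    def matches(doc, condition):
        return condition is None or condition(doc)

    query_phase = [doc for doc in docs if matches(doc, query)]
    facet_count = sum(1 for doc in query_phase if matches(doc, facet_query))
    hits = [doc for doc in query_phase if matches(doc, post_filter)]
    return len(hits), facet_count


def in_range(doc):
    return doc["in_range"]


def is_404(doc):
    return doc["statusCode"] == 404


# Made-up data: 21 documents inside the time range, 38 with status 404 overall.
docs = (
    [{"statusCode": 404, "in_range": True}] * 15
    + [{"statusCode": 500, "in_range": True}] * 6
    + [{"statusCode": 404, "in_range": False}] * 23
)

# Time range as a top-level filter: the facet ignores it.
print(run_search(docs, post_filter=in_range, facet_query=is_404))  # (21, 38)

# Time range moved into the query phase: the facet is restricted too.
print(run_search(docs, query=in_range, facet_query=is_404))        # (21, 15)
```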
Confusingly, filter facets are specified using .FacetFilter() on the search object. I will change this in 1.0 to avoid future confusion.
Sadly, .FacetFilter() and .FacetQuery() in NEST do not allow you to specify a facet filter like you can with other facets:
var results = typedClient.Search<object>(s => s
    .FacetTerm(ft => ft
        .OnField("myfield")
        .FacetFilter(f => f.Term("filter_facet_on_this_field", "value"))
    )
);
Your issue here is that you are performing a Filter Facet and not a normal facet on your query (which would follow the restrictions applied via the query filter). In the JSON, the issue is the "query" between the facet name "notfound" and the "term" entry. This tells Elasticsearch to run it as a separate query and facet on the results of that separate query, not on your main query with the date range filter. So your JSON should look like the following:
{
  "facets": {
    "notfound": {
      "term": {
        "statusCode": {
          "value": 404
        }
      }
    }
  },
  "filter": {
    "bool": {
      "must": [
        {
          "range": {
            "time": {
              "from": "2014-04-05T05:25:37",
              "to": "2014-04-07T05:25:37"
            }
          }
        }
      ]
    }
  }
}
Since I see you have this tagged with NEST as well: in your NEST call you are probably using FacetFilter on your search request. Switch this to just Facet to get the desired result.
