Converting from SQL to elasticsearch query - elasticsearch

I'm an Elasticsearch noob and need help with a query. I have the following SQL query that I need to convert to an Elasticsearch query:
SELECT COUNT(*)
FROM table
WHERE Message LIKE '%Communication has failed.%'
AND [Date] > CONVERT( CHAR(8), GetDate(), 112) + ' 07:40:00'
AND [Date] < CONVERT( CHAR(8), GetDate(), 112) + ' 22:15:00'
I want to run the query against Elasticsearch using curl and need help composing it.
[Date] corresponds to #timestamp in the Elasticsearch document. It would also be nice if the Elasticsearch query syntax had an equivalent of GetDate() for the current date.

The easiest way to get a count of all results is to use the hits field in the result set. If you want to do that, your query would be:
POST /my_index/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match_phrase": {
            "message": "Communication has failed"
          }
        },
        {
          "range": {
            "my_date_field": {
              "gte": "01/01/2018",
              "lt": "01/02/2018",
              "format": "dd/MM/yyyy"
            }
          }
        }
      ]
    }
  },
  "size": 0
}
Notice I am doing a range query on the date and a match_phrase query on the message. A plain match query does not interpret * wildcards and, by default, matches documents containing any one of the words, so match_phrase is the closer equivalent of LIKE '%...%' on an analyzed text field.
I also set size to zero because I only want to get back the hits values. The result would look like this:
{
"took": 0,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 346,
"max_score": 0,
"hits": []
}
}
You could also use an aggregation, but in this case you would aggregate on a single field and the result would be the same as the hits value. Think of an aggregation as the GROUP BY clause in SQL: if you GROUP BY only one group, then that group's count equals the value of COUNT(*). If you are interested in learning more about aggregations, the documentation is here:
https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-metrics-valuecount-aggregation.html
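For the time-of-day window in the original SQL, Elasticsearch date math can play the role of GetDate(): now/d rounds the current time down to the start of the day, and offsets such as +7h+40m can be appended. The following is a minimal sketch under assumptions, not a definitive answer. The field names @timestamp and message are guesses at the asker's mapping, build_count_query is a hypothetical helper, and gte/lt stand in for the strict > and < of the SQL because date-math rounding behaves differently for gt and lte. Rounding is done in UTC unless a time_zone is given in the range clause.

```python
import json


def build_count_query(phrase, date_field="@timestamp", message_field="message",
                      start="now/d+7h+40m", end="now/d+22h+15m"):
    """Build a search body that counts phrase matches inside today's time window.

    The field names and the default window are assumptions for illustration.
    """
    return {
        "size": 0,  # no documents wanted, only hits.total
        "query": {
            "bool": {
                # Phrase match: the closest analogue of LIKE '%...%' on a text field.
                "must": [{"match_phrase": {message_field: phrase}}],
                # Date math: start of today plus an offset, evaluated at query time.
                "filter": [{"range": {date_field: {"gte": start, "lt": end}}}],
            }
        },
    }


body = build_count_query("Communication has failed.")
print(json.dumps(body, indent=2))
```

The printed JSON can be saved to a file and sent with curl, for example: curl -XPOST 'http://localhost:9200/my_index/_search' -H 'Content-Type: application/json' -d @body.json, where the host and index name are placeholders.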

Related

elastic range query giving only top 10 records

I am using an Elasticsearch range query from Python to get the records between two dates, but I am only getting 10 records.
Below is the query
{"query": {"range": {"date": {"gte":"2022-01-01 01:00:00", "lte":"2022-10-10 01:00:00"}}}}
Sample Output:
{
"took": 12,
"timed_out": false,
"_shards": {
"total": 8,
"successful": 8,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 10000,
"relation": "gte"
},
"max_score": 1.0,
"hits": [{"_source": {}}]
}
}
The "hits" list contains only 10 records. When I checked my database, there were more than 10 records.
Can anyone tell me how to modify the query to get all the records for the mentioned date ranges?
You need to use the size param as by default Elasticsearch returns only 10 results.
{
  "size": 100, // return up to 100 results
  "query": {
    "range": {
      "date": {
        "gte": "2022-01-01 01:00:00",
        "lte": "2022-10-10 01:00:00"
      }
    }
  }
}
Refer to the Elasticsearch documentation for more info.
Update 1: Since the OP wants to know the exact count of records matching the search query (see the comments section), one can add track_total_hits=true to the URL (note: this can hurt performance). The search response then reports the exact number of matching records under hits, as shown:
POST /_search?track_total_hits=true
"hits": {
"total": {
"value": 24981859, // note count with relation equal.
"relation": "eq"
},
"max_score": null,
"hits": []
}
Update 2: The OP mentioned in the comments that he is getting only 10K records in one query. As mentioned in the chat, this is restricted by Elasticsearch for performance reasons, but if you still want to change it, you can do so by changing the following index setting:
index.max_result_window: 20000 // your desired count
However, as suggested in the documentation:
index.max_result_window The maximum value of from + size for searches
to this index. Defaults to 10000. Search requests take heap memory and
time proportional to from + size and this limits that memory. See
Scroll or Search After for a more efficient alternative to raising
this.
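The Search After approach that the quoted documentation recommends can be sketched as follows. This is an illustrative sketch, not the official client API: iter_all_hits and fake_search are hypothetical names, and the search argument stands for whatever function sends a request body to the _search endpoint and returns the parsed JSON response. It assumes the sort includes a field with unique values, so that no documents are skipped between pages. fake_search is an in-memory stand-in so the example runs without a cluster.

```python
from typing import Any, Callable, Dict, Iterator, List

SearchFn = Callable[[Dict[str, Any]], Dict[str, Any]]


def iter_all_hits(search: SearchFn, query: Dict[str, Any],
                  sort: List[Dict[str, Any]], page_size: int = 1000) -> Iterator[Dict[str, Any]]:
    """Yield every hit by paging with search_after instead of one huge size."""
    body: Dict[str, Any] = {"size": page_size, "query": query, "sort": sort}
    while True:
        hits = search(body)["hits"]["hits"]
        if not hits:
            return
        yield from hits
        # Each hit carries the sort values it was ordered by; the last
        # hit's values become the cursor for the next page.
        body = {**body, "search_after": hits[-1]["sort"]}


def fake_search(body: Dict[str, Any]) -> Dict[str, Any]:
    """Stand-in for a real _search call: 25 documents sorted by an integer key."""
    docs = [{"_id": str(i), "sort": [i]} for i in range(25)]
    after = body.get("search_after", [-1])[0]
    page = [d for d in docs if d["sort"][0] > after][: body["size"]]
    return {"hits": {"hits": page}}


ids = [hit["_id"] for hit in iter_all_hits(
    fake_search, {"match_all": {}}, [{"date": "asc"}], page_size=10)]
print(len(ids))
```

Because each request asks for only page_size documents, from + size never approaches index.max_result_window, so the setting does not need to be raised.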

Get All records From Elastic Search without size

My query is something like this:
{
  "from": 0, "size": 100, "track_total_hits": true,
  "query": {"bool": {"filter": [{
    "bool": {
      "must_not": {
        "exists": {"field": "deleted_at"}
      }
    }
  }]}},
  "sort": [{"added_at": {"order": "desc"}}]
}
If I don't specify size, it returns only 10 records, and I don't know how many records there are. Is there a way to retrieve all the data at once, or at least to get the count?
hits.total.value gives you the total number of documents matching the search query.
track_total_hits defaults to 10,000. If more than 10,000 documents match, the relation changes to gte instead of eq.
Refer to the official documentation to learn more about hits.total.
"hits": {
"total": {
"value": 11, // note this
"relation": "eq"
},
"max_score": 0.0,
"hits": [
{
Elasticsearch returns data according to the size option. If you want to fetch all the data,
you should paginate with the scroll API.
You can follow this link
What version of ES are you using? With ES 7.x, the count of matching documents is present in the response under hits -> total -> value
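Responses in this thread show hits.total in two shapes: a plain number in older versions and an object with value and relation in 7.x. A small helper can normalise both. This is a sketch only; total_hits is a hypothetical name, and it assumes the response has already been parsed into a Python dict.

```python
from typing import Any, Dict, Tuple


def total_hits(response: Dict[str, Any]) -> Tuple[int, bool]:
    """Return (count, is_exact) from a parsed _search response."""
    total = response["hits"]["total"]
    if isinstance(total, dict):
        # 7.x shape: relation is "eq" for an exact count and "gte" when the
        # count is only a lower bound (tracking stopped at the threshold).
        return total["value"], total["relation"] == "eq"
    # Older shape: a bare integer, which is always exact.
    return int(total), True


print(total_hits({"hits": {"total": {"value": 10000, "relation": "gte"}}}))
print(total_hits({"hits": {"total": 346}}))
```

When is_exact is False, re-running the search with track_total_hits set to true gives the exact figure.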

Get uniq results from elastic search

I want to get a single record for a given date. If there are multiple records, each request should return a different unique record (if there is only one record, it can return the same result).
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "site_name": "blog_new_post"
          }
        },
        {
          "match": {
            "postdate_yyyymmdd": "20190715"
          }
        }
      ]
    }
  },
  "size": 1
}
I tried using size, but it sometimes returns the same record.
{
"took": 152,
"timed_out": false,
"_shards": {
"total": 1180,
"successful": 1180,
"failed": 0
},
"hits": {
"total": 6624,
"max_score": 3.6852486,
"hits": [
{
"_index": "some-*",
"_type": "data-*",
"_id": "8a9e351e92e6b9b26c8d8fb0173cadd9",
"_score": 3.6852486,
"_source": {
"uniq_id": "8a9e351e92e6b9b26c8d8fb0173cadd9",
"postdate_yyyymmdd": "20190715"
}
}]
}
}
Uniqueness is based on uniq_id, which is different for every record.
For this you have to use an aggregation on your results.
As you said you want records based on uniq_id, I am assuming that you might have multiple records for one id in your index and that you want to return only one record for each uniq_id.
You can refer to the Elasticsearch aggregations documentation.
You can sort the records by newest and fetch top_hits in your aggregation. Setting the size of top_hits to 1 gets you one record per uniq_id.
Refer to the documentation for aggregation instructions:
Top_Hits Aggregation
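The terms plus top_hits combination described above could look like the following request body, built here in Python. It is a sketch under assumptions: one_per_key is a hypothetical helper, uniq_id is assumed to be a keyword field, created_at is an assumed sort field (the question does not show a timestamp), and the bucket count of 100 is arbitrary.

```python
import json


def one_per_key(key_field, sort_field, filters, max_keys=100):
    """Build a body that returns the newest document for each distinct key."""
    return {
        "size": 0,  # the documents come back inside the aggregation instead
        "query": {"bool": {"filter": filters}},
        "aggs": {
            "by_key": {
                # One bucket per distinct value of key_field.
                "terms": {"field": key_field, "size": max_keys},
                "aggs": {
                    "latest": {
                        # Keep only the single newest document in each bucket.
                        "top_hits": {
                            "size": 1,
                            "sort": [{sort_field: {"order": "desc"}}],
                        }
                    }
                },
            }
        },
    }


body = one_per_key(
    "uniq_id",
    "created_at",  # assumed field name
    [
        {"match": {"site_name": "blog_new_post"}},
        {"match": {"postdate_yyyymmdd": "20190715"}},
    ],
)
print(json.dumps(body, indent=2))
```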

Significant terms buckets always empty

I have a collection of posts with their tags imported into Elasticsearch. The indexed fields are:
language - type: keyword
tags (array) - type: keyword
created_at - type: date
A single document looks like this:
{ "language": "en", "tags": ["foo", "bar"], "created_at": "..." }
I'm trying to run a significant terms aggregation on my data set using:
GET _search
{
"aggregations": {
"significant_tags": {
"significant_terms": {
"field": "tags"
}
}
}
}
But the result buckets are always empty:
{
"took": 22,
"timed_out": false,
"_shards": {
"total": 6,
"successful": 6,
"skipped": 0,
"failed": 0
},
"aggregations": {
"significant_tags": {
"doc_count": 2945,
"bg_count": 2945,
"buckets": []
}
}
}
I can confirm the data is properly imported, as I'm able to run any other aggregation on this dataset and it works fine. Only the significant terms aggregation doesn't want to cooperate. Any ideas on what I might be doing wrong here?
Elasticsearch 6.2.4
Significant terms needs a foreground query or aggregation so that it can calculate the difference in term frequencies and produce statistically significant results. So you will need to add an initial query and then your aggregation. See the docs for details: https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-significantterms-aggregation.html
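In the response shown in the question, doc_count and bg_count are both 2945, meaning the foreground set is the whole index and no tag can stand out against the background. A request that narrows the foreground first might be built like this. It is a sketch only: significant_tags_body is a hypothetical helper, and filtering on language "en" is just an example foreground taken from the sample document.

```python
import json


def significant_tags_body(foreground_query, field="tags"):
    """Build a significant_terms request whose foreground is foreground_query."""
    return {
        "size": 0,  # only the aggregation result is of interest
        # The query selects the foreground set; the background defaults to
        # every document in the index.
        "query": foreground_query,
        "aggregations": {
            "significant_tags": {
                "significant_terms": {"field": field}
            }
        },
    }


body = significant_tags_body({"term": {"language": "en"}})
print(json.dumps(body, indent=2))
```

With a foreground smaller than the index, doc_count drops below bg_count, and tags that are unusually frequent in the foreground can appear as buckets.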

Check if document exists or not in elasticsearch

I want to check whether a document with a specific field value exists in Elasticsearch.
I have searched the internet but only found how to check whether a field exists.
My Index/Type is
/twitter/user
username is one field in document.
I want to check if username="xyz" exists or not in this Type.
You can query with size 0. The total value tells you whether a matching document exists.
GET /twitter/user/_search
{
  "size": 0,
  "query": {
    "match": {
      "username": "xyz"
    }
  }
}
Edited --
The _count API can be used as well.
GET /twitter/user/_count
{
  "query": {
    "match": {
      "username": "xyz"
    }
  }
}
From the documentation:
If all you want to do is to check whether a document exists—you’re not interested in the content at all—then use the HEAD method instead of the GET method. HEAD requests don’t return a body, just HTTP headers:
curl -i -XHEAD http://localhost:9200/twitter/user/userid
Elasticsearch will return a 200 OK status code if the document exists ... and a 404 Not Found if it doesn’t exist
Note: userid is the value of the _id field.
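Simply search for the document; if it exists, the search returns a result, otherwise it returns none:
http://127.0.0.1:9200/twitter/user/_search?q=username:xyz
Exactly what you are looking for is the following (note that the _search/exists endpoint is only available in older Elasticsearch releases; it was removed in 5.0):
http://127.0.0.1:9200/twitter/user/_search/exists?q=username:xyz
It returns exists as true or false: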
{
"exists": false
}
You can use a term query with size 0. See the query below for reference:
POST twitter/user/_search
{
"query": {
"term": {
"username": "xyz"
}
},
"size" : 0
}
Response:
{
"took": 1,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 1,
"max_score": 0,
"hits": []
}
}
You will get the total document count in hits.total, and you can then check whether it is greater than 0.
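The _count variant mentioned earlier in this thread can be wrapped in two small helpers. This is a minimal sketch: count_body and exists_from_count are hypothetical names, the response is assumed to be the parsed JSON of a _count call, and the term query assumes username is indexed as an exact (keyword) value.

```python
from typing import Any, Dict


def count_body(field: str, value: str) -> Dict[str, Any]:
    """Build the request body for a _count call on an exact field value."""
    return {"query": {"term": {field: value}}}


def exists_from_count(count_response: Dict[str, Any]) -> bool:
    """Interpret a parsed _count response: any match means the value exists."""
    return count_response["count"] > 0


print(count_body("username", "xyz"))
print(exists_from_count({"count": 1, "_shards": {"total": 5, "successful": 5, "failed": 0}}))
```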

Resources