Scan And Scroll Query In ElasticSearch - elasticsearch

I am using ES 1.5. I ran the scan/scroll query below; it works fine and returns a scroll ID:
GET /ecommerce_parts/_search?search_type=scan&scroll=3m
{
"query": { "match_all": {}},
"size": 1000
}
{
"_scroll_id": "c2Nhbjs1OzE4Okk4ckVkSld0UXdDU212UU1kNWZBU1E7MTc6SThyRWRKV3RRd0NTbXZRTWQ1ZkFTUTsxNjpJOHJFZEpXdFF3Q1NtdlFNZDVmQVNROzIwOkk4ckVkSld0UXdDU212UU1kNWZBU1E7MTk6SThyRWRKV3RRd0NTbXZRTWQ1ZkFTUTsxO3RvdGFsX2hpdHM6MTg0NjEwOw==",
"took": 1,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 184610,
"max_score": 0,
"hits": []
}
}
Now, when I pass the scroll ID to retrieve the set of documents, it throws an error:
GET /_search/scroll?scroll=1m
c2Nhbjs1OzE4Okk4ckVkSld0UXdDU212UU1kNWZBU1E7MTc6SThyRWRKV3RRd0NTbXZRTWQ1ZkFTUTsxNjpJOHJFZEpXdFF3Q1NtdlFNZDVmQVNROzIwOkk4ckVkSld0UXdDU212UU1kNWZBU1E7MTk6SThyRWRKV3RRd0NTbXZRTWQ1ZkFTUTsxO3RvdGFsX2hpdHM6MTg0NjEwOw==
{
"error": "ElasticsearchIllegalArgumentException[Malformed scrollId []]",
"status": 400
}
GET
/_search/scroll?scroll=3mc2Nhbjs1OzIzOkk4ckVkSld0UXdDU212UU1kNWZBU1E7MjE6SThyRWRKV3RRd0NTbXZRTWQ1ZkFTUTsyNTpJOHJFZEpXdFF3Q1NtdlFNZDVmQVNROzI0Okk4ckVkSld0UXdDU212UU1kNWZBU1E7MjI6SThyRWRKV3RRd0NTbXZRTWQ1ZkFTUTsxO3RvdGFsX2hpdHM6MTg0NjEwOw==
Both of the above queries return an error. How do I retrieve the documents using the scroll ID? If I am doing this wrong, please suggest the correct query.

For subsequent requests, you're supposed to send the scroll ID in a JSON payload like this:
POST /_search/scroll
{
"scroll" : "1m",
"scroll_id" : "c2Nhbjs1OzE4Okk4ckVkSld0UXdDU212UU1kNWZBU1E7MTc6SThyRWRKV3RRd0NTbXZRTWQ1ZkFTUTsxNjpJOHJFZEpXdFF3Q1NtdlFNZDVmQVNROzIwOkk4ckVkSld0UXdDU212UU1kNWZBU1E7MTk6SThyRWRKV3RRd0NTbXZRTWQ1ZkFTUTsxO3RvdGFsX2hpdHM6MTg0NjEwOw=="
}
In earlier versions of ES you could do it like this, too:
POST /_search/scroll?scroll=1m&scroll_id=c2Nhbjs1OzE4Okk4ckVkSld0UXdDU212UU1kNWZBU1E7MTc6SThyRWRKV3RRd0NTbXZRTWQ1ZkFTUTsxNjpJOHJFZEpXdFF3Q1NtdlFNZDVmQVNROzIwOkk4ckVkSld0UXdDU212UU1kNWZBU1E7MTk6SThyRWRKV3RRd0NTbXZRTWQ1ZkFTUTsxO3RvdGFsX2hpdHM6MTg0NjEwOw==
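If you drive this from a script rather than Sense, the loop could look roughly like the sketch below. It assumes the JSON-payload form shown above; `send_scroll` is a hypothetical callable (not part of any client library) that POSTs the given body to /_search/scroll and returns the parsed response. Note that with search_type=scan the first response carries no hits, so the loop only stops once a later page comes back empty.

```python
from typing import Callable, Dict, Iterator


def iter_scroll_hits(
    first_page: Dict,
    send_scroll: Callable[[Dict], Dict],  # hypothetical: POSTs body to /_search/scroll
    keep_alive: str = "1m",
) -> Iterator[Dict]:
    """Yield every hit, following the scroll ID until a page comes back empty."""
    # With search_type=scan the initial response has an empty hits list.
    yield from first_page["hits"]["hits"]
    scroll_id = first_page["_scroll_id"]
    while True:
        page = send_scroll({"scroll": keep_alive, "scroll_id": scroll_id})
        hits = page["hits"]["hits"]
        if not hits:
            break
        yield from hits
        # Always use the most recent scroll ID returned by the server.
        scroll_id = page.get("_scroll_id", scroll_id)
```

Passing the transport in as a function keeps the sketch independent of whichever HTTP client you use.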

Related

elastic range query giving only top 10 records

I am using an Elasticsearch range query from Python to get the records between two dates, but I am only getting 10 records.
Below is the query:
{"query": {"range": {"date": {"gte":"2022-01-01 01:00:00", "lte":"2022-10-10 01:00:00"}}}}
Sample Output:
{
"took": 12,
"timed_out": false,
"_shards": {
"total": 8,
"successful": 8,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 10000,
"relation": "gte"
},
"max_score": 1.0,
"hits": [{"_source": {}}]
}
The "hits" list consists of only 10 records. When I checked in my database there are more than 10 records.
Can anyone tell me how to modify the query to get all the records for the mentioned date ranges?
You need to use the size param as by default Elasticsearch returns only 10 results.
{
"size": 100, // return up to 100 results
"query": {
"range": {
"date": {
"gte": "2022-01-01 01:00:00",
"lte": "2022-10-10 01:00:00"
}
}
}
}
Refer to the Elasticsearch documentation for more info.
Update-1: Since the OP wants the exact count of records matching the search query (see the comments section), one can add track_total_hits=true to the URL (note: this can hurt performance). The search response then reports the exact number of matching records under hits.total, as shown:
POST /_search?track_total_hits=true
"hits": {
"total": {
"value": 24981859, // note count with relation equal.
"relation": "eq"
},
"max_score": null,
"hits": []
}
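Since the question mentions Python, here is a small sketch of how the client side could tell an exact total from a lower bound. The function name `parse_total` is made up for illustration. It relies on the two response shapes that appear on this page: a plain integer in older versions, and an object with value and relation in newer ones.

```python
from typing import Dict, Tuple


def parse_total(hits: Dict) -> Tuple[int, bool]:
    """Return (count, is_exact) from the "hits" section of a search response."""
    total = hits["total"]
    if isinstance(total, dict):
        # Newer responses: relation "eq" means exact, "gte" means a lower bound.
        return total["value"], total["relation"] == "eq"
    # Older responses report a plain integer, which is the exact count.
    return total, True
```

With the sample output from the question, this gives (10000, False), which is the cue that more than 10,000 documents may match.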
Update-2: The OP mentioned in the comments that he gets at most 10K records from one query. As mentioned in the chat, Elasticsearch restricts this for performance reasons, but if you still want to change it, you can do so through the index setting below.
index.max_result_window: 20000 // your desired count
However, as the documentation notes:
index.max_result_window The maximum value of from + size for searches
to this index. Defaults to 10000. Search requests take heap memory and
time proportional to from + size and this limits that memory. See
Scroll or Search After for a more efficient alternative to raising
this.
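As a rough idea of the Search After alternative mentioned in that quote, the sketch below pages through results by feeding the sort values of the last hit into the next request. `search` is a hypothetical callable that sends the body to the index's _search endpoint and returns the parsed response. The sort you pass in should end in a field that is unique per document, otherwise hits with equal sort values can be skipped; which field that is depends on your mapping.

```python
from typing import Callable, Dict, Iterator, List


def iter_search_after(
    search: Callable[[Dict], Dict],  # hypothetical: sends body to /<index>/_search
    query: Dict,
    sort: List[Dict],
    page_size: int = 1000,
) -> Iterator[Dict]:
    """Yield all hits by paging with search_after instead of from + size."""
    body = {"size": page_size, "query": query, "sort": sort}
    while True:
        hits = search(body)["hits"]["hits"]
        if not hits:
            return
        yield from hits
        # Each hit of a sorted search carries its sort values under "sort".
        body = {**body, "search_after": hits[-1]["sort"]}
```

For the range query above, `query` would be the {"range": {...}} object and `sort` something like [{"date": "asc"}, ...] plus a unique tiebreaker field of your own.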

Check if document exists or not in elasticsearch

I want to check whether a document with a specific field value exists in Elasticsearch.
I have searched the internet but only found how to check whether a field exists.
My Index/Type is
/twitter/user
username is one field in document.
I want to check if username="xyz" exists or not in this Type.
You can query with size 0. The total value in the response tells you whether a matching document exists.
GET /twitter/user/_search
{"size": 0,
"query": {"match": {
"username": "xyz"
}}}
Edited --
_count api can be used as well.
GET /twitter/user/_count
{ "query": {"match": {
"username": "xyz"
}}}
From the documentation:
If all you want to do is to check whether a document exists—you’re not interested in the content at all—then use the HEAD method instead of the GET method. HEAD requests don’t return a body, just HTTP headers:
curl -i -XHEAD http://localhost:9200/twitter/user/userid
Elasticsearch will return a 200 OK status code if the document exists ... and a 404 Not Found if it doesn’t exist
Note: userid is the value of the _id field.
Simply search for the document; if it exists, the search returns a result, otherwise it does not:
http://127.0.0.1:9200/twitter/user/_search?q=username:xyz
And what you are looking for exactly is:
http://127.0.0.1:9200/twitter/user/_search/exists?q=username:xyz
It returns exists as true or false:
{
"exists": false
}
You can use term query with size 0. See below query for reference:
POST twitter/user/_search
{
"query": {
"term": {
"username": "xyz"
}
},
"size" : 0
}
Response:
{
"took": 1,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 1,
"max_score": 0,
"hits": []
}
}
You will get the total document count in hits.total, and you can then check whether the count is > 0.
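The check on the client side could be sketched like this. Both function names are made up for illustration. `term_query` builds the body used in the answers above (add "size": 0 yourself when sending it to _search), and `has_match` accepts either a _count response or a size-0 _search response. Keep in mind that a term query is not analyzed, so on an analyzed text field it may not match the way a match query does.

```python
from typing import Any, Dict


def term_query(field: str, value: Any) -> Dict:
    """Body for _count; for _search, add "size": 0 to skip fetching documents."""
    return {"query": {"term": {field: value}}}


def has_match(response: Dict) -> bool:
    """True if a _count or size-0 _search response reports at least one document."""
    if "count" in response:  # _count API response
        return response["count"] > 0
    total = response["hits"]["total"]  # _search response
    if isinstance(total, dict):  # newer versions wrap the number in an object
        total = total["value"]
    return total > 0
```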

Elasticsearch query not returning _scroll_id for scroll query

We have an Elasticsearch cluster which all seems to be working fine except that scrolling does not work. When I do a query with a ?scroll=1m querystring no _scroll_id is returned in the results.
To check if it was anything to do with the existing Indexes I created a new Index:
PUT scroll_test
POST scroll_test/1
{
"foo": "bar"
}
POST scroll_test/2
{
"foo": "baz"
}
POST /scroll_test/_search?scroll=1m
{
"size": 1,
"query": {
"match_all": {}
}
}
returns
{
"took": 2,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 3,
"max_score": 1,
"hits": [
{
"_index": "scroll_test",
"_type": "1",
"_id": "AV0N_R0jl33mdjPtW4uQ",
"_score": 1,
"_source": {
"foo": "bar"
}
}
]
}
}
We have just done a rolling upgrade from v5.2 to v5.4.3 (cluster health is now green). Scrolling still does not work after upgrading to v5.4.3.
I am able to execute scroll based queries on a local Elasticsearch v5.4.2 instance.
After reading a lot of other questions, I took away these main ideas:
Aggregations can't be scrolled.
The query I copied from the Kibana "Discover" page "Inspect" button contained an aggregation. I don't know what it was doing, and I was able to remove it with seemingly fine results.
Don't use scroll, and just use search_after:
docs state: We no longer recommend using the scroll API for deep pagination. If you need to preserve the index state while paging through more than 10,000 hits, use the search_after parameter with a point in time (PIT).
I don't know if aggregations also miss out on search_after but I am playing it safe by not using them for now.
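For reference, a request body for search_after with a PIT could be built roughly as below. This is a sketch: the function name and defaults are my own, and it assumes you have already opened a PIT (via POST /<index>/_pit?keep_alive=1m) and hold its ID. As far as I know PIT only exists in 7.10 and later, so it would not apply to the 5.x cluster in the question. The body is sent to /_search without an index in the path, because the PIT already pins the index.

```python
from typing import Any, Dict, List, Optional


def pit_page_body(
    pit_id: str,
    sort: List[Dict],
    search_after: Optional[List[Any]] = None,
    query: Optional[Dict] = None,
    size: int = 1000,
    keep_alive: str = "1m",
) -> Dict:
    """Build one page request for search_after paging over a point in time."""
    body: Dict[str, Any] = {
        "size": size,
        "pit": {"id": pit_id, "keep_alive": keep_alive},
        "sort": sort,
    }
    if query is not None:
        body["query"] = query
    if search_after is not None:
        # Sort values of the last hit from the previous page.
        body["search_after"] = search_after
    return body
```

The first page omits search_after; every later page passes the "sort" array of the previous page's last hit.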

Using scan and scroll for elasticSearch on sense

I am trying to iterate over several documents in Elasticsearch, using Sense (the Google Chrome plugin) to do so. Using scan and scroll for efficiency, I get the scroll ID with:
POST _search?scroll=10m&search_type=scan
{
"query": { "match_all": {}}
}
The result of which is:
{
"_scroll_id": "c2Nhbjs1OzE4ODY6N[...]c5NTU0NTs=",
"took": 10,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 20000,
"max_score": 0,
"hits": []
}
}
Then pass this to a GET as:
GET _search/scroll?scroll=1m&scroll_id="c2Nhbjs1OzE4ODY6N[...]c5NTU0NTs="
but I get 0 results, specifically:
{
"_index": "my_index",
"_type": "_search",
"_id": "scroll",
"found": false
}
I found the problem: I had specified the index my_index in the server box in Sense. Removing it from there and re-executing the POST command with the index in the path:
POST /my_index/_search?scroll=10m&search_type=scan
{
"query": { "match_all": {}}
}
and passing the resulting scroll_id as:
GET _search/scroll?scroll=1m&scroll_id="c2Nhbjs1OzE4ODY6N[...]c5NTU0NTs="
worked!
This works in my Sense (replace the ID with the one from your own response, and don't wrap it in quotes):
POST /test/_search?search_type=scan&scroll=1m
GET /_search/scroll?scroll=1m&scroll_id=c2Nhbjs1OzI[...]Tt0b3RhbF9oaXRzOjQ7
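If you build that URL in code instead of typing it into Sense, a minimal sketch along these lines avoids both pitfalls from this thread. The path carries no index prefix, and the ID is stripped of stray quotes and URL-encoded (scroll IDs are base64 and often end in "="). The function name is made up for illustration.

```python
from urllib.parse import urlencode


def scroll_url(scroll_id: str, keep_alive: str = "1m") -> str:
    """Build the query-string form of the scroll request, without an index prefix."""
    clean_id = scroll_id.strip().strip('"')  # drop accidental surrounding quotes
    return "/_search/scroll?" + urlencode({"scroll": keep_alive, "scroll_id": clean_id})
```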

Using Kibana4 Tile Map with Geo-Points

I'm using Logstash to insert a location attribute into my logs that are going into ElasticSearch:
"location" : {
"lat":31.07,
"lon":-82.09
}
I'm then setting up a mapping to tell ElasticSearch it's a Geo-Point. I'm not exactly sure how this call should look. This is what I've been using so far:
PUT logstash-*/_mapping/latlon
{
"latlon" : {
"properties" : {
"location" : {
"type" : "geo_point",
"lat_lon" : true,
"geohash" : true
}
}
}
}
When I query for matching records in Kibana 4, the location field is annotated with a small globe. So far, so good.
When I move to the Tile Map visualization, bring up matching records, bucket by Geo Coordinates, select Geohash from the 'Aggregation' drop down, and then select location from the Field drop down, and press 'Apply', no results are returned.
The aggregations part of the request looks like this:
"aggs": {
"2": {
"geohash_grid": {
"field": "location",
"precision": 3
}
}
}
And the response:
{
"took": 5,
"timed_out": false,
"_shards": {
"total": 10,
"successful": 10,
"failed": 0
},
"hits": {
"total": 689,
"max_score": 0,
"hits": []
},
"aggregations": {
"2": {
"buckets": []
}
}
}
For some reason, ElasticSearch isn't returning results, even though it seems like the Geo-Point mapping is recognized. Any tips for how to troubleshoot from here?

Resources