Using scan and scroll for Elasticsearch in Sense

I am trying to iterate over several documents in Elasticsearch, and am using Sense (the Google Chrome plugin) to do so. Using scan and scroll for efficiency, I get the scroll ID as follows:
POST _search?scroll=10m&search_type=scan
{
"query": { "match_all": {}}
}
The result is:
{
"_scroll_id": "c2Nhbjs1OzE4ODY6N[...]c5NTU0NTs=",
"took": 10,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 20000,
"max_score": 0,
"hits": []
}
}
Then I pass this to a GET as:
GET _search/scroll?scroll=1m&scroll_id="c2Nhbjs1OzE4ODY6N[...]c5NTU0NTs="
but I get no results; the response is:
{
"_index": "my_index",
"_type": "_search",
"_id": "scroll",
"found": false
}

I found the problem: I had specified the index my_index in the server box in Sense. Removing it and re-executing the POST command as:
POST /my_index/_search?scroll=10m&search_type=scan
{
"query": { "match_all": {}}
}
and passing the resulting scroll_id as:
GET _search/scroll?scroll=1m&scroll_id="c2Nhbjs1OzE4ODY6N[...]c5NTU0NTs="
worked!
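The key point is to send the bare ID, without the double quotes that surround it in the JSON response. A minimal Python sketch of building that follow-up URL; scroll_url is a hypothetical helper and EXAMPLE_SCROLL_ID== is a made-up placeholder, not a real scroll ID. Percent-encoding the trailing = signs is an extra precaution (Elasticsearch decodes query parameters), even though Sense accepted them raw above.

```python
from urllib.parse import urlencode


def scroll_url(scroll_id: str, keep_alive: str = "1m") -> str:
    """Build a GET _search/scroll URL carrying the scroll ID as a query parameter."""
    # Drop surrounding whitespace and any double quotes copied from the JSON response.
    clean_id = scroll_id.strip().strip('"')
    # urlencode percent-encodes characters such as '=' that appear in base64 IDs.
    query = urlencode({"scroll": keep_alive, "scroll_id": clean_id})
    return "/_search/scroll?" + query


print(scroll_url('"EXAMPLE_SCROLL_ID=="'))
# /_search/scroll?scroll=1m&scroll_id=EXAMPLE_SCROLL_ID%3D%3D
```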

This works in my Sense (replace the ID with the one from your own response, and don't wrap it in double quotes):
POST /test/_search?search_type=scan&scroll=1m
GET /_search/scroll?scroll=1m&scroll_id=c2Nhbjs1OzI[...]Tt0b3RhbF9oaXRzOjQ7
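Iterating over all documents means repeating that GET with the newest scroll ID until a page comes back empty. A minimal Python sketch of the loop, assuming a hypothetical send(method, path, body) transport that returns parsed JSON (a fake in-memory one is used here instead of a real cluster). It uses the JSON-body form of the scroll request; older releases may need the query-string form shown above, and with search_type=scan the first page is empty, which is why only follow-up pages end the loop.

```python
from typing import Callable, Iterator, Optional

# Hypothetical transport: send(method, path, body) returns the parsed JSON response.
Send = Callable[[str, str, Optional[dict]], dict]


def scroll_all(send: Send, index: str, query: dict, keep_alive: str = "1m") -> Iterator[dict]:
    """Yield every hit of `query` by following _scroll_id until a follow-up page is empty."""
    resp = send("POST", f"/{index}/_search?scroll={keep_alive}", {"query": query})
    # With search_type=scan the first page carries no hits, so it is never
    # treated as the end; only an empty follow-up page stops the loop.
    yield from resp["hits"]["hits"]
    scroll_id = resp["_scroll_id"]
    try:
        while True:
            resp = send("POST", "/_search/scroll", {"scroll": keep_alive, "scroll_id": scroll_id})
            hits = resp["hits"]["hits"]
            if not hits:
                break
            yield from hits
            # Continue with the most recent ID rather than the first one.
            scroll_id = resp["_scroll_id"]
    finally:
        # Release the server-side scroll context instead of waiting for keep_alive to expire.
        send("DELETE", "/_search/scroll", {"scroll_id": [scroll_id]})


# Usage with a fake in-memory transport standing in for a real cluster.
pages = [[], [{"_id": "1"}, {"_id": "2"}], [{"_id": "3"}], []]
calls = []


def fake_send(method, path, body):
    calls.append((method, path))
    if method == "DELETE":
        return {"succeeded": True}
    return {"_scroll_id": "EXAMPLE_SCROLL_ID", "hits": {"hits": pages.pop(0)}}


ids = [hit["_id"] for hit in scroll_all(fake_send, "test", {"match_all": {}})]
print(ids)  # ['1', '2', '3']
```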

Related

Kibana including versioned documents in visualizations

I have a document with _id "123456", and when I do a GET in Elasticsearch for that ID in my index I can see that it is _version: 2 which makes sense because I updated it.
However in my Kibana visualizations it seems like it is picking up both versions of the same document when showing the results.
How do I exclude versioned documents from re-appearing in the visualization? For example, this record is showing up twice in my bar graph.
Please and thank you
Example GET response:
{
"_index": "censored",
"_type": "censored",
"_id": "123456",
"_version": 2,
"found": true,
"_source": {
... omitted
}
}
Also I am sure there is only one actual document with that ID because if I do a _search on the _id field I can see this:
{
"took": 1,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 1,
"max_score": 7.53924,
"hits": [
{
"_index": "censored",
"_type": "censored",
"_id": "123456",
"_score": 7.53924,
"_source": {
... omitted
}
}
]
}
}
EDIT: Things I've tried below
"aggs": {
"latest": {
"terms": {
"field": "_id"
}
}
}
and
"aggs": {
"latest": {
"max": {
"field": "version"
}
}
}
Frankly, this is just a workaround; if someone finds a better solution I will mark that as the answer instead. Anyway, this is how I've been able to prevent multiple records with the same _id from showing up in the visualizations on my dashboard:
I just changed the "Y Axis - Count" on all the visualizations to being "Y Axis - Unique Count on field _id"
Honestly it seems silly that I have to do this because I think different versions should just automatically be exempt from appearing in my saved searches & visualizations. I couldn't find any information about why this was happening. I even tried a _forcemerge to try and delete previous versions of records but it didn't do anything.
Would be nice if someone found a real solution.
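Kibana's Unique Count metric is backed by Elasticsearch's cardinality aggregation, so the workaround amounts to counting distinct _id values instead of documents. A minimal Python sketch of an equivalent search body; the helper unique_count_body and the aggregation name unique_docs are hypothetical. Note that the cardinality aggregation is approximate (close to exact below precision_threshold), and newer Elasticsearch releases may restrict aggregating on _id.

```python
import json


def unique_count_body(field: str = "_id", precision_threshold: int = 3000) -> dict:
    """Search body that counts distinct values of `field` instead of counting documents."""
    return {
        "size": 0,  # only the aggregation result is needed, not the hits themselves
        "aggs": {
            "unique_docs": {
                "cardinality": {
                    "field": field,
                    # Counts below this threshold are expected to be close to exact;
                    # above it the cardinality aggregation is approximate.
                    "precision_threshold": precision_threshold,
                }
            }
        },
    }


print(json.dumps(unique_count_body(), indent=2))
```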

elasticsearch, multi_match and filter by date

It seems I followed every similar answer I found, but I just can't figure out what is wrong...
This is a "match all" query:
{
"query": {
"match_all": {}
}
}
...and the results:
{
"took": 1,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 1,
"max_score": 1,
"hits": [
{
"_index": "unittest_index_unittestdocument",
"_type": "unittestdocument",
"_id": "a.b",
"_score": 1,
"_source": {
"id": "a.b",
"docdate": "2018-01-24T09:45:44.4168345+02:00",
"primarykeywords": [
"keyword"
],
"primarytitles": [
"the title of a document"
]
}
}
]
}
}
but when I try to filter that with a date like this:
{
"query":{
"bool":{
"must":{
"multi_match":{
"type":"most_fields",
"query":"document",
"fields":[ "primarytitles","primarykeywords" ]
}
},
"filter": [
{"range":{ "docdate": { "gte":"1900-01-23T15:17:12.7313261+02:00" } } }
]
}
}
}
I have zero hits...
I tried to follow https://www.elastic.co/guide/en/elasticsearch/reference/current/query-filter-context.html and the question "filtering by date in elasticsearch", with no success at all.
Is there any difference that I cannot see?
Please note that when I remove the date filter and add a term filter on "primarykeywords", I get the results I want. The only problem is the range filter.
Apparently there was nothing wrong with my query; the problem was that the docdate field wasn't indexed... :/
I don't know why I initially skipped indexing that field (my mistake), but I do believe Elastic should warn me when I am trying to query a field that has "index": false.
The fact that Elasticsearch just returns no results without telling me what is going on is, in my opinion, a major issue. I lost a day reading everything I could find on the web, just because I didn't get proper feedback from the engine.
Fail safe died for this reason...
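Since no warning came back, one way to catch this earlier is to inspect the index mapping (for example the output of GET unittest_index_unittestdocument/_mapping) for fields marked as not indexed. A minimal Python sketch; non_indexed_fields is a hypothetical helper, and the mapping literal is an assumed reconstruction of the question's fields, not the poster's actual mapping. Because the nesting of the _mapping response (index name, type name) differs between versions, the helper takes the properties object directly.

```python
def non_indexed_fields(properties: dict, prefix: str = "") -> list:
    """Return dotted paths of mapped fields that are not indexed."""
    found = []
    for name, spec in properties.items():
        path = prefix + name
        # Recent mappings use "index": false; older ones used "index": "no".
        if spec.get("index") in (False, "no"):
            found.append(path)
        # Object and nested fields keep their sub-fields under "properties".
        if "properties" in spec:
            found.extend(non_indexed_fields(spec["properties"], path + "."))
        # Multi-fields (e.g. a keyword sub-field) live under "fields".
        if "fields" in spec:
            found.extend(non_indexed_fields(spec["fields"], path + "."))
    return found


# Assumed mapping that reproduces the problem from the question.
mapping = {
    "properties": {
        "id": {"type": "keyword"},
        "docdate": {"type": "date", "index": False},
        "primarykeywords": {"type": "keyword"},
        "primarytitles": {"type": "text"},
    }
}
print(non_indexed_fields(mapping["properties"]))  # ['docdate']
```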

Elasticsearch query not returning _scroll_id for scroll query

We have an Elasticsearch cluster which all seems to be working fine except that scrolling does not work. When I do a query with a ?scroll=1m querystring no _scroll_id is returned in the results.
To check if it was anything to do with the existing Indexes I created a new Index:
PUT scroll_test
POST scroll_test/1
{
"foo": "bar"
}
POST scroll_test/2
{
"foo": "baz"
}
POST /scroll_test/_search?scroll=1m
{
"size": 1,
"query": {
"match_all": {}
}
}
returns
{
"took": 2,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 3,
"max_score": 1,
"hits": [
{
"_index": "scroll_test",
"_type": "1",
"_id": "AV0N_R0jl33mdjPtW4uQ",
"_score": 1,
"_source": {
"foo": "bar"
}
}
]
}
}
We have just done a rolling upgrade from v5.2 to v5.4.3 (cluster health is now green). Scrolling still does not work after upgrading to v5.4.3.
I am able to execute scroll based queries on a local Elasticsearch v5.4.2 instance.
After reading a lot of other questions, I took away these main ideas:
Aggregations can't be scrolled.
The query I copied from the Kibana Discover page's "Inspect" button had an aggregation in it. I don't know what it was doing, but I was able to remove it with seemingly fine results.
Don't use scroll; use search_after instead.
The docs state: We no longer recommend using the scroll API for deep pagination. If you need to preserve the index state while paging through more than 10,000 hits, use the search_after parameter with a point in time (PIT).
I don't know if aggregations also miss out on search_after but I am playing it safe by not using them for now.
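A minimal Python sketch of the search_after plus point-in-time pattern the docs recommend, assuming a cluster recent enough to have the PIT API (7.x or later) and a hypothetical send(method, path, body) transport; a fake in-memory transport stands in for the cluster here. Each page passes the sort values of the previous page's last hit as search_after, and the PIT is closed when iteration ends.

```python
from typing import Callable, Iterator, Optional

# Hypothetical transport: send(method, path, body) returns the parsed JSON response.
Send = Callable[[str, str, Optional[dict]], dict]


def search_after_all(send: Send, index: str, query: dict,
                     page_size: int = 1000, keep_alive: str = "1m") -> Iterator[dict]:
    """Yield all hits using search_after over a point in time (PIT)."""
    # Open a PIT so every page sees the same snapshot of the index.
    pit_id = send("POST", f"/{index}/_pit?keep_alive={keep_alive}", None)["id"]
    search_after = None
    try:
        while True:
            body = {
                "size": page_size,
                "query": query,
                "pit": {"id": pit_id, "keep_alive": keep_alive},
                # _shard_doc gives a cheap, unique sort order when searching a PIT.
                "sort": [{"_shard_doc": "asc"}],
            }
            if search_after is not None:
                body["search_after"] = search_after
            resp = send("POST", "/_search", body)
            pit_id = resp.get("pit_id", pit_id)  # the PIT ID may change between requests
            hits = resp["hits"]["hits"]
            if not hits:
                break
            yield from hits
            search_after = hits[-1]["sort"]
    finally:
        send("DELETE", "/_pit", {"id": pit_id})


# Fake transport: five documents whose sort values are their positions.
data = [{"_id": str(i), "sort": [i]} for i in range(5)]


def fake_send(method, path, body):
    if path.endswith("/_pit?keep_alive=1m"):
        return {"id": "EXAMPLE_PIT_ID"}
    if method == "DELETE":
        return {"succeeded": True}
    start = body["search_after"][0] + 1 if "search_after" in body else 0
    return {"pit_id": "EXAMPLE_PIT_ID", "hits": {"hits": data[start:start + body["size"]]}}


ids = [h["_id"] for h in search_after_all(fake_send, "my_index", {"match_all": {}}, page_size=2)]
print(ids)  # ['0', '1', '2', '3', '4']
```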

Query with `field` returns nothing

I'm new to Elasticsearch and am having trouble with my queries.
When I do a match all, I get this:
{
"took": 1,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"failed": 0
},
"hits": {
"total": 5,
"max_score": 1,
"hits": [{
"_index": "stations",
"_type": "station",
"_id": "4432",
"_score": 1,
"_source": {
"SiteName": "Abborrkroksvägen",
"LastModifiedUtcDateTime": "2015-02-13 10:34:20.643",
"ExistsFromDate": "2015-02-14 00:00:00.000"
}
},
{
"_index": "stations",
"_type": "station",
"_id": "9110",
"_score": 1,
"_source": {
"SiteName": "Abrahamsberg",
"LastModifiedUtcDateTime": "2012-03-26 23:55:32.900",
"ExistsFromDate": "2012-06-23 00:00:00.000"
}
}
]
}
}
My search query looks like this:
{
"query": {
"query_string": {
"fields": ["SiteName"],
"query": "a"
}
}
}
The problem is that when I run the query above I get empty results which is strange. I should receive both of the documents from my index, right?
What am I doing wrong? Did I index my data wrong or is my query just messed up?
Appreciate any help I can get. Thanks guys!
There is nothing wrong with either your data or your query. It seems you haven't quite understood how data is stored in Elasticsearch.
First, when you index data ("SiteName": "Abborrkroksvägen" and "SiteName": "Abrahamsberg"), it is stored as individual analyzed terms.
When you query ES using "query":"a" (meaning you are looking for the term "a"), it looks for a match with the term "a", but since no such term exists, you get empty results.
When you query ES using "query":"a*" (meaning all terms that start with "a"), it returns the results you expected.
Hope this clarifies your question!
Also, you may want to look at an article I found recently about search: https://www.timroes.de/2016/05/29/elasticsearch-kibana-queries-in-depth-tutorial/
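The difference between a term lookup and a prefix lookup can be simulated in a few lines. This Python sketch uses a deliberately simplified, hypothetical analyzer (split on non-word characters, then lowercase) as a stand-in for Elasticsearch's standard analyzer; it is an illustration of the idea, not the real analysis chain.

```python
import re


def analyze(text: str) -> list:
    """Rough stand-in for an analyzer: split on non-word characters and lowercase."""
    return [t.lower() for t in re.findall(r"\w+", text)]


# The two documents from the question, keyed by _id.
docs = {"4432": "Abborrkroksvägen", "9110": "Abrahamsberg"}
inverted_index = {doc_id: analyze(name) for doc_id, name in docs.items()}


def term_match(term: str) -> list:
    """IDs of documents containing exactly this term (like "query": "a")."""
    return sorted(d for d, terms in inverted_index.items() if term in terms)


def prefix_match(prefix: str) -> list:
    """IDs of documents with a term starting with the prefix (like "query": "a*")."""
    return sorted(d for d, terms in inverted_index.items()
                  if any(t.startswith(prefix) for t in terms))


print(term_match("a"))    # []
print(prefix_match("a"))  # ['4432', '9110']
```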

Scan And Scroll Query In ElasticSearch

I am using ES 1.5. I ran a scan-and-scroll query, and it works fine and returns the scroll ID:
GET /ecommerce_parts/_search?search_type=scan&scroll=3m
{
"query": { "match_all": {}},
"size": 1000
}
{
"_scroll_id": "c2Nhbjs1OzE4Okk4ckVkSld0UXdDU212UU1kNWZBU1E7MTc6SThyRWRKV3RRd0NTbXZRTWQ1ZkFTUTsxNjpJOHJFZEpXdFF3Q1NtdlFNZDVmQVNROzIwOkk4ckVkSld0UXdDU212UU1kNWZBU1E7MTk6SThyRWRKV3RRd0NTbXZRTWQ1ZkFTUTsxO3RvdGFsX2hpdHM6MTg0NjEwOw==",
"took": 1,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 184610,
"max_score": 0,
"hits": []
}
}
Now when I pass the scroll ID to retrieve the set of documents, it throws an error:
GET /_search/scroll?scroll=1m
c2Nhbjs1OzE4Okk4ckVkSld0UXdDU212UU1kNWZBU1E7MTc6SThyRWRKV3RRd0NTbXZRTWQ1ZkFTUTsxNjpJOHJFZEpXdFF3Q1NtdlFNZDVmQVNROzIwOkk4ckVkSld0UXdDU212UU1kNWZBU1E7MTk6SThyRWRKV3RRd0NTbXZRTWQ1ZkFTUTsxO3RvdGFsX2hpdHM6MTg0NjEwOw==
{
"error": "ElasticsearchIllegalArgumentException[Malformed scrollId []]",
"status": 400
}
GET /_search/scroll?scroll=3mc2Nhbjs1OzIzOkk4ckVkSld0UXdDU212UU1kNWZBU1E7MjE6SThyRWRKV3RRd0NTbXZRTWQ1ZkFTUTsyNTpJOHJFZEpXdFF3Q1NtdlFNZDVmQVNROzI0Okk4ckVkSld0UXdDU212UU1kNWZBU1E7MjI6SThyRWRKV3RRd0NTbXZRTWQ1ZkFTUTsxO3RvdGFsX2hpdHM6MTg0NjEwOw==
Both of the above queries give me an error. How do I retrieve the documents based on the scroll ID? If I'm doing it wrong, please suggest a query.
For subsequent requests, you're supposed to send the scroll ID in a JSON payload like this:
POST /_search/scroll
{
"scroll" : "1m",
"scroll_id" : "c2Nhbjs1OzE4Okk4ckVkSld0UXdDU212UU1kNWZBU1E7MTc6SThyRWRKV3RRd0NTbXZRTWQ1ZkFTUTsxNjpJOHJFZEpXdFF3Q1NtdlFNZDVmQVNROzIwOkk4ckVkSld0UXdDU212UU1kNWZBU1E7MTk6SThyRWRKV3RRd0NTbXZRTWQ1ZkFTUTsxO3RvdGFsX2hpdHM6MTg0NjEwOw=="
}
In earlier versions of ES you could do it like this, too:
POST /_search/scroll?scroll=1m&scroll_id=c2Nhbjs1OzE4Okk4ckVkSld0UXdDU212UU1kNWZBU1E7MTc6SThyRWRKV3RRd0NTbXZRTWQ1ZkFTUTsxNjpJOHJFZEpXdFF3Q1NtdlFNZDVmQVNROzIwOkk4ckVkSld0UXdDU212UU1kNWZBU1E7MTk6SThyRWRKV3RRd0NTbXZRTWQ1ZkFTUTsxO3RvdGFsX2hpdHM6MTg0NjEwOw==
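In the JSON-body form the ID is an ordinary JSON string, so serializing it with a JSON library avoids hand-quoting mistakes. A minimal Python sketch of the payloads for continuing a scroll and, in versions that accept a JSON body there, for releasing it with DELETE /_search/scroll; the helper names and the placeholder ID EXAMPLE_SCROLL_ID== are hypothetical.

```python
import json


def scroll_body(scroll_id: str, keep_alive: str = "1m") -> str:
    """JSON payload for POST /_search/scroll (body form)."""
    return json.dumps({"scroll": keep_alive, "scroll_id": scroll_id})


def clear_scroll_body(*scroll_ids: str) -> str:
    """JSON payload for DELETE /_search/scroll, releasing one or more scroll contexts."""
    return json.dumps({"scroll_id": list(scroll_ids)})


print(scroll_body("EXAMPLE_SCROLL_ID=="))
# {"scroll": "1m", "scroll_id": "EXAMPLE_SCROLL_ID=="}
print(clear_scroll_body("EXAMPLE_SCROLL_ID=="))
# {"scroll_id": ["EXAMPLE_SCROLL_ID=="]}
```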
