Kibana including versioned documents in visualizations - elasticsearch

I have a document with _id "123456", and when I do a GET in Elasticsearch for that ID in my index I can see that it is _version: 2 which makes sense because I updated it.
However in my Kibana visualizations it seems like it is picking up both versions of the same document when showing the results.
How do I exclude versioned documents from re-appearing in the visualization? For example, this record is showing up twice in my bar graph.
Please and thank you
Example GET response:
{
"_index": "censored",
"_type": "censored",
"_id": "123456",
"_version": 2,
"found": true,
"_source": {
... ommitted
}
}
Also I am sure there is only one actual document with that ID because if I do a _search on the _id field I can see this:
{
"took": 1,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 1,
"max_score": 7.53924,
"hits": [
{
"_index": "censored",
"_type": "censored",
"_id": "123456",
"_score": 7.53924,
"_source": {
... ommitted
}
}
]
}
}
EDIT: Things I've tried below
aggs": {
"latest": {
"terms": {
"field": "_id"
}
}
}
and
"aggs": {
"latest": {
"max": {
"field": "version"
}
}
}

So frankly this is just a workaround, if someone finds a better solution I will mark that as the answer instead. Anyway this is how I've been able to prevent multiple records with the same _id showing up in my visualizations on my dashboard:
I just changed the "Y Axis - Count" on all the visualizations to being "Y Axis - Unique Count on field _id"
Honestly it seems silly that I have to do this because I think different versions should just automatically be exempt from appearing in my saved searches & visualizations. I couldn't find any information about why this was happening. I even tried a _forcemerge to try and delete previous versions of records but it didn't do anything.
Would be nice if someone found a real solution.

Related

elasticsearch: copying meta-field _id to other field while creating document

I am using elasticsearch. I see there is meta-field _id for each document. I want to search document using this meta-field as I don't have any other field as unique field in document. But _id is a string and can have dashes which are not possible to search unless we add mapping for field as type :keyword. But it is possible as mentioned here. So now I am thinking to add another field newField in document and make it same as _id. One way to do it is: first create document and assign _id to that field and save document again. But this will have 2 connections which is not that good. So I want to find some solution to set newField while creating document itself. Is it even possible?
You can search for a document that contains dashes:
PUT my_index/tweet/testwith-
{
"fullname" : "Jane Doe",
"text" : "The twitter test!"
}
We just created a document with a dash in its id
GET my_index/tweet/_search
{
"query": {
"terms": {
"_id": [
"testwith-"
]
}
}
}
We search for the document that have the following id: testwith-
{
"took": 9,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 1,
"max_score": 1,
"hits": [
{
"_index": "my_index",
"_type": "tweet",
"_id": "testwith-",
"_score": 1,
"_source": {
"fullname": "Jane Doe",
"text": "The twitter test!"
}
}
]
}
}
We found it. We can search on document that have - in it.
you could also use a set processor when using an ingest pipeline to store the id in an additional field, see https://www.elastic.co/guide/en/elasticsearch/reference/5.5/accessing-data-in-pipelines.html and https://www.elastic.co/guide/en/elasticsearch/reference/5.5/set-processor.html

Elasticsearch query not returning _scroll_id for scroll query

We have an Elasticsearch cluster which all seems to be working fine except that scrolling does not work. When I do a query with a ?scroll=1m querystring no _scroll_id is returned in the results.
To check if it was anything to do with the existing Indexes I created a new Index:
PUT scroll_test
POST scroll_test/1
{
"foo": "bar"
}
POST scroll_test/2
{
"foo": "baz"
}
POST /scroll_test/_search?scroll=1m
{
"size": 1,
"query": {
"match_all": {}
}
}
returns
{
"took": 2,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 3,
"max_score": 1,
"hits": [
{
"_index": "scroll_test",
"_type": "1",
"_id": "AV0N_R0jl33mdjPtW4uQ",
"_score": 1,
"_source": {
"foo": "bar"
}
}
]
}
}
We have just done a rolling upgrade from v5.2 to v5.4.3 (cluster health is now green). Scrolling still does not work after upgrading to v5.4.3.
I am able to execute scroll based queries on a local Elasticsearch v5.4.2 instance.
After reading a lot of other questions, I took away these main ideas:
Aggregation can't scroll
the query I copied from the Kibana "Discover" page "inspect" button had this, but I don't know what it was doing, and I was able to remove it with seemingly fine results.
Don't use scroll, and just use search_after:
docs state: We no longer recommend using the scroll API for deep pagination. If you need to preserve the index state while paging through more than 10,000 hits, use the search_after parameter with a point in time (PIT).
I don't know if aggregations also miss out on search_after but I am playing it safe by not using them for now.

Query with `field` returns nothing

I'm new to elastic search and am having troubles with my queries.
When I do a match all I get this;
{
"took": 1,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"failed": 0
},
"hits": {
"total": 5,
"max_score": 1,
"hits": [{
"_index": "stations",
"_type": "station",
"_id": "4432",
"_score": 1,
"_source": {
"SiteName": "Abborrkroksvägen",
"LastModifiedUtcDateTime": "2015-02-13 10:34:20.643",
"ExistsFromDate": "2015-02-14 00:00:00.000"
}
},
{
"_index": "stations",
"_type": "station",
"_id": "9110",
"_score": 1,
"_source": {
"SiteName": "Abrahamsberg",
"LastModifiedUtcDateTime": "2012-03-26 23:55:32.900",
"ExistsFromDate": "2012-06-23 00:00:00.000"
}
}
]
}
}
My search query looks like this:
{
"query": {
"query_string": {
"fields": ["SiteName"],
"query": "a"
}
}
}
The problem is that when I run the query above I get empty results which is strange. I should receive both of the documents from my index, right?
What am I doing wrong? Did I index my data wrong or is my query just messed up?
Appreciate any help I can get. Thanks guys!
There is nothing wrong either in your data or query. It seems you didn't understand how data get stored in elasticsearch!
Firstly, when you index data("SiteName": "Abborrkroksvägen" and "SiteName": "Abrahamsberg") they will get stored as individual analysed terms.
When you query ES using "query":"a"(means here you are looking for the term "a" ) then it will look for whether there is any match with term a but as there are no terms so you will get empty results.
When you query ES using "query":"a*"(means all terms starts with "a") then it will return you as you expected.
Hope this clarifies your question!
Also you may have a look at article I found recently about search - https://www.timroes.de/2016/05/29/elasticsearch-kibana-queries-in-depth-tutorial/

Using scan and scroll for elasticSearch on sense

I am trying to iterate over several documents in elasticSearch, and am using Sense (the google chrome plugin to do so). Using scan and scroll for efficiency I get the scroll id as:
POST _search?scroll=10m&search_type=scan
{
"query": { "match_all": {}}
}
The result of which is:
{
"_scroll_id": "c2Nhbjs1OzE4ODY6N[...]c5NTU0NTs=",
"took": 10,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 20000,
"max_score": 0,
"hits": []
}
}
Then pass this to a GET as:
GET _search/scroll?scroll=1m&scroll_id="c2Nhbjs1OzE4ODY6N[...]c5NTU0NTs="
but I get 0 results, specifically:
{
"_index": "my_index",
"_type": "_search",
"_id": "scroll",
"found": false
}
I found the problem, I had specified the index my_index in the server box on sense. Removing this and re-executing the post command as:
POST /my_index/_search?scroll=10m&search_type=scan
{
"query": { "match_all": {}}
}
and passing the resulting scroll_id as:
GET _search/scroll?scroll=1m&scroll_id="c2Nhbjs1OzE4ODY6N[...]c5NTU0NTs="
worked!
This works in my sense (of course you should replace the id from your case; don't use ")
POST /test/_search?search_type=scan&scroll=1m
GET /_search/scroll?scroll=1m&scroll_id=c2Nhbjs1OzI[...]Tt0b3RhbF9oaXRzOjQ7

ElasticSearch returns document in search but not in GET

I'm doing a search of documents in my index, then subsequently trying to get some of them by _id. Despite receiving a set of results, some of the documents can not be retrieved with a simple get. Worse still, I CAN get the same document with a URI search where ?_id:<the id>
Just for example, running a simple GET
curl -XGET 'http://localhost:9200/keepbusy_process__issuer_application/KeepBusy__Activities__Activity/neHSKSBCSv-OyAYn3IFcew'
Gives me the result:
{
"_index" : "keepbusy_process__issuer_application",
"_type" : "KeepBusy__Activities__Activity",
"_id" : "neHSKSBCSv-OyAYn3IFcew",
"exists" : false
}
But if I do a search with same _id:
curl -XGET 'http://localhost:9200/keepbusy_process__issuer_application/KeepBusy__Activities__Activity/_search?q=_id:neHSKSBCSv-OyAYn3IFcew'
I get the expected result:
{
"took": 2,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 1,
"max_score": 1.0,
"hits": [
{
"_index": "keepbusy_process__issuer_application",
"_type": "KeepBusy__Activities__Activity",
"_id": "neHSKSBCSv-OyAYn3IFcew",
"_score": 1.0,
"_source": {
"template_uid": "KeepBusy__Activities__Activity.create application",
"name": "create application",
"updated_at": "2014-01-08T10:02:33-05:00",
"updated_at_ms": 1389193353975
}
}
]
}
}
I'm indexing documents through the stretcher ruby API, and immediately after indexing I'm doing a refresh. My local setup is 2 nodes. I'm running v0.90.9
There is nothing obvious in the logs why this should fail. I've restarted the cluster and everything appears to start correctly, but the result is the same.
Is there something I'm missing or some way I can further diagnose this issue?
This issue typically occurs when documents are indexed with non-default routing (either explicitly set or deducted from parent's id in case of parent/child documents). If this is the case, try specifying correct routing in you get request.

Resources