Can we retrieve previous _source docs with Elasticsearch versions?

I've read the blog post on ES regarding versioning.
However, I'd like to be able to get the previous _source documents from an update.
For example, let's say I have this object:
{
"name": "John",
"age": 32,
"job": "janitorial technician"
}
// this becomes version 1
And I update it to:
{
"name": "John",
"age": 32,
"job": "president"
}
// this becomes version 2
Then, through versioning in ES, would I be able to get the previous job property of the object? I've tried this:
curl -XGET "localhost:9200/index/type/id?version=1"
but that just returns the most up-to-date _source object (the one where John is president).
I'd actually like to implement a version-differences feature much like StackOverflow does. (BTW, I'm using Elasticsearch as my main db - if there's a way to do this with other NoSQL databases, I'd be happy to try it out. Preferably one that integrates well with ES.)

No, you can't do this with the built-in versioning. All it does is store the current version number, which prevents updates from being applied out of order.
If you want to keep multiple versions available, you have to implement that yourself. Depending on how many versions you are likely to store, there are three approaches:
For low volume changes:
1) store older versions within the same document
{
  "text": "foo bar",
  "date": "2011-11-01",
  "previous": [
    { "date": "2011-10-01", "content": { "text": "Foo Bar" } },
    { "date": "2011-09-01", "content": { "text": "Foo-bar!" } }
  ]
}
For high volume changes:
2) add a current flag:
{
doc_id: 123,
version: 3,
text: "foo bar",
date: "2011-11-01",
current: true
}
{
doc_id: 123,
version: 2,
text: "Foo Bar",
date: "2011-10-01",
current: false
}
3) Same as (2) above, but store the old versions in a separate index. That keeps your "live" index, which serves the majority of your queries, small and more performant.
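As a rough illustration of approaches (2) and (3), the sketch below shows the bookkeeping the application would have to do on every update. It is plain Python with no Elasticsearch client involved; the function name `apply_update` and its parameters are hypothetical, and the field names follow the example documents above.

```python
from copy import deepcopy


def apply_update(current_doc, changes, new_date):
    """Return (archived_doc, new_current_doc) for approach 2.

    current_doc is the document flagged current: True; changes holds the
    fields to overwrite. Neither input is mutated.
    """
    archived = deepcopy(current_doc)
    archived["current"] = False  # the old version is kept as history

    updated = deepcopy(current_doc)
    updated.update(changes)
    updated["version"] = current_doc["version"] + 1
    updated["date"] = new_date
    updated["current"] = True
    return archived, updated


v2 = {"doc_id": 123, "version": 2, "text": "Foo Bar",
      "date": "2011-10-01", "current": True}
old, new = apply_update(v2, {"text": "foo bar"}, "2011-11-01")
print(old)
print(new)
```

For approach (2) both documents would be indexed into the same index; for approach (3) `old` would go to the separate history index and `new` to the live one. Either way each version needs its own document ID, for example one built from doc_id and version (an assumption, not something the answer specifies).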

Related

Elasticsearch re-index all vs join

I'm pretty new to Elasticsearch and its concepts. I would like to understand how to model what I have in my relational DB in an Elasticsearch architecture.
The scenario is the following.
I have an index "data":
{
"id": "00001",
"content" : "some text here ..",
"type": "T1",
"categories": ["A", "A1", "B"]
}
The requirement says that data can be queried by:
a text search on the content field
a specific type or category it belongs to
So far, so simple, so good.
This data will not be complete at creation time. Categories might be added to or removed from the data later, so many data uploads/re-indexes might happen along the way.
For example:
create the data
{
"id": "00001",
"content" : "some text here ..",
"type": "T1",
"categories": ["A"]
}
Then it was decided that all data with type=T1 must belong to both A & B categories.
{
"id": "00001",
"content" : "some text here ..",
"type": "T1",
"categories": ["A", "B"]
}
If I have a billion hits for type=T1, I would have to update/re-index a billion entries. Maybe that is how things should work, and this is where my question lands.
Is it OK to re-index all the data just to add/remove a category, or would it be possible to have a second, much smaller index just for this association and somehow join both indexes at query time?
Something like this:
Data:
{
"id": "00001",
"content" : "some text here ..",
"type": "T1"
}
DataCategories:
{
"type": "T1",
"categories" : ["A", "B"]
}
Is it acceptable/possible?
This is a common scenario, but unfortunately there is no 1:1 mapping of RDBMS features onto text search engines like Lucene/Elasticsearch.
Possible options:
1 - For the best query performance, reindex. It may not be practical, depending on how often your data changes.
2 - Consider parent-child. It is a slower option, but it will often meet performance requirements. The category could be a parent document with several thousand children each.
3 - If it's only category renaming, consider using IDs for the categories and translating them to text in the application.
4 - Update the documents in place, depending on how many need to be updated: for a few thousand, run an update query; for more, reindex.
Suggested reading - https://www.elastic.co/blog/managing-relations-inside-elasticsearch
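To make option 4 more concrete, here is a minimal sketch of what an update query for the T1 example might look like. It only builds the request body; it assumes an Elasticsearch version with the `_update_by_query` API and Painless scripting, a `type` field mapped so that a term query matches it exactly, and a `categories` array present on every matching document. The helper name `add_category_request` is hypothetical.

```python
import json


def add_category_request(doc_type, category):
    """Build an _update_by_query body adding `category` to docs of `doc_type`."""
    return {
        "query": {"term": {"type": doc_type}},
        "script": {
            "lang": "painless",
            # Only append the category if the document does not have it yet.
            "source": (
                "if (!ctx._source.categories.contains(params.category)) {"
                " ctx._source.categories.add(params.category) }"
            ),
            "params": {"category": category},
        },
    }


body = add_category_request("T1", "B")
# Would be sent as: POST /data/_update_by_query
print(json.dumps(body, indent=2))
```

Each matching document is still rewritten internally, so for very large result sets this costs about as much as a reindex, which is the trade-off the answer points at.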

How can I show a table with the sum of value x of all children within Kibana

I have an Elasticsearch database with documents stored the following way (the commas separate the documents):
{
"path": "path/to/data",
"kind": "type1"
},
{
"path": "path/to/data/values1",
"kind": "type2",
"x": 2
},
{
"path": "path/to/data/values2",
"kind": "type2",
"x": 2
},
{
"path": "path/to/data/datasub",
"kind": "type1"
},
{
"path": "path/to/data/datasub/values1",
"kind": "type2",
"x": 1
}
Now I want to create a table view/chart that shows all type2's with the sum of x of all their children.
So I expect the total of path/to/data to be 5 and the total of path/to/data/datasub to be 1.
To consider: the depth of this structure could theoretically be unlimited.
I'm running Elasticsearch 7 and Kibana 7. I want to start with the table visualisation, but I would like to be able to use this kind of aggregation across multiple visualisations. I have Googled a lot and found all kinds of Elasticsearch queries, but nothing on how to achieve this in Kibana.
All help is much appreciated.
For those who run into the same question:
The solution I ended up using is to split the path into tokens before importing it into Elasticsearch. Consider a document with a path like "/this/is/a/path". It becomes the following array in the document:
[
"/this",
"/this/is",
"/this/is/a",
"/this/is/a/path"
]
You can then use a terms aggregation on it with various metrics to calculate your desired measurements.
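The preprocessing step described above could look roughly like the following sketch. `path_prefixes` produces the prefix array, and `sum_by_prefix` mimics in plain Python what a terms aggregation on that array with a sum of x would return; both function names are hypothetical, and the sample documents are the ones from the question.

```python
from collections import defaultdict


def path_prefixes(path):
    """Return every ancestor prefix of `path`, ending with the path itself."""
    leading = "/" if path.startswith("/") else ""
    parts = [p for p in path.split("/") if p]
    return [leading + "/".join(parts[:i]) for i in range(1, len(parts) + 1)]


def sum_by_prefix(docs):
    """Mimic a terms aggregation on the prefix array with a sum of x."""
    totals = defaultdict(int)
    for doc in docs:
        for prefix in path_prefixes(doc["path"]):
            totals[prefix] += doc.get("x", 0)
    return dict(totals)


docs = [
    {"path": "path/to/data", "kind": "type1"},
    {"path": "path/to/data/values1", "kind": "type2", "x": 2},
    {"path": "path/to/data/values2", "kind": "type2", "x": 2},
    {"path": "path/to/data/datasub", "kind": "type1"},
    {"path": "path/to/data/datasub/values1", "kind": "type2", "x": 1},
]
totals = sum_by_prefix(docs)
print(totals["path/to/data"])          # 5
print(totals["path/to/data/datasub"])  # 1
```

Because every document carries all of its ancestors, the depth of the hierarchy does not matter at query time.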

elasticsearch return unique matched values

I recently started looking at Elasticsearch. I'm in the process of learning what it can do and deciding how I can use it in my projects.
For one project I used a CouchDB (NoSQL) database. The client can search it using CouchDB views. Easy, but limited in functionality.
I'd like to have Elasticsearch open up the data in a far richer way.
Searching for composers and titles of musical pieces is now handled by Elasticsearch with amazingly fast 'query_string's. And it's fuzzy!
There is one thing, however, that I did not manage to accomplish with Elasticsearch. I'm pretty sure it's possible; I'm just missing it.
It's about the autocomplete functionality when entering instrument names.
For example:
I have 2 documents (musical pieces) with different instruments needed to play them:
{
title: 'Awesome Piece',
authors: [{
name: 'John Doe',
role: 'composer'
}, {
name: 'Shakespeare',
role: 'lyricist'
}],
instruments: [
'soprano',
'alto',
'tenor',
'bass',
'trumpet',
'trumpet',
'piano'
]
}
{
title: 'Not so Awesome Piece',
authors: [{
name: 'Another J. Doe',
role: 'composer'
}, {
name: 'Little John',
role: 'arranger'
}],
instruments: [
'trombone',
'organ'
]
}
To enter a new musical piece there is a field for instrument names. I'd like to offer autocomplete there.
So if the user types 't', I want a list of all instruments matching 't*': ['tenor', 'trumpet', 'trombone']; if they type 'tr', I need: ['trumpet', 'trombone'].
The best I could find was a query with an aggregation, but it searches for documents and aggregates them as a whole, returning all instruments of the document(s) found by the query.
And of course, I want the autocomplete to be fuzzy in the end.
Can anybody point me in a direction?
Thanks in advance!
(I'm running elasticsearch 2.3, but I don't mind upgrading!)
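No answer is recorded for this question. One possible direction, offered only as a sketch, is a terms aggregation whose `include` parameter restricts the returned buckets to terms matching a regular expression. The code below builds such a request body and, separately, spells out the expected result in plain Python. It assumes the `instruments` field is indexed unanalyzed so that each name is a single term; both function names are hypothetical, and fuzziness is not covered.

```python
def instrument_suggest_request(prefix, size=10):
    """Build a search body that returns distinct instrument names by prefix."""
    if not prefix.isalpha():
        # Keep the sketch simple: no regex metacharacters in the prefix.
        raise ValueError("prefix must contain letters only")
    return {
        "size": 0,  # no document hits, only the aggregation buckets
        "aggs": {
            "instruments": {
                "terms": {
                    "field": "instruments",
                    "include": prefix.lower() + ".*",
                    "size": size,
                }
            }
        },
    }


def local_suggest(docs, prefix):
    """Plain-Python reference for the expected result: unique prefix matches."""
    seen = []
    for doc in docs:
        for name in doc["instruments"]:
            if name.startswith(prefix) and name not in seen:
                seen.append(name)
    return seen


pieces = [
    {"instruments": ["soprano", "alto", "tenor", "bass",
                     "trumpet", "trumpet", "piano"]},
    {"instruments": ["trombone", "organ"]},
]
print(local_suggest(pieces, "t"))   # ['tenor', 'trumpet', 'trombone']
print(local_suggest(pieces, "tr"))  # ['trumpet', 'trombone']
print(instrument_suggest_request("tr"))
```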

Multiple atomic updates using MongoDB?

I am using CodeIgniter and Alex Bilbie's MongoDB library.
In the API that I am developing, users can upload images and other users can comment on them.
I have chosen to include the comments as subdocuments of the images.
Each comment contains:
Fullname (of author)
Comment
Created_at
In other words, the user's full name is "hard coded" into each comment, so if they later decide to change their name I have a problem.
I read that I can use atomic updates to update all occurrences of the name (like in comments), but how can I do this using Alex's library? Can I update all places where the name is wrong?
UPDATE
This is what the image document looks like with the comments.
I think it is pretty strange that MongoDB encourages the use of subdocuments but then does not include a way to update multiple items in an array.
{
"_id": ObjectId("4e9ead773dc793dc01020000"),
"description": "An image",
"category": "accident",
"comments": [
{
"id": ObjectId("4e96bd063dc7937202000000"),
"fullname": "James Bond",
"comment": "This is a comment.",
"created_at": "2011-10-19 13:02:40"
}
],
"created_at": "2011-10-19 12:59:03"
}
Thankful for all help!
I am not familiar with CodeIgniter, but maybe the MongoDB shell syntax will help you:
db.comments.update( {"Fullname":"Andrew Orsich"},
{ $set : { Fullname: "New name"} }, false, true )
The last true flag indicates that you want to update multiple documents, so it is possible to update all comments in one update operation.
BTW: denormalizing (not 'hard coding') data is a usual operation in MongoDB and in NoSQL in general. Operations that require updating a lot of documents also usually run asynchronously. But it is up to you.
Update:
db.comments.update( {"comments.Fullname":"Andrew Orsich"},
{ $set : { "comments.$.Fullname": "New name"} }, false, true )
However, the query above will update the full name only in the first matching comment of each document's nested array. If you need to change more than one array element, you will need to use multiple update statements.
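Newer MongoDB releases (3.6 and later) added the filtered positional operator `$[<identifier>]` together with the `arrayFilters` option, which can change every matching array element in one statement. This was not available when the answer above was written. The sketch below only builds the arguments; the helper name `rename_author_update`, the identifier `c`, and the `images` collection are hypothetical, and the lowercase `fullname` follows the image document shown in the question.

```python
def rename_author_update(old_name, new_name):
    """Return (filter, update, array_filters) renaming every matching comment."""
    # Select documents with at least one comment by the old name.
    query = {"comments.fullname": old_name}
    # $[c] addresses every array element that satisfies the filter named c.
    update = {"$set": {"comments.$[c].fullname": new_name}}
    array_filters = [{"c.fullname": old_name}]
    return query, update, array_filters


query, update, array_filters = rename_author_update("James Bond", "New name")
# With pymongo and a collection handle this would be passed as:
#   images.update_many(query, update, array_filters=array_filters)
print(query, update, array_filters)
```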

Complex CouchDb View

I have the following documents in a database:
{
id: 1,
text: 'Hello I had a big grannysmith apple today and a big pear'
},
{
id: 2,
text: 'Hello I had a big apple today only'
},
{
id: 3,
text: 'Hello I had a big apple today, a big pear yesterday and a big orange today'
}
My view needs to return an aggregated count of specific keywords in the text, but the keywords need to be 'loose' in how they are matched. My view should return something like this:
{
'grannysmith apple' : 3,
'pear': 2,
'orange': 1
}
As you can see, I have counted apples 3 times even though the tag is 'grannysmith apple'; I still want to pick up any occurrences of 'apple' as well.
Is this possible? Or should I be doing this before I insert into CouchDB? I'm using node.js to perform the saving - should I do it in node.js?
Thanks in advance!
I think you should use RegExp objects as keywords and aggregate with group_level.
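One way to read this suggestion is sketched below as a plain-Python simulation of the map and reduce steps; in CouchDB itself the map function would be written in JavaScript. The regular expressions are illustrative assumptions about what 'loose' matching should mean, and the function names are hypothetical.

```python
import re
from collections import Counter

# Label -> loose pattern. The patterns are illustrative assumptions.
KEYWORDS = {
    "grannysmith apple": re.compile(r"\b(?:grannysmith\s+)?apples?\b", re.I),
    "pear": re.compile(r"\bpears?\b", re.I),
    "orange": re.compile(r"\boranges?\b", re.I),
}


def map_doc(doc):
    """Emit one (keyword, 1) pair per match, like a view's map function."""
    for label, pattern in KEYWORDS.items():
        for _ in pattern.finditer(doc["text"]):
            yield label, 1


def reduce_counts(pairs):
    """Sum the emitted values per key, like a sum reduce grouped by key."""
    counts = Counter()
    for key, value in pairs:
        counts[key] += value
    return dict(counts)


docs = [
    {"id": 1, "text": "Hello I had a big grannysmith apple today and a big pear"},
    {"id": 2, "text": "Hello I had a big apple today only"},
    {"id": 3, "text": "Hello I had a big apple today, a big pear yesterday and a big orange today"},
]
result = reduce_counts(pair for doc in docs for pair in map_doc(doc))
print(result)  # {'grannysmith apple': 3, 'pear': 2, 'orange': 1}
```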

Resources