Complex CouchDB view

I have the following documents in a database:
{
  id: 1,
  text: 'Hello I had a big grannysmith apple today and a big pear'
},
{
  id: 2,
  text: 'Hello I had a big apple today only'
},
{
  id: 3,
  text: 'Hello I had a big apple today, a big pear yesterday and a big orange today'
}
My view needs to return an aggregated count of specific keywords in the text, but the keywords need to be 'loose' in how they are found. My view should return something like this:
{
  'grannysmith apple': 3,
  'pear': 2,
  'orange': 1
}
As you can see, I have counted apples 3 times: even though the tag is 'grannysmith apple', I still want to pick up any occurrence of 'apple' as well.
Is this possible? Or should I be doing this before I insert into CouchDB? I'm using Node.js to perform the saving; should I do it there instead?
Thanks in advance!

I think you should use RegExp objects as keywords and aggregate with group_level.
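If you do decide to handle this in Node.js before inserting, here is a minimal sketch of the 'loose' counting; the tag-to-pattern mapping and the `countKeywords` helper are made up for illustration, not part of any CouchDB API:

```javascript
// Sample documents from the question.
const docs = [
  { id: 1, text: 'Hello I had a big grannysmith apple today and a big pear' },
  { id: 2, text: 'Hello I had a big apple today only' },
  { id: 3, text: 'Hello I had a big apple today, a big pear yesterday and a big orange today' }
];

// Map each display tag to a "loose" pattern: the 'grannysmith apple'
// tag matches any plain 'apple' occurrence too.
const keywords = {
  'grannysmith apple': /\bapple\b/g,
  'pear': /\bpear\b/g,
  'orange': /\borange\b/g
};

// Count every match of each pattern across all documents.
function countKeywords(docs, keywords) {
  const counts = {};
  for (const [tag, pattern] of Object.entries(keywords)) {
    counts[tag] = docs.reduce(
      (sum, doc) => sum + (doc.text.match(pattern) || []).length, 0);
  }
  return counts;
}

countKeywords(docs, keywords);
// { 'grannysmith apple': 3, pear: 2, orange: 1 }
```

The same regex idea could live inside a CouchDB map function, but doing it at write time keeps the view simple.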

Related

Elasticsearch question: should I have duplicate data across 2 different indices? Not sure how to set up the data

Edit: 3 different indices. Sorry about the title :c
I am trying to grasp Elasticsearch as fast as I can, but I think I've majorly confused myself here. How should I set this data up?
I have 3 major searches:
1: Search by pokemon name. Eg: Show all Charizard in the system.
2: Search by trainer name. Eg: Show all of John Doe's pokemon/checkins at the pokecenter.
3: Search by checkins at the pokecenter.
Should each of these be in their own separate index? I come primarily from an SQL background, so I want to have separate tables for all of these. But that isn't how Elasticsearch works... so I am really confused here.
Should I have a separate index for each pokemon?
And then another separate index for each trainer?
And then another separate index for each checkin at the pokecenter?
Query return examples
1: Search by pokemon name.
{
  1: {
    id: 9239329,
    pokeId: 6,
    name: 'Charizard',
    trainerId: 2932
  }
}
2: Search by trainer name
{
  1: {
    id: 2932,
    name: 'John Doe',
    pokemon: [
      9239329
    ]
  }
}
3: Search by checkins at the pokecenter.
{
  1: {
    id: 3232,
    date: '11/11/1111',
    pokemon: [
      9239329
    ],
    trainerId: 2932
  }
}
But if I have a separate index... an index for EACH of these... while that would be fast, wouldn't that just be crazy horrendous data duplication?
It depends on the scope of the project:
The ideal way is to have each one as its own separate index. This allows you to scale them differently if needed, move them to another cluster, and give each one different replica settings.
The quick way is to have the checkins as an index, with the trainer as a nested object, and under that the pokemon as a nested object.
Note: nested queries are slower, and writing the queries to return exactly what you want is a little trickier.
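To make the "quick way" concrete, here is a rough sketch of what a single checkin document would look like with the trainer and pokemon embedded; the field names follow the question's examples, but the exact shape and any nested mapping are illustrative assumptions:

```javascript
// One checkin document embedding its trainer, which embeds the pokemon.
// Everything lives in a single index; no cross-index lookups are needed.
const checkin = {
  id: 3232,
  date: '11/11/1111',
  trainer: {
    id: 2932,
    name: 'John Doe',
    pokemon: [
      { id: 9239329, pokeId: 6, name: 'Charizard' }
    ]
  }
};
```

All three searches from the question can then be answered from this one document shape, at the cost of the duplication and slower nested queries mentioned above.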

Counting documents by property occurrence in Kibana

I'm trying to create a visualization that looks like this:
Foobar, 10
Bar, 8
Baz, 5.6
The first column is the aggregation itself. Imagine I have documents like this:
{
  id: 1,
  name: 'lorem ipsum',
  type: 'A',
  author: {
    name: 'Foobar'
  }
}
{
  id: 2,
  name: 'dolor sit amet',
  type: 'B',
  author: {
    name: 'Foobar'
  }
}
So, I want to add +1 to the score of "Foobar" every time I find a document of type A, and +2 to the score if I find a document of type B. Basically, aggregating by the author name and calculating a dynamic value on the results.
Is this possible in Kibana? Thanks for the help.
AFAIK, you can't do this in Kibana's Visualize panel. You could compute it in a program and then index the result into ES.
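Along the lines of that suggestion, the weighted score could be computed in code before indexing the result back into ES; a minimal sketch, where the `weights` table and `scoreByAuthor` helper are made up for illustration:

```javascript
// Documents as in the question.
const docs = [
  { id: 1, name: 'lorem ipsum', type: 'A', author: { name: 'Foobar' } },
  { id: 2, name: 'dolor sit amet', type: 'B', author: { name: 'Foobar' } }
];

// Per-type weights: +1 for type A, +2 for type B.
const weights = { A: 1, B: 2 };

// Aggregate a weighted score per author name.
function scoreByAuthor(docs, weights) {
  const scores = {};
  for (const doc of docs) {
    const author = doc.author.name;
    scores[author] = (scores[author] || 0) + (weights[doc.type] || 0);
  }
  return scores;
}

scoreByAuthor(docs, weights); // { Foobar: 3 }
```

The resulting per-author scores can then be indexed as their own documents and visualized directly in Kibana.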

Elasticsearch: return unique matched values

I recently started looking at Elasticsearch; I'm in the process of learning what it can do and deciding how I can use it in my projects.
For one project I used a CouchDB (NoSQL) database. The client can search it using CouchDB views. Easy, but limited in functionality.
I'd like to use Elasticsearch to open up the data in a far richer way.
Searching for composers and titles of musical pieces is now handled by elasticsearch with amazingly fast 'query_string's. And it's fuzzy!
There is one thing, however, that I did not manage to accomplish with Elasticsearch, but I'm pretty sure it's possible; I'm just missing it.
It's about the autocomplete functionality when entering instrument names.
For example:
I have 2 documents (musical pieces) with different instruments needed to play them:
{
  title: 'Awesome Piece',
  authors: [{
    name: 'John Doe',
    role: 'composer'
  }, {
    name: 'Shakespeare',
    role: 'lyricist'
  }],
  instruments: [
    'soprano',
    'alto',
    'tenor',
    'bass',
    'trumpet',
    'trumpet',
    'piano'
  ]
}
{
  title: 'Not so Awesome Piece',
  authors: [{
    name: 'Another J. Doe',
    role: 'composer'
  }, {
    name: 'Little John',
    role: 'arranger'
  }],
  instruments: [
    'trombone',
    'organ'
  ]
}
To enter a new musical piece, there is a field to insert instrument names. I'd like to offer an autocomplete.
So if the user types 't', I want a list of all instruments matching 't*': ['tenor', 'trumpet', 'trombone']; if he types 'tr', I need: ['trumpet', 'trombone'].
The best I could find was a query with an aggregation, but it searches for documents and aggregates them as a whole, returning all instruments of the document(s) found by the query.
And of course, I want the autocomplete to be fuzzy in the end.
Can anybody point me in a direction?
Thanks in advance!
(I'm running elasticsearch 2.3, but I don't mind upgrading!)
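To pin down the desired behaviour, here is a plain-JavaScript sketch of the target output; this only illustrates what the suggester should return (unique instrument names filtered by prefix), not the Elasticsearch query itself:

```javascript
// Instrument lists from the two sample documents.
const pieces = [
  { title: 'Awesome Piece',
    instruments: ['soprano', 'alto', 'tenor', 'bass', 'trumpet', 'trumpet', 'piano'] },
  { title: 'Not so Awesome Piece',
    instruments: ['trombone', 'organ'] }
];

// Collect unique instrument names across all pieces that start with the prefix.
function suggest(prefix, pieces) {
  const unique = new Set();
  for (const piece of pieces) {
    for (const name of piece.instruments) {
      if (name.startsWith(prefix)) unique.add(name);
    }
  }
  return [...unique];
}

suggest('t', pieces);  // ['tenor', 'trumpet', 'trombone']
suggest('tr', pieces); // ['trumpet', 'trombone']
```

In Elasticsearch this is the job of a terms aggregation with an `include` prefix pattern, or of a dedicated completion/suggest setup; this sketch just fixes the expected input and output.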

Restrict a MongoDB find query with the Ruby driver

I have a mongo query that looks like this:
coll.find({ foo: bar }, {sort: { '$natural' => -1 }, limit: 1})
My problem is that my collection contains a huge number of documents, and the find is really slow.
So I was wondering: could I restrict the query to the last x inserted documents?
Something like this :
coll.find({ foo: bar }, {sort: { '$natural' => -1 }, limit: 1, max_doc: 10_000})
Thanks, and sorry for my English.
You can't do that restriction at query time. But you could have a separate capped collection for this purpose.
You insert into both, and run this query on the capped collection, which will only retain the last N documents.
But this will only fit if you don't need to update or remove documents from that collection.
http://docs.mongodb.org/manual/core/capped-collections
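To illustrate the retention behaviour being relied on here (plain JavaScript rather than the Ruby driver; the `cappedInsert` helper is purely conceptual, not a MongoDB API):

```javascript
// Conceptual sketch of a capped collection: inserts beyond the cap
// evict the oldest documents, so queries only ever see the last N.
function cappedInsert(coll, doc, max) {
  coll.push(doc);
  if (coll.length > max) coll.shift(); // oldest document is dropped
  return coll;
}

const coll = [];
for (let i = 1; i <= 5; i++) cappedInsert(coll, { n: i }, 3);
// coll now holds only the last 3 inserts: { n: 3 }, { n: 4 }, { n: 5 }
```

In real MongoDB you would create the collection with `capped: true` and a size limit, and the server enforces this eviction for you.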

Can We Retrieve Previous _source Docs with Elasticsearch Versions?

I've read the blog post on ES regarding versioning.
However, I'd like to be able to get the previous _source documents from an update.
For example, let's say I have this object:
{
  "name": "John",
  "age": 32,
  "job": "janitorial technician"
}
// this becomes version 1
And I update it to:
{
  "name": "John",
  "age": 32,
  "job": "president"
}
// this becomes version 2
Then, through versioning in ES, would I be able to get the previous job property of the object? I've tried this:
curl -XGET "localhost:9200/index/type/id?version=1"
but that just returns the most up-to-date _source object (the one where John is president).
I'd actually like to implement a version differences aspect much like StackOverflow does. (BTW, I'm using elastic-search as my main db - if there's a way to do this with other NoSQL databases, I'd be happy to try it out. Preferably, one that integrates well with ES.)
No, you can't do this using the built-in versioning. All that does is store the current version number, to prevent you from applying updates out of order.
If you wanted to keep multiple versions available, then you'd have to implement that yourself. Depending on how many versions you are likely to store, you could take one of three approaches:
For low volume changes:
1) store older versions within the same document:
{
  text: "foo bar",
  date: "2011-11-01",
  previous: [
    { date: '2011-10-01', content: { text: 'Foo Bar' } },
    { date: '2011-09-01', content: { text: 'Foo-bar!' } }
  ]
}
For high volume changes:
2) add a current flag:
{
  doc_id: 123,
  version: 3,
  text: "foo bar",
  date: "2011-11-01",
  current: true
}
{
  doc_id: 123,
  version: 2,
  text: "Foo Bar",
  date: "2011-10-01",
  current: false
}
3) Same as (2) above, but store the old versions in a separate index, keeping your "live" index, which will be used for the majority of your queries, small and more performant.
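A minimal sketch of approach (1), copying the current content into `previous` before each update; the `pushVersion` helper is made up for illustration and only versions the `text` field from the example:

```javascript
// Before applying an update, prepend the current content to `previous`,
// newest first, matching the document shape shown in approach (1).
function pushVersion(doc, newContent, date) {
  const previous = doc.previous || [];
  previous.unshift({ date: doc.date, content: { text: doc.text } });
  return { ...newContent, date, previous };
}

let doc = { text: 'Foo Bar', date: '2011-10-01' };
doc = pushVersion(doc, { text: 'foo bar' }, '2011-11-01');
// { text: 'foo bar', date: '2011-11-01',
//   previous: [ { date: '2011-10-01', content: { text: 'Foo Bar' } } ] }
```

The updated document would then be re-indexed as a whole; older entries in `previous` simply accumulate, which is why this approach only suits low-volume changes.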
