ElasticSearch Chaining queries based on result from first query - elasticsearch

I have some data and I am looking to implement a search feature that probably requires chaining multiple queries. for example there are few people who are part of a group but each member in the database are separate. None of the data is nested.
For example
data = [
{
id: '1'
name: 'abc',
familyId: '3'
},
{
id: '2'
name: 'def',
familyId: '3'
},
{
id: '3'
name: 'ghi',
familyId: null
},
]
So now I am trying to implement a search feature where people can search by name, and if the name matches I want to show that result along with his family members. Each data is different and there is no connection between them apart from the familyId.
So currently my solution is to make a search using the name first and then from the result of my first search I will see if there is family ID present in the result, and if yes make another ES query to get all the members and then show the result.
Is there a away I could make it one query that will give me the desired output?
Any suggestion is very much appreciated.

there's no native Elasticsearch functionality for this unfortunately, your approach is the current best way to do it

Related

ElasticSearch - backward pagination with search_after when sorting value is null

I have an application which has a dashboard, basically a table with hundreds of thousands of records.
This table has up to 50 different columns. These columns have different types in mapping: keyword, text, boolean, integer.
As records in the table might have the same values, I use sorting as an array of 2 attributes:
First attribute is what client wants to sort by. It can be a simple
sorting object or some sort query with nested filter.
Second
attribute is basically a default sorting by id, needed for sorting
the documents which have identical values for the column customer
wants to sort by.
I checked multiple topics/issues on github and here
on elastic forum to understand how to implement search_after
mechanism for back sorting but it's not working for all the cases I
need.
Please have a look at the image:
Imagine there is a limit = 3, the customer right now is on the 3d page of a table and all the data is sorted by name asc, _id asc
The names are: A, B, C, D, E on the image.
The ids are numeric parts of the Doc word.
When customer wants to go back to the previous page, which is a page #2 on my picture, what I do is pass the following to elastic:
sort: [
{
name: 'desc'
},
{
_id: 'desc'
}
],
search_after: [null, Doc7._id]
As as result, I get only one document, which is Doc6: null on my image. It seems to be logical, because I ask elastic to search by desc after null and id 7 and I have only 1 doc corresponding this..it's Doc6 but it's not what I need.
I can't make up the solution to get the data that I need.
Could anyone help, please?

Elasticsearch question, should I have duplicate data along 2 different indices? Not sure how to set up the data

Edit: 3 different incides. Sorry about the title :c
I am trying to grasp elasticsearch as fast as I can but I think I've confused myself majorly here. How should I set this data up?
I have 3 major searches:
1: Search by pokemon name. Eg: Show all Charizard in the system.
2: Search by trainer name Eg: Show all of John Doe's pokemon/checkins at the pokecenter.
3: Search by checkins at the pokecenter.
Should each of these be in their own separate index? I am absolutely from an SQL background primarily so I want to have separate tables for all of these. But that isn't how elasticsearch works... so I am really confused here.
Should I have a separate index for each pokemon?
And then another separate index for each trainer?
And then another separate index for each checkin at the pokecenter?
Query return examples
1: Search by pokemon name.
{
1 : {
id: 9239329,
pokeId: 6,
name: Charizard,
trainerId: 2932
}
}
2: Search by trainer name
{
1 : {
id: 2932,
name: John Doe,
pokemon: [
9239329
]
}
}
3: Search by checkins at the pokecenter.
{
1 : {
id: 3232,
date: 11/11/1111,
pokemon: [
9239329
],
trainerId: 2932
}
}
But if I have a separate index.... and index for EACH of these ... while that would be fast wouldn't that just be crazy horrendous data duplication?
It depends on the scope of the project :
the ideal way is to have each one as it's separate index this allows you to scale them differently if needed and move them to another cluster and also allow each one to have different replica settings
The quick way , is to have the checkins as an index and the trainer as a nested object , and under that the pokemon as a nested object.
note: nested queries are slower, and writing the queries to return exactly what you want is a little tricker.

elasticsearch return unique matched values

I recently started looking at elasticsearch, I'm in the process of learning what it can do and decide how I can use it in my projects.
For one project I used a couchdb (noSQL) database. The client can search it using couchdb views. Easy, but limited in functionality.
I'd like to have elasticsearch to open up the data in a far more rich way.
Searching for composers and titles of musical pieces is now handled by elasticsearch with amazingly fast 'query_string's. And it's fuzzy!
There is one thing however I did not manage to accomplish with elasticsearch, but I'm pretty sure it's possible, I'm just missing it.
It's about the autocomplete functionality when entering instrument names.
For example:
I have 2 documents (musical pieces) with different instruments needed to play them:
{
title: 'Awesome Piece',
authors: [{
name: 'John Doe',
role: 'composer'
}, {
name: 'Shakespeare',
role: 'lyricist'
}],
instruments: [
'soprano',
'alto',
'tenor',
'bass',
'trumpet',
'trumpet',
'piano'
]
}
{
title: 'Not so Awesome Piece',
authors: [{
name: 'Another J. Doe',
role: 'composer'
}, {
name: 'Little John',
role: 'arranger'
}],
instruments: [
'trombone',
'organ'
]
}
To enter a new musical piece there is a field to insert instrument names. I'd like the offer an autocomplete.
So if the user types 't', I want a list of all instruments matching 't*': ['tenor', 'trumpet', 'trombone'], if he types 'tr', I need: ['trumpet', 'trombone']
The best I coold find was a query with an aggregation, but it searches for documents and aggregates them as a whole, returning all instruments of the document(s) found with the query.
And off course, I want the autocomplete to be fuzzy in the end.
Can anybody point me in a direction?
Thanks in advance!
(I'm running elasticsearch 2.3, but I don't mind upgrading!)

How do I model my document in MongoDB to make it paginable for nested attributes?

I'm trying to cache my tweets and show that based on my keyword save. However, as tweets grow overtime I need to paginate them.
I'm using Ruby and Mongoid which this is what I have come up so far.
class SavedTweet
include Mongoid::Document
field :saved_id, :type => String
field :slug, :type => String
field :tweets, :type => Array
end
And the tweets array would be like this
{id: "id", text: "text", created_at: "created_at"}
So it's like a bucket for each keyword that you can save. My first problem would be that Mongodb cannot sort the second level of document which in this case it's tweets and that'd make pagination much harder because I cannot use skip and limit. I will have to load the whole tweets and put that in the cache and paginate from that.
The question is how should I model my problem to make it paginable out of Mongodb and not in the memory. I'm assuming that doing it in Mongodb would be faster. Right now, I'm in the early stage of my application so it's easier to change the model than later. If you guys have any suggestions or opinion I'm really appreciated.
An option could be to save tweets in a different collection and link them with your SavedTweet class. It will be easy to query and you could use skip and limit without problems.
{id: "id", text: "text", created_at: "created_at", saved_tweet:"_id"}
EDIT: a better explanation, with two aditional options
As far I see, you have three options, if I understand correctly your requirements:
Use the same schema that you are already using. You would have two problems: you cannot use skip and limit with an usual query and you have a limit of 16 MB per document. I think, the first one could be resolved with an Aggregation Framework query ($unwind, $skip and $limit could be helpful). The second one could be a problem if you have a lot of tweet documents in the array, because one document cannot have more than 16MB of size.
Use two collections to store your tweets. One collection would have the same structure that you already have. For example:
{
save_id:"23232",
slug:"adkj"
}
And the other collection would have one document per tweet.
{
id: "id",
text: "text",
created_at: "created_at",
saved_tweet:"_id"
}
With saved_tweet field you are linking saved_tweets with tweet with a 1 to N relation. So with this way, you can carry out queries over tweet collection and still be able to use limit and skip operators..
Save all info in the same document. If your saved_tweet collection only have those fields, you can save all info in a whole document (one document for each tweet). Something like this:
{
save_id:"23232",
slug:"adkj"
tweet:
{
tweet_id: "id",
text: "text",
created_at: "created_at"
}
}
Whit this solution you are duplicating fields, because *save_id* and slug would be the same in other documents of the same saved_tweet, but I could be an option if you have a little quantity of fields and that fields are not subdocuments or arrays.
I hope it is clear now.

How do you calculate the average of a all entries in a field from a specific collection in Mongo using Ruby

Given the following data:
{
_id: ObjectId("51659dc99d62eedc1a000001"),
type: "image_search",
branch: "qa_media_discovery_feelobot",
time_elapsed: 19000,
test: "1365613930 All Media",
search_term: null,
env: "delta",
date: ISODate("2013-04-10T17:13:45.751Z")
}
I would like to run a command like:
avg_image_search_time = #coll.find("type" => "image_search").avg(:time_elapsed)
How would I accomplish this?
I understand the documentation on this is kind of difficult to follow.
avg_image_search_time = #coll.aggregate([ {"$group" => {"_id"=>"$type", "avg"=> {"$avg"=>"$time_elapsed"}}}, {"$match" => {"_id"=>"image_search"}} ]).first['avg']
To break this down:
We are grouping the matches by the type field, and returning the $avg time_elapsed for each type. We name the resulting average avg. Then, of those groups, filter out only the ones where the group _id matches image_search. Finally, since aggregate always returns an array, get the first result (there should only be one), and grab the avg field that we named.
Use the mongodb aggregation framework http://docs.mongodb.org/manual/core/aggregation/

Resources