Retrieving a Subset of Fields from MongoDB in Ruby

I'm trying to get a subset of fields from MongoDB with a query made in Ruby, but it doesn't seem to work: it doesn't return any results.
This is the ruby code:
coll.find("title" => 'Halo', :fields => ["title", "isrc"]) #this doesn't work
If I remove the fields hash, it works, returning the results with all the fields
coll.find("title" => 'Halo') #this works
Looking at the MongoDB console, the first query ends up on the MongoDB server like this:
{ title: "Halo", fields: [ "title", "isrc" ] }
If I try to make the query from the mongo client console, it works, I get the results and the subset. I make the query like this:
db.tracks.find({title: 'Halo'}, {title:1,isrc:1})
What could be the problem? I've been looking for a solution for this for a couple of hours now.

As of Sep, 2015, these other answers are outdated. You need to use the projection method: #projection(hash)
coll.find({"title" => 'Halo'}).projection({title: 1, isrc: 1})

The query should look like
collection.find(selector = {}, opts = {})
Query the database
In your case it is
coll.find({"title" => 'Halo'}, {:fields => ["title", "isrc"]})
But a problem still remains: the Ruby driver ignores the "fields" option and returns all the fields! :\
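Why the original call sent "fields" to the server as part of the query can be reproduced without a database. The sketch below uses a hypothetical stand-in method, fake_find, that has the same (selector = {}, opts = {}) parameter shape as the signature above; it is not the driver's code. It shows that Ruby folds a brace-less list of key/value pairs into a single hash, so :fields ends up inside the selector.

```ruby
# Hypothetical stand-in with the same parameter shape as collection.find;
# it only reports which argument each value landed in.
def fake_find(selector = {}, opts = {})
  { selector: selector, opts: opts }
end

# Without braces, Ruby treats all key/value pairs as ONE hash argument.
merged = fake_find("title" => "Halo", :fields => ["title", "isrc"])

# With explicit braces, the selector and the options are separate arguments.
split = fake_find({ "title" => "Halo" }, { :fields => ["title", "isrc"] })

p merged[:selector]
p split[:selector]
p split[:opts]
```

In the first call the options hash stays empty, which matches the query seen on the server console in the question.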

This query will return only the title and isrc for a doc that has the title "Halo":
coll.find({"title" => 'Halo'},{:fields => {"_id" => 0, "title" => 1, "isrc" => 1}})
Note the use of a Hash for the fields where the keys are the field names and the values are either 1 or 0, depending on whether you want to include or exclude the given field.
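To make the 1/0 convention concrete, here is a minimal pure-Ruby imitation of an inclusion projection applied to a plain hash. The project method and the sample document (including the isrc value) are made up for illustration; this is a sketch of the semantics, not how the driver or server implements them.

```ruby
# Pure-Ruby imitation of an inclusion projection; not part of any driver.
# spec maps field names to 1 (include) or 0 (exclude).
def project(doc, spec)
  included = spec.select { |_, flag| flag == 1 }.keys
  # _id is returned by default unless the spec explicitly sets it to 0.
  included.unshift("_id") unless spec["_id"] == 0 || included.include?("_id")
  doc.select { |key, _| included.include?(key) }
end

# Made-up sample document.
doc = { "_id" => 1, "title" => "Halo", "isrc" => "XX0000000001", "artist" => "unknown" }

without_id = project(doc, { "_id" => 0, "title" => 1, "isrc" => 1 })
with_id    = project(doc, { "title" => 1, "isrc" => 1 })

p without_id
p with_id
```

Leaving "_id" out of the spec keeps it in the result, which is why the answer above sets it to 0 explicitly.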

If you don't want _id to be retrieved, you can use the query below:
coll.find({"title" => 'Halo'}).projection({title: 1, isrc: 1, _id: 0})

Related

Elasticsearch Completion Suggester ignores Index parameter and returns results for multiple indices

I'm using the PHP implementation of Elastic to use a Completion Suggester like this:
$params_organisations = [
'index' => $this->organisation_index,
'body' => [
"suggest" => [
"suggestions" => [
'prefix' => $request->q,
"completion" => [
"field" => "suggest1",
"fuzzy" => ["fuzziness" => 0],
"skip_duplicates" => "false",
"size" => 7
]
]
]
]
];
However, the response contains other indices as well:
suggest: {suggestions: Array(1)}
timed_out: false
took: 8
_shards:
failed: 3
failures: Array(3)
0:
index: ".kibana_1"
node: "xxxxxxxxx"
reason: {type: "illegal_argument_exception", reason: "no mapping found for field [suggest1]"}
I fear this might impact performance, as some other indices contain a suggest1 field as well, and they are searched and return results. I've not changed the names, and sometimes I want to treat the suggest fields in a similar way, but is it problematic to have identical suggest-type field names across indices?
Or is there a way to more explicitly define an index? I've also tried appending the index name to the endpoint, but same result. I've found an explicit suggest endpoint in the PHP implementation, but it seems to be deprecated? Any help is much appreciated!
Ok, so the problem was not with ElasticSearch, it turns out the index string coming from the configuration was not being processed correctly, yielding an empty string, causing Elastic to query all indices.
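A small guard can catch this kind of configuration problem before the request is sent. The following is a minimal sketch in Ruby (the question itself uses PHP); the helper names require_index! and suggest_params and the index name "organisations" are illustrative assumptions, not part of any Elasticsearch client.

```ruby
# Hypothetical helper: refuse to continue when the index name is missing,
# since an empty index name makes the search run against all indices.
def require_index!(name)
  index = name.to_s.strip
  raise ArgumentError, "index name is empty; refusing to search all indices" if index.empty?
  index
end

# Builds request parameters shaped like the ones in the question.
def suggest_params(index_name, prefix)
  {
    index: require_index!(index_name),
    body: {
      suggest: {
        suggestions: {
          prefix: prefix,
          completion: { field: "suggest1", size: 7 }
        }
      }
    }
  }
end

params = suggest_params("organisations", "ab")
p params[:index]
```

Failing loudly on an empty or nil index name surfaces the misconfiguration immediately instead of silently widening the search.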

Simple query without a specified field searching in whole ElasticSearch index

Say we have an ElasticSearch instance and one index. I now want to search the whole index for documents that contain a specific value. The query needs to cover multiple fields, so I don't want to specify every field to search in.
My attempt so far (using NEST) is the following:
var res2 = client.Search<ElasticCompanyModelDTO>(s => s.Index("cvr-permanent").AllTypes().
Query(q => q
.Bool(bo => bo
.Must( sh => sh
.Term(c=>c.Value(query))
)
)
));
However, the query above results in an empty query:
I get the following output, ### ES REQEUST ### {}, after adding the following debug hooks to my connection settings:
.DisableDirectStreaming()
.OnRequestCompleted(details =>
{
Debug.WriteLine("### ES REQEUST ###");
if (details.RequestBodyInBytes != null) Debug.WriteLine(Encoding.UTF8.GetString(details.RequestBodyInBytes));
})
.PrettyJson();
How do I do this? Why is my query wrong?
Your problem is that you must specify a single field to search as part of a TermQuery. In fact, all ElasticSearch queries require a field or fields to be specified as part of the query. If you want to search every field in your document, you can use the built-in "_all" field (unless you've disabled it in your mapping.)
You should be sure you really want a TermQuery, too, since that will only match exact strings in the text. This type of query is typically used when querying short, unanalyzed string fields (for example, a field containing an enumeration of known values like US state abbreviations.)
If you'd like to query longer full-text fields, consider the MultiMatchQuery (it lets you specify multiple fields, too.)
https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-multi-match-query.html
Try this
var res2 = client.Search<ElasticCompanyModelDTO>(s =>
s.Index("cvr-permanent").AllTypes()
.Query(qry => qry
.Bool(b => b
.Must(m => m
.QueryString(qs => qs
.DefaultField("_all")
.Query(query))))));
The existing answers rely on the presence of _all. In case anyone comes across this question at a later date, it is worth knowing that _all was removed in ElasticSearch 6.0
There's a really good video explaining the reasons behind this and the way the replacements work from ElasticOn starting at around 07:30 in.
In short, the _all query can be replaced by a simple_query_string and it will work in the same way. The form for the _search API would be:
GET <index>/_search
{
"query": {
"simple_query_string" : {
"query": "<queryTerm>"
}
}
}
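For readers building this request body programmatically, here is a minimal sketch in Ruby using only the standard library. The method name simple_query_body and the search term "acme" are placeholders chosen for illustration; sending the body to the _search endpoint is left to whichever HTTP or Elasticsearch client is in use.

```ruby
require "json"

# Builds the request body shown above for a given search term.
# No fields are listed, so no per-field configuration is needed here.
def simple_query_body(term)
  { "query" => { "simple_query_string" => { "query" => term } } }
end

body = simple_query_body("acme")
puts JSON.pretty_generate(body)
```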
The NEST pages on Elastic's documentation for this query are here;

Restrict a find query MongoDB with ruby driver

I have a mongo query that looks like this:
coll.find({ foo: bar }, {sort: { '$natural' => -1 }, limit: 1})
My problem is that my collection contains a huge quantity of documents and the find is really slow.
So I was wondering if I could restrict the query to the last x inserted documents?
Something like this :
coll.find({ foo: bar }, {sort: { '$natural' => -1 }, limit: 1, max_doc: 10_000})
Thanks and sorry for my english.
You can't apply that restriction at query time. But you could have a separate capped collection for this purpose.
You insert into both and run this query on the capped collection, which will only retain the last N documents.
But this will only fit if you don't need to update or remove documents from that collection.
http://docs.mongodb.org/manual/core/capped-collections

How do you calculate the average of all entries in a field from a specific collection in Mongo using Ruby

Given the following data:
{
_id: ObjectId("51659dc99d62eedc1a000001"),
type: "image_search",
branch: "qa_media_discovery_feelobot",
time_elapsed: 19000,
test: "1365613930 All Media",
search_term: null,
env: "delta",
date: ISODate("2013-04-10T17:13:45.751Z")
}
I would like to run a command like:
avg_image_search_time = @coll.find("type" => "image_search").avg(:time_elapsed)
How would I accomplish this?
I understand the documentation on this is kind of difficult to follow.
avg_image_search_time = @coll.aggregate([ {"$group" => {"_id"=>"$type", "avg"=> {"$avg"=>"$time_elapsed"}}}, {"$match" => {"_id"=>"image_search"}} ]).first['avg']
To break this down:
We are grouping the matches by the type field, and returning the $avg time_elapsed for each type. We name the resulting average avg. Then, of those groups, filter out only the ones where the group _id matches image_search. Finally, since aggregate always returns an array, get the first result (there should only be one), and grab the avg field that we named.
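The same group-then-filter logic can be mimicked in plain Ruby on an in-memory array, which may help when checking what the pipeline is expected to return. This is only an illustration of the logic with made-up documents; it does not call MongoDB.

```ruby
# In-memory stand-ins for documents; the values are made up for illustration.
docs = [
  { "type" => "image_search", "time_elapsed" => 19_000 },
  { "type" => "image_search", "time_elapsed" => 21_000 },
  { "type" => "video_search", "time_elapsed" => 5_000 }
]

# Counterpart of $group on "$type" with "avg" => {"$avg" => "$time_elapsed"}.
averages = docs.group_by { |d| d["type"] }.map do |type, group|
  times = group.map { |d| d["time_elapsed"] }
  { "_id" => type, "avg" => times.sum.to_f / times.size }
end

# Counterpart of $match on the group _id, then reading the "avg" field.
avg_image_search_time = averages.find { |g| g["_id"] == "image_search" }["avg"]
puts avg_image_search_time
```

On a large collection it is usually cheaper to filter first, for example with a {"$match" => {"type" => "image_search"}} stage placed before the $group stage, so that only the matching documents are grouped.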
Use the mongodb aggregation framework http://docs.mongodb.org/manual/core/aggregation/

Getting the highest value of a column in MongoDB

I've been looking for some help on getting the highest value of a column for a mongo document. I can sort it and get the top/bottom, but I'm pretty sure there is a better way to do it.
I tried the following (and different combinations):
transactions.find("id" => x).max({"sellprice" => 0})
But it keeps throwing errors. What's a good way to do it besides sorting and getting the top/bottom?
Thank you!
max() in Mongo does not work the way you would expect from SQL. This may change in future versions, but as of now max and min are meant to be used with indexed keys, primarily internally for sharding.
see http://www.mongodb.org/display/DOCS/min+and+max+Query+Specifiers
Unfortunately for now the only way to get the max value is to sort the collection desc on that value and take the first.
transactions.find("id" => x).sort({"sellprice" => -1}).limit(1).first()
Sorting might be overkill. You can just do a group by
db.messages.group(
{key: { created_at:true },
cond: { active:1 },
reduce: function(obj,prev) { if(prev.cmax<obj.created_at) prev.cmax = obj.created_at; },
initial: { cmax: **any one value** }
});
db.collectionName.aggregate(
{
$group :
{
_id : "",
last :
{
$max : "$sellprice"
}
}
}
)
Example mongodb shell code for computing aggregates.
see mongodb manual entry for group (many applications) :: http://docs.mongodb.org/manual/reference/aggregation/group/#stage._S_group
In the below, replace the $vars with your collection key and target variable.
db.activity.aggregate(
{ $group : {
_id:"$your_collection_key",
min: {$min : "$your_target_variable"},
max: {$max : "$your_target_variable"}
}
}
)
Use aggregate():
db.transactions.aggregate([
{$match: {id: x}},
{$sort: {sellprice:-1}},
{$limit: 1},
{$project: {sellprice: 1}}
]);
It will work as per your requirement.
transactions.find("id" => x).sort({"sellprice" => -1}).limit(1).first()
If the column's indexed then a sort should be OK, assuming Mongo just uses the index to get an ordered collection. Otherwise it's more efficient to iterate over the collection, keeping note of the largest value seen. e.g.
max = nil
coll.find("id" => x).each do |doc|
if max == nil or doc['sellprice'] > max then
max = doc['sellprice']
end
end
(Apologies if my Ruby's a bit ropey, I haven't used it for a long time - but the general approach should be clear from the code.)
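The same client-side scan can be written more compactly with Ruby's Enumerable methods. The sketch below runs on a made-up array of hashes standing in for the query results; applying it directly to a cursor assumes the cursor object is Enumerable, which the use of each above suggests but which should be checked for the driver version in use.

```ruby
# Made-up documents standing in for the results of coll.find("id" => x).
docs = [
  { "id" => 1, "sellprice" => 12.5 },
  { "id" => 1, "sellprice" => 40.0 },
  { "id" => 1 },                      # document without a sellprice
  { "id" => 1, "sellprice" => 7.25 }
]

# Largest sellprice, ignoring documents where the field is missing.
max_price = docs.map { |doc| doc["sellprice"] }.compact.max

# The whole document that carries the largest sellprice.
top_doc = docs.select { |doc| doc["sellprice"] }.max_by { |doc| doc["sellprice"] }

p max_price
p top_doc
```

Like the loop above, this still transfers every matching document to the client, so it only makes sense when the result set is small or no suitable index exists.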
Assuming I was using the Ruby driver (I saw a mongodb-ruby tag on the bottom), I'd do something like the following if I wanted to get the maximum _id (assuming my _id is sortable). In my implementation, my _id was an integer.
result = my_collection.find({}, :sort => ['_id', :desc]).limit(1)
To get the minimum _id in the collection, just change :desc to :asc
The following query does the same thing:
db.student.find({}, {'_id':1}).sort({_id:-1}).limit(1)
For me, this produced following result:
{ "_id" : NumberLong(10934) }
