Multiple limit condition in mongodb - ruby

I have a collection in which one of the field is "type". I want to get some values of each type depending upon condition which is same for all the types. Like I want 2 documents for type A, 2 for type B like that.
How to do this in a single query? I am using Ruby Active Record.

Generally what you are describing is a relatively common question around the MongoDB community which we could describe as the "top n results problem". This is when given some input that is likely sorted in some way, how to get the top n results without relying on arbitrary index values in the data.
MongoDB has the $first operator which is available to the aggregation framework which deals with the "top 1" part of the problem, as this actually takes the "first" item found on a grouping boundary, such as your "type". But getting more than "one" result of course gets a little more involved. There are some JIRA issues on this about modifying other operators to deal with n results or "restrict" or "slice". Notably SERVER-6074. But the problem can be handled in a few ways.
Popular implementations of the rails Active Record pattern for MongoDB storage are Mongoid and Mongo Mapper, both allow access to the "native" mongodb collection functions via a .collection accessor. This is what you basically need to be able to use native methods such as .aggregate() which supports more functionality than general Active Record aggregation.
Here is an aggregation approach with mongoid, though the general code does not alter once you have access to the native collection object:
require "mongoid"
require "pp";
Mongoid.configure.connect_to("test");
class Item
include Mongoid::Document
store_in collection: "item"
field :type, type: String
field :pos, type: String
end
Item.collection.drop
Item.collection.insert( :type => "A", :pos => "First" )
Item.collection.insert( :type => "A", :pos => "Second" )
Item.collection.insert( :type => "A", :pos => "Third" )
Item.collection.insert( :type => "A", :pos => "Forth" )
Item.collection.insert( :type => "B", :pos => "First" )
Item.collection.insert( :type => "B", :pos => "Second" )
Item.collection.insert( :type => "B", :pos => "Third" )
Item.collection.insert( :type => "B", :pos => "Forth" )
res = Item.collection.aggregate([
{ "$group" => {
"_id" => "$type",
"docs" => {
"$push" => {
"pos" => "$pos", "type" => "$type"
}
},
"one" => {
"$first" => {
"pos" => "$pos", "type" => "$type"
}
}
}},
{ "$unwind" => "$docs" },
{ "$project" => {
"docs" => {
"pos" => "$docs.pos",
"type" => "$docs.type",
"seen" => {
"$eq" => [ "$one", "$docs" ]
},
},
"one" => 1
}},
{ "$match" => {
"docs.seen" => false
}},
{ "$group" => {
"_id" => "$_id",
"one" => { "$first" => "$one" },
"two" => {
"$first" => {
"pos" => "$docs.pos",
"type" => "$docs.type"
}
},
"splitter" => {
"$first" => {
"$literal" => ["one","two"]
}
}
}},
{ "$unwind" => "$splitter" },
{ "$project" => {
"_id" => 0,
"type" => {
"$cond" => [
{ "$eq" => [ "$splitter", "one" ] },
"$one.type",
"$two.type"
]
},
"pos" => {
"$cond" => [
{ "$eq" => [ "$splitter", "one" ] },
"$one.pos",
"$two.pos"
]
}
}}
])
pp res
The naming in the documents is actually not used by the code, and titles in the data shown for "First", "Second" etc, are really just there to illustrate that you are indeed getting the "top 2" documents from the listing as a result.
So the approach here is essentially to create a "stack" of the documents "grouped" by your key, such as "type". The very first thing here is to take the "first" document from that stack using the $first operator.
The subsequent steps match the "seen" elements from the stack and filter them, then you take the "next" document off of the stack again using the $first operator. The final steps in there are really justx to return the documents to the original form as found in the input, which is generally what is expected from such a query.
So the result is of course, just the top 2 documents for each type:
{ "type"=>"A", "pos"=>"First" }
{ "type"=>"A", "pos"=>"Second" }
{ "type"=>"B", "pos"=>"First" }
{ "type"=>"B", "pos"=>"Second" }
There was a longer discussion and version of this as well as other solutions in this recent answer:
Mongodb aggregation $group, restrict length of array
Essentially the same thing despite the title and that case was looking to match up to 10 top entries or greater. There is some pipeline generation code there as well for dealing with larger matches as well as some alternate approaches that may be considered depending on your data.

You will not be able to do this directly with only the type column and the constraint that it must be one query. However there is (as always) a way to accomplish this.
To find documents of different types, you would need to have some type of additional value that, on average distributed the types out according to how you want the data back.
db.users.insert({type: 'A', index: 1})
db.users.insert({type: 'B', index: 2})
db.users.insert({type: 'A', index: 3})
db.users.insert({type: 'B', index: 4})
db.users.insert({type: 'A', index: 5})
db.users.insert({type: 'B', index: 6})
Then when querying for items with db.users.find(index: {$gt: 2, $lt: 7}) you will have the right distribution of items.
Though I'm not sure this was what you were looking for

Related

I have json data i need search `unique` if key exist or not

I have JSON data I need search unique if the key exists or not.
[
{
"key1" => []
},
{
"key" => []
},
{
"unique" => []
}
]
I can use loop but need an efficient way to check unique exist or not
You'll need to iterate through the array either way.
# You'll get found item or `nil`
data.find { |item| item.key?('unique') }
# You'll get `true` or `false`
data.any? { |item| item.key?('unique') }
Btw better to use a hash as an input instead of an array:
data = {
"key1" => [],
"key" => [],
"unique" => []
}
data.key?('unique')
=> true

Adding hashes to hashes (Ruby)

I've tried adding hashes through #Hash.new with no success, now I am trying .merge as per the forums with limited success as well. I'm trying to add #rand(1..100) into [0] without going into the hash manually. Any ideas?
#age = Hash.new
#email = Hash.new
#age2 = rand(1..100)
people = [
{
"first_name" => "Bob",
"last_name" => "Jones",
"hobbies" => ["basketball", "chess", "phone tag"]
},
{
"first_name" => "Molly",
"last_name" => "Barker",
"hobbies" => ["programming", "reading", "jogging"]
},
{
"first_name" => "Kelly",
"last_name" => "Miller",
"hobbies" => ["cricket", "baking", "stamp collecting"]
}
]
people[0].each do |w|
people.merge({:age => rand(1..100)})
puts "array 0 is #{w}"
end
puts p people
Assuming that's your structure, you do this:
people.each do |person|
person['age'] = rand(1..100)
end
You ideally want to use symbol-style keys instead. That would mean declaring them like this:
people = [
{
first_name: "Bob",
last_name: "Jones",
...
},
...
]
That way you access them like people[0][:first_name]. Your merged in hash uses symbol keys for :age. Remember in Ruby strings and symbols are not equivalent, that is 'bob' != :bob. You should use symbols for regular structures like this, strings for more arbitrary data.

Map keys with the same name

A GET to an API endpoint I'm working with returns json with an inconsistent order of contacts, either
{"contacts"=>[
{"id"=>$UUID_0, "name"=>nil, "email"=>$EMAIL_0, "phone"=>$PHONE_0, "type"=>"foo"},
{"id"=>$UUID_1, "name"=>nil, "email"=>$EMAIL_1, "phone"=>$PHONE_1, "type"=>"bar"}
]}
or
{"contacts"=>[
{"id"=>$UUID_1, "name"=>nil, "email"=>$EMAIL_1, "phone"=>$PHONE_1, "type"=>"bar"},
{"id"=>$UUID_0, "name"=>nil, "email"=>$EMAIL_0, "phone"=>$PHONE_0, "type"=>"foo"}
]}
The "type" values are the only static objects in these responses, so I'd like to map this so that the contact types are keys containing the other pairs:
{
"foo"=>{"id"=>$UUID_0, "name"=>$NAME_0, "email"=>$EMAIL_0, "phone"=>$PHONE_0},
"bar"=>{"id"=>$UUID_1, "name"=>$NAME_1, "email"=>$EMAIL_1, "phone"=>$PHONE_1}
}
A solution is not obvious to me.
If you use Ruby on Rails, or at least ActiveSupport, you can try index_by instead of group_by: it won't put the values into arrays.
hash['contacts'].index_by {|r| r['type']}
=>
{
"bar" => {
"id" => "asdf",
"name" => nil,
"email" => "EMAIL_1",
"phone" => "PHONE_1",
"type" => "bar"
},
"foo" => {
"id" => "asdf",
"name" => nil,
"email" => "EMAIL_0",
"phone" => "PHONE_0",
"type" => "foo"
}
}
Hash[data['contacts'].map { |c| [c['type'], c] }]
This can be done with Enumerable#reduce:
hash['contacts'].reduce({}) {|m,c| m[c['type']] = c;m}
How it works:
An empty hash is the starting point.
The block is called once for each element in the contacts list. The block receives the hash that we're building as m and the current contact as c.
In the block, assign c to the hash based on its type and return the hash so far.
Final result is the last return value of the block.

How do I merge these two array's containing hashes into a third array?

I have two array's containing hashes with product information. The array's are not sorted in a meaningful way.
Array A (number of products/hashes = 27.605) contains:
itemId
description
category
example
[
{"itemId" => "wi225858",
"description" => "Awesome product",
"category" => "/Top products/"},
{...}
]
Array B (number of products/hashes = 18.498) contains:
itemId
description
brand
example
[
{"itemId" => "wi225858",
"description" => "Awesome product",
"brand" => "Coolio"},
{...}
]
Goal (number of products/hashes = 27.605):
itemId
description
category
brand
example
[
{"itemId" => "wi225858",
"description" => "Awesome product",
"category" => "/Top products/",
"brand" => "Coolio"},
{...},
{"itemId" => "wi225605",
"description" => "Brandless nice product",
"category" => "/Nice products/"},
{...}
]
The itemId's are unique. I want the Ruby code to take an itemId from A, check if B contains a product with the same itemId, if it does, add the brand value to the item. If no brand is found, then leave it empty.
The code should create a new array with product hashes I can save to a JSON file.
I've tried:
c = []
a.each do |one|
b.each do |two|
if one['itemId'] == two['itemId']
combined_product = one.merge(two)
c << combined_product
end
end
end
I have two problems with this code:
c.size returns 21.022, which means there are 6.583 products without a brand that have not made it into array c.
It's slow
What could I try next?
This SO answer shows a simple way:
(a+b).group_by { |product| product["itemId"] }.map { |k,v| v.inject(:merge) }
Here is how it works:
We make one big array with all products
We group together products with the same itemId
Finally we loop and map to merge same products hashes
There is a design flaw. You should have hashes instead of arrays like
A = {
"wi225858" => {
"description" => "Awesome product",
"category" => "/Top products/"
},
...
}
and
B = {
"wi225858" => {
"description" => "Awesome product",
"brand" => "Coolio"
},
...
}
Then, you can simply do
A.merge(B){|_, a, b| a.merge(b)}
Even considering the time it takes to convert your arrays into the hashes I suggest, it should run magnitudes faster than what you are doing now.

How do I extract values from nested JSON?

After parsing some JSON:
data = JSON.parse(data)['info']
puts data
I get:
[
{
"title"=>"CEO",
"name"=>"George",
"columns"=>[
{
"display_name"=> "Salary",
"value"=>"3.85",
}
, {
"display_name"=> "Bonus",
"value"=>"994.19",
}
, {
"display_name"=> "Increment",
"value"=>"8.15",
}
]
}
]
columns has nested data in itself.
I want to save the data in a database or CSV file.
title, name, value_Salary, value_Bonus, value_increment
But I'm not concerned about getting display_name, so just the values of first of columns, second of columns data, etc.
Ok I tried data.map after converting to hash & hash.flatten could find a way out.. .map{|x| x['columns']}
.map {|s| s["value"]}
tried to get the values atleast separately - but couldnt...
This is a simple problem, and resolves down to a couple nested map blocks.
Here's the data retrieved from JSON, plus an extra row to demonstrate how easy it is to handle a more complex JSON response:
data = [
{
"title" => "CEO",
"name" => "George",
"columns" => [
{
"display_name" => "Salary",
"value" => "3.85",
},
{
"display_name" => "Bonus",
"value" => "994.19",
},
{
"display_name" => "Increment",
"value" => "8.15",
}
]
},
{
"title" => "CIO",
"name" => "Fred",
"columns" => [
{
"display_name" => "Salary",
"value" => "3.84",
},
{
"display_name" => "Bonus",
"value" => "994.20",
},
{
"display_name" => "Increment",
"value" => "8.15",
}
]
}
]
Here's the code:
records = data.map { |record|
title, name = record.values_at('title', 'name')
values = record['columns'].map{ |column| column['value'] }
[title, name, *values]
}
Here's the resulting data structure, an array of arrays:
records
# => [["CEO", "George", "3.85", "994.19", "8.15"],
# ["CIO", "Fred", "3.84", "994.20", "8.15"]]
Saving it into a database or CSV is left for you to figure out, but Ruby's CSV class makes it trivial to write a file, and an ORM like Sequel makes it really easy to insert the data into a database.

Resources