Embedding Documents Directly in Documents with Mongoid - Ruby

I need to bulk insert an array of embedded documents into an existing document. I tried the code below, but it isn't working:
arr_loc = []
arr_loc << Location.new(:name => "test") << Location.new(:name => "test2")
biz = Business.first
biz.locations = arr_loc
biz.save # not working
Currently I insert each document separately by looping over the array; I hope there is a cleaner way to do this.
From the mongo shell we can easily do this:
> var mongo = db.things.findOne({name:"mongo"});
> print(tojson(mongo));
{"_id" : "497da93d4ee47b3a675d2d9b" , "name" : "mongo", "type" : "database"}
> mongo.data = { a:1, b:2};
{"a" : 1 , "b" : 2}
> db.things.save(mongo);
> db.things.findOne({name:"mongo"});
{"_id" : "497da93d4ee47b3a675d2d9b" , "name" : "mongo" , "type" : "database", "data" : {"a" : 1 , "b" : 2}}
Check the link for more info. Is it possible to do this with Mongoid?

It turns out the problem was calling the save method after the assignment:
biz.locations = arr_loc #this is fine
biz.save # no need for that
Mongoid updates the document on the assignment itself; no explicit save is required. Refer to this Mongoid Google Group thread (thanks, Nick Hoffman) for more info.
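For example, a minimal sketch (same models as in the question; assignment and create on an embedded relation are standard Mongoid API):
biz = Business.first
biz.locations = [Location.new(:name => 'test'), Location.new(:name => 'test2')] # persists immediately
biz.locations.create(:name => 'test3') # or create embedded docs one at a time, also persisted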

Related

NiFi: MergeRecord doesn't wait and group JSON files into one batch

I've run into a problem with Apache NiFi.
I have about 100,000+ JSON files that look like:
[ {
  "client_customer_id" : 8385419410,
  "campaign_id" : "11597209433",
  "resourceName" : "customers/8385419410/adGroupAds/118322191652~479093457035",
  "campaign" : "11597209433",
  "clicks" : "0",
  "topImpressionPercentage" : 1,
  "videoViews" : "0",
  "conversionsValue" : 0,
  "conversions" : 0,
  "costMicros" : "0",
  "ctr" : 0,
  "currentModelAttributedConversions" : 0,
  "currentModelAttributedConversionsValue" : 0,
  "engagements" : "0",
  "absoluteTopImpressionPercentage" : 1,
  "activeViewImpressions" : "0",
  "activeViewMeasurability" : 0,
  "activeViewMeasurableCostMicros" : "0",
  "activeViewMeasurableImpressions" : "0",
  "allConversionsValue" : 0,
  "allConversions" : 0,
  "averageCpm" : 0,
  "gmailForwards" : "0",
  "gmailSaves" : "0",
  "gmailSecondaryClicks" : "0",
  "impressions" : "2",
  "interactionRate" : 0,
  "interactions" : "0",
  "status" : "ENABLED",
  "ad.resourceName" : "customers/8385419410/ads/479093457035",
  "ad.id" : "479093457035",
  "adGroup" : "customers/8385419410/adGroups/118322191652",
  "device" : "DESKTOP",
  "date" : "2020-11-25"
} ]
Before saving them to the database one by one, I want to group 1,000-10,000 elements into one JSON batch and then save that to the DB to increase speed.
MergeRecord settings:
What I expected: MergeRecord waits some time to group the JSON files into a batch of 1,000-10,000 elements in one JSON file, and then sends this batch to the PutDatabaseRecord processor.
Actual behaviour: MergeRecord instantly sends the JSON files to PutDatabaseRecord one by one, without grouping or joining them.
About 1 in 10 flow files contains several JSON files merged into one, as you can see from their sizes in the screenshot. But it seems these processor settings don't apply to all files:
I don't understand where the problem is: the MergeRecord settings or the JSON files? This behaviour is really slow, and at this rate my data (1.5 GB) will probably take a day to store.
The only way I could replicate this was to use a random table.name for each of the flow files, which caused each file to land in its own bin, rapidly overfilling your "Maximum Number of Bins" and causing each file to be sent on as a separate flow file. If you have more than 10 tables, I would increase that setting.
My only other suggestion is to play with the Run Schedule and Run Duration of the MergeRecord processor (on the Scheduling tab). If you set the Run Schedule to 2 minutes (for example), the processor will run once every two minutes and try to merge as many of the queued files as it can; a sketch of settings that force batching follows below.
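For reference, one hedged example of MergeRecord settings that hold records back until a batch forms (the property names are standard MergeRecord properties; the values are assumptions chosen to match the 1,000-10,000 target, not taken from the question's screenshot):
Merge Strategy: Bin-Packing Algorithm
Minimum Number of Records: 1000
Maximum Number of Records: 10000
Max Bin Age: 2 min
Maximum Number of Bins: 10
With a Minimum Number of Records above 1 and a Max Bin Age set, a bin is only released when it reaches the minimum record count or the age limit, so records are held and merged instead of passing through one by one.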

How to extract values for a string?

I have this type of data:
--Line1 : val1=10; val2=20; val3=30
--Line2 : val1=11; val2=21; val3=31
--Line3 : val1=12; val2=22; val3=32
--Line4 : val1=13; val2=23; val3=33
--Line5 : val1=14; val2=24; val3=34
--Line6 : val1=15; val2=25; val3=35
--Line7 : val1=16; val2=26; val3=30
Now I am trying to write a script to get any particular value (say val1 for Line4) based on the string "Line1", "Line2", etc.
Any hints? I'm working in Linux.
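One approach, as a minimal sketch in Ruby (the file name data.txt is an assumption; the "--LineN : key=value; ..." layout is taken from the sample above):
def value_for(file, line_label, key)
  File.foreach(file) do |l|
    next unless l =~ /^--#{Regexp.escape(line_label)}\s*:/   # find the requested line
    return l[/#{Regexp.escape(key)}=(\d+)/, 1]               # extract that key's value
  end
  nil
end
puts value_for('data.txt', 'Line4', 'val1') # => 13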

MongoDB count query performance

I have a problem with count performance in MongoDB.
I'm using ZF2 and Doctrine ODM with a SoftDelete filter. When I query the collection for the first time with db.getCollection('order').count({"deletedAt": null}), it takes about 30 seconds, sometimes even more. The second and subsequent queries take about 150 ms. After a few minutes the query takes about 30 seconds again. This happens only on collections larger than 700 MB.
The server is an Amazon EC2 t2.medium instance running Mongo 3.0.1.
It may be similar to "MongoDB preload documents into RAM for better performance", but those answers do not solve my problem.
Any ideas what is going on?
Edit: here is the explain output:
{
  "executionSuccess" : true,
  "nReturned" : 111449,
  "executionTimeMillis" : 24966,
  "totalKeysExamined" : 0,
  "totalDocsExamined" : 111449,
  "executionStages" : {
    "stage" : "COLLSCAN",
    "filter" : {
      "$and" : []
    },
    "nReturned" : 111449,
    "executionTimeMillisEstimate" : 281,
    "works" : 145111,
    "advanced" : 111449,
    "needTime" : 1,
    "needFetch" : 33660,
    "saveState" : 33660,
    "restoreState" : 33660,
    "isEOF" : 1,
    "invalidates" : 0,
    "direction" : "forward",
    "docsExamined" : 111449
  },
  "allPlansExecution" : []
}
The count has to go through every matching document, which is what creates the performance issue.
Care about the precise number only when it is small: users want to know whether there are 100 results or 500. But once it goes beyond, say, 10,000, you can just tell the user "More than 10,000 results found":
db.getCollection('order').find({"deletedAt": null}).limit(10000).count(true)
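Also worth noting: the explain output above shows a COLLSCAN with totalKeysExamined : 0, so no index is being used. A hedged sketch with the MongoDB Ruby driver (the connection details and database name are assumptions); an index on deletedAt may let the count be answered from the index instead of scanning every document:
require 'mongo'
client = Mongo::Client.new(['127.0.0.1:27017'], :database => 'mydb')
orders = client[:order]
orders.indexes.create_one(:deletedAt => 1) # one-time index build
puts orders.find(:deletedAt => nil).count  # the count can now use the index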

refactor large switch-statement

Any suggestions for refactoring this ugly case statement into something more elegant?
This method (in Ruby) returns a (short or full) description for Belgian provinces, given a zipcode.
def province(zipcode, short = false)
  case zipcode
  when 1000...1300
    short ? 'BXL' : 'Brussel'
  when 1300...1500
    short ? 'WBR' : 'Waals-Brabant'
  when 1500...2000, 3000...3500
    short ? 'VBR' : 'Vlaams-Brabant'
  when 2000...3000
    short ? 'ANT' : 'Antwerpen'
  when 3500...4000
    short ? 'LIM' : 'Limburg'
  when 4000...5000
    short ? 'LIE' : 'Luik'
  when 5000...6000
    short ? 'NAM' : 'Namen'
  when 6000...6600, 7000...8000
    short ? 'HAI' : 'Henegouwen'
  when 6600...7000
    short ? 'LUX' : 'Luxemburg'
  when 8000...9000
    short ? 'WVL' : 'West-Vlaanderen'
  when 9000..9999
    short ? 'OVL' : 'Oost-Vlaanderen'
  else
    fail ArgumentError, 'Not a valid zipcode'
  end
end
Based on suggestions from MiiinimalLogic, I made a second version. Is this preferable?
class Provincie
  ProvincieNaam = Struct.new(:kort, :lang)
  PROVINCIES = {
    1000...1300 => ProvincieNaam.new('BXL', 'Brussel'),
    1300...1500 => ProvincieNaam.new('WBR', 'Waals-Brabant'),
    1500...2000 => ProvincieNaam.new('VBR', 'Vlaams-Brabant'),
    2000...3000 => ProvincieNaam.new('ANT', 'Antwerpen'),
    3000...3500 => ProvincieNaam.new('VBR', 'Vlaams-Brabant'),
    3500...4000 => ProvincieNaam.new('LIM', 'Limburg'),
    4000...5000 => ProvincieNaam.new('LIE', 'Luik'),
    5000...6000 => ProvincieNaam.new('NAM', 'Namen'),
    6000...6600 => ProvincieNaam.new('HAI', 'Henegouwen'),
    6600...7000 => ProvincieNaam.new('LUX', 'Luxemburg'),
    7000...8000 => ProvincieNaam.new('HAI', 'Henegouwen'),
    8000...9000 => ProvincieNaam.new('WVL', 'West-Vlaanderen'),
    9000..9999 => ProvincieNaam.new('OVL', 'Oost-Vlaanderen')
  }.freeze

  def self.lang(postcode)
    provincie_naam(postcode).lang
  end

  def self.kort(postcode)
    provincie_naam(postcode).kort
  end

  def self.provincie_naam(postcode)
    PROVINCIES.each { |list, prov| return prov if list.cover?(postcode) }
    fail ArgumentError, 'Geen geldige postcode'
  end
  private_class_method :provincie_naam
end
Personally, I'd specify the zipcode ranges and province information in a different data structure, e.g. a map of Range objects to provinces, then use Ruby's Range#cover? to check whether the zipcode falls within a range.
You could also consider having just one range lookup, either as is done here or with a map structure, with a secondary lookup (probably in a map) from the short description to the long description; a sketch of that idea follows.
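For example, a minimal sketch of the single-lookup idea (the constant names are illustrative, not from the original; the data is the same as above):
SHORT_BY_RANGE = {
  1000...1300 => 'BXL', 1300...1500 => 'WBR', 1500...2000 => 'VBR',
  2000...3000 => 'ANT', 3000...3500 => 'VBR', 3500...4000 => 'LIM',
  4000...5000 => 'LIE', 5000...6000 => 'NAM', 6000...6600 => 'HAI',
  6600...7000 => 'LUX', 7000...8000 => 'HAI', 8000...9000 => 'WVL',
  9000..9999 => 'OVL'
}.freeze
LONG_BY_SHORT = {
  'BXL' => 'Brussel', 'WBR' => 'Waals-Brabant', 'VBR' => 'Vlaams-Brabant',
  'ANT' => 'Antwerpen', 'LIM' => 'Limburg', 'LIE' => 'Luik',
  'NAM' => 'Namen', 'HAI' => 'Henegouwen', 'LUX' => 'Luxemburg',
  'WVL' => 'West-Vlaanderen', 'OVL' => 'Oost-Vlaanderen'
}.freeze

def province(zipcode, short = false)
  pair = SHORT_BY_RANGE.find { |range, _| range.cover?(zipcode) } # single range lookup
  fail ArgumentError, 'Not a valid zipcode' unless pair
  short ? pair.last : LONG_BY_SHORT[pair.last]                    # secondary short-to-long lookup
end

province(3100)       # => 'Vlaams-Brabant'
province(3100, true) # => 'VBR'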

modify active record relation in ruby

I have my @albums, which is working fine; I pick the data up with albums.json.
What I'd like to do is split @albums into three parts.
I was thinking of something like { "ownAlbums" : [ ... ], "friendSubscriptions" : [ ... ], "otherSubscriptions" : [ ... ] }, but I got several errors like
syntax error, unexpected tASSOC, expecting kEND
when I tried
#albums["own"] => #albums
or
TypeError (Symbol as array index):
when I tried:
@albums[:otherSubscriptions] = 'others'
and so on.
I've never tried anything like this before, but isn't this .json just a simple array?
How can I split it into three parts?
Or do I have to modify the ActiveRecord relation to do so? If so, I'd rather find another way than splitting.
Second Edit
I tried something like this, and it actually works:
@albums = [*@albums]
own = []
cnt = 0
@albums.each do |ownAlbum|
  cnt = cnt.to_int
  own[cnt] = ownAlbum
  cnt = cnt + 1
end
subs = Subscription.where(:user_id => @user.user_id)
@albums[0] = own
@albums[1] = subs
But where I have [0] and [1], I'd prefer strings.
When I try that, I get the error: TypeError (can't convert String into Integer)
How can I get around that?
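The TypeError happens because @albums is an Array, and arrays only accept integer indices. A minimal sketch using a Hash instead (the key names follow the question; how "otherSubscriptions" is populated is an assumption left open):
result = {
  'ownAlbums' => @albums.to_a,
  'friendSubscriptions' => Subscription.where(:user_id => @user.user_id).to_a,
  'otherSubscriptions' => [] # fill in however the third group is defined
}
render :json => result # serializes to the desired { "ownAlbums": [...], ... } shape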
