Reuse MongoDB connection in a Ruby script

I have a Ruby script with several connections to a MongoDB database. I've since found that each query/insert opens a new connection, leaving me with lots of connections in use.
ruby-2.0.0p195-2.el6.x86_64 & mongodb-2.4.12-1.el6.x86_64
I have several connections, as per the examples below, throughout my script. How do I use a single connection, or as few as possible, so that I don't end up with hundreds in use at any one time? My script is split into def foo / end chunks, where some blocks have just one db action and others have 3 or 4.
# Insert into db
db = Mongo::Connection.new.db("room1-web")
coll = db.collection("room1")
coll.update({"_id" => a}, rabbitdb, {upsert: true})

# Insert into db
db = Mongo::Connection.new.db("room1-web")
coll = db.collection("room1")
coll.update({"_id" => b}, chickendb, {upsert: true})

# Do query on db to update indicators etc.
db = Mongo::Connection.new.db("room1-indi-lookup")
coll = db.collection("elements")
kitty = coll.find({"_id" => table[address][i], "state" => char}, :fields => {"_id" => 0, "state" => 0}).to_a

@conn = Mongo::Connection.new("localhost", 27017, :pool_size => 5, :pool_timeout => 5)
db = @conn.db(.....)
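One way around this is to open a single (optionally pooled) connection once, at the top of the script, and reuse the same collection handles inside your def blocks. Below is a minimal sketch along those lines, assuming the database and collection names used above; the method names are placeholders, and constants are used so the handles are visible inside the defs:

require 'mongo'

# One pooled connection for the whole script (driver 1.x API, as above)
conn = Mongo::Connection.new("localhost", 27017, :pool_size => 5, :pool_timeout => 5)

ROOM1    = conn.db("room1-web").collection("room1")
ELEMENTS = conn.db("room1-indi-lookup").collection("elements")

# Each def block reuses the shared collection objects instead of reconnecting.
def store_rabbit(a, rabbitdb)
  ROOM1.update({"_id" => a}, rabbitdb, {upsert: true})
end

def lookup_state(id, char)
  ELEMENTS.find({"_id" => id, "state" => char}, :fields => {"_id" => 0, "state" => 0}).to_a
end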

Related

MongoDB MongoMapper Ruby Replica Set Config

I'm using this config to connect to MongoDB with MongoMapper in my Sinatra application:
MongoMapper.connection = Mongo::Connection.new('localhost', 27017)
Now I have a replica set with 2 mongos on separate servers, 10.5.5.5 and 10.5.5.6. How do I set up the connection with both mongos? How do I add authentication to this connection?
I ended up doing this:
MongoMapper.connection = Mongo::MongoReplicaSetClient.new(
  ['10.5.5.5:27017', '10.5.5.6:27017'],
  :read => :primary, :rs_name => 'name', :connect_timeout => 30, :op_timeout => 30
)
MongoMapper.database = "db_name"
MongoMapper.database.authenticate("user", "test123")
Works beautifully.
You should be able to set a different connection per model. But I guess this is not exactly what you're trying to do.
class MyModel
  include MongoMapper::Document

  connection(Mongo::Connection.new('localhost', 27017))
  set_database_name "my_database"
  # ...
end
Or there is ReplSetConnection, with which you can set up your replica sets:
MongoMapper.connection = Mongo::ReplSetConnection.new(['10.5.5.5', 30000], ['10.5.5.6', 30000])
And the authentication is simple:
MongoMapper.connection = Mongo::Connection.new('localhost', 27017)
MongoMapper.database = "DBNAME"
MongoMapper.database.authenticate("USERNAME", "PASSWORD")

Redmine_Backlogs fails to show successful migration in settings page

So, I am trying to get redmine_backlogs to work with SQL Server. [We are using SQL Server rather than SQLite, to scale better.]
I am NOT a Ruby programmer, but I really want this plugin for my team, as we actively use Redmine for several projects.
After reading through some Ruby tutorials, I've managed to make some modifications and get the plugin installed and migrated correctly [it appears].
On the plugin settings screen [in administration] in Redmine, it shows that the migration wasn't successful, even though all the list items are green and the migrations appeared to work.
Any ideas?
The changes I made were to bypass suspected issues with the way ActiveRecord handles direct SQL queries.
Here are the changes I've made -
ERROR #1 –
C:\Projects\Redmine\redmine-2.3.2>rake redmine:plugins:migrate
Migrating redmine_backlogs (Redmine Backlogs)...
== AddStoryPositions: migrating ==============================================
-- execute("select max(position) from issues")
-> 0.0020s
-> -1 rows
rake aborted!
An error has occurred, this and all later migrations canceled:
undefined method `each' for -1:Fixnum
C:/Projects/Redmine/redmine-2.3.2/plugins/redmine_backlogs/db/migrate/026_add_story_positions.rb:10:in `up'
FIX #1 –
Direct queries are not working correctly with the SQL Server adapter [TinyTds + ActiveRecord].
026_add_story_positions.rb
class AddStoryPositions < ActiveRecord::Migration
  def self.up
    # Rails doesn't support temp tables, mysql doesn't support update
    # from same-table subselect
    unless RbStory.trackers.size == 0
      max = 0
      dbconfig = YAML.load_file(File.join(File.dirname(__FILE__), '../../../../config/database.yml'))#[Rails.env]['username']
      if dbconfig[Rails.env]['adapter'] == 'sqlserver' then
        database = dbconfig[Rails.env]['database']
        dataserver = dbconfig[Rails.env]['dataserver']
        mode = dbconfig[Rails.env]['mode']
        port = dbconfig[Rails.env]['port']
        username = dbconfig[Rails.env]['username']
        password = dbconfig[Rails.env]['password']
        client = TinyTds::Client.new(
          :database => database,
          :dataserver => dataserver,
          :mode => mode,
          :port => port,
          :username => username,
          :password => password)
        client.execute("select max(position) from issues").each{|row| max = row[0]}
        client.execute "update issues
          set position = #{max} + id
          where position is null and tracker_id in (#{RbStory.trackers(:type=>:string)})"
      else
        execute("select max(position) from issues").each{|row| max = row[0]}
        execute "update issues
          set position = #{max} + id
          where position is null and tracker_id in (#{RbStory.trackers(:type=>:string)})"
      end
    end
  end

  def self.down
    puts "Reverting irreversible migration"
  end
end
ERROR #2
rake aborted!
An error has occurred, this and all later migrations canceled:
TinyTds::Error: ALTER TABLE ALTER COLUMN position failed because one or more objects access this column.: ALTER TABLE [issues] ALTER COLUMN [position] integer NOT NULL
C:/Projects/Redmine/redmine-2.3.2/plugins/redmine_backlogs/db/migrate/033_unique_positions.rb:30:in `up'
FIX #2
033_unique_positions.rb
# SQL Server cannot change the type of an indexed column, so the index must be dropped first
remove_index :issues, :position
change_column :issues, :position, :integer, :null => false
add_index :issues, :position
ERROR #3
rake aborted!
undefined method `each' for -1:Fixnum
C:/Projects/Redmine/redmine-2.3.2/plugins/redmine_backlogs/lib/backlogs_setup.rb:155:in `migrated?'
FIX #3
def migrated?
  available = Dir[File.join(File.dirname(__FILE__), '../db/migrate/*.rb')].collect{|m| Integer(File.basename(m).split('_')[0].gsub(/^0+/, ''))}.sort
  return true if available.size == 0
  available = available[-1]

  ran = []
  dbconfig = YAML.load_file(File.join(File.dirname(__FILE__), '../../../config/database.yml'))#[Rails.env]['username']
  if dbconfig[Rails.env]['adapter'] == 'sqlserver' then
    database = dbconfig[Rails.env]['database']
    dataserver = dbconfig[Rails.env]['dataserver']
    mode = dbconfig[Rails.env]['mode']
    port = dbconfig[Rails.env]['port']
    username = dbconfig[Rails.env]['username']
    password = dbconfig[Rails.env]['password']
    client = TinyTds::Client.new(
      :database => database,
      :dataserver => dataserver,
      :mode => mode,
      :port => port,
      :username => username,
      :password => password)
    client.execute("select version from schema_migrations where version like '%-redmine_backlogs'").each{|m|
      ran << Integer((m.is_a?(Hash) ? m.values : m)[0].split('-')[0])
    }
  else
    Setting.connection.execute("select version from schema_migrations where version like '%-redmine_backlogs'").each{|m|
      ran << Integer((m.is_a?(Hash) ? m.values : m)[0].split('-')[0])
    }
  end
  return false if ran.size == 0
  ran = ran.sort[-1]
  return ran >= available
end
module_function :migrated?
I was using the wrong where clause; this is the correct one, which I must have overwritten while debugging:
'%-redmine_backlogs'
The above code works.
I could not answer my own question before, but now I can: the above code was tested and works, and I have been running backlogs on Windows with MS SQL successfully since.

Mongodb not inserting Ruby time.new consistently on Heroku

Built a small app to grab tweets from political candidates for the upcoming election, using Ruby, TweetStream, MongoDB and Heroku.
The time is being inserted into the database inconsistently. Sometimes it works, sometimes it doesn't. Is this my code, Heroku, or MongoDB (MongoHQ)? I have a support question in.
Working
{
_id: ObjectId("52556b5bd2d9530002000002"),
time: ISODate("2013-10-09T14:42:35.044Z"),
user: "Blondetigressnc",
userid: 1342776674,
tweet: "RT #GovBrewer: Mr. President #BarackObama, reopen America’s National Parks or let the states do it. #GrandCanyon #Lead http://t.co/kkPKt9B7…",
statusid: "387951226866110464"
}
Not working
{
_id: ObjectId("52556c2454d4ad0002000016"),
user: "PeterMcC66",
userid: 1729065984,
tweet: "#GovBrewer #Blondetigressnc #BarackObama Time to impeach surely?",
statusid: "387952072223506432"
}
Seems random. See anything wrong or stupid in my code?
require 'rubygems'
require 'tweetstream'
require 'mongo'

# user ids
users = 'list of Twitter user ids here'

# connect to stream
TweetStream.configure do |config|
  config.consumer_key       = ENV['T_KEY']
  config.consumer_secret    = ENV['T_SECRET']
  config.oauth_token        = ENV['T_TOKEN']
  config.oauth_token_secret = ENV['T_TOKEN_SECRET']
  config.auth_method        = :oauth
end

# connection to database
if ENV['MONGOHQ_URL']
  uri  = URI.parse(ENV['MONGOHQ_URL'])
  conn = Mongo::Connection.from_uri(ENV['MONGOHQ_URL'])
  DB   = conn.db(uri.path.gsub(/^\//, ''))
else
  DB = Mongo::Connection.new.db("tweetsDB")
end

# creation of collections
tweets  = DB.create_collection("tweets")
deleted = DB.create_collection("deleted-tweets")

@client = TweetStream::Client.new

@client.on_delete do |status_id, user_id|
  puts "#{status_id}"
  timenow = Time.new
  id = status_id.to_s
  deleted.insert({ :time => timenow, :user_id => user_id, :statusid => id })
end

@client.follow(users) do |status|
  puts "[#{status.user.screen_name}] #{status.text}"
  timenow = Time.new
  id = status.id
  tweets.insert({ :time => timenow, :user => status.user.screen_name, :userid => status.user.id, :tweet => status.text, :statusid => id.to_s })
end
The issue is that you need to use a UTC time, not your local timezone. This is not a MongoDB or Ruby driver issue; it's a constraint of the BSON spec and the ISODate BSON type.
http://docs.mongodb.org/manual/reference/bson-types/#date
http://bsonspec.org/#/specification
Also, it's just good practice.
General word of advice: use UTC on the back-end of whatever you're building, always, regardless of which datastore you're using (this is not a MongoDB-specific thing). This is especially true if the data is something you want to query on directly.
If you need to convert to a local timezone, it's best to handle that when you display or output the data rather than trying to manage it elsewhere. Some of the most fantastic bugs I've ever seen were related to inconsistent handling of timezones in the persistence layer of the application.
Keep those times consistent on the back-end, deal with local timezone conversion in your application, and life will be much easier for you.
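For example (a small sketch, not part of the original answer), reading the stored UTC value back and converting it only at display time:

# The stored value comes back as a UTC Time; convert when presenting it.
utc_time   = Time.now.utc
local_time = utc_time.getlocal                  # system-local timezone
puts local_time.strftime("%Y-%m-%d %H:%M %Z")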
Here is an example of how to work with times in MongoDB using Ruby:
require 'time' # required for ISO-8601
require 'mongo'
include Mongo
client = MongoClient.new
coll = client['example_database']['example_collection']
coll.insert({ 'updated_at' => Time.now.utc })
doc = coll.find_one()
doc['updated_at'].is_a?(Time) #=> true
doc['updated_at'].to_s #=> "2013-10-07 22:43:52 UTC"
doc['updated_at'].iso8601 #=> "2013-10-07T22:43:52Z"
doc['updated_at'].strftime("updated at %m/%d/%Y") #=> "updated at 10/07/2013"
I keep a gist of this available here:
https://gist.github.com/brandonblack/6876374

Stream based parsing and writing of JSON

I fetch about 20,000 datasets from a server in batches of 1,000. Each dataset is a JSON object. Persisted, this makes around 350 MB of uncompressed plaintext.
I have a memory limit of 1 GB. Hence, I write each batch of 1,000 JSON objects as an array into a raw JSON file in append mode.
The result is a file with 20 JSON arrays which need to be aggregated. I need to touch them anyway, because I want to add metadata. Generally, the Ruby Yajl parser makes this possible like so:
require 'yajl'

raw_file  = File.new(path_to_raw_file, 'r')
json_file = File.new(path_to_json_file, 'w')

datasets = []
parser = Yajl::Parser.new
parser.on_parse_complete = Proc.new { |o| datasets += o }
parser.parse(raw_file)

hash = { date: Time.now, datasets: datasets }
Yajl::Encoder.encode(hash, json_file)
Where is the problem with this solution? The problem is that the whole JSON is still parsed into memory, which I must avoid.
Basically, what I need is a solution which parses the JSON from an IO object and encodes it to another IO object at the same time.
I assumed Yajl offers this, but I haven't found a way, nor did its API give any hints, so I guess not. Is there a JSON parser library which supports this? Are there other solutions?
The only solution I can think of is to use IO.seek: write all the dataset arrays one after another, [...][...][...], and after every array seek back and overwrite the ][ with ,, effectively joining the arrays manually.
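For illustration only (not from the original post), a minimal sketch of that seek-and-overwrite idea, assuming batches is an enumerable of the non-empty 1,000-object arrays:

require 'yajl'

# Append each batch, merging everything into one growing JSON array on disk.
File.open(path_to_json_file, 'w+') do |f|
  batches.each_with_index do |batch, i|
    encoded = Yajl::Encoder.encode(batch)  # "[...]" for this batch
    if i.zero?
      f.write(encoded)                     # the first batch starts the array
    else
      f.seek(-1, IO::SEEK_END)             # sit on the trailing "]"
      f.write(',')                         # overwrite "]" with ","
      f.write(encoded[1..-1])              # append the batch without its leading "["
    end
  end
end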
Why can't you retrieve a single record at a time from the database, process it as necessary, convert it to JSON, then emit it with a trailing/delimiting comma?
If you started with a file that only contained [, then appended all your JSON strings, and on the final entry appended a closing ] instead of a comma, you'd have a JSON array of hashes, and would only have to process one row's worth at a time.
It'd be a tiny bit slower (maybe) but wouldn't impact your system. And DB I/O can be very fast if you use blocking/paging to retrieve a reasonable number of records at a time.
For instance, here's a combination of some Sequel example code, and code to extract the rows as JSON and build a larger JSON structure:
require 'json'
require 'sequel'

DB = Sequel.sqlite # memory database

DB.create_table :items do
  primary_key :id
  String :name
  Float :price
end

items = DB[:items] # Create a dataset

# Populate the table
items.insert(:name => 'abc', :price => rand * 100)
items.insert(:name => 'def', :price => rand * 100)
items.insert(:name => 'ghi', :price => rand * 100)

add_comma = false
puts '['
items.order(:price).each do |item|
  puts ',' if add_comma
  add_comma ||= true
  print JSON[item]
end
puts "\n]"
Which outputs:
[
{"id":2,"name":"def","price":3.714714089426208},
{"id":3,"name":"ghi","price":27.0179624376119},
{"id":1,"name":"abc","price":52.51248221170203}
]
Notice the order is now by "price".
Validation is easy:
require 'json'
require 'pp'
pp JSON[<<EOT]
[
{"id":2,"name":"def","price":3.714714089426208},
{"id":3,"name":"ghi","price":27.0179624376119},
{"id":1,"name":"abc","price":52.51248221170203}
]
EOT
Which results in:
[{"id"=>2, "name"=>"def", "price"=>3.714714089426208},
{"id"=>3, "name"=>"ghi", "price"=>27.0179624376119},
{"id"=>1, "name"=>"abc", "price"=>52.51248221170203}]
This validates the JSON and demonstrates that the original data is recoverable. Each row retrieved from the database should be a minimal "bitesized" piece of the overall JSON structure you want to build.
Building upon that, here's how to read incoming JSON in the database, manipulate it, then emit it as a JSON file:
require 'json'
require 'sequel'

DB = Sequel.sqlite # memory database

DB.create_table :items do
  primary_key :id
  String :json
end

items = DB[:items] # Create a dataset

# Populate the table
items.insert(:json => JSON[:name => 'abc', :price => rand * 100])
items.insert(:json => JSON[:name => 'def', :price => rand * 100])
items.insert(:json => JSON[:name => 'ghi', :price => rand * 100])
items.insert(:json => JSON[:name => 'jkl', :price => rand * 100])
items.insert(:json => JSON[:name => 'mno', :price => rand * 100])
items.insert(:json => JSON[:name => 'pqr', :price => rand * 100])
items.insert(:json => JSON[:name => 'stu', :price => rand * 100])
items.insert(:json => JSON[:name => 'vwx', :price => rand * 100])
items.insert(:json => JSON[:name => 'yz_', :price => rand * 100])

add_comma = false
puts '['
items.each do |item|
  puts ',' if add_comma
  add_comma ||= true
  print JSON[
    JSON[
      item[:json]
    ].merge('foo' => 'bar', 'time' => Time.now.to_f)
  ]
end
puts "\n]"
Which generates:
[
{"name":"abc","price":3.268814929005337,"foo":"bar","time":1379688093.124606},
{"name":"def","price":13.871147312377719,"foo":"bar","time":1379688093.124664},
{"name":"ghi","price":52.720984131655676,"foo":"bar","time":1379688093.124702},
{"name":"jkl","price":53.21477190840114,"foo":"bar","time":1379688093.124732},
{"name":"mno","price":40.99364022416619,"foo":"bar","time":1379688093.124758},
{"name":"pqr","price":5.918738444452265,"foo":"bar","time":1379688093.124803},
{"name":"stu","price":45.09391752439902,"foo":"bar","time":1379688093.124831},
{"name":"vwx","price":63.08947792357426,"foo":"bar","time":1379688093.124862},
{"name":"yz_","price":94.04921035056373,"foo":"bar","time":1379688093.124894}
]
I added the timestamp so you can see that each row is processed individually, AND to give you an idea how fast the rows are being processed. Granted, this is a tiny, in-memory database with no network I/O to contend with, but a normal network connection through a switch to a database on a reasonable DB host should be pretty fast too. Telling the ORM to read the DB in chunks can speed up the processing, because the DBM will be able to return larger blocks to more efficiently fill the packets. You'll have to experiment to determine what size chunks you need, because it will vary based on your network, your hosts, and the size of your records.
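As a rough illustration (not from the original answer): Sequel's paged_each iterates an ordered dataset in chunks, so something like this keeps memory flat while still emitting one row at a time:

# Fetch rows 1,000 at a time; paged_each requires an ordered dataset.
items.order(:id).paged_each(:rows_per_fetch => 1000) do |item|
  print JSON[item]
end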
Your original design isn't good when dealing with enterprise-sized databases, especially when your hardware resources are limited. Over the years we've learned how to parse BIG databases, which make 20,000-row tables look minuscule. VM slices are common these days and we use them for crunching, so they're often the PCs of yesteryear: a single CPU with a small memory footprint and dinky drives. We can't beat them up or they'll be bottlenecks, so we have to break the data into the smallest atomic pieces we can.
Harping about DB design: Storing JSON in a database is a questionable practice. DBMs these days can spew JSON, YAML and XML representations of rows, but forcing the DBM to search inside stored JSON, YAML or XML strings is a major hit in processing speed, so avoid it at all costs unless you also have the equivalent lookup data indexed in separate fields so your searches are at the highest possible speed. If the data is available in separate fields, then doing good ol' database queries, tweaking in the DBM or your scripting language of choice, and emitting the massaged data becomes a lot easier.
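As a sketch of that last point (my own example, reusing the Sequel setup from above): keep the raw JSON for output if you must, but also store the fields you search on in their own indexed columns and query those instead:

require 'json'
require 'sequel'

DB = Sequel.sqlite # memory database

DB.create_table :items do
  primary_key :id
  String :name    # searchable copy of the "name" field
  Float :price    # searchable copy of the "price" field
  String :json    # full document, kept only for output
  index :name
end

items = DB[:items]
doc = { 'name' => 'abc', 'price' => rand * 100 }
items.insert(:name => doc['name'], :price => doc['price'], :json => JSON[doc])

# The query uses the indexed column; the JSON blob is never searched.
items.where(:name => 'abc').each { |row| print row[:json] }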
It is possible via the JSON::Stream or Yajl::FFI gems. You will have to write your own callbacks, though. Some hints on how to do that can be found here and here.
Facing a similar problem, I created the json-streamer gem, which will spare you the need to write your own callbacks. It will yield each object to you one by one, removing it from memory afterwards. You can then pass these to another IO object as intended.
There is a library called oj that does exactly that. It can do parsing and generation. For example, for parsing you can use Oj::Doc:
Oj::Doc.open('[3,[2,1]]') do |doc|
  result = {}
  doc.each_leaf() do |d|
    result[d.where?] = d.fetch()
  end
  result
end #=> ["/1" => 3, "/2/1" => 2, "/2/2" => 1]
You can even backtrack in the file using doc.move(path). It seems very flexible.
For writing documents, you can use Oj::StreamWriter:
require 'oj'

doc = Oj::StreamWriter.new($stdout)

def write_item(doc, item)
  doc.push_object
  doc.push_key "type"
  doc.push_value "item"
  doc.push_key "value"
  doc.push_value item
  doc.pop
end

def write_array(doc, array)
  doc.push_object
  doc.push_key "type"
  doc.push_value "array"
  doc.push_key "value"
  doc.push_array
  array.each do |item|
    write_item(doc, item)
  end
  doc.pop
  doc.pop
end

write_array(doc, [{a: 1}, {a: 2}]) #=> {"type":"array","value":[{"type":"item","value":{":a":1}},{"type":"item","value":{":a":2}}]}

Ruby variable substitution in a method call

Ruby noob here. Any help with a little issue I'm having would be appreciated.
I am trying to place an array into a connection string argument which is formatted as an array.
My array is as follows:
hosts = ["192.168.0.2:27017","192.168.0.3:27017"]
I need to pull the array apart and structure it like an array so that I can substitute all of the connections into the call at once. The number of hosts can vary, hence why it's in an array.
hosts_mapped = hosts.map { |i| "'" + i.to_s + "'" }.join(",")
gives me "192.168.0.2:27017","192.168.0.3:27017" as a string I think... or this may have mapped it back to an array as I get an error which looks like the one below after trying to initiate a connection.
#conn = Mongo::ReplSetConnection.new([hosts_mapped], :refresh_mode => :sync, :refresh_interval => 10)
Exception `Mongo::ConnectionFailure' at gems/mongo-1.7.0/lib/mongo/util/pool_manager.rb:282 - Cannot connect to a replica set using seeds '192.168.0.2:27017
Mongo::ConnectionFailure: Cannot connect to a replica set using seeds '192.168.0.2:27017
As you can see, it only seems to reference the first entry. I need to hold this array in a configuration file, which is why it does not go directly into the connection string above.
To me it seems I have mapped hosts_mapped back to an array, but if I puts hosts_mapped I get the string in the correct format:
"192.168.0.2:27017","192.168.0.3:27017"
A working connection string looks like:
@conn = Mongo::ReplSetConnection.new(["192.168.0.2:27017","192.168.0.3:27017"], :refresh_mode => :sync, :refresh_interval => 10)
Does anyone have any idea where I am going wrong here?
Full code to test:
#!/usr/bin/ruby -d
require "mongo"
hosts = ["192.168.0.2:27017","192.168.0.3:27017"]
hosts_mapped = hosts.map { |i| "'" + i.to_s + "'" }.join(",")
@conn = Mongo::ReplSetConnection.new([hosts_mapped], :refresh_mode => :sync, :refresh_interval => 10)
According to the docs, Mongo::ReplSetConnection.new can take an array:
Mongo::ReplSetConnection.new(['localhost:30000', 'localhost:30001'])
Since you already have an array, you can just pass it as the first parameter:
hosts = ["192.168.0.2:27017","192.168.0.3:27017"]
Mongo::ReplSetConnection.new(hosts)
You already have an array: hosts = ["192.168.0.2:27017","192.168.0.3:27017"]
And if @conn = Mongo::ReplSetConnection.new(["192.168.0.2:27017","192.168.0.3:27017"], :refresh_mode => :sync, :refresh_interval => 10) works, all you need to do is:
@conn = Mongo::ReplSetConnection.new(hosts, :refresh_mode => :sync, :refresh_interval => 10)
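Since the hosts are meant to live in a configuration file, here is a small sketch of tying that together (the file name hosts.yml and the hosts key are made up for illustration):

require 'yaml'
require 'mongo'

# hosts.yml (hypothetical) might contain:
#   hosts:
#     - 192.168.0.2:27017
#     - 192.168.0.3:27017
hosts = YAML.load_file('hosts.yml')['hosts']  # => ["192.168.0.2:27017", "192.168.0.3:27017"]

@conn = Mongo::ReplSetConnection.new(hosts, :refresh_mode => :sync, :refresh_interval => 10)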
