how can I scan for a pattern across multiple clusters? - ruby

I'm using the redis ruby gem, version 4.2.5. I have several key patterns:
app:namespace1:{id}
app:namespace2:{id}
app:namespace1:{id}:nested1
app:namespace2:{id}:nested2
app:whatever1
app:whatever2
How can I scan for everything app:*?
So far, when I scan for app:*, I only get top-level, non-nested keys, like:
app:whatever1
app:whatever2
def match_keys(pattern)
  @redis ||= init_connection
  all_keys = []
  cursor = 0
  loop do
    cursor, keys = @redis.scan(cursor, match: pattern)
    all_keys += keys
    break if cursor == '0'
  end
  all_keys.uniq
end
match_keys 'app:*'
#=> ["app:whatever1", "app:whatever2"]
I need it to return all keys in the app:* namespace, including all the nested child keys.
Discovery
It looks like my issue is not related to the search pattern. It's because Redis runs in a clustered environment, so the keys are spread across multiple nodes, and each node responds with only the keys it holds, not the ones on the other nodes.
So how can I get the redis ruby gem to scan all matching keys across the cluster nodes?
here is my cluster config
def init_connection
  config = {
    cluster: [
      "redis://#{ENV['REDIS_HOST_1']}",
      "redis://#{ENV['REDIS_HOST_2']}",
      "redis://#{ENV['REDIS_HOST_3']}"
    ],
    replica: true
  }
  @redis ||= ::Redis.new(config)
end
According to this post: "how to use redis scan in cluster enviroment?"
it seems like I need to find a way to loop over each node, but I can't find any documentation for doing that.

Did you consider upgrading redis-rb?
In the latest versions, clustering support was moved into a redis-clustering gem, which in turn uses redis-cluster-client under the hood.
It seems that the latter performs SCAN across all nodes, so there's a good chance that with the latest redis-rb you won't need to iterate over the nodes manually.
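If upgrading isn't an option right away, one workaround is to open a plain (non-cluster) connection to each node and scan each one separately, merging the results. A minimal sketch, assuming each node is directly reachable via the REDIS_HOST_* variables from your config and that redis-rb 4.x's scan_each is available:

def match_keys_across_nodes(pattern)
  hosts = [ENV['REDIS_HOST_1'], ENV['REDIS_HOST_2'], ENV['REDIS_HOST_3']]
  hosts.flat_map do |host|
    node = ::Redis.new(url: "redis://#{host}")
    keys = []
    # scan_each drives the SCAN cursor loop for us on that single node
    node.scan_each(match: pattern) { |key| keys << key }
    node.close
    keys
  end.uniq
end

match_keys_across_nodes 'app:*'

A single-node SCAN only returns the keys held on that node, which is exactly why merging the per-node results gives you the full picture.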

Related

How to Fix Document Not Found errors with find

I have a collection of Person, stored in a legacy mongodb server (2.4) and accessed with the mongoid gem via the ruby mongodb driver.
If I perform a
Person.where(email: 'some.existing.email@server.tld').first
I get a result (let's assume I store the id in a variable called "the_very_same_id_obtained_above")
If I perform a
Person.find(the_very_same_id_obtained_above)
I get a Mongoid::Errors::DocumentNotFound exception.
If I use the javascript syntax to perform the query, the result is found
Person.where("this._id == #{the_very_same_id_obtained_above}").first # this works!
I'm currently trying to migrate the data to a newer version; right now I'm mongorestore-ing onto Amazon DocumentDB (MongoDB 3.6 compatible) to run tests, and the issue remains.
One thing I noticed is that those object ids are peculiar:
5ce24b1169902e72c9739ff6 this works anyway
59de48f53137ec054b000004 this requires the trick
The small number of zeroes toward the end of the id seems to be highly correlated with the problem (I have no idea of the reason).
That's the default:
# Raise an error when performing a #find and the document is not found.
# (default: true)
raise_not_found_error: true
Source: https://docs.mongodb.com/mongoid/current/tutorials/mongoid-configuration/#anatomy-of-a-mongoid-config
If this doesn't answer your question, it's very likely the find method is overridden somewhere in your code!
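If you just want a lookup that never raises, you can use where(...).first (like your working query), which returns nil when nothing matches, or rescue the error explicitly. A minimal sketch:

# Returns nil instead of raising when nothing matches:
person = Person.where(_id: the_very_same_id_obtained_above).first

# Or keep find and handle the missing-document case yourself:
begin
  person = Person.find(the_very_same_id_obtained_above)
rescue Mongoid::Errors::DocumentNotFound
  person = nil
end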

Ruby tweetstream stops unexpectedly

I use tweetstream gem to get sample tweets from Twitter Streaming API:
TweetStream.configure do |config|
  config.username    = 'my_username'
  config.password    = 'my_password'
  config.auth_method = :basic
end

@client = TweetStream::Client.new
@client.sample do |status|
  puts "#{status.text}"
end
However, this script will stop printing out tweets after about 100 tweets (the script continues to run). What could be the problem?
The Twitter Search API sets certain arbitrary (from the outside) limits for things, from the docs:
GET statuses/:id/retweeted_by Show user objects of up to 100 members who retweeted the status.
From the gem, the code for the method is:
# Returns a random sample of all public statuses. The default access level
# provides a small proportion of the Firehose. The "Gardenhose" access
# level provides a proportion more suitable for data mining and
# research applications that desire a larger proportion to be statistically
# significant sample.
def sample(query_parameters = {}, &block)
  start('statuses/sample', query_parameters, &block)
end
I checked the API docs but don't see an entry for 'statuses/sample'; looking at the one above, I'm assuming you've reached 100 of whatever statuses/xxx is being accessed.
Also, correct me if I'm wrong, but I believe Twitter no longer accepts basic auth and you must use an OAuth key. If this is so, then that means you're unauthenticated, and the search API will also limit you in other ways too, see https://dev.twitter.com/docs/rate-limiting
Hope that helps.
OK, I made a mistake there: I was looking at the search API when I should've been looking at the streaming API (my apologies), but it's possible some of the things I mentioned could be the cause of your problems, so I'll leave it up. Twitter has definitely moved away from basic auth, so I'd try resolving that first; see:
https://dev.twitter.com/docs/auth/oauth/faq
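For completeness, a minimal sketch of the OAuth-style configuration the tweetstream gem documents (the credential strings are placeholders you'd get from the Twitter developer dashboard; check the gem's README for the exact option names in your version):

TweetStream.configure do |config|
  config.consumer_key       = 'YOUR_CONSUMER_KEY'
  config.consumer_secret    = 'YOUR_CONSUMER_SECRET'
  config.oauth_token        = 'YOUR_ACCESS_TOKEN'
  config.oauth_token_secret = 'YOUR_ACCESS_TOKEN_SECRET'
  config.auth_method        = :oauth
end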

How do I embed multiple documents in MongoDB using Ruby API?

I'm trying to insert a document that has multiple embedded documents but I have been unable to determine the structure for such a document.
I'm using Mongoid in most places but need to perform a batch document insert.
I've tried the following:
def build_records_array(records)
  records.collect do |record|
    record.raw_attributes["identifier"] = record.identifiers.collect { |identifier| identifier.raw_attributes }
    record.raw_attributes
  end
end # self.build_records_array
However the identifiers don't show up as embedded documents when I call insert. I just get a bunch of garbage in my parent document.
What is the proper structure for embedded documents?
So, I just had a typo. I wasn't thinking about Mongoid when looking at my problem. After playing around with the Mongo driver to retrieve records Mongoid had created, I discovered that I had everything right but the attribute name.
def build_records_array(records)
  records.collect do |record|
    record.raw_attributes["identifiers"] = record.identifiers.collect { |identifier| identifier.raw_attributes }
    record.raw_attributes
  end
end # build_records_array
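For reference, the hash each record becomes is just a plain document whose embedded documents live under the relation's key as an array of hashes. A minimal sketch (the field names are illustrative, not from the original post, and collection is assumed to be the Mongo::Collection you're batch-inserting into):

record = {
  "name"        => "Alice",
  "identifiers" => [
    { "kind" => "email", "value" => "alice@example.com" },
    { "kind" => "badge", "value" => "A-1234" }
  ]
}
# Batch insert with the legacy Ruby driver; newer drivers use insert_many:
collection.insert([record])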

Ruby calling AWS ELB functions

I'm writing some Ruby scripts to wrap AWS ELB command line calls, mostly so that I can act on several ELB instances simultaneously. One task is to use the elb-describe-instance-health call to see what instance IDs are attached to this ELB.
I want to match the Instance ID to a nickname we have set up for those instances, so that I can see at a glance what machines are connected to the ELB, without having to look up the instance names.
So I am issuing:
cmd = "elb-describe-instance-health #{elbName}"
value = `#{cmd}`
Passing the elb name into the call. This returns output such as:
INSTANCE_ID i-jfjtktykg InService N/A N/A
INSTANCE_ID i-ujelforos InService N/A N/A
One line appears for each instance in the ELB. There are two spaces between each field.
What I need to get is the second field, which is the actual instance ID. Basically I'm trying to get each line returned, turn it into an array, get the 2nd field, which I can then use to look up our server nickname.
Not sure if this is the right approach, but any suggestions on how to get this done are very welcome.
The newly released aws-sdk gem supports Elastic Load Balancing (AWS::ELB). If you want to get a list of instance ids attached to your load balancer you can do the following:
AWS.config(:access_key_id => '...', :secret_access_key => '...')
elb = AWS::ELB.new
instance_ids = elb.load_balancers['LOAD_BALANCER_NAME'].instances.collect(&:id)
You could also use EC2 to store your instance nicknames.
ec2 = AWS::EC2.new
ec2.instances['INSTANCE_ID'].tags['nickname'] = 'NICKNAME'
Assuming your instances are tagged with their nicknames, you could collect them like so:
elb = AWS::ELB.new
elb.load_balancers['LOAD_BALANCER_NAME'].instances.collect{|i| i.tags['nickname'] }
A simple way to extract the second column would be something like this:
ids = value.split("\n").collect { |line| line.split(/\s+/)[1] }
This will leave the second column values in the Array ids. All this does is breaks the value into lines, breaks each line into whitespace delimited columns, and then extracts the second column.
There's probably no need to try to be too clever for something like this, a simple and straight forward solution should be sufficient.
References:
collect
split

mongoid query caching

Rails' ActiveRecord has a feature called Query Caching (ActiveRecord::QueryCache) which saves the results of SQL queries for the life-span of a request. While I'm not very familiar with the internals of the implementation, I think it saves the query results somewhere in the Rack env, which is discarded at the end of the request.
Mongoid, unfortunately, doesn't currently provide such a feature, and this is exacerbated by the fact that some queries occur implicitly (references).
I'm considering implementing this feature, and I'm curious where and how Mongoid (or, perhaps, the mongo driver?) should be hooked in order to implement it.
Mongoid has caching, described under http://mongoid.org/en/mongoid/docs/extras.html
Also MongoDB itself has caching ability: http://www.mongodb.org/display/DOCS/Caching
The Mongoid caching extra covers two different cases: caching all queries of a model, or caching a single query.
Mongoid caching seems to work slightly differently: it looks like Mongoid delegates caching to MongoDB. (In the Mongoid sources I can only find option settings for caching, but no cache module.)
Finally, I would say there is no real difference in the caching in general -- in-memory is in-memory, no matter whether it lives in the app or in the database.
I would rather not implement an extra caching algorithm, because that seems redundant and a RAM killer.
BTW: If you really want to cache results in-app, you could try Rails.cache or another cache gem as a workaround.
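A minimal sketch of that Rails.cache workaround (the cache key and expiry are illustrative assumptions, and cached documents need to be serializable):

person = Rails.cache.fetch("person/#{person_id}", expires_in: 5.minutes) do
  Person.find(person_id)  # only runs on a cache miss
end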
The other answer is obviously wrong. Not only do Mongoid and the mongo driver not cache queries; even if MongoDB did, the cache might still sit on another machine across the network.
My solution was to wrap the receive_message in Mongo::Connection.
Pros: one definite place
Cons: deserialization still takes place
require 'mongo'

module Mongo
  class Connection
    module QueryCache
      extend ActiveSupport::Concern

      module InstanceMethods
        # Enable the selector cache within the block.
        def cache
          @query_cache ||= {}
          old, @query_cache_enabled = @query_cache_enabled, true
          yield
        ensure
          clear_query_cache
          @query_cache_enabled = old
        end

        # Disable the selector cache within the block.
        def uncached
          old, @query_cache_enabled = @query_cache_enabled, false
          yield
        ensure
          @query_cache_enabled = old
        end

        def clear_query_cache
          @query_cache.clear
        end

        def cache_receive_message(operation, message)
          @query_cache[operation] ||= {}
          key = message.to_s.hash
          log = "[MONGO] CACHE %s"
          if entry = @query_cache[operation][key]
            Mongoid.logger.debug log % 'HIT'
            entry
          else
            Mongoid.logger.debug log % 'MISS'
            @query_cache[operation][key] = yield
          end
        end

        def receive_message_with_cache(operation, message, log_message = nil, socket = nil, command = false)
          if query_cache_enabled
            cache_receive_message(operation, message) do
              receive_message_without_cache(operation, message, log_message, socket, command)
            end
          else
            receive_message_without_cache(operation, message, log_message, socket, command)
          end
        end
      end # module InstanceMethods

      included do
        alias_method_chain :receive_message, :cache
        attr_reader :query_cache, :query_cache_enabled
      end
    end # module QueryCache
  end # class Connection
end

Mongo::Connection.send(:include, Mongo::Connection::QueryCache)
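A hedged usage sketch, assuming you can get at the Mongo::Connection instance Mongoid uses (how you obtain it depends on your Mongoid/driver version):

connection.cache do
  Person.where(email: 'someone@example.com').first  # hits MongoDB, result cached
  Person.where(email: 'someone@example.com').first  # served from the query cache
end

connection.uncached do
  Person.where(email: 'someone@example.com').first  # always hits MongoDB
end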
OK, Mongoid 4 supports a QueryCache middleware.
Just add the middleware in application.rb:
config.middleware.use "Mongoid::QueryCache::Middleware"
And then profit:
MOPED: 127.0.0.1:27017 QUERY database=XXX collection=page_variants selector={"$query"=>{"_id"=>BSON::ObjectId('5564dabb6d61631e21d70000')}, "$orderby"=>{:_id=>1}} flags=[] limit=-1 skip=0 batch_size=nil fields=nil runtime: 0.4397ms
MOPED: 127.0.0.1:27017 QUERY database=XXX collection=page_variants selector={"$query"=>{"_id"=>BSON::ObjectId('5564dacf6d61631e21dc0000')}, "$orderby"=>{:_id=>1}} flags=[] limit=-1 skip=0 batch_size=nil fields=nil runtime: 0.4590ms
QUERY CACHE database=XXX collection=page_variants selector={"$query"=>{"_id"=>BSON::ObjectId('5564c9596d61631e21d30000')}, "$orderby"=>{:_id=>1}}
QUERY CACHE database=XXX collection=page_variants selector={"$query"=>{"_id"=>BSON::ObjectId('5564dabb6d61631e21d70000')}, "$orderby"=>{:_id=>1}}
Source:
Mongoid changelog
https://github.com/mongoid/mongoid/blob/master/CHANGELOG.md#new-features-2
3410 Mongoid now has a query cache that can be used as a middleware in Rack applications. (Arthur Neves)
For Rails:
config.middleware.use(Mongoid::QueryCache::Middleware)
Mongoid 4.0+ now has a QueryCaching module: http://www.rubydoc.info/github/mongoid/mongoid/Mongoid/QueryCache
You can use it on finds by wrapping your lookups like so:
Mongoid::QueryCache.cache { MyCollection.find("xyz") }
