Let's say we have the following array of strings (the real array is a lot bigger):
[
'http://www.example.com?id=123456',
'http://www.example.com?id=234567'
]
As you can see, everything up to the first digit is the same in both strings. Is there a way to easily find what both strings have in common and what is different, so that I get a string like 'http://www.example.com?id=' and an array like ['123456', '234567']?
Here's a method to find the longest common prefix in an array.
def _lcp(str1, str2)
end_index = [str1.length, str2.length].min - 1
end_index.downto(0) do |i|
return str1[0..i] if str1[0..i] == str2[0..i]
end
''
end
def lcp(strings)
strings.inject do |acc, str|
_lcp(acc, str)
end
end
lcp [
'http://www.example.com?id=123456',
'http://www.example.com?id=234567',
'http://www.example.com?id=987654'
]
#=> "http://www.example.com?id="
lcp [
'http://www.example.com?id=123456',
'http://www.example.com?id=123457'
]
#=> "http://www.example.com?id=12345"
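To also get the differing parts the question asks for, one option is to strip the common prefix from each string once its length is known. The following is a minimal sketch; the `split_common_prefix` name is hypothetical, and it uses a character-by-character scan rather than the `_lcp` helper above:

```ruby
# Hypothetical helper: returns [common_prefix, suffixes] for an array of strings.
def split_common_prefix(strings)
  return ['', []] if strings.empty?

  shortest = strings.min_by(&:length)
  # Count the leading positions at which every string has the same character.
  len = 0
  len += 1 while len < shortest.length && strings.all? { |s| s[len] == shortest[len] }

  prefix = shortest[0, len]
  [prefix, strings.map { |s| s[len..-1] }]
end

prefix, ids = split_common_prefix([
  'http://www.example.com?id=123456',
  'http://www.example.com?id=234567'
])
prefix #=> "http://www.example.com?id="
ids    #=> ["123456", "234567"]
```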
# This is an approach using higher level ruby std-lib components instead of a regex.
# Why re-invent the wheel?
module UriHelper
require 'uri'
require 'cgi'
# Take an array of urls and extract the id parameter.
# @param urls {Array} an array of urls to parse
# @returns {Array}
def UriHelper.get_id_params( urls )
urls.map do |u|
uri = URI(u)
params = CGI::parse(uri.query)
params["id"].first # returned
end
end
end
require "test/unit"
# This is unit test proving our helper works as intended
class TestUriHelper < Test::Unit::TestCase
def test_get_id_params
urls = [
'http://www.example.com?id=123456',
'http://www.example.com?id=234567'
]
assert_equal("123456", UriHelper.get_id_params(urls).first )
assert_equal("234567", UriHelper.get_id_params(urls).last )
end
end
Related
I have a Hash which needs to be converted into a String with escaped characters.
{name: "fakename"}
and should end up like this:
'name:\'fakename\''
I don't know what this type of string is called. Maybe there is an existing method that I simply don't know about...
At the end I would do something like this:
name = {name: "fakename"}
metadata = {}
metadata['foo'] = 'bar'
"#{name} AND #{metadata}"
which should end up like this:
'name:\'fakename\' AND metadata[\'foo\']:\'bar\''
Context: This query format is required to search the Stripe API: https://stripe.com/docs/api/customers/search
If possible I would use Stripe's gem.
In case you can't use it, this piece of code extracted from the gem should help you encode the query parameters.
require 'cgi'
# Copied from here: https://github.com/stripe/stripe-ruby/blob/a06b1477e7c28f299222de454fa387e53bfd2c66/lib/stripe/util.rb
class Util
def self.flatten_params(params, parent_key = nil)
result = []
# do not sort the final output because arrays (and arrays of hashes
# especially) can be order sensitive, but do sort incoming parameters
params.each do |key, value|
calculated_key = parent_key ? "#{parent_key}[#{key}]" : key.to_s
if value.is_a?(Hash)
result += flatten_params(value, calculated_key)
elsif value.is_a?(Array)
result += flatten_params_array(value, calculated_key)
else
result << [calculated_key, value]
end
end
result
end
def self.flatten_params_array(value, calculated_key)
result = []
value.each_with_index do |elem, i|
if elem.is_a?(Hash)
result += flatten_params(elem, "#{calculated_key}[#{i}]")
elsif elem.is_a?(Array)
result += flatten_params_array(elem, calculated_key)
else
result << ["#{calculated_key}[#{i}]", elem]
end
end
result
end
def self.url_encode(key)
CGI.escape(key.to_s).
# Don't use strict form encoding by changing the square bracket control
# characters back to their literals. This is fine by the server, and
# makes these parameter strings easier to read.
gsub("%5B", "[").gsub("%5D", "]")
end
end
params = { name: 'fakename', metadata: { foo: 'bar' } }
Util.flatten_params(params).map { |k, v| "#{Util.url_encode(k)}=#{Util.url_encode(v)}" }.join("&")
#=> "name=fakename&metadata[foo]=bar"
I now build the string directly, which works and is quite straightforward:
"email:\'#{email}\'"
email = "test@test.com"
key = "foo"
value = "bar"
["email:\'#{email}\'", "metadata[\'#{key}\']:\'#{value}\'"].join(" AND ")
=> "email:'test@test.com' AND metadata['foo']:'bar'"
which is accepted by Stripe API
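The same kind of string can be assembled from hashes. This is a minimal sketch: `stripe_search_query` is a hypothetical helper, not part of Stripe's gem, and it assumes that field names and values contain no single quotes that would need escaping:

```ruby
# Hypothetical helper, not part of the Stripe gem: builds a search query
# string such as "name:'fakename' AND metadata['foo']:'bar'".
# Assumes field names and values contain no single quotes.
def stripe_search_query(fields, metadata = {})
  clauses = fields.map { |field, value| "#{field}:'#{value}'" }
  clauses += metadata.map { |key, value| "metadata['#{key}']:'#{value}'" }
  clauses.join(' AND ')
end

stripe_search_query({ name: 'fakename' }, { foo: 'bar' })
#=> "name:'fakename' AND metadata['foo']:'bar'"
```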
I am new to Ruby so sorry if it's a simple question.
I want to open a ruby file and search all constants, but I don't know the right regular expression.
Here is my simplified code:
def findconst()
filename = @path_main
k= {}
akonstanten = []
k[:konstanten] = akonstanten
if (File.exists?(filename))
file = open(filename, "r")
while (line = file.gets)
if (line =~ ????)
k[:konstanten] << line
end
end
end
end
You can use the Ripper library to extract the tokens.
For example, for the following file, the code below returns the method names and the constant names:
A = "String" # Comment
B = <<-STR
Yet Another String
STR
class C
class D
def method_1
end
def method_2
end
end
end
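require "ripper"
require "pp"
tokens = Ripper.lex(File.read("file.rb"))
# Each token is [[line, column], type, text]; newer Rubies append a lexer
# state, so read the text at index 2 rather than with `last`.
by_type = tokens.group_by { |x| x[1] }
pp by_type[:on_ident].map { |x| x[2] }
pp by_type[:on_const].map { |x| x[2] }
# => ["method_1", "method_2"]
# => ["A", "B", "C", "D"]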
As Sergio says, searching for capitalised words won't give you only constants, but if it's good enough, it's good enough.
The regular expression you are looking for is something like
if (line =~ /[^a-z][A-Z]/)
which matches any capital letter preceded by a character that is not a lowercase letter. Note that a capital at the very start of a line has no preceding character, so it is missed; /(^|[^a-z])[A-Z]/ would cover that case. Of course, this only detects one match per line, so you might consider tokenising the stream and working on tokens, not lines.
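Working on tokens rather than whole lines could look like the sketch below. It assumes a hypothetical `constant_candidates` helper and treats every word that starts with a capital letter as a candidate, so class names and capitalised words inside strings or comments are reported too:

```ruby
# Hypothetical helper: collects every capitalised word in the source text.
# \b ensures the capital starts a word, so "fooBar" does not yield "Bar".
def constant_candidates(source)
  source.each_line.flat_map { |line| line.scan(/\b[A-Z]\w*/) }.uniq
end

source = <<~RUBY
  MAX_SIZE = 10
  class Foo
    BAR = MAX_SIZE * 2
  end
RUBY

constant_candidates(source) #=> ["MAX_SIZE", "Foo", "BAR"]
```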
Sorry to ask this but I really need to get this done. I'd like to be able to pass in a string and strip out the stop_words. I have the following:
class Query
def self.normalize term
stop_words=["a","big","array"]
term.downcase!
legit=[]
if !stop_words.include?(term)
legit << term
end
return legit
end
def self.check_parts term
term_parts=term.split(' ')
tmp_part=[]
term_parts.each do |part|
t=self.normalize part
tmp_part << t
end
return tmp_part
end
end
I would think that this would return only terms that are not in the stop_words list but I'm getting back either an empty array or an array of the terms passed in. Like this:
ruby-1.9.2-p290 :146 > Query.check_parts "here Is my Char"
=> [[], [], [], ["char"]]
ruby-1.9.2-p290 :147 >
What am I doing wrong?
thx in advance
If you just want to filter out the terms and get an array of downcased words, it is simple.
module Query
StopWords = %w[a big array]
def self.check_parts string; string.downcase.split(/\s+/) - StopWords end
end
Query.check_parts("here Is my Char") # => ["here", "is", "my", "char"]
I don't know why you want the result as an array, but with
term_parts=term.split(' ')
term_parts.reject { |part| stop_words.include?(part) }
you get what you expect.
By the way, you get an array of arrays because
def self.check_parts term
term_parts=term.split(' ')
tmp_part=[] # creates an array
term_parts.each do |part|
t=self.normalize part # normalize returns an empty array
# or one of only one element (a term).
tmp_part << t # you add an array into the array
end
return tmp_part
end
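Putting these points together, a corrected version of the class might look like the sketch below. The `STOP_WORDS` constant name is an assumption; the key change is that the method returns one flat array instead of pushing an array per word:

```ruby
class Query
  STOP_WORDS = %w[a big array].freeze

  # Downcase once, split on whitespace, and drop stop words,
  # returning a flat array of the remaining terms.
  def self.check_parts(term)
    term.downcase.split.reject { |part| STOP_WORDS.include?(part) }
  end
end

Query.check_parts("A Big red Array") #=> ["red"]
```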
I need to create a signature string for a variable in Ruby, where the variable can be a number, a string, a hash, or an array. The hash values and array elements can also be any of these types.
This string will be used to compare the values in a database (Mongo, in this case).
My first thought was to create an MD5 hash of a JSON encoded value, like so: (body is the variable referred to above)
def createsig(body)
Digest::MD5.hexdigest(JSON.generate(body))
end
This nearly works, but JSON.generate does not encode the keys of a hash in the same order each time, so createsig({:a=>'a',:b=>'b'}) does not always equal createsig({:b=>'b',:a=>'a'}).
What is the best way to create a signature string to fit this need?
Note: For the detail oriented among us, I know that you can't JSON.generate() a number or a string. In these cases, I would just call MD5.hexdigest() directly.
I coded up the following pretty quickly and don't have time to really test it here at work, but it ought to do the job. Let me know if you find any issues with it and I'll take a look.
This should properly flatten out and sort the arrays and hashes, and you'd need to have some pretty strange-looking strings for there to be any collisions.
require 'digest/md5'

def createsig(body)
  Digest::MD5.hexdigest(sigflat(body))
end

def sigflat(body)
  if body.class == Hash
    arr = []
    body.each do |key, value|
      arr << "#{sigflat key}=>#{sigflat value}"
    end
    body = arr
  end
  if body.class == Array
    # map and sort return new arrays, so the caller's data is left untouched
    body = body.map { |value| sigflat value }.sort
  end
  if body.class != String
    # interpolation avoids appending to frozen strings such as nil.to_s
    body = "#{body}#{body.class}"
  end
  body
end
> sigflat({:a => {:b => 'b', :c => 'c'}, :d => 'd'}) == sigflat({:d => 'd', :a => {:c => 'c', :b => 'b'}})
=> true
If you could only get a string representation of body and not have the Ruby 1.8 hash come back with different orders from one time to the other, you could reliably hash that string representation. Let's get our hands dirty with some monkey patches:
require 'digest/md5'
class Object
def md5key
to_s
end
end
class Array
def md5key
map(&:md5key).join
end
end
class Hash
def md5key
sort.map(&:md5key).join
end
end
Now any object (of the types mentioned in the question) responds to md5key by returning a reliable key to use for creating a checksum, so:
def createsig(o)
Digest::MD5.hexdigest(o.md5key)
end
Example:
body = [
{
'bar' => [
345,
"baz",
],
'qux' => 7,
},
"foo",
123,
]
p body.md5key # => "bar345bazqux7foo123"
p createsig(body) # => "3a92036374de88118faf19483fe2572e"
Note: This hash representation does not encode the structure, only the concatenation of the values. Therefore ["a", "b", "c"] will hash the same as ["abc"].
Here's my solution. I walk the data structure and build up a list of pieces that get joined into a single string. In order to ensure that the class types seen affect the hash, I inject a single unicode character that encodes basic type information along the way. (For example, we want ["1", "2", "3"].objsum != [1,2,3].objsum)
I did this as a refinement on Object, but it's easily ported to a monkey patch. To use it, just require the file and run "using ObjSum".
require 'digest/md5'
require 'set'

module ObjSum
  refine Object do
    def objsum
      parts = []
      queue = [self]
      while queue.size > 0
        item = queue.shift
        if item.kind_of?(Hash)
          parts << "\000"
          item.keys.sort.each do |k|
            queue << k
            queue << item[k]
          end
        elsif item.kind_of?(Set)
          parts << "\001"
          item.to_a.sort.each { |i| queue << i }
        elsif item.kind_of?(Enumerable)
          parts << "\002"
          item.each { |i| queue << i }
        elsif item.kind_of?(Integer) # Fixnum was removed in Ruby 3.2
          parts << "\003"
          parts << item.to_s
        elsif item.kind_of?(Float)
          parts << "\004"
          parts << item.to_s
        else
          parts << item.to_s
        end
      end
      Digest::MD5.hexdigest(parts.join)
    end
  end
end
Just my 2 cents:
module Ext
module Hash
module InstanceMethods
# Return a string suitable for generating content signature.
# Signature image does not depend on order of keys.
#
# {:a => 1, :b => 2}.signature_image == {:b => 2, :a => 1}.signature_image # => true
# {{:a => 1, :b => 2} => 3}.signature_image == {{:b => 2, :a => 1} => 3}.signature_image # => true
# etc.
#
# NOTE: Signature images of identical content generated under different versions of Ruby are NOT GUARANTEED to be identical.
def signature_image
# Store normalized key-value pairs here.
ar = []
each do |k, v|
ar << [
k.is_a?(::Hash) ? k.signature_image : [k.class.to_s, k.inspect].join(":"),
v.is_a?(::Hash) ? v.signature_image : [v.class.to_s, v.inspect].join(":"),
]
end
ar.sort.inspect
end
end
end
end
class Hash #:nodoc:
include Ext::Hash::InstanceMethods
end
These days there is a formally defined method for canonicalizing JSON, for exactly this reason: https://datatracker.ietf.org/doc/html/draft-rundgren-json-canonicalization-scheme-16
There is a Ruby implementation here: https://github.com/dryruby/json-canonicalization
Depending on your needs, you could call ary.inspect or ary.to_yaml, even.
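For the original JSON-plus-MD5 idea, one way to make the output independent of key order is to rebuild every hash with sorted keys before serialising. The sketch below is not an implementation of the canonicalization scheme linked above; it assumes Ruby 1.9 or later (where hashes keep insertion order), a `json` version that can generate scalars, and a hypothetical `canonicalize` helper that compares keys by their string form:

```ruby
require 'json'
require 'digest/md5'

# Recursively rebuild hashes with their keys in sorted order, so that
# JSON.generate emits the same text regardless of insertion order.
# Keys are compared by their string form (an assumption of this sketch).
def canonicalize(value)
  case value
  when Hash
    value.sort_by { |k, _| k.to_s }.map { |k, v| [k, canonicalize(v)] }.to_h
  when Array
    value.map { |v| canonicalize(v) }
  else
    value
  end
end

def createsig(body)
  Digest::MD5.hexdigest(JSON.generate(canonicalize(body)))
end

createsig({ a: 'a', b: 'b' }) == createsig({ b: 'b', a: 'a' }) #=> true
createsig(%w[a b c]) == createsig(['abc'])                       #=> false
```

Because the JSON text keeps brackets and quotes, the structure is part of the signature, and array order is preserved rather than sorted away. Note that a symbol key and a string key with the same name serialise identically.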
I want to convert something like this:
class NestedItem
attr_accessor :key, :children
def initialize(key, &block)
self.key = key
self.children = []
self.instance_eval(&block) if block_given?
end
def keys
[key] + children.keys
end
end
root = NestedItem.new("root") do
children << NestedItem.new("parent_a") do
children << NestedItem.new("child_a")
children << NestedItem.new("child_c")
end
children << NestedItem.new("parent_b") do
children << NestedItem.new("child_y")
children << NestedItem.new("child_z")
end
end
require 'pp'
pp root
#=>
# #<NestedItem:0x1298a0
# @children=
# [#<NestedItem:0x129814
# @children=
# [#<NestedItem:0x129788 @children=[], @key="child_a">,
# #<NestedItem:0x12974c @children=[], @key="child_c">],
# @key="parent_a">,
# #<NestedItem:0x129738
# @children=
# [#<NestedItem:0x129698 @children=[], @key="child_y">,
# #<NestedItem:0x12965c @children=[], @key="child_z">],
# @key="parent_b">],
# @key="root">
Into this:
root.keys #=>
[
"root",
"root.parent_a",
"root.parent_a.child_a",
"root.parent_a.child_c",
"root.parent_b",
"root.parent_b.child_y",
"root.parent_b.child_z",
]
...using a recursive method.
What's the simplest way to go about this?
Update
I did this:
def keys
[key] + children.map(&:keys).flatten.map do |node|
"#{key}.#{node}"
end
end
Anything better?
Would Array.flatten work for you?
self.children.flatten should return the flattened results.
Yes, .flatten will produce what I think you really want.
But if you want exactly the string output you typed, this will do it:
def keys x
here = key
here = x + '.' + here if x
[ here ] + children.inject([]) { |m,o| m += o.keys here }
end
pp root.keys nil
Or, alternatively, replace the last line in #keys with:
([ here ] + children.map { |o| o.keys here }).flatten
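A variant of the same recursion that can be called without an argument is sketched below. It assumes an optional `prefix` parameter (a name chosen here for illustration) and uses `flat_map` so no separate `flatten` is needed:

```ruby
class NestedItem
  attr_accessor :key, :children

  def initialize(key, &block)
    self.key = key
    self.children = []
    instance_eval(&block) if block_given?
  end

  # prefix is the dotted path of the parent, or nil at the root.
  def keys(prefix = nil)
    path = prefix ? "#{prefix}.#{key}" : key
    [path] + children.flat_map { |child| child.keys(path) }
  end
end

root = NestedItem.new("root") do
  children << NestedItem.new("parent_a") do
    children << NestedItem.new("child_a")
  end
  children << NestedItem.new("parent_b")
end

root.keys
#=> ["root", "root.parent_a", "root.parent_a.child_a", "root.parent_b"]
```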