What's the best way to hash a url in ruby?

I'm writing a web app that points to external links. I'm looking to create a non-sequential, non-guessable id for each document that I can use in the URL. I did the obvious thing: treating the URL as a string and calling String#crypt on it, but that seems to choke on any non-alphanumeric characters, like the slashes, dots and underscores.
Any suggestions on the best way to solve this problem?
Thanks!

Depending on how long a string you would like you can use a few alternatives:
require 'digest'
Digest.hexencode('http://foo-bar.com/yay/?foo=bar&a=22')
# "687474703a2f2f666f6f2d6261722e636f6d2f7961792f3f666f6f3d62617226613d3232" (plain hex encoding of the string: reversible, not a digest)
require 'digest/md5'
Digest::MD5.hexdigest('http://foo-bar.com/yay/?foo=bar&a=22')
# "43facc5eb5ce09fd41a6b55dba3fe2fe"
require 'digest/sha1'
Digest::SHA1.hexdigest('http://foo-bar.com/yay/?foo=bar&a=22')
# "2aba83b05dc9c2d9db7e5d34e69787d0a5e28fc5"
require 'digest/sha2'
Digest::SHA2.hexdigest('http://foo-bar.com/yay/?foo=bar&a=22')
# "e78f3d17c1c0f8d8c4f6bd91f175287516ecf78a4027d627ebcacfca822574b2"
Note that these values are not unguessable: anyone who knows the URL can compute the same digest. You may have to combine the URL with some other (secret but static) data to salt the string:
salt = 'foobar'
Digest::SHA1.hexdigest(salt + 'http://foo-bar.com/yay/?foo=bar&a=22')
# "dbf43aff5e808ae471aa1893c6ec992088219bbb"
Now it becomes much harder to generate this hash for someone who doesn't know the original content and has no access to your source.
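Plain concatenation of a secret and the URL works, but a keyed hash (HMAC) is the construction designed for mixing a secret into a digest. Here is a minimal sketch using OpenSSL::HMAC from the standard library. The URL_ID_SECRET constant and the url_id helper are hypothetical names chosen for illustration, and in a real app the secret would presumably come from configuration rather than from source code:

```ruby
require 'openssl'

# Hypothetical secret; load it from configuration in a real app.
URL_ID_SECRET = 'change-me'

# Returns a 64-character hex id derived from the URL and the secret.
def url_id(url, secret = URL_ID_SECRET)
  OpenSSL::HMAC.hexdigest(OpenSSL::Digest.new('SHA256'), secret, url)
end

id = url_id('http://foo-bar.com/yay/?foo=bar&a=22')
puts id.length # prints 64
```

The same URL always maps to the same id, so this fits the case where you want to look a document up by its URL later.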

I would also suggest looking at the different algorithms in the digest namespace. To make it harder to guess, rather than (or in addition to) salting with a secret passphrase, you can also use a precise dump of the time:
require 'digest/md5'
def hash_url(url)
  Digest::MD5.hexdigest("#{Time.now.to_f}--#{url}")
end
Since the result of any hashing algorithm is not guaranteed to be unique, don't forget to check for the uniqueness of your result against previously generated hashes before assuming that your hash is usable. The use of Time.now makes the retry trivial to implement, since you only have to call the method again until a unique hash is generated.
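The retry described above could be sketched roughly as follows. The USED_IDS set is a hypothetical in-memory stand-in for whatever store holds the ids already issued (in a real app, most likely a database column with a unique index), and unique_id_for is an illustrative name:

```ruby
require 'digest/md5'
require 'set'

# Hypothetical stand-in for the table of ids already in use.
USED_IDS = Set.new

def hash_url(url)
  Digest::MD5.hexdigest("#{Time.now.to_f}--#{url}")
end

# Keeps generating ids until one is not yet in the set.
def unique_id_for(url, used = USED_IDS)
  loop do
    id = hash_url(url)
    # Set#add? returns nil when the id was already present.
    return id if used.add?(id)
  end
end

first  = unique_id_for('http://foo-bar.com/yay/?foo=bar&a=22')
second = unique_id_for('http://foo-bar.com/yay/?foo=bar&a=22')
puts first == second # prints false
```

Because the time is mixed in, the id cannot be recomputed from the URL later, so it has to be stored next to the URL.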

Use Digest::MD5 from Ruby's standard library:
Digest::MD5.hexdigest(my_url)

Related

Fetch from hash with either Singular or Plural

I get the following input hash in my ruby code
my_hash = { include: 'a,b,c' }
(or)
my_hash = { includes: 'a,b,c' }
Now I want the fastest way to get 'a,b,c'
I currently use
def my_includes
  my_hash[:include] || my_hash[:includes]
end
But this is very slow because it always checks for :include keyword first then if it fails it'll look for :includes. I call this function several times and the value inside this hash can keep changing. Is there any way I can optimise and speed up this? I won't get any other keywords. I just need support for :include and :includes.

Caveats and Considerations
First, some caveats:
You tagged this Rails 3, so you're probably on a very old Ruby that doesn't support a number of optimizations, newer Hash-related method calls like #fetch_values or #transform_keys!, or pattern matching for structured data.
You can do all sorts of things with your Hash lookups, but none of them are likely to be faster than a Boolean short-circuit, assuming you can be sure of having only one key or the other at all times.
You haven't shown any of the calling code, so without benchmarks it's tough to see how this operation can be considered "slow" in any general sense.
If you're using Rails and not looking for a pure Ruby solution, you might want to consider ActiveModel::Dirty to only take action when an attribute has changed.
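To check whether the lookup itself is really the bottleneck, a rough measurement with the standard Benchmark module may help. This is only a sketch: my_includes is rewritten here to take the hash as an argument, and the iteration count n is an arbitrary choice:

```ruby
require 'benchmark'

# The lookup from the question, taking the hash as an argument.
def my_includes(my_hash)
  my_hash[:include] || my_hash[:includes]
end

singular = { include: 'a,b,c' }
plural   = { includes: 'a,b,c' }
n = 500_000

# Prints user, system and real time for each variant.
Benchmark.bm(10) do |x|
  x.report('singular:') { n.times { my_includes(singular) } }
  x.report('plural:')   { n.times { my_includes(plural) } }
end
```

The numbers depend on your Ruby version and machine, so compare them with the time spent in the surrounding code before optimizing.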
Use Memoization
Regardless of the foregoing, what you're probably missing here is some form of memoization so you don't need to constantly re-evaluate the keys and extract the values each time through whatever loop feels slow to you. For example, you could store the results of your Hash evaluation until it needs to be refreshed:
attr_accessor :includes
def extract_includes(hash)
  @includes = hash[:include] || hash[:includes]
end
You can then call #includes or #includes= (or use the @includes instance variable directly if you like) from anywhere in scope as often as you like without having to re-evaluate the hashes or keys. For example:
def count_includes
  @includes.split(',').count
end
500.times { count_includes }
The tricky part is knowing if and when to update your memoized value. You should only call #extract_includes when you fetch a new Hash from somewhere like ActiveRecord or a remote API. Until that happens, you can reuse the stored value for as long as it remains valid.
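Put together as one self-contained piece, the memoization idea might look like this. The IncludeList class and its refresh method are hypothetical names used only for this sketch:

```ruby
# Hypothetical wrapper that stores the extracted value until refreshed.
class IncludeList
  attr_reader :includes

  # Call this only when a new hash arrives.
  def refresh(hash)
    @includes = hash[:include] || hash[:includes]
  end

  # Works on the stored string without touching the hash again.
  def count_includes
    @includes.split(',').count
  end
end

list = IncludeList.new
list.refresh({ include: 'a,b,c' })
500.times { list.count_includes } # reuses the stored string each time
list.refresh({ includes: 'x,y' }) # new data arrived, so refresh once
puts list.count_includes          # prints 2
```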

You could work with a modified hash that has both keys :include and :includes with the same values:
my_hash = { include: 'a,b,c' }
my_hash.update(my_hash.key?(:include) ? { includes: my_hash[:include] } :
{ include: my_hash[:includes] })
#=> {:include=>"a,b,c", :includes=>"a,b,c"}
This may be fastest if you were using the same hash my_hash for multiple operations. If, however, a new hash is generated after just a few interrogations, you might see if both the keys :include and :includes can be included when the hash is constructed.

How to make Ruby Hash lookup faster?

What can be done to a Ruby Hash to optimize/speed up lookups (read-only access)?
Ex: freezing the hash, sorting the keys, forcing numerical keys...

Use frozen_string_literal:
frozen_string_literal reduces the generated garbage by ~100MB or ~20%! Free performance by adding a one line comment.
Conclusion
Gem authors: add # frozen_string_literal: true to the top of all Ruby files in a gem. It gives a free performance improvement to all your users as long as you don’t use String mutation.
Mike Perham: Ruby Optimization with One Magic Comment, 2018-02-28: https://www.mikeperham.com/2018/02/28/ruby-optimization-with-one-magic-comment/
Use Symbol as hash keys:
If you use Ruby 2.2, Symbol could be more performant than String as Hash keys.
Fast Ruby: https://github.com/fastruby/fast-ruby#hash

JSON generate unique hash value (SHA-512)

I'm searching for a way to generate a SHA-512 hash from a json string in Ruby, independent from the positions of the elements in it, and independent from nestings, arrays, nested arrays and so on. I just want to hash the raw data along with its keys.
I tried some approaches with converting the JSON into a ruby hash, deep sort them by their keys, append everything into one, long string and hash it. But I bet that my solution isn't the most efficient one, and that there must be a better way to do this.
EDIT
So far, I convert JSON into a Ruby hash. Then I try to use this function to get a canonical representation:
def self.canonical_string_from_hash value, key=nil
  str = ""
  if value.is_a? Hash
    value.keys.sort.each do |k|
      str += canonical_string_from_hash(value[k], k)
    end
  elsif value.is_a? Array
    str += key.to_s
    value.each do |v|
      str += canonical_string_from_hash(v)
    end
  else
    str += key ? "#{key}#{value}" : value.to_s
  end
  return str
end
But I'm not sure, if this is a good and efficient way to do this.
For example, this hash
hash = {
  id: 3,
  zoo: "test",
  global: [
    {ukulele: "ringding", blub: 3},
    {blub: nil, ukulele: "rangdang", guitar: "stringstring"}
  ],
  foo: {
    ids: [3,4,5],
    bar: "asdf"
  }
}
gets converted to this string:
barasdfids345globalblub3ukuleleringdingblubguitarstringstringukulelerangdangid3zootest

But I'm not sure, if this is a good and efficient way to do this.
Depends on what you are trying to do. Your canonical/equivalent structures need to represent what is important to you for the comparison. Removing details such as object structure makes sense if you consider two items with different structure but same string values equivalent.
According to your comments, you are attempting to sign a request that is being transferred from one system to a second one. In other words you want security, not a measure of similarity or a digital fingerprint for some other purpose. Therefore equivalent requests are ones that are identical in all the ways that affect the processing that you want to protect. It is simpler, and very likely more secure, to lock down the raw bytes of data that transfer between your two systems.
In which case your whole approach needs a rethink. The reasons for that are probably best discussed on security.stackexchange.com
However, in brief:
Use an HMAC routine (HMAC-SHA512), it is designed for your purpose. Instead of a salt, this uses a secret, which is essentially the same thing (in fact you need to keep your salt a secret in your implementation too, which is unusual for something called a salt), but has been combined with the SHA in a way which makes it resilient to a couple of attack forms possible against simple concatenation followed by SHA. The worst of these is that it is possible to extend the data and have it generate the same SHA when processed, without needing to know the salt. In other words, an attacker could take a known valid request and use it to forge other requests which will get past your security check. Your proposed solution looks vulnerable to this form of attack to me.
Unpacking the request and analysing the details to get a "canonical" view of the request is not necessary, and also reduces the security of your solution. The only reason for doing this is that you are for some reason not able to handle the request once it has been serialised to JSON, and are forced to work only with the de-serialised request at one end or another of the two systems. If that is purely a knowledge or convenience thing, then fix that problem rather than trying to roll your own security protocol using SHA-512.
You should sign the request, and check the signature, against the fully serialised JSON string. If you have to de-serialise data that may come from a "man-in-the-middle" attacker before you can check it, then you are potentially already exposed to some attacks via the parser. You should work to reject suspect requests before any data processing has been done to them.
TL;DR - Although not a direct answer to your question, the correct solution for you is to not write this code at all. Instead you need to place your secure signature code closer to the ins and outs of your two services that need to trust each other.
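As a rough illustration of signing the serialised string rather than a canonical form, here is a sketch built on OpenSSL::HMAC from the standard library. SHARED_SECRET, sign, secure_equal? and verified_payload are hypothetical names, and the hand-rolled comparison is only there to keep the example self-contained; a vetted constant-time comparison from your framework is preferable if one is available:

```ruby
require 'openssl'
require 'json'

# Hypothetical shared secret known to both services.
SHARED_SECRET = 'replace-with-a-long-random-secret'

# Sign the exact bytes that go over the wire.
def sign(raw_body, secret = SHARED_SECRET)
  OpenSSL::HMAC.hexdigest(OpenSSL::Digest.new('SHA512'), secret, raw_body)
end

# Compare every byte instead of stopping at the first difference.
def secure_equal?(a, b)
  return false unless a.bytesize == b.bytesize
  diff = 0
  a.bytes.zip(b.bytes) { |x, y| diff |= x ^ y }
  diff.zero?
end

# Verify first; parse only if the signature matches.
def verified_payload(raw_body, signature, secret = SHARED_SECRET)
  return nil unless secure_equal?(sign(raw_body, secret), signature)
  JSON.parse(raw_body)
end

body = JSON.generate({ 'id' => 3, 'zoo' => 'test' })
signature = sign(body)
verified_payload(body, signature)       # returns the parsed Hash
verified_payload(body + ' ', signature) # returns nil, the body was altered
```

The sender transmits body and signature together; the receiver never runs the JSON parser on a body whose signature does not match.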

ruby symbol as key, but can't get value from hash

I'm updating someone else's code and now I have a hash like this:
{"instance_id"=>"74563c459c457b2288568ec0a7779f62", "mem_quota"=>536870912, "disk_quota"=>2147483648, "mem_usage"=>59164.0, "cpu_usage"=>0.1, "disk_usage"=>6336512}
I want to get a value using a symbol as the key, for example :mem_quota, but it fails.
The code is like:
instance[:mem_usage].to_f
but it returns nothing. What could cause this problem?

Use instance["mem_usage"] instead since the hash is not using symbols.

The other explanations are correct, but to give a broader background:
You are probably used to working within Rails where a very specific variant of Hash, called HashWithIndifferentAccess, is used for things like params. This particular class works like a standard Ruby Hash, except when you access keys you are allowed to use either Symbols or Strings. The standard Ruby Hash, and generally speaking, Hash implementations in other languages, expect that the key used to access an element is an object of the same class and value as the key used to store it. HashWithIndifferentAccess is a Rails convenience class provided via the Active Support libraries. You are free to use it yourself, but it first has to be brought in by requiring it.
HashWithIndifferentAccess just does the conversion for you at access time: it stores its keys as strings and converts any symbol you pass in to a string.
So, for your case, instance["mem_usage"].to_f should work.

You need HashWithIndifferentAccess.
require 'active_support/core_ext'
h1 = {"instance_id"=>"74563c459c457b2288568ec0a7779f62", "mem_quota"=>536870912,
"disk_quota"=>2147483648, "mem_usage"=>59164.0, "cpu_usage"=>0.1,
"disk_usage"=>6336512}
h2 = h1.with_indifferent_access
h1[:mem_usage] # => nil
h1["mem_usage"] # => 59164.0
h2[:mem_usage] # => 59164.0
h2["mem_usage"] # => 59164.0

Also, there are the symbolize_keys and stringify_keys methods (from Active Support) that may be of help. The method names are self-descriptive enough, I believe.

Clearly the keys of your hash are strings because they have double quotes around them. Therefore you will need to access the keys with instance["mem_usage"] or you will need to build a new hash with symbols as the keys first.
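Building a symbol-keyed copy does not require Rails. A minimal sketch in plain Ruby, assuming Ruby 2.5 or newer for Hash#transform_keys; by_symbol is just an illustrative variable name:

```ruby
instance = {
  "instance_id" => "74563c459c457b2288568ec0a7779f62",
  "mem_quota"   => 536870912,
  "disk_quota"  => 2147483648,
  "mem_usage"   => 59164.0,
  "cpu_usage"   => 0.1,
  "disk_usage"  => 6336512
}

# New hash with every String key converted to a Symbol.
by_symbol = instance.transform_keys(&:to_sym)

instance[:mem_usage]       # => nil
by_symbol[:mem_usage].to_f # => 59164.0
```

The original hash is left untouched, so code that still uses string keys keeps working.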

If you use Rails with ActiveSupport, then do use HashWithIndifferentAccess for flexibility in accessing hash with either string or symbol.
hash = HashWithIndifferentAccess.new({
"instance_id"=>"74563c459c457b2288568ec0a7779f62",
"mem_quota"=>536870912, "disk_quota"=>2147483648,
"mem_usage"=>59164.0,
"cpu_usage"=>0.1,
"disk_usage"=>6336512
})
hash[:mem_usage] # => 59164.0
hash["mem_usage"] # => 59164.0

Why is :key.hash != 'key'.hash in Ruby?

I'm learning Ruby right now for the Rhodes mobile application framework and came across this problem: Rhodes' HTTP client parses JSON responses into Ruby data structures, e.g.
puts @params # prints {"body"=>{"results"=>[]}}
Since the key "body" is a string here, my first attempt @params[:body] failed (is nil) and instead it must be @params['body']. I find this most unfortunate.
Can somebody explain the rationale why strings and symbols have different hashes, i.e. :body.hash != 'body'.hash in this case?

Symbols and strings serve two different purposes.
Strings are your good old familiar friends: mutable and garbage-collectable. Every time you use a string literal or #to_s method, a new string is created. You use strings to build HTML markup, output text to screen and whatnot.
Symbols, on the other hand, are different. Each symbol exists only in one instance, and historically it was never garbage-collected (since Ruby 2.2, dynamically created symbols can be collected). Because of that you should make new symbols very carefully (String#to_sym and :'' literal). These properties make them a good candidate for naming things. For example, it's idiomatic to use symbols in macros like attr_reader :foo.
If you got your hash from an external source (you deserialized a JSON response, for example) and you want to use symbols to access its elements, then you can either use HashWithIndifferentAccess (as others pointed out), or call helper methods from ActiveSupport:
require 'active_support/core_ext'
h = {"body"=>{"results"=>[]}}
h.symbolize_keys # => {:body=>{"results"=>[]}}
h.stringify_keys # => {"body"=>{"results"=>[]}}
Note that it'll only touch top level and will not go into child hashes.
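If the nested hashes need symbol keys as well, two options come to mind. When you control the parsing step, JSON.parse accepts symbolize_names: true. Otherwise a small recursive helper does the job; deep_symbolize below is a hypothetical name for this sketch (Active Support offers deep_symbolize_keys for the same purpose):

```ruby
require 'json'

# Hypothetical helper: recursively convert String keys to Symbols.
def deep_symbolize(value)
  case value
  when Hash
    value.each_with_object({}) do |(k, v), out|
      out[k.respond_to?(:to_sym) ? k.to_sym : k] = deep_symbolize(v)
    end
  when Array
    value.map { |v| deep_symbolize(v) }
  else
    value
  end
end

h = { "body" => { "results" => [{ "id" => 1 }] } }
deep_symbolize(h)[:body][:results].first[:id] # => 1

# Symbolizing at parse time, if you have the raw JSON string:
parsed = JSON.parse('{"body":{"results":[]}}', symbolize_names: true)
parsed[:body][:results] # => []
```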

Symbols and Strings are never ==:
:foo == 'foo' # => false
That's a (very reasonable) design decision. After all, they have different classes, methods, one is mutable the other isn't, etc...
Because of that, it is mandatory that they are never eql?:
:foo.eql? 'foo' # => false
Two objects that are not eql? typically don't have the same hash, but even if they did, the Hash lookup uses hash and then eql?. So your question really was "why are symbols and strings not eql?".
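A short sketch may make the hash-then-eql? lookup concrete. The first part uses only core classes; LooseKey is a hypothetical class written for this example, showing that two distinct objects act as the same key once both hash and eql? agree:

```ruby
# == and eql? can disagree, and Hash lookups follow eql?.
lookup = { 1 => :int }
1 == 1.0    # => true
1.eql?(1.0) # => false
lookup[1.0] # => nil

# Hypothetical key class that treats :body and 'body' alike.
class LooseKey
  attr_reader :name

  def initialize(name)
    @name = name.to_s
  end

  # Equal keys must report the same hash value...
  def hash
    name.hash
  end

  # ...and must also be eql? for the lookup to match.
  def eql?(other)
    other.is_a?(LooseKey) && name == other.name
  end
end

table = { LooseKey.new(:body) => 'found' }
table[LooseKey.new('body')] # => "found"
```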
Rails uses HashWithIndifferentAccess that accesses indifferently with strings or symbols.

In Rails, the params hash is actually a HashWithIndifferentAccess rather than a standard ruby Hash object. This allows you to use either strings like 'action' or symbols like :action to access the contents.
You will get the same results regardless of what you use, but keep in mind this only works on HashWithIndifferentAccess objects.
Copied from : Params hash keys as symbols vs strings
