I would like to generate a unique random alphanumeric string to append to the end of my UID.
The closest thing I could find in the class library so far is the Random class, which generates numbers; that is the next best thing.
What I have so far is:
getNextRandomNumber
^(((rand nextValue) /
(Time now milliSeconds asInteger / Time now minutes asInteger
+ (Time now hour24 asInteger)) asInteger)).
rand is a class variable, initialized as:
initialize
rand := Random new.
This seems very poorly written. But I'm unsure of what else to do.
Which dialect are you using?
In Pharo, I usually implement a method in String class called something like #randomOfSize:. Something like:
String class >> randomOfSize: anInteger
^ self streamContents: [ :s |
anInteger timesRepeat:
[ s nextPut: (Character codePoint: (97 to: 122) atRandom) ] ]
You can tweak the character codes to get the interval of characters you need.
Then, to generate an 8-character random string, you can do:
String randomOfSize: 8
In Pharo, you can also use the UUID class, as follows:
UUID new printString
Hope it helped!
I'm trying to find a way to convert a long string ID like "T2hR8VAR4tNULoglmIbpAbyvdRi1y02rBX" to a numerical id.
I thought about getting the ASCII value of each character and adding them up, but I don't think this is a good approach because different strings can produce the same result. For example, "ABC" and "BAC" give the same sum:
A = 10, B = 20, C = 50,
ABC = 10 + 20 + 50 = 80
BAC = 20 + 10 + 50 = 80
I also thought about taking each letter's ASCII code and writing the numbers next to each other, for example for "ABC":
so ABC = 102050
This method won't work either, as a 20-letter string would result in a huge number. How can I solve this problem? Thank you in advance.
You can use the hashCode() function: "id".hashCode(). All objects implement a variant of this function.
From the documentation:
open fun hashCode(): Int
Returns a hash code value for the object. The general contract of hashCode is:
Whenever it is invoked on the same object more than once, the hashCode method must consistently return the same integer, provided no information used in equals comparisons on the object is modified.
If two objects are equal according to the equals() method, then calling the hashCode method on each of the two objects must produce the same integer result.
All platform objects implement it by default. There is always a possibility of duplicates if you have lots of IDs.
If you use a JVM-based Kotlin environment, the hash will be produced by the
String.hashCode() function from the JVM.
If you need to be 100% confident that there are no possible duplicates, and the input Strings can be up to 20 characters long, then you cannot store the IDs in a 64-bit Long. You will have to use BigInteger:
val id = BigInteger(stringId.toByteArray())
At that point, I question whether there is any point in converting the ID to a numerical format. The String itself can be the ID.
In spreadsheets I have cells named like "F14", "BE5" or "ALL1". I have the first part, the column coordinate, in a variable and I want to convert it to a 0-based integer column index.
How do I do it, preferably in an elegant way, in Ruby?
I can do it using a brute-force method: I can imagine looping through all the letters, converting them to ASCII and adding to a result, but I feel there should be something more elegant/straightforward.
Edit: Example: To simplify, I am only talking about the column coordinate (letters). So in the first case (F14) I have "F" as the input and I expect the result to be 5. In the second case I have "BE" as input and I expect to get 56; for "ALL" I want to get 999.
Not sure if this is any clearer than the code you already have, but it does have the advantage of handling an arbitrary number of letters:
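class String
  # Upper-cased characters of the string, one per array element.
  def upcase_letters
    upcase.chars
  end
end

module Enumerable
  # Pairs each element with its position counted from the END of the
  # sequence, so the last letter gets index 0 (the least significant digit).
  def reverse_with_index
    to_a.reverse.each_with_index.to_a
  end

  def sum
    reduce(0, :+)
  end
end

def indexFromColumnName(column_str)
  start = 'A'.ord - 1
  column_str.upcase_letters.map do |c|
    c.ord - start
  end.reverse_with_index.map do |value, digit_position|
    value * (26 ** digit_position)
  end.sum - 1
end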
I've added some methods to String and Enumerable because I thought it made the code more readable, but you could inline these or define them elsewhere if you don't like that sort of thing.
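For comparison, the same base-26 accumulation can be written without any monkey-patching. This is only a sketch: the method name column_index is made up for illustration, and it assumes the input contains nothing but the letters A-Z (in either case).

```ruby
# Treats the column name as a bijective base-26 number (A=1 .. Z=26),
# accumulates it left to right, then subtracts 1 for a 0-based index.
# Assumes the input consists only of the letters A-Z / a-z.
def column_index(column_name)
  one_based = column_name.upcase.each_char.reduce(0) do |acc, char|
    acc * 26 + (char.ord - 'A'.ord + 1)
  end
  one_based - 1
end

puts column_index('F')
puts column_index('BE')
puts column_index('ALL')
```

With the examples from the question this gives 5 for "F", 56 for "BE" and 999 for "ALL".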
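We can use modulo and the length of the input. The last character will
be used to calculate the exact "position", and the remaining characters to count
how many "laps" we did in the alphabet. Note that this counts exactly one lap per
leading letter, so it only gives the right answer for columns A through AZ, e.g.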
def column_to_integer(column_name)
letters = /[A-Z]+/.match(column_name).to_s.split("")
laps = (letters.length - 1) * 26
position = ((letters.last.ord - 'A'.ord) % 26)
laps + position
end
Using decimal representation (ord) and the math tricks seems a neat
solution at first, but it has some pain points regarding the
implementation. We have magic numbers, 26, and constants 'A'.ord all
over.
One solution is to give our code better knowledge about our domain, i.e.
the alphabet. In that case, we can replace the modulo with the position of
the last character in the alphabet (because it's already sorted in a zero-based array), e.g.
ALPHABET = ('A'..'Z').to_a
def column_to_integer(column_name)
letters = /[A-Z]+/.match(column_name).to_s.split("")
laps = (letters.length - 1) * ALPHABET.size
position = ALPHABET.index(letters.last)
laps + position
end
The final result:
> column_to_integer('F5')
=> 5
> column_to_integer('AK14')
=> 36
HTH. Best!
I have found a particularly neat way to do this conversion:
def index_from_column_name(colname)
s=colname.size
(colname.to_i(36)-(36**s-1).div(3.5)).to_s(36).to_i(26)+(26**s-1)/25-1
end
Explanation of why it works
(warning: spoiler ahead ;)). Basically, we are doing this:
(colname.to_i(36)-('A'*colname.size).to_i(36)).to_s(36).to_i(26)+('1'*colname.size).to_i(26)-1
which in plain English means that we are interpreting colname as a base-26 number. Before we can do that, we need to interpret every A as 1, every B as 2, etc. If that were all that was needed, it would be even simpler, namely
(colname.to_i(36) - ('9'*colname.size).to_i(36)).to_s(36).to_i(26)-1
Unfortunately, Z characters may be present, and they would need to be interpreted as 10 (base 26), so we need a little trick: we shift every digit down by 1 more than needed, and then add it back at the end (to every digit of the original colname).
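To make the trick easier to follow, here is a sketch that exposes the intermediate values of the expanded formula. The helper name trace_column_index and the hash keys are made up for illustration; the arithmetic is the same as in the expression above, and the input is assumed to be upper-case A-Z only.

```ruby
# Step-by-step version of the base-36 / base-26 trick.
# Assumes colname contains only the upper-case letters A-Z.
def trace_column_index(colname)
  size = colname.size
  # In base 36, 'A' is digit 10 and 'Z' is digit 35. Subtracting 'AAA...'
  # maps A->0, B->1, ..., Z->25 digit by digit (no borrow can occur).
  shifted = colname.to_i(36) - ('A' * size).to_i(36)
  digits  = shifted.to_s(36)          # digits 0..p, all valid in base 26
  base26  = digits.to_i(26)           # reinterpret the same digits in base 26
  offset  = ('1' * size).to_i(26)     # adds the 1 back to every digit
  { digits: digits, base26: base26, offset: offset, index: base26 + offset - 1 }
end

p trace_column_index('BE')
```

For "BE" the shifted digits are "14", which is 30 in base 26; the offset "11" in base 26 is 27, so the index is 30 + 27 - 1 = 56.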
Upon creating an instance of a given ActiveRecord model object, I need to generate a shortish (6-8 characters) unique string to use as an identifier in URLs, in the style of Instagram's photo URLs (like http://instagram.com/p/P541i4ErdL/, which I just scrambled to be a 404) or Youtube's video URLs (like http://www.youtube.com/watch?v=oHg5SJYRHA0).
What's the best way to go about doing this? Is it easiest to just create a random string repeatedly until it's unique? Is there a way to hash/shuffle the integer id in such a way that users can't hack the URL by changing one character (like I did with the 404'd Instagram link above) and end up at a new record?
Here's a good method with no collision already implemented in plpgsql.
First step: consider the pseudo_encrypt function from the PG wiki.
This function takes a 32-bit integer as its argument and returns a 32-bit integer that looks random to the human eye but uniquely corresponds to its argument (so that's encryption, not hashing). Inside the function, you may replace the formula (((1366.0 * r1 + 150889) % 714025) / 714025.0) with another function known only to you that produces a result in the [0..1] range (just tweaking the constants will probably be good enough; see below my attempt at doing just that). Refer to the Wikipedia article on the Feistel cipher for a more theoretical explanation.
Second step: encode the output number in the alphabet of your choice. Here's a function that does it in base 62 with all alphanumeric characters.
CREATE OR REPLACE FUNCTION stringify_bigint(n bigint) RETURNS text
    LANGUAGE plpgsql IMMUTABLE STRICT AS $$
DECLARE
    alphabet text:='abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789';
    base int:=length(alphabet);
    _n bigint:=abs(n);
    output text:='';
BEGIN
    LOOP
        output := output || substr(alphabet, 1+(_n%base)::int, 1);
        _n := _n / base;
        EXIT WHEN _n=0;
    END LOOP;
    RETURN output;
END $$
Now here's what we'd get for the first 10 URLs corresponding to a monotonic sequence:
select stringify_bigint(pseudo_encrypt(i)) from generate_series(1,10) as i;
stringify_bigint
------------------
tWJbwb
eDUHNb
0k3W4b
w9dtmc
wWoCi
2hVQz
PyOoR
cjzW8
bIGoqb
A5tDHb
The results look random and are guaranteed to be unique in the entire output space (2^32 or about 4 billion values if you use the entire input space with negative integers as well).
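For readers who would rather do this in application code, the two steps can be sketched in Ruby. This is an illustration under assumptions, not a port verified against the plpgsql functions: the names feistel_round, pseudo_encrypt, stringify and BASE62_ALPHABET are chosen here, inputs are treated as unsigned values in 0...2**32, and rounding details may differ from PostgreSQL, so the strings it produces are not expected to match the table above.

```ruby
BASE62_ALPHABET =
  'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789'.freeze

# Round function applied to one 16-bit half. Any deterministic function
# works; the constants mirror the formula quoted in the answer.
def feistel_round(half)
  ((((1366 * half + 150_889) % 714_025) / 714_025.0) * 32_767).round
end

# Three-round Feistel network over an unsigned 32-bit value.
# A Feistel network is a permutation whatever the round function is,
# so distinct inputs always give distinct outputs.
def pseudo_encrypt(value)
  left  = (value >> 16) & 0xFFFF
  right = value & 0xFFFF
  3.times do
    left, right = right, left ^ feistel_round(right)
  end
  (right << 16) | left
end

# Base-62 encoding, least significant digit first, like stringify_bigint.
def stringify(number)
  n = number.abs
  output = +''
  loop do
    output << BASE62_ALPHABET[n % 62]
    n /= 62
    break if n.zero?
  end
  output
end

(1..5).each { |i| puts stringify(pseudo_encrypt(i)) }
```

Because every round uses the same function and the halves are swapped at the end, applying pseudo_encrypt twice returns the original value, which gives a cheap way to map a short code back to its id.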
If 4 billion values is not wide enough, you may carefully combine two 32-bit results to get to 64 bits without losing uniqueness in the outputs. The tricky parts are dealing correctly with the sign bit and avoiding overflows.
About modifying the function to generate your own unique results: let's change the constant from 1366.0 to 1367.0 in the function body, and retry the test above. See how the results are completely different:
NprBxb
sY38Ob
urrF6b
OjKVnc
vdS7j
uEfEB
3zuaT
0fjsab
j7OYrb
PYiwJb
Update: For those who can compile a C extension, a good replacement for pseudo_encrypt() is range_encrypt_element() from the permuteseq extension, which has the following advantages:
works with any output space up to 64 bits, and it doesn't have to be a power of 2.
uses a secret 64-bit key for unguessable sequences.
is much faster, if that matters.
You could do something like this:
random_attribute.rb
module RandomAttribute
  def generate_unique_random_base64(attribute, n)
    until random_is_unique?(attribute)
      self.send(:"#{attribute}=", random_base64(n))
    end
  end

  def generate_unique_random_hex(attribute, n)
    until random_is_unique?(attribute)
      self.send(:"#{attribute}=", SecureRandom.hex(n/2))
    end
  end

  private

  def random_is_unique?(attribute)
    val = self.send(:"#{attribute}")
    val && !self.class.send(:"find_by_#{attribute}", val)
  end

  def random_base64(n)
    val = base64_url
    val += base64_url while val.length < n
    val.slice(0..(n-1))
  end

  def base64_url
    SecureRandom.base64(60).downcase.gsub(/\W/, '')
  end
end
user.rb
class Post < ActiveRecord::Base
  include RandomAttribute
  before_validation :generate_key, on: :create

  private

  def generate_key
    generate_unique_random_hex(:key, 32)
  end
end
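The retry-until-unique idea can be tried outside Rails with a plain collection standing in for the database lookup. This is a minimal sketch: unique_token and its parameters are hypothetical names, and the taken argument only needs to respond to include?. In a real application a unique index on the column would still be needed, since two processes could pick the same token at the same time.

```ruby
require 'securerandom'
require 'set'

# Draws random alphanumeric tokens until one is not already taken.
# `taken` stands in for the database check (anything with #include?).
# Gives up after max_attempts so a nearly full space cannot loop forever.
def unique_token(taken, length = 8, max_attempts = 10)
  max_attempts.times do
    candidate = SecureRandom.alphanumeric(length)
    return candidate unless taken.include?(candidate)
  end
  raise "no free token found after #{max_attempts} attempts"
end

taken = Set.new
token = unique_token(taken)
taken << token
puts token.length
```

With 62 possible characters and 8 positions there are 62**8 (about 2.2 * 10**14) tokens, so retries should be rare until the table gets very large.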
You can hash the id:
Digest::MD5.hexdigest('1')[0..9]
=> "c4ca4238a0"
Digest::MD5.hexdigest('2')[0..9]
=> "c81e728d9d"
But somebody can still guess what you're doing and iterate that way. It's probably better to hash on the content.
I have code that should take a unique string (for example, "d86c52ec8b7e8a2ea315109627888fe6228d") from a client and return an integer greater than 2200000000 and less than 5800000000. It's important that the generated integer is not random: each unique string should always map to the same integer. What is the best way to generate it without using a DB?
Now it looks like this:
did = "d86c52ec8b7e8a2ea315109627888fe6228d"
min_cid = 2200000000
max_cid = 5800000000
cid = did.hash.abs.to_s.split.last(10).to_s.to_i
if cid < min_cid
  cid += min_cid
else
  while cid > max_cid
    cid -= 1000000000
  end
end
Here's the problem: your range of numbers has only 3.6x10^9 possible values, whereas your sample unique string (which looks like a hex integer with 36 digits) has 16^36 possible values (i.e. many more). So when mapping your string into your integer range there will be collisions.
The mapping function itself can be pretty straightforward, I would do something such as below (also, consider using only a part of the input string for integer conversion, e.g. the first seven digits, if performance becomes critical):
def my_hash(str, min, max)
range = (max - min).abs
(str.to_i(16) % range) + min
end
my_hash(did, min_cid, max_cid) # => 2461595789
[Edit] If you are using Ruby 1.8 and your adjusted range can be represented as a Fixnum, just use the hash value of the input string object instead of parsing it as a big integer. Note that this strategy might not be safe in Ruby 1.9 (per the comment by @DataWraith), as object hash values may be randomized between invocations of the interpreter, so you would not get the same hash number for the same input string when you restart your application:
def hash_range(obj, min, max)
(obj.hash % (max-min).abs) + [min, max].min
end
hash_range(did, min_cid, max_cid) # => 3886226395
And, of course, you'll have to decide what to do about collisions. You'll likely have to persist a bucket of input strings which map to the same value and decide how to resolve the conflicts if you are looking up by the mapped value.
You could generate a 32-bit CRC, drop one bit, and add the result to 2.2 billion. That gives you a max value of about 4.35 billion.
Alternately you could use all 32 bits of the CRC, but when the result is too large, append a zero to the input string and recalculate, repeating until you get a value in range.
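Both CRC variants can be sketched with Zlib.crc32 from Ruby's standard library. The method names cid_from_crc31 and cid_from_crc32 are made up for illustration, and adding 1 to the lower bound is an assumption made here so that the result is strictly greater than 2200000000, as the question asks. Collisions between different input strings remain possible either way.

```ruby
require 'zlib'

MIN_CID = 2_200_000_000
MAX_CID = 5_800_000_000

# Variant 1: keep 31 bits of the CRC and shift into the allowed range.
# The largest possible result is MIN_CID + 1 + (2**31 - 1) = 4_347_483_648.
def cid_from_crc31(str)
  MIN_CID + 1 + (Zlib.crc32(str) & 0x7FFF_FFFF)
end

# Variant 2: use all 32 bits; if the result is too large, append a zero
# to the input and recalculate until the value falls inside the range.
def cid_from_crc32(str)
  input = str.dup
  loop do
    cid = MIN_CID + 1 + Zlib.crc32(input)
    return cid if cid < MAX_CID
    input << '0'
  end
end

did = 'd86c52ec8b7e8a2ea315109627888fe6228d'
puts cid_from_crc31(did)
puts cid_from_crc32(did)
```

Unlike String#hash, CRC-32 gives the same value on every run of the interpreter, so the mapping stays stable across restarts.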
While it's not my application, a simple way to explain my problem is to assume I'm running a URL shortener. Rather than trying to figure out what the next string I should use as the unique section of the URL is, I just index all my URLs by integer and map the numbers to strings behind the scenes, essentially just changing the base of the number to, let's say, 62: a-z + A-Z + 0-9.
In ActiveRecord I can easily alter the reader for the url_id field so that it returns my base 62 string instead of the number being stored in the database:
class Short < ActiveRecord::Base
  def url_id
    i = read_attribute(:convo)
    return '0' if i == 0
    s = ''
    while i > 0
      s << CHARS[i.modulo(62)]
      i /= 62
    end
    s
  end
end
But is there a way to tell ActiveRecord to accept Short.find(:first,:conditions=>{:url_id=>'Ab7'}), i.e. putting the 'decoding' logic into my Short ActiveRecord class?
I guess I could define my own def self.find_by_unique_string(string), but that feels like cheating somehow! Thanks!
Another alternative is to actually add an extra field to your database table for unique_string and then use a before_save callback to put the encoded value in this field. Then, once the record is saved, you will be able to use that field in any kind of find.
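If the extra column is not wanted, the decoding side can also be written as a plain function and the result passed to an ordinary lookup by integer. The sketch below is independent of ActiveRecord and rests on assumptions: CHARS is taken to be a-z, then A-Z, then 0-9 as described in the question, encode_url_id mirrors the reader shown there (least significant digit first), and zero is encoded as CHARS[0] rather than '0' so that the round trip is consistent.

```ruby
# Assumed alphabet: a-z, A-Z, 0-9 (62 characters), per the question.
CHARS = [*'a'..'z', *'A'..'Z', *'0'..'9'].freeze
DIGIT_VALUES = CHARS.each_with_index.to_h.freeze

# Mirrors the url_id reader: least significant base-62 digit first.
def encode_url_id(number)
  return CHARS[0] if number.zero?
  encoded = +''
  while number > 0
    encoded << CHARS[number % 62]
    number /= 62
  end
  encoded
end

# Inverse of encode_url_id. Raises KeyError on characters outside CHARS.
def decode_url_id(string)
  string.each_char.with_index.map do |char, position|
    DIGIT_VALUES.fetch(char) * (62**position)
  end.sum
end

puts decode_url_id(encode_url_id(12_345))
```

A class method such as the find_by_unique_string mentioned in the question could then simply look up the record whose integer column equals decode_url_id(string).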