Detect characters in a string - ruby

I'm playing with Ruby on Codewars. The task is to create a method that accepts a string and returns a string of length 26 consisting of 1s and 0s. The 26 characters of the returned string correspond to the letters of the alphabet (upper or lower case); each is 1 if the letter appears in the input and 0 if not. If an a or an A is in the string, the first character of the returned string is 1, otherwise 0; if a b or B is, the second is 1; and so on. For instance:
change('a **& bZ') # => '11000000000000000000000001'
Solutions:
def change input
  ('a'..'z').to_a.join.gsub(/[#{input.scan(/[a-zA-Z]/).uniq.join}]/i, '1').gsub(/\D/, '0')
end
vs.
def change input
  ('a'..'z').map { |letter| input.downcase.include?(letter) ? '1' : '0' }.join
end
How can I tell which solution is more efficient? There may be more efficient ones still.

Let n be the number of letters in the input and m be the number of letters in the alphabet.
input.scan(/[a-zA-Z]/).uniq.join
is O(n) + O(n) + O(n). Fortunately, you do this only once (when the pattern passed to gsub is evaluated). Therefore, the complexity adds up to 2*O(m) + 3*O(n) + O(m) = O(max(n, m)).
On the other hand,
input.downcase.include?(letter)
is O(n), but it is executed for each letter in the alphabet, leaving you with O(m*n) + O(m) = O(m*n).
Therefore, the first solution is asymptotically better, as O(max(n, m)) < O(m*n).
That is unless you consider the number of letters in the alphabet a small constant, in which case they are both O(n) and it's just a matter of benchmarking.
You can see that both are linear:
Running 100_000 iterations on a random 1000 letter string gave the following results (using cruby 2.2.2):
                        user     system      total        real
first (gsub)       36.160000   0.000000  36.160000 ( 36.182512)
second (include?)   3.910000   0.000000   3.910000 (  3.915191)
So in practice, the second solution is far superior.
It is also way more readable.
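For reference, a minimal sketch of how such a benchmark can be reproduced with Ruby's Benchmark module (the harness below, including the method names change_gsub and change_include, is an assumption, not the original setup):

require 'benchmark'

letters = ('a'..'z').to_a
input = Array.new(1000) { letters.sample }.join  # random 1000-letter string

Benchmark.bm(10) do |x|
  x.report('gsub:')     { 100_000.times { change_gsub(input) } }
  x.report('include?:') { 100_000.times { change_include(input) } }
end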

Not really an answer to your question (which one is the most efficient), but an idea that uses binary arithmetic and the ASCII table:
def change input
  res = 0
  input.each_byte { |c|
    res |= c.between?(97, 122) ? 1 << (122 - c) : c.between?(65, 90) ? 1 << (90 - c) : 0
  }
  "%026b" % res
end
s = "Portez ce vieux whisky au juge blond qui fume"
puts change s
This code uses the ASCII ranges 97-122 for lower-case letters and 65-90 for upper-case letters. each_byte yields the ASCII code c of each byte. If a letter is lower case (for example x), 122-c gives 122-120 = 2, which is the position of the corresponding bit. 1<<2 shifts the bits of the number 1 two places to the left, giving 100 (binary); the bitwise OR operator | then merges it into res: 0 | 100 = 100, i.e. 0000 0000 0000 0000 0000 0001 00 (without spaces and with leading zeros added).
Advantage: the string is parsed only once, there's no need to create an array and you only need one string manipulation (the formatted string at the end). The algorithm only uses operations that a processor is able to do very quickly.
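To see the bit arithmetic in isolation, here is a small worked example for the letter x (byte value 120), run in irb:

c = 'x'.ord                 # => 120
1 << (122 - c)              # => 4, i.e. 0b100: bit 2 is set
'%026b' % (1 << (122 - c))  # => "00000000000000000000000100"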
Notes:
This code handles UTF-8 strings without modification, since the bytes of multibyte characters never have values below 0x80 and are therefore ignored.
For better performance, you can replace the between?(..., ...) calls with plain numeric comparisons:
res |= c > 96 ? (c < 123 ? 1 << (122 - c) : 0) : (c < 91 ? (c > 64 ? 1 << (90 - c) : 0) : 0)
With this change, the code is at least twice as fast as your second solution.

Related

How to generate random string type primary key, which can auto increase its length?

If my table needs a string primary key that is as short as possible, grows in length only when necessary, and is random-looking in some sense, how can I generate it?
For example:
given 26 letters, and the result should be like:
Assuming you just want a bit of obfuscation rather than proper cryptographic security, I'd suggest using a set of linear congruential generators to transform your integers into non-sequential values that you can then convert into base-26 values where each digit is represented by a letter of the alphabet (e.g., a=0, b=1, ..., z=25).
You'll need a different LCG for strings of each length, but these can be generated quite easily. Also, the input values will have to be adjusted so that, for example, the first two-character string corresponds to an input value of 26. (I'm counting from zero, since this makes the maths a bit more straightforward.)
For example, suppose you start with a value of n=12345. The first thing you need to do is figure out how long the output string needs to be:
n = 12345   # Input value
m = 26      # LCG modulus
k = 1       # Length of output string
while n >= m:
    n -= m
    m *= 26
    k += 1
print(k)    # Should be 3 in this case
print(n)    # Should be 11643 (= 12345 - 26 - 26**2)
Next, transform this output value of n with an LCG having a modulus of m=26^3 (for a 3-character output). For example, you could try a=7541 and c=12127. (Make sure the values you choose correspond to a maximal length sequence according to the Hull–Dobell theorem as described in the Wikipedia article.)
n_enc = (n * 7541 + 12127) % (26**3) # Should be 2294
In base 26, the number 2294 is represented as 3×26^2 + 10×26 + 6, so the final output will be dkg.
To reverse this process, convert the base-26 string back into an integer, apply the inverse LCG function
n = ((n_enc + 5449) * 3277) % (26**3) # Should be 11643
and add back on the smaller powers of 26:
while m > 26:
    m //= 26
    n += m
One slight wrinkle in this method is that if the length of your alphabet is not divisible by any squares greater than 1 (e.g., 26 = 2×13 is not divisible by 4, 9 or 16), then the LCG for single-character strings is inevitably going to produce sequential results. You can fix this by using a random permutation of the alphabet to represent the base-26 numbers.
I should also add the standard caveat that random strings of alphabet characters can sometimes spell words that are offensive or inappropriate, so you might want to consider restricting yourself to a disemvowelled alphabet if these strings are going to be visible to users at all.
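For concreteness, here is a minimal Ruby sketch of the encode step walked through above. It hard-codes the 3-character LCG (a = 7541, c = 12127, m = 26^3); other lengths would need their own Hull–Dobell-compliant parameters:

ALPHABET = ('a'..'z').to_a

def encode(n)
  m, k = 26, 1
  while n >= m        # strip off the counts of all shorter strings
    n -= m
    m *= 26
    k += 1
  end
  raise 'only the 3-character LCG is wired up in this sketch' unless k == 3
  n_enc = (n * 7541 + 12127) % m
  # write n_enc as k base-26 digits, most significant first
  Array.new(k) { |i| ALPHABET[(n_enc / 26**(k - 1 - i)) % 26] }.join
end

puts encode(12345) # => "dkg"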

Algorithm to find

The logic behind this was that (n-2)*3^(n-3) counts lots of repetitions, like (abc)***(abc) when abc appears both at the start and at the end; those repeated strings total 3^4. Similarly, as abc moves ahead, the number of sets of (abc) increases.
You can use dynamic programming to compute the number of forbidden strings.
The algorithm follows from the observation below:
"Legal string of size n is the legal string of size n - 1 extended with one letter, so that the last three letters of the resulting string are not all distinct."
So if we had all the legal strings of size n-1 we could try extending them to obtain the legal strings of size n.
To check whether the extended string is legal we just need to know the last two letters of the previous string (of size n-1).
In the algorithm we will compute two arrays, where
different[i] # number of legal strings of length i in which last two letters are different
same[i] # number of legal strings of length i in which last two letters are the same
It can be easily proved that:
different[i+1] = different[i] + 2*same[i]
same[i+1] = different[i] + same[i]
It is the consequence of the following facts:
Any 'same' string of size i+1 can be obtained either from a 'same' string of size i (think BB -> BBB) or from a 'different' string (think AB -> ABB), and these are the only options.
Any 'different' string of size i+1 can be obtained either from a 'different' string of size i in one way (think AB -> ABA) or from a 'same' string in two ways (AA -> AAB or AA -> AAC).
Having observed all this it is easy to write an algorithm that computes the result in O(n) time.
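For illustration, a minimal Ruby sketch of that O(n) loop, counting legal strings over a 3-letter alphabet per the recurrences above:

def count_legal(n)
  return 3 if n == 1
  different, same = 6, 3  # all 9 two-letter strings are legal: 6 distinct pairs, 3 doubles
  (n - 2).times do
    different, same = different + 2 * same, different + same
  end
  different + same
end

puts count_legal(3) # => 21, i.e. 3^3 = 27 strings minus the 6 permutations of all-distinct letters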
I suggest you use recursion, and look at two numbers:
F(n), the number of legal strings of length n whose last two symbols are the same.
G(n), the number of legal strings of length n whose last two symbols are different.
Is that enough to go on?
Get the ASCII values of the last three letters and add the squares of those values. If the sum equals a particular number, then the tail is forbidden. For the alphabet A, B, C this works, since only a permutation of all three letters produces that sum.
To do this:
1) Find out how to get characters from your string.
2) Find out how to get the ASCII value of a character.
3) Multiply each ASCII value by itself.
4) Do that for the last three letters each time and add their values.
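A small Ruby sketch of that check, assuming the alphabet {A, B, C} (the method name is made up for illustration):

FORBIDDEN_SUM = 'A'.ord**2 + 'B'.ord**2 + 'C'.ord**2  # 13070, reached only when all three letters differ

def forbidden_tail?(s)
  s[-3, 3].each_char.sum { |ch| ch.ord**2 } == FORBIDDEN_SUM
end

p forbidden_tail?('AABCA') # => true  (last three letters all distinct)
p forbidden_tail?('AABBA') # => false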

How to compute one's complement using Ruby's bitwise operators?

What I want:
assert_equal 6, ones_complement(9) # 1001 => 0110
assert_equal 0, ones_complement(15) # 1111 => 0000
assert_equal 2, ones_complement(1) # 01 => 10
The size of the input isn't fixed at, say, 4 or 8 bits; rather, it's a binary stream.
What I see:
v = "1001".to_i(2) => 9
There's a bit flipping operator ~
(~v).to_s(2) => "-1010"
sprintf("%b", ~v) => "..10110"
~v => -10
I think it's got something to do with one bit being used to store the sign or something... can someone explain this output? And how do I get a one's complement without resorting to string manipulation, like cutting the last n chars from the sprintf output to get "0110", or replacing 0 with 1 and vice versa?
Ruby just stores a (signed) number. The internal representation of this number is not relevant: it might be a Fixnum, Bignum or something else. Therefore the number of bits in a number is also undefined: it is just a number, after all. This is contrary to, for example, C, where an int will probably be 32 bits (fixed).
So what does the ~ operator do then? Well, just something like:
class Numeric
  def ~
    -self - 1
  end
end
...since that's what '~' represents when looking at 2's complement numbers.
So what is missing from your input statement is the number of bits you want to switch: a 32-bit ~ is different from the generic ~ Ruby provides.
Now if you just want to bit-flip n-bits you can do something like:
class Numeric
  def ones_complement(bits)
    self ^ ((1 << bits) - 1)
  end
end
...but you do have to specify the number of bits to flip. And this won't affect the sign flag, since that one is outside your reach with XOR :)
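Checked against the test cases from the question:

9.ones_complement(4)  # => 6  (0b1001 -> 0b0110)
15.ones_complement(4) # => 0  (0b1111 -> 0b0000)
1.ones_complement(2)  # => 2  (0b01   -> 0b10)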
It sounds like you only want to flip four bits (the length of your input), so you probably want to XOR with 0b1111.
See this question for why.
One problem with your method is that your expected answer is only true if you only flip the four significant bits: 1001 -> 0110.
But the number is stored with leading zeros, and the ~ operator flips all the leading bits too: 00001001 -> 11110110. Then the leading 1 is interpreted as the negative sign.
You really need to specify what the function is supposed to do with numbers like 0b101 and 0b11011 before you can decide how to implement it. If you only ever want to flip 4 bits you can do v^0b1111, as suggested in another answer. But if you want to flip all significant bits, it gets more complicated.
edit
Here's one way to flip all the significant bits:
def maskbits n
  b = 1
  prev = n
  mask = prev | (prev >> 1)
  while mask != prev
    prev = mask
    mask |= (mask >> (b *= 2))
  end
  mask
end

def ones_complement n
  n ^ maskbits(n)
end
This gives
p ones_complement(9).to_s(2) #>>"110"
p ones_complement(15).to_s(2) #>>"0"
p ones_complement(1).to_s(2) #>>"0"
This does not give your desired output for ones_complement(1), because it treats 1 as "1", not "01". I don't know how the function could infer how many leading zeros you want without taking the width as an argument.
If you're working with strings you could do:
s = "0110"
s.gsub("\d") {|bit| bit=="1"?"0":"1"}
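A shorter equivalent uses String#tr to swap the two digit characters in one pass:

s.tr('01', '10') # => "1001"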
If you're working with numbers, you'll have to define the number of significant bits because:
0110 = 6; 1001 = 9;
110 = 6; 001 = 1;
Even ignoring the sign, you'll probably have to handle this.
What you are doing (using the ~ operator) is indeed a one's complement. You are getting those unexpected values because of the way Ruby interprets the number.
What you actually need to do will depend on what you are using this for. That is to say, why do you need a 1's complement?
Remember that you are already getting the one's complement with ~ if you pass in a Fixnum: the number of bits representing the number is a fixed quantity in the interpreter, so there are leading 0s in front of the binary representation of the number 9 (binary 1001). You can find this number of bits by examining the size of any Fixnum (the answer is returned in bytes):
1.size #=> 4
2147483647.size #=> 4
~ is also defined over Bignum. In this case it behaves as if all of the bits specified in the Bignum were inverted, preceded by an infinite string of 1s. You can conceivably shove your bitstream into a Bignum and invert the whole thing. You will, however, need to know the size of the bitstream prior to inversion to get a useful result out afterwards.
To answer the question as you pose it right off the bat: you can find the largest power of 2 not exceeding your input, double it, subtract 1, then XOR the result with your input, and always get a one's complement of just the significant bits in your input number.
def sig_ones_complement(num)
  significant_bits = num.to_s(2).length
  next_smallest_pow_2 = 2**(significant_bits - 1)
  xor_mask = (2 * next_smallest_pow_2) - 1
  num ^ xor_mask
end
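Against the question's test cases:

p sig_ones_complement(9)  # => 6
p sig_ones_complement(15) # => 0
p sig_ones_complement(1)  # => 0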

Number base conversion as a stream operation

Is there a way, in constant working space, to do arbitrary-size and arbitrary-base conversions? That is, to convert a sequence of n numbers in the range [1,m] to a sequence of ceiling(n*log(m)/log(p)) numbers in the range [1,p] using a 1-to-1 mapping that (preferably but not necessarily) preserves lexicographical order and gives sequential results?
I'm particularly interested in solutions that are viable as a pipe function, i.e. able to handle larger datasets than can be stored in RAM.
I have found a number of solutions that require working space proportional to the size of the input, but none yet that can get away with constant working space.
Does dropping the sequential constraint make any difference? That is, allowing lexicographically sequential inputs to produce non-lexicographically-sequential outputs:
F(1,2,6,4,3,7,8) -> (5,6,3,2,1,3,5,2,4,3)
F(1,2,6,4,3,7,9) -> (5,6,3,2,1,3,5,2,4,5)
some thoughts:
might this work?
streamBase_n -> convert(n, lcm(n,p)) -> convert(lcm(n,p), p) -> streamBase_p
(where lcm is least common multiple)
I don't think it's possible in the general case. If m is a power of p (or vice versa), or if they're both powers of a common base, you can do it, since each group of log_m(p) digits is then independent. However, in the general case, suppose you're converting the number a_1 a_2 a_3 ... a_n. The equivalent number in base p is
sum(a_i * m^(i-1) for i in 1..n)
If we've processed the first i digits, then we have the i-th partial sum. To compute the (i+1)-th partial sum, we need to add a_(i+1) * m^i. In the general case, this number will have non-zero digits in most places, so we'll need to modify all of the digits we've processed so far. In other words, we'll have to process all of the input digits before we know what the final output digits will be.
In the special case where m and p are both powers of a common base, or equivalently if log_m(p) is a rational number, m^i will only have a few non-zero digits in base p near the front, so we can safely output most of the digits we've computed so far.
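A quick Ruby illustration of that point: read decimal digits one at a time and look at the base-3 representation of each partial value. Even the most significant output digit can change with every new input digit:

partial = 0
[1, 9, 9].each do |digit|
  partial = partial * 10 + digit
  puts format('%5d -> %s', partial, partial.to_s(3))
end
# =>     1 -> 1
# =>    19 -> 201
# =>   199 -> 21101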
I think there is a way of doing radix conversion in a stream-oriented fashion in lexicographic order. However, what I've come up with isn't sufficient for actually doing it, and it has a couple of assumptions:
The lengths of the positional numbers are already known.
The numbers described are integers. I've not considered what happens with the maths and negative indices.
We have a sequence of values a of length p, where each value is in the range [0,m-1]. We want a sequence of values b of length q, each in the range [0,n-1]. We can work out the k-th digit of our output sequence b from a as follows:
b_k = floor[ sum(a_i * m^i for i in 0 to p-1) / n^k ] mod n
Let's rearrange that sum into two parts, splitting it at an arbitrary point z:
b_k = floor[ ( sum(a_i * m^i for i in z to p-1) + sum(a_i * m^i for i in 0 to z-1) ) / n^k ] mod n
Suppose that we don't yet know the values of a between [0,z-1] and can't compute the second sum term. We're left with having to deal with ranges, but that still gives us information about b_k.
The minimum value b_k can be is:
b_k >= floor[ sum(a_i * m^i for i in z to p-1) / n^k ] mod n
and the maximum value b_k can be is:
b_k <= floor[ ( sum(a_i * m^i for i in z to p-1) + m^z - 1 ) / n^k ] mod n
We should be able to run a process like this:
1. Initialise z to p. We will count down from p as we receive each value of a.
2. Initialise k to the index of the most significant value in b. If my brain is still working, that's ceil[ log_n(m^p) ].
3. Read a value of a. Decrement z.
4. Compute the min and max values for b_k.
5. If the min and max are the same, output b_k and decrement k. Go to step 4. (It may be possible that we already have enough values for several consecutive values of b_k.)
6. If z != 0 then we expect more values of a. Go to step 3.
Hopefully, at this point we're done.
I've not considered how to efficiently compute the range values yet, but I'm reasonably confident that computing the sums from the incoming values of a can be done much more reasonably than storing all of a. Without doing the maths, though, I won't make any hard claims!
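Here is an exploratory Ruby sketch of the interval idea. It tracks a [lower, upper] range for the final value rather than using the exact formulas above, and emits an output digit as soon as the range pins it down:

def stream_convert(digits, m, n)
  p_len = digits.length
  k = (Math.log(m**p_len) / Math.log(n)).ceil - 1  # index of the most significant output digit
  lower = 0
  z = p_len
  out = []
  digits.each do |a|            # most significant input digit arrives first
    z -= 1
    lower += a * m**z
    upper = lower + m**z - 1    # the unread digits can add at most m**z - 1
    while k >= 0 && lower / n**k == upper / n**k
      out << (lower / n**k) % n # this output digit can no longer change
      k -= 1
    end
  end
  out
end

p stream_convert([1, 2, 3], 10, 2) # => [0, 0, 0, 1, 1, 1, 1, 0, 1, 1], the 10-bit binary for 123

Note that lower still grows with the input, so this sketch demonstrates the digit ordering, not constant working space.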
Yes, it is possible.
For every I character(s) you read in, you will write out O character(s), based on Ceiling(Length * log(In) / log(Out)).

Allocate enough space
Set x to 1
Loop over digits from end to beginning   # Horner's method
    Set a to x * digit
    Set t to O - 1
    Loop while a > 0 and t >= 0
        Set a to a + out digit at position t
        Set out digit at position t to a mod to-base
        Set a to a / to-base
        Decrement t
    Set x to x * from-base
Return converted digit(s)
Thus, for base 16 to base 2 (which is easy), using "192FE" we read '1' and convert it, then repeat on '9', then '2' and so on, giving us '0001', '1001', '0010', '1111', and '1110'.
Note that for bases that are not common powers, such as base 17 to base 2, you would read 1 character and write 5.
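A rough Ruby rendering of that pseudocode, with the digits fed most significant first and accumulated into a fixed-size output buffer:

def convert_digits(digits, from_base, to_base)
  out_len = (digits.length * Math.log(from_base) / Math.log(to_base)).ceil
  out = Array.new(out_len, 0)
  digits.each do |d|
    carry = d                    # out = out * from_base + d, digit by digit
    (out_len - 1).downto(0) do |t|
      carry += out[t] * from_base
      out[t] = carry % to_base
      carry /= to_base
    end
  end
  out
end

p convert_digits([1, 9, 2, 15, 14], 16, 2).join # => "00011001001011111110", i.e. 0x192FE with leading zeros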

How many digits will there be after converting from one numeral system to another?

The main question: How many digits?
Let me explain. I have a number in the binary system: 11000000, which is 192 in decimal.
After converting to decimal, how many digits will it have? In my example, it's 3 digits. That case isn't a problem: I've searched the internet and found one algorithm for the integral part and one for the fractional part. I don't quite understand them, but (I think) they work.
When converting from binary to octal, it's easier: each 3 bits give you 1 octal digit. Same for hex: each 4 bits = 1 hex digit.
But I'm very curious: what do I do if I have a number in numeral system P and want to convert it to numeral system Q? I know how to do it (I think I do :)), but first of all I want to know how many digits it will take in system Q (you know, I must preallocate space).
Writing n in base b takes floor(log base b (n)) + 1 digits.
The ratio you noticed (binary/octal) is log base 2 (n) / log base 8 (n) = 3.
(From memory, will it stick?)
There was an error in my previous answer: see the comment by Ben Schwehn.
Sorry for the confusion; I found and explain below the error I made in my previous answer.
Please use the answer provided by Paul Tomblin. (Rewritten here to use P, Q and n.)
Y = ln(P^n) / ln(Q)
Y = n * ln(P) / ln(Q)
So Y (rounded up) is the number of characters you need in system Q to express the highest number you can encode in n characters in system P.
I have no answer (one that wouldn't already convert the number, taking up that much space in a temporary variable) for getting the bare minimum for a given number: 1000 (bin) = 8 (dec) needs only one decimal digit, yet you would reserve 2 decimal positions using this formula.
If temporary memory usage isn't a problem, you might cheat and use (Python):
len(str(int(otherBaseStr,P)))
This will give you the number of decimals needed to convert a number in base P, cast as a string (otherBaseStr), into decimals.
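The Ruby equivalent of that cheat, for reference (it likewise builds the full decimal string in memory):

other_base_str = '11000000'
other_base_str.to_i(2).to_s.length # => 3, the decimal digit count of 192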
Old WRONG answer:
If you have a number in P numeral system of length n
Then you can calculate the highest number that is possible in n characters:
P^(n-1)
To express this highest number in number system Q you need to use logarithms (because they are the inverse to exponentiation):
log(P^(n-1)) / log(Q)
(n-1)*log(P) / log(Q)
For example
11000000 in binary is 8 characters.
To get it in Decimal you would need:
(8-1)*log(2) / log(10) = 2.1 digits (round up to 3)
Reason it was wrong:
The highest number that is possible in n characters is
(P^n) - 1
not
P^(n-1)
If you have a number that's X digits long in base B, then the maximum value that can be represented is B^X - 1. So if you want to know how many digits it might take in base C, you have to find the number Y such that C^Y - 1 is at least as big as B^X - 1. The way to do that is to take the logarithm base C of B^X - 1. And since the logarithm of a number in base C is the same as the natural log (ln) of that number divided by the natural log of C, that becomes:
Y = ln((B^X)-1) / ln(C) + 1
and since ln(B^X) is X * ln(B), and that's probably faster to calculate than ln(B^X-1) and close enough to the right answer, rewrite that as
Y = X * ln(B) / ln(C) + 1
Convert that to your favourite language. Because we dropped the "-1", we might end up with one digit more than needed in some cases. But even better, you can pre-calculate ln(B)/ln(C) once and just multiply it by new values of X as the length of the number you are trying to convert changes.
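A small Ruby sketch of that pre-calculation, assuming conversion from hex (B = 16) to binary (C = 2):

RATIO = Math.log(16) / Math.log(2) # computed once: 4.0

def digits_needed(x)
  (x * RATIO).to_i + 1
end

puts digits_needed(5) # => 21; five hex digits need at most 20 bits, so the dropped "-1" costs one extra digit here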
Calculating the number of digit can be done using the formulas given by the other answers, however, it might actually be faster to allocate a buffer of maximum size first and then return the relevant part of that buffer instead of calculating a logarithm.
Note that the worst case for the buffer size happens when you convert to binary, which gives you a buffer size of 32 characters for 32-bit integers.
Converting a number to an arbitrary base could be done using the C# function below (The code would look very similar in other languages like C or Java):
public static string IntToString(int value, char[] baseChars)
{
    // 32 is the worst-case buffer size for base 2 and int.MaxValue
    int i = 32;
    char[] buffer = new char[i];
    int targetBase = baseChars.Length;

    do
    {
        buffer[--i] = baseChars[value % targetBase];
        value = value / targetBase;
    }
    while (value > 0);

    char[] result = new char[32 - i];
    Array.Copy(buffer, i, result, 0, 32 - i);

    return new string(result);
}
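For comparison, a rough Ruby version of the same routine; since Ruby arrays grow on demand, the fixed 32-slot buffer isn't strictly needed:

def int_to_string(value, base_chars)
  target_base = base_chars.length
  buffer = []
  loop do
    buffer.unshift(base_chars[value % target_base])
    value /= target_base
    break if value.zero?
  end
  buffer.join
end

puts int_to_string(192, ('0'..'9').to_a) # => "192"
puts int_to_string(192, %w[0 1])         # => "11000000"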
The keyword here is "logarithm", here are some suggestive links:
http://www.adug.org.au/MathsCorner/MathsCornerLogs2.htm
http://staff.spd.dcu.ie/johnbcos/download/Fermat%20material/Fermat_Record_Number/HOW_MANY.html
Take the logarithm base Q of the number; round it down to the nearest integer and add 1 to get the digit count in system Q.
The logarithm base P can be computed using your favorite base (10 or e): log_P(x) = log_10(x)/log_10(P)
You need to compute the length of the fractional part separately.
For binary to decimal, there are as many decimal digits as there are bits. For example, binary 0.11001101001001 is decimal 0.80133056640625, both 14 digits after the radix point.
For decimal to binary, there are two cases. If the decimal fraction is dyadic, then there are as many bits as decimal digits (same as for binary to decimal above). If the fraction is not dyadic, then the number of bits is infinite.
(You can use my decimal/binary converter to experiment with this.)
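To check the binary-to-decimal example above in Ruby, using Rational:

bits = '11001101001001'
r = Rational(bits.to_i(2), 2**bits.length)
puts r      # => 13129/16384
puts r.to_f # => 0.80133056640625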
