Why do numeric string comparisons give unexpected results? - ruby

'10:' < '1:'
# => true
Can someone explain to me why the result in the above example is true? If I just compare '1:' and '2:', I get the expected result:
'1:' < '2:'
# => true

Strings are compared character by character.
When you compare '1:' with '2:', the comparison begins with 1 vs 2, and it stops there with the expected result.
When you compare '10:' with '1:', the comparison begins with 1 vs 1. Since that is a tie, it moves on to the next pair, which is 0 vs :, and stops there with the result you found surprising (given your expectation that the integers within the strings would be compared): 0 sorts before :, so '10:' is the smaller string.
To get the comparison you expect, use to_i to convert both operands to integers.
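As a minimal sketch of the to_i suggestion, using the two strings from the question (the variable names a and b are only for illustration):

```ruby
a = '10:'
b = '1:'

# String comparison goes character by character, so '0' vs ':' decides.
puts a < b            # true

# String#to_i parses the leading digits and ignores the rest ("10:" -> 10).
puts a.to_i           # 10
puts a.to_i < b.to_i  # false
```

Note that to_i returns 0 for a string with no leading digits, so this only helps when every string really starts with a number.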

It is a character-by-character comparison of ASCII codes.
'10:' < '1:' compares 49 with 49 (a tie), then 48 with 58 (48 < 58)
#=> true
'1:' < '2:' compares 49 with 50 (49 < 50)
#=> true
The check runs left to right and stops at the first pair of characters that differ; that pair alone decides the result.
Note: This is just my observation over various example patterns.

The first characters of your two strings are the same. And as Dave said in the comments, the second character of the first string, '0', is less than ':', so the first string is less than the second.

Because the ASCII code for 0 is 48, which is smaller than the ASCII code for :, which is 58.
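The codes mentioned above can be checked directly. This is just an illustrative sketch using the strings from the question:

```ruby
# Code points of the two characters that decide the comparison.
puts '0'.ord  # 48
puts ':'.ord  # 58

# Byte sequences of the two strings from the question.
p '10:'.bytes  # [49, 48, 58]
p '1:'.bytes   # [49, 58]

# Arrays also compare element by element, so both results agree.
p('10:'.bytes <=> '1:'.bytes)  # -1
p('10:' <=> '1:')              # -1
```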

Related

Writing a function that returns true if given string has exactly 6 characters

I am trying to write a function that returns true or false if a given string has exactly 6 consecutive characters with the same value. If the string has more or less than 6, it will return false:
I am not allowed to use lists, sets, or imported packages. I am restricted to while loops, for loops, and basic mathematical operations.
Two example runs are shown below:
Enter a string: 367777776
True
Enter a string: 3677777777776
False
Note that although I entered numbers, it is actually a string within the function argument for example: consecutive('3777776')
I tried to convert the string into an ASCII table and then try and filter out the numbers there. However, I
def consecutive(x):
    storage= ' '
    acc=0
    count=0
    for s in x:
        storage+= str(ord(s)) + ' '
        acc+=ord(s)
        if acc == acc:
            count+=1
    for s in x-1:
        return count
My intention is to compare the previous character's ASCII code to the current character's ASCII code in the string. If the ASCII codes don't match, I will add an accumulator for it. The accumulator will hold the number of duplicates. From there, I will implement an if-else statement to see if it is greater or less than 6. However, I have a hard time translating my thoughts into Python code.
Can anyone assist me?

That's a pretty good start!
A few comments:
Variables storage and acc play the same role, and are a little more complicated than they have to be. All you want to know when you arrive at character s is whether or not s is identical to the previous character. So, you only need to store the previously seen character.
Condition acc == acc is always going to be True. I think you meant acc == s?
When you encounter an identical character, you correctly increase the count with count += 1. However, when we change characters, you should reset the count.
With these comments in mind, I fixed your code, then blanked out a few parts for you to fill. I've also renamed storage and acc to previous_char which I think is more explicit.
def has_6_consecutive(x):
    previous_char = None
    count = 0
    for s in x:
        if s == previous_char:
            ???
        elif count == 6:
            ???
        else:
            ???
        previous_char = ???
    ???

You could use recursion. Loop over all the characters and for each one check to see if the next 6 are identical. If so, return true. If you get to the end of the string (or even within 6 characters of the end), return false.
For more info on recursion, check this out: https://www.programiz.com/python-programming/recursion

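Would something like this be allowed?
def consecF(n):
    consec = 0
    prev = None
    for i in n:
        if i == prev:
            consec += 1
        else:
            # a run just ended: succeed only if it had exactly 6 characters
            if consec == 6:
                return True
            consec = 1
        prev = i
    # the last run is not checked inside the loop
    return consec == 6
n = "12111123333221"
print(consecF(n))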

You can try a two pointer approach, where the left pointer is fixed at the first instance of some digit and the right one is shifted as long as the digit is seen.
def consecutive(x):
    left = 0
    while left != len(x):
        right = left
        while right < len(x) and x[right] == x[left]:
            right += 1
        length = (right - 1) - left + 1  # from left to right - 1 inclusive, x[left] repeated
        if length == 6:  # found desired length
            return True
        left = right
    return False  # no segment found

tests = [
    '3677777777776',
    '367777776'
]
for test in tests:
    print(f"{test}: {consecutive(test)}")
Output
3677777777776: False
367777776: True

You should store the current sequence of repeated chars.
def consecutive(x):
    sequencechar = ' '
    repetitions = 0
    for ch in x:
        if ch != sequencechar:
            if repetitions == 6:
                break
            sequencechar = ch
            repetitions = 1
        else:
            repetitions += 1
    return repetitions == 6
I would rather not have given the entire solution, but this is a simple problem. Still, a few points need care.
The current sequence is stored; when a sequence ends and a new one starts, the loop breaks if the sequence that just ended had the right length.
After the for loop ends normally, the last sequence is still checked by the return statement (that check is not done inside the loop).

Generating random number of length 6 with SecureRandom in Ruby

I tried SecureRandom.random_number(9**6) but it sometimes returns 5 and sometimes 6 digits. I'd want it to be a length of 6 consistently. I would also prefer it in a format like SecureRandom.random_number(9**6), without using syntax like 6.times.map, so that it's easier to stub in my controller test.

You can do it with math:
require 'securerandom'
(SecureRandom.random_number(9e5) + 1e5).to_i
Then verify:
100000.times.map do
  (SecureRandom.random_number(9e5) + 1e5).to_i
end.map { |v| v.to_s.length }.uniq
# => [6]
This produces values in the range 100000..999999:
10000000.times.map do
  (SecureRandom.random_number(9e5) + 1e5).to_i
end.minmax
# => [100000, 999999]
If you need this in a more concise format, just roll it into a method:
def six_digit_rand
  (SecureRandom.random_number(9e5) + 1e5).to_i
end

To generate a random, 6-digit string:
# This generates a 6-digit string, where the
# minimum possible value is "000000", and the
# maximum possible value is "999999"
SecureRandom.random_number(10**6).to_s.rjust(6, '0')
Here's more detail of what's happening, shown by breaking the single line into multiple lines with explaining variables:
# Calculate the upper bound for the random number generator
# upper_bound = 1,000,000
upper_bound = 10**6
# n will be an integer with a minimum possible value of 0,
# and a maximum possible value of 999,999
n = SecureRandom.random_number(upper_bound)
# Convert the integer n to a string
# unpadded_str will be "0" if n == 0
# unpadded_str will be "999999" if n == 999999
unpadded_str = n.to_s
# Pad the string with leading zeroes if it is less than
# 6 digits long.
# "0" would be padded to "000000"
# "123" would be padded to "000123"
# "999999" would not be padded, and remains unchanged as "999999"
padded_str = unpadded_str.rjust(6, '0')

See the Ruby SecureRandom docs; there are a lot of useful tricks there.
Specific to this question, I would say: (SecureRandom.random_number * 1000000).to_i
Docs: random_number(n=0)
If 0 is given or an argument is not given, ::random_number returns a float: 0.0 <= ::random_number < 1.0.
Multiplying by 1000000 shifts six decimal places into the integer part, and .to_i truncates the rest. Note that the result can be below 100000, so it may have fewer than six digits unless you pad it with leading zeros.
If letters are okay, I prefer .hex:
SecureRandom.hex(3) #=> "e15b05"
Docs:
hex(n=nil)
::hex generates a random hexadecimal string.
The argument n specifies the length, in bytes, of the random number to
be generated. The length of the resulting hexadecimal string is twice
n.
If n is not specified or is nil, 16 is assumed. It may be larger in
future.
The result may contain 0-9 and a-f.
Other options:
SecureRandom.uuid #=> "3f780c86-6897-457e-9d0b-ef3963fbc0a8"
SecureRandom.urlsafe_base64 #=> "UZLdOkzop70Ddx-IJR0ABg"
For Rails apps that create a barcode or uid with an object, you can do something like this in the object's model file:
before_create :generate_barcode

def generate_barcode
  # keep a barcode that was set explicitly
  return if barcode.present?
  # regenerate until the value is not already taken
  loop do
    self.barcode = SecureRandom.hex.upcase
    break unless self.class.exists?(barcode: barcode)
  end
end

SecureRandom.random_number(n) gives a random value from 0 up to, but not including, n. You can get a fixed length using the rand function with a range:
2.3.1 :025 > rand(10**5..10**6-1)
=> 742840
rand(a..b) gives a random number between a and b, inclusive. Here, you always get a 6-digit random number between 10^5 and 10^6-1. Note that Kernel#rand is not cryptographically secure.
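If the value has to come from SecureRandom, the same range can be reached with integer arithmetic only. This is a minimal sketch; the method name secure_six_digit is a hypothetical helper, not part of SecureRandom:

```ruby
require 'securerandom'

# Hypothetical helper name, chosen for this example only.
def secure_six_digit
  # random_number(900_000) is an integer in 0..899_999,
  # so adding 100_000 gives a value in 100_000..999_999.
  SecureRandom.random_number(900_000) + 100_000
end

puts secure_six_digit
```

Because it is a single call plus an addition, SecureRandom.random_number stays easy to stub in a test.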

Inequality comparison between strings in Oracle

SELECT LAST_NAME FROM Employees
WHERE last_name < 'King';
In the book 'SQL Fundamentals I Exam Guide' it says that for the comparison LAST_NAME < 'King' the following conversion occurs, assuming a US7ASCII database character set with AMERICAN NLS settings:
K + i + n + g = 75 + 105 + 110 + 103 = 393.
Then for each row in the EMPLOYEES table, the LAST_NAME column is similarly converted to a numeric value. If this value is less than 393, then the row is selected.
But when I execute the SELECT command above in SQL*Plus, it returns rows (for example 'Greenberg', 'Bernestein') that do not follow the rule mentioned in the book. Are there any settings that I need to change to obtain rows that satisfy that rule?

This rule is certainly not valid. If it was, then you could swap the characters and you would still get the same result 393. But character ordering matters when comparing words.
To get a value appropriate for comparison you would have to calculate like this:
K + i + n + g = ((75 × 256 + 105) × 256 + 110) × 256 + 103
But you would exceed the valid range of numeric values for long words. For 7-bit ASCII codes (strictly in the range 0 ... 127) you could also multiply by 128 instead of 256.
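To make the difference between the two calculations concrete, here is a small Ruby sketch. The helper names char_sum and positional_weight are made up for illustration, and the weighted value is only meaningful when the strings being compared have the same length:

```ruby
# Hypothetical helper: the plain sum described in the book.
def char_sum(s)
  s.bytes.sum
end

# Hypothetical helper: treat the bytes as digits of a base-256 number,
# most significant first, as in the formula above.
def positional_weight(s)
  s.bytes.inject(0) { |acc, byte| acc * 256 + byte }
end

puts char_sum('King')           # 393
puts char_sum('gniK')           # 393, same letters give the same sum
puts positional_weight('King')  # 1265200743
puts positional_weight('King') < positional_weight('gniK')  # true, like 'King' < 'gniK'
```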
--
In reality, the values are compared one by one, i.e. (in pseudo code):
valueOf(last_name[0]) < 75 OR
valueOf(last_name[1]) < 105 OR
valueOf(last_name[2]) < 110 OR
valueOf(last_name[3]) < 103
... where the comparisons stop at the first inequality encountered. If the end of one of the words is reached first, the lengths of the words are compared instead.
In other words, the two words are compared character by character until two different characters are encountered. The comparison of these two characters then yields the final result.
Take 'Kelvin' < 'King' as an example:
'K' < 'K' ==> false
'e' < 'i' ==> true
final result = true
Other example 'King' < 'Kelvin' (words are swapped):
'K' < 'K' ==> false
'i' < 'e' ==> false, the characters are not equal, therefore stop
final result = false
Other example 'be' < 'begin':
'b' < 'b' ==> false
'e' < 'e' ==> false
end of first word reached, length('be') < length('begin') ==> true
final result = true
The actual comparison of two characters is performed by comparing their numeric values, as you have mentioned already.
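The procedure walked through above can be sketched in Ruby. The method name less_than? is hypothetical, and the sketch compares raw byte values only; it does not attempt to model linguistic sort settings in the database:

```ruby
# Hypothetical helper: returns true when a sorts before b.
def less_than?(a, b)
  shared = [a.bytesize, b.bytesize].min
  shared.times do |i|
    x = a.getbyte(i)
    y = b.getbyte(i)
    # The first pair of differing bytes decides the result.
    return x < y if x != y
  end
  # Every shared position tied, so the shorter string sorts first.
  a.bytesize < b.bytesize
end

puts less_than?('Kelvin', 'King')  # true
puts less_than?('King', 'Kelvin')  # false
puts less_than?('be', 'begin')     # true
```

For these inputs the results agree with Ruby's built-in String#<.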

If that is actually what the book says, the book is wildly and frighteningly incorrect. If we're talking about the Oracle Press book, I would strongly suspect that you're misreading the explanation because I am hard-pressed to imagine how that mistake could make it through without getting caught by the author, the editor, or a reviewer.
To compare two strings, you do exactly the same thing that you do when you're putting strings in alphabetical order by hand. The string "B" comes after the string "All My Data" and before the string "Changes Constantly". You take the first character of each string, look at its decimal representation ('A' is 65, 'B' is 66, and 'C' is 67), and order based on that. If there are ties, say "All Data" and "All Indexes", you move on to the next character and keep comparing until you can break the tie: 'D' is 68, which is less than 'I', which is 73, so "All Data" < "All Indexes".

Comparing numerical strings and formatting integers with leading zeros in Ruby

How do I use the exact number 05798300 in Ruby?
When I enter:
2.0.0p247 :031 > 05798300
SyntaxError: (irb):31: Invalid octal digit
or
2.0.0p247 :001 > 04704110
=> 1280072
I need to check whether the number 04704110 is between 0100000000 and 09000000.

If you need to keep leading zeros, store your postal codes as strings and you can compare them as such:
test = '04704110'
lower = '01000000' #assuming eight digits
upper = '09000000'
p lower < test && test < upper
#=> true
Otherwise, compare them as integers but format them when you print them, adding leading zeros:
test = 4704110
p "%08d" % test
#=> "04704110"
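Both options can be combined in one short sketch. It assumes, as the answer does, that every code is eight digits wide; the variable names are only for illustration:

```ruby
test = '04704110'

# Equal-width, zero-padded digit strings sort the same way as their integer values.
puts test.between?('01000000', '09000000')  # true

# String#to_i always parses as decimal, so the leading zero is harmless here.
n = test.to_i
puts n                  # 4704110
puts format('%08d', n)  # 04704110

# Pad a shorter string to the fixed width before comparing it as a string.
puts '4704110'.rjust(8, '0') == test  # true
```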

Anomalous behavior while comparing a unicode character to a unicode character range

For some reason, I'm getting unexpected results in the range comparisons of unicode characters.
To summarize, in my minimized test code, ("\u1000".."\u1200") === "\u1100" is false, where I would expect it to be true -- while the same test against "\u1001" is true as expected. I find this utterly incomprehensible. The results of the < operator are also interesting -- they contradict ===.
The following code is a good minimal illustration:
# encoding: utf-8
require 'pp'
a = "\u1000"
b = "\u1200"
r = (a..b)
x = "\u1001"
y = "\u1100"
pp a, b, r, x, y
puts "a < x = #{a < x}"
puts "b > x = #{b > x}"
puts "a < y = #{a < y}"
puts "b > y = #{b > y}"
puts "r === x = #{r === x}"
puts "r === y = #{r === y}"
I would naively expect that both of the === operations would produce "true" here. However, the actual output of running this program is:
ruby 1.9.3p125 (2012-02-16 revision 34643) [x86_64-darwin11.3.0]
"\u1000"
"\u1200"
"\u1000".."\u1200"
"\u1001"
"\u1100"
a < x = true
b > x = true
a < y = true
b > y = true
r === x = true
r === y = false
Could someone enlighten me?
(Note I'm on 1.9.3 on Mac OS X, and I'm explicitly setting the encoding to utf-8.)

ACTION:
I've submitted this behavior as bug #6258 to ruby-lang.

There's something odd about the collation order in that range of characters:
irb(main):081:0> r.to_a.last.ord.to_s(16)
=> "1036"
irb(main):082:0> r.to_a.last.succ.ord.to_s(16)
=> "1000"
irb(main):083:0> r.min.ord.to_s(16)
=> "1000"
irb(main):084:0> r.max.ord.to_s(16)
=> "1200"
The min and max for the range are the expected values from your input, but if we turn the range into an array, the last element is "\u1036" and its successor is "\u1000". Under the covers, Range#=== must be enumerating the String#succ sequence rather than doing a simple bounds check on min and max.
If we look at the source for Range#===, we see it dispatches to Range#include?. The Range#include? source shows special handling for strings: if the answer can be determined by string length alone, or all the involved strings are ASCII, we get simple bounds checks. Otherwise it dispatches to super, which means the #include? gets answered by Enumerable#include?, which enumerates using Range#each, which again has special handling for strings and dispatches to String#upto, which enumerates with String#succ.
String#succ has a bunch of special handling when the string contains is_alpha or is_digit characters (which should not be true for U+1036); otherwise it increments the final char using enc_succ_char. At this point I lose the trail, but presumably this calculates a successor using the encoding and collation information associated with the string.
BTW, as a workaround, you could use a range of integer ordinals and test against ordinals if you only care about single chars, e.g.:
r = (a.ord..b.ord)
r === x.ord
r === y.ord
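A pure bounds check is also available through Range#cover?, which compares against the endpoints instead of enumerating with String#succ. A small sketch with the values from the question, assuming UTF-8 strings (the default for Ruby source files since 2.0):

```ruby
a = "\u1000"
b = "\u1200"
y = "\u1100"

# Range#cover? only checks a <= y && y <= b; nothing is enumerated.
puts((a..b).cover?(y))              # true

# The same check on integer code points, as in the workaround above.
puts((a.ord..b.ord).cover?(y.ord))  # true
```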

Looks like Range doesn't mean what we think it means.
What I think is happening is that you're creating a Range that is trying to include letters, digits, and punctuation. Ruby is unable to do this and is not "understanding" that you essentially want an array of code points.
This is causing the Range#to_a method to fall apart:
("\u1000".."\u1099").to_a.size #=> 55
("\u1100".."\u1199").to_a.size #=> 154
("\u1200".."\u1299").to_a.size #=> 73
The zinger is when you put all three together:
("\u1000".."\u1299").to_a.size  #=> 55
Ruby 1.8.7 works as expected; as Matt points out in the comments, "\u1000" there is just the literal "u1000" because there is no Unicode support.
The String#succ C source code doesn't just return the next code point:
Returns the successor to str. The successor is calculated by
incrementing characters starting from the rightmost alphanumeric (or
the rightmost character if there are no alphanumerics) in the
string. Incrementing a digit always results in another digit, and
incrementing a letter results in another letter of the same case.
Incrementing nonalphanumerics uses the underlying character set's
collating sequence.
Range is doing something different than just next, next, next.
A Range with these characters follows the ASCII sequence:
('8'..'A').to_a
=> ["8", "9", ":", ";", "<", "=", ">", "?", "@", "A"]
But using #succ is totally different:
'8'.succ
=> '9'
'9'.succ
=> '10' # if we were in a Range.to_a, this would be ":"
