Inequality comparison between strings in Oracle - oracle

SELECT LAST_NAME FROM Employees
WHERE last_name < 'King';
In the book 'SQL Fundamentals I Exam Guide' it says that for the comparison LAST_NAME < 'King' the following conversion takes place, assuming a US7ASCII database character set with AMERICAN NLS settings:
K + i + n + g = 75 + 105 + 110 + 103 = 393.
Then, for each row in the EMPLOYEES table, the LAST_NAME column is similarly converted to a numeric value. If this value is less than 393, then the row is selected.
But when I execute the SELECT command above in SQL*Plus, it returns rows (for example 'Greenberg', 'Bernestein') that do not follow the rule mentioned in the book. Are there any settings that I need to change to obtain rows that satisfy that rule?

This rule is certainly not valid. If it was, then you could swap the characters and you would still get the same result 393. But character ordering matters when comparing words.
To get a value appropriate for comparison you would have to calculate like this:
K + i + n + g = ((75 × 256 + 105) × 256 + 110) × 256 + 103
But for long words you would exceed the valid range of numeric values. For 7-bit ASCII codes (strictly in the range 0 ... 127) you could also multiply by 128 instead of 256.
--
In reality, the values are compared one by one, i.e. (in pseudocode):
valueOf(last_name[0]) < 75 OR
valueOf(last_name[1]) < 105 OR
valueOf(last_name[2]) < 110 OR
valueOf(last_name[3]) < 103
... where the comparisons stop at the first inequality encountered. If the end of one of the words is reached first, the lengths of the words are compared instead.
In other words, the characters of the 2 words are compared character by character until two different characters are encountered. Then the comparison of these two characters yields the final result.
Take 'Kelvin' < 'King' as an example:
'K' < 'K' ==> false
'e' < 'i' ==> true
final result = true
Another example, 'King' < 'Kelvin' (words are swapped):
'K' < 'K' ==> false
'i' < 'e' ==> false, the characters are not equal, therefore stop
final result = false
Another example, 'be' < 'begin':
'b' < 'b' ==> false
'e' < 'e' ==> false
end of first word reached, length('be') < length('begin') ==> true
final result = true
The actual comparison of two characters is performed by comparing their numeric values, as you have mentioned already.
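The rule described above can be sketched in a few lines of Python. This is only an illustration of character-by-character comparison, assuming plain ASCII codes; `less_than` and `code_sum` are hypothetical helper names, not anything Oracle exposes, and Oracle's actual ordering depends on its NLS sort settings.

```python
def less_than(a: str, b: str) -> bool:
    """Character-by-character comparison by character code (assumes binary/ASCII ordering)."""
    for ca, cb in zip(a, b):
        if ca != cb:
            # The first differing pair of characters decides the result.
            return ord(ca) < ord(cb)
    # One word is a prefix of the other: the shorter one sorts first.
    return len(a) < len(b)


def code_sum(s: str) -> int:
    """The summing rule quoted from the book, shown only to contrast it."""
    return sum(ord(c) for c in s)


print(less_than("Kelvin", "King"))     # True
print(less_than("King", "Kelvin"))     # False
print(less_than("be", "begin"))        # True
print(less_than("Greenberg", "King"))  # True
print(code_sum("King"), code_sum("Greenberg"))  # 393 913
```

'Greenberg' sorts before 'King' even though its code sum (913) is far above 393, which is why the query returns it.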

If that is actually what the book says, the book is wildly and frighteningly incorrect. If we're talking about the Oracle Press book, I would strongly suspect that you're misreading the explanation because I am hard-pressed to imagine how that mistake could make it through without getting caught by the author, the editor, or a reviewer.
To compare two strings, you do exactly the same thing that you do when you're putting strings in alphabetical order by hand. The string "B" comes after the string "All My Data" and before the string "Changes Constantly". You take the first character of each string, look at its decimal representation ('A' is 65, 'B' is 66, and 'C' is 67), and order based on that. If there is a tie, say "All Data" and "All Indexes", you move on to the next character and keep comparing until you can break the tie: 'D' is 68, which is less than 'I', which is 73, so "All Data" < "All Indexes".

Related

Generating unique non-similar codes with validation

I know there are similar questions so please bear with me.
I wish to generate approximately 50K codes for people to place orders, ideally no longer than 10 characters, which can include letters and digits. They are not discount codes, so I am not worried about people trying to guess codes. What I am worried about is somebody accidentally entering a wrong character (e.g. 1 instead of l, or 0 instead of O) and the system then failing because, by chance, the result is also a valid code.
As the codes are constantly being generated, ideally I don't want a table look-up validation, but a formula (e.g. if it contains an A, the number element should be divisible by 13, or some such).
Select some alphabet (made of digits and letters) of size B such that there are no easy confusions. Assign every symbol a value from 0 to B-1, preferably in random order. Now you can use sequential integers, convert them to base B and assign the symbols accordingly.
For improved safety, you can append one or two checksum symbols for error detection.
With B=34 (ten digits and twenty-four letters, 9ABHC0FVW3YGJKL1N2456XRTS78DMPQEUZ), 50K codes require a code length of only four, since 34^3 = 39,304 is too few and 34^4 = 1,336,336 is more than enough.
If you don't want the generated codes to be consecutive, you can scramble the bits before the change of base.
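As a rough sketch of the base-B idea, the following Python converts sequential integers to fixed-width codes over the 34-symbol alphabet quoted above and back again. The names `encode` and `decode` and the fixed width of four are assumptions made for illustration; checksum symbols and bit scrambling are left out.

```python
ALPHABET = "9ABHC0FVW3YGJKL1N2456XRTS78DMPQEUZ"  # the 34 symbols from the answer
BASE = len(ALPHABET)
VALUE = {ch: i for i, ch in enumerate(ALPHABET)}  # symbol -> digit value


def encode(n: int, width: int = 4) -> str:
    """Convert a non-negative integer to a fixed-width base-34 code."""
    if not 0 <= n < BASE ** width:
        raise ValueError("n does not fit in the requested width")
    symbols = []
    for _ in range(width):
        n, digit = divmod(n, BASE)
        symbols.append(ALPHABET[digit])
    return "".join(reversed(symbols))


def decode(code: str) -> int:
    """Convert a code back to its integer; raises KeyError on an unknown symbol."""
    n = 0
    for ch in code:
        n = n * BASE + VALUE[ch]
    return n


print(encode(0), encode(1), encode(49999))
```

Because the alphabet omits I and O, a mistyped I or O is rejected by `decode` rather than silently mapped to another code.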
Before you start generating random combinations of characters, there are a couple of things you need to bear in mind:
1. Profanity
If your codes include every possible combination of four letters from the alphabet, they will inevitably include every four-letter word. You need to be absolutely sure that you never ask customers to enter anything foul or offensive.
2. Human error
People often make mistakes when entering codes. Confusing similar characters like O and 0 is only part of the problem. Other common mistakes include transposing adjacent characters (e.g., the → teh) and hitting the wrong key on the keyboard (e.g., and → amd).
To avoid these issues, I would recommend that you generate codes from a restricted alphabet that has no possibility of spelling out anything unfortunate, and use the Luhn algorithm or something similar to catch accidental data entry errors.
For example, here's some Python code that generates hexadecimal codes using an alphabet of 16 characters with no vowels. It uses a linear congruential generator step to avoid outputting sequential numbers, and includes a base-16 Luhn checksum to detect input errors. The code2int() function will return −1 if the checksum is incorrect. Otherwise it will return an integer. If this integer is less than your maximum input value (e.g., 50,000), then you can assume the code is correct.
def int2code(n):
    # Generates a 7-character code from an integer value (n > 0)
    alph = 'BCDFGHJKMNPRTWXZ'
    mod = 0xfffffd  # Highest 24-bit prime
    mul = 0xc36572  # Randomly selected multiplier
    add = 0x5d48ca  # Randomly selected addend
    # Convert the input number `n` into a non-sequential 6-digit
    # hexadecimal code by means of a linear congruential generator
    c = "%06x" % ((n * mul + add) % mod)
    # Replace each hex digit with the corresponding character from alph,
    # and generate a base-16 Luhn checksum at the same time
    luhn_sum = 0
    code = ''
    for i in range(6):
        d = int(c[i], 16)
        code += alph[d]
        if i % 2 == 1:
            t = d * 15
            luhn_sum += (t & 0x0f) + (t >> 4)
        else:
            luhn_sum += d
    # Append the checksum
    checksum = (16 - (luhn_sum % 16)) % 16
    code += alph[checksum]
    return code

def code2int(code):
    # Converts a 7-character code back into an integer value
    # Returns -1 if the input is invalid
    alph = 'BCDFGHJKMNPRTWXZ'
    mod = 0xfffffd  # Highest 24-bit prime
    inv = 0x111548  # Modular multiplicative inverse of 0xc36572
    sub = 0xa2b733  # = 0xfffffd - 0x5d48ca
    if len(code) != 7:
        return -1
    # Treating each character as a hex digit, convert the code back into
    # an integer value. Also make sure the Luhn checksum is correct
    luhn_sum = 0
    c = 0
    for i in range(7):
        if code[i] not in alph:
            return -1
        d = alph.index(code[i])
        c = c * 16 + d
        if i % 2 == 1:
            t = d * 15
            luhn_sum += (t & 0x0f) + (t >> 4)
        else:
            luhn_sum += d
    if luhn_sum % 16 != 0:
        return -1
    # Discard the last digit (corresponding to the Luhn checksum), and undo
    # the LCG calculation to retrieve the original input value
    c = (((c >> 4) + sub) * inv) % mod
    return c
# Test
>>> print('\n'.join([int2code(i) for i in range(10)]))
HWGMTPX
DBPXFZF
XGCFRCN
PKKNDJB
JPWXNRK
DXGGCBR
ZCPNMDD
RHBXZKN
KMKGJTZ
FRWNXCH
>>> print(all([code2int(int2code(i)) == i for i in range(50000)]))
True

Separating a number into groups of three

Consider this number: 785462454105. I need an algorithm that separates the number into groups with a maximum length of three digits (starting from the right side). It would look something like:
a = 105
b = 454
c = 462
d = 785
Of course I know I can convert the number to a string, but I want to do this without any conversion. Also, I'm not allowed to use any arrays or any special methods or classes that exist in the programming language (I'm using Java, but as I said, I'm not allowed to use its functions). The only tools I have are loops, conditional clauses, and mathematical, arithmetic and logical operators.
It is also possible to get 454000 or 462000000 out using loops, but how can I get rid of the zeros?
Note that something like 1234 should turn to:
a = 234
b = 1
It's easy to get the last group of 3 digits by taking the remainder of a division by 1000.
785462454105 % 1000 == 105
Then you can get rid of the last 3 digits by dividing by 1000 (integer division):
785462454105 / 1000 == 785462454
Repeat this in a loop until the number becomes zero and you're done.
The only issue left is to print leading zeros:
123045 % 1000 = 45 but we want to print 045.
Usually you'll need a separate inner loop, for example to count the decimal digits (dividing by 10 until the value becomes zero) and then print the missing zeros (their number equals the number of digits you want minus the number of digits in your number).
But this is a simple case, so you can just solve it with a couple of ifs:
long a = 785462454105L; // the L suffix is required for a long literal
while (a > 0) {
    long x = a % 1000; // lowest group of up to three digits
    a /= 1000;         // drop that group
    // Pad with zeros only while higher groups remain (1234 gives 234, then 1)
    if (a > 0 && x < 10) {
        System.out.print("00");
    } else if (a > 0 && x < 100) {
        System.out.print("0");
    }
    System.out.println(x);
}
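The same remainder-and-divide loop can be sketched in Python. This is a minimal illustration using only `%` and `//`; the generator `groups_of_three` is a hypothetical helper chosen so the groups can be inspected, and it yields plain integers, so no leading zeros are involved.

```python
def groups_of_three(n: int):
    """Yield the groups of up to three decimal digits of n, lowest group first."""
    while n > 0:
        yield n % 1000  # the last three digits as a number
        n //= 1000      # drop those digits


for group in groups_of_three(785462454105):
    print(group)  # prints 105, 454, 462, 785 on separate lines

print(list(groups_of_three(1234)))  # [234, 1]
```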

Feasibility of a bit modified version of Rabin Karp algorithm

I am trying to implement a slightly modified version of the Rabin-Karp algorithm. My idea is that if I compute the hash value of the pattern with a weight associated with each letter position, then I don't have to worry about anagrams. I can just take a part of the string, calculate its hash value and compare it with the hash value of the pattern. This is unlike the traditional approach, where the hash values of both the substring and the pattern are calculated and then, on a match, the two are checked to see whether they are actually the same or just anagrams. Here is my code:
string = "AABAACAADAABAABA"
pattern = "AABA"
#string = "gjdoopssdlksddsoopdfkjdfoops"
#pattern = "oops"

#get hash value of the pattern
def gethashp(pattern):
    sum = 0
    #I multiply each letter of the pattern with a weight
    #So for eg CAT will be C*1 + A*2 + T*3 and the resulting
    #value will be unique for the letters CAT and won't match if the
    #letters are rearranged
    for i in range(len(pattern)):
        sum = sum + ord(pattern[i]) * (i + 1)
    return sum % 101 #some prime number 101

def gethashst(string):
    sum = 0
    for i in range(len(string)):
        sum = sum + ord(string[i]) * (i + 1)
    return sum % 101

hashp = gethashp(pattern)
i = 0

def checkMatch(string,pattern,hashp):
    global i
    #check if we actually get first four characters (comes in handy when you
    #are nearing the end of the string)
    if len(string[:len(pattern)]) == len(pattern):
        #assign the substring to string2
        string2 = string[:len(pattern)]
        #get the hash value of the substring
        hashst = gethashst(string2)
        #if both the hash values match
        if hashst == hashp:
            #print the index of the first character of the match
            print("Pattern found at {}".format(i))
        #delete the first character of the string
        string = string[1:]
        #increment the index
        i += 1 #keep a count of the index
        checkMatch(string,pattern,hashp)
    else:
        #if no match or end of string,return
        return

checkMatch(string,pattern,hashp)
The code is working just fine. My question is: is this a valid way of doing it? Can there be any instance where the logic might fail? None of the Rabin-Karp implementations that I have come across use this logic; instead, for every hash match, they further check character by character to ensure it's not an anagram. So is it wrong if I do it this way? My view is that with this code, as soon as the hash values match, you never have to compare the two strings character by character and can just move on to the next position.
It is not only anagrams that can collide with the hash value of the pattern. Any other string with the same hash value also collides. An equal hash value can lie, so a character-by-character match is required.
For example, in your case you are taking mod 101, so there are only 101 possible hash values. Take any 102 distinct patterns; by the pigeonhole principle, at least two of them have the same hash. If you use one of them as the pattern, the presence of the other in the string would corrupt your output if you skip the character match.
Moreover, even with the hash you used, two anagrams can have the same hash value; such pairs can be found by solving linear equations in the letter values.
For example, taking A = 1, B = 2, and so on:
DCE = 4*1 + 3*2 + 5*3 = 25
CED = 3*1 + 5*2 + 4*3 = 25
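The collisions are easy to reproduce with the hash from the question. The sketch below is illustrative only: `weighted_hash` and `find_all` are hypothetical names, and `find_all` recomputes the hash for every window rather than rolling it, purely to keep the example short.

```python
def weighted_hash(s: str, mod: int = 101) -> int:
    """Position-weighted hash from the question: sum of ord(char) * (index + 1), mod 101."""
    return sum(ord(ch) * (i + 1) for i, ch in enumerate(s)) % mod


# Anagrams that collide (the DCE / CED example, here with real character codes).
print(weighted_hash("DCE") == weighted_hash("CED"))  # True

# Non-anagrams that collide: 65*1 + 68*2 == 67*1 + 67*2 == 201.
print(weighted_hash("AD") == weighted_hash("CC"))    # True


def find_all(text: str, pattern: str) -> list:
    """Hash-based search that verifies every hash hit character by character."""
    m, target = len(pattern), weighted_hash(pattern)
    hits = []
    for i in range(len(text) - m + 1):
        window = text[i:i + m]
        # The string comparison filters out false positives from equal hashes.
        if weighted_hash(window) == target and window == pattern:
            hits.append(i)
    return hits


print(find_all("AABAACAADAABAABA", "AABA"))  # [0, 9, 12]
print(find_all("CED", "DCE"))                # []
```

Without the `window == pattern` check, the last call would report a match at index 0.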

How to affect only letters, not punctuation in Caesar Cipher code

I am trying to write a Caesar cipher in Ruby, and I hit a snag when trying to change only the letters, and not the punctuation marks, to numerical values.
Here is my script so far:
def caesar_cipher(phrase, key)
  array = phrase.split("")
  number = array.map {|n| n.upcase.ord - (64-key)}
  puts number
end
puts "Script running"
caesar_cipher("Hey what's up", 1)
I tried to use select but I couldn't figure out how to select only the punctuation marks or only the letters.
Use String#gsub to match only the characters that you want to replace. In this case it's the letters of the alphabet, so you'll use the regular expression /[a-z]/i.
You can pass a block to gsub which will be called for each match in the string, and the return value of the block will be used as the replacement. For example:
"Hello, world!".gsub(/[a-z]/i) {|chr| (chr.ord + 1).chr }
# => "Ifmmp, xpsme!"
Here's a version of your Caesar cipher method that works pretty well:
BASE_ORD = 'A'.ord
def caesar_cipher(phrase, key)
  phrase.gsub(/[a-z]/i) do |letter|
    orig_pos = letter.upcase.ord - BASE_ORD
    new_pos = (orig_pos + key) % 26
    (new_pos + BASE_ORD).chr
  end
end
caesar_cipher("Hey, what's up?", 1) # => "IFZ, XIBU'T VQ?"
Edit:
% is the modulo operator. Here it's used to make new_pos "wrap around" to the beginning of the alphabet if it's greater than 25.
For example, suppose letter is "Y" and key is 5. The position of "Y" in the alphabet is 24 (assuming "A" is 0), so orig_pos + key will be 29, which is past the end of the alphabet.
One solution would be this:
new_pos = orig_pos + key
if new_pos > 25
  new_pos = new_pos - 26
end
This would make new_pos 3, which corresponds to the letter "D", the correct result. We can get the same result more efficiently, however, by taking "29 modulo 26" (expressed in Ruby and many other languages as 29 % 26), which returns the remainder of the division 29 ÷ 26. We divide by 26 because there are 26 letters in the alphabet. 29 % 26 is 3, the same result as above.
In addition to constraining a number to a certain range, as we do here, the modulo operator is also often used to test whether a number is divisible by another number. For example, you can check if n is divisible by 3 by testing n % 3 == 0.
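For comparison, here is the same wrap-around idea as a small Python sketch. It is not a translation of the Ruby method above: as an assumption for illustration, this version keeps the original case of each letter instead of upcasing, and the function name simply mirrors the Ruby one.

```python
def caesar_cipher(phrase: str, key: int) -> str:
    """Shift letters by key positions, wrapping with modulo; leave everything else alone."""
    result = []
    for ch in phrase:
        if "a" <= ch <= "z":
            base = ord("a")
        elif "A" <= ch <= "Z":
            base = ord("A")
        else:
            result.append(ch)  # punctuation, digits and spaces are unchanged
            continue
        # Position in the alphabet, shifted, wrapped into 0..25, back to a character.
        result.append(chr((ord(ch) - base + key) % 26 + base))
    return "".join(result)


print(caesar_cipher("Hey, what's up?", 1))  # Ifz, xibu't vq?
print(caesar_cipher("Y", 5))                # D
```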

Why do numeric string comparisons give unexpected results?

'10:' < '1:'
# => true
Can someone explain to me why the result in the above example is true? If I just compare '1:' and '2:', I get the expected result:
'1:' < '2:'
# => true
Strings are compared character by character.
When you compare 1: vs 2:, the comparison begins with 1 vs 2, and the comparison stops there with the expected result.
When you compare 10: vs 1:, the comparison begins with 1 vs 1, and since it is a tie, the comparison moves on to the next pair of characters, which is 0 vs :, and the comparison stops there with the result that you found surprising (given your expectation that the integers within the strings would be compared).
To do the comparison you expect, use to_i to convert both operands to integers.
It is character by character comparison in ASCII.
'10:' < '1:' is (49 < 49) || (48 < 58) || (58 < ?)
#=> true
'1:' < '2:' is (49 < 50) || (58 < 58)
#=> true
The characters are checked from left to right, and the check stops at the first position where they differ; that comparison decides the result.
Note: It is just my observation over various example patterns.
The first character of each of your two strings is the same. And as Dave said in the comments, the second character of the first string, '0', is less than ':', so the first string is less than the second.
Because the ASCII code for 0 is 48, which is smaller than the ASCII code for :, which is 58.
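Python compares strings the same way, by character code from left to right, so the effect can be reproduced there as a quick check. Note one assumed difference from Ruby: Python's `int` does not ignore trailing non-digits the way `to_i` does, so the colon is stripped first.

```python
a, b = "10:", "1:"

# Character codes, position by position: '0' (48) is below ':' (58).
print([ord(c) for c in a])  # [49, 48, 58]
print([ord(c) for c in b])  # [49, 58]
print(a < b)                # True

# Comparing the numeric prefixes instead gives the numeric ordering.
print(int(a.rstrip(":")) < int(b.rstrip(":")))  # False
```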
