Arithmetic with Chars in Julia

Julia REPL tells me that the output of
'c'+2
is 'e': ASCII/Unicode U+0065 (category Ll: Letter, lowercase)
but that the output of
'c'+2-'a'
is 4.
I'm fine with the fact that Chars are identified as numbers via their ASCII code. But I'm confused about the type inference here: why is the first output a char and the second an integer?

Regarding the motivation for conventions, it’s similar to time stamps and intervals. The difference between time stamps is an interval and accordingly you can add an interval to a time stamp to get another time stamp. You cannot, however, add two time stamps because that doesn’t make sense—what is the sum of two points in time supposed to be? The difference between two chars is their distance in code point space (an integer); accordingly you can add an integer to a char to get another char that’s offset by that many code points. You can’t add two chars because adding two code points is not a meaningful operation.
Why allow comparisons and differences of chars in the first place? Because it's common to use that kind of arithmetic and comparison to implement parsing code, e.g. for parsing numbers in various bases and formats.
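As a sketch of what that kind of parsing looks like (my own illustrative helper, not something from the thread or the standard library), here is a tiny base parser built entirely on char comparisons and char-integer differences:

function parse_base(s, base)
    n = 0
    for c in s
        # the difference from '0' or 'a' is the digit's value
        d = '0' <= c <= '9' ? c - '0' :
            'a' <= c <= 'z' ? c - 'a' + 10 :
            error("invalid digit: $c")
        d < base || error("digit $c out of range for base $base")
        n = n * base + d
    end
    return n
end

julia> parse_base("ff", 16)
255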

The reason is:
julia> @which 'a' - 1
-(x::T, y::Integer) where T<:AbstractChar in Base at char.jl:227
julia> @which 'a' - 'b'
-(x::AbstractChar, y::AbstractChar) in Base at char.jl:226
Subtraction of Char and integer is Char. This is e.g. 'a' - 1.
However, subtraction of two Char is integer. This is e.g. 'a' - 'b'.
Note that for Char and integer both addition and subtraction are defined, but for two Char only subtraction works:
julia> 'a' + 'a'
ERROR: MethodError: no method matching +(::Char, ::Char)
This can indeed lead to tricky cases that rely on the order of operations, as in this example:
julia> 'a' + ('a' - 'a')
'a': ASCII/Unicode U+0061 (category Ll: Letter, lowercase)
julia> 'a' + 'a' - 'a'
ERROR: MethodError: no method matching +(::Char, ::Char)
Also note that when working with Char and integer you cannot subtract a Char from an integer:
julia> 2 - 'a'
ERROR: MethodError: no method matching -(::Int64, ::Char)
Motivation:
Subtraction of two Chars is sometimes useful when you want the position of a char relative to some other char, e.g. c - '0' converts a char to its decimal value if you know the char is a digit.
Adding or subtracting an integer and a Char does the same in reverse, e.g. to convert a digit to a char you write '0' + d; see the REPL check below.
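A quick REPL check of both directions (any decimal digit behaves the same way):

julia> '7' - '0'
7

julia> '0' + 7
'7': ASCII/Unicode U+0037 (category Nd: Number, decimal digit)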
I have been using Julia for years now, and I used this feature maybe once or twice, so I would not say it is super commonly needed.

Related

perl6 min and max of mixed Str and Int arguments

What type gets converted first for the min and max routines when the arguments contain a mixture of Str and Int?
> say ("9", "10").max
9
> say ("9", "10").max.WHAT
(Str)
> say (9, "10").max
9
> say (9, "10").max.WHAT
(Int) # if it converted to Int first, the result should be 10
> say ("9", 10).max
9
> say ("9", 10).max.WHAT
(Str) # if it converted to Str first, the result should be 9
> say (9, "10").min
10
> say (9, "10").min.WHAT
(Str) # do min and max convert Str and Int differently?
If min or max converts arguments to be the type of the first argument, the results here are still inconsistent.
Thank you for your enlightenment !!!
Both min and max use the cmp infix operator to do the comparisons. If the types differ, then this logic is used (rewritten slightly to be pure Perl 6, whereas the real one uses an internals shortcut):
multi sub infix:<cmp>($a, $b) {
    $a<> =:= $b<>
        ?? Same
        !! $a.Stringy cmp $b.Stringy
}
Effectively, if the two things point to the exact same object, then they are the Same, otherwise stringify both and then compare. Thus:
say 9 cmp 10; # uses the (Int, Int) candidate, giving Less
say "9" cmp "10"; # uses the (Str, Str) candidate, giving More
say 9 cmp "10"; # delegates to "9" cmp "10", giving More
say "9" cmp 10; # delegates to "9" cmp "10", giving More
The conversion to a string is done for the purpose of comparison (as an implementation detail of cmp), and so has no impact upon the value that is returned by min or max, which will be that found in the input list.
Well, jnthn has answered. His answers are always authoritative and typically wonderfully clear and succinct too. This one is no exception. :) But I'd started so I'll finish and publish...
A search for "method min" in the Rakudo sources yields 4 matches of which the most generic is a match in core/Any-iterable-methods.pm6.
It might look difficult to understand, but nqp is essentially a simple subset of P6. The key thing is that it uses cmp to compare each value that is pulled from the sequence of values against the latest minimum (the $pulled cmp $min bit).
Next comes a search for "sub infix:<cmp>" in the Rakudo sources. This yields 14 matches.
These will all have to be looked at to confirm what the source code does when comparing these various types of value. Note also that the logic is pairwise, which is slightly weird to think about. So if there are three values a, b, and c, each of a different type, then a is the initial minimum; then b cmp a runs, using whichever cmp candidate matches that combination of types in that order; and then c cmp d runs, where d is whichever value won the b cmp a comparison, again using the cmp candidate suitable for that pair of types in that order.
Let's start with the first one -- the match in core/Order.pm6 -- which is presumably a catchall if none of the other matches are more specific:
If both arguments of cmp are numeric, then comparison is a suitable numeric comparison (e.g. if they're both Ints then comparison is of two arbitrary-precision integers).
If one argument is numeric but not the other, then -Inf and Inf are sorted to the start and end, but otherwise comparison is done after both arguments are coerced to strings via .Stringy.
Otherwise, both arguments are coerced to strings via .Stringy.
So, that's the default.
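A few spot checks of those default rules (the outputs are what the source reading above implies):

say 1.5 cmp 2;      # Less -- both numeric, so a numeric comparison
say Inf cmp "zzz";  # More -- Inf sorts to the end against a non-numeric
say 9 cmp "10";     # More -- mixed types, compared as strings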
Next, one would have to go through the individual overloads. For example, the next match is the cmp ops in core/allomorphs.pm6, where we see that for allomorphic types (IntStr etc.) comparison is numeric first, then string if that doesn't settle it. Note the comment:
we define cmp ops for these allomorphic types as numeric first, then Str. If you want just one half of the cmp, you'll need to coerce the args
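So for allomorphs the mixed-type surprise goes away; a hedged example (angle-bracket quoting constructs IntStr allomorphs):

say <9> cmp <10>;   # Less -- numeric comparison wins for allomorphs
say "9" cmp "10";   # More -- plain strings still compare as strings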
Anyhoo, I see jnthn's posted yet another great answer so it's time to wrap this one. :)

How to display last N characters of a C string?

I'm trying to do some programming homework, but I can't figure out how to display the last N characters of a C string. This is my attempt at it so far. I am also supposed to ask for the number of characters and validate it. Any help would be greatly appreciated.
void choice_4(char str[]) {
    int characters;
    cout << "How many characters from the end of the string do you want to display? ";
    cin >> characters;
    if (str[characters] != '\0')
        cout << str.Substring(str.length - characters, characters)
}
As usual with homework questions, I won't give a solution but a few hints.
I assume by “validate” you mean checking whether the string is long enough. For example, you cannot show the last 12 characters of a string that is only 7 characters long. Your current attempt of looking at the n-th byte of the string cannot work, however. If the string is shorter than n bytes, you'll index it out of range and invoke undefined behavior. And if it is exactly n bytes long – which is perfectly valid – str[n] is the terminating NUL byte, so your test wrongly reports that the requested length is invalid.
What you should do instead is compute the length N of the string and then test whether n ≤ N. You can use the standard library function std::strlen to obtain the length of a NUL-terminated character array. Or you can loop yourself and count how many bytes you see until the first NUL byte.
A C-style string is just a pointer to a byte in memory with the implicit contract that any bytes that follow it, up to the first NUL byte, belong to the string. Therefore, if you add m ≤ N to the pointer, you get the sub-string starting at the m-th (zero-based) byte.
Therefore, in order to get the sub-string with the last n characters of a string with N characters, how do you determine m?
By the way: A NUL byte is a char with the integer value 0. You can encode it as '\0' (as you did) but 0 works perfectly fine, too.
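To illustrate just those two building blocks – std::strlen and pointer offsets – here is a minimal standalone sketch (the string and the offset m are made up for demonstration, and it deliberately stops short of solving the exercise):

#include <cstring>
#include <iostream>
using namespace std;

int main() {
    const char *str = "hello world";  // a NUL-terminated C string
    size_t N = strlen(str);           // 11 -- the NUL byte is not counted
    size_t m = 6;                     // some valid offset with m <= N
    cout << (str + m) << '\n';        // prints the sub-string "world"
    return 0;
}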

Scope of variables and the digits function

My question is twofold:
1) As far as I understand, constructs like for loops introduce scope blocks; however, I'm having some trouble with a variable that is defined outside of said construct. The following code depicts an attempt to extract digits from a number and place them in an array.
n = 654068
l = length(n)
a = Int64[]
for i in 1:(l-1)
    temp = n/10^(l-i)
    if temp < 1 # ith digit is 0
        a = push!(a,0)
    else # ith digit is != 0
        push!(a,floor(temp))
        # update n
        n = n - a[i]*10^(l-i)
    end
end
# last digit
push!(a,n)
The code executes fine, but when I look at the a array I get this result:
julia> a
0-element Array{Int64,1}
I thought that anything that goes on inside the for loop is invisible to the outside, unless I'm operating on variables defined outside the for loop. Moreover, I thought that by using the ! syntax I would operate directly on a, but this does not seem to be the case. I would be grateful if anyone could explain to me how this works :)
2) My second question is about the syntax used when explaining functions. There is apparently a function called digits that extracts digits from a number and puts them in an array; using the help function I get
julia> help(digits)
Base.digits(n[, base][, pad])
Returns an array of the digits of "n" in the given base,
optionally padded with zeros to a specified size. More significant
digits are at higher indexes, such that "n ==
sum([digits[k]*base^(k-1) for k=1:length(digits)])".
Can anyone explain how to interpret the information given about functions in Julia? How am I to interpret digits(n[, base][, pad])? How does one correctly call the digits function? Surely it can't be like this: digits(40125[, 10])?
I'm unable to reproduce your result; running your code gives me
julia> a
1-element Array{Int64,1}:
654068
There are a few mistakes and inefficiencies in the code:
length(n) doesn't give the number of digits in n, but always returns 1 (currently, numbers are iterable and return a sequence that contains only one number: the number itself). So the for loop is never run.
/ between integers does floating-point division. For extracting digits, you're better off with div(x,y), which does integer division.
There's no reason to write a = push!(a,x), since push! modifies a in place. It is equivalent to writing push!(a,x); a = a.
There's no reason to treat digits that are zero specially; they are handled just fine by the general case.
Your description of scoping in Julia seems to be correct; I think it is the above that is giving you trouble.
You could use something like:
n = 654068
a = Int64[]
while n != 0
    push!(a, n % 10)
    n = div(n, 10)
end
reverse!(a)
This loop extracts the digits in opposite order to avoid having to figure out the number of digits in advance, and uses the modulus operator % to extract the least significant digit. It then uses reverse! to get them in the order you wanted, which should be pretty efficient.
About the documentation for digits: [, base] just means that base is an optional parameter. The description should probably be digits(n[, base[, pad]]), since it's not possible to specify pad unless you also specify base. Also note that digits returns the least significant digit first, which is what we would get if we removed the reverse! from the code above.
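So a correct call just passes plain positional arguments; the brackets are documentation notation and are never typed. For example (using the old positional signature from the help text above; newer Julia versions take base and pad as keyword arguments):

julia> digits(40125)
5-element Array{Int64,1}:
 5
 2
 1
 0
 4

julia> digits(40125, 16)
4-element Array{Int64,1}:
 13
 11
 12
  9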
Is this cheating?:
n = 654068
nstr = string(n)
a = map(x -> x |> string |> int, collect(nstr))
outputs:
6-element Array{Int64,1}:
6
5
4
0
6
8

dc(1) and leading zeros

I've got a shell script and I do some calculations with dc(1).
I need to have one number printed with leading zeros; I can't find an easy and straightforward way to do this with dc itself, but the manpage does mention:
Z
Pops a value off the stack, calculates the number of digits it has (or number of characters, if it is a string) and pushes that number. The digit count for a number does not include any leading zeros, even if those appear to the right of the radix point.
Which sort of implies there is an easy and straightforward way ...
I know there are a zillion-and-one methods of accomplishing this, and the script is running happily with one of them. I'm just curious ;-)
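For reference, Z by itself just replaces the top of the stack with its digit count (GNU dc shown):

$ dc -e '999 Z p'
3
$ dc -e '12345 Z p'
5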
Give this a try:
Enter:
[lc1+dsc0nld>b]sb
[sddZdscld>bp]sa
999
12lax
Output:
000000000999
Enter:
3lax
Output:
999
The original number is left on the stack after the macro ends. Registers used: a (macro), b (macro), c (count), d (digits).
Explanation:
Macro a does the setup, calls b and prints the original number.
sd - store the number of digits to be output in register d
dZ - duplicate the original number and push the count of its digits
dsc - duplicate that count and store it in register c
ld>b - load the desired digits from register d, if it's greater than the count then call macro b
p - print the original number
Macro b outputs zeros, incrementing the count, until the count reaches the desired number of digits:
lc1+ - load the count from register c and increment it
dsc - duplicate the count and store it back to register c
0n - output a zero without a newline
ld>b - load the desired digits from register d, if it's still greater than the incremented count then loop back to run macro b again, otherwise it will return to the caller (macro a)
To use an arbitrary leading character:
[lclkZ+dsclknld>b]sb
[sksddZdscld>bp]sa
999 14 [ ] lax
           999
[abc] 12 [-] lax
---------abc
In addition to the other registers, it uses k to store the character (which could actually be more than one):
[XYZ] 6 [def] lax
defXYZ
8 [ab] lax
abababXYZ
4 [ghijkl] lax
ghijklXYZ
The fill strings are used whole, so the result may be longer than you asked for: that happens whenever the desired length is larger than the length of the original string but cannot be reached exactly by appending whole copies of the fill string.
Here is an example, albeit inelegant. This prints out 999 with 2 leading zeros. You'll need to duplicate the code for more digits.
#Push number to print on stack
999
# macro to print a zero
[[0]P]sa
# Print a zero if less than 5 digits
dZ5>a
# Print a zero if less than 4 digits
dZ4>a
# Print a zero if less than 3 digits
dZ3>a
# Print a zero if less than 2 digits
dZ2>a
# Print out number
p
The solutions given work for decimal numbers. For hex (as well as any other) input radix, use e.g.:
c=18A; dc <<< "16i${c^^}d0r[r1+r10/d0<G]dsGx4+r-[1-[0]nlGx]sY[d0<Y]dsGxr10op"
^ radix formatted length ^ ^ leading symbol
You may also try
c=19; dc <<< "10i${c^^}d0r[r1+r10/d0<G]dsGx4+r-[1-[_]nlGx]sY[d0<Y]dsGxr10op"
c=607; dc <<< " 8i${c^^}d0r[r1+r10/d0<G]dsGx8+r-[1-[*]nlGx]sY[d0<Y]dsGxr10op"
c=1001; dc <<< " 2i${c^^}d0r[r1+r10/d0<G]dsGx8+r-[1-[ ]nlGx]sY[d0<Y]dsGxr10op"
G and Y are the registers used. First the number of digits is counted on the stack, then the number of symbols to be printed.
c=607; dc <<< "8i${c^^}d0r[r1+r10/d0<G]dsGx f 8+r-[1-[*]nlGx]sY f [d0<Y]dsGxr10op"

How to compute one's complement using Ruby's bitwise operators?

What I want:
assert_equal 6, ones_complement(9) # 1001 => 0110
assert_equal 0, ones_complement(15) # 1111 => 0000
assert_equal 2, ones_complement(1) # 01 => 10
The size of the input isn't fixed at 4 bits or 8 bits; rather, it's a binary stream.
What I see:
v = "1001".to_i(2) => 9
There's a bit-flipping operator ~:
(~v).to_s(2) => "-1010"
sprintf("%b", ~v) => "..10110"
~v => -10
I think it's got something to do with one bit being used to store the sign or something... can someone explain this output? How do I get a one's complement without resorting to string manipulations like cutting the last n chars from the sprintf output to get "0110", or replacing 0 with 1 and vice versa?
Ruby just stores a (signed) number. The internal representation of this number is not relevant: it might be a Fixnum, Bignum or something else. Therefore, the number of bits in a number is also undefined: it is just a number, after all. This is contrary to, for example, C, where an int will probably be 32 bits (fixed).
So what does the ~ operator do then? Well, just something like:
class Numeric
  def ~
    return -self - 1
  end
end
...since that's what '~' represents when looking at 2's complement numbers.
So what is missing from your input statement is the number of bits you want to switch: a 32-bit ~ is different from a generic ~ like it is in Ruby.
Now if you just want to bit-flip n bits you can do something like:
class Numeric
  def ones_complement(bits)
    self ^ ((1 << bits) - 1)
  end
end
...but you do have to specify the number of bits to flip. And this won't affect the sign flag, since that one is outside your reach with XOR :)
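With the monkey-patch above, the assertions from the question hold once the intended widths are passed in (a quick check, not production code):

9.ones_complement(4)   #=> 6  (1001 -> 0110)
15.ones_complement(4)  #=> 0  (1111 -> 0000)
1.ones_complement(2)   #=> 2  (01 -> 10)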
It sounds like you only want to flip four bits (the length of your input) - so you probably want to XOR with 1111.
See this question for why.
One problem with your method is that your expected answer is only true if you only flip the four significant bits: 1001 -> 0110.
But the number is stored with leading zeros, and the ~ operator flips all the leading bits too: 00001001 -> 11110110. Then the leading 1 is interpreted as the negative sign.
You really need to specify what the function is supposed to do with numbers like 0b101 and 0b11011 before you can decide how to implement it. If you only ever want to flip 4 bits you can do v^0b1111, as suggested in another answer. But if you want to flip all significant bits, it gets more complicated.
edit
Here's one way to flip all the significant bits:
def maskbits n
  b = 1
  prev = n
  mask = prev | (prev >> 1)
  while mask != prev
    prev = mask
    mask |= (mask >> (b *= 2))
  end
  mask
end

def ones_complement n
  n ^ maskbits(n)
end
This gives
p ones_complement(9).to_s(2) #>>"110"
p ones_complement(15).to_s(2) #>>"0"
p ones_complement(1).to_s(2) #>>"0"
This does not give your desired output for ones_complement(1), because it treats 1 as "1", not "01". I don't know how the function could infer how many leading zeros you want without taking the width as an argument.
If you're working with strings you could do:
s = "0110"
s.gsub("\d") {|bit| bit=="1"?"0":"1"}
If you're working with numbers, you'll have to define the number of significant bits because:
0110 = 6; 1001 = 9;
110 = 6; 001 = 1;
Even ignoring the sign, you'll probably have to handle this.
What you are doing (using the ~ operator) is indeed a one's complement. You are getting those values that you are not expecting because of the way the number is interpreted by Ruby.
What you actually need to do will depend on what you are using this for. That is to say, why do you need a 1's complement?
Remember that you are getting the one's complement right now with ~ if you pass in a Fixnum: the number of bits which represent the number is a fixed quantity in the interpreter, and thus there are leading 0s in front of the binary representation of the number 9 (binary 1001). You can find this number of bits by examining the size of any Fixnum (the answer is returned in bytes):
1.size #=> 4
2147483647.size #=> 4
~ is also defined over Bignum. In this case it behaves as if all of the bits which are specified in the Bignum were inverted, and then as if there were an infinite string of 1s in front of that Bignum. You can conceivably shove your bitstream into a Bignum and invert the whole thing. You will, however, need to know the size of the bitstream prior to inversion to get a useful result out after it is inverted.
To answer the question as you pose it right off the bat, you can find the largest power of 2 less than your input, double it, subtract 1, then XOR the result of that with your input and always get a ones complement of just the significant bits in your input number.
def sig_ones_complement(num)
  significant_bits = num.to_s(2).length
  next_smallest_pow_2 = 2**(significant_bits - 1)
  xor_mask = (2 * next_smallest_pow_2) - 1
  return num ^ xor_mask
end
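For the question's test cases this gives:

sig_ones_complement(9)   #=> 6  (1001 -> 0110)
sig_ones_complement(15)  #=> 0  (1111 -> 0000)
sig_ones_complement(1)   #=> 0  (like maskbits above, it sees 1 as "1", not "01")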
