strconv.ParseInt fails if number starts with 0 - go

I'm currently having issues parsing some numbers starting with 0 in Go.
fmt.Println(strconv.ParseInt("0491031", 0, 64))
0 strconv.ParseInt: parsing "0491031": invalid syntax
GoPlayground: https://go.dev/play/p/TAv7IEoyI8I
I think this is due to some base-conversion error, but I have no idea how to fix it.
For context: I'm getting this error while parsing a 5 GB+ CSV file with gocsv.
[Edit: the error was caused by the gocsv library, which doesn't let you specify a base for the numbers being parsed.]

Quoting from the documentation of strconv.ParseInt():
If the base argument is 0, the true base is implied by the string's prefix following the sign (if present): 2 for "0b", 8 for "0" or "0o", 16 for "0x", and 10 otherwise. Also, for argument base 0 only, underscore characters are permitted as defined by the Go syntax for integer literals.
You are passing 0 for base, so the base to parse in is inferred from the string value, and since it starts with '0' (but not "0b", "0o", or "0x"), your number is interpreted as an octal (base-8) number, and the digit 9 is invalid there.
Note that this would work:
fmt.Println(strconv.ParseInt("0431031", 0, 64))
And output (try it on the Go Playground):
143897 <nil>
(Octal 431031 equals 143897 decimal.)
If your input is in base 10, pass 10 for base:
fmt.Println(strconv.ParseInt("0491031", 10, 64))
Then output will be (try it on the Go Playground):
491031 <nil>
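If the input comes through gocsv, which (as the edit above notes) has no option to set the base, one workaround is to bind the column to a custom type that always parses in base 10. A minimal sketch, assuming gocsv's TypeUnmarshaller hook (UnmarshalCSV(string) error) and a hypothetical column name code:

package main

import (
	"fmt"
	"strconv"
	"strings"

	"github.com/gocarina/gocsv"
)

// DecimalInt forces base-10 parsing even for zero-prefixed values.
type DecimalInt int64

// UnmarshalCSV implements gocsv's TypeUnmarshaller interface (assumed API).
func (d *DecimalInt) UnmarshalCSV(field string) error {
	n, err := strconv.ParseInt(strings.TrimSpace(field), 10, 64) // base 10, never inferred
	if err != nil {
		return err
	}
	*d = DecimalInt(n)
	return nil
}

type Row struct {
	Code DecimalInt `csv:"code"` // hypothetical column name
}

func main() {
	var rows []Row
	if err := gocsv.UnmarshalString("code\n0491031\n", &rows); err != nil {
		panic(err)
	}
	fmt.Println(rows[0].Code) // 491031
}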


Impala substr can't get utf8 character correctly

I am new to ETL and was assigned the task of sanitizing some sensitive information before handing the data to a client.
I am using HUE web client with Impala.
What I want to do is, for example, transform a column value like '京客隆(三里屯店)' into something like '京XXX店)'.
My query is:
select '京客隆(三里屯店)', concat(substr('京客隆(三里屯店)', 1, 3), 'XXX', substr('京客隆(三里屯店)', char_length('京客隆(三里屯店)') -6, 6));
But I get gibberish in the output:
'京客隆(三里屯店)' | concat(substr('京客隆(三里屯店)', 1, 3), 'xxx', substr('京客隆(三里屯店)', char_length('京客隆(三里屯店)') - 6, 6))
京客隆(三里屯店) | 京XXX�店�
The problem is that:
select '京客隆(三里屯店)', substr('京客隆(三里屯店)', char_length('京客隆(三里屯店)') - 3, 3);
output: 京客隆(三里屯店) ��
doesn't get the correct characters. Why is that? I pasted the string into a Python shell and I can get the correct characters if I only take the last 3 bytes.
It turns out that I misunderstood the function substr.
substr(STRING a, INT start [, INT len]):
It takes bytes (not characters) starting from (and including) position start. My string '京客隆(三里屯店)' is 27 bytes long in total, since each UTF-8 character here takes 3 bytes. To take the last 3 bytes, which are the final ), I need to write:
substr('京客隆(三里屯店)', 27 - 2, 3)
It then takes bytes 25, 26 and 27 and displays the character ) correctly.
Update:
I was told to use:
SELECT regexp_replace('京客隆(三里屯店)', '(.)(.*)(.{2})', '\\1***\\3');
which works like a charm :P
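To see concretely why byte-based indexing corrupts multi-byte text, here is a small illustration in Go, whose strings are raw UTF-8 byte sequences just like the values Impala's substr slices (string and offsets taken from the question):

package main

import "fmt"

func main() {
	s := "京客隆(三里屯店)" // 9 characters, each 3 bytes in UTF-8 (incl. the full-width parens)
	fmt.Println(len(s))         // 27: length in bytes
	fmt.Println(len([]rune(s))) // 9: length in characters
	fmt.Println(s[len(s)-3:])   // ): the last 3 bytes are exactly the last character
	fmt.Println(s[len(s)-4:])   // garbage: the slice starts in the middle of 店
	// slicing by runes instead of bytes gives the intended masking
	r := []rune(s)
	fmt.Println(string(r[0]) + "XXX" + string(r[len(r)-2:])) // 京XXX店)
}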

Converting nginx uuid from hex to Base64: how is byte-order involved?

Nginx can be configured to generate a uuid suitable for client identification. Upon receiving a request from a new client, it appends a uuid in two forms before forwarding the request upstream to the origin server(s):
cookie with uuid in Base64 (e.g. CgIGR1ZfUkeEXQ2YAwMZAg==)
header with uuid in hexadecimal (e.g. 4706020A47525F56980D5D8402190303)
I want to convert a hexadecimal representation to the Base64 equivalent. I have a working solution in Ruby, but I don't fully grasp the underlying mechanics, especially the switching of byte-orders:
hex_str = "4706020A47525F56980D5D8402190303"
Treating hex_str as a sequence of high-nibble (most significant 4 bits first) binary data, produce the (ASCII-encoded) string representation:
binary_seq = [hex_str].pack("H*")
# 47 (71 decimal) -> "G"
# 06 (6 decimal) -> "\x06" (non-printable)
# 02 (2 decimal) -> "\x02" (non-printable)
# 0A (10 decimal) -> "\n"
# ...
#=> "G\x06\x02\nGR_V\x98\r]\x84\x02\x19\x03\x03"
Map binary_seq to an array of 32-bit little-endian unsigned integers. Each 4 characters (4 bytes = 32 bits) maps to an integer:
data = binary_seq.unpack("VVVV")
# "G\x06\x02\n" -> 167904839 (?)
# "GR_V" -> 1449087559 (?)
# "\x98\r]\x84" -> 2220690840 (?)
# "\x02\x19\x03\x03" -> 50534658 (?)
#=> [167904839, 1449087559, 2220690840, 50534658]
Treating data as an array of 32-bit big-endian unsigned integers, produce the (ASCII-encoded) string representation:
network_seq = data.pack("NNNN")
# 167904839 -> "\n\x02\x06G" (?)
# 1449087559 -> "V_RG" (?)
# 2220690840 -> "\x84]\r\x98" (?)
# 50534658 -> "\x03\x03\x19\x02" (?)
#=> "\n\x02\x06GV_RG\x84]\r\x98\x03\x03\x19\x02"
Encode network_seq in Base64 string:
Base64.encode64(network_seq).strip
#=> "CgIGR1ZfUkeEXQ2YAwMZAg=="
My rough understanding is that big-endian is the standard byte-order for network communications, while little-endian is more common on host machines. Why nginx provides two forms that require switching byte order to convert I'm not sure.
I also don't understand how the .unpack("VVVV") and .pack("NNNN") steps work. I can see that "G\x06\x02\n" becomes "\n\x02\x06G", but I don't understand the steps that get there. For example, focusing on the first 8 digits of hex_str, why do .pack("H*") and .unpack("VVVV") produce:
"4706020A" -> "G\x06\x02\n" -> 167904839
whereas converting directly to base-10 produces:
"4706020A".to_i(16) -> 1191576074
? The fact that I'm asking this shows I need clarification on what exactly is going on in all these conversions :)
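For reference, the unpack("VVVV")/pack("NNNN") pair simply reverses the byte order within each 4-byte group: 47 06 02 0A read as a little-endian uint32 is 0x0A020647 (= 167904839), and writing that value back big-endian emits 0A 02 06 47, i.e. "G\x06\x02\n" becomes "\n\x02\x06G". That also explains the .to_i(16) mismatch: .to_i(16) interprets the whole hex string as one big-endian number (0x4706020A = 1191576074), while unpack("V") reads the same four bytes little-endian. A Go sketch of the identical pipeline, using the example uuid from the question:

package main

import (
	"encoding/base64"
	"encoding/binary"
	"encoding/hex"
	"fmt"
)

func main() {
	// pack("H*"): hex digits -> 16 raw bytes
	raw, err := hex.DecodeString("4706020A47525F56980D5D8402190303")
	if err != nil {
		panic(err)
	}

	// unpack("VVVV") then pack("NNNN"): read each 4-byte group as a
	// little-endian uint32, re-serialize it big-endian -- a per-group
	// byte reversal
	swapped := make([]byte, len(raw))
	for i := 0; i < len(raw); i += 4 {
		v := binary.LittleEndian.Uint32(raw[i : i+4]) // "V"
		binary.BigEndian.PutUint32(swapped[i:i+4], v) // "N"
	}

	// Base64.encode64
	fmt.Println(base64.StdEncoding.EncodeToString(swapped))
	// CgIGR1ZfUkeEXQ2YAwMZAg==
}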

Fortran 90 - reading format

I'm trying to read this line from a formatted file: "      PARAMETER (NE_M=10,NL_M=12)" (six leading spaces).
I want to replace the 12 with 11.
I tried to read the string like this:
integer :: i
character(len=30) :: text
10 format(6x,24a,2i)
read(text_data,10) text, i
write(6,100) text, 11
But it doesn't work. Any idea?
The reading and writing you have written will not do what you want. The input line you presented is 33 characters wide, but your format only accounts for 32 of them, and your write will not contain the closing ).
Consider the following code, if you do not need to capture the 12 in the input.
program test
character(len=30) :: text
101 format(a30, i2, ')')
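! on read, only a30 is consumed (text is the sole list item); on write,
! a30 and i2 are used and the trailing ')' literal is still emitted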
open(unit=10, file='testinput.f')
read(10,101) text
write(*,101) text, 11
end program
and the input (with 6 leading spaces) in file testinput.f:
PARAMETER (NE_M=10,NL_M=12)
when run, produces the output:
% ./test
PARAMETER (NE_M=10,NL_M=11)
This code was compiled and tested with GNU gfortran 4.8.2.
Alternatively, assuming text_data is the unit number of an open file and 100 is a format statement number:
integer :: i
character(len=30) :: text
10 format(6x,a24,i2)
read(text_data,10) text, i
write(6,100) text(:24), i
And, fixing those other issues:
integer :: i
character(len=30) :: text
open(unit=20,file='filename')
10 format(6x,a24,i2)
read(20,10) text, i
write(6,10) text(:24), i

ruby YAML parse bug with number

I have encountered what appears to be a bug with the YAML parser. Take this simple yaml file for example:
new account:
- FLEETBOSTON
- 011001742
If you parse it using this ruby line of code:
INPUT_DATA = YAML.load_file("test.yml")
Then I get this back:
{"new account"=>["FLEETBOSTON", 2360290]}
Am I doing something wrong? Because I'm pretty sure this is never supposed to happen.
It is supposed to happen. Numbers starting with 0 are in octal notation, unless the next character is x, in which case they're hexadecimal.
07 == 7
010 == 8
011 == 9
0x9 == 9
0xA == 10
0xF == 15
0x10 == 16
0x11 == 17
Go into irb and just type in 011001742.
1.9.2-p290 :001 > 011001742
=> 2360290
PEBKAC. :)
Your number is a number, so it's treated as a number. If you want it to be explicitly a string, enclose it in quotes, so YAML will not try to make it a number.
new account:
- FLEETBOSTON
- '011001742'
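The leading-zero rule is the same one Go's strconv applies for base 0 (see the first question above), so the arithmetic is easy to double-check. A quick Go sketch, purely illustrative:

package main

import (
	"fmt"
	"strconv"
)

func main() {
	// base 0 infers the base from the prefix: a leading "0" selects octal,
	// exactly like the YAML 1.1 parser did here
	n, _ := strconv.ParseInt("011001742", 0, 64)
	fmt.Println(n) // 2360290

	// forcing base 10 is the analogue of quoting the value in YAML
	d, _ := strconv.ParseInt("011001742", 10, 64)
	fmt.Println(d) // 11001742
}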

Ruby IO#read max length for single read

How can I determine the maximum length IO#read can handle in a single read on the current platform?
irb(main):301:0> File.size('C:/large.file') / 1024 / 1024
=> 2145
irb(main):302:0> s = IO.read 'C:/large.file'
IOError: file too big for single read
That message comes from remain_size in io.c. It is emitted when the (remaining) size of the file is greater than or equal to LONG_MAX. That value depends on the platform your Ruby has been compiled for.
At least in Ruby 1.8.7, the maximum Fixnum happens to be just under half of that value, so you could reconstruct the limit with:
2 * 2 ** (1..128).to_a.find { |i| (1 << i).kind_of? Bignum } - 1
(This finds the first power of two that overflows Fixnum into Bignum, doubles it, and subtracts 1, yielding LONG_MAX.) But you should rather not rely on that.
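Whatever the exact limit, the portable workaround is to read in bounded chunks rather than one giant read. A sketch of the idea in Go (the 64 MiB buffer size is an arbitrary choice):

package main

import (
	"fmt"
	"io"
	"os"
)

func main() {
	f, err := os.Open("C:/large.file") // path from the question
	if err != nil {
		panic(err)
	}
	defer f.Close()

	buf := make([]byte, 64<<20) // any bound safely below the platform limit
	var total int64
	for {
		n, err := f.Read(buf)
		total += int64(n)
		// process buf[:n] here instead of holding the whole file in memory
		if err == io.EOF {
			break
		}
		if err != nil {
			panic(err)
		}
	}
	fmt.Println("bytes read:", total)
}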
