In Julia, how does one convert a list of ASCII decimals to a string?

tl;dr: I want to convert [125, 119, 48, 126, 40] to the string }w0~(
To give a real-life example, I am working with sequence data in FASTQ format, read with the BioSequences library imported below.
cat example.fastq outputs the following:
#some/random/identifier
ACTAG
+
}w0~(
The julia code below demonstrates reading the fastq file:
import BioSequences.FASTQ
fastq_stream = FASTQ.Reader(open("example.fastq", "r"))
for record in fastq_stream
    # Still need to learn, why this offset of 33?
    println(
        Vector{Int8}(FASTQ.quality(record, :sanger)) .+ 33
    )
    println(
        String(FASTQ.sequence(record))
    )
    println(
        String(FASTQ.identifier(record))
    )
    break
end
close(fastq_stream)
This code prints the following:
[125, 119, 48, 126, 40]
ACTAG
some/random/identifier
I don't want to have to store this information in a list. I would prefer to convert it to string. So the output I am looking for here is:
}w0~(
ACTAG
some/random/identifier

julia> String(UInt8.([125, 119, 48, 126, 40]))
"}w0~("
Explanation
In Julia, a String is constructed from a sequence of bytes. If you are using only ASCII, each character maps to exactly one byte, so you can work directly on the raw data (which is also the fastest way to do it).
Note that since Julia Strings are immutable, creating a String from a Vector{UInt8} takes over the bytes and leaves the original vector empty. This usually also means that no data is copied when the String is created. Have a look at the example below:
julia> mybytes = UInt8.([125, 119, 48, 126, 40]);
julia> mystring = String(mybytes)
"}w0~("
julia> mybytes
0-element Array{UInt8,1}
Performance note
Strings in Julia are not interned. In analytics scenarios, always consider using Symbols instead of Strings. In some scenarios, using temperature=:hot instead of temperature="hot" can mean 3x shorter execution time.
EDIT - performance test
julia> using Random, BenchmarkTools; Random.seed!(0);
julia> bb = rand(33:126, 1000);
julia> @btime join(Char.($bb));
31.573 μs (13 allocations: 6.56 KiB)
julia> @btime String(UInt8.($bb));
711.111 ns (2 allocations: 2.13 KiB)
String(UInt8.(bb)) is over 40x faster and uses about a third of the memory.

I found a workable solution for now. I am sure there are more efficient solutions out there.
join(Char(i) for i in Vector{Int8}(FASTQ.quality(record, :sanger)) .+ 33) produces the output I require.

Related

Golang readers: Why write int64 numbers using the bitwise operator <<?

I have come across the following code when dealing with Go readers to limit the number of bytes read from a remote client when sending a file through multipart upload (e.g. in Postman).
r.Body = http.MaxBytesReader(w, r.Body, 32<<20+1024)
If I am not mistaken, the above notation should represent 33555456 bytes, or 33.555456 MB (32 * 2 ^ 20) + 1024. Or is this number not correct?
What I don't understand is:
Why did the author write it like this? Why use 20 and not some other number?
Why did the author add +1024 at all? Why didn't he write 33 MB instead?
Would it be OK to write 33555456 directly as int64?
If I am not mistaken, the above notation should represent 33555456 bytes, or 33.555456 MB (32 * 2 ^ 20) + 1024. Or is this number not correct?
Correct. You can trivially check it yourself.
fmt.Println(32<<20+1024)
Why didn't he write 33 MB instead?
Because this number is not 33 MB. 33 * 1024 * 1024 = 34603008
would it be OK to write 33555456 directly as int64?
Naturally. That's what it likely is reduced to during compilation anyway. This notation is likely easier to read, once you figure out the logic behind 32, 20 and 1024.
Ease of reading is why I almost always (when not using ruby) write constants like "50 MB" as 50 * 1024 * 1024 and "30 days" as 30 * 86400, etc.

Wireshark: read 8 bytes of timestamp

I'm new to writing dissectors in C and I came across the need to read an 8-byte timestamp from a packet.
I'm trying the following code:
g_print("offset=%d, starttime=0x%08x\n", offset, tvb_get_letoh64(tvb, offset));
and I get:
offset=8, starttime=0x0362ea14
which is only 4 bytes out of the 8 I was expecting.
How can I read it so the output would be:
offset=8, starttime=0x14ea620305779840
I also tried reading it using:
g_print("offset=%d, starttime=0x%08x\n", offset, tvb_get_bits64(tvb, 64, 32, ENC_LITTLE_ENDIAN));
g_print("offset=%d, starttime=0x%08x\n", offset, tvb_get_bits64(tvb, 64, 64, ENC_LITTLE_ENDIAN));
The first call printed the first 4 bytes of the timestamp and the second call printed the last 4 bytes. I'm missing something very basic...
Second question: assuming I get the value right and convert it into an nstime_t, how can I format it as a date/time, something like:
YYYY-MM-DDZHH:MM:SS:MMMM
Thank you so much!
What output do you get with this?
g_print("offset=%d, starttime=0x%08lx\n", offset, tvb_get_letoh64(tvb, offset));
As for your 2nd question, what is the meaning of these 8 bytes? Maybe you can declare your hf variable using FT_ABSOLUTE_TIME and use something like proto_tree_add_time(), proto_tree_add_time_item(), proto_tree_add_time_format_value() or proto_tree_add_time_format()?

Ruby: How do I convert from UNIX struct tm?

I'm trying to work with Linux's RTC in Ruby. The RTC driver, via ioctl, returns something very similar to struct tm, as found in the standard time.h file. Alas, I cannot find a standard Ruby method that understands this structure (month number is 0-based, year is 1900-based). Short of some trivial coding, is there a standard library/object in Ruby that can convert a tm struct/array into a Time object?
The current solution is:
rtctm_raw=rtc.unpack("iiiiii") # see rtc(4) or time.h
rtctm=[ *rtctm_raw, 0,0,0,0 ]
rtctm[4]+=1
rtctm[5]+=1900
rtc_values=Time.gm(*rtctm)
But I consider this ugly, since one would think Ruby's "gm" and "mktime" calls mirror the POSIX counterparts. But they don't. If such calls are available, I would prefer to use them.
If there's an offset, just apply it before creating a Time instance:
tm_struct = {
  tm_year: 117,
  tm_mon: 2,
  tm_mday: 7,
  tm_hour: 14,
  tm_min: 32,
  tm_sec: 30
}
puts Time.local(
  tm_struct[:tm_year] + 1900,
  tm_struct[:tm_mon] + 1,
  *tm_struct.values_at(:tm_mday, :tm_hour, :tm_min, :tm_sec)
)
#=> 2017-03-07 14:32:30 +0100
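The same adjustment can be wrapped in a small helper for the six-integer array that the question obtains with unpack("iiiiii"). This is only a sketch: the name tm_to_time is hypothetical, and it assumes the fields arrive in struct tm order (sec, min, hour, mday, mon, year) and that the RTC keeps UTC.

```ruby
# Hypothetical helper (not part of Ruby's standard library): converts the
# first six struct tm fields into a Time.
# Assumes field order sec, min, hour, mday, mon, year and a UTC clock.
def tm_to_time(fields)
  sec, min, hour, mday, mon, year = fields
  # struct tm counts months from 0 and years from 1900.
  Time.utc(year + 1900, mon + 1, mday, hour, min, sec)
end

# Same values as the hash above, packed as six native ints the way the
# question's unpack call expects them.
raw = [30, 32, 14, 7, 2, 117].pack("i6")
t = tm_to_time(raw.unpack("i6"))
puts t
#=> 2017-03-07 14:32:30 UTC
```

If the clock is kept in local time instead, Time.local takes the same arguments as Time.utc.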

Converting String Object to Bytes and vice-versa using packer/unpacker Ruby

I want to convert a String object to bytes and vice versa.
Is it possible using Ruby's pack/unpack?
I am unable to find the format specifier to use:
pack_object = "Test".pack('x'), where x is the format specifier
unpacked_object = pack_object.unpack('x'), which should result in the string "Test"
String has a bytes method that returns an array of integers:
'Type'.bytes
#=> [84, 121, 112, 101]
The equivalent unpack directive is C*: (as already noted by cremno)
'Type'.unpack('C*')
#=> [84, 121, 112, 101]
Or the other way round:
[84, 121, 112, 101].pack('C*')
#=> "Type"
Note that pack returns a string in binary encoding.
Regarding your comment:
The output which I need is the same string which I packed
pack and unpack are counterparts, so you can use all kinds of directives:
'Type'.unpack('b*')
#=> ["00101010100111100000111010100110"]
['00101010100111100000111010100110'].pack('b*')
#=> 'Type'
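Because pack('C*') returns a string in binary (ASCII-8BIT) encoding, a round trip through bytes compares equal to the original only for pure ASCII text unless the encoding is restored afterwards. A small sketch of this, where the variable names are only illustrative:

```ruby
original = "T\u00EBst"               # UTF-8 string with one non-ASCII character
bytes    = original.unpack("C*")     # same result as original.bytes
packed   = bytes.pack("C*")          # binary (ASCII-8BIT) string

# Relabel a copy of the bytes with the original encoding; no byte changes.
restored = packed.dup.force_encoding(original.encoding)

p bytes
puts packed == original     # false: same bytes, different encoding
puts restored == original   # true
```

For strings that contain only ASCII characters, such as 'Type' above, the comparison already succeeds without force_encoding.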

Converting numeric bytes to character representation [duplicate]

I have the following numeric bytes that I would like to find out the character representation for, where/how do I do this?
239 187 191 104
Call chr method on each of these:
[239, 187, 191, 104].map(&:chr)
#=> ["\xEF", "\xBB", "\xBF", "h"]
# tilde, the last printable character
126.chr
#=> "~"
I think values from 127 upward are no longer printable ASCII characters.
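Calling chr on each value treats every byte separately, which hides the fact that 239 187 191 (EF BB BF) is the UTF-8 byte order mark, so only 104 is a visible character ("h"). A sketch that decodes the four bytes together, assuming the input is meant to be UTF-8:

```ruby
bytes = "239 187 191 104".split.map(&:to_i)

# pack('C*') builds a binary string; relabel it as UTF-8 to decode it.
text = bytes.pack("C*").force_encoding("UTF-8")

p text.valid_encoding?            # whether the bytes are well-formed UTF-8
p text.codepoints                 # code points as integers
p text.delete_prefix("\uFEFF")    # text without the byte order mark
```

String#delete_prefix needs Ruby 2.5 or later; on older versions, text.sub(/\A\uFEFF/, "") does the same job.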
Use Integer#chr (Fixnum#chr before Ruby 2.4), like so:
239.chr
#=> "\xEF"
If your input is a space separated string, you may use split and map:
"239 187 191 104".split.map(&:to_i).map(&:chr)
