How does pack work in Ruby? - ruby

I am a tad confused about what I see here:
a = [ "a", "b", "c" ]
n = [ 65, 66, 67 ]
a.pack("A3A3A3") #=> "a b c "
a.pack("a3a3a3") #=> "a\000\000b\000\000c\000\000"
n.pack("ccc") #=> "ABC"
From the docs:
Packs the contents of arr into a binary sequence according to the directives in aTemplateString (see the table below). Directives "A", "a", and "Z" may be followed by a count, which gives the width of the resulting field.
Here are the directives:
So we're using the A directive 3 times it seems? What does it mean to pack the string a into an arbitrary binary string (space padded, count is width?) Can you help me understand the output? Why are there so many 0s?

In the first case, you're printing "a" but padding its length to 3 with spaces, hence the two spaces to get the total length to 3.
In the second case, you're doing the same but padding with null bytes instead (ASCII value 0). Null bytes in Ruby are printed (and can be read) using the escape syntax \000 (this is one character), so \000\000 is actually just two null bytes.

The variable n is irrelevant, so you can ignore it.
In the pack statements, the bytes "a", "b" and "c" are concatenated ("packed") into a single string, with padding between them. The padding is such that the number of bytes (the width) taken up by the contents plus the padding equals the number provided.
So in the first pack statement, the "a" is padded with two spaces to make these three bytes: "a.." where I've put a . in place of the spaces to make it clear. That is concatenated with the "b" and the "c" similarly padded, to produce "a..b..c..".
In the second pack statement, null characters ('\000') are used instead of spaces. The \xxx notation (called an "escape sequence") means the byte with octal value xxx. It's used when there isn't a useful ASCII character (like 'a' or ' ') to show. A null character has no useful ASCII character, so the \xxx notation is used instead.
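To make the padding visible, here is a small sketch built on the arrays from the question. The variable names space_padded and null_padded are only illustrative; inspecting the raw bytes shows that each \000 is a single byte, not four characters.

```ruby
a = ["a", "b", "c"]
n = [65, 66, 67]

space_padded = a.pack("A3A3A3")  # each field padded to 3 bytes with spaces
null_padded  = a.pack("a3a3a3")  # each field padded to 3 bytes with null bytes

p space_padded        # "a  b  c  "
p null_padded.bytes   # [97, 0, 0, 98, 0, 0, 99, 0, 0]

# "\000" (octal) and "\x00" (hex) are two spellings of the same single byte.
p "\000" == "\x00"    # true
p "\000".bytesize     # 1

# "c" takes one integer per directive and emits it as a single byte.
p n.pack("ccc")       # "ABC"
```

Both packed strings are 9 bytes long: three fields of width 3.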

Related

Checking if a text file is formatted in a specific way

I have a text file which contains instructions. I'm reading it using File.readlines(filename). I want to check that the file is formatted as follows:
Has 3 lines
Line 1: two integers (including negatives) separated by a space
Line 2: two integers (including negatives) separated by a space and 1 capitalised letter of the alphabet also separated by a space.
Line 3: capitalised letters of the alphabet without any spaces (or punctuation).
This is what the file should look like:
8 10
1 2 E
MMLMRMMRRMML
So far I have calculated the number of lines using File.readlines(filename).length. How do I check the format of each line, do I need to loop through the file?
EDIT:
I solved the problem by creating three methods containing regular expressions, then I passed each line into its method and used a conditional statement to check whether the output was true.
Suppose IO::read is used to return the following string str.
str = <<~END
8 10
1 2 E
MMLMRMMRRMML
END
#=> "8 10\n1 2 E\nMMLMRMMRRMML\n"
You can then test the string with a single regular expression:
r = /\A(-?\d+) \g<1>\n\g<1> \g<1> [A-Z]\n[A-Z]+\n\z/
str.match?(r)
#=> true
I could have written
r = /\A-?\d+ -?\d+\n-?\d+ -?\d+ [A-Z]\n[A-Z]+\n\z/
but that writes the integer pattern (-?\d+) four times. It's slightly shorter, and reduces the chance of error, to put the first occurrence in capture group 1 and then reuse it as a subexpression by calling it with \g<1> (not to be confused with a back-reference, which is written \k<1>). Alternatively, I could have used named capture groups:
r = /\A(?<int>-?\d+) \g<int>\n\g<int> \g<int> (?<cap>[A-Z])\n\g<cap>+\n\z/
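As a rough sketch of how this could be wrapped up and exercised, the constant FORMAT and the method valid_instructions? below are hypothetical names, not part of the question. The sample strings stand in for what File.read(filename) would return.

```ruby
# Whole-file pattern: line 1 has two integers, line 2 has two integers
# and a capital letter, line 3 has one or more capital letters.
FORMAT = /\A(?<int>-?\d+) \g<int>\n\g<int> \g<int> [A-Z]\n[A-Z]+\n\z/

# Hypothetical helper: true when the whole text matches the format.
def valid_instructions?(text)
  text.match?(FORMAT)
end

good      = "8 10\n1 2 E\nMMLMRMMRRMML\n"
negatives = "-8 10\n-1 -2 N\nM\n"
lowercase = "8 10\n1 2 e\nMMLMRMMRRMML\n"
two_lines = "8 10\n1 2 E\n"

p valid_instructions?(good)       # true
p valid_instructions?(negatives)  # true
p valid_instructions?(lowercase)  # false
p valid_instructions?(two_lines)  # false
```

Because the pattern is anchored with \A and \z, it checks the line count as well, so no separate loop over the lines is needed.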

String contains NUL bytes

I'm trying to decode this file, which is in IBM437, into readable UTF-8. I think I've almost got it, but I'm getting an ArgumentError saying the string contains null bytes. I know how to strip null bytes with gsub:
.gsub("\u0000", '')
However, I can't figure out where to apply it.
Here's the source:
def gather_info
  file = './lib/SETI_message.txt'
  File.read(file).each_line do |gather|
    packed = [gather].pack('b*')
    ec = Encoding::Converter.new(packed, 'utf-8')
    encoding_forced = packed.encode(ec)
    File.open('packed.txt', 'a+'){ |s| s.puts(encoding_forced.gsub("\u0000", '')) }
  end
end
gather_info
And here's the file
Can anyone tell me what I'm doing wrong here?
The following works for me:
file = File.read('SETI.txt')
packed = file.scan(/......../).map{|s| s.to_i(2)}.pack('U*')
File.write('packed.txt', packed)
Let's break file.scan(/......../).map{|s| s.to_i(2)}.pack('U*') down:
file.scan(/......../)
Here we break the huge string of 0s and 1s (the file) into an array of strings containing 8 characters each. The result looks like this: ['00001111', '11110000', ...].
arr.map{|s| s.to_i(2)}
From step 1 we have an array of strings, each representing one character in binary notation. We can convert one of those strings (called s) to an integer with s.to_i(2), because the argument 2 tells to_i to use base 2. So '00000011'.to_i(2) returns 3.
We apply this to every string by using map.
So we now have an array that looks like [98, 82, 49, 39, ...].
arr.pack('U*')
From step 2 we have an array of integers, each representing a character. We can now use the pack method to turn that array into a string. The directive we pass to pack is U, which tells it to treat each integer as a Unicode code point and encode it as UTF-8.
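Here is the same pipeline on a tiny input, as a sketch. The 16-character string bits is made up for illustration and is not taken from the SETI file; /.{8}/ is just a shorter way of writing eight dots.

```ruby
# Hypothetical sample: two 8-bit groups written out as text.
bits = "0100100001101001"

chunks = bits.scan(/.{8}/)             # step 1: split into 8-character strings
codes  = chunks.map { |s| s.to_i(2) }  # step 2: parse each string as base 2
text   = codes.pack("U*")              # step 3: code points to a UTF-8 string

p chunks         # ["01001000", "01101001"]
p codes          # [72, 105]
p text           # "Hi"
p text.encoding  # #<Encoding:UTF-8>
```

Note that scan silently drops any trailing characters that do not fill a complete group of 8, which also discards a final newline in the file.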

Ruby: %q with strings [case]

I encountered this line:
at = @seq.slice(@seq.length - 2, 2).count(%q[at])
where @seq is a string. I know how slice and %q work, but I don't get the idea of putting a variable at (which we define here) as an argument of [] after %q.
This code is quite verbose.
@seq.length - 2 gives the index of the second-to-last character in @seq.
@seq.slice(@seq.length - 2, 2) gives the last two characters in @seq.
Applying count(%q[at]) to it returns the number of characters in it that appear in %q[at] (i.e., "at"), so it counts every "a" and every "t". Since there are only two characters, the result is 0, 1, or 2.
The at inside %q[at] is not the variable at: it is just the text of a string literal. %q with delimiters is similar to a single-quoted string. In other words, %q[at], %q!at!, and %q{at} are all equivalent to 'at'.
%q[at]
# => "at"
P.S. %Q works similarly, but like double-quoted strings.
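A quick sketch of the points above. A local variable seq with a made-up value stands in for the instance variable from the question.

```ruby
seq = "gattaca"  # hypothetical value, for illustration only

last_two = seq.slice(seq.length - 2, 2)  # the last two characters
p last_two                  # "ca"

# count("at") counts characters that are "a" or "t", not the substring "at".
p last_two.count(%q[at])    # 1
p "at".count(%q[at])        # 2
p "cg".count(%q[at])        # 0

# %q[...] is just another way to write a single-quoted string literal.
p %q[at] == 'at'            # true

# A shorter way to take the last two characters.
p seq[-2, 2] == last_two    # true
```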

Splitting with empty space in Ruby [duplicate]

In both Ruby and JavaScript I can write the expression " x ".split(/[ ]+/). In JavaScript I get the somewhat reasonable result ["", "x", ""], but in Ruby (2.0.0) I get ["", "x"], which I find quite counterintuitive. I'm having trouble understanding how regular expressions work in Ruby. Why don't I get the same result as in JavaScript, or just ["x"]?
From the String#split documentation (emphasis my own):
split(pattern=$;, [limit])
If pattern is a String, then its contents are used as the delimiter when splitting str. If pattern is a single space, str is split on whitespace, with leading whitespace and runs of contiguous whitespace characters ignored.
If pattern is a Regexp, str is divided where the pattern matches. Whenever the pattern matches a zero-length string, str is split into individual characters. If pattern contains groups, the respective matches will be returned in the array as well.
If pattern is omitted, the value of $; is used. If $; is nil (which is the default), str is split on whitespace as if ' ' were specified.
If the limit parameter is omitted, trailing null fields are suppressed. If limit is a positive number, at most that number of fields will be returned (if limit is 1, the entire string is returned as the only entry in an array). If negative, there is no limit to the number of fields returned, and trailing null fields are not suppressed.
So if you were to use " x ".split(/[ ]+/, -1), you would get your expected result of ["", "x", ""].
*edited to reflect Wayne's comment
I found this in the C code for String#split, almost right at the end:
if (NIL_P(limit) && lim == 0) {
    long len;
    while ((len = RARRAY_LEN(result)) > 0 &&
           (tmp = RARRAY_AREF(result, len-1), RSTRING_LEN(tmp) == 0))
        rb_ary_pop(result);
}
So it actually pops empty strings off the end of the result array before returning! It looks like the creators of Ruby didn't want String#split to return a bunch of empty strings.
Notice the check for NIL_P(limit) -- this accords exactly with what the documentation says, as @dax pointed out.
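The three behaviours discussed in the answers can be compared side by side. This is only a small sketch, and the variable names are illustrative.

```ruby
s = " x "

default_split    = s.split(/[ ]+/)      # no limit: trailing empty fields dropped
keep_trailing    = s.split(/[ ]+/, -1)  # negative limit: trailing empties kept
whitespace_split = s.split(" ")         # single-space string: leading whitespace ignored too

p default_split     # ["", "x"]
p keep_trailing     # ["", "x", ""]
p whitespace_split  # ["x"]
```

The leading empty string survives in the first two cases because only trailing empty fields are popped off the result.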

How to count the number of space-delimited substrings in a string

Dim str as String
str = "30 40 50 60"
I want to count the number of substrings.
Expected Output: 4
(because there are 4 total values: 30, 40, 50, 60)
How can I accomplish this in VB6?
You could try this:
arrStr = Split(str, " ")
strCnt = UBound(arrStr) + 1
msgBox strCnt
Of course, if you've got Option Explicit set (which you should), then declare the variables above first.
Your request doesn't make any sense. A string is a sequence of text. The fact that that sequence of text contains numbers separated by spaces is quite irrelevant. Your string looks like this:
30 40 50 60
There are not 4 separate values, there is only one value, shown above—a single string.
You could also view the string as containing 11 individual characters, so it could be argued that the "count" of the string would be 11, but this doesn't get you any further towards your goal.
In order to get the result that you expect, you need to split the string into multiple strings at each space, producing 4 separate strings, each containing a 2-digit numeric value.
Of course, the real question is why you're storing this value in a string in the first place. If they're numeric values, you should store them in an array (for example, an array of Integers). Then you can easily obtain the number of elements in the array using the LBound() and UBound() functions.
I agree with everything Cody stated.
If you really wanted to, you could loop through the string character by character and count the number of times you find your delimiter. In your example the string is space-delimited, so you would simply count the number of spaces and add 1, but as Cody stated, those are not separate values.
Are you trying to parse text here or what? Regardless, I think what you really need to do is store your data into an array. Make your life easier, not more difficult.
