Write binary file from text string that represents hex bytes in Ruby

I have the following text string that represents hexadecimal bytes that should appear in the file I want to create:
str = "001104059419632801001B237100300381010A"
I want to create a file containing those bytes, so that when I open the created file with a hex editor I see the same bytes.
When I run this script
File.open("out.dat", 'w') {|f| f.write(str.unpack('H*')) }
it creates the file out.dat, but when I open this file in a hex editor it contains this:
5B2233303330333133313330333433303335333933343331333933363333333233383330333133303330333134323332333333373331333033303333333033303333333833313330333133303431225D
and I would like the content, when I open the file in a hex editor, to be the same as the text string
001104059419632801001B237100300381010A
How can I do this?
I hope this makes sense. Thanks.

You have to split the string into aligned byte pairs in the first place:
str.
  each_char.     # enumerator
  each_slice(2). # bytes
  map { |h, l| (h.to_i(16) * 16 + l.to_i(16)) }.
  pack('C*')
#⇒ "\x00\x11\x04\x05\x94\x19c(\x01\x00\e#q\x000\x03\x81\x01\n"
or, even better:
str.
  scan(/../).
  map { |b| b.to_i(16) }.
  pack('C*')
Now you might dump this to the file using e.g. IO#binwrite.
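For instance, a minimal end-to-end sketch, using out.dat from the question (for what it's worth, [str].pack('H*') builds the same byte string in one step):
str = "001104059419632801001B237100300381010A"
# Convert each pair of hex digits to one byte, then write the raw
# bytes in binary mode so nothing is re-encoded on the way out.
bytes = str.scan(/../).map { |b| b.to_i(16) }.pack('C*')
IO.binwrite('out.dat', bytes)
A hex editor will then show 00 11 04 05 94 19 63 28 01 00 1B 23 71 00 30 03 81 01 0A.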

Related

How to read a file's content and search for a string in multiple files

I have a text file, out.txt, that has around 100-plus entries like:
domain\1esrt
domain\2345p
yrtfj
tkpdp
....
....
I have to read out.txt line by line and check whether strings like "domain\1esrt" are present in any of the files under a different directory. If present, delete only that string occurrence and save the file.
I know how to read a file line by line and also how to grep for a string in multiple files in a directory, but I'm not sure how to combine the two to achieve the above requirement.
You can create an array with all the words or strings you want to find and then delete/replace:
strings_to_delete = ['aaa', 'domain\1esrt', 'delete_me']
Then read the file and use map to create an array with all the lines that don't match any of the elements in the array created before:
# read the file 'text.txt' line by line
lines = File.open('text.txt', 'r').map do |line|
  # keep the line unless it matches some value in the strings_to_delete array
  line unless strings_to_delete.any? do |word|
    word == line.strip
  end
end.reject(&:nil?) # then remove the nil elements
And then open the file again, this time to write to it all the lines which didn't match the values in the strings_to_delete array:
File.open('text.txt', 'w') do |file|
  lines.each do |element|
    file.write element
  end
end
The txt file looks like:
aaa
domain\1esrt
domain\2345p
yrtfj
tkpdp
....
....
delete_me
I don't know how it will perform on a bigger file; anyway, I hope it helps.
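For reference, the same read-filter-write round trip as above can be written more compactly; a sketch using File.readlines and File.write with the filenames from the answer:
strings_to_delete = ['aaa', 'domain\1esrt', 'delete_me']
# Keep only the lines whose stripped content is not on the list,
# then write them back in one pass.
kept = File.readlines('text.txt').reject do |line|
  strings_to_delete.include?(line.strip)
end
File.write('text.txt', kept.join)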
I would suggest using gsub here. It will run a regex search on the string and replace each match with the second parameter. So if you only have to remove a single string, I believe you can simply run gsub on that string (including the newline) and replace it with an empty string:
new_file_text = text.gsub(/regex_string\n/, "")
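Applied to a whole file, that could look like the following sketch (out.txt and the target string come from the question; Regexp.escape guards the backslash):
text = File.read('out.txt')
# Remove every occurrence of the string together with its newline.
new_file_text = text.gsub(/#{Regexp.escape('domain\1esrt')}\n/, "")
File.write('out.txt', new_file_text)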

String contains NUL bytes

I'm trying to decode this file, which is in IBM437, into readable UTF-8. I think I've almost got it, but I'm getting an ArgumentError where the string contains NUL bytes. I'm aware of how to gsub out NUL bytes using .gsub("\u0000", ''); however, I can't figure out where to gsub the bytes out.
Here's the source:
def gather_info
  file = './lib/SETI_message.txt'
  File.read(file).each_line do |gather|
    packed = [gather].pack('b*')
    ec = Encoding::Converter.new(packed, 'utf-8')
    encoding_forced = packed.encode(ec)
    File.open('packed.txt', 'a+') { |s| s.puts(encoding_forced.gsub("\u0000", '')) }
  end
end
gather_info
And here's the file
Can anyone tell me what I'm doing wrong here?
The following works for me:
file = File.read('SETI.txt')
packed = file.scan(/......../).map{|s| s.to_i(2)}.pack('U*')
File.write('packed.txt', packed)
Let's break file.scan(/......../).map{|s| s.to_i(2)}.pack('U*') down:
file.scan(/......../)
Here we break the huge string of 0s and 1s (the file) into an array of strings containing 8 characters each. It looks like this: ['00001111', '11110000', ...].
arr.map{|s| s.to_i(2)}
From step 1 we got an array of strings representing the different characters in binary notation. We can convert one of those strings (called s) by applying s.to_i(2); the parameter 2 tells the to_i method to use base 2. So '00000011'.to_i(2) returns 3.
We apply this to all the characters by using map.
So we now have an array that looks like [98, 82, 49, 39, ...].
arr.pack('U*')
From step 2 we have an array of integers, each representing a character. We can now use the pack method to transform our array of integers into a string. The parameter we use for pack is U, to tell it that the integers are in fact UTF-8 codepoints.
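To see the whole pipeline at work on a tiny, made-up input:
bits  = '0100100001101001'          # "Hi" as a string of bits
bytes = bits.scan(/......../)       # => ["01001000", "01101001"]
codes = bytes.map { |s| s.to_i(2) } # => [72, 105]
codes.pack('U*')                    # => "Hi"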

Ruby script which can replace a string in a binary file with a different, but same-length, string?

I would like to write a Ruby script (repl.rb) which can replace a string in a binary file (the string is defined by a regex) with a different, but same-length, string.
It works like a filter: it outputs to STDOUT, which can be redirected (ruby repl.rb data.bin > data2.bin), and the regex and replacement can be hardcoded. My approach is:
#!/usr/bin/ruby
fn = ARGV[0]
regex = /\-\-[0-9a-z]{32,32}\-\-/
replacement = "--0ca2765b4fd186d6fc7c0ce385f0e9d9--"
blk_size = 1024
File.open(fn, "rb") {|f|
  until f.eof?
    data = f.read(blk_size)
    data.gsub!(regex, replacement)
    print data
  end
}
My problem is that the string can be positioned in the file in a way that interferes with the block size used when reading the binary file. For example, when blk_size = 1024 and the first occurrence of the string begins at byte position 1000, the match straddles two blocks and I will not find it in the data variable. The same can happen in any later read cycle. Should I process the whole file twice with different block sizes to avoid this worst-case scenario, or is there another approach?
I would posit that a tool like sed might be a better choice for this. That said, here's an idea: Read block 1 and block 2 and join them into a single string, then perform the replacement on the combined string. Split them apart again and print block 1. Then read block 3 and join block 2 and 3 and perform the replacement as above. Split them again and print block 2. Repeat until the end of the file. I haven't tested it, but it ought to look something like this:
File.open(fn, "rb") do |f|
last_block, this_block = nil
while not f.eof?
last_block, this_block = this_block, f.read(blk_size)
data = "#{last_block}#{this_block}".gsub(regex, str)
last_block, this_block = data.slice!(0, blk_size), data
print last_block
end
print this_block
end
There's probably a nontrivial performance penalty for doing it this way, but it could be acceptable depending on your use case.
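A quick sanity check of the window logic, with a made-up token and StringIO standing in for the file (the block size must be at least the match length for a two-block window to catch a straddling match):
require 'stringio'
regex       = /\-\-[0-9a-z]{32,32}\-\-/
replacement = "--0ca2765b4fd186d6fc7c0ce385f0e9d9--"
blk_size    = 40
# 20 filler bytes push the 36-byte token across the first block boundary.
input = "x" * 20 + "--" + "0" * 32 + "--" + "y" * 20
f = StringIO.new(input)
last_block = this_block = nil
until f.eof?
  last_block, this_block = this_block, f.read(blk_size)
  data = "#{last_block}#{this_block}".gsub(regex, replacement)
  last_block, this_block = data.slice!(0, last_block.to_s.size), data
  print last_block
end
print this_block
# prints the input with the zero token replaced, despite the boundary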
Maybe a cheeky
f.pos = f.pos - replacement.size
at the end of the while loop, just before reading the next chunk? You would then also need to take care not to print the rewound overlap twice.

Problem with parsing string from Excel file

I have Ruby code to parse data in an Excel file using the Parseexcel gem. I need to save two columns of that file into a Hash. Here is my code:
worksheet.each { |row|
  if row != nil
    key = row.at(1).to_s.strip
    value = row.at(0).to_s.strip
    if !parts.has_key?(key) and key.length > 0
      parts[key] = value
    end
  end
}
However, it still saves duplicate keys into the hash: "020098-10". I checked the Excel file at the specified rows and found the difference: " 020098-10" and "020098-10". The first one has a leading space while the second doesn't. I don't understand: isn't the strip method supposed to remove all leading and trailing whitespace?
Also, when I tried to print out key.length, it gave me these weird numbers:
020098-10 length 18
020098-10 length 17
which should be 9....
If you inspect the strings you receive, you will probably get something like:
" \x000\x002\x000\x000\x009\x008\x00-\x001\x000\x00"
This happens because of the string's encoding. Excel works with Unicode, while Ruby uses ISO-8859-1 by default; the exact encodings will differ between platforms.
You need to convert the data you receive from Excel to a printable encoding.
However, you should not convert strings created in Ruby that way, as you will end up with garbage.
Consider this code:
@enc = Encoding::Converter.new("UTF-16LE", "UTF-8")

def convert(cell)
  if cell.numeric
    cell.value
  else
    @enc.convert(cell.value).strip
  end
end
parts = {}
worksheet.each do |row|
  next unless row
  key = convert row.at(1)
  value = convert row.at(0)
  parts[key] = value unless parts.has_key?(key) or key.empty?
end
You may want to change the encodings to different ones.
The newer Spreadsheet gem handles charset conversion automatically for you, to UTF-8 by default I think, but you can change it, so I'd recommend using it instead.
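With the Spreadsheet gem, the same loop could look roughly like this (the filename is an assumption; UTF-8 is its default client encoding, set explicitly here only for clarity):
require 'spreadsheet'

Spreadsheet.client_encoding = 'UTF-8'
book  = Spreadsheet.open('parts.xls')
sheet = book.worksheet(0)

parts = {}
sheet.each do |row|
  next unless row
  key   = row[1].to_s.strip
  value = row[0].to_s.strip
  parts[key] = value unless parts.has_key?(key) || key.empty?
end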

Ruby string encoding problem

I've looked at the other Ruby/encoding-related posts but haven't been able to figure out why the following is not working. Likely just because I'm dense, but here's the situation.
Using Ruby 1.9 on Windows. I have a set of CSV files that need some data appended to the end of each line. Whenever I run my script, the appended characters are gibberish. The input text appears to be IBM437 encoded, whereas the string I'm appending starts as US-ASCII. Nothing I've tried with respect to forcing the encoding of the input strings or the appended string changes the resulting output. I'm stumped. The encoding below is simply the last one I tried.
def append_salesperson(txt, salesperson)
  if txt.length > 2
    return txt.chomp.force_encoding('US-ASCII') + %(, "", "", "#{salesperson}")
  end
end

salespeople = Hash["fname", "Record Manager"]

outfile = File.open("ActData.csv", "w:US-ASCII")
salespeople.each do |filename, recordManager|
  infile = File.open("#{filename}.txt")
  infile.each do |line|
    outfile.puts append_salesperson(line, recordManager)
  end
  infile.close
end
outfile.close
One small note related to your question: in your CSV data %(, "", "", "#{salesperson}"), you have a space character before the double quotes. This can cause #{salesperson} to be interpreted as multiple fields if there is a comma in that text. To fix this, there must be no whitespace between the comma and the double quotes. Example: "this is a field","Last, First","and so on". This is one little gotcha I ran into when creating reports meant to be viewed in Excel.
RFC 4180, Common Format and MIME Type for Comma-Separated Values (CSV) Files, describes the grammar of a CSV file for reference.
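One way to sidestep hand-built quoting entirely is Ruby's standard CSV library; a small sketch using the field values from above:
require 'csv'
# CSV.generate_line handles commas, quotes, and escaping for you.
CSV.generate_line(["this is a field", "Last, First", "and so on"])
#=> "this is a field,\"Last, First\",and so on\n"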
maybe txt.chomp.force_encoding('US-ASCII') + %(, "", "", "#{salesperson.force_encoding('something')}")
?
It sounds like the CSV data is coming in as UTF-16... hence puts shows the printable character (the first byte) plus a space (the second byte).
Have you tried encoding your appended data with .force_encoding(Encoding::UTF_16LE) or .force_encoding(Encoding::UTF_16BE)?
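If that turns out to be the case, one option, rather than calling force_encoding on every string, is to declare the external and internal encodings when opening the file so each line arrives already transcoded. A sketch under the assumption that the input really is UTF-16LE (filenames and the appended value are illustrative):
salesperson = "Record Manager"
File.open("ActData.csv", "w:UTF-8") do |outfile|
  # "rb:BOM|UTF-16LE:UTF-8" would additionally strip a byte-order mark.
  File.open("fname.txt", "rb:UTF-16LE:UTF-8") do |infile|
    infile.each_line do |line|
      outfile.puts line.chomp + %(,"","","#{salesperson}") if line.length > 2
    end
  end
end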
