Read binary file as string in Ruby - ruby

I need an easy way to take a tar file and convert it into a string (and vice versa). Is there a way to do this in Ruby? My best attempt was this:
file = File.open("path-to-file.tar.gz")
contents = ""
file.each {|line|
contents << line
}
I thought that would be enough to convert it to a string, but then when I try to write it back out like this...
newFile = File.open("test.tar.gz", "w")
newFile.write(contents)
It isn't the same file. Doing ls -l shows the files are of different sizes, although they are pretty close (and opening the file reveals most of the contents intact). Is there a small mistake I'm making or an entirely different (but workable) way to accomplish this?

First, you should open the file as a binary file. Then you can read the entire file in, in one command.
file = File.open("path-to-file.tar.gz", "rb")
contents = file.read
That will get you the entire file in a string.
After that, you probably want to file.close. If you don’t do that, file won’t be closed until it is garbage-collected, so it would be a slight waste of system resources while it is open.

If you need binary mode, you'll need to do it the hard way:
s = File.open(filename, 'rb') { |f| f.read }
If not, shorter and sweeter is:
s = IO.read(filename)

To avoid leaving the file open, it is best to pass a block to File.open. This way, the file will be closed after the block executes.
contents = File.open('path-to-file.tar.gz', 'rb') { |f| f.read }

how about some open/close safety.
string = File.open('file.txt', 'rb') { |file| file.read }

Ruby have binary reading
data = IO.binread(path/filaname)
or if less than Ruby 1.9.2
data = IO.read(path/file)

on os x these are the same for me... could this maybe be extra "\r" in windows?
in any case you may be better of with:
contents = File.read("e.tgz")
newFile = File.open("ee.tgz", "w")
newFile.write(contents)

You can probably encode the tar file in Base64. Base 64 will give you a pure ASCII representation of the file that you can store in a plain text file. Then you can retrieve the tar file by decoding the text back.
You do something like:
require 'base64'
file_contents = Base64.encode64(tar_file_data)
Have look at the Base64 Rubydocs to get a better idea.

Ruby 1.9+ has IO.binread (see #bardzo's answer) and also supports passing the encoding as an option to IO.read:
Ruby 1.9
data = File.read(name, {:encoding => 'BINARY'})
Ruby 2+
data = File.read(name, encoding: 'BINARY')
(Note in both cases that 'BINARY' is an alias for 'ASCII-8BIT'.)

If you can encode the tar file by Base64 (and storing it in a plain text file) you can use
File.open("my_tar.txt").each {|line| puts line}
or
File.new("name_file.txt", "r").each {|line| puts line}
to print each (text) line in the cmd.

Related

Changing information in a CSV file

I'm trying to write a ruby script that will read through a CSV file and prepend information to certain cells (for instance adding a path to a file). I am able to open and mutate the text just fine, but am having issues writing back to the CSV without overriding everything. This is a sample of what I have so far:
CSV.foreach(path) { |row|
text = row[0].to_s
new_text = "test:#{text}"
}
I would like to add something within that block that would then write new_textback to the same reference cell(row) in the file. The only way I have to found to write to a file is
CSV.open(path, "wb") { |row|
row << new_text
}
But I think that is bad practice since you are reopening the file within the file block already. Is there a better way I could do this?
EX: I have a CSV file that looks something like:
file,destination
test.txt,A101
and need it to be:
file,destination
path/test.txt,id:A101
Hope that makes sense. Thanks in advance!
Depending on the size if the file, you might consider loading the contents of the file into a local variable and then manipulating that, overwriting the original file.
lines = CSV.read(path)
File.open(path, "wb") do |file|
lines.each do |line|
text = line[0].to_s
line[0] = "test:#{text}" # Replace this with your editing logic
file.write CSV.generate_line(line)
end
end
Alternately, if the file is big, you could write each modified line to a new file along the way and then replace the old file with the new one at the end.
Given that you don't appear to be doing anything that draws on CSV capabilities, I'd recommend using Ruby's "in-place" option variable $-i.
Some of the stats software I use wants just the data, and can't deal with a header line. Here's a script I wrote a while back to (appear to) strip the first line out of one or more data files specified on the command-line.
#! /usr/bin/env ruby -w
#
# User supplies the name of one or more files to be "stripped"
# on the command-line.
#
# This script ignores the first line of each file.
# Subsequent lines of the file are copied to the new version.
#
# The operation saves each original input file with a suffix of
# ".orig" and then operates in-place on the specified files.
$-i = ".orig" # specify backup suffix
oldfilename = ""
ARGF.each do |line|
if ARGF.filename == oldfilename # If it's an old file
puts line # copy lines through.
else # If it's a new file remember it
oldfilename = ARGF.filename # but don't copy the first line.
end
end
Obviously you'd want to change the puts line pass-through to whatever edit operations you want to perform.
I like this solution because even if you screw it up, you've preserved your original file as its original name with .orig (or whatever suffix you choose) appended.

Reading and writing to and from files - can you do it the same way? (Ruby)

I'm in the process of learning Ruby and reading through Chris Pine's book. I'm learning how to read (and write) files, and came upon this example:
require 'yaml'
test_array = ['Give Quiche A Chance',
'Mutants Out!',
'Chameleonic Life-Forms, No Thanks']
test_string = test_array.to_yaml
filename = 'whatever.txt'
File.open filename, 'w' do |f|
f.write test_string
end
read_string = File.read filename
read_array = YAML::load read_string
puts(read_string == test_string)
puts(read_array == test_array )
The point of the example was to teach me about YAML, but my question is, if you can read a file with:
File.read filename
Can you write to a file in a similar way?:
File.write filename test_string
Sorry if it's a dumb question. I was just curious why it's written the way it was and if it had to be written that way.
Can you write to a file in a similar way?
Actually, yes. And it's pretty much exactly as you guessed:
File.write 'whatever.txt', test_array.to_yaml
I think it is amazing how intuitive Ruby can be.
See IO.write for more details. Note that IO.binwrite is also available, along with IO.read and IO.binread.
The Ruby File class will give you new and open but it inherits from the IO class so you get the read and write methods too.
I think the right way to write into a file is the following:
File.open(yourfile, 'w') { |file| file.write("your text") }
To brake this line down:
We first open the file setting the access mode ('w' to overwrite, 'a' to append, etc.)
We then actually write into the file
You can read or write to a file by specifying the mode you access it through. The Ruby File class is a subclass of IO.
The File class open or new methods take a path and a mode as arguments:
File.open('path', 'mode') alternatively: File.new('path','mode')
Example: to write to an existing file
somefile = File.open('./dir/subdirectory/file.txt', 'w')
##some code to write to file, eg:
array_of_links.each {|link| somefile.puts link }
somefile.close
See the source documentation as suggested above for more details, or similar question here: How to write to file in Ruby?

Trouble conceptualizing how to have LDA-Ruby read multiple .txt files

I am attempting to write a Ruby script that will look at a collection of unstructured plain text files and I am struggling with thinking through the best way to process these files. The current working version of my script for topic modeling is the following:
#!/usr/bin/env ruby -w
require 'rubygems'
require 'lda-ruby'
# Input a directory of files
FILES_DIRECTORY = ARGV[0]
File.open("files.csv", "w") do |f|
Dir.glob(FILES_DIRECTORY + "*.txt") do |filename|
file_id = File.basename(filename).gsub(".txt", "")
text = File.read(filename).clean
f.puts [file_id, text].join(",")
end
end
# Read csv
file = File.open("files.csv", "r") { |f| f.read }
# Train topics and infer
corpus = Lda::Corpus.new
corpus.add_document(Lda::TextDocument.new(corpus, file))
lda = Lda::Lda.new(corpus)
lda.verbose = false
lda.num_topics = 20
lda.em('random')
topics = lda.top_words(10)
puts topics
What I'm attempting to modify is having this program read through a collection of plain text files rather than a single file. It's not as easy as just tossing all the text files into a single file (as it currently does with files.csv) because, as I understand it, lda-ruby looks for multiple files to do a correct topic model rather than a single file. (I've come to this conclusion because there is little variance between having this script read a single text file [e.g., corpus.txt] that includes all the text, and the files.csv file.)
So, my question is how can I have lda-ruby iterate through these text files differently? Should the contents of the files be placed into a hash instead? If so, any pointers on where I should start with that? Or, should I scrap this and use a different LDA library?
Thanks ahead of time for any advice.
Basically, you just need to initialize the corpus before going through the directory and then add each file to the corpus in the block the same way you were previously adding your CSV file.
#!/usr/bin/env ruby -w
require 'rubygems'
require 'lda-ruby'
# Input a directory of files
FILES_DIRECTORY = ARGV[0]
corpus = Lda::Corpus.new
File.open("files.csv", "w") do |f|
Dir.glob(FILES_DIRECTORY + "*.txt") do |filename|
file = File.open(filename, "r") { |f| f.read }
corpus.add_document(Lda::TextDocument.new(corpus, file))
end
end
lda = Lda::Lda.new(corpus)
lda.verbose = false
lda.num_topics = 20
lda.em('random')
topics = lda.top_words(10)
puts topics
I know this is a rather old question, but I found this question while looking for a solution to a similar problem. Your code helped me so I thought my answer might be helpful to you or others.
If you have a directory of text files you want to use as documents, you can use the following line to create your corpus:
corpus = Lda::DirectoryCorpus.new('path/to/directory')

How to create a file in Ruby

I'm trying to create a new file and things don't seem to be working as I expect them too. Here's what I've tried:
File.new "out.txt"
File.open "out.txt"
File.new "out.txt","w"
File.open "out.txt","w"
According to everything I've read online all of those should work but every single one of them gives me this:
ERRNO::ENOENT: No such file or directory - out.txt
This happens from IRB as well as a Ruby script. What am I missing?
Use:
File.open("out.txt", [your-option-string]) {|f| f.write("write your stuff here") }
where your options are:
r - Read only. The file must exist.
w - Create an empty file for writing.
a - Append to a file.The file is created if it does not exist.
r+ - Open a file for update both reading and writing. The file must exist.
w+ - Create an empty file for both reading and writing.
a+ - Open a file for reading and appending. The file is created if it does not exist.
In your case, 'w' is preferable.
OR you could have:
out_file = File.new("out.txt", "w")
#...
out_file.puts("write your stuff here")
#...
out_file.close
Try
File.open("out.txt", "w") do |f|
f.write(data_you_want_to_write)
end
without using the
File.new "out.txt"
Try using "w+" as the write mode instead of just "w":
File.open("out.txt", "w+") { |file| file.write("boo!") }
OK, now I feel stupid. The first two definitely do not work but the second two do. Not sure how I convinced my self that I had tried them. Sorry for wasting everyone's time.
In case this helps anyone else, this can occur when you are trying to make a new file in a directory that does not exist.
If the objective is just to create a file, the most direct way I see is:
FileUtils.touch "foobar.txt"
The directory doesn't exist. Make sure it exists as open won't create those dirs for you.
I ran into this myself a while back.
File.new and File.open default to read mode ('r') as a safety mechanism, to avoid possibly overwriting a file. We have to explicitly tell Ruby to use write mode ('w' is the most common way) if we're going to output to the file.
If the text to be output is a string, rather than write:
File.open('foo.txt', 'w') { |fo| fo.puts "bar" }
or worse:
fo = File.open('foo.txt', 'w')
fo.puts "bar"
fo.close
Use the more succinct write:
File.write('foo.txt', 'bar')
write has modes allowed so we can use 'w', 'a', 'r+' if necessary.
open with a block is useful if you have to compute the output in an iterative loop and want to leave the file open as you do so. write is useful if you are going to output the content in one blast then close the file.
See the documentation for more information.
data = 'data you want inside the file'.
You can use File.write('name of file here', data)
You can also use constants instead of strings to specify the mode you want. The benefit is if you make a typo in a constant name, your program will raise an runtime exception.
The constants are File::RDONLY or File::WRONLY or File::CREAT. You can also combine them if you like.
Full description of file open modes on ruby-doc.org

Remove strings begining with 'AUTO_INCREMENT=' in 2 files

I am trying to create a ruby script that loads 2 .sql files and removes all strings that begin with 'AUTO_INCREMENT='
There are multiple occurrences of this in my .sql files and all I want is them to be removed from both files.
Thanks for any help or input as I am new to ruby and decided to give it a try.
Given the right regexp (the one below might not be the most correct given the syntax), and the answer given there to a similar question, it is rather straightforward to put a script together:
file_names = ['file1.sql', 'file2.sql']
file_names.each do |file_name|
text = File.read(file_name)
File.open(file_name, 'wb') do
|file|
file.write(text.gsub(/\s*AUTO_INCREMENT\s*(\=\s*[0-9]+)?/, ""))
end
end
Have you tried using Regex for this? If you want to remove the whole line, you could simply match ^AUTO_INCREMENT=.+$ and replace it with an empty string. That pattern should match an entire line beginning with AUTO_INCREMENT.
Here's a good site to learn Regex if you aren't familiar with it:
Hope that works for you.
You should read up on IO, String, Array for more details on methods you can use.
Here's how you might read, modify, and save the contents of one file:
# Opens a file for reading.
file = File.open("file1.txt")
# Reads all the contents into the string 'contents'.
contents = file.read
file.close
# Splits contents into an array of strings, one for each line.
lines = contents.split("\n")
# Delete any lines that start with AUTO_INCREMENT=
lines.reject! { |line| line =~ /^AUTO_INCREMENT=/ }
# Join the lines together into one string again.
new_contents = lines.join("\n")
# Open file for writing.
file = File.open("file1.txt", "w")
# Save new contents.
file.write(new_contents)
file.close

Resources