Why won't file contents copy? - ruby

I am trying to duplicate the contents of one file to another. The new file is created, but the contents of the original are not copied into it. I am not sure where I am going wrong here.
puts "What file do you want to copy?"
print ">"
to_duplicate = STDIN.gets.chomp
puts "what do you want to call the new file?"
print ">"
output_file = STDIN.gets.chomp
puts "copying #{to_duplicate} to #{output_file}."
input = File.open(to_duplicate, 'r') ; prepare_file = input.read
output = File.open(output_file, 'w')
output.write(prepare_file)
puts "finished duplicating files."

Use FileUtils#cp
The easiest way to copy files is with FileUtils#cp from the Ruby Standard Library. For example:
require 'fileutils'
FileUtils.cp '/tmp/foo', '/tmp/bar'
You can certainly use variables to store filenames collected from standard input if you like. Just don't reinvent the wheel if you don't have to, and leverage the standard libraries whenever you can.
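To tie this back to the script in the question, here is a minimal sketch that copies a file using filenames held in variables. The helper name duplicate_file is hypothetical (it is not part of FileUtils); only FileUtils.cp and File.size come from the standard library.

```ruby
require 'fileutils'

# Hypothetical helper (not part of FileUtils): copies one file to another
# and returns the size in bytes of the newly written file.
def duplicate_file(to_duplicate, output_file)
  FileUtils.cp(to_duplicate, output_file)
  File.size(output_file)
end
```

In the original script you would call duplicate_file(to_duplicate, output_file) after the two STDIN.gets.chomp prompts, in place of the manual open, read, and write calls.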

Related

Changing information in a CSV file

I'm trying to write a Ruby script that will read through a CSV file and prepend information to certain cells (for instance, adding a path to a file name). I am able to open and mutate the text just fine, but am having issues writing back to the CSV without overwriting everything. This is a sample of what I have so far:
CSV.foreach(path) { |row|
text = row[0].to_s
new_text = "test:#{text}"
}
I would like to add something within that block that would then write new_text back to the same cell (row) in the file. The only way I have found to write to a file is
CSV.open(path, "wb") { |row|
row << new_text
}
But I think that is bad practice since you are reopening the file within the file block already. Is there a better way I could do this?
EX: I have a CSV file that looks something like:
file,destination
test.txt,A101
and need it to be:
file,destination
path/test.txt,id:A101
Hope that makes sense. Thanks in advance!
Depending on the size of the file, you might consider loading the contents of the file into a local variable, manipulating that, and then overwriting the original file.
lines = CSV.read(path)
File.open(path, "wb") do |file|
  lines.each do |line|
    text = line[0].to_s
    line[0] = "test:#{text}" # Replace this with your editing logic
    file.write CSV.generate_line(line)
  end
end
Alternately, if the file is big, you could write each modified line to a new file along the way and then replace the old file with the new one at the end.
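A rough sketch of that second approach follows, streaming rows into a temporary sibling file and then moving it over the original. The helper name prefix_csv_cells, the ".tmp" suffix, and the default prefixes are assumptions made for illustration, based on the sample file in the question; the first row is treated as a header and copied through unchanged.

```ruby
require 'csv'
require 'fileutils'

# Hypothetical helper: rewrites the CSV at `path` row by row through a
# temporary file, so the whole file never has to sit in memory.
def prefix_csv_cells(path, file_prefix: 'path/', dest_prefix: 'id:')
  tmp_path = "#{path}.tmp" # assumed naming scheme for the temporary file
  header_seen = false

  CSV.open(tmp_path, 'w') do |out|
    CSV.foreach(path) do |row|
      if header_seen
        # Replace this with your own editing logic.
        out << ["#{file_prefix}#{row[0]}", "#{dest_prefix}#{row[1]}"]
      else
        out << row # copy the header row through untouched
        header_seen = true
      end
    end
  end

  # Only replace the original once the new file is completely written.
  FileUtils.mv(tmp_path, path)
end
```

Because the original is replaced only at the very end, an exception part-way through leaves the source CSV untouched (though a stray .tmp file may remain).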
Given that you don't appear to be doing anything that draws on CSV capabilities, I'd recommend using Ruby's "in-place" option variable $-i.
Some of the stats software I use wants just the data, and can't deal with a header line. Here's a script I wrote a while back to (appear to) strip the first line out of one or more data files specified on the command-line.
#! /usr/bin/env ruby -w
#
# User supplies the name of one or more files to be "stripped"
# on the command-line.
#
# This script ignores the first line of each file.
# Subsequent lines of the file are copied to the new version.
#
# The operation saves each original input file with a suffix of
# ".orig" and then operates in-place on the specified files.
$-i = ".orig" # specify backup suffix
oldfilename = ""
ARGF.each do |line|
  if ARGF.filename == oldfilename  # If it's an old file
    puts line                      # copy lines through.
  else                             # If it's a new file remember it
    oldfilename = ARGF.filename    # but don't copy the first line.
  end
end
Obviously you'd want to change the puts line pass-through to whatever edit operations you want to perform.
I like this solution because even if you screw it up, you've preserved your original file as its original name with .orig (or whatever suffix you choose) appended.

Read file after writing in same script (Ruby the Hard Way ex16)

Here is my code:
filename = ARGV.first
puts "We're going to erase #{filename}"
puts "If you don't want that, hit CTRL-C (^C)."
puts "If you do want that, hit RETURN."
$stdin.gets
puts "Opening the file..."
target = open(filename, 'w')
puts "Truncating the file. Goodbye!"
target.truncate(0)
puts "Now I'm going to ask you for three lines."
print "line 1: "
line1 = $stdin.gets.chomp
print "line 2: "
line2 = $stdin.gets.chomp
print "line 3: "
line3 = $stdin.gets.chomp
puts "I'm going to write these to the file."
target.write(line1)
target.write("\n")
target.write(line2)
target.write("\n")
target.write(line3)
target.write("\n")
print target.read
puts "And finally, we close it."
target.close
I'm trying to get it to write and then read. It works if I do target.close and then target = open(filename) again at the bottom of the script. Is there another way?
I saw this post about Python explaining that you need to close a file after writing to it. Does the same thing apply to Ruby? Do I need to use flush?
Also should I be using parentheses after read and close? The example does not.
There are two ways to approach this. You can, as you've done, open the file for writing, write to it, close the file, and reopen it for reading. This is fine. Closing the file will flush it to disk and reopening it will put you back at the beginning of the file.
Alternatively you can open a file for both reading and writing and manually move around within the file, like a cursor in an editor. The options to do this are defined in IO.new.
The problem with your code is this.
target.write("\n")
print target.read
At this point you've been writing to the file. The target file pointer is pointing at the end of the file, like a cursor in an editor. When you call target.read, it reads from the end of the file, so you get nothing. You'd have to go back to the beginning of the file first with rewind.
target.write("\n")
target.rewind
print target.read
You'll also have to open the file for reading and writing. w+ can do that, and truncate the file for you.
puts "Opening the file..."
target = File.open(filename, 'w+')
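Putting those two changes together, a minimal sketch of the write-then-read flow might look like this. The helper name write_then_read is hypothetical; it takes the lines as an array instead of prompting for them, so it can be run without a terminal.

```ruby
# Hypothetical helper: writes the given lines to `filename`, then reads
# them back through the same file handle.
def write_then_read(filename, lines)
  # 'w+' opens for reading and writing and truncates any existing content.
  File.open(filename, 'w+') do |target|
    lines.each { |line| target.puts(line) }
    target.rewind # move the file pointer back to the start
    target.read   # the block's value is returned by File.open
  end
end
```

The block form of File.open also closes the file when the block ends, so no explicit target.close is needed.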
This is an advanced technique, most often useful when you want to hold a lock on a file during the whole read-and-write process so that nobody else can work on the file while you do. Generally you do this when you're reading and then writing. For example, if you had a counter in a file, you would want to read it, increment it, and make sure nobody can write in between.
def read_and_update_counter
  value = 0
  # Open for reading and writing, create the file if it doesn't exist
  File.open("counter", File::RDWR|File::CREAT, 0644) {|f|
    # Get an exclusive lock to prevent anyone else from using the
    # file while we're updating it (as long as they also try to flock)
    f.flock(File::LOCK_EX)
    # read the value
    value = f.read.to_i
    # go back to the beginning of the file
    f.rewind
    # Increment and write the new value
    f.write("#{value + 1}\n")
    # Flush the changes to the file out of the in-memory IO cache
    # and to disk.
    f.flush
    # Get rid of any other garbage that might be at the end of the file
    f.truncate(f.pos)
  }
  # File.open automatically closes the file for us
  return value
end
3.times { puts read_and_update_counter }

Trouble conceptualizing how to have LDA-Ruby read multiple .txt files

I am attempting to write a Ruby script that will look at a collection of unstructured plain text files and I am struggling with thinking through the best way to process these files. The current working version of my script for topic modeling is the following:
#!/usr/bin/env ruby -w
require 'rubygems'
require 'lda-ruby'
# Input a directory of files
FILES_DIRECTORY = ARGV[0]
File.open("files.csv", "w") do |f|
Dir.glob(FILES_DIRECTORY + "*.txt") do |filename|
file_id = File.basename(filename).gsub(".txt", "")
text = File.read(filename).clean
f.puts [file_id, text].join(",")
end
end
# Read csv
file = File.open("files.csv", "r") { |f| f.read }
# Train topics and infer
corpus = Lda::Corpus.new
corpus.add_document(Lda::TextDocument.new(corpus, file))
lda = Lda::Lda.new(corpus)
lda.verbose = false
lda.num_topics = 20
lda.em('random')
topics = lda.top_words(10)
puts topics
What I'm attempting to modify is having this program read through a collection of plain text files rather than a single file. It's not as easy as just tossing all the text files into a single file (as it currently does with files.csv) because, as I understand it, lda-ruby looks for multiple files to do a correct topic model rather than a single file. (I've come to this conclusion because there is little variance between having this script read a single text file [e.g., corpus.txt] that includes all the text, and the files.csv file.)
So, my question is how can I have lda-ruby iterate through these text files differently? Should the contents of the files be placed into a hash instead? If so, any pointers on where I should start with that? Or, should I scrap this and use a different LDA library?
Thanks ahead of time for any advice.
Basically, you just need to initialize the corpus before going through the directory and then add each file to the corpus in the block the same way you were previously adding your CSV file.
#!/usr/bin/env ruby -w
require 'rubygems'
require 'lda-ruby'
# Input a directory of files
FILES_DIRECTORY = ARGV[0]
corpus = Lda::Corpus.new
# Add each text file to the corpus as its own document.
# (The intermediate files.csv is no longer needed.)
Dir.glob(File.join(FILES_DIRECTORY, "*.txt")) do |filename|
  text = File.read(filename)
  corpus.add_document(Lda::TextDocument.new(corpus, text))
end
lda = Lda::Lda.new(corpus)
lda.verbose = false
lda.num_topics = 20
lda.em('random')
topics = lda.top_words(10)
puts topics
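On the question of whether to put the file contents into a hash: that is optional, but it can make it easier to inspect what will be fed to the corpus. The sketch below does not call lda-ruby at all; the helper name read_documents is hypothetical, and each value in the returned hash could then be passed to Lda::TextDocument.new(corpus, text) in the same way as in the loop above.

```ruby
# Hypothetical helper: maps each .txt file's base name to its contents.
def read_documents(directory)
  pattern = File.join(directory, '*.txt')
  Dir.glob(pattern).sort.each_with_object({}) do |filename, documents|
    file_id = File.basename(filename, '.txt') # strip directory and extension
    documents[file_id] = File.read(filename)
  end
end
```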
I know this is a rather old question, but I found this question while looking for a solution to a similar problem. Your code helped me so I thought my answer might be helpful to you or others.
If you have a directory of text files you want to use as documents, you can use the following line to create your corpus:
corpus = Lda::DirectoryCorpus.new('path/to/directory')

Weird Ruby IO with Tempfile

This is driving me crazy. Consider the following:
require 'open-uri'
require 'tempfile'
# set up tempfile
extname = File.extname file_url
basename = File.basename(file_url, extname)
file = Tempfile.new([basename, extname])
# read from URI into tempfile
uri = URI.parse(file_url)
num_bytes_written = file.write(uri.read)
puts "Wrote #{num_bytes_written} bytes"
# Reading from my tempfile
puts "Opening: #{file.path} >>"
puts "#### BEGINNING OF FILE ####"
puts File.open(file.path,'rb').read
puts "#### END OF FILE ####"
It looks like bytes get written, but when I try to open the file, it's empty. What's up?!
And to make it weirder, everything works in the Rails console, but not when executed by a worker triggered by Resque.
Any ideas? Thanks guys
This is a problem of buffering. You need to flush the IO buffer to disk before trying to read it. Either file.close (if you've finished with it) or file.flush before doing the File.open for the read.
Update
I hadn't thought about this, but you don't need to reopen the temp file just to read it. It's already open for writing and reading, all you need to do is seek to the start of the file before reading. This way you don't have to do the flush (because you're actually reading from the buffer)...
# starting partway into your code...
num_bytes_written = file.write(uri.read)
puts "Wrote #{num_bytes_written} bytes"
puts "No need to open #{file.path} >>"
puts "### BEGINNING OF FILE ###"
file.rewind # set the cursor to the start of the buffer
puts file.read # cursor is back at the end of the buffer now
puts "### END OF FILE ###"
Another Update
After a comment from @carp, I adjusted the code above to use rewind instead of seek 0, because rewind also resets lineno to 0 (if you were using lineno, not having it reset would be very confusing). It is also a more expressive method name.
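Both fixes can be seen side by side in the following sketch, which uses a local string instead of a download so it runs without network access. The helper name tempfile_round_trip and the "example" file name prefix are made up for illustration.

```ruby
require 'tempfile'

# Hypothetical helper: writes `payload` to a Tempfile and reads it back
# two ways. Returns [contents read via the path, contents read via the handle].
def tempfile_round_trip(payload)
  file = Tempfile.new(['example', '.txt'])
  begin
    file.write(payload)

    # Option 1: flush the buffer, then open the path separately.
    file.flush
    from_disk = File.binread(file.path)

    # Option 2: rewind the handle that is already open and read from it.
    file.rewind
    from_handle = file.read

    [from_disk, from_handle]
  ensure
    file.close
    file.unlink # remove the temporary file
  end
end
```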
Always close your files. Try using this style in your code, to avoid such mistakes:
File.open(myfile,"w") {|f| f.puts content }
This way, it will automatically call close when the block ends.

Read binary file as string in Ruby

I need an easy way to take a tar file and convert it into a string (and vice versa). Is there a way to do this in Ruby? My best attempt was this:
file = File.open("path-to-file.tar.gz")
contents = ""
file.each {|line|
contents << line
}
I thought that would be enough to convert it to a string, but then when I try to write it back out like this...
newFile = File.open("test.tar.gz", "w")
newFile.write(contents)
It isn't the same file. Doing ls -l shows the files are of different sizes, although they are pretty close (and opening the file reveals most of the contents intact). Is there a small mistake I'm making or an entirely different (but workable) way to accomplish this?
First, you should open the file as a binary file. Then you can read the entire file in, in one command.
file = File.open("path-to-file.tar.gz", "rb")
contents = file.read
That will get you the entire file in a string.
After that, you probably want to file.close. If you don’t do that, file won’t be closed until it is garbage-collected, so it would be a slight waste of system resources while it is open.
If you need binary mode, you'll need to do it the hard way:
s = File.open(filename, 'rb') { |f| f.read }
If not, shorter and sweeter is:
s = IO.read(filename)
To avoid leaving the file open, it is best to pass a block to File.open. This way, the file will be closed after the block executes.
contents = File.open('path-to-file.tar.gz', 'rb') { |f| f.read }
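For the write side of the question, the same block style applies with the 'wb' mode. A minimal sketch, with the hypothetical helper name copy_binary, that reads a file into a string and writes it back out byte for byte:

```ruby
# Hypothetical helper: copies `source` to `destination` via an in-memory
# string, using binary mode on both sides. Returns the byte count.
def copy_binary(source, destination)
  contents = File.open(source, 'rb') { |f| f.read }
  File.open(destination, 'wb') { |f| f.write(contents) }
  contents.bytesize
end
```

Binary mode matters mostly on Windows, where text mode translates line endings; on other platforms it also keeps Ruby from tagging the string with a text encoding.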
How about some open/close safety:
string = File.open('file.txt', 'rb') { |file| file.read }
Ruby has a binary read method:
data = IO.binread("path/filename")
or, on Ruby versions older than 1.9.2:
data = IO.read("path/file")
On OS X these are the same for me... could this maybe be an extra "\r" on Windows?
In any case, you may be better off with:
contents = File.read("e.tgz")
newFile = File.open("ee.tgz", "w")
newFile.write(contents)
You can probably encode the tar file in Base64. Base 64 will give you a pure ASCII representation of the file that you can store in a plain text file. Then you can retrieve the tar file by decoding the text back.
You can do something like:
require 'base64'
file_contents = Base64.encode64(tar_file_data)
Have a look at the Base64 RubyDoc pages to get a better idea.
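As a sketch of the full round trip (file to text and back), assuming the base64 library is available: the helper names encode_file and decode_to_file are hypothetical, while Base64.encode64, Base64.decode64, File.binread and File.binwrite are standard library calls.

```ruby
require 'base64'

# Hypothetical helper: returns the file's bytes as Base64 text.
def encode_file(path)
  Base64.encode64(File.binread(path))
end

# Hypothetical helper: decodes Base64 text and writes the bytes to `path`.
def decode_to_file(encoded_text, path)
  File.binwrite(path, Base64.decode64(encoded_text))
end
```

Note that Base64 text is roughly a third larger than the original binary data, so this is mainly useful when the string has to pass through something that only handles plain text.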
Ruby 1.9+ has IO.binread (see @bardzo's answer) and also supports passing the encoding as an option to IO.read:
Ruby 1.9
data = File.read(name, {:encoding => 'BINARY'})
Ruby 2+
data = File.read(name, encoding: 'BINARY')
(Note in both cases that 'BINARY' is an alias for 'ASCII-8BIT'.)
If you encode the tar file with Base64 and store it in a plain text file, you can use
File.open("my_tar.txt").each {|line| puts line}
or
File.new("name_file.txt", "r").each {|line| puts line}
to print each (text) line in the terminal.
