Ruby: How to read maybe gzipped data from file or STDIN? - ruby

I would like to read data from an input file or STDIN - the input data may be gzipped.
For files this can be done with Zlib::GzipReader like this:
require 'zlib'
ios = File.open(file, 'r')
begin
  ios = Zlib::GzipReader.new(ios)
rescue Zlib::GzipFile::Error
  # not gzipped: go back to the start and read the file as plain text
  ios.rewind
end
ios.each_line { |line| puts line }
However, I cannot get the detection of gzipped data from STDIN to work:
require 'zlib'
if STDIN.tty?
# do nothing
else
ios = STDIN
begin
ios = Zlib::GzipReader.new(ios)
rescue
ios.rewind
end
end
ios.each_line { |line| puts line }
The above works with gzipped data in STDIN, but regular data results in this:
./test.rb:14:in `rewind': Illegal seek - <STDIN> (Errno::ESPIPE)
from ./test.rb:14:in `rescue in <main>'
from ./test.rb:11:in `<main>'
So, if I cannot rewind STDIN, how do I test if data in STDIN is zipped or not?
Cheers,
Martin

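STDIN is a pipe and cannot be rewound, so load the data from STDIN into a temporary file first, and only then parse it:
require 'tempfile'
tf = Tempfile.new('tmp')
tf.binmode                  # gzipped input is binary data
IO.copy_stream($stdin, tf)  # copy STDIN byte for byte, without line handling
tf.rewind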
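Putting the pieces together, a minimal sketch of the whole approach might look like the following. The helper name maybe_gzip_reader is hypothetical (it is not part of Zlib). It spools the stream into a Tempfile so that it can be rewound, and falls back to the raw data when Zlib::GzipReader rejects the header.

```ruby
require 'zlib'
require 'tempfile'

# Hypothetical helper (not part of Zlib): returns an IO-like object that yields
# the decompressed data if `input` is gzipped, or the raw data otherwise.
def maybe_gzip_reader(input)
  tf = Tempfile.new('maybe_gz')
  tf.binmode                      # gzip data is binary
  IO.copy_stream(input, tf)       # spool the non-seekable stream to disk
  tf.rewind
  begin
    Zlib::GzipReader.new(tf)      # raises if the gzip header is missing
  rescue Zlib::GzipFile::Error
    tf.rewind                     # a Tempfile, unlike a pipe, can be rewound
    tf
  end
end
```

It could then be used as maybe_gzip_reader($stdin).each_line { |line| puts line }. The trade-off is that the whole input is written to disk before the first line is processed.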

Related

How would you close this file descriptor?

Let's say you have the following code:
from_file, to_file = ARGV
puts "Copying from #{from_file} to #{to_file}"
#in_file = open(from_file)
#indata = in_file.read
indata = open(from_file).read # Combined in_file and indata.
puts "The input file is #{indata.length} bytes long."
puts "Does the output file exist? #{File.exist?(to_file)}"
puts "Ready, hit RETURN to continue or CTRL-C to abort."
$stdin.gets
out_file = open(to_file, 'w')
out_file.write(indata)
puts "Alright, all done."
out_file.close
#in_file.close
How would you close the file descriptor that was opened to produce indata? The File object returned by open needs to be closed, but indata only holds the result of open(from_file).read, so no reference to the file is left.
P.S. Since it's a script, the file will be closed automatically on exit. But let's assume this code runs in a long-running backend service, where we don't know when the garbage collector will kick in, so the file has to be closed explicitly. What would you do?
If you are just copying the file...
you could just use FileUtils.cp:
require 'fileutils'
FileUtils.cp(from_file, to_file)
or even shell-out to the operating system and do it with a system command.
Let's suppose you want to do something to the input file before writing it to the output file.
If from_file is not large,...
you could "gulp it" into a string using IO.read:
str = IO.read(from_file)
manipulate str as desired to obtain new_str, then blast it to the output file using IO.write:
IO.write(to_file, new_str)
Note that for the class File:
File < IO #=> true # File inherits IO's methods
which is why you often see this written File.read(...) and File.write(...).
If from_file is large, read a line, write a line...
provided the changes to be made are done for each line separately.
f = File.open(to_file, "w") # or File.new(to_file, "w")
IO.foreach(from_file) do |line|
  # < modify line to produce new_line >
  f.puts new_line
end
f.close
foreach closes from_file when it's finished. If f.close is not present, Ruby only closes to_file when the File object is garbage-collected or the program exits, which may be much later than the point where f goes out of scope. It is therefore a good idea to close it explicitly.
Passing File.open a block is generally a nice way to go about things, so I’ll offer it up as an alternative even if it doesn’t seem to be quite what you asked.
indata = File.open(from_file) do |f|
f.read
end
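When both files are involved, the block form can be combined with File.foreach so that neither descriptor is left open, even if an exception is raised. A minimal sketch, assuming a hypothetical helper name copy_lines and a caller-supplied block that stands in for whatever per-line change is needed:

```ruby
# Hypothetical helper: copies from_file to to_file line by line, applying the
# given block to each line. Both files are closed when the blocks exit.
def copy_lines(from_file, to_file)
  File.open(to_file, 'w') do |out|
    File.foreach(from_file) do |line|
      out.write(yield(line))
    end
  end
end
```

For example, copy_lines('in.txt', 'out.txt') { |line| line.upcase } would write an upper-cased copy without ever holding more than one line in memory.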

How to add new line in a file

I want to add an empty line after the line containing 2014-10, as shown below.
But the result is wrong.
What am I doing wrong?
test.txt(before)
------------------
2014-09
2014-10
2014-11
------------------
test.txt(after)
------------------
2014-09
2014-10

2014-11
------------------
I wrote the Ruby script below, but the result is wrong.
f = File.open("test.txt","r+")
f.each{|line|
if line.include?("2014-10")
f.puts nil
end
}
f.close
the result
------------------
2014-09
2014-10
014-11
------------------
The easiest way to solve your problem is to write the new text to a new file. To do so, open the input file and the output file, iterate over each line of the input, check the condition, and write the desired lines to the output file.
Example:
require 'fileutils'
File.open("test-output.txt", "w") do |output|
  File.foreach("test.txt") do |line|
    if line.include?("2014-10")
      output.puts line + "\n" # the line itself, followed by an empty line
    else
      output.puts line
    end
  end
end
FileUtils.mv("test-output.txt", "test.txt")
The easy way:
File.write(f = "test.txt", File.read(f).gsub(/2014-10/,"2014-10\n"))
Reading and writing a file at the same time can get messy. The same is true of modifying other data structures, such as arrays, while iterating over them. You should build a new file as you go along.
Some notes:
you should use the block form of File.open because it will stop you from forgetting to call f.close
puts nil is the same as puts without arguments
single quotes are preferred over double quotes when you don’t need string interpolation
you should use do ... end instead of { ... } for multi-line blocks
File.open(...).each can be replaced with File.foreach
the intermediate result can be stored in a StringIO object which will respond to puts etc.
Example:
require 'stringio'
file = 'test.txt'
output = StringIO.new
File.foreach(file) do |line|
  output << line
  # add an empty line after the matching line
  output.puts if line.include? '2014-10'
end
output.rewind
File.open(file, 'w') do |f|
  f.write output.read
end
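The same idea can be wrapped in a small reusable method. This is only a sketch: the name insert_blank_line_after is hypothetical, it assumes every line in the file ends with a newline, and it holds the whole file in memory.

```ruby
# Hypothetical helper: rewrites the file at `path`, adding an empty line after
# every line that contains `needle`. The whole file is held in memory.
def insert_blank_line_after(path, needle)
  lines = File.readlines(path)
  result = lines.flat_map do |line|
    line.include?(needle) ? [line, "\n"] : [line]
  end
  File.write(path, result.join)
end
```

Calling insert_blank_line_after('test.txt', '2014-10') reads the file completely before writing, so the read and write positions never interfere with each other.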

Read compressed csv file on-the-fly

I wrote a CSV file and compressed it using this code:
arr = (0...2**16).to_a
File.open('file.bz2', 'wb') do |f|
writer = Bzip2::Writer.new f
CSV(writer) do |csv|
(2**16).times { csv << arr }
end
writer.close
end
I want to read CSV files that were compressed with bzip2. Uncompressed, these files look like:
1,2
4,12
5,2
8,7
1,3
...
So I tried this code:
Bzip2::Reader.open(filename) do |bzip2|
CSV.foreach(bzip2) do |row|
puts row.inspect
end
end
but when it is executed, it throws:
/Users/foo/.rvm/rubies/ruby-2.1.0/lib/ruby/2.1.0/csv.rb:1256:in `initialize': no implicit conversion of Bzip2::Reader into String (TypeError)
from /Users/foo/.rvm/rubies/ruby-2.1.0/lib/ruby/2.1.0/csv.rb:1256:in `open'
from /Users/foo/.rvm/rubies/ruby-2.1.0/lib/ruby/2.1.0/csv.rb:1256:in `open'
from /Users/foo/.rvm/rubies/ruby-2.1.0/lib/ruby/2.1.0/csv.rb:1121:in `foreach'
from worm_pathfinder_solver.rb:79:in `block in <main>'
from worm_pathfinder_solver.rb:77:in `open'
from worm_pathfinder_solver.rb:77:in `<main>'
Question:
What is wrong?
How should I do this?
CSV.foreach assumes you're passing a file path to open. If you want to pass a stream to CSV you need to be more explicit and use CSV.new. This code will process a gzipped file:
Zlib::GzipReader.open(filename) do |gzip|
csv = CSV.new(gzip)
csv.each do |row|
puts row.inspect
end
end
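For a self-contained illustration of the difference, the sketch below writes a small gzipped CSV file and streams it back through CSV.new. The sample rows and the temporary file name are assumptions made for the example. Only standard-library Zlib is used here, not the bzip2 gem.

```ruby
require 'csv'
require 'zlib'
require 'tempfile'

# Write a small gzipped CSV file, then stream it back row by row.
rows = [%w[1 2], %w[4 12], %w[5 2]]

file = Tempfile.new(['sample', '.csv.gz'])
file.close

Zlib::GzipWriter.open(file.path) do |gz|
  rows.each { |row| gz.write(row.to_csv) }
end

read_back = []
Zlib::GzipReader.open(file.path) do |gz|
  # CSV.new wraps any IO-like object; CSV.foreach would expect a path instead.
  CSV.new(gz).each { |row| read_back << row }
end

file.unlink
```

Because CSV.new pulls data from the reader as it goes, the decompressed file never has to be held in memory as a single string.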
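Based on the brief docs, you'll probably need to call the read method on the bzip2 object and parse the resulting string with CSV.parse, because CSV.foreach would treat the string as a file path (not tested):
Bzip2::Reader.open(filename) do |bzip2|
  CSV.parse(bzip2.read) do |row|
    #             ^^^^
    puts row.inspect
  end
end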
My guess would be that CSV tries to convert the Bzip2::Reader to a string but doesn't know how and simply throws the exception. You can manually read the data into a string and then pass THAT to CSV.
Though it's strange since it could handle Bzip2::Writer just fine.

Writing to a file then trying to open it again for parsing

I'm trying to save the XML feed of a Twitter user to a file and then read it back in to parse it and print it to the screen.
This is what I see when I try to run it:
Wrote to file #<File:0x000001019257c8>
Now parsing user info..
twitter_stats.rb:20:in `<main>': undefined method `read' for "keva161.txt":String (NoMethodError)
Here's my code...
require "open-uri"
require "rubygems"
require "crack"
twitter_url = "http://api.twitter.com/1/statuses/user_timeline.xml?cout=100&screen_name="
username = "keva161"
full_page = twitter_url + username
local_file = username + ".txt"
tweets = open(full_page).read
my_local_file = open(local_file, "w")
my_local_file.write(tweets)
puts "Wrote to file " + my_local_file.to_s
sleep(1)
puts "Now parsing user info.."
sleep(1)
parsed_xml = Crack::XML.parse(local_file.read)
tweets = parsed_xml["statuses"]
first_tweet = tweets[0]
user = first_tweet["user"]
puts user["screen_name"]
puts user["name"]
puts user["created_at"]
puts user["statuses_count"]
You are calling read on local_file, which is the string containing the filename, not a file object. Note that my_local_file was opened with mode "w", so it cannot be read from either. Close it first and then read the file back by name, for example with File.read(local_file).
Not that this is the best form: why are you writing to a temporary file anyhow? You have the data in memory, so just pass it directly.
If you do want to write to a local file, I recommend the block form of open:
open(local_file, 'w') do |fh|
fh.print ...
end
That way Ruby will take care of closing the file for you and all that.
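A minimal sketch of the write-then-read sequence using the block form. The placeholder string stands in for the downloaded feed and the temporary directory is an assumption made so the example is self-contained:

```ruby
require 'tmpdir'

# Placeholder data standing in for the downloaded feed (an assumption).
tweets = "<statuses><status>hello</status></statuses>"

contents = Dir.mktmpdir do |dir|
  local_file = File.join(dir, 'keva161.txt')

  # The block form flushes and closes the file when the block exits.
  File.open(local_file, 'w') { |fh| fh.print(tweets) }

  # Reopen by name for reading; File.read closes the file itself.
  File.read(local_file)
end
```

Because the write block has finished before File.read runs, the data is guaranteed to be on disk when it is read back.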

How can I copy the contents of one file to another using Ruby's file methods?

I want to copy the contents of one file to another using Ruby's file methods.
How can I do it using a simple Ruby program using file methods?
There is a very handy method for this, IO.copy_stream (see the output of ri copy_stream).
Example usage:
File.open('src.txt', 'w') do |f| # open for writing to create the source file
  f.puts 'Some text'
end
IO.copy_stream('src.txt', 'dest.txt')
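IO.copy_stream accepts either file names or IO objects and returns the number of bytes copied, which can be handy as a sanity check. A small sketch, using a temporary directory so that it can run anywhere (the directory and file names are assumptions for the example):

```ruby
require 'tmpdir'

copied = Dir.mktmpdir do |dir|
  src  = File.join(dir, 'src.txt')
  dest = File.join(dir, 'dest.txt')
  File.write(src, "Some text\n")

  # copy_stream accepts file names or IO objects and returns the bytes copied.
  bytes = IO.copy_stream(src, dest)
  [bytes, File.read(dest)]
end
```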
For those who are interested, here's a variation of the IO.copy_stream and File.open + block answers (written against Ruby 2.2.x, 3 years too late).
require 'tempfile'
copy = Tempfile.new
File.open(file, 'rb') do |input_stream|
File.open(copy, 'wb') do |output_stream|
IO.copy_stream(input_stream, output_stream)
end
end
As a precaution, I would recommend copying through a buffer unless you can guarantee that the whole file always fits into memory:
File.open("source", "rb") do |input|
File.open("target", "wb") do |output|
while buff = input.read(4096)
output.write(buff)
end
end
end
Here is my implementation:
class File
def self.copy(source, target)
File.open(source, 'rb') do |infile|
File.open(target, 'wb') do |outfile2|
while buffer = infile.read(4096)
outfile2 << buffer
end
end
end
end
end
Usage:
File.copy sourcepath, targetpath
Here is a simple way of doing that using Ruby's file operation methods:
source_file, destination_file = ARGV
script = $0
input = File.open(source_file)
data_to_copy = input.read() # gather the data using read() method
puts "The source file is #{data_to_copy.length} bytes long"
output = File.open(destination_file, 'w')
output.write(data_to_copy) # write up the data using write() method
puts "File has been copied"
output.close()
input.close()
You can also use File.exist? to check whether the file exists; it returns true if it does. (The older alias File.exists? is deprecated and was removed in Ruby 3.2.)
Here's a fast and concise way to do it.
# Open first file, read it, store it, then close it
input = File.open(ARGV[0]) {|f| f.read() }
# Open second file, write to it, then close it
output = File.open(ARGV[1], 'w') {|f| f.write(input) }
An example for running this would be.
$ ruby this_script.rb from_file.txt to_file.txt
This runs this_script.rb and takes in two arguments through the command line. The first one in our case is from_file.txt (the text being copied from) and the second is to_file.txt (the text being copied to).
You can also use File.binread and File.binwrite if you wish to hold onto the file contents for a bit. (Other answers use an instant copy_stream instead.)
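If the contents are not plain text, such as images, basic File.read and File.write can corrupt the data on platforms that translate line endings in text mode (for example Windows), so use the binary variants:
require 'tempfile'
temp_image = Tempfile.new('image.jpg')
actual_img = IO.binread('image.jpg')
IO.binwrite(temp_image.path, actual_img)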
Source: binread, binwrite.
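To illustrate, the following sketch round-trips a few bytes that text-mode I/O could alter on some platforms. The payload and file name are made up for the example:

```ruby
require 'tmpdir'

# Bytes that text-mode I/O could alter on some platforms (CR LF, 0x1A).
payload = "\x89PNG\r\n\x1A\n\x00\xFF".b

round_trip = Dir.mktmpdir do |dir|
  path = File.join(dir, 'image.bin')
  File.binwrite(path, payload)   # writes the bytes untouched
  File.binread(path)             # returns an ASCII-8BIT (binary) string
end
```

File.binread and File.binwrite are the same methods as IO.binread and IO.binwrite, since File inherits them from IO.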
