How to decompress Gzip string in ruby?

Zlib::GzipReader can take "an IO, or IO-like, object" as its input, as stated in the docs.
Zlib::GzipReader.open('hoge.gz') {|gz|
print gz.read
}
File.open('hoge.gz') do |f|
gz = Zlib::GzipReader.new(f)
print gz.read
gz.close
end
How should I ungzip a string?

The above method didn't work for me.
I kept getting an "incorrect header check (Zlib::DataError)" error. Apparently it assumes by default that the data has a header, which may not always be the case.
The workaround that I implemented was:
require 'zlib'
require 'stringio'
gz = Zlib::GzipReader.new(StringIO.new(resp.body.to_s))
uncompressed_string = gz.read
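A self-contained round trip may make the StringIO approach easier to check. This is only a minimal sketch: gunzip_string is a hypothetical helper name, and the sample payload is made up and produced with Zlib.gzip (available in recent Ruby versions) instead of coming from an HTTP response.

```ruby
require 'zlib'
require 'stringio'

# Hypothetical helper: decompress a gzip-formatted String held in memory.
def gunzip_string(gzipped)
  # GzipReader needs an IO-like object, so wrap the String in a StringIO.
  gz = Zlib::GzipReader.new(StringIO.new(gzipped))
  gz.read
ensure
  # Close the reader (and the underlying StringIO) even if reading fails.
  gz&.close
end

sample = Zlib.gzip("Hello, gzip!")   # made-up payload standing in for resp.body
puts gunzip_string(sample)
```

The ensure clause does not change the return value, so the method still returns the decompressed string.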

Zlib by default assumes that your compressed data contains a header.
If your data does NOT contain a header it will fail by raising a Zlib::DataError.
You can tell Zlib to assume the data has no header via the following workaround:
def inflate(string)
zstream = Zlib::Inflate.new(-Zlib::MAX_WBITS)
buf = zstream.inflate(string)
zstream.finish
zstream.close
buf
end
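To see the effect of the negative window-bits value, here is a small round trip. It is a sketch with made-up sample data: the compressor is asked for a raw deflate stream (no header), which the default inflater then rejects while the header-less inflater accepts it.

```ruby
require 'zlib'

data = "raw deflate example"   # made-up sample

# Negative window bits ask zlib for a raw deflate stream (no header, no checksum).
deflater = Zlib::Deflate.new(Zlib::DEFAULT_COMPRESSION, -Zlib::MAX_WBITS)
raw = deflater.deflate(data, Zlib::FINISH)
deflater.close

# The same negative value tells the inflater not to expect a header.
inflater = Zlib::Inflate.new(-Zlib::MAX_WBITS)
restored = inflater.inflate(raw)
inflater.close
puts restored

# The default inflater expects a zlib header, so it rejects the raw stream.
begin
  Zlib::Inflate.inflate(raw)
rescue Zlib::DataError => e
  puts "default inflater failed: #{e.message}"
end
```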

You need Zlib::Inflate for decompression of a string and Zlib::Deflate for compression
def inflate(string)
zstream = Zlib::Inflate.new
buf = zstream.inflate(string)
zstream.finish
zstream.close
buf
end

In Rails you can use:
ActiveSupport::Gzip.compress("my string")
ActiveSupport::Gzip.decompress(compressed_string)

zstream = Zlib::Inflate.new(16+Zlib::MAX_WBITS)
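The one-liner above relies on how zlib interprets the window-bits argument: adding 16 makes the inflater expect a gzip header, and adding 32 lets it auto-detect a zlib or gzip header. A short sketch with made-up payloads:

```ruby
require 'zlib'

gzipped = Zlib.gzip("window bits example")   # made-up gzip payload

# MAX_WBITS + 16: expect a gzip header and trailer.
gzip_only = Zlib::Inflate.new(Zlib::MAX_WBITS + 16)
puts gzip_only.inflate(gzipped)
gzip_only.close

# MAX_WBITS + 32: auto-detect whether the data has a zlib or a gzip header.
auto = Zlib::Inflate.new(Zlib::MAX_WBITS + 32)
puts auto.inflate(Zlib::Deflate.deflate("zlib-wrapped data"))
auto.close
```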

Using (-Zlib::MAX_WBITS), I got ERROR: invalid code lengths set and ERROR: invalid block type
Only the following works for me, too.
Zlib::GzipReader.new(StringIO.new(response_body)).read

I used the answer above to use a Zlib::Deflate
I kept getting broken files (for small files) and it took many hours to figure out that the problem can be fixed using:
buf = zstream.deflate(string,Zlib::FINISH)
without the zstream.finish line!
def self.deflate(string)
zstream = Zlib::Deflate.new
buf = zstream.deflate(string,Zlib::FINISH)
zstream.close
buf
end

To gunzip content, use the following code (tested on 1.9.2):
Zlib::GzipReader.new(StringIO.new(content), :external_encoding => content.encoding).read
Beware of encoding problems

We don't need any extra parameters these days. There are deflate and inflate class methods which allow for quick oneliners like these:
>> data = "Hello, Zlib!"
>> compressed = Zlib::Deflate.deflate(data)
=> "x\234\363H\315\311\311\327Q\210\312\311LR\004\000\032\305\003\363"
>> uncompressed = Zlib::Inflate.inflate(compressed)
=> "Hello, Zlib!"
I think it answers the question "How should I ungzip a string?" the best. :)
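One caveat may be worth checking before relying on these one-liners: Zlib::Deflate.deflate writes the zlib format, not gzip, so Zlib::Inflate.inflate round-trips its own output but is not expected to accept gzip data (for example, an HTTP body sent with Content-Encoding: gzip). The sketch below compares the two formats using made-up data; Zlib.gzip and Zlib.gunzip are the gzip counterparts in recent Ruby versions.

```ruby
require 'zlib'

data = "Hello, Zlib!"

zlib_bytes = Zlib::Deflate.deflate(data)   # zlib format
gzip_bytes = Zlib.gzip(data)               # gzip format (starts with 0x1f 0x8b)

puts Zlib::Inflate.inflate(zlib_bytes)     # zlib data: works
puts Zlib.gunzip(gzip_bytes)               # gzip data: use gunzip

# Feeding gzip data to the zlib-format one-liner raises an error.
begin
  Zlib::Inflate.inflate(gzip_bytes)
rescue Zlib::DataError => e
  puts "Inflate.inflate rejected gzip data: #{e.message}"
end
```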

Related

Using binary data (strings in utf-8) from external file

I have problem with using strings in UTF-8 format, e.g. "\u0161\u010D\u0159\u017E\u00FD".
When such string is defined as variable in my program it works fine. But when I use such string by reading it from some external file I get the wrong output (I don't get what I want/expect). Definitely I'm missing some necessary encoding stuff...
My code:
file = "c:\\...\\vlmList_unicode.txt" #\u306b\u3064\u3044\u3066
data = File.open(file, 'rb') { |io| io.read.split(/\t/) }
puts data
data_var = "\u306b\u3064\u3044\u3066"
puts data_var
Output:
\u306b\u3064\u3044\u3066 # what I don't want
について # what I want
I'm trying to read the file in binary form by specifying 'rb' but obviously there is some other problem...
I run my code in NetBeans 7.3.1 with the built-in JRuby 1.7.3 (I also tried Ruby 2.0.0, but without any effect).
Since I'm new in ruby world any ideas are welcomed...
If your file contains the literal escaped string:
\u306b\u3064\u3044\u3066
Then you will need to unescape it after reading. Ruby does this for you with string literals, which is why the second case worked for you. Taken from the answer to "Is this the best way to unescape unicode escape sequences in Ruby?", you can use this:
file = "c:\\...\\vlmList_unicode.txt" #\u306b\u3064\u3044\u3066
data = File.open(file, 'rb') { |io|
contents = io.read.gsub(/\\u([\da-fA-F]{4})/) { |m|
[$1].pack("H*").unpack("n*").pack("U*")
}
contents.split(/\t/)
}
Alternatively, if you would like to make it more readable, extract the substitution into a new method, and add it to the String class:
class String
def unescape_unicode
self.gsub(/\\u([\da-fA-F]{4})/) { |m|
[$1].pack("H*").unpack("n*").pack("U*")
}
end
end
Then you can call:
file = "c:\\...\\vlmList_unicode.txt" #\u306b\u3064\u3044\u3066
data = File.open(file, 'rb') { |io|
io.read.unescape_unicode.split(/\t/)
}
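If reopening String feels too invasive, the same substitution can live in a standalone method. This is a sketch, not the answer's exact code: unescape_unicode here is a hypothetical free-standing helper, it converts each hex code with String#hex instead of the pack/unpack chain, and it does not handle surrogate pairs.

```ruby
# Hypothetical helper: turn literal \uXXXX sequences into the characters they name.
def unescape_unicode(text)
  # $1 holds the four hex digits; pack("U") encodes that code point as UTF-8.
  text.gsub(/\\u([\da-fA-F]{4})/) { [$1.hex].pack("U") }
end

escaped = '\u306b\u3064\u3044\u3066'   # single quotes keep the backslashes literal
puts unescape_unicode(escaped)
```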
Just as a FYI:
data = File.open(file, 'rb') { |io| io.read.split(/\t/) }
Can be written more simply as one of these:
data = File.read(file, mode: 'rb').split(/\t/)
data = File.readlines(file, "\t", mode: 'rb')
(Remember that File inherits from IO, which is where these methods are defined, so look in IO for documentation on them.)
readlines takes a "separator" parameter, which in the example above is "\t". Ruby will substitute it for the usual "\n" on *nix or Mac OS, or "\r\n" on Windows, so records will be retrieved using the tab-delimiter.
This makes me wonder a bit why you'd want to do that, though. I've never seen tabs as record delimiters, only as column/field delimiters in "TSV" (Tab-Separated-Value) files. So that leads me to think you should probably be using Ruby's CSV class, with "\t" as the column separator. But without samples of the actual file you're reading, I can't say for sure.
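For illustration, this is roughly what the CSV suggestion could look like. It is a sketch only: the tab-separated sample is invented, since the actual file layout is unknown, and it assumes the first line holds column headers.

```ruby
require 'csv'

# Made-up tab-separated sample; the real file's layout is not known.
tsv = "name\tcity\nAda\tLondon\nLinus\tHelsinki\n"

# col_sep switches the field delimiter from a comma to a tab.
rows = CSV.parse(tsv, col_sep: "\t", headers: true)
rows.each do |row|
  puts "#{row['name']} lives in #{row['city']}"
end
```

To read from a file instead of a string, CSV.foreach(path, col_sep: "\t") accepts the same option.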

Having trouble saving to file in Ruby

Hi, I have a simple form that allows a user to input a name, their gender, and a password. I use Digest::MD5.hexdigest to hash the input. Once I have the hashed input, e.g. d1c261ede46c1c66b7e873564291ebdc, I want to append it to a file I have already created. However, everything I have tried just isn't working. Can anyone please help? Thank you in advance. Here is what I have:
input = STDIN.read( ENV["CONTENT_LENGHT"] )
puts "Content-type: text/html \n\n"
require 'digest/md5'
digest = Digest::MD5.hexdigest(input)
f = File.open("register.txt", "a")
f.write(digest)
f.close
I have also tried this with no luck:
File.open("register.txt", "a") do |f|
f.puts(digest)
end
If the code is verbatim, then I think you have a typo in the first line: did you mean CONTENT_LENGTH rather than CONTENT_LENGHT? With the correct name, ENV[] returns a String if the variable is set, which upsets STDIN#read: I get TypeError: can't convert String into Integer. With the typo, ENV[] returns nil, which tells STDIN#read to read until EOF, which from the console means, I think, Control-Z (Control-D on Unix-like systems). That might be causing a problem.
I suggest you investigate by modifying your script thus:
read_length = ENV["CONTENT_LENGTH"].to_i # assumed typo fixed, convert to integer
puts "read length = #{read_length}"
input = STDIN.read( read_length )
puts "input = #{input}"
puts "Content-type: text/html \n\n" # this seems to serve no purpose
require 'digest/md5'
digest = Digest::MD5.hexdigest(input)
puts "digest = #{digest}"
# prefer this version: it's more idiomatically "Rubyish"
File.open("register.txt", "a") do |f|
puts "file opened"
f.puts(digest)
end
file_content = File.read("register.txt")
puts "done, file content = #{file_content}"
This works on my machine, with the following output (when CONTENT_LENGTH set to 12):
read length = 12
abcdefghijkl
input = abcdefghijkl
Content-type: text/html
digest = 9fc9d606912030dca86582ed62595cf7
file opened
done, file content = 6cfbc6ae37c91b4faf7310fbc2b7d5e8
e271dc47fa80ddc9e6590042ad9ed2b7
b0fb8772912c4ac0f13525409c2b224e
9fc9d606912030dca86582ed62595cf7
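The append step itself can be checked in isolation, without a web server or STDIN. The sketch below assumes a hypothetical helper named append_digest and writes to a temporary directory instead of the real register.txt.

```ruby
require 'digest/md5'
require 'tmpdir'

# Hypothetical helper: hash the input and append the digest as one line.
def append_digest(input, path)
  digest = Digest::MD5.hexdigest(input)
  # Mode "a" creates the file if needed and always writes at the end.
  File.open(path, "a") { |f| f.puts(digest) }
  digest
end

Dir.mktmpdir do |dir|
  path = File.join(dir, "register.txt")
  append_digest("first entry", path)    # made-up inputs
  append_digest("second entry", path)
  puts File.read(path)                  # two lines, one digest each
end
```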

How can I copy the contents of one file to another using Ruby's file methods?

I want to copy the contents of one file to another using Ruby's file methods.
How can I do it using a simple Ruby program using file methods?
There is a very handy method for this: IO.copy_stream (a class method). See the output of ri copy_stream.
Example usage:
File.open('src.txt', 'w') do |f|
  f.puts 'Some text'
end
IO.copy_stream('src.txt', 'dest.txt')
For those who are interested, here's a variation of the IO.copy_stream and File.open + block answers (written against Ruby 2.2.x, 3 years too late).
require 'tempfile'
# `file` is the path of the source file.
copy = Tempfile.new('copy')
File.open(file, 'rb') do |input_stream|
  File.open(copy, 'wb') do |output_stream|
    IO.copy_stream(input_stream, output_stream)
  end
end
As a precaution I would recommend using buffer unless you can guarantee whole file always fits into memory:
File.open("source", "rb") do |input|
File.open("target", "wb") do |output|
while buff = input.read(4096)
output.write(buff)
end
end
end
Here is my implementation:
class File
def self.copy(source, target)
File.open(source, 'rb') do |infile|
File.open(target, 'wb') do |outfile2|
while buffer = infile.read(4096)
outfile2 << buffer
end
end
end
end
end
Usage:
File.copy sourcepath, targetpath
Here is a simple way of doing that using ruby file operation methods :
source_file, destination_file = ARGV
script = $0
input = File.open(source_file)
data_to_copy = input.read() # gather the data using read() method
puts "The source file is #{data_to_copy.length} bytes long"
output = File.open(destination_file, 'w')
output.write(data_to_copy) # write up the data using write() method
puts "File has been copied"
output.close()
input.close()
You can also use File.exist? to check whether the file exists; it returns true if it does. (The older alias File.exists? is deprecated and was removed in Ruby 3.2.)
Here's a fast and concise way to do it.
# Open first file, read it, store it, then close it
input = File.open(ARGV[0]) {|f| f.read() }
# Open second file, write to it, then close it
output = File.open(ARGV[1], 'w') {|f| f.write(input) }
An example for running this would be.
$ ruby this_script.rb from_file.txt to_file.txt
This runs this_script.rb with two command-line arguments. The first, from_file.txt, is the file being copied from, and the second, to_file.txt, is the file being copied to.
You can also use File.binread and File.binwrite if you wish to hold onto the file contents for a bit. (Other answers use an instant copy_stream instead.)
If the contents are not plain text, such as images, basic File.read and File.write may corrupt the data (for example, through newline translation on Windows), so use the binary variants:
temp_image = Tempfile.new('image.jpg')
actual_img = IO.binread('image.jpg')
IO.binwrite(temp_image, actual_img)
Source: binread, binwrite.
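A quick way to confirm that binread and binwrite preserve every byte is to round-trip a few awkward values (NUL, LF, CR, 0xFF). This is a sketch using made-up bytes in a temporary directory, not a real image.

```ruby
require 'tmpdir'

Dir.mktmpdir do |dir|
  src  = File.join(dir, 'src.bin')
  dest = File.join(dir, 'dest.bin')

  # Made-up binary payload containing bytes that text mode may alter.
  bytes = [0, 10, 13, 255, 137].pack('C*')
  File.binwrite(src, bytes)

  content = File.binread(src)      # the whole file is held in memory here
  File.binwrite(dest, content)

  puts File.binread(dest) == bytes              # byte-for-byte identical?
  puts content.encoding == Encoding::BINARY     # binread returns binary strings
end
```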

Converting python script to ruby (downloading part of a file)

I've been at this for a couple of days, and am having no luck at all. Despite reading over these two posts, I can't seem to rewrite in Ruby this little Python script I wrote.
clean_link = link['href'].replace(' ', '%20')
mp3file = urllib2.urlopen(clean_link)
output = open('temp.mp3','wb')
output.write(mp3file.read(2000))
output.close()
I've been looking at using open-uri and net/http to do the same in ruby, but keep hitting a url redirect issue. So far I have
clean_link = link.attributes['href'].gsub(' ', '%20')
link_pieces = clean_link.scan(/http:\/\/(?:www\.)?([^\/]+?)(\/.*?\.mp3)/)
host = link_pieces[0][0]
path = link_pieces[0][1]
Net::HTTP.start(host) do |http|
resp = http.get(path)
open("temp.mp3", "wb") do |file|
file.write(resp.body)
end
end
Is there a simpler way to do this in ruby? Also, as with the python script, is there a way to only download part of the file?
EDIT: progress updated
see here & here
http.request_get('/index.html') {|res|
size = 0
res.read_body do |chunk|
size += chunk.size
# do some processing
break if size >= 2000
end
}
but you can't control chunk sizes here
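Another option for fetching only part of a file is to ask the server for a byte range. This is a sketch under assumptions that are not in the original posts: the helper names and the example URL are hypothetical, and a server is free to ignore the Range header and send the whole file, so the body is truncated as a fallback.

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper: build a GET request for only the first `byte_count` bytes.
def build_range_request(uri, byte_count)
  request = Net::HTTP::Get.new(uri)
  request['Range'] = "bytes=0-#{byte_count - 1}"
  request
end

# Hypothetical helper: fetch the start of a file. Performs a real network call.
def download_head(url, byte_count)
  uri = URI.parse(url)
  Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == 'https') do |http|
    response = http.request(build_range_request(uri, byte_count))
    # If the server ignored Range, keep only the requested number of bytes.
    response.body.to_s.byteslice(0, byte_count)
  end
end

# Building the request needs no network access (the URL is a placeholder).
request = build_range_request(URI.parse("http://example.com/song.mp3"), 2000)
puts request['Range']
```

Calling download_head(clean_link, 2000) and writing the result with File.binwrite would mirror the Python script's read(2000). This sketch does not follow redirects.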

Download image with Ruby RIO gem

My code:
require 'rio'
rio('nice.jpg') < rio('http://farm4.static.flickr.com/3134/3160515898_59354c9733.jpg?v=0')
But the downloaded image is corrupted. What is wrong with this solution?
pjb3 is correct. You must call binmode on the left-hand term:
rio('nice.jpg').binmode < rio('http://...')
If this still does not work (notably, it may happen for large jpeg files, i.e. rio uses an intermediate temp file when retrieving from the URL you have provided), then apply the binmode modifier to both terms:
rio('nice.jpg').binmode < rio('http://...').binmode
2011 UPDATE
According to Luke C., the above answer no longer applies to more recent versions of the gem:
Neither of these work. On Linux having .binmode set on the destination causes a Errno::ENOENT exception. Doing: rio('nice.jpg') < rio('http://...').binmode works
It works for me. Are you on Windows? It might be because the file isn't being opened with the binary flag.
I had similar problems downloading images on Linux, I found that this worked for me:
rio(source_url).binmode > rio(filename)
Here is some simple ruby code to download an image
require 'net/http'
url = URI.parse("http://www.somedomain.com/image.jpg")
Net::HTTP.start(url.host, url.port) do |http|
  # http.get returns only the response object; the body is in resp.body
  resp = http.get(url.path)
  open(File.join(File.dirname(__FILE__), "image.jpg"), "wb") { |file| file.write(resp.body) }
end
This can even be extended to follow redirects:
require 'net/http'
url = URI.parse("http://www.somedomain.com/image.jpg")
host, port = url.host, url.port
Net::HTTP.start(host, port) do |http|
  resp = http.get(url.path)
  prev_redirect = ''
  while resp['location']
    raise "Recursive redirect: #{resp['location']}" if prev_redirect == resp['location']
    prev_redirect = resp['location']
    url = URI.parse(resp['location'])
    # A relative redirect has no host or port, so keep the previous ones
    host = url.host if url.host
    port = url.port if url.port
    http = Net::HTTP.new(host, port)
    resp = http.get(url.path)
  end
  open(File.join(File.dirname(__FILE__), "image.jpg"), "wb") { |file| file.write(resp.body) }
end
It can probably be prettied up some, but it gets the job done, and is not dependent on any 3rd party gems! :)
I guess this is a bug. On Windows, every 0x0A byte is replaced with 0x0D 0x0A. So it makes sense that, used properly (with .binmode), it works on Linux.
For downloading pictures from the web page, you can use ruby gem image_downloader
