Read from a tar.gz file without saving the unpacked version - ruby

I have a tar.gz file saved on disk and I want to leave it packed there, but I need to open one file within the archive, read from it and save some information somewhere.
File structure:
base_folder
file_i_need.txt
other_folder
other_file
code (it is not much - I tried 10mio different ways and this is what is left)
def self.open_file(file)
uncompressed_file = Gem::Package::TarReader.new(Zlib::GzipReader.open(file))
uncompressed_file.rewind
end
When I run it in a console I get
<Gem::Package::TarReader:0x007fbaac178090>
and I can run commands on the entries. I just haven't figured out how to open an entry and read from it without saving it unpacked to disk. I mainly need the string from the text file.
Any help appreciated. I might just be missing something...

TarReader is Enumerable, returning Entry.
That said, to retrieve the text content from the file by it’s name one might
uncompressed = Gem::Package::TarReader.new(Zlib::GzipReader.open(file))
text = uncompressed.detect do |f|
f.fullname == 'base_folder/file_i_need.txt'
end.read
#⇒ Hello, I’m content of the text file, located inside gzipped tar
Hope it helps.

Related

Ruby Dropbox APP: How to download a word document

I'm having troubles trying to download word documents from a dropbox using an APP controlled by a ruby program. (I would like to have the ability to download any file from a dropbox).
The code they provide is great for "downloading" a .txt file, but if you try using the same code to download a .docx file, the "downloaded" file won't open in word due to "corruption."
The code I'm using:
contents = #client.get_file(path + filename)
open(filename, 'w') {|f| f.puts contents }
For variable examples, path could be '/', and filename could be 'aFile.docx'. This works, but the file, aFile.docx, that is created can not be opened. I am aware that this is simply grabbing the contents of the file and then creating a new file and inserting the contents.
Try this:
open(filename, 'wb') { |f| f.write contents }
Two changes from your code:
I used the file mode wb to specify that I'm going to write binary data. I don't think this makes a difference on Linux and OS X, but it matters on Windows.
I used write instead of puts. I believe puts expects a string, while you're trying to write arbitrary binary data. I assume this is the source of the "corruption."

How to check the content of each .txt file in a folder with Ruby

I have a folder that contains files. I was wondering how I can chech every .txt file in the folder if it contains the word "BREAK". I know it must be very easy but I kinda miss the way of getting it done.
This is what I've tried so far
Dir.glob('/path/to/dir/*.txt') do |txt_file|
# And here I need a method that opens the 'txt_file'
# and checks if it contains "BREAK"
end
The below would return an array of files containing "BREAK"
files = Dir.glob('/path/to/dir/*.txt').select do |txt_file|
File.read(txt_file).include? "BREAK"
end

Save twitter search result to JSON file

I am using twitter ruby gem to fetch twitter search result. The example code from Github extracts the information from search result.I am wondering how to save the search result, which is JSON i think, to a separate JSON file.
Here is part of the example code:
results = #search.perform("$aaa", 1000)
aFile = File.new("data.txt", "w")
results.map do |status|
myStr="#{status.from_user}: #{status.text} #{status.created_at}"
aFile.write(myStr)
aFile.write("\n")
end
Is there any way to save all the search result to a separate JSON file instead of writing strings to a file?
Thanks in advance.
If you want to save to a file all you need to do is open the file, write it it, then close it:
File.open("myFileName.txt", "a") do |mFile|
mFile.syswrite("Your content here")
mFile.close
end
When you use open you will create the file if it doesn't exist.
One thing to be aware of is that there are different ways to open file, of which will determine where the program writes to. The "a" indicates that it will append everything you write to the file, to the end of the current content.
Here is some of the options:
r Read-only mode. The file pointer is placed at the beginning of the file. This is the default mode.
r+ Read-write mode. The file pointer will be at the beginning of the file.
w Write-only mode. Overwrites the file if the file exists. If the file does not exist, creates a new file for writing.
w+ Read-write mode. Overwrites the existing file if the file exists. If the file does not exist, creates a new file for reading and writing.
a Write-only mode. The file pointer is at the end of the file if the file exists. That is, the file is in the append mode. If the file does not exist, it creates a new file for writing.
a+ Read and write mode. The file pointer is at the end of the file if the file exists. The file opens in the append mode. If the file does not exist, it creates a new file for reading and writing.
So in your case, you would want to pull out the data you want to save, then write it to a file as I have shown. You can also specify file paths by doing:
File.open("/the/path/to/yourfile/myFileName.txt", "a") do |mFile|
mFile.syswrite("Your content here")
mFile.close
end
Another thing to be aware of is that open does not create directories, so you will either need to create directories yourself, or you can do it with your program. Here is a link that is helpful for file input/output:
http://www.tutorialspoint.com/ruby/ruby_input_output.htm

Convert a PDF to .txt gives me an empty .txt file

Hi I'm trying to read a pdf in Ruby, first of all I want to convert it into a txt. path is the path to the PDF, The point is that I get a .txt file empty, and as someone told me is a pdftotext problem, but I don't know how to fix it.
spec = path.sub(/\.pdf$/, '')
`pdftotext #{spec}.pdf`
file = File.new("#{spec}.txt", "w+")
text = []
file.readlines.each do |l|
if l.length > 0
text << l
Rails.logger.info l
end
end
file.close
What's wrong with my code? Thanks!
It's not possible to extract text from every PDF. Some PDF files use a font encoding that makes it impossible to extract text with simple tools such as pdftotext (and some PDF files are even completely immune to direct text extraction with any tool known to me -- in these cases you'll have to apply OCR first to have a chance to extract text...).
So if you test your code with the same "weird" PDF file all the time, it may well happen that you're getting frustrated over your code while in reality the fault lies with the PDF.
First make sure that the commandline usage of pdftotxt works well with a given PDF, then test (and develop further) your code with that PDF.
The problem is you are opening the file in write ("w") mode, whuch truncates the file. You can see a table of file modes and what they mean at http://ruby-doc.org/core-1.9.3/IO.html.
Try something like this, it uses a pdftotext option to send the text to stdout to avoid creating a temporary file and uses blocks for more idiomatic ruby.
text = `pdftotext #{path} -`
text.split.select { |line|
line.length > 0
}.each { |line|
Rails.logger.info(line)
}
You would need to open the txt file with write permission.
file = File.new("#{spec}.txt", "w")
You could consult How to create a file in Ruby
Update: your code is not complete and looks buggy.
Cant say what is path
Looks like you are trying to read the text file to which you intend to write file.readlines.each
spell check length you have it l.lenght
You may want to paste the actual code.
Check this gist https://gist.github.com/4160587
As mentioned, your code is not working because you are reading and writing to the same file.
Example
Ruby code file_write.rb to do the file write operation
pdf_file = File.open("in.txt")
output_file = File.open("out.txt", "w") # file to which you want to write
#iterate over input file and write the content to output file
pdf_file.readlines.each do |l|
output_file.puts(l)
end
output_file.close
pdf_file.close
Sample txt file in.txt
Some text in file
Another line of text
1. Line 1
2. Not really line 2
Once your run file_write.rb you should see new file called out.txt with same content as in.txt You could change the content of input file if you want. In your case you would use pdf reader to get the content and write it to the text file. Basically first line of the code will change.

How can I modify .xfdl files? (Update #1)

The .XFDL file extension identifies XFDL Formatted Document files. These belong to the XML-based document and template formatting standard. This format is exactly like the XML file format however, contains a level of encryption for use in secure communications.
I know how to view XFDL files using a file viewer I found here. I can also modify and save these files by doing File:Save/Save As. I'd like, however, to modify these files on the fly. Any suggestions? Is this even possible?
Update #1: I have now successfully decoded and unziped a .xfdl into an XML file which I can then edit. Now, I am looking for a way to re-encode the modified XML file back into base64-gzip (using Ruby or the command line)
If the encoding is base64 then this is the solution I've stumbled upon on the web:
"Decoding XDFL files saved with 'encoding=base64'.
Files saved with:
application/vnd.xfdl;content-encoding="base64-gzip"
are simple base64-encoded gzip files. They can be easily restored to XML by first decoding and then unzipping them. This can be done as follows on Ubuntu:
sudo apt-get install uudeview
uudeview -i yourform.xfdl
gunzip -S "" < UNKNOWN.001 > yourform-unpacked.xfdl
The first command will install uudeview, a package that can decode base64, among others. You can skip this step once it is installed.
Assuming your form is saved as 'yourform.xfdl', the uudeview command will decode the contents as 'UNKNOWN.001', since the xfdl file doesn't contain a file name. The '-i' option makes uudeview uninteractive, remove that option for more control.
The last command gunzips the decoded file into a file named 'yourform-unpacked.xfdl'.
Another possible solution - here
Side Note: Block quoted < code > doesn't work for long strings of code
The only answer I can think of right now is - read the manual for uudeview.
As much as I would like to help you, I am not an expert in this area, so you'll have to wait for someone more knowledgable to come down here and help you.
Meanwhile I can give you links to some documents that might help you:
UUDeview Home Page
Using XDFLengine
Gettting started with the XDFL Engine
Sorry if this doesn't help you.
You don't have to get out of Ruby to do this, can use the Base64 module in Ruby to encode the document like this:
irb(main):005:0> require 'base64'
=> true
irb(main):007:0> Base64.encode64("Hello World")
=> "SGVsbG8gV29ybGQ=\n"
irb(main):008:0> Base64.decode64("SGVsbG8gV29ybGQ=\n")
=> "Hello World"
And you can call gzip/gunzip using Kernel#system:
system("gzip foo.something")
system("gunzip foo.something.gz")

Resources