Zip::ZipFile: How to modify contents of inner textfiles without unpacking zip? - ruby

Cheers,
as a beginner to Ruby, I am currently solving my smaller-world problems with Ruby to get accustomed to it. Right now I am trying to modify the contents of a text file within a zip container.
The structure is
ZIP
>> directory/
>> mytext.text
And I am able to iterate over the contents
Zip::ZipFile.open(file_path) do |zipfile|
  files = zipfile.select(&:file?)
  files.each do |zip_entry|
    ## ....?
  end
end
...but I find it very difficult to modify the text file without unpacking it.
Any help appreciated!

So with the help of Ben, here's one solution:
require "rubygems"
require "zip/zip"
zip_file_name = "src/test.zip"
Zip::ZipFile.open(zip_file_name) do |zipfile|
  files = zipfile.select(&:file?)
  files.each do |zip_entry|
    contents = zipfile.read(zip_entry.name)
    zipfile.get_output_stream(zip_entry.name) { |f| f.puts contents + ' added some text' }
  end
  zipfile.commit
end
I thought I had tried this before. Anyway, thanks a lot!

This snippet adds " added some text" to the end of myFile.txt.
Zip::File.open(file_path) do |zipfile|
  contents = zipfile.read('myFile.txt')
  zipfile.get_output_stream('myFile.txt') { |f| f.puts contents + ' added some text' }
end
For some reason, the modifications to the zip file aren't saved if the writing (the call to get_output_stream) is done while using each to iterate over the archive's files.
Edit: To modify files while iterating over them via each, open the archive with Zip::ZipFile.open (see Chris's answer for an example).
Hopefully, this snippet will help point you in the right direction.

Related

Opening a Text File in Ruby

I am trying to create a program that will count the word frequency within a text file that I have created. I have a text file titled moms_letter.txt and this is my code:
word_count = {}
File.open("moms_letter.txt", "r") do |f|
  f.each_line do |line|
    words = line.split(' ').each do |word|
      word_count[word] += 1 if word_count.has_key? word
      word_count[word] = 1 if not word_count.has_key? word
    end
  end
end
puts word_count
The problem I am getting is when I go to run the file, I get the error:
No such file or directory - moms_letter.txt (Errno::ENOENT)
Not quite sure why this is occurring when I have the text file created.
Any help is appreciated.
I am also newbie in Ruby, so thanks for the patience.
You must be executing your program from outside the directory where your moms_letter.txt file resides. You need to use an absolute path to open your file. Or, execute your program always from the directory where the .txt is. So, instead of using "moms_letter.txt" go with "complete/path/to/file/moms_letter.txt".
I'm fairly new to Ruby too, but have worked with text files a bit recently. It may seem like an obvious question, but is the text file you're trying to open in the same directory as your .rb file? Otherwise you'll need to include the relative path to it.
For troubleshooting's sake, try File.new("temp.txt", "w") and then File.open("temp.txt", "r") to see if that works. Then you'll know if it's an issue with your code or with the txt file you're trying to access.
Also, File.exist?("moms_letter.txt") will tell you whether that file can be found from within your .rb script. (The older spelling File.exists? is deprecated and has been removed in recent Ruby versions.)
Hope that helps!
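To make the path advice above concrete, here is a minimal sketch that resolves the file relative to the script itself rather than the current working directory, and checks that it exists before counting. It assumes moms_letter.txt sits next to the .rb file; count_words is just an illustrative helper name, and Hash.new(0) replaces the two has_key? checks from the question.

```ruby
# Count how often each whitespace-separated word occurs in a file.
def count_words(path)
  counts = Hash.new(0) # missing keys start at 0, so += 1 always works
  File.foreach(path) do |line|
    line.split.each { |word| counts[word] += 1 }
  end
  counts
end

# Assumption: moms_letter.txt lives in the same directory as this script.
letter_path = File.expand_path("moms_letter.txt", __dir__)

if File.exist?(letter_path)
  puts count_words(letter_path)
else
  warn "Cannot find #{letter_path} (current directory: #{Dir.pwd})"
end
```

Because the path is built from __dir__, the script should behave the same no matter which directory it is started from.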

Find every file in a directory and render them all using rdiscount

I've got a directory called "posts" which is filled with .md files. Right now rdiscount renders only one file (one.md), then puts the product into a variable (@content). Because this is done issuing...
@content = markdown(:one)
...I'm really confused as to how to make Ruby 1) find every file in the directory and 2) render everything using rdiscount. Any ideas?
You can use Dir.glob to find and iterate all the Markdown files in the directory.
Dir.glob("path/to/folder/*.md") do |file|
  # do what you want with file
end
To extend @Simone Carletti's answer, in order to answer part 2 of your question:
@content = ""
Dir.glob("path/to/folder/*.md") do |file|
  @content << markdown(file)
end
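A small, framework-independent sketch of the same idea: collect the files with Dir.glob, read each one, and hand the text to whatever renderer is in use. render_all is a made-up helper name, and the block in the usage line is only a stand-in for the real Markdown call (with the rdiscount gem that would typically be RDiscount.new(text).to_html).

```ruby
# Render every .md file in a directory, in file-name order.
# The caller's block receives each file's text and returns the rendered output.
def render_all(dir)
  Dir.glob(File.join(dir, "*.md")).sort.map do |path|
    yield File.read(path)
  end.join("\n")
end

# Stand-in renderer for illustration only; swap in the real Markdown call.
content = render_all("posts") { |text| "<p>#{text.strip}</p>" }
puts content
```

Sorting the glob result keeps the output order stable, since Dir.glob does not promise any particular order on every platform.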

Ruby - Reading and editing XML file

I am writing a Ruby (1.9.3) script that reads XML files from a folder and then edits them if necessary.
My issue is that I was given XML files converted by Tidy, but its output is a little strange, for example:
<?xml version="1.0" encoding="utf-8"?>
<XML>
<item>
<ID>000001</ID>
<YEAR>2013</YEAR>
<SUPPLIER>Supplier name test,
Coproration</SUPPLIER>
...
As you can see, the SUPPLIER element has an extra CRLF. I don't know why Tidy behaves this way, but I am addressing it with a Ruby script. I am having trouble, though, because I need to check whether the last character of a line is ">" or the first is "<" so that I can tell whether something is wrong with the markup.
I have tried:
Dir.glob("C:/testing/corrected/*.xml").each do |file|
  puts file
  File.open(file, 'r+').each_with_index do |line, index|
    first_char = line[0,1]
    if first_char != "<"
      # copy this line to the previous line and delete this one?
    end
  end
end
I also feel like I should be copying the original file content to a temporary file as I read it and then overwriting the original. Is that the best way? Any tips are welcome, as I do not have much experience in altering a file's content.
Regards
Does that extra \n always appear in the <SUPPLIER> node? As others have suggested, Nokogiri is a great choice for parsing XML (or HTML). You could iterate through each <SUPPLIER> node and remove the \n character, then save the XML as a new file.
require 'nokogiri'
# read and parse the old file
file = File.read("old.xml")
xml = Nokogiri::XML(file)
# replace \n and any additional whitespace with a space
xml.xpath("//SUPPLIER").each do |node|
  node.content = node.content.gsub(/\n\s+/, " ")
end
# save the output into a new file
File.open("new.xml", "w") do |f|
  f.write xml.to_xml
end
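If pulling in Nokogiri is not an option, the line-joining idea from the question can also be sketched with plain Ruby. This is only a sketch: join_broken_lines and fix_file are made-up helper names, and it assumes that any non-blank line that does not start with "<" is a continuation of the previous line, which may not hold for every file Tidy produces. It writes to a temporary file first and then renames it over the original, as the question suggests.

```ruby
# Join lines that do not start with "<" onto the previous line.
# Assumption: such lines are always continuations of broken element text.
def join_broken_lines(text)
  joined = []
  text.each_line do |line|
    line = line.chomp # strips a trailing "\r\n" or "\n"
    if joined.empty? || line.strip.empty? || line.lstrip.start_with?("<")
      joined << line
    else
      joined[-1] = "#{joined[-1].rstrip} #{line.strip}"
    end
  end
  joined.join("\n") + "\n"
end

# Write the repaired text to a temporary file, then replace the original.
def fix_file(path)
  tmp_path = "#{path}.tmp"
  File.write(tmp_path, join_broken_lines(File.read(path)))
  File.rename(tmp_path, path)
end

Dir.glob("C:/testing/corrected/*.xml").each { |file| fix_file(file) }
```

Writing to a separate file and renaming means the original is only replaced once the repaired copy has been written completely.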

Trouble conceptualizing how to have LDA-Ruby read multiple .txt files

I am attempting to write a Ruby script that will look at a collection of unstructured plain text files and I am struggling with thinking through the best way to process these files. The current working version of my script for topic modeling is the following:
#!/usr/bin/env ruby -w
require 'rubygems'
require 'lda-ruby'
# Input a directory of files
FILES_DIRECTORY = ARGV[0]
File.open("files.csv", "w") do |f|
  Dir.glob(FILES_DIRECTORY + "*.txt") do |filename|
    file_id = File.basename(filename).gsub(".txt", "")
    text = File.read(filename).clean
    f.puts [file_id, text].join(",")
  end
end
# Read csv
file = File.open("files.csv", "r") { |f| f.read }
# Train topics and infer
corpus = Lda::Corpus.new
corpus.add_document(Lda::TextDocument.new(corpus, file))
lda = Lda::Lda.new(corpus)
lda.verbose = false
lda.num_topics = 20
lda.em('random')
topics = lda.top_words(10)
puts topics
What I'm attempting to modify is having this program read through a collection of plain text files rather than a single file. It's not as easy as just tossing all the text files into a single file (as it currently does with files.csv) because, as I understand it, lda-ruby looks for multiple files to do a correct topic model rather than a single file. (I've come to this conclusion because there is little variance between having this script read a single text file [e.g., corpus.txt] that includes all the text, and the files.csv file.)
So, my question is how can I have lda-ruby iterate through these text files differently? Should the contents of the files be placed into a hash instead? If so, any pointers on where I should start with that? Or, should I scrap this and use a different LDA library?
Thanks ahead of time for any advice.
Basically, you just need to initialize the corpus before going through the directory and then add each file to the corpus in the block the same way you were previously adding your CSV file.
#!/usr/bin/env ruby -w
require 'rubygems'
require 'lda-ruby'
# Input a directory of files
FILES_DIRECTORY = ARGV[0]
corpus = Lda::Corpus.new
# Add each text file to the corpus as its own document.
# (The files.csv wrapper from the question is no longer needed.)
Dir.glob(File.join(FILES_DIRECTORY, "*.txt")) do |filename|
  text = File.read(filename)
  corpus.add_document(Lda::TextDocument.new(corpus, text))
end
lda = Lda::Lda.new(corpus)
lda.verbose = false
lda.num_topics = 20
lda.em('random')
topics = lda.top_words(10)
puts topics
I know this is a rather old question, but I found this question while looking for a solution to a similar problem. Your code helped me so I thought my answer might be helpful to you or others.
If you have a directory of text files you want to use as documents, you can use the following line to create your corpus:
corpus = Lda::DirectoryCorpus.new('path/to/directory')

My file is getting shorter and I don't know why

I have a requirement to edit part of an XML file and save it, but with my code some part of the XML file is not being saved. I want to change <mtn:ttl>4</mtn:ttl> to <mtn:ttl>9</mtn:ttl>. The code below does modify that element, but on writing/saving either only part of the file is changed or the format of the file changes. Can anyone tell me how to solve this? The original XML file is 79 KB, but after editing and saving it becomes 78 KB...
require "rexml/text"
require "rexml/document"
include REXML
File.open("c://conf//cad-mtn-config.xml") do |config_file|
  # Open the document and edit the file
  config = Document.new(config_file)
  if testField.to_s.match(/<mtn:ttl>/)
    config.root.elements[4].elements[11].elements[1].elements[1].elements[1].elements[8].text="9"
    # Write the result to a new file.
    formatter = REXML::Formatters::Default.new
    File.open("c://mtn-3//mtn-2.2//conf//cad-mtn-config.xml", 'w') do |result|
      formatter.write(config, result)
    end
  end
end
It looks like you're trying to use regular expressions; why not just use REXML? The only requirement is that you need to know the URL of the namespace. Note that if the element were just ttl rather than mtn:ttl, you would not need the namespace.
require 'rexml/document'
file_path = "path to file"
contents = File.new(file_path).read
xml_doc = REXML::Document.new(contents)
xml_doc.add_namespace('mtn', "http://url to mtn namespace")
xml_doc.root.elements.each('mtn:ttl') do |element|
  element.text = "9"
end
File.open(file_path, "w") do |data|
  data << xml_doc
end
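As a variation on the above, here is a hedged sketch that finds every mtn:ttl element at any depth with an XPath query and an explicit prefix-to-URI mapping, instead of a long chain of element indexes. The namespace URI and the sample document are placeholders invented for illustration, and set_ttl is a made-up helper name; substitute the URI your config file actually declares.

```ruby
require "rexml/document"

# Assumption: placeholder URI; use the one declared in the real config file.
MTN_NS = { "mtn" => "http://example.com/mtn" }

# Set the text of every mtn:ttl element, at any depth, and return the XML.
def set_ttl(xml_text, new_value)
  doc = REXML::Document.new(xml_text)
  REXML::XPath.each(doc, "//mtn:ttl", MTN_NS) do |element|
    element.text = new_value.to_s
  end
  doc.to_s
end

# Invented sample document for illustration.
sample = '<conf xmlns:mtn="http://example.com/mtn">' \
         "<mtn:ttl>4</mtn:ttl><mtn:name>x</mtn:name></conf>"
puts set_ttl(sample, 9)
```

Note that a round trip through a parser can change the byte count slightly without dropping data, so comparing the content of the two files is more telling than comparing their sizes.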
