Parsing an xlsx file in Ruby using the roo gem

My HTML code; here I am passing an xlsx file for parsing:
<form method="post" action="/home/parse_xlsx" enctype="multipart/form-data">
Upload XLSX File <input type="file" name="xlsx_file" id="xlsx_file" />
<input type="submit" value="Post"/>
</form>
My controller code:
def parse_xlsx
  xlsxFile = params[:xlsx_file]
  prefix_tmp_path = xlsxFile.path
  filename = xlsxFile.original_filename
  fullname = File.join(prefix_tmp_path,filename)
  require 'roo'
  s = Roo::Excelx.new(fullname)
  for i in 1..14
    puts s.cell(i,3)
  end
end
It gives me this error:
file /tmp/RackMultipart20130910-10043-u4nqsc/CMS.xlsx does not exist
When I run the following code in the console, with my 'CMS.xlsx' file in the Rails root folder, it runs without any errors:
require 'roo'
s = Roo::Excelx.new("CMS.xlsx")
for i in 1..14
puts s.cell(i,3)
end
Please explain where I am going wrong.

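xlsxFile.path is already the full path to the uploaded temp file, so you shouldn't join the filename onto it: File.join(xlsxFile.path, filename) treats that file as if it were a directory and builds a path that doesn't exist. If you need to save the file, you can rename it to the original filename when you move it to its final location.
Try: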
s = Roo::Excelx.new(xlsxFile.path)
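To see why the joined path fails, here is a minimal stdlib-only sketch. It assumes a `Tempfile` is a fair stand-in for the Rack upload (the `RackMultipart` prefix and the file contents are made up); no roo is needed to show the path problem.

```ruby
require 'tempfile'

# Stand-in for a Rack upload: the bytes land in a temp file whose path is
# already the full path of a regular file, not a directory.
upload = Tempfile.new('RackMultipart')
upload.write('fake xlsx bytes')
upload.flush

original_filename = 'CMS.xlsx'

# The buggy approach: treats the temp file path as a directory.
wrong_path = File.join(upload.path, original_filename)

# The fix: use the temp file path as-is.
right_path = upload.path

puts File.exist?(wrong_path) # nothing can live "inside" a regular file
puts File.exist?(right_path)
```

In the controller that means `Roo::Excelx.new(xlsxFile.path)`. If your roo version rejects the temp file because it has no `.xlsx` extension, recent roo versions document `Roo::Spreadsheet.open(xlsxFile.path, extension: :xlsx)` for exactly this case; check the README for the version you have installed.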

Related

Parse quoted-printable encoding content from .mht file

I am trying to get all the images from an .mht file using the Nokogiri gem. But since the .mht file uses quoted-printable encoding, all the image tags I get back have weird characters in them:
<img alt='3D"AFC-Logo' src="3D%22https://upload.=" width='3D"75"' height='3D"75"'>
<img src="3D%22https://en.wikipedia.org/static/images/footer/wikimedia-butto=" width='3D"88"' height='3D"31"' alt='3D"Wikimedia'>
<img src="3D%22https://en.wikipedia.org/static/images/footer/poweredby_mediawiki_8=" alt='3D"Powered' width='3D"88"' height='3D"31"'>
This is the link to that .mht file: https://drive.google.com/file/d/1DtbgrFyCEcggAk1nqpZSluNhRt-k3t95/view?usp=sharing
And below is the code that I am using to get all the images from the .mht file:
require 'nokogiri'
def get_image_links(html)
  html_doc = Nokogiri::HTML(html)
  nodes = html_doc.xpath("//img[@src]")
  raise "No <img .../> tags!" if nodes.empty?
  nodes.inject([]) do |uris, node|
    puts node.to_s
    uris << node.attr('src').strip
  end.uniq
end
html = File.read("1646037951.mht")
image_links = get_image_links(html)
I have tried decoding it with .unpack('M').first, but it's still not working: it just returns the same result as above.
Or maybe Rails has something for this?
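One plausible cause, which is an assumption worth checking against the real file: `unpack('M')` was applied to the whole file, MIME headers included. Ruby's quoted-printable decoder appears to stop at the first `=` that is not followed by two hex digits or a line break (for example `charset="utf-8"` in a header) and return the rest undecoded. This sketch decodes only the HTML part's body, then forces UTF-8. The MHT excerpt is hypothetical, and it assumes you have already isolated the `text/html` part (a real .mht has several parts separated by a MIME boundary).

```ruby
# Hypothetical excerpt of one MHT part: MIME headers, a blank line, then an
# HTML body in quoted-printable encoding (=3D is "=", and a trailing "=" is a
# soft line break).
mht_part = <<~MHT
  Content-Type: text/html; charset="utf-8"
  Content-Transfer-Encoding: quoted-printable

  <img src=3D"https://en.wikipedia.org/static/images/footer/wikimedia-butto=
  n.png" width=3D"88" height=3D"31">
MHT

# Split off the headers at the first blank line, then decode only the body.
_headers, body = mht_part.split(/\r?\n\r?\n/, 2)
html = body.unpack1('M').force_encoding('UTF-8')

puts html
```

The decoded `html` (rather than the raw file contents) is what you would then pass to `get_image_links`.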

"params" doesn't work in Ruby (Sinatra framework)

I have a simple Sinatra app in which I want to create a form so users can change their number. However, I don't even get as far as changing the number because "params" is not working. Everything else works: I can see the parameters in the URL, but if I print "params", nothing shows up except "Echo".
class MyApp < Sinatra::Application
  register Sinatra::ActiveRecordExtension
  get '/changenumber' do
    p params
    p params[:mynumber]
    p "Echo"
  end
end
And the form:
<form action="/changenumber" method="GET">
Phone: <input type="text" name="mynumber" value="<%= user.number %>">
<input type="submit" value="Change Number">
</form>
As vu-minh-tan pointed out, you should probably use POST instead of GET.
I rebuilt your example and it works well:
{"mynumber"=>"test"}
"test"
"echo"
IP - - [TIME] "GET /changenumber?mynumber=test HTTP/1.1" 200 4 0.0005
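Based on this, I think the problem is that you are only looking at the output in your browser. The browser shows only the value of the last line of the route block ("Echo", since p returns its argument); the output of p params goes to the terminal where the server is running. You should probably try something like this: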
get '/changenumber' do
  "Params: #{params} mynumber: #{params[:mynumber]}"
end
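The reasoning above can be checked in plain Ruby without Sinatra. In this sketch the lambda is a hypothetical stand-in for the route block: like a Sinatra route, its value is the last expression evaluated, which is what would become the response body, while `p` writes to standard output.

```ruby
# Hypothetical stand-in for the Sinatra route block.
change_number = lambda do |params|
  p params              # prints to STDOUT, i.e. the server's terminal/log
  p params['mynumber']  # also goes to the terminal
  p 'Echo'              # p returns its argument, so the block evaluates to "Echo"
end

body = change_number.call({ 'mynumber' => 'test' })
puts "Browser would show: #{body}"
```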

text_field.set not working in test script but works fine in irb

I'm trying to rename a folder in the following markup:
<li class="selected rename" id="labelset-624" folderid="624" foldertype="labelset" permissionlevel="2" labelsetid="624">
<div class="folder-insert-drop ui-droppable"></div>
<div class="clear"></div>
<div class="folder-item droppable hoverable empty ui-droppable">
<div id="mlink-labelset-624" class="folder-menu-link" data-hasfullperm="true" data-subfoldertype="undefined"></div>
<div class="expander"></div>
<div class="folder-name labelset label-set">New Label Set</div>
<div class="target-bar"></div>
<div class="folder-rename">
<input value="New Label Set" id="folder-rename-624" maxlength="100" type="text">
</div>
with watir-webdriver using the following commands:
@b.li(:class, "selected rename").div(:class, "folder-rename").text_field.wait_until_present
@b.li(:class, "selected rename").div(:class, "folder-rename").text_field.set labelsetName
@b.li(:class, "selected rename").div(:class, "folder-rename").text_field.send_keys :return
And it gives me the following error:
Watir::Exception::UnknownObjectException: unable to locate element, using {:class=>"selected rename", :tag_name=>"li"}
When I run my test script (test-unit), I can see the value of labelsetName entered into the text field, but it quickly disappears and reverts to the default value. This causes the send_keys statement to fail.
When I enter the same commands into irb, it works perfectly. I tried adding sleeps of up to 15 seconds between steps to no avail. Is there any reason the two would work differently? Any suggestions for fixing this going forward?
Unless you have a compelling reason otherwise, try accessing the <input> tag directly using the id attribute:
b.text_field(:id => "folder-rename-624").set "foo"
b.text_field(:id => "folder-rename-624").send_keys :return
And, if there's an associated submit button, try clicking that instead of using send_keys :return.
EDIT: Unfortunately, I can't reproduce the disappearing text issue. But I'm adding this snippet, which should handle the incrementing id attribute:
last_id = b.text_fields.last.id
b.text_field(:id => last_id).set "foo"
b.text_field(:id => last_id).send_keys :return
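If the newest field is not necessarily the last one in DOM order, another option is to pick the id with the highest numeric suffix. This sketch assumes ids keep the `folder-rename-NNN` pattern and grow with each new folder; the id list is hypothetical, and collecting it with a regular-expression locator such as `b.text_fields(:id => /\Afolder-rename-\d+\z/).map(&:id)` is an assumption about your Watir version.

```ruby
# Hypothetical ids, as they might be collected from the page with Watir.
ids = ['folder-rename-622', 'folder-rename-624', 'folder-rename-623']

# Choose the field with the highest numeric suffix instead of relying on DOM order.
newest_id = ids.max_by { |id| id[/\d+\z/].to_i }

puts newest_id
```

You would then use `b.text_field(:id => newest_id)` as in the snippet above.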
It turns out that because I had run the test a number of times, each time creating a new folder, the folder I was trying to rename had been pushed off screen. That is what caused the error.

How to ensure files are closed when reference is held by a different object

In Ruby, when the reference to an open file is handed to another object, as in the following code, do I need to wrap the other object reference in a "begin/ensure" block, to ensure the unmanaged resources get closed, or is there another way?
@doc = Nokogiri::XML(File.open("shows.xml"))
@doc.xpath("//character")
# => ["<character>Al Bundy</character>",
# "<character>Bud Bundy</character>",
# "<character>Marcy Darcy</character>",
# "<character>Larry Appleton</character>",
# "<character>Balki Bartokomous</character>",
# "<character>John "Hannibal" Smith</character>",
# "<character>Templeton "Face" Peck</character>",
# "<character>"B.A." Baracus</character>",
# "<character>"Howling Mad" Murdock</character>"]
In general, it's OK not to worry about leaving a single file open if nothing else will be trying to write to it and your program isn't a long-running application. Ruby will close the file as it shuts down and exits. (The operating system also releases any file descriptors a process still holds when it exits.)
If you're worried about it though, I'd recommend using the block form of File.open because it automatically closes the file when your code exits the block:
require 'nokogiri'
# File.open with a block closes the file on exit and returns the block's value
doc = File.open('./test.html', 'r') do |fi|
  Nokogiri::HTML(fi)
end
puts doc.to_html
Because I was curious and have always wondered, I ran a little test. I saved some HTML to a file called "test.html" and ran this in IRB:
test.rb(main):001:0> require 'nokogiri'
=> true
test.rb(main):002:0> page = File.open('test.html', 'r')
=> #<File:test.html>
test.rb(main):003:0> page.eof?
=> false
test.rb(main):004:0> page.closed?
=> false
test.rb(main):005:0> doc = Nokogiri::HTML(page)
=> #<Nokogiri::HTML::Document:0x3fc10149bc98 name="document" children=[#<Nokogiri::XML::DTD:0x3fc10149b6f8 name="html">, #<Nokogiri::XML::Element:0x3fc10149ef60 name="html" children=[...]>]>
test.rb(main):006:0> page.eof?
=> true
test.rb(main):007:0> page.closed?
=> false
test.rb(main):008:0> page.close
=> nil
test.rb(main):009:0> page.closed?
=> true
So, in other words, Nokogiri does not close open files.
I believe that Nokogiri will close the file as soon as it's read, but how about being safe and replacing File.open with File.read?
Nope, as the Tin Man points out, Nokogiri does not close the file handle after reading it. The proper way to deal with it is still:
doc = Nokogiri::XML File.read("shows.xml")
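The three approaches discussed here (an explicit close, the block form of File.open, and File.read) can be compared without Nokogiri. This stdlib-only sketch writes a throwaway file (its name and contents are made up) and checks `closed?` the same way the IRB session does.

```ruby
require 'tmpdir'
require 'fileutils'

dir  = Dir.mktmpdir
path = File.join(dir, 'test.html')
File.write(path, '<html><body><p>hi</p></body></html>')

# 1. No block: reading to EOF does not close the handle; we must close it.
page = File.open(path, 'r')
manual = page.read
open_after_read = !page.closed?
page.close

# 2. Block form: File.open closes the handle when the block exits
#    (normally or via an exception) and returns the block's value.
handle = nil
from_block = File.open(path, 'r') do |f|
  handle = f
  f.read
end

# 3. File.read opens, reads, and closes the file in one call.
from_read = File.read(path)

FileUtils.remove_entry(dir)

puts open_after_read
puts handle.closed?
puts manual == from_block && from_block == from_read
```

With Nokogiri, you would pass `f` to `Nokogiri::HTML` inside the block, or hand it the string returned by `File.read`.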

How to use rake to insert/replace html section in each file?

I'm using rake to create a table of contents from a bunch of static HTML files.
The question is: how do I insert it into all the files from within rake?
I have a <ul id="toc"> in each file to target, and I want to replace its entire content.
I was thinking about using Nokogiri or similar to parse the document and replace the DOM node ul#toc. However, I don't like the idea of having to write the parser's DOM back out to the HTML file. What if it changes my layout, indentation, etc.?
Any thoughts/ideas? Or perhaps links to working examples?
Could you rework the files to .rhtml, where
<ul id="toc">
is replaced with an erb directive, such as
<%= get_toc() %>
where get_toc() is defined in some library module. Write the transformed files as .html (to another directory if you like) and you're in business and the process is repeatable.
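The ERB idea can be sketched with Ruby's standard `erb` library. The `get_toc` body and the template contents are hypothetical placeholders; in a rake task you would read each `.rhtml` file and write the result out as `.html`.

```ruby
require 'erb'

# Hypothetical TOC builder standing in for get_toc() "defined in some library module".
def get_toc
  '<ul id="toc"><li>Item 1</li><li>Item 2</li></ul>'
end

# Contents of a hypothetical page.rhtml in which <ul id="toc"> was replaced
# by an ERB directive.
template = <<~RHTML
  <html>
  <body>
  <%= get_toc() %>
  <h1>Item 1</h1>
  </body>
  </html>
RHTML

# Evaluate the template with the current binding so get_toc is callable.
html = ERB.new(template).result(binding)
puts html
```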
Or, come to that, why not just use gsub? Something like:
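File.open(out_filename, 'w') do |output_file|
  # Replace the whole <ul id="toc">...</ul> element, not just its opening tag;
  # the block form keeps backslashes in the TOC from acting as backreferences
  output_file.puts File.read(filename).gsub(/<ul id="toc">.*?<\/ul>/m) { get_toc() }
end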
I ended up with an approach similar to what Mike Woodhouse suggested, only without ERB templates (I wanted the source files to stay freely editable by non-Ruby users too):
def update_toc(filename)
  raise "FATAL: Requires self.toc= ... before replacing TOC in files!" if @toc.nil?
  content = File.read(filename)
  # /m lets .+? match across line breaks; the block form inserts @toc literally
  content.gsub(/<h2 class="toc">.+?<\/ul>/m) { @toc }
end
def replace_toc_in_all_files
  @file_names.each do |name|
    content = update_toc(name)
    File.open(name, "w") do |io|
      io.write content
    end
  end
end
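Here is a runnable sketch of that approach. The `TocUpdater` class, the temp-file setup, and the sample HTML are hypothetical scaffolding so that `@toc` and `@file_names` have somewhere to live. It uses the `m` flag so `.+?` can span the line breaks between the `<h2>` and the closing `</ul>`.

```ruby
require 'tmpdir'
require 'fileutils'

# Hypothetical wrapper so @toc and @file_names have an owner.
class TocUpdater
  attr_writer :toc

  def initialize(file_names)
    @file_names = file_names
  end

  def update_toc(filename)
    raise "FATAL: Requires self.toc= ... before replacing TOC in files!" if @toc.nil?
    content = File.read(filename)
    # /m lets .+? cross newlines; the block keeps backslashes in @toc literal
    content.gsub(/<h2 class="toc">.+?<\/ul>/m) { @toc }
  end

  def replace_toc_in_all_files
    @file_names.each { |name| File.write(name, update_toc(name)) }
  end
end

dir  = Dir.mktmpdir
page = File.join(dir, 'page.html')
File.write(page, <<~HTML)
  <body>
  <h2 class="toc">Contents</h2>
  <ul>
    <li>old entry</li>
  </ul>
  <h1>Item 1</h1>
  </body>
HTML

updater = TocUpdater.new([page])
updater.toc = %(<h2 class="toc">Contents</h2>\n<ul><li>Item 1</li></ul>)
updater.replace_toc_in_all_files

result = File.read(page)
FileUtils.remove_entry(dir)
puts result
```

Because only the matched span is rewritten, the rest of each file (indentation, comments, layout) is left byte-for-byte intact, which addresses the concern about a parser reformatting the document.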
You can manipulate the document directly and save the resulting output. If you confine your manipulations to a particular element, you won't alter the overall structure and should be fine.
A library like Nokogiri or Hpricot will only adjust your document if it's malformed. I know that Hpricot can be coached to have a more relaxed parsing method, or can operate in a more strict XML/XHTML manner.
Simple example:
require 'rubygems'
require 'hpricot'
document = <<END
<html>
<body>
<ul id="tag">
</ul>
<h1 class="indexed">Item 1</h1>
<h2 class="indexed">Item 1.1</h2>
<h1 class="indexed">Item 2</h1>
<h2 class="indexed">Item 2.1</h2>
<h2 class="indexed">Item 2.2</h2>
<h1>Remarks</h1>
<!-- Test Comment -->
</body>
</html>
END
parsed = Hpricot(document)
ul_tag = (parsed / 'ul#tag').first
sections = (parsed / '.indexed')
ul_tag.inner_html = sections.collect { |i| "<li>#{i.inner_html}</li>" }.join
puts parsed.to_html
This will yield:
<html>
<body>
<ul id="tag"><li>Item 1</li><li>Item 1.1</li><li>Item 2</li><li>Item 2.1</li><li>Item 2.2</li></ul>
<h1 class="indexed">Item 1</h1>
<h2 class="indexed">Item 1.1</h2>
<h1 class="indexed">Item 2</h1>
<h2 class="indexed">Item 2.1</h2>
<h2 class="indexed">Item 2.2</h2>
<h1>Remarks</h1>
<!-- Test Comment -->
</body>
</html>
