How to search an XML file using the value in ARGV[1] - ruby

I am trying to search a file using a value from the ARGV array, but doc.at is not working. I set the variable keyword to ARGV[1], and when I pass a value it prints to the console, but when I puts the variable text the output is blank.
require 'nokogiri'
input = ARGV[0]
keyword = ARGV[1]
case input
when input = "list"
  doc = File.open("emails.xml") { |f| Nokogiri::XML(f) }
  text = doc.at('record:contains("{keyword}")')
  puts text
  puts keyword
else
  puts "no"
end

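Your string interpolation is wrong: single-quoted strings are never interpolated, and the placeholder is missing its #.
Change it to:
doc.at("record:contains('#{keyword}')")
Start the string with a double quote (") and interpolate with #{}.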
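Here is a minimal sketch of the difference. The keyword value is a hypothetical stand-in for ARGV[1], and only the selector strings are built, so Nokogiri is not needed to run it:

```ruby
keyword = "alice" # hypothetical stand-in for ARGV[1]

# Single quotes: no interpolation, and "{keyword}" lacks the leading "#".
literal = 'record:contains("{keyword}")'

# Double quotes with #{} substitute the variable's value.
interpolated = "record:contains('#{keyword}')"

puts literal
puts interpolated
```

The first string is passed to doc.at exactly as typed, so it searches for the literal text {keyword}.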

Related

How to read multiple XML files then output to multiple CSV files with the same XML filenames

I am trying to parse multiple XML files then output them into CSV files to list out the proper rows and columns.
I was able to do so by processing one file at a time by defining the filename, and specifically output them into a defined output file name:
File.open('H:/output/xmloutput.csv','w')
I would like to write into multiple files and make their name the same as the XML filenames without hard coding it. I tried doing it multiple ways but have had no luck so far.
Sample XML:
<?xml version="1.0" encoding="UTF-8"?>
<record:root>
<record:Dataload_Request>
<record:name>Bob Chuck</record:name>
<record:Address_Data>
<record:Street_Address>123 Main St</record:Street_Address>
<record:Postal_Code>12345</record:Postal_Code>
</record:Address_Data>
<record:Age>45</record:Age>
</record:Dataload_Request>
</record:root>
Here is what I've tried:
require 'nokogiri'
require 'set'
files = ''
input_folder = "H:/input"
output_folder = "H:/output"
if input_folder[input_folder.length-1,1] == '/'
  input_folder = input_folder[0,input_folder.length-1]
end
if output_folder[output_folder.length-1,1] != '/'
  output_folder = output_folder + '/'
end
files = Dir[input_folder + '/*.xml'].sort_by{ |f| File.mtime(f)}
file = File.read(input_folder + '/' + files)
doc = Nokogiri::XML(file)
record = {} # hashes
keys = Set.new
records = [] # array
csv = ""
doc.traverse do |node|
  value = node.text.gsub(/\n +/, '')
  if node.name != "text" # skip these nodes: if class isnt text then skip
    if value.length > 0 # skip empty nodes
      key = node.name.gsub(/wd:/,'').to_sym
      if key == :Dataload_Request && !record.empty?
        records << record
        record = {}
      elsif key[/^root$|^document$/]
        # neglect these keys
      else
        key = node.name.gsub(/wd:/,'').to_sym
        # in case our value is html instead of text
        record[key] = Nokogiri::HTML.parse(value).text
        # add to our key set only if not already in the set
        keys << key
      end
    end
  end
end
# build our csv
File.open('H:/output/.*csv', 'w') do |file|
  file.puts %Q{"#{keys.to_a.join('","')}"}
  records.each do |record|
    keys.each do |key|
      file.write %Q{"#{record[key]}",}
    end
    file.write "\n"
  end
  print ''
  print 'output files ready!'
  print ''
end
I have been getting 'read memory': no implicit conversion of Array into String (TypeError) and other errors.
Here's a quick peer-review of your code, something like you'd get in a corporate environment...
Instead of writing:
input_folder = "H:/input"
input_folder[input_folder.length-1,1] == '/' # => false
Consider doing it using the -1 offset from the end of the string to access the character:
input_folder[-1] # => "t"
That simplifies your logic making it more readable because it's lacking unnecessary visual noise:
input_folder[-1] == '/' # => false
See [] and []= in the String documentation.
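As a further simplification, the trailing-slash bookkeeping can be avoided altogether. This is only a sketch using the folder names from the question (with a trailing slash added to one of them to show the effect):

```ruby
input_folder  = "H:/input/"  # path from the question, trailing slash added for the demo
output_folder = "H:/output"

# chomp('/') removes one trailing slash if present and is a no-op otherwise.
input_folder = input_folder.chomp('/')

# File.join puts exactly one separator between the parts.
pattern  = File.join(input_folder, '*.xml')
csv_path = File.join(output_folder, 'xmloutput.csv')

puts pattern
puts csv_path
```

File.join also collapses a doubled separator on its own, so the chomp is there mainly to keep the stored folder name tidy.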
This looks like a bug to me:
files = Dir[input_folder + '/*.xml'].sort_by{ |f| File.mtime(f)}
file = File.read(input_folder + '/' + files)
files is an array of filenames. input_folder + '/' + files is appending an array to a string:
foo = ['1', '2'] # => ["1", "2"]
'/parent/' + foo # =>
# ~> -:9:in `+': no implicit conversion of Array into String (TypeError)
# ~> from -:9:in `<main>'
How you want to deal with that is left as an exercise for the programmer.
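One possible way to handle it, sketched under the assumption that each XML file should produce a CSV file with the same base name: loop over the array and derive the output path with File.basename. The helper name csv_paths_for is made up for this illustration, and the demo runs in a temporary directory rather than H:/input.

```ruby
require 'tmpdir'

# Map every .xml file in input_folder to a .csv path in output_folder
# that keeps the same base name. (Hypothetical helper for illustration.)
def csv_paths_for(input_folder, output_folder)
  Dir.glob(File.join(input_folder, '*.xml')).sort.map do |xml_path|
    base = File.basename(xml_path, '.xml') # "a.xml" -> "a"
    [xml_path, File.join(output_folder, "#{base}.csv")]
  end.to_h
end

# Demonstration in a throwaway directory.
Dir.mktmpdir do |dir|
  %w[first.xml second.xml].each { |name| File.write(File.join(dir, name), '<root/>') }
  csv_paths_for(dir, dir).each do |xml, csv|
    puts "#{File.basename(xml)} -> #{File.basename(csv)}"
  end
end
```

Each pair can then be handed to whatever code parses one XML file and writes one CSV file.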
doc.traverse do |node|
is icky because it sidesteps the power of Nokogiri being able to search for a particular tag using accessors. Very rarely do we need to iterate over a document tag by tag, usually only when we're peeking at its structure and layout. traverse is slower so use it as a very last resort.
length is nice but isn't needed when checking whether a string has content:
value = 'foo'
value.length > 0 # => true
value > '' # => true
value = ''
value.length > 0 # => false
value > '' # => false
Programmers coming from Java like to use the accessors but I like being lazy, probably because of my C and Perl backgrounds.
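For comparison, String#empty? states the same check by name. A tiny sketch; the helper content? is invented for the example:

```ruby
# True when the string has at least one character. (Hypothetical helper.)
def content?(value)
  !value.empty?
end

puts content?('foo')
puts content?('')
```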
Be careful with sub and gsub, as they may not do what you think they do. Both accept a regular expression or a string; a string is matched literally, as if it had been escaped, before the scan begins.
You're passing in a regular expression, which is OK in this case, but it could cause unexpected problems if you don't remember all the rules for pattern matching, or that gsub keeps scanning until the end of the string:
foo = 'wd:barwd:' # => "wd:barwd:"
key = foo.gsub(/wd:/,'') # => "bar"
In general I recommend people think a couple times before using regular expressions. I've seen some gaping holes opened up in logic written by fairly advanced programmers because they didn't know what the engine was going to do. They're wonderfully powerful, but need to be used surgically, not as a universal solution.
The same thing happens with a string, because gsub doesn't know when to quit:
key = foo.gsub('wd:','') # => "bar"
So, if you're looking to change just the first instance use sub:
key = foo.sub('wd:','') # => "barwd:"
I'd do it a little differently though.
foo = 'wd:bar'
I can check to see what the first three characters are:
foo[0,3] # => "wd:"
Or I can replace them with something else using string indexing:
foo[0,3] = ''
foo # => "bar"
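If your Ruby is 2.5 or newer, String#delete_prefix is another option. Unlike blind indexing, it only removes the text when the string actually starts with it. A short sketch reusing the sample value from above:

```ruby
foo = 'wd:barwd:'

# Removes 'wd:' only from the start of the string; later copies are kept.
key = foo.delete_prefix('wd:')
puts key

# A string without the prefix comes back unchanged.
untouched = 'bar'.delete_prefix('wd:')
puts untouched
```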
There's more but I think that's enough for now.
You should use Ruby's CSV class. Also, you don't need to do any string matching or regex stuff. Use Nokogiri to target elements. If you know the node names in the XML will be consistent it should be pretty simple. I'm not exactly sure if this is the output you want, but this should get you in the right direction:
require 'nokogiri'
require 'csv'
def xml_to_csv(filename)
  xml_str = File.read(filename)
  xml_str.gsub!('record:','') # remove the record: namespace
  doc = Nokogiri::XML xml_str
  csv_filename = filename.gsub('.xml', '.csv')
  CSV.open(csv_filename, 'wb' ) do |row|
    row << ['name', 'street_address', 'postal_code', 'age']
    row << [
      doc.xpath('//name').text,
      doc.xpath('//Street_Address').text,
      doc.xpath('//Postal_Code').text,
      doc.xpath('//Age').text,
    ]
  end
end
# iterate over all xml files
Dir.glob('*.xml').each { |filename| xml_to_csv(filename) }
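To show what the CSV class buys you over hand-built strings, here is a sketch that writes already-extracted records to CSV text. It assumes the records are hashes like the ones the question builds; the second row is invented to show quoting of commas and double quotes.

```ruby
require 'csv'

keys = [:name, :Street_Address, :Postal_Code, :Age]
records = [
  { name: 'Bob Chuck', Street_Address: '123 Main St', Postal_Code: '12345', Age: '45' },
  # Invented row containing a comma and double quotes.
  { name: 'Ann "AJ" Lee', Street_Address: '9 Elm Rd, Apt 2', Postal_Code: '54321', Age: '30' }
]

csv_text = CSV.generate do |csv|
  csv << keys.map(&:to_s)                                  # header row
  records.each { |record| csv << record.values_at(*keys) } # one row per record
end

puts csv_text
```

The library adds quotes only where a field needs them and does not leave a trailing comma on each row.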

How do I search a text file for a string and then print/return/put out what line of the file it was found on as a number in Ruby

So, I want to use the number I get from it in this:
line = answer to question
database.read.lines[line]
Database being the text file I am searching in.
You can also do it this way:
text_to_find = 'some random text' # use gets to take input from the user
text_found_at_index = database.readlines.index { |line| line.include?(text_to_find) }
I hope this is what you need.
I would try something like this:
query = gets.chomp
database.each_line.with_index do |line, index|
  if line.include?(query)
    puts "Line #{index}: #{line}"
  end
end
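If you only need the number back (to pass to lines[line] as in the question), something like this sketch may do. The method name line_index is made up, and a StringIO stands in for the opened database file:

```ruby
require 'stringio'

# Return the zero-based index of the first line containing query, or nil.
# (Hypothetical helper for illustration.)
def line_index(io, query)
  io.each_line.find_index { |line| line.include?(query) }
end

database = StringIO.new("alpha\nbeta\ngamma\n") # stand-in for the text file
index = line_index(database, 'beta')
puts index.inspect
```

The index is zero-based, which matches array indexing; add 1 if you want the line number as an editor would show it.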

Why can't I use `filename.open' instead of `open(filename)'?

In this piece of code which lets you read a file in the terminal, why do you need to use open(filename) rather than filename.open?
filename = ARGV.first
txt = open(filename)
puts "Here's your file #{filename}:"
print txt.read
print "Type the filename again: "
file_again = $stdin.gets.chomp
txt_again = open(file_again)
print txt_again.read
You can't use filename.open because filename is a String, and String has no public open method.
Use File.open:
File.open(filename)
File.open("file")
opens a local file and returns a File object. Here File.open is a class method of the File class.
open("file")
is actually Kernel#open, which looks at the string to decide what to do with it.
Simplifying things:
File.open("file") is telling Ruby specifically to open a file.
With open("file"), Ruby examines the string "file" to determine what kind of resource it names (here, a file) and opens it accordingly, or raises an error if it cannot.
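A small sketch to try both points yourself. The two helper methods are invented for the demonstration, and a temporary file stands in for the one named on the command line:

```ruby
require 'tempfile'

# Read a whole file through File.open; the block form closes the handle.
def read_with_file_open(filename)
  File.open(filename) { |f| f.read }
end

# Report whether calling open directly on the String works.
def string_open_works?(filename)
  filename.open
  true
rescue NoMethodError
  false
end

Tempfile.create('demo') do |tmp|
  tmp.write("hello\n")
  tmp.flush
  puts read_with_file_open(tmp.path)
  puts string_open_works?(tmp.path)
end
```

Passing a block to File.open is usually preferable to a bare call, because the file is closed even if reading raises.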
Ruby has a class for dealing with paths in an object-oriented way: Pathname
require 'pathname'
loop do
  print 'Enter filename: '
  pn = Pathname(gets.chomp)
  if pn.file?
    puts "Here's your file '#{pn}':", pn.read
  elsif pn.exist?
    puts 'That is not a file.'
  else
    puts 'File does not exist.'
  end
end

Not extracting the full link using index

I'm trying to extract the first href link from a website. Just the full link alone.
I am expecting to get http://www.iana.org/domains/example as the output but instead I am getting just http://www.iana.org/domains/ex
require 'net/http'
source = Net::HTTP.get('www.example.org', '/index.html')
def findhref(page) #returns rest of the html after href
  return page[page.index('href')..-1]
end
def findlink(page)
  text = findhref(page)
  firstquote = text.index('"') #first position of quote
  secondquote = text[firstquote+1..-1].index('"') #2nd quote
  puts text #for debugging
  puts firstquote+1 #for debugging
  puts secondquote #for debugging
  return text[firstquote+1..secondquote]
end
print findlink(source)
I would suggest using Nokogiri for HTML parsing. The solution to your problem would be as simple as:
require 'nokogiri'
require 'open-uri'
doc = Nokogiri::HTML(URI.open('http://www.example.org/index.html'))
first_anchor = doc.css('a').first
first_href = first_anchor['href']
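As for why the original code cuts the link short: secondquote is found in the substring that starts after the first quote, so it is an offset relative to that substring, not an index into text. Using it as the end of the range stops the slice early. A sketch of one way to fix it with plain string indexing, assuming the page contains at least one double-quoted href (the sample HTML and the name find_link are made up):

```ruby
# Return the text between the first pair of double quotes after 'href'.
def find_link(page)
  text  = page[page.index('href')..-1]
  first = text.index('"')
  # Searching the same string from first + 1 yields an absolute index.
  second = text.index('"', first + 1)
  text[(first + 1)...second]
end

html = '<p><a href="http://www.iana.org/domains/example">More information...</a></p>'
puts find_link(html)
```

The second argument to String#index is the position to start searching from, which keeps both quote positions in the same coordinate system.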

Having trouble saving to file in Ruby

Hi, I have a simple form that allows a user to input a name, their gender and a password. I use Digest::MD5.hexdigest to encrypt the input. Once I have the encrypted input, e.g. d1c261ede46c1c66b7e873564291ebdc, I want to append it to a file I have already created. However, nothing I have tried works. Can anyone please help? Thank you in advance. Here is what I have:
input = STDIN.read( ENV["CONTENT_LENGHT"] )
puts "Content-type: text/html \n\n"
require 'digest/md5'
digest = Digest::MD5.hexdigest(input)
f = File.open("register.txt", "a")
f.write(digest)
f.close
I have also tried this with no luck:
File.open("register.txt", "a") do |f|
  f.puts(digest)
end
If the code is verbatim, then I think you have a typo in the first line: did you mean CONTENT_LENGTH rather than CONTENT_LENGHT? With the correct name, ENV[] returns a String when the variable is set, which upsets STDIN#read; I get TypeError: can't convert String into Integer. With the typo, ENV[] returns nil, which tells STDIN#read to read until EOF, which from the console means, I think, Control-Z (Control-D on Unix-like systems). That might be causing a problem.
I suggest you investigate by modifying your script thus:
read_length = ENV["CONTENT_LENGTH"].to_i # assumed typo fixed, convert to integer
puts "read length = #{read_length}"
input = STDIN.read( read_length )
puts "input = #{input}"
puts "Content-type: text/html \n\n" # this seems to serve no purpose
require 'digest/md5'
digest = Digest::MD5.hexdigest(input)
puts "digest = #{digest}"
# prefer this version: it's more idiomatically "Rubyish"
File.open("register.txt", "a") do |f|
  puts "file opened"
  f.puts(digest)
end
file_content = File.read("register.txt")
puts "done, file content = #{file_content}"
This works on my machine, with the following output (when CONTENT_LENGTH is set to 12):
read length = 12
abcdefghijkl
input = abcdefghijkl
Content-type: text/html
digest = 9fc9d606912030dca86582ed62595cf7
file opened
done, file content = 6cfbc6ae37c91b4faf7310fbc2b7d5e8
e271dc47fa80ddc9e6590042ad9ed2b7
b0fb8772912c4ac0f13525409c2b224e
9fc9d606912030dca86582ed62595cf7
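The effect of .to_i can also be checked without a web server. In this sketch a Hash stands in for ENV and a StringIO for STDIN; both stand-ins and the helper name content_length are made up for the demonstration:

```ruby
require 'stringio'

# Convert a CONTENT_LENGTH-style value to an Integer; a missing key gives 0.
# (Hypothetical helper for illustration.)
def content_length(env, key = 'CONTENT_LENGTH')
  env[key].to_i
end

stdin = StringIO.new('abcdefghijklmnop') # stand-in for STDIN
env   = { 'CONTENT_LENGTH' => '12' }     # stand-in for ENV

body = stdin.read(content_length(env))   # reads exactly 12 bytes
puts body
puts content_length(env, 'CONTENT_LENGHT') # misspelt key: nil.to_i is 0
```

With the misspelt key the length becomes 0 instead of nil, so the read returns immediately rather than waiting for EOF, which makes the typo much easier to spot.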

Resources