How can I read lines in a text file with Ruby - ruby

I am new to Ruby programming. I am trying to read a text file line by line.
Here is my sample text file:
john
doe
john_d
somepassword
Here is my code:
f = File.open('input.txt', 'r')
a = f.readlines
n = a[0]
s = a[1]
u = a[2]
p = a[3]
str = "<user><name=\"#{n}\" surname=\"#{s}\" username=\"#{u}\" password=\"#{p}\"/></user>"
File.open('result.txt', 'w') { |file| file.print(str) }
The output should look like this:
<user><name="john" surname="doe" username="john_d" password="somepassword"/></user>
But result.txt looks like this, with a newline character after every value:
<user><name="john
" surname="doe
" username="john_d
" password="somepassword"/></user>
How can I correct this?

Each value includes a newline character because readlines keeps the newline at the end of every line.
Just remove it when you don't need it:
n = a[0].gsub("\n", '')
s = a[1].gsub("\n", '')
# ...

As explained by spickermann, you can also just change line two to:
a = f.readlines.map! { |line| line.chomp }

As @iGian already mentioned, chomp is a good option to clean up your text. I am not sure which version of Ruby you are using, but here is the link to the official Ruby 2.5 documentation on chomp so you can see how it helps: https://ruby-doc.org/core-2.5.0/String.html#method-i-chomp
See the content of variable a after using chomp:
2.4.1 :001 > f = File.open('input.txt', 'r')
=> #<File:input.txt>
2.4.1 :002 > a = f.readlines.map! {|line| line.chomp}
=> ["john", "doe", "john_d", "somepassword"]
Depending on how many other corner cases you expect in your input, strip is another option: it removes leading and trailing whitespace, not just the line ending. Official documentation with examples: https://ruby-doc.org/core-2.5.0/String.html#method-i-strip
See the content of variable a after using strip:
2.4.1 :001 > f = File.open('input.txt', 'r')
=> #<File:input.txt>
2.4.1 :002 > a = f.readlines.map! {|line| line.strip}
=> ["john", "doe", "john_d", "somepassword"]
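On Ruby 2.4 or later, readlines can also remove the line endings itself. The following is a minimal sketch, assuming the four-line input.txt from the question; build_user_line is a hypothetical helper name, and the temporary directory is only there to keep the example self-contained.

```ruby
require 'tmpdir'

# Hypothetical helper: builds the <user> line from the first four lines of a file.
def build_user_line(path)
  # chomp: true (Ruby 2.4+) removes the line terminator from every line read.
  name, surname, username, password = File.readlines(path, chomp: true)
  %(<user><name="#{name}" surname="#{surname}" username="#{username}" password="#{password}"/></user>)
end

# Demo in a temporary directory so no real files are touched.
Dir.mktmpdir do |dir|
  input = File.join(dir, 'input.txt')
  output = File.join(dir, 'result.txt')
  File.write(input, "john\ndoe\njohn_d\nsomepassword\n")
  File.write(output, build_user_line(input))
  puts File.read(output)
end
```

File.readlines opens and closes the file on its own, so there is no file handle left open as there is with File.open without a block.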

FName = 'temp'
File.write FName, "john
doe
john_d
somepassword"
#=> 28
Here are two ways.
s = "<user><name=\"%s\" surname=\"%s\" username=\"%s\" password=\"%s\"/></user>"
puts s % File.readlines(FName).map(&:chomp)
# <user><name="john" surname="doe" username="john_d" password="somepassword"/></user>
puts s % File.read(FName).split("\n")
# <user><name="john" surname="doe" username="john_d" password="somepassword"/></user>
See String#% and, as mentioned in that doc, Kernel#sprintf.
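Kernel#format also accepts named references, which may be easier to read than four positional %s placeholders. This is only a sketch: USER_TEMPLATE, USER_KEYS and render_user are hypothetical names, and the field order is assumed to match the order of the lines in the file.

```ruby
# Template with named references (%<name>s) instead of positional %s.
USER_TEMPLATE = '<user><name="%<name>s" surname="%<surname>s" ' \
                'username="%<username>s" password="%<password>s"/></user>'
USER_KEYS = %i[name surname username password].freeze

# Hypothetical helper: pairs each line with its field name and fills the template.
def render_user(lines)
  format(USER_TEMPLATE, USER_KEYS.zip(lines.map(&:chomp)).to_h)
end

puts render_user(["john\n", "doe\n", "john_d\n", "somepassword\n"])
```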

Related

How to read multiple XML files then output to multiple CSV files with the same XML filenames

I am trying to parse multiple XML files then output them into CSV files to list out the proper rows and columns.
I was able to do so by processing one file at a time by defining the filename, and specifically output them into a defined output file name:
File.open('H:/output/xmloutput.csv','w')
I would like to write into multiple files and make their name the same as the XML filenames without hard coding it. I tried doing it multiple ways but have had no luck so far.
Sample XML:
<?xml version="1.0" encoding="UTF-8"?>
<record:root>
<record:Dataload_Request>
<record:name>Bob Chuck</record:name>
<record:Address_Data>
<record:Street_Address>123 Main St</record:Street_Address>
<record:Postal_Code>12345</record:Postal_Code>
</record:Address_Data>
<record:Age>45</record:Age>
</record:Dataload_Request>
</record:root>
Here is what I've tried:
require 'nokogiri'
require 'set'
files = ''
input_folder = "H:/input"
output_folder = "H:/output"
if input_folder[input_folder.length-1,1] == '/'
input_folder = input_folder[0,input_folder.length-1]
end
if output_folder[output_folder.length-1,1] != '/'
output_folder = output_folder + '/'
end
files = Dir[input_folder + '/*.xml'].sort_by{ |f| File.mtime(f)}
file = File.read(input_folder + '/' + files)
doc = Nokogiri::XML(file)
record = {} # hashes
keys = Set.new
records = [] # array
csv = ""
doc.traverse do |node|
value = node.text.gsub(/\n +/, '')
if node.name != "text" # skip these nodes: if class isnt text then skip
if value.length > 0 # skip empty nodes
key = node.name.gsub(/wd:/,'').to_sym
if key == :Dataload_Request && !record.empty?
records << record
record = {}
elsif key[/^root$|^document$/]
# neglect these keys
else
key = node.name.gsub(/wd:/,'').to_sym
# in case our value is html instead of text
record[key] = Nokogiri::HTML.parse(value).text
# add to our key set only if not already in the set
keys << key
end
end
end
end
# build our csv
File.open('H:/output/.*csv', 'w') do |file|
file.puts %Q{"#{keys.to_a.join('","')}"}
records.each do |record|
keys.each do |key|
file.write %Q{"#{record[key]}",}
end
file.write "\n"
end
print ''
print 'output files ready!'
print ''
end
I have been getting 'read memory': no implicit conversion of Array into String (TypeError) and other errors.
Here's a quick peer-review of your code, something like you'd get in a corporate environment...
Instead of writing:
input_folder = "H:/input"
input_folder[input_folder.length-1,1] == '/' # => false
Consider doing it using the -1 offset from the end of the string to access the character:
input_folder[-1] # => "t"
That simplifies your logic making it more readable because it's lacking unnecessary visual noise:
input_folder[-1] == '/' # => false
See [] and []= in the String documentation.
This looks like a bug to me:
files = Dir[input_folder + '/*.xml'].sort_by{ |f| File.mtime(f)}
file = File.read(input_folder + '/' + files)
files is an array of filenames. input_folder + '/' + files is appending an array to a string:
foo = ['1', '2'] # => ["1", "2"]
'/parent/' + foo # =>
# ~> -:9:in `+': no implicit conversion of Array into String (TypeError)
# ~> from -:9:in `<main>'
How you want to deal with that is left as an exercise for the programmer.
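One possible way to handle it is to loop over the array and derive each output name from the input name. The sketch below assumes the output files should keep the XML base name with a .csv extension; csv_path_for and plan_outputs are hypothetical helper names, and the parsing step is left out.

```ruby
require 'tmpdir'
require 'fileutils'

# Hypothetical helper: maps an XML path to a CSV path with the same base name.
def csv_path_for(xml_path, output_folder)
  File.join(output_folder, "#{File.basename(xml_path, '.xml')}.csv")
end

# Hypothetical helper: pairs every XML file in input_folder with its output path.
def plan_outputs(input_folder, output_folder)
  Dir.glob(File.join(input_folder, '*.xml'))
     .sort_by { |path| File.mtime(path) }
     .map { |path| [path, csv_path_for(path, output_folder)] }
end

# Demo with throwaway files; real code would parse each XML file here.
Dir.mktmpdir do |dir|
  %w[first.xml second.xml].each { |name| FileUtils.touch(File.join(dir, name)) }
  plan_outputs(dir, dir).each do |xml_path, csv_path|
    puts "#{File.basename(xml_path)} -> #{File.basename(csv_path)}"
  end
end
```

File.join takes care of the separator, so the manual checks for a trailing '/' on the folder names are not needed.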
doc.traverse do |node|
is icky because it sidesteps the power of Nokogiri being able to search for a particular tag using accessors. Very rarely do we need to iterate over a document tag by tag, usually only when we're peeking at its structure and layout. traverse is slower so use it as a very last resort.
length is nice but isn't needed when checking whether a string has content:
value = 'foo'
value.length > 0 # => true
value > '' # => true
value = ''
value.length > 0 # => false
value > '' # => false
Programmers coming from Java like to use the accessors but I like being lazy, probably because of my C and Perl backgrounds.
Be careful with sub and gsub as they don't always do what you think they do. Both expect a regular expression, but will also take a string, which they escape before beginning their scan.
You're passing in a regular expression, which is OK in this case, but it could cause unexpected problems if you don't remember all the rules for pattern matching and that gsub scans until the end of the string:
foo = 'wd:barwd:' # => "wd:barwd:"
key = foo.gsub(/wd:/,'') # => "bar"
In general I recommend people think a couple times before using regular expressions. I've seen some gaping holes opened up in logic written by fairly advanced programmers because they didn't know what the engine was going to do. They're wonderfully powerful, but need to be used surgically, not as a universal solution.
The same thing happens with a string, because gsub doesn't know when to quit:
key = foo.gsub('wd:','') # => "bar"
So, if you're looking to change just the first instance use sub:
key = foo.sub('wd:','') # => "barwd:"
I'd do it a little differently though.
foo = 'wd:bar'
I can check to see what the first three characters are:
foo[0,3] # => "wd:"
Or I can replace them with something else using string indexing:
foo[0,3] = ''
foo # => "bar"
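For comparison, here is a small sketch of the non-mutating alternatives for dropping a leading prefix. strip_prefix and strip_prefix_regex are hypothetical names; delete_prefix needs Ruby 2.5 or later.

```ruby
# Hypothetical helper: removes a literal prefix only when the string starts with it.
def strip_prefix(name, prefix = 'wd:')
  name.delete_prefix(prefix) # Ruby 2.5+, no regular expression involved
end

# Same idea with a regex: \A anchors the match to the start of the string.
def strip_prefix_regex(name)
  name.sub(/\Awd:/, '')
end

puts strip_prefix('wd:barwd:')       # only the leading "wd:" is removed
puts strip_prefix_regex('xwd:bar')   # unchanged, "wd:" is not at the start
puts 'wd:barwd:'.gsub('wd:', '')     # every occurrence is removed
```

Unlike foo[0,3] = '', these calls return a new string and leave the original untouched.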
There's more but I think that's enough for now.
You should use Ruby's CSV class. Also, you don't need to do any string matching or regex stuff. Use Nokogiri to target elements. If you know the node names in the XML will be consistent it should be pretty simple. I'm not exactly sure if this is the output you want, but this should get you in the right direction:
require 'nokogiri'
require 'csv'
def xml_to_csv(filename)
xml_str = File.read(filename)
xml_str.gsub!('record:','') # remove the record: namespace
doc = Nokogiri::XML xml_str
csv_filename = filename.gsub('.xml', '.csv')
CSV.open(csv_filename, 'wb' ) do |row|
row << ['name', 'street_address', 'postal_code', 'age']
row << [
doc.xpath('//name').text,
doc.xpath('//Street_Address').text,
doc.xpath('//Postal_Code').text,
doc.xpath('//Age').text,
]
end
end
# iterate over all xml files
Dir.glob('*.xml').each { |filename| xml_to_csv(filename) }
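One reason to prefer the CSV class over hand-built strings is that it quotes fields for you. The sketch below assumes records are hashes like the ones built in the question; records_to_csv is a hypothetical helper name.

```ruby
require 'csv'

# Hypothetical helper: turns an array of hashes into CSV text with a header row.
def records_to_csv(records, keys)
  CSV.generate do |csv|
    csv << keys.map(&:to_s)
    # Missing keys become empty fields; quotes and commas are escaped by CSV.
    records.each { |record| csv << keys.map { |key| record[key] } }
  end
end

records = [{ name: 'Bob "Chuck"', Age: '45' }, { name: 'Smith, Ann' }]
puts records_to_csv(records, %i[name Age])
```

Writing to a file works the same way with CSV.open in place of CSV.generate.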

How to remove the delimiters from a csv file in ruby

I am reading a file which is separated by either a tab, a semicolon (;), or a comma (,). The code below handles only tabs, but I want all three to be checked; for example, if a file is comma-separated, it should work for that too. Please help!
#!/usr/bin/ruby
require 'csv'
test = CSV.read('test.csv', headers:true, :col_sep => "\t")
x = test.headers
puts x
It looks like :col_sep cannot be a Regexp, which could have solved your problem.
One possible solution would be to analyze the head of your CSV file and count the occurrences of possible separators:
require 'csv'
possible_separators = ["\t", ';', ',']
lines_to_analyze = 1
File.open('test.csv') do |csv|
head = csv.first(lines_to_analyze).join
@col_sep = possible_separators.max_by { |sep| head.count(sep) }
end
@col_sep
#=> ";"
You can then use:
test = CSV.read('test.csv', headers:true, :col_sep => @col_sep)
x = test.headers
puts x
# a
# b
# c
The values in x won't contain any separator.
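The same idea can be wrapped in methods so that no instance variable is needed. This is a sketch under the same assumption as above, namely that the most frequent candidate in the first line is the separator; guess_col_sep and headers_of are hypothetical names, and they take the file content as a string.

```ruby
require 'csv'

POSSIBLE_SEPARATORS = ["\t", ';', ','].freeze

# Hypothetical helper: picks the candidate that occurs most often in the head.
def guess_col_sep(text, lines_to_analyze = 1)
  head = text.lines.first(lines_to_analyze).join
  POSSIBLE_SEPARATORS.max_by { |sep| head.count(sep) }
end

# Hypothetical helper: parses the text with the guessed separator.
def headers_of(text)
  CSV.parse(text, headers: true, col_sep: guess_col_sep(text)).headers
end

puts headers_of("a;b;c\n1;2;3\n")
```

For a file on disk, headers_of(File.read('test.csv')) would be one way to call it. The heuristic can guess wrong when a header itself contains one of the candidate characters.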

Apply .capitalize on an Cyrillic array in ruby

I want to capitalise the string elements in the array with ruby
This is my code:
headermonths = ["января","февраля","марта","апреля","мая","июня","июля","августа","октября","ноября","декабря"]
headermonths.each {|month| month.capitalize!}
puts headermonths
I get the following output:
января
февраля
марта
апреля
мая
июня
июля
августа
октября
ноября
декабря
If I print the array with:
print headermonths
I get the following
["\u044F\u043D\u0432\u0430\u0440\u044F", "\u0444\u0435\u0432\u0440\u0430\u043B\u044F", "\u043C\u0430\u0440\u0442\u0430", "\u0430\u043F\u0440\u0435\u043B\u044F", "\u043C\u0430\u044F", "\u0438\u044E\u043D\u044F", "\u0438\u044E\u043B\u044F", "\u0430\u0432\u0433\u0443\u0441\u0442\u0430", "\u043E\u043A\u0442\u044F\u0431\u0440\u044F", "\u043D\u043E\u044F\u0431\u0440\u044F", "\u0434\u0435\u043A\u0430\u0431\u0440\u044F"]
But I would like to have an output like:
Января
Февраля
Марта
Апреля
Мая
Июня
Июля
Августа
Октября
Ноября
Декабря
How do I achieve this with a Ruby method?
You can use the unicode gem
require 'unicode'
headermonths = ["января","февраля","марта","апреля","мая","июня","июля","августа","октября","ноября","декабря"]
headermonths.map! {|month| Unicode::capitalize month }
puts headermonths
# >> ["Января", "Февраля", "Марта", "Апреля", "Мая", "Июня", "Июля", "Августа", "Октября", "Ноября", "Декабря"]
Stand-alone solution :
# From : https://en.wikipedia.org/wiki/Cyrillic_alphabets :
upcase = "АБВГДЕЖЗИЙКЛМНОПРСТУФХЦЧШЩЬЮЯ"
downcase = "абвгдежзийклмнопрстуфхцчшщьюя"
headermonths = ["января","февраля","марта","апреля","мая","июня","июля","августа","октября","ноября","декабря"]
headermonths.each{|word| word[0] = word[0].tr(downcase,upcase)}
# => ["Января", "Февраля", "Марта", "Апреля", "Мая", "Июня", "Июля", "Августа", "Октября", "Ноября", "Декабря"]
If you want to use it with words in latin and cyrillic alphabets :
headermonths.each{|word| word[0] = word[0].tr(downcase,upcase).upcase }
With ActiveSupport
You can use ActiveSupport::Multibyte :
require 'active_support/core_ext/string/multibyte'
"января".mb_chars.capitalize.to_s #=> "Января"
So your script becomes :
require 'active_support/core_ext/string/multibyte'
headermonths = ["января","февраля","марта","апреля","мая","июня","июля","августа","октября","ноября","декабря"]
headermonths.map!{|word| word.mb_chars.capitalize.to_s}
#=> ["Января", "Февраля", "Марта", "Апреля", "Мая", "Июня", "Июля", "Августа", "Октября", "Ноября", "Декабря"]
Ruby 2.4
The code in your question would work just as expected with Ruby 2.4.
See "Case sensitivity for unicode characters" here.
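As a quick check of the Ruby 2.4 behaviour, here is a short sketch with three of the month names from the question; CAPITALIZED_MONTHS is a hypothetical constant name.

```ruby
# Ruby 2.4+ applies full Unicode case mapping in String#capitalize.
CAPITALIZED_MONTHS = %w[января февраля мая].map(&:capitalize)
puts CAPITALIZED_MONTHS

# The :ascii option restricts the mapping to ASCII letters, so Cyrillic is untouched.
puts 'января'.capitalize(:ascii)
```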
The example below is a robust capitalize version that works in any Ruby starting with 1.9, but for Cyrillic only because the offset of 32 is hardcoded.
NB: thanks and credit go to @Stefan and @EricDuminil, who led me in the right direction
headermonths = %w|января февраля марта апреля мая июня
июля августа октября ноября декабря|
puts (headermonths.each do |s|
s[0] = (s[0].ord - 32).chr(Encoding::UTF_8)
end.inspect)
#⇒ ["Января", "Февраля", "Марта", "Апреля", "Мая", "Июня",
# "Июля", "Августа", "Октября", "Ноября", "Декабря"]

Writing regex result into a new file

I've got a list of devices:
ipc-bei640-r-br-01
ipc-bei640-r-br-02
ipc-bei640-r-br-03
ipc-bei640-r-br-04
ipc-bei640-r-br-05
ipc-bem640-r-br-01
ipc-bem640-r-br-02
ipc-bem640-r-br-03
ipc-crg660-r-br-02
ipc-geb680-r-br-04
ipc-lgv630-r-br-01
This small Ruby script counts the lines of the file braslist.txt, scans it with a regex, and writes the results to a new file called "strippedfile.txt":
lines = IO.readlines("/usr/local/bin/braslist.txt")
# Linecount is forwarded to StdOut.
puts lines.length
str = File.read('braslist.txt')
file_name = ['strippedfile.txt']
file_name.each do |file_name|
text = File.read(file_name)
new_contents = str.scan(/^ipc-(?<bng>[a-z]{3}\d{3})-r-br(?<nr>-\d{2})$/)
# open and write to a file with ruby
open('strippedfile.txt', 'w') { |f|
f.print new_contents
}
end
What I can't seem to fix is that in the new file "strippedfile.txt" the results always look like ["bei640", "-01"] ["bei640", "-02"] ["bei640", "-03"]
And I am trying to get all results in this format:
bei640-01
bei640-02
bei640-03
bei640-04
With capture groups, scan returns an array of arrays, one per match; you probably want to join each of them:
- new_contents = str.scan(/^ipc-(?<bng>[a-z]{3}\d{3})-r-br(?<nr>-\d{2})$/)
+ new_contents = str.scan(/^ipc-(?<bng>[a-z]{3}\d{3})-r-br(?<nr>-\d{2})$/).map(&:join)
To print everything without quotes and brackets line by line:
- f.print new_contents
+ f.puts new_contents
Assuming your resultant array is
a = [["bei640", "-02"], ["bei640", "-03"]]
You can use join to get your desired result
a.map{|i| i.join } #=> ["bei640-02", "bei640-03"]
or use the shortcut from mudasobwa's answer:
a.map(&:join) #=> ["bei640-02", "bei640-03"]
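Putting the pieces together, the whole task might look like the sketch below. DEVICE_PATTERN and stripped_names are hypothetical names, and the sample text stands in for the contents of braslist.txt.

```ruby
DEVICE_PATTERN = /^ipc-(?<bng>[a-z]{3}\d{3})-r-br(?<nr>-\d{2})$/

# Hypothetical helper: returns one "bng-nr" string per matching line.
def stripped_names(text)
  # With capture groups, scan yields one [bng, nr] array per match.
  text.scan(DEVICE_PATTERN).map(&:join)
end

sample = "ipc-bei640-r-br-01\nipc-bem640-r-br-02\nnot-a-device\n"
names = stripped_names(sample)
puts names
# To write them one per line:
#   File.write('strippedfile.txt', names.join("\n") + "\n")
```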

How can I remove escape characters from string? UTF issue?

I've read in an XML file that has lines such as
<Song name="Caught Up In You" id='162' duration='276610'/>
I'm reading in the file with
f=File.open(file)
f.each_with_index do |line,index|
if line.match('Song name="')
@songs << line
puts line if (index % 1000) == 0
end
end
However, when I try to use the entries, I find that I get text with escaped characters such as:
"\t\t<Song name=\"Veinte Anos\" id='3118' duration='212009'/>\n"
How can I eliminate the escape characters, either in the initial store or in the later selection?
@songs[rand(@songs.size)]
ruby 2.0
Your text does not have 'escape' characters. The .inspect version of the string shows these. Observe:
> s = gets
Hello "Michael"
#=> "Hello \"Michael\"\n"
> puts s
Hello "Michael"
> p s # The same as `puts s.inspect`
"Hello \"Michael\"\n"
However, the real answer is to process this XML file as XML. For example:
require 'nokogiri' # gem install nokogiri
doc = Nokogiri.XML( IO.read( 'mysonglist.xml' ) ) # Read and parse the XML file
songs = doc.css( 'Song' ) # Gives you a NodeList of song els
puts songs.map{ |s| s['name'] } # Print the name of all songs
puts songs.map{ |s| s['duration'] } # Print the durations (as strings)
mins_and_seconds = songs.map{ |s| (s['duration'].to_i/1000.0).divmod(60) }
#=> [ [ 4, 36.6 ], … ]
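To come back to the inspect point with the line from the question, here is a small sketch; SONG_LINE and CLEAN_SONG_LINE are hypothetical constant names.

```ruby
SONG_LINE = "\t\t<Song name=\"Veinte Anos\" id='3118' duration='212009'/>\n"

puts SONG_LINE.inspect        # developer view: \t, \" and \n are spelled out
puts SONG_LINE                # actual content: real tabs, plain quotes, a real newline
puts SONG_LINE.include?('\\') # the string itself holds no backslash

# strip removes the surrounding whitespace (tabs and newline) if it is unwanted.
CLEAN_SONG_LINE = SONG_LINE.strip
puts CLEAN_SONG_LINE
```

The backslashes only appear when the string is displayed with inspect or p, so nothing needs to be removed from the stored data apart from the surrounding whitespace.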

Resources