How do I merge several CSV files horizontally? - ruby

I've got about 50 CSV files that need to be merged together horizontally into one CSV.
The headers can be ignored. Slightly simplified, the files look like this:
File 1:
1,2,4,5,6
4,5,68,7,4,2
1,2
1,2,3
File 2:
1,2,4
4,5,6,4
3,4,5
3,4,5
The output should look like this:
1,2,4,5,6,1,2,4
4,5,68,7,4,2,4,5,6,4
1,2,3,4,5
3,4,5
1,2,3
The order of merging the files is also not important. I know how to merge them vertically, but I have no clue how to merge horizontally.
I thought about something like this with a nested array, but it does not work and I don't know why. It seems like the data array does not accept the line array.
#!/usr/bin/env ruby
require 'csv'

data = Array.new
filecount = 1
linecount = 1
CSV.open("output.csv", "wb") do |output|
  Dir.glob('*.csv').each do |each|
    next if each == 'output.csv'
    file = CSV.read(each)
    file.each do |line|
      data[filecount][linecount] = line
      linecount = linecount + 1
    end
    filecount = filecount + 1
  end
end
puts data

I prepared a small script that solves your problem, and added some comments for better explanation.
The main idea is to process the input line by line so you do not have to use much memory.
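As to why your own attempt fails: data starts out empty, so data[filecount] is nil, and nil[linecount] = line raises a NoMethodError (undefined method `[]=' for nil). Initializing the inner array first would cure the immediate error:
data[filecount] ||= [] # create the inner array on first use
data[filecount][linecount] = line
But you don't need to build the whole structure in memory at all: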
#!/usr/bin/env ruby
require 'csv'

# map applies the block to each element of the array,
# giving one open CSV handle per input file
files = Dir.glob('csv/*.csv').map { |file| CSV.open file, 'r' }

CSV.open("output.csv", "wb") do |out|
  loop do
    # shift returns the next row of each file (nil at EOF);
    # compact removes the nil entries of exhausted files
    line = files.map { |file| file.shift }.compact
    # drop any empty rows (e.g. blank lines)
    line.reject! { |e| e.empty? }
    # break the endless loop once there is no input left to handle
    break if line.empty?
    out << line.flatten
  end
end
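One small addition: the input handles opened with CSV.open are never closed once the loop finishes, so it's worth releasing them explicitly after the output block:
files.each(&:close)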

Related

read file into an array excluding the commented-out lines

I'm almost a Ruby nOOb (I have just enough knowledge of Ruby to write some basic .erb templates or Puppet custom facts). My requirements look fairly simple, but I can't get my head around them.
I'm trying to write a .erb template, where it reads a file (with space-delimited lines) into an array and then handles each array element according to the requirements. This is what I've got so far:
fname = "webURI.txt"
def myArray()
#if defined? $fname
if File.exist?($fname) and File.file?($fname)
IO.readlines($fname)
end
end
myArray.each_index do |i|
myLine = myArray[i].split(' ')
puts myLine[0] +"\t=> "+ myLine.last
end
Which works just fine, except (for obvious reasons) for lines that are commented out or blank. I also want to make sure that when split (by space), a line shouldn't have more than two fields in it; a file like this:
# This is a COMMENT
#
# Puppet dashboard
puppet controller-all-local.example.co.uk:80
# Nagios monitoring
nagios controller-all-local.example.co.uk::80/nagios
tac talend-tac-local.example.co.uk:8080/org.talend.admin
mng console talend-mca-local.example.co.uk:8080/amc # Line with three fields
So, basically these two things I'd like to achieve:
Read the lines into array, stripping off everything after the first #
Split each element and print a message if the number of fields is more than two
Any help would be greatly appreciated. Cheers!!
Update 25/02
Thanks guys for your help!!
The blank? thing doesn't work at all, throwing this error, but I kinda failed to understand why:
undefined method `blank?' for "\n":String (NoMethodError)
The array myArray that I get is actually something like this (using p instead of puts):
["\n", "puppet controller-all-local.example.co.uk:80\n", "\n", "\n", "nagios controller-all-local.example.co.uk::80/nagios\n", ..... \n"]
Hence, I had to do this to get around this prob:
$fname = "webURI.txt"
def myArray()
if File.exist?($fname) and File.file?($fname)
IO.readlines($fname).map { |arr| arr.gsub(/#.*/,'') }
end
end
# remove blank lines
SSS = myArray.reject { |ln| ln.start_with?("\n") }
SSS.each_index do |i|
myLine = SSS[i].split(' ')
if myLine.length > 2
puts "Too many arguments!!!"
elsif myLine.length == 1
puts "page"+ i.to_s + "\t=> " + myLine[0]
else
puts myLine[0] +"\t=> "+ myLine.last
end
end
You are most welcome to improve the code. cheers!!
goodArray = myArray.reject do |line|
  line.start_with?('#') || line.split(' ').length > 2
end
This rejects any line that either starts with # or splits into more than two fields, leaving you an array of only the good items.
Edit:
For your inline commenting you can then do
goodArray.map do |line|
  line.gsub(/#.*/, '')
end
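Putting the pieces together, a minimal sketch of the whole task (assuming the webURI.txt format shown above): strip inline comments first, skip what is left of blank and comment-only lines, then validate the field count:
lines = File.readlines("webURI.txt").map { |line| line.gsub(/#.*/, '').strip }
lines.each_with_index do |line, i|
  next if line.empty? # blank lines and pure comments are empty after stripping
  fields = line.split(' ')
  if fields.length > 2
    puts "Too many fields on line #{i + 1}: #{line}"
  else
    puts fields.first + "\t=> " + fields.last
  end
end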

Best way of Parsing 2 CSV files and printing the common values in a third file

I am new to Ruby, and I have been struggling with a problem that I suspect has a simple answer. I have two CSV files, one with two columns, and one with a single column. The single column is a subset of values that exist in one column of my first file. Example:
file1.csv:
abc,123
def,456
ghi,789
jkl,012
file2.csv:
def
jkl
All I need to do is look up the column 2 value in file1 for each value in file2 and output the results to a separate file. So in this case, my output file should consist of:
456
012
I’ve got it working this way:
pairs = IO.readlines("file1.csv").map { |columns| columns.split(',') }
f1 = []
pairs.each { |x| f1.push(x[0]) }
f2 = IO.readlines("file2.csv").map(&:chomp)
collection = {}
pairs.each { |x| collection[x[0]] = x[1] }
f = File.open("outputfile.txt", "w")
f2.each { |col1, col2| f.puts collection[col1] }
f.close
...but there has to be a better way. If anyone has a more elegant solution, I'd be very appreciative! (I should also note that I will eventually need to run this on files with millions of lines, so speed will be an issue.)
To be as memory efficient as possible, I'd suggest only reading the full file2 (which I gather would be the smaller of the two input files) into memory. I'm using a hash for fast lookups and to store the resulting values, so as you read through file1 you only store the values for the keys you need. You could go one step further and write the output file while reading file1.
require 'csv'

# Read file 2, the smaller file, and store keys in result Hash
result = {}
CSV.foreach("file2.csv") do |row|
  result[row[0]] = false
end

# Read file 1, the larger file, and look for keys in result Hash to set values
CSV.foreach("file1.csv") do |row|
  result[row[0]] = row[1] if result.key? row[0]
end

# Write the results
File.open("outputfile.txt", "w") do |f|
  result.each do |key, value|
    f.puts value if value
  end
end
Tested with Ruby 1.9.3
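For completeness, here is a sketch of that one-step-further variant: keep only file2's keys in memory and write matches out while streaming file1. Note that the output then follows file1's order rather than file2's:
require 'csv'

wanted = {}
CSV.foreach("file2.csv") { |row| wanted[row[0]] = true }

File.open("outputfile.txt", "w") do |f|
  CSV.foreach("file1.csv") do |row|
    f.puts row[1] if wanted.key?(row[0])
  end
end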
Parsing For File 1
require 'csv'
data_csv_file1 = File.read("file1.csv")
data_csv1 = CSV.parse(data_csv_file1, :headers => true)
Parsing For File 2
data_csv_file2 = File.read("file2.csv")
data_csv2 = CSV.parse(data_csv_file2, :headers => true)
Collection of names
names_from_sheet1 = data_csv1.collect {|data| data[0]} #returns an array of names
names_from_sheet2 = data_csv2.collect {|data| data[0]} #returns an array of names
common_names = names_from_sheet1 & names_from_sheet2 #array with common names
Collecting results to be printed
results = [] #this will store the values to be printed
data_csv1.each {|data| results << data[1] if common_names.include?(data[0]) }
Final output
f = File.open("outputfile.txt","w")
results.each {|result| f.puts result }
f.close

Moving to the last line of a file while reading it in a .each loop in Ruby

I'm reading in a file that can contain any number of rows.
I only need to save the first 1000 or so, passed in as a variable "recordsToParse".
If I reach my 1000 line limit, or whatever it's set to, I need to save the trailer information in the file to verify total_records, total_amount etc.
So, I need a way to move my "pointer" from wherever I am in the file to the last line and run through the loop one more time.
file = File.open(file_name)
parsed_file_rows = Array.new
successful_records, failed_records = 0, 0
file_contract = file_contract['File_Contract']
output_file_name = file_name.gsub(/.TXT|.txt|.dat|.DAT/, '')

file.each do |line|
  line.chomp!
  line_contract = determine_row_type(file_contract, line)
  if line_contract
    parsed_row = parse_row_by_contract(line_contract, line)
    parsed_file_rows << parsed_row
    successful_records += 1
  else
    failed_records += 1
  end
  if (not recordsToParse.nil?)
    if successful_records > recordsToParse
      # move "pointer" to last line and go through loop once more
      #break;
    end
  end
end

store_parsed_file('Parsed_File', "#{output_file_name}_parsed", parsed_file_rows)
[successful_records, failed_records]
Use IO#seek with IO::SEEK_END to move your pointer to the end of the file, then move back up to the last newline, and you have your last line.
This is only worthwhile if the file is very big; otherwise just follow the file.each do |line| loop to the last line, or read the last line directly with IO.readlines("file.txt")[-1].
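A minimal hand-rolled sketch of that seek-and-scan-backwards idea (illustrative only; it reads one character at a time and ignores multibyte edge cases):
def last_line(path)
  File.open(path) do |f|
    pos = f.size
    return nil if pos.zero?
    line = ""
    while pos > 0
      pos -= 1
      f.seek(pos, IO::SEEK_SET)
      ch = f.getc
      break if ch == "\n" && !line.empty? # hit the newline before the last line
      line = ch + line unless ch == "\n"  # skip the trailing newline itself
    end
    line
  end
end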
The easiest solution is to use a gem like elif
require "elif"
lastline = Elif.open("bigfile.txt") { |f| f.gets }
It reads your last line in a snap, undoubtedly using seek.
This is one of those times I'd take advantage of the OS's head and tail commands using something like:
head = `head -#{ records_to_parse } #{ file_to_read }`.split("\n")
tail = `tail -1 #{ file_to_read }`
head.pop if (head[-1] == tail.chomp)
Then write it all out using something like:
File.open(new_file_to_write, 'w') do |fo|
  fo.puts head, tail
end

loop, array and file problem in ruby

I'm currently learning ruby and here what I'm trying to do:
A script which opens a file, makes a substitution, then compares all the lines to each other to see if any line exists multiple times.
So, I tried to work directly with the string, but I didn't find how to do it, so I put every line in an array and compared every row.
But I got a first problem.
Here is my code:
#!/usr/bin/env ruby
DOC = "test.txt"
FIND = /,,^M/
SEP = "\n"

# make substitution
puts File.read(DOC).gsub(FIND, SEP)

# open the file and put every line in an array
openFile = File.open(DOC, "r+")
fileArray = openFile.each { |line| line.split(SEP) }
# print fileArray # --> gives the name of the object

# Cross the array to compare every item to every other
fileArray.each do |items|
  items.chomp
  fileArray.each do |items2|
    items2.chomp
    # Delete if the item already exists
    if items = items2
      fileArray.delete(items2)
    end
  end
end

# Save the result in a new file
File.open("test2.txt", "w") do |f|
  f.puts fileArray
end
At the end, I only have the name of the array object "fileArray". I printed the object after the split and got the same thing, so I guess the problem comes from there. A little help required (if you know how to do this without an array, just with the lines in the file, such an answer is appreciated too).
Thanks !
EDIT:
So, here's my code now
#!/usr/bin/env ruby
DOC = "test.txt"
FIND = /,,^M/
SEP = "\n"

# make substitution
File.read(DOC).gsub(FIND, SEP)

unique_lines = File.readlines(DOC).uniq

# Save the result in a new file
File.open('test2.txt', 'w') { |f| f.puts(unique_lines) }
Can't figure out how to chomp this.
Deleting duplicate lines in a file:
no_duplicate_lines = File.readlines("filename").uniq
No need to write so much code :)
Modify your code like this:
f.puts fileArray.join("\n")
Alternate way:
unique_lines = File.readlines("filename").uniq
# puts(unique_lines.join("\n")) # Uncomment this line and see if the variable holds the result you want...
File.open('filename', 'w') {|f| f.puts(unique_lines.join("\n"))}
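As for the "can't figure out how to chomp this" part of the update above: chomp each line before deduplicating and let puts re-add the newlines. A minimal sketch:
unique_lines = File.readlines('test.txt').map(&:chomp).uniq
File.open('test2.txt', 'w') { |f| f.puts(unique_lines) }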
Just a couple of points about the original code:
fileArray = openFile.each { |line| line.split(SEP) }
sets fileArray to a File object, which I suspect wasn't your intention. File#each (the # notation is Ruby convention to describe a particular method on an object of the supplied class) executes your supplied block for each line (it's also available with a synonym: each_line), where a line is defined by default as your OS's end-line character(s).
If you were looking to build an array of lines, then you could just have written
fileArray = openFile.readlines
and if you wanted those lines to be chomped (often a good idea) then that could be achieved by something like
fileArray = openFile.readlines.collect { |line| line.chomp }
or even (since File mixes in Enumerable)
fileArray = openFile.collect { |line| line.chomp }
And one other tiny thing: Ruby tests for equality with ==; = is only for assignment, so
if items = items2
will set items to items2 (and, since items2 is a string here, the condition will always evaluate as true).
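So the test in the inner loop should read:
if items == items2
(although, as shown above, readlines.uniq makes the whole nested loop unnecessary).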

Ruby: Deleting last iterated item?

What I'm doing is this: I have one file as input and another as output. I choose a random line in the input, put it in the output, and then delete it from the input.
Now, I've iterated over the file and am on the line I want. I've copied it to the output file. Is there a way to delete it? I'm doing something like this:
for i in 0..number_of_lines_to_remove
  line = rand(lines_in_file - 2) + 1 # not removing the first line
  counter = 0
  IO.foreach("input.csv") { |current_line|
    if counter == line
      File.open("output.csv", "a") { |output|
        output.write(current_line)
      }
    end
    counter += 1
  }
end
So, I have current_line, but I'm not sure how to remove it from the source file.
Array#delete_at might do. Given an index, it removes the object at that index, returning the object.
input.csv:
one,1
two,2
three,3
Program:
#!/usr/bin/ruby1.8
lines = File.readlines('/tmp/input.csv')
File.open('/tmp/output.csv', 'a') do |file|
  file.write(lines.delete_at(rand(lines.size)))
end
p lines # ["two,2\n", "three,3\n"]
output.csv:
one,1
Here is a Randomline class. You create a new Randomline object by passing it an input file name and an output file name. You can then call the deleterandom method on the object and pass it the number of lines to delete.
The data is stored internally in arrays as well as being written to file. Currently the output is in append mode, so if you use the same file it will just add to the end; you could change the "a" to a "w" if you wanted to start the file fresh each time.
class Randomline
  attr_accessor :inputarray, :outputarray

  def initialize(filein, fileout)
    @filename = filein
    @filein = File.open(filein, "r+")
    @fileoutput = File.open(fileout, "a")
    @inputarray = []
    @outputarray = []
    readin()
  end

  def readin()
    @filein.each do |line|
      @inputarray << line
    end
  end

  def deleterandom(numtodelete)
    numtodelete.times do |num|
      random = rand(@inputarray.size)
      @outputarray << @inputarray[random]
      @fileoutput.puts @inputarray[random]
      @inputarray.delete_at(random)
    end
    @filein = File.open(@filename, "w")
    @inputarray.each do |line|
      @filein.puts line
    end
  end
end
Here is an example of it being used:
a = Randomline.new("testin.csv","testout.csv")
a.deleterandom(3)
You have to re-write the source file after removing a line, otherwise the modifications won't stick, as they're performed on an in-memory copy of the data.
Keep in mind that any operation which modifies a file in-place runs the risk of truncating the file if there's an error of any sort and the operation cannot complete.
It would be safer to use some kind of simple database for this kind of thing as libraries like SQLite and BDB have methods for ensuring data integrity, but if that's not an option, you just need to be careful when writing the new input file.
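A sketch of that careful-write idea (the temporary file name is illustrative): write the updated contents to a temporary file first, then rename it over the original, so a crash mid-write cannot leave input.csv truncated:
lines = File.readlines("input.csv")
removed = lines.delete_at(rand(lines.size))

File.open("output.csv", "a") { |out| out.write(removed) }

tmp_path = "input.csv.tmp"
File.open(tmp_path, "w") { |tmp| tmp.write(lines.join) }
File.rename(tmp_path, "input.csv") # atomic replace on POSIX filesystems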
