How to combine two text files line by line in Ruby - ruby

Please help me with this code. I have two files with the following contents:
file1.txt file2.txt
aaa 111
bbb 222
ccc 333
ddd 444
and the result looks like this:
aaa
bbb
ccc
ddd
111
222
333
444
but what I want is:
aaa|111
bbb|222
ccc|333
ddd|444
Here is my code:
f1 = File.readlines('./file1.txt')
f2 = File.readlines('./file2.txt')
File.open('result.txt', 'w') do |output_file|
  f1.each_with_index do |elem, i|
    output_file.puts "#{elem} #{f2[i]}"
  end
end

Instead of iterating on one array and looking up the second array by index, it is more elegant to zip the two arrays together. puts will, given an array, output each element in a separate row.
f1 = File.readlines('file1.txt', chomp: true)
f2 = File.readlines('file2.txt', chomp: true)
lines = f1.zip(f2).map { |items| items.join('|') }
puts lines
Or, using numbered block parameters (available since Ruby 2.7), you could even write:
lines = f1.zip(f2).map { _1.join('|') }
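A minimal sketch of the zip approach that also tolerates inputs of unequal length. The helper name join_lines is hypothetical, and dropping a missing value with compact is an assumption about the desired behaviour, not something the question specifies.

```ruby
# Hypothetical helper: pairs up two arrays of lines and joins each pair.
# Assumption: when `right` is shorter, the missing value is simply left out;
# when `left` is shorter, the extra `right` lines are dropped (zip follows `left`).
def join_lines(left, right, sep = '|')
  left.zip(right).map { |pair| pair.compact.join(sep) }
end

lines = join_lines(%w[aaa bbb ccc ddd], %w[111 222 333 444])
puts lines
```

To produce the file from the question, the resulting array could be written with File.write('result.txt', lines.join("\n") + "\n").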

File.readlines keeps the trailing newline on every line; remove it with .map(&:chomp):
f1 = File.readlines('./file1.txt').map(&:chomp)
f2 = File.readlines('./file2.txt').map(&:chomp)
File.open('result.txt', 'w') do |output_file|
  f1.each_with_index do |elem, i|
    output_file.puts "#{elem} #{f2[i]}"
  end
end
Result
aaa 111
bbb 222
ccc 333
ddd 444
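The effect of chomp can be seen in a small sketch. StringIO is used here only as a stand-in for file1.txt so that the example runs without any files; that substitution is an assumption made for illustration.

```ruby
require 'stringio'

# StringIO stands in for file1.txt here so the sketch runs without any files.
raw = StringIO.new("aaa\nbbb\n").readlines
chomped = raw.map(&:chomp)

p raw       # each element still ends in "\n"
p chomped   # the trailing newlines are gone
puts "#{chomped[0]}|111"
```

Because each raw element ends in a newline, interpolating it into a string breaks the line right after the first value, which is why the values did not end up side by side.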

Related

Join two CSV files in Ruby without using tables

I have two CSV files with columns A, B, C and D, E, F, respectively. I want to join these two CSV files into a new file containing the rows where File1.B = File2.E, with columns A, B/E, C, D, F. How can I achieve this JOIN without using tables?
Givens
We are given the following.
The paths for the two input files:
fname1 = 't1.csv'
fname2 = 't2.csv'
The path for the output file:
fname3 = 't3.csv'
The names of the headers to match in each of the two input files:
target1 = 'B'
target2 = 'E'
I assume that (as is the case with the example) the two files contain the same number of lines.
Create test files
Let's first create the two files:
str = [%w|A B C|, %w|1 1 1|, %w|2 2 2|, %w|3 4 5|, %w|6 9 9|].
map { |a| a.join(",") }.join("\n")
#=> "A,B,C\n1,1,1\n2,2,2\n3,4,5\n6,9,9"
File.write(fname1, str)
#=> 29
str = [%w|D E F|, %w|21 1 41|, %w|22 5 42|, %w|23 8 45|, %w|26 9 239|].
map { |a| a.join(",") }.join("\n")
#=> "D,E,F\n21,1,41\n22,5,42\n23,8,45\n26,9,239"
File.write(fname2, str)
#=> 38
Read the input files into CSV::Table objects
When reading fname1 I will use the :header_converters option to convert the header "B" to "B/E". Note that this does not require knowledge of the location of the column with header "B" (or whatever it may be).
require 'csv'
new_target1 = target1 + "/" + target2
#=> "B/E"
csv1 = CSV.read(fname1, headers: true,
header_converters: lambda { |header| header==target1 ? new_target1 : header})
csv2 = CSV.read(fname2, headers: true)
Construct arrays of headers to be written from each input file
headers1 = csv1.headers
#=> ["A", "B/E", "C"]
headers2 = csv2.headers - [target2]
#=> ["D", "F"]
Create the output file
We will first write the new headers headers1 + headers2 to the output file.
Next, for each row index i (i = 0 corresponding to the first row after the header row in each file), for which a condition is satisfied, we write as a single row the elements of csv1[i] and csv2[i] that are in the columns having headers in headers1 and headers2. The condition to be satisfied to write the rows at index i is that i satisfies:
csv1[i][new_target1] == csv2[i][target2] #=> true
Now open fname3 for writing, write the headers and then the body.
CSV.open(fname3, 'w') do |csv|
  csv << headers1 + headers2
  [csv1.size, csv2.size].min.times do |i|
    csv << (headers1.map { |h| csv1[i][h] } +
            headers2.map { |h| csv2[i][h] }) if
      csv1[i][new_target1] == csv2[i][target2]
  end
end
#=> 4
Let's confirm that what was written is correct.
puts File.read(fname3)
A,B/E,C,D,F
1,1,1,21,41
6,9,9,26,239
If you have CSV files like these:
first.csv:
A | B | C
1 | 1 | 1
2 | 2 | 2
3 | 4 | 5
6 | 9 | 9
second.csv:
D | E | F
21 | 1 | 41
22 | 5 | 42
23 | 8 | 45
26 | 9 | 239
You can do something like this:
require 'csv'
first = CSV.read('first.csv')
second = CSV.read('second.csv')
CSV.open("result.csv", "w") do |csv|
  csv << %w[A B.E C D F]
  first.each do |rowF|
    second.each do |rowS|
      csv << [rowF[0],rowF[1],rowF[2],rowS[0],rowS[2]] if rowF[1] == rowS[1]
    end
  end
end
To get this:
result.csv:
A | B.E | C | D | F
1 | 1 | 1 | 21 | 41
6 | 9 | 9 | 26 | 239
The answer is to use group_by to create a hash table and then iterate over the keys of the hash table. Assuming the column you're joining on is unique in each table (note that CSV.table converts headers to symbols, and that rows must be converted to hashes before they can be merged):
require 'csv'
join_column = :whatever
csv1 = CSV.table("file1.csv").group_by { |r| r[join_column] }
csv2 = CSV.table("file2.csv").group_by { |r| r[join_column] }
# Only keys present in both files can be joined.
joined_data = (csv1.keys & csv2.keys).sort.map do |key|
  csv1[key].first.to_h.merge(csv2[key].first.to_h)
end
If the column is not unique in each table, then you need to decide how you want to handle those cases, since there will be more than just the first element in the arrays stored under each key of csv1 and csv2. You could do an O(m×n) join as suggested in one of the other answers (i.e. nested map calls), or you could filter or combine them based on some criteria. The choice really depends on your use case.
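As one possible way to handle non-unique keys, the sketch below indexes the second table once and emits every matching pair. The helper name join_tables is hypothetical, the sample data is taken from the question, and dropping the right-hand key column from the output is an assumption based on the desired A, B/E, C, D, F layout.

```ruby
require 'csv'

# Hypothetical helper: inner join of two CSV::Table objects whose key columns
# have different names. Every left row is paired with every matching right row.
def join_tables(left, right, left_key, right_key)
  index = right.group_by { |row| row[right_key] }
  left.flat_map do |lrow|
    (index[lrow[left_key]] || []).map do |rrow|
      # Drop the right-hand key column so the key appears only once.
      lrow.to_h.merge(rrow.to_h.reject { |header, _| header == right_key })
    end
  end
end

left  = CSV.parse("A,B,C\n1,1,1\n2,2,2\n3,4,5\n6,9,9\n", headers: true)
right = CSV.parse("D,E,F\n21,1,41\n22,5,42\n23,8,45\n26,9,239\n", headers: true)

rows = join_tables(left, right, 'B', 'E')
rows.each { |row| puts row.values.join(',') }
```

Building the index costs one pass over the second table, so the nested scan over both files is avoided unless many rows share the same key.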

ruby - tab from row not respected

I can read a simple text file just fine. However, when I split a row of data, the tabs are not respected.
Below the row in the file.
Anderson Silva R$10 off R$20 of food 10.0 2 987 Fake St Batman Inc
And below is the output in pry.
As we can see, 987 and Fake St end up together in the same row.
Anderson Silva
R$10 off R$20 of food
10.0
2
987 Fake St
Batman Inc
And here is the simple code:
line.split("\t").map do |col|
  col = col.split("\t")
  puts col
end
I don't know if I'm understanding your question correctly, but I'd suspect that there's not actually a tab where you expect one.
def contrived_method(str)
  str.split("\t").each do |col|
    col = col.split("\t")
    puts col
  end
end
line1 = "10.0\t2\t987 Fake St"
line2 = "10.0\t2\t987\tFake St"
contrived_method(line1)
#=> 10.0
#=> 2
#=> 987 Fake St
contrived_method(line2)
#=> 10.0
#=> 2
#=> 987
#=> Fake St
For demonstration, I've reduced the size of your string to show that the String#split method will indeed split on the supplied delimiter. In this case I've used each instead of map because the result is not assigned to anything.
You'll find the inspect method valuable in this case:
line1 = "10.0\t2\t987 Fake St"
puts line1.inspect
#=> "10.0\t2\t987 Fake St"
puts line1
#=> 10.0 2 987 Fake St
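A small diagnostic along the same lines, offered as a sketch: it counts the tab characters and prints each field with inspect, so a space that merely looks like a tab becomes visible. The sample line is an assumption that mimics the situation described in the question.

```ruby
# Assumed sample: a space, not a tab, sits between "987" and "Fake St".
line = "10.0\t2\t987 Fake St\tBatman Inc"

fields = line.chomp.split("\t")
puts "tabs: #{line.count("\t")}, fields: #{fields.size}"
fields.each_with_index { |field, i| puts "#{i}: #{field.inspect}" }
```

If the number of fields is lower than expected, the input file itself most likely contains a space where a tab was intended.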

How to count 1 to 9 on a single line in ruby

I'm struggling to figure out how to print numbers from a loop on a single line in Ruby.
x = 0
while x <= 9
puts x
x +=1
end
This would give you
0
1
2
3
4
5
6
7
8
9
Each on different lines.
But what I want is to get this on a single line so like
01234567891011121314151617181920
It also shouldn't be limited to just 0-9; it should work for any upper bound, all on a single line.
The purpose is to make a triangle of any size that follows this pattern.
1
12
123
1234
12345
123456
Each of these would be on a different line. The formatting here won't let me put it on different lines.
Would really like to solve this. It is hurting my head.
try this:
(1..9).each { |n| print n }
puts
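The difference is that print writes its argument without a trailing newline, while puts adds one. An alternative sketch builds the whole line first and prints it once; the helper name number_line is hypothetical.

```ruby
# Hypothetical helper: concatenates 0..last into one string.
def number_line(last)
  (0..last).to_a.join
end

puts number_line(20)
```

With an upper bound of 20 this gives the single line shown in the question.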
You said you want "to make a triangle of any size that follows this pattern", so you should not make assumptions about how that should be done. Here are two ways to do that.
#1
def print_triangle(n)
(1..n).each.with_object('') { |i,s| puts s << i.to_s }
end
print_triangle(9)
1
12
123
1234
12345
123456
1234567
12345678
123456789
#2
def print_triangle(n)
s = (1..n).to_a.join
(1..n).each { |i| puts s[0,i] }
end
print_triangle(9)
1
12
123
1234
12345
123456
1234567
12345678
123456789
How about this solution:
last_num = 9
str = (1..last_num).to_a.join # create string 123456789
0.upto(last_num-1){ |i| puts str[0..i] } # print line by line
puts (1..9).map(&:to_s).join
Regarding your final aim there are lots of (probably easier) ways, but here's one:
def print_nums k
k.times { |n| puts (1..(n+1)).map { |i| i*10**(n+1-i) }.inject(:+) }
end
print_nums 9
#1
#12
#123
#1234
#12345
#123456
#1234567
#12345678
#123456789
This approach generates the actual numbers using units, tens, hundreds, etc., where the power of ten applied to each digit i depends on the line it appears in.
Thought Process
Looking at a basic example of four lines:
1
12
123
1234
is the same as:
1*10**0 #=> 1
1*10**1 + 2*10**0 #=> 12
1*10**2 + 2*10**1 + 3*10**0 #=> 123
1*10**3 + 2*10**2 + 3*10**1 + 4*10**0 #=> 1234
which in Ruby can be generated with:
(1..1).map { |i| i*10**(1-i) }.inject(:+) #=> 1
(1..2).map { |i| i*10**(2-i) }.inject(:+) #=> 12
(1..3).map { |i| i*10**(3-i) }.inject(:+) #=> 123
(1..4).map { |i| i*10**(4-i) }.inject(:+) #=> 1234
looking for a pattern we can generalise and put in a method:
def print_nums k
k.times { |n| puts (1..(n+1)).map { |i| i*10**(n+1-i) }.inject(:+) }
end
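The sums above also satisfy a simple recurrence: each row equals the previous row times ten plus the new digit. A sketch of that idea follows; the name triangle_numbers is hypothetical, and, like the formula above, it only looks like a digit triangle for up to nine rows.

```ruby
# Hypothetical helper: row(n) = row(n - 1) * 10 + n, starting from 0.
# Assumption: k <= 9; beyond that the values no longer read as 1, 12, 123, ...
def triangle_numbers(k)
  row = 0
  (1..k).map { |n| row = row * 10 + n }
end

puts triangle_numbers(4)
```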
You could (and should) of course ignore all of the above and just extend the excellent answer by @seph:
3.times { |i| (1..(i+1)).each { |n| print n }; puts }
#1
#12
#123
The simplest way, if you want to start from 1:
9.times {|n| puts n+1}
Or, if you want to start from 0:
10.times {|n| puts n}
If you want the pyramid format, this is one way to do it:
9.times{|c| puts (1..c+1).to_a.join}
This is the output:
2.3.0 :025 > 9.times{|c| puts (1..c+1).to_a.join}
1
12
123
1234
12345
123456
1234567
12345678
123456789

How to separate the text file's values using the start and end positions

I have a text file:
GLKIIM 08052016 08052016 444-22222222 33333 5675555
ABCDEF 87645123 34211016 333-11111111 22222 5123455
I am using CSV.read to read the text file.
For each line in the text file, I need to extract the column values by the start and end positions. For that I have arrays:
start_pos = [1, 8, 17, 26, 30, 39, 45]
end_pos = [6, 15, 24, 28, 37, 43, 51]
which means that positions start_pos[0] to end_pos[0] of each line, i.e. 1 to 6, hold the first column's values, GLKIIM and ABCDEF.
The column names are:
column_name = %w[SOURCE_NAME BATCH_DATE EFFECT_DATE ID ACCOUNT_NO ENTITY ACCOUNT]
I need to create a hash as follows:
{
0=>{"SOURCE_NAME"=>"GLKIIM", "BATCH_DATE"=>"08052016", "EFFECT_DATE"=>"08052016", "ID"=>"444", "ACCOUNT_NO"=>"22222222", "ENTITY"=>"33333", "ACCOUNT"=>"5675555"},
1=>{"SOURCE_NAME"=>"ABCDEF", "BATCH_DATE"=>"87645123", "EFFECT_DATE"=>"34211016", "ID"=>"333", "ACCOUNT_NO"=>"11111111", "ENTITY"=>"22222", "ACCOUNT"=>"5123455"}
}
I cannot use a space as the delimiter to separate the column values; I need to use the start and end positions.
input = 'GLKIIM 08052016 08052016 444-22222222 33333 5675555
ABCDEF 87645123 34211016 333-11111111 22222 5123455'
start_pos = %w|1 8 17 26 30 39 45|.map &:to_i
end_pos = %w|6 15 24 28 37 43 51|.map &:to_i
input.split($/).map do |line|
start_pos.zip(end_pos).map { |s, e| line[s-1..e-1] }
end
#⇒  [["GLKIIM", "08052016", "08052016", "444", "22222222", "33333", "5675555"],
# ["ABCDEF", "87645123", "34211016", "333", "11111111", "22222", "5123455"]]
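The arrays above can be turned into the hash the question asks for by zipping in the column names. This is a sketch under the assumption that the positions are 1-based and inclusive, as in the question; the constant and method names are hypothetical.

```ruby
COLUMN_NAMES = %w[SOURCE_NAME BATCH_DATE EFFECT_DATE ID ACCOUNT_NO ENTITY ACCOUNT]
START_POS = [1, 8, 17, 26, 30, 39, 45]
END_POS   = [6, 15, 24, 28, 37, 43, 51]

# Hypothetical helper: converts 1-based inclusive positions to 0-based ranges
# and builds { line_index => { column_name => value } }.
def parse_fixed_width(lines)
  ranges = START_POS.zip(END_POS).map { |s, e| (s - 1)..(e - 1) }
  lines.each_with_index.map do |line, idx|
    [idx, COLUMN_NAMES.zip(ranges.map { |range| line[range] }).to_h]
  end.to_h
end

data = parse_fixed_width([
  'GLKIIM 08052016 08052016 444-22222222 33333 5675555',
  'ABCDEF 87645123 34211016 333-11111111 22222 5123455'
])
puts data[0]['ID']
puts data[1]['ACCOUNT']
```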
Do not read the file as a Comma-Separated-Values (CSV) file, if it isn't one.
Using "speaking code" you could use File.readlines instead:
#!/usr/bin/env ruby
result = ARGF.readlines.map do |line|
[line[0..5], line[7..14], line[16..23], line[24..36]]
end
puts result.inspect
# => [["GLKIIM", "08052016", "08052016", " 444-22222222"], ["ABCDEF", "87645123", "34211016", " 333-11111111"]]
If you save this script you can run it as:
readliner.rb MYFILE.TXT MYFILE2.TXT MYFILE3.TXT
or pipe into it:
cat myfile | readliner.rb
Alternatively use
File.readlines("MYFILE.TXT")
instead of ARGF.readlines in the script.
The use of readlines can bring problems with it, as it reads the whole file into memory to yield an array of lines. See the comments for a small discussion on that topic.
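One way to avoid holding the whole file in memory is File.foreach, which reads one line at a time. The sketch below writes a small temporary file purely so the example is self-contained; the helper name source_names and the sample contents are assumptions.

```ruby
require 'tempfile'

# Hypothetical helper: streams the file line by line and keeps only the
# first six characters of each line.
def source_names(path)
  File.foreach(path).map { |line| line[0..5] }
end

# A temporary file stands in for MYFILE.TXT.
names = Tempfile.create('fixed_width') do |file|
  file.write("GLKIIM 08052016\nABCDEF 87645123\n")
  file.flush
  source_names(file.path)
end

p names
```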
Let's code-golf a bit, while staying somewhat readable and removing readlines:
#!/usr/bin/env ruby
COLS = { "SOURCE_NAME" => 0..5,
"BATCH_DATE" => 7..14,
"EFFECT_DATE" => 16..23 }
result = ARGF.each_with_index.map do |line, idx|
[idx, COLS.map{|name,range| [name, line[range]] }.to_h ]
end.to_h
puts result.inspect
# => {0=>{"SOURCE_NAME"=>"GLKIIM", "BATCH_DATE"=>"08052016", "EFFECT_DATE"=>"08052016"}, 1=>{"SOURCE_NAME"=>"ABCDEF", "BATCH_DATE"=>"87645123", "EFFECT_DATE"=>"34211016"}}
I used the code below (column_name, start_pos and end_pos are the arrays from the question):
data_hash = {}
i = 0
File.open('abc.txt', 'r') do |file|
  file.each_line do |line|
    temp = {}
    column_name.each_index do |idx|
      temp[column_name[idx]] = line[start_pos[idx]-1..end_pos[idx]-1]
    end
    data_hash[i] = temp
    i += 1
  end
end
puts data_hash
This assumes that the file containing the following data is named abc.txt:
GLKIIM 08052016 08052016 444-22222222 33333 5675555
ABCDEF 87645123 34211016 333-11111111 22222 5123455

how to Split and get data in ruby

I am a newbie in Ruby.
I have a problem with splitting text in Ruby.
My text looks like this:
AA:0.88:320:800|BB:0.82:1040:1330|CC:0.77:1330:1700|DD:0.71:1700:2010|EE:1.00:2070:2390
I need the following result (processing until the end of the text):
AA 0.88
BB 0.82
CC 0.77
DD 0.71
EE 1.00
How do I code this? So far I can only split by "|".
Best regards.
Use String#split:
s = 'AA:0.88:320:800|BB:0.82:1040:1330|CC:0.77:1330:1700|DD:0.71:1700:2010|EE:1.00:2070:2390'
s.split('|').each do |substring|
name, num, * = substring.split(':')
puts "#{name} #{num}"
end
output:
AA 0.88
BB 0.82
CC 0.77
DD 0.71
EE 1.00
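If the values are needed for further processing rather than just printing, the same split can feed a hash. This is a sketch; the method name parse_scores is hypothetical, and converting the second field with Float is an assumption that every record carries a valid number there.

```ruby
# Hypothetical helper: maps each record name to its numeric second field.
# Float raises ArgumentError on malformed input instead of silently returning 0.0.
def parse_scores(text)
  text.split('|').map do |record|
    name, score = record.split(':')
    [name, Float(score)]
  end.to_h
end

scores = parse_scores('AA:0.88:320:800|BB:0.82:1040:1330|EE:1.00:2070:2390')
scores.each { |name, score| puts format('%s %.2f', name, score) }
```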
And here, just for reference, is a regexp version:
s = 'AA:0.88:320:800|BB:0.82:1040:1330|CC:0.77:1330:1700|DD:0.71:1700:2010|EE:1.00:2070:2390'
p s.scan /(?:\||\A)([^:]+):([^:]+)/
# => [["AA", "0.88"], ["BB", "0.82"], ["CC", "0.77"], ["DD", "0.71"], ["EE", "1.00"]]
The code is shorter but much harder to read and debug. Use the other answers before this one!
Edit:
And here is the same regexp with some comments:
s.scan %r{
  (?: \| | \A)  # Look for start of string (\A) or '|' (\|) but do not include in capture (?:)
  ([^:]+)       # Look for and capture one or more characters that are not ':'
  :             # Look for a ':' but do not capture it, as it is not in parentheses.
  ([^:]+)       # Same pattern as the second line.
}x              # x modifies the regexp to allow comments.
s = 'AA:0.88:320:800|BB:0.82:1040:1330|CC:0.77:1330:1700'
s.split('|').map { |item| # produces an array and remaps it
s1, s2, * = item.split(':')
puts "#{s1} #{s2}"
[s1, s2]
}
Hope it helps.
Using gsub and a regex:
str = "AA:0.88:320:800|BB:0.82:1040:1330|CC:0.77:1330:1700|" +
"DD:0.71:1700:2010|EE:1.00:2070:2390"
puts str.gsub(':',' ').scan(/(([A-Z])\2 \d\.\d\d)/).map(&:first)
AA 0.88
BB 0.82
CC 0.77
DD 0.71
EE 1.00
These are the steps:
s1 = str.gsub(':',' ')
# => "AA 0.88 320 800|BB 0.82 1040 1330|CC 0.77 1330 1700|
DD 0.71 1700 2010|EE 1.00 2070 2390" (broken for display)
s2 = s1.scan(/(([A-Z])\2 \d\.\d\d)/)
# => [["AA 0.88", "A"], ["BB 0.82", "B"], ["CC 0.77", "C"],
["DD 0.71", "D"], ["EE 1.00", "E"]]
s3 = s2.map(&:first)
# => ["AA 0.88", "BB 0.82", "CC 0.77", "DD 0.71", "EE 1.00"]
In the regex, /(...)/ and ([A-Z]) are the first and second capture groups, respectively. \2 matches whatever was captured by the second group, so ([A-Z])\2 requires that the same capital letter appear twice in a row (e.g., 'CC').
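The effect of the backreference can be checked in isolation. In this sketch the two sample strings are made up for illustration; only the pattern comes from the answer above.

```ruby
pattern = /(([A-Z])\2 \d\.\d\d)/

doubled = 'CC 0.77'.match?(pattern)  # same letter twice
mixed   = 'CD 0.77'.match?(pattern)  # two different letters

puts "CC 0.77 matches: #{doubled}"
puts "CD 0.77 matches: #{mixed}"
```

Only the string with a repeated letter matches, so names such as 'AB' would be skipped by this pattern.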
