How to combine column from multiple text files? [duplicate] - bash

I want to extract and combine a certain column from a bunch of text files into a single file as shown.
File1_example.txt
A 123 1
B 234 2
C 345 3
D 456 4
File2_example.txt
A 123 5
B 234 6
C 345 7
D 456 8
File3_example.txt
A 123 9
B 234 10
C 345 11
D 456 12
...
..
.
File100_example.txt
A 123 55
B 234 66
C 345 77
D 456 88
How can I loop through my files of interest and paste these columns together so that the final result is like below without having to type out 1000 unique file names?
1 5 9 ... 55
2 6 10 ... 66
3 7 11 ... 77
4 8 12 ... 88

Try this:
paste File[0-9]*_example.txt | awk '{s="";for(i=3;i<=NF;i+=3)s=s (i>3?" ":"") $i;print s}'
The loop runs up to NF and picks every third field; a while($i) test would stop early at an empty or zero value.
Example:
File1_example.txt:
A 123 1
B 234 2
C 345 3
D 456 4
File2_example.txt:
A 123 5
B 234 6
C 345 7
D 456 8
Run command as:
$ paste File[0-9]*_example.txt | awk '{s="";for(i=3;i<=NF;i+=3)s=s (i>3?" ":"") $i;print s}'
Output:
1 5
2 6
3 7
4 8
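The stride of 3 comes from each file contributing three fields to every pasted line. Here is a minimal sketch that makes this explicit, assuming two small sample files. The names f1.txt and f2.txt and the awk variables col and width are made up for illustration. The loop is bounded by NF, so a value of 0 in the column does not cut the row short:

```shell
# Hypothetical inputs: two files with three whitespace-separated fields per line.
tmp=$(mktemp -d)
printf 'A 123 1\nB 234 0\n' > "$tmp/f1.txt"
printf 'A 123 5\nB 234 6\n' > "$tmp/f2.txt"

# col = wanted column, width = number of fields per input file (assumed names).
picked=$(paste "$tmp/f1.txt" "$tmp/f2.txt" |
  awk -v col=3 -v width=3 '{
    row = ""
    for (i = col; i <= NF; i += width)       # same column of each pasted file
      row = row (i > col ? " " : "") $i      # space-separate the picked values
    print row
  }')
printf '%s\n' "$picked"
```

Changing col or width adapts the same loop to files with a different layout.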

I tested the code below with the first 3 files:
cat File*_example.txt | awk '{a[$1$2]= a[$1$2] $3 " "} END{for(x in a){print a[x]}}' | sort
1 5 9
2 6 10
3 7 11
4 8 12
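1) Use an awk array: a[$1$2] = a[$1$2] $3 " " is indexed by column 1 and column 2 joined together, and each value accumulates every column 3 seen for that key.
2) END{for(x in a){print a[x]}} traverses array a and prints all values; the traversal order of for (x in a) is unspecified.
3) Use sort to put the output lines back in order.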
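Because the traversal order of an awk array is unspecified, an alternative to the final sort is to remember the order in which keys first appear. This is a rough sketch on two made-up sample files; the array name order and the counter n are illustrative, not part of the answer above:

```shell
# Hypothetical inputs standing in for File1_example.txt and File2_example.txt.
tmp=$(mktemp -d)
printf 'A 123 1\nB 234 2\n' > "$tmp/f1.txt"
printf 'A 123 5\nB 234 6\n' > "$tmp/f2.txt"

grouped=$(awk '
  {
    key = $1 FS $2                      # separator avoids ambiguous joined keys
    if (key in a) {
      a[key] = a[key] " " $3            # append column 3 to the existing row
    } else {
      order[++n] = key                  # record first-seen order of the key
      a[key] = $3
    }
  }
  END { for (i = 1; i <= n; i++) print a[order[i]] }
' "$tmp/f1.txt" "$tmp/f2.txt")
printf '%s\n' "$grouped"
```

The rows come out in the order of the first file, so no sort step is needed, and passing the files straight to awk avoids the extra cat.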

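When using cat, you need to make sure the file order is preserved; one way is to specify the files explicitly:
cat File{1..100}_example.txt | awk '{print $NF}' | pr -100ts' '
awk extracts the last column, and pr (GNU pr here) arranges the values into 100 columns, one per file, filled top to bottom. The column count passed to pr must match the number of files.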
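The ordering concern can be checked on a small scale. The sketch below assumes three made-up files numbered 1, 2 and 10, and builds the argument list in numeric order instead of relying on a glob, whose order depends on the locale's collation:

```shell
# Scratch directory with hypothetical sample files numbered 1, 2 and 10.
tmp=$(mktemp -d)
cd "$tmp" || exit 1
for n in 1 2 10; do
  printf 'A 123 %s\nB 234 %s\n' "$n" "$((n * 100))" > "File${n}_example.txt"
done

# Build the file list in numeric order; for a long run, seq or bash's
# {1..100} brace expansion could generate the numbers instead.
set --
for n in 1 2 10; do
  set -- "$@" "File${n}_example.txt"
done

combined=$(paste "$@" |
  awk '{ row = ""; for (i = 3; i <= NF; i += 3) row = row (i > 3 ? " " : "") $i; print row }')
printf '%s\n' "$combined"
```

With the explicit list, the column from File10 lands after the one from File2 regardless of how the shell would sort the names.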

Related

Insert rows using awk

How can I insert a row using awk?
My file looks as:
1 43
2 34
3 65
4 75
I would like to insert three rows with "?", so that my desired file looks like this:
1 ?
2 ?
3 ?
4 43
5 34
6 65
7 75
I am trying with the below script.
awk '{if(NR<=3){print "NR ?"}} {printf" " NR $2}' file.txt
Here's one way to do it:
$ awk 'BEGIN{s=" "; for(c=1; c<4; c++) print c s "?"}
{print c s $2; c++}' ip.txt
1 ?
2 ?
3 ?
4 43
5 34
6 65
7 75
$ awk 'BEGIN {printf "1 ?\n2 ?\n3 ?\n"} {printf "%d", $1 + 3; printf " %s\n", $2}' file.txt
1 ?
2 ?
3 ?
4 43
5 34
6 65
7 75
You could also add the 3 lines before awk, e.g.:
{ seq 3; cat file.txt; } | awk 'NR <= 3 { $2 = "?" } $1 = NR' OFS='\t'
Output:
1 ?
2 ?
3 ?
4 43
5 34
6 65
7 75
I would do it the following way using GNU AWK. Let the content of file.txt be
1 43
2 34
3 65
4 75
then
awk 'BEGIN{OFS=" "}NR==1{print 1,"?";print 2,"?";print 3,"?"}{print NR+3,$2}' file.txt
output
1 ?
2 ?
3 ?
4 43
5 34
6 65
7 75
Explanation: I set the output field separator (OFS) in the BEGIN block. For the 1st row I print three lines, each consisting of the next number and ? separated by the output field separator. You might choose to do this with a for loop, especially if you expect the requirement to change. For every line I print the row number plus 3 (to keep the order) and the 2nd column ($2). Thanks to OFS, you only need to make one change if the required number of spaces changes. Note that a construct like
{if(condition){dosomething}}
can be written in AWK more concisely as
(condition){dosomething}
(tested in gawk 4.2.1)
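The for-loop variant mentioned in the explanation might look like the sketch below. The variable name pad is an assumption introduced here so that the number of inserted rows lives in one place:

```shell
# Recreate the sample input from the question in a scratch directory.
tmp=$(mktemp -d)
printf '1 43\n2 34\n3 65\n4 75\n' > "$tmp/file.txt"

# pad = number of "?" rows to insert before the data (hypothetical name).
padded=$(awk -v pad=3 '
  BEGIN { for (i = 1; i <= pad; i++) print i, "?" }   # placeholder rows
  { print NR + pad, $2 }                              # renumber the data rows
' "$tmp/file.txt")
printf '%s\n' "$padded"
```

Running it with pad=5 would insert five placeholder rows and shift the data numbering by five.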

Sort according to second column numerically and first alphabetically

I have 2 columns, I want to sort them using bash.
I used the command:
sort -k2 -n
c 9
c 11
c 11
sh 11
c 13
c 15
txt 47
txt 94
txt 345
txt 628
sh 3673
This is the result, but I need them to be sorted like this:
c 9
c 11
c 11
c 13
c 15
sh 11
sh 3673
txt 47
txt 94
txt 345
txt 628
Any ideas?
First sort by column 1, then by 2:
sort -k1,1 -k2,2n file.txt
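For context, -k2 -n treats everything from the second field to the end of the line as a single numeric key, so the first column never takes part in the comparison. The sketch below runs the two-key version on a shuffled subset of the question's data. LC_ALL=C is an added assumption to keep the alphabetical comparison independent of the locale:

```shell
# -k1,1 limits the first key to column 1 (compared as text);
# -k2,2n limits the second key to column 2 and compares it numerically.
sorted=$(printf 'c 9\nsh 11\nc 11\ntxt 47\nsh 3673\ntxt 345\nc 13\n' |
  LC_ALL=C sort -k1,1 -k2,2n)
printf '%s\n' "$sorted"
```

The second key only breaks ties among lines whose first column is equal.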

How to merge three text files into three columns on screen

How can I merge three text files into three columns on screen?
1 A 1
2 B 2
3 C 3
  D
  E
I tried...
paste file1.txt file2.txt file3.txt | column -s $'\t' -t
...but I always get
1 A 1
2 B 2
3 C 3
D
E
Thanks in advance for your help!
line 1-2 of file1.txt
USB Device Class ID:
CdRom&Ven_ZALMAN&Prod__Virtual_CD-Rom&Rev_
line 1-2 of file2.txt
USB Instance ID:
______XX00000001&1
line 1-2 of file3.txt
Last updated (Subkey):
2015-01-12 15:08:45 UTC+0000
I don't know your input files, but paste works as intended.
$ paste <(seq 1 4) <(seq 10 17) <(seq 5 9)
1       10      5
2       11      6
3       12      7
4       13      8
        14      9
        15
        16
        17
:|paste -d ' ' file1 - file2 - file3 | column -ts "| "
This combines many files as a table: column -t builds the table and -s sets the separator characters ("| ").
The output will look like this:
1 A 1
2 B 2
3 C 3
  D
  E
If you only have 3 files or a few to deal with you can do this:
$ paste foo[12].txt | expand -t 45 | paste - foo3.txt | expand -t 12
USB Device Class ID: USB Instance ID: Last updated (Subkey):
CdRom&Ven_ZALMAN&Prod__Virtual_CD-Rom&Rev_ ______XX00000001&1 2015-01-12 15:08:45 UTC+0000
______XY0000000182
$
You need to choose the tab expansions 45 and 12 depending upon maximum line widths in foo1.txt and foo2.txt.
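When every field is shorter than the tab stop, a single expand after paste may be enough. This sketch assumes three made-up files with 3, 5 and 3 short lines. The file names and the tab stop of 8 are illustrative choices. Because paste still emits a tab for a file that has run out of lines, the empty cells keep their position:

```shell
# Hypothetical inputs of unequal length.
tmp=$(mktemp -d)
printf '1\n2\n3\n' > "$tmp/f1.txt"
printf 'A\nB\nC\nD\nE\n' > "$tmp/f2.txt"
printf '1\n2\n3\n' > "$tmp/f3.txt"

# paste joins the lines with tabs; expand turns the tabs into spaces at
# 8-column stops; sed trims the trailing blanks left by empty cells.
table=$(paste "$tmp/f1.txt" "$tmp/f2.txt" "$tmp/f3.txt" |
  expand -t 8 | sed 's/[[:space:]]*$//')
printf '%s\n' "$table"
```

The fourth and fifth rows start with eight spaces, so D and E stay under the second column rather than sliding into the first.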

Divide column values of different files by a constant then output one minus the other

I have two files of the form
file1:
#fileheader1
0 123
1 456
2 789
3 999
4 112
5 131
6 415
etc.
file2:
#fileheader2
0 442
1 232
2 542
3 559
4 888
5 231
6 322
etc.
How can I take the second column of each, divide it by a value, subtract one from the other, and then write the new values to a third file?
I want the output file to have the form
#outputheader
0 123/c-442/k
1 456/c-232/k
2 789/c-542/k
etc.
where c and k are numbers I can plug into the script
I have seen this question: subtract columns from different files with awk
But I don't know how to use awk to do this by myself, does anyone know how to do this or could explain what is going on in the linked question so I can try to modify it?
I'd write:
awk -v c=10 -v k=20 ' ;# pass values to awk variables
/^#/ {next} ;# skip headers
FNR==NR {val[$1]=$2; next} ;# store values from file1
$1 in val {print $1, (val[$1]/c - $2/k)} ;# perform the calc and print
' file1 file2
output
0 -9.8
1 34
2 51.8
3 71.95
4 -33.2
5 1.55
6 25.4
etc. 0
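The question also asks for a header line and a separate output file. One possible extension of the same approach is sketched below on a two-row excerpt of the data; the output name file3 is an assumption, and "#outputheader" is the placeholder header from the question:

```shell
# Two-row excerpts of the question's files in a scratch directory.
tmp=$(mktemp -d)
printf '#fileheader1\n0 123\n1 456\n' > "$tmp/file1"
printf '#fileheader2\n0 442\n1 232\n' > "$tmp/file2"

awk -v c=10 -v k=20 '
  BEGIN { print "#outputheader" }             # header for the new file
  /^#/ { next }                               # skip the input headers
  FNR == NR { val[$1] = $2; next }            # first file: remember column 2
  $1 in val { print $1, val[$1] / c - $2 / k }  # second file: compute and print
' "$tmp/file1" "$tmp/file2" > "$tmp/file3"

result=$(cat "$tmp/file3")
printf '%s\n' "$result"
```

FNR == NR is true only while awk reads the first file, which is what separates the storing step from the computing step.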

Write the number of elements per line of a file and its repetitions with awk

I have a file of distinct integers in which each line may have a different length, like this:
1 2 3 4 5
16 7 8
9 10 101 102 13 14
15 6 17
24 28 31 30 18
I would like to print the number of elements on a line and how many lines have that same number of elements; for this example the output should be:
3 2
5 2
6 1
The first column is the number of elements per line; the second is the number of lines that have that many elements.
The first line in the file has 5 elements, and so does the 5th, and so on.
Print the count for the number of fields:
$ awk '{a[NF]++}END{for(k in a)print k,a[k]}' file
5 2
6 1
3 2
Pipe to sort -n for numerically ordered output:
$ awk '{a[NF]++}END{for(k in a)print k,a[k]}' file | sort -n
3 2
5 2
6 1
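The choice between a plain and a numeric sort starts to matter once a line has ten or more fields. A small sketch with made-up input where one line has 10 fields:

```shell
# Two lines with 3 fields and one line with 10 fields (hypothetical data).
counts=$(printf '1 2 3\n1 2 3 4 5 6 7 8 9 10\n4 5 6\n' |
  awk '{ a[NF]++ } END { for (k in a) print k, a[k] }' |
  sort -n)                       # numeric sort keeps 10 after 3
printf '%s\n' "$counts"
```

A plain lexical sort would place the "10 1" row before "3 2", because the character 1 sorts before 3.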
