Concatenate many files into one file without the header - bash

I have three csv files (with the same name, e.g. A_bestInd.csv) that are located in different subfolders. I want to copy all of them into one file (e.g. All_A_bestInd.csv). To do that, I did the following:
{ find . -type f -name A_bestInd.csv -exec cat '{}' \; ; } >> All_A_bestInd.csv
The result of this command is the following:
Class Conf 1 2 3 4 //header of file1
A Reduction 5 1 2 1
A Reduction 1 8 1 10
Class Conf 1 2 3 4 //header of file2
A No_red 2 1 3 2
A No_red 3 6 1 9
Class Conf 1 2 3 4 //header of file3
A Reduction 5 5 8 9
A Reduction 7 2 1 11
As you can see, the issue is that the header of each file is copied. How can I change my command to keep only one header and drop the rest?

Use tail -n +2 to trim the header from each file.
find . -type f -name A_bestInd.csv -exec tail -n +2 {} \; >> All_A_bestInd.csv
To keep just one header you could combine it with head -1.
{ find . -type f -name A_bestInd.csv -exec head -1 {} \; -quit
find . -type f -name A_bestInd.csv -exec tail -n +2 {} \; ; } >> All_A_bestInd.csv

There are solutions with tail -n +2 and awk, but it seems to me the classic way to print all but the first line of a file is sed: sed -e 1d. So:
find . -type f -name A_bestInd.csv -exec sed -e 1d '{}' \; >> All_A_bestInd.csv
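Like the tail -n +2 answer, this strips every header. A sketch of the same two-pass trick to keep a single copy, with sed 1q printing just the first line (GNU find's -quit assumed, as above):
{ find . -type f -name A_bestInd.csv -exec sed 1q '{}' \; -quit
find . -type f -name A_bestInd.csv -exec sed -e 1d '{}' \; ; } >> All_A_bestInd.csv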

Use awk to filter out the header lines from all files but the first (unless you have thousands of files, in which case find may run awk more than once and each run would let another header through):
find . -type f -name 'A_bestInd.csv' -exec awk 'NR==1 || FNR>1' {} + > 'All_A_bestInd.csv'
NR==1 || FNR>1 means: if the current line is the first line of the whole input, or anything after the first line of the current file, print it.
$ cat A_bestInd.csv
Class Conf 1 2 3 4 //header of file3
A Reduction 5 5 8 9
A Reduction 7 2 1 11
$
$ cat foo/A_bestInd.csv
Class Conf 1 2 3 4 //header of file1
A Reduction 5 1 2 1
A Reduction 1 8 1 10
$
$ cat bar/A_bestInd.csv
Class Conf 1 2 3 4 //header of file2
A No_red 2 1 3 2
A No_red 3 6 1 9
$
$ find . -type f -name 'A_bestInd.csv' -exec awk 'NR==1 || FNR>1' {} + > 'All_A_bestInd.csv'
$
$ cat All_A_bestInd.csv
Class Conf 1 2 3 4 //header of file1
A Reduction 5 1 2 1
A Reduction 1 8 1 10
A Reduction 5 5 8 9
A Reduction 7 2 1 11
A No_red 2 1 3 2
A No_red 3 6 1 9

Multiply all values in txt file in bash

I have a file in which I need to multiply each number by -1. I have tried some commands, but every time the result is only the first column multiplied by -1. Please help!
The file is as follows:
-1 2 3 -4 5 -6
7 -8 9 10 12 0
The expected output would be
1 -2 -3 4 -5 6
-7 8 -9 -10 -12 0
Commands I have tried are:
awk '{print $0*-1}' file
sed 's/$/ -1*p /' file | bc (syntax error)
sed 's/$/ * -1 /' file | bc (syntax error)
numfmt --from-unit=-1 < file (error: numfmt: invalid unit size: ‘-1’)
With bash and an array:
while read -r -a arr; do
    declare -ia 'arr_multiplied=( "${arr[@]/%/*-1}" )'
    echo "${arr_multiplied[*]}"
done < file
The /%/*-1 expansion appends *-1 to every array element, and declare -i makes bash evaluate each entry arithmetically.
Output:
1 -2 -3 4 -5 6
-7 8 -9 -10 -12 0
I got this idea from this Stack Overflow answer by j4x.
One awk approach:
$ awk '{for (i=1;i<=NF;i++) $i=$i*-1} 1' file
1 -2 -3 4 -5 6
-7 8 -9 -10 -12 0
Using the <var_or_field><op>=<value> construct:
$ awk '{for (i=1;i<=NF;i++) $i*=-1} 1' file
1 -2 -3 4 -5 6
-7 8 -9 -10 -12 0
Using perl and its autosplit mode:
perl -lane 'print join(" ", map { $_ * -1 } @F)' file
To multiply every number in the file by -1, you can use the following awk command:
awk '{ for (i=1; i<=NF; i++) $i=$i*-1; print }' file
This command reads each line of the file, and for each field (number) in the line, it multiplies it by -1. It then prints the modified line.
The output will be as follows:
1 -2 -3 4 -5 6
-7 8 -9 -10 -12 0
Alternatively, you can flip the signs with sed. A plain swap needs a placeholder, because otherwise the second substitution would re-negate the numbers the first one just turned positive:
sed -E 's/(^| )-/\1#/g; s/(^| )([0-9])/\1-\2/g; s/#//g; s/-0/0/g' file
The first expression marks each negative number with #, the second puts a minus on every remaining (positive) number, the third drops the markers, and the last cleans up any -0. The output will be the same as above.
For completeness, an approach with ruby:
-l Line-ending processing
-a Auto-splitting, provides $F (field, set with -F)
-p Auto-prints $_ (line)
-e Execute code
ruby -lape '$_ = $F.map {|x| x.to_i * -1}.join " "' file
1 -2 -3 4 -5 6
-7 8 -9 -10 -12 0
Just switching the signs ought to do.
$: cat file
1 -2 -3 4 -5 6
-7 8 -9 -10 -12 0
$: sed 's/^ */ /; s/  */ /g; s/ -/+/g; s/ / -/g; s/+/ /g; s/-0/0/g; s/^ *//;' file
-1 2 3 -4 5 -6
7 -8 9 10 12 0
If you don't care about leading spaces or signs on your zeros, you can drop some of that. The logic is flexible, too...
$: sed 's/ *-/+/g; s/ / -/g; s/+/ /g;' x
1 2 3 -4 5 -6
7 -8 9 10 12 -0
There are multiple ways we can do this; I can think of the following two.
cat file | awk '{for (i=1;i<=NF;i++){ $i*=-1} print}'
This will give out
1 -2 -3 4 -5 6
-7 8 -9 -10 -12 0
In this method we overwrite the $i value and print $0
Another way
cat file | awk '{for (i=1;i<=NF;i++){printf("%d ",$i*-1)} printf("\n") }'
Gives the output
1 -2 -3 4 -5 6
-7 8 -9 -10 -12 0
In this method we print the value $i*-1, so we need to use the printf() function.
don't "do math" and actually multiply by -1 -
just use regex to flip the signs, and process thousands or even millions of numbers with 3 calls to gsub()
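A minimal sketch of that idea, reusing the placeholder trick from the sed answers above (the ~ marker is an assumption; any character absent from the data works):
awk '{
  $0 = " " $0      # guard space so every number is preceded by a blank
  gsub(/ -/, "~")  # 1) mark each negative number with a placeholder
  gsub(/ /, " -")  # 2) put a minus on everything still space-prefixed
  gsub(/~/, " ")   # 3) unmark: the former negatives are now positive
  $1 = $1          # rebuild the record to normalize spacing
} 1' file
As with the sed versions, a bare 0 comes out as -0; chain a gsub(/-0/, "0") style touch-up if that matters.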

C-shell: print result from row to column

I have this output:
1 rpt 4 qor 5 are 6 oip
I want it to become :
1 rpt
4 qor
5 are
6 oip
This is my code:
set d = `sort "04.txt" | uniq -c`
echo $d
With your shown samples, please try the following.
xargs -n 2 < Input_file
From man xargs:
-n max-args, --max-args=max-args
Use at most max-args arguments per command line. Fewer than max-args arguments will be used if the size (see the -s option) is exceeded, unless the -x option is given, in which case xargs will exit.
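Here the one-row output comes from echo $d collapsing the csh word list, so the same idea can be applied straight to the pipeline (a sketch, assuming 04.txt holds the raw entries):
sort 04.txt | uniq -c | xargs -n 2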
akshay@sys:~$ cat file
1 rpt 4 qor 5 are 6 oip
akshay@sys:~$ sed 's/ /\n/2; P; D' file
1 rpt
4 qor
5 are
6 oip
akshay@sys:~$ awk -v n=2 '{for (i=n+1; i<=NF; i+=n) $i = "\n" $i}1' file
1 rpt
4 qor
5 are
6 oip
akshay@sys:~$ awk -v RS=" " '{$1=$1; ORS=NR%2?FS:"\n" }1' file
1 rpt
4 qor
5 are
6 oip

Add column to csv file

I have two files, and I need to take the last column of one file and append it to the other file.
file1
1 2 3
1 2 3
1 2 3
file2
5 5
5 5
5 5
Initial proposal
#!/usr/bin/env bash
column=$(awk '{print $(NF)}' $file1)
paste -d',' $file2 < $column
Expected result
file2
5 5 3
5 5 3
5 5 3
But this script does not work yet.
Note: I do not know how many columns are in the file; I need a more generic solution.
You can use this paste command, where <(...) is a process substitution that feeds awk's output to paste as a file argument:
paste -d " " file2 <(awk '{print $NF}' file1)
5 5 3
5 5 3
5 5 3
To append last column of file1 to file2:
paste -d " " file2 <(rev file1 | cut -d " " -f 1 | rev)
Output:
5 5 3
5 5 3
5 5 3
To paste the second column of file 1 to file 2:
while read line; do
    read -u 3 c1 c2 c3
    echo $line $c2
done < file2 3< file1
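A sketch generalizing this to any column, with the column number passed in as a variable (n=2 here to match the loop above; the parameter name is mine, not from the thread):
paste -d " " file2 <(awk -v n=2 '{print $n}' file1)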
You can use Perl too:
$ paste -d ' ' file2.txt <(perl -lne 'print $1 if m/(\S+)\s*$/' file1.txt)
5 5 3
5 5 3
5 5 3
Or grep:
$ paste -d ' ' file2.txt <(grep -Eo '(\S+)\s*$' file1.txt)
5 5 3
5 5 3
5 5 3

Paste side by side multiple files by numerical order

I have many files in a directory with similar file names like file1, file2, file3, file4, file5, ..... , file1000. They are of the same dimension, and each one of them has 5 columns and 2000 lines. I want to paste them all together side by side in a numerical order into one large file, so the final large file should have 5000 columns and 2000 lines.
I tried
for x in $(seq 1 1000); do
paste `echo -n "file$x "` > largefile
done
Instead of writing all file names in the command line, is there a way I can paste those files in a numerical order (file1, file2, file3, file4, file5, ..., file10, file11, ..., file1000)?
for example:
file1
1 1 1 1 1
1 1 1 1 1
1 1 1 1 1
...
file2
2 2 2 2 2
2 2 2 2 2
2 2 2 2 2
....
file3
3 3 3 3 3
3 3 3 3 3
3 3 3 3 3
....
paste file1 file2 file3 .... file1000 > largefile
largefile
1 1 1 1 1 2 2 2 2 2 3 3 3 3 3
1 1 1 1 1 2 2 2 2 2 3 3 3 3 3
1 1 1 1 1 2 2 2 2 2 3 3 3 3 3
....
Thanks.
If your current shell is bash: paste -d " " file{1..1000}
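A sketch of the same thing without brace expansion (assuming a seq that supports -f, as GNU and BSD seq do):
paste -d " " $(seq -f "file%g" 1 1000) > largefile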
You can also rename the files with leading zeroes so that glob order matches numerical order, like
paste <(ls -1 file* | sort -te -k2.1n) <(seq -f "file%04g" 1000) | xargs -n2 echo mv
The above is a "dry run"; remove the echo if you are satisfied.
or you can use e.g. perl
ls file* | perl -nlE 'm/file(\d+)/; rename $_, sprintf("file%04d", $1);'
and after that you can simply
paste file*
With zsh:
setopt extendedglob
paste -d ' ' file<->(n)
<x-y> is to match positive decimal integer numbers from x to y. x and/or y can be omitted so <-> is any positive decimal integer number. It could also be written [0-9]## (## being the zsh equivalent of regex +).
The (n) is a globbing qualifier: n turns on numeric sorting, which sorts on all sequences of decimal digits appearing in the file names.

Split specific column(s)

I have records of this kind:
1 2 12345
2 4 98231
...
I need to split the third column into sub-columns to get this (separated by single-space for example):
1 2 1 2 3 4 5
2 4 9 8 2 3 1
Can anybody offer me a nice solution in sed, awk, ... etc ? Thanks!
EDIT: the size of the original third column may vary record by record.
Awk
% echo '1 2 12345
2 4 98231
...' | awk '{
gsub(/./, "& ", $3)
print
}
'
1 2 1 2 3 4 5
2 4 9 8 2 3 1
...
[Tested with GNU Awk 3.1.7]
This takes every character (/./) in the third column ($3) and replaces (gsub()) it with itself followed by a space ("& ") before printing the entire line.
Sed solution:
sed -e 's/\([0-9]\)/\1 /g' -e 's/ \+/ /g'
The first sed expression replaces every digit with the same digit followed by a space. The second expression squeezes every run of spaces to a single space, thus handling the double spaces introduced by the first expression. Note that \+ is a GNU extension; with other seds, write the second expression as s/  */ /g.
Using awk substr and printf:
[srikanth@myhost ~]$ cat records.log
1 2 12345 6 7
2 4 98231 8 0
[srikanth@myhost ~]$ awk '{ len=length($3); for(i=1; i<=NF; i++) { if(i==3) { for(j = 1; j <= len; j++){ printf substr($3,j,1) " "; } } else { printf $i " "; } } printf("\n"); }' records.log
1 2 1 2 3 4 5 6 7
2 4 9 8 2 3 1 8 0
You can use this for more than three column records as well.
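A shorter sketch in the same spirit, with the target column passed in as a variable (col is my name for it, not from the thread; the sub() trims the trailing space that gsub() leaves inside the field):
awk -v col=3 '{ gsub(/./, "& ", $col); sub(/ $/, "", $col) } 1' file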
Using perl:
perl -pe 's/([0-9])(?! )/\1 /g' INPUT_FILE
Test:
[jaypal:~/Temp] cat tmp
1 2 12345
2 4 98231
[jaypal:~/Temp] perl -pe 's/([0-9])(?! )/\1 /g' tmp
1 2 1 2 3 4 5
2 4 9 8 2 3 1
Using gnu sed:
sed 's/[0-9]/& /3g' INPUT_FILE
Test:
[jaypal:~/Temp] sed 's/[0-9]/& /3g' tmp
1 2 1 2 3 4 5
2 4 9 8 2 3 1
Using gnu awk:
gawk '{print $1,$2,gensub(/./,"& ","G", $NF)}' INPUT_FILE
Test:
[jaypal:~/Temp] gawk '{print $1,$2,gensub(/./,"& ","G", $NF)}' tmp
1 2 1 2 3 4 5
2 4 9 8 2 3 1
If you don't care about spaces, this is a succinct version:
sed 's/[0-9]/& /g'
but if you need to squeeze the extra spaces, we just chain another regexp:
sed 's/[0-9]/& /g; s/  */ /g'
Note this is compatible with the original sed, and thus will run on any UNIX-like system.
$ awk -F '' '$1=$1' data.txt | tr -s ' '
1 2 1 2 3 4 5
2 4 9 8 2 3 1
This might work for you:
echo -e "1 2 12345\n2 4 98231" | sed 's/\B\s*/ /g'
1 2 1 2 3 4 5
2 4 9 8 2 3 1
Most probably GNU sed only.
