Paste side by side multiple files by numerical order - shell

I have many files in a directory with similar file names like file1, file2, file3, file4, file5, ..... , file1000. They are of the same dimension, and each one of them has 5 columns and 2000 lines. I want to paste them all together side by side in a numerical order into one large file, so the final large file should have 5000 columns and 2000 lines.
I tried
for x in $(seq 1 1000); do
paste `echo -n "file$x "` > largefile
done
Instead of writing all file names in the command line, is there a way I can paste those files in a numerical order (file1, file2, file3, file4, file5, ..., file10, file11, ..., file1000)?
for example:
file1
1 1 1 1 1
1 1 1 1 1
1 1 1 1 1
...
file2
2 2 2 2 2
2 2 2 2 2
2 2 2 2 2
....
file3
3 3 3 3 3
3 3 3 3 3
3 3 3 3 3
....
paste file1 file2 file3 .... file1000 > largefile
largefile
1 1 1 1 1 2 2 2 2 2 3 3 3 3 3
1 1 1 1 1 2 2 2 2 2 3 3 3 3 3
1 1 1 1 1 2 2 2 2 2 3 3 3 3 3
....
Thanks.

If your current shell is bash: paste -d " " file{1..1000}
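To illustrate why the brace expansion helps, here is a minimal sketch, assuming bash and a scratch directory from mktemp (the twelve one-line files are made up for the demonstration). It compares the order a plain glob gives with the order that file{1..12} gives:

```shell
#!/usr/bin/env bash
# Hypothetical demo data: file1..file12, each holding its own number.
tmp=$(mktemp -d)
for n in {1..12}; do
    echo "$n" > "$tmp/file$n"
done

# A glob sorts names lexically, so file10 comes before file2.
by_glob=$(paste -d ' ' "$tmp"/file*)

# Brace expansion generates the names in numeric order.
by_brace=$(paste -d ' ' "$tmp"/file{1..12})

printf 'glob : %s\nbrace: %s\n' "$by_glob" "$by_brace"
```

Note that bash does not expand variables inside {1..N}, so the upper bound has to be written literally.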

You need to rename the files with leading zeroes, like
paste <(ls -1 file* | sort -te -k2.1n) <(seq -f "file%04g" 1000) | xargs -n2 echo mv
The above is a dry run: it only prints the mv commands. Remove the echo once you are satisfied with them.
Or you can use e.g. perl
ls file* | perl -nlE 'm/file(\d+)/; rename $_, sprintf("file%04d", $1);'
and afterwards you can
paste file*
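The renaming step could also be done in bash alone, without parsing ls output. This is only a sketch under assumptions: the files live in a scratch directory from mktemp, the three sample files are invented, and a width of four digits is enough for the largest number.

```shell
#!/usr/bin/env bash
# Hypothetical demo data: file1, file2 and file10.
tmp=$(mktemp -d)
for n in 1 2 10; do
    printf '%s\n' "$n $n" > "$tmp/file$n"
done

# The glob is expanded once, before the loop starts renaming anything.
for f in "$tmp"/file*; do
    n=${f##*/file}                                   # numeric suffix of the name
    new=$(printf '%s/file%04d' "$tmp" "$((10#$n))")  # 10# forces base 10
    [ "$f" = "$new" ] || mv -- "$f" "$new"
done

# With zero-padded names, lexical glob order equals numeric order.
pasted=$(paste -d ' ' "$tmp"/file*)
printf '%s\n' "$pasted"
```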

With zsh:
setopt extendedglob
paste -d ' ' file<->(n)
<x-y> matches positive decimal integers from x to y. Either x or y (or both) can be omitted, so <-> matches any positive decimal integer. It could also be written [0-9]## (## being the zsh equivalent of regex +).
The (n) is a glob qualifier. The n qualifier turns on numeric sorting, which sorts numerically on every sequence of decimal digits appearing in the file names.

Related

How to merge three lines at a time

I have a .txt file with 9 lines:
1 2 3 4
1 2 3 5
1 2 3 6
1 2 3 4
1 2 3 5
1 2 3 6
1 2 3 4
1 2 3 5
1 2 3 6
I want to put the first 3 lines into one line, and the next three lines, and again the last three lines:
1 2 3 4 1 2 3 5 1 2 3 6
1 2 3 4 1 2 3 5 1 2 3 6
1 2 3 4 1 2 3 5 1 2 3 6
I tried the following, but it only gives me one consecutive line:
cat old.txt | tr -d '\n' > new.txt
You can use paste to merge together lines.
paste -d " " - - - < input.txt
The -d " " uses a space to delimit between the lines being joined. Each - reads from stdin (and we're redirecting your input file to stdin). If you wanted to join more lines, just increase the number of - etc.
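When the group size varies, the list of - arguments can be built instead of typed. A minimal sketch, assuming bash; join_every is a hypothetical helper name, not a standard command:

```shell
#!/usr/bin/env bash
# join_every N: merge every N consecutive lines of stdin into one line.
# (Hypothetical helper; it just passes N dashes to paste.)
join_every() {
    local n=$1 i dashes=()
    for ((i = 0; i < n; i++)); do
        dashes+=(-)
    done
    paste -d ' ' "${dashes[@]}"
}

merged=$(printf '%s\n' a b c d e f | join_every 3)
printf '%s\n' "$merged"
```

For the question's file this would be called as join_every 3 < old.txt > new.txt.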

Loop through a file and paste columns next to one another

Given I have a python script as follows:
#!/usr/bin/python
for i in range(1,4):
    print(i)
I want to run it in a bash loop for 3 times but I want to add the output as columns rather than concatenating. Is there a way to achieve this?
Output:
1 1 1
2 2 2
3 3 3
Like this?:
$ for i in {1..3} ; do echo $i $i $i ; done
1 1 1
2 2 2
3 3 3
You are looking for the pr command:
for i in 1 2 3 ; do
    python a.py
done | pr -t -3
Output:
1 1 1
2 2 2
3 3 3
Btw, to get the numbers from 1 to 3 you need to use:
range(1,4) # <-- 4, not 3!
in Python
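As an alternative to pr (a different technique, named here plainly), paste with bash process substitution places each run in its own column and gives single-space separators. In this sketch seq 1 3 is an assumed stand-in for python a.py:

```shell
#!/usr/bin/env bash
# Each <(...) runs the command once and exposes its output as a file,
# so paste sees three inputs and puts them side by side.
# seq 1 3 stands in for "python a.py".
columns=$(paste -d ' ' <(seq 1 3) <(seq 1 3) <(seq 1 3))
printf '%s\n' "$columns"
```

Unlike the loop, the number of columns is fixed by how many substitutions are written out.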

awk: print first column, then some values, and then all other columns

I want to print the first column, then a couple of columns with fixed values, like this command would do:
awk '{print $1,"1","2","1"}'
and then print all columns except the first after that...
I know this command prints all but the first column:
awk '{$1=""; print $0}'
But that gets rid of the first column.
In other words, this:
3 5 2 2
3 5 2 2
3 5 2 2
3 5 2 2
Needs to become this:
3 1 2 1 5 2 2
3 1 2 1 5 2 2
3 1 2 1 5 2 2
3 1 2 1 5 2 2
Any ideas?
use a loop to iterate through rest of the columns like this:
awk '{print $1,"1","2","1";for(i=2;i<=NF;i++) print $i}'
As an example:
$ echo "3 5 2 2" | awk 'BEGIN{ORS=""}{print $1,"1","2","1";for(i=2;i<=NF;i++) print $i}'
3 1 2 1 5 2 2
$
Edit1 :
$ echo "3 5 2 2" | awk 'BEGIN{ORS="\n";OFS="\n"}{print $1,"1","2","1 ";for(i=2;i<=NF;i++) print $i" "}'
3
1
2
1
5
2
2
$
Edit2:
$ echo "3 5 2 2" | awk '{print $1,"1","2","1";for(i=2;i<=NF;i++) print $i}'
3 1 2 1
5
2
2
$
Edit3:
$ echo "3 5 2 2
3 5 2 2
3 5 2 2
3 5 2 2" | awk '{printf("%s %s ", $1,"1 2 1");for(i=2;i<=NF;i++) printf("%s ", $i); printf "\n"}'
3 1 2 1 5 2 2
3 1 2 1 5 2 2
3 1 2 1 5 2 2
3 1 2 1 5 2 2
You are almost there, you just need to store the first column in a temporary variable:
{
head=$1; # Store $1 in head, used later in printf
$1=""; # Empty $1, so that $0 will not contain first column
printf "%s 1 2 1%s\n", head, $0
}
And a full script:
echo "3 5 2 2" | awk '{head=$1;$1="";printf "%s 1 2 1%s\n", head, $0}'
Another solution with awk:
awk '{sub(/.*/, "1 2 1 "$2, $2)}1' File
3 1 2 1 5 2 2
3 1 2 1 5 2 2
3 1 2 1 5 2 2
3 1 2 1 5 2 2
Substitute the 2nd field with "1 2 1" followed by 2nd field itself.
You can do this using sed by replacing the first space by the string you want.
sed 's/ / 1 2 1 /' file
(OR)
With awk by replacing the first field($1):
awk '{$1=$1 " 1 2 1"}1' file
(I prefer the sed solution since it has fewer characters).
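A quick check that the sed and awk one-liners agree on the sample data, as a sketch assuming single-space separated input like the question's:

```shell
#!/usr/bin/env bash
input=$'3 5 2 2\n3 5 2 2'

# sed: replace only the first space on each line.
via_sed=$(printf '%s\n' "$input" | sed 's/ / 1 2 1 /')

# awk: append the fixed values to the first field; assigning to $1
# makes awk rebuild the line with single spaces between fields.
via_awk=$(printf '%s\n' "$input" | awk '{$1 = $1 " 1 2 1"} 1')

printf '%s\n' "$via_sed"
```

The two differ on other input: sed leaves the remaining whitespace untouched, while awk collapses any run of blanks between fields to a single space.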

How to produce cartesian product in bash?

I want to produce such file (cartesian product of [1-3]X[1-5]):
1 1
1 2
1 3
1 4
1 5
2 1
2 2
2 3
2 4
2 5
3 1
3 2
3 3
3 4
3 5
I can do this using nested loop like:
for i in $(seq 3)
do
    for j in $(seq 5)
    do
        echo $i $j
    done
done
is there any solution without loops?
Combine two brace expansions!
$ printf "%s\n" {1..3}" "{1..5}
1 1
1 2
1 3
1 4
1 5
2 1
2 2
2 3
2 4
2 5
3 1
3 2
3 3
3 4
3 5
This works by using a single brace expansion:
$ echo {1..5}
1 2 3 4 5
and then combining with another one:
$ echo {1..5}+{a,b,c}
1+a 1+b 1+c 2+a 2+b 2+c 3+a 3+b 3+c 4+a 4+b 4+c 5+a 5+b 5+c
A shorter (but hacky) version of Rubens's answer:
join -j 999999 -o 1.1,2.1 file1 file2
Since field 999999 most likely does not exist, it is treated as empty and therefore equal in both files, so join has to produce the Cartesian product. It uses O(N+M) memory and produces output at 100 to 200 Mb/sec on my machine.
I don't like the "shell brace expansion" method, like echo {1..100}x{1..100}, for large datasets, because it uses O(N*M) memory and, when used carelessly, can bring your machine to its knees. It is hard to stop because ctrl+c does not interrupt brace expansion, which is done by the shell itself.
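Applied to the [1-3]X[1-5] example from the question, the join trick might look like the sketch below. It assumes a join that accepts a field number larger than any field in the input (GNU join does); the file names and the mktemp directory are only for illustration.

```shell
#!/usr/bin/env bash
tmp=$(mktemp -d)
seq 3 > "$tmp/file1"
seq 5 > "$tmp/file2"

# Field 999999 is missing on every line, so every key compares equal
# and each line of file1 is paired with each line of file2.
product=$(join -j 999999 -o 1.1,2.1 "$tmp/file1" "$tmp/file2")
printf '%s\n' "$product"
```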
The best alternative for a cartesian product in bash is surely, as pointed out by @fedorqui, to use brace expansion. However, if your input is not easily producible (i.e., if {1..3} and {1..5} do not suffice), you could simply use join.
For example, if you want to perform the cartesian product of two regular files, say "a.txt" and "b.txt", you could do the following. First, the two files:
$ printf '%s\tx\n' {a..c} | sed 's/^/1\t/' > a.txt
$ cat a.txt
1 a x
1 b x
1 c x
$ echo -en "foo\nbar\n" | sed 's/^/1\t/' > b.txt
$ cat b.txt
1 foo
1 bar
Notice the sed command is used to prepend each line with an identifier. The identifier must be the same for all lines, and for all files, so the join will give you the cartesian product -- instead of putting aside some of the resultant lines. So, the join goes as follows:
$ join -j 1 -t $'\t' a.txt b.txt | cut -d $'\t' -f 2-
a x foo
a x bar
b x foo
b x bar
c x foo
c x bar
After both files are joined, cut is used as an alternative to remove the column of "1"s formerly prepended.

Comparing a few columns of a file with columns of another file

I have two data files 1.txt and 2.txt
1.txt contains valid lines.
For example.
1 2 1 2
1 3 1 3
In 2.txt I have an extra column, but if you ignore that, I have a few valid lines and a few invalid lines. There could be multiple occurrences of the same line in 2.txt
For example:
1 2 1 2 1.9
1 3 1 3 3.4
1 3 1 3 3.4
2 3 2 3 5.6
2 3 2 3 5.6
The second and third lines are the same and valid.
The fourth and fifth lines are also the same but invalid.
I want to write a shell script which compares these two files and outputs two files, valid.txt and invalid.txt which look like these...
valid.txt :
1 2 1 2 1
1 3 1 3 2
and invalid.txt :
2 3 2 3 2
The last extra column of valid.txt and invalid.txt contains the number of times the line has been repeated in 2.txt.
this awk script works for the example data:
awk 'NR==FNR{sub(/ *$/,"");a[$0]++;next}
{
    sub(/ [^ ]*$/,"")
    if($0 in a)
        v[$0]++
    else
        n[$0]++
}
END{
    for(x in v) print x,v[x] > "valid.txt"
    for(x in n) print x,n[x] > "inv.txt"
}' file1 file2
output:
kent$ head inv.txt valid.txt
==> inv.txt <==
2 3 2 3 2
==> valid.txt <==
1 3 1 3 2
1 2 1 2 1
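For a reproducible run, the same idea can be sketched against the question's sample data and file names. The assumptions here are a scratch directory from mktemp, an awk variable named out (hypothetical) that carries the output directory, and a final sort, because awk's for-in loop returns keys in no particular order.

```shell
#!/usr/bin/env bash
tmp=$(mktemp -d)
printf '%s\n' '1 2 1 2' '1 3 1 3' > "$tmp/1.txt"
printf '%s\n' '1 2 1 2 1.9' '1 3 1 3 3.4' '1 3 1 3 3.4' \
              '2 3 2 3 5.6' '2 3 2 3 5.6' > "$tmp/2.txt"

awk -v out="$tmp" '
    # First file: remember every valid line (trailing blanks removed).
    NR == FNR { sub(/ *$/, ""); known[$0] = 1; next }
    # Second file: drop the last column, then count by category.
    {
        sub(/ [^ ]*$/, "")
        if ($0 in known) valid[$0]++
        else             invalid[$0]++
    }
    END {
        for (k in valid)   print k, valid[k]   > (out "/valid.txt")
        for (k in invalid) print k, invalid[k] > (out "/invalid.txt")
    }
' "$tmp/1.txt" "$tmp/2.txt"

valid=$(sort "$tmp/valid.txt")
invalid=$(sort "$tmp/invalid.txt")
printf 'valid:\n%s\ninvalid:\n%s\n' "$valid" "$invalid"
```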

Resources