How to merge three lines at a time - bash

I have a .txt file with 9 lines:
1 2 3 4
1 2 3 5
1 2 3 6
1 2 3 4
1 2 3 5
1 2 3 6
1 2 3 4
1 2 3 5
1 2 3 6
I want to put the first 3 lines into one line, and the next three lines, and again the last three lines:
1 2 3 4 1 2 3 5 1 2 3 6
1 2 3 4 1 2 3 5 1 2 3 6
1 2 3 4 1 2 3 5 1 2 3 6
however it only gives me one consecutive line
I tried
cat old.txt | tr -d '\n' > new.txt

You can use paste to merge together lines.
paste -d " " - - - < input.txt
The -d " " uses a space to delimit between the lines being joined. Each - reads from stdin (and we're redirecting your input file to stdin). If you wanted to join more lines, just increase the number of - etc.

Related

Divide an output into multiple variables using shell script

So I have a C program that outputs many numbers. I have to check them all. The problem is, each time I run my program, I need to change seeds. In order to do that, I've been doing it manually and was trying to make a shell script to work around this.
I've tried using sed but couldn't manage to do it.
I'm trying to get the output like this:
a=$(./algorithm < input.txt)
b=$(./algorithm2 < input.txt)
c=$(./algorithm3 < input.txt)
The output of each algorithm program is something like this:
12 13 315
1 2 3 4 5 6 7 8 10 2 8 9 1 0 0 2 3 4 5
So the variable a has all this output, and what I need is
variable a to contain this whole string
and variable a1 to contain only the third number, in this case, 315.
Another example:
2 3 712
1 23 15 12 31 23 3 2 5 6 6 1 2 3 5 51 2 3 21
echo $b should give this output:
2 3 712
1 23 15 12 31 23 3 2 5 6 6 1 2 3 5 51 2 3 21
and echo $b1 should give this output:
712
Thanks!
Not exactly what you are asking, but one way to do this would be to store the results of your algorithm in arrays, and then dereference the item of interest. You'd write something like:
a=( $(./algorithm < input.txt) )
b=( $(./algorithm2 < input.txt) )
c=( $(./algorithm3 < input.txt) )
Notice the extra () that encloses the statements. Now, a, b and c are arrays, and you can access the item of interest like ${a[0]} or $a[1].
For your particular case, since you want the 3rd element, that would have index = 2, hence:
a1=${a[2]}
b1=${b[2]}
c1=${c[2]}
Since you are using the Bash shell (see your tags), you can use Bash arrays to easily access the individual fields in your output strings. For example like so:
#!/bin/bash
# Your lines to gather the output:
# a=$(./algorithm < input.txt)
# b=$(./algorithm2 < input.txt)
# c=$(./algorithm3 < input.txt)
# Just to use your example output strings:
a="$(printf "12 13 315 \n 1 2 3 4 5 6 7 8 10 2 8 9 1 0 0 2 3 4 5")"
b="$(printf "2 3 712 \n 1 23 15 12 31 23 3 2 5 6 6 1 2 3 5 51 2 3 21")"
# Put the output in arrays.
a_array=($a)
b_array=($b)
# You can access the array elements individually.
# The array index starts from 0.
# (The names a1 and b1 for the third elements were your choice.)
a1="${a_array[2]}"
b1="${b_array[2]}"
# Print output strings.
# (The newlines in $a and $b are gobbled by echo, since they are not quoted.)
echo "Output a:" $a
echo "Output b:" $b
# Print third elements.
echo "3rd from a: $a1"
echo "3rd from b: $b1"
This script outputs
Output a: 12 13 315 1 2 3 4 5 6 7 8 10 2 8 9 1 0 0 2 3 4 5
Output b: 2 3 712 1 23 15 12 31 23 3 2 5 6 6 1 2 3 5 51 2 3 21
3rd from a: 315
3rd from b: 712
Explanation:
The trick here is that array constants (literals) in Bash have the form
(<space_separated_list_of_elements>)
for example
(1 2 3 4 a b c nearly_any_string 99)
Any variable that gets such an array assigned, automatically becomes an array variable. In the script above, this is what happens in a_array=($a): Bash expands the $a to the <space_separated_list_of_elements> and reads the whole expression again interpreting it as an array constant.
Individual elements in such arrays can be referenced like variables by using expressions of the form
<array_name>[<idx>]
like a variable name. Therein, <array_name>is the name of the array and <idx> is an integer that references the individual element. For arrays that are represented by array constants, the index counts elements continuously starting from zero. Therefore, in the script, ${a_array[2]} expands to the third element in the array a_array. If the array would have less elements, a_array[2] would be considered unset.
You can output all elements in the array a_array, the corresponding index array, and the number of elements in the array respectively by
echo "${a_array[#]}"
echo "${!a_array[#]}"
echo "${#a_array[#]}"
These commands can be used to track down the fate of the newline: Given the script above, it is still in $a, as can be seen by (watch the quotes)
echo "$a"
which yields
12 13 315
1 2 3 4 5 6 7 8 10 2 8 9 1 0 0 2 3 4 5
But the newline did not make it into the array a_array. This is because Bash considers it as part of the whitespace that separates the third and the fourth element in the array assignment. The same applies if there are no extra spaces around the newline, like here:
12 13 315\n1 2 3 4 5 6 7 8 10 2 8 9 1 0 0 2 3 4 5
I actually assume that the output of your C program comes in this form.
This will store the full string in a[0] and the individual fields in a[1-N]:
$ tmp=$(printf '12 13 315\n1 2 3 4 5 6 7 8 10 2 8 9 1 0 0 2 3 4 5\n')
$ a=( $(printf '_ %s\n' "$tmp") )
$ a[0]="$tmp"
$ echo "${a[0]}"
12 13 315
1 2 3 4 5 6 7 8 10 2 8 9 1 0 0 2 3 4 5
$ echo "${a[3]}"
315
Obviously replace $(printf '12 13 315\n1 2 3 4 5 6 7 8 10 2 8 9 1 0 0 2 3 4 5\n') with $(./algorithm < input.txt) in your real code.

awk: print first column, then some values, and then all other columns

I want to print the first column, then a couple of columns with fixed values, like this command would do:
awk '{print $1,"1","2","1"}'
and then print all columns except the first after that...
I know this command prints all but the first column:
awk '{$1=""; print $0}'
But that gets rid of the first column.
In other words, this:
3 5 2 2
3 5 2 2
3 5 2 2
3 5 2 2
Needs to become this:
3 1 2 1 5 2 2
3 1 2 1 5 2 2
3 1 2 1 5 2 2
3 1 2 1 5 2 2
Any ideas?
use a loop to iterate through rest of the columns like this:
awk '{print $1,"1","2","1";for(i=2;i<=NF;i++) print $i}'
As an example:
$echo "3 5 2 2" | awk 'BEGIN{ORS=""}{print $1,"1","2","1";for(i=2;i<=NF;i++) print $i}'
3 1 2 1 5 2 2
$
Edit1 :
$ echo "3 5 2 2" | awk 'BEGIN{ORS="\n";OFS="\n"}{print $1,"1","2","1 ";for(i=2;i<=NF;i++) print $i" "}'
3
1
2
1
5
2
2
$
Edit2:
$ echo "3 5 2 2" | awk '{print $1,"1","2","1";for(i=2;i<=NF;i++) print $i}'
3 1 2 1
5
2
2
$
Edit3:
$ echo "3 5 2 2
3 5 2 2
3 5 2 2
3 5 2 2" | awk '{printf("%s %s ", $1,"1 2 1");for(i=2;i<=NF;i++) printf("%s ", $i); printf "\n"}'
3 1 2 1 5 2 2
3 1 2 1 5 2 2
3 1 2 1 5 2 2
3 1 2 1 5 2 2
You are almost there, you just need to store the first column in a temporary variable:
{
head=$1; # Store $1 in head, used later in printf
$1=""; # Empty $1, so that $0 will not contain first column
printf "%s 1 2 1%s\n", head, $0
}
And a full script:
echo "3 5 2 2" | awk '{head=$1;$1="";printf "%s 1 2 1%s\n", head, $0}'
Another solution with awk:
awk '{sub(/.*/, "1 2 1 "$2, $2)}1' File
3 1 2 1 5 2 2
3 1 2 1 5 2 2
3 1 2 1 5 2 2
3 1 2 1 5 2 2
Substitute the 2nd field with "1 2 1" followed by 2nd field itself.
You can do this using sed by replacing the first space by the string you want.
sed 's/ / 1 2 1 /' file
(OR)
With awk by replacing the first field($1):
awk '{$1=$1 " 1 2 1"}1' file
(I prefer the sed solution since it has less characters).

Combine multiple columns of different lengths into one column in BASH

I need to combine columns of different lengths into one column using BASH. Here is an example input file:
11 1 2 3 4 5 6 7 8
12 1 2 3 4 5 6 7 8
13 1 2 3 4 5 6 7 8
14 1 2 5 6 7 8
15 1 2 7 8
And my desired output:
1
1
1
1
1
3
3
3
5
5
5
5
7
7
7
7
7
The input data is pairs of columns as shown. Each pair is separated from another by a fixed number of spaces. Values within a pair of columns are separated by one space. Thanks in advance!
Using GNU awk for fixed width field handling:
$ cat file
1 2 3 4 5 6 7 8
1 2 3 4 5 6 7 8
1 2 3 4 5 6 7 8
1 2 5 6 7 8
1 2 7 8
$ cat tst.awk
BEGIN{ FIELDWIDTHS="1 1 1 3 1 1 1 3 1 1 1 3 1 1 1" }
{
for (i=1;i<=NF;i++) {
a[NR,i] = $i
}
}
END {
for (i=1;i<=NF;i+=4)
for (j=1;j<=NR;j++)
if ( a[j,i] != " " )
print a[j,i]
}
$ gawk -f tst.awk file
1
1
1
1
1
3
3
3
5
5
5
5
7
7
7
7
7
You may try the following:
awk -f ext.awk input.txt
where input.txt is your input data file and ext.awk is:
BEGIN {
ncols=4 # number of columns
nspc=3 # number of spaces that separates the columns
}
{
str=$0;
for (i=1; i<=ncols; i++) {
pos=match(str,/^([0-9]+) ([0-9]+)/,a)
if (pos>0) {
b[NR,i]=a[1]
if (NR==1) colw[i]=RLENGTH; #assume col width are given as in first row
}
str=substr(str,colw[i]+1+nspc);
}
}
END {
for (i=1;i<=ncols;i++)
for (j=1;j<=NR;j++) {
if (b[j,i]) print b[j,i];
}
}

Paste side by side multiple files by numerical order

I have many files in a directory with similar file names like file1, file2, file3, file4, file5, ..... , file1000. They are of the same dimension, and each one of them has 5 columns and 2000 lines. I want to paste them all together side by side in a numerical order into one large file, so the final large file should have 5000 columns and 2000 lines.
I tried
for x in $(seq 1 1000); do
paste `echo -n "file$x "` > largefile
done
Instead of writing all file names in the command line, is there a way I can paste those files in a numerical order (file1, file2, file3, file4, file5, ..., file10, file11, ..., file1000)?
for example:
file1
1 1 1 1 1
1 1 1 1 1
1 1 1 1 1
...
file2
2 2 2 2 2
2 2 2 2 2
2 2 2 2 2
....
file 3
3 3 3 3 3
3 3 3 3 3
3 3 3 3 3
....
paste file1 file2 file3 .... file 1000 > largefile
largefile
1 1 1 1 1 2 2 2 2 2 3 3 3 3 3
1 1 1 1 1 2 2 2 2 2 3 3 3 3 3
1 1 1 1 1 2 2 2 2 2 3 3 3 3 3
....
Thanks.
If your current shell is bash: paste -d " " file{1..1000}
you need rename the files with leading zeroes, like
paste <(ls -1 file* | sort -te -k2.1n) <(seq -f "file%04g" 1000) | xargs -n2 echo mv
The above is for "dry run" - Remove the echo if you satisfied...
or you can use e.g. perl
ls file* | perl -nlE 'm/file(\d+)/; rename $_, sprintf("file%04d", $1);'
and after you can
paste file*
With zsh:
setopt extendedglob
paste -d ' ' file<->(n)
<x-y> is to match positive decimal integer numbers from x to y. x and/or y can be omitted so <-> is any positive decimal integer number. It could also be written [0-9]## (## being the zsh equivalent of regex +).
The (n) is the globbing qualifiers. The n globbing qualifier turns on numeric sorting which sorts on all sequences of decimal digits appearing in the file names.

Comparing few colums of a file with columns of another file

I have two data files 1.txt and 2.txt
1.txt contains valid lines.
For example.
1 2 1 2
1 3 1 3
In 2.txt i have an extra coloum, but if you ignore that, I have a few valid lines, and few invalid lines. There could be multiple occurrences of the same line in 2.txt
For example:
1 2 1 2 1.9
1 3 1 3 3.4
1 3 1 3 3.4
2 3 2 3 5.6
2 3 2 3 5.6
The second and third lines are the same and valid.
The fourth and fifth lines are also the same but invalid.
I want to write a shell script which compares these two files and outputs two files, valid.txt and invalid.txt which look like these...
valid.txt :
1 2 1 2 1
1 3 1 3 2
and invalid.txt :
2 3 2 3 2
The last extra column of valid.txt and invalid.txt contains the number of times the line has been repeated in 2.txt.
this awk script works for the example data:
awk 'NR==FNR{sub(/ *$/,"");a[$0]++;next}
{sub(/ [^ ]*$/,"")
if($0 in a)
v[$0]++
else
n[$0]++
}
END{
for(x in v)print x,v[x] > "valid.txt"
for(x in n) print x,n[x] >"inv.txt"
}' file1 file2
output:
kent$ head inv.txt valid.txt
==> inv.txt <==
2 3 2 3 2
==> valid.txt <==
1 3 1 3 2
1 2 1 2 1

Resources