Shell scripting: how to sort an array in a txt file

I have a txt file, for example:
11 23 4 9
5 2 17 25
and the output that I want is:
2 4 5 9
11 17 23 25

Sort the numbers in the file with sort -un (-n sorts numerically, -u also drops duplicates); the trailing tr joins everything back onto a single line:
tr ' ' '\n' < file.txt | sort -un | tr '\n' ' '
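If you also want the two rows of four restored rather than one long line, a minimal follow-up sketch (assuming every row has exactly four numbers):
tr ' ' '\n' < file.txt | sort -n | xargs -n 4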

$ tr -s ' ' '\n' <file | sort -n | paste -d ' ' - - - -
2 4 5 9
11 17 23 25
tr changes all spaces into newlines, squeezing repeated separators, so the input becomes a stream of numbers, one per line; that stream is then sorted numerically and pasted back into four space-separated columns.

This gawk code (asort() requires GNU awk) works for a dynamic number of columns:
awk '{for(x=1;x<=NF;x++)a[++i]=$x}
END{asort(a,b)
for(x=1;x<=i;x++)printf "%s%s",b[x],x%NF==0?RS:FS}' file

Related

Distribute data in both increasing and decreasing order

I have a file which has n rows; I want its data distributed across 7 files in the order below.
(My input file really has n rows; this is just an example.)
Input file
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
.
.
28
Desired output (each column below corresponds to one of the 7 output files):
1 2 3 4 5 6 7
14 13 12 11 10 9 8
15 16 17 18 19 20 21
28 27 26 25 24 23 22
So if I open the first file it should have these rows:
1
14
15
28
Similarly, if I open the second file it should have these rows:
2
13
16
27
and similarly for the other files.
Can anybody please help? With the code below the data is distributed, but not in the required order; the resulting layout (again, one file per column) is shown after the command.
awk '{print > ("te1234"++c".txt");c=(NR%n)?c:0}' n=7 test6.txt
1 2 3 4 5 6 7
8 9 10 11 12 13 14
15 16 17 18 19 20 21
22 23 24 25 26 27 28
EDIT: Since the OP has changed the Input_file sample to something totally different, I'm adding this solution now; again, it is written and tested only with the shown samples.
With xargs + a single awk (the recommended one):
xargs -n7 < Input_file |
awk '
FNR%2!=0{                ## odd rows: write fields to 1.txt .. NF.txt, left to right
for(i=1;i<=NF;i++){
print $i >> (i".txt")
close(i".txt")
}
next
}
FNR%2==0{                ## even rows: write fields right to left (reverse order)
for(i=NF;i>0;i--){
count++
print $i >> (count".txt")
close(count".txt")
}
count=""
}'
Initial solution:
xargs -n7 < Input_file |
awk '
FNR%2==0{
for(i=NF;i>0;i--){
val=(val?val OFS:"")$i
}
$0=val        ## replace the whole record with the reversed field order
val=""
}
1' |
awk '
{
for(i=1;i<=NF;i++){
print $i >> (i".txt")
close(i".txt")
}
}'
The above could also be done with a single awk; the xargs + single awk version is the recommended one shown above.
Could you please try the following, written and tested with the shown samples in GNU awk:
awk '{for(i=1;i<=NF;i++){print $i >> (i".txt");close(i".txt")}}' Input_file
The output-file counter can simply descend for every second group of seven:
awk 'FNR%n==1 {asc=!asc}
{
out="te1234" (asc ? ++c : c--) ".txt";
print >> out;
close(out)
}' n=7 test6.txt
$ ls
file tst.awk
$ cat tst.awk
{ rec = (cnt % 2 ? $1 sep rec : rec sep $1); sep=FS }
!(NR%n) {
    ++cnt
    nf = split(rec,flds)
    for (i=1; i<=nf; i++) {
        out = "te1234" i ".txt"
        print flds[i] >> out
        close(out)
    }
    rec=sep=""
}
$ awk -v n=7 -f tst.awk file
$ ls
file te12342.txt te12344.txt te12346.txt tst.awk
te12341.txt te12343.txt te12345.txt te12347.txt
$ cat te12341.txt
1
14
15
28
$ cat te12342.txt
2
13
16
27
If your input might not be an exact multiple of n, move the code that's currently in the !(NR%n) block into a function and call that function both there and in an END section.
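A minimal sketch of that refactoring (same logic as tst.awk above; the function name flush() is my own choice):
function flush(   nf, flds, i, out) {
    ++cnt
    nf = split(rec, flds)
    for (i=1; i<=nf; i++) {
        out = "te1234" i ".txt"
        print flds[i] >> out
        close(out)
    }
    rec = sep = ""
}
{ rec = (cnt % 2 ? $1 sep rec : rec sep $1); sep = FS }
!(NR % n) { flush() }
END { if (rec != "") flush() }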
This might work for you (GNU sed & parallel):
parallel 'echo {1}~14w file{1}; echo {2}~14w file{1}' ::: {1..7} :::+ {14..8} |
sed -n -f - file &&
paste file{1..7}
The parallel command creates a sed script that writes files named file<n>, where n is 1 through 7 (see the first set of parameters in the parallel command, which also appear in the paste command).
The sed script uses the n~m address form, where n is the starting line and m is the step thereafter.
The distributed files are created first, and the paste command then joins them all together into a single output file (tab-separated by default; use the paste -d option to choose the delimiter).
Alternative using Bash & sed:
for ((n=1,m=14;n<=7;n++,m--));do echo "$n~14w file$n";echo "$m~14w file$n";done |
sed -nf - file &&
paste file{1..7}
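For reference, both variants generate a sed script along these lines, which sed -nf - then reads from standard input and applies to file (the Bash loop emits exactly this order; parallel may interleave the pairs differently, which does not matter here):
1~14w file1
14~14w file1
2~14w file2
13~14w file2
...
7~14w file7
8~14w file7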

Bash split and sort by two columns in same file

For my file which looks like this:
AABBCC 10 5 CCAABB 100
BBCCAA 4 2 AABBCC 50
CCAABB 16 8 BBCCAA 20
... I am trying to sort columns 4 and 5 so that column 4 lines up with column 1 on each row.
Ideally this would return:
AABBCC 10 5 AABBCC 50
BBCCAA 4 2 BBCCAA 20
CCAABB 16 8 CCAABB 100
I have tried using sort; however, as far as I'm aware it can't sort one set of columns independently of the others within the same file.
Any help would be greatly appreciated!
awk solution:
awk 'NR==FNR{ a[$4]=$5; next }$1 in a{ print $1,$2,$3,$1,a[$1] }' file1 OFS="\t" file1
The output:
AABBCC 10 5 AABBCC 50
BBCCAA 4 2 BBCCAA 20
CCAABB 16 8 CCAABB 100
You may pipe to sort at the end: ... | sort
join -t $'\t' -o 1.1,1.2,1.3,2.1,2.2 <(cut -f1-3 file.tsv | sort -k 1,1) <(cut -f4- file.tsv | sort -k 1,1) | sort
Cut the original file into two halves, then join them on the first field of each. We need to specify the full list of output fields with -o to keep both key columns; otherwise join prints the shared key only once.
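To make the join concrete, the left half, cut -f1-3 file.tsv | sort -k 1,1, comes out as (a sketch, assuming the sample above is tab-separated and stored in file.tsv):
AABBCC 10 5
BBCCAA 4 2
CCAABB 16 8
and the right half, cut -f4- file.tsv | sort -k 1,1, as:
AABBCC 50
BBCCAA 20
CCAABB 100
join then matches the rows on the first field of each half.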
With bash and GNU paste:
With temporary files for illustration:
cut -f 1-3 file | sort > file_1to3
cut -f 4-5 file | sort > file_4to5
paste -d '\t' file_1to3 file_4to5
Without temporary files:
paste -d '\t' <(cut -f 1-3 file | sort) <(cut -f 4-5 file | sort)
Output:
AABBCC 10 5 AABBCC 50
BBCCAA 4 2 BBCCAA 20
CCAABB 16 8 CCAABB 100

Sorting tab-delimited numbers by column with a pure bash script

I'm stuck on some homework. The requirements of the assignment are to accept an input file and perform some statistics on its values. The user may specify whether to calculate the statistics by row or by column. The shell script must be pure bash, so I can't use awk, sed, perl, python, etc.
sample input:
1 1 1 1 1 1 1
39 43 4 3225 5 2 2
6 57 8 9 7 3 4
3 36 8 9 14 4 3
3 4 2 1 4 5 5
6 4 4814 7 7 6 6
I can't figure out how to sort and process the data by column. My code for processing the rows works fine.
# CODE FOR ROWS
while read -r line
do
  echo $(printf "%d\n" $line | sort -n) | tr ' ' '\t' > sorted.txt
  ....
  # I perform the stats calculations
  # for each row by working with the temp file sorted.txt
done
How could I process this data by column? I've never worked with shell scripting before, so I've been staring at this for hours.
If you want to analyze by columns, you'll first need cols, the number of columns. head -n 1 gives you the first row, and awk's NF counts its fields, giving the number of columns.
cols=$(head -n 1 input.txt | awk '{print NF}')
Then you can use cut with the '\t' delimiter to grab every column from input.txt, and run it through sort -n, as you did in your original post.
$ for i in $(seq 1 "$cols"); do cut -f"$i" -d$'\t' input.txt; done | sort -n > output.txt
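If each column should instead end up sorted into its own temporary file, one file per column, mirroring the per-row sorted.txt approach, a minimal sketch (like the answer above, it leans on cut, sort and awk rather than pure bash; the col$i.txt names are my own choice):
cols=$(head -n 1 input.txt | awk '{print NF}')
for i in $(seq 1 "$cols"); do
  cut -f"$i" input.txt | sort -n > "col$i.txt"
done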
For rows, you can use the shell built-in printf with the format modifier %d for integers. The sort command works on lines of input, so we replace spaces ' ' with newlines \n using the tr command:
$ cat input.txt | while read line; do echo $(printf "%d\n" $line); done | tr ' ' '\n' | sort -n > output.txt
Now take the output file to gather our statistics:
Min: cat output.txt | head -n 1
Max: cat output.txt | tail -n 1
Sum: (courtesy of Dimitre Radoulov): cat output.txt | paste -sd+ - | bc
Mean: (courtesy of porges): cat output.txt | awk '{ total += $1 } END { print total/NR }'
Median: (courtesy of maxschlepzig): cat output.txt | awk ' { a[i++]=$1; } END { print a[int(i/2)]; }'
Histogram: cat output.txt | uniq -c
8 1
3 2
4 3
6 4
3 5
4 6
3 7
2 8
2 9
1 14
1 36
1 39
1 43
1 57
1 3225
1 4814

Using sort with space as a field separator

I'm trying to use the sort command to sort integers in a string separated by spaces, for example 8 6 5 7 9 56 -20 -10. I receive the string on the standard output. I tried all of these, but nothing works:
sort -t' '
sort -t ' '
sort -t " "
sort -t" "
sort -t=" "
echo "8 6 5 7 9 56 -20 - 10" | tr ' ' '\n' | sort -n
Sort can only sort lines.
You can first read the string into an array, with space as the delimiter, then use sort -n with process substitution:
s='8 6 5 7 9 56 -20 -10'
read -ra arr <<< "$s"
sort -n <(printf "%s\n" "${arr[@]}")
Output:
-20
-10
5
6
7
8
9
56
To store the output in a string again:
read -r str < <(sort -n <(printf "%s\n" "${arr[@]}") | tr '\n' ' ')
And check output:
declare -p str
declare -- str="-20 -10 5 6 7 8 9 56"

How to replace [10-15] with 10 11 12 .. 15 in Bash

I have a file/string containing the following:
[1-9]
[11-12]
[10-15]
I then want to expand that to become this:
1 2 3 4 5 6 7 8 9
11 12
10 11 12 13 14 15
I know how to do it in a very long way (first capture the two numbers and then expand them using a for loop).
I would like to know if there is a faster/smarter way of achieving the same.
One way (a pure Bash solution): build a brace expansion with parameter substitution and expand it with eval:
while IFS=- read l1 l2
do
eval echo ${l1/[/{}".."${l2/]/}}
done < file
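For illustration, this is what the expansion steps produce for the first line [1-9]:
# l1='[1'  l2='9]'        after IFS=- read l1 l2
# ${l1/[/{}  ->  '{1'     the '[' is replaced by '{'
# ${l2/]/}   ->  '9'      the ']' is removed; the trailing '}' in the command is literal
# eval echo {1..9}        ->  1 2 3 4 5 6 7 8 9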
There are several solutions.
Solution 1:
sed 's/^/echo /; s/[[]/{/; s/]/}/; s/-/../' | bash
Example:
$ cat 1.txt | sed 's/^/echo /; s/[[]/{/; s/]/}/; s/-/../' | bash
1 2 3 4 5 6 7 8 9
11 12
10 11 12 13 14 15
Solution 2:
tr '[]-' ' ' | sed "s/^/seq -s' '/" | bash
Example:
$ cat 1.txt | tr '[]-' ' ' | sed "s/^/seq -s' '/" | bash
1 2 3 4 5 6 7 8 9
11 12
10 11 12 13 14 15
If you're confident that your input all matches that pattern:
while read a; do
seq -s' ' $(echo "$a" | tr '[]-' ' ')
done
Add error checking as appropriate.
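For instance, one sketch of that error checking, using a Bash regex to validate each line before calling seq (reading from a hypothetical file named file):
while read -r a; do
  [[ $a =~ ^\[([0-9]+)-([0-9]+)\]$ ]] || { echo "skipping: $a" >&2; continue; }
  seq -s ' ' "${BASH_REMATCH[1]}" "${BASH_REMATCH[2]}"
done < file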
Here's a one-liner:
cat lines | sed -E -e 's/\[|]//g' -e 's/-/ /g' | xargs -n 2 seq -s ' ' -t '\n'
As in:
$ cat <<EOF | sed -E -e 's/\[|]//g' -e 's/-/ /g' | xargs -n 2 seq -s ' ' -t '\n'
> [1-9]
> [11-12]
> [10-15]
> EOF
1 2 3 4 5 6 7 8 9
11 12
10 11 12 13 14 15
