Interchange columns using bash - shell

I have a file containing two columns e.g.
10 25
26 38
40 62
85 65
88 96
97 8
I want the first column to contain all the minimum values and the second column to contain all the maximum values. Something like this:
10 25
26 38
40 62
65 85
88 96
8 97

Hope this helps:
awk '{
    if ($2 < $1)
        print $2, $1;
    else
        print $1, $2;
}' filename.txt
Make sure the columns are space-separated.
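If the columns happened to be separated by something else, say commas, the same idea should work by setting the input and output field separators explicitly (a sketch; filename.csv is a placeholder name):
awk -F',' -v OFS=',' '{ if ($2 < $1) print $2, $1; else print $1, $2 }' filename.csv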

Using Python, it is straightforward:
values = [
    (10, 25),
    (26, 38),
    (40, 62),
    (85, 65),
    (88, 96),
    (97, 8),
]
result = [(min(v), max(v)) for v in values]
You get:
[(10, 25), (26, 38), (40, 62), (65, 85), (88, 96), (8, 97)]
Using bash… I don't know:
python -c "<your command here>"
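One possible way to drive that from the shell, reading the pairs from standard input (a sketch, assuming python3 is on the PATH and the data is in filename.txt):
python3 -c '
import sys
for line in sys.stdin:
    if line.strip():
        a, b = map(int, line.split())
        print(min(a, b), max(a, b))
' < filename.txt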

I have tried this:
#!/bin/sh
echo -n "Enter a file name > "
read name
if ["$1" -gt "$2"];
then
awk ' { t= $1; $1 = $2; $2=t ; print; } ' $name
fi
exit 0;
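For what it's worth, a corrected sketch of that attempt would move the comparison into awk, since $1 and $2 in the shell script refer to the script's arguments rather than the file's columns (the filename prompt is kept from the attempt above):
#!/bin/sh
printf "Enter a file name > "
read -r name
# compare the two columns of each line inside awk, not in the shell
awk '{ if ($2 < $1) print $2, $1; else print $1, $2 }' "$name"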

How to add the elements in a for loop [duplicate]

So basically my code looks through the data and greps the value each line begins with, and I've been trying to figure out a way to add those values together.
The sample input is:
35 45 75 76
34 45 53 55
33 34 32 21
my code:
for id in $(awk '{ print $1 }' < $3); do echo $id; done
I'm printing it right now to see the values, but basically what's output is:
35
34
33
I'm trying to add them all together but I can't figure out how; some help would be appreciated.
My desired output would be:
102
Lots of ways to do this, a few ideas ...
$ cat numbers.dat
35 45 75 76
34 45 53 55
33 34 32 21
Tweaking OP's current code:
$ sum=0
$ for id in $(awk '{ print $1 }' < numbers.dat); do ((sum+=id)); done
$ echo "${sum}"
102
Eliminating awk:
$ sum=0
$ while read -r id rest_of_line; do sum=$((sum+id)); done < numbers.dat
$ echo "${sum}"
102
Using just awk (looks like Aivean beat me to it):
$ awk '{sum+=$1} END {print sum}' numbers.dat
102
awk '{ sum += $1 } END { print sum }'
Test:
35 45 75 76
34 45 53 55
33 34 32 21
Result:
102
(sum(35, 34, 33) = 102, that's what you want, right?)
Here is the detailed explanation of how this works:
$1 is the first column of the input.
sum is the variable that holds the sum of all the values in the first column.
END { print sum } is the action to be performed after all the input has been processed.
So the awk program is basically summing up the first column of the input and printing the result.
This answer was partially generated by Davinci Codex model, supervised and verified by me.
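If a different column were wanted, the column number can be passed to awk as a variable (a sketch; col=2 is only an illustration, and with the sample data above it should print 124):
awk -v col=2 '{ sum += $col } END { print sum }' numbers.dat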

How to use awk to search for min and max values of column in certain files

I know that awk is helpful in trying to find certain things in columns in files, but I'm not sure how to use it to find the min and max values of a column in a group of files. Any advice? To be specific, I have four files in a directory that I want to run through awk.
If you're looking for the absolute maximum and minimum of column N over all the files, then you might use:
N=6
awk -v N="$N" 'NR == 1 { min = max = $N }
               { if ($N > max) max = $N; else if ($N < min) min = $N }
               END { print min, max }' "$@"
You can change the column number using a command-line option or by editing the script (crude, but effective; proper option handling is the better choice), or any other method that takes your fancy.
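A minimal sketch of what that option handling could look like (the -c flag, its default of column 1, and the usage message are assumptions, not part of the original answer):
#!/bin/sh
# hypothetical wrapper: minmax.sh [-c N] file...
col=1
while getopts c: opt; do
    case "$opt" in
        c) col=$OPTARG ;;
        *) echo "usage: $0 [-c column] file ..." >&2; exit 1 ;;
    esac
done
shift $((OPTIND - 1))

awk -v N="$col" 'NR == 1 { min = max = $N }
                 { if ($N > max) max = $N; else if ($N < min) min = $N }
                 END { print min, max }' "$@"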
If you want the maximum and minimum of column N for each file, then you have to detect new files, and you probably want to identify the files, too:
awk -v N="$N" 'FNR == 1 { if (NR != 1) print file, min, max; min = max = $N; file = FILENAME }
               { if ($N > max) max = $N; else if ($N < min) min = $N }
               END { print file, min, max }' "$@"
Try these simpler variants. The first prints only the maximum of column 6:
awk 'BEGIN {max = 0} {if ($6>max) max=$6} END {print max}' yourfile.txt
and this one prints both the minimum and maximum of column 2:
awk 'BEGIN {min=1000000; max=0;}; { if($2<min && $2 != "") min = $2; if($2>max && $2 != "") max = $2; } END {print min, max}' file
or, in a more awk-ish way (this one reads the file twice: the first pass finds the min and max of column 1, the second pass writes a normalized value of column 1 into column 2):
awk 'NR==1 { max=$1 ; min=$1 }
FNR==NR { if ($1>=max) max=$1 ; $1<=min?min=$1:0 ; next}
{ $2=($1-min)/(max-min) ; print }' file file
sort can do the sorting and you can pick up the first and last lines by any means, for example with awk:
sort -nk2 file{1..4} | awk 'NR==1{print "min:"$2} END{print "max:"$2}'
This sorts numerically by the second field of the files file1, file2, file3, file4 and prints the min and max values.
Since you didn't provide any input files, here is a worked example, for the files
==> file_0 <==
23 29 84
15 58 19
81 17 48
15 36 49
91 26 89
==> file_1 <==
22 63 57
33 10 50
56 85 4
10 63 1
72 10 48
==> file_2 <==
25 67 89
75 72 90
92 37 89
77 32 19
99 16 70
==> file_3 <==
50 93 71
10 20 55
70 7 51
19 27 63
44 3 46
if you run the script, now with a variable column number n
n=1; sort -k${n}n file_{0..3} |
awk -v n=$n 'NR==1{print "min ("n"):",$n} END{print "max ("n"):",$n}'
you'll get
min (1): 10
max (1): 99
and for the other values of n
n=2; sort ...
min (2): 3
max (2): 93
n=3; sort ...
min (3): 1
max (3): 90

bash group times and average + sum columns

I have a daily file output on a Linux system like the one below, and was wondering: is there a way to group the data into 30-minute increments based on $1, average $3, and sum $4, $5, $6, $7 and $8 via a shell script using awk/gawk or something similar?
04:04:13 04:10:13 2.13 36 27 18 18 0
04:09:13 04:15:13 2.37 47 38 13 34 0
04:14:13 04:20:13 2.19 57 37 23 33 1
04:19:13 04:25:13 2.43 43 35 13 30 0
04:24:13 04:30:13 2.29 48 40 19 28 1
04:29:13 04:35:13 2.33 56 42 16 40 0
04:34:13 04:40:13 2.21 62 47 30 32 0
04:39:13 04:45:13 2.25 44 41 19 25 0
04:44:13 04:50:13 2.20 65 50 32 33 0
04:49:13 04:55:13 2.47 52 38 16 36 0
04:54:13 05:00:13 2.07 72 54 40 32 0
04:59:13 05:05:13 2.35 53 41 19 34 0
so basically this hour of data would result in something like this:
04:04:13-04:29:13 2.29 287 219 102 183 2
04:34:13-04:59:13 2.25 348 271 156 192 0
This is what I have gotten so far using awk to search between the time frames, but I think there is an easier way to get the grouping done without awking each 30-minute interval:
awk '$1>=from&&$1<=to' from="04:00:00" to="04:30:00" | awk '{ total += $3; count++ } END { print total/count }' | awk '{printf "%0.2f\n", $1}'
awk '$1>=from&&$1<=to' from="04:00:00" to="04:30:00" | awk '{ sum+=$4} END {print sum}'
This should do what you want:
{
    split($1, times, ":");
    i = (2 * times[1]);
    if (times[2] >= 30) i++;
    if (!start[i] || $1 < start[i]) start[i] = $1;
    if (!end[i] || $1 > end[i]) end[i] = $1;
    count[i]++;
    for (col = 3; col <= 8; col++) {
        data[i, col] += $col;
    }
}
END {
    for (i = 1; i <= 48; i++) {
        if (start[i]) {
            data[i, 3] = data[i, 3] / count[i];
            printf("%s-%s %.2f", start[i], end[i], data[i, 3]);
            for (col = 4; col <= 8; col++) {
                printf(" " data[i, col]);
            }
            print "";
        }
    }
}
As you can see, I divide the day into 48 half-hour intervals and place the data into one of these bins depending on the time in the first column. After the input has been exhausted, I print out all bins that are not empty.
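Assuming the program above is saved in a file (halfhour.awk and daily.log are placeholder names, not from the original answer), it could be run against the daily file like this:
awk -f halfhour.awk daily.log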
Personally, I would do this in Python or Perl. In awk, the arrays are not ordered (well, in gawk you could use asorti to sort the array...), which makes printing ordered buckets more work.
Here is the outline:
Read input
Convert the time stamp to seconds
Add the data elements to an ordered (or sortable) associative array, in buckets of the desired time frame (or just keep running totals).
After the data is read, process as you wish.
Here is a Python version of that:
#!/usr/bin/python
from collections import OrderedDict
import fileinput

times = []
interval = 30 * 60
od = OrderedDict()

for line in fileinput.input():
    li = line.split()
    secs = sum(x*y for x, y in zip([3600, 60, 1], map(int, li[0].split(":"))))
    times.append([secs, [li[0], float(li[2])] + map(int, li[3:])])

current = times[0][0]
for t, li in times:
    if t - current < interval:
        od.setdefault(current, []).append(li)
    else:
        current = t
        od.setdefault(current, []).append(li)

for s, LoL in od.items():
    avg = sum(e[1] for e in LoL) / len(LoL)
    sums = [sum(e[i] for e in LoL) for i in range(2, 7)]
    print "{}-{} {:.3} {}".format(LoL[0][0], LoL[-1][0], avg, ' '.join(map(str, sums)))
Running that on your example data:
$ ./ts.py ts.txt
04:04:13-04:29:13 2.29 287 219 102 183 2
04:34:13-04:59:13 2.26 348 271 156 192 0
The advantage is that you can easily change the interval, and a similar technique can handle timestamps that span more than a day.
If you really want awk you could do:
awk 'BEGIN{ interval=30*60 }
function fmt(){
line=sprintf("%s-%s %.2f %i %i %i %i %i", ls, $1, sums[3]/count,
sums[4], sums[5], sums[6], sums[7], sums[8])
}
{
split($1,a,":")
secs=a[1]*3600+a[2]*60+a[3]
if (NR==1) {
low=secs
ls=$1
count=0
for (i=3; i<=8; i++)
sums[i]=0
}
for (i=3; i<=8; i++){
sums[i]+=$i
}
count++
if (secs-low<interval) {
fmt()
}
else {
print line
low=secs
ls=$1
count=1
for (i=3; i<=8; i++)
sums[i]=$i
}
}
END{
fmt()
print line
}' file
04:04:13-04:29:13 2.29 287 219 102 183 2
04:34:13-04:59:13 2.26 348 271 156 192 0

split a file based upon line number

I have a large file that needs to be split based on line numbers.
For instance, my file is like this:
aaaaaa
bbbbbb
cccccc
dddddd
****** //here blank line//
eeeeee
ffffff
gggggg
hhhhhh
****** //here blank line//
ıııııı
jjjjjj
kkkkkk
llllll
******
//And so on...
I need two separate files, such that one file has the first 4 lines, the third 4 lines, the fifth 4 lines, and so on, and the other file has the second 4 lines, the fourth 4 lines, the sixth 4 lines, and so on. How can I do that in a bash script?
You can play with the line number, NR:
$ awk 'NR%10>0 && NR%10<5' your_file > file1
$ awk 'NR%10>5' your_file > file2
If the line number is 10k + n with 0 < n < 5, the line goes to the first file.
If it is 10k + n with n > 5, it goes to the second file.
In one line:
$ awk 'NR%10>0 && NR%10<5 {print > "file1"} NR%10>5 {print > "file2"}' file
Test
$ cat a
1
2
3
4
6
7
8
9
11
12
13
14
16
17
18
19
21
22
23
24
26
27
28
29
31
32
33
34
36
37
38
39
41
42
43
44
46
47
48
49
51
$ awk 'NR%10>0 && NR%10<5 {print > "file1"} NR%10>5 {print > "file2"}' a
$ cat file1
1
2
3
4
11
12
13
14
21
22
23
24
31
32
33
34
41
42
43
44
51
$ cat file2
6
7
8
9
16
17
18
19
26
27
28
29
36
37
38
39
46
47
48
49
You can do this with head and tail (which are not part of bash itself):
head -n 20 <file> | tail -n 5
gives you lines 16 to 20.
This is, however, inefficient if you want to get multiple sections of your file, since it has to be parsed again and again. In this case I'd prefer some real scripting.
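For a fixed set of ranges, sed can also pull several sections in a single pass; for example, the first and third 4-line blocks of the sample above (lines 1-4 and 11-14, given the blank separator lines) could go to one file with something like:
sed -n -e '1,4p' -e '11,14p' file > file1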
Another approach is to treat blank-line-separated paragraphs as the records, and print odd-numbered and even-numbered records to different files:
awk -v RS= -v ORS='\n\n' '{
outfile = (NR % 2 == 1) ? "file1" : "file2"
print > outfile
}' file
Maybe something like this:
#!/bin/bash
EVEN="even.log"
ODD="odd.log"
line_count=0
block_count=0
while read line
do
# ignore blank lines
if [ ! -z "$line" ]; then
if [ $(( $block_count % 2 )) -eq 0 ]; then
# even
echo "$line" >> "$EVEN"
else
# odd
echo "$line" >> "$ODD"
fi
line_count=$[$line_count +1]
if [ "$line_count" -eq "4" ]; then
block_count=$[$block_count +1]
line_count=0
fi
fi
done < "$1"
The first argument is the source file: ./split.sh split_input
This script prints lines from file 1.txt with indexes 0, 1, 2, 3, 8, 9, 10, 11, 16, 17, 18, 19, ...
i=0
while read -r p; do
    if [ $((i % 8)) -lt 4 ]
    then
        echo "$p"
    fi
    i=$((i + 1))
done < 1.txt
This script prints lines with indexes 4, 5, 6, 7, 12, 13, 14, 15, ...
i=0
while read -r p; do
    if [ $((i % 8)) -gt 3 ]
    then
        echo "$p"
    fi
    i=$((i + 1))
done < 1.txt

Extracting lines 1, 11, 21, etc. from a text file

Is it possible to retrieve data from line numbers 1, 11, 21, 31 of a text file using Linux commands?
I need to do the same for lines 2, 12, 22, 32 and so on.
You can use awk for this:
awk '(NR % 10 == 1){ print }' your_input_file
For example:
$ seq 1 100|awk '(NR%10 == 2){print}'
2
12
22
32
42
52
62
72
82
92
As glenn jackman points out, you can parametrize the awk script to make it easier to use. And given that print is the default action, you can simply write:
$ seq 1 20|awk -v step=10 -v idx=3 'NR%step==idx'
3
13
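If GNU sed is available, its first~step address form does the same job (this is a GNU extension, so treat it as a sketch rather than a portable answer):
sed -n '1~10p' your_input_file    # lines 1, 11, 21, 31, ...
sed -n '2~10p' your_input_file    # lines 2, 12, 22, 32, ...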
