Attempt to print out the row containing the earliest/latest date - bash

I'm stuck on what to do. My script will output the date/time itself, but not the whole row containing that value.
And the input looks like the following:
UID PID PPID C STIME TTY TIME CMD
adz110 5344 5334 0 08:47 pts /2 00:00:00 bash
dmq292 6908 6854 0 Jun04 pts /1 00:00:00 bash
adz110 7227 7150 0 Jul11 pts /9 00:00:00 who
erg474 7466 7461 0 08:54 pts /10 00:00:00 ls
dmq292 7966 7960 0 Jun04 pts /13 00:00:00 assign1.sh if of
xle135 8983 8636 0 08:59 pts /15 00:00:00 ssh ctf.cs.utsarr.net
zeh458 9057 1980 0 08:59 pts /7 00:00:00 vim prog.c
rslavin 9150 9139 0 08:59 pts /16 00:00:00 ps -af
xle135 8636 8628 0 08:58 pts /15 00:00:00 bash
This seems to work for printing out "Earliest Start Time: Jun04" and "Latest Start Time: 08:59".
min = $5; next} {if ($5 > min) min = $5;}
{max = 0}{if ($5>max) max=$5}
I'm finishing up the entire script, which looks like this:
sort |
gawk '
$1 !~ /[a-z]{3}[0-9]{3}/ {
next
}
NR==0{min = $5; next} {if ($5 > min) min = $5;}
{max = 0}{if ($5>max) max=$5}
prev!=$1 && prev{
print "User: " prev ORS "\t " val
prev=val=""
}
{
prev=$1
val=(val?val ORS "\t ":"")substr($0, index($0,$8))
} ;
END {
if(prev){
print "User: " prev ORS "\t " val
}; { print "\nEarliest Start Time :\n" } { print min };
{ print "\nLatest Start Time :\n" } { print max };
}'
I thought I would want to change it to "min = $0" and "max = $0" to get the whole row containing the min and max values. But that prints out different rows instead.
The sample output should be:
User : adz110
bash
who
User : dmq292
bash
assign1.sh if of
User : erg474
ls
User : xle135
bash
ssh ctf.cs.utsarr.net
User : zeh458
vim prog.c

Earliest Start Time :
dmq292 6908 6854 0 Jun04 pts /1 00:00:00 bash

Latest Start Time :
xle135 8983 8636 0 08:59 pts /15 00:00:00 ssh ctf.cs.utsarr.net
For me, earliest Start time just comes out as "Jun04" and latest start time comes out as "08:59"

#!/bin/bash
file=$1
awk '
BEGIN {
    mon["Jan"]=1
    mon["Feb"]=2
    mon["Mar"]=3
    mon["Apr"]=4
    mon["May"]=5
    mon["Jun"]=6
    mon["Jul"]=7
    mon["Aug"]=8
    mon["Sep"]=9
    mon["Oct"]=10
    mon["Nov"]=11
    mon["Dec"]=12
}
# main logic here: skip the header and keep only user names such as adz110
NR != 1 && $1 ~ /^[a-z][a-z][a-z][0-9][0-9][0-9]$/ {
    # storing user info; the command starts at $9 because the sample
    # input splits the TTY into two fields ("pts /2")
    users[$1]=users[$1]"\n\t"$9
    for (i = 10 ; i <= NF ; ++i)
        users[$1]=users[$1]" "$i
    # min
    if (min == "" || compare(min, $5) == 1) {
        min=$5
        min_line=$0
    }
    # max
    if (max == "" || compare(max, $5) == -1) {
        max=$5
        max_line=$0
    }
}
# printing here
END {
    for (user in users) {
        print "User:", user, users[user]
    }
    print "Earliest Start Time :\n\t"min_line
    print "Latest Start Time :\n\t"max_line
}
function compare(arg1, arg2) {
    # returns
    # -1 if arg1 < arg2
    #  0 if arg1 == arg2
    #  1 if arg1 > arg2
    # substr() counts from 1: a time is HH:MM, a date is MonDD
    first_two_arg1=substr(arg1, 1, 2)
    first_two_arg2=substr(arg2, 1, 2)
    last_two_arg1=substr(arg1, 4)
    last_two_arg2=substr(arg2, 4)
    if (first_two_arg1 ~ /^[0-9]+$/) {
        # arg1 is a time
        if (first_two_arg2 ~ /^[0-9]+$/) {
            # arg2 is a time as well
            if (first_two_arg1 > first_two_arg2) {
                return 1
            } else if (first_two_arg1 == first_two_arg2) {
                if (last_two_arg1 > last_two_arg2)
                    return 1
                else if (last_two_arg1 == last_two_arg2)
                    return 0
                else
                    return -1
            } else {
                return -1
            }
        } else {
            # arg2 is a date, which means it is smaller
            return 1
        }
    } else {
        # arg1 is a date
        if (first_two_arg2 ~ /^[0-9]+$/) {
            # arg2 is a time, which means arg1 is smaller
            return -1
        } else {
            # both arg1 and arg2 are dates
            first_three_arg1=substr(arg1, 1, 3)
            first_three_arg2=substr(arg2, 1, 3)
            if (mon[first_three_arg1] > mon[first_three_arg2])
                return 1
            else if (mon[first_three_arg1] == mon[first_three_arg2]) {
                if (last_two_arg1 > last_two_arg2)
                    return 1
                else if (last_two_arg1 == last_two_arg2)
                    return 0
                else
                    return -1
            } else {
                return -1
            }
        }
    }
}' "${file}"
The main barrier in writing this script is a function to compare two different formats of time.
The above script should produce the desired result. I haven't taken care of sorting the users; if that's needed, just ask.
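An alternative to a pairwise compare function is to map every STIME value to a key that sorts correctly as a plain string. The following is a minimal sketch of that idea, not the method used in the answer above. The names earliest_latest and key are made up for illustration, and the sketch assumes that a MonDD value always refers to an earlier day than any HH:MM value and that all dates fall in the same year.

```shell
#!/bin/bash
# earliest_latest: hypothetical helper; reads ps-style rows (header first)
# and prints the rows with the earliest and latest STIME (field 5).
earliest_latest() {
    awk '
    BEGIN {
        n = split("Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec", name, " ")
        for (m = 1; m <= n; m++) mon[name[m]] = m
    }
    # key() returns a fixed-width string; dates get prefix 0 and times get
    # prefix 1, so every date sorts before every time (an assumption of
    # this sketch).
    function key(stime) {
        if (stime ~ /^[0-9][0-9]:[0-9][0-9]$/)
            return "1" substr(stime, 1, 2) substr(stime, 4, 2)
        return sprintf("0%02d%02d", mon[substr(stime, 1, 3)], substr(stime, 4) + 0)
    }
    NR > 1 {
        k = key($5)
        if (min_key == "" || k < min_key) { min_key = k; min_line = $0 }
        if (max_key == "" || k > max_key) { max_key = k; max_line = $0 }
    }
    END {
        print "Earliest Start Time:"; print min_line
        print "Latest Start Time:";   print max_line
    }' "$@"
}
```

It can be called as earliest_latest input.txt (a hypothetical file name) or fed from a pipe. Because the whole row is saved next to its key, the END block can print the complete line rather than only the STIME value.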

Related

Find nearest point from file1 in file2, shell script

I have 2 files:
file1
-3241.42 633.261 1210.53
-1110.89 735.349 836.635
(these are the points I am looking for, with coordinates x, y, z)
file2
2014124 -2277.576 742.75 962.5816 0 0
2036599 -3236.882 638.748 1207.804 0 0
2036600 -3242.417 635.2612 1212.527 0 0
2036601 -3248.006 631.6553 1217.297 0 0
2095885 -1141.905 737.7666 843.3465 0 0
2095886 -1111.889 738.3486 833.6354 0 0
2095887 -1172.227 737.4004 853.9965 0 0
2477149 -3060.679 488.6802 1367.816 0 0
2477150 -3068.369 489.6621 1365.769 0 0
and so on
(these are the points from my model, with ID, x, y, z, 0, 0)
I am looking for such a result: (find the point IDs with nearest coordinates)
Output
2036600 , xyz= -3242.42, 635.261, 1212.53, dist= 3.00
2095886 , xyz= -1111.89, 738.349, 833.635, dist= 4.36
My algorithm would look like this:
For each line in file1, catch x1,y1,z1
Search file2 for the nearest point, i.e. the one for which dist = sqrt((x1-x2)**2+(y1-y2)**2+(z1-z2)**2) is minimal
Display the result with pointID, xyz = x2, y2, z2, dist= dist
I tried to adapt a script found here, but it gives too many lines
#!/bin/bash
(($#!=2))&& { echo "Usage $0 1st_file 2nd_file"; exit 1; }
awk '
BEGIN {p=fx=0; fn=""; maxd=1.1e11;}
$0~"[^0-9. \t]" || NF!=4 && NF!=3 {next;} # skip no data lines
fn!=FILENAME {fx++; fn=FILENAME;} # fx: which file
fx==1 { if(NF!=3){printf("Change the series of two input files\n"); exit 1;}
x1[p]=$1; y1[p]=$2; z1[p]=$3;next;} # save the columns of first file
fx==2 { mv=maxd; mp=0; # search minimal distance
for(i=0; i<p; i++){
dx=x1[i]-$2; dy=y1[i]-$3; dz=z1[i]-$4; dist=sqrt(dx*dx+dy*dy+dz*dz);
if(dd<mv){mv=dd; mp=i;} # min value & min place
}
printf("%3d %6.2f %6.2f %3d\n", $1, x1[mp], y1[mp], z1[mp], dist);
}
' file1.dat file2.dat
Thank you very much!
$ cat tst.awk
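BEGIN { OFS=", " }
NR==FNR {
    # first file (file2): remember every model point in input order
    points[++numPoints] = $0
    next
}
{
    # second file (file1): find the model point closest to this one;
    # a counted loop is used because "for (i in points)" has no defined order
    for (i = 1; i <= numPoints; i++) {
        split(points[i],coords)
        dist = ($1 - coords[2])^2 + \
               ($2 - coords[3])^2 + \
               ($3 - coords[4])^2
        if ( (i == 1) || (dist <= min) ) {
            min = dist
            point = points[i]
        }
    }
    split(point,p)
    print p[1] " ", "xyz= " p[2], p[3], p[4], "dist= " sqrt(min)
}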
$ awk -f tst.awk file2 file1
2036600 , xyz= -3242.417, 635.2612, 1212.527, dist= 2.99713
2095886 , xyz= -1111.889, 738.3486, 833.6354, dist= 4.35812
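The question asked for the distance with two decimals (dist= 3.00), whereas print uses awk's default number format. The following is a minimal sketch of the same nearest-point search that formats the distance with printf. The wrapper name nearest_points and its argument order are assumptions made for illustration.

```shell
#!/bin/bash
# nearest_points MODEL_FILE QUERY_FILE: hypothetical wrapper.
# MODEL_FILE rows: ID x y z ...   QUERY_FILE rows: x y z
nearest_points() {
    awk '
    # first file: store the ID and coordinates of every model point
    NR == FNR { id[++n] = $1; x[n] = $2; y[n] = $3; z[n] = $4; next }
    # second file: scan all model points and keep the closest one
    {
        best = 0
        for (i = 1; i <= n; i++) {
            d = ($1 - x[i])^2 + ($2 - y[i])^2 + ($3 - z[i])^2
            if (best == 0 || d < min) { min = d; best = i }
        }
        if (best)
            printf "%s , xyz= %s, %s, %s, dist= %.2f\n",
                   id[best], x[best], y[best], z[best], sqrt(min)
    }' "$1" "$2"
}
```

Called as nearest_points file2 file1, it prints the coordinates exactly as they appear in file2 and rounds only the distance. The squared distance is compared inside the loop, so sqrt is evaluated once per query point.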

Displaying formatted records using AWK

I have been given an input file that looks like this:
ID PID PPID C STIME TTY TIME CMD
adz110 5344 5334 0 08:47 pts/2 00:00:00 bash
dmq292 6908 6854 0 08:53 pts/1 00:00:00 bash
adz110 7227 7150 0 08:54 pts/9 00:00:00 who
erg474 7466 7461 0 08:54 pts/10 00:00:00 ls
dmq292 7966 7960 0 08:55 pts/13 00:00:00 assign1.sh if of
xle135 8636 8628 0 08:58 pts/15 00:00:00 bash
xle135 8983 8636 0 08:59 pts/15 00:00:00 ssh ctf.cs.utsarr.net
zeh458 9057 1980 0 08:59 pts/7 00:00:00 vim prog.c
HanSolo 9150 9139 0 08:59 pts/16 00:00:00 ps -af
and it needs to end like this:
User: adz110
bash
who
User: dmq292
bash
assign1.sh if of
User: erg474
ls
User: xle135
bash
ssh ctf.cs.utsarr.net
User: zeh458
vim prog.c
Earliest Start Time:
adz110 5344 5334 0 08:47 pts/2 00:00:00 bash
Latest Start Time
xle135 8983 8636 0 08:59 pts/15 00:00:00 ssh ctf.cs.utsarr.net
What I have come up with so far is in a file.awk that I have written:
/[a-z]{3}[0-9]{3}/
{
print $1
if( match($1 , arg1) )
{
for(i=8; i <= NF ; i ++)
printf("%s", $i);
printf("\n");
}
}
END {
print " Earliest Start Time:"
print "Latest Start Time: "
}
So instead of printing the commands for each user whose username matches [a-z]{3}[0-9]{3}, it prints the username followed by the command without any spaces. I am having a lot of trouble understanding associative arrays and the use of special variables like NR, RS, etc.
Try this, file.awk:
$1 !~ /[a-z]{3}[0-9]{3}/ {next;}
!fstTime {fstTime=$5; lstTime=$5; first=$0; last = $0;}
!($1 in a) {a[$1];users[++ind]=$1;}
{ cmd=$8; for(i=9;i<=NF;i++) cmd=cmd OFS $i;
cmds[$1] = cmds[$1] ? cmds[$1] "\n " cmd : " " cmd;
if ($5 < fstTime) { fstTime=$5; first=$0; }
if ($5 > lstTime) { lstTime=$5; last = $0; }
}
END {
for(i=1;i<=length(a);i++) {
print "User: " users[i];
print cmds[users[i]];
}
print "Earliest Start Time:\n" first "\n\nLatest Start Time:\n" last;
}
awk -f file.awk fileToTest.txt yields:
User: adz110
bash
who
User: dmq292
bash
assign1.sh if of
User: erg474
ls
User: xle135
bash
ssh ctf.cs.utsarr.net
User: zeh458
vim prog.c
Earliest Start Time:
adz110 5344 5334 0 08:47 pts/2 00:00:00 bash
Latest Start Time:
xle135 8983 8636 0 08:59 pts/15 00:00:00 ssh ctf.cs.utsarr.net
If you want the zeh458 9057 1980 0 08:59 pts/7 00:00:00 vim prog.c line to be the Latest Start Time: line, just change ($5 > lstTime) to ($5 >= lstTime).
You want to apply the regex only against the first field, and collect the values for each user in memory in a format suitable for printing out.
Observe that the listing is fixed-width; so the program name and arguments are in a field which starts in column 55. Similarly, the time stamp is in columns 28-32.
awk 'NR > 1 && $1 ~ /^[a-z]{3}[0-9]{3}$/ {
    when = substr($0, 28, 5)
    command = substr($0, 55)
    if ($1 in user) {
        # Append this command to previous value
        user[$1] = user[$1] ORS "    " command
    } else {
        # Create a new item in this associative array
        user[$1] = "    " command
    }
    if (NR==2 || when > max) { max=when; maxcmd = $0 }
    if (NR==2 || when < min) { min=when; mincmd = $0 }
}
END {
    # XXX TODO: sort array?
    for (u in user) printf "User %s:\n%s\n", u, user[u]
    print "Earliest start time"; print mincmd
    print "Latest start time"; print maxcmd
}' filename
So the first time we see a particular user, they will not be in the user array, and so we just put their first command, indented by four spaces, as the value for that key in the associative array. If we see them again, we append another line to the value, with a newline (ORS) and four spaces in front.
NR is the current line number -- we skip NR==1 to avoid capturing the header line, and with NR==2 we know that this is the first line we are processing, so we set max and min to their baseline values.
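To make NR and associative arrays more concrete, here is a minimal sketch that only counts how many commands each user is running. The function name count_per_user is made up for illustration, and the pipe through sort is there only because the order of a for (u in count) loop is not defined.

```shell
#!/bin/bash
# count_per_user: hypothetical helper; prints "user count" for ps-style input
count_per_user() {
    awk '
    NR == 1 { next }            # NR is 1 on the header line, so skip it
    { count[$1]++ }             # the array is keyed by the user name in field 1
    END { for (u in count) print u, count[u] }
    ' "$@" | LC_ALL=C sort
}
```

The array element count[$1] springs into existence the first time a user name is used as a key, which is why no initialisation is needed before the increment.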
Ideally, you should also normalize the timestamps into a canonical form so that you can sort "yesterday 21:24" before "21:23", but I'm not going there with this simple answer. You probably want to add embellishments in various other places, too. (Or maybe you could rely on the fact that the input seems to be sorted by process start time?)
If you want the user names to be sorted, GNU Awk has array sorting built-in; for other Awks, you will need to write your own simple sort function, or use an external pipeline.
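As a rough sketch of the external-pipeline option, the user name and command can be emitted as tab-separated pairs, sorted by user, and then grouped by a second awk. The function name group_by_user is hypothetical. The sketch assumes the command starts at field 8 (TTY written as pts/2) and that sort supports -s for a stable sort, which GNU and BSD sort do but POSIX does not require.

```shell
#!/bin/bash
# group_by_user: hypothetical helper; prints each user once, in sorted order,
# followed by that user's commands in their original input order.
group_by_user() {
    awk 'NR > 1 {
             cmd = $8
             for (i = 9; i <= NF; i++) cmd = cmd " " $i
             print $1 "\t" cmd          # one "user<TAB>command" pair per row
         }' "$@" |
    LC_ALL=C sort -s -t "$(printf '\t')" -k1,1 |
    awk -F '\t' '$1 != prev { print "User: " $1; prev = $1 }
                 { print "    " $2 }'
}
```

The stable sort keeps a user's commands in the order they appeared, so only the grouping changes. The name filter and the earliest and latest rows are left out to keep the example short.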
One ambiguity in your input file is that several processes share the same start time (e.g. xle135 and zeh458). The solution below therefore sorts the input file (you could substitute your command's output for it) and then works out the minimum and maximum start time over all entries. If several entries have the same time stamp, it does not concatenate them (though it could); it keeps only the last one in sorted order.
awk '
/^ID/{
next
}
{
split($5,array,":")
seconds=array[1]*60+array[2]
}
FNR==NR{
a[$1]++
tim[seconds]=$0
next
}
a[$1]==2{
print "User: " $1 ORS " " $NF
getline
sub(/.*:00/,"")
print" " $0
}
a[$1]==1{
val=$0
sub(/.*:00/,"",val)
print "User: " $1 ORS " " val
}
{
min=min<seconds?(min?min:seconds):seconds
max=max>seconds?max:seconds
}
END{
print "Earliest Start Time:" ORS tim[min] ORS "Latest Start Time" ORS tim[max]
}
' <(sort -k1,5 Input_file) <(sort -k1,5 Input_file)
Output will be as follows.
User: adz110
bash
who
User: dmq292
bash
assign1.sh if of
User: erg474
ls
User: HanSolo
ps -af
User: xle135
bash
ssh ctf.cs.utsarr.net
User: zeh458
vim prog.c
Earliest Start Time:
adz110 5344 5334 0 08:47 pts/2 00:00:00 bash
Latest Start Time
zeh458 9057 1980 0 08:59 pts/7 00:00:00 vim prog.c
Assuming that if a given user is running the same command multiple times you want that command to appear multiple times in the output for that user:
$ cat tst.awk
NR == 1 { next }
/^[a-z]{3}[0-9]{3}/ {
user = $1
users[user]
cmd = $0
sub(/([^[:space:]]+[[:space:]]+){7}/,"",cmd)
cmds[user,++numCmds[user]] = cmd
stime = $5
if ( (earliestStime == "") || (stime < earliestStime) ) {
earliestStime = stime
earliestData = $0
}
if ( (latestStime == "") || (stime > latestStime) ) {
latestStime = stime
latestData = $0
}
}
END {
for (user in users) {
printf "User: %s\n", user
for (cmdNr=1; cmdNr<=numCmds[user]; cmdNr++) {
printf " %s\n", cmds[user,cmdNr]
}
}
print "\nEarliest Start Time:"
print earliestData
print "\nLatest Start Time:"
print latestData
}
.
$ awk -f tst.awk file
User: xle135
bash
ssh ctf.cs.utsarr.net
User: zeh458
vim prog.c
User: dmq292
bash
assign1.sh if of
User: erg474
ls
User: adz110
bash
who
Earliest Start Time:
adz110 5344 5334 0 08:47 pts/2 00:00:00 bash
Latest Start Time:
xle135 8983 8636 0 08:59 pts/15 00:00:00 ssh ctf.cs.utsarr.net
If the order of users or commands in the output matters then tell us what the order should be and it'll be easy enough to adapt, e.g. by retaining the input order or setting PROCINFO["sorted_in"] with GNU awk before each loop in the END section.

How to write a script that searches for numeric pattern in huge file?

I have 200000 integers written in a file like this
0
1
2
3
.
98
99
.
.
100
101
102
.
I want to write a script with awk or join that tells me how many times this pattern (from 0 to 99) repeats itself.
Not battle tested:
awk 'i++!=$0{i=$0==0?1:0}i==100{c++;i=0}END{print c}' p.txt
Breakdown:
i++ != $0 {          # Use a cursor (i) which is compared to the input line
    i=$0==0?1:0;     # If not matched, reset the cursor: if the current line is zero, set it to 1
                     # because this means we already matched our first line; if not, set it to 0
}
i == 100 {           # If the full pattern was found:
    c++;             # add to count
    i=0;             # reset cursor
}
END {print c}        # Print matched count
You can do this using a state variable which is reset anytime the pattern is incomplete. For example:
#!/usr/bin/awk -f
BEGIN {
state = -1;
count = 0;
}
/^[0-9]+$/ {
if ( $0 == ( state + 1 ) || $0 == 0 ) {
state = $0;
if ( state == 99 ) {
count++;
}
} else {
state = -1;
}
next;
}
{ state = -1; next; }
END {
print count;
}
This script assumes awk is in /usr/bin (the usual case). You would put the script in a file, e.g., "patterns", and run it like
./patterns < p.txt
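Either script can be checked against generated input before running it on the real file. The following is a small sketch of such a check built around the one-liner. The wrapper name count_runs is made up, and c + 0 is used so that 0 is printed instead of an empty line when nothing matches. It assumes the seq utility is available.

```shell
#!/bin/bash
# count_runs: hypothetical wrapper around the cursor-based one-liner
count_runs() {
    awk 'i++ != $0 { i = ($0 == 0) ? 1 : 0 }   # mismatch: restart the cursor
         i == 100  { c++; i = 0 }              # 0..99 seen in a row
         END       { print c + 0 }' "$@"
}

# Two complete runs of 0..99, separated by a dot line and a broken run 0..50
{ seq 0 99; echo .; seq 0 50; seq 0 99; seq 100 102; } | count_runs   # prints 2
```

The broken run in the middle is the interesting case: the 0 that starts the next run is itself the mismatching line, so the cursor has to restart at 1 rather than 0.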

{awk} How to read a line and compare a $ with its next/previous line?

The command below is used to read an input file containing 7682 lines:
I use --field-separator, convert some fields into what I need, and use grep to get rid of the first 2 lines, which I do not need.
awk --field-separator=";" '($1<15) {print int(a=(($1-1)/480)+1) " " ($1-((int(a)-1)*480)) " " (20*log($6)/log(10))}' 218_DW.txt | grep -v "0 480 -inf"
I used ($1<15) so that I only print 14 lines, better for testing. The output I get is exactly what I want, but, there is more I need to do on that:
1 1 48.2872
1 2 48.3021
1 3 48.1691
1 4 48.1502
1 5 48.1564
1 6 48.1237
1 7 48.1048
1 8 48.015
1 9 48.0646
1 10 47.9472
1 11 47.8469
1 12 47.8212
1 13 47.8616
1 14 47.8047
From above, $1 will increment from 1-16, $2 from 1-480, it's always continuous,
so when it gets to 16 480 47.8616 it restarts from 2 1 47.8616 until last line is 16 480 10.2156
So I get 16*480=7680 lines
What I want to do is simple, but, I don't get it :)
I want to compare the current line with the next one. But not all fields, only $3, it's a value in dB that decreases when $2 increases.
In example:
The current line is 1 1 48.2872=a
Next line is 1 2 48.3021=b
If [ (a - b) > 6 ] then print $1 $2 $3
Of course (a - b) has got to be an absolute value, always > 0.
The best would be to be able to compare the current line (the $3 only) with its next and previous lines ($3).
Something like this:
1 3 48.1691=a
1 4 48.1502=b
1 5 48.1564=c
If [ ABS(b - a) > 6 ] OR If [ ABS(b - c) > 6 ] then print $1 $2 $3
But of course first line can only be compared with its next one and the last one with its previous one. Is it possible?
Try this:
#!/usr/bin/awk -f
function abs(x) {
if (x >= 0)
return x;
else
return -1 * x;
}
function compare(a,b) {
return abs(a - b) > 6;
}
function update() {
before_value = current_value;
current_line = $0;
current_value = $3;
}
BEGIN {
line_n = 1;
}
#Edit: added to skip blank lines and differently formatted lines in
# general. You could add some error message and/or exit function
# here to detect badly formatted data.
NF != 3 {
next;
}
line_n == 1 {
update();
line_n += 1;
next;
}
line_n == 2 {
if (compare(current_value, $3))
print current_line;
update();
line_n += 1;
next;
}
{
if (compare(current_value, before_value) || compare(current_value, $3))
print current_line;
update();
}
END {
if (compare(current_value, before_value)) {
print current_line;
}
}
The funny thing is that I had this code lying around from an old project where I had to do basically the same thing. I adapted it a little for you. I think it solves your problem (as I understood it, at least). If it doesn't, it should point you in the right direction.
Instructions to run the awk script:
Supposing you saved the code with the name "awkscript", the data file is named "datafile" and they are both in the current folder, you should first mark the script as executable with chmod +x awkscript and then execute it passing the data file as parameter with ./awkscript datafile or use it as part of a sequence of pipes as in cat datafile | ./awkscript.
Comparing the current line to the previous one is trivial, so I think the problem you're having is that you can't figure out how to compare the current line to the next one. Just keep 2 previous lines instead of 1 and always operate on the line before the one that's actually being read as $0, i.e. the line stored in the array p1 in this example (p2 is the line before it and $0 is the line after it):
function abs(val) { return (val > 0 ? val : -val) }
NR==2 {
if ( abs(p1[3] - $3) > 6 ) {
print p1[1], p1[2], p1[3]
}
}
NR>2 {
if ( ( abs(p1[3] - p2[3]) > 6 ) || ( abs(p1[3] - $3) > 6 ) ) {
print p1[1], p1[2], p1[3]
}
}
{ prev2=prev1; prev1=$0; split(prev2,p2); split(prev1,p1) }
END {
if ( ( abs(p1[3] - p2[3]) > 6 ) ) {
print p1[1], p1[2], p1[3]
}
}
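The same neighbour comparison can also be written with a single remembered line and a flag, which avoids special cases for the first and last lines. This is only a sketch: the function name flag_jumps and the threshold argument are made up for illustration, and the input is assumed to have the value in field 3, as in the question.

```shell
#!/bin/bash
# flag_jumps THRESHOLD [FILE...]: hypothetical helper; prints every line whose
# third field differs from the previous or the next line by more than THRESHOLD.
flag_jumps() {
    thr=$1; shift
    awk -v thr="$thr" '
    function abs(v) { return v < 0 ? -v : v }
    NR > 1 {
        d = abs(prev3 - $3)
        # the previous line qualifies if it jumped from its own predecessor
        # (prev_flag) or if the current line jumps away from it
        if (d > thr || prev_flag) print prev_line
        prev_flag = (d > thr)
    }
    { prev_line = $0; prev3 = $3 }
    END { if (NR > 0 && prev_flag) print prev_line }   # last line: only a predecessor
    ' "$@"
}
```

For the data in the question it would be run as flag_jumps 6 on the output of the first awk command. Each line is printed at most once, one step after it was read.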

Adding a loop in awk

I had a problem that was resolved in a previous post:
But because I had too many files it was not practical to do an awk on every file and then use a second script to get the output I wanted.
Here are some examples of my files:
3
10
23
.
.
.
720
810
980
And the script was used to see where the numbers from the first file fell in this other file:
2 0.004
4 0.003
6 0.034
.
.
.
996 0.01
998 0.02
1000 0.23
After that range was located, the mean of the values in the second column of the second file was estimated.
Here are the scripts:
awk -v start=$(head -n 1 file1) -v end=$(tail -n 1 file1) -f script file2
and
BEGIN {
sum = 0;
count = 0;
range_start = -1;
range_end = -1;
}
{
irow = int($1)
ival = $2 + 0.0
if (irow >= start && end >= irow) {
if (range_start == -1) {
range_start = NR;
}
sum = sum + ival;
count++;
}
else if (irow > end) {
if (range_end == -1) {
range_end = NR - 1;
}
}
}
END {
print "start =", range_start, "end =", range_end, "mean =", sum / count
}
How could I make a loop so that the mean is estimated for every file? My desired output would be something like this:
Name_of_file
start = number , end = number , mean = number
Thanks in advance.
.. wrap it in a loop?
for f in <files>; do
    echo "$f"
    awk -v start="$(head -n 1 "$f")" -v end="$(tail -n 1 "$f")" -f script file2
done
Personally, I would suggest putting the name and the result on one line, so that each file's result is a single record rather than a file name on one line and its numbers on the next. To do that, replace echo "$f" with echo -n "$f " (which does not add the newline).
EDIT: Since I suppose you're new to the syntax: <files> can be a list of files (file1 file2 file3), a list of files generated by a glob (file*, files/data_*.txt, whatever), or a list of files generated by a command ($(find files/ -name 'data' -type f), etc.).
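Put together, the loop might look like the following sketch. The function name report_means and its argument order (data file first, then the index files) are assumptions for illustration. The awk body is a condensed variant of the script from the question that reports the first and last matching row and guards against an empty range.

```shell
#!/bin/bash
# report_means DATA_FILE INDEX_FILE...: hypothetical wrapper.
# For each index file, use its first and last line as the range and print
# the mean of column 2 of DATA_FILE over the rows whose column 1 is in range.
report_means() {
    data=$1; shift
    for f in "$@"; do
        printf '%s\n' "$f"
        awk -v start="$(head -n 1 "$f")" -v end="$(tail -n 1 "$f")" '
        $1 + 0 >= start + 0 && $1 + 0 <= end + 0 {
            if (!first) first = NR      # first row inside the range
            last = NR                   # last row inside the range so far
            sum += $2
            count++
        }
        END {
            if (count)
                printf "start = %d , end = %d , mean = %g\n", first, last, sum / count
            else
                print "no rows in range"
        }' "$data"
    done
}
```

A call such as report_means file2 file1_* prints each index file's name followed by its result line, which matches the requested layout.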

Resources