Suppose I have data like this (the file is a text file).
Col1 Col2 Col3
user1 21:01:15 user1@gmail.com
user2 22:01:15 user2@gmail.com
user3 19:01:15 user3@gmail.com
user4 16:01:15 user4@gmail.com
What I want is to sort the file and print only the lines whose time is between 19:01:15 and 22:01:15 on the screen. Please help.
You can use the following awk:
$ awk '$2>"19:01:15" && $2<"22:01:15"' file
user1 21:01:15 user1@gmail.com
Note that this lets you specify the exact time range you prefer.
In case you want the range to be inclusive ("less than or equal to" and "greater than or equal to"), do:
$ awk '$2>="19:01:15" && $2<="22:01:15"' file
user1 21:01:15 user1@gmail.com
user2 22:01:15 user2@gmail.com
user3 19:01:15 user3@gmail.com
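If you also want the output sorted by time, you can pipe the inclusive version through sort on the second column; a minimal sketch:
$ awk '$2>="19:01:15" && $2<="22:01:15"' file | sort -k2,2
user3 19:01:15 user3@gmail.com
user1 21:01:15 user1@gmail.com
user2 22:01:15 user2@gmail.com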
Hello, I have a file called users. In that file I have a list of users, for example
user1 user2 user3
Now I have another file called searches where there is a specific string of the form owner = user, for example
owner = user1
random text
random text
owner = user15
random text
random text
owner = user2
So is it possible to find all the users based on the users file and rename those users to user@domain.com? For example
owner = user1@domain.com
random text
random text
owner = user15
random text
random text
owner = user2@domain.com
Currently what I am doing is a long manual process like the one below
awk '/owner = user1|owner = user2|owner = user3/{print $0 "@domain.com"; next}1' file
and it actually does work, but for large files I have to spend a long time creating this command.
Given:
$ head users owners
==> users <==
user1 user2 user3
==> owners <==
owner = user1
random text
random text
owner = user15
random text
random text
owner = user2
You can use this awk:
awk 'BEGIN{FS="[ \t]*=[ \t]*|[ \t]+"}                          # split fields on "=" (with optional spaces) or on whitespace
     FNR==NR{for (i=1;i<=NF;i++) seen[$i]; next}               # first file (users): remember every user name
     /^owner[ \t]*=/ && $2 in seen{sub($2, $2 "@domain.com")}  # owners: append the domain to known users
     1' users owners
Prints:
owner = user1@domain.com
random text
random text
owner = user15
random text
random text
owner = user2@domain.com
Basically, I have two columns. The first one holds the users and the second one the time they've spent on the server. For each user, I'd like to sum up how much time they spent on the server.
user1 21:03
user2 19:55
user3 20:09
user1 18:57
user1 19:09
user3 21:05
user4 19:57
Let's say that I have this. I know how to split, but there's one problem. Whenever I do awk -F: '{print $1}' it prints the users and the first part of the time (the number before the :), and when I do awk -F: '{print $2}' it prints only the numbers after the :. After summing everything, I'd like to get something like
user1 59:09
user2 19:55
user3 41:14
user4 19:57
Here's a possible solution:
perl -ne '/^(\S+) (\d\d):(\d\d)$/ or next; $t{$1} += $2 * 60 + $3; END { printf "%s %02d:%02d\n", $_, $t{$_} / 60, $t{$_} % 60 for sort keys %t }'
Or with better formatting:
perl -ne '
    /^(\S+) (\d\d):(\d\d)$/ or next;
    $t{$1} += $2 * 60 + $3;
    END {
        printf "%s %02d:%02d\n", $_, $t{$_} / 60, $t{$_} % 60
            for sort keys %t;
    }
'
We loop over all input lines (-n). We make sure every line matches the pattern \S+ \d\d:\d\d (i.e. a sequence of 1 or more non-space characters, a space, two digits, a colon, two digits) or else we skip it.
We accumulate the number of seconds per user in the hash %t. The keys are the user names, the values are the numbers.
At the end we print the contents of %t in a nicely formatted way.
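For example, assuming the sample data above is saved in a file called times.txt (a name chosen just for illustration), running the one-liner on it gives the totals the question asks for:
$ perl -ne '/^(\S+) (\d\d):(\d\d)$/ or next; $t{$1} += $2 * 60 + $3; END { printf "%s %02d:%02d\n", $_, $t{$_} / 60, $t{$_} % 60 for sort keys %t }' times.txt
user1 59:09
user2 19:55
user3 41:14
user4 19:57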
This is an awk solution:
awk '{a[$1]+=substr($2,1,2)*60+substr($2,4)} END {for(i in a) printf("%s %02d:%02d\n", i,a[i]/60,a[i]%60)}' 1.txt
user1 59:09
user2 19:55
user3 41:14
user4 19:57
First construct an array indexed by $1, whose value is the time converted to an integer (minutes * 60 + seconds):
{a[$1]+=substr($2,1,2)*60+substr($2,4)}
Then print the array in the desired format, converting the integer back to mm:ss:
printf("%s %02d:%02d\n", i,a[i]/60,a[i]%60)
If you want to use awk (and assuming the duration is always hh:mm, though their sizes can be arbitrary), the following will do the trick:
{
    split($2, flds, ":")                 # Get hours and minutes.
    mins[$1] += flds[1] * 60 + flds[2]   # Add to initially zero array item.
}
END {
    for (key in mins) {                  # For each key in array.
        printf "%s %d:%02d\n",           # Output specific format.
            key,                         # Key, hours, and minutes.
            mins[key] / 60,
            mins[key] % 60
    }
}
That's the expanded, readable variant; the compressed one is shown in the following transcript, along with the expected output:
pax> awk '{split($2,flds,":");mins[$1] += flds[1] * 60 + flds[2]}END{for(key in mins){printf "%s %d:%02d\n",key,mins[key]/60,mins[key]%60}}' testprog.in
user1 59:09
user2 19:55
user3 41:14
user4 19:57
Just keep in mind you haven't specified the input format for cases where a user's total exceeds 24 hours. If it simply carries on like 25:42, the script will work as is.
If instead it breaks out days (into something like 1:01:42 rather than 25:42), you'll need to adjust how the minutes are calculated. This can be done relatively easily (including the possibility of minutes-only entries) by checking the size of the flds array with the following (in the main body of the script, the non-END bit):
num = split($2, flds, ":")
if (num == 1) { add = flds[1] }
else if (num == 2) { add = flds[1] * 60 + flds[2] }
else { add = flds[1] * 1440 + flds[2] * 60 + flds[3] }
mins[$1] += add
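Put together with the END block above, a complete sketch (assuming a three-field value means days:hours:minutes, as in 1:01:42) would look like this:
awk '
{
    num = split($2, flds, ":")
    if (num == 1)      { add = flds[1] }                                  # minutes only
    else if (num == 2) { add = flds[1] * 60 + flds[2] }                   # hh:mm
    else               { add = flds[1] * 1440 + flds[2] * 60 + flds[3] }  # d:hh:mm
    mins[$1] += add
}
END {
    for (key in mins)
        printf "%s %d:%02d\n", key, mins[key] / 60, mins[key] % 60
}' testprog.in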
I need some help getting a script up and running. Basically I have some data that comes from a command's output, and I want to select some of it and evaluate it.
Example data is
JSnow <jsnow@email.com> John Snow spotted 30/1/2015
BBaggins <bbaggins@email.com> Bilbo Baggins spotted 20/03/2015
Batman <batman@email.com> Batman spotted 09/09/2015
So far I have something along the lines of
# Define date to check
check=$(date -d "-90 days" "+%Y/%m/%d")
# Return user name
for user in $(command | awk '{print $1}')
do
# Return last logon date
$lastdate=(command | awk '{for(i=1;i<=NF;i++) if ($i==spotted) $(i+1)}')
# Evaluation date again current -90days
if $lastdate < $check; then
printf "$user not logged on for ages"
fi
done
I have a couple of problems, not least the fact that whilst I can get information from various places, I don't know how to go about putting it all together. I'm also guessing my date evaluation will be more complicated, but at this point that's another problem and it's just there to give a better idea of my intentions. If anyone can explain the logical steps needed to achieve my goal, as well as propose a solution, that would be great. Thanks.
Every time you write a loop in shell just to manipulate text you have the wrong approach (see, for example, https://unix.stackexchange.com/questions/169716/why-is-using-a-shell-loop-to-process-text-considered-bad-practice). The general purpose text manipulation tool that comes on every UNIX installation is awk. This uses GNU awk for time functions:
$ cat tst.awk
BEGIN { check = systime() - (90 * 24 * 60 * 60) }   # 90 days ago, in seconds since the epoch
{
    user = $1
    date = gensub(/([0-9]+)\/([0-9]+)\/([0-9]+)/,"\\3 \\2 \\1 0 0 0",1,$NF)   # rearrange d/m/Y into "Y M D H M S" for mktime()
    secs = mktime(date)
    if (secs < check) {
        printf "%s not logged in for ages\n", user
    }
}
$ cat file
JSnow <jsnow@email.com> John Snow spotted 30/1/2015
BBaggins <bbaggins@email.com> Bilbo Baggins spotted 20/03/2015
Batman <batman@email.com> Batman spotted 09/09/2015
$ cat file | awk -f tst.awk
JSnow not logged in for ages
BBaggins not logged in for ages
Batman not logged in for ages
Replace cat file with command.
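If you want to see what the gensub() rearrangement produces before it is handed to mktime(), you can test it on a single date (a quick sketch):
$ echo "30/1/2015" | gawk '{print gensub(/([0-9]+)\/([0-9]+)\/([0-9]+)/,"\\3 \\2 \\1 0 0 0",1,$NF)}'
2015 1 30 0 0 0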
I've a log file that contains some lines I need to grab:
Jul 2 06:42:00 myhostname error proc[12345]: 01310001:3: event code xxxx Slow transactions attack detected - account id: (20), number of dropped slow transactions: (3)
Jul 2 06:51:00 myhostname error proc[12345]: 01310001:3: event code xxxx Slow transactions attack detected - account id: (20), number of dropped slow transactions: (2)
Account id (xx) maps to the name of an object, which I am able to gather through a mysql query.
The following command (for sure not optimized at all, but working) gives me the number of matching lines per account id:
grep "Slow transactions" logfile| awk '{print $18}' | awk -F '[^0-9]+' '{OFS=" ";for(i=1; i<=NF; i++) if ($i != "") print($i)}' | sort | uniq -c
14 20
The output (14 20) means the account id 20 was observed 14 times (14 lines in the logfile).
Then I also have the number of dropped slow transactions: (2) part.
This gives the real number of dropped transactions that were logged. In other words, a log entry could mean 1 or more dropped transactions.
I do have a small command to count the number of dropped transactions:
grep "Slow transactions" logfile | awk '{print $24}' | sed 's/(//g' | sed 's/)//g' | awk '{s+=$1} END {print s}'
73
That means 73 transactions were dropped.
These two work, but I am stuck when it comes to merging them. I really don't see how to combine them; I am pretty sure awk can do it (and probably in a better way than I did), but I would appreciate it if any expert from the community could give me some guidance.
Update
Since the above was too easy for some of our awk experts on SO, I'm introducing an optional feature :)
As previously mentioned, I can convert an account ID into a name by issuing a mysql query. So the idea is now to include the ID => name conversion in the awk command.
The MySQL query looks like this (XX being the account ID):
mysql -Bs -u root -p$(perl -MF5::GenUtils -e "print get_mysql_password.qq{\n}") -e "SELECT name FROM myTABLE where account_id= 'XX'"
I found the post below, which deals with passing command output into awk, but I'm facing syntax errors...
How can I pass variables from awk to a shell command?
This uses parentheses as your field separator, so it's easier to grab the account number and the number of dropped slow transactions.
awk -F '[()]' '
    /Slow transactions/ {
        acct[$2]++
        dropped[$2] += $4
    }
    END {
        PROCINFO["sorted_in"] = "@ind_num_asc"   # https://www.gnu.org/software/gawk/manual/html_node/Controlling-Scanning.html
        for (acctnum in acct)
            print acctnum, acct[acctnum], dropped[acctnum]
    }
' logfile
Given your sample input, this outputs
20 2 5
Requires GNU awk for the "sorted_in" method of controlling array traversal order.
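To fold in the ID => name lookup from the update, one rough sketch (untested; it assumes the table and column names from your question and reuses your password-retrieval command) is to post-process the awk output with a small shell loop:
pw=$(perl -MF5::GenUtils -e "print get_mysql_password.qq{\n}")
awk -F '[()]' '/Slow transactions/ {acct[$2]++; dropped[$2] += $4}
    END {for (a in acct) print a, acct[a], dropped[a]}' logfile |
while read -r id count dropped; do
    name=$(mysql -Bs -u root -p"$pw" -e "SELECT name FROM myTABLE where account_id = '$id'")
    printf "%s %s %s\n" "${name:-$id}" "$count" "$dropped"
done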
I have a list with the following content:
VIP NAME DATE ARRIVE_TIME FLIGHT_TIME
1 USER1 11-02 20.00 21.00
3 USER2 11-02 20.45 21.45
4 USER2 11-03 20.00 21.30
2 USER1 11-04 17.20 19.10
I want to sort this and similar lists with a shell script. The result should be a new list containing only the lines that do not collide. VIP 1 is the most important: if any VIP with a higher number has an ARRIVE_TIME before VIP 1's FLIGHT_TIME on the same date, that line should be removed. In other words, the VIP number decides which lines to keep when the ARRIVE_TIME, FLIGHT_TIME and DATE collide. Similarly, VIP 2 is more important than VIP 3, and so on.
This is pretty advanced for me, and I am completely out of ideas on how to solve it.
You can use the Unix sort command to do this, setting primary and secondary keys with its -k option. The uniq command is what you need to remove dupes.
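For instance, a minimal sketch of sorting the list by date, then arrival time, then VIP number, with the header line skipped (list.txt is just a placeholder name for your file):
$ tail -n +2 list.txt | sort -k3,3 -k4,4n -k1,1n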
This might get you started:
I'm ignoring the header line. You can strip it beforehand (e.g. with tail -n +2) or skip it in the for loop.
Sort the flights by date, arrival, departure and vip number - having the vip number as a sort key simplifies the logic later.
I'm saving the result in an array, but you could redirect it to a temporary file and read it in a line at a time with a while read line; do ...; done <tempfile loop.
I'm using indirection to make things more readable (naming the fields instead of using array indices directly - the exclamation point means indirection here instead of "not")
For each line in the result that occurs on the same date as the most recently printed line, compare its arrival time to the previous flight's departure time.
Echo the lines that are appropriate.
Save the date and departure time for later comparison.
You should adjust the < comparison to be <= if that works better for your data.
Here is the script:
#!/bin/bash
saveIFS="$IFS"
IFS=$'\n'
flights=($(sort -k3,3 -k4,4n -k5,5n -k1,1n flights))
IFS="$saveIFS"
date=fields[2]
arrive=fields[3]
depart=fields[4]
for line in "${flights[@]}"
do
    fields=($line)
    if [[ ${!date} == $prevdate && ${!arrive} < $prevdep ]]
    then
        echo "deleted: $line"    # or you could do something else here
    else
        echo "$line"
        prevdep=${!depart}
        prevdate=${!date}
    fi
done
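To try it out, assuming the script above is saved as flights.sh and the list (minus the header line) is in a file named flights, since that name is hard-coded in the sort command:
$ tail -n +2 list.txt > flights    # list.txt stands in for whatever file holds the original list
$ bash flights.sh
1 USER1 11-02 20.00 21.00
deleted: 3 USER2 11-02 20.45 21.45
4 USER2 11-03 20.00 21.30
2 USER1 11-04 17.20 19.10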