Bash and awk: converting a field from 12-hour to 24-hour clock time

I have a large space-delimited txt file which I split into 18 smaller files (each with its own number of columns). The split is based on a delimiter, i.e. it happens whenever the timestamp hits midnight. So effectively, I'll end up with 18 files in the form of (note: ignore the dashes and pipes, I've used them to improve readability):
file1
time ----------- valueA - valueB
12:00:00 AM | 54.13 | 239.12
12:00:01 AM | 51.83 | 119.93
..
file18
time ---------- valueA - valueB - valueC - valueD
12:00:00 AM | 54.92 | 239.12 | 231.23 | 882.12
12:00:01 AM | 23.92 | 121.92 | 201.23 | 892.12
..
Once I split the file, I then perform some processing on each of the files using AWK, so in short there are 2 stages: the 'split stage' and the 'processing stage'.
Unfortunately, the timestamp contained in the large txt file is in 1 of 2 formats. Either the desirable 24 hour format of "00:00:01" or the undesirable 12 hour format of "12:00:01 AM".
As a result, I'm trying to convert all timestamps to 24-hour format and I'm not sure how to do this. I'm also not sure whether to attempt this at the split stage using bash or at the processing stage using AWK. I know that the following command converts 12-hour to 24-hour time:
'date --date="12:00:01 AM" +%T'
However, I'm not sure how to incorporate this into my shell script, where I'm using 'while read line' at the 'split stage', or whether I should do the time conversion in AWK (if possible?) at the 'processing stage'.

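See the test below; is it helpful for you? (The close(cmd) call matters on a real file: without it, awk will not rerun date for a timestamp it has already seen, and the pipes stay open.)
kent$ echo "12:00:00 AM | 54.92 | 239.12 | 231.23 | 882.12 "\
|awk -F'|' 'BEGIN{OFS="|"}{cmd="date --date=\""$1"\" +%T"; cmd | getline $1; close(cmd); print}'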
output
00:00:00| 54.92 | 239.12 | 231.23 | 882.12
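Spawning date once per line can be slow on a large file. As an alternative, here is a minimal sketch that does the conversion in awk alone. It assumes the timestamp sits at the start of each line in the form hh:mm:ss AM or hh:mm:ss PM; lines already in 24-hour format pass through unchanged. The function name to24 is made up for this illustration.

```shell
#!/bin/sh
# to24 (hypothetical helper): read lines on stdin and rewrite a leading
# "hh:mm:ss AM" / "hh:mm:ss PM" timestamp as 24-hour "HH:MM:SS".
# Assumption: the timestamp is the first thing on the line.
to24() {
  awk '{
    if (match($0, /^[0-9]+:[0-9]+:[0-9]+ [AP]M/)) {
      ts   = substr($0, RSTART, RLENGTH)   # e.g. "12:00:01 AM"
      rest = substr($0, RLENGTH + 1)       # everything after the timestamp
      split(ts, p, /[: ]/)                 # p[1]=hh p[2]=mm p[3]=ss p[4]=AM|PM
      h = p[1] % 12                        # 12 AM -> 0, 12 PM -> 0 (+12 below)
      if (p[4] == "PM") h += 12
      printf "%02d:%s:%s%s\n", h, p[2], p[3], rest
    } else {
      print                                # already 24-hour, or no timestamp
    }
  }'
}
```

It could be used at either stage, for example `to24 < big.txt > big24.txt` before the split (big.txt is a placeholder file name).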

Related

How do you get items from txt into presentable table in bash?

I'm trying to retrieve items from Node01.pc and put them in a table.
Example:
echo ${NodeCPU[0]} is able to print the item from the line.
But when I use printf or echo, it either breaks or does not display the output from the array item.
The formatting of the table seems to work, and it displays correctly only when the values don't come from the arrays. Could it be that there's more to the file than I can see?
Node01.pc contains
192.168.0.99
2
70
16
80
4
4
100
4
VS122:NMAD:20:20:1:1
VS122:NAMD:20:20:1:1
RS123:FEM:10:20:1:1
QV999:BEM:20:20:1:1
But I only need lines 3,5,7,9
I'm not sure what the best way to do this is, or if I even need to store the items in arrays.
I thought about retrieving all the text from the text files and making a new file which will contain all the data, but I'm not sure how to do that.
This is the code that I have right now.
#!/bin/bash
Node01=($(cat Node01.pc))
Node02=($(cat Node02.pc))
Node03=($(cat Node03.pc))
Node04=($(cat Node04.pc))
Node05=($(cat Node05.pc))
NodeCPU=("${Node01[2]}" "${Node02[2]}" "${Node03[2]}" "${Node04[2]}" "${Node05[2]}")
NodeMEM=("${Node01[4]}" "${Node02[4]}" "${Node03[4]}" "${Node04[4]}" "${Node05[4]}")
NodeHDD=("${Node01[6]}" "${Node02[6]}" "${Node03[6]}" "${Node04[6]}" "${Node05[6]}")
NodeNET=("${Node01[8]}" "${Node02[8]}" "${Node03[8]}" "${Node04[8]}" "${Node05[8]}")
seperator=----------------------
seperator=$seperator$seperator
rows="%-10s| %-7s| %-7s| %-7s| %-7s\n"
TableWidth=140
printf "%-10s| %-7s| %-7s| %-7s| %-7s\n" NodeNumber CPU MEM HDD NET
printf "%.${TableWidth}s\n" "$seperator"
for((i=0;i<=4;i++))
do
printf "$rows" "$(( $i+1 ))" "${NodeCPU[i]}" "${NodeMEM[i]}" "${NodeHDD[i]}" "${NodeNET[i]}"
done
read
This is an example of what I want to display
NodeNumber | CPU | MEM | HDD | NET
----------------------------------
1 | 10 | 20 | 20 | 40
2 | 10 | 20 | 20 | 40
3 | 10 | 20 | 20 | 40
4 | 10 | 20 | 20 | 40
5 | 10 | 20 | 20 | 40
EDIT This is what I'm currently getting:
NodeNumber| CPU | MEM | HDD | NET
--------------------------------------------
| 4 | 70
| 5 | 90
| 6 | 100
| 6 | 70
| 40 | 40
Issue I'm having is with
printf "$rows" "$(( $i+1 ))" "${NodeCPU[i]}" "${NodeMEM[i]}" "${NodeHDD[i]}" "${NodeNET[i]}"
Why worry about all the separate arrays? Simply loop over all "Node*.pc" files in the current directory, read the contents of each file into an array with readarray, and then output the file count and array elements 2, 4, 6 and 8 in the proper format (adjust the elements output as needed), e.g.
#!/bin/bash
cnt=1 ## file counter
## print heading
printf "NodeNumber | CPU | MEM | HDD | NET\n----------------------------------\n"
for i in Node*.pc; do ## loop over all Node*.pc files in directory
readarray -t node < "$i" ## read contents into array
## output count and elements 2, 4, 6, 8 in proper format
printf "%-11s| %-4s| %-4s| %-4s| %s\n" $((cnt++)) \
"${node[2]}" "${node[4]}" "${node[6]}" "${node[8]}"
done
Example Use/Output
With the example data shown copied to the file Node01.pc in the current directory, you would get:
$ bash node.sh
NodeNumber | CPU | MEM | HDD | NET
----------------------------------
1 | 70 | 80 | 4 | 4
(I called the script node.sh)
It will output the information from each file as a separate line numbered 1, 2, ... Look things over and let me know if this is what you intended. (You can also do the same thing faster with awk by setting FS="\n" and treating the lines as fields of a single record.)
You can do the same thing in awk with:
awk '
BEGIN {
RS=""; FS="\n"
printf "NodeNumber | CPU | MEM | HDD | NET\n----------------------------------\n"
}
NF >= 9 {
printf "%-11s| %-4s| %-4s| %-4s| %s\n",++cnt,$3,$5,$7,$9
}
' Node*.pc
(note: in awk the field numbers are 1-based, while in bash the array indexes are 0-based)
Output is the same.
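The misaligned rows shown in the question look like what stray carriage returns produce, so it may be worth checking whether the .pc files have Windows (CRLF) line endings. This is only a guess; nothing in the question confirms it. Running od -c Node01.pc will show any \r characters. A minimal sketch that strips them before formatting (the function name node_rows is hypothetical):

```shell
#!/bin/sh
# node_rows (hypothetical helper): print one table row per file given as
# an argument, using lines 3, 5, 7 and 9 of each file.
# Assumption: the files may contain CRLF line endings, so "\r" is removed first.
node_rows() {
  n=0
  for f in "$@"; do
    n=$((n + 1))
    tr -d '\r' < "$f" | awk -v n="$n" '
      NR == 3 { cpu = $0 }
      NR == 5 { mem = $0 }
      NR == 7 { hdd = $0 }
      NR == 9 { net = $0 }
      END { printf "%-11s| %-4s| %-4s| %-4s| %s\n", n, cpu, mem, hdd, net }'
  done
}
```

Called as `node_rows Node*.pc`, it prints the same rows as the scripts above, without the heading.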

How to put pivot table using Shell script

I have data in a CSV file as below...
Emailid Storeid
a#gmail.com 2000
b#gmail.com 2001
c#gmail.com 2000
d#gmail.com 2000
e#gmail.com 2001
I am expecting below output, basically finding out how many email ids are there for each store.
StoreID Emailcount
2000 3
2001 2
So far I have tried to solve my issue with:
IFS=","
while read f1 f2
do
awk -F, '{ A[$1]+=$2 } END { OFS=","; for (x in A) print x,A[x]; }' > /home/ec2-user/storewiseemials.csv
done < temp4.csv
With the above shell script I am not getting the desired output. Can you please help me?
Using Miller (https://github.com/johnkerl/miller) and starting from this (I have used a CSV, because I do not know whether you use a tab or a space as separator)
Emailid,Storeid
a#gmail.com,2000
b#gmail.com,2001
c#gmail.com,2000
d#gmail.com,2000
e#gmail.com,2001
and running
mlr --csv count-distinct -f Storeid -o Emailcount input >output
you will have (shown here as a table; with --csv the output file itself is comma-separated)
+---------+------------+
| Storeid | Emailcount |
+---------+------------+
| 2000 | 3 |
| 2001 | 2 |
+---------+------------+
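If Miller is not available, the same count can be sketched in plain awk. The attempt in the question sums $2 per email address; what is needed is a counter keyed on the store id. The sketch below assumes a comma-separated file with a one-line header, as in the Miller example; the function name count_by_store is made up for this illustration.

```shell
#!/bin/sh
# count_by_store (hypothetical helper): count rows per Storeid.
# Assumptions: $1 is a CSV file with a header line, email in column 1
# and store id in column 2.
count_by_store() {
  echo "StoreID,Emailcount"
  # Skip the header, count one per row under its store id, then sort
  # numerically because awk's for-in order is unspecified.
  awk -F, 'NR > 1 { count[$2]++ }
           END { for (s in count) print s "," count[s] }' "$1" |
    sort -t, -k1,1n
}
```

For example, `count_by_store temp4.csv > storewiseemails.csv` (the output name is a placeholder). No surrounding while read loop is needed, since awk already reads the file line by line.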

CSV - How to add columns based on an existing column?

What is the best way to do this and how?
I gather things called sed, AWK and bash may be relevant.
I have used AWK once for one command, the others never.
I have searched and other apparently similar questions do not have an answer I need.
I have columns I have called fields in a CSV file:
_________________________
field1 | field2 | field3|
-------------------------
1990AB | 123456 | 123456|
-------------------------
I want to add fields based on these three original fields to appear as follows:
_______________________________________________________
field1 | field2 | field3 | field1a | field2a | field3a |
-------------------------------------------------------
1990AB | 123456 | 123456| 1990 | 12345 | 12345 |
-------------------------------------------------------
where:
field1a 1990 column 1 first 4 always digits then alpha
field2a 12345 column 2 is always 6 digits
field3a 12345 column 3 is always 6 digits
These are one-time-per-file actions, prior to database import.
I'm on Mac OS X, and the file has about 6 million records. This is my 2nd attempt at this question, as my first was apparently not good. In this area I am a 100% novice.
awk to the rescue!
This should be easy to read even if you have no prior experience with awk:
$ awk -F, -v OFS=, 'NR==1 {for(i=1;i<=3;i++) $(++NF)=$i"a"}
NR>1 {$(++NF)=substr($1,1,4);
$(++NF)=substr($2,1,5);
$(++NF)=substr($3,1,5)}1' file
NR is the line number (the header line gets special treatment), NF is the number of fields, incremented here for each additional column, and $i is the field value at position i. The trailing 1 is shorthand for printing the line. The initial options set the input field delimiter (-F) and the output field delimiter (OFS) to a comma.

Print unique names of users logged on with finger

I'm trying to write a shell script that prints the full names of users logged on to a machine. The finger command gives me a list of users, but there are many duplicates. How can I loop through and print out only the unique ones?
Edit:
This is the format of what finger gives me:
xxxx XX of group XXX pts/59 1:00 Feb 13 16:38
xxxx XX of group XXX pts/71 1:11 Feb 13 16:27
xxxx XX of group XXX pts/105 1d Feb 12 15:22
xxxx YY of group YYY pts/102 2:19 Feb 13 14:13
xxxx ZZ of group ZZZ pts/42 2d Feb 7 12:11
I'm trying to extract the full name (i.e. whatever comes before 'of group' in column 2), so I would be using awk together with finger.
What you want is actually fairly difficult in a shell script. Here is, for example, my full output of finger(1):
Login Name TTY Idle Login Time Office Phone
martin Martin Tournoij *v0 1d Wed 14:11
martin Martin Tournoij pts/2 22 Wed 15:37
martin Martin Tournoij pts/5 41 Thu 23:16
martin Martin Tournoij pts/7 31 Thu 23:24
martin Martin Tournoij pts/8 Thu 23:29
You want the full name, but this may contain 1 space (as per my example), or it may just be 'Teller' (no space), or it may be 'Captain James T. Kirk' (3 spaces). So you can't just use the space as delimiter. You could use the character position of 'TTY' in the header as an indicator, but that's not very elegant IMHO (especially with shell scripting).
My solution is therefore slightly different: we get only the username from finger(1), then we get the full name from /etc/passwd.
#!/bin/sh
prev=""
for u in $(finger | tail -n +2 | awk '{ print $1 }' | sort); do
[ "$u" = "$prev" ] && continue
echo "$u $(grep "^$u:" /etc/passwd | cut -d: -f5)"
prev="$u"
done
Which gives me both the username & full name:
martin Martin Tournoij
Obviously, you can also print just the real name (without the $u).
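The same idea can be sketched as a filter that reads finger's output on stdin. This is only an illustration under a few assumptions: the first line of the input is the header, the login name is the first column, and the full name is the part of the passwd GECOS field before the first comma. The function name full_names and its optional file argument are hypothetical.

```shell
#!/bin/sh
# full_names (hypothetical helper): print "login full-name" once per
# logged-in user. Reads finger output on stdin.
# $1: passwd-format file to look names up in (defaults to /etc/passwd).
full_names() {
  passwd_file=${1:-/etc/passwd}
  tail -n +2 |                 # drop the header line
    awk '{ print $1 }' |       # keep the login column
    sort -u |                  # one line per login
    while read -r u; do
      # Exact match on field 1, so "martin" does not also match "martina".
      awk -F: -v u="$u" '$1 == u { split($5, gecos, ","); print u, gecos[1] }' "$passwd_file"
    done
}
```

Usage would be `finger | full_names`.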
The sort and uniq commands (part of coreutils) can be used to remove duplicates.
finger | sort -u
This will remove all duplicate lines, but you will still see similar lines due to how verbose the finger command is. If you just want a list of usernames, you can filter it out further to be very specific.
finger | cut -d ' ' -f1 | sort -u
Now, you can take this one step further, and remove the "header/label" line printed out by the finger command.
finger | cut -d ' ' -f1 | sort -u | grep -iv login
Hope this helps.
Other possible solution:
finger | tail -n +2 | awk '{ print $1 }' | sort | uniq
tail -n +2 to omit the first line.
awk '{ print $1 }' to extract the first column.
sort to prepare input for uniq.
uniq to remove duplicates.
If you want to iterate use:
for user in $(finger | tail -n +2 | awk '{ print $1 }' | sort | uniq)
do
echo "$user"
done
Could this be simpler?
No spaces or any other special characters to worry about!
finger -l | awk '/^Login/'
Edit: To remove "of group" and everything after it:
finger -l | awk '/^Login/' | sed 's/of group.*//g'
Output:
Login: xx Name: XX
Login: yy Name: YY
Login: zz Name: ZZ

Bash scripting: using sed and cut to output a specific format

I am working on a bash script using sed and cut that will take times input in various ways and output them in a specific format. Here is an example line:
timeinhour=$(cut -d" " -f2<<<"$line" | sed 's/p/ /' | sed 's/a/ /' | sed 's/am/ /' | sed 's/pm/ /' | sed 's/AM/ /' | sed 's/PM/ /' )
As you can see I am just removing any trailing am or pm from a time entry that might be formatted in various ways leaving only the numbers.
So I want this line to just spit out the hour of the day (timeinhour), ie "1000AM" = "10" as does "10a" and "10am."
The problem I am running into is the varying lengths of the time entries. If I tell sed or cut to remove the last two characters "1000" will correctly output the hour I need: "10," but using it on one that is already "10" obviously results in a blank output.
I have been experimenting with a line like this
sed 's/\(.*\)../\1/'
If anyone has any advice, I would appreciate it.
For example, this input:
1p
1032AM
419pm
1202a
would produce:
1
10
4
12
sed 's/[^0-9]//g;s/^[0-9]\{1,2\}$/&00/;s/^\(.*\)..$/\1/'
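The steps (input -> after each of the three substitutions):
1p -> 1 -> 100 -> 1
10a -> 10 -> 1000 -> 10
419pm -> 419 -> 419 -> 4
1202a -> 1202 -> 1202 -> 12
1. delete everything that is not a digit
2. expand a 1- or 2-digit value (hours only) into 4-digit HHmm by appending 00
3. drop the last two characters (the minutes)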
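To tie this back to the timeinhour variable in the question, the sed command above can be wrapped in a small function. This is a sketch; the name hour_of is made up, and it assumes the time entry is passed as a single argument.

```shell
#!/bin/sh
# hour_of (hypothetical helper): print the hour part of a time entry
# such as "1p", "10am", "419pm" or "1032AM".
hour_of() {
  printf '%s\n' "$1" |
    sed 's/[^0-9]//g;s/^[0-9]\{1,2\}$/&00/;s/^\(.*\)..$/\1/'
}
```

In the script from the question this might be used as `timeinhour=$(hour_of "$(cut -d" " -f2 <<<"$line")")`.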
Try:
timeinhour=$(cut -d" " -f2<<<"$line" | sed 's/p/ /;s/a/ /;s/am/ /;s/pm/ /;s/AM/ /;s/PM/ /' | sed 's/\(.*\)../\1/') # Using your example.
