I have raw log snippets being dumped to the console as the output of a custom command:
bash$ custom-command
current-capacity: 3%, buffer: 1024, not-used/total: 10/10, IsEnabled: 0. Up since Thu Jun 23 11:54:14 2022
current-capacity: 0%, buffer: 1024, not-used/total: 25/25, IsEnabled: 0. Up since Thu Jun 23 11:54:14 2022
current-capacity: 0%, buffer: 1024, not-used/total: 15/15, IsEnabled: 1. Up since Thu Jun 23 11:54:14 2022
I need CSV output like the following so that I can capture the status in real time based on certain criteria. I can then redirect the output to a CSV file at regular intervals before loading it into a SQL database.
current-capacity, buffer, not-used/total, IsEnabled, Up since
3%, 1024, 10/10, 0, Thu Jun 23 11:54:14 2022
0%, 1024, 25/25, 0, Thu Jun 23 11:54:14 2022
0%, 1024, 15/15, 1, Thu Jun 23 11:54:14 2022
I've tried awk but am still having trouble: the line is comma-separated for the most part, except that IsEnabled: 0 ends with a . and is then followed by the uptime. Is there a way to handle this? I'm quite new to awk.
awk -F' :|, |: ' -v OFS="," '
NR==1{
print $1,$3,$5,$7,"Up since"
}
{
print $2,$4,$6,gensub(/([0-1])\. ?Up since ?(.*)/, "\\1,\\2", 1, $8)
}' file
current-capacity,buffer,not-used/total,IsEnabled,Up since
3%,1024,10/10,0,Thu Jun 23 11:54:14 2022
0%,1024,25/25,0,Thu Jun 23 11:54:14 2022
0%,1024,15/15,1,Thu Jun 23 11:54:14 2022
awk -F' :|, |: ' '
function formatRow(r){
return gensub(/([0-1])\. ?(Up since)/, "\\1, \\2:", 1, r)
}
NR==1{
$0 = formatRow($0)
for(i=1;i<=NF-1;i+=2) printf (i==NF-1 ? "%s\n" : "%s,"), $i
}
{
$0 = formatRow($0)
for(i=2;i<=NF;i+=2) printf (i==NF ? "%s\n" : "%s,"), $i
}' file
current-capacity,buffer,not-used/total,IsEnabled,Up since
3%,1024,10/10,0,Thu Jun 23 11:54:14 2022
0%,1024,25/25,0,Thu Jun 23 11:54:14 2022
0%,1024,15/15,1,Thu Jun 23 11:54:14 2022
It's just writing a regex that matches the output and transforming it.
sed -E 's/current-capacity: (.*)%, buffer: (.*), not-used\/total: (.*), IsEnabled: (.*)\. Up since (.*)/\1%,\2,\3,\4,\5/'
Using any awk in any shell on every Unix box:
$ cat tst.awk
BEGIN { FS="[:,] "; OFS=", " }
match($0,/\. [^ ]+ [^ ]+/) {
$0 = substr($0,1,RSTART-1) "," substr($0,RSTART+1,RLENGTH-1) ":" substr($0,RSTART+RLENGTH)
}
NR == 1 {
for ( i=1; i<NF; i+=2 ) {
printf "%s%s", $i, (i<(NF-1) ? OFS : ORS)
}
}
{
for ( i=2; i<=NF; i+=2 ) {
printf "%s%s", $i, (i<NF ? OFS : ORS)
}
}
$ awk -f tst.awk file
current-capacity, buffer, not-used/total, IsEnabled, Up since
3%, 1024, 10/10, 0, Thu Jun 23 11:54:14 2022
0%, 1024, 25/25, 0, Thu Jun 23 11:54:14 2022
0%, 1024, 15/15, 1, Thu Jun 23 11:54:14 2022
In the above, the first step uses match() { ... } to rewrite the trailing ". Up since Thu" part of each input line so that it uses the same ", " and ": " separators as the rest of the line, i.e. ", Up since: Thu". With the input made consistent, the rest of the code that parses it is straightforward.
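The normalisation step described above can also be looked at in isolation. The following is only a minimal sketch, assuming every input line contains the literal text ". Up since "; the function name normalize_uptime is made up for illustration and does not appear in the answer.

```shell
#!/bin/sh
# normalize_uptime: hypothetical helper, not part of the answer above.
# Reads log lines on stdin and rewrites ". Up since " as ", Up since: "
# so that every field on the line follows the same "name: value" layout
# and fields are consistently separated by ", ".
normalize_uptime() {
    sed 's/\. Up since /, Up since: /'
}
```

After this step, a line can be split on ", " and ": " alone, which is what the field loops in tst.awk rely on.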
Welcome to StackOverflow, and thank you for including both sample data and desired output. I recommend studying the markdown formatting syntax here, since your code was entered as quoted HTML. It's better to use code tags, which produce fixed-width text that is easier to read.
As for your problem, you can use the match() function in gawk to capture all the fields with a regular expression, because every line of your input is formatted the same way.
Something like this will do what is needed:
BEGIN{
# set output separator to comma space
OFS=", "
# define the regular expression to capture needed
# See https://regex101.com/r/K4wYoB/1
#
# ([^,]*) captures everything up to the next comma, not including the comma
# (.) captures a single character
# (.*) at the end captures the remainder of the line
#
# did not use full words, since it was not needed.
#
myregexp="y: ([^,]*).*r: ([^,]*).*al: ([^,]*).*led: (.).*ce (.*)"
# print header for output
print "current-capacity, buffer, not-used/total, IsEnabled, Up since"
}
# process every line (the sample input has no header line to skip)
{
# capture data fields
match($0, myregexp, a)
# print the line from "a" array
print a[1], a[2], a[3], a[4], a[5]
}
The input is close to one of Miller's input formats (key-value pairs).
If you simply run
mlr --ocsv --ips : clean-whitespace input.txt
you will get
current-capacity,buffer,not-used/total,IsEnabled
3%,1024,10/10,0. Up since Thu Jun 23 11:54:14 2022
0%,1024,25/25,0. Up since Thu Jun 23 11:54:14 2022
0%,1024,15/15,1. Up since Thu Jun 23 11:54:14 2022
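In the Miller output above, the uptime is still fused into the IsEnabled column. One possible way to finish the job is a small sed post-processing step. This is a sketch under the assumption that the CSV looks exactly like the output shown (header on line 1, ". Up since " in every data row); split_uptime_column is a hypothetical name.

```shell
#!/bin/sh
# split_uptime_column: hypothetical helper that post-processes the CSV
# shown above, read from stdin.
#   line 1      : append the missing "Up since" column name
#   other lines : turn ". Up since " into a field separator
split_uptime_column() {
    sed -e '1s/$/,Up since/' -e '2,$s/\. Up since /,/'
}
```

The output of the mlr command could be piped through this function to obtain the five-column CSV asked for in the question.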
{m,g}awk '
NR == (NF=NF)^_ { printf("%s,%s,%s,%s%.00s, Up since\n",
$(__=_^=_<_), $(_+=++__),
$(_+=__), $(_+__), FS = FS"|(, )+")
} {
for(_^=!__;_<NF;_+=__) {
$_=___
} $(+___)=$(_-=_)
}
sub("^ *(, )*",___,$!(NF = NF))' FS='[.] Up since |[,:][ \t]+' OFS=', '
|
current-capacity, buffer, not-used/total, IsEnabled, Up since
3%, 1024, 10/10, 0, Thu Jun 23 11:54:14 2022
0%, 1024, 25/25, 0, Thu Jun 23 11:54:14 2022
0%, 1024, 15/15, 1, Thu Jun 23 11:54:14 2022
Hi all, I have been working on a problem where I have a CSV file and need to filter the records for a specific month based on input from the user.
Record Format
firstName,lastName,YYYYMMDD
The catch is that the user's input is a month name (a string), while the month in the file is a number.
For example
> cat guest.csv
Micheal,Scofield,20000312
Lincon,Burrows,19981009
Sara,Tancredi,20040923
Walter,White,20051024
Barney,Stinson,20041230
Ted,Mosbey,20031126
Eric,Forman,20070430
Jake,Peralta,20030808
Amy,Santiago,19990405
Colt,Bennett,19990906
> ./list.sh March guest.csv
Micheal,Scofield,20000312
Oneliner:
MONTH=March; REGEX=`date -d "1 ${MONTH} 2022" +%m..$`; grep $REGEX guest.csv
Awk can easily translate month names to numbers and do the filtering.
awk -v month="March" -F , '
BEGIN { split("January February March April May June July August September October November December", mon, " ");
for(i=1; i<=12; i++) mm[i] = mon[i] }
mm[0 + substr($3, 5, 2)] == month' guest.csv
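The BEGIN block splits the month names into the array mon and copies them into mm, which the main script uses to look up a month name by its number: substr($3, 5, 2) extracts the two-digit month from YYYYMMDD, and 0 + converts it to a number so that "03" indexes mm[3]. Change -v month="March" to, for example, -v month="April" to search for a different month.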
If you want to wrap this in a shell script, you can easily parse out the arguments into variables:
#!/bin/sh
monthname=$1
shift
awk -v month="$monthname" -F , '
BEGIN { split("January February March April May June July August September October November December", mon, " ");
for(i=1; i<=12; i++) mm[i] = mon[i] }
mm[0 + substr($3, 5, 2)] == month' "$@"
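The wrapper above requires the month to be spelled exactly as in the list. As a possible variation, the sketch below resolves the name to a number once in the BEGIN block and also accepts a different case or a prefix of at least three letters. The function name filter_by_month and the three-letter rule are assumptions made for this example, not part of the original answer.

```shell
#!/bin/sh
# filter_by_month: hypothetical variant of the wrapper above.
#   $1      month name or a prefix of 3+ letters, any case (March, mar, MAR)
#   $2 ...  CSV files with records of the form firstName,lastName,YYYYMMDD
filter_by_month() {
    monthname=$1
    shift
    awk -v month="$monthname" -F , '
    BEGIN {
        n = split("January February March April May June July August September October November December", mon, " ")
        want = tolower(month)
        for (i = 1; i <= n; i++)
            # index(...) == 1 means that want is a prefix of this month name
            if (length(want) >= 3 && index(tolower(mon[i]), want) == 1)
                target = i
        if (!target) exit 2        # unknown month: print nothing
    }
    # compare the numeric month taken from YYYYMMDD with the resolved number
    (0 + substr($3, 5, 2)) == target' "$@"
}
```

Called as filter_by_month mar guest.csv, this should select the same records as the original script does for March.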
I have a large dataset for analysis and I am looking for shell scripting to filter the rows only to what I require, so I am able to load the dataset for further analysis in R.
The structure for data is as follows:
Size,ModifiedTime,AccessTime,contentid
4886,"Jun 11, 2009 06:51:08 PM","Mar 15, 2013 09:24:53 AM",000000285b7925f511b3159a72f80a4a
4096,"Aug 21, 2008 03:54:28 PM","May 12, 2009 04:45:41 PM",0000011afae4d1227c4df57b410ea52c
84848,"Feb 12, 2007 12:40:00 PM","Apr 07, 2014 09:39:03 AM",000001cec02017ca3eb81ddc4cd1c9ff
518,"Aug 22, 2006 02:12:03 PM","Dec 25, 2007 06:48:18 AM",00000233565d1c17c3135a9504c455ca
264158,"Dec 08, 2009 03:28:14 PM","Apr 08, 2013 11:52:15 AM",000003020ba74b9d1b6075d3c1b8fcb3
725963,"Sep 29, 2008 03:45:21 PM","May 17, 2011 08:48:40 AM",0000034b98d29d84ce7b61ee68be7658
1340,"Sep 07, 2011 03:36:54 AM","Mar 12, 2013 02:55:01 AM",000004ed899e26ae1c9b1ece35a98af1
75264,"Jul 28, 2011 05:09:58 PM","Jun 07, 2014 04:21:28 PM",000005a09fd2eb706c5800eb06084160
198724,"Jul 23, 2012 02:25:58 PM","Jan 21, 2013 12:58:07 PM",0000060b9d552c35f281b5033dcfa1b4
It is essentially a large csv file.
Now I want to keep only the rows whose AccessTime is more than 10 years old and write them to a separate CSV file. For the sample above, that means the 2nd and 4th data rows (excluding the header).
I tried the following: create a temp time variable and compare with the AccessTime, if it's less then print row.
BEGIN{
FPAT = "([^,]+)|(\"[^\"]+\")"; #this to read csv as some column value contains ,
OFS=",";
date=$(date -d "-3650 days" +"%s"); #temp time variable in epoch format
}
{
command="date -d" $6 " +%s"; #$6 refers to AccessTime column
( command | getline temp ); #converts Accesstime value to epoch format
close(command);
if(temp<date) print $6
}
But when I run this command, it doesn't print anything.
Any help is much appreciated.
Desired output:
Size,ModifiedTime,AccessTime,contentid
4096,"Aug 21, 2008 03:54:28 PM","May 12, 2009 04:45:41 PM",0000011afae4d1227c4df57b410ea52c
518,"Aug 22, 2006 02:12:03 PM","Dec 25, 2007 06:48:18 AM",00000233565d1c17c3135a9504c455ca
$ awk '
BEGIN {
m["Jan"]="01" # lookups for months
m["Feb"]="02" # Feb -> 02
m["Mar"]="03" # Mar -> 03
m["Apr"]="04" # etc.
m["May"]="05"
m["Jun"]="06"
m["Jul"]="07"
m["Aug"]="08"
m["Sep"]="09"
m["Oct"]="10"
m["Nov"]="11" # below we get todays date
m["Dec"]="12" # 10 years ago
dcmd="date +\"%Y%m%d,\" --date=\"10 years ago\"" # returns 20101204,
if((dcmd | getline d)<=0) # if getline fails
exit 1 # exit
# d=strftime("%Y%m%d")-10^5 "," # use this for GNU awk
}
$9 m[$7] $8>=d' file # explained below
d gets the value 20101204, (notice the trailing comma) from date +"%Y%m%d," --date="10 years ago". The AccessTime is read from the file and its components are rearranged with $9 m[$7] $8 so that, for example, Mar 15, 2013 becomes 20130315, (notice the comma again). The condition is a string comparison of those two dates. Note that the field numbers $7, $8 and $9 assume the whitespace-separated layout shown in the output below, not the quoted CSV.
Output:
4886 Jun 11, 2009 06:51:08 PM Mar 15, 2013 09:24:53 AM 000000285b7925f511b3159a72f80a4a
84848 Feb 12, 2007 12:40:00 PM Apr 07, 2014 09:39:03 AM 000001cec02017ca3eb81ddc4cd1c9ff
264158 Dec 08, 2009 03:28:14 PM Apr 08, 2013 11:52:15 AM 000003020ba74b9d1b6075d3c1b8fcb3
725963 Sep 29, 2008 03:45:21 PM May 17, 2011 08:48:40 AM 0000034b98d29d84ce7b61ee68be7658
1340 Sep 07, 2011 03:36:54 AM Mar 12, 2013 02:55:01 AM 000004ed899e26ae1c9b1ece35a98af1
75264 Jul 28, 2011 05:09:58 PM Jun 07, 2014 04:21:28 PM 000005a09fd2eb706c5800eb06084160
198724 Jul 23, 2012 02:25:58 PM Jan 21, 2013 12:58:07 PM 0000060b9d552c35f281b5033dcfa1b4
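The same idea of comparing dates as YYYYMMDD keys can be applied to the quoted CSV from the question. The following is a sketch only: it assumes that ModifiedTime and AccessTime are always double-quoted, so that splitting on the quote character puts the AccessTime text in $4. The function name older_than and its arguments are hypothetical.

```shell
#!/bin/sh
# older_than: hypothetical helper.
#   $1  cutoff date as YYYYMMDD, for example 20101204
#   $2  CSV file laid out as in the question
# Prints the header plus every row whose AccessTime date is before the cutoff.
older_than() {
    awk -F '"' -v cutoff="$1" '
    BEGIN {
        n = split("Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec", name, " ")
        for (i = 1; i <= n; i++) m[name[i]] = i      # Jan -> 1, Feb -> 2, ...
    }
    NR == 1 { print; next }                          # always keep the header
    {
        # $4 is the AccessTime text, e.g. Mar 15, 2013 09:24:53 AM
        split($4, t, "[ ,:]+")                       # t[1]=Mar t[2]=15 t[3]=2013
        key = sprintf("%04d%02d%02d", t[3], m[t[1]], t[2])
    }
    key + 0 < cutoff + 0                             # numeric YYYYMMDD comparison
    ' "$2"
}
```

With GNU date, the cutoff could be produced with date --date="10 years ago" +%Y%m%d, as in the answer above.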
With GNU awk for time functions, FPAT, and gensub():
$ cat tst.awk
BEGIN {
OFS = ","
FPAT = "([^" OFS "]*)|(\"[^\"]+\")"
now = strftime("%Y %m %d %H %M %S")
year = gensub(/ .*/,"",1,now)
rest = gensub(/[^ ]+/,"",1,now)
secs = mktime((year-10) rest)
thresh = strftime("%Y%m%d%H%M%S",secs)
}
NR > 1 {
f = $3; gsub(/"/,"",f)   # FPAT keeps the surrounding quotes, so strip them first
split(f,t,/[ ,:]+/)
mthNr = (index("JanFebMarAprMayJunJulAugSepOctNovDec",t[1])+2)/3
hour = (t[4] % 12) + ( t[7] == "PM" ? 12 : 0 )   # 12 AM -> 0, 12 PM -> 12
curr = sprintf("%04d%02d%02d%02d%02d%02d", t[3], mthNr, t[2], hour, t[5], t[6])
}
(NR == 1) || (curr < thresh)
$ awk -f tst.awk file
Size,ModifiedTime,AccessTime,contentid
4096,"Aug 21, 2008 03:54:28 PM","May 12, 2009 04:45:41 PM",0000011afae4d1227c4df57b410ea52c
518,"Aug 22, 2006 02:12:03 PM","Dec 25, 2007 06:48:18 AM",00000233565d1c17c3135a9504c455ca
This is based on the samples you have shown, and was written and tested against them in GNU awk. It assumes that you need to compare the second date on each line with the current date. Note that this solution divides by a fixed 365-day year, so it does not account for leap years or leap seconds (I am still trying to figure out another way here).
awk '
BEGIN{
num=split("jan,feb,mar,apr,may,jun,jul,aug,sep,oct,nov,dec",arr1,",")
for(i=1;i<=num;i++){
month[arr1[i]]=sprintf("%02d",i)
}
}
match($0,/[AP]M.*[AP]M/){
val=substr($0,RSTART,RLENGTH)
sub(/^[AP]M +/,"",val)
sub(/ [AP]M +$/,"",val)
split(val,array,"[ ,]")
dat=array[4] OFS month[tolower(array[1])] OFS array[2] OFS array[5]
timE=(systime()-mktime(gensub(/[ ":-]/," ","g",dat)))/(365*60*24*60)
if(timE>10){ print }
}
' Input_file
This will not print the header; in case you need it, add FNR==1{print;next} before the match() block.
Another shorter awk solution.
$ awk -F, -v ct=$(date "+%s") ' NR>1 { dc="date -d"$4 $5 " \"+%s\""; dc|getline t; yrs=(ct - t)/(24*60*60*365) } yrs>10 || NR==1 ' monte.txt
Size,ModifiedTime,AccessTime,contentid
4096,"Aug 21, 2008 03:54:28 PM","May 12, 2009 04:45:41 PM",0000011afae4d1227c4df57b410ea52c
518,"Aug 22, 2006 02:12:03 PM","Dec 25, 2007 06:48:18 AM",00000233565d1c17c3135a9504c455ca
$
Explanation:
The date command works if we just pass the string representation of access time.
$ date -d"Jun 11, 2009 06:51:08 PM"
Thu Jun 11 18:51:08 IST 2009
It works even without the comma
$ date -d"Jun 11 2009 06:51:08 PM"
Thu Jun 11 18:51:08 IST 2009
So there is no need to clean the data. Just passing $4 and $5 from the input file with comma as delimiter would work.
For comparison, I have used the epoch
awk -F, -v ct=$(date "+%s") ' #get the current epoch seconds via ct
NR>1 {
dc="date -d"$4 $5 " \"+%s\""; #build the date command using access time $4 and $5
dc|getline t; #execute the command and get the output in temp t
close(dc); #close the pipe so that open commands do not pile up on large files
yrs=(ct - t)/(24*60*60*365) #calculate the number of years between ct and t
}
yrs>10 || NR==1 #print if diff yrs > 10 or NR==1 for header
' monte.txt
Another solution:
If you want to apply the logic for 10 years in the date command, then we need to just remove the double quotes from the $5.
$ awk -F, -v ct=$(date "+%s") ' NR>1 { c5=substr($5,1,length($5)-1);dc="date -d"$4 c5 " + 10 years \" \"+%s\""; dc|getline t } t<ct ' monte.txt
Size,ModifiedTime,AccessTime,contentid
4096,"Aug 21, 2008 03:54:28 PM","May 12, 2009 04:45:41 PM",0000011afae4d1227c4df57b410ea52c
518,"Aug 22, 2006 02:12:03 PM","Dec 25, 2007 06:48:18 AM",00000233565d1c17c3135a9504c455ca
$
I have been gathering data for the last 20 days using a bash script that runs every 5 minutes. I started the script with no idea how I was going to output the data. I have since found a rather cool js graph that reads from a CSV.
Only issue is my date is currently in the format of:
Fri Nov 6 07:52:02
and for the CSV I need it to be
2015-11-06 07:52:02
So I need to cat my results grep-ing for the date and convert it.
The cat/grep for the date is:
cat speeds.txt | grep 2015 | awk '{print $1" "$2" "$3" "$4}'
Any brainwaves on how I can switch this around either using bash or php?
Thanks
PS - Starting the checks again using date +%Y%m%d" "%H:%M:%S is sadly not an option :(
Assuming all of your lines contain dates:
$ cat file
Fri Nov 6 07:52:02
...
$ awk 'BEGIN {
months["Jan"] = 1;
months["Feb"] = 2;
months["Mar"] = 3;
months["Apr"] = 4;
months["May"] = 5;
months["Jun"] = 6;
months["Jul"] = 7;
months["Aug"] = 8;
months["Sep"] = 9;
months["Oct"] = 10;
months["Nov"] = 11;
months["Dec"] = 12;
}
{
month = months[$2];
printf("%s-%02d-%02d %s\n", 2015, month, $3, $4);
}' file > out
$ cat out
2015-11-06 07:52:02
...
If you only need to modify some of the lines, you can tweak the awk script a little, e.g. to match every line containing 2015:
...
# Match every line containing 2015
/2015/ {
month = months[$2];
printf("%s-%02d-%02d %s\n", 2015, month, $3, $4);
# Use next to prevent the other print from happening for these lines
# Like 'continue' in while iterations
next;
};
# This '1' will print all other lines as well:
# Same as writing { print $0 }
1
You can convert the date to epoch time and then to the desired format with date in a bash script.
date -d 'Fri Nov 6 07:52:02' +%s;
1446776522
date -d @1446776522 +"%Y-%m-%d %T"
2015-11-06 07:52:02
Since you didn't provide the input, I'll assume you have a file called speeds.txt that contains:
Fri Oct 31 07:52:02 3
Fri Nov 1 08:12:04 4
Fri Nov 2 07:43:22 5
(the 3, 4, and 5 above are just to show that you could have other data in the row, but are not necessary).
Using this command:
cat speeds.txt | cut -d ' ' -f2,3,4 | while read line ; do date -d"$line" "+2015-%m-%d %H:%M:%S" ; done;
You get the output:
2015-10-31 07:52:02
2015-11-01 08:12:04
2015-11-02 07:43:22
I want to convert 18-Aug-2015 date format to '2015-08-18' using shell script
Try this formatting:
$ date +"%Y-%m-%d"
http://www.cyberciti.biz/faq/linux-unix-formatting-dates-for-display/
The -d option is GNU specific.
Here, you don't need to do date calculation, just rewrite the string which already contains all the information:
a=$(printf '%s\n' "$Prev_date" | awk '{
printf "%04d-%02d-%02d\n", $6, \
(index("JanFebMarAprMayJunJulAugSepOctNovDec",$2)+2)/3,$3}')
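The same index() trick can be applied directly to the 18-Aug-2015 format from the question. This is a minimal sketch that assumes English three-letter month abbreviations and a DD-Mon-YYYY input; the function name dmy_to_iso is made up for this example.

```shell
#!/bin/sh
# dmy_to_iso: hypothetical helper; converts DD-Mon-YYYY to YYYY-MM-DD.
dmy_to_iso() {
    printf '%s\n' "$1" | awk -F '-' '{
        # The position of the abbreviation inside the string gives the month:
        # Jan -> (1+2)/3 = 1, Feb -> (4+2)/3 = 2, ..., Aug -> (22+2)/3 = 8
        mon = (index("JanFebMarAprMayJunJulAugSepOctNovDec", $2) + 2) / 3
        printf "%04d-%02d-%02d\n", $3, mon, $1
    }'
}
```

For example, a=$(dmy_to_iso "18-Aug-2015") would store the rewritten date in a, with no call to date and therefore no dependency on the GNU-specific -d option.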
Without awk, assuming your initial date is in $mydate:
IFS=- read -r -a d <<< "$mydate"   # split on "-" without changing IFS for the rest of the script
months=(Zer Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec)
z=1
while [[ ${months[$z]} != "${d[1]}" ]]; do z=$((z+1)); done
printf "%s-%02d-%s\n" "${d[2]}" "$z" "${d[0]}"