I have a csv file with 12 numbers which correspond to the 12 months. An example of the file is as follows:
$ cat data.csv
"3","5","6","5","4","6","7","6","4","4","3","3",
I'd like to plot these with the months ("January", "February", "March", and so on) on the x-axis.
I've found this script but I don't know how to input the months:
for FILE in data.csv; do
gnuplot -p << EOF
set datafile separator ","
set xlabel "xlabel"
set ylabel "ylabel"
set title "graphTitle"
plot "$FILE" using $xcolumn:$ycolumn
EOF
done
The expected output should be a plot where the x-axis is the month and the y-axis is the data from the csv file.
Note that the CSV file doesn't contain the months, just the numbers. That's why I'm asking for the best way to achieve this without having to enter the months manually in the CSV or loop through an array. Is there a gnuplot function that adds the dates and can be formatted?
Thank you
If you do not mind typing in the month names, I think the simplest is this. Data is shown in-line for clarity rather than reading from a file.
$DATA << EOD
"3","5","6","5","4","6","7","6","4","4","3","3",
EOD
set datafile sep comma
set xrange [0:13]
unset key
array Month[12] = ["Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]
plot for [N=1:12] $DATA using (N):(column(N)):xticlabel(Month[N]) with impulse lw 5
If you do not want to type in the month names, the following should generate the equivalent. "%b" will generate the abbreviated month names as above; "%B" would generate the full month name.
Month(i) = strftime("%b", i * 3600.*24.*28.)
plot for [N=1:12] $DATA using (N):(column(N)):xticlabel(Month(N)) with impulse lw 5
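For instance, the full-name variant only needs the format specifier swapped:
Month(i) = strftime("%B", i * 3600.*24.*28.)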
If you don't want to use the loop syntax, there is a way to read your CSV file as a 1x12 matrix. Also, for long month names, you can use gnuplot's strftime function by giving it the format specifier "%B".
Here is the gnuplot script:
set key noautotitle
set datafile separator comma
set yrange [0:10]
set xrange [-1:12]
set xtics rotate by -45
set grid xtics
# This function generates the names "January", "February", ...
# from the integer value 0, 1, ...
#
monthname(i) = strftime("%B",strptime("%m",sprintf("%i",i+1)))
# The `matrix every ...` specifier tells gnuplot to read the data as a 1x12 matrix.
#
plot "data.csv" matrix every :::0:11:0 using 1:3:xtic(monthname($1)) with linespoints pt 7
UPDATE: After reviewing OP's post and code some more, I'm guessing the desired format looks like:
January:"3",February:"5",March:"6",April:"5",May:"4",June:"6",July:"7",August:"6",September:"4",October:"4",November:"3",December:"3",
If this is the case, we can use the same solution (below) and pipe the final results through tr to transpose the data back to a single-line/multi-column dataset, eg:
$ paste -d" " <(locale mon | tr ';' '\n') <(tr ',' '\n' < data.csv) | grep -v '^ $' | tr ' \n' ':,'
January:"3",February:"5",March:"6",April:"5",May:"4",June:"6",July:"7",August:"6",September:"4",October:"4",November:"3",December:"3",
And updating OP's code:
datfile=$(mktemp)
for FILE in data.csv
do
paste -d" " <(locale mon | tr ';' '\n') <(tr ',' '\n' < data.csv) | grep -v '^ $' | tr ' \n' ':,' > "${datfile}"
gnuplot -p <<-EOF
set datafile separator ","
set xlabel "xlabel"
set ylabel "ylabel"
set title "graphTitle"
plot "${datfile}" using $xcolumn:$ycolumn
EOF
done
'rm' -rf "${datfile}" > /dev/null 2>&1
Looks like gnuplot can accept data in various formats, including the following:
January "3"
February "5"
March "6"
April "5"
May "4"
June "6"
July "7"
August "6"
September "4"
October "4"
November "3"
December "3"
NOTE: If OP determines this is not an acceptable file format then I'm sure we can come up with something else ... would just need the question updated with a sample of a valid file format showing months and numerics.
So if we can generate this data set on the fly we could then feed it to gnuplot ...
First we'll let locale generate the months for us:
$ locale mon
January;February;March;April;May;June;July;August;September;October;November;December
Next we can transpose our single-line/multi-column datasets to multi-line/single-column datasets:
$ locale mon | tr ';' '\n'
January
February
March
April
May
June
July
August
September
October
November
December
$ tr ',' '\n' < data.csv
"3"
"5"
"6"
"5"
"4"
"6"
"7"
"6"
"4"
"4"
"3"
"3"
From here we can paste these 2 datasets together, using a space as the column delimiter:
$ paste -d" " <(locale mon | tr ';' '\n') <(tr ',' '\n' < data.csv)
January "3"
February "5"
March "6"
April "5"
May "4"
June "6"
July "7"
August "6"
September "4"
October "4"
November "3"
December "3"
One last step would be to write this to a (tmp) file, eg:
$ datfile=$(mktemp)
$ paste -d" " <(locale mon | tr ';' '\n') <(tr ',' '\n' < data.csv) | grep -v '^ $' > "${datfile}"
$ cat "${datfile}"
January "3"
February "5"
March "6"
April "5"
May "4"
June "6"
July "7"
August "6"
September "4"
October "4"
November "3"
December "3"
NOTE: The grep -v '^ $' gets rid of the extra line at the end caused by the trailing comma (,) in data.csv
From here "${datfile}" can be fed to gnuplot as needed and, once it is no longer needed, deleted, eg:
$ gnuplot ... "${datfile}" ...
$ 'rm' -rf "${datfile}" > /dev/null 2>&1
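For reference, the elided gnuplot step above could look something like this hypothetical sketch (the axis labels are placeholders; the quotes around the numbers are stripped first in case gnuplot refuses to read "3" as a numeric value):
# hypothetical sketch, not the exact command elided above
tr -d '"' < "${datfile}" > "${datfile}.plot"
gnuplot -p << EOF
set xlabel "Month"
set ylabel "Value"
plot "${datfile}.plot" using 0:2:xtic(1) with linespoints notitle
EOF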
Yet another solution. Because you have a trailing comma and gnuplot expects a number after it, you will get the warning "warning: matrix contains missing or undefined values", which you can ignore. For the same reason you should limit the x-maximum to less than 12.
In your case, replace $Data with your filename 'data.csv'. You might want to set another locale (check help locale) to get the month names in another language.
Code:
### plot monthly data
reset session
$Data <<EOD
"3","5","6","5","4","6","7","6","4","4","3","3",
EOD
set datafile separator comma
set boxwidth 0.8
set style fill solid 0.5
set yrange[0:10]
set xrange[-0.9:11.9]
myMonth(i) = strftime("%b",i*3600*24*31) # get month name as abbreviation, use %B for full name
plot $Data matrix u 1:0:xtic(myMonth($1)) w boxes title "my data"
### end of code
Result:
One awk solution built around the same logic as the paste answer, but which eliminates a few sub-processes (eg, grep, multiple tr's) ...
awk -F'[;,]' ' # input field delimiters are ";" and ","
BEGIN { OFS=":" ; ORS="," } # set output field delimiter as ":" and output record delimiter as ","
FNR==NR { for (i=1 ; i<=NF ; i++) # loop through fields from first file ...
month[i]=$(i) # store in our month[] array
next # skip to next input line
}
{ for (i=1 ; i< NF ; i++) # loop through fields from second file ...
print month[i],$(i) # print month and current field
}
' <(locale mon) data.csv
This generates:
January:"3",February:"5",March:"6",April:"5",May:"4",June:"6",July:"7",August:"6",September:"4",October:"4",November:"3",December:"3",
Rolling this into OP's code:
datfile=$(mktemp)
for FILE in data.csv
do
awk -F'[;,]' 'BEGIN{OFS=":";ORS=","} FNR==NR {for (i=1;i<=NF;i++) mon[i]=$(i); next} {for (i=1;i<NF;i++) print mon[i],$(i)}' <(locale mon) data.csv > "${datfile}"
gnuplot -p <<-EOF
set datafile separator ","
set xlabel "xlabel"
set ylabel "ylabel"
set title "graphTitle"
plot "${datfile}" using $xcolumn:$ycolumn
EOF
done
'rm' -rf "${datfile}" > /dev/null 2>&1
Related
This question already has answers here:
How to wrap lines within columns in Linux
I am trying to wrap columns of text using gawk or native bash. The fourth column (the last one in this case) wraps to the next line; I would like it to wrap so that all of the text remains under its respective heading. The output for the given input is just representative, and the text that needs wrapping is in the last column. However, I'd like to be able to wrap ANY column.
I have tried embedding fmt and fold commands in the awk script but have been unsuccessful in getting the required results.
awk 'BEGIN{FS="|"; format="%-35s %-7s %-5s %-20s\n"
printf "\n"
printf format, "Date", "Task ID", "Code", "Description"
printf format ,"-------------------------", "-------", "-----", "------------------------------"}
{printf format, strftime("%c",$1), $2, $3, $4}'
INPUT:
1563685965|878|12015|Task HMI starting
1563686011|881|5041|Configured with engine 6000.8403 (/opt/NAI/LinuxShield/engine/lib/liblnxfv.so), dats 9322.0000 (/opt/NAI/LinuxShield/engine/dat), 197 extensions, 0 extra drivers
1563686011|882|5059|Created Scanner child id=1 pid=28,698 engine=6000.8403, dats=9322.0000
1563686139|883|12017|Task HMI Completed 2 items detected in 19 files (0 files timed out, 0 files excluded, 0 files cleaned, 0 files had errors, 0 were not scanned)
1563686139|885|5012|scanned=19 excluded=0 infected=2 cleaned=0 cleanAttempts=0 cleanRequests=0 denied=0 repaired=0 deleted=0 renamed=0 quarantined=0 timeouts=0 errors=0 uptime=174 busy=0 wait=0
I am still unclear on how to post or share information on this forum. This seems to work fairly well. The wrap function was taken from the duplicate post.
BEGIN{
    format="%-35s %-7s %-10s %-20s\n"
    printf "\n"
    printf format, "Date", "Task ID", "Code", "Description"
    printf format, "-------------------------", "-------", "-----", "------------------------------"
}
{
    split($0,cols,"|")
    # wrap every column of this record and remember the tallest column
    maxLines=1
    for(col in cols){
        numLines=wrap(cols[col],80,colArr)
        for(c in colArr){
            fmtcol[col,c] = colArr[c]
        }
        maxLines=(numLines > maxLines ? numLines : maxLines)
        delete colArr        # clear per-column scratch so short columns do not inherit stale lines
    }
    # emit one physical line per wrapped line; missing cells print as empty strings
    for (lineNr=1; lineNr<=maxLines; lineNr++) {
        dt=((1,lineNr) in fmtcol ? strftime("%c",fmtcol[1,lineNr]) : "")
        printf format, dt, fmtcol[2,lineNr], fmtcol[3,lineNr], fmtcol[4,lineNr]
    }
    printf "\n"
    delete fmtcol            # clear per-record storage before the next record
}
# wrap inStr at wid characters into outArr[1..n]; returns the number of lines
function wrap(inStr,wid,outArr,   lineEnd,numLines) {
    while ( length(inStr) > wid ) {
        lineEnd = ( match(substr(inStr,1,wid),/.*[[:space:]]/) ? RLENGTH - 1 : wid )
        outArr[++numLines] = substr(inStr,1,lineEnd)
        inStr = substr(inStr,lineEnd+1)
        sub(/^[[:space:]]+/,"",inStr)
    }
    outArr[++numLines] = inStr
    return numLines
}
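Assuming the program above is saved as, say, wrap.awk and the sample input as input.txt (both filenames are arbitrary), it needs gawk because of strftime():
gawk -f wrap.awk input.txt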
I know you said you're using gawk, but tabular formatting with line wrapping like you want is really easy to do with perl, so here's a perl solution using the format feature's repeated fill mode:
#!/usr/bin/env perl
use warnings;
use strict;
use POSIX qw/strftime/;
printf "%-40s %-20s\n", 'Date', 'Description';
print '-' x 40, ' ', '-' x 20, "\n";
my ($date, $desc);
while (<>) {
chomp;
($date, $desc) = split '\|', $_;
$date = strftime '%c', localtime($date);
write;
}
format STDOUT =
^<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< ^<<<<<<<<<<<<<<<<<<< ~~
$date, $desc
.
Example:
$ cat input.txt
1000|this is some text
2000|this is some other text that is long enough that it will wrap around a bit.
$ perl fmt.pl input.txt
Date Description
---------------------------------------- --------------------
Wed, Dec 31, 1969 4:16:40 PM this is some text
Wed, Dec 31, 1969 4:33:20 PM this is some other
text that is long
enough that it will
wrap around a bit.
I want this as a Bash script.
It should print the dates in the below manner:
From: 2015-October-03 2015-October-04 (and on the next line it should again print)
2015-October-10 2015-October-11
...
To: 2017-October-21 2017-October-22
2017-October-28 2017-October-29
So it should print the weekend dates of every month from 2015 till today, in the above format only. Please help.
The following is a solution for your query.
Solution
#!/bin/bash
Date_Diff_Count=$(( ( $(date +%s) - $(date -d "2015-01-01" +%s) ) / 60 / 60 / 24 ))
for i in $(seq -$Date_Diff_Count 0)
do
    VALUE=$(date -d "+$i day" | egrep -i "Sat|Sun" | awk -F" " '{print $2" "$3" "$6}')
    [[ ! -z ${VALUE} ]] && date -d "${VALUE}" +%Y-%B-%d
done > sample.txt
paste -d " " - - < sample.txt
Output
2015-January-03 2015-January-04
2015-January-10 2015-January-11
2015-January-17 2015-January-18
2015-January-24 2015-January-25
2015-January-31 2015-February-01
...
2016-May-07 2016-May-08
2016-May-14 2016-May-15
2016-May-21 2016-May-22
2016-May-28 2016-May-29
...
2017-October-07 2017-October-08
2017-October-14 2017-October-15
2017-October-21 2017-October-22
2017-October-28 2017-October-29
Explanation
Date_Diff_Count holds the number of days between the start date (2015-01-01) and the current date. You can change the start date as you wish.
The for loop runs from -Date_Diff_Count to 0. For example, if Date_Diff_Count is 500, the loop sequence runs from -500 to 0.
VALUE fetches only the month, day and year by piping the output of date through egrep and awk; it is non-empty only for Saturdays and Sundays.
If VALUE is not empty, the date is converted into the format YYYY-Month-DD.
The final output is saved in the sample.txt file.
The final paste command merges every 2 consecutive lines into a single line (see the example below). If you want to merge 3 lines, use paste -d " " - - -
-d is the delimiter that separates the merged lines. You can use any other delimiter based on your requirements.
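For instance, a quick illustration of how paste groups its input lines (the sample data here is made up):
$ printf '%s\n' 2015-October-03 2015-October-04 2015-October-10 2015-October-11 | paste -d " " - -
2015-October-03 2015-October-04
2015-October-10 2015-October-11
$ printf '%s\n' a b c d e f | paste -d " " - - -
a b c
d e f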
sname;id;is_up;p_data
"0000";4256;100;"052263"
"006335";5228;100;"00522633"
"ABTEST";1452;100;"1522620 0"
How do I edit the above file in Unix to:
Add 2 lines at the top for a title and the system date and time
Add ; at the end of each row
Add an End tag at the end of the file
The final file should look like
!title
!Time: 2014-12-33
sname;id;is_up;p_data
"0000";4256;100;"052263";
"006335";5228;100;"00522633";
"ABTEST";1452;100;"1522620 0";
!End
Use sed to add ; to the end of each line. To skip the first (header) line from this operation, use the address 1!.
{
echo '!title'
echo -n '!Time: '
date +%Y-%m-%d
sed '1! s/$/;/' file
echo '!End'
} > newfile
You can make use of BEGIN and END like this, together with $0=$0";" to add a ; at the end of every line except the first:
awk -v d="$(date)" '
BEGIN{print "!title"; print "!Time: ", d}
NR>1{$0=$0";"} 1;
END {print "!End"}
' file
See output:
$ awk -v d="$(date)" 'BEGIN{print "!title"; print "!Time: ", d} NR>1{$0=$0";"} 1; END {print "!End"}' file
!title
!Time: Wed Apr 30 15:25:53 CEST 2014
sname;id;is_up;p_data
"0000";4256;100;"052263";
"006335";5228;100;"00522633";
"ABTEST";1452;100;"1522620 0";
!End
For the date format, you should decide which one you want; plain date prints the full default format. I would maybe go for:
$ date "+%F %T"
2014-04-30 15:41:04
How can I convert one date format to another format in a shellscript?
Example:
the old format is
MM-DD-YY HH:MM
but I want to convert it into
YYYYMMDD.HHMM
Like "20${D:6:2}${D:0:2}${D:3:2}.${D:9:2}${D:12:2}00", if the old date in the $D variable.
Take advantage of the shell's word splitting and the positional parameters:
date="12-31-11 23:59"
IFS=" -:"
set -- $date
echo "20$3$1$2.$4$5" #=> 20111231.2359
myDate="21-12-11 23:59"
#fmt is DD-MM-YY HH:MM
outDate="20${myDate:6:2}${myDate:3:2}${myDate:0:2}.${myDate:9:2}${myDate:12:2}00"
case "${outDate}" in
2[0-9][0-9][0-9][0-1][0-9][0-3][0-9].[0-2][0-9][0-5][0-9][0-5][0-9] )
: # nothing to do, date is in the correct format
;;
* ) echo bad format for ${outDate} >&2
;;
esac
Note that if you have a large file to process, then the above is an expensive(ish) process. For file-based data I would recommend something like
cat infile
....|....|21-12-11 23:59|22-12-11 00:01| ...|
awk '
function reformatDate(inDate) {
if (inDate !~ /[0-3][0-9]-[0-1][0-9]-[0-9][0-9] [0-2][0-9]:[0-5][0-9]/) {
print "bad date format found in inDate= "inDate
return -1
}
# in format assumed to be DD-MM-YY HH:MM(:SS)
return (2000 + substr(inDate,7,2) ) substr(inDate,4,2) substr(inDate, 1,2) \
"." substr(inDate,10,2) substr(inDate,13,2) \
( substr(inDate,16,2) ? substr(inDate,16,2) : "00" )
}
BEGIN {
#add or comment out for each column of data that is a date value to convert
# below is for example, edit as needed.
dateCols[3]=3
dateCols[4]=4
# for awk people, I call this the pragmatic use of associative arrays ;-)
#assuming pipe-delimited data for columns
#....|....|21-12-11 23:59|22-12-11 00:01| ...|
FS=OFS="|"
}
# main loop for each record
{
for (i=1; i<=NF; i++) {
if (i in dateCols) {
#dbg print "i=" i "\t$i=" $i
$i=reformatDate($i)
}
}
print $0
}' infile
output
....|....|20111221.235900|20111222.000100| ...|
I hope this helps.
There is a good answer here already, but you said in the comments that you wanted an alternative, so here is my [rather awful in comparison] method:
read sourcedate < <(echo "12-13-99 23:59");
read sourceyear < <(echo $sourcedate | cut -c 7-8);
if [[ $sourceyear < 50 ]]; then
read fullsourceyear < <(echo -n 20; echo $sourceyear);
else
read fullsourceyear < <(echo -n 19; echo $sourceyear);
fi;
read newsourcedate < <(echo -n $fullsourceyear; echo -n "-"; echo -n $sourcedate | cut -c -5);
read newsourcedate < <(echo -n $newsourcedate; echo -n $sourcedate | cut -c 9-14);
read newsourcedate < <(echo -n $newsourcedate; echo :00);
date --date="$newsourcedate" +%Y%m%d.%H%M%S
So, the first line just reads a date in, then we get the two-digit year, then we append it to '20' or '19' based on if it's less than 50 (so this would give you years from 1950 to 2049 - feel free to shift the line). Then we append a hyphen and the month and date. Then we append a space and the time, and lastly we append ':00' as the seconds (again feel free to make your own default). Lastly we use GNU date to read it in (since it's been standardized now) and print it in a different format (which you can edit).
It's a lot longer and uglier than cutting up the string, but having the format in the last line may be worth it. Also you could shorten it significantly with the shorthand you just learned in the first answer.
Good luck.
I have this awk script that runs through a file and counts every occurrence of a given date. The date format in the original file is the standard date format, like this: Thu Mar 5 16:46:15 EST 2009. I use awk to throw away the weekday, time, and timezone, and then do my counting by pumping the dates into an associative array with the dates as indices.
In order to get the output to be sorted by date, I converted the dates to a different format that I could sort with bash sort.
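Roughly, the counting step looks like this (a simplified sketch of the idea, not the exact script):
# key on "Month Day Year", dropping the weekday, time and timezone fields
awk '{ count[$2" "$3" "$6]++ }
     END { for (d in count) print d, count[d] }' original.txt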
Now, my output looks like this:
Date Count
03/05/2009 2
03/06/2009 1
05/13/2009 7
05/22/2009 14
05/23/2009 7
05/25/2009 7
05/29/2009 11
06/02/2009 12
06/03/2009 16
I'd really like the output to have more human readable dates, like this:
Mar 5, 2009
Mar 6, 2009
May 13, 2009
May 22, 2009
May 23, 2009
May 25, 2009
May 29, 2009
Jun 2, 2009
Jun 3, 2009
Any suggestions for a way I could do this? If I could do this on the fly when I output the count values that would be best.
UPDATE:
Here's my solution incorporating ghostdog74's example code:
grep -i "E[DS]T 2009" original.txt | awk '{printf "%s %2.d, %s\r\n",$2,$3,$6}' >dates.txt #outputs dates for counting
date -f dates.txt +'%Y %m %d' | awk ' #reformat dates as YYYYMMDD for future sort
{++total[$0]} #pump dates into associative array
END {
for (item in total) printf "%s\t%s\r\n", item, total[item] #output dates as yyyy mm dd with counts
}' | sort -t \t | awk ' #send to sort, then to cleanup
BEGIN {printf "%s\t%s\r\n","Date","Count"}
{t=$1" "$2" "$3" 0 0 0" #cleanup using example by ghostdog74
printf "%s\t%2.d\r\n",strftime("%b %d, %Y",mktime(t)),$4
}'
rm dates.txt
Sorry this looks so messy. I've tried to put clarifying comments in.
Use awk's sort and date's stdin to greatly simplify the script
Date will accept input from stdin so you can eliminate one pipe to awk and the temporary file. You can also eliminate a pipe to sort by using awk's array sort and as a result, eliminate another pipe to awk. Also, there's no need for a coprocess.
This script uses date for the monthname conversion which would presumably continue to work in other languages (ignoring the timezone and month/day order issues, though).
The end result looks like "grep|date|awk". I have broken it into separate lines for readability (it would be about half as big if the comments were eliminated):
grep -i "E[DS]T 2009" original.txt |
date -f - +'%Y %m %d' | #reformat dates as YYYYMMDD for future sort
awk '
BEGIN { printf "%s\t%s\r\n","Date","Count" }
{ ++total[$0] }  # pump dates into associative array
END {
idx=1
for (item in total) {
d[idx]=item;idx++ # copy the array indices into the contents of a new array
}
c=asort(d) # sort the contents of the copy
for (i=1;i<=c;i++) { # use the contents of the copy to index into the original
printf "%s\t%2.d\r\n",strftime("%b %e, %Y",mktime(d[i]" 0 0 0")),total[d[i]]
}
}'
I get testy when I see someone using grep and awk (and sed, cut, ...) in a pipeline. Awk can fully handle the work of many utilities.
Here's a way to clean up your updated code to run in a single instance of awk (well, gawk), and using sort as a co-process:
gawk '
BEGIN {
IGNORECASE = 1
}
function mon2num(mon) {
return(((index("JanFebMarAprMayJunJulAugSepOctNovDec", mon)-1)/3)+1)
}
/ E[DS]T [[:digit:]][[:digit:]][[:digit:]][[:digit:]]/ {
month=$2
day=$3
year=$6
date=sprintf("%4d%02d%02d", year, mon2num(month), day)
total[date]++
human[date] = sprintf("%3s %2d, %4d", month, day, year)
}
END {
sort_coprocess = "sort"
for (date in total) {
print date |& sort_coprocess
}
close(sort_coprocess, "to")
print "Date\tCount"
while ((sort_coprocess |& getline date) > 0) {
print human[date] "\t" total[date]
}
close(sort_coprocess)
}
' original.txt
if you are using gawk
awk 'BEGIN{
s="03/05/2009"
m=split(s,date,"/")
t=date[3]" "date[2]" "date[1]" 0 0 0"
print strftime("%b %d",mktime(t))
}'
the above is just an example; as you did not show your actual code, I cannot incorporate it into your code.
Why don't you prepend your awk-formatted date to the human-readable date? This yields a sortable key but keeps the line human readable.
(Note: to sort correctly, you should make the key yyyymmdd.)
If needed, cut can remove the prepended column afterwards.
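A sketch of that idea, assuming count lines such as "Mar 5 2009 2" (the layout and the counts.txt filename are made up):
# prefix a yyyymmdd sort key, sort on it, then cut the key back off
awk 'BEGIN { m = "JanFebMarAprMayJunJulAugSepOctNovDec" }
     { key = sprintf("%04d%02d%02d", $3, (index(m, $1) + 2) / 3, $2)
       print key, $1" "$2", "$3"\t"$4 }' counts.txt | sort -k1,1 | cut -d" " -f2-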
Gawk has strftime(). You can also call the date command to format them (see its man page). Linux Forums gives some examples.
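For example, GNU date can render a single date directly in the desired style:
$ date -d 2009-03-05 '+%b %-d, %Y'
Mar 5, 2009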