How to grep a file from a specific date to EOF with awk - bash

I have a small problem printing data from a file, from a given date to the end of the file. The file looks like this:
2016/08/10-12:45:14.970000 <some_data> <some_data> ...
2016/08/10-12:45:15.970000 <some_data> <some_data> ...
2016/08/10-12:45:18.970000 <some_data> <some_data> ...
2016/08/10-12:45:19.970000 <some_data> <some_data> ...
The file has hundreds of lines.
I have to print the file from one point in time to the end of the file, but I don't know the precise time at which a row appears in the log.
For example, I need to print the data from 2016/08/10-12:45:16 to the end of the file, so I want to receive output like this:
2016/08/10-12:45:18.970000
2016/08/10-12:45:19.970000
OK, if I know the exact date from which I want to print data, everything is easy:
awk '/<start_time>/,/<end/'
awk '/2016\/08\/10-12:45:18/,/<end/'
But if I don't know the exact date and only know an approximate one, 2016/08/10-12:45:16, it's harder.
Can anyone please help me?

You can take advantage of the fact that your timestamp format compares the same way lexicographically as it does chronologically, so a plain string comparison works. With awk the command can look like this:
awk -v start='2016/08/10-12:45:16' '$1>=start' file
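On the sample lines above this prints the rows at :18 and :19 and skips the earlier ones. If you also had an approximate end time, the same string comparison extends to a range check; a minimal sketch, with a hypothetical end value:
awk -v start='2016/08/10-12:45:16' -v end='2016/08/10-12:45:20' '$1>=start && $1<=end' file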

You can use gawk's mktime function to compare the times:
awk -v TIME="2016/08/10-12:45:16" '
BEGIN{
    # turn "2016/08/10-12:45:16" into "2016 08 10 12 45 16", the datespec mktime() expects
    gsub("[/:-]"," ",TIME)
    reftime=mktime(TIME)
}
{
    t=$1
    sub(/\.[0-9]*$/,"",t)    # drop the fractional seconds
    gsub("[/:-]"," ",t)      # same format conversion as for the reference time
    if(mktime(t)>reftime)
        print
}' file
This script takes your reference time, converts it into a number of seconds, and then compares it to the time found on each line of the file.
Note that the sub and gsub calls only convert your specific time format into the time format understood by awk's mktime (a gawk extension).
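If it helps to see the intermediate step, here is a tiny sketch (gawk assumed) showing what that conversion feeds to mktime; it prints the converted datespec followed by the corresponding epoch value (which depends on your timezone):
gawk 'BEGIN{t="2016/08/10-12:45:16"; gsub("[/:-]"," ",t); print t " -> " mktime(t)}'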

You should be able to do this simply with awk:
awk '{m = "2016/08/10-12:45:18"} $0 ~ m,0 {print}' file
If you weren't sure of the exact time or date you could do:
awk '{m = "2016/08/10-12:45:1[6-8]"} $0 ~ m,0 {print}' file
This should print from your specified date and time, around 12:45 plus 16-18 seconds, to the file's end. The character class [6-8] turns the last digit of the seconds into a range, so the pattern matches 12:45:16, 12:45:17, or 12:45:18.
Output:
2016/08/10-12:45:18.970000 somedata 3
2016/08/10-12:45:19.970000 somedata 4

Related

Bash: Obtain epoch time from date time string taken from file using awk

First of all, sorry, as the title might not be a good one. I can't really think of a better title, but I'll try to explain as much as I can here, so here goes:
I have a file called timeInfo which contains date-time strings in the following format:
2018-06-05 00:35:51 Controller shutdown process initiated
2018-06-05 05:32:22 Controller startup process initiated
...
...
Now what I'm trying to do is get this time, convert it into epoch seconds, and store it in a temporary file tempFile. What I've tried so far is:
# $file points to the timeInfo file
echo `grep 'Controller startup' $file | date -d "`awk '{ print $1,$2 }'`" >> $TEMP_FILE`
On using this I get the following error,
command substitution: line 73: unexpected EOF while looking for matching `"'
Then I tried a different approach and used the following code,
echo `grep "Controller startup" $file | awk '{print $1,$2}' >> $TEMP_FILE`
With this I get a file tempFile with the following info,
2018-06-06 00:35:31
2018-06-06 00:51:32
This seems much better, but I need to have it in epoch format! Is there a way I can change the above script to save the date-time string in epoch format inside tempFile?
Hoping to hear your suggestion! Thank you
This may be what you want; it needs gawk:
$ awk '{t=$1 FS $2; gsub(/[-:]/," ",t); print mktime(t)}' file
1528173351
1528191142
Or perhaps this:
$ awk '/Controller (startup|shutdown)/{t=$1 FS $2;
gsub(/[-:]/," ",t);
print mktime(t)}' file
cat logfile | awk '{print($1,$2)}' | xargs -n 1 -I_ date +'%s' --date=_
Cat the file, use awk to print the first two fields, then use xargs to pass one line at a time to the date command, using the %s format to get the epoch.
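If gawk is not available, a minimal sketch of the same idea using plain bash and GNU date, reusing the $file and $TEMP_FILE variables from the question:
# read the date and time fields of each matching line and let date(1) convert them to epoch seconds
grep 'Controller startup' "$file" | while read -r d t _; do
    date -d "$d $t" +%s
done >> "$TEMP_FILE"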

Adding months using shell script

Currently I have the below record in a file.
ABC,XYZ,123,Sep-2018
I am looking for a command in Linux which will add months and give the output. For example, if I want to add 3 months, the expected output is:
ABC,XYZ,123,Dec-2018
Well,
date -d "1-$(echo "ABC,XYZ,123,Sep-2018" | awk -F "," '{ print $4 }')+3 months" "+%b-%Y"
This shows you how to get it working. Just replace the echo with a shell variable as you loop through the dates.
Basically, you use awk to grab just the date portion, add a 1- to the front to turn it into a real date, then use the date command to do the math, and finally tell it to give you just the month abbreviation and year.
The line above produces just the new date portion. The first part of the record can be captured with:
stub=`echo "ABC,XYZ,123,Dec-2018" | awk -F "," '{ printf("%s,%s,%s,",$1,$2,$3) }'`
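Putting the two pieces together, a hedged sketch of the loop, assuming GNU date and a hypothetical input file records.csv with one such row per line:
while IFS=, read -r f1 f2 f3 f4; do
    newdate=$(date -d "1-$f4 +3 months" "+%b-%Y")
    echo "$f1,$f2,$f3,$newdate"
done < records.csv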
You can use the external date command or (g)awk's datetime-related functions to do it; however, you then have to prepare the string for parsing. Here is another way to do the job:
First prepare an index file; we name it month.txt:
Jan
Feb
......
...
Nov
Dec
Then run this:
awk -F'-|,' -v OFS="," 'NR==FNR{m[NR]=$1;a[$1]=NR;next}
{i=a[$4]; if(i==12){i=1;++$5}else i++
$4=m[i]"-"$5;NF--}7' month.txt file
With this example file:
ABC,XYZ,123,Jan-2018
ABC,XYZ,123,Nov-2018
ABC,XYZ,123,Dec-2018
You will get:
ABC,XYZ,123,Feb-2018
ABC,XYZ,123,Dec-2018
ABC,XYZ,123,Jan-2019
Update:
Oh, I didn't notice that you want to add 3 months. Here is the updated code for it:
awk -F'-|,' -v OFS="," 'NR==FNR{m[NR]=$1;a[$1]=NR;next}
{i=a[$4]+3; if(i>12){i=i-12;++$5}
$4=m[i]"-"$5;NF--}7' month.txt file
Now with the same input, you get:
ABC,XYZ,123,Apr-2018
ABC,XYZ,123,Feb-2019
ABC,XYZ,123,Mar-2019
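In case it saves some typing, one hedged way to generate month.txt, assuming GNU date and an English locale for the %b abbreviations:
for m in $(seq 1 12); do date -d "$(printf '2018-%02d-01' "$m")" +%b; done > month.txt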

How to strip date in csv output using shell script?

I have a few CSV extracts whose dates I am trying to fix up; they are as follows:
"Time Stamp","DBUID"
2016-11-25T08:28:33.000-8:00,"5tSSMImFjIkT0FpiO16LuA"
The first column is always the "Time Stamp". I would like to convert it so that it only keeps the date "2016-11-25" and drops the "T08:28:33.000-8:00".
The end result would be:
"Time Stamp","DBUID"
2016-11-25,"5tSSMImFjIkT0FpiO16LuA"
There are plenty of files with different dates.
Is there a way to do this in ksh? Some kind of for-each loop to loop through all the files, replace the long timestamp, and leave just the date?
Use sed:
$ sed '2,$s/T[^,]*//' file
"Time Stamp","DBUID"
2016-11-25,"5tSSMImFjIkT0FpiO16LuA"
How it works:
2,$         # Apply only from line 2 to the end, skipping the header; removing
            # this address would make the replacement on the first line as well.
s/T[^,]*//  # Replace everything between T (inclusive) and , (exclusive);
            # `[^,]*' matches everything but `,' zero or more times.
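Since the question asks about looping over all the files, here is a hedged sketch that applies the same sed command to every CSV in the current directory (assuming the extracts match *.csv); it works the same in ksh and bash:
for f in *.csv; do
    sed '2,$s/T[^,]*//' "$f" > "$f.tmp" && mv "$f.tmp" "$f"
done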
Here's one solution using a standard AIX utility, awk:
awk -F, -v OFS=, 'NR>1{sub(/T.*$/,"",$1)}1' file > file.cln && mv file.cln file
Output:
"Time Stamp","DBUID"
2016-11-25,"5tSSMImFjIkT0FpiO16LuA"
(I no longer have access to an AIX environment, so this is only tested with my local awk.)
NR>1 skips the header line, and the sub() is limited to the first field only (up to the first comma). The trailing 1 is awk shorthand for {print $0}.
If your data layout changes and you get extra commas in your data, this may require fixing.
IHTH
Using sed:
sed -i "s/\([0-9]\{4\}\)-\([0-9]\{2\}\)-\([0-9]\{2\}\).*,/\1-\2-\3,/" file.csv
Output:
"Time Stamp","DBUID"
2016-11-25,"5tSSMImFjIkT0FpiO16LuA"
-i edit files in place
s substitute
This is a perfect job for awk, but unlike the previous answer, I recommend using the substring function.
awk -F, -v OFS=, 'NR > 1{$1 = substr($1,1,10)} {print $0}' file.txt
Explanation
-F,: The -F flag sets the input field separator, in this case a comma
-v OFS=,: Sets the output field separator to a comma as well; this is needed because assigning to $1 makes awk rebuild the record using OFS (a space by default)
NR > 1: Ignore the first row
$1: Refers to the first field
$1 = substr($1,1,10): Sets the first field to its first 10 characters, which in this example is the date portion
print $0: This prints the entire row
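With the OFS setting in place, the output on the sample extract should look like:
"Time Stamp","DBUID"
2016-11-25,"5tSSMImFjIkT0FpiO16LuA"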

bash delete line condition

I couldn't find a solution for conditionally deleting a line in a file using bash. The file contains years within strings, and the corresponding line should be deleted only if the year is lower than a reference value.
The file looks like the following:
'zg_Amon_MPI-ESM-LR_historical_r1i1p1_196001-196912.nc' 'MD5'
'zg_Amon_MPI-ESM-LR_historical_r1i1p1_197001-197912.nc' 'MD5'
'zg_Amon_MPI-ESM-LR_historical_r1i1p1_198001-198912.nc' 'MD5'
'zg_Amon_MPI-ESM-LR_historical_r1i1p1_199001-199912.nc' 'MD5'
'zg_Amon_MPI-ESM-LR_historical_r1i1p1_200001-200512.nc' 'MD5'
I want to get the year 1969 from line 1 and compare it to a reference (let's say 1980) and delete the whole line if the year is lower than the reference. This means in this case the code should remove the first two lines of the file.
I tried with sed and grep, but couldn't get it working.
Thanks in advance for any ideas.
You can use awk:
awk -F- '$4 > 198000 {print}' filename
This will output all the lines where the second date is later than 31/12/1979. It will not edit the file in place; you would have to save the output to another file and then move that in place of the original:
awk -F- '$4 > 198000 {print}' filename > tmp && mv tmp filename
Using sed (will edit in-place):
sed -i '/.*19[0-7][0-9]..\.nc/d' filename
This requires a little more thought, in that you will need to construct a regex to match any values which you don't want to be displayed.
Perhaps something like this:
awk -F- '{ if (substr($4,1,4) >= 1980) print }' input.txt
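A hedged variant that takes the reference year from a shell variable, assuming the end year is always the first four digits of the fourth dash-separated field:
ref=1980
awk -F- -v ref="$ref" 'substr($4,1,4)+0 >= ref' filename > tmp && mv tmp filename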

Grepping progressively through large file

I have several large data files (~100MB-1GB of text) and a sorted list of tens of thousands of timestamps that index data points of interest. The timestamp file looks like:
12345
15467
67256
182387
199364
...
And the data file looks like:
Line of text
12345 0.234 0.123 2.321
More text
Some unimportant data
14509 0.987 0.543 3.600
More text
15467 0.678 0.345 4.431
The data in the second file is all in order of timestamp. I want to grep through the second file using the time stamps of the first, printing the timestamp and fourth data item in an output file. I've been using this:
grep -wf time.stamps data.file | awk '{print $1 "\t" $4 }' >> output.file
This is taking on the order of a day to complete for each data file. The problem is that this command searches through the entire data file for every line in time.stamps, but I only need the search to pick up from the last data point. Is there any way to speed up this process?
You can do this entirely in awk: read the timestamps into an array in one pass, then make a single pass over the data file, printing a line whenever its first field is in that array …
awk 'NR==FNR{a[$1]++;next}($1 in a){print $1,$4}' timestampfile datafile
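On the sample timestamp and data files above, this should print something like:
12345 2.321
15467 4.431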
JS웃's awk solution is probably the way to go. If join is available and the first field of the irrelevant "data" is not numeric, you could exploit the fact that the files are in the same order and avoid a sorting step. This example uses bash process substitution on Linux:
join -o2.1,2.4 -1 1 -2 1 key.txt <(awk '$1 ~ /^[[:digit:]]+$/' data.txt)
grep has a little-used option, -f filename, which reads the patterns from filename and does the matching. It is likely to beat the awk solution, and your timestamps would not have to be sorted.
