Get lines from a specific date afterwards/backwards - shell

I'm working on a shell script. I need to get the lines that contain information from a given day or older. In the file each line is a record; the first line is the oldest and the last is the newest.
The file contains:
some info \t the date \t other info
If I simply grep the given date I find what I'm looking for, but only if that date is actually present in the file: I find its last occurrence and take the lines from the start of the file up to it. I tried awk but totally failed. It should give me every line that contains that date or an older one. My failed last attempt:
awk '$1 <= "2015/03/17"'
So I need something similar to egrep, but which gives me all lines with the date 2015/03/15 or older. Or do I have to go through each line and compare the two dates, and depending on the result write out the line if it's older?

Should be pretty easy, since the date format you are using can be compared as plain strings.
File: test.txt
a 2015/03/17 b
c 2015/03/12 d
e 2014/02/10 f
g 2016/01/01 h
Awk command:
awk -v d="2015/03/15" '{if ($2 <= d) {print $0}}' test.txt
Just change d= to whatever date value you want.
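A quick way to convince yourself that yyyy/mm/dd strings compare correctly as plain text (sample data inlined with printf instead of test.txt):

```shell
# yyyy/mm/dd puts the most significant part first and zero-pads every
# field, so plain string comparison matches chronological order.
printf '%s\n' \
  "a 2015/03/17 b" \
  "c 2015/03/12 d" \
  "e 2014/02/10 f" \
  "g 2016/01/01 h" |
awk -v d="2015/03/15" '$2 <= d'
```

This prints only the two records dated 2015/03/15 or earlier.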

Related

Awk - filter only dates with certain format from text file

I have a .txt file with many lines of text on macOS. I would like to filter only dates and have them saved in order of appearance line by line in a new text file.
I am, however, not interested in all dates, only in those that are complete, looking like 02/03/2019, and whose day number is below 13, i.e. 01…12.
Then, I would like to have those dates removed where the numbers for the day and the month are the same, like 01/01/2019 and 02/02/2019, etc.
How can I achieve this with awk or similar software in bash?
If perl is a choice:
perl -ne 'print if m:(\d\d)/(\d\d)/(\d\d\d\d): && $1 < 13 && $1 != $2' dates.txt >newdates.txt
This assumes the format dd/mm/yyyy.
Note that I am using a m: : notation instead of the usual / / for regex matching. Thus I do not need to escape the / slashes in the date.
Deleting Dates Inside a Text File
The following command will delete all dates of the form✱ aa/bb/cccc where aa = bb < 13. The original file will be copied to yourFile.txt.bak as a backup and the new text with deleted dates will overwrite the old file.
sed -E -i.bak 's:\b(0[0-9]|1[0-2])/\1/[0-9]{4}\b::g' yourFile.txt
If you want to insert something instead of just deleting the dates, you can do so by writing the replacement between the two colons. For instance sed … 's:…:deleted date:g' … will replace each matching date with the text deleted date.
✱ Note that it doesn't matter for your criterion whether the date format is dd/mm/yyyy or mm/dd/yyyy, since you are only interested in dates where dd and mm are equal.
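A runnable sketch of the replacement variant, with the input inlined via printf instead of a file. (Note that `\b` is a GNU sed extension; on BSD/macOS sed you would use `[[:<:]]` and `[[:>:]]` instead.)

```shell
# Replace self-matching dates (dd == mm < 13) with a marker instead of
# deleting them; only the first sample line matches.
printf '%s\n' "start 02/02/2019 end" "keep 02/03/2019 here" |
  sed -E 's:\b(0[0-9]|1[0-2])/\1/[0-9]{4}\b:[date removed]:g'
```

The first line becomes "start [date removed] end"; the second is left untouched because day and month differ.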
Extracting Specific Dates From A Text File
If you do not want to delete, but only extract specific dates as mentioned in your comment, you can use the following command.
grep -Eo '\b([0-9]{2}/){2}[0-9]{4}\b' yourFile.txt | awk -F/ '$1<13 && $1!=$2'
This will extract all dates in dd/mm/yyyy (!) format where mm ≠ dd < 13. The dates are printed to stdout in order of appearance. If you want to save them to a file, append > yourOutputFile.txt to the end of the command.

Adding months using shell script

Currently I have a below record in a file.
ABC,XYZ,123,Sep-2018
Looking for a command in Linux which will add months and give the output. For example, if I want to add 3 months, the expected output is:
ABC,XYZ,123,Dec-2018
Well,
date -d "1-$(echo "ABC,XYZ,123,Sep-2018" | awk -F "," '{ print $4 }')+3 months" "+%b-%Y"
Shows you how to get it working. Just replace the echo with a shell variable as you loop through the dates.
Basically, you use awk to grab just the date portion, prepend 1- to turn it into a real date, then use the date command to do the math and format the result as just the month abbreviation and year.
The command above produces only the new date portion. The leading fields can be captured separately using:
stub=`echo "ABC,XYZ,123,Dec-2018" | awk -F "," '{ printf("%s,%s,%s,",$1,$2,$3) }'`
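Putting the two parts together, a sketch that rewrites a whole file of such records (the here-doc stands in for the real input file; requires GNU date for the -d option):

```shell
#!/bin/bash
# Shift the month in the 4th CSV field forward by 3 months.
# Requires GNU date (-d); the sample records are inlined via a here-doc.
export LC_ALL=C
while IFS= read -r line; do
    stub=$(echo "$line" | awk -F, '{ printf("%s,%s,%s,",$1,$2,$3) }')
    old=$(echo "$line"  | awk -F, '{ print $4 }')
    new=$(date -d "1-${old} +3 months" "+%b-%Y")
    echo "${stub}${new}"
done <<'EOF'
ABC,XYZ,123,Sep-2018
ABC,XYZ,123,Nov-2018
EOF
```

With the sample input this prints ABC,XYZ,123,Dec-2018 and ABC,XYZ,123,Feb-2019.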
You can use external date or (g)awk's datetime related function to do it. However you have to prepare the string to parse. Here is another way to do the job:
First prepare an index file, we name it month.txt:
Jan
Feb
Mar
Apr
May
Jun
Jul
Aug
Sep
Oct
Nov
Dec
Then run this:
awk -F'-|,' -v OFS="," 'NR==FNR{m[NR]=$1;a[$1]=NR;next}
{i=a[$4]; if(i==12){i=1;++$5}else i++
$4=m[i]"-"$5;NF--}7' month.txt file
With this example file:
ABC,XYZ,123,Jan-2018
ABC,XYZ,123,Nov-2018
ABC,XYZ,123,Dec-2018
You will get:
ABC,XYZ,123,Feb-2018
ABC,XYZ,123,Dec-2018
ABC,XYZ,123,Jan-2019
update
Oh, I didn't notice that you want to add 3 months. Here is the updated code:
awk -F'-|,' -v OFS="," 'NR==FNR{m[NR]=$1;a[$1]=NR;next}
{i=a[$4]+3; if(i>12){i=i-12;++$5}
$4=m[i]"-"$5;NF--}7' month.txt file
Now with the same input, you get:
ABC,XYZ,123,Apr-2018
ABC,XYZ,123,Feb-2019
ABC,XYZ,123,Mar-2019

How to strip date in csv output using shell script?

I have a few csv extracts that I am trying to fix up the date on, they are as follows:
"Time Stamp","DBUID"
2016-11-25T08:28:33.000-8:00,"5tSSMImFjIkT0FpiO16LuA"
The first column is always the "Time Stamp", I would like to convert this so it only keeps the date "2016-11-25" and drops the "T08:28:33.000-8:00".
The end result would be..
"Time Stamp","DBUID"
2016-11-25,"5tSSMImFjIkT0FpiO16LuA"
There are plenty of files with different dates.
Is there a way to do this in ksh? Some kind of for each loop to loop through all the files and replace the long time-stamp and leave just the date?
Use sed:
$ sed '2,$s/T[^,]*//' file
"Time Stamp","DBUID"
2016-11-25,"5tSSMImFjIkT0FpiO16LuA"
How it works:
2,$ # Skip header (first line) removing this will make a
# replacement on the first line as well.
s/T[^,]*// # Replace everything between T (inclusive) and , (exclusive)
# `[^,]*' Matches everything but `,' zero or more times
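To cover the looping part of the question, the same sed edit can be applied to every CSV in a directory. A sketch, assuming the files match *.csv and using GNU sed's -i.bak to keep a backup of each original:

```shell
#!/bin/sh
# Trim the time-of-day from column 1 of every .csv file in the
# current directory, leaving a .bak copy of each original.
for f in *.csv; do
    sed -i.bak '2,$s/T[^,]*//' "$f"
done
```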
Here's one solution using a standard AIX utility:
awk -F, -v OFS=, 'NR>1{sub(/T.*$/,"",$1)}1' file > file.cln && mv file.cln file
output
"Time Stamp","DBUID"
2016-11-25,"5tSSMImFjIkT0FpiO16LuA"
(but I no longer have access to an aix environment, so only tested with my local awk).
NR>1 skips the header line, and the sub() is limited to only the first field (up to the first comma). The trailing 1 char is awk shorthand for {print $0}.
If your data layout changes and you get extra commas in your data, this may require fixing.
IHTH
Using sed:
sed -i "s/\([0-9]\{4\}\)-\([0-9]\{2\}\)-\([0-9]\{2\}\).*,/\1-\2-\3,/" file.csv
Output:
"Time Stamp","DBUID"
2016-11-25,"5tSSMImFjIkT0FpiO16LuA"
-i edit files inplace
s substitute
This is a perfect job for awk, but unlike the previous answer, I recommend using the substring function.
awk -F, 'NR > 1{$1 = substr($1,1,10)} {print $0}' file.txt
Explanation
-F,: The -F flag sets a field separator, in this case a comma
NR > 1: Ignore the first row
$1: Refers to the first field
$1 = substr($1,1,10): Sets the first field to the first 10 characters of the field. In the example, this is the date portion
print $0: This will print the entire row

Bash comparing two different files with different fields

I am not sure if this is possible to do but I want to compare two character values from two different files. If they match I want to print out the field value in slot 2 from one of the files. Here is an example
# File 1
Date D
Tamb B
# File 2
F gge0001x gge0001y gge0001z
D 12-30-2006 12-30-2006 12-30-2006
T 14:15:20 14:15:55 14:16:27
B 15.8 16.1 15
Here is my thought behind the problem I want to do
if [ (field2) from (file1) == (field1) from (file2) ] ; do
echo (field1 from file1) and also (field2 from file2) on the same line
which prints out "Date 12-30-2006"
"Tamb 15.8"
" ... "
and continue through every line of file1, printing out any matches. I am assuming some sort of array will be involved. Any thoughts on whether this logic is correct and whether it is even possible?
This reformats file2 based on the abbreviations found in file1:
$ awk 'FNR==NR{a[$2]=$1;next;} $1 in a {print a[$1],$2;}' file1 file2
Date 12-30-2006
Tamb 15.8
How it works
FNR==NR{a[$2]=$1;next;}
This reads each line of file1 and saves the information in array a.
In more detail, NR is the number of lines that have been read in so far and FNR is the number of lines that have been read in so far from the current file. So, when NR==FNR, we know that awk is still processing the first file. Thus, the array assignment, a[$2]=$1 is only performed for the first file. The statement next tells awk to skip the rest of the code and jump to the next line.
$1 in a {print a[$1],$2;}
Because of the next statement, above, we know that, if we get to this line, we are working on file2.
If field 1 of the current line of file2 matches a field 2 saved from file1, then print a reformatted version of the line.
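Since the question mentions bash: the same lookup-and-join can be sketched without awk, using a bash 4+ associative array (the file names file1 and file2 are those from the example above):

```shell
#!/bin/bash
# Read file1 into an associative array keyed by the one-letter code,
# then print "long-name first-value" for each matching line of file2.
declare -A name
while read -r long code; do
    name[$code]=$long
done < file1
while read -r code value _; do
    [[ -n ${name[$code]} ]] && echo "${name[$code]} $value"
done < file2
```

This prints the same "Date 12-30-2006" / "Tamb 15.8" output, but the awk one-liner is the more idiomatic tool for the job.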

bash delete line condition

I couldn't find a solution to conditionally delete a line in a file using bash. The file contains year dates within strings and the corresponding line should be deleted only if the year is lower than a reference value.
The file looks like the following:
'zg_Amon_MPI-ESM-LR_historical_r1i1p1_196001-196912.nc' 'MD5'
'zg_Amon_MPI-ESM-LR_historical_r1i1p1_197001-197912.nc' 'MD5'
'zg_Amon_MPI-ESM-LR_historical_r1i1p1_198001-198912.nc' 'MD5'
'zg_Amon_MPI-ESM-LR_historical_r1i1p1_199001-199912.nc' 'MD5'
'zg_Amon_MPI-ESM-LR_historical_r1i1p1_200001-200512.nc' 'MD5'
I want to get the year 1969 from line 1 and compare it to a reference (let's say 1980) and delete the whole line if the year is lower than the reference. This means in this case the code should remove the first two lines of the file.
I tried with sed and grep, but couldn't get it working.
Thanks in advance for any ideas.
You can use awk:
awk -F- '$4 > 198000 {print}' filename
This will output all the lines where the second date is later than 31/12/1979. (Note that $4 is actually a string like 196912.nc', so the comparison is lexical rather than numeric; it works here because the six-digit date prefix is fixed-width.) This will not edit the file in-place; you would have to save the output to another file and then move that in place of the original:
awk -F- '$4 > 198000 {print}' filename > tmp && mv tmp filename
Using sed (will edit in-place):
sed -i '/.*19[0-7][0-9]..\.nc/d' filename
This requires a little more thought, in that you will need to construct a regex to match any values which you don't want to be displayed.
Perhaps something like this:
awk -F- '{ if (substr($4,1,4) >= 1980) print }' input.txt

Resources