Last Day of Month in csvfile - bash

I'm trying to delete all rows of a CSV file that don't fall on the last day of a month, but I can't find the right solution.
date,price
2018-07-02,162.17
2018-06-29,161.94
2018-06-28,162.22
2018-06-27,162.32
2018-06-12,163.01
2018-06-11,163.53
2018-05-31,164.87
2018-05-30,165.59
2018-05-29,165.42
2018-05-25,165.96
2018-05-02,164.94
2018-04-30,166.16
2018-04-27,166.69
The output I want is:
date,price
2018-06-29,161.94
2018-05-31,164.87
2018-04-30,166.16
I tried it with cut + grep:
cut -d, -f1 file.csv | grep -E "28|29|30"
This works, but returns nothing when I combine fields with -f1,2.
I found csvkit, which seems like the right tool, but I could not find a way to match multiple patterns.
csvgrep -c 1 -m 30 file.csv
This gives me the right result, but how can I combine multiple search options? I tried -m 28,29,30 and -m 28 -m 29 -m 30, but neither works. Ideally it would match the last day of every month.
Maybe someone here has an idea.
Thank you, and have a nice Sunday
Silvio

You want to get all records for the LAST day of each month. But months vary in length (28, 29, 30, or 31 days).
I don't see why you used cut to extract the first field (the date part): the second field holds prices, which cannot be mistaken for dates.
I suggest using grep directly to display the lines that match the pattern mm-dd, where mm is the month number and dd is the last day of that month.
This command should do the trick:
grep -E "01-31|02-(28|29)|03-31|04-30|05-31|06-30|07-31|08-30|09-31|10-30|11-31|12-30" file.csv
This command will give the following output:
2018-05-31,164.87
2018-04-30,166.16
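This only matches literal month-end dates, so it misses 2018-06-29, which is the newest June date actually present in the file. If what you want is the newest row per month present in the data, here is a minimal awk sketch, assuming the file is sorted newest-first as in your sample (the first row seen for each month is then that month's latest date):
awk -F, 'NR == 1 { print; next }   # keep the header
  { m = substr($1, 1, 7) }         # month key, e.g. 2018-06
  m != prev { print; prev = m }    # first row seen per month = newest date
' file.csv
This also prints 2018-07-02 for the still-running month of July; drop that row if you only want completed months. (And if you prefer to stay with csvkit, csvgrep also accepts a regular expression via -r instead of a fixed string via -m.)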

Related

Bash function to "use" the most recent "dated" file in a dir

I have a dir with a crap load (hundreds) of log files from over time. In certain cases I want to make a note regarding the most recent log (by date in the filename, not by creation time), or I just need some piece of info from it and want to view it quickly; I just know it is (usually) the last one created, but always the one with the newest date. So I wanted to make a "simple" function in my bashrc to overcome this problem. Basically, what I want is a function that goes to a specific dir, finds the latest log by date (always in the same format), and opens it with less or whatever pager I want.
The logs are formatted like this:
typeoflog-short-description-$(date "+%-m-%-d-%y")
basically the digits in between the last 3 dashes are what I'm interested in, for example(s):
update-log-2-24-18
removed-cuda-opencl-nvidia-12-2-19
whatever-changes-1-18-19
Now, if it were January 20, 2019 and this was the last log added to the dir, I need a way to find the highest number in the last two digits of the filename (that I don't really have a problem with), then check for the highest month, which would be two dashes back from the last set of digits, whether the month is one digit or two, then do the same for the day of the month, set the result as a local variable, and use it like the following example.
Something like this:
viewlatestlog(){
    local loc="$HOME/.logdir"  # ~ does not expand inside quotes, so use $HOME
    local name
    name=$(echo "$loc"/*-19)   # awk or cut or sort, or I could even loop from 1-31 and 1-12 for the days and months
    # I have ideas, but I know there has to be a better way to do this and it's not coming to me, maybe with expr or a couple of sort commands; I'm not sure. It would have been easier if I had made each date number always have 2 digits... But I didn't.
    ## But the ultimate goal is that I can run something like this command at the end:
    less "$loc/$name"
}
PS. For bonus points, you could also tell me if there is a way to automatically copy the filename (with the location and all, or without, I don't really care) to my Linux clipboard, so that when I'm making my note I can "link" to the log file if I ever need to go back to it...
Edit: Cleaned up the post a little bit; I tend to make my questions way too wordy. I apologize.
GNU sort can sort by fields:
$ find . -name whatever-changes-\* | sort -n -t- -k5 -k3 -k4
./whatever-changes-3-01-18
./whatever-changes-1-18-19
./whatever-changes-2-12-19
./whatever-changes-11-01-19
The option -t specifies the field delimiter, and -k selects the sort keys (field numbering starts at 1). The option -n requests numeric sorting. Note that the field positions depend on how many dashes precede the date, so this works only for filenames sharing a fixed prefix.
Assuming your filenames do not contain tabs or newlines, how about:
loc="~/.logdir"
for f in "$loc"/* ; do
if [[ $f =~ -([0-9]{1,2})-([0-9]{1,2})-([0-9]{2})$ ]]; then
mm=${BASH_REMATCH[1]}
dd=${BASH_REMATCH[2]}
yy=${BASH_REMATCH[3]}
printf "%02d%02d%02d\t%s\n" "$yy" "$mm" "$dd" "$f"
fi
done | sort -r | head -n 1 | cut -f 2
First, extract the month, day, and year from the filename.
Then build a date string formatted as "YYMMDD" and prepend it to the
filename, delimited by a tab character.
Then you can run the sort command on the list.
Finally, you can obtain the desired (latest) filename by extracting it with head and cut.
Hope this helps.
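As for the PS: a minimal sketch of the clipboard part, assuming an X11 session with xclip installed (xsel --input --clipboard works the same way). It captures the result of the pipeline above into a variable first:
name=$(for f in "$loc"/* ; do
    [[ $f =~ -([0-9]{1,2})-([0-9]{1,2})-([0-9]{2})$ ]] &&
        printf "%02d%02d%02d\t%s\n" "${BASH_REMATCH[3]}" "${BASH_REMATCH[1]}" "${BASH_REMATCH[2]}" "$f"
done | sort -r | head -n 1 | cut -f 2)
printf '%s' "$name" | xclip -selection clipboard   # now pasteable into your note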

Pick Oldest file on basis of date in Name of the file

I am stuck in a situation where I have a bunch of files and need to pick the oldest one based only on the time present in the name, not on the file's timestamp, because I am doing SCP from one system to another, so the modification times will all be the same once SCP runs.
I have files like
UAT-2019-03-21-16-31.csv
UAT-2019-03-21-17-01.csv
AIT-2019-03-21-17-01.csv
Here, 2019 represents the year, 03 the month, 21 the day, 16 the hours in 24-hour format, and 31 the minutes.
I need to pick the UAT-2019-03-21-16-31.csv file from the above files first.
How can I do this in shell scripting?
I tried ls -1, but it sorts alphabetically, which means AIT-2019-03-21-17-01.csv would be picked first; I need the order according to the time in the file name.
You can try this
ls -1 | sort -t"-" -k2 -k3 -k4 -k5 -k6 | head -n1
Output :
UAT-2019-03-21-16-31.csv
I'm curious about alternative answers, as I know that parsing ls output is not ideal.
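For the record, a sketch that avoids parsing ls output by letting the shell glob the names and printf print one per line (same sort keys as above):
printf '%s\n' *.csv | sort -t"-" -k2 -k3 -k4 -k5 -k6 | head -n1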
An efficient and robust way to do this is to convert each filename timestamp to epoch time and find the oldest among them.
You need to write a script that does the following, in order:
Get all the filename timestamps into a variable.
Convert each filename timestamp to epoch time.
Find the oldest and get the filename.
The command to convert a filename timestamp to epoch time would be:
date -d"2019-03-21T17:01" +%s
date -d"YYYY-MM-DDTHH:MM" +%s
You can put these steps together in a script.
Hope this helps you start writing it.
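A minimal sketch of those three steps, assuming GNU date and that every filename matches PREFIX-YYYY-MM-DD-HH-MM.csv with no extra dashes in the prefix:
oldest= oldest_epoch=
for f in *-????-??-??-??-??.csv; do
    ts=${f#*-}; ts=${ts%.csv}   # e.g. 2019-03-21-16-31
    epoch=$(date -d "${ts:0:10}T${ts:11:2}:${ts:14:2}" +%s)   # filename time -> epoch seconds
    if [[ -z $oldest_epoch || $epoch -lt $oldest_epoch ]]; then
        oldest_epoch=$epoch
        oldest=$f
    fi
done
echo "$oldest"   # prints UAT-2019-03-21-16-31.csv for the sample files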

tail a log file from a specific line number

I know how to tail a text file with a specific number of lines,
tail -n50 /this/is/my.log
However, how do I make that line count a variable?
Let's say I have a large log file which is appended to daily by some program. All lines in the log file start with a datetime in this format:
Day Mon YY HH:MM:SS
Every day I want to output the tail of the log file, but only the previous day's records. Say this runs just after midnight; I'm not worried about the tail spilling over into the next day.
I just want to be able to work out how many rows to tail, based on the first occurrence of yesterday's date...
Is that possible?
Answering the question in the title, for anyone who comes here that way: head and tail can both accept an offset for how much of the file to skip.
For tail, use -n +num for the line number num to start at
For head, use -n -num for the number of lines not to print
This is relevant to the actual question if you have remembered the number of lines from the previous time you did the command, and then used that number for tail -n +$prevlines to get the next portion of the partial log, regardless of how often the log is checked.
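A minimal sketch of that bookkeeping, using a hypothetical state file ~/.my_log_offset to remember how far the previous run got:
# read the saved offset; default to line 1 on the first run
prevlines=$(cat ~/.my_log_offset 2>/dev/null || echo 1)
# print everything from that line onwards
tail -n "+$prevlines" /this/is/my.log
# save the next starting line for the following run
echo $(( $(wc -l < /this/is/my.log) + 1 )) > ~/.my_log_offset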
Answering the actual question, one way to print everything after a certain line that you can grep for is to use the -A option with a ridiculously large count. This may be more useful than the other answers here, as you can get several days of results. So, to get everything from yesterday and so far today:
grep "^`date -d yesterday '+%d %b %y'`" -A1000000 log_file.txt
You can combine two greps to print everything between two dates.
Note that this relies on the date actually occurring in the log file. Its weakness is that if no events were logged on a day used as a range marker, it will fail to find anything.
To resolve that, you could inject dummy records for the start and end dates and sort the file before grepping. That is probably overkill, though, and the sort may be expensive, so I won't give an example.
I don't think tail has any functionality like this.
You could work out the beginning and ending line numbers using awk, but if you just want to extract those lines from the log file, the simplest way is probably to use grep combined with date. Matching yesterday's date at the beginning of the line should work:
grep "^`date -d yesterday '+%d %b %y'`" < log_file.txt
You may need to adjust the date format to match exactly what you've got in the log file.
You can do it without tail; just grep the rows with the previous day's date (using the same format as the log, Day Mon YY):
grep "$(date -d yesterday '+%d %b %y')" my.log
And if you need the line count, you can append
| wc -l
I worked this out through trial and error by getting the line number of the first line containing the date and the total number of lines, as follows:
lines=$(wc -l < myfile.log)
start=$(grep -n "$datestring" myfile.log | head -n1 | cut -d: -f1)
n=$((lines - start + 1))   # +1 so the matching line itself is included
and then a tail based on that:
tail -n "$n" myfile.log
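Equivalently, you can skip the subtraction entirely, since tail -n +NUM starts printing at line NUM (a sketch, reusing the same $datestring):
start=$(grep -n -m1 "$datestring" myfile.log | cut -d: -f1)   # line number of first match
tail -n "+$start" myfile.log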

How to extract multiple fields with specific character lengths in Bash?

I have a file (test.csv) with a few fields, and what I want is the Title and Path, with 10 characters for the title and a few levels removed from the path. What I have done is use awk to pick the two fields:
$ awk -F "," '{print substr($4, 1, 10)","$6}' test.csv [1]
The three levels of the path that need to be removed are not always the same; it can be /article/17/1/ or /open-organization/17/1, so I can't use substr for field $6.
Here is the result I have:
Title,Path
Be the ope,/article/17/1/be-open-source-supply-chain
Developing,/open-organization/17/1/developing-open-leaders
Wanted result would be:
Title,Path
Be the ope,be-open-source-supply-chain
Developing,developing-open-leaders
The title is fine at 10 characters, but I still need to remove three levels from the path.
I could use the cut command:
cut -d'/' -f5- to remove the "/.../17/1/"
But I'm not sure how this can be piped together with [1].
I tried using a for loop to get the title and the path one by one, but I had difficulty getting the awk command to run one line at a time.
I have spent hours on this with no luck. Any help would be appreciated.
Dummy Data for testing:
test.csv
Post date,Content type,Author,Title,Comment count,Path,Tags,Word count
31 Jan 2017,Article,Scott Nesbitt,Book review: Ours to Hack and to Own,0,/article/17/1/review-book-ours-to-hack-and-own,Books,660
31 Jan 2017,Article,Jason Baker,5 new guides for working with OpenStack,2,/article/17/1/openstack-tutorials,"OpenStack, How-tos and tutorials",419
You can strip the leading path levels with a regex substitution:
stringZ="Be the ope,/article/17/1/be-open-source-supply-chain"
sed -E "s/((\\/\\w+){3}\\/)//" <<< $stringZ
(The bracket expression [^/]+ matches any path component, including ones with hyphens like open-organization, which \w would not.) Note that you would give sed a file argument and the -i option instead of the here-string if you want to edit the file in place.
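Alternatively, you can do everything in a single awk pass rather than piping [1] into anything else; a sketch, which works on your dummy data because the quoted comma in the Tags field comes after the fields we need:
awk -F, 'NR == 1 { print "Title,Path"; next }
  { n = split($6, parts, "/")                 # parts[n] is the final path level
    print substr($4, 1, 10) "," parts[n] }' test.csv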

Grep messages for the last hour [duplicate]

This question already has answers here:
Filter log file entries based on date range
(5 answers)
Closed 6 years ago.
I have an application which emits logs in this format:
00:00:10,799 ERROR [stderr] (http-prfilrjb08/10.1.29.34:8180-9) {}:return code: 500
I need to monitor for new ERRORs in the log file that happened in the last hour. Looking at some tutorials, I've come up with the following grep:
grep "^$(date -d -1 hour +'%H:%M:%S')" /space/log/server.log | grep 'ERROR'
However, nothing is grepped! Can you help me fix it?
Thanks!
You need quotes around the -1 hour, and you also want to drop the minutes and seconds from the format (your current pattern only matches entries from the very first second exactly one hour ago):
grep "^$(date -d '-1 hour' '+%H')" /space/log/server.log | grep 'ERROR'
To also cover the current, still-running hour and to anchor the match to a full timestamp, you can use:
grep -E "^($(date -d '-1 hour' '+%H')|$(date '+%H')):[0-9]{2}:[0-9]{2}" /space/log/server.log | grep 'ERROR'
Let's take a look at the parts
grep -E tells grep to use extended regular expressions (so we don't need to escape all those brackets)
date -d '-1 hour' '+%H' prints the previous hour, and date '+%H' prints the current hour. These are evaluated at runtime and captured in a group, which is why we have the (previous|current) alternation: you probably want data not only from the previous hour but also from the currently running hour.
Next, you need to specify that you are indeed matching timestamps. We use : to delimit hours, minutes, and seconds. A two-digit group is matched by the [0-9]{2} regexp (identical to [0-9][0-9], just shorter).
There you go.
Ps. I'd recommend sed.
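Note that matching only on the hour can still pick up lines from earlier days if the file holds more than 24 hours of logs. A stricter sketch that compares the leading HH:MM:SS against a cutoff lexically (zero-padded timestamps compare correctly as strings; assumes the window does not cross midnight):
since=$(date -d '-1 hour' '+%H:%M:%S')   # cutoff, e.g. 13:45:12
# keep ERROR lines whose leading timestamp is at or after the cutoff
awk -v since="$since" 'substr($0, 1, 8) >= since && /ERROR/' /space/log/server.log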
