I know how to tail a text file with a specific number of lines,
tail -n50 /this/is/my.log
However, how do I make that line count a variable?
Let's say I have a large log file which is appended to daily by some program. All lines in the log file start with a datetime in this format:
Day Mon YY HH:MM:SS
Every day I want to output the tail of the log file, but only the previous day's records. Since this output runs just after midnight, I'm not worried about the tail spilling over into the next day.
I just want to be able to work out how many rows to tail, based on the first occurrence of yesterday's date...
Is that possible?
Answering the question in the title, for anyone who comes here that way: head and tail can both accept a count telling them how much of the file to skip.
For tail, use -n +num, where num is the line number to start at.
For head, use -n -num, where num is the number of trailing lines not to print.
This is relevant to the actual question if you remembered the number of lines from the previous run and then use tail -n +$prevlines to get the next portion of the growing log, regardless of how often the log is checked.
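A minimal sketch of that incremental idea (the offset file name and log path here are made up for illustration):
# remember how many lines were processed last time, then print only the new ones
offset_file=/tmp/my.log.offset
prevlines=$(cat "$offset_file" 2>/dev/null || echo 0)
tail -n +"$((prevlines + 1))" /this/is/my.log   # start just after the last line already seen
wc -l < /this/is/my.log > "$offset_file"        # remember the new total for next time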
Answering the actual question: one way to print everything after a certain line that you can grep for is to use the -A option with a ridiculously large count. This may be more useful than the other answers here, as you can get several days of results. So, to get everything from yesterday and so far today:
grep "^`date -d yesterday '+%d %b %y'`" -A1000000 log_file.txt
You can combine two greps to print everything between two dates.
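For example, a sketch of that combination, printing everything from two days ago through the last line stamped with yesterday's date (the date format must match the log, as above):
grep -A1000000 "^$(date -d '2 days ago' '+%d %b %y')" log_file.txt \
  | grep -B1000000 "^$(date -d yesterday '+%d %b %y')"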
Note that this relies on the date actually occurring in the log file. It has the weakness that if no events were logged on a particular day used as the range marker, then it will fail to find anything.
To resolve that, you could inject dummy records for the start and end dates and sort the file before grepping. This is probably overkill, though, and the sort may be expensive, so I won't give an example of it.
I don't think tail has any functionality like this.
You could work out the beginning and ending line numbers using awk, but if you just want to extract those lines from the log file, the simplest way is probably to use grep combined with date. Matching yesterday's date at the beginning of the line should work:
grep "^`date -d yesterday '+%d %b %y'`" < log_file.txt
You may need to adjust the date format to match exactly what you've got in the log file.
You can do it without tail; just grep the rows with the previous day's date:
cat my.log | grep "$( date -d "yesterday 13:00" '+%d %m %Y')"
And if you need the line count, you can append
| wc -l
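A slight variant of the same idea, if all you need is the number of matching lines, is to let grep do the counting itself:
grep -c "$( date -d "yesterday 13:00" '+%d %m %Y')" my.log   # -c prints the match count instead of the lines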
I worked this out through trial and error by getting the line number of the first line containing the date and the total number of lines, as follows:
lines=$(wc -l < myfile.log)
start=$(grep -n "$datestring" myfile.log | head -n1 | cut -f1 -d:)
n=$((lines - start + 1))   # +1 so the first line containing the date is included
and then a tail, based on that:
tail -n$n myfile.log
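An alternative sketch that avoids the separate line-number arithmetic (assuming $datestring is set as above): let awk start printing at the first line containing the date string and continue to the end of the file:
awk -v d="$datestring" 'index($0, d) { found = 1 } found' myfile.log   # prints from the first match to EOF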
I have a dir with a crap load (hundreds) of log files accumulated over time. In certain cases I want to make a note regarding the most recent log (most recent by the date in the filename, not by creation time), or I just need some piece of info from it and want to view it quickly; I just know it was usually the last one created, but always the one with the newest date. So I want to make a "simple" function in my bashrc to handle this: basically, a function that goes to a specific dir, finds the latest log by date (always in the same format), and opens it with less or whatever pager I want.
The logs are formatted like this:
typeoflog-short-description-$(date "+%-m-%-d-%y")
Basically the digits between the last three dashes are what I'm interested in, for example:
update-log-2-24-18
removed-cuda-opencl-nvidia-12-2-19
whatever-changes-1-18-19
Now, say it is January 20, 2019 and that last one is the newest log added to the dir. I need a way to find the highest number in the last two digits of the filename, the year (that part I don't really have a problem with), then check for the highest month, which sits two "dashes" in from that last set of digits, whether the month has two digits or one, then do the same for the day of the month, set the result as a local variable, and use it like in the following example.
Something like this:
viewlatestlog(){
    local loc=~/.logdir   # leave the ~ unquoted so it expands
    local name=$(echo $loc/*-19 | ...)   # awk or cut or sort, or I could even loop from 1-31 and 1-12 for the days and months
    # I have ideas, but I know there has to be a better way to do this and it's not coming to me; maybe with expr or a couple of sort commands, I'm not sure. It would have been easier if I had made each date number always two digits... But I didn't.
    ## But the ultimate goal is that I can run something like this command at the end
    less "$loc/$name"
}
P.S. For bonus points, you could also tell me if there is a way to automatically copy the filename (with the location and all, or without, I don't really care) to my Linux clipboard, so when I'm making my note I can "link" to the log file if I ever need to go back to it...
Edit: Cleaned up the post a little bit; I tend to make my questions way too wordy, I apologize.
GNU sort can sort by fields:
$ find . -name whatever-changes-\* | sort -n -t- -k5 -k3 -k4
./whatever-changes-3-01-18
./whatever-changes-1-18-19
./whatever-changes-2-12-19
./whatever-changes-11-01-19
The option -t specifies the field delimiter and the option -k selects the sort key fields (field numbering starts at 1). The option -n specifies a numeric sort.
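To go straight to the newest file from that ordering and open it, a sketch (assuming the same fixed whatever-changes- prefix and running from the log directory) is to take the last line of the ascending sort:
less "$(find . -name 'whatever-changes-*' | sort -n -t- -k5 -k3 -k4 | tail -n 1)"   # the newest log is the last line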
Assuming your filenames do not contain tabs or newlines, how about:
loc="~/.logdir"
for f in "$loc"/* ; do
if [[ $f =~ -([0-9]{1,2})-([0-9]{1,2})-([0-9]{2})$ ]]; then
mm=${BASH_REMATCH[1]}
dd=${BASH_REMATCH[2]}
yy=${BASH_REMATCH[3]}
printf "%02d%02d%02d\t%s\n" "$yy" "$mm" "$dd" "$f"
fi
done | sort -r | head -n 1 | cut -f 2
First extract the month, date, and year from the filename.
Then create a date string formatted as "YYMMDD" and prepend to the
filename delimited by a tab character.
Then you can perform the sort command on the list.
Finally you can obtain the desired (latest) filename by extracting it with head and cut.
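Putting that together into the viewlatestlog function the question asks for, with the clipboard bonus (a sketch; it assumes the logs live in ~/.logdir and that xclip is installed):
viewlatestlog() {
    local loc=~/.logdir
    local name
    name=$(
        for f in "$loc"/*; do
            if [[ $f =~ -([0-9]{1,2})-([0-9]{1,2})-([0-9]{2})$ ]]; then
                # sortable YYMMDD key, then a tab, then the filename
                printf '%02d%02d%02d\t%s\n' \
                    "${BASH_REMATCH[3]}" "${BASH_REMATCH[1]}" "${BASH_REMATCH[2]}" "$f"
            fi
        done | sort -r | head -n 1 | cut -f 2
    )
    printf '%s' "$name" | xclip -selection clipboard   # bonus: copy the path to the clipboard
    less "$name"
}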
Hope this helps.
I am trying to delete all rows of a CSV file that do not match the last day of each month, but I haven't found the right solution.
date,price
2018-07-02,162.17
2018-06-29,161.94
2018-06-28,162.22
2018-06-27,162.32
2018-06-12,163.01
2018-06-11,163.53
2018-05-31,164.87
2018-05-30,165.59
2018-05-29,165.42
2018-05-25,165.96
2018-05-02,164.94
2018-04-30,166.16
2018-04-27,166.69
The output I want is:
date,price
2018-06-29,161.94
2018-05-31,164.87
2018-04-30,166.16
I tried it with cut + grep:
cut -d, -f1 file.csv | grep -E "28|29|30"
That works, but returns nothing when I combine it with -f1,2.
I found csvkit, which seems to be the right tool, but I haven't found a way to grep for multiple patterns.
csvgrep -c 1 -m 30 file.csv
This gives me the right result, but how can I combine multiple search options? I tried -m 28,29,30 and -m 28 -m 29 -m 30, but neither works. Ideally it would match the last day of every month.
Maybe someone here has an idea.
Thank you, and have a nice Sunday
Silvio
You want to get all records of the LAST day of the month. But months vary in length (28-29-30-31).
I don't see why you used cut to extract the first field (the date part): the price in the second field never looks like a date of the form mm-dd, so you can grep the whole line directly.
I suggest using grep directly to display the lines that match the pattern mm-dd, where mm is the month number and dd is the last day of that month.
This command should do the trick:
grep -E "01-31|02-(28|29)|03-31|04-30|05-31|06-30|07-31|08-30|09-31|10-30|11-31|12-30" file.csv
This command will give the following output:
2018-05-31,164.87
2018-04-30,166.16
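The question's desired output also contains 2018-06-29, which this grep misses because June's calendar last day (06-30) never appears in the data. If what you actually need is the last row that exists for each month, a hedged awk sketch, relying on the file being sorted newest-first as in the sample:
awk -F, 'NR == 1 { print; next }        # keep the header line
         { m = substr($1, 1, 7) }       # YYYY-MM part of the date
         m != prev { print; prev = m }  # first (i.e. newest) row seen for each month
        ' file.csv
This also prints 2018-07-02, the newest row of the still-unfinished month; drop that line afterwards if you only want completed months.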
I need to grep all lines for "yesterday" in /var/log/messages.
When I use the following snippet, I get zero results because the dates are in the format "Jun 9". (It doesn't show here, but in the log file, days of the month smaller than 10 are padded with an extra space.)
cat /var/log/messages | grep `date --date="yesterday" +%b\ %e`
When I enter
$ date --date="yesterday" +%b\ %e
on the commandline, it returns yesterday's date, with padding.
But when I combine it with grep and the backticks, the extra padding gets suppressed, which gives me zero results.
What do I need to change so that the "date" gets evaluated with extra padding?
You should be able to fix this by putting quotes around the backticks:
cat /var/log/messages | grep "`date --date="yesterday" +%b\ %e`"
How do I filter for the most frequently counted first variable of the lines in all the files under a directory (where other directories under it should also be checked)?
I want to look at all the lines in my files (all the files in lots of folders under pwd) and find which first variable appears the most times.
I am trying to use awk like this:
awk -F':' '{ print $1 }' FILENAME
EDIT:
I will explain the purpose:
I have a server and I want to filter its logs, because there is a certain IP that repeats about 100 times every day; the first variable on each line is the IP.
I want to find out which IP it is that keeps repeating. The problem: I have two servers, so checking this by reading through one log 100 times is not efficient. I hope this script will help me find out which IP repeats...
You should rewrite your question to make it clearer. I understood that you want to know which first lines are most common across a set of files. For that, I'd use this:
head -qn 1 * | sort | uniq -c | sort -nr
head prints the first line of every file in the current directory. -q stops it from printing the file names too; -n lets you specify the number of lines.
sort puts identical first lines next to each other.
uniq -c counts the occurrences, that is, the number of repeated lines in each block after the previous sort.
sort -nr orders them with the most common coming first. -n sorts numerically and -r reverses the default ascending order.
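If what you are actually after is the most frequent first field (for example, a client IP) across every line of every file under the current directory, rather than only each file's first line, a sketch along these lines may be closer:
find . -type f -exec awk '{ print $1 }' {} + | sort | uniq -c | sort -nr | head   # most frequent first field, ranked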
Not sure if this helps; the question is not very clear.
Try something like this:
find . -type f -name "*.*" -exec head -1 {} \; 2>/dev/null | awk -F':' 'BEGIN {max=0;}{if($2>max){max=$2;}}END{print max;}'
find - finds all the files under the current directory (-type f) with any name and extension (*.*), and head -1 gets the first line of each of those files.
awk - sets the field separator to : (-F':') and, before processing the first line, BEGIN sets max to 0.
It takes the second field after the : ($2) and checks whether $2 > the current max value. If it is, it sets that field as the new max value.
After processing all the lines (the first lines from all the files under the current directory), END prints the max value.
I want to get the last part of a possibly huge logfile (50 MB to 1000 MB), starting from a given timestamp "t0":
 __________________
|1   xxx xxx ...   |
|2   xxx ...       |  uninteresting part
|4   ...           |
|...               |
|423 ...           |  <-- timestamp t0
|425 xxx ...       |
|437 ...           |
|...               |  <-- I want this part (from t0 to EOF)
|__________________|
An additional constraint is that I want to do this using simple bash commands. A simple solution may be:
awk '$1 > 423' file.log
but this scans the whole file, including all the uninteresting lines. There is the tail command, but I can only give it the number of trailing lines I want, which I don't know; I only know the timestamp. Is there a way of "awking" from the end and stopping as soon as the first timestamp no longer matches?
tac is your friend here:
tac file.log | awk '{ if ($1 >= 423) print; else exit; }' | tac
tac dumps each line of a file starting with the last line and working back to the beginning of the file. Do it once to get the lines you want, then do it again to restore their original order.
If I understand right, you just need to get the lines from a timestamp regexp to the end of the file.
Let's say your huge file is something like this:
~$ cat > file << EOF
rubish
n lines of rubish
more rubish
timestamp regexp
interesting
n interesting lines
interesting
end of file
EOF
If you are able to get a feasible regexp for the timestamp you are looking for, you can get the part you want with sed:
~$ sed -n '/timestamp regexp/,$ {p}' file
timestamp regexp
interesting
n interesting lines
interesting
end of file
Using standard Unix commands, there isn't much you can do other than scan the entire file. If you write your own program, you could do a binary search on the file:
seek to a point in the file,
read forwards to the next start of record,
check whether the timestamp is too big or too small,
and iterate until you find the right point in the file.
You might even do a search with linear interpolation rather than a pure binary search if the time stamps are pure numbers; it probably isn't worth the extra coding if the stamps are more complex, but it depends on how often you're going to need this.
Indeed, unless you are going to be doing this a lot and can demonstrate that the performance is a problem, I'd go with the simple awk solution.
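For completeness, here is a rough sketch of that binary search in plain bash, assuming (as in the example) that the leading field is a plain increasing number and using tail -c +offset to seek by byte position:
file=file.log
t0=423
lo=0
hi=$(wc -c < "$file")                      # total size in bytes
while (( hi - lo > 4096 )); do             # narrow down to a small window
    mid=$(( (lo + hi) / 2 ))
    # first complete line after byte offset mid (skip the possibly partial one)
    ts=$(tail -c +"$((mid + 1))" "$file" | head -n 2 | awk 'NR == 2 { print $1 }')
    if [ -n "$ts" ] && (( ts < t0 )); then
        lo=$mid                            # still too early: search the later half
    else
        hi=$mid                            # at/after t0 (or near EOF): search the earlier half
    fi
done
# only the last few kilobytes before the cut-off get scanned line by line
tail -c +"$((lo + 1))" "$file" | awk -v t0="$t0" '$1 >= t0'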
You can poll until you hit "423". Just a hypothetical example (not tested):
n=100                            # number of lines to look back initially
total=$(wc -l < file)            # upper bound so the loop always terminates
while true
do
    if tail -n "$n" file | grep -q "^423" ; then
        tail -n "$n" file | awk '$1 > 423'
        break
    elif (( n >= total )); then  # the timestamp never appears: give up
        break
    else
        (( n += 100 ))           # look back 100 more lines and try again
    fi
done