Order grep output by second token - bash

I am grepping for log output in many log files. I want the output of the grep command to be ordered by the time, which is the second token of every log line. Each log line is preceded by the date and time in the following format:
[2019-Jan-18 09:46:40.385624]
The logs are only ever from today so ordering only by the time will suffice.
I am using the following command to grep for a string:
grep "needle" /path/to/logs/*
How can I order the output by ascending time? I have tried piping to the sort command
grep "needle" /path/to/logs/* | sort
but that only sorts by the filename.

To make sort start comparing at the second column, use a command like:
grep "needle" /path/to/logs/* | sort -k2
To sort on the second column only (ignoring everything after it), close the key range:
grep "needle" /path/to/logs/* | sort -k2,2

Related

Sort text file with cat and sort concatenation

I got a txt file with some content looking like
stuff,stuff,2012-12-12
morestuff,morestuff,2012-09-09
evenmorestuff,yeah,2012-08-02
and I want to use cat piped into sort to get them reverse-ordered by the date as output on my command line.
Not sure why you think you need to cat a file into sort, but here are two options:
cat yourFile | sort -t, -k3r
sort -t, -k3r yourFile
To test this I did
echo "stuff,stuff,2012-12-12
morestuff,morestuff,2012-09-09
evenmorestuff,yeah,2012-08-02" \
| sort -t, -k3r
output
stuff,stuff,2012-12-12
morestuff,morestuff,2012-09-09
evenmorestuff,yeah,2012-08-02
And finally, you can overwrite your existing file using the -o option, like so:
sort -t, -o yourFile -k3r yourFile
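One reason -o matters here: redirecting the output back onto the same file would make the shell truncate yourFile before sort ever reads it, so the in-place rewrite has to go through -o. A hedged comparison, using the same yourFile as above:
# sort -t, -k3r yourFile > yourFile    # wrong: the shell empties yourFile before sort runs
sort -t, -k3r -o yourFile yourFile     # safe: sort reads all of its input before writing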
Thanks to @karakfa for reminding me of your requirement for a reverse-order sort. This is accomplished by adding an r to the key specification, hence -k3r.
IHTH

Get most popular domains from log

I am trying to get the most popular domains from a log file.
The log format is like this
197.123.43.59, 27/May/2015:01:00:11 -0600, https://m.facebook.com/
I am interested only in the domain, and I want output as follows:
XXXX facebook.com
where XXXX is the number of similar entries in logs
A one-liner unix command, anyone?
Edit
I tried the following
grep -i * sites.log | sort | uniq -c | sort -nr | head -10 &> popular.log
but popular.log is empty, implying that the command is wrong.
perl -nle '$d{$1}++ if m!//([^/]+)!; END {foreach(sort {$d{$b} <=> $d{$a}} keys(%d)) {print "$d{$_}\t$_"};}' your.log
if you don't mind Perl. The <=> comparator on the counts prints the most frequent domains first.
uniq -c counts unique occurrences, but requires sorted input.
sort sorts a stream of data.
grep has a flag, -o, which returns only the part of each line that matched the regex.
These three parts put together are what you need to perform a map/reduce on this and get the data you want.
grep -o '[^.]*\.[^.]*$' logfile | sort | uniq -c
The grep extracts just the top-level domain and the domain name, sort orders all the entries, and uniq counts the occurrences.
Adding a reverse numeric sort (-nr) at the end puts the entries with the highest counts at the top:
grep -o '[^.]*\.[^.]*$' logfile | sort | uniq -c | sort -nr
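As a rough check against the single sample line from the question (note the trailing slash stays in the match, since the regex runs to the end of the line):
$ echo '197.123.43.59, 27/May/2015:01:00:11 -0600, https://m.facebook.com/' | grep -o '[^.]*\.[^.]*$' | sort | uniq -c
      1 facebook.com/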

Unix shell script to sort files depending on the 'date string' present in their file name

I am trying to sort files in a directory, depending on the 'date string' attached in the file name; for example, the files look as below:
SSA_F12_05122013.request.done
SSA_F13_12142012.request.done
SSA_F14_01062013.request.done
where 05122013, 12142012, and 01062013 represent dates in MMDDYYYY format.
Please help me with a unix shell script to sort these files on the date string present in their file name (in ascending and descending order).
Thanks in advance.
Hmmm... why call on heavyweights like awk and Perl when sort itself has the capability to define what exactly to sort by?
ls SSA_F*.request.done | sort -k 1.13,1.16 -k 1.9,1.10 -k 1.11,1.12
Each -k option defines a "sort key":
-k 1.13,1.16
This defines a sort key ranging from field 1, column 13 to field 1, column 16. (A field is by default delimited by whitespace, which your filenames don't have.)
If your filenames are varying in length, defining the underscore as field separator (using the -t option) and then addressing columns in the third field would be the way to go.
Refer to man sort for details. Use the -r option to sort in descending order.
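For variable-length [RANGE] prefixes, a hedged sketch of that -t variant: splitting on the underscore puts the date at the start of field 3, so the keys become characters 5-8 (year), 1-2 (month), and 3-4 (day) of that field.
ls SSA_F*.request.done | sort -t _ -k 3.5,3.8 -k 3.1,3.2 -k 3.3,3.4
Add -r to any of these invocations for descending order.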
one way with awk and sort:
ls -1|awk -F'[_.]' '{s=gensub(/^([0-9]{4})(.*)/,"\\2\\1","g",$3);print s,$0}'|sort|awk '$0=$NF'
if we break it down:
ls -1|
awk -F'[_.]' '{s=gensub(/^([0-9]{4})(.*)/,"\\2\\1","g",$3);print s,$0}'|
sort|
awk '$0=$NF'
The ls -1 is just an example; I assume you have your own way to get the file list, one per line.
test a little bit:
kent$ echo "SSA_F13_12142012.request.done
SSA_F12_05122013.request.done
SSA_F14_01062013.request.done"|awk -F'[_.]' '{s=gensub(/^([0-9]{4})(.*)/,"\\2\\1","g",$3);print s,$0}'|
sort|
awk '$0=$NF'
SSA_F13_12142012.request.done
SSA_F14_01062013.request.done
SSA_F12_05122013.request.done
ls -lrt *.done | perl -lane '@a=split /_|\./,$F[scalar(@F)-1];$a[2]=~s/(..)(..)(....)/$3$2$1/g;print $a[2]." ".$_' | sort -rn | awk '{$1=""}1'
ls *.done | perl -pe 's/^.*_(..)(..)(....)/$3$2$1$&/' | sort -rn | cut -b9-
This should do it, too.

Bash Script: count unique lines in file

Situation:
I have a large file (millions of lines) containing IP addresses and ports from a several-hour network capture, one ip/port per line. Lines are of this format:
ip.ad.dre.ss[:port]
Desired result:
There is an entry for each packet I received while logging, so there are a lot of duplicate addresses. I'd like to be able to run this through a shell script of some sort which will be able to reduce it to lines of the format
ip.ad.dre.ss[:port] count
where count is the number of occurrences of that specific address (and port). No special work has to be done, treat different ports as different addresses.
So far, I'm using this command to scrape all of the ip addresses from the log file:
grep -o -E '[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+(:[0-9]+)?' ip_traffic-1.log > ips.txt
From that, I can use a fairly simple regex to scrape out all of the IP addresses that were sent by my address (which I don't care about).
I can then use the following to extract the unique entries:
sort -u ips.txt > intermediate.txt
What I don't know is how to aggregate the occurrence counts for each line using sort.
You can use the uniq command to get counts of sorted repeated lines:
sort ips.txt | uniq -c
To get the most frequent results at top (thanks to Peter Jaric):
sort ips.txt | uniq -c | sort -bgr
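For instance, with a small hypothetical ips.txt containing repeated entries:
$ printf '%s\n' '10.0.0.1:80' '10.0.0.2:443' '10.0.0.1:80' '10.0.0.1:80' > ips.txt
$ sort ips.txt | uniq -c | sort -bgr
      3 10.0.0.1:80
      1 10.0.0.2:443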
To count the total number of unique lines (i.e. counting each distinct line once, no matter how often it repeats), we can use uniq or Awk with wc:
sort ips.txt | uniq | wc -l
awk '!seen[$0]++' ips.txt | wc -l
Awk's arrays are associative (hash-based), so it avoids the sort and may run a little faster.
Generating text file:
$ for i in {1..100000}; do echo $RANDOM; done > random.txt
$ time sort random.txt | uniq | wc -l
31175
real 0m1.193s
user 0m0.701s
sys 0m0.388s
$ time awk '!seen[$0]++' random.txt | wc -l
31175
real 0m0.675s
user 0m0.108s
sys 0m0.171s
This is the fastest way to get the counts of the repeated lines and have them nicely printed, sorted from the least frequent to the most frequent:
awk '{!seen[$0]++}END{for (i in seen) print seen[i], i}' ips.txt | sort -n
If you don't care about performance and you want something easier to remember, then simply run:
sort ips.txt | uniq -c | sort -n
PS:
sort -n parses the field as a number, which is what we want since we're sorting by the counts.
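A tiny generic illustration of lexicographic versus numeric ordering, with hypothetical counts and addresses: without -n, "10" sorts before "9".
$ printf '10 1.2.3.4:80\n9 5.6.7.8:443\n' | sort
10 1.2.3.4:80
9 5.6.7.8:443
$ printf '10 1.2.3.4:80\n9 5.6.7.8:443\n' | sort -n
9 5.6.7.8:443
10 1.2.3.4:80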

Get the newest file based on timestamp

I am new to shell scripting, so I need some help with how to go about this problem.
I have a directory which contains files in the following format. The files are in a directory called /incoming/external/data:
AA_20100806.dat
AA_20100807.dat
AA_20100808.dat
AA_20100809.dat
AA_20100810.dat
AA_20100811.dat
AA_20100812.dat
As you can see, the filename includes a timestamp, i.e. [RANGE]_[YYYYMMDD].dat.
What I need to do is find out which of these files has the newest date, using the timestamp in the filename (not the system timestamp), store that filename in a variable, move it to one directory, and move the rest to a different directory.
For those who just want an answer, here it is:
ls | sort -n -t _ -k 2 | tail -1
Here's the thought process that led me here.
I'm going to assume the [RANGE] portion could be anything.
Start with what we know.
Working Directory: /incoming/external/data
Format of the Files: [RANGE]_[YYYYMMDD].dat
We need to find the most recent [YYYYMMDD] file in the directory, and we need to store that filename.
Available tools (I'm only listing the relevant tools for this problem ... identifying them becomes easier with practice):
ls
sed
awk (or nawk)
sort
tail
I guess we don't need sed, since we can work with the entire output of the ls command. Using ls, awk, sort, and tail we can get the correct file like so (bear in mind that you'll have to check the syntax against what your OS will accept):
NEWESTFILE=`ls | awk -F_ '{print $1 $2}' | sort -n -k 2,2 | tail -1`
Then it's just a matter of putting the underscore back in, which shouldn't be too hard.
EDIT: I had a little time, so I got around to fixing the command, at least for use in Solaris.
Here's the convoluted first pass (this assumes that ALL files in the directory are in the same format: [RANGE]_[yyyymmdd].dat). I'm betting there are better ways to do this, but this works with my own test data (in fact, I found a better way just now; see below):
ls | awk -F_ '{print $1 " " $2}' | sort -n -k 2 | tail -1 | sed 's/ /_/'
... while writing this out, I discovered that you can just do this:
ls | sort -n -t _ -k 2 | tail -1
I'll break it down into parts.
ls
Simple enough ... gets the directory listing, just filenames. Now I can pipe that into the next command.
awk -F_ '{print $1 " " $2}'
This is the AWK command. It allows you to take an input line and modify it in a specific way. Here, all I'm doing is specifying that awk should break the input wherever there is an underscore (_). I do this with the -F option. This gives me two halves of each filename. I then tell awk to output the first half ($1), followed by a space (" "), followed by the second half ($2). Note that the space was the part that was missing from my initial suggestion. Also, this step is unnecessary, since you can specify a separator in the sort command below.
Now the output is split into [RANGE] [yyyymmdd].dat on each line. Now we can sort this:
sort -n -k 2
This takes the input and sorts it based on the 2nd field. The sort command uses whitespace as a separator by default. While writing this update, I found the documentation for sort, which allows you to specify the separator, so AWK and SED are unnecessary. Take the ls and pipe it through the following sort:
sort -n -t _ -k 2
This achieves the same result. Now you only want the last file, so:
tail -1
If you used awk to separate the fields (which just adds extra complexity, so don't do it), you can replace the space with an underscore again with sed:
sed 's/ /_/'
Some good info here, but I'm sure most people aren't going to read down to the bottom like this.
This should work:
newest=$(ls | sort -t _ -k 2,2 | tail -n 1)
others=($(ls | sort -t _ -k 2,2 | head -n -1))
mv "$newest" newdir
mv "${others[#]}" otherdir
It won't work if there are spaces in the filenames, although you could modify the IFS variable to work around that.
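If spaces in the names are a concern, one hedged alternative sketch (still assuming GNU head for the negative count, and no newlines in filenames) is to read the sorted list line by line instead of splitting it into an array:
newest=$(ls | sort -t _ -k 2,2 | tail -n 1)
ls | sort -t _ -k 2,2 | head -n -1 | while IFS= read -r f; do
  mv -- "$f" otherdir
done
mv -- "$newest" newdir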
Try:
$ ls -lr
Hope it helps.
Use:
ls -r -1 AA_*.dat | head -n 1
(assuming there are no other files matching AA_*.dat)
ls -1 AA* | sort -r | head -1
Due to the naming convention of the files, alphabetical order is the same as date order. I'm pretty sure that in bash '*' expands alphabetically (though I cannot find explicit evidence in the manual page); ls certainly does, so the file with the newest date would be the last one alphabetically.
Therefore, in bash
mv $(ls | tail -1) first-directory
mv * second-directory
Should do the trick.
If you want to be more specific about the choice of file, then replace * with something else - for example AA_*.dat
My solution to this is similar to others, but a little simpler.
ls -tr | tail -1
What it actually does is rely on ls -tr to sort the output by modification time (newest last), then uses tail to get the last listed file name.
This solution will not work if the filename you require has a leading dot (e.g. .profile).
This solution does work if the file name contains a space.

Resources