sort list of files by date in bash

Given a text file containing some list of files, e.g.
$ cat file_list.txt
/var/x/file1.txt
/var/y/file2.txt
<etc>
How can I sort this list of files by some criteria - like their last accessed time, or last changed time?
Thanks in advance.

You can use the stat command together with sort like this:
while read -r line; do
    stat -c '%Y %n' "$line"
done < file_list.txt | sort -n | cut -d ' ' -f2-
stat -c '%Y %n' prints the time of last modification in seconds since the Epoch, followed by a space and the file name
sort -n sorts the lines numerically by that timestamp
cut -d ' ' -f2- strips the timestamp and prints only the file names (the trailing - keeps names that contain spaces intact)
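The question also asks about access and change time; with GNU stat you can just swap the format code in the same loop, roughly like this (%X is last access, %Z is last status change):
while read -r line; do
    stat -c '%X %n' "$line"
done < file_list.txt | sort -n | cut -d ' ' -f2-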

Try a one-liner (by modification time):
ls -t $(cat file_list.txt)
or
ls -t `cat file_list.txt`
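Word splitting will break this if any of the paths contain spaces; a sketch that avoids it (GNU xargs assumed for -d):
xargs -d '\n' ls -t < file_list.txt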

You can get the most recently modified file with
cat file_list.txt | xargs stat -c '%Y %n' | sort | tail -1 | cut -c 12-
and the most recent timestamp with
cat file_list.txt | xargs stat -c '%Y %n' | sort | tail -1 | cut -c -10
(the cut character positions assume the usual 10-digit epoch timestamp).
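If you would rather not count characters, a sketch of the same idea using a numeric sort and a field-based cut (GNU xargs assumed for -a):
xargs -a file_list.txt stat -c '%Y %n' | sort -n | tail -1 | cut -d ' ' -f2-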

Related

extract the total count line (wc -l) number in shell

I am trying to figure out how to extract the last total count number when I use "wc -l" on multiple files under a directory. For example:
currentDir$ wc -l *.fastq
216272 a.fastq
402748 b.fastq
4789028 c.fastq
13507076 d.fastq
5818620 e.fastq
24733744 total
I would only need to extract 24733744 from the above. I tried
wc -l *.fastq | tail -1
to get
24733744 total
but not sure what to do next. If I use "cut", the annoying thing is that there are multiple spaces before the number, and I will need to use this code for other folders too, and the number of spaces may differ.
Any advice is appreciated. Thank you very much!
For this particular problem, it's probably easier to do:
cat *.fastq | wc -l
This should work with any number of spaces:
wc -l *.fastq | tail -1 | tr -s ' ' | cut -f 2 -d ' '
Example:
echo " 24733744 total" | tr -s ' ' | cut -f 2 -d ' '
24733744
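An awk alternative that sidesteps the whitespace issue entirely; it prints the first field of wc's last line, i.e. the total:
wc -l *.fastq | awk 'END {print $1}'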

Count error messages in multiple files per day

So I have a log file with err|error messages that I want to count every day.
cat user.log | grep 'err|error' | wc -l
gives me almost everything I need, but there are other log files that are compressed, so
zcat user.log.* | grep 'err|error' | wc -l
also gets me almost there.
Here is where I'm stuck: I need to check every compressed log file for error messages with today's date, across multiple files, and also in user.log, which is the current file collecting the errors.
So I need to check over 50 user.log.Z files and count every line that has today's date,
Oct 8 00:00:00 until 23:59:59
plus my user.log.
Thanks in advance.
EDIT-------------
Solved with:
DATE=$(date "+%b %e"); find /var/adm/ ! -path /var/adm/ -prune -name "user.log*" -prune -mtime -1 -exec zgrep "$DATE" {} \; | grep "user:err|error" | wc -l
If the date is part of the filename, append it (wherever it's added by logrotate):
zcat -f user.log.* user.log-$(date +%Y-%m-%d).* | grep 'err|error' | wc -l
zcat -f user.log user.log.$(date +%Y-%m-%d).* | grep 'err|error' | wc -l
grep can also count, zgrep can also process compressed files, so a bit shorter:
zgrep -c 'err|error' user.log user.log.$(date +%Y-%m-%d)*
If the date is not part of the filename, you need to process all files and filter out all lines with a different date:
zgrep $(date +%Y-%m-%d) file ... | grep -c 'err|error'
zgrep $(date "+%b %d") file ... | grep -c 'err|error'
Or using awk:
# Assumes the date is in the first field, like "2018-10-09 ... err|error ..."
zcat ... | awk 'BEGIN{d=strftime("%Y-%m-%d")} /err\|error/ && $1==d {c++} END{print c+0}'
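If the rotated files are the .Z files from the question and the dates are syslog-style ("Oct  8 ..." at the start of each line), a sketch that sums today's hits across the live log and the rotated ones (zgrep reads plain and compressed files alike):
today=$(date '+%b %e')          # e.g. "Oct  8", padded like syslog timestamps
total=0
for f in user.log user.log.*; do
    n=$(zgrep "^$today" "$f" | grep -c 'err|error')
    total=$((total + n))
done
echo "$total"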

Getting file size in bytes with bash (Ubuntu)

Hi, I'm looking for a way to output a file size in bytes. Whatever I try, I get either 96 or 96k instead of 96000.
if [[ -d $1 ]]; then
    largestN=$(find $1 -depth -type f | tr '\n' '\0' | du -s --files0-from=- | sort | tail -n 1 | awk '{print $2}')
    largestS=$(find $1 -depth -type f | tr '\n' '\0' | du -h --files0-from=- | sort | tail -n 1 | awk '{print $1}')
    echo "The largest file is $largestN which is $largestS bytes."
else
    echo "$1 is not a directory..."
fi
This prints "The largest file [file] is 96k bytes"
du has a -b option for this:
$ du -b ...
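For example, to print just the byte count of a single file ($file here is only a placeholder; GNU du assumed):
du -b "$file" | cut -f1
du separates size and name with a tab, so cut's default delimiter works.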
Looks like you're trying to find the largest file in a given directory. It's more efficient (and shorter) to let find do the heavy lifting for you:
find "$1" -type f -printf '%s %p\n' | sort -n | tail -n1
Here, %s expands to the size in bytes of the file, and %p expands to the name of the file.
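Wired into the original script shape, that might look like this (just a sketch; GNU find assumed, and read splits each line into size and name):
if [[ -d $1 ]]; then
    # largest file last; read takes the first field as size, the rest as the name
    read -r size name < <(find "$1" -type f -printf '%s %p\n' | sort -n | tail -n 1)
    echo "The largest file is $name which is $size bytes."
else
    echo "$1 is not a directory..."
fi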

Bash scripting: Deleting the oldest directory

I want to look for the oldest directory (inside a directory), and delete it. I am using the following:
rm -R $(ls -1t | tail -1)
ls -1t | tail -1 does indeed give me the oldest entry, but the problem is that it is not deleting the directory, and that it also lists files.
How could I please fix that?
rm -R "$(find . -maxdepth 1 -type d -printf '%T#\t%p\n' | sort -r | tail -n 1 | sed 's/[0-9]*\.[0-9]*\t//')"
This works also with directory whose name contains spaces, tabs or starts with a "-".
This is not pretty but it works:
rm -R $(ls -lt | grep '^d' | tail -1 | tr " " "\n" | tail -1)
rm -R $(ls -tl | grep '^d' | tail -1 | cut -d' ' -f8)
find directory_name -type d -printf "%TY%Tm%Td%TH%TM%TS %p\n" | sort -nr | tail -1 | cut -d" " -f2 | xargs -n1 echo rm -Rf
Remove the echo before the rm once it prints the right command.
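A sketch that stays NUL-delimited throughout, so it also copes with newlines in directory names (assumes reasonably recent GNU find and coreutils for the -z options):
find . -mindepth 1 -maxdepth 1 -type d -printf '%T@\t%p\0' \
    | sort -z -n | head -z -n 1 | cut -z -f2- \
    | xargs -0 echo rm -R      # drop the echo once the printed command looks right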

sort | uniq | xargs grep ... where lines contain spaces

I have a comma-delimited file "myfile.csv" where the 5th column is a date/time stamp (mm/dd/yyyy hh:mm). I need to list all the rows that contain duplicate dates (there are lots).
I'm using a bash shell via cygwin for WinXP
$ cut -d, -f 5 myfile.csv | sort | uniq -d
correctly returns a list of the duplicate dates
01/01/2005 00:22
01/01/2005 00:37
[snip]
02/29/2009 23:54
But I cannot figure out how to feed this to grep to give me all the rows.
Obviously, I can't use xargs straight up since the output contains spaces. I thought I could do uniq -z -d but for some reason, combining those flags causes uniq to (apparently) return nothing.
So, given that
$ cut -d, -f 5 myfile.csv | sort | uniq -d -z | xargs -0 -I {} grep '{}' myfile.csv
doesn't work... what can I do?
I know that I could do this in perl or another scripting language... but my stubborn nature insists that I should be able to do it in bash using standard commandline tools like sort, uniq, find, grep, cut, etc.
Teach me, oh bash gurus. How can I get the list of rows I need using typical cli tools?
sort -k5,5 will do the sort on fields and avoid the cut;
uniq -f 4 will ignore the first 4 fields for the uniq;
Plus a -D on the uniq will get you all of the repeated lines (vs -d, which gets you just one);
but uniq expects blank-delimited fields instead of csv, so tr ',' '\t' to fix that.
The problem is if you have fields after #5 that differ. Are your dates all the same length? You might be able to add a -w 16 (to include the time), or -w 10 (for just the date), to the uniq.
So:
tr ',' '\t' < myfile.csv | sort -k5,5 | uniq -f 4 -D -w 16
The -z option of uniq needs the input to be NUL-separated. You can filter the output of cut through
tr '\n' '\000'
to get NUL-separated rows. sort, uniq and xargs all have options to handle that. Try something like:
cut -d, -f 5 myfile.csv | tr '\n' '\000' | sort -z | uniq -d -z | xargs -0 -I {} grep '{}' myfile.csv
Edit: the position of tr in the pipe was wrong.
You can tell xargs to use each line as an argument in its entirety using the -d option. Try:
cut -d, -f 5 myfile.csv | sort | uniq -d | xargs -d '\n' -I '{}' grep '{}' myfile.csv
This is a good candidate for awk:
BEGIN { FS="," }
{ split($5,A," "); date[A[0]] = date[A[0]] " " NR }
END { for (i in date) print i ":" date[i] }
Set the field separator to ',' (CSV).
Split the fifth field on the space and store the result in A (split fills the array starting at index 1, so A[1] is the date).
Concatenate the line number onto the list already stored for that date.
Print out the line numbers for each date.
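To run it, save the program to a file (say dupdates.awk, a name chosen here just for illustration) and feed it the csv:
awk -f dupdates.awk myfile.csv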
Try escaping the spaces with sed:
echo 01/01/2005 00:37 | sed 's/ /\\ /g'
cut -d, -f 5 myfile.csv | sort | uniq -d | sed 's/ /\\ /g' | xargs -I '{}' grep '{}' myfile.csv
(Yet another way would be to read the duplicate date lines into an IFS=$'\n' array and iterate over it in a for loop.)
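A sketch of that loop approach (mapfile needs bash 4+; -F makes grep treat each date as a literal string):
mapfile -t dups < <(cut -d, -f5 myfile.csv | sort | uniq -d)
for d in "${dups[@]}"; do
    grep -F "$d" myfile.csv
done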
