How to use a Bash script or awk to extract a file name - bash

I receive files with names constructed in the following format:
[2 letters, e.g. AF][6-digit number sequence][date in ccyymmdd][time in hhmmss]
For example:
AF00010720120917144500.csv
I want to automate loading such files into my database using the date part of the file name,
with something that may start like this:
#!/bin/bash
filename_datepart=$(date -d "1 day ago" +"%Y%m%d")
filename="/home/hlosi/AF000107${filename_datepart}.csv"
But remember, the part 000107 changes with each new file.

You can use wildcards to fill in the unknown values:
#!/bin/bash
file=/home/hlosi/AF??????$(date -d "1 day ago" +"%Y%m%d")??????.csv
echo $file    # left unquoted so the pattern expands to the matching file(s)
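If more than one file from yesterday can match, the glob expands to several names, so looping over the matches is safer than holding them in a single variable. A minimal sketch, assuming GNU date and the filename layout from the question (the load step is a placeholder):
#!/bin/bash
datepart=$(date -d "1 day ago" +"%Y%m%d")
for f in /home/hlosi/AF??????"$datepart"??????.csv; do
    [ -e "$f" ] || continue          # nothing matched: skip the literal pattern
    echo "would load $f here"        # placeholder for the real database load command
done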

Here is a BASH solution:
#!/bin/bash
#The full name
fullname="/home/hlosi/AF00010720120917144500.csv"
#Rip off the directory
file=$(basename "$fullname")
#Now pull out just the characters that we want
extract=$(echo "$file" | cut -c3-8)
echo "You want: $extract"

I think you want this, in case you have to handle multiple files:
#!/bin/bash
fpath=/home/hlosi/
filename_datepart=$(date -d "1 day ago" +"%Y%m%d")
files=$(find "$fpath" -iname "AF*${filename_datepart}*.csv")
for file in $files
do
    echo "found file: $file"
done
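Capturing the find output in a variable and relying on word splitting breaks on file names that contain spaces; reading -print0 output in a while loop is more robust. A sketch along the same lines, again assuming GNU find and date:
#!/bin/bash
fpath=/home/hlosi/
filename_datepart=$(date -d "1 day ago" +"%Y%m%d")
while IFS= read -r -d '' file; do
    echo "found file: $file"
done < <(find "$fpath" -iname "AF*${filename_datepart}*.csv" -print0)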

Forgive me for my ignorance: there is -atime, -ctime and -mtime, and I think it's -ctime you want here.
find . -ctime 1 -name '*.csv' -print
-mtime matches files ending in .csv that are exactly 1 day old. The trouble with this is that it works in 24-hour periods, so a file that is less than 24 hours old but was still created yesterday would not show up.
This would be a simpler way of doing things, and it would not care about changes in the file name format, for future proofing:
cd /path/to/csv
d=$(date -d "1 day ago" +"%d")
find . -type f -name '*.csv' -ctime 1 -exec ls -l {} \; | awk -v d="$d" '$7 ~ d' | awk '{print $NF}' | awk '{ print substr($0, length($0) - 1, length($0)) }'
# d is set to yesterday's day of the month. The command finds yesterday's .csv files, runs ls -l on them, pipes into awk and checks field 7 (the day column of ls -l output) against yesterday's date, prints the last field (the file name), and pipes into a final awk that prints a substring from one character position to another, which is what you wanted. You need to figure out which characters you need; here is another example of the above for character positions 0 to 10:
find . -type f -ctime 1 -name '*.csv' -exec ls -l {} \; | awk -v d="$d" '$7 ~ d' | awk '{print $NF}' | awk '{ print substr($0, 0, 10) }'
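If GNU find is available, the 24-hour-window problem can also be sidestepped with -newermt, which compares modification times against calendar timestamps instead of whole days. A hedged sketch, with the directory taken from the original question:
find /home/hlosi -type f -name '*.csv' \
     -newermt 'yesterday 00:00' ! -newermt 'today 00:00' -print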

Related

Searching for .extension files recursively and print the number of lines in the files found?

I ran into a problem I am trying to solve, and I can't think of a way to do it without redoing the whole thing from the beginning. My script gets an extension and searches for every .extension file recursively, then outputs "filename:row #:word #". I would also like to print out the total number of rows found in those files. Is there any way to do it using the existing code?
for i in `find . -name "*.$1" | awk -F/ '{print $NF}'`
do
echo "$i:`wc -l <$i|bc`:`wc -w <$i|bc`">>temp.txt
done
sort -r -t : -k3 temp.txt
cat temp.txt
I think you're almost there, unless I am missing something in your requirements:
#!/bin/bash
total=0
for f in `find . -name "*.$1"` ; do
lines=`wc -l < $f`
words=`wc -w < $f`
total=`echo "$lines+$total" | bc`
echo "* $f:$lines:$words"
done
echo "# Total: $total"
Edit:
Per recommendation of @Mark Setchel in the comments, this is a more refined version of the script above:
#!/bin/bash
total=0
for f in `find . -name "*.$1"` ; do
read lines words _ < <(wc -wl "$f")
total=$(($lines+$total))
echo "* $f:$lines:$words"
done
echo "# Total: $total"
Cheers
This is a one-liner printing the lines found per file, the path of the file and at the end the sum of all lines found in all the files:
find . -name "*.go" -exec wc -l {} \; | awk '{s+=$1} {print $1, $2} END {print s}'
In this example it will find all files ending in *.go, then execute wc -l on each to get the number of lines and print the output to stdout. awk is then used to sum all the output of column 1 into the variable s, which is printed only at the end: END {print s}
In case you would also like to get the words and the total sum at the end you could use:
find . -name "*.go" -exec wc {} \; | \
awk '{s+=$1; w+=$2} {print $1, $2, $4} END {print "Total:", s, w}'
Hopefully this gives you an idea of how to format, sum, etc. your data based on the input.

find oldest file from list

I've a file with a list of files in different directories and want to find the oldest one.
It feels like something that should be easy with some shell scripting but I don't know how to approach this. I'm sure it's really easy in perl and other scripting languages but I'd really like to know if I've missed some obvious bash solution.
Example of the contents of the source file:
/home/user2/file1
/home/user14/tmp/file3
/home/user9/documents/file9
#!/bin/sh
while IFS= read -r file; do
    # ${oldest=$file} makes the first file read the initial value of $oldest;
    # -ot ("older than") then keeps whichever of the two has the earlier mtime
    [ "${file}" -ot "${oldest=$file}" ] && oldest=${file}
done < filelist.txt
echo "the oldest file is '${oldest}'"
You can use stat to find the last modification time of each file, looping over your source file:
oldest=5555555555   # far-future epoch seconds, so the first file checked always becomes the oldest
while IFS= read -r file; do
    modtime=$(stat -c %Y "$file")
    [[ $modtime -lt $oldest ]] && oldest=$modtime && oldestf="$file"
done < sourcefile.txt
echo "Oldest file: $oldestf"
This uses the %Y format of stat, which is the last modification time. You could also use %X for last access time, or %Z for last change time.
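For reference, the three timestamps can be printed side by side; a one-line illustration with GNU stat (the file name is only an example):
stat -c 'mtime=%Y atime=%X ctime=%Z  %n' sourcefile.txt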
Use find to find the oldest file:
find /home/ -type f -printf '%T+ %p\n' | sort | head -1 | cut -d' ' -f2-
And with source file:
find $(cat /path/to/source/file) -type f -printf '%T+ %p\n' | sort | head -1 | cut -d' ' -f2-
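Note that $(cat ...) word-splits the list, so paths containing spaces break. A sketch of a whitespace-tolerant variant, assuming GNU xargs and stat and one path per line in the source file:
xargs -d '\n' -a /path/to/source/file stat -c '%Y %n' | sort -n | head -1 | cut -d' ' -f2-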

Bash script to limit a directory size by deleting files accessed last

I had previously used a simple find command to delete tar files not accessed in the last x days (in this example, 3 days):
find /PATH/TO/FILES -type f -name "*.tar" -atime +3 -exec rm {} \;
I now need to improve this script by deleting in order of access date and my bash writing skills are a bit rusty. Here's what I need it to do:
1. check the size of the directory /PATH/TO/FILES
2. if the size in 1) is greater than X, get a list of the files by access date
3. delete files in that order until the size is less than X
The benefit here is that for cache and backup directories, I will only delete what I need to in order to keep the directory within a limit, whereas the simplified method might go over the size limit if one day's files are particularly large. I'm guessing I need to use stat and a bash for loop?
I improved brunner314's example and fixed the problems in it.
Here is a working script I'm using:
#!/bin/bash
DELETEDIR="$1"
MAXSIZE="$2" # in MB
if [[ -z "$DELETEDIR" || -z "$MAXSIZE" || "$MAXSIZE" -lt 1 ]]; then
echo "usage: $0 [directory] [maxsize in megabytes]" >&2
exit 1
fi
find "$DELETEDIR" -type f -printf "%T#::%p::%s\n" \
| sort -rn \
| awk -v maxbytes="$((1024 * 1024 * $MAXSIZE))" -F "::" '
BEGIN { curSize=0; }
{
curSize += $3;
if (curSize > maxbytes) { print $2; }
}
' \
| tac | awk '{printf "%s\0",$0}' | xargs -0 -r rm
# delete empty directories
find "$DELETEDIR" -mindepth 1 -depth -type d -empty -exec rmdir "{}" \;
Here's a simple, easy to read and understand method I came up with to do this:
# Sizes are handled in KB: du -s reports KB, and the stat result is divided by 1024
DIRSIZE=$(du -s /PATH/TO/FILES | awk '{print $1}')
if [ "$DIRSIZE" -gt "$SOMELIMIT" ]
then
    # ls -rt --time=atime lists the *.tar files least recently accessed first
    for f in `ls -rt --time=atime /PATH/TO/FILES/*.tar`; do
        FILESIZE=`stat -c "%s" "$f"`
        FILESIZE=$(($FILESIZE/1024))
        rm -f "$f"                        # delete the least recently accessed file
        DIRSIZE=$(($DIRSIZE - $FILESIZE))
        if [ "$DIRSIZE" -lt "$LIMITSIZE" ]; then
            break
        fi
    done
fi
I didn't need to use loops, just some careful application of stat and awk. Details and explanation below, first the code:
find /PATH/TO/FILES -name '*.tar' -type f \
| sed 's/ /\\ /g' \
| xargs stat -f "%a::%z::%N" \
| sort -r \
| awk -v limit="$X_SIZE" '
BEGIN { curSize = 0; FS = "::" }
{ curSize += $2 }
curSize > limit { print $3 }
' \
| sed 's/ /\\ /g' \
| xargs rm
Note that this is one logical command line, but for the sake of sanity I split it up.
It starts with a find command based on the one above, without the parts that limit it to files older than 3 days. It pipes that to sed, to escape any spaces in the file names find returns, then uses xargs to run stat on all the results. The -f "%a::%z::%N" tells stat the format to use, with the time of last access in the first field, the size of the file in the second, and the name of the file in the third. I used '::' to separate the fields because it is easier to deal with spaces in the file names that way. Sort then sorts them on the first field, with -r to reverse the ordering.
Now we have a list of all the files we are interested in, in order from most recently accessed to least recently accessed. The awk script (which is handed the size limit via -v) adds up the sizes as it goes through the list and begins outputting file names once the running total exceeds $X_SIZE. The files that are not output this way are the ones kept; the other file names go to sed again to escape any spaces and then to xargs, which runs rm on them.

Find files in order of modification time

I have a certain shell script like this:
for name in `find $1 -name $2 -type f -mmin +$3`
do
Filename=`basename "ls $name"`
echo "$Filename">>$1/order.txt
done
The find command returns N files in alphabetical order, so their names are inserted into order.txt in alphabetical order. How do I change this to the order of modification time?
I.e., if file F2 was modified first and then file F1, the above script enters F1 first and then F2 into order.txt, as per alphabetical order. But I want F2 to be entered first and then F1, that is, in order of modification time. After the script runs I want order.txt to contain
F2
F1
and not
F1
F2
Please help.
find has an -exec switch, allowing you to pass any matched filenames to an external command:
find $1 -name $2 -type f -mmin +$3 -exec ls -1t [-r] {} +
With this, find will pass all of the matching files at once to ls and allow that to do the sorting for you. With the optional -r flag, files will be printed in order of oldest to newest; without, in order of newest to oldest.
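For instance, keeping the question's positional parameters and adding -r so the oldest file comes first (the F2-before-F1 ordering asked for), the result can be written straight to order.txt; note this prints full paths rather than basenames:
find "$1" -name "$2" -type f -mmin +"$3" -exec ls -1tr {} + > "$1"/order.txt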
for name in `find $1 -name $2 -type f -mmin +$3`
do
    ftime=$(stat -c %Y "$name")
    Filename=$(basename "$name")
    echo "$ftime $Filename"
done | sort -n | awk '{print $2}' > $1/order.txt
One way: get file mtimes in seconds since epoch, sort on seconds since epoch, then print only the filename
Here you go:
find_date_sorted() {
# Ascending, by ISO date
while IFS= read -r -d '' -u 9
do
cut -d ' ' -f 3- <<< "$REPLY"
done 9< <(find ${1+"$@"} -printf '%TY-%Tm-%Td %TH:%TM:%TS %p\0' | sort -z)
}
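A typical call, assuming the function has been sourced into the current shell; it prints paths sorted by modification time, oldest first:
find_date_sorted /path/to/dir
find_date_sorted /path/to/dir | tail -n 1    # newest file last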

KornShell script to get files between two dates

Need to get the files between two given dates via a KornShell (ksh) script. If there are multiple files on one day get the latest of the files for that day.
I haven't tried it out, but there's a mailing list post about finding files between two dates. The relevant part:
Touch 2 files, start_date and stop_date, like this:
$ touch -t 200603290000.00 start_date
$ touch -t 200603290030.00 stop_date
Ok, start_date is 03/29/06 midnight, stop_date is 03/29/06 30 minutes after midnight. You might want to do an ls -al to check.
On to find: you can use -newer and then ! -newer, like this:
$ find /dir -newer start_date ! -newer stop_date -print
Combine that with ls -l, you get:
$ find /dir -newer start_date ! -newer stop_date -print0 | xargs -0 ls -l
(Or you can try -exec to execute ls -l. I am not sure of the format, so you have to muck around a little bit.)
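Put together as a small ksh script, the touch/find idea might look like the sketch below; the timestamp format is the YYYYMMDDhhmm.ss accepted by touch -t, the paths are placeholders, and -print0/-0 assume GNU find and xargs:
#!/usr/bin/ksh
start=$1      # e.g. 200603290000.00
stop=$2       # e.g. 200603290030.00
dir=$3

touch -t "$start" /tmp/start_date.$$
touch -t "$stop"  /tmp/stop_date.$$

find "$dir" -type f -newer /tmp/start_date.$$ ! -newer /tmp/stop_date.$$ -print0 | xargs -0 ls -l

rm -f /tmp/start_date.$$ /tmp/stop_date.$$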
In the bash shell, just as an example, you can use the -nt test operator (the Korn shell has it as well, if I am not wrong):
printf "Enter start date( YYYYMMDD ):"
read startdate
printf "Enter end date( YYYYMMDD ):"
read enddate
touch -t "${startdate}0000.00" sdummy
touch -t "${enddate}0000.00" edummy
for fi in *
do
    if [ "$fi" -nt sdummy ] && [ ! "$fi" -nt edummy ]; then
        echo "-->" "$fi"
    fi
done
In a nutshell, for ksh:
#!/usr/bin/ksh
# main from_date to_date path
# date format: YYMMDDhhmmss
ls -l --time-style "+%y%m%d%H%M%S" $3 | awk '{ print $6 " " $7 }' | while read t n
do
if (( t > $1 )) && (( t < $2 )); then
echo $t $n
fi
done
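Assuming the script is saved as, say, files_between.ksh (an illustrative name), it is called with the two dates in YYMMDDhhmmss format and the directory to scan:
./files_between.ksh 120917000000 120918000000 /home/hlosi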
