I have an archive folder containing some sub-folders (say A, B, C) which hold archived files. How can I find and delete the oldest created file from the sub-folders (say B) that I want?
This command should do exactly what you want:
find . -mindepth 2 -type f -printf '%T+ %p\n' | sort | awk 'NR==1{print $2}' | xargs rm -v
Brief explanation:
find . -mindepth 2 -type f -printf '%T+ %p\n': limit the minimum depth to 2, so find only considers files under the sub-directories (or deeper), and print each file's last modification time followed by its name.
Pipe the output of find ... to sort to order the found files by modification time, oldest first.
awk 'NR==1{print $2}': pipe the output to awk to get the name of the oldest file.
xargs rm -v: remove the oldest file.
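As a minimal illustration with hypothetical file names and timestamps, the first two stages might produce something like the following, and the awk stage then keeps only the path on the first (oldest) line:
$ find . -mindepth 2 -type f -printf '%T+ %p\n' | sort
2023-01-02+09:15:00.0000000000 ./B/old_archive.tar
2023-03-10+14:02:11.0000000000 ./A/newer_archive.tar
$ find . -mindepth 2 -type f -printf '%T+ %p\n' | sort | awk 'NR==1{print $2}'
./B/old_archive.tar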
Edit
Following the further request to pass the sub-directory names in as variables, here is the modified method; you only need to change the awk part:
$ a="sub_dir1"
$ b="sub_dir2"
$ find ... | sort | awk -v a="$a" -v b="$b" '$2 ~ "./" a "/" || $2 ~ "./" b "/"{print $2; exit}' | xargs ...
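Putting the pieces together with the full find and xargs from above (a sketch; sub_dir1 and sub_dir2 are just placeholder sub-directory names):
$ a="sub_dir1"
$ b="sub_dir2"
$ find . -mindepth 2 -type f -printf '%T+ %p\n' | sort | awk -v a="$a" -v b="$b" '$2 ~ "./" a "/" || $2 ~ "./" b "/"{print $2; exit}' | xargs rm -v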
If you are trying to delete the oldest modified file (not created), then you can use this:
rm "$(ls -t | tail -1)"
I have a delete-backup-files function which takes as arguments a directory name and a pattern for a specific type of backup file in that directory, and is called like this: delete_old_backup_files $(dirname $abc) "$abc.*"
The function body is:
local fpath=$1
local fexpr=$2
# delete backup files older than a day
find $fpath -name "${fexpr##*/}" -mmin +1 -type f | xargs rm -f
Currently it deletes files that are older than a day. Now I want to modify the function so that it deletes all backup files of type $abc.*, except the last 5 backup files created. I tried various commands using stat or -printf but couldn't get it to work.
What is the correct way of completing this function?
Assuming the filenames do not contain newline characters, would you please
try:
delete_old_backup_files() {
local fpath=$1
local fexpr=$2
find "$fpath" -type f -name "${fexpr##*/}" -printf "%T#\t%p\n" | sort -nr | tail -n +6 | cut -f2- | xargs rm -f --
}
-printf "%T#\t%p\n" prints the seconds since epoch (%T#) followed
by a tab character (\t) then the filename (%p) and a newline (\n).
sort -nr numerically sorts the lines in descending order (newer first,
older last).
tail -n +6 prints the 6th and following lines.
cut -f2- removes the prepended timestamp leaving the filename only.
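As a usage sketch, the function could then be called as in the question (the /var/backups/mydb_dump value is hypothetical):
abc=/var/backups/mydb_dump                            # hypothetical backup path prefix
delete_old_backup_files "$(dirname "$abc")" "$abc.*"
This keeps the five newest mydb_dump.* files in /var/backups and removes the older ones.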
[Edit]
On macOS, please try this instead (not tested):
find "$fpath" -type f -print0 | xargs -0 stat -f "%m%t%N" | sort -nr | tail -n +6 | cut -f2- | xargs rm --
In the stat command, %m expands to the modification time (seconds since the epoch), %t is replaced with a tab, and %N with the file name.
I would use ls's own sorting instead of find. You can use ls -t:
$ touch a b c
$ sleep 3
$ touch d e f
$ ls -t | tr ' ' '\n' | tail -n +4
a
b
c
$ ls -t | tr ' ' '\n' | tail -n +4 | xargs rm
$ ls
d e f
From man ls:
-t sort by modification time, newest first
Make sure you create backups before you delete stuff :-)
I have a large number of files (~50000 files).
ls /home/abc/def/
file1.txt
file2.txt
file3.txt
.........
.........
file50000.txt
I want to create a CSV file with two columns: the first gives the filename and the second gives the absolute file path, like this:
output.csv
file1.txt,/home/abc/def/file1.txt
file2.txt,/home/abc/def/file2.txt
file3.txt,/home/abc/def/file3.txt
.........................
.........................
file50000.txt,/home/abc/def/file50000.txt
How can I do this with bash commands? I tried with ls and find:
find /home/abc/def/ -type f -exec ls -ld {} \; | awk '{ print $5, $9 }' > output.csv
but this only gives me the absolute paths, not the filename column. How do I get the output shown in output.csv above?
You can get both just the filename and the full path with GNU find's -printf option:
find /home/abc/def -type f -printf "%f,%p\n"
Pipe through sort if you want sorted results.
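For the directory from the question, this should produce exactly the requested output.csv, sorted by filename:
find /home/abc/def -type f -printf "%f,%p\n" | sort > output.csv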
How about:
$ find /path/ | awk -F/ -v OFS=, '{print $NF,$0}'
Add proper switches to find where needed.
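Applied to the question's directory, with -type f added so only files are listed and the output redirected to the CSV, that would be something like:
find /home/abc/def/ -type f | awk -F/ -v OFS=, '{print $NF,$0}' > output.csv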
If you want to fully canonicalize all existing paths, including collapsing duplicate / and resolving symlinks to their physical targets, why not just
find … -print0 |
or
gls --zero |
or
mawk 8 ORS='\0' filelist.txt |
xargs -0 -P 8 grealpath -ePq
In plain bash:
for file in /home/abc/def/*.txt; do printf '%s,%s\n' "${file##*/}" "$file"; done
or,
dir=/home/abc/def
cd "$dir" && for file in *.txt; do printf '%s,%s\n' "$file" "$dir/$file"; done
How would I modify this code to give me the full file path of the last modified file in the code directory, including nested sub-directories?
# Gets the last modified file in the code directory.
get_filename(){
cd "$code_directory" || no_code_directory_error # Stop script if directory doesn't exist.
last_modified=$(ls -t | head -n1)
echo "$last_modified"
}
Use find instead of ls, because parsing the output of ls is an anti-pattern.
Use a Schwartzian transform to prefix your data with a sort key.
Sort the data.
Take what you need.
Remove the sort key.
Post process the data.
find "$code_directory" -type f -printf '%T# %p\n' |
sort -rn |
head -1 |
sed 's/^[0-9.]\+ //' |
xargs readlink -f
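Folded back into the question's function, that could look like this (a sketch; no_code_directory_error is assumed to be defined elsewhere, as in the question):
# Gets the full path of the last modified file under the code directory.
get_filename(){
    cd "$code_directory" || no_code_directory_error  # Stop script if directory doesn't exist.
    find . -type f -printf '%T@ %p\n' |   # timestamp + relative path for every file
        sort -rn |                        # newest first
        head -1 |                         # keep only the newest
        sed 's/^[0-9.]\+ //' |            # drop the timestamp
        xargs readlink -f                 # resolve ./relative/path to the full path
}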
You can use the realpath utility.
# Gets the last modified file in the code directory.
get_filename(){
cd "$code_directory" || no_code_directory_error # Stop script if directory doesn't exist.
last_modified=$(ls -t | head -1)
echo "$last_modified"
realpath "$last_modified"
}
Output:
blah.txt
/full/path/to/blah.txt
ls -t sorts by modification time, and if you want the first one you can add | head -1. The -R flag lists files recursively, but the one caveat here is that ls -tR does not gather all files and then sort them together (it sorts within each directory separately), so you can use
find . -type f -printf "%T# %f\n" | sort -rn > out.txt
I have a directory that contains files and other directories, and I have one specific file of which I know there are duplicates somewhere in the given directory tree.
How can I find these duplicates using Bash on macOS?
Basically, I'm looking for something like this (pseudo-code):
$ find-duplicates --of foo.txt --in ~/some/dir --recursive
I have seen that there are tools such as fdupes, but I'm neither interested in any duplicate files (only duplicates of a specific file) nor am I interested in duplicates anywhere on disk (only within the given directory or its subdirectories).
How do I do this?
For a solution compatible with macOS built-in shell utilities, try this instead:
find DIR -type f -print0 | xargs -0 md5 -r | grep "$(md5 -q FILE)"
where:
DIR is the directory you are interested in;
FILE is the file (path) you are searching for duplicates of.
If you only need the duplicated files' paths, then pipe through this as well:
cut -d' ' -f2
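Putting the whole thing together (DIR and FILE are the same placeholders as above), this prints only the paths of the duplicates:
find DIR -type f -print0 | xargs -0 md5 -r | grep "$(md5 -q FILE)" | cut -d' ' -f2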
If you're looking for a specific filename, you could do:
find ~/some/dir -name foo.txt
which returns a list of all files named foo.txt in the directory. If you want to check whether there are multiple files in the directory with the same name, you could do:
find ~/some/dir -exec basename {} \; | sort | uniq -d
This will give you a list of files with duplicate names (you can then use find again to figure out where those live).
---- EDIT -----
If you're looking for identical files (with the same md5 sum), you could also do:
find . -type f -exec md5sum {} \; | sort | uniq -d --check-chars=32
--- EDIT 2 ----
If your md5sum doesn't output the filename, you can use:
find . -type f -exec echo -n "{} " \; -exec md5sum {} \; | awk '{print $2 $1}' | sort | uniq -d --check-chars=32
--- EDIT 3 ----
If you're looking for files with a specific md5 sum:
sum=`md5sum foo.txt | cut -f1 -d " "`
find ~/some/dir -type f -exec md5sum {} \; | grep $sum
This question already has answers here:
How to extract one column of a csv file
(18 answers)
Closed 8 years ago.
I have a folder of about 10 thousand files and I need to write a bash shell script that will pull a COLUMN of data out and put it in a file. Help??? Please and thank you!
EDIT To Include:
#!/bin/bash
cd /Users/Larry/Desktop/TestFolder
find . -maxdepth 1 -mindepth 1 -type d
sed '4q;d'
A separate attempt
for dir in /Users/Larry/Desktop/TestFolder
do
dir=${dir%*/}
sed -n '4q;d' > Success.txt
done
The files are comma-separated value files that open in a spreadsheet program like Numbers or Excel. I want to extract a single column from each file, but there are at least 10 thousand files in each folder, so passing them all as arguments gives the error "Argument list too long".
Another attempt
find /Users/Larry/Desktop/modified -type f -maxdepth 1 -name '*.csv' -print0 | xargs -0 awk -F '","' '{print $2}'
find /Users/Larry/Desktop/modified -type f -maxdepth 1 -name '*.csv' -print0 | xargs -0 awk -F '"*,*' '{print $2}' > DidItWorkThisTime.csv
The approach from the linked previous question does not work for large sets of files.
If the directory has so many files that you exceed the argument limit, you should use find and xargs.
find /Users/Larry/Desktop/modified -type f -maxdepth 1 -name '*.csv' -print0 |
xargs -0 awk -F '"*,"*' '{print $2}' > Success.txt
Try:
find /Users/Larry/Desktop/TestFolder -type f -maxdepth 1 -name '*.csv' -exec awk -F, '{ print $2 }' '{}' \; > Success.txt
It should execute awk on each csv file found, using a comma to separate fields (-F,), to print the second ($2) field, and redirect the output to Success.txt.
Also, you might swap > Success.txt for | tee Success.txt if you want to see the output AND have it saved to the file, at least while you're testing the command and don't want to wait for all those files to be processed to see if it worked.
A simple and straightforward adaptation of the code you already have.
find /Users/Larry/Desktop/TestFolder -maxdepth 1 -mindepth 1 -type f -name '*.csv' |
xargs cut -d, -f2
If you want files, -type d is wrong. I changed that to -type f and added the -name option to select only *.csv files.
for dir in /Users/Larry/Desktop/TestFolder/*
do
cut -f2 "$dir"/*.csv
done
This is assuming TestFolder contains a number of directories, and each of them contains one or more *.csv files. This can be further simplified to
cut -d, -f2 /Users/Larry/Desktop/TestFolder/*/*.csv
but this could give you the "Argument list too long" error you tried to avoid.
All of these will print to standard out; add >Success.txt at the end to redirect to a file.
cut -d',' -f1,2,3 *.csv > result.csv
This assumes the field delimiter in your files is , (a CSV file, after all) and that you want columns 1, 2 and 3 in the result.
The above command will have problems if the needed columns themselves contain the delimiter inside a quoted field, e.g. "...,...".
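As a quick illustration with a made-up row: the comma inside the quoted field is counted as a separator, so the real third column (x) is silently dropped:
$ echo '"last, first",42,x' | cut -d',' -f1,2,3
"last, first",42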