Wondering how to delete files when their names increment? - macos

I have files in a directory like this:
file3.proto
file2.proto
file1.proto
I want to delete file1 and file2; the highest number is the latest file, which I don't want to delete. How can I achieve this in a shell script?
The command below does the job, but I want it to be more dynamic. I don't want to change the shell script every time the number increments; for example, if the latest file is 4, then I have to change the range to 1..3.
ls | grep '.proto' | rm file{1..2}.proto

ls *.proto | head -n -1 | xargs rm
which with these files
file1.proto
file2.proto
file3.proto
executes the command
rm file1.proto file2.proto
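Note that head -n -1 (print all but the last line) relies on GNU head; the BSD head that ships with macOS rejects negative line counts. A portable sketch with the same effect, assuming no filenames contain spaces or newlines, uses sed to drop the last line instead:
ls *.proto | sed '$d' | xargs rm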
UPDATE: Be warned that the ls command outputs files in alphabetical order, which is not numerical order... That is, if you also have a file25.proto, you'll get this output from ls:
file1.proto
file25.proto
file2.proto
file3.proto
So it would be better (if possible) to rename the files with zero-padded numbers, like file001.proto, depending on the maximum possible number of files present in the folder. This is a common issue with filename ordering...
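If renaming is not an option, you can also force a numeric ordering in the pipeline itself. A sketch, assuming your sort supports version sort (-V, present in GNU sort and recent BSD sort):
ls *.proto | sort -V | sed '$d' | xargs rm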

Related

bash: use list of file names to concatenate matching files across directories and save all files in new directory

I have a large number of files spread across three different directories. For some of these files, a file with an identical name exists in another directory; other files exist in only one directory. I'd like to use bash to copy all of the files from the three directories into a single new directory, but for names that appear in more than one directory I want to concatenate the file contents across directories before saving to the new directory.
Here's an example of what my file structure looks like:
ls dir1/
file1.txt
file2.txt
file4.txt
ls dir2/
file2.txt
file5.txt
file6.txt
file9.txt
ls dir3/
file2.txt
file3.txt
file4.txt
file7.txt
file8.txt
file10.txt
Using this example, I'd like to produce a new directory that contains file1.txt through file10.txt, but with the contents of identically named files (e.g. file2.txt, file4.txt) concatenated in the new directory.
I have a unique list of all of the file names contained in my three directories (a single instance of each unique file name is contained in the list). So far, I have come up with code to take the file names from one directory and concatenate those files with identically named files in a second directory, but I'm not sure how to use my list of file names as the reference for concatenating and saving files (instead of the output of ls in the first directory). Any ideas for how to modify this? Thanks very much!
PATH1='/path/to/dir1'
PATH2='/path/to/dir2'
PATH3='/path/to/dir3'
mkdir dir_new
ls $PATH1 | while read FILE; do
cat $PATH1/"$FILE" $PATH2/"$FILE" $PATH3/"$FILE" >> ./dir_new/"$FILE"
done
You can do it like this:
mkdir -p new_dir
for f in path/to/dir*/*.txt; do
cat "$f" >> "new_dir/${f##*/}"
done
This is a common use for substring removal with parameter expansion, in order to use only the basename of the file to construct the output filename.
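For illustration, ${f##*/} removes the longest prefix matching */, leaving only the basename (hypothetical path shown):
f=path/to/dir2/file5.txt
echo "${f##*/}"    # prints: file5.txt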
Or you can use a find command to get the files and execute the command for each one:
find path/to/dir* -type f -name '*.txt' -print0 |\
xargs -0 -n1 sh -c 'cat "$0" >> new_dir/"${0##*/}"'
In the above command, the filenames coming out of find are preserved with NUL separation (-print0), and xargs accepts a NUL-separated list as well (-0). For each argument (-n1), the command that follows is executed. We call sh -c 'command' for convenience, so we can use substring removal inside it; the argument provided by xargs is accessible there as $0.
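If the $0 usage looks unusual, here is a minimal illustration: the first argument after the command string becomes $0 inside sh -c:
sh -c 'echo "$0"' hello    # prints: hello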

Bash shell script: recursively cat TXT files in folders

I have a directory of files with a structure like below:
./DIR01/2019-01-01/Log.txt
./DIR01/2019-01-01/Log.txt.1
./DIR01/2019-01-02/Log.txt
./DIR01/2019-01-03/Log.txt
./DIR01/2019-01-03/Log.txt.1
...
./DIR02/2019-01-01/Log.txt
./DIR02/2019-01-01/Log.txt.1
...
./DIR03/2019-01-01/Log.txt
...and so on.
Each DIRxx directory has a number of subdirectories named by date, which themselves have a number of log files that need to be concatenated. The number of text files to concatenate varies, but could theoretically be as many as 5. I would like to see the following command performed for each set of files within the dated directories (note that the files must be concatenated in reverse order):
cd ./DIR01/2019-01-01/
cat Log.txt.4 Log.txt.3 Log.txt.2 Log.txt.1 Log.txt > ../../Log.txt_2019-01-01_DIR01.txt
(I understand the above command will give an error for files that do not exist, but cat will still do what I need of it anyway)
Aside from cding into each directory and running the above cat command, how can I script this into a Bash shell script?
If you just want to concatenate all files in all subdirectories whose name starts with Log.txt, you could do something like this:
for dir in DIR*/*; do
date=${dir##*/};
dirname=${dir%%/*};
cat "$dir"/Log.txt* > Log.txt_"${date}"_"${dirname}".txt;
done
If you need the files in reverse numerical order, from 5 to 1 and then Log.txt, you can do this:
for dir in DIR*/*; do
date=${dir##*/};
dirname=${dir%%/*};
cat "$dir"/Log.txt.{5..1} "$dir"/Log.txt > Log.txt_"${date}"_"${dirname}".txt;
done
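The {5..1} brace expansion produces a descending list before cat runs, regardless of which files actually exist, which is what gives the reverse order:
echo Log.txt.{5..1}    # Log.txt.5 Log.txt.4 Log.txt.3 Log.txt.2 Log.txt.1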
That will, as you mention in your question, complain for files that don't exist, but that's just a warning. If you don't want to see that, you can redirect error output (although that might cause you to miss legitimate error messages as well):
for dir in DIR*/*; do
date=${dir##*/};
dirname=${dir%%/*};
cat "$dir"/Log.txt.{5..1} "$dir"/Log.txt > Log.txt_"${date}"_"${dirname}".txt;
done 2>/dev/null
Not as comprehensive as the others, but quick and easy: use find, sort the output however you like (-zrn is --zero-terminated --reverse --numeric-sort), then iterate over it with read.
find . -type f -print0 |
sort -zrn |
while read -rd ''; do
cat "$REPLY";
done >> log.txt
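If the bare read -rd '' looks odd: with no variable name, read stores what it reads in REPLY, and -d '' makes it split on the NUL bytes produced by -print0. A minimal illustration:
printf 'a\0b\0' | while read -rd ''; do echo "$REPLY"; done    # prints a, then b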

How to delete one set of files in a directory containing similarly named files?

A series of several hundred directories contains files in the following pattern:
Dir1:
-text_76.txt
-text_81.txt
-sim_76.py
-sim_81.py
Dir2:
-text_90.txt
-text_01.txt
-sim_90.py
-sim_01.py
Within each directory, the files beginning with text or sim are essentially duplicates of the other text or sim file, respectively. Each set of duplicate files has a unique numerical identifier. I only want one set per directory. So, in Dir1, I would like to delete everything in the set labeled either 81 OR 76, with no preference. Likewise, in Dir2, I would like to delete either the set labeled 90 OR 01. Each directory contains exactly two sets, and there is no way to predict the random numerical IDs used in each directory. How can I do this?
Assuming you always have 1 known file, say text_xx.txt then you could run this script in each sub-directory:
ls text_*.txt | { read first; rm *"${first:4:4}"*; };
This will list all files matching the wildcard pattern text_*.txt. Using read takes only the first line of the ls output, which leaves the $first shell variable containing one fully expanded match: text_xx.txt. After that, ${first:4:4} takes a substring of this match to get the characters _xx., relying on the known lengths of text_ and xx. Finally, rm *"${first:4:4}"* wraps the substring in wildcards and executes it as the command rm *_xx.*.
I chose to include _ and . around xx to be a bit conservative about what gets deleted.
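To make the offsets concrete (with a hypothetical identifier of 76):
first=text_76.txt
echo "${first:4:4}"    # prints: _76.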
If the length of xx is not known, things get a bit more complicated. A safer command, when unsure of this length, might be:
ls text_??.txt | { read first; rm *_"${first:5:2}".*; };
This should remove one "fileset" every time it is run in a given sub-directory. If there is only 1 fileset, it would still remove the fileset.
Edit: Simplified to remove an unnecessary use of IFS.
Edit: Expanded on and clarified the explanation.
ls | grep -P "*[81|76]*" | xargs -d"\n" rm
ls | grep -P "*[90|01]*" | xargs -d"\n" rm
How it works:
ls lists all the files (one per line, since the output is piped).
grep -P filters that list down to the lines matching the Perl-style pattern.
xargs -d"\n" rm passes each line as a separate argument to rm.

How to archive files under certain dir that are not text files in Mac OS?

Hey guys, I used the zip command, but I only want to archive all the files except *.txt. For example, given two dirs file1 and file2, both containing some *.txt files, I want to archive only the non-text files from file1 and file2.
tl;dr: How do I tell Linux to give me all the files that don't match *.txt?
$ zip -r zipfile folder1 folder2 ... -x '*.txt'
Move to your desired directory and run:
ls | grep -P '\.(?!txt$)' | zip -# zipname
This will create a zipname.zip file containing everything but .txt files. In short, what it does is:
List all the files in the directory, one per line (this could be made explicit with the -1 option, but it is not needed here, since one-per-line is already the default when the output is not a terminal; here, it is a pipe).
Extract from that all lines that do not end in .txt. Note that this is grep using a Perl regular expression (option -P), so that a negative lookahead can be used.
Zip the list read from stdin (-#) into the zipname file.
Update
The first method I posted fails with files that have two .s in their name, as I described in the comments. For some reason, though, I forgot about grep's -v option, which prints only the lines that don't match the regex. Plus, go ahead and include a case-insensitive option (-i):
ls | grep -vi '\.txt$' | zip -# zipname
Simple: use bash's extended glob option, like so:
#!/bin/bash
shopt -s extglob
zip -some -options !(*.txt)
Edit
This isn't as good as zip's built-in -x option, but this solution is generic across any command that may lack such a feature.
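For instance, a concrete invocation might look like this (zipname.zip is just an assumed archive name):
shopt -s extglob
zip -r zipname.zip !(*.txt)
Be aware that with -r, .txt files inside subdirectories would still be included, since !(*.txt) only filters the top-level names.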

bash script to delete old deployments

I have a directory where our deployments go. A deployment (which is itself a directory) is named in the format:
<application-name>_<date>
e.g. trader-gui_20091102
There are multiple applications deployed to this same parent directory, so the contents of the parent directory might look something like this:
trader-gui_20091106
trader-gui_20091102
trader-gui_20091010
simulator_20091106
simulator_20091102
simulator_20090910
simulator_20090820
I want to write a bash script to clean out all deployments except the most current one for each application (the most current being denoted by the date in the deployment's name). So running the bash script on the above parent directory would leave:
trader-gui_20091106
simulator_20091106
Any help would be appreciated.
A quick one-liner:
ls | sed 's/_[0-9]\{8\}$//' | uniq |
while read name; do
rm -r $(ls -rd ${name}* | tail -n +2)
done
List the entries, chop off an underscore followed by eight digits, and keep only the unique names. For each name, remove everything but the most recent (the -d flag keeps ls from listing the directories' contents, and rm -r is needed because the deployments are directories).
Assumptions:
the most recent will be last when sorted alphabetically. If that's not the case, add a sort that does what you want in the pipeline before tail -n +2
no other files in this directory. If there are, limit the output of the ls, or pipe it through a grep to select only what you want.
no weird characters in the filenames. If there are... instead of directly using the output of the inner ls pipeline, you'd probably want to pipe it into another while loop so you can quote the individual lines, or else capture it in an array so you can use the quoted expansion (see the sketch below).
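A sketch of that last array-based variant, assuming bash 4+ for mapfile (filenames containing newlines would still break it):
mapfile -t old < <(ls -rd "${name}"* | tail -n +2)   # all but the newest, as an array
(( ${#old[@]} )) && rm -r "${old[@]}"                # remove only if something is left over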
shopt -s extglob
ls | awk -F"_" '{a[$1]=$NF} END{for(i in a) print i, a[i]}' | while read -r app latest
do
rm -r ${app}_!($latest)
done
Since the date in the filename is already "sortable", the awk command finds the latest deployment of each application (ls output is sorted, so the last date seen for each name is the newest). rm -r ${app}_!($latest) then removes every deployment of that application except the latest.
You could try find:
# Example: Find and delete all directories in /tmp/ older than 7 days:
find /tmp/ -type d -mtime +7 -exec rm -rf {} \; &>/dev/null
