wget - prevent creating empty directories - bash

Is there a way to stop wget from creating empty directories? Most of the files I need are found at one level of depth, i.e. in folder 2 of /1/2/, but I need to use infinite recursion because sometimes the file I need is at 1/2/3/ or deeper. Or at least, I need infinite recursion for the time being, until I figure out the maximum depth of where the files of interest are located.
Right now I'm using
wget -nH --cut-dirs=3 -rl 0 -A "*assembly*.txt" ftp://ftp.ncbi.nlm.nih.gov/genomes/genbank/bacteria
Which gets all the files I need, but leaves me with a bunch of empty directories. I would prefer the directory structure /bacteria/organism/*assembly*.txt, but if creating multiple subdirectories cannot be avoided, I want to at least stop wget from creating empty ones. I can, of course, remove the empty directories after running wget, but I would rather stop wget from creating them in the first place, if possible.

Short answer: you can't prevent the directories from being created.
You can do post-processing on the directories though. Note the -depth option, which makes find visit children before parents, so nested empty directories are removed bottom-up in a single pass:
find bacteria/ -depth -type d -empty -exec rmdir {} \;
Looking at a bunch of these directories (including the very busy one for E. coli), it appears, as you said, that the only files matching *assembly*.txt are stored in the first directory below bacteria. Unless there's some variation to this rule, you could just do this:
wget -nH --cut-dirs=2 -rl 2 -A "*assembly*.txt" ftp://ftp.ncbi.nlm.nih.gov/genomes/genbank/bacteria
BTW if you want your directory structure to start at bacteria/ you'll need to change --cut-dirs to 2 instead of 3.
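A quick way to sanity-check that cleanup step is to simulate a leftover tree and prune it. The directory and file names below are hypothetical, purely for illustration:

```shell
# Simulate the kind of tree wget leaves behind: one directory with a file,
# one branch of nested empty directories (names are made up).
mkdir -p bacteria/org_a bacteria/org_b/suppl
touch bacteria/org_a/GCA_000001_assembly_stats.txt

# Remove empty directories bottom-up; -depth visits children before parents,
# so nested empty directories disappear in a single pass.
find bacteria/ -depth -type d -empty -exec rmdir {} \;
```

After this runs, bacteria/org_a and its file survive, while the empty bacteria/org_b branch is gone entirely.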

Related

How can I delete all files in all subdirectories with a certain name?

I have been trying to use the command line to delete all files in all subdirectories with the name s_1_1102_c.jpg.
This question is similar to what I need: How to remove folders with a certain name. But that one removes the directories themselves, and I only want to delete the files named s_1_1102_c.jpg.
I will need to remove this file from 260 subdirectories under the L001 directory. My directory structure is like this:
L001
    C5.1
        s_1_1101_a.jpg
        s_1_1101_c.jpg
        s_1_1101_g.jpg
        s_1_1101_t.jpg
        s_1_1102_a.jpg
        s_1_1102_c.jpg
        s_1_1102_g.jpg
        s_1_1102_t.jpg
        s_1_1103_a.jpg
        s_1_1103_c.jpg
        s_1_1103_g.jpg
        s_1_1103_t.jpg
    C6.1
        s_1_1101_a.jpg
        s_1_1101_c.jpg
        s_1_1101_g.jpg
        s_1_1101_t.jpg
        s_1_1102_a.jpg
        s_1_1102_c.jpg
        s_1_1102_g.jpg
        s_1_1102_t.jpg
        s_1_1103_a.jpg
        s_1_1103_c.jpg
        s_1_1103_g.jpg
        s_1_1103_t.jpg
Ultimately I need to remove several files from all subdirectories (s_1_1101_g.jpg, s_1_1101_t.jpg, s_1_1102_a.jpg, s_1_1102_c.jpg, s_1_1102_g.jpg, s_1_1102_t.jpg). So maybe there is a way to provide a list of the file names I need to delete.
How can I delete these files?
find . -name "s_1_1102_c.jpg" -exec rm -f {} \;
Note: This will find and delete the file in any subdirectory of the current one. So you could execute it in L001 or wherever else you want to do this.
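Since you ultimately need to delete several names, find can also take a list of -name tests and delete everything in one traversal. A sketch, with a miniature copy of the layout standing in for the real one:

```shell
# Recreate a tiny slice of the layout from the question.
mkdir -p L001/C5.1 L001/C6.1
touch L001/C5.1/s_1_1102_c.jpg L001/C5.1/s_1_1101_a.jpg L001/C6.1/s_1_1102_g.jpg

# One pass deletes every listed name; \( ... \) groups the -o alternation
# so -type f applies to the whole list.
find L001 -type f \( -name 's_1_1102_c.jpg' -o -name 's_1_1102_g.jpg' \) -delete
```

Extend the parenthesized group with more `-o -name '...'` clauses for each file name you need to remove.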
for i in s_1_1101_g.jpg s_1_1101_t.jpg s_1_1102_a.jpg s_1_1102_c.jpg s_1_1102_g.jpg s_1_1102_t.jpg; do
echo rm L001/*/"$i";
done
If output looks fine, remove echo.
The final method I used to delete my files was given by @Peter - Reinstate Monica:
for f in s_1_1101_t.jpg s_1_1102_a.jpg s_1_1102_c.jpg s_1_1102_g.jpg s_1_1102_t.jpg s_1_1103_a.jpg s_1_1103_c.jpg s_1_1103_g.jpg s_1_1103_t.jpg s_1_1104_a.jpg s_1_1104_c.jpg s_1_1104_g.jpg s_1_1104_t.jpg s_1_2101_g.jpg s_1_2101_t.jpg s_1_2102_a.jpg s_1_2102_c.jpg s_1_2102_g.jpg s_1_2102_t.jpg s_1_2103_a.jpg s_1_2103_c.jpg s_1_2103_g.jpg s_1_2103_t.jpg s_1_2104_g.jpg s_1_2104_t.jpg; do
    find /hpc/home/L001 -name "$f" -delete
done
I was concerned that my file list would be too long but it worked in this situation.

Merge files with same name in more than 100 folders

I have a problem similar to Merge files with same name in different folders: I have about 100 different folders, each containing a .txt file "replaced_txt". The problem is that I need to merge those files, but since there are 100 different folders, I want to know if there is something quicker than doing:
cat /folder1/replaced_txt /folder2/replaced_txt /folder3/replaced_txt ...
The cat command is just about the simplest there is, so there is no obvious and portable way to make the copying of file contents any faster. The bottleneck is probably going to be finding the files, anyway, not in copying them. If indeed the files are all in subdirectories immediately below the root directory,
cat /*/replaced_txt >merged_txt
will expand the wildcard alphabetically (so /folder10/replaced_txt comes before /folder2/replaced_txt), but it might run into "Argument list too long" and/or take a long time to expand the wildcard if some of these directories are large (especially on an older Linux system with an ext3 filesystem, which doesn't scale to large directories very well). A more general solution is find, which is better at finding files in arbitrarily nested subdirectories and won't run into "Argument list too long", because it never tries to assemble all the file names into an alphabetized list. Instead, it enumerates the files as it traverses directories, in whichever order the filesystem reports them, and creates a new cat process whenever the argument list fills up to the point where the system's ARG_MAX limit would be exceeded.
find / -type f -name replaced_txt -xdev -exec cat {} + >merged_txt
If you want to limit how far subdirectories will be traversed or you only want to visit some directories, look at the find man page for additional options.
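For instance, if the replaced_txt files really are all exactly one level down, -mindepth/-maxdepth (GNU and BSD find) stop the traversal from descending further. A sketch, assuming the folders sit under the current directory rather than /:

```shell
# Recreate a small example: two folders, each holding a replaced_txt.
mkdir -p folder1 folder2
echo alpha > folder1/replaced_txt
echo beta  > folder2/replaced_txt

# Concatenate every replaced_txt found exactly one directory deep.
# merged_txt itself sits at depth 1, so -mindepth 2 also keeps it out of the scan.
find . -mindepth 2 -maxdepth 2 -type f -name replaced_txt -exec cat {} + > merged_txt
```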

Script to change photos folder hierarchy

I'm looking for a script to quickly change an existing folder structure on my Synology NAS over SSH, from YYYY/MM/DD/ to YYYY-MM-DD/ (nested to flat), but I'm struggling to find one, or any examples, probably because I'm not searching for the correct terminology.
I did start to use exiftool and go through moving each item but the collection is taking ages.
For example, an image that currently resides in, say, 2020/01/01/image01.jpg needs to move to 2020-01-01/image01.jpg.
Only image and video files currently live in the day folders.
For example, you can do this with this command:
find . -mindepth 3 -type d |awk -F"/" 'system("mv " $2"/"$3"/"$4" "$2"-"$3"-"$4" && rmdir "$2"/"$3" && rmdir "$2" 2>/dev/null")'
Example input folder structure:
./2020/06/12/img1.jpg
./2020/06/12/file2.mpg
./2020/05/10/img2.jpg
./2020/05/10/img1.jpg
Result:
./2020-05-10/img1.jpg
./2020-05-10/img2.jpg
./2020-06-12/img1.jpg
./2020-06-12/file2.mpg
Explanation:
find . searches in the current directory
-mindepth 3 only matches entries at least 3 levels deep (the day directories)
-type d only matches directories
| creates a pipe, directing the find output to AWK
-F"/" sets the AWK field separator to /
the system command is executed for each line; it moves the files to the new directory and deletes the unnecessary old directories
&& runs the next command only when the previous ones succeeded
2>/dev/null redirects stderr to the void so that you won't see errors from trying to delete a non-empty "year" directory
You cannot use rm -rf because the year directory may still contain other directories, hence using rmdir twice.
You could handle the stderr redirection more elegantly by testing whether the year directory is empty before deleting it; I left that test out so as not to obscure the idea unnecessarily.
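For what it's worth, the same flattening can be written in plain shell, avoiding awk's system() call and its quoting pitfalls (file names with spaces are handled by the quoting below). A sketch; the function name and the argument convention are my own assumptions:

```shell
# flatten_dates ROOT: move the contents of ROOT/YYYY/MM/DD/ into ROOT/YYYY-MM-DD/
# and remove the now-empty nested date directories.
flatten_dates() {
    root=$1
    for day in "$root"/*/*/*/; do
        [ -d "$day" ] || continue          # skip if the glob matched nothing
        dir=${day%/}                       # e.g. root/2020/06/12
        rel=${dir#"$root"/}                # e.g. 2020/06/12
        flat=$root/$(printf '%s' "$rel" | tr '/' '-')   # e.g. root/2020-06-12
        mkdir -p "$flat"
        mv "$dir"/* "$flat"/
        # Remove day, then month, then year; rmdir only succeeds on empty
        # directories, so a year that still has other months is left alone.
        rmdir "$dir" "${dir%/*}" "${dir%/*/*}" 2>/dev/null || true
    done
}
```

Run it as, e.g., `flatten_dates /volume1/photos` from anywhere; it never needs rm -rf, since rmdir refuses to touch anything non-empty.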

Unix - Move folder where containing files match a name

I was wondering how to move a number of folders to another one, according to the filename of a file inside each of the folders.
I mean, let's assume I have a large number of folders, each with a name starting with 'folder', and each containing 3 files. Specifically, one of the files contains a string which might be '-100', '-200' or '-300', for example.
I want to move the folders according to this string, and put them in a folder named after it. For example, to put every folder containing a file which contains the string '-100' into the folder 'FOLDER1', I'm trying something like:
find folder* -name '100' -exec mv {} folder* FOLDER1
but it returns -bash: /usr/bin/find: Argument list too long.
How can I pass fewer arguments to find at each step so I don't get this error?
Thanks in advance.
Best.
Using your example, and running in the topmost folder containing all the folders, I believe that what you need is this:
grep -rlw -e "-100" folder* | xargs -I % dirname % | sort -u | xargs -I % mv % FOLDER1
grep -l prints the path of each matching file, so dirname is needed to get the folder that contains it; sort -u avoids trying to move the same folder twice.
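If the xargs pipeline feels opaque, the same move can be written as a plain loop. A sketch, with hypothetical folder contents (the question doesn't say whether FOLDER1 already exists, so this creates it):

```shell
# Recreate a tiny example (contents are made up).
mkdir -p folder1 folder2 FOLDER1
echo 'value -100' > folder1/data.txt
echo 'value -200' > folder2/data.txt

# Move every folder* directory containing a file that mentions -100 into FOLDER1.
for d in folder*/; do
    # -r searches the directory, -q only tests for a match, -w matches -100
    # as a whole word, and -e protects the pattern's leading dash from being
    # parsed as an option.
    if grep -rqw -e '-100' "$d"; then
        mv "$d" FOLDER1/
    fi
done
```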

Processing only the current directory with find/prune

I've been reading up on find's -prune action. One common task I do is to process only the files of a directory, ignoring all directories.
Prune, from what I've learned, is great for ignoring directories if you know their names (or wildcards matching their names). But what if you don't know their names (or a pattern that matches files as well as directories)?
I found that -maxdepth achieves what I'm trying to do. I'm just wondering what the equivalent -prune approach might be.
For example, say I want to process all the files of my home directory, but not recurse into any subdirectory. Let's say my directory structure and files look like this (directories ending in '/'):
~/tmpData.dat
~/.bashrc
~/.vimrc
~/.Xdefaults
~/tmp/
~/tmp/.bashrc
~/bkups/.bashrc
~/bkups/.vimrc
~/bkups/.Xdefaults
~/bkups/tmpData.dat
.. what would be the correct find/prune command?
OK, I found my own solution. I simply specify pattern(s) that match everything in my home directory ('~/*' for example). But in order to include all my dot files (.bashrc, etc.), I have to use two patterns; one for non-dotted filenames and one for the files starting with dots:
find ~/* ~/.* -type d -prune -o -type f -print
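For completeness, there is also a prune idiom that covers dot files without needing two globs: start the search at dir/. so the starting point is the only entry named ".", prune everything else at depth one, and print the files (pruning a file is a no-op, so files at depth one still reach -type f). A sketch on a throwaway tree:

```shell
# Build a small tree: top-level files (one hidden) and a subdirectory.
mkdir -p home/sub
touch home/a.txt home/.hidden home/sub/b.txt

# Start at home/.; everything except the "." start point is pruned,
# but files at depth one are still matched by -type f and printed.
find home/. ! -name . -prune -type f
```

This lists home/./a.txt and home/./.hidden, but never descends into home/sub.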
