Shell, copy files with similar names - bash

I would like to copy a series of similarly named files from the current directory to a target directory. The files under the current directory are:
prod07_sim0500-W31-0.2_velocity-models-2D_t80_f0001_ux.hst
prod07_sim0500-W31-0.2_velocity-models-2D_t80_f0001_uz.hst
prod07_sim0500-W31-0.2_velocity-models-2D_t80_f0002_ux.hst
prod07_sim0500-W31-0.2_velocity-models-2D_t80_f0002_uz.hst
prod07_sim0500-W31-0.2_velocity-models-2D_t80_f0003_ux.hst
prod07_sim0500-W31-0.2_velocity-models-2D_t80_f0003_uz.hst
Where sim runs from sim0001 to sim0500 and f from f0001 to f0009. I only need f0002, f0005 and f0008. I wrote the following code:
target_dir="projects/data"
for i in {0001..0500}; do
for s in f000{2,5,8}; do
files="[*]$i[*]$s[*]"
cp $files target_dir
done
done
I am very new to shell scripting and am wondering how to write files="[*]$i[*]$s[*]" so that it matches only f0002, f0005 and f0008. The reason I also loop with for i in {0001..0500}; do is that the files are large and I would like some complete sets (for example, everything for sim0001) to be available early on.
Edit: changed for s in f0002 f0005 f0008; do to f000{2,5,8}.

What you need is globbing and slightly different quoting:
cp *"$i"*"$s"* "$target_dir"
Not storing this in a variable is intentional: it's faster and it's safe. If you end up with such a large list of files that you start running into system limits, you'll have to look into xargs.
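Putting it together with the two loops from the question, a minimal sketch could look like this (the sim prefix in the pattern and the message for non-matching patterns are extra safeguards, not strictly needed):
target_dir="projects/data"
for i in {0001..0500}; do
    for s in f000{2,5,8}; do
        # the unquoted asterisks are globbed by the shell,
        # while the quoted variables are taken literally
        cp *sim"$i"*"$s"* "$target_dir" 2>/dev/null || echo "no files matched sim$i $s" >&2
    done
done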

Related

Print all ongoing targets in Makefile

I have written a makefile which has pretty complicated dependencies and executes with multiple jobs in parallel (make -j100, for example). I am trying to find a way to print the names of all currently running targets. Any idea? Thanks in advance.
If what you want is a kind of command that you can run from time to time while make is running, and that shows all currently executing recipes, you could slightly modify your recipes such that they first create a temporary file with the name of the target, do whatever they are supposed to do and delete the temporary file. Listing these temporary files anytime will then show you the currently executing recipes.
Example if all targets are located under the directory from which make is called (or sub-directories of it):
TAGSDIR := .tags
MKTAG = mkdir -p "$(TAGSDIR)/$(@D)" && touch "$(TAGSDIR)/$@"
RMTAG = rm -f "$(TAGSDIR)/$@"
<target>: <prerequisites>
	@$(MKTAG)
	<regular recipe>
	@$(RMTAG)
And list all files under .tags to get the names of all currently running recipes. Example with find:
find .tags -type f -printf '%P\n'
You could even encapsulate this in an infinite loop and refresh the list e.g. every second:
while true; do clear; find .tags -type f -printf '%P\n'; sleep 1; done
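For concreteness, a complete rule using these macros might look like this (a sketch; the object-file rule is made up for illustration, and recipe lines must start with a tab):
out/%.o: src/%.c
	@$(MKTAG)
	$(CC) -c -o $@ $<
	@$(RMTAG)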
EDIT
Andreas noticed that this works only if the targets are all located under the directory from which make is called. If a target is ../foobar, for instance, the temporary tag file would be .tags/../foobar, which is not what we want.
Andreas suggests substituting .. with \.\. and / with \/. We could maybe find a way to do something like this under GNU/Linux and macOS (though not exactly that, since you cannot have a slash in a file name), but there could still be other issues under Windows (C:, backslashes...).
We could also store the name of the target in a text file and use mktemp or an equivalent to generate the text file with a unique name. But we would then need a way to propagate this unique name from MKTAG to RMTAG. This is doable with a shell variable and a one-line recipe (or the .ONESHELL special target) but not very nice.
As you use GNU make we could also use abspath and create temporary files named $(TAGSDIR)/$(abspath $@) but I do not know what abspath does under Windows with drive letters, nor do I know if you can name a file something\c:\something under Windows...
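On GNU/Linux or macOS that variant could look like the following sketch (the plain concatenation works because abspath already yields a leading slash):
MKTAG = mkdir -p "$(dir $(TAGSDIR)$(abspath $@))" && touch "$(TAGSDIR)$(abspath $@)"
RMTAG = rm -f "$(TAGSDIR)$(abspath $@)"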
So, if your targets are not all located under the directory from which make is called, the best is to use another solution.

How to recursively rename all files and folder including specific part of the filename with Windows Bash?

This has to be a duplicate, but I have read and tried at least a dozen Q&As here on SO and I cannot get any of them working for my case.
Really hope this won't result in downvotes because of it.
So I'm on Windows (10) and have a Bash terminal that I want to use for my task. The MINGW64 one I downloaded when I started working with Git.
I would prefer a solution using this program, but will be perfectly happy with one for the Command Prompt or even PowerShell.
I created a TemplateApp which is in C:\Apps\TemplateApp folder which has multiple folders and subfolders named TemplateApp or TemplateApp.something as well as a lot of files that have TemplateApp as a part of their name.
Could be:
TemplateApp.ext
TemplateApp.something.ext
something.TemplateApp.something.ext
Then I copied the uppermost folder to C:\Apps\TemplateApp - Copy and in turn renamed it to C:\Apps\ProductionApplication.
Now, for the love of whomever, I cannot make any of the scripts I found on SO work for my case, i.e. rename all the above-mentioned files and folders by replacing TemplateApp with ProductionApplication.
Here is a bash function I wrote that I think does very much what you are wanting to do.
function func_CreateSourceAndDestination() {
    for (( i = 0 ; i < ${#files_syncSource[@]} ; i++ )) ; do
        # strip the source root to get a path relative to the destination root
        files_syncDestination[${i}]="${files_syncSource[${i}]#${directory_MusicLibraryRoot_source}}"
        file_destinationPath="$( dirname -- "${directory_PMPRoot_destination}${files_syncDestination[${i}]}" )"
        # create the destination directory if it does not exist yet
        if [ ! -d "${file_destinationPath}" ] ; then
            mkdir -p "${file_destinationPath}"
        fi
        rsync -rltDvPmz "${files_syncSource[${i}]}" "${directory_PMPRoot_destination}${files_syncDestination[${i}]}"
    done
}
In my case I'm feeding into rsync for a source and a destination. I'm pulling all the file paths from an array that has been split into path segments. I have to make certain character substitutions for FAT and NTFS file systems. I do this recursively.
files_syncDestination[${i}]="${files_syncDestination[${i}]//\:/__}"
That's the magic. I load a new array with the character substituted. You could do the same with a loaded variable including your phrases for change.
files_syncDestination[${i}]="${files_syncDestination[${i}]//${targetPhrase}/${subPhrase}}"
After that change in the function, you could use rsync or cp or mv as you prefer to go from your source array to your destination array.
(The double-slash in the substitution makes the substitution global.)
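Applied to the TemplateApp example from the question, the same substitution idea can be combined with find and mv; here is an untested sketch that renames files and folders in place (run from inside C:\Apps\ProductionApplication in the MINGW64 shell):
# -depth lists children before their parents, so renaming a directory
# never invalidates the paths that are still waiting to be processed
find . -depth -name '*TemplateApp*' | while IFS= read -r path; do
    dir=$(dirname -- "$path")
    base=$(basename -- "$path")
    # ${var//pattern/replacement} replaces every occurrence in the name
    mv -- "$path" "$dir/${base//TemplateApp/ProductionApplication}"
done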

Bash/shell/OS interpretation of . and .. — can I define ...?

How do . and .., as paths (vs. ranges, e.g., {1..10}, which I'm not concerned with), really work? I know what they do, and use them all the time, but don't fully grasp how/where they're interpreted. Does the shell handle them? The interpreting process? The OS?
The reason why I'm asking is that I'd like to be able to use ... to refer to ../.., .... to refer to ../../.., etc. (up to some small finite number; I don't need bash to process an arbitrarily large number of dots). I.e., if my current directory is /tmp/let/me/out, and I call cd ..., my resulting current directory should be /tmp/let. I don't particularly care if ... etc. show up in ls -a output like . and .. do, but I would like to be able to call cat /tmp/let/me/out/..../phew.txt to print the contents of /tmp/phew.txt.
Pointers to relevant documentation appreciated as well as direct answers. This kind of syntax question is very hard to Google.
I'm using bash 4.3.42, by the way, with the autocd and globstar shell options.
. and .. are genuine directory names. They are not "short-cuts", aliases, or anything fake.
They happen to point to the same inode as the other name you use. A file or directory can have several names pointing to the same inode, these are usually known as hard links, to distinguish them from symbolic (or soft) links.
If you are on Linux or OS X you can use stat to look at most of the inode metadata - it is what ls looks at. You will see there is an inode number. If you stat . and stat current-directory-name you will see that number is the same.
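For instance (a sketch; the directory and inode number are made up, and on macOS the equivalent invocation is stat -f '%i %N'):
$ cd /home/me/work
$ stat -c '%i %n' . /home/me/work
524301 .
524301 /home/me/work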
The one thing that is not held in the inode is the filename - that is held in the directory.
So . and .. reside in the directory on the file system, they are not a figment of the shell's imagination. So, for example, I can use . and .. quite happily from C.
I doubt you can change them - personally I have never tried and I never will. You would have to change what these filenames link to by editing the directory. If you managed it you would probably do irreparable damage to your file system.
I write this to clarify what has already been written before.
In many file systems a DIRECTORY is a file; a special type of file that the file system identifies as being distinctly a directory.
A directory file contains a list of names that map to files on the disk.
A file, including a directory, does not have an intrinsic name associated with it (not true in all file systems). The name of a file exists only in a directory.
The same file can have an entry in multiple directories (hard link). The same file can then have multiple names and multiple paths.
In such file systems, every directory always contains ENTRIES for the NAMES "." and "..". These entries are maintained by the file system itself.
The name "." links to its own directory.
The name ".." links to the parent directory EXCEPT for the top level directory where it links to itself (. and .. thus link to the same directory file).
So when you use "." and ".." as in /dir1/dir2/../dir3/./dir4/whatever,
"." and ".." are processed in the exact same way as "dir1" and "dir2".
This translation is done by the file system; not the shell.
cd ...
Does not work because there is no entry for "..." (at least not normally).
You can create a directory called "..." if you want.
You can actually achieve something like this, though this is an ugly hack:
You can run a command before every command entered to bash, and after every command. For that, you trap the DEBUG pseudo-signal and assign a command to PROMPT_COMMAND, respectively.
trap 'ln -s ../.. ... &>/dev/null || true' DEBUG
PROMPT_COMMAND='rm ...'
With this, it seems like there's an additional entry in the current directory:
pwd
# /tmp/crazy-stuff
ls -a
# . .. ... foo
ls -a .../tmp/crazy-stuff
# . .. ... foo
Though this only works in the current directory, because the symbolic link is deleted after each command invocation. Thus ls foo/bar/... won't work this way.
Another ugly hack would be to "override" mkdir such that it populates every new directory with these symbolic links.
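Such an override might look roughly like this (an untested sketch of the idea, not a polished solution):
mkdir() {
    command mkdir "$@" || return
    local d
    for d in "$@"; do
        # skip option arguments such as -p; only touch directories that now exist
        [ -d "$d" ] && ln -s ../.. "$d/..." 2>/dev/null
    done
    return 0
}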
See also the comments on the second answer here, particularly Eliah's: https://askubuntu.com/questions/327126/what-is-a-dot-only-named-folder
Much in the same way that, when you cd into some directory subdir, you're actually following a pointer that points to that directory, .. is a pointer added by the OS that points to the parent directory, and I'd imagine . works the same way.

Append part of folder name to all .gz within

I have a folder of data folders with the following structure:
sampleName1-randomNumbers/subfolder1/subfolder2/subfolder3/data1.gz
sampleName1-randomNumbers/subfolder1/subfolder2/subfolder3/data2.gz
sampleName2-randomNumbers/subfolder1/subfolder2/subfolder3/data1.gz
I want to rename all the data .gz files within each sample folder by prefixing them with the sample name (but not the random numbers) to get:
sampleName1-randomNumbers/subfolder1/subfolder2/subfolder3/sampleName1_data1.gz
sampleName1-randomNumbers/subfolder1/subfolder2/subfolder3/sampleName1_data2.gz
sampleName2-randomNumbers/subfolder1/subfolder2/subfolder3/sampleName2_data1.gz
It seems like this should be a simple mv for loop but I haven't been able to figure out how to pull part of a folder name using basename.
for i in */Data/Intensities/BaseCalls/*.gz; do mv "$i" "fastq/${i%%-*}.$(basename "$i")"; done
I couldn't figure out how to make the files stay in their original folder, but for my purposes it works to have all the files go to a new folder ("fastq").
I suppose the "sampleName" part doesn't include dashes. In that case, use the standard pattern removal expansion: %%. That is, suppose your full path (relative to directory root) is stored in $path, just do ${path%%-*} to extract the "sampleName" part. Search for %% in the Bash Reference Manual for more details. As a simple example:
> path=sampleName1-randomNumbers/subfolder1/subfolder2/subfolder3/data1.gz
> echo ${path%%-*}
sampleName1
Otherwise, you could also use more advanced substring extraction based on regex. See BashFAQ/100 or Manipulating Strings from the TLDP Advanced Bash Scripting Guide.
Update. Here's the full command to perform the job described, and it is entirely native to the shell:
for file in */Data/Intensities/BaseCalls/*.gz; do
mv "$file" "${file%/*}/${file%%-*}_${file##*/}"
done
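To see what each expansion contributes, here is the path from the question pulled apart (illustrative):
path='sampleName1-randomNumbers/subfolder1/subfolder2/subfolder3/data1.gz'
echo "${path%/*}"   # sampleName1-randomNumbers/subfolder1/subfolder2/subfolder3
echo "${path%%-*}"  # sampleName1
echo "${path##*/}"  # data1.gz
# so the mv target is
# sampleName1-randomNumbers/subfolder1/subfolder2/subfolder3/sampleName1_data1.gz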

Create new files from existing ones but change their extension

In shell, what is a good way to duplicate files in an existing directory so that the result is the same file with a different extension? So taking something like:
path/view/blah.html.erb
And adding:
path/view/blah.mobile.erb
So that in the path/view directory, there would be:
path/view/blah.html.erb
path/view/blah.mobile.erb
I'd ideally like to perform this at the directory level, and to skip files that already exist with both extensions, but that isn't necessary.
You can do:
cd /path/view/
for f in *.html.erb; do
cp "$f" "${f/.html./.mobile.}"
done
PS: This replaces the first instance of .html. with .mobile.; the syntax is Bash-specific (let me know if you're not using Bash).
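If you also want to skip files that already have a .mobile. counterpart, as mentioned in the question, a small existence check does it (a sketch building on the loop above):
cd /path/view/
for f in *.html.erb; do
    mobile="${f/.html./.mobile.}"
    # only copy when the .mobile. version is not already there
    [ -e "$mobile" ] || cp "$f" "$mobile"
done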
