Bash union of two directories in one statement - bash

I'm trying to run a command that takes one location input (intended for a single directory of files), but I need to run it on files in several locations. While I'd normally run it on */*.type, I'm looking for some way to run the command over (*/dirA/*.type AND dirB/*.type).
I basically need all files of *.type within a directory structure, but they're not all at the same directory level (or I'd just do */*/*.type or something to grab them all). Unfortunately they're in a particular layout for a reason, which I can't just reorganize to make this command run.
Is there any bash shortcut/command/whatever-it's-called that I could use to get both sets of files at once?

You can use brace expansion:
dir{A,B}/*.type
For example, running this with the ls command:
root@do:/tmp# ls dir{A,B}/*.type
dirA/test.type dirB/test.type

If the command works when you pass one wildcard in, that means it is expecting a list of file names. In that case you can pass it two wildcards just as easily.
command */dirA/*.type dirB/*.type
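As a quick check that the shell, not the command, does the expansion, here is a minimal sketch on a throwaway layout. The directory and file names (x/dirA/one.type, dirB/two.type) are made up for the illustration, and printf stands in for the real command.

```shell
# Throwaway layout with hypothetical names: x/dirA/one.type and dirB/two.type
demo=$(mktemp -d)
mkdir -p "$demo/x/dirA" "$demo/dirB"
touch "$demo/x/dirA/one.type" "$demo/dirB/two.type"
cd "$demo"

# Both patterns are expanded by the shell before the command starts,
# so the command receives one flat list of file names.
matched=$(printf '%s\n' */dirA/*.type dirB/*.type)
printf '%s\n' "$matched"
# prints:
# x/dirA/one.type
# dirB/two.type
```

Any command that accepts a list of files can take the place of printf here.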

Related

I want to get the latest file names under each directory of gcs

I want to know the path to the latest file under each directory using gsutil ls.
Shell script:
for dir in "${dir_list[@]}"; do
file+=$(gsutil ls -R "${dir}" | tail -n 1)
done
Running the command in a loop like this is very slow. Is there another way?
I want the final output to be:
gs://bucket/dir_a/latest.txt
gs://bucket/dir_b/latest.txt
gs://bucket/dir_c/latest.txt
gs://bucket/dir_d/latest.txt
There is no other strategy, and for a good reason: directories don't exist in GCS. You have to scan all the objects, read their metadata, pick the most recent one, and repeat that for each shared prefix.
A prefix is what you are calling a directory ("/path/to/prefix/"). That's why GCS lets you search only by prefix, not by file pattern.
So you could build a custom app that starts a concurrent process (fork) for each distinct prefix (directory), which lets you run the listings in parallel. It's not that simple to write, but it can be done!
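One rough way to get that kind of parallelism without writing a full app is to start one background job per prefix in the shell and wait for all of them. This is only a sketch: the function name latest_per_prefix is made up, it assumes gsutil is installed and authenticated, and it keeps the question's assumption that the last line of gsutil ls -R is the latest file.

```shell
# Sketch: list every prefix concurrently, then print the results in input order.
# latest_per_prefix is a hypothetical helper name, not part of gsutil.
latest_per_prefix() {
    local tmpdir i n dir
    tmpdir=$(mktemp -d) || return 1
    i=0
    for dir in "$@"; do
        # One background job per prefix; each job writes to its own file
        # so the outputs cannot interleave.
        gsutil ls -R "$dir" | tail -n 1 > "$tmpdir/$i" &
        i=$((i + 1))
    done
    wait    # block until every listing has finished
    n=0
    while [ "$n" -lt "$i" ]; do
        cat "$tmpdir/$n"
        n=$((n + 1))
    done
    rm -rf "$tmpdir"
}

# Usage (prefixes taken from the question's desired output):
# latest_per_prefix gs://bucket/dir_a gs://bucket/dir_b gs://bucket/dir_c
```

The total time is then roughly that of the slowest listing rather than the sum of all of them, at the cost of running many gsutil processes at once.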

append a parameter to a command in the file and run the appended command

I have a command in a file called $stat_val_result_command.
I want to add the -Xms1g parameter at the end of the file so that it looks like this:
<my command in the file> -Xms1g
However, I also want to run this command after appending to it. I am running this in a workflow system called "nextflow". I tried many things, including the following, but it does not work. Check the script section, which runs in Bash by default:
process statisticalValidation {
input:
file stat_val_result_command from validation_results_command.flatten()
output:
file "*_${params.ticket}_statistical_validation.txt" into validation_results
script:
"""
echo " -Xms1g" >> $stat_val_result_command && `cat $stat_val_result_command`
"""
}
It's best to avoid appending to or otherwise manipulating input files localized in the workdir, as these can be (and by default are) symbolic links to the original files.
In your case, consider instead exporting the JAVA_TOOL_OPTIONS environment variable. This might or might not work for you, but might give you some ideas if you have control over how the scripts are being generated:
export JAVA_TOOL_OPTIONS="-Xms1g"
bash "${stat_val_result_command}"
Also, it's generally better to avoid localizing and running scripts like this. It might be unavoidable, but usually there are better options. For example, third-party scripts, like your Bash script, could be handled more simply:
"Grant the execute permission to these files and copy them into a folder named bin/ in the root directory of your project repository. Nextflow will automatically add this folder to the PATH environment variable, and the scripts will automatically be accessible in your pipeline without the need to specify an absolute path to invoke them."
This of course assumes you can control and parameterize the process that creates your Bash scripts.
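If the goal is only to run the stored command with one extra argument, the file itself does not need to change. The following is a minimal sketch in plain Bash, assuming the file holds a single one-line command; cmd_file and its contents are hypothetical stand-ins, and inside a Nextflow script block the shell dollar signs would need escaping.

```shell
# Hypothetical stand-in for the localized input file; it holds one command line.
cmd_file=$(mktemp)
printf '%s\n' 'echo my-command' > "$cmd_file"

# Read the command, append the flag in memory, and run the combined line.
# The file on disk (possibly a symlink to the original) is never modified.
full_cmd="$(cat "$cmd_file") -Xms1g"
result=$(bash -c "$full_cmd")
printf '%s\n' "$result"    # prints: my-command -Xms1g
```

This only works when the extra argument belongs at the very end of the command line; for JVM options, the environment variable approach above avoids that restriction.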

Loop Over Files as Input for Program, Rename and Write Output to Different Directory

I have a problem writing the output of a program to a different directory when I loop over files and use them as inputs. I run this on the command line. The problem is that I do not know how to "tell" the program to put the output, with a changed filename, into a directory other than the input directory.
Here is the command, although it is a bioinformatics tool that requires specific input file formats. I am sorry that I could not give a better example. The program is called computeMatrix, from a software toolbox called deeptools2.
command:
for f in ~/my/path/*spc_files*; do computeMatrix reference-point --referencePoint center --regionsFileName /target/region.bed --binSize 500 --scoreFileName "$f" --outFileName "$f.matrix"; done
So far, I tried to use the command basename to just get the filename and then change the directory before that. However I could not figure out:
if this is combinable
what is the correct order of the commands (e.g.:
outputFile='basename"$f"', "~/new/targetDir/'basename$f'")
Probably there are other options to solve the problem which I could not think of/ find.
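One possible way to combine basename with a different target directory is sketched below. It is an illustration only: the temporary directories stand in for ~/my/path and ~/new/targetDir, the input file names are made up, and a no-op redirect takes the place of the real computeMatrix call.

```shell
src=$(mktemp -d)    # stand-in for the input directory
out=$(mktemp -d)    # stand-in for ~/new/targetDir
touch "$src/a_spc_files_1.bw" "$src/b_spc_files_2.bw"    # hypothetical inputs

for f in "$src"/*spc_files*; do
    name=$(basename "$f")          # file name without the directory part
    target="$out/$name.matrix"     # same name, different directory
    # Placeholder for the real tool: with computeMatrix, "$f" would go to
    # --scoreFileName and "$target" to --outFileName.
    : > "$target"
done
```

The command substitution has to use $(...) or backticks; single quotes around basename, as in the attempt above, produce a literal string instead of running the command.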

Should you change the current directory in a shell script?

I've always mentally regarded the current directory as something for users, not scripts, since it is dependent on the user's location and can be different each time the script is executed.
So when I came across the Java jar utility's -C option I was a little puzzled.
For those who don't know, the -C option is used before specifying a file/folder to include in a jar. Since the path to the file/folder is replicated in the jar, the -C option changes directories before including the file.
In other words:
jar -C flower lily.class
will make a jar containing the lily.class file, whereas:
jar flower/lily.class
will make a flower folder in the jar which contains lily.class
For a jar-ing script I'm making I want to use Bourne wild-cards folder/* but that would make using -C impossible since it only applies to the next immediate argument.
So the only way to use wild-cards is run from the current directory; but I still feel uneasy towards changing and using the current directory in a script.
Is there any downside to using the current directory in scripts? Is it frowned upon for some reason perhaps?
I don't think there's anything inherently wrong with changing the current directory from a shell script. Certainly it won't cause anything bad to happen, if taken by itself.
In fact, I have a standard script that I use for starting up a Java-based server, and the very first line is:
cd "$(dirname "$0")"
This ensures that the rest of the commands in the script are executed in the directory that contains the script file itself (useful when a single machine is hosting multiple server instances), regardless of where the shell script was actually invoked from. Without changing the current directory in the script, it would only work correctly if the user remembered to manually cd into the corresponding directory before running the script.
In this case, performing the cd operation from within the script removes a manual step from the server startup/shutdown process, and makes things slightly less error-prone as a result.
So, as with most things, there are legitimate uses for this sort of thing, and I'm sure there are some questionable ones as well. It really depends on what's most appropriate for your specific use case, which is something I can't really comment on... I always just let Maven build my JARs for me.
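If the worry is that a cd leaks into the rest of the script, one common pattern is to do the cd inside a subshell. The sketch below is an assumption-laden illustration: the flower directory and class files mirror the question's example, rose.class is made up, and printf stands in for the jar call.

```shell
work=$(mktemp -d)
mkdir "$work/flower"
touch "$work/flower/lily.class" "$work/flower/rose.class"
cd "$work"
before=$PWD

# $(...) runs in a subshell, so the cd ends when the substitution does.
# With jar, the same shape would be: (cd flower && jar cf ../flowers.jar *.class)
listing=$(cd flower && printf '%s\n' *.class)

after=$PWD    # unchanged: still the directory we started in
printf '%s\n' "$listing"
# prints:
# lily.class
# rose.class
```

The wildcard is expanded relative to flower, yet the calling script never leaves its original directory.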

regex/wildcard in scp

Is it possible to use a wildcard in scp?
I am trying to achieve:
loop
{
substitute_host (scp path/file.jar user@host:path1/foo*/path2/jar/)
}
I keep on getting "scp: ambiguous target"
Actually, I am calling an API with a source and destination that uses scp underneath and loops over different hosts to put files.
Thanks!
In general, yes, it is certainly possible to use a wildcard in scp.
But, in your scp command, the second argument is the target, the first argument is the source. You certainly cannot copy a source into multiple targets.
If you were trying to copy multiple jars, for example, then the following would certainly work:
scp path/*.jar user@host:path2/jar/
"ambiguous target" in this case is specifically complaining that the wildcard you're using results in multiple possible target directories on the remote host.
--- EDIT:
If you want to copy to multiple directories on a remote system and have to determine them dynamically, a script like the following should work:
dir_list=$(ssh user@host ls -d '/path1/foo*/path2/jar/')
for dir in $dir_list; do
scp path/file.jar "user@host:$dir"
done
The dir_list variable will hold the results of the execution of the ls on the remote system. The -d is so that you get the directory names, not their contents. The single quotes are to ensure that wildcard expansion waits to execute on the remote system, not on the local system.
And then you'll loop through each dir to do the remote copy into that directory.
(All this is ksh syntax, btw.)
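The deferred expansion can be tried locally without a remote machine. In this sketch, sh -c is only a stand-in for the shell that ssh starts on the remote host, and the temporary directory with foo1 and foo2 is a made-up layout.

```shell
# Hypothetical "remote" layout: two directories matching foo*/jar/
remote=$(mktemp -d)
mkdir -p "$remote/foo1/jar" "$remote/foo2/jar"

# Inside double quotes the calling shell does not expand the *,
# so the pattern is handed over as text...
pattern="$remote/foo*/jar/"

# ...and the receiving shell (sh -c here, the remote login shell with ssh)
# performs the wildcard expansion.
dir_list=$(sh -c "ls -d $pattern")

count=0
for dir in $dir_list; do
    count=$((count + 1))    # the scp call would go here, once per directory
done
printf '%s\n' "$count"      # prints: 2
```

As in the loop above, the unquoted $dir_list relies on word splitting, so this approach assumes the directory names contain no whitespace.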

Resources