Bash: Identifying file based on part of filename - bash

I have a folder containing paired files with names that look like this:
PB3999_Tail_XYZ_1234.bam
PB3999_PB_YWZ_5524.bam
I want to pass the files into a for loop as such:
for input in `ls PB*_Tail_.bam`; do tumor=${input%_Tail_*.bam}; $gatk Mutect2 -I $input -I$tumor${*}; done
The issue is, I can't seem to get the syntax right for the tumor input. I want it to recognise the paired file by the first part of the name PB3999_PB while ignoring the second half of the file name _YWZ_5524 that does not match.
Thank you for any help!

I just replaced ${*} with *, appended _PB_ to the prefix, and renamed the variables in the script from the question. The loop also iterates over the glob directly instead of parsing the output of ls.
for tailfname in PB*_Tail_*.bam; do
pairprefix="${tailfname%_Tail_*.bam}"
echo command with ${tailfname} ${pairprefix}_PB_*.bam
done
Hope this helps. The name tumor sounds scary. Hope the right files are paired.
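If you want the loop to complain when a Tail file has no partner (or more than one), something like the following might do. It is only a sketch: find_pair is a made-up helper name, the scratch directory and the PB4000 file exist purely for the demo, and the gatk call is echoed rather than run.

```shell
# Hypothetical helper: print the partner of a *_Tail_* file,
# or fail unless exactly one candidate exists.
find_pair() {
    pair_tail=$1
    pair_prefix=${pair_tail%%_Tail_*}     # PB3999_Tail_XYZ_1234.bam -> PB3999
    pair_count=0
    pair_match=
    for pair_f in "$pair_prefix"_PB_*.bam; do
        [ -e "$pair_f" ] || continue      # the glob matched nothing
        pair_count=$((pair_count + 1))
        pair_match=$pair_f
    done
    if [ "$pair_count" -ne 1 ]; then
        echo "expected 1 partner for $pair_tail, found $pair_count" >&2
        return 1
    fi
    printf '%s\n' "$pair_match"
}

# Demo in a scratch directory (file names are taken from the question,
# PB4000 is an invented unpaired file).
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
touch PB3999_Tail_XYZ_1234.bam PB3999_PB_YWZ_5524.bam PB4000_Tail_ABC_0001.bam

for tail_file in PB*_Tail_*.bam; do
    if pair=$(find_pair "$tail_file"); then
        echo "would run: gatk Mutect2 -I $tail_file -I $pair"
    fi
done
```

The unpaired PB4000 file is reported on stderr and skipped, so a missing partner does not silently turn into a literal PB4000_PB_*.bam argument.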

I'm trying to fully understand what you want to do here.
If you want to extract just the first two parts, this should do:
echo "PB3999_Tail_XYZ_1234.bam" | cut -d '_' -f 1-2
That returns just the "PB3999_Tail" part.

Related

How to batch replace part of filenames with the name of their parent directory in a Bash script?

All of my file names follow this pattern:
abc_001.jpg
def_002.jpg
ghi_003.jpg
I want to replace the characters before the numbers and the underscore (not necessarily letters) with the name of the directory in which those files are located. Let's say this directory is called 'Pictures'. So, it would be:
Pictures_001.jpg
Pictures_002.jpg
Pictures_003.jpg
Normally, the way this website works, is that you show what you have done, what problem you have, and we give you a hint on how to solve it. You didn't show us anything, so I will give you a starting point, but not the complete solution.
You need to know what to replace. You gave the examples abc_001 and def_002: are you sure the part to be replaced is always exactly 3 characters long? In that case, you could use cut to delete it. Otherwise, you could use the position of the '_' character, or grep -o, as in this simple example:
ls -ltra | grep -o "_[0-9][0-9][0-9].jpg"
As far as the current directory is concerned, you might find this, using the environment variable $PWD (in case Pictures is the deepest subdirectory, you might use cut, using '/' as a separator and take the last found entry).
You can see the current directory with pwd, but also with echo "${PWD}".
With ${x#something} you can delete something from the beginning of the variable. something can have wildcards, in which case # deletes the smallest, and ## the largest match.
First try the next command for understanding above explanation:
echo "The last part of the current directory `pwd` is ${PWD##*/}"
The same construction can be used for cutting the filename, so you can do
for f in *_*.jpg; do
mv "$f" "${PWD##*/}_${f#*_}"
done
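For anyone who wants to try this without touching real pictures, here is a small sketch that runs the same expansions in a scratch directory. The directory layout and the "skip if the target exists" check are my additions, not part of the answer above.

```shell
# Build a throwaway directory called Pictures with the sample files.
demo_root=$(mktemp -d)
mkdir "$demo_root/Pictures"
cd "$demo_root/Pictures" || exit 1
touch abc_001.jpg def_002.jpg ghi_003.jpg

dir_name=${PWD##*/}                 # ## strips the longest match of */ -> Pictures
for f in *_*.jpg; do
    [ -e "$f" ] || continue         # the glob matched nothing
    new_name="${dir_name}_${f#*_}"  # # strips the shortest match of *_ -> 001.jpg
    if [ -e "$new_name" ]; then     # also true when the file is already renamed
        echo "skipping $f: $new_name already exists" >&2
        continue
    fi
    mv -- "$f" "$new_name"
done
ls
```

The existence check is there so that running the loop a second time, or hitting two files with the same number, does not overwrite anything.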

Counting char in word with different delimiter

I am writing a shell script, in which I get the location of java via which java. As response I get (for example)
/usr/pi/java7_32/jre/bin/java.
I need the path to be cut so it ends with /jre/, more specifically
/usr/pi/java7_32/jre/
as the program this information is passed to cannot handle the longer path.
I used cut with / as the delimiter, and since I thought the directory of the Java installation was always the same, a
cut -d'/' -f1-5
worked just fine to get this result:
/usr/pi/java7_32/jre/
But as Java could be installed somewhere else as well, for example at
/usr/java8_64/jre/
the statement would not work correctly.
I tried sed, awk, cut and different combinations of them, but found no answer I liked.
As the title says, I would count the number of occurrences of the character / until the substring jre/ is found, on the premise that the shell counts from left to right.
The resulting count would be the field up to which I want to cut with the delimiter.
path=$(which java) # example: /usr/pi/java7_32/jre/bin/java
i=0
#while loop with a statment which would go through path
while substring != jre/ {
if (char = '/')
i++
}
#cut the path
path=$path | cut -d'/' -f 1-i
#/usr/pi/java7_32/jre result
Problem is the eventual difference in the path before and after
/java7_64/jre/, like */java*/jre/
I am open for any ideas and solutions, thanks a lot!
Greets
Jan
You can use the shell's built-in parameter operations to get what you need. (This will save the need to create other processes to extract the information you need).
jpath="$(which java)"
# jpath now /usr/pi/java7_32/jre/bin/java
echo ${jpath%jre*}jre
produces
/usr/pi/java7_32/jre
The same works for
jpath=/usr/java8_64/jre/
The % removes the shortest match of the shell glob pattern (not a regex) from the right side of the string. Then we just put back jre to get your required path.
You can overwrite the value from which java
jpath=${jpath%jre*}jre
IHTH
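A slightly stricter variant of the same idea, as a sketch: it anchors on /jre/ with the slashes, keeps the trailing slash the question asked for, and fails when the path has no jre directory at all. The function name jre_dir is made up for illustration.

```shell
# Hypothetical helper: print everything up to and including the last /jre/.
jre_dir() {
    case $1 in
        */jre/*) printf '%s\n' "${1%/jre/*}/jre/" ;;  # % removes the shortest suffix
        *)       return 1 ;;                          # no /jre/ component
    esac
}

jre_dir /usr/pi/java7_32/jre/bin/java   # /usr/pi/java7_32/jre/
jre_dir /usr/java8_64/jre/bin/java      # /usr/java8_64/jre/
jre_dir /usr/bin/java || echo "no jre directory in path" >&2
```

In a script you would call it as jpath=$(jre_dir "$(which java)") and check the exit status.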
You can get the results with grep:
path=$(echo "$path" | grep -o ".*/jre/")

Recursively dumping the content of a file located in different folders

Still being a newbie with bash-programming I am fighting with another task I got. A specific file called ".dump" (yes, with a dot in the beginning) is located in each folder and always contains three numbers. I need to dump the third number in a variable in case it is greater than 1000 and then print this and the folder name locating the number. So the outcome should look like this:
"/dir1/ 1245"
"/dir1/subdir1/ 3434"
"/dir1/subdir2/ 10003"
"/dir1/subdir2/subsubdir3/ 4123"
"/dir2/ 45440"
(without "" and each of them in a new line (not sure, why it is not shown correctly here))
I was playing around with awk, find and while, but the results are that bad that I do not wanna post them here, which I hope is understood. So any code snippet helping is appreciated.
This could be cleaned up, but should work:
find /dir1 /dir2 -name .dump -exec sh -c 'k=$(awk "\$3 > 1000{print \$3; exit 1}" "$0") ||
echo "${0%.dump}" "$k" ' {} \;
(I'm assuming that all three numbers in your .dump files appear on one line. The awk will need to be modified if the input is in a different format.)
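An alternative sketch that lets awk do both the test and the printing, so no inner shell is needed. It makes the same assumption (three numbers on one line per .dump file). The directory tree and the numbers are invented for the demo, and the order of the output lines depends on how find walks the tree.

```shell
# Build a small sample tree with made-up values.
dump_demo=$(mktemp -d)
mkdir -p "$dump_demo/dir1/subdir1" "$dump_demo/dir2"
echo "1 2 1245"  > "$dump_demo/dir1/.dump"
echo "1 2 999"   > "$dump_demo/dir1/subdir1/.dump"
echo "7 8 45440" > "$dump_demo/dir2/.dump"

# For every .dump file whose third field exceeds 1000, print the
# containing directory (FILENAME minus the trailing ".dump") and the number.
report=$(
    find "$dump_demo" -type f -name .dump -exec awk '
        $3 > 1000 { dir = FILENAME; sub(/\.dump$/, "", dir); print dir, $3 }
    ' {} +
)
printf '%s\n' "$report"
```

Because of the + terminator, find hands many files to a single awk process instead of starting one per file.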

replace $1 variable in file with 1-10000

I want to create 1000s of this one file.
All I need to replace in the file is one var
kitename = $1
But i want to do that 1000s of times to create 1000s of diff files.
I'm sure it involves a loop.
people answering people is more effective than google search!
thx
I'm not really sure what you are asking here, but the following will create 1000 files named filename.n, each containing the single line "kitename = n", for n = 1 to 1000:
for i in {1..1000}
do
echo "kitename = $i" > filename.$i
done
If you have MySQL installed, it may come with a lovely command-line utility called "replace", which replaces strings in place across any number of files. Too few people know about it, although it exists on many Linux boxen (note that it was deprecated in MySQL 5.7 and removed in MySQL 8.0). The syntax is easy:
replace SEARCH_STRING REPLACEMENT -- targetfiles*
If you MUST use sed for this... that's okay too :) The syntax is similar:
sed -i.bak s/SEARCH_STRING/REPLACEMENT/g targetfile.txt
So if you're just using numbers, you'd use something like:
for a in {1..1000}
do
cp inputFile.html outputFile-$a.html
replace kitename $a -- outputFile-$a.html
done
This will produce a bunch of files "outputFile-1.html" through "outputFile-1000.html", with the word "kitename" replaced by the relevant number, inside the file.
But if you want to read your lines from a file rather than generate them by magic, you might want something more like this (we're not using "for a in $(cat file)" since that splits on words, and I'm assuming you'd have multi-word replacement strings to put in):
while IFS= read -r a
do
cp inputFile.html "outputFile-$a.html"
replace kitename "$a" -- "outputFile-$a.html"
done < kitenames.txt
This will produce a bunch of files like "outputFile-red kite.html" and "outputFile-kite with no string.html", which have the word "kitename" replaced by the relevant name, inside the file.
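If the replace utility is not available, roughly the same thing can be sketched with plain sed. Assumptions here: the template contains the literal text $1 as in the question, the file names template.txt, kitenames.txt and output-NAME.txt are made up, and no name contains a slash or a newline.

```shell
kite_demo=$(mktemp -d)
cd "$kite_demo" || exit 1
printf 'kitename = $1\n' > template.txt                      # literal $1 placeholder
printf 'red kite\nbox kite\nkite & string\n' > kitenames.txt

while IFS= read -r name; do
    # Escape the characters that are special in a sed replacement:
    # backslash, ampersand and the | used as delimiter below.
    escaped=$(printf '%s' "$name" | sed 's/[\\&|]/\\&/g')
    sed 's|\$1|'"$escaped"'|g' template.txt > "output-$name.txt"
done < kitenames.txt

cat "output-red kite.txt"
```

Writing to a new file per name avoids the copy-then-edit-in-place step, and the escaping keeps a name such as "kite & string" from being mangled by sed.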

How to rename files keeping a variable part of the original file name

I'm trying to make a script that will go into a directory and run my own application with each file matching a regular expression, specifically Test[0-9]*.txt.
My input filenames look like this TestXX.txt. Now, I could just use cut and chop off the Test and .txt, but how would I do this if XX wasn't predefined to be two digits? What would I do if I had Test1.txt, ..., Test10.txt? In other words, How would I get the [0-9]* part?
Just so you know, I want to be able to make a OutputXX.txt :)
EDIT:
I have files with filename Test[0-9]*.txt and I want to manipulate the string into Output[0-9]*.txt
Would something like this help?
#!/bin/bash
for f in Test*.txt ;
do
process < "$f" > "${f/Test/Output}"
done
Bash Shell Parameter Expansion
A good tutorial on regexes in bash is here. Summarizing, you need something like:
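if [[ $filenamein =~ ^Test([0-9]*)\.txt$ ]]; then
filenameout="Output${BASH_REMATCH[1]}.txt"
fi
and so on. Note the spaces inside [[ ]], the absence of spaces around = in the assignment, and the unquoted regex: in bash 3.2 and later, quoting the right-hand side of =~ makes it match as a literal string. The key is that, when you perform the =~ regex match, the sub-matches for parenthesized groups in the RE are stored in the array BASH_REMATCH (the [0] entry is the whole match, [1] the first parenthesized group, etc.).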
You need to use rounded brackets around the part you want to keep.
i.e. "Test([0-9]*).txt"
The syntax for replacing these bracketed groups varies between programs, but you'll probably find you can use \1. With sed's default basic regular expressions the parentheses have to be escaped, something like this:
s/Test\([0-9]*\)\.txt/Output\1.txt/
If you're using a unix shell, then 'sed' might be your best bet for performing the transformation.
http://www.grymoire.com/Unix/Sed.html#uh-4
Hope that helps
for file in Test[0-9]*.txt;
do
num=${file//[^0-9]/}
process "$file" > "Output${num}.txt"
done
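One thing to keep in mind with the loops above: Test[0-9]*.txt is a glob, so the * matches any characters and a file such as Test1abc.txt would be picked up too. Below is a sketch of a stricter check using only parameter expansion and case. The function name output_name is made up for illustration.

```shell
# Hypothetical helper: map TestNN.txt to OutputNN.txt, rejecting anything
# where the middle part is empty or contains a non-digit.
output_name() {
    case $1 in
        Test*.txt) ;;
        *) return 1 ;;
    esac
    out_num=${1#Test}            # drop the leading "Test"
    out_num=${out_num%.txt}      # drop the trailing ".txt"
    case $out_num in
        ''|*[!0-9]*) return 1 ;; # empty, or something other than digits
    esac
    printf 'Output%s.txt\n' "$out_num"
}

output_name Test1.txt     # Output1.txt
output_name Test10.txt    # Output10.txt
output_name Testing.txt || echo "skipped Testing.txt" >&2
```

Inside the loop this would be used as out=$(output_name "$file") || continue, followed by the call to your own program.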

Resources