Design bash script for extract unknown file name section - bash

I have to make a simple design, but I don't know how to do it.
I have two folders, folderA and folderB. Inside folderA I have two files named "file_" and "file_anything". The "anything" part of the second file name is some text I don't know (and it can differ between folders). What I need to do is rename folderB to whatever text "anything" is, without needing to know it in advance.
I would appreciate it if, beyond the procedure, someone gave me links to the topics you used, so that I can understand the solutions and adapt them to other situations. I want to learn.
Thanks
edit:
I need this solution to be included inside a bash script (no Perl functions) that is going to be repeated for many pairs of folders that have the same structure. For example:
FolderA (with files "file_" and "file_manana") and FolderB --- change to ---> FolderA and manana (former FolderB)
FolderC (with files "file_" and "file_monkey") and FolderD --- change to ---> FolderC and monkey (former FolderD)
FolderE (with files "file_" and "file_moose") and FolderF --- change to ---> FolderE and moose (former FolderF)
and so on, many times, with many more folders
Edit 2:
OK, I'm getting closer. The problem now is this: I define fn like this: fn=file_a*, knowing that only one file in that folder matches that pattern. I confirm this by doing echo $fn. Now I do this: fn=${fn##*_}. However, fn doesn't turn into "anything" but into "a*". How do I fix that? @David Zaslavsky
Edit 3: Thanks @chepner. BASH_REMATCH was the way to go. I used it with a small change, because the way you wrote it didn't work for me:
for f in FolderA/file_*; do # I assume a single match
[[ $f =~ "file_"(.*) ]]
suffix=${BASH_REMATCH[1]}
mv FolderB "$suffix"
done
Note the quotation marks. Between them I can even include spaces
Thx everyone

I don't quite understand what your end result should be, but you can extract the trailing part of file_anything with the following:
$ f="file_manana"
$ [[ $f =~ file_(.*) ]]
$ suffix=${BASH_REMATCH[1]}
$ echo $suffix
manana
So what I think you want to do is
for f in FolderA/file_*; do # I assume a single match
[[ $f =~ file_(.*) ]]
suffix=${BASH_REMATCH[1]}
mv FolderB "$suffix"
done
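For the "many pairs of folders" case in the question, the loop above could be wrapped in a function. This is only a sketch under assumptions: the function name rename_by_suffix and the scratch-directory demo are made up for illustration, and the function refuses to act unless it finds exactly one non-empty suffix.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not from the answer): rename directory $2 to the
# suffix of the single file_<suffix> entry found in directory $1.
rename_by_suffix() {
    local src=$1 target=$2 f suffix matches=()
    for f in "$src"/file_*; do
        [[ -e $f ]] || continue     # the glob matched nothing
        # The bare "file_" entry has an empty suffix and is skipped by (.+).
        [[ ${f##*/} =~ ^file_(.+)$ ]] && matches+=("${BASH_REMATCH[1]}")
    done
    if (( ${#matches[@]} != 1 )); then
        echo "expected one file_<suffix> in $src, found ${#matches[@]}" >&2
        return 1
    fi
    suffix=${matches[0]}
    # Keep the renamed directory next to where the target was.
    mv -- "$target" "$(dirname -- "$target")/$suffix"
}

# Demo in a scratch directory so nothing real is touched.
demo=$(mktemp -d)
mkdir "$demo/FolderA" "$demo/FolderB"
touch "$demo/FolderA/file_" "$demo/FolderA/file_manana"
rename_by_suffix "$demo/FolderA" "$demo/FolderB"
ls "$demo"
```

Calling it once per pair, for example rename_by_suffix FolderC FolderD, would cover the repeated case.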

You'll want to look up parameter expansion in the Bash manual. If you store the name of file_anything in a variable, let's say fn, you can use ${fn##*_} to remove the longest prefix matching *_ from the filename, and then you can use that in a mv command to rename folder B.
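One catch when storing the name in a variable, as Edit 2 of the question found: fn=file_a* stores the pattern itself, because pathname expansion is not performed on the right-hand side of a plain assignment (an unquoted echo $fn expands it later, which hides the problem). A sketch of one way around that is to expand the glob into an array first. The scratch directory and file names here are made up for the demo.

```bash
#!/usr/bin/env bash
cd "$(mktemp -d)" || exit 1
touch file_ file_anything

fn=file_a*              # stores the literal string "file_a*"
echo "${fn##*_}"        # prints: a*

files=( file_a* )       # an array assignment does expand the glob
fn=${files[0]}
suffix=${fn##*_}        # strip the longest prefix matching *_
echo "$suffix"          # prints: anything
```

If the unknown part may itself contain underscores, ${fn#file_} removes only the known prefix instead of everything up to the last underscore.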

Related

How do I move files into folders with similar names in Unix?

I'm sorry if this question has been asked before, I just didn't know how to word it as a search query.
I have a set of folders that look like this:
Brain - Amygdala/ Brain - Spinal cord (cervical c-1)/ Skin - Sun Exposed (Lower leg)/
Brain - Caudate (basal ganglia)/ Lung/ Whole Blood/
I also have a set of files that look like this:
Brain_Amygdala.v7.covariates_output.txt Skin_Not_Sun_Exposed_Suprapubic.v7.covariates_output.txt
Brain_Caudate_basal_ganglia.v7.covariates_output.txt Skin_Sun_Exposed_Lower_leg.v7.covariates_output.txt
Brain_Spinal_cord_cervical_c-1.v7.covariates_output.txt Whole_Blood.v7.covariates_output.txt
As you can see, the files do not perfectly match up with the directories in their names. For example, Brain_Amygdala.v7.covariates_output.txt is not totally identical to Brain - Amygdala/. Even if we were to excise the tissue name from the covariates file, Brain_Amygdala is formatted differently from its corresponding folder.
Same with Whole Blood/. It is different from Whole_Blood.v7.covariates_output.txt, even if you were to isolate the tissue name from the covariates file Whole_Blood.
What I want to do, however, is to move each of these tissue files to its corresponding folder. If you notice, the covariate files are named after the tissue, up to the first dot . in the file name, with words separated by underscores _. My idea was to split out the words leading up to the first . of the file name so that I can easily move the file to its corresponding folder.
e.g.
Brain_Amygdala.v7.covariates_output.txt -> Brain*Amygdala [mv]-> Brain*Amygdala/
a) I'm not sure how to isolate the first words of a file name leading up to the first . in a filename
b) if I were to do that, I don't know how to insert a wildcard in between each word and match that to the corresponding folder.
However, I am completely open to other ways of doing something like this.
Not a full answer, but it should address some of your concerns:
a) to isolate the first word of a string, leading up to the first .: use Parameter Expansions
string=Brain_Amygdala.v7.covariates_output.txt
until_dot=${string%%.*}
echo "$until_dot"
will output Brain_Amygdala (which we saved in the variable until_dot).
b) You may want to use the ${parameter/pattern/string} parameter expansion:
# Replace all non-alphabetic characters by the glob *
glob_pattern=${until_dot//[^[:alpha:]]/*}
echo "$glob_pattern"
will output (with the same variables as above) Brain*Amygdala
c) To use all of this: it's probably a good idea to determine the possible targets first, and do some basic checks:
# Use nullglob to have non matching glob expand to nothing
shopt -s nullglob
# DO NOT USE QUOTES IN THE FOLLOWING EXPANSION:
# the variable is actually a glob!
# The trailing / restricts matches to directories.
# Could also do dirs=( $glob_pattern*/ ) to allow trailing characters in the directory name
dirs=( $glob_pattern/ )
# Now check how many matches there are:
if ((${#dirs[@]} == 0)); then
echo >&2 "No matches for $glob_pattern"
elif ((${#dirs[@]} > 1)); then
echo >&2 "More than one match for $glob_pattern: ${dirs[@]}"
else
echo "All good!"
# Remove the echo to actually perform the move
echo mv "$string" "${dirs[0]}"
fi
I don't know how your data will effectively conform to these, but I hope this answer actually answers some of your questions! (and to learn more about parameter expansions, do read — and experiment with — the link to the reference I gave you).
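Putting steps a) to c) together for every covariates file might look like the sketch below. It is an illustration only: the fixture uses a few names from the question, and it uses the $glob_pattern*/ variant so that a directory name ending in ")" still matches. Files with zero or several candidate directories are skipped rather than moved.

```bash
#!/usr/bin/env bash
shopt -s nullglob
cd "$(mktemp -d)" || exit 1
# Demo fixture (names taken from the question).
mkdir "Brain - Amygdala" "Skin - Sun Exposed (Lower leg)" "Whole Blood"
touch Brain_Amygdala.v7.covariates_output.txt \
      Skin_Sun_Exposed_Lower_leg.v7.covariates_output.txt \
      Whole_Blood.v7.covariates_output.txt

for file in *.covariates_output.txt; do
    until_dot=${file%%.*}                       # text before the first dot
    glob_pattern=${until_dot//[^[:alpha:]]/*}   # non-letters become *
    # Unquoted on purpose: the variable holds a glob. The trailing "*/"
    # tolerates a closing ")" and restricts matches to directories.
    dirs=( $glob_pattern*/ )
    if (( ${#dirs[@]} == 1 )); then
        mv -- "$file" "${dirs[0]}"
    else
        echo >&2 "Skipping $file: ${#dirs[@]} matches for $glob_pattern"
    fi
done
```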

Running a process on every combination between files in two folders

I have two folders where the 1st has 19 .fa files and the 2nd has 37096 .fa files
Files in the 1st folder are named BF_genomea[a-s].fa, and files in the 2nd are named [1-37096]ZF_genome.fa
I have to run this process where lastz filein1stfolder filein2ndfolder [arguments] > outputfile.axt, so that I run every file in the 1st folder against every file in the 2nd folder.
Any output file naming scheme will do, as long as it identifies which combination of parent files each output came from and the files have the extension .axt.
This is what I have done so far
for file in /tibet/madzays/finch_data/BF_genome_split/*.fa; do for otherfile in /tibet/madzays/finch_data/ZF_genome_split/*.fa; name="${file##*/}"; othername="${otherfile##*/}"; lastz $file $otherfile --step=19 --hspthresh=2200 --gappedthresh=10000 --ydrop=3400 --inner=2000 --seed=12of19 --format=axt --scores=/tibet/madzays/finch_data/BFvsZFLASTZ/HoxD55.q > /home/madzays/qsub/test/"$name""$othername".axt; done; done
As I said in a comment, the inner loop is missing a do keyword (for otherfile in pattern; do <-- right there). Is this in the form of a script file? If so, you should add a shebang as the first line to tell the OS how to run the script. And break it into multiple lines and indent the contents of the loops, to make it easier to read (and easier to spot problems like the missing do).
Off the top of my head, I see one other thing I'd change: the output filenames are going to be pretty ugly, just the two input filenames mashed together with ".axt" on the end (along the lines of "BF_genomeac.fa14ZF_genome.fa.axt"). I'd parse the IDs out of the input filenames and then use them to build a more reasonable output filename convention. Something like this:
#!/bin/bash
for file in /tibet/madzays/finch_data/BF_genome_split/*.fa; do
for otherfile in /tibet/madzays/finch_data/ZF_genome_split/*.fa; do
name="${file##*/}"
tmp="${name#BF_genomea}" # remove filename prefix
id="${tmp%.*}" # remove extension to get the ID
othername="${otherfile##*/}"
otherid="${othername%ZF_genome.fa}" # just have to remove a suffix here
lastz $file $otherfile --step=19 --hspthresh=2200 --gappedthresh=10000 --ydrop=3400 --inner=2000 --seed=12of19 --format=axt --scores=/tibet/madzays/finch_data/BFvsZFLASTZ/HoxD55.q > "/home/madzays/qsub/test/BF${id}_${otherid}ZF.axt"
done
done
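To check the naming scheme without running lastz, the same parameter expansions can be tried on sample paths. This is only a sketch: the two paths are examples built from the naming pattern in the question, no files are read, and the variable out is introduced here for illustration.

```bash
#!/usr/bin/env bash
# Sample paths only; nothing is read from disk and lastz is not run.
file=/tibet/madzays/finch_data/BF_genome_split/BF_genomeac.fa
otherfile=/tibet/madzays/finch_data/ZF_genome_split/14ZF_genome.fa

name=${file##*/}                    # BF_genomeac.fa
tmp=${name#BF_genomea}              # c.fa
id=${tmp%.*}                        # c
othername=${otherfile##*/}          # 14ZF_genome.fa
otherid=${othername%ZF_genome.fa}   # 14
out="BF${id}_${otherid}ZF.axt"
echo "$out"                         # prints: BFc_14ZF.axt
```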
The code can be translated almost directly from your requirements:
base=/tibet/madzays/finch_data
for b in {a..s}
do
for z in {1..37096}
do
lastz "$base/BF_genome_split/BF_genomea${b}.fa" "$base/ZF_genome_split/${z}ZF_genome.fa" --step=19 --hspthresh=2200 --gappedthresh=10000 --ydrop=3400 --inner=2000 --seed=12of19 --format=axt --scores="$base/BFvsZFLASTZ/HoxD55.q" > "/home/madzays/qsub/test/${b}-${z}.axt"
done
done
Note that oneliners easily lead to errors, like missing dos, which are then hard to find from the error message (error in line 1).

In Bash, how do I take pairs of files and put them into directories with matching names?

I have a bunch of files like this (currently all in one directory, but I can separate them by file type or whatever if need be):
Pep_1-1.pdb
Pep_1-1.psf
Pep_1-2.pdb
Pep_1-2.psf
Pep_1-3.pdb
...
I want to take each pair, make a directory with the corresponding name and then place the two files in that directory (steps don't have to be in this order, I just care about the outcome), so that I have directories like Pep_1-1, Pep_1-2, etc. each containing the two corresponding files. What's the most efficient way to do that?
Thanks :)
Assuming the files always exist in pairs, it's easiest to iterate over one of the pair and extract the name sans extension.
for f in *.pdb; do
basename=${f%.*}
mkdir "$basename"
mv "$f" "$basename.psf" "$basename"
done
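If the "always in pairs" assumption might not hold, a guard can leave unpaired files alone. A minimal sketch, with a made-up fixture in a scratch directory in which Pep_1-3 deliberately has no .psf partner:

```bash
#!/usr/bin/env bash
shopt -s nullglob
cd "$(mktemp -d)" || exit 1
# Fixture: Pep_1-3 has no .psf file.
touch Pep_1-1.pdb Pep_1-1.psf Pep_1-2.pdb Pep_1-2.psf Pep_1-3.pdb

for f in *.pdb; do
    base=${f%.*}                     # name without the extension
    if [[ ! -f $base.psf ]]; then
        echo >&2 "No $base.psf, leaving $f in place"
        continue
    fi
    # Only move the files if the directory could be created.
    mkdir -p -- "$base" && mv -- "$f" "$base.psf" "$base"/
done
```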
You could use sed and awk or use basename but I think simple problems should be met with simple solutions. This is why I asked if your files will always be in the form of Pep_1-#.pdb and Pep_1-#.psf.
Simply build the for loop as follows:
for i in `seq 1 50`;
do
mkdir "Pep_1-$i";
# Cannot do glob expansion
cp "Pep_1-$i.pdb" "Pep_1-$i/";
cp "Pep_1-$i.psf" "Pep_1-$i/";
done
Always backup your directories before testing!

Bash scripting print list of files

It's my first time using Bash scripting. I've been looking at some tutorials but can't figure out some of the code. I just want to list all the files in a folder, but I can't do it.
Here's my code so far.
#!/bin/bash
# My first script
echo "Printing files..."
FILES="/Bash/sample/*"
for f in $FILES
do
echo "this is $f"
done
and here is my output..
Printing files...
this is /Bash/sample/*
What is wrong with my code?
You misunderstood what bash means by the word "in". The statement for f in $FILES simply iterates over (space-delimited) words in the string $FILES, whose value is "/Bash/sample/*" (one word). You seemingly want the files that are "in" the named directory, a spatial metaphor that bash's syntax doesn't assume, so you would have to explicitly tell it to list the files.
for f in `ls $FILES` # illustrates the problem - but don't actually do this (see below)
...
might do it. This converts the output of the ls command into a string, "in" which there will be one word per file.
NB: this example is to help understand what "in" means but is not a good general solution. It will run into trouble as soon as one of the files has a space in its name—such files will contribute two or more words to the list, each of which taken alone may not be a valid filename. This highlights (a) that you should always take extra steps to program around the whitespace problem in bash and similar shells, and (b) that you should avoid spaces in your own file and directory names, because you'll come across plenty of otherwise useful third-party scripts and utilities that have not made the effort to comply with (a). Unfortunately, proper compliance can often lead to quite obfuscated syntax in bash.
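A detail worth checking when debugging this: an unquoted $FILES goes through pathname expansion after word splitting, so the loop in the question does list files when the pattern matches something. Bash leaves a pattern untouched when it matches nothing, which is one way to end up with the literal /Bash/sample/* in the output. A small sketch, using scratch directories created only for the demo:

```bash
#!/usr/bin/env bash
dir=$(mktemp -d)
touch "$dir/a.txt" "$dir/b c.txt"

FILES="$dir/*"
# The glob is expanded here; its results are not split again on spaces.
for f in $FILES; do echo "this is $f"; done

empty=$(mktemp -d)
FILES="$empty/*"
# Nothing matches, so the pattern itself is printed.
for f in $FILES; do echo "this is $f"; done

shopt -s nullglob       # make unmatched patterns expand to nothing
count=0
for f in $FILES; do count=$((count + 1)); done
echo "$count"           # prints: 0
```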
I think the problem is in the path "/Bash/sample/*".
That path starts at the filesystem root, so you probably need the full path to the directory, for example:
/home/username/Bash/sample/*
Or use a path relative to your home directory, for example:
~/Bash/sample/*
On most systems this is equivalent to:
/home/username/Bash/sample/*
where username is your current username; use whoami to see it.
Best place for learning Bash: http://www.tldp.org/LDP/abs/html/index.html
This should work:
echo "Printing files..."
FILES=(/Bash/sample/*) # create an array.
# Works with filenames containing spaces.
# String variable does not work for that case.
for f in "${FILES[@]}" # iterate over the array.
do
echo "this is $f"
done
Also, you should not parse ls output.
If you just want to list your files and look at them:
ls ###Lists files###
ls -sh ###Lists files with file sizes###
...
If you want to send the list of files to a file to read and check later:
ls > FileName.Format ###Lists files and writes the list to a file###
ls -sh > FileName.Format ###Lists files with file sizes and writes the list to a file###

replace $1 variable in file with 1-10000

I want to create 1000s of this one file.
All I need to replace in the file is one var
kitename = $1
But i want to do that 1000s of times to create 1000s of diff files.
I'm sure it involves a loop.
people answering people is more effective than google search!
thx
I'm not really sure what you are asking here, but the following will create 1000 files named filename.n, each containing the single line "kitename = n", for n = 1 to n = 1000:
for i in {1..1000}
do
echo "kitename = $i" > filename.$i
done
If you have mysql installed, it comes with a lovely command line util called "replace" which replaces strings in place across any number of files. Too few people know about this, given how many Linux boxes have it. The syntax is easy:
replace SEARCH_STRING REPLACEMENT -- targetfiles*
If you MUST use sed for this... that's okay too :) The syntax is similar:
sed -i.bak 's/SEARCH_STRING/REPLACEMENT/g' targetfile.txt
So if you're just using numbers, you'd use something like:
for a in {1..1000}
do
cp inputFile.html outputFile-$a.html
replace kitename $a -- outputFile-$a.html
done
This will produce a bunch of files "outputFile-1.html" through "outputFile-1000.html", with the word "kitename" replaced by the relevant number, inside the file.
But if you want to read your lines from a file rather than generate them by magic, you might want something more like this (we're not using "for a in $(cat file)" since that splits on words, and I'm assuming here that you may have multi-word replacement strings):
cat kitenames.txt | while read -r a
do
cp inputFile.html "outputFile-$a.html"
replace kitename "$a" -- "outputFile-$a.html"
done
This will produce a bunch of files like "outputFile-red kite.html" and "outputFile-kite with no string.html", which have the word "kitename" replaced by the relevant name, inside the file.
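On a system without the MySQL replace utility, the same loop can be sketched with sed alone. This is an illustration under assumptions: template.txt stands in for the real input file, the placeholder is the literal text $1 from the question, and the names are assumed not to contain "/", "&" or "\" (which are special in a sed replacement).

```bash
#!/usr/bin/env bash
cd "$(mktemp -d)" || exit 1
printf 'kitename = $1\n' > template.txt        # stand-in for the real file
printf '%s\n' 'red kite' 'box kite' > kitenames.txt

while IFS= read -r name; do
    # \$ matches a literal dollar sign, so this replaces the text "$1".
    sed "s/\\\$1/$name/" template.txt > "outputFile-$name.txt"
done < kitenames.txt

cat "outputFile-red kite.txt"                  # prints: kitename = red kite
```

Reading with while IFS= read -r keeps each line intact, including spaces, and redirecting the file into the loop avoids the extra cat process.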
