In Bash, how do I take pairs of files and put them into directories with matching names? - bash

I have a bunch of files like this (currently all in one directory, but I can separate them by file type or whatever if need be):
Pep_1-1.pdb
Pep_1-1.psf
Pep_1-2.pdb
Pep_1-2.psf
Pep_1-3.pdb
...
I want to take each pair, make a directory with the corresponding name and then place the two files in that directory (steps don't have to be in this order, I just care about the outcome), so that I have directories like Pep_1-1, Pep_1-2, etc. each containing the two corresponding files. What's the most efficient way to do that?
Thanks :)

Assuming the files always exist in pairs, it's easiest to iterate over one of the pair and extract the name sans extension.
for f in *.pdb; do
basename=${f%.*}
mkdir "$basename"
mv "$f" "$basename.psf" "$basename"
done
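If some files might be missing their partner, a slightly more defensive variant of the loop above could look like the sketch below. The function name pair_into_dirs is made up for illustration, and it assumes the pairs are .pdb/.psf files sitting in one directory.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not from the answer above): move each .pdb/.psf pair
# found in the given directory into a subdirectory named after the pair.
pair_into_dirs() {
    local dir=${1:-.} f stem
    for f in "$dir"/*.pdb; do
        [ -e "$f" ] || continue          # the glob matched nothing
        stem=${f%.pdb}                   # e.g. ./Pep_1-1
        if [ ! -e "$stem.psf" ]; then
            echo "skipping $f: no matching .psf" >&2
            continue
        fi
        # Only move the files if the directory could be created.
        mkdir -p -- "$stem" && mv -- "$f" "$stem.psf" "$stem"/
    done
}
```

Calling pair_into_dirs /path/to/files would leave unpaired .pdb files where they are and print a note on stderr for each one.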

You could use sed and awk or use basename but I think simple problems should be met with simple solutions. This is why I asked if your files will always be in the form of Pep_1-#.pdb and Pep_1-#.psf.
Simply build the for loop as follows:
for i in $(seq 1 50); do
    mkdir "Pep_1-$i"
    # Name each file explicitly; a glob would not expand inside the quotes
    cp "Pep_1-$i.pdb" "Pep_1-$i/"
    cp "Pep_1-$i.psf" "Pep_1-$i/"
done
Always back up your directories before testing!

Related

Bash scripting print list of files

It's my first time using Bash scripting and I've been looking at some tutorials, but I can't figure out some of the code. I just want to list all the files in a folder, but I can't do it.
Here's my code so far.
#!/bin/bash
# My first script
echo "Printing files..."
FILES="/Bash/sample/*"
for f in $FILES
do
echo "this is $f"
done
and here is my output..
Printing files...
this is /Bash/sample/*
What is wrong with my code?
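You misunderstood what bash means by the word "in". The statement for f in $FILES iterates over the (whitespace-delimited) words in the string $FILES, whose value is "/Bash/sample/*" (one word). Because the expansion is unquoted, bash also tries to glob-expand that word; the literal pattern in your output means that nothing matched it, for instance because /Bash/sample does not exist or is empty. You seemingly want the files that are "in" the named directory, a spatial metaphor that bash's syntax doesn't assume, so you would have to explicitly tell it to list the files.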
for f in `ls $FILES` # illustrates the problem - but don't actually do this (see below)
...
might do it. This converts the output of the ls command into a string, "in" which there will be one word per file.
NB: this example is to help understand what "in" means but is not a good general solution. It will run into trouble as soon as one of the files has a space in its name—such files will contribute two or more words to the list, each of which taken alone may not be a valid filename. This highlights (a) that you should always take extra steps to program around the whitespace problem in bash and similar shells, and (b) that you should avoid spaces in your own file and directory names, because you'll come across plenty of otherwise useful third-party scripts and utilities that have not made the effort to comply with (a). Unfortunately, proper compliance can often lead to quite obfuscated syntax in bash.
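To make the whitespace problem concrete, here is a small self-contained demo. It is only an illustration: the temporary directory and the two file names are made up, and it assumes the default IFS.

```bash
#!/usr/bin/env bash
# Demo in a throwaway directory; the file names are made up for illustration.
demo_dir=$(mktemp -d)
touch "$demo_dir/plain.txt" "$demo_dir/with space.txt"

# Word-splitting the output of ls: "with space.txt" becomes two words.
ls_count=0
for f in $(ls "$demo_dir"); do
    ls_count=$((ls_count + 1))
done

# Letting the shell expand the glob: each file is exactly one word.
glob_count=0
for f in "$demo_dir"/*; do
    glob_count=$((glob_count + 1))
done

echo "ls words: $ls_count, glob words: $glob_count"   # ls words: 3, glob words: 2
rm -rf "$demo_dir"
```

Two files produce three loop iterations in the ls version, which is why iterating over a glob directly is usually the safer choice.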
I think the problem is in the path "/Bash/sample/*".
You need to change it to the full absolute path, for example:
/home/username/Bash/sample/*
Or use a path relative to your home directory (keep the ~ outside the quotes, since it is not expanded inside them), for example:
~/Bash/sample/*
On most systems this is equivalent to:
/home/username/Bash/sample/*
where username is your current username; use whoami to see it.
Best place for learning Bash: http://www.tldp.org/LDP/abs/html/index.html
This should work:
echo "Printing files..."
FILES=(/Bash/sample/*) # create an array.
# Works with filenames containing spaces.
# String variable does not work for that case.
for f in "${FILES[@]}" # iterate over the array.
do
echo "this is $f"
done
Also, you should not parse ls output.
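One caveat with the glob: when nothing matches, bash leaves the pattern as a literal string, which is exactly the "this is /Bash/sample/*" output from the question. A minimal sketch of guarding against that with the nullglob shell option follows; the function name list_files is hypothetical, and the sketch assumes nullglob was off beforehand.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print one line per entry in a directory, or a
# message if the directory is empty or does not exist.
list_files() {
    local dir=$1 files f
    shopt -s nullglob            # unmatched globs now expand to nothing
    files=("$dir"/*)
    shopt -u nullglob            # switch it back off (assumed prior state)
    if [ "${#files[@]}" -eq 0 ]; then
        echo "no files found in $dir"
        return 1
    fi
    for f in "${files[@]}"; do
        echo "this is $f"
    done
}
```

For example, list_files /Bash/sample would print the "no files found" message instead of looping once over the unexpanded pattern.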
Take a list of your files.
If you want to list your files and see them:
ls # lists the files
ls -sh # lists the files with their sizes
...
If you want to send the list of files to a file, to read and check it later:
ls > FileName.Format # lists the files and writes the list to a file
ls -sh > FileName.Format # lists the files with their sizes and writes the list to a file

Bash: find references to filenames in other files

Problem:
I have a list of filenames, filenames.txt:
Eg.
/usr/share/important-library.c
/usr/share/youneedthis-header.h
/lib/delete/this-at-your-peril.c
I need to rename or delete these files and I need to find references to these files in a project directory tree: /home/noob/my-project/ so I can remove or correct them.
My thought is to use bash to extract the filename: basename filename, then grep for it in the project directory using a for loop.
FILELISTING=listing.txt
PROJECTDIR=/home/noob/my-project/
for f in $(cat "$FILELISTING"); do
extension=$(basename ${f##*.})
filename=$(basename ${f%.*})
pattern="$filename"\\."$extension"
grep -r "$pattern" "$PROJECTDIR"
done
I could royally screw up this project -- does anyone see a flaw in my logic; better: do you see a more reliable scalable way to do this over a huge directory tree? Let's assume that revision control is off the table ( it is, in fact ).
A few comments:
Instead of
for f in $(cat "$FILELISTING") ; do
...
done
it's somewhat safer to write
while IFS= read -r f ; do
...
done < "$FILELISTING"
That way, your code will have no problem with spaces, tabs, asterisks, and so on in the filenames (though it still won't support newlines).
Your goal in separating f into extension and filename, and then reassembling them with \., seems to be that you want the filename to be treated as a literal string; right? Like, you're worried that grep will treat the . as meaning "any character" rather than as "one dot". A more general solution is to use grep's -F option, which tells it to treat the pattern as a fixed string rather than a regex:
grep -r -F "$f" "$PROJECTDIR"
Your introduction mentions using basename, but then you don't actually use it. Is that intentional?
If your non-use of basename is intentional, then filenames.txt really just contains a list of patterns to search for; you don't even need to write a loop, in this case, since grep's -f option tells it to take a newline-separated list of patterns from a file:
grep -r -F -f "$FILELISTING" "$PROJECTDIR"
You should back up your project, using something like tar -czf backup.tar.gz "$PROJECTDIR". "Revision control is off the table" doesn't mean you can't have a rollback strategy!
Edited to add:
To pass all your base-names to grep at once, in the hopes that it can do something smarter with them than just looping over them just as though the calls were separate, you can write something like:
grep -r -F "$(sed 's#.*/##g' "$FILELISTING")" "$PROJECTDIR"
(I used sed rather than while+basename for brevity's sake, but you can put an entire loop inside the "$(...)" if you prefer.)
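Putting the pieces above together, one possible shape for the search is sketched below. The function name find_references is invented for illustration; it assumes bash (for the process substitution) and a listing file with one path per line.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the lines under a project directory that mention
# the base name of any path in a listing file (one path per line).
find_references() {
    local listing=$1 projectdir=$2
    # sed strips everything up to the last slash and drops empty lines
    # (an empty pattern would match every line); -F treats the names as
    # fixed strings, -f reads one pattern per line, -n adds line numbers.
    grep -r -n -F -f <(sed -e 's#.*/##' -e '/^$/d' "$listing") -- "$projectdir"
}
```

A call such as find_references filenames.txt /home/noob/my-project/ prints file:line:text for each hit and changes nothing, so it can be run as often as needed before any renaming or deleting starts.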
This is a job for an IDE.
You're right that this is a perilous task, and unless you know the build process and the search directories and the order of the directories, you really can't say what header is with which file.
Let's take something as simple as this:
# include "sql.h"
You have a file in the project headers/sql.h. Is that file needed? Maybe it is. Maybe not. There's also a /usr/include/sql.h. Maybe that's the one that's actually used. You can't tell without looking at the Makefile and seeing the order of the include directories which is which.
Then, there are the libraries that get included and may need their own header files in order to be able to compile. And, once you get to the C preprocessor, you really will have a hard time.
This is a task for an IDE (Integrated Development Environment). An IDE builds the project and tracks file and other resource dependencies. In the Java world, most people use Eclipse, and there is a C/C++ plugin for those developers. However, there are over 2 dozen listed in Wikipedia and almost all of them are open source. The best one will depend upon your environment.

Design bash script for extract unknown file name section

I have to make a simple design but I don't know how to do it.
I have two folders, folderA and folderB. Inside folderA I have two files named "file_" and "file_anything". The "anything" part of the second file is some text I don't know (or that can be different for various folders). What I need to do is change the name of folderB to whatever text "anything" is, without needing to know it specifically.
I would appreciate it if, beyond the procedure, someone gave me a link to the topics you used, so that I can understand the solutions and adapt them to other situations. I want to learn.
Thanks.
edit:
I need this solution to be included inside a bash script (no perl functions) that is going to be repeated for a lot of pairs of folders that have the same structure. For example:
FolderA (with files "file_" and "file_manana") and FolderB --- change to ---> FolderA and manana (former FolderB)
FolderC (with files "file_" and "file_monkey") and FolderD --- change to ---> FolderC and monkey (former FolderD)
FolderE (with files "file_" and "file_moose") and FolderF --- change to ---> FolderE and moose (former FolderF)
many many times with many more folders
Edit 2:
Ok, I'm getting closer. The problem now is this: I define fn like this: fn=file_a*, knowing that only one file in that folder matches that pattern. I confirm this by doing echo $fn. Now I do this: fn=${fn##*_}. However, fn doesn't turn into "anything" but into "a*". How do I fix that? @David Zaslavsky
Edit 3: Thx @chepner. BASH_REMATCH was the way to go. I used it with a small change, because the way you wrote it didn't work for me:
for f in FolderA/file_*; do # I assume a single match
[[ $f =~ "file_"(.*) ]]
suffix=${BASH_REMATCH[1]}
mv FolderB "$suffix"
done
Note the quotation marks. Between them I can even include spaces
Thx everyone
I don't quite understand what your end result should be, but you can extract the trailing part of file_anything with the following:
$ f="file_manana"
$ [[ $f =~ file_(.*) ]]
$ suffix=${BASH_REMATCH[1]}
$ echo $suffix
manana
So what I think you want to do is
for f in FolderA/file_*; do # I assume a single match
[[ $f =~ file_(.*) ]]
suffix=${BASH_REMATCH[1]}
mv FolderB "$suffix"
done
You'll want to look up parameter expansion in the Bash manual. If you store the name of file_anything in a variable, let's say fn, you can use ${fn##*_} to remove the longest prefix matching *_ from the filename, and then you can use that in a mv command to rename folder B.
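A note on the "a*" result from Edit 2: an assignment such as fn=file_a* stores the pattern itself, because assignments are not glob-expanded; echo $fn only appears to work because the unquoted expansion is globbed at that point. One way around it, sketched here with a hypothetical function name, is to expand the glob into an array first and then apply the parameter expansion. Stripping "*/file_" rather than "*_" is a deliberate change so that suffixes containing an underscore survive.

```bash
#!/usr/bin/env bash
# Hypothetical helper: rename folder $2 after the suffix of the single
# file_<suffix> file found in folder $1.
rename_after_suffix() {
    local src=$1 target=$2 matches suffix
    matches=("$src"/file_?*)          # ?* skips the bare "file_"
    if [ "${#matches[@]}" -ne 1 ] || [ ! -e "${matches[0]}" ]; then
        echo "expected exactly one file_<suffix> in $src" >&2
        return 1
    fi
    suffix=${matches[0]##*/file_}     # strip the directory and "file_" prefix
    # The renamed folder stays next to where the target folder was.
    mv -- "$target" "$(dirname "$target")/$suffix"
}
```

With the layout from the question, rename_after_suffix FolderA FolderB would rename FolderB to manana, and the same call can be repeated for each pair of folders.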

renaming images into an ordered sequence using shell

I have a bunch of folders containing images that are in order but are not sequential like this:
/root
/f1
img21.jpg
img24.jpg
img26.jpg
img27.jpg
/f2
img06.jpg
img14.jpg
img36.jpg
img57.jpg
and I want to get them looking like this, having the folder title as well as having all the images in sequential order:
/root
/f1
f1_01.jpg
f1_02.jpg
f1_03.jpg
f1_04.jpg
/f2
f2_01.jpg
f2_02.jpg
f2_03.jpg
f2_04.jpg
I'm not sure how to do this using shell script.
Thanks in advance!
Use a for loop to iterate over the directories and another for loop to iterate over the files. Maintain a counter that you increment by 1 for each file.
There's no direct convenient way of padding numbers with leading zeroes. You can call printf, but that's a little slow. A useful, fast trick is to start counting at 101 (if you want two-digit numbers — 1000 if you want 3-digit numbers, and so on) and strip the leading 1.
cd /root
for d in */; do
    i=101    # start at 101 so that the first file gets 01
    for f in "$d"*; do
        mv -- "$f" "$d${d%/}_${i#1}.${f##*.}"
        i=$((i+1))
    done
done
${d%/} strips the / at the end of $d, ${i#1} strips the 1 at the start of $i, and ${f##*.} strips everything from $f except what follows the last dot. These constructs are documented in the section on parameter expansion in your shell's manual.
Note that this script assumes that the target file names will not clash with the names of existing files. If you have a directory called img, some files will be overwritten. If this may be a problem, the simplest method is to first move all the files to a different directory, then move them back to the original directory as you rename them.
Within a directory, ls will give you files in lexical order, which gets you the correct sort. So you can do something like this:
let i=0
ls *.jpg | while IFS= read -r file; do
    mv -- "$file" "prefix_$(printf '%02d' "$i").jpg"
    let i++
done
This will take all the *.jpg files and rename them starting with prefix_00.jpg, prefix_01.jpg and so forth.
This obviously only works for a single directory, but hopefully with a little work you can use this to build something that will do what you want.
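As a rough sketch of how the two ideas above might be combined for several directories, the function below uses bash's printf -v for the zero padding and the directory name as the prefix. The name renumber_images is hypothetical, and it assumes bash and .jpg files only.

```bash
#!/usr/bin/env bash
# Hypothetical helper: inside each subdirectory of $1, rename the .jpg files
# to <dirname>_01.jpg, <dirname>_02.jpg, ... in glob (lexical) order.
renumber_images() {
    local root=$1 d name f i new
    for d in "$root"/*/; do
        [ -d "$d" ] || continue           # the glob matched nothing
        d=${d%/}
        name=${d##*/}
        i=1
        for f in "$d"/*.jpg; do
            [ -e "$f" ] || continue       # no .jpg files in this directory
            printf -v new '%s/%s_%02d.jpg' "$d" "$name" "$i"
            # -n: never overwrite an existing file
            [ "$f" = "$new" ] || mv -n -- "$f" "$new"
            i=$((i + 1))
        done
    done
}
```

Because of mv -n, a file whose new name is already taken is left under its old name rather than overwriting anything, so it is worth checking for leftovers afterwards.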

Way to move files in bash and rename copied file automatically without overwriting an existing file

I'm doing some major restructuring of large numbers of directories with tons of jpgs, some of which may have the same name as files in other directories. I want to move / copy files to alternate directories and have bash automatically rename them if the name matches another file in that directory (renaming IMG_238.jpg to IMG_238_COPY1.jpg, IMG_238_COPY2.jpg, etc), instead of overwriting the existing file.
I've set up a script that takes jpegs and moves them to a new directory based on exif data. The final line of the script that moves one jpg is: mv -n "$JPEGFILE" "$DIRNAME"
I'm using the -n option because I don't want to overwrite files, but now I have to go and manually sort through the ones that didn't get moved / copied. My GUI does this automatically... Is there a relatively simple way to do this in bash?
(In case it matters, I'm using bash 3.2 in Mac OSX Lion).
This ought to do it
# strip path, if any
fname="${JPEGFILE##*/}"
if [ -f "$DIRNAME/$fname" ]; then
    # name is taken: find the first free _COPY<n> suffix
    n=1
    while [ -f "$DIRNAME/${fname%.*}_COPY${n}.${fname##*.}" ]; do
        n=$((n + 1))
    done
    mv "$JPEGFILE" "$DIRNAME/${fname%.*}_COPY${n}.${fname##*.}"
else
    mv "$JPEGFILE" "$DIRNAME"
fi
EDIT: Improved.
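If the move happens in many places in the script, the same idea could be wrapped in a function. This is only a sketch: the name move_unique is made up, it assumes every file name has an extension, and it keeps mv -n as a safety net because the existence check and the move are not atomic.

```bash
#!/usr/bin/env bash
# Hypothetical helper: move file $1 into directory $2, appending _COPY<n>
# before the extension if a file of that name is already there.
move_unique() {
    local src=$1 destdir=$2 fname stem ext target n
    fname=${src##*/}                  # strip the path, if any
    stem=${fname%.*}                  # name without extension
    ext=${fname##*.}                  # extension (assumed to exist)
    target=$destdir/$fname
    n=1
    while [ -e "$target" ]; do
        target=$destdir/${stem}_COPY$n.$ext
        n=$((n + 1))
    done
    mv -n -- "$src" "$target"
}
```

The last line of the original script would then become move_unique "$JPEGFILE" "$DIRNAME".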
You can try downloading and seeing if Ubuntu/Debian's Perl-based rename works. It has sed-style functionality. Quoth the man page (on my system, but the script should be the same one as linked):
"rename" renames the filenames supplied according to the rule specified
as the first argument. The perlexpr argument is a Perl expression
which is expected to modify the $_ string in Perl for at least some of
the filenames specified. If a given filename is not modified by the
expression, it will not be renamed. If no filenames are given on the
command line, filenames will be read via standard input.
For example, to rename all files matching "*.bak" to strip the
extension, you might say
rename 's/\.bak$//' *.bak
To translate uppercase names to lower, you'd use
rename 'y/A-Z/a-z/' *

Resources