bash - rename fasta headers and filenames within subdirectories - append prefix - bash

A simplified example of my file structure is this:
/Assemblies/A_velvet/contigs.fasta
/Assemblies/A_velvet/info.log
/Assemblies/BB_velvet/contigs.fasta
/Assemblies/BB_velvet/info.log
I am trying to write a script that I can pass the Assemblies directory - then it will:
loop through each subdirectory (A_velvet, BB_velvet) - take the strain name (A, BB) add it as a prefix to all files within (ie. A_contigs.fasta, A_file.log).
Add the same prefix to the fasta headers within the contigs.fasta file.
Maybe use sed command to substitute ('s/>NODE/>${name}/g')?
I've found a lot of very closely related questions, but can't seem to make them work. Any help is very much appreciated! Here's my code so far:
#!/bin/bash
#Run with: ./test.sh <assembly_directory>
#dir= directory with all assemblies inside it
dir=$1
for subdir in $dir
do
if [ -d "${subdir}" ]; then
name=`basename $subdir|cut -d '_' -f 1`;
echo "${subdir} name ${name}"
for * in $subdir;
`do mv "$file" "$subdir/${name}_$(basename "$file")"; done
fi
done

Your method for looping through the contents of a directory isn't going to work. In the first case, the only item in your loop is $dir. I'm not sure what you're trying to do in the second case. Try something like this:
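dir=$1
# "$dir"/*/ expands to every subdirectory of $dir, with the path included
for subdir in "$dir"/*/
do
subdir=${subdir%/}
if [ -d "${subdir}" ]; then
name=$(basename "$subdir" | cut -d '_' -f 1)
echo "${subdir} name ${name}"
# "$subdir"/* expands to every entry inside the subdirectory
for file in "$subdir"/*
do mv "$file" "$subdir/${name}_$(basename "$file")"; done
fi
done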
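The loop above only renames the files; the question also asks for the same prefix on the FASTA headers. A minimal sketch of that step, assuming header lines are the ones that start with > and that the prefix should be joined with an underscore (both are assumptions, and prefix_fasta_headers is a hypothetical helper name). Note that the sed expression has to be in double quotes for ${name} to expand; the single-quoted form in the question would insert the literal text ${name}.

```shell
#!/usr/bin/env bash
# prefix_fasta_headers NAME FILE  (hypothetical helper, not from the original post)
# Rewrites every header line (">...") in FILE to ">NAME_...".
prefix_fasta_headers() {
    local name=$1 file=$2 tmp
    tmp=$(mktemp) || return 1
    # Double quotes so the shell expands ${name} before sed sees the expression.
    # Assumes NAME contains no "/" or "&", which are special inside s/.../.../.
    if sed "s/^>/>${name}_/" "$file" > "$tmp"; then
        # Write back through the original path so its permissions are kept.
        cat "$tmp" > "$file"
    fi
    rm -f "$tmp"
}
```

Inside the rename loop this could be called as prefix_fasta_headers "$name" "$subdir/contigs.fasta" before the mv, or on the renamed file afterwards.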

Related

How to iterate over a directory and display only filename

I want to iterate over the contents of a directory and list only ordinary files.
The path of the directory is given as user input. The script works if the input is the current directory, but not with other directories.
I am aware that this can be done using ls, but I need to use a for ... in control structure.
#!/bin/bash
echo "Enter the path:"
read path
contents=$(ls $path)
for content in $contents
do
if [ -f $content ];
then
echo $content
fi
done
ls is only returning the file names, not including the path. You need to either:
Change your working directory to the path in question, or
Combine the path with the names for your -f test
Option #2 would just change:
if [ -f $content ];
to:
if [ -f "$path/$content" ];
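Note that there are other issues here: the unquoted $contents is split on whitespace, so names containing spaces break, and ls may format its output differently depending on where it is writing. If you insist on using ls, you can at least make it (somewhat) safer by bypassing aliases and forcing one name per line:
contents="$(command ls -1 "$path")"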
You have two ways of doing this properly:
Either loop through the * pattern and test file type:
#!/usr/bin/env bash
echo "Enter the path:"
read -r path
for file in "$path/"*; do
if [ -f "$file" ]; then
echo "$file"
fi
done
Or using find to iterate a null delimited list of file-names:
#!/usr/bin/env bash
echo "Enter the path:"
read -r path
while IFS= read -r -d '' file; do
echo "$file"
done < <(
find "$path" -maxdepth 1 -type f -print0
)
The second way is preferred since it will properly handle files with special characters and offload the file-type check to the find command.
Use find, set to search for files (-type f) from the $path directory:
find "$path" -type f
Here is what you could write:
#!/usr/bin/env bash
path=
while [[ ! $path ]]; do
read -r -p "Enter path: " path
done
for file in "$path"/*; do
[[ -f $file ]] && printf '%s\n' "$file"
done
If you want to traverse all the subdirectories recursively looking for files, you can use globstar:
shopt -s globstar
for file in "$path"/**; do
[[ -f $file ]] && printf '%s\n' "$file"
done
In case you are looking for specific files based on one or more patterns or some other condition, you could use the find command to pick those files. See this post:
How to loop through file names returned by find?
Related
When to wrap quotes around a shell variable?
Why you shouldn't parse the output of ls
Is double square brackets [[ ]] preferable over single square brackets [ ] in Bash?

Bash: how to copy multiple files with same name to multiple folders

I am working on Linux machine.
I have a lot of files named the same, with a directory structure like this:
P45_input_foo/result.dat
P45_input_bar/result.dat
P45_input_tar/result.dat
P45_input_cool/result.dat ...
It is difficult to copy them one by one. I want to copy them into another folder named as data with similar folder names and file names:
/data/foo/result.dat
/data/bar/result.dat
/data/tar/result.dat
/data/cool/result.dat ...
Instead of copying them one by one, what should I do?
Using a for loop in bash :
# we list every file matching the pattern ./<somedirname>/<any file>
# if you want to restrict which folders are matched, change the pattern here
# e.g. for your case you could write 'for f in P45*/*' to only match folders starting with P45
for f in */*
do
# we strip the filename from the path, keeping only the directory
# e.g. 'P45_input_foo/result.dat' will become 'P45_input_foo'
newpath="${f%/*}"
# mkdir -p "/data/${newpath##*_}" will create our new data structure
# - ${newpath##*_} extracts the last run of characters after a _, in our example 'foo'
# - mkdir -p creates the directory and any missing parents
# - cp "$f" "$_" copies the file to our new directory ($_ is the last argument of the previous command). It will not run if mkdir returns an error
mkdir -p "/data/${newpath##*_}" && cp "$f" "$_"
done
the ${newpath##*_} and ${f%/*} usage are part of Bash string manipulation methods. You can read more about it here.
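To make the two expansions concrete, here is a small sketch that applies them to one of the sample paths from the question. The variable names dir, leaf and name are only for illustration.

```shell
#!/usr/bin/env bash
f="P45_input_foo/result.dat"   # sample path taken from the question

dir=${f%/*}      # remove the shortest match of "/*" from the end
leaf=${dir##*_}  # remove the longest match of "*_" from the start
name=${f##*/}    # remove the longest match of "*/" from the start

printf '%s\n' "$dir" "$leaf" "$name"
# prints:
# P45_input_foo
# foo
# result.dat
```

A single % or # removes the shortest match and a doubled %% or ## removes the longest, which is why ## is needed to get past both underscores.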
You will need to extract the 3rd item after "_" :
P45_input_foo --> foo
create the directory (if needed) and copy the file to it. Something like this (not tested, might need editing):
STARTING_DIR="/"
cd "$STARTING_DIR" || exit 1
VAR=$(ls -1)
while read -r DIR; do
# third "_"-separated field: P45_input_foo --> foo
TARGET_DIR=$(echo "$DIR" | cut -d'_' -f3)
NEW_DIR="/data/$TARGET_DIR"
if [ ! -d "$NEW_DIR" ]; then
mkdir -p "$NEW_DIR"
fi
cp "$DIR/result.dat" "$NEW_DIR/result.dat"
if [ $? -ne 0 ]; then
echo "ERROR: encountered an error while copying"
fi
done <<<"$VAR"
Explanation: this assumes all the paths you've mentioned are under root / (if not, change STARTING_DIR accordingly). ls gives you the list of directories, which is stored in VAR. The content of VAR is then passed to the while loop.
With find and a few bash tricks, the script below could do the job. Run it first with echo in front of the mv and check that "/data/$folder/" is the path you actually want to move the file(s) to; the target folder must already exist.
#!/bin/bash
while IFS= read -r -d '' file
do
fileNew="${file%/*}" # Everything before the last '/'
fileNew="${fileNew#*/}" # Everything after the first '/'
IFS="_" read -r _ _ folder <<<"$fileNew"
mv -v "$file" "/data/$folder/"
done < <(find . -type f -name "result.dat" -print0)

Shell Script to list files in a given directory and if they are files or directories

I'm currently learning some bash scripting and am having trouble with an exercise that involves listing all files in a given directory and stating whether each is a file or a directory. The issue is that I only get either my current directory or, if I specify a directory, a message saying that it is a directory. For example, /home/user/shell_scripts returns "shell_scripts is a directory" rather than the files contained within it.
This is what I have so far:
dir=$dir
for file in $dir; do
if [[ -d $file ]]; then
echo "$file is a directory"
if [[ -f $file ]]; then
echo "$file is a regular file"
fi
done
Your line:
for file in $dir; do
will expand $dir just to a single directory string. What you need to do is expand that to a list of files in the directory. You could do this using the following:
for file in "${dir}/"* ; do
This expands "${dir}/"* into the list of entries in ${dir}, each prefixed with the directory path. As Biffen points out, because the result of the glob is not word-split, file names containing whitespace arrive in file intact.
If you want to recurse into the directories in dir then using find might be a better approach. Simply use:
for file in $( find ${dir} ); do
Note that while simple, this will not handle files or directories with spaces in them. Because of this, I would be tempted to drop the loop and generate the output in one go. This might be slightly different than what you want, but is likely to be easier to read and a lot more efficient, especially with large numbers of files. For example, to list all the directories:
find "${dir}" -maxdepth 1 -type d
and to list the files:
find "${dir}" -maxdepth 1 -type f
If you want to descend into the directories below, remove the -maxdepth 1.
This is a good use for globbing:
for file in "$dir/"*
do
[[ -d "$file" ]] && echo "$file is a directory"
[[ -f "$file" ]] && echo "$file is a regular file"
done
This will work even if files in $dir have special characters in their names, such as spaces, asterisks and even newlines.
Also note that variables should be quoted ("$file"). But * must not be quoted. And I removed dir=$dir since it doesn't do anything (except break when $dir contains special characters).
ls -F ~ | \
sed 's#.*/$#/& is a Directory#;t quit;s#.*#/& is a File#;:quit;s/[*/=>#|] / /'
The -F "classify" switch appends a "/" if a file is a directory. The sed code prints the desired message, then removes the suffix.
for file in $(ls "$dir")
do
[ -f "$dir/$file" ] && echo "$file is File"
[ -d "$dir/$file" ] && echo "$file is Directory"
done
or replace the
$(ls "$dir")
with
`ls "$dir"`
If you want to list files that also start with . use:
for file in "${dir}/"* "${dir}/".[!.]* "${dir}/"..?* ; do
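An alternative to spelling out the hidden-file patterns, sketched here on the assumption that the script runs under bash: shopt -s dotglob makes * match names starting with a dot (but never . and ..), and shopt -s nullglob makes a pattern that matches nothing expand to nothing instead of staying literal. classify_entries is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# classify_entries DIR  (hypothetical helper name)
# Prints one line per entry in DIR, hidden entries included.
classify_entries() {
    local dir=$1 file
    # Subshell so the shopt changes do not leak into the caller.
    (
        shopt -s dotglob nullglob
        for file in "$dir"/*; do
            if [[ -d $file ]]; then
                echo "$file is a directory"
            elif [[ -f $file ]]; then
                echo "$file is a regular file"
            fi
        done
    )
}
```

Calling classify_entries "$HOME" would then report dot files such as .bashrc alongside the visible entries.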

How can I manipulate file names using bash and sed?

I am trying to loop through all the files in a directory.
I want to do some stuff on each file (convert it to xml, not included in example), then write the file to a new directory structure.
for file in `find /home/devel/stuff/static/ -iname "*.pdf"`;
do
echo $file;
sed -e 's/static/changethis/' $file > newfile +".xml";
echo $newfile;
done
I want the results to be:
$file => /home/devel/stuff/static/2002/hello.txt
$newfile => /home/devel/stuff/changethis/2002/hello.txt.xml
How do I have to change my sed line?
If you need to rename multiple files, I suggest using the rename command:
# remove "-n" after you verify it is what you need
rename -n 's/hello/hi/g' $(find /home/devel/stuff/static/ -type f)
or, if you don't have rename try this:
find /home/devel/stuff/static/ -type f | while IFS= read -r FILE
do
# modify line below to do what you need, then remove leading "echo"
echo mv "$FILE" "$(echo "$FILE" | sed 's/hello/hi/g')"
done
Are you trying to change the filename? Then
for file in /home/devel/stuff/static/*/*.txt
do
echo "Moving $file"
mv "$file" "${file/static/changethis}.xml"
done
Please make sure /home/devel/stuff/static/*/*.txt is what you want before using the script.
First, you have to create the name of the new file based on the name of the initial file. The obvious solution is:
newfile=${file/static/changethis}.xml
Second you have to make sure that the new directory exists or create it if not:
mkdir -p $(dirname $newfile)
Then you can do something with your file:
doSomething < $file > $newfile
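Put together, the three steps could look like the sketch below. convert_one is a hypothetical helper name, and cat is only a stand-in for doSomething, since the real conversion is not part of the question.

```shell
#!/usr/bin/env bash
# convert_one FILE  (hypothetical helper; "cat" stands in for the real conversion)
# Writes the converted file to the mirrored path and prints that path.
convert_one() {
    local file=$1 newfile
    newfile=${file/static/changethis}.xml   # 1. derive the new name
    mkdir -p "$(dirname "$newfile")"        # 2. make sure the target directory exists
    cat < "$file" > "$newfile"              # 3. placeholder for doSomething
    printf '%s\n' "$newfile"
}
```

Note that ${file/static/changethis} replaces only the first occurrence of static anywhere in the path, so a parent directory with that word in its name would be rewritten instead.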
I wouldn't use the for loop over the output of find. The shell has to collect the entire output before the loop can start, and it then splits that output on whitespace, so a file name containing a space is silently broken into several words. It might work when your find returns 100 files and fail on the one name with a space in it, and you may never notice.
The best way to handle this is to pipe the find into a while read statement, as glenn jackman suggests.
The sed command only works on STDIN and on files, but not on file names, so if you want to munge your file name, you'll have to do something like this:
newname="$(echo "$oldname" | sed 's/old/new/')"
to get the new name of the file. The $() construct runs the command and substitutes whatever it writes to STDOUT, so here the output of sed becomes the value of newname.
So, your script will look something like this:
find /home/devel/stuff/static/ -name "*.pdf" | while IFS= read -r file
do
echo "$file"
newfile="$(echo "$file" | sed -e 's/static/changethis/')"
newfile="$newfile.xml"
echo "$newfile"
done
Now, since you're renaming the file directory, you'll have to make sure the directory exists before you do your move or copy:
find /home/devel/stuff/static/ -name "*.pdf" | while IFS= read -r file
do
echo "$file"
newfile="$(echo "$file" | sed -e 's/static/changethis/')"
newfile="$newfile.xml"
echo "$newfile"
#Check for directory and create it if it doesn't exist
dirname=$(dirname "$newfile")
if [ ! -d "$dirname" ]
then
mkdir -p "$dirname"
fi
#Directory now exists, so you can do the move
mv "$file" "$newfile"
done
Note the quotation marks, which handle the case where there's a space in the file name.
By the way, instead of doing this:
if [ ! -d "$dirname" ]
then
mkdir -p "$dirname"
fi
You can do this:
[ -d "$dirname" ] || mkdir -p "$dirname"
The || means to execute the following command only if the test isn't true. Thus, if [ -d "$dirname" ] is a false statement (the directory doesn't exist), you run mkdir.
It's a fairly common shortcut when you see shell scripts.
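A line-based while read still breaks on file names that contain a newline, and a loop on the right-hand side of a pipe runs in a subshell. A possible variant that avoids both, assuming bash and a find that supports -print0 (GNU and BSD find both do); move_to_changethis is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# move_to_changethis ROOT  (hypothetical helper name)
# Moves every *.pdf under ROOT to the same path with the first "static"
# replaced by "changethis" and ".xml" appended.
move_to_changethis() {
    local root=$1 file newfile
    # NUL-delimited names survive spaces and newlines; process substitution
    # keeps the loop in the current shell rather than a pipeline subshell.
    while IFS= read -r -d '' file; do
        newfile=${file/static/changethis}.xml
        mkdir -p "$(dirname "$newfile")"
        mv "$file" "$newfile"
    done < <(find "$root" -type f -name '*.pdf' -print0)
}
```

With the paths from the question, the call would be move_to_changethis /home/devel/stuff/static/.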
find ... | while read file; do
newfile=$(basename "$file").xml;
do something to "$file" > "$somedir/$newfile"
done
OUTPUT="$(pwd)";
for file in `find . -iname "*.pdf"`;
do
echo $file;
cp $file $file.xml
echo "file created in directory = ${OUTPUT}"
done
This will create a new file with name whatyourfilename.xml, for hello.pdf the new file created would be hello.pdf.xml, basically it creates a new file with .xml appended at the end.
Note that, as written, the script searches the current directory (find .) rather than /home/devel/stuff/static/; it picks up files whose names match the pattern given to find (in this case *.pdf) and creates each copy next to the original file.
The find command in this particular script only finds files with names ending in .pdf. If you wanted to run this script for files with names ending in .txt, you would change the find command to find /home/devel/stuff/static/ -iname "*.txt".
Once I wanted to remove trailing -min from my files. i.e. wanted alg-min.jpg to turn into alg.jpg. so after some struggle, managed to figure something like this:
for f in *; do echo "$f"; mv "$f" "$(echo "$f" | sed 's/-min//g')"; done
Hope this helps someone who wants to REMOVE or SUBSTITUTE some part of their file names.
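The same rename can be done without sed by using bash's pattern substitution. A small sketch, assuming bash; strip_min is a hypothetical helper name. It only touches files whose names contain -min, so mv is never asked to move a file onto itself.

```shell
#!/usr/bin/env bash
# strip_min DIR  (hypothetical helper name)
# Renames e.g. alg-min.jpg to alg.jpg inside DIR.
strip_min() {
    local dir=$1 f base
    for f in "$dir"/*-min*; do
        [ -e "$f" ] || continue            # the glob matched nothing
        base=${f##*/}                      # file name without the directory
        # ${base//-min/} removes every "-min", like sed 's/-min//g'
        mv -- "$f" "$dir/${base//-min/}"
    done
}
```

For the current directory the call would be strip_min . and the mv can be prefixed with echo for a dry run.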

Trying to cat files - unrecognized wildcard

I am trying to create a file that contains all of the code of an app. I have created a file called catlist.txt so that the files are added in the order I need them.
A snippet of my catlist.txt:
app/controllers/application_controller.rb
app/views/layouts/*
app/models/account.rb
app/controllers/accounts_controller.rb
app/views/accounts/*
When I run the command the files that are explicitly listed get added but the wildcard files do not.
cat catlist.txt|xargs cat > fullcode
I get
cat: app/views/layouts/*: No such file or directory
cat: app/views/accounts/*: No such file or directory
Can someone help me with this. If there is an easier method I am open to all suggestions.
Barb
Your problem is that xargs is not the shell, so the wildcard is passed to cat literally as a star. You'll need a shell to do the expansion for you, like this:
cat catlist.txt | xargs -I % sh -c "cat %" > fullcode
Note that the * is not recursive in your data file. I assume that was what you meant. If you want the entries to be recursive, that's a little trickier and would need something more like DevNull's script, but that will require that you change your data file a bit to not include the stars.
Are you positive those directories exist?
The problem with doing a cat on a list like that (where you're using wildcards) is that the cat isn't recursive. It will only list the contents of that directory; not any subdirectories.
Here's what I would do:
#!/bin/bash.exe
output="out.txt"
if [ -f "$output" ]
then
rm $output
fi
for file in $(cat catlist.txt)
do
if [ -f "$file" ]
then
echo "$file is a file."
cat $file >> $output
elif [ -d "$file" ]
then
echo "$file is a directory."
find $file -type f -exec cat {} >> $output \;
else
echo "huh?"
fi
done
If the entry listed is a directory, it finds all files from that point on and cats them.
Use a while read loop to read your file:
while read -r file
do
if [ -f "$file" ]; then
yourcode "$file"
fi
# expand asterisk
case "$file" in
*"*" )
for f in $file
do
yourcode "$f"
done
esac
done <"catlist.txt"
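A compact variant of the same idea, sketched as a function that writes everything to standard output so the caller decides where it goes. cat_from_list is a hypothetical helper name, and the paths in the list are assumed to be relative to the current directory.

```shell
#!/usr/bin/env bash
# cat_from_list LIST  (hypothetical helper name)
# Prints the contents of every file named in LIST, expanding wildcards.
cat_from_list() {
    local list=$1 pattern f
    while IFS= read -r pattern; do
        [ -n "$pattern" ] || continue      # skip empty lines
        # Unquoted on purpose: this is where the shell expands the wildcard.
        # Caveat: entries containing whitespace are also split here.
        for f in $pattern; do
            if [ -f "$f" ]; then
                cat -- "$f"
            fi
        done
    done < "$list"
}
```

Run from the application root, cat_from_list catlist.txt > fullcode would produce the combined file in the order given by the list.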
