Bash to rename multiple files to append different folder names

I am currently analysing genomes assembled with SPAdes.
I have 500+ directories from SPAdes named EC18PR-0001, EC18PR-0002, ECPK-0001, ECPK-0002, etc., and inside each directory is a contig file named 'contigs.fasta'.
I am trying to find a way to go through each directory and prepend the individual directory name to its 'contigs.fasta' file, so it would become e.g. EC18PR-0001-contigs.fasta.
This loop doesn't seem to work:
for file in *EC18
do
sample=${file/.fasta} perl -ane
'if(/\>/){$a++;print ">NODE_$a\n"}else{print;}' ${sample}.fasta >
/pathway/where/files/are/SPADEs/${sample}.fasta
done

This might work:
for file in EC18*/*; do
if [[ $file =~ contigs.fasta ]];then
echo $(echo $file | sed 's#/#-#g')
fi
done
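For the rename itself, a minimal sketch along these lines should work, assuming the SPAdes output directories sit in the current directory and match the EC18PR-* / ECPK-* patterns:

```shell
# Rename each contigs.fasta to <dirname>-contigs.fasta inside its directory.
for dir in EC18PR-* ECPK-*; do
    [ -f "$dir/contigs.fasta" ] || continue      # skip anything without a contigs file
    mv "$dir/contigs.fasta" "$dir/$dir-contigs.fasta"
done
```

The `[ -f ... ] || continue` guard makes the loop safe to re-run: directories whose contigs.fasta has already been renamed are simply skipped.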

Related

Batch rename a file based on part of the name of a second file

I want to batch rename some files based on part of the name of other files.
Let me explain my question with an example; I think it's clearer that way.
I have some files with these names in a folder:
sc_gen_08-bigfile4-0-data.txt
signal_1_1-bigfile8.1-0-data.txt
and these files in another folder:
sc_gen_08-acaaf2d4180b743b7b642e8c875a9765-1-data.txt
signal_1_1-dacaaf280b743b7b642e8c875a9765-4-data.txt
I want to batch rename the first files to the names of the second files. How can I do this? The files in both folders share a common part of the name:
name(common to the files in both folders)-[only this part is different in each file]-data.txt
Thanks (sorry if it's not a good question for everyone, but it's a question for me)
Let's call the original folder "folder1" and the other folder "folder2". Then please try the following:
#!/bin/bash
folder1="folder1"                  # the original folder name
folder2="folder2"                  # the other folder name
declare -A map                     # create an associative array
for f in "$folder2"/*-data.txt; do # find files in the "other folder"
    f=${f##*/}                     # remove directory name
    common=${f%%-*}                # extract the common substring
    map[$common]=$f                # associate common name with full filename
done
for f in "$folder1"/*-data.txt; do # find files in the original folder
    f=${f##*/}                     # remove directory name
    common=${f%%-*}                # extract the common substring
    mv -- "$folder1/$f" "$folder1/${map[$common]}"
                                   # rename the file based on the value in map
done
If your files are all named as you described, I have created the following script.
It is laid out in the following structure:
root#vm:~/test# ll
folder1/
folder2/
script.sh
The script is as follows:
#Declare folders
folder1=./folder1
folder2=./folder2
#Create new folder if it does not exist
if [ ! -d ./new ]; then
    mkdir ./new
fi
#Iterate over first directory
for file1 in "$folder1"/*; do
    #Iterate over second directory
    for file2 in "$folder2"/*; do
        #Compare the beginning of each file name; if they match, copy.
        if [[ $(basename "$file1" | cut -f1 -d-) == $(basename "$file2" | cut -f1 -d-) ]]; then
            echo "$(basename "$file1") $(basename "$file2") Match"
            cp "$folder1/$(basename "$file1")" new/"$(basename "$file2")"
        fi
    done
done
It creates a folder called new and copies all matched files there. If you want to move them instead, use mv. I avoided mv on the first attempt in case of any undesired effects.

How to delete files with different extensions according to a specific condition?

Although there are several quite similar questions I could not apply the answer to my problem:
I have a lot of txt-files with a corresponding tsv-file of the same name but with different extensions, for example
$ ls myDirectory
file1.tsv (empty)
file1.txt
file2.tsv (not empty)
file2.txt
Only if the tsv-file is empty, I would like to delete both files. If the tsv-file is not empty, I would like to keep both files. Like so:
$ ls myDirectory
file2.tsv
file2.txt
Alternatively, I would like to delete both corresponding files if and only if a specific string is not contained in the txt-file? (In case that is easier.)
How can that be done with a shell script?
Loop over the tsv files. Check the existence of the corresponding txt file; if it's there but the tsv is empty, remove them both.
#! /bin/bash
for tsv in *.tsv ; do
    txt=${tsv%.tsv}.txt       # Corresponding txt file.
    [[ -f $txt ]] || continue # Skip tsv if txt doesn't exist.
    if [[ ! -s $tsv ]] ; then # If tsv is empty
        rm "$tsv" "$txt"      # remove both.
    fi
done
The alternative can be implemented in a similar way, just use a different condition in the if-clause:
if ! grep -q "$search" "$txt" ; then
where $search contains the regex you want to search for.
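Spelled out in full, the alternative looks like this (a sketch: `$search` is a placeholder for whatever string must be present in the txt file for the pair to be kept):

```shell
# Delete each tsv/txt pair whose txt file does NOT contain $search.
search="some required string"
for tsv in *.tsv; do
    txt=${tsv%.tsv}.txt        # corresponding txt file
    [[ -f $txt ]] || continue  # skip tsv if txt doesn't exist
    if ! grep -q "$search" "$txt"; then
        rm "$tsv" "$txt"       # remove both
    fi
done
```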

Add filename as a column in .txt file and concatenate into a single file

I have 28 directories with names such as:
_.LD_RESULTS_Astrocytes_BloodCells_GeneSet
_.LD_RESULTS_Astrocytes_Endothelia_GeneSet
_.LD_RESULTS_Polydendrocytes_Endothelia_GeneSet
_.LD_RESULTS_Microglia.x_Microglia.y_GeneSet
_.LD_RESULTS_Endothelia_Neurons_GeneSet
They all contain different .txt files (around 20 per directory).
_.LD_RESULTS_Astrocytes_BloodCells_GeneSet_ADHD.cell_type_results.txt
_.LD_RESULTS_Astrocytes_BloodCells_GeneSet_Height.cell_type_results.txt
_.LD_RESULTS_Astrocytes_BloodCells_GeneSet_ALS.cell_type_results.txt
_.LD_RESULTS_Astrocytes_BloodCells_GeneSet_Insomnia.cell_type_results.txt
_.LD_RESULTS_Astrocytes_BloodCells_GeneSet_Alzheimers.cell_type_results.txt
I am trying to add these filenames as a column to the text files, and concatenate them all into a single file and call it foldername_results.txt
I started with this, which works at concatenating all .txt files into a single file called all_results.txt, across all directories.
for d in *; do
    [[ -d "$d" ]] && cd "$d" || continue
    awk '{print $0,FILENAME}' *.txt > all_results.txt
    cd -
done
I only want to slightly modify this so that all_results.txt includes the directory name as well. How can I achieve this?
Thank you.
...
awk -v dir="$PWD" '{print $0,dir,FILENAME}' ...
...
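A fuller sketch of the modified loop, assuming the same layout as above. It writes each per-directory result file next to the directory (rather than inside it) so the output is not picked up by the `*.txt` glob on a re-run, and it passes the directory name itself rather than `$PWD`:

```shell
# Produce one <dirname>_results.txt per directory, appending the
# directory name and source filename as columns to every line.
for d in */; do
    d=${d%/}                         # strip the trailing slash
    awk -v dir="$d" '{print $0, dir, FILENAME}' "$d"/*.txt > "${d}_results.txt"
done
```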

Move files in S3 to folders based on filename

I have an S3 folder where files are staged by an application.
I need to move these files based on a specified folder structure using the filenames.
The files are named in a particular format:
s3://bucketname/staging/file1_YYYY_MM_DD_HH_MM_SS
s3://bucketname/staging/file1_YYYY_MM_DD_HH_MM_SS
I need to move them to s3 folders of this format:
s3://bucketname/file1/YYYY/MM/DD
I have the following code now to store all the filenames present in the staging folder in a file.
path=s3://bucketname/staging
count=$(s3cmd ls "$path" | wc -l)
echo "$count"
if [[ $count -gt 0 ]]; then
    list_files_to_move_s3=$(s3cmd ls -r "$path" | awk '{print $4}' > files_in_bucket.txt)
    echo "exists"
else
    echo "do not exist"
fi
I now need to read the filenames and move the files accordingly.
Can you please help?
You can parse the contents of files_in_bucket.txt with sed to produce the output you want:
---> cat tests3.txt
s3://bucketname/staging/file1_YYYY_MM_DD_HH_MM_SS
s3://bucketname/staging/file1_YYYY_MM_DD_HH_MM_SS
---> sed -r "s|^(s3://.*)/.*/(.*)_(.*)_(.*)_(.*)_.*_.*_.*$|\1/\2/\3/\4/\5|g" tests3.txt
s3://bucketname/file1/YYYY/MM/DD
s3://bucketname/file1/YYYY/MM/DD
--->
What's happening there is that sed parses each line of the file tests3.txt, with each part inside parentheses saved as a capture group, which can then be referenced in the substitution string as \1, \2, \3, etc. So it picks out the first part up to the first slash, skips the "staging" component, and then extracts the file and date portions of the file name.
Note that this assumes a very standardized layout of the filenames and your desired output.
Let me know if you have any questions about this or need further help.
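Putting it together, one could feed each line of files_in_bucket.txt through the same sed expression and issue an s3cmd mv per file. This is a sketch only; the echo prints each command for inspection first, and dropping it would actually move the objects:

```shell
# Compute the destination for each staged file and move it there.
while IFS= read -r src; do
    dst=$(printf '%s\n' "$src" |
          sed -r "s|^(s3://.*)/.*/(.*)_(.*)_(.*)_(.*)_.*_.*_.*$|\1/\2/\3/\4/\5/|")
    echo s3cmd mv "$src" "$dst"   # drop the echo to perform the move
done < files_in_bucket.txt
```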

Rename numbered files using names from a list in another file

I have a folder containing books, and a file with the real name of each book. I renamed the books so I can easily see whether they are ordered, say "00.pdf", "01.pdf", and so on.
I want to know if there is a way, using the shell, to match each line of the file, say "names", with each book file. Specifically, match line i of the file with the book in position i in sort order.
<name-of-the-book-in-the-1-line> -> <book-in-the-1-position>
<name-of-the-book-in-the-2-line> -> <book-in-the-2-position>
.
.
.
<name-of-the-book-in-the-i-line> -> <book-in-the-i-position>
.
.
.
I'm doing this in Windows, using Total Commander, but I want to do it in Ubuntu, so I don't have to reboot.
I know about mv and rename, but I'm not as good as I want with regular expressions...
renamer.sh:
#!/bin/bash
for i in $(ls -v | grep -Ev '(renamer.sh|names.txt)'); do
    read name
    mv "$i" "$name.pdf"
    echo "$i" renamed to "$name.pdf"
done < names.txt
names.txt: (the line count must exactly equal the number of numbered files)
name of first book
second-great-book
...
explanation:
ls -v returns a naturally sorted file list
grep excludes this script's name and the input file so they are not renamed
we cycle through the found file names, read a value from names.txt, and rename each target file to that value
For testing purposes, you can comment out the mv command:
#mv "$i" "$name.pdf"
And now, simply run the script:
bash renamer.sh
This loops through names.txt, creates a filename based on a counter (padding to two digits with printf, assigning to a variable using -v), then renames using mv. ((++i)) increases the counter for the next filename.
#!/bin/bash
i=0
while IFS= read -r line; do
    printf -v fname "%02d.pdf" "$i"   # build the current numbered filename
    mv "$fname" "$line.pdf"           # rename it to the real book name
    ((++i))
done < names.txt
