Batch rename files based on part of the name of a second file - bash

I want to batch rename some files based on part of the names of other files.
Let me explain with an example; I think that is clearer.
I have files with these names in one folder:
sc_gen_08-bigfile4-0-data.txt
signal_1_1-bigfile8.1-0-data.txt
and these files in another folder:
sc_gen_08-acaaf2d4180b743b7b642e8c875a9765-1-data.txt
signal_1_1-dacaaf280b743b7b642e8c875a9765-4-data.txt
I want to batch rename the first files to the names of the second files. How can I do this? The files in the two folders share a common part in their names:
name(common to the files in both folders)-[only this part is different in each file]-data.txt
Thanks (sorry if it's not a good question for everyone, but it's a question for me)

Let's call the original folder "folder1" and the other folder "folder2". Then would you please try the following:
#!/bin/bash
folder1="folder1"                     # the original folder name
folder2="folder2"                     # the other folder name
declare -A map                        # create an associative array
for f in "$folder2"/*-data.txt; do    # find files in the "other folder"
    f=${f##*/}                        # remove directory name
    common=${f%%-*}                   # extract the common substring
    map[$common]=$f                   # associate common name with full filename
done
for f in "$folder1"/*-data.txt; do    # find files in the original folder
    f=${f##*/}                        # remove directory name
    common=${f%%-*}                   # extract the common substring
    [[ -n ${map[$common]} ]] || continue   # skip files with no counterpart in folder2
    mv -- "$folder1/$f" "$folder1/${map[$common]}"
                                      # rename the file based on the value in map
done
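Before renaming anything for real, it can help to preview the result. The following is only a sketch of a dry run, not part of the answer above: the function name preview_renames is made up for illustration, it looks up the counterpart with a glob instead of an associative array, and it assumes the common part never contains a hyphen and that each common part has one counterpart in the second folder (if there are several, the first glob match is used).

```shell
# preview_renames DIR1 DIR2   (hypothetical helper)
# Prints the mv commands that would rename the files in DIR1 after their
# counterparts in DIR2. Nothing is renamed.
preview_renames() (
    dir1=$1
    dir2=$2
    for f in "$dir1"/*-data.txt; do
        [ -e "$f" ] || continue             # the glob matched nothing
        name=${f##*/}                       # file name without the directory
        common=${name%%-*}                  # text before the first hyphen
        set -- "$dir2/$common"-*-data.txt   # candidate counterparts in DIR2
        if [ -e "$1" ]; then
            printf 'mv -- %s %s\n' "$f" "$dir1/${1##*/}"
        else
            printf 'no counterpart for %s\n' "$name" >&2
        fi
    done
)
```

Calling preview_renames folder1 folder2 prints one mv line per file that has a counterpart, and reports the others on standard error.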

If your files are all named as you described, the following script should work.
It assumes this directory layout:
root#vm:~/test# ll
folder1/
folder2/
script.sh
The script is as follows:
# Declare folders
folder1=./folder1
folder2=./folder2
# Create the new folder if it does not exist
mkdir -p ./new
# Iterate over the first directory
for file1 in "$folder1"/*; do
    # Iterate over the second directory
    for file2 in "$folder2"/*; do
        # Compare the beginning of each file name; if they match, the file is copied.
        if [[ $(basename "$file1" | cut -f1 -d-) == $(basename "$file2" | cut -f1 -d-) ]]; then
            echo "$(basename "$file1") $(basename "$file2") Match"
            cp -- "$file1" "new/$(basename "$file2")"
        fi
    done
done
It creates a folder called new and copies every matching file there under its new name. If you want the originals removed, use mv instead of cp. I avoided mv in this first attempt in case it had some undesired effect.

Related

Bash to rename multiple files to append different folder names

I am currently analysing genomes from SPAdes.
I currently have 500+ directories from SPAdes named EC18PR-0001, EC18PR-0002, ECPK-0001, ECPK-0002, etc. Inside each directory is a contig file named 'contigs.fasta'.
I was trying to find a way to go through each directory and add the directory name to the front of its 'contigs.fasta' file, so that it becomes, for example, EC18PR-0001-contigs.fasta.
This loop doesn't seem to work:
for file in *EC18
do
sample=${file/.fasta} perl -ane
'if(/\>/){$a++;print ">NODE_$a\n"}else{print;}' ${sample}.fasta >
/pathway/where/files/are/SPADEs/${sample}.fasta
done
This might work (it only prints the new names; it does not rename anything yet):
for file in EC18*/*; do
    if [[ $file == */contigs.fasta ]]; then
        echo "$file" | sed 's#/#-#g'
    fi
done
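To actually perform the rename rather than print the names, something along these lines could be used. This is a sketch under the assumption that each directory holds exactly one contigs.fasta and that the renamed file should stay inside its directory; the function name prefix_contigs is made up for illustration.

```shell
# prefix_contigs DIR...   (hypothetical helper)
# Renames DIR/contigs.fasta to DIR/<directory name>-contigs.fasta
# for every directory given on the command line.
prefix_contigs() (
    for dir in "$@"; do
        dir=${dir%/}                              # drop a trailing slash
        [ -f "$dir/contigs.fasta" ] || continue   # nothing to rename here
        mv -- "$dir/contigs.fasta" "$dir/${dir##*/}-contigs.fasta"
    done
)
```

For the directories in the question it could be called as prefix_contigs EC18PR-*/ ECPK-*/ from the folder that contains them.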

Script to automate moving files from multiple directories to newly created directories with the same suffix (not file extension)

I'm processing a large collection of born-digital materials for an archive but I'm being slowed down by the fact that I'm having to manually create directories and find and move files from multiple directories into newly created directories.
Problem: I have three directories containing three different types of content derived from different sources:
-disk_images -evidence_photos -document_scans
The disk images were created from CDs that come with cases and writing on the cases that need to be accessible and preserved for posterity so pictures have been taken of them and loaded into the evidence photos folder with a prefix and inventory number. Some CDs came with indexes on paper and have been scanned and OCR'd and loaded into the document scan folder with a prefix and an inventory number. Not all disk images have corresponding photos or scans so the inventory numbers in those folders are not linear.
I've been trying to think of ways to write a script that would look through each of these directories and move files with the same suffix (not extension) to newly created directories for each inventory number, but this is way beyond my expertise. Any help would be much appreciated and I will be more than happy to clarify if need be.
examples of file names:
-disk_images/ahacd_001.iso
-evidence_photos/ahacd_case_001.jpg
-document_scans/ahacd_notes_001.pdf
Potential new directory name= ahacd_001
All files with inventory number 001 would then need to end up in ahacd_001.
The trailing number in each file name (001 above) is the inventory number.
Here is a skeleton of a program that iterates through your 3 starting folders and splits your file names:
for folder in ./*/   # list directories
do
    echo "moving folder $folder"
    for path in "$folder"*   # list the files in the directory
    do
        file=${path##*/}
        echo "$file"
        # split the file name with awk and get the first part ('ahacd') and the last ('002')
        target=$(echo "$file" | awk -F '.' '{print $1}' | awk -F '_' '{print $1 "_" $NF}')
        echo "$target"
        # when you are satisfied that your file splitting works, uncomment:
        # mkdir -p -- "$target"      # create your folder
        # mv -- "$path" "$target/"   # move the file
    done
done
A few pointers to split the filenames :
Get last field using awk substr
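The same split can also be done with shell parameter expansion alone, without awk. This is a sketch that assumes names of the form prefix_..._number.ext as in the question; the function name inventory_dir is made up for illustration.

```shell
# inventory_dir FILE   (hypothetical helper)
# Prints the target directory name, e.g. ahacd_case_001.jpg -> ahacd_001
inventory_dir() (
    name=${1##*/}       # strip any leading directories
    name=${name%%.*}    # strip the extension(s)
    # first underscore-separated field, then the last one
    printf '%s_%s\n' "${name%%_*}" "${name##*_}"
)
```

Inside a loop over the files, the result could then be passed to mkdir -p and mv once the printed names look right.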
First I would like to say that file or directory names starting with - is a bad idea even if it's allowed.
Test case:
mkdir -p /tmp/test/{-disk_images,-evidence_photos,-document_scans}
cd /tmp/test
touch -- "-disk_images/ahacd_001.iso" #create your three test files
touch -- "-evidence_photos/ahacd_case_001.jpg"
touch -- "-document_scans/ahacd_notes_001.pdf"
find -type f|perl -nlE \
'm{.*/(.*?)_(.*_)?(\d+)\.}&&say qq(mkdir -p target/$1_$3; mv "$_" target/$1_$3)'
...will not move the files; it just shows you which commands it thinks should be run.
If those commands are what you want, run them by adding |bash at the end of the same find|perl command:
find -type f|perl -nlE \
'm{.*/(.*?)_(.*_)?(\d+)\.}&&say qq(mkdir -p target/$1_$3; mv "$_" target/$1_$3)' \
| bash
find -ls #to see the result
All three files are now in the target/ahacd_001/ subfolder.

Comparing two directories to produce output

I am writing a Bash script that will replace files in folder A (source) with folder B (target). But before this happens, I want to record 2 files.
The first file will contain a list of files in folder B that are newer than folder A, along with files that are different/orphans in folder B against folder A
The second file will contain a list of files in folder A that are newer than folder B, along with files that are different/orphans in folder A against folder B
How do I accomplish this in Bash? I've tried using diff -qr but it yields the following output:
Files old/VERSION and new/VERSION differ
Files old/conf/mime.conf and new/conf/mime.conf differ
Only in new/data/pages: playground
Files old/doku.php and new/doku.php differ
Files old/inc/auth.php and new/inc/auth.php differ
Files old/inc/lang/no/lang.php and new/inc/lang/no/lang.php differ
Files old/lib/plugins/acl/remote.php and new/lib/plugins/acl/remote.php differ
Files old/lib/plugins/authplain/auth.php and new/lib/plugins/authplain/auth.php differ
Files old/lib/plugins/usermanager/admin.php and new/lib/plugins/usermanager/admin.php differ
I've also tried this
(rsync -rcn --out-format="%n" old/ new/ && rsync -rcn --out-format="%n" new/ old/) | sort | uniq
but it doesn't give me the scope of results I require. The struggle here is that the data isn't in the correct format, I just want files not directories to show in the text files e.g:
conf/mime.conf
data/pages/playground/
data/pages/playground/playground.txt
doku.php
inc/auth.php
inc/lang/no/lang.php
lib/plugins/acl/remote.php
lib/plugins/authplain/auth.php
lib/plugins/usermanager/admin.php
List of files in directory B (new/) that are newer than directory A (old/):
find new -newermm old
This merely runs find and examines the content of new/ as filtered by -newerXY reference with X and Y both set to m (modification time) and reference being the old directory itself.
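Note that this compares every file against the modification time of the old directory itself. If a per-file comparison is wanted instead (each file in new/ against the file with the same relative path in old/), a loop with the shell's -nt test is one option. The following is a sketch, with the made-up function name newer_in, that assumes file names contain no newlines; -nt is not required by POSIX but is available in bash and most other shells.

```shell
# newer_in DIR_B DIR_A   (hypothetical helper)
# Prints the relative path of every regular file in DIR_B that also exists
# in DIR_A and has a newer modification time there than in DIR_A.
newer_in() (
    b=${1%/}
    a=${2%/}
    find "$b" -type f | while IFS= read -r path; do
        rel=${path#"$b"/}                      # path relative to DIR_B
        if [ -e "$a/$rel" ] && [ "$path" -nt "$a/$rel" ]; then
            printf '%s\n' "$rel"
        fi
    done | sort
)
```

With the directories from the question it would be called as newer_in new old, and it lists files only, never directories.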
Files that are missing in directory B (new/) but are present in directory A (old/):
A=old B=new
diff -u <(find "$B" |sed "s:$B::") <(find "$A" |sed "s:$A::") \
|sed "/^+\//!d; s::$A/:"
This sets variables $A and $B to your target directories, then runs a unified diff on their contents (using process substitution to locate with find and remove the directory name with sed so diff isn't confused). The final sed command first matches for the additions (lines starting with a +/), modifies them to replace that +/ with the directory name and a slash, and prints them (other lines are removed).
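An alternative that avoids parsing diff output is to compare two sorted file lists with comm. This is a sketch, not the method above: the function name only_in is made up, mktemp is assumed to be available, and file names are assumed to contain no newlines. It lists regular files only, as relative paths.

```shell
# only_in DIR_A DIR_B   (hypothetical helper)
# Prints the relative paths of regular files that exist in DIR_A
# but have no counterpart in DIR_B.
only_in() (
    a=$1
    b=$2
    list_a=$(mktemp)
    list_b=$(mktemp)
    (cd "$a" && find . -type f | sort) > "$list_a"
    (cd "$b" && find . -type f | sort) > "$list_b"
    # comm -23 keeps the lines that appear only in the first list
    comm -23 "$list_a" "$list_b" | sed 's|^\./||'
    rm -f "$list_a" "$list_b"
)
```

Calling only_in old new gives the orphans of old/, and only_in new old gives the orphans of new/.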
Here is a bash script that will create the file:
#!/bin/bash
# Usage: bash script.bash OLD_DIR NEW_DIR [OUTPUT_FILE]
# compare given directories
if [ -n "$3" ]; then   # the optional 3rd argument is the output file
    OUTPUT="$3"
else                   # if it isn't provided, escape path slashes to underscores
    OUTPUT="${2////_}-newer-than-${1////_}"
fi
{
    find "$2" -newermm "$1"
    diff -u <(find "$2" |sed "s:$2::") <(find "$1" |sed "s:$1::") \
        |sed "/^+\//!d; s::$1/:"
} |sort > "$OUTPUT"
First, this determines the output file, which either comes from the third argument or else is created from the other inputs using a replacement to convert slashes to underscores in case there are paths. The name is built from the second argument followed by the first, so for example, running as bash script.bash /usr/local/bin /usr/bin would output its file list to _usr_bin-newer-than-_usr_local_bin in the current working directory.
This combines the two commands and then ensures they are sorted. There won't be any duplicates, so you don't need to worry about that (if there were, you'd use sort -u).
You can get your first and second files by changing the order of arguments as you invoke this script.

Remove second part of file name from files in a series of folders

I have a series of folders (about 100) with a set of files that look like the following:
Folder 1:
species_2136.dbf
species_2136.lyr
species_2136.prj
species_2136.sbn
species_2136.sbx
species_2136.shp
species_2136.shp.xml
species_2136.shx
Folder 2:
species_136524.dbf
species_136524.lyr
species_136524.prj
species_136524.sbn
species_136524.sbx
species_136524.shp
species_136524.shp.xml
species_136524.shx
I'd like everything to be named species.ext. How can I remove the _#### from all files in all folders to look like this?
Folder 1:
species.dbf
species.lyr
species.prj
species.sbn
species.sbx
species.shp
species.shp.xml
species.shx
Folder 2:
species.dbf
species.lyr
species.prj
species.sbn
species.sbx
species.shp
species.shp.xml
species.shx
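Using bash parameter expansion (the expansions are applied to the file name only, so that dots or underscores in the directory part, such as a leading ./, cannot interfere):
for file in folder1/* folder2/*
do
    name=${file##*/}
    mv -- "$file" "${file%/*}/${name%%_*}.${name#*.}"
done
(or) in a single line as
for file in folder1/* folder2/*; do name=${file##*/}; mv -- "$file" "${file%/*}/${name%%_*}.${name#*.}"; done
The loop can also be written with brace expansion:
for file in {folder1,folder2}/*; do name=${file##*/}; mv -- "$file" "${file%/*}/${name%%_*}.${name#*.}"; done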
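The two expansions doing the work can be checked in isolation before any file is moved. The sketch below uses made-up function names (strip_id, rename_in) and assumes names of the form species_NUMBER.ext. It works on the file name only, because a dot or underscore in the directory part (for example a leading ./) would otherwise be picked up by the expansions.

```shell
# strip_id NAME   (hypothetical helper)
# species_2136.shp.xml -> species.shp.xml
strip_id() (
    name=$1
    # ${name%%_*} : everything before the first underscore
    # ${name#*.}  : everything after the first dot
    printf '%s.%s\n' "${name%%_*}" "${name#*.}"
)

# rename_in DIR...   (hypothetical helper)
# Applies strip_id to every matching file in the given directories.
rename_in() (
    for dir in "$@"; do
        dir=${dir%/}
        for file in "$dir"/*_*.*; do
            [ -f "$file" ] || continue
            mv -- "$file" "$dir/$(strip_id "${file##*/}")"
        done
    done
)
```

Running strip_id on a few sample names first shows what rename_in would produce, for example strip_id species_2136.shp.xml.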
With Perl's rename (standalone command):
rename -n 's/_[0-9]+//' "Folder "*/species*
If everything looks fine, remove option -n.
Your file names:
species_2136.dbf
species_2136.lyr
species_2136.prj
species_2136.sbn
species_2136.sbx
species_2136.shp
species_2136.shp.xml
species_2136.shx
This is easy to do with the rename command. First go to your folder and then try this:
rename -n 's/_.*?\./\./' *
Here -n means no action: it only shows you what would be renamed.
The tricky part is the regex _.*?\. which matches everything from the first _ up to the next dot, once. That match is then replaced with a single dot, and that's it.
Proof:
$ cat your-list-of-file | rename -n 's/_.*?\./\./'
rename(species_2136.dbf, species.dbf)
rename(species_2136.lyr, species.lyr)
rename(species_2136.prj, species.prj)
rename(species_2136.sbn, species.sbn)
rename(species_2136.sbx, species.sbx)
rename(species_2136.shp, species.shp)
rename(species_2136.shp.xml, species.shp.xml)
rename(species_2136.shx, species.shx)

Bash for loop pull text from file recursively

I am having trouble writing a Bash for loop script that can extract the contents of a specific file that is common to many child directories under a parent directory.
Directory structure:
/Parent/child/grand_child/great_grand_child/file
In which there are many child, grandchild, and great grandchild folders.
I want my script to do (in pseudo-code):
For EVERY grand_child folder, in EVERY child folder:
Search through ONLY ONE great_grand_child folder
find the file named 0001.txt
print the text in row 10 of 0001.txt to an output file
In the next Column of the output file, print the full directory path to the file that the text was extracted from.
My script so far:
for i in /Parent/**; do
if [ -d "$i" ]; then
echo "$i"
fi
done
Can I have some help designing this script?
So far this gives me the path to each grand_child folder, but I don't know how to isolate just one great_grand_child folder, and then ask for text in row 10 of the 0001.txt file inside the great_grand_child folder.
# For every grandchild directory like Parent/Child/Grandchild
for grandchild in Parent/*/*
do
    # Look for a file like $grandchild/Greatgrandchild/0001.txt
    for file in "$grandchild/"*/0001.txt
    do
        # If there is no such file, just skip this Grandchild directory.
        if [ ! -f "$file" ]
        then
            echo "Skipping $grandchild, no 0001.txt files" >&2
            continue
        fi
        # Otherwise print the 10th line and the file that it came from.
        awk 'FNR == 10 { print $0, FILENAME }' "$file"
        # Don't look at any more 0001.txt files in this Grandchild directory,
        # we only care about one of them.
        break
    done
done
Given that the names are sane (no spaces or other awkward characters), I'd probably go with:
find /Parent -name '0001.txt' |
sort -t / -k2,2 -k3,3 -k4,4 -u |
xargs awk 'FNR == 10 { print $0, FILENAME }' > output.file
Find the files named 0001.txt under /Parent. Sort the list so that there is just one entry per /Parent/Child/Grandchild; because the paths start with a slash, field 1 is empty and the three directory levels are fields 2 to 4. Run awk as often as necessary, printing line 10 of each file along with the file name. Capture the output in output.file.
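The field numbers given to sort depend on how deep the starting directory is. A variant that does not depend on the depth is to derive the grandchild directory from each path and keep only the first file seen for it. This is a sketch with made-up function names (first_per_grandchild, line10_report), assuming the layout Parent/child/grand_child/great_grand_child/0001.txt and file names without newlines.

```shell
# first_per_grandchild PARENT   (hypothetical helper)
# Prints one 0001.txt path per grandchild directory.
first_per_grandchild() {
    find "$1" -type f -name 0001.txt | sort |
    awk '{ key = $0
           sub("/[^/]*/[^/]*$", "", key)   # drop /great_grand_child/0001.txt
         }
         !seen[key]++'                     # print only the first path per key
}

# line10_report PARENT   (hypothetical helper)
# Prints line 10 of each selected file followed by the path of that file.
line10_report() {
    first_per_grandchild "$1" | while IFS= read -r file; do
        awk 'FNR == 10 { print $0, FILENAME }' "$file"
    done
}
```

It could be run as line10_report /Parent > output.file. Files shorter than ten lines produce no output.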

Resources