Processing every file from a list of folders - bash

I have a folder structure like
base
|
|---rbbc_23434
| |------rbbp_34954
| | |___this.json
|
|---rbbc_222334
| |------rbbp_39884954
| | |___this.json
|
etc
I want to process each this.json. Note that the characters after rbbp are random.
I have the following script:
#! /bin/bash
search_dir=/path/to/base/
bf="$(basename -- $search_dir)"
for entry in "$search_dir"*/
do
#echo "$entry"
f="$(basename -- $entry)"
echo "$f"
if [[ "$f" == "rbb"* ]]
then
echo "$entry"
ls "$entry""rbbp"*"/this.json"
#echo "$entry""rdgp"*"/this2.json"
#python3 something.py --input "$entry""rbbp"*"/this.json"
fi
done
With ls I can locate the this.json files in all folders, but these wildcards do not seem to work when passing the file as input to a Python script, or even with echo.
How can I pass the path of each this.json file to the something.py script?

You don't need the loop over directories or the if statement; just write one wildcard pattern that matches every directory level in the path.
search_dir=/path/to/base
for file in "$search_dir"/rbb*/rbbp*/this.json
do
python3 something.py --input "$file"
done
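One thing to watch with a glob loop like this: if nothing matches, the shell leaves the pattern unexpanded and the loop body runs once with the literal pattern string. Below is a minimal sketch of guarding against that. It assumes a throwaway tree created with mktemp in place of the real /path/to/base, and the echo stands in for the python3 call; the directory names are made up for illustration.

```shell
#!/bin/bash
# Build a throwaway tree that mimics the layout in the question (hypothetical names).
search_dir=$(mktemp -d)
mkdir -p "$search_dir/rbbc_1/rbbp_2" "$search_dir/rbbc_3/rbbp_4" "$search_dir/other"
touch "$search_dir/rbbc_1/rbbp_2/this.json" "$search_dir/rbbc_3/rbbp_4/this.json"

count=0
for file in "$search_dir"/rbb*/rbbp*/this.json
do
    # If nothing matched, the pattern is left unexpanded; skip that literal string.
    [ -e "$file" ] || continue
    count=$((count + 1))
    echo "would run: python3 something.py --input $file"
done
echo "matched $count file(s)"
```

In bash, setting `shopt -s nullglob` before the loop is an alternative to the `[ -e "$file" ]` check: a pattern with no matches then expands to nothing and the loop body never runs.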

Related

How to create a bash script to make directories and specific files inside each directory

I wrote a bash script trying to generate one directory named after each file inside the directory from which I run the script.
Original directory= /home/agalvez/data/sims/phylip_format
sim1.phylip
sim2.phylip
Directories to create = sim1 sim2
The contents of these new directories should be a copy of the original file that names the new directory and an extra file called "input". This file should contain the name of the .phylip file as well as the following:
"Name of original file"
U
5
Y
/home/agalvez/data/sims/trees/tree_nodenames.txt
After that I want to run the following command (sequentially) in all these new directories:
phylip dollop < input > screenout
My approach is the following one but it is not working:
!/bin/bash
for f in *.phylip;
mkdir /home/agalvez/data/sims/dollop/$f;
cp $f /home/agalvez/data/sims/dollop/$f;
cd /home/agalvez/data/sims/dollop/$f;
echo "$f" | cat > input;
echo "U" | cat >> input;
echo "5" | cat >> input;
echo "Y" | cat >> input;
echo "/home/agalvez/data/sims/trees/tree_nodenames.txt" | cat >> input;
phylip dollop < input > screenout;
;done
Edit: The error message looks like this:
line 4: syntax error near unexpected token `mkdir'
line 4: ` mkdir /home/agalvez/data/sims/dollop/$f;'
FINAL SOLUTION:
#!/bin/bash
for f in *.phylip;
do
mkdir /home/agalvez/data/sims/dollop/$f;
cp /home/agalvez/data/sims/phylip_format/$f /home/agalvez/data/sims/dollop/$f;
cd /home/agalvez/data/sims/dollop/$f;
echo "$f" | cat > input;
echo "U" | cat >> input;
echo "5" | cat >> input;
echo "Y" | cat >> input;
echo "/home/agalvez/data/sims/trees/tree_nodenames.txt" | cat >> input;
phylip dollop < input > screenout;
done
The immediate problem is that the loop body is missing the do keyword at its start, but you'll also want to refactor this code to avoid hardcoding the directory structure.
The first line needs to start with literally the two characters # and ! in order to be a valid shebang.
See also "When to wrap quotes around a shell variable?"
The printf could be replaced with a here document; I like the compactness of printf here.
#!/bin/bash
for f in *.phylip; do
    mkdir -p dollop/"$f"
    cp "$f" dollop/"$f"
    # Run in a subshell so the cd does not carry over to the next iteration
    (
        cd dollop/"$f" &&
        printf "%s\n" "$f" "U" "5" "Y" \
            "/home/agalvez/data/sims/trees/tree_nodenames.txt" |
        phylip dollop > screenout
    )
done
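For comparison, the here-document variant mentioned above might look like the sketch below. Here cat stands in for `phylip dollop` so the snippet runs without PHYLIP installed, and sim1.phylip is just the sample file name from the question.

```shell
#!/bin/bash
f=sim1.phylip   # sample file name from the question

# cat is a stand-in for `phylip dollop`; it simply echoes the answers it is fed.
screenout=$(cat <<EOF
$f
U
5
Y
/home/agalvez/data/sims/trees/tree_nodenames.txt
EOF
)
printf '%s\n' "$screenout"
```

Because the delimiter is written unquoted (EOF rather than 'EOF'), $f is expanded inside the here document.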
Going forward, try http://shellcheck.net/ for diagnosing many common beginner problems in shell scripts.
Assuming you have a directory named pingping in your ${HOME} folder containing the files 1.txt, 2.txt and 3.txt, you can accomplish that as follows. Modify this code to suit your needs.
#! /bin/bash
working_directory="${HOME}/pingping/"
cd "$working_directory" || exit 1
for f in *.txt
do
    # ${f%%.*} is the file name with everything from the first dot removed
    mkdir "${f%%.*}"
    if [ -f "${f%%.*}.txt" ]
    then
        if [ -d "${f%%.*}" ]
        then
            cp "${f%%.*}.txt" "${f%%.*}"
            echo "Done copying"
            #phylip dollop < input > screenout
            #echo "Successfully ran the command"
        fi
    else
        echo "not found"
    fi
done

Determine the maximum path length of a directory in bash

I'm trying to create a hierarchy of directories and files. The idea is the following: I want my script to create my directories level by level.
So let's say the user passes these arguments on the command line:
/dirname #dirs=8 #levels=2
and I want it to create something like this:
/dirname
|-->/a/b
|
|-->/b/c
|
|-->/e/f
|
|-->/g/h
How can I determine the number of paths I will have in my directories?
I've tried with PATH_MAX but it doesn't seem to work.
#!/bin/bash
ARG_NUM=$# #argc
echo $#
if [ $ARG_NUM -ne 3 ];then
echo "Wrong number of arguments"
exit 1
fi
# ROOT_DIR
dir_name=$1
if [[ ! -d $dir_name ]]; then
mkdir $dir_name
fi
#CREATING FOLDERS WITH RANDOM NAMES
LIMIT=$3
for ((j=1; j<=LIMIT; j++))
do
mkdir -p $dir_name/$(cat /dev/urandom | tr -dc 'a-zA-Z0-9' | fold -w 8 | head -n 1)
echo $dir_name "created"
done
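No answer is recorded for this question. One way to read the example is that #dirs is the total number of directories and #levels is the depth of each chain, which gives dirs / levels separate paths (8 / 2 = 4 in the example). PATH_MAX does not help here: it is a limit on the length of a path name, not a count of paths. A minimal sketch under that reading follows; the variable names, the mktemp root and the assumption that dirs is a multiple of levels are all illustrative.

```shell
#!/bin/bash
root=$(mktemp -d)   # stand-in for the /dirname argument
dirs=8              # total number of directories to create
levels=2            # depth of each chain

# Assumption: dirs is a multiple of levels, so each path uses exactly `levels` directories.
paths=$((dirs / levels))

random_name() {
    # 8 random alphanumeric characters, similar to the question's script
    LC_ALL=C tr -dc 'a-zA-Z0-9' < /dev/urandom | head -c 8
}

i=0
while [ "$i" -lt "$paths" ]; do
    path=$root
    j=0
    while [ "$j" -lt "$levels" ]; do
        path=$path/$(random_name)
        j=$((j + 1))
    done
    mkdir -p "$path"
    i=$((i + 1))
done

echo "created $paths paths under $root"
```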

rename file names work in command prompt but not in bash script

I'm trying to rename files in a bash script. If I run for example:
echo /home/scientist/mySalesData/campaignData_1482386214.24417.csv | sed 's/\(.*\)\(_.*\)/mv \"&" \"\1.csv\"/' | bash
It works fine and gives me campaignData.csv in the directory /home/scientist/mySalesData/ .
However, if I put this in a bash script as follows:
for f in /home/scientist/SalesData/*; do
if [ -f "$f" ]; then
cp "$f" /home/scientist/SalesForce/SalesData/Backups/
echo $f$ | sed 's/\(.*\)\(_.*\)/mv \"&" \"\1.csv\"/' | bash
fi
done
I get:
mv: cannot stat '/home/scientist/SalesData/campaignData_1482386214.24417.csv$': No such file or directory
Any help would be much appreciated!
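# srcdir and dstdir are not defined in the answer; these values assume the paths from the question
srcdir=/home/scientist/SalesData
dstdir=/home/scientist/SalesForce/SalesData/Backups
cd "$srcdir" || exit 1
for f in *; do
    if [ -f "$f" ]; then
        # copy under the new name: everything before the last underscore, plus .csv
        cp "./$f" "$dstdir/${f%_*}.csv"
    fi
done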
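The % in ${f%_*} is the parameter expansion operator that strips the shortest suffix matching the pattern, here _* (the last underscore and everything after it).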
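To make the suffix operators concrete, here is a small sketch that can be pasted into a shell. The first name is the one from the question; a_b_c.txt is a made-up name chosen to show the difference between % (shortest match) and %% (longest match).

```shell
#!/bin/bash
f=campaignData_1482386214.24417.csv   # file name from the question

short=${f%_*}    # strip the shortest suffix matching _*
echo "$short.csv"    # prints campaignData.csv

g=a_b_c.txt      # hypothetical name with several underscores
echo "${g%_*}"   # prints a_b  (only the last _... part is removed)
echo "${g%%_*}"  # prints a    (everything from the first _ is removed)
```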
You have a trailing $ here:
echo $f$
remove that (and quote the expansion):
echo "$f"
You could also use a here string:
sed ... <<<"$f"

Move files from directories listed in file

I have a directory structure like the following toy example
DirectoryTo
DirectoryFrom
-Dir1
---File1.txt
---File2.txt
---File3.txt
-Dir2
---File4.txt
---File5.txt
---File6.txt
-Dir3
---File1.txt
---File5.txt
---File7.txt
I'm trying to copy all the files from DirectoryFrom to DirectoryTo, keeping the newer file if there are duplicates.
DirectoryTo
-File1.txt
-File2.txt
-File3.txt
-File4.txt
-File5.txt
-File6.txt
-File7.txt
DirectoryFrom
-Dir1
---File1.txt
---File2.txt
---File3.txt
-Dir2
---File4.txt
---File5.txt
---File6.txt
-Dir3
---File1.txt
---File5.txt
---File7.txt
I've created a text file with a list of all the subdirectories. The list is ordered so that the directories with the NEWEST files come first:
Filelist.txt
C:/DirectoryFrom/Dir1
C:/DirectoryFrom/Dir2
C:/DirectoryFrom/Dir3
So what I'd like to do is loop through each directory in Filelist.txt, copy the files, and NOT replace if the file already exists.
I'd like to do this at the command line, in a shell script, or possibly in Python. I'm pretty new to Python, but have a little experience with the command line. However, I've never done something this complicated.
In reality, I have ~60 folders, each with 50-200 files in them, to give you a feel for how many I have. Also, each file is ~75MB.
I've done something similar in R before, but it's slow and not really meant for this. But here's what I've tried for a shell script, edited to fit this toy example:
#!/bin/bash
for line in Filelist.txt
do
cp -n line C:/DirectoryTo/
done
If you have only one directory level inside DirectoryFrom, you can use:
cp -n DirectoryFrom/*/* DirectoryTo
Explanation: this copies every file in the subdirectories of DirectoryFrom to DirectoryTo unless a file with that name already exists there.
The -n flag tells cp not to overwrite files that already exist.
Because -r is not given, cp also skips any directories found inside the subdirectories of DirectoryFrom.
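If the directory order has to come from Filelist.txt rather than from the glob's alphabetical order, the same cp -n idea can be driven by reading the list line by line. The sketch below assumes a throwaway tree built with mktemp in place of the real C:/ paths; the file contents ("newer", "older") are made up so the result is easy to check.

```shell
#!/bin/bash
# Throwaway stand-ins for DirectoryFrom, DirectoryTo and Filelist.txt.
work=$(mktemp -d)
mkdir -p "$work/from/Dir1" "$work/from/Dir3" "$work/to"
echo "newer" > "$work/from/Dir1/File1.txt"
echo "older" > "$work/from/Dir3/File1.txt"
echo "only"  > "$work/from/Dir3/File7.txt"
printf '%s\n' "$work/from/Dir1" "$work/from/Dir3" > "$work/Filelist.txt"

# Read one directory per line; the list is ordered newest first,
# so cp -n keeps the first copy of each name and skips later ones.
while IFS= read -r dir; do
    cp -n "$dir"/* "$work/to/"
done < "$work/Filelist.txt"

cat "$work/to/File1.txt"
```

Compared with the attempt in the question, the loop reads the contents of Filelist.txt (via the redirection after done) instead of iterating over the literal word Filelist.txt, and it expands the variable as "$dir" rather than passing the bare word line to cp.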
# Create test environment:
mkdir C:/DirectoryTo
mkdir C:/DirectoryFrom
cd C:/DirectoryFrom
mkdir Dir1 Dir2 Dir3
(
cat << EOF
Dir1/File1.txt
Dir1/File2.txt
Dir1/File3.txt
Dir2/File4.txt
Dir2/File5.txt
Dir2/File6.txt
Dir3/File1.txt
Dir3/File5.txt
Dir3/File7.txt
EOF
)| while read f
do
echo "$f : `date`"
echo "$f : `date`" > $f
sleep 1
done
# create Filelist.txt file :
(
cat << EOF
C:/DirectoryFrom/Dir1
C:/DirectoryFrom/Dir2
C:/DirectoryFrom/Dir3
EOF
) > Filelist.txt
# Generate the list of all file names:
cd C:/DirectoryFrom
cat Filelist.txt | while read f; do ls -1 $f; done | sort -u > filenames.txt
cat filenames.txt
# List of all file paths, sorted by modification time (oldest first):
cd C:/DirectoryFrom
ls -1tr */* > all_filespath_sorted.txt
cat all_filespath_sorted.txt
# selected files to be copied :
cat filenames.txt | while read f; do cat all_filespath_sorted.txt | grep $f | tail -1 ; done
# copy of selected files:
cat filenames.txt | while read f; do cat all_filespath_sorted.txt | grep $f | tail -1 ; done | while read c
do
echo $c
cp -p $c C:/DirectoryTo
done
# verifying :
cd C:/DirectoryTo
ls -ltr
# or
ls -1 | while read f; do echo -e "\n$f\n-------"; cat $f; done
#------------------------------------------------
# Other solution for a limited number of files :
#------------------------------------------------
# To list files by order :
find `cat Filelist.txt | xargs` -type f | xargs ls -1tr
# To copy files, the newer will replace the older :
find `cat Filelist.txt | xargs` -type f | xargs ls -1tr | while read c
do
echo $c
cp -p $c C:/DirectoryTo
done

Bourne Shell doesn't find unix commands on script

#!/bin/sh
echo "Insert the directory you want to detail"
read DIR
#Get the files:
FILES=`ls "$DIR" | sort`
echo "Files in the list:"
echo "$FILES"
echo ""
echo "Separating directories from files..."
for FILE in $FILES
do
PATH=${DIR}"/$FILE"
OUTPUT="Path: $PATH"
if [ -f "$PATH" ]; then
NAME=`echo "$FILE" | cut -d'.' -f1`
OUTPUT=${OUTPUT}" (filename: $NAME"
EXTENSION=`echo "$FILE" | cut -s -d'.' -f2`
if [ ${#EXTENSION} -gt 0 ]; then
OUTPUT=${OUTPUT}" - type: $EXTENSION)"
else
OUTPUT=${OUTPUT}")"
fi
elif [ -d "$PATH" ]; then
OUTPUT=${OUTPUT}" (dir name: $FILE)"
fi
echo "$OUTPUT"
done
I get this output when running it (I ran using relative path and full path)
$ ./problem.sh
Insert the directory you want to detail
.
Files in the list:
directoryExample
problem.sh
Separating directories from files...
Path: ./directoryExample (dir name: directoryExample)
./problem.sh: cut: not found
./problem.sh: cut: not found
Path: ./problem.sh (filename: )
$
$
$ ./problem.sh
Insert the directory you want to detail
/home/geppetto/problem
Files in the list:
directoryExample
problem.sh
Separating directories from files...
Path: /home/geppetto/problem/directoryExample (dir name: directoryExample)
./problem.sh: cut: not found
./problem.sh: cut: not found
Path: /home/geppetto/problem/problem.sh (filename: )
$
As you can see, I received "cut: not found" twice when building the output string for file types. Why? (I am using FreeBSD.)
PATH is the variable used by the shell to store the list of directories where commands like cut might be found. You overwrote the value of that variable, losing the initial list. The easy fix is to not use PATH as a variable name in your for loop. The more complete answer is to avoid all variable names consisting of only uppercase letters, as those are reserved for use by the shell. Include at least one lowercase letter or number in all your own variable names to avoid interfering with current (or future) variables used by the shell.
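The effect is easy to reproduce in isolation. The sketch below repeats the mistake inside a subshell, so the damage does not leak into the rest of the script, and then does the same work with a lowercase variable name. The file name report.txt is made up for illustration.

```shell
#!/bin/sh
file="report.txt"   # hypothetical file name

# In a subshell, repeat the mistake: PATH now names a file, not a list of directories,
# so the shell cannot find cut and the command substitution yields nothing.
broken=$( (PATH="./$file"; echo "$file" | cut -d'.' -f1) 2>/dev/null )

# With a lowercase variable name, PATH is untouched and cut is found.
path="./$file"
name=$(echo "$file" | cut -d'.' -f1)

echo "path=$path name=$name broken='$broken'"
```

echo still works in the broken subshell because it is a shell builtin and needs no PATH lookup, which matches the output in the question: the Path: lines print fine and only cut fails.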
