Is there a way to optimize this code, make it faster, or find a better solution? - bash

I am looking to collect YANG models from my project's .jar files. I came up with an approach, but it takes a long time and my colleagues are not happy.
#!/bin/sh
set -e
# FIXME: make this tuneable
OUTPUT="yang models"
INPUT="."
JARS=`find $INPUT/system/org/linters -type f -name '*.jar' | sort -u`
# FIXME: also wipe output?
[ -d "$OUTPUT" ] || mkdir "$OUTPUT"
for jar in $JARS; do
    artifact=`basename $jar | sed 's/.jar$//'`
    echo "Extracting modules from $artifact"
    # FIXME: better control over unzip errors
    unzip -q "$jar" 'META-INF/yang/*' -d "$artifact" \
        2>/dev/null || true
    dir="$artifact/META-INF/yang"
    if [ -d "$dir" ]; then
        for file in `find $dir -type f -name '*.yang'`; do
            module=`basename "$file"`
            echo -e "\t$module"
            # FIXME: better duplicate detection
            mv -n "$file" "$OUTPUT"
        done
    fi
    rm -rf "$artifact"
done

If the .jar files don't all change between invocations of your script, then you could make the script significantly faster by caching the .jar files and only operating on the ones that changed, e.g.:
#!/usr/bin/env bash
set -e
# FIXME: make this tuneable
output='yang models'
input='.'
cache='/some/where'
mkdir -p "$cache" || exit 1
readarray -d '' jars < <(find "$input/system/org/linters" -type f -name '*.jar' -print0 | sort -zu)
# FIXME: also wipe output?
mkdir -p "$output" || exit 1
for jarpath in "${jars[@]}"; do
    # skip jars that are identical to the copy cached on a previous run
    diff -q "$jarpath" "$cache" >/dev/null 2>&1 && continue
    cp "$jarpath" "$cache"
    jarfile="${jarpath##*/}"
    artifact="${jarfile%.*}"
    printf 'Extracting modules from %s\n' "$artifact"
    # FIXME: better control over unzip errors
    unzip -q "$jarpath" 'META-INF/yang/*' -d "$artifact" 2>/dev/null || true
    dir="$artifact/META-INF/yang"
    if [ -d "$dir" ]; then
        readarray -d '' yangs < <(find "$dir" -type f -name '*.yang' -print0)
        for yangpath in "${yangs[@]}"; do
            yangfile="${yangpath##*/}"
            printf '\t%s\n' "$yangfile"
            # FIXME: better duplicate detection
            mv -n "$yangpath" "$output"
        done
    fi
    rm -rf "$artifact"
done
See "Correct Bash and shell script variable capitalization", http://mywiki.wooledge.org/BashFAQ/082, https://mywiki.wooledge.org/Quotes, and "How can I store the find command results as an array in Bash" for some of the other changes I made above.
I assume you have some reason for looping on the .yang files and not moving them if a file by the same name already exists, rather than unzipping the .jar file into the final output directory.
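If that duplicate handling isn't actually needed, a minimal sketch of the simpler route (assuming unzip's -j junk-paths and -n never-overwrite flags fit your duplicate policy) would replace the inner loop entirely:
# extract every .yang member flat into the output dir, keeping the first
# copy of any duplicate name (-n) and dropping the META-INF/yang/ prefix (-j)
unzip -q -n -j "$jarpath" 'META-INF/yang/*.yang' -d "$output" || true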

Related

moving files to their respective folders using bash scripting

I have files in this format:
2022-03-5344-REQUEST.jpg
2022-03-5344-IMAGE.jpg
2022-03-5344-00imgtest.jpg
2022-03-5344-anotherone.JPG
2022-03-5343-kdijffj.JPG
2022-03-5343-zslkjfs.jpg
2022-03-5343-myimage-2010.jpg
2022-03-5343-anotherone.png
2022-03-5342-ebee5654.jpeg
2022-03-5342-dec.jpg
2022-03-5341-att.jpg
2022-03-5341-timephoto_december.jpeg
....
about 13k images like these.
I want to create folders like:
2022-03-5344/
2022-03-5343/
2022-03-5342/
2022-03-5341/
....
I started manually moving them like:
mkdir name
mv name-* name/
But of course I'm not gonna repeat this process for 13k files.
So I want to do this using bash scripting. Since I am new to bash and I am working on a production environment, I want to play it safe, but what I have doesn't give me the results I need. This is what I did so far:
#!/bin/bash
name = $1
mkdir "$name"
mv "${name}-*" $name/
All I can do is run ./move.sh name for every folder; I don't know how to automate this using loops.
With bash and a regex. I assume that the files are all in the current directory.
for name in *; do
    if [[ "$name" =~ (^....-..-....)- ]]; then
        dir="${BASH_REMATCH[1]}"    # dir contains 2022-03-5344, e.g.
        echo mkdir -p "$dir" || exit 1
        echo mv -v "$name" "$dir"
    fi
done
If the output looks okay, remove both echo commands.
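With the sample filenames from the question, the dry run would print something like:
mkdir -p 2022-03-5341
mv -v 2022-03-5341-att.jpg 2022-03-5341
mkdir -p 2022-03-5341
mv -v 2022-03-5341-timephoto_december.jpeg 2022-03-5341
...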
Try this
xargs -i sh -c 'mkdir -p {}; mv {}-* {}' < <(ls *-*-*-* | awk -F- -v OFS=- '{print $1,$2,$3}' | uniq)
Or:
find . -maxdepth 1 -type f -name "*-*-*-*" | \
awk -F- -vOFS=- '{print $1,$2,$3}' | \
sort -u | \
xargs -i sh -c 'mkdir -p {}; mv {}-* {}'
Or find with regex:
find . -maxdepth 1 -type f -regextype posix-extended -regex ".*/[0-9]{4}-[0-9]{2}-[0-9]{4}.*"
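That regex form can feed the same pipeline; a NUL-safe sketch (assuming GNU sed, sort, and xargs) might look like:
find . -maxdepth 1 -type f -regextype posix-extended \
    -regex '.*/[0-9]{4}-[0-9]{2}-[0-9]{4}-.*' -print0 |
    sed -zE 's|^\./([0-9]{4}-[0-9]{2}-[0-9]{4})-.*|\1|' |
    sort -zu |
    xargs -0 -I{} sh -c 'mkdir -p "$1" && mv "$1"-* "$1"/' _ {}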
You could use awk
$ cat awk.script
/^[[:digit:]-]/ && ! a[$1]++ {
    dir=$1
}
/^[[:digit:]-]/ {
    system("sudo mkdir " dir)
    system("sudo mv " $0 " " dir "/" $0)
}
To call the script and use it for your purposes:
$ awk -F"-([0-9]+)?[[:alpha:]]+.*" -f awk.script <(ls)
You will see some errors such as:
mkdir: cannot create directory ‘2022-03-5341’: File exists
after the initial directory has been created; you can safely ignore these, as the directory already exists.
Each directory will now contain the relevant files:
$ ls 2022-03-5344
2022-03-5344-00imgtest.jpg 2022-03-5344-IMAGE.jpg 2022-03-5344-REQUEST.jpg 2022-03-5344-anotherone.JPG

Bash script to string-concat two variables and do a file compare

What I am trying to achieve is to delete files that have the same name and modified timestamp (filename + mtime) in both Src_Dir1 and Src_Dir2.
So first I have tried to write all the filenames into temp_a (for Src_Dir1) and temp_b (for Src_Dir2) respectively.
A screenshot of the source directory shows the layout: most files sit inside Archive, and a few sit outside it.
So initially I want to deal with the files inside Archive (Src_Dir1), and later with the ones outside it (Src_Dir2). What I am trying to do is use a while loop to read each filename, concatenate it with its modified timestamp (mtime), and write the result to temp_c. For example, AirTimeActs_2018-12-03.csv with mtime 2019-01-24 14:41:53.000000000 -0500 should produce AirTimeActs_2018-12-03.csv_2019-01-24 14:41:53.000000000 -0500 in temp_c, and so on for every filename inside Archive (Src_Dir1). This is where I am stuck: the string concatenation itself. Please help me with the code; I hope I am comprehensible.
IMPORTANT
I'd really appreciate help with the extension of the code which I haven't mentioned here and have yet to achieve: implement the same logic for temp_b (calling the result temp_d), then do a file data compare between temp_c and temp_d; wherever the same filename+timestamp appears in both, delete the file existing in Src_Dir2, otherwise do nothing.
#!/bin/bash
Src_Dir1=path/Airtime_Activation/Archive
Src_Dir2=path/Airtime_Activation/
find "$Src_Dir1" -maxdepth 1 -name "*.xlsx" -o -name "*.csv" | sed "s/.*\///" > -print>path/Airtime_Activation/temp_a
find "$Src_Dir2" -maxdepth 1 -name "*.xlsx" -o -name "*.csv" | sed "s/.*\///" > -print>path/Airtime_Activation/temp_b
echo 'phase1'
cat path/Airtime_Activation/temp_a | while read file
do
    echo 'phase1.5'
    echo "$file"
    echo 'phase2'
    mtime=$(stat -c '%y' $file)
    Full_name=${file}_${mtime}
    echo "$Full_name" >> path/Airtime_Activation/temp_c
    echo 'phase3'
done
Extending your own script, with the find/stat calls fixed (the stray -print redirections removed, parentheses added so -o groups both name tests, full paths passed to stat, and consistent temp file paths), plus the temp_d generation and the compare-and-delete step:
#!/bin/bash
Src_Dir1=path/Airtime_Activation/Archive
Src_Dir2=path/Airtime_Activation/
find "$Src_Dir1" -maxdepth 1 \( -name "*.xlsx" -o -name "*.csv" \) | sed "s/.*\///" > path/Airtime_Activation/temp_a
find "$Src_Dir2" -maxdepth 1 \( -name "*.xlsx" -o -name "*.csv" \) | sed "s/.*\///" > path/Airtime_Activation/temp_b
echo 'phase1'
cat path/Airtime_Activation/temp_a | while read file
do
    echo 'phase1.5'
    echo "$file"
    echo 'phase2'
    mtime=$(stat -c '%y' "$Src_Dir1/$file")
    Full_name=${file}_${mtime}
    echo "$Full_name" >> path/Airtime_Activation/temp_c
    echo 'phase3'
done
cat path/Airtime_Activation/temp_b | while read file
#while IFS="" read -r -d $'\0' file;
do
    #echo "$file"
    echo 'phase2'
    mtime=$(stat -c '%y' "$Src_Dir2/$file")
    Full_name=${file}_${mtime}
    echo "$Full_name" >> path/Airtime_Activation/temp_d
    echo 'phase3'
done
# file compare and delete old files from outside the archive
grep -Ff path/Airtime_Activation/temp_d path/Airtime_Activation/temp_c > path/Airtime_Activation/temp_e
cat path/Airtime_Activation/temp_e | while read file
#while IFS="" read -r -d $'\0' file;
do
    #echo "$file"
    echo 'phase2'
    echo "${file%_*}"
    rm "$Src_Dir2/${file%_*}"
    echo 'phase3'
done
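As an aside, if temp_c and temp_d are sorted first, the intersection step could also be done with comm; a sketch, under the same hypothetical paths:
sort -o path/Airtime_Activation/temp_c path/Airtime_Activation/temp_c
sort -o path/Airtime_Activation/temp_d path/Airtime_Activation/temp_d
# lines common to both lists = same filename+mtime inside and outside Archive
comm -12 path/Airtime_Activation/temp_c path/Airtime_Activation/temp_d |
while IFS= read -r line; do
    rm "$Src_Dir2/${line%_*}"
done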

How to prevent Travis-CI from terminating a job?

I have a bunch of files that need to be copied over to a tmp/ directory and then compressed.
I tried cp -rf $SRC $DST, but the job is terminated before the command completes. The verbose option doesn't help either, because the log file then exceeds the size limit.
I wrote a small function to print only a percentage bar, but I get the same problem with the log size limit, so maybe I need to redirect stdout to stderr, but I'm not sure.
This is the snippet with the function:
function cp_p() {
    # count the files up front so the percentage can be computed
    local files=0
    while IFS= read -r -d '' file; do ((files++)); done < <(find -L $1 -mindepth 1 -name '*.*' -print0)
    # bar width: the terminal width, capped at 72 columns (80-8)
    local duration=$(tput cols)
    duration=$(($duration<80?$duration:80-8))
    local count=1
    local elapsed=1
    local bar=""
    already_done() {
        bar="\r|"
        for ((done=0; done<$(( ($elapsed)*($duration)/100 )); done++)); do
            printf -v bar "$bar▇"
        done
    }
    remaining() {
        for ((remain=$(( ($elapsed)*($duration)/100 )); remain<$duration; remain++)); do
            printf -v bar "$bar "
        done
        printf -v bar "$bar|"
    }
    percentage() {
        printf -v bar "$bar%3d%s" $elapsed '%%'
    }
    mkdir -p "$2/$1"
    chmod `stat -f %A "$1"` "$2/$1"    # BSD/macOS stat: copy the source mode
    while IFS= read -r -d '' file; do
        file=$(echo $file | sed 's|^\./\(.*\)|"\1"|')
        elapsed=$(( (($count)*100)/($files) ))
        already_done
        remaining
        percentage
        printf "$bar"
        if [[ -d "$file" ]]; then
            dst=$2/$file
            test -d "$dst" || (mkdir -p "$dst" && chmod `stat -f %A "$file"` "$dst")
        else
            src=${file%/*}
            dst=$2/$src
            test -d "$dst" || (mkdir -p "$dst" && chmod `stat -f %A "$src"` "$dst")
            cp -pf "$file" "$2/$file"
        fi
        ((count++))
    done < <(find -L $1 -mindepth 1 -name '*.*' -print0)
    printf "\r"
}
This is the error I get
packaging files (this may take several minutes) ...
|▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇ | 98%
The log length has exceeded the limit of 4 MB (this usually means that the test suite is raising the same exception over and over).
The job has been terminated
Have you tried travis_wait cp -rf $SRC $DST? See https://docs.travis-ci.com/user/common-build-problems/#Build-times-out-because-no-output-was-received for details.
Also, I believe that disk operations are generally rather slow on macOS builds. You might be better off compressing the file tree in the same pass that reads the files. Assuming you want to gzip the thing:
travis_wait tar -zcf $DST.tar.gz $SRC
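travis_wait defaults to a 20-minute allowance; it also accepts a number of minutes as its first argument, so for a large tree you may need something like the following (30 here is an arbitrary choice):
# allow up to 30 minutes without log output while the archive is built
travis_wait 30 tar -zcf $DST.tar.gz $SRC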

Bash Script: unary operator expected error?

#!/usr/bin/env bash
FILETYPES=( "*.html" "*.css" "*.js" "*.xml" "*.json" )
DIRECTORIES=`pwd`
MIN_SIZE=1024
for currentdir in $DIRECTORIES
do
    for i in "${FILETYPES[@]}"
    do
        find $currentdir -iname "$i" -exec bash -c 'PLAINFILE={};GZIPPEDFILE={}.gz; \
        if [ -e $GZIPPEDFILE ]; \
        then if [ `stat --printf=%Y $PLAINFILE` -gt `stat --printf=%Y $GZIPPEDFILE` ]; \
        then gzip -k -4 -f -c $PLAINFILE > $GZIPPEDFILE; \
        fi; \
        elif [ `stat --printf=%s $PLAINFILE` -gt $MIN_SIZE ]; \
        then gzip -k -4 -c $PLAINFILE > $GZIPPEDFILE; \
        fi' \;
    done
done
This script compresses all web static files using gzip. When I try to run it, I get this error: bash: line 5: [: 93107: unary operator expected. What is going wrong in this script?
You need to export the MIN_SIZE variable. The bash that find spawns doesn't have a value for it, so the script runs (as I just mentioned in my comment on @ooga's answer) [ $result_from_stat -gt ], which is an error. When the result is 93107 that gets you [ 93107 -gt ], which (if you run it in your shell) produces:
$ [ 93107 -gt ]
-bash: [: 93107: unary operator expected
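In the original script that is a one-line change before the loops (everything else stays the same):
export MIN_SIZE=1024    # now inherited by the bash that find -exec spawns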
This could be simpler:
#!/usr/bin/env bash
FILETYPES=(html css js xml json)
DIRECTORIES=("$PWD")
MIN_SIZE=1024
IFS='|' eval 'FILTER="^.*[.](${FILETYPES[*]})\$"'
for DIR in "${DIRECTORIES[@]}"; do
    while IFS= read -ru 4 FILE; do
        GZ_FILE=$FILE.gz
        if [[ -e $GZ_FILE ]]; then
            [[ $GZ_FILE -ot "$FILE" ]] && gzip -k -4 -c "$FILE" > "$GZ_FILE"
        elif [[ $(exec stat -c '%s' "$FILE") -ge MIN_SIZE ]]; then
            gzip -k -4 -c "$FILE" > "$GZ_FILE"
        fi
    done 4< <(exec find "$DIR" -mindepth 1 -type f -regextype egrep -iregex "$FILTER")
done
There's no need to use pwd. You can just have $PWD. And probably what you needed was an array variable as well.
Instead of calling bash multiple times as an argument to find with static string commands, just read input from a pipe or better yet from a named pipe through process substitution.
Instead of comparing stats, you can just use -ot or -nt.
You don't need -f if you're writing the output through redirection (>) as that form of redirection overwrites the target by default.
You can call find just once with a combined pattern for all the extensions, which is more efficient. You can check how I made the filter and used -iregex. Doing \( -iname one_ext_pat -or -iname another_ext_pat \) would also work, but it's more cumbersome.
exec is optional to prevent unnecessary use of another process.
Always prefer [[ ]] over [ ].
4< opens input with file descriptor 4 and -u 4 makes read read from that file descriptor, not stdin (0).
What you probably need is -ge MIN_SIZE (greater than or equal) not -gt.
Come to think of it, readarray is a cleaner option if your bash is version 4.0 or newer:
for DIR in "${DIRECTORIES[@]}"; do
    readarray -t FILES < <(exec find "$DIR" -mindepth 1 -type f -regextype egrep -iregex "$FILTER")
    for FILE in "${FILES[@]}"; do
        ...
    done
done

How to loop through a directory recursively to delete files with certain extensions

I need to loop through a directory recursively and remove all files with extension .pdf and .doc. I'm managing to loop through a directory recursively but not managing to filter the files with the above mentioned file extensions.
My code so far
#!/bin/sh
SEARCH_FOLDER="/tmp/*"
for f in $SEARCH_FOLDER
do
    if [ -d "$f" ]
    then
        for ff in $f/*
        do
            echo "Processing $ff"
        done
    else
        echo "Processing file $f"
    fi
done
I need help to complete the code, since I'm not getting anywhere.
As a followup to mouviciel's answer, you could also do this as a for loop, instead of using xargs. I often find xargs cumbersome, especially if I need to do something more complicated in each iteration.
for f in $(find /tmp -name '*.pdf' -or -name '*.doc'); do rm $f; done
As a number of people have commented, this will fail if there are spaces in filenames. You can work around this by temporarily setting the IFS (internal field separator) to the newline character. It also fails if there are wildcard characters [, ? or * in the file names. You can work around that by temporarily disabling wildcard expansion (globbing).
IFS=$'\n'; set -f
for f in $(find /tmp -name '*.pdf' -or -name '*.doc'); do rm "$f"; done
unset IFS; set +f
If you have newlines in your filenames, then that won't work either. You're better off with an xargs based solution:
find /tmp \( -name '*.pdf' -or -name '*.doc' \) -print0 | xargs -0 rm
(The escaped brackets are required here to have the -print0 apply to both or clauses.)
GNU and *BSD find also has a -delete action, which would look like this:
find /tmp \( -name '*.pdf' -or -name '*.doc' \) -delete
find is just made for that.
find /tmp -name '*.pdf' -or -name '*.doc' | xargs rm
Without find:
for f in /tmp/* /tmp/**/* ; do
    ...
done
/tmp/* are files in dir and /tmp/**/* are files in subfolders. It is possible that you have to enable globstar option (shopt -s globstar).
So for the question the code should look like this:
shopt -s globstar
for f in /tmp/*.pdf /tmp/*.doc /tmp/**/*.pdf /tmp/**/*.doc ; do
    rm "$f"
done
Note that this requires bash ≥4.0 (or zsh without shopt -s globstar, or ksh with set -o globstar instead of shopt -s globstar). Furthermore, in bash <4.3, this traverses symbolic links to directories as well as directories, which is usually not desirable.
If you want to do something recursively, I suggest you use recursion (yes, you can do it using stacks and so on, but hey).
recursiverm() {
    for d in *; do
        if [ -d "$d" ]; then
            (cd -- "$d" && recursiverm)
        fi
        rm -f *.pdf
        rm -f *.doc
    done
}
(cd /tmp; recursiverm)
That said, find is probably a better choice as has already been suggested.
Here is an example using shell (bash):
#!/bin/bash
# loop over & print a folder recursively
print_folder_recurse() {
    for i in "$1"/*; do
        if [ -d "$i" ]; then
            echo "dir: $i"
            print_folder_recurse "$i"
        elif [ -f "$i" ]; then
            echo "file: $i"
        fi
    done
}

# try to get the path from the first parameter
path=""
if [ -d "$1" ]; then
    path=$1
else
    path="/tmp"
fi
echo "base path: $path"
print_folder_recurse "$path"
This doesn't answer your question directly, but you can solve your problem with a one-liner:
find /tmp \( -name "*.pdf" -o -name "*.doc" \) -type f -exec rm {} +
Some versions of find (GNU, BSD) have a -delete action which you can use instead of calling rm:
find /tmp \( -name "*.pdf" -o -name "*.doc" \) -type f -delete
For bash (since version 4.0):
shopt -s globstar nullglob dotglob
echo **/*".ext"
That's all.
The trailing extension ".ext" is there to select files (or dirs) with that extension.
Option globstar activates ** (search recursively).
Option nullglob removes the * when it matches no file/dir.
Option dotglob includes files that start with a dot (hidden files).
Beware that before bash 4.3, **/ also traverses symbolic links to directories which is not desirable.
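Applied to the question's cleanup task, the same globbing approach might look like this (rm -- guards against filenames that begin with a dash):
shopt -s globstar nullglob dotglob
rm -- **/*.pdf **/*.doc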
This method handles spaces in filenames well (newlines in filenames will still break it).
files="$(find -L "$dir" -type f)"
echo "Count: $(echo -n "$files" | wc -l)"
echo "$files" | while read file; do
echo "$file"
done
Edit: fixes an off-by-one error
function count() {
    files="$(find -L "$1" -type f)"
    if [[ "$files" == "" ]]; then
        echo "No files"
        return 0
    fi
    file_count=$(echo "$files" | wc -l)
    echo "Count: $file_count"
    echo "$files" | while read file; do
        echo "$file"
    done
}
This is the simplest way I know to do this:
rm **/@(*.doc|*.pdf)
** makes this work recursively
@(*.doc|*.pdf) looks for a file ending in pdf OR doc
Easy to safely test by replacing rm with ls
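In bash (≥ 4.0) that pattern needs both globstar and extglob enabled; a sketch of the safe-test workflow this answer suggests:
shopt -s globstar extglob
ls **/@(*.doc|*.pdf)    # dry run: see what would be deleted
rm **/@(*.doc|*.pdf)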
The following function recursively iterates through all the directories under /home/ubuntu (the whole directory structure under ubuntu) and applies the necessary checks in the else block.
function check {
    for file in "$1"/*
    do
        if [ -d "$file" ]
        then
            check "$file"
        else
            ## check the file
            if [ "$(head -c 4 "$file")" = "%PDF" ]; then
                rm -r "$file"
            fi
        fi
    done
}
domain=/home/ubuntu
check "$domain"
There is no reason to pipe the output of find into another utility; find has a -delete action built into it. (As noted above, the escaped parentheses are needed so -delete applies to both patterns, not just the second.)
find /tmp \( -name '*.pdf' -or -name '*.doc' \) -delete
The other answers provided will not include files or directories that start with a dot. The following worked for me:
#!/bin/sh
getAll()
{
    local fl1="$1"/*
    local fl2="$1"/.[!.]*
    local fl3="$1"/..?*
    for inpath in "$1"/* "$1"/.[!.]* "$1"/..?*; do
        if [ "$inpath" != "$fl1" -a "$inpath" != "$fl2" -a "$inpath" != "$fl3" ]; then
            stat --printf="%F\0%n\0\n" -- "$inpath"
            if [ -d "$inpath" ]; then
                getAll "$inpath"
            #elif [ -f "$inpath" ]; then
            fi
        fi
    done
}
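A call might look like this (the starting directory is an arbitrary example):
getAll /tmp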
I think the most straightforward solution is to use recursion. In the following example, I print all the file names in the directory and its subdirectories.
You can modify it according to your needs.
#!/bin/bash
printAll() {
    for i in "$1"/*; do           # for everything under the root
        if [ -f "$i" ]; then      # if it is a file
            echo "$i"             # print the file name
        elif [ -d "$i" ]; then    # if it is a directory
            printAll "$i"         # call printAll on it (recursion)
        fi
    done
}
printAll "$1"    # e.g.: ./printAll.sh .
OUTPUT:
> ./printAll.sh .
./demoDir/4
./demoDir/mo st/1
./demoDir/m2/1557/5
./demoDir/Me/nna/7
./TEST
It works fine with spaces as well!
Note:
You can use echo "$(basename "$i")" to print the file name without its path.
Or use echo "${i##*/}", which runs much faster since it avoids calling the external basename.
Just do
find . -name '*.pdf' | xargs rm
If you can change the shell used to run the command, you can use ZSH to do the job.
#!/usr/bin/zsh
for file in /tmp/**/*
do
    echo $file
done
This will recursively loop through all files/folders.
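Applied to the question, zsh can also do the deletion with a single glob; a sketch (the (N) qualifier makes an empty match expand to nothing rather than raising an error):
rm -- /tmp/**/*.(pdf|doc)(N)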
The following will loop over the given directory and list the contents of each of its immediate subdirectories (note that, despite appearances, it only descends one level):
for d in /home/ubuntu/*; do
    echo "listing contents of dir: $d"
    ls -l "$d"/
done
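A genuinely recursive variant would hand every directory that find discovers to the same listing commands; a sketch:
find /home/ubuntu -type d -exec sh -c 'echo "listing contents of dir: $1"; ls -l "$1"' _ {} \;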
