Bash script to String concat two variables and do File compare - bash

What I am trying to achieve is to delete files with the same name and modified timestamp (filename + modified timestamp) existing in both Src_Dir1 and Src_Dir2.
So first I have tried to write all the filenames to temp_a (Src_Dir1) and temp_b (Src_Dir2) respectively.
Below is the screenshot of the source directory.
Files inside Archive look like this, and there are a few files outside too.
So, initially I want to deal with the files inside Archive (Src_Dir1) and later with the ones outside Archive (Src_Dir2). What I am trying to do is use a while loop to read each filename, concatenate it with its modified timestamp (mtime), and write the result to temp_c. For example, AirTimeActs_2018-12-03.csv with mtime 2019-01-24 14:41:53.000000000 -0500 should produce AirTimeActs_2018-12-03.csv_2019-01-24 14:41:53.000000000 -0500; that is how a line should be generated in temp_c for each filename inside Archive (Src_Dir1). This is where I am stuck, in the string-concatenation part, on how to proceed. Please help me with the code; I hope I am comprehensible.
IMPORTANT
(I would really appreciate help with the extension of the code which I haven't managed yet, namely:
implement the same logic I am trying for temp_a for temp_b as well (writing to temp_d), and then compare the data in temp_c and temp_d. If the same filename+timestamp entry exists in both, delete the file in Src_Dir2; if there is no matching entry, do nothing.)
#!/bin/bash
Src_Dir1=path/Airtime_Activation/Archive
Src_Dir2=path/Airtime_Activation/
find "$Src_Dir1" -maxdepth 1 -name "*.xlsx" -o -name "*.csv" | sed "s/.*\///" > -print>path/Airtime_Activation/temp_a
find "$Src_Dir2" -maxdepth 1 -name "*.xlsx" -o -name "*.csv" | sed "s/.*\///" > -print>path/Airtime_Activation/temp_b
echo 'phase1'
cat path/Airtime_Activation/temp_a | while read file;
do
echo 'phase1.5'
echo "$file"
echo 'phase2'
mtime=$(stat -c '%y' "$Src_Dir1/$file")
Full_name=${file}_${mtime}
echo "$Full_name" >> path/Airtime_Activation/temp_c
echo 'phase3'
done

#!/bin/bash
Src_Dir1=path/Airtime_Activation/Archive
Src_Dir2=path/Airtime_Activation/
# Group the -name tests so the -o alternation behaves as intended, then strip the directory part.
find "$Src_Dir1" -maxdepth 1 \( -name "*.xlsx" -o -name "*.csv" \) | sed "s/.*\///" > path/Airtime_Activation/temp_a
find "$Src_Dir2" -maxdepth 1 \( -name "*.xlsx" -o -name "*.csv" \) | sed "s/.*\///" > path/Airtime_Activation/temp_b
echo 'phase1'
# Build temp_c: "filename_mtime" for every file inside Archive.
while IFS= read -r file; do
    echo 'phase1.5'
    echo "$file"
    echo 'phase2'
    mtime=$(stat -c '%y' "$Src_Dir1/$file")
    Full_name=${file}_${mtime}
    echo "$Full_name" >> path/Airtime_Activation/temp_c
    echo 'phase3'
done < path/Airtime_Activation/temp_a
# Build temp_d: "filename_mtime" for every file outside Archive.
while IFS= read -r file; do
    echo 'phase2'
    mtime=$(stat -c '%y' "$Src_Dir2/$file")
    Full_name=${file}_${mtime}
    echo "$Full_name" >> path/Airtime_Activation/temp_d
    echo 'phase3'
done < path/Airtime_Activation/temp_b
# File compare: entries present in both lists identify files to delete from outside Archive.
grep -Ff path/Airtime_Activation/temp_d path/Airtime_Activation/temp_c > path/Airtime_Activation/temp_e
while IFS= read -r file; do
    echo 'phase2'
    echo "${file%_*}"
    rm "$Src_Dir2/${file%_*}"    # strip the appended mtime to recover the filename
    echo 'phase3'
done < path/Airtime_Activation/temp_e
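For what it's worth, here is a more compact sketch of the same idea. It is an assumption-laden alternative, not the exact layout above: it assumes GNU find, keeps the name+mtime lists in process substitutions instead of temp files, and uses the epoch mtime from -printf '%T@' rather than the stat -c '%y' format, which is enough for the comparison:
#!/bin/bash
Src_Dir1=path/Airtime_Activation/Archive
Src_Dir2=path/Airtime_Activation

list_names_with_mtime() {
    # Print "filename_mtime-in-epoch-seconds", one entry per line, sorted for comm.
    find "$1" -maxdepth 1 \( -name '*.xlsx' -o -name '*.csv' \) -printf '%f_%T@\n' | sort
}

# Entries whose filename and mtime appear in both directories...
comm -12 <(list_names_with_mtime "$Src_Dir1") <(list_names_with_mtime "$Src_Dir2") |
while IFS= read -r entry; do
    # ...get deleted from Src_Dir2; strip the trailing "_mtime" to recover the filename.
    rm -- "$Src_Dir2/${entry%_*}"
done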

Related

bash iterate over a directory sorted by file size

As a webmaster, I generate a lot of junk files of code. Periodically I have to purge the unneeded files, filtered by extension. Example: "cleaner txt". Easy enough. But I want to sort the files by size before processing them in the "for" loop. How can I do that?
cleaner:
#!/bin/bash
if [ -z "$1" ]; then
echo "Please supply the filename suffixes to delete.";
exit;
fi;
filter=$1;
for FILE in *.$filter; do clear;
cat $FILE; printf '\n\n'; rm -i $FILE; done
You can use a mix of find (to print file sizes and names), sort (to sort the output of find) and cut (to remove the sizes). In case you have very unusual file names containing any possible character including newlines, it is safer to separate the files by a character that cannot be part of a name: NUL.
#!/bin/bash
if [ -z "$1" ]; then
echo "Please supply the filename suffixes to delete.";
exit;
fi;
filter=$1;
while IFS= read -r -d '' -u 3 FILE; do
clear
cat "$FILE"
printf '\n\n'
rm -i "$FILE"
done 3< <(find . -mindepth 1 -maxdepth 1 -type f -name "*.$filter" \
-printf '%s\t%p\0' | sort -zn | cut -zf 2-)
Note that we must use a different file descriptor than stdin (3 in this example) to pass the file names to the loop. Else, if we use stdin, it will also be used to provide the answers to rm -i.
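As a quick illustration of that stdin clash, a minimal sketch with two hypothetical files a.txt and b.txt:
# When the loop reads the names from stdin, rm -i also reads its y/n answer from
# that same stdin, so it consumes the next name ("b.txt") as the reply for "a.txt":
printf 'a.txt\0b.txt\0' | while IFS= read -r -d '' f; do rm -i "$f"; done
# Feeding the names through fd 3 leaves stdin (the terminal) free for rm -i:
while IFS= read -r -d '' -u 3 f; do rm -i "$f"; done 3< <(printf 'a.txt\0b.txt\0')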
Inspired by this answer, you could use the find command as follows:
find ./ -type f -name "*.yaml" -printf "%s %p\n" | sort -n
The find command prints the size and the path of each file so that the sort command orders the results from smallest to largest.
In case you want to iterate over (let's say) the 5 biggest files, you can do something like this using the tail command:
for f in $(find ./ -type f -name "*.yaml" -printf "%s %p\n" |
           sort -n |
           tail -n 5 |
           cut -d ' ' -f 2)
do
echo "### $f"
done
If the file names don't contain newlines or spaces:
while read filesize filename; do
printf "%-25s has size %10d\n" "$filename" "$filesize"
done < <(du -bs *."$filter"|sort -n)
while read filename; do
echo "$filename"
done < <(du -bs *."$filter"|sort -n|awk '{$0=$2}1')

find and grep / zgrep / lzgrep progress bar

I would like to add a progress bar to this command line:
find . \( -iname "*.bz" -o -iname "*.zip" -o -iname "*.gz" -o -iname "*.rar" \) -print0 | while read -d '' file; do echo "$file"; lzgrep -a stringtosearch\.anything "$file"; done
The progress bar should be calculated on the total size of the compressed files (not on each single file).
Of course, it can be a script too.
I would also like to add other progress bars, if possible:
The total number of files processed (example 3 out of 21)
The percentage of progress of the single file
Can anybody help me please?
Here is an example of what it should look like (example from here):
tar cf - /folder-with-big-files -P | pv -s $(du -sb /folder-with-big-files | awk '{print $1}') | gzip > big-files.tar.gz
Multiple progress bars (example from here):
pv -cN orig < foo.tar.bz2 | bzcat | pv -cN bzcat | gzip -9 | pv -cN gzip > foo.tar.gz
Thanks,
This is the first time I've ever heard of pv and it's not on any machine I have access to but assuming it needs to know a total at startup and then a number on each iteration of a command, you could do something like this to get a progress bar per file processed:
IFS= readarray -d '' files < <(find . -whatever -print0)
printf '%s\n' "${files[#]}" | pv -s "${#files[#]}" | command
The first line gives you an array of files so you can then use "${#files[@]}" to provide pv its initial total value (looks like you use -s value for that?) and then do whatever you normally do to get progress as each file is processed.
I don't see any way to tell pv that the pipe it's reading from is NUL-terminated rather than newline-terminated so if your files can have newlines in their names then you'd have to figure out how to solve that problem.
To additionally get progress on a single file you might need something like:
IFS= readarray -d '' files < <(find . -whatever -print0)
printf '%s\n' "${files[@]}" |
pv -s "${#files[@]}" |
xargs -n 1 -I {} sh -c 'pv {} | command'
I don't have pv so all of the above is untested so check the syntax, especially since I've never heard of pv :-).
Thanks to Max C., I found a solution for the main question:
find ./ -type f \( -iname '*.gz' -o -iname '*.bz' \) | (tot=0;while read fname; do s=$(stat -c%s "$fname"); if [ ! -z "$s" ] ; then echo "$fname"; tot=$(($tot+$s)); fi; done; echo $tot) | tac | (read size; xargs -i{} cat "{}" | pv -s $size | lzgrep -a something -)
But this works only for gz and bz files; now I have to extend it to use a different tool according to the extension.
I'm going to try Ed's solution too.
Thanks to Ed and Max C., here is version 0.2.
This version works with zgrep, but not with lzgrep. :-\
#!/bin/bash
echo -n "collecting dump... "
IFS= readarray -d '' files < <(find . \( -iname "*.bz" -o -iname "*.gz" \) -print0)
echo done
echo "Calculating archives size..."
tot=0
for line in "${files[#]}"; do
s=$(stat -c\%s "$line")
if [ ! -z "$s" ]
then
tot=$(($tot+$s))
fi
done
(for line in "${files[#]}"; do
s=$(stat -c\%s "$line")
if [ ! -z "$s" ]
then
echo "$line"
fi
done
) | xargs -i{} sh -c 'echo Processing file: "{}" 1>&2 ; cat "{}"' | pv -s $tot | zgrep -a anything -
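For the per-extension dispatch still to be done, here is a hedged sketch that reuses the files array collected above (assumptions: gzip, bzip2 and xz are installed, "anything" stays the placeholder pattern, and the single overall bar is traded for one pv bar per archive, measured on its compressed bytes):
decompress() {
    # Pick a decompressor by file extension; compressed data comes in on stdin.
    case "$1" in
        *.gz)        gzip  -dc ;;
        *.bz|*.bz2)  bzip2 -dc ;;
        *.lzma|*.xz) xz    -dc ;;
        *)           cat       ;;
    esac
}
for line in "${files[@]}"; do
    echo "Processing file: $line" >&2
    # pv shows progress over the compressed file, then the decompressed text is grepped.
    pv -- "$line" | decompress "$line" | grep -a anything -
done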

Is there a way to optimize this code and make it faster, or is there any other better solution?

I am looking to collect YANG models from my project's .jar files. I came up with an approach, but it takes time and my colleagues are not happy.
#!/bin/sh
set -e
# FIXME: make this tuneable
OUTPUT="yang models"
INPUT="."
JARS=`find $INPUT/system/org/linters -type f -name '*.jar' | sort -u`
# FIXME: also wipe output?
[ -d "$OUTPUT" ] || mkdir "$OUTPUT"
for jar in $JARS; do
artifact=`basename $jar | sed 's/.jar$//'`
echo "Extracting modules from $artifact"
# FIXME: better control over unzip errors
unzip -q "$jar" 'META-INF/yang/*' -d "$artifact" \
2>/dev/null || true
dir="$artifact/META-INF/yang"
if [ -d "$dir" ]; then
for file in `find $dir -type f -name '*.yang'`; do
module=`basename "$file"`
echo -e "\t$module"
# FIXME: better duplicate detection
mv -n "$file" "$OUTPUT"
done
fi
rm -rf "$artifact"
done
If the .jar files don't all change between invocations of your script then you could make the script significantly faster by caching the .jar files and only operating on the ones that changed, e.g.:
#!/bin/env bash
set -e
# FIXME: make this tuneable
output='yang models'
input='.'
cache='/some/where'
mkdir -p "$cache" || exit 1
readarray -d '' jars < <(find "$input/system/org/linters" -type f -name '*.jar' -print0 | sort -zu)
# FIXME: also wipe output?
mkdir -p "$output" || exit 1
for jarpath in "${jars[@]}"; do
# Skip jars that are identical to the cached copy; copy new or changed jars into the cache.
diff -q "$jarpath" "$cache" >/dev/null 2>&1 && continue
cp "$jarpath" "$cache"
jarfile="${jarpath##*/}"
artifact="${jarfile%.*}"
printf 'Extracting modules from %s\n' "$artifact"
# FIXME: better control over unzip errors
unzip -q "$jarpath" 'META-INF/yang/*' -d "$artifact" 2>/dev/null
dir="$artifact/META-INF/yang"
if [ -d "$dir" ]; then
readarray -d '' yangs < <(find "$dir" -type f -name '*.yang' -print0)
for yangpath in "${yangs[@]}"; do
yangfile="${yangpath##*/}"
printf '\t%s\n' "$yangfile"
# FIXME: better duplicate detection
mv -n "$yangpath" "$output"
done
fi
rm -rf "$artifact"
done
See Correct Bash and shell script variable capitalization, http://mywiki.wooledge.org/BashFAQ/082, https://mywiki.wooledge.org/Quotes, How can I store the "find" command results as an array in Bash for some of the other changes I made above.
I assume you have some reason for looping on the .yang files and not moving them if a file by the same name already exists rather than unzipping the .jar file into the final output directory.
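If that reason doesn't apply, here is a hedged sketch of that simpler route (same placeholder paths as above; unzip's -j junks the META-INF/yang/ prefix and -n never overwrites an existing file, which stands in for real duplicate detection):
output='yang models'
input='.'
mkdir -p "$output" || exit 1
find "$input/system/org/linters" -type f -name '*.jar' -print0 |
while IFS= read -r -d '' jar; do
    printf 'Extracting modules from %s\n' "${jar##*/}"
    # Extract the .yang files straight into the output directory.
    unzip -qjn "$jar" 'META-INF/yang/*.yang' -d "$output" 2>/dev/null || true
done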

Rename files to unique names and move them into a single destination directory

I have hundreds of directories that all contain a file named content.html along with other files.
I am trying to copy all these content.html files into one directory, but since they have the same name, they overwrite each other.
So how can I rename and move all of them into one directory?
E.g.:
./0BD3D9D2-F8B1-4472-95C2-13319650A45C:
card.png content.html note.xhtml quickLook.png snippet.txt
./0EA34DB4-CD56-42BE-91DA-F631E44FB6E0:
card.png content.html note.xhtml quickLook.png related snippet.txt
./1A33F29E-3938-4C2F-BA99-6B98FD045742:
card.png content.html note.xhtml quickLook.png snippet.txt
Commands I tried:
Step 1: rename content.html to content
find . -type f | grep content.html | while read f; do mv $f ${f/.html/}; done
Step 2: append a number to the filename "content" to make it unique
find . -type f | grep content | while read f; do i=1; echo mv $f $f$i.html; i=i+1; done
MacBook-Pro$ find . -type f | grep content | while read f; do i=1; echo mv $f $f$i.html; i=i+1; done
mv ./0BD3D9D2-F8B1-4472-95C2-13319650A45C/content ./0BD3D9D2-F8B1-4472-95C2-13319650A45C/content1.html
mv ./0EA34DB4-CD56-42BE-91DA-F631E44FB6E0/content ./0EA34DB4-CD56-42BE-91DA-F631E44FB6E0/content1.html
mv ./1A33F29E-3938-4C2F-BA99-6B98FD045742/content ./1A33F29E-3938-4C2F-BA99-6B98FD045742/content1.html
Once the above step is successful, I should be able to do this to achieve my desired output:
find . -type f | grep content | while read f; do mv $f ../; done
However, I am sure I can do this in a one-step command, and also my step 2 is not working (incrementing i).
Any idea why step 2 is not working?
bash script:
#!/bin/bash
find . -type f -name content.html | while IFS= read -r f; do
    name=$(basename "$f")
    ((++i))    # arithmetic increment; a plain i=i+1 would just assign the literal string "i+1"
    mv "$f" "for_content/${name%.*}$i.html"
done
Replace for_content with your destination folder name. (As for why your step 2 doesn't increment: i=1 inside the loop resets the counter on every iteration, and i=i+1 assigns the literal string "i+1"; bash only does arithmetic inside ((...)) or $((...)).)
Suppose that in your base directory you create a folder named final for storing the
content.html files; then do something like below:
find . -path ./final -prune -o -name "content.html" -print0 |
while read -r -d '' name
do
mv "$name" "./final/content$(mktemp -u XXXX).html"
# mktemp with the -u option just generates a random name (a dry run); it does not create a file
done
At the end you'll get all the content.html files under the ./final folder in the format contentXXXX.html, where XXXX are random characters.
Note: -path ./final -prune -o in find prevents it from descending into our results folder.
The inodes of the files should be unique, so you could use the following:
find "$(pwd)" -name "content.html" -printf "%f %i %p\n" | awk '{ system("mv "$3" <directorytomoveto>"$2$1) }'
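A loop-based sketch of the same inode idea that also copes with spaces in the paths (the destination is the same placeholder as above):
find . -name content.html -printf '%i %p\0' |
while IFS=' ' read -r -d '' inode path; do
    # The inode makes the new name unique; read splits off the inode, the rest is the path.
    mv "$path" "<directorytomoveto>/content_${inode}.html"
done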
I'd use something like this:
find . -type f -name 'test' | awk 'BEGIN{ cnt=0 }{ printf "mv %s ./output-dir/content_%03d.txt\n", $0, cnt++ }' | bash;
You can replace ./output-dir/ with your destination directory
Example:
[root@sl7-o2 test]# ls -R
.:
1 2 3 output-dir
./1:
test
./2:
test
./3:
test
./output-dir:
[root@sl7-o2 test]# find . -type f -name 'test' | awk 'BEGIN{ cnt=0 }{ printf "mv %s ./output-dir/content_%03d.txt\n", $0, cnt++ }' | bash;
[root@sl7-o2 test]# ls ./output-dir/
content_000.txt content_001.txt content_002.txt
You can use shopt -s globstar to grab all content.html files recursively and then use a loop to rename them:
#!/bin/bash
shopt -s globstar
counter=0
dest_dir=/path/to/destination
for f in **/content.html; do    # pick up all content.html files
    [[ -f "$f" ]] || continue   # skip if not a regular file
    mv "$f" "$dest_dir/content_$((++counter)).html"
done

iterate over lines in file then find in directory

I am having trouble looping and searching. It seems that the loop is not waiting for the find to finish. What am I doing wrong?
I made a loop that reads a file line by line. I then want to use that "name" to search a directory to see if a folder has that name. If it exists, copy it to a drive.
#!/bin/bash
DIRFIND="$2"
DIRCOPY="$3"
if [ -d $DIRFIND ]; then
while IFS='' read -r line || [[ -n "$line" ]]; do
echo "$line"
FILE=`find "$DIRFIND" -type d -name "$line"`
if [ -n "$FILE" ]; then
echo "Found $FILE"
cp -a "$FILE" "$DIRCOPY"
else
echo "$line not found."
fi
done < "$1"
else
echo "No such file or directory"
fi
Have you tried xargs...
Proposed Solution
cat filenamelist | xargs -n1 -I {} find . -type d -name {} -print | xargs -n1 -I {} mv {} .
What the above does is pipe a list of filenames into find (one at a time); when a match is found, find prints the name and passes it to xargs, which moves the file...
Expansion
file = yogo
yogo -> | xargs -n1 -I yogo find . -type d -name yogo -print | xargs -n1 -I {} mv ./<path>/yogo .
I hope the above helps. Note that xargs has the advantage that you do not run out of command-line buffer.
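A null-safe sketch of the same xargs idea, under a few assumptions (GNU find/xargs, a newline-separated name list in filenamelist, and the $DIRFIND/$DIRCOPY variables from the question's script):
# -I reads one whole line per name, so names with spaces survive; -print0/-0 keep
# the found directory paths intact all the way to cp.
xargs -I {} find "$DIRFIND" -type d -name {} -print0 < filenamelist |
xargs -0 -I {} cp -a {} "$DIRCOPY"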
