Simplest Bash code to find which files from a defined list don't exist in a directory?

This is what I came up with. It works perfectly -- I'm just curious if there's a smaller/crunchier way to do it (wondering if it's possible without a loop).
files='file1|file2|file3|file4|file5'
path='/my/path'
found=$(find "$path" -regextype posix-extended -type f -regex ".*\/($files)")
for file in $(echo "$files" | tr '|' ' ')
do
if [[ ! "$found" =~ "$file" ]]
then
echo "$file"
fi
done

You can do this without invoking any external tools:
IFS="|"
for file in $files
do
[ -f "$file" ] || printf "%s\n" "$file"
done
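If you'd rather not leave IFS changed for the rest of the script, the same idea works inside a subshell (a small variation on the answer above, assuming the $files and $path variables from the question):
( IFS='|'
  for file in $files; do
    [ -f "$path/$file" ] || printf '%s\n' "$file"
  done )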

Your code will break if you have file names with whitespace. This is how I would do it, which is a bit more concise.
echo "$files" | tr '|' '\n' | while read file; do
[ -e "$file" ] || echo "$file"
done
You can probably play around with xargs if you want to get rid of the loop altogether.
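For example, one loop-free sketch with xargs (same $files and $path as above; a rough illustration rather than a drop-in replacement):
echo "$files" | tr '|' '\n' |
  xargs -I{} sh -c '[ -e "$1/$2" ] || printf "%s\n" "$2"' _ "$path" {}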

$ eval "ls $path/{${files//|/,}} 2>&1 1>/dev/null | awk '{print \$4}' | tr -d :"
Or use awk to generate the tests and pipe them to bash:
$ echo -n $files | awk -v path=$path -v RS='|' '{printf("! [[ -e %s ]] && echo %s\n", path"/"$0, path"/"$0) | "bash"}'

Assuming no whitespace in the filenames:
files=(mbox todo watt zoff xorf)
for f in "${files[@]}"; do test -f "$f" || echo "$f"; done

Related

How to Parse the data from .property file from Jenkins using Index

I have a property file in Jenkins, let's call it Something.txt, and Something.txt contains
A_B_C
D_E_F
I have used the shell script below to read the file and execute my automation:
file="/var/lib/jenkins/components.txt"
if [ -f "$file" ]
then
echo "$file found."
Websites="$(awk -F '_' '{print $1}' $file | paste -d, -s)"
Profiles="$(awk -F '_' '{print $2}' $file | paste -d, -s)"
Component="$(awk -F '_' '{print $3}' $file | paste -d, -s)"
for i in $(echo $Websites | sed "s/,/ /g"); do
for j in $(echo $Profiles | sed "s/,/ /g"); do
for k in $(echo $Component| sed "s/,/ /g"); do
mvn clean verify -D "cucumber.options=--tags #"${j} -D surefire.suiteXmlFiles=./XMLScripts/${i}.${k}.testng.xml ||true
done
done
done
but what is happening is that my job runs as
A-B-C & A-B-F & D-B-C & B-E-F
while the expected result is A-B-C & D-E-F. How can I achieve this?
Don't read lines with for
#!/usr/bin/env bash
file="/var/lib/jenkins/components.txt"
if [[ -f "$file" ]]; then
  while IFS=_ read -r website profile component; do
    printf '%s %s %s\n' "$website" "$profile" "$component"
  done < "$file"
fi
In your case you can do
#!/usr/bin/env bash
file="/var/lib/jenkins/components.txt"
if [[ -f "$file" ]]; then
  while IFS=_ read -r website profile component; do
    echo mvn clean verify -D "cucumber.options=--tags #$website" -D "surefire.suiteXmlFiles=./XMLScripts/$profile.$component.testng.xml" || true
  done < "$file"
fi
Remove the echo if you're satisfied with the result.
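For the sample components.txt above (A_B_C and D_E_F), the echoed commands come out paired per line, roughly:
mvn clean verify -D cucumber.options=--tags #A -D surefire.suiteXmlFiles=./XMLScripts/B.C.testng.xml
mvn clean verify -D cucumber.options=--tags #D -D surefire.suiteXmlFiles=./XMLScripts/E.F.testng.xml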

Bash - Extract Matching String from GZIP Files Is Running Very Slow

I'm a complete novice in Bash. I'm trying to iterate through 1000 gzip files; maybe GNU parallel is the solution?
#!/bin/bash
ctr=0
echo "file_name,symbol,record_count" > $1
dir="/data/myfolder"
for f in "$dir"/*.gz; do
  gunzip -c $f | while read line;
  do
    str=`echo $line | cut -d"|" -f1`
    if [ "$str" == "H" ]; then
      if [ $ctr -gt 0 ]; then
        echo "$f,$sym,$ctr" >> $1
      fi
      ctr=0
      sym=`echo $line | cut -d"|" -f3`
      echo $sym
    else
      ctr=$((ctr+1))
    fi
  done
done
Any help to speed up the process would be greatly appreciated!
#!/bin/bash
ctr=0
export ctr
echo "file_name,symbol,record_count" > "$1"
dir="/data/myfolder"
export dir
doit() {
  f="$1"
  gunzip -c "$f" | while read line;
  do
    str=`echo $line | cut -d"|" -f1`
    if [ "$str" == "H" ]; then
      if [ $ctr -gt 0 ]; then
        echo "$f,$sym,$ctr"
      fi
      ctr=0
      sym=`echo $line | cut -d"|" -f3`
      echo $sym >&2
    else
      ctr=$((ctr+1))
    fi
  done
}
export -f doit
parallel doit ::: "$dir"/*.gz 2>&1 >> "$1"
The Bash while read loop is probably your main bottleneck here. Calling multiple external processes for simple field splitting will exacerbate the problem. Briefly,
while IFS="|" read -r first second third rest; do ...
leverages the shell's built-in field splitting functionality, but you probably want to convert the whole thing to a simple Awk script anyway.
echo "file_name,symbol,record_count" > "$1"
for f in "/data/myfolder"/*.gz; do
gunzip -c "$f" |
awk -F "\|" -v f="$f" -v OFS="," '
/H/ { if(ctr) print f, sym, ctr
ctr=0; sym=$3;
print sym >"/dev/stderr"
next }
{ ++ctr }'
done >>"$1"
This vaguely assumes that printing the lone sym is just for diagnostics. It should hopefully not be hard to see how this can be refactored if this is an incorrect assumption.
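If you do want to stay in the shell, a minimal sketch of the inner loop using that built-in field splitting (same variables as the original script, printing to stdout like the Awk version) could look like:
gunzip -c "$f" | while IFS='|' read -r first _ third _; do
  if [ "$first" = "H" ]; then
    [ "${ctr:-0}" -gt 0 ] && printf '%s,%s,%s\n' "$f" "$sym" "$ctr"
    ctr=0
    sym=$third
  else
    ctr=$((ctr+1))
  fi
done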

bash script loop to check if variable contains string - not working

I have a script which copies files from one S3 bucket to a local server, does some stuff, and uploads them to another S3 bucket.
In the original bucket I have a few folders, one of them called "OTHER".
I don't want my script to work on this folder.
I tried to define a loop to check that the path string does not contain the string "OTHER", and only then continue to the other commands, but for some reason it is not working.
What am I doing wrong?
#!/bin/bash
shopt -s extglob
gcs3='s3://gc-reporting-pud-production/splunk_printer_log_files/'
gcs3ls=$((aws s3 ls 's3://gc-reporting-pud-production/splunk_printer_log_files/' --recursive) | sed 's/^.*\(splunk_printer.*\)/\1/g'| tr -s ' ' | tr ' ' '_')
ssyss3=s3://ssyssplunk
tokenFile=/splunkData/GCLogs/tokenFile.txt
nextToken=$((aws s3api list-objects-v2 --bucket "gc-reporting-pud-production" --prefix splunk_printer_log_files/ --max-items 5) |grep -o 'NEXTTOKEN.*' |awk -F " " '{print $2}')
newToken=$( tail -n 1 /splunkData/GCLogs/tokenFile.txt )
waterMark=$(aws s3api list-objects-v2 --bucket "gc-reporting-pud-production" --prefix splunk_printer_log_files/ --max-items 5 --starting-token $newToken|sed 's/^.*\(splunk_printer.*zip\).*$/\1/'|sed '1d'|sed '$d')
while true; do
  for j in $waterMark ; do
    echo $j
    if [ "$j" != *"OTHER"* ]; then
      gcRegion=$(echo $j | awk -F'/' '{print $2}')
      echo "gcRegion:"$gcRegion
      if [ "$gcRegion" != "OTHER" ]; then
        gcTech=$(echo $j | awk -F'/' '{print $3}')
        echo "GCTech:"$gcTech
        gcPrinterFamily=$(echo $j | awk -F'/' '{print $4}')
        echo "gcPrinterFamily:" $gcPrinterFamily
        gcPrinterType=$(echo $j | awk -F'/' '{print $5}')
        echo "gcPrinterType:" $gcPrinterType
        gcPrinterName=$(echo $j| awk -F'/' '{print $6}')
        echo "gcPrinterName:" $gcPrinterName
        gcFileName=$(echo $j| awk -F'/' '{print $7}'| awk -F'.zip' '{print $1}')
        echo "gcFileName:" $gcFileName
        cd /splunkData/GCLogs
        dir="/splunkData/GCLogs/$gcRegion/$gcTech/$gcPrinterFamily/$gcPrinterType/$gcPrinterName"
        echo "dir:"$dir
        mkdir -p $dir
        aws s3 sync $gcs3$gcRegion/$gcTech/$gcPrinterFamily/$gcPrinterType/$gcPrinterName/ $dir
        find $dir -name '*.zip' -exec sh -c 'unzip -o -d "${0%.*}" "$0"' '{}' ';'
        aws s3 cp $dir $ssyss3/$gcRegion/$gcTech/$gcPrinterFamily/$gcPrinterType/$gcPrinterName/ --recursive --exclude "*.zip"
        newToken=$( tail -n 1 /splunkData/GCLogs/tokenFile.txt )
        nextToken=$(aws s3api list-objects-v2 --bucket "gc-reporting-pud-production" --prefix splunk_printer_log_files/ --max-items 5 --starting-token $newToken |grep -o 'NEXTTOKEN.*' |awk -F " " '{print $2}')
        waterMark=$(aws s3api list-objects-v2 --bucket "gc-reporting-pud-production" --prefix splunk_printer_log_files/ --max-items 5 --starting-token $newToken|sed 's/^.*\(splunk_printer.*zip\).*$/\1/'|sed '1d'|sed '$d')
        echo "$nextToken" > "$tokenFile"
      fi
    fi
  done
done
You need to use the double-bracket conditional command to turn == and != into pattern matching operators:
if [[ "$j" != *"OTHER"* ]]; then
# ^^ ^^
Or use case
case "$j" in
*OTHER*) ... ;;
*) echo "this is like an `else` block" ;;
esac
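Applied to the loop in the question, a sketch of the case form that simply skips the unwanted entries might look like:
for j in $waterMark ; do
  case "$j" in
    *OTHER*) continue ;;
  esac
  # ... the rest of the processing for $j ...
done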
Paste your code into https://www.shellcheck.net/ for other things to fix.
I think glenn jackman was on the right path. Try this:
if [[ "$j" != *OTHER* ]]; then
The [[ ]] is required for pattern string matching (and you have to remove the quotes around the pattern). The case statement is also a good idea. You can abandon the shell test altogether and use grep as follows:
if grep -q '.*OTHER.*' <<< "$j" 2>/dev/null
then
  ...
fi
Here's a check of the [[ ]]:
$ echo $j
abOTHERc
$ [[ "$j" == *OTHER* ]]
$ echo $?
0
As per BenjaminW., the quotes around $j in [[ ]] are unnecessary. However, the quotes around *OTHER* do make a big difference. See below:
$ j="OTHER THINGS"
$ [[ $j == "*OTHER*" ]] ; echo "$j" matches '"*OTHER*"': $?
OTHER THINGS matches "*OTHER*": 1
$ [[ $j == *OTHER* ]] ; echo "$j" matches '*OTHER*': $?
OTHER THINGS matches *OTHER*: 0

Trying to make a bash script that extracts html code from files

I'm making a script to extract html code from .html files in a directory which happen to have non-html code outside the html tags. I want the output to overwrite the source files.
Here is what I have so far, but I'm having trouble getting it to work.
#!/bin/bash
for f in `ls .`; do
  if [[ $f =~ \.html$ ]]
  then
    cat $f | tr "\n" "|" | grep -o '<html>.*</html>' | sed 's/|/\n/g' > $f
  fi
done
The redirection > $f truncates the file before cat has read it, so write to a temporary file and move it back over the original:
#!/bin/bash
for f in `ls .`; do
  if [[ $f =~ \.html$ ]]
  then
    cat $f | tr "\n" "|" | grep -o '<html>.*</html>' | sed 's/|/\n/g' > $f.temp
    mv $f.temp $f
  fi
done
You can replace the whole script with:
sed -i '/<[Hh][Tt][Mm][Ll]/,/<\/[Hh][Tt][Mm][Ll]/!d' *.html
Or if you don't need it to be case-insensitive:
sed -i '/<html/,/<\/html/!d' *.html

How to quote file name using awk?

I want the output 'filename1','filename2','filename3' ....
I'm using awk, but I have no idea how to print the closing quote after the filename.
It is printing ,'filename ===> I need ,'filename'
ls -ltr | grep -v ^d | sed '1d'| awk '{print "," sprintf("%c", 39) $9}'
Thanks in advance!
You can use the find command as:
find . -maxdepth 1 -type f -printf "'%f'," | sed s/,$//
If you have Ruby (1.9+):
ruby -e 'puts Dir["*"].select{|x|test(?f,x)}.join("\47,\47")'
Otherwise:
find . -maxdepth 1 -type f -printf '%f\n' | sed -e ':a N' -e "s#\n#','#" -e 'b a'
Use awk's printf function: http://www.gnu.org/manual/gawk/html_node/Basic-Printf.html
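For instance, a minimal sketch of that idea, reusing GNU find from the answer above instead of parsing ls (\047 is the octal escape for a single quote in awk):
find . -maxdepth 1 -type f -printf '%f\n' |
  awk '{ printf "%s\047%s\047", sep, $0; sep = "," } END { print "" }'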
Pure bash (probably posix sh, too):
comma=
for file in * ; do
  if [ ! -d "$file" ] ; then
    if [ -n "$comma" ] ; then
      printf ","
    fi
    comma=1
    printf "'%s'" "$file"
  fi
done
Files with ' in the name are not accounted for, but nobody else has been doing that either. Presuming that escaping with \ is correct, you could do:
comma=
for file in * ; do
  if [ ! -d "$file" ] ; then
    if [ -n "$comma" ] ; then
      printf ","
    fi
    comma=1
    printf "'%s'" "${file//\'/\'}"
  fi
done
But some CSV systems would require you to double the quote and write '' instead, which would be
printf "'%s'" "${file//\'/''}"
Let's pretend that you're processing some other data besides the output of ls.
$ printf "hello\ngoodbye\no'malley\n" | awk '{gsub("\047","\047\\\047\047",$1);printf "%s\047%s\047",comma,$1; comma=","}END{printf "\n"}'
'hello','goodbye','o'\''malley'
This variant works fine, but I think there should be a more elegant way to do it.
ls -1 $1 | cut -d'.' -f1 | awk '{printf "," sprintf("%c", 39) $1 sprintf("%c", 39) "\n" }'| sed '1 s/,*//'
