Creating a file takes a long time in bash

I have a bash script that builds one complete string record by substituting input values taken from different source files. I have to create 5 lakh (500,000) such records in a file within 5 minutes, on the go (each record needs to be written to the file as soon as it is created). However, the script is very slow: it only produces about 20,000 records in 5 minutes. Below is the script I used.
#!/bin/bash
sampleRecod="__TIME__-0400 INFO 639582 truefile?apikey=__API_KEY__json||__STATUS__|34|0||0|0|__MAINSIZE__|1|"
count=0
license_array=($(cat license.txt | xargs))
status_array=($(cat status.json | xargs))
error_array=($(cat 403.json | xargs))
finalRes=""
date +"%Y-%m-%dT%H:%M:%S.%3N"
while true; do
  time=$(date +'%Y-%m-%dT%T.%3N')
  line=${license_array[$(shuf -i 0-963 -n 1)]}
  status=${status_array[$(shuf -i 0-7 -n 1)]}
  responseMainPart=$(shuf -i 100-999 -n 1)
  if [ "$status" -eq 403 ] || [ "$status" -eq 0 ]; then
    responseMainPart=${error_array[$(shuf -i 0-3 -n 1)]}
  fi
  # placeholder names must match the template exactly (__API_KEY__, __MAINSIZE__)
  result=$(echo "$sampleRecod" | sed "s/__TIME__/$time/g")
  result=$(echo "$result" | sed "s/__API_KEY__/$line/g")
  result=$(echo "$result" | sed "s/__STATUS__/$status/g")
  result=$(echo "$result" | sed "s/__MAINSIZE__/$responseMainPart/g")
  finalRes+="${result}\n"
  count=$((count+1))
  if [ "$count" -eq 1000 ]; then
    #echo "got count"
    count=0
    printf '%b' "$finalRes" >> new_data_1.log
    finalRes=""
  fi
done
# only reached if the loop above is given an exit condition
printf '%b' "$finalRes" >> new_data_1.log
date +"%Y-%m-%dT%H:%M:%S.%3N"
Can anyone suggest how I can optimize this? The files I am reading the values from do not have many lines either.
I have tried replacing shuf with sed, but it did not help much.
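A sketch of a faster approach, assuming bash 5.0 or newer (for $EPOCHREALTIME). The loop above runs eight or nine command substitutions per record, and each one starts at least one new process (date, shuf, sed), which is where most of the time goes. This version uses only builtins: a printf format string instead of the placeholder template plus sed, $RANDOM instead of shuf, and printf's %(...)T instead of date. load_arrays and gen_records are hypothetical names; the input file names and the record layout are taken from the question.

```bash
#!/usr/bin/env bash
# Sketch: generate records with bash builtins only, so no date/shuf/sed
# process is started per record. Assumes bash >= 5.0 ($EPOCHREALTIME).

# load_arrays: read the three input files, split on whitespace like the
# original `cat file | xargs` (file names taken from the question).
load_arrays() {
  license_array=($(< license.txt))
  status_array=($(< status.json))
  error_array=($(< 403.json))
}

# gen_records N: print N records to stdout (hypothetical helper name).
gen_records() {
  local n=$1 i now usec ts key status size
  for (( i = 0; i < n; i++ )); do
    # $EPOCHREALTIME is "seconds.microseconds"; the separator follows the locale.
    now=$EPOCHREALTIME
    usec=${now#*[.,]}
    printf -v ts '%(%Y-%m-%dT%H:%M:%S)T' "${now%[.,]*}"
    ts+=".${usec:0:3}"                       # first three digits = milliseconds

    # Random array elements without shuf; the array sizes are not hard-coded.
    key=${license_array[RANDOM % ${#license_array[@]}]}
    status=${status_array[RANDOM % ${#status_array[@]}]}
    if [[ $status == 403 || $status == 0 ]]; then
      size=${error_array[RANDOM % ${#error_array[@]}]}
    else
      size=$(( RANDOM % 900 + 100 ))         # 100..999, like shuf -i 100-999
    fi

    # One printf format replaces the template and the four sed substitutions.
    printf '%s-0400 INFO 639582 truefile?apikey=%sjson||%s|34|0||0|0|%s|1|\n' \
      "$ts" "$key" "$status" "$size"
  done
}

# Example (commented out so the file can be sourced without side effects):
# load_arrays
# gen_records 500000 >> new_data_1.log
```

Redirecting the whole gen_records call to the log file should still write each record as soon as printf formats it, since bash flushes builtin output after each call. Note that RANDOM % n is slightly biased when n does not divide 32768, which is probably harmless for generated test data.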

Related

How to remove this repeated call to `ack`

I'm new to bash, and I am writing a shell script that goes through the dependencies listed in package.json (using jq), checks how many times each one is used (using ack), and, if it is used fewer than 2 times, echoes that.
arr=( $(jq -r '.dependencies | keys | .[]' package.json) )
for i in "${arr[@]}"
do
  n=$(ack "$i" --ignore-dir=dist --ignore-file='match:/checkDependencies.sh|package.json/' | wc -l)
  if [[ $n -le 2 ]]; then
    echo "Package $i has too few occurrences"
    ack "$i" --ignore-dir=dist --ignore-file='match:/checkDependencies.sh|package.json/'
    echo
  fi
done
As you can see, I run ack twice. How can I run it only once? I tried storing the output in a variable, but it doesn't work the way I want.
Output from running john1024's answer:
bash -x checkDependencies.sh
+ arr=($(jq -r '.devDependencies | keys | .[]' package.json))
++ jq -r '.devDependencies | keys | .[]' package.json
checkDependencies.sh: line 4: syntax error near unexpected token `s=$(ack "$i" --ignore-dir=dist --ignore-file='match:/package.json/')'
checkDependencies.sh: line 4: ` s=$(ack "$i" --ignore-dir=dist --ignore-file='match:/package.json/')'
The provided solution (now with the missing do added):
arr=( $(jq -r '.devDependencies | keys | .[]' package.json) )
for i in "${arr[@]}"
do
  s=$(ack "$i" --ignore-dir=dist --ignore-file='match:/package.json/')
  if [ "$(wc -l <<<"$s")" -le 2 ]; then
    echo "Package $i has too few occurrences"
    echo "$s"
  fi
done
To use ack only once, try:
arr=( $(jq -r '.dependencies | keys | .[]' package.json) )
for i in "${arr[@]}"
do
  s=$(ack "$i" --ignore-dir=dist --ignore-file='match:/checkDependencies.sh|package.json/')
  if [ "$(wc -l <<<"$s")" -le 2 ]; then
    echo "Package $i has too few occurrences"
    echo "$s"
  fi
done
Here, for each package, we store ack's output in variable s and then use s wherever the output of ack is needed.
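One caveat with wc -l <<<"$s": the here-string appends a newline, so an empty $s (no matches at all) is counted as 1 line. A sketch that still runs ack once per package but counts the lines in bash itself, with an explicit 0 for empty output. count_lines and check_deps are hypothetical names, and jq and ack are assumed to be installed, as in the question.

```bash
#!/usr/bin/env bash
# Sketch: run ack once per package, keep its output, and count lines in
# bash instead of with wc. count_lines and check_deps are hypothetical names.

# count_lines TEXT: number of lines in TEXT, or 0 if TEXT is empty.
# Assumes TEXT has no trailing newline, which is true for $(...) output.
count_lines() {
  local nl=${1//[!$'\n']/}        # keep only the newline characters
  if [[ -z $1 ]]; then
    echo 0
  else
    echo $(( ${#nl} + 1 ))
  fi
}

# check_deps PKG...: report packages found on at most 2 lines.
check_deps() {
  local pkg s
  for pkg in "$@"; do
    s=$(ack "$pkg" --ignore-dir=dist --ignore-file='match:/checkDependencies.sh|package.json/')
    if (( $(count_lines "$s") <= 2 )); then
      echo "Package $pkg has too few occurrences"
      printf '%s\n' "$s"
      echo
    fi
  done
}

# Example:
# mapfile -t deps < <(jq -r '.dependencies | keys[]' package.json)
# check_deps "${deps[@]}"
```

Reading the jq output with mapfile also avoids the word splitting and globbing of arr=( $(...) ), although package names rarely contain spaces.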

If condition for "not equal" is not working as expected in shell script

#!/bin/bash
a=2
b=2
COUNTER=0
sam="abcd"
sam1="xyz"
sam2="mno"
for x in ls | grep .rpm
do
`C=rpm -qpR $x | grep -v CompressedFileNames | grep -v PayloadFilesHavePrefix | wc -l`
if [ "sam2"!="$sam1" ]
then
echo "${sam1}"
echo "${sam2}"
if [ $C -eq $a ]
then
COUNTER=$((COUNTER+1))
echo "${x}"
eval sam=$x
#eval sam1=sam | cut -d '-' -f 1
sam1=`echo "${sam}"| cut -d '-' -f 1`
if [ $COUNTER -eq $b ]
then
break
fi
fi
fi
sam2=`echo "${x}"| cut -d '-' -f 1`
done
This is the output I am getting:
xyz
mno
comps-4ES-0.20050107.x86_64.rpm
comps
comps
comps-4ES-0.20050525.x86_64.rpm
My question is: why is the if condition returning true even though sam1 and sam2 are equal? I am testing for inequality.
The result is the same even if I use:
if [ $C -eq $a ] && [ "$sam2" != " $sam1" ]
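As Ansgar Wiechers pointed out, you're missing a "$" in front of the sam2 variable, so you're comparing the literal string "sam2" with the value of $sam1 (which is initially "xyz"). There are also no spaces around !=: [ "sam2"!="$sam1" ] passes the single word sam2!=xyz to [, and a test with one argument is true for any non-empty string, so the condition is always true. What you want is to compare the values of both variables, with != as a separate argument: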
if [ "$sam2" != "$sam1" ]
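A small self-contained demonstration of why the spaces around != matter. The values are made up to mirror the situation in the loop, where sam1 and sam2 both end up as "comps":

```bash
#!/usr/bin/env bash
# Demonstrates why [ needs spaces around != (values chosen to mirror the loop).
sam1="comps"
sam2="comps"

# One argument, "comps!=comps": [ STRING ] is true for any non-empty string.
if [ "$sam2"!="$sam1" ]; then no_spaces=true; else no_spaces=false; fi

# Three arguments: an actual string comparison.
if [ "$sam2" != "$sam1" ]; then with_spaces=true; else with_spaces=false; fi

echo "without spaces: $no_spaces"    # prints: without spaces: true
echo "with spaces: $with_spaces"     # prints: with spaces: false
```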
Regarding $C, you should put only the commands to be evaluated inside the backticks, not the assignment itself. This is called a command substitution: a subshell is created in which the commands are executed, and the backtick expression is replaced by the computed value. The line should look like this:
C=`rpm -qpR $x | grep -v CompressedFileNames | grep -v PayloadFilesHavePrefix | wc -l`
Your for loop also needs a command substitution: for x in ls | grep .rpm makes it look as if you're piping the output of a for command into grep. What you want to do is iterate over the ls | grep part, which you can do with the following command substitution:
for x in `ls | grep .rpm`
Hi guys, I got the solution:
#!/bin/bash
read -p "enter dep number" a
read -p "enter no of rpms" b
COUNTER=0
sam="abcd"
sam1="xyz"
sam2="mno"
for x in `ls | grep .rpm`
do
C=`rpm -qpR $x |grep -v CompressedFileNames | grep -v PayloadFilesHavePrefix | wc -l`
# echo "${C}:c"
if [ $C -eq $a ] && [ "$sam2" != "$sam1" ]
then
COUNTER=$((COUNTER+1))
# echo "${COUNTER}:counter"
# echo "${x}"
eval sam=$x
#eval sam1=sam | cut -d '-' -f 1
sam1=`echo "${sam}"| cut -d '-' -f 1`
if [ $COUNTER -eq $b ]
then
break
fi
fi
sam2=`echo "${x}"| cut -d '-' -f 1`
#echo "${sam2}"
#echo "${sam1}"
done
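The posted solution works, but every `echo | cut` and the `ls | grep .rpm` pipeline start extra processes, and grep .rpm is a regular expression that also matches names such as foo.rpm.bak. A sketch of the bash-only equivalents, assuming the .rpm files sit in the current directory; pkg_prefix is a hypothetical helper name, and the rpm dependency check is left out:

```bash
#!/usr/bin/env bash
# Sketch: replace `echo "$x" | cut -d '-' -f 1` with parameter expansion
# and `ls | grep .rpm` with a glob. pkg_prefix is a hypothetical name.

# pkg_prefix NAME: everything before the first '-' (all of NAME if it has
# no '-'), which is what `cut -d '-' -f 1` prints for a single line.
pkg_prefix() {
  printf '%s\n' "${1%%-*}"
}

# The glob only matches names ending in .rpm; nullglob makes the loop run
# zero times instead of once with the literal "*.rpm" when nothing matches.
shopt -s nullglob
for x in *.rpm; do
  sam2=${x%%-*}    # no subshell, unlike sam2=`echo "${x}" | cut -d '-' -f 1`
  echo "$x -> $sam2"
done
```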

Creating a shell script with diff function to compare multiple files

I have five files, each in a different directory. I want to find out which files match each other and which ones are unique.
I am not sure how to handle this.
You can look at the output of
cksum "path1/file1" "path2/f2" "p3/f3" "p4/f4" "p5/f5" | sort
You can also write a script that loops through the files:
files=("path1/file1" "path2/f2" "p3/f3" "p4/f4" "p5/f5")
# compare every pair (i, j) with i < j exactly once
for ((i = 0; i < ${#files[@]}; i++)); do
  for ((j = i + 1; j < ${#files[@]}; j++)); do
    if diff -q "${files[i]}" "${files[j]}" >/dev/null; then
      echo "${files[i]} and ${files[j]} are the same."
    else
      echo "${files[i]} and ${files[j]} are different."
    fi
  done
done
You can use cksum or md5sum to detect identical files:
# exclude tmp.txt itself, since the shell creates it before find runs
find . -type f ! -name tmp.txt -exec md5sum {} + > tmp.txt
cut -d" " -f1 tmp.txt | while read -r c
do
  n=$(grep -c "^$c" tmp.txt)
  if [ "$n" != "1" ]; then
    grep "^$c" tmp.txt
  fi
done | sort -u
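Both answers above either compare the files pairwise or re-scan tmp.txt once per checksum. A one-pass sketch that groups files by checksum in a bash associative array (bash 4 or newer assumed); classify_files is a hypothetical name. cksum's CRC plus the file size is only a fast fingerprint, so confirm reported duplicates with diff or cmp if a false match would matter.

```bash
#!/usr/bin/env bash
# Sketch: group files by checksum in one pass (bash 4+ for associative
# arrays). classify_files is a hypothetical name.
classify_files() {
  local -A by_sum=()          # "CRC SIZE" -> newline-separated file names
  local -a group
  local f crc size key
  for f in "$@"; do
    # Reading from stdin makes cksum print just "CRC SIZE".
    read -r crc size < <(cksum < "$f")
    by_sum["$crc $size"]+="$f"$'\n'
  done
  for key in "${!by_sum[@]}"; do
    # Split the stored list back into an array (drop the final newline first).
    mapfile -t group <<< "${by_sum[$key]%$'\n'}"
    if (( ${#group[@]} > 1 )); then
      printf 'duplicate: %s\n' "${group[@]}"
    else
      printf 'unique: %s\n' "${group[0]}"
    fi
  done
}

# Example: classify_files "path1/file1" "path2/f2" "p3/f3" "p4/f4" "p5/f5"
```

The order of the groups in the output is unspecified, because bash does not keep associative array keys in insertion order.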

How to speed up checking whether files exist in bash

I'm new to Bash and wrote a script to check my photo files, but it is very slow and produces a few empty results when checking 17,000+ photos. Is there any way to use all 4 CPUs to run this script and so speed it up?
Please help.
#!/bin/bash
readarray -t array < ~/Scripts/ourphotos.txt
totalfiles="${#array[@]}"
echo $totalfiles
i=0
ii=0
check1=""
while :
do
check=${array[$i]}
if [[ ! -r $( echo $check ) ]] ; then
if [ $check = $check1 ]; then
echo "empty "$check
else
unset array[$i]
ii=$((ii + 1 ))
fi
fi
if [ $totalfiles = $i ]; then
break
fi
i=$(( i + 1 ))
done
if [ $ii -gt "1" ]; then
notify-send -u critical $ii" files have been deleted or are unreadable"
fi
It's a filesystem operation so multiple cores will hardly help.
A simplification might help:
while IFS= read -r file; do
  i=$((i+1)); [ -e "$file" ] || ii=$((ii+1))
done < "$HOME/Scripts/ourphotos.txt"
#...
Two points:
you don't need to keep the whole file in memory (no arrays are needed);
$( echo $check ) forks a process. You generally want to avoid forking and execing in loops.
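Following those two points, a sketch of the whole check as a function that streams the list and uses only builtins, so nothing is forked per line. count_missing is a hypothetical name, and it tests existence with -e, as the simplified loop does, rather than readability:

```bash
#!/usr/bin/env bash
# Sketch: count listed paths that no longer exist, using only builtins.
# count_missing is a hypothetical name; pass the list file as $1.
count_missing() {
  local file missing=0
  # IFS= and -r keep leading spaces and backslashes in file names;
  # the || test also handles a last line with no trailing newline.
  while IFS= read -r file || [[ -n $file ]]; do
    [[ -z $file ]] && continue              # skip blank lines
    [[ -e $file ]] || missing=$((missing + 1))
  done < "$1"
  echo "$missing"
}

# Example:
# ii=$(count_missing ~/Scripts/ourphotos.txt)
# (( ii > 1 )) && notify-send -u critical "$ii files have been deleted or are unreadable"
```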
This is an old question, but a common problem lacking an evidence-based solution.
awk '{print "[ -e "$1" ] && echo "$2}' | parallel # 400 files/s
awk '{print "[ -e "$1" ] && echo "$2}' | bash # 6000 files/s
while read file; do [ -e $file ] && echo $file; done # 12000 files/s
xargs find # 200000 files/s
parallel --xargs find # 250000 files/s
xargs -P2 find # 400000 files/s
xargs -P96 find # 800000 files/s
I tried this on a few different systems and the results were not consistent, but xargs -P (parallel execution) was consistently the fastest. I was surprised that xargs -P was faster than GNU parallel (not reported above, but sometimes much faster), and I was surprised that parallel execution helped so much; I had expected file I/O to be the limiting factor, so that parallel execution wouldn't matter much.
Also noteworthy is that xargs find is about 20x faster than the accepted solution, and much more concise. For example, here is a rewrite of the OP's script:
#!/bin/bash
list=~/Scripts/ourphotos.txt
total=$(wc -l < "$list")
# tr '\n' '\0' | xargs -0 handles spaces and other funny characters in filenames;
# find prints each path that exists and reports missing ones on stderr
found=$(tr '\n' '\0' < "$list" | xargs -0 -P4 find 2>/dev/null | wc -l)
if [ "$total" -ne "$found" ]; then
  ii=$((total - found))
  notify-send -u critical "$ii files have been deleted or are unreadable"
fi
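The rewrite above only counts the missing files. A sketch that prints which paths are gone, still fanning the work out through xargs -P; list_missing is a hypothetical name, and -n 1000 -P 4 are illustrative values. The -n matters here: without it, xargs may pack every path into a single sh invocation, leaving nothing to run in parallel.

```bash
#!/usr/bin/env bash
# Sketch: print each listed path that no longer exists, checking batches
# of paths in parallel. list_missing is a hypothetical name.
list_missing() {
  # tr turns newlines into NULs so xargs -0 keeps spaces in names intact;
  # each sh gets up to 1000 paths (-n), and up to 4 run at once (-P).
  tr '\n' '\0' < "$1" |
    xargs -0 -n 1000 -P 4 sh -c '
      for f; do
        [ -e "$f" ] || printf "%s\n" "$f"
      done' sh
}

# Example: list_missing ~/Scripts/ourphotos.txt
```

With -P the batches finish in no particular order, so pipe the output through sort if a stable order is needed.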

Converting time formats in zsh

I have created a Z shell (zsh) script that takes a time option argument (e.g. --time 00:03:30). I would like the user to be able to enter the time in a format other than HH:MM:SS. My script can already convert HH:MM:SS into seconds (which is what the end result needs to be), but the HH:MM:SS format is clunky. So I wrote a function that converts another format (#h#m#s) into the original one:
if [ "$(echo "$1" | grep -E "([[:digit:]]+[hms]|[[:digit:]]+[hms][[:digit:]]+[hms]|[[:digit:]]+[hms][[:digit:]]+[hms][[:digit:]]+[hms])")" ]; then
if [ "$(echo "$1" | grep "h")" ]; then
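# (.*[^[:digit:]])? keeps the greedy .* from swallowing all but the last digit (12h -> 12, not 2)
H="$(echo "$1" | sed -E 's|^(.*[^[:digit:]])?([[:digit:]]+)h.*|\2|')"
else
H=00
fi
if [ "$(echo "$1" | grep "m")" ]; then
M="$(echo "$1" | sed -E 's|^(.*[^[:digit:]])?([[:digit:]]+)m.*|\2|')"
else
M=00
fi
if [ "$(echo "$1" | grep "s")" ]; then
S="$(echo "$1" | sed -E 's|^(.*[^[:digit:]])?([[:digit:]]+)s.*|\2|')"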
else
S=00
fi
echo "$H:$M:$S" | sed -Ee 's|:([[:digit:]]):|:0\1:|' -e 's|^([[:digit:]]):|0\1:|' -e 's|:([[:digit:]])$|:0\1|'
fi
Yes, I have created my own solution, but I came here to find a better one. Also, if you know of any other formats that can be converted to seconds, please let me know.
Is this what you want?
% zmodload zsh/datetime
% echo $(( $(strftime -r '%H:%M:%S' 01:01:12) - $(strftime -r '%H:%M:%S' 0:0:0) ))
3672
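As an alternative to converting #h#m#s into HH:MM:SS first, a minimal sketch that goes straight to seconds. to_seconds is a hypothetical name; it sticks to syntax that zsh and bash share (local, ${var%%pattern}, ${var:offset:length}), on the assumption that your zsh version supports all of these. It accepts any combination such as 1h30m, 3m30s or 45s, rejects input like 1h30 or abc, and does not enforce the h, m, s order.

```bash
#!/usr/bin/env bash
# Sketch: convert "#h#m#s" (e.g. 1h30m, 3m30s, 45s) straight to seconds.
# to_seconds is a hypothetical name; syntax chosen to work in zsh and bash.
to_seconds() {
  local s=$1 total=0 num rest unit
  [ -n "$s" ] || return 1
  while [ -n "$s" ]; do
    num=${s%%[hms]*}           # digits before the next unit letter
    rest=${s#"$num"}           # the unit letter and whatever follows it
    unit=${rest:0:1}
    s=${rest:1}
    case $num in
      ''|*[!0-9]*) return 1 ;;    # missing or non-numeric amount
    esac
    num=${num#"${num%%[!0]*}"}    # strip leading zeros so 08 is not read as octal
    num=${num:-0}
    case $unit in
      h) total=$(( total + num * 3600 )) ;;
      m) total=$(( total + num * 60 )) ;;
      s) total=$(( total + num )) ;;
      *) return 1 ;;              # trailing number without a unit, e.g. 1h30
    esac
  done
  echo "$total"
}

# Example: to_seconds 1h30m    # prints 5400
```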
