bash gnu parallel help

It's about GNU Parallel:
http://en.wikipedia.org/wiki/Parallel_(software)
which has a very rich man page: http://www.gnu.org/software/parallel/man.html
(for x in `cat list` ; do
  do_something $x
done) | process_output
is replaced by this:
cat list | parallel do_something | process_output
I am trying to apply that to this loop:
while [ "$n" -gt 0 ]
do
  percentage=$(echo "scale=2;(100-(($n / $end) * 100))" | bc -l)
  # get url from line specified by n from file done1
  nextUrls=`sed -n "${n}p" < done1`
  echo -ne "${percentage}% $n / $end urls saved going to line 1. current: $nextUrls\r"
  # function that gets links from the url
  getlinks $nextUrls
  # save n
  echo $n > currentLine
  let "n--"
  let "end=`cat done1 | wc -l`"
done
While reading the GNU Parallel documentation I found out that functions are not supported, so getlinks can't be run by parallel directly.
The best I have found so far is:
seq 30 | parallel -n 4 --colsep ' ' echo {1} {2} {3} {4}
which produces this output:
1 2 3 4
5 6 7 8
9 10 11 12
13 14 15 16
17 18 19 20
21 22 23 24
25 26 27 28
29 30
The while loop mentioned above should go something like this, if I am right:
end=`cat done1 | wc -l`
seq $end -1 1 | parallel -j+4 -k
# (everything except the getlinks function goes here, but I don't know how?) |
# every time it finishes, do
getlinks $nextUrls
Thanks in advance for any help.

It seems what you want is a progress meter. Try:
cat done1 | parallel --eta wget
If that is not what you want, look at sem (sem is an alias for parallel --semaphore and is normally installed with GNU Parallel):
for i in `ls *.log` ; do
  echo $i
  sem -j+0 gzip $i ";" echo done
done
sem --wait
In your case it will be something like:
while [ "$n" -gt 0 ]
do
  percentage=$(echo "scale=2;(100-(($n / $end) * 100))" | bc -l)
  # get url from line specified by n from file done1
  nextUrls=`sed -n "${n}p" < done1`
  echo -ne "${percentage}% $n / $end urls saved going to line 1. current: $nextUrls\r"
  # function that gets links from the url
  THE_URL=`getlinks $nextUrls`
  sem -j10 wget $THE_URL
  # save n
  echo $n > currentLine
  let "n--"
  let "end=`cat done1 | wc -l`"
done
sem --wait
echo All done

Why does getlinks need to be a function? Take the function and turn it into a shell script; it should be essentially identical, except that you need to pass in any environment variables it relies on, and of course it cannot modify the calling shell's environment without a lot of extra work.
Of course, you cannot save $n into currentLine while executing in parallel: all the jobs would be overwriting the file at the same time.
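As a rough illustration of that conversion (not code from the question; the filename getlinks.sh and the use of a temporary directory are assumptions), the getlinks function shown further down could become a standalone script that parallel or sem can exec:
#!/bin/bash
# getlinks.sh - hypothetical standalone version of the getlinks function,
# so it can be called by parallel/sem as an external command.
url=$1
[ -n "$url" ] || exit 0
tmp=$(mktemp -d) || exit 1                     # per-job scratch dir, so jobs don't clobber each other's files
lynx -image_links -dump "$url" > "$tmp/src"
grep -i ".jpg" < "$tmp/src" | grep -i "http" | sed -e 's/.*\(http\)/http/g'
rm -rf "$tmp"
The key change is that the script only prints the URLs it finds; the caller collects the output of all jobs and merges it into done1 once, instead of every job rewriting done1 at the same time.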

I was thinking of making something more like this, with parallel or sem or something else, because parallel does not support functions, see http://www.gnu.org/software/parallel/man.html#aliases_and_functions_do_not_work
getlinks(){
  if [ -n "$1" ]
  then
    lynx -image_links -dump "$1" > src
    grep -i ".jpg" < src > links1
    grep -i "http" < links1 > links
    sed -e 's/.*\(http\)/http/g' < links >> done1
    sort -f done1 > done2
    uniq done2 > done1
    rm -rf links1 links src done2
  fi
}
func(){
  percentage=$(echo "scale=2;(100-(($1 / $end) * 100))" | bc -l)
  # get url from line specified by $1 from file done1
  nextUrls=`sed -n "${1}p" < done1`
  echo -ne "${percentage}% $1 / $end urls saved going to line 1. current: $nextUrls\r"
  # function that gets links from the url
  getlinks $nextUrls
  # save current line number (the counter itself is decremented by the caller)
  echo $1 > currentLine
  let "end=`cat done1 | wc -l`"
}
while [ "$n" -gt 0 ]
do
  sem -j10 func $n
  let "n--"
done
sem --wait
echo All done
My script has become really complex, and I do not want to break an existing feature with something I am not sure can even be done.
This way I could fetch links using the full internet bandwidth, which should take less time.

I tried sem:
#!/bin/bash
func (){
  echo 1
  echo 2
}
for i in `seq 10`
do
  sem -j10 func
done
sem --wait
echo All done
and you get these errors:
Can't exec "func": No such file or directory at /usr/share/perl/5.10/IPC/Open3.pm line 168.
open3: exec of func failed at /usr/local/bin/sem line 3168
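A workaround consistent with the earlier answer is to move the function body into its own script file, since sem can only exec real commands. A minimal sketch, with func.sh as a made-up name:
#!/bin/bash
# func.sh - the former shell function as an external script, so sem has a real executable to exec
echo 1
echo 2
After chmod +x func.sh, the loop body becomes sem -j10 ./func.sh and the "Can't exec" error should go away.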

It is not quite clear what the end goal of your script is. If you are trying to write a parallel web crawler, you might be able to use the script below as a template.
#!/bin/bash
# E.g. http://gatt.org.yeslab.org/
URL=$1
# Stay inside the start dir
BASEURL=$(echo $URL | perl -pe 's:#.*::; s:(//.*/)[^/]*:$1:')
URLLIST=$(mktemp urllist.XXXX)
URLLIST2=$(mktemp urllist.XXXX)
SEEN=$(mktemp seen.XXXX)
# Spider to get the URLs
echo $URL >$URLLIST
cp $URLLIST $SEEN
while [ -s $URLLIST ] ; do
  cat $URLLIST |
    parallel lynx -listonly -image_links -dump {} \; wget -qm -l1 -Q1 {} \; echo Spidered: {} \>\&2 |
    perl -ne 's/#.*//; s/\s+\d+.\s(\S+)$/$1/ and do { $seen{$1}++ or print }' |
    grep -F $BASEURL |
    grep -v -x -F -f $SEEN | tee -a $SEEN > $URLLIST2
  mv $URLLIST2 $URLLIST
done
rm -f $URLLIST $URLLIST2 $SEEN
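Assuming the script above is saved as crawler.sh (the filename is a guess), it takes the start URL from the comment at the top as its single argument:
chmod +x crawler.sh
./crawler.sh http://gatt.org.yeslab.org/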

Related

Bash script that compares 2 csv files, old and new: how to check that the new file's line count is at least x% of the old file's?

As of now, the script counts the number of lines in the two files.
Then I check whether the new count is greater than the old one.
However, I am not sure how to compare it as a percentage of the old file.
Is there a better way to design the script?
#!/bin/bash
declare -i new=$(< "$(ls -t file name*.csv | head -n 1)" wc -l)
declare -i old=$(< "$(ls -t file name*.csv | head -n 2 | tail -n 1)" wc -l)
echo $new
echo $old
if [ $new -gt $old ];
then
    echo "okay";
else
    echo "fail";
fi
If you need to check for at most x% changed lines, you can count the number of '<' lines in the diff output. Recall that the diff output will look like:
+ diff node001.html node002.html
2,3c2,3
< 4
< 7
---
> 2
> 3
So that code will look like:
old=$(wc -l < file1)
diff1=$(diff file1 file2 | grep -c '^<')
pct=$((diff1*100/(old-1)))
# Check Percent
if [ "$pct" -gt 60 ] ; then
...
fi
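Tying this back to the two-CSV setup from the question, a minimal sketch might look like the following; the *.csv glob and the 60% threshold are placeholders, not values taken from the question:
#!/bin/bash
# Compare the two newest CSV files: the newest is "new", the second newest is "old".
new_file=$(ls -t *.csv | head -n 1)
old_file=$(ls -t *.csv | head -n 2 | tail -n 1)
new=$(wc -l < "$new_file")
old=$(wc -l < "$old_file")
# New file's line count as a percentage of the old file's line count.
pct=$(( new * 100 / old ))
if [ "$pct" -ge 60 ]; then
    echo "okay: $new_file has $pct% of the lines in $old_file"
else
    echo "fail: $new_file has only $pct% of the lines in $old_file"
fi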

Grep command in array

For a homework assignment I have to take the results from the grep command and write out up to the first 5 of them, numbering them from 1 to 5 (print the number, then a space, then the line from grep). If there are no lines, print a message saying so. So far I have managed to store the grep results in an array, but this is where I've gotten stuck. Can anyone provide guidance on how to print the results as stated above?
pattern="*.c"
fileList=$(grep -l "main" $pattern)
IFS=$"\n"
declare -a array
array=$fileList
for x in "${array[@]}"; do
  echo "$x"
done
You can use the grep options -c and -l:
pattern="*.c"
searchPattern="main"
counter=1
while read -r line ; do
  IFS=':' read -r -a lineInfo <<< "$line"
  if [[ $counter > 5 ]]; then
    exit 1
  fi
  if [[ ${lineInfo[1]} > 0 ]]; then
    numsOfLine=""
    while read -r fileline ; do
      IFS=':' read -r -a fileLineInfo <<< "$fileline"
      numsOfLine="$numsOfLine ${fileLineInfo[0]} "
    done < <(grep -n $searchPattern ${lineInfo[0]})
    echo "$counter ${lineInfo[0]} match on lines: $numsOfLine"
    let "counter += 1"
  else
    echo "${lineInfo[0]} no match lines"
  fi
done < <(grep -c $searchPattern $pattern)
If you're only allowed to use grep and bash(?):
pattern="*.c"
fileList=($(grep -l "main" $pattern))
if test ${#fileList[@]} = 0 ; then
  echo "No results"
else
  n=0
  while test $n -lt ${#fileList[@]} -a $n -lt 5 ; do
    i=$n
    n=$(( n + 1 ))
    echo "$n ${fileList[$i]}"
  done
fi
If you are allowed to use commands in addition to grep, you can pipe the results through nl to add line numbers, then head to limit the results to the first 5 lines, then a second grep to test if there were any lines. For example:
if ! grep -l "main" $pattern | \
     nl -s ' ' | sed -e 's/^ *//' | \
     head -n 5 | grep '' ; then
  echo "No results"
fi

how to speed up checking if file exists in bash

I'm new to Bash and wrote a script to check my photo files, but it is very slow and returns a few empty results when checking 17000+ photos. Is there any way to use all 4 CPUs to run this script and speed it up?
Please help.
#!/bin/bash
readarray -t array < ~/Scripts/ourphotos.txt
totalfiles="${#array[@]}"
echo $totalfiles
i=0
ii=0
check1=""
while :
do
  check=${array[$i]}
  if [[ ! -r $( echo $check ) ]] ; then
    if [ $check = $check1 ]; then
      echo "empty "$check
    else
      unset array[$i]
      ii=$((ii + 1 ))
    fi
  fi
  if [ $totalfiles = $i ]; then
    break
  fi
  i=$(( i + 1 ))
done
if [ $ii -gt "1" ]; then
  notify-send -u critical $ii" files have been deleted or are unreadable"
fi
It's a filesystem operation so multiple cores will hardly help.
Simplification might help:
while read file; do
  i=$((i+1)); [ -e "$file" ] || ii=$((ii+1));
done < "$HOME/Scripts/ourphotos.txt"
#...
Two points:
you don't need to keep the whole file in memory (no arrays needed)
$( echo $check ) forks a process. You generally want to avoid forking and execing in loops.
This is an old question, but a common problem lacking an evidence-based solution.
awk '{print "[ -e "$1" ] && echo "$2}' | parallel # 400 files/s
awk '{print "[ -e "$1" ] && echo "$2}' | bash # 6000 files/s
while read file; do [ -e $file ] && echo $file; done # 12000 files/s
xargs find # 200000 files/s
parallel --xargs find # 250000 files/s
xargs -P2 find # 400000 files/s
xargs -P96 find # 800000 files/s
I tried this on a few different systems and the results were not consistent, but xargs -P (parallel execution) was consistently the fastest. I was surprised that xargs -P was faster than GNU parallel (not reported above, but sometimes much faster), and I was surprised that parallel execution helped so much — I thought that file I/O would be the limiting factor and parallel execution wouldn't matter much.
Also noteworthy is that xargs find is about 20x faster than the accepted solution, and much more concise. For example, here is a rewrite of OP's script:
#!/bin/bash
total=$(wc -l ~/Scripts/ourphotos.txt | awk '{print $1}')
# tr '\n' '\0' | xargs -0 handles spaces and other funny characters in filenames
found=$(cat ~/Scripts/ourphotos.txt | tr '\n' '\0' | xargs -0 -P4 find | wc -l)
if [ $total -ne $found ]; then
ii=$(expr $total - $found)
notify-send -u critical $ii" files have been deleted or are unreadable"
fi

Bash - How to count C source file functions calls

I want to find, for each function defined in a C source file, how many times it is called and on which lines.
Should I search for patterns that look like function definitions in C and then count how many times each function name occurs? If so, how can I do it? Regular expressions?
Any help will be highly appreciated!
#!/bin/bash
if [ -r $1 ]; then
  #??????
else
  echo The file \"$1\" does NOT exist
fi
The final result is: (please report any bugs)
if [ -r $1 ]; then
  functs=`grep -n -e "\(void\|double\|char\|int\) \w*(.*)" $1 | sed 's/^.*\(void\|double\|int\) \(\w*\)(.*$/\2/g'`
  for f in $functs; do
    echo -n $f\(\) is called:
    grep -n $f $1 > temp.txt
    echo -n `grep -c -v -e "\(void\|double\|int\) $f(.*)" -e"//" temp.txt`
    echo " times"
    echo -n on lines:
    echo -n `grep -v -e "\(void\|double\|int\) $f(.*)" -e"//" temp.txt | sed -n 's/^\([0-9]*\)[:].*/\1/p'`
    echo
    echo
  done
else
  echo The file \"$1\" does not exist
fi
This might sort of work. The first bit finds function definitions like
<datatype> <name>(<stuff>)
and pulls out the <name>. Then grep for that string. There are loads of situations where this won't work, but it might be a good place to start if you're trying to make a simple shell script that works on some programs.
functions=`grep -e "\(void\|double\|int\) \w*(.*)$" input.c | sed 's/^.*\(void\|double\|int\) \(\w*\)(.*$/\2/g'`
for func in $functions
do
  echo "Counting references for $func:"
  grep "$func" input.c | wc -l
done
You can try this regex:
(^|[^\w\d])?(functionName(\s)*\()
For example, to search for all printf occurrences:
(^|[^\w\d])?(printf(\s)*\()
To use this expression with grep you have to use the -E option, like this:
grep -E "(^|[^\w\d])?(printf(\s)*\()" the_file.txt
Final note: what this solution is missing is skipping occurrences inside comment blocks.
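Putting the two answers together, a rough sketch of a counting script (the type list is deliberately short, comments are still not filtered out, and all names here are illustrative):
#!/bin/bash
# count_calls.sh - for each function defined in $1, count occurrences and list line numbers
src=$1
if [ ! -r "$src" ]; then
    echo "The file \"$src\" does NOT exist"
    exit 1
fi
# Crude definition detection: "<type> <name>(" pulled out of the source
funcs=$(grep -oE '(void|int|double|char)[[:space:]]+[A-Za-z_][A-Za-z0-9_]*[[:space:]]*\(' "$src" \
        | sed -E 's/.*[[:space:]]([A-Za-z_][A-Za-z0-9_]*)[[:space:]]*\(/\1/' \
        | sort -u)
for f in $funcs; do
    # Matches the definition as well as the calls, so the count includes the definition line
    hits=$(grep -nE "(^|[^[:alnum:]_])$f[[:space:]]*\(" "$src")
    count=$(printf '%s\n' "$hits" | grep -c .)
    lines=$(printf '%s\n' "$hits" | cut -d: -f1 | tr '\n' ' ')
    echo "$f() occurs $count times (definition included) on lines: $lines"
done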

Error Handling on bash script

I have an infinite loop in a bash script that I want to run forever, but (I guess) something goes wrong and the script gets killed. Is there anything like try-catch so it just keeps running forever, unconditionally?
#!/bin/bash
iteration=0
for (( ; ; ))
do
  process_id=`ps -ef | grep java | grep TEST | awk '{print $2}'`
  kill_command='kill -3 '$process_id
  time=`date | awk '{print substr($4,0,5)}'`
  last_write=`ls -l /files/*.txt | awk '{print $8}'`
  if [ "$time" != "$last_write" ]
  then
    $kill_command
    sleep 1
    $kill_command
    sleep 1
    $kill_command
    sleep 1
    /test/show_queue.sh
  fi
  let "iteration+=1"
  if [ "$iteration" == "30" ]
  then
    let "iteration=0"
    $kill_command
    echo '------------' >> memory_status.log
    date >> memory_status.log
    prstat -n 7 1 1 >> memory_status.log
    echo '------------' >> memory_status.log
    /test/show_queue.sh
  fi
  sleep 60
done
A very simple way to do it is to use two scripts: one with the loop, and one that does the killing task:
for (( ; ; ))
do
  DoKillingTask
  rc=$?   # <- you get the return code of the script and decide what to do
done
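A minimal sketch of that split, assuming the killing logic was moved into a separate script called do_kill.sh (a hypothetical name):
#!/bin/bash
# watchdog.sh - keeps looping no matter what the inner script does
for (( ; ; ))
do
  ./do_kill.sh                  # hypothetical script holding the ps/kill/prstat logic
  rc=$?                         # inspect the return code instead of letting a failure stop the loop
  if [ "$rc" -ne 0 ]; then
    echo "$(date): do_kill.sh exited with $rc, continuing" >> watchdog.log
  fi
  sleep 60
done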
If it continues to be killed, Mikel (in a comment on your question) is right.
