Bash Script Loop Out of Memory?

In bash I need to run a script that loops from i=1 to i=99999999, but it always runs out of memory. Is there any workaround, or is there a max value for i?
first=1
last=99999999
randomString="CXCXQOOPSOIS"
for val in $( seq $first $last )
do
padVal=$( printf "%010d\n" $val )
hash=$( echo -n $randomString$padVal | md5sum )
if [[ "$hash" =~ ^000000 ]]; then
echo "Number: $val" >> log_000000
echo "$val added to log - please check."
fi
done

bash provides a C-like for loop syntax that avoids generating the whole list up front:
first=1
last=99999999
randomString="CXCXQOOPSOIS"
for ((val=$first; val<=$last; val++))
do
padVal=$( printf "%010d\n" $val )
hash=$( echo -n $randomString$padVal | md5sum )
if [[ "$hash" =~ ^000000 ]]; then
echo "Number: $val" >> log_000000
echo "$val added to log - please check."
fi
done

Your seq command generates 100 million numbers (bar a couple) and requires 800 MiB or so of memory to hold just the list of digits (probably an under-estimate; each number might be held in a separate memory allocation, which might mean 8 bytes for a pointer and 16 bytes for the allocated block, which triples the storage space estimate).
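As a rough sanity check on that figure, here is an illustrative back-of-envelope calculation (assuming about 9 bytes per number, since most of the 10^8 numbers have 8 digits plus a separator):

```shell
# ~10^8 numbers, most of them 8 digits plus a separator: ~9 bytes each
bytes=$(( 99999999 * 9 ))
echo "$(( bytes / 1024 / 1024 )) MiB"   # prints: 858 MiB
```

That is just the raw digit storage; per-allocation overhead, as noted above, can multiply it further.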
You can improve things dramatically by using:
for millions in $(seq 0 99)
do
for smallstuff in $(seq -f "%06.0f" 0 999999)
do
val="$millions$smallstuff"
...
done
done
This dramatically reduces the amount of memory needed; the inner seq never holds more than a million entries at once. Three things to watch: smallstuff must be zero-padded (%06.0f rather than %6.0f, since word splitting strips space padding and the concatenation would otherwise build the wrong number); val carries leading zeros, so force base 10 with $((10#$val)) before doing arithmetic on it; and this version tests 0, which your original code did not.

If you still want to use seq, separate seq and the loop with a pipe: |
This approach is more portable and can be used in other shells.
The memory footprint is still small, but the pipeline now runs as two processes.
first=1
last=99999999
randomString="CXCXQOOPSOIS"
seq $first $last |
while read val
do
padVal=$( printf "%010d\n" $val )
hash=$( echo -n $randomString$padVal | md5sum )
if [[ "$hash" =~ ^000000 ]]; then
echo "Number: $val" >> log_000000
echo "$val added to log - please check."
fi
done
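One caveat worth noting: with a plain pipe, bash runs the while loop in a subshell, so any variables set inside it are lost afterwards. A sketch of the process-substitution alternative, which keeps the loop in the current shell (a bashism, so less portable than the pipe):

```shell
#!/bin/bash
# Process substitution instead of a pipe: the loop body runs in the
# current shell, so 'count' is still visible after the loop ends.
count=0
while read -r val; do
    count=$((count + 1))
done < <(seq 1 5)
echo "$count"   # prints: 5
```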

Related

How to make loop which depends of number from variable

I tried so much to do this, but I still don't know how to write it properly.
Example:
I have someVariable which contain words.
someVariable="Otter HoneyBadger Seal"
Depending on the number of Strings located in someVariable I need to make a visual list with echo.
I had this idea:
someVariable="Otter HoneyBadger Seal"
someVariable_number="echo $someVariable | wc -w"
while true
do
j=1
if [[ ${j} -gt ${someVariable_number} ]]; then
echo $someVariable | awk '{ print $j }'
((j++))
fi
done
Apart from the stylistics that I did not include, would like the output to look like this
./someScript
#OUTPUT
[1] Otter
[2] HoneyBadger
[3] Seal
With bash:
someVariable="Otter HoneyBadger Seal"
for i in $someVariable; do echo "[$((++c))] $i"; done
Output:
[1] Otter
[2] HoneyBadger
[3] Seal
Use an array. It will be simpler and faster and safer.
# read the words of the variable into an array
read -ra words <<<"$someVariable"
# iterate over the array indices
for idx in "${!words[@]}"; do
printf '[%d] %s\n' $((idx + 1)) "${words[idx]}"
done
[1] Otter
[2] HoneyBadger
[3] Seal
Safer because you don't have any unquoted variables. Try other techniques with
someVariable="Otter HoneyBadger Seal *"
and see how many you get.
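To make the hazard concrete, here is a small demonstration (the file names are made up for the example; the script creates them in a temp directory):

```shell
#!/bin/bash
# Unquoted expansion word-splits AND glob-expands; read -ra only splits.
cd "$(mktemp -d)" && touch file_a file_b
someVariable="Otter HoneyBadger Seal *"

unquoted=( $someVariable )          # the * matches file_a and file_b
read -ra words <<<"$someVariable"   # the * stays a literal word

echo "${#unquoted[@]}"   # prints: 5
echo "${#words[@]}"      # prints: 4
```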
Addressing just the issues with OP's current code ...
A few issues...
# wrong: assigns string to variable
$ someVariable_number="echo $someVariable | wc -w"
$ echo "${someVariable_number}"
echo Otter HoneyBadger Seal | wc -w
# right: executes 'echo|wc' and saves result to variable
$ someVariable_number=$(echo $someVariable | wc -w)
$ echo "${someVariable_number}"
3
Infinite loop: j=1 will never be -gt 3, so the echo|awk and j++ are never executed and the loop runs forever; also, each pass through the loop resets j=1, wiping out any j++; the last issue is the incorrect use of a bash variable inside an awk script:
# wrong:
while true
do
j=1
if [[ ${j} -gt ${someVariable_number} ]]; then
echo $someVariable | awk '{ print $j }'
((j++))
fi
done
# right: consolidate `while` and `if`
j=1
while [[ "${j}" -le "${someVariable_number}" ]]
do
echo "${someVariable}" | awk -v field_no="${j}" '{print "[" field_no "] " $field_no}'
((j++))
done
# better (?):
myarray=( dummy_placeholder ${someVariable} )
for ((j=1; j<=${someVariable_number}; j++))
do
printf "[%d] %s\n" "${j}" "${myarray[j]}"
done
Which generates:
[1] Otter
[2] HoneyBadger
[3] Seal
NOTE: see Cyrus' answer for one idea on streamlining the code

I am trying to write a bash script that displays 10 different random numbers, so far some of the numbers that I have are duplicated

I think there is something wrong with the condition and the array.
this is my script
thank you for your time I appreciate.
#!/bin/bash
loop=10
range=20
count=1
declare -a prev
numb=$[1+RANDOM% $range]
prev+=($numb)
echo ===========================
echo $loop DIFFERENT RANDOM NUMBERS
echo ===========================
echo $numb
until [ "$count" -ge "$loop" ]
do
numb=$[1+RANDOM% $range]
if [[ ${prev[@]} -ne $numb ]] ; then
echo $numb
prev+=$numb
((count++))
fi
done
The code attempts to locate previously selected numbers using the condition [[ ${prev[@]} -ne $numb ]]. However, bash does not have "in" (or "not in") operators that work on an array and a value.
Consider instead using a bash associative array. Each used number is marked by entering a value at the position associated with it:
#! /bin/bash
loop=10
range=20
# Associative array prev[N]=1, if N was already printed
declare -A prev
echo ===========================
echo $loop DIFFERENT RANDOM NUMBERS
echo ===========================
for ((count=1 ; count <= loop ; count++)) ; do
numb=$((1 + RANDOM % range))
while [ "${prev[$numb]}" ] ; do
numb=$((1 + RANDOM % range))
done ;
echo $numb
prev[$numb]=1
done
The traditional for (( ; ; )) loop forces the loop body to run a specific number of times.
A quick version using an array -
$: loop=10 min=20 range=20 all=( $( seq $min $((min+range)) ) )
$: while (( loop-- ))
do ndx=$((RANDOM%range))
if (( all[ndx] ))
then echo "${all[ndx]}"
unset "all[ndx]"
else let loop++
fi
done
33
38
27
23
39
32
22
20
36
35
Unsetting each element as used prevents dups.
I'm pretty sure there's a better way... still thinking.
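If external tools are acceptable, shuf (GNU coreutils) sidesteps the duplicate problem entirely by drawing without replacement; a one-line sketch:

```shell
# 10 distinct random numbers from 1..20, one per line
shuf -i 1-20 -n 10
```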

Convert IP Range to IP address

I have a raw file with IP ranges (xx.xx.xx.xx-yy.yy.yy.yy)
I want to create a new list with the range converted into single IP addresses.
(All ranges are in a 1-255 range)
conditions:
(1) If the difference between the fourth IP octets on a line is less than or equal to the max variable (say 5), it will loop and report each iteration as a single /32 address.
(2) IP addresses differing by more than the max variable will be reported as a /24.
The following bash script works, but it is slow on files of 50,000 lines.
Any help would be appreciated. It's part of a script that does other functions, so I need to stay in BASH.
for i in $data; do
A=$(echo $i | sed 's/-.*//'); B=$(echo $i | sed 's/^.*-//')
A1=$(echo $A | cut -d '.' -f 4); B1=$(echo $B | cut -d '.' -f 4)
diff=`expr $B1 - $A1`
if [ "$diff" == "0" ]; then
echo $A >> $outfile
elif [ "$diff" -gt "0" -a "$diff" -le $max ]; then
echo $A >> $outfile
for a in $(jot "$diff"); do
count=`expr $A1 + $a`
echo $A | sed "s/\.[0-9]*$/.$count/" >> $outfile
done
else
echo $A | sed 's/\.[0-9]*$/.0\/24/' >> $outfile
fi
done
The likely reason your script is so slow for 50,000 lines is that you have bash calling a lot of external programs (sed, cut, jot, expr), several times in each iteration of your inner and outer loops. Forking external processes adds a lot of time overhead, which compounds over that many iterations.
If you want to do this in bash, and improve performance, you'll need to make use of the equivalent features that are built into bash. I took a stab at this for your script and came up with this. I have tried to keep the functionality the same:
for i in $data; do
A="${i%-*}"; B="${i#*-}"
A1="${A##*.}"; B1="${B##*.}"
diff=$(($B1 - $A1))
if [ "$diff" == "0" ]; then
echo $A >> $outfile
elif [ "$diff" -gt "0" -a "$diff" -le $max ]; then
echo $A >> $outfile
for ((a=1; a<=$diff; a++)); do
count=$(($A1 + $a))
echo "${A%.*}.$count" >> $outfile
done
else
echo "${A%.*}.0/24" >> $outfile
fi
done
In particular I've made a lot of use of parameter expansions and arithmetic expansions. I'd be interested to see what kind of speedup (if any) this has over the original. I think it should be significantly faster.
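For reference, here is what those expansions produce on a single sample line (the IP range is a made-up example value):

```shell
#!/bin/bash
i="10.0.0.3-10.0.0.9"
A=${i%-*}    # remove the shortest '-*' suffix  -> 10.0.0.3
B=${i#*-}    # remove the shortest '*-' prefix  -> 10.0.0.9
A1=${A##*.}  # remove the longest '*.' prefix   -> 3
B1=${B##*.}  #                                  -> 9
echo "$A $B $A1 $B1"   # prints: 10.0.0.3 10.0.0.9 3 9
```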
If you are okay with using python, install (download, extract and run sudo python setup.py install) ipaddr library https://pypi.python.org/pypi/ipaddr, then write something like this
import ipaddr
for ip in (ipaddr.IPv4Network('192.0.2.0/24')):
print ip

How to write a tail script without the tail command

How would you achieve this in bash. It's a question I got asked in an interview and I could think of answers in high level languages but not in shell.
As I understand it, the real implementation of tail seeks to the end of the file and then reads backwards.
The main idea is to keep a fixed-size buffer and to remember the last lines. Here's a quick way to do a tail using the shell:
#!/bin/bash
SIZE=5
idx=0
while read line
do
arr[$idx]=$line
idx=$(( ( idx + 1 ) % SIZE ))
done < text
for ((i=0; i<SIZE; i++))
do
echo ${arr[$idx]}
idx=$(( ( idx + 1 ) % SIZE ))
done
If all not-tail commands are allowed, why not be whimsical?
#!/bin/sh
[ -r "$1" ] && exec < "$1"
tac | head | tac
Use wc -l to count the number of lines in the file. Subtract the number of lines you want from this, and add 1, to get the starting line number. Then use this with sed or awk to start printing the file from that line number, e.g.
sed -n "$start,\$p"
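Assembled into a runnable sketch (the sample file and n=5 are assumptions for the demo):

```shell
#!/bin/bash
n=5
file=$(mktemp)
seq 1 20 > "$file"                        # sample input: lines 1..20
start=$(( $(wc -l < "$file") - n + 1 ))   # 20 - 5 + 1 = 16
sed -n "${start},\$p" "$file"             # prints lines 16 through 20
```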
There's this:
#!/bin/bash
readarray file
lines=$(( ${#file[@]} - 1 ))
for (( line=$(($lines-$1)), i=${1:-$lines}; (( line < $lines && i > 0 )); line++, i-- )); do
echo -ne "${file[$line]}"
done
Based on this answer: https://stackoverflow.com/a/8020488/851273
You pass in the number of lines at the end of the file you want to see then send the file via stdin, puts the entire file into an array, and only prints the last # lines of the array.
The only way I can think of in “pure” shell is to do a while read linewise on the whole file into an array variable with indexing modulo n, where n is the number of tail lines (default 10) — i.e. a circular buffer, then iterate over the circular buffer from where you left off when the while read ends. It's not efficient or elegant, in any sense, but it'll work and avoids reading the whole file into memory. For example:
#!/bin/bash
incmod() {
let i=$1+1
n=$2
if [ $i -ge $2 ]; then
echo 0
else
echo $i
fi
}
n=10
i=0
buffer=
while read line; do
buffer[$i]=$line
i=$(incmod $i $n)
done < $1
j=$i
echo ${buffer[$i]}
i=$(incmod $i $n)
while [ $i -ne $j ]; do
echo ${buffer[$i]}
i=$(incmod $i $n)
done
This script somehow imitates tail:
#!/bin/bash
shopt -s extglob
LENGTH=10
while [[ $# -gt 0 ]]; do
case "$1" in
--)
FILES+=("${@:2}")
break
;;
-+([0-9]))
LENGTH=${1#-}
;;
-n)
if [[ $2 != +([0-9]) ]]; then
echo "Invalid argument to '-n': $2"
exit 1
fi
LENGTH=$2
shift
;;
-*)
echo "Unknown option: $1"
exit 1
;;
*)
FILES+=("$1")
;;
esac
shift
done
PRINTHEADER=false
case "${#FILES[@]}" in
0)
FILES=("/dev/stdin")
;;
1)
;;
*)
PRINTHEADER=true
;;
esac
IFS=
for I in "${!FILES[@]}"; do
F=${FILES[I]}
if [[ $PRINTHEADER == true ]]; then
[[ I -gt 0 ]] && echo
echo "==> $F <=="
fi
if [[ LENGTH -gt 0 ]]; then
LINES=()
COUNT=0
while read -r LINE; do
LINES[COUNT++ % LENGTH]=$LINE
done < "$F"
for (( I = COUNT >= LENGTH ? LENGTH : COUNT; I; --I )); do
echo "${LINES[--COUNT % LENGTH]}"
done
fi
done
Example run:
> bash script.sh -n 12 <(yes | sed 20q) <(yes | sed 5q)
==> /dev/fd/63 <==
y
y
y
y
y
y
y
y
y
y
y
y
==> /dev/fd/62 <==
y
y
y
y
y
> bash script.sh -4 <(yes | sed 200q)
y
y
y
y
Here's the answer I would give if I were actually asked this question in an interview:
What environment is this where I have bash but not tail? Early boot scripts, maybe? Can we get busybox in there so we can use the full complement of shell utilities? Or maybe we should see if we can squeeze a stripped-down Perl interpreter in, even without most of the modules that would make life a whole lot easier. You know dash is much smaller than bash and perfectly good for scripting use, right? That might also help. If none of that is an option, we should check how much space a statically linked C mini-tail would need, I bet I can fit it in the same number of disk blocks as the shell script you want.
If that doesn't convince the interviewer that it's a silly question, then I go on to observe that I don't believe in using bash extensions, because the only good reason to write anything complicated in shell script nowadays is if total portability is an overriding concern. By avoiding anything that isn't portable even in one-offs, I don't develop bad habits, and I don't get tempted to do something in shell when it would be better done in a real programming language.
Now the thing is, in truly portable shell, arrays may not be available. (I don't actually know whether the POSIX shell spec has arrays, but there certainly are legacy-Unix shells that don't have them.) So, if you have to emulate tail using only shell builtins and it's got to work everywhere, this is the best you can do, and yes, it's hideous, because you're writing in the wrong language:
#! /bin/sh
a=""
b=""
c=""
d=""
e=""
f=""
while read x; do
a="$b"
b="$c"
c="$d"
d="$e"
e="$f"
f="$x"
done
printf '%s\n' "$a"
printf '%s\n' "$b"
printf '%s\n' "$c"
printf '%s\n' "$d"
printf '%s\n' "$e"
printf '%s\n' "$f"
Adjust the number of variables to match the number of lines you want to print.
The battle-scarred will note that printf is not 100% available either. Unfortunately, if all you have is echo, you are up a creek: some versions of echo cannot print the literal string "-n", and others cannot print the literal string "\n", and even figuring out which one you have is a bit of a pain, particularly as, if you don't have printf (which is in POSIX), you probably don't have user-defined functions either.
(N.B. The code in this answer, sans rationale, was originally posted by user 'Nirk' but then deleted under downvote pressure from people whom I shall charitably assume were not aware that some shells do not have arrays.)

parse and expand interval

In my script I need to expand an interval, e.g.:
input: 1,5-7
to get something like the following:
output: 1,5,6,7
I've found other solutions here, but they involve python and I can't use it in my script.
Solution with Just Bash 4 Builtins
You can use Bash range expansions. For example, assuming you've already parsed your input you can perform a series of successive operations to transform your range into a comma-separated series. For example:
value1=1
value2='5-7'
value2=${value2/-/..}
value2=`eval echo {$value2}`
echo "input: $value1,${value2// /,}"
All the usual caveats about the dangers of eval apply, and you'd definitely be better off solving this problem in Perl, Ruby, Python, or AWK. If you can't or won't, then you should at least consider including some pipeline tools like tr or sed in your conversions to avoid the need for eval.
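One eval-free variant along those lines (a sketch; it trades the builtin-only constraint for a seq call, which handles variable endpoints directly):

```shell
value1=1
value2='5-7'
expanded=$(seq -s, "${value2%-*}" "${value2#*-}")   # 5,6,7
echo "input: $value1,$expanded"   # prints: input: 1,5,6,7
```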
Try something like this:
#!/bin/bash
for f in ${1//,/ }; do
if [[ $f =~ - ]]; then
a+=( $(seq ${f%-*} 1 ${f#*-}) )
else
a+=( $f )
fi
done
a=${a[*]}
a=${a// /,}
echo $a
Edit: As @Maxim_united mentioned in the comments, appending might be preferable to re-creating the array over and over again.
This should work with multiple ranges too.
#! /bin/bash
input="1,5-7,13-18,22"
result_str=""
for num in $(tr ',' ' ' <<< "$input"); do
if [[ "$num" == *-* ]]; then
res=$(seq -s ',' $(sed -n 's#\([0-9]\+\)-\([0-9]\+\).*#\1 \2#p' <<< "$num"))
else
res="$num"
fi
result_str="$result_str,$res"
done
echo ${result_str:1}
Will produce the following output:
1,5,6,7,13,14,15,16,17,18,22
expand_commas()
{
local arg
local st en i
set -- ${1//,/ }
for arg
do
case $arg in
[0-9]*-[0-9]*)
st=${arg%-*}
en=${arg#*-}
for ((i = st; i <= en; i++))
do
echo $i
done
;;
*)
echo $arg
;;
esac
done
}
Usage:
result=$(expand_commas arg)
eg:
result=$(expand_commas 1,5-7,9-12,3)
echo $result
You'll have to turn the separated words back into commas, of course.
It's a bit fragile with bad inputs but it's entirely in bash.
Here's my stab at it:
input=1,5-7,10,17-20
IFS=, read -a chunks <<< "$input"
output=()
for chunk in "${chunks[@]}"
do
IFS=- read -a args <<< "$chunk"
if (( ${#args[@]} == 1 )) # single number
then
output+=(${args[*]})
else # range
output+=($(seq "${args[@]}"))
fi
done
joined=$(sed -e 's/ /,/g' <<< "${output[*]}")
echo $joined
Basically split on commas, then interpret each piece. Then join back together with commas at the end.
A generic bash solution using the sequence expression {x..y}:
#!/bin/bash
function doIt() {
local inp="${1//,/ }"
declare -a args=( $(echo ${inp//-/..}) )
local item
local sep
for item in "${args[@]}"
do
case ${item} in
*..*) eval "for i in {${item}} ; do echo -n \${sep}\${i}; sep=, ; done";;
*) echo -n ${sep}${item};;
esac
sep=,
done
}
doIt "1,5-7"
Should work with any input following the sample in the question. Also with multiple occurrences of x-y
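The reason the eval is needed is ordering: brace expansion happens before parameter expansion, so a variable range is only recognized after a second round of parsing. A minimal demo of the mechanism:

```shell
r="5..7"
echo {$r}        # prints: {5..7}  (brace expansion ran before $r was substituted)
eval echo {$r}   # prints: 5 6 7   (eval re-parses, so the brace expands)
```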
Use only bash builtins
Using ideas from both @Ansgar Wiechers and @CodeGnome:
input="1,5-7,13-18,22"
for s in ${input//,/ }
do
if [[ $s =~ - ]]
then
a+=( $(eval echo {${s//-/..}}) )
else
a+=( $s )
fi
done
oldIFS=$IFS; IFS=$','; echo "${a[*]}"; IFS=$oldIFS
Works in Bash 3
Considering all the other answers, I came up with this solution, which does not use any sub-shells (but one call to eval for brace expansion) or separate processes:
# range list is assumed to be in $1 (e.g. 1-3,5,9-13)
# convert $1 to an array of ranges ("1-3" "5" "9-13")
IFS=,
local range=($1)
unset IFS
list=() # initialize result list
local r
for r in "${range[@]}"; do
if [[ $r == *-* ]]; then
# if the range is of the form "x-y",
# * convert to a brace expression "{x..y}",
# * using eval, this gets expanded to "x" "x+1" … "y" and
# * append this to the list array
eval list+=( {${r/-/..}} )
else
# otherwise, it is a simple number and can be appended to the array
list+=($r)
fi
done
# test output
echo ${list[@]}
