Bash script number generator - bash

I need to generate random numbers in a specific format as test data. For example, given a number "n", I need to produce "n" random numbers and write them to a file. The file must contain at most 3 numbers per line. Here is what I have:
#!/bin/bash
m=$1
output=$2
for ((i=1; i<= m; i++)) do
echo $((RANDOM % 29+2)) >> $output
done
This outputs the numbers as:
1
2
24
21
10
14
and what I want is:
1 2 24
21 10 14
Thank you for your help!

Pure bash (written as a function rather than a script file)
randx3() {
local d=$'  \n' # separators for i%3 = 0, 1, 2: space, space, newline
local i
for ((i=0;i<$(($1 - 1));++i)); do
printf "%d%c" $((RANDOM%29 + 2)) "${d:$((i%3)):1}"
done
printf "%d\n" $((RANDOM%29 + 2))
}
Note that it doesn't take a file argument; rather it outputs to stdout, so you would use it like this:
randx3 11 > /path/to/output
That style is often more flexible.
Here's a less hacky one which allows you to select how often you want a newline:
randx() {
local i
local m=$1
local c=${2:-3}
for ((i=1;i<=m;++i)); do
if ((i%c && i<m)); then
printf "%d " $((RANDOM%29 + 2))
else
printf "%d\n" $((RANDOM%29 + 2))
fi
done
}
Call that one as randx 11 or randx 11 7 (second argument defaults to 3).

Pipe the output to a command that will read 3 lines at a time:
for ((i=1; i<= m; i++)) do
echo $((RANDOM % 29+2))
done | sed -e '$!N;$!N;s/\n/ /g' >> $output

This is what paste was designed for:
$ for i in {0..10}; do echo $RANDOM; done | paste -d' ' - - -
14567 3240 16354
17457 25616 12772
3912 7490 12206
7342 10554
Another approach would be to build up the values in an array, then use printf.
m=$1
output=$2
vals=()
while (( m-- )); do
vals+=( $((RANDOM % 29+2)) )
done
printf '%d %d %d\n' "${vals[@]}" > "$output" # a last line with fewer than 3 values is padded with 0s
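A variation on the array approach prints the array in slices, so a short last line stays short instead of being padded with zeros. This is a sketch; print_in_rows and its optional second parameter (numbers per line) are my own naming, not part of the original answer:

```shell
# Hypothetical helper: print_in_rows N [COLS]
# Generates N random numbers in 2..30 and prints COLS of them per line.
print_in_rows() {
  local n=$1 cols=${2:-3} i
  local -a vals=()
  for (( i = 0; i < n; i++ )); do
    vals+=( $((RANDOM % 29 + 2)) )
  done
  # "${vals[*]:i:cols}" joins up to cols elements starting at index i
  # (with spaces, given the default IFS); the last slice may be shorter.
  for (( i = 0; i < n; i += cols )); do
    echo "${vals[*]:i:cols}"
  done
}
```

Usage: print_in_rows 11 3 > "$output" writes three full lines and a last line holding two numbers.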

Shortest!!!
I need to produce "n" random numbers and write them in a file. The file must contain at most 3 numbers per line.
pr -t -3 -s\ <(for ((n=6;n--;)){ echo $((RANDOM % 29+2));}) >file
Then
cat file
11 29 27
14 21 22
YAS: Yet another bash solution
As a script:
#!/bin/bash
n=$1
file=$2
out=()
>$file
for ((i=1;i<=n;i++));do
out+=($((RANDOM%29+2)))
[ $((i%3)) -eq 0 ] && echo ${out[*]} >>$file && out=()
done
[ "$out" ] && echo ${out[*]} >>$file
Usage:
script <quantity of random> <filename>
Important remark about RANDOM%29
This way of producing random numbers between 2 and 30 is not uniform!
As $RANDOM gives a number between 0 and 32767, the counts are:
for ((i=0;i<32768;i++)) ;do
((RL[$((i%29+2))]++))
done
for ((i=0;i<32;i++));do
printf "%3d %5d\n" $i ${RL[i]}
done | column
0 0 7 1130 14 1130 21 1130 28 1130
1 0 8 1130 15 1130 22 1130 29 1129
2 1130 9 1130 16 1130 23 1130 30 1129
3 1130 10 1130 17 1130 24 1130 31 0
4 1130 11 1130 18 1130 25 1130
5 1130 12 1130 19 1130 26 1130
6 1130 13 1130 20 1130 27 1130
... there are 1130 chances to obtain a number between 2 and 28, but only 1129 chances to obtain a 29 or a 30.
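These counts follow from dividing the 32768 possible values of $RANDOM by 29 (each residue r maps to the value r + 2):

```latex
32768 = 29 \times 1129 + 27
\;\Longrightarrow\;
\left|\{\, x \in \{0,\dots,32767\} : x \bmod 29 = r \,\}\right| =
\begin{cases}
1130, & 0 \le r \le 26 \quad (\text{values } 2,\dots,28),\\
1129, & r \in \{27, 28\} \quad (\text{values } 29, 30).
\end{cases}
```

Rejecting every $RANDOM at or above 29 × 1129 = 32741 leaves exactly 1129 inputs per residue; this is the same bound the final script computes as rmax=$(( ( 32768 / rnum ) * rnum )).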
To prevent this, you have to drop unwanted results:
random2to30() {
local _random=32769
while (( $_random>=32741 )) ;do
_random=$RANDOM;
done;
printf -v $1 "%d" $((2+_random%29))
}
The proof:
tstr2to30() {
unset $1
local _random=32769
while (( $_random>=32741 )); do
read _random || break
done
[ "$_random" ] && printf -v $1 "%d" $((2 +_random % 29 ))
}
unset RL
while tstr2to30 MyRandom && [ "$MyRandom" ] ;do
((RL[MyRandom]++))
done < <(seq 0 32767)
for ((i=0;i<32;i++));do
printf "%3d %5d\n" $i ${RL[i]}
done | column
Gives:
0 0 7 1129 14 1129 21 1129 28 1129
1 0 8 1129 15 1129 22 1129 29 1129
2 1129 9 1129 16 1129 23 1129 30 1129
3 1129 10 1129 17 1129 24 1129 31 0
4 1129 11 1129 18 1129 25 1129
5 1129 12 1129 19 1129 26 1129
6 1129 13 1129 20 1129 27 1129
where every value gets exactly the same number of chances (1129)!
Final usable script
So the script could become (don't forget the bash shebang!):
#!/bin/bash
n=${1:-11} # default to 11 values
c=${2:-3} # default to 3 values per line
minval=${3:-2} # default minimum random value: 2
maxval=${4:-30} # default maximum random value: 30
file=${5:-/dev/stdout} # default to STDOUT
rnum=$(( maxval - minval + 1 ))
rmax=$(( ( 32768 / rnum ) * rnum ))
randomGen() {
local _random=33000
while [ $_random -ge $rmax ] ;do
_random=$RANDOM
done
printf -v $1 "%d" $(( minval +_random % rnum ))
}
out=()
for ((i=1;i<=n;i++));do
randomGen MyRandom
out+=($MyRandom)
[ $((i%c)) -eq 0 ] && echo ${out[*]} >>"$file" && out=()
done
[ "$out" ] && echo ${out[*]} >>"$file"

This awk prints a newline after every 3rd number and a space after the others:
for ((i=1; i<= m; i++)); do
echo $((RANDOM % 29+2))
done | awk '{printf "%s%c", $1, (NR % 3) ? " " : "\n"}' >> $output
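One caveat with the awk above: when m is not a multiple of 3, the output ends with a trailing space and no final newline. A hedged variant, wrapped in a hypothetical to_rows function that takes the numbers per line as its argument, prints the separator before each number and closes an incomplete last line in END:

```shell
# Hypothetical helper: to_rows [COLS] reads one number per line on stdin
# and prints COLS numbers per line, separated by single spaces.
to_rows() {
  awk -v c="${1:-3}" '
    # space before every number except the first one on a line
    { printf "%s%s", ((NR - 1) % c ? " " : ""), $1 }
    # end the line after every c-th number
    NR % c == 0 { print "" }
    # finish a last line that holds fewer than c numbers
    END { if (NR % c) print "" }
  '
}
```

Usage: for ((i=1; i<=m; i++)); do echo $((RANDOM % 29+2)); done | to_rows 3 >> "$output"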

Yet another way of doing it, letting xargs group the numbers three per line:
for ((i=1; i<=m; i++)); do echo $((RANDOM % 29+2)); done | xargs -n3 > "$output"

Related

How to find a value's base-2 exponent in shell?

Is there a way to find a value's base-2 exponent in bash?
For example, if I input 512 the output should be 9, meaning 2^9 is 512.
Any help here is immensely appreciated - Thanks
When I read the question, 512 is the input and 9 is the output. Is it possible that what is being asked here is log_base_2(512), which is 9? If so, then maybe this would help.
$ echo "l(512) / l(2)" | bc -l
9.00000000000000000008
The explanation of the math can be found here:
How do I calculate the log of a number using bc?
Using awk.
$ echo 512 | awk '{print log($1)/log(2)}'
9
Put that into a script (expo.sh):
#!/bin/bash
_num="$1"
expon=$(awk -v a="$_num" 'BEGIN{print log(a)/log(2)}')
if [[ $expon =~ ^[0-9]+\.[0-9]*$ ]]; then # Match floating points
echo "$_num is not a power of 2"; # Not a power of 2 if the result has a fractional part
else
echo "$_num = 2^${expon}"; # print number
fi
Run:
$ ./expo.sh 512
512 = 2^9
$ ./expo.sh 21
21 is not a power of 2
A fast way to check whether a number x is a power of 2 is to require x > 0 (to exclude 0) and check that the bitwise AND of x and x-1 is 0:
((x>0 && ( x & x-1 ) == 0 )) && echo $x is a power of 2
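Combining that test with a right-shift loop gives the exponent using integer arithmetic only, so there is no floating-point result to round or pattern-match. A minimal sketch; log2_exact is a made-up name, and it assumes the input is a non-negative integer that fits in bash's 64-bit arithmetic:

```shell
# Hypothetical helper: log2_exact X prints e such that 2^e == X,
# or returns 1 (printing nothing) if X is not a power of 2.
log2_exact() {
  local x=$1 e=0
  # same power-of-2 test as above; note the parentheses, because
  # == binds tighter than & in bash arithmetic
  (( x > 0 && (x & (x - 1)) == 0 )) || return 1
  # count how many times x can be halved before reaching 1
  while (( x > 1 )); do
    x=$(( x >> 1 ))
    e=$(( e + 1 ))
  done
  echo "$e"
}
```

Usage: log2_exact 512 prints 9, while log2_exact 21 prints nothing and returns 1, so it can drive an if directly.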
To compute log2 itself, you can use this algorithm (fast-computing-of-log2-for-64-bit-integers), here in its 32-bit version:
tab32=( 0 9 1 10 13 21 2 29
11 14 16 18 22 25 3 30
8 12 20 28 15 17 24 7
19 27 23 6 26 5 4 31 )
log2_32() {
local value=$1
(( value |= value >> 1 ))
(( value |= value >> 2 ))
(( value |= value >> 4 ))
(( value |= value >> 8 ))
(( value |= value >> 16 ))
log2_32=${tab32[(value * 16#7C4ACDD & 16#ffffffff)>>27]}
}
log2_32 262144
echo "$log2_32"

bash 4.2 / 4.3: Different behavior in C-style loop

bash 4.2 shows what I assume is the correct behavior in a C-style for loop:
me@server:/some/dir# TIMES=30; for (( n=0; n<$(shuf -i ${TIMES}-$(expr ${TIMES} + 20) -n 1); n++ )); do echo $n; done
0
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
me@server:/some/dir# bash --version
GNU bash, Version 4.2.25(1)-release (x86_64-pc-linux-gnu)
(...)
me@server:/some/dir#
The same under bash 4.3 throws an error:
me@server:/some/dir# TIMES=30; for (( n=0; n<$(shuf -i ${TIMES}-$(expr ${TIMES} + 20) -n 1); n++ )); do echo $n; done
-bash: syntax error near unexpected token `newline'
me@server:/some/dir# bash --version
GNU bash, Version 4.3.30(1)-release (x86_64-pc-linux-gnu)
(...)
Yet the part that picks a random number between ${TIMES} and ${TIMES}+20 works on its own:
me@server:/some/dir# shuf -i 20-50 -n 1
26
me@server:/some/dir#
So does inserting the number directly instead of using a $() command substitution:
me@server:/some/dir# TIMES=30; for (( n=0; n<26; n++ )); do echo $n; done
0
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
me@server:/some/dir#
What's going on here? Any ideas why the subshell is not executed correctly under bash 4.3?
If you replace the $(expr ...) command substitution with the arithmetic expansion $((...)), it starts to work:
TIMES=30
for (( n=0; n < $(shuf -i $TIMES-$((TIMES + 20)) -n 1); n++ )) ; do
echo $n
done
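Independently of the bash 4.3 syntax error, the condition of an arithmetic for loop is re-evaluated on every iteration, so n < $(shuf ...) runs shuf again each time and the bound keeps changing. A sketch that picks the bound once; count_up_to_random is a hypothetical name, and it assumes GNU shuf, as in the question:

```shell
# Hypothetical helper: print 0 .. limit-1, where limit is drawn once
# from [times, times+20] with shuf (GNU coreutils).
count_up_to_random() {
  local times=${1:-30} limit n
  limit=$(shuf -i "$times-$((times + 20))" -n 1)
  for (( n = 0; n < limit; n++ )); do
    echo "$n"
  done
}
```

Usage: count_up_to_random 30 prints between 30 and 50 lines, starting at 0.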

Read the number of columns using awk/sed

I have the following test file:
Kmax Event File - Text Format
1 4 1000
65 4121 9426 12312
56 4118 8882 12307
1273 4188 8217 12309
1291 4204 8233 12308
1329 4170 8225 12303
1341 4135 8207 12306
63 4108 8904 12300
60 4106 8897 12307
731 4108 8192 12306
...
ÿÿÿÿÿÿÿÿ
In this file I want to delete the first two lines and apply some mathematical calculations. For instance, each column i becomes $i-(i-1)*number. A script that does this is the following:
#!/bin/bash
if test $1 ; then
if [ -f $1.evnt ] ; then
rm -f $1.dat
sed -n '2p' $1.evnt | (read v1 v2 v3
for filename in $1*.evnt ; do
echo -e "Processing file $filename"
sed '$d' < $filename > $1_tmp
sed -i '/Kmax/d' $1_tmp
sed -i '/^'"$v1"' '"$v2"' /d' $1_tmp
cat $1_tmp >> $1.dat
done
v3=`wc -l $1.dat | awk '{print $1}' `
echo -e "$v1 $v2 $v3" > .$1.dat
rm -f $1_tmp)
else
echo -e "\a!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!"
echo -e " Event file $1.evnt doesn't exist !!!!!!"
echo -e "!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!"
fi
else
echo -e "\a!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!"
echo -e "!!!!! Give name for event files !!!!!"
echo -e "!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!"
fi
awk '{print $1, $2-4096, $3-(2*4096), $4-(3*4096)}' $1.dat >$1_Processed.dat
rm -f $1.dat
exit 0
The file won't always have 4 columns. Is there a way to read the number of columns, print this number and apply those calculations?
EDIT: The idea is to take an input file (*.evnt), convert it to *.dat or any other ASCII file (it doesn't really matter) that only contains the numbers in columns, and then apply the calculation $i=$i-(i-1)*number. In addition, the number of columns should be kept in a variable that will be used by another program. For instance, in the above file number=4096, and a sample output file is the following:
65 25 1234 24
56 22 690 19
1273 92 25 21
1291 108 41 20
1329 74 33 15
1341 39 15 18
63 12 712 12
60 10 705 19
731 12 0 18
while the console should show the message "There are 4 detectors".
Finally, a new file file_processed.dat should be produced, where file is the name of awk's input file.
It should be executed like this:
./myscript <filename>
where <filename> is the file name without the extension. For instance, the files are named filename.evnt, so it should be executed as:
./myscript filename
Let's start with this to see if it's close to what you're trying to do:
$ numdet=$( awk -v num=4096 '
NR>2 && NF>1 {
out = FILENAME "_processed.dat"
for (i=1;i<=NF;i++) {
$i = $i-(i-1)*num
}
nf = NF
print > out
}
END {
printf "There are %d detectors\n", nf | "cat>&2"
print nf
}
' file )
There are 4 detectors
$ cat file_processed.dat
65 25 1234 24
56 22 690 19
1273 92 25 21
1291 108 41 20
1329 74 33 15
1341 39 15 18
63 12 712 12
60 10 705 19
731 12 0 18
$ echo "$numdet"
4
Is that it?
Using awk
awk 'NR<=2{next}{for (i=1;i<=NF;i++) $i=$i-(i-1)*4096}1' file
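To match the requested interface (./myscript filename, reading filename.evnt, writing filename_processed.dat and reporting the column count), the first answer's awk could be wrapped as follows. This is a sketch: process_evnt is a hypothetical name, number defaults to 4096 as in the question, and lines with fewer than two fields (such as the trailing garbage line) are treated as non-data:

```shell
# Hypothetical wrapper for "./myscript filename": reads filename.evnt,
# writes filename_processed.dat, prints the column (detector) count on
# stdout and the "There are N detectors" message on stderr.
process_evnt() {
  local base=$1 num=${2:-4096}
  local infile="$base.evnt" outfile="${base}_processed.dat" ndet
  if [[ ! -f $infile ]]; then
    echo "Event file $infile doesn't exist" >&2
    return 1
  fi
  # Skip the two header lines and any line with fewer than 2 fields
  # (e.g. the trailing garbage line), then apply $i - (i-1)*num per column.
  ndet=$(awk -v num="$num" -v out="$outfile" '
    NR > 2 && NF > 1 {
      for (i = 1; i <= NF; i++) $i = $i - (i - 1) * num
      nf = NF
      print > out
    }
    END { print nf + 0 }
  ' "$infile")
  echo "There are $ndet detectors" >&2
  echo "$ndet"
}
```

Usage: inside myscript, numdet=$(process_evnt "$1") || exit 1 leaves the detector count in $numdet for the other program, while the message still reaches the console through stderr.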

How do I split the input into chunks of six entries each using bash?

This is the script I run to produce the raw data in data_tripwire.sh:
#!/bin/sh
LOG=/var/log/syslog-ng/svrs/sec2tes1
for count in 6 5 4 3 2 1 0
do
MONTH=`date -d"$count month ago" +"%Y-%m"`
CBS=`bzcat $LOG/$MONTH*.log.bz2|grep 10.55.22.41 |sort|uniq | wc -l`
echo $CBS >> /home/secmgr/attmrms1/data_tripwire1.sh
done
for count in 6 5 4 3 2 1 0
do
MONTH=`date -d"$count month ago" +"%Y-%m"`
GFS=`bzcat $LOG/$MONTH*.log.bz2|grep 10.55.22.31 |sort|uniq | wc -l`
echo $GFS >> /home/secmgr/attmrms1/data_tripwire1.sh
done
for count in 6 5 4 3 2 1 0
do
MONTH=`date -d"$count month ago" +"%Y-%m"`
HR1=`bzcat $LOG/$MONTH*.log.bz2|grep 10.55.10.1 |sort|uniq | wc -l `
echo $HR1 >> /home/secmgr/attmrms1/data_tripwire1.sh
done
for count in 6 5 4 3 2 1 0
do
MONTH=`date -d"$count month ago" +"%Y-%m"`
HR2=`bzcat $LOG/$MONTH*.log.bz2|grep 10.55.21.12 |sort|uniq | wc -l`
echo $HR2 >> /home/secmgr/attmrms1/data_tripwire1.sh
done
for count in 6 5 4 3 2 1 0
do
MONTH=`date -d"$count month ago" +"%Y-%m"`
PAYROLL=`bzcat $LOG/$MONTH*.log.bz2|grep 10.55.21.18 |sort|uniq | wc -l`
echo $PAYROLL >> /home/secmgr/attmrms1/data_tripwire1.sh
done
for count in 6 5 4 3 2 1 0
do
MONTH=`date -d"$count month ago" +"%Y-%m"`
INCV=`bzcat $LOG/$MONTH*.log.bz2|grep 10.55.22.71 |sort|uniq | wc -l`
echo $INCV >> /home/secmgr/attmrms1/data_tripwire1.sh
done
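As an aside, the six loops above differ only in the IP address, so they can be folded into one nested loop. A sketch under the question's assumptions (same log layout, GNU date for -d); collect_counts is a hypothetical name, and unlike the original it overwrites the output file instead of appending to it:

```shell
# Hypothetical helper: collect_counts LOGDIR OUTFILE
# For each server IP (same order as the six loops above) and each of the
# last 7 months (6 months ago .. this month), count unique matching lines.
collect_counts() {
  local log=$1 out=$2 ip count month
  for ip in 10.55.22.41 10.55.22.31 10.55.10.1 10.55.21.12 10.55.21.18 10.55.22.71; do
    for count in 6 5 4 3 2 1 0; do
      month=$(date -d "$count month ago" +%Y-%m)  # GNU date syntax
      # As in the original, grep treats the dots in the IP as "any character".
      bzcat "$log/$month"*.log.bz2 | grep "$ip" | sort -u | wc -l
    done
  done > "$out"
}
```

Usage: collect_counts /var/log/syslog-ng/svrs/sec2tes1 /home/secmgr/attmrms1/data_tripwire1.sh writes 42 lines, 7 consecutive values per server.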
data_tripwire.sh
91
58
54
108
52
18
8
81
103
110
129
137
84
15
14
18
11
17
12
6
1
28
6
14
8
8
0
0
28
24
25
23
21
13
9
4
18
17
18
30
13
3
I want to process the first 6 entries (91, 58, 54, 108, 52, 18) from the output above, then break out of the loop. After that, it should continue with the next 6 entries, then break out of the loop again, and so on.
The problem now is that it reads all 42 numbers without breaking out of the loop.
This is the table output:
Tripwire
Month CBS GFS HR HR Payroll INCV
cb2db1 gfs2db1 hr2web1 hrm2db1 hrm2db1a incv2svr1
2013-07 85 76 12 28 26 4
2013-08 58 103 18 6 24 18
2013-09 54 110 11 14 25 17
2013-10 108 129 17 8 23 18
2013-11 52 137 12 8 21 30
2013-12 18 84 6 0 13 13
2014-01 8 16 1 0 9 3
The problem now is that it reads all 42 numbers, from 85 to 3.
I want a loop that runs from July to January for one server and then does the mean and standard deviation calculation, which is already done below.
After that, it should continue with the next cycle of 6 numbers for the next server and do the same as in the first cycle. I need help with for loops that use break and continue, or anything simpler.
This is my standard deviation calculation:
count=0 # Number of data points; global.
SC=3 # Scale to be used by bc. three decimal places.
E_DATAFILE=90 # Data file error
## ----------------- Set data file ---------------------
if [ ! -z "$1" ] # Specify filename as cmd-line arg?
then
datafile="$1" # ASCII text file,
else #+ one (numerical) data point per line!
datafile=/home/secmgr/attmrms1/data_tripwire1.sh
fi # See example data file, below.
if [ ! -e "$datafile" ]
then
echo "\""$datafile"\" does not exist!"
exit $E_DATAFILE
fi
Calculate the mean
arith_mean ()
{
local rt=0 # Running total.
local am=0 # Arithmetic mean.
local ct=0 # Number of data points.
while read value # Read one data point at a time.
do
rt=$(echo "scale=$SC; $rt + $value" | bc)
(( ct++ ))
done
am=$(echo "scale=$SC; $rt / $ct" | bc)
echo $am; return $ct # This function "returns" TWO values!
# Caution: This little trick will not work if $ct > 255!
# To handle a larger number of data points,
#+ simply comment out the "return $ct" above.
} <"$datafile" # Feed in data file.
sd ()
{
mean1=$1 # Arithmetic mean (passed to function).
n=$2 # How many data points.
sum2=0 # Sum of squared differences ("variance").
avg2=0 # Average of $sum2.
sdev=0 # Standard Deviation.
while read value # Read one line at a time.
do
diff=$(echo "scale=$SC; $mean1 - $value" | bc)
# Difference between arith. mean and data point.
dif2=$(echo "scale=$SC; $diff * $diff" | bc) # Squared.
sum2=$(echo "scale=$SC; $sum2 + $dif2" | bc) # Sum of squares.
done
avg2=$(echo "scale=$SC; $sum2 / $n" | bc) # Avg. of sum of squares.
sdev=$(echo "scale=$SC; sqrt($avg2)" | bc) # Square root =
echo $sdev # Standard Deviation.
} <"$datafile" # Rewinds data file.
Showing the output
mean=$(arith_mean); count=$? # Two returns from function!
std_dev=$(sd $mean $count)
echo
echo "<tr><th>Servers</th><th>"Number of data points in \"$datafile"\"</th> <th>Arithmetic mean (average)</th><th>Standard Deviation</th></tr>" >> $HTML
echo "<tr><td>cb2db1<td>$count<td>$mean<td>$std_dev</tr>" >> $HTML
echo "<tr><td>gfs2db1<td>$count<td>$mean<td>$std_dev</tr>" >> $HTML
echo "<tr><td>hr2web1<td>$count<td>$mean<td>$std_dev</tr>" >> $HTML
echo "<tr><td>hrm2db1<td>$count<td>$mean<td>$std_dev</tr>" >> $HTML
echo "<tr><td>hrm2db1a<td>$count<td>$mean<td>$std_dev</tr>" >> $HTML
echo "<tr><td>incv21svr1<td>$count<td>$mean<td>$std_dev</tr>" >> $HTML
echo
I want to split the input into chunks of six entries each and compute the arithmetic mean and the standard deviation of entries 1..6, then of entries 7..12, then of 13..18, etc.
This is the table output I want:
Tripwire
Month CBS GFS HR HR Payroll INCV
cb2db1 gfs2db1 hr2web1 hrm2db1 hrm2db1a incv2svr1
2013-07 85 76 12 28 26 4
2013-08 58 103 18 6 24 18
2013-09 54 110 11 14 25 17
2013-10 108 129 17 8 23 18
2013-11 52 137 12 8 21 30
2013-12 18 84 6 0 13 13
2014-01 8 16 1 0 9 3
*Standard deviation (7mths) 31.172 35.559 5.248 8.935 5.799 8.580
*Mean (7mths) 54.428 94.285 11.142 9.142 20.285 14.714
paste - - - - - - < data_tripwire.sh | while read -a values; do
# values is an array with 6 values
# ${values[0]} .. ${values[5]}
arith_mean "${values[@]}"
done
This means you have to rewrite your functions so they don't use read: change
while read value
to
for value in "$@"
@Matt, yes, change both functions to iterate over their arguments instead of reading from stdin. Then pass the data file (now called "data_tripwire1.sh", a terrible file extension for data; use .txt or .dat) into paste to reformat the data so that the first 6 values form the first row. Read each line into the array values (using read -a values) and invoke the functions:
arith_mean () {
local sum=$(IFS=+; echo "$*")
echo "scale=$SC; ($sum)/$#" | bc
}
sd () {
local mean=$1
shift
local sum2=0
for i in "$@"; do
sum2=$(echo "scale=$SC; $sum2 + ($mean-$i)^2" | bc)
done
echo "scale=$SC; sqrt($sum2/$#)"|bc
}
paste - - - - - - < data_tripwire1.sh | while read -a values; do
mean=$(arith_mean "${values[@]}")
sd=$(sd $mean "${values[@]}")
echo "${values[@]} $mean $sd"
done | column -t
91 58 54 108 52 18 63.500 29.038
8 81 103 110 129 137 94.666 42.765
84 15 14 18 11 17 26.500 25.811
12 6 1 28 6 14 11.166 8.648
8 8 0 0 28 24 11.333 10.934
25 23 21 13 9 4 15.833 7.711
18 17 18 30 13 3 16.500 7.973
Note you don't need to return a fancy value from the functions: you know how many points you pass in.
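If calling bc once per value gets slow, a single awk pass can do the grouping and both statistics. A sketch, assuming one number per line; chunk_stats is a hypothetical name, it computes the population standard deviation (dividing by n, like the sd function above), and an incomplete last group is ignored. Note that the collection loops run for count in 6 5 4 3 2 1 0, i.e. 7 months per server, so a chunk size of 7 keeps each server's values together. printf rounds where bc with scale=3 truncates, so the last digit can differ (29.039 here versus 29.038 above):

```shell
# Hypothetical helper: chunk_stats N [FILE]
# For every complete group of N values (one per line, FILE or stdin),
# print the values followed by their mean and population std. deviation.
chunk_stats() {
  local n=$1 file=${2:--}
  awk -v n="$n" '
    { v[++k] = $1 }
    k == n {
      sum = 0
      for (i = 1; i <= n; i++) sum += v[i]
      mean = sum / n
      ss = 0
      for (i = 1; i <= n; i++) ss += (v[i] - mean) ^ 2
      line = ""
      for (i = 1; i <= n; i++) line = line v[i] " "
      printf "%s%.3f %.3f\n", line, mean, sqrt(ss / n)
      k = 0   # start the next group
    }
  ' "$file"
}
```

Usage: chunk_stats 7 data_tripwire1.sh | column -t prints one row per server.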
Based on Glenn's answer, I propose this, which needs very few changes to the original:
paste - - - - - - < data_tripwire.sh | while read -a values
do
mean=$(for value in "${values[@]}"
do
echo "$value"
done | arith_mean)
echo "$mean"
for value in "${values[@]}"
do
echo "$value"
done | sd "$mean" "${#values[@]}"
done
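You can type (or copy & paste) this code directly into an interactive shell once SC and the two functions are defined. First remove the <"$datafile" redirections from the ends of both function definitions: a redirection attached to a function definition is applied every time the function runs, so it would override the pipe. Of course, typing it in is not practical if you intend to use this often, so you can put the code into a text file, make it executable and run it as a shell script. In this case you should add #!/bin/bash as the first line of that file.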
Credit to Glenn Jackman for the use of paste - - - - - -, which is the real solution, I'd say.
With the following changes, each function reads only the 6 items of block $i from datafile.
arith_mean ()
{
local rt=0 # Running total.
local am=0 # Arithmetic mean.
local ct=0 # Number of data points.
while read value # Read one data point at a time.
do
rt=$(echo "scale=$SC; $rt + $value" | bc)
(( ct++ ))
done
am=$(echo "scale=$SC; $rt / $ct" | bc)
echo $am; return $ct # This function "returns" TWO values!
# Caution: This little trick will not work if $ct > 255!
# To handle a larger number of data points,
#+ simply comment out the "return $ct" above.
} < <(awk -v block=$i 'NR > (6* (block - 1)) && NR < (6 * block + 1) {print}' "$datafile") # Feed in data file.
sd ()
{
mean1=$1 # Arithmetic mean (passed to function).
n=$2 # How many data points.
sum2=0 # Sum of squared differences ("variance").
avg2=0 # Average of $sum2.
sdev=0 # Standard Deviation.
while read value # Read one line at a time.
do
diff=$(echo "scale=$SC; $mean1 - $value" | bc)
# Difference between arith. mean and data point.
dif2=$(echo "scale=$SC; $diff * $diff" | bc) # Squared.
sum2=$(echo "scale=$SC; $sum2 + $dif2" | bc) # Sum of squares.
done
avg2=$(echo "scale=$SC; $sum2 / $n" | bc) # Avg. of sum of squares.
sdev=$(echo "scale=$SC; sqrt($avg2)" | bc) # Square root =
echo $sdev # Standard Deviation.
} < <(awk -v block=$i 'NR > (6 * (block - 1)) && NR < (6 * block + 1) {print}' "$datafile") # Rewinds data file.
In the main part of the script, loop over the blocks to read:
for ((i=1; i <= $(( $(wc -l < "$datafile") / 6 )); i++))
do
mean=$(arith_mean); count=$? # Two returns from function!
std_dev=$(sd $mean $count)
echo "Block $i: $count points, mean $mean, standard deviation $std_dev"
done
Of course, it is better to move the wc -l outside the loop, because the loop condition is re-evaluated on every iteration. But you get the idea.
The syntax error was caused by the space between < and (: in a process substitution like <(...) there must be no space between them. Sorry for the typo.
cat <(awk -F: '{print $1}' /etc/passwd) works.
cat < (awk -F: '{print $1}' /etc/passwd) syntax error near unexpected token `('
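To feed a function from a process substitution, as the functions above do, the redirection needs its own < followed by a space before <(...). Because redirections on a function definition are performed each time the function runs, variables such as the block number are read at call time. A minimal sketch with made-up data; show_block and block are hypothetical names:

```shell
# Hypothetical example: the "< <(...)" redirection belongs to the function
# definition, so the process substitution (and $block) is evaluated anew
# on every call. Block b holds input lines 3*(b-1)+1 .. 3*b.
show_block() {
  cat
} < <(printf '%s\n' 1 2 3 4 5 6 7 | awk -v b="$block" 'NR > 3 * (b - 1) && NR <= 3 * b')
```

Usage: block=2; show_block prints 4, 5 and 6, one per line.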

split a file based upon line number

I have a large file that needs to be split based on line numbers.
For instance, my file looks like this:
aaaaaa
bbbbbb
cccccc
dddddd
****** //here blank line//
eeeeee
ffffff
gggggg
hhhhhh
*******//here blank line//
ıııııı
jjjjjj
kkkkkk
llllll
******
//And so on...
I need two separate files: one should contain the first, third, fifth, ... groups of 4 lines, and the other the second, fourth, sixth, ... groups of 4 lines, and so on. How can I do that in a bash script?
You can play with the number of the line, NR:
$ awk 'NR%10>0 && NR%10<5' your_file > file1
$ awk 'NR%10>5' your_file > file2
If the line number is 10k + n with 0 < n < 5, the line goes to the first file.
If it is 10k + n with n > 5, the line goes to the second file.
Lines with n = 0 or n = 5 are the blank separators and are skipped.
In one line:
$ awk 'NR%10>0 && NR%10<5 {print > "file1"} NR%10>5 {print > "file2"}' file
Test
$ cat a
1
2
3
4

6
7
8
9

11
12
13
14

16
17
18
19

21
22
23
24

26
27
28
29

31
32
33
34

36
37
38
39

41
42
43
44

46
47
48
49

51
$ awk 'NR%10>0 && NR%10<5 {print > "file1"} NR%10>5 {print > "file2"}' a
$ cat file1
1
2
3
4
11
12
13
14
21
22
23
24
31
32
33
34
41
42
43
44
51
$ cat file2
6
7
8
9
16
17
18
19
26
27
28
29
36
37
38
39
46
47
48
49
You can do this with head and tail (which are not part of bash itself):
head -n 20 <file> | tail -n 5
gives you lines 16 to 20.
This is inefficient, however, if you want to extract multiple sections of your file, since the file has to be read again and again. In that case I'd prefer some real scripting.
Another approach is to treat blank-line-separated paragraphs as the records, and print odd-numbered and even-numbered records to different files:
awk -v RS= -v ORS='\n\n' '{
outfile = (NR % 2 == 1) ? "file1" : "file2"
print > outfile
}' file
Maybe something like this:
#!/bin/bash
EVEN="even.log"
ODD="odd.log"
line_count=0
block_count=0
while read -r line
do
# ignore blank lines
if [ ! -z "$line" ]; then
if [ $(( $block_count % 2 )) -eq 0 ]; then
# even
echo "$line" >> "$EVEN"
else
# odd
echo "$line" >> "$ODD"
fi
line_count=$((line_count + 1))
if [ "$line_count" -eq 4 ]; then
block_count=$((block_count + 1))
line_count=0
fi
fi
done < "$1"
The first argument is the source file: ./split.sh split_input
This script prints lines from file 1.txt with indexes 0, 1, 2, 3, 8, 9, 10, 11, 16, 17, 18, 19, ...
i=0
while IFS= read -r p; do
if (( i % 8 < 4 ))
then
printf '%s\n' "$p"
fi
i=$((i + 1))
done < 1.txt
This script prints lines with indexes 4, 5, 6, 7, 12, 13, 14, 15, ...
i=0
while IFS= read -r p; do
if (( i % 8 > 3 ))
then
printf '%s\n' "$p"
fi
i=$((i + 1))
done < 1.txt
