Script to create a four-character string based on known numerical relationships - bash

Consider a three-line input file containing four unique numbers (1, 2, 3, 4), where each line states the position of one number relative to another.
For example, in the following input set, 4 is next to 2, 2 is next to 3, and 1 is next to 4:
42
23
14
Given that, how would a script assemble all four numbers in such a way that it maintains each number's known relationship?
In other words, there are two answers, 1423 and 3241, but how do you arrive at them programmatically?

Not very sensible or efficient, but fun (for me, at least) :-)
This will echo all the permutations using GNU Parallel:
parallel echo {1}{2}{3}{4} ::: {1..4} ::: {1..4} ::: {1..4} ::: {1..4}
And add some grepping on the end:
parallel echo {1}{2}{3}{4} ::: {1..4} ::: {1..4} ::: {1..4} ::: {1..4} | grep -E "42|24" | grep -E "23|32" | grep -E "14|41"
Output
1423
3241
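The same candidate list can be produced without GNU Parallel: plain Bash brace expansion enumerates all 256 four-digit strings, and the same greps filter them down.

```shell
# brace expansion generates every 4-digit combination of 1..4, printf emits
# them one per line, and the greps keep only strings containing each
# relationship (in either orientation)
printf '%s\n' {1..4}{1..4}{1..4}{1..4} |
    grep -E "42|24" | grep -E "23|32" | grep -E "14|41"
```

This prints the same two answers, 1423 and 3241.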

Brute forcing the luck:
for (( ; ; ))
do
    res=($(echo "42
23
14" | shuf))
    if (( ${res[0]} % 10 == ${res[1]} / 10 && ${res[1]} % 10 == ${res[2]} / 10 ))
    then
        echo "success: ${res[@]}"
        break
    fi
    echo "fail: ${res[@]}"
done
fail: 42 14 23
fail: 42 23 14
fail: 42 14 23
success: 14 42 23
For 3 numbers, this approach is acceptable.
shuf shuffles the input lines, and the array res is filled with the shuffled numbers.
Then we take consecutive numbers pairwise and test whether the last digit of the first matches the first digit of the next, and likewise for the 2nd and 3rd numbers.
If so, we break with a success message. For debugging, a failure message is better than a silent endless loop.
For longer chains of numbers, a systematic permutation would be better to test, together with a function that checks two consecutive numbers and is called in a loop over the indices.
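A sketch of that systematic approach in Bash (the pairs variable and the adjacent helper are made up for illustration): enumerate all candidates with brace expansion, reject non-permutations, and keep only chains where every adjacent digit pair is a known relationship.

```shell
#!/usr/bin/env bash
pairs="42 23 14"            # the known relationships from the input file

adjacent() {                # adjacent A B: is "AB" or "BA" a known pair?
    [[ " $pairs " == *" $1$2 "* || " $pairs " == *" $2$1 "* ]]
}

for p in {1..4}{1..4}{1..4}{1..4}; do
    # permutations only: skip any string containing some digit twice
    [[ $p == *1*1* || $p == *2*2* || $p == *3*3* || $p == *4*4* ]] && continue
    adjacent "${p:0:1}" "${p:1:1}" &&
        adjacent "${p:1:1}" "${p:2:1}" &&
        adjacent "${p:2:1}" "${p:3:1}" &&
        echo "$p"
done
```

This prints the same two answers, 1423 and 3241.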

Related

Iterate Through List with Seq and Variable

I am attempting to loop through a list of integers starting out like so:
start=000
for i in $(seq -w $start 48 006);
However, when I try this code above, the loop seems to loop once and then quit.
What do I need to modify? (The leading zeroes need to stay)
Could you please try the following.
start=0
diff=6
for i in $(seq $start $diff 48);
do
printf '%03d\n' $i
done
Output will be as follows.
000
006
012
018
024
030
036
042
048
Problem in OP's tried code:
The arguments were given to seq in the wrong order: the syntax is seq start increment end, e.g. seq 0 6 48. Because they were swapped, the loop body runs only once.
In your attempt, seq is told to start at 0 and run up to 6 in steps of 48, which is not possible beyond the first value, so it prints only the very first integer, which is fair enough.
EDIT: As per @Cyrus sir's comment, adding a Bash builtin solution here without using seq.
for ((i=0; i<=48; i=i+6)); do printf '%03d\n' $i; done
seq takes a start, an increment, and a finish, in that order.
You've swapped the increment with the finish: seq -w $start 48 006 means start at zero and increment by 48 until reaching 6. The simple fix is seq -w $start 6 48. Note: 006 is not needed, just 6, since seq -w equalizes the widths of the numbers (here, to two places).
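With the arguments in the correct order, the padded output looks like this (GNU seq pads every number to the width of the longest value, 48, hence two digits):

```shell
seq -w 0 6 48   # 00 06 12 18 24 30 36 42 48, one per line
```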

Generate random numbers without collisions

I want to generate different random numbers in bash. I used $RANDOM, but in my output some numbers are identical.
var1=$((1+$RANDOM%48))
var2=$((1+$RANDOM%48))
var3=$((1+$RANDOM%48))
var4=$((1+$RANDOM%48))
var5=$((1+$RANDOM%48))
var6=$((1+$RANDOM%48))
It gives me 6 numbers between 1 and 48, but I need 6 DIFFERENT numbers between 1 and 48. The fact is that I'm really new and I don't even know how to start.
If you want 6 different pseudo-random numbers between 1 and 48, this is one way to do it:
$ seq 48 | shuf | head -6
18
10
17
3
11
6
or directly with shuf options (as in this answer)
shuf -i 1-48 -n 6
another method would be rejection sampling. With awk
awk 'BEGIN{srand();
do {a[int(1+rand()*48)]} while (length(a)<6);
for(k in a) print k}'
8
14
16
23
24
27
here rejection is implicit: adding the same number again won't increase the array size (an awk array is essentially a hash map)
Assigning the result to a variable is possible, but the structure begs for an array, for example:
declare -a randarray
readarray randarray < <(seq 48 | shuf | head -6)
you can access the individual elements with the corresponding index (0-based)
echo ${randarray[3]}
In general, if the number of samples is close to the size of the sample space, you will want a shuffle (in the extreme case, N numbers from the range 1-N is simply a random permutation); if the ratio is small, rejection sampling might be better (in the extreme case, you want a single random number). Rejection sampling is used mostly when you have other conditions for eliminating a sample. However, direct use of shuf with options is already very fast, and you may not need rejection sampling at all for basic uses.
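For comparison, the explicit rejection-sampling loop is also short in pure Bash (4+, for the associative array); as with the awk array, re-adding an existing key doesn't grow it, so duplicates are rejected implicitly:

```shell
declare -A seen=()                  # keys are the numbers drawn so far
while (( ${#seen[@]} < 6 )); do
    seen[$(( 1 + RANDOM % 48 ))]=1  # a duplicate just overwrites its key
done
echo "${!seen[@]}"                  # 6 distinct numbers between 1 and 48
```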
shuf -i 1-48 -n 6
will give you 6 random numbers between 1 and 48. All will be different per default.
This algorithm will roll six random numbers and check each of them to make sure there haven't been any duplicates.
declare -a nums
count=6
while (( ${#nums[@]} < count )); do
    temp=$(( 1 + RANDOM % 48 ))     # roll a candidate between 1 and 48
    dup=0
    for cur in "${nums[@]}"; do     # compare against every number kept so far
        if (( cur == temp )); then
            dup=1
            break
        fi
    done
    (( dup )) || nums+=("$temp")    # keep the candidate only if it is new
done
echo "${nums[@]}"

Splitting files by line across two files equally without pre-defined chunk length - Unix

I have two files of equal length (i.e. no. of lines):
text.en
text.cs
I want to incrementally split the files into 12 parts, and on each iteration add one more of the first ten parts to the training set.
Let's say the files contain 100 lines; I need some sort of loop that does:
#!/bin/bash
F1=text.en
F2=text.cs
for i in `seq 0 9`;
do
    split -n l/12 -d text.en
    cat x10 > dev.en
    cat x11 > test.en
    echo "" > train.en
    for j in `seq 0 $i`; do
        cat x0$j >> train.en
    done
    split -n l/12 -d text.cs
    cat x10 > dev.cs
    cat x11 > test.cs
    echo "" > train.cs
    for j in `seq 0 $i`; do
        cat x0$j >> train.cs
    done
    wc -l train.en train.cs
    echo "############"
done
[out]:
55632 train.en
55468 train.cs
111100 total
############
110703 train.en
110632 train.cs
221335 total
############
165795 train.en
165011 train.cs
330806 total
############
It's giving me unequal chunks between the files.
Also, when I use split, it's splitting into unequal chunks:
alvas@ubi:~/workspace/cvmt$ split -n l/12 -d text.en
alvas@ubi:~/workspace/cvmt$ wc -l x*
55631 x00
55071 x01
55092 x02
54350 x03
54570 x04
54114 x05
55061 x06
53432 x07
52685 x08
52443 x09
52074 x10
52082 x11
646605 total
I don't know the no. of lines of the file before hand, so I can't use the split -l option.
How do I split a file into equal size by no. of lines given that I don't know how many lines are there in the files beforehand? Should I do some sort of pre-calculation with wc -l?
How do I ensure that the split across two files are of equal size in for every chunk?
(Note that the solution needs to split the file at the end of the lines, i.e. don't split up any lines, just split the file by line).
It's not entirely clear what you're trying to achieve, but here are a few pointers:
split -n l/12 splits into 12 chunks of roughly equal byte size, not number of lines.
split -n r/12 will try to distribute the line count evenly, but if the chunk size is not a divisor of the total line count, you'll still get (slightly) varying line counts: the extra lines are distributed round-robin style.
E.g., with 100 input lines split into 12 chunks, you'll get line counts of 9, 9, 9, 9, 8, 8, 8, 8, 8, 8, 8, 8: 100 / 12 = 8 (integer division), and 100 % 12 = 4, so all files get at least 8 lines, with the extra 4 lines distributed among the first 4 output files.
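The round-robin distribution is easy to see with a synthetic 100-line file (GNU split assumed; 100 % 12 = 4, so the first four chunks get the extra line):

```shell
cd "$(mktemp -d)"           # scratch directory so the x?? files don't clutter
seq 100 > text.en           # stand-in input with exactly 100 lines
split -n r/12 -d text.en    # round-robin split into 12 chunks
wc -l x00 x03 x04 x11       # x00 and x03 have 9 lines; x04 and x11 have 8
```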
So, yes, if you want a fixed line count for all files (except for the last, if the chunk size is not a divisor), you must calculate the total line count up front, perform integer division to get the fixed line count, and use split -l with that count:
totalLines=$(wc -l < text.en)
linesPerFile=$(( totalLines / 12 ))
split -l "$linesPerFile" text.en # with 100 lines: 8 per file, i.e. 12 files with 8 lines and 1 with 4
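If you want at most 12 chunks rather than 12 full ones plus a small 13th remainder file, round the division up instead; a sketch with a synthetic 100-line file:

```shell
cd "$(mktemp -d)"                            # scratch directory
seq 100 > text.en                            # stand-in input, 100 lines
totalLines=$(wc -l < text.en)
linesPerFile=$(( (totalLines + 11) / 12 ))   # ceiling division: 100 -> 9
split -l "$linesPerFile" -d text.en
ls x* | wc -l                                # exactly 12 chunk files
```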
Additional observations:
With a small, fixed iteration count, it is easier and more efficient to use brace expansion (e.g., for i in {0..9} rather than for i in `seq 0 9`).
If a variable must be used, or with larger numbers, use an arithmetic expression:
n=9; for (( i = 0; i <= $n; i++ )); do ...; done
While you cannot do cat x0{0..$i} directly (because Bash doesn't support variables in brace expansions), you can emulate it by combining seq -f and xargs:
You can replace
echo "" > train.en
for j in `seq 0 $i`; do
cat x0$j >> train.en
done
with the following:
seq -f 'x%02.f' 0 "$i" | xargs cat > train.en
Since you control the value of $i, you could even simplify to:
eval "cat x0{0..$i}" > train.en # !! Only do this if you trust $i to contain a number.

Array intersection in Bash [duplicate]

This question already has answers here:
Intersection of two lists in Bash
(5 answers)
Closed 6 years ago.
How do you compare two arrays in Bash to find all intersecting values?
Let's say:
array1 contains values 1 and 2
array2 contains values 2 and 3
I should get back 2 as a result.
My own answer:
for item1 in $array1; do
for item2 in $array2; do
if [[ $item1 = $item2 ]]; then
result=$result" "$item1
fi
done
done
I'm looking for alternate solutions as well.
The elements of list1 are looked up in list2, expressed as a single string with framing blanks ( ${list2[*]} ):
list1=( 1 2 3 4 6 7 8 9 10 11 12)
list2=( 1 2 3 5 6 8 9 11 )
l2=" ${list2[*]} " # add framing blanks
for item in "${list1[@]}"; do
    if [[ $l2 =~ " $item " ]] ; then # the quoted pattern is matched as a literal substring
        result+=($item)
    fi
done
echo "${result[@]}"
The result is
1 2 3 6 8 9 11
Taking @Raihan's answer and making it work with non-files (though FDs are created).
I know it's a bit of a cheat, but it seemed like a good alternative.
A side effect is that the output array will be lexicographically sorted; hope that's okay.
(Also, I don't know what type of data you have, so I just tested with numbers; there may be additional work needed if you have strings with special chars etc.)
result=($(comm -12 <(for X in "${array1[@]}"; do echo "${X}"; done|sort) <(for X in "${array2[@]}"; do echo "${X}"; done|sort)))
Testing:
$ array1=(1 17 33 99 109)
$ array2=(1 2 17 31 98 109)
result=($(comm -12 <(for X in "${array1[@]}"; do echo "${X}"; done|sort) <(for X in "${array2[@]}"; do echo "${X}"; done|sort)))
$ echo ${result[@]}
1 109 17
p.s. I'm sure there was a way to get the array to output one value per line w/o the for loop, I just forget it (IFS?)
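For the record, the loop-free one-per-line expansion the p.s. is reaching for is printf, which reuses its format string for every remaining argument:

```shell
array1=(1 17 33 99 109)
array2=(1 2 17 31 98 109)
# printf '%s\n' prints each array element on its own line, no loop needed
result=($(comm -12 <(printf '%s\n' "${array1[@]}" | sort) <(printf '%s\n' "${array2[@]}" | sort)))
echo "${result[@]}"
```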
Your answer won't work, for two reasons:
$array1 just expands to the first element of array1. (At least, in my installed version of Bash that's how it works. That doesn't seem to be a documented behavior, so it may be a version-dependent quirk.)
After the first element gets added to result, result will then contain a space, so the next run of result=$result" "$item1 will misbehave horribly. (Instead of appending to result, it will run the command consisting of the first two items, with the environment variable result being set to the empty string.) Correction: Turns out, I was wrong about this one: word-splitting doesn't take place inside assignments. (See comments below.)
What you want is this:
result=()
for item1 in "${array1[@]}"; do
    for item2 in "${array2[@]}"; do
        if [[ $item1 = $item2 ]]; then
            result+=("$item1")
        fi
    done
done
If it was two files (instead of arrays) you were looking for intersecting lines, you could use the comm command.
$ comm -12 file1 file2
Now that I understand what you mean by "array", I think -- first of all -- that you should consider using actual Bash arrays. They're much more flexible, in that (for example) array elements can contain whitespace, and you can avoid the risk that * and ? will trigger filename expansion.
But if you prefer to use your existing approach of whitespace-delimited strings, then I agree with RHT's suggestion to use Perl:
result=$(perl -e 'my %array2 = map +($_ => 1), split /\s+/, $ARGV[1];
print join " ", grep $array2{$_}, split /\s+/, $ARGV[0]
' "$array1" "$array2")
(The line-breaks are just for readability; you can get rid of them if you want.)
In the above Bash command, the embedded Perl program creates a hash named %array2 containing the elements of the second array, and then it prints any elements of the first array that exist in %array2.
This will behave slightly differently from your code in how it handles duplicate values in the second array; in your code, if array1 contains x twice and array2 contains x three times, then result will contain x six times, whereas in my code, result will contain x only twice. I don't know if that matters, since I don't know your exact requirements.
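The same hash-lookup idea also works in pure Bash (4+), assuming whitespace-free elements; this is a hypothetical sketch, not part of the original answers:

```shell
array1=(1 2 5 7)
array2=(2 3 7 8)
declare -A in2=()                   # mark every element of array2
for x in "${array2[@]}"; do in2[$x]=1; done
result=()                           # keep array1 elements that are marked
for x in "${array1[@]}"; do [[ ${in2[$x]-} ]] && result+=("$x"); done
echo "${result[@]}"                 # 2 7
```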

KornShell Sort Array of Integers

Is there a command in KornShell (ksh) scripting to sort an array of integers? In this specific case, I am interested in simplicity over efficiency. For example if the variable $UNSORTED_ARR contained values "100911, 111228, 090822" and I wanted to store the result in $SORTED_ARR
Is it actually an indexed array or a list in a string?
Array:
UNSORTED_ARR=(100911 111228 090822)
SORTED_ARR=($(printf "%s\n" ${UNSORTED_ARR[@]} | sort -n))
String:
UNSORTED_ARR="100911, 111228, 090822"
SORTED_ARR=$(IFS=, printf "%s\n" ${UNSORTED_ARR[@]} | sort -n | sed ':a;$s/\n/,/g;N;ba')
There are several other ways to do this, but the principle is the same.
Here's another way for a string using a different technique:
set -s -- ${UNSORTED_ARR//,}
SORTED_ARR=$@
SORTED_ARR=${SORTED_ARR// /, }
Note that this is a lexicographic sort so you would see this kind of thing when the numbers don't have leading zeros:
$ set -s -- 10 2 1 100 20
$ echo $@
1 10 100 2 20
If I take that out then it works, but I can't loop through it (because it's a list of strings now) – pws5068 Mar 4 '11 at 21:01
Do this:
# create sorted array (ksh: set -s sorts the values, -A assigns them to the named array)
set -s -A SORTED_ARR "${UNSORTED_ARR[@]}"
