Simple method to shuffle the elements of an array in BASH shell? - bash

I can do this in PHP but am trying to work within the BASH shell. I need to take an array and then randomly shuffle the contents and dump that to somefile.txt.
So given array Heresmyarray, of elements a;b;c;d;e;f; it would produce an output file, output.txt, which would contain elements f;c;b;a;e;d;
The elements need to retain the semicolon delimiter. I've seen a number of bash shell array operations but nothing that seems even close to this simple concept. Thanks for any help or suggestions!

The accepted answer doesn't match the headline question too well, though the details in the question are a bit ambiguous. The question asks about how to shuffle elements of an array in BASH, and kurumi's answer shows a way to manipulate the contents of a string.
kurumi nonetheless makes good use of the 'shuf' command, while siegeX shows how to work with an array.
Putting the two together yields an actual "simple method to shuffle the elements of an array in BASH shell":
$ myarray=( 'a;' 'b;' 'c;' 'd;' 'e;' 'f;' )
$ myarray=( $(shuf -e "${myarray[#]}") )
$ printf "%s" "${myarray[#]}"
d;b;e;a;c;f;

From the BashFaq
This function shuffles the elements of an array in-place using the Knuth-Fisher-Yates shuffle algorithm.
#!/bin/bash
shuffle() {
local i tmp size max rand
# $RANDOM % (i+1) is biased because of the limited range of $RANDOM
# Compensate by using a range which is a multiple of the array size.
size=${#array[*]}
max=$(( 32768 / size * size ))
for ((i=size-1; i>0; i--)); do
while (( (rand=$RANDOM) >= max )); do :; done
rand=$(( rand % (i+1) ))
tmp=${array[i]} array[i]=${array[rand]} array[rand]=$tmp
done
}
# Define the array named 'array'
array=( 'a;' 'b;' 'c;' 'd;' 'e;' 'f;' )
shuffle
printf "%s" "${array[#]}"
Output
$ ./shuff_ar > somefile.txt
$ cat somefile.txt
b;c;e;f;d;a;

If you just want to put them into a file (use redirection > )
$ echo "a;b;c;d;e;f;" | sed -r 's/(.[^;]*;)/ \1 /g' | tr " " "\n" | shuf | tr -d "\n"
d;a;e;f;b;c;
$ echo "a;b;c;d;e;f;" | sed -r 's/(.[^;]*;)/ \1 /g' | tr " " "\n" | shuf | tr -d "\n" > output.txt
If you want to put the items in array
$ array=( $(echo "a;b;c;d;e;f;" | sed -r 's/(.[^;]*;)/ \1 /g' | tr " " "\n" | shuf | tr -d " " ) )
$ echo ${array[0]}
e;
$ echo ${array[1]}
d;
$ echo ${array[2]}
a;
If your data has &#abcde;
$ echo "a;&#abcde;c;d;e;f;" | sed -r 's/(.[^;]*;)/ \1 /g' | tr " " "\n" | shuf | tr -d "\n"
d;c;f;&#abcde;e;a;
$ echo "a;&#abcde;c;d;e;f;" | sed -r 's/(.[^;]*;)/ \1 /g' | tr " " "\n" | shuf | tr -d "\n"
&#abcde;f;a;c;d;e;

Related

How to turn a string into a modified hex representation?

i want to turn a string like
AaAa
into
a string like this
%<41>%<61>%<41>%<61>
Simple enough with the programming languages i am familar with, but with bash i can't get get the piping right to do what i am trying to do:
split string into char array
turn each char into hex
wrap each hex value into %<FF>
concat string
this is my current way which gets me half way there:
echo -n "AaAa" | od -A n -t x1
If you are already using od,
printf "%%<%s>" $(od -A n -t x1<<<"AaAa")
For an all-bash without od,
while read -r -N 1 c; do printf "%%<%02X>" "$( printf "%d" \'$c )"; done <<< AaAa
The downside of this approach is that it spawns a subshell for every character, and assumes ASCII/UTF8.
edit
#Shawn pointed out that you don't need the subshell -
while read -r -N 1 c; do printf "%%<%02X>" \'$c; done <<< AaAa
I noticed that these are leaving the string terminator in your output, though, and realized I could eliminate that and the read by assigning the data to a variable and using the built-in parsing tools.
$: x=AaAa && for((i=0;i<${#x};i++)); do printf "%%<%02X>" \'${x:i:1}; done; echo
%<41>%<61>%<41>%<61>
A simple Perl substitution would do the trick:
echo -n AaAa | perl -pe's/(.)/ sprintf "%%<%02X>", ord($1) /seg'
Shorter:
echo -n AaAa | perl -ne'printf "%%<%02X>", $_ for unpack "C*"'
In both cases, the output is the expected
%<41>%<61>%<41>%<61>
(No trailing line feed added. If you want one, append ; END { print "\n" }.)
You can pipe to sed to wrap each byte in %<> and then remove the whitespace.
echo -n "AaAa" | od -A n -t x1 | sed -E -e 's/[a-z0-9]+/%<&>/g' -e 's/ //g'
You could use perl:
echo -n AaAa | perl -ne 'for $c (split//) { printf("%%<%02X>", ord($c)); }'
Output
%<41>%<61>%<41>%<61>
Maybe awk
echo -n "AaAa" |
od -A n -t x1 |
awk 'BEGIN { ORS = "" } { for (i = 1; i <= NF; i+=1) print "%<"$i">"}'

Bash: Split two strings directly into associative array

I have two strings of same number of substrings divided by a delimiter.
I need to create key-value pairs from substrings.
Short example:
Input:
firstString='00011010:00011101:00100001'
secondString='H:K:O'
delimiter=':'
Desired result:
${translateMap['00011010']} -> 'H'
${translateMap['00011101']} -> 'K'
${translateMap['00100001']} -> 'O'
So, I wrote:
IFS="$delimiter" read -ra fromArray <<< "$firstString"
IFS="$delimiter" read -ra toArray <<< "$secondString"
declare -A translateMap
curIndex=0
for from in "${fromArray[#]}"; do
translateMap["$from"]="${toArray[curIndex]}"
((curIndex++))
done
Is there any way to create the associative array directly from 2 strings without the unneeded arrays and loop? Something like:
IFS="$delimiter" read -rA translateMap["$(read -ra <<< "$firstString")"] <<< "$secondString"
Is it possible?
A (somewhat convoluted) variation on #accdias's answer of assigning the values via the declare -A command, but will need a bit of explanation for each step ...
First we need to break the 2 variables into separate lines for each item:
$ echo "${firstString}" | tr "${delimiter}" '\n'
00011010
00011101
00100001
$ echo "${secondString}" | tr "${delimiter}" '\n'
H
K
O
What's nice about this is that we can now process these 2 sets of key/value pairs as separate files.
NOTE: For the rest off this discussion I'm going to replace "${delimiter}" with ':' to make this a tad bit (but not much) less convoluted.
Next we make use of the paste command to merge our 2 'files' into a single file; we'll also designate ']' as the delimiter between key/value mappings:
$ paste -d ']' <(echo "${firstString}" | tr ':' '\n') <(echo "${secondString}" | tr ':' '\n')
00011010]H
00011101]K
00100001]O
We'll now run these results through a couple sed patterns to build our array assignments:
$ paste -d ']' <(echo "${firstString}" | tr ':' '\n') <(echo "${secondString}" | tr ':' '\n') | sed 's/^/[/g;s/]/]=/g'
[00011010]=H
[00011101]=K
[00100001]=O
What we'd like to do now is use this output in the typeset -A command but unfortunately we need to build the entire command and then eval it:
$ evalstring="typeset -A kv=( "$(paste -d ']' <(echo "${firstString}" | tr ':' '\n') <(echo "${secondString}" | tr ':' '\n') | sed 's/^/[/g;s/]/]=/g')" )"
$ echo "$evalstring"
typeset -A kv=( [00011010]=H
[00011101]=K
[00100001]=O )
If we want to remove the carriage returns and put on a single line we append another tr at the output from the sed command:
$ evalstring="typeset -A kv=( "$(paste -d ']' <(echo "${firstString}" | tr ':' '\n') <(echo "${secondString}" | tr ':' '\n') | sed 's/^/[/g;s/]/]=/g' | tr '\n' ' ')" )"
$ cat "${evalstring}"
typeset -A kv=( [00011010]=H [00011101]=K [00100001]=O )
At this point we can eval our auto-generated typeset -A command:
$ eval "${evalstring}"
And now loop through our array displaying the key/value pairs:
$ for i in ${!kv[#]}; do echo "kv[${i}] = ${kv[${i}]}"; done
kv[00011010] = H
kv[00100001] = O
kv[00011101] = K
Hey, I did say this would be a bit convoluted! :-)
It is probably not what you expect, but this works:
key_string="A:B:C:D"
val_string="1:2:3:4"
declare -A map
while [ -n "$key_string" ] && [ -n "$val_string" ]; do
IFS=: read -r key key_string <<<"$key_string"
IFS=: read -r val val_string <<<"$val_string"
map[$key]="$val"
done
for key in "${!map[#]}"; do echo "$key => ${map[$key]}"; done
It uses recursion in the read function to reassign the string value.
The downside of this method is that it destroys the original strings. The while-loop checks constantly if both strings have a non-zero length.
Next to the above in pure bash, you could any command to generate the associative array. See How do I populate a bash associative array with command output?
This generally looks like:
declare -A map="( $( magic_command ) )"
where the magic_command generates an output like
[key1]=val1
[key2]=val2
[key3]=val3
In this case we use the command:
paste -d "" <(echo "[${key_string//:/]=$'\n'[}]=") \
<(echo "${val_string//:/$'\n'}")
where we use bash substitution to replace the delimiter with a newline. However, any other magic_command might do. For completion:
key_string="A:B:C:D"
val_string="1:2:3:4"
declare -A map="( $(paste -d "" <(echo "[${key_string//:/]=$'\n'[}]=") \
<(echo "${val_string//:/$'\n'}")) )"
for key in "${!map[#]}"; do echo "$key => ${map[$key]}"; done
Both examples generate the following output
D => 4
C => 3
B => 2
A => 1
Not exactly the answer for what you asked but at least it is shorter:
key='00011010:00011101:00100001'
value='H:K:O'
ifs=':'
IFS="$ifs" read -ra keys <<< "$key"
IFS="$ifs" read -ra values <<< "$value"
declare -A kv
for ((i=0; i<${#keys[*]}; i++)); do
kv[${keys[i]}]=${values[i]}
done
As a side note, you can initialize an associative array in one step with:
declare -A kv=([key1]=value1 [key2]=value2 [keyn]=valuen)
But I don't know how to use that in your case.
If values in your strings won't use spaces i would suggest this approach
firstString='00011010:00011101:00100001'
secondString='H:K:O'
delimiter=':'
declare -A translateMap
firstArray=( ${firstString//$delimiter/' '} )
secondArray=( ${secondString//$delimiter/' '} )
for i in ${!firstArray[#]}; {
translateMap[firstArray[$i]}]=${secondArray[$i]}
}

convert factor to numeric in bash

What's the most efficient way to convert a factor vector (not all levels are unique) into a numeric vector in bash? The values in the numeric vector do not matter as long as each represents a unique level of the factor.
To illustrate, this would be the R equivalent to what I want to do in bash:
numeric<-seq_along(levels(factor))[factor]
I.e.:
factor
AV1019A
ABG1787
AV1019A
B77hhA
B77hhA
numeric
1
2
1
3
3
Many thanks.
It is most probably not the most efficient, but maybe something to start.
#!/bin/bash
input_data=$( mktemp )
map_file=$( mktemp )
# your example written to a file
echo -e "AV1019A\nABG1787\nAV1019A\nB77hhA\nB77hhA" >> $input_data
# create a map <numeric, factor> and write to file
idx=0
for factor in $( cat $input_data | sort -u )
do
echo $idx $factor
let idx=$idx+1
done > $map_file
# go through your file again and replace values with keys
while read line
do
key=$( cat $map_file | grep -e ".* ${line}$" | awk '{print $1}' )
echo $key
done < $input_data
# cleanup
rm -f $input_data $map_file
I initially wanted to use associative arrays, but it's a bash 4+ feature only and not available here and there. If you have bash 4 then you have one file less, which is obviously more efficient.
#!/bin/bash
# your example written to a file
input_data=$( mktemp )
echo -e "AV1019A\nABG1787\nAV1019A\nB77hhA\nB77hhA" >> $input_data
# declare an array
declare -a factor_map=($( cat $input_data | sort -u | tr "\n" " " ))
# go through your file replace values with keys
while read line
do
echo ${factor_map[#]/$line//} | cut -d/ -f1 | wc -w | tr -d ' '
done < $input_data
# cleanup
rm -f $input_data

How to sort and get unique values from an array in bash?

Im new to bash scripting... Im trying to sort and store unique values from an array into another array.
eg:
list=('a','b','b','b','c','c');
I need,
unique_sorted_list=('b','c','a')
I tried a couple of things, didnt help me ..
sorted_ids=($(for v in "${ids[#]}"; do echo "$v";done| sort| uniq| xargs))
or
sorted_ids=$(echo "${ids[#]}" | tr ' ' '\n' | sort -u | tr '\n' ' ')
Can you guys please help me in this ....
Try:
$ list=(a b b b c c)
$ unique_sorted_list=($(printf "%s\n" "${list[#]}" | sort -u))
$ echo "${unique_sorted_list[#]}"
a b c
Update based on comments:
$ uniq=($(printf "%s\n" "${list[#]}" | sort | uniq -c | sort -rnk1 | awk '{ print $2 }'))
The accepted answer doesn't work if array elements contain spaces.
Try this instead:
readarray -t unique_sorted_list < <( printf "%s\n" "${list[#]}" | sort -u )
In Bash, readarray is an alias to the built-in mapfile command. See help mapfile for details.
The -t option is to remove the trailing newline (used in printf) from each line read.

How to split a string in shell and get the last field

Suppose I have the string 1:2:3:4:5 and I want to get its last field (5 in this case). How do I do that using Bash? I tried cut, but I don't know how to specify the last field with -f.
You can use string operators:
$ foo=1:2:3:4:5
$ echo ${foo##*:}
5
This trims everything from the front until a ':', greedily.
${foo <-- from variable foo
## <-- greedy front trim
* <-- matches anything
: <-- until the last ':'
}
Another way is to reverse before and after cut:
$ echo ab:cd:ef | rev | cut -d: -f1 | rev
ef
This makes it very easy to get the last but one field, or any range of fields numbered from the end.
It's difficult to get the last field using cut, but here are some solutions in awk and perl
echo 1:2:3:4:5 | awk -F: '{print $NF}'
echo 1:2:3:4:5 | perl -F: -wane 'print $F[-1]'
Assuming fairly simple usage (no escaping of the delimiter, for example), you can use grep:
$ echo "1:2:3:4:5" | grep -oE "[^:]+$"
5
Breakdown - find all the characters not the delimiter ([^:]) at the end of the line ($). -o only prints the matching part.
You could try something like this if you want to use cut:
echo "1:2:3:4:5" | cut -d ":" -f5
You can also use grep try like this :
echo " 1:2:3:4:5" | grep -o '[^:]*$'
One way:
var1="1:2:3:4:5"
var2=${var1##*:}
Another, using an array:
var1="1:2:3:4:5"
saveIFS=$IFS
IFS=":"
var2=($var1)
IFS=$saveIFS
var2=${var2[#]: -1}
Yet another with an array:
var1="1:2:3:4:5"
saveIFS=$IFS
IFS=":"
var2=($var1)
IFS=$saveIFS
count=${#var2[#]}
var2=${var2[$count-1]}
Using Bash (version >= 3.2) regular expressions:
var1="1:2:3:4:5"
[[ $var1 =~ :([^:]*)$ ]]
var2=${BASH_REMATCH[1]}
$ echo "a b c d e" | tr ' ' '\n' | tail -1
e
Simply translate the delimiter into a newline and choose the last entry with tail -1.
Using sed:
$ echo '1:2:3:4:5' | sed 's/.*://' # => 5
$ echo '' | sed 's/.*://' # => (empty)
$ echo ':' | sed 's/.*://' # => (empty)
$ echo ':b' | sed 's/.*://' # => b
$ echo '::c' | sed 's/.*://' # => c
$ echo 'a' | sed 's/.*://' # => a
$ echo 'a:' | sed 's/.*://' # => (empty)
$ echo 'a:b' | sed 's/.*://' # => b
$ echo 'a::c' | sed 's/.*://' # => c
There are many good answers here, but still I want to share this one using basename :
basename $(echo "a:b:c:d:e" | tr ':' '/')
However it will fail if there are already some '/' in your string.
If slash / is your delimiter then you just have to (and should) use basename.
It's not the best answer but it just shows how you can be creative using bash commands.
If your last field is a single character, you could do this:
a="1:2:3:4:5"
echo ${a: -1}
echo ${a:(-1)}
Check string manipulation in bash.
Using Bash.
$ var1="1:2:3:4:0"
$ IFS=":"
$ set -- $var1
$ eval echo \$${#}
0
echo "a:b:c:d:e"|xargs -d : -n1|tail -1
First use xargs split it using ":",-n1 means every line only have one part.Then,pring the last part.
Regex matching in sed is greedy (always goes to the last occurrence), which you can use to your advantage here:
$ foo=1:2:3:4:5
$ echo ${foo} | sed "s/.*://"
5
A solution using the read builtin:
IFS=':' read -a fields <<< "1:2:3:4:5"
echo "${fields[4]}"
Or, to make it more generic:
echo "${fields[-1]}" # prints the last item
for x in `echo $str | tr ";" "\n"`; do echo $x; done
improving from #mateusz-piotrowski and #user3133260 answer,
echo "a:b:c:d::e:: ::" | tr ':' ' ' | xargs | tr ' ' '\n' | tail -1
first, tr ':' ' ' -> replace ':' with whitespace
then, trim with xargs
after that, tr ' ' '\n' -> replace remained whitespace to newline
lastly, tail -1 -> get the last string
For those that comfortable with Python, https://github.com/Russell91/pythonpy is a nice choice to solve this problem.
$ echo "a:b:c:d:e" | py -x 'x.split(":")[-1]'
From the pythonpy help: -x treat each row of stdin as x.
With that tool, it is easy to write python code that gets applied to the input.
Edit (Dec 2020):
Pythonpy is no longer online.
Here is an alternative:
$ echo "a:b:c:d:e" | python -c 'import sys; sys.stdout.write(sys.stdin.read().split(":")[-1])'
it contains more boilerplate code (i.e. sys.stdout.read/write) but requires only std libraries from python.

Resources