How to sort and get unique values from an array in bash? - bash

I'm new to bash scripting... I'm trying to sort an array and store its unique values in another array.
eg:
list=('a','b','b','b','c','c');
I need,
unique_sorted_list=('b','c','a')
I tried a couple of things, but they didn't help:
sorted_ids=($(for v in "${ids[@]}"; do echo "$v"; done | sort | uniq | xargs))
or
sorted_ids=$(echo "${ids[@]}" | tr ' ' '\n' | sort -u | tr '\n' ' ')
Can you please help me with this?

Try:
$ list=(a b b b c c)
$ unique_sorted_list=($(printf "%s\n" "${list[@]}" | sort -u))
$ echo "${unique_sorted_list[@]}"
a b c
Update based on comments:
$ uniq=($(printf "%s\n" "${list[@]}" | sort | uniq -c | sort -rnk1 | awk '{ print $2 }'))
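This frequency-ordered variant can be checked quickly with the list from the question; note that the asker's desired output ('b','c','a') turns out to be exactly descending-count order:

```shell
list=(a b b b c c)
# Count each value, sort by count descending, keep only the value column.
uniq=($(printf "%s\n" "${list[@]}" | sort | uniq -c | sort -rnk1 | awk '{ print $2 }'))
echo "${uniq[@]}"   # b c a
```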

The accepted answer doesn't work if array elements contain spaces.
Try this instead:
readarray -t unique_sorted_list < <( printf "%s\n" "${list[@]}" | sort -u )
In Bash, readarray is an alias to the built-in mapfile command. See help mapfile for details.
The -t option removes the trailing newline (added by printf) from each line read.
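To see why this matters, here is a quick check with elements that contain spaces (the fruit names are just placeholder data):

```shell
list=('red apple' 'banana' 'red apple' 'cherry' 'banana')
# readarray keeps each line as one element, so spaces survive intact.
readarray -t unique_sorted_list < <( printf "%s\n" "${list[@]}" | sort -u )
printf '<%s>\n' "${unique_sorted_list[@]}"
# <banana>
# <cherry>
# <red apple>
```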

Related

Split pipe into two and paste the results together?

I want to pipe the output of the command into two commands and paste the results together. I found this answer and similar ones suggesting using tee but I'm not sure how to make it work as I'd like it to.
My problem (simplified):
Say that I have a myfile.txt with keys and values, e.g.
key1 /path/to/file1
key2 /path/to/file2
What I am doing right now is
paste \
<( cat myfile.txt | cut -f1 ) \
<( cat myfile.txt | cut -f2 | xargs wc -l )
and it produces
key1 23
key2 42
The problem is that cat myfile.txt is repeated here (in the real problem it's a heavier operation). Instead, I'd like to do something like
cat myfile.txt | tee \
<( cut -f1 ) \
<( cut -f2 | xargs wc -l ) \
| paste
But it doesn't produce the expected output. Is it possible to do something similar to the above with pipes and standard command-line tools?
This doesn't answer your question about pipes, but you can use AWK to solve your problem:
$ printf %s\\n 1 2 3 > file1.txt
$ printf %s\\n 1 2 3 4 5 > file2.txt
$ cat > myfile.txt <<EOF
key1 file1.txt
key2 file2.txt
EOF
$ cat myfile.txt | awk '{ ("wc -l " $2) | getline size; sub(/ .+$/,"",size); print $1, size }'
key1 3
key2 5
On each line we first run wc -l on the file named in $2 and save the result into a variable. Not sure about yours, but on my system wc -l includes the filename in the output, so we strip it with sub() to match your example output. Finally, we print the $1 field (the key) and the size we got from the wc -l command.
Also, can be done with shell, now that I think about it:
cat myfile.txt | while read -r key value; do
    printf '%s %s\n' "$key" "$(wc -l "$value" | cut -d' ' -f1)"
done
Or more generally, by piping to two commands and using paste, therefore answering the question:
cat myfile.txt | while read -r line; do
    printf %s "$line" | cut -f1
    printf %s "$line" | cut -f2 | xargs wc -l | cut -d' ' -f1
done | paste - -
P.S. The use of cat here is useless, I know. But it's just a placeholder for the real command.
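A small refinement of the shell loop above (my own sketch, not from the original answers): redirecting the file into wc -l makes it read stdin, so no filename is printed and the cut becomes unnecessary; wrapping the result in $(( )) also strips the padding that some wc implementations add.

```shell
# Assumes myfile.txt holds "key path" pairs, as in the question.
while read -r key file; do
    printf '%s %s\n' "$key" "$(( $(wc -l < "$file") ))"
done < myfile.txt
```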

Creating an associative array from grep results in bash

I am using the following loc in a bash script to return unique grep result strings and their counts:
hitStrings="$(eval 'find "$DIR" -type f -print0 | xargs -0 grep -roh "\w*$searchWord\w*"' | sort | uniq -c)"
For example, if I have a $searchWord of "you", I could get the following results:
      5 Kougyou
      2 Layout
     10 layouts
   2330 you
    859 your
     17 yourself
My questions are:
How do I create an associative array containing the returned strings as keys and their counts as values?
How do I omit the initial searchWord and its count from the associative array (so no entry for "you" when I search for "you")?
Thanks
You have too many unnecessary layers; you can achieve the same with
$ grep -roh "\w*$key\w*" | sort | uniq -c > counts
and
$ declare -A counts; while read -r v k; do counts[$k]=$v; done < counts
$ echo ${counts["you"]}
Note that depending on the usage, you may be able to get away with using the file itself. Again, looking up the count for "you" from the file:
$ awk -v key="you" '$2==key{print $1}' counts
If using the same name for the file and the array confuses you, change one of them, or remove the temp file altogether with process substitution.
$ declare -A counts
$ while read -r v k; do counts[$k]=$v; done < <(grep -roh "\w*$key\w*" | sort | uniq -c)
or with evil eval you can do
$ eval declare -A counts=( $(grep -roh "\w*$key\w*" | sort | uniq -c | awk '{print "["$2"]="$1}') )
but why? The while loop is a perfectly fine solution.
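The second part of the question (dropping the search word itself) takes only a small tweak to that while loop; this is a sketch reusing the question's $DIR and $searchWord names:

```shell
searchWord='you'
declare -A counts
while read -r v k; do
    [[ $k == "$searchWord" ]] && continue   # omit the bare search word itself
    counts[$k]=$v
done < <(grep -roh "\w*$searchWord\w*" "$DIR" | sort | uniq -c)
```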

shell - Characters contained in both strings - edited

I want to compare two string variables and print the characters that are the same in both. I'm not really sure how to do this. I was thinking of using comm or diff, but I'm not sure of the right parameters to print only the matching characters; also, they take files as input, and these are strings. Can anyone help?
Input:
a=$(echo "abghrsy")
b=$(echo "cgmnorstuvz")
Output:
"grs"
You don't need to do that much work to assign $a and $b shell variables, you can just...
a=abghrsy
b=cdgmrstuvz
Now, there is a classic computer science problem called the longest common subsequence [1] that is similar to yours.
However, if you just want the common characters, one way would let Ruby do the work...
$ ruby -e "puts ('$a'.chars.to_a & '$b'.chars.to_a).join"
[1] Not to be confused with the different longest common substring problem.
Use Character Classes with GNU Grep
This isn't a widely-applicable solution, but it fits your particular use case quite well. The idea is to use the first variable as a character class to match against the second string. For example:
a='abghrsy'
b='cgmnorstuvz'
echo "$b" | grep --only-matching "[$a]" | xargs | tr --delete ' '
This produces grs as you expect. Note that the use of xargs and tr is simply to remove the newlines and spaces from the output; you can certainly handle this some other way if you prefer. Also be aware that ], ^, and - have special meaning inside a bracket expression, so this breaks if $a contains any of them.
Set Intersection
What you're really looking for is a set intersection, though. While you can "wing it" in the shell, you'd be better off using a language like Ruby, Python, or Perl to do this.
A Ruby One-Liner
If you need to integrate with an existing shell script, a simple Ruby one-liner that uses Bash variables could be called like this inside your current script:
a='abghrsy'
b='cgmnorstuvz'
ruby -e "puts ('$a'.split(//) & '$b'.split(//)).join"
A Ruby Script
You could certainly make things more elegant by doing the whole thing in Ruby instead.
string1_chars = 'abghrsy'.split //
string2_chars = 'cgmnorstuvz'.split //
intersection = string1_chars & string2_chars
puts intersection.join
This certainly seems more readable and robust to me, but your mileage may vary. At least now you have some options to choose from.
Nice question +1.
You can use an awk trick to get this done.
a=abghrsy
b=cdgmrstuvz
comm -12 <(echo $a|awk -F"\0" '{for (i=1; i<=NF; i++) print $i}') <(echo $b|awk -F"\0" '{for (i=1; i<=NF; i++) print $i}')|tr -d '\n'
OUTPUT:
grs
Note the use of awk -F"\0", which breaks the input string character by character into separate awk fields. The rest is pretty straightforward use of comm and tr.
PS: If your input string is not sorted, then you need to pipe awk's output to sort, or sort an array inside awk.
UPDATE: awk only solution (without comm):
echo "$a;$b" | awk -F"\0" '{scnd=0; for (i=1; i<=NF; i++) {if ($i!=";") {if (!scnd) arr1[$i]=$i; else if ($i in arr1) arr2[$i]=$i} else scnd=1}} END { for (a in arr2) printf("%s", a)}'
This assumes semicolon doesn't appear in your string (you can use any other character if that's not the case).
UPDATE 2: I think simplest solution is using grep -o
(thanks to the answer from @CodeGnome)
echo "$b" | grep -o "[$a]" | tr -d '\n'
Using GNU coreutils (inspired by @DigitalRoss)..
a="abghrsy"
b="cgmnorstuvz"
echo "$(comm -12 <(echo "$a" | fold -w1 | sort | uniq) <(echo "$b" | fold -w1 | sort | uniq) | tr -d '\n')"
will print grs. I assumed you only want unique characters.
UPDATE:
Modified for dash..
#!/bin/dash
string1=$(printf '%s' "$1" | fold -w1 | sort | uniq | tr -d '\n')
string2=$(printf '%s' "$2" | fold -w1 | sort | uniq | tr -d '\n')
while [ "$string1" != "" ]; do
    c1=$(printf '%s\n' "$string1" | cut -c 1-1)
    string2=$(printf '%s' "$2" | fold -w1 | sort | uniq | tr -d '\n')
    while [ "$string2" != "" ]; do
        c2=$(printf '%s\n' "$string2" | cut -c 1-1)
        if [ "$c1" = "$c2" ]; then
            printf '%s' "$c1"
        fi
        string2=$(printf '%s\n' "$string2" | cut -c 2-)
    done
    string1=$(printf '%s\n' "$string1" | cut -c 2-)
done
echo
Note: I am just a beginner. There might be a better way of doing this.
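For comparison, the same intersection can be done in pure Bash with no external commands at all (my own sketch; it preserves the order of the first string and drops duplicates):

```shell
a=abghrsy
b=cgmnorstuvz
out=
for ((i = 0; i < ${#a}; i++)); do
    c=${a:i:1}
    # Keep the character if it occurs in b and was not collected already.
    [[ $b == *"$c"* && $out != *"$c"* ]] && out+=$c
done
echo "$out"   # grs
```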

Simple method to shuffle the elements of an array in BASH shell?

I can do this in PHP but am trying to work within the BASH shell. I need to take an array and then randomly shuffle the contents and dump that to somefile.txt.
So given array Heresmyarray, of elements a;b;c;d;e;f; it would produce an output file, output.txt, which would contain elements f;c;b;a;e;d;
The elements need to retain the semicolon delimiter. I've seen a number of bash shell array operations but nothing that seems even close to this simple concept. Thanks for any help or suggestions!
The accepted answer doesn't match the headline question too well, though the details in the question are a bit ambiguous. The question asks about how to shuffle elements of an array in BASH, and kurumi's answer shows a way to manipulate the contents of a string.
kurumi nonetheless makes good use of the 'shuf' command, while siegeX shows how to work with an array.
Putting the two together yields an actual "simple method to shuffle the elements of an array in BASH shell":
$ myarray=( 'a;' 'b;' 'c;' 'd;' 'e;' 'f;' )
$ myarray=( $(shuf -e "${myarray[@]}") )
$ printf "%s" "${myarray[@]}"
d;b;e;a;c;f;
From the BashFaq
This function shuffles the elements of an array in-place using the Knuth-Fisher-Yates shuffle algorithm.
#!/bin/bash
shuffle() {
    local i tmp size max rand
    # $RANDOM % (i+1) is biased because of the limited range of $RANDOM
    # Compensate by using a range which is a multiple of the array size.
    size=${#array[*]}
    max=$(( 32768 / size * size ))
    for ((i=size-1; i>0; i--)); do
        while (( (rand=$RANDOM) >= max )); do :; done
        rand=$(( rand % (i+1) ))
        tmp=${array[i]} array[i]=${array[rand]} array[rand]=$tmp
    done
}
# Define the array named 'array'
array=( 'a;' 'b;' 'c;' 'd;' 'e;' 'f;' )
shuffle
printf "%s" "${array[@]}"
Output
$ ./shuff_ar > somefile.txt
$ cat somefile.txt
b;c;e;f;d;a;
If you just want to put them into a file (use redirection > )
$ echo "a;b;c;d;e;f;" | sed -r 's/(.[^;]*;)/ \1 /g' | tr " " "\n" | shuf | tr -d "\n"
d;a;e;f;b;c;
$ echo "a;b;c;d;e;f;" | sed -r 's/(.[^;]*;)/ \1 /g' | tr " " "\n" | shuf | tr -d "\n" > output.txt
If you want to put the items in array
$ array=( $(echo "a;b;c;d;e;f;" | sed -r 's/(.[^;]*;)/ \1 /g' | tr " " "\n" | shuf | tr -d " " ) )
$ echo ${array[0]}
e;
$ echo ${array[1]}
d;
$ echo ${array[2]}
a;
If your data has &#abcde;
$ echo "a;&#abcde;c;d;e;f;" | sed -r 's/(.[^;]*;)/ \1 /g' | tr " " "\n" | shuf | tr -d "\n"
d;c;f;&#abcde;e;a;
$ echo "a;&#abcde;c;d;e;f;" | sed -r 's/(.[^;]*;)/ \1 /g' | tr " " "\n" | shuf | tr -d "\n"
&#abcde;f;a;c;d;e;

How to split a string in shell and get the last field

Suppose I have the string 1:2:3:4:5 and I want to get its last field (5 in this case). How do I do that using Bash? I tried cut, but I don't know how to specify the last field with -f.
You can use string operators:
$ foo=1:2:3:4:5
$ echo ${foo##*:}
5
This trims everything from the front until a ':', greedily.
${foo <-- from variable foo
## <-- greedy front trim
* <-- matches anything
: <-- until the last ':'
}
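The same syntax family covers the other combinations as well; here is a quick reference (these are standard Bash parameter expansions, not specific to this answer):

```shell
foo=1:2:3:4:5
echo "${foo##*:}"   # 5        ## strips the longest prefix match (through the last ':')
echo "${foo#*:}"    # 2:3:4:5  #  strips the shortest prefix match (through the first ':')
echo "${foo%%:*}"   # 1        %% strips the longest suffix match
echo "${foo%:*}"    # 1:2:3:4  %  strips the shortest suffix match
```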
Another way is to reverse before and after cut:
$ echo ab:cd:ef | rev | cut -d: -f1 | rev
ef
This makes it very easy to get the last but one field, or any range of fields numbered from the end.
It's difficult to get the last field using cut, but here are some solutions in awk and perl
echo 1:2:3:4:5 | awk -F: '{print $NF}'
echo 1:2:3:4:5 | perl -F: -wane 'print $F[-1]'
Assuming fairly simple usage (no escaping of the delimiter, for example), you can use grep:
$ echo "1:2:3:4:5" | grep -oE "[^:]+$"
5
Breakdown - find all the characters not the delimiter ([^:]) at the end of the line ($). -o only prints the matching part.
You could try something like this if you want to use cut:
echo "1:2:3:4:5" | cut -d ":" -f5
Note, though, that this only works if you already know how many fields there are.
You can also use grep, like this:
echo "1:2:3:4:5" | grep -o '[^:]*$'
One way:
var1="1:2:3:4:5"
var2=${var1##*:}
Another, using an array:
var1="1:2:3:4:5"
saveIFS=$IFS
IFS=":"
var2=($var1)
IFS=$saveIFS
var2=${var2[@]: -1}
Yet another with an array:
var1="1:2:3:4:5"
saveIFS=$IFS
IFS=":"
var2=($var1)
IFS=$saveIFS
count=${#var2[@]}
var2=${var2[$count-1]}
Using Bash (version >= 3.2) regular expressions:
var1="1:2:3:4:5"
[[ $var1 =~ :([^:]*)$ ]]
var2=${BASH_REMATCH[1]}
Note that if the string contains no ':' at all, the match fails and [[ ]] returns non-zero, so check its exit status before trusting BASH_REMATCH.
$ echo "a b c d e" | tr ' ' '\n' | tail -1
e
Simply translate the delimiter into a newline and choose the last entry with tail -1.
Using sed:
$ echo '1:2:3:4:5' | sed 's/.*://' # => 5
$ echo '' | sed 's/.*://' # => (empty)
$ echo ':' | sed 's/.*://' # => (empty)
$ echo ':b' | sed 's/.*://' # => b
$ echo '::c' | sed 's/.*://' # => c
$ echo 'a' | sed 's/.*://' # => a
$ echo 'a:' | sed 's/.*://' # => (empty)
$ echo 'a:b' | sed 's/.*://' # => b
$ echo 'a::c' | sed 's/.*://' # => c
There are many good answers here, but I still want to share this one using basename:
basename $(echo "a:b:c:d:e" | tr ':' '/')
However, it will fail if there are already some '/' characters in your string.
If slash / is your delimiter then you just have to (and should) use basename.
It's not the best answer but it just shows how you can be creative using bash commands.
If your last field is a single character, you could do this:
a="1:2:3:4:5"
echo ${a: -1}
echo ${a:(-1)}
Check string manipulation in bash.
Using Bash.
$ var1="1:2:3:4:0"
$ IFS=":"
$ set -- $var1
$ eval echo \$${#}
0
echo "a:b:c:d:e" | xargs -d : -n1 | tail -1
First use xargs to split the string on ":"; -n1 means each line gets only one part. Then print the last part.
Regex matching in sed is greedy (always goes to the last occurrence), which you can use to your advantage here:
$ foo=1:2:3:4:5
$ echo ${foo} | sed "s/.*://"
5
A solution using the read builtin:
IFS=':' read -a fields <<< "1:2:3:4:5"
echo "${fields[4]}"
Or, to make it more generic:
echo "${fields[-1]}" # prints the last item (negative array indices require Bash 4.3 or newer)
for x in `echo $str | tr ";" "\n"`; do echo $x; done
Improving on the answers from @mateusz-piotrowski and @user3133260:
echo "a:b:c:d::e:: ::" | tr ':' ' ' | xargs | tr ' ' '\n' | tail -1
First, tr ':' ' ' replaces ':' with whitespace.
Then xargs trims the result.
After that, tr ' ' '\n' replaces the remaining whitespace with newlines.
Lastly, tail -1 gets the last string.
For those who are comfortable with Python, https://github.com/Russell91/pythonpy is a nice choice for solving this problem.
$ echo "a:b:c:d:e" | py -x 'x.split(":")[-1]'
From the pythonpy help: -x treat each row of stdin as x.
With that tool, it is easy to write python code that gets applied to the input.
Edit (Dec 2020):
Pythonpy is no longer online.
Here is an alternative:
$ echo "a:b:c:d:e" | python -c 'import sys; sys.stdout.write(sys.stdin.read().split(":")[-1])'
It contains more boilerplate (the sys.stdin/sys.stdout calls) but requires only the Python standard library.
