Bash - Sort a list of strings

Would you please show me how I can sort the following list (ascending order, A to Z), or a list in general, with Bash?
I have been trying but still could not get the expected results:
my_list='a z t b e c'
And the result should be a list as well, as I will use it in a select loop.
my_list='a b c e t z'
Thanks for your help!

You can use xargs twice along with the sort command to accomplish this.
$ my_list='a z t b e c'
$ my_list=$(echo $my_list | xargs -n1 | sort | xargs)
$ echo $my_list
a b c e t z
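Since the sorted list is destined for a select loop, a minimal sketch of that next step (assuming the one-word items above) could be:

select item in $my_list; do   # unquoted on purpose: word splitting supplies the menu entries
    echo "You picked: $item"
    break
done

Because it relies on word splitting, this only works while none of the items contain whitespace.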

If using the sort program is acceptable (rather than programming a sorting algorithm in bash), the answer could be like this:
my_list='a z t b e c'
echo "$my_list" | tr ' ' '\n' | sort | tr '\n' ' '
The result: a b c e t z
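Note that this leaves a trailing space and drops the final newline. If that matters, a small variation on the same idea lets paste join the sorted lines instead:

echo "$my_list" | tr ' ' '\n' | sort | paste -sd ' ' -

paste -s joins all input lines with the given delimiter and terminates the output with a newline.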

Arrays are more suitable for storing a list of things:
list=(a z t b "item with spaces" c)
sorted=()
while IFS= read -rd '' item; do
    sorted+=("$item")
done < <(printf '%s\0' "${list[@]}" | sort -z)
With bash 4.4 you can utilize readarray -d:
list=(a z t b "item with spaces" c)
readarray -td '' sorted < <(printf '%s\0' "${list[@]}" | sort -z)
To use the array to create a simple menu with select:
select item in "${sorted[@]}"; do
# do something
done
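Putting the pieces together, a complete sketch (bash 4.4+, using the sample list from above) might look like:

#!/bin/bash
list=(a z t b "item with spaces" c)
# NUL-delimit the items so embedded whitespace survives the sort
readarray -td '' sorted < <(printf '%s\0' "${list[@]}" | sort -z)
select item in "${sorted[@]}"; do
    # select leaves $item empty on an invalid choice, so guard against that
    [ -n "$item" ] && { echo "You chose: $item"; break; }
done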

Using GNU awk and controlling array traversal order with PROCINFO["sorted_in"]:
$ echo -n $my_list |
awk 'BEGIN {
RS=ORS=" " # space as record separator
PROCINFO["sorted_in"]="@val_str_asc" # array value used as order
}
{
a[NR]=$0 # hash on NR to a
}
END {
for(i in a) # in given order
print a[i] # output values in a
print "\n" # happy ending
}'
a b c e t z

You can do this:
my_list=($(sort < <(echo 'a z t b e c' | tr ' ' '\n') | tr '\n' ' ' | sed 's/ $//'))
This will create my_list which is an array.
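You can verify that it really is an array, for example:

echo "${my_list[0]}"     # first element: a
echo "${#my_list[@]}"    # number of elements: 6

Keep in mind that the unquoted command substitution splits on whitespace, so this approach assumes the items themselves contain none.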

Related

How to compare totals in Unix

I have a file simple.txt with contents as below:
a b
c d
c d
I want to check which pair, 'a b' or 'c d', has the maximum occurrence. I have written this code, which gives me the individual occurrence count of each word:
cat simple.txt | tr -cs '[:alnum:]' '[\n*]' | sort | uniq -c |
grep -E -i "\<a\>|\<b\>|\<c\>|\<d\>"
1 a
1 b
2 c
2 d
How can I total the results of this output? Or can I write different code?
If we can assume that each pair of letters is a complete line, one way to handle this would be to sort the lines, use the uniq utility to get a count of each unique line, and then reverse sort to get the count:
sort simple.txt | uniq -c | sort -rn
You may want to get rid of the empty lines using egrep:
egrep '\w' simple.txt | sort | uniq -c | sort -rn
Which should give you:
2 c d
1 a b
$ sort file |
uniq -c |
sort -nr > >(read -r count pair; echo "max count $count is for pair $pair")
Sort the file, count the duplicates, sort numerically in descending order, then read the first line and print the result.
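If the output process substitution looks unfamiliar, an input process substitution achieves the same effect and is arguably easier to read; a sketch of the same idea:

read -r count pair < <(sort file | uniq -c | sort -nr)
echo "max count $count is for pair $pair"

read consumes only the first (highest-count) line, and everything after the count lands in pair.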
or all of the above in one GNU awk script (note the third argument to asorti, which sorts the indices by value so that the last one really is the pair with the maximum count):
$ awk '{c[$0]++}
END{n=asorti(c,ci,"@val_num_asc"); k=ci[n];
print "max count is " c[k] " for pair " k}' file
With a single GNU awk command:
awk 'BEGIN{ PROCINFO["sorted_in"] = "@val_num_desc" }
NF{ a[$0]++ }
END{ for (i in a) { print "The pair with max occurrence is:", i; break } }' file
The output:
The pair with max occurrence is: c d
To get the pair that occurs most frequently:
$ sort <simple.txt | uniq -c | sort -nr | awk '{print "The pair with max occurrence is",$2,$3; exit}'
The pair with max occurrence is c d
This can be done entirely by awk and without any need for pipelines:
$ awk '{a[$0]++} END{for (x in a) if (a[x]>(max+0)) {max=a[x]; line=x}; print "The pair with max occurrence is",line}' simple.txt
The pair with max occurrence is c d

Unix - Find and Sort by part of filename

I have this:
D-T4-0.txt
A-2.txt
C-3-1.txt
B-X1-3.txt
E-2-4.txt
and I wish to order as follows:
D, C, A, B, E
I need to look at the last number in each name (before .txt): D-0, C-1, A-2, B-3, E-4.
Is it possible?
for i in `awk -F- '{print $NF}' file_name |sort`;do grep -- -$i file_name;done
Here I am extracting the last field with awk (delimited by -) and sorting it,
then using a loop to grep the file for each sorted number with a - added in front (the -- keeps grep from treating the pattern as an option).
You can do it in a pipeline like this:
# List files
ls |
# Include the sorting key in the front as well
sed -E 's/^(.*)([0-9]+)\.txt$/\2\t\1\2.txt/' |
# Sort on the sorting key
sort -n |
# Remove the sorting key
cut -f2- |
# Grab the first letter
cut -c1
Output:
D
C
A
B
E
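A plain-bash alternative, sketched under the assumption that every name ends in -<digits>.txt, pulls the trailing number out with parameter expansion:

for f in *.txt; do
    n=${f%.txt}    # strip the extension
    n=${n##*-}     # keep only the part after the last dash
    printf '%s\t%s\n' "$n" "${f:0:1}"
done | sort -n | cut -f2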

I want to match two variables on a line and find out how many unique variables there are

I have a GraphViz file like this
graph {
edge [arrowhead = none]
A -> B
B -> C
B -> D [ label="foobar" ];
C -> A
}
and I want to find out how many nodes there are; in this case (A, B, C, D) it's 4.
When I stick with 1-letter nodes, I use a script like this
grep -- -\> graph.gv | grep -o . | sort | grep [A-Z] | uniq | wc -l
but that fails should I need to use multi-letter nodes.
Ideally I'd have something that just matches
match $a -> $b ; echo $a\n $b\n | uniq | wc -l
but I have no idea how to do this via sed/grep/awk… whatever works best
As I understand it, you can use awk with a hash that skips duplicates:
awk '{ arr[$1]++; arr[$3]++ } END { print length(arr) }' infile
It yields:
4
UPDATE: In awk, a pattern section lets you select a condition for processing a line. Based on your edit, that could be a not-match for curly braces, like:
awk '$0 !~ /[{}]/ { arr[$1]++; arr[$3]++ } END { print length(arr) }' infile
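Note that length(arr) on an array is a GNU awk extension; a more portable sketch of the same idea keeps its own counter and only looks at lines containing an edge (assuming node names always sit in fields 1 and 3):

awk '/->/ { if (!seen[$1]++) n++; if (!seen[$3]++) n++ }
     END  { print n }' infile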

Counting unique strings where there's a single string per line in bash

Given input file
z
b
a
f
g
a
b
...
I want to output the number of occurrences of each string, for example:
z 1
b 2
a 2
f 1
g 1
How can this be done in a bash script?
You can sort the input and pass to uniq -c:
$ sort input_file | uniq -c
2 a
2 b
1 f
1 g
1 z
If you want the numbers on the right, use awk to switch them:
$ sort input_file | uniq -c | awk '{print $2, $1}'
a 2
b 2
f 1
g 1
z 1
Alternatively, do the whole thing in awk:
$ awk '
{
++count[$1]
}
END {
for (word in count) {
print word, count[word]
}
}
' input_file
f 1
g 1
z 1
a 2
b 2
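Because awk's for (word in count) traversal order is unspecified, the lines may come back in any order; if a fixed order matters, append a sort (here by count, descending):

awk '{ ++count[$1] } END { for (word in count) print word, count[word] }' input_file | sort -k2,2nr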
cat text | sort | uniq -c
should do the job
Try:
awk '{ freq[$1]++; } END{ for( c in freq ) { print c, freq[c] } }' test.txt
Where test.txt would be your input file.
Here's a bash-only version (requires bash version 4), using an associative array.
#! /bin/bash
declare -A count
while read -r val ; do
    count[$val]=$(( ${count[$val]} + 1 ))
done < your_input_file # change this as needed
for key in "${!count[@]}" ; do
    echo "$key ${count[$key]}"
done
This might work for you:
cat -n file |
sort -k2,2 |
uniq -cf1 |
sort -k2,2n |
sed 's/^ *\([^ ]*\).*\t\(.*\)/\2 \1/'
This outputs the number of occurrences of each string in the order in which they first appear.
You can use sort filename | uniq -c.
Have a look at the Wikipedia page on uniq.

Count number of names starting with particular character in file

I have the following file:
FirstName, FamilyName, Address, PhoneNo
The file is sorted according to the family name. How can I count the number of family names starting with a particular character?
The output should look like this:
A: 2
B: 1
...
With awk:
awk '{print substr($2, 1, 1)}' file|
uniq -c|
awk '{print $2 ": " $1}'
OK, no awk. Here's with sed:
sed s'/[^,]*, \(.\).*/\1/' file|
uniq -c|
sed 's/.*\([0-9]\)\+ \([a-zA-Z]\)\+/\2: \1/'
OK, no sed. Here's with python:
import csv

r = csv.reader(open(file_name))
d = {}
for row in r:
    initial = row[1].strip()[0]  # first letter of the FamilyName field
    d[initial] = d.get(initial, 0) + 1
for k, v in d.items():
    print("%s: %s" % (k, v))
while read -r f l r; do echo "$l"; done < inputfile | cut -c 1 | sort | uniq -c
Just the Shell
#! /bin/bash
##### Count occurrence of family-name initial
#FirstName, FamilyName, Address, PhoneNo
exec <<EOF
Isusara, Ali, Someplace, 022-222
Rat, Fink, Some Hole, 111-5555
Louis, Frayser, whaterver, 123-1144
Janet, Hayes, whoever St, 111-5555
Mary, Holt, Henrico VA, 222-9999
Phillis, Hughs, Some Town, 711-5525
Howard, Kingsley, ahahaha, 222-2222
EOF
while read first family rest
do
    init=${family:0:1}
    [ -n "$oinit" -a "$init" != "$oinit" ] && {
        echo $oinit : $count
        count=0
    }
    oinit=$init
    let count++
done
echo $oinit : $count
Running
frayser@gentoo ~/doc/Answers/src/SH/names $ sh names.sh
A : 1
F : 2
H : 3
K : 1
frayser@gentoo ~/doc/Answers/src/SH/names $
To read from a file, remove the here document, and run:
chmod +x names.sh
./names.sh <file
The "hard way" — no use of awk or sed, exactly as asked for. If you're not sure what any of these commands mean, you should definitely look at the man page for each one.
INTERMED=`mktemp` # Creates a temporary file
COUNTS_L=`mktemp` # A second...
COUNTS_R=`mktemp` # A third...
cut -d , -f 2 | # Extracts the FamilyName field only
tr -d '\t ' | # Deletes spaces/tabs
cut -c 1 | # Keeps only the first character
# on each line
tr '[:lower:]' '[:upper:]' | # Capitalizes all letters
sort | # Sorts the list
uniq -c > $INTERMED # Counts how many of each letter
# there are
cut -c1-7 $INTERMED | # Cuts out the LHS of the temp file
tr -d ' ' > $COUNTS_R # Must delete the padding spaces though
cut -c9- $INTERMED > $COUNTS_L # Cut out the RHS of the temp file
# Combines the two halves into the final output in reverse order
paste -d ' ' /dev/null $COUNTS_R | paste -d ':' $COUNTS_L -
rm $INTERMED $COUNTS_L $COUNTS_R # Cleans up the temp files
awk one-liner:
awk '
{count[substr($2,1,1)]++}
END {for (init in count) print init ": " count[init]}
' filename
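Assuming the sample here-document data from the shell answer above is saved as names.txt, a run would look like this (the order of the output lines is unspecified):

$ awk '
{count[substr($2,1,1)]++}
END {for (init in count) print init ": " count[init]}
' names.txt
A: 1
F: 2
H: 3
K: 1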
Prints how many words start with each letter:
for i in {a..z}; do echo -n "$i:"; find path/to/folder -type f -exec sed 's/ /\n/g' {} \; | grep -c "^$i"; done
