KornShell Sort Array of Integers - shell

Is there a command in KornShell (ksh) scripting to sort an array of integers? In this specific case, I am interested in simplicity over efficiency. For example if the variable $UNSORTED_ARR contained values "100911, 111228, 090822" and I wanted to store the result in $SORTED_ARR

Is it actually an indexed array or a list in a string?
Array:
UNSORTED_ARR=(100911 111228 090822)
SORTED_ARR=($(printf "%s\n" "${UNSORTED_ARR[@]}" | sort -n))
String:
UNSORTED_ARR="100911, 111228, 090822"
SORTED_ARR=$(printf '%s\n' "$UNSORTED_ARR" | tr -d ' ' | tr ',' '\n' | sort -n | paste -sd, -)
There are several other ways to do this, but the principle is the same.
Here's another way for a string using a different technique:
set -s -- ${UNSORTED_ARR//,}
SORTED_ARR=$@
SORTED_ARR=${SORTED_ARR// /, }
Note that this is a lexicographic sort so you would see this kind of thing when the numbers don't have leading zeros:
$ set -s -- 10 2 1 100 20
$ echo $@
1 10 100 2 20
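As a rough illustration of the difference between the two orderings, here is a minimal sketch. It assumes bash rather than ksh and uses the external sort utility instead of set -s; the array name nums is made up for the example.

```shell
#!/usr/bin/env bash
# Hypothetical sample data without leading zeros.
nums=(10 2 1 100 20)

# Plain sort compares the values as strings (LC_ALL=C pins the collation).
lexical=($(printf '%s\n' "${nums[@]}" | LC_ALL=C sort))

# sort -n compares the values as numbers.
numeric=($(printf '%s\n' "${nums[@]}" | sort -n))

echo "${lexical[*]}"   # 1 10 100 2 20
echo "${numeric[*]}"   # 1 2 10 20 100
```

Zero-padding the values to a common width is the other way to make both orderings agree.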

If I take that out then it works, but I can't loop through it (because it's a list of strings now) – pws5068 Mar 4 '11 at 21:01
Do this:
# create sorted array
set -s -A $@

Related

How can I create a one-to-one relationship between two loops in bash?

In a bash script, I want to create a one-to-one relationship between two for loops that each have variables defined as a sequence. For example, I want something like
for g in `seq 11 1 21`;do
for i in `seq 1 1 10`;do
cat >$i.txt <<EOF
this one is $g.
EOF
done
done
to result in ten files (1.txt, 2.txt, 3.txt, etc). 1.txt would contain "this one is 11." 2.txt would contain "this one is 12." Etc.
The above example is permutative, where each value of g acts on each value of i. Is there a way to make it so only one value of g acts on only one value of i in a corresponding order (ie 1 corresponds to 11, 2 corresponds to 12, etc)?
Any help is greatly appreciated. Thank you.
Before I answer, there's a critical problem with the question: the two sequences have different lengths (there are eleven g values, but only ten i values). Either one of g's values must be skipped, or something filled in as the extra value for i. For my answer I'll assume g should actually run from 11 to 20, not 21.
If you don't want all combinations, then you only want one loop; the trick is to make a single loop iterate through both sequences simultaneously. There are a couple of ways to do this in bash. One is to store both sequences as arrays, and then iterate over their indexes:
g_array=( {11..20} ) # Could also use g_array=( $(seq 11 1 20) ) here
i_array=( {1..10} )
for index in "${!g_array[@]}"; do # The ${!arr[@]} gets the *indexes* of the array
cat >"${i_array[index]}.txt" <<EOF
this one is ${g_array[index]}.
EOF
done
Alternately, since these are just numeric sequences, you can use the for ((init; test; step)) structure to do it:
for ((g=11,i=1; g<=20; g++,i++)); do # Note: semicolons between parts, commas between things that happen together
cat >"$i.txt" <<EOF
this one is $g.
EOF
done
arr1=( $(seq 11 21 ))
arr2=( $(seq 1 10 ))
for idx in $(seq 0 $(( ${#arr2[*]} - 1 )))   # valid indexes run from 0 to length-1
do
file="/tmp/${arr2[$idx]}.txt"
echo "This one is ${arr1[$idx]} " > "$file"
done
First, both sequences are assigned to two arrays. Then, as you iterate over the indexes of the second array, you build the file name and write the string to that file.
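A third way to walk both sequences in lockstep, sketched here as an alternative to the answers above, is to let paste put the two sequences side by side and read one pair per line. The scratch directory outdir is an assumption made for the example, so that nothing is written to the current directory.

```shell
#!/usr/bin/env bash
outdir=$(mktemp -d)   # hypothetical scratch directory for the generated files

# paste emits "11<TAB>1", "12<TAB>2", ...; read splits each line into g and i.
paste <(seq 11 20) <(seq 1 10) | while read -r g i; do
    printf 'this one is %s.\n' "$g" > "$outdir/$i.txt"
done

cat "$outdir/1.txt"    # this one is 11.
cat "$outdir/10.txt"   # this one is 20.
```

If the two sequences have different lengths, paste pads the shorter one with empty fields, so the loop body would need a guard for an empty g or i.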

How can I find the missing integers in a unique and sequential list (one per line) in a unix terminal?

Suppose I have a file as follows (a sorted, unique list of integers, one per line):
1
3
4
5
8
9
10
I would like the following output (i.e. the missing integers in the list):
2
6
7
How can I accomplish this within a bash terminal (using awk or a similar solution, preferably a one-liner)?
Using awk you can do this:
awk '{for(i=p+1; i<$1; i++) print i} {p=$1}' file
2
6
7
Explanation:
{p = $1}: Variable p contains value from previous record
{for ...}: We loop from p+1 to the current row's value (excluding the current value) and print each value in between; these are the missing values
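One consequence of p starting out empty (treated as 0) is that the loop also reports every integer below the first value in the file. The sketch below shows this with made-up input that starts at 3, plus a variant (my own adjustment, not part of the answer) that skips the loop on the first record.

```shell
# Hypothetical input that starts at 3 rather than 1.
with_leading=$(printf '%s\n' 3 4 7 | awk '{for(i=p+1; i<$1; i++) print i} {p=$1}')

# Variant: NR>1 skips the loop on the first record, so counting starts at the first value.
from_first=$(printf '%s\n' 3 4 7 | awk 'NR>1 {for(i=p+1; i<$1; i++) print i} {p=$1}')

echo $with_leading   # 1 2 5 6
echo $from_first     # 5 6
```

Which behaviour is wanted depends on whether the list is expected to start at 1.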
Using seq and grep:
seq $(head -n1 file) $(tail -n1 file) | grep -vwFf file -
seq creates the full sequence, and grep removes from it the lines that exist in the file.
perl -nE 'say for $a+1 .. $_-1; $a=$_'
Calling no external program (if filein contains the list of numbers):
#!/bin/bash
i=0
while read num; do
while (( ++i<num )); do
echo $i
done
done <filein
To adapt choroba's clever answer for my own use case, I needed my sequence to deal with zero-padded numbers.
The -w switch to seq is the magic here - it automatically pads the first number with the necessary number of zeroes to keep it aligned with the second number:
-w, --equal-width equalize width by padding with leading zeroes
My integers go from 0 to 9999, so I used the following:
seq -w 0 9999 | grep -vwFf "file.txt"
...which finds the missing integers in a sequence from 0000 to 9999. Or to put it back into the more universal solution in choroba's answer:
seq -w $(head -n1 "file.txt") $(tail -n1 "file.txt") | grep -vwFf "file.txt"
I didn't personally find that the - in his answer was necessary, but there may be use cases which make it so.
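To see why the padding matters, here is a small sketch with a made-up three-line list of zero-padded numbers (the temporary file name is generated by mktemp and is only for illustration). Without -w, seq prints 8 and 9, which never match the padded 08 and 09 in the list.

```shell
list=$(mktemp)                      # hypothetical scratch file
printf '%s\n' 08 09 11 > "$list"    # zero-padded sample list; 10 is missing

padded=$(seq -w 8 11 | grep -vwFf "$list")
unpadded=$(seq 8 11 | grep -vwFf "$list")

echo $padded     # 10
echo $unpadded   # 8 9 10
rm -f "$list"
```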
Using Raku (formerly known as Perl_6)
raku -e 'my @a = lines.map: *.Int; say @a.Set (^) @a.minmax.Set;'
Sample Input:
1
3
4
5
8
9
10
Sample Output:
Set(2 6 7)
I'm sure there's a Raku solution similar to @JJoao's clever Perl5 answer, but in thinking about this problem my mind naturally turned to Set operations.
The code above reads lines into the @a array, mapping each line so that elements in the @a array are Ints, not strings. In the second statement, @a.Set converts the array to a Set on the left-hand side of the (^) operator. Also in the second statement, @a.minmax.Set converts the array to a second Set, on the right-hand side of the (^) operator, but this time because the minmax operator is used, all Int elements from the min to max are included. Finally, the (^) symbol is the symmetric set-difference (infix) operator, which finds the difference.
To get an unordered whitespace-separated list of missing integers, replace the above say with put. To get a sequentially-ordered list of missing integers, add the explicit sort below:
~$ raku -e 'my @a = lines.map: *.Int; .put for (@a.Set (^) @a.minmax.Set).sort.map: *.key;' file
2
6
7
The advantage of all Raku code above is that finding "missing integers" doesn't require a "sequential list" as input, nor is the input required to be unique. So hopefully this code will be useful for a wide variety of problems in addition to the explicit problem stated in the Question.
OTOH, Raku is a Perl-family language, so TMTOWTDI. Below, a @a.minmax array is created, and grepped so that none of the elements of @a are returned (none junction):
~$ raku -e 'my @a = lines.map: *.Int; .put for @a.minmax.grep: none @a;' file
2
6
7
https://docs.raku.org/language/setbagmix
https://docs.raku.org/type/Junction
https://raku.org
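The point made above about not requiring sorted, unique input can also be met with standard shell tools by normalising the input first. This is only a sketch with made-up, shuffled input containing a duplicate; it reuses the awk loop from the earlier answer, with an NR>1 guard added so that counting starts at the smallest value.

```shell
# Hypothetical unsorted input with a repeated 3.
missing=$(printf '%s\n' 9 3 1 3 10 5 4 8 |
    sort -n -u |
    awk 'NR>1 {for(i=p+1; i<$1; i++) print i} {p=$1}')

echo $missing   # 2 6 7
```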

Generate ID number from a name in bash

Currently I have a bunch of names that are tied to numbers, for example:
Joe Bloggs - 17
John Smith - 23
Paul Smith - 24
Joe Bloggs - 32
Using the name and the number I'd like to generate a random/unique ID made of 4 numbers that also ends with the initial number.
So for example, Joe Bloggs and 17 would make something random/unique like: xxxx17.
Is this possible in bash? Would it be better in some other language?
This would be used on debian and darwin based systems.
It is impossible to ensure that a 4-digit hash (checksum) would be unique for a set of 10-character-long names.
As an alternative, you can try
file="./somefile"
paste -d"\0\n" <(seq -f "%04g" 9999 | sort -R | head -$(grep -c '' "$file")) <(grep -oP '\d+' "$file")
for better readability
paste -d"\0\n" <(
seq -f "%04g" 9999 | sort -R | head -$(grep -c '' "$file")
) <(
grep -oP '\d+' "$file"
)
for your input produces something like:
010817
161523
748024
269032
All lines are in the form RRRRXX, where:
the RRRR is a guaranteed unique and random number (from the range 0001 up to 9999)
the XX is the number from your input
decomposition:
seq produces 9999 4-digit numbers (each number is, of course, unique)
sort -R sorts the lines in random order (based on their hash, so get unique random numbers)
head - from the random list show only first N lines, where the N is the number of lines in your file,
the number of lines is counted by grep -c '' (better than wc -l)
the grep -oP filters the numbers from your file
finally the paste combines the two inputs to the final output
the <(..) <(..) is process substitution
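The same decomposition can be tried without an input file. The sketch below is a variation, not the answer's exact command: it swaps sort -R piped into head for GNU shuf -n (an assumption that shuf is installed, which is not the case on a stock darwin system), and the suffixes array stands in for the numbers that grep -oP would extract.

```shell
#!/usr/bin/env bash
# Hypothetical trailing numbers, standing in for the ones extracted from the file.
suffixes=(17 23 24 32)

# shuf -n picks that many distinct lines, so the 4-digit prefixes cannot repeat.
# paste -d '\0' joins the two columns with an empty delimiter.
ids=$(paste -d '\0' \
    <(seq -f '%04g' 9999 | shuf -n "${#suffixes[@]}") \
    <(printf '%s\n' "${suffixes[@]}"))

echo "$ids"   # four lines of the form RRRRXX; the RRRR part differs on every run
```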
Each name becomes unique once you add its number, unless there are two entries for Joe Bloggs 17. In your case there are two Joe Bloggs, one with 17 and one with 32. Put together, "Joe Bloggs 17" and "Joe Bloggs 32" are not the same, so you already have uniqueness. Using this, you can simply assign a number to each name + number pair and remember that number in an associative array (dictionary). No need to be random. When you find a name that isn't already in the dictionary, just increment the number and associate the new number with the name. If uniqueness is the only goal, then you are in good shape for 10,000 people.
Python is a great language for this, but you can make associative arrays in BASH too.
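A minimal sketch of that dictionary idea in bash follows. It assumes bash 4 or later (the bash 3.2 that ships with macOS has no associative arrays), and the names id_for, next and prefix are made up for the example. The counter is simply zero-padded to four digits and the original number is appended.

```shell
#!/usr/bin/env bash
declare -A id_for   # key: the full "name - number" line, value: the generated ID
next=1              # next unused 4-digit counter value

while IFS= read -r line; do
    [[ -z $line ]] && continue
    # Only allocate a new counter value the first time a pair is seen.
    if [[ -z ${id_for[$line]+set} ]]; then
        printf -v prefix '%04d' "$next"
        id_for[$line]="${prefix}${line##* }"
        next=$((next + 1))
    fi
    printf '%s => %s\n' "$line" "${id_for[$line]}"
done <<'EOF'
Joe Bloggs - 17
John Smith - 23
Paul Smith - 24
Joe Bloggs - 32
EOF
```

With this input the first line maps to 000117 and the last to 000432; feeding the same pair again would reuse its existing ID.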
You can get very close to doing exactly what you want using the random string generated by $(date +%N) and then selecting 4 digits to use as the first four characters in the new ID. You can choose from the beginning if you want IDs that are closer together, or from the mid part of the string for more randomness. After selecting your random 4, just keep track of the ones used in an array and check against the array as each new ID is assigned. This overhead is negligible for 10,000 or so IDs:
#!/bin/bash
declare -a used4=0 # array to hold IDs you have assigned
declare -i dupid=0 # a flag to prompt regeneration in case of a dup
while read -r line || [ -n "$line" ]; do
name=${line% -*}
id2=${line##* }
while [ $dupid -eq 0 ]; do
ns=$(date +%N) # fill variable with nanoseconds
fouri=${ns:4:4} # take 4 integers (mid 4 for better randomness)
# test for duplicate (this is BASH only test - use loop if portability needed)
[[ " ${used4[*]} " =~ " $fouri " ]] && continue # retry if these 4 digits are already used
newid="${fouri}${id2}" # concatenate 4ints + orig 2 digit id
used4+=( "$fouri" ) # add 4ints to used4 array
dupid=1
done
dupid=0 # reset flag
printf "%s => %s\n" "$line" "$newid"
done<"$1"
output:
$ bash fourid.sh dat/nameid.dat
Joe Bloggs - 17 => 762117
John Smith - 23 => 603623
Paul Smith - 24 => 210424
Joe Bloggs - 32 => 504732

Get common values in 2 arrays in shell scripting [duplicate]

I have an
array1 = (20,30,40,50)
array2 = (10,20,30,80,100,110,40)
I have to get the common values from these 2 arrays in my array 3 like:
array3 = (20,30,40)
in ascending sorted order.
Shell and standard Unix utilities are good at dealing with text files.
In that realm, arrays would be text files whose elements are the lines.
To find the common part between two such arrays, there's the standard comm command. comm expects alphabetically sorted input though.
So, if you have two files A and B containing the elements of those two arrays, one per line (which also means the array elements can't contain newline characters), you can find the intersection with
comm -12 <(sort A) <(sort B)
If you want to start with bash arrays (but using arrays in shells is generally a good indication that you're using the wrong tool for your task), you can convert back and forth between the bash arrays and our text file arrays of lines with printf '%s\n' and word splitting:
array_one=(20 30 40 50)
array_two=(10 20 30 80 100 110 40)
IFS=$'\n'; set -f
intersection=($(comm -12 <(
printf '%s\n' "${array_one[@]}" | sort) <(
printf '%s\n' "${array_two[@]}" | sort)))
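The snippet above leaves IFS changed and globbing disabled for the rest of the script, and the result comes back in lexical order. As a hedged alternative, the sketch below reads the lines with mapfile (bash 4 or later, an assumption) and re-sorts the result numerically; LC_ALL=C is pinned so that sort and comm agree on the collation order.

```shell
#!/usr/bin/env bash
array_one=(20 30 40 50)
array_two=(10 20 30 80 100 110 40)

# comm needs lexically sorted input; the final sort -n restores numeric order.
mapfile -t intersection < <(
    LC_ALL=C comm -12 \
        <(printf '%s\n' "${array_one[@]}" | LC_ALL=C sort) \
        <(printf '%s\n' "${array_two[@]}" | LC_ALL=C sort) | sort -n
)

echo "${intersection[*]}"   # 20 30 40
```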
You almost certainly should not be using shell for this so here's ONE awk solution to your specific problem:
awk 'BEGIN{
split("20,30,40,50",array1,/,/)
split("10,20,30,80,100,110,40",array2,/,/)
for (i=1;i in array1;i++)
for (j=1;j in array2;j++)
if (array1[i] == array2[j])
array3[++k] = array1[i]
for (k=1; k in array3; k++)
printf "array3[%d] = %d\n",k,array3[k]
}'
array3[1] = 20
array3[2] = 30
array3[3] = 40
and if you tell us what you're really trying to do you can get a lot more help.
A pure bash solution using arrays:
#!/bin/bash
array1=(20,30,40,50)
array2=(10,20,30,80,100,110,40)
IFS=,
for i in $array1 $array2;{ ((++tmp[i]));}
for i in ${!tmp[*]};{ [ ${tmp[i]} -gt 1 ] && array3+=($i);}
echo ${array3[*]}
Output
20 30 40
As tmp is not an associative array, its indexes come out in ascending order using the ${!tmp[*]} notation. If you need a comma-separated list as output, use echo "${array3[*]}" (IFS is still set to a comma at that point).
This can be used if the source elements are integers. It works only if each of the source arrays contains unique numbers.
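If the source arrays may contain repeated values, counting occurrences no longer works, because a value that appears twice in one array reaches a count of 2 on its own. One possible workaround, sketched here with bash 4 associative arrays (an assumption) and made-up names in_first and emitted, is to record membership of the first array and emit each common value only once.

```shell
#!/usr/bin/env bash
array1=(20 30 30 40 50)                # 30 is repeated on purpose
array2=(10 20 30 80 100 110 40 40)     # 40 is repeated on purpose

declare -A in_first emitted
for n in "${array1[@]}"; do in_first[$n]=1; done

array3=()
for n in "${array2[@]}"; do
    # Keep n only if it is in array1 and has not been emitted yet.
    if [[ -n ${in_first[$n]+x} && -z ${emitted[$n]+x} ]]; then
        array3+=("$n")
        emitted[$n]=1
    fi
done

# Sort the result numerically, one element per line.
mapfile -t array3 < <(printf '%s\n' "${array3[@]}" | sort -n)
echo "${array3[*]}"   # 20 30 40
```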
Here's a solution with standard command line tools (sort and join):
join <(printf %s\\n "${array1[@]}" | sort -u) \
<(printf %s\\n "${array2[@]}" | sort -u) | sort -n
join requires its inputs to be sorted, and does not recognize numerical sort order. Consequently, I sort both lists in the default collation order, join them, and then re-sort the result numerically.
I also assumed that you'd created the arrays really as arrays, i.e.:
array1=(20 30 40 50)
I think the rest is more or less self-evident, possibly with the help of help printf and man bash.
You could also try Perl.
#!/bin/perl
use warnings;
use strict;
my @array1 = (20,30,40,50);
my @array2 = (10,20,30,80,100,110,40);
my @array3 = ();
foreach my $x (@array1) {
# compare numerically; a regex match such as /$x/ would also hit values like 120 or 200
if (grep { $_ == $x } @array2) {
print "found $x\n";
push @array3, $x;
}
}
print "@array3\n";
In addition to any of these fine answers, it seems that you also want to sort your array (containing the answer) in ascending order.
You can do that in a number of different ways, including this:
readarray -t array3 <<<"$(printf "%s\n" "${array3[@]}" | sort -n)"
This method also allows you to filter out duplicate values:
readarray -t array3 <<<"$(printf "%s\n" "${array3[@]}" | sort -n | uniq)"
And for the sake of the exercise, here's yet another way of solving it:
#!/bin/bash
array1=(20 30 40 50)
array2=(10 20 30 80 100 110 40)
declare -a array3
#sort both arrays
readarray -t array1 <<<"$(printf "%s\n" "${array1[@]}" | sort -n)"
readarray -t array2 <<<"$(printf "%s\n" "${array2[@]}" | sort -n)"
# look for values
i2=0
for i1 in ${!array1[@]}; do
while (( i2 < ${#array2[@]} && ${array1[$i1]} > ${array2[$i2]} )); do (( i2++ )); done
[[ ${array1[$i1]} == ${array2[$i2]} ]] && array3+=(${array1[$i1]})
done
echo ${array3[@]}
Consider using python:
In [6]: array1 = (20,30,40,50)
In [7]: array2 = (10,20,30,80,100,110,40)
In [8]: set(array1) & set(array2)
Out[8]: set([40, 20, 30])

Array intersection in Bash [duplicate]

How do you compare two arrays in Bash to find all intersecting values?
Let's say:
array1 contains values 1 and 2
array2 contains values 2 and 3
I should get back 2 as a result.
My own answer:
for item1 in $array1; do
for item2 in $array2; do
if [[ $item1 = $item2 ]]; then
result=$result" "$item1
fi
done
done
I'm looking for alternate solutions as well.
The elements of list 1 are used as regular expression looked up in list2 (expressed as string: ${list2[*]} ):
list1=( 1 2 3 4 6 7 8 9 10 11 12)
list2=( 1 2 3 5 6 8 9 11 )
l2=" ${list2[*]} " # add framing blanks
for item in ${list1[@]}; do
if [[ $l2 =~ " $item " ]] ; then # use $item as regexp
result+=($item)
fi
done
echo ${result[@]}
The result is
1 2 3 6 8 9 11
Taking @Raihan's answer and making it work with non-files (though FDs are created).
I know it's a bit of a cheat, but it seemed like a good alternative.
A side effect is that the output array will be lexicographically sorted; I hope that's okay.
(I also don't know what type of data you have, so I just tested with numbers; additional work may be needed if you have strings with special characters, etc.)
result=($(comm -12 <(for X in "${array1[@]}"; do echo "${X}"; done|sort) <(for X in "${array2[@]}"; do echo "${X}"; done|sort)))
Testing:
$ array1=(1 17 33 99 109)
$ array2=(1 2 17 31 98 109)
$ result=($(comm -12 <(for X in "${array1[@]}"; do echo "${X}"; done|sort) <(for X in "${array2[@]}"; do echo "${X}"; done|sort)))
$ echo ${result[@]}
1 109 17
P.S. I'm sure there is a way to output the array one value per line without the for loop; I just forget it (IFS?).
Your answer won't work, for two reasons:
$array1 just expands to the first element of array1. (At least, in my installed version of Bash that's how it works. That doesn't seem to be a documented behavior, so it may be a version-dependent quirk.)
After the first element gets added to result, result will then contain a space, so the next run of result=$result" "$item1 will misbehave horribly. (Instead of appending to result, it will run the command consisting of the first two items, with the environment variable result being set to the empty string.) Correction: Turns out, I was wrong about this one: word-splitting doesn't take place inside assignments. (See comments below.)
What you want is this:
result=()
for item1 in "${array1[@]}"; do
for item2 in "${array2[@]}"; do
if [[ $item1 = $item2 ]]; then
result+=("$item1")
fi
done
done
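The first point in that answer, that $array1 on its own yields only the first element, is easy to check. As far as I know the Bash manual does describe an unsubscripted array reference as equivalent to subscript 0, so this is not a version quirk. The variable names in this sketch are made up for the example.

```shell
#!/usr/bin/env bash
array1=(1 2 3)

first_only=$array1          # same as ${array1[0]}
all_items="${array1[*]}"    # all elements, joined with spaces

# Looping over $array1 therefore runs the body once, not three times.
count=0
for item in $array1; do count=$((count + 1)); done

echo "$first_only / $all_items / $count"   # 1 / 1 2 3 / 1
```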
If it was two files (instead of arrays) you were looking for intersecting lines, you could use the comm command.
$ comm -12 file1 file2
Now that I understand what you mean by "array", I think -- first of all -- that you should consider using actual Bash arrays. They're much more flexible, in that (for example) array elements can contain whitespace, and you can avoid the risk that * and ? will trigger filename expansion.
But if you prefer to use your existing approach of whitespace-delimited strings, then I agree with RHT's suggestion to use Perl:
result=$(perl -e 'my %array2 = map +($_ => 1), split /\s+/, $ARGV[1];
print join " ", grep $array2{$_}, split /\s+/, $ARGV[0]
' "$array1" "$array2")
(The line-breaks are just for readability; you can get rid of them if you want.)
In the above Bash command, the embedded Perl program creates a hash named %array2 containing the elements of the second array, and then it prints any elements of the first array that exist in %array2.
This will behave slightly differently from your code in how it handles duplicate values in the second array; in your code, if array1 contains x twice and array2 contains x three times, then result will contain x six times, whereas in my code, result will contain x only twice. I don't know if that matters, since I don't know your exact requirements.
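The difference in duplicate handling can be reproduced in bash alone. The sketch below is not the Perl program from the answer; it mimics its lookup-table idea with a bash 4 associative array (an assumption), next to the nested-loop approach, using the x-twice and x-three-times case described above.

```shell
#!/usr/bin/env bash
array1=(x x)
array2=(x x x)

# Nested loops: every pairing of equal items adds one entry (2 * 3 = 6).
nested=()
for item1 in "${array1[@]}"; do
    for item2 in "${array2[@]}"; do
        if [[ $item1 = "$item2" ]]; then nested+=("$item1"); fi
    done
done

# Lookup table: each occurrence in array1 is kept at most once.
declare -A in_array2
for item2 in "${array2[@]}"; do in_array2[$item2]=1; done
lookup=()
for item1 in "${array1[@]}"; do
    if [[ -n ${in_array2[$item1]+x} ]]; then lookup+=("$item1"); fi
done

echo "${#nested[@]} ${#lookup[@]}"   # 6 2
```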

Resources