How to sort 2 arrays in bash

I want to sort 2 arrays at the same time. The arrays are the following: wordArray and numArray. Both are global.
These 2 arrays contain all the words (without duplicates) and the number of the appearances of each word from a text file.
Right now I am using Bubble Sort to sort both of them at the same time:
# Bubble Sort function
function bubble_sort {
local max=${#numArray[@]}
size=${#numArray[@]}
while ((max > 0))
do
local i=0
while ((i < max))
do
if [ "$i" != "$(($size-1))" ]
then
if [ ${numArray[$i]} \< ${numArray[$((i + 1))]} ]
then
local temp=${numArray[$i]}
numArray[$i]=${numArray[$((i + 1))]}
numArray[$((i + 1))]=$temp
local temp2=${wordArray[$i]}
wordArray[$i]=${wordArray[$((i + 1))]}
wordArray[$((i + 1))]=$temp2
fi
fi
((i += 1))
done
((max -= 1))
done
}
#Calling Bubble Sort function
bubble_sort "${numArray[@]}" "${wordArray[@]}"
But for some reason it does not sort them properly when the arrays are large.
Does anyone know what's wrong with it, or another approach to sort the words together with their number of appearances, with or without arrays?
This:
wordArray = (because, maybe, why, the)
numArray = (5, 12, 20, 13)
Must turn to this:
wordArray = (why, the, maybe, because)
numArray = (20, 13, 12, 5)
Someone recommended writing the two arrays side by side in a text file and sorting the file.
How will it work for this input:
1 Arthur
21 Zebra
to turn to this output:
21 Zebra
1 Arthur

Assuming the arrays do not contain tab character or newline character, how about:
#!/bin/bash
wordArray=(why the maybe because)
numArray=(20 13 12 5)
tmp1=$(mktemp tmp.XXXXXX) # file to be sorted
tmp2=$(mktemp tmp.XXXXXX) # sorted result
for (( i = 0; i < ${#wordArray[@]}; i++ )); do
echo "${numArray[i]}"$'\t'"${wordArray[i]}" # write the number and word delimited by a tab character
done > "$tmp1"
sort -nrk1,1 "$tmp1" > "$tmp2" # sort the file by number in descending order
while IFS=$'\t' read -r num word; do # read the lines splitting by the tab character
numArray_sorted+=("$num") # add the number to the array
wordArray_sorted+=("$word") # add the word to the array
done < "$tmp2"
rm -- "$tmp1" # unlink the temp file
rm -- "$tmp2" # same as above
echo "${wordArray_sorted[@]}" # see the result
echo "${numArray_sorted[@]}" # same as above
Output:
why the maybe because
20 13 12 5
If you prefer not to create temp files, here is the process substitution version, which will run faster without writing/reading temp files.
#!/bin/bash
wordArray=(why the maybe because)
numArray=(20 13 12 5)
while IFS=$'\t' read -r num word; do
numArray_sorted+=("$num")
wordArray_sorted+=("$word")
done < <(
sort -nrk1,1 < <(
for (( i = 0; i < ${#wordArray[@]}; i++ )); do
echo "${numArray[i]}"$'\t'"${wordArray[i]}"
done
)
)
echo "${wordArray_sorted[@]}"
echo "${numArray_sorted[@]}"
Or simpler (using the suggestion by KamilCuk):
#!/bin/bash
wordArray=(why the maybe because)
numArray=(20 13 12 5)
while IFS=$'\t' read -r num word; do
numArray_sorted+=("$num")
wordArray_sorted+=("$word")
done < <(
paste <(printf "%s\n" "${numArray[@]}") <(printf "%s\n" "${wordArray[@]}") | sort -nrk1,1
)
echo "${wordArray_sorted[@]}"
echo "${numArray_sorted[@]}"
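One likely reason the bubble sort in the question misorders larger inputs is that `\<` inside `[ ]` compares strings, so `9` sorts after `10`. A minimal sketch contrasting the two comparisons (the helper names `str_less` and `num_less` are made up for this illustration):

```bash
#!/bin/bash
# Hypothetical helpers, named only for this illustration.
str_less() { [ "$1" \< "$2" ]; }   # lexicographic (string) comparison
num_less() { (( $1 < $2 )); }      # integer comparison

if str_less 9 10; then str_result=yes; else str_result=no; fi
if num_less 9 10; then num_result=yes; else num_result=no; fi

echo "string compare 9 < 10: $str_result"
echo "numeric compare 9 < 10: $num_result"
```

In the bubble sort, writing the swap condition as `(( numArray[i] < numArray[i + 1] ))` would make it an integer comparison.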

You need numeric sort for the numbers. You can sort an array like this:
mapfile -t numArray < <(printf '%s\n' "${numArray[@]}" | sort -n)
But what you actually need is something like:
for num in "${numArray[@]}"; do
echo "$num: ${wordArray[j++]}"
done |
sort -n -k1,1
But, earlier in the process, you should have used only one array, where the word and frequency (or vice versa) are key value pairs. Then they always have a direct relationship, and can be printed similarly to the for loop above.
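The single-array idea could look roughly like the sketch below, assuming bash 4 or later for associative arrays. The array name `freq` and the sample word list are illustrative only; in practice the words would come from the text file.

```bash
#!/bin/bash
# Requires bash 4+ for associative arrays (declare -A).
declare -A freq   # hypothetical name: word -> number of appearances

# Sample words standing in for the contents of the text file.
for word in the why maybe the why the; do
    freq[$word]=$(( ${freq[$word]:-0} + 1 ))
done

# Emit "count<TAB>word" lines, highest count first (ties broken by word).
sorted=$(
    for word in "${!freq[@]}"; do
        printf '%s\t%s\n' "${freq[$word]}" "$word"
    done | sort -t $'\t' -k1,1nr -k2,2
)
printf '%s\n' "$sorted"
```

Because each count is stored under its word, the pair cannot drift apart during sorting.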

Related

How to find the max value in array without using sort cmd in shell script

I have a array=(4,2,8,9,1,0) and I don't want to sort the array to find the highest number in the array because I need to get the index value of the highest number as it is, so I can use it for further reference.
Expected output:
9 index value => 3
Can somebody help me to achieve this?

Slight variation with a loop using the ternary conditional operator and no assumptions about range of values:
arr=(4 2 8 9 1 0)
max=${arr[0]}
maxIdx=0
for ((i = 1; i < ${#arr[@]}; ++i)); do
maxIdx=$((arr[i] > max ? i : maxIdx))
max=$((arr[i] > max ? arr[i] : max))
done
printf '%s index => values %s\n' "$maxIdx" "$max"
The only assumption is that array indices are contiguous. If they aren't, it becomes a little more complex:
arr=([1]=4 [3]=2 [5]=8 [7]=9 [9]=1 [11]=0)
indices=("${!arr[@]}")
maxIdx=${indices[0]}
max=${arr[maxIdx]}
for i in "${indices[@]:1}"; do
((arr[i] <= max)) && continue
maxIdx=$i
max=${arr[i]}
done
printf '%s index => values %s\n' "$maxIdx" "$max"
This first gets the indices into a separate array and sets the initial maximum to the value corresponding to the first index; then, it iterates over the indices, skipping the first one (the :1 notation), checks if the current element is a new maximum, and if it is, stores the index and the maximum.

Without using sort, you can use a simple loop in shell. Here is a sample bash code:
#!/usr/bin/env bash
array=(4 2 8 9 1 0)
for i in "${!array[@]}"; do
[[ -z $max ]] || (( ${array[i]} > $max )) && { max="${array[i]}"; maxind=$i; }
done
echo "max=$max, maxind=$maxind"
max=9, maxind=3

arr=(4 2 8 9 1 0)
paste <(printf "%s\n" "${arr[@]}") <(seq 0 $((${#arr[@]} - 1)) ) |
sort -n -k1,1 |
tail -n1 |
sed 's/\t/ index value => /'
Print each array element on a newline with printf
Print array indexes with seq
Join both streams using paste
Numerically sort the lines using the first fields (ie. array value) sort
Print the last line tail -n1
The array value and its index are separated by a tab. Substitute the tab with the output string you want using sed. One could instead use, e.g., cut -f2 to get only the index, or read a b < <( ... ) to read the numbers into variables, etc.
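As a sketch of the `read` variant mentioned above: the pipeline output is read straight into two variables. This version uses `"${!arr[@]}"` instead of `seq` to list the indices, which is a substitution made for this example; the variable names `max` and `maxIdx` are likewise only illustrative.

```bash
#!/bin/bash
arr=(4 2 8 9 1 0)

# Pair each value with its index, sort numerically by value, keep the largest.
read -r max maxIdx < <(
    paste <(printf '%s\n' "${arr[@]}") <(printf '%s\n' "${!arr[@]}") |
        sort -n -k1,1 |
        tail -n1
)

echo "$max index value => $maxIdx"
```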

Using Perl
$ export data=4,2,8,9,1,0
$ echo $data | perl -ne ' map{$i++; if($_>$x) {$x=$_;$id=$i} } split(","); print "max=$x", " index=",--${id},"\n" '
max=9 index=3
$

Bash - creating multiple lists files from one file list

The directory contains x files. I get a list of files. I want to split this list into a larger number of n lists, which would have a limited number of elements.
Examples:
files=$( ls -d /*.csv | sort )
echo $files
/100347_111111.csv
/111301_111111.csv
/111301_222222.csv
/256467_111111.csv
/256467_222222.csv
/256467_333333.csv
/256467_444444.csv
/256467_555555.csv
/256467_666666.csv
/256467_777777.csv
From the resulting list I want to create 3 lists. The lists must not have more than 4 elements. The first list should be composed of the first 4 elements from the files, the other list should contain the following 4 elements, the third list should contain the remaining elements.
n1
/100347_111111.csv
/111301_111111.csv
/111301_222222.csv
/256467_111111.csv
n2
/256467_222222.csv
/256467_333333.csv
/256467_444444.csv
/256467_555555.csv
n3
/256467_666666.csv
/256467_777777.csv
Can someone help me create the lists as described above?

FILES=( `ls -d * | sort`)
echo "${FILES[@]:0:4}"
Loop of 4
count=4
# round the number of sets up so the last, shorter set is not dropped
for i in $(seq 0 $(( (${#FILES[@]} + count - 1) / count - 1 ))) ;
do
echo "######## Set" $i "#######";
echo "${FILES[@]:$(( i * $count )):$count }" ;
done

An example which may be reinventing the wheel:
\ls -1 |
{
n=0
cr=""
pack=1
while read -r l
do
mod=$(($n % 4))
if [[ "$mod" == "0" ]]
then
echo -e "$cr"n"$pack:"
pack=$((pack + 1)) # advance the pack number only when a new pack starts
fi
echo "$l"
n=$((n + 1))
cr="\n";
done
}
Here, we use the modulo operator to check if a new pack is about to be displayed (n modulo 4 = 0 if n is a multiple of 4).
I used curly brackets {} to put the variable initialization and the while loop in the same environment (otherwise the while loop would not be able to see the n, pack and cr variables).
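The braces matter because, by default in bash, each side of a pipe runs in its own subshell, so variables set there are not visible to the rest of the script. A small sketch of that behaviour (the variable names `piped` and `direct` are made up for this example):

```bash
#!/bin/bash
# Pipeline version: the while loop runs in a subshell,
# so the count it computes is lost once the pipeline ends.
piped=0
printf '%s\n' a b c | while read -r line; do
    piped=$((piped + 1))
done

# Process substitution version: the loop runs in the current shell.
direct=0
while read -r line; do
    direct=$((direct + 1))
done < <(printf '%s\n' a b c)

echo "after pipeline: $piped"
echo "after process substitution: $direct"
```

If the counters are needed after the loop, the process substitution form is one way to keep them.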

Try split:
ls -d /*.csv | sort | split -l 4 -d
This will create files x00, x01, x02, ... each containing at most 4 lines.

How to sum a row of numbers from text file-- Bash Shell Scripting

I'm trying to write a bash script that calculates the average of numbers by rows and columns. An example of a text file that I'm reading in is:
1 2 3 4 5
4 6 7 8 0
There is an unknown number of rows and unknown number of columns. Currently, I'm just trying to sum each row with a while loop. The desired output is:
1 2 3 4 5 Sum = 15
4 6 7 8 0 Sum = 25
And so on and so forth with each row. Currently this is the code I have:
while read i
do
echo "num: $i"
(( sum=$sum+$i ))
echo "sum: $sum"
done < $2
To call the program it's stats -r test_file. "-r" indicates rows; I haven't started columns quite yet. My current code actually just takes the first number of each column and adds them together, and then the rest of the numbers error out as a syntax error. It says the error comes from line 16, which is the (( sum=$sum+$i )) line, but I honestly can't figure out what the problem is. I should tell you I'm extremely new to bash scripting and I have googled and searched high and low for the answer for this and can't find it. Any help is greatly appreciated.

You are reading the file line by line, and a whole line (such as "1 2 3 4 5") is not a valid arithmetic operand. Try this:
while read i
do
sum=0
for num in $i
do
sum=$(($sum + $num))
done
echo "$i Sum: $sum"
done < "$2"
This splits each line into its individual numbers using the inner for loop. I hope this helps.
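A variation on the same idea, sketched under the assumption that the input holds integers only: `read -a` splits each row into an array, which avoids relying on unquoted word splitting. The function name `row_sums` is hypothetical, and it reads from standard input rather than from `$2`.

```bash
#!/bin/bash
# row_sums is a hypothetical helper name; it reads rows from stdin.
row_sums() {
    local -a nums
    local n sum
    while read -r -a nums; do
        sum=0
        for n in "${nums[@]}"; do
            sum=$((sum + n))
        done
        echo "${nums[*]} Sum = $sum"
    done
}

result=$(printf '1 2 3 4 5\n4 6 7 8 0\n' | row_sums)
printf '%s\n' "$result"
```

In the original script this could be called as `row_sums < "$2"`.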

Another non bash way (con: OP asked for bash, pro: does not depend on bashisms, works with floats).
awk '{c=0;for(i=1;i<=NF;++i){c+=$i};print $0, "Sum:", c}'

Another way (not a pure bash):
while read line
do
sum=$(sed 's/[ ]\+/+/g' <<< "$line" | bc -q)
echo "$line Sum = $sum"
done < filename

Using the numsum -r util covers the row addition, but the output format needs a little glue, by inefficiently paste-ing a few utils:
paste "$2" \
<(yes "Sum =" | head -$(wc -l < "$2") ) \
<(numsum -r "$2")
Output:
1 2 3 4 5 Sum = 15
4 6 7 8 0 Sum = 25
Note -- to run the above line on a given file foo, first initialize $2 like so:
set -- "" foo
paste "$2" <(yes "Sum =" | head -$(wc -l < "$2") ) <(numsum -r "$2")

increment a letter sequence to represent a whole number where a=0 and z=25

I have tried several different search terms but have not found exactly what I want, I am sure there is already an answer for this so please point me to it if so.
I would like to understand how to increment a letter code given a standard number convention in a bash script.
Starting with AAAA=0, or with leading zeros AAAA=000000 (26x26x26x26), I would like to increment the value by one each time, so aaab=000001, aaac=000002, aaba=000026, aaca=000052, etc.
Thanks Art!

I guess this is what you want
echo {a..z}{a..z}{a..z}{a..z} | tr ' ' '\n' | nl
will be too long, perhaps test with this first
echo {a..z}{a..z} | tr ' ' '\n' | nl
if you don't need the line numbers remove last pipe and nl
If you need the output in xxxx=nnnnnn format, you can use awk
echo {a..z}{a..z}{a..z}{a..z} | tr ' ' '\n' | awk '{printf "%s=%06d\n", $0, NR-1}'
aaaa=000000
aaab=000001
aaac=000002
aaad=000003
aaae=000004
aaaf=000005
aaag=000006
aaah=000007
aaai=000008
aaaj=000009
...
zzzv=456971
zzzw=456972
zzzx=456973
zzzy=456974
zzzz=456975

Fast
If you are aiming for speed and simplicity:
#!/bin/bash
i=0
for text in {a..z}{a..z}{a..z}{a..z}; do
printf '%06d %5.5s\n' "$i" "$text"
(( i++ ))
done
Precise
Aiming at having a function that converts any number to the character string:
We must understand that what you are describing is a number written in base 26, using the character a as 0, b as 1, c as 2, etc.
Thus, aaaa means 0000, aaab means 0001, aaac means 0002, .... aaaz means 0025
and aaba means 0026, aaca means 0052.
bc could do the base conversion directly (as numbers):
$ echo 'obase=26; 199'|bc
07 17
Counting from zero, letter 7 is: a0, b1, c2, d3, e4, f5, g6, (h)7,
and letter 17 is (r).
If we set the variable list to: list=$(printf '%s' {a..z}) or list=abcdefghijklmnopqrstuvwxyz
We could get each letter from the number with: ${list:7:1} and ${list:17:1}
$ echo "${list:7:1} and ${list:17:1}"
h and r
$ printf '%s' "${list:7:1}" "${list:17:1}" # Using printf:
hr
Script
All together inside a script:
#!/bin/bash
list=$(printf '%s' {a..z})
getletters(){
local numbers
numbers="$(echo "obase=26; $1"|bc)"
for number in $numbers; do
printf '%s' "${list:10#$number:1}";
done;
echo;
}
count=2
limit=$(( 26**$count - 1 ))
for (( i=0; i<=$limit; i++)); do
printf '%06d %-5.5s\n' "$i" "$(getletters "$i")"
done
Please change count from 2 to 4 to get the whole list. Be aware that such a list has nearly half a million lines (the limit is 456,975), so it will take some time.
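The same base-26 conversion can also be sketched in pure bash, without calling `bc` for every number. The function name `to_letters` and its optional width argument are assumptions made for this example, not part of the script above.

```bash
#!/bin/bash
# to_letters is a hypothetical helper: to_letters NUMBER [WIDTH]
# Converts a non-negative integer to base 26 with a=0 ... z=25,
# left-padded with 'a' (the zero digit) to WIDTH characters.
to_letters() {
    local n=$1 width=${2:-4} list=abcdefghijklmnopqrstuvwxyz out=
    while (( n > 0 )); do
        out=${list:$((n % 26)):1}$out   # prepend the lowest base-26 digit
        n=$(( n / 26 ))
    done
    while (( ${#out} < width )); do
        out=a$out
    done
    printf '%s\n' "$out"
}

for i in 0 1 26 52 199 456975; do
    printf '%06d %s\n' "$i" "$(to_letters "$i")"
done
```

Avoiding one `bc` process per number is likely to matter when generating the full list.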

With perl, you can ++ a string to increment the letter:
for (my ($n,$s) = (0,"aaaa"); $n < 200; $n++, $s++) {
printf "%s=%0*d\n", $s, length($s), $n;
}
outputs
aaaa=0000
aaab=0001
aaac=0002
aaad=0003
...
aaby=0050
aabz=0051
aaca=0052
aacb=0053
...
aahp=0197
aahq=0198
aahr=0199

Get common values in 2 arrays in shell scripting [duplicate]

I have an
array1 = (20,30,40,50)
array2 = (10,20,30,80,100,110,40)
I have to get the common values from these 2 arrays in my array 3 like:
array3 = (20,30,40)
in ascending sorted order.

Shell and standard Unix utilities are good at dealing with text files.
In that realm, arrays would be text files whose elements are the lines.
To find the common part between two such arrays, there's the standard comm command. comm expects alphabetically sorted input though.
So, if you have two files A and B containing the elements of those two arrays, one per line (which also means the array elements can't contain newline characters), you can find the intersection with
comm -12 <(sort A) <(sort B)
If you want to start with bash arrays (but using arrays in shells is generally a good indication that you're using the wrong tool for your task), you can convert back and forth between the bash arrays and our text file arrays of lines with printf '%s\n' and word splitting:
array_one=(20 30 40 50)
array_two=(10 20 30 80 100 110 40)
IFS=$'\n'; set -f
intersection=($(comm -12 <(
printf '%s\n' "${array_one[@]}" | sort) <(
printf '%s\n' "${array_two[@]}" | sort)))
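Since comm returns its result in lexical order, a final `sort -n` is needed to get the ascending numeric order the question asks for. A sketch that also uses `mapfile` (bash 4 or later) so that `IFS` and globbing do not have to be changed globally:

```bash
#!/bin/bash
array_one=(20 30 40 50)
array_two=(10 20 30 80 100 110 40)

# comm needs lexically sorted input; the final sort -n restores numeric order.
mapfile -t intersection < <(
    comm -12 \
        <(printf '%s\n' "${array_one[@]}" | sort) \
        <(printf '%s\n' "${array_two[@]}" | sort) |
        sort -n
)

echo "${intersection[*]}"
```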

You almost certainly should not be using shell for this so here's ONE awk solution to your specific problem:
awk 'BEGIN{
split("20,30,40,50",array1,/,/)
split("10,20,30,80,100,110,40",array2,/,/)
for (i=1;i in array1;i++)
for (j=1;j in array2;j++)
if (array1[i] == array2[j])
array3[++k] = array1[i]
for (k=1; k in array3; k++)
printf "array3[%d] = %d\n",k,array3[k]
}'
array3[1] = 20
array3[2] = 30
array3[3] = 40
and if you tell us what you're really trying to do you can get a lot more help.

A pure bash solution using arrays:
#!/bin/bash
array1=(20,30,40,50)
array2=(10,20,30,80,100,110,40)
IFS=,
for i in $array1 $array2;{ ((++tmp[i]));}
for i in ${!tmp[*]};{ [ ${tmp[i]} -gt 1 ] && array3+=($i);}
echo ${array3[*]}
Output
20 30 40
As tmp is an indexed (not associative) array, its indexes come in ascending order using the ${!tmp[*]} notation. If you need a comma-separated list as output, use echo "${array3[*]}".
It can be used only if the source elements are integers, and it works only if each of the source arrays contains unique numbers.
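If the source arrays may contain repeated values, one possible workaround is to record membership rather than counts. The sketch below assumes bash 4 or later; the names `seen` and `common` and the duplicated sample values are made up for illustration.

```bash
#!/bin/bash
# Requires bash 4+ (declare -A, mapfile).
array1=(20 30 40 50 20)
array2=(10 20 30 80 100 110 40 40)

declare -A seen     # hypothetical name: values present in array1
for v in "${array1[@]}"; do
    seen[$v]=1
done

declare -A common   # hypothetical name: values present in both arrays
for v in "${array2[@]}"; do
    if [[ -n ${seen[$v]} ]]; then
        common[$v]=1
    fi
done

# Keys of an associative array come back unordered, so sort explicitly.
mapfile -t array3 < <(printf '%s\n' "${!common[@]}" | sort -n)
echo "${array3[*]}"
```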

Here's a solution with standard command line tools (sort and join):
join <(printf %s\\n "${array1[@]}" | sort -u) \
<(printf %s\\n "${array2[@]}" | sort -u) | sort -n
join requires its inputs to be sorted, and does not recognize numerical sort order. Consequently, I sort both lists in the default collation order, join them, and then re-sort the result numerically.
I also assumed that you'd created the arrays really as arrays, i.e.:
array1=(20 30 40 50)
I think the rest is more or less self-evident, possibly with the help of help printf and man bash.

You could also try Perl:
#!/bin/perl
use warnings;
use strict;
my @array1 = (20,30,40,50);
my @array2 = (10,20,30,80,100,110,40);
my @array3 = ();
foreach my $x (@array1) {
# compare numerically; a regex match such as /$x/ would also match 20 inside 120
if (grep { $_ == $x } @array2) {
print "found $x\n";
push @array3, $x;
}
}
print "@array3\n";

In addition to any of these fine answers, it seems that you also want to sort your array (containing the answer) in ascending order.
You can do that in a number of different ways, including this:
readarray -t array3 <<<"$(printf "%s\n" "${array3[@]}" | sort -n)"
This method also allows you to filter out duplicate values:
readarray -t array3 <<<"$(printf "%s\n" "${array3[@]}" | sort -n | uniq)"
And for the sake of the exercise, here's yet another way of solving it:
#!/bin/bash
array1=(20 30 40 50)
array2=(10 20 30 80 100 110 40)
declare -a array3
#sort both arrays
readarray -t array1 <<<"$(printf "%s\n" "${array1[@]}" | sort -n)"
readarray -t array2 <<<"$(printf "%s\n" "${array2[@]}" | sort -n)"
# look for values
i2=0
for i1 in ${!array1[@]}; do
# array names without $ are evaluated lazily, so running past the end of array2 is not a syntax error
while (( i2 < ${#array2[@]} && array1[i1] > array2[i2] )); do (( i2++ )); done
[[ ${array1[$i1]} == ${array2[$i2]} ]] && array3+=(${array1[$i1]})
done
echo ${array3[@]}

Consider using python:
In [6]: array1 = (20,30,40,50)
In [7]: array2 = (10,20,30,80,100,110,40)
In [8]: set(array1) & set(array2)
Out[8]: set([40, 20, 30])
