Array intersection in Bash [duplicate]

This question already has answers here:
Intersection of two lists in Bash
(5 answers)
Closed 6 years ago.
How do you compare two arrays in Bash to find all intersecting values?
Let's say:
array1 contains values 1 and 2
array2 contains values 2 and 3
I should get back 2 as a result.
My own answer:
for item1 in $array1; do
for item2 in $array2; do
if [[ $item1 = $item2 ]]; then
result=$result" "$item1
fi
done
done
I'm looking for alternate solutions as well.

Each element of list1 is looked up as a blank-delimited word in list2, which is first flattened to a string with ${list2[*]}:
list1=( 1 2 3 4 6 7 8 9 10 11 12 )
list2=( 1 2 3 5 6 8 9 11 )
l2=" ${list2[*]} " # add framing blanks
for item in "${list1[@]}"; do
    if [[ $l2 =~ " $item " ]]; then # the quoted right-hand side is matched literally
        result+=("$item")
    fi
done
echo "${result[@]}"
The result is
1 2 3 6 8 9 11

Taking @Raihan's answer and making it work with non-files (though file descriptors are created):
I know it's a bit of a cheat, but it seemed like a good alternative.
A side effect is that the output array will be lexicographically sorted; I hope that's okay.
(I also don't know what type of data you have, so I only tested with numbers; additional work may be needed if you have strings with special characters, etc.)
result=($(comm -12 <(for X in "${array1[@]}"; do echo "${X}"; done|sort) <(for X in "${array2[@]}"; do echo "${X}"; done|sort)))
Testing:
$ array1=(1 17 33 99 109)
$ array2=(1 2 17 31 98 109)
$ result=($(comm -12 <(for X in "${array1[@]}"; do echo "${X}"; done|sort) <(for X in "${array2[@]}"; do echo "${X}"; done|sort)))
$ echo "${result[@]}"
1 109 17
P.S. I'm sure there is a way to get the array to output one value per line without the for loop; I just forget it (IFS?).
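One loop-free form the postscript may be reaching for is printf, which repeats its format for every argument and so prints one element per line. The following is a minimal sketch of the same comm approach under that assumption; it uses mapfile (Bash 4 or later) to read the result back into an array, and it assumes no element contains a newline.

```shell
array1=(1 17 33 99 109)
array2=(1 2 17 31 98 109)

# printf '%s\n' is applied once per element, replacing the for/echo loop.
mapfile -t result < <(comm -12 <(printf '%s\n' "${array1[@]}" | sort) \
                               <(printf '%s\n' "${array2[@]}" | sort))

echo "${result[@]}"
# 1 109 17
```

As in the original, the result comes back in lexicographic order because comm needs sorted input.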

Your answer won't work, for two reasons:
1. $array1 just expands to the first element of array1. (Bash documents this: referencing an array variable without a subscript is equivalent to referencing it with a subscript of 0.)
2. After the first element gets added to result, result will contain a space, so the next run of result=$result" "$item1 will misbehave horribly. (Instead of appending to result, it will run the command consisting of the first two items, with the environment variable result set to the empty string.) Correction: it turns out I was wrong about this one; word splitting doesn't take place inside assignments. (See comments below.)
What you want is this:
result=()
for item1 in "${array1[@]}"; do
    for item2 in "${array2[@]}"; do
        if [[ $item1 = $item2 ]]; then
            result+=("$item1")
        fi
    done
done
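The nested loops above compare every pair of elements. As an alternative (a different technique, not the one in this answer), a lookup table can replace the inner loop. This is a minimal sketch assuming Bash 4 or later for associative arrays and assuming no element is the empty string; the name seen is made up for the example.

```shell
array1=(1 2)
array2=(2 3)

declare -A seen=()              # associative array: requires Bash 4+
for item in "${array2[@]}"; do
    seen[$item]=1               # record every value of array2 as a key
done

result=()
for item in "${array1[@]}"; do
    # Keep the element only if it was recorded as a key above.
    if [[ -n ${seen[$item]:-} ]]; then
        result+=("$item")
    fi
done

echo "${result[@]}"
# 2
```

With this version a value that occurs twice in array1 appears twice in result, however many times it occurs in array2.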

If you were looking for the intersecting lines of two files (instead of arrays), you could use the comm command. Both files must already be sorted.
$ comm -12 file1 file2
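Since comm expects sorted input, unsorted files can be sorted on the fly with process substitution. A small sketch, using throwaway files in a temporary directory (the file names and contents are made up for the demonstration):

```shell
tmpdir=$(mktemp -d)                      # scratch directory for the demo
printf '%s\n' 3 1 2 > "$tmpdir/file1"    # deliberately unsorted
printf '%s\n' 2 4 3 > "$tmpdir/file2"

# -1 and -2 suppress the lines unique to each file, leaving only common lines.
common=$(comm -12 <(sort "$tmpdir/file1") <(sort "$tmpdir/file2"))
echo "$common"
# 2
# 3

rm -rf "$tmpdir"
```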

Now that I understand what you mean by "array", I think -- first of all -- that you should consider using actual Bash arrays. They're much more flexible, in that (for example) array elements can contain whitespace, and you can avoid the risk that * and ? will trigger filename expansion.
But if you prefer to use your existing approach of whitespace-delimited strings, then I agree with RHT's suggestion to use Perl:
result=$(perl -e 'my %array2 = map +($_ => 1), split /\s+/, $ARGV[1];
print join " ", grep $array2{$_}, split /\s+/, $ARGV[0]
' "$array1" "$array2")
(The line-breaks are just for readability; you can get rid of them if you want.)
In the above Bash command, the embedded Perl program creates a hash named %array2 containing the elements of the second array, and then it prints any elements of the first array that exist in %array2.
This will behave slightly differently from your code in how it handles duplicate values in the second array; in your code, if array1 contains x twice and array2 contains x three times, then result will contain x six times, whereas in my code, result will contain x only twice. I don't know if that matters, since I don't know your exact requirements.

Related

How can I create a one-to-one relationship between two loops in bash?

In a bash script, I want to create a one-to-one relationship between two for loops that each have variables defined as a sequence. For example, I want something like
for g in `seq 11 1 21`;do
for i in `seq 1 1 10`;do
cat >$i.txt <<EOF
this one is $g.
EOF
done
done
to result in ten files (1.txt, 2.txt, 3.txt, etc). 1.txt would contain "this one is 11." 2.txt would contain "this one is 12." Etc.
The above example is permutative, where each value of g acts on each value of i. Is there a way to make it so only one value of g acts on only one value of i in a corresponding order (ie 1 corresponds to 11, 2 corresponds to 12, etc)?
Any help is greatly appreciated. Thank you.

Before I answer, there's a critical problem with the question: the two sequences have different lengths (there are eleven g values, but only ten i values). Either one of g's values must be skipped, or something filled in as the extra value for i. For my answer I'll assume g should actually run from 11 to 20, not 21.
If you don't want all combinations, then you only want one loop; the trick is to make a single loop iterate through both sequences simultaneously. There are a couple of ways to do this in bash. One is to store both sequences as arrays, and then iterate over their indexes:
g_array=( {11..20} ) # Could also use g_array=( $(seq 11 1 20) ) here
i_array=( {1..10} )
for index in "${!g_array[@]}"; do # ${!arr[@]} expands to the *indexes* of the array
cat >"${i_array[index]}.txt" <<EOF
this one is ${g_array[index]}.
EOF
done
Alternately, since these are just numeric sequences, you can use the for ((init; test; step)) structure to do it:
for ((g=11,i=1; g<=20; g++,i++)); do # Note: semicolons between parts, commas between things that happen together
cat >"$i.txt" <<EOF
this one is $g.
EOF
done
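Because the two sequences in this particular question differ by a constant, a third option is to loop over one variable and derive the other. This is only a sketch for that special case; it writes into a temporary directory (outdir is a name chosen for the example) rather than the current directory.

```shell
outdir=$(mktemp -d)          # hypothetical scratch directory for the demo
for i in {1..10}; do
    g=$(( i + 10 ))          # 1 -> 11, 2 -> 12, ..., 10 -> 20
    printf 'this one is %s.\n' "$g" > "$outdir/$i.txt"
done

cat "$outdir/1.txt"
# this one is 11.
```

For sequences without such a fixed relationship, the array-index approach above is the more general one.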

arr1=( $(seq 11 21) )
arr2=( $(seq 1 10) )
for idx in $(seq 0 $(( ${#arr2[*]} - 1 )))
do
    file="/tmp/${arr2[$idx]}.txt"
    echo "this one is ${arr1[$idx]}." > "$file"
done
First, both sequences are assigned to arrays. Then, iterating over the indexes of the second array (0 through its length minus 1), you build the file name and write the matching element of the first array to that file.

printf returns multiple copies

On OSX High Sierra, bash's printf seems to behave erroneously. Consider:
printf "[%s]" "x"
returns
[x]
all good... but:
printf "[%s]" "x" "y"
returns
[x][y]
instead of just [x] !!
don't tell me: don't provide more parameters. I don't know what the format will look like as it's passed to me, but I have parameters
the docs don't address this squarely, merely stating:
The format string is reused as often as necessary to satisfy the arguments.
Any extra format specifications are evaluated with zero or the null string.
is this broken?

From the POSIX specification of the printf utility:
The format operand shall be reused as often as necessary to satisfy the argument operands.
That means the format string is repeated as many times as needed to consume all the arguments. This is exactly how it was intended to work, and it is one of the most useful features of printf.
You want to repeat a character '#' 10 times? Nothing simpler:
printf "#%.0s" $(seq 10)
# will expand to:
printf "#%.0s" 1 2 3 4 5 6 7 8 9 10
# is equivalent to:
printf "#%.0s#%.0s#%.0s#%.0s#%.0s#%.0s#%.0s#%.0s#%.0s#%.0s" 1 2 3 4 5 6 7 8 9 10
The %.0s conversion prints zero characters of its string argument, that is, nothing. So the # is printed once for each argument.
You have an array and want to print all array members separated with a newline? Nothing simpler:
arr=(1 2 3 value1 test5 text7)
printf "%s\n" "${arr[@]}"

As I understand it, printf is behaving as stated in this sentence of the documentation:
The format string is reused as often as necessary to satisfy the arguments.
In your case, you have 2 arguments ("x" and "y") and just 1 format specifier ([%s]), so the format is reused, once for each argument.
printf consumes the arguments in order, and whenever it reaches the end of the format string it starts again from the beginning. The command:
printf "[%s](%s)" "x" "y" "z" "a"
outputs:
[x](y)[z](a)
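The other half of the quoted documentation (extra format specifications are evaluated with zero or the null string) shows up when the arguments run out partway through a pass. A short sketch of both behaviours, plus one way to get a single pass when that is what is wanted; the variable names are made up for the example.

```shell
# Three arguments, two conversions per pass: the format runs twice, and the
# unmatched second %s in the last pass is filled with the empty string.
reused=$(printf '[%s](%s)\n' x y z)
printf '%s\n' "$reused"
# [x](y)
# [z]()

# For a single pass, hand printf only as many arguments as the format consumes.
args=(x y)
first_only=$(printf '[%s]' "${args[0]}")
printf '%s\n' "$first_only"
# [x]
```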

How to iterate over two strings simultaneously ksh

I'm using data that is returned by another person's ksh93 script in the format of a print to the standard output. Depending on the flag I give it, their script gives me the information I need for my code. It comes out like a list separated by spaces, such that a run of the program has the format of:
"1 3 4 7 8"
"First Third Fourth Seventh Eighth"
For what I'm working on, I need to be able to match the entries of each output, so that I could make the information print in the following format:
1:First
3:Third
4:Fourth
7:Seventh
8:Eighth
I need to do more than just printing with the data, I just need to be able to access the pairs of information in each of the strings. Even though the actual contents of the strings can be any number of values, the two strings I get from running the other script will always be the same length.
I'm wondering if there exists a way to iterate over both at the same time, something along the lines of:
str_1=$(other_script -f)
str_2=$(other_script -i)
for a,b in ${str_1},${str_2} ; do
print "${a}:${b}"
done
This obviously isn't the right syntax, but I have been unable to find a way to make it work. Is there a way to iterate over both at the same time?
I know I could convert them to arrays first then iterate by numerical element, but I would like to save the time of converting them if there's a way to iterate over both simultaneously.

Why do you think it is not quick to convert the strings to arrays?
For example:
#!/bin/ksh93
set -u
set -A line1
string1="1 3 4 7 8"
line1+=( ${string1} )
set -A line2
string2="First Third Fourth Seventh Eighth"
line2+=( ${string2} )
typeset -i num_elem_line1=${#line1[@]}
typeset -i num_elem_line2=${#line2[@]}
typeset -i loop_counter=0
if (( num_elem_line1 == num_elem_line2 ))
then
    while (( loop_counter < num_elem_line1 ))
    do
        print "${line1[${loop_counter}]}:${line2[${loop_counter}]}"
        (( loop_counter += 1 ))
    done
fi

As with the other comments, not sure why an array would be out of the question, especially if you plan on referencing the individual elements more than once later in your code.
A sample script that assumes you want to maintain your str_1/str_2 variables as strings; we'll load into arrays for referencing individual elements:
$ cat testme
#!/bin/ksh
str_1="1 3 4 7 8"
str_2="First Third Fourth Seventh Eighth"
str1=( ${str_1} )
str2=( ${str_2} )
# at this point matching array elements have the same index (0..4) ...
echo "++++++++++ str1[index]=element"
for i in "${!str1[@]}"
do
echo "str1[${i}]=${str1[${i}]}"
done
echo "++++++++++ str2[index]=element"
for i in "${!str2[@]}"
do
echo "str2[${i}]=${str2[${i}]}"
done
# since matching array elements have the same index, we just need
# to loop through one set of indexes to allow us to access matching
# array elements at the same time ...
echo "++++++++++ str1:str2"
for i in "${!str1[@]}"
do
echo ${str1[${i}]}:${str2[${i}]}
done
echo "++++++++++"
And a run of the script:
$ testme
++++++++++ str1[index]=element
str1[0]=1
str1[1]=3
str1[2]=4
str1[3]=7
str1[4]=8
++++++++++ str2[index]=element
str2[0]=First
str2[1]=Third
str2[2]=Fourth
str2[3]=Seventh
str2[4]=Eighth
++++++++++ str1:str2
1:First
3:Third
4:Fourth
7:Seventh
8:Eighth
++++++++++
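If arrays really are to be avoided, the positional parameters can stand in for one of the two lists: split the first string into $1, $2, ... and shift once per word of the second string. The sketch below is written for bash (it uses local and printf rather than ksh's print), and pair_up is a hypothetical helper name. It assumes the words contain no glob characters such as * or ?.

```shell
str_1="1 3 4 7 8"
str_2="First Third Fourth Seventh Eighth"

# pair_up is a hypothetical helper, not part of the original scripts.
pair_up() {
    local second=$2 b
    # Unquoted on purpose: word splitting turns the first string into $1, $2, ...
    set -- $1
    for b in $second; do
        printf '%s:%s\n' "$1" "$b"
        shift    # advance to the next word of the first string
    done
}

pairs=$(pair_up "$str_1" "$str_2")
printf '%s\n' "$pairs"
# 1:First
# 3:Third
# 4:Fourth
# 7:Seventh
# 8:Eighth
```

Doing the split inside a function keeps the caller's own positional parameters untouched.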

What is "echo ${#names[3]} ${#names[@]}" doing after defining "names=(apples [3]=orange tomatoe)"?

Can anyone explain how I should read this command?
Here, I'm not sure what [3] means or does:
names=(apples [3]=orange tomatoe)
Here we reference names twice; what do [3] and [@] mean?
echo ${#names[3]} ${#names[@]}
The output is 6 3. I don't understand it; if someone has time to explain or point me to the correct man page, that would be great.

The first part demonstrates the general assignment syntax for arrays. The simple form,
$ names=(apples oranges tomatoe)
$ echo "${!names[@]}" # Show the indices defined for the array
0 1 2
assigns each element to consecutive integer indices starting with 0. If an index is explicitly given, that index is used instead, and subsequent values are assigned consecutively from there. Shell arrays don't have to be contiguous; your example leaves ${names[1]} and ${names[2]} undefined.
$ names=(apples [3]=orange tomatoe)
$ echo "${!names[@]}"
0 3 4
In the second command, you are using the parameter length operator. The first use tells you the length of ${names[3]}:
$ echo "${#names[3]}" # orange has 6 characters
6
The second one, with @ as the index, tells you the length of the array, i.e., how many values are in the array.
$ echo "${#names[@]}"
3
$ printf '%s\n' "${names[@]}"
apples
orange
tomatoe
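The pieces above can be combined in one loop over the defined indices, which makes the gap at indices 1 and 2 visible. A minimal sketch; the `<unset>` placeholder text is just an illustration.

```shell
names=(apples [3]=orange tomatoe)

# Walk only the indices that actually exist in the sparse array.
for i in "${!names[@]}"; do
    printf '%s=%s (%d chars)\n' "$i" "${names[i]}" "${#names[i]}"
done
# 0=apples (6 chars)
# 3=orange (6 chars)
# 4=tomatoe (7 chars)

echo "count=${#names[@]}"
# count=3

# ${var-default} substitutes the default only when the element is unset.
echo "names[1]=${names[1]-<unset>}"
# names[1]=<unset>
```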

How can I find the missing integers in a unique and sequential list (one per line) in a unix terminal?

Suppose I have a file as follows (a sorted, unique list of integers, one per line):
1
3
4
5
8
9
10
I would like the following output (i.e. the missing integers in the list):
2
6
7
How can I accomplish this within a bash terminal (using awk or a similar solution, preferably a one-liner)?

Using awk you can do this:
awk '{for(i=p+1; i<$1; i++) print i} {p=$1}' file
2
6
7
Explanation:
{p = $1}: Variable p contains value from previous record
{for ...}: We loop from p+1 to the current row's value (excluding current value) and print each value which is basically the missing values

Using seq and grep:
seq $(head -n1 file) $(tail -n1 file) | grep -vwFf file -
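seq creates the full sequence; grep then removes from it the lines that exist in the file (-v inverts the match, -w matches whole words only, -F treats the patterns as fixed strings, and -f reads the patterns from the file).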

perl -nE 'say for $a+1 .. $_-1; $a=$_'

Calling no external program (if filein contains the list of numbers):
#!/bin/bash
i=0
while read num; do
while (( ++i<num )); do
echo $i
done
done <filein

To adapt choroba's clever answer for my own use case, I needed my sequence to deal with zero-padded numbers.
The -w switch to seq is the magic here - it automatically pads the first number with the necessary number of zeroes to keep it aligned with the second number:
-w, --equal-width equalize width by padding with leading zeroes
My integers go from 0 to 9999, so I used the following:
seq -w 0 9999 | grep -vwFf "file.txt"
...which finds the missing integers in a sequence from 0000 to 9999. Or to put it back into the more universal solution in choroba's answer:
seq -w $(head -n1 "file.txt") $(tail -n1 "file.txt") | grep -vwFf "file.txt"
I didn't personally find that the - in his answer was necessary, but there may be use cases that require it.

Using Raku (formerly known as Perl_6)
raku -e 'my @a = lines.map: *.Int; say @a.Set (^) @a.minmax.Set;'
Sample Input:
1
3
4
5
8
9
10
Sample Output:
Set(2 6 7)
I'm sure there's a Raku solution similar to @JJoao's clever Perl5 answer, but in thinking about this problem my mind naturally turned to Set operations.
The code above reads lines into the @a array, mapping each line so that elements in the @a array are Ints, not strings. In the second statement, @a.Set converts the array to a Set on the left-hand side of the (^) operator. Also in the second statement, @a.minmax.Set converts the array to a second Set, on the right-hand side of the (^) operator, but this time because the minmax operator is used, all Int elements from the min to max are included. Finally, the (^) symbol is the symmetric set-difference (infix) operator, which finds the difference.
To get an unordered whitespace-separated list of missing integers, replace the above say with put. To get a sequentially-ordered list of missing integers, add the explicit sort below:
~$ raku -e 'my @a = lines.map: *.Int; .put for (@a.Set (^) @a.minmax.Set).sort.map: *.key;' file
2
6
7
The advantage of all Raku code above is that finding "missing integers" doesn't require a "sequential list" as input, nor is the input required to be unique. So hopefully this code will be useful for a wide variety of problems in addition to the explicit problem stated in the Question.
OTOH, Raku is a Perl-family language, so TMTOWTDI. Below, an @a.minmax array is created, and grepped so that none of the elements of @a are returned (none junction):
~$ raku -e 'my @a = lines.map: *.Int; .put for @a.minmax.grep: none @a;' file
2
6
7
https://docs.raku.org/language/setbagmix
https://docs.raku.org/type/Junction
https://raku.org
