Given a store of 3-tuples where:
All elements are numeric, e.g. (1, 3, 4) (1300, 3, 15) (1300, 3, 15) …
Tuples are removed and added frequently
At any time the store is typically under 100,000 elements
All Tuples are available in memory
The application is interactive requiring 100s of searches per second.
What are the most efficient algorithms/data structures to perform wild card (*) searches such as:
(1, *, 6) (3601, *, *) (*, 1935, *)
The aim is to have a Linda like tuple space but on an application level
Well, there are only 8 possible arrangements of wildcards, so you can easily construct 6 multi-maps and a set to serve as indices: one for each arrangement of wildcards in the query. You don't need an 8th index because the query (*,*,*) trivially returns all tuples. The set is for tuples with no wildcards; only a membership test is needed in this case.
A multimap takes a key to a set. In your example, e.g., the query (1,*,6) would consult the multimap for queries of the form (X,*,Y), which takes key <X,Y> to the set of all tuples with X in the first position and Y in third. In this case, X=1 and Y=6.
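As a rough illustration of one such index, here is a minimal Python sketch of the multimap serving queries of the form (X,*,Y). The names `index_x_star_y`, `index_tuple` and `query_x_star_y` are made up for this example and do not come from any library.

```python
from collections import defaultdict

# Index for queries of the form (X, *, Y): key (X, Y) -> set of matching tuples.
index_x_star_y = defaultdict(set)

def index_tuple(t):
    """Register tuple t under the key built from its first and third elements."""
    index_x_star_y[(t[0], t[2])].add(t)

def query_x_star_y(x, y):
    """Answer the query (x, *, y); a missing key yields an empty set."""
    return index_x_star_y.get((x, y), set())

for t in [(1, 3, 6), (1, 9, 6), (1300, 3, 15)]:
    index_tuple(t)

print(query_x_star_y(1, 6))  # the two tuples with 1 first and 6 third
```

The other five multimaps would differ only in which positions form the key.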
With any reasonable hash-based multimap implementation, lookups ought to be very fast. Several hundred a second ought to be easy, and several thousand per second doable (with e.g. a contemporary x86 CPU).
Insertions and deletions require updating the maps and set. Again this ought to be reasonably fast, though not as fast as lookups of course. Again several hundred per second ought to be doable.
With only ~10^5 tuples, this approach ought to be fine for memory as well. You can save a bit of space with tricks, e.g. keeping a single copy of each tuple in an array and storing indices in the map/set to represent both key and value. Manage array slots with a free list.
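The space-saving trick could look roughly like the following Python sketch. It is only one possible reading of the idea: `SlotStore` and its methods are hypothetical names, and the maps/set would then store the returned slot indices instead of tuple objects.

```python
class SlotStore:
    """Keeps one copy of each tuple in a list; slot indices stand in for tuples."""

    def __init__(self):
        self.slots = []     # slot index -> tuple, or None when the slot is free
        self.free = []      # free list: stack of reusable slot indices
        self.index_of = {}  # tuple -> slot index

    def add(self, t):
        """Store t (if new) and return its slot index."""
        if t in self.index_of:
            return self.index_of[t]
        if self.free:
            i = self.free.pop()      # reuse a freed slot
            self.slots[i] = t
        else:
            i = len(self.slots)      # grow the array
            self.slots.append(t)
        self.index_of[t] = i
        return i

    def remove(self, t):
        """Free the slot holding t and return its index."""
        i = self.index_of.pop(t)
        self.slots[i] = None
        self.free.append(i)
        return i

store = SlotStore()
a = store.add((1, 3, 4))
b = store.add((1300, 3, 15))
store.remove((1, 3, 4))
c = store.add((7, 8, 9))  # lands in the slot freed above
```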
To make this concrete, here is pseudocode. I'm going to use angle brackets <a,b,c> for tuples to avoid too many parens:
# Definitions
For a query Q <k2,k1,k0> where each of k_i is either * or an integer,
Let I(Q) be a 3-digit binary number b2|b1|b0 where
b_i=0 if k_i is * and 1 if k_i is an integer.
Let N(i) be the number of 1's in the binary representation of i
Let M(i) be a multimap taking a tuple with N(i) elements to a set
of tuples with 3 elements.
Let t be a 3 element tuple. Then T(t,i) returns a new tuple with
only the elements of t in positions where i has a 1. For example
T(<1,2,3>,0) = <> and T(<1,2,3>,6) = <1,2>
Note that function T works fine on query tuples with wildcards.
# Algorithm to insert tuple t into the database:
fun insert(t)
    for i = 0 to 7
        add the entry T(t,i)->t to M(i)
# Algorithm to delete tuple t from the database:
fun delete(t)
    for i = 0 to 7
        delete the entry T(t,i)->t from M(i)
# Query algorithm
fun query(Q)
    let i = I(Q)
    return M(i).lookup(T(Q, i)) # lookup failure returns empty set
Note that for simplicity, I've not shown the "optimizations" for M(0) and M(7). For M(0), the algorithm above would create a multimap taking the empty tuple to the set of all 3-tuples in the database. You can avoid this merely by treating i=0 as a special case. Similarly M(7) would take each tuple to a set containing only itself.
An "optimized" version:
fun insert(t)
    for i = 1 to 6
        add the entry T(t,i)->t to M(i)
    add t to set S
fun delete(t)
    for i = 1 to 6
        delete the entry T(t,i)->t from M(i)
    remove t from set S
fun query(Q)
    let i = I(Q)
    if i = 0, return S
    elsif i = 7 return if Q\in S { Q } else {}
    else return M(i).lookup(T(Q, i))
Addition
For fun, a Java implementation:
package hacking;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Random;
import java.util.Scanner;
import java.util.Set;
public class Hacking {
public static void main(String [] args) {
TupleDatabase db = new TupleDatabase();
int n = 200000;
long start = System.nanoTime();
for (int i = 0; i < n; ++i) {
db.insert(db.randomTriple());
}
long stop = System.nanoTime();
double elapsedSec = (stop - start) * 1e-9;
System.out.println("Inserted " + n + " tuples in " + elapsedSec
+ " seconds (" + (elapsedSec / n * 1000.0) + "ms per insert).");
Scanner in = new Scanner(System.in);
for (;;) {
System.out.print("Query: ");
int a = in.nextInt();
int b = in.nextInt();
int c = in.nextInt();
System.out.println(db.query(new Tuple(a, b, c)));
}
}
}
class Tuple {
static final int [] N_ONES = new int[] { 0, 1, 1, 2, 1, 2, 2, 3 };
static final int STAR = -1;
final int [] vals;
Tuple(int a, int b, int c) {
vals = new int[] { a, b, c };
}
Tuple(Tuple t, int code) {
vals = new int[N_ONES[code]];
int m = 0;
for (int k = 0; k < 3; ++k) {
if (((1 << k) & code) > 0) {
vals[m++] = t.vals[k];
}
}
}
@Override
public boolean equals(Object other) {
if (other instanceof Tuple) {
Tuple triple = (Tuple) other;
return Arrays.equals(this.vals, triple.vals);
}
return false;
}
@Override
public int hashCode() {
return Arrays.hashCode(this.vals);
}
@Override
public String toString() {
return Arrays.toString(vals);
}
int code() {
int c = 0;
for (int k = 0; k < 3; k++) {
if (vals[k] != STAR) {
c |= (1 << k);
}
}
return c;
}
Set<Tuple> setOf() {
Set<Tuple> s = new HashSet<>();
s.add(this);
return s;
}
}
class Multimap extends HashMap<Tuple, Set<Tuple>> {
@Override
public Set<Tuple> get(Object key) {
Set<Tuple> r = super.get(key);
return r == null ? Collections.<Tuple>emptySet() : r;
}
void put(Tuple key, Tuple value) {
if (containsKey(key)) {
super.get(key).add(value);
} else {
super.put(key, value.setOf());
}
}
void remove(Tuple key, Tuple value) {
Set<Tuple> set = super.get(key);
if (set == null) {
return; // nothing stored under this key, e.g. deleting an absent tuple
}
set.remove(value);
if (set.isEmpty()) {
super.remove(key);
}
}
}
class TupleDatabase {
final Set<Tuple> set;
final Multimap [] maps;
TupleDatabase() {
set = new HashSet<>();
maps = new Multimap[7];
for (int i = 1; i < 7; i++) {
maps[i] = new Multimap();
}
}
void insert(Tuple t) {
set.add(t);
for (int i = 1; i < 7; i++) {
maps[i].put(new Tuple(t, i), t);
}
}
void delete(Tuple t) {
set.remove(t);
for (int i = 1; i < 7; i++) {
maps[i].remove(new Tuple(t, i), t);
}
}
Set<Tuple> query(Tuple q) {
int c = q.code();
switch (c) {
case 0: return set;
case 7: return set.contains(q) ? q.setOf() : Collections.<Tuple>emptySet();
default: return maps[c].get(new Tuple(q, c));
}
}
Random gen = new Random();
int randPositive() {
return gen.nextInt(1000);
}
Tuple randomTriple() {
return new Tuple(randPositive(), randPositive(), randPositive());
}
}
Some output:
Inserted 200000 tuples in 2.981607358 seconds (0.014908036790000002ms per insert).
Query: -1 -1 -1
[[504, 296, 987], [500, 446, 184], [499, 482, 16], [488, 823, 40], ...
Query: 500 446 -1
[[500, 446, 184], [500, 446, 762]]
Query: -1 -1 500
[[297, 56, 500], [848, 185, 500], [556, 351, 500], [779, 986, 500], [935, 279, 500], ...
If you think of the tuples like an IP address, then a radix tree (trie) type structure might work. Radix trees are used for IP discovery.
Another way may be to use bit operations: calculate a bit hash for each tuple and, in your search, apply bitwise OR/AND for quick discovery.
I have the following problem:
Calculate the combination of three digits number consisting of 0-9, and no duplicate is allowed.
As far as I know, combinations don't care about ordering, so 123 is equal to 312 and the number of possible combinations should be
C(10, 3) = 10! / (3! * 7!) = 120 combinations
That said, I know how to calculate permutations (via backtracking), but I don't know how to calculate the combinations.
Any hint?
Finding the combinations is also done via backtracking. At each step you "guess" whether or not to add the current candidate element, and recurse on that decision (repeating for both the "include" and "exclude" decisions).
Here is the Java code:
import java.util.Stack;

public static int getCombinations(int[] arr, int maxSize) {
    return getCombinations(arr, maxSize, 0, new Stack<Integer>());
}
private static int getCombinations(int[] arr, int maxSize, int i, Stack<Integer> currentSol) {
    // a full combination has been built: print it and count it
    if (maxSize == 0) {
        System.out.println(currentSol);
        return 1;
    }
    // ran out of candidates before filling the combination
    if (i >= arr.length) return 0;
    //"guess" to include:
    currentSol.push(arr[i]);
    int x = getCombinations(arr, maxSize-1, i+1, currentSol);
    //clean up, then "guess" to exclude:
    currentSol.pop();
    x += getCombinations(arr, maxSize, i+1, currentSol);
    return x;
}
You can run it with the following demo:
public static void main(String args[]) {
    int[] data = {0,1,2,3,4,5,6,7,8,9};
    int x = getCombinations(data, 3);
    System.out.println("number of combinations generated: " + x);
}
This prints the series of combinations and, at the end, the number of combinations generated (unsurprisingly, 120).
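As a quick cross-check of the count, here is a small Python sketch using the standard library's `itertools.combinations` and `math.comb` (Python 3.8 or later). It is not part of the Java answer above, only an independent way to confirm the number 120.

```python
from itertools import combinations
from math import comb

digits = range(10)

# All unordered selections of 3 distinct digits, in lexicographic order.
triples = list(combinations(digits, 3))

print(len(triples))   # number of generated combinations
print(comb(10, 3))    # closed-form binomial coefficient C(10, 3)
```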
Example function to choose k items from a list of n items
void recurCombinations( listSoFar, listRemaining )
{
    if ( length(listSoFar) == k )
    {
        print listSoFar;
        return;
    }
    if ( length(listRemaining) <= 0 )
        return;
    // recur further without adding next item
    recurCombinations( listSoFar, listRemaining - listRemaining[0] );
    // recur further after adding next item
    recurCombinations( listSoFar + listRemaining[0], listRemaining - listRemaining[0] );
}
recurCombinations( [], [0, 1, 2, 3, 4, 5, 6, 7, 8, 9] );
You are probably looking for how to generate a combination by its number. The algorithm builds a sequence of C(a[i],i), with i running from the number of items in a combination down to 1, such that the sum of these C values equals your given number. The a[i] are then inverted against length-1 and returned as the result. PowerShell code that does this:
function getC {
# this returns Choose($big,$small)
param ([int32]$big,[int32]$small)
if ($big -lt $small) { return 0 }
$l=$big
$total=[int64]1
1..$small | % {
$total *= $l
$total /= $_
$l-=1
}
return $total
}
function getCombinationByNumber {
param([string[]]$array, [int32]$howMany, [int64[]]$numbers)
$total=(getc $array.length $howMany)-1
foreach($num in $numbers) {
$res=@()
$num=$total-$num # for lexicographic inversion, see link
foreach($current in $howMany..1) {
# compare "numbers" to C($inner,$current) as soon as getting less than "numbers" take "inner"
foreach ($inner in $array.length..($current-1)) {
$c=getc $inner $current
if ($c -le $num) {
$num-=$c
$res+=$inner
break;
}
}
}
# $numbers=0, $res contains inverted indexes
$res2=@()
$l=$array.length-1
$res | % { $res2+=$array[$l-$_] }
return $res2
} }
To launch, give the function an array from which to get combinations, e.g. @(0,1,2,3,4,5,6,7,8,9), the number of items in a combination (3) and the number of the combination, starting from zero. An example:
PS C:\Windows\system32> $b=@(0,1,2,3,4,5,6,7,8,9)
PS C:\Windows\system32> getCombinationByNumber $b 3 0
0
1
2
PS C:\Windows\system32> [String](getCombinationByNumber $b 3 0)
0 1 2
PS C:\Windows\system32> [String](getCombinationByNumber $b 3 102)
4 5 8
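For comparison, here is a minimal Python sketch of the same idea (picking a combination by its zero-based lexicographic number). It uses a more direct formulation than the PowerShell code above, without the index inversion, so treat it as an illustration rather than a port. The function name `combination_by_number` is made up for this example.

```python
from math import comb

def combination_by_number(items, k, number):
    """Return the k-combination of items with the given 0-based lexicographic rank."""
    n = len(items)
    if not 0 <= number < comb(n, k):
        raise IndexError("combination number out of range")
    result = []
    candidate = 0
    for remaining in range(k, 0, -1):
        # Skip candidates until the block of combinations that start with
        # `candidate` contains the requested number.
        while True:
            block = comb(n - candidate - 1, remaining - 1)
            if number < block:
                break
            number -= block
            candidate += 1
        result.append(items[candidate])
        candidate += 1
    return result

digits = list(range(10))
print(combination_by_number(digits, 3, 0))    # [0, 1, 2]
print(combination_by_number(digits, 3, 102))  # [4, 5, 8]
```

Both printed results agree with the PowerShell session above.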
I'm trying to write an algorithm for finding the index of the closest value that is lesser than or equal to the search value in a sorted array. In the example of the array [10, 20, 30], the following search values should output these indexes:
searchValue: 9, index: -1
searchValue: 10, index: 0
searchValue: 28, index: 1
searchValue: 55555, index: 2
I want to use binary search for logarithmic runtime. I have an algorithm in C-esque pseudocode, but it has 3 base cases. Can these 3 base cases be condensed into 1 for a more elegant solution?
int function indexOfClosestLesser(array, searchValue, startIndex, endIndex) {
if (startIndex == endIndex) {
if (searchValue >= array[startIndex]) {
return startIndex;
} else {
return -1;
}
}
// In the simplistic case of searching for 2 in [0, 2], the midIndex
// is always 0 due to int truncation. These checks are to avoid recursing
// infinitely from index 0 to index 1.
if (startIndex == endIndex - 1) {
if (searchValue >= array[endIndex]) {
return endIndex;
} else if (searchValue >= array[startIndex]) {
return startIndex;
} else {
return -1;
}
}
// In normal binary search, this would be the only base case
if (startIndex > endIndex) {
return -1;
}
int midIndex = startIndex + (endIndex - startIndex) / 2;
int midValue = array[midIndex];
if (midValue > searchValue) {
return indexOfClosestLesser(array, searchValue, startIndex, midIndex - 1);
} else if (searchValue >= midValue) {
// Unlike normal binary search, we don't start on midIndex + 1.
// We're not sure whether the midValue can be excluded yet
return indexOfClosestLesser(array, searchValue, midIndex, endIndex);
}
}
Based on your recursive approach, I would suggest the following C++ snippet that reduces the number of different cases a bit:
int search(int *array, int start_idx, int end_idx, int search_val) {
if( start_idx == end_idx )
return array[start_idx] <= search_val ? start_idx : -1;
int mid_idx = start_idx + (end_idx - start_idx) / 2;
if( search_val < array[mid_idx] )
return search( array, start_idx, mid_idx, search_val );
int ret = search( array, mid_idx+1, end_idx, search_val );
return ret == -1 ? mid_idx : ret;
}
Basically it performs a normal binary search. It only differs in the return statement of the last case to fulfill the additional requirement.
Here is a short test program:
#include <iostream>
int main( int argc, char **argv ) {
int array[3] = { 10, 20, 30 };
std::cout << search( array, 0, 2, 9 ) << std::endl;
std::cout << search( array, 0, 2, 10 ) << std::endl;
std::cout << search( array, 0, 2, 28 ) << std::endl;
std::cout << search( array, 0, 2, 55555 ) << std::endl;
return 0;
}
The output is as desired:
-1
0
1
2
Frankly speaking, I find the logic of finding a number greater than a given number a lot easier than the logic needed to find numbers less than or equal to a given number. Obviously, the reason behind that is the extra logic (that forms the edge cases) required to handle the duplicate numbers (of given num) present in the array.
public int justGreater(int[] arr, int val, int s, int e){
    // Returns the index of first element greater than val.
    // If no such value is present, returns the size of the array.
    if (s >= e){
        return arr[s] <= val ? s+1 : s;
    }
    int mid = (s + e) >> 1;
    // <= skips duplicates of val, so we land on the first strictly greater element
    if (arr[mid] <= val) return justGreater(arr, val, mid+1, e);
    return justGreater(arr, val, s, mid);
}
Then, to find the index of the closest value that is less than or equal to the search value in a sorted array, just subtract 1 from the returned value:
ans = justGreater(arr, val, 0, arr.length-1) - 1;
Trick
The trick here is to search for searchValue + 1 and return the found index minus 1, which is left - 1 in the code below.
For example, if we search for 9 in [10, 20, 30], the code will look for 10, find it at index 0, and we return 0 - 1, which is -1.
Similarly, if we search for 10 in the above example, it will search for 10 + 1, land on index 1, and we return 1 - 1, which is 0.
Code
def binary_search(array, searchValue, startIndex=0, endIndex=2 ** 32):
    """
    Binary search for the closest value less than or equal to the search value
    :param array: The given sorted list
    :param searchValue: Value to be found in the array
    :param startIndex: Initialized with 0
    :param endIndex: Initialized with 2**32
    :return: Returns the index closest value less than or equal to the search value
    """
    left = max(0, startIndex)
    right = min(len(array), endIndex)
    while left < right:
        mid = (left + right) // 2
        if array[mid] < searchValue + 1:
            left = mid + 1
        else:
            right = mid
    return left - 1
It can also be done in a single line with the standard library.
import bisect
def standard_binary_search(array, searchVal):
    return bisect.bisect_left(array, searchVal + 1) - 1
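The searchValue + 1 trick assumes integer values. A variant that avoids the + 1, sketched here as an alternative and not as part of the answer above, is `bisect.bisect_right`, which returns the insertion point after any entries equal to the search value. The function name `floor_index_bisect` is made up for this example.

```python
import bisect

def floor_index_bisect(array, search_val):
    """Index of the last element <= search_val in a sorted list, or -1 if none."""
    # bisect_right points just past the last element <= search_val.
    return bisect.bisect_right(array, search_val) - 1

print(floor_index_bisect([10, 20, 30], 28))  # 1
print(floor_index_bisect([1.5, 2.5], 2.0))   # 0, works for floats too
```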
Testing
Testing the test cases provided by OP
array = [10, 20, 30]
print(binary_search(array, 9))
print(binary_search(array, 10))
print(binary_search(array, 28))
print(binary_search(array, 5555))
Results
-1
0
1
2
I created a linear search to test the binary search.
def linear_search(array, searchVal):
    ans = -1
    for i, num in enumerate(array):
        if num > searchVal:
            return ans
        ans = i
    return ans
And a function to test all the binary search functions above.
Check for correctness
def check_correctness(array, searchVal):
    assert binary_search(array, searchVal) == linear_search(array, searchVal)
    assert binary_search(array, searchVal) == standard_binary_search(array, searchVal)
    return binary_search(array, searchVal)
Driver Function
nums = sorted(
[460, 4557, 1872, 2698, 4411, 1730, 3870, 4941, 77, 7789, 8553, 6011, 9882, 9597, 8060, 1518, 8210, 380, 6822, 9022,
8255, 8977, 2492, 5918, 3710, 4253, 8386, 9660, 2933, 7880, 615, 1439, 9311, 3526, 5674, 1899, 1544, 235, 3369,
519, 8018, 8489, 3093, 2547, 4903, 1836, 2447, 570, 7666, 796, 7149, 9623, 681, 1869, 4381, 2711, 9882, 4348, 4617,
7852, 5897, 4135, 9471, 4202, 6630, 3037, 9694, 9693, 7779, 3041, 3160, 4911, 8022, 7909, 297, 7258, 4379, 3216,
9474, 8876, 6108, 7814, 9484, 2868, 882, 4206, 3986, 3038, 3659, 3287, 2152, 2964, 7057, 7122, 261, 2716, 4845,
3709, 3562, 1928]
)
for num in range(10002):
    ans = check_correctness(nums, num)
    if ans != -1:
        print(num, nums[check_correctness(nums, num)])
The driver function ran without any assertion errors, which supports the correctness of the above two functions on this input.
Commented version in TypeScript. Based on this answer but modified to return less than or equal to.
/**
* Binary Search of a sorted array but returns the closest smaller value if the
* needle is not in the array.
*
* Returns null if the needle is not in the array and no smaller value is in
* the array.
*
* @param haystack the sorted array to search
* @param needle the needle to search for in the haystack
* @param compareFn classical comparison function, return -1 if a is less
* than b, 0 if a is equal to b, and 1 if a is greater than b
*/
export function lessThanOrEqualBinarySearch<T>(
haystack: T[],
needle: T,
compareFn: (a: T, b: T) => number
): T | null {
let lo = 0;
let hi = haystack.length - 1;
let lowestFound: T | null = null;
// iteratively search halves of the array but when we search the larger
// half keep track of the largest value in the smaller half
while (lo <= hi) {
let mid = (hi + lo) >> 1;
let cmp = compareFn(needle, haystack[mid]);
// needle is smaller than middle
// search in the bottom half
if (cmp < 0) {
hi = mid - 1;
continue;
}
// needle is larger than middle
// search in the top half
else if (cmp > 0) {
lo = mid + 1;
lowestFound = haystack[mid];
} else if (cmp === 0) {
return haystack[mid];
}
}
return lowestFound;
}
Here's a PHP version, based on user0815's answer.
Adapted it to take a function, not just an array, and made it more efficient by avoiding evaluation of $mid_idx twice.
function binarySearchLessOrEqual($start_idx, $end_idx, $search_val, $valueFunction)
{
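//N.B. An empty range (start past end) holds no candidate, so never call $valueFunction on it
if( $start_idx > $end_idx )
{
return -1;
}
//N.B. If the start index is equal to the end index, we've reached the end!
if( $start_idx == $end_idx )
{
return $valueFunction($end_idx) <= $search_val ? $end_idx : -1;
}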
$mid_idx = intval($start_idx + ($end_idx - $start_idx) / 2);
if ( $valueFunction($mid_idx) > $search_val ) //If the function is too big, we search in the bottom half
{
return binarySearchLessOrEqual( $start_idx, $mid_idx-1, $search_val, $valueFunction);
}
else //If the function returns less than OR equal, we search in the top half
{
$ret = binarySearchLessOrEqual($mid_idx+1, $end_idx, $search_val, $valueFunction);
//If nothing is suitable, then $mid_idx was actually the best one!
return $ret == -1 ? $mid_idx : $ret;
}
}
Rather than taking an array, it takes a int-indexed function. You could easily adapt it to take an array instead, or simply use it as below:
function indexOfClosestLesser($array, $searchValue)
{
return binarySearchLessOrEqual(
0,
count($array)-1,
$searchValue,
function ($n) use ($array)
{
return $array[$n];
}
);
}
Tested:
$array = [ 10, 20, 30 ];
echo "0: " . indexOfClosestLesser($array, 0) . "<br>"; //-1
echo "5: " . indexOfClosestLesser($array, 5) . "<br>"; //-1
echo "10: " . indexOfClosestLesser($array, 10) . "<br>"; //0
echo "15: " . indexOfClosestLesser($array, 15) . "<br>"; //0
echo "20: " . indexOfClosestLesser($array, 20) . "<br>"; //1
echo "25: " . indexOfClosestLesser($array, 25) . "<br>"; //1
echo "30: " . indexOfClosestLesser($array, 30) . "<br>"; //2
echo "35: " . indexOfClosestLesser($array, 35) . "<br>"; //2
Try using a pair of global variables, then reference those variables inside the COMPARE function for bsearch
In RPGIV we can call c functions.
The compare function with global variables looks like this:
dcl-proc compInvHdr;
dcl-pi compInvHdr int(10);
elmPtr1 pointer value;
elmPtr2 pointer value;
end-pi;
dcl-ds elm1 based(elmPtr1) likeds(invHdr_t);
dcl-ds elm2 based(elmPtr2) likeds(elm1);
dcl-s low int(10) inz(-1);
dcl-s high int(10) inz(1);
dcl-s equal int(10) inz(0);
select;
when elm1.rcd.RECORDNO < elm2.rcd.RECORDNO;
lastHiPtr = elmPtr2;
return low;
when elm1.rcd.RECORDNO > elm2.rcd.RECORDNO;
lastLoPtr = elmPtr2;
return high;
other;
return equal;
endsl;
end-proc;
Remember, that in bsearch the first element is the search key and the second element is the actual storage element in your array/memory, that is why the COMPARE procedure is using elmPtr2;
the call to bsearch looks like this:
// lastLoPtr and LastHiPtr are global variables
// basePtr points to the beginning of the array
lastLoPtr = basePtr;
lastHiPtr = basePtr + ((numRec - 1) * sizRec);
searchKey = 'somevalue';
hitPtr = bsearch(%addr(searchkey)
:basePtr
:numRec
:sizRec
:%PADDR('COMPINVHDR'));
if hitPtr = *null;
//? not found
hitPtr = lastLoPtr;
else;
//? found
endif;
So if the key is not found, hitPtr is set to the closest lower match, effectively achieving a "less than or equal" key lookup.
If you want the opposite, the next greater key, then use lastHiPtr to reference the first key greater than the search key.
Note: protect the global variables against race conditions (if applicable).
I wanted to provide a non-binary-search way of doing this, in C#. The following finds the closest value to X that is not greater than X (it can be equal to X). The function does not need the list to be sorted. It is O(n) in the worst case; it finishes sooner only when the exact target number is found, in which case it terminates early and returns that integer.
public static int FindClosest(List<int> numbers, int target)
{
int current = 0;
int difference = Int32.MaxValue;
foreach(int integer in numbers)
{
if(integer == target)
{
return integer;
}
int diff = Math.Abs(target - integer);
if(integer <= target && integer >= current && diff < difference)
{
current = integer;
difference = diff;
}
}
return current;
}
I tested this with the following setup, and it appears to be working flawlessly:
List<int> values = new List<int>() {1,24,32,6,14,9,11,22 };
int target = 21;
int closest = FindClosest(values,target);
Console.WriteLine("Closest: " + closest);
7 years later, I hope to provide some intuition:
If search_val <= arr[mid], we know for sure that the solution resides in the interval [lo, mid], inclusive. So, we set right=mid (we could probably set right=mid-1 if mid is not included). Note that if search_val < arr[mid], we in fact know that the solution resides in [lo, mid), mid not inclusive. This is because, when search_val is less than arr[mid], mid cannot serve as the closest value <= search_val.
On the other hand, suppose search_val >= arr[mid]. In this case, we know that the solution resides in [mid, hi]. In fact, even if search_val > arr[mid], the solution is still in [mid, hi]. This means that we should set left = mid. HOWEVER, in binary search, left is usually set to mid + 1 to avoid infinite loops. But this means that when the loop ends at left == right, we may be 1 index past the solution. Thus, we do a check at the very end to return either left or left-1, as you can see in the other solutions.
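The reasoning above can be sketched in Python roughly as follows. This is one way to write the loop with the final left / left-1 check, under the assumption of a sorted list; the name `floor_index_loop` is made up for this example.

```python
def floor_index_loop(arr, search_val):
    """Index of the closest value <= search_val in sorted arr, or -1 if none."""
    if not arr:
        return -1
    left, right = 0, len(arr) - 1
    while left < right:
        mid = (left + right) // 2
        if search_val < arr[mid]:
            right = mid        # answer lies strictly before mid, keep mid as bound
        else:
            left = mid + 1     # mid may be the answer, but move on to avoid looping
    # left == right here, and left may be one index past the answer.
    return left if arr[left] <= search_val else left - 1

for value in (9, 10, 28, 55555):
    print(value, floor_index_loop([10, 20, 30], value))
```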
Practice Problem: Search a 2D Matrix
Write an efficient algorithm that searches for a value target in an m x n integer matrix matrix. This matrix has the following properties:
Integers in each row are sorted from left to right.
The first integer of each row is greater than the last integer of the
previous row.
The smart solution to this problem is to treat the two-dimensional array as a one-dimensional one and use regular binary search. But I wrote a solution that first locates the correct row. The process of finding the correct row in this problem is basically the same as finding the closest value less than or equal to the search value.
Additional link on binary search: Useful Insights into Binary Search
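A rough Python sketch of the row-first approach described above: find the last row whose first element is <= target, then binary search inside that row. The function name `search_matrix` is an assumption for this example, and copying the first column costs O(m), so this illustrates the idea and is not a strictly O(log(m*n)) solution.

```python
import bisect

def search_matrix(matrix, target):
    """Return True if target occurs in a row-sorted matrix with increasing rows."""
    if not matrix or not matrix[0]:
        return False
    # Row lookup is a "closest value <= target" search on the first column.
    first_column = [row[0] for row in matrix]
    r = bisect.bisect_right(first_column, target) - 1
    if r < 0:
        return False           # target is smaller than every element
    row = matrix[r]
    c = bisect.bisect_left(row, target)
    return c < len(row) and row[c] == target

grid = [[1, 3, 5, 7], [10, 11, 16, 20], [23, 30, 34, 60]]
print(search_matrix(grid, 3), search_matrix(grid, 13))
```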
A non-recursive way using a loop. I'm using this in JavaScript, so I'll just post it in JavaScript:
let left = 0
let right = array.length
let mid = 0
while (left < right) {
mid = Math.floor((left + right) / 2)
if (searchValue < array[mid]) {
right = mid
} else {
left = mid + 1
}
}
return left - 1
Since the general guideline tells us to look at the middle pointer, many fail to see that the actual answer is the left pointer's final value.
I'm practicing algorithms and one of my tasks is to count the number of all longest increasing sub-sequences for given 0 < n <= 10^6 numbers. Solution O(n^2) is not an option.
I have already implemented finding a LIS and its length (LIS Algorithm), but this algorithm switches numbers to the lowest possible. Therefore, it's impossible to determine if sub-sequences with a previous number (the bigger one) would be able to achieve the longest length, otherwise I could just count those switches, I guess.
Any ideas how to get this in about O(nlogn)? I know that it should be solved using dynamic-programming.
I implemented one solution and it works well, but it requires two nested loops (i in 1..n) x (j in 1..i-1).
So it's O(n^2) I think, nevertheless it's too slow.
I even tried moving those numbers from the array into a binary tree (because in each i iteration I look for all numbers smaller than number[i], going through elements i-1..1), but it was even slower.
Example tests:
1 3 2 2 4
result: 3 (1,3,4 | 1,2,4 | 1,2,4)
3 2 1
result: 3 (1 | 2 | 3)
16 5 8 6 1 10 5 2 15 3 2 4 1
result: 3 (5,8,10,15 | 5,6,10,15 | 1,2,3,4)
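For reference, the quadratic two-loop dynamic programming approach mentioned in the question can be sketched in Python as below. This is an assumed reconstruction (the question does not show its code), useful mainly as a slow baseline to check faster solutions against; `count_lis_quadratic` is a made-up name and it returns (length, count).

```python
def count_lis_quadratic(seq):
    """Return (length, count) of the longest strictly increasing subsequences, O(n^2)."""
    n = len(seq)
    if n == 0:
        return (0, 0)
    length = [1] * n  # length[i]: longest increasing subsequence ending at i
    count = [1] * n   # count[i]: how many such subsequences end at i
    for i in range(n):
        for j in range(i):
            if seq[j] < seq[i]:
                if length[j] + 1 > length[i]:
                    length[i] = length[j] + 1
                    count[i] = count[j]
                elif length[j] + 1 == length[i]:
                    count[i] += count[j]
    best = max(length)
    total = sum(c for l, c in zip(length, count) if l == best)
    return (best, total)

print(count_lis_quadratic([1, 3, 2, 2, 4]))  # (3, 3)
print(count_lis_quadratic([3, 2, 1]))        # (1, 3)
```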
Finding the number of all longest increasing subsequences
The full Java code of the improved LIS algorithm, which finds not only the length of the longest increasing subsequence but also the number of subsequences of that length, is below. I prefer to use generics to allow not only integers, but any comparable types.
@Test
public void testLisNumberAndLength() {
    List<Integer> input = Arrays.asList(16, 5, 8, 6, 1, 10, 5, 2, 15, 3, 2, 4, 1);
    int[] result = lisNumberAndLength(input);
    System.out.println(String.format(
        "This sequence has %s longest increasing subsequences of length %s",
        result[0], result[1]
    ));
}
/**
* Body of improved LIS algorithm
*/
public <T extends Comparable<T>> int[] lisNumberAndLength(List<T> input) {
if (input.size() == 0)
return new int[] {0, 0};
List<List<Sub<T>>> subs = new ArrayList<>();
List<Sub<T>> tails = new ArrayList<>();
for (T e : input) {
int pos = search(tails, new Sub<>(e, 0), false); // row for a new sub to be placed
int sum = 1;
if (pos > 0) {
List<Sub<T>> pRow = subs.get(pos - 1); // previous row
int index = search(pRow, new Sub<T>(e, 0), true); // index of most left element that <= e
if (pRow.get(index).value.compareTo(e) < 0) {
index--;
}
sum = pRow.get(pRow.size() - 1).sum; // sum of tail element in previous row
if (index >= 0) {
sum -= pRow.get(index).sum;
}
}
if (pos >= subs.size()) { // add a new row
List<Sub<T>> row = new ArrayList<>();
row.add(new Sub<>(e, sum));
subs.add(row);
tails.add(new Sub<>(e, 0));
} else { // add sub to existing row
List<Sub<T>> row = subs.get(pos);
Sub<T> tail = row.get(row.size() - 1);
if (tail.value.equals(e)) {
tail.sum += sum;
} else {
row.add(new Sub<>(e, tail.sum + sum));
tails.set(pos, new Sub<>(e, 0));
}
}
}
List<Sub<T>> lastRow = subs.get(subs.size() - 1);
Sub<T> last = lastRow.get(lastRow.size() - 1);
return new int[]{last.sum, subs.size()};
}
/**
* Implementation of binary search in a sorted list
*/
public <T> int search(List<? extends Comparable<T>> a, T v, boolean reversed) {
if (a.size() == 0)
return 0;
int sign = reversed ? -1 : 1;
int right = a.size() - 1;
Comparable<T> vRight = a.get(right);
if (vRight.compareTo(v) * sign < 0)
return right + 1;
int left = 0;
int pos = 0;
Comparable<T> vPos;
Comparable<T> vLeft = a.get(left);
for(;;) {
if (right - left <= 1) {
if (vRight.compareTo(v) * sign >= 0 && vLeft.compareTo(v) * sign < 0)
return right;
else
return left;
}
pos = (left + right) >>> 1;
vPos = a.get(pos);
if (vPos.equals(v)) {
return pos;
} else if (vPos.compareTo(v) * sign > 0) {
right = pos;
vRight = vPos;
} else {
left = pos;
vLeft = vPos;
}
}
}
/**
* Class for 'sub' pairs
*/
public static class Sub<T extends Comparable<T>> implements Comparable<Sub<T>> {
T value;
int sum;
public Sub(T value, int sum) {
this.value = value;
this.sum = sum;
}
@Override public String toString() {
return String.format("(%s, %s)", value, sum);
}
@Override public int compareTo(Sub<T> another) {
return this.value.compareTo(another.value);
}
}
Explanation
Since this explanation is long, I will call the initial sequence "seq" and any of its subsequences a "sub". The task, then, is to count the longest increasing subs that can be obtained from the seq.
The idea is to keep counts of all possible longest subs obtained in previous steps. Let's create a numbered list of rows, where the number of each row equals the length of the subs stored in that row. We store subs as pairs of numbers (v, c), where "v" is the "value" of the ending element and "c" is the "count" of subs of the given length that end with "v". For example:
1: (16, 1) // that means that so far we have 1 sub of length 1 which ends by 16.
We will build this list step by step, taking elements from the initial sequence in order. At every step we try to add the element to the longest sub that it can extend, and record the changes.
Building a list
Let's build the list using sequence from your example, since it has all possible options:
16 5 8 6 1 10 5 2 15 3 2 4 1
First, take element 16. Our list is empty so far, so we just put one pair in it:
1: (16, 1) <= one sub that ends by 16
Next is 5. It cannot be added to a sub that ends by 16, so it will create new sub with length of 1. We create a pair (5, 1) and put it into line 1:
1: (16, 1)(5, 1)
Element 8 comes next. It cannot create the sub [16, 8] of length 2, but it can create the sub [5, 8]. This is where the algorithm comes in. First, we iterate over the rows from top to bottom, looking at the "value" of the last pair in each row. If our element is greater than the last values in all rows, then we can add it to existing sub(s), increasing their length by one. So value 8 will create a new row in the list, because it is greater than the last values of all rows existing so far (i.e. > 5):
1: (16, 1)(5, 1)
2: (8, ?) <=== need to resolve how many longest subs ending by 8 can be obtained
Element 8 can continue 5, but cannot continue 16. So we need to search through the previous row, starting from its end, summing the "counts" of the pairs whose "value" is less than 8:
(16, 1)(5, 1)^ // sum = 0
(16, 1)^(5, 1) // sum = 1
^(16, 1)(5, 1) // value 16 >= 8: stop. count = sum = 1, so write 1 in pair next to 8
1: (16, 1)(5, 1)
2: (8, 1) <=== so far we have 1 sub of length 2 which ends by 8.
Why don't we store value 8 among the subs of length 1 (first row)? Because we need subs of the maximum possible length, and 8 can continue some previous subs. Every later number greater than 8 will also continue such a sub, so there is no need to keep 8 as a sub shorter than it can be.
Next is 6. Searching the rows from top to bottom by their last "values":
1: (16, 1)(5, 1) <=== 5 < 6, go next
2: (8, 1)
1: (16, 1)(5, 1)
2: (8, 1 ) <=== 8 >= 6, so 6 should be put here
Found the room for 6, need to calculate a count:
take previous line
(16, 1)(5, 1)^ // sum = 0
(16, 1)^(5, 1) // 5 < 6: sum = 1
^(16, 1)(5, 1) // 16 >= 6: stop, write count = sum = 1
1: (16, 1)(5, 1)
2: (8, 1)(6, 1)
After processing 1:
1: (16, 1)(5, 1)(1, 1) <===
2: (8, 1)(6, 1)
After processing 10:
1: (16, 1)(5, 1)(1, 1)
2: (8, 1)(6, 1)
3: (10, 2) <=== count is 2 because both "values" 8 and 6 from previous row are less than 10, so we summarized their "counts": 1 + 1
After processing 5:
1: (16, 1)(5, 1)(1, 1)
2: (8, 1)(6, 1)(5, 1) <===
3: (10, 2)
After processing 2:
1: (16, 1)(5, 1)(1, 1)
2: (8, 1)(6, 1)(5, 1)(2, 1) <===
3: (10, 2)
After processing 15:
1: (16, 1)(5, 1)(1, 1)
2: (8, 1)(6, 1)(5, 1)(2, 1)
3: (10, 2)
4: (15, 2) <===
After processing 3:
1: (16, 1)(5, 1)(1, 1)
2: (8, 1)(6, 1)(5, 1)(2, 1)
3: (10, 2)(3, 1) <===
4: (15, 2)
After processing 2:
1: (16, 1)(5, 1)(1, 1)
2: (8, 1)(6, 1)(5, 1)(2, 2) <===
3: (10, 2)(3, 1)
4: (15, 2)
If, when searching the rows by last element, we find an equal element, we calculate its "count" again from the previous row and add it to the existing "count".
After processing 4:
1: (16, 1)(5, 1)(1, 1)
2: (8, 1)(6, 1)(5, 1)(2, 2)
3: (10, 2)(3, 1)
4: (15, 2)(4, 1) <===
After processing 1:
1: (16, 1)(5, 1)(1, 2) <===
2: (8, 1)(6, 1)(5, 1)(2, 2)
3: (10, 2)(3, 1)
4: (15, 2)(4, 1)
So what do we have after processing the whole initial sequence? Looking at the last row, we see 3 longest subs, each consisting of 4 elements: 2 end with 15 and 1 ends with 4.
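The walkthrough above can be sketched in Python roughly as follows. This is only a minimal sketch of the unoptimized version (linear scans for both the row search and the count), and the name `count_lis_rows` is mine, not from the answer:

```python
def count_lis_rows(seq):
    """Return (length, count) of the longest strictly increasing subsequences."""
    # rows[k] holds [value, count] pairs for subs of length k + 1;
    # values inside a row are strictly decreasing from left to right.
    rows = []
    for x in seq:
        # Walk the rows top to bottom until one ends with a value >= x.
        k = 0
        while k < len(rows) and rows[k][-1][0] < x:
            k += 1
        if k == 0:
            count = 1  # x starts a new sub of length 1
        else:
            # Sum the counts of pairs in the previous row whose value is < x,
            # scanning from the end of that row.
            count = 0
            for value, c in reversed(rows[k - 1]):
                if value >= x:
                    break
                count += c
        if k == len(rows):
            rows.append([[x, count]])      # x is greater than every last value
        elif rows[k][-1][0] == x:
            rows[k][-1][1] += count        # equal element: add to its count
        else:
            rows[k].append([x, count])
    if not rows:
        return 0, 0
    return len(rows), sum(c for _, c in rows[-1])


print(count_lis_rows([16, 5, 8, 6, 1, 10, 5, 2, 15, 3, 2, 4, 1]))
```

For the sequence used above it reports length 4 and count 3, matching the last row (15, 2)(4, 1).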
What about complexity?
On every iteration, when taking the next element from the initial sequence, we run 2 loops: the first iterates over the rows to find room for the element, and the second sums the counts in the previous row. So for every element we make up to n iterations (worst cases: if the initial sequence is in increasing order, we get a list of n rows with 1 pair in each row; if it is sorted in descending order, we get a single row with n pairs). That gives O(n^2) complexity overall, which is not what we want.
First, it is clear that in every intermediate state the rows are ordered by increasing last "value". So instead of a brute-force loop we can use binary search, which takes O(log n).
Second, we don't need to sum the "counts" of subs by looping through the row elements every time. We can accumulate them as we go, when a new pair is added to the row, like:
1: (16, 1)(5, 2) <=== instead of 1, put 1 + "count" of previous element in the row
So the second number no longer shows the count of longest subs that end with the given value, but the total count of all longest subs that end with any element greater than or equal to the "value" of the pair.
Thus, "counts" are replaced by "sums". Instead of iterating over the previous row, we perform a binary search (possible because the pairs in any row are always ordered by their "values") and compute the "sum" for the new pair as: the "sum" of the last pair in the previous row, minus the "sum" of the pair just left of the found position in the previous row, plus the "sum" of the previous pair in the current row.
So when processing 4:
1: (16, 1)(5, 2)(1, 3)
2: (8, 1)(6, 2)(5, 3)(2, 5)
3: (10, 2)(3, 3)
4: (15, 2) <=== room for (4, ?)
search in row 3 by "values" < 4:
3: (10, 2)^(3, 3)
4 will be paired with 3 - 2 + 2 = 3: ("sum" from the last pair of the previous row) - ("sum" from the pair left of the found position in the previous row) + ("sum" from the previous pair in the current row):
4: (15, 2)(4, 3)
In this case, the final count of all longest subs is the "sum" from the last pair of the last row of the list, i.e. 3, not 3 + 2.
So, using binary search for both the row search and the sum search, we arrive at O(n*log n) complexity.
As for memory: after processing the whole array we have at most n pairs, so memory consumption with dynamic arrays is O(n). Using dynamic arrays or collections costs some additional time to allocate and resize them, but most operations take O(1) time because we never sort or rearrange anything during the process. So the complexity estimate seems to be final.
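To make the optimized variant concrete, here is a hedged Python sketch with cumulative "sums" and binary search via the standard `bisect` module. The names `tails`, `neg` and `sums` are my own, and storing negated values (so each row is ascending and `bisect` can be used on it) is just one convenient way to do the search, not something the answer prescribes:

```python
from bisect import bisect_left, bisect_right


def count_lis_fast(seq):
    """Return (length, count) of the longest strictly increasing subsequences."""
    tails = []  # tails[k]: last (smallest) value of row k, increasing with k
    neg = []    # neg[k]: negated values of row k, ascending so bisect applies
    sums = []   # sums[k][j]: cumulative count over pairs 0..j of row k
    for x in seq:
        k = bisect_left(tails, x)  # first row whose last value is >= x
        if k == 0:
            count = 1
        else:
            # Pairs of the previous row with value >= x form a prefix of length p.
            p = bisect_right(neg[k - 1], -x)
            skipped = sums[k - 1][p - 1] if p else 0
            count = sums[k - 1][-1] - skipped
        if k == len(tails):
            tails.append(x)
            neg.append([-x])
            sums.append([count])
        elif tails[k] == x:
            sums[k][-1] += count  # equal element: only the last sum changes
        else:
            tails[k] = x
            neg[k].append(-x)
            sums[k].append(sums[k][-1] + count)
    if not tails:
        return 0, 0
    return len(tails), sums[-1][-1]


print(count_lis_fast([16, 5, 8, 6, 1, 10, 5, 2, 15, 3, 2, 4, 1]))
```

Each element costs two binary searches and O(1) amortized list updates, which is where the O(n*log n) estimate comes from.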
Sasha Salauyou's answer is great, but it is not clear to me why this line is needed:
sum -= pRow.get(index).sum;
Here is my code based on the same idea:
import java.math.BigDecimal;
import java.util.*;
class lisCount {
static BigDecimal lisCount(int[] a) {
class Container {
Integer v;
BigDecimal count;
Container(Integer v) {
this.v = v;
}
}
List<List<Container>> lisIdxSeq = new ArrayList<List<Container>>();
int lisLen, lastIdx;
List<Container> lisSeqL;
Container lisEle;
BigDecimal count;
int pre;
for (int i = 0; i < a.length; i++){
pre = -1;
count = new BigDecimal(1);
lisLen = lisIdxSeq.size();
lastIdx = lisLen - 1;
lisEle = new Container(i);
if(lisLen == 0 || a[i] > a[lisIdxSeq.get(lastIdx).get(0).v]){
// lis len increased
lisSeqL = new ArrayList<Container>();
lisSeqL.add(lisEle);
lisIdxSeq.add(lisSeqL);
pre = lastIdx;
}else{
int h = lastIdx;
int l = 0;
while(l < h){
int m = (l + h) / 2;
if(a[lisIdxSeq.get(m).get(0).v] < a[i]) l = m + 1;
else h = m;
}
List<Container> lisSeqC = lisIdxSeq.get(l);
if(a[i] <= a[lisSeqC.get(0).v]){
int hi = lisSeqC.size() - 1;
int lo = 0;
while(lo < hi){
int mi = (hi + lo) / 2;
if(a[lisSeqC.get(mi).v] < a[i]) lo = mi + 1;
else hi = mi;
}
lisSeqC.add(lo, lisEle);
pre = l - 1;
}
}
if(pre >= 0){
Iterator<Container> it = lisIdxSeq.get(pre).iterator();
count = new BigDecimal(0);
while(it.hasNext()){
Container nt = it.next();
if(a[nt.v] < a[i]){
count = count.add(nt.count);
}else break;
}
}
lisEle.count = count;
}
BigDecimal rst = new BigDecimal(0);
Iterator<Container> i = lisIdxSeq.get(lisIdxSeq.size() - 1).iterator();
while(i.hasNext()){
rst = rst.add(i.next().count);
}
return rst;
}
public static void main(String[] args) {
System.out.println(lisCount(new int[] { 1, 3, 2, 2, 4 }));
System.out.println(lisCount(new int[] { 3, 2, 1 }));
System.out.println(lisCount(new int[] { 16, 5, 8, 6, 1, 10, 5, 2, 15, 3, 2, 4, 1 }));
}
}
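Patience sorting is way shorter and simpler than the methods above. It is O(N*logN) if the pile for each card is found by binary search; the version below scans the piles linearly instead, so it is O(N*L) in the worst case, where L is the LIS length: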
static int[] input = {4, 5, 2, 8, 9, 3, 6, 2, 7, 8, 6, 6, 7, 7, 3, 6};
/**
* Every time a value is tested it either adds to the length of the LIS (by calling decs.add() with it), or replaces a dec with a smaller card, which lowers the bar for later cards to extend the LIS. This way all inputs/cards contribute in one way or another (except when they are equal to the biggest number in the sequence; if you want to include equal values in the sequence, replace 'card <= decs.get(decIndex)' with 'card < decs.get(decIndex)'). If they're bigger than all decs, they add to the length of the LIS (which is what we want), while if they're smaller than a dec, they replace it. We want this, because the smaller the biggest dec is, the smaller the input we need before we can add onto the LIS.
*
* If we run into a decreasing sequence the input from this sequence will replace each other (because they'll always replace the leftmost dec). Thus this algorithm won't wrongfully register e.g. {2, 1, 3} as {2, 3}, but rather {2} -> {1} -> {1, 3}.
*
* WARNING: This can only be used to find length, not actual sequence, seeing how parts of the sequence will be replaced by smaller numbers trying to make their sequence dominate
*
* Due to bigger decs being added to the end/right of 'decs' and the leftmost matching dec always being the one replaced by a smaller card, the further right a dec is (the bigger its index), the bigger it must be. Thus, by always replacing the leftmost such dec, we don't run the risk of replacing the biggest number in a sequence (the number which determines whether more cards can be added to that sequence) before a sequence of the same length but with smaller numbers (thus currently equally good, due to length, and potentially better, because less is needed to increase its length) has been found.
*/
static void patienceFindLISLength() {
ArrayList<Integer> decs = new ArrayList<>();
inputLoop: for (Integer card : input) {
for (int decIndex = 0; decIndex < decs.size(); decIndex++) {
if (card <= decs.get(decIndex)) {
decs.set(decIndex, card);
continue inputLoop;
}
}
decs.add(card);
}
System.out.println(decs.size());
}
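For comparison, here is a small Python sketch of the same pile-replacement idea where the leftmost pile is found with `bisect_left` from the standard library instead of a linear scan, which is what gives the O(N*logN) bound. The function name `lis_length` is mine; like the Java version, it only yields the length, not the sequence itself:

```python
from bisect import bisect_left


def lis_length(cards):
    """Length of the longest strictly increasing subsequence."""
    tops = []  # tops[i]: smallest possible last value of a sequence of length i + 1
    for card in cards:
        i = bisect_left(tops, card)  # leftmost pile whose top is >= card
        if i == len(tops):
            tops.append(card)        # bigger than every top: the LIS grows
        else:
            tops[i] = card           # replace the top with the smaller card
    return len(tops)


print(lis_length([4, 5, 2, 8, 9, 3, 6, 2, 7, 8, 6, 6, 7, 7, 3, 6]))
```

Using `bisect_left` keeps the behaviour of the `card <= decs.get(decIndex)` test above, so equal values replace a pile instead of extending the sequence.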
C++ implementation of the row/cumulative-sum counting logic described above:
#include<bits/stdc++.h>
using namespace std;
#define pb push_back
#define pob pop_back
#define pll pair<ll, ll>
#define pii pair<int, int>
#define ll long long
#define ull unsigned long long
#define fori(a,b) for(i=a;i<b;i++)
#define forj(a,b) for(j=a;j<b;j++)
#define fork(a,b) for(k=a;k<b;k++)
#define forl(a,b) for(l=a;l<b;l++)
#define forir(a,b) for(i=a;i>=b;i--)
#define forjr(a,b) for(j=a;j>=b;j--)
#define mod 1000000007
#define boost std::ios::sync_with_stdio(false)
struct comp_pair_int_rev
{
bool operator()(const pair<int,int> &a, const int & b)
{
return (a.first > b);
}
bool operator()(const int & a,const pair<int,int> &b)
{
return (a > b.first);
}
};
struct comp_pair_int
{
bool operator()(const pair<int,int> &a, const int & b)
{
return (a.first < b);
}
bool operator()(const int & a,const pair<int,int> &b)
{
return (a < b.first);
}
};
int main()
{
int n,i,mx=0,p,q,r,t;
cin>>n;
int a[n];
vector<vector<pii > > v(100005);
vector<pii > v1(100005);
fori(0,n)
cin>>a[i];
v[1].pb({a[0], 1} );
v1[1]= {a[0], 1};
mx=1;
fori(1,n)
{
if(a[i]<=v1[1].first)
{
r=v1[1].second;
if(v1[1].first==a[i])
v[1].pob();
v1[1]= {a[i], r+1};
v[1].pb({a[i], r+1});
}
else if(a[i]>v1[mx].first)
{
q=upper_bound(v[mx].begin(), v[mx].end(), a[i], comp_pair_int_rev() )-v[mx].begin();
if(q==0)
{
r=v1[mx].second;
}
else
{
r=v1[mx].second-v[mx][q-1].second;
}
v1[++mx]= {a[i], r};
v[mx].pb({a[i], r});
}
else if(a[i]==v1[mx].first)
{
q=upper_bound(v[mx-1].begin(), v[mx-1].end(), a[i], comp_pair_int_rev() )-v[mx-1].begin();
if(q==0)
{
r=v1[mx-1].second;
}
else
{
r=v1[mx-1].second-v[mx-1][q-1].second;
}
p=v1[mx].second;
v1[mx]= {a[i], p+r};
v[mx].pob();
v[mx].pb({a[i], p+r});
}
else
{
p=lower_bound(v1.begin()+1, v1.begin()+mx+1, a[i], comp_pair_int() )-v1.begin();
t=v1[p].second;
if(v1[p].first==a[i])
{
v[p].pob();
}
q=upper_bound(v[p-1].begin(), v[p-1].end(), a[i], comp_pair_int_rev() )-v[p-1].begin();
if(q==0)
{
r=v1[p-1].second;
}
else
{
r=v1[p-1].second-v[p-1][q-1].second;
}
v1[p]= {a[i], t+r};
v[p].pb({a[i], t+r});
}
}
cout<<v1[mx].second;
return 0;
}
Although I completely agree with Alex, this can be done very easily using a segment tree.
Here is the logic to find the length of the LIS using a segment tree in O(N log N):
https://www.quora.com/What-is-the-approach-to-find-the-length-of-the-strictly-increasing-longest-subsequence
Here is an approach that finds the number of LIS but has O(N^2) complexity:
https://codeforces.com/blog/entry/48677
We use a segment tree (as in the first link) to optimize the approach given in the second.
Here is the logic:
First sort the array in ascending order (keeping each element's original index; equal values are ordered by decreasing original index so they cannot extend each other), and initialise the segment tree with zeroes. For a given range, the segment tree should answer two things (use a pair for this):
a. the max of first (the LIS length ending at a position).
b. the sum of second (the number of such LIS) over the entries whose first equals that max.
Iterate through the sorted array.
Let j be the original index of the current element. Query the range (0 .. j-1) and update the j-th element with (max + 1, sum); if the result of the query is (0,0), update it with (1,1).
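The steps above could look roughly like this in Python. It is only a sketch: I use an iterative, bottom-up segment tree over the original positions rather than a recursive one, and the names (`count_lis_segtree`, `merge`, `query`, `update`) are mine:

```python
def count_lis_segtree(seq):
    """Return (length, count) of the longest strictly increasing subsequences."""
    n = len(seq)
    if n == 0:
        return 0, 0
    size = 1
    while size < n:
        size *= 2
    # Each node stores (best length, number of ways to reach it) for its range.
    tree = [(0, 0)] * (2 * size)

    def merge(a, b):
        if a[0] != b[0]:
            return a if a[0] > b[0] else b
        return (a[0], a[1] + b[1])

    def query(lo, hi):
        # Combine the leaves in the half-open range [lo, hi).
        res = (0, 0)
        lo += size
        hi += size
        while lo < hi:
            if lo & 1:
                res = merge(res, tree[lo])
                lo += 1
            if hi & 1:
                hi -= 1
                res = merge(res, tree[hi])
            lo //= 2
            hi //= 2
        return res

    def update(pos, val):
        pos += size
        tree[pos] = val
        pos //= 2
        while pos:
            tree[pos] = merge(tree[2 * pos], tree[2 * pos + 1])
            pos //= 2

    # Ascending by value; equal values by decreasing index (strict increase).
    order = sorted(range(n), key=lambda j: (seq[j], -j))
    for j in order:
        length, ways = query(0, j)
        update(j, (length + 1, max(ways, 1)))
    return tree[1]


print(count_lis_segtree([1, 3, 2, 2, 4]))
```

The root of the tree holds the overall maximum length together with the number of subsequences that reach it.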
Here is my code in C++:
#include<bits/stdc++.h>
#define tr(container, it) for(typeof(container.begin()) it = container.begin(); it != container.end(); it++)
#define ll long long
#define pb push_back
#define endl '\n'
#define pii pair<ll int,ll int>
#define vi vector<ll int>
#define all(a) (a).begin(),(a).end()
#define F first
#define S second
#define sz(x) (ll int)x.size()
#define hell 1000000007
#define rep(i,a,b) for(ll int i=a;i<b;i++)
#define lbnd lower_bound
#define ubnd upper_bound
#define bs binary_search
#define mp make_pair
using namespace std;
#define N 100005
ll max(ll a , ll b)
{
if( a > b) return a ;
else return
b;
}
ll n,l,r;
vector< pii > seg(4*N);
pii query(ll cur,ll st,ll end,ll l,ll r)
{
if(l<=st&&r>=end)
return seg[cur];
if(r<st||l>end)
return mp(0,0); /* 2-change here */
ll mid=(st+end)>>1;
pii ans1=query(2*cur,st,mid,l,r);
pii ans2=query(2*cur+1,mid+1,end,l,r);
if(ans1.F>ans2.F)
return ans1;
if(ans2.F>ans1.F)
return ans2;
return make_pair(ans1.F,ans2.S+ans1.S); /* 3-change here */
}
void update(ll cur,ll st,ll end,ll pos,ll upd1, ll upd2)
{
if(st==end)
{
// a[pos]=upd; /* 4-change here */
seg[cur].F=upd1;
seg[cur].S=upd2; /* 5-change here */
return;
}
ll mid=(st+end)>>1;
if(st<=pos&&pos<=mid)
update(2*cur,st,mid,pos,upd1,upd2);
else
update(2*cur+1,mid+1,end,pos,upd1,upd2);
seg[cur].F=max(seg[2*cur].F,seg[2*cur+1].F);
if(seg[2*cur].F==seg[2*cur+1].F)
seg[cur].S = seg[2*cur].S+seg[2*cur+1].S;
else
{
if(seg[2*cur].F>seg[2*cur+1].F)
seg[cur].S = seg[2*cur].S;
else
seg[cur].S = seg[2*cur+1].S;
/* 6-change here */
}
}
int main()
{
ios_base::sync_with_stdio(false);
cin.tie(0);
cout.tie(0);
int TESTS=1;
// cin>>TESTS;
while(TESTS--)
{
int n ;
cin >> n;
vector< pii > arr(n);
rep(i,0,n)
{
cin >> arr[i].F;
arr[i].S = -i;
}
sort(all(arr));
update(1,0,n-1,-arr[0].S,1,1);
rep(i,1,n)
{
pii x = query(1,0,n-1,-1,-arr[i].S - 1 );
update(1,0,n-1,-arr[i].S,x.F+1,max(x.S,1));
}
cout<<seg[1].S;//answer
}
return 0;
}
You are given as input an unsorted array of n distinct numbers, where n is a power of 2. Give an algorithm that identifies the second-largest number in the array, and that uses at most n+log₂(n)−2 comparisons.
Start by comparing the elements in odd and even positions of the n-element array and determining the larger element of each pair. This step requires n/2 comparisons. Now you've got only n/2 elements. Continue pairwise comparisons to get n/4, n/8, ... elements. Stop when the largest element is found. This step requires a total of n/2 + n/4 + n/8 + ... + 1 = n-1 comparisons.
During the previous step, the largest element was directly compared with log₂(n) other elements. You can determine the largest of these elements in log₂(n)-1 comparisons. That is the second-largest number in the array.
Example: array of 8 numbers [10,9,5,4,11,100,120,110].
Comparisons on level 1: [10,9] ->10 [5,4]-> 5, [11,100]->100 , [120,110]-->120.
Comparisons on level 2: [10,5] ->10 [100,120]->120.
Comparisons on level 3: [10,120]->120.
Maximum is 120. It was directly compared with: 10 (on level 3), 100 (on level 2), 110 (on level 1).
Step 2 should find the maximum of 10, 100, and 110, which is 110. That's the second largest element.
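A minimal Python sketch of this tournament, assuming distinct values and at least two elements. The function name and the explicit comparison counter are mine, added only to check the n+log₂(n)−2 bound on the example:

```python
def second_largest(nums):
    """Return (second largest value, number of comparisons used)."""
    comparisons = 0
    # Each player is (value, list of values it has beaten directly).
    players = [(x, []) for x in nums]
    while len(players) > 1:
        next_round = []
        for i in range(0, len(players) - 1, 2):
            a, b = players[i], players[i + 1]
            comparisons += 1
            winner, loser = (a, b) if a[0] > b[0] else (b, a)
            winner[1].append(loser[0])
            next_round.append(winner)
        if len(players) % 2:
            next_round.append(players[-1])  # odd player out gets a bye
        players = next_round
    _, beaten = players[0]
    # Step 2: the second largest is the maximum among the direct losers.
    second = beaten[0]
    for x in beaten[1:]:
        comparisons += 1
        if x > second:
            second = x
    return second, comparisons


print(second_largest([10, 9, 5, 4, 11, 100, 120, 110]))
```

On the 8-element example it uses 7 comparisons to find 120 and 2 more to pick 110 from the three direct losers, i.e. 9 = 8 + 3 − 2.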
sly s's answer is derived from this paper, but he didn't explain the algorithm, so someone stumbling across this question has to read the whole paper, and his code isn't very sleek either. I'll give the crux of the algorithm from the aforementioned paper, complete with complexity analysis, and also provide a Python implementation, just because that's the language I chose while working on these problems.
Basically, we do two passes:
Find the max, and keep track of which elements the max was compared to.
Find the max among the elements the max was compared to; the result is the second largest element.
In the picture above, 12 is the largest number in the array, and was compared to 3, 1, 11, and 10 in the first pass. In the second pass, we find the largest among {3, 1, 11, 10}, which is 11, which is the second largest number in the original array.
Time Complexity:
All elements must be looked at, therefore, n - 1 comparisons for pass 1.
Since we divide the problem into two halves each time, there are at most log₂n recursive calls, for each of which, the comparisons sequence grows by at most one; the size of the comparisons sequence is thus at most log₂n, therefore, log₂n - 1 comparisons for pass 2.
Total number of comparisons <= (n - 1) + (log₂n - 1) = n + log₂n - 2
from typing import MutableSequence, Sequence, Tuple

def second_largest(nums: Sequence[int]) -> int:
    def _max(lo: int, hi: int, seq: Sequence[int]) -> Tuple[int, MutableSequence[int]]:
        # Returns the max of seq[lo..hi] and the elements it was compared to.
        if lo >= hi:
            return seq[lo], []
        mid = lo + (hi - lo) // 2
        x, a = _max(lo, mid, seq)
        y, b = _max(mid + 1, hi, seq)
        if x > y:
            a.append(y)
            return x, a
        b.append(x)
        return y, b

    comparisons = _max(0, len(nums) - 1, nums)[1]
    return _max(0, len(comparisons) - 1, comparisons)[0]
The first run for the given example is as follows:
lo=0, hi=1, mid=0, x=10, a=[], y=4, b=[]
lo=0, hi=2, mid=1, x=10, a=[4], y=5, b=[]
lo=3, hi=4, mid=3, x=8, a=[], y=7, b=[]
lo=3, hi=5, mid=4, x=8, a=[7], y=2, b=[]
lo=0, hi=5, mid=2, x=10, a=[4, 5], y=8, b=[7, 2]
lo=6, hi=7, mid=6, x=12, a=[], y=3, b=[]
lo=6, hi=8, mid=7, x=12, a=[3], y=1, b=[]
lo=9, hi=10, mid=9, x=6, a=[], y=9, b=[]
lo=9, hi=11, mid=10, x=9, a=[6], y=11, b=[]
lo=6, hi=11, mid=8, x=12, a=[3, 1], y=11, b=[9]
lo=0, hi=11, mid=5, x=10, a=[4, 5, 8], y=12, b=[3, 1, 11]
Things to note:
There are exactly n - 1=11 comparisons for n=12.
From the last line, y=12 wins over x=10, and the next pass starts with the sequence [3, 1, 11, 10], which has ⌈log₂(12)⌉ = ⌈3.58⌉ = 4 elements, and will require 3 comparisons to find the maximum.
I have implemented in Java the algorithm from Evgeny Kluev's answer. The total number of comparisons is n+log2(n)−2. There is also a good reference:
Alexander Dekhtyar: CSC 349: Design and Analysis of Algorithms. This is similar to the top-voted algorithm.
import java.util.Arrays;
public class op1 {
private static int findSecondRecursive(int n, int[] A){
int[] firstCompared = findMaxTournament(0, n-1, A); //n-1 comparisons;
int[] secondCompared = findMaxTournament(2, firstCompared[0]-1, firstCompared); //log2(n)-1 comparisons.
//Total comparisons: n+log2(n)-2;
return secondCompared[1];
}
private static int[] findMaxTournament(int low, int high, int[] A){
if(low == high){
int[] compared = new int[2];
compared[0] = 2;
compared[1] = A[low];
return compared;
}
int[] compared1 = findMaxTournament(low, (low+high)/2, A);
int[] compared2 = findMaxTournament((low+high)/2+1, high, A);
if(compared1[1] > compared2[1]){
int k = compared1[0] + 1;
int[] newcompared1 = new int[k];
System.arraycopy(compared1, 0, newcompared1, 0, compared1[0]);
newcompared1[0] = k;
newcompared1[k-1] = compared2[1];
return newcompared1;
}
int k = compared2[0] + 1;
int[] newcompared2 = new int[k];
System.arraycopy(compared2, 0, newcompared2, 0, compared2[0]);
newcompared2[0] = k;
newcompared2[k-1] = compared1[1];
return newcompared2;
}
private static void printarray(int[] a){
for(int i:a){
System.out.print(i + " ");
}
System.out.println();
}
public static void main(String[] args) {
//Demo.
System.out.println("Original array: ");
int[] A = {10,4,5,8,7,2,12,3,1,6,9,11};
printarray(A);
int secondMax = findSecondRecursive(A.length,A);
Arrays.sort(A);
System.out.println("Sorted array(for check use): ");
printarray(A);
System.out.println("Second largest number in A: " + secondMax);
}
}
the problem is:
Say that at comparison level 1 the algorithm needs to remember every array element, because the largest is not yet known; then the same at level 2, and finally at level 3. Tracking these elements means additional value assignments, and once the largest is known you also need to trace back through them. As a result, it will not be significantly faster than the simple 2N-2 comparison algorithm. Moreover, because the code is more complicated, you also need to think about potential debugging time.
E.g. in PHP, the running time of a comparison versus a value assignment is roughly (11-19) to 16.
I shall give an example for better understanding:
example 1 :
>12 56 98 12 76 34 97 23
>>(12 56) (98 12) (76 34) (97 23)
>>> 56 98 76 97
>>>> (56 98) (76 97)
>>>>> 98 97
>>>>>> 98
The largest element is 98
Now compare the elements that lost directly to the largest element 98 (12, 56 and 97). 97 is the second largest.
nlogn implementation
public class Test {
public static void main(String...args){
int arr[] = new int[]{1,2,2,3,3,4,9,5, 100 , 101, 1, 2, 1000, 102, 2,2,2};
System.out.println(getMax(arr, 0, 16));
}
public static Holder getMax(int[] arr, int start, int end){
if (start == end)
return new Holder(arr[start], Integer.MIN_VALUE);
else {
int mid = ( start + end ) / 2;
Holder l = getMax(arr, start, mid);
Holder r = getMax(arr, mid + 1, end);
if (l.compareTo(r) > 0 )
return new Holder(l.high(), r.high() > l.low() ? r.high() : l.low());
else
return new Holder(r.high(), l.high() > r.low() ? l.high(): r.low());
}
}
static class Holder implements Comparable<Holder> {
private int low, high;
public Holder(int r, int l){low = l; high = r;}
public String toString(){
return String.format("Max: %d, SecMax: %d", high, low);
}
public int compareTo(Holder data){
if (high == data.high)
return 0;
if (high > data.high)
return 1;
else
return -1;
}
public int high(){
return high;
}
public int low(){
return low;
}
}
}
Why not use this simple linear scan for the given array[n]? It runs in c*n time, where c is the constant cost of the checks per element, and it does at most 2n comparisons.
int first = 0;
int second = 0;
for(int i = 0; i < n; i++) {
    if(array[i] > first) {
        second = first;
        first = array[i];
    } else if(array[i] > second) {
        // a value between second and first must still update second
        second = array[i];
    }
}
Or maybe I just do not understand the question...
In Python 2.7: the following code runs in O(n log log n) because of the extra sort. Any optimizations?
def secondLargest(testList):
    secondList = []
    # Iterate through the list
    while(len(testList) > 1):
        left = testList[0::2]
        right = testList[1::2]
        if (len(testList) % 2 == 1):
            right.append(0)
        myzip = zip(left,right)
        mymax = [ max(list(val)) for val in myzip ]
        myzip.sort()
        secondMax = [x for x in myzip[-1] if x != max(mymax)][0]
        if (secondMax != 0 ):
            secondList.append(secondMax)
        testList = mymax
    return max(secondList)
public static int FindSecondLargest(int[] input)
{
Dictionary<int, List<int>> dictWinnerLoser = new Dictionary<int, List<int>>();//Keeps track of the losers of each winner
List<int> lstWinners = null;
List<int> lstLoosers = null;
int winner = 0;
int looser = 0;
while (input.Count() > 1)//Runs till we get max in the array
{
lstWinners = new List<int>();//Keeps track of winners of each run, as we have to run with winners of each run till we get one winner
for (int i = 0; i < input.Count() - 1; i += 2)
{
if (input[i] > input[i + 1])
{
winner = input[i];
looser = input[i + 1];
}
else
{
winner = input[i + 1];
looser = input[i];
}
lstWinners.Add(winner);
if (!dictWinnerLoser.ContainsKey(winner))
{
lstLoosers = new List<int>();
lstLoosers.Add(looser);
dictWinnerLoser.Add(winner, lstLoosers);
}
else
{
lstLoosers = dictWinnerLoser[winner];
lstLoosers.Add(looser);
dictWinnerLoser[winner] = lstLoosers;
}
}
input = lstWinners.ToArray();//run the loop again with winners
}
List<int> loosersOfWinner = dictWinnerLoser[input[0]];//Gives all the elements that lost to the max element of the array; the input array now has only one element, which is the max of the array
winner = 0;
for (int i = 0; i < loosersOfWinner.Count(); i++)//Now the max among the losers of the winner gives the second largest
{
if (winner < loosersOfWinner[i])
{
winner = loosersOfWinner[i];
}
}
return winner;
}