How to implement pagination for multikey indexes in Tarantool? - tarantool

I have a multikey index over a field that stores an array of integers.
How can I implement cursor-based pagination using it?
In practice, if a tuple has more than one value in the array field, it is selected #tuple[array_field_idx] times.
Yesterday I implemented a "distinct" select (using FFI to get the tuple pointer address), but it seems that it will not work with pagination.
Do you have any ideas how this can be implemented in Tarantool?

To implement pagination for a multikey index, you need to know something about Tarantool internals.
The code below works for Tarantool 2.3+ (but it could break in the future). Please test it carefully and update the FFI definitions whenever you upgrade Tarantool.
So, let's start. First, you should know that the Tarantool B-tree stores data in a special structure, memtx_tree_data, that contains a pointer to your tuple and a hint. For a simple tree index the hint is a number that speeds up comparisons between tuples; for a multikey index it is the position of the indexed element in the array.
Next, we need to understand how to extract the hint along with a tuple.
This can be done using some FFI code and a tree iterator.
local ffi = require('ffi')
ffi.cdef([[
typedef struct index_def;
typedef struct index;
typedef struct memtx_tree;
typedef struct mempool;
typedef uint64_t hint_t;
enum iterator_type {
/* ITER_EQ must be the first member for request_create */
ITER_EQ = 0, /* key == x ASC order */
ITER_REQ = 1, /* key == x DESC order */
ITER_ALL = 2, /* all tuples */
ITER_LT = 3, /* key < x */
ITER_LE = 4, /* key <= x */
ITER_GE = 5, /* key >= x */
ITER_GT = 6, /* key > x */
ITER_BITS_ALL_SET = 7, /* all bits from x are set in key */
ITER_BITS_ANY_SET = 8, /* at least one x's bit is set */
ITER_BITS_ALL_NOT_SET = 9, /* all bits are not set */
ITER_OVERLAPS = 10, /* key overlaps x */
ITER_NEIGHBOR = 11, /* tuples in distance ascending order from specified point */
iterator_type_MAX
};
typedef struct iterator {
/**
* Iterate to the next tuple.
* The tuple is returned in #ret (NULL if EOF).
* Returns 0 on success, -1 on error.
*/
int (*next)(struct iterator *it, struct tuple **ret);
/** Destroy the iterator. */
void (*free)(struct iterator *);
/** Space cache version at the time of the last index lookup. */
uint32_t space_cache_version;
/** ID of the space the iterator is for. */
uint32_t space_id;
/** ID of the index the iterator is for. */
uint32_t index_id;
/**
* Pointer to the index the iterator is for.
* Guaranteed to be valid only if the schema
* state has not changed since the last lookup.
*/
struct index *index;
};
struct memtx_tree_key_data {
/** Sequence of msgpacked search fields. */
const char *key;
/** Number of msgpacked search fields. */
uint32_t part_count;
/** Comparison hint, see tuple_hint(). */
hint_t hint;
};
struct memtx_tree_data {
/* Tuple that this node is represents. */
struct tuple *tuple;
/** Comparison hint, see key_hint(). */
hint_t hint;
};
typedef int16_t bps_tree_pos_t;
typedef uint32_t bps_tree_block_id_t;
typedef uint32_t matras_id_t;
struct matras_view {
/* root extent of the view */
void *root;
/* block count in the view */
matras_id_t block_count;
/* all views are linked into doubly linked list */
struct matras_view *prev_view, *next_view;
};
struct memtx_tree_iterator {
/* ID of a block, containing element. -1 for an invalid iterator */
bps_tree_block_id_t block_id;
/* Position of an element in the block. Could be -1 for last in block*/
bps_tree_pos_t pos;
/* Version of matras memory for MVCC */
struct matras_view view;
};
typedef struct tree_iterator {
struct iterator base;
struct memtx_tree_iterator tree_iterator;
enum iterator_type type;
struct memtx_tree_key_data key_data;
struct memtx_tree_data current;
/** Memory pool the iterator was allocated from. */
struct mempool *pool;
};
]])
local function get_tree_comparison_hint(box_iterator_state)
if box_iterator_state == nil then
return nil
end
local casted = ffi.cast("struct tree_iterator*", box_iterator_state)
--
-- IMPORTANT: the hint is zero-based (like arrays in C).
-- Lua arrays are one-based.
--
return casted.current.hint
end
return {
get_tree_comparison_hint = get_tree_comparison_hint,
}
Then consider the following example:
local box_iterator = require('common.box_iterator')
box.cfg{}
local space = box.schema.create_space('dict', {
format = {
{name = 'id', type = 'number'},
{name = 'bundles', type = 'array'}
},
if_not_exists = true,
})
space:create_index('pk', {
unique = true,
parts = {
{field = 1, type = 'number'}
},
if_not_exists = true,
})
space:create_index('multikey', {
unique = false,
parts = {
{field = 2, type = 'string', path = '[*]'},
-- Note: I intentionally add primary index parts here
{field = 1, type = 'number'}
},
if_not_exists = true,
})
space:replace({1, {'a', 'b', 'c', 'd'}})
space:replace({2, {'b', 'c'}})
space:replace({3, {'a', 'd'}})
space:replace({4, {'c', 'd'}})
for iter_state, tuple in space.index.multikey:pairs({'a'}, {iterator = 'GE'}) do
local position = box_iterator.get_tree_comparison_hint(iter_state) + 1
print(
string.ljust(tostring(tuple), 30),
position,
tuple[2][tonumber(position)]
)
end
os.exit()
Output is:
# Tuple Hint Indexed element
[1, ['a', 'b', 'c', 'd']] 1ULL a
[3, ['a', 'd']] 1ULL a
[1, ['a', 'b', 'c', 'd']] 2ULL b
[2, ['b', 'c']] 1ULL b
[1, ['a', 'b', 'c', 'd']] 3ULL c
[2, ['b', 'c']] 2ULL c
[4, ['c', 'd']] 1ULL c
[1, ['a', 'b', 'c', 'd']] 4ULL d
[3, ['a', 'd']] 2ULL d
[4, ['c', 'd']] 2ULL d
As you can see, the order is strictly determined.
Tarantool returns tuples in an order determined by (a) the indexed value, tuple[path_to_array][hint+1], and (b) the primary key.
The second condition is common to all non-unique secondary indexes in Tarantool.
Tarantool internally merges the primary key into every non-unique index.
All you need to do is specify it explicitly in your schema.
The next concept is the cursor. A cursor lets you continue iteration from the place where you previously stopped. For a unique index the cursor is the fields of that index; for a non-unique index it is the fields of that index merged with the primary key (for details see the key_def.merge function; currently it doesn't support multikey indexes, but it is useful if you need to understand how merging index parts works).
The combination merge(secondary_index_parts, primary_index_parts) is always a unique value, which lets you continue iteration from a strictly determined place.
Let's return to my example. Say I stopped at the row [1, ['a', 'b', 'c', 'd']] 3ULL c. My cursor is {'c', 1}.
Now I can continue from this point:
-- "GE" is changed to "GT" to skip already scanned tuple: [1, ['a', 'b', 'c', 'd']]
for iter_state, tuple in space.index.multikey:pairs({'c', 1}, {iterator = 'GT'}) do
local position = box_iterator.get_tree_comparison_hint(iter_state) + 1
print(
string.ljust(tostring(tuple), 30),
position,
tuple[2][tonumber(position)]
)
end
--[[
Result:
[2, ['b', 'c']] 2ULL c
[4, ['c', 'd']] 1ULL c
[1, ['a', 'b', 'c', 'd']] 4ULL d
[3, ['a', 'd']] 2ULL d
[4, ['c', 'd']] 2ULL d
--]]
Comparing with the previous snippet, you can see that the scan continues from the required value, without scanning redundant values and without losing anything.
This approach isn't very clean and it's not entirely convenient.
You need to extract a magic value from Tarantool internals and
store it somewhere. But we use this approach in our projects because we don't have any alternatives yet :)
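The cursor idea can be tried out without Tarantool. The sketch below is only a simulation in Python, not Tarantool code: it assumes the multikey index can be modelled as a sorted list of (indexed value, primary key) pairs, and the names `entries` and `fetch_page` are hypothetical helpers introduced for illustration.

```python
import bisect

# Toy data mirroring the example space: primary key -> array field.
rows = {1: ["a", "b", "c", "d"], 2: ["b", "c"], 3: ["a", "d"], 4: ["c", "d"]}

# One index entry per array element, ordered by (indexed value, primary key),
# which is the order the multikey index in the example returns tuples in.
entries = sorted((value, pk) for pk, values in rows.items() for value in values)


def fetch_page(cursor, limit):
    """Return up to `limit` entries strictly after `cursor`, plus the new cursor.

    `cursor` is the (indexed value, primary key) pair of the last entry seen,
    or None to start from the beginning (this plays the role of the GT iterator).
    """
    start = 0 if cursor is None else bisect.bisect_right(entries, cursor)
    page = entries[start:start + limit]
    next_cursor = page[-1] if page else cursor
    return page, next_cursor


# Walk the whole index three entries at a time.
cursor = None
pages = []
while True:
    page, cursor = fetch_page(cursor, 3)
    if not page:
        break
    pages.append(page)
    print(page)
```

Because the pair (indexed value, primary key) is unique, resuming strictly after the cursor neither repeats nor skips an entry, which is the same property the {'c', 1} cursor relies on above.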

Related

Sorting a list of electronic components?

In a spreadsheet the bill of materials of electronic components lists the components by their values, for example, 1R0 (for 1 Ohm), 1K0 (for 1 kilo-Ohm), or 22p (for 22 pico-Farad), 1n0 (for 1 nano-Farad). How can numbers in this format be sorted in numerical order?
Before, unsorted:
Resistors
1K0
1R0
Capacitors
1n0
22p
After, sorted:
Resistors
1R0
1K0
Capacitors
22p
1n0
You can use the following custom function in order to sort your values:
var unitMap = {
'p': 1e-12,
'n': 1e-9,
'u': 1e-6,
'm': 1e-3,
'R': 1,
'K': 1e3,
'M': 1e6,
}
var unitRegex = /[pnumRKM]/;
function parseValue(val) {
var result = {};
var unitIdx = val.search(unitRegex);
var int = parseInt(val.substring(0, unitIdx));
var dec = parseFloat("0." + val.substring(unitIdx+1));
var multiplier = unitMap[val[unitIdx]];
return (int + dec) * multiplier;
}
/**
* Sorts E96 values.
*
* @param {range} input The range to sort.
* @param {number} input The column to sort by, starting at 1.
* @param {boolean} input Is ascending
* @return Sorted range.
* @customfunction
*/
function CUSTOMSORT(values, sort_column, is_ascending) {
values.sort(function (a, b) {
var a_value = parseValue(a[sort_column-1]);
var b_value = parseValue(b[sort_column-1]);
return is_ascending ? a_value - b_value : b_value - a_value;
});
return values;
}
The behaviour is pretty much what you would expect from the Sheets built-in =SORT() function, albeit with fewer features.
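The parsing step of the custom function above can also be checked outside of Sheets. The following is a minimal Python sketch of the same idea, assuming every value has the form digits, one unit letter, optional digits; `parse_value` and `UNIT_MULTIPLIERS` are illustrative names, not part of any library.

```python
import re

# Same unit table as the JavaScript version above.
UNIT_MULTIPLIERS = {
    "p": 1e-12, "n": 1e-9, "u": 1e-6, "m": 1e-3,
    "R": 1.0, "K": 1e3, "M": 1e6,
}
_VALUE_RE = re.compile(r"^(\d+)([pnumRKM])(\d*)$")


def parse_value(text):
    """Convert a value such as '4K7' or '22p' into a plain number."""
    match = _VALUE_RE.match(text)
    if match is None:
        raise ValueError(f"unrecognised component value: {text!r}")
    whole, unit, fraction = match.groups()
    # The unit letter stands where the decimal point would be: 4K7 -> 4.7 * 1e3.
    return float(f"{whole}.{fraction or '0'}") * UNIT_MULTIPLIERS[unit]


print(sorted(["1K0", "1R0"], key=parse_value))
print(sorted(["1n0", "22p"], key=parse_value))
```

Using the parsed number as a sort key keeps the original strings in the output, which is what the spreadsheet needs.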
You can also custom sort it like this, where you define the sorting order of each suffix:
=ARRAYFORMULA(SUBSTITUTE(TRANSPOSE(QUERY(TRANSPOSE(IFERROR(
ARRAY_CONSTRAIN(SORT({REGEXEXTRACT(A2:A, "(\d+)(.*)"),
VLOOKUP(REGEXEXTRACT(A2:A, "\d+(.*)"),
{"R0", 1;
"K0", 2;
"M0", 3}, 2, 0)}, 3, 1, 1, 1), 999^99, 2))),,999^99)), " ", ))

How do I pass an array of C.double's to a Cgo function?

I'm just getting started with CGo and I'm trying to send data to a C library that performs statistical computations on arrays of floats/doubles. What I'm trying to figure out right now is how to send an array of floats, or C.double's, to a CGo function that has a signature like this:
double pop_mean(int numPoints, double a[])
I've figured out how to get in the C.int in there, but I'm having trouble figuring out how to send in an array of doubles.
I haven't yet seen any blog posts or SO Questions about this exact thing, so I thought I'd ask.
The following is my best effort so far.
// Get a basic function to work, while passing in an ARRAY
arr := make([]C.double, 0)
arr = append(arr, C.double(10.0))
arr = append(arr, C.double(20.0))
arr = append(arr, C.double(30.0))
var fixedArray [3]C.double = arr[:]
// ptr := C.CBytes(arr)
// defer C.free(unsafe.Pointer(ptr))
coolMean := C.pop_mean(3, &fixedArray)
fmt.Println("pop_mean (10, 20, 30): ", coolMean)
And this is the error I'm getting:
./main.go:64:6: cannot use arr[:] (type []_Ctype_double) as type [3]_Ctype_double in assignment
./main.go:69:35: cannot use &fixedArray (type *[3]_Ctype_double) as type *_Ctype_double in argument to _Cfunc_pop_mean
How should I be passing an array of C.double to the code?
When an array name is passed to a function, what is passed is the
location of the initial element. Within the called function, this
argument is a local variable, and so an array name parameter is a
pointer, that is, a variable containing an address.
C Programming Language, 2nd Edition
Slice types
A slice is a descriptor for a contiguous segment of an underlying
array and provides access to a numbered sequence of elements from that
array.
Like arrays, slices are indexable and have a length. The length of a
slice s can be discovered by the built-in function len; unlike with
arrays it may change during execution. The elements can be addressed
by integer indices 0 through len(s)-1. The slice index of a given
element may be less than the index of the same element in the
underlying array.
A slice, once initialized, is always associated with an underlying
array that holds its elements.
The Go Programming Language Specification
Reference: Go Command cgo
For a slice a, the arguments to the pop_mean(int numPoints, double a[]) C function are len(a), the length of the slice, and &a[0], the address of the first element of the slice (which points into the slice's underlying array).
In Go, we often hide details in a function. For example, a popMean function,
package main
import (
"fmt"
)
/*
double pop_mean(int numPoints, double a[]) {
if (a == NULL || numPoints == 0) {
return 0;
}
double mean = 0;
for (int i = 0; i < numPoints; i++) {
mean+=a[i];
}
return mean / numPoints;
}
*/
import "C"
func popMean(a []float64) float64 {
// This is the general case, which includes the special cases
// of zero-value (a == nil and len(a) == 0)
// and zero-length (len(a) == 0) slices.
if len(a) == 0 {
return 0
}
return float64(C.pop_mean(C.int(len(a)), (*C.double)(&a[0])))
}
func main() {
a := make([]float64, 10)
for i := range a {
a[i] = float64(i + 1)
}
// slice
fmt.Println(len(a), a)
pm := popMean(a)
fmt.Println(pm)
// subslice
b := a[1:4]
fmt.Println(len(b), b)
pm = popMean(b)
fmt.Println(pm)
// zero length
c := a[:0]
fmt.Println(len(c), c)
pm = popMean(c)
fmt.Println(pm)
// zero value (nil)
var z []float64
fmt.Println(len(z), z, z == nil)
pm = popMean(z)
fmt.Println(pm)
}
Output:
10 [1 2 3 4 5 6 7 8 9 10]
5.5
3 [2 3 4]
3
0 []
0
0 [] true
0
I figured out that you have to send a pointer to the first element of the slice (&arr[0]), rather than a pointer to a fixed-size array or the slice itself.
AND I also ran into the problem where I had created a new variable that was assigned the value of the first item in the slice and later created a pointer to that variable (which was no longer a part of the original array), instead of creating a pointer to the first item in the array (like I wanted).
Below is the working code, with comments to help avoid the problem in the paragraph above.
// Get a basic function to work, while passing in an ARRAY
// Create a dummy array of (10,20,30), the mean of which is 20.
arr := make([]C.double, 0)
arr = append(arr, C.double(10.0))
arr = append(arr, C.double(20.0))
arr = append(arr, C.double(30.0))
firstValue := &(arr[0]) // this notation seems to be pretty important... Re-use this!
// if you don't make it a pointer right away, then you make a whole new object in a different location, so the contiguous-ness of the array is jeopardized.
// Because we have IMMEDIATELY made a pointer to the original value,the first value in the array, we have preserved the contiguous-ness of the array.
fmt.Println("array length: ", len(arr))
var arrayLength C.int
arrayLength = C.int(len(arr))
// arrayLength = C.int(2)
fmt.Println("array length we are using: ", arrayLength)
arrayMean := C.pop_mean(arrayLength, firstValue)
fmt.Println("pop_mean (10, 20, 30): ", arrayMean)
This produces the following result:
array length: 3
array length we are using: 3
pop_mean (10, 20, 30): 20
Or if we uncomment the line that changes the arrayLength to be 2, we get this result:
array length: 3
array length we are using: 2
pop_mean (10, 20, 30): 15
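The "pointer to the first element plus a length" convention can be observed without a C compiler. The sketch below is an analogy in Python using the standard ctypes module, not cgo: `pop_mean` here is a pure-Python stand-in that reads through the pointer the way the C function would.

```python
import ctypes


def pop_mean(num_points, first):
    """Stand-in for the C function: read num_points doubles starting at `first`."""
    if not first or num_points == 0:  # a NULL ctypes pointer is falsy
        return 0.0
    total = 0.0
    for i in range(num_points):
        total += first[i]  # pointer indexing, like a[i] in C
    return total / num_points


# A contiguous C array of three doubles.
values = (ctypes.c_double * 3)(10.0, 20.0, 30.0)
# What the callee receives is only the address of element 0.
first = ctypes.cast(values, ctypes.POINTER(ctypes.c_double))

print(pop_mean(3, first))  # mean of all three values
print(pop_mean(2, first))  # only the first two values are read
```

As in the Go version, the callee has no way to know the real length, so passing 2 silently averages only the first two elements.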

Swift returning the indexes that will sort an array (similar to numpy argsort) [duplicate]

I'm trying to return the indices of an array which correspond to the sorted values. For example,
let arr = [7, 10, -3]
let idxs = argsort(arr) // [2, 0, 1]
My attempt works but is not pretty, and only functions for CGFloat. I'm looking for some ways in which I can improve the function, make it generic and easier to read. The code just looks ugly,
func argsortCGFloat( a : [CGFloat] ) -> [Int] {
/* 1. Values are wrapped in (index, values) tuples */
let wrapped_array = Array(Zip2(indices(a),a))
/* 2. A comparator compares the numerical value from
two tuples and the array is sorted */
func comparator(a: (index : Int, value : CGFloat), b: (index : Int, value : CGFloat)) -> Bool {
return a.value < b.value
}
var values = sorted(wrapped_array, comparator)
/* 3. The sorted indexes are extracted from the sorted
array of tuples */
var sorted_indexes: [Int] = []
for pair in values {
sorted_indexes.append(pair.0)
}
return sorted_indexes
}
You can do it by creating an array of indexes, and sorting them using the array from the outer context, like this:
func argsort<T:Comparable>( a : [T] ) -> [Int] {
var r = Array(indices(a))
r.sort({ a[$0] < a[$1] }) // ascending, so [7, 10, -3] gives [2, 0, 1]
return r
}
let arr = [7, 10, -3]
let idxs = argsort(arr)
println (idxs)
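For comparison with the numpy behaviour mentioned in the title, the same "sort the indexes, compare through the array" idea looks like this in plain Python. This is a small sketch; `argsort` is just a local helper name.

```python
def argsort(values):
    """Return the indexes that would sort `values` in ascending order."""
    # Sort the index list, using each index's element as the sort key.
    return sorted(range(len(values)), key=values.__getitem__)


arr = [7, 10, -3]
idxs = argsort(arr)
print(idxs)
```

Python's sort is stable, so equal elements keep their original relative order in the returned indexes.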

Cartesian/combination algorithm (while maintaining order)

Since I don't quite know the language of these types of algorithms (i.e. how to google this), I'll just demonstrate what I'm looking for:
I have three arrays (the source arrays are not of equal length):
$array1 = array('A', 'B', 'C', 'D');
$array2 = array('x', 'y', 'z');
$array3 = array('1', '2', '3');
I would like all possible combinations of these arrays where:
No more than one element from each source array is taken.
The order of array1, array2, array3 is never broken (ABC always comes before xyz always comes before 123).
So the result would be:
array(
array('A', 'x', '1'),
array('A', 'x', '2'),
array('A', 'x', '3'),
array('A', 'y', '1'),
// etc ...
// But I also need all the partial sets, as long as the rule about
// ordering isn't broken i.e.:
array('B'),
array('B', 'x'),
array('B', 'x', '1'),
array('x'),
array('x', '1'),
array('1'),
);
The order of the results doesn't matter to me.
Working in php, but similar language or pseudo code is fine of course. Or I'd just take a tip on what specific types of permutation/combination algorithms I should be looking at.
I'd say these are Cartesian products. Generating them is quite easy.
for fixed number of arrays (in Perl):
for my $a (@arrayA) {
    for my $b (@arrayB) {
        push @result, [$a, $b];
    }
}
general procedure: Assume @partial is an array for the Cartesian product of A1 x A2 x ... x An and we want A1 x ... x An x An+1
for my $a (@partial) {
    for my $b (@An_plus_1) {
        push @result, [@$a, $b];
    }
}
This would obviously need to iterate over all the arrays.
Now, since you also want to omit some of the elements in the sets, you just twist it a little. In the first method, you can add another element to each of the arrays (undef is the obvious choice, but anything will do) and then filter these elements out of the result sets. In the second method it is even easier: you just add @partial and map { [$_] } @An_plus_1 to the result (or, in English, all the sets resulting from the partial Cartesian product of A1 x ... x An plus the single-element sets made from the elements of the new set).
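The first method (add a placeholder element to each array, then filter it out) can be sketched in Python with itertools.product. This is an illustration under the assumption that an empty combination is not wanted; `ordered_combinations` is a hypothetical helper name.

```python
from itertools import product


def ordered_combinations(*arrays):
    """All ways to take at most one element from each array, keeping array order."""
    skip = object()  # placeholder meaning "take nothing from this array"
    result = []
    for combo in product(*[list(a) + [skip] for a in arrays]):
        picked = [x for x in combo if x is not skip]
        if picked:  # drop the combination that skipped every array
            result.append(picked)
    return result


array1 = ["A", "B", "C", "D"]
array2 = ["x", "y", "z"]
array3 = ["1", "2", "3"]
combos = ordered_combinations(array1, array2, array3)
print(len(combos))
print(combos[:5])
```

Using a unique object as the placeholder avoids clashing with real values such as None or an empty string in the input arrays.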
With RBarryYoung's hint, this is the shortest way to produce them, bash (and sed, to remove D, w, and 4):
echo {A..D}{w..z}{1..4} | sed 's/[Dw4]//g'
A1 A2 A3 A Ax1 Ax2 Ax3 Ax Ay1 Ay2 Ay3 Ay Az1 Az2 Az3 Az
B1 B2 B3 B Bx1 Bx2 Bx3 Bx By1 By2 By3 By Bz1 Bz2 Bz3 Bz
C1 C2 C3 C Cx1 Cx2 Cx3 Cx Cy1 Cy2 Cy3 Cy Cz1 Cz2 Cz3 Cz
1 2 3 x1 x2 x3 x y1 y2 y3 y z1 z2 z3 z
Another, easy way, is SQL, which does it by default:
SELECT upper, lower, num
FROM uppers, lowers, numbers
WHERE upper in ('A', 'B', 'C', ' ')
AND lower in (' ', 'x', 'y', 'z')
AND (num in (1, 2, 3) OR num IS NULL);
If your tables only contain 'A,B,C, ,' and 'x,y,z, ,' and '1,2,3, ' it is much shorter:
SELECT upper, lower, num
FROM uppers, lowers, numbers;
Another term for these combinations, besides Cartesian product, is cross product.
For an unknown number of unknown size of Lists/Sequences/other collections, I would recommend an Iterator - if PHP has such things. Here is an implementation in Scala:
class CartesianIterator (val ll: Seq[Seq[_]]) extends Iterator [Seq[_]] {
var current = 0
def len = ll.map (_.size).product
lazy val last: Int = len
def get (n: Int, lili: Seq[Seq[_]]): List[_] = lili.length match {
case 0 => List ()
case _ => {
val inner = lili.head
inner (n % inner.size) :: get (n / inner.size, lili.tail)
}
}
override def hasNext () : Boolean = current != last
override def next (): Seq[_] = {
current += 1
get (current - 1, ll)
}
}
val ci = new CartesianIterator (List(List ('A', 'B', 'C', 'D', ' '), List ('x', 'y', 'z', ' '), List (1, 2, 3, 0)))
for (c <- ci) println (c)
List(A, x, 1)
List(B, x, 1)
List(C, x, 1)
List(D, x, 1)
List( , x, 1)
List(A, y, 1)
List(B, y, 1)
...
List( , z, 0)
List(A, , 0)
List(B, , 0)
List(C, , 0)
List(D, , 0)
List( , , 0)
A wrapper could be used to remove the '0' and ' ' from the output.

Implement a queue in which push_rear(), pop_front() and get_min() are all constant time operations

I came across this question:
Implement a queue in which push_rear(), pop_front() and get_min() are all constant time operations.
I initially thought of using a min-heap data structure which has O(1) complexity for a get_min(). But push_rear() and pop_front() would be O(log(n)).
Does anyone know what would be the best way to implement such a queue which has O(1) push(), pop() and min()?
I googled about this, and wanted to point out this Algorithm Geeks thread. But it seems that none of the solutions follow constant time rule for all 3 methods: push(), pop() and min().
Thanks for all the suggestions.
You can implement a stack with O(1) pop(), push() and get_min(): just store the current minimum together with each element. So, for example, the stack [4,2,5,1] (1 on top) becomes [(4,4), (2,2), (5,2), (1,1)].
Then you can use two stacks to implement the queue. Push to one stack, pop from another one; if the second stack is empty during the pop, move all elements from the first stack to the second one.
E.g. for a pop request, after moving all the elements from the first stack [(4,4), (2,2), (5,2), (1,1)], the second stack would be [(1,1), (5,1), (2,1), (4,1)]. Now return the top element of the second stack.
To find the minimum element of the queue, look at the top elements of the two stacks (each one stores the minimum of its own stack), then take the smaller of those two values. (Of course, there's some extra logic here in case one of the stacks is empty, but that's not too hard to work around.)
It will have O(1) get_min() and push() and amortized O(1) pop().
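A minimal Python sketch of this two-stack queue follows. The class and attribute names (`TwoStackMinQueue`, `_in`, `_out`) are made up for illustration, and raising IndexError on an empty queue is an assumption, since the answer does not say what should happen in that case.

```python
class TwoStackMinQueue:
    """FIFO queue built from two min-stacks of (value, min so far) pairs."""

    def __init__(self):
        self._in = []   # receives pushed elements
        self._out = []  # serves pops, in reversed (FIFO) order

    @staticmethod
    def _push(stack, value):
        # Store the minimum of this stack together with each element.
        current_min = value if not stack else min(value, stack[-1][1])
        stack.append((value, current_min))

    def push_rear(self, value):
        self._push(self._in, value)

    def pop_front(self):
        if not self._out:
            # Move everything over; each element is moved at most once.
            while self._in:
                value, _ = self._in.pop()
                self._push(self._out, value)
        if not self._out:
            raise IndexError("pop from an empty queue")
        return self._out.pop()[0]

    def get_min(self):
        tops = [stack[-1][1] for stack in (self._in, self._out) if stack]
        if not tops:
            raise IndexError("min of an empty queue")
        return min(tops)


q = TwoStackMinQueue()
for x in [4, 2, 5, 1]:
    q.push_rear(x)
print(q.get_min(), q.pop_front(), q.get_min())
```

The minimum pairs are recomputed while moving elements, because the running minimum of the reversed stack differs from that of the original one.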
Okay - I think I have an answer that gives you all of these operations in amortized O(1), meaning that any one operation could take up to O(n), but any sequence of n operations takes O(n) time in total, i.e. O(1) per operation on average.
The idea is to store your data as a Cartesian tree. This is a binary tree obeying the min-heap property (each node is no bigger than its children) and is ordered in a way such that an inorder traversal of the nodes gives you back the nodes in the same order in which they were added. For example, here's a Cartesian tree for the sequence 2 1 4 3 5:
    1
   / \
  2   3
     / \
    4   5
It is possible to insert an element into a Cartesian tree in O(1) amortized time using the following procedure. Look at the right spine of the tree (the path from the root to the rightmost leaf formed by always walking to the right). Starting at rightmost node, scan upward along this path until you find the first node smaller than the node you're inserting.
Change that node so that its right child is this new node, then make that node's former right child the left child of the node you just added. For example, suppose that we want to insert another copy of 2 into the above tree. We walk up the right spine past the 5 and the 3, but stop below the 1 because 1 < 2. We then change the tree to look like this:
    1
   / \
  2   2
     /
    3
   / \
  4   5
Notice that an inorder traversal gives 2 1 4 3 5 2, which is the sequence in which we added the values.
This runs in amortized O(1) because we can create a potential function equal to the number of nodes in the right spine of the tree. The real time required to insert a node is 1 plus the number of nodes in the spine we consider (call this k). Once we find the place to insert the node, the size of the spine shrinks by length k - 1, since each of the k nodes we visited are no longer on the right spine, and the new node is in its place. This gives an amortized cost of 1 + k + (1 - k) = 2 = O(1), for the amortized O(1) insert. As another way of thinking about this, once a node has been moved off the right spine, it's never part of the right spine again, and so we will never have to move it again. Since each of the n nodes can be moved at most once, this means that n insertions can do at most n moves, so the total runtime is at most O(n) for an amortized O(1) per element.
To do a dequeue step, we simply remove the leftmost node from the Cartesian tree. If this node is a leaf, we're done. Otherwise, the node can only have one child (the right child), and so we replace the node with its right child. Provided that we keep track of where the leftmost node is, this step takes O(1) time. However, after removing the leftmost node and replacing it with its right child, we might not know where the new leftmost node is. To fix this, we simply walk down the left spine of the tree starting at the new node we just moved to the leftmost child. I claim that this still runs in O(1) amortized time. To see this, I claim that a node is visited at most once during any one of these passes to find the leftmost node. To see this, note that once a node has been visited this way, the only way that we could ever need to look at it again would be if it were moved from a child of the leftmost node to the leftmost node. But all the nodes visited are parents of the leftmost node, so this can't happen. Consequently, each node is visited at most once during this process, and the pop runs in O(1).
We can do find-min in O(1) because the Cartesian tree gives us access to the smallest element of the tree for free; it's the root of the tree.
Finally, to see that the nodes come back in the same order in which they were inserted, note that a Cartesian tree always stores its elements so that an inorder traversal visits them in insertion order. Since we always remove the leftmost node at each step, and this is the first element of the inorder traversal, we always get the nodes back in the order in which they were inserted.
In short, we get O(1) amortized push and pop, and O(1) worst-case find-min.
If I can come up with a worst-case O(1) implementation, I'll definitely post it. This was a great problem; thanks for posting it!
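The procedure described above can be sketched in Python as follows. This is one possible reading of the description, not the answer author's code: the parent pointers, the class names, and the choice to stop climbing at a node that is less than or equal to the new value (so duplicates are handled as in the example) are assumptions.

```python
class _Node:
    __slots__ = ("value", "parent", "left", "right")

    def __init__(self, value):
        self.value = value
        self.parent = self.left = self.right = None


class CartesianTreeQueue:
    def __init__(self):
        self.root = None       # smallest element
        self.leftmost = None   # front of the queue
        self.rightmost = None  # most recently pushed node

    def push_rear(self, value):
        node = _Node(value)
        cur = self.rightmost
        # Climb the right spine until a node no larger than the new value.
        while cur is not None and cur.value > value:
            cur = cur.parent
        if cur is None:
            # New overall minimum: the old tree becomes its left subtree.
            node.left = self.root
            if self.root is not None:
                self.root.parent = node
            self.root = node
        else:
            # The former right child of cur becomes the new node's left child.
            node.left = cur.right
            if cur.right is not None:
                cur.right.parent = node
            cur.right = node
            node.parent = cur
        self.rightmost = node
        if self.leftmost is None:
            self.leftmost = node

    def pop_front(self):
        node = self.leftmost
        if node is None:
            raise IndexError("pop from an empty queue")
        child, parent = node.right, node.parent  # the leftmost node has no left child
        if child is not None:
            child.parent = parent
        if parent is None:
            self.root = child
        else:
            parent.left = child
        if child is None:
            self.leftmost = parent
        else:
            # Walk down the left spine to find the new leftmost node.
            while child.left is not None:
                child = child.left
            self.leftmost = child
        if self.root is None:
            self.rightmost = None
        return node.value

    def get_min(self):
        if self.root is None:
            raise IndexError("min of an empty queue")
        return self.root.value


q = CartesianTreeQueue()
for x in [2, 1, 4, 3, 5]:
    q.push_rear(x)
print(q.get_min(), q.pop_front(), q.pop_front(), q.get_min())
```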
Ok, here is one solution.
First we need a structure that provides push_back(), push_front(), pop_back() and pop_front() in O(1). It's easy to implement with an array and 2 iterators. The first iterator points to the front, the second to the back. Let's call this structure a deque.
Here is pseudo-code:
class MyQueue // Our data structure
{
    deque D;   // We need 2 deque objects
    deque Min;

    push(element) // pushing element to MyQueue
    {
        D.push_back(element);
        while (Min.is_not_empty() and Min.back() > element)
            Min.pop_back();
        Min.push_back(element);
    }

    pop() // popping MyQueue
    {
        if (Min.front() == D.front())
            Min.pop_front();
        D.pop_front();
    }

    min()
    {
        return Min.front();
    }
}
Explanation:
Example: let's push the numbers [12,5,10,7,11,19] to our MyQueue
1) pushing 12
D[12]
Min[12]
2) pushing 5
D[12,5]
Min[5] // 12>5, so 12 is removed
3) pushing 10
D[12,5,10]
Min[5,10]
4) pushing 7
D[12,5,10,7]
Min[5,7] // 10>7, so 10 is removed
5) pushing 11
D[12,5,10,7,11]
Min[5,7,11]
6) pushing 19
D[12,5,10,7,11,19]
Min[5,7,11,19]
Now let's call pop_front()
we got
D[5,10,7,11,19]
Min[5,7,11,19]
The minimum is 5
Let's call pop_front() again
Explanation: pop_front will remove 5 from D, but it will pop front element of Min too, because it equals to D's front element (5).
D[10,7,11,19]
Min[7,11,19]
And minimum is 7. :)
Use one deque (A) to store the elements and another deque (B) to store the minimums.
When x is enqueued, push_back it to A and keep pop_backing B until B is empty or the back of B is no larger than x, then push_back x to B.
When dequeuing, pop_front A as the return value, and if it is equal to the front of B, pop_front B as well.
When getting the minimum of A, use the front of B as the return value.
Dequeue and getmin are obviously O(1). For the enqueue operation, consider the push_back of n elements. There are n push_backs to A, n push_backs to B, and at most n pop_backs from B, because each element either stays in B or is popped out of B once. Overall there are at most 3n operations, so the amortized cost of enqueue is O(1) as well.
Lastly the reason this algorithm works is that when you enqueue x to A, if there are elements in B that are larger than x, they will never be minimums now because x will stay in the queue A longer than any elements in B (a queue is FIFO). Therefore we need to pop out elements in B (from the back) that are larger than x before we push x into B.
from collections import deque

class MinQueue(deque):
    def __init__(self):
        deque.__init__(self)
        self.minq = deque()

    def push_rear(self, x):
        self.append(x)
        while len(self.minq) > 0 and self.minq[-1] > x:
            self.minq.pop()
        self.minq.append(x)

    def pop_front(self):
        x = self.popleft()
        if self.minq[0] == x:
            self.minq.popleft()
        return x

    def get_min(self):
        return self.minq[0]
If you don't mind storing a bit of extra data, it should be trivial to store the minimum value. Push and pop can update the value if the new or removed element is the minimum, and returning the minimum value is as simple as getting the value of the variable.
This is assuming that get_min() does not change the data; if you would rather have something like pop_min() (i.e. remove the minimum element), you can simply store a pointer to the actual element and the element preceding it (if any), and update those accordingly with push_rear() and pop_front() as well.
Edit after comments:
Obviously this leads to O(n) push and pop in the case that the minimum changes on those operations, and so does not strictly satisfy the requirements.
You Can actually use a LinkedList to maintain the Queue.
Each element in LinkedList will be of Type
class LinkedListElement
{
LinkedListElement next;
int currentMin;
}
You can have two pointers: one points to the start and the other points to the end.
When you add an element at the start of the queue, compare the node being inserted with the node at the start pointer. If the inserted node's currentMin is less than the start node's currentMin, the inserted node's currentMin is the minimum. Otherwise, update its currentMin with the start node's currentMin.
Repeat the same for enqueue.
JavaScript implementation
(Credit to adamax's solution for the idea; I loosely based an implementation on it. Jump to the bottom to see fully commented code or read through the general steps below. Note that this finds the maximum value in O(1) constant time rather than the minimum value--easy to change up):
The general idea is to create two Stacks upon construction of the MaxQueue (I used a linked list as the underlying Stack data structure--not included in the code; but any Stack will do as long as it's implemented with O(1) insertion/deletion). One we'll mostly pop from (dqStack) and one we'll mostly push to (eqStack).
Insertion: O(1) worst case
For enqueue, if the MaxQueue is empty, we'll push the value to dqStack along with the current max value in a tuple (the same value since it's the only value in the MaxQueue); e.g.:
const m = new MaxQueue();
m.enqueue(6);
/*
the dqStack now looks like:
[6, 6] - [value, max]
*/
If the MaxQueue is not empty, we push just the value to eqStack;
m.enqueue(7);
m.enqueue(8);
/*
dqStack:   eqStack: 8
[6, 6]              7 - just the value
*/
then, update the maximum value in the tuple.
/*
dqStack:   eqStack: 8
[6, 8]              7
*/
Deletion: O(1) amortized
For dequeue we'll pop from dqStack and return the value from the tuple.
m.dequeue();
> 6
// equivalent to:
/*
const tuple = m.dqStack.pop() // [6, 8]
tuple[0];
> 6
*/
Then, if dqStack is empty, move all values in eqStack to dqStack, e.g.:
// if we build a MaxQueue
const maxQ = new MaxQueue(3, 5, 2, 4, 1);
/*
the stacks will look like:
dqStack:   eqStack: 1
                    4
                    2
[3, 5]              5
*/
As each value is moved over, we'll check if it's greater than the max so far and store it in each tuple:
maxQ.dequeue(); // pops from dqStack (now empty), so move all from eqStack to dqStack
> 3
// as dequeue moves one value over, it checks if it's greater than the ***previous max*** and stores the max at tuple[1], i.e., [data, max]:
/*
dqStack:  [5, 5] => 5 > 4 - update                            eqStack:
          [2, 4] => 2 < 4 - no update
          [4, 4] => 4 > 1 - update
          [1, 1] => 1st value moved over so max is itself     empty
*/
Because each value is moved to dqStack at most once, we can say that dequeue has O(1) amortized time complexity.
Finding the maximum value: O(1) worst case
Then, at any point in time, we can call getMax to retrieve the current maximum value in O(1) constant time. As long as the MaxQueue is not empty, the maximum value is easily pulled out of the next tuple in dqStack.
maxQ.getMax();
> 5
// equivalent to calling peek on the dqStack and pulling out the maximum value:
/*
const peekedTuple = maxQ.dqStack.peek(); // [5, 5]
peekedTuple[1];
> 5
*/
Code
class MaxQueue {
  constructor(...data) {
    // create a dequeue Stack from which we'll pop
    this.dqStack = new Stack();
    // create an enqueue Stack to which we'll push
    this.eqStack = new Stack();
    // if enqueueing data at construction, iterate through data and enqueue each
    for (const datum of data) this.enqueue(datum);
  }
  enqueue(data) { // O(1) constant insertion time
    // if the MaxQueue is empty (test the tuple, not its value, so a stored 0 is not mistaken for "empty"),
    if (!this.dqStack.peek()) {
      // push data to the dequeue Stack and indicate it's the max;
      this.dqStack.push([data, data]); // e.g., enqueue(8) ==> [data: 8, max: 8]
    } else {
      // otherwise, the MaxQueue is not empty; push data to enqueue Stack
      this.eqStack.push(data);
      // save a reference to the tuple that's next in line to be dequeued
      const next = this.dqStack.peek();
      // if the enqueueing data is > the max in that tuple, update it
      if (data > next[1]) next[1] = data;
    }
  }
  moveAllFromEqToDq() { // O(1) amortized as each value will move at most once
    // start max at -Infinity for comparison with the first value
    let max = -Infinity;
    // until enqueue Stack is empty (assumes Stack.peek() returns undefined or null when empty),
    while (this.eqStack.peek() != null) {
      // pop from enqueue Stack and save its data
      const data = this.eqStack.pop();
      // if data is > max, set max to data
      if (data > max) max = data;
      // push to dequeue Stack and indicate the current max; e.g., [data: 7, max: 8]
      this.dqStack.push([data, max]);
    }
  }
  dequeue() { // O(1) amortized deletion due to calling moveAllFromEqToDq from time-to-time
    // if the MaxQueue is empty, return undefined
    if (!this.dqStack.peek()) return;
    // pop from the dequeue Stack and save its data
    const [data] = this.dqStack.pop();
    // if there's no data left in dequeue Stack, move all data from enqueue Stack
    if (!this.dqStack.peek()) this.moveAllFromEqToDq();
    // return the data
    return data;
  }
  peek() { // O(1) constant peek time
    // if the MaxQueue is empty, return undefined
    if (!this.dqStack.peek()) return;
    // peek at dequeue Stack and return its data
    return this.dqStack.peek()[0];
  }
  getMax() { // O(1) constant time to find maximum value
    // if the MaxQueue is empty, return undefined
    if (!this.dqStack.peek()) return;
    // peek at dequeue Stack and return the current max
    return this.dqStack.peek()[1];
  }
}
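One sequence is worth checking against the class above. Values enqueued after a refill only update the tuple currently on top of dqStack. Once that tuple is dequeued, the tuple beneath it still carries a max that was computed before those values arrived. For example, after new MaxQueue(3, 5, 2, 4, 1), dequeue(), enqueue(10), dequeue(), the top tuple is [2, 4] while 10 is waiting in eqStack. A minimal sketch of one way around this, written in Python with lists as stacks (MaxQueue2, eq_max and _refill are illustrative names, not from the answer): keep the max of the enqueue stack in its own field and combine it with the top tuple in get_max.

```python
class MaxQueue2:
    def __init__(self, *data):
        self.dq_stack = []           # [value, max of this value and everything below it]
        self.eq_stack = []           # plain values
        self.eq_max = float("-inf")  # max of everything currently in eq_stack
        for datum in data:
            self.enqueue(datum)

    def enqueue(self, value):  # O(1) worst case
        self.eq_stack.append(value)
        self.eq_max = max(self.eq_max, value)

    def _refill(self):  # each value is moved at most once
        running = float("-inf")
        while self.eq_stack:
            value = self.eq_stack.pop()
            running = max(running, value)
            self.dq_stack.append([value, running])
        self.eq_max = float("-inf")  # eq_stack is empty again

    def dequeue(self):  # O(1) amortized
        if not self.dq_stack:
            self._refill()
        if not self.dq_stack:
            return None  # the queue is empty
        return self.dq_stack.pop()[0]

    def get_max(self):  # O(1) worst case
        if not self.dq_stack and not self.eq_stack:
            return None  # the queue is empty
        top_max = self.dq_stack[-1][1] if self.dq_stack else float("-inf")
        return max(top_max, self.eq_max)


q = MaxQueue2(3, 5, 2, 4, 1)
first = q.dequeue()   # 3
q.enqueue(10)
second = q.dequeue()  # 5
print(q.get_max())    # 10
```

The enqueue stack never needs per-element maxima, because its values are only ever consumed all at once during a refill.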
#include <iostream>
#include <queue>
#include <deque>
using namespace std;

queue<int> main_queue; // all elements in arrival order
deque<int> min_queue;  // non-decreasing candidates; the front is the current minimum

void clearQueue(deque<int> &q)
{
    q.clear();
}

void PushRear(int elem)
{
    main_queue.push(elem);
    // a new overall minimum makes every older candidate obsolete
    if (!min_queue.empty() && elem < min_queue.front())
    {
        clearQueue(min_queue);
    }
    // drop candidates larger than elem; they leave the queue before elem does
    while (!min_queue.empty() && elem < min_queue.back())
    {
        min_queue.pop_back();
    }
    min_queue.push_back(elem);
}

void PopFront()
{
    int elem = main_queue.front();
    main_queue.pop();
    // the minimum itself is leaving, so the next candidate takes over
    if (elem == min_queue.front())
    {
        min_queue.pop_front();
    }
}

int GetMin()
{
    return min_queue.front();
}

int main()
{
    PushRear(1);
    PushRear(-1);
    PushRear(2);
    cout << GetMin() << endl; // -1
    PopFront();
    PopFront();
    cout << GetMin() << endl; // 2
    return 0;
}
This solution uses two queues:
1. main_q - stores the input numbers.
2. min_q - stores the candidates for the minimum, maintained by the rules in MainQ.enqueue(x), MainQ.dequeue() and MainQ.get_min().
Here's the code in Python. The queue is implemented using a list.
The main idea lies in the MainQ.enqueue(x), MainQ.dequeue() and MainQ.get_min() functions.
One key assumption is that emptying a queue takes O(1).
A test is provided at the end.
import numbers

class EmptyQueueException(Exception):
    pass

class BaseQ():
    def __init__(self):
        self.l = list()
    def enqueue(self, x):
        assert isinstance(x, numbers.Number)
        self.l.append(x)
    def dequeue(self):
        return self.l.pop(0)
    def dequeue_last(self):
        return self.l.pop()
    def peek_first(self):
        return self.l[0]
    def peek_last(self):
        return self.l[len(self.l)-1]
    def empty(self):
        return self.l==None or len(self.l)==0
    def clear(self):
        self.l=[]

class MainQ(BaseQ):
    def __init__(self, min_q):
        super().__init__()
        self.min_q = min_q
    def enqueue(self, x):
        super().enqueue(x)
        if not self.min_q.empty() and x < self.min_q.peek_first():
            # x is the new overall minimum: every older candidate is obsolete
            self.min_q.clear()
        # drop only the candidates larger than x; clearing everything here would
        # lose a smaller minimum that is still in the queue (e.g. 1, 5, 3)
        while not self.min_q.empty() and x < self.min_q.peek_last():
            self.min_q.dequeue_last()
        self.min_q.enqueue(x)
    def dequeue(self):
        if self.empty():
            raise EmptyQueueException("Queue is empty")
        x = super().dequeue()
        if x == self.min_q.peek_first():
            self.min_q.dequeue()
        return x
    def get_min(self):
        if self.empty():
            raise EmptyQueueException("Queue is empty, NO minimum")
        return self.min_q.peek_first()

INPUT_NUMS = (("+", 5), ("+", 10), ("+", 3), ("+", 6), ("+", 1), ("+", 2), ("+", 4), ("+", -4), ("+", 100), ("+", -40),
              ("-",None), ("-",None), ("-",None), ("+",-400), ("+",90), ("-",None),
              ("-",None), ("-",None), ("-",None), ("-",None), ("-",None), ("-",None), ("-",None), ("-",None))

if __name__ == '__main__':
    min_q = BaseQ()
    main_q = MainQ(min_q)
    try:
        for operator, i in INPUT_NUMS:
            if operator=="+":
                main_q.enqueue(i)
                print("Added {} ; Min is: {}".format(i,main_q.get_min()))
                print("main_q = {}".format(main_q.l))
                print("min_q = {}".format(main_q.min_q.l))
                print("==========")
            else:
                x = main_q.dequeue()
                print("Removed {} ; Min is: {}".format(x,main_q.get_min()))
                print("main_q = {}".format(main_q.l))
                print("min_q = {}".format(main_q.min_q.l))
                print("==========")
    except Exception as e:
        print("exception: {}".format(e))
The output of the above test is:
"C:\Program Files\Python35\python.exe" C:/dev/python/py3_pocs/proj1/priority_queue.py
Added 5 ; Min is: 5
main_q = [5]
min_q = [5]
==========
Added 10 ; Min is: 5
main_q = [5, 10]
min_q = [5, 10]
==========
Added 3 ; Min is: 3
main_q = [5, 10, 3]
min_q = [3]
==========
Added 6 ; Min is: 3
main_q = [5, 10, 3, 6]
min_q = [3, 6]
==========
Added 1 ; Min is: 1
main_q = [5, 10, 3, 6, 1]
min_q = [1]
==========
Added 2 ; Min is: 1
main_q = [5, 10, 3, 6, 1, 2]
min_q = [1, 2]
==========
Added 4 ; Min is: 1
main_q = [5, 10, 3, 6, 1, 2, 4]
min_q = [1, 2, 4]
==========
Added -4 ; Min is: -4
main_q = [5, 10, 3, 6, 1, 2, 4, -4]
min_q = [-4]
==========
Added 100 ; Min is: -4
main_q = [5, 10, 3, 6, 1, 2, 4, -4, 100]
min_q = [-4, 100]
==========
Added -40 ; Min is: -40
main_q = [5, 10, 3, 6, 1, 2, 4, -4, 100, -40]
min_q = [-40]
==========
Removed 5 ; Min is: -40
main_q = [10, 3, 6, 1, 2, 4, -4, 100, -40]
min_q = [-40]
==========
Removed 10 ; Min is: -40
main_q = [3, 6, 1, 2, 4, -4, 100, -40]
min_q = [-40]
==========
Removed 3 ; Min is: -40
main_q = [6, 1, 2, 4, -4, 100, -40]
min_q = [-40]
==========
Added -400 ; Min is: -400
main_q = [6, 1, 2, 4, -4, 100, -40, -400]
min_q = [-400]
==========
Added 90 ; Min is: -400
main_q = [6, 1, 2, 4, -4, 100, -40, -400, 90]
min_q = [-400, 90]
==========
Removed 6 ; Min is: -400
main_q = [1, 2, 4, -4, 100, -40, -400, 90]
min_q = [-400, 90]
==========
Removed 1 ; Min is: -400
main_q = [2, 4, -4, 100, -40, -400, 90]
min_q = [-400, 90]
==========
Removed 2 ; Min is: -400
main_q = [4, -4, 100, -40, -400, 90]
min_q = [-400, 90]
==========
Removed 4 ; Min is: -400
main_q = [-4, 100, -40, -400, 90]
min_q = [-400, 90]
==========
Removed -4 ; Min is: -400
main_q = [100, -40, -400, 90]
min_q = [-400, 90]
==========
Removed 100 ; Min is: -400
main_q = [-40, -400, 90]
min_q = [-400, 90]
==========
Removed -40 ; Min is: -400
main_q = [-400, 90]
min_q = [-400, 90]
==========
Removed -400 ; Min is: 90
main_q = [90]
min_q = [90]
==========
exception: Queue is empty, NO minimum
Process finished with exit code 0
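BaseQ.dequeue above relies on list.pop(0), which shifts every remaining element and so costs O(n) per call. Below is a sketch of the same min-tracking idea on collections.deque, where both ends pop in O(1) (MinQueue and its attribute names are illustrative, not part of the answer):

```python
from collections import deque


class MinQueue:
    def __init__(self):
        self.items = deque()  # all values, oldest on the left
        self.mins = deque()   # non-decreasing candidates for the minimum

    def enqueue(self, x):
        self.items.append(x)
        # Candidates larger than x can never be the minimum again,
        # because they leave the queue before x does.
        while self.mins and self.mins[-1] > x:
            self.mins.pop()
        self.mins.append(x)

    def dequeue(self):
        if not self.items:
            raise IndexError("dequeue from an empty queue")
        x = self.items.popleft()
        if x == self.mins[0]:
            self.mins.popleft()
        return x

    def get_min(self):
        if not self.items:
            raise IndexError("minimum of an empty queue")
        return self.mins[0]


mq = MinQueue()
for value in (1, 5, 3):
    mq.enqueue(value)
print(mq.get_min())  # 1
mq.dequeue()         # removes 1
print(mq.get_min())  # 3
```

Each value is appended to and removed from mins at most once, so enqueue is O(1) amortized while dequeue and get_min are O(1) worst case.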
Java Implementation
import java.io.*;
import java.util.*;
public class queueMin {
static class stack {
private Node<Integer> head;
public void push(int data) {
Node<Integer> newNode = new Node<Integer>(data);
if(null == head) {
head = newNode;
} else {
Node<Integer> prev = head;
head = newNode;
head.setNext(prev);
}
}
public int pop() {
int data = -1;
if(null == head){
System.out.println("Error Nothing to pop");
} else {
data = head.getData();
head = head.getNext();
}
return data;
}
public int peek(){
if(null == head){
System.out.println("Error Nothing to peek");
return -1;
} else {
return head.getData();
}
}
public boolean isEmpty(){
return null == head;
}
}
static class stackMin extends stack {
private stack s2;
public stackMin(){
s2 = new stack();
}
public void push(int data){
if(data <= getMin()){
s2.push(data);
}
super.push(data);
}
public int pop(){
int value = super.pop();
if(value == getMin()) {
s2.pop();
}
return value;
}
public int getMin(){
if(s2.isEmpty()) {
return Integer.MAX_VALUE;
}
return s2.peek();
}
}
static class Queue {
private stackMin s1, s2;
public Queue(){
s1 = new stackMin();
s2 = new stackMin();
}
public void enQueue(int data) {
s1.push(data);
}
public int deQueue() {
if(s2.isEmpty()) {
while(!s1.isEmpty()) {
s2.push(s1.pop());
}
}
return s2.pop();
}
public int getMin(){
return Math.min(s1.isEmpty() ? Integer.MAX_VALUE : s1.getMin(), s2.isEmpty() ? Integer.MAX_VALUE : s2.getMin());
}
}
static class Node<T> {
private T data;
private T min;
private Node<T> next;
public Node(T data){
this.data = data;
this.next = null;
}
public void setNext(Node<T> next){
this.next = next;
}
public T getData(){
return this.data;
}
public Node<T> getNext(){
return this.next;
}
public void setMin(T min){
this.min = min;
}
public T getMin(){
return this.min;
}
}
public static void main(String args[]){
try {
FastScanner in = newInput();
PrintWriter out = newOutput();
// System.out.println(out);
Queue q = new Queue();
int t = in.nextInt();
while(t-- > 0) {
String[] inp = in.nextLine().split(" ");
switch (inp[0]) {
case "+":
q.enQueue(Integer.parseInt(inp[1]));
break;
case "-":
q.deQueue();
break;
case "?":
out.println(q.getMin());
break;
default:
break;
}
}
out.flush();
out.close();
} catch(IOException e){
e.printStackTrace();
}
}
static class FastScanner {
static BufferedReader br;
static StringTokenizer st;
FastScanner(File f) {
try {
br = new BufferedReader(new FileReader(f));
} catch (FileNotFoundException e) {
e.printStackTrace();
}
}
public FastScanner(InputStream f) {
br = new BufferedReader(new InputStreamReader(f));
}
String next() {
while (st == null || !st.hasMoreTokens()) {
try {
st = new StringTokenizer(br.readLine());
} catch (IOException e) {
e.printStackTrace();
}
}
return st.nextToken();
}
String nextLine(){
String str = "";
try {
str = br.readLine();
} catch (IOException e) {
e.printStackTrace();
}
return str;
}
int nextInt() {
return Integer.parseInt(next());
}
long nextLong() {
return Long.parseLong(next());
}
double nextDouble() {
return Double.parseDouble(next());
}
}
static FastScanner newInput() throws IOException {
if (System.getProperty("JUDGE") != null) {
return new FastScanner(new File("input.txt"));
} else {
return new FastScanner(System.in);
}
}
static PrintWriter newOutput() throws IOException {
if (System.getProperty("JUDGE") != null) {
return new PrintWriter("output.txt");
} else {
return new PrintWriter(System.out);
}
}
}
We know that push and pop on a queue are constant-time operations, i.e. O(1).
For get_min() (finding the current minimum in the queue), the first thing that comes to mind is searching the whole queue every time the minimum is requested. That costs O(n) per request, so it can never be a constant-time operation, which is the main aim of the problem.
This is asked very frequently in interviews, so you should know the trick.
To do this we use two more queues that keep track of the minimum element, and we modify these two queues as we push to and pop from the main queue, so that the minimum element can be obtained in O(1) time.
Here is self-explanatory pseudocode for the approach described above.
Queue q, minq1, minq2;
isMinq1Current=true;
void push(int a)
{
    q.push(a);
    if(isMinq1Current)
    {
        if(minq1.empty) minq1.push(a);
        else
        {
            while(!minq1.empty && minq1.top <= a) minq2.push(minq1.pop());
            minq2.push(a);
            while(!minq1.empty) minq1.pop();
            isMinq1Current=false;
        }
    }
    else
    {
        //mirror if(isMinq1Current) branch.
    }
}
int pop()
{
    int a = q.pop();
    if(isMinq1Current)
    {
        if(a==minq1.top) minq1.pop();
    }
    else
    {
        //mirror if(isMinq1Current) branch.
    }
    return a;
}
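The pseudocode above can be turned into a runnable sketch. The version below is in Python and rests on a few assumptions: collections.deque stands in for the queue type and is used only through queue operations (append, popleft, reading the front), the two helper queues are swapped after each push instead of mirroring the branches with a flag, and the names TwoQueueMin, cur and spare are illustrative.

```python
from collections import deque


class TwoQueueMin:
    def __init__(self):
        self.q = deque()      # the actual queue
        self.cur = deque()    # current minimum candidates, non-decreasing
        self.spare = deque()  # scratch queue, empty between calls

    def push(self, a):
        self.q.append(a)
        # Candidates <= a stay; move them over in their original order.
        while self.cur and self.cur[0] <= a:
            self.spare.append(self.cur.popleft())
        self.spare.append(a)
        # Whatever is left in cur is larger than a and will leave the
        # main queue before a does, so it can be discarded.
        self.cur.clear()
        self.cur, self.spare = self.spare, self.cur

    def pop(self):
        a = self.q.popleft()
        if a == self.cur[0]:
            self.cur.popleft()
        return a

    def get_min(self):
        return self.cur[0]


tq = TwoQueueMin()
for value in (4, 2, 6, 5):
    tq.push(value)
print(tq.get_min())  # 2
tq.pop()             # removes 4
tq.pop()             # removes 2
print(tq.get_min())  # 5
```

In this sketch get_min and pop are O(1), while push walks the current candidates, so its cost is linear in the number of candidates stored at that moment.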

Resources