Sudden slowdown with lazy sequence - performance

I had this code:
:history
(cons [t (:latest thing)] (take n (:history thing)) )
which was for adding a rolling window of recent history to a map on each iteration of my program. What I found was that after a certain number of iterations (~50) my program would slow down dramatically and progressively.
whereas, if I changed the code to:
:history
(cons [t (:latest thing)] (take n (vec (:history thing))) )
then the program ran slightly longer on each iteration (as the lazy seq was being realized), but ran consistently and did not slow down.
Being new to Clojure I don't understand...is it to do with chunked sequences ?

I think by program slowdown you mean to say "the consumption of this lazy sequence slows down as the sequence becomes bigger due to may cons operations that your first code sample does". This is because when you build a lazy sequence using lazy operators like cons, it creates a chain of operations (functions) to generate the sequence and this chain of operation will gets executed every time you consume this lazy sequence for ex: 50 operations of cons will create a 50 chained function calls, which gets executed every time you consume the sequence which will obviously be slow than having a vector of 50 elements and consuming it.
In your second case the lazy sequence will have only 1 operation (i.e the cons) and rest will be take operation from a already realized vector (i.e the vector call)

Related

What is the performance cost of converting between seqs and vectors?

Many core Clojure functions return lazy sequences, even when vectors are passed into them. For example, if I had a vector of numbers, and wanted to filter them based on some predicate but get another vector back, I'd have to do something like this:
(into [] (filter my-pred my-vec))
Or:
(vec (filter my-pred my-vec))
Though I'm not sure if there's any meaningful difference between the two.
Is this operation expensive, or do you get it effectively for free, as when converting to/from a transient?
I understand that the seq is lazy so nothing will actually get calculated until you plop it into the output vector, but is there an overhead to converting from a seq and a concrete collection? Can it be characterized in terms of big-O, or does big-O not make sense here? What about the other way, when converting from a vector to a seq?
There's an FAQ in the Clojure site for good use cases for transducers, which could be handy for some complex transformations (more than just filtering, or when the predicate is fairly complex). Otherwise you can leverage on filterv, which is on the core library and you can assume it does any reasonable optimization for you.
TL;DR Don't worry about it
Longer version:
The main cost is memory allocation/GC. Usually this is trivial. If you have too much data to fit simultaneously in RAM, the lazy version can save you.
If you want to measure toy problems, you can experiment with the Criterium library. Try powers of 10 from 10^2 up to 10^9.
(crit/quick-bench (println :sum (reduce + 0 (into [] (range (Math/pow 10 N))))))
for N=2..9 with and without the (into [] ...) part.

About speed of procedures between user-made and built-in in scheme (related with SICP exercise 1.23)

//My question was so long So I reduced.
In scheme, user-made procedures consume more time than built-in procedures?
(If both's functions are same)
//This is my short version question.
//Below is long long version question.
EX 1.23 is problem(below), why the (next) procedure isn't twice faster than (+ 1)?
This is my guess.
reason 1 : (next) contains 'if' (special-form) and it consumes time.
reason 2 : function call consumes time.
http://community.schemewiki.org/?sicp-ex-1.23 says reason 1 is right.
And I want to know reason 2 is also right.
SO I rewrote the (next) procedure. I didn't use 'if' and checked the number divided by 2 just once before use (next)(so (next) procedure only do + 2). And I remeasured the time. It was more fast than before BUT still not 2. SO I rewrote again. I changed (next) to (+ num 2). Finally It became 2 or almost 2. And I thought why. This is why I guess the 'reason 2'. I want to know what is correct answer.
ps. I'm also curious about why some primes are being tested (very?) fast than others? It doesn't make sense because if a number n is prime, process should see from 2 to sqrt(n). But some numbers are tested faster. Do you know why some primes are tested faster?
Exercise 1.23. The smallest-divisor procedure shown at the start of this section does lots of needless testing: After it checks to see if the number is divisible by 2 there is no point in checking to see if it is divisible by any larger even numbers. This suggests that the values used for test-divisor should not be 2, 3, 4, 5, 6, ..., but rather 2, 3, 5, 7, 9, .... To implement this change, define a procedure next that returns 3 if its input is equal to 2 and otherwise returns its input plus 2. Modify the smallest-divisor procedure to use (next test-divisor) instead of (+ test-divisor 1). With timed-prime-test incorporating this modified version of smallest-divisor, run the test for each of the 12 primes found in exercise 1.22. Since this modification halves the number of test steps, you should expect it to run about twice as fast. Is this expectation confirmed? If not, what is the observed ratio of the speeds of the two algorithms, and how do you explain the fact that it is different from 2?
Where the book is :
http://mitpress.mit.edu/sicp/full-text/book/book-Z-H-11.html#%_sec_1.2.6
Your long ans short questions are actually addressing two different problems.
EX 1.23 is problem(below), why the (next) procedure isn't twice faster than (+ 1)?
You provide two possible reasons to explain the relative lack of speed up (and 30% speed gain is already a good achievement):
The apparent use of a function next instead of a simple arithmetic expression (as I understand it from your explanations),
the inclusion of a test in that very next function,
The first reason is an illusion: (+ 1) is a function incrementing its argument, so in both cases there is a function call (although the increment function is certainly a builtin one, a point which is addressed by your other question).
The second reason is indeed relevant: any test in a block of code will introduce a potential redirection in the code flow (a jump from the current executing instruction to some other address in the program), which may incur a delay. Note that this is analogous to a function call, which will also induce an address jump, so both reasons actually resolve to only one potential cause.
Regarding your short question, builtin functions are indeed usually faster, because the compiler is able to apply a special treatment to them in certain cases. This is due to the facts that:
knowing the semantics of the builtins, compiler designers are able to include special rules pertaining to the algebraic properties of these builtins, and for instance fuse successive incréments in a single call, or suppress a combination of increment and decrement called in sequence.
A builtin function call, when not optimised away, will be converted into a native machine code function call, which may not have to abide to all the calling convention rules. If your scheme compiler produce machine code from the source, then there might be only a marginal gain, but if it produce so called bytecode, the gain might be quite substantial, since user written functions will be translated to that bytecode format, and still require some form of interpretation. If you are only using an interpreter, then the gain is even more important.
I believe this is highly implementation and setting dependent. In many implementations there are different kinds of optimizations and in some there are none. To get the best performance you may need to compile your code or use settings to reduce debug information / stack traces. Getting the best performance in one implementation can worsen the performance in another.
Primitive procedures are usually compiled to be native and in some implementations, like ikarus, it's even inlined. (When you do (map car lst) ikarus changes it to (map (lambda (x) (car x)) lst) since car isn't a procedure) Lambdas are supposed to be cheap.. Remember many scheme implementations change your code to CPS and that is one procedure call for each expression in the body of a procedure call. It will never be as fast as machine code since it needs to do load closure variables.
To check which of the two options which are correct for your implementation make next do the same as it originally did. eg. no if but just increment the argument. The difference now is the extra call and nothing else. Then you can inline next by writing it's code directly in your procedure and substituting arguments for the operands. Is it still slower, then it's if. you need to run the tests several times, preferably with large enough number of primes to produce that it runs for a minute or so. Use time or similar in both the Scheme implementations to get the differences in ms. I use unix time command as well to see how the OS reflects on it.
You should also test to see if you get the same reason in some other implementation. It's not like it's not enough Scheme implementations out there so know yourself out! The differences between them might amaze you. I always use racket (raco exe source to make executable) and Ikarus. WHen doing a large test I include Chicken, Gambit and Chibi.

Lazy Evaluation: Why is it faster, advantages vs disadvantages, mechanics (why it uses less cpu; examples?) and simple proof of concept examples [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 8 years ago.
Improve this question
Lazy evaluation is said to be a way of delaying a process until the first time it is needed. This tends to avoid repeated evaluations and thats why I would imagine that is performing a lot faster.
Functional language like Haskell (and JavaScript..?) have this functionality built-in.
However, I don't understand how and why other 'normal' approaches (that is; same functionality but not using lazy evaluation) are slower.. how and why do these other approaches do repeated evaluations? Can someone elaborate on this by giving simple examples and explaining the mechanics of each approach?
Also, according to Wikipedia page about lazy evaluation these are said to be the advantages of this approach:
Performance increases by avoiding needless calculations, and error
conditions in evaluating compound expressions
The ability to construct potentially infinite data structures
The ability to define control flow (structures) as abstractions
instead of primitives
However, can we just control the calculations needed and avoid repeating the same ones? (1)
We can use i.e. a Linked List to create an infinite data structure (2)
Can we do (3) already..??? We can define classes/templates/objects and use those instead of primitives (i.e JavaScript).
Additionally, it seems to me that (at least from the cases i have seen), lazy evaluation goes hand-to-hand with recursion and using the 'head' and 'tail' (along with others) notions. Surely, there are cases where recursion is useful but is lazy evaluation something more than that...? more than a recursive approach to solving a problem..? Streamjs is JavaScript library that uses recursion along with some other simple operations (head,tail,etc) to perform lazy evaluation.
It seems i can't get my head around it...
Thanks in advance for any contribution.
I'll show examples in both Python 2.7 and Haskell.
Say, for example, you wanted to do a really inefficient sum of all the numbers from 0 to 10,000,000. You could do this with a for loop in Python as
total = 0
for i in range(10000000):
total += i
print total
On my computer, this takes about 1.3s to execute. If instead, I changed range to xrange (the generator form of range, lazily produces a sequence of numbers), it takes 1.2s, only slightly faster. However, if I check the memory used (using the memory_profiler package), the version with range uses about 155MB of RAM, while the xrange version uses only 1MB of RAM (both numbers not including the ~11MB Python uses). This is an incredibly dramatic difference, and we can see where it comes from with this tool as well:
Mem usage Increment Line Contents
===========================================
10.875 MiB 0.004 MiB total = 0
165.926 MiB 155.051 MiB for i in range(10000000):
165.926 MiB 0.000 MiB total += i
return total
This says that before we started we were using 10.875MB, total = 0 added 0.004MB, and then for i in range(10000000): added 155.051MB when it generated the entire list of numbers [0..9999999]. If we compare to the xrange version:
Mem usage Increment Line Contents
===========================================
11.000 MiB 0.004 MiB total = 0
11.109 MiB 0.109 MiB for i in xrange(10000000):
11.109 MiB 0.000 MiB total += i
return total
So we started with 11MB and for i in xrange(10000000): added only 0.109MB. This is a huge memory savings by only adding a single letter to the code. While this example is fairly contrived, it shows how not computing a whole list until the element is needed can make things a lot more memory efficient.
Python has iterators and generators which act as a sort of "lazy" programming for when you need to yield sequences of data (although there's nothing stopping you from using them for single values), but Haskell has laziness built into every value in the language, even user-defined ones. This lets you take advantage of things like data structures that won't fit in memory without having to program complicated ways around that fact. The canonical example would be the fibonacci sequence:
fibs = 1 : 1 : zipWith (+) fibs (tail fibs)
which very elegantly expresses this famous sequence to define a recursive infinite list generating all fibonacci numbers. It's CPU efficient because all values are cached, so each element only has to be computed once (compared to a naive recursive implementation)1, but if you calculate too many elements your computer will eventually run out of RAM because you're now storing this huge list of numbers. This is an example where lazy programming lets you have CPU efficiency, but not RAM efficiency. There is a way around this, though. If you were to write
fib :: Int -> Integer
fib n = let fibs = 1 : 1 : zipWith (+) fibs (tail fibs) in fibs !! n
then this runs in near-constant memory, and does so very quickly, but memoization is lost as subsequent calls to fib have to recompute fibs.
A more complex example can be found here, where the author shows how to use lazy programming and recursion in Haskell to perform dynamic programming with arrays, a feat that most initially think is very difficult and requires mutation, but Haskell manages to do very easily with "tying the knot" style recursion. It results in both CPU and RAM efficiency, and does so in fewer lines than I'd expect in C/C++.
All this being said, there are plenty of cases where lazy programming is annoying. Often you can build up huge numbers of thunks instead of computing things as you go (I'm looking at you, foldl), and some strictness has to be introduced to attain efficiency. It also bites a lot of people with IO, when you read a file to a string as a thunk, close the file, and then try to operate on that string. It's only after the file is closed that the thunk gets evaluated, causing an IO error to occur and crashes your program. As with anything, lazy programming is not without its flaws, gotchas, and pitfalls. It takes time to learn how to work with it well, and to know what its limitations are.
1) By "naive recursive implementation", I mean implementing the fibonacci sequence as
fib :: Integer -> Integer
fib 0 = 1
fib 1 = 1
fib n = fib (n-1) + fib (n-2)
With this implementation, you can see the mathematical definition very clearly, it's very much in the style of inductive proofs, you show your base cases and then the general case. However, if I call fib 5, this will "expand" into something like
fib 5 = fib 4 + fib 3
= fib 3 + fib 2 + fib 2 + fib 1
= fib 2 + fib 1 + fib 1 + fib 0 + fib 1 + fib 0 + fib 1
= fib 1 + fib 0 + fib 1 + fib 1 + fib 0 + fib 1 + fib 0 + fib 1
= 1 + 1 + 1 + 1 + 1 + 1 + 1 + 1
= 8
When instead we'd like to share some of those computations, that way fib 3 only gets computed once, fib 2 only gets computed once, etc.
By using a recursively defined list in Haskell, we can avoid this. Internally, this list is represented something like this:
fibs = 1 : 1 : zipWith (+) fibs (tail fibs)
= 1 : 1 : zipWith (+) (f1:f2:fs) (f2:fs)
^--------------------^ ^ ^
^-------------------|-------|
= 1 : 1 : 2 : zipWith (+) (f2:f3:fs) (f3:fs)
^--------------------^ ^ ^
^-------------------|-------|
= 1 : 1 : 2 : 3 : zipWith (+) (f3:f4:fs) (f4:fs)
^--------------------^ ^ ^
^-------------------|-------|
So hopefully you can see the pattern forming here, as the list is build, it keeps pointers back to the last two elements generated in order to compute the next element. This means that for the nth element computed, there are n-2 additions performed. Even for the naive fib 5, you can see that there are more additions performed than that, and the number of additions will continue to grow exponentially. This definition is made possible through laziness and recursions, letting us turn an O(2^n) algorithm into an O(n) algorithm, but we have to give up RAM to do so. If this is defined at the top level, then values are cached for the lifetime of the program. It does mean that if you need to refer to the 1000th element repeatedly, you don't have to recompute it, just index it.
On the other hand, the definition
fib :: Int -> Integer
fib n =
let fibs = 1 : 1 : zipWith (+) fibs (tail fibs)
in fibs !! n
uses a local copy of fibs every time fib is called. We don't get caching between calls to fib, but we do get local caching, leaving our complexity O(n). Additionally, GHC is smart enough to know that we don't have to keep the beginning of the list around after we've used it to calculate the next element, so as we traverse fibs looking for the nth element, it only needs to hold on to 2-3 elements and a thunk pointing at the next element. This saves us RAM while computing it, and since it isn't defined at a global level it doesn't eat up RAM over the lifetime of the program. It's a tradeoff between when we want to spend RAM and CPU cycles, and different approaches are better for different situations. These techniques are applicable to much of Haskell programming in general, not just for this sequence!
Lazy evaluation is not, in general, faster. When it's said that lazy evaluation is more efficient, it is because when you consider Lambda Calculus (which is essentially what your Haskell programs are once the compiler finishes de-sugaring them) as a system of terms and reduction rules, then applying those rules in the order specified by the rules of a call-by-name with sharing evaluation policy always applies the same or fewer reduction rules than when you follow the rules in the order specified by call-by-value evaluation.
The reason that this theoretical result does not make lazy evaluation faster in general is that the translation to a linear sequential machine model with a memory access bottleneck tends to make all the reductions performed much more expensive! Initial attempts at implementing this model on computers led to programs that executed orders of magnitude more slowly than typical eagerly-evaluating language implementations. It has taken a lot of research and engineering into techniques for implementing lazy evaluation efficiently to get Haskell performance to where it is today. And the fastest Haskell programs take advantage of a form of static analysis called "strictness analysis" which attempts to determine at compile time which expressions will always be needed so that they can be evaluated eagerly rather than lazily.
There are still some cases where straightforward implementations of algorithms will execute faster in Haskell due to only evaluating terms that are needed for the result, but even eager languages always have some facility for evaluating some expressions by need. Conditionals and short-circuiting boolean expressions are ubiquitous examples, and in many eager languages, one can also delay evaluation by wrapping an expression in an anonymous function or some other sort of delaying form. So you can typically use these mechanisms (or even more awkward rewrites) to avoid evaluating expensive things that won't be necessary in an eager language.
The real advantage of Haskell's lazy evaluation is not a performance-related one. Haskell makes it easier to pull expressions apart, re-combine them in different ways, and generally reason about code as if it were a system of mathematical equations instead of being a sequentially-evaluated set of machine instructions. By not specifying any evaluation order, it forced the developers of the language to avoid side-effects that rely on a simple evaluation ordering, such as mutation or IO. This in turn led to a host of elegant abstractions that are generally useful and might not have been developed into usability otherwise.
The state of Haskell is now such that you can write high-level, elegant algorithms that make better re-use of existing higher-order functions and data structures than in nearly any other high-level typed language. And once you become familiar with the costs and benefits of lazy evaluation and how to control when it occurs, you can ensure that the elegant code also performs very well. But getting the elegant code to a state of high performance is not necessarily automatic and may require a bit more thought than in a similar but eagerly-evaluated language.
The concept of "lazy evaluation" is only about 1 thing, and only about that 1 thing:
The ability to postpone evaluation of something until needed
That's it.
Everything else in that wikipedia article follows from it.
Infinite data structures? Not a problem. We'll just make sure we don't actually figure out what the next element is until you actually ask for it. For instance, asking some code what the next value after X is, if the operation to perform is just to increase X by 1, will be infite. If you create a list containing all those values, it's going to fill your available memory in the computer. If you only figure out what the next value is when asked, not so much.
Needless calculations? Sure. You can return an object containing a lot of properties that when asked will provide you with some value. If you don't ask (ie. never inspect the value of a given property), the calculation necessary to figure out the value of that property will never be done.
Control flow ... ? Not at all sure what that is about.
The purpose of lazy evaluation of something is exactly as I stated to begin with, to avoid evaluating something until you actually need it. Be it the next value of something, the value of a property, whatever, adding support for lazy evaluation might conserve CPU cycles.
What would the alternative be?
I want to return an object to the calling code, containing any number of properties, some of which might be expensive to calculate. Without lazy evaluation, I would have to calculate the values of all those properties either:
Before constructing the object
After constructing the object, on the first time you inspected a property
After constructing the object, every time you inspected that property
With lazy evaluation you usually end up with number 2. You postpone evaluating the value of that property until some code inspects it. Note that you might cache the value once evaluated, which would save CPU cycles when inspecting the same property more than once, but that is caching, not quite the same, but in the same line of work: optimizations.

Efficiency: recursion vs loop

This is just curiosity on my part, but what is more efficient, recursion or a loop?
Given two functions (using common lisp):
(defun factorial_recursion (x)
(if (> x 0)
(* x (factorial_recursion (decf x)))
1))
and
(defun factorial_loop (x)
(loop for i from 1 to x for result = 1 then
(* result i) finally
(return result)))
Which is more efficient?
I don't even have to read your code.
Loop is more efficient for factorials. When you do recursion, you have up to x function calls on the stack.
You almost never use recursion for performance reasons. You use recursion to make the problem more simple.
Mu.
Seriously now, it doesn't matter. Not for examples this size. They both have the same complexity. If your code is not fast enough for you, this is probably one of the last places you'd look at.
Now, if you really want to know which is faster, measure them. On SBCL, you can call each function in a loop and measure the time. Since you have two simple functions, time is enough. If your program was more complicated, a profiler would be more useful. Hint: if you don't need a profiler for your measurements, you probably don't need to worry about performance.
On my machine (SBCL 64 bit), I ran your functions and got this:
CL-USER> (time (loop repeat 1000 do (factorial_recursion 1000)))
Evaluation took:
0.540 seconds of real time
0.536034 seconds of total run time (0.496031 user, 0.040003 system)
[ Run times consist of 0.096 seconds GC time, and 0.441 seconds non-GC time. ]
99.26% CPU
1,006,632,438 processor cycles
511,315,904 bytes consed
NIL
CL-USER> (time (loop repeat 1000 do (factorial_loop 1000)))
Evaluation took:
0.485 seconds of real time
0.488030 seconds of total run time (0.488030 user, 0.000000 system)
[ Run times consist of 0.072 seconds GC time, and 0.417 seconds non-GC time. ]
100.62% CPU
902,043,247 processor cycles
511,322,400 bytes consed
NIL
After putting your functions in a file with (declaim (optimize speed)) at the top, the recursion time dropped to 504 milliseconds and the loop time dropped to 475 milliseconds.
And if you really want to know what's going on, try dissasemble on your functions and see what's in there.
Again, this looks like a non-issue to me. Personally, I try to use Common Lisp like a scripting language for prototyping, then profile and optimize the parts that are slow. Getting from 500ms to 475ms is nothing. For instance, in some personal code, I got a couple of orders of magnitude speedup by simply adding an element type to an array (thus making the array storage 64 times smaller in my case). Sure, in theory it would have been faster to reuse that array (after making it smaller) and not allocate it over and over. But simply adding :element-type bit to it was enough for my situation - more changes would have required more time for very little extra benefit. Maybe I'm sloppy, but 'fast' and 'slow' don't mean much to me. I prefer 'fast enough' and 'too slow'. Both your functions are 'fast enough' in most cases (or both are 'too slow' in some cases) so there's no real difference between them.
If you can write recursive functions in such a way that the recursive call is the very last thing done (and the function is thus tail-recursive) and the language and compiler/interpreter you are using supports tail recursion, then the recursive function can (usually) be optimised into code that is really iterative, and is as fast as an iterative version of the same function.
Sam I Am is correct though, iterative functions are usually faster than their recursive counterparts. If a recursive function is to be as fast as an iterative function that does the same thing, you have to rely on the optimiser.
The reason for this is that a function call is much more expensive than a jump, plus you consume stack space, a (very) finite resource.
The function you give is not tail recursive because you call factorial_recursion and then you multiply it by x. An example of a tail-recursive version would be
(defun factorial-recursion-assist (x cur)
(if (> x 1)
(factorial-recursion-assist (- x 1) (+ cur (* (- x 1) x)))
cur))
(defun factorial-recursion (x)
(factorial-recursion-assist x 1))
(print (factorial-recursion 4))
Here's a tail-recursive factorial (I think):
(defun fact (x)
(funcall (alambda (i ret)
(cond ((> i 1)
(self (1- i) (* ret i)))
(t
ret)))
x 1))

Immutable queue in Clojure

What is the best way to obtain a simple, efficient immutable queue data type in Clojure?
It only needs two operations, enqueue and dequeue with the usual semantics.
I considered lists and vectors of course, but I understand that they have comparatively poor performance (i.e. O(n) or worse) for modifications at the end and beginning respectively - so not ideal for queues!
Ideally I'd like a proper persistent data structure with O(log n) for both enqueue and dequeue operations.
Problem solved - solution for others who may find it helpful.
I've found that Clojure has the clojure.lang.PersistentQueue class that does what is needed.
You can create an instance like this:
(def x (atom clojure.lang.PersistentQueue/EMPTY))
As far as I can see, you currently need to use the Java interop to create the instance but as Michal helpfully pointed out you can use peek, pop and conj subsequently.
I use the following function queue to create a PersistentQueue. Optionally, you might want to have a print-method and a data-reader if you're going to be printing and reading the queues.
The usual Clojure functions are already implemented for PersistentQueue.
peek -- get the head
pop -- returns a new PersistentQueue without the head
conj -- add item to the tail
empty? -- true if empty
seq -- contents as a sequence (list)
(defn queue
([] clojure.lang.PersistentQueue/EMPTY)
([coll] (reduce conj clojure.lang.PersistentQueue/EMPTY coll)))
(defmethod print-method clojure.lang.PersistentQueue
[q ^java.io.Writer w]
(.write w "#queue ")
(print-method (sequence q) w))
(comment
(let [*data-readers* {'queue #'queue}]
(read-string (pr-str (queue [1 2 3])))))
Clojure could really benefit from a queue literal. This would be cleaner (and more portable) than relying on Java interop.
However, it's not that hard to roll your own portable persistent queue, just using ordinary household items like lists.
Consider the queue as two lists, one providing the head portion of the queue, and the other the tail. enqueue adds to the first list, dequeue pops from the latter. Most ISeq functions are trivially implemented.
Probably the only tricky part is what happens when the tail is empty and you want to dequeue. In that case the head list is reversed and becomes the new tail, and the empty list becomes the new head list. I believe, even with the overhead of the reverse, that enqueue and dequeue remain O(1), though the k is going to be higher than a vanilla vector of course.
Happy queueing!

Resources