MongoDB supports many useful array operations, such as $push and $pop, but I can't seem to find any information about their algorithmic complexity, nor about how they are implemented, so that I could work out their runtime complexity myself. Any help would be greatly appreciated.
I think when it comes to Mongo updates, there are only three relevant cases:
1) An in-place atomic update, for example just incrementing an integer. This is very fast.
2) An in-place replace. The whole document has to be rewritten, but it still fits into its current space (it shrank, or there is enough padding).
3) A document migration. The document has to be written to a new location.
In addition to that, there is the cost of updating the affected indexes (all of them, if the whole document had to be moved).
What you actually do inside the document (push onto an array, add a field) should not have any significant impact on the total cost of the operation, which seems to depend mostly linearly on the size of the document (network and disk transfer costs).
Here's where they are implemented. You can figure out the complexity from there.
This is the $pop operator, for example (this seems like O(N) to me):
case POP: {
    uassert( 10135 , "$pop can only be applied to an array" , in.type() == Array );
    BSONObjBuilder bb( builder.subarrayStart( shortFieldName ) );
    int n = 0;
    BSONObjIterator i( in.embeddedObject() );
    if ( elt.isNumber() && elt.number() < 0 ) {
        // pop from front
        if ( i.more() ) {
            i.next();
            n++;
        }
        while( i.more() ) {
            bb.appendAs( i.next() , bb.numStr( n - 1 ) );
            n++;
        }
    }
    else {
        // pop from back
        while( i.more() ) {
            n++;
            BSONElement arrI = i.next();
            if ( i.more() ) {
                bb.append( arrI );
            }
        }
    }
    ms.pushStartSize = n;
    verify( ms.pushStartSize == in.embeddedObject().nFields() );
    bb.done();
    break;
}
There are similar questions, but my main concern here is processing time.
I have two point clouds, both of type pcl::PointXYZL, that is, each point carries label/id info. I want to remove from cloud B all the points that exist in cloud A (recognized by the label info).
An iteration of this type takes too much time.
So I decided to save the labels from cloud A into boost::container::flat_set<int> labels, and to store cloud B as std::map<int, pcl::PointXYZL> cloud_B, where the key is the point's label/id. Then I do:
for (boost::container::flat_set<int>::iterator it = labels.begin(); it != labels.end(); ++it) {
    if (auto point{ cloud_B.find(*it) }; point != std::end(cloud_B)) {
        cloud_B.erase(*it);
    }
}
It is now much, much faster, but honestly, I think there may be a more efficient solution.
I also tried:
for (boost::container::flat_set<int>::iterator it = labels.begin(); it != labels.end(); ++it) {
    try {
        cloud_B.erase(*it);
        throw 505;
    }
    catch (...) {
        continue;
    }
}
But it takes more time than the first example above.
I would appreciate any help with this!
Both of your solutions are roughly O(n log n): every lookup or erase in a std::map costs O(log n).
Use std::unordered_set<std::uint32_t> to check whether a label exists or not. The average case of unordered_set<std::uint32_t>::count is constant, so it should be better than your solution.
Sample code (untested), but it gives you the rough idea.
pcl::PointCloud<pcl::PointXYZL> cloud_a;
pcl::PointCloud<pcl::PointXYZL> cloud_b;
... (fill clouds)
// build the unordered_set<uint32_t> of labels in cloud_a: O(n)
std::unordered_set<std::uint32_t> labels_in_cloud_a;
for (int i = 0; i < cloud_a.size(); ++i) {
    labels_in_cloud_a.insert(cloud_a.points[i].label);
}
// collect the indices in cloud_b whose label doesn't exist in labels_in_cloud_a: O(n*1) = O(n)
std::vector<int> indices;
for (int i = 0; i < cloud_b.size(); ++i) {
    if (labels_in_cloud_a.count(cloud_b.points[i].label) == 0) {
        indices.push_back(i);
    }
}
// only the indices whose labels don't exist in cloud_a will remain: O(n)
pcl::copyPointCloud(cloud_b, indices, cloud_b);
This is my solution so far...
First, I store both clouds as std::map<int, pcl::PointXYZL>. Remember that I want to remove from cloud_B all the points that are in cloud_A. Then:
std::map<int, pcl::PointXYZL>::iterator it = cloud_B.begin();
for ( ; it != cloud_B.end(); ) {
    if (auto point{ cloud_A.find(it->first) }; point != std::end(cloud_A)) { // if the label was found
        it = cloud_B.erase(it);
    }
    else
        ++it;
}
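Since both containers are std::maps ordered by label, another option (a sketch of mine, not from the answers above) is a single merge-style pass over the two key ranges, advancing two iterators in lockstep the way std::set_intersection does. The helper name eraseCommonKeys is hypothetical:
#include <map>

// Erase from B every key that also appears in A, in one linear pass.
// Both maps iterate in ascending key order, so two iterators can be
// advanced in lockstep: O(|A| + |B|) total, instead of a log factor
// per lookup.
template <typename K, typename V>
void eraseCommonKeys(const std::map<K, V>& a, std::map<K, V>& b) {
    auto itA = a.begin();
    auto itB = b.begin();
    while (itA != a.end() && itB != b.end()) {
        if (itA->first < itB->first) {
            ++itA;                  // key only in A
        } else if (itB->first < itA->first) {
            ++itB;                  // key only in B, keep it
        } else {
            itB = b.erase(itB);     // key in both, drop it from B
            ++itA;
        }
    }
}
Whether this beats the hash-based approach depends on the sizes involved; the linear pass mainly pays off when the two clouds are of comparable size.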
I have a speed bottleneck in my code right now. The following function compares two arrays (positions and sizes) and produces arrays of the indices at which the position is smaller than the size (and of those at which it is greater than 1). This runs in O(n) time but is called many times. Is there any way for me to do better for this very specific case?
code:
function findValidDimensions(positions, sizes) {
    var forwardDimensions = [];
    var backwardDimensions = [];
    for (var i = 0; i < sizes.length; i++) {
        if (positions[i] < sizes[i]) {
            forwardDimensions.push(i);
        }
        if (positions[i] > 1) {
            backwardDimensions.push(i);
        }
    }
    // we can go forward or backward
    return {
        "forward": forwardDimensions,
        "backward": backwardDimensions
    };
}
Unless there are elements you can avoid looking at (which, from your sparse description, sounds unlikely), looking at n elements will take O(n) time.
As far as I know, the complexity of your method is too large if you use a big array; the best searching structures are the heap or the B-tree :)
Some refs:
https://en.wikipedia.org/wiki/Heap_%28data_structure%29
https://en.wikipedia.org/wiki/B-tree
It is somewhat identical to what we do in hashing: after adding the elements to the hash table, I simply search for each element in increasing order and remove it after printing it if it is found. I used it to solve a very easy problem on CodeChef. Here is the basic algorithm that I used, but I want to know what it is called:
void func(int nos) {
    int arr[1000000] = {0};   // one counter per possible value (note: ~4 MB on the stack)
    while (nos--) {
        int k;
        cin >> k;
        arr[k]++;             // count each occurrence
    }
    for (int i = 0; i < 1000000; ) {
        if (arr[i] == 0) {
            i++;              // value i exhausted, move on
            continue;
        }
        cout << i << endl;    // print value i once per occurrence
        arr[i]--;
    }
}
Thanks!
This is known as counting sort.
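For reference, a minimal counting-sort sketch in C++ along the same lines; the value bound of 1,000,000 is an assumption carried over from the code above, and countingSortPrint is a made-up name:
#include <iostream>
#include <vector>

// Counting sort: tally how many times each value occurs, then emit the
// values in ascending order as often as they were seen.
// O(n + k) time and O(k) space, where k is the size of the value range.
void countingSortPrint(const std::vector<int>& values, int maxValue) {
    std::vector<int> counts(maxValue + 1, 0);   // one slot per possible value
    for (int v : values)
        ++counts[v];                            // assumes 0 <= v <= maxValue
    for (int v = 0; v <= maxValue; ++v)
        for (int c = 0; c < counts[v]; ++c)
            std::cout << v << '\n';
}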
How can we find a repeated number in an array in O(n) time and O(1) space?
E.g.
array: 2, 1, 4, 3, 3, 10
output: 3
EDIT:
I tried the following approach.
I found that if a number is repeated an odd number of times, we can obtain it by XORing the whole array. So I thought of turning the oddly repeated element into an evenly repeated one, and every evenly repeated number into an oddly repeated one; but for that I need to build the array of unique elements from the input array in O(n), and I couldn't find a way to do that.
Assuming that there is an upper bound for the values of the numbers in the array (which is the case with all built-in integer types in all programming languages I've ever used -- for example, let's say they are 32-bit integers), there is a solution that uses constant space:
Create an array of N elements, where N is the upper bound for the integer values in the input array, and initialize all elements to 0 or false or some equivalent. I'll call this the lookup array.
Loop over the input array, and use each number to index into the lookup array. If the value you find is 1 or true (etc.), the current number in the input array is a duplicate.
Otherwise, set the corresponding value in the lookup array to 1 or true to remember that we have seen this particular input number.
Technically, this is O(n) time and O(1) space, and it does not destroy the input array. Practically, you would need things to be going your way for such a program to actually run (e.g. it's out of the question if we are talking about 64-bit integers in the input).
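A minimal C++ sketch of this lookup-array idea, assuming the values are known to lie in [0, upperBound); the function name is mine:
#include <vector>

// Returns the first value that occurs a second time, or -1 if there is
// no duplicate. O(n) time; the lookup array is O(N) in the value bound,
// which is "constant" only in the sense described above, since it does
// not grow with the input length.
int findDuplicate(const std::vector<int>& input, int upperBound) {
    std::vector<bool> seen(upperBound, false);
    for (int v : input) {
        if (seen[v])
            return v;        // second occurrence: this is a duplicate
        seen[v] = true;      // remember the first occurrence
    }
    return -1;
}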
Without knowing more about the possible values in the array, you can't.
With an O(1) space requirement, the fastest way is to sort the array, so it's going to be at least O(n*log(n)).
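A sketch of that sort-based approach in C++ (sort in place, then compare neighbors); the function name is mine:
#include <algorithm>
#include <vector>

// O(n log n) time, O(1) extra space given an in-place sort.
// Returns the repeated value, or -1 if all elements are distinct.
// Note that the input array is reordered.
int findRepeatedSorted(std::vector<int>& a) {
    std::sort(a.begin(), a.end());
    for (std::size_t i = 1; i < a.size(); ++i)
        if (a[i] == a[i - 1])    // duplicates end up adjacent after sorting
            return a[i];
    return -1;
}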
Use bit manipulation: traverse the list in one loop.
Check whether bit i of the mask is 1 by shifting the mask right by the value i.
If so, print out the repeated value i.
If the bit is unset, set it.
*If you only want to show each repeated value once, add another integer show and set its bits as well, like in the example below.
**This is in Java; an int mask has only 32 bits, and Java shift counts are taken mod 32, so this only distinguishes values from 0 to 31. I'm not sure we will reach it, but you might also want to add a range check (e.g. against Integer.MAX_VALUE).
public static void repeated( int[] vals ) {
    int mask = 0;
    int show = 0;
    for ( int i : vals ) {
        // the bit for value i is already set and not yet reported: a repeat
        if ( (( mask >> i ) & 1) == 1 &&
             (( show >> i ) & 1) == 0 )
        {
            System.out.println( "\n\tfound: " + i );
            show = show | (1 << i);   // remember that i has been reported
        }
        // otherwise set the bit for value i
        else
        {
            mask = mask | (1 << i);
            System.out.println( "new: " + i );
        }
        System.out.println( "mask: " + mask );
    }
}
This is impossible without knowing some restricting rules about the input array; otherwise either the memory complexity will have some dependency on the input size, or the time complexity will be higher.
The two answers above are in fact the best answers for getting near what you asked; one's trade-off is time, the other's is memory, but you can't have O(n) time and O(1) space for an arbitrary, unknown input array.
I met this problem too, and my solution uses a hash map. The Python version (Python 2, because of xrange) is the following:
def findRepeatNumber(lists):
    hashMap = {}
    for i in xrange(len(lists)):
        if lists[i] in hashMap:
            return lists[i]           # seen before: this is the repeat
        else:
            hashMap[lists[i]] = i + 1
    return
It is possible only if you have specific data, e.g. if all the numbers are from a small range. Then you could store the repeat info in the source array without affecting the overall scanning and analyzing process.
Simplified example: you know that all the numbers are smaller than 100; then you can mark the repeat count for a number using the higher digits, like putting 900 instead of 9 once 9 has occurred twice.
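A sketch of one common form of this trick, under the stronger assumption that every value lies in [0, n) where n is the array length, so counts can be encoded as multiples of n; the function name is mine:
#include <vector>

// Finds a value that occurs at least twice, in O(n) time and O(1) extra
// space, by accumulating occurrence counts inside the array itself.
// Assumes every element v satisfies 0 <= v < a.size(); the array is
// modified, but a[i] % n still recovers each original value.
int findRepeatedInPlace(std::vector<int>& a) {
    const int n = static_cast<int>(a.size());
    for (int i = 0; i < n; ++i)
        a[a[i] % n] += n;        // bump the count for the value a[i] % n
    for (int v = 0; v < n; ++v)
        if (a[v] / n >= 2)       // value v was seen two or more times
            return v;
    return -1;                   // no repeated value
}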
It is easy when NumMax - NumMin is small enough that you can index by value:
http://www.geeksforgeeks.org/find-the-maximum-repeating-number-in-ok-time/
public static string RepeatedNumber()
{
    int[] input = { 66, 23, 34, 0, 5, 4 };
    int[] indexer = { 0, 0, 0, 0, 0, 0 };
    var found = 0;
    for (int i = 0; i < input.Length; i++)
    {
        var toFind = input[i];
        for (int j = 0; j < input.Length; j++)
        {
            // this occurrence was already marked once: it is a repeat
            if (input[j] == toFind && indexer[j] == 1)
            {
                found = input[j];
            }
            else if (input[j] == toFind)
            {
                indexer[j] = 1;   // mark the first sighting of this value
            }
        }
    }
    return $"most repeated item in the array is {found}";
}
You can do this (note that it only detects duplicates that sit next to each other in the array):
#include <iostream>

int main()
{
    int array[5], rep = 0;
    for (int i = 0; i < 5; i++)
    {
        std::cout << "enter elements" << std::endl;
        std::cin >> array[i];
    }
    // compare each element with its neighbor; only adjacent repeats are found
    for (int i = 0; i < 4; i++)
    {
        if (array[i] == array[i + 1])
        {
            rep = array[i];
        }
    }
    std::cout << "repeat value is " << rep << std::endl;
    return 0;
}
What is the most efficient way to remove duplicate items from an array, under the constraint that auxiliary memory usage must be kept to a minimum, preferably small enough to not even require any heap allocations? Sorting seems like the obvious choice, but this is clearly not asymptotically optimal. Is there a better algorithm that can be done in place or close to in place? If sorting is the best choice, what kind of sort would be best for something like this?
I'll answer my own question since, after posting, I came up with a really clever algorithm to do this. It uses hashing, building something like a hash set in place. It's guaranteed to be O(1) in auxiliary space (the recursion is a tail call), and is typically O(N) in time complexity. The algorithm is as follows:
Take the first element of the array; this will be the sentinel.
Reorder the rest of the array, as much as possible, such that each element is in the position corresponding to its hash. As this step is completed, duplicates will be discovered. Set them equal to the sentinel.
Move all elements for which the index is equal to the hash to the beginning of the array.
Move all elements that are equal to the sentinel, except the first element of the array, to the end of the array.
What's left between the properly hashed elements and the duplicate elements will be the elements that couldn't be placed in the index corresponding to their hash because of a collision. Recurse to deal with these elements.
This can be shown to be O(N) provided there is no pathological scenario in the hashing: even if there are no duplicates, approximately 2/3 of the elements are eliminated at each recursion. Each level of recursion is O(n), where n is the number of elements left. The only problem is that, in practice, it's slower than a quicksort when there are few duplicates, i.e. lots of collisions. However, when there are huge numbers of duplicates, it's amazingly fast.
Edit: In current implementations of D, hash_t is 32 bits. Everything about this algorithm assumes that there will be very few, if any, hash collisions in full 32-bit space. Collisions may, however, occur frequently in the modulus space. However, this assumption will in all likelihood be true for any reasonably sized data set. If the key is less than or equal to 32 bits, it can be its own hash, meaning that a collision in full 32-bit space is impossible. If it is larger, you simply can't fit enough of them into 32-bit memory address space for it to be a problem. I assume hash_t will be increased to 64 bits in 64-bit implementations of D, where datasets can be larger. Furthermore, if this ever did prove to be a problem, one could change the hash function at each level of recursion.
Here's an implementation in the D programming language:
void uniqueInPlace(T)(ref T[] dataIn) {
    uniqueInPlaceImpl(dataIn, 0);
}

void uniqueInPlaceImpl(T)(ref T[] dataIn, size_t start) {
    if(dataIn.length - start < 2)
        return;

    invariant T sentinel = dataIn[start];
    T[] data = dataIn[start + 1..$];

    static hash_t getHash(T elem) {
        static if(is(T == uint) || is(T == int)) {
            return cast(hash_t) elem;
        } else static if(__traits(compiles, elem.toHash)) {
            return elem.toHash;
        } else {
            static auto ti = typeid(typeof(elem));
            return ti.getHash(&elem);
        }
    }

    for(size_t index = 0; index < data.length;) {
        if(data[index] == sentinel) {
            index++;
            continue;
        }
        auto hash = getHash(data[index]) % data.length;
        if(index == hash) {
            index++;
            continue;
        }
        if(data[index] == data[hash]) {
            data[index] = sentinel;
            index++;
            continue;
        }
        if(data[hash] == sentinel) {
            swap(data[hash], data[index]);
            index++;
            continue;
        }
        auto hashHash = getHash(data[hash]) % data.length;
        if(hashHash != hash) {
            swap(data[index], data[hash]);
            if(hash < index)
                index++;
        } else {
            index++;
        }
    }

    size_t swapPos = 0;
    foreach(i; 0..data.length) {
        if(data[i] != sentinel && i == getHash(data[i]) % data.length) {
            swap(data[i], data[swapPos++]);
        }
    }

    size_t sentinelPos = data.length;
    for(size_t i = swapPos; i < sentinelPos;) {
        if(data[i] == sentinel) {
            swap(data[i], data[--sentinelPos]);
        } else {
            i++;
        }
    }

    dataIn = dataIn[0..sentinelPos + start + 1];
    uniqueInPlaceImpl(dataIn, start + swapPos + 1);
}
Keeping auxiliary memory usage to a minimum, your best bet would be to do an efficient sort to get the elements in order, then do a single pass over the array with a FROM and a TO index.
You advance the FROM index every time through the loop. You only copy the element from FROM to TO (and increment TO) when the key is different from the last.
With quicksort, that averages O(n*log(n)), plus O(n) for the final pass.
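A sketch of that FROM/TO pass in C++ (essentially what std::unique does after a sort); dedupInPlace is a name of my choosing:
#include <algorithm>
#include <vector>

// Sort, then compact in place: TO chases FROM, and an element is copied
// down only when it differs from the last element kept.
// O(n log n) for the sort plus O(n) for the pass; O(1) auxiliary space.
std::size_t dedupInPlace(std::vector<int>& a) {
    if (a.empty())
        return 0;
    std::sort(a.begin(), a.end());
    std::size_t to = 1;                        // a[0] is always kept
    for (std::size_t from = 1; from < a.size(); ++from) {
        if (a[from] != a[to - 1])
            a[to++] = a[from];                 // new key: keep it
    }
    a.resize(to);                              // drop the duplicate tail
    return to;
}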
If you sort the array, you will still need another pass to remove duplicates, so the complexity is O(N*N) in the worst case (assuming quicksort), or O(N*sqrt(N)) using Shellsort.
You can achieve O(N*N) by simply scanning the array for each element, removing duplicates as you go.
Here is an example in Lua:
function removedups (t)
    local result = {}
    local count = 0
    local found
    for i,v in ipairs(t) do
        found = false
        if count > 0 then
            for j = 1,count do
                if v == result[j] then found = true; break end
            end
        end
        if not found then
            count = count + 1
            result[count] = v
        end
    end
    return result, count
end
I don't see any way to do this without something like a bubble sort. When you find a dupe, you need to reduce the length of the array, and quicksort is not designed for the size of the array to change.
This algorithm is always O(n^2), but it also uses almost no extra memory -- stack or heap.
// returns the new size
int bubblesqueeze(int* a, int size) {
    for (int j = 0; j < size - 1; ++j) {
        for (int i = j + 1; i < size; ++i) {
            // when a dupe is found, move the end value to index i
            // and shrink the size of the array
            while (i < size && a[i] == a[j]) {
                a[i] = a[--size];
            }
            if (i < size && a[i] < a[j]) {
                int tmp = a[j];
                a[j] = a[i];
                a[i] = tmp;
            }
        }
    }
    return size;
}
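A quick usage sketch for the function above (the input values are just an example):
#include <iostream>

int bubblesqueeze(int* a, int size);   // as defined above

int main() {
    int a[] = {5, 2, 3, 5, 1, 2, 5};
    int n = bubblesqueeze(a, 7);
    // the first n slots now hold the sorted, deduplicated values: 1 2 3 5
    for (int i = 0; i < n; ++i)
        std::cout << a[i] << ' ';
    std::cout << '\n';
}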
If you have two different variables for traversing a dataset instead of just one, then you can limit the output by dismissing all duplicates that are already in the dataset.
Obviously this example in C is not an efficient sorting algorithm, but it is just an example of one way to look at the problem.
You could also blindly sort the data first and then relocate the data to remove dups, but I'm not sure that would be faster.
#include <stdio.h>
#include <stdlib.h>

#define ARRAY_LENGTH 15

int stop = 1;
int scan_sort[ARRAY_LENGTH] = {5,2,3,5,1,2,5,4,3,5,4,8,6,4,1};

void step_relocate(char tmp, char s, int *dataset)
{
    for (; tmp < s; s--)
        dataset[s] = dataset[s-1];
}

int exists(int var, int *dataset)
{
    int tmp = 0;
    for (; tmp < stop; tmp++)
    {
        if (dataset[tmp] == var)
            return 1; /* value exists */
        if (dataset[tmp] > var)
            tmp = stop; /* value not in array */
    }
    return 0; /* value not in array */
}

int main(void)
{
    int tmp1 = 0;
    int tmp2 = 0;
    int index = 1;
    while (index < ARRAY_LENGTH)
    {
        if (exists(scan_sort[index], scan_sort))
            ; /* dismiss all values already in the final dataset */
        else if (scan_sort[stop-1] < scan_sort[index])
        {
            scan_sort[stop] = scan_sort[index]; /* insert the value as the highest one */
            stop++; /* one more value added to the final dataset */
        }
        else
        {
            for (tmp1 = 0; tmp1 < stop; tmp1++) /* find where the data shall be inserted */
            {
                if (scan_sort[index] < scan_sort[tmp1])
                {
                    break;
                }
            }
            tmp2 = scan_sort[index]; /* store in case this value is the next after stop */
            step_relocate(tmp1, stop, scan_sort); /* relocate data already in the dataset */
            scan_sort[tmp1] = tmp2; /* insert the new value */
            stop++; /* one more value added to the final dataset */
        }
        index++;
    }
    printf("Result: ");
    for (tmp1 = 0; tmp1 < stop; tmp1++)
        printf("%d ", scan_sort[tmp1]);
    printf("\n");
    system("pause"); /* Windows-only pause */
    return 0;
}
I liked the problem, so I wrote a simple C test program for it, as you can see above. Leave a comment if I should elaborate or if you see any faults.