This concept would be really easy if I were using something like an array. For instance, the pseudocode would be something like:
if array.count > 20000 {
delete unwanted element and append new value
} else {
append value to array
}
However, with binary trees, getting the count kills my CPU / memory. It makes sense why, but I was wondering whether someone has an alternative way of keeping a binary tree from getting too big?
Note: Keeping track of how many times the insert function gets called wouldn't work since duplicate values are discarded...
Hope this made sense.... Thanks in advance :)
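One common way around this, not taken from the original thread, is to store a size field in the tree and increment it only when insert actually creates a node, so discarded duplicates never count. The JavaScript sketch below assumes a plain unbalanced BST. Its eviction policy (drop the smallest value once the limit is exceeded) is a hypothetical stand-in for "delete unwanted element", and the class name BoundedBST is illustrative only.

```javascript
// Hypothetical bounded BST that keeps its own node count, so the size
// never has to be recomputed by walking the whole tree.
class BoundedBST {
  constructor(maxSize) {
    this.maxSize = maxSize;
    this.root = null;
    this.size = 0; // number of distinct values currently stored
  }

  // Returns false for a duplicate (nothing changes, count untouched).
  // Returns true when a node was created; that value may be evicted right
  // away if it is the new minimum and the tree was already full.
  insert(value) {
    const node = { value, left: null, right: null };
    if (this.root === null) {
      this.root = node;
    } else {
      let cur = this.root;
      for (;;) {
        if (value === cur.value) return false; // duplicate: discarded
        const side = value < cur.value ? "left" : "right";
        if (cur[side] === null) {
          cur[side] = node;
          break;
        }
        cur = cur[side];
      }
    }
    this.size += 1;
    // Assumed eviction policy: remove the smallest value when over the limit.
    if (this.size > this.maxSize) this.removeMin();
    return true;
  }

  removeMin() {
    if (this.root === null) return;
    let parent = null;
    let cur = this.root;
    while (cur.left !== null) {
      parent = cur;
      cur = cur.left;
    }
    // The minimum has no left child, so its right subtree takes its place.
    if (parent === null) this.root = cur.right;
    else parent.left = cur.right;
    this.size -= 1;
  }

  // In-order traversal, handy for checking the contents.
  toArray() {
    const out = [];
    const walk = (n) => {
      if (n !== null) {
        walk(n.left);
        out.push(n.value);
        walk(n.right);
      }
    };
    walk(this.root);
    return out;
  }
}
```

Reading tree.size is O(1), so the "if count > limit" check from the array pseudocode carries over unchanged.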
I have an RDD called myRdd:RDD[(Long, String)] (the Long is an index obtained with zipWithIndex()) with a number of elements, but I need to cut it down to a specific number of elements for the final result.
I am wondering which is the better way to do this:
myRdd.take(num)
or
myRdd.filterByRange(0, num)
I don't care about the order of the selected elements, but I do care about the performance.
Any suggestions? Any other way to do this? Thank you!
take is an action, and filterByRange is a transformation. An action sends the results to the driver node, and a transformation does not get executed until an action is called.
The take method takes the first n elements of the RDD and sends them back to the driver. filterByRange is a little more sophisticated: it keeps only the elements whose key lies between the specified bounds. Both bounds are inclusive, so filterByRange(0, num) keeps num + 1 elements; use filterByRange(0, num - 1) to get exactly num.
I'd say there isn't much difference in performance between them. If you just want to send the results to the driver and don't care about order, use take. However, if you want to benefit from distributed computation and don't need to send the results back to the driver, use filterByRange and then call an action.
I'm having a problem with the not operation (and nearly all other operations) on a list. By a list I mean the cell layout 0 i1 i2 i3 ... i(n-1) in 0, with an unknown n.
In my program I'm at an unknown index in that list, and I need to check whether that cell is 0.
The not algorithm needs a temporary cell, but the only way to reach it is with [<] or [>], and then I lose track of my current value in the list.
Reminder: the a = 0 algorithm goes like this:
t0[-]+
a[t0-]
t0[
<code>
]
The only thing I could come up with is leaving a 1 between each element, but that seems extremely inelegant.
So my question is: is there a better way to do this?
Actually, leaving a 1 between each element is one of the more efficient ways to do it. You then simply walk back and forth until you meet a zero, and you know which end of the sequence you are at and how many elements there are. The 1s are also easy to clear up after each operation.
There are ways to use only one cell per element, but it would require moving all elements to the left of the one you want one position to the left, and then moving them all back, for each operation. In some cases this might be faster if you only store small values in each element and you have a lot of elements.
It depends on what you want to achieve. Personally, I think the first option (leaving a trail of 1s and clearing them afterwards) is the better one: even though it needs twice the space, it is usually significantly faster in the general case.
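To make the marker idea concrete, here is a small JavaScript simulation (not Brainfuck itself) of one possible layout, which is an assumption: cell 0 is a zero sentinel, each list value is followed by a marker cell holding 1, and two zero cells close the right end. The helper names buildTape and hop are hypothetical; hop imitates a [<<] or [>>] loop started on a marker cell.

```javascript
// Assumed layout (hypothetical): [0, v1, 1, v2, 1, ..., vn, 1, 0, 0]
// Cell 0 is a zero sentinel, each value is followed by a marker cell set to 1,
// and two zero cells close the right end.
function buildTape(values) {
  const tape = [0];
  for (const v of values) {
    tape.push(v, 1); // value cell, then its marker cell
  }
  tape.push(0, 0);
  return tape;
}

// Imitates the loop [<<] (step = -2) or [>>] (step = +2) started on a marker
// cell: keep hopping while the current cell is nonzero. Only marker cells are
// inspected, so list values that are 0 never stop the walk.
function hop(tape, pos, step) {
  let hops = 0;
  while (tape[pos] !== 0) {
    pos += step;
    hops += 1;
  }
  return { pos, hops };
}

const tape = buildTape([5, 0, 7]); // marker of the third value is at index 6
const left = hop(tape, 6, -2); // hops === 3: elements up to and including this one
const right = hop(tape, 6, 2); // hops === 1: elements from this one to the end
const listLength = left.hops + right.hops - 1; // 3
```

The value belonging to the marker at index p is tape[p - 1], so in this example tape[5] is 7, and the 0 stored at index 3 never ends a walk early.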
As I'm well aware, using AppendTo[] on large lists isn't recommended, as each append gets progressively slower.
Many suggestions mention Reap[] and Sow[]; however, the ones I found don't deal with nested lists.
My question is relatively simple: how do I replace AppendTo[] with Reap[] and Sow[] as directly as possible?
Specific information about my problem: I would like to append data in the form
data = {{x},{y}}
to a list, so after a few iterations, the list would look something like
list = {{{x1},{y1}},{{x2},{y2}},{{x3},{y3}}}
My solution works for only 1 iteration, and then breaks apart because of increasing use of Flatten[], so it's obviously a non-working solution. Of course, I'm open to other alternatives, faster than AppendTo[].
Thanks!
I have an assignment to write an algorithm that finds duplicates in a Dynamic Sorted Array. Before starting, I need to know what a Dynamic Sorted Array is, but I don't. I tried googling but couldn't find anything called a Dynamic Sorted Array. Could you please guide me? What is this data structure and what does it look like? Thanks.
I think your instructor is simply referring to an array that can change and sort itself, so you can assume that it's always in the correct order and that it is of variable length. If the algorithm is to be written in pseudo-code that's probably all you need to know.
Let's see, you need to understand what a Dynamic Sorted array is:
You already know what a sorted array is, so let's look at what a dynamic array is: a growable array whose size is not fixed in advance.
So, to summarize, you need to write an array which is:
A. Sorted
B. Dynamic in nature (expanding)
How to implement it? Read "Dynamic arrays overview and implementation in Java and C++".
I assume it means an array whose length is dynamic (i.e. unknown at compile-time), and whose values are sorted.
I never heard of that data structure, but based on the individual words I would guess that it:
Behaves like an array, that is, with O(1) access operations get(index) and set(index, value).
Can be resized if necessary.
Is always sorted.
I don't think, though, that such a data structure is very efficient for finding duplicates. I would prefer some sort of map, unless you need very simple algorithms.
I would say you may have a typo in your assignment. It might be meant to read "sorted Dynamic Array".
However, a dynamic array which always inserts new items in sorted order would probably fit that terminology. So take your dynamic array:
[2][5][7][9]
Inserting the element '8' would result in the following array:
[2][5][7][8][9]
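A minimal JavaScript sketch of the insertion shown above, assuming a JS array plays the role of the dynamic array (it grows automatically). insertSorted and findDuplicates are hypothetical helper names: the first finds the insertion point with a binary search and splices the value in; the second shows why keeping the array sorted helps with the assignment, since equal values end up next to each other.

```javascript
// Insert value into an already-sorted array, keeping it sorted.
function insertSorted(arr, value) {
  // Binary search for the first index whose element is >= value.
  let lo = 0;
  let hi = arr.length;
  while (lo < hi) {
    const mid = (lo + hi) >>> 1;
    if (arr[mid] < value) lo = mid + 1;
    else hi = mid;
  }
  arr.splice(lo, 0, value); // shifts later elements right; the array grows by one
  return arr;
}

// In a sorted array equal values are adjacent, so one pass finds duplicates.
function findDuplicates(arr) {
  const dups = [];
  for (let i = 1; i < arr.length; i++) {
    // Report each duplicated value once, even if it appears 3+ times.
    if (arr[i] === arr[i - 1] && dups[dups.length - 1] !== arr[i]) {
      dups.push(arr[i]);
    }
  }
  return dups;
}

const sorted = [2, 5, 7, 9];
insertSorted(sorted, 8); // sorted is now [2, 5, 7, 8, 9]
```

Finding the position takes O(log n), but the splice still moves up to n elements, so each insertion is O(n) overall.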
I have an array of integers, which could run into the hundreds of thousands (or more), sorted numerically ascending since that's how they were originally stacked.
I need to be able to query the array for the index of the first number >= some input, as efficiently as possible. The only approach I can think of offhand is to iterate through the array, testing the condition until it returns true, and then stop. However, that is the most expensive solution, and I'm looking for the best algorithm for the job.
I'm coding in Objective-C, but I'll give an example in JavaScript to broaden the audience of people who are able to respond.
// Sample set
var numbers = [1, 7, 23, 23, 23, 89, 1002, 1003];
var indexAfter100 = getIndexOfValueGreaterThan(100);
var indexAfter7 = getIndexOfValueGreaterThan(7);
// (indexAfter100 == 6) == true
// (indexAfter7 == 2) == true
Putting this data into a DB in order to perform this search will only be a last-resort solution since I'm keen to see some sort of algorithm to tackle this quickly in memory.
I do have the ability to change the data structure, or to store an additional data structure as I'm building the array, since my program has already pushed each number one by one onto this stack, so I'd just modify the code that's adding them to the stack. Searching for the index as they're being added to the stack isn't possible since the search operation will be repeated frequently with different values after the fact.
Right now I'm thinking "B-Tree" but to be honest, I would have no idea how to implement one and before I go off and start figuring that out, I wonder if there's a nice algorithm that fits this single use-case better?
You should use binary search. Objective-C may even have a built-in method for that (many languages I know do). A B-tree probably won't help much unless you want to store the data on disk.
I don't know about Objective-C, but C (plain ol' C) comes with a function called bsearch (and, AFAIK, Obj-C can call C functions just fine):
http://www.cplusplus.com/reference/clibrary/cstdlib/bsearch/
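That performs a binary search, which sounds like what you need. Note that bsearch looks for an element equal to the key, so finding the first element greater than a value takes a small variation of the same idea.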
A fast search algorithm should be able to handle an array of ints of that size without taking too long, I should think (and the array is sorted, so a binary search would probably be the way to go).
I think a B-tree is probably overkill...
Since they are sorted in ascending order and you only need the larger ones, I would serialize the array, split it at the given integer, keep the part of the serialized string that holds the larger integers, then unserialize it, and voilà.
Linear search, also called sequential search, looks at each element in sequence from the start to see whether the desired element is present in the data structure. When the amount of data is small, this search is fast. It's easy, but the work needed is proportional to the amount of data searched: doubling the number of elements doubles the search time if the desired element is not present.
Binary search is efficient for larger arrays. It checks the middle element: if that value is bigger than what we are looking for, look in the first half; otherwise, look in the second half. Repeat until the desired item is found. The array must be sorted for binary search. Because it eliminates half of the remaining data at each iteration, its running time is logarithmic, O(log n).
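For comparison, a sketch of the sequential scan the question starts from. The name firstIndexGreaterThanLinear is hypothetical, and returning -1 when nothing qualifies is an assumption.

```javascript
// Sequential scan: check each element from the start until one exceeds
// target. The work grows in proportion to arr.length.
function firstIndexGreaterThanLinear(arr, target) {
  for (let i = 0; i < arr.length; i++) {
    if (arr[i] > target) return i;
  }
  return -1; // no element is greater than target
}
```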
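A minimal binary-search sketch in JavaScript for the question's sample. Two assumptions: the functions take the array as their first argument (the question's getIndexOfValueGreaterThan takes only the value), and they return arr.length when no element qualifies. getIndexOfValueGreaterThan uses a strict >, which matches the sample's expected results (7 maps to index 2); the hypothetical getIndexOfValueAtLeast is the >= variant described in the question text.

```javascript
// First index whose element is strictly greater than target, or arr.length
// if there is none. Assumes arr is sorted ascending.
function getIndexOfValueGreaterThan(arr, target) {
  let lo = 0;
  let hi = arr.length; // search window is [lo, hi)
  while (lo < hi) {
    const mid = (lo + hi) >>> 1; // middle of the window
    if (arr[mid] <= target) {
      lo = mid + 1; // answer lies to the right of mid
    } else {
      hi = mid; // arr[mid] qualifies; keep looking to its left
    }
  }
  return lo;
}

// Same search for ">=": only the comparison changes.
function getIndexOfValueAtLeast(arr, target) {
  let lo = 0;
  let hi = arr.length;
  while (lo < hi) {
    const mid = (lo + hi) >>> 1;
    if (arr[mid] < target) lo = mid + 1;
    else hi = mid;
  }
  return lo;
}

const numbers = [1, 7, 23, 23, 23, 89, 1002, 1003];
getIndexOfValueGreaterThan(numbers, 100); // 6
getIndexOfValueGreaterThan(numbers, 7); // 2
getIndexOfValueAtLeast(numbers, 7); // 1
```

Both functions narrow the window until it is empty instead of stopping at the first match, which is what guarantees they return the first qualifying index even when values repeat (as 23 does here).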