This wouldn't be hard to test, but I'm curious whether anyone has already tested it: is there a performance difference between setting a newly instantiated GameObject as the first vs. the last sibling in a hierarchy? For example, I could order a list of GameObjects in descending order and set each one as the last sibling, or order it in ascending order and set each as the first sibling, to get my desired order.
I'm asking because it reminded me of the performance gain of inserting at the head of a linked list vs. at the end in order to avoid traversal of the list.
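For reference, here's a rough Unity C# sketch of the two orderings I mean (`prefab` and `parent` are assumed to be assigned elsewhere); both loops end up with the same descending order:

```csharp
using UnityEngine;

public class SpawnOrderExample : MonoBehaviour
{
    public GameObject prefab;   // assumed assigned in the inspector
    public Transform parent;

    void BuildByAppending(int count)
    {
        // Iterate in descending order; each new child is already the last sibling.
        for (int i = count - 1; i >= 0; i--)
        {
            var go = Instantiate(prefab, parent);
            go.name = "Item " + i;
            go.transform.SetAsLastSibling();   // effectively a no-op right after Instantiate
        }
    }

    void BuildByPrepending(int count)
    {
        // Iterate in ascending order and push each new child to the front.
        for (int i = 0; i < count; i++)
        {
            var go = Instantiate(prefab, parent);
            go.name = "Item " + i;
            go.transform.SetAsFirstSibling();
        }
    }
}
```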
Children are automatically added as last siblings upon instantiation, so setting any as first siblings would be more expensive.
If you don't care about object precedence, why set the child order at all?
Internally, the hierarchy is actually stored on the C++ side of the engine, so micro-optimisations at this level are just a waste of time. The fact that you are making a P/Invoke call to the unmanaged side is probably more expensive than the difference between first and last, and it's quite likely that the structure on the other side is more similar to an ordinary array than to a list.
I'm a little confused by deletion in a B+ tree. I searched a lot on Google and found that there are two implementations for the case where the key you want to delete also appears in the index:
Delete the key in the index
Keep the key in the index
Algorithm from https://www.javatpoint.com/b-plus-tree-deletion uses the first way.
Algorithm from https://www.cs.princeton.edu/courses/archive/fall08/cos597A/Notes/BplusInsertDelete.pdf uses the second way.
So I really want to know which one is right.
But I'm more inclined to treat this as implementation-defined behaviour. At this point, could someone help me figure out the advantages and disadvantages of each, and how to choose between them?
Thanks in advance.
Both methods are correct.
The difference that you highlight is not so much in deleting/not-deleting internal keys, but in updating/not-updating them.
Obviously, when you delete a value (i.e. a key in a leaf node), the b-plus-tree property is not violated: all child values are still within the range dictated by the parent information. You can never break this range-rule by merely removing a value from a leaf. This rule is also still valid when you update the internal key(s) in the path to that leaf (according to method 1), which is only necessary when the deleted value was the left-most one in its leaf.
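To make the difference concrete, here is a deliberately tiny two-level sketch (hypothetical `Leaf`/`Parent` types, no rebalancing or merging) showing the two strategies when the deleted value is the left-most key of its leaf:

```csharp
using System;
using System.Collections.Generic;

class Leaf
{
    public List<int> Keys = new List<int>();
}

class Parent
{
    public List<int> SeparatorKeys = new List<int>(); // SeparatorKeys[i] is a lower bound for Children[i + 1]
    public List<Leaf> Children = new List<Leaf>();
}

static class BPlusDeleteSketch
{
    // Method 1: delete the value and refresh the internal (separator) key.
    public static void DeleteAndUpdateSeparator(Parent parent, int childIndex, int value)
    {
        Leaf leaf = parent.Children[childIndex];
        bool wasLeftmost = leaf.Keys.Count > 0 && leaf.Keys[0] == value;
        leaf.Keys.Remove(value);
        // Only needed when the left-most key was removed: the separator is
        // refreshed so it again equals the smallest key of this leaf.
        if (wasLeftmost && childIndex > 0 && leaf.Keys.Count > 0)
            parent.SeparatorKeys[childIndex - 1] = leaf.Keys[0];
    }

    // Method 2: delete the value and keep the (now stale) separator.
    // Searches still route correctly because the stale separator is
    // still <= every key remaining in this leaf.
    public static void DeleteAndKeepSeparator(Parent parent, int childIndex, int value)
    {
        parent.Children[childIndex].Keys.Remove(value);
    }
}
```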
Note that the two methods may produce quite different trees after a long sequence of the same operations (insert, delete).
But on average the second method will have slightly less work to do. This difference is not significant though.
I have algorithms that work with dynamically growing lists (contiguous memory, like a C++ vector, Java ArrayList or C# List). Until recently, these algorithms would insert new values into the middle of the lists. Of course, this was usually a very slow operation: every time an item was added, all the items after it needed to be shifted to a higher index. Do this a few times for each algorithm and things get really slow.
My realization was that I could add the new items to the end of the list and then rotate them into position later. That's one option!
Another option, when I know how many items I'm adding ahead of time, is to add that many items to the back, shift the existing items and then perform the algorithm in-place in the hole I've made for myself. The downside is that I have to add some default values to the end of the list and then just overwrite them.
I did a quick analysis of these options and concluded that the second option is more efficient. My reasoning was that the rotation with the first option would result in in-place swaps (requiring a temporary). My only concern with the second option is that I am creating a bunch of default values that just get thrown away. Most of the time, these default values will be null or a mem-filled value type.
However, I'd like someone else familiar with algorithms to tell me which approach would be faster. Or, perhaps there's an even more efficient solution I haven't considered.
Arrays aren't efficient for lots of insertions or deletions into anywhere other than the end of the array. Consider whether using a different data structure (such as one suggested in one of the other answers) may be more efficient. Without knowing the problem you're trying to solve, it's near-impossible to suggest a data structure (there's no one solution for all problems). That being said...
The second option is definitely the better option of the two. A somewhat better option (avoiding the default-value issue): simply copy 789 to the end and overwrite the middle 789 with 456. So the only intermediate step would be 0123789789.
Your default-value concern is, however, (generally) not a big issue:
In Java, for one, you cannot (to my knowledge) even allocate memory for an array that's not 0- or null-filled. C++ STL containers also enforce this, I believe (but not C++ itself).
The size of a pointer compared to any moderate-sized class is minimal, so assigning it a default value also takes minimal time. In Java and C# everything is a reference anyway, and in C++ you can store pointers (something like boost::shared_ptr or a pointer-vector is preferred over raw pointers). This doesn't apply to primitives, but those are small to start with, so they generally aren't a big issue either.
I'd also suggest forcing a reallocation to a specified size before you start inserting to the end of the array (Java's ArrayList::ensureCapacity or C++'s vector::reserve). In case you didn't know - varying-length-array implementations tend to have an internal array that's bigger than what size() returns or what's accessible (in order to prevent constant reallocation of memory as you insert or delete values).
Also note that there are more efficient methods to copy parts of an array than doing it manually with for loops (e.g. Java's System.arraycopy).
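To illustrate, here is a small C# sketch of that copy-the-tail-then-overwrite idea from the 0123789 example; `InsertBlock` is a made-up helper, and for simplicity it assumes at least `block.Count` elements sit to the right of `index` (List<T>.InsertRange or System.arraycopy handle the general case):

```csharp
using System;
using System.Collections.Generic;

static class BulkInsertSketch
{
    public static void InsertBlock(List<int> list, int index, IReadOnlyList<int> block)
    {
        int oldCount = list.Count;
        int k = block.Count;
        // 1) Grow the list by k, reusing the last k existing elements instead of defaults.
        for (int i = oldCount - k; i < oldCount; i++)
            list.Add(list[i]);
        // 2) Shift the rest of the tail (if any) k slots to the right, back to front.
        for (int i = oldCount - k - 1; i >= index; i--)
            list[i + k] = list[i];
        // 3) Fill the hole with the new block.
        for (int i = 0; i < k; i++)
            list[index + i] = block[i];
    }

    static void Main()
    {
        var list = new List<int> { 0, 1, 2, 3, 7, 8, 9 };
        InsertBlock(list, 4, new[] { 4, 5, 6 });
        Console.WriteLine(string.Concat(list)); // 0123456789
    }
}
```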
You might want to consider changing your representation of the list from using a dynamic array to using some other structure. Here are two options that allow you to implement these operations efficiently:
An order statistic tree is a modified type of binary tree that supports insertions and selections anywhere in O(log n) time, as well as lookups in O(log n) time. This will increase your memory usage quite a bit because of the overhead for the pointers and extra bookkeeping, but should dramatically speed up insertions. However, it will slow down lookups a bit.
If you always know the insertion point in advance, you could consider switching to a linked list instead of an array, and just keep a pointer to the linked list cell where insertions will occur. However, this slows down random access to O(n), which could possibly be an issue in your setup.
Alternatively, if you always know where insertions will happen, you could consider representing your array as two stacks - one stack holding the contents of the array to the left of the insert point and one holding the (reverse) of the elements to the right of the insertion point. This makes insertions fast, and if you have the right type of stack implementation could keep random access fast.
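For what it's worth, here is a minimal C# sketch of that two-stack idea (the `GapList` name is made up; both stacks are array-backed, so the storage stays contiguous):

```csharp
using System.Collections.Generic;

class GapList<T>
{
    private readonly Stack<T> _left = new Stack<T>();   // elements before the gap; top = just left of the gap
    private readonly Stack<T> _right = new Stack<T>();  // elements after the gap; top = just right of the gap

    public void Insert(T item) => _left.Push(item);      // O(1) insert at the current position

    // Move the insertion point one position left or right in O(1).
    public void MoveLeft()  { if (_left.Count > 0) _right.Push(_left.Pop()); }
    public void MoveRight() { if (_right.Count > 0) _left.Push(_right.Pop()); }

    public IEnumerable<T> Items()
    {
        var left = _left.ToArray();                      // ToArray yields top-to-bottom
        for (int i = left.Length - 1; i >= 0; i--)        // so walk it backwards for original order
            yield return left[i];
        foreach (var x in _right)                        // right stack already enumerates in order
            yield return x;
    }
}
```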
Hope this helps!
HashMaps and linked lists were designed for the problem you are having. Given an indexed data structure with numbered items, inserting an item in the middle requires renumbering every item after it in the list.
You need a data structure which is optimized to make inserts a constant O(1) complexity. HashMaps were designed to make insert and delete operations lightning quick regardless of dataset size.
I can't pretend to do the HashMap subject justice by describing it. Here is a good intro: http://en.wikipedia.org/wiki/Hash_table
If I got it right, / means that the node to its right must be an immediate child of the node to its left; e.g. /ul/li returns li items which are immediate children of a ul item that is the document root, while //ul//li returns li items which are descendants of any ul item anywhere in the document.
Now: Is /ul/li faster than //ul//li, even if the result set is the same?
Generally speaking, yes, of course!
/ul/li visits at most (number_of_ul * number_of_li) nodes, with a maximum depth of 2. //ul//li could potentially visit every node in the document.
However, you may be using a document system with some kind of indexing, or you could have a document where the same number of nodes ends up getting visited, which could make //ul//li less slow, the same speed as, or possibly even faster than /ul/li. I guess you could also have a particularly naive XPath implementation that visits every node anyway.
You should profile your specific scenario rather than ask which is faster. "It depends" is the answer.
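If you do want to profile, a minimal .NET sketch along these lines would work (the file name is a placeholder for your own document):

```csharp
using System;
using System.Diagnostics;
using System.Xml;

class XPathProfile
{
    static void Main()
    {
        var doc = new XmlDocument();
        doc.Load("your-document.xml");   // placeholder path

        Time(doc, "/ul/li");
        Time(doc, "//ul//li");
    }

    static void Time(XmlDocument doc, string xpath)
    {
        var sw = Stopwatch.StartNew();
        int count = 0;
        for (int i = 0; i < 1000; i++)
            count = doc.SelectNodes(xpath).Count;
        sw.Stop();
        Console.WriteLine($"{xpath}: {count} nodes, {sw.ElapsedMilliseconds} ms for 1000 runs");
    }
}
```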
There are probably at least 50 implementations of XPath, and their performance varies dramatically, by several orders of magnitude. It's therefore meaningless to ask questions about XPath performance without reference to a specific implementation.
Generally the advice given is to use as specific a path as you can: /a/b/c/d is better than //d. However, this advice is not always correct. Some products will execute //d faster because they have gone to the trouble of creating an index: this is particularly true if you are running against an XML database. Also, performance isn't everything. When you are dealing with complex vocabularies like FpML, the specific path to an element can easily be ten steps with names averaging 20 characters, so that's a 200-character XPath, and it can be very hard to spot when you get it wrong. Programmer performance matters more than machine performance.
I heard an interview question:
"Print a singly-linked list backwards,
in constant space and linear time."
My solution was to reverse the linked list in place and then print it like that. Is there another solution that is nondestructive?
You've already figured out most of the answer: reverse the linked list in place, and traverse the list back to the beginning to print it. To keep it from being (permanently) destructive, reverse the linked list in place again as you're traversing it back to the beginning and printing it.
Note, however, that this only works if you either only have a single thread of execution, or make the whole traversal a critical section so only one thread does it at a time (i.e., a second thread can never play with the list in the middle of the traversal).
If you reverse it again after printing it will no longer be destructive, since the original order is restored.
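A minimal sketch of that reverse / print / reverse-back approach (the `Node` type is made up; as noted above, it is not safe if another thread touches the list mid-traversal):

```csharp
using System;

class Node
{
    public int Value;
    public Node Next;
}

static class BackwardsPrinter
{
    static Node Reverse(Node head)
    {
        Node prev = null;
        while (head != null)
        {
            Node next = head.Next;
            head.Next = prev;
            prev = head;
            head = next;
        }
        return prev;
    }

    public static void PrintBackwards(Node head)
    {
        Node reversed = Reverse(head);        // O(n) time, O(1) extra space
        for (Node n = reversed; n != null; n = n.Next)
            Console.WriteLine(n.Value);
        Reverse(reversed);                    // restore the original order
    }
}
```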
You could make a recursive call down the linked-list chain, passing a reference to whatever you wish to write to. Each node would call its child node's print function, passing the reference along, before printing itself.
That way the call passes down through each node in the list until the last one, which has no child and writes immediately; then each node back up the chain writes after the one below it, all the way back to the front.
Edit
This actually doesn't fit the spec because of the linear space used on the call stack. If you had something outside the list to walk the calls and a way to write to the front of a string, the basic logic could still work, though.
Okay, this could be an interview question, but it is actually an exercise from Weiss's algorithms book. The question clearly states that we cannot use recursion (something the interviewer will hide and reveal later on), as recursion will not use constant space; recursion will most likely become a major point of discussion going forward. The solution is: reverse, print, and reverse back.
Here's an unconventional approach: Change your console to right-to-left reading order and then print the list in normal order. They will appear in backward order. Having to visit the actual data in reverse order doesn't sound like a constraint to the problem.
Is there any performance difference between myCollection.Where(...).FirstOrDefault() and myCollection.FirstOrDefault(...)?
Filling in the dots with the predicate you are using.
Assuming we're talking LinqToObjects (obviously LinqToSql, LinqToWhatever have their own rules), the first will be ever so slightly slower since a new iterator must be created, but it's incredibly unlikely that you would ever notice the difference. In terms of number of comparisons and number of items examined, the time the two take to run will be virtually identical.
In case you're worried, what will not happen is that the .Where operator filters the list down to n items and the .FirstOrDefault then takes the first item from that filtered list. Both sequences will short-circuit correctly.
If we assume that in both cases you're using the Extension methods provided by the Enumerable static class, then you'll be hard pressed to measure any difference between the two.
The longer form ...
myCollection.Where(...).FirstOrDefault()
... will (technically) produce more memory activity (creating an intermediary iterator to handle the Where() clause) and involve a few more cycles of processing.
The thing is that these iterators are lazy - the Where() clause won't go merrily through the entire list evaluating the predicate, it will only check as many items as necessary to find one to pass through.
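To make the laziness concrete, here is a tiny LINQ-to-Objects sketch (the data is made up); both forms stop after examining the first twelve elements:

```csharp
using System;
using System.Linq;

class ShortCircuitDemo
{
    static void Main()
    {
        var numbers = Enumerable.Range(0, 1_000_000);

        // Creates one intermediate iterator for the Where() clause...
        int a = numbers.Where(n => n > 10).FirstOrDefault();

        // ...while this passes the predicate straight to FirstOrDefault.
        int b = numbers.FirstOrDefault(n => n > 10);

        Console.WriteLine(a == b && a == 11); // True: neither form enumerates past the first match
    }
}
```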