I've searched for this and was surprised not to find an answer.
My question refers specifically to Fortran (95), but I assume it is no different for other languages:
If I want to access an array in reverse order (last value/index, going backwards to the first value/index), would that be slower (in any meaningful way) than going forward (first value/index to last value/index)?
I could, of course, just test it out, but I'm interested in knowing the reason and not just the yes/no part.
Just to point out, the question refers to a 1D array, so this has nothing to do with indexing order (i,j,k vs k,j,i) or anything of that nature.
Thanks!
I was creating two arrays to hold x and y values for points I want to draw. As I was doing this, I remembered the PVectors I had recently learned about, and created a single array of PVectors instead of the two arrays I had originally. Which method is more efficient and will result in less browser lag given a large set of x and y values?
Why don't you try both and find out? Create an array of 1,000 PVectors, and compare that to creating arrays with 1,000 float values. Increase that to 10,000, or 100,000, or 1,000,000.
Display the frameRate variable on the screen in each case. (Don't use println(), it's way too slow; use the text() function instead!) When do you notice it start to drop?
This might seem like I'm answering your question with a question, but the best way to answer questions about performance is to just try it yourself, do some benchmarking and profiling, and decide what's best in your specific context.
But I would doubt that you're going to notice a huge difference in either case. The speed is probably identical. The only difference you might notice is that using PVector will probably use a little bit more memory. But again, probably not enough to really care about.
Instead of worrying about this kind of optimization (which is a premature optimization, or a micro-optimization), you should just use whichever approach makes the most sense to you. Code readability and maintainability are more important than little things like this, so just use whichever seems more logical in your program.
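If you do want to see the shape of such a comparison outside of Processing, here is a rough sketch in Ruby using the standard Benchmark module; the Point struct is just a stand-in for PVector, and in a real Processing sketch you would watch frameRate as described above instead.

require 'benchmark'

# Stand-in for PVector: one object per point.
Point = Struct.new(:x, :y)

N = 1_000_000

Benchmark.bm(18) do |bm|
  bm.report('array of points:') do
    points = Array.new(N) { |i| Point.new(i * 0.5, i * 0.25) }
    sum = 0.0
    points.each { |p| sum += p.x + p.y }
  end

  bm.report('two float arrays:') do
    xs = Array.new(N) { |i| i * 0.5 }
    ys = Array.new(N) { |i| i * 0.25 }
    sum = 0.0
    xs.each_index { |i| sum += xs[i] + ys[i] }
  end
end

Either way, the point is to measure in your own environment rather than guess.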
I have a question about the fundamentals of data structures.
I understand that an array's access time is faster than a linked list's: O(1) for an array vs O(N) for a linked list.
But a linked list beats an array at removing an element, since there is no shifting needed: O(N) for an array vs O(1) for a linked list.
So my understanding is that if the majority of operations on the data are deletions, then using a linked list is preferable.
But if the use case is:
delete elements but not too frequently
access ALL elements
Is there a clear winner? In the general case I understand that the downside of using the list is that each node I access could be on a separate page, while an array has better locality.
But is this a theoretical or an actual concern that I should have?
And is the mixed approach, i.e. building a linked list from an array (using extra fields), a good idea?
Also, does my question depend on the language? I assume that shifting elements in an array has the same cost in all languages (at least asymptotically).
Singly-linked lists are very useful and can be better performance-wise relative to arrays if you are doing a lot of insertions/deletions, as opposed to pure referencing.
I haven't seen a good use for doubly-linked lists for decades.
I suppose there are some.
In terms of performance, never make decisions without understanding relative performance of your particular situation.
It's fairly common to see people asking about things that, comparatively speaking, are like getting a haircut to lose weight.
Before writing an app, I first ask if it should be compute-bound or IO-bound.
If IO-bound I try to make sure it actually is, by avoiding inefficiencies in IO, and keeping the processing straightforward.
If it should be compute-bound then I look at what its inner loop is likely to be, and try to make that swift.
Regardless, no matter how much I try, there will be (sometimes big) opportunities to make it go faster, and to find them I use this technique.
Whatever you do, don't just try to think it out or go back to your class notes.
Your problem is different from anyone else's, and so is the solution.
The problem with a list is not just the fragmentation, but mostly the data dependency. If you access every Nth element in an array you don't have locality, but the accesses may still go to memory in parallel since you know the address. In a list it depends on the data being retrieved, and therefore traversing a list effectively serializes your memory accesses, causing it to be much slower in practice. This of course is orthogonal to asymptotic complexities, and would harm you regardless of the size.
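If you want to put rough numbers on that for your own setup, a traversal benchmark is quick to write. Below is a sketch in Ruby using the standard Benchmark module, with a hand-rolled Node struct since Ruby has no built-in linked list; in a compiled language the gap caused by pointer chasing is usually even larger.

require 'benchmark'

# Hand-rolled singly linked list node (Ruby has no built-in linked list).
Node = Struct.new(:value, :next_node)

N = 1_000_000
array = (0...N).to_a

# Build a list holding the same values (front insertion reverses the order,
# which doesn't matter for a simple sum).
head = nil
array.each { |v| head = Node.new(v, head) }

Benchmark.bm(16) do |bm|
  bm.report('array traverse:') do
    sum = 0
    array.each { |v| sum += v }
  end

  bm.report('list traverse:') do
    sum = 0
    node = head
    while node
      sum += node.value
      node = node.next_node
    end
  end
end

Whatever the numbers come out to on your machine, that measured difference is the one to base the decision on, not the asymptotic one.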
I've got a list of names and I need to split them up into first and last names. Since some names have 2-3 spaces in them, a simple split for a space won't do.
What sort of heuristics do people use to perform the split?
Note that this isn't a duplicate of questions that effectively ask how to split at a space; I'm looking for heuristics and algorithms, not actual code help.
Update: I'm limiting the problem set to English-style names. This is all I need to solve and likely all that anyone approaching this (English language) question will need as well.
I've read a very interesting and comprehensive post on this subject:
http://www.w3.org/International/questions/qa-personal-names
It even suggests asking yourself whether you really need separate fields for first and last names. It seems to depend on the target region(s) of your application.
Two approaches can help, though neither fully solves this problem.
Programmatically separate the easy ones; the ones that are not easy get pushed into a different list, "remaining to be split". Manually sort through that list. As you manually sort, some heuristics might emerge which could be coded, further reducing the size of the remaining list. If this is a one-time thing, and the list is not super massive, this will get the job done.
A closely related problem is when a name is split, but you don't know which is the first and which is last. Some systems work around this problem by doing fuzzy lookups such that if on the first attempt no match is found, flip the first and last name and try again. You didn't say why you need to split the names. If it is to lookup against reference data, consider some kind of similar fuzzy lookup heuristics which allow for trying different splits instead of trying to get the split correct up-front.
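A tiny sketch of that flip-and-retry idea (the names and the reference table below are made up, just to show the shape of the lookup):

# Hypothetical reference data keyed by [first, last].
REFERENCE = {
  ['Mary', 'Van Der Berg'] => 42,
  ['Lee', 'Chang']         => 17,
}

# Try the split as given; if nothing matches, flip first and last and retry.
def lookup(first, last)
  REFERENCE[[first, last]] || REFERENCE[[last, first]]
end

puts lookup('Mary', 'Van Der Berg')  # => 42 (direct hit)
puts lookup('Chang', 'Lee')          # => 17 (found after flipping)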
Not really an answer, but in this case there really is no perfect answer.
Different countries and regions have different formats for names. For example, in Asia the family name usually comes first, followed by the given names. In the West, you've got the first name and last name convention, but it gets complicated when people double-barrel surnames or include middle names. And then in some regions people are only given one name.
Personally, I don't think there's one single algorithm that can give you 100% accurate results, I'm afraid.
The following assumes English-style surnames. If that's not the case, please update your question.
It's usually safe to assume that the last space character signals the start of a person's surname. But since there are exceptions, one strategy would be to compile a large database of known multi-word surnames from some other source. You could then test for these surnames, and treat them as exceptions.
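A minimal sketch of that strategy in Ruby (the exception list below is tiny and invented; in practice you would load a much larger set of known multi-word surnames from your reference source):

# Invented sample of known multi-word surnames; a real list would be much larger.
MULTI_WORD_SURNAMES = [
  'van der berg',
  'de la cruz',
  'st. john'
].freeze

def split_name(full_name)
  # First check whether the name ends with a known multi-word surname.
  MULTI_WORD_SURNAMES.each do |surname|
    if full_name.downcase.end_with?(surname)
      cut = full_name.length - surname.length
      return [full_name[0...cut].strip, full_name[cut..-1]]
    end
  end

  # Otherwise fall back to the last-space heuristic.
  idx = full_name.rindex(' ')
  return [full_name, ''] if idx.nil?
  [full_name[0...idx], full_name[(idx + 1)..-1]]
end

p split_name('Anna Maria van der Berg')  # => ["Anna Maria", "van der Berg"]
p split_name('John Smith')               # => ["John", "Smith"]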
In short, is the cost (in time and CPU) higher to call kind_of? twice, or to create a new array with one value and then iterate through it? The 'backstory' below simply details why I need to know this, but is not a necessary read to answer the question.
Backstory:
I have a bunch of location data. Latitude/longitude pairs and the name of the place they represent. I need to sort these lat/lon values by distance from another lat/lon pair provided by a user. I have to calculate the distances on the fly, and they aren't known before.
I was thinking it would be easy to do this by adding the distance => placename map to a hash, then get a keyset and sort that, then read out the values in that order. However, there is the potential for two distances being equal, making two keys equal to each other.
I have come up with two solutions to this. Either I do the following on insert:
if hash.has_key?(distance)
  # Collision: promote the stored value to an array of placenames if needed.
  if hash[distance].kind_of?(Array)
    hash[distance] << placename
  else
    hash[distance] = [hash[distance], placename]
  end
else
  # First placename at this distance: store it bare.
  hash[distance] = placename
end
then when reading the values I check
hash[distance].kind_of?(Array) ? (iterate the array and grab all the placenames) : (grab the single placename)
each time. Or I could make each value an array from the start even if it has only one placename.
You've probably spent more time thinking about the issue than you will ever save in CPU time. Developer brain time (both yours and others who will maintain the code when you're gone) is often much more precious than CPU cycles. Focus on code clarity.
If you get indications that your code is a bottleneck, it may be a good idea to benchmark it, but don't forget to benchmark both before and after any changes you make, to make sure that you are actually improving the code. It is surprising how often "optimizations" don't improve the code at all, and just make it harder to read.
To be honest, this sounds like a very negligible performance issue, so I'd say just go with whatever feels better to you.
If you really believe that this has a real world performance impact (and frankly, there are other areas of Ruby you should worry more about speed-wise), reduce your problem to the simplest form that still resembles your problem and use the Benchmark module:
http://www.ruby-doc.org/stdlib/libdoc/benchmark/rdoc/index.html
I would bet that you'll achieve both higher performance and better legibility using the built-in Enumerable#group_by method.
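For example, here is a minimal sketch of that approach, with made-up placename/distance pairs standing in for your computed distances:

# Made-up [placename, distance] pairs; in your code the distance would come
# from the lat/lon calculation.
places = [
  ['Cafe',    1.2],
  ['Library', 0.8],
  ['Museum',  1.2],  # deliberately ties with 'Cafe'
  ['Park',    0.3]
]

# Group placenames by distance; equal distances simply share a key.
by_distance = places.group_by { |_name, dist| dist }

# Sort by distance and read the placenames out in order.
by_distance.sort.each do |dist, entries|
  names = entries.map { |name, _dist| name }
  puts "#{dist}: #{names.join(', ')}"
end
# => 0.3: Park
#    0.8: Library
#    1.2: Cafe, Museum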
As others have said, it's likely that this isn't a bottleneck, that gains will be negligible in any case and that you should focus on other things!
Phil Bagwell, in his 2002 paper on the VList data structure, indicates that you can use a VList to implement a persistent hash table. However, his explanation of how that worked didn't include much detail, and I don't understand it. Can anybody give me a more detailed explanation, or even examples?
Further, it appears to me from what I can see that this data structure, while it may have the same big-O complexity as a Hashtable, will be slower because it does additional lookups. Does anybody care to do a detailed analysis of just how much slower, preferably including cache behaviour? How does the performance relationship between the two change when there are no collisions versus many?
I had a look at this paper, and it appears very preliminary. The fact that no later version has been published, and that the original appeared in IFL (which is a work-in-progress sort of meeting), suggests that you may be wasting your time.
Hrmm, there seem to be a number of issues with the data structures proposed by the paper in question.
Off the cuff, the naive vlists mentioned first seem to need unique references in order to get anything near the time guarantees proposed. You lose the ability, for the most part, to share tails. You can share the tiny nodes towards the back of the list, but you wind up having to duplicate the largest vlist node the moment you cons something onto the cdr of a vlist that is still active. That cost is proportional to the cost of copying the whole list.
With the 2D modifications mentioned later it becomes constant again, but it's a pretty large constant, since you wind up at least copying the head of a list of pages (or worse, a vlist) and the first page in your list.
The functional hash list stuff in there didn't seem to make much sense to me to be honest. It was just a brief blurb that seemed to be bolted onto an otherwise unrelated paper, without enough detail to really make out how practical it is.