I hope this doesn't come across as too open-ended a question.
I'm using RPOPLPUSH to implement a reliable queue, and I'm trying to evaluate the trade-offs between putting the (possibly big) string value (e.g. JSON) directly in the list, versus putting only a "key" in the list and storing/retrieving the value with SET/GET (i.e. two extra calls). In the second case LREM is still O(N), but since the list elements are smaller it should perform better.
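To make the comparison concrete, here is roughly what the two variants look like. This is just a sketch using the Jedis client; the queue and key names are made up.

import redis.clients.jedis.Jedis;

public class QueueVariants {
    public static void main(String[] args) throws Exception {
        try (Jedis jedis = new Jedis("localhost", 6379)) {
            String payload = "{\"some\":\"large json payload\"}";

            // Variant 1: the (big) value itself lives in the list.
            jedis.lpush("queue", payload);
            String job = jedis.rpoplpush("queue", "processing");
            // ... process job ...
            jedis.lrem("processing", 1, job);   // LREM has to compare against large values

            // Variant 2: the list only holds a small key; the value is stored separately.
            String id = "job:1234";             // made-up id scheme
            jedis.set(id, payload);             // extra call #1
            jedis.lpush("queue", id);
            String key = jedis.rpoplpush("queue", "processing");
            String value = jedis.get(key);      // extra call #2
            // ... process value ...
            jedis.lrem("processing", 1, key);   // LREM only compares small keys
            jedis.del(key);
        }
    }
}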
Is there something I have not considered?
You should test it yourself.
That said, memory allocation/deallocation is faster than extra network round trips, so keeping the large string in the list will probably perform better.
Related
I know this is a stupid question, but I feel like someone might want to know (or inform a <redacted> co-worker about) this. I am not attaching a specific programming language to this since I think it could apply to all of them. Correct me if I am wrong about that.
Main question: Is it faster and/or better to look for entries in a constant String or in a List<String>?
Details: Let's say that I want to see if a given extension is in a list of supported extensions. Which of the following is better (in terms of programming style) and/or faster:
static final String SUPPORTED = ".exe.bin.sh.png.bmp" /* etc. */;

public static boolean isSupported(String ext) {
    ext = normalize(ext); // put the extension in some expected state, e.g. lowercase
    // String.contains
    return SUPPORTED.contains(ext);
}
static final List<String> SUPPORTED =
        List.of(".exe", ".bin", ".sh", ".png", ".bmp" /* etc. */);

public static boolean isSupported(String ext) {
    ext = normalize(ext); // put the extension in some expected state, e.g. lowercase
    // List.contains
    return SUPPORTED.contains(ext);
}
First, it is important to note that the two solutions are not functionally equivalent. The substring search will return true for inputs like "x" or "exe.bin", while List<String>.contains() will not. In that sense the List<String> version is likely closer to the semantics you want. Any performance comparison should keep that in mind.
Now, on to performance.
Theoretical
From an asymptotic, algorithm-complexity point of view, the List<String>.contains() approach will be faster than the alternative as the length of the strings grows. Conceptually, the String.contains version needs to look for a match at each position in the SUPPORTED String, while the List.contains() version only needs to match starting at the beginning of each candidate string; as soon as it finds a mismatch in the current candidate, it skips to the next. This is related to the note above that the options aren't functionally equivalent: the String.contains option can in theory match a much wider universe of inputs, so it has to do more work before rejecting candidates.
Complexity-wise, this difference could be something like O(N) for List.contains() versus O(N^2) for String.contains(), if you take N to be the number of candidates, assume each candidate has a bounded length, and assume String.contains() uses the usual brute-force "look for a match starting at each position" algorithm. As it turns out, the Java String.contains() implementation isn't doing exactly the basic O(N^2) search, but it isn't doing Boyer-Moore either. In general you can expect that once the strings get long enough, the List<String> approach will be faster.
Close(r) to the Metal
From a closer-to-the-metal perspective, both approaches have their advantages. The String.contains() solution avoids the overhead of iterating over the List elements: the entire call is spent in the intrinsified String.contains implementation, and all the chars making up the SUPPORTED String are contiguous, which is memory-friendly. The List.contains() approach spends a lot of time on the double dereferencing needed to go from each List element to the contained String and then to the contained char[] array, and this is likely to dominate if the strings you are comparing against are very short.
On the other hand, the List.contains solution ultimately calls into String.equals, which is likely implemented in terms of Arrays.equals(char[], char[]), which is heavily optimized with SSE and AVX intrinsics on x86 platforms and likely to be blazing fast, even compared to the optimized version of String.contains(). So if the strings become long, again expect List.contains() to pull ahead.
All that said, there is a simple, canonical way to do this quickly: a HashSet<String> with all the candidate strings. That's just one String.hashCode() call (which is cached, and so often "free") and a single lookup in the hash table.
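A minimal sketch of that HashSet version, assuming the same normalization step as in the question:

import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class Extensions {
    // The canonical fast version: membership test against a HashSet.
    private static final Set<String> SUPPORTED = new HashSet<>(
            Arrays.asList(".exe", ".bin", ".sh", ".png", ".bmp" /* etc. */));

    public static boolean isSupported(String ext) {
        // one hash computation (often cached on the String) + one table lookup
        return SUPPORTED.contains(normalize(ext));
    }

    private static String normalize(String ext) {
        return ext.toLowerCase(); // same normalization step as in the question
    }
}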
Well, it can vary from implementation to implementation, but if we want to look at this problem in a general way, let's see.
If you want to look for a specific substring inside a string, say a file extension inside an immutable string containing the different extensions, you only need to traverse the string of extensions once.
On the other hand, with a list of immutable strings, you still need to traverse each one of the strings in that list, plus pay the overhead of iterating over the list itself.
As a conclusion, viewed in this generalized way, using a list to store the strings requires more processing.
But you can also judge both solutions by readability, maintainability, etc. For example, if you want to add or remove extensions, or apply more complex operations, the overhead of using a list of strings may be worth it.
I am working with parallel stream processing and have noticed that if I use a plain array as the stream source, processing is very fast. If I use an ArrayList, processing gets a bit slower, and with a LinkedList or some binary tree it gets slower still.
It sounds like the more splittable the stream source is, the faster the processing, which would mean arrays and ArrayList are the most efficient sources for parallel streams. Is that true? If so, should we always use an ArrayList or an array when we want to process a stream in parallel? And how should we use a LinkedList or a BlockingQueue with parallel streams?
Another factor is the statefulness of the chosen intermediate operations. If I perform stateless operations like filter() and map(), performance is high, but if I perform stateful operations like distinct(), sorted(), limit(), or skip(), it takes a lot of time, so again the parallel stream gets slower. Does that mean we should avoid stateful intermediate operations in parallel streams? If so, what is the workaround?
Well, as discussed in this question, there is hardly any reason to use LinkedList at all. The higher iteration costs apply to all operations, not just parallel streams.
Generally, the splitting support indeed has a big impact on the parallel performance: first, whether the source has genuine, hopefully cheap, splitting support rather than inheriting the buffering default behavior of AbstractSpliterator; second, how balanced the splits are.
In this regard, there is no reason why a binary tree should perform badly. A tree can be split into sub-trees easily and if the tree is balanced at the beginning, the splits will be balanced too. Of course, this requires that the actual Collection implementation implements the spliterator() method returning a suitable Spliterator implementation rather than inheriting the default method. E.g. TreeSet has a dedicated spliterator. Still, iterating the sub-trees might be more expensive than iterating an array, but that’s not a property of the parallel processing, as that would apply to sequential processing as well or any kind of iteration over the elements in general.
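To make the splitting difference concrete, here is a small sketch. The exact numbers depend on the JDK implementation, but the shape is typical: ArrayList splits its index range in half, while LinkedList's spliterator peels a fixed batch off the front, leaving very unbalanced halves.

import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;
import java.util.Spliterator;

public class SplitBalance {
    public static void main(String[] args) {
        List<Integer> arrayList = new ArrayList<>();
        for (int i = 0; i < 1_000_000; i++) {
            arrayList.add(i);
        }
        List<Integer> linkedList = new LinkedList<>(arrayList);

        // ArrayList: trySplit() hands off half of the index range.
        Spliterator<Integer> a = arrayList.spliterator();
        Spliterator<Integer> aPrefix = a.trySplit();
        System.out.println(aPrefix.estimateSize() + " / " + a.estimateSize()); // roughly 500000 / 500000

        // LinkedList: trySplit() copies a batch from the front into an array.
        Spliterator<Integer> l = linkedList.spliterator();
        Spliterator<Integer> lPrefix = l.trySplit();
        System.out.println(lPrefix.estimateSize() + " / " + l.estimateSize()); // e.g. 1024 / 998976
    }
}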
The question of how to use LinkedList and BlockingQueue in the case of parallel streams is moot. You choose the collection type depending on the application's needs, and if you really need one of these (hard to imagine in the case of LinkedList), then you use it and live with the fact that its parallel stream performance will be less than that of ArrayList, which apparently didn't fit your other needs. There is no general trick to make the parallel stream performance of badly splittable collections better; if there were, it would be part of the library.
There are some corner cases where the JRE doesn't provide the maximum performance, which will be addressed in Java 9, like String.chars(), Files.lines() or the default spliterator for 3rd-party RandomAccess Lists, but none of these apply to LinkedList, BlockingQueue or custom binary tree implementations.
In other words, if you have a particular use case with a particular collection, there might be something to improve, but there is no trick that could improve the parallel performance of all tasks with all collections.
It is correct that stateful intermediate operations like distinct(), sorted(), limit(), and skip() have higher costs for parallel streams, and their documentation even says so. So we could give the general advice to avoid them, especially for parallel streams, but that would be rather pointless, as you wouldn't use them if you didn't need them. And again, there is no general workaround, as there wouldn't be much sense in offering these operations if there were a generally better alternative.
Not a bad question, IMO.
Of course an array or an ArrayList is going to split much better than a LinkedList or some kind of tree. You can look at how their Spliterators are implemented to convince yourself. The poorly splittable sources usually start with some batch size (1024 elements) and increase from there; LinkedList does that, and Files.lines did too, if I remember correctly. So yes, using an array or an ArrayList will parallelize very well.
If you want better parallel support for a structure like LinkedList, you could write your own spliterator; I think StreamEx did that for Files.lines, starting with a smaller batch size. There is a related question about this, by the way.
The other thing is that when you use a stateful intermediate operation, the operations before it effectively have to process every element before anything can flow further down the pipeline. Let me provide an example:
IntStream.of(1, 3, 5, 2, 6)
         .filter(x -> {
             System.out.println("Filtering : " + x);
             return x > 2;
         })
         .sorted()
         .peek(x -> System.out.println("Peek : " + x))
         .boxed()
         .collect(Collectors.toList());
This will print:
Filtering : 1
Filtering : 3
Filtering : 5
Filtering : 2
Filtering : 6
Peek : 3
Peek : 5
Peek : 6
Because sorted() is used and filter() comes before it, filter() has to take all the elements and process them before sorted() can be applied to the ones that survive.
On the other hand, if you dropped sorted:
IntStream.of(1, 3, 5, 2, 6)
         .filter(x -> {
             System.out.println("Filtering : " + x);
             return x > 2;
         })
         // .sorted()
         .peek(x -> System.out.println("Peek : " + x))
         .boxed()
         .collect(Collectors.toList());
The output is going to be:
Filtering : 1
Filtering : 3
Peek : 3
Filtering : 5
Peek : 5
Filtering : 2
Filtering : 6
Peek : 6
Generally I do agree; I try to avoid stateful intermediate operations if I can. Maybe you don't really need sorted(), maybe you can collect into a TreeSet instead, etc. But I don't overthink it: if I need one, I just use it, and maybe measure to see if it's really a bottleneck.
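As an illustration of the "collect into a TreeSet" idea (just a sketch; note that a TreeSet also drops duplicates, so it only replaces sorted() when that is acceptable):

import java.util.Set;
import java.util.TreeSet;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class TreeSetInsteadOfSorted {
    public static void main(String[] args) {
        // Collect straight into a TreeSet instead of using the stateful sorted() step.
        Set<Integer> result = IntStream.of(1, 3, 5, 2, 6)
                                       .filter(x -> x > 2)
                                       .boxed()
                                       .collect(Collectors.toCollection(TreeSet::new));
        System.out.println(result); // [3, 5, 6]
    }
}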
Unless you are really hitting performance problems around this, I would not take it that much into account, especially since you need lots of elements to actually see a speed benefit from going parallel.
Here is a related question that shows that you really really need lots of elements to see a performance gain.
This question doesn't address any programming language in particular but of course I'm happy to hear some examples.
Imagine a big number of files, let's say 5000, whose names are all kinds of letters and numbers. Then there is a method that receives a user input that acts as an alias in order to display one of those files. Without having the files sorted in a folder, the method(s) need to return the file name that is associated with the alias the user provided.
So let's say the user input "gd322" stands for the file named "k4e23"; the method would look like:
if (input.equals("gd322")) {
    return "k4e23";
}
Now, imagine having 4 values in that method:
switch (input) {
    case "gd322": return "fw332";
    case "g344d": return "5g4gh";
    case "s3red": return "536fg";
    case "h563d": return "h425d";
} // switch on a String; originally pseudo code, no break needed since every case returns
Keeping in mind we have 5000 entries, there are probably more than just 2 entries starting with 'g'. Now, if the user input starts with 's', instead of wasting CPU cycles checking all the a's, b's, c's, ..., we could make another switch that dispatches to 'next-level' methods like this:
switch (input.charAt(0)) { // dispatch on the first character
    case 'a': return switchA(input);
    case 'b': return switchB(input);
    // [...]
    case 'g': return switchG(input);
    case 's': return switchS(input);
}
So the CPU doesn't have to check all of them; it calls a method like this instead:
static String switchG(String input) {
    switch (input) {
        case "gd322": return "fw332";
        case "g344d": return "5g4gh";
        // [...]
        default: return null; // alias not found
    }
}
Is there any field of computer science dealing with this? I don't know what to call it, and therefore don't know how to search for it, but I think my thoughts make sense on a large scale. Please move the thread if it doesn't belong here, but I really want to see your thoughts on this.
EDIT: don't quote me on that "5000"; I am not actually in the situation described above and wanted to discuss this purely theoretically. It could also be 3 entries or 300,000, maybe even fewer or more.
If you have 5000 options, you're probably better off hashing them than using hard-coded if/switch statements. In C++ you could also use a std::map to pair a function pointer, or other option-handling information, with each possible option.
Interesting, but I don't think you can give a generic answer. It all depends on how the code is executed: many compilers apply all kinds of optimizations, to if and switch statements but also to the way strings are compared.
That said, if you have actual (disk) files with those lists, then reading the file will probably take much longer than processing it, since disk I/O is very slow compared to memory access and CPU processing.
And if you have a list like that, you may want to build a hash table, or simply a sorted list/array in which you can perform a binary search. Sorting it also takes time, but if you have to do many lookups in the same list, it may be well worth the time.
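For illustration, a sketch of the sorted-array-plus-binary-search variant; the aliases are the ones from the question, and the parallel arrays are just for brevity:

import java.util.Arrays;

public class SortedLookup {
    // aliases and files kept in parallel arrays; ALIASES must be sorted for binarySearch
    private static final String[] ALIASES = {"g344d", "gd322", "h563d", "s3red"};
    private static final String[] FILES   = {"5g4gh", "fw332", "h425d", "536fg"};

    public static String lookup(String input) {
        int i = Arrays.binarySearch(ALIASES, input);
        return i >= 0 ? FILES[i] : null; // null if the alias is unknown
    }

    public static void main(String[] args) {
        System.out.println(lookup("gd322")); // fw332
    }
}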
Is there any field of computer science dealing with this?
Yes, the science of efficient data structures. Well, isn't that what CS is all about? :-)
The algorithm you described resembles a trie. It wouldn't be statically encoded in the source code with switch statements; it would use dynamic lookups in a structure loaded from somewhere instead, but the idea is the same.
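A minimal trie sketch along those lines, purely illustrative, using the alias/file pairs from the question:

import java.util.HashMap;
import java.util.Map;

// Maps alias strings to file names by walking one character at a time,
// which is essentially the "switch per character" idea, built as a data structure.
public class AliasTrie {
    private static final class Node {
        final Map<Character, Node> children = new HashMap<>();
        String value; // file name if an alias ends here
    }

    private final Node root = new Node();

    public void put(String alias, String fileName) {
        Node node = root;
        for (char c : alias.toCharArray()) {
            node = node.children.computeIfAbsent(c, k -> new Node());
        }
        node.value = fileName;
    }

    public String get(String alias) {
        Node node = root;
        for (char c : alias.toCharArray()) {
            node = node.children.get(c);
            if (node == null) return null; // no alias with this prefix
        }
        return node.value;
    }

    public static void main(String[] args) {
        AliasTrie trie = new AliasTrie();
        trie.put("gd322", "fw332");
        trie.put("g344d", "5g4gh");
        System.out.println(trie.get("gd322")); // fw332
        System.out.println(trie.get("s3red")); // null (not inserted)
    }
}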
Yes, the problem is well known and has been solved for decades: hash functions.
Basically you have a set of values (here strings like "gd322", "g344d") and you want to know if some other value v is among them.
The idea is to put the strings in a big array, at an index calculated from their value by some function. Given a value v, you compute an index the same way and check whether v is there or not. Much faster than scanning the whole array.
Of course there is a problem when different values fall in the same place: collisions. Some magic is needed then: perfect hash functions, whose coefficients are tweaked so that the values from the initial set don't cause any collisions.
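In Java, this boils down to a plain HashMap lookup; a small sketch using the question's made-up aliases:

import java.util.HashMap;
import java.util.Map;

// One hash computation and one table lookup instead of a chain of switch statements.
public class AliasLookup {
    private static final Map<String, String> FILES = new HashMap<>();
    static {
        FILES.put("gd322", "fw332");
        FILES.put("g344d", "5g4gh");
        FILES.put("s3red", "536fg");
        FILES.put("h563d", "h425d");
        // ... up to 5000 entries, typically loaded from data rather than hard-coded
    }

    public static String lookup(String input) {
        return FILES.get(input); // null if the alias is unknown
    }

    public static void main(String[] args) {
        System.out.println(lookup("s3red")); // 536fg
    }
}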
I've read the question and answer How to code a URL shortener? and all the math makes perfect sense. My question is, since you have to go back to the database/datastore anyway for the lookup, why not just generate a random short string in your alphabet and store it with the full URL in your datastore, rather than converting it back to a numerical ID?
It seems to me that this saves doing any math on the server, reduces complexity, and eliminates the 'walkability' of the short-URL space (for my use case this is critical; URLs must not be guessable). If you use a NoSQL store designed for key->value lookups, there doesn't seem to be any performance issue in looking up the full URL by a string key as opposed to a numerical ID.
I'd like to know if I'm missing something.
The random short string approach violates the bijectivity of the shortening function: the same URL no longer always maps to the same short string.
Given two URLs a and b and your shortening function f, it should be guaranteed that:
if a = b then f(a) = f(b). However, since f generates a random value, this guarantee is lost.
If, however, you just want to shorten any particular URL and do not mind that subsequent shortenings of the same URL will produce different values, then the approach you outline above would indeed be more efficient.
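For illustration, here is a rough sketch of the random-key approach; an in-memory map stands in for the datastore, and all names are made up. The last lines also show the point made above: shortening the same URL twice yields two different codes.

import java.security.SecureRandom;
import java.util.HashMap;
import java.util.Map;

public class RandomShortener {
    private static final char[] ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789".toCharArray();
    private static final SecureRandom RANDOM = new SecureRandom();
    private final Map<String, String> store = new HashMap<>(); // code -> full URL

    public String shorten(String url) {
        String code;
        do {
            code = randomCode(7);
        } while (store.containsKey(code)); // retry on the (rare) collision
        store.put(code, url);
        return code;
    }

    public String resolve(String code) {
        return store.get(code);
    }

    private static String randomCode(int length) {
        StringBuilder sb = new StringBuilder(length);
        for (int i = 0; i < length; i++) {
            sb.append(ALPHABET[RANDOM.nextInt(ALPHABET.length)]);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        RandomShortener s = new RandomShortener();
        String c1 = s.shorten("https://example.com/some/long/path");
        String c2 = s.shorten("https://example.com/some/long/path");
        System.out.println(c1 + " -> " + s.resolve(c1));
        System.out.println(c1.equals(c2)); // almost certainly false: same URL, different codes
    }
}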
Now that std::experimental::optional has been accepted (or is about to be accepted), I wonder what the overhead is, and what the consequences for the generated assembly are, when the inner value is accessed through the following operators:
->
*
value
value_or
compared to the case without std::optional. This could be particularly important for computationally intensive programs.
For example, what would be the order of magnitude of the overhead of operations on a std::vector<std::experimental::optional<double>> compared to a std::vector<double>?
-> and * ought to have zero overhead.
value and value_or ought to have the overhead of one branch: if(active)
Also, the copy/move constructor, copy/move assignment, swap, emplace, operator==, operator<, and the destructor ought to have the overhead of one branch.
However, one branch of overhead is so small it probably can't even be measured. Seriously, write pretty code and don't worry about the performance here. Odds are that making the code pretty will result in it running faster than if you had tried to make it fast. Counter-intuitive, but do it anyway.
There are definitely cases where the overhead becomes noticeable, for instance sorting a large number of optionals. In these cases there are four situations:
(A) All the optionals are known to be empty ahead of time, in which case, why sort?
(B) Some optionals may or may not be active, in which case the overhead is required and there is no better way.
(C) All optionals are known to have values ahead of time and you don't need the sorted data in place, in which case use the zero-overhead operators to make a copy of the data into the raw type instead of optional, and sort that.
(D) All optionals are known to have values ahead of time, but you need the sorted data in place. In this case optional is adding unnecessary overhead, and the easiest way to work around it is to do step C and then use the zero-overhead operators to move the data back.
Besides the other answer, you should also consider that std::optional requires additional memory.
Often it's not just an extra byte; at least for "small" types, padding can mean a 2x space overhead.
Maybe RAM isn't a problem, but the larger footprint also means fewer values fit in the cache.
A sentinel value, if domain-specific knowledge allows you to use one, could be a better choice (probably in the form of markable, to keep type safety).
An interesting read is Boost optional - Performance considerations.