I've read that good obfuscation techniques don't merely replace method names with something obscure, but also, for instance, replace strings in the source code with byte arrays and add methods to convert those back to the original strings.
This might be one of those questions leading to opinion-based answers, but I'm going to ask it anyway: is there any general sense of how much performance an application loses when such an obfuscation technique is applied? I have in mind software that leans heavily on a database, i.e., queries exist in the code, for instance, as C# strings or StringBuilder instances.
Yes, string obfuscation has a significant performance impact, at the micro-level. With obfuscation, instead of a direct memory lookup you have code that has to execute (every time), and it is usually somewhat complicated, so it is necessarily much worse at the micro-performance level.
However, that cost usually doesn't matter; the time required for the database call (or showing the UI dialog, or sending the error to a log, or network traffic, or ...) is going to be orders of magnitude higher than the cost of converting the string. In most cases, the cost of the conversion is essentially invisible.
As with everything, careful testing is wise, but usually the costs are only "visible" if you are accessing obfuscated strings in a tight loop that is already CPU-performance-sensitive.
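To make that per-use cost concrete, here is a rough sketch of the kind of code an obfuscator might emit (a hypothetical illustration in Java; the pattern is the same for the C# case in the question): the literal is stored as a masked byte array and decoded every time it is needed.

import java.nio.charset.StandardCharsets;

public class ObfuscatedStringDemo {

    // In real obfuscated output these bytes would be baked in as literals;
    // they are produced at startup here only so the example is runnable.
    private static final byte[] QUERY_BYTES =
            mask("SELECT name FROM users WHERE id = ?".getBytes(StandardCharsets.UTF_8));

    private static byte[] mask(byte[] plain) {
        byte[] out = new byte[plain.length];
        for (int i = 0; i < plain.length; i++) {
            out[i] = (byte) (plain[i] ^ 0x5A);   // trivial XOR mask
        }
        return out;
    }

    // This runs every time the string is needed; that per-call work is the
    // micro-level cost described above.
    private static String decode(byte[] masked) {
        byte[] out = new byte[masked.length];
        for (int i = 0; i < masked.length; i++) {
            out[i] = (byte) (masked[i] ^ 0x5A);
        }
        return new String(out, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Nanoseconds to microseconds per call, versus the milliseconds a
        // database round trip typically costs.
        System.out.println(decode(QUERY_BYTES));
    }
}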
I was wondering if there is a performance difference in SystemVerilog when doing a string compare using these two different methods:
1. str.compare(other_str);
2. str == other_str
If there is a difference, why is there a difference, and where did you get your information from?
I think there are a lot more factors that might affect performance than what you have shown here. Realize that SystemVerilog comes from the merging of multiple languages; sometimes features are duplicated, and for historical reasons the redundancies could not be removed.
The questions are:
1. Is the compiler sophisticated enough to generate the same implementation for both?
2. If not, do some conditions favor one implementation over the other? For example, the characterization of the variable types, or the storage class of the variables.
3. And do those conditions affect the compiler's ability to generate the same code for both?
I have a large dataset from an analytics provider.
It arrives in JSON and I parse it into a hash, but due to the size of the set I'm ballooning to over a gig in memory usage. Almost everything starts as strings (a few values are numerical), and while of course the keys are duplicated many times, many of the values are repeated as well.
So I was thinking, why not symbolize all the (non-numerical) values, as well?
I've found some discussion of potential problems, but I figure it would be nice to have a comprehensive description for Ruby, since the problems seem to depend on the implementation of the interning process (what happens when you symbolize a string).
I found this talking about Java:
Is it good practice to use java.lang.String.intern()?
The interning process can be expensive
Interned strings are never de-allocated, resulting in a memory leak
(Except there's some contention on that last point.)
So, can anyone give a detailed explanation of when not to intern strings in Ruby?
When the list of things in question is an open set (i.e., dynamic, with no fixed inventory), you should not convert them into symbols. Each symbol created will never be garbage collected and will cause a memory leak.
When the list of things in question is a closed set (i.e., static, with a fixed inventory), you should convert them into symbols. Each symbol will be created only once and reused, which saves memory.
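The same open-set versus closed-set rule applies to java.lang.String.intern(), which the linked Java question is about. A minimal sketch (in Java rather than Ruby, with made-up data, purely to illustrate the rule):

import java.util.ArrayList;
import java.util.List;

public class InternRule {
    public static void main(String[] args) {
        // Closed set: a fixed vocabulary, e.g. country codes parsed out of JSON.
        // Every parsed copy is a distinct String object; interning collapses
        // them to one canonical instance, which is what symbolizing buys you.
        List<String> countries = new ArrayList<>();
        for (int i = 0; i < 1_000_000; i++) {
            String code = new String("US");   // simulates a freshly parsed value
            countries.add(code.intern());     // safe: the vocabulary is bounded
        }

        // Open set: arbitrary, ever-changing values. Don't intern (or symbolize)
        // these; the canonical pool only grows, and you pay the interning cost
        // for strings that will never repeat anyway.
        List<String> searchTerms = new ArrayList<>();
        for (int i = 0; i < 1_000_000; i++) {
            searchTerms.add("query-" + i);    // keep them as plain strings
        }

        System.out.println(countries.size() + " / " + searchTerms.size());
    }
}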
The interning process can be expensive
There is always a tradeoff between memory and computing power, so try some of the best practices out there and benchmark to figure out what's right for you. A few suggestions I'd like to mention:
Symbols are an excellent choice for hash keys:
{name: "my name"}
Freeze strings to save memory and try to keep a small string pool:
person[:country] = "USA".freeze
Have fun with Ruby GC tuning.
Interned strings are never de-allocated, resulting in a memory leak
Ruby 2.2 introduced symbol garbage collection, so this concern is no longer valid. However, overuse of frozen strings and symbols will still decrease performance.
Are there any appreciable performance differences between:
something.Where(predicate).FirstOrDefault();
and
something.FirstOrDefault(predicate);
?
I tend to use both, but am wondering if there's a clear winner when it comes to performance.
It depends on whether this Where is applied to an IQueryable or an IEnumerable.
In the case of IQueryable, the difference depends on the provider's implementation, but most likely there is no difference and both forms yield the same query.
In the case of IEnumerable, the difference should be negligible.
In Java they say don't concatenate Strings; instead, you should create a StringBuffer, keep appending to it, and when you're all done, call toString() to get a String object out of it.
Here's what I don't get. They say to do this for performance reasons, because concatenating strings creates lots of temporary objects. But if the goal were performance, you'd use a language like C/C++ or assembly.
The argument for using Java is that it is a lot cheaper to buy a faster processor than it is to pay a senior programmer to write fast, efficient code.
So on the one hand, you're supposed to let the hardware take care of the inefficiencies, but on the other hand, you're supposed to use StringBuffers to make Java more efficient.
While I see that you can do both, use Java and StringBuffers, my question is: where is the flaw in the logic that you either use a faster chip or you spend extra time writing more efficient software?
Developers should understand the performance implications of their coding choices.
It's not terribly difficult to write an algorithm that results in non-linear performance: polynomial, exponential, or worse. If you don't understand to some extent how the language, compiler, and libraries support your algorithm, you can fall into a trap that no amount of processing power will dig you out of. Algorithms whose runtime or memory usage is exponential can quickly exceed the ability of any hardware to execute in a reasonable time.
Assuming that hardware can scale to a poorly designed algorithm or coding choice is a bad idea. Take, for example, a loop that concatenates 100,000 small strings together (say, into an XML message). This is not an uncommon situation, but when it is implemented with individual string concatenations (rather than a StringBuffer), it results in 99,999 intermediate strings of increasing size that the garbage collector has to dispose of. This can easily make the operation fail if there isn't enough memory, or at best just take forever to run.
Now in the above example, some Java compilers can usually (but not always) rewrite the code to use a StringBuffer behind the scenes - but this is the exception, not the rule. In many situations the compiler simply cannot infer the intent of the developer - and it becomes the developer's responsibility to write efficient code.
One last comment - writing efficient code does not mean spending all your time looking for micro-optimizations. Premature optimization is the enemy of writing good code. However, you shouldn't confuse premature optimization with understanding the O() performance of an algorithm in terms of time/storage and making good choices about which algorithm or design to use in which situation.
As a developer you cannot ignore this level of knowledge and just assume that you can always throw more hardware at it.
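To make the point concrete, here is a rough and deliberately naive timing sketch in Java (the exact numbers depend on the JVM and machine, so treat them as illustrative): doubling the number of concatenations roughly quadruples the time, which is the quadratic growth that no processor upgrade keeps up with.

public class ConcatScaling {
    // Build a string of n characters by repeated concatenation and
    // return the elapsed time in milliseconds.
    static long timeNaiveConcat(int n) {
        long start = System.nanoTime();
        String s = "";
        for (int i = 0; i < n; i++) {
            s = s + "c";                 // copies the whole string on every pass
        }
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        // Expect roughly a 4x jump in time each time n merely doubles.
        for (int n = 10_000; n <= 80_000; n *= 2) {
            System.out.println(n + " concatenations: " + timeNaiveConcat(n) + " ms");
        }
    }
}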
The argument that you should use StringBuffer rather than concatenation is an old Java cargo-cult myth. The Java compiler itself will convert a series of concatenations in a single expression into a single StringBuilder (formerly StringBuffer) chain, making this "optimization" completely unnecessary in source code.
Having said that, there are legitimate reasons to optimize even if you're using a "slow" bytecode or interpreted language. You don't want to deal with the bugs, instability, and longer development cycle of C/C++, so you use a language with richer capabilities. (Built-in strings, whee!) But at the same time, you want your code to run as fast as possible in that language, so you avoid obviously inefficient constructs. In other words, just because you're giving up some speed by using Java doesn't mean you should forget about performance entirely.
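A sketch of where that automatic rewrite applies and where it does not (assuming the classic javac behavior; since Java 9 the mechanism is invokedynamic-based, but the effect is the same):

public class ConcatRewrite {
    static String greet(String user, int count) {
        // A single expression: the compiler rewrites this into one
        // StringBuilder chain, so hand-writing the builder buys nothing.
        return "Hello, " + user + "! You have " + count + " messages.";
    }

    static String joinNaively(java.util.List<String> lines) {
        // Across loop iterations the rewrite does not help: each pass still
        // creates a fresh intermediate string, so the quadratic cost is back
        // and an explicit StringBuilder remains worthwhile.
        String all = "";
        for (String line : lines) {
            all = all + line + "\n";
        }
        return all;
    }

    public static void main(String[] args) {
        System.out.println(greet("Ada", 3));
        System.out.print(joinNaively(java.util.List.of("a", "b", "c")));
    }
}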
The difference is that StringBuffer is not at all harder or more time-consuming to use than concatenating strings. The general principle is that if it's possible to gain efficiency without increasing development time/difficulty, it should be done: your principle only applies when that's not possible.
The language being slower isn't an excuse to use a much slower algorithm (and Java isn't that slow these days).
If we concatenate a 1-character string to an n-character string, we need to copy n + 1 characters into the new string. If we do
String s = "";
for (int i = 0; i < N; ++i)
    s = s + "c";
then the running time will be O(N²).
By contrast, a string buffer maintains a mutable buffer, which reduces the running time to O(N).
You cannot double the CPU to reduce a quadratic algorithm into a linear one.
(Although the optimizer may have implicitly created a StringBuffer for you already.)
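For comparison, here is the O(N) version using the mutable buffer (shown with StringBuilder, the unsynchronized successor to StringBuffer; N is assumed to be defined as above):
StringBuilder sb = new StringBuilder();
for (int i = 0; i < N; ++i)
    sb.append("c");            // amortized O(1) per append: no full copy
String s = sb.toString();      // one final copy of N characters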
Java != inefficient code.
You do not buy a faster processor to avoid writing efficient code. A bad programmer will write bad code regardless of language. The argument that C/C++ is more efficient than Java is an old argument that does not matter anymore.
In the real world, programming languages, operating systems, and development tools are not selected by the people who will actually have to deal with them.
Some salesman from company A has lunch with your boss to sell his operating system ... and then some other salesman takes your boss to a strip club to sell his database engine ... and so on.
Then, and only then, they hire a bunch of programmers to put all that together. They want it nice, fast, and cheap.
That's why you may end up writing high-performance applications in Java on a mobile device, or nice 3D graphics on Windows in Python ...
So, you're right, but it doesn't matter. :)
You should always put optimizations where you can. You shouldn't be "lazy coding" just because you have a fast processor...
I don't really know how StringBuffer works, nor do I work with Java, but assuming that Java defines a string as a char[], you're allocating a ton of dummy strings when doing str1+str2+str3+str4+str5, where you really only need to make one string of length str1.length+...str5.length and copy everything ONCE...
However, a smart compiler would optimize this and automatically use a StringBuffer.