Am I using Rust's bitvec_simd in the most efficient way?

I'm using bitvec_simd = "0.20" for bitvector operations in Rust.
I have two instances of a struct, call them clique_into and clique_from. The relevant fields of the struct are two bitvectors, members_bv and neighbors_bv, plus a vector of integers called members. The members_bv and members fields represent the same data.
After profiling my code, I find that my bottleneck (41% of the running time) is checking whether the members (typically 1) of clique_from are all neighbors of clique_into.
My current approach is to loop through the members of clique_from and check each one in turn to see if it's a neighbor of clique_into.
Here's my code:
use bitvec_simd::BitVec;
use smallvec::{smallvec, SmallVec};

struct Clique {
    members_bv: BitVec,
    members: SmallVec<[usize; 256]>,
    neighbors_bv: BitVec,
}

fn are_cliques_mergable(clique_into: &Clique, clique_from: &Clique) -> bool {
    for i in 0..clique_from.members.len() {
        if !clique_into.neighbors_bv.get_unchecked(clique_from.members[i]) {
            return false;
        }
    }
    return true;
}
That code works fine and it's fast, but is there a way to make it faster? We can assume that clique_from almost always has a single member, so the loop body almost always executes just once.
It likely comes down to this:
if !clique_into.neighbors_bv.get_unchecked(clique_from.members[i])
Is get_unchecked() the fastest way to do this? While I have written this so the index can never be out of range, the compiler doesn't know that. Does this force Rust to waste time checking whether it should panic?
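For reference, one equivalent formulation expresses the check as an iterator all() over the members, which drops the manual indexing (a sketch against the same bitvec_simd 0.20 API used above; whether it actually beats the loop is something only a benchmark can confirm):

fn are_cliques_mergable(clique_into: &Clique, clique_from: &Clique) -> bool {
    // Iterating the SmallVec directly avoids the clique_from.members[i]
    // indexing, which carries its own bounds check in safe Rust.
    clique_from
        .members
        .iter()
        .all(|&m| clique_into.neighbors_bv.get_unchecked(m))
}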

Related

Why can't dead code detection be fully solved by a compiler?

The compilers I've been using in C or Java have dead code detection (a warning when a line will never be executed). My professor says that this problem can never be fully solved by compilers, though. I was wondering why that is. I am not too familiar with the actual coding of compilers, as this is a theory-based class. But I was wondering what they check (such as possible input strings vs. acceptable inputs, etc.), and why that is insufficient.
The dead code problem is related to the Halting problem.
Alan Turing proved that it is impossible to write a general algorithm that will be given a program and be able to decide whether that program halts for all inputs. You may be able to write such an algorithm for specific types of programs, but not for all programs.
How does this relate to dead code?
The Halting problem is reducible to the problem of finding dead code. That is, if you find an algorithm that can detect dead code in any program, then you can use that algorithm to test whether any program will halt. Since that has been proven to be impossible, it follows that writing an algorithm for dead code is impossible as well.
How do you transfer an algorithm for dead code into an algorithm for the Halting problem?
Simple: you add a line of code after the end of the program you want to check for halting. If your dead-code detector detects that this line is dead, then you know that the program does not halt. If it doesn't, then you know that your program halts (it reaches its last line, and then your added line of code).
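A sketch of that reduction, with the impossible detector as a hypothetical Rust stub (both function names here are made up for illustration, and the scheme assumes, as above, that the program can only halt by running off its end):

// Hypothetical oracle: reports whether the given 1-based line of the
// program text can never execute. No total detector like this can exist;
// the stub only makes the reduction concrete.
fn has_dead_code_at_line(_program: &str, _line: usize) -> bool {
    unimplemented!("no such detector can exist")
}

// Deciding halting via the dead-code oracle: append a marker line and ask
// whether that line is dead. A dead marker means the program never reaches
// its end, i.e. it does not halt.
fn halts(program: &str) -> bool {
    let augmented = format!("{program}\nmarker();");
    let marker_line = augmented.lines().count();
    !has_dead_code_at_line(&augmented, marker_line)
}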
Compilers usually check for things that can be proven at compile-time to be dead. For example, blocks that are dependent on conditions that can be determined to be false at compile time. Or any statement after a return (within the same scope).
These are specific cases, and therefore it's possible to write an algorithm for them. It may be possible to write algorithms for more complicated cases (like an algorithm that checks whether a condition is syntactically a contradiction and therefore will always return false), but still, that wouldn't cover all possible cases.
Well, let's take the classical proof of the undecidability of the halting problem and change the halting-detector to a dead-code detector!
C# program
using System;
using YourVendor.Compiler;
class Program
{
    static void Main(string[] args)
    {
        string quine_text = @"using System;
using YourVendor.Compiler;
class Program
{{
    static void Main(string[] args)
    {{
        string quine_text = @{0}{1}{0};
        quine_text = string.Format(quine_text, (char)34, quine_text);
        if (YourVendor.Compiler.HasDeadCode(quine_text))
        {{
            System.Console.WriteLine({0}Dead code!{0});
        }}
    }}
}}";
        quine_text = string.Format(quine_text, (char)34, quine_text);
        if (YourVendor.Compiler.HasDeadCode(quine_text))
        {
            System.Console.WriteLine("Dead code!");
        }
    }
}
If YourVendor.Compiler.HasDeadCode(quine_text) returns false, then the line System.Console.WriteLine("Dead code!"); will never be executed, so this program actually does have dead code, and the detector was wrong.
But if it returns true, then the line System.Console.WriteLine("Dead code!"); will be executed, and since there is no more code in the program, there is no dead code at all, so again the detector was wrong.
So there you have it: a dead-code detector that returns only "There is dead code" or "There is no dead code" must sometimes give wrong answers.
If the halting problem is too obscure, think of it this way.
Take a mathematical problem that is believed to be true for all positive integers n, but hasn't been proven to be true for every n. A good example would be Goldbach's conjecture, that any even integer greater than two can be represented as the sum of two primes. Then (with an appropriate bigint library) run this program (pseudocode follows):
for (BigInt n = 4; ; n += 2) {
    if (!isGoldbachsConjectureTrueFor(n)) {
        print("Conjecture is false for at least one value of n\n");
        exit(0);
    }
}
Implementation of isGoldbachsConjectureTrueFor() is left as an exercise for the reader; for this purpose it could be a simple iteration over all primes less than n.
Now, logically the above must either be the equivalent of:
for (;;) {
}
(i.e. an infinite loop) or
print("Conjecture is false for at least one value of n\n");
as Goldbach's conjecture must either be true or not true. If a compiler could always eliminate dead code, there would definitely be dead code to eliminate here, in either case. However, in doing so, at the very least your compiler would need to solve arbitrarily hard problems. We could supply provably hard problems (e.g. NP-complete problems) that it would have to solve to determine which bit of code to eliminate. For instance, if we take this program:
String target = "f3c5ac5a63d50099f3b5147cabbbd81e89211513a92e3dcd2565d8c7d302ba9c";
for (BigInt n = 0; n < 2**2048; n++) {
    String s = n.toString();
    if (sha256(s).equals(target)) {
        print("Found SHA value\n");
        exit(0);
    }
}
print("Not found SHA value\n");
we know that the program will either print out "Found SHA value" or "Not found SHA value" (bonus points if you can tell me which one is true). However, for a compiler to be able to reasonably optimise that, it would take on the order of 2^2048 iterations. It would in fact be a great optimisation, as I predict the above program would (or might) run until the heat death of the universe rather than print anything, absent optimisation.
I don't know if C++ or Java have an Eval-type function, but many languages do allow you to call methods by name. Consider the following (contrived) VBA example.
Dim methodName As String
If foo Then
    methodName = "Bar"
Else
    methodName = "Qux"
End If
Application.Run(methodName)
The name of the method to be called is impossible to know until runtime. Therefore, by definition, the compiler cannot know with absolute certainty that a particular method is never called.
Actually, given the example of calling a method by name, the branching logic isn't even necessary. Simply saying
Application.Run("Bar")
is more than the compiler can determine. When the code is compiled, all the compiler knows is that a certain string value is being passed to that method. It doesn't check whether that method exists until runtime. If the method isn't called elsewhere through more conventional means, an attempt to find dead methods can return false positives. The same issue exists in any language that allows code to be called via reflection.
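The same effect occurs without reflection whenever dispatch goes through runtime data. A minimal Rust sketch (the command-line argument stands in for a config value; the names are made up for illustration): no function in the table can be proven dead at compile time.

use std::collections::HashMap;

fn bar() { println!("Bar"); }
fn qux() { println!("Qux"); }

fn main() {
    // A name-to-function table; the key arrives only at runtime, so the
    // compiler cannot prove either entry unreachable.
    let table: HashMap<&str, fn()> =
        HashMap::from([("Bar", bar as fn()), ("Qux", qux as fn())]);
    let method_name = std::env::args().nth(1).unwrap_or_else(|| "Bar".to_string());
    if let Some(f) = table.get(method_name.as_str()) {
        f();
    }
}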
Unconditional dead code can be detected and removed by advanced compilers.
But there is also conditional dead code: code whose reachability cannot be known at compile time and can only be determined at runtime. For example, software may be configurable to include or exclude certain features depending on user preference, making certain sections of code seemingly dead in particular scenarios. That is not real dead code.
There are specific tools that can do testing, resolve dependencies, remove conditional dead code, and recombine the useful code at runtime for efficiency. This is called dynamic dead code elimination. But as you can see, it is beyond the scope of compilers.
A simple example:
int readValueFromPort(const unsigned int portNum);

int x = readValueFromPort(0x100); // just an example, nothing meaningful
if (x < 2)
{
    std::cout << "Hey! X < 2" << std::endl;
}
else
{
    std::cout << "X is too big!" << std::endl;
}
Now assume that the port 0x100 is designed to return only 0 or 1. In that case the compiler cannot figure out that the else block will never be executed.
However in this basic example:
bool boolVal = /* anything boolean */;
if (boolVal)
{
    // Do A
}
else if (!boolVal)
{
    // Do B
}
else
{
    // Do C
}
Here the compiler can work out that the final else block is dead code.
So the compiler can warn about dead code only if it has enough data to identify it, and it must also know how to apply that data to decide whether a given block is dead.
EDIT
Sometimes the data is just not available at the compilation time:
// File a.cpp
bool boolMethod();

bool boolVal = boolMethod();
if (boolVal)
{
    // Do A
}
else
{
    // Do B
}

//............

// File b.cpp
bool boolMethod()
{
    return true;
}
While compiling a.cpp the compiler cannot know that boolMethod always returns true.
The compiler will always lack some context information. E.g., you might know that a double value never exceeds 2, because that is a feature of the mathematical function you use from a library. The compiler does not even see the code in the library, and it can never know all the features of all mathematical functions, or detect all the weird and complicated ways to implement them.
The compiler doesn't necessarily see the whole program. I could have a program that calls a shared library, which calls back into a function in my program which isn't called directly.
So a function which is dead with respect to the library it's compiled against could become alive if that library was changed at runtime.
If a compiler could eliminate all dead code accurately, it would be called an interpreter.
Consider this simple scenario:
if (my_func()) {
    am_i_dead();
}
my_func() can contain arbitrary code and in order for the compiler to determine whether it returns true or false, it will either have to run the code or do something that is functionally equivalent to running the code.
The idea of a compiler is that it only performs a partial analysis of the code, thus simplifying the job of a separate running environment. If you perform a full analysis, that isn't a compiler any more.
If you consider the compiler as a function c(), where c(source)=compiled code, and the running environment as r(), where r(compiled code)=program output, then to determine the output for any source code you have to compute the value of r(c(source code)). If calculating c() requires the knowledge of the value of r(c()) for any input, there is no need for a separate r() and c(): you can just derive a function i() from c() such that i(source)=program output.
Others have commented on the halting problem and so forth. These generally apply to portions of functions. However it can be hard/impossible to know whether even an entire type (class/etc) is used or not.
In .NET/Java/JavaScript and other runtime driven environments there's nothing stopping types being loaded via reflection. This is popular with dependency injection frameworks, and is even harder to reason about in the face of deserialisation or dynamic module loading.
The compiler cannot know whether such types would be loaded. Their names could come from external config files at runtime.
You might like to search around for tree shaking which is a common term for tools that attempt to safely remove unused subgraphs of code.
Take a function
void DoSomeAction(int actnumber)
{
    switch (actnumber)
    {
        case 1: Action1(); break;
        case 2: Action2(); break;
        case 3: Action3(); break;
    }
}
Can you prove that actnumber will never be 2 so that Action2() is never called...?
I disagree about the halting problem. I wouldn't call such code dead even though in reality it will never be reached.
Instead, let's consider:
for (int N = 3; ; N++)
    for (int A = 2; A < int.MaxValue; A++)
        for (int B = 2; B < int.MaxValue; B++)
        {
            double Sum = Math.Pow(A, N) + Math.Pow(B, N);
            double Root = Math.Pow(Sum, 1.0 / N);
            if (Root == Math.Truncate(Root))
                FermatWasWrong();
        }

private void FermatWasWrong()
{
    Press.Announce("Fermat was wrong!");
    Nobel.Claim();
}
(Ignore the precision and overflow errors.) Dead code?
Look at this example:
public boolean isEven(int i) {
    if (i % 2 == 0)
        return true;
    if (i % 2 == 1)
        return false;
    return false;
}
The compiler can't know that i % 2 must be either 0 or 1 here. To see that the lowest return will never be executed, it would have to understand the semantics of your code. How should that be implemented? Since the compiler can't prove that the last return is unreachable, it can't detect the dead code.

performance of many if statements/switch cases

If I had literally 1000s of simple if statements or switch statements
ex:
if 'a':
    return 1
if 'b':
    return 2
if 'c':
    return 3
...
...
Would thousands of trivial if statements be faster than searching a list for the value? I imagined that because every if statement must be tested until the desired output is found (worst case O(n)), it would perform the same as a linear search through a list. This is just an assumption; I have no evidence to prove it, and I am curious to know.
You could potentially put these things into delegates that are then stored in a map, keyed by the input you've specified.
C# Example:
// Declare a map. The input (key) is a char, and we have a function that
// will return an integer based on that char. The function may do
// something more complicated.
var map = new Dictionary<char, Func<char, int>>();

// Add some entries:
map['a'] = (c) => { return 1; };
map['b'] = (c) => { return 2; };
map['c'] = (c) => { return 3; };
// etc... ad infinitum.
Now that we have this map, we can quite cleanly return something based on the input
public int Test(char c)
{
    Func<char, int> func;
    if (map.TryGetValue(c, out func))
        return func(c);
    return 0;
}
In the above code, we can call Test and it will find the appropriate function to call (if present). This approach is better (imho) than a list, as with a list you'd potentially have to search the whole thing to find the desired input.
This depends on the language and the compiler/interpreter you use. In many interpreted languages, the performance will be the same, in other languages, the switch statements gives the compiler crucial additional information that it can use to optimize the code.
In C, for instance, I expect a long switch statement like the one you present to use a lookup table under the hood, avoiding explicit comparison with all the different values. With that, your switch decision takes the same time, no matter how many cases you have. A compiler might also hardcode a binary search for the matching case. These optimizations are typically not performed when evaluating a long else if() ladder.
In any case, I repeat, it depends on the interpreter/compiler: If your compiler optimized else if() ladders, but no switch statements, what it could do with a switch statement is quite irrelevant. However, for mainline languages, you should be able to expect all constructs to be optimized.
Apart from that, I advise using a switch statement wherever applicable; it carries a lot more semantic information to the reader than an equivalent else if() ladder.
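For illustration, here is the same kind of dispatch as a Rust match (a sketch only; whether the backend emits a jump table, a binary search, or a comparison chain is up to the optimizer, so measure if it matters):

fn lookup(c: char) -> i32 {
    // A dense, contiguous set of cases is exactly the shape a compiler
    // can lower to a lookup/jump table instead of a comparison chain.
    match c {
        'a' => 1,
        'b' => 2,
        'c' => 3,
        _ => 0,
    }
}

fn main() {
    assert_eq!(lookup('b'), 2);
}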

OOP much slower than Structural programming. why and how can be fixed?

As I mentioned in the subject of this post, I found out the hard way that OOP is slower than structural programming (spaghetti code).
I wrote a simulated annealing program with OOP, then removed one class and rewrote that part structurally in the main form. Suddenly it got much faster. In the OOP version I was calling the removed class in every iteration.
I also checked it with tabu search. Same result.
Can anyone tell me why this is happening, and how I can fix it in other OOP programs?
Are there any tricks, for example caching my class instances or something like that?
(The programs were written in C#.)
If you have a high-frequency loop, and inside that loop you create new objects and don't call many other functions, then yes, you will see that if you can avoid those news, say by re-using one copy of the object, you can save a large fraction of total time.
Between new, constructors, destructors, and garbage collection, a very little code can waste a whole lot of time.
Use them sparingly.
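A minimal Rust sketch of that advice, reusing one scratch buffer instead of allocating a fresh one per iteration (the workload here is made up for illustration):

fn main() {
    let inputs: Vec<u32> = (0..1_000).collect();

    // Allocate the scratch buffer once, outside the hot loop...
    let mut scratch: Vec<u64> = Vec::with_capacity(16);
    let mut total: u64 = 0;

    for &x in &inputs {
        // ...and reuse its storage each iteration: clear() keeps the
        // capacity, so no new allocation happens here.
        scratch.clear();
        scratch.extend((0..16u64).map(|i| x as u64 + i));
        total += scratch.iter().sum::<u64>();
    }
    println!("{total}");
}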
Memory access is often overlooked. The way o.o. tends to lay out data in memory is not conducive to efficient memory access in practice in loops. Consider the following pseudocode:
adult_clients = 0
for client in list_of_all_clients:
    if client.age >= AGE_OF_MAJORITY:
        adult_clients++
It so happens that the way this is accessed from memory is quite inefficient on modern architectures, because they like accessing large contiguous rows of memory, but we only care about client.age; across all the clients we have, those ages will not be laid out in contiguous memory.
Focusing on objects that have fields results in data being laid out in memory in such a way that fields holding the same type of information do not sit in consecutive memory. Performance-heavy code tends to involve loops that repeatedly look at data with the same conceptual meaning, and it is conducive to performance for such data to be laid out in contiguous memory.
Consider these two examples in Rust:
// Struct that contains an id, and an optional flag for whether the id is
// divisible by three.
struct Foo {
    id: u32,
    divbythree: Option<bool>,
}

fn main() {
    // Create a pretty big vector of these structs with increasing ids,
    // and divbythree initialized as None.
    let mut vec_of_foos: Vec<Foo> =
        (0..100000000).map(|i| Foo { id: i, divbythree: None }).collect();

    // Loop over all these structs, determine whether the id is divisible
    // by three, and set divbythree accordingly.
    let mut divbythrees = 0;
    for foo in vec_of_foos.iter_mut() {
        if foo.id % 3 == 0 {
            foo.divbythree = Some(true);
            divbythrees += 1;
        } else {
            foo.divbythree = Some(false);
        }
    }

    // Print the number of times it was divisible by three.
    println!("{}", divbythrees);
}
On my system, the real time with rustc -O is 0m0.436s; now let us consider this example:
fn main() {
    // This time we create two vectors rather than a vector of structs.
    let vec_of_ids: Vec<u32> = (0..100000000).collect();
    let mut vec_of_divbythrees: Vec<Option<bool>> = vec![None; vec_of_ids.len()];

    // But we basically do the same thing.
    let mut divbythrees = 0;
    for i in 0..vec_of_ids.len() {
        if vec_of_ids[i] % 3 == 0 {
            vec_of_divbythrees[i] = Some(true);
            divbythrees += 1;
        } else {
            vec_of_divbythrees[i] = Some(false);
        }
    }
    println!("{}", divbythrees);
}
This runs in 0m0.254s at the same optimization level, close to half the time needed.
Despite having to allocate two vectors instead of one, storing similar values in contiguous memory has almost halved the execution time, though obviously the o.o. approach provides much nicer and more maintainable code.
P.s.: it occurs to me that I should probably explain why this matters so much, given that the code in both cases still indexes memory one field at a time rather than, say, pulling a large swath onto the stack. The reason is c.p.u. caches: when the program asks for the memory at a certain address, it actually obtains, and caches, a significant chunk of memory around that address, and if memory next to it is asked for again soon, it can be served from the cache rather than from actual physical working memory. As a consequence, compilers will also vectorize the second version more efficiently.

Changing from recursion to iteration

I have a function that I need to change to iteration, because it executes very slowly.
There are two recursive calls, depending on which condition is true.
It's something like this:
(I have a static array of objects outside the function, let's call it data.)
void func(int a, int b, int c, int d) {
    // something
    // non-recursive
    // here..
    for (i = 0; i < data.length; i++) {
        if (!data[i].x) {
            data[i].x = 1;
            if (a == data[i].value1)
                func(data[i].value2, b, c, d);
            else if (a == data[i].value2)
                func(data[i].value1, b, c, d);
            data[i].x = 0;
        }
    }
}
EDIT: Here is my algorithm: http://pastebin.com/F7UfzfHv
The function searches for every path from one point to another in a graph, but returns (to an array) only one path, in which every vertex is unique.
And I know that a good way to manage that is to use a stack... but I don't know how. Could someone give me a hint about it?
There is a good possibility that your function is computing the same thing again and again, which in technical terms is called overlapping sub-problems (you may recognize this if you are familiar with dynamic programming). First try to identify whether that is the case or not; since you haven't said what the function does, I can't tell the exact problem.
Also, the stack idea you mention is not of great use here, since recursion already uses a stack internally. Using an external stack instead of the internal one is more error prone and brings little benefit.
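If you do want the mechanical conversion anyway, the standard trick is to keep the pending (argument, loop index) pairs on an explicit stack. A hedged Rust sketch of the loop above (the Item fields mirror the pseudocode; b, c, and d are omitted because the recursion never changes them, and the non-recursive prologue is elided):

// Stand-in for the elements of the static `data` array.
struct Item {
    value1: i32,
    value2: i32,
    x: bool,
}

fn func_iter(a_start: i32, data: &mut [Item]) {
    // Each frame is (a, i, resuming): `resuming` means the simulated call
    // made at index i has returned, so we un-mark data[i] and scan on,
    // just like `data[i].x = 0` after the recursive call.
    let mut stack: Vec<(i32, usize, bool)> = vec![(a_start, 0, false)];
    'frames: while let Some((a, mut i, resuming)) = stack.pop() {
        if resuming {
            data[i].x = false;
            i += 1;
        }
        while i < data.len() {
            if !data[i].x {
                let next_a = if a == data[i].value1 {
                    Some(data[i].value2)
                } else if a == data[i].value2 {
                    Some(data[i].value1)
                } else {
                    None // marking and immediately un-marking is a no-op
                };
                if let Some(next_a) = next_a {
                    data[i].x = true;
                    stack.push((a, i, true));       // resume here afterwards
                    stack.push((next_a, 0, false)); // the simulated call
                    continue 'frames;
                }
            }
            i += 1;
        }
    }
}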

Good algorithm to turn stl map into sorted list of the keys based on a numeric value

I have an STL map of the type:
map<Object*, baseObject*>
where
class baseObject {
    int ID;
    // other stuff
};
If I wanted to return a list of objects (std::list< Object* >), what's the best way to sort it in order of the baseObject.ID's?
Am I just stuck looking through the whole thing for every number, or something? I'd prefer not to change the map to a Boost map, although I wouldn't necessarily be against doing something self-contained within a function like
GetObjectList(std::list<Object*> &objects)
{
    // sort the map into the list
}
Edit: maybe I should iterate through and copy the obj->baseobj into a map of baseobj.ID->obj ?
What I'd do is first extract the keys (since you only want to return those) into a vector, and then sort that:
std::vector<Object*> out;
std::transform(myMap.begin(), myMap.end(), std::back_inserter(out),
               [](const std::pair<Object* const, baseObject*>& p) { return p.first; });
std::sort(out.begin(), out.end(),
          [&myMap](Object* lhs, Object* rhs) { return myMap[lhs]->ID < myMap[rhs]->ID; });
If your compiler doesn't support lambdas, just rewrite them as free functions or function objects. I just used lambdas for conciseness.
For performance, I'd probably reserve enough room in the vector initially, instead of letting it gradually expand.
(Also note that I haven't tested the code, so it might need a little bit of fiddling)
Also, I don't know what this map is supposed to represent, but holding a map where both key and value types are pointers really sets my "bad C++" sense tingling. It smells of manual memory management and muddled (or nonexistent) ownership semantics.
You mentioned getting the output in a list, but a vector is almost certainly a better performing option, so I used that. The only situation where a list is preferable is really when you have no intention of ever iterating over it, and if you need the guarantee that pointers and iterators stay valid after modification of the list.
The first thing is that I would not use a std::list, but rather a std::vector. Now, as for the particular problem, you need to perform two operations: generate the container, then sort it by whatever your criterion is.
// Extract the data:
std::vector<Object*> v;
v.reserve(m.size());
std::transform(m.begin(), m.end(),
               std::back_inserter(v),
               [](const std::map<Object*, baseObject*>::value_type& v) {
                   return v.first;
               });

// Order according to the values in the map
std::sort(v.begin(), v.end(),
          [&m](Object* lhs, Object* rhs) {
              return m[lhs]->ID < m[rhs]->ID;
          });
Without C++11 you will need to create functors instead of the lambdas, and if you insist on returning a std::list, then you should use std::list<>::sort(Comparator). Note that this is probably inefficient. If performance is an issue (after you get this working, and you profile and know that this is actually a bottleneck), you might want to consider using an intermediate map<int, Object*>:
std::map<int, Object*> mm;
for (auto it = m.begin(); it != m.end(); ++it) {
    mm[it->second->ID] = it->first;
}

std::vector<Object*> v;
v.reserve(mm.size()); // mm might have fewer elements than m!
std::transform(mm.begin(), mm.end(),
               std::back_inserter(v),
               [](const std::map<int, Object*>::value_type& v) {
                   return v.second;
               });
Again, this might be faster or slower than the original version... profile.
I think you'll do fine with:
void GetObjectList(std::list<Object*> &objects)
{
    std::vector<Object*> vec;
    vec.reserve(map.size());
    // collect the keys, then sort them by the ID of the mapped value
    for (auto it = map.begin(), it_end = map.end(); it != it_end; ++it)
        vec.push_back(it->first);
    std::sort(vec.begin(), vec.end(),
              [&](Object* a, Object* b) { return map[a]->ID < map[b]->ID; });
    objects.assign(vec.begin(), vec.end());
}
Here's how to do what you said, "sort it in order of the baseObject.ID's":
typedef std::map<Object*, baseObject*> MapType;

MapType mymap; // don't care how this is populated, except that it
               // must not contain null baseObject* values

struct CompareByMappedId {
    const MapType &map;
    CompareByMappedId(const MapType &map) : map(map) {}
    bool operator()(Object *lhs, Object *rhs) const {
        return map.find(lhs)->second->ID < map.find(rhs)->second->ID;
    }
};

void GetObjectList(std::list<Object*> &objects) {
    assert(objects.empty()); // pre-condition, or could clear it, or for
                             // that matter return a list by value instead

    // copy keys into the list
    for (MapType::const_iterator it = mymap.begin(); it != mymap.end(); ++it) {
        objects.push_back(it->first);
    }
    // sort the list
    objects.sort(CompareByMappedId(mymap));
}
This isn't desperately efficient: it does more looking up in the map than is strictly necessary, and manipulating list nodes in std::list::sort is likely a little slower than std::sort would be at manipulating a random-access container of pointers. But then, std::list itself isn't very efficient for most purposes, so you expect it to be expensive to set one up.
If you need to optimize, you could create a vector of pairs of (int, Object*), so that you only have to iterate over the map once, no need to look things up. Sort the pairs, then put the second element of each pair into the list. That may be a premature optimization, but it's an effective trick in practice.
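That pair trick looks the same in any language; here is a small Rust sketch of the idea, with a toy name-to-ID HashMap standing in for the Object*/baseObject* map:

use std::collections::HashMap;

fn main() {
    // Stand-ins: keys are the "objects" (names here), values carry the ID.
    let map: HashMap<&str, i32> = HashMap::from([("c", 3), ("a", 1), ("b", 2)]);

    // One pass over the map builds (ID, object) pairs...
    let mut pairs: Vec<(i32, &str)> = map.iter().map(|(k, v)| (*v, *k)).collect();
    // ...one sort by ID, with no further map lookups...
    pairs.sort_by_key(|&(id, _)| id);
    // ...then keep just the objects, now in ID order.
    let objects: Vec<&str> = pairs.into_iter().map(|(_, k)| k).collect();
    println!("{:?}", objects); // ["a", "b", "c"]
}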
I would create a new map with a sort criterion that uses the ID of your objects. Populate the second map from the first map (just iterate through, or std::copy it in). Then you can read this map in order using its iterators.
This has a slight insertion overhead compared to using a vector or list (log(n) time instead of constant time), but it avoids the need to sort after you've created the vector or list, which is nice.
Also, you'll be able to add more elements to it later in your program and it will maintain its order without need of a resort.
I'm not sure I completely understand what you're trying to store in your map but perhaps look here
The third template argument of a std::map is a "less" comparator functor. Perhaps you can utilize this to sort the data stored in the map on insertion. Then it would be a straightforward loop over a map iterator to populate a list.
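As a sketch of that keep-it-sorted-on-insert idea, here is the Rust analogue, where BTreeMap plays the role of the comparator-ordered std::map (toy data for illustration):

use std::collections::BTreeMap;

fn main() {
    // Keyed by ID, so the map keeps itself sorted as entries are inserted.
    let mut by_id: BTreeMap<i32, &str> = BTreeMap::new();
    by_id.insert(3, "gamma");
    by_id.insert(1, "alpha");
    by_id.insert(2, "beta");

    // Iteration is already in ID order; no separate sort step is needed.
    let objects: Vec<&str> = by_id.values().copied().collect();
    println!("{:?}", objects); // ["alpha", "beta", "gamma"]
}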
