Why is C++ so much faster than C in this code? - gcc

My C code is:
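#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main(void){
char* a = (char*)malloc(200000);
if (a == NULL) return 1;
a[0] = '\0'; // malloc does not zero memory; strcat needs an initial empty string
for (int i = 0;i< 100000;i++){
strcat(a,"b"); // strcat rescans a from the start to find its end on every call
}
printf("%s",a);
free(a);
return 0;
}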
My C++ code is:
#include <iostream>
#include <string>
int main(){
std::string a = "";
for (int i = 0;i< 100000;i++){
a+="b"; // std::string stores its length, so appending does not rescan the string
}
std::cout<<a;
}
On my machine, the C code runs in about 5 seconds, while the C++ code runs in 0.025 seconds!
The C code doesn't check for overflows and has none of the C++ overhead of classes, and yet it is roughly 200 times (more than two orders of magnitude) slower than my C++ code.
I'm using gcc/g++ 6.2.0 with -O3 on a Raspberry Pi.

@erwin was correct.
When I change my code to:
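#include <stdio.h>
#include <stdlib.h>
#include <string.h>
// Writes the first character of src at position lenDest of dest;
// the caller tracks the current length, so there is no scan for '\0'.
void mystrcat(char* dest, const char* src, int lenDest){
dest[lenDest]=src[0];
}
int main(void){
char* a = (char*)malloc(200000);
if (a == NULL) return 1;
for (int i = 0;i< 100000;i++){
mystrcat(a,"b",i);
}
a[100000] = 0; // terminate the string once, at the end
printf("%s\n",a);
free(a);
return 0;
}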
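It takes about 0.012 s to run (mostly spent printing the large output to the screen).
Shlemiel the Painter's algorithm at work: strcat has to scan from the beginning of the string to find its end on every call.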
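To put a number on the strcat cost, here is a small C++ sketch (the helper names are hypothetical, not from the question) that builds the same string both ways: once with strcat, whose i-th call must skip i characters before it can append, and once by keeping a pointer to the end of the string, which is the same idea as the explicit length passed to mystrcat.

```cpp
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Quadratic: every strcat call rescans the buffer to find the terminating '\0'.
std::string append_with_strcat(std::size_t n) {
    std::vector<char> buf(n + 1, '\0');  // zero-filled, so the buffer starts as ""
    for (std::size_t i = 0; i < n; ++i)
        std::strcat(buf.data(), "b");    // skips i characters before appending
    return std::string(buf.data());
}

// Linear: remember where the string ends instead of searching for it.
std::string append_with_end_pointer(std::size_t n) {
    std::vector<char> buf(n + 1, '\0');
    char *end = buf.data();              // always points at the terminating '\0'
    for (std::size_t i = 0; i < n; ++i) {
        *end++ = 'b';                    // O(1) per append
        *end = '\0';
    }
    return std::string(buf.data());
}

// Characters strcat skips in total: 0 + 1 + ... + (n - 1) = n(n - 1) / 2.
unsigned long long strcat_scan_cost(unsigned long long n) {
    return n == 0 ? 0 : n * (n - 1) / 2;
}
```

For the question's 100000 iterations, strcat_scan_cost(100000) is 4999950000: about five billion character reads for the strcat loop, versus a single pass over the buffer for the end-pointer loop.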

Related

How to compile the exp benchmark in riscv-tests

I have built a simple program to calculate the exponential of a value, but I get an error:
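#include <stdint.h>
#include "util.h"
#include <math.h>
#include <stdio.h>
int main() {
double value = -150;
double result = 0; // result was never declared
// Start_Timer(), Stop_Timer(), Begin_Time, End_Time and User_Time are not declared here; they must come from another header
Start_Timer();
for(int i=0; i<500 ;i++){
result = exp(value);
value++;
}
Stop_Timer();
User_Time=End_Time-Begin_Time;
printf("User_Time: %ld - %ld = %ld - \n", End_Time,Begin_Time,User_Time);
printf("The Exponential of %g is %g\n", value, result); // value and result are doubles, so %g rather than %ld
return 0;
}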
Any idea how to use exp in a benchmark for testing?
I have figured out that the exp function needs -x and -lm when compiling. How can I use them in the test?
(See also: C Failing to compile: Can't find math.h functions.)
I tried to edit the Makefile in riscv-test/benchmark, but it is a little tricky for me.
Error message: https://github.com/riscv/riscv-tests/issues/142
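For comparison, here is a hedged host-side C++ sketch of the same loop; it is not a riscv-tests benchmark and does not show how to change that Makefile. It uses std::exp and std::chrono::steady_clock in place of the question's util.h and Start_Timer/Stop_Timer, and the names ExpBenchResult and run_exp_bench are hypothetical. When a C++ program is linked with the g++ driver, libm is added automatically, so no explicit -lm is needed; a C program calling exp still needs -lm at link time.

```cpp
#include <chrono>
#include <cmath>
#include <cstdio>

// Hypothetical result type for a host-side run of the question's loop.
struct ExpBenchResult {
    double last_input;     // argument of the final exp() call
    double last_result;    // exp(last_input)
    long long elapsed_ns;  // wall-clock time spent in the loop
};

// Calls std::exp on start, start + 1, ..., start + iterations - 1 (iterations >= 1).
ExpBenchResult run_exp_bench(double start, int iterations) {
    ExpBenchResult r{start, 0.0, 0};
    double value = start;
    auto t0 = std::chrono::steady_clock::now();
    for (int i = 0; i < iterations; ++i) {
        r.last_input = value;
        r.last_result = std::exp(value);
        value += 1.0;
    }
    auto t1 = std::chrono::steady_clock::now();
    r.elapsed_ns = std::chrono::duration_cast<std::chrono::nanoseconds>(t1 - t0).count();
    return r;
}

// Prints the doubles with %g (the question used %ld, which expects a long).
void print_exp_bench(const ExpBenchResult &r) {
    std::printf("elapsed: %lld ns\n", r.elapsed_ns);
    std::printf("The exponential of %g is %g\n", r.last_input, r.last_result);
}
```

With the question's parameters, run_exp_bench(-150.0, 500) ends with last_input equal to 349, since the loop starts at -150 and advances by one 499 times after the first call.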

__seg_fs on GCC. Is it possible to emulate it just in a program?

I've just read about GCC's support for the %fs and %gs segment prefixes on Intel platforms.
It was mentioned that "The way you obtain %gs-based pointers, or control the value of %gs itself, is out of the scope of gcc;"
I'm looking for a way to set the value of %fs manually (I'm on IA32, RH Linux) and work with it. When I just set %fs = %ds, the test below works fine, as expected. But I cannot change the test to use another value of %fs without getting a segmentation fault. I'm starting to think that changing the value of %fs is not the only thing to do. So I'm looking for advice on how to make a part of memory addressed by %fs that is not the same as DS.
#include <stddef.h>
#include <stdio.h>
typedef char __seg_fs fs_ptr;
fs_ptr p[] = {'h','e','l','l','o','\0'};
void fs_puts(fs_ptr *s)
{
char buf[100];
buf[0] = s[0];
buf[1] = s[1];
buf[2] = s[2];
buf[3] = '\0';
puts(buf);
}
void __attribute__((constructor)) set_fs()
{
unsigned short sel;
/* operands let GCC pick the scratch register instead of silently clobbering %bx */
__asm__ volatile("mov %%ds, %0" : "=r"(sel));
sel += 0; //<---- if fs=ds then the program executes as expected. If not 0 here, then a segmentation fault happens.
__asm__ volatile("mov %0, %%fs" : : "r"(sel));
}
int main()
{
fs_puts(p);
return 0;
}
I've talked with Armin who implemented __seg_gs/__seg_fs in GCC (Thanks Armin!).
So, basically, I cannot use these keywords for globals. The aim of introducing __seg_gs/__seg_fs was to make it possible to dynamically allocate thread-local regions of memory.
We cannot use __thread on a pointer and allocate memory for it with malloc, but __seg_gs/__seg_fs make that possible.
The test below illustrates this.
Note that it uses arch_prctl(), which exists only in the 64-bit version.
Also note that on 64-bit, %fs is used for __thread, so %gs is free.
#include <stdio.h>
#include <stdlib.h>
#include <asm/prctl.h>
#include <sys/syscall.h>
#include <unistd.h>
typedef __seg_gs char gs_str;
void gs_puts(gs_str *ptr)
{
int i;
char buf[100];
for(i = 0; i < 99; i++)
buf[i] = ptr[i];
buf[99] = '\0'; /* puts() needs a terminated string */
puts(buf);
}
int main()
{
int i;
void *buffer = malloc(100 * sizeof(char));
if (buffer == NULL)
return 1;
/* older glibc versions do not declare arch_prctl(), so call it through syscall() */
if (syscall(SYS_arch_prctl, ARCH_SET_GS, buffer) != 0)
return 1;
gs_str *gsobj = (gs_str *)0;
for (i = 0; i < 100; i++)
gsobj[i] = 'a'; /* in the %gs space */
gs_puts(gsobj);
return 0;
}

Multiplication - Matrix by imaginary unit

I would like to ask if anybody knows why this is not working:
For example, let A be a SparseMatrix<int> and B a SparseMatrix<std::complex<float> >. I would like to compute
B = i*A
As code:
std::complex<float> c;
c=1.0i;
B=A.cast<std::complex<float> >()*c;
or, equivalently:
B=A.cast<std::complex<float> >()*1.0i;
I expect all real values of A to become imaginary in B, but B contains only zeros, printed as (0,0).
Example:
#include <Eigen/Sparse>
#include <iostream>
#include <stdio.h>
#include <stdlib.h>
using namespace Eigen;
using std::cout;
using std::endl;
int main(int argc, char *argv[]){
int rows=5, cols=5;
SparseMatrix<int> A(rows,cols);
A.setIdentity();
SparseMatrix<std::complex<float> > B;
std::complex<float> c;
c=1i;
B=A.cast<std::complex<float> >()*1.0i;
//B=A.cast<std::complex<float> >()*c;
cout << B << endl;
return 0;
}
compile with:
g++ [name].cpp -o [name]
What am I doing wrong?
Thanks a lot for any help!
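You need to enable C++14 to get 1.0i working as expected. With GCC or Clang, add the -std=c++14 compiler option, and bring the literal suffixes into scope with using namespace std::complex_literals;.
Then, you can simply do:
using namespace std::complex_literals; // provides the i, if and il suffixes
MatrixXd A = MatrixXd::Random(3,3);
MatrixXcd B;
B = A * 1.0i;
The same works with a SparseMatrix. For the std::complex<float> matrix in the question, the float literal 1.0if matches the scalar type: B = A.cast<std::complex<float> >() * 1.0if;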
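The literal itself can be checked with the standard library alone. In this minimal sketch a std::vector stands in for the matrix and times_i is a hypothetical helper, not an Eigen function; compiled as C++14 or later with using namespace std::complex_literals, 1.0if is a std::complex<float> equal to (0, 1), so multiplying a real value x by it yields (0, x).

```cpp
#include <complex>
#include <vector>

using namespace std::complex_literals;  // provides the i / if / il suffixes (C++14)

// Hypothetical helper: multiplies every entry of a real vector by the imaginary unit,
// the element-wise analogue of B = i*A.
std::vector<std::complex<float>> times_i(const std::vector<int> &a) {
    std::vector<std::complex<float>> b;
    b.reserve(a.size());
    for (int x : a)
        b.push_back(static_cast<float>(x) * 1.0if);  // float * complex<float> -> (0, x)
    return b;
}
```

For example, times_i({1, 0, 3}) gives (0,1), (0,0) and (0,3): the real values become purely imaginary instead of collapsing to (0,0).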

Dinic's algorithm implementation and a SPOJ puzzle

I am trying to solve this problem: http://www.spoj.com/problems/FASTFLOW/
I suppose Dinic's algorithm is suitable for this problem, but it runs in O(V^2 E) time, which is too slow in the worst case. Any suggestions for a different algorithm, or for improving the running time of this one?
EDIT: I am including my implementation of Dinic's algorithm. Apparently it contains a mistake... Could anyone suggest a test case or help me debug the logic of the program?
//#define DEBUG //comment when you have to disable all debug macros.
#define NDEBUG //comment when all assert statements have to be disabled.
#include <iostream>
#include <cstring>
#include <sstream>
#include <cstdlib>
#include <cstdio>
#include <cmath>
#include <vector>
#include <set>
#include <map>
#include <bitset>
#include <climits>
#include <ctime>
#include <algorithm>
#include <functional>
#include <stack>
#include <queue>
#include <list>
#include <deque>
#include <sys/time.h>
#include <iomanip>
#include <cstdarg>
#include <utility> //std::pair
#include <cassert>
#define tr(c,i) for(typeof(c.begin()) i = (c).begin(); i != (c).end(); i++)
#define present(c,x) ((c).find(x) != (c).end())
#define all(x) x.begin(), x.end()
#define pb push_back
#define mp make_pair
#define log2(x) (log(x)/log(2))
#define ARRAY_SIZE(arr) (1[&arr]-arr)
#define INDEX(arr,elem) (lower_bound(all(arr),elem)-arr.begin())
#define lld long long int
#define MOD 1000000007
#define gcd __gcd
#define equals(a,b) (a.compareTo(b)==0) //for strings only
using namespace std;
#ifdef DEBUG
#define debug(args...) {dbg,args; cerr<<endl;}
#else
#define debug(args...) // Just strip off all debug tokens
#endif
struct debugger
{
template<typename T> debugger& operator , (const T& v)
{
cerr<<v<<" ";
return *this;
}
}dbg;
/**********************************MAIN CODE***************************************************/
//runs in O(V^2E) time.
//might consider using a 1-d array of size V*V for large values of V
vector<vector<lld> > flow, capacity, level_graph;
lld V;
vector<lld> *adj, *level_adj;
void init(lld v)
{
adj=new vector<lld>[v+1];
level_adj=new vector<lld>[v+1];
V=v;
flow.resize(V+1);
capacity.resize(V+1);
level_graph.resize(V+1);
for(lld i=0;i<=V;i++)
flow[i].resize(V+1), capacity[i].resize(V+1), level_graph[i].resize(V+1);
}
void add_edge(lld u, lld v, lld uv, lld vu=0)
{
capacity[u][v]=uv;
capacity[v][u]=vu;
adj[u].push_back(v);
flow[u][v]=uv; //will store the present capacity. facility for the residual graph
flow[v][u]=vu;
if(vu) adj[v].push_back(u);
}
void update_residual_graph(lld source, lld destination, lld *parent) //push augment flow in the residual graph and modify the latter.
{
lld i=destination, aug=LLONG_MAX;
while(parent[i]!=-2)
{
//debug(i);
aug=min(aug,flow[parent[i]][i]);
i=parent[i];
}
i=destination;
while(parent[i]!=-2)
{
flow[parent[i]][i]-=aug;
flow[i][parent[i]]=capacity[parent[i]][i]-flow[parent[i]][i];
i=parent[i];
}
}
bool DFS(lld source, lld destination)
{
stack<lld> state;
bool visited[V+1], present;
lld parent[V+1],t;
memset(visited, false, sizeof(visited));
memset(parent, -1, sizeof(parent));
parent[source]=-2;
state.push(source);
visited[source]=true;
while(!state.empty())
{
t=state.top();
present=false;
for(vector<lld>::iterator it=level_adj[t].begin(); it!=level_adj[t].end();it++)
{
parent[*it]=t;
if(!visited[*it] && level_graph[t][*it])
{
present=true;
state.push(*it);
visited[*it]=true;
if(*it==destination)
update_residual_graph(source,destination,parent); //update residual graph
}
}
if(!present)
state.pop();
}
return parent[destination]!=-1;
}
bool BFS(lld source, lld destination)
{
//create level graph using BFS
fill(level_graph.begin(), level_graph.end(), vector<lld>(V+1,-1));
lld i,j;
for(i=1;i<=V;i++)
level_adj[i].clear();
queue<lld> state;
lld level[V+1],t; //record of minimum distance from source
memset(level,-1, sizeof(level));
state.push(source);
level[source]=0;
while(!state.empty())
{
t=state.front();
state.pop();
for(vector<lld>::iterator it=adj[t].begin();it!=adj[t].end();it++)
{
if((level[*it]==-1 && flow[t][*it]) || (level[*it]==level[t]+1))
{
level_graph[t][*it]=flow[t][*it];
level_adj[t].push_back(*it);
level[*it]=level[t]+1;
state.push(*it);
}
}
}
if(level[destination]==-1)
return false;
//call DFS and update the residual graph
return DFS(source,destination);
}
lld maximum_flow(lld source, lld destination)
{
while(BFS(source,destination));
lld max_flow=0;
for(vector<lld>::iterator it=adj[source].begin(); it!=adj[source].end(); it++)
max_flow+=flow[*it][source];
return max_flow;
}
int main()
{
lld e,u,v,n,c;
//cout<<"V:"<<endl;
cin>>n>>e;
init(n);
while(e--)cin>>u>>v>>c, add_edge(u,v,c);
cout<<maximum_flow(1,n)<<endl;
}
The push-relabel algorithm with the global relabeling heuristic proposed by Cherkassky--Goldberg should be sufficient (every m steps, recompute the labels with breadth-first search). The practical running times with the heuristic are much, much better than the worst-case cubic bound. (You could do gap relabeling too, but it's trickier to implement and probably not necessary for this application.)
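A minimal sketch of the approach described above, under stated assumptions: vertices are 0-based; edges are undirected as in FASTFLOW, so each one gets capacity c in both directions; and, as a simplification, the global relabeling BFS runs after every n relabel operations rather than every m steps. It is FIFO push-relabel that computes only the flow value: once a vertex's label reaches n it can no longer reach the sink, so it is skipped. Gap relabeling is left out, and the class and method names are hypothetical.

```cpp
#include <algorithm>
#include <queue>
#include <vector>

// FIFO push-relabel with periodic global relabeling (sketch). Computes only the
// max-flow value; call max_flow() once, since it consumes the residual capacities.
class PushRelabel {
public:
    explicit PushRelabel(int n) : n_(n), g_(n), excess_(n, 0), height_(n, 0) {}

    // Edge u->v with capacity cap_uv plus its reverse with capacity cap_vu;
    // pass the same capacity twice for an undirected edge.
    void add_edge(int u, int v, long long cap_uv, long long cap_vu) {
        if (u == v) return;  // self-loops never carry useful flow
        g_[u].push_back({v, cap_uv, static_cast<int>(g_[v].size())});
        g_[v].push_back({u, cap_vu, static_cast<int>(g_[u].size()) - 1});
    }

    long long max_flow(int s, int t) {
        if (s == t) return 0;
        std::queue<int> active;
        height_[s] = n_;
        for (Edge &e : g_[s]) {  // saturate every edge leaving the source
            if (e.cap > 0) {
                excess_[s] += e.cap;
                push(s, e, e.cap, s, t, active);
            }
        }
        global_relabel(s, t);
        while (!active.empty()) {
            int u = active.front();
            active.pop();
            discharge(u, s, t, active);
        }
        return excess_[t];  // any excess left elsewhere cannot reach t
    }

private:
    struct Edge { int to; long long cap; int rev; };  // cap = residual capacity

    int n_;
    std::vector<std::vector<Edge>> g_;
    std::vector<long long> excess_;
    std::vector<int> height_;
    int relabels_ = 0;

    void push(int u, Edge &e, long long amount, int s, int t, std::queue<int> &active) {
        e.cap -= amount;
        g_[e.to][e.rev].cap += amount;
        excess_[u] -= amount;
        if (excess_[e.to] == 0 && e.to != s && e.to != t) active.push(e.to);
        excess_[e.to] += amount;
    }

    // Exact residual distances to t by BFS over reversed edges; vertices that
    // cannot reach t get height n and are ignored from then on.
    void global_relabel(int s, int t) {
        std::fill(height_.begin(), height_.end(), n_);
        height_[t] = 0;
        std::queue<int> q;
        q.push(t);
        while (!q.empty()) {
            int v = q.front();
            q.pop();
            for (const Edge &e : g_[v]) {
                int w = e.to;  // g_[w][e.rev] is the edge w->v
                if (w != s && height_[w] == n_ && g_[w][e.rev].cap > 0) {
                    height_[w] = height_[v] + 1;
                    q.push(w);
                }
            }
        }
    }

    // Push excess along admissible edges; relabel u when none are left.
    void discharge(int u, int s, int t, std::queue<int> &active) {
        while (excess_[u] > 0 && height_[u] < n_) {
            for (Edge &e : g_[u]) {
                if (excess_[u] == 0) break;
                if (e.cap > 0 && height_[u] == height_[e.to] + 1)
                    push(u, e, std::min(excess_[u], e.cap), s, t, active);
            }
            if (excess_[u] == 0) break;
            int lowest = 2 * n_;
            for (const Edge &e : g_[u])
                if (e.cap > 0) lowest = std::min(lowest, height_[e.to]);
            height_[u] = lowest + 1;
            if (++relabels_ >= n_) {  // periodic global relabeling heuristic
                relabels_ = 0;
                global_relabel(s, t);
            }
        }
    }
};
```

For FASTFLOW's 1-based input, the intended use would be add_edge(u - 1, v - 1, c, c) per edge followed by max_flow(0, n - 1). Unlike the three V×V matrices in the question, the adjacency lists here take O(V + E) memory.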

OpenCV: Load multiple images and cluster them with bag of words

I am trying to cluster video frames into abnormal and normal. I have divided the video into frames and labeled them as normal or abnormal. I have two problems: I am not sure whether my approach is correct, and I get an unexpected error.
Please help me.
The error occurs at: bowTrainer.add(features1);
My full code is below:
// Bow.cpp : Defines the entry point for the console application.
//
#include "stdafx.h"
#include "opencv2/video/tracking.hpp"
#include "opencv2/imgproc/imgproc.hpp"
#include "opencv2/highgui/highgui.hpp"
#include "opencv2/features2d/features2d.hpp"
#include <opencv2/nonfree/features2d.hpp>
#include <opencv2/nonfree/nonfree.hpp>
#include <opencv2/legacy/legacy.hpp>
#include <windows.h>
#include "opencv2/ml/ml.hpp"
#include <stdlib.h>
#include <stdio.h>
#include <sys/stat.h>
#define _USE_MATH_DEFINES
#include <math.h>
#include <limits>
#include <cstdio>
#include <iostream>
#include <fstream>
using namespace std;
using namespace cv;
using std::vector;
using std::iostream;
int main()
{
initModule_nonfree();
Ptr<FeatureDetector> features = FeatureDetector::create("SIFT");
Ptr<DescriptorExtractor> descriptor = DescriptorExtractor::create("SIFT");
Ptr<DescriptorMatcher> matcher = DescriptorMatcher::create("FlannBased");
//defining terms for bowkmeans trainer
TermCriteria tc(CV_TERMCRIT_ITER+CV_TERMCRIT_EPS, 10, 0.001);
int dictionarySize = 100;
int retries = 1;
int flags = KMEANS_PP_CENTERS;
BOWKMeansTrainer bowTrainer(dictionarySize, tc, retries, flags);
BOWImgDescriptorExtractor bowDE(descriptor, matcher);
//**creating dictionary**//
Mat trainme(0, dictionarySize, CV_32FC1);
Mat labels(0, 1, CV_32FC1); //1d matrix with 32fc1 is requirement of normalbayesclassifier class
int i=0;
while(i<10)
{
char filename[255];
sprintf(filename, "C:\\Users\\Desktop\\New folder\\View_001\\frame_000%d.jpg",i);
Mat img = imread(filename, 0);
Mat features1;
vector<KeyPoint> keypoints;
descriptor->compute(img, keypoints, features1);
bowTrainer.add(features1);
Mat dictionary = bowTrainer.cluster();
bowDE.setVocabulary(dictionary);
Mat bowDescriptor;
bowDE.compute(img, keypoints, bowDescriptor);
trainme.push_back(bowDescriptor);
float label = 1.0;
labels.push_back(label);
i++;
}
int j=11;
while(j<21)
{
char filename2[255];
sprintf(filename2, "C:\\Users\\Desktop\\New folder\\View_001\\frame_000%d.jpg",j);
cout<<filename2;
Mat img2 = imread(filename2, 0);
Mat features2;
vector<KeyPoint> keypoints2;
descriptor->compute(img2, keypoints2, features2);
bowTrainer.add(features2);
Mat bowDescriptor2;
bowDE.compute(img2, keypoints2, bowDescriptor2);
trainme.push_back(bowDescriptor2);
float label = 2.0;
labels.push_back(label);
j++;
}
NormalBayesClassifier classifier;
classifier.train(trainme, labels);
//**classifier trained**//
//**now trying to predict using the same trained classifier, it should return 1.0**//
Mat tryme(0, dictionarySize, CV_32FC1);
Mat tryDescriptor;
Mat img3 = imread("C:\\Users\\Desktop\\New folder\\View_001\\frame_0121.jpg", 0);
vector<KeyPoint> keypoints3;
features->detect(img3, keypoints3);
bowDE.compute(img3, keypoints3, tryDescriptor);
tryme.push_back(tryDescriptor);
cout<<classifier.predict(tryme)<<endl;
waitKey(0);
system("PAUSE");
return 0;
}
