Looks like I finally improved map insertion speed a little (by sorting before inserting). What do you think about these results? Are there any more optimisations?
#include <map>
#include <vector>
#include <iostream>
#include <algorithm>
#include <cstdlib>
#include <ctime>

int main (int argc, char* argv []) {
    // a map<size_t, size_t> random initialisation
    std::map<size_t, size_t> m0, m1;
    {
        std::vector<std::pair<size_t, size_t> > t (10000, std::pair<size_t, size_t> ((size_t) -1, (size_t) -1));
        std::vector<std::pair<size_t, size_t> >::iterator i (t.begin ());
        for (; i != t.end (); ++i) {
            i->first = rand () % 1000000;
            i->second = rand () % 1;
        }
        m0.insert (t.begin (), t.end ());
        m1 = m0;
    }
    // vins: the batch of new elements to insert
    std::vector<std::pair<size_t, size_t> > vins (10000, std::pair<size_t, size_t> (0, 0));
    {
        std::vector<std::pair<size_t, size_t> >::iterator i (vins.begin ());
        for (; i != vins.end (); ++i) {
            i->first = rand () % 1000000;
            i->second = rand () % 1;
        }
    }
    // normal insertion
    clock_t t0 (clock ()), t1 (t0);
    {
        m0.insert (vins.begin (), vins.end ());
    }
    t1 = clock ();
    std::cout << "normal insertion took " << (size_t) (t1 - t0) << " ticks" << std::endl;
    // sort + hint insertion
    t0 = t1;
    {
        std::sort (vins.begin (), vins.end (), [] (const std::pair<size_t, size_t>& p0, const std::pair<size_t, size_t>& p1) -> bool {
            return p0.first < p1.first;
        });
        std::map<size_t, size_t>::iterator ihint (m1.begin ());
        // improved and more C++11 solution
        std::for_each (vins.begin (), vins.end (), [&ihint, &m1] (std::pair<size_t, size_t>& p) {
            ihint = m1.insert (ihint, p);
        });
    }
    t1 = clock ();
    std::cout << "insertion after sorting took " << (size_t) (t1 - t0) << " ticks" << std::endl;
    if (m0 != m1) std::cout << "but insertion is nok" << std::endl;
    else std::cout << "and insertion is ok" << std::endl;
}
A result on a Lenovo ThinkCentre:
normal insertion took 2355 ticks
insertion after sorting took 1706 ticks
and insertion is ok
If you don't need your items to be ordered, then std::unordered_map is usually a better choice than std::map: most implementations use a hash table rather than a balanced tree, which means that operations are on average O(1) rather than O(log n). In addition, it can get a performance boost from cache locality, as the underlying array is usually more cache-friendly than a node-based tree.
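As a minimal sketch of the swap (assuming the same key distribution as the benchmark above; the 20000 size and the % 1000 values are illustrative, and a hash table gains nothing from sorted, hinted insertion):

#include <unordered_map>
#include <cstdlib>

int main ()
{
    std::unordered_map<size_t, size_t> m;
    m.reserve (20000); // pre-sizing the bucket array avoids rehashing during insertion
    for (int k = 0; k < 20000; ++k)
        m.emplace (rand () % 1000000, rand () % 1000); // same key range as the benchmark
    return 0;
}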
Enabling -O3 (or a similar optimisation level) might increase the performance of your code, but is unlikely to affect the performance of the map's own insertion/find operations (as that code has already been compiled). Bear in mind that -Ofast means the compiler no longer needs to strictly adhere to the standard, which is usually not a great idea: use it if performance is critical and you've checked that your code works as expected with it, but usually -O3 is enough.
On my machine (Debian, g++ 6.3.0), using a few runs and taking rough averages:
Configuration              Normal Insertion   Hint Insertion
-------------------------  -----------------  ---------------
std::map, -O0              9750               9200
std::map, -O3              8000               4250
std::unordered_map, -O0    7000               9700
std::unordered_map, -O3    4200               5000
Since move-assigning a std::vector is an O(1) operation and copying a std::vector is O(N) (where N is the sum of the sizes of the two vectors), I expected move-assignment to have a significant performance advantage over copying. To test this, I wrote the following code, which move-assigns/copies a std::vector nums2 of size 1000 to nums 100,000 times.
#include <iostream>
#include <vector>
#include <chrono>
#include <ctime>
#include <cstdlib>
using namespace std;

int main()
{
    auto start = clock();
    vector<int> nums;
    for (int i = 0; i < 100000; ++i) {
        vector<int> nums2(1000);
        for (int j = 0; j < 1000; ++j) {
            nums2[j] = rand();
        }
        nums = nums2; // or nums = move(nums2);
        cout << (nums[0] ? 1 : 0) << "\b \b"; // prevent the compiler from optimizing out nums (I think)
    }
    cout << "Time: " << (clock() - start) / (CLOCKS_PER_SEC / 1000) << '\n';
    return 0;
}
The compiler I am using is g++ 7.5.0. When running with g++ -std=c++1z -O3, both the move-assign and copy versions take around 1600 ms, which does not match the hypothesis that move-assignment has any significant performance benefit. I then tested std::swap(nums, nums2) (as an alternative to move-assignment), but that also took around the same time.
So, my question is: why doesn't move-assigning one std::vector to another seem to have a performance advantage over copy-assignment? Do I have a fundamental mistake in my understanding of C++ move-assignment?
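For reference, a minimal sketch (a hypothetical variant, not from the original post) that times only the assignment itself would separate the O(N) fill of nums2 from the copy/move being measured:

#include <chrono>
#include <cstdlib>
#include <iostream>
#include <utility>
#include <vector>

int main()
{
    using namespace std::chrono;
    std::vector<int> nums;
    nanoseconds assignTotal{0};
    for (int i = 0; i < 100000; ++i) {
        std::vector<int> nums2(1000);
        for (int j = 0; j < 1000; ++j)
            nums2[j] = std::rand(); // the O(N) fill dominates each iteration
        auto t0 = steady_clock::now();
        nums = std::move(nums2); // or: nums = nums2; for the copy case
        assignTotal += duration_cast<nanoseconds>(steady_clock::now() - t0);
    }
    std::cout << "assignment only: " << assignTotal.count() << " ns\n";
    return 0;
}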
I wrote a small test to figure out the fastest mathematical operation for a particular x. I wanted x to be entered by the user, so that I can run the tests for different values of x. The following code gives me an error on the line std::cin >> val;
error: cannot bind 'std::istream {aka std::basic_istream}' lvalue to 'std::basic_istream&&'
If I declare val as double val instead of const double val, I get more errors. What can I change in order to have a running program?
#include <cmath>
#include <chrono>
#include <iomanip>
#include <iostream>
#include <istream>
#include <ostream>

// for x^1.5
double test_pow_15(double x) { return std::pow(x, 1.5); }
double test_chain_15(double x) { return sqrt(x * x * x); }
double test_tmp_15(double x) { double tmp = x * x * x; return sqrt(tmp); }

volatile double sink;
const double val = 0;
const double ans_15 = std::pow(val, 1.5);

void do_test(const char* name, double(&fn)(double), const double ans) {
    auto start = std::chrono::high_resolution_clock::now();
    for (size_t n = 0; n < 1000 * 1000 * 10; ++n) {
        sink = val;
        sink = fn(sink);
    }
    auto end = std::chrono::high_resolution_clock::now();
    std::chrono::duration<double, std::milli> dur = end - start;
    std::cout << name << ".Took" << dur.count() << "ms, error:" << sink - ans << '\n';
}

int main()
{
    std::cout << "Speed test" << '\n';
    std::cout << "Please enter value for x." << '\n';
    std::cout << "x = ";
    std::cin >> val; // error: val is const
    std::cout << "Speed test starts for x = " << val << "." << '\n';
    std::cout << " " << '\n';
    std::cout << "For " << val << "^(1.5) the speed is:" << '\n';
    do_test("std::pow(x,1.5) ", test_pow_15, ans_15);
    do_test("sqrt(x*x*x) ", test_chain_15, ans_15);
    do_test("tmp = x*x*x; sqrt(tmp) ", test_tmp_15, ans_15);
    return 0;
}
I think that if you remove the const keyword, it will probably work fine:
double val = 0;
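A minimal sketch of the fix (assuming the reference answer ans_15 should also be computed after val has been read, rather than at static initialisation, so that the reported error term is meaningful):

#include <cmath>
#include <iostream>

double val = 0; // non-const, so std::cin >> val can write to it

int main()
{
    std::cin >> val;
    const double ans_15 = std::pow(val, 1.5); // computed once val is known
    std::cout << "reference answer: " << ans_15 << '\n';
    return 0;
}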
I am doing some parallel programming with async. I have an integrator, and in a test program I wanted to see whether dividing a vector into 4 subvectors actually takes one fourth of the time to complete the task.
I had an initial issue about the time measured, now solved, as steady_clock() measures real time and not CPU time.
I tried the code with different vector lengths. For short lengths (< 10e5 elements) the direct integration is faster: normal, as the .get() calls and the sum take their time.
For intermediate lengths (about 1e8 elements) the integration followed the expected time, giving 1 s for the first run and 0.26 s for the second.
For long vectors (10e9 elements or higher) the second integration takes much more time than the first: more than 3 s against a similar or greater first time.
Why? What is the process that makes the divide and conquer routine slower?
A couple of additional notes: please note that I pass the vectors by reference, so that cannot be the issue, and keep in mind that this is test code, thus the subvector creation is not the point of the question.
#include <iostream>
#include <vector>
#include <thread>
#include <future>
#include <ctime>
#include <chrono>
using namespace std;
using namespace chrono;

typedef steady_clock::time_point tt;

double integral(const std::vector<double>& v, double dx) // simpson 1/3
{
    int n = v.size();
    double in = 0.;
    if (n % 2 == 1) { in += v[n-1]*v[n-1]; n--; }
    in += (v[0]*v[0]) + (v[n-1]*v[n-1]); // += so the odd-length term above is not discarded
    for (int i = 1; i < n/2; i++)
        in += 2.*v[2*i] + 4.*v[2*i+1];
    return in*dx/3.;
}

int main()
{
    double h = 0.001;
    vector<double> v1(100000, h); // a vector, content is not important
    // subvectors
    vector<double> sv1(v1.begin(), v1.begin() + v1.size()/4),
                   sv2(v1.begin() + v1.size()/4 + 1, v1.begin() + 2*v1.size()/4),
                   sv3(v1.begin() + 2*v1.size()/4 + 1, v1.begin() + 3*v1.size()/4 + 1),
                   sv4(v1.begin() + 3*v1.size()/4 + 1, v1.end());
    double a, b;

    cout << "f1" << endl;
    tt bt1 = chrono::steady_clock::now();
    // complete integration: should take time t
    a = integral(v1, h);
    tt et1 = chrono::steady_clock::now();
    duration<double> time_span = duration_cast<duration<double>>(et1 - bt1);
    cout << time_span.count() << endl;

    future<double> f1, f2, f3, f4;
    cout << "f2" << endl;
    tt bt2 = chrono::steady_clock::now();
    // four integrations: should take time t/4
    f1 = async(launch::async, integral, ref(sv1), h);
    f2 = async(launch::async, integral, ref(sv2), h);
    f3 = async(launch::async, integral, ref(sv3), h);
    f4 = async(launch::async, integral, ref(sv4), h);
    b = f1.get() + f2.get() + f3.get() + f4.get();
    tt et2 = chrono::steady_clock::now();
    duration<double> time_span2 = duration_cast<duration<double>>(et2 - bt2);
    cout << time_span2.count() << endl;

    cout << a << " " << b << endl;
    return 0;
}
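As an aside, a sketch with a hypothetical integral_range helper (a plain accumulation, not the Simpson routine above) shows how the quarter ranges could be integrated in place, avoiding the subvector copies entirely:

#include <future>
#include <iostream>
#include <vector>

// Hypothetical helper: accumulate over [lo, hi) of the original vector.
double integral_range(const std::vector<double>& v, size_t lo, size_t hi, double dx)
{
    double in = 0.;
    for (size_t i = lo; i < hi; ++i)
        in += v[i];
    return in * dx;
}

double parallel_integral(const std::vector<double>& v, double dx)
{
    size_t q = v.size() / 4; // quarter boundaries; no elements copied or skipped
    auto f1 = std::async(std::launch::async, integral_range, std::cref(v), size_t(0), q, dx);
    auto f2 = std::async(std::launch::async, integral_range, std::cref(v), q, 2 * q, dx);
    auto f3 = std::async(std::launch::async, integral_range, std::cref(v), 2 * q, 3 * q, dx);
    auto f4 = std::async(std::launch::async, integral_range, std::cref(v), 3 * q, v.size(), dx);
    return f1.get() + f2.get() + f3.get() + f4.get();
}

int main()
{
    std::vector<double> v(100000, 0.001);
    std::cout << parallel_integral(v, 0.001) << '\n';
    return 0;
}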
Suppose I have an unsorted integer array {3, -1, 4, 5, -3, 2, 5}, and I want to find the maximum non-repeating number (4 in this case; 5 is invalid because it is repeated). How can I achieve this?
Use an unordered map to count the frequency of each element. (As an optimization, keep track of the largest element encountered so far and skip elements lower than that.) Then scan the map for the largest element with frequency exactly equal to 1.
#include <limits>
#include <unordered_map>
#include <utility>
#include <vector>

template <typename T> // numeric T
std::pair<T, bool> FindMaxNonRepeating(std::vector<T> const& vec) {
    // Count the frequency of every element.
    std::unordered_map<T, int> elem2freq;
    for (auto const& elem : vec) {
        elem2freq[elem] += 1;
    }
    // Scan the map for the largest element that occurs exactly once.
    T largest_non_repetitive = std::numeric_limits<T>::min();
    bool found = false;
    for (auto const& item : elem2freq) {
        if (item.first > largest_non_repetitive && item.second == 1) {
            largest_non_repetitive = item.first;
            found = true;
        }
    }
    return {largest_non_repetitive, found};
}
This runs in O(n) time and requires O(n) space.
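A possible usage, together with the FindMaxNonRepeating template above (structured bindings require C++17; the sample array is the one from the question):

#include <iostream>
#include <vector>

int main()
{
    std::vector<int> v{3, -1, 4, 5, -3, 2, 5};
    auto [value, found] = FindMaxNonRepeating(v);
    if (found)
        std::cout << value << '\n'; // prints 4
    return 0;
}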
1. Sort the array in descending order.
2. Begin from the top element and store it in a variable, say max.
3. Check the next element against max: if they are equal, skip past the whole run of equal values and repeat with the next candidate; otherwise, you have found the max non-repeated number.

Time complexity: O(n log n)
C++ implementation, based on my Sort (C++):
#include <algorithm>
#include <iostream>
#include <vector>
#include <limits>
#include <cstddef>
using namespace std;

void printVector(vector<int>& v)
{
    for (vector<int>::iterator it = v.begin(); it != v.end(); it++)
        cout << *it << ' ';
    cout << endl;
}

bool compar(const int& a, const int& b)
{
    return a > b; // descending order
}

int main()
{
    vector<int> v = {3, -1, 4, 5, -3, 2, 5};
    cout << "Before sorting : " << endl;
    printVector(v);

    sort(v.begin(), v.end(), compar);
    cout << endl << "After sorting : " << endl;
    printVector(v);

    int max_non_repeat = numeric_limits<int>::min();
    // Walk the sorted array; the first value that is not part of a
    // run of equal elements is the maximum non-repeated number.
    for (size_t i = 0; i < v.size(); )
    {
        size_t j = i;
        while (j < v.size() && v[j] == v[i]) // skip the whole run of equal values
            ++j;
        if (j - i == 1) { max_non_repeat = v[i]; break; }
        i = j;
    }
    cout << "Max non-repeated element: " << max_non_repeat << endl;
    return 0;
}
Output:
C02QT2UBFVH6-lm:~ gsamaras$ g++ -Wall -std=c++0x main.cpp
C02QT2UBFVH6-lm:~ gsamaras$ ./a.out
Before sorting :
3 -1 4 5 -3 2 5
After sorting :
5 5 4 3 2 -1 -3
Max non-repeated element: 4
For maximum pleasure, base a different approach on How to find max. and min. in array using minimum comparisons? and modify it accordingly.
I wrote the following for a class, but came across some strange behavior while testing it. arrayProcedure is meant to do things with an array based on the 2 "tweaks" at the top of the function (arrSize and start). For the assignment, arrSize must be 10,000 and start, 100. Just for kicks, I decided to see what happens if I increase them, and for some reason, if arrSize exceeds around 60,000 (I haven't found the exact limit), the program immediately crashes with a stack overflow when using a debugger:
Unhandled exception at 0x008F6977 in TMA3Question1.exe: 0xC00000FD: Stack overflow (parameters: 0x00000000, 0x00A32000).
If I just run it without a debugger, I don't get any helpful errors; Windows hangs for a fraction of a second, then tells me TMA3Question1.exe has stopped working.
I decided to play around with debugging it, but that didn't shed any light. I placed breaks above and below the call to arrayProcedure, as well as peppered inside of it. When arrSize doesn't exceed 60,000 it runs fine: It pauses before calling arrayProcedure, properly waits at all the points inside of it, then pauses on the break underneath the call.
If I raise arrSize however, the break before the call happens, but it appears as though it never even steps into arrayProcedure; it immediately gives me a stack overflow without pausing at any of the internal breakpoints.
The only thing I can think of is that the resulting arrays exceed my computer's available memory, but that doesn't seem likely for a couple of reasons:
It should only use just under a megabyte:
sizeof(double) = 8 bytes
8 * 60000 = 480000 bytes per array
480000 * 2 = 960000 bytes for both arrays
As far as I know, arrays aren't immediately constructed when a function is entered; they're allocated on definition. I placed several breakpoints before the arrays are even declared, and they are never reached.
Any light that you could shed on this would be appreciated.
The code:
#include <iostream>
#include <ctime>
#include <climits>

// CLOCKS_PER_SEC is a macro supplied by ctime
double msBetween(clock_t startTime, clock_t endTime) {
    return (endTime - startTime) * 1000.0 / CLOCKS_PER_SEC; // parenthesised: ticks -> milliseconds
}

void initArr(double arr[], int start, int length, int step) {
    for (int i = 0, j = start; i < length; i++, j += step) {
        arr[i] = j;
    }
}

// The function we're going to inline in the next question
void helper(double a1, double a2) {
    std::cout << a1 << " * " << a2 << " = " << a1 * a2 << std::endl;
}

void arrayProcedure() {
    const int arrSize = 70000;
    const int start = 1000000;

    std::cout << "Checking..." << std::endl;
    if (arrSize > INT_MAX) {
        std::cout << "Given arrSize is too high and exceeds the INT_MAX of: " << INT_MAX << std::endl;
        return;
    }

    double arr1[arrSize];
    double arr2[arrSize];
    initArr(arr1, start, arrSize, 1);
    initArr(arr2, arrSize + start - 1, arrSize, -1);
    for (int i = 0; i < arrSize; i++) {
        helper(arr1[i], arr2[i]);
    }
}

int main(int argc, char* argv[]) {
    using namespace std;
    const clock_t startTime = clock();
    arrayProcedure();
    clock_t endTime = clock();
    cout << endTime << endl;

    double elapsedTime = msBetween(startTime, endTime);
    cout << "\n\n" << elapsedTime << " milliseconds. ("
         << elapsedTime / 60000 << " minutes)\n";
}
The default stack size is 1 MB with Visual Studio.
https://msdn.microsoft.com/en-us/library/tdkhxaks.aspx
You can increase the stack size, or allocate the arrays on the heap with the new operator:
double *arr1 = new double[arrSize];
double *arr2 = new double[arrSize];
...
delete [] arr1;
delete [] arr2;
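Alternatively (a sketch, not part of the original answer), std::vector keeps its elements on the heap and releases them automatically, so no delete [] is needed:

#include <vector>

void arrayProcedure()
{
    const int arrSize = 70000;
    // Heap-allocated storage, freed when the vectors go out of scope;
    // the 1 MB default stack limit no longer applies.
    std::vector<double> arr1(arrSize);
    std::vector<double> arr2(arrSize);
    // ... fill and use arr1/arr2 as before ...
}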