I'm new here, and I have a problem I can't work out:
I get a vector of any size as input, but for this case let's take this one:
vetor = {1, 2, 3, 4}
All I want to do is take these numbers, weight each one by its place value (units, tens, hundreds, thousands), sum them, and store the result in an integer variable, here 'int vec_value'.
For the vector above, the answer should be: vec_value = 4321.
I'll leave main.cpp attached to the post. Below is how I calculated the result, which gave me the wrong answer.
vetor[0] = 1
vetor[1] = 2
vetor[2] = 3
vetor[3] = 4
The result should be (1*10^0)+(2*10^1)+(3*10^2)+(4*10^3) = 1 + 20 + 300 + 4000 = 4321.
The program gives me 4320 instead. If I change the values randomly, the answer follows the new values but is still wrong.
If anyone could take a look at my code to see what I'm doing wrong, I'd appreciate it a lot!
Thanks.
There's a link to a picture at the end of the post showing an example of a wrong result.
Keep in mind that sometimes the program gives me the right answer (which leaves me even more confused).
Code:
#include <iostream>
#include <ctime>
#include <cstdlib>
#include <vector>
#include <cmath>
using namespace std;
int main()
{
vector<int> vetor;
srand(time(NULL));
int lim = rand() % 2 + 3; //the minimum size must be 3 and the maximum must be 4
int value;
for(int i=0; i<lim; i++)
{
value = rand() % 8 + 1; // I'm giving random values to each position of the vector
vetor.push_back(value);
cout << "\nPos [" << i << "]: " << vetor[i]; //just to keep in mind what are the elements inside the vector
}
int vec_value=0;
for(int i=0; i<lim; i++)
{
vec_value += vetor[i] * pow(10, i); //here i wrote the formula to sum each element of the vector with the correspondent unity, tens, hundreds or thousands
}
cout << "\n\nValor final: " << vec_value; //to see what result the program will give me
return 0;
}
Example of the program
Try this for the main loop:
int power = 1;
for(int i=0; i<lim; i++)
{
vec_value += vetor[i] * power;
power *= 10;
}
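This way, all the computations are done in integers, so you are not affected by floating-point rounding. pow returns a double, and converting a result that is slightly below the exact value (for example 99.999...) to int truncates it, which is how 4321 can become 4320.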
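The same integer-only loop can be wrapped in a small function. This is a minimal sketch, assuming a hypothetical helper named `digits_to_int` (not part of the original post), digits stored least-significant first, and a result small enough to fit in an `int`:

```cpp
#include <vector>

// Hypothetical helper (not from the original post): combines digits stored
// least-significant first, so {1, 2, 3, 4} becomes 4321.
// Assumes every element is in 0..9 and that the result fits in an int.
int digits_to_int(const std::vector<int>& digits)
{
    int value = 0;
    int power = 1;              // 10^i, kept as an exact integer
    for (int d : digits) {
        value += d * power;     // no floating-point step anywhere
        power *= 10;
    }
    return value;
}
```

Calling `digits_to_int(vetor)` in place of the `pow` loop should give the same value on every run for the same digits.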
I am very new to C++11 and am learning about the STL. I have written the following code:
#include <bits/stdc++.h>
#include <vector>
#include <algorithm>
#include <iterator>
using namespace std;
void Print( const vector<int> &arrays )
{
for ( int x : arrays ) cout << x << ' ';
}
int main() {
int citys, cityPairs, fv, lv, w;
vector <int> fvarr;
vector <int> lvarr;
vector <int> warr;
vector <int> warr_temp;
vector <int> disjoint_pairs;
scanf("%d%d", &citys, &cityPairs);
for(int nr = 0; nr < cityPairs; nr++){
scanf("%d%d%d", &fv, &lv, &w);
fvarr.push_back(fv);
lvarr.push_back(lv);
warr.push_back(w);
warr_temp = warr;
}
for (int j = 0; j < citys; j++){
auto result = min_element(begin(warr_temp), end(warr_temp));
auto pos_temp = distance(begin(warr_temp), result);
cout << pos_temp;
auto pos = distance(begin(warr), result);
cout << pos;
disjoint_pairs.push_back(fvarr[pos]);
disjoint_pairs.push_back(lvarr[pos]);
warr_temp.erase(warr_temp.begin() + pos_temp);
}
// Print(disjoint_pairs);
}
In this code I fill three vectors and copy the last one into a fourth (warr_temp = warr;). Then I find the minimum value in warr_temp and store its index in pos_temp. Next I try to store the index of that minimum value within warr in pos.
The problem is that the first cout, which prints pos_temp, gives me correct values, but the second one, which prints pos, gives output like this:
-61-62-63-64
Why is this happening? What are these numbers? Are they pointers? I know that distance is a template, so what is the right way to do this?
If anyone can clear up my doubts, that would be very helpful.
Sorry if this is a stupid question!
The root cause of the problem is the line auto pos = distance(begin(warr), result);. Its behavior is undefined because result and begin(warr) belong to different vectors.
result is an iterator pointing to an element of warr_temp; it cannot be mixed with iterators pointing into warr, such as begin(warr).
To get the element's position in warr, use std::find(begin(warr), end(warr), *result) instead (if the value occurs more than once, std::find returns the first match):
auto warr_res = std::find(begin(warr), end(warr), *result);
auto pos = distance(begin(warr), warr_res);
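The lookup can be isolated in a small function so that no iterator ever crosses from one vector to the other. This is a rough sketch; the helper name `position_of_min` and the "return size() when nothing is found" convention are assumptions made for illustration:

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <vector>

// Hypothetical helper: position in `original` of the smallest value found in
// `remaining`. Returns original.size() if `remaining` is empty or the value
// does not occur in `original`.
std::size_t position_of_min(const std::vector<int>& original,
                            const std::vector<int>& remaining)
{
    if (remaining.empty()) return original.size();
    // Iterator into `remaining`: only its value is carried over.
    auto smallest = std::min_element(remaining.begin(), remaining.end());
    // Iterator into `original`: safe to subtract from original.begin().
    auto match = std::find(original.begin(), original.end(), *smallest);
    return static_cast<std::size_t>(std::distance(original.begin(), match));
}
```

With duplicate weights this always reports the first occurrence, so a design that tracks indices directly may be a better fit if repeated values matter.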
I am using rand and srand from cstdlib, with g++ as the compiler. I was playing around trying to generate some pseudo-random numbers and was getting unexpectedly biased results. I was curious, so I wrote a simple function. The expected behavior is that a random number between 1 and 10 is generated and printed to the screen 100 times, and the average should be about 5.5. However, when I run this function it generates a single random number between 1 and 10 and prints it 100 times, with the average equal to that number.
#include <iostream>
#include <cstdlib>
#include <ctime>
using namespace std;
float bs(){
float random;
srand(time(0));
random = rand() % 10 + 1;
return random;
}
int main(){
float average;
float random;
for (int i = 1; i < 101; ++i)
{
random += bs();
cout << random << endl;
}
average = random/100;
cout << average << endl;
return 0;
}
If the first call to bs() returns 7, it stays 7 for the duration of the loop, every time bs() is called. The output is 7 added to itself 100 times, and the average is equal to (gasp) 7. What is going on here?
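The seed should be applied only once. time(0) has one-second resolution, so every call to bs() within the same second reseeds the generator with the same value, and rand() then returns the same first number. Move the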
srand(time(0));
to main before the loop.
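The effect of reseeding can be reproduced deterministically with a fixed seed. The sketch below uses two hypothetical helpers, `first_roll_after_seed` and `roll`, which are illustrations rather than part of the original code:

```cpp
#include <cstdlib>

// Hypothetical helper: reseeds with `seed`, then returns the first value in 1..10.
// Reseeding with an identical seed restarts the same sequence, so the result
// is the same on every call with that seed.
int first_roll_after_seed(unsigned int seed)
{
    std::srand(seed);
    return std::rand() % 10 + 1;
}

// Draws one value in 1..10 without touching the seed. Call std::srand once
// (for example at the top of main) and then call this in the loop.
int roll()
{
    return std::rand() % 10 + 1;
}
```

Replacing `bs()` with something like `roll()`, after a single `srand(time(0))` in `main`, should produce varying values.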
Problem
Provided I have two arrays:
const int N = 1000000;
float A[N];
myStruct *B[N];
The numbers in A can be positive or negative (e.g. A[N]={3,2,-1,0,5,-2}). How can I partially sort A on the GPU so that all non-negative values come first and all negative values after them, with no ordering required within each group (e.g. A[N]={3,2,5,0,-1,-2} or A[N]={5,2,3,0,-2,-1})? The array B should be rearranged in the same way as A (A holds the keys, B the values).
Since A and B can be very large, I think the sort should be implemented on the GPU (specifically with CUDA, since that is the platform I use). I know thrust::sort_by_key can do this, but it does much extra work, since I do not need A and B to be sorted entirely.
Has anyone come across this kind of problem?
Thrust example
thrust::sort_by_key(thrust::device_ptr<float> (A),
thrust::device_ptr<float> ( A + N ),
thrust::device_ptr<myStruct> ( B ),
thrust::greater<float>() );
Thrust's documentation on GitHub is not up to date. As @JaredHoberock said, thrust::partition is the way to go since it now supports stencils. You may need to get a copy from the GitHub repository:
git clone git://github.com/thrust/thrust.git
Then run scons doc in the Thrust folder to build the updated documentation, and use these updated Thrust sources when compiling your code (nvcc -I/path/to/thrust ...). With the new stencil partition, you can do:
#include <thrust/partition.h>
#include <thrust/execution_policy.h>
#include <thrust/iterator/zip_iterator.h>
#include <thrust/tuple.h>
struct is_positive
{
__host__ __device__
bool operator()(const int &x)
{
return x >= 0;
}
};
thrust::partition(thrust::host, // if you want to test on the host
thrust::make_zip_iterator(thrust::make_tuple(keyVec.begin(), valVec.begin())),
thrust::make_zip_iterator(thrust::make_tuple(keyVec.end(), valVec.end())),
keyVec.begin(),
is_positive());
This returns:
Before:
keyVec = 0 -1 2 -3 4 -5 6 -7 8 -9
valVec = 0 1 2 3 4 5 6 7 8 9
After:
keyVec = 0 2 4 6 8 -5 -3 -7 -1 -9
valVec = 0 2 4 6 8 5 3 7 1 9
Note that the 2 partitions are not necessarily sorted. Also, the order may differ between the original vectors and the partitions. If this is important to you, you can use thrust::stable_partition:
stable_partition differs from partition in that stable_partition is
guaranteed to preserve relative order. That is, if x and y are
elements in [first, last), such that pred(x) == pred(y), and if x
precedes y, then it will still be true after stable_partition that x
precedes y.
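The difference in ordering guarantees can be tried on the host with the standard library, whose `std::stable_partition` makes the same promise. This is only a sketch: the `KeyValue` alias, the `int` payload type, and the helper name `partition_by_sign` are assumptions, and it says nothing about GPU performance:

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

using KeyValue = std::pair<float, int>;  // (key, value); the int payload is an assumption

// Moves pairs with non-negative keys to the front while preserving the
// relative order inside each group. Returns the number of non-negative keys.
std::size_t partition_by_sign(std::vector<KeyValue>& items)
{
    auto boundary = std::stable_partition(
        items.begin(), items.end(),
        [](const KeyValue& kv) { return kv.first >= 0.0f; });
    return static_cast<std::size_t>(boundary - items.begin());
}
```

Because keys and values travel together in one pair, the values follow their keys automatically, which is the role the zip iterator plays in the Thrust version.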
If you want a complete example, here it is:
#include <iostream>
#include <thrust/host_vector.h>
#include <thrust/device_vector.h>
#include <thrust/partition.h>
#include <thrust/iterator/zip_iterator.h>
#include <thrust/tuple.h>
struct is_positive
{
__host__ __device__
bool operator()(const int &x)
{
return x >= 0;
}
};
void print_vec(const thrust::host_vector<int>& v)
{
for(size_t i = 0; i < v.size(); i++)
std::cout << " " << v[i];
std::cout << "\n";
}
int main ()
{
const int N = 10;
thrust::host_vector<int> keyVec(N);
thrust::host_vector<int> valVec(N);
int sign = 1;
for(int i = 0; i < N; ++i)
{
keyVec[i] = sign * i;
valVec[i] = i;
sign *= -1;
}
// Copy host to device
thrust::device_vector<int> d_keyVec = keyVec;
thrust::device_vector<int> d_valVec = valVec;
std::cout << "Before:\n keyVec = ";
print_vec(keyVec);
std::cout << " valVec = ";
print_vec(valVec);
// Partition key-val on device
thrust::partition(thrust::make_zip_iterator(thrust::make_tuple(d_keyVec.begin(), d_valVec.begin())),
thrust::make_zip_iterator(thrust::make_tuple(d_keyVec.end(), d_valVec.end())),
d_keyVec.begin(),
is_positive());
// Copy result back to host
keyVec = d_keyVec;
valVec = d_valVec;
std::cout << "After:\n keyVec = ";
print_vec(keyVec);
std::cout << " valVec = ";
print_vec(valVec);
}
UPDATE
I made a quick comparison with the thrust::sort_by_key version, and the thrust::partition implementation does seem to be faster (which is what we could naturally expect). Here is what I obtain on NVIDIA Visual Profiler, with N = 1024 * 1024, with the sort version on the left, and the partition version on the right. You may want to do the same kind of tests on your own.
How about this?
1. Count the positive numbers to determine the inflexion point.
2. Evenly divide each side of the inflexion point into groups (the negative groups all have the same length, which may differ from the length of the positive groups; these groups are the memory chunks for the results).
3. Use one kernel call (one thread) per chunk pair.
4. Each kernel swaps any out-of-place elements in the input groups into the desired output groups. You will need to flag any chunks that have more swaps than the maximum so that you can fix them during subsequent iterations.
5. Repeat until done.
Memory traffic is swaps only (from the original element position to the sorted position). I don't know whether this algorithm matches anything already defined...
You should be able to achieve this in thrust simply with a modification of your comparison operator:
struct my_compare
{
__device__ __host__ bool operator()(const float x, const float y) const
{
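// x goes before y only when x is non-negative and y is negative;
// this keeps comp(x, x) false, as a strict weak ordering requires
return (x >= 0.0f) && (y < 0.0f);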
}
};
thrust::sort_by_key(thrust::device_ptr<float> (A),
thrust::device_ptr<float> ( A + N ),
thrust::device_ptr<myStruct> ( B ),
my_compare() );
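A comparator handed to a sort generally has to be a strict weak ordering, which in particular means comp(x, x) must be false. The sketch below shows one comparator that only separates non-negative from negative values and checks it on the host with `std::stable_sort`. The names `sign_compare` and `sort_by_sign` are hypothetical, and this host version is not a statement about how Thrust behaves:

```cpp
#include <algorithm>
#include <vector>

// Orders any non-negative value before any negative one. Values with the same
// sign compare as equivalent, and comp(x, x) is always false.
struct sign_compare
{
    bool operator()(float x, float y) const
    {
        return (x >= 0.0f) && (y < 0.0f);
    }
};

// Host-side check of the comparator. std::stable_sort keeps equivalent
// elements (same sign) in their original relative order.
void sort_by_sign(std::vector<float>& values)
{
    std::stable_sort(values.begin(), values.end(), sign_compare());
}
```

Since the comparator only distinguishes two classes, a partition does the same job with less work; the sort form is mainly useful when an existing sort_by_key call is already in place.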
I have been reading about and researching algorithms and formulas for scoring my user-submitted content, so that currently hot / trending items are displayed higher up the list. However, I'll admit I'm a little over my head here.
Some background on what I'm after: users upload audio to my site, and each audio has several actions:
Played
Downloaded
Liked
Favorited
Ideally I want an algorithm that lets me update an audio's score each time a new activity is logged (play, download, etc.). A download is worth more than a play, a like more than a download, and a favorite more than a like.
If possible, I would like audios older than 1 week to drop off the list quite sharply, to give newer content more of a chance of trending.
I have read about Reddit's algorithm, which looked good, but I'm in over my head on how to tweak it to use my multiple variables and to drop older items after around 7 days.
Some articles that were interesting:
https://medium.com/hacking-and-gonzo/how-reddit-ranking-algorithms-work-ef111e33d0d9 (reddits algo)
http://www.evanmiller.org/rank-hotness-with-newtons-law-of-cooling.html
Any help is appreciated!
Paul
Reddit's old formula and a little drop-off
Basically you can use Reddit's formula. Since your system only supports upvotes you could weight them, resulting in something like this:
def hotness(track)
    s = track.playedCount
    s = s + 2*track.downloadCount
    s = s + 3*track.likeCount
    s = s + 4*track.favCount
    baseScore = log(max(s,1))
    timeDiff = (now - track.uploaded).toWeeks
    if(timeDiff > 1)
        x = timeDiff - 1
        baseScore = baseScore * exp(-8*x*x)
    return baseScore
The factor exp(-8*x*x) will give you your desired drop off:
The basics behind
You can use any function that goes to zero faster than your score goes up. Since we use log on our score, even a linear function can get multiplied (as long as your score doesn't grow exponentially).
So all you need is a function that returns 1 as long as you don't want to modify the score, and drops afterwards. Our example above forms that function:
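multiplier(t) = t > 1 ? exp(-8*(t-1)*(t-1)) : 1
Here t is the age in weeks, so the exponent uses the time elapsed since the end of the first week, as in the pseudocode above. You can lower the constant 8 if you want a less steep curve.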
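As a compact illustration, the multiplier and the weighted score can be written as two small functions. This is a sketch under assumptions: the names `decay_multiplier` and `hotness` and the `steepness` parameter are hypothetical, the age is measured in weeks, and the decay exponent is counted from the end of the first week as in the pseudocode:

```cpp
#include <algorithm>
#include <cmath>

// Decay factor for an item aged `weeks`: 1 during the first week, then a
// Gaussian-shaped drop. steepness = 8 matches the pseudocode; any other
// value is a tuning choice.
double decay_multiplier(double weeks, double steepness = 8.0)
{
    if (weeks <= 1.0) return 1.0;
    const double x = weeks - 1.0;            // time since the end of week one
    return std::exp(-steepness * x * x);
}

// Weighted, log-damped score using the 1/2/3/4 weights from the pseudocode.
double hotness(unsigned plays, unsigned downloads, unsigned likes,
               unsigned favorites, double weeks)
{
    const double s = plays + 2.0 * downloads + 3.0 * likes + 4.0 * favorites;
    return std::log(std::max(s, 1.0)) * decay_multiplier(weeks);
}
```

Recomputing `hotness` whenever an action is logged (or periodically, since the decay depends on the current time) would keep the ranking current.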
Example in C++
Let's say that the probability of a given track being played in a given hour is 50%, downloaded 10%, liked 1% and favorited 0.1%. Then the following C++ program will give you an estimate of your score's behavior:
#include <iostream>
#include <fstream>
#include <random>
#include <ctime>
#include <cmath>
#include <algorithm>
struct track{
track() : uploadTime(0),playCount(0),downCount(0),likeCount(0),faveCount(0){}
std::time_t uploadTime;
unsigned int playCount;
unsigned int downCount;
unsigned int likeCount;
unsigned int faveCount;
void addPlay(unsigned int n = 1){ playCount += n;}
void addDown(unsigned int n = 1){ downCount += n;}
void addLike(unsigned int n = 1){ likeCount += n;}
void addFave(unsigned int n = 1){ faveCount += n;}
unsigned int baseScore(){
return playCount +
2 * downCount +
3 * likeCount +
4 * faveCount;
}
};
int main(){
track test;
const unsigned int dayLength = 24 * 3600;
const unsigned int weekLength = dayLength * 7;
std::mt19937 gen(std::time(0));
std::bernoulli_distribution playProb(0.5);
std::bernoulli_distribution downProb(0.1);
std::bernoulli_distribution likeProb(0.01);
std::bernoulli_distribution faveProb(0.001);
std::ofstream fakeRecord("fakeRecord.dat");
std::ofstream fakeRecordDecay("fakeRecordDecay.dat");
for(unsigned int i = 0; i < weekLength * 3; i += 3600){
test.addPlay(playProb(gen));
test.addDown(downProb(gen));
test.addLike(likeProb(gen));
test.addFave(faveProb(gen));
double baseScore = std::log(std::max<unsigned int>(1,test.baseScore()));
double timePoint = static_cast<double>(i)/weekLength;
fakeRecord << timePoint << " " << baseScore << std::endl;
if(timePoint > 1){
double x = timePoint - 1;
fakeRecordDecay << timePoint << " " << (baseScore * std::exp(-8*x*x)) << std::endl;
}
else
fakeRecordDecay << timePoint << " " << baseScore << std::endl;
}
return 0;
}
Result:
This should be sufficient for you.
I need to sort 20+ arrays, already on the GPU, each of the same length, by the same keys. I can not use sort_by_key() directly since it sorts the keys as well (making them useless to sort the next array). Here is what I tried instead:
thrust::device_vector<int> indices(N);
thrust::sequence(indices.begin(),indices.end());
thrust::sort_by_key(keys.begin(),keys.end(),indices.begin());
thrust::gather(indices.begin(),indices.end(),a_01,a_01);
thrust::gather(indices.begin(),indices.end(),a_02,a_02);
...
thrust::gather(indices.begin(),indices.end(),a_20,a_20);
This does not seem to work since gather() expects a different array for the output than for the input, i.e. this works:
thrust::gather(indices.begin(),indices.end(),a_01,o_01);
...
However, I would prefer not to allocate 20+ extra arrays for this task. I know that there is a solution using thrust::tuple, thrust::zip_iterator and thrust::sort_by_key(), similar to the one here. However, I can only combine up to 10 arrays in a tuple, so I would need to duplicate the key vector again. How would you tackle this task?
I think that the classical way to sort multiple arrays is the so-called back-to-back approach, which uses thrust::stable_sort_by_key twice. You need to create a keys vector such that elements within the same array have the same key. For example:
Elements: 10.5 4.3 -2.3 0. 55. 24. 66.
Keys: 0 0 0 1 1 1 1
In this case we have two arrays, the first with 3 elements and the second with 4 elements.
You first need to call thrust::stable_sort_by_key having the matrix values as the keys like
thrust::stable_sort_by_key(d_matrix.begin(),
d_matrix.end(),
d_keys.begin(),
thrust::less<float>());
After that, you have
Elements: -2.3 0 4.3 10.5 24. 55. 66.
Keys: 0 1 0 0 1 1 1
which means that the array elements are ordered, while the keys are not. Then you need a second call to thrust::stable_sort_by_key
thrust::stable_sort_by_key(d_keys.begin(),
d_keys.end(),
d_matrix.begin(),
thrust::less<int>());
so performing a sorting according to the keys. After that step, you have
Elements: -2.3 4.3 10.5 0 24. 55. 66.
Keys: 0 0 0 1 1 1 1
which is the final desired result.
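The two passes can also be tried on the host with `std::stable_sort`. This is a rough sketch of the same idea and not the Thrust code itself; the function name `segmented_sort` and the `Item` alias are assumptions made for illustration:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

using Item = std::pair<float, int>;  // (element, segment key)

// Sorts each segment of `elements` independently; keys[i] names the segment
// that elements[i] belongs to.
void segmented_sort(std::vector<float>& elements, std::vector<int>& keys)
{
    assert(elements.size() == keys.size());
    std::vector<Item> zipped(elements.size());
    for (std::size_t i = 0; i < elements.size(); ++i)
        zipped[i] = Item(elements[i], keys[i]);

    // Pass 1: order everything by element value.
    std::stable_sort(zipped.begin(), zipped.end(),
        [](const Item& a, const Item& b) { return a.first < b.first; });
    // Pass 2: regroup by segment key; stability keeps each segment sorted.
    std::stable_sort(zipped.begin(), zipped.end(),
        [](const Item& a, const Item& b) { return a.second < b.second; });

    for (std::size_t i = 0; i < zipped.size(); ++i) {
        elements[i] = zipped[i].first;
        keys[i] = zipped[i].second;
    }
}
```

The second pass relies on stability: it never reorders two elements with the same key, so the order established by the first pass survives inside each segment.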
Below, a full working example which considers the following problem: separately order each row of a matrix. This is a particular case in which all the arrays have the same length, but the approach works with arrays having possibly different lengths.
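#include <thrust/transform.h>
#include <thrust/iterator/counting_iterator.h>
#include <thrust/iterator/constant_iterator.h>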
#include <thrust/host_vector.h>
#include <thrust/device_vector.h>
#include <thrust/generate.h>
#include <thrust/sort.h>
#include <thrust/functional.h>
#include <thrust/random.h>
#include <thrust/sequence.h>
#include <stdio.h>
#include <iostream>
#include "Utilities.cuh"
/**************************************************************/
/* CONVERT LINEAR INDEX TO ROW INDEX - NEEDED FOR APPROACH #1 */
/**************************************************************/
template <typename T>
struct linear_index_to_row_index : public thrust::unary_function<T,T> {
T Ncols; // --- Number of columns
__host__ __device__ linear_index_to_row_index(T Ncols) : Ncols(Ncols) {}
__host__ __device__ T operator()(T i) { return i / Ncols; }
};
/********/
/* MAIN */
/********/
int main()
{
const int Nrows = 5; // --- Number of rows
const int Ncols = 8; // --- Number of columns
// --- Random uniform integer distribution between 10 and 99
thrust::default_random_engine rng;
thrust::uniform_int_distribution<int> dist(10, 99);
// --- Matrix allocation and initialization
thrust::device_vector<float> d_matrix(Nrows * Ncols);
for (size_t i = 0; i < d_matrix.size(); i++) d_matrix[i] = (float)dist(rng);
// --- Print result
printf("Original matrix\n");
for(int i = 0; i < Nrows; i++) {
std::cout << "[ ";
for(int j = 0; j < Ncols; j++)
std::cout << d_matrix[i * Ncols + j] << " ";
std::cout << "]\n";
}
/*************************/
/* BACK-TO-BACK APPROACH */
/*************************/
thrust::device_vector<int> d_keys(Nrows * Ncols); // row indices are integers, matching thrust::less<int> below
// --- Generate row indices
thrust::transform(thrust::make_counting_iterator(0),
thrust::make_counting_iterator(Nrows*Ncols),
thrust::make_constant_iterator(Ncols),
d_keys.begin(),
thrust::divides<int>());
// --- Back-to-back approach
thrust::stable_sort_by_key(d_matrix.begin(),
d_matrix.end(),
d_keys.begin(),
thrust::less<float>());
thrust::stable_sort_by_key(d_keys.begin(),
d_keys.end(),
d_matrix.begin(),
thrust::less<int>());
// --- Print result
printf("\n\nSorted matrix\n");
for(int i = 0; i < Nrows; i++) {
std::cout << "[ ";
for(int j = 0; j < Ncols; j++)
std::cout << d_matrix[i * Ncols + j] << " ";
std::cout << "]\n";
}
return 0;
}
Well, you really only need to allocate one extra array if you are OK with manipulating pointers to device_vector instead:
thrust::device_vector<int> indices(N);
thrust::sequence(indices.begin(),indices.end());
thrust::sort_by_key(keys.begin(),keys.end(),indices.begin());
thrust::device_vector<int> temp(N);
thrust::device_vector<int> *sorted = &temp;
thrust::device_vector<int> *pa_01 = &a_01;
thrust::device_vector<int> *pa_02 = &a_02;
...
thrust::device_vector<int> *pa_20 = &a_20;
// gather takes iterators, so pass begin() of the source and destination vectors
thrust::gather(indices.begin(), indices.end(), pa_01->begin(), sorted->begin());
pa_01 = sorted; sorted = &a_01;
thrust::gather(indices.begin(), indices.end(), pa_02->begin(), sorted->begin());
pa_02 = sorted; sorted = &a_02;
...
thrust::gather(indices.begin(), indices.end(), pa_20->begin(), sorted->begin());
pa_20 = sorted; sorted = &a_20;
Or something like that should work anyway. You would need to fix it so the temp device vector is not automatically deallocated when it goes out of scope -- I suggest allocating the CUDA device pointers using cudaMalloc and then wrapping them with device_ptr instead of using automatic device_vectors.
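The single-scratch-buffer idea can be sketched on the host with `std::vector`, swapping buffers instead of juggling pointers. The helper name `gather_in_place` is hypothetical. The same pattern (gather into a scratch vector, then swap) may carry over to device vectors, but that is an assumption and untested here:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Reorders `data` so that data[i] becomes the old data[indices[i]], using
// `scratch` as the only extra storage. Assumes `indices` is a permutation of
// 0..data.size()-1.
void gather_in_place(const std::vector<int>& indices,
                     std::vector<int>& data,
                     std::vector<int>& scratch)
{
    assert(indices.size() == data.size());
    scratch.resize(data.size());
    for (std::size_t i = 0; i < indices.size(); ++i)
        scratch[i] = data[indices[i]];   // out-of-place gather into scratch
    data.swap(scratch);                  // O(1) exchange; scratch is reusable
}
```

Calling this once per array with the same `indices` and the same `scratch` reorders all of them while allocating only one extra buffer.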