Memory limit of 256 MB exceeded? Why? - C++11

I was solving this problem on Codeforces here. The code prints the right answer, but it exceeds the memory limit on the site and I can't figure out why. I also tried using vectors, but that didn't work.
#include<bits/stdc++.h>
using namespace std;
int main(){
    int n,a,b,c,deny=0;
    int groups[n];
    scanf("%d %d %d",&n,&a,&b);
    for(int i=0;i<n;i++){
        cin>>groups[i];
    }
    int two_one=b*2;
    //two kinds of tables:one seater,2 seater
    //find no of ppl denied service
    for(int j=0;j<n;j++){
        if(groups[j]==1 and a!=0){
            a-=1;
        }
        else if(groups[j]==1 and a==0){
            two_one-=1;
            b-=1;
        }
        else if(groups[j]==1 and a==0 and two_one==0){
            deny+=1;
        }
        else if(groups[j]==2 and b!=0){
            b-=1;
        }
        else if(groups[j]==2 and b==0){
            deny+=2;
        }
    }
    printf("%d",deny);
    return 0;
}

It looks like you are trying to allocate an array before reading its size:
int n,a,b,c,deny=0; // <------ Unknown n value
int groups[n]; // <----- Allocation of array of n: undefined behavior
scanf("%d %d %d",&n,&a,&b); // <------ Reading 'groups' size
Just swap the last two lines.
Edit: variable-length arrays such as int groups[n] are not part of standard C++ (GCC accepts them only as an extension), so you should be using a vector instead:
int n,a,b,c,deny=0;
scanf("%d %d %d",&n,&a,&b);
std::vector<int> groups(n);

Related

Why am I getting an access violation error in C++?

I am getting a 0xc0000005 error (access violation). Where am I going wrong in this code? I couldn't debug this error, please help me.
The problem is this:
Formally, given a wall of infinite height, initially unpainted, N operations occur, and in the ith operation the wall is painted up to height Hi with color Ci. If in a later operation j (j>i) the wall is painted up to height Hj with color Cj such that Hj >= Hi, then color Ci is hidden. At the end of the N operations, you have to find the number of distinct colors (>=1) visible on the wall.
#include<iostream>
#include <bits/stdc++.h>
#include <algorithm>
using namespace std;
int main()
{
    int t;
    cin>>t;
    for(int tt= 0;tt<t;tt++)
    {
        int h,c;
        int temp = 0;
        cin>>h>>c;
        int A[h], B[c];
        vector<int> fc;
        for(int i = 0;i<h;i++)
        {
            cin>>A[i];
        }
        for(int j =0;j<h;j++)
        {
            cin>>B[j];
        }
        if(is_sorted(A,A+h))
        {
            return 1;
        }
        if(count(A,A+h,B[0]) == h)
        {
            return 1;
        }
        for(int i = 0;i<h;i++)
        {
            if(A[i]>=temp)
            {
                temp = A[i];
            }
            else
            {
                if(temp == fc[fc.size()-1])
                {
                    fc[fc.size()-1] = B[i];
                }
                else
                {
                    fc.push_back(B[i]);
                }
            }
        }
    }
}
There are several issues.
When reading values into B, your loop check is j<h. How many elements are in B?
You later look at fc[fc.size()-1]. This is Undefined Behavior if fc is empty, and is the likely source of your problem.
Other issues:
Don't use #include <bits/stdc++.h>
Avoid using namespace std;
Variable declarations like int A[h], where h is a variable, are not standard C++. Some compilers support them as an extension.

Hashing using int array or unordered_map in STL?

Which is more efficient in terms of memory and time complexity for hashing: an int array or an unordered_map from the STL?
By hashing I mean storing elements formed by the combination of a key value and a mapped value, and fast retrieval of individual elements based on their keys.
I was trying to solve this question.
Here's my solution:
#include <bits/stdc++.h>
#define MAX 15000005
using namespace std;
/*
 * author: vivekcrux
 */
int gcd(int a, int b)
{
    if (b == 0)
        return a;
    return gcd(b, a % b);
}
int c[MAX];
int n;
int sieve()
{
    bitset<MAX> m;
    m.set();
    int ans = 0;
    for(int i=2;i<MAX;i++)
    {
        if(m[i])
        {
            int mans = 0;
            for(int j=i;j<MAX;j+=i)
            {
                m[j]=0;
                mans += c[j];
            }
            if(mans<n)
                ans = max(ans,mans);
        }
    }
    return ans;
}
int main()
{
    ios_base::sync_with_stdio(false);
    cin.tie(NULL);
    cout.tie(NULL);
    int i,j;
    cin>>n;
    int a[n+1];
    for(i=0;i<n;i++)
    {
        cin>>a[i];
    }
    int g = a[0];
    for(i=1;i<n;i++)
    {
        g = gcd(g,a[i]);
    }
    for(i=0;i<n;i++)
    {
        a[i] /= g;
        if(a[i]!=1) c[a[i]]++;
    }
    int m = sieve();
    if(m==0)
        cout<<"-1";
    else
        cout<<n - m<<endl;
    return 0;
}
In this code, if I use
unordered_map<int,int> c;
instead of
int c[MAX];
I get a Memory Limit Exceeded verdict. I have found here that unordered_map has constant time complexity on average, but no details about its space complexity are mentioned there. I wonder why I am getting MLE with unordered_map.
unordered_map uses buckets to store values. A bucket is a slot in the container's internal hash table to which elements are assigned based on the hash value of their key. Let's look at the following code in C++17.
#include <bits/stdc++.h>
using namespace std;
int main() {
    unordered_map<int,int> mp;
    mp[4] = 1;
    mp[41] = 5;
    mp[67] = 6;
    cout<<mp.bucket_count();
}
The output comes out to be 7 (the exact number depends on the compiler and standard library). This is the number of buckets allocated in the above code. An array indexed up to 67 would obviously take more memory. On the other hand, if we had used the keys 1, 2 and 3 instead of 4, 41 and 67, the output would still have been 7, and there an array would have been the way to go for saving space. So it depends on the keys you are storing in the hash table. As for time complexity, both give constant-time access on average, but unordered_map has a larger constant factor, and if many keys collide in the same bucket its lookups degrade towards linear time, which can blow up the overall time complexity of the code. Here is the Codeforces link to the blog.

What is wrong with my selection sort?

My implementation of selection sort does not work when the inner loop bound is j < n-2, j < n-1, or j < n. What am I doing wrong?
Is there an online IDE that lets me set a watch on the loop variables?
#include <stdio.h>
#define n 4
int main(void) {
    int a[n]={4,3,2,1};
    int j,min;
    for(int i=0;i<n;i++){
        min=i;
        for(j=i+1;j<n-3;j++)
            if(a[j]>a[j+1])
                min=j+1;
        if(min!=i){
            int t=a[min];
            a[min]=a[i];
            a[i]=a[t];
        }
    }
    for(int i=0;i<n;i++)
        printf("%d",a[i]);
    return 0;
}
I tried it here
Your code indeed has a strange limit of n-3, but it also has some other flaws:
To find a minimum, you should compare with the current minimum (a[min]), not with the next/previous element in the array.
The swap code is not correct: the last assignment should use t itself, not a[t].
Here is the corrected code:
#include <stdio.h>
#define n 4
int main(void) {
    int a[n]={4,3,2,1};
    int j,min;
    for(int i=0;i<n;i++){
        min=i;
        for(j=i+1;j<n;j++)
            if(a[min]>a[j])   // compare with the current minimum
                min=j;
        if(min!=i){
            int t=a[min];
            a[min]=a[i];
            a[i]=t;           // assign t itself, not a[t]
        }
    }
    for(int i=0;i<n;i++)
        printf("%d",a[i]);
    return 0;
}
https://ideone.com/AGJDPS
NB: To see intermediate results in an online IDE, why not add printf calls inside the loop? Of course, for larger code projects you'd better use a locally installed IDE with all the debugging features, and step through the code.

My logic for the minimum number of coins required

I was trying to solve the Coin Change problem on my own, but it seems my logic is lacking somewhere. Please help me. I have commented what I had in mind.
#include <bits/stdc++.h>
using namespace std;
vector<int>nom;
int dp[1000004];
int recurse(int v){
    if(dp[v]!=-1)return dp[v]; // If already found something just return
    if(v==0)return 0; // If value is 0, minimum changes req is 0.
    if(v<0)return INT_MAX; // If reached out of bound return MAX.
    int ans=INT_MAX; // For storing Ans.
    for(int i=0;i<nom.size();i++){
        ans=min(ans,recurse(v-nom[i])+1); // Min number of changes req for val-nom[i], +1 for value val.
    }
    dp[v]=ans;
    return dp[v];
}
int main() {
    int v,n,x;
    cin>>v>>n; // Value for which I have to find change, No. of change available
    for(int i=0;i<n;i++){
        cin>>x;
        nom.push_back(x); // changes
        dp[x]=1; // If we want x money only 1 change req so dp[x]=1
    }
    int mincoins=0; // For storing answer
    mincoins=recurse(v); // Answer for value v.
    cout<<mincoins<<endl;
    return 0;
}
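The main problem here is that you forgot to initialise all elements of dp[] to -1. A global array such as dp is zero-initialised, so dp[v]!=-1 is true on the very first call and recurse() returns 0 (or 1 when v is itself a coin) without exploring anything. Once that is fixed, two more issues surface: the v<0 check must come before dp[v] is read, otherwise dp is indexed with a negative value, and recurse(v-nom[i])+1 overflows when the recursive call returns INT_MAX.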

Unable to multiply matrices larger than 32*32 in CUDA

I am trying to implement matrix multiplication using CUDA. I have two matrices of order M*w and w*N. I launch w*w threads in each block, with grid dimensions (M/w, N/w). I created a 32*32 matrix in shared memory, and I want to implement the multiplication using only one matrix in shared memory. Here's my code:
#include<stdio.h>
#include<cuda.h>
#include<stdlib.h>
#include<unistd.h>
#include<math.h>
__global__ void add(int *a,int *b, int *c,int *p,int *q){
    // __shared__ int aTile[*p][*p];
    //const int a=*p;
    __shared__ int aTile[32][32];
    int row = blockIdx.x*blockDim.x+threadIdx.x;
    int col = blockIdx.y*blockDim.y+threadIdx.y;
    int sum=0;
    aTile[threadIdx.x][threadIdx.y] = a[row*(*p)+threadIdx.y];
    __syncthreads();
    if(row< *q && col< *q)
    {
        for(int k=0;k<*p;k++)
        {
            sum+= aTile[threadIdx.x][k]*b[col+(*q)*k];
            // __syncthreads();
        }
        c[col+(*q)*row]=sum;
        //__syncthreads();
    }
}
int main(){
    printf("Enter the number of rows of matrix 1\n");
    int row_1;
    scanf("%d",&row_1);
    printf("Enter the number of columns of matrix 1\n");
    int col_1;
    scanf("%d",&col_1);
    /*printf("Enter the values of matrix 1 \n");
    */
    int a[row_1][col_1];
    for(int i=0;i<row_1;i++)
    {
        for(int j=0;j<col_1;j++)
        {
            //scanf("%d",&a[i][j]);
            a[i][j]=1;
        }
    }
    printf("Enter the number of rows of matrix 2\n");
    int row_2;
    scanf("%d",&row_2);
    printf("Enter the number of columns of matrix 2\n");
    int col_2;
    scanf("%d",&col_2);
    /* printf("Enter the values of matrix 2 \n");
    */
    int b[row_2][col_2];
    for(int i=0;i<row_2;i++)
    {
        for(int j=0;j<col_2;j++)
        {
            // scanf("%d",&b[i][j]);
            b[i][j]=1;
        }
    }
    int c[row_1][col_2];
    //dim3 dimBlock(col_1, col_1);// in one block u have row_1*col_2 threads;
    dim3 dimBlock(col_1,col_1);
    //dim3 dimGrid((row_1/col_1)+1,(col_2/col_1)+1); // in one grid you have 1*1 blocks
    dim3 dimGrid(ceil(row_1/col_1),ceil(col_2/col_1));
    int *p;
    int *q;
    int *dev_a,*dev_b,*dev_c;
    int size_a=row_1*col_1*sizeof(int);
    int size_b=row_2*col_2*sizeof(int);
    int size_c = row_1*col_2*sizeof(int);
    cudaMalloc((void**)&dev_a,size_a);
    cudaMalloc((void**)&dev_b,size_b);
    cudaMalloc((void**)&dev_c,size_c);
    cudaMalloc((void**)&p,sizeof(int));
    cudaMalloc((void**)&q,sizeof(int));
    cudaMemcpy(dev_a,a,size_a,cudaMemcpyHostToDevice);
    cudaMemcpy(dev_b,b,size_b,cudaMemcpyHostToDevice);
    cudaMemcpy(dev_c,c,size_c,cudaMemcpyHostToDevice);
    cudaMemcpy(p,&col_1,sizeof(int),cudaMemcpyHostToDevice);
    cudaMemcpy(q,&col_2,sizeof(int),cudaMemcpyHostToDevice);
    add<<<dimGrid,dimBlock>>>(dev_a,dev_b,dev_c,p,q);
    cudaMemcpy(c,dev_c,size_c,cudaMemcpyDeviceToHost);
    cudaFree(dev_a);
    cudaFree(dev_b);
    cudaFree(dev_c);
    printf("output matrix is : \n");
    for(int i=0;i<10;i++)
    {
        for(int j=0;j<10;j++)
        {
            printf("%d ",c[i][j]);
        }
        printf("\n");
    }
}
I get the correct output when I multiply matrices of size 32*32 and 32*32, but when I multiply matrices of size 33*33 and 33*33 (or larger), the resulting matrix contains all zeros. I have tried to increase the size of the matrix in shared memory, but I get the following error:
ptxas error : Entry function '_Z3addPiS_S_S_S_' uses too much shared data (0x10038 bytes, 0x4000 max)
I am pretty new to CUDA. Sorry if this is too basic a question.
This is a basic question and has been answered many times over.
First of all, use proper cuda error checking any time you are having trouble with a CUDA code. In this case, you would have received an error that would have been instructive.
CUDA kernels have a limit on the maximum number of threads per threadblock. That limit (under CUDA 7, 7.5RC, currently) is 1024 threads per block, on all supported devices. The number of threads per block is specified (in this case) by your dimBlock variable, and it is the product of the terms in each dimension:
dim3 dimBlock(col_1,col_1);
add<<<dimGrid,dimBlock>>>(dev_a,dev_b,dev_c,p,q);
Therefore, when col_1 is 32, you are requesting 32x32 threads (1024) which is the maximum. Any value above 32x32 will fail for this reason. (Your kernel will not launch. No kernel code will get executed when you specify 33x33 here.)
Rather than rewrite this code to fix all the issues, I suggest you study any of the dozens of questions already asked about matrix multiplication, here on the cuda tag. In fact, if you want to see a shared memory optimized code for naive matrix multiplication in CUDA, there is a full example in the programming guide (including both the non-shared version and the shared version for comparison).
And again, I suggest you implement proper cuda error checking before asking for help here. Even if you don't understand the error results, it will be useful information for those who are trying to help you.
You have an overflow in this line:
aTile[threadIdx.x][threadIdx.y] = a[row*(*p)+threadIdx.y];
given that aTile is defined as __shared__ int aTile[32][32]; (any thread with threadIdx.x or threadIdx.y of 32 or more writes past the end of the tile).
If you want to do tiling, you'll have to loop over the number of tiles needed to cover your matrices.
