SPOJ DQUERY : TLE Even With BIT? - algorithm

Here is the problem I want to solve. I am using the fact that prefixSum[i] - prefixSum[i-1] gives the frequency of value i, so a frequency greater than zero identifies a distinct element, and then I eliminate that frequency. But even with a BIT, I am getting TLE.
Given a sequence of n numbers a1, a2, ..., an and a number of d-queries.
A d-query is a pair (i, j) (1 ≤ i ≤ j ≤ n).
For each d-query (i, j), you have to return the number of distinct elements in the subsequence ai, ai+1, ..., aj.
Input
Line 1: n (1 ≤ n ≤ 30000).
Line 2: n numbers a1, a2, ..., an (1 ≤ ai ≤ 10^6).
Line 3: q (1 ≤ q ≤ 200000), the number of d-queries.
In the next q lines, each line contains 2 numbers i, j
representing a d-query (1 ≤ i ≤ j ≤ n).
Output
For each d-query (i, j), print the number of distinct elements in the
subsequence ai, ai+1, ..., aj in a single line.
Example
Input
5
1 1 2 1 3
3
1 5
2 4
3 5
Output
3
2
3
the code is:
#include <iostream>
#include <algorithm>
#include <vector>
#include <stdlib.h>
#include <stdio.h>
typedef long long int ll;
using namespace std;
void update(ll n, ll val, vector<ll> &b);
ll read(ll n,vector<ll> &b);
ll readsingle(ll n,vector<ll> &b);
void map(vector<ll> &a,vector<ll> &b,ll n) /**** RElative Mapping ***/
{
ll temp;
a.clear();
b.clear();
for(ll i=0; i<n; i++)
{
cin>>temp;
a.push_back(temp);
b.push_back(temp);
}
sort(b.begin(),b.end());
for(ll i=0; i<n; i++)
*(a.begin()+i) = (lower_bound(b.begin(),b.end(),a[i])-b.begin())+1;
b.assign(n+1,0);
}
int main()
{
ll n;
cin>>n;
vector<ll> a,b;
map(a,b,n);
ll t;
cin>>t;
while(t--)
{
ll l ,u;
b.assign(n+1,0);
cin>>l>>u;
l--;/*** Reduce For Zero Based INdex ****/
u--;
for(ll i=l;i<=u;i++)
update(a[i],1,b);
ll cont=0;
for(ll i=l;i<=u;i++)
if(readsingle(a[i],b)>0)
{
cont++;
update(a[i],-readsingle(a[i],b),b); /***Eliminate The Frequency */
}
cout<<cont<<endl;
}
return 0;
}
ll readsingle(ll n,vector<ll> &b)
{
return read(n,b)-read(n-1,b);
}
ll read(ll n,vector<ll> &b)
{
ll sum=0;
for(; n; sum+=b[n],n-=n&-n);
return sum;
}
void update(ll n, ll val, vector<ll> &b)
{
for(; n<(ll)b.size(); b[n]+=val,n+=n&-n); /* valid indices are 0..b.size()-1 */
}

The algorithm you use is too slow. For each query, you iterate over the entire query range, which already gives n * q operations in the worst case (30000 * 200000 = 6 * 10^9, far too many). Here is a better offline solution with O((n + q) * log n) time and O(n + q) space complexity:
Sort all queries by their right end (there is no need to sort them explicitly: you can just attach each query to the position of its right end, from 0 to n - 1).
Now iterate over all positions in the array from left to right and maintain a BIT. Each position in the BIT holds either 1 (the element at position i is the latest occurrence of its value seen so far) or 0 (initially, it is filled with zeros).
For each element a[i]: if it is the first occurrence of this element, just add 1 to position i in the BIT. Otherwise, add -1 to the position of the previous occurrence of this element and then add 1 to position i.
Once position right has been processed, the answer to the query (left, right) is just the BIT sum over positions left to right.
To maintain the last occurrence of each element, you can use a map.
It is possible to make it online using a persistent segment tree (the time complexity would be the same; the space complexity would become O(n * log n + q)), but it is not required here.
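The offline steps above can be sketched in C++ roughly as follows. This is a minimal sketch, not the answerer's code: the names Fenwick and distinct_in_ranges are made up for illustration, queries are assumed to be 1-based inclusive pairs, and std::unordered_map stands in for the "map" mentioned above.

```cpp
#include <unordered_map>
#include <utility>
#include <vector>

// Fenwick tree (BIT) over positions 1..n. Hypothetical helper for this sketch.
struct Fenwick {
    std::vector<int> t;
    explicit Fenwick(int n) : t(n + 1, 0) {}
    void add(int i, int v) { for (; i < (int)t.size(); i += i & -i) t[i] += v; }
    int sum(int i) const { int s = 0; for (; i > 0; i -= i & -i) s += t[i]; return s; }
};

// a: the array; queries: 1-based inclusive (left, right) pairs.
// Returns the number of distinct values per query, in input order.
std::vector<int> distinct_in_ranges(const std::vector<int>& a,
                                    const std::vector<std::pair<int, int>>& queries) {
    const int n = (int)a.size();
    // Bucket query indices by right end instead of sorting explicitly.
    std::vector<std::vector<int>> by_right(n + 1);
    for (int q = 0; q < (int)queries.size(); ++q)
        by_right[queries[q].second].push_back(q);

    Fenwick bit(n);
    std::unordered_map<int, int> last;  // value -> position of its latest occurrence
    std::vector<int> answer(queries.size());
    for (int i = 1; i <= n; ++i) {
        auto it = last.find(a[i - 1]);
        if (it != last.end()) bit.add(it->second, -1);  // older occurrence no longer counts
        bit.add(i, 1);
        last[a[i - 1]] = i;
        for (int q : by_right[i])
            answer[q] = bit.sum(i) - bit.sum(queries[q].first - 1);
    }
    return answer;
}
```

For actual submission on SPOJ you would still want fast input (scanf or unsynchronized cin) and '\n' instead of endl, since q can be 200000.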

Related

Finding kth element in the nth order of Farey Sequence

Farey sequence of order n is the sequence of completely reduced fractions, between 0 and 1 which when in lowest terms have denominators less than or equal to n, arranged in order of increasing size. Detailed explanation here.
Problem
The problem is: given n and k, where n is the order of the sequence and k is the element index, can we find that particular element of the sequence? For example, the answer for (n=5, k=6) is 1/2.
Lead
There are many less-than-optimal solutions available, but I am looking for a near-optimal one. One such algorithm is discussed here, but I am unable to understand its logic and hence unable to apply it to the examples.
Question
Can someone please explain the solution in more detail, preferably with an example?
Thank you.
I've read the method provided in your link, and the accepted C++ solution to it. Let me post them, for reference:
Editorial Explanation
Several less-than-optimal solutions exist. Using a priority queue, one
can iterate through the fractions (generating them one by one) in O(K
log N) time. Using a fancier math relation, this can be reduced to
O(K). However, neither of these solution obtains many points, because
the number of fractions (and thus K) is quadratic in N.
The “good” solution is based on meta-binary search. To construct this
solution, we need the following subroutine: given a fraction A/B
(which is not necessarily irreducible), find how many fractions from
the Farey sequence are less than this fraction. Suppose we had this
subroutine; then the algorithm works as follows:
1. Determine a number X such that the answer is between X/N and (X+1)/N; such a number can be determined by binary searching the range 1...N, thus calling the subroutine O(log N) times.
2. Make a list of all fractions A/B in the range X/N...(X+1)/N. For any given B, there is at most one A in this range, and it can be determined trivially in O(1).
3. Determine the appropriate order statistic in this list (doing this in O(N log N) by sorting is good enough).
It remains to show how we can construct the desired subroutine. We
will show how it can be implemented in O(N log N), thus giving a O(N
log^2 N) algorithm overall. Let us denote by C[j] the number of
irreducible fractions i/j which are less than X/N. The algorithm is
based on the following observation: C[j] = floor(X*j/N) - Sum(C[D],
where D is a proper divisor of j). A direct implementation, which tests whether any D
is a divisor, yields a quadratic algorithm. A better approach,
inspired by Eratosthenes' sieve, is the following: at step j, we know
C[j], and we subtract it from all multiples of j. The running time of
the subroutine becomes O(N log N).
Relevant Code
#include <cassert>
#include <algorithm>
#include <fstream>
#include <iostream>
#include <vector>
using namespace std;
const int kMaxN = 2e5;
typedef int int32;
typedef long long int64_x;
// #define int __int128_t
// #define int64 __int128_t
typedef long long int64;
int64 count_less(int a, int n) {
vector<int> counter(n + 1, 0);
for (int i = 2; i <= n; i += 1) {
counter[i] = min(1LL * (i - 1), 1LL * i * a / n);
}
int64 result = 0;
for (int i = 2; i <= n; i += 1) {
for (int j = 2 * i; j <= n; j += i) {
counter[j] -= counter[i];
}
result += counter[i];
}
return result;
}
int32 main() {
// ifstream cin("farey.in");
// ofstream cout("farey.out");
int64_x n, k; cin >> n >> k;
assert(1 <= n);
assert(n <= kMaxN);
assert(1 <= k);
assert(k <= count_less(n, n));
int up = 0;
for (int p = 29; p >= 0; p -= 1) {
if ((1 << p) + up > n)
continue;
if (count_less((1 << p) + up, n) < k) {
up += (1 << p);
}
}
k -= count_less(up, n);
vector<pair<int, int>> elements;
for (int i = 1; i <= n; i += 1) {
int b = i;
// find a such that up/n < a / b and a / b <= (up+1) / n
int a = 1LL * (up + 1) * b / n;
if (1LL * up * b < 1LL * a * n) {
} else {
continue;
}
if (1LL * a * n <= 1LL * (up + 1) * b) {
} else {
continue;
}
if (__gcd(a, b) != 1) {
continue;
}
elements.push_back({a, b});
}
sort(elements.begin(), elements.end(),
[](const pair<int, int>& lhs, const pair<int, int>& rhs) -> bool {
return 1LL * lhs.first * rhs.second < 1LL * rhs.first * lhs.second;
});
cout << (int64_x)elements[k - 1].first << ' ' << (int64_x)elements[k - 1].second << '\n';
return 0;
}
Basic Methodology
The above editorial explanation results in the following simplified version. Let me start with an example.
Let's say, we want to find 7th element of Farey Sequence with N = 5.
We start with writing a subroutine, as said in the explanation, that gives us the "k" value (how many Farey Sequence reduced fractions there exist before a given fraction - the given number may or may not be reduced)
So, take your F5 sequence:
k = 0, 0/1
k = 1, 1/5
k = 2, 1/4
k = 3, 1/3
k = 4, 2/5
k = 5, 1/2
k = 6, 3/5
k = 7, 2/3
k = 8, 3/4
k = 9, 4/5
k = 10, 1/1
If we can find a function that finds the count of the previous reduced fractions in Farey Sequence, we can do the following:
int64 k_count_2 = count_less(2, 5); // result = 4
int64 k_count_3 = count_less(3, 5); // result = 6
int64 k_count_4 = count_less(4, 5); // result = 9
This function is written in the accepted solution. It uses the exact methodology explained in the last paragraph of the editorial.
As you can see, the count_less() function generates the same k values as in our hand written list.
We know the values of the reduced fractions for k = 4, 6, 9 using that function. What about k = 7? As explained in the editorial, we will list all the reduced fractions in the range X/N to (X+1)/N, here with X = 3 and N = 5.
Using the loop in the accepted solution (it's near the bottom), we list and sort the reduced fractions.
After that, we rearrange our k values to fit our new array, like so:
k = -, 0/1
k = -, 1/5
k = -, 1/4
k = -, 1/3
k = -, 2/5
k = -, 1/2
k = -, 3/5 <-|
k = 0, 2/3 | We list and sort the possible reduced fractions
k = 1, 3/4 | in between these numbers
k = -, 4/5 <-|
k = -, 1/1
(That's why there is this piece of code: k -= count_less(up, n);. It remaps the k values: here k becomes 7 - count_less(3, 5) = 7 - 6 = 1, the 1-based position in the new list.)
(And we also subtract one more during indexing, i.e.: cout << (int64_x)elements[k - 1].first << ' ' << (int64_x)elements[k - 1].second << '\n';. This just converts that position into the 0-based index of the generated array.)
So, for our re-mapped k values, for N = 5 and k = 7 (original k), our result is 2/3.
(We select the entry labelled k = 0 in our new map)
If you compile and run the accepted solution, it will give you this:
Input: 5 7 (Enter)
Output: 2 3
I believe this is the basic point of the editorial and accepted solution.
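To cross-check the walkthrough on small inputs, one can generate the sequence term by term with the standard next-term recurrence for Farey sequences (this is an O(k) method in the spirit of the "fancier math relation" the editorial mentions, not the accepted solution). The function name farey_kth_bruteforce is made up for this sketch, and k is assumed to use the numbering of the hand-written list above (k = 0 is 0/1) and to stay within the sequence.

```cpp
#include <utility>

// k-th term of the Farey sequence of order n (n >= 1), with term 0 being 0/1.
// Uses the recurrence: if a/b and c/d are consecutive terms, the next one is
// (m*c - a)/(m*d - b) with m = floor((n + b) / d).
// Assumption: k does not exceed the index of 1/1.
std::pair<long long, long long> farey_kth_bruteforce(long long n, long long k) {
    long long a = 0, b = 1;  // current term a/b
    long long c = 1, d = n;  // its successor c/d
    while (k-- > 0) {
        const long long m = (n + b) / d;
        const long long e = m * c - a;
        const long long f = m * d - b;
        a = c; b = d;
        c = e; d = f;
    }
    return {a, b};
}
```

With n = 5 and k = 7 this walks 0/1, 1/5, 1/4, 1/3, 2/5, 1/2, 3/5, 2/3 and stops at 2/3, matching the output of the accepted solution for the input 5 7.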

Variant of Subset-Sum

Given 3 positive integers n, k, and sum, find exactly k number of distinct elements a_i, where
a_i \in S, 1 <= i <= k, and a_i \neq a_j for i \neq j
and, S is the set
S = {1, 2, 3, ..., n}
such that
\sum_{i=1}^{k}{a_i} = sum
I don't want to apply brute force (checking all possible combinations) to solve the problem due to exponential complexity. Can someone give me a hint towards another approach in solving this problem? Also, how can we exploit the fact the set S is sorted?
Is it possible to have complexity of O(k) in this problem?
An idea for exploiting the properties of the set 1..n:
The sum of k consecutive natural numbers starting from a is
sum = k*(2*a + (k-1))/2
To get such a run with a sum close to the needed s, we can solve for a, rounding up
a >= s/k - k/2 + 1/2
or rounding down
a <= s/k - k/2 + 1/2
then compare the s and sum values and make corrections.
For example, having s=173, n=40 and k=5, we can find
a <= 173/5 - 5/2 + 1/2 = 32.6
for starting number 32 we have sequence 32,33,34,35,36 with sum = 170, and for correction by 3 we can just change 36 with 39, or 34,35,36 with 35,36,37 and so on.
It seems that this approach gives O(1) complexity (of course, there might be some subtleties that I missed).
It's possible to modify the pseudo-polynomial algorithm for subset sum.
Prepare a matrix P with dimension k X sum, and initialize all elements to 0. The meaning of P[p, q] == 1 is that there is a subset of p numbers summing to q, and P[p, q] == 0 means that such a subset has not yet been found.
Now iterate over i = 1, ..., n. In each iteration:
If i ≤ sum, set P[1, i] = 1 (there is a subset of size 1 that achieves i).
For any entry P[p, q] == 1 that was set in an earlier iteration (so that i itself is not used twice), you now know that P[p + 1, q + i] should be 1 too. If (p + 1, q + i) is within the boundaries of the matrix, set P[p + 1, q + i] = 1.
Finally, check if P[k, sum] == 1.
The complexity, assuming that all integer math operations take constant time, is Θ(n · k · sum), which is O(n^2 · sum) since k ≤ n.
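A minimal C++ sketch of this table-filling idea is shown below. It is one possible reading of the description, with two assumptions made explicit: an extra row P[0] (with P[0][0] = 1) replaces the special case for subsets of size 1, and the row index p is walked from high to low so that the current number i is never used twice in the same subset. The function name is hypothetical.

```cpp
#include <algorithm>
#include <vector>

// Returns true if some k distinct numbers from {1, ..., n} add up to sum.
// P[p][q] != 0 means: some p distinct numbers among those processed so far sum to q.
bool k_subset_sum_exists(int n, int k, int sum) {
    std::vector<std::vector<char>> P(k + 1, std::vector<char>(sum + 1, 0));
    P[0][0] = 1;  // the empty subset
    for (int i = 1; i <= n; ++i) {
        // Descending p: row p - 1 still reflects only the numbers 1..i-1
        // when row p is updated, so i is added at most once.
        for (int p = std::min(i, k); p >= 1; --p) {
            for (int q = sum; q >= i; --q) {
                if (P[p - 1][q - i]) P[p][q] = 1;
            }
        }
    }
    return P[k][sum] != 0;
}
```

For the example used elsewhere in this thread, k_subset_sum_exists(40, 5, 173) is true, while k_subset_sum_exists(3, 2, 6) is false because the largest possible sum of two elements is 2 + 3 = 5.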
There is an O(1) (so to speak) solution. What follows is a formal enough (I hope) development of the idea by @MBo.
It is sufficient to assume that S is a set of all integers and find a minimal solution. Solution K is smaller than K' iff max(K) < max(K'). If max(K) <= n, then K is also a solution to the original problem; otherwise, the original problem has no solution.
So we disregard n and find K, a minimal solution. Let g = max(K) = ceil(sum/k + (k - 1)/2) and s = g + (g-1) + (g-2) + ... (g-k+1) and s' = (g-1) + (g-2) + ... + (g-k). That is, s' is s shifted down by 1. Note s' = s - k.
Obviously s >= sum and (because K is minimal) s' < sum.
If s == sum the solution is K and we're done. Otherwise consider the set K+ = {g, g-1, ..., g-k}. We know that \sum(K+ \setminus {g}) < sum and \sum(K+ \setminus {g-k}) > sum, therefore, there's a single element g_i of K+ such that \sum (K+ \setminus {g_i}) = sum. The solution is K+ \setminus {\sum(K+)-sum}.
The solution in the form of 4 integers a, b, c, d, where the actual set is understood to be [a..b] \cup [c..d], can be computed in O(1).
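The construction above might be coded along these lines. This is a sketch under stated assumptions rather than the answerer's implementation: the function name pick_k_with_sum is made up, an empty vector signals "no solution", and a lower-bound check (sum must be at least 1 + 2 + ... + k) is added because the derivation above ignores it. The arithmetic is O(1); only writing out the k elements takes O(k).

```cpp
#include <vector>

// Returns k distinct values from {1, ..., n} whose total is sum,
// or an empty vector if no such values exist.
std::vector<long long> pick_k_with_sum(long long n, long long k, long long sum) {
    if (k < 1 || k > n) return {};
    const long long tri = k * (k - 1) / 2;  // 0 + 1 + ... + (k - 1)
    if (sum < k + tri) return {};           // below 1 + 2 + ... + k
    // g = ceil(sum/k + (k-1)/2), computed with integers only.
    const long long g = (2 * sum + k * (k - 1) + 2 * k - 1) / (2 * k);
    if (g > n) return {};                   // even the minimal solution exceeds n
    const long long s = k * g - tri;        // g + (g-1) + ... + (g-k+1)
    std::vector<long long> out;
    if (s == sum) {
        for (long long v = g - k + 1; v <= g; ++v) out.push_back(v);
        return out;
    }
    // K+ = {g-k, ..., g}; leave out the one element that makes the total fit.
    const long long skip = s + (g - k) - sum;
    for (long long v = g - k; v <= g; ++v)
        if (v != skip) out.push_back(v);
    return out;
}
```

For sum = 173, n = 40, k = 5 this gives g = 37, s = 175, K+ = {32, ..., 37} with total 207, so 207 - 173 = 34 is left out and the result is 32, 33, 35, 36, 37.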
#include <stdio.h>
#include <stdlib.h>
#include <stdbool.h>
unsigned long int arithmeticSum(unsigned long int a, unsigned long int k, unsigned long int n, unsigned long int *A);
void printSubset(unsigned long int k, unsigned long int *A);
int main(void)
{
unsigned long int n, k, sum;
// scan the respective values of sum, n, and k
scanf("%lu %lu %lu", &sum, &n, &k);
// find the starting element using the formula for the sum of an A.P. having 'k' terms
// starting at 'a', common difference 'd' ( = 1 in this problem), having 'sum' = sum
// sum = [k/2][2*a + (k-1)*d]
unsigned long startElement = (long double)sum/k - (long double)k/2 + (long double)1/2;
// exit if the arithmetic progression formed at the startElement is not within the required bounds
if(startElement < 1 || startElement + k - 1 > n)
{
printf("-1\n");
return 0;
}
// we now work on the k-element set [startElement, startElement + k - 1]
// create an array to store the k elements
unsigned long int *A = malloc(k * sizeof(unsigned long int));
// calculate the sum of k elements in the arithmetic progression [a, a + 1, a + 2, ..., a + (k - 1)]
unsigned long int currentSum = arithmeticSum(startElement, k, n, A);
// if the currentSum is equal to the required sum, then print the array A, and we are done
if(currentSum == sum)
{
printSubset(k, A);
}
// we enter into this block only if currentSum < sum
// i.e. we need to add 'something' to the currentSum in order to make it equal to sum
// i.e. we need to remove an element from the k-element set [startElement, startElement + k - 1]
// and replace it with an element of higher magnitude
// i.e. we need to replace an element in the set [startElement, startElement + k - 1] and replace
// it with an element in the range [startElement + k, n]
else
{
long int j;
bool done;
// calculate the amount which we need to add to the currentSum
unsigned long int difference = sum - currentSum;
// starting from A[k-1] upto A[0] do the following...
for(j = k - 1, done = false; j >= 0; j--)
{
// check if adding the "difference" to A[j] results in a number in the range [startElement + k, n]
// if it does then replace A[j] with that element, and we are done
if(A[j] + difference <= n && A[j] + difference > A[k-1])
{
A[j] += difference;
printSubset(k, A);
done = true;
break;
}
}
// if no such A[j] is found then, exit with fail
if(done == false)
{
printf("-1\n");
}
}
free(A); // release the array allocated above
return 0;
}
unsigned long int arithmeticSum(unsigned long int a, unsigned long int k, unsigned long int n, unsigned long int *A)
{
unsigned long int currentSum;
long int j;
// calculate the sum of the arithmetic progression and store the each member in the array A
for(j = 0, currentSum = 0; j < k; j++)
{
A[j] = a + j;
currentSum += A[j];
}
return currentSum;
}
void printSubset(unsigned long int k, unsigned long int *A)
{
long int j;
for(j = 0; j < k; j++)
{
printf("%lu ", A[j]);
}
printf("\n");
}

Approach for better solution - Sum of medians

Here is the question: Spoj-WEIRDFN
Problem:
Let us define :
F[1] = 1
F[i] = (a*M[i] + b*i + c)%1000000007 for i > 1
where M[i] is the median of the array {F[1],F[2],..,F[i-1]}
Given a,b,c and n, calculate the sum F[1] + F[2] + .. + F[n].
Constraints:
0 <= a,b,c < 1000000007
1 <= n <= 200000
I came up with a solution, but it is not very efficient.
My solution:
#include <bits/stdc++.h>
using namespace std;
#define ll long long int
#define mod 1000000007
int main() {
// your code goes here
int t;
scanf("%d",&t);
while(t--)
{
ll a,b,c,sum=0;
int n;
scanf("%lld%lld%lld%d",&a,&b,&c,&n);
ll f[n+1];
f[1]=1;
f[0]=0;
for(int i=2;i<=n;i++)
{
ll temp;
sort(&f[1],&f[i]);
temp=f[i/2];
f[i]=((a*(temp)%mod)+((b*i)%mod)+(c%mod))%mod;
sum+=f[i];
}
printf("%lld\n",sum+f[1]);
}
return 0;
}
Can anybody give me a hint for a better algorithm or data structure for this task?
For each test case, you can maintain a balanced binary search tree that stores subtree sizes; then you can find the median of n elements in O(log n) time, and you only need O(log n) time to add a new element to the tree.
Thus, we have an O(T * n log n) algorithm, where T is the number of test cases and n is the number of elements, which should be enough to pass.
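The C++ standard library has no order-statistic tree, so the sketch below swaps in a different but common technique with the same O(log n) cost per step: a max-heap for the smaller half and a min-heap for the larger half, so that the median is always on top of the max-heap. It assumes the same median convention as the question's code (for an even count, the lower of the two middle values, i.e. f[i/2] after sorting); the function name weird_sum is made up.

```cpp
#include <functional>
#include <queue>
#include <vector>

// Sum F[1] + ... + F[n] for one test case, keeping the running median with two heaps.
long long weird_sum(long long a, long long b, long long c, int n) {
    const long long MOD = 1000000007LL;
    std::priority_queue<long long> low;  // max-heap: smaller half, top is the (lower) median
    std::priority_queue<long long, std::vector<long long>, std::greater<long long>> high;  // min-heap
    long long sum = 1;  // F[1] = 1
    low.push(1);
    for (int i = 2; i <= n; ++i) {
        const long long median = low.top();
        const long long f = (a * median % MOD + b * i % MOD + c) % MOD;
        sum += f;
        // Insert f on the correct side, then restore
        // low.size() == high.size() or low.size() == high.size() + 1.
        if (f <= low.top()) low.push(f); else high.push(f);
        if (low.size() > high.size() + 1) {
            high.push(low.top());
            low.pop();
        } else if (high.size() > low.size()) {
            low.push(high.top());
            high.pop();
        }
    }
    return sum;
}
```

As in the question's code, the sum itself is not reduced modulo 1000000007; with n up to 200000 it stays well within a 64-bit integer.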

Efficient way to count subsets with given sum

Given N numbers, I need to count the subsets whose sum is S.
Note: the numbers in the array need not be distinct.
My current code is:
int countSubsets(vector<int> numbers,int sum)
{
vector<int> DP(sum+1);
DP[0]=1;
int currentSum=0;
for(int i=0;i<numbers.size();i++)
{
currentSum+=numbers[i];
for (int j=min(sum,currentSum);j>=numbers[i];j--)
DP[j]+=DP[j - numbers[i]];
}
return DP[sum];
}
Can there be any more efficient way than this?
Constraints are :
1 ≤ N ≤ 14
1 ≤ S ≤ 100000
1 ≤ A[i] ≤ 10000
Also, there are 100 test cases in a single file. So please help if there exists a better solution than this one.
N is small: 2^20 is about 1 million, so 2^14 = 16384 is a really small value. Just iterate over all subsets; below is a pretty fast way to do that with bit manipulation. Treat integers as sets (this enumerates the subsets in lexicographical order).
int length = array.Length;
int subsetCount = 0;
for (int i=0; i<(1<<length); ++i)
{
int currentSet = i;
int tempIndex = length-1;
int currentSum = 0;
while (currentSet > 0) // iterate over bits "from the right side"
{
if ((currentSet & 1) == 1) // if current bit is "1" (parentheses needed: == binds tighter than &)
currentSum += array[tempIndex];
currentSet >>= 1;
tempIndex--;
}
subsetCount += (currentSum == targetSum) ? 1 : 0;
}
You can use the fact that N is small: it is possible to generate all possible subsets of the given array and check if its sum is S for each of them. The time complexity is O(N * 2 ** N) or O(2 ** N)(it depends on the way of the generation). This solution should be fast enough for the given constraints.
Here is a pseudo code of an O(2 ** N) solution:
result = 0
void generate(int curPos, int curSum):
    if curPos == N:
        if curSum == S:
            result++
        return
    // Do not take the current element.
    generate(curPos + 1, curSum)
    // Take it.
    generate(curPos + 1, curSum + numbers[curPos])
generate(0, 0)
A faster solution based on the meet in the middle technique:
1. Generate all subsets of the first half of the array using the algorithm described above and put their sums into a map (which maps a sum to the number of subsets that have it; it can be either a hash table or just an array because S is relatively small). This step takes O(2 ** (N / 2)) time.
2. Now generate all subsets of the second half and, for each of them, add the number of subsets of the first half that sum up to S - currentSum (using the map constructed in step 1), where currentSum is the sum of all elements in the current subset. Again, we have O(2 ** (N / 2)) subsets and each of them is processed in O(1).
The total time complexity is O(2 ** (N / 2)).
A pseudo code for this solution:
Map<int, int> count = new HashMap<int, int>() // or an array of size S + 1.
result = 0
void generate1(int[] numbers, int pos, int currentSum):
    if pos == numbers.length:
        count[currentSum]++
        return
    generate1(numbers, pos + 1, currentSum)
    generate1(numbers, pos + 1, currentSum + numbers[pos])
void generate2(int[] numbers, int pos, int currentSum):
    if pos == numbers.length:
        result += count[S - currentSum]
        return
    generate2(numbers, pos + 1, currentSum)
    generate2(numbers, pos + 1, currentSum + numbers[pos])
generate1(the first half of numbers, 0, 0)
generate2(the second half of numbers, 0, 0)
If N is odd, the middle element can go to either the first half or to the second one. It doesn't matter where it goes as long as it goes to exactly one of them.
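A runnable C++ version of the meet-in-the-middle pseudo code could look like the sketch below. The helper names subset_sums and count_subsets_mitm are made up for illustration, and subset sums are enumerated iteratively instead of recursively; a std::unordered_map plays the role of the sum-to-count map so that negative lookups (S - currentSum < 0) need no special handling.

```cpp
#include <cstddef>
#include <unordered_map>
#include <vector>

// Writes the sum of every subset of v[lo, hi) into out (2^(hi - lo) values).
static void subset_sums(const std::vector<int>& v, std::size_t lo, std::size_t hi,
                        std::vector<long long>& out) {
    out.assign(1, 0);  // the empty subset
    for (std::size_t i = lo; i < hi; ++i) {
        const std::size_t m = out.size();
        for (std::size_t j = 0; j < m; ++j) out.push_back(out[j] + v[i]);
    }
}

// Counts the subsets of numbers (by position, so equal values count separately)
// whose elements add up to S.
long long count_subsets_mitm(const std::vector<int>& numbers, long long S) {
    const std::size_t mid = numbers.size() / 2;
    std::vector<long long> left, right;
    subset_sums(numbers, 0, mid, left);
    subset_sums(numbers, mid, numbers.size(), right);

    std::unordered_map<long long, long long> count;  // sum -> number of left subsets
    for (long long x : left) ++count[x];

    long long result = 0;
    for (long long y : right) {
        auto it = count.find(S - y);
        if (it != count.end()) result += it->second;
    }
    return result;
}
```

For numbers = {1, 2, 3} and S = 3 the two qualifying subsets are {3} and {1, 2}. With N ≤ 14 each half has at most 128 subset sums, so even 100 test cases are trivial.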

Speed of two algorithms rotating a sequence. (from the book Programming Pearls)

In Column 2 of the book Programming Pearls there is a problem asking you to design an algorithm to rotate a string k positions to the left. For example, the string is "12345" and k=2, then the result is "34512".
The first algorithm is to simulate the exchanging process, i.e. put x[(i + k) % n] into x[i], and repeat until finishing.
The second algorithm uses the observation that we only need to exchange a="12" and b="345", i.e. the first k characters and the last n - k characters. We can first reverse a to a'="21" and b to b'="543", then reverse the whole string a'b' to get ba, which is the desired result.
Following is my code:
Algorithm 1:
#define NEXT(j) ((j + k) % n)
#define PREV(j) ((j + n - k) % n)
#include "stdio.h"
#include "stdlib.h"
int gcd(int a, int b) {
return (a % b == 0 ? b : gcd(b, a % b));
}
void solve(int *a, int n, int k) {
int len = gcd(n, k);
for (int i = 0; i < len; i++) {
int x = a[i];
int j = i;
do {
a[j] = a[NEXT(j)];
j = NEXT(j);
} while (j != i);
a[PREV(j)] = x;
}
}
int main(int argc, char const *argv[])
{
int n, k;
scanf("%d %d", &n, &k);
int *a = malloc(sizeof(int) * n);
for (int i = 0; i < n; i++) a[i] = i;
solve(a, n, k);
free(a);
return 0;
}
Algorithm 2:
#include "stdio.h"
#include "stdlib.h"
void swap(int *a, int *b) {
int t = *a;
*a = *b;
*b = t;
}
void reverse(int *a, int n) {
int m = n / 2;
for (int i = 0; i < m; i++) {
swap(a + i, a + (n - 1 - i));
}
}
void solve(int *a, int n, int k) {
reverse(a, k);
reverse(a + k, n - k);
reverse(a, n);
}
int main(int argc, char const *argv[])
{
int n, k;
scanf("%d %d", &n, &k);
int *a = malloc(sizeof(int) * n);
for (int i = 0; i < n; i++) a[i] = i;
solve(a, n, k);
free(a);
return 0;
}
where n is the length of the string, and k is the length to rotate.
I use n=232830359 and k=80829 to test the two algorithms. The result is, algorithm 1 takes 6.199s while algorithm 2 takes 1.970s.
However, I think both algorithms need to perform n exchanges. (Algorithm 1 is obvious; algorithm 2 takes k/2 + (n-k)/2 + n/2 = n exchanges.)
My question is: why do their speeds differ so much?
Both of these algorithms are memory bound rather than CPU bound. That's why counting basic operations (like swaps or loop iterations) gives results that are quite different from the real running time. So we will use the external memory model instead of the RAM model; that is, we will analyze the number of cache misses. Let N be the array size, M the number of blocks in the cache, and B the size of one block. Since N is big in your test, it is safe to assume that N > M * B (that is, the whole array cannot fit in the cache).
1) The first algorithm: it accesses array elements in the following order: i, (i + k) mod N, (i + 2 * k) mod N, and so on. If k is large, two consecutively accessed elements are not in the same block. So in the worst case two accesses yield two cache misses.
These two blocks will be loaded into the cache, but they might not be used for a long time after that! So when they are accessed again, they might already have been replaced by other blocks (because the cache is smaller than the array), and it will be a miss again. It can be shown that this algorithm can have O(N) cache misses in the worst case.
2) The second algorithm has a very different array access pattern: l, r, l + 1, r - 1, ....
If accessing the l-th element causes a miss, the entire block with it is loaded into the cache, so accesses to l + 1, l + 2, ... until the end of the block will not cause any misses. The same is true for r, r - 1 and so on (it is actually true only if the l and r blocks can be held in the cache at the same time, but this is a safe assumption because caches are usually not direct mapped). So this algorithm has O(N / B) cache misses in the worst case.
Taking into account that the block size of a real cache is larger than the size of one integer, it becomes clear why the second algorithm is significantly faster.
P.S. It is just a model of what's really going on, but in this particular case the external memory model works better than the RAM model (and the RAM model is just a model too, anyway).
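To make the argument concrete, here is a small simulation sketch that counts misses for both access patterns. It is a toy model under explicit assumptions, not a measurement of real hardware: the cache is fully associative with LRU replacement, holds M blocks of B consecutive elements each, and only the element indices touched by each algorithm are replayed. All names (LruCache, juggling_misses, reversal_misses) are made up for this sketch.

```cpp
#include <cstddef>
#include <list>
#include <unordered_map>

// Toy cache: fully associative, LRU replacement, M blocks of B elements each.
struct LruCache {
    std::size_t M, B, misses = 0;
    std::list<std::size_t> order;  // block ids, most recently used first
    std::unordered_map<std::size_t, std::list<std::size_t>::iterator> where;
    LruCache(std::size_t m, std::size_t b) : M(m), B(b) {}
    void access(std::size_t index) {
        const std::size_t block = index / B;
        auto it = where.find(block);
        if (it != where.end()) {  // hit: move the block to the front
            order.splice(order.begin(), order, it->second);
            return;
        }
        ++misses;
        if (order.size() == M) {  // evict the least recently used block
            where.erase(order.back());
            order.pop_back();
        }
        order.push_front(block);
        where[block] = order.begin();
    }
};

// Replays the accesses of algorithm 1: a[j] = a[(j + k) % n] along each cycle.
std::size_t juggling_misses(std::size_t n, std::size_t k, std::size_t M, std::size_t B) {
    LruCache cache(M, B);
    std::size_t g = n, r = k % n;
    while (r != 0) { const std::size_t t = g % r; g = r; r = t; }  // g = gcd(n, k)
    for (std::size_t i = 0; i < g; ++i) {
        std::size_t j = i;
        do {
            const std::size_t next = (j + k) % n;
            cache.access(next);  // read a[next]
            cache.access(j);     // write a[j]
            j = next;
        } while (j != i);
    }
    return cache.misses;
}

// Replays the accesses of algorithm 2: three in-place reversals.
std::size_t reversal_misses(std::size_t n, std::size_t k, std::size_t M, std::size_t B) {
    LruCache cache(M, B);
    auto reverse = [&cache](std::size_t lo, std::size_t hi) {  // half-open range [lo, hi)
        while (lo + 1 < hi) {
            --hi;
            cache.access(lo);
            cache.access(hi);
            ++lo;
        }
    };
    reverse(0, k);
    reverse(k, n);
    reverse(0, n);
    return cache.misses;
}
```

With, say, n = 4096, k = 257, M = 8 and B = 16, nearly every read in the first pattern lands in a block that has already been evicted, while the second pattern misses roughly once per block per pass, which is the O(N) versus O(N / B) gap described above.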

Resources