Converting Scratch to Algorithm

This is my first time learning algorithms, and I am trying to figure it out with Scratch. I am following tutorials on the Scratch wiki. How can I convert this to an algorithm (with a flow chart or normal steps)? Especially the loop (I uploaded it as a picture).
I started:
Step 1: Start
Step 2: Init: delete all of numbers, iterator, amount, sum
Step 3: Ask "How many numbers do you want?"
Step 4: Initialize sum = 0, amount = 0, iterator = 1
Step 5: Enter the element values
Step 6: Find the sum by using a loop over the array and updating the sum value; the loop must continue (number of elements - 1) times
Step 7: avg = sum / number of elements
Step 8: Print the average
I don't think it's right. I feel there are errors. Thank you for your time.

Scratch
Here is variant 2 of the algorithm (see the Java version below) in Scratch. The output should be identical.
Java
Here is the algorithm in Java, where I have commented the steps; this should give you a step-by-step guide on how to do it in Scratch as well.
I have also implemented two variants of the algorithm to show some of the trade-offs a programmer often has to think about when implementing an algorithm, mainly time (= time required for the algorithm to complete) and space (= memory used on your computer).
Please note: the following algorithms do not handle errors. For example, if a user entered a letter instead of a number, the program would crash. It is easy to adjust the program to handle this, but for simplicity I did not do that.
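If you did want to guard against that, one minimal sketch (not part of the variants below; the helper name readInt is mine) keeps asking until the Scanner actually sees an integer, using hasNextInt():
// assumes: import java.util.Scanner;
static int readInt(Scanner sc, String prompt) {
    System.out.print(prompt);
    // hasNextInt() checks the next token without consuming it
    while (!sc.hasNextInt()) {
        sc.next(); // discard the non-numeric token
        System.out.print("Please enter a number: ");
    }
    return sc.nextInt();
}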
Variant 1: Storing all elements in array numbers
This variant stores all numbers in an array numbers and calculates the sum at the end using those numbers, which is slower than variant 2 because the algorithm goes over all the numbers twice. The upside is that you preserve all the numbers the user entered and could use them later on if needed, but you need extra storage to hold those values.
// requires: import java.util.Scanner;
public static void yourAlgorithm() {
    // needed in Java to get input from the user
    var sc = new Scanner(System.in);
    // print to screen (equivalent to "say" / "ask")
    System.out.print("How many numbers do you want? ");
    // get the amount of numbers as the answer from the user
    var amount = sc.nextInt();
    // create an array to store all elements
    var numbers = new int[amount];
    // set iterator to 1
    int iterator = 1;
    // as long as the iterator is smaller than or equal to the number of required numbers, keep asking for new numbers
    // equivalent to "repeat amount", except that retries are possible if no number was entered
    while (iterator <= amount) {
        // ask for a number
        System.out.printf("%d. number: ", iterator);
        // insert the number at position iterator - 1 in the array
        numbers[iterator - 1] = sc.nextInt();
        // increase the iterator by one
        iterator++;
    }
    // calculate the sum after all the numbers have been entered by the user
    int sum = 0;
    // go over all numbers again! (this is why it is slower) and calculate the sum
    for (int i = 0; i < amount; i++) {
        sum += numbers[i];
    }
    // print the average to screen
    System.out.printf("Average: %s / %s = %s", sum, amount, (double) sum / (double) amount);
}
Variant 2: Calculating sum when entering new number
This algorithm does not store the numbers the user enters but immediately uses the input to update the sum. It is faster because only one loop is required, and it needs less memory because the numbers do not have to be stored.
This would be the best solution (fastest, least space/memory needed) in case you do not need all the numbers the user entered later on.
// needed in Java to get input from the user
var sc = new Scanner(System.in);
// print to screen (equivalent to "say" / "ask")
System.out.print("How many numbers do you want? ");
// get the amount of numbers as the answer from the user
var amount = sc.nextInt();
// set iterator to 1
int iterator = 1;
int sum = 0;
// as long as the iterator is smaller than or equal to the number of required numbers, keep asking for new numbers
// equivalent to "repeat amount", except that retries are possible if no number was entered (e.g. a character was entered instead)
while (iterator <= amount) {
    // ask for a number
    System.out.printf("%d. number: ", iterator);
    // get the number from the user
    var newNumber = sc.nextInt();
    // add the new number to the sum
    sum += newNumber;
    // increase the iterator by one
    iterator++;
}
// print the average to screen
System.out.printf("Average: %s / %s = %s", sum, amount, (double) sum / (double) amount);
Variant 3: Combining both approaches
You could also combine both approaches, i.e. calculate the sum within the first loop and additionally store the values in a numbers array so you can use them later on if you need to.
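A minimal sketch of that combined variant (same assumptions as above, still no error handling) might look like this:
// needed in Java to get input from the user
var sc = new Scanner(System.in);
System.out.print("How many numbers do you want? ");
var amount = sc.nextInt();
// store the numbers *and* keep a running sum at the same time
var numbers = new int[amount];
int sum = 0;
for (int iterator = 1; iterator <= amount; iterator++) {
    System.out.printf("%d. number: ", iterator);
    numbers[iterator - 1] = sc.nextInt();
    // add the number to the sum right away (single loop, like variant 2)
    sum += numbers[iterator - 1];
}
// numbers[] is still available here if you need the individual values later
System.out.printf("Average: %s / %s = %s", sum, amount, (double) sum / (double) amount);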
Expected output
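Assuming, for example, that the user asks for 3 numbers and enters 1, 2 and 3, both variants print:
How many numbers do you want? 3
1. number: 1
2. number: 2
3. number: 3
Average: 6 / 3 = 2.0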

Related

Find all number pairs in a given range

I have N numbers, let's say 20 30 15 30 30 40 15 20. Now I want to find how many number pairs are in a given range (L and R given).
A number pair means both numbers are the same.
My approach:
Create a map of arrays, such that the key of the map is the number and the value is an ArrayList of the indexes at which that number appears. Then I traverse from L to R, and for each value in that range I traverse the corresponding ArrayList to find if there is a pair that fits in the range, and then increment the count.
But I think this approach is too slow. Is there some faster method to do the same?
Example: for the sequence given above and L=0 and R=6
Answer = 5. Possible pairs are 1 for 20, 1 for 15 and 3 for 30.
I am developing a solution, assuming numbers can be up to 10^8 (and non-negative).
If you are looking for speed and don't care about memory, there may be a better way.
You can use a set as an auxiliary data structure to see if a number was already found, and then simply walk the array. Pseudo code:
int numPairs = 0;
set setVisited;
for (int i = L; i < R; i++) {
    if (setVisited.contains(a[i])) {
        // found the second of a pair. count it up and reset.
        numPairs++;
        setVisited.remove(a[i]);
    } else {
        // remember that we saw this number, so we can spot the next pair.
        setVisited.add(a[i]);
    }
}
New solution... hopefully better this time. Pseudo C-ish code:
// Sort the sub-array a[L..R]. This can be done in O(n log n) using qsort.
// ... code omitted ...
// Walk through the sorted array counting how many times each number occurs.
// When the number changes, count how many possible ways there are to make pairs
// from the given count.
int totalPairs = 0;
int count = 1;
int current = a[L];
for (i = L + 1; i < R; i++) {
    if (a[i] == current) { // found another, keep counting
        count++;
    } else { // found a different one
        if (count > 1) { // need at least 2 to make a pair!
            totalPairs += factorial(count) / 2;
        }
        // start counting the new one
        current = a[i];
        count = 1;
    }
}
// count the final one
if (count > 1) {
    totalPairs += factorial(count) / 2;
}
The sort runs in O(n log n), and the loop runs in O(n). Interestingly, the performance barrier is now the factorial. For really long arrays with really high numbers of occurrences, the factorial is expensive unless you optimize further.
One way would be to have the loop count repetitions but not compute the factorial yet, leaving yet another array of counts. Then sort this array (again O(n log n)) and walk through it, re-using each previously computed factorial to compute the next one.
Also, if this array gets big, you'll need a large integer type to represent the total. I don't know the O() performance of large integer arithmetic off the top of my head.
Cool problem!

Random logic engine implementation ideas

I am trying to find an effective random logic algorithm for this scenario. The programming language doesn't matter:
Say I have a 20-element array filled with numbers
[1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20]
From this I need to construct, each time, a 15-element array, BUT
each time I specify numbers that must be in this new array, and the remaining slots are filled with random numbers from the master array.
For example:
The numbers that must be in the new array are: 1, 11, 13, 20, 8, 9
so the new array will be:
[1,N,N,11,N,20,8,N,9,N,N,N,13,N,N]
Where the Ns are random numbers from ALL 20 elements of the Master array.
Another example:
given 2,18,17,9,5
create a new 10-element array:
[2,2,18,2,11,17,20,5,5,9]
Duplicate elements are not a problem.
I'm trying to find some good algorithm for this.
If you want to receive one random number at a time and don't want to create the full result array up front, an alternative to my other answer is this:
Get a random number ranging from 0..requested_number (where requested_number is the total number of elements to fetch).
If this index is between 0 and length(required), print the element at that index from the required array, then remove it from that array;
.. else the index is > length(required), so pick any random element out of the optional array.
Decrease requested_number and repeat until it reaches 0.
You need two calls to random: the first to select an index from total_number - required_number, so you know which array to pick a value from, and the second, for optional, to pick a random element out of the entire available range.
Here is a basic implementation in C (footnote: using mod on rand() does not yield A Good Random Number, but it'll do for this example).
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <errno.h>

int main()
{
    int optional[] = { 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20 };
    int required[] = { 21,22,23,24,25 };
    int requested_number = 15;
    int take_from_required, optional_size, next;
    srand(time(NULL));
    if (requested_number < sizeof(required)/sizeof(required[0]))
    {
        printf ("requested number of elements must be at least as large as required array\n");
        return EDOM;
    }
    /* Use this much from 'required': */
    take_from_required = sizeof(required)/sizeof(required[0]);
    /* Use this much from 'optional': */
    optional_size = sizeof(optional)/sizeof(optional[0]);
    while (requested_number > 0)
    {
        /* Please note this is a fairly bad 'random'!
           As discussed many times before on SO. */
        next = rand() % requested_number;
        /* Take from which array? */
        if (next >= take_from_required)
        {
            printf ("%d\n", optional[rand() % optional_size]);
        } else
        {
            printf ("%d (required)\n", required[next]);
            required[next] = required[take_from_required-1];
            take_from_required--;
        }
        requested_number--;
    }
    return 0;
}
If I understand correctly, this is the issue:
optional [ 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20 ]
required [ 2,18,17,9,5 ]
Now construct a new array containing at least all elements of required, and filled to its capacity with elements taken from optional.
The problem seems to be that you need to take out random numbers from either required or optional and at the same time make sure required is empty at the end. [*]
Create a new array result (which needs to be at least as long as required -- then again, that can be inferred from the question). Copy all elements of required into it; fill the rest with random elements from optional.
At this point, you fulfill the primary condition, but the elements of required always appear first. So, as a last step, shuffle the elements now stored in the result array (for example, with the well-known Fisher-Yates shuffle).
[*] 'Empty', because all numbers in required must be used at least once. Taking them "out" of the array is the easiest way to make sure this happens. Things start to get complicated when (a) you may have duplicates of any number (from both optional and required) and (b) required is not a subset of optional.
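A minimal sketch of this copy, fill, and shuffle approach (written in Java here purely for illustration; the class and method names are mine, not from the question):
import java.util.Arrays;
import java.util.Random;

public class BuildArraySketch {
    static int[] buildArray(int[] required, int[] optional, int size, Random rnd) {
        int[] result = new int[size];
        // 1. copy all required elements first
        System.arraycopy(required, 0, result, 0, required.length);
        // 2. fill the remaining slots with random elements from optional (duplicates allowed)
        for (int i = required.length; i < size; i++) {
            result[i] = optional[rnd.nextInt(optional.length)];
        }
        // 3. Fisher-Yates shuffle so the required elements no longer sit at the front
        for (int i = size - 1; i > 0; i--) {
            int j = rnd.nextInt(i + 1);
            int tmp = result[i]; result[i] = result[j]; result[j] = tmp;
        }
        return result;
    }

    public static void main(String[] args) {
        int[] optional = { 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20 };
        int[] required = { 2, 18, 17, 9, 5 };
        System.out.println(Arrays.toString(buildArray(required, optional, 10, new Random())));
    }
}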

Data structure for set of (non-disjoint) sets

I'm looking for a data structure that roughly corresponds to (in Java terms) Map<Set<int>, double>. Essentially a set of sets of labeled marbles, where each set of marbles is associated with a scalar. I want it to be able to efficiently handle the following operations:
Add a given integer to every set.
Remove every set that contains (or does not contain) a given integer, or at least set the associated double to 0.
Union two of the maps, adding together the doubles for sets that appear in both.
Multiply all of the doubles by a given double.
Rarely, iterate over the entire map.
under the following conditions:
The integers will fall within a constrained range (between 1 and 10,000 or so); the exact range will be known at compile-time.
Most of the integers within the range (80-90%) will never be used, but which ones will not be easily determinable until the end of the calculation.
The number of integers used will almost always still be over 100.
Many of the sets will be very similar, differing only by a few elements.
It may be possible to identify certain groups of integers that frequently appear only in sequential order: for example, if a set contains the integers 27 and 29 then it (almost?) certainly contains 28 as well.
It may be possible to identify these groups prior to running the calculation.
These groups would typically have 100 or so integers.
I've considered tries, but I don't see a good way to handle the "remove every set that contains a given integer" operation.
The purpose of this data structure would be to represent discrete random variables and permit addition, multiplication, and scalar multiplication operations on them. Each of these discrete random variables would ultimately have been created by applying these operations to a fixed (at compile-time) set of independent Bernoulli random variables (i.e. each takes the value 1 or 0 with some probability).
The systems being modeled are close to being representable as a time-inhomogeneous Markov chains (which would of course simplify this immensely) but, unfortunately, it is essential to track the duration since various transitions.
Here's a data structure that can do all of your operations pretty efficiently:
I'm going to refer to it as a BitmapArray for this explanation.
Thinking about it, for just the operations you have described, a sorted array with bitmaps as keys and weights (your doubles) as values will be pretty efficient.
The bitmaps are what maintain membership in your sets. Since you said the integers in a set are between 1 and 10,000, we can represent any set with a bitmap of length 10,000.
It is tough to sort an array where the keys can be as big as 2^10000, but you can be smart about implementing the comparison function in the following way:
Iterate from left to right on the two bitmaps
XOR the bits on each index
Say you get a 1 at the ith position
Whichever bitmap has a 1 at the ith position is greater
If you never get a 1, they're equal
I know this is still a slow comparison.
But not too slow; here's a benchmark fiddle I did on bitmaps of length 10,000.
This is in JavaScript; if you're going to write it in Java, it's going to perform even better.
function runTest() {
    var num = document.getElementById("txtValue").value;
    num = isNaN(num * 1) ? 0 : num * 1;
    /* For integers in the range 1-10,000 the worst case for comparison is two equal
       integers, which causes the comparison to iterate over the whole bit array. */
    bitmap1 = convertToBitmap(10000, num);
    bitmap2 = convertToBitmap(10000, num);
    before = new Date().getMilliseconds();
    var result = firstIsGreater(bitmap1, bitmap2, 10000);
    after = new Date().getMilliseconds();
    alert(result + " in time: " + (after - before) + " ms");
}

function convertToBitmap(size, number) {
    var bits = new Array();
    var q = number;
    do {
        bits.push(q % 2);
        q = Math.floor(q / 2);
    } while (q > 0);
    xbitArray = new Array();
    for (var i = 0; i < size; i++) {
        xbitArray.push(0);
    }
    var j = xbitArray.length - 1;
    for (var i = bits.length - 1; i >= 0; i--) {
        xbitArray[j] = bits[i];
        j--;
    }
    return xbitArray;
}

function firstIsGreater(bitArray1, bitArray2, lengthOfArrays) {
    for (var i = 0; i < lengthOfArrays; i++) {
        if (bitArray1[i] ^ bitArray2[i]) {
            if (bitArray1[i]) return true;
            else return false;
        }
    }
    return false;
}

document.getElementById("btnTest").onclick = function (e) {
    runTest();
};
Also, remember that you only have to do this once, when building your BitmapArray (or while taking unions) and then it's going to become pretty efficient for the operations you'd do most often:
Note: N is the length of the BitmapArray.
Add integer to every set: Worst/best case O(N) time. Flip a 0 to 1 in each bitmap.
Remove every set that contains a given integer: Worst case O(N) time.
For each bitmap, check the bit that represents the given integer; if it is 1, mark its index.
Compress the array by deleting all marked indices.
If you're okay with just setting the weights to 0 it'll be even more efficient. This also makes it very easy if you want to remove all sets that have any element in a given set.
Union of two maps: Worst case O(N1+N2) time. Just like merging two sorted arrays, except you have to be smart about comparisons once more.
Multiply all of the doubles by a given double: Worst/best case O(N) time. Iterate and multiply each value by the input double.
Iterate over the BitmapArray: Worst/best case O(1) time for next element.
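If you do end up writing this in Java, here is a minimal sketch of the same idea (purely illustrative names, not from this answer; java.util.BitSet holds the 10,000-bit membership bitmaps, and a TreeMap is kept ordered by the comparison described above):
import java.util.BitSet;
import java.util.Comparator;
import java.util.TreeMap;

class BitmapArraySketch {
    // At the first position where the two bitmaps differ, whichever has the 1 is "greater".
    static final Comparator<BitSet> BITMAP_ORDER = (a, b) -> {
        BitSet diff = (BitSet) a.clone();
        diff.xor(b);                 // positions where the two bitmaps differ
        int i = diff.nextSetBit(0);  // first differing position
        if (i < 0) return 0;         // identical bitmaps
        return a.get(i) ? 1 : -1;
    };

    private TreeMap<BitSet, Double> sets = new TreeMap<>(BITMAP_ORDER);

    // Union with another map, adding the weights of sets that appear in both.
    void unionWith(BitmapArraySketch other) {
        other.sets.forEach((bitmap, w) -> sets.merge(bitmap, w, Double::sum));
    }

    // Multiply every weight by a scalar: O(N).
    void scale(double factor) {
        sets.replaceAll((bitmap, w) -> w * factor);
    }

    // Add a given integer to every set. Flipping a bit can change the relative order
    // of the keys, so the map is rebuilt rather than mutated in place.
    void addToAll(int value) {
        TreeMap<BitSet, Double> rebuilt = new TreeMap<>(BITMAP_ORDER);
        sets.forEach((bitmap, w) -> {
            BitSet updated = (BitSet) bitmap.clone();
            updated.set(value);
            rebuilt.merge(updated, w, Double::sum);
        });
        sets = rebuilt;
    }

    // "Remove" every set that contains a given integer by zeroing its weight: O(N).
    void zeroSetsContaining(int value) {
        sets.replaceAll((bitmap, w) -> bitmap.get(value) ? 0.0 : w);
    }
}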

Get N samples given iterator

Given are an iterator it over data points, the number of data points we have n, and the maximum number of samples we want to use to do some calculations (maxSamples).
Imagine a function calculateStatistics(Iterator it, int n, int maxSamples). This function should use the iterator to retrieve the data and do some (heavy) calculations on the data element retrieved.
if n <= maxSamples we will of course use each element we get from the iterator
if n > maxSamples we will have to choose which elements to look at and which to skip
I've been spending quite some time on this. The problem is of course how to choose when to skip an element and when to keep it. My approaches so far:
I don't want to take the first maxSamples coming from the iterator, because the values might not be evenly distributed.
Another idea was to use a random number generator to create maxSamples (distinct) random numbers between 0 and n and take the elements at these positions. But if, e.g., n = 101 and maxSamples = 100, it gets more and more difficult to find a new distinct number not yet in the list, losing a lot of time just in the random number generation.
My last idea was to do the contrary: to generate n - maxSamples random numbers and exclude the data elements at these positions. But this also doesn't seem to be a very good solution.
Do you have a good idea for this problem? Are there maybe standard known algorithms for this?
To provide some answer, a good way to collect a set of random elements when the collection size is greater than the number of elements needed is the following (in C++-ish pseudo code).
EDIT: you may need to iterate over and create the "someElements" vector first. If your elements are large they can be "pointers" to these elements to save space.
vector randomCollectionFromVector(someElements, numElementsToGrab) {
    while (numElementsToGrab--) {
        randPosition = rand() % someElements.size();
        resultVector.push(someElements.get(randPosition));
        someElements.remove(randPosition);
    }
    return resultVector;
}
If you don't care about changing your vector of elements, you could also remove random elements from someElements, as you mentioned. The algorithm would look very similar, and again, this is conceptually the same idea, you just pass someElements by reference, and manipulate it.
Something worth noting is that the apparent quality of a pseudo-random distribution grows with the number of random draws you use. So you may tend to get better results if you pick whichever method uses more random numbers. Example: if you have 100 values and need 99, you should probably pick the 99 values, as this uses 99 pseudo-random numbers instead of just 1. Conversely, if you have 1000 values and need 99, you should probably prefer the version where you remove 901 values, because it uses more numbers from the pseudo-random distribution. If what you want is a solid random distribution, this is a very simple optimization that will greatly increase the quality of the "fake randomness" you see. Alternatively, if performance matters more than the distribution, you could take the other version, or even just grab the first 99 values.
interval = n / (n - maxSamples)   // an integer (Euclidean) division, of course
offset = random(0..(n-1))         // a random number between 0 and n-1
totalSkip = 0
indexSample = 0
FOR it IN samples DO
    indexSample++                 // goes from 1 to n
    IF totalSkip < (n - maxSamples) AND (indexSample + offset) % interval == 0 THEN
        // do nothing with this sample
        totalSkip++
    ELSE
        // work with this sample
    ENDIF
ENDFOR
ASSERT(totalSkip == n - maxSamples)   // to be sure
interval represents the distance between two samples to skip.
offset is not mandatory, but it adds a little diversity.
Based on the discussion, and a greater understanding of your problem, I suggest the following. You can take advantage of a property of prime numbers that I think will net you a very good solution, one that appears to grab pseudo-random numbers. It is illustrated in the following code.
#include <iostream>
using namespace std;

int main() {
    const int SOME_LARGE_PRIME = 577; // this prime should be larger than the size of your data set
    const int NUM_ELEMENTS = 100;
    int lastValue = 0;
    for (int i = 0; i < NUM_ELEMENTS; i++) {
        lastValue += SOME_LARGE_PRIME;
        cout << lastValue % NUM_ELEMENTS << endl;
    }
}
Using the logic presented here, you can create a table of all values from 1 to NUM_ELEMENTS. Because of the properties of prime numbers, you will not get any duplicates until you wrap all the way around to the size of your data set. If you then take the first NUM_SAMPLES of these and sort them, you can iterate through your data structure and grab a pseudo-random distribution of numbers (not very good randomness, but more random than a predetermined interval), without extra space and with only one pass over your data. Better yet, you can change the layout of the distribution by grabbing a random prime number each time; again, it must be larger than your data set, or the following example breaks.
PRIME = 3, data set size = 99. Won't work.
Of course, ultimately this is very similar to the pre-determined interval, but it inserts a level of randomness that you do not get by simply grabbing every "size/num_samples"th element.
This is called Reservoir sampling.

How to keep a random subset of a stream of data?

I have a stream of events flowing through my servers. It is not feasible for me to store all of them, but I would like to periodically be able to process some of them in aggregate. So, I want to keep a subset of the stream that is a random sampling of everything I've seen, but is capped to a max size.
So, for each new item, I need an algorithm to decide if I should add it to the stored set, or if I should discard it. If I add it, and I'm already at my limit, I need an algorithm to evict one of the old items.
Obviously, this is easy as long as I'm below my limit (just save everything). But how can I maintain a good random sampling without being biased towards old items or new items once I'm past that limit?
Thanks,
This is a common interview question.
One easy way to do it is to save the nth element with probability k/n (or 1, whichever is lesser). If you need to remove an element to save the new sample, evict a random element.
This gives you a uniformly random subset of the n elements. If you don't know n, you can estimate it and get an approximately uniform subset.
This is called reservoir sampling. Source: http://en.wikipedia.org/wiki/Reservoir_sampling
array R[k];    // result
integer i, j;

// fill the reservoir array
for each i in 1 to k do
    R[i] := S[i]
done;

// replace elements with gradually decreasing probability
for each i in k+1 to length(S) do
    j := random(1, i);   // important: inclusive range
    if j <= k then
        R[j] := S[i]
    fi
done
A decent explanation/proof: http://propersubset.com/2010/04/choosing-random-elements.html
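For a true stream, where you cannot index S directly, a minimal Java sketch of the same idea is (illustrative names, not taken from the linked sources):
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

class ReservoirSampler<T> {
    private final int k;                        // maximum number of samples to keep
    private final List<T> sample = new ArrayList<>();
    private final Random rnd = new Random();
    private long seen = 0;                      // how many events have arrived so far

    ReservoirSampler(int k) { this.k = k; }

    void offer(T event) {
        seen++;
        if (sample.size() < k) {
            sample.add(event);                  // below the limit: keep everything
        } else if (rnd.nextDouble() < (double) k / seen) {
            // keep the nth element with probability k/n, evicting a random old one
            sample.set(rnd.nextInt(k), event);
        }
    }

    List<T> current() { return sample; }
}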
While this paper isn't precisely what you're looking for, it may be a good starting point in your search.
Store samples in a first-in, first-out (FIFO) queue.
Set a sampling rate of so many events between samples, or randomize it a bit, depending on your patterns of events.
Save every nth event, or whenever your rate tells you to, then stick it onto the end of the queue.
Pop one off the front if the size is too big.
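A minimal sketch of that queue-based approach (Java, with illustrative names; sampleEvery and maxSize are assumed parameters):
import java.util.ArrayDeque;
import java.util.Deque;

class FifoSampler<T> {
    private final Deque<T> queue = new ArrayDeque<>();
    private final int sampleEvery;   // keep one event out of every sampleEvery
    private final int maxSize;       // cap on how many samples are kept
    private long counter = 0;

    FifoSampler(int sampleEvery, int maxSize) {
        this.sampleEvery = sampleEvery;
        this.maxSize = maxSize;
    }

    void onEvent(T event) {
        if (++counter % sampleEvery != 0) return;  // not this event's turn
        queue.addLast(event);                      // stick it onto the end of the queue
        if (queue.size() > maxSize) {
            queue.removeFirst();                   // pop the oldest one off the front
        }
    }
}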
This is assuming you don't know the total number of events that will be received and that you don't need a minimum number of elements in the subset.
arr = new array[MAX_SIZE]   // create a new array that will store the events; assuming the first index is 1
counter = 1                 // initialize a counter
while (receiving event) {
    if (counter <= MAX_SIZE) {
        arr[counter] = event                 // below the limit: just save everything
    } else {
        random = // generate a random number between 1 and counter
        if (random <= MAX_SIZE) {
            arr[random] = event              // replace a random old element with the new one
        }
    }
    counter += 1
}
Assign a probability of recording each event and store the event in an indexable data structure. When the size of the structure gets to the threshold, remove a random element and add new elements. In Ruby, you could do this:
@storage = []
prob = 0.002
while (message = getnextMessage) do
  @storage.delete_at((rand() * @storage.length).floor) if @storage.length > MAX_LEN
  @storage << message if rand() < prob
end
This addresses your max size AND your non-bias toward when the event occurred. You could also choose which element gets deleted by partitioning your stored elements into buckets and then removing an element from any bucket that has more than one element. The bucket method allows you to keep one from each hour, for example.
You should also know that sampling theory is Big Math. If you need more than a layman's idea about this you should consult a qualified mathematician in your area.
