Amazon EC2 virtual machine

I have a problem.
I conjectured that there is no prime formed by the concatenation of two consecutive Mersenne numbers (examples are 157, 12763, 40952047, ...) which is congruent to 6 mod 7. Because these numbers have the form (2^k-1)*10^d+2^(k-1)-1, it is easy to see that they grow very rapidly. I got up to k=565,000 and no prime congruent to 6 mod 7 was found. Now I would like to continue the search at least up to k=1,000,000, and I thought an Amazon EC2 virtual machine could be the solution. What do you suggest, and which package should I buy?

AWS provides virtual machines through its EC2 service.
For such a specific problem, I think you will have to write the program yourself, even though EC2 offers instance types geared toward compute-heavy workloads like yours.
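To illustrate the construction being tested, here is a minimal Python sketch (not the asker's program; sympy's isprime would be far too slow for k anywhere near one million, so this only demonstrates the small cases):

from sympy import isprime

# Hypothetical helper: build the concatenation (2^k - 1) * 10^d + 2^(k-1) - 1,
# where d is the number of decimal digits of the smaller Mersenne number.
def concat_mersenne(k):
    lo = 2**(k - 1) - 1
    d = len(str(lo))
    return (2**k - 1) * 10**d + lo

# The small examples from the question.
assert concat_mersenne(4) == 157 and concat_mersenne(7) == 12763

# Search small k for a counterexample (a prime congruent to 6 mod 7);
# per the conjecture, nothing is printed.
for k in range(2, 30):
    n = concat_mersenne(k)
    if n % 7 == 6 and isprime(n):
        print(k, n)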

Related

Guest allocation with Genetic Algorithm

In our current project, we need to allocate a number of families to hosts every weekend for lunch.
Each host can serve a set number of guests (the host capacity).
Each family can have a different number of members.
There are several rules that need to be applied before a family can be assigned to a host:
Some families don't want to have lunch together (meaning a family can't be assigned to a host if the other family is already assigned to it).
A family cannot be allocated to the same host for 2 consecutive weeks.
Children under 13 should be no more than 50% of the host capacity.
Families with a member older than 50 should be no more than 50% of the host capacity.
and a few more...
Each rule can have a different weight of importance, i.e. if the first rule is broken then the family-to-host allocation can't take place, but the 3rd and 4th ones might be relaxed.
This kind of problem is similar to wedding table seating arrangement, but much more complex. I have researched for a few days and it seems like a Genetic Algorithm might be a good direction. However, I'm stuck on how to model the problem, encode the input and implement the algorithm.
I would really appreciate any advice. Thanks in advance.
Like any other genetic algorithm problem, you need to define chromosomes: valid (or partially invalid) scenarios which can be evaluated.
One example of a chromosome can be, W1: [{H1: F2, F3, F7}; {H2: F4, F6}; {H3: F1, F5}], W2: [{H1: F4, F3, F7}; {H2: F2}; {H3: F1, F5, F6}], ... meaning on Week 1 (W1), Host 1 (H1) is hosting families 2, 3 and 7 and so on.
This can be generated by random assignment or other methods; this step is known as population initialization.
Now there needs to be a way to evaluate such a chromosome, which can be done on the basis of the rules and weights defined in the question.
After this, the essential operators like crossover and mutation can be designed.
Basic examples:
1) for mutation, families can be shifted or switched from one host to another;
2) for crossover, some weekends can be chosen from one parent and some from the second, and duplicate families can be adjusted.
Hope this helps.
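To make this concrete, here is a minimal Python sketch of the representation and the two operators above; all names are hypothetical, and the fitness function is omitted since it depends on the rules' weights:

import random

def random_chromosome(weeks, hosts, families):
    # one dict per weekend: {host_id: [family_ids assigned to that host]}
    chrom = []
    for _ in range(weeks):
        assignment = {h: [] for h in hosts}
        for f in families:
            assignment[random.choice(hosts)].append(f)
        chrom.append(assignment)
    return chrom

def mutate(chrom):
    # shift one random family to another host within a random weekend
    week = random.choice(chrom)
    src = random.choice([h for h in week if week[h]])
    week[random.choice(list(week))].append(week[src].pop())

def crossover(a, b):
    # take each weekend whole from one parent or the other; a weekend is
    # internally consistent, so no duplicate-family repair is needed
    return [random.choice(pair) for pair in zip(a, b)]

pop = [random_chromosome(2, ["H1", "H2", "H3"],
                         ["F1", "F2", "F3", "F4", "F5", "F6", "F7"])
       for _ in range(50)]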

Comparing secret data without giving away source

Issue:
Company A has secret data they don't want to give away to company B.
Company B has secret data they don't want to give away to company A.
The secret data is IP addresses on both sides.
But the two companies want to know the number of overlapping IPs they have (IP addresses that both companies have in the database).
Without using a third party, I can't think of a way to solve this issue without one party compromising their secret data set. Is there any type of hashing algorithm designed to solve this problem?
First I'll describe a simple but not very secure idea. Then I'll describe a modification that I think makes it much more secure. The basic idea is to have each company send an encoding of a one-way function to the other company.
Sending Programs
As a warm-up, let's first suppose that one company (let's say A) develops an ordinary computer program in some language and sends it to B; B will then run it, supplying its own list of email addresses as input, and the program will report how many of them are also used by A. At this point, B knows how many email addresses it shares with A. Then the process can be repeated, but with the roles of A and B reversed.
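For instance, a toy version of the program A might send could look like the Python below (hypothetical and obviously insecure, since A's addresses are embedded in the clear, which is exactly the weakness the next sections address):

# The set below is baked in by company A before sending the program.
A_ADDRESSES = {"alice@example.com", "bob@example.com"}

def count_shared(b_addresses):
    # B runs this with its own list and learns only the overlap count
    return len(A_ADDRESSES.intersection(b_addresses))

print(count_shared(["bob@example.com", "carol@example.com"]))  # prints 1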
Sending SAT Instances
Implementing this program straightforwardly in a normal programming language would yield a program that is almost trivially easy to reverse-engineer. To mitigate this, first, instead of having the program report the count directly, let's reformulate the problem as a decision problem: Does the other company have at least k of the emails in the input? (This involves choosing some value k to test for; of course, if both parties agree then the whole procedure can be performed for many different values of k. (But see the last section for possible ramifications.)) Now the program can be represented instead as a SAT instance that takes as input (some bitstring encoding of) a list of email addresses, and outputs a single bit that indicates whether k or more of them also belong to the company that created the instance.
It's computationally easy to supply inputs to a SAT instance and read off the output bit, but when the instance is large, it's (in principle) very difficult to go in "the other direction" -- that is, to find a satisfying assignment of inputs, i.e., a list of email addresses that will drive the output bit to 1: SAT being an NP-hard problem, all known exact techniques take time exponential in the problem size.
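Evaluating in the forward direction really is cheap; for instance, a generic evaluator for a CNF instance (a hypothetical representation, with clauses as lists of signed variable indices) is a few lines of Python:

def evaluate_cnf(clauses, assignment):
    # assignment: dict var -> bool; a negative literal means negation;
    # the whole instance is the AND of its clauses
    return all(
        any(assignment[abs(lit)] == (lit > 0) for lit in clause)
        for clause in clauses
    )

# (x1 OR NOT x2) AND (x2 OR x3)
print(evaluate_cnf([[1, -2], [2, 3]], {1: True, 2: False, 3: True}))  # True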
Making it Harder with Hashing
[EDIT: Actually there are many more than (n choose k) possible hashes to be ORed together, since any valid subsequence (with gaps allowed) in the list of email addresses that contains at least k shared ones needs to turn the output bit on. If each email address takes at most b bits, then there are much more than 2^((n-k)b)*(n choose k) possibilities. It's probably only feasible to sample a small fraction of them, and I don't know if unsampled ones can be somehow turned into "don't-cares"...]
The SAT instance I propose here would certainly be very large, as it would have to be a disjunction (OR) of all (n choose k) possible allowed bitstrings. (Let's assume that email addresses are required to be listed in some particular order, to eliminate an n! factor.) However, it has a very regular structure that might make it amenable to analysis that could dramatically reduce the time required to solve it. To get around this, all we need to do is require the receiver to hash the original input and supply this hash value as input instead. The resulting SAT instance will still look like the disjunction (OR) of (n choose k) possible valid bitstrings (which now represent hashes of lists of strings, rather than raw lists of strings) -- but, by choosing a hash size large enough and applying some logic minimisation to the resulting instance, I'm confident that any remaining telltale patterns can be removed. (If anyone with more knowledge in the area can confirm or deny this, please edit or comment.)
Possible Attacks
One weakness of this approach is that nothing stops the receiver from "running" (supplying inputs to) the SAT instance many times. So, choosing k too low allows the receiver to easily isolate the email addresses shared with the sender by rerunning the SAT instance many times using different k-combinations of their own addresses, and dummy values (e.g. invalid email addresses) for the remaining input bits. E.g. if k=2, then the receiver can simply try running all n^2 pairs of its own email addresses and invalid email addresses for the rest until a pair is found that turns the output bit on; either of these email addresses can then be paired with all remaining email addresses to detect them in linear time.
You should be able to use homomorphic encryption to carry out the computation. I imagine creating something like bitmasks on both sides, encrypting them, then performing an XOR of the results. I think this source points to some information on which encryption schemes support XOR.
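As a plaintext illustration (no encryption, purely to show why an XOR of bitmasks carries enough information): with exact bitmasks over a shared address universe, popcount(a ^ b) = |A| + |B| - 2*|A & B|, so each side's own count plus the XOR reveals the overlap. A hypothetical Python sketch:

def overlap_from_xor(mask_a, mask_b):
    # |A & B| = (|A| + |B| - |A xor B|) / 2
    sym_diff = bin(mask_a ^ mask_b).count("1")
    size_a = bin(mask_a).count("1")
    size_b = bin(mask_b).count("1")
    return (size_a + size_b - sym_diff) // 2

a = 0b10110  # A's IPs as a bitmask over a tiny 5-address universe
b = 0b11010  # B's IPs over the same universe
assert overlap_from_xor(a, b) == 2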

Find Top 10 Most Frequently Visited URLs, data is stored across network

Source: Google Interview Question
Given a large network of computers, each keeping log files of visited urls, find the top ten most visited URLs.
We have many large <string (url) -> int (visits)> maps.
Calculate <string (url) -> int (sum of visits among all distributed maps)>, and get the top ten in the combined map.
Main constraint: the maps are too large to transmit over the network. Also, MapReduce can't be used directly.
I have now come across quite a few questions of this type, where processing needs to be done over large distributed systems. I can't think of or find a suitable answer.
All I could think of is brute force, which in some way or other violates the given constraint.
It says you can't use MapReduce directly, which is a hint that the author of the question wants you to think about how MapReduce works, so we will just mimic its actions:
Pre-processing: let R be the number of servers in the cluster; give each server a unique id from 0,1,2,...,R-1.
(map) For each (string, count) pair, send the tuple to the server whose id is hash(string) % R.
(reduce) Once step 2 is done (simple control communication), produce the (string, count) pairs of the top 10 strings per server. Note that the relevant tuples are those sent in step 2 to this particular server.
(map) Each server sends its top 10 to one server (let it be server 0). This should be fine; there are only 10*R such records.
(reduce) Server 0 yields the top 10 across the network. A single-process sketch of the whole pipeline follows the notes below.
Notes:
The problem with this algorithm, as with most big-data algorithms that don't use frameworks, is handling failing servers. MapReduce takes care of that for you.
The above algorithm can be translated to a 2-phase MapReduce algorithm pretty straightforwardly.
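Here is a single-process Python sketch of the steps above (hash routing, per-partition reduction, final merge); the Counters stand in for the per-server maps, and all names are hypothetical:

import heapq
from collections import Counter

R = 4  # number of servers in the cluster

def shuffle(logs):
    # step 2: route every (url, visits) tuple to server hash(url) % R
    partitions = [Counter() for _ in range(R)]
    for url, visits in logs:
        partitions[hash(url) % R][url] += visits
    return partitions

def top10(counter):
    # step 3: per-server top 10; exact, because after the shuffle each
    # url lives on exactly one server
    return heapq.nlargest(10, counter.items(), key=lambda kv: kv[1])

def global_top10(logs):
    # steps 4-5: merge the per-server top 10s at "server 0"
    merged = []
    for part in shuffle(logs):
        merged.extend(top10(part))
    return heapq.nlargest(10, merged, key=lambda kv: kv[1])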
In the worst case, any algorithm which does not require transmitting the whole frequency table is going to fail: we can construct a trivial case where the global top 10 all sit at the bottom of every individual machine's list.
If we assume that the frequencies of the URIs follow Zipf's law, we can come up with effective solutions. One such solution follows.
Each machine sends its top-K elements, where K depends solely on the available bandwidth. One master machine aggregates the frequencies and finds the 10th maximum frequency value, V10 (note that this is a lower bound: since the global top 10 may not be in the top K of every machine, the sums are incomplete).
In the next step, every machine sends a list of the URIs whose local frequency is at least V10/M (where M is the number of machines); any URI whose global frequency is at least V10 must reach that threshold on at least one machine. The union of all such lists is sent back to every machine. Each machine, in turn, sends back the frequencies for this particular list, and a master aggregates these into the top-10 list.
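A hedged single-process Python sketch of this two-round protocol, with Counters standing in for the machines' local tables:

import heapq
from collections import Counter

def two_round_top10(machines, K):
    # machines: list of Counter(url -> local visits); K: bandwidth budget
    M = len(machines)
    # round 1: each machine sends its local top K; the master sums them
    partial = Counter()
    for m in machines:
        for url, c in heapq.nlargest(K, m.items(), key=lambda kv: kv[1]):
            partial[url] += c
    v10 = heapq.nlargest(10, partial.values())[-1]  # lower bound on 10th frequency
    # round 2: exact global counts for every url reaching v10/M somewhere
    candidates = {u for m in machines for u, c in m.items() if c >= v10 / M}
    exact = Counter({u: sum(m[u] for m in machines) for u in candidates})
    return exact.most_common(10)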

brute force search optimisation

I have a function that is engineered as follows:
int brutesearch(startNumber,endNumber);
This function performs a linear search and returns the correct number if one matches my criteria, or null if none is found in the searched range.
Say that:
I want to search all 6-digit numbers to find one that does something I want
I can run the brutesearch() function multithreaded
I have a laptop with 4 cores
My question is the following:
What is my best bet for optimising this search? Dividing the number space into 4 segments and running 4 instances of the function, one on each core? Or dividing into, for example, 10 segments and running them all together, or dividing into 12 segments and running them in batches of 4 using a queue?
Any ideas?
Knowing nothing about your search criteria (there may be other considerations created by the memory subsystem), the tradeoff here is between the cost of having some processors do more work than others (e.g., because the search predicate is faster on some values than on others, or because other threads were scheduled in) and the cost of coordinating the work. A strategy that has worked well for me is to have a work queue from which threads grab a constant/#threads fraction of the remaining tasks each time. With only four processors, though, it's pretty hard to go wrong; the really big running-time wins come from better algorithms.
There is no general answer; you need to give more information.
If each of your comparisons is completely independent of the others, there is no opportunity to save computation through a global resource (say, no global hash tables are involved), and your operations are all done in a single stage, then your best bet is to divide your problem space by the number of cores you have available (in this case 4) and send 1/4 of the data to each core.
For example, suppose you had 10 million unique numbers that you wanted to test for primality, or 10 million passwords you were trying to hash to find a match: then just divide by 4.
If you have a real-world problem, then you need to know a lot more about the underlying operations to get a good solution. For example, if a global resource is involved, then you won't get any improvement from parallelism unless you isolate the operations on the global resource somehow.
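A hedged Python sketch of the queue-of-segments idea from the question (the predicate is a stand-in, and processes rather than threads are used so the 4 cores actually run in parallel):

from concurrent.futures import ProcessPoolExecutor

def matches(n):
    # stand-in predicate; the real criterion is whatever brutesearch checks
    return n % 99991 == 0

def brutesearch(start, end):
    # linear scan of one segment; None plays the role of the original null
    return next((n for n in range(start, end) if matches(n)), None)

if __name__ == "__main__":
    # 12 segments consumed by 4 workers, so faster workers pick up the slack
    chunks = [(s, min(s + 75000, 1000000))
              for s in range(100000, 1000000, 75000)]
    with ProcessPoolExecutor(max_workers=4) as pool:
        for hit in pool.map(brutesearch, *zip(*chunks)):
            if hit is not None:
                print(hit)
                break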

Resource allocation using genetic algorithm and MATLAB

I am working on a resource allocation problem in which I have 50000 resources in total and I want to distribute them over 6 modules. The objective is
f(i) = 1 - exp(-b(i)*w(i));  for i = 1 to 6
g(i) = 1 + 2*exp(-b(i)*w(i));  for i = 1 to 6
ff = (c1-c2)*a(i)*v(i)*f(i)/g(i) + c2*a(i) + c3*w(i);  for i = 1 to 6
where a(i), b(i), c1, c2, c3 and v(i) are known.
The constraints are
w(i) <= W, where i = 1 to 6,
w(i) >= 0, and
r(i) = (1 - exp(-b(i)*w(i))) / (1 + 2*exp(-b(i)*w(i))) >= 0.9, i.e. r(i) of each module is >= 0.9.
So I need w(i) for i = 1 to 6, where the total W is 50000.
Please, can anyone tell me how to do this using a genetic algorithm?
Thank you.
Having the evaluation function is not enough to determine what the genetic algorithm will look like, because even though the evaluation function is very important, it is not the only part of your problem. In order to fully evaluate your problem, one needs to know at least the following:
What are the restrictions on each module and on the total distribution? For instance, is there a capacity limit for each module?
What kind of resources are we dealing with? Are there dependencies between resources, either functional or temporal?
What kind of problem are we dealing with? Are all resources to be allocated, or do we intend to use them to perform a certain task (hence, we would need specific types of resources in each module)?
Assuming you need all the resources allocated, the simplest (and maybe dumbest) solution is to encode the module to which each resource is assigned into the chromosome, giving a 50,000-long string of 1's, 2's, 3's, ... and 6's. All the bit-string operators may be applied to this solution, with the applicable minor changes, of course.
Since working with a 50,000-character string would be tough, we need an alternative. If there are any numeric parameters on each resource, we can consider creating a representation made of group centers and clustering the resources according to their proximity to each center. Nevertheless, there would have to be a meaningful mapping of the resources into a multidimensional real space, and in order to create that, we need to know more about the resources themselves.
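A rough Python sketch (not MATLAB) of that simple string encoding, with the fitness function left out since it depends on the question's a(i), b(i), v(i) and c1..c3:

import random

N_RESOURCES, N_MODULES = 50000, 6

def random_chromosome():
    # a 50,000-long string of module ids, one per resource
    return [random.randint(1, N_MODULES) for _ in range(N_RESOURCES)]

def mutate(chrom, rate=0.0001):
    # point mutation: reassign a few resources to random modules
    for i in range(len(chrom)):
        if random.random() < rate:
            chrom[i] = random.randint(1, N_MODULES)

def crossover(a, b):
    cut = random.randrange(1, len(a))  # classic one-point crossover
    return a[:cut] + b[cut:]

def module_loads(chrom):
    # w(i): how many resources land in module i; the question's objective
    # ff and the r(i) >= 0.9 constraints would be evaluated on these
    loads = [0] * N_MODULES
    for m in chrom:
        loads[m - 1] += 1
    return loads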
