I have this code to find all the permutations of a set of letters that form legal words.
<< Combinatorica`
Module[{str = "abc", chars, len, r, check},
chars = Characters[str];
len = StringLength[str];
r = Range[len];
check[n_Integer] :=
DictionaryLookup[{"BritishEnglish",
StringJoin[chars[[UnrankPermutation[n, r]]]]}, 1];
DistributeDefinitions[check, chars, r];
ParallelTable[check[i], {i, 1, len!}]]
I've verified that, if I replace ParallelTable with Table, I get this:
{{}, {}, {}, {"cab"}, {}, {}}
With ParallelTable, however, in addition to that result, I also get warnings like these:
Part::pspec: Part specification Combinatorica`UnrankPermutation[1,{1,2,3}] is neither a machine-sized integer nor a list of machine-sized integers.
Part::pspec: Part specification Combinatorica`UnrankPermutation[2,{1,2,3}] is neither a machine-sized integer nor a list of machine-sized integers.
StringJoin::string: String expected at position 1 in StringJoin[{a,b,c}[[Combinatorica`UnrankPermutation[1,{1,2,3}]]]].
StringJoin::string: String expected at position 1 in StringJoin[{a,b,c}[[Combinatorica`UnrankPermutation[2,{1,2,3}]]]].
These warnings seem to come from kernels 7 and higher. My guess is that the computation reaches those kernels after all six permutations have already been handed out, so there is no data left for them, and that this is what causes them to spit out those warnings.
Is my understanding correct? How do I prevent these warnings?
I don't think that's it - if that were the case, this simple test would fail too:
ParallelTable[k^2,{k,3}] (* Assuming more than 3 kernels *)
... which runs just fine.
Rather, it seems to me that UnrankPermutation[] behaves badly under ParallelTable, as you can see by running this simplified version (which also fails):
ParallelTable[Part[chars, UnrankPermutation[n, r]], {n, len!}]
I am not sure that the brute force approach you are taking with this is a good one (consider what happens when word lengths exceed 10 characters), but a work-around following that idea is this:
list = LexicographicPermutations[chars]
ParallelMap[DictionaryLookup[{"BritishEnglish", #}] &,
Map[StringJoin[#] &, list]]
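For readers outside Mathematica, a rough Python sketch of the same generate-and-check idea looks like the following; the small word set here is only a stand-in for a real dictionary lookup.

from itertools import permutations

# Stand-in word list; in practice you would load a real dictionary here.
words = {"cab", "act", "cat", "tab"}
letters = "abc"

found = {"".join(p) for p in permutations(letters) if "".join(p) in words}
print(found)  # {'cab'}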
Good luck!
I need to write a function that maps each of the 26 lowercase alphabetic characters to a single-digit number in [0, 9]. I need this function to generate all possible combinations (10^26 of them). I know of a way to write this with recursion, but I would like to do it without recursion, and I'm not sure what the algorithm should look like.
Could someone provide an algorithm for this without recursion?
Sure, here is some code that prints the representation of each function:
for idx in range(10 ** 26):
    print(f"function {idx}")
    for letter in range(ord('a'), ord('z') + 1):
        # f(letter) is the next base-10 digit of idx, a value in [0, 9]
        print(f"\tf({chr(letter)}) = {idx % 10}")
        idx //= 10
You'll notice that this code will take a very, very long time to run. There are 10 options for what 'a' maps to, 10 options for what 'b' maps to, and so on, so there are 10^26 possible mappings. A computer generally does only around 10^9 operations per second, so doing anything with all of these mappings is not tractable.
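If you just want a non-recursive enumeration without managing the index arithmetic by hand, the same idea can be sketched with itertools.product. This is only an illustration over a reduced alphabet so that it actually terminates; the variable names are mine, not from the question.

from itertools import product
from string import ascii_lowercase

# Reduced alphabet so the enumeration actually finishes; the full
# 26-letter version has 10**26 mappings and is not practical to run.
letters = ascii_lowercase[:3]

for idx, digit_tuple in enumerate(product(range(10), repeat=len(letters))):
    mapping = dict(zip(letters, digit_tuple))  # e.g. {'a': 0, 'b': 0, 'c': 1}
    print(f"function {idx}: {mapping}")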
My question is twofold:
1) As far as I understand, constructs like for loops introduce scope blocks; however, I'm having some trouble with a variable that is defined outside of said construct. The following code is an attempt to extract the digits from a number and place them in an array.
n = 654068
l = length(n)
a = Int64[]
for i in 1:(l-1)
    temp = n/10^(l-i)
    if temp < 1 # ith digit is 0
        a = push!(a,0)
    else # ith digit is != 0
        push!(a,floor(temp))
        # update n
        n = n - a[i]*10^(l-i)
    end
end
# last digit
push!(a,n)
The code executes fine, but when I look at the a array I get this result
julia> a
0-element Array{Int64,1}
I thought that anything that goes on inside the for loop is invisible to the outside, unless I'm operating on variables defined outside the for loop. Moreover, I thought that by using the ! syntax I would operate directly on a, but this does not seem to be the case. I would be grateful if anyone could explain to me how this works :)
2) My second question is about the syntax used when explaining functions. There is apparently a function called digits that extracts digits from a number and puts them in an array; using the help function I get
julia> help(digits)
Base.digits(n[, base][, pad])
Returns an array of the digits of "n" in the given base,
optionally padded with zeros to a specified size. More significant
digits are at higher indexes, such that "n ==
sum([digits[k]*base^(k-1) for k=1:length(digits)])".
Can anyone explain to me how to interpret the information given about functions in Julia? How am I to interpret digits(n[, base][, pad])? How does one correctly call the digits function? It can't be like this: digits(40125[, 10]), can it?
I'm unable to reproduce your result; running your code gives me
julia> a
1-element Array{Int64,1}:
654068
There are a few mistakes and inefficiencies in the code:
length(n) doesn't give the number of digits in n, but always returns 1 (currently, numbers are iterable and return a sequence containing only one element: the number itself). So the for loop is never run.
/ between integers does floating-point division. For extracting digits, you're better off with div(x,y), which does integer division.
There's no reason to write a = push!(a,x), since push! modifies a in place. So it is equivalent to writing push!(a,x); a = a.
There's no reason to treat digits that are zero specially; they are handled just fine by the general case.
Your description of scoping in Julia seems to be correct; I think it is the issues above that are giving you trouble.
You could use something like
n = 654068
a = Int64[]
while n != 0
    push!(a, n % 10)
    n = div(n, 10)
end
reverse!(a)
This loop extracts the digits in opposite order to avoid having to figure out the number of digits in advance, and uses the modulus operator % to extract the least significant digit. It then uses reverse! to get them in the order you wanted, which should be pretty efficient.
About the documentation for digits: [, base] just means that base is an optional parameter, so a correct call looks like digits(654068, 10), or just digits(654068) to use the default base of 10. The description should probably be digits(n[, base[, pad]]), since it's not possible to specify pad unless you also specify base. Also note that digits returns the least significant digit first, which is what we get if we remove the reverse! from the code above.
Is this cheating?:
n = 654068
nstr = string(n)
a = map((x) -> x |> string |> int , collect(nstr))
outputs:
6-element Array{Int64,1}:
6
5
4
0
6
8
My question is in reference to the FB HackerCup 2013 Qualification Round problem, Balanced Smileys.
Problem Statement: https://www.facebook.com/hackercup/problems.php?pid=403525256396727&round=185564241586420 (copy here too: https://github.com/anuragkapur/Algorithmic-Programming/tree/master/src/com/anuragkapur/fb/hackercup2013/qr#problem-2-balanced-smileys)
I figured out how to solve this problem using a brute-force method, which has exponential running time.
As per the official solutions posted, there is a solution with linear running time. It is described here: https://www.facebook.com/notes/facebook-hacker-cup/qualification-round-solutions/598486173500621
Essentially, it maintains two counters: minOpen and maxOpen. Whenever an open parenthesis "(" is encountered, maxOpen is incremented. If the "(" was NOT part of a smiley, minOpen is also incremented. A similar strategy handles ")", as described in the explanation linked above.
I can see that the linear-time method works, but it is not crystal clear in my head how. So I am polling this group to see if anyone can give an alternate explanation of the linear-time solution.
Many thanks!
Preprocessing: tokenize the input and convert each token to a list containing the possible effects on the open parenthesis count.
The inputs
i am sick today (:()
:(:))
become
[[0], [0], ..., [0], [1], [0, 1], [-1]]
[[0, 1], [-1, 0], [-1]]
Now, your brute force algorithm is something like this:
def solution1(lst, opencnt=0, i=0):
    if opencnt < 0:
        return False
    elif i >= len(lst):
        return opencnt == 0
    else:
        for delta in lst[i]:
            if solution1(lst, opencnt + delta, i + 1):
                return True
        return False
The function solution1 always gives the same output for a given input. For a given lst with entries in {-1, 0, 1}, there are only a linear number of possibilities for each of opencnt (-1 to len(lst)) and i (0 to len(lst) - 1), so by caching the output for a given input (that is, by memoizing) we get a quadratic-time algorithm.
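For concreteness, one way the memoized variant might look (solution1_memo and go are just illustrative names; the cache is keyed on (opencnt, i), since lst is fixed for the whole search):

from functools import lru_cache

def solution1_memo(lst):
    # Memoized version of solution1: each (opencnt, i) pair is computed at
    # most once, giving a quadratic-time algorithm overall.
    @lru_cache(maxsize=None)
    def go(opencnt, i):
        if opencnt < 0:
            return False
        if i >= len(lst):
            return opencnt == 0
        return any(go(opencnt + delta, i + 1) for delta in lst[i])

    return go(0, 0)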
The linear-time algorithm turns the control flow inside out. Instead of making a separate recursive call for each delta, we make opencnt a set.
def solution2(lst, opencnt={0}, i=0):
    opencnt = {x for x in opencnt if x >= 0}
    if i >= len(lst):
        return 0 in opencnt
    else:
        return solution2(lst, {x + delta for x in opencnt for delta in lst[i]}, i + 1)
This implementation isn't linear time yet. The final optimization is that opencnt is always an interval, i.e., [minOpen, maxOpen], and can be manipulated in constant time.
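A sketch of that final step (solution3 is just a name for illustration; it consumes the same token lists as above):

def solution3(lst):
    # Track only the endpoints [min_open, max_open] of the interval of
    # reachable open-parenthesis counts; each token shifts both endpoints.
    min_open, max_open = 0, 0
    for deltas in lst:
        min_open += min(deltas)
        max_open += max(deltas)
        if max_open < 0:             # every interpretation went negative
            return False
        min_open = max(min_open, 0)  # clamp, mirroring the x >= 0 filter
    return min_open == 0             # 0 is still in the interval at the end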
If it weren't for the colon in rule #2, the obvious algorithm would be a finite state machine with two states, which loops over the string and maintains a parentheses count:
pcount = 0
for each character:
    if it is ':': discard next character
    if it is '(': pcount++
    if it is ')': pcount-- if pcount > 0, otherwise immediately return "NO"
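A runnable Python rendering of that pseudocode might look like this (the function name is mine, and the final pcount == 0 check is implied rather than written out above):

def balanced_without_ambiguity(message):
    # Treat every character after ':' as part of an emoticon and count the
    # remaining parentheses literally.
    pcount, i = 0, 0
    while i < len(message):
        c = message[i]
        if c == ':':
            i += 2              # discard the next character
            continue
        if c == '(':
            pcount += 1
        elif c == ')':
            if pcount == 0:     # count would go negative: immediately "NO"
                return "NO"
            pcount -= 1
        i += 1
    return "YES" if pcount == 0 else "NO"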
The fact that rule #2 allows colons makes it a bit more challenging, since messages of the form "(:)" or, say, "( foo :)" may be regarded as having balanced parentheses as well. What this rule essentially says is that "you find it hard to tell if a parenthesis really is a parenthesis or part of an emoticon", and you shall determine whether "there is a way to interpret his message while leaving the parentheses balanced".
To put it more clearly: we don't actually care about the emoticons at all. We only have to find out whether the parentheses can be balanced.
The naive approach is to maintain an array of parentheses counters. Initially, it contains only one parentheses counter. Each time you encounter '(' or ')', you append the current parentheses counter array to itself, in-/decrementing the first half as suggested in the simple algorithm above. Once you reach the end of the string, you check whether the array contains zeros. If it does, there's a way for the string to be regarded as having balanced parentheses. Otherwise, there's not.
There might be room for improvements to this 2nd algorithm, and there may even be a jaw-droppingly elegant solution that won't run out of memory if the message consists of 1000 parentheses. However, given enough RAM, this naive approach will determine the correct answer in linear time (and, sadly, exponential space).
Earlier today I asked if there's an idiomatic way to count the elements matching a predicate function in Mathematica, as I was concerned with performance.
My initial approach for a given predicate pred was the following:
PredCount1[lst_, pred_] := Length@Select[lst, pred];
and I got a suggestion to instead use
PredCount2[lst_, pred_] := Count[lst, x_ /; pred@x];
I started profiling these functions, with different lst sizes and pred functions, and added two more definitions:
PredCount3[lst_, pred_] := Count[Thread@pred@lst, True];
PredCount4[lst_, pred_] := Total[If[pred@#, 1, 0] & /@ lst];
My data samples were ranges between 1 and 10 million elements, and my test functions were EvenQ, #<5& and PrimeQ. The following graphs show the time taken.
EvenQ
PredCount2 is slowest, 3 and 4 duke it out.
Comparison predicate: #<5&
I've selected this function because it's close to what I need in my actual problem. Don't worry that this is a silly test function; it actually proves that the 4th function has some merit, and I ended up using it in my solution.
Same as EvenQ, but 3 is clearly slower than 4.
PrimeQ
This is just bizarre. Everything is flipped. I don't suspect caching as the culprit here, since the worst values are for the function computed last.
So, what's the right (fastest) way to count the number of elements in a list, that match a given predicate function?
You are seeing the result of auto-compilation.
First, for a Listable function such as EvenQ or PrimeQ, the use of Thread is unnecessary:
EvenQ[{1, 2, 3}]
{False, True, False}
This also explains why PredCount3 performs well on these functions. (They are internally optimized for threading over a list.)
Now let us look at timings.
dat = RandomInteger[1*^6, 1*^6];
test = # < 5 &;
First@Timing[#[dat, test]] & /@ {PredCount1, PredCount2, PredCount3, PredCount4}
{0.343, 0.437, 0.25, 0.047}
If we change a System Option to prevent auto-compilation within Map and run the test again:
SetSystemOptions["CompileOptions" -> {"MapCompileLength" -> Infinity}]
First@Timing[#[dat, test]] & /@ {PredCount1, PredCount2, PredCount3, PredCount4}
{0.343, 0.452, 0.234, 0.765}
You can clearly see that without compilation PredCount4 is much slower. In short, if your test function can be compiled by Mathematica this is a good option.
Here are some other examples of fast counting using numeric functions.
The nature of the integers in the list can have a significant effect on the achievable timings. The use of Tally can improve performance if the range of the integers is constrained.
(* Count items in the list matching predicate, pred *)
PredCountID[lst_, pred_] :=
Select[Tally@lst, pred@First@# &]\[Transpose] // Last // Total
(* Define the values over which to check timings *)
ranges = {100, 1000, 10000, 100000, 1000000};
sizes = {100, 1000, 10000, 100000, 1000000, 10000000,100000000};
For PrimeQ this function gives the following timings:
Showing that even in a 10^8-element list, primes can be counted in less than a tenth of a second if they are drawn from the set of integers {0, ..., 100000}, and below the resolution of Timing if they are within a small range such as 1 to 100.
Because the predicate only has to be applied over the set of Tally values, this approach is relatively insensitive to the exact predicate function.
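The tally-first idea is not Mathematica-specific; a rough Python sketch (pred_count_tally is a made-up name) applies the predicate once per distinct value and weights it by how often that value occurs:

from collections import Counter
import random

def pred_count_tally(lst, pred):
    # Apply the predicate once per distinct value, then weight by frequency.
    return sum(count for value, count in Counter(lst).items() if pred(value))

# Example: count even numbers in a list drawn from a small range of integers.
data = [random.randrange(100) for _ in range(10**6)]
print(pred_count_tally(data, lambda x: x % 2 == 0))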
I have not used PackedArray before, but I just started looking at using them after reading some discussion of them here today.
What I have is lots of large 1D and 2D matrices, all reals with nothing symbolic (it is a finite difference PDE solver), so I thought that I should take advantage of PackedArray.
I have an initialization function where I allocate all the data/grids needed. So I went and used ToPackedArray on them. It seems a bit faster, but I need to do more performance testing to better compare speed before and after and also compare RAM usage.
But while I was looking at this, I noticed that some operations in M automatically return lists in PackedArray already, and some do not.
For example, this does not return a packed array
a = Table[RandomReal[], {5}, {5}];
Developer`PackedArrayQ[a]
But this does
a = RandomReal[1, {5, 5}];
Developer`PackedArrayQ[a]
and this does
a = Table[0, {5}, {5}];
b = ListConvolve[ {{0, 1, 0}, {1, 4, 1}, {0, 1, 1}}, a, 1];
Developer`PackedArrayQ[b]
and matrix multiplication also returns its result as a packed array
a = Table[0, {5}, {5}];
b = a.a;
Developer`PackedArrayQ[b]
But element-wise multiplication does not
b = a*a;
Developer`PackedArrayQ[b]
My question: is there a list somewhere that documents which M commands return a packed array and which do not (assuming the data meets the requirements, such as being all Real, not mixed, nothing symbolic, etc.)?
Also, a minor question: do you think it is better to check first whether a list/matrix is already packed before calling ToPackedArray on it? I would think that calling ToPackedArray on an already packed list costs nothing, as the call should return right away.
thanks,
update (1)
Just wanted to mention that I found that PackedArray symbols are not allowed in a demo CDF, as I got an error uploading one that used them. So I had to remove all my packing code. Since I mainly write demos, this topic is now of only academic interest to me, but I wanted to thank everyone for their time and the good answers.
There isn't a comprehensive list. To point out a few things:
Basic operations with packed arrays will tend to remain packed:
In[66]:= a = RandomReal[1, {5, 5}];
In[67]:= Developer`PackedArrayQ /@ {a, a.a, a*a}
Out[67]= {True, True, True}
Note above that my version (8.0.4) doesn't unpack for element-wise multiplication.
Whether a Table will result in a packed array depends on the number of elements:
In[71]:= Developer`PackedArrayQ[Table[RandomReal[], {24}, {10}]]
Out[71]= False
In[72]:= Developer`PackedArrayQ[Table[RandomReal[], {24}, {11}]]
Out[72]= True
In[73]:= Developer`PackedArrayQ[Table[RandomReal[], {25}, {10}]]
Out[73]= True
On["Packing"] will turn on messages to let you know when things unpack:
In[77]:= On["Packing"]
In[78]:= a = RandomReal[1, 10];
In[79]:= Developer`PackedArrayQ[a]
Out[79]= True
In[80]:= a[[1]] = 0 (* force unpacking due to type mismatch *)
Developer`FromPackedArray::punpack1: Unpacking array with dimensions {10}. >>
Out[80]= 0
Operations that do per-element inspection will usually unpack the array,
In[81]:= a = RandomReal[1, 10];
In[82]:= Position[a, Max[a]]
Developer`FromPackedArray::unpack: Unpacking array in call to Position. >>
Out[82]= {{4}}
The penalty for calling ToPackedArray on an already packed list is small enough that I wouldn't worry about it too much:
In[90]:= a = RandomReal[1, 10^7];
In[91]:= Timing[Do[Identity[a], {10^5}];]
Out[91]= {0.028089, Null}
In[92]:= Timing[Do[Developer`ToPackedArray[a], {10^5}];]
Out[92]= {0.043788, Null}
The frontend prefers packed to unpacked arrays, which can show up when dealing with Dynamic and Manipulate:
In[97]:= Developer`PackedArrayQ[{1}]
Out[97]= False
In[98]:= Dynamic[Developer`PackedArrayQ[{1}]]
Out[98]= True
When looking into performance, focus on cases where large lists are getting unpacked rather than small ones, unless the small ones are in big loops.
This is just an addendum to Brett's answer:
SystemOptions["CompileOptions"]
will give you the lengths being used for which a function will return a packed array. So if you did need to pack a small list, as an alternative to using Developer`ToPackedArray you could temporarily set a smaller number for one of the compile options. e.g.
SetSystemOptions["CompileOptions" -> {"TableCompileLength" -> 20}]
Note also some differences between functions which, to me at least, don't seem intuitive, so I generally have to test these kinds of things whenever I use them rather than instinctively knowing what will work best:
f = # + 1 &;
g[x_] := x + 1;
data = RandomReal[1, 10^6];
On["Packing"]
Timing[Developer`PackedArrayQ[f /@ data]]
{0.131565, True}
Timing[Developer`PackedArrayQ[g /@ data]]
Developer`FromPackedArray::punpack1: Unpacking array with dimensions {1000000}.
{1.95083, False}
Another addition to Brett's answer: if a list is already a packed array, then ToPackedArray is very fast, since this is checked quite early. Also you might find this valuable:
http://library.wolfram.com/infocenter/Articles/3141/
In general for numerics stuff look for talks from Rob Knapp and/or Mark Sofroniou.
When I develop numerics codes, I write the function and then use On["Packing"] to make sure that everything is packed that needs to be packed.
Concerning Mike's answer, the threshold was introduced because for small arrays there is overhead. Where the threshold lies is hardware dependent. It might be an idea to write a function that sets these thresholds based on measurements done on the computer.