I have two big lists whose items do not have a constant length; each list contains millions of items.
I want to count the frequency of the items of the first list in the second list.
For example:
a = [[c, d], [a, b, e]]
b = [[a, d, c], [e, a, b], [a, d], [c, d, a]]
# expected result of calculate_frequency(a, b) is %{[c, d] => 2, [a, b, e] => 1} Or [{[c, d], 2}, {[a, b, e], 1}]
Due to the large size of the lists, I would like this process to be done concurrently.
So I wrote this function:
def calculate_frequency(items, data_list) do
  items
  |> Task.async_stream(
    fn item ->
      frequency =
        data_list
        |> Enum.reduce(0, fn data_row, acc ->
          if item -- data_row == [] do
            acc + 1
          else
            acc
          end
        end)

      {item, frequency}
    end,
    ordered: false
  )
  |> Enum.reduce([], fn {:ok, merged}, merged_list -> [merged | merged_list] end)
end
But this algorithm is slow. What should I do to make it fast?
PS: Please don't worry about the exact types of the inputs and outputs; only the speed of execution matters.
Not sure if this is fast enough, and it's certainly not concurrent. It's O(m + n), where m is the size of items and n is the size of data_list. I can't find a faster concurrent way, because combining the results of all the sub-processes also takes time.
data_list
|> Enum.reduce(%{}, fn(item, counts)->
Map.update(counts, item, 1, &(&1 + 1))
end)
|> Map.take(items)
FYI, doing things concurrently does not necessarily mean doing things in parallel. If you have only one CPU core, concurrency actually slows things down because one CPU core can only do one thing at a time.
Put one list into a MapSet.
Go through the second list and see whether or not each element is in the MapSet.
This is linear in the lengths of the lists, and both operations should be able to be parallelized.
I would start by normalizing the data you want to compare so a simple equality check can tell if two items are "equal" as you would define it. Based on your code, I would guess Enum.sort/1 would do the trick, though MapSet.new/1 or a function returning a map may compare faster if it matches your use case.
defp normalize(item) do
  Enum.sort(item)
end

def calculate_frequency(items, data_list) do
  data_list = Enum.map(data_list, &normalize/1)
  items = Enum.map(items, &normalize/1)
end
If you're going to get most frequencies from data list, I would then calculate all frequencies for data list. Elixir 1.10 introduced Enum.frequencies/1 and Enum.frequencies_by/2, but you could do this with a reduce if desired.
def calculate_frequency(items, data_list) do
  data_frequencies = Enum.frequencies_by(data_list, &normalize/1) # normalizes (maps) for you

  # if you want the result as a map keyed by the original items
  Map.new(items, fn item -> {item, Map.get(data_frequencies, normalize(item), 0)} end)
end
I haven't done any benchmarks on my code or yours. If you were looking to do more asynchronous stuff, you could replace your mapping with Task.async_stream/3, and you could replace your frequencies call with a combination of Stream.chunk_every/2, Task.async_stream/3 (with Enum.frequencies/1 being the function), and Map.merge/3.
Suppose I have two vectors:
let x = V.fromList ["foo", "bar", "baz"]
let y = V.fromList [1,3,2]
I want to define a vector y' which is the sorted version of y, but I also want to define a reordered x' which is ordered based on the sort ordering of y (x' should look like ["foo", "baz", "bar"]).
What's the best function to do that? Ideally, I want to avoid writing sorting functions from scratch.
I think you are looking for backpermute
backpermute :: Vector a -> Vector Int -> Vector a
O(n). Yield the vector obtained by replacing each element i of the index vector by xs!i. This is equivalent to map (xs!) is (xs being the data vector and is the index vector), but is often much more efficient.
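For example, with the vectors from the question (a sketch, assuming the vector package; building the index vector with Data.List.sortOn is just one convenient way to do it):
import Data.List (sortOn)
import qualified Data.Vector as V

main :: IO ()
main = do
  let x   = V.fromList ["foo", "bar", "baz"] :: V.Vector String
      y   = V.fromList [1, 3, 2]             :: V.Vector Int
      -- positions 0..n-1 reordered so that y's values come out sorted: [0,2,1]
      idx = V.fromList (sortOn (y V.!) [0 .. V.length y - 1])
  print (V.backpermute y idx)   -- [1,2,3]
  print (V.backpermute x idx)   -- ["foo","baz","bar"]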
Here's a list-based way:
> import Data.List
> let x = ["foo", "bar", "baz"]
> let y = [1,3,2]
> map snd . sort $ zip y x
["foo","baz","bar"]
Basically, we zip so to obtain a list of pairs
[(1,"foo"),(3,"bar"),(2,"baz")]
Then we sort it, lexicographically, so that the first component matters more.
Finally, we discard the first components.
You should be able to adapt this to vectors as well.
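For instance, a simple (if not maximally efficient) adaptation is to round-trip through lists; sortTogether is just a name for this sketch, and using sortOn fst means only y's values drive the ordering:
import Data.List (sortOn)
import qualified Data.Vector as V

-- Sort both vectors together, ordered by the values of the first one.
sortTogether :: Ord b => V.Vector b -> V.Vector a -> (V.Vector b, V.Vector a)
sortTogether y x =
  let (ys, xs) = unzip (sortOn fst (zip (V.toList y) (V.toList x)))
  in  (V.fromList ys, V.fromList xs)

-- > sortTogether (V.fromList [1,3,2]) (V.fromList ["foo","bar","baz"])
-- ([1,2,3],["foo","baz","bar"])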
Sort a vector of indices by comparing the indexed values, then permute both vectors based on the sorted indices. Data.Vector.Algorithms.Intro provides an introsort for mutable vectors, and modify provides safe destructive updates using the ST monad.
import Data.Ord (comparing)
import Data.Vector.Algorithms.Intro (sortBy)
import Data.Vector.Unboxed (generate, modify)
import Data.Vector (Vector, unsafeIndex, backpermute, convert, fromList)
import qualified Data.Vector as V
reorder :: (Ord b) => Vector a -> Vector b -> (Vector a, Vector b)
reorder a b = (backpermute a idx, backpermute b idx)
  where
    idx  = convert $ modify (sortBy comp) init
    comp = comparing $ unsafeIndex b   -- comparing function
    init = generate (V.length b) id    -- [0 .. size - 1]
then,
> reorder (fromList ["foo", "bar", "baz"]) $ fromList [1, 3, 2]
(["foo","baz","bar"],[1,2,3])
Which algorithm for generating the permutations of a list is predictable?
For example, I can look up the i-th permutation by its number
(Haskell code)
import Data.List (delete)

-- List of all possible permutations
permut [] = [[]]
permut xs = [x:ys | x <- xs, ys <- permut (delete x xs)]

-- In GHCi:
> permut "abc" !! 2
"bac"
but I don't know how to reverse it.
I want to do something like this:
> getNumOfPermut "abc" "bac"
2
Any reversible algorithm goes!
Thank you in advance!
Okay, I wanted to wait until you answered my question about what you had tried, but I had so much fun working out the answer that I just had to write it up and share it. Nerd sniping, I guess! I'm sure I'm not the first to have invented the algorithm below, but I hope you enjoy the presentation.
Our first step is to pin down a concrete implementation of permut. Our implementation strategy will be a simple one: choose some element of the list, choose some permutation of the remaining elements, and concatenate the two.
chooseFrom [] = []
chooseFrom (x:xs) = (x,xs) : [(y, x:ys) | (y, ys) <- chooseFrom xs]

permut [] = [[]]
permut xs = do
  (element, remaining) <- chooseFrom xs
  permutation <- permut remaining
  return (element:permutation)
If we run this on a sample list, it's pretty clear how it behaves:
> permut [0..3]
[[0,1,2,3],[0,1,3,2],[0,2,1,3],[0,2,3,1],[0,3,1,2],[0,3,2,1],[1,0,2,3],[1,0,3,2],[1,2,0,3],[1,2,3,0],[1,3,0,2],[1,3,2,0],[2,0,1,3],[2,0,3,1],[2,1,0,3],[2,1,3,0],[2,3,0,1],[2,3,1,0],[3,0,1,2],[3,0,2,1],[3,1,0,2],[3,1,2,0],[3,2,0,1],[3,2,1,0]]
The result has a lot of structure; for example, if we group by the first element of the contained lists, there are four groups, each containing 6 (which is 3!) elements:
> mapM_ print $ groupBy ((==) `on` head) it
[[0,1,2,3],[0,1,3,2],[0,2,1,3],[0,2,3,1],[0,3,1,2],[0,3,2,1]]
[[1,0,2,3],[1,0,3,2],[1,2,0,3],[1,2,3,0],[1,3,0,2],[1,3,2,0]]
[[2,0,1,3],[2,0,3,1],[2,1,0,3],[2,1,3,0],[2,3,0,1],[2,3,1,0]]
[[3,0,1,2],[3,0,2,1],[3,1,0,2],[3,1,2,0],[3,2,0,1],[3,2,1,0]]
So! The first digit of the list tells us "how many 6s to add". Additionally, each list in the above grouping exhibits similar structure: the lists in the first group fall into three groups of 2! elements, with 1, 2, and 3 as their second elements; the lists in each of those groups fall into 2 groups of 1! elements, each starting with one of the remaining digits; and each of those groups contains 1 group of 0! elements, starting with the only remaining digit. So the second digit tells us "how many 2s to add", the third digit tells us "how many 1s to add", and the last digit tells us "how many 0!s to add" (but, being the only digit left, it always tells us to add 0 of them).
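For example, take [2,0,3,1] from the listing above: the leading 2 skips two groups of 3! = 6 lists; the 0 is the first of the remaining digits {0,1,3}, adding nothing; the 3 is the second of the remaining digits {1,3}, adding one 1; and the final 1 has no choice left. That gives 2*6 + 0*2 + 1*1 + 0 = 13, and indeed [2,0,3,1] appears at index 13 in permut [0..3].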
If you have implemented a change-of-base function on numbers before (e.g. decimal to hexadecimal or similar) you may recognize this pattern. Indeed, we can treat this as a change-of-base operation with a sliding base: instead of 1s, 10s, 100s, 1000s, and so on columns, we have 0!s, 1!s, 2!s, 3!s, 4!s, and so on columns. Let's write it! For efficiency, we'll compute all the sliding bases up front with a factorials function.
import Data.List

factorials n = scanr (*) 1 [n,n-1..1]

deleteAt i xs = case splitAt i xs of (b, e) -> b ++ drop 1 e

permutIndices permutation original
  = go (factorials (length permutation - 1))
       permutation
       original
  where
    go _ [] [] = [0]
    go _ [] _  = []
    go _ _  [] = []
    go (base:bases) (x:xs) ys = do
      i <- elemIndices x ys
      remainder <- go bases xs (deleteAt i ys)
      return (i*base + remainder)
    go [] _ _ = error "the impossible happened!"
Here's a sample sanity-check:
> map (`permutIndices` [1..4]) (permut [1..4])
[[0],[1],[2],[3],[4],[5],[6],[7],[8],[9],[10],[11],[12],[13],[14],[15],[16],[17],[18],[19],[20],[21],[22],[23]]
And, for fun, here you can see it handling ambiguity correctly:
> permutIndices "acbba" "aabbc"
[21,23,45,47]
> map (permut "aabbc"!!) it
["acbba","acbba","acbba","acbba"]
...and showing that it's significantly more efficient than elemIndices:
> :set +s
> elemIndices "zyxwvutsr" (permut "rstuvwxyz")
[362879]
(2.65 secs, 1288004848 bytes)
> permutIndices "zyxwvutsr" "rstuvwxyz"
[362879]
(0.00 secs, 1030304 bytes)
Less than one thousandth the allocation/time. Seems like a win!
So, to be clear, you are looking for a way to find the position of a given permutation:
"bac"
in a list of given permutations:
["abc", "acb", "bac", ....]
This problem actually has nothing inherently to do with permutations themselves: you want to find the location of an element in a list.
As @raymonad mentioned in his comment, stackoverflow.com/questions/20641772/ deals with this question, and the answer there was to use elemIndex.
elemIndex thePermutationToFind $ permut theString
Keep in mind that if letters repeat, a value might appear more than once in the output if your permut function doesn't remove those duplicates (note that permut "aa" == ["aa", "aa"]). In that case the elemIndices function will come in useful.
If elemIndex returns Nothing, it means the string you supplied wasn't a permutation.
(This isn't the most efficient algorithm for large strings, since the number of permutations grows like the factorial of the length of the string, which is worse than exponential.)
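For example, reusing the permut from the question (getNumOfPermut is just an obvious wrapper around elemIndex, returning a zero-based index wrapped in Maybe):
import Data.List (delete, elemIndex)

permut :: Eq a => [a] -> [[a]]
permut [] = [[]]
permut xs = [x:ys | x <- xs, ys <- permut (delete x xs)]

getNumOfPermut :: Eq a => [a] -> [a] -> Maybe Int
getNumOfPermut xs p = elemIndex p (permut xs)

-- > getNumOfPermut "abc" "bac"
-- Just 2
-- > getNumOfPermut "abc" "xyz"
-- Nothing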
I just can't understand how this algorithm works. All the explanations I've seen say that if you have a set such as {A, B, C} and you want all the permutations, start with each letter distinctly, then find the permutations of the rest of the letters. So for example {A} + permutationsOf({B,C}).
But all the explanations seem to gloss over how you find the permutations of the rest. An example being this one.
Could someone try to explain this algorithm a little more clearly to me?
To understand recursion you need to understand recursion..
(c) Programmer's wisdom
Your question comes down to the fact that "permutations of the rest" is the recursive part. Recursion always consists of two parts: a trivial case and a recursive case. The trivial case is the point where the recursion cannot continue and something concrete must be returned.
In your example, the trivial case is {A}: there is only one permutation of this set, namely itself. The recursive case combines the current element with the permutations of the "rest part", i.e. if you have more than one element, the result is built by joining each element with every permutation of the rest. In terms of permutations, the rest part is the current set without the selected element. So for the set {A,B,C}, the first recursion step pairs {A} with the rest part {B,C}, then {B} with {A,C}, and finally {C} with {A,B}.
The recursion therefore continues until "the rest part" is a single element, and then it ends.
That is the whole point of a recursive implementation: you define the solution recursively, assuming you already have the solution to a simpler problem. With a little thought you will see that you can apply the very same reasoning to the simpler case, making it simpler still, and keep going until you reach a case that is simple enough to solve directly. This simplest case is known as the bottom (or base case) of the recursion.
Also note that you have to iterate over all the letters, not just A, as the first element. Thus you get all the permutations as:
{{A} + permutationsOf({B,C})} + {{B} + permutationsOf({A,C})} + {{C} + permutationsOf({A,B})}
Take a minute and try to write down all the permutations of a set of four letters, say {A, B, C, D}. You will find that the algorithm you use is close to the recursion above.
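For reference, here is that scheme written out in Haskell (just a sketch; note that Data.List also provides a ready-made permutations function, though it enumerates in a different order):
import Data.List (delete)

permutationsOf :: Eq a => [a] -> [[a]]
permutationsOf [] = [[]]                     -- one permutation of the empty set: the empty sequence
permutationsOf xs =
  [ x : rest                                 -- x is the chosen first element
  | x    <- xs                               -- try every element as the head
  , rest <- permutationsOf (delete x xs)     -- permute whatever is left
  ]

-- > permutationsOf "ABC"
-- ["ABC","ACB","BAC","BCA","CAB","CBA"]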
The answer to your question is in the halting-criterion (in this case !inputString.length).
http://jsfiddle.net/mzPpa/
function permutate(inputString, outputString) {
    if (!inputString.length) console.log(outputString);
    else for (var i = 0; i < inputString.length; ++i) {
        permutate(inputString.substring(0, i) +
                  inputString.substring(i + 1),
                  outputString + inputString[i]);
    }
}
var inputString = "abcd";
var outputString = "";
permutate(inputString, outputString);
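To see the halting criterion in action, trace a small call like permutate("ab", ""): it calls permutate("b", "a") and permutate("a", "b"); those in turn call permutate("", "ab") and permutate("", "ba"), where inputString is empty, so the recursion stops and "ab" and "ba" are printed.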
So, let's analyze the example {A, B, C}.
First, you want to take single element out of it, and get the rest. So you would need to write some function that would return a list of pairs:
pairs = [ (A, {B, C})
(B, {A, C})
(C, {A, B}) ]
for each of these pairs, you get a separate list of permutations that can be made out of it, like that:
for pair in pairs do
head <- pair.fst // e.g. for the first pair it will be A
tails <- perms(pair.snd) // e.g. tails will be a list of permutations computed from {B, C}
You need to attach the head to each tail from tails to get a complete permutation. So the complete loop will be:
permutations <- []
for pair in pairs do
    head <- pair.fst          // e.g. for the first pair it will be A
    tails <- perms(pair.snd)  // e.g. tails will be a list of permutations computed from {B, C}
    for tail in tails do
        permutations.add(head :: tail);   // here we create a complete permutation
head :: tail means that we attach one element head to the beginning of the list tail.
Well now, how do we implement the perms function used in the fragment tails <- perms(pair.snd)? We just did! That's what recursion is all about. :)
We still need a base case, so:
perms({X}) = [ {X} ] // return a list of one possible permutation
And the function for all other cases looks like that:
perms({X...}) =
    permutations <- []
    pairs <- createPairs({X...})
    for pair in pairs do
        head <- pair.fst          // e.g. for the first pair it will be A
        tails <- perms(pair.snd)  // e.g. tails will be a list of permutations computed from {B, C}
        for tail in tails do
            permutations.add( head :: tail );   // here we create a complete permutation
    return permutations
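Transcribed into Haskell, the pseudocode above might look like this (a sketch; createPairs and perms mirror the names used above, with an extra clause added for the empty list so the function is total):
createPairs :: [a] -> [(a, [a])]
createPairs []     = []
createPairs (x:xs) = (x, xs) : [ (y, x:ys) | (y, ys) <- createPairs xs ]

perms :: [a] -> [[a]]
perms []  = [[]]    -- extra clause so the function is total
perms [x] = [[x]]   -- the base case from above
perms xs  = [ h : t | (h, rest) <- createPairs xs, t <- perms rest ]

-- > perms "ABC"
-- ["ABC","ACB","BAC","BCA","CAB","CBA"]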
I am trying to solve the maximum subarray problem with a brute force approach, i.e. by generating all the possible subarrays. I have something that works, but it's not satisfying at all because it produces way too many duplicate subarrays.
Does anyone know a smart way to generate all the subarrays (as a [[]]) with a minimal number of duplicates?
By the way, I'm new to Haskell. Here's my current solution:
import qualified Data.List as L
maximumSubList :: [Integer] -> [Integer]
maximumSubList x = head $ L.sortBy (\a b -> compare (sum b) (sum a)) $ L.nub $ slice x
  where
    -- slice will return all the "sub lists"
    slice [] = []
    slice x = (slice $ tail x) ++ (sliceLeft x) ++ (sliceRight x)
    -- Create sub lists by removing the "left" part
    -- e.g. [1,2,3] -> [[1,2,3],[2,3],[3]]
    sliceRight [] = []
    sliceRight x = x : (sliceRight $ tail x)
    -- Create sub lists by removing the "right" part
    -- e.g. [1,2,3] -> [[1,2,3],[1,2],[1]]
    sliceLeft [] = []
    sliceLeft x = x : (sliceLeft $ init x)
There are many useful functions for operating on lists in the standard Data.List module.
import Data.List
slice :: [a] -> [[a]]
slice = filter (not . null) . concatMap tails . inits
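For example:
> slice [1,2,3]
[[1],[1,2],[2],[1,2,3],[2,3],[3]]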
dave4420's answer is how to do what you want to do using smart, concise Haskell. I'm no Haskell expert, but I occasionally play around with it and find solving a problem like this to be an interesting distraction, and enjoy figuring out exactly why it works. Hopefully the following explanation will be helpful :)
The key property of dave4420's answer (which your answer doesn't have) is that the pair (startPos, endPos) is unique for each subarray it generates. Now, observe that two subarrays are distinct if either their startPos or their endPos is different. Applying inits to the original array returns a list of prefixes that each have a unique endPos and all share the same startPos (the start of the array). Applying tails to each of these prefixes in turn produces another list of subarrays -- one list of subarrays is output per input prefix. Notice that tails does not disturb the distinctness between input prefixes, because the subarrays output by invoking tails on a single prefix all retain that prefix's endPos: that is, if you have two prefixes with distinct endPoses and put both of them through tails, each of the subarrays produced from the first prefix will be distinct from each of the subarrays produced from the second one.
Additionally, the subarrays produced by invoking tails on a single prefix are distinct from each other because, although they all share the same endPos, they all have distinct startPoses. Therefore all subarrays produced by (concatMap tails) . inits are distinct. It only remains to note that no subarray is missed out: any subarray starting at position i and ending at position j appears as the ith list produced by applying tails to the jth nonempty list produced by inits. So in conclusion, every possible subarray appears exactly once!
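To make that concrete, here are the intermediate steps for [1,2,3]: inits fixes a distinct end position per prefix, and tails then varies the start position within each prefix (the empty lists are exactly what the filter removes):
> inits [1,2,3]
[[],[1],[1,2],[1,2,3]]
> map tails (inits [1,2,3])
[[[]],[[1],[]],[[1,2],[2],[]],[[1,2,3],[2,3],[3],[]]]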