I would like to generate the partitions of a set in a specific way: I need to filter out all partitions which are not of size N while the partitions are being generated. A general solution is described in "Generate all “unique” subsets of a set (not a powerset)".
For the set S with the following subsets:
[a,b,c]
[a,b]
[c]
[d,e,f]
[d,f]
[e]
and the following 'unique' elements:
a, b, c, d, e, f
the result of the function/method running with the argument N = 2 should be:
[[a,b,c], [d,e,f]]
While the following partitions should be filtered out by the function/method:
[[a,b,c], [d,f], [e]]
[[a,b], [c], [d,e,f]]
[[a,b], [c], [d,f], [e]]
The underlying data structure is not important and could be arrays, sets or whatever.
Reason: I need to filter some partitions out before I have the full set of all partitions, because the function/method which generates all partitions is rather computationally intensive.
According to "Generating the Partitions of a Set", the number of possible partitions can be huge: 44152005855084346 for 23 elements. My data is 50-300 elements in the starting set, so I definitely need to filter out partitions whose size is not N before I save them anywhere.
Once you have the partitions from Frederick Cheung's answer that you linked, do:
partitions.select{|partition| partition.length == 2}
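If you would rather never materialize the oversized partitions at all, the same length check can be pushed into the generator itself. Here is a minimal sketch in Python (this is not Frederick Cheung's code; partitions_into_n and its parameter names are mine) that assembles partitions from your candidate subsets and abandons any branch as soon as it uses more than N parts:

def partitions_into_n(candidates, universe, n):
    # candidates: the allowed subsets; universe: the 'unique' elements; n: target size
    cands = [frozenset(c) for c in candidates]
    universe = frozenset(universe)

    def go(covered, chosen, start):
        if len(chosen) > n:
            return  # prune: this branch already has more than n parts
        if covered == universe:
            if len(chosen) == n:
                yield [sorted(c) for c in chosen]
            return
        for k in range(start, len(cands)):
            c = cands[k]
            if not (covered & c):  # only disjoint subsets can extend a partition
                yield from go(covered | c, chosen + [c], k + 1)

    yield from go(frozenset(), [], 0)

With the subsets above, list(partitions_into_n([['a','b','c'], ['a','b'], ['c'], ['d','e','f'], ['d','f'], ['e']], 'abcdef', 2)) yields only [['a','b','c'], ['d','e','f']]; the three oversized partitions are cut off before they are ever completed.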
The goal is to create pairs/triplets/quartets from short lists, because these lists occur in a list of lists that I flatten. Since I want these elements to stay connected, I need a way to flatten the outer list without losing the connection between the items in these particular inner lists.
In short, [a, b, c] needs to be converted to a-b-c. In theory long lists need to be handled too, but in reality only short lists will be relevant.
What I tried so far (which I know is horribly wrong):
create_pair([], Pair).
create_pair([H, H1|T], Pair) :-
    NPair = H-H1,
    create_pair(T, NPair).
This is just for the case where the list has 2 elements.
You can build your pair/triplet/quartet/... by joining the first two items of the list and replacing them with your connection term, until the whole list is processed:
create_ntets([H], H).
create_ntets([H,H1|T], NTet) :-
    create_ntets([H-H1|T], NTet).
This procedure assumes there is no 0-tet.
Sample runs
?- create_ntets([a,b,c], Triplet).
Triplet = a-b-c
?- create_ntets([a,b,c,d], Quartet).
Quartet = a-b-c-d
If the data structure you want to convert the short lists to doesn't really matter, you can just use =../2 to convert the list to a term. Something like:
list_term(L,T) :- T =.. [ listlet | L ].
So evaluating list_term([a,b,c], T) binds T to listlet(a,b,c), and evaluating list_term(L, listlet(a,b,c,d)) binds L to [a,b,c,d].
See https://swish.swi-prolog.org/p/list-to-term.pl for a runnable playground.
I have two big lists whose items' lengths aren't constant. Each list includes millions of items.
I want to count the frequency of the first list's items in the second list.
For example:
a = [[c, d], [a, b, e]]
b = [[a, d, c], [e, a, b], [a, d], [c, d, a]]
# expected result of calculate_frequency(a, b): %{[c, d] => 2, [a, b, e] => 1} or [{[c, d], 2}, {[a, b, e], 1}]
Due to the large size of the lists, I would like this process to be done concurrently.
So I wrote this function:
def calculate_frequency(items, data_list) do
  items
  |> Task.async_stream(
    fn item ->
      frequency =
        data_list
        |> Enum.reduce(0, fn data_row, acc ->
          if item -- data_row == [] do
            acc + 1
          else
            acc
          end
        end)

      {item, frequency}
    end,
    ordered: false
  )
  |> Enum.reduce([], fn {:ok, merged}, merged_list -> [merged | merged_list] end)
end
But this algorithm is slow. What should I do to make it fast?
PS: Please do not worry about the types of the inputs and outputs; the speed of execution is what matters.
Not sure if this is fast enough, and it's certainly not concurrent. It's O(m + n), where m is the size of items and n is the size of data_list. I can't find a faster concurrent way, because combining the results of all the sub-processes also takes time.
data_list
|> Enum.reduce(%{}, fn item, counts ->
  Map.update(counts, item, 1, &(&1 + 1))
end)
|> Map.take(items)
FYI, doing things concurrently does not necessarily mean doing things in parallel. If you have only one CPU core, concurrency can actually slow things down, because one CPU core can only do one thing at a time.
Put one list into a MapSet.
Go through the second list and see whether or not each element is in the MapSet.
This is linear in the lengths of the lists, and both operations should be able to be parallelized.
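For illustration, here are the same two steps in Python (a sketch only: the built-in set stands in for MapSet, and sorting each row into a tuple is an assumed normalization so that rows compare equal regardless of element order):

from collections import Counter

def calculate_frequency(items, data_list):
    # Step 1: put one list into a set, with order-normalized, hashable keys.
    wanted = {tuple(sorted(item)) for item in items}
    # Step 2: one linear pass over the second list, counting membership hits.
    counts = Counter()
    for row in data_list:
        key = tuple(sorted(row))
        if key in wanted:
            counts[key] += 1
    return dict(counts)

Note that, like the reduce/Map.take answer above, this counts exact matches after sorting, not the subset matches of the original item -- data_row check.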
I would start by normalizing the data you want to compare so a simple equality check can tell if two items are "equal" as you would define it. Based on your code, I would guess Enum.sort/1 would do the trick, though MapSet.new/1 or a function returning a map may compare faster if it matches your use case.
defp normalize(item) do
  Enum.sort(item)
end
def calculate_frequency(items, data_list) do
  data_list = Enum.map(data_list, &normalize/1)
  items = Enum.map(items, &normalize/1)
  # ... then count using the normalized lists, as below
end
If you're going to need the frequency of most of the data list's entries, I would then calculate all frequencies for the data list up front. Elixir 1.10 introduced Enum.frequencies/1 and Enum.frequencies_by/2, but you could do this with a reduce if desired.
def calculate_frequency(items, data_list) do
  data_frequencies = Enum.frequencies_by(data_list, &normalize/1) # builds the map for you
  # Map.new/2 expects {key, value} tuples, so pair each item with its count
  Map.new(items, &{&1, Map.get(data_frequencies, normalize(&1), 0)}) # if you want the result as a map
end
I haven't done any benchmarks on my code or yours. If you were looking to do more asynchronous stuff, you could replace your mapping with Task.async_stream/3, and you could replace your frequencies call with a combination of Stream.chunk_every/2, Task.async_stream/3 (with Enum.frequencies/1 being the function), and Map.merge/3.
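As a rough illustration of that chunk/count/merge pipeline, here is a Python sketch (multiprocessing.Pool plays the role of Task.async_stream/3 and Counter.update the role of Map.merge/3; parallel_frequencies, count_chunk, and the chunk size are my own choices):

from collections import Counter
from multiprocessing import Pool

def count_chunk(chunk):
    # Local frequencies for one chunk, with rows normalized by sorting.
    return Counter(tuple(sorted(row)) for row in chunk)

def parallel_frequencies(data_list, chunk_size=10_000):
    chunks = [data_list[i:i + chunk_size]
              for i in range(0, len(data_list), chunk_size)]
    with Pool() as pool:
        partials = pool.map(count_chunk, chunks)  # one task per chunk
    total = Counter()
    for partial in partials:  # the merge step
        total.update(partial)
    return total

As noted in the earlier answer, whether this beats a single-pass count depends on how expensive the merge step is relative to the counting itself.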
I have a list of lists.
L1 = [[...], [...], [...], ...]
If I flatten the list, take all the elements, and extract the unique values, I get a list L2.
I have another list L3 which is some subset of L2.
I want to find the pair-wise mutual occurrences of the elements of L3 in L1. The relation is non-directed, i.e. a,b is the same as b,a.
e.g.:
L1 = [[a, b, c, d], [a, b, d, g, f], [c, d, g], [d, g], ...]
L2 = [a, b, c, d, g, f]
say L3 = [c, d, g]
I want to find the pair-wise mutual occurrences of L3 in L1, i.e. these values:
c,d:2
d,g:3
c,g:1
I'm getting O(n^2*m*p), where p is the number of lists in L1, m is the average number of elements in each list of L1, and n is the number of elements in L3.
Can I get an improved complexity?
The code for the above in Python is:
Here sig_tags is L3 and tags is L1.
x = []
for i in range(len(sig_tags)):
    for j in range(i + 1, len(sig_tags)):
        count = 0
        for k in tags:
            if (sig_tags[i] in k) and (sig_tags[j] in k):
                count += 1
        if count > param:  # param is a threshold defined elsewhere
            x.append([sig_tags[i], sig_tags[j], count])
return x
Yes you can.
Give each element an id, then convert the list L1 into a list of bit vectors, where a bit is true if that list contains the corresponding letter. This is O(m*p), or O(m*p*log|alphabet|), depending on how you implement it.
Now, to check whether a pair belongs to a list, you only need to check whether a certain 2 bits are true, which is O(1). So all the checks together are O(n^2*p).
Overall the complexity is O(n^2*p + m*p).
You can skip assigning ids if you use a hash function. Be careful: sometimes the hash function computation is expensive.
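A sketch of that idea in Python (the function and variable names are mine; ids are assigned only to the elements of sig_tags, which is all the pair checks need):

def pairwise_counts(sig_tags, tags, param):
    ids = {t: i for i, t in enumerate(sig_tags)}  # assign ids: O(n)
    masks = []
    for row in tags:  # build one bit vector per list: O(m*p)
        mask = 0
        for t in row:
            if t in ids:
                mask |= 1 << ids[t]
        masks.append(mask)
    x = []
    for i in range(len(sig_tags)):  # pair checks: O(n^2 * p)
        for j in range(i + 1, len(sig_tags)):
            pair = (1 << i) | (1 << j)
            count = sum(1 for m in masks if m & pair == pair)
            if count > param:
                x.append([sig_tags[i], sig_tags[j], count])
    return x

This keeps the question's interface: with L1 and L3 from the example and param = 0, it returns [['c', 'd', 2], ['c', 'g', 1], ['d', 'g', 3]].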
I'm trying to do some graph analysis using Prolog. In particular I want a list of pairs which indicates the number of nodes at each level below the root. I'm able to produce a list of pairs of the form:
M = [1-[431, 441, 443, 444, 445, 447, 449], 2-[3, 5, 7, 8, 409, 451|...]].
The pair key is the graph level and the pair value is the list of nodes at that level, whereas I want the pair value to be a count of the nodes.
But I can't figure out how to reduce M to N.
N = [1-7, 2-20, ..., 8-398]
where N indicates 7 nodes at the first level, etc. Perhaps I need a good set of examples of working with pairs.
Simpler data: M = [1-[a,b,c], 2-[d,e]] should reduce to N = [1-3, 2-2]. Any pointers would be much appreciated.
You want to map a list of such pairs to another list element-wise, as follows:
list_bylength(KLs, KNs) :-
    maplist(el_len, KLs, KNs).

el_len(I-L, I-N) :-
    length(L, N).
Alternatively:
list_bylength2([], []).
list_bylength2([I-L|ILs], [I-N|INs]) :-
    length(L, N),
    list_bylength2(ILs, INs).
And most compactly, using library(lambda):
..., maplist(\ (I-L)^(I-N)^length(L,N), ILs, INs), ...
A lot of list processing can be performed with findall/3, using member/2 as a 'cursor':
list_bylength(KLs, KNs) :-
    findall(K-L, (member(K-Ls, KLs), length(Ls, L)), KNs).
I want to write a predicate which takes 2 unsorted lists, and produces a sorted list output.
sort_lists(List1, List2, List3)
For example:
[10,8,2,4,5]
[3,7,6,9,11]
I wish to merge these into a descending sorted list, WITHOUT sorting them both beforehand and doing a simple merge. The end result would be:
[11,10,9,8,7,6,5,4,3,2]
One idea I had was placing the numbers one at a time into the third list, each time finding the first number less than the number being inserted and putting it in that position, but I'm struggling to implement this. I'm quite new to Prolog.
What you describe is an application of insertion sort:
join(L1, L2, S) :-
    (   append(L1, L2, [A|B])
    ->  insert_each(B, [A], S)
    ;   S = []
    ).

insert_each([], S, S).
insert_each([A|B], L, S) :-
    insert(A, L, M),
    insert_each(B, M, S).

insert(A, [], [A]).
insert(A, [B|C], X) :-
    (   A > B
    ->  X = [A,B|C]
    ;   X = [B|Y],
        insert(A, C, Y)
    ).

The A > B comparison is what keeps the result in descending order; flip it to =< if you ever want ascending order instead.