Generate pairs having the same attributes from list - algorithm

Assume you have a list of items, each with a set of attributes.
What is an efficient algorithm for generating all pairs from the list having the same attributes?
For example, given a list:
[('item1', {'a','b'}), ('item2', {'a'}), ('item3', {'c','b'}), ('item4', {'b'})]
We should return the following list of four pairs, out of the total possible six:
('item1', 'item2') # both have attribute 'a'
('item1', 'item3') # both have attribute 'b'
('item1', 'item4') # both have attribute 'b'
('item3', 'item4') # both have attribute 'b'
Now, the trivial approach would be to first generate the list of all n(n-1)/2 possible pairs and then filter out those with no attribute in common, but I suspect this approach is inefficient, especially if the number of pairs is very large.
Any suggestions?

I would suggest a two-phase algorithm:
arr = [('item1', {'a','b'}), ('item2', {'a'}), ('item3', {'c','b'}), ('item4', {'b'})]
# 1. create a map that gives, for each attribute, the list of items that have it
mp = {}
for lst in arr:
    for prop in lst[1]:
        if prop not in mp: mp[prop] = []
        mp[prop].append(lst[0])
# 2. for each attribute: add the pairs of items to the result set
result = set()
for prop in mp:
    items = mp[prop]
    # collect all pairs in items list
    for p1 in range(len(items)):
        for p2 in range(p1+1, len(items)):
            result.add((items[p1], items[p2]))
print(result)
Output:
{('item1', 'item4'), ('item1', 'item2'), ('item3', 'item4'), ('item1', 'item3')}
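The same two phases can be sketched more compactly with the standard library. This is only an illustration of the idea above, not a different algorithm; the function name `pairs_sharing_attribute` is made up for this example.

```python
from collections import defaultdict
from itertools import combinations

def pairs_sharing_attribute(items):
    """Return the set of name pairs that share at least one attribute."""
    # Phase 1: map each attribute to the names of the items that have it.
    by_attribute = defaultdict(list)
    for name, attributes in items:
        for attribute in attributes:
            by_attribute[attribute].append(name)

    # Phase 2: every pair inside one bucket shares that bucket's attribute.
    # Names are appended in list order, so a pair always has the same
    # orientation and the set drops pairs that share several attributes.
    result = set()
    for names in by_attribute.values():
        result.update(combinations(names, 2))
    return result

arr = [('item1', {'a', 'b'}), ('item2', {'a'}), ('item3', {'c', 'b'}), ('item4', {'b'})]
print(sorted(pairs_sharing_attribute(arr)))
# [('item1', 'item2'), ('item1', 'item3'), ('item1', 'item4'), ('item3', 'item4')]
```

Note that the work done is proportional to the sum of the squared bucket sizes, so this still degrades to quadratic time when one attribute is shared by most of the items.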

Related

Finding the most commonly occurring pairs

Say that I have a list (or array) that links Suppliers with the materials they supply. For example, an array of the form
[[Supplier_1, Material_a], [Supplier_2, Material_a], [Supplier_3, Material_a], [Supplier_1, Material_b], [Supplier_2, Material_c], [Supplier_3, Material_b], ...]
I am interested in finding the list of suppliers that supply at least k of the materials that a particular supplier, say Supplier_1, supplies.
One way that I can think of is to pair all suppliers with Supplier_1 for each material Supplier_1 supplies
[[Supplier_1, Supplier_2, Material_a], [Supplier_1, Supplier_3, Material_a], [Supplier_1, Supplier_3, Material_b]...]
and then count the number of times each pair is present
[[Supplier_1, Supplier_2, 1], [Supplier_1, Supplier_3, 2]...]
The problem is that this approach can be very time consuming since the list provided can be quite long. I was wondering if there is a better way to do this.
You would put the materials of Supplier_1 in a hash set, so that you can verify for any material whether it is supplied by Supplier_1 in constant time.
Once you have that you can iterate the data again, and in a dictionary (hash map) keep a count per supplier which you increment each time the material is in the above mentioned set.
In Python it would look like this:
def getsuppliers(pairs, selected_supplier, k):
    materialset = set()
    countmap = {}  # a dictionary with <key=supplier, value=count> pairs
    for supplier, material in pairs:
        if supplier == selected_supplier:
            materialset.add(material)
        countmap[supplier] = 0
    # An optional quick exit: if the selected supplier does not have k materials,
    # there is no use in continuing...
    if len(materialset) < k:
        return []  # no supplier meets the requirement
    for supplier, material in pairs:
        if material in materialset:
            countmap[supplier] += 1
    result = []
    for supplier, count in countmap.items():
        if count >= k:
            result.append(supplier)
    return result
NB: this would include the selected supplier also, provided it has at least k materials.
All operations within each individual loop body have constant (average) time complexity, so the overall time complexity is O(n), where n is the size of the input list (pairs).
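For comparison, here is a shorter sketch of the same hash-set-plus-counter idea using `collections.Counter`. The function name `suppliers_sharing_materials` and the sample data are invented for illustration. Two things are assumptions beyond the description above: repeated rows are dropped so that each shared material counts once, and k is assumed to be at least 1.

```python
from collections import Counter

def suppliers_sharing_materials(pairs, selected_supplier, k):
    """Suppliers with at least k distinct materials in common with selected_supplier."""
    # Materials of the selected supplier; set membership tests are O(1) on average.
    selected_materials = {m for s, m in pairs if s == selected_supplier}
    if len(selected_materials) < k:
        return []  # the selected supplier itself has fewer than k materials
    # Converting rows to tuples makes them hashable, so set() can drop repeats.
    unique_rows = set(map(tuple, pairs))
    shared = Counter(s for s, m in unique_rows if m in selected_materials)
    return sorted(s for s, count in shared.items() if count >= k)

pairs = [["S1", "a"], ["S2", "a"], ["S3", "a"], ["S1", "b"], ["S2", "c"], ["S3", "b"]]
print(suppliers_sharing_materials(pairs, "S1", 2))  # ['S1', 'S3']
```

As with the version above, the selected supplier appears in its own result.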

How to assign more than one value to UInt32

I am trying to set the bird group to two numbers so that, when I assign a variable, I can use multiple "else if" statements with that one group later on. I'm in Swift, and Xcode doesn't let me do this:
let birdgroup: UInt32 = 2, 3
You can use Array, Set, or a tuple to store multiple values in a single variable. If order matters, go with Array or tuple, but if the order doesn't matter, you can use Set. Array and Set both allow you to vary the number of values stored in your variable, while a tuple variable must always be the same length. Also, you can loop over the items in an array or set, but not over a tuple.
Array is the most often used of the three, so if you aren't sure which to use, it's a good first choice.
In summary, this table shows the possibilities and their properties:
            Loopable   Unloopable
Ordered     Array      Tuple
Unordered   Set        (none)
Finally, all the items in an array or set must be of the same type (or derived from the same type, if the array or set is defined with the base class). This is called homogeneous. A tuple can contain different types, also known as heterogeneous.
            Homogeneous   Heterogeneous
Ordered     Array         Tuple
Unordered   Set           (none)
Collection Types in the Swift documentation describes how to use Array and Set.
Array
Create an array with
var birdgroup: [UInt32] = [2, 3]
birdgroup[0] is equal to 2, and birdgroup[1] is equal to 3. You can also access the items by looping:
for bird in birdgroup {
    print(bird)
}
Set
You can declare a set with
var birdgroup: Set<UInt32> = [2, 3]
Because sets have no order (imagine every item is tossed together in a bag), you can't request the "first" or "second" item. Instead, loop over each item of the set:
for bird in birdgroup {
    print(bird)
}
Tuple
let birdgroup: (UInt32, UInt32) = (2, 3)
Tuples also retain the order of their items. birdgroup.0 is equal to 2, and birdgroup.1 to 3. You can also give each item of the tuple a name if you prefer that to a number:
let birdgroup: (foo: UInt32, bar: UInt32) = (foo: 2, bar: 3)
birdgroup.foo is 2, and birdgroup.bar is 3.
Additionally, the values in a tuple do not all need to be the same type. You can combine different types, such as
let heterogeneousTuple: (UInt32, String) = (2, "three")

how to print a dictionary sorted by a value of a subdictionary?

I have a dictionary inside a dictionary, and I wish to print the whole dictionary sorted by a value in the sub-dictionary:
Lesson = {Name:{'Rating':Rating, 'Desc':Desc, 'TimeLeftTask':Timeleft}}
or
Lesson = {'Math':{'Rating':11, 'Desc':'Exercises 14 and 19 page 157', 'TimeLeftTask':7}, 'English':{'Rating':23, 'Desc':'Exercise 5 page 204', 'TimeLeftTask':2}}
I want to print this dict, for example, but sorted by 'Rating' (high numbers at the top).
I have read this post but I don't fully understand it.
If you could keep it simple, it would be great.
And yes, I'm making a program to sort and deal with my homework.
Thanks in advance
def sort_by_subdict(dictionary, subdict_key):
    # reverse=True puts the highest values first, as the question asks
    return sorted(dictionary.items(), key=lambda k_v: k_v[1][subdict_key], reverse=True)
Lesson = {'Math':{'Rating':11, 'Desc':'Exercises 14 and 19 page 157', 'TimeLeftTask':7}, 'English':{'Rating':23, 'Desc':'Exercise 5 page 204', 'TimeLeftTask':2}}
print(sort_by_subdict(Lesson, 'Rating'))
A dictionary cannot be sorted by value in place (and before Python 3.7 a dict had no guaranteed order at all), so we represent it as a list of (key, value) tuples to preserve the sorted order.
The SO question you mention does the same thing: it uses the sorted function to return a list of (k, v) tuples (k is the key and v the value) from the top-level dictionary, ordered by the desired value in the sub-dictionary v.
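Since the question asks for printing, here is a minimal sketch that walks the sorted list of tuples and prints one lesson per line, highest rating first. The helper name `by_rating_desc` and the output layout are just choices made for this example.

```python
lesson = {
    'Math': {'Rating': 11, 'Desc': 'Exercises 14 and 19 page 157', 'TimeLeftTask': 7},
    'English': {'Rating': 23, 'Desc': 'Exercise 5 page 204', 'TimeLeftTask': 2},
}

def by_rating_desc(lessons):
    """Return (name, info) tuples ordered from highest to lowest 'Rating'."""
    return sorted(lessons.items(), key=lambda item: item[1]['Rating'], reverse=True)

# Each element is a (name, info) tuple, so it unpacks directly in the loop.
for name, info in by_rating_desc(lesson):
    print(f"{name}: rating {info['Rating']}, {info['Desc']}, time left {info['TimeLeftTask']}")
```

With the sample data, the English line is printed before the Math line.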

Python: Sorting a dictionary's values whose values are lists of tuples?

I would like to sort a dictionary where I have a string key but my values are lists of tuples. For example, imagine we have a dictionary where each person is mapped to their rating of different academic subjects, where d.items() would return:
('Person':[("Math",5),("Chemistry",10),("History",2)])
Is there any way I can sort the value of each key alphabetically? For example, d['Person'] would now return:
('Person':[("Chemistry",10),("History",2),("Math",5)])
My solution:
arr = {
    'Person': [("Math",5),("Chemistry",10),("History",2)]
}
def customKey(a):
    # sort each (subject, rating) tuple by its subject name
    return a[0]
for i in arr.keys():
    arr[i] = sorted(arr[i], key=customKey)
print(arr)
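A possible variation, shown only as a sketch: `operator.itemgetter(0)` can stand in for the custom key function, and `list.sort` reorders each list in place instead of building a new one. The names `ratings` and `sort_subject_lists` are made up for this example.

```python
from operator import itemgetter

ratings = {'Person': [("Math", 5), ("Chemistry", 10), ("History", 2)]}

def sort_subject_lists(d):
    """Sort every list of (subject, rating) tuples in d by subject name, in place."""
    for subjects in d.values():
        subjects.sort(key=itemgetter(0))  # itemgetter(0) picks the subject name
    return d

sort_subject_lists(ratings)
print(ratings)
# {'Person': [('Chemistry', 10), ('History', 2), ('Math', 5)]}
```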

Most efficient way to compile unique values in a massive text file?

I have a set of large text files that in total contain about 3 million rows.
What I want to do is pluck a value from a given column from each row and add it to an array in memory. If the value already exists in the array, then ignore it.
I'm assuming the fastest way is NOT:
Read a value
if it does not already exist (checked with the array's native index or what-have-you method), push it to the array
Should I be inserting the value in alphabetical order to speed up the match/search?
OR should I keep multiple arrays...for example, one for each letter of the alphabet?
Use Set:
Set implements a collection of unordered values with no duplicates. This is a hybrid of Array's intuitive inter-operation facilities and Hash's fast lookup.
Example usage:
require 'set'
set = Set.new
set << 1 << 2 << 3 # => #<Set: {1, 2, 3}>
set << 2 # => #<Set: {1, 2, 3}>
You could add the values as keys to a hash map, which would take care of removing duplicates automatically. You could even count the number of times each value occurs this way (by storing the count as the hash value).
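The hash-map counting idea is not specific to one language. Here is a rough sketch in Python rather than Ruby, where `collections.Counter` plays the role of the hash. The function name `count_column_values`, the comma delimiter and the sample rows are all assumptions made for this example; real files may need a proper CSV parser.

```python
from collections import Counter

def count_column_values(lines, column, delimiter=","):
    """Count how often each value occurs in one column of delimited text lines."""
    counts = Counter()
    for line in lines:
        fields = line.rstrip("\n").split(delimiter)
        if column < len(fields):  # skip short or blank rows
            counts[fields[column]] += 1
    return counts

sample = ["1,red,x\n", "2,blue,y\n", "3,red,z\n"]
counts = count_column_values(sample, column=1)
print(sorted(counts))  # the unique values: ['blue', 'red']
print(counts["red"])   # 2
```

Because the function accepts any iterable of lines, an open file object can be passed directly and is read one line at a time, so only the unique values are kept in memory.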
