Reduce all values to one specified - sorting

So I have a group of lists which looks like:
[['Amy,1,"10,10,6"'], ['Bella,3,"4,7,2"'], ['Cendrick,3,"5,1,9"'], ['Fella,2,"3,8,4"'], ['Hussain,1,"9,4,3"'], ['Jamie,2,"1,1,1"'], ['Jack,3,"10,8,0"'], ['Thomas,2,"5,0,5"'], ['Zyra,1,"7,8,7"']]
The number after the name is the student's class number, and the following three numbers are that student's three scores.
I have already sorted the list alphabetically, but I am having difficulty with the following:
I want to sort the students alphabetically, but only for a specific class, and show the highest of each student's three scores. For example, if I wanted to sort class 2, then the output would be as follows:
Fella,8
Jamie,1
Thomas,5
The names are sorted alphabetically, all students are from class 2, and each student's highest score is shown beside their name.
I would really appreciate any help. TIA

Your data structure is way off. It looks like you wanted to hold information about each student in a list, but ended up putting just one comma delimited string with that information in that list. You then ended up with a list of lists, each of which contained one such string.
This is really what you wanted to do:
[['Amy', 1, 10, 10, 6],
 ['Bella', 3, 4, 7, 2],
 ['Cendrick', 3, 5, 1, 9],
 ['Fella', 2, 3, 8, 4],
 ['Hussain', 1, 9, 4, 3],
 ['Jamie', 2, 1, 1, 1],
 ['Jack', 3, 10, 8, 0],
 ['Thomas', 2, 5, 0, 5],
 ['Zyra', 1, 7, 8, 7]
]
Here's how you transform what you have, into what you wanted:
students = []
for student in myList: # myList is the list that you already have
    s = []
    name, course, grades = student[0].split(',', 2)
    s.append(name)
    s.append(int(course))
    s.extend([int(i) for i in grades.strip('"').split(',')])
    students.append(s)
Once you have this, then you can filter and sort students as follows:
import operator
classNum = 1 # let's say you want all the students from class number 1
answer = sorted([s for s in students if s[1]==classNum], key=operator.itemgetter(0))
for student in answer:
    name = student[0]
    grade = max(student[2:])
    print(name, grade)
Note that I said that this is what it seems like you wanted to do. In your position, this is what I would do:
import operator
from collections import namedtuple as ntuple
Student = ntuple('Student', ['name', 'course', 'grades'])
students = []
courseNum = 1
for student in myList: # myList is the list that you already have
    name, course, grades = student[0].split(',', 2)
    course = int(course)
    if course != courseNum:
        continue
    grades = [int(i) for i in grades.strip('"').split(',')]
    students.append(Student(name, course, grades))
students.sort(key=operator.attrgetter('name'))
for student in students:
    print(student.name, max(student.grades))

Maybe this would work:
def transform(inputs, class_number):
    results = []
    for item in inputs:
        line = item[0]
        pieces = line.split(',', 2)
        # compare as integers so that class_number can be passed as an int
        if int(pieces[1]) != class_number:
            continue
        # convert to int so that max() compares numbers, not strings
        scores = [int(s) for s in pieces[2].strip('"').split(',')]
        results.append((pieces[0], max(scores)))
    return results
Also, I strongly recommend you use something to give your data a little more structure than just a comma-separated string. Something like collections.namedtuple. Then you could have a list of namedtuples with meaningfully named fields.
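A minimal Python 3 sketch of that suggestion, assuming the input is exactly the list of one-string lists from the question. The names Student, parse_students and top_scores are made up for illustration; csv.reader is used here only because it copes with the quoted score field.

```python
import csv
from collections import namedtuple

# Hypothetical record type; the field names are illustrative.
Student = namedtuple("Student", ["name", "class_number", "scores"])

def parse_students(rows):
    """Turn [['Amy,1,"10,10,6"'], ...] into a list of Student records."""
    lines = [row[0] for row in rows]
    students = []
    # csv.reader splits on commas but keeps the quoted "10,10,6" together.
    for name, class_number, scores in csv.reader(lines):
        students.append(
            Student(name, int(class_number), [int(s) for s in scores.split(",")])
        )
    return students

def top_scores(students, class_number):
    """Return (name, best score) pairs for one class, sorted by name."""
    selected = sorted(
        (s for s in students if s.class_number == class_number),
        key=lambda s: s.name,
    )
    return [(s.name, max(s.scores)) for s in selected]

data = [['Amy,1,"10,10,6"'], ['Bella,3,"4,7,2"'], ['Cendrick,3,"5,1,9"'],
        ['Fella,2,"3,8,4"'], ['Hussain,1,"9,4,3"'], ['Jamie,2,"1,1,1"'],
        ['Jack,3,"10,8,0"'], ['Thomas,2,"5,0,5"'], ['Zyra,1,"7,8,7"']]

for name, best in top_scores(parse_students(data), 2):
    print(f"{name},{best}")
```

For class 2 this should print Fella,8, Jamie,1 and Thomas,5, matching the output requested in the question.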

Below is a solution that works with your data as it is currently stored, although that format makes processing hard.
>>> lst = [['Amy,1,"10,10,6"'], ['Bella,3,"4,7,2"'], ['Cendrick,3,"5,1,9"'],
...        ['Fella,2,"3,8,4"'], ['Hussain,1,"9,4,3"'], ['Jamie,2,"1,1,1"'], ['Jack,3,"10,8,0"'], ['Thomas,2,"5,0,5"'], ['Zyra,1,"7,8,7"']]
>>> from itertools import chain
>>> lst_flat = chain.from_iterable(lst)
>>> sorted_lst = sorted(filter(lambda x: x.split(',')[1] == '2', lst_flat))
>>> print map(lambda x: (x.split(',')[0],
...     max([int(y) for y in x.split('"')[1].split(',')])), sorted_lst)
[('Fella', 8), ('Jamie', 1), ('Thomas', 5)]
You should consider cleaning up the way you've represented your data:
>>> from pprint import pprint
>>> from itertools import chain
>>> lst_clean = []
>>> for item in chain.from_iterable(lst):
...     name, cls = item.split(',')[0], item.split(',')[1]
...     marks = [int(x) for x in item.split('"')[1].split(',')]
...     lst_clean.append((name, cls, marks))
>>> pprint(lst_clean)
[('Amy', '1', [10, 10, 6]),
('Bella', '3', [4, 7, 2]),
('Cendrick', '3', [5, 1, 9]),
('Fella', '2', [3, 8, 4]),
('Hussain', '1', [9, 4, 3]),
('Jamie', '2', [1, 1, 1]),
('Jack', '3', [10, 8, 0]),
('Thomas', '2', [5, 0, 5]),
('Zyra', '1', [7, 8, 7])]
>>> sorted_lst = sorted([(name, cls, marks) for (name, cls, marks) in lst_clean if cls == '2'])
>>> for name, cls, marks in sorted_lst:
...     print name, max(marks)
Fella 8
Jamie 1
Thomas 5


I want to use the append() method to append a temporary list to an empty list in a for loop [duplicate]

While using new_list = my_list, any modifications to new_list change my_list every time. Why is this, and how can I clone or copy the list to prevent it?
new_list = my_list doesn't actually create a second list. The assignment just copies the reference to the list, not the actual list, so both new_list and my_list refer to the same list after the assignment.
To actually copy the list, you have several options:
You can use the built-in list.copy() method (available since Python 3.3):
new_list = old_list.copy()
You can slice it:
new_list = old_list[:]
Alex Martelli's opinion (at least back in 2007) about this is, that it is a weird syntax and it does not make sense to use it ever. ;) (In his opinion, the next one is more readable).
You can use the built-in list() constructor:
new_list = list(old_list)
You can use generic copy.copy():
import copy
new_list = copy.copy(old_list)
This is a little slower than list() because it has to find out the datatype of old_list first.
If you need to copy the elements of the list as well, use generic copy.deepcopy():
import copy
new_list = copy.deepcopy(old_list)
Obviously the slowest and most memory-intensive method, but sometimes unavoidable. This operates recursively; it will handle any number of levels of nested lists (or other containers).
Example:
import copy
class Foo(object):
    def __init__(self, val):
        self.val = val
    def __repr__(self):
        return f'Foo({self.val!r})'
foo = Foo(1)
a = ['foo', foo]
b = a.copy()
c = a[:]
d = list(a)
e = copy.copy(a)
f = copy.deepcopy(a)
# edit original list and instance
a.append('baz')
foo.val = 5
print(f'original: {a}\nlist.copy(): {b}\nslice: {c}\nlist(): {d}\ncopy: {e}\ndeepcopy: {f}')
Result:
original: ['foo', Foo(5), 'baz']
list.copy(): ['foo', Foo(5)]
slice: ['foo', Foo(5)]
list(): ['foo', Foo(5)]
copy: ['foo', Foo(5)]
deepcopy: ['foo', Foo(1)]
Felix already provided an excellent answer, but I thought I'd do a speed comparison of the various methods:
10.59 sec (105.9 µs/itn) - copy.deepcopy(old_list)
10.16 sec (101.6 µs/itn) - pure Python Copy() method copying classes with deepcopy
1.488 sec (14.88 µs/itn) - pure Python Copy() method not copying classes (only dicts/lists/tuples)
0.325 sec (3.25 µs/itn) - for item in old_list: new_list.append(item)
0.217 sec (2.17 µs/itn) - [i for i in old_list] (a list comprehension)
0.186 sec (1.86 µs/itn) - copy.copy(old_list)
0.075 sec (0.75 µs/itn) - list(old_list)
0.053 sec (0.53 µs/itn) - new_list = []; new_list.extend(old_list)
0.039 sec (0.39 µs/itn) - old_list[:] (list slicing)
So the fastest is list slicing. But be aware that copy.copy(), old_list[:] and list(old_list), unlike copy.deepcopy() and the pure Python Copy() version, don't copy any lists, dictionaries or class instances inside the list, so if the originals change, they will change in the copied list too, and vice versa.
(Here's the script if anyone's interested or wants to raise any issues:)
from copy import deepcopy
class old_class:
    def __init__(self):
        self.blah = 'blah'
class new_class(object):
    def __init__(self):
        self.blah = 'blah'
dignore = {str: None, unicode: None, int: None, type(None): None}
def Copy(obj, use_deepcopy=True):
    t = type(obj)
    if t in (list, tuple):
        if t == tuple:
            # Convert to a list if a tuple to
            # allow assigning to when copying
            is_tuple = True
            obj = list(obj)
        else:
            # Otherwise just do a quick slice copy
            obj = obj[:]
            is_tuple = False
        # Copy each item recursively
        for x in xrange(len(obj)):
            if type(obj[x]) in dignore:
                continue
            obj[x] = Copy(obj[x], use_deepcopy)
        if is_tuple:
            # Convert back into a tuple again
            obj = tuple(obj)
    elif t == dict:
        # Use the fast shallow dict copy() method and copy any
        # values which aren't immutable (like lists, dicts etc)
        obj = obj.copy()
        for k in obj:
            if type(obj[k]) in dignore:
                continue
            obj[k] = Copy(obj[k], use_deepcopy)
    elif t in dignore:
        # Numeric or string/unicode?
        # It's immutable, so ignore it!
        pass
    elif use_deepcopy:
        obj = deepcopy(obj)
    return obj
if __name__ == '__main__':
    import copy
    from time import time
    num_times = 100000
    L = [None, 'blah', 1, 543.4532,
         ['foo'], ('bar',), {'blah': 'blah'},
         old_class(), new_class()]
    t = time()
    for i in xrange(num_times):
        Copy(L)
    print 'Custom Copy:', time()-t
    t = time()
    for i in xrange(num_times):
        Copy(L, use_deepcopy=False)
    print 'Custom Copy Only Copying Lists/Tuples/Dicts (no classes):', time()-t
    t = time()
    for i in xrange(num_times):
        copy.copy(L)
    print 'copy.copy:', time()-t
    t = time()
    for i in xrange(num_times):
        copy.deepcopy(L)
    print 'copy.deepcopy:', time()-t
    t = time()
    for i in xrange(num_times):
        L[:]
    print 'list slicing [:]:', time()-t
    t = time()
    for i in xrange(num_times):
        list(L)
    print 'list(L):', time()-t
    t = time()
    for i in xrange(num_times):
        [i for i in L]
    print 'list expression(L):', time()-t
    t = time()
    for i in xrange(num_times):
        a = []
        a.extend(L)
    print 'list extend:', time()-t
    t = time()
    for i in xrange(num_times):
        a = []
        for y in L:
            a.append(y)
    print 'list append:', time()-t
    t = time()
    for i in xrange(num_times):
        a = []
        a.extend(i for i in L)
    print 'generator expression extend:', time()-t
I've been told that Python 3.3+ adds the list.copy() method, which should be as fast as slicing:
newlist = old_list.copy()
What are the options to clone or copy a list in Python?
In Python 3, a shallow copy can be made with:
a_copy = a_list.copy()
In Python 2 and 3, you can get a shallow copy with a full slice of the original:
a_copy = a_list[:]
Explanation
There are two semantic ways to copy a list. A shallow copy creates a new list of the same objects, a deep copy creates a new list containing new equivalent objects.
Shallow list copy
A shallow copy only copies the list itself, which is a container of references to the objects in the list. If the objects contained themselves are mutable and one is changed, the change will be reflected in both lists.
There are different ways to do this in Python 2 and 3. The Python 2 ways will also work in Python 3.
Python 2
In Python 2, the idiomatic way of making a shallow copy of a list is with a complete slice of the original:
a_copy = a_list[:]
You can also accomplish the same thing by passing the list through the list constructor,
a_copy = list(a_list)
but using the constructor is less efficient:
>>> import timeit
>>> l = range(20)
>>> min(timeit.repeat(lambda: l[:]))
0.30504298210144043
>>> min(timeit.repeat(lambda: list(l)))
0.40698814392089844
Python 3
In Python 3, lists get the list.copy method:
a_copy = a_list.copy()
In Python 3.5:
>>> import timeit
>>> l = list(range(20))
>>> min(timeit.repeat(lambda: l[:]))
0.38448613602668047
>>> min(timeit.repeat(lambda: list(l)))
0.6309100328944623
>>> min(timeit.repeat(lambda: l.copy()))
0.38122922903858125
Making another pointer does not make a copy
Using new_list = my_list then modifies new_list every time my_list changes. Why is this?
my_list is just a name that points to the actual list in memory. When you say new_list = my_list you're not making a copy, you're just adding another name that points at that original list in memory. We can have similar issues when we make copies of lists.
>>> l = [[], [], []]
>>> l_copy = l[:]
>>> l_copy
[[], [], []]
>>> l_copy[0].append('foo')
>>> l_copy
[['foo'], [], []]
>>> l
[['foo'], [], []]
The list is just an array of pointers to the contents, so a shallow copy just copies the pointers, and so you have two different lists, but they have the same contents. To make copies of the contents, you need a deep copy.
Deep copies
To make a deep copy of a list, in Python 2 or 3, use deepcopy in the copy module:
import copy
a_deep_copy = copy.deepcopy(a_list)
To demonstrate how this allows us to make new sub-lists:
>>> import copy
>>> l
[['foo'], [], []]
>>> l_deep_copy = copy.deepcopy(l)
>>> l_deep_copy[0].pop()
'foo'
>>> l_deep_copy
[[], [], []]
>>> l
[['foo'], [], []]
And so we see that the deep copied list is an entirely different list from the original. You could roll your own function - but don't. You're likely to create bugs that you would have avoided by using the standard library's deepcopy function.
Don't use eval
You may see this used as a way to deepcopy, but don't do it:
problematic_deep_copy = eval(repr(a_list))
It's dangerous, particularly if you're evaluating something from a source you don't trust.
It's not reliable, if a subelement you're copying doesn't have a representation that can be eval'd to reproduce an equivalent element.
It's also less performant.
In 64 bit Python 2.7:
>>> import timeit
>>> import copy
>>> l = range(10)
>>> min(timeit.repeat(lambda: copy.deepcopy(l)))
27.55826997756958
>>> min(timeit.repeat(lambda: eval(repr(l))))
29.04534101486206
on 64 bit Python 3.5:
>>> import timeit
>>> import copy
>>> l = list(range(10))
>>> min(timeit.repeat(lambda: copy.deepcopy(l)))
16.84255409205798
>>> min(timeit.repeat(lambda: eval(repr(l))))
34.813894678023644
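To make the reliability point concrete, here is a small sketch. The Point class is hypothetical and deliberately keeps the default repr, which is not valid Python source, so eval(repr(...)) fails where copy.deepcopy succeeds.

```python
import copy

class Point:
    """Hypothetical class that keeps the default repr (<...Point object at 0x...>)."""
    def __init__(self, x):
        self.x = x

items = [Point(1), [2, 3]]

# deepcopy copies both the instance and the nested list.
cloned = copy.deepcopy(items)
deepcopy_ok = cloned[0] is not items[0] and cloned[0].x == 1

# eval(repr(...)) cannot even parse the default repr of the instance.
try:
    eval(repr(items))
    eval_ok = True
except SyntaxError:
    eval_ok = False

print(deepcopy_ok, eval_ok)   # True False
```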
Let's start from the beginning and explore this question.
So let's suppose you have two lists:
list_1 = ['01', '98']
list_2 = [['01', '98']]
And we have to copy both lists, now starting from the first list:
So first let's try by setting the variable copy to our original list, list_1:
copy = list_1
Now if you are thinking copy copied the list_1, then you are wrong. The id function can show us if two variables can point to the same object. Let's try this:
print(id(copy))
print(id(list_1))
The output is:
4329485320
4329485320
Both variables refer to the exact same object. Are you surprised?
Python doesn't store values in variables; variables are just references to objects, and the objects store the values. Here the object is a list, and we created two references to it under two different variable names.
So after copy = list_1, list_1 and copy are two variable names for one and the same list object.
If you modify the list through copy, the original changes too, because there is only one list, no matter which name you use to modify it:
copy[0] = "modify"
print(copy)
print(list_1)
Output:
['modify', '98']
['modify', '98']
So modifying copy also modified the original list.
Now let's move onto a Pythonic method for copying lists.
copy_1 = list_1[:]
This method fixes the first issue we had:
print(id(copy_1))
print(id(list_1))
4338792136
4338791432
As we can see, the two lists have different ids, which means the two variables point to different objects: the slice created a new list.
Now let's try to modify the list and let's see if we still face the previous problem:
copy_1[0] = "modify"
print(list_1)
print(copy_1)
The output is:
['01', '98']
['modify', '98']
As you can see, it only modified the copied list. That means it worked.
Do you think we're done? No. Let's try to copy our nested list.
copy_2 = list_2[:]
copy_2 should now reference another object, which is a copy of list_2. Let's check:
print(id((list_2)), id(copy_2))
We get the output:
4330403592 4330403528
Now we can assume both names point to different objects, so let's modify one and see whether it gives us what we want:
copy_2[0][1] = "modify"
print(list_2, copy_2)
This gives us the output:
[['01', 'modify']] [['01', 'modify']]
This may seem a little bit confusing, because the same method we previously used worked. Let's try to understand this.
When you do:
copy_2 = list_2[:]
You're only copying the outer list, not the inside list. We can use the id function once again to check this.
print(id(copy_2[0]))
print(id(list_2[0]))
The output is:
4329485832
4329485832
When we do copy_2 = list_2[:], Python creates a copy of the outer list only, not of the nested list. The nested list is the same object for both variables, so modifying it through one list modifies it in the other as well.
What is the solution? The solution is the deepcopy function.
from copy import deepcopy
deep = deepcopy(list_2)
Let's check this:
print(id((list_2)), id(deep))
4322146056 4322148040
Both outer lists have different IDs. Let's try this on the inner nested lists.
print(id(deep[0]))
print(id(list_2[0]))
The output is:
4322145992
4322145800
As you can see, the IDs are different, so the two nested lists are now different objects.
This means that deep = deepcopy(list_2) gives each outer list its own separate copy of the nested list.
Now let's try to modify the nested list and see if it solved the previous issue or not:
deep[0][1] = "modify"
print(list_2, deep)
It outputs:
[['01', '98']] [['01', 'modify']]
As you can see, it didn't modify the original nested list, it only modified the copied list.
There are many answers already that tell you how to make a proper copy, but none of them say why your original 'copy' failed.
Python doesn't store values in variables; it binds names to objects. Your original assignment took the object referred to by my_list and bound it to new_list as well. No matter which name you use there is still only one list, so changes made when referring to it as my_list will persist when referring to it as new_list. Each of the other answers to this question gives you a different way of creating a new object to bind to new_list.
Each element of a list acts like a name, in that each element binds non-exclusively to an object. A shallow copy creates a new list whose elements bind to the same objects as before.
new_list = list(my_list) # or my_list[:], but I prefer this syntax
# is simply a shorter way of:
new_list = [element for element in my_list]
To take your list copy one step further, copy each object that your list refers to, and bind those element copies to a new list.
import copy
# copy.copy() uses an element's __copy__ method if it defines one
new_list = [copy.copy(element) for element in my_list]
This is not yet a deep copy, because each element of a list may refer to other objects, just like the list is bound to its elements. To recursively copy every element in the list, and then each other object referred to by each element, and so on: perform a deep copy.
import copy
# copy.deepcopy() uses an element's __deepcopy__ method if it defines one
new_list = copy.deepcopy(my_list)
See the documentation for more information about corner cases in copying.
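The levels of copying described above can be told apart with identity checks. This is only an illustrative sketch; the variable names are made up for the example.

```python
import copy

inner = [1, 2]
my_list = [inner, "text"]

alias = my_list                               # same list object, no copy
shallow = list(my_list)                       # new outer list, same elements
one_level = [copy.copy(e) for e in my_list]   # new outer list, copied elements
deep = copy.deepcopy(my_list)                 # every mutable level is copied

print(alias is my_list)        # True
print(shallow is my_list)      # False
print(shallow[0] is inner)     # True
print(one_level[0] is inner)   # False
print(deep[0] is inner)        # False
```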
Use thing[:]
>>> a = [1,2]
>>> b = a[:]
>>> a += [3]
>>> a
[1, 2, 3]
>>> b
[1, 2]
>>>
Python 3.6 Timings
Here are the timing results using Python 3.6.8. Keep in mind these times are relative to one another, not absolute.
I stuck to only doing shallow copies, and also added some new methods that weren't possible in Python 2, such as list.copy() (the Python 3 slice equivalent) and two forms of list unpacking (*new_list, = list and new_list = [*list]):
METHOD                  TIME TAKEN
b = [*a]                2.75180600000021
b = a * 1               3.50215399999990
b = a[:]                3.78278899999986    # Python 2 winner (see above)
b = a.copy()            4.20556500000020    # Python 3 "slice equivalent" (see above)
b = []; b.extend(a)     4.68069800000012
b = a[0:len(a)]         6.84498999999959
*b, = a                 7.54031799999984
b = list(a)             7.75815899999997
b = [i for i in a]      18.4886440000000
b = copy.copy(a)        18.8254879999999
b = []
for item in a:
    b.append(item)      35.4729199999997
We can see the Python 2 winner still does well, but doesn't edge out Python 3 list.copy() by much, especially considering the superior readability of the latter.
The dark horse is the unpacking and repacking method (b = [*a]), which is ~25% faster than raw slicing, and more than twice as fast as the other unpacking method (*b, = a).
b = a * 1 also does surprisingly well.
Note that these methods do not output equivalent results for any input other than lists. They all work for sliceable objects, a few work for any iterable, but only copy.copy() works for more general Python objects.
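A short sketch of how some of these methods behave on inputs other than lists. The sample values are arbitrary, and the checks only cover the cases shown here.

```python
import copy

t = (1, 2, 3)
print(type(t[:]).__name__)        # tuple  (slicing keeps the input type)
print(type(list(t)).__name__)     # list
print(type([*t]).__name__)        # list

gen = (n * n for n in range(3))
squares = list(gen)               # list() accepts any iterable
print(squares)                    # [0, 1, 4]

s = {1, 2}
s_copy = copy.copy(s)             # a set cannot be sliced, but copy.copy() works
print(s_copy == s, s_copy is s)   # True False
```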
Here is the testing code for interested parties:
import timeit
COUNT = 50000000
print("Array duplicating. Tests run", COUNT, "times")
setup = 'a = [0,1,2,3,4,5,6,7,8,9]; import copy'
print("b = list(a)\t\t", timeit.timeit(stmt='b = list(a)', setup=setup, number=COUNT))
print("b = copy.copy(a)\t", timeit.timeit(stmt='b = copy.copy(a)', setup=setup, number=COUNT))
print("b = a.copy()\t\t", timeit.timeit(stmt='b = a.copy()', setup=setup, number=COUNT))
print("b = a[:]\t\t", timeit.timeit(stmt='b = a[:]', setup=setup, number=COUNT))
print("b = a[0:len(a)]\t\t", timeit.timeit(stmt='b = a[0:len(a)]', setup=setup, number=COUNT))
print("*b, = a\t\t\t", timeit.timeit(stmt='*b, = a', setup=setup, number=COUNT))
print("b = []; b.extend(a)\t", timeit.timeit(stmt='b = []; b.extend(a)', setup=setup, number=COUNT))
print("b = []; for item in a: b.append(item)\t", timeit.timeit(stmt='b = []\nfor item in a: b.append(item)', setup=setup, number=COUNT))
print("b = [i for i in a]\t", timeit.timeit(stmt='b = [i for i in a]', setup=setup, number=COUNT))
print("b = [*a]\t\t", timeit.timeit(stmt='b = [*a]', setup=setup, number=COUNT))
print("b = a * 1\t\t", timeit.timeit(stmt='b = a * 1', setup=setup, number=COUNT))
Python's idiom for doing this is newList = oldList[:]
All of the other contributors gave great answers, which work when you have a single dimension (leveled) list, however of the methods mentioned so far, only copy.deepcopy() works to clone/copy a list and not have it point to the nested list objects when you are working with multidimensional, nested lists (list of lists). While Felix Kling refers to it in his answer, there is a little bit more to the issue and possibly a workaround using built-ins that might prove a faster alternative to deepcopy.
While new_list = old_list[:], copy.copy(old_list) and, for Py3k, old_list.copy() work for single-leveled lists, they still point at the list objects nested within old_list, so changes to one of the nested list objects show up in the other list as well.
Edit: New information brought to light
As was pointed out by both Aaron Hall and PM 2Ring using eval() is not only a bad idea, it is also much slower than copy.deepcopy().
This means that for multidimensional lists, the only option is copy.deepcopy(). With that being said, it really isn't an option as the performance goes way south when you try to use it on a moderately sized multidimensional array. I tried to timeit using a 42x42 array, not unheard of or even that large for bioinformatics applications, and I gave up on waiting for a response and just started typing my edit to this post.
It would seem that the only real option then is to initialize multiple lists and work on them independently. If anyone has any other suggestions, for how to handle multidimensional list copying, it would be appreciated.
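One possible workaround, sketched under the assumption that the nesting depth is known to be exactly two and that the innermost values are immutable (ints here): copy each row with a slice inside a list comprehension. This is not a general replacement for copy.deepcopy(), and no timing claim is made.

```python
import copy

grid = [[0] * 42 for _ in range(42)]   # 42x42, the size mentioned above

# Copy each row with a slice; valid only because the rows hold immutable ints.
grid_copy = [row[:] for row in grid]

grid_copy[0][0] = 99
print(grid[0][0], grid_copy[0][0])     # 0 99
print(grid_copy[1] is grid[1])         # False

# The general-purpose deep copy produces an equal structure.
print(copy.deepcopy(grid) == grid)     # True
```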
As others have stated, there are significant performance issues using the copy module and copy.deepcopy for multidimensional lists.
It surprises me that this hasn't been mentioned yet, so for the sake of completeness...
You can perform list unpacking with the "splat operator": *, which builds a new list from the elements of your list (a shallow copy).
old_list = [1, 2, 3]
new_list = [*old_list]
new_list.append(4)
old_list == [1, 2, 3]
new_list == [1, 2, 3, 4]
The obvious downside to this method is that it is only available in Python 3.5+.
Timing wise though, this appears to perform better than other common methods.
import random
x = [random.random() for _ in range(1000)]
%timeit a = list(x)
%timeit a = x.copy()
%timeit a = x[:]
%timeit a = [*x]
#: 2.47 µs ± 38.1 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
#: 2.47 µs ± 54.6 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
#: 2.39 µs ± 58.2 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
#: 2.22 µs ± 43.2 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
new_list = my_list[:]
new_list = my_list
Try to understand this. Let's say that my_list is in the heap memory at location X, i.e., my_list is pointing to X. Now by assigning new_list = my_list you're letting new_list point to X as well. This is not a copy at all: both names refer to the same list.
Now if you assign new_list = my_list[:], you get a new list whose elements are references to the same objects as in my_list. This is known as a shallow copy.
The other ways you can do this are:
new_list = list(old_list)
import copy
new_list = copy.deepcopy(old_list)
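The practical difference between plain assignment and a slice shows up once the lists are modified. A small sketch, with illustrative names:

```python
my_list = [[1, 2], [3, 4]]

alias = my_list        # plain assignment: no copy at all
sliced = my_list[:]    # slice: new outer list, shared inner lists

sliced.append([5, 6])  # affects only the new outer list
sliced[0].append(99)   # the inner list is shared, so both lists see this

print(alias is my_list)   # True
print(my_list)            # [[1, 2, 99], [3, 4]]
print(sliced)             # [[1, 2, 99], [3, 4], [5, 6]]
```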
A very simple approach independent of python version was missing in already-given answers which you can use most of the time (at least I do):
new_list = my_list * 1 # Solution 1 when you are not using nested lists
However, if my_list contains other containers (for example, nested lists) you must use deepcopy as others suggested in the answers above from the copy library. For example:
import copy
new_list = copy.deepcopy(my_list) # Solution 2 when you are using nested lists
Bonus: if you don't want to copy the elements (a shallow copy), you can also use:
new_list = my_list[:]
Let's understand difference between solution #1 and solution #2
>>> a = range(5)
>>> b = a*1
>>> a,b
([0, 1, 2, 3, 4], [0, 1, 2, 3, 4])
>>> a[2] = 55
>>> a,b
([0, 1, 55, 3, 4], [0, 1, 2, 3, 4])
As you can see, solution #1 worked perfectly when we were not using the nested lists. Let's check what will happen when we apply solution #1 to nested lists.
>>> from copy import deepcopy
>>> a = [range(i,i+4) for i in range(3)]
>>> a
[[0, 1, 2, 3], [1, 2, 3, 4], [2, 3, 4, 5]]
>>> b = a*1
>>> c = deepcopy(a)
>>> for i in (a, b, c): print i
[[0, 1, 2, 3], [1, 2, 3, 4], [2, 3, 4, 5]]
[[0, 1, 2, 3], [1, 2, 3, 4], [2, 3, 4, 5]]
[[0, 1, 2, 3], [1, 2, 3, 4], [2, 3, 4, 5]]
>>> a[2].append(99)
>>> for i in (a, b, c): print i
[[0, 1, 2, 3], [1, 2, 3, 4], [2, 3, 4, 5, 99]]
[[0, 1, 2, 3], [1, 2, 3, 4], [2, 3, 4, 5, 99]] # Solution #1 didn't work in nested list
[[0, 1, 2, 3], [1, 2, 3, 4], [2, 3, 4, 5]] # Solution #2 - DeepCopy worked in nested list
I wanted to post something a bit different than some of the other answers. Even though this is most likely not the most understandable, or fastest option, it provides a bit of an inside view of how deep copy works, as well as being another alternative option for deep copying. It doesn't really matter if my function has bugs, since the point of this is to show a way to copy objects like the question answers, but also to use this as a point to explain how deepcopy works at its core.
At the core of any deep copy function is a way to make a shallow copy. How? Simple. Any deep copy function only duplicates the containers of immutable objects. When you deepcopy a nested list, you are only duplicating the lists at every level, not the immutable objects inside of them. You are only duplicating the containers. The same works for classes, too. When you deepcopy an instance of a class, you deepcopy all of its mutable attributes. So, how? How come you only have to copy the containers, like lists, dicts, tuples, iters, classes, and class instances?
It's simple. An immutable object never needs to be duplicated. It can never be changed, so it is only a single value. That means you never have to duplicate strings, numbers, bools, or any of those. But how would you duplicate the containers? Simple. You just initialize a new container with all of the values. Deepcopy relies on recursion. It duplicates all the containers, even ones with containers inside of them, until no containers are left. A container here is a mutable object (or, like a tuple, one that can hold mutable objects).
Once you know that, completely duplicating an object without any references is pretty easy. Here's a function for deepcopying basic data-types (wouldn't work for custom classes but you could always add that)
def deepcopy(x):
    immutables = (str, int, bool, float)
    mutables = (list, dict, tuple)
    if isinstance(x, immutables):
        return x
    elif isinstance(x, mutables):
        if isinstance(x, tuple):
            return tuple(deepcopy(list(x)))
        elif isinstance(x, list):
            return [deepcopy(y) for y in x]
        elif isinstance(x, dict):
            values = [deepcopy(y) for y in list(x.values())]
            keys = list(x.keys())
            return dict(zip(keys, values))
Python's own copy.deepcopy works along similar lines. The main differences are that it supports other types, supports user classes by duplicating the attributes into a new instance, and blocks infinite recursion by keeping a memo dictionary of the objects it has already seen. And that's really it for making deep copies. At its core, making a deep copy is just making shallow copies. I hope this answer adds something to the question.
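The effect of that memo can be observed with a list that contains the same inner list twice and also contains itself. This sketch only checks the observable behaviour of copy.deepcopy(); it does not reproduce its internals.

```python
import copy

shared = [1, 2]
data = [shared, shared]   # the same inner list appears twice
data.append(data)         # and the outer list contains itself

clone = copy.deepcopy(data)   # terminates despite the cycle

print(clone is not data)        # True
print(clone[0] is clone[1])     # True: sharing is preserved inside the copy
print(clone[0] is not shared)   # True
print(clone[2] is clone)        # True: the cycle now points at the copy
```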
EXAMPLES
Say you have this list: [1, 2, 3]. The immutable numbers cannot be duplicated, but the other layer can. You can duplicate it using a list comprehension: [x for x in [1, 2, 3]]
Now, imagine you have this list: [[1, 2], [3, 4], [5, 6]]. This time, you want to make a function, which uses recursion to deep copy all layers of the list. Instead of the previous list comprehension:
[x for x in _list]
It uses a new one for lists:
[deepcopy_list(x) for x in _list]
And deepcopy_list looks like this:
def deepcopy_list(x):
    if isinstance(x, (str, bool, float, int)):
        return x
    else:
        return [deepcopy_list(y) for y in x]
Now you have a function which can deepcopy any list of strs, bools, floats, ints and even lists to arbitrarily many layers using recursion. And there you have it, deepcopying.
TLDR: Deepcopy uses recursion to duplicate objects, and merely returns the same immutable objects as before, as immutable objects cannot be duplicated. However, it deepcopies the most inner layers of mutable objects until it reaches the outermost mutable layer of an object.
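That immutable objects come back unchanged can be checked directly against the standard library. The identity results below are what CPython's copy.deepcopy() gives; treat them as an observation about that implementation rather than a language guarantee.

```python
import copy

text = "hello"
number = 12345
flat_tuple = (1, "a", 2.5)   # only immutable members
mixed_tuple = (1, [2, 3])    # contains a mutable list

print(copy.deepcopy(text) is text)               # True
print(copy.deepcopy(number) is number)           # True
print(copy.deepcopy(flat_tuple) is flat_tuple)   # True

mixed_copy = copy.deepcopy(mixed_tuple)
print(mixed_copy is mixed_tuple)                 # False
print(mixed_copy[1] is mixed_tuple[1])           # False
```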
Note that there are some cases where if you have defined your own custom class and you want to keep the attributes then you should use copy.copy() or copy.deepcopy() rather than the alternatives, for example in Python 3:
import copy
class MyList(list):
    pass
lst = MyList([1,2,3])
lst.name = 'custom list'
d = {
    'original': lst,
    'slicecopy' : lst[:],
    'lstcopy' : lst.copy(),
    'copycopy': copy.copy(lst),
    'deepcopy': copy.deepcopy(lst)
}
for k,v in d.items():
    print('lst: {}'.format(k), end=', ')
    try:
        name = v.name
    except AttributeError:
        name = 'NA'
    print('name: {}'.format(name))
Outputs:
lst: original, name: custom list
lst: slicecopy, name: NA
lst: lstcopy, name: NA
lst: copycopy, name: custom list
lst: deepcopy, name: custom list
Remember that in Python when you do:
list1 = ['apples','bananas','pineapples']
list2 = list1
list2 isn't storing a separate list, but a reference to the same list object as list1. So when you do anything to list1, list2 changes as well. Use the copy module (part of the standard library, so a plain import copy is enough) to make an independent copy of the list: copy.copy() for simple lists, copy.deepcopy() for nested ones. This makes a copy that doesn't change with the first list.
A slight practical perspective to look into memory through id and gc.
>>> b = a = ['hell', 'word']
>>> c = ['hell', 'word']
>>> id(a), id(b), id(c)
(4424020872, 4424020872, 4423979272)
| |
-----------
>>> id(a[0]), id(b[0]), id(c[0])
(4424018328, 4424018328, 4424018328) # all referring to same 'hell'
| | |
-----------------------
>>> id(a[0][0]), id(b[0][0]), id(c[0][0])
(4422785208, 4422785208, 4422785208) # all referring to same 'h'
| | |
-----------------------
>>> a[0] += 'o'
>>> a,b,c
(['hello', 'word'], ['hello', 'word'], ['hell', 'word']) # b changed too
>>> id(a[0]), id(b[0]), id(c[0])
(4424018384, 4424018384, 4424018328) # augmented assignment changed a[0],b[0]
| |
-----------
>>> b = a = ['hell', 'word']
>>> id(a[0]), id(b[0]), id(c[0])
(4424018328, 4424018328, 4424018328) # the same hell
| | |
-----------------------
>>> import gc
>>> gc.get_referrers(a[0])
[['hell', 'word'], ['hell', 'word']] # one list belongs to a and b, the other to c
>>> gc.get_referrers(('hell'))
[['hell', 'word'], ['hell', 'word'], ('hell', None)] # ('hello', None)
There is another way of copying a list that was not listed until now: adding an empty list: l2 = l + [].
I tested it with Python 3.8:
l = [1,2,3]
l2 = l + []
print(l,l2)
l[0] = 'a'
print(l,l2)
It is not the best answer, but it works.
The deepcopy option is the only method that works for me:
from copy import deepcopy
a = [ [ list(range(1, 3)) for i in range(3) ] ]
b = deepcopy(a)
b[0][1]=[3]
print('Deep:')
print(a)
print(b)
print('-----------------------------')
a = [ [ list(range(1, 3)) for i in range(3) ] ]
b = a*1
b[0][1]=[3]
print('*1:')
print(a)
print(b)
print('-----------------------------')
a = [ [ list(range(1, 3)) for i in range(3) ] ]
b = a[:]
b[0][1]=[3]
print('Vector copy:')
print(a)
print(b)
print('-----------------------------')
a = [ [ list(range(1, 3)) for i in range(3) ] ]
b = list(a)
b[0][1]=[3]
print('List copy:')
print(a)
print(b)
print('-----------------------------')
a = [ [ list(range(1, 3)) for i in range(3) ] ]
b = a.copy()
b[0][1]=[3]
print('.copy():')
print(a)
print(b)
print('-----------------------------')
a = [ [ list(range(1, 3)) for i in range(3) ] ]
b = a
b[0][1]=[3]
print('Shallow:')
print(a)
print(b)
print('-----------------------------')
leads to output of:
Deep:
[[[1, 2], [1, 2], [1, 2]]]
[[[1, 2], [3], [1, 2]]]
-----------------------------
*1:
[[[1, 2], [3], [1, 2]]]
[[[1, 2], [3], [1, 2]]]
-----------------------------
Vector copy:
[[[1, 2], [3], [1, 2]]]
[[[1, 2], [3], [1, 2]]]
-----------------------------
List copy:
[[[1, 2], [3], [1, 2]]]
[[[1, 2], [3], [1, 2]]]
-----------------------------
.copy():
[[[1, 2], [3], [1, 2]]]
[[[1, 2], [3], [1, 2]]]
-----------------------------
Shallow:
[[[1, 2], [3], [1, 2]]]
[[[1, 2], [3], [1, 2]]]
-----------------------------
This is because the line new_list = my_list does not create a new list; it only makes new_list a second reference to the list that my_list refers to.
This is similar to the C code given below,
int my_list[] = {1, 2, 3, 4};
int *new_list;
new_list = my_list;
You should use the copy module to create a new list by
import copy
new_list = copy.deepcopy(my_list)
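The method to use depends on the contents of the list being copied. If the list contains nested dicts, then deepcopy is the only method that works; otherwise most of the methods listed in the answers (slice, loop [for], copy, extend, combine, or unpack) will work and execute in similar time (except for loop and deepcopy, which performed the worst). Note that in the list[list] case the script below appends to the outer list l2 rather than to a nested list, so the 'deep' label there only shows that the outer lists are independent; nested lists are still shared by every method except deepcopy.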
Script
from random import randint
from time import time
import copy
item_count = 100000
def copy_type(l1: list, l2: list):
    if l1 == l2:
        return 'shallow'
    return 'deep'
def run_time(start, end):
    run = end - start
    return int(run * 1000000)
def list_combine(data):
    l1 = [data for i in range(item_count)]
    start = time()
    l2 = [] + l1
    end = time()
    if type(data) == dict:
        l2[0]['test'].append(1)
    elif type(data) == list:
        l2.append(1)
    return {'method': 'combine', 'copy_type': copy_type(l1, l2),
            'time_µs': run_time(start, end)}
def list_extend(data):
    l1 = [data for i in range(item_count)]
    start = time()
    l2 = []
    l2.extend(l1)
    end = time()
    if type(data) == dict:
        l2[0]['test'].append(1)
    elif type(data) == list:
        l2.append(1)
    return {'method': 'extend', 'copy_type': copy_type(l1, l2),
            'time_µs': run_time(start, end)}
def list_unpack(data):
    l1 = [data for i in range(item_count)]
    start = time()
    l2 = [*l1]
    end = time()
    if type(data) == dict:
        l2[0]['test'].append(1)
    elif type(data) == list:
        l2.append(1)
    return {'method': 'unpack', 'copy_type': copy_type(l1, l2),
            'time_µs': run_time(start, end)}
def list_deepcopy(data):
    l1 = [data for i in range(item_count)]
    start = time()
    l2 = copy.deepcopy(l1)
    end = time()
    if type(data) == dict:
        l2[0]['test'].append(1)
    elif type(data) == list:
        l2.append(1)
    return {'method': 'deepcopy', 'copy_type': copy_type(l1, l2),
            'time_µs': run_time(start, end)}
def list_copy(data):
    l1 = [data for i in range(item_count)]
    start = time()
    l2 = list.copy(l1)
    end = time()
    if type(data) == dict:
        l2[0]['test'].append(1)
    elif type(data) == list:
        l2.append(1)
    return {'method': 'copy', 'copy_type': copy_type(l1, l2),
            'time_µs': run_time(start, end)}
def list_slice(data):
    l1 = [data for i in range(item_count)]
    start = time()
    l2 = l1[:]
    end = time()
    if type(data) == dict:
        l2[0]['test'].append(1)
    elif type(data) == list:
        l2.append(1)
    return {'method': 'slice', 'copy_type': copy_type(l1, l2),
            'time_µs': run_time(start, end)}
def list_loop(data):
    l1 = [data for i in range(item_count)]
    start = time()
    l2 = []
    for i in range(len(l1)):
        l2.append(l1[i])
    end = time()
    if type(data) == dict:
        l2[0]['test'].append(1)
    elif type(data) == list:
        l2.append(1)
    return {'method': 'loop', 'copy_type': copy_type(l1, l2),
            'time_µs': run_time(start, end)}
def list_list(data):
    l1 = [data for i in range(item_count)]
    start = time()
    l2 = list(l1)
    end = time()
    if type(data) == dict:
        l2[0]['test'].append(1)
    elif type(data) == list:
        l2.append(1)
    return {'method': 'list()', 'copy_type': copy_type(l1, l2),
            'time_µs': run_time(start, end)}
if __name__ == '__main__':
    list_type = [{'list[dict]': {'test': [1, 1]}},
                 {'list[list]': [1, 1]}]
    store = []
    for data in list_type:
        key = list(data.keys())[0]
        store.append({key: [list_unpack(data[key]), list_extend(data[key]),
                            list_combine(data[key]), list_deepcopy(data[key]),
                            list_copy(data[key]), list_slice(data[key]),
                            list_loop(data[key])]})
    print(store)
Results
[{"list[dict]": [
{"method": "unpack", "copy_type": "shallow", "time_µs": 56149},
{"method": "extend", "copy_type": "shallow", "time_µs": 52991},
{"method": "combine", "copy_type": "shallow", "time_µs": 53726},
{"method": "deepcopy", "copy_type": "deep", "time_µs": 2702616},
{"method": "copy", "copy_type": "shallow", "time_µs": 52204},
{"method": "slice", "copy_type": "shallow", "time_µs": 52223},
{"method": "loop", "copy_type": "shallow", "time_µs": 836928}]},
{"list[list]": [
{"method": "unpack", "copy_type": "deep", "time_µs": 52313},
{"method": "extend", "copy_type": "deep", "time_µs": 52550},
{"method": "combine", "copy_type": "deep", "time_µs": 53203},
{"method": "deepcopy", "copy_type": "deep", "time_µs": 2608560},
{"method": "copy", "copy_type": "deep", "time_µs": 53210},
{"method": "slice", "copy_type": "deep", "time_µs": 52937},
{"method": "loop", "copy_type": "deep", "time_µs": 834774}
]}]
Frame challenge: do you actually need to copy, for your application?
I often see code that tries to modify a copy of the list in some iterative fashion. To construct a trivial example, suppose we had non-working (because x should not be modified) code like:
x = [8, 6, 7, 5, 3, 0, 9]
y = x
for index, element in enumerate(y):
    y[index] = element * 2
# Expected result:
# x = [8, 6, 7, 5, 3, 0, 9] <-- this is where the code is wrong.
# y = [16, 12, 14, 10, 6, 0, 18]
Naturally people will ask how to make y be a copy of x, rather than a name for the same list, so that the for loop will do the right thing.
But this is the wrong approach. Functionally, what we really want to do is make a new list that is based on the original.
We don't need to make a copy first to do that, and we typically shouldn't.
When we need to apply logic to each element
The natural tool for this is a list comprehension. This way, we write the logic that tells us how the elements in the desired result, relate to the original elements. It's simple, elegant and expressive; and we avoid the need for workarounds to modify the y copy in a for loop (since assigning to the iteration variable doesn't affect the list - for the same reason that we wanted the copy in the first place!).
For the above example, it looks like:
x = [8, 6, 7, 5, 3, 0, 9]
y = [element * 2 for element in x]
List comprehensions are quite powerful; we can also use them to filter out elements by a rule with an if clause, and we can chain for and if clauses (it works like the corresponding imperative code, with the same clauses in the same order; only the value that will ultimately end up in the result list, is moved to the front instead of being in the "innermost" part). If the plan was to iterate over the original while modifying the copy to avoid problems, there is generally a much more pleasant way to do that with a filtering list comprehension.
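As a rough illustration of the filtering and chaining just described (the variable names and the particular rules are made up for this sketch):

```python
x = [8, 6, 7, 5, 3, 0, 9]

# Filter: keep only the even elements; x itself is not modified.
evens = [element for element in x if element % 2 == 0]

# Filter and transform in one pass: double the elements greater than 4.
doubled_large = [element * 2 for element in x if element > 4]

# Chained clauses read in the same order as the equivalent nested loops:
#   for a in x[:3]: for b in x[:3]: if a < b: ...
ordered_pairs = [(a, b) for a in x[:3] for b in x[:3] if a < b]

print(evens)          # [8, 6, 0]
print(doubled_large)  # [16, 12, 14, 10, 18]
print(ordered_pairs)  # [(6, 8), (6, 7), (7, 8)]
```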
When we need to reject or insert specific elements by position
Suppose instead that we had something like
x = [8, 6, 7, 5, 3, 0, 9]
y = x
del y[2:-2] # oops, x was changed inappropriately
Rather than making y a separate copy first in order to delete the part we don't want, we can build a list by putting together the parts that we do want. Thus:
x = [8, 6, 7, 5, 3, 0, 9]
y = x[:2] + x[-2:]
Handling insertion, replacement etc. by slicing is left as an exercise. Just reason out which subsequences you want the result to contain. A special case of this is making a reversed copy - assuming we need a new list at all (rather than just to iterate in reverse), we can directly create it by slicing, rather than cloning and then using .reverse.
These approaches - like the list comprehension - also have the advantage that they create the desired result as an expression, rather than by procedurally modifying an existing object in-place (and returning None). This is more convenient for writing code in a "fluent" style.
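One possible way to work the insertion, replacement and reversal cases mentioned above, purely by slicing (the inserted values and positions are arbitrary choices for this sketch):

```python
x = [8, 6, 7, 5, 3, 0, 9]

# Insertion: everything before index 2, the new items, everything after.
inserted = x[:2] + [100, 200] + x[2:]

# Replacement: same idea, but skip the part being replaced (indices 2 and 3).
replaced = x[:2] + ['a', 'b'] + x[4:]

# Reversed copy: a slice with a negative step builds a new list.
reversed_copy = x[::-1]

print(inserted)       # [8, 6, 100, 200, 7, 5, 3, 0, 9]
print(replaced)       # [8, 6, 'a', 'b', 3, 0, 9]
print(reversed_copy)  # [9, 0, 3, 5, 7, 6, 8]
print(x)              # the original list is untouched
```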
new_list = my_list
does not make a copy, because new_list will only be a reference to the same list as my_list, and changes made through new_list will automatically also be visible through my_list and vice versa.
There are two easy ways to copy a list
new_list = my_list.copy()
or
new_list = list(my_list)
Short and simple explanations of each copy mode:
A shallow copy constructs a new compound object and then (to the extent possible) inserts references into it to the objects found in the original. list() creates a shallow copy (plain assignment, new_list = my_list, copies nothing and only binds a second name to the same list):
new_list = list(my_list)
A deep copy constructs a new compound object and then, recursively, inserts copies into it of the objects found in the original. copy.deepcopy() creates a deep copy:
new_list = copy.deepcopy(my_list)
A shallow copy made with list() is sufficient for simple lists whose elements are immutable, like:
my_list = ["A","B","C"]
But, for complex lists like...
my_complex_list = [{'A' : 500, 'B' : 501},{'C' : 502}]
...use deepcopy():
import copy
new_complex_list = copy.deepcopy(my_complex_list)
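A short check of why the nested case needs deepcopy, using the same my_complex_list (shallow_list and deep_list are names introduced here for illustration):

```python
import copy

my_complex_list = [{'A': 500, 'B': 501}, {'C': 502}]

shallow_list = list(my_complex_list)        # new outer list, same dicts inside
deep_list = copy.deepcopy(my_complex_list)  # new outer list and new dicts

# Mutate a nested dict through the original list.
my_complex_list[0]['A'] = -1

print(shallow_list[0]['A'])  # -1: the dict is shared with the original
print(deep_list[0]['A'])     # 500: the deep copy has its own dict
```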

Most efficient way to select pairs of unique numbers out of many pairs

Problem
I have a list of integer pairs. I want to select pairs from this list such that the following conditions would be satisfied:
None of the numbers should appear in more than one pair.
The maximum number of pairs should be selected.
Notice that there might be multiple answers for the same data, but we just want one.
Example
Let's say our list is the following:
(1, 2)
(2, 3)
(2, 4)
(1, 5)
(10, 11)
The simplest algorithm that guarantees the satisfaction of the first condition only, would be just selecting the first pairs with non-duplicate numbers:
(1, 2)
(10, 11)
However, the valid algorithm that satisfies both, should return the following:
(2, 3) or (2, 4)
(1, 5)
(10, 11)
As already mentioned by @kaya3, your problem is called maximum cardinality matching, and you can read more about this here.
Some people suggested using or-tools for this, and so I wanted to show how it could be done (my solution is probably not the most efficient though).
Preparation
Add a sat solver and prepare initial data:
from ortools.sat.python import cp_model
pairs = [[1, 2],
[2, 3],
[2, 4],
[1, 5],
[10, 11]]
solver = cp_model.CpSolver()
model = cp_model.CpModel()
Define edges
First, you define an array of boolean variables:
edges = {}
for id1 in range(len(pairs)):
    edges[id1] = model.NewBoolVar('edges[%i]' % id1)
edges[i] is 1 if your pair is included in the final solution, and 0 otherwise.
Pair values are different from the other
For each pair of numbers from the original list, values must be different from values in the other pairs to be included in the final solution.
for i in range(len(pairs)):
    e1 = edges[i]
    pair1_0 = model.NewConstant(pairs[i][0])
    pair1_1 = model.NewConstant(pairs[i][1])
    for j in range(i + 1, len(pairs)):
        e2 = edges[j]
        pair2_0 = model.NewConstant(pairs[j][0])
        pair2_1 = model.NewConstant(pairs[j][1])
        model.Add(pair1_0 != pair2_0).OnlyEnforceIf(e1).OnlyEnforceIf(e2)
        model.Add(pair1_1 != pair2_0).OnlyEnforceIf(e1).OnlyEnforceIf(e2)
        model.Add(pair1_0 != pair2_1).OnlyEnforceIf(e1).OnlyEnforceIf(e2)
        model.Add(pair1_1 != pair2_1).OnlyEnforceIf(e1).OnlyEnforceIf(e2)
Basically, what we do above, is enforce uniqueness of numbers for any pair of pairs (sorry for tautology).
Maximise number of pairs in solution
That's the easiest part. Just enforce that the number of included pairs is maximum:
model.Maximize(sum(edges[id] for id in range(len(edges))))
Show solution
status = solver.Solve(model)
if status == cp_model.OPTIMAL:
    print('Printing solutions below...')
    for i in range(len(pairs)):
        if solver.Value(edges[i]):
            print(pairs[i])
Output
Printing solutions below...
[2, 4]
[1, 5]
[10, 11]
Full Code
from ortools.sat.python import cp_model
pairs = [[1, 2],
         [2, 3],
         [2, 4],
         [1, 5],
         [10, 11]]
solver = cp_model.CpSolver()
model = cp_model.CpModel()
edges = {}
for id1 in range(len(pairs)):
    edges[id1] = model.NewBoolVar('edges[%i]' % id1)
for i in range(len(pairs)):
    e1 = edges[i]
    pair1_0 = model.NewConstant(pairs[i][0])
    pair1_1 = model.NewConstant(pairs[i][1])
    for j in range(i + 1, len(pairs)):
        e2 = edges[j]
        pair2_0 = model.NewConstant(pairs[j][0])
        pair2_1 = model.NewConstant(pairs[j][1])
        model.Add(pair1_0 != pair2_0).OnlyEnforceIf(e1).OnlyEnforceIf(e2)
        model.Add(pair1_1 != pair2_0).OnlyEnforceIf(e1).OnlyEnforceIf(e2)
        model.Add(pair1_0 != pair2_1).OnlyEnforceIf(e1).OnlyEnforceIf(e2)
        model.Add(pair1_1 != pair2_1).OnlyEnforceIf(e1).OnlyEnforceIf(e2)
model.Maximize(sum(edges[id] for id in range(len(edges))))
status = solver.Solve(model)
if status == cp_model.OPTIMAL:
    print('Printing solutions below...')
    for i in range(len(pairs)):
        if solver.Value(edges[i]):
            print(pairs[i])
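For very small inputs, the result can be cross-checked without a solver. The following is a brute-force sketch, not the approach used above: max_disjoint_pairs is a hypothetical helper that tries subsets from largest to smallest, so its running time is exponential and it is only meant for sanity checks.

```python
from itertools import combinations

def max_disjoint_pairs(pairs):
    """Return a largest subset of pairs in which no number repeats.

    Brute force: tries every subset, largest first. Exponential time,
    so only suitable for checking small examples.
    """
    for size in range(len(pairs), 0, -1):
        for subset in combinations(pairs, size):
            numbers = [n for pair in subset for n in pair]
            if len(numbers) == len(set(numbers)):  # no number used twice
                return list(subset)
    return []

pairs = [(1, 2), (2, 3), (2, 4), (1, 5), (10, 11)]
result = max_disjoint_pairs(pairs)
print(result)  # three pairs, for example [(2, 3), (1, 5), (10, 11)]
```

Any answer of the same size is equally valid, so this may pick (2, 3) where the solver picked (2, 4).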

Algorithm: Find out which objects hold the subset of input array

We have some objects (about 100,000), each of which has a property holding some integers (ranging from 1 to 20,000, at most 20 elements, no duplicates):
For example:
object_1: [1, 4]
object_2: [1, 3]
object_3: [100]
The problem is: given an input array of integers (called A), find the objects whose integers form a subset of A.
For example:
when A = [1], the output should be []
when A = [1, 4], the output should be [object_1]
when A = [1, 3, 4], the output should be [object_1, object_2]
The problem can be described in python:
from typing import List
# problem description
class Object(object):
    def __init__(self, integers):
        self.integers = integers
    def size(self):
        return len(self.integers)
object_1 = Object([1, 4])
object_2 = Object([1, 3])
object_3 = Object([100])
def _find_subset_objects(integers): # type: (List[int]) -> List[Object]
    raise NotImplementedError()
def test(find_subset_objects=_find_subset_objects):
    assert find_subset_objects([1]) == []
    assert find_subset_objects([1, 4]) == [object_1]
    assert find_subset_objects([1, 3, 4]) == [object_1, object_2]
Is there an algorithm or data structure designed to solve this kind of problem?
Store the objects in an array. The indices will be 0 ... ~100K. Then create two helper arrays.
The first one holds the element count of every object. I will call this array obj_total. (This could be omitted by calling object.size or something similar if you wish.)
Second one initialized with zeroes. I will call it current_object_count.
For every integer property p where 0 < p <= 20000, create a list of indices where index i in the list means that the element is contained in the i-th object.
It is getting messy and I'm getting lost in the names. Time for the example with the objects that you used in the question:
objects = [[1, 4], [1, 3], [100]]
obj_total = [2, 2, 1]
current_object_count = [0, 0, 0]
object_1_ref = [0, 1]
object_2_ref = [ ]
object_3_ref = [1]
object_4_ref = [0]
object_100_ref = [2]
object_refs = [object_1_ref ,object_2_ref , ... , object_100_ref]
#Note that you will want to build these references dynamically.
#I'm writing them out only for the sake of clarity.
Now we are given the input array, for example [1, 3, 4]. For every element i in the array, we look at object_i_ref. We then use the indices in that reference list to increase the corresponding values in the current_object_count array.
Whenever you increase a value in the current_object_count[x], you also check against the obj_total[x] array. If the values match, the object in objects[x] is a subset of the input array and we can note it down.
When you finish with the input array you have all the results.
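The steps above can be sketched in Python roughly as follows. This is one possible reading of the description, with a dictionary standing in for the per-integer reference lists; build_index and find_subset_objects are names chosen for this sketch, and it returns object indices rather than the objects themselves.

```python
from collections import defaultdict

def build_index(objects):
    """Map each integer to the indices of the objects that contain it."""
    refs = defaultdict(list)
    for index, integers in enumerate(objects):
        for value in integers:
            refs[value].append(index)
    return refs

def find_subset_objects(objects, refs, query):
    """Return indices of objects whose integers all appear in query."""
    counts = defaultdict(int)  # plays the role of current_object_count
    found = []
    for value in set(query):   # ignore duplicates in the input array
        for index in refs.get(value, ()):
            counts[index] += 1
            # len(objects[index]) plays the role of obj_total[index]
            if counts[index] == len(objects[index]):
                found.append(index)
    return sorted(found)

objects = [[1, 4], [1, 3], [100]]
refs = build_index(objects)

print(find_subset_objects(objects, refs, [1]))        # []
print(find_subset_objects(objects, refs, [1, 4]))     # [0]
print(find_subset_objects(objects, refs, [1, 3, 4]))  # [0, 1]
```

The work per query is proportional to the total length of the reference lists for the integers in the input, not to the number of objects. An object with no integers at all would never be reported by this sketch, so that case would need separate handling.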

Scala : Sorting list of number based on another list

I am implementing an algorithm in Scala where I have a set of nodes (integers), and each node has one property associated with it; let's call that property "d" (which is again an integer).
I have a List[Int]; this list contains the nodes in descending order of the value "d".
I also have a Map[Int,Iterable[Int]], where the key is a node and the value is the list of all its neighbors.
The question is, how can I store the list of neighbors for a node in the Map in descending order of property "d"?
Example :
List 1 : List[1,5,7,2,4,8,6,3] --> Imagine this list is sorted in some order and has all the numbers.
Map : Map[Int,Iterable[Int]] --> [1 , Iterable[2,3,4,5,6]]
This iterable may or may not have all numbers.
In simple words, I want the numbers in Iterable to be in same order as in List 1.
So my entry in Map should be : [1, Iterable[5,2,4,6,3]]
The easiest way to do this is to just filter the sorted list.
val list = List(1,5,7,2,4,8,6,3)
val map = Map(1 -> List(2,3,4,5,6),
2 -> List(1,2,7,8))
val map2 = map.mapValues(neighbors => list.filter(neighbors.contains))
println(map2)
Here is a possible solution utilizing foldLeft (note we get an ArrayBuffer at end instead of desired Iterable, but the type signature does say Iterable):
scala> val orderTemplate = List(1,5,7,2,4,8,6,3)
orderTemplate: List[Int] = List(1, 5, 7, 2, 4, 8, 6, 3)
scala> val toOrder = Map(1 -> Iterable(2,3,4,5,6))
toOrder: scala.collection.immutable.Map[Int,Iterable[Int]] = Map(1 -> List(2, 3, 4, 5, 6))
scala> val ordered = toOrder.mapValues(iterable =>
         orderTemplate.foldLeft(Iterable.empty[Int])((a, i) =>
           if (iterable.toBuffer.contains(i)) a.toBuffer :+ i
           else a
         )
       )
ordered: scala.collection.immutable.Map[Int,Iterable[Int]] = Map(1 -> ArrayBuffer(5, 2, 4, 6, 3))
Here's what I got.
val lst = List(1,5,7,2,4,8,6,3)
val itr = Iterable(2,3,4,5,6)
itr.map(x => (lst.indexOf(x), x))
.toArray
.sorted
.map(_._2)
.toIterable // res0: Iterable[Int] = WrappedArray(5, 2, 4, 6, 3)
I coupled each entry with its relative index in the full list.
Can't sort iterables so went with Array (for no particular reason).
Tuples sorting defaults to the first element.
Remove the indexes.
Back to Iterable.

Find closest numbers in array to given value

I'm looking to create a method that will return the 5 closest numbers in an array. Here is what I have to get me started. I'm looking to compare differences, but I feel there has to be a simpler way.
def get_suggested_items
  @suggested_items = []
  new_price = self.price
  products = Product.all
  products.each do |product, difference|
    price = product.price
    old_difference = new_price - product.price
    difference = (new_price - product.price).abs
    while difference < old_difference
      @suggested_items << product
    end
  end
I'd like it to return the array @suggested_items with the 5 closest products by price.
SQL was designed for this sort of thing. Add the following class method to your Product model:
class Product < ActiveRecord::Base
  def self.with_price_nearest_to(price)
    order("abs(products.price - #{price})")
  end
end
Then you can write:
Product.with_price_nearest_to(3.99).limit(5)
There is a distinct performance advantage to this approach over what you outlined in your question. In this case, the database does the calculation and sorting for you and returns to ActiveRecord only the 5 products that you need. When you do Product.all or even Product.each you're forcing ActiveRecord to instantiate a model for every row in your table, which gets expensive as the table gets larger.
Note that this approach still requires a full table scan; if you want to improve the performance further, you can add an index on the price column of the products table.
Suppose arr is a sorted array of integers. (If it's not sorted, then sort as the first step.)
I assume you want to find a sequence of five elements from the array, a = arr[i,5], such that a.last-a.first is minimum over all i, 0 <= i <= arr.size-5. If that's correct, then it's simply:
start_index = (arr.size-4).times.min_by { |i| arr[i+4]-arr[i] }
Suppose
arr = [1, 2, 4, 5, 8, 9, 11, 12, 13, 15, 17, 19, 23, 24, 24, 25, 30]
start_index = (arr.size-4).times.min_by { |i| arr[i+4]-arr[i] }
#=> 4
So the "closest" five numbers would be:
arr[4,5]
#=> [8, 9, 11, 12, 13]

Resources