Pool objects, keep reference to pool, make illegal states unrepresentable - data-structures

I have a number of objects and I'd like to "pool" them, i.e., put them into lists or sets such that
every object appears in at most one list, and
every object knows which list it's in.
In Python, I could do
# create objects
pool1 = [obj1, obj5, obj6]
pool2 = [obj3]
pool3 = [obj8, obj7]
obj1.pool = pool1
obj2.pool = None
obj3.pool = pool2
obj4.pool = None
obj5.pool = pool1
obj6.pool = pool1
obj7.pool = pool3
obj8.pool = pool3
This works, but has the disadvantage that the data structure can represent illegal states, e.g.,
pool1 = [obj1]
pool2 = []
obj1.pool = pool2
or
pool1 = [obj1]
pool2 = [obj1]
obj1.pool = pool1
Is there a more fitting data structure for this?

I don't think there is a more fitting data structure for this, because you need the association to work in both directions (from object to list and from list to object).
The best approach is probably to encapsulate this logic in a class and require that callers use only the provided methods to manipulate the data structure.
In Python that could look like this:
class Node:
    def __init__(self, name):
        self.name = name
        self.pool = None

    def __repr__(self):
        return self.name


class Pools:
    def __init__(self):
        self._pools = {}

    def assign(self, poolid, obj):
        # Remove the object from its current pool, if any.
        if obj.pool is not None:
            self._pools[obj.pool].discard(obj)
        # Add it to the new pool (poolid None means "no pool").
        if poolid is not None:
            self._pools.setdefault(poolid, set()).add(obj)
        obj.pool = poolid

    def unassign(self, obj):
        self.assign(None, obj)

    def content(self, poolid):
        return list(self._pools[poolid])
# demo
a = Node("a")
b = Node("b")
c = Node("c")
pools = Pools()
pools.assign(0, a)
pools.assign(0, b)
pools.assign(5, c)
pools.assign(3, a)
pools.assign(3, c)
pools.unassign(b)
print(pools.content(0)) # []
print(pools.content(3)) # ['a', 'c']
print(pools.content(5)) # []
print(a.pool) # 3
print(b.pool) # None
print(c.pool) # 3
You could improve on this and make Pools a subclass of dict, but you get the idea.
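A minimal sketch of that dict-subclass variant (my own illustration of the suggestion, reusing the Node class from above, not code from the original answer) could look like this:
class Pools(dict):
    def assign(self, poolid, obj):
        # Keep both directions of the association in sync in one place.
        if obj.pool is not None:
            self[obj.pool].discard(obj)
        if poolid is not None:
            self.setdefault(poolid, set()).add(obj)
        obj.pool = poolid

    def unassign(self, obj):
        self.assign(None, obj)

d = Node("d")
pools = Pools()
pools.assign(0, d)
pools.assign(3, d)
print(pools[0])  # set()
print(pools[3])  # {d}
print(d.pool)    # 3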

Related

How to implement Elasticsearch advanced search with DRF

I want to implement an advanced search in Elasticsearch with Django REST Framework. I have a search form, and I used a serializer to implement it.
search.py:
class AdvancedSearch(mixins.ListModelMixin, viewsets.GenericViewSet):
    serializer_class = AdvancedSearchSerializer

    def query_builder(self, *args, **kwargs):
        ## building related query
        return query

    @get_db()
    def get_queryset(self, db=None, *args, **kwargs):
        serializer = self.get_serializer(data=self.request.data)
        serializer.is_valid(raise_exception=True)
        query = self.query_builder(search_input=serializer.validated_data)
        response = db.search(query)  # query the elastic with elasticsearch-dsl and return the results
        if not response:
            raise NoteFound()
        return response

    def list(self, request, *args, **kwargs):
        serializer = self.get_serializer(data=request.data)
        self.serializer_class = AdvancedSearchSerializer
        return super(AdvancedSearch, self).list(request, *args, **kwargs)
serializer.py:
class AdvancedSearchSerializer(serializers.Serializer):
    metadata_choices = [('', ''), ...]
    name = serializers.CharField(required=False, label='Name')
    type = serializers.CharField(required=False, label='Type')
    metadata = serializers.CharField(required=False, label='Metadata')
    metadata_fields = serializers.MultipleChoiceField(allow_blank=True, choices=metadata_choices)
    submit_date = serializers.DateTimeField(required=False)

    def to_representation(self, instance):
        output = {}
        output['es_id'] = instance.meta.id
        for attribute_name in instance:
            attribute = getattr(instance, attribute_name)
            if isinstance(attribute, (str, int, bool, float, type(None))):
                # Primitive types can be passed through unmodified.
                output[attribute_name] = attribute
            elif isinstance(attribute, list):
                # Recursively deal with items in lists.
                output[attribute_name] = [
                    self.to_representation(item) for item in attribute
                ]
            elif isinstance(attribute, (dict, AttrDict)):
                temp = attribute.to_dict()
                for key, value in temp.items():
                    print(key, value)
                # Recursively deal with items in dictionaries.
                output[attribute_name] = {
                    str(key): value
                    for key, value in temp.items()
                }
            else:
                # Force anything else to its string representation.
                output[attribute_name] = attribute
        output['highlight'] = instance.meta.highlight.to_dict()
        return [output]
With this code I get the expected result, but I was wondering whether this is the right approach.
Also, in to_representation I have access to each individual result, but how can I add a total value, such as the number of results?
Thanks in advance.
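For illustration only (this is my own sketch, not part of the original question or an accepted answer; it assumes the elasticsearch-dsl response returned by get_queryset supports len()), one common way to expose a total is to override list() and wrap the serialized hits with a count:
from rest_framework import mixins, viewsets
from rest_framework.response import Response

class AdvancedSearch(mixins.ListModelMixin, viewsets.GenericViewSet):
    serializer_class = AdvancedSearchSerializer

    def list(self, request, *args, **kwargs):
        results = self.get_queryset()  # the Elasticsearch lookup defined above
        serializer = self.get_serializer(results, many=True)
        # Return the per-hit representations together with a top-level total.
        return Response({'count': len(results), 'results': serializer.data})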

Storing and retrieving object in ray.io

I have a ray cluster running on a machine as below:
ray start --head --redis-port=6379
I have two files that need to run on the cluster.
Producer p_ray.py:
import ray

ray.init(address='auto', redis_password='5241590000000000')

@ray.remote
class Counter(object):
    def __init__(self):
        self.n = 0

    def increment(self):
        self.n += 1

    def read(self):
        return self.n

counters = [Counter.remote() for i in range(4)]
[c.increment.remote() for c in counters]
futures = [c.read.remote() for c in counters]
print(futures, type(futures[0]))

obj_id = ray.put(ray.get(futures))
print(obj_id)
print(ray.get(obj_id))

while True:
    pass
Consumer c_ray.py:
import ray
ray.init(address='auto', redis_password='5241590000000000')
[objs] = ray.objects()
print('OBJ-ID:', objs, 'TYPE:', type(objs))
print(ray.get([objs]))
My intention is to store the future objects from the producer and retrieve them in the consumer. I can retrieve the object ID in the consumer; however, the get in the consumer never returns.
What am I doing wrong?
How can I achieve what I want?
This particular case might be a bug (I am not 100% sure); I created an issue on the Ray GitHub.
But this is not a good way to get hold of objects created by p_ray.py: if you have many objects, it becomes extremely complicated to manage. You can implement the same thing with a detached actor: https://ray.readthedocs.io/en/latest/advanced.html#detached-actors.
The idea is to create a detached actor that can be retrieved by any driver or worker running in the same cluster.
p_ray.py
import ray

ray.init(address='auto', redis_password='5241590000000000')

@ray.remote
class DetachedQueue:
    def __init__(self):
        self.dict = {}

    def put(self, key, value):
        self.dict[key] = value

    def get(self):
        return self.dict

@ray.remote
class Counter(object):
    def __init__(self):
        self.n = 0

    def increment(self):
        self.n += 1

    def read(self):
        return self.n

queue = DetachedQueue.remote(name="queue_1", detached=True)
counters = [Counter.remote() for i in range(4)]
[c.increment.remote() for c in counters]
futures = [c.read.remote() for c in counters]
print(futures, type(futures[0]))

queue.put.remote("key", ray.get(futures))

while True:
    pass
c_ray.py:
import ray
ray.init(address='auto', redis_password='5241590000000000')
queue = ray.util.get_actor("queue_1")
print(ray.get(queue.get.remote()))

Overriding from_yaml to add custom YAML tag

Is overriding from_yaml enough to register a tag for a class, or is it necessary to use yaml.add_constructor(Class.yaml_tag, Class.from_yaml)? If I don't use the add_constructor method, my YAML tags are not recognized. Example of what I have:
import yaml

class Something(yaml.YAMLObject):
    yaml_tag = u'!Something'

    @classmethod
    def from_yaml(cls, loader, node):
        # Set attributes to None if not in file
        values = loader.construct_mapping(node, deep=True)
        attr = ['attr1', 'attr2']
        result = {}
        for val in attr:
            try:
                result[val] = values[val]
            except KeyError:
                result[val] = None
        return cls(**result)
Is this enough for it to work? I'm confused with the use of from_yaml vs any other constructor you would register using the method I mentioned above. I suppose there's something fundamental I'm missing, since they say:
Subclassing YAMLObject is an easy way to define tags, constructors,
and representers for your classes. You only need to override the
yaml_tag attribute. If you want to define your custom constructor and
representer, redefine the from_yaml and to_yaml method
correspondingly.
There is indeed no need to register explicitly:
import yaml

class Something(yaml.YAMLObject):
    yaml_tag = u'!Something'

    def __init__(self, *args, **kw):
        print('some_init', args, kw)

    @classmethod
    def from_yaml(cls, loader, node):
        # Set attributes to None if not in file
        values = loader.construct_mapping(node, deep=True)
        attr = ['attr1', 'attr2']
        result = {}
        for val in attr:
            try:
                result[val] = values[val]
            except KeyError:
                result[val] = None
        return cls(**result)

yaml_str = """\
test: !Something
  attr1: 1
  attr2: 2
"""

d = yaml.load(yaml_str)
which gives:
some_init () {'attr1': 1, 'attr2': 2}
But there is no need at all to use PyYAML's load(), which is
documented to be unsafe. You can just use safe_load() if you set the yaml_loader class attribute:
import yaml

class Something(yaml.YAMLObject):
    yaml_tag = u'!Something'
    yaml_loader = yaml.SafeLoader

    def __init__(self, *args, **kw):
        print('some_init', args, kw)

    @classmethod
    def from_yaml(cls, loader, node):
        # Set attributes to None if not in file
        values = loader.construct_mapping(node, deep=True)
        attr = ['attr1', 'attr2']
        result = {}
        for val in attr:
            try:
                result[val] = values[val]
            except KeyError:
                result[val] = None
        return cls(**result)

yaml_str = """\
test: !Something
  attr1: 1
  attr2: 2
"""

d = yaml.safe_load(yaml_str)
as this gives the same:
some_init () {'attr1': 1, 'attr2': 2}
(done both with Python 3.6 and Python 2.7)
The registering is done in the __init__() of the metaclass of yaml.YAMLObject:
class YAMLObjectMetaclass(type):
    """
    The metaclass for YAMLObject.
    """
    def __init__(cls, name, bases, kwds):
        super(YAMLObjectMetaclass, cls).__init__(name, bases, kwds)
        if 'yaml_tag' in kwds and kwds['yaml_tag'] is not None:
            cls.yaml_loader.add_constructor(cls.yaml_tag, cls.from_yaml)
            cls.yaml_dumper.add_representer(cls, cls.to_yaml)
So maybe you are somehow interfering with that initialisation in your full class definition. Try to start with a minimal implementation as I did, and add the functionality that you need to your class until things break.
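For the dumping direction, the same metaclass registers to_yaml as a representer on yaml_dumper; here is a minimal sketch of my own (the attr1/attr2 attributes and the SafeDumper choice are assumptions, not taken from the question):
import yaml

class Something(yaml.YAMLObject):
    yaml_tag = u'!Something'
    yaml_loader = yaml.SafeLoader
    yaml_dumper = yaml.SafeDumper

    def __init__(self, attr1=None, attr2=None):
        self.attr1 = attr1
        self.attr2 = attr2

    @classmethod
    def to_yaml(cls, dumper, data):
        # Emit the object as a mapping under its custom tag.
        return dumper.represent_mapping(cls.yaml_tag,
                                        {'attr1': data.attr1, 'attr2': data.attr2})

print(yaml.safe_dump({'test': Something(attr1=1, attr2=2)}))
# the mapping is emitted under the !Something tag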

ModelSerializer with data as list

I have a ModelSerializer class as follows which I want to accept a list of items or a single item (dictionary) as data. The documentation states that passing "many" as True will support my requirement.
class PointSerializer(serializers.ModelSerializer):
    class Meta:
        model = Point

    def __init__(self, *args, **kwargs):
        if "data" in kwargs:
            if isinstance(kwargs["data"], list):
                kwargs["many"] = True
        super(PointSerializer, self).__init__(*args, **kwargs)
Now, providing a data dictionary as follows works:
p = PointSerializer(data={'x':10, 'y': 12})
p.is_valid() # True
But this, with a list of dictionaries, fails:
p = PointSerializer(data=[{'x':10, 'y':12}, {'x':12, 'y':12}])
p.is_valid() # False
p.errors # {'non_field_errors': ['Invalid data. Expected a dictionary, but got a list.']}
UPDATE:
Thanks to the chosen answer, I've changed my code to the following and it works fine:
class PointSerializer(serializers.ModelSerializer):
    class Meta:
        model = Point
>>> ps = PointSerializer(data={'x':10, 'y':12})
>>> ps.is_valid()
... True
>>> ps = PointSerializer(data=[{'x':10, 'y':12}, {'x':12, 'y':12}], many=True)
>>> ps.is_valid()
... True
The many=True argument only works when instantiating the serializer, because it returns a ListSerializer behind the scenes.
Your options are either to pass many=True as a serializer argument in the creation call, or to use the ListSerializer explicitly.
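If you want the list detection to live outside the calling code, one option (my own sketch; make_point_serializer is a made-up helper, not part of the answer) is a tiny factory that decides many before construction, since DRF handles many=True in BaseSerializer.__new__ and setting it inside __init__ comes too late:
def make_point_serializer(data):
    # Pass many=True up front when the payload is a list;
    # DRF then builds a ListSerializer wrapping PointSerializer.
    return PointSerializer(data=data, many=isinstance(data, list))

ps = make_point_serializer([{'x': 10, 'y': 12}, {'x': 12, 'y': 12}])
ps.is_valid()  # True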

How do I mass-assign unique instance variable names when iterating and parsing over an array of hashes?

Here is the big hash that I start with (actually it's been refined a step or two, but this is what I'm starting with at this point).
angel_hash = {"follower_count"=>1369, "name"=>"AngelList", "markets"=>
[{"display_name"=>"Startups", "name"=>"startups", "id"=>448, "tag_type"=>"MarketTag",
"angellist_url"=>"http://angel.co/startups-1"}, {"display_name"=>"Venture Capital",
"name"=>"venture capital", "id"=>856, "tag_type"=>"MarketTag",
"angellist_url"=>"http://angel.co/venture-capital"}], "video_url"=>"",
"created_at"=>"2011-03-18T00:24:29Z", "updated_at"=>"2012-07-09T14:12:28Z",
"product_desc"=>"AngelList is a platform for startups to meet investors and talent. ",
"blog_url"=>"http://blog.angel.co",
"thumb_url"=>"https://s3.amazonaws.com/photos.angel.co/startups/i/6702-
766d1ce00c99ce9a5cbc19d0c87a436e-thumb_jpg.jpg", "id"=>6702,
"company_url"=>"http://angel.co", "locations"=>[{"display_name"=>"San Francisco",
"name"=>"san francisco", "id"=>1692, "tag_type"=>"LocationTag",
"angellist_url"=>"http://angel.co/san-francisco"}], "community_profile"=>false, "status"=>
{"message"=>"Done Deal: #volunteerspot raises $1.5M
http://techcrunch.com/2012/06/27/targeting-power-moms-volunteerspot-secures-1-5m-in-
series-a-from-ff-venture-capital-and-more/ \316\207 20 intros on AngelList \316\207 Funded
by #ff-venture-capital", "created_at"=>"2012-06-28T20:37:58Z", "id"=>63110},
"twitter_url"=>"http://twitter.com/angellist", "high_concept"=>"A platform for startups",
"logo_url"=>"https://s3.amazonaws.com/photos.angel.co/startups/i/6702
-766d1ce00c99ce9a5cbc19d0c87a436e-medium_jpg.jpg",
"angellist_url"=>"http://angel.co/angellist", "screenshots"=>
[{"thumb"=>"https://s3.amazonaws.com/screenshots.angel.co/98/6702/009cff275fb96709c915c4d4abc9
43d6-thumb_jpg.jpg",
"original"=>"https://s3.amazonaws.com/screenshots.angel.co/98/6702/009cff275fb96709c915c4d4abc
943d6-original.jpg"}], "hidden"=>false}
Out of this hash I parsed out some elements, and I'm doing just fine until I run into the embedded arrays.
module SimpleAngel
  class Company
    attr_accessor :followers, :company_name, :markets_array, :date_joined, :locations_array
    attr_accessor :high_concept, :high_concept_long, :thumbnail_logo, :full_size_logo
    attr_accessor :angel_url, :twitter_url, :company_url, :blog_url

    def initialize(angel_hash)
      @followers = angel_hash['follower_count']
      @company_name = angel_hash['name']
      @markets_array = angel_hash['markets']
      @markets_array.each_with_index do |market, i|
        # This is where I'm stuck. I want to pull out individual elements
        # from each array AND dynamically assign unique instance variable names for
        # each separate market in the markets array. Something like @market1_name,
        # @market1_id, etc.
      end
      @date_joined = angel_hash['created_at']
      @locations_array = angel_hash['locations']
      @high_concept = angel_hash['high_concept']
      @high_concept_long = angel_hash['product_desc']
      @thumbnail_logo = angel_hash['thumb_url']
      @full_size_logo = angel_hash['logo_url']
      @angel_url = angel_hash['angellist_url']
      @twitter_url = angel_hash['twitter_url']
      @company_url = angel_hash['company_url']
      @blog_url = angel_hash['blog_url']
    end
  end
end
Here's a direct answer to your question: you can define arbitrary instance variables by calling instance_variable_set.
@markets_array.each_with_index do |market, i|
  market.each do |k, v|
    # Instance variable names must start with "@".
    instance_variable_set "@market#{i}_#{k}", v
    # This will define @market0_id = 448, and so on.
  end
end
