Polymorphic HashTable like datastructure - data-structures

I'm creating a client side cache object and one of the consumers of the cache needs a means of looking up data by type. Obviously I can't just have a map from class to data since that wouldn't retrieve subtypes of the class. Is there a 'standard' or well suited data structure for this kind of thing?

Instead of using a HashTable, it'd be easier to use a tree, since a tree would easily represent the type heirarchy.

If you key you HashMap by Class objects it will behave exactly as you expect it to. There will be no lookup on "subtype".

Related

Data Structure - Abstract data type VS Concrete data type

Abstract data type (ADT) : Organized data and operations on this data
Examples : Stack, Queue
what's meaning of Concrete data type (CDT)?
please explain by Examples.
One way to understand it is that an ADT is a specification of an object with certain methods.
For example if we talk about a List we are referring to an object that performs list operations such as:
add to the beginning
add to the end
insert at position
size
etc..
A "concrete data type" in this context would refer to the actual Data Structure you use to implement the list.
For example, one implementation of a List is to create nodes with a value and next pointer to point to the next node in the list.
Another is to have a value array, and a next array to tell you where the next node is (this is a more popular implementation for parallelism).
And yet another is to have a dynamic array (known as an ArrayList in Java) where you use an array till it fills up and then you duplicate it's size and copy the values to the new array.
So the concrete data type refers to the data structure actually being used, whereas the ADT is the abstract concept like List, Dictionary, Stack, Queue, Graph, etc..
There are many ways to implement an ADT.

java customize a hashmap values

I am working on using a real time application in java, I have a data structure that looks like this.
HashMap<Integer, Object> myMap;
now this works really well for storing the data that I need but it kills me on getting data out. The underlying problems that I run into is that if i call
Collection<Object> myObjects = myMap.values();
Iterator<object> it = myObjects.iterator();
while(it.hasNext(){ object o = it.next(); }
I declare the iterator and collection as variable in my class, and assign them each iteration, but iterating over the collection is very slow. This is a real time application so need to iterate at least 25x per second.
Looking at the profiler I see that there is a new instance of the iterator being created every update.
I was thinking of two ways of possibly changing the hashmap to possibly fix my problems.
1. cache the iterator somehow although i'm not sure if that's possible.
2. possibly changing the return type of hashmap.values() to return a list instead of a collection
3. use a different data structure but I don't know what I could use.
If this is still open use Google Guava collections. They have things like multiMap for the structures you are defining. Ok, these might not be an exact replacement, but close:
From the website here: https://code.google.com/p/guava-libraries/wiki/NewCollectionTypesExplained
Every experienced Java programmer has, at one point or another, implemented a Map> or Map>, and dealt with the awkwardness of that structure. For example, Map> is a typical way to represent an unlabeled directed graph. Guava's Multimap framework makes it easy to handle a mapping from keys to multiple values. A Multimap is a general way to associate keys with arbitrarily many values.

hadoop CustomWritables

I have more of a design question regarding the necessity of a CustomWritable for my use case:
So I have a document pair that I will process through a pipeline and write out intermediate and final data to HDFS. My key will be something like ObjectId - DocId - Pair - Lang. I do not see why/if I will need a CustomWritable for this use case. I guess if I did not have a key, I would need a CustomWritable? Also, when I write data out to HDFS in the Reducer, I use a Custom Partitioner. So, that would kind of eliminate my need for a Custom Writable?
I am not sure if I got the concept of the need for a Custom Writable right. Can someone point me in the right direction?
Writables can be used for de/serializing objects. For example a log entry can contain a timestamp, an user IP and the browser agent. So you should implement your own WritableComparable for a key that identifies this entry and you should implement a value class that implements Writable that reads and writes the attributes in your log entry.
These serializations are just a handy way to get the data from a binary format to an object. Some Frameworks like HBase still require byte arrays to persist the data. So you'll have a lot of overhead in transfering this by yourself and messes up your code.
Thomas' answer explains a bit. Its way too late but I'd like to add the following for prospective readers:
Partitioner only comes into play between the map and reduce phase and has no role to play in writing from reducer to output files.
I don't believe writing INTERMEDIATE data to hdfs is a requirement in most cases, although there are some hacks that can be applied to do the same.
When you write from a reducer to hdfs, the keys will automatically be sorted and each reducer will write to ONE SEPARATE file. Based on their compareTo method, keys are sorted. So if you want to sort based on multiple variables, go for a Custom key class that extends WritableComparable, and implement the write, readFields and compareTo methods. You can now control the way the keys are sorted, based on the compareTo implementation

What is the best data structure for this in-memory lookup table?

I need to store a lookup table as an instance member in one of my classes. The table will be initialized when the object is constructed. Each "row" will have 3 "columns":
StringKey (e.g., "car")
EnumKey (e.g., LookupKeys.Car)
Value (e.g, "Ths is a car.")
I want to pick the data structure that will yield the best performance for doing lookups either by the StringKey or the EnumKey.
It's kind of awkward having 2 keys for the same dictionary value. I've never encountered this before, so I'm wondering what the norm is for this type of thing.
I could make a Key/Value/Value structure instead of Key/Key/Value, but I'm wondering what type of performance impact that would have.
Am I thinking about this all wrong?
Well ... "Wrong" is a harsh way of putting it. I think that because the most common dictionary is "single key to value", and a lot of effort goes into providing efficient data structures for that (maps), it's often best to just use two of those, sharing the memory for the values if at all possible.
You have two hashmaps.
One from StringKey to value.
One from EnumKey to value.
You do not have to duplicate all the Value instances, those objects can be shared between the two hashmaps.
If it's a LOT of items, you might want to use two treemaps instead of two hashmaps. But the essential principle ("Share the Values") applies to both structures. One set of Values with two maps.
Is it really necessary to key into the same structure with both types of key? You probably don't need to rebuild a complex data structure yourself. You could do some sort of encapsulation for the lookup table so that you really have two lookup tables if memory is not an issue. You could use this encapsulating structure to simulate being able to pull out the value from the "same" structure with either type of key.
OR
If there is some way to map between the enum value and the string key you could go that route with only having one type of lookup table.
LINQ's ILookup(TKey, TElement) interface may help. Assuming your Dictionary is something like:
Dictionary<carKey, carValue> cars;
You could use:
ILookUp<carValue, carKey> lookup = cars.ToLookup(x => x.Value, x => x.Key);
(...actually I think I might have slightly misread the question - but an ILookUp might still fit the bill, but the key/value set might need to be the key and the enum.)
If every value is guaranteed to be accessible by both types of keys, another idea would be to convert one type of key to another. For example:
public Value getValue(String key)
{
dictionary.get(key); // normal way
}
public Value getValue(Enum enumKey)
{
String realKey = toKey(enumKey);
getValue(realKey); // use String key
}
You could have your Enum implement a toKey() method that returns their String key, or maybe have another dictionary that maps Enum values to the String counterparts.

NSCoder vs NSDictionary, when do you use what?

I'm trying to figure out how to decide when to use NSDictionary or NSCoder/NSCoding?
It seems that for general property lists and such that NSDictionary is the easy way to go that generates XML files that are easily editable outside of the application.
When dealing with custom classes that holds data or possibly other custom classes nested inside, it seems like NSCoder/NSCoding would be the better route since it will step through all the contained object classes and encode them as well when an archive command is used.
NSDictionary seems like it would take more work to get all the properties or data characteristics to a single level to be able to save it, where as NSCoder/NSCoding would automatically encode nested custom classes that implement the NSCoding interface.
Outside of it being binary data and not editable outside of your application is there a real reason to use one over the other? And along those lines is there an indicator of which way you should lean between the two? Am I missing something obvious?
Apple's documentation on object graphs has this to say:
Mac OS X serializations store a simple hierarchy of value objects, such as dictionaries, arrays, strings, and binary data. The serialization only preserves the values of the objects and their position in the hierarchy. Multiple references to the same value object might result in multiple objects when deserialized. The mutability of the objects is not maintained.
…
Mac OS X archives store an arbitrarily complex object graph. The archive preserves the identity of every object in the graph and all the relationships it has with all the other objects in the graph. When unarchived, the rebuilt object graph should, with few exceptions, be an exact copy of the original object graph.
The way I interpret this is that, if you want to store simple values, serialization (using an NSDictionary, for example) is a fine way to go. If you want to store an object graph of arbitrary types, with uniqueness and mutability preserved, using archives (with NSCoder, for example) is your best bet.
You may also want to read Apple's Archives and Serializations Programming Guide for Cocoa, of which the aforelinked page on object graphs is a part, as it covers this topic well.
I am NOT a big fan of using NSCoding/NSCoder/NSArchiver (we need to pick a name!) to serialise an object graph to a file.
Archives created in this way are incredibly fragile. If you save an object of class Foo then by golly you need to make sure when you load the data back in you have a class Foo in your application.
This makes NSCoder based serialisation difficult from the perspective of sharing files with other applications or even forwards compatibility with your future application.
I forgot to list what I would recommend.
NSCoding can be ok in certain situations: if you're just doing something quick and simple (although you do have to write a lot of code - two methods per class to be serialised). It can also be ok if you're not worried about compatibility with other applications.
Export/import via property lists (perhaps using the NSPropertyListSerializaion class) is a fine solution. XML based plists are easy to create and edit. Main advantage to plists is that you're not tying the file format to just your application.
You can also create your own XML based file format and read/write to it using NSXMLDocument API and friends. This really isn't much more work than using property lists.
I think you're a bit confused, NSDictionary is a data structure, it also happens to implement the NSCoding protocol. So in essence, you could either put all your data into a NSDictionary and have that encode itself later on, or you can implement the NSCoding protocol and encode your object tree using the NSCoder API. Based on the type of NSCoder object passed in to the encodeWithCoder: method, is the output of your encoding.

Resources