ConcurrentModificationException when processing HashMap - freemarker

I'm trying to put a HashMap<Object, List<Object>> into my dataModel, but when I call the template.process() method, I get the following exception:
java.util.ConcurrentModificationException
at java.util.HashMap$HashIterator.nextEntry(HashMap.java:793)
at java.util.HashMap$KeyIterator.next(HashMap.java:828)
at freemarker.template.SimpleCollection$SimpleTemplateModelIterator.next(SimpleCollection.java:142)
at freemarker.core.IteratorBlock$Context.runLoop(IteratorBlock.java:157)
at freemarker.core.Environment.visit(Environment.java:351)
at freemarker.core.IteratorBlock.accept(IteratorBlock.java:95)
at freemarker.core.Environment.visit(Environment.java:196)
at freemarker.core.MixedContent.accept(MixedContent.java:92)
at freemarker.core.Environment.visit(Environment.java:196)
at freemarker.core.IteratorBlock$Context.runLoop(IteratorBlock.java:172)
at freemarker.core.Environment.visit(Environment.java:351)
at freemarker.core.IteratorBlock.accept(IteratorBlock.java:95)
at freemarker.core.Environment.visit(Environment.java:196)
at freemarker.core.MixedContent.accept(MixedContent.java:92)
at freemarker.core.Environment.visit(Environment.java:196)
at freemarker.core.Environment.process(Environment.java:176)
at freemarker.template.Template.process(Template.java:232)
After looking over some articles and older questions, I've tried to use a ConcurrentHashMap instead, to the same result. I've also tried making a copy using new HashMap<Object, List<Object>>(oldHashMap). Are there any other common fixes to this problem I could try?
EDIT: I know the general cause of ConcurrentModificationExceptions. Please only reply if you can help me understand why the framework Freemarker is throwing these exceptions, mkay? =)
Thanks!

The ConcurrentModificationException is caused by using an invalid iterator after the underlying collection has been changed. The only way to fix this is not changing the collection you are iterating over. In most cases this is not caused by multi-threading.
Simple Example:
//throws an exception in the second iteration
for(String s: list){
list.remove(s);//changes the collection
}
fix 1, not supported by all iterators:
Iterator<String> iter = list.iterator();
while(iter.hasNext()){
iter.next();
iter.remove();//iterator still valid
}
fix 2:
List<String> toRemove = ...;
for(String s: list){
toRemove.add(s);
}
list.removeAll(toRemove);
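For anyone who wants to run the snippets above, here is a minimal self-contained sketch. The class and method names (RemovalDemo and so on) are made up for illustration, and the "starts with a" filter is only there so that something is left in the list afterwards. It assumes an ArrayList; other collections may behave differently.

```java
import java.util.ArrayList;
import java.util.ConcurrentModificationException;
import java.util.Iterator;
import java.util.List;

class RemovalDemo {
    // Removes inside a for-each loop; returns true if the fail-fast iterator complained.
    static boolean removeInsideForEach(List<String> list) {
        try {
            for (String s : list) {
                list.remove(s); // structural change behind the iterator's back
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    // Fix 1: remove through the iterator itself, so it stays valid.
    static void removeViaIterator(List<String> list) {
        Iterator<String> it = list.iterator();
        while (it.hasNext()) {
            if (it.next().startsWith("a")) {
                it.remove();
            }
        }
    }

    // Fix 2: collect the elements first, remove them after the loop has finished.
    static void removeAfterLoop(List<String> list) {
        List<String> toRemove = new ArrayList<>();
        for (String s : list) {
            if (s.startsWith("a")) {
                toRemove.add(s);
            }
        }
        list.removeAll(toRemove);
    }

    public static void main(String[] args) {
        List<String> broken = new ArrayList<>(List.of("a", "b", "c"));
        System.out.println("exception thrown: " + removeInsideForEach(broken));

        List<String> fixed = new ArrayList<>(List.of("apple", "bob", "avocado"));
        removeViaIterator(fixed);
        System.out.println(fixed);
    }
}
```

Since the fail-fast check is best-effort, the first method is only a demonstration and not something to rely on.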

The exception means that, while you're iterating over the map, something has changed the map's contents.
Your best course of action is to figure out what that "something" is. For example, it could be another thread, or it could be that you have a foreach loop and modify the map from within the loop.
It is very hard to give advice on how to best fix the problem until we understand what exactly is causing it and what the desired behaviour is.

You'll get this kind of problem on List and Map when doing something like this:
List<A> list = ...; //a list with few elements
for(A anObject : list){
list.add(anotherObject); //modify list inside the loop
}
The same goes for maps. The solution is to look for places where you might be modifying the map inside a loop over that map. Or, if your application is multi-threaded, it's possible that another thread is looping over the map while you are modifying it (or vice versa). In that case you'll need to synchronize access to the map in both places: the looping code and the map-modifying code.
There is some info on this in the Java API documentation for TreeMap:
The iterators returned by the iterator method of the collections returned by all of this class's "collection view methods" are fail-fast: if the map is structurally modified at any time after the iterator is created, in any way except through the iterator's own remove method, the iterator will throw a ConcurrentModificationException. Thus, in the face of concurrent modification, the iterator fails quickly and cleanly, rather than risking arbitrary, non-deterministic behavior at an undetermined time in the future.
Note that the fail-fast behavior of an iterator cannot be guaranteed as it is, generally speaking, impossible to make any hard guarantees in the presence of unsynchronized concurrent modification. Fail-fast iterators throw ConcurrentModificationException on a best-effort basis. Therefore, it would be wrong to write a program that depended on this exception for its correctness: the fail-fast behavior of iterators should be used only to detect bugs.

Synchronise access to the hashmap so that only one thread can be accessing the hashmap at once.
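One possible reading of "synchronise access" is sketched below. The SharedModel class is hypothetical, and whether it fits the asker's situation depends on what is actually modifying the map. It wraps the map with Collections.synchronizedMap and holds the map's lock while iterating, which the JDK documentation requires for iteration over a synchronized map's collection views. The snapshot it hands out also copies the inner lists, so a template could loop over it without touching the live data.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class SharedModel {
    // Single put/get calls are synchronized by the wrapper itself.
    private final Map<String, List<String>> data =
            Collections.synchronizedMap(new HashMap<>());

    void put(String key, List<String> values) {
        // Store a private copy so callers cannot change the list later.
        data.put(key, new ArrayList<>(values));
    }

    // Iteration is NOT covered by the wrapper: hold the map's lock for the whole loop.
    Map<String, List<String>> snapshot() {
        synchronized (data) {
            Map<String, List<String>> copy = new HashMap<>();
            for (Map.Entry<String, List<String>> e : data.entrySet()) {
                copy.put(e.getKey(), new ArrayList<>(e.getValue()));
            }
            return copy;
        }
    }

    public static void main(String[] args) {
        SharedModel model = new SharedModel();
        model.put("fruits", List.of("apple", "pear"));
        Map<String, List<String>> forTemplate = model.snapshot();
        model.put("colors", List.of("red")); // does not affect the snapshot
        System.out.println(forTemplate.keySet());
    }
}
```

Note that new HashMap<>(oldMap) on its own is a shallow copy: the inner lists are still shared with the original.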

Related

Should I use Map<string, Map<string, object>> or Map<string, object> and concat the two keys

I'm working on processing some data and I need to keep a previous state. That state data structure would be something like
type locationId = string;
type alarm = {
alarmId : string,
triggered: Date,
urgency: string
}
type stateData = Map<locationId, Map<alarmId, alarm>> OR Map<locationId+alarmId, alarm>;
In pseudocode it would look like:
for each alarm in alarmList
compare(lastState[locationId][alarm.alarmId], alarm)
or if I concat the two keys:
for each alarm in alarmList
compare(lastState[locationId + "-" + alarmId], alarm)
Which one is the best approach?
How you do this will depend on your data and how you want to access it, and perhaps most importantly how you want to think about it.
Consider the dictionary with a combined key. Something like
type alarmKey = {
locationId: string;
alarmId: string;
}
type alarms = Map<alarmKey, stateData>
With that design, you have to know the location and the alarm ID if you want to look up an alarm. There's no quick way, for example, to get all of the alarms for a particular location. Instead, you have to scan the entire dictionary looking for alarms where alarmKey.locationId="Desired location".
That might not be a bad thing: if your total number of alarms is small (i.e. the map isn't huge), or if getting the list of alarms for a single location isn't a common operation, the full scan costs you very little.
Do note, by the way, that if you go that route you'll need to define a hash code method that will create a hash code for the combined key. You don't say what language you're using. In C#, that method could be as simple as:
return (locationId+alarmId).GetHashCode();
(Yes, I know that it's a horribly inefficient way to compute a hash code for a combined key. But it will in fact work. I'll leave coming up with a better one as a detail to be resolved by the implementor.)
The other way, with a nested map, seems more flexible to me. That is:
type alarms = Map<locationId, Map<alarmId, stateData>>
That lets you easily get all alarms for a single location, and it's also easy to look up an individual alarm.
You're probably going to supply an accessor anyway, so either way will be just as easy to use. That is, regardless of which way you design it, you'll probably have a getter function:
alarm GetAlarm(locationId, alarmId)
{
}
And of course a corresponding setter function.
I don't have knowledge of your application, but when I've encountered this kind of thing in the past my immediate preference is for the nested maps because I find it more flexible. But I readily admit that it's largely a subjective thing, and dependent on the application.
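To make the comparison concrete, here is a rough Java sketch of both layouts side by side. Everything in it is an assumption for illustration: the AlarmStore and AlarmKey names, storing only the urgency string as the value, and writing to both maps at once (a real design would pick one). The key class defines equals and hashCode over both fields, which also avoids a pitfall of plain string concatenation: with a "-" separator, ("a-b", "c") and ("a", "b-c") would both become "a-b-c".

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

class AlarmStore {
    // Combined key: equality and hash code are computed from both fields.
    static final class AlarmKey {
        final String locationId;
        final String alarmId;

        AlarmKey(String locationId, String alarmId) {
            this.locationId = locationId;
            this.alarmId = alarmId;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof AlarmKey)) {
                return false;
            }
            AlarmKey other = (AlarmKey) o;
            return locationId.equals(other.locationId) && alarmId.equals(other.alarmId);
        }

        @Override
        public int hashCode() {
            return Objects.hash(locationId, alarmId);
        }
    }

    private final Map<String, Map<String, String>> nested = new HashMap<>();
    private final Map<AlarmKey, String> flat = new HashMap<>();

    // Writes to both layouts so they can be compared; the value is just the urgency.
    void setAlarm(String locationId, String alarmId, String urgency) {
        nested.computeIfAbsent(locationId, k -> new HashMap<>()).put(alarmId, urgency);
        flat.put(new AlarmKey(locationId, alarmId), urgency);
    }

    String getNested(String locationId, String alarmId) {
        return nested.getOrDefault(locationId, Collections.emptyMap()).get(alarmId);
    }

    String getFlat(String locationId, String alarmId) {
        return flat.get(new AlarmKey(locationId, alarmId));
    }

    // Cheap with the nested layout; the flat layout would need a full scan.
    Map<String, String> alarmsAt(String locationId) {
        return nested.getOrDefault(locationId, Collections.emptyMap());
    }
}
```

Hiding the maps behind getters like these is what makes the choice reversible later, as the answer suggests.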

java customize a hashmap values

I am working on a real-time application in Java, and I have a data structure that looks like this.
HashMap<Integer, Object> myMap;
Now this works really well for storing the data that I need, but it kills me on getting data out. The underlying problem I run into is that if I call
Collection<Object> myObjects = myMap.values();
Iterator<Object> it = myObjects.iterator();
while (it.hasNext()) { Object o = it.next(); }
I declare the iterator and collection as variables in my class and assign them each iteration, but iterating over the collection is very slow. This is a real-time application, so I need to iterate at least 25x per second.
Looking at the profiler I see that there is a new instance of the iterator being created every update.
I was thinking of a few ways of changing the hashmap to possibly fix my problems.
1. cache the iterator somehow, although I'm not sure if that's possible.
2. possibly changing the return type of hashmap.values() to return a list instead of a collection
3. use a different data structure, but I don't know what I could use.
If this is still open, use the Google Guava collections. They have things like Multimap for the structures you are defining. OK, these might not be an exact replacement, but close:
From the website here: https://code.google.com/p/guava-libraries/wiki/NewCollectionTypesExplained
Every experienced Java programmer has, at one point or another, implemented a Map<K, List<V>> or Map<K, Set<V>>, and dealt with the awkwardness of that structure. For example, Map<K, Set<V>> is a typical way to represent an unlabeled directed graph. Guava's Multimap framework makes it easy to handle a mapping from keys to multiple values. A Multimap is a general way to associate keys with arbitrarily many values.

Executing a certain action for all elements in an Enumerable<T>

I have an Enumerable<T> and am looking for a method that allows me to execute an action for each element, kind of like Select but then for side-effects. Something like:
string[] Names = ...;
Names.each(s => Console.WriteLine(s));
or
Names.each(s => GenHTMLOutput(s));
// (where GenHTMLOutput cannot for some reason receive the enumerable itself as a parameter)
I did try Select(s=> { Console.WriteLine(s); return s; }), but it wasn't printing anything.
A quick-and-easy way to get this is:
Names.ToList().ForEach(e => ...);
You are looking for the ever-elusive ForEach that currently only exists on the List generic collection. There are many discussions online about whether Microsoft should or should not add this as a LINQ method. Currently, you have to roll your own:
public static void ForEach<T>(this IEnumerable<T> value, Action<T> action)
{
foreach (T item in value)
{
action(item);
}
}
While the All() method provides similar abilities, its use case is performing a predicate test on every item rather than an action. Of course, it can be persuaded to perform other tasks, but this somewhat changes the semantics and would make it harder for others to interpret your code (i.e. is this use of All() for a predicate test or an action?).
Disclaimer: This post no longer resembles my original answer, but rather incorporates the some seven years experience I've gained since. I made the edit because this is a highly-viewed question and none of the existing answers really covered all the angles. If you want to see my original answer, it's available in the revision history for this post.
The first thing to understand here is that C# LINQ operations like Select(), All(), Where(), etc., have their roots in functional programming. The idea was to bring some of the more useful and approachable parts of functional programming to the .NET world. This is important, because a key tenet of functional programming is for operations to be free of side effects. It's hard to overstate this. However, in the case of ForEach()/each(), side effects are the entire purpose of the operation. Adding each() or ForEach() is not just outside the functional programming scope of the other LINQ operators, but in direct opposition to them.
But I understand this feels unsatisfying. It may help explain why ForEach() was omitted from the framework, but fails to address the real issue at hand. You have a real problem you need to solve. Why should all this ivory tower philosophy get in the way of something that might be genuinely useful?
Eric Lippert, who was on the C# design team at the time, can help us out here. He recommends using a traditional foreach loop:
[ForEach()] adds zero new representational power to the language. Doing this lets you rewrite this perfectly clear code:
foreach(Foo foo in foos){ statement involving foo; }
into this code:
foos.ForEach(foo=>{ statement involving foo; });
His point is, when you look closely at your syntax options, you don't gain anything new from a ForEach() extension versus a traditional foreach loop. I partially disagree. Imagine you have this:
foreach(var item in Some.Long(and => possibly)
.Complicated(set => ofLINQ)
.Expression(to => evaluate))
{
// now do something
}
This code obfuscates meaning, because it separates the foreach keyword from the operations in the loop. It also lists the loop command prior to the operations that define the sequence on which the loop operates. It feels much more natural to want to have those operations come first, and then have the loop command at the end of the query definition. Also, the code is just ugly. It seems like it would be much nicer to be able to write this:
Some.Long(and => possibly)
.Complicated(set => ofLINQ)
.Expression(to => evaluate)
.ForEach(item =>
{
// now do something
});
However, even here, I eventually came around to Eric's point of view. I realized code like you see above is calling out for an additional variable. If you have a complicated set of LINQ expressions like that, you can add valuable information to your code by first assigning the result of the LINQ expression to a new variable:
var queryForSomeThing = Some.Long(and => possibly)
.Complicated(set => ofLINQ)
.Expression(to => evaluate);
foreach(var item in queryForSomeThing)
{
// now do something
}
This code feels more natural. It puts the foreach keyword back next to the rest of the loop, and after the query definition. Most of all, the variable name can add new information that will be helpful to future programmers trying to understand the purpose of the LINQ query. Again, we see the desired ForEach() operator really added no new expressive power to the language.
However, we are still missing two features of a hypothetical ForEach() extension method:
It's not composable. I can't add a further .Where() or GroupBy() or OrderBy() after a foreach loop inline with the rest of the code, without creating a new statement.
It's not lazy. These operations happen immediately. It doesn't allow me to, say, have a form where a user chooses an operation as one field in a larger screen that is not acted on until the user presses a command button. This form might allow the user to change their mind before executing the command. This is perfectly normal (easy even) with a LINQ query, but not as simple with a foreach.
(FWIW, most naive .ForEach() implementations also have these issues. But it's possible to craft one without them.)
You could, of course, make your own ForEach() extension method. Several other answers have implementations of this method already; it's not all that complicated. However, I feel like it's unnecessary. There's already an existing method that fits what we want to do from both semantic and operational standpoints. Both of the missing features above can be addressed by use of the existing Select() operation.
Select() fits the kind of transformation or projection described by both of the examples above. Keep in mind, though, that I would still avoid creating side effects. The call to Select() should return either new objects or projections from the originals. This can sometimes be aided through the use of an anonymous type or dynamic object (if and only if necessary). If you need the results to persist in, say, an original list variable, you can always call .ToList() and assign it back to your original variable. I'll add here that I prefer working with IEnumerable<T> variables as much as possible over more concrete types.
myList = myList.Select(item => new SomeType(item.value1, item.value2 *4)).ToList();
In summary:
Just stick with foreach most of the time.
When foreach really won't do (which probably isn't as often as you think), use Select()
When you need to use Select(), you can still generally avoid (program-visible) side effects, possibly by projecting to an anonymous type.
Avoid the crutch of calling ToList(). You don't need it as much as you might think, and it can have significant negative consequences for performance and memory use.
Unfortunately there is no built-in way to do this in the current version of LINQ. The framework team neglected to add a .ForEach extension method. There's a good discussion about this going on right now on the following blog.
http://blogs.msdn.com/kirillosenkov/archive/2009/01/31/foreach.aspx
It's rather easy to add one though.
public static void ForEach<T>(this IEnumerable<T> enumerable, Action<T> action) {
foreach ( var cur in enumerable ) {
action(cur);
}
}
You cannot do this right away with LINQ and IEnumerable - you need to either implement your own extension method, or convert your enumeration to an array with LINQ and then call Array.ForEach():
Array.ForEach(MyCollection.ToArray(), x => x.YourMethod());
Please note that because of the way value types and structs work, if the collection is of a value type and you modify the elements of the collection this way, it will have no effect on the elements of the original collection.
Because LINQ is designed to be a query feature and not an update feature you will not find an extension which executes methods on IEnumerable<T> because that would allow you to execute a method (potentially with side effects). In this case you may as well just stick with
foreach(string name in Names)
Console.WriteLine(name);
Using Parallel Linq:
Names.AsParallel().ForAll(name => ...)
Well, you can also use the standard foreach keyword, just format it into a oneliner:
foreach(var n in Names.Where(blahblah)) DoStuff(n);
Sorry, thought this option deserves to be here :)
There is a ForEach method off of List. You could convert the Enumerable to List by calling the .ToList() method, and then call the ForEach method off of that.
Alternatively, I've heard of people defining their own ForEach method off of IEnumerable. This can be accomplished by essentially calling the ForEach method, but instead wrapping it in an extension method:
public static class IEnumerableExtensions
{
public static IEnumerable<T> ForEach<T>(this IEnumerable<T> _this, Action<T> del)
{
List<T> list = _this.ToList();
list.ForEach(del);
return list;
}
}
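As mentioned before, a ForEach extension will do the job.
My tip for the current question is about how to make the iterator actually execute:
[I did try Select(s=> { Console.WriteLine(s); return s; }), but it wasn't printing anything.]
Select() is lazy, so the lambda does not run until something enumerates the sequence. Calling Count() forces that: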
_= Names.Select(s=> { Console.WriteLine(s); return 0; }).Count();
Try it!

Do you ToList()?

Do you have a default type that you prefer to use in your dealings with the results of LINQ queries?
By default LINQ will return an IEnumerable<> or maybe an IOrderedEnumerable<>. We have found that a List<> is generally more useful to us, so have adopted a habit of ToList()ing our queries most of the time, and certainly using List<> in our function arguments and return values.
The only exception to this has been in LINQ to SQL where calling .ToList() would enumerate the IEnumerable prematurely.
We are also using WCF extensively, the default collection type of which is System.Array. We always change this to System.Collections.Generic.List in the Service Reference Settings dialog in VS2008 for consistency with the rest of our codebase.
What do you do?
ToList always evaluates the sequence immediately - not just in LINQ to SQL. If you want that, that's fine - but it's not always appropriate.
Personally I would try to avoid declaring that you return List<T> directly - usually IList<T> is more appropriate, and allows you to change to a different implementation later on. Of course, there are some operations which are only specified on List<T> itself... this sort of decision is always tricky.
EDIT: (I would have put this in a comment, but it would be too bulky.) Deferred execution allows you to deal with data sources which are too big to fit in memory. For instance, if you're processing log files - transforming them from one format to another, uploading them into a database, working out some stats, or something like that - you may very well be able to handle arbitrary amounts of data by streaming it, but you really don't want to suck everything into memory. This may not be a concern for your particular application, but it's something to bear in mind.
We have the same scenario - WCF communications to a server, the server uses LINQtoSQL.
We use .ToArray() when requesting objects from the server, because it's "illegal" for the client to change the list. (Meaning, there is no purpose to support ".Add", ".Remove", etc).
While still on the server, however, I would recommend that you leave it as its default (which is not IEnumerable, but rather IQueryable). This way, if you want to filter even more based on some criteria, the filtering is STILL on the SQL side until evaluated.
This is a very important point as it means incredible performance gains or losses depending on what you do.
EXAMPLE:
// This is just an example... imagine this is on the server only. It's the
// basic method that gets the list of clients.
private IEnumerable<Client> GetClients()
{
var result = MyDataContext.Clients;
return result.AsEnumerable();
}
// This method here is actually called by the user...
public Client[] GetClientsForLoggedInUser()
{
var clients = GetClients().Where(client=> client.Owner == currentUser);
return clients.ToArray();
}
Do you see what's happening there? The "GetClients" method is going to force a download of ALL 'clients' from the database... THEN the Where clause will happen in the GetClientsForLoggedInUser method to filter it down.
Now, notice the slight change:
private IQueryable<Client> GetClients()
{
var result = MyDataContext.Clients;
return result.AsQueryable();
}
Now, the actual evaluation won't happen until ".ToArray" is called... and SQL will do the filtering. MUCH better!
In the Linq-to-Objects case, returning List<T> from a function isn't as nice as returning IList<T>, as THE VENERABLE SKEET points out. But often you can still do better than that. If the thing you are returning ought to be immutable, IList is a bad choice because it invites the caller to add or remove things.
For example, sometimes you have a method or property that returns the result of a Linq query or uses yield return to lazily generate a list, and then you realise that it would be better to do that the first time you're called, cache the result in a List<T> and return the cached version thereafter. That's when returning IList may be a bad idea, because the caller may modify the list for their own purposes, which will then corrupt your cache, making their changes visible to all other callers.
Better to return IEnumerable<T>, so all they have is forward iteration. And if the caller wants rapid random access, i.e. they wish they could use [] to access by index, they can use ElementAt, which Linq defines so that it quietly sniffs for IList and uses that if available, and if not it does the dumb linear lookup.
One thing I've used ToList for is when I've got a complex system of Linq expressions mixed with custom operators that use yield return to filter or transform lists. Stepping through in the debugger can get mighty confusing as it jumps around doing lazy evaluation, so I sometimes temporarily add a ToList() to a few places so that I can more easily follow the execution path. (Although if the things you are executing have side-effects, this can change the meaning of the program.)
It depends if you need to modify the collection. I like to use an Array when I know that no one is going to add/delete items. I use a list when I need to sort/add/delete items. But, usually I just leave it as IEnumerable as long as I can.
If you don't need the added features of List<>, why not just stick with IQueryable<> ?!?!?! Lowest common denominator is the best solution (especially when you see Timothy's answer).

Using return statements to great effect!

When I am making methods with return values, I usually try to set things up so that there is never a case when the method is called in such a way that it would have to return some default value. When I started, I would often write methods that did something and would either return what they did or, if they failed to do anything, return null. But I hate having ugly if(!null) statements all over my code.
I'm reading a re-guide to ruby that I read many moons ago, by the pragmatic programmers, and I notice that they often return self (ruby's this) when they wouldn't normally return anything. This is, they say, in order to be able to chain method calls, as in this example using setters that return the object whose attributes they set.
tree.setColor(green).setDecor(gaudy).setPractical(false)
Initially I find this sort of thing attractive. There have been a couple of times when I have rejoiced at being able to chain method calls, like Player.getHand().getSize() but this is somewhat different in that the object of the method call changes from step to step.
What does Stack Overflow think about return values? Are there any patterns or idioms that come to mind warmly when you think of return values? Any great ways to avoid frustration and increase beauty?
In my humble opinion, there are three kinds of return-cases that you should take into consideration:
Object property manipulation
The first is the manipulation of object properties. The pattern you describe here is very often used when manipulating objects. A very typical scenario is using it together with a factory. Consider this hypothetical creation call:
// When the object has manipulative methods:
Pizza p = PizzaFactory().create().addAnchovies().addTomatoes();
// When the factory has manipulative methods working on the
// object, IMHO more elegant from a semantic point of view:
Pizza p = PizzaFactory().create().addAnchovies().addTomatoes().getPizza();
It allows for a quick grasp at what exactly is being created or how an object is manipulated, because the methods form one human-readable expression. It's definitely nice, but don't overuse. A rule of thumb is that this might be used with methods whose return value you could also declare as void.
Evaluating object properties
The second might be when a method evaluates something on an object. Consider, for example, the method car.getCurrentSpeed(), that could be interpreted as a message to an object asking for the current speed and returning that. It would simply return the value, not too complicated. :)
Make object do this or that
The third might be when a method makes an object perform an operation, returning some sort of value indicating how well the caller's intention was fulfilled - but laying out such a method could be difficult:
int new_gear = 20;
if (car.gears.changeGear(new_gear)) // does that mean success or fail?
This is where you can see a difficulty in designing the method. Should it return 0 upon success or failure? How about -1 if the gear could not be set, because the car only has 5 gears? Does that mean the current gear is at -1 now, too? The method could return the gear it changed to, meaning you would have to compare the argument supplied to the method to the return code. That would work. On the other hand, you could simply return true for success and false for failure, or the reverse. Which one to use could be decided by estimating whether you'd expect those method calls to fail or to succeed more often.
In my humble opinion, there is a way to better express the semantics of such return values, by giving them a semantic description. Future developers interacting with your objects will love you for not having to look up the comments or documentation for your methods:
class GearSystem {
// (...)
public:
enum GearChangeResult
{ GearChangeSuccess, NonExistingGear, MechanicalGearProblem };
GearChangeResult changeGear (int gear);
};
That way, it becomes perfectly obvious for any programmer looking at your code, what the return value means (consider: if (gears.changeGear(20) == GearSystem::GearChangeSuccess) - much clearer what that means than the example above)
Antipattern: Failures as return codes.
The fourth possibility for a return value I actually omitted, because in my opinion it isn't any: when there's an error in your program, like a logic error or a failure that needs to be dealt with - you could theoretically return a value indicating so. But today, that's not done so often anymore (or should not be), because for that, there are exceptions.
I don't agree that methods should never return null. The most obvious examples are from systems programming. For instance, if someone asks to open a file, you simply have to give them null if the open fails. There is no sane alternative. There are other cases where null is appropriate, such as a getNextNode(node) method, when called on the last node of a linked list. So I guess what these cases have in common is that null represents "no object" (either no file handle or no list node), which makes sense.
In other cases, the method should never fail, and there is an appropriate exception facility. Then, I think method chaining like your example can be used to great effect. I think it's a bit funny that you seem to believe this is an innovation of the "Pragmatic Programmers". In fact, it dates to Lisp if not before.
Returning this is also used in the "builder pattern", another case where method chaining can enhance readability as well as writing convenience.
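As a small illustration of the chaining idea, here is a sketch in Java based on the tree example from the question. The Tree class, its fields, and the describe() helper are invented for this example. Each setter returns this, so the calls read as one expression.

```java
class Tree {
    private String color = "brown";
    private String decor = "plain";
    private boolean practical = true;

    // Each setter would otherwise return void; returning this enables chaining.
    Tree setColor(String color) {
        this.color = color;
        return this;
    }

    Tree setDecor(String decor) {
        this.decor = decor;
        return this;
    }

    Tree setPractical(boolean practical) {
        this.practical = practical;
        return this;
    }

    String describe() {
        return color + "/" + decor + "/" + practical;
    }

    public static void main(String[] args) {
        Tree tree = new Tree().setColor("green").setDecor("gaudy").setPractical(false);
        System.out.println(tree.describe()); // prints green/gaudy/false
    }
}
```

Unlike Player.getHand().getSize(), every call in the chain here acts on the same object.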
A null is often returned as an out-of-band value to indicate that no result could be produced. I believe that this is perfectly reasonable when getting no result is a normal event; examples would include a null return from readLine() at end-of-file, or a null returned when providing a non-existent key to the get(...) method of a Map. Reading to the end of the file is normal behavior (as opposed to an IOException, which indicates that something went abnormally wrong while trying to read). Similarly, looking up a key and being told that it has no value is a normal case.
A good alternative to null for some cases is a "null object", which is a full-fledged instance of the result class, but which has appropriate state and behavior for a "nobody's home" case. For instance, the result of looking up a non-existent user ID might well be a NullUser object which has a zero-length name and no permissions to do anything in the system.
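A minimal sketch of that idea in Java might look like the following. The UserDirectory, User, RealUser, and NULL_USER names are all hypothetical. The lookup never returns null, so callers can ask about permissions without a null check.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

class UserDirectory {
    interface User {
        String name();
        boolean can(String permission);
    }

    static final class RealUser implements User {
        private final String name;
        private final Set<String> permissions;

        RealUser(String name, Set<String> permissions) {
            this.name = name;
            this.permissions = permissions;
        }

        public String name() {
            return name;
        }

        public boolean can(String permission) {
            return permissions.contains(permission);
        }
    }

    // The "nobody's home" instance: zero-length name, no permissions at all.
    static final User NULL_USER = new User() {
        public String name() {
            return "";
        }

        public boolean can(String permission) {
            return false;
        }
    };

    private final Map<Integer, User> users = new HashMap<>();

    void add(int id, User user) {
        users.put(id, user);
    }

    // Unknown IDs yield the null object instead of null.
    User find(int id) {
        return users.getOrDefault(id, NULL_USER);
    }

    public static void main(String[] args) {
        UserDirectory dir = new UserDirectory();
        dir.add(1, new RealUser("alice", Set.of("read", "write")));
        System.out.println(dir.find(1).can("write"));  // prints true
        System.out.println(dir.find(42).can("write")); // prints false
    }
}
```

The trade-off is that a missing user no longer fails loudly, so this fits best where "no result" is a normal event.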
It's confusing to me. OO programming languages need Smalltalk's semicolon:
tree color: green;
decor: gaudy;
practical: false.
obj method1; method2. means "call method1 on obj then method2 on obj". This kind of object setup is very common.

Resources