Recursion algorithms: suggested patterns and practices? - algorithm

I am writing a utility that reflects on two object graphs and returns a value to indicate whether the graphs are identical or not. It got me thinking: is there a generally accepted pattern for writing a recursive algorithm that returns a value from somewhere in the recursion?
My solution would probably use a ref parameter and look something like this pseudocode:
public static bool IsChanged<T>(T current, T previous)
{
    bool isChanged = false;
    CheckChanged(current, previous, ref isChanged);
    return isChanged;
}
private static void CheckChanged<T>(T current, T previous, ref bool isChanged)
{
    // perform recursion
    if (graphIsChanged)
        isChanged = true;
    else
        CheckChanged(current, previous, ref isChanged);
}
Is there a better / cleaner / more efficient way? Is there a general pattern for such a function?

I don't see any benefits of your version when compared to this highly trivial version:
public static bool IsChanged<T>(T current, T previous)
{
    // perform recursion
    if (graphIsChanged)
        return true;
    else
        return IsChanged(current, previous);
}
As an added benefit, some compilers can apply tail call optimization to turn this version into a simple loop, which is more efficient.
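To make the return-value approach concrete, here is a minimal sketch, written in Java rather than C#. The Node type, its value and children fields, and the GraphCompare class are hypothetical stand-ins for the object graph in the question, and the sketch assumes the graph is a tree with no cycles. Each call returns its own result, so the first difference found propagates straight back up without a ref parameter.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical node type standing in for one object in the graph.
final class Node {
    final String value;
    final List<Node> children = new ArrayList<>();

    Node(String value) { this.value = value; }

    Node add(Node child) { children.add(child); return this; }
}

final class GraphCompare {
    // Returns true as soon as any difference is found. The result travels
    // back up through the return values, so no ref/out parameter is needed.
    static boolean isChanged(Node current, Node previous) {
        if (current == null || previous == null) {
            return current != previous; // changed unless both are null
        }
        if (!current.value.equals(previous.value)
                || current.children.size() != previous.children.size()) {
            return true;
        }
        for (int i = 0; i < current.children.size(); i++) {
            if (isChanged(current.children.get(i), previous.children.get(i))) {
                return true; // short-circuit at the first difference
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Node a = new Node("root").add(new Node("x")).add(new Node("y"));
        Node b = new Node("root").add(new Node("x")).add(new Node("z"));
        System.out.println(isChanged(a, a)); // false
        System.out.println(isChanged(a, b)); // true
    }
}
```

A graph with cycles would additionally need a set of already-visited nodes, which this sketch leaves out.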

Tail recursion isn't just more efficient; it also keeps you from blowing out the stack on deep recursion, provided the compiler or runtime actually eliminates the tail call:
http://en.wikipedia.org/wiki/Tail_recursion
That is to say, it prevents "Stack Overflow" :)
http://en.wikipedia.org/wiki/Stack_overflow

I've always been a fan of having an actual return value from a recursive function, not just passing in a reference to a variable. I'm not really sure what you're trying to do in your sample, but why not just return a bool from CheckChanged?

Related

Benefit of defining success/failure function instead of using success/!success

I was reading the man page for gearman (http://manpages.ubuntu.com/manpages/precise/man3/gearman_success.3.html). It has two functions:
bool gearman_success(gearman_return_t rc)
bool gearman_failed(gearman_return_t rc)
The code of those functions looks like this (libgearman-1.0/return.h):
static inline bool gearman_failed(enum gearman_return_t rc)
{
    return rc != GEARMAN_SUCCESS;
}
static inline bool gearman_success(enum gearman_return_t rc)
{
    return rc == GEARMAN_SUCCESS;
}
Both functions do nearly the same thing: one returns true where the other returns false. What is the benefit of this code?
Why not just use
!gearman_success
Is there some benefit to this coding pattern that I am missing here?
This code is easier to extend. Suppose you add another value to that enum:
GEARMAN_SUCCESS_BUT_HAD_WARNINGS
With the implementation you're looking at, all you have to do is adjust both methods. Without it, you'd have to go through every place GEARMAN_SUCCESS is used all over the code base and make sure that the new enum value is handled properly.
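As a rough illustration of that point, here is a sketch in Java rather than C. The ReturnCode enum, its values, and the succeeded/failed helpers are all invented for this example and are not part of libgearman. Adding SUCCESS_WITH_WARNINGS only required touching succeeded(); every caller that goes through the helpers keeps working.

```java
// Hypothetical status codes; SUCCESS_WITH_WARNINGS is the invented extra value.
enum ReturnCode {
    SUCCESS, SUCCESS_WITH_WARNINGS, TIMEOUT, ERROR;

    // Only this helper knows which codes count as success.
    static boolean succeeded(ReturnCode rc) {
        return rc == SUCCESS || rc == SUCCESS_WITH_WARNINGS;
    }

    // Defined in terms of succeeded(), so the two can never disagree.
    static boolean failed(ReturnCode rc) {
        return !succeeded(rc);
    }
}

final class ReturnCodeDemo {
    public static void main(String[] args) {
        for (ReturnCode rc : ReturnCode.values()) {
            System.out.println(rc + " succeeded=" + ReturnCode.succeeded(rc));
        }
    }
}
```

Code that compared directly against SUCCESS would have treated the new value as a failure, which is exactly the kind of scattered fix the helpers avoid.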

Does Java optimize new HashSet(someHashSet).contains() to be O(1)?

This class is an example of where the issue arises:
public class ContainsSet {
    private static HashSet<E> myHashSet;
    [...]
    public static Set<E> getMyHashSet() {
        return new HashSet<E>(myHashSet);
    }
    public static boolean doesMyHashSetContain(E e) {
        return myHashSet.contains(e);
    }
}
Now imagine two possible functions:
boolean method1() {
    return ContainsSet.getMyHashSet().contains(someE);
}
boolean method2() {
    return ContainsSet.doesMyHashSetContain(someE);
}
My question is whether method 1 will have the same time complexity as method 2 after Java's optimizations.
(I used HashSet instead of just Set to emphasize that myHashSet.contains(someE) has complexity O(1).)
Without optimization it would not. Although .contains() has complexity O(1), the new HashSet<E>(myHashSet) has complexity O(n), which would give method 1 a complexity of O(n) + O(1) = O(n), which is horrible compared to the beloved O(1).
This issue is important to me because I was taught not to return lists or sets directly if external classes should not be able to change their contents. Returning a copy is an obvious solution, but it can be really time-consuming.
No, javac does not (and cannot) optimize this away. It is required to emit bytecode that does what your source describes. And the JVM is not nearly intelligent enough to optimize this away: the copy is far too likely to have side effects for it to prove otherwise.
Don't return a copy of the HashSet if you want immutability. Wrap it in an unmodifiable wrapper: Collections.unmodifiableSet(myHashSet)
What can I say here but that creating a new HashSet and populating it via the constructor is expensive!
Java will not "optimize away" this work: even though you and I know it would give the same result as "passing through" the contains() call, Java cannot know this.
No. That is beyond optimization. You returned a new object that could be used somewhere else, so Java is not supposed to omit it. A new HashSet will be created.
Returning a copy is not good practice here. It is not only time-consuming but also space-consuming. As Sean said, you can wrap the set with unmodifiableSet, or you can wrap it in your own class.
You can try this:
public static Set<E> getMyHashSet() {
    return Collections.unmodifiableSet(myHashSet);
}
Note: this method returns a read-only view of your set, not a copy.
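For what it's worth, here is a small self-contained sketch of how the wrapper behaves. The ReadOnlySetDemo class and its addName and rejectsWrites helpers are made up for illustration; only Collections.unmodifiableSet is the real API. It shows that the wrapper costs no copy, refuses writes from the caller, and still reflects later changes to the backing set.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

final class ReadOnlySetDemo {
    private static final Set<String> names = new HashSet<>();

    // O(1): wraps the existing set instead of copying its elements.
    static Set<String> getNames() {
        return Collections.unmodifiableSet(names);
    }

    static void addName(String name) { names.add(name); }

    // True if the caller is blocked from mutating the returned set.
    static boolean rejectsWrites(Set<String> view) {
        try {
            view.add("intruder");
            return false;
        } catch (UnsupportedOperationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        addName("alice");
        Set<String> view = getNames();
        System.out.println(view.contains("alice")); // true
        System.out.println(rejectsWrites(view));    // true
        addName("bob");
        System.out.println(view.contains("bob"));   // true, the view is live
    }
}
```

Because the view is live, callers see later changes made by the owning class. If they need a stable snapshot, a copy is still the right tool, and its O(n) cost is then unavoidable.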

Is LINQ's Any method efficient?

Does the Any method in LINQ iterate over the entire collection, or does it return true as soon as the first match is found?
The Any method will only iterate over the minimum number of elements necessary. As soon as it finds a matching element, it returns immediately.
It's roughly implemented as follows:
public static bool Any<T>(this IEnumerable<T> enumerable, Func<T, bool> predicate) {
    foreach (var cur in enumerable) {
        if (predicate(cur)) {
            return true;
        }
    }
    return false;
}
In the worst case (no element matches, or only the last one does) it will visit all elements. In the best case (the first element matches) it will visit only one.
The latter - you can look at the code with ReSharper to verify that if you download a trial version.
As to whether Any is efficient - it's not when e.g. a Count property is available as an alternative. But it does arguably express intent well.
Any returns true as soon as it finds a successful match to the predicate, though if none exist, it will have iterated across the entire collection.
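If you want to see the short-circuit for yourself, one way is to count predicate calls. The sketch below does that in Java rather than C#, with a hand-written any() that mirrors the rough implementation quoted above. The AnyDemo class, the calls counter, and countingEquals are names invented for this example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

final class AnyDemo {
    static int calls = 0; // how many times the predicate has been invoked

    // Same shape as the rough LINQ sketch: return at the first match.
    static <T> boolean any(Iterable<T> source, Predicate<T> predicate) {
        for (T item : source) {
            if (predicate.test(item)) {
                return true;
            }
        }
        return false;
    }

    // Equality check that also records each invocation.
    static boolean countingEquals(int item, int target) {
        calls++;
        return item == target;
    }

    public static void main(String[] args) {
        List<Integer> numbers = Arrays.asList(10, 20, 30, 40, 50);

        calls = 0;
        boolean found = any(numbers, n -> countingEquals(n, 20));
        System.out.println(found + " after " + calls + " calls"); // true after 2 calls

        calls = 0;
        boolean missing = any(numbers, n -> countingEquals(n, 99));
        System.out.println(missing + " after " + calls + " calls"); // false after 5 calls
    }
}
```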

How to avoid Linq chaining to return null?

I have a problem with code contracts and linq. I managed to narrow the issue to the following code sample. And now I am stuck.
public void SomeMethod()
{
    var list = new List<Question>();
    if (list.Take(5) == null) { }
    // resharper hints that condition can never be true
    if (list.ForPerson(12) == null) { }
    // resharper does not hint that condition can never be true
}
public static IQueryable<Question> ForPerson(this IQueryable<Question> source, int personId)
{
    if(source == null) throw new ArgumentNullException();
    return from q in source
           where q.PersonId == personId
           select q;
}
What is wrong with my linq chain? Why doesn't resharper 'complain' when analyzing the ForPerson call?
EDIT: return type for ForPerson method changed from string to IQueryable, which I meant. (my bad)
ReSharper is correct that the result of a Take or Skip is never null: if there are no items, the result is an IEnumerable<Question> which has no elements. I think to do what you want you should check Any.
var query = list.Take(5);
if (!query.Any())
{
    // Code here executes only if there were no items in the list.
}
But how does this warning work? ReSharper cannot know that the method never returns null from only looking at the method definition, and I assume that it does not reverse engineer the method body to determine that it never returns null. I assume therefore that it has been specially hard-coded with a rule specifying that the .NET methods Skip and Take do not return null.
When you write your own custom methods, ReSharper can only make assumptions about your method's behaviour from its signature, and your signature allows it to return null. Therefore it issues no warnings. If it analyzed the method body then it could see that null is impossible and would be able to issue a warning. But analyzing code to determine its possible behaviour is an incredibly difficult task and I doubt that JetBrains is willing to spend the money on solving this problem when they could add more useful features elsewhere with a much lower development cost.
Determining whether a boolean expression can ever evaluate to true is the Boolean satisfiability problem, which is NP-hard.
You want ReSharper to determine whether general method bodies can ever return null. This is a generalization of the above-mentioned NP-hard problem. It's unlikely any tool will ever be able to do this correctly in 100% of cases.
if(source == null) throw new ArgumentNullException();
That's not the code contracts way. Did you instead mean:
Contract.Requires(source != null);

How does LINQ implement the SingleOrDefault() method?

How is the method SingleOrDefault() evaluated in LINQ? Does it use a Binary Search behind the scenes?
Better than attempting to explain in words, I thought I'd just post the exact code of implementation in the .NET Framework, retrieved using the Reflector program (and reformatted ever so slightly).
public static TSource SingleOrDefault<TSource>(this IEnumerable<TSource> source)
{
    if (source == null)
        throw Error.ArgumentNull("source");
    IList<TSource> list = source as IList<TSource>;
    if (list != null)
    {
        switch (list.Count)
        {
            case 0:
                return default(TSource);
            case 1:
                return list[0];
        }
    }
    else
    {
        using (IEnumerator<TSource> enumerator = source.GetEnumerator())
        {
            if (!enumerator.MoveNext())
                return default(TSource);
            TSource current = enumerator.Current;
            if (!enumerator.MoveNext())
                return current;
        }
    }
    throw Error.MoreThanOneElement();
}
It's quite interesting to observe that an optimisation is made if the object is of type IList<T>, which seems quite sensible. Otherwise, if the object implements nothing more specific than IEnumerable<T>, it simply falls back to enumerating over the object, and does so just how you'd expect.
Note that it can't use a binary search because the object doesn't necessarily represent a sorted collection. (In fact, in almost all usage cases, it won't.)
I would assume that it simply performs the query and if the result count is zero, it returns the default instance of the class. If the result count is one, it returns that instance, and if the result count is greater than one, it throws an exception.
I don't think it does any searching, it's all about getting the first element of the source [list, result set, etc].
My best guess is that it just pulls the first element. If there is no first it returns the default (null, 0, false, etc). If there is a first, it attempts to pull the second result. If there is a second result it throws an exception. Otherwise it returns the first result.
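That "pull the first, then try to pull a second" behaviour can be sketched in a few lines. This is a Java analogue written for illustration, not the .NET implementation: the SingleDemo class, the singleOrDefault name, and the explicit defaultValue parameter are assumptions of this sketch (Java has no default(T), so the caller supplies the fallback).

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;

final class SingleDemo {
    // Returns defaultValue for an empty source, the only element for a
    // one-element source, and throws if a second element exists.
    static <T> T singleOrDefault(Iterable<T> source, T defaultValue) {
        Iterator<T> it = source.iterator();
        if (!it.hasNext()) {
            return defaultValue;
        }
        T first = it.next();
        if (it.hasNext()) {
            throw new IllegalStateException("Sequence contains more than one element");
        }
        return first;
    }

    public static void main(String[] args) {
        System.out.println(singleOrDefault(Collections.<String>emptyList(), "none")); // none
        System.out.println(singleOrDefault(Arrays.asList("only"), "none"));           // only
        try {
            singleOrDefault(Arrays.asList("a", "b"), "none");
        } catch (IllegalStateException e) {
            System.out.println("threw: " + e.getMessage());
        }
    }
}
```

At most two elements are ever read, so the cost does not depend on the size of the source, and no ordering of the elements is assumed.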

Resources