Design choice: Any good reason for 'ToArray()' LINQ extension to throw an exception for null collections? - linq

Unsurprisingly, following code will throw an ArgumentNullException
IEnumerable<string> collection = null;
string[] collectionViewAsAnArray = collection.ToArray();
This looks obvious at first sight ... but ain't incoherent to argue that returning null may have been a reasonnable alternative (Accounting that ToArray() is an extension method and therefor can be called, even on null).
While I acknowledge, this way, extension behave like a real method, I can't help finding the other approach really smart too ... but this may lead to other issues?

Granted IEnumerable is an interface used in collections but the underlying implementation is an object and you have set that collection variable to point to null, hence the exception.
If on the other hand you initialized collection with:
IEnumerable<string> collection = Enumerable.Empty<string>();
OR
IEnumerable<string> collection = new List<string>();
You would have an empty list object that you could act on. The ArgumentNullException exception is thrown because the collection argument is in fact null and that is what ToArray() is trying to act on. So logically, to me anyways, this was the only design choice.
Edit
On the other hand, in practice, I've made the conscious decision to always return a valid IEnumerable<T> when the return type on my class methods are supposed to return an IEnumerable<T>.
For example; a method signature that looks like IEnumerable<T> GetAll() would always return a valid enumerable and If there was nothing to return then I would return return Enumerable.Empty<T>();
The difference here to me is that GetAll() is not acting on a collection argument. You could really look at this like the collection is really nothing more than a parameter to the method and if you passed in a null parameter to a regular method you would probably throw an ArgumentNullException.
The short answer is that your collection variable is an expected argument (or parameter) to the ToArray() method and it is null so it makes sense to throw an ArgumentNullException.

Why push the error to some other spot in your code? If you expect that something can be null, check it before you use it. Otherwise throw an error exactly at the spot where your assumption was false, namely when you tried to convert it to an array.
I understand it can be debatable when you're chaining operations but in those cases I've found it easier to work with an empty IEnumerable rather than a null one.

Related

How do I return different response in the webflux based on whether the Flux object has elements?

I know there is a function named "hasElements" on a Flux object. But it behaves a bit strange!
Flux<RoomBO> rooms=serverRequest.bodyToMono(PageBO.class).flatMapMany(roomRepository::getRooms);
return rooms.hasElements().flatMap(aBool -> aBool?ServerResponse.ok().body(rooms,RoomBO.class):ServerResponse.badRequest().build());
return ServerResponse.ok().body(rooms,RoomBO.class)
The second return statement can return the right things I need when the flux object is not empty,but the first return statement only returns a empty array,which likes "[]" in json.I don't know why this could happen!I use the same data to test.The only difference is that I call the hasElements function in the first situation.But I need to return badRequest when the flux object is empty. And the hasElements function seems to make my flux object empty,though I know it doesn't do this actually.
well, finally I decide to call switchIfEmpty(Mono.error()) to throw an error and then I deal with the special error globally(Sometimes it's not suitable to use onErrorReturn or onErrorResume). I think this can avoid the collect operation in memory when meets big data. But it's still not a good solution for the global error handler can be hard to maintain. I'd expect someone to give a better solution.
In your example you are transforming class Flux to class RoomBO, it is one of the reasons, why you get an empty array.
If you need to return the processed list of rooms, then, I think, collectList should be your choice. https://projectreactor.io/docs/core/release/api/reactor/core/publisher/Flux.html#collectList--
Flux<RoomBO> roomsFlux = serverRequest.bodyToMono(PageBO.class)
.flatMapMany(roomRepository::getRooms);
return rooms
.collectList()
.flatMap(rooms -> !rooms.isEmpty() ? ServerResponse.ok().bodyValue(rooms) : ServerResponse.badRequest().build());

Sense of Optional.orElse() [duplicate]

Why does this throw a java.lang.NullPointerException?
List<String> strings = new ArrayList<>();
strings.add(null);
strings.add("test");
String firstString = strings.stream()
.findFirst() // Exception thrown here
.orElse("StringWhenListIsEmpty");
//.orElse(null); // Changing the `orElse()` to avoid ambiguity
The first item in strings is null, which is a perfectly acceptable value. Furthermore, findFirst() returns an Optional, which makes even more sense for findFirst() to be able to handle nulls.
EDIT: updated the orElse() to be less ambiguous.
The reason for this is the use of Optional<T> in the return. Optional is not allowed to contain null. Essentially, it offers no way of distinguishing situations "it's not there" and "it's there, but it is set to null".
That's why the documentation explicitly prohibits the situation when null is selected in findFirst():
Throws:
NullPointerException - if the element selected is null
As already discussed, the API designers do not assume that the developer wants to treat null values and absent values the same way.
If you still want to do that, you may do it explicitly by applying the sequence
.map(Optional::ofNullable).findFirst().flatMap(Function.identity())
to the stream. The result will be an empty optional in both cases, if there is no first element or if the first element is null. So in your case, you may use
String firstString = strings.stream()
.map(Optional::ofNullable).findFirst().flatMap(Function.identity())
.orElse(null);
to get a null value if the first element is either absent or null.
If you want to distinguish between these cases, you may simply omit the flatMap step:
Optional<String> firstString = strings.stream()
.map(Optional::ofNullable).findFirst().orElse(null);
System.out.println(firstString==null? "no such element":
firstString.orElse("first element is null"));
This is not much different to your updated question. You just have to replace "no such element" with "StringWhenListIsEmpty" and "first element is null" with null. But if you don’t like conditionals, you can achieve it also like:
String firstString = strings.stream()
.map(Optional::ofNullable).findFirst()
.orElseGet(()->Optional.of("StringWhenListIsEmpty"))
.orElse(null);
Now, firstString will be null if an element exists but is null and it will be "StringWhenListIsEmpty" when no element exists.
You can use java.util.Objects.nonNull to filter the list before find
something like
list.stream().filter(Objects::nonNull).findFirst();
The following code replaces findFirst() with limit(1) and replaces orElse() with reduce():
String firstString = strings.
stream().
limit(1).
reduce("StringWhenListIsEmpty", (first, second) -> second);
limit() allows only 1 element to reach reduce. The BinaryOperator passed to reduce returns that 1 element or else "StringWhenListIsEmpty" if no elements reach the reduce.
The beauty of this solution is that Optional isn't allocated and the BinaryOperator lambda isn't going to allocate anything.
Optional is supposed to be a "value" type. (read the fine print in javadoc:) JVM could even replace all Optional<Foo> with just Foo, removing all boxing and unboxing costs. A null Foo means an empty Optional<Foo>.
It is a possible design to allow Optional with null value, without adding a boolean flag - just add a sentinel object. (could even use this as sentinel; see Throwable.cause)
The decision that Optional cannot wrap null is not based on runtime cost. This was a hugely contended issue and you need to dig the mailing lists. The decision is not convincing to everybody.
In any case, since Optional cannot wrap null value, it pushes us in a corner in cases like findFirst. They must have reasoned that null values are very rare (it was even considered that Stream should bar null values), therefore it is more convenient to throw exception on null values instead of on empty streams.
A workaround is to box null, e.g.
class Box<T>
static Box<T> of(T value){ .. }
Optional<Box<String>> first = stream.map(Box::of).findFirst();
(They say the solution to every OOP problem is to introduce another type :)

How can I intercept the result of an IQueryProvider query (other than single result)

I'm using Entity Framework and I have a custum IQueryProvider. I use the Execute method so that I can modify the result (a POCO) of a query after is has been executed. I want to do the same for collections. The problem is that the Execute method is only called for single result.
As described on MSDN :
The Execute method executes queries that return a single value
(instead of an enumerable sequence of values). Expression trees that
represent queries that return enumerable results are executed when
their associated IQueryable object is enumerated.
Is there another way to accomplish what I want that I missed?
I know I could write a specific method inside a repository or whatever but I want to apply this to all possible queries.
This is true that the actual signature is:
public object Execute(Expression expression)
public TResult Execute<TResult>(Expression expression)
However, that does not mean that the TResult will always be a single element! It is the type expected to be returned from the expression.
Also, note that there are no constraints over the TResult, not even 'class' or 'new()'.
The TResult is a MyObject when your expression is of singular result, like .FirstOrDefault(). However, the TResult can also be a double when you .Avg() over the query, and also it can be IEnumerable<MyObject> when your query is plain .Select.Where.
Proof(*) - I've just set a breakpoint inside my Execute() implementation, and I've inspected it with Watches:
typeof(TResult).FullName "System.Collections.Generic.IEnumerable`1[[xxxxxx,xxxxx]]"
expression.Type.FullName "System.Linq.IQueryable`1[[xxxxxx,xxxxx]]"
I admit that three overloads, one object, one TResult and one IEnumerable<TResult> would probably be more readable. I think they did not place three of them as extensibility point for future interfaces. I can imagine that in future they came up with something more robust than IEnumerable, and then they'd need to add another overload and so on. With simple this interface can process any type.
Oh, see, we now also have IQueryable in addition to IEnumerable, so it would need at least four overloads:)
The Proof is marked with (*) because I have had a small bug/feature in my IQueryProvider's code that has is obscuring the real behavior of LINQ.
LINQ indeed calls the generic Execute only for singular cases. This is a shortcut, an optimization.
For all other cases, it ... doesn't call Execute() it at all
For those all other cases, the LINQ calls .GetEnumerator on your custom IQueryable<> implementation, that what happens is dictated by .. simply what you wrote there. I mean, assuming that you actually provided custom implementations of IQueryable. That would be strange if you did not - that's just about 15 lines in total, nothing compared to the length of custom provider.
In the project where I got the "proof" from, my implementation looks like:
public System.Collections.IEnumerator GetEnumerator()
{
return Provider.Execute<IEnumerable>( this.Expression ).GetEnumerator();
}
public IEnumerator<TOut> GetEnumerator()
{
return Provider.Execute<IEnumerable<TOut>>( this.Expression ).GetEnumerator();
}
of course, one of them would be explicit due to name collision. Please note that to fetch the enumerator, I actually call the Execute with explicitely stated TResult. This is why in my "proof" those types occurred.
I think that you see the "TResult = Single Element" case, because you wrote i.e. something like this:
public IEnumerator<TOut> GetEnumerator()
{
return Provider.Execute<TOut>( this.Expression ).GetEnumerator();
}
Which really renders your Execute implementation without choice, and must return single element. IMHO, this is just a bug in your code. You could have done it like in my example above, or you could simply use the untyped Execute:
public System.Collections.IEnumerator GetEnumerator()
{
return ((IEnumerable)Provider.Execute( this.Expression )).GetEnumerator();
}
public IEnumerator<TOut> GetEnumerator()
{
return ((IEnumerable<TOut>)Provider.Execute( this.Expression )).GetEnumerator();
}
Of course, your implementation of Execute must make sure to return proper IEnumerables for such queries!
Expression trees that represent queries that return enumerable results are executed when their associated IQueryable object is enumerated.
I recommend enumerating your query:
foreach(T t in query)
{
CustomModification(t);
}
Your IQueryProvider must implement CreateQuery<T>. You get to choose the implemenation of the resulting IQueryable. If you want that IQueryable to do something to each row when enumerated, you get to write that implementation.
The final answer is that it's not possible.

MemberExpression to MemberExpression[]

The objective is to get an array of MemberExpressions from two LambdaExpressions. The first is convertible to a MethodCallExpression that returns the instance of an object (Expression<Func<T>>). The second Lambda expression would take the result of the compiled first expression and return a nested member (Expression<Func<T,TMember>>). We can assume that the second Lambda expression will only make calls to nested properties, but may do several of these calls.
So, the signature of the method I am trying to create is :
MemberExpression[] GetMemberExpressionArray<T,TValue>(Expression<Func<T>> instanceExpression, Expression<Func<T,TValue>> nestedMemberExpression)
where nestedMemberExpression will be assumed to take an argument of the form
parent => parent.ChildProperty.GrandChildProperty
and the resulting array represents the MemberAccess from parent to ChildProperty and from the value of ChildProperty to GrandChildProperty.
I have already returned the last MemberExpression using the following extension method.
public static MemberExpression GetMemberExpression<T, TValue>(Expression<Func<T, TValue>> expression)
{
if (expression == null)
{
return null;
}
if (expression.Body is MemberExpression)
{
return (MemberExpression)expression.Body;
}
if (expression.Body is UnaryExpression)
{
var operand = ((UnaryExpression)expression.Body).Operand;
if (operand is MemberExpression)
{
return (MemberExpression)operand;
}
if (operand is MethodCallExpression)
{
return ((MethodCallExpression)operand).Object as MemberExpression;
}
}
return null;
}
Now, I know there are several ways to accomplish this. The most immediately intuitive to me would be to loop through the .Expression property to get the first expression and capture references to each MemberExpression along the way. This may be the best way to do it, but it may not. I am not extraordinarily familiar with the performance costs I get from using expressions like this. I know a MemberExpression has a MemberInfo and that reflection is supposed to hurt performance.
I've tried to search for information on expressions, but my resources have been very limited in what I've found.
I would appreciate any advice on how to accomplish this task (and this type of task, in general) with optimal performance and reliability.
I'm not sure why this has been tagged performance, but the easiest way I can think of to extract member-expressions from a tree is to subclass ExpressionVisitor. This should be much simpler than manually writing the logic to 'expand' different types of expressions and walk the tree.
You'll probably have to override the VisitMember method so that:
Each member-expression is captured.
Its children are visited.
I imagine that would look something like:
protected override Expression VisitMember(MemberExpression node)
{
_myListOfMemberExpressions.Add(node);
return base.VisitMember(node);
}
I'm slightly unclear about the remainder of your task; it appears like you want to rewrite parameter-expressions, in which case you might want to look at this answer from Marc Gravell.

How does LINQ implement the SingleOrDefault() method?

How is the method SingleOrDefault() evaluated in LINQ? Does it use a Binary Search behind the scenes?
Better than attempting to explain in words, I thought I'd just post the exact code of implementation in the .NET Framework, retrieved using the Reflector program (and reformatted ever so slightly).
public static TSource SingleOrDefault<TSource>(this IEnumerable<TSource> source)
{
if (source == null)
throw Error.ArgumentNull("source");
IList<TSource> list = source as IList<TSource>;
if (list != null)
{
switch (list.Count)
{
case 0:
return default(TSource);
case 1:
return list[0];
}
}
else
{
using (IEnumerator<TSource> enumerator = source.GetEnumerator())
{
if (!enumerator.MoveNext())
return default(TSource);
TSource current = enumerator.Current;
if (!enumerator.MoveNext())
return current;
}
}
throw Error.MoreThanOneElement();
}
It's quite interesting to oberserve that an optimisation is made if the object is of type IList<T>, which seems quite sensible. It simply falls back to enumerating over the object otherwise if the object implements nothing more specific than IEnumerable<T>, and does so just how you'd expect.
Note that it can't use a binary search because the object doesn't necessarily represent a sorted collection. (In fact, in almost all usage cases, it won't.)
I would assume that it simply performs the query and if the result count is zero, it returns the default instance of the class. If the result count is one, it returns that instance, and if the result count is greater than one, it throws an exception.
I don't think it does any searching, it's all about getting the first element of the source [list, result set, etc].
My best guess is that it just pulls the first element. If there is no first it returns the default (null, 0, false, etc). If there is a first, it attempts to pull the second result. If there is a second result it throws an exception. Otherwise it returns the first result.

Resources