How does Pig handle negating a null value? - hadoop

I don't understand how Apache Pig (version r0.9.2) handles negation of null values.
I have an expression like this:
nonEmpty = FILTER dataFields BY NOT IsEmpty(children);
If children is null, the IsEmpty function returns null, so I am confused about how the NOT operator will behave, since the expression effectively becomes:
nonEmpty = FILTER dataFields BY NOT NULL;
The documentation for Pig Latin r0.9.2 says the following:
"Pig does not support a boolean data type. However, the result of a boolean expression (an expression that includes boolean and comparison operators) is always of type boolean (true or false)."
which only confuses me further.
Thanks for the help in advance.

Testing a NULL for emptiness is probably not a good idea regardless. In fact, I tried it on 0.10.0, and it threw an error saying exactly that. Instead, filter by not null and not empty:
nonEmpty = FILTER dataFields BY (children IS NOT NULL) AND (NOT IsEmpty(children));
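To make the null behaviour concrete, here is a minimal Python model of SQL-style three-valued logic, with None standing in for null. It assumes (this is not taken from Pig's source) that NOT of null is null, that AND returns false as soon as either side is false, and that FILTER keeps a record only when its condition is exactly true. It follows the question's premise that IsEmpty(null) returns null; the answer reports that 0.10.0 raises an error instead.

```python
from typing import List, Optional

# None models a Pig/SQL null; an illustrative three-valued-logic model, not Pig's code.
Bool3 = Optional[bool]


def not3(x: Bool3) -> Bool3:
    # NOT of an unknown (null) value is still unknown.
    return None if x is None else not x


def and3(a: Bool3, b: Bool3) -> Bool3:
    # FALSE wins over anything; otherwise a null operand makes the result null.
    if a is False or b is False:
        return False
    if a is None or b is None:
        return None
    return True


def is_empty(bag: Optional[list]) -> Bool3:
    # Follows the question's premise that IsEmpty(null) yields null.
    return None if bag is None else len(bag) == 0


def filter_rows(rows: List[dict], predicate) -> List[dict]:
    # Assumed FILTER semantics: keep a row only when the condition is exactly true.
    return [r for r in rows if predicate(r) is True]


rows = [
    {"id": 1, "children": None},
    {"id": 2, "children": []},
    {"id": 3, "children": ["a"]},
]

not_only = filter_rows(rows, lambda r: not3(is_empty(r["children"])))
guarded = filter_rows(
    rows,
    lambda r: and3(r["children"] is not None, not3(is_empty(r["children"]))),
)
print([r["id"] for r in not_only])  # [3]
print([r["id"] for r in guarded])   # [3]
```

Under this model both filters drop the null row, but only the guarded form makes that explicit: the IS NOT NULL test is false for row 1, so the AND is false without ever depending on what NOT does with a null.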

Related

SSRS - Filtering an Integer With Both a String and a Boolean Parameter

I have a Boolean parameter called WLH. If it is True, the report should ignore everything; if it is False, it should show 0 in the craft textbox of every row that contains the word "LABORER". This is the expression I am using, but it doesn't seem to do anything. Can I get help making it work? What am I doing wrong?
=IIF(Parameters!WLH.Value = false AND ReportItems!craft.Value LIKE "*laborer*", 0, ---main calculation for the else statement---)
I see two things in this expression that need closer attention.
Parameters!WLH.Value = CBool("false"): the false side of the equality test needs to be converted to a Boolean type with the CBool (convert to Boolean) function.
ReportItems!craft.Value.IndexOf("laborer") >= 0: SSRS doesn't support LIKE in expressions, but we can test for the existence of a substring this way. IndexOf returns the position where the string "laborer" starts in the field value, so the expression checks for a value greater than or equal to 0. A result of 0 or more means "laborer" was found; -1 means it was not found.
I don't have SSRS installed on this machine to double-check, so post a comment if you still need help. Also note that IndexOf is case sensitive: if you want to match "Laborer" as well, you will have to convert the case before calling IndexOf.
Full expression:
=IIF(Parameters!WLH.Value = CBool("false") AND ReportItems!craft.Value.IndexOf("laborer") >= 0, 0, ---main calculation for the else statement---)
EDIT: To deal with case sensitivity
Use "UCase()" to convert your field to upper case and then test only against "LABORER".
=IIF(Parameters!WLH.Value = CBool("false") AND UCase(ReportItems!craft.Value).IndexOf("LABORER") >=0, 0, ---main calculation for the else statement---)
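The reason the test is >= 0 rather than > 0 is that a match at the very start of the string has index 0. A minimal Python analogy of the condition, using str.find in place of IndexOf (both return -1 when the substring is absent and 0 for a match at the start); should_show_zero is a hypothetical helper name, not part of SSRS.

```python
def should_show_zero(wlh: bool, craft: str) -> bool:
    # Mirrors: WLH = False AND UCase(craft).IndexOf("LABORER") >= 0
    # str.find, like .NET String.IndexOf, returns -1 when the substring is absent
    # and 0 when it matches at the very start of the string.
    return (not wlh) and craft.upper().find("LABORER") >= 0


print("laborer foreman".find("laborer"))  # 0, found at the start
print("Carpenter".find("laborer"))        # -1, not found
print(should_show_zero(False, "Laborer Foreman"))  # True
print(should_show_zero(False, "Carpenter"))        # False
print(should_show_zero(True, "LABORER"))           # False
```

With > 0, "Laborer Foreman" would be treated as a non-match, which is exactly the kind of row the report needs to catch.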

How to use CASE statement and a parameter in the WHERE clause?

I have an SSRS report with a parameter that asks the user whether to include records where revenue is greater than zero, or only records where revenue is zero.
Since the query is embedded (it is not a stored procedure, and moving it into a procedure is not an option), I need to use some CASE logic in the WHERE clause.
I am trying to do something like this:
SELECT * FROM TABLE
WHERE MY_DATE BETWEEN D_START AND D_END
AND
CASE
WHEN :REVENUE = 1 THEN REV != 0
WHEN :REVENUE = 2 THEN REV = 0
END
However, when I run this query I get the following error:
ORA-00905: missing keyword
Is what I am doing not possible? Or is there an error that someone can see and help me with?
Please help. Thanks!
UPDATE: Just to clarify, the user passes a value of 1 or 2, and the query should filter the data according to that value. If 1 is passed, return only records where revenue is not zero; if 2 is passed, return only records where revenue is zero.
You can write it better with a bit of boolean logic:
SELECT * FROM TABLE
WHERE MY_DATE BETWEEN D_START AND D_END
AND (
(:REVENUE = 1 AND REV != 0)
OR
(:REVENUE = 2 AND REV = 0 )
)
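A runnable sketch of the boolean-logic version, using Python's built-in sqlite3 as a stand-in for Oracle (an assumption; SQLite is more permissive, so this checks the logic rather than Oracle's strictness). The table name sales and the sample rows are hypothetical, since TABLE in the question is a placeholder; the bind variable keeps the :revenue name.

```python
import sqlite3

# Hypothetical table "sales" standing in for TABLE in the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (id INTEGER, my_date TEXT, d_start TEXT, d_end TEXT, rev REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?, ?)",
    [
        (1, "2020-01-15", "2020-01-01", "2020-01-31", 100.0),
        (2, "2020-01-20", "2020-01-01", "2020-01-31", 0.0),
        (3, "2020-03-01", "2020-01-01", "2020-01-31", 50.0),  # outside its date range
    ],
)

QUERY = """
SELECT id FROM sales
WHERE my_date BETWEEN d_start AND d_end
  AND ((:revenue = 1 AND rev != 0)
    OR (:revenue = 2 AND rev = 0))
ORDER BY id
"""


def ids_for(revenue: int) -> list:
    # The same named bind variable is used in both branches of the OR.
    return [row[0] for row in conn.execute(QUERY, {"revenue": revenue})]


print(ids_for(1))  # [1]
print(ids_for(2))  # [2]
```

Any other parameter value makes both branches false, so no rows come back.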
CASE is meant to return different values based on conditions. You can use it inside a condition, but only as a value that you then compare against something.
It's not necessary to use a CASE expression to get this particular result.
But it is possible to make use of one.
The problem in the original query is that Oracle is more strict than other databases (like MySQL) in that Oracle doesn't implicitly convert a boolean expression to a value, or convert a value into boolean.
I suspect that Oracle is choking in a couple of places. The error message is only showing us one of those.
The CASE expression returns a value, and Oracle refuses to evaluate that value as a boolean.
To get that value evaluated as a boolean, we could do a comparison of the value to some other value.
If we fix that, I think Oracle will still choke on the expression following THEN. Oracle expects a value there, and instead it finds a comparison, which evaluates to a boolean.
Okay, so we know the CASE expression needs to return a value, and we need to use that in a boolean expression. If we move that conditional test into the WHEN part, and specify a value to be returned in the THEN, we can compare the return from the CASE expression to another value.
(As an aside... I strongly recommend that you qualify the column references in the SQL statement. That makes the intent more clear. Looking at the statement, it looks like MY_DATE, D_START and D_END are all column references. That's perfectly valid, it just seems a bit odd to me.)
As an example, we could do something like this with the CASE expression:
SELECT t.*
FROM TABLE t
WHERE t.MY_DATE BETWEEN t.D_START AND t.D_END
AND CASE
WHEN ( :REVENUE = 1 AND t.REV != 0 ) THEN 1
WHEN ( :REVENUE = 2 AND t.REV = 0 ) THEN 1
ELSE NULL
END = 1
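A quick check, again with sqlite3 standing in for Oracle, that the CASE ... END = 1 form returns the same rows as the plain OR form. The sales table and rows are hypothetical and the date filter is omitted for brevity. SQLite would also accept the original boolean-in-THEN form, so this only verifies the logic, not Oracle's parser.

```python
import sqlite3

# Hypothetical table "sales"; the date filter is left out to keep the comparison short.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (id INTEGER, rev REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?)", [(1, 100.0), (2, 0.0), (3, -5.0)])

CASE_QUERY = """
SELECT t.id FROM sales t
WHERE CASE
        WHEN (:revenue = 1 AND t.rev != 0) THEN 1
        WHEN (:revenue = 2 AND t.rev = 0) THEN 1
        ELSE NULL
      END = 1
ORDER BY t.id
"""

OR_QUERY = """
SELECT t.id FROM sales t
WHERE (:revenue = 1 AND t.rev != 0)
   OR (:revenue = 2 AND t.rev = 0)
ORDER BY t.id
"""


def run(sql: str, revenue: int) -> list:
    return [row[0] for row in conn.execute(sql, {"revenue": revenue})]


for revenue in (1, 2, 3):
    # The CASE returns 1 or NULL; NULL = 1 is NULL, which WHERE treats as not true.
    print(revenue, run(CASE_QUERY, revenue), run(OR_QUERY, revenue))
```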
The parens inside the CASE aren't necessary; I just included them to highlight the part that Oracle is evaluating in a boolean context.
So, does that work? If the value passed in for :REVENUE is 2, the condition in the first WHEN won't evaluate to TRUE (the result of the first comparison is guaranteed to be FALSE). The condition in the second WHEN may evaluate to TRUE (the first comparison will yield TRUE; the result of the second comparison will depend on the value in the REV column).
That CASE expression is either going to return a value of 1 or NULL. (We could just as easily use a 0 or a -1, or 999 in place of NULL if we wanted.)
Once the CASE expression is evaluated, the value returned will be compared to a literal value, as if we wrote e.g. val = 1. That comparison is evaluated as boolean. If it evaluates to TRUE, the row will be returned...
To get Oracle to behave similarly to other databases (like MySQL), we would need to make the conversion from boolean to value and value to boolean explicit. We would still need the return from the CASE compared to 1, like we did above. In place of REV != 0 we could use another CASE expression. I'm not recommending this, just shown here for illustration, converting a boolean to a value.
WHERE CASE
WHEN ( :REVENUE = 1 )
THEN CASE WHEN ( t.REV != 0 ) THEN 1 ELSE NULL END
WHEN ( :REVENUE = 2 )
THEN CASE WHEN ( t.REV = 0 ) THEN 1 ELSE NULL END
ELSE
NULL
END = 1
Note that the return from the outermost CASE expression is being compared to a value, so we get a boolean (where Oracle expects a boolean.)
All of the ELSE NULL clauses in the statements above can be omitted with an equivalent result, since NULL is the default when ELSE is omitted.
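A small sqlite3 sketch of the nested form with every ELSE NULL dropped, showing that a CASE with no matching WHEN and no ELSE yields NULL and that the filter still behaves the same. SQLite is a stand-in for Oracle here, and the sales table is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (id INTEGER, rev REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?)", [(1, 100.0), (2, 0.0)])

# Nested form from above, with every ELSE NULL left out.
NESTED_QUERY = """
SELECT t.id FROM sales t
WHERE CASE
        WHEN :revenue = 1 THEN CASE WHEN t.rev != 0 THEN 1 END
        WHEN :revenue = 2 THEN CASE WHEN t.rev = 0 THEN 1 END
      END = 1
ORDER BY t.id
"""


def run(revenue: int) -> list:
    return [row[0] for row in conn.execute(NESTED_QUERY, {"revenue": revenue})]


# A CASE with no matching WHEN and no ELSE evaluates to NULL (None in Python).
print(conn.execute("SELECT CASE WHEN 1 = 0 THEN 1 END").fetchone()[0])  # None
print(run(1), run(2), run(3))  # [1] [2] []
```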
Again, it's not necessary to use a CASE expression. You can get equivalent results without it. For example:
SELECT t.*
FROM TABLE t
WHERE t.MY_DATE BETWEEN t.D_START AND t.D_END
AND ( ( :REVENUE = 1 AND t.REV != 0 )
OR ( :REVENUE = 2 AND t.REV = 0 )
)
In these queries that all return an equivalent result, the CASE expression doesn't buy us anything. But in some circumstances, it can have some advantages over a regular OR, because the CASE expression stops evaluation when a condition in a WHEN clause evaluates to TRUE.
The problem is that Oracle SQL does not have the boolean data type, so you cannot have columns of type boolean, pass boolean parameters to a query, have boolean expressions etc. So they have the somewhat unnatural concept of "condition" which is something that goes into logical conditions (like in the WHERE clause). Unfortunately, when they introduced the case EXPRESSION, which can be used wherever any other expression can be used (but this excludes boolean), they DID NOT introduce a "case CONDITION" - which could be used where other conditions can be used. This omission is odd, since the code for a case condition would probably use 95% of the code for the case expression. All the more weird since PL/SQL does have the boolean type, and the case expression there works seamlessly for Booleans.

MongoDB comparison operators with null

In MongoDB I would like to use the $gt and $lt comparison operators where the value could be null. When the operators did not work with null, I looked for documentation but found none. In both cases no documents were returned (even though $ne, $gte, and $lte did return documents, meaning there were documents both equal to and not equal to null).
I would expect $gt to operate essentially like $ne (since null is so low in MongoDB's comparison order) and $lt to return nothing for the same reason.
I was hoping this would work because the value I pass to the query is variable (potentially null), and I don't want to have to write a special case for null.
Example of what I was expecting, given the following collection:
{
id: 1,
colNum: null
}
{
id: 2,
colNum: 72
}
{
id: 3
}
I would expect the following query:
db.testtable.find( { "colNum": { $gt : null } } )
To return:
{
id: 2,
colNum: 72
}
However, nothing was returned.
Is there a reason that $gt and $lt don't seem to work with null, or is it a MongoDB bug, or is it actually supposed to work and there is likely a user error?
Nitty-Gritty Details
Reading through the latest Mongo source, there are basically two cases when doing comparisons involving null:
If the canonical types of the BSON elements being compared are different, only equality comparisons (==, >=, <=) of null & undefined will return true; otherwise any comparison with null will return false.
Note: No other BSON type has the same canonical type as null.
If the canonical types are the same (i.e., both elements are null), then compareElementValues is called. For null, this just returns the difference between the canonical type of both BSON elements and then carries out the requested comparison against 0.
For example, null > null would translate into (5-5) > 0 --> False because the canonical type of null is 5.
Similarly, null < null would translate into (5-5) < 0 --> False.
This means null can only ever be equal to null or undefined. Any other comparison involving null will always return false.
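The rules above can be condensed into a small Python model. This is a simplified sketch of the answer's description, not MongoDB's code; the sentinel UNDEFINED stands in for BSON undefined, and treating the document without colNum as undefined is an assumption made only for this illustration.

```python
import operator

# Simplified model of the rules described above; not MongoDB's actual implementation.
NULL_CANONICAL_TYPE = 5  # value quoted in the answer
UNDEFINED = object()     # stands in for BSON undefined (assumption: also used for a missing field)

OPS = {
    "$eq": operator.eq,
    "$gt": operator.gt,
    "$gte": operator.ge,
    "$lt": operator.lt,
    "$lte": operator.le,
}
EQUALITY_LIKE = {"$eq", "$gte", "$lte"}


def matches_null_query(value, op: str) -> bool:
    """Would {field: {op: null}} match a document whose field holds `value`?"""
    if value is None:
        # Same canonical type: compare the difference of canonical types with 0.
        return OPS[op](NULL_CANONICAL_TYPE - NULL_CANONICAL_TYPE, 0)
    if value is UNDEFINED:
        # Different canonical types: only equality-style operators succeed.
        return op in EQUALITY_LIKE
    # Any other BSON type: every comparison with null is false.
    return False


docs = {1: None, 2: 72, 3: UNDEFINED}  # id -> colNum, mirroring the question's collection
for op in ("$gt", "$lt", "$gte"):
    print(op, [i for i, v in docs.items() if matches_null_query(v, op)])
# $gt []
# $lt []
# $gte [1, 3]
```

In this model the document with colNum: 72 never matches a null comparison, which is why { $gt: null } returns nothing.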
Is this a Bug?
Updated Answer:
The documentation for the comparison operators ($gt, $lt) references the documentation which you originally linked, which implies that the comparison operators should work with null. Furthermore, query sorting (i.e., db.find().sort()) does accurately follow the documented Comparison/Sort behavior.
This is, at the very least, inconsistent. I think it would be worth submitting a bug report to MongoDB's JIRA site.
Original Answer:
I don't think this behavior is a bug.
The general consensus in JavaScript is that undefined means unassigned, while null means assigned but otherwise undefined. Value comparisons against undefined, aside from equality, don't make sense, at least in a mathematical sense.
Given that BSON draws heavily from JavaScript, this applies to MongoDB too.

Checking for inequality against string variable fails

I have the code like this:
var query = repository.Where(item => item.UserId == userId && item.LoanNumber != loanNumber);
which is transformed to SQL (repository is IQueryable).
loanNumber is a string parameter of the method. The problem is that the inequality check fails (it is ignored). If I use a constant with the same value instead of the variable, it works properly.
What the... ?
A number should be a NUMBER data type, not a string; storing it as a string violates normalization rules. So please tell us the data types of the values being compared on both sides of the predicate.
If you compare values of the same data type, you will get correct results, because you don't (and shouldn't) rely on implicit conversion.
So make sure you have the correct data type.
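A language-neutral illustration of why the data type matters: numbers stored as strings are compared character by character, so equality and ordering can differ from what the numeric values suggest. The loan numbers are hypothetical, and this does not reproduce the LINQ-to-SQL translation itself.

```python
loan_numbers = ["0123", "123", "9", "10"]  # hypothetical loan numbers stored as text

# As strings, equality is character-by-character, so leading zeros matter.
print("0123" == "123")            # False
print(int("0123") == int("123"))  # True

# Ordering differs too: strings sort lexicographically, integers numerically.
print(sorted(loan_numbers))                  # ['0123', '10', '123', '9']
print(sorted(int(n) for n in loan_numbers))  # [9, 10, 123, 123]
```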

Using Where() with a dynamic Func fails, but works with a hard coded Where() clause

See the two Linq (to SharePoint) code samples below.
The only difference is the argument passed to Where(). The first statement works as expected with a hard-coded where clause, but the second throws the error "Value does not fall within the expected range" when I try to count the items. What am I missing?
Works
relatedListItems = dc.GetList<GeneralPage>("Pages")
.Where(x => x.RelatedPracticesTitle.Any(y=>y=="Foo"));
if (relatedListItems.Count() == 0)
{…}
Fails - “Value does not fall within the expected range”
Func<GeneralPage, bool> f = x => x.RelatedPracticesTitle.Any(y => y == "Foo");
relatedListItems = dc.GetList<GeneralPage>("Pages")
.Where(f);
if (relatedListItems.Count() == 0)
{…}
If it's LINQ to Sharepoint, presumably that means it should be using expression trees, not delegates. Try:
Expression<Func<GeneralPage, bool>> f =
x => x.RelatedPracticesTitle.Any(y => y == "Foo");
relatedListItems = dc.GetList<GeneralPage>("Pages").Where(f);
By the way, it's generally a better idea to use Any() rather than Count() if you just want to find out if there are any results - that way it can return as soon as it's found the first one. (It also expresses what you're interested in more clearly, IMO.)
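An in-memory Python analogy of the Any() versus Count() point: any() stops at the first element, while counting has to walk the whole sequence. The results generator is hypothetical, and a LINQ provider may translate these calls differently on the server.

```python
from typing import Iterator, List

pulled: List[int] = []


def results() -> Iterator[int]:
    # Hypothetical lazy result source that records every item it hands out.
    for i in range(1000):
        pulled.append(i)
        yield i


has_any = any(True for _ in results())
print(has_any, len(pulled))  # True 1

pulled.clear()
count = sum(1 for _ in results())
print(count, len(pulled))  # 1000 1000
```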
In the first case, you're using the Expression<Func<GeneralPage, bool>> overload and passing an expression, which I assume LINQ to SharePoint will try to convert to CAML and execute.
In the second case, you're passing the plain Func<GeneralPage, bool> so LINQ to SharePoint can't figure out how to compose a query (it only sees the delegate, not the expression).
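A rough Python analogy of the expression-versus-delegate distinction: a provider can translate a predicate whose structure it can inspect, but a plain function is opaque. FieldEquals and translate are hypothetical names, the output is a generic SQL-like string rather than CAML, and the node is much simpler than the Any(y => y == "Foo") call in the question.

```python
from dataclasses import dataclass
from typing import Callable, Union


@dataclass(frozen=True)
class FieldEquals:
    # Hypothetical, inspectable predicate node: a tiny stand-in for an expression tree.
    field: str
    value: str


Predicate = Union[FieldEquals, Callable[[dict], bool]]


def translate(pred: Predicate) -> str:
    # A query provider can translate only what it can look inside.
    if isinstance(pred, FieldEquals):
        return f"{pred.field} = '{pred.value}'"
    # A plain function is opaque: the provider can call it but cannot read its body.
    raise ValueError("cannot translate an opaque callable into a server-side query")


print(translate(FieldEquals("RelatedPracticesTitle", "Foo")))  # RelatedPracticesTitle = 'Foo'
try:
    translate(lambda item: item["RelatedPracticesTitle"] == "Foo")
except ValueError as err:
    print("error:", err)
```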
