Pig - Filter a column twice in the same filter - filter

I'm trying to filter the same column twice. I basically want to only get the record in which one column is between the values of other 2 columns.
Imagine this:
(id,year_min,year_max,year)
(4470,1999,2001,2011)
(4471,2006,2013,2013)
So filtering like this it doesn't work:
filter1 = filter set by (year_min <= year and year_max >= year)
Is there another way of filtering it instead of breaking that filter into several ones?
Also, all the columns are bytearray.
PS: That is not the whole set but that is basically what I want to achieve.
Thanks.

I guess the year_max and year got swapped in your schema. It should be like this right? id, year_min, year, year_max but your schema says id, year_min, year_max, year . I modified the schema its works fine for me. Can you check it?

Related

How to compare columns in a new column

I have a recordset where rows have a date field.Rows are from current year and last year. I want to see the count of rows per year. That's easy. Now I'd like to have a third column with difference (count(currentYear) - count(lastYear)). Is there any way to achieve this?
thanks
It seems like you want to have the difference calculated between two members of the same date field.
If so, you might want to consider the customizeCell() API call to retrieve the count values of each year, use them to calculate the difference, and modify the necessary cells with the result.
Alternatively, you could try modifying your data set so that the lastYear and currentYear are two different fields – this, for example, would allow you to compare them within a calculated value.

Sum on Count in hql - error Not yet supported place for UDAF 'count'

I'm new here so please be gentle, my first question after using this website for a long time regards the below:
I'm trying to create a sum of count of events in the past 30 days:
select key, sum((COALESCE(count(*),0)))
from table
Where date>= '2016-08-13'
And date<= '2016-09-11'
group by key;
but the sum doesn't seem to work. i'm looking at the last 30 days, and i would like to count any row that exists for each key, and then sum the counts (i need to count on a daily basis and then sum all day's count).
If you can offer any other way to deal with this issue i'm open for suggestions!
Many thanks,
Shira
You can't nest aggregate functions in HQL (or SQL). However, if you just want a count of records falling within range for each key, then you can simply just use COUNT(*):
select key, count(*)
from table
where date >= '2016-08-13' and
date <= '2016-09-11'
group by key;
It looks like there was a couple things wrong with your code.
I've written this for you, haven't tested it but it passes the syntax test.
SELECT COUNT(key) AS Counting FROM tblname
WHERE date>= '2016-08-13'
AND date<= '2016-09-11'
GROUP BY key;
And this might help you. You should definitely be using COUNT for this query.
I'm not sure if it's related but there might be an issue with calling a field 'key' I kept receiving syntax errors for it.
Hope I was able to help!
-Krypton

SSRS Tablix Cell Calculation based on RowGroup Value

I have looked through several of the posts on SSRS tablix expressions and I can't find the answer to my particular issue.
I have a dashboard I am creating that contains summary data for various managers. They are entering monthly summary data into a single table structured like this:
Create TABLE OperationMetrics
AS
Date date
Plant char(10)
Sales float
ReturnedProduct float
The data could use some grouping so I created a table for referencing which report group these metrics go into looks like this:
Create Table OperationsReport
as
ReportType varchar(50)
MetricType varchar(50)
In this table, 'Sales' and 'ReturnedProduct' are the Metric column, while 'ExecSummary' or 'Quality' are ReportType entries. To do the join, I decided to UNPIVOT the OperationMetrics table...
Select Date, Plant, Metric, MetricType
From (Select Date, Plant, Sales, ReturnedProduct From OperationMetrics)
UNPVIOT (Metric for MetricType in (Sales, ReturnedProduct) UnPvt
and join it to the OperationsReport table so I have grouped metrics.
Select Date, Plant, Metric, Rpt.MetricReport, MetricType
FROM OpMetrics_Unpivoted OpEx
INNER JOIN OperationsReport Rpt on OpEx.MetricType = Rpt.MetricType
(I understand that elements of this is not ideal but sometimes we are not in control of our destiny.)
This does not include the whole of the tables but you get the gist. So, they have a form they fill in the OperationMetrics table. I chose SSRS to display the output.
I created a tablix with the following configuration (I can't post images due to my rep...)
Date is the only column group, grouped on 'MMM-yy'
Parent Row Group is the ReportType
Child Row Group is the MetricType
Now, my problem is that some of the metrics are calculations of other metrics. For instance, 'Returned Product (% of Sales)' is not entered by the manager because it is assumed we can simply calculate that. It would be ReturnedProduct divided by Sales.
I attempted to calculate this by using a lookup function, as below:
Switch(Fields!FriendlyName.Value="Sales",SUM(Fields!Metric.Value),
Fields!FriendlyName.Value="ReturnedProduct",SUM(Fields!Metric.Value),
Fields!FriendlyName.Value="ReturnedProductPercent",Lookup("ReturnedProduct",
Fields!FriendlyName.Value,Fields!Metric.Value,"MetricDataSet")/
Lookup("Sales",Fields!FriendlyName.Value,Fields!Metric.Value,
"MetricDataSet"))
This works great! For the first month... but since Lookup looks for the first match, it just posts the same value for the rest of the months after.
I attempted to use this but it got me back to where I was at the beginning since the dataset does not have the value.
Any help with this would be well received. I would like to keep the rowgroup hierarchy.
It sounds like the LookUp is working for you but you just need to include the date to find the right month. LookUp will return the first match which is why it's only working on the first month.
What you can try is concatenating the Metric Name and Date fields in the LookUp.
Lookup("Sales" & CSTR(Fields!DATE.Value), Fields!FriendlyName.Value & CSTR(Fields!DATE.Value), Fields!Metric.Value, "MetricDataSet")
Let me know if I misunderstood the issue.

Can I compare values in the same column in adjacent rows in PowerPivot?

I have a PowerPivot table for which I need to be able to determine how long an item was in an Error state. My data set looks something like this:
What I need to be able to do is to look at the values in the ID and State columns, and see if the value in the previous row is ERROR in the State column, and the same in the ID column. If it is, I then need to calculate the difference between the Changed Date values in those two rows.
So, for example, when I got to row 4, I would see that the value in the State column for Row 3, the previous row, is ERROR, and that the value in the ID column in the previous row is the same as the current row, so I would then calculate the difference between the Changed Date values in Row 3 and Row 4 (I don't care about the values in any of the other columns for this particular requirement).
Is there a way to do this in PowerPivot? I've done a fair amount of Internet searching, and it looks like if it can be done, it would use the EARLIER or EARLIEST DAX functions, but I can't find anything that tells me how, or even if, this can be done.
Thanks.
Chris,
I have had similar requirements many times and after a really long time of trial-and-error, I finally understood how EARLIER works. It can be very powerful, but also very slow so always check for the performance of your calculations.
To answer your question, you will need to create 4 calculated columns:
1) Item Rank - used for ranking the issues with same Item ID
=COUNTROWS(FILTER('ID', EARLIER([Item ID]) = [Item ID] && EARLIER([Date]) >= [Date]))
2) Follows Error - to easily find issue that follows EROR issue
=IF([State] = "EROR",[Item Rank]+1)
3) Time of Following Issue - simple lookup so that you can calculate the different
=IF([Follows Error]>0,
LOOKUPVALUE([Date], [User], [User], [Item Rank], [Follows Error]),
BLANK()
)
4) Time Diff - calculation of time different for the specific issue
=IF([State]="EROR",
DAY([Time of Following Issue])-DAY([Date]),
BLANK()
)
With those calculated columns, you can then easily create a powerpivot table, drag State and Item Id onto the ROWS pane and then simply add Time Diff to Values. You will get an overview of issues that contain string "EROR" issue and the time it took to resolve them.
This is what it looks like in PowerPivot window:
And the resulting Pivot table:
You can download my Excel file here (2013).
As I mentioned, be careful with the performance as the calculated columns with nested EARLIER and IF conditions might be a bit too performance-demanding. If there is a smarter way, I would be very happy to see it, but for now this works for me just fine.
Also, keep in mind that all calculated columns could be nested into 1, but I kept them separated to make it easier to understand the formulas.
Hope this helps :-)

How can I select entries for a given weekday using SQL?

I could use this query to select all orders with a date on a monday:
SELECT * from orders WHERE strftime("%w", date)="1";
But as far as I know, this can't be speed up using an index, as for every row strftime has to be calculated.
I could add an additional field with the weekday stored, but I want to avoid it. Is there a solution that makes use of an index or am I wrong and this query actually works fine? (That means it doesn't have to go through every row to calculate the result.)
If you want all Mondays ever, you'd need a field or sequential scan. What you could do, is calculate actual dates for example for all Mondays within a year. The condition WHERE date IN ('2009-03-02', '2009-02-23', ...) would use index
Or as an alternative to vartec's suggestion, construct a calendar table consisting only of a date and a day name for each day in the year (both indexed) and then perform your query by doing a JOIN against this table.

Resources