Group by name, then sort by oldest date - sorting

How can I group by names, and then sort by oldest date? Some names have multiple dates so I want to keep the oldest date first and suppress any other date associated with that name.
A 1/1/2016
C 8/5/2017
B 10/2/2017
C 5/6/2017
A 10/6/2012
B 5/15/2013
A 2/5/2017
B 4/6/2017
I tried creating a group for each name and date, then sorting by ascending order. But when I place Group 2: Date above Group 1: Name, the names are no longer grouped together.

There are a few ways to go about this. Crystal makes use of SQL to get your data in the first place, so you're able to write your own SQL for record selection. And it's rather easy to write a statement that will only return the oldest date per person, then arrange them by date.
If you prefer to do it in Crystal, you can do either of the following:
Group by Name, Sort Ascending. Then make a subrgroup by Date, Ascending. Use a running total (suppressed so you don't see it) to count the number of records, but reset this count on change of Group A. Then suppress the Details section every time this running total is greater than 1.
Group by Name. Don't add a second group. Use a subreport that takes the current Name in as a parameter, and returns only the oldest date on record for that name. (Hint: Turn off Can Grow and limit the subreport to one row.)

Related

How to make groups in an input and select a specific row in each of them in Talend?

I am working on a Talend transformation process (we are using Talend 6.4).
, and I don't know how to implement the current requirement.
I have an input consisting in :
Two columns that are my group keys (Account and Product), but are not unique (the same Account x Product couple can happen in multiple rows)
A criterion column (Contract end date), which will help me decide which row I want to keep for each group
Some "tail" data that need to be passed to the following step of the processing (the contract number)
The rule to implement is:
Keep only one record per group
The selected record must be one with no end date or, if all have end date, with the biggest end date
The selected record can be random in case there is a tie
See the transformation applying those rules on some dummy data:
I thought first to do the following:
sort by Account, Product, End_date (nulls first)
"select first" in each group
but I am not skilled enough to know whether the second transformation exists in Talend.
Regards,
Pierre
Very interesting Talend question.
You need to create something like this job.
here a link to the zip file to import in your Talend
The answer from #MBDIA seem to be working, however I would like to share what we did to fulfill our requirement.
See our Talend process here:
The first tMap (tMap_3) acts like a tReplicate and a tMap, and sends:
in the upper branch only the Account and Product references, that are then deduplicated by the tAggregateRow_1.
in the lower branch all data and computed fields that enables us to take care of the case where the date is missing (instead of defaulting to 31/12/9999, we compute a flag (0 or 1) that we use in the sort step afterwards).
In the second part of the process, we first apply the sort to the whole data on Account, Product, Empty date flag (computed before), End date (desc) and use a second tMap to make a join on both branches (on Account x Product), only keeping First Match in order to keep the first record as per our requirement.

Power Query, avg value based on the values appearing within a specified date range

Context:
I have a data set for the weights of truck and trailer combinations coming into my site over the span of a few years. I have organized my data by seasons as I am trying to prove that the truck:trailers in winter are noticeably heavier due to ice, snow, and mud. The theory is, if the tare weight is higher in this season (the weight of the truck after it empties its load) than its Avg tare weight (which I need to calculate from the data) it can be deduced that the truck:trailer combinations are coming in with extra weight that we pay for in part as some snow/ice/mud falls off in the trailer emptying process.
What I've done so far:
I've defined a custom date range for my seasons
I've grouped Truck:Trailer by: count to get a duplicates column and, all rows to keep all my details
I've filtered out every combination I've seen less than 50 times, as i want good representation for each truck:trailer combo so that I can better emphasize repeated patterns
I've added an index column to better keep track of the individuals before expanding the details
What I need to do:
I only want to work with truck:trailer combinations which have weighed in for all four seasons at least once
I need to find the average tare weight of the truck:trailer combinations based over the extended range for both summer and autumn (the dry time of the year) while preserving the raw tare data for all seasons, as I need to eventually compare the winter tare values to this average.
example of my data
When I'm finished I'd like the data to look something like this
Pivot Chart
query data
For your first question (all seasons) you can add a column that holds the distinct count of the values in [Season] for each [Driver:Trailer]. Then filter your table on that column, keeping only the 4's. To achieve this, add the following m-code to your script in the Advanced Editor. Change the part after in to #"DistinctCount Season"
#"DistinctCount Season" = Table.Join(#"insert name previous step","Driver:Trailer",
Table.Group(#"insert name previous step", {"Driver:Trailer"},
{{"DistinctCountSeasons", each Table.RowCount(Table.Distinct(_,"Season")),
type number}}),"Driver:Trailer")
Insert the name of your previous step where indicated.
For second question:
You can use a matrix-visual for that in you report. First create a measure:
[AverageTare] = AVERAGE(table'[Tare])
Then put [Season] on Rows and the [AverageTare] on Values. You can create a group (right-click on [Season] in the FIELDS-pain) called [DrySeason], to combine the values for Spring and Summer.
If that doesn't work for you, explore the AVERAGEX function.
EDIT
In excel you can use a pivottable. Put [Season] on Rows and the [AverageTare] on Values. Right-click a value in the pivottable. Select Value Field Setting and choose Average. Then select the Seasons you want to group, right-click and select Group.
EDIT 2
To add a column in the Power Query Editor that holds the average [Tare] for the [Season] in each row, add the following steps to your script in the Avanced Editor:
#"GroupedSeasonAvg" = Table.Group(#"Insert name previous step", {"Season"}, {{"AVG", each List.Average([Tare]), type number}}),
#"JoinOnSeason" = Table.NestedJoin(#"Insert name previous step",{"Season"},GroupedSeasonAvg,{"Season"},"AVGGrouped"),
#"ExtractSeasonAVG" = Table.ExpandTableColumn(JoinOnSeason, "AVGGrouped", {"AVG"}, {"SeasonAVG"})
It works something like this:
"GroupedSeasonAvg" : Creates a table with the avereges for each [Season]
"JoinOnSeason": Creates a new column with tables joining the [Season] value for each row to [Season] in the grouped table.
#"ExtractSeasonAVG": Expand each table and keep only [AVG].

SSRS Matrix Bespoke Headers (Still from datasource!!)

I have to create a matrix in SSRS to detail the number uses leaving an organisation.
The columns will all represent spaces of time spanning 1 week and the rows will all represent departements in the organisation. The detail portion will be a count of people who have left that area in that week.
I have a leaving date field in the DB but nothing that flags the specific intevals I have been told to use. That means that as the matrix is, it counts each of users that have left a specific department however the date range columns is 1 day, not 1 week. Is there a way to force the column headers to respect the week intervals I want given that they are currently coming from the dataset and are not hard coded?
Firstly try to manage your data in sql itself by using Group By with date and making each group as one week period. That way you can manage to get all data in your required format
I don't know what is your columns so I am just showing a way to get the week groups from table and get the count of the people
SELECT DATEPART(wk, datevaluecolumn) weekno
, SUM(peopleleavingcolumn) totalvalue
FROM yourTable
GROUP BY DATEPART(wk, datevalue)

Ms Access 2007 - Can sort or group report but not both, can't figure out why

I'm stuck and confused and hopefully can get some help here. I have a query that pulls info from two tables and a report that reads it. My two tables are as such: One of the is a Contact list with phone numbers, names, and addresses. The other table is a paid history. The PhoneNumber field is how they're tied together. There are no duplicate entries in the Contact table but there are multiple paid instances per phone number in the other table.
My report groups them on the phone number, but I also need it to sort by date. My date field is marked as time/date, and it is in the paid table. The issue I'm running into is that I can either Group on PhoneNumber OR sort on the Date field but not both. When I set the Group as the top level, it ignores the Sort that I have set below it. If I take the sort and drag it up so that it becomes the top level, it won't group. When it doesn't group I'm left with multiple instances of the same Contact info... as in I get a new listing for every date that it has ever paid, whereas I need one a single Contact listing with each paid instance to be grouped underneath it.
Here's my query SQL:
SELECT
tblContributorsLead.FirstName,
tblContributorsLead.LastName,
tblContributorsLead.Address1,
tblContributorsLead.ZipCode,
tblContributorsLead.CityName,
tblPledgesLead.PledgeAmountRecd,
tblPledgesLead.DateRecd,
tblPledgesLead.PhoneNumber,
tblPledgesLead.DispositionTime,
tblPledgesLead.Agent,
tblPledgesLead.CampaignName,
tblPledgesLead.Custom20
FROM
tblContributorsLead
INNER JOIN
tblPledgesLead
ON tblContributorsLead.PhoneNumber = tblPledgesLead.PhoneNumber
WHERE
(((tblPledgesLead.PledgeAmountRecd)>0)
AND ((tblPledgesLead.DateRecd) Is Not Null));
Why would I only be able to either group OR sort but not both at the same time?
Edit: http://icloudbackups.com/stripped.zip is a copy of my database stripped down.
I think I understand now what you wanted - to have the phone number groups with the most recent dates show up at the top. To do this you need to identify the Last (or First if you need it the other way around) DateRecd for each PhoneNumber.
SELECT SortingAndGrouping.LastDate, SortingAndGrouping.PhoneNumber, tblPledgesLead.DateRecd
FROM (tblContributorsLead INNER JOIN tblPledgesLead ON
tblContributorsLead.PhoneNumber = tblPledgesLead.PhoneNumber) INNER JOIN
(SELECT CDate(Format(tblPledgesLead.DateRecd,"MM/DD/YYYY")) As LastDate, tblPledgesLead.PhoneNumber
FROM tblContributorsLead INNER JOIN tblPledgesLead ON
tblContributorsLead.PhoneNumber = tblPledgesLead.PhoneNumber
ORDER BY tblPledgesLead.DateRecd DESC) AS SortingAndGrouping ON
tblContributorsLead.PhoneNumber = SortingAndGrouping.PhoneNumber
ORDER BY SortingAndGrouping.LastDate DESC , SortingAndGrouping.PhoneNumber, tblPledgesLead.DateRecd DESC;
You will need to add the additional fields you want to display (I removed them here for clarity), and have the report enforce the same sorting I have here - Create a group for the LastDate column, then a group for the PhoneNumber column, then have the sorting specified.

How can I select entries for a given weekday using SQL?

I could use this query to select all orders with a date on a monday:
SELECT * from orders WHERE strftime("%w", date)="1";
But as far as I know, this can't be speed up using an index, as for every row strftime has to be calculated.
I could add an additional field with the weekday stored, but I want to avoid it. Is there a solution that makes use of an index or am I wrong and this query actually works fine? (That means it doesn't have to go through every row to calculate the result.)
If you want all Mondays ever, you'd need a field or sequential scan. What you could do, is calculate actual dates for example for all Mondays within a year. The condition WHERE date IN ('2009-03-02', '2009-02-23', ...) would use index
Or as an alternative to vartec's suggestion, construct a calendar table consisting only of a date and a day name for each day in the year (both indexed) and then perform your query by doing a JOIN against this table.

Resources