Create a Dynamic Array formula (Excel) to combine multiple results columns into one column that is filtered & sorted using multiple criteria?

The sample data in the image below is collected from a round robin tournament.
There is a Round column, plus Home team and Away team columns listing who is playing whom. A team can be either Home or Away.
For each match in a round (including any "Bye" match) the number of games won for the Home and Away team are recorded in separate columns respectively.
"Ff" = forfeit and has a value of 0. "Bye" result is left blank (at this stage).
Output columns are "Won, Lost, Round".
The required output (shown in the image) is, for any selected team, the top n matches by games won (whether played Home or Away), sorted in descending order of games won; where games won are equal, rows are sorted by games lost in ascending order. The final column shows the round in which each score occurred.
These are the challenges I've faced in going from data to output in one step using a dynamic array formula:
Collating/combining the Win results into 1 column. Likewise the Losses.
Getting the array to ignore blanks or convert "Ff" to 0 without getting #NUM or #VALUE errors.
Ensuring that, if I used separate single-column arrays, the corresponding Loss and Round matched the Win result.
An output order of "Round, Won, Lost" would also be acceptable, but I wasn't able to get the Dynamic Array capability to give the required output with that order.
SUMPRODUCT, INDEX(MATCH), SORT(FILTER) functions all hint at a possible one step formula solution.
Solutions are numerous for sorting & filtering where the existing values are already in one column. One solution that dealt with 2 columns of values was somewhat useful: "How to get the highest values from 2 columns in excel" (Stackoverflow, 2013).
Many other responses are around the use of concatenation, combining/merging array sets, aggregation etc.
My workaround is to use a Helper Sheet that combines the Wins from the separate results columns, converting blanks to -1 and "Ff" to 0 (rows where the selected team did not play also get -1). Likewise for Losses. This formula is used on each line:
=IF($C5=L$2,IF($F5="",-1,IF($F5="Ff",0,$F5)),IF($D5=L$2,IF($G5="",-1,IF($G5="Ff",0,$G5)),-1))
Example Helper Sheet
To get the final output, this Dynamic Array formula was used on the Helper Sheet data:
=SORT(FILTER(L$26:N$40,L$26:L$40>=LARGE(L$26:L$40,$J$3),""),{1,2},{-1,1},FALSE)
I'm trying to avoid PivotTable and VBA solutions. Power Query is possible but not preferred.
Apologies for the screenshots but I couldn't work out how to attach the sample spreadsheet file. (Unfortunately Stackoverflow Help didn't help me to/not to do this.)

Based on the comments, I changed my answer to use a different approach:
=LET(data,A5:F19,
round,INDEX(data,,1),
ha,CHOOSECOLS(data,3,4),
HAwonR,CHOOSECOLS(data,5,6,1),
w,BYROW(ha,LAMBDA(h,IFERROR(XMATCH(L2,h),0))),
clm,CHOOSE(w,{1,2},{2,1}),
srtwon,DROP(REDUCE(0,SEQUENCE(ROWS(data)),LAMBDA(y,z,VSTACK(y,INDEX(HAwonR,z,HSTACK(INDEX(clm,z,),3))))),1),
res,FILTER(srtwon,w),
TAKE(SORT(res,{1,2},{-1,1}),J3))
Old answer:
=LET(data,A5:F19,
round,INDEX(data,,1),
home,INDEX(data,,3),
away,INDEX(data,,4),
HAwonR,CHOOSECOLS(data,5,6,1),
w,MAP(home,away,LAMBDA(h,a,OR(h=L2,a=L2))),
won,FILTER(HAwonR,w),
TAKE(SORT(won,{1,2},{-1,1}),J3))
In your example you selected round 3 for the third result, but that wasn't won, so I guess that was by mistake.
As you can see, using LET avoids helper cells. LET allows you to create names (helpers) that are stored, and because you can name them, complex formulas become more readable.
Basically, it filters the Home result, Away result and Round columns (in that order) for rows where either Home or Away equals the team in cell L2. The result is sorted by column 1 descending and column 2 ascending. Then the number of rows given in cell J3 is displayed from that sorted array.

Here is my solution based on the excellent contribution by @P.b. Thank you, much appreciated.
The wins (and likewise the losses) required mapping the presence of the team in question as home team (hT) to the games it won (hG), and adding to that a second mapping of the games it won (aG) when it was the away team (aT). This is essentially what was being done on the Helper Sheet. The result was a one-column array for game wins and a one-column array for game losses.
In the process I was able to convert the "Ff" text to 0. When I attempted it without the conversion, it threw an error.
Instead of CHOOSECOLS, I used HSTACK to create the new array (wins, losses & round) for FILTER, SORT and TAKE to work on.
Whether it can be made more concise is the next challenge. Overall (not just my solution), this exercise has provided greater flexibility and solved the problems stated. I'm happy!
=LET(data,A5:G19,
round,INDEX(data,,1),
hT,INDEX(data,,3),
aT,INDEX(data,,4),
hG,INDEX(data,,6),
aG,INDEX(data,,7),
wins,MAP(hG,
MAP(hT,LAMBDA(h,h=L2)),
LAMBDA(w,t,IF(w="Ff",0,w)*IF(t=TRUE,1,0))) +
MAP(aG,
MAP(aT,LAMBDA(a,a=L2)),
LAMBDA(w,t,IF(w="Ff",0,w)*IF(t=TRUE,1,0))),
losses,MAP(aG,
MAP(hT,LAMBDA(h,h=L2)),
LAMBDA(w,t,IF(w="Ff",0,w)*IF(t=TRUE,1,0))) +
MAP(hG,
MAP(aT,LAMBDA(a,a=L2)),
LAMBDA(w,t,IF(w="Ff",0,w)*IF(t=TRUE,1,0))),
HAwonR,HSTACK(wins,losses,round),
w,MAP(hT,aT,LAMBDA(h,a,OR(h=L2,a=L2))),
won,FILTER(HAwonR,w),
TAKE(SORT(won,{1,2},{-1,1}),J3))
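For readers who want to check the logic outside Excel, the combine, filter, sort and take steps can be sketched in Python. This is only an illustration under assumptions: the sample rows, the team names and the function names `to_games` and `top_results` are hypothetical, and, like the LET formula above, it counts a blank "Bye" result as 0 rather than excluding it.

```python
from typing import List, Tuple, Union

Score = Union[int, str, None]  # a number, "Ff" (forfeit) or None (bye / blank)
Match = Tuple[int, str, str, Score, Score]  # round, home, away, home games, away games


def to_games(score: Score) -> int:
    """Treat "Ff" (forfeit) and blank results as 0 games."""
    return score if isinstance(score, int) else 0


def top_results(matches: List[Match], team: str, n: int) -> List[Tuple[int, int, int]]:
    """Return up to n (won, lost, round) rows for team, best first."""
    rows = []
    for rnd, home, away, home_games, away_games in matches:
        if team == home:
            rows.append((to_games(home_games), to_games(away_games), rnd))
        elif team == away:
            rows.append((to_games(away_games), to_games(home_games), rnd))
    # Sort by games won descending, then games lost ascending.
    rows.sort(key=lambda r: (-r[0], r[1]))
    return rows[:n]


# Hypothetical sample data, not taken from the question's screenshot.
sample = [
    (1, "Ants", "Bees", 6, 3),
    (2, "Cats", "Ants", 2, 6),
    (3, "Ants", "Bye", None, None),
    (4, "Dogs", "Ants", 5, "Ff"),
    (5, "Ants", "Eels", 4, 5),
]
print(top_results(sample, "Ants", 3))
```

Building each (won, lost, round) row as one tuple keeps the loss and round attached to the matching win, which is the same reason the formula stacks the three columns with HSTACK before sorting.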

Related

Google Sheets calculate characters only once

Is there a formula in Google Sheets to count a character only once? For example, a row has 5 columns (Monday-Friday) and 2 or 3 of them are marked with X. How can I calculate how many rows have an X? I don't need to know how many Xs there are, just how many rows have an X.
Reina, I have one answer, though there may be better ones.
This formula, pasted into B34, should do what you want. It merges all the cells in columns B to F, in each row, into one value, substitutes out possible spaces, then checks if it has at least one "y" (as used in your example).
=COUNTIF(ARRAYFORMULA(
SUBSTITUTE(B4:B29&C4:C29&D4:D29&E4:E29&F4:F29," ","")),
"*y*")
It is coded to search all student rows, i.e. between 4 and 29 - change these row numbers if necessary.
If the attendance might be marked with something other than a "y", you could change the "y" part of the formula to "?*". I just didn't know if other values might be used, e.g. an "S" for sick day or something, and you wanted to ignore those.
Then, you can drag the new formula from B34, sideways on row 34, to G34 and beyond, and it should calculate the results for the subsequent weeks. It will shift the columns being checked by the formula automatically.
Let me know if this works for you, or if you need something else.
To possibly ease data entry, here is a sample sheet with the formula, but with check boxes replacing the cells where attendance is marked.
https://docs.google.com/spreadsheets/d/1ON5Rc55aLVq_LHtFOfpgmf876bYg2ITfwpbifklr3lU/edit?usp=sharing
Here the formula is slightly modified to look for "TRUE" values, instead of "y"s.
UPDATE: To look for ANY non-blank cell in that range, and count "1" for every student that week that attended at least one day, the formula is:
=COUNTIF(
ARRAYFORMULA( B4:B29&C4:C29&D4:D29&E4:E29&F4:F29), ">""")
or
=COUNTIF(
ARRAYFORMULA( B4:B29&C4:C29&D4:D29&E4:E29&F4:F29), "?*")
See sample here:
https://docs.google.com/spreadsheets/d/1ON5Rc55aLVq_LHtFOfpgmf876bYg2ITfwpbifklr3lU/edit#gid=461771088&range=B34:F34
Let me know if this answers your question, or do you need to do something specific with the "y,x, and o"s?
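The counting rule behind these formulas (one count per row, however many marks the row holds) can be sketched in Python. This is a minimal illustration, not the Sheets formula itself: the grid below and the function name `rows_with_mark` are hypothetical, and the sketch compares whole cells rather than searching the concatenated text with a wildcard.

```python
from typing import List


def rows_with_mark(rows: List[List[str]], mark: str = "") -> int:
    """Count rows containing at least one marked cell.

    With mark="" any non-blank cell counts; otherwise only cells
    equal to mark (case-insensitive) count.
    """
    count = 0
    for row in rows:
        cells = [c.strip().lower() for c in row]
        if mark:
            hit = mark.lower() in cells
        else:
            hit = any(cells)
        count += 1 if hit else 0
    return count


# Hypothetical attendance grid: one row per student, Monday to Friday.
week = [
    ["x", "", "x", "", ""],
    ["", "", "", "", ""],
    ["", "x", "x", "x", ""],
    ["", "", " ", "", "o"],
]
print(rows_with_mark(week, "x"))  # rows with at least one "x"
print(rows_with_mark(week))       # rows with any non-blank cell
```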

Advanced Excel Search and Sorting

I have an incredibly large spreadsheet that lists details for the computers in my company's inventory. We need to know how many systems we have that are x years old. I was able to sort it by model, but because the model names are wildly different it didn't help much. For example, one model name is
13-inch MacBook Pro (2011)
And another is
13-inch Retina MacBook Pro (Mid 2017)
The only constant value in the parentheses is the year at the end. I'm trying to write a formula that will spit out how many of each system there are. We need to know how many are 2011 computers, how many are 2017, etc. We are fine with grouping up "Early, Mid, Late" since we just need a year separation, but those terms don't show up in every cell, which throws my math off. The rows don't have to be sorted; I just need a count.
My plan of attack would be to first, convert the spreadsheet into a table using Insert > Table... this enables Excel to manage calculating columns for you.
The following assumes that the cell at the top of your list contains the word "Detail".
Second, I would make a new column at the far right with an equation like this:
=MID([@Detail], FIND(")",[@Detail])-4, 4)
...and I would tune the "Find" function and the "mid" function until it gives me just the year.
Third, sort the entire table by this new column. Tada!
Transfer the data to column A (cells A1 to A1000 in my example).
Enter the years in column C (cells C2 to C20 in my example).
In cell D2, enter the following Array Formula, and drag it down.
=SUM(IFERROR(IF(VALUE(LEFT(RIGHT($A$1:$A$1000,5),4))=C2,1,0),"-"))
Array Formulas are entered using Control + Shift + Enter, instead of Enter.
The Formula takes the last 5 characters of all entries in the column A. Then it takes the first 4 characters of this new text (to eliminate the closing bracket) and converts the text entries to numerical values. It matches each entry with the year in column C, and totals the matches.
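The same idea (take the four digits just before the closing parenthesis, then tally per year) can be sketched in Python as a cross-check. This is an illustration under assumptions: the last two inventory entries and the names `YEAR_AT_END` and `count_by_year` are made up for the example.

```python
import re
from collections import Counter
from typing import Iterable

# Four digits immediately before a closing parenthesis at the end of the text.
YEAR_AT_END = re.compile(r"(\d{4})\)\s*$")


def count_by_year(models: Iterable[str]) -> Counter:
    """Tally model names by the year that closes their parenthesised suffix."""
    counts: Counter = Counter()
    for name in models:
        match = YEAR_AT_END.search(name)
        if match:  # names that do not end in "YYYY)" are skipped
            counts[int(match.group(1))] += 1
    return counts


inventory = [
    "13-inch MacBook Pro (2011)",
    "13-inch Retina MacBook Pro (Mid 2017)",
    "15-inch MacBook Pro (Late 2011)",  # hypothetical entry
    "Unknown model",                    # hypothetical entry with no year
]
print(count_by_year(inventory))
```

Anchoring on the end of the text means "Early", "Mid" and "Late" are ignored automatically, and entries without a year are left out of the tally instead of raising an error.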
I hope this solves your problem.
Regards,
Vijaykumar Shetye,
Spreadsheet Excellence,
Panaji, Goa India

Sorting and merging in Stata on categorical variables

I am in the process of merging two data sets together in Stata and came up with a potential concern.
I am planning on sorting each data set in exactly the same manner on several categorical variables that are common to both sets of data. HOWEVER, several of the categorical variables have more categories present in one data set over the other. I have been careful enough to ensure that the coding matches up in both data sets (e.g. Red is coded as 1 in both data set A and B, but data set A has only Red, Green and Blue whereas data set B has Red, Green, Blue, and Yellow).
If I were to sort each data set the same way and generate an id variable (gen id = _n) and merge on that, would I run into any problems?
There is no statistical question here, as this is purely about data management in Stata, so I too shall shortly vote for this to be migrated to Stack Overflow, where I would be one of those who might try to answer it, so I will do that now.
What you describe to generate identifiers is not how to think of merging data sets, regardless of any of the other details in your question.
Imagine any two data sets, and then in each data set, generate an identifier that is based on the observation numbers, as you propose. Generating such similar identifiers does not create a genuine merge key. You might as well say that four values "Alan" "Bill" "Christopher" "David" in one data set can be merged with "William" "Xavier" "Yulia" "Zach" in another data set because both can be labelled with observation numbers 1 to 4.
My advice is threefold:
Try what you are proposing with your data and try to understand the results.
Consider whether you have something else altogether, namely an append problem. It is quite common to confuse the two.
If both of those fail, come back with a real problem and real code and real results for a small sample, rather than abstract worries.
I think I may have solved my problem - I figured I would post an answer specifically relating to the problem in case anybody has the same issue.
I have two data sets: One containing information about the amount of time IT help spent at a customer and another data set with how much product a customer purchased. Both data sets contain unique ID numbers for each company and the fiscal quarter and year that link the sets together (e.g. ID# 1001 corresponds to the same company in both data sets). Additionally, the IT data set contains unique ID numbers for each IT person and the customer purchases data set contains a unique ID number for each purchase made. I am not interested in analysis at the individual employee level, so I collapsed the IT time data set to the total sum of time spent at a given company regardless of who was there.
I was interested in merging both data sets so that I could perform analysis to estimate some sort of "responsiveness" (or elasticity) function linking together IT time spent and products purchased.
I am certain this is a case of "merging" data because I want to add more VARIABLES not OBSERVATIONS - that is, I wish to horizontally elongate not vertically elongate my final data set.
Stata 12 has many options for merging - one to one, many to one, and one to many. Supposing that I treat my IT time data set as my master and my purchases data set as my using data set, I would perform a "1:m" or one-to-many merge. This is because I have one IT observation per quarter per company corresponding to MANY purchases. (With the purchases data set as the master, the same link would be an "m:1" merge.)
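To make the key-based link concrete, here is a small Python sketch of attaching one collapsed IT record to many purchase rows by company, year and quarter. It is not Stata code and does not reproduce Stata's merge command; the rows, field names and the function `merge_many_to_one` are hypothetical.

```python
from typing import Dict, List, Tuple

Key = Tuple[int, int, int]  # (company id, fiscal year, fiscal quarter)

# Hypothetical rows; the field names are illustrative only.
it_time: Dict[Key, float] = {  # one row per company-quarter (collapsed)
    (1001, 2012, 1): 12.5,
    (1001, 2012, 2): 4.0,
}
purchases: List[dict] = [      # many rows per company-quarter
    {"company": 1001, "year": 2012, "quarter": 1, "purchase_id": 1, "amount": 300.0},
    {"company": 1001, "year": 2012, "quarter": 1, "purchase_id": 2, "amount": 150.0},
    {"company": 1002, "year": 2012, "quarter": 1, "purchase_id": 3, "amount": 90.0},
]


def merge_many_to_one(many: List[dict], one: Dict[Key, float]) -> List[dict]:
    """Attach the single matching IT-hours value to every purchase row."""
    merged = []
    for row in many:
        key = (row["company"], row["year"], row["quarter"])
        # None marks a purchase with no matching IT record (unmatched row).
        merged.append({**row, "it_hours": one.get(key)})
    return merged


for row in merge_many_to_one(purchases, it_time):
    print(row["purchase_id"], row["it_hours"])
```

The rows are matched on the shared identifiers, never on their position in the file, so the sort order of either data set does not affect the result.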

BIRT: Adding multiple Category (X) Series

I have a single dataset containing 4 columns, each showing the number of rejections for a quarter-year. A 5th column shows the Team to which those values belong.
Is it possible to add 4 fixed points on the x-Axis, each belonging to one of these columns? Then I could add the Team as the Y-Series. I'd like to see the evolution of each team in time.
Take a look at this example:
http://www.birt-exchange.org/org/devshare/designing-birt-reports/1553-use-column-names-as-chart-xaxis/
I solved this by writing a (really short, really simple) loop which takes the values out of the four mentioned columns, one at a time, and creates four rows instead.
So the rows basically contain the Team from the original row (duplicated 4 times) and the Number of Rejections. So now instead of a Row with one team and four numbers, I have four rows, each with one team and one number.
I did this all in the report scripts (under "fetch"). Try it, it's really easy.
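The reshaping loop described above can be sketched as follows. This is Python rather than BIRT report script, and the column names `Q1` to `Q4`, the sample values and the function name `unpivot` are assumptions made for illustration.

```python
from typing import Dict, List, Tuple

QUARTERS = ("Q1", "Q2", "Q3", "Q4")  # hypothetical column names


def unpivot(rows: List[Dict[str, object]]) -> List[Tuple[object, str, object]]:
    """Turn one wide row per team into four (team, quarter, rejections) rows."""
    long_rows = []
    for row in rows:
        for quarter in QUARTERS:
            long_rows.append((row["Team"], quarter, row[quarter]))
    return long_rows


# Hypothetical data: one row per team, one column per quarter.
wide = [
    {"Team": "A", "Q1": 5, "Q2": 3, "Q3": 4, "Q4": 1},
    {"Team": "B", "Q1": 2, "Q2": 6, "Q3": 0, "Q4": 2},
]
for team, quarter, rejections in unpivot(wide):
    print(team, quarter, rejections)
```

With the data in this long form, the quarter can serve as the category (X) series and the team as the grouping for the value series.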

Is it possible to create a table made up of multiple what-if scenario results?

I'm going to describe my goal in steps because I think that might be the easiest way to explain it. This is what I'm trying to do:
1) Create a template that has various calculations on it. On this template, 1 specific cell is left blank. The calculations will change depending on what's in this cell (I'll refer to this as the special cell).
2) There's one final figure behind these calculations that's important. What I want to do is create a list with every possible final figure and in an adjacent cell, list the value of the special cell that gives this final figure.
The problem is Excel for Mac 2008 doesn't use macros or VBA. In my Windows version of Excel, this is just a simple function. But on Excel for Mac 2008, I'm not sure at all how to tackle this. The only solution I can think of is to create one sheet for every possible value of the special cell, with all the calculations done specifically for that value of the special cell. Then I could just link each final figure/special cell to a main page so all the information is together. However, there are roughly 400 values the special cell can take, and I really don't want to create 400 different sheets. Does anybody know how I can do this?
Also, just as a note in case this is easier to visualize what I mean, I'm basically trying to run multiple what-if scenarios and collect one specific number from each of these scenarios.
Here's an example of the processes involved. I should mention that there are actually 2 different special cells; I wrote 1 in the original description because I assumed the idea would be the same for 2:
1) The main template sheet is located on Sheet A
2) There are 10 slots for store names
3) Each store has a rate; the rate is found with a VLOOKUP that looks up special cell 1 in a lookup table located on Sheet B
4) Each store also has an index number (referred to as index)
5) Each store has a calculation which is index * special cell 2 (referred to as calc1)
6) Each store has another calculation which is rate * num1 (referred to as calc2)
7) Each store has another index number (referred to as index2)
8) Some of the index2 values have to be multiplied by calc2, the rest will stay the same (referred to as calc3)
9) A summation has to be done, summing all the calc2 values to result in sum1
10) A summation has to be done, summing all the calc3 values to result in sum2
11) The final figure is sum1 + sum2
It sounds like you could create 400 rows where each row is a what if scenario. Then next to each row you could take an input and an output, and graph accordingly.
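The one-row-per-scenario idea can be sketched in Python: evaluate the same calculation once for each candidate input and keep the input next to its result. The calculation inside `example_model` is a made-up stand-in, not the asker's template, and all names and numbers here are hypothetical.

```python
from typing import Callable, List, Tuple


def scenario_table(model: Callable[[float], float],
                   inputs: List[float]) -> List[Tuple[float, float]]:
    """Evaluate the model once per input and return (input, final figure) rows."""
    return [(value, model(value)) for value in inputs]


def example_model(special: float) -> float:
    """Stand-in for the template's calculations; the formula is made up."""
    index_values = [1.0, 2.0, 3.0]  # hypothetical per-store index numbers
    sum1 = sum(i * special for i in index_values)
    sum2 = 10.0                     # hypothetical fixed second sum
    return sum1 + sum2


table = scenario_table(example_model, [float(v) for v in range(1, 5)])
for special, final in table:
    print(special, final)
```

In a spreadsheet the equivalent is one row per value of the special cell, with every formula on that row referring to the row's own input cell instead of a single shared cell.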
Update
Per your description so far, I've created the attached workbook with some formulas to point you in the right direction:
https://dl.dropbox.com/u/19599049/120813_2c.xlsx
It calculates sum1 and sum2 for 10 stores based on the 2 inputs.
Note that I colored which cells were ending up in which final output.
yellow = original sum1/sum2
blue = array formula version of sum1/sum2
green = data used in both.
I did this to point out that, while this example workbook seems to follow all 11 of your rules, input 2 doesn't appear to be included in the final outputs of my mock-up version for some reason.
Either way this should serve as a good basis to get you started. And I can modify it if you continue to include more details.

Resources