Corresp biplot in R - column scores plotted as 0 - correspondence-analysis

I am quite new to R, I am trying to do a Corresp analysis (MASS package) on summarized data. While the output shows row and column score, the resulting biplot shows the column scores as zero, making the plot unreadable (all values arranged by row scores in an expected manner, but flat along the column scores).
the code is
corresp(some_data)
biplot(corresp(some_data, nf = 2))
I would be grateful for any suggestions as to what I'm doing wrong and how to amend this, thanks in advance!
Martin
link to the image
the plot
corresp results

As suggested here:
http://www.statsoft.com/textbook/correspondence-analysis
the biplot actually depicts distributions of the row/column variables over 2 extracted dimensions where the variables' dependency is "the sharpest".
Looks like in your case a good deal of dependencies is concentrated along just one dimension, while the second dimension is already mush less significant.
It does not seem, however, that you relationships are weak. On the contrary, looking at your graph, one can observe the red (column) variable's interception with 2 distinct regions of the other variable values.
Makes sense?
Regards,
Igor

Related

Create a Dynamic Array formula (Excel) to combine multiple results columns into one column that is filtered & sorted using multiple criteria?

The sample data in the image below is collected from a round robin tournament.
There is a Round column,Home team & Away team columns listing who is playing who. A team could be either Home or Away.
For each match in a round (including any "Bye" match) the number of games won for the Home and Away team are recorded in separate columns respectively.
"Ff" = forfeit and has a value of 0. "Bye" result is left blank (at this stage).
Output columns are "Won, Lost, Round".
Required output (shown in the image) is, for any selected team, the top n most-games-won matches (from both Home & Away) sorted in descending order and then the corresponding games lost but sorted in ascending order where the games won are equal. Finally show the rounds where those scores occurred.
These are the challenges I've faced in going from data to output in one step using dynamic array formula:
Collating/Combining the the Win results into 1 column. Likewise the Losses.
Getting the array to ignore blanks or convert "Ff" to 0 without getting #NUM or #VALUE errors.
Ensuring that if I used separate single column arrays the corresponding Loss and Round matched the Win result
Although "Round, Won, Lost" would be acceptable. But I wasn't able to get the Dynamic Array capability to give the required output with this order.
SUMPRODUCT, INDEX(MATCH), SORT(FILTER) functions all hint at a possible one step formula solution.
The solutions are numerous for sorting & filtering where the existing values are already in one column. There was one solution that dealt with 2 columns of values which was somewhat useful How to get the highest values from 2 columns in excel - Stackoverflow 2013
Many other responses are around the use of concatenation, combining/merging array sets, aggregation etc.
My work around solution is to use a Helper Sheet to combine the Wins from the separate results columns and convert blanks & "Ff" to -1. Likewise for Losses. Using the formula for each line
=IF($C5=L$2,IF($F5="",-1,IF($F5="Ff",0,$F5)),IF($D5=L$2,IF($G5="",-1,IF($G5="Ff",0,$G5)),-1))
Example Helper Sheet
To get the final output the Dynamic Array formula was used on the Helper Sheet data
=SORT(FILTER(L$26:N$40,L$26:L$40>=LARGE(L$26:L$40,$J$3),""),{1,2},{-1,1},FALSE)
I'm trying to avoid using pivottable, VBA solutions. Powerquery possible but not preferred.
Apologies for the screenshots but I couldn't work out how to attach the sample spreadsheet file. (Unfortunately Stackoverflow Help didn't help me to/not to do this.)
Based on the comments I changed my answer with a different approach:
=LET(data,A5:F19,
round,INDEX(data,,1),
ha,CHOOSECOLS(data,3,4),
HAwonR,CHOOSECOLS(data,5,6,1),
w,BYROW(ha,LAMBDA(h,IFERROR(XMATCH(L2,h),0))),
clm,CHOOSE(w,{1,2},{2,1}),
srtwon,DROP(REDUCE(0,SEQUENCE(ROWS(data)),LAMBDA(y,z,VSTACK(y,INDEX(HAwonR,z,HSTACK(INDEX(clm,z,),3))))),1),
res,FILTER(srtwon,w),
TAKE(SORT(res,{1,2},{-1,1}),J3))
Old answer:
=LET(data,A5:F19,
round,INDEX(data,,1),
home,INDEX(data,,3),
away,INDEX(data,,4),
HAwonR,CHOOSECOLS(data,5,6,1),
w,MAP(home,away,LAMBDA(h,a,OR(h=L2,a=L2))),
won,FILTER(HAwonR,w),
TAKE(SORT(won,{1,2},{-1,1}),J3))
In your example you selected round 3 for the third result, but that wasn't won, so I guess that was by mistake.
As you can see making use of LET avoids helpers. Let allows you to create names (helpers) that are stored and because you can name them, you can make complex formulas be more readable.
Basically what it does is filter the columns Home, Away and Round (in that order) for either Home or Away equal the team in cell L2. That's sorted column 1 descending and column 2 ascending. Than the number of rows mentioned in cell J3 are displayed from that sorted array.
Here is my solution based on the excellent contribution by #P.b. Thank you much appreciated.
The wins (likewise losses) required mapping the presence, of the team in question, as hT (home team) to the games it won (hG) and adding to that a 2nd mapping of the games it won (aG) when it was the away team (aT). Essentially what was being done on the Helper Sheet. Result was a 1 column array for game wins and a 1 column array for game losses.
In the process I was able to convert the "Ff" text to 0. I attempted without the conversion and it threw an error.
Instead of CHOOSECOLS used HSTACK to create the new array (wins, losses & round) for the FILTER, SORT, TAKE to work on.
If it could be made conciser(?) that is the next challenge. Overall (not just my solution), this exercise has provided greater flexibility and solved the problems stated. I'm happy!
=LET(data,A5:G19,
round,INDEX(data,,1),
hT,INDEX(data,,3),
aT,INDEX(data,,4),
hG,INDEX(data,,6),
aG,INDEX(data,,7),
wins,MAP(hG,
MAP(hT,LAMBDA(h,h=L2)),
LAMBDA(w,t,IF(w="Ff",0,w)*IF(t=TRUE,1,0))) +
MAP(aG,
MAP(aT,LAMBDA(a,a=L2)),
LAMBDA(w,t,IF(w="Ff",0,w)*IF(t=TRUE,1,0))),
losses,MAP(aG,
MAP(hT,LAMBDA(h,h=L2)),
LAMBDA(w,t,IF(w="Ff",0,w)*IF(t=TRUE,1,0))) +
MAP(hG,
MAP(aT,LAMBDA(a,a=L2)),
LAMBDA(w,t,IF(w="Ff",0,w)*IF(t=TRUE,1,0))),
HAwonR,HSTACK(wins,losses,round),
w,MAP(home,away,LAMBDA(h,a,OR(h=L2,a=L2))),
won,FILTER(HAwonR,w),
TAKE(SORT(won,{1,2},{-1,1}),J3))

Google Sheets calculate characters only once

Is there a formula in google sheets to calculate a character only once. For example, if a row has 5 columns (Monday-Friday) and there are 2 or 3 columns marked with X. How can I calculate how many rows have an X. I don't need to know how many Xs there are just how many have an X?
Reina, I have one answer, though there may be better ones.
This formula, pasted into B34, should do what you want. It merges all the cells in column B to F, in each row, into one value, substitutes out possible spaces, then checks if it has at least one "y" (as used in your example.
=COUNTIF(ARRAYFORMULA(
SUBSTITUTE(B4:B29&C4:C29&D4:D29&E4:E29&F4:F29," ","")),
"*y*")
It is coded to search all student rows, ie. between 4 and 29 - change these row numbers if necessary.
If the attendance might be marked with something other than a "y", you could change the "y" part of the formula to "?*". I just didn't know if other values might be used, eg. an "S' for sick day or something, and you wanted to ignore those.
Then, you can drag the new formula from B34, sideways on row 34, to G34 and beyond, and it should calculate the results for the subsequent weeks. It will shift the columns being checked by the formula automatically.
Let me know if this works for you, or if you need something else.
To possibly ease data entry, here is a sample sheet with the formula, but with check boxes replacing the cells where attendance is marked.
https://docs.google.com/spreadsheets/d/1ON5Rc55aLVq_LHtFOfpgmf876bYg2ITfwpbifklr3lU/edit?usp=sharing
Here the formula is slightly modified to look for "TRUE" values, instead of "y"s.
UPDATE: To look for ANY non-blank cell in that range, and count "1" for every student that week that attended at least one day, the formula is:
=COUNTIF(
ARRAYFORMULA( B4:B29&C4:C29&D4:D29&E4:E29&F4:F29), ">""")
or
=COUNTIF(
ARRAYFORMULA( B4:B29&C4:C29&D4:D29&E4:E29&F4:F29), "?*")
See sample here:
https://docs.google.com/spreadsheets/d/1ON5Rc55aLVq_LHtFOfpgmf876bYg2ITfwpbifklr3lU/edit#gid=461771088&range=B34:F34
Let me know if this answers your question, or do you need to do something specific with the "y,x, and o"s?

Is there any option to do FOR loop in excel?

I have an excel that I'm calculating my Scrum Task's completed average. I have Story point item also in the excel. My calculation is:
Result= SP * percentage of completion --> This calculation is for each row and after that I sum up all result and taking the summary.
But sometimes I am adding new task and for each task I am adding the calculation to the average result.
Is there any way to use for loop in the excel?
for(int i=0;i<50;i++){ if(SP!=null && task!=null)(B+i)*(L+i)}
My calculation is like below:
AVERAGE((B4*L4+B5*L5+B6*L6+B7*L7+B8*L8+B9*L9+B10*L10)/SUM(B4:B10))
First of all, AVERAGE is not doing anything in your formula, since the argument you pass to it is just one single value. You already do an average calculation by dividing by the sum. That average is in fact a weighted average, and so you could not even achieve that with a plain AVERAGE function.
I see several ways to make this formula more generic, so it keeps working when you add rows:
1. Use SUMPRODUCT
=SUMPRODUCT(B4:B100,L4:L100)/SUM(B4:B100)
The row number 100 is chosen arbitrarily, but should evidently encompass all data rows. If you have no data occurring below your table, then it is safe to add a large margin. You'll want to avoid the situation where you think you add a line to the table, but actually get outside of the range of the formula. Using proper Excel tables can help to avoid this situation.
2. Use an array formula
This would be a second resort for when the formula becomes more complicated and cannot be executed with a "simple" SUMPRODUCT. But the above would translate to this array formula:
=SUM(B4:B100*L4:L100)/SUM(B4:B100)
Once you have typed this in the formula bar, make sure to press Ctrl+Shift+Enter to enter it. Only then will it act as an array formula.
Again, the same remark about row number 100.
3. Use an extra column
Things get easy when you use an extra column for storing the product of B & L values for each row. So you would put in cell N4 the following formula:
=B4*L4
...and then copy that relative formula to the other rows. You can hide that column if you want.
Then the overal formula can be:
=SUM(N4:N100)/SUM(B4:B100)
With this solution you must take care to always copy a row when inserting a new row, as you need the N column to have the intermediate product formula also for any new row.

Outcome difference: using list & for-loop vs. single parameter input

This is my first question, so please let me know if I'm not giving enough details or asking a question that is not relevant on this platform!
I want to compute the same formula over a grid running from 0 to 4.0209, therefore I'm using a for-loop with an defined array using numpy.
To be certain that the for-loop is right, I've computed a selection of values by just using specific values for the radius an input in the formula.
Now, the outcomes with the same input of the radius is just slightly different. Do I interpret my grid wrongly? Or is there an error in my script?
It probably is something pretty straightforward, but maybe some of you can find a minute to help me out.
Here I use a selection of values for my radius parameter.
Here I use a for-loop to compute over a distance
Here are the differences in the outcomes:
Outcomes computed with for-loop:
9.443,086753902220000000
1.935,510475232510000000
57,174050755727700000
1,688894026484580000
0,020682674424032700
Outcomes computed with selected radii:
9.444,748178731630000000
1.938,918526458330000000
57,476599453309800000
1,703815523775800000
0,020957378277984600

R: Extract non-NA elements from a matrix and return with row/column labels

I have a large matrix as a result of using tapply with an INDEX argument of two rows from a dataframe. Most of the matrix is empty (NA).
Here is how I used tapply: latavgs <- tapply(geodata$latitude,geodata[5:6],FUN=mean) where latavgs is my resulting matrix, and geodata is the dataframe mentioned above.
Is there a way to extract only the non-NA elements from latavgs and return them in such a way that I could have the row and column listed, as well as the value? Or is there a better way to use tapply than what I've done, if I want to take the means of all values in geodata that belong to each unique pair of values from geodata[5:6]? I.e., for each unique pair in geodata[5:6] I get one mean.
Thanks for any help.
Really hard to solve this without fully look at your data.
Try this: Why does tapply take the subset as NA and not exclude them totally

Resources