Conditional Formatting based on Another Range - filter

I want to set conditional formatting on a sheet over the range A2:D15, using a custom formula that changes the cell background color. I have column F, which contains a list of names (F2:F13), and column G, which records each name's class (G2:G13). I want to compare each row by saying: if the class in G2 = "Paladin" and F2 is not blank, then apply the conditional formatting. I want this to span all 12 rows in F and G, but I cannot pass an array using an IF function.
Example sheet: https://docs.google.com/spreadsheets/d/1a32ItT0HpRsov_oG5-CVHVe3HZV9WP-LypkxugsoK0g/edit?usp=sharing
I tried using this formula:
=if(and(not(isblank(F2)),G2="Paladin"),1)
It successfully changes the first result in my range because it happens to be true, but I need it to include the entire array, so I tried using this:
=if(and(not(isblank(F2:F13)),G2:G13="Paladin"),1)
I also played around with =if(and(F2=A2,G2="Paladin"),1) - same problem, I reckon, but it would be more accurate if I could find a way to use arrays.
However, the IF function, as I understand it, cannot evaluate arrays. I tried using $ signs to play around with it, similar to this example I found: https://www.benlcollins.com/formula-examples/array-formula-intro/ - but that uses numerical data, and when I use $ it either applies the conditional formatting to the entire row, the entire column, or the entire range A3:D16.

You will need four rules, one per column:
=FILTER(A2, COUNTIF(FILTER(F$2:F,G$2:G="Paladin"), A2))
=FILTER(B2, COUNTIF(FILTER(F$2:F,G$2:G="Paladin"), B2))
=FILTER(C2, COUNTIF(FILTER(F$2:F,G$2:G="Paladin"), C2))
=FILTER(D2, COUNTIF(FILTER(F$2:F,G$2:G="Paladin"), D2))
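A sketch (in Python rather than Sheets, with made-up sample data) of what each of those rules evaluates: the inner FILTER keeps only the names whose class is "Paladin", and COUNTIF then checks whether the current cell's value appears in that list.

```python
# Illustrative data standing in for columns F (names) and G (classes).
names = ["Alice", "Bob", "Cara", "Dan"]
classes = ["Paladin", "Rogue", "Paladin", "Mage"]

# FILTER(F$2:F, G$2:G="Paladin"): keep only the Paladin names.
paladins = [n for n, c in zip(names, classes) if c == "Paladin"]

# COUNTIF(paladins, cell) > 0: the cell gets formatted when its value
# appears in the filtered list.
def should_format(cell_value):
    return cell_value in paladins

print(paladins)               # ['Alice', 'Cara']
print(should_format("Cara"))  # True
print(should_format("Bob"))   # False
```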


Extract substring using importxml and substring-after

Using Google Sheets' IMPORTXML, I was able to extract the following data from a URL (in cell A2) using:
=IMPORTXML(A2,"//a/@href[substring-after(., 'AGX:')]")
Data:
/vector/AGX:5WH
/vector/AGX:Z74
/vector/AGX:C52
/vector/AGX:A27
/vector/AGX:C6L
But, I want to extract the code after "/vector/AGX:". The code is not fixed to 3 letters and number of rows is not fixed as well.
I used =INDEX(SPLIT(AP2,"/,'vector',':'"),1,2). But it applied to only one line of data; I had to copy the INDEX+SPLIT formula down the whole column, and had to insert an additional column to store the codes.
5WH
Z74
C52
A27
C6L
But, I want to be able to extract the code(s) after AGX: using ImportXML in one go. Is there a way?
Solution
Your issue is in how you are using the INDEX formula. The first parameter selects the row (in your case, each element) and the second the column (in your case, either "AGX" or the code after it).
If, instead of getting a single cell, we apply the formula to a range and leave the row parameter empty, the formula returns all the values, achieving what you were aiming for. Here is the implementation (where F1:F5 is the range of values you want the formula applied to):
=INDEX(SPLIT(F1:F5,"/,'vector',':'"),,2)
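A rough Python analogue of what SPLIT + INDEX is doing here (link strings taken from the example above): break each hyperlink on the delimiter characters and keep the piece after the colon.

```python
import re

links = ["/vector/AGX:5WH", "/vector/AGX:Z74", "/vector/AGX:C52"]

# Sheets' SPLIT treats every character of its delimiter string as a separator;
# splitting on "/" and ":" leaves pieces like ["vector", "AGX", "5WH"], and
# the wanted code is the last piece.
codes = [re.split(r"[/:]", link.strip("/"))[-1] for link in links]
print(codes)  # ['5WH', 'Z74', 'C52']
```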
If you are interested in a solution simply using IMPORTXML and XPATH, according to the documentation you could use a substring as follows:
=IMPORTXML(A1,"//a/@href[substring-after(.,'SGX:')]")
The drawback is that it will return the full string, not exclusively what is after the "SGX:", which means you would still need a Google Sheets formula to split it. This is the furthest I have achieved using XPath alone. In XML it would be easier to apply a forEach and really select what comes after the ":", but I believe in Sheets this is more complicated, if not impossible, with XPath alone.
I hope this has helped you. Let me know if you need anything else or if you did not understand something. :)

Extracting data from text file in AMPL without adding indexes

I'm new to AMPL and I have data in a text file in matrix form from which I need to use certain values. However, I don't know how to use the matrices directly without having to manually add column and row indexes to them. Is there a way around this?
So the data I need to use looks something like this, with hundreds of rows and columns (and several more matrices like this), and I would like to use it as a parameter with index i for rows and j for columns.
t=1
0.0 40.95 40.36 38.14 44.87 29.7 26.85 28.61 29.73 39.15 41.49 32.37 33.13 59.63 38.72 42.34 40.59 33.77 44.69 38.14 33.45 47.27 38.93 56.43 44.74 35.38 58.27 31.57 55.76 35.83 51.01 59.29 39.11 30.91 58.24 52.83 42.65 32.25 41.13 41.88 46.94 30.72 46.69 55.5 45.15 42.28 47.86 54.6 42.25 48.57 32.83 37.52 58.18 46.27 43.98 33.43 39.41 34.0 57.23 32.98 33.4 47.8 40.36 53.84 51.66 47.76 30.95 50.34 ...
I'm not aware of an easy way to do this. The closest thing is probably the table format given in section 9.3 of the AMPL Book. This avoids needing to give indices for every term individually, but it still requires explicitly stating row and column indices.
AMPL doesn't seem to do a lot with position-based input formats, probably because it defaults to treating index sets as unordered so the concept of "first row" etc. isn't meaningful.
If you really wanted to do it within AMPL, you could probably put together a work-around along these lines:
declare a single-index param with length equal to the total size of your matrix (e.g. if your matrix is 10 x 100, this param has length 1000)
edit the beginning and end of your "matrix" data file to turn it into appropriate format for a single-index parameter indexed from 1 to n
then define your matrix something like this:
param m{i in 1..nrows, j in 1..ncols} := x[(i-1)*ncols + j];
(not tested, I won't promise that I have rows and columns the right way around there!)
But you're probably better off editing the input file into one of the standard AMPL matrix formats. AMPL isn't really designed for data wrangling - you can do it in a pinch but if you're doing this kind of thing repeatedly it may be less trouble to code it in a general-purpose language e.g. Python.
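A quick check (in Python, with a small invented matrix) of the row-major index arithmetic that work-around relies on: element (i, j) of an nrows x ncols matrix, with all indices starting at 1, sits at position (i-1)*ncols + j in the flattened parameter.

```python
nrows, ncols = 3, 4
flat = list(range(1, nrows * ncols + 1))  # plays the role of x[1..12]

# Rebuild the matrix from the flat list; the extra -1 converts the 1-based
# position (i-1)*ncols + j into Python's 0-based list index.
matrix = {
    (i, j): flat[(i - 1) * ncols + j - 1]
    for i in range(1, nrows + 1)
    for j in range(1, ncols + 1)
}

print(matrix[(1, 1)], matrix[(1, 4)], matrix[(2, 1)], matrix[(3, 4)])
# 1 4 5 12  -- row 1 holds 1..4, row 2 starts at 5, row 3 ends at 12
```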

Mathematica removing columns: "Cannot take positions 2 through 3" error

I have a matrix consisting of 3 rows and 4 columns, of which I require the central two columns.
I have attempted extracting the central two columns as follows:
a = a[[2 ;; 3, All]];
In the Mathematica documentation, the first entry in a[[2 ;; 3, All]] represents the rows and the second the columns; however, whenever I try a[[All, 2 ;; 3]] it removes the top row rather than extracting the two columns. For some reason they seem inverted. I tried to get around this by switching the entries around; however, when I use a[[2 ;; 3, All]], I get the error: Part: Cannot take positions 2 through 3 in a.
I cannot wrap my head around why this keeps happening. It also refuses to extract single columns from the matrix as well.
You show that you are assigning a variable to itself and then saying that things don't work for you. That makes me think you might have previously made assignments to variables and the results of that are lurking in the background and might be responsible for what you are seeing.
With a fresh start of Mathematica, before you do anything else, try
mat={{a,b,c,d},
{e,f,g,h},
{i,j,k,l}};
take23[row_]:=Take[row,{2,3}];
newmat = Map[take23, mat]
Map performs the function take23 on every row and returns a list containing all the results giving
{{b,c},
{f,g},
{j,k}}
If need be you can abbreviate that to
newmat = Map[Take[#,{2,3}]&, mat]
but that requires you understand # and & and it gives the same result.
If necessary you can further abbreviate that to
newmat = Take[#,{2,3}]& /@ mat
Map is widely used in Mathematica programming and can do many more things than just extract elements. Learning how to use that will increase your Mathematica skill greatly.
Or if you really need to use ;; then this
newmat = mat[[All, 2;;3]]
I interpret the documentation for that to mean you want to do something with All the rows and then within each row you want to extract from the second to the third item. That seems to work for me and instantly returns the same result.
If you instead wrote
newmat = mat[[1;;2, 2;;3]]
that would tell it that you wanted to work from row 1 down to row 2 and within those you want to work from column 2 to column 3 and that gives
{{b,c},
{f,g}}
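For comparison, the same column extraction sketched in Python (plain lists standing in for the Mathematica matrix): Map[Take[#, {2, 3}] &, mat] takes elements 2 through 3 of each row (1-based), which corresponds to the 0-based slice row[1:3].

```python
mat = [["a", "b", "c", "d"],
       ["e", "f", "g", "h"],
       ["i", "j", "k", "l"]]

# Take elements 2..3 (1-based) from every row, i.e. the central two columns.
newmat = [row[1:3] for row in mat]
print(newmat)  # [['b', 'c'], ['f', 'g'], ['j', 'k']]
```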

Keep labels of merged variables only - Stata

I have a database A. I want to merge it with a few variables from database B (which has hundreds of variables). All variables in B have labels. So, if I do:
use A.dta
merge 1:1 id using B.dta, keepusing(var1 var2)
I get all value labels from B copied into A.
If I do instead:
merge 1:1 id using B.dta, keepusing(var1 var2) nolabel
var1 and var2 have no labels in A.
There seems to be no option in merge that allows for a solution in between (i.e. copying only the value labels of the merged variables).
A workaround would be to run:
labelbook, problems
label drop `r(notused)'
after the first method. Yet, this needs to be run every time a merge is done (and I am merging many many times). It can also be quite slow (dataset B has many many variables).
Another option would be to create a temporary dataset "B-minus" containing only the variables and value labels I want, and merge from it. But this also entails running the same time-consuming code above, so it's no different.
Is there a "better" way to achieve this?
MCVE:
webuse voter, clear
label list // shows two variables with value labels (candidat and inc)
drop candidat inc
label drop candidat inc2 // we drop all value labels
merge 1:1 pop frac using http://www.stata-press.com/data/r14/voter, nogen keepusing(candidat)
label list // instead of having only the candidat label, we also have inc
There is no such option in merge, but you could simply use macro list manipulation:
webuse voter, clear
label list // shows two variables with value labels (candidat and inc)
drop candidat inc
label drop candidat inc2 // we drop all value labels
local labkeep candidat // define which labels you want to keep
merge 1:1 pop frac using http://www.stata-press.com/data/r14/voter, nogen keepusing(candidat)
quietly label dir
local secondary "`r(names)'"
display "`secondary'"
local newlabels : list secondary - labkeep
display "`newlabels'"
label drop `newlabels'
label list
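The key step above, local newlabels : list secondary - labkeep, is just a list difference. A Python sketch of the same logic (label names here are illustrative, mirroring the voter example):

```python
secondary = ["candidat", "inc"]  # all value labels present after the merge
labkeep = ["candidat"]           # the labels we want to keep

# `list secondary - labkeep`: everything in secondary not in labkeep
# is what gets passed to `label drop`.
newlabels = [lab for lab in secondary if lab not in labkeep]
print(newlabels)  # ['inc']
```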
Update: Not using preserve/restore (thanks @Pearly Spencer for highlighting this) further improves the speed of the method. To see the old code with preserve/restore, see the older versions of this answer.
I think I found a faster method to solve the problem (at least judging by results using timer on, timer off).
So, to recap, the current, slow approach is to merge databases, and then drop all unused labels using
labelbook, problems
label drop `r(notused)'
An alternative and faster method is to load a smaller dataset using only the needed variables. This will only contain the labels of the selected variables. Then, merge this smaller database with the original one. Importantly, the merge direction is reversed! This way we eliminate the need for preserve/restore, which, as @Pearly Spencer suggested, can slow things down a bit, particularly in larger datasets.
In terms of my original example, the code would be:
*** Open and work with dataset A ***
use A.dta // load original dataset
... // do stuff with it (added just for generality)
save A_final.dta // name of final dataset
*** Load dataset B with subset of needed variables only ***
use id var1 var2 using B.dta, clear // this loads id (needed for merging), var1 and var2 and their labels only!
*** Merge modified A dataset into smaller B dataset ***
merge 1:1 id using A_final.dta, keep(using match) // no keepusing() is specified, as all variables in A_final.dta are needed. IMPORTANT: to keep all observations of the original dataset (A, which is the one being merged into B), use "using" rather than "master" in the keep() option.
save A_final.dta, replace // Create final version of A. Done!
That's it! I'm not sure this is the optimal solution, but in my case, where I am merging many datasets which have hundreds of variables, it is way faster.
The code in terms of the MCVE would be:
*** Open original dataset and work with it ***
webuse voter, clear
label list // shows two variables with value labels (candidat and inc)
drop candidat inc
label drop candidat inc2 // we drop all value labels
save final.dta
*** Create temporary dataset ***
use pop frac candidat using http://www.stata-press.com/data/r14/voter, clear // this is key. Only load needed variables!
*** Merge temporary dataset with original one ***
merge 1:1 pop frac using final.dta, nogen
label list // we only have the "candidat" label! Success!
save final.dta, replace

Hide Labels with No Data in SPSS

I just started using SPSS. I was trying the Select Cases option, and later finding frequencies based on that filter.
For Eg:
Suppose Q1 has 12 parts, Q1_1 Q1_2 Q1_3 Q1_4 Q1_5 Q1_6 Q1_7 Q1_8 Q1_9 Q1_10 Q1_11 Q1_12
I want to see the data in these variables based on a condition that I set in Select Cases. When I look at the frequencies of these variables with the filter applied, only 4 of the 12 have data.
Now my question is: can I hide the remaining 8 and show only the 4 with data in my output window?
It's not entirely clear what you are trying to describe; however, reading between the lines, I'm guessing you want to delete tables generated by FREQUENCIES that happen to be empty (likely due to an applied filter, though not necessarily).
You could do this with SPSS scripting, but avoiding that, you may want to explore CTABLES: though its output may not be in exactly the same format as a FREQUENCIES table, it will nonetheless retrieve the same information.
Solution below. It assumes the Python integration for SPSS, the SPSSINC SELECT VARIABLES extension, and of course the CTABLES add-on module.
/****** Simulate example data ******/.
input program.
loop #j = 1 to 100.
compute ID=#j.
vector Q(12).
loop #i = 1 to 12.
do if #j<51 and #i<9.
compute Q(#i) = $sysmis.
else.
compute Q(#i) = trunc(rv.uniform(1,5)).
end if.
end loop.
end case.
end loop.
end file.
end input program.
execute.
/************************************/.
/* frequencies without filtering applied */.
freq q1 to q12.
/* frequencies WITH filtering applied */.
/* Empty tables here should be removed */.
temp.
select if (ID<51).
freq q1 to q12.
spssinc select variables macroname="!Qp" /properties pattern = "^Q\d+$"/options separator="+" order=file.
spssinc select variables macroname="!Qs" /properties pattern = "^Q\d+$"/options separator=" " order=file.
temp.
select if (ID<51).
ctables /table (!Qp)[c][count colpct]
/categories variables=!Qs empty=exclude.
Note: if you had to assess empty variables at the total level, the spssaux2 module has a function (spssaux2.FindEmptyVars) that could help you find the empty variables; you could then build syntax to exclude these, retaining only the variables with valid responses, and run FREQUENCIES. But I don't think spssaux2.FindEmptyVars will honor any filtering.
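The underlying idea (not SPSS syntax, just a Python sketch with invented data mirroring the simulation above): after filtering the cases, keep only those Q variables that still contain at least one non-missing value, and run the frequency tables on those alone.

```python
# Each variable maps to its values for the filtered cases; None = missing.
data = {
    "Q1": [None, None, 3],
    "Q2": [1, None, 2],
    "Q3": [None, None, None],  # entirely empty under this "filter"
}

# Keep only variables with at least one non-missing value.
nonempty = [name for name, values in data.items()
            if any(v is not None for v in values)]
print(nonempty)  # ['Q1', 'Q2']
```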
