I'm trying to get a function similar to SUMIFS (like SUMIF, but with more than one criterion) in a Google Spreadsheet. MS Excel has this function built in (http://office.microsoft.com/en-us/excel-help/sumifs-function-HA010342933.aspx?CTT=1).
I've tried using ArrayFormula (http://support.google.com/docs/bin/answer.py?hl=en&answer=71291), which works for a SUMIF-style sum:
=ARRAYFORMULA(SUM(IF(A1:A10>5, A1:A10, 0)))
Then I added AND:
=ARRAYFORMULA(SUM(IF(AND(A1:A10>5,B1:B10=1), C1:C10, 0)))
But AND did not pick up the ArrayFormula instruction and always returned FALSE.
The only solution I could find was to use QUERY, which seems a bit slow and complex:
=SUM(QUERY(A1:C10,"Select C where A>5 AND B=1"))
My goal is to fill a table (similar to a pivot table) with many values to calculate:
=SUM(QUERY(DataRange,Concatenate( "Select C where A=",$A2," AND B=",B$1)))
Did anyone manage to do it in a simpler and faster way?
In my opinion, the simplest way to build SUMIFS-like formulas is to combine the FILTER and SUM functions:
SUM(FILTER(sourceArray, arrayCondition_1, arrayCondition_2, ..., arrayCondition_30))
For example:
=SUM(FILTER(A1:A10;A1:A10>5;B1:B10=1))
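Explanation: FILTER() keeps the values of A1:A10 from the rows where A1:A10 > 5 and B1:B10 = 1, and SUM() then adds those values up. (The semicolons are the argument separators used in locales with a decimal comma; in a US locale, use commas.)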
This approach is very flexible; for example, you can just as easily build a COUNTIFS()-like formula by using COUNT() instead of SUM().
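To make the FILTER-then-SUM logic concrete, here is a minimal Python sketch that models it on plain lists. It assumes each range is a Python list of equal length and each condition is a list of booleans; `sum_filter` and `count_filter` are hypothetical helper names, and `count_filter` counts every kept row, whereas Sheets' COUNT() only counts numeric cells.

```python
def sum_filter(values, *conditions):
    """Model SUM(FILTER(values, cond_1, ..., cond_n)).

    A row is kept only if every condition list is True at that position;
    the kept values are then added up.
    """
    return sum(v for v, *flags in zip(values, *conditions) if all(flags))


def count_filter(values, *conditions):
    """Model a COUNTIFS-style count: number of rows passing every condition."""
    return sum(1 for _, *flags in zip(values, *conditions) if all(flags))


# Hypothetical sample data standing in for A1:A5 and B1:B5.
a = [3, 6, 8, 10, 2]
b = [1, 1, 0, 1, 1]

a_gt_5 = [x > 5 for x in a]   # A1:A5>5
b_eq_1 = [y == 1 for y in b]  # B1:B5=1

print(sum_filter(a, a_gt_5, b_eq_1))    # 16 (6 + 10)
print(count_filter(a, a_gt_5, b_eq_1))  # 2
```

Each extra condition simply narrows the kept rows further, which is why FILTER scales to many criteria without nesting.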
I found a faster formula to fill the "pivot table":
=ARRAYFORMULA(SUM(((Sample!$A:$A)=$A2) * ((Sample!$B:$B)=B$1) * (Sample!$C:$C) ))
It seems to run much faster, since it avoids the heavier string and QUERY functions.
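The multiplication trick works because each comparison yields TRUE/FALSE per row, and multiplying them acts as a row-wise AND (only TRUE times TRUE gives 1). AND() itself instead reduces whole ranges to a single value, much like Python's `all()`, which is why the AND attempt in the question could not test row by row. A minimal Python sketch of filling the pivot table this way, assuming the three columns are equal-length lists (`pivot_sums` is a hypothetical name):

```python
def pivot_sums(col_a, col_b, col_c):
    """Fill a pivot-style table: cell (r, k) = SUM((A=r) * (B=k) * C).

    In Python, True/False multiply as 1/0, mirroring how the spreadsheet
    coerces the comparison results inside ARRAYFORMULA.
    """
    row_keys = sorted(set(col_a))
    col_keys = sorted(set(col_b))
    table = {}
    for r in row_keys:
        for k in col_keys:
            table[(r, k)] = sum((x == r) * (y == k) * z
                                for x, y, z in zip(col_a, col_b, col_c))
    return table


# Hypothetical Sample!A:C data.
a = ["x", "x", "y", "y"]
b = [1, 2, 1, 1]
c = [10, 20, 30, 40]

print(pivot_sums(a, b, c))
# {('x', 1): 10, ('x', 2): 20, ('y', 1): 70, ('y', 2): 0}
```

Like the spreadsheet version, every table cell rescans all rows, so the cost grows with (table cells) x (data rows).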
As of December 2013, Google Sheets has a built-in SUMIFS function, as mentioned in this blog post and documented here. For the example in the question: =SUMIFS(C1:C10, A1:A10, ">5", B1:B10, 1)
Note that old spreadsheets are not converted to the new version, though you can try copy-pasting the data into a new workbook.
The author of this video used FILTER to narrow the array down by the criteria and then SUM to add it all up in the same cell:
http://www.youtube.com/watch?v=Q4j3uSqet14
It worked like a charm for me.
Related
Using Google Sheets' IMPORTXML, I was able to extract the following data from a URL (in cell A2) with:
=IMPORTXML(A2,"//a/@href[substring-after(., 'AGX:')]")
Data:
/vector/AGX:5WH
/vector/AGX:Z74
/vector/AGX:C52
/vector/AGX:A27
/vector/AGX:C6L
But I want to extract the code after "/vector/AGX:". The code is not fixed to 3 characters, and the number of rows is not fixed either.
I used =INDEX(SPLIT(AP2,"/,'vector',':'"),1,2), but it applied to only one row of data. I had to copy the INDEX+SPLIT formula down the whole column and insert an additional column to store the codes:
5WH
Z74
C52
A27
C6L
But I want to be able to extract the code(s) after AGX: with IMPORTXML in one go. Is there a way?
Solution
The issue is how you are using INDEX. After the range, its first index argument selects the row (in your case, each element) and its second selects the column (in your case, either AGX or the code after it).
If, instead of a single cell, we apply this formula to a range and leave the row argument empty, the formula returns all the values, which achieves what you were aiming for. Here is the implementation (where F1:F5 is the range of values you want the formula applied to):
=INDEX(SPLIT(F1:F5,"/,'vector',':'"),,2)
If you are interested in a solution using only IMPORTXML and XPath, according to the documentation you could use substring-after as follows:
=IMPORTXML(A1,"//a/@href[substring-after(.,'AGX:')]")
The drawback is that this returns the full string, not just what comes after AGX:, so you would still need a Google Sheets formula to split it. This is the furthest I got using XPath alone. In XML it would be easier to apply a for-each and select only what comes after the colon, but I believe that in Sheets this is more complicated, if not impossible, with XPath alone.
I hope this helps. Let me know if you need anything else or if anything is unclear. :)
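For reference, XPath 1.0's substring-after(s, m) returns the part of s after the first occurrence of m, or an empty string when m does not occur. A small Python sketch of that contract, applied to each extracted href the way you would want it applied per row (`substring_after` is a hypothetical helper, not a Sheets function):

```python
def substring_after(s: str, marker: str) -> str:
    """Mimic XPath 1.0 substring-after() for a non-empty marker.

    str.partition returns (head, sep, tail); when the marker is absent,
    sep and tail are both "", which matches XPath's empty-string result.
    """
    return s.partition(marker)[2]


hrefs = [
    "/vector/AGX:5WH",
    "/vector/AGX:Z74",
    "/vector/AGX:C52",
    "/vector/AGX:A27",
    "/vector/AGX:C6L",
]

codes = [substring_after(h, "AGX:") for h in hrefs]
print(codes)  # ['5WH', 'Z74', 'C52', 'A27', 'C6L']
```

Because it cuts at the whole marker string, codes of any length come out intact; SPLIT, by contrast, by default treats each character of its delimiter string as a separate delimiter.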
I am trying to reduce the size of my data and I cannot make it work. I have data points taken every minute over one month, and I want to reduce this to one sample per hour. The problem is that some of my runs have "NA" values, so I delete those rows, which means there are not exactly 60 points in every hour; the count varies.
I have a 'Timestamp' column, which I used to build a 'datehour' column that has the same value for all rows with the same date and hour. I want to average all the values that share the same 'datehour' value.
How can I do this? I have tried the if statement and for loop below, but it takes a very long time to run.
Thanks for all your help! I am new to Julia and come from a Matlab background.
======= CODE ==========
uniquedatehour=unique(datehour,1)
index=[]
avedata=reshape([],0,length(alldata[1,:]))
for j in uniquedatehour
    for i in 1:length(datehour)
        if datehour[i]==j
            index=vcat(index,i)
        else
            rows=alldata[index,:]
            rows=convert(Array{Float64,2},rows)
            avehour=mean(rows,1)
            avedata=vcat(avedata,avehour)
            index=[]
            continue
        end
    end
end
There are several layers to optimizing this code. I am assuming that your data is sorted on datehour (your code assumes this).
Layer one: general recommendation
Wrap your code in a function. Executing code in global scope in Julia is much slower than executing it within a function. When wrapping it, either pass the data to your function as arguments or, if the data has to stay in global scope, declare it const;
Layer two: recommendations to your algorithm
A statement like [] creates an array with element type Any, which is slow; use a typed constructor such as index=Int[] to make it fast;
Using vcat as in index=vcat(index,i) is inefficient; it is better to append in place with push!(index, i);
It is better to preallocate avedata, e.g. with fill(NA, length(uniquedatehour), size(alldata, 2)), and assign values into the existing matrix than to vcat onto it;
If I am not mistaken, your code will produce incorrect results, as it will not catch the last entry of the uniquedatehour vector (assume it has only one element and check what happens: avedata will have zero rows);
The line rows=convert(Array{Float64,2},rows) is probably not needed at all. If alldata is not a Matrix{Float64}, it is better to convert it once at the beginning with Matrix{Float64}(alldata);
You can change the line rows=alldata[index,:] to a view such as view(alldata, index, :) to avoid an allocation;
In general you can avoid creating the index vector at all: it is enough to remember the start s and end e positions of each run of equal values and then use the range s:e to select the rows you want.
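The start/end idea from the last point is language-neutral; here it is sketched in Python rather than Julia, assuming the keys are already sorted (as your code assumes) and each row is a list of numbers with the NA rows already dropped. `run_means` is a hypothetical name. Note that the final run is flushed too, which is the case the original loop misses.

```python
def run_means(keys, rows):
    """Average consecutive rows that share the same key.

    keys: sorted sequence of group labels (e.g. datehour values).
    rows: list of equal-length numeric lists, aligned with keys.
    Returns a list of (key, column_means) pairs, one per run.
    """
    result = []
    n = len(keys)
    s = 0  # start of the current run
    while s < n:
        e = s  # end of the current run (inclusive)
        while e + 1 < n and keys[e + 1] == keys[s]:
            e += 1
        block = rows[s:e + 1]  # the s:e range, no index vector needed
        means = [sum(col) / len(col) for col in zip(*block)]
        result.append((keys[s], means))
        s = e + 1  # next run starts right after this one, including the last
    return result


keys = ["d1h0", "d1h0", "d1h1", "d1h2", "d1h2", "d1h2"]
rows = [[1, 10], [3, 20], [5, 5], [0, 1], [3, 2], [6, 3]]
for key, means in run_means(keys, rows):
    print(key, means)
# d1h0 [2.0, 15.0]
# d1h1 [5.0, 5.0]
# d1h2 [3.0, 2.0]
```

Each row is visited once, so the work is linear in the number of rows instead of (unique hours) x (rows) as in the nested loop.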
If you fix those things, please post your updated code and maybe I can help further, as there is still room for improvement, but it requires a somewhat different algorithmic approach (though you may prefer the option below for its simplicity).
Layer three: how I would do it
I would use the DataFrames package to handle this problem like this:
using DataFrames, Statistics
df = DataFrame(alldata, :auto) # assuming alldata is a Matrix{Float64}; otherwise convert it here
df.grouping = datehour
gd = groupby(df, :grouping)
agg = combine(gd, valuecols(gd) .=> mean) # maybe this is all you need if a DataFrame is OK for you
Matrix(agg[:, Not(:grouping)]) # this is how you can convert the DataFrame back to a matrix
This is not the fastest solution (it converts to a DataFrame and back), but it is much simpler in my opinion.
I'd like to calculate the standard deviation over two fields from the same dataset.
Example:
MyField1 = 10, 10
MyField2 = 20
What I want is the standard deviation of (10, 10, 20); the expected result is 4.7.
In SSRS I'd like to have something like this:
=StDevP(Fields!MyField1.Value + Fields!MyField2.Value)
Unfortunately this isn't possible, since (Fields!MyField1.Value + Fields!MyField2.Value) returns a single value, not a list of values. Is there no way to combine two fields from the same dataset into some kind of temporary dataset?
The only solutions I have are:
Create a new dataset that contains all values from both fields. But this is very annoying, because I need about twenty of those and I have six report parameters that need to filter every query, so it would probably get very slow and tedious to maintain.
Write the formula by hand. But I don't know how yet; StDevP is not that trivial to me. This is how I did it with Avg, which is mathematically simpler:
=(SUM(Fields!MyField1.Value)+SUM(Fields!MyField2.Value))/2
found here: http://social.msdn.microsoft.com/Forums/is/sqlreportingservices/thread/7ff43716-2529-4240-a84d-42ada929020e
By the way, I know it's odd to make such a calculation, but this is what my customer wants and I have to deliver somehow.
Thanks for any help.
StDevP is the population standard deviation.
An expression like =StDevP(Fields!MyField1.Value + Fields!MyField2.Value) works fine for me, but it computes the deviation of a single value (Fields!MyField1.Value + Fields!MyField2.Value), which is always 0.
You can look here for the formula:
standard deviation (wiki)
I believe you need to calculate this over some group (or the full dataset); to do that, pass the scope as the second argument of StDevP:
=StDevP(Fields!MyField1.Value + Fields!MyField2.Value, "MyDataSet1")
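If what you actually need is the population standard deviation of the pooled values (10, 10, 20) rather than of per-row sums, it can be assembled from per-field counts, sums and sums of squares, using sqrt(sum(x^2)/n - mean^2) where n is the combined count. The Python sketch below only checks the arithmetic; translating it into an SSRS expression built from Sum and Count is left as an assumption to verify, and `pooled_stdevp` is a hypothetical name.

```python
import math


def pooled_stdevp(*fields):
    """Population standard deviation over all values of several fields.

    Uses only per-field aggregates (count, sum, sum of squares), so the
    fields never have to be merged into one list first.
    """
    n = sum(len(f) for f in fields)
    total = sum(sum(f) for f in fields)
    total_sq = sum(sum(x * x for x in f) for f in fields)
    mean = total / n
    variance = total_sq / n - mean * mean
    return math.sqrt(max(variance, 0.0))  # clamp tiny negative rounding errors


my_field1 = [10, 10]
my_field2 = [20]
print(round(pooled_stdevp(my_field1, my_field2), 2))  # 4.71
```

Note that the mean here divides the combined sum by the combined count of values, not by the number of fields.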
I am using Excel-DNA to insert formulas into about 40k rows x 10 columns, and it is quite slow:
XlCall.Excel(XlCall.xlcFormula, myFormula, new ExcelReference(row, row, column, column));
I managed to improve it dramatically by temporarily disabling recalculation of cells on update (XlCall.Excel(XlCall.xlcCalculation, 3);), but ideally I would like to find a way to put an entire column of formulas into Excel in a single operation (I am assuming this would improve the speed).
I tried passing an object[,] with my call to xlcFormula:
XlCall.Excel(XlCall.xlcFormula, excelFormulas, new ExcelReference(1, lastRow, columnNumber, columnNumber));
but it put all the formulas into a single cell (separated by semicolons). Is there a way to do what I am trying to do, or am I wasting my time on something that is impossible?
I also had this problem and found another way to speed up formula insertion.
Try this code:
var formula = "=1+2";
var reference = new ExcelReference(rowFirst, rowLast, columnFirst, columnLast); // a rectangular area; split a huge area into smaller ones here
XlCall.Excel(XlCall.xlcFormulaFill, new object[] { formula, reference });
This code works well when you want to insert the same formula into many cells.
Try to use relative references in the formula.
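The comment about splitting a huge area into smaller ones boils down to chunking the row range. A language-neutral sketch in Python (the Excel-DNA call itself stays in C#; `row_chunks` and the chunk size are hypothetical choices you would tune):

```python
def row_chunks(first_row, last_row, chunk_size):
    """Yield inclusive (row_first, row_last) pairs covering first_row..last_row.

    Each pair would become one ExcelReference(row_first, row_last, col_first,
    col_last) passed to xlcFormulaFill.
    """
    if chunk_size < 1:
        raise ValueError("chunk_size must be positive")
    for start in range(first_row, last_row + 1, chunk_size):
        yield start, min(start + chunk_size - 1, last_row)


print(list(row_chunks(0, 39_999, 10_000)))
# [(0, 9999), (10000, 19999), (20000, 29999), (30000, 39999)]
```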
The xlcEcho approach also works:
XlCall.Excel(XlCall.xlcEcho, false)
... but don't forget to re-enable echo from time to time.
You could also try it with screen updating switched off: XlCall.Excel(XlCall.xlcEcho, false).
What about using the clipboard? You could copy the formulas (with tabs between the columns) to the clipboard and paste them all at once into the Excel sheet. This would probably be as fast as you could get Excel to process the formula strings.
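The clipboard text described above is just tab-separated columns and line-separated rows. A minimal Python sketch that builds it from a 2-D list of formula strings; `to_clipboard_text` is a hypothetical helper, CRLF is assumed as the row separator (the usual Windows line ending), and actually placing the text on the clipboard and pasting is not shown:

```python
def to_clipboard_text(formulas):
    """Join a 2-D list of formula strings: tabs between columns, CRLF between rows."""
    return "\r\n".join("\t".join(row) for row in formulas)


# Hypothetical 3-row, 2-column block of formulas.
grid = [[f"=A{r}+B{r}", f"=A{r}*2"] for r in range(1, 4)]
print(repr(to_clipboard_text(grid)))
```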
Using gnumeric, how do I sum the positive values in a range without creating a new column?
I'm thinking something along the lines of:
SUM(B21:B25, #>0&)
or
SUM(SELECT(B21:B25, #>0&))
"#>0&" is Mathematica-ese for a function returning true if its argument is greater than 0, false otherwise.
More generically: how do I apply an aggregate function to the cells in a range that meet a specific condition?
Try using the SUMIF function:
=SUMIF(B21:B25, ">0")
The gnumeric documentation linked above doesn't contain much detail on how to use SUMIF. It claims that the function is Excel-compatible, so you may have some luck reading the documentation for Excel's SUMIF function if you want more information.
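The more generic question (apply an aggregate to the cells of a range that meet a condition) is the pattern SUMIF implements for sums. A Python sketch of the same idea with an arbitrary predicate and aggregate, assuming the range is a plain list of numbers (`aggregate_if` is a hypothetical name):

```python
def aggregate_if(values, predicate, aggregate=sum):
    """Apply `aggregate` to the values for which `predicate` is true."""
    return aggregate([v for v in values if predicate(v)])


b21_b25 = [3, -1, 0, 4.5, -2]  # hypothetical contents of B21:B25

print(aggregate_if(b21_b25, lambda v: v > 0))       # 7.5, like =SUMIF(B21:B25, ">0")
print(aggregate_if(b21_b25, lambda v: v > 0, max))  # 4.5
print(aggregate_if(b21_b25, lambda v: v < 0, len))  # 2
```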