Speeding up the insertion of formulas via xlcFormula with Excel-DNA - excel-dna

I am using Excel-DNA to insert formulas into about 40k rows * 10 columns, and it is quite slow.
XlCall.Excel(XlCall.xlcFormula, myFormula, new ExcelReference(row, row, column, column));
I managed to improve it dramatically by temporarily disabling recalculation while updating (XlCall.Excel(XlCall.xlcCalculation, 3);), but ideally I would like to find a way to put an entire column of formulas into Excel in a single operation (I am assuming this would improve the speed).
I tried passing an object[,] with my call to xlcFormula:
XlCall.Excel(XlCall.xlcFormula, excelFormulas, new ExcelReference(1, lastRow, columnNumber, columnNumber));
but it put all the formulas into a single cell (separated by semicolons). Is there a way to do what I am trying to do, or am I wasting my time on something that is impossible?

I also had this trouble and figured out another way to speed up formula insertion.
Try this code:
var formula = "=1+2";
var reference = new ExcelReference(rowFirst, rowLast, columnFirst, columnLast); // it's a rectangular area; just split your huge area into smaller ones here
XlCall.Excel(XlCall.xlcFormulaFill, new object[] { formula, reference });
This code is good when you want to insert the same formula into a lot of cells.
Try to use relative references in the formula.
The previously mentioned solution also works:
XlCall.Excel(XlCall.xlcEcho, false)
... but don't forget to turn echo back on from time to time.
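Putting the pieces together, here is a minimal sketch that fills one column in a single call, with calculation and echo suspended around it (assuming lastRow and columnNumber are defined as in the question, 0-based; the R1C1 formula string is just a placeholder):
XlCall.Excel(XlCall.xlcCalculation, 3);   // 3 = manual calculation
XlCall.Excel(XlCall.xlcEcho, false);      // suspend screen updating
try
{
    var target = new ExcelReference(0, lastRow, columnNumber, columnNumber);
    XlCall.Excel(XlCall.xlcFormulaFill, "=RC[-1]*2", target); // placeholder formula
}
finally
{
    XlCall.Excel(XlCall.xlcEcho, true);       // restore screen updating
    XlCall.Excel(XlCall.xlcCalculation, 1);   // 1 = automatic calculation
}
With relative (R1C1) references, xlcFormulaFill adjusts the formula per row, so one call per column replaces the 40k per-cell calls.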

You could try it with screen updating also switched off: XlCall.Excel(XlCall.xlcEcho, false).
What about using the Clipboard? You could copy the formulae (with tabs between the columns) to the clipboard, and paste all at once into the Excel sheet. This would probably be as fast as you could get Excel to process the formula strings.

Related

Extract substring using importxml and substring-after

Using Google Sheets' IMPORTXML, I was able to extract the following data from a URL (in cell A2) using:
=IMPORTXML(A2,"//a/@href[substring-after(., 'AGX:')]").
Data:
/vector/AGX:5WH
/vector/AGX:Z74
/vector/AGX:C52
/vector/AGX:A27
/vector/AGX:C6L
But I want to extract the code after "/vector/AGX:". The code is not fixed to 3 characters, and the number of rows is not fixed either.
I used =INDEX(SPLIT(AP2,"/,'vector',':'"),1,2), but it applied to only one line of data; I had to copy the INDEX+SPLIT formula down the whole column and insert an additional column to store the codes.
5WH
Z74
C52
A27
C6L
But I want to be able to extract the code(s) after AGX: using IMPORTXML in one go. Is there a way?
Solution
Your issue is in how you are implementing the INDEX formula. The first parameter selects the row (in your case each element) and the second the column (in your case either AGX or the code after it).
If, instead of getting a single cell, we apply this formula to a range and do not set any value for the row, the formula will return all the values, achieving what you were aiming for. Here is its implementation (where F1:F5 is the range of values you want the formula applied to):
=INDEX(SPLIT(F1:F5,"/,'vector',':'"),,2)
If you are interested in a solution simply using IMPORTXML and XPATH, according to the documentation you could use a substring as follows:
=IMPORTXML(A1,"//a/@href[substring-after(.,'SGX:')]")
The drawback of this is that it will return the full string, not just what comes after the SGX:, which means you would need a Google Sheets formula to split it. This is the furthest I have got using XPath exclusively. In XML it would be easier to apply a forEach and select exactly what comes after the ':', but in Sheets I believe this is more complicated, if not impossible, with XPath alone.
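That said, a possible single-formula variant (a sketch only: it assumes SUBSTITUTE broadcasts over the IMPORTXML result under ARRAYFORMULA, and reuses the question's cell A2 and prefix) would be:
=ARRAYFORMULA(SUBSTITUTE(IMPORTXML(A2,"//a/@href[substring-after(.,'AGX:')]"),"/vector/AGX:",""))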
I hope this has helped you. Let me know if you need anything else or if you did not understand something. :)

(Using Julia) How can I reduce my data matrix by averaging values from the same hour?

I am trying to reduce the size of my data and I cannot make it work. I have data points taken every minute over 1 month, and I want to reduce this to one sample for every hour. The problem: some rows have "NA" values, so I delete them, which means there are not exactly 60 points in every hour - it varies.
I have a 'Timestamp' column. I have used this to make a 'datehour' column which has the same value if the data set has the same date and hour. I want to average all the values with the same 'datehour' value.
How can I do this? I have tried the if/for loop below, but it takes very long to run.
Thanks for all your help! I am new to Julia and come from a Matlab background.
======= CODE ==========
uniquedatehour = unique(datehour, 1)
index = []
avedata = reshape([], 0, length(alldata[1, :]))
for j in uniquedatehour
    for i in 1:length(datehour)
        if datehour[i] == j
            index = vcat(index, i)
        else
            rows = alldata[index, :]
            rows = convert(Array{Float64,2}, rows)
            avehour = mean(rows, 1)
            avedata = vcat(avedata, avehour)
            index = []
            continue
        end
    end
end
There are several layers to optimizing this code. I am assuming that your data is sorted on datehour (your code assumes this).
Layer one: general recommendation
Wrap your code in a function. Executing code in global scope in Julia is much slower than within a function. When wrapping it, make sure to either pass the data to your function as arguments or, if the data stays in global scope, qualify it with const;
Layer two: recommendations to your algorithm
A statement like [] creates an array of type Any, which is slow; use a type qualifier like index=Int[] to make it fast;
Using vcat as in index=vcat(index,i) is inefficient; it is better to push!(index, i) in place;
It is better to preallocate avedata, e.g. with fill(NA, length(uniquedatehour), size(alldata, 2)), and assign values into the existing matrix than to vcat onto it;
Your code will produce incorrect results, if I am not mistaken, as it will not catch the last entry of the uniquedatehour vector (assume it has only one element and check what happens - avedata will have zero rows);
The line rows=convert(Array{Float64,2},rows) is probably not needed at all. If alldata is not a Matrix{Float64}, it is better to convert it once at the beginning with Matrix{Float64}(alldata);
You can change the line rows=alldata[index,:] to a view, view(alldata, index, :), to avoid an allocation;
In general you can avoid creating the index vector entirely: it is enough to remember the start s and end e of each run of equal values and then use the range s:e to select the rows you want (see the sketch below).
If you correct those things, please post your updated code and maybe I can help further; there is still room for improvement, but it requires a somewhat different algorithmic approach (though maybe you will prefer the option below for simplicity).
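A minimal sketch of that range-based approach (written in the pre-1.0 Julia syntax used in this thread, e.g. mean(x, 1); on Julia >= 1.0 you would need using Statistics and mean(x, dims = 1)), assuming datehour is a sorted vector:
function hourly_means(alldata::Matrix{Float64}, datehour)
    uniq = unique(datehour)
    avedata = fill(NaN, length(uniq), size(alldata, 2))
    s = 1                                     # start of the current run of equal values
    for (k, key) in enumerate(uniq)
        e = s                                 # extend e to the end of the run
        while e < length(datehour) && datehour[e + 1] == key
            e += 1
        end
        avedata[k, :] = vec(mean(view(alldata, s:e, :), 1))
        s = e + 1
    end
    return avedata
end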
Layer three: how I would do it
I would use DataFrames package to handle this problem like this:
using DataFrames
df = DataFrame(alldata) # assuming alldata is Matrix{Float64}, otherwise convert it here
df[:grouping] = datehour
agg = aggregate(df, :grouping, mean) # maybe this is all what you need if DataFrame is OK for you
Matrix(agg[2:end]) # here is how you can convert DataFrame back to a matrix
This is not the fastest solution (it converts to a DataFrame and back), but it is much simpler for me.

R: Which heatmap/image to get row-sorted plot without any dendrogram?

Which package is best for a heatmap/image with sorting on rows only, but without any dendrogram or other visual clutter - just a 2D colored grid with automatic name labels on both axes? I don't need fancy clustering beyond basic numeric sorting. The data is a 39x10 table of numerics in the range (0, 0.21) which I want to visualize.
I searched SO (see this) and the R sites, and tried a few out. Check out R Graphical Manual to see an excellent searchable list of screenshots and corresponding packages.
The range of packages is confusing - which one is the preferred heatmap package (the way ggplot2 is preferred for most other plotting)? Here is what I found out so far:
base::image - bad: no name labels on axes, no sorting/clustering
base::heatmap - options are far less intelligible than the following
pheatmap::pheatmap - fantastic, but can't seem to turn off the dendrograms? (any hacks?)
ggplot2 - people use geom_tile, as Andrie points out
gplots::heatmap.2 (ref) - seems to be favored by biotech people, but way overkill for my purposes (no relation to ggplot* or Prof Wickham)
plotrix::color2D.matplot - also exists
base::heatmap is annoying, even with args heatmap(..., Colv=NA, keep.dendro=FALSE) it still plots the unwanted dendrogram on rows.
For now I'm going with pheatmap(..., cluster_cols=FALSE, cluster_rows=FALSE) and manually presorting my table, like this guy: Order of rows in heatmap?
Addendum: to display the value inside each cell, see: display a matrix, including the values, as a heatmap . I didn't need that but it's nice-to-have.
With pheatmap you can use the options treeheight_row and treeheight_col and set these to 0.
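For example, a minimal sketch (assuming your 39x10 table is in a matrix m with row and column names set) that keeps the row clustering, which orders the rows, but hides both dendrograms:
library(pheatmap)
pheatmap(m, cluster_cols = FALSE, treeheight_row = 0, treeheight_col = 0)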
Just another option you have not mentioned: package bipartite. It is as simple as you say:
library(bipartite)
mat <- matrix(c(1,2,3,1,2,3,1,2,3), byrow = TRUE, nrow = 3)
rownames(mat) <- c("a","b","c")
colnames(mat) <- c("a","b","c")
visweb(mat, type = "nested")

SUMIFS function in Google Spreadsheet

I'm trying to have a similar function to SUMIFS (like SUMIF but with more than a single criterion) in a Google Spreadsheet. MS-Excel has this function built-in (http://office.microsoft.com/en-us/excel-help/sumifs-function-HA010342933.aspx?CTT=1).
I've tried to use ArrayFormula (http://support.google.com/docs/bin/answer.py?hl=en&answer=71291), similar to the SUMIF:
=ARRAYFORMULA(SUM(IF(A1:A10>5, A1:A10, 0)))
By adding AND:
=ARRAYFORMULA(SUM(IF(AND(A1:A10>5,B1:B10=1), C1:C10, 0)))
But the AND function didn't pick up the ARRAYFORMULA instruction and returned FALSE every time.
The only solution I could find was to use QUERY which seems a bit slow and complex:
=SUM(QUERY(A1:C10,"Select C where A>5 AND B=1"))
My Target is to fill up a table (similar to a Pivot Table) with many values to calculate:
=SUM(QUERY(DataRange,Concatenate( "Select C where A=",$A2," AND B=",B$1)))
Did anyone manage to do it in a simpler and faster way?
In my opinion, the simplest way to make SUMIFS-like functions is to combine the FILTER and SUM functions.
SUM(FILTER(sourceArray, arrayCondition_1, arrayCondition_2, ..., arrayCondition_30))
For example:
SUM(FILTER(A1:A10; A1:A10>5; B1:B10=1))
Explanation: the FILTER() filters the rows in A1:A10 where A1:A10 > 5 and B1:B10 = 1. Then SUM() sums the values of those cells.
This approach is very flexible and easily allows for making COUNTIFS() functions for example as well (just use COUNT() instead of SUM()).
I found a faster function to fill up the "pivot table":
=ARRAYFORMULA(SUM(((Sample!$A:$A)=$A2) * ((Sample!$B:$B)=B$1) * (Sample!$C:$C) ))
It seems to run much faster without the heavier String and Query functions.
As of December, 2013, Google Sheets now has a SUMIFS function, as mentioned in this blog post and documented here.
Note that old spreadsheets are not converted to the new version, though you can try copying and pasting the data into a new workbook.
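For the question's example, the built-in function would look like this (a sketch using the ranges from the question):
=SUMIFS(C1:C10, A1:A10, ">5", B1:B10, 1)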
This guy used the FILTER function to chop the array down by the criteria, then the SUM function to add it all up in the same cell.
http://www.youtube.com/watch?v=Q4j3uSqet14
It worked like a charm for me.

Jxl and maximum number of formatted cells

When I'm writing Excel files with jxl and use my own cell format, I get this warning: "The maximum number of formatted cells has expired. Using default format." I have about 350 cells that need to be formatted, which seems relatively few to me. Am I doing something wrong? I use loops to set my cell formats. Or is there any way to increase the number of formatted cells? My whole code is too long to post, but here is a simple example of how I do the formatting:
for (int i = 0; i < 30; i++) {
    ws.getWritableCell(2, i).setCellFormat(sumrow());
}
How are you creating the CellFormat objects?
What you want to do is to make sure you are reusing the CellFormat objects and not recreating them in a loop somewhere.
That is, unless you really have 350 cells that each need different formatting. Otherwise, create a single CellFormat object and pass that into setCellFormat, as in the sketch below.
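A minimal sketch of the reuse pattern (assuming ws is your WritableSheet and that sumrow() previously constructed a new format on every call):
import jxl.write.WritableCellFormat;
import jxl.write.WritableFont;

// Create the format once, outside the loop, and share it across all cells,
// so jxl stores a single format record instead of one per cell.
WritableCellFormat sumFormat = new WritableCellFormat(
        new WritableFont(WritableFont.ARIAL, 10, WritableFont.BOLD));
for (int i = 0; i < 30; i++) {
    ws.getWritableCell(2, i).setCellFormat(sumFormat); // same object every time
}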
Set the NumberFormat to EXPONENTIAL. It worked for me. Like this:
NumberFormats.EXPONENTIAL
