I have a large table with about 1000 rows. If I try to get the selected rows from the table, it takes more than 30 seconds. Is there a way to speed up the process? I am using JXA.
var selected = table.rows.whose({selected: true})();
var names = "";
for (var r = 0; r < selected.length; r++) {
    names += ", " + selected[r].uiElements[1].name();
}
console.log(names);
Is there a faster way?
Thanks!
Apparently you can just narrow it down with a specifier instead of going through each element to get its name:
names = table.rows.whose({selected:true}).uiElements[1].name()
This was much faster for my case.
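For reference, a minimal end-to-end sketch of the faster approach; the application name, window, and scroll-area path below are placeholders for whatever your actual UI hierarchy looks like:

// Sketch only: "MyApp" and the element path are assumptions, adjust to your UI tree.
var se = Application('System Events');
var table = se.processes['MyApp'].windows[0].scrollAreas[0].tables[0];

// A single Apple Event round trip returns all the names as an array,
// instead of one round trip per selected row.
var names = table.rows.whose({selected: true}).uiElements[1].name();
console.log(names.join(', '));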
I'm not a VBA coder, and I would prefer an Excel formula if possible; the easiest solution will be the best one.
[Test workbook screenshot]
As you can see, I have plenty of columns, which are filterable.
I am attempting to retrieve an average of Column L, but I want the data to be calculated for the correct month in G3:R3.
The resulting calculation needs to be recalculated whenever the data is filtered by customer, site, status, job type, etc.
I reference the resulting cells in another sheet, which gives me an at-a-glance view of trends, so filtering by month in each sheet is not an option.
=AVERAGE(IF(MONTH(E9:E1833)=1,(J9:J1833)))
This one does not update with the filtered data.
=SUM(IF(MONTH(E9:E1833)=1,J9:J1833,0)) /SUM(IF(MONTH(E9:E1833)=1,1))
This one does not update with the filtered data.
I have tried 5 different SUBTOTAL formulas, some with OFFSET, and none of them produces the same result I get when checking manually.
Each worksheet has over 1,500 rows; the largest has 29,148 rows. The data goes back as far as 2005.
Please can someone help me find a solution?
One possible solution is to create a helper column which returns 1 if the row is visible and returns 0 if the row is invisible (or blank). This allows a bit more freedom in your formulas.
For example, if you want to create a helper column in column X, type this into cell X9 and drag down:
= SUBTOTAL(103,A9)
Now you can create a custom average formula, for example:
= SUMPRODUCT((MONTH(E9:E1833)=1)*(X9:X1833)*(J9:J1833))/
SUMPRODUCT((MONTH(E9:E1833)=1)*(X9:X1833))
Not exactly pretty, but it gets the job done. (Since SUMPRODUCT handles the array math itself, this can be entered with a plain Enter; no Ctrl+Shift+Enter is needed.)
With even more helper columns you could avoid SUMPRODUCT altogether and accomplish this with a single AVERAGEIFS.
For example if you type into cell Y9 and drag down:
= MONTH(E9)
Then your formula could be:
= AVERAGEIFS(J9:J1833,X9:X1833,1,Y9:Y1833,1)
There isn't a clean way to do this without at least one helper column (if you want to avoid VBA).
I'm new to CouchDB.
I have to filter records by date (the date must be between two values) and sort the data by name, date, etc. (depending on the user's selection in the table).
In MySQL it would look like this:
SELECT * FROM table WHERE date > "2015-01-01" AND date < "2015-08-01" ORDER BY name/date/email ASC/DESC
I can't figure out whether I can use one view for all of this.
Here is my map function:
function (doc) {
  emit(
    [doc.date, doc.name, doc.email],
    {
      email: doc.email,
      name: doc.name,
      date: doc.date
    }
  );
}
I try to filter the data using startkey and endkey, but I'm not sure how to sort the data this way:
startkey=["2015-01-01"]&endkey=["2015-08-01"]
Can I use one view? Or do I have to create several views, with the key order depending on the current sort field: [doc.date, doc.name, doc.email], [doc.name, doc.date, doc.email], etc.?
Thanks for your help!
As Sebastian said, you need to use a list function to do this in Couch.
If you think about it, this is what MySQL is doing. Its query optimizer will pick an index on your table, scan a range from that index, load what it needs into memory, and execute the query logic.
In Couch the view is your B-tree index, and a list function can implement whatever logic you need. It can be used to spit out HTML instead of JSON, but it can also be used to filter and sort the output of your view and still spit out JSON in the end. It might not scale very well to millions of documents, but then MySQL might not either.
So your options are the ones Sebastian highlighted:
1. The view sorts by date, the query selects the date range, and a list function loads everything into memory and sorts it by email/etc.
2. The views sort by email/etc., and a list function filters out everything outside the date range.
Which one you choose depends on your data and architecture.
With option 1 you may skip the list function entirely: get all the necessary data from the view in one go (with include_docs), and sort client side. This is how you'll typically use Couch.
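A rough sketch of that client-side route, assuming the map function above is saved as a view named by_date in a design document _design/app (the database URL and both names are made up for this example):

// Query the view for the date range and pull the full docs along with it.
// The trailing {} in the endkey sorts after any string, so the whole end date is included.
const url = 'http://localhost:5984/mydb/_design/app/_view/by_date'
  + '?startkey=' + encodeURIComponent(JSON.stringify(['2015-01-01']))
  + '&endkey=' + encodeURIComponent(JSON.stringify(['2015-08-01', {}]))
  + '&include_docs=true';

fetch(url)
  .then(res => res.json())
  .then(body => {
    // The view already restricted the date range; sort however the user asked.
    const docs = body.rows.map(row => row.doc);
    docs.sort((a, b) => a.email.localeCompare(b.email));
    console.log(docs);
  });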
If you need this done server side, you'll need your list function to load every matching document into an array, then sort it and JSON-serialize it. This obviously falls to pieces if there are so many matching documents that they don't fit into memory or take too long to sort.
Option 2 scans through pre-ordered documents and only sends those matching the dates. Done right, this avoids loading everything into memory. On the other hand, it might scan way too many documents, thrashing your disk I/O.
If the date range is "very discriminating" (few documents pass the test), option 1 works best; otherwise (most documents pass), option 2 can be better. Remember that in the time it takes to load one useless document from disk (option 2), you can sort tens of documents in memory, as long as they fit (option 1). Also, the more indexes you have, the more disk space is used and the more writes are slowed down.
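To make option 2 concrete, here is a rough sketch of such a list function. It assumes a companion view that emits [doc.email] as its key (so rows stream in e-mail order) and that the date range arrives in query parameters named from and to; both parameter names are made up for this example:

function (head, req) {
  // Stream the view rows (already sorted by e-mail by the underlying view)
  // and keep only those whose date falls inside ?from=...&to=...
  start({ headers: { 'Content-Type': 'application/json' } });
  send('[');
  var row, first = true;
  while ((row = getRow())) {
    var d = row.value.date;
    if (d > req.query.from && d < req.query.to) {
      send((first ? '' : ',') + JSON.stringify(row.value));
      first = false;
    }
  }
  send(']');
}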
You COULD use a list function for that, in two ways:
1.) The Couch view is ordered by date and you sort by e-mail => but please be aware that you would have to hold ALL items in memory to do the sort by e-mail (i.e. you can only do this when your result set is small).
2.) The Couch view is ordered by e-mail and a list function drops everything outside the date range (you can only do that when the overall list is small, so this one is most probably bad).
Possibly #1 can help you.
I want to analyze a large dataset (2,000,000 records, 20,000 customer IDs, 6 nominal attributes) using the Generalized Sequential Pattern algorithm.
This requires all attributes, aside from the time and customer ID attributes, to be binominal. Since I have 6 nominal attributes that I want to analyze for patterns, I need to transform them into binominal attributes using the "Nominal to Binominal" operator. This causes memory problems on my workstation (16 GB RAM, of which I allocated 12 to the Java instance running RapidMiner).
Ideally I would like to set up my process so that it writes temporarily to disk, or uses temporary tables in my Oracle database from which my model also reads the data directly. If I'm not mistaken, in order to use the "Write Database" or "Update Database" operator I need to already have a table in my database with the boolean columns in place.
I tried writing the results of the binominal conversion step by step into CSV files on my local disk. I started with the nominal attribute with the fewest distinct values, which produced a CSV file containing my dataset ID and 7 binominal attributes. I was seriously surprised to see the file size already exceed 200 MB. This is caused by RapidMiner writing the strings "true"/"false" for the binominal values. Wouldn't it be far more efficient to just write 0/1?
Is there a way to either use the Oracle database directly or to work with 0/1 values instead of "true"/"false"? My next column has 3,000 distinct values to transform, which would end in a nightmare...
I'd highly appreciate recommendations on how to use memory more efficiently or to work directly in the database. If anyone knows how to easily transform a varchar2 column in Oracle into boolean columns for each distinct value, that would also be appreciated!
Thanks a lot,
Holger
edit:
My goal is to get from such a structure:
column_a; column_b; customer_ID; timestamp
value_aa; value_ba; 1; 1
value_ab; value_ba; 1; 2
value_ab; value_bb; 1; 3
to this structure:
customer_ID; timestamp; column_a_value_aa; column_a_value_ab; column_b_value_ba; column_b_value_bb
1; 1; 1; 0; 1; 0
1; 2; 0; 1; 1; 0
1; 3; 0; 1; 0; 1
This answer is too long for a comment.
If you have thousands of levels for the six variables you are interested in, then you are unlikely to get useful results from that data. A typical approach is to categorize the data going in, which results in fewer "binominal" variables. For instance, instead of "1 Gallon Whole Milk", you use "dairy products". This can lead to more actionable results. Remember, Oracle only allows 1,000 columns in a table, so the database has its own limiting factors.
If you are working with lots of individual items, then I would suggest other approaches, notably an approach based on association rules. This will not limit you by the number of variables.
Personally, I find that I can do much of this work in SQL, which is why I wrote a book on the topic ("Data Analysis Using SQL and Excel").
You can use the Nominal to Numeric operator to convert true and false values to 1 or 0. Set the coding type parameter to unique integers.
I am currently using DSUM to calculate some totals and I have noticed that Excel has become really slow (it needs 2 seconds per cell change).
This is the situation:
- I am trying to calculate 112 DSUMs to show in a chart;
- all DSUMs are queries on a table with 15 columns and 32k+ rows;
- all DSUMs have multiple criteria (5-6 constraints);
- the criteria use both numerical and alphanumerical constraints;
- I have the source table/range sorted;
- the Excel file is 3.4 MB in size.
(I am using Excel 2007 on a 4-year-old Windows laptop.)
Any ideas on what can be done to make it faster?
...other than reducing the number of dsums :P ====>>> already working on that one.
Thanks!
Some options are:
Change Calculation to Manual and press F9 whenever you want to calculate
Try SUMIFS rather than DSUM
Exploit the fact that the data is sorted by using MATCH and COUNTIF to find the first row and count of rows, then use OFFSET or INDEX to get the relevant subset of data to feed to SUMIFS for the remaining constraints
Instead of DSUMs you could also put everything into one or more pivot tables and then use GETPIVOTDATA to extract the data you need. Reading the table into the pivot takes a bit of time (though 32k rows should be done in under a second), and after that GETPIVOTDATA is lightning fast!
Downsides:
You need to refresh the pivot manually when you get new data
The pivot(s) need to be laid out so that the requested data is shown
File size will increase (unless the pivot cache is not stored, in which case loading the file takes longer)
I'm a big fan of LINQ for typing, clarity and brevity. But I'm finding it very slow at looking up matching records compared to the old DataView, by a factor of some 2,000 times!
I am writing an app to back up large sets of files: 500,000 files and 500 GB of data. I have created a manifest of the files in the backup set and I compare the files in the directory with those in the manifest, documenting what has already been backed up. This way I know which files have changed and therefore need to be copied.
The slow step is this one:
var matchingMEs = from m in manifest
where m.FullName == fi.FullName
select m;
where manifest is a List<ManifestEntry> and ManifestEntry is a relatively simple POCO.
Overall performance is 17-18 records per second.
When I use a DataView:
DataView vueManifest = new DataView(dt, "", "FullName", DataViewRowState.CurrentRows);
and then, in the loop, find the matching manifest entries with .FindRows:
matchingMEs = vueManifest.FindRows(fi.FullName);
... then I get a throughput of some 35,000 files per second!
Is this normal? I can't believe that LINQ comes at such a price. Is it LINQ or the objects that slow things down?
(By the way, I tried using a Dictionary and a SortedList as well as the List<ManifestEntry>, and they all gave about the same result.)
Your DataView is sorted by FullName, and hence FindRows can jump straight to the correct record(s), whereas your LINQ query has to iterate through the list until it reaches the correct record(s).
This will definitely be noticeable if you have 500,000 entries.
Assuming FullName is unique, then when you switched to using a dictionary, I suspect you were still iterating through it with a similar LINQ query, something like
var matchingME = (from m in manifest where m.Key == fi.FullName select m).Single();
whereas you should be using
var matchingME = manifest[fi.FullName];