Slicing in PyTables - performance


What is the fastest way to slice arrays that are saved in h5 using PyTables?
The scenario is the following:
The data was already saved (no need to optimize here):
import tables
filters = tables.Filters(complib='blosc', complevel=5)
h5file = tables.open_file(hd5_filename, mode='w',
                          title='My Data',
                          filters=filters)
group = h5file.create_group(h5file.root, 'Data', 'Data')
X_atom = tables.Float32Atom(shape=[50,50,50])
X = h5file.create_carray(group, 'X', atom=X_atom, title='XData',
                         shape=(1000,), filters=filters)
The data is opened:
h5file = tables.open_file(hd5_filename, mode="r")
node = h5file.get_node('/', data_node)
X = getattr(node, X_str)
This is where I need optimization. I need to do a lot of array slicing of the following kind, for a great many indexes and different min/max locations, and the accesses cannot be sorted:
for index, min_x, min_y, min_z, max_x, max_y, max_z in my_very_long_list:
    current_item = X[index][min_x:max_x,min_y:max_y,min_z:max_z]
    do_something(current_item)
The question is:
Is this the fastest way to do the task?

Sort Google Spreadsheet With Multiple Criteria Using Script

I have a spreadsheet that I update on a regular basis. I also have to re-sort the spreadsheet when finished because of the changes made. I need to sort with multiple criteria like the below settings. I have searched for examples but my Google search skills have failed me.
Sort range from A1:E59
[x] Data has header rows
sort by "Priority" A > Z
then by "Open" Z > A
then by "Project" A > Z

Mogsdad's answer works fine if none of your cells have values automatically calculated via a formula. If you do use formulas, though, then that solution will erase all of them and replace them with static values. And even so, it is more complicated than it needs to be, as there's now a built-in method for sorting based on multiple columns. Try this instead:
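function onEdit(e) {
  var priorityCol = 1;
  var openCol = 2;
  var projectCol = 3;
  var sheet = SpreadsheetApp.getActiveSheet();
  var lastRow = sheet.getLastRow();
  if (lastRow < 2) return; // nothing below the header row to sort
  // Start at row 2 so the header row is not sorted into the data.
  var dataRange = sheet.getRange(2, 1, lastRow - 1, sheet.getLastColumn());
  dataRange.sort([
    {column: priorityCol, ascending: true},
    {column: openCol, ascending: false},
    {column: projectCol, ascending: true}
  ]);
}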
Instead of making a separate function, you can use the built-in onEdit() function, and your data will automatically sort itself when you change any of the values. The sort() function accepts an array of criteria, which it applies one after the other, in order.
Note that with this solution, the first column in your spreadsheet is column 1, whereas if you're doing direct array accesses like in Mogsdad's answer, the first column is column 0. So your numbers will be different.

That is a nice specification, a great place to start!
Remember that Google Apps Script is, to a large extent, JavaScript. If you extend your searching into JavaScript solutions, you'll find plenty of examples of array sorts here on SO.
As it happens, much of what you need is in Script to copy and sort form submission data. You don't need the trigger part, but the approach to sorting can be easily adapted to handle multiple columns.
The workhorse here is the comparison function-parameter, which is used by the JavaScript Array.sort() method. It works through the three columns you've indicated, with ascending or descending comparisons. The comparisons used here are OK for Strings, Numbers and Dates. It could be improved with some cleaning up, or even generalized, but it should be pretty fast as-is.
function sortMySheet() {
  var sheet = SpreadsheetApp.getActiveSheet();
  var dataRange = sheet.getDataRange();
  var data = dataRange.getValues();
  var headers = data.splice(0,1)[0]; // remove headers from data
  data.sort(compare); // Sort 2d array
  data.splice(0,0,headers); // replace headers
  // Replace with sorted values
  dataRange.setValues(data);
};
// Comparison function for sorting two rows
// Returns -1 if 'a' comes before 'b',
// +1 if 'b' before 'a',
// 0 if they match.
function compare(a,b) {
  var priorityCol = 0; // Column containing "Priority", 0 is A
  var openCol = 1;
  var projectCol = 2;
  // First, compare "Priority" A > Z
  var result = (a[priorityCol] < b[priorityCol]) ? -1 :
               (a[priorityCol] > b[priorityCol] ? 1 : 0);
  if (result == 0) {
    // "Priority" matched. Then compare "Open" Z > A (operands swapped)
    result = (b[openCol] < a[openCol]) ? -1 :
             (b[openCol] > a[openCol] ? 1 : 0);
  }
  if (result == 0) {
    // "Open" matched. Finally, compare "Project" A > Z
    result = (a[projectCol] < b[projectCol]) ? -1 :
             (a[projectCol] > b[projectCol] ? 1 : 0);
  }
  return result;
}
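For comparison, here is a minimal sketch of the same three-key comparison in Python, with the first and third keys ascending and the second descending. The sample rows and their column order are made up for illustration, and three_way is a hypothetical helper, not part of any library.

```python
from functools import cmp_to_key

# Hypothetical rows: (priority, open, project). The column order is an assumption.
rows = [
    ("B", 2, "zeta"),
    ("A", 1, "beta"),
    ("A", 3, "gamma"),
    ("A", 3, "alpha"),
]


def three_way(x, y):
    """Return -1, 0 or 1, like a JavaScript comparison function."""
    return (x > y) - (x < y)


def compare(a, b):
    # "Priority" ascending, then "Open" descending, then "Project" ascending.
    # `or` falls through to the next key only when the previous result is 0.
    return (
        three_way(a[0], b[0])
        or three_way(b[1], a[1])  # operands swapped for descending order
        or three_way(a[2], b[2])
    )


sorted_rows = sorted(rows, key=cmp_to_key(compare))
for row in sorted_rows:
    print(row)
```

Swapping the operands is all it takes to reverse one key, which is the same trick the "Open" comparison uses above.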

Try this using the Apps Script sort instead of the native JavaScript. I had the same issue with sorting the header row(s) and this solved the issue.
So I think something like this should work:
function onOpen() {
  SpreadsheetApp.getActiveSpreadsheet()
    .getSheetByName("Form Responses 1").sort(2);
}
Regarding sorting by multiple columns, you can chain that sort() method, with the final sort() having the highest priority, and the first sort() the lowest. So something like this should sort by Start date, then by End date:
function onOpen() {
  SpreadsheetApp.getActiveSpreadsheet()
    .getSheetByName("Form Responses 1").sort(3).sort(2);
}
Reference link:
https://support.google.com/docs/thread/16556745/google-spreadsheet-script-how-to-sort-a-range-of-data?hl=en
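Chaining sorts this way relies on each sort being stable, meaning rows that tie on the current key keep the order left by the previous sort. The sketch below illustrates that principle in Python, whose sorted() is documented as stable. It is not Apps Script, and the rows and field layout are invented for the example.

```python
# Hypothetical rows: (name, start, end).
rows = [
    ("x", 2, 5),
    ("y", 1, 7),
    ("z", 2, 3),
    ("w", 1, 3),
]

# Lowest-priority key first ...
by_end = sorted(rows, key=lambda r: r[2])
# ... highest-priority key last. Ties on "start" keep the "end" ordering.
by_start_then_end = sorted(by_end, key=lambda r: r[1])

# The same result from a single sort with a compound key.
single_pass = sorted(rows, key=lambda r: (r[1], r[2]))

print(by_start_then_end)
```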

Not sure if this is still relevant, but you can use the sort() function to define another tab as a sorted version of the original data.
Say your original data is in a tab named Sheet1; I'm also going to act as though your Priority, Open, and Project columns are A, B, and C, respectively.
Create a new tab, and in cell A1 type:
=sort(Sheet1!A1:E59, 1, TRUE, 2, FALSE, 3, TRUE)
The first argument specifies the sheet and range to be sorted, followed by three pairs: the first of each pair specifies the column (A=1, B=2, etc.), and the second specifies ascending (TRUE) or descending (FALSE).

How can I sum binned time series using d3.js?

I want a simple graph like:
The data I have is a simple list of transactions with two properties:
timestamp
amount
I tried d3.layout.histogram().bins() but it seems it only supports counting the transactions.
I can't be the only one looking for this, can I?

Ok, so the IRC folks helped me out and pointed to nest, which works great (this is CoffeeScript):
nested_data = d3.nest()
  .key((d) -> d3.time.day(d.timestamp))
  .rollup((a) -> d3.sum(a, (d) -> d.amount))
  .entries(incoming_data) # An array of {timestamp: ..., amount: ...} objects
# Optional
nested_data.map (d) ->
  d.date = new Date(d.key)
The trick here is d3.time.day, which takes a timestamp and returns the day it belongs to (midnight at the start of that day). This function and the similar ones, such as d3.time.week, can bin time series very well.
The other trick is the nest().rollup() function, which, once the data has been grouped by key(), sums all of the events on a given day.
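The group-then-sum idea is independent of d3. Here is a minimal sketch of it in Python, assuming each transaction is a dict with a timestamp and an amount. The sample data is invented.

```python
from collections import defaultdict
from datetime import date, datetime

# Hypothetical transactions, mirroring the {timestamp, amount} shape above.
transactions = [
    {"timestamp": datetime(2013, 5, 1, 9, 30), "amount": 10.0},
    {"timestamp": datetime(2013, 5, 1, 17, 5), "amount": 2.5},
    {"timestamp": datetime(2013, 5, 3, 8, 0), "amount": 4.0},
]

totals = defaultdict(float)
for t in transactions:
    day = t["timestamp"].date()  # drop the time part: this is the bin key
    totals[day] += t["amount"]   # the "rollup": sum the amounts per bin

for day in sorted(totals):
    print(day, totals[day])
```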
Last thing I wanted, was to interpolate empty values on the days where I had no transactions. This is the last part of the code:
# Interpolate empty vals
nested_data.sort((a, b) -> d3.descending(a.date, b.date))
ex = d3.extent(nested_data, (d) -> d.date)
each_day = d3.time.days(ex[0], ex[1])
# Build a hashmap with the days we have
data_hash = {}
angular.forEach(nested_data, (d) ->
  data_hash[d.date] = d.values
)
# Build a new array for each day, including those where we didn't have transactions
new_data = []
angular.forEach(each_day, (d) ->
  val = 0
  if data_hash[d]
    val = data_hash[d]
  new_data.push({date: d, values: val})
)
final_data = new_data
Hope this helps somebody!
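The gap-filling step can be sketched the same way in Python: walk every day between the first and last bin, and use 0 where no total exists. The daily_totals dict is a made-up stand-in for the binned data.

```python
from datetime import date, timedelta

# Hypothetical per-day totals with a gap on 2013-05-02.
daily_totals = {
    date(2013, 5, 1): 12.5,
    date(2013, 5, 3): 4.0,
}

first, last = min(daily_totals), max(daily_totals)

filled = []
day = first
while day <= last:  # inclusive of the last day that has data
    filled.append({"date": day, "values": daily_totals.get(day, 0)})
    day += timedelta(days=1)

for entry in filled:
    print(entry["date"], entry["values"])
```

This loop includes the final day. If you build the range with a helper whose end point is exclusive, check whether the last day is dropped.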

The histogram code doesn't support this, but you can easily do the binning yourself. Assuming that you have a date and an amount for each transaction, you can bin by day like this.
var bins = {};
transactions.forEach(function(t) {
  var key = t.date.toDateString();
  bins[key] = bins[key] || 0;
  bins[key] += t.amount;
});
You can obviously parse the date string back into a date if you need it; the point of using .toDateString() here is that the time part is chopped off and everything binned by day. If you want to bin by another time interval, you can use the same technique and extract a different part of the date.

MongoDB + Ruby: updating records in an iteration

Using MongoDB and the Ruby driver, I'm trying to calculate the rankings for players in my app, so I'm sorting by (in this case) pushups, and then adding a rank field and value per object.
pushups = coll.find.sort(["pushups", -1] )
pushups.each_with_index do |r, idx|
  r[:pushups_rank] = idx + 1
  # save writes the modified document back, matching on its _id
  coll.save(r)
end
This approach does work, but is this the best way to iterate over objects and update each one? Is there a better way to calculate a player's rank?

Another approach would be to do the entire update on the server by executing a javascript function:
update_rank = "function(){
  var rank=0;
  db.players.find().sort({pushups:-1}).forEach(function(p){
    rank +=1;
    p.rank = rank;
    db.players.save(p);
  });
}"
cn.eval( update_rank )
(Code assumes you have a "players" collection in mongo, and a ruby variable cn that holds a connection to your database)
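The ranking logic itself is just "sort descending, then number the rows". Here is a minimal in-memory sketch in Python, with invented player documents and no database calls.

```python
# Hypothetical player documents. The field names follow the question.
players = [
    {"name": "ann", "pushups": 40},
    {"name": "bob", "pushups": 55},
    {"name": "cy", "pushups": 12},
]

# Sort by pushups, highest first, then assign 1-based ranks in that order.
ranked = sorted(players, key=lambda p: p["pushups"], reverse=True)
for idx, player in enumerate(ranked, start=1):
    player["pushups_rank"] = idx

for player in ranked:
    print(player["pushups_rank"], player["name"], player["pushups"])
```

Like the loops above, this gives tied players consecutive ranks rather than a shared one.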

How to order integers according to size and track their positions by variable name

I have a program with multiple int variables. Each time a particular fail condition is encountered, the count in the corresponding variable is incremented. I want the user to be able to see, on a button click, how many failures of each category they have encountered. I want to display the counts in a DataGridView, ordered from the highest value down to the lowest, with the name of the related test step in the adjacent column.
My plan was to use Array.Sort to order the integers, but then I lose track of their names, so I can't fill in the adjacent string column. I tried using a Hashtable, but if I use the string as the key it sorts alphabetically rather than numerically, and if I use the integer as the key, duplicate values collide and don't all get added to the hash table.
Here are some of the examples I tried, which have the problems described above. Essentially I want to end up with two arrays whose order matches: one of names and one of values. FYI, the variables were declared before this section of code; variables ending in x hold the string name for the (non-x) integer value.
Hashtable sorter = new Hashtable();
sorter[download] = downloadx;
sorter[power] = powerx;
sorter[phase] = phasex;
sorter[eeprom] = eepromx;
sorter[upulse] = upulsex;
sorter[vpulse] = vpulsex;
sorter[wpulse] = wpulsex;
sorter[volts] = voltsx;
sorter[current] = currentx;
sorter[ad] = adx;
sorter[comms] = commsx;
sorter[ntc] = ntcx;
sorter[prt] = prtx;
string list = "";
string[] names = new string[13];
foreach (DictionaryEntry child in sorter)
{
    list += child.Value.ToString() + "z";
}
int[] ordered = new int[] { download, power, phase, eeprom, upulse, vpulse, wpulse, volts, current, ad, comms, ntc, prt };
Array.Sort(ordered);
Array.Reverse(ordered);
for (int i = 0; i < sorter.Count; i++)
{
    int pos = list.IndexOf("z");
    names[i] = list.Substring(0, pos);
    list = list.Substring(pos + 1);
}
This is my first question here, so I hope it's not too long-winded.
Thanks

Use a Dictionary. And you can order it by the value : myDico.OrderBy(x => x.Value).Reverse(), the sort will be numerical descending. You just have to enumerate the result.
I hope I understand your need. Otherwise ignore me.

You want to be using a Dictionary<string, int> to store your numbers. I'm not clear on how you're displaying results at the end - do you have a grid or a list control?
You ask about usings. Which ones do you already have?
EDIT for .NET 2.0
There might be a more elegant solution, but you could implement the logic by putting your rows in a DataTable. Then you can make a DataView of that table and sort by whichever column you like, ascending or descending.
See http://msdn.microsoft.com/en-us/library/system.data.dataview(v=VS.80).aspx for example.
EDIT for .NET 3.5 and higher
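As far as sorting a Dictionary by its values:
var sortedEntries = myDictionary.OrderBy(pair => pair.Value);
For reverse order, use .OrderByDescending(pair => pair.Value). If you need the results to be a Dictionary, you can call .ToDictionary(pair => pair.Key, pair => pair.Value) on that, but a Dictionary does not guarantee any enumeration order, so keep the sorted sequence (or call .ToList() on it) when the order matters.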
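The same "sort name/count pairs by count, then split into two parallel arrays" idea can be sketched in Python. The category names come from the question, but the counts are invented.

```python
# Hypothetical failure counts keyed by test-step name.
failures = {"download": 3, "power": 7, "phase": 0, "eeprom": 7}

# Sort the (name, count) pairs by count, highest first. Python's sort is
# stable, so names with equal counts keep their original relative order.
ordered = sorted(failures.items(), key=lambda pair: pair[1], reverse=True)

# Two parallel lists whose positions line up, ready for a two-column grid.
names = [name for name, _ in ordered]
counts = [count for _, count in ordered]

for name, count in ordered:
    print(name, count)
```

Keeping each name paired with its count while sorting avoids the problem in the question: duplicate counts are fine, and the names never get separated from their values.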

Why is this LINQ so slow?

Can anyone please explain why the third query below is orders of magnitude slower than the others when it oughtn't to take any longer than doing the first two in sequence?
var data = Enumerable.Range(0, 10000).Select(x => new { Index = x, Value = x + " is the magic number"}).ToList();
var test1 = data.Select(x => new { Original = x, Match = data.Single(y => y.Value == x.Value) }).Take(1).Dump();
var test2 = data.Select(x => new { Original = x, Match = data.Single(z => z.Index == x.Index) }).Take(1).Dump();
var test3 = data.Select(x => new { Original = x, Match = data.Single(z => z.Index == data.Single(y => y.Value == x.Value).Index) }).Take(1).Dump();
EDIT: I've added a .ToList() to the original data generation because I don't want any repeated generation of the data clouding the issue.
I'm just trying to understand why this code is so slow by the way, not looking for faster alternative, unless it sheds some light on the matter. I would have thought that if Linq is lazily evaluated and I'm only looking for the first item (Take(1)) then test3's:
data.Select(x => new { Original = x, Match = data.Single(z => z.Index == data.Single(y => y.Value == x.Value).Index) }).Take(1);
could reduce to:
data.Select(x => new { Original = x, Match = data.Single(z => z.Index == 1) }).Take(1)
in O(N) as the first item in data is successfully matched after one full scan of the data by the inner Single(), leaving one more sweep of the data by the remaining Single(). So still all O(N).
It's evidently being processed in a more long winded way but I don't really understand how or why.
Test3 takes a couple of seconds to run by the way, so I think we can safely assume that if your answer features the number 10^16 you've made a mistake somewhere along the line.

The first two "tests" are identical in cost, and both are slow when fully enumerated. The third adds another entire level of slowness.
The first two LINQ statements are quadratic if you enumerate every result. Each "Match" element requires iterating through the "data" sequence to find the match, and because Single() must also confirm that there is no second match, it scans all 10000 elements every time. Doing that for each of the 10000 source elements makes this an O(N^2) operation.
The "test3" operation takes this to a new level of pain, because the inner data.Single(...) sits inside the predicate of the outer one. It is not evaluated once and reused; it is re-run for every element the outer Single() examines.
Each projected element in the third test therefore costs O(N^2) instead of O(N). With Take(1) only the first element is projected, so test3 performs about N^2 = 10^8 comparisons against about 10^4 for the first two, which is consistent with the couple of seconds you observed. Enumerating the whole sequence would be O(N^3).

Fixed.
var data = Enumerable.Range(0, 10000)
.Select(x => new { Index = x, Value = x + " is the magic number"})
.ToList();
var forward = data.ToLookup(x => x.Index);
var backward = data.ToLookup(x => x.Value);
var test1 = data.Select(x => new { Original = x,
Match = backward[x.Value].Single()
} ).Take(1).Dump();
var test2 = data.Select(x => new { Original = x,
Match = forward[x.Index].Single()
} ).Take(1).Dump();
var test3 = data.Select(x => new { Original = x,
Match = forward[backward[x.Value].Single().Index].Single()
} ).Take(1).Dump();
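In the original code, counting the work done when every result is enumerated:
data.ToList() generates 10,000 instances (10^4).
data.Select( data.Single() ).ToList() performs 100,000,000 comparisons (10^8).
data.Select( data.Single( data.Single() ) ).ToList() performs 1,000,000,000,000 comparisons (10^12), because the inner Single() is re-run for every element the outer Single() examines.
With Take(1) only the first element is projected, which divides the last two figures by 10,000: about 10^4 and 10^8 comparisons respectively.
Single and First are different. Single throws if multiple instances are encountered, so it must fully enumerate its source to check for multiple instances.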
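One way to see where the time goes is to count predicate calls. The sketch below is Python rather than LINQ: single() is a hypothetical stand-in that always scans its whole input, and a lazy generator with islice plays the role of Select(...).Take(1). N is reduced to 100 so that it runs instantly.

```python
from itertools import islice


def single(items, predicate):
    """Stand-in for Single(): scan everything, require exactly one match."""
    matches = [item for item in items if predicate(item)]
    if len(matches) != 1:
        raise ValueError("expected exactly one match")
    return matches[0]


n = 100
data = [(i, f"{i} is the magic number") for i in range(n)]  # (index, value)
calls = {"inner": 0, "outer": 0}


def inner_pred(y, x):
    calls["inner"] += 1
    return y[1] == x[1]


def outer_pred(z, x):
    calls["outer"] += 1
    # The inner single() is re-run for every z the outer scan looks at.
    return z[0] == single(data, lambda y: inner_pred(y, x))[0]


# Lazy projection: nothing is computed until an element is requested.
lazy = ((x, single(data, lambda z: outer_pred(z, x))) for x in data)
first = list(islice(lazy, 1))  # like Take(1): only the first x is processed

print(calls)
```

For one projected element the outer predicate runs N times and the inner one N * N times, so the nested form is quadratic even though only one result is taken.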
