multiple tqdm progress bars when using joblib parallel - multiprocessing

I have a function:
def func(something):
    for j in tqdm(something):
        ...
which is called by:
joblib.Parallel(n_jobs=4)(joblib.delayed(func)(s) for s in something_else)
Now, this creates 4 overlapping tqdm progress bars. Is it possible to get 4 separate ones that update independently?

EDIT: A friend sent me this discussion, in which a much cleaner solution is provided. I wrote a quick performance test to make sure that the lock does not cause the workers to block each other; there was no performance hit even when updating bars every millisecond. I recommend that you use that solution instead.
The previous answer has been removed.
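For reference, here is a minimal sketch of that kind of solution, i.e. the lock-plus-position pattern (this uses a plain multiprocessing.Pool rather than joblib, and is my own reconstruction, not the code from the linked discussion):

from multiprocessing import Pool, RLock, current_process
from tqdm import tqdm

def func(something):
    # one terminal line per worker; _identity is an internal attribute
    # of Pool workers, numbered from 1
    pos = current_process()._identity[0] - 1
    for j in tqdm(something, position=pos):
        pass  # ... real work here ...

if __name__ == "__main__":
    tqdm.set_lock(RLock())  # shared lock so the bars don't interleave writes
    with Pool(4, initializer=tqdm.set_lock, initargs=(tqdm.get_lock(),)) as p:
        p.map(func, [range(100)] * 4)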

Related

Is Qt4/Qt5 QListWidget slow?

I'm planning on creating an application that will need to receive/show a lot of data in a QListWidget (or maybe a QListView/QListModel; I'm open to alternatives).
The QListWidget will receive a huge number of items (100+) each second. I'll need to show all those items if/when the scrollbar is used, and I'd like to achieve a non-sluggish effect.
If you have used Procmon (Windows only), that's a good example of what I'm talking about.
My question is: Can Qt handle that much data without being slow? What implementation should I take in mind?
I suggest creating a small prototype and testing whether the performance is good enough for you (a sketch follows below). I would say that QListView might be fast enough. Actually, when I worked with similar log views, I found QTableView a little bit faster than QListView.
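As a rough illustration of such a prototype, here is a Python/PyQt5 sketch (the question is about C++ Qt, so treat this as an analogy; the batch size and timing are arbitrary) that pushes 100 rows per second into a QListView:

import sys
from PyQt5.QtCore import QStringListModel, QTimer
from PyQt5.QtWidgets import QApplication, QListView

app = QApplication(sys.argv)
model = QStringListModel()
view = QListView()
view.setModel(model)
view.setUniformItemSizes(True)  # lets the view skip per-item size queries
view.show()

count = 0
def add_batch():
    global count
    row = model.rowCount()
    model.insertRows(row, 100)  # append 100 empty rows in one go...
    for i in range(100):        # ...then fill them in
        model.setData(model.index(row + i), "item %d" % (count + i))
    count += 100

timer = QTimer()
timer.timeout.connect(add_batch)
timer.start(1000)  # one batch per second
sys.exit(app.exec_())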
But you should also consider whether a list view is the best possible user interface at all. When you have, let's say, 1 million items in the list (after an hour and a half), the scroll bar becomes useless: you cannot use it for fine-grained scrolling anymore, except by clicking the up/down arrows. And when you get 200 new items per second, it is not very useful to constantly draw those new lines; the user cannot read them anyway.
Alternative 1
For showing a log, you can also use QTextDocument or QTextEdit. The implementation is more straightforward, and there is probably less overhead.
If you mix that in with a QSyntaxHighlighter, you can have a very readable, easy-to-use log stream.
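For instance, a hypothetical Python/PyQt5 version of that idea, colouring error lines as they stream into a QTextEdit:

from PyQt5.QtGui import QColor, QSyntaxHighlighter, QTextCharFormat

class LogHighlighter(QSyntaxHighlighter):
    def highlightBlock(self, text):
        # crude rule: paint any line containing "ERROR" red
        if "ERROR" in text:
            fmt = QTextCharFormat()
            fmt.setForeground(QColor("red"))
            self.setFormat(0, len(text), fmt)

# usage, given a QTextEdit named text_edit:
#   LogHighlighter(text_edit.document())
#   text_edit.append(line)  # the highlighter runs on each appended block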
Alternative 2
You could also implement some sort of paging or grouping of your data, so that you can jump easily to the beginning or to the most recent entries.
Alternative 3
Another idea you could look at: most people don't want to look at so much data at once. You could aggregate the calls and tally them up (see the sketch after the example below).
For example:
State 1 abcd
State 1 abcd
State 1 abcd
State 1 abcd
State 2 efg
could be represented as
State 1 abcd (x4)
State 2 efg (x1)
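A tiny Python sketch of that aggregation; itertools.groupby collapses runs of consecutive identical lines:

from itertools import groupby

def collapse(lines):
    # each run of identical consecutive lines becomes one "line (xN)" entry
    return ["%s (x%d)" % (line, len(list(run))) for line, run in groupby(lines)]

print(collapse(["State 1 abcd"] * 4 + ["State 2 efg"]))
# ['State 1 abcd (x4)', 'State 2 efg (x1)']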
Alternative 4
Or you could go with a graphical approach. Draw the stream of data using something like Qwt or QGraphicsView in some manner that makes sense for the large quantities of data you are displaying.
Alternative 5
And finally, another way that may prove useful is to write the data to the hard drive, and then provide a button so the user can open the current log file.
Hope that helps.

executePackage seems to take a long time to launch subpackage

I am a relative beginner at SSIS so I may be doing something silly.
I have a process that involves looping over a heterogeneous queue and processing the objects one at a time. The process is currently done in 'set logic' and it's dropping stuff, so I was asked to rework it in a looping manner; that decision has been made for me.
I have chosen to implement queue logic in 1 package and the actual processing in another package.
This is all going relatively well considering...
I now have the process up and running, but it's slow: 9 seconds per item. Clearly I can't present this solution. :-)
One thing I notice: 1.5-2 seconds of each loop are spent on the Execute Package task in the queue loop.
I can't figure out how to get a hard number; I am using the flashing-green-box method of performance tuning. The other steps seem to be very fast. Adding indexes, changing SQL to stored procedures, all the usual tricks have helped.
Is the UI reliable at all with regard to boxes turning white/yellow/green? Some tasks report times in the progress tab, some don't seem to. So I am counting yellow time.
Should calling a subpackage be that expensive? One change I made was setting 'RunInASeparateProcess' to FALSE. I did that because the subpackage otherwise produces the following message:
Error: 0xC0012024 at Script Task: The task "Script Task" cannot run on this edition of Integration Services. It requires a higher level edition.
Task failed: Script Task
The reading I have done seems to advocate multiple packages. Does anyone have any counter-patterns? Should I stay the course? I started changing to one package, but copy/paste doesn't seem to work well with Sequence Containers, and I would also need to recreate all the variables in the parent package. Doable, but I'm not sure that is the answer.
Does anyone know of any tuning resources/websites/books they would be willing to share?
Update - I have been tearing things down in an effort to figure out what the problem is. I was thinking it was the package configurations passing variable values, but I don't think that is it: I can pass variables to another package with nothing in it and it is fast.
I can make the trivial subpackage slow by adding the two connection managers to it.
I suddenly realize I may be making and breaking connections to both an Oracle server and a SQL Server in the main package and then again in the subpackage.
Am I correct in this observation?
Is there any way I can reuse the connections between the two packages?
When I google it, most of what I see is suggestions for passing the connection string.
UPDATE - I combined the two packages into one. Performance is now about 1.25 seconds per item, down from about 9. The only thing I can point to that changed is that I am now reusing a single connection instead of making multiple connections.
Thanks, I appreciate any help you are kind enough to offer.
Greg
Once you enable logging, I'd suggest running the package from a command window using dtexec. While that doesn't perfectly duplicate the server environment, it does have the advantages of (a) eliminating BIDS as a potential performance issue and (b) being something you can do without jumping through change control hoops.
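For example, with a hypothetical package path:

dtexec /f "C:\packages\QueueLoop.dtsx" > run.log

The console output (redirected to run.log here) includes timestamped progress messages per task, which gives you harder numbers than the flashing green boxes.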

Excel List-Object VBA Performance Bug?

I have an issue with performance in an Excel application which uses List Objects (AKA Excel Tables). I suspect it may be a bug, but despite my googling I could not find any reference to it. I've already developed a workaround for my application, but what I'm interested in is whether anyone can give any insight into why this happens.
Note: I’m using Excel 2007 on Windows Vista. The setup is as follows: I have a spreadsheet which holds data in a List Object, with VBA code which can be kicked off via a command button; this code may make several edits to any number of cells on the worksheet, so Excel’s Calculation mode is set to Manual prior to any edits.
The problem I've encountered is that if the currently active cell is within the List Object, then setting the Calculation mode to Manual seems to have no effect whatsoever. So if a user happens to have a heavy calculation workbook open in the same instance, then the VBA code runs very slowly. I practically had to pull my application apart to discover that this was caused by the active cell, and I created a new workbook with a simple version of this scenario to confirm that there wasn't some sort of corruption in my application.
I’ve been doing a number of test cases with this, and below are the results from what I’ve found:
Although it seems generally related to the calculation, there is still a time difference when the calculation mode is switched between Manual and Automatic...
Manual = 7.64 secs
Automatic = 9.39 secs
Manual mode is just under 20% faster than Automatic. But my expectation was that they'd be more or less the same, considering the issue seems to be the calculation kicking off even in Manual mode.
Compare that to when the active cell is not on a List Object, and the results are vastly different...
Manual = 0.14 secs
Automatic = 3.23 secs
Now the Manual run is 50 times faster, and the Automatic run shows that the calculation shouldn't have taken any more than 3.2 secs! So now the first test looks like it might have run the calculation twice while in Manual mode, and nearly three times while in Automatic mode.
Repeating this test again, this time in an instance with no calculation formulas in any cells, suddenly it doesn't seem as bad:
Active cell is List Object & Calc is Manual = 0.17 secs
Active cell is List Object & Calc is Automatic = 0.20 secs
Active cell is Empty & Calc is Manual = 0.14 secs
Active cell is Empty & Calc is Automatic = 0.18 secs
It's still slower, but now only by 10-20%, which is barely noticeable. But this does show that the issue must be related to the calculation in some way, as otherwise it should have taken just as long as the first test.
If anyone wants to create these tests to see for themselves, the setup is as follows:
New workbook with a List Object added (doesn't have to be linked to any data)
Add some formulas that will take Excel a while to calculate (I just did '=1*1' repeated 30,000 times)
Write quick VBA code which will (i) loop through a simple edit of a cell several hundred times, and (ii) record the time it took (a rough Python rendition is sketched below)
Then just run the code while changing the active cell between the List Object and an empty cell
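Not the original VBA, but a rough Python/win32com rendition of that timing loop, as an illustration (it assumes Excel is already running with the test workbook active):

import time
import win32com.client

xl = win32com.client.GetActiveObject("Excel.Application")  # attach to Excel
xl.Calculation = -4135  # xlCalculationManual
ws = xl.ActiveSheet

start = time.perf_counter()
for i in range(500):  # "several hundred" trivial edits
    ws.Cells(1, 1).Value = i
print("%.2f secs" % (time.perf_counter() - start))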
I'd be very interested to hear if anyone can explain why Excel behaves in this way, and whether it is a bug or some feature to do with List Objects which actually has some genuine use.
Thanks,
Stuart
This is not related to the "bug" you found, which is quite interesting and intriguing.
I just want to share that there is a great way to avoid calculation delays. I had fantastic results with this and now I use it all the time.
Simply put, Excel takes a long time copying data back and forth between the "VBA world" and the "spreadsheet world".
If you do all the "reads" at once, process, and then do all the "writes" at once, you get amazing performance. This is done using variant arrays as documented here:
http://msdn.microsoft.com/en-us/library/ff726673.aspx#xlFasterVBA
in the section labeled: Read and Write Large Blocks of Data in a Single Operation
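The same idea rendered as a Python/win32com sketch rather than VBA (the range addresses and the doubling step are made up for illustration):

import win32com.client

xl = win32com.client.GetActiveObject("Excel.Application")
ws = xl.ActiveSheet

data = ws.Range("A1:C10000").Value  # one read: a 2-D tuple of all the values
out = tuple(tuple(v * 2 if isinstance(v, (int, float)) else v for v in row)
            for row in data)        # process entirely in memory
ws.Range("A1:C10000").Value = out   # one write puts everything back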
I was able to refactor some code I had that took 5 minutes to run and bring it down to 1.5 minutes. The refactoring took me 10 minutes, which is amazing because it was quite complex code.
Regarding Table performance (and performance, in general):
I know this is an old question, but I want to get this documented.
One thing that changed between older versions of Excel and the post-2007 versions is that Excel now activates the target sheet of any PasteSpecial operation. You cannot override this by turning off ScreenUpdating and making calculation manual. Such activation WILL make the sheet visible and cause uncontrollable flicker.
My original VBA code ran very fast on an old, single-processor XP box running Excel 2000. Moving to Excel 2013 on a modern machine brought a stunning slowdown in code execution. The three areas that kill performance are: PasteSpecial from one sheet to another; any other code that requires activating sheets (zoom level, Advanced Filter, sheet-level range names, etc.); and automating sheet protection/unprotection.
This is too bad, because PasteSpecial helped "cleanse" data you copy (Direct use of .Copy to a target will throw the occasional error).
So you need to review your code and make sure you are using direct assignment to the right property for the data type you need (Value, Value2, Text, or Formula, for example) instead of PasteSpecial.
e.g. .Range("MYRANGE").Value = .Cells(5, 7).Value2
You also need to be scrupulous in resisting use of Select and Activate throughout your code.
As referenced above, many comments you'll find in Excel fora about that last point state that you "never" need to use activation, which is clearly untrue, since several things in Excel only apply to or require active sheets. Understanding the cases where activation is forced automatically by a particular method or use of an object will help in coding as well. Unfortunately, you won't see much in the way of documentation of this.
Update:
Regarding Conditional Formatting: you'll find many complaints in various fora about the slowness of Excel when it encounters a large number of conditionally formatted cells. I suspected this would impact Excel Tables, since they have many table format options. To test this, I took a large workbook we use that is currently formatted as several worksheets with the same style of Excel Table on them.
After converting the tables to a conventional range, I noticed no difference in the speed of code execution. This would seem to indicate that using Excel Table formats is far superior to conditionally formatting your own arrays of cells.

axapta thread / animation

I have a function which costs plenty of time.
The function is an SQL query called via ODBC, not written in X++, since the functional range of X++ is insufficient.
While this operation is running, I want to show an animation on a form, defined in the aviFiles macro.
Trying to implement this, several problems occur:
the animation doesn't start until the function has finished;
using threads won't fulfil my hopes, since the ODBC settings are made on the server and, I guess, the function is called on the client side;
besides, how am I able to find out that the threaded task has finished?
Could anyone give me a hint on how to:
play an animation on a form
do something (in the background) and keep playing the animation until the task to perform is finished
stop the animation
Coding this in exactly this order shows the behaviour mentioned above.
Thanks in advance for hints and help!
You can use the standard AotFind as an example:
split the work into small pieces
each piece should be executed at a timer tick
Also, you can try not to use a timer, but to call infolog.yield() as often as possible. (A generic sketch of the timer-tick pattern follows.)
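Not X++, but a generic Python/Tkinter sketch of that timer-tick pattern: do one small piece of work per tick so the UI (and any animation on it) keeps running:

import tkinter as tk

root = tk.Tk()
label = tk.Label(root, text="working...")
label.pack()

work = list(range(1_000_000))  # stand-in for the long-running job

def do_piece():
    del work[:10_000]            # process one small chunk per tick
    if work:
        root.after(1, do_piece)  # schedule the next piece; UI stays responsive
    else:
        label.config(text="done")  # i.e. stop the animation here

root.after(1, do_piece)
root.mainloop()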
This could potentially be done, in a very complicated way, with callbacks and delegates, if your ODBC code is in a VS project...
But isn't the real solution to try to find a faster/more effective way to query your data?

What makes a JavaFx 1.2 Scene Graph Refresh?

My first question =). I'm writing a video game with a user interface written in JavaFX. The behavior is correct, but I'm having performance problems, and I'm trying to figure out what is queuing up the refreshes which are slowing down the app.
I've got a relatively complex scene graph that represents a hexagonal map. It scales, so you could have 100 or 1,000 hexagons in the map. As the number of hexagons grows, the responsiveness of the GUI decreases. I've used YourKit (a Java profiler) to trace these delays to major redraw operations.
I've spent most of the night trying to figure out how to do two things and understand one thing:
1) Cause a CustomNode to print something to the console whenever it is painted. This would help me identify exactly when these paints are being queued.
2) Identify when a CustomNode is put on the repaint queue.
If I answered 1 and 2, I might be able to figure out what it is that is binding all these different nodes together. Is it possible that JavaFX only works through global refreshes? (Doubtful.)
JavaFX script is a powerful UI language but certain practices will kill performance. Best performance generally boils down to:
keeping the Scene Graph small
keeping use of bind to a minimum (you can look at using triggers instead, which are more performant)
This blog post by Jim Weaver expands these points.
I'm not sure as to the specific answers to your questions. If you examine the 1.2.1 docs, you might be able to find a method in the Node documentation that you can override to add println statements, but I'm not sure it can be done. You could try posting on forums.sun.com.
This is a partial post. I expect to expand it after I've done some more work. I wanted to put in what I've done to date so I don't forget.
I realized that I'd need to get my IDE running with a full complement of the JavaFX 1.2 source. This would allow me to put breakpoints into the core code to figure out what is going on. I decided to do this configuration in Eclipse for remote debugging. I'm developing my FX in NetBeans but am more comfortable with Eclipse, so that's what I want to debug in if I can.
To get this info into Eclipse, I first made a project with the Java source that my code uses. I then added external Jars to the project. On my Mac, the Jars I linked to were in /Library/Frameworks/JavaFX.framework/Versions/1.2
Then I went searching for the source to link to these Jars. Unfortunately, not all of it is available; I could find some of it in /Library/Frameworks/JavaFX.framework/Versions/1.2/src.zip.
I did some research and found that the only available option left was to install a Java decompiler. I used this one because it was easy to install into Eclipse 3.4: http://java.decompiler.free.fr/
This is where I am now. I can navigate into the Core FX classes and believe I'll be able to set break points and begin real analysis. I'll update this post as I progress.
I found a helpful benchmarking tool:
If you run with the JVM arg:
-Djava.util.logging.config.file=/path/to/logging/file/logging.properties
And you've put the following lines into the file referenced by that arg:
handlers = java.util.logging.ConsoleHandler
java.util.logging.ConsoleHandler.level = ALL
java.util.logging.ConsoleHandler.formatter = java.util.logging.SimpleFormatter
com.sun.scenario.animation.fps.level = ALL
You'll get console output that includes your frame count per second. For FX 1.2 it wasn't working for me, but it appears to work for 1.2.1 (which was released Sept. 9, 2009). I don't have a NetBeans that runs 1.2.1 yet.
You may want to read this article.
http://fxexperience.com/2009/09/performance-improving-insertion-times/
Basically, insertions into the scene graph are slow, and benefits can be seen by batching up inserts.
