Trains: Can I reset the status of a task? (from 'Aborted' back to 'Running') - trains

I had to stop training in the middle, which set the Trains status to Aborted.
Later I continued it from the last checkpoint, but the status remained Aborted.
Furthermore, automatic training metrics stopped appearing in the dashboard (though custom metrics still do).
Can I reset the status back to Running and make Trains log training stats again?
Edit: When continuing training, I retrieved the task using Task.get_task() and not Task.init(). Maybe that's why training stats are not updated anymore?
Edit2: I also tried Task.init(reuse_last_task_id=original_task_id_string), but it just creates a new task, and doesn't reuse the given task ID.

Disclaimer: I'm a member of Allegro Trains team
When continuing training, I retrieved the task using Task.get_task() and not Task.init(). Maybe that's why training stats are not updated anymore?
Yes that's the only way to continue the same exact Task.
You can also mark it as started with task.mark_started() , that said the automatic logging will not kick in, as Task.get_task is usually used for accessing previously executed tasks and not continuing it (if you think the continue use case is important please feel free to open a GitHub issue, I can definitely see the value there)
You can also do something a bit different, and justcreate a new Task continuing from the last iteration the previous run ended. Notice that if you load the weights file (PyTorch/TF/Keras/JobLib) it will automatically connect it with the model that was created in the previous run (assuming the model was stored is the same location, or if you have the model on https/S3/Gs/Azure and you are using trains.StorageManager.get_local_copy())
previous_run = Task.get_task()
task = Task.init('examples', 'continue training')
task.set_initial_iteration(previous_run.get_last_iteration())
torch.load('/tmp/my_previous_weights')
BTW:
I also tried Task.init(reuse_last_task_id=original_task_id_string), but it just creates a new task, and doesn't reuse the given task ID.
This is a great idea for an interface to continue a previous run, feel free to add it as GitHub issue.

Related

Run multiple tests from a previous saved state

I see Cypress lets us get back to the application state during a test to debug using time-travel. Is it possible to use this state snapshot as a starting point for other tests?
Imagine a UI where options in a stepper depend on previous selections in earlier steps, and many of these rely on requests to an API. To run different tests in the last step I would need to complete the earlier steps in exactly the same way each time. This can be added to the before block to make the code simpler but we still have the delay and overheads of API requests each time to get to this exact same state. Given that Cypress already stores the state at various points, can I seed future tests with the state from previous ones?

Apache Niffi getMongo Processor

I am new in niffi i am using getMongo to extract document from mongodb but same result is coming again and again but the result of query is only 2 document the query is {"qty":{$gt:10}}
There is a similar question regarding this. Let me quote what I had said there:
"GetMongo will continue to pull data from MongoDB based on the provided properties such as Query, Projection, Limit. It has no way of tracking the execution process, at least for now. What you can do, however, is changing the Run Schedule and/or Scheduling Strategy. You can find them by right clicking on the processor and clicking Configure. By default, Run Schedule will be 0 sec which means running continuously. Changing it to, say, 60 min will make the processor run every one hour. This will still read the same documents from MongoDB again every one hour but since you have mentioned that you just want to run it only once, I'm suggesting this approach."
The question can be found here.

Sustainable solution using JMeter for a big functional flow

I have a huge flow to test using APIs. There are 3 endpoints. One is starting a process (db migration) that can last ~2-3 days, one is returning the status of the current running process (in progress, success, fail) and the last one is returning all the failed processes (as a list).
The whole flow should be:
Start the first process
Call the second endpoint until the first process ends (should get Fail or Success)
If the process failed, call the first endpoint again, if not, go to the next process.
The problem is that 1 process can last around 2-3 days and we have around 20k processes to check. (this should take a lot of time). I do have a special VM only for this.
My question: does it worth trying to implement a solution for this using JMeter?
It is not worth implementing in JMeter unless you want to use the tool as a workload automation engine that replaces functionalities provided by UC4 AppWorkr or Control-M. Based on what you describe, it does not appear to be a load test except the 2nd part that continuously queries the services for success/failure. I do not know the architecture behind that implementation. Hence, I am unable to quantify even that would be a load test or not.

Parallel Processing with Starting New Task - front end screen timeout

I am running an ABAP program to work with a huge amount of data. The SAP documentation gives the information that I should use
Remote Function Modules with the addition STARTING NEW TASK to process the data.
So my program first selects all the data, breaks the data into packages and calls a function module with a package of data for further processing.
So that's my pseudo code:
Select KEYFIELD from MYSAP_TABLE into table KEY_TABLE package size 500.
append KEY_TABLE to ALL_KEYS_TABLE.
Endselect.
Loop at ALL_KEYS_TABLE assigning <fs_table> .
call function 'Z_MASS_PROCESSING'
starting new TASK 'TEST' destination in group default
exporting
IT_DATA = <fs_table> .
Endloop .
But I am surprised to see that I am using Dialog Processes instead of Background Process for the call of my function module.
So now I encountered the problem that one of my Dialog Processes were killed after 60 Minutes because of Timeout.
For me, it seems that STARTING NEW TASK is not the right solution for parallel processing of mass data.
What will be the alternative?
As already mentioned, thats not an easy topic that is handled with a few lines of codes. The general steps you have to conduct in a thoughtful way to gain the desired benefit is:
1) Get free work processes available for parallel processing
2) Slice your data in packages to be processed
3) Call an RFC enabled function module asynchronously for each package with the available work processes. Handle waiting for free work processes, if packages > available processes
4) Receive your results asynchronously
5) Wait till everything is processed and merge the data together again and assure that every package was handled properly
Although it is bad practice to just post links, the code is very long and would make this answer very messy, therfore take a look at the following links:
Example1-aRFC
Example2-aRFC
Example3-aRFC
Other RFC variants (e.g. qRFC, tRFC etc.) can be found here with short description but sadly cannot give you further insight on them.
EDIT:
Regarding process type of aRFC:
In parallel processing, a job step is started as usual in a background
processing work process. (...)While the job itself runs in a
background process, the parallel processing tasks that it starts run
in dialog work processes. Such dialog work processes may be located on
any SAP server.
The server is specified with the GROUP (default: parallel_generators) see transaction RZ12 and can have its own ressources just for parallel processing. If your process times out, you have to slice your packages differently in size.
I think, best way for parallel processing in SAP is Bank Parallel Processing framework as Jagger mentioned. Unfortunently its rarerly mentioned in any resource and its not documented well.
Actually, best documentation I found was in this book
https://www.sap-press.com/abap-performance-tuning_2092/
Yes, it's tricky. It costed me about 5 or 6 days to force it going. But results were good.
All stuff is situated in package BANK_PP_JOBCTRL and you can use its name for googling.
Main idea there is to divide all your work into steps (simplified):
Preparation
Parallel processing
2.1. Processing preparation
2.2. Processing
(Actually there are more steps there)
First step is not paralleized. Here you should prepare all you data for parallel processing and devide it into 'piece' which will be processed in parallel.
Content of pieces, in turn, can be ID or preloaded data as well.
After that, you can run step 2 in parallel processing.
Great benefit of all this is that error in one piece of parallel work won't lead to crash of all your processing.
I recomend you check demo in function group BANK_API_PP_DEMO
To implement parallel processing, you need to do a bit more than just add that clause. The information is contained in this help topic. A lot of design effort needs to be devoted to ensure that the communication and result merging overhead of the parallel processing does not negate the performance advantage gained by the parallel processing in the first place and that referential integrity of the data is maintained even when some of the parallel tasks fail. Do not under-estimate the complexity of this task.
You could make use of the bgRFC technique. This is a new method of background processing made by SAP.
BgRFC has, in addition to the already existing IN BACKGROUND TASK, the possibility to configure and monitor all calls which run through this method.
You can read more documentation between the different possibilities here. This is all (of course) depending on your SAP version.

Windows Workflows - While Activity for creating multiple tasks not working

I am using a while activity for creating multiple tasks for a workflow. The code is executed fine and the task is created when the loop runs only once. But when the loop runs twice or more, only one task is getting created. Also the WF status shows as Error Occured.
All I want to do here is create multiple tasks (no of tasks depends on an entered column value) for the same user. Is it posible to use 'while' in this scenario? Or is there any other way to go ahead?
NB: I am using state machine workflow.
You may want to use a Replicator Activity which will in turn "clone" its child-activities. It can be run parallel or sequentially.
I found Working with the Replicator Activity and an Until Condition useful.
Otherwise without the Replicator, there is just one Task Activity.
In either case, make sure to assign a new Guid to the TaskId property. However, as an annoying "feature": it will not work if you just assign the TaskId property (I know, I tried and was like "Wth?!?"). Instead, bind the TaskId to a Field/Property and then assign to that.

Resources