What is the proper way to store logs in Talend ETL?

I am new to Talend ETL. What is the proper way to store Talend logs when Jobs run automatically? Specifically:
1. Job running time
2. Errors, in case a Job fails
3. Number of rows processed

In Talend Open Studio, the tStatCatcher component listens to components that have the tStatCatcher Statistics option set to true and writes statistics information to the defined output. tStatCatcher also listens for the start and end of a Job's execution.
At the end of the Job's execution, pass the logs from tStatCatcher to a database.
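As a minimal sketch of reading those statistics back, assuming the tStatCatcher output lands in a database table whose columns follow the usual Stats & Logs layout (moment, pid, job, message_type, duration; verify against your actual schema):

import sqlite3

# Hypothetical stats table written by tStatCatcher through a database
# output component; swap sqlite3 for your actual database driver.
conn = sqlite3.connect("talend_logs.db")

query = """
SELECT job,
       MIN(moment)   AS started_at,
       MAX(moment)   AS ended_at,
       MAX(duration) AS duration_ms
FROM   stat_catcher
WHERE  message_type IN ('begin', 'end')
GROUP  BY job, pid
ORDER  BY started_at DESC
"""

# One row per Job run: name, start, end, and elapsed time.
for job, started_at, ended_at, duration_ms in conn.execute(query):
    print(job, started_at, ended_at, duration_ms)

For errors (point 2), the companion tLogCatcher component does the same for Java exceptions and tDie/tWarn messages, and tFlowMeterCatcher covers row counts (point 3).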

Related

NiFi process group scheduling using Control-M

I am new to NiFi. My requirement is to trigger a NiFi process group using an external scheduling tool called Control-M. I tried using a shell script to start and stop the process group via curl commands. The process group fetches data from a text file and writes it into a database, but I am unable to determine when the process group has completed: I can see statuses like Started, Running, and Stopped, but no Completed state. I am stuck on this issue and need your valuable inputs on how to determine that all the records inside the process group got inserted into the database.
NiFi is not a batch 'start & stop' style tool. NiFi is built to work with continuous streams of data, meaning that flows are 'always on'. It is not intended to be used with batch schedulers like Control-M, Oozie, Airflow, etc. As such, there is no 'Completed' status for a flow.
That said, if you want to schedule flows this way, it is possible, but you need to build it into the flow yourself. You will need to define what 'Completed' means and build that logic into your flow, e.g. a MonitorActivity processor after your last processor to watch for inactivity.
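For illustration, a sketch of what the Control-M side could run, assuming the standard NiFi REST API and treating 'no FlowFiles queued anywhere in the group' as the home-made Completed state (the base URL and process-group id are placeholders):

import time
import requests

NIFI = "http://localhost:8080/nifi-api"   # placeholder NiFi instance
PG_ID = "your-process-group-id"           # placeholder process group id

def set_state(state):
    # PUT /flow/process-groups/{id} starts or stops every component in the group
    requests.put(f"{NIFI}/flow/process-groups/{PG_ID}",
                 json={"id": PG_ID, "state": state}).raise_for_status()

def queued_flowfiles():
    status = requests.get(f"{NIFI}/flow/process-groups/{PG_ID}/status").json()
    return status["processGroupStatus"]["aggregateSnapshot"]["flowFilesQueued"]

set_state("RUNNING")
time.sleep(30)                   # give the source processor time to pick up the file
while queued_flowfiles() > 0:    # home-made 'Completed': queues drained
    time.sleep(10)
set_state("STOPPED")

Note the empty-queue heuristic only works if the flow really drains; a MonitorActivity-based signal, as suggested above, is the more robust option.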

Can a DataStage job be viewed without access to a DataStage installation

I have been tasked with replacing an ETL process that used to run in DataStage. I have used DataStage in the past and would be able to review the job in order to replicate it, if I could view it.
I have the extracted jobs in version control, is there a way to view the job without access to DataStage? (If needed, I could request new extracts)
You could ask for a job report: a picture of the job, together with the printed logic for each stage, in the form of an HTML page. This might be enough to rebuild the job. There is no free DataStage fat client, so there is no way to open the extracts without an installation.
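Since the extracted jobs are already in version control, and assuming they are plain-text .dsx exports (which usually carry the design as readable markers such as BEGIN DSJOB and Identifier; verify against your own extracts), a rough inventory can be pulled without any DataStage client:

import re
import sys

# Assumes a plain-text .dsx export; the markers below match the usual
# layout of such exports, but check them against your own files.
text = open(sys.argv[1], encoding="latin-1").read()

jobs = re.findall(r'BEGIN DSJOB\s+Identifier "([^"]+)"', text)
names = re.findall(r'Name "([^"]+)"', text)

print("jobs:", jobs)
print("candidate stage/link names:", sorted(set(names)))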

Azure Databricks scheduled job executes successfully but data is not loaded into the destination

The log shows something like "command took 49.03 minutes", and I can see the status of the job as "Succeeded", but the data is not loaded.
Kindly help me out with possible explanations.
In the log file, please check the number of records processed and loaded in each step, and have a look at the tables used in the joins.
Perform a more detailed analysis by executing the steps manually and analysing the outcome.
Thanks.
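As a quick sanity check, something along these lines (the table names and join key are placeholders) can be run in a Databricks Python notebook, where spark is the session the notebook provides automatically:

# Placeholder table names; substitute the actual source and destination.
src_count = spark.table("source_db.staging_table").count()
dst_count = spark.table("target_db.destination_table").count()
print(f"source rows: {src_count}, destination rows: {dst_count}")

# A join on a mismatched or nullable key silently drops rows; counting
# source keys with no match often explains a 'succeeded but empty' load.
spark.sql("""
    SELECT COUNT(*) AS unmatched_source_rows
    FROM source_db.staging_table s
    LEFT ANTI JOIN target_db.destination_table t ON s.id = t.id
""").show()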

How to deploy Hive scripts in multiple environments

Please help me answer the questions below.
What is the deployment strategy for Hive-related scripts? For SQL Server we have the DACPAC; is there any such component for Hive?
Is there an API to get the status of a Job submitted through ODBC?
Have you looked at Azure Data Factory? http://azure.microsoft.com/en-us/services/data-factory/
Regarding your questions on APIs to check job status, here are a few PowerShell APIs. Do these help you?
“Start-AzureHDInsightJob” (https://msdn.microsoft.com/en-us/library/dn593743.aspx) starts the job and returns a job object which can be used to track/kill the job.
“Wait-AzureHDInsightJob” (https://msdn.microsoft.com/en-us/library/dn593748.aspx) uses the job object to check the status of the job. It will wait until the job completes or the wait time is exceeded.
“Stop-AzureHDInsightJob” (https://msdn.microsoft.com/en-us/library/dn593754.aspx) stops the job.
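Outside PowerShell, the job status can also be polled over REST. A minimal sketch against the WebHCat (Templeton) endpoint HDInsight exposes, with the cluster name, credentials, and job id as placeholders, and assuming the response carries the usual Hadoop status.state field:

import requests

CLUSTER = "yourcluster"              # placeholder HDInsight cluster name
USER, PASSWORD = "admin", "secret"   # placeholder cluster credentials
JOB_ID = "your-job-id"               # id returned when the job was submitted

# WebHCat (Templeton) exposes Hadoop job state over HTTPS on HDInsight.
url = f"https://{CLUSTER}.azurehdinsight.net/templeton/v1/jobs/{JOB_ID}"
resp = requests.get(url, params={"user.name": USER}, auth=(USER, PASSWORD))
resp.raise_for_status()

state = resp.json()["status"]["state"]   # e.g. RUNNING, SUCCEEDED, KILLED
print(f"job {JOB_ID} is {state}")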

Oracle scheduler job sometimes failing, without error message

I have created a DBMS_SCHEDULER job which should write a short string to the alert log of each instance of the 9 databases on our 2-node 11g Oracle RAC cluster, once every 24 hours.
The job action is:
'dbms_system.ksdwrt(2, to_char(sysdate+1,''rrrr-mm-dd'') || ''_First'');'
which should write a line like:
2014-08-27_First
The job runs successfully according to its log, and it does write what it's supposed to, but not always. It has only been scheduled for a few days, so I can't be certain, but it looks as if it only ever writes to one instance's alert log per run. The alert logs on both nodes seem to be getting written to, but when an entry appears on one node it does not appear on the other. There is, however, no indication whatsoever of any failure in the job itself.
Can anyone shed any light on this behaviour? Thanks.
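One way to shed light on this: on RAC, each scheduler run executes on a single instance, and DBA_SCHEDULER_JOB_RUN_DETAILS records which one, so only that node's alert log would receive the line for that run. A sketch with python-oracledb (connection details and job name are placeholders):

import oracledb   # pip install oracledb; credentials below are placeholders

conn = oracledb.connect(user="system", password="secret",
                        dsn="rac-scan:1521/ORCL")

# INSTANCE_ID shows which RAC node each run executed on, which should
# line up with the alert log that received the ksdwrt line that day.
sql = """
SELECT log_date, status, instance_id, additional_info
FROM   dba_scheduler_job_run_details
WHERE  job_name = :job
ORDER  BY log_date DESC
"""

for row in conn.cursor().execute(sql, job="YOUR_JOB_NAME"):
    print(row)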
