Apache Spark UI in local mode - Maven

I'm new to Apache Spark. I'm currently using it in IntelliJ IDEA 14 as a Maven project. I'm unable to access the Spark UI after my program stops because of:
15/10/22 23:28:27 INFO SparkUI: Stopped Spark web UI at http://192.168.1.88:4041
The Spark UI stops too quickly for me to review what my program does. Is there any way to review the Spark UI for a particular program after it has stopped? Or is there some kind of log, created while running the program, that I can inspect instead?
After setting "spark.eventLog.enabled" to true and "spark.eventLog.dir" to a local folder, I was able to produce Unix executable files (local-1445955378567). However, the details of the log are not what I expected. How do I add a SparkListener?
Output of the Unix executable "local-1445955378567":

You need to either keep the UI alive (by keeping your context alive) or start a history server, since the UI is tied to the context. See http://spark.apache.org/docs/latest/monitoring.html
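Since the question also mentions event logging: the same settings can be passed programmatically, and the history server can then replay the finished application's UI from the log directory. A minimal Java sketch, assuming an example directory (/tmp/spark-events) that the history server would also be pointed at:

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

public class EventLogExample {
    public static void main(String[] args) {
        // "/tmp/spark-events" is only an example path; it must match the directory
        // the history server reads (spark.history.fs.logDirectory).
        SparkConf conf = new SparkConf()
                .setMaster("local[*]")
                .setAppName("EventLogExample")
                .set("spark.eventLog.enabled", "true")
                .set("spark.eventLog.dir", "file:///tmp/spark-events");

        JavaSparkContext sc = new JavaSparkContext(conf);
        // ... run your job here ...
        sc.stop(); // the event log file (e.g. local-1445955378567) is finalized on stop
    }
}

After the application has stopped, starting the history server (sbin/start-history-server.sh) with spark.history.fs.logDirectory set to that same directory lets you browse the completed application's UI, by default on port 18080.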

Related

Task fails with no way to debug

I have a node running two jobs - they communicate with an external adaptor and then send the value on-chain.
One job works fine, which already tells me that the node can write on-chain.
The other job receives the request, talks with the external adaptor (I have verified this on the external adaptor server), and then doesn't submit anything on-chain.
There is no way to debug this through the Operator UI. This is what it shows:
What should I do? I am running the Chainlink develop version because the most up-to-date stable version has a critical bug.
In Chainlink node version 1.8.0, there are "Error" and "Runs" tabs in the node UI in the browser, and these two tabs allow you to view what went wrong with your job run. You can find the latest Chainlink Docker image here.
The error messages under the "Error" tab are shown below; this info reflects the error your job encountered during the run.
If there are no "Error" and "Runs" tabs in the browser, or there is nothing shown in the UI, you can also find error info in the log file on the server running the Chainlink node. The default path of the Chainlink node log file is /chainlink/chainlink_debug.log, so you can log into the server that is running the node and check the log for debugging.
Hope it helps.

Logging for Talend job running within spring-boot

We have Talend jobs triggered from within a Spring Boot application. Is there any way to route the output of the Talend jobs to the application log files?
One workaround we found is to write logs directly to an external file (with the filePath passed as a context param). But I wanted to find out whether there is a better way to configure this seamlessly.
Not sure if I understood the question correctly, but I guess your concern might be about what happened to the triggered jobs.
Logging
With respect to logging for Talend, you could configure it using Log4j:
https://help.talend.com/reader/5DC~TBhDsBie5JTXyVLW4g/QSGCZJKXo~uhKvZDq1DxUg
Monitoring
Regarding the status of the executed job, you could retrieve the execution details using a REST call (the Talend MetaServlet API):
getTaskExecutionStatus
https://help.talend.com/reader/oYf9gKhmYrkWCiSua4qLeg/SLiAyHyDTjuznLR_F~MiQQ
By modifying the existing Talend job, you could also design a kind of feedback loop, i.e. trigger a REST call back to your application with the details of the execution from the Talend job.
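If the main goal is just to see the triggered job's console output in the Spring Boot application log, one rough workaround is to redirect System.out/System.err around the job invocation. This is only a sketch: the generated job class name below is hypothetical, and the exact entry point and context-parameter syntax depend on your Talend version.

import java.io.ByteArrayOutputStream;
import java.io.OutputStream;
import java.io.PrintStream;
import java.nio.charset.StandardCharsets;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class TalendJobRunner {

    private static final Logger log = LoggerFactory.getLogger(TalendJobRunner.class);

    // Forwards each completed line written to the stream to the SLF4J logger.
    static class Slf4jOutputStream extends OutputStream {
        private final ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        private final boolean error;

        Slf4jOutputStream(boolean error) { this.error = error; }

        @Override
        public void write(int b) {
            if (b == '\n') {
                String line = new String(buffer.toByteArray(), StandardCharsets.UTF_8);
                buffer.reset();
                if (error) { log.error(line); } else { log.info(line); }
            } else {
                buffer.write(b);
            }
        }
    }

    public void runTalendJob() {
        PrintStream originalOut = System.out;
        PrintStream originalErr = System.err;
        try {
            // Route the Talend job's console output into the application log.
            System.setOut(new PrintStream(new Slf4jOutputStream(false), true));
            System.setErr(new PrintStream(new Slf4jOutputStream(true), true));

            // my_project.my_job_0_1.MyJob is a hypothetical generated Talend job class.
            my_project.my_job_0_1.MyJob job = new my_project.my_job_0_1.MyJob();
            job.runJob(new String[] { "--context_param", "filePath=/tmp/talend-job.log" });
        } finally {
            System.setOut(originalOut);
            System.setErr(originalErr);
        }
    }
}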

Java Quartz is internally caching jobs and its parameters

I am involved in a project which requires me to create a job scheduler using Quartz Scheduler to schedule various jobs which in turn trigger Pentaho Kettle transformation(s). The Kettle transformations are essentially ETL scripts performing some mundane activities in our case. I am facing a critical issue while running the scheduler:
We have around 10 jobs scheduled using the job scheduler. For some 3 to 4 specific jobs it throws the following exception:
Unable to load the job from XML file [/home /transformations/jobs/TestJob.kjb] Unable to read file [file:///home /transformations/jobs/ TestJob.kjb] Could not read from "file:///home /transformations/jobs/TestJob.kjb" because it is a not a file.
org.pentaho.di.job.JobMeta.<init>(JobMeta.java:715)
org.pentaho.di.job.JobMeta.<init>(JobMeta.java:679)
com. XYZ.transformation.jobs.impl.JobBootstrapImpl.executeJob(JobBootstrapImpl.java:115)
com. XYZ.transformation.jobs.impl.JobBootstrapImpl.startJobsExecution(JobBootstrapImpl.java:100)
com. XYZ.transformation.jobs.impl.QuartzJobsScheduler.executeInternal(QuartzJobsScheduler.java:25)
org.springframework.scheduling.quartz.QuartzJobBean.execute(QuartzJobBean.java:86)
org.quartz.core.JobRunShell.run(JobRunShell.java:223)
org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:549)
The weird thing is that, upon verifying the specified path, i.e. "/home /transformations/jobs/TestJob.kjb", the file is present and I am able to read it. Moreover, the job runs successfully and does all the things it is supposed to, yet it throws the exception detailed above.
After observing closely, I strongly feel that Quartz is internally caching jobs and/or their parameters. We do load certain parameters required for the job to execute after it is triggered. Would it be possible to delete/purge the cache used by Quartz? I also tried killing all the Java processes running on the box (thinking that it might kill Quartz itself, as Quartz runs within a Java process) and restarting Quartz and its jobs afresh, but I couldn't make it work as expected. It still stores the old parameters somewhere, perhaps in some cache.
Versions used –
Spring Framework (spring-core & spring-beans) - 3.0.6.RELEASE
Quartz Scheduler - 1.8.6
Platform – Red Hat Linux - 2.6.18-308.el5
Pentaho Kettle – Spoon Stable Release – 4.3.0
I would do it this way:
Ensure that the Pentaho job can run standalone first, with a shell script, a Java service wrapper or whatever
Then, in the Quartz job, use Quartz's NativeJob to call that same standalone script (see the sketch below)
Just my two cents
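To make the NativeJob suggestion concrete, here is a rough Quartz 1.8-style sketch; the script path, job/trigger names and cron expression are made up, and the script itself would just wrap a kitchen.sh call to the .kjb file:

import org.quartz.CronTrigger;
import org.quartz.JobDetail;
import org.quartz.Scheduler;
import org.quartz.impl.StdSchedulerFactory;
import org.quartz.jobs.NativeJob;

public class KettleNativeJobScheduler {
    public static void main(String[] args) throws Exception {
        Scheduler scheduler = new StdSchedulerFactory().getScheduler();

        // NativeJob forks a fresh OS process, so Quartz holds no Kettle state between runs.
        JobDetail jobDetail = new JobDetail("testJob", "kettleJobs", NativeJob.class);
        jobDetail.getJobDataMap().put(NativeJob.PROP_COMMAND, "/home/transformations/jobs/run-testjob.sh");
        jobDetail.getJobDataMap().put(NativeJob.PROP_WAIT_FOR_PROCESS, true);
        jobDetail.getJobDataMap().put(NativeJob.PROP_CONSUME_STREAMS, true);

        // Example schedule: every night at 2 AM.
        CronTrigger trigger = new CronTrigger("testJobTrigger", "kettleJobs", "0 0 2 * * ?");

        scheduler.scheduleJob(jobDetail, trigger);
        scheduler.start();
    }
}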
Looks to me like you have an extra space in the path.
/home /transformations/jobs/TestJob.kjb
Between the e of home and the /
Remove that space, I can't possibly believe you actually have a home directory called "home "!!

How does one run Spring XD in distributed mode?

I'm looking to start Spring XD in distributed mode (more specifically, deploying it with BOSH). How does the admin component communicate with the module container?
If it's via TCP/HTTP, surely I'll have to tell the admin component where all the containers are? If it's via Redis, I would've thought that I'll need to tell the containers where the Redis instance is?
Update
I've tried running xd-admin and Redis on one box, and xd-container on another, with redis.properties updated to point to the admin box. The container starts without reporting any exceptions.
Running the example stream submission curl -d "time | log" http://{admin IP}:8080/streams/ticktock yields no output on either console, and no output in the logs.
If you are using the xd-container script, then redis.properties is expected to be under "XD_HOME/config", where XD_HOME points to the base directory where you have the bin, config, lib & modules directories of XD.
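For reference, the relevant entries in redis.properties would look something like the following; the host is a placeholder for whichever box actually runs Redis, and the exact key names should be checked against your XD release:

redis.hostname={IP of the box running Redis}
redis.port=6379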
Communication between the Admin and Container runtime components is via the messaging bus, which by default is Redis.
Make sure the environment variable XD_HOME is set as per the documentation; if it is not, you will see a logging message that suggests the properties file has been loaded correctly when in fact it has not:
13/06/24 09:20:35 INFO support.PropertySourcesPlaceholderConfigurer: Loading properties file from URL [file:../config/redis.properties]

Spring Batch Admin: Schedule new jobs through web GUI

A newbie question on Spring Batch Admin.
My requirement is that the user should be able to schedule new jobs (passing some parameters for the job functionality) through a web UI. These jobs should be persistent, will be repetitive and could be cancelled or deleted. Also, a report could be generated for last run jobs and to list all the existing jobs with their next run dates.
Perhaps my most important requirement is that this should be possible "on the fly", without requiring redeployment of the web application or a server restart.
Can this be done using Spring Batch Admin? (I see that the guide talks about uploading an XML file to add a job, but that seems tedious; if there is an API, why shouldn't we be able to create a job on the fly through the Batch Admin web UI?) Or do the JDK Timer or Quartz support it?
Once a job has been created, it can't be deleted, but it can be stopped. Allowing deletion from the DB is a risky operation, as Spring Batch might have already started the job execution while the DB has not been updated yet. If one removes the job at that moment, you get an inconsistency.
Scheduling a new job is described in Launch Job. It is not possible to create new types of jobs, as jobs can generally have complicated configuration which is parsed only once, when the Spring context is loaded.
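For the "passing some parameters" part of the question, launching an already-configured job with fresh parameters at runtime typically goes through the JobLauncher. A minimal sketch, with hypothetical bean and parameter names:

import org.springframework.batch.core.Job;
import org.springframework.batch.core.JobParameters;
import org.springframework.batch.core.JobParametersBuilder;
import org.springframework.batch.core.launch.JobLauncher;

public class OnTheFlyLauncher {

    private final JobLauncher jobLauncher;
    private final Job reportJob; // a job already defined in the Spring context

    public OnTheFlyLauncher(JobLauncher jobLauncher, Job reportJob) {
        this.jobLauncher = jobLauncher;
        this.reportJob = reportJob;
    }

    public void launchWithParams(String region) throws Exception {
        // Parameters supplied "on the fly"; the timestamp makes each run a new job instance.
        JobParameters params = new JobParametersBuilder()
                .addString("region", region)
                .addLong("run.timestamp", System.currentTimeMillis())
                .toJobParameters();
        jobLauncher.run(reportJob, params);
    }
}

A scheduler such as Quartz can then call a method like this on whatever cron expression the user enters, which covers the repetitive-runs requirement without redeploying the application.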
Dynamic deployment (on the fly) of jobs and configurations, without requiring a server restart, is a feature we implemented in the Trooper Batch profile. It is not exactly Spring Batch Admin but builds on it. You continue to write your jobs using Spring Batch; just the container changes, since in Trooper you would use its Batch profile runtime. Screenshots and features are here: https://github.com/regunathb/Trooper/wiki/Writing-Batch-jobs-in-Trooper
I think we can deploy each Spring Batch job as its own SBA instance. I mean, each batch job would be compiled as a war file, and we deploy them together on the server. This way, we have the following URLs to monitor each job:
http://batchjobserver/job1
http://batchjobserver/job2
http://batchjobserver/job3
http://batchjobserver/job4
But the downside is that each war file surely contains lib files, which makes every war file around 10 MB in size.
At the same time, I tried manually adding new-job.xml to war-file\WEB-INF\classes\META-INF\spring\batch\jobs, and new-job.jar to war-file\WEB-INF\lib, without stopping JBoss. It works: the new job shows up in the SBA UI and is runnable.
But obviously this would lead to a lot of maintenance and troubleshooting, so it is not really practical.
