Run Flink with parallelism greater than 1

Maybe I'm just missing something, but I have no more ideas where to look.
I read messages from two sources, join them on a common key, and sink
it all to Kafka.
val env = StreamExecutionEnvironment.getExecutionEnvironment
env.setParallelism(3)
...
// Key both streams by searchId so records sharing a key are routed to the
// same parallel subtask, then join them and write the result to Kafka.
source1
  .keyBy(_.searchId)
  .connect(source2.keyBy(_.searchId))
  .process(new SearchResultsJoinFunction)
  .addSink(KafkaSink.sink)
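For reference, the join function has roughly the shape sketched below; the types, state name and buffering logic are illustrative, not the actual SearchResultsJoinFunction:

import org.apache.flink.api.common.state.{ValueState, ValueStateDescriptor}
import org.apache.flink.streaming.api.functions.co.KeyedCoProcessFunction
import org.apache.flink.util.Collector

// Search, Result and Joined stand in for the actual models.
class SearchResultsJoinFunction
    extends KeyedCoProcessFunction[String, Search, Result, Joined] {

  // Buffer the element from one stream until its partner arrives.
  private lazy val pending: ValueState[Search] = getRuntimeContext.getState(
    new ValueStateDescriptor[Search]("pendingSearch", classOf[Search]))

  override def processElement1(
      search: Search,
      ctx: KeyedCoProcessFunction[String, Search, Result, Joined]#Context,
      out: Collector[Joined]): Unit =
    pending.update(search)

  override def processElement2(
      result: Result,
      ctx: KeyedCoProcessFunction[String, Search, Result, Joined]#Context,
      out: Collector[Joined]): Unit = {
    val search = pending.value()
    if (search != null) {
      out.collect(Joined(search, result))
      pending.clear()
    }
  }
}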
It works perfectly when I launch it locally, and it also works on the cluster with parallelism set to 1, but with 3 it no longer does.
When I deploy it with 1 JobManager and 3 TaskManagers and every task reaches the "RUNNING" state, after 2
minutes (during which nothing arrives at the sink) one of the TaskManagers produces the following log:
https://gist.github.com/zavalit/1b1bf6621bed2a3848a05c1ef84c689c#file-gistfile1-txt-L108
and the whole thing just shuts down.
I'll appreciate any hint.
Thanks in advance.

The problem appears to be that this task manager -- flink-taskmanager-12-2qvcd (10.81.53.209) -- is unable to talk to at least one of the other task managers, namely flink-taskmanager-12-57jzd (10.81.40.124:46240). This is why the job never really starts to run.
I would check in the logs for this other task manager to see what it says, and I would also review your network configuration. Perhaps a firewall is getting in the way?
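If a firewall turns out to be the problem, one thing worth knowing is that the TaskManager data and RPC ports are picked at random by default, which makes them hard to allow-list. Pinning them in flink-conf.yaml keeps the firewall rules simple; the port values below are only examples:

taskmanager.rpc.port: 6122
taskmanager.data.port: 6121
blob.server.port: 6124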

Related

What is the correct use case for SchedulerLock lockAtMostFor?

I am using SchedulerLock in Spring Boot, and I am running 2 servers.
What I'm curious about is why the "lockAtMostFor" option exists at all.
Take an example: on one of my 2 servers, the scheduled task runs first and acquires the lock.
But something goes wrong while it is running, and that server goes down.
At that moment, my scheduled task ends incompletely.
Every guide I read is full of vague statements about "lock time in case a node dies".
When a node dies, it can no longer execute schedules.
But why keep holding a lock for a dead node?
Even if I urgently try to run the schedule manually on the second server, the lock above makes that impossible.
What does this option exist for?
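For reference, lockAtMostFor is an upper bound after which other nodes treat the lock as expired, so a lock held by a node that died mid-run is eventually released rather than held forever. A minimal sketch of how it is typically declared with ShedLock (job name, schedule and durations are illustrative; the LockProvider/@EnableSchedulerLock wiring is omitted):

import net.javacrumbs.shedlock.spring.annotation.SchedulerLock
import org.springframework.scheduling.annotation.Scheduled
import org.springframework.stereotype.Component

@Component
class ReportJob {

  // If the node holding the lock dies mid-run, it never unlocks.
  // lockAtMostFor bounds how long that stale lock survives: after 10
  // minutes any other node may acquire the lock and run the task again.
  @Scheduled(cron = "0 0 * * * *")
  @SchedulerLock(name = "reportJob", lockAtMostFor = "10m", lockAtLeastFor = "30s")
  def run(): Unit = {
    // job body
  }
}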

Auto-trigger job B after triggering job A in TeamCity

Is there a way to auto-trigger job B exactly 1 hour after job A is triggered? The issue is that job A will not have finished its work by then; in the middle of its own run it has to trigger job B, exactly one hour in. The other option would be to jump to build script 2 exactly one hour into the execution of script 1. Is there any way to do this?
Thanks in advance
I cannot offer a good practice as a solution, but I can suggest two possible workarounds:
1. Build Pause
You can add a 'Command Line' shell pause as the last build step of project A or the first build step of project B. That pause must be set to one hour:
sleep 1h
You need to reconfigure the default build timeout for this, or the job will fail.
2. Strict Scheduling
If you have some flexibility about when A can or should be triggered, you can use the 'Schedule Trigger' to schedule both A and B. For example, if you schedule project A for 1 pm and project B for 2 pm, you ensure that there is at least one hour between the two. This can be scheduled as often as necessary.
I don't think what you are proposing is a good way to go about setting up the deployment, but I can think of a few workarounds that might help if you are forced in this direction.
In configuration A, add a build step which adds a scheduled build trigger to configuration B for an hour's time (using the API). In configuration B, add a build step at the end of the configuration to remove this scheduled trigger. This feels like a really horrible hack which should be avoided, but more details here.
Outside of TeamCity make use of any pub/sub mechanism so the deployment to the VM can create an event when it has completed. Subscribe to this event and trigger a follow on build using the TeamCity API. For example, if you are using AWS you could have an SNS topic with a lambda function as a subscriber. This lambda function would call the API to queue configuration B when the environment is in a suitable state.
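For illustration, the call that queues configuration B through TeamCity's REST API might look like the sketch below (host, build configuration ID and token are placeholders, and the surrounding Lambda wiring is omitted):

import java.net.URI
import java.net.http.{HttpClient, HttpRequest, HttpResponse}

// POST an XML <build> element to TeamCity's build queue endpoint.
// Host, build configuration ID and token are placeholders.
val body = """<build><buildType id="ProjectB_Deploy"/></build>"""
val request = HttpRequest.newBuilder()
  .uri(URI.create("https://teamcity.example.com/app/rest/buildQueue"))
  .header("Content-Type", "application/xml")
  .header("Authorization", "Bearer " + sys.env("TEAMCITY_TOKEN"))
  .POST(HttpRequest.BodyPublishers.ofString(body))
  .build()
val response = HttpClient.newHttpClient()
  .send(request, HttpResponse.BodyHandlers.ofString())
println(s"queued: HTTP ${response.statusCode()}")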
There are probably much nicer solutions if you share what deployment software you are using.

Azure DevOps Server 2019 (on-premises): Can agent jobs be run serially?

I have a scenario where I would like a build to start running on one agent (Job 1), and then after doing some work, I'd like it to run a step on a special agent pool of machines with specially licensed software (Job 2). When that is done, I'd like the rest of the build to complete on the original agent (Job 3).
I have been able to use "Variable Tools for Azure DevOps Services" to successfully pass any number of variables between agent jobs, even when they are running on different machines. It is no problem for me to pass a UNC path from Job1 to Job2 / Job3, etc.
However, what I am seeing is that no matter what I do, agent jobs always run in parallel, and there is no way to get them to run serially unless they are locked to the same agent on the same machine, which defeats the whole purpose.
Does anyone know of a means to accomplish this? Right now in tests I have to use "Start-Sleep" or something similar and repeatedly poll an external event, which is a terribly inelegant workaround.
I found the answer. A job's properties contain a field called "dependencies". You can make jobs run serially by setting a dependency on the previous job.
In Azure DevOps, the agent job's settings expose dependency and run-condition options; you can select whichever fits your requirements.
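If you use YAML pipelines, the same serial ordering is expressed with dependsOn; a sketch with illustrative job and pool names:

jobs:
- job: Build            # runs on the original agent pool
- job: LicensedStep     # starts only after Build succeeds
  dependsOn: Build
  pool: SpecialPool     # the specially licensed agent pool
- job: Finish           # starts only after LicensedStep succeeds
  dependsOn: LicensedStep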

Issues in custom scheduler instance for EJB timer service in clustered environment

We have our application deployed on WebSphere Application Server. The application runs in a clustered environment with 6 nodes. The EJB timer service is configured using a custom scheduler, with a datasource pointing to an Oracle database. When the application is deployed on the cluster, it triggers the EJB timer service on node1, which is the node recorded in the Oracle database.
Sometimes the value in the Oracle database changes by itself to some other node, like node2 or node3, and the EJB timer gets stopped. Any suggestions or advice on why it changes automatically?
EJB timer configuration
Server(0).components.ApplicationServer(1).components.EJBContainer(1).timerSettings.EJBTimer(0).datasourceJNDIName = jdbc/cdb_db
Server(0).components.ApplicationServer(1).components.EJBContainer(1).timerSettings.EJBTimer(0).nonPersistentTimerRetryCount = -1
Server(0).components.ApplicationServer(1).components.EJBContainer(1).timerSettings.EJBTimer(0).nonPersistentTimerRetryInterval = 300
Server(0).components.ApplicationServer(1).components.EJBContainer(1).timerSettings.EJBTimer(0).numAlarmThreads = 1
Server(0).components.ApplicationServer(1).components.EJBContainer(1).timerSettings.EJBTimer(0).numNPTimerThreads = 1
Server(0).components.ApplicationServer(1).components.EJBContainer(1).timerSettings.EJBTimer(0).pollInterval = 300
Server(0).components.ApplicationServer(1).components.EJBContainer(1).timerSettings.EJBTimer(0).tablePrefix = EJBTIMER_
Server(0).components.ApplicationServer(1).components.EJBContainer(1).timerSettings.EJBTimer(0).uniqueTimerManagerForNP = false
As the first comment added to this question points out, it is the designed behavior of EJB Persistent Timers/Scheduler to have any one member run all of the tasks until that member isn't available or cannot respond quickly enough, in which case another member takes over.
If you don't like this behavior and want to change it so that your timer tasks can only run on a single member, you can accomplish that by stopping the scheduler poll daemon on all members except for the one that you want to run the tasks. Here is a knowledge center document which describes how to do that:
https://www.ibm.com/support/knowledgecenter/en/SSAW57_8.5.5/com.ibm.websphere.nd.multiplatform.doc/scheduler/xmp/xsch_stopstart.html
Just be aware that if you do this, you lose the scheduler's ability to automatically start running tasks on a different member should the designated member go down. In that case, tasks will not run at all until either:
1) the member that is allowed to run them comes back up, or
2) you manually use the aforementioned WASScheduler MBean to start the scheduler poll daemon on a different member, allowing tasks to run there.
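If you go that route, the MBean can be driven from wsadmin or from any JMX client. A rough sketch of the latter (the connector address and MBean name pattern are hypothetical, and the stopDaemon operation name is an assumption based on the linked document):

import javax.management.ObjectName
import javax.management.remote.{JMXConnectorFactory, JMXServiceURL}

// Connect to the member's JMX port (address is hypothetical) and invoke
// stopDaemon on every scheduler MBean found on that member.
val url  = new JMXServiceURL("service:jmx:rmi:///jndi/rmi://was-host:2809/jmxrmi")
val conn = JMXConnectorFactory.connect(url).getMBeanServerConnection
val pattern = new ObjectName("WebSphere:type=WASScheduler,*")
conn.queryNames(pattern, null).forEach { mbean =>
  conn.invoke(mbean, "stopDaemon", Array.empty[AnyRef], Array.empty[String])
}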

Oracle scheduler job sometimes failing, without error message

I have created a DBMS Scheduler job which should write a short string to the alert log of each instance of the 9 databases on our 2-node 11g Oracle RAC cluster, once every 24 hours.
The job action is:
'dbms_system.ksdwrt(2, to_char(sysdate+1,''rrrr-mm-dd'') || ''_First'');'
which should write a line like:
2014-08-27_First
The job runs successfully according to its log, and it does write what it's supposed to, but not always. It has only been scheduled for a few days, so I can't be certain, but it looks as if it only ever writes to one instance's alert log. Logs on both sides seem to get written to, but when an entry appears on one side it is missing from the other. There is, however, no indication whatsoever of any failure in the job itself.
Can anyone shed any light on this behaviour? Thanks.
