Debugging Node.js processes with cluster.fork()

I've got some code that looks very much like the sample in the Cluster documentation at http://nodejs.org/docs/v0.6.0/api/cluster.html, to wit:
var cluster = require('cluster');
var server = require('./mycustomserver');
var numCPUs = require('os').cpus().length;

if (cluster.isMaster) {
    var i;
    // Master process
    for (i = 0; i < numCPUs; i++) {
        cluster.fork();
    }
    cluster.on('death', function (worker) {
        console.log('Worker ' + worker.pid + ' died');
    });
} else {
    // Worker process
    server.createServer({port: 80}, function (err, result) {
        if (err) {
            throw err;
        } else {
            console.log('Thread listening on port ' + result.port);
        }
    });
}
I've installed node-inspector and tried using both it and the Eclipse V8 plugin detailed at https://github.com/joyent/node/wiki/Using-Eclipse-as-Node-Applications-Debugger to debug my application, but it looks like I can't hook a debugger up to forked cluster instances to put breakpoints at the interesting server logic--I can only debug the part of the application that spawns the cluster processes. Does anybody know if I can in fact do such a thing, or am I going to have to refactor my application to use only a single thread when in debugging mode?
I'm a Node.js newbie, so I'm hoping there's something obvious I'm missing here.

var fixedExecArgv = [];
fixedExecArgv.push('--debug-brk=5859');

cluster.setupMaster({
    execArgv: fixedExecArgv
});
Credit goes to Sergey's post.
I changed my server.js to fork only one worker (mainly for testing this), then added the code above before the fork call. This fixed the debugging issue for me. Thank you Sergey for explaining and providing the solution!
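For context, here is a minimal sketch of how that snippet can sit in the master code, forking a single worker on a fixed debugger port (the port 5859 comes from the answer above; the --inspect-brk note only applies to newer Node versions and is my assumption):

var cluster = require('cluster');

if (cluster.isMaster) {
    // Give the worker its own debugger port so it doesn't collide with
    // the master's port. On Node 8+ use '--inspect-brk=5859' instead.
    cluster.setupMaster({
        execArgv: ['--debug-brk=5859']
    });
    cluster.fork();
} else {
    // Worker code goes here (e.g. the server.createServer(...) call from
    // the question); attach the debugger to port 5859.
}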

For those who wish to debug child processes in VS Code, just add this to your launch.json configuration:
"autoAttachChildProcesses": true
https://code.visualstudio.com/docs/nodejs/nodejs-debugging#_remote-debugging
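For reference, a minimal launch configuration with that flag might look like this (the configuration name and the program path ${workspaceFolder}/server.js are assumptions, not from the answer):

{
    "version": "0.2.0",
    "configurations": [
        {
            "type": "node",
            "request": "launch",
            "name": "Launch cluster app",
            "program": "${workspaceFolder}/server.js",
            "autoAttachChildProcesses": true
        }
    ]
}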

I already opened a ticket about this here: https://github.com/dannycoates/node-inspector/issues/130
Although it's not fixed yet, there's a workaround:
FWIW: The reason, I suspect, is that the node debugger needs to bind to the debug port (default: 5858). If you are using Cluster, I am guessing the master/controller binds first and succeeds, causing the bind in the children/workers to fail. While a port can be supplied to node --debug=N, there seems to be no easy way to do this when node is invoked within Cluster for the worker (it might be possible to programmatically set process.debug_port and then enable debugging, but I haven't got that working yet). Which leaves a couple of options: 1) start node without the --debug option and, once it is running, find the pid of the worker process you want to debug/profile and send it a USR1 signal to enable debugging; 2) write a wrapper for node that calls the real node binary with --debug set to a unique port each time. There may also be options in Cluster that let you pass such an arg as well.
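As a rough sketch of option 1 (assuming a Node version where cluster.workers is populated and the process responds to SIGUSR1 by starting its debug agent), you can signal a specific worker from the master instead of hunting for its pid by hand; the worker id '1' here is just an example:

// In the master: enable the debug agent in one specific running worker.
// SIGUSR1 tells that Node process to start listening for a debugger on
// its default debug port, without restarting it.
var workerToDebug = cluster.workers['1']; // pick the worker you care about
if (workerToDebug) {
    process.kill(workerToDebug.process.pid, 'SIGUSR1');
}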

If you use VS Code to debug, you need to specify the port and set "autoAttachChildProcesses": true in the launch.json file.
If you debug directly in DevTools, you need to add a connection to the corresponding port in the console.

For anyone looking at this in 2018+, no startup arguments are necessary.
From this Github issue:
Just a time saver for anyone who might have been in the same boat as me-- the Node.js V8 --inspector Manager (NiM) seems to introduce this issue when it otherwise wouldn't be present-- I spent about an hour banging my head before disabling the Chrome plugin and discovering that everything worked fine when opening from chrome://inspect.
I also spent hours reading github posts, tweaking the settings of gulp-typescript and gulp-sourcemaps, etc, only to have that plugin be the issue. Also worth noting is that I had to add port N+1 to the remote targets of chrome://inspect, so localhost:9230, to debug my worker process.

Use the --inspect flag to debug the Node.js process on Node versions 7.7.0 and higher.
If someone wants more information on how to debug cluster processes and set up the Chrome debugger tools for Node.js, please follow my post here.
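For example (the entry file name server.js is an assumption), starting the clustered app like this on Node.js 7.7.0+ gives the master the default inspector port and assigns each forked worker the next port automatically (9230, 9231, ...), which you can then attach to from chrome://inspect:

node --inspect=9229 server.js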

Related

I keep getting this Error R10 (Boot timeout) -> Web process failed to bind to $PORT within 60 seconds of launch

I recently installed Heroku Redis. Until then, the app worked just fine. I am using Bull for queuing and ioredis as the Redis library. I had connection issues initially, but I have resolved those, as I no longer get that error. However, the new error described above now shows up.
Please check these details below:
Package.json Start Script
"scripts": {
"start": "sh ./run.sh"
}
run.sh file
node ./app/services/queues/process.js &&
node server.js
From the logs on the Heroku console, I see this:
Processing UPDATE_USER_BOOKING... Press [ctrl C] to Cancel
{"level":"info","message":"mongodb connected"}
The first line is my log from the process script. It tells me that the consumer is running and ready to process any data it receives.
The second line tells me that Mongo is connected. It comes from my server.js (entry file).
My challenge is that after those 2 lines, it then shows this:
Error R10 (Boot timeout) -> Web process failed to bind to $PORT within 60 seconds of launch
Stopping process with SIGKILL
Error waiting for process to terminate: No child processes
Process exited with status 22
State changed from starting to crashed
So, I don't know why this is happening even when I have the PORT sorted out already as described in their docs. See this for clarity:
app.listen(process.env.PORT || 4900, ()=>{})
Note: It was working before until I introduced the Redis bit just a day ago.
Could there be an issue with the way I am running both the server and the queue process from the package.json start script? I have been reading similar answers, but they are usually focused on the PORT fix, which is not my issue as far as I know.
Troubleshooting: I removed the queue process from the start script and the issue was gone. I had this instead:
"scripts": {
    "start": "node server.js -p $PORT"
}
So it becomes clear that this line was the issue:
node ./app/services/queues/process.js
Now, how then do I run this queue process script? I need it running to listen for any subscription and then run the processor script. It works fine locally with the former start script.
Please note: I am using Bull for the queue. I followed this guide to implement it and it worked fine locally.
Bull Redis Implementation Nodejs Guide
I would appreciate any help on this as I am currently blocked in my development.
So I decided to go another way. I read up on how to run background jobs on Heroku with Bull and found a guide, which I implemented. The idea is to utilize Node's cluster-based concurrency; the guide uses a small wrapper called throng to implement this.
I removed the process file and just wrapped my consumer script inside the start function and passed that to throng.
Link to heroku guide on enabling concurrency in your app
Result: I started getting an EADDRINUSE error, because app.listen() was being called twice.
Solution: I had to wrap the app.listen call inside the worker function passed to throng, and it worked fine (see the sketch below).
Link to the solution to the EADDR in use Error
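For anyone who wants to see the shape of that fix, here is a rough sketch under a few assumptions (an Express app like the one implied by the question's app.listen call, throng's options-plus-start-function style from the Heroku guide, and a WEB_CONCURRENCY env var for the worker count):

const throng = require('throng');

const WORKERS = process.env.WEB_CONCURRENCY || 1;

function start() {
    // Everything that used to run in server.js and in the separate queue
    // process script now runs inside each throng worker, so app.listen()
    // is only called from inside the worker function (the cluster master
    // doesn't bind the port itself).
    const express = require('express');
    const app = express();

    // ... connect to MongoDB, mount routes, register the Bull processor ...

    app.listen(process.env.PORT || 4900, () => {
        console.log('Worker listening on port ' + (process.env.PORT || 4900));
    });
}

throng({ workers: WORKERS, lifetime: Infinity }, start);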
On my local machine, I was able to push to the queue and consume from it. After deploying to Heroku, I am not getting any errors so far.
I have tested the update on Heroku and it works fine too.

How to tell Octopus Deploy to wait until another deployment finishes on the same machine?

Sometimes it is preferred and/or required to host dozens of applications on a single server. Not saying this is "right" or "wrong," I'm only saying that it happens.
A downside to this configuration is that the error message "Waiting for the script in task [TASK ID] to finish as this script requires that no other Octopus scripts are executing on this target at the same time" appears whenever more than one deployment to the same machine is running. It seems like Octopus Deploy is fighting itself.
How can I configure Octopus Deploy to wait for one deployment to completely finish before the next one is started?
Before diving into the answer, it is important to understand why that message is appearing in the first place. Each time a step is run on a deployment target, the Tentacle will create a "mutex" to prevent other projects from interfering with it. An early use case for this was updating the IIS metabase during a deployment. In certain cases, concurrent updates would cause random errors.
Option 1: Disable the Mutex
We've seen cases where the mutex is the cause of the delay. The mutex is applied per step, not per deployment. It is common to see a situation where it looks like Octopus is "jumping" between deployments. Depending on the number of concurrent deployments, that can slow down the deployment. The natural thought is to disable the mutex altogether.
It is possible to disable the mutex by adding the variable OctopusBypassDeploymentMutex and setting it to True. That variable can exist in either a specific project or in a variable set.
More details on what that variable does can be found in this document. If you do disable the mutex please test it and monitor for any failures. For the most part, we don't see issues disabling the mutex, but it has happened from time to time. It depends on a host of other factors such as application type and Windows version.
Option 2: Leverage Deploy a Release Step
Another option is to coordinate the projects using the deploy a release step. Typically this works best when the projects being deployed are part of the same application suite. In the example screenshot below I have five "deployment" projects:
Azure Worker IaC
Database Worker IaC
Kubernetes Worker IaC
Script Worker IaC
OctoStudy
The project Unleash the Kraken coordinates deployments for those projects.
It does this by using the Deploy a Release step. First it spins up all the infrastructure, then it deploys the application.
This won't work as well if the server is hosting 50 disparate applications.
Option 3: Leverage the API to check for running deployments
The final option is to include a step at the start of each project which hits the API to check for active deployments to the same deployment target. If an active deployment is found, wait until it is done.
You can do this by hitting the endpoint https://[YOUR URL]/api/[SPACE ID]/machines/[Machine Id]/tasks?skip=0&name=Deploy&states=Executing%2CCancelling&spaces=[SPACE ID]&includeSystem=false. That will tell you all the active tasks being run for a specific machine.
You can get Machine Id by pulling the value from Octopus.Deployment.Machines. You can get Space Id by pulling the value from Octopus.Space.Id.
The pseudo code for this approach could look like this (I'm not including the actual code as your requirements could be very different).
activeDeployments = true
while (activeDeployments)
{
    activeDeployments = false
    foreach (machineId in Octopus.Deployment.Machines)
    {
        activeTasks = https://[YOUR URL]/api/[Octopus.Space.Id]/machines/[machineId]/tasks?skip=0&name=Deploy&states=Executing%2CCancelling&spaces=[Octopus.Space.Id]&includeSystem=false
        if (activeTasks.Count > 0)
        {
            activeDeployments = true
        }
    }
    if (activeDeployments == true)
    {
        Sleep for 5 seconds
    }
}
I had this message hit me because I had hit the task cap on the Octopus Server.
In Octopus\Configuration\Nodes, change the task cap to 1 to have one deployment at a time, even with agents on different servers; the message will then display constantly.
Or simply increase this value to prevent the message from occurring at all.

Docker container with SWI-Prolog terminated with fatal error

I'm developing a Spring Boot web application, using SWI-Prolog's JPL interface to call Prolog from Java. In development mode everything runs OK.
When I deploy it to Docker, the first call to JPL through the API runs fine. When I try to call JPL again, the JVM crashes.
I use LD_PRELOAD to point to libswipl.so
SWI_HOME_DIR is set also.
LD_LIBRARY_PATH is set to point to libjvm.so
My Controller function:
@PostMapping("/rules/testAPI/")
@Timed
public List<String> insertRule() {
    String use_module_http = "use_module(library(http/http_open)).";
    JPL.init();
    Query q1 = new Query(use_module_http);
    if (!q1.hasNext()) {
        System.out.println("Failed to load HTTP Module");
    } else {
        System.out.println("Succeeded to load HTTP Module");
    }
    return null;
}
Console output
1st Call
Succeeded to load HTTP Module
2nd Call
#
# A fatal error has been detected by the Java Runtime Environment:
#
# SIGSEGV (0xb) at pc=0x00007f31705294b2, pid=16, tid=0x00007f30d2eee700
#
# JRE version: OpenJDK Runtime Environment (8.0_191-b12) (build 1.8.0_191-8u191-b12-2ubuntu0.18.04.1-b12)
# Java VM: OpenJDK 64-Bit Server VM (25.191-b12 mixed mode linux-amd64 compressed oops)
# Problematic frame:
# C [libswipl.so+0xb34b2] PL_thread_attach_engine+0xe2
#
# Core dump written. Default location: //core or core.16
#
# If you would like to submit a bug report, please visit:
# http://bugreport.java.com/bugreport/crash.jsp
# The crash happened outside the Java Virtual Machine in native code.
# See problematic frame for where to report the bug.
#
I uploaded the error log file in pastebin. click here
Has anyone faced the same problem? Is there a solution about this?
Note that I also checked it with Oracle Java 8, but the same error occurs.
UPDATE:
@CapelliC's answer didn't work.
I think I would try to 'consume' the term. For instance:
Query q1 = new Query(use_module_http);
if (!q1.hasNext()) {
    System.out.println("Failed to load HTTP Module");
} else {
    System.out.println("Succeeded to load HTTP Module: " + q1.next().toString());
    // remember q1.close() if there could be multiple solutions
}
or better
if ((new Query(use_module_http)).oneSolution() == null) ...
or still better
if ((new Query(use_module_http)).hasSolution() == false) ...
Not a direct answer because it suggests a different approach, but for a long time I was running a setup where a C++ program I wrote would wrap SWI-Prolog the way you're doing with Spring Boot and it was very difficult to add features to/maintain. About a year ago I went to a totally different approach where I added a MQTT plugin to SWI-Prolog so my Prolog code could run continuously and respond to and send MQTT messages. So now Prolog can interoperate with other modules in a variety of languages (mostly Java), but everything runs in its own process. This has worked out MUCH better for me and I've got everything running in Docker containers - including the MQTT broker. I'm not firmly suggesting MQTT (though I like it), just to consider the approach of having Java and Prolog less tightly coupled.
Most likely the reason it is failing the second time is that you are calling JPL.init() again. It should be called only once.
Finally, it turned out to be a bug in the JPL package. After I contacted the SWI-Prolog developers, they pushed a fix to the SWI-Prolog Git repository and now the error is gone!
The right configuration for the Docker container to be able to find JPL can be found at this link: GitHub: env.sh

Jmeter in distributed mode: JVM doesn't stop because of BeanShell server

WHAT I HAVE:
A JMeter network created for tests in distributed mode: one JMeter master plus a few JMeter slaves. The BeanShell server is disabled. Everything works fine.
WHAT I WANT TO DO:
I want to enable the BeanShell server to be able to modify properties on the fly.
ACTUAL RESULT:
BeanShell server starts and works successfully.
Once the test is done, the following message appears in the log:
The JVM should have exited but did not.
The following non-daemon threads are still running (DestroyJavaVM is OK):
Thread[Thread-7,5,main], stackTrace:java.net.PlainSocketImpl#socketAccept
java.net.AbstractPlainSocketImpl#accept at line:409
java.net.ServerSocket#implAccept at line:545
java.net.ServerSocket#accept at line:513
bsh.util.Sessiond#run at line:65
java.lang.Thread#run at line:748
Thread[Thread-5,5,main], stackTrace:java.net.PlainSocketImpl#socketAccept
java.net.AbstractPlainSocketImpl#accept at line:409
java.net.ServerSocket#implAccept at line:545
java.net.ServerSocket#accept at line:513
bsh.util.Httpd#run at line:64
java.lang.Thread#run at line:748
It's clear that this happens because of the BeanShell server, which is still running and doesn't exit for some reason.
As a result, the java process never exits and hangs.
QUESTION:
Any ideas why this happens? How can I avoid it? I don't connect to the BeanShell server, either by HTTP or by telnet.
MORE DETAILS:
All the nodes are running as Docker containers.
All nodes are deployed in AWS.
I can't reproduce the same issue locally on my machine. Even with the BeanShell server enabled, everything works smoothly and java exits as it should.
WHAT I TRIED:
Tried to set up the following properties:
jmeterengine.remote.system.exit=true
jmeterengine.stopfail.system.exit=true
jmeterengine.force.system.exit=true
This doesn't help; the java process still hangs.
The same for
bsh.system.shutdownOnExit = true;
Any ideas are very much appreciated.
I don't think there is a way of stopping the BeanShell server along with the test, as it runs an endless loop in its own thread.
I would recommend stopping your test and the Beanshell server using a .bsh file like:
stopEngine();
Thread.sleep(5000L); // just in case wait for 5 seconds for graceful shutdown
System.exit(0);
where:
stopEngine() is a shorthand for StandardJMeterEngine.stopEngine(), it's defined in startup.bsh
System.exit(0); - basically shuts down the whole JVM with 0 exit status code
So you will be able to turn everything off the same way you amend the properties at runtime.
You can also achieve the same by executing the System.exit(0); command automatically from e.g. a tearDown Thread Group using the OS Process Sampler; however, in this case make sure to set the following property:
jmeter.save.saveservice.autoflush=true
otherwise you can lose some results which have not been written to disk yet.
Finally, I found a workaround, so I'm posting it as an answer here.
Actually, it doesn't answer the original question, but I don't want to waste time trying to find the actual root cause, as it seems to be related to the AWS infrastructure (I can't reproduce it locally inside the same container I launch remotely).
So, the final schema is:
To update properties on the fly, I don't have to launch the BeanShell server on both the slave and the master. The real property change happens on the slave, so the master is not required. That solves my specific case.
To stop a test, you can use the existing shutdown.sh and stoptest.sh scripts. There is no need to do it over the BeanShell server.
If the test was stopped properly, the -X option (server.exitaftertest=true) also works fine, so a forced JVM shutdown is not required and you don't have to add a tearDown Thread Group that calls System.exit. But the idea is nice.
It's still worth having a shutdown.bsh script as a tool to force-kill it anyway.

quickfix session config issues

I've compiled and trawled through the QuickFIX ( http://www.quickfixengine.org ) source and the examples. I figured a good starting point would be to compile (C++) and run the 'executor' example, then use the 'tradeclient' example to connect to 'executor' and send it order requests.
I created two separate session files, one for the 'executor' as an acceptor and one for the 'tradeclient' as the initiator. They're both running on the same Win7 PC.
'executor' runs, but tradeclient can't connect to it, and I can't figure out why. I downloaded Mini-fix and was able to send messages to executor, so I know that executor is working. I figure that the problem is with the tradeclient session settings. I've included both of them below; I'm hoping someone can point out what's causing them to not communicate. They're both running on the same computer using port 56156.
---- acceptor session.txt ----
[DEFAULT]
ConnectionType=acceptor
ReconnectInterval=5
SenderCompID=EXEC
DefaultApplVerID=FIX.5.0
[SESSION]
BeginString=FIXT.1.1
TargetCompID=SENDER
HeartBtInt=5
#SocketConnectPort=
SocketAcceptPort=56156
SocketConnectHost=127.0.0.1
TransportDataDictionary=pathToXml/spec/FIX50.xml
StartTime=07:00:00
EndTime=23:00:00
FileStorePath=store
---- initiator session.txt ---
[DEFAULT]
ConnectionType=initiator
ReconnectInterval=5
SenderCompID=SENDER
DefaultApplVerID=FIX.5.0
[SESSION]
BeginString=FIXT.1.1
TargetCompID=EXEC
HeartBtInt=5
SocketConnectPort=56156
#SocketAcceptPort=56156
SocketConnectHost=127.0.0.1
TransportDataDictionary=pathToXml/spec/FIX50.xml
StartTime=07:00:00
EndTime=23:00:00
FileLogPath=log
FileStorePath=store
--------end------
Update: Thanks for the responses... It turns out that my log file directories didn't exist. Once I created them, they both started communicating. It must have been some logging error that didn't throw an exception but disabled proper behavior.
Is there an error condition that I should be checking? I was relying on exceptions, but that's obviously not enough.
It doesn't seem to be the config; check that your message sequence numbers are in sync, especially since you've been connecting to a different server using the same settings.
Try setting the TargetCompID and SenderCompID on the acceptor to *
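If you want to try that last suggestion, the acceptor's session section would look something like this; whether wildcard CompIDs are accepted depends on your QuickFIX version, so treat it as a sketch rather than a verified config:

[SESSION]
BeginString=FIXT.1.1
SenderCompID=*
TargetCompID=*
SocketAcceptPort=56156
(keep the remaining settings from the acceptor config above)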
