How to run code in a debugging session from VS code on a remote using an interactive session?

I am using a cluster (similar to Slurm, but using Condor) and I wanted to run my code using VS Code (especially its debugger) and its remote sync extension.
I tried running it using my debugger in VS code but it didn't quite work as expected.
First I log in to the cluster using VS Code and remote sync as usual, and that works just fine. Then I go ahead and get an interactive job with the command:
condor_submit -i request_cpus=4 request_gpus=1
That successfully gives me a node/GPU to use.
Once I have that, I try to run the debugger, but somehow it logs me out of the remote session (and, judging from the print statements, it looks like it runs on the head node). That's NOT what I want. I want to run my job in the interactive session, on the node/GPU I was allocated. Why is VS Code running it in the wrong place, and how can I run it in the right place?
Some of the output from the integrated terminal:
source /home/miranda9/miniconda3/envs/automl-meta-learning/bin/activate
/home/miranda9/miniconda3/envs/automl-meta-learning/bin/python /home/miranda9/.vscode-server/extensions/ms-python.python-2020.2.60897-dev/pythonFiles/lib/python/new_ptvsd/wheels/ptvsd/launcher /home/miranda9/automl-meta-learning/automl/automl/meta_optimizers/differentiable_SGD.py
conda activate base
(automl-meta-learning) miranda9~/automl-meta-learning $ source /home/miranda9/miniconda3/envs/automl-meta-learning/bin/activate
(automl-meta-learning) miranda9~/automl-meta-learning $ /home/miranda9/miniconda3/envs/automl-meta-learning/bin/python /home/miranda9/.vscode-server/extensions/ms-python.python-2020.2.60897-dev/pythonFiles/lib/python/new_ptvsd/wheels/ptvsd/launcher /home/miranda9/automl-meta-learning/automl/automl/meta_optimizers/differentiable_SGD.py
--> main in differentiable SGD
hello world torch_utils!
vision-sched.cs.illinois.edu
Files already downloaded and verified
Files already downloaded and verified
Files already downloaded and verified
-> initialization of DiMO done!
---> i = 0, iteration/it 1 about to start
lp_norms(mdl) = 18.43514633178711
lp_norms(meta_optimized mdl) = 18.43514633178711
[e=0,it=1], train_loss: 2.304989814758301, train error: -1, test loss: -1, test error: -1
---> i = 1, iteration/it 2 about to start
lp_norms(mdl) = 18.470401763916016
lp_norms(meta_optimized mdl) = 18.470401763916016
[e=0,it=2], train_loss: 2.3068909645080566, train error: -1, test loss: -1, test error: -1
---> i = 2, iteration/it 3 about to start
lp_norms(mdl) = 18.548133850097656
lp_norms(meta_optimized mdl) = 18.548133850097656
[e=0,it=3], train_loss: 2.3019633293151855, train error: -1, test loss: -1, test error: -1
---> i = 0, iteration/it 1 about to start
lp_norms(mdl) = 18.65604019165039
lp_norms(meta_optimized mdl) = 18.65604019165039
[e=1,it=1], train_loss: 2.308889150619507, train error: -1, test loss: -1, test error: -1
---> i = 1, iteration/it 2 about to start
lp_norms(mdl) = 18.441967010498047
lp_norms(meta_optimized mdl) = 18.441967010498047
[e=1,it=2], train_loss: 2.300947666168213, train error: -1, test loss: -1, test error: -1
---> i = 2, iteration/it 3 about to start
lp_norms(mdl) = 18.545459747314453
lp_norms(meta_optimized mdl) = 18.545459747314453
[e=1,it=3], train_loss: 2.30662202835083, train error: -1, test loss: -1, test error: -1
-> DiMO done training!
--> Done with Main
(automl-meta-learning) miranda9~/automl-meta-learning $ conda activate base
(automl-meta-learning) miranda9~/automl-meta-learning $ hostname
vision-sched.cs.illinois.edu
Doesn't even run without debugging mode
The problem is more serious than I thought. Not only can I not run the debugger in the interactive session, I can't even "Run Without Debugging" without it switching to the Python Debug Console on its own. That means I have to run things manually with python main.py, but then I can't use the variables pane... which is a big loss!
What I am doing is switching my terminal to the condor_ssh_to_job session and then clicking the button Run Without Debugging (or ^F5, i.e. Control + fn + F5), and although I made sure to be on the interactive session at the bottom of my integrated terminal, it goes by itself to the Python Debug Console window/pane, which is not connected to the interactive session I requested from my cluster...
related:
gitissue: https://github.com/microsoft/vscode-remote-release/issues/1722
quora: https://qr.ae/TqCiu8
reddit: https://www.reddit.com/r/vscode/comments/f1giwi/how_to_run_code_in_a_debugging_session_from_vs/

You can try reversing the order of operations; first submitting the job, obtaining the name of the compute node allocated to you, then instructing VSCode to connect to the compute node rather than the login node.
So first would be
condor_submit -i request_cpus=4 request_gpus=1
and noting the name of the compute node. Assuming node001 in the following.
Then, open VSCode on your laptop, click on the Remote Development extension icon and choose "Remote SSH: Connect to Host...". Choose "+ Add new SSH host...". In the "Enter SSH command" box, add the following:
ssh -J vision-sched.cs.illinois.edu miranda9@node001
VSCode will then ask you which SSH configuration file it should update. Make sure to review that configuration: specify the SSH keys if needed, the user name, etc. Also make sure you have vision-sched.cs.illinois.edu correctly configured in that file.
Then you can choose that host to connect to. VSCode will then execute on the compute node, and will be disconnected when the allocation finishes.
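If you let VSCode update the configuration file, the resulting entry would look roughly like this (a sketch only; the host name and user name are taken from the question, node001 is the assumed node, and you may need to add an IdentityFile line):
Host node001
    HostName node001
    User miranda9
    ProxyJump vision-sched.cs.illinois.edu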

I stumbled upon a related issue recently (I wanted to use VsCode's interactive Python capabilities on a compute node), and the above didn't work for me, but this solved it:
ssh to the remote cluster: ssh cluster
inside the remote cluster, add my public key to the authorized keys, i.e. typically append the content of ~/.ssh/id_rsa.pub (local machine) to .ssh/authorized_keys (remote cluster)
allocate some resources inside the cluster (this particular cluster uses Slurm rather than Condor, so in this case I use something like srun --pty bash)
get the name of the compute node, typically visible in the command-line prompt as username@nodename. For argument's sake, let's imagine I get a generic name like node001
for simplicity, on my local machine, modify the ~/.ssh/config file and edit it as:
Host cluster
    # stuff written
Host node*
    HostName %h
    ProxyJump cluster
    User $USERNAME
Now I'm able to ssh to it from my local machine (as long as the compute node is running) with ssh node001.
In VsCode this boils down to
CTRL+P > Remote-SSH: Connect to Host...
type in the name of the node, here node001
you get connected to the node; now every interactive Python session you run (including Jupyter and jupytext) will have access to your allocated resources
I don't know how generic this solution is; I hope it'll help at least somebody! (A rough shell recap of the steps is sketched below.)
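Roughly, the shell side of those steps (cluster and node001 are the same placeholders as above, the srun options are just an example, and ssh-copy-id is simply one convenient way to do the key step):
# on the local machine: put your public key on the cluster
# (same effect as appending ~/.ssh/id_rsa.pub to the cluster's .ssh/authorized_keys)
ssh-copy-id cluster
# on the cluster: request an interactive allocation and note the node name
srun --pty bash
hostname    # e.g. node001
# back on the local machine: check the ProxyJump entry works before pointing VsCode at it
ssh node001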

Here is a simpler workaround:
on the remote server, create a file named bash somewhere, for example /home/myuser/pathto/bash
make it executable using chmod +x bash
write salloc [your desired options for the interactive job] in that file
in the VS Code settings, search for Automation Shell: Linux and click on "Edit in settings.json"
change the line to "terminal.integrated.automationShell.linux": "/home/myuser/pathto/bash" and save it (use the absolute path; for example, ~/pathto/bash didn't work for me)
Done :)
Now every time you run the debugger, it will first request the interactive job and the debugger will run on it. Keep in mind that this also applies to tasks you run from tasks.json.
You can also use srun instead of salloc, for example srun --pty -t 2:00:00 --mem=8G bash. A sketch of what the wrapper file could contain is given below.
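Assuming you go the srun route, and assuming VS Code passes the command it wants to run as arguments to the automation shell, the wrapper might contain something like this (adjust the allocation options to your cluster):
#!/bin/bash
# assumed wrapper, saved as /home/myuser/pathto/bash and made executable
# request an interactive allocation, then run whatever command VS Code passed in
exec srun --pty -t 2:00:00 --mem=8G bash "$@"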

Related

ISDeploymentWizard.exe command (SSIS deployment) in CMD doesn't print any indication of status

I'm running the below command in CMD for SSIS:
ISDeploymentWizard.exe /Silent /ModelType:Project /SourcePath:"C:\TEST\Integration Services.ispac" /DestinationServer:"TEST03,1111" /DestinationPath:"/TEST/DEVOPS"
and it finishes successfully, but with no indication on the command line. I can only check with SSMS to make sure it was really deployed. Any idea why?
Solid observation here @areilma - the /silent option eliminates all status info. I had always assumed that flag controlled whether the GUI was displayed or not.
If I run this command
isdeploymentwizard.exe /Silent /ModelType:Project /SourcePath:".\SO_66497856.ispac" /DestinationServer:".\dev2017" /DestinationPath:"/SSISDB/BatchSizeTester/SO_66497856"
My package is deployed to my local machine at the path specified. Removing the /silent option causes the GUI to open up with the prepopulated values.
isdeploymentwizard.exe /ModelType:Project /SourcePath:".\SO_66497856.ispac" /DestinationServer:".\dev2017" /DestinationPath:"/SSISDB/BatchSizeTester/SO_66497856"
When the former command runs, nothing is printed to the command prompt. So that's the happy-path deployment; maybe if something is "wrong", I'd get an error message on the command line. And this is where things got "interesting".
I altered my destination path to a folder that doesn't exist. I know the tool doesn't create a path if it doesn't exist, and when I ran it, I didn't get an error back on the command line. What I did get was a pop-up window error of
TITLE: SQL Server Integration Services
The path does not exist. The folder 'cBatchSizeTester' was not found in catalog 'SSISDB'. (Microsoft.SqlServer.IntegrationServices.Wizard.Common)
BUTTONS:
OK
So the /silent option removes the GUI to allow us to have an automated deploy, but if a bad value is passed, we return to having a GUI... I then repeated with a bad server name, which led to a second observation. The second I hit enter, the command line returned, ready for the next command. 15 seconds later, however,
TITLE: SQL Server Integration Services
Failed to connect to server .\dev2017a. (Microsoft.SqlServer.ConnectionInfo)
ADDITIONAL INFORMATION:
A network-related or instance-specific error occurred while establishing a connection to SQL Server. The server was not found or was not accessible. Verify that the instance name is correct and that SQL Server is configured to allow remote connections. (provider: SQL Network Interfaces, error: 26 - Error Locating Server/Instance Specified) (Microsoft SQL Server, Error: -1)
For help, click: http://go.microsoft.com/fwlink?ProdName=Microsoft%20SQL%20Server&EvtSrc=MSSQLServer&EvtID=-1&LinkId=20476
Well now, that tells me that the actual deployment is an independent spawned process. So it won't return any data back to the command line, in any case.
Since I assume we're looking at this from a CI/CD perspective, what can we do? We could fire off a sqlcmd afterwards looking for an entry in the SSISDB catalog views to see what happened. Something like this:
SELECT TOP 1
    O.end_time,
    SV.StatusValue,
    F.name AS FolderName,
    P.name AS ProjectName
FROM
    catalog.operations AS O
    CROSS APPLY
    (
        SELECT
            CASE O.status
                WHEN 1 THEN 'Created'
                WHEN 2 THEN 'Running'
                WHEN 3 THEN 'Canceled'
                WHEN 4 THEN 'Failed'
                WHEN 5 THEN 'Pending'
                WHEN 6 THEN 'Ended unexpectedly'
                WHEN 7 THEN 'Succeeded'
                WHEN 8 THEN 'Stopping'
                WHEN 9 THEN 'Completed'
            END AS StatusValue
    ) SV
    INNER JOIN catalog.object_versions AS OV
        ON OV.object_id = O.object_id
    INNER JOIN catalog.projects AS P
        ON P.object_version_lsn = OV.object_version_lsn
    INNER JOIN catalog.folders AS F
        ON F.folder_id = P.folder_id
    /*
    INNER JOIN catalog.packages AS PKG
        ON PKG.project_id = P.project_id
    */
WHERE
    O.operation_type = 101 /* deploy project */
    AND P.name = 'SO_66497856' /* project name */
    AND F.name = 'BatchSizeTester'
ORDER BY
    O.created_time DESC;
Perhaps a filter against end_time within the past 10 seconds would be appropriate: if we have a result and the status is Succeeded, we got a deploy; no result means it failed. I presume something similar happens when the GUI runs, but despite all this testing, I'm not interested in firing up a trace to fully round out this answer and see what happens behind the scenes.
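In a CI/CD step, that check might be wired up roughly like this, assuming the query above is saved to a file such as check_deploy.sql (the file name is illustrative and the server is the one from my test):
sqlcmd -S .\dev2017 -d SSISDB -i check_deploy.sql
REM inspect or capture the output and fail the pipeline if no Succeeded row comes back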
If you want to forgo the prebuilt tool, the other option would be to use the ManagedObjectModel/PowerShell approach to deploy, as you can get info back from there. The other deployment option is with the T-SQL commands. The second link in my documentation section outlines what that would look like.
Paltry documentation I could find
I could find no documentation as to the command line switches for isdeploymentwizard.exe
Deploy an SSIS project from the command prompt with ISDeploymentWizard.exe
Deploy Integration Services (SSIS) Projects and Packages
From @arielma's deleted answer, they found a more succinct answer saying "not possible".

Pyro4 configuration doesn't change

I put the Pyro4 configuration as this in the starting part of my code:
Pyro4.config.THREADPOOL_SIZE = 1
Pyro4.config.THREADPOOL_SIZE_MIN = 1
I checked: if I try to run two clients at the same time, the second one is rejected with 'rejected: no free workers, increase server threadpool size', so it looks like the setting is working. But when I open a console and check the Pyro configuration using python -m Pyro4.configuration, it returns:
THREADPOOL_SIZE = 40
THREADPOOL_SIZE_MIN = 4
Does someone know why?
When you run python -m Pyro4.configuration, it simply prints the default settings (influenced only by any environment variables you may have set). It runs as a separate process, so it cannot know about the settings you changed in your own code.
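If you want that command-line dump to reflect your values, one option is to set them through environment variables, which Pyro4 reads with a PYRO_ prefix (a small sketch; the values mirror the ones set in the question):
export PYRO_THREADPOOL_SIZE=1
export PYRO_THREADPOOL_SIZE_MIN=1
python -m Pyro4.configuration    # should now report the values above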

MapReduceIndexerTool output dir error "Cannot write parent of file"

I want to use Cloudera's MapReduceIndexerTool to understand how morphlines work. I created a basic morphline that just reads lines from the input file, and I tried to run the tool with this command:
hadoop jar /opt/cloudera/parcels/CDH/lib/solr/contrib/mr/search-mr-*-job.jar org.apache.solr.hadoop.MapReduceIndexerTool \
--morphline-file morphline.conf \
--output-dir hdfs:///hostname/dir/ \
--dry-run true
Hadoop is installed on the same machine where I run this command.
The error I'm getting is the following:
net.sourceforge.argparse4j.inf.ArgumentParserException: Cannot write parent of file: hdfs:/hostname/dir
at org.apache.solr.hadoop.PathArgumentType.verifyCanWriteParent(PathArgumentType.java:200)
The /dir directory has 777 permissions on it, so it is definitely allowed to write into it. I don't know what I should do to allow it to write into that output directory.
I'm new to HDFS and I don't know how I should approach this problem. Logs don't offer me any info about that.
What I tried until now (with no result):
created a hierarchy of 2 directories (/dir/dir2) and put 777 permissions on both of them
changed the output-dir scheme from hdfs:///... to hdfs://... because all the examples in the --help output are written that way, but this leads to an invalid scheme error
Thank you.
It states 'cannot write parent of file', and the parent in your case is /. Take a look at the source:
private void verifyCanWriteParent(ArgumentParser parser, Path file) throws ArgumentParserException, IOException {
    Path parent = file.getParent();
    if (parent == null || !fs.exists(parent) || !fs.getFileStatus(parent).getPermission().getUserAction().implies(FsAction.WRITE)) {
        throw new ArgumentParserException("Cannot write parent of file: " + file, parser);
    }
}
The message prints file, which in your case is hdfs:/hostname/dir, so file.getParent() will be /.
Additionally, you can test the permissions with the hadoop fs command; for example, you can try to create a zero-length file in the path:
hadoop fs -touchz /test-file
I solved that problem after days of working on it.
The problem is with the argument --output-dir hdfs:///hostname/dir/.
First of all, there should not be 3 slashes at the beginning, as I kept putting while trying to make this work; there should be only 2 (as in any valid HDFS URI). I actually put 3 slashes because otherwise the tool throws an invalid scheme exception! You can easily see in the code that the scheme check is done before the verifyCanWriteParent check.
I tried to get the hostname by simply running the hostname command on the CentOS machine that I was running the tool on. This was the main issue. I analyzed the /etc/hosts file and saw that there were 2 hostnames for the same local IP. I took the second one and it worked. (I also attached the port to the hostname, so the final format is: --output-dir hdfs://correct_hostname:8020/path/to/file/from/hdfs)
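Putting it together, the working invocation looks roughly like this (correct_hostname and the 8020 port stand for the values found in /etc/hosts, as described above):
hadoop jar /opt/cloudera/parcels/CDH/lib/solr/contrib/mr/search-mr-*-job.jar org.apache.solr.hadoop.MapReduceIndexerTool \
  --morphline-file morphline.conf \
  --output-dir hdfs://correct_hostname:8020/dir/ \
  --dry-run true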
This error is very confusing because everywhere you look for the namenode hostname, you will see the same thing that the hostname command returns. Moreover, the errors are not structured in a way that you can diagnose the problem and take a logical path to solve it.
Additional information regarding this tool and debugging it
If you want to see the actual code that runs behind it, check the cloudera version that you are running and select the same branch on the official repository. The master is not up to date.
If you want to just run this tool to play with a morphline (using the --dry-run option) without connecting to Solr, you can't. You have to specify a ZooKeeper endpoint and a Solr collection or a Solr config directory, which involves additional research. This is something that could be improved in this tool.
You don't need to run the tool with -u hdfs, it works with a regular user.

Send data by network and plot with octave

I am working on a robot and my goal is to plot the state of the robot.
For now, my workflow is this:
Launch the program
Redirect the output in a file (robot/bash): rosrun explo explo_node > states.txt
Send the file to my local machine (robot/bash): scp states.txt my_desktop:/home/user
Plot the states with octave (desktop/octave): plot_data('states.txt')
Is there a simple solution to get the data in "real time"? On the Octave side, I think I can, without too much difficulty, read from a file and re-plot whenever data is added.
The problem is: how do I send the data to a file as it is produced?
I am open to solutions other than Octave. The thing is that I need a 2D plot with arrows for the orientation of the robot.
Here's an example of how you could send the data over the network (as Andy suggested) and plot it as it is generated (i.e. in real time). I also think this approach is the most flexible/appropriate.
To demonstrate, I will use a bash script that generates an (x, sin(x)) pair every 10th of a second, for the sin function, in the range [0, 10π]:
#!/bin/bash
# script: sin.sh
for i in `seq 0 0.01 31.4`;
do
    printf "$i, `echo "s($i)" | bc -l`\n"
    sleep 0.1
done
(Don't forget to make this script executable!)
Prepare the following octave script (requires the sockets package!):
% in visualiseRobotData.m
pkg load sockets
s = socket();
bind(s, 9000);
listen(s, 1);
c = accept(s);
figure; hold on;
% keep plotting incoming (x, y) pairs until the sender closes the connection
while (true)
  a = str2num (char (recv (c, inf)));
  if (isempty (a)), break; endif
  plot (a(:,1), a(:,2), '*'); drawnow;
endwhile
hold off;
Now execute things in the following order:
Run the visualiseRobotData script from the octave terminal.
(Note: this will block until a connection is established)
From your bash terminal run: ./sin.sh | nc localhost 9000
And watch the datapoints get plotted as they come in from your sin.sh script.
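Applied to the robot from the question, the sending side would presumably just pipe the node's output into nc instead of a file (assuming explo_node prints comma-separated x, y pairs in the same form; desktop-ip is a placeholder for your desktop machine's address):
rosrun explo explo_node | nc desktop-ip 9000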
It's a bit crude, but you can just reload the file in a loop. This one runs for 5 minutes:
for i = 1:300
  load Test/sine.txt
  plot (sine(:,1), sine(:,2))
  sleep (1)
endfor
You can mount the remote directory via sshfs:
sshfs user@remote:/path/to/remote_dir local_dir
so you wouldn't have to transfer the remote file. If sshfs is not installed, install it. To unmount the remote directory later, execute
fusermount -u local_dir
To get a robot's data from Octave, execute (Octave code)
system("ssh user@host 'cd remote_dir; rosrun explo explo_node > states.txt'")
%% then plot picture from the data in local_dir
%% that is defacto the directory on the remote server

subinacl get full output

We are using the Windows console program subinacl.exe to grant a user the right to stop and start a service. Therefore we use the following command:
subinacl.exe /service %SERVICE_NAME% /grant=%PC_NAME%\%USER_NAME%=PTO
where
%SERVICE_NAME% = name of the service
%PC_NAME% = name of the computer
%USER_NAME% = name of the user that should be granted the right to start and stop the service
PTO = right to start and stop the service (R would be just reading)
When typing the command into the default Windows command line (with administrator rights) on Windows Server 2012, the result is:
ELITE_INETRSVSERVER : delete Perm. ACE 4 test-pc\test
ELITE_INETRSVSERVER : new ace for test-pc\test
ELITE_INETRSVSERVER : 2 change(s)
Elapsed Time: 00 00:00:00
Done: 1, Modified 1, Failed 0, Syntax errors 0
Last Done : ELITE_INETRSVSERVER
Now we want to save the text into a file or get it into a program (by redirecting the output, as in Getting output from a shell/dos app into a Delphi app). We need the integer values of Done and Failed found in the result.
The problem is that we cannot capture the last three lines after the empty line.
When using console redirection, the first three lines can be found in the file result.txt, but the last three are still shown in the console.
subinacl.exe /service %SERVICE_NAME% /grant=%PC_NAME%\%USER_NAME%=PTO > result.txt 1<&2
We have the same problem when redirecting the output programmatically.
Of course every command is executed as administrator.
The /outputlog and /errorlog options could help to solve the problem:
subinacl /outputlog=c:\NONERRORS.TXT /errorlog=C:\ERRORLOG.TXT /file C:\TEST.TXT /display
If the C:\ERRORLOG.TXT file is empty, it means that the command has been executed successfully.
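Applied to the service command from the question, that might look something like this (an untested sketch; the log file paths are just examples) - the summary lines, including the Done and Failed counters, should then land in the output log instead of only on the console:
subinacl.exe /outputlog=C:\result.txt /errorlog=C:\errors.txt /service %SERVICE_NAME% /grant=%PC_NAME%\%USER_NAME%=PTO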

Resources