mpirun vs "./" when to use? - performance

So I'm configuring a batch script to run my code on a cluster. It runs on my workstation with both
./<executable> name
and
mpirun -n <# of cores> executable name
What is the difference and why does mpirun sometimes fail/hang?

MPI is the Message Passing Interface used for parallelization (running the code over multiple cores). mpirun is a command for launching MPI jobs. Your executable needs to be specifically programmed and built for MPI to take advantage of the parallelization. Otherwise mpirun just runs the same job n times, once on each core.
When you use ./, you're running the program on a single process, so it won't run with parallelization (at least not with MPI parallelization).
The operating system needs to know where the executable is located in order to launch it. Possible locations for the executables are defined in the PATH environment variable, which is typically defined with /usr/bin and possibly a few other directories. If the executable is not in a directory defined in the PATH, then the full path needs to be provided explicitly, e.g. /opt/some_third_party_application/bin/some_third_party_executable
The single dot . represents the current working directory. It's common for . not to be included in PATH (though sometimes it is). Leaving it out is recommended for security, so that executables (including malware) in random directories don't get launched accidentally. That means <executable> won't launch on its own if it's located in the current directory. The path to it needs to be provided, and . counts as a path (representing the current directory). Hence ./<executable> launches the program, the ./ indicating that execution from the current directory is deliberately intended.
mpirun does not need a path to be provided because it's typically installed in a directory that is already listed in PATH, such as /usr/bin.
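The lookup rule described above can be checked with a small sketch. It assumes a POSIX-like system; my_solver is a made-up executable name, and shutil.which from the Python standard library stands in for the shell's search. A bare name is only found when its directory is on the search path, while a name that includes a directory part is used as given.

```python
import os
import shutil
import stat
import tempfile


def make_executable(directory, name):
    """Create a tiny executable shell script called `name` in `directory`."""
    path = os.path.join(directory, name)
    with open(path, "w") as handle:
        handle.write("#!/bin/sh\necho hello\n")
    # Add the execute bit for the owner so the file counts as a program.
    os.chmod(path, os.stat(path).st_mode | stat.S_IXUSR)
    return path


def lookup_demo():
    """Return the lookup results for a bare name and for an explicit path."""
    with tempfile.TemporaryDirectory() as bin_dir, \
            tempfile.TemporaryDirectory() as other_dir:
        prog = make_executable(bin_dir, "my_solver")  # hypothetical name
        # Bare name, search path does NOT contain bin_dir: not found.
        not_found = shutil.which("my_solver", path=other_dir)
        # Bare name, search path contains bin_dir: found.
        found = shutil.which("my_solver", path=bin_dir) == prog
        # Name with a directory part: the search path is not consulted.
        explicit = shutil.which(prog, path=other_dir) == prog
        return not_found, found, explicit


if __name__ == "__main__":
    print(lookup_demo())
```

This is the same reason ./<executable> works while <executable> alone does not: the ./ gives the name a directory part.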

Related

execute shell commands within contents of node package [duplicate]

My book states:
Every program that runs on your computer has a current working directory, or cwd. Any filenames or paths that do not begin with the root folder are assumed to be under the current working directory
As I am on OSX, my root folder is /. When I type in os.getcwd() in my Python shell, I get /Users/apple/Documents. Why am I getting the Documents folder in my cwd? Is it saying that Python is using Documents folder? Isn't there any path heading to Python that begins with / (the root folder)? Also, does every program have a different cwd?
Every process has a current directory. When a process starts, it simply inherits the current directory from its parent process; and it's not, for example, set to the directory which contains the program you are running.
For a more detailed explanation, read on.
When disks became large enough that you did not want all your files in the same place, operating system vendors came up with a way to structure files in directories. So instead of saving everything in the same directory (or "folder" as beginners are now taught to call it) you could create new collections and other new collections inside of those (except in some early implementations directories could not contain other directories!)
Fundamentally, a directory is just a peculiar type of file, whose contents are a collection of other files, which can also include other directories.
On a primitive operating system, that was where the story ended. If you wanted to print a file called term_paper.txt which was in the directory spring_semester which in turn was in the directory 2021 which was in the directory studies in the directory mine, you would have to say
print mine/studies/2021/spring_semester/term_paper.txt
(except the command was probably something more arcane than print, and the directory separator might have been something crazy like square brackets and colons, or something;
lpr [mine:studies:2021:spring_semester]term_paper.txt
but this is unimportant for this exposition) and if you wanted to copy the file, you would have to spell out the whole enchilada twice:
copy mine/studies/2021/spring_semester/term_paper.txt mine/studies/2021/spring_semester/term_paper.backup
Then came the concept of a current working directory. What if you could say "from now on, until I say otherwise, all the files I am talking about will be in this particular directory". Thus was the cd command born (except on old systems like VMS it was called something clunkier, like SET DEFAULT).
cd mine/studies/2021/spring_semester
print term_paper.txt
copy term_paper.txt term_paper.backup
That's really all there is to it. When you cd (or, in Python, os.chdir()), you change your current working directory. It stays until you log out (or otherwise exit this process), or until you cd to a different working directory, or switch to a different process or window where you are running a separate command which has its own current working directory. Just like you can have your file browser (Explorer or Finder or Nautilus or whatever it's called) open with multiple windows in different directories, you can have multiple terminals open, and each one runs a shell which has its own independent current working directory.
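A minimal sketch of the two points above, assuming only the Python standard library: a child process starts in its parent's current directory, and a later os.chdir() inside the child leaves the parent where it was. The directory name sub is made up for the illustration.

```python
import os
import subprocess
import sys
import tempfile

# Code run by the child: report the starting directory, move, report again.
CHILD = "import os; print(os.getcwd()); os.chdir('sub'); print(os.getcwd())"


def demo():
    """Return (temp dir, directories reported by the child, parent's cwd afterwards)."""
    original = os.getcwd()
    with tempfile.TemporaryDirectory() as tmp:
        tmp = os.path.realpath(tmp)          # resolve symlinks such as /var on macOS
        os.mkdir(os.path.join(tmp, "sub"))   # hypothetical subdirectory
        os.chdir(tmp)                        # the parent changes its own cwd
        try:
            result = subprocess.run(
                [sys.executable, "-c", CHILD],
                capture_output=True, text=True, check=True,
            )
            child_dirs = result.stdout.splitlines()
            parent_after = os.getcwd()       # unaffected by the child's chdir
        finally:
            os.chdir(original)               # leave tmp before it is deleted
    return tmp, child_dirs, parent_after


if __name__ == "__main__":
    print(demo())
```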
So when you type pwd into a terminal (or cwd or whatever the command is called in your command language) the result will pretty much depend on what you happened to do in that window or process before, and probably depends on how you created that window or process. On many Unix-like systems, when you create a new terminal window with an associated shell process, it is originally opened in your home directory (/home/you on many Unix systems, /Users/you on a Mac, something more or less like C:\Users\you on recent Windows) though probably your terminal can be configured to open somewhere else (commonly Desktop or Documents inside your home directory on some ostensibly "modern" and "friendly" systems).
Many beginners have a vague and incomplete mental model of what happens when you run a program. Many will incessantly cd into whichever directory contains their script or program, and be genuinely scared and confused when you tell them that you don't have to. If frobozz is in /home/you/bin then you don't have to
cd /home/you/bin
./frobozz
because you can simply run it directly with
/home/you/bin/frobozz
and similarly if ls is in /bin you most definitely don't
cd /bin
./ls
just to get a directory listing.
Furthermore, as the ls (or on Windows, dir) example should readily convince you, any program you run will look in your current directory for files, not in the directory the program or script was saved in. If that were the case, ls could only produce a listing of the directory it's in (/bin). There is nothing special about the directory listing program, or the copy program, or the word processor program; they all, by design, look in the current working directory (though again, some GUI programs will start with e.g. your Documents directory as their current working directory, by design, at least if you don't tell them otherwise).
Many beginners write scripts which demand that the input and output files are in a particular directory inside a particular user's home directory, but this is just poor design; a well-written program will simply look in the current working directory for its input files unless instructed otherwise, and write output to the current directory (or perhaps create a new directory in the current directory for its output if it consists of multiple files).
Python, then, is no different from any other programs. If your current working directory is /Users/you/Documents when you run python then that directory is what os.getcwd() inside your Python script or interpreter will produce (unless you separately os.chdir() to a different directory during runtime; but again, this is probably unnecessary, and often a sign that a script was written by a beginner). And if your Python script accepts a file name parameter, it probably should simply get the operating system to open whatever the user passed in, which means relative file names are relative to the invoking user's current working directory.
python /home/you/bin/script.py file.txt
should simply open(sys.argv[1]) and fail with an error if file.txt does not exist in the current directory. Let's say that again; it doesn't look in /home/you/bin for file.txt -- unless of course that is also the current working directory of you, the invoking user, in which case of course you could simply write
python script.py file.txt
On a related note, many beginners needlessly try something like
with open(os.path.join(os.getcwd(), "input.txt")) as data:
    ...
which needlessly calls os.getcwd(). Why is it needless? If you have been following along, you know the answer already: the operating system will look for relative file names (like here, input.txt) in the current working directory anyway. So all you need is
with open("input.txt") as data:
    ...
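To see that the two forms name the same file, here is a small sketch that works in a throwaway temporary directory. The file name input.txt is taken from the snippet above; the function name is made up.

```python
import os
import tempfile


def resolves_against_cwd(filename="input.txt"):
    """Check that a bare relative name and the getcwd() join name the same file."""
    original = os.getcwd()
    with tempfile.TemporaryDirectory() as tmp:
        os.chdir(tmp)
        try:
            # A relative name is created inside the current working directory.
            with open(filename, "w") as data:
                data.write("hello\n")
            explicit = os.path.join(os.getcwd(), filename)
            same_path = os.path.abspath(filename) == explicit
            # Reading through the explicit path returns what was just written.
            with open(explicit) as data:
                content = data.read()
        finally:
            os.chdir(original)  # restore the caller's directory
    return same_path, content


if __name__ == "__main__":
    print(resolves_against_cwd())
```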
One final remark. On Unix-like systems, all files are ultimately inside the root directory / which contains a number of other directories (and usually regular users are not allowed to write anything there, and system administrators with the privilege to do it typically don't want to). Every relative file name can be turned into an absolute file name by tracing the path from the root directory to the current directory. So if the file we want to access is in /home/you/Documents/file.txt it means that home is in the root directory, and contains you, which contains Documents, which contains file.txt. If your current working directory were /home you could refer to the same file by the relative path you/Documents/file.txt; and if your current directory was /home/you, the relative path to it would be Documents/file.txt (and if your current directory was /home/you/Music you could say ../Documents/file.txt but let's not take this example any further now).
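The relative paths in the paragraph above can be reproduced with posixpath from the Python standard library, which works on Unix-style path strings without touching the file system. The helper names are made up for this sketch.

```python
import posixpath

TARGET = "/home/you/Documents/file.txt"


def relative_to(cwd, target=TARGET):
    """Relative path that reaches `target` from the directory `cwd`."""
    return posixpath.relpath(target, start=cwd)


def absolute_from(cwd, relative):
    """Turn a relative path back into an absolute one, starting from `cwd`."""
    return posixpath.normpath(posixpath.join(cwd, relative))


if __name__ == "__main__":
    for cwd in ("/home", "/home/you", "/home/you/Music"):
        print(cwd, "->", relative_to(cwd))
```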
Windows has a slightly different arrangement, with a number of drives with single-letter identifiers, each with its own root directory; so the root of the C: drive is C:\ and the root of the D: drive is D:\ etc. (and the directory separator is a backslash instead of a slash, although you can use a slash instead pretty much everywhere, which is often a good idea for preserving your sanity).
Your Python interpreter's location depends on how you launched it, as well as on what you did afterwards, such as using the os module to navigate your file system. Starting the interpreter from a launcher or menu entry may place you in a default directory such as that of your Python installation (this differs between operating systems). On the other hand, if you start by editing or running a file within a specific directory, your location will typically be the folder of the file you were editing. If you need to run the interpreter in a certain directory and you are using IDLE, for example, it is easiest to start by creating a Python file there one way or another; when you edit it you can start a shell with Run > Python Shell, which will already be in that directory. If you are using the command line interpreter, navigate to the folder where you want to run your interpreter before running the python/python3/py command. If you need to navigate manually, you can of course use the following, which has already been mentioned:
import os
os.chdir('full_path_to_your_directory')
This has nothing to do with osx in particular; it's more of a concept shared by all unix-based systems, and I believe Windows as well. os.getcwd() is the equivalent of the bash pwd command: it simply returns the full path of the directory you are currently in. In other words:
alex#suse:~> cd /
alex#suse:/> python
Python 2.7.12 (default, Jul 01 2016, 15:34:22) [GCC] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import os
>>> os.getcwd()
'/'
It depends on where you started the Python shell/script from.
Python is usually (except if you are working with virtual environments) accessible from any of your directories. You can check the entries in your PATH, and Python should be available there. So the directory you get when you ask Python is the one in which you started Python. Change directory in your shell before starting Python and you will see the result change accordingly.
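A quick way to check this, sketched with the standard library: save a script in one directory, start it from another, and compare what os.getcwd() reports with where the script file lives. The script name where_am_i.py is made up.

```python
import os
import subprocess
import sys
import tempfile

# The script prints its working directory, then the directory it is saved in.
SCRIPT = (
    "import os\n"
    "print(os.getcwd())\n"
    "print(os.path.dirname(os.path.abspath(__file__)))\n"
)


def run_from_elsewhere():
    """Return (lines printed by the script, launch directory, script directory)."""
    with tempfile.TemporaryDirectory() as script_dir, \
            tempfile.TemporaryDirectory() as launch_dir:
        script_dir = os.path.realpath(script_dir)
        launch_dir = os.path.realpath(launch_dir)
        script = os.path.join(script_dir, "where_am_i.py")  # hypothetical name
        with open(script, "w") as handle:
            handle.write(SCRIPT)
        # cwd=launch_dir plays the role of "cd launch_dir" before starting Python.
        result = subprocess.run(
            [sys.executable, script],
            cwd=launch_dir, capture_output=True, text=True, check=True,
        )
        return result.stdout.splitlines(), launch_dir, script_dir


if __name__ == "__main__":
    print(run_from_elsewhere())
```

The first printed line matches the directory Python was started from; only the second, which uses __file__, matches the directory that holds the script.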
os.getcwd() has nothing to do with OSX in particular. It simply returns the current working directory of the process, which is the directory the script was launched from and not necessarily the location of the source file. If I launch my script from my desktop it would return C:\Users\Dave\Desktop, or if I launch it from a directory on an external storage device it could return something like G:\Programs. It is the same for both Unix-based and Windows systems.

How are WSL POSIX paths converted to UNC for Windows native applications?

I found out that if I execute a Windows native program (PE) from WSL2, accessing a POSIX path magically works.
For example, I can access /dev/random if I execute my program from WSL bash, but if I execute the same program from CMD (command-prompt), I cannot.
I must understand the mechanism which allows this! :)
The test program is fairly simple:
#include <stdio.h>
int main(int argc, char *argv[], char *envp[]) {
    printf("%p\n", fopen("/dev/urandom", "r"));
    return 0;
}
If I execute this from inside the WSL instance, it succeeds opening the device.
If I execute this via CMD, however, it fails.
When I look at API mon, I can see that the fopen("/dev/urandom", "r") call is converted to CreateFileA("\\wsl.localhost\Ubuntu\dev\urandom", ...).
First question: What component is doing this conversion?
If I replace the fopen with CreateFile it fails... so it must be something in the stdio functions.
Second question: How does it know what WSL instance is the parent?
I saw no API query, no environment to give me a hint. The only abnormality I can see is the opening \\wsl.localhost\Ubuntu\tmp during process startup.
Third question: Does this survive nested within process tree?
When I execute cmd.exe from inside WSL, then execute my test program, it fails.
However, I wrote my own native Windows program that executes my test program and the test program succeeds, so this behavior does survive process tree.
Can anyone explain the mechanism that allows this magic to work? What API? What component is doing the transition? Where is the context stored? How is it queried? How does it know what distro to look up?
I tried to ask this at Microsoft discussion[1] and got no response, so I am hoping someone here may be able to provide a hint.
[1] https://github.com/microsoft/WSL/discussions/8212
Short summary. I believe:
/init handles the conversion of the working directory that gets passed to the Windows executable.
When a path starts with a directory separator character (e.g. / or \), fopen considers it to be relative to the root of the volume of the working directory.
For example:
If you execute your code from /home/<username>
... then the working directory will be \\wsl.localhost\Ubuntu\home\<username>.
... the "volume" (share name in this case) will be \\wsl.localhost\Ubuntu\
... so /dev/random is opened as \\wsl.localhost\Ubuntu\dev\random.
Try this, however:
cd /mnt/c (or any location inside that mount)
Call your program via /full/path/to/the.exe.
The fopen fails in my testing (and I assume will for you as well), because ...
... the working directory that gets passed in is C:\ (or a subdirectory thereof).
... thus the volume name is also C:\.
... and fopen attempts to open C:\dev\random, which doesn't exist.
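The rule I'm describing can be modelled with Python's ntpath module. This is only a model of the behaviour, not the code Windows or the C runtime actually runs; the user name and the helper function are made up. ntpath.join keeps the volume (drive letter or UNC share) of the first path when the second path starts with a separator:

```python
import ntpath


def resolve_rooted(working_dir, rooted_path):
    """Model: a path starting with a separator is resolved against the
    root of the volume that `working_dir` lives on."""
    return ntpath.normpath(ntpath.join(working_dir, rooted_path))


if __name__ == "__main__":
    # Working directory inside the WSL share: the share acts as the volume.
    print(resolve_rooted(r"\\wsl.localhost\Ubuntu\home\user", "/dev/random"))
    # Working directory on C: the same path now lands under C:\.
    print(resolve_rooted(r"C:\Users", "/dev/random"))
```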
More detail:
What component is doing this conversion?
That part is (I believe) fairly easy to answer, although not definitively. As mentioned in this answer, when you launch a Windows executable in WSL, it uses a handler registered with binfmt_misc (see cat /proc/sys/fs/binfmt_misc/WSLInterop) to call the WSL /init.
Unfortunately, WSL's /init is closed source, and so it is difficult to get full insight into what is happening with the launch process. But I think we can safely say that the handler (/init) is going to be the component that converts the path before the Windows process receives it.
One interesting thing to note is that the wslpath command is mapped to that same binary via symlink. When called with the name wslpath, the /init binary will do OS path conversions. For example:
wslpath -w /dev/random
# \\wsl.localhost\Ubuntu\dev\random
But here's the real question ...
So we know that /init knows how to convert the path, but exactly what does it convert when launching a Windows binary? That's a bit tricky, but I think we can surmise that what gets converted is the path of the current working directory.
Try these simple experiments:
$ cd /home
$ wslpath -w .
\\wsl.localhost\Ubuntu\home
$ powershell.exe -c "Get-Location"
Path
----
Microsoft.PowerShell.Core\FileSystem::\\wsl.localhost\Ubuntu\home
$ cd /dev
$ wslpath -w .
\\wsl.localhost\Ubuntu\dev
$ powershell.exe -c "Get-Location"
Path
----
Microsoft.PowerShell.Core\FileSystem::\\wsl.localhost\Ubuntu\dev
$ cd /mnt/c
$ wslpath -w .
C:\
$ powershell.exe -c "Get-Location"
Path
----
C:\
And another question
So here's my question -- When did the Windows API get smart about concatenating UNC working directories and paths that start with a directory separator? I can find no documentation on that behavior, but it obviously works. And it's not specific to WSL. I observed the same concatenation behavior when using a UNC working directory for a regular network share.
Even more curious is that .NET's path handling is not this smart about UNC concatenation. From the doc, the behavior we observe with fopen is expected for DOS paths, but for UNC:
UNC paths must always be fully qualified. They can include relative directory segments (. and ..), but these must be part of a fully qualified path. You can use relative paths only by mapping a UNC path to a drive letter.
And I was able to confirm that behavior in PowerShell with a simple Get-Content.
Back to our regularly scheduled ...
But that aside, you don't even need your sample code to demonstrate this. You can see the same behavior by calling notepad.exe from within WSL:
$ cd /etc
$ notepad.exe /home/<username>/testfile.txt
# Creates or opens the proper file using \\wsl.localhost\Ubuntu\home\<username>\testfile.txt
$ cd /mnt/c/Users
$ notepad.exe /home/<username>/testfile.txt
# Results in "The system cannot find the path specified", because it is really attempting to open C:\home\<username>/testfile.txt, and the `home` directory (likely) doesn't exist at that path.
And your other related questions:
How does it know what WSL instance is the parent?
In case it's not clear by now, I think it's safe to say that the WSL /init knows what WSL instance you are in since it is "orchestrating" the whole thing anyway.
Does this survive nested within process tree?
As long as one process doesn't change the working directory of the next process in the tree, yes. However, CMD doesn't support a UNC path as its current directory and falls back to a different one, so, if it's in the process chain, your program will fail.

showip: command not found

I am trying to run one of the examples from Beej's Guide to Network Programming (https://beej.us/guide/bgnet/), specifically showip.c (The link to the program is here: https://beej.us/guide/bgnet/examples/showip.c). Using gcc, I've typed in
gcc -o showip showip.c
Then ran the program
showip www.example.net
and I get the error showip: command not found, in the same directory where the code is and where the program was compiled. I'm not sure why this is the case. I've even cloned the code from his GitHub and used the makefile to compile the program, and yet I'm getting the same error. What exactly am I doing wrong here?
This is actually a problem with how you're running the program.
On Linux systems (unlike Windows systems) the shell does not search the current directory by default when looking for a program to run. If the given command does not contain a path element (i.e. there are no / characters in the name), then only the directories listed in the PATH environment variable are searched.
Since the current directory is not part of your PATH, prefix the command with the directory:
./showip www.example.net
Is the working directory on your path? Likely not.
Try ./showip
Since the program showip is not in your $PATH you have to tell
your shell that it's in the current directory:
./showip
Or add the current directory to your $PATH but it's a less secure
option:
PATH=:$PATH
or
PATH=.:$PATH
and run it as you're trying now:
showip

Having the local directory in the PATH environment is not secure?

In a lab, my professor writes:
For security reasons, the local directory '.' is NOT part of the PATH environment variable [on Unix] (on Windows it is, though!).
The rest of the lab is unrelated to this issue and focuses on multithreaded programs, however this line bugs me - I have no idea how this is not secure or how it could be exploited on a Windows system.
Why might having the local directory in the PATH be insecure, and what kind of attack could this make possible?
To demonstrate the weakness,
consider the extreme case when . is the first entry in PATH.
If a malicious directory contains scripts named ls and cd that execute,
let's say, rm -fr ~, you'll be in for an unpleasant surprise.
These scripts will get executed instead of the standard commands,
since the files in the current directory will be found first.
Let's take the optimistic extreme case, when . is the last entry in PATH.
That's better, but still not so great.
The idea of PATH is to have entries that are absolute paths,
which are chosen deliberately as directories that contain programs that are safe to run.
Having . in PATH lets you run prog instead of ./prog.
But this tiny convenience is not worth undermining your security.
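As a rough way to check your own setup, the sketch below lists PATH entries that are not absolute paths. It assumes a Unix-style PATH separated by colons, where an empty entry is treated like the current directory; the function name is made up.

```python
import os


def risky_path_entries(path_value, sep=":"):
    """Return PATH entries that depend on the current directory:
    empty entries, '.', and any other non-absolute path."""
    return [entry for entry in path_value.split(sep)
            if not entry.startswith("/")]


if __name__ == "__main__":
    # Inspect the PATH of the current process.
    print(risky_path_entries(os.environ.get("PATH", "")))
```

An empty list means every entry is an absolute path, which is the situation described above as the deliberate, safe choice.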

Should you change the current directory in a shell script?

I've always mentally regarded the current directory as something for users, not scripts, since it is dependent on the user's location and can be different each time the script is executed.
So when I came across the Java jar utility's -C option I was a little puzzled.
For those who don't know the -C option is used before specifying a file/folder to include in a jar. Since the path to the file/folder is replicated in the jar, the -C option changes directories before including the file:
in other words:
jar -C flower lily.class
will make a jar containing the lily.class file, whereas:
jar flower/lily.class
will make a flower folder in the jar which contains lily.class
For a jar-ing script I'm making I want to use Bourne wild-cards folder/* but that would make using -C impossible since it only applies to the next immediate argument.
So the only way to use wild-cards is run from the current directory; but I still feel uneasy towards changing and using the current directory in a script.
Is there any downside to using the current directory in scripts? Is it frowned upon for some reason perhaps?
I don't think there's anything inherently wrong with changing the current directory from a shell script. Certainly it won't cause anything bad to happen, if taken by itself.
In fact, I have a standard script that I use for starting up a Java-based server, and the very first line is:
cd "$(dirname "$0")"
This ensures that the rest of the commands in the script are executed in the directory that contains the script file itself (useful when a single machine is hosting multiple server instances), regardless of where the shell script was actually invoked from. The quotes keep it working when the path contains spaces. Without changing the current directory in the script, it would only work correctly if the user remembers to manually cd into the corresponding directory before running the script.
In this case, performing the cd operation from within the script removes a manual step from the server startup/shutdown process, and makes things slightly less error-prone as a result.
So as with most things, there are legitimate uses for this sort of thing. And I'm sure there are also some questionable ones, as well. It really depends upon what's most appropriate for your specific use-case. Which is something I can't really comment on... I always just let maven build my JARs for me.
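If the worry is that the directory change leaks out of the part of the script that needs it, one option is to scope the change and restore the previous directory afterwards. The sketch below shows that idea in Python rather than shell; the helper name working_directory is made up. In a shell script, running the commands in a subshell has a similar effect.

```python
import contextlib
import os


@contextlib.contextmanager
def working_directory(path):
    """Change into `path` for the duration of a with-block, then go back."""
    previous = os.getcwd()
    os.chdir(path)
    try:
        yield
    finally:
        os.chdir(previous)  # restored even if the block raises


if __name__ == "__main__":
    print("before:", os.getcwd())
    with working_directory(os.path.expanduser("~")):
        print("inside:", os.getcwd())
    print("after: ", os.getcwd())
```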
