Many shell developers put a lot of effort into writing portable code, avoiding Bashisms for example, and I wonder how much this extra effort really contributes to meeting software requirements.
I wonder if it is possible to give a simple checklist of conditions requiring portable code. Let us assume
that using shell instead of $BETTER_LANGUAGE actually makes sense,
that each shell has the same version on all target systems (i.e. shell version differences are not part of portability),
that readability/changeability is not considered for the moment (portable code may be easier to read / understand than e.g. the newest Bash RegEx feature).
This is what has come to my mind so far:
Code is intentionally executed in different shells, now or probably in the future (e.g. because the target platforms are limited to different shells, or because the code may have to run in a faster shell like dash later), and providing separate implementations is not feasible.
Code is part of a framework that aims to be portable itself
Code is part of a source package that must be buildable on a variety of platforms
The software is distributed as a package for a variety of target systems and the developer does not want to introduce a dependency on a specific shell
Are there any obvious conditions that I haven't thought of, or could they be expressed more generally? Would POSIX compliance require a different checklist than general portability? Is there any literature or other source that you would recommend?
I wonder how much this increased effort really contributes to compliance with software requirements.
This - "this increased effort" - suggests that choice of the shell as programming language was wrong in the first place. Shell is the language of plumbing: you do NOT implement tools in it - you control other tools with it. If a shell script takes long time to develop, if you run into limitations of the POSIX shell, then the shell is very likely the wrong language.
The only exception to that, and something missing from your checklist, is the environment of a bare, as-if-freshly-installed commercial UNIX system (and probably some of the *BSD systems). If you develop software for such systems, then you would probably have to use the POSIX shell, simply because there might be literally nothing (and I do mean literally nothing) else installed on the system.
In the end, all software comes with a list of dependencies and prerequisites. If you include bash as a dependency, it would bother no *NIX admin: bash is often the first third-party package installed on such systems anyway. Also, compared to other third-party software, bash (like other popular shells, e.g. csh, tcsh, ksh) is practically free of installation and dependency problems.
Would POSIX compliance require a different checklist than general portability?
There is no such thing as "general portability". Portability is always particular.
Software portability, broadly defined, is the ability of the software to run on two or more different operating systems. POSIX is an operating system standard, defining part of the OS interfaces to users and software developers. It is a platform foundation, not a platform of its own.
I have seen a hard POSIX compliance requirement only in some niche markets, often for the sole reason that it is the only operating system ISO standard in existence.
For software that doesn't have a hard "POSIX compliance" requirement, it is generally recommended to target specific platforms. The common denominator might still be the POSIX standard, but OS-specific features, customizations and handling simply make the software better for the users of that particular platform.
In the end, users do not care about POSIX. POSIX is a tool for developers, who use it to reduce the cost of developing and maintaining software for multiple platforms. Thus the requirement, as it comes from the users, is support for a particular platform, not "POSIX compliance".
Related
I was reading about how people are having trouble finding developers to work on the government systems that still use COBOL. I was also reading about how Fortran, a language created two years before COBOL, is interoperable with C, C++, R, and Python given the right libraries.
This allows Fortran code to work with modern programming languages to some degree, and even lets you write code in modern languages that runs alongside Fortran, making it easier for Fortran novices to work with it. Are there any particular issues that prevent COBOL from having similar interoperability with other programming languages, like SQL (which is used for databases, much as COBOL often is), that would make it easier for modern programmers who might not normally learn COBOL to work with it?
Q1: Does anything prevent interoperability between modern languages and COBOL?
A1: Short answer, similar to those above: no, it is actually done quite often.
But that may depend on what the reader considers a "modern language".
Even with "real" COBOL (not some "shiny" [may be read as "blending"] "managed COBOL") you are in most cases free to directly call any C functions so more or less can call anything (at least with a C wrapper) and also can call binaries as you can do on the operating system (`CALL 'SYSTEM' USING 'some-executabe-or-script "param1" "param2"' is a common extension).
For calling into any "native code" directly (like Win32 or POSIX) you obviously have to ensure you are using the correct parameter definitions, but COBOL 2002+ have stuff like USAGE SIGNED-LONG, USAGE POINTER and similar (the extension USAGE COMP-5 is also common in this place).
Additionally, there are often direct ways to interoperate with socket servers, HTTP(S), XML, JSON, and so on; and many COBOL implementations also allow you to ASSIGN a (line-)sequential file to a pipe, letting you interact with other programs that way, too.
Q2: Are there any particular issues that prevent COBOL from having [...] interoperability with [...] SQL?
A2: No; SQL is very commonly used directly from COBOL: EXEC SQL.
Many people will say that SQL is not a "programming language"; it is a query language and may be used in different environments, including COBOL.
Depending on the environment, EXEC SQL may be integrated directly into the COBOL environment, or handled by a pre-parser that rewrites the code into plain COBOL (normally CALLing some "native" code, see Q1).
Q3: [... stuff] that would make it easier for modern programmers who might not normally learn COBOL to work with it?
... this is a completely different question, whatever a "modern programmer" is.
For a programmer to get to know a programming language, it all depends on the programmer, the resources (like time, manuals, tutorials, mentors) - and the programmer's will. Many people actually don't want to learn COBOL (for reasons I've heard but don't understand or don't agree with); others lack some of the resources (a free compiler is available with GnuCOBOL, nearly all COBOL compilers have their manuals available online, the ISO working group for COBOL publishes its draft standards online, and you can often find mentors in COBOL discussion forums or mailing lists, along with many samples).
One thing that often is special with COBOL is not the language itself but the environment it is used in (a "mainframe" with a job control language, JCL, instead of a GUI to click or a shell to use) and/or the software that is actually coded in COBOL. Every piece of software that is maintained over decades develops "special ways" here and there, and if you get into decades-old code that wasn't actually maintained for years you get even more trouble/fun (this is not COBOL-specific, but with COBOL you may encounter such software more often).
No, there is nothing preventing interoperability.
The main reason (this is an opinion, not based on known facts) that Fortran seems to have more interop out of the box is that there was a free-software GNU Fortran compiler for interested parties to work with. COBOL was very late in the game getting a viable free-software compiler. That is no longer an issue with GnuCOBOL, and people are finally starting to write the code needed to catch up.
Adding to Simon's answer: a proof of concept for direct embedding is in a branch of GnuCOBOL; intrinsic functions have been added to support FUNCTION TCL, FUNCTION PYTHON, FUNCTION REXX, FUNCTION LUA and FUNCTION JVM, so far. With FUNCTION JVM, tests for Scala, Groovy, Java and Frink all worked. This allows data transfer between COBOL working storage and the other language engine using simple COBOL syntax, including setups for callbacks in both directions. Those functions are embedded into the compiler and the libcob run-time when using that branch.
For other interface trials, not built into the compiler but still allowing interop, the GnuCOBOL FAQ has dozens of examples. Shakespeare? Yep. Falcon? Yep. C? Well, GnuCOBOL emits intermediate C, so that's covered in spades. There is also a C++ edition of the compiler, so C++ is covered as well. JavaScript: Jsish, Duktape, SpiderMonkey and QuickJS, to name a few of the trials.
Ada, D, Vala, Genie, S-Lang, ROOT/CINT, J, Gambas, Forth, Perl, Postscript, Pure, Icon and Unicon, Nim, BaCon, SWIG (which opens up many multiples), PARI/GP, Gretl, R, Red, Ruby, Haxe/Neko, Pascal, Erlang, Elixir, SQLite, Rust, Go, more..., including a fair number of esolangs, and GNU Lightning for on the fly assembly modules. Trials documented in the GnuCOBOL FAQ.
Framework interfacing for AWT/Swing, GTK, Agar, and things like ZeroMQ, CGI and websockets also proved successful and are in productive use. Along with at least 7 EXEC SQL preprocessors successfully tested, and in use.
It comes down to someone caring to try and writing some glue or properly aligning call frames. No attempt I've tried has failed to produce satisfactory results, although Perl 5 was a hair-pull of unraveling macro layers. (OK, I just lied: while attempting to embed jq, which relies on C call-and-return-by-struct features, I would have had to leave pure COBOL interface coding, and didn't bother with the C middleware that would have made it easy.) ;-) Will do that someday though, as jq is quite the powerful little JSON handler.
Use the search engine you mistrust the least and look for "gnu-cobol-builtin-script" and "GnuCOBOL FAQ", and visit the hits on SourceForge.
In my particular explorations I usually focus on languages with a C application binary interface, but other ABIs would be in a similar vein. It only takes sitting down and writing some middleware, or figuring out how to properly sync the call frames.
Are these current samples perfect? Not always; there are edge and corner cases with some datatypes and COBOL PICTURE data that would require more work, but that is all - a little bit of work and testing to smooth over the bumps. When exploring, I don't always go that far until an actual need arises. These seed experiments are just to get some proof in the pudding, all done for the simple joy of it.
One of the lead developers for GnuCOBOL just added uni and bi-directional piping using simple filenames, which provides access to whatever the base OS offers, using basic COBOL OPEN/READ/WRITE/CLOSE (and other file IO) statements. Code was committed to trunk just a few hours before I started typing this response.
Basically, the answer to the titular question is a resounding No.
The scenario involved in the governmental systems is most likely IBM mainframe hardware with a flavor of z/OS, z/VSE, or z/VM operating system.
It somewhat depends on what is meant by interoperability in the sense that most any modern mainframe supports TCP/IP and that pretty much opens up the whole networked computing ecosystem to networked interoperability.
My guess is when all is said and done, the reason there is a problem is that the state refuses to pay a market rate for experienced mainframe developers and has kicked the maintenance can down the road as cost-saving measures.
It most likely is not a matter of there being no mainframe COBOL professionals able to make the systems work; it's most likely the state won't pay the price.
But this is speculation on my part since all I know is that the governor blames inanimate objects for appropriations and management failures within the state IT administration.
As a 40-year mainframe veteran, I'm dying to know details as to how this perfectly good technology is at fault for problems dealing with (again, I assume) unprecedented volumes of processing demand.
We found an interoperability problem between C and GnuCOBOL.
Our problem has since been addressed, so this answer is just for educational purposes, to show what kind of problems you may run into.
The problem manifests when C calls COBOL(a, b) calls C(c) calls COBOL(a, b).
And specifically when the number of arguments varies.
A recent change to GnuCOBOL assumed that COBOL called COBOL, so it passed metadata about the arguments in a global area. The called COBOL program then cleared out the second argument because it falsely thought it was being called with one argument. That is, the intermediate C call was transparent to COBOL.
This was done under the guise of making GnuCOBOL more compatible with IBM mainframe COBOL, but it caused me a lot of grief. It was quickly addressed with runtime changes. I would like to see it addressed with a compile-time option:
Make the .so file a standalone .so that can be called from any language, with the programmer having to be vigilant.
Make the .so file assume it will be called from COBOL and have the additional protections afforded by mainframe COBOL.
BTW: GnuCOBOL is great and has a great community behind it. If you experience problems, report them and you will get a better response than from commercial products.
Whenever I write shell scripts (mostly software development utilities or build tools) I've generally tried to avoid using bash in favor of plain old sh for portability. However, lately I've been running into more and more issues where useful features are not available, or behavior is actually less consistent across systems using sh than it is using bash, since sh resolves to different shells on different systems...
As I understand it, sh is the oldest Unix shell and carefully written sh scripts should in theory run on pretty much any system out there... but it also seems there are about 9000 different variants of every major shell, too. Doesn't using bash as your script interpreter effectively limit your script's portability? Sure, no problems on OS X or pretty much any Linux out there, but what about the BSDs? Solaris, AIX, HP-UX? What do you do if you really want to run on everything?
I know bash can be installed on virtually any OS, but is it really a first-class citizen on all relevant modern systems? Does it come pre-installed? I'm just not really sure whether it's best to avoid or embrace bash with the intent of having the most consistent and portable overall experience.
What do you do if you really want to run on everything?
You follow the POSIX standard for sh (and the tools you're calling) and hope that the target OS does so too. Any modern product called "UNIX" must follow this standard, and customarily (though not universally), the standard shell will be called /bin/sh. The BSDs and Linux distros tend to aim at POSIX compatibility as well.
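As a small illustration (my own example, not taken from the standard), here is the same renaming check written first as a Bashism and then with POSIX constructs, assuming $f holds a filename:

    # bash-only: [[ ]] pattern matching and ${var/%pat/repl} substitution
    #   [[ $f == *.txt ]] && mv "$f" "${f/%.txt/.bak}"

    # POSIX sh equivalent: case plus suffix stripping with ${var%pattern}
    case $f in
        *.txt) mv "$f" "${f%.txt}.bak" ;;
    esac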
Doesn't using bash as your script interpreter effectively limit your script's portability?
Yes, but it depends on your target audience as you noted. If it's a short script, it's worth testing under dash (Ubuntu and Debian's default shell) for POSIX compatibility.
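For example, a quick smoke test might look like this (the script name is hypothetical, and it assumes dash and the optional linters are installed):

    # Run the script under dash to catch obvious Bashisms at run time
    dash ./myscript.sh

    # Static checks, if the tools are available:
    # checkbashisms (from Debian's devscripts) flags non-POSIX constructs,
    # and shellcheck respects a #!/bin/sh shebang when warning about portability
    checkbashisms myscript.sh
    shellcheck myscript.sh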
Whenever I start thinking about portability issues in my shell script, I switch to another language. Perl is widely available and generally a good choice for scripts, but if your tools are to be consumed by Python, Ruby, $lang developers, use $lang to its full potential.
bash itself is just a plain C program; it does not need special authority to run and can be put in any location. You can easily build it from source. Basically, you can run bash if you need to, and you don't need the administrator of the system to install it.
As long as it is in your PATH, you can always start your script with the line:
#!/usr/bin/env bash
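If you would rather keep a #!/bin/sh shebang but still end up in bash when it matters, one common pattern is to re-exec; a minimal sketch, assuming bash is somewhere on PATH:

    #!/bin/sh
    # Re-exec this script under bash if it was started by a plain POSIX shell.
    if [ -z "$BASH_VERSION" ]; then
        exec bash "$0" "$@"
    fi
    echo "now running under bash $BASH_VERSION"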
If a particular piece of software is made to run on one platform and the programmer/company/whatever wants to port it to another, what exactly is done? I mean, do they just rewrite the Linux- or Windows-specific references to the equivalent on the other platform, or is an entire rewrite necessary?
Just trying to understand what makes it so cost-prohibitive that so many major vendors don't port their software to Linux (specifically thinking about Adobe)
Thanks
This is the point of a cross-platform toolkit like Qt or GTK: they provide a platform-agnostic API, which delegates to whichever platform the program is compiled for.
Some companies don't use such a toolkit and write their own (for whatever reason - it could well be optimisation-related), meaning they can't just recompile their code for another OS.
There are also libraries available that ease, at least on a specific area, the port of Windows API calls to Linux. See the Windows to Linux porting library.
In my experience, there are three main reasons why it's cost-prohibitive to take a large existing program on one platform and port it to another:
it has (not necessarily purposely) extensively used some library or API (often GUI, but there are also plenty of other things) that turns out not to exist on the other platform
it has unknowingly become riddled with dependency on nonstandard features or oddities of the compiler or other tools
it was written by somebody who didn't know that you had to use some oddball feature to get things to work on the other platform (like a Linux library that isn't sprinkled with the right __declspec directives you need for a good Windows DLL).
It's much easier to write a cross-platform app if you consider that a design goal from the start, and I have three specific recommendations:
Use Boost—oodles of handy things you might ordinarily get from platform-specific APIs and libraries, but by using Boost you get it cross-platform.
Do all your GUI programming using a cross-platform library. My favorite these days is Qt, but there are other worthy ones as well.
Build and test every day on both platforms; never give the code an opportunity to develop a dependency on only one platform that you discover only when it's too late.
There are many reasons why it may be very difficult to port an application to a different platform, most often it is because some interfaces the application uses to communicate with the system are not available, and one either has to implement them on their own, port a library your application depends on, or rewrite the application, so that it uses alternative functions. Most languages today are very portable across hardware architectures and operating systems, but the problem is with libraries, system calls and potentially other interfaces the OS (or platform) provides. To be more specific:
Compilers may differ in their configuration and the standard functions they provide. On Windows the most popular C/C++ compiler is Microsoft's Visual C++, while on unix it is gcc and clang/llvm (in combination with a standard library such as glibc or BSD libc). They expect different flags and different forms of declaration, and produce different file formats for executables and shared libraries. Even though C and C++ have standard libraries, these are implemented differently across platforms. There are systems whose aim is to make compilation portable, such as Autotools, CMake and SCons; at their core they probe the platform for what is available, as in the shell sketch after this list.
On top of the standard libraries there are additional functions the OS provides. On Windows they are covered by the Win32 API; on unix systems they are part of the POSIX standard, with various GNU, BSD and Linux-specific extensions, plus plain system calls (the lowest-level interface between applications and the operating system). POSIX functions are available on Windows via systems such as Cygwin and MinGW, and Win32 API functions are available on unix via Wine. The problem is that these implementations are not complete, and there are often minor (but important) differences.
Communication with the desktop system (in order to make a GUI interface) is done differently. On Linux this might be the X Window System (together with freedesktop libraries) or Wayland, while Windows has its own systems. There are GUI libraries which try to provide an abstract interface for these, such as Qt, GTK, wxWidgets, EFL.
Other services the OS provides, such as network communication may be implemented differently. On Windows many applications use .NET libraries, for which there is only limited support on unix systems. Some unix applications rely on Linux-specific features such as systemd, /proc, KMS, cgroups, namespaces. This limits portability even among unix systems (Linux, BSD systems, Mac OS X, ...). Even .NET libraries are not very compatible across different versions, and they might not be available on an older version of Windows or on embedded systems. Android and iOS have different interfaces entirely.
Web applications are usually the most portable, but HTML5 is a live standard, and many interfaces may not be available yet in some browsers/web engines. This requires the use of polyfills, but it is usually much less painful than the situation with "native" applications.
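As a flavour of what those portable build systems do under the hood, here is a minimal shell sketch in the spirit of an autoconf-style feature check (it assumes a C compiler is reachable as cc; the header being probed is just an example):

    # Probe whether <unistd.h> is usable and record the result in config.h
    printf '#include <unistd.h>\nint main(void) { return 0; }\n' > conftest.c
    if cc conftest.c -o conftest 2>/dev/null; then
        echo '#define HAVE_UNISTD_H 1' >> config.h
    fi
    rm -f conftest conftest.c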
Because of all of these limitations, porting can be a pretty hard work and sometimes it is easier to create a new application from scratch, either specifically for the other platform, using cross-platform abstraction libraries/platforms (such as Qt or Java), or as a web application (potentially bundled in something like Electron). It is a good idea to use these from the beginning, but many programmers choose not to because the applications tend to look and behave differently from "native" applications on the platform, and they might also be slower and more restricted in the way they interact with the OS.
Porting a piece of software that has not been made platform-independent upfront can be an enormous task. Often the code is deeply ingrained with non-portable APIs, whether third-party or just OS libraries. If the third-party vendor does not provide the API on the platform you are porting to, you are pretty much forced into a full rewrite of that functionality, or into finding another third party that is portable. That alone can be awfully costly.
Finally, porting software also means supporting it on another platform, which means hiring some specialists, and training support to answer more complex queries.
In the end, such a process can be very costly, for very little additional sales. Sadly, the decision is easy: concentrate on new functionality on your current platform that you know your customers are going to pay for.
If the software was written for a single OS, a major rework is likely. The first step is to move absolutely all platform-specific code into a single area of the code base; this area should have little or no app-specific stuff. Then rewrite this isolated portion of the code for the new target OS.
Of course, this glosses over some extremely major implications. For instance, if your first version targeted the Win32 API, then any GUI code will be heavily tied to Windows, and to maintain any hope of preserving your sanity, you will need to move all that code to a cross-platform GUI framework like Qt or GTK.
Under Mono, you can write a C# Winforms program that works on both platforms. But to make that possible, the Mono team had to write their own Winforms library that essentially duplicates all of the functions of Winforms. So there is still no free lunch.
Most software is portable to some extent. In the case of a C app there will be a lot of #ifdefs in the code, apart from path changes, etc.
It is rare for the Windows and Linux versions of the same software not to share a common codebase - that would effectively mean they share only a name. It's always harder to maintain more codebases, but I think the actual problem with porting applications has little to do with the technical side and a lot to do with the business side. Linux has far fewer users than Windows/OSX, and most of them expect everything to be free as in beer or simply hate commercial software on some religious grounds.
When you come to think about it - most open source software is multiplatform, no matter what language was used to implement it. This speaks for itself...
P.S. Disclaimer - I'm an avid supporter of Free and Open source software, I don't want to insult anybody - I just share my perspective on the topic.
I've gone through the academic Scheme stuff (read SICP, The Little Schemer, The Seasoned Schemer, TSPL) and been playing with Scheme as a toy for a while.
But I want to get practical.
Today I needed to write a shell script to do some batch file processing, and thought "why not do it in Scheme?". I did, and it was a joy.
Now I'm forced to wonder what the best implementation is for shell script type stuff.
I know all implementations differ in terms of what they implement beyond R5RS. (Basically, they differ in all the useful and practical extensions you'd want in a scripting language).
So I'd like to pick one implementation and stick to it. I'm looking for something that:
Is cross platform (Linux, OS X, Windows).
Has extensions that are useful in day-to-day shell scripting, and those extensions are part of the base install.
Is easy to install. (e.g. there are a number of pre-built binaries, and/or it is a standard package on many distros.)
Is actively developed, with an active community.
Has Unicode support.
I've been using Gambit so far. It seems to satisfy the above constraints. PLT seems like overkill. Wondering about Guile, MIT/GNU, etc.
PLT Scheme meets all of your criteria. Since it looks like you know that already, you should use the MzScheme package. MzScheme is the runtime on top of which all of PLT is built.
If you were to download the full PLT Scheme install it would seem large as it includes a lot of documentation and an IDE in addition to the runtime.
Have you heard of scsh? I haven't used it, but it sounds a lot like what you want.
I recommend Gauche, which:
Runs on Linux, OS X, Windows (with Cygwin) and some other UNIX-like platforms,
Includes in the base install POSIX-compliant system libraries and useful modules for network protocols, the file system, DBM, multithreading, etc.,
Is available through several package systems such as MacPorts, apt-get and yum (or just configure, make and make install),
Has active English and Japanese mailing lists,
Supports UTF-8 as its internal encoding.
I know this question has kind of started "religious" wars in the past and there might not be one correct answer. But after working with ksh and csh for the last 3-4 years and going through the pain of porting from one to the other, or of applying a common piece of logic to multiple versions (read: legacy code), if I were writing a new script I would go for ksh, but out of compulsion rather than choice. Is there a better option than ksh/csh? Also, is there something portable across Unixes (Solaris/HP-UX/AIX/FreeBSD) and Linux (and, if I am not asking too much and it makes sense, all Linux flavors)?
Waiting for suggestions ...
Peace :)
Devang Kamdar
I would suggest plain old sh, which is available everywhere.
Also, it is worth noting that portability involves not only shell but also other commands used in a script such as awk, grep, ps or echo.
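echo is the classic example: options like -n and -e are handled differently by different shells and /bin/echo implementations, whereas printf is specified by POSIX and behaves the same everywhere:

    # Not portable: may print the literal string "-n", and may or may not
    # suppress the trailing newline, depending on the shell
    #   echo -n "value: "

    # Portable equivalents using printf
    printf '%s' 'value: '
    printf '%s\n' 'a complete line'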
If you really want it to be portable (I don't know that any shell script is maintainable), I would specify #!/bin/sh and test with dash and, if possible, other shells.
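One way to do that mechanically is to run the script under every POSIX-ish shell that happens to be installed on the build machine (the script name here is hypothetical):

    # Try the script under each shell found on this machine
    for s in sh dash bash ksh; do
        if command -v "$s" >/dev/null 2>&1; then
            printf 'Testing with %s\n' "$s"
            "$s" ./myscript.sh || printf '%s reported a failure\n' "$s"
        fi
    done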
I would expect bash to be the most widespread shell at the moment, since it is the default for many Linux distributions (it can even run on Windows with Cygwin, but that's probably true of the other shells, too).
An alternative might be to not use the shell itself for scripting but one of the scripting languages out there, like Perl, Python, Ruby, ...
I usually use ksh. I find that it's a good compromise between features and portability. It's there (or a compatible version is available) on most Linux boxes and Solaris. It's a while since I used HP-UX (thankfully) but I'm pretty sure it was available there too.
If all the machines you need to support are modern, bash might be an option. Solaris 10 comes with a copy. It's the default on most Linux machines.
Your lowest common denominator is going to be Bourne (sh), so that's worth considering if portability is your main concern. It's missing some of the more friendly features of ksh and bash though.
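Arrays are a good example of those friendlier features (my own illustration, not from the answer above): bash and ksh have real arrays, while plain Bourne/POSIX sh only has the positional parameters to fall back on:

    # bash/ksh: a real array
    #   files=(one.txt two.txt three.txt); printf '%s\n' "${files[@]}"

    # Bourne/POSIX sh: reuse the positional parameters as the only "array"
    set -- one.txt two.txt three.txt
    for f in "$@"; do
        printf '%s\n' "$f"
    done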
It's still worth steering clear of csh/tcsh for scripting. Csh Programming Considered Harmful is an oldie but still largely relevant.
My answer would be perl.
It does everything sh, bash, etc. can do, in a nicer, more elegant manner.
It is also actually more portable. A given version of perl is very consistent across all platforms; there are no significant differences between the Linux, Solaris and AIX distributions, whereas porting shell scripts between these platforms is a real pain.
And it works on all Windows platforms! Provided you avoid backticks and "system()", your script has a good chance of running.
Python! Check out IPython, which is an enhanced Python interpreter. Also: Python for Unix and Linux System Administration.
You can write great portable scripts, and it's fun.