How do different apps written in different languages interact? - compilation

Of late I'd been hearing that applications written in different languages can call each other's functions/subroutines. Until recently I felt that was very natural, since all languages (yes, all; that's what I thought then, silly me!) are compiled into machine code, and that should be the same for every language. Only some time back did I realise that even languages compiled into 'higher machine code' (IL, byte code, etc.) can interact with each other, or rather the applications can. I tried to find the answer many times but failed; no answer satisfied me. Either they assumed I knew a lot about compilers, or they said something I totally didn't agree with, and so on. Please explain in an easy-to-understand way how this works. What especially makes me clutch my hair is how languages compiled into 'pure' machine code can have different 'calling conventions'.

This is actually a very broad topic. Languages compiled to machine code can often call each other's routines, though usually not without effort; e.g., C++ code can call C routines when properly declared:
// declare the C function foo so it can be called by C++ code
extern "C" {
void foo(int, char *);
}
This is about as simple as it gets, because C++ was explicitly designed for compatibility with C (it includes support for calling C++ routines from C as well).
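Seen from the C side, the usual idiom is a header that both compilers can read. The following is a minimal sketch, assuming a hypothetical header for `foo`; the behaviour given to `foo` here (formatting its first argument into the buffer) is invented purely so the example does something.

```c
#include <stdio.h>

/* ---- contents of a hypothetical shared header, e.g. foo.h ---- */
#ifdef __cplusplus
extern "C" {            /* seen only by a C++ compiler: use C linkage */
#endif

/* Writes the decimal form of n into buf; buf must hold at least 12 bytes. */
void foo(int n, char *buf);

#ifdef __cplusplus
}
#endif

/* ---- implementation, compiled as plain C ---- */
void foo(int n, char *buf)
{
    sprintf(buf, "%d", n);
}
```

A C compiler skips the `extern "C"` lines because `__cplusplus` is not defined, while a C++ compiler sees them and looks for the unmangled symbol `foo`.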
Calling conventions indeed complicate the picture in that C routines compiled by one compiler might not be callable from C compiled by another compiler, unless they share a common calling convention. For example, one compiler might compile
foo(i, j);
to (pseudo-assembly)
PUSH the value of i on the stack
PUSH the value of j on the stack
JUMP into foo
while another might push the values of i and j in reverse order, or place them in registers. If foo was compiled by a compiler following another convention, it might try to fetch its arguments off the stack in the wrong order, leading to unpredictable behavior (consider yourself lucky if it crashes immediately).
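As an illustration of how a compiler lets you pick a convention explicitly, here is a small sketch using the `ms_abi` and `sysv_abi` function attributes that GCC and Clang provide on x86-64. This is a compiler extension, not standard C; on other compilers or targets the macros below expand to nothing, and the names `ABI_MS`, `ABI_SYSV`, `sub_ms`, `sub_sysv` and `call_both` are made up for the example.

```c
/* Select a calling convention per function where the compiler supports it. */
#if defined(__GNUC__) && defined(__x86_64__)
#  define ABI_MS   __attribute__((ms_abi))    /* Microsoft x64 convention */
#  define ABI_SYSV __attribute__((sysv_abi))  /* System V AMD64 convention */
#else
#  define ABI_MS                               /* fall back to the default */
#  define ABI_SYSV
#endif

/* Same C source, but the arguments arrive in different registers. */
ABI_MS   int sub_ms(int i, int j)   { return i - j; }
ABI_SYSV int sub_sysv(int i, int j) { return i - j; }

/* The caller sees both declarations, so the compiler emits the matching
   argument setup for each call. */
int call_both(int i, int j)
{
    return sub_ms(i, j) + sub_sysv(i, j);
}
```

The calls work because the convention is part of each function's declaration. Trouble starts when the caller is compiled against a declaration that names a different convention than the one the callee was built with.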
Some compilers support various calling conventions for this purpose. The Wikipedia article introduces calling conventions; for more details, consult your compiler's documentation.
Finally, mixing bytecode-compiled or interpreted languages and lower-level ones in the same address space is still more complicated. High-level language implementations commonly come with their own set of conventions to extend them with lower-level (C or C++) code. E.g., Java has JNI and JNA.

Related

How OS restricts the way of procedure call?

There are two ways of procedure call, save address to register or save it in stack. I read that the way of procedure call is OS specific. I want to understand how OS restricts that. Can't compiler generate a code that saves address in register and load it later, or save it in stack and pop it when needed?
Just want to understand the role of OS here.
Thank you.
The operating system plays no role in it whatsoever, except that the OS's own libraries may use a specific calling convention. The compiler determines the calling convention. It's not OS specific but rather language and compiler specific.
Programming languages do things in different ways. For example, the nested procedures of Ada and Pascal need context passed to them behind the scenes that is not needed in C and C++.
In the old days there was pretty much chaos on this.
By the late 1970s the VMS operating system had a calling convention defined, and all compilers made by the vendor complied with it. This made it possible for Fortran to call Pascal to call C to call Fortran. However, even there, things were not 100% transparent. In fact, the VMS compilers had extensions to languages to call functions in other languages. In FORTRAN 77, everything was passed by reference. There had to be extensions to call C functions that expected everything to be passed by value.
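The by-reference versus by-value mismatch can be sketched from the C side. This is only an illustration: the function names are hypothetical, and the lowercase-name-plus-trailing-underscore symbol is the default used by some Fortran compilers (gfortran, for example), not something every compiler or the VMS convention guarantees.

```c
/* Ordinary C function: arguments are passed by value. */
int add_ints(int a, int b)
{
    return a + b;
}

/* Wrapper shaped for a FORTRAN 77 caller, which passes the address of
   every argument. The trailing underscore is an assumed name-mangling
   rule; check your Fortran compiler's documentation. */
int add_ints_(const int *a, const int *b)
{
    return add_ints(*a, *b);
}
```

A Fortran statement along the lines of `K = ADD_INTS(I, J)` would then reach the wrapper, which dereferences the pointers before calling the by-value function.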

Writing a Ruby extension in Go (golang)

Are there some tutorials or practical lessons on how to write an extension for Ruby in Go?
Go 1.5 added support for building shared libraries that are callable from C (and thus from Ruby via FFI). This makes the process easier than in pre-1.5 releases (when it was necessary to write the C glue layer), and the Go runtime is now usable, making this actually useful in real life (goroutines and memory allocations were not possible before, as they require the Go runtime, which was not usable if Go was not the main entry point).
goFuncs.go:
package main

import "C"

//export GoAdd
func GoAdd(a, b C.int) C.int {
	return a + b
}

func main() {} // Required but ignored
Note that the //export GoAdd comment is required for each exported function; the symbol after export is how the function will be exported.
goFromRuby.rb:
require 'ffi'

module GoFuncs
  extend FFI::Library
  ffi_lib './goFuncs.so'
  attach_function :GoAdd, [:int, :int], :int
end

puts GoFuncs.GoAdd(41, 1)
The library is built with:
go build -buildmode=c-shared -o goFuncs.so goFuncs.go
Running the Ruby script produces:
42
Normally I'd try to give you a straight answer but the comments so far show there might not be one. So, hopefully this answer with a generic solution and some other possibilities will be acceptable.
One generic solution: compile high level language program into library callable from C. Wrap that for Ruby. One has to be extremely careful about integration at this point. This trick was a nice kludge to integrate many languages in the past, usually for legacy reasons. Thing is, I'm not a Go developer and I don't know that you can compile Go into something callable from C. Moving on.
Create two standalone programs: Ruby and Go program. In the programs, use a very efficient way of passing data back and forth. The extension will simply establish a connection to the Go program, send the data, wait for the result, and pass the result back into Ruby. The communication channel might be OS IPC, sockets, etc. Whatever each supports. The data format can be extremely simple if there's no security issues and you're using predefined message formats. That further boosts speed. Some of my older programs used XDR for binary format. These days, people seem to use things like JSON, Protocol Buffers and ZeroMQ style wire protocols.
Variation of second suggestion: use ZeroMQ! Or something similar. ZeroMQ is fast, robust and has bindings for both languages. It manages the whole above paragraph for you. Drawbacks are that it's less flexible wrt performance tuning and has extra stuff you don't need.
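For the hand-rolled two-process variant, a predefined binary message can be very small. The sketch below assumes a made-up "add request" carrying two 32-bit integers, encoded big-endian in 4 bytes each, which is how XDR lays out a 32-bit integer. The message layout and all function names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

enum { ADD_REQUEST_SIZE = 8 };  /* two 4-byte big-endian integers */

/* Store a 32-bit value most significant byte first. */
static void put_u32_be(unsigned char *p, uint32_t v)
{
    p[0] = (unsigned char)(v >> 24);
    p[1] = (unsigned char)(v >> 16);
    p[2] = (unsigned char)(v >> 8);
    p[3] = (unsigned char)v;
}

/* Read a 32-bit value stored most significant byte first. */
static uint32_t get_u32_be(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Sender side: serialize the request into a fixed-size buffer. */
void encode_add_request(int32_t a, int32_t b, unsigned char out[ADD_REQUEST_SIZE])
{
    uint32_t ua, ub;
    memcpy(&ua, &a, sizeof ua);   /* reinterpret the bits without a cast */
    memcpy(&ub, &b, sizeof ub);
    put_u32_be(out, ua);
    put_u32_be(out + 4, ub);
}

/* Receiver side: recover the two integers from the buffer. */
void decode_add_request(const unsigned char in[ADD_REQUEST_SIZE], int32_t *a, int32_t *b)
{
    uint32_t ua = get_u32_be(in);
    uint32_t ub = get_u32_be(in + 4);
    memcpy(a, &ua, sizeof *a);
    memcpy(b, &ub, sizeof *b);
}
```

Because the byte order is fixed by the format rather than by either machine, the same 8 bytes mean the same thing to the Ruby side and the Go side, whatever they run on.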
The tricky part of using two processes and passing data between them is a speed penalty. The overhead might not justify leaving Ruby. However, Go has great native performance and concurrency features that might justify coding part of an application in it versus a scripting language like Ruby. (Probably one of your justifications for your question.) So, try each of these strategies. If you get a working program that's also faster, use it. Otherwise, stick with Ruby.
Maybe less appealing option: use something other than Go that has similar advantages, allows calls from C, and can be integrated. Although it's not very popular, Ada is a possibility. It's long been strong in native code, (restricted) concurrency, reliability, low-level support, cross-language development and IDE (GNAT). Also, Julia is a new language for high performance technical and parallel programming that can be compiled into a library callable from C. It has a JIT too. Maybe changing the problem statement from Ruby+Go to Ruby+(more suitable language) will solve the problem?
As of Go 1.5, there's a new build mode that tells the Go compiler to output a shared library and a C header file:
-buildmode c-shared
(This is explained in more detail in this helpful tutorial: http://blog.ralch.com/tutorial/golang-sharing-libraries/)
With the new build mode, you no longer have to write a C glue layer yourself (as previously suggested in earlier responses). Once you have the shared-library and the header file, you can proceed to use FFI to call the Go-created shared library (example here: https://www.amberbit.com/blog/2014/6/12/calling-c-cpp-from-ruby/)

gcc and reentrant code

Does GCC generate reentrant code for all scenarios ?
No, you must write reentrant code.
Reentrancy is something that ISO C and C++ are capable of by design, so that includes GCC. It is still your responsibility to code the function for reentrancy.
A C compiler that does not generate reentrant code even when a function is coded correctly for reentrancy would be the exception rather than the rule, and would be for reasons of architectural constraint (such as having insufficient resources to support stack, so generating static frames). In these situations the compiler documentation should make this clear.
Some articles you might read:
Jack Ganssle on Reentrancy in 1993
Same author in 2001 on the same subject
No, GCC cannot guarantee reentrancy for code that you write. Here is a good link on writing re-entrant code.
https://www.ibm.com/support/knowledgecenter/en/ssw_aix_71/generalprogramming/writing_reentrant_thread_safe_code.html
Re-entrancy is not something that the compiler has any control over - it's up to the programmer to write re-entrant code. To do this you need to avoid all the obvious pitfalls, e.g. globals (including local static variables), shared resources, threads, calls to other non-reentrant functions, etc.
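The local-static pitfall can be shown with two versions of the same small function. This is a sketch with made-up names (`label_static`, `label_r`); the `_r` suffix follows the naming habit of POSIX functions such as `strtok_r`.

```c
#include <stdio.h>
#include <stddef.h>

/* Not reentrant: every call writes into the same static buffer, so a
   second call (from a signal handler, another thread, or just later
   code) overwrites the result of the first. */
const char *label_static(int id)
{
    static char buf[32];
    snprintf(buf, sizeof buf, "item-%d", id);
    return buf;
}

/* Reentrant: all state lives in storage supplied by the caller. */
int label_r(int id, char *out, size_t out_size)
{
    return snprintf(out, out_size, "item-%d", id);
}
```

The compiler translates both versions faithfully; only the second one is safe to re-enter, and that is a property of the source code, not of the code generator.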
Having said that, some cross-compilers for small embedded systems, e.g. 8051, may not generate reentrant code by default, and you may have to request reentrant code for specific functions via e.g. a #pragma.
GCC generates reentrant code on at least the majority of platforms it compiles for (especially if you avoid passing or returning structures by value) but it is possible that a particular language or platform ABI might dictate otherwise. You'll need to be much more specific for any more conclusive statement to be made; I know it's certainly basically reentrant on desktop processors if the code being compiled is itself basically reentrant (weird global state tricks can get you into trouble on any platform, of course).
No, GCC cannot possibly guarantee re-entrant code that you write.
However, on the major platforms, the code the compiler produces or includes, such as math intrinsics or function calls, is re-entrant. As GCC doesn't support platforms where non-reentrant function calls are common, such as the 8051, there is little risk of a compiler-caused reentrancy issue.
There are GCC ports which have bugs and issues, such as the MSP430 version.

Why is WinAPI so much different from "normal" C?

I wonder why the WinAPI is so much different from "normal" C programming?
I mean, in school I learned that every C program has a main() function (WinAPI uses WinMain with some special parameters), some variable types like int, long, char etc. (WinAPI uses things like LPCSTR, BOOL, etc.) so why did Microsoft decide to go such a different way with their OS API?
When I saw my first WinAPI program, it looked more like a new language to me... ;)
The original Windows API was designed in the 1984-85 time frame, over 25 years ago. Hungarian Notation was all the rage, so putting the type of a variable into the declaration was the thing to do. For example, in pure C, there is no way to indicate a 'far' pointer, which is what the LP in LPCSTR indicates, but in 1985, it was very important to distinguish between regular pointers and far pointers. (That importance went by the wayside when 32-bit windows took over in the mid-90s, but the syntax persists...)
Also, C doesn't really distinguish between just a pointer to a char and a pointer to a static string. Thus the lpsz types.
In the end, it's about bringing a stronger, consistent typing to the parameters than plain C allowed in 1984. As for the WinMain, it's because a Windows program is pretty fundamentally different from a command line program. If you look in the library, you'd probably find a main() function that sets up the parameters and then calls into an extern WinMain function (i.e. yours).
There are two major reasons:
Complexity. The C language is minimal, providing the building blocks on which more complex architectures can be constructed. LPCSTR, BOOL and other types that you find in Win32 are typedefs or structs built on top of C.
Event orientation. C is usually taught assuming that your program is proactive and in control of things. In an event-oriented environment, such as Windows (or any other GUI-based OS), your program is called by the OS, so it usually sits there in a loop waiting for messages to arrive.
The APIs of other GUI-based OSs may feel different to Win32, because there is no single solution, but the problem they are solving is the same one.
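The event-oriented shape can be sketched in portable C without any Windows headers. Everything below (the `Event` struct, the event codes, `run_event_loop`, `count_clicks`) is hypothetical and only mimics the structure of a message loop that hands each message to a handler; it is not the Win32 API.

```c
#include <stddef.h>

/* Hypothetical event codes and message record. */
enum { EV_QUIT = 0, EV_CLICK = 1, EV_KEY = 2 };

typedef struct {
    int type;   /* one of the EV_* codes */
    int param;  /* event-specific payload, e.g. a key code */
} Event;

/* The program supplies a handler; the loop decides when it runs. */
typedef void (*EventHandler)(const Event *ev, void *state);

/* Dispatch queued events until EV_QUIT is seen or the queue is empty.
   Returns the number of events passed to the handler. */
size_t run_event_loop(const Event *queue, size_t count,
                      EventHandler handler, void *state)
{
    size_t handled = 0;
    for (size_t i = 0; i < count; ++i) {
        if (queue[i].type == EV_QUIT)
            break;
        handler(&queue[i], state);
        ++handled;
    }
    return handled;
}

/* Example handler: counts clicks in an int owned by the caller. */
void count_clicks(const Event *ev, void *state)
{
    if (ev->type == EV_CLICK)
        ++*(int *)state;
}
```

In a real GUI program the queue is filled by the OS and the loop blocks while waiting, but the inversion is the same: your code is called when something happens, rather than running top to bottom.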
Microsoft's Raymond Chen writes in his blog:
Although the function WinMain is documented in the Platform SDK, it's not really part of the platform. Rather, WinMain is the conventional name for the user-provided entry point to a Windows program.
The real entry point is in the C runtime library, which initializes the runtime, runs global constructors, and then calls your WinMain function (or wWinMain if you prefer a Unicode entry point).
I'd say most of it is a question of style. The standards grew out of the Unix world, so for example the library functions have short names, and there aren't a whole lot of typedefs. I presume that reflects the choices of the designers of C and Unix. On the other hand Windows has LongFunctionNamesInMixedCase, and LOTSOFTYPEDEFS, *PTYPEDEFSFORPOINTERSTOO.
Some of it is also the perception of necessity. For example WinMain() has things like nCmdShow, because graphical apps will call ShowWindow() and I suppose they wanted to be able to pass the argument to that to a newly launched process. Whether or not that is actually needed might be another question.
And of course, some of the APIs do very different things. In Windows there's a lot of emphasis on passing messages, and processing messages on a per-thread basis. CreateFile() has a lot of flags that the Unix world doesn't have, including sharing modes which determine what another process can do while you have a file open.
They really didn't "go such a different way," as you put it.
WinMain() is simply the entry point looked for by the Windows OS. Conceptually, it's no different than main().
As for the symbol definitions (LPCSTR, BOOL, etc.), part of this is for ease of use. For example, it's shorter to write LPCSTR than const char *. Another example is the BOOL typedef which is not supported by the C language. The other reason is to insulate the developer from changes to the underlying hardware, e.g., the change from 16-bit to 32-bit to 64-bit architectures.
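To make the point that these names are ordinary C underneath, here is a sketch that re-declares a few of them the way the Windows headers essentially define them. It is meant for reading on a non-Windows system: in a real Windows program you get these from `<windows.h>` and must not redefine them. The helper `is_blank` is made up for the example.

```c
/* Illustrative re-declarations; on Windows these come from <windows.h>. */
typedef int           BOOL;     /* plain int used as a truth value */
typedef unsigned long DWORD;    /* 32-bit unsigned on Windows */
typedef char         *LPSTR;    /* "long pointer to string" */
typedef const char   *LPCSTR;   /* "long pointer to constant string" */

#define TRUE  1
#define FALSE 0

/* A function written in the Win32 style: still ordinary C. */
BOOL is_blank(LPCSTR text)
{
    for (; *text != '\0'; ++text) {
        if (*text != ' ' && *text != '\t')
            return FALSE;
    }
    return TRUE;
}
```

Reading `BOOL is_blank(LPCSTR text)` as `int is_blank(const char *text)` removes most of the "new language" feeling.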
By no means should this answer be considered exhaustive. It's just a couple of things that I've noticed from the programming I've done with the Win32/MFC.
Windows API programming is event driven, while, up until that point, most C programming was linear. WinMain() is thus a shortcut into the libraries for writing using OS functionality - while main() is part of the C language.
While we're on the subject, C has few built in types, and, at the time, had few ways of indicating them. The windows "types" (HWND, LPSTR, BOOL, etc) reflect data types commonly used in windows programming, and make an attempt to indicate to the programmer what the data types will be.
The Hungarian notation is a bit of a mis-use of the original versions, in that there are an unnecessary number of qualifiers in many variables.

C Runtime objects, dll boundaries

What is the best way to design a C API for dlls which deals with the problem of passing "objects" which are C runtime dependent (FILE*, pointer returned by malloc, etc...). For example, if two dlls are linked with a different version of the runtime, my understanding is that you cannot pass a FILE* from one dll to the other safely.
Is the only solution to use windows-dependent API (which are guaranteed to work across dlls) ? The C API already exists and is mature, but was designed from a unix POV, mostly (and still has to work on unix, of course).
You asked for a C, not a C++ solution.
The usual method(s) for doing this kind of thing in C are:
Design the module's API to simply not require CRT objects. Get stuff passed across in raw C types, i.e. get the consumer to load the file and simply pass you the pointer. Or get the consumer to pass a fully qualified file name, which is opened, read, and closed internally.
An approach used by other C modules (the MS Cabinet SDK and parts of the OpenSSL library come to mind, iirc) is to get the consuming application to pass pointers to functions into the initialization function. So, any API you pass a FILE* to would at some point during initialization have taken a pointer to a struct with function pointers matching the signatures of fread, fopen etc. When dealing with the external FILE*s, the DLL always uses the passed-in functions rather than the CRT functions.
With some simple tricks like this you can make your C DLL's interface entirely independent of the host's CRT, or in fact of whether the host is written in C or C++ at all.
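The function-pointer approach might look roughly like this. It is a sketch under stated assumptions: the struct `io_callbacks`, `lib_init`, `lib_count_bytes` and the `host_*` adapters are invented names, not the interface of the Cabinet SDK or OpenSSL, and for brevity the "DLL" and the "host" live in one file.

```c
#include <stdio.h>
#include <stddef.h>

/* ---- interface shared by DLL and host: no CRT types appear in it ---- */
typedef struct {
    void  *(*open)(const char *path, const char *mode);
    size_t (*read)(void *buf, size_t size, size_t count, void *stream);
    int    (*close)(void *stream);
} io_callbacks;

/* ---- DLL side ---- */
static io_callbacks g_io;   /* callbacks handed over by the host */

void lib_init(const io_callbacks *cb)
{
    g_io = *cb;
}

/* Counts the bytes in a file using only the host's functions, so the
   stream is created, used and closed by a single CRT (the host's). */
long lib_count_bytes(const char *path)
{
    char buf[256];
    long total = 0;
    size_t n;
    void *stream = g_io.open(path, "rb");
    if (stream == NULL)
        return -1;
    while ((n = g_io.read(buf, 1, sizeof buf, stream)) > 0)
        total += (long)n;
    g_io.close(stream);
    return total;
}

/* ---- host side: thin adapters around the host's own CRT ---- */
static void *host_open(const char *path, const char *mode)
{
    return fopen(path, mode);
}
static size_t host_read(void *buf, size_t size, size_t count, void *stream)
{
    return fread(buf, size, count, (FILE *)stream);
}
static int host_close(void *stream)
{
    return fclose((FILE *)stream);
}

const io_callbacks host_io = { host_open, host_read, host_close };
```

The host calls `lib_init(&host_io)` once at startup. From then on the DLL treats every stream as an opaque `void *` and never touches its own CRT's `FILE` functions on it.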
Neither existing answer is correct. Given the following on Windows: you have two DLLs, each statically linked with a different version of the C/C++ standard library.
In this case, you should not pass pointers to structures created by the C/C++ standard library in one DLL to the other. The reason is that these structures may be different between the two C/C++ standard library implementations.
The other thing you should not do is free, in one DLL, a pointer that was allocated by new or malloc in the other. The heap manager may be implemented differently as well.
Note, you can use the pointers between the DLLs - they just point to memory. It is the free that is the issue.
Now, you may find that this works, but if it does, then you are just lucky. This is likely to cause you problems in the future.
One potential solution to your problem is dynamically linking to the CRT. For example, you could dynamically link to MSVCRT.DLL. That way your DLLs will always use the same CRT.
Note, I suggest that it is not a best practice to pass CRT data structures between DLLs. You might want to see if you can factor things better.
Note, I am not a Linux/Unix expert - but you will have the same issues on those OSes as well.
The problem with the different runtimes isn't solvable, because the FILE* struct belongs to one runtime on a Windows system.
But if you write a small wrapper interface you're done, and it does not really hurt.
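#include <cstddef>
// The DLL implements IFile, so every CRT call on the underlying FILE* stays inside the DLL.
class IFile {
public:
    virtual size_t fwrite(const void* buf, size_t size, size_t count) = 0;
    virtual size_t fread(void* buf, size_t size, size_t count) = 0;
    virtual void destroy() = 0;  // originally "delete", which is a reserved word; frees the object inside the DLL
};
// Factory exported by the DLL (__stdcall is the MSVC spelling of the stdcall convention).
IFile* __stdcall IFileFactory(const char* filename, const char* mode);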
This is safe to pass across DLL boundaries everywhere and does not really hurt.
P.S.: Be careful if you start throwing exceptions across DLL boundaries. This will work quite well on Windows if you fulfil some design criteria, but will fail on some other OSes.
If the C API exists and is mature, bypassing the CRT internally by using pure Win32 API stuff gets you half the way. The other half is making sure the DLL's user uses the corresponding Win32 API functions. This will make your API less portable, in both use and documentation. Also, even if you go this way with memory allocation, where both the CRT functions and the Win32 ones deal with void*, you're still in trouble with the file stuff - Win32 API uses handles, and knows nothing about the FILE structure.
I'm not quite sure what the limitations of FILE* are, but I assume the problem is the same as with CRT allocations across modules. MSVCRT uses Win32 internally to handle the file operations, and the underlying file handle can be used from every module within the same process. What might not work is closing a file that was opened by another module, which involves freeing the FILE structure on a possibly different CRT.
What I would do, if changing the API is still an option, is export cleanup functions for any possible "object" created within the DLL. These cleanup functions will handle the disposal of the given object in the way that corresponds to the way it was created within that DLL. This will also make the DLL absolutely portable in terms of usage. The only worry you'll have then is making sure the DLL's user does indeed use your cleanup functions rather than the regular CRT ones. This can be done using several tricks, which deserve another question...
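A minimal sketch of the cleanup-function idea, assuming a made-up opaque type `lib_buffer`: the DLL both allocates and frees the object, so the caller's CRT never sees the allocation.

```c
#include <stdlib.h>
#include <string.h>

/* Public header part: callers only ever hold a pointer to this type. */
typedef struct lib_buffer lib_buffer;

/* Private part, visible only inside the DLL. */
struct lib_buffer {
    char  *data;
    size_t len;
};

/* Allocates with the DLL's own CRT. Returns NULL on failure. */
lib_buffer *lib_buffer_create(const char *text)
{
    lib_buffer *b = malloc(sizeof *b);
    if (b == NULL)
        return NULL;
    b->len = strlen(text);
    b->data = malloc(b->len + 1);
    if (b->data == NULL) {
        free(b);
        return NULL;
    }
    memcpy(b->data, text, b->len + 1);
    return b;
}

const char *lib_buffer_text(const lib_buffer *b)   { return b->data; }
size_t      lib_buffer_length(const lib_buffer *b) { return b->len; }

/* Exported cleanup: frees with the same CRT that allocated.
   Callers use this instead of calling free() themselves. */
void lib_buffer_destroy(lib_buffer *b)
{
    if (b != NULL) {
        free(b->data);
        free(b);
    }
}
```

Every `lib_buffer_create` is paired with a `lib_buffer_destroy`. Because the struct layout is hidden, the caller has little opportunity to free the pieces with the wrong runtime by accident.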

Resources