How to call C++ code in llvm ... IRBuilder? - c++11

I am using LLVM to build a simple language along the lines demonstrated in the LLVM kaleidoscope documentation. However the doc omits two important things (at least):
how to call C++ code
how to do strings
You can imagine a custom language allows constructs like,
String str("Hello World");
String str1(" and StackOverflow");
str.append(str1);
Since LLVM ships with SmallString, the obvious thing to do is treat the user type String as a llvm::SmallString and emit code for it. But how to do this? I see very little sense in creating a String class from scratch provided there's a way to invoke those C++ calls from LLVM. Would one use IRBuilder to build out the C++ calls?
Another option is to create C++ file containing the equivalent C++ code for the user provided code in terms of llvm::SmallString intermixed with inlined IR code ala the kaleidoscope demo ... and pass the whole thing to Clang/LLVM to compile down to object code. But Clang/LLVM doesn't allow intermixing this way.
Putting aside technicalities of creating static strings that go into the initializer how do people do this? Once this is sorted out for strings, most likely the approach would work for sets, hashmaps, vectors, and all the various other containers LLVM ships with.
LLVM is a strange, cool thing. The things I need are right there, and not readily accessible based on any documentation I can find.
I want a considered approach because even this simple hello world program generates 81 lines of IR code. Replace the llvm::SimpleString with a C-string and the resulting IR code is smaller almost by an order of 10:
#include <llvm/ADT/SmallString.h>
int main(int argc, char **argv)
{
llvm::SmallString<32> str("hello world\n");
printf("%s\n", str.c_str());
return 0;
}
That's a lot of code to hand assemble and we're no where close to supporting all the methods doable for strings.
Thanks.

Related

Storing pairs in a GCC rope with c++11

I'm using a GCC extension rope to store pairs of objects in my program and am running into some C++11 related trouble. The following compiles under C++98
#include <ext/rope>
typedef std::pair<int, int> std_pair;
int main()
{
__gnu_cxx::rope<std_pair> r;
}
but not with C++11 under G++ 4.8.2 or 4.8.3.
What happens is that the uninitialised_copy_n algorithm is pulled in from two places, the ext/memory and the C++11 version of the memory header. The gnu_cxx namespace is pulled in by rope and the std namespace is pulled in by pair and there are now two identically defined methods in scope leading to a compile error.
I assume this is a bug in a weird use case for a rarely used library but what would be the correct fix? You can't remove the function from ext/memory to avoid breaking existing code and it now required to be in std. I've worked around it using my own pair class but how should this be fixed properly?
If changing the libstdc++ headers is an option (and I asked in the comments whether you were looking for a way to fix it in libstdc++, or work around it in your program), then the simple solution, to me, seems to be to make sure there is only one uninitialized_copy_n function. ext/memory already includes <memory>, which provides std::uninitialized_copy_n. So instead of defining __gnu_cxx::uninitialized_copy_n, it can have using std::uninitialized_copy_n; inside the __gnu_cxx namespace. It can even conditionalize this on C++11 support, so that pre-C++11 code gets the custom implementation of those functions, and C++11 code gets the std implementation of those functions.
This way, code that attempts to use __gnu_cxx::uninitialized_copy_n, whether directly or through ADL, will continue to work, but there is no ambiguity between std::uninitialized_copy_n and __gnu_cxx::uninitialized_copy_n, because they are the very same function.

Purpose of using Windows Data Types in a program

I am trying to understand the purpose of using Windows Data Types when defining parameters of a function/structure fields in a particular language. I've read explanations detailing how this prevents code from "breaking" if "underlying types" are changed. Can some one present a concise explanation and example to clarify? Thanks.
Found answer in a similar post (Why are the standard datatypes not used in Win32 API?):
And the reason that these types are defined the way they are, rather than using int, char and so on is that it removes the "whatever the compiler thinks an int should be sized as" from the interface of the OS. Which is a very good thing, because if you use compiler A, or compiler B, or compiler C, they will all use the same types - only the library interface header file needs to do the right thing defining the types.
By defining types that are not standard types, it's easy to change int from 16 to 32 bit, for example. The first C/C++ compilers for Windows were using 16-bit integers. It was only in the mid to late 1990's that Windows got a 32-bit API, and up until that point, you were using int that was 16-bit. Imagine that you have a well-working program that uses several hundred int variables, and all of a sudden, you have to change ALL of those variables to something else... Wouldn't be very nice, right - especially as SOME of those variables DON'T need changing, because moving to a 32-bit int for some of your code won't make any difference, so no point in changing those bits.
It should be noted that WCHAR is NOT the same as const char - WCHAR is a "wide char" so wchar_t is the comparable type.
So, basically, the "define our own type" is a way to guarantee that it's possible to change the underlying compiler architecture, without having to change (much of the) source code. All larger projects that do machine-dependant coding does this sort of thing.

Equivalent for GCC's naked attribute

I've got an application written in pure C, mixed with some functions that contain pure ASM. Naked attribute isn't available for x86 (why? why?!) and my asm functions don't like when prologue and epilogue is messing with the stack. Is it somehow possible to create a pure assembler function that can be referenced from C code parts? I simply need the address of such ASM function.
Just use asm() outside a function block. The argument of asm() is simply ignored by the compiler and passed directly on to the assembler. For complex functions a separate assembly source file is the better option to avoid the awkward syntax.
Example:
#include <stdio.h>
asm("_one: \n\
movl $1,%eax \n\
ret \n\
");
int one();
int main() {
printf("result: %d\n", one());
return 0;
}
PS: Make sure you understand the calling conventions of your platform. Many times you can not just copy/past assembly code.
PPS: If you care about performance, use extended asm instead. Extended asm essentially inlines the assembly code into your C/C++ code and is much faster, especially for short assembly functions. For larger assembly functions a seperate assembly source file is preferable, so this answer is really a hack for the rare case that you need a function pointer to a small assembly function.
Good news everyone. GCC developers finally implemented attribute((naked)) for x86. The feature will be available in GCC 8.
Certainly, just create a .s file (assembly source), which is run through gas (the assembler) to create a normal object file.

Where Is gcvt or gcvtf Defined in gcc Source Code?

I'm working on some old source code for an embedded system on an m68k target, and I'm seeing massive memory allocation requests sometimes when calling gcvtf to format a floating point number for display. I can probably work around this by writing my own substitute routine, but the nature of the error has me very curious, because it only occurs when the heap starts at or above a certain address, and it goes away if I hack the .ld linker script or remove any set of global variables (which are placed before the heap in my memory map) that add up to enough byte size so that the heap starts below the mysterious critical address.
So, I thought I'd look in the gcc source code for the compiler version I'm using (m68k-elf-gcc 3.3.2). I downloaded what appears to be the source for this version at http://gcc.petsads.us/releases/gcc-3.3.2/, but I can't find the definition for gcvt or gcvtf anywhere in there. When I search for it, grep only finds some documentation and .h references, but not the definition:
$ find | xargs grep gcvt
./gcc/doc/gcc.info: C library functions `ecvt', `fcvt' and `gcvt'. Given va
lid
./gcc/doc/trouble.texi:library functions #code{ecvt}, #code{fcvt} and #code{gcvt
}. Given valid
./gcc/sys-protos.h:extern char * gcvt(double, int, char *);
So, where is this function actually defined in the source code? Or did I download the entirely wrong thing?
I don't want to change this project to use the most recent gcc, due to project stability and testing considerations, and like I said, I can work around this by writing my own formatting routine, but this behavior is very confusing to me, and it will grind my brain if I don't find out why it's acting so weird.
Wallyk is correct that this is defined in the C library rather than the compiler. However, the GNU C library is (nearly always) only used with Linux compilers and distributions. Your compiler, being a "bare-metal" compiler, almost certainly uses the Newlib C library instead.
The main website for Newlib is here: http://sourceware.org/newlib/, and this particular function is defined in the newlib/libc/stdlib/efgcvt.c file. The sources have been quite stable for a long time, so (unless this is a result of a bug) chances are pretty good that the current sources are not too different from what your compiler is using.
As with the GNU C source, I don't see anything in there that would obviously cause this weirdness that you're seeing, but it's all eventually a bunch of wrappers around the basic sprintf routines.
It is in the GNU C library as glibc/misc/efgcvt.c. To save you some trouble, the code for the function is:
char *
__APPEND (FUNC_PREFIX, gcvt) (value, ndigit, buf)
FLOAT_TYPE value;
int ndigit;
char *buf;
{
sprintf (buf, "%.*" FLOAT_FMT_FLAG "g", MIN (ndigit, NDIGIT_MAX), value);
return buf;
}
The directions for obtain glibc are here.

Looking for C source code for snprintf()

I need to port snprintf() to another platform that does not fully support GLibC.
I am looking for the underlying declaration in the Glibc 2.14 source code. I follow many function calls, but get stuck on vfprintf(). It then seems to call _IO_vfprintf(), but I cannot find the definition. Probably a macro is obfuscating things.
I need to see the real C code that scans the format string and calculates the number of bytes it would write if input buffer was large enough.
I also tried looking in newlib 1.19.0, but I got stuck on _svfprintf_r(). I cannot find the definition anywhere.
Can someone point me to either definition or another one for snprintf()?
I've spent quite a while digging the sources to find _svfprintf_r() (and friends) definitions in the Newlib. Since OP asked about it, I'll post my finding for the poor souls who need those as well. The following holds true for Newlib 1.20.0, but I guess it is more or less the same across different versions.
The actual sources are located in the vfprintf.c file. There is a macro _VFPRINTF_R set to one of _svfiprintf_r, _vfiprintf_r, _svfprintf_r, or _vfprintf_r (depending on the build options), and then the actual implementation function is defined accordingly:
int
_DEFUN(_VFPRINTF_R, (data, fp, fmt0, ap),
struct _reent *data _AND
FILE * fp _AND
_CONST char *fmt0 _AND
va_list ap)
{
...
http://www.ijs.si/software/snprintf/ has what they claim is a portable implementation of snprintf, including vsnprintf.c, asnprintf, vasnprintf, asprintf, vasprintf. Perhaps it can help.
The source code of the GNU C library (glibc) is hosted on sourceware.org.
Here is a link to the implementation of vfprintf(), which is called by snprintf():
https://sourceware.org/git/?p=glibc.git;a=blob;f=stdio-common/vfprintf.c

Resources