Mac M1 `cp`ing binary over another results in crash - macos

Recently, I've been observing an issue that happens after copying a binary file over another binary file without first deleting it on my M1. After some experimentation (after hitting this issue), I've come up with a reproducible method of hitting this issue on Apple's new hardware on the latest 11.3 release of Big Sur.
The issue happens when copying a differing binary over another binary after both have been run at least once. I'm not sure what is causing it, but it's very perplexing and could potentially lead to some security issues.
For example, this produces the error:
> ./binaryA
# output A
> ./binaryB
# output B
> cp binaryA binaryB
> ./binaryB
Killed: 9
Setup
In order to reproduce the above behavior, we can create two simple C files with the following contents:
// binaryA.c
#include <stdio.h>

int main() {
    printf("Hello world!");
}
// binaryB.c
#include <stdio.h>

const char s[] = "Hello world 123!"; // to make sizes differ for clarity

int main() {
    printf("%s", s);
}
Now, you can run the following commands and get the error described (running each binary at least once beforehand is necessary to reproduce the issue):
> gcc -o binaryA binaryA.c
> gcc -o binaryB binaryB.c
> ./binaryA
Hello world!
> ./binaryB
Hello world 123!
> cp binaryA binaryB
> ./binaryB
Killed: 9
As you can see, the binaryB binary no longer works. For all intents and purposes, the two binaries are equal but one runs and one doesn't. A diff of both binaries returns nothing.
I'm assuming this is some sort of signature issue? But it shouldn't be, because neither binary is signed anyway.
Does anyone have a theory behind this behavior or is it a bug? Also, if it is a bug, where would I even file this?

Whenever you update a signed file, you need to create a new file.
Specifically, the code signing information (code directory hash) is hung off the vnode within the kernel, and modifying the file behind that cache will cause problems. You need a new vnode, which means a new file, that is, a new inode. Documented in WWDC 2019 Session 703 All About Notarization - see slide 65.
This is because Big Sur on the ARM-based M1 requires all code to be validly signed (even if only ad hoc), or the operating system will not execute it, killing it on launch instead.
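A minimal sketch of the workaround this implies: force a new file (and thus a new inode) instead of overwriting in place. Plain text files stand in for the compiled binaries here so the steps are runnable anywhere; on an M1 the alternative is re-signing the overwritten file ad hoc.

```shell
# Stand-ins for the compiled binaries from the question:
printf 'contents of A' > binaryA
printf 'contents of B' > binaryB

# Overwriting in place (cp binaryA binaryB) keeps the old inode, and with it
# the kernel's stale code-signature cache. Removing the destination first
# forces cp to create a new file, hence a new inode and a fresh cache:
rm binaryB && cp binaryA binaryB

cat binaryB   # -> contents of A
```

On an M1, `rm binaryB && cp binaryA binaryB` followed by `./binaryB` should run where the in-place overwrite was killed; the other route is re-signing the overwritten file with `codesign -s - -f binaryB`.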

While Trev's answer is technically correct (the best kind of correct?), the likely answer is also that this is a bug in cp - or at least an oversight in the interaction between cp and the security sandbox - which causes a bad user experience (and bad UX == bug in my book, no matter the intention).
I'm going to take a wild guess (best kind of guess!) and posit that when this was first implemented, someone hooked into the inode deletion as a trigger for resetting the binary signature state. It is very possible that, at the time that they implemented this, cp actually removed/destructively replaced the vnode/inode as part of the copy, so everything worked great. Then, at some point, someone else went and optimized cp to no longer be a destructive inode operation - and this is how the best bugs come to be!
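The non-destructive behavior is easy to observe: cp onto an existing file truncates and rewrites it rather than replacing it, so the inode number does not change. A quick check (file names here are arbitrary):

```shell
printf 'old' > dst.txt
before=$(ls -i dst.txt | awk '{print $1}')

printf 'new' > src.txt
cp src.txt dst.txt   # overwrites dst.txt in place

after=$(ls -i dst.txt | awk '{print $1}')
[ "$before" = "$after" ] && echo "same inode"   # -> same inode
```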

Related

Output for CLion IDE sometimes cuts off when executing a program

When using CLion I have found the output sometimes cuts off.
For example when running the code:
main.cpp
#include <stdio.h>

int main() {
    int i;
    for (i = 0; i < 1000; i++) {
        printf("%d\n", i);
    }
    fflush(stdout); // Shouldn't be needed as each line ends with "\n"
    return 0;
}
Expected Output
The expected output is obviously the numbers 0-999, each on a new line.
Actual Output
After executing the code multiple times within CLion, the output often changes:
Sometimes it executes perfectly and shows all the numbers 0-999
Sometimes it cuts off at different points (e.g. 0-840)
Sometimes it doesn't output anything
The return code is always 0!
Running the code in a terminal (i.e. not in CLion itself)
However, the code outputs the numbers 0-999 perfectly when compiling and running the code using the terminal.
I have spent so much time on this thinking it was a problem with my code and a memory issue until I finally realised that this was just an issue with CLion.
OS: Ubuntu 14.04 LTS
Version: 2016.1
Build: #CL-145.258
Update
The consensus is that this is an IDE issue, so I have reported the bug.
A suitable workaround is to execute the code in debug mode (no breakpoint required) - thanks to @olaf.
I will update this question as soon as the bug is fixed.
Update 1
WARNING: You should not change information in the registry unless you have been asked to specifically by JetBrains. The registry is not in the main menu for a reason! Use the following solution at your own risk!
JetBrains have contacted me and provided a suitable solution:
Go to the Find Action Dialog box (CTRL+SHIFT+A)
Search for "Registry..."
Untick run.processes.with.pty
Should then work fine!
Update 2
The bug has been added here:
https://youtrack.jetbrains.com/issue/CPP-6254
Feel free to upvote it!

Python subprocess Popen: Send binary data to C++ on Windows

After three days of intensive googling and stackoverflowing, I have more or less got my program to work. I tried a lot of things and found a lot of answers somehow connected to my problem, but no working solution. Sorry if I missed the right page! I'm looking forward to comments and recommendations.
Task:
Send binary data (floats) from python to C++ program, get few floats back
Data is going to be 20 ms of soundcard input; latency is a bit critical
Platform: Windows (only due to drivers for the soundcard...)
Popen with pipes, but without communicate, because I want to keep the C++ program opened
The whole thing worked just fine on Ubuntu with test data. On Windows I ran into the binary stream problem: in text mode, Windows treats a stray 0x1A (Ctrl-Z) byte in the float stream as EOF, and it can appear anywhere in binary data. Then everything freezes, waiting for input data that is stuck behind the "EOF" wall. Or so I picture it.
In the end these two things were necessary:
#include <io.h>
#include <fcntl.h>
and
if (_setmode(_fileno(stdin), _O_BINARY) == -1) {
    cout << "binary mode problem" << endl;
    return 1;
}
in C++ as described here: https://msdn.microsoft.com/en-us/library/aa298581%28v=vs.60%29.aspx.
cin.ignore() freezes in binary mode! Presumably because there's no EOF anymore. I did not try/think about this too thoroughly, though.
cin.read(mem,sizeof(float)*length) does the job, since I know the length of the data stream
Compiled with MinGW
and in the Python code same thing! (forgot this first, cost me a day):
if sys.platform.startswith("win"):  # note: .find("win") would wrongly match "darwin" too
    import msvcrt, os
    process = subprocess.Popen("cprogram.exe", stdin=subprocess.PIPE, stdout=subprocess.PIPE, bufsize=2**12)
    msvcrt.setmode(process.stdin.fileno(), os.O_BINARY)
and
process.stdin.write(data.tostring())
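Putting the pieces above together, here is a sketch of the Python side (the helper name send_floats is mine, and little-endian float32 packing is an assumption about what the C++ program expects; `cat` stands in for `cprogram.exe` in the demo):

```python
import struct
import subprocess
import sys

def send_floats(floats, argv=("cprogram.exe",)):
    """Write floats as raw bytes to a child process's stdin and read them back."""
    proc = subprocess.Popen(
        list(argv),
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        bufsize=2**12,
    )
    if sys.platform.startswith("win"):
        # Crucial on Windows: take both pipes out of text mode so no byte
        # is ever treated as EOF or newline-translated.
        import msvcrt, os
        msvcrt.setmode(proc.stdin.fileno(), os.O_BINARY)
        msvcrt.setmode(proc.stdout.fileno(), os.O_BINARY)

    proc.stdin.write(struct.pack("<%df" % len(floats), *floats))
    # For a long-running child you would flush and keep stdin open;
    # closing here just lets the demo terminate cleanly.
    proc.stdin.close()
    raw = proc.stdout.read(4 * len(floats))
    proc.wait()
    return list(struct.unpack("<%df" % len(floats), raw))

# Demo with `cat` echoing the bytes straight back (Unix only):
if not sys.platform.startswith("win"):
    print(send_floats([1.0, 2.5, -3.25], argv=("cat",)))  # [1.0, 2.5, -3.25]
```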

Cygwin 64-bit C compiler caching funny (and ending early)

We've been using CygWin (/usr/bin/x86_64-w64-mingw32-gcc) to generate Windows 64-bit executable files and it has been working fine through yesterday. Today it stopped working in a bizarre way--it "caches" standard output until the program ends. I wrote a six line example
that did the same thing. Since we use the code in batch, I wouldn't worry except when I run a test case on the now-strangely-caching executable, it opens the output files, ends early, and does not fill them with data. (The same code on Linux works fine, but these guys are using Windows.) I know it's not gorgeous code, but it demonstrates my problem, printing the numbers "1 2 3 4 5 6 7 8 9 10" only after I press the key.
#include <stdio.h>

int main(void)
{
    char q[256];
    int i;

    for (i = 1; i <= 10; i++)
        printf("%d ", i);
    gets(q);
    printf("\n");
    return 0;
}
Does anybody know enough CygWin to help me out here? What do I try? (I don't know how to get version numbers--I did try to get them.) I found a 64-bit cygwin1.dll in /cygdrive/c/cygwin64/bin and that didn't help a bit. The 32-bit gcc compilation works fine, but I need 64-bit to work. Any suggestions will be appreciated.
Edit: we found and corrected an unexpected error in the original code that caused the program not to populate the output files. At this point, the remaining problem is that cygwin won't show the output of the program.
For months, the 64-bit executable has properly generated the expected output, just as the 32-bit version did. Just today, it started exhibiting the "caching" behavior described above. The program sends many hundreds of lines with many newline characters through stdout. Now, when the 64-bit executable is created as above, none of these lines are shown until the program completes, and the entire output is printed at once. Can anybody provide insight into this problem?
This is quite normal. printf outputs to stdout which is a FILE* and is normally line buffered when connected to a terminal. This means you will not see any output until you write a newline, or the internal buffer of the stdout FILE* is full (A common buffer size is 4096 bytes).
If you write to a file or pipe, output might be fully buffered, in which case output is flushed when the internal buffer is full and not when you write a newline.
In all cases, the buffers of a FILE* are flushed when you call fflush(), when you call fclose(), or when the program ends normally.
Your program will behave the same on windows/cygwin as on linux.
You can add a call to fflush(stdout) to see the output immediately.
for (i = 1; i <= 10; i++) {
    printf("%d ", i);
    fflush(stdout);
}
Also, do not use the gets() function.
If your real program "ends early" and does not write the data to the text files it's supposed to, it may be crashing due to a bug of yours before it finishes, in which case the buffered output is never flushed. Or, less likely, you call the _exit() function, which terminates the program without flushing FILE* buffers (in contrast to the exit() function).

Same C code producing different results on Mac OS X than Windows and Linux

I'm working with an older version of OpenSSL, and I'm running into some behavior that has stumped me for days when trying to work with cross-platform code.
I have code that calls OpenSSL to sign something. My code is modeled after the code in ASN1_sign, which is found in a_sign.c in OpenSSL, which exhibits the same issues when I use it. Here is the relevant line of code (which is found and used exactly the same way in a_sign.c):
EVP_SignUpdate(&ctx,(unsigned char *)buf_in,inl);
ctx is a structure that OpenSSL uses, not relevant to this discussion
buf_in is a char* of the data that is to be signed
inl is the length of buf_in
EVP_SignUpdate can be called repeatedly in order to read in data to be signed before EVP_SignFinal is called to sign it.
Everything works fine when this code is used on Ubuntu and Windows 7, both of them produce the exact same signatures given the same inputs.
On OS X, if the size of inl is less than 64 (that is there are 64 bytes or less in buf_in), then it too produces the same signatures as Ubuntu and Windows. However, if the size of inl becomes greater than 64, it produces its own internally consistent signatures that differ from the other platforms. By internally consistent, I mean that the Mac will read the signatures and verify them as proper, while it will reject the signatures from Ubuntu and Windows, and vice versa.
I managed to fix this issue, and cause the same signatures to be created by changing that line above to the following, where it reads the buffer one byte at a time:
int input_it;
for (input_it = (int)buf_in; input_it < inl + (int)buf_in; input_it++) {
    EVP_SignUpdate(&ctx, (unsigned char *)input_it, 1);
}
This causes OS X to reject its own signatures of data > 64 bytes as invalid, and I tracked down a similar line elsewhere for verifying signatures that needed to be broken up in an identical manner.
This fixes the signature creation and verification, but something is still going wrong, as I'm encountering other problems, and I really don't want to go traipsing (and modifying!) much deeper into OpenSSL.
Surely I'm doing something wrong, as I'm seeing the exact same issues when I use stock ASN1_sign. Is this an issue with the way that I compiled OpenSSL? For the life of me I can't figure it out. Can anyone educate me on what bone-headed mistake I must be making?
This is likely a bug in the MacOS implementation. I recommend you file a bug by sending the above text to the developers as described at http://www.openssl.org/support/faq.html#BUILD17
There are known issues with OpenSSL on the mac (you have to jump through a few hoops to ensure it links with the correct library instead of the system library). Did you compile it yourself? The PROBLEMS file in the distribution explains the details of the issue and suggests a few workarounds. (Or if you are running with shared libraries, double check that your DYLD_LIBRARY_PATH is correctly set). No guarantee, but this looks a likely place to start...
The most common issue porting Windows and Linux code around is the default value of uninitialized memory. I think Windows sets it to 0xDEADBEEF and Linux sets it to 0s.

How can I override malloc(), calloc(), free() etc. under OS X?

Assuming the latest Xcode and GCC, what is the proper way to override the memory allocation functions (I guess operator new/delete as well)? The debugging memory allocators are too slow for a game; I just need some basic stats I can collect myself with minimal impact.
I know it's easy in Linux due to the hooks, and this was trivial under CodeWarrior ten years ago when I wrote HeapManager.
Sadly smartheap no longer has a mac version.
I would use library preloading for this task, because it does not require modification of the running program. If you're familiar with the usual Unix way to do this, it's almost a matter of replacing LD_PRELOAD with DYLD_INSERT_LIBRARIES.
First step is to create a library with code such as this, then build it using regular shared library linking options (gcc -dynamiclib):
#include <dlfcn.h>
#include <stdio.h>

void *malloc(size_t size)
{
    void *(*real_malloc)(size_t);

    real_malloc = dlsym(RTLD_NEXT, "malloc");
    fprintf(stderr, "allocating %lu bytes\n", (unsigned long)size);
    /* Do your stuff here */
    return real_malloc(size);
}
Note that if you also divert calloc() and its implementation calls malloc(), you may need additional code to check how you're being called. C++ programs should be pretty safe because the new operator calls malloc() anyway, but be aware that no standard enforces that. I have never encountered an implementation that didn't use malloc(), though.
Finally, set up the running environment for your program and launch it (might require adjustments depending on how your shell handles environment variables):
export DYLD_INSERT_LIBRARIES=./yourlibrary.dylib
export DYLD_FORCE_FLAT_NAMESPACE=1
yourprogram --yourargs
See the dyld manual page for more information about the dynamic linker environment variables.
This method is pretty generic. There are limitations, however:
You won't be able to divert direct system calls
If the application itself tricks you by using dlsym() to load malloc's address, the call won't be diverted. Unless, however, you trick it back by also diverting dlsym!
The malloc_default_zone technique mentioned at http://lists.apple.com/archives/darwin-dev/2005/Apr/msg00050.html appears to still work, see e.g. http://code.google.com/p/fileview/source/browse/trunk/fileview/fv_zone.cpp?spec=svn354&r=354 for an example use that seems to be similar to what you intend.
After much searching (here included) and issues with 10.7 I decided to write a blog post about this topic: How to set malloc hooks in OSX Lion
You'll find a few good links at the end of the post with more information on this topic.
The basic solution:
malloc_zone_t *dz = malloc_default_zone();

if (dz->version >= 8)
{
    // remove the write protection
    vm_protect(mach_task_self(), (uintptr_t)malloc_zones, protect_size, 0, VM_PROT_READ | VM_PROT_WRITE);
}
original_free = dz->free;
dz->free = &my_free; // this line throws a bad ptr exception without calling vm_protect first
if (dz->version == 8)
{
    // put the write protection back
    vm_protect(mach_task_self(), (uintptr_t)malloc_zones, protect_size, 0, VM_PROT_READ);
}
This is an old question, but I came across it while trying to do this myself. I got curious about this topic for a personal project I was working on, mainly to make sure that what I thought was automatically deallocated was being properly deallocated. I ended up writing a C++ implementation to allow me to track the amount of allocated heap and report it out if I so chose.
https://gist.github.com/monitorjbl/3dc6d62cf5514892d5ab22a59ff34861
As the name notes, this is OSX-specific. However, I was able to do this on Linux environments using the malloc_usable_size function.
Example
#define MALLOC_DEBUG_OUTPUT
#include "malloc_override_osx.hpp"
int main(){
int* ip = (int*)malloc(sizeof(int));
double* dp = (double*)malloc(sizeof(double));
free(ip);
free(dp);
}
Building
$ clang++ -isysroot /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.11.sdk \
-pipe -stdlib=libc++ -std=gnu++11 -g -o test test.cpp
$ ./test
0x7fa28a403230 -> malloc(16) -> 16
0x7fa28a403240 -> malloc(16) -> 32
0x7fa28a403230 -> free(16) -> 16
0x7fa28a403240 -> free(16) -> 0
Hope this helps someone else out in the future!
If the basic stats you need can be collected in a simple wrapper, a quick (and kinda dirty) trick is just using some #define macro replacement.
void* _mymalloc(size_t size)
{
    void* ptr = malloc(size);
    /* do your stat work? */
    return ptr;
}
and
#define malloc(sz_) _mymalloc(sz_)
Note: if the macro is defined before the _mymalloc definition, it will end up replacing the malloc call inside that function, leaving you with infinite recursion - so ensure this isn't the case. You might want to explicitly #undef it before that function's definition and simply (re)define it afterward, depending on where you end up including it.
I think if you define a malloc() and free() in your own .c file included in the project the linker will resolve that version.
Now then, how do you intend to implement malloc?
Check out Emery Berger's -- the author of the Hoard memory allocator's -- approach for replacing the allocator on OSX at https://github.com/emeryberger/Heap-Layers/blob/master/wrappers/macwrapper.cpp (and a few other files you can trace yourself by following the includes).
This is complementary to Alex's answer, but I thought this example was more to-the-point of replacing the system provided allocator.
