Otool - Get file size only

Otool - Get file size only - xcode

I'm using Otool to look into a compiled library (.a) and I want to see what the file size of each component in the binary is. I see that
otool -l [lib.a]
will show me this information but there is also a LOT of other information I do not need. Is there a way I can just see the file size and not everything else? I can't seem to find it if there is.

The size command does that, e.g.,
size lib.a
will show the size of each object stored in the lib.a archive. For example:
$ size libasprintf.a
text data bss dec hex filename
0 0 0 0 0 lib-asprintf.o (ex libasprintf.a)
639 8 1 648 288 autosprintf.o (ex libasprintf.a)
on most systems. OS X format is a little different:
$ size libl.a
__TEXT __DATA __OBJC others dec hex
86 0 0 32 118 76 libl.a(libmain.o)
75 0 0 32 107 6b libl.a(libyywrap.o)
Oddly (though "everyone" implements it), I do not see size on the POSIX site. OS X has a manual page for it.

Related

macOS size command shows a really large number?

> size /bin/ls
__TEXT __DATA __OBJC others dec hex
20480 4096 0 4294983680 4295008256 10000a000
How could it be that ls is 4GB? Is size not meant to be used on executables? I have 4GB ram, so is it just showing me the amount memory it can use?

On macOS, 64-bit apps have a 4GB page zero, by default. Page zero is chunk of the address space starting at address 0 which allows no access. This is what causes access violations when a program dereferences a null pointer.
64-bit Mac programs use a 4GB page zero so that, should any valid pointer get accidentally truncated to 32 bits by a program bug (e.g. cast to int and back to a pointer), it will be invalid and cause a crash as soon as possible. That helps to find and fix such bugs.
The page zero segment in the Mach-O executable file doesn't actually use 4GB on disk. It's just a bit of metadata that tells the kernel and dynamic loader how much address space to reserve for it. It seems that size is including the virtual size of all segments, regardless of whether they take up space on disk or not.
Also, the page zero doesn't consume actual RAM when the program is loaded, either. Again, there's just some bookkeeping data to track the fact that the lower 4GB of the address space is reserved.
The size being reported for "others", 4294983680 bytes, is 0x100004000 in hex. That's the 4GB page zero (0x100000000) plus another 4 pages for some other segments.
You can use the -m option to size to get more detail:
$ size -m /bin/ls
Segment __PAGEZERO: 4294967296
Segment __TEXT: 20480
Section __text: 13599
Section __stubs: 456
Section __stub_helper: 776
Section __const: 504
Section __cstring: 1150
Section __unwind_info: 148
total 16633
Segment __DATA: 4096
Section __got: 40
Section __nl_symbol_ptr: 16
Section __la_symbol_ptr: 608
Section __const: 552
Section __data: 40
Section __bss: 224
Section __common: 140
total 1620
Segment __LINKEDIT: 16384
total 4295008256
You can also use the command otool -lV /bin/ls to see the loader commands of the executable, including the one establishing the __PAGEZERO segment.

The size command outputs information related to some binary executable and how it is running. It is not about the file. The 4Gb number might be (that is just my guess) related to the virtual address space needed to run it.
I don't have a MacOSX operating system (because it is proprietary and tied to hardware that I dislike and find too expensive). But on Linux (which is mostly POSIX, like MacOSX), size /bin/ls gives:
text data bss dec hex filename
124847 4672 4824 134343 20cc7 /bin/ls
while ls -l /bin/ls shows
-rwxr-xr-x 1 root root 138856 Feb 28 16:30 /bin/ls
Of course, when ls is running, it has some data (notably bss) which is not corresponding to a part of the executable
Try man size on your system to get an explanation. For Linux, see size(1) (it gives info about sections of an ELF executable) and ls(1) (it gives the file size).
On MacOSX, executables follow the Mach-O format.
On Linux, if you try size on a non-executable file such as /etc/passwd, you get
size: /etc/passwd: file format not recognized
and I guess that you should have some error message on MacOSX if you try that.
Think of size giving executable size information. The name is historical and a bit misleading.

Disassamble ELF file - debugging area where specific string of binary is loaded

I would like to disassamble / debug an elf file. Is it somehow possible to track the function where a specific string in the elf file is called?
So I mean, I have a string where I know it is used to search for that string in a file. Is it somehow possible with e.g. gdb to debug exactly that position in the executable?
Or is the position of the string in the elf file, somehow visible in the objdump -d output?

In order to do that you need a disassembler - objdump just dumps the info - it might not give you enough information as some analysis is needed before you can tell where it is being used. What you need is to get the XREFs for the string you have in mind.
If you open your binary in the disassembler it will probably have the ability to show you strings that are present in the binary with the ability to jump to the place where the string is being used (it might be multiple places).
I'll showcase this using radare2.
Open the binary (I'll use ls here)
r2 -A /bin/ls
and then
iz
to display all the strings. There's a lot of them so here's an extract
000 0x00004af1 0x100004af1 7 8 (4.__TEXT.__cstring) ascii COLUMNS
001 0x00004af9 0x100004af9 39 40 (4.__TEXT.__cstring) ascii 1#ABCFGHLOPRSTUWabcdefghiklmnopqrstuvwx
002 0x00004b21 0x100004b21 6 7 (4.__TEXT.__cstring) ascii bin/ls
003 0x00004b28 0x100004b28 8 9 (4.__TEXT.__cstring) ascii Unix2003
004 0x00004b31 0x100004b31 8 9 (4.__TEXT.__cstring) ascii CLICOLOR
005 0x00004b3a 0x100004b3a 14 15 (4.__TEXT.__cstring) ascii CLICOLOR_FORCE
006 0x00004b49 0x100004b49 4 5 (4.__TEXT.__cstring) ascii TERM
007 0x00004b60 0x100004b60 8 9 (4.__TEXT.__cstring) ascii LSCOLORS
008 0x00004b69 0x100004b69 8 9 (4.__TEXT.__cstring) ascii fts_open
009 0x00004b72 0x100004b72 28 29 (4.__TEXT.__cstring) ascii %s: directory causes a cycle
let's see where this last one is being used. If we move to the location where it's defined 0x100004b72. We can see this:
;-- str.s:_directory_causes_a_cycle:
; DATA XREF from 0x100001cbe (sub.fts_open_INODE64_b44 + 378)
And here we see where it's being referenced -> DATA XREF. We can move there (s 0x100001cbe) and there we see how it's being used.
⁝ 0x100001cbe 488d3dad2e00. lea rdi, str.s:_directory_causes_a_cycle ; 0x100004b72 ; "%s: directory causes a cycle"
⁝ 0x100001cc5 4c89ee mov rsi, r13
⁝ 0x100001cc8 e817290000 call sym.imp.warnx ;[1]
Having the location you can put a breakpoint there (r2 is also a debugger) or use it in gdb.

How can two 100% identical files have different sizes?

I have two 100% identical empty .sh shell script files on Mac:
encrypt.sh: 299 bytes
decrypt.sh: 13 bytes (Actually this size is correct, since I have 13 bytes: 11 character + two new line)
The contents of encrypt.sh and its hexdump:
The contents of decrypt.sh and its hexdump:
The file info window of encrypt.sh:
The file info window of decrypt.sh:
They have the exact same hexdump, then how is it possible that they have different sizes?

Mac OS X file system is implementing forks, so the larger one is likely having something specific stored in its resource fork.
Use ls -l# to get more details.

Getting symbol names from a Mac app's call stack

I have a stack trace from my Mac App Store app, that I'd like to read to help diagnose a problem the user is experiencing. I have the dSYM file and original archived build, but I do not have a full crash report. All I would like to know is the name of the methods in the stack trace (you can see two of them for MyAppName, below). I have not been able to get lldb or atos to give me this information. This is what the stack trace looks like:
0 CoreFoundation 0x00007fff92fdd25c __exceptionPreprocess + 172
1 libobjc.A.dylib 0x00007fff918dbe75 objc_exception_throw + 43
2 CoreFoundation 0x00007fff92ebb4f5 -[__NSArrayM objectAtIndex:] + 245
3 MyAppName 0x0000000108e91c6b MyAppName + 126059
4 MyAppName 0x0000000108e7556f MyAppName + 9583
5 AppKit 0x00007fff8d883099 -[NSToolbarButton sendAction:to:] + 75
6 AppKit 0x00007fff8d8830e8 -[NSToolbarButton sendAction] + 65
7 AppKit 0x00007fff8d436f0c -[NSToolbarItemViewer mouseDown:] + 4897
8 AppKit 0x00007fff8d352a58 -[NSWindow sendEvent:] + 11296
9 AppKit 0x00007fff8d2f15d4 -[NSApplication sendEvent:] + 2021
10 AppKit 0x00007fff8d1419f9 -[NSApplication run] + 646
11 AppKit 0x00007fff8d12c783 NSApplicationMain + 940
12 libdyld.dylib 0x00007fff87df35fd start + 1
13 ??? 0x0000000000000001 0x0 + 1
To get a symbol (say, for level 3 above), what command can I use? When I was calling lldb, it wasn't even clear if I should be using the hex address or the offset, if that's what the 126059 is on level 3.
Update
According to the atos documentation, it looks like I should invoke it like so:
xcrun atos -arch x86_64 -o MyAppName.app/Contents/MacOS/MyAppName -l <LOADED ADDRESS> 0x0000000108e91c6b
What would I use as the loaded address, though? All I have is what I pasted above. Whether I use 0x0000000000000001, 0x00007fff87df35fd, or leave out -l entirely, I get 0x0000000108e91c6b (the address I specified) printed back to standard out.

In most tools that dump stack traces (particularly CrashReporter) there's a section at the bottom of the report with all the images currently loaded in the program, their UUID's and their load addresses. You should always make sure to get that info along with the stack trace since that's what tells you the load address of the binary, and also will ensure that you have the right version of the debug information, since you can match the UUID against the UUID in the dSYM or binary.
You might be able to use the "Symbol Name + offset" part of the trace to figure out the load address, however. Generally, this last column is the offset of the address in the backtrace from the closest unstripped symbol in that binary. So you just find the address of that symbol in your stored binary, add the offset to that address, and subtract that from the address listed in the third column above. In lldb, you can find the address of a symbol using:
(lldb) image lookup -n <SymbolName>
That calculation will give you the "slide" of the binary from its default load address. Then run lldb on your binary and do:
(lldb) image load -f MyAppName -s <Calculated Slide>
Now you can look up the addresses from MyAppName in the stack trace using:
(lldb) image lookup -va <ADDRESS>
However, main executables are usually totally stripped - since they generally don't provide symbols for use by any other component of the system, so there aren't any symbols left. In that case, I would guess MyAppName in the listing above is just the __TEXT.__text section of the binary, though I'm not 100% sure about that. Anyway if that is right, you can find the default load address of that section by loading the binary in lldb and doing:
(lldb) image dump sections MyAppName
Then do the same calculation listed above.

orthAgogue incorrectly processing BLAST files

Need to recruit the help of any budding bioinformaticians that are lurking in the shadows here.
I am currently in the process of formatting some .fasta files for use in a set of grouping programs but I cannot for the life of me get them to work. First things first, all the files have to have a 3 or 4 character name such as the following:
PP41.fasta
PP59.fasta
PPBD.fasta
...etc...
The files must have headers for each gene sequence that look like so: >xxxx|yyyyyyyyyy where xxxx is the same 3 or 4 letter 'taxon' identifier as the file names I put above and yyyyyyy is a numerical identifier for each of the proteins within each of the taxons (the pipe symbol can also be replaced with an _ as below). I then cat all of these in to one file which has a header that looks correct like so:
>PP49_00001
MIENFNENNDMSDMFWEVEKGTGEVINLVPNTSNTVQPVVLMRLGLFVPTLKSTKRGHQG
EMSSMDATAELRQLAIVKTEGYENIHITGARLDMDNDFKTWVGIIHSFAKHKVIGDAVTL
SFVDFIKLCGIPSSRSSKRLRERLGASLRRIATNTLSFSSQNKSYHTHLVQSAYYDMVKD
TVTIQADPKIFELYQFDRKVLLQLRAINELGRKESAQALYTYIESLPPSPAPISLARLRA
RLNLRSRVTTQNAIVRKAMEQLKGIGYLDYTEIKRGSSVYFIVHARRPKLKALKSSKSSF
KRKKETQEESILTELTREELELLEIIRAEKIIKVTRNHRRKKQTLLTFAEDESQ*
>PP49_00002
MQNDIILPINKLHGLKLLNSLELSDIELGELLSLEGDIKQVSTGNNGIVVHRIDMSEIGS
FLIIDSGESRFVIKAS*
Next step is to construct a blast database which I do as follows, using the formatdb tool of NCBI Blast:
formatdb -i allproteins.fasta -p T -o T
This produces a set of files for the database. Next I conduct an all-vs-all BLAST of the concatenated proteins against the database that I made of them like so, which outputs a tabular file which I suspect is where my issues are beginning to arise:
blastall -p blastp -d allproteins.fasta -i allproteins.fasta -a 6 -F '0 S' -v 100000 -b 100000 -e 1e-5 -m 8 -o plasmid_allvall_blastout
These files have 12 columns and look like the below. It appears correct to me, but my supervisor suspects the error is in the blast file - I don't know what I'm doing wrong however.
PP49_00001 PP51_00025 100.00 354 0 0 1 354 1 354 0.0 552
PP49_00001 PP49_00001 100.00 354 0 0 1 354 1 354 0.0 552
PP49_00001 PPTI_00026 90.28 288 28 0 1 288 1 288 3e-172 476
PP49_00001 PPNP_00026 90.28 288 28 0 1 288 1 288 3e-172 476
PP49_00001 PPKC_00016 89.93 288 29 0 1 288 1 288 2e-170 472
PP49_00001 PPBD_00021 89.93 288 29 0 1 288 1 288 2e-170 472
PP49_00001 PPJN_00003 91.14 79 7 0 145 223 2 80 8e-47 147
PP49_00002 PPTI_00024 100.00 76 0 0 1 76 1 76 3e-50 146
PP49_00002 PPNP_00024 100.00 76 0 0 1 76 1 76 3e-50 146
PP49_00002 PPKC_00018 100.00 76 0 0 1 76 1 76 3e-50 146
SO, this is where the problems really begin. I now pass the above file to a program called orthAgogue which analyses the paired sequences I have above using parameters laid out in the manual (still no idea if I'm doing anything wrong) - all I know is the several output files that are produced are all just nonsense/empty.
Command looks like so:
orthAgogue -i plasmid_allvsall_blastout -t 0 -p 1 -e 5 -O .
Any and all ideas welcome! (Hope I've covered everything - sorry about the long post!)
EDIT Never did manage to find a solution to this. Had to use an alternative piece of software. If admins wish to close this please do, unless it is worth having open for someone else (though I suspect its a pretty niche issue).

Discovered this issue (of orthAgogue) first today:
though my reply may be old, I hope it may help future users;
issue is due to a missing parameter: seems like you forgot to specify the separator: -s '_', ie, the following set of command-line parameters should do the trick*:
orthAgogue -i plasmid_allvsall_blastout -t 0 -p 1 -e 5 -O -s '_'
(* Under the assumption that your input-file is a tabular-seperated file of columns.)
A brief update after comment made by Joe:
In brief, the problem described in the intiail error report (by Joe) is (in most cases) not a bug. Instead it is one of the core properties of the Inparanoid algorithm which orthAgogue implements: if your ortholog-result-file is empty (though constructed), this (in most cases) implies that there are no reciprocal best match between a protein-pair from two different taxa/species.
One (of many) explanations for this could be that your blastp-scores are too similar, a case where I would suggest a combined tree-based/homology clustering as in TREEFAM.
Therefore, when I receive your data, I'll send it to one of the biologists I'm working with, with goal of identifying the tool proper for your data: hope my last comment makes your day ;)
Ole Kristian Ekseth, developer of orthAgogue

Develop Reference

ruby bash windows laravel spring algorithm oracle macos go visual-studio