Assume a simple hello world in C, compiled using gcc -c to an object file and disassembled using objdump will looks like this:
_main:
0: 55 pushq %rbp
1: 48 89 e5 movq %rsp, %rbp
4: c7 45 fc 00 00 00 00 movl $0, -4(%rbp)
b: c7 45 f8 05 00 00 00 movl $5, -8(%rbp)
12: 8b 05 00 00 00 00 movl (%rip), %eax
As you can see the memory addresses are 0, 1, 4, .. and so on. They are not actual addresses.
Linking the object file and disassembling it looks like this:
_main:
100000f90: 55 pushq %rbp
100000f91: 48 89 e5 movq %rsp, %rbp
100000f94: c7 45 fc 00 00 00 00 movl $0, -4(%rbp)
100000f9b: c7 45 f8 05 00 00 00 movl $5, -8(%rbp)
100000fa2: 8b 05 58 00 00 00 movl 88(%rip), %eax
My question is, is 100000f90 an actual address of a byte of virtual memory or is it an offset?
How can the linker give an actual address prior to execution? What if that memory address isn't available when executing? What if I execute it on another machine with much less memory (maybe paging kicks in here).
Is't it the job of the loader to assign actual addresses?
Is the linker generating actual addresses for he final executable file?
(The following answers assume that the linker is not creating a position-independent executable.)
My question is, is 100000f90 an actual address of a byte of virtual memory or is it an offset?
It's the actual virtual address. Strictly speaking, it is the offset from the base of the code segment, but since modern operating systems always set the base of the code segment to 0, it is effectively the actual virtual address.
How can the linker give an actual address prior to execution? What if that memory address isn't available when executing? What if I execute it on another machine with much less memory (maybe paging kicks in here).
Each process gets its own separate virtual address space. Because it is virtual memory, the amount of physical memory in the machine doesn't matter. Paging is the process by which virtual addresses get mapped to physical address.
Isn't it the job of the loader to assign actual addresses?
Yes, when creating a process, the operating system loader allocates physical page frames for the process and maps the pages into the process's virtual address space. But the virtual addresses are those assigned by the linker.
Does a linker generate absolute virtual addresses when linking
It depends upon the linker setting and the input source. For general programming, linkers usually strive to create position independent code.
My question is, is 100000f90 an actual address of a byte of virtual memory or is it an offset?
It is most likely an offset.
How can the linker give an actual address prior to execution?
Think about the loader for an operating system. It expects things to be in specific address locations. Any decent linker will allow the programmer to specify absolute addresses some way.
What if that memory address isn't available when executing? What if I execute it on another machine with much less memory (maybe paging kicks in here).
That's the problem with position-dependent code.
Is't it the job of the loader to assign actual addresses?
The job of the loader is to follow the instructions given to it in the executable file. In creating the executable, the linker can specify addresses or defer to the loader in some cases.
I have a Felica card. The first question is what actually is this card? Is it a Smart Card or it is a simple memory card? Is it a kind of Java Card and can I load .cap files inside or it has its proprietary fixed contents and I can't load any applet? Is it GlobalPlatform standard complaint?
I read here that:
Sony’s proprietary FeliCa is a smartcard technology that is similar to
ISO/IEC
14443. FeliCa has a file system similar to that defined in ISO/IEC 7816-4. The file system and commands for access to the file system are
standardized in JIS X 6319-4 [28]. In addition, the FeliCa system has
proprietary cryptography and security features.
After that I tried to send some APDU commands to it. The first step was do some configuration changes with the reader. Because my reader is configured to read ISO14443 Type A and Type B cards and not Felica cards.
As both Felica and ISO/IEC 14443 cards use 13.56 MHz frequency for the carrier, I think the difference between these types is in the Protocol layer only. Am I right? If so, what is the name of Felica cards transmission protocol? (For ISO/IEC 14443 cards, we have T=1 and T=CL protocols).
After configuring the reader, I tried to send commands to card:
Connect successful.
Send: 00 A4 04 00 00
Recv: 6A 81
Time used: 31.000 ms
Send: 00 C0 00 00 00
Recv: 6A 81
Time used: 28.000 ms
Send: 00 CA 00 00 00
Recv: 6A 81
Time used: 35.000 ms
As you see above, I receive 0x6A81 status words only.
I also searched a lot of ACS Reader Datasheets, Some NXP Application notes and for sure JIS X 6319-4 standard for a list of commands for this type of cards. But I found nothing applicable.
So, the questions are:
What actually is Felica? (Smart? Memory?)
What is the difference between Felica cards and ISO/IEC14443 cards? Is it related to NFC?
How to communicate with this card and transfer data?
Update:
My card'S ATR is : 3b 8f 80 01 80 4f 0c a0 00 00 03 06 11 00 3b 00 00 00 00 42
What actually is Felica? (Smart? Memory?)
It's more like a memory card than a smart card with regards to functionality. Reading data in blocks is typical for a memory card and the card has very limited functionality besides basic authentication based on symmetric cryptography.
You could argue that it is a smart card in the sense that the implementation seems to carry a multi-purpose CPU (see Annex B).
It seems however impossible to change the behavior of the smart card the same way you'd do e.g. in a Global Platform Java Card. So I'd classify it as a memory card with a proprietary protocol.
What is the difference between Felica cards and ISO/IEC14443 cards? Is it related to NFC?
It uses a proprietary communication protocol which includes both the data -link layer (which you are asking about here) and the command/response layer.
How to communicate with this card and transfer data?
The fact that you are sending APDU's instead of FeliCa's proprietary command / response pairs indicate that you are using a translation layer. This translation layer is likely to be in the reader / reader driver. The API of this translation layer is likely to be specified in the PCSC 2.01 specifications (section 3.2.2.1 Storage Card Functionality Support, using CLA byte 0xFF).
You probably need the reader's user manual as well, if just to figure out in which location to store the required keys.
I'm wondering how windows interprets the 8 bytes that define the starting cluster for the $MFT. Convering the 8 bytes to decimal using a calculator doesn't work, luckily WinHex has a calculator which displays the correct value, though I don't know how it's calculated:
http://imgur.com/4UEZvNy
The picture above shows WinHeX data interpreter showing the correct amount. So my question is, how does '00 00 0C 00 00 00 00 00' equate to '786432'.
I saw another user had a similar question about the number of bytes per sector, which was answered by someone that it was down to a byte being base 256. But that doesn't really help me figure it out. Any help would be really appreciated.
I have a question I had been given a while ago during the job interview, I was wandering about the data processor cache. The question itself was connected with volatile variable, how can we not optimize the memory access for those variables. From my understanding when we read the volatile variable we need to omit the processor cache. And this is what my question is about. What is happening in such cases, is entire cache being flushed when the access for such variable is executed? Or there is some register setting that caching should be omitted for a memory region? Or is there a function for reading memory without looking in the cache? Or is it architecture dependent.
Thanks in advance for your time and answers.
There is some confusion here - the memory your program uses (through the compiler), is in fact an abstraction, maintained together by the OS and the processor. As such, you don't "need" to worry about paging, swapping, physical address space and performance.
Wait, before you jump and yell at me for talking nonesence - that was not to say you shouldn't care about them, when optimizing your code you might want to know what actually happens, so you have a set of tools to assist you (SW prefetches for example), as well as a rough idea on how the system works (cache sizes and hierarchy), allowing you to write optimized code.
However, as I said, you don't have to worry about this, and if you don't - it's guaranteed to work "under the hood", to an extent. The cache for example, is guaranteed to maintain coherency even when working with shared data (that's maintained through a set of pretty complicated HW protocols), and even in cases of virtual address aliases (multiple virt addresses pointing to the same physical one). But here comes the "to an extent" part - in some cases you have to make sure you use it correctly. If you want to do memory-mapped IO for e.g., you should define it properly so that the processor knows it shouldn't be cached. The compiler isn't likely to do this for you implicitly, it probably won't even know.
Now, volatile lives in an upper level, it's part of the contract between the programmer and his compiler. It means the compiler isn't allowed to do all sorts of optimizations with this variable, that would be unsafe for the program even within the memory model abstraction. These are basically cases where the value can be modified externally at any point (through interrupt, mmio, other threads, ...). Keep in mind that the compiler still lives above the memory abstraction, if it decides to write something to memory or read it, aside from possible hints it relies completely on the processor to do whatever it needs to make this chunk of memory close at hand while maintaining correctness. However, a compiler is allowed much more freedom than the HW - it could decide to move reads/writes or eliminate variables alltogether, something which the CPU in most cases isn't allowed to, so you need to prevent that from happening if it's unsafe. Some nice examples of when that happens can be found here - http://www.barrgroup.com/Embedded-Systems/How-To/C-Volatile-Keyword
So while a volatile hint limits the freedom of the compiler inside the memory model, it doesn't necessarily limits the underlying HW. You probably don't want it to - say you have a volatile variable that you want to expose to other threads - if the compiler made it uncacheable it would ruin the performance (and without need). If on top of that you also want to protect the memory model from unsafe caching (which are just a subset of the cases volatile might come in handy), you'll have to do so explicitly.
EDIT:
I felt bad for not adding any example, so to make it clearer - consider the following code:
int main() {
int n = 20;
int sum = 0;
int x = 1;
/*volatile */ int* px = &x;
while (sum < n) {
sum+= *px;
printf("%d\n", sum);
}
return 0;
}
This would count from 1 to 20 in jumps of x, which is 1. Let's see how gcc -O3 writes it:
0000000000400440 <main>:
400440: 53 push %rbx
400441: 31 db xor %ebx,%ebx
400443: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1)
400448: 83 c3 01 add $0x1,%ebx
40044b: 31 c0 xor %eax,%eax
40044d: be 3c 06 40 00 mov $0x40063c,%esi
400452: 89 da mov %ebx,%edx
400454: bf 01 00 00 00 mov $0x1,%edi
400459: e8 d2 ff ff ff callq 400430 <__printf_chk#plt>
40045e: 83 fb 14 cmp $0x14,%ebx
400461: 75 e5 jne 400448 <main+0x8>
400463: 31 c0 xor %eax,%eax
400465: 5b pop %rbx
400466: c3 retq
note the add $0x1,%ebx - since the variable is considered "safe" enough by the compiler (volatile is commented out here), it allows itself to consider it as loop invariant. In fact, if I had not printed something on each iteration, the entire loop would have been optimized away since gcc can tell the final outcome pretty easily.
However, uncommenting the volatile keyword, we get -
0000000000400440 <main>:
400440: 53 push %rbx
400441: 31 db xor %ebx,%ebx
400443: 48 83 ec 10 sub $0x10,%rsp
400447: c7 04 24 01 00 00 00 movl $0x1,(%rsp)
40044e: 66 90 xchg %ax,%ax
400450: 8b 04 24 mov (%rsp),%eax
400453: be 4c 06 40 00 mov $0x40064c,%esi
400458: bf 01 00 00 00 mov $0x1,%edi
40045d: 01 c3 add %eax,%ebx
40045f: 31 c0 xor %eax,%eax
400461: 89 da mov %ebx,%edx
400463: e8 c8 ff ff ff callq 400430 <__printf_chk#plt>
400468: 83 fb 13 cmp $0x13,%ebx
40046b: 7e e3 jle 400450 <main+0x10>
40046d: 48 83 c4 10 add $0x10,%rsp
400471: 31 c0 xor %eax,%eax
400473: 5b pop %rbx
400474: c3 retq
400475: 90 nop
now the add operand is being read from the stack, as the compilers is led to suspect someone might change it. It's still caches, and as a normal writeback-typed memory it would catch any attempt to modify it from another thread or DMA, and the memory system would provide the new value (most likely the cache line would be snooped and invalidated, forcing the CPU to fetch the new value from whichever core owns it now). However, as I said, if x should not have been a normal cacheable memory address, but rather ment to be some MMIO or something else that might change silently beneath the memory system - then the cached value would be wrong (that's why MMIO shouldn't be cached), and the compiler would never know that even though it's considered volatile.
By the way - using volatile int x and adding it directly would produce the same result. Then again - making x or px global variables would also do that, the reason being - the compiler would suspect that someone might have access to it, and therefore would take the same precautions as with an explicit volatile hint. Interestingly enuogh, the same goes for making x local, but copying its address into a global pointer (but still using x directly in the main loop). The compiler is quite cautious.
That is not to say it's 100% full proof, you could in theory keep x local, have the compiler do the optimizations, and then "guess" the address somewhere from the outside (another thread for e.g.). This is when volatile does come in handy.
volatile variable, how can we not optimize the memory access for those variables.
Yes, Volatile on variable tells the compiler that the variable can be read or write in such a way that programmer can foresee what could happen to this variable out of programs scope and cannot seen by the compiler. This means that compiler cannot perform optimizations on the variable which will alter the intended functionality, caching its value in a register to avoid memory access using the register copy during each iteration.
`entire cache being flushed when the access for such variable is executed?`
No. Ideally compiler access variable from the variable's storage location which doesn't flush the existing cache entries between CPU and memory.
Or there is some register setting that caching should be omitted for a memory region?
Apparently when the register is in un-chached memory space, accessing that memory variable will give you the up-to-date value than from cache memory. Again this should be architecture dependent.
Ok, so I need to hook a program, but to do this I am going to copy the instructions E8 <Pointer to Byte Array that contains other code>. The problem with this is, that when I assemble Call 0x100 I get E8 FD, We know the E8 is the call instruction, so FD must be the destination, so how does the assembler take the destination from 0x100 into FD? Thanks, Bradley - Imcept
There is plethora of jump/call opcodes and some of them are relative. I'd say you in fact got not E8 FD but E8 FD FF. E8 seems to be "call 16-bit relative" and 0x100 is the place where instructions are placed by default.
So you put call 0x100 at address 0x100, and the generated code is "do the jump instruction, and jump -3 from the actual instruction pointer". -3 is because the shift is computed from the position after the instruction is read, which in case of E8 FD FF is 0x103. That is why the shift if FD FF, big-endian for 0xfffd, which is 16-bit -3.
http://wwwcsif.cs.ucdavis.edu/~davis/50/8086 Opcodes.htm
E8 is a 16 bit relative call. So for instance E8 00 10 means call the address at the PC+0x1000.