What does the BSS segment store?

I know that the BSS segment stores uninitialized global and static variables and initializes them to zero. But what if the global/static variable is initialized? And my second question: I read that the BSS segment doesn't consume memory, so where does it store these variables? Thanks

You probably read that the BSS segment doesn't consume space in the executable file on disk. When the executable is loaded, the BSS segment certainly does consume space in memory. Space is allocated and initialised to zero by the OS loader.

If initialized, global/static variables are stored in the .DATA segment. When you declare data in the .DATA segment, you provide its values, so those values have to be stored as part of the executable.
On the other hand, for .BSS you only declare how much space you need, since the values don't matter. So if your program declares 2 GB of uninitialized memory, that 2 GB does not contribute to the size of your executable; you won't see it until after the program is loaded.
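To make the distinction concrete, here is a small C sketch (the variable names are made up):

/* bss_vs_data.c */
int answer = 42;          /* initialized: lives in .data, so the value 42
                             takes up space in the executable file        */
int big_buffer[1 << 20];  /* uninitialized: only its size is recorded in
                             .bss; the loader allocates and zeroes it     */

int main(void)
{
    return answer + big_buffer[0];  /* big_buffer[0] is guaranteed to be 0 */
}

Running size on the compiled binary should show big_buffer counted under bss without growing the file, while answer lands in data.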

Related

cache size, set associative and direct mapping

Consider a machine with a byte-addressable main memory of 256 Kbytes and a block size of 8 bytes, with a set-associative mapped cache consisting of 32 lines divided into 2-line sets.
1. The address 110101010101011010 is stored in the 11th set. What other memory address would be stored in the same set in the cache? If this byte was stored in the cache in the clock cycle immediately following the previous one, would it overwrite that byte?
2. If direct mapping had been used instead of set-associative, the main memory address would be divided up differently. How would it be divided up?
3. In direct mapping, into which line would the byte with address 110101010101011010 be stored?
4. What other memory address would be stored in the same line in the cache? If this byte was stored in the cache in the clock cycle immediately after the previous address, would it overwrite that byte?
Explanations as to why would be extremely helpful, as I'm trying to understand how to work these out for myself.
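No answer is attached here, but the field widths follow directly from the question's parameters: 256 Kbytes is 2^18 bytes, so addresses are 18 bits; 8-byte blocks give 3 offset bits; 32 lines in 2-line sets give 16 sets (4 index bits), while direct mapping uses all 32 lines (5 index bits). A sketch of the breakdown (0x3555A is just the question's binary address written in hex):

/* Split the 18-bit address from the question into tag/index/offset. */
#include <stdio.h>

int main(void)
{
    unsigned addr = 0x3555A;  /* 110101010101011010 in binary */

    /* 2-way set associative: 16 sets -> 4 index bits, tag = 18-4-3 = 11 bits */
    printf("set-assoc: tag=%03x set=%2u offset=%u\n",
           addr >> 7, (addr >> 3) & 0xF, addr & 0x7);   /* set 11, as stated */

    /* direct mapped: 32 lines -> 5 index bits, tag = 18-5-3 = 10 bits */
    printf("direct:    tag=%03x line=%2u offset=%u\n",
           addr >> 8, (addr >> 3) & 0x1F, addr & 0x7);  /* also line 11 */
    return 0;
}

Any other address with the same index bits but a different tag competes for the same set or line; in the 2-way cache a second such block can fill the set's other line without evicting the first, while under direct mapping it must overwrite it.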

Why is sometimes .data section virtual size bigger than raw size?

Recently I found out that the .data section in a PE can have a virtual size bigger than its raw size (in the file). This is quite surprising. Some people say this is the effect of uninitialized data placed somewhere in the section.
But after analyzing some PEs, I can't really find this extra data. Here is a link to the PEDump results of some program:
"Hello world" PEDump
As you can see, the .data section has a bigger virtual size than raw size. Why is that in this particular example?
Values for any initialized data are stored in the section; if the binary wants to reserve space in memory for uninitialized data, then the virtual size will be larger than the raw data size.
You won't find this data in the file, because it doesn't need to be there. The addresses that reference the data (in the code section) are baked into the binary so that they will point to the correct location when it is loaded into memory.
If the loader didn't reserve this space up front, then globals, etc. would have to be allocated on the heap before they could be used.
From the PE spec:
[SizeOfRawData is the] size of the section (for object files) or the size of the initialized data on disk (for image files). For executable images, this must be a multiple of FileAlignment from the optional header. If this is less than VirtualSize, the remainder of the section is zero-filled. Because the SizeOfRawData field is rounded but the VirtualSize field is not, it is possible for SizeOfRawData to be greater than VirtualSize as well. When a section contains only uninitialized data, this field should be zero.
Edit: Responding to the question about SizeOfUninitializedData.
The SizeOfUninitializedData field in the Optional Header is just the size of the .bss section (or the sum of them if there are multiple). Your binary didn't have a separate section for that data, so it was zero. Since sections are aligned on specific boundaries, it may be more efficient to save some space at the end of an existing section than to use a separate one.
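If you want to verify this on your own binaries, the section headers can be walked with the structures from winnt.h; a minimal sketch, assuming the PE file has already been read or mapped into memory at base:

#include <windows.h>
#include <stdio.h>

/* base points at a PE image in memory (e.g. read from disk or file-mapped). */
void dump_sections(const BYTE *base)
{
    const IMAGE_DOS_HEADER *dos = (const IMAGE_DOS_HEADER *)base;
    const IMAGE_NT_HEADERS *nt = (const IMAGE_NT_HEADERS *)(base + dos->e_lfanew);
    const IMAGE_SECTION_HEADER *sec = IMAGE_FIRST_SECTION(nt);

    for (WORD i = 0; i < nt->FileHeader.NumberOfSections; i++, sec++)
        printf("%-8.8s VirtualSize=%08lx SizeOfRawData=%08lx%s\n",
               (const char *)sec->Name,
               sec->Misc.VirtualSize,
               sec->SizeOfRawData,
               sec->Misc.VirtualSize > sec->SizeOfRawData
                   ? "  <- tail is zero-filled by the loader" : "");
}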

alignment requirement for powerpc icbi and dcbf cache instructions

I have inherited some PowerPC 750FX code. A handful of functions flush the instruction and data cache with
icbi 0,3 # instruction cache block invalidate
and
dcbf 0,3 # data cache block flush
respectively. The code makes sure the contents of register 3 are 32-byte aligned, so they always point to the start of a cache line. I wonder if this is necessary. The PowerPC manual only talks about computing the effective address (EA) from the operands, but says nothing about alignment requirements for the resulting EA. Is it safe to execute these instructions with an arbitrary EA addressing any byte within a cache line?
That would deserve a test, but from what I've read (the wording is the same in various core reference manuals and in PowerISA 2.05), there is no need to align the data address: the instruction acts on the block that contains the addressed byte.
If the block containing the byte addressed by EA is in storage that is Memory Coherence Required and a block containing the byte addressed by EA is in the instruction cache of any processors, the block is invalidated in those instruction caches, so that subsequent references cause the block to be fetched from main storage.
I don't know your code, but is the alignment done once at the beginning, with a loop then adding the cache block size to the EA?
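For reference, the usual shape of such a routine looks like this (a sketch using GCC inline assembly; the 32-byte line size matches the 750FX, and the rounding step is exactly the alignment the question asks about):

#define CACHE_LINE 32UL  /* L1 cache block size on the 750FX */

/* Flush the data cache and invalidate the instruction cache for [addr, addr+len). */
static void flush_icache_range(unsigned long addr, unsigned long len)
{
    unsigned long end = addr + len;
    unsigned long p;

    /* round down to a line boundary (the step whose necessity is in question) */
    for (p = addr & ~(CACHE_LINE - 1); p < end; p += CACHE_LINE)
        __asm__ volatile("dcbf 0,%0" : : "r"(p) : "memory");
    __asm__ volatile("sync" : : : "memory");   /* wait for flushes to complete */

    for (p = addr & ~(CACHE_LINE - 1); p < end; p += CACHE_LINE)
        __asm__ volatile("icbi 0,%0" : : "r"(p) : "memory");
    __asm__ volatile("sync" : : : "memory");
    __asm__ volatile("isync" : : : "memory");  /* discard prefetched instructions */
}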

How do "per-cpu" variables work? [duplicate]

On a multiprocessor, each core can have its own variables. I assume these are distinct variables at different addresses, even though they are in the same process and have the same name.
But I am wondering: how does the kernel implement this? Does it set aside a piece of memory to hold all the per-CPU copies, and redirect every access to the right address with an offset or something?
Normal global variables are not per-CPU. Automatic variables are on the stack, and different CPUs use different stacks, so they naturally get separate variables.
I guess you're referring to Linux's per-CPU variable infrastructure.
Most of the magic is here (asm-generic/percpu.h):
extern unsigned long __per_cpu_offset[NR_CPUS];
#define per_cpu_offset(x) (__per_cpu_offset[x])
/* Separate out the type, so (int[3], foo) works. */
#define DEFINE_PER_CPU(type, name) \
__attribute__((__section__(".data.percpu"))) __typeof__(type) per_cpu__##name
/* var is in discarded region: offset to particular copy we want */
#define per_cpu(var, cpu) (*RELOC_HIDE(&per_cpu__##var, __per_cpu_offset[cpu]))
#define __get_cpu_var(var) per_cpu(var, smp_processor_id())
The macro RELOC_HIDE(ptr, offset) simply advances ptr by the given offset in bytes (regardless of the pointer type).
What does it do?
When you write DEFINE_PER_CPU(int, x), an integer per_cpu__x is created in the special .data.percpu section.
When the kernel is loaded, this section is loaded multiple times - once per CPU (this part of the magic isn't in the code above).
The __per_cpu_offset array is filled with the distances between the copies. Supposing 1000 bytes of per cpu data are used, __per_cpu_offset[n] would contain 1000*n.
The symbol per_cpu__x will be relocated, during load, to CPU 0's per_cpu__x.
__get_cpu_var(x), when running on CPU 3, will translate to *RELOC_HIDE(&per_cpu__x, __per_cpu_offset[3]). This starts with CPU 0's x, adds the offset between CPU 0's data and CPU 3's, and eventually dereferences the resulting pointer.
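The same trick can be imitated in ordinary userspace C; everything below is a made-up analogy, not kernel code:

#include <stdio.h>

#define NR_CPUS 4

/* One "reference" copy of the per-CPU data, standing in for .data.percpu. */
static struct pcpu_data { int x; } reference = { 1 };

static struct pcpu_data copies[NR_CPUS];  /* one copy per CPU             */
static long __per_cpu_offset[NR_CPUS];    /* byte distance to each copy   */

/* Like RELOC_HIDE: advance a pointer by a byte offset, keeping its type. */
#define RELOC(ptr, off) ((__typeof__(ptr))((char *)(ptr) + (off)))
#define per_cpu(var, cpu) (*RELOC(&(var), __per_cpu_offset[cpu]))

int main(void)
{
    for (int c = 0; c < NR_CPUS; c++) {   /* "load the section once per CPU" */
        copies[c] = reference;
        __per_cpu_offset[c] = (char *)&copies[c] - (char *)&reference;
    }
    per_cpu(reference.x, 3) = 42;         /* touches only CPU 3's copy */
    printf("%d %d\n", per_cpu(reference.x, 3), per_cpu(reference.x, 0));  /* 42 1 */
    return 0;
}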

What do the different columns in the "!heap -flt -s xxxx" windbg command represent

I've been doing some work on high-memory issues, which has involved a lot of heap analysis in windbg, and I was curious what the different columns really mean in the "!heap -flt -s xxxx" command.
I read What do the 'size' numbers mean in the windbg !heap output?, and I looked in my "Windows Internals" book, but I still had a bunch of questions. So the columns and my questions are below.
**HEAP_ENTRY** - What does this pointer really point to? How is it different from UserPtr?
**Size** - What does this size mean? How is it different from UserSize?
**Prev** - This just appears to be the negative offset to get to the previous heap entry. Still not sure exactly how it's used.
**Flags** - Is there any documentation on these flags?
**UserPtr** - What is the user pointer? In all cases I've seen it's always 8 bytes higher than the HEAP_ENTRY, but I don't really know what it points to.
**UserSize** - This appears to be the size of the actual allocation.
**state** - This just tells you what the state of this heap entry is (free, busy, etc.).
Example:
HEAP_ENTRY Size Prev Flags UserPtr UserSize - state
0015eeb0 0044 0000 [07] 0015eeb8 00204 - (busy)
HEAP_ENTRY
Heaps store allocated blocks in contiguous segments of memory; each allocated block starts with an 8-byte header followed by the actual allocated data. The HEAP_ENTRY column is the address of the beginning of that header.
Size
The heap manager handles blocks in multiples of 8 bytes, and this column gives the number of 8-byte chunks allocated. In your sample, 0044 means the block takes 0x220 bytes (0x44 * 8).
Prev
Multiply by 8 to get the negative offset in bytes to the previous heap block.
Flags
This is a bitmask that encodes the following information
0x01 - HEAP_ENTRY_BUSY
0x02 - HEAP_ENTRY_EXTRA_PRESENT
0x04 - HEAP_ENTRY_FILL_PATTERN
0x08 - HEAP_ENTRY_VIRTUAL_ALLOC
0x10 - HEAP_ENTRY_LAST_ENTRY
UserPtr
This is the pointer returned to the application by the HeapAlloc function (called by malloc/new). Since the header is always 8 bytes long, it is always HEAP_ENTRY + 8.
UserSize
This is the size passed to the HeapAlloc function.
state
This is a decoding of the Flags column, telling if the entry is busy, freed, last of its segment, …
Be aware that in Windows 7/2008 R2, heaps by default use a front end named the LFH (Low Fragmentation Heap), which uses the default heap manager to allocate chunks into which it dispatches user allocations. For these heaps, UserPtr and UserSize will not point to real user data.
The output of !heap -s displays which heaps are LFH enabled.
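Plugging the sample row into the rules above (a quick sanity check in C; the constants come from the question's !heap output):

/* 0015eeb0 0044 0000 [07] 0015eeb8 00204 - (busy) */
#include <stdio.h>

int main(void)
{
    unsigned entry = 0x0015eeb0, size = 0x0044, flags = 0x07;
    unsigned user_ptr = 0x0015eeb8, user_size = 0x00204;

    printf("block size    : 0x%x bytes\n", size * 8);           /* 0x220           */
    printf("UserPtr ok    : %d\n", user_ptr == entry + 8);      /* header is +8    */
    printf("fits in block : %d\n", user_size <= size * 8 - 8);  /* 0x204 <= 0x218  */
    printf("busy          : %d\n", !!(flags & 0x01));           /* HEAP_ENTRY_BUSY */
    printf("extra present : %d\n", !!(flags & 0x02));
    printf("fill pattern  : %d\n", !!(flags & 0x04));           /* debug fill      */
    return 0;
}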
From looking at the !heap documentation in the Debugging Tools for Windows help file, the heap docs on MSDN, and a great excerpt from Advanced Windows Debugging, here's what I've been able to put together:
HEAP_ENTRY: pointer to entry within the heap. As you found, there is an 8 byte header which contains the data for the HEAP_ENTRY structure. The size of the HEAP_ENTRY structure is 8 bytes which defines the "heap granularity" size. This is used for determining the...
SIZE: size of the entry in terms of the granularity (i.e. the allocation size / 8)
FLAGS: these are defined in winbase.h, with explanations found in the MSDN link.
USERPTR: the actual pointer to the allocated (or freed) object
Well, the main difference between HEAP_ENTRY and UserPtr is due to the fact that heaps have to be indexed, allocated, and filled with metadata (like the allocated length)... otherwise, how could you free(p) without providing how many bytes were allocated? The same goes for the two size fields: one is the size of the structure indexing the heap, the other is the size of the memory region made available to the user.
The FLAGS, in turn, basically specify the properties of the allocated memory block, such as whether it is committed or just reserved, and, I guess, they are used by the kernel to rearrange or share memory regions if needed (but as nithins notes, they are documented on MSDN).
The PREV pointer is used to keep track of all the allocated regions; the first pointer is stored in the PEB structure, so both user-space and kernel-space code are aware of the allocated heap pools.
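As a toy illustration of that idea of metadata sitting just below the user pointer (a sketch only, nothing like the real NT heap layout):

#include <stdlib.h>

typedef struct header { size_t size; } header;  /* hypothetical block header */

void *my_alloc(size_t n)
{
    header *h = malloc(sizeof *h + n);
    if (!h) return NULL;
    h->size = n;       /* record the length so my_free can find it later  */
    return h + 1;      /* user pointer = header address + sizeof(header)  */
}

void my_free(void *p)
{
    if (p) free((header *)p - 1);  /* step back from the user pointer to the header */
}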
