(Operating System) Calculate page number and page offset - memory-management

Given the size of the page is 4KB, find the page number and page offset of this address (provided in decimal) 21235.
The Offset would be 21235 / (4*2^10) = 5.xxx => 5
The page number would be 21235 % (4*2^10) = 755
And I am wonder the explanation of this? I know the numbers but not the explanation
Thank you, you all !

Update:
Page number = logical address / page size in this case is 21205 / 1024 = 20
Offset = logical address mod page size in this case is 21205 mod 1024 = 725
Sources:
http://www.yorku.ca/pkashiya/cse1520/Paged%20memory%20technique.htm http://www2.cs.uregina.ca/~hamilton/courses/330/notes/memory/paging.html
The Post over at Meta:
https://cs.stackexchange.com/questions/124826/determine-page-number-and-offsets-for-address-references/142321#142321

Related

Golang readers: Why writing int64 numbers using bitwise operator <<

I have come across the following code when dealing with Go readers to limit the number of bytes read from a remote client when sending a file through multipart upload (e.g. in Postman).
r.Body = http.MaxBytesReader(w, r.Body, 32<<20+1024)
If I am not mistaken, the above notation should represent 33555456 bytes, or 33.555456 MB (32 * 2 ^ 20) + 1024. Or is this number not correct?
What I don't understand is:
why did the author use it like this? Why using 20 and not some other number?
why the author used the notation +1024 at all? Why didn't he write 33 MB instead?
would it be OK to write 33555456 directly as int64?
If I am not mistaken, the above notation should represent 33555456 bytes, or 33.555456 MB (32 * 2 ^ 20) + 1024. Or is this number not correct?
Correct. You can trivially check it yourself.
fmt.Println(32<<20+1024)
Why didn't he write 33 MB instead?
Because this number is not 33 MB. 33 * 1024 * 1024 = 34603008
would it be OK to write 33555456 directly as int64?
Naturally. That's what it likely is reduced to during compilation anyway. This notation is likely easier to read, once you figure out the logic behind 32, 20 and 1024.
Ease of reading is why I almost always (when not using ruby) write constants like "50 MB" as 50 * 1024 * 1024 and "30 days" as 30 * 86400, etc.

Windows 10 x64: Unable to get PXE on Windbg

Can't understand how Windows Memory Manager works.
I look at the attached user process (dbgview.exe).
It is WOW64-process. At the specified address (0x76560000) there is .text section of the kernel32.dll module (also WOW64).
Why there is no PTE and other tables in the process page table pointing to those virtual address?
kd> db 76560000
00000000`76560000 8b ff 55 8b ec 51 56 57-33 f6 89 55 fc 56 68 80 ..U..QVW3..U.Vh.
<...>
kd> !pte 76560000
VA 0000000076560000
PXE at FFFFF6FB7DBED000 PPE at FFFFF6FB7DA00008 PDE at FFFFF6FB40001D90 PTE at FFFFF680003B2B00
Unable to get PXE FFFFF6FB7DBED000
kd> db FFFFF680003B2B00
fffff680`003b2b00 ?? ?? ?? ?? ?? ?? ?? ??-?? ?? ?? ?? ?? ?? ?? ?? ???????????????
<...>
I know that pages will be allocated after first access (with page fault) have occured, but why there is no protype PTE too?
Firstly, translate an arbitrary virtual address to physical using !vtop to see the dirbase of the process in the process of translation, or use !process to find the dirbase of the process:
lkd> .process /p fffffa8046a2e5f0
Implicit process is now fffffa80`46a2e5f0
lkd> .context 77fa90000
lkd> !vtop 0 13fe60000
Amd64VtoP: Virt 00000001`3fe60000, pagedir 7`7fa90000
Amd64VtoP: PML4E 7`7fa90000
Amd64VtoP: PDPE 1`c2e83020
Amd64VtoP: PDE 7`84e04ff8
Amd64VtoP: PTE 4`be585300
Amd64VtoP: Mapped phys 6`3efae000
Virtual address 13fe60000 translates to physical address 63efae000.
Then find that physical frame in the PFN database (in this case the physical page for PML4 (cr3 page aka. dirbase) is 77fa90 with full physical address 77fa90000:
lkd> !pfn 77fa90
PFN 0077FA90 at address FFFFFA80167EFB00
flink FFFFFA8046A2E5F0 blink / share count 00000005 pteaddress FFFFF6FB7DBEDF68
reference count 0001 used entry count 0000 Cached color 0 Priority 0
restore pte 00000080 containing page 77FA90 Active M
Modified
The address FFFFF6FB7DBED000 is therefore the virtual address of the PML4 page and FFFFF6FB7DBEDF68 is the virtual address of the PML4E self reference entry (1ed*8 = f68).
FFFFF6FB7DBED000 = 1111111111111111111101101111101101111101101111101101000000000000
1111111111111111 111101101 111101101 111101101 111101101 000000000000
The PML4 can only be at a virtual address where the PML4E, PDTPE, PDE and PTE index are the same, so there are actually 2^9 different combinations of that and windows 7 always selects 0x1ed i.e. 111101101. The reason for this is because the PML4 contains a PML4 that points to itself i.e. the physical frame of the PML4, so it will need to keep indexing to that same location at every level of the hierarchy.
The PML4, being a page table page, must reside in the kernel, and kernel addresses are high-canonical, i.e. prefixed with 1111111111111111, and kernel addresses begin with 00001 through 11111 i.e. from 08 to ff
The range of possible addresses that a 64 bit OS that uses 8TiB for user address space can place it at is therefore 31*(2^4) = 496 different possible locations and not actually 2^9:
1111111111111111 000010000 000010000 000010000 000010000 000000000000
1111111111111111 111111111 111111111 111111111 111111111 000000000000
I.e. the first is FFFF080402010000, the second is FFFF088442211000, the last is FFFFFFFFFFFFF000.
Note:
Up until Windows 10 TH2, the magic index for the Self-Reference PML4 entry was 0x1ed as mentioned above. But what about Windows 10 from 1607? Well Microsoft uped their game, as a constant battle for improving Windows security the index is randomized at boot-time, so 0x1ed is now one of the 512 [sic. (496)] possible values (i.e. 9-bit index) that the Self-Reference entry index can have. And side effect, it also broke some of their own tools, like the !pte2va WinDbg command.
0xFFFFF68000000000 is the address of the first PTE in the first page table page, so basically MmPteBase, except because on Windows 10 1607 the PML4E can be an other than 0x1ed, the base is not always 0xFFFFF68000000000 as a result, and it uses a variable nt!MmPteBase to know instantly where the base of the page table page allocations begins. Previously, this symbol does not exist in ntoskrnl.exe, because it has a hardcoded base 0xFFFFF68000000000. The address of the first and last page table page is going to be:
first last
* pml4e_offset : 0x1ed 0x1ed
* pdpe_offset : 0x000 0x1ff
* pde_offset : 0x000 0x1ff
* pte_offset : 0x000 0x1ff
* offset : 0x000 0x000
This gives 0xFFFFF68000000000 for the first and 0xFFFFF6FFFFFFF000 for the last page table page when the PML4E index is 0x1ed. PDEs + PDPTEs + PML4Es + PTEs are assigned in this range.
Therefore, to be able to translate a virtual address to its PTE virtual address (and !pte2va is the reverse of this), you affix 111101101 to the start of the virtual address and then you truncate the last 12 bits (the page offset, which is no longer useful) and then you times it by 8 bytes (the PTE size) (i.e. add 3 zeroes to the end, which creates a new page offset from the last level index into the page that contains the PTEs times the size of a PTE structure). Concatenating the PML4E index to the start simply causes it to loop back one time such that you actually get the PTE rather than what the PTE points to. Concatenating it to the start is the same thing as adding it to MmPteBase.
Here is simple C++ code to do it:
// pte.cpp
#include<iostream>
#include<string>
int main(int argc, char *argv[]) {
unsigned long long int input = std::stoull(argv[1], nullptr, 16);
long long int ptebase = 0xFFFFF68000000000;
long long int pteaddress = ptebase + ((input >> 12) << 3);
std::cout << "0x" << std::hex << pteaddress;
}
C:\> pte 13fe60000
0xfffff680009ff300
To get the PDE virtual address you have to affix it twice and then truncate the last 21 bits and then times by 8. This is how !pte is supposed to work, and is the opposite of !pte2va.
Similarly, PDEs + PDPTEs + PML4Es are assigned in the range:
first last
* pml4e_offset : 0x1ed 0x1ed
* pdpe_offset : 0x1ed 0x1ed
* pde_offset : 0x000 0x1ff
* pte_offset : 0x000 0x1ff
* offset : 0x000 0x000
Because when you get to 0x1ed for the pdpte offset within the page table page range, all of a sudden, you are looping back in the PML4 once again, so you get the PDE.
If it says there is no PTE for an address within a virtual page for which the corresponding physical frame is shown to be part of the working set by VMMap, then you might be experiencing my issue, where you need to use .process /P if you're doing live kernel debugging (local or remote) to explicitly tell the debugger that you want to translate user and kernel addresses in the context of the process and not the debugger.
I have found that since Windows 10 Anniversary Update (1607, 10.0.14393) PML4 table had been randomized to mitigate kernel heap spraying.
It means that probably Page Table is not placed at 0xFFFFF6800000.

Mistake in Virtual Hard Disk Image Format Specification?

I want to calculate the end offset of a parent locator in a VHD. Here is a part of the VHD header:
Cookie: cxsparse
Data offset: 0xffffffffffffffff
Table offset: 0x2000
Header version: 0x00010000
Max table entries: 10240
Block size: 0x200000
Checksum: 4294956454
Parent Unique Id: 0x9678bf077e719640b55e40826ce5d178
Parent time stamp: 525527478
Reserved: 0
Parent Unicode name:
Parent locator 1:
- platform code: 0x57326b75
- platform_data_space: 4096
- platform_data_length: 86
- reserved: 0
- platform_data_offset: 0x1000
Parent locator 2:
- platform code: 0x57327275
- platform_data_space: 65536
- platform_data_length: 34
- reserved: 0
- platform_data_offset: 0xc000
Some definitions from the Virtual Hard Disk Image Format Specification:
"Table Offset: This field stores the absolute byte offset of the Block Allocation Table (BAT) in the file.
Platform Data Space: This field stores the number of 512-byte sectors needed to store the parent hard disk locator.
Platform Data Offset: This field stores the absolute file offset in bytes where the platform specific file locator data is stored.
Platform Data Length. This field stores the actual length of the parent hard disk locator in bytes."
Based on this the end offset of the two parent locators should be:
data offset + 512 * data space:
0x1000 + 512 * 4096 = 0x201000
0xc000 + 512 * 65536 = 0x200c000
But if one uses only data offset + data space:
0x1000 + 4096 = 0x2000 //end of parent locator 1, begin of BAT
0xc000 + 65536 = 0x1c000
This latter calculation makes much more sense: the end of the first parent locator is the beginning of the BAT (see header data above); and since the first BAT entry is 0xe7 (sector offset), this corresponds to file offset 0x1ce00 (sector offset * 512), which is OK, if the second parent locator ends at 0x1c000.
But if one uses the formula data offset + 512 * data space, he ends up having other data written in the parent locator. (But, in this example there would be no data corruption, since Platform Data Length is very small)
So is this a mistake in the specification, and the sentence
"Platform Data Space: This field stores the number of 512-byte sectors needed to store the parent hard disk locator."
should be
"Platform Data Space: This field stores the number of bytes needed to store the parent hard disk locator."?
Apparently Microsoft does not care about correcting their mistake, this being already discovered by Virtualbox developers. VHD.cpp contains the following comment:
/*
* The VHD spec states that the DataSpace field holds the number of sectors
* required to store the parent locator path.
* As it turned out VPC and Hyper-V store the amount of bytes reserved for the
* path and not the number of sectors.
*/

Ehcache miss counts

These are the statistics for my Ehcache.
I have it configured to use only memory (no persistence, no overflow to disk).
cacheHits = 50
onDiskHits = 0
offHeapHits = 0
inMemoryHits = 50
misses = 1194
onDiskMisses = 0
offHeapMisses = 0
inMemoryMisses = 1138
size = 69
averageGetTime = 0.061597
evictionCount = 0
As you can see, misses is higher than onDiskMisses + offHeapMisses + inMemoryMisses. I do have statistics strategy set to best effort:
cache.setStatisticsAccuracy(Statistics.STATISTICS_ACCURACY_BEST_EFFORT)
But the hits add up, and the difference between misses is rather large. Is there a reason why the misses do not add up correctly?
This quesiton is similar to Ehcache misses count and hitrate statistics, but the answer attributes the differences to the multiple tiers. There is only one tier here.
I'm almost certain that you're seeing this because inMemoryMisses does not include misses due to expired elements. On a get if the value is stored, but expired then you will not see an inMemoryMiss recorded, but you will see a cache miss.

How can I know if a TIFF image is in the format CCITT T.6(Group 4)?

How can I know if a TIFF image is in the format CCITT T.6(Group 4)?
You can use this (C#) code example.
It returns a value indicating the compression type:
1: no compression
2: CCITT Group 3
3: Facsimile-compatible CCITT Group 3
4: CCITT Group 4 (T.6)
5: LZW
public static int GetCompressionType(Image image)
{
int compressionTagIndex = Array.IndexOf(image.PropertyIdList, 0x103);
PropertyItem compressionTag = image.PropertyItems[compressionTagIndex];
return BitConverter.ToInt16(compressionTag.Value, 0);
}
You can check these links
The TIFF File Format
TIFF Tag Compression
TIFF File Format Summary
The tag 259 (hex 0x0103) store the info about the Compression method.
--- Compression
Tag = 259 (103)
Type = word
N = 1
Default = 1.
1 = No compression, but pack data into bytes as tightly as possible, with no
unused bits except at the end of a row. The bytes are stored as an array
of bytes, for BitsPerSample <= 8, word if BitsPerSample > 8 and <= 16, and
dword if BitsPerSample > 16 and <= 32. The byte ordering of data >8 bits
must be consistent with that specified in the TIFF file header (bytes 0
and 1). Rows are required to begin on byte boundaries.
2 = CCITT Group 3 1-Dimensional Modified Huffman run length encoding.
See ALGRTHMS.txt BitsPerSample must be 1, since this type of compression
is defined only for bilevel images (like FAX images...)
3 = Facsimile-compatible CCITT Group 3, exactly as specified in
"Standardization of Group 3 facsimile apparatus for document
transmission," Recommendation T.4, Volume VII, Fascicle VII.3,
Terminal Equipment and Protocols for Telematic Services, The
International Telegraph and Telephone Consultative Committee
(CCITT), Geneva, 1985, pages 16 through 31. Each strip must
begin on a byte boundary. (But recall that an image can be a
single strip.) Rows that are not the first row of a strip are
not required to begin on a byte boundary. The data is stored as
bytes, not words - byte-reversal is not allowed. See the
Group3Options field for Group 3 options such as 1D vs 2D coding.
4 = Facsimile-compatible CCITT Group 4, exactly as specified in
"Facsimile Coding Schemes and Coding Control Functions for Group
4 Facsimile Apparatus," Recommendation T.6, Volume VII, Fascicle
VII.3, Terminal Equipment and Protocols for Telematic Services,
The International Telegraph and Telephone Consultative Committee
(CCITT), Geneva, 1985, pages 40 through 48. Each strip must
begin on a byte boundary. Rows that are not the first row of a
strip are not required to begin on a byte boundary. The data is
stored as bytes, not words. See the Group4Options field for
Group 4 options.
5 = LZW Compression, for grayscale, mapped color, and full color images.
You can run identify -verbose from the ImageMagick suite on the image. Look for "Compression: Group4" in the output.
UPDATE:
SO, I downloaded the libtiff library from the link I mentioned before, and from what I've seen, you can do the following: (untested)
int isTIFF_T6(const char* filename)
{
TIFF* tif= TIFFOpen(filename,"r");
TIFFDirectory *td = &tif->tif_dir;
if(td->td_compression == COMPRESSION_CCITTFAX4) return 1;
return 0;
}
PREVIOUS:
This page has a lot of information about this format and links to some code in C:
Here's an excerpt:
The following paper covers T.4, T.6
and JBIG:
"Review of standards for electronic
imaging for facsimile systems" in
Journal of Electronic Imaging, Vol. 1,
No. 1, pp. 5-21, January 1992.
Source code can be obtained as part of
a TIFF toolkit - TIFF image
compression techniques for binary
images include CCITT T.4 and T.6:
ftp://ftp.sgi.com/graphics/tiff/tiff-v3.4beta035-tar.gz
Contact: sam#engr.sgi.com
Read more: http://www.faqs.org/faqs/compression-faq/part1/section-16.html#ixzz0TYLGKnHI

Resources