Ran out of memory searching text in ClickHouse - full-text-search

I'm investigating whether ClickHouse is a good option for OLAP purposes. To do so, I replicated some queries I have running on PostgreSQL, using ClickHouse's syntax.
All the queries I have run are much faster than Postgres', but the ones that perform text search run out of memory. Below are the error code and the stack trace.
clickhouse_driver.errors.ServerException: Code: 241. DB::Exception:
Memory limit (for query) exceeded: would use 9.31 GiB (attempt to
allocate chunk of 524288 bytes), maximum: 9.31 GiB.
The script for the query is:
SELECT COUNT(*)
FROM ObserverNodeOccurrence as occ
LEFT JOIN
ObserverNodeOccurrence_NodeElements as occ_ne
ON occ._id = occ_ne.occurrenceId
WHERE
occ_ne.snippet LIKE '<img>'
The query above counts the number of entries of the column snippet which contain an HTML image tag (<img>). This column contains HTML snippets, hence searching text becomes quite expensive. A close/mid-term goal is to parse this column and convert it into a set of other columns (e.g. contains_img, contains_script, etc.). But, for now, I would like to be able to run such a query without running out of memory.
My question(s) is (are):
How can I successfully execute text-search queries on such a column without running out of memory?
Is there a way to force the query planner to use disk as soon as it runs out of memory?
I am using the MergeTree engine. Is there another engine that's able to split the load between RAM and disk?
Full stack trace:
clickhouse_driver.errors.ServerException: Code: 241.
DB::Exception: Memory limit (for query) exceeded: would use 9.31 GiB (attempt to allocate chunk of 524288 bytes), maximum: 9.31 GiB. Stack trace:
0. /usr/bin/clickhouse-server(StackTrace::StackTrace()+0x22) [0x781c272]
1. /usr/bin/clickhouse-server(MemoryTracker::alloc(long)+0x8ba) [0x71bbb4a]
2. /usr/bin/clickhouse-server(MemoryTracker::alloc(long)+0xc5) [0x71bb355]
3. /usr/bin/clickhouse-server() [0x67aeb4e]
4. /usr/bin/clickhouse-server() [0x67af010]
5. /usr/bin/clickhouse-server() [0x67e5af4]
6. /usr/bin/clickhouse-server(void DB::Join::joinBlockImpl<(DB::ASTTableJoin::Kind)1, (DB::ASTTableJoin::Strictness)2, DB::Join::MapsTemplate<DB::JoinStuff::WithFlags<DB::RowRefList, false> > >(DB::Block&, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, DB::NamesAndTypesList const&, DB::Block const&, DB::Join::MapsTemplate<DB::JoinStuff::WithFlags<DB::RowRefList, false> > const&) const+0xe1c) [0x68020dc]
7. /usr/bin/clickhouse-server(DB::Join::joinBlock(DB::Block&, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, DB::NamesAndTypesList const&) const+0x1a5) [0x67bc415]
8. /usr/bin/clickhouse-server(DB::ExpressionAction::execute(DB::Block&, bool) const+0xa5d) [0x6d961dd]
9. /usr/bin/clickhouse-server(DB::ExpressionActions::execute(DB::Block&, bool) const+0x45) [0x6d97545]
10. /usr/bin/clickhouse-server(DB::ExpressionBlockInputStream::readImpl()+0x48) [0x6c52888]
11. /usr/bin/clickhouse-server(DB::IBlockInputStream::read()+0x188) [0x6635628]
12. /usr/bin/clickhouse-server(DB::FilterBlockInputStream::readImpl()+0xd9) [0x6c538b9]
13. /usr/bin/clickhouse-server(DB::IBlockInputStream::read()+0x188) [0x6635628]
14. /usr/bin/clickhouse-server(DB::ExpressionBlockInputStream::readImpl()+0x2d) [0x6c5286d]
15. /usr/bin/clickhouse-server(DB::IBlockInputStream::read()+0x188) [0x6635628]
16. /usr/bin/clickhouse-server(DB::ParallelInputsProcessor<DB::ParallelAggregatingBlockInputStream::Handler>::loop(unsigned long)+0x139) [0x6c7f409]
17. /usr/bin/clickhouse-server(DB::ParallelInputsProcessor<DB::ParallelAggregatingBlockInputStream::Handler>::thread(std::shared_ptr<DB::ThreadGroupStatus>, unsigned long)+0x209) [0x6c7fc79]
18. /usr/bin/clickhouse-server(ThreadFromGlobalPool::ThreadFromGlobalPool<void (DB::ParallelInputsProcessor<DB::ParallelAggregatingBlockInputStream::Handler>::*)(std::shared_ptr<DB::ThreadGroupStatus>, unsigned long), DB::ParallelInputsProcessor<DB::ParallelAggregatingBlockInputStream::Handler>*, std::shared_ptr<DB::ThreadGroupStatus>, unsigned long&>(void (DB::ParallelInputsProcessor<DB::ParallelAggregatingBlockInputStream::Handler>::*&&)(std::shared_ptr<DB::ThreadGroupStatus>, unsigned long), DB::ParallelInputsProcessor<DB::ParallelAggregatingBlockInputStream::Handler>*&&, std::shared_ptr<DB::ThreadGroupStatus>&&, unsigned long&)::{lambda()#1}::operator()() const+0x7f) [0x6c801cf]
19. /usr/bin/clickhouse-server(ThreadPoolImpl<std::thread>::worker(std::_List_iterator<std::thread>)+0x1af) [0x71c778f]
20. /usr/bin/clickhouse-server() [0xb2ac5bf]
21. /lib/x86_64-linux-gnu/libpthread.so.0(+0x76db) [0x7fc5b50826db]
22. /lib/x86_64-linux-gnu/libc.so.6(clone+0x3f) [0x7fc5b480988f]

Run clickhouse-client in a terminal and raise the limits:
set max_bytes_before_external_group_by=20000000000; -- 20 GB for external GROUP BY
set max_memory_usage=40000000000; -- 40 GB memory limit


Drop table fails with "Checksum doesn't match: corrupted data" exception on clickhouse

So our unit tests for ClickHouse started failing. They fail on a simple SQL statement:
::clickhouse::Client(client_options_).Execute("DROP TABLE IF EXISTS test.delme");
For client options I have host, default_database, user and password set.
The error:
[clickhouse error 40, DB::Exception: Checksum doesn't match: corrupted data. Reference: 8a58086e26544cb09217aa1bba09a1d9. Actual: 7c7a5cd56cac83a714e286dbbd46acb5. Size of compressed block: 20]
Errors on the server:
0. DB::Exception::Exception(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, int, bool) # 0xa38beba in /usr/bin/clickhouse
1. ? # 0x140ae996 in /usr/bin/clickhouse
2. DB::CompressedReadBufferBase::readCompressedData(unsigned long&, unsigned long&, bool) # 0x140ad956 in /usr/bin/clickhouse
3. ? # 0x140ace9f in /usr/bin/clickhouse
4. DB::NativeReader::read() # 0x15cf19c4 in /usr/bin/clickhouse
5. DB::TCPHandler::receiveData(bool) # 0x15ccb990 in /usr/bin/clickhouse
6. DB::TCPHandler::receivePacket() # 0x15cc0a4f in /usr/bin/clickhouse
7. DB::TCPHandler::readDataNext() # 0x15cc3c9f in /usr/bin/clickhouse
8. ? # 0x15cceb68 in /usr/bin/clickhouse
9. DB::Context::initializeExternalTablesIfSet() # 0x1474b5f6 in /usr/bin/clickhouse
10. ? # 0x14feb237 in /usr/bin/clickhouse
11. DB::executeQuery(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::shared_ptr<DB::Context>, bool, DB::QueryProcessingStage::Enum) # 0x14fe9f0e in /usr/bin/clickhouse
12. DB::TCPHandler::runImpl() # 0x15cb97ad in /usr/bin/clickhouse
13. DB::TCPHandler::run() # 0x15ccdd59 in /usr/bin/clickhouse
14. Poco::Net::TCPServerConnection::start() # 0x18a617b3 in /usr/bin/clickhouse
15. Poco::Net::TCPServerDispatcher::run() # 0x18a62c2d in /usr/bin/clickhouse
16. Poco::PooledThread::run() # 0x18c2d9c9 in /usr/bin/clickhouse
17. Poco::ThreadImpl::runnableEntry(void*) # 0x18c2b242 in /usr/bin/clickhouse
18. ? # 0x7f4e74010609 in ?
19. __clone # 0x7f4e73f35133 in ?
The table does not exist, so I have no idea what data is corrupted.
ClickHouse version: 22.8.2.11, using the C++ client (https://github.com/ClickHouse/clickhouse-cpp).
I will try to recreate the database and user, but I am wondering what led to these errors.
I'm not able to comment, so I'll write an answer.
Have you tried dropping the database test?
Maybe check in the table system.parts whether there are parts for this table. If yes, drop them.

ClickHouse error Code: 999 when running OPTIMIZE on a MATERIALIZED VIEW table with the ReplicatedReplacingMergeTree engine

It's a ClickHouse cluster with 2 shards and 2 replicas, i.e. 4 ClickHouse nodes.
When I run OPTIMIZE TABLE on one node, the following error occurs,
but it works fine when executed on any of the other ClickHouse nodes.
risk-luck2.dg.163.org :) optimize table risk_detect_test.risk_doubtful_user_daily_device_view_lyp;
OPTIMIZE TABLE risk_detect_test.risk_doubtful_user_daily_device_view_lyp
Received exception from server (version 20.4.4):
Code: 999. DB::Exception: Received from localhost:9000. DB::Exception: Can't get data for node /clickhouse/tables/test/01-02/risk_doubtful_user_daily_device_view_lyp/replicas/risk-olap6.dg.163.org (multiple leaders Ok)/host: node doesn't exist (No node).
0 rows in set. Elapsed: 0.002 sec.
risk-luck2.dg.163.org :) show create table risk_detect_test.risk_doubtful_user_daily_device_view_lyp;
SHOW CREATE TABLE risk_detect_test.risk_doubtful_user_daily_device_view_lyp
┌─statement──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ CREATE MATERIALIZED VIEW risk_detect_test.risk_doubtful_user_daily_device_view_lyp
(
`app_id` String,
`event_date` Date,
`device_id` UInt32
)
ENGINE = ReplicatedReplacingMergeTree('/clickhouse/tables/test/{layer}-{shard}/risk_doubtful_user_daily_device_view_lyp', '{replica}')
PARTITION BY toYYYYMM(event_date)
PRIMARY KEY app_id
ORDER BY (app_id, event_date, device_id)
SETTINGS index_granularity = 8192 AS
SELECT
app_id,
event_date,
xxHash32(device_id) AS device_id
FROM risk_detect_online.dwd_risk_doubtful_detail │
└────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
It seems to be another bug in CH:
ENGINE = ReplicatedReplacingMergeTree(
'/clickhouse/tables/test/{layer}-{shard}/risk_doubtful_user_daily_device_view_lyp', '{replica}')
Can't get data for node
/clickhouse/tables/online/01-02/risk_doubtful_user_daily_device_view/replicas/risk-olap6.dg.163.org
CH tries to use an incorrect ZooKeeper path in the case of a materialized view:
risk_doubtful_user_daily_device_view instead of risk_doubtful_user_daily_device_view_lyp.
The database part is also incorrect: tables/online/01-02/ instead of tables/test/{layer}-{shard}/.
I suggest you switch to the "TO" notation. https://den-crane.github.io/Everything_you_should_know_about_materialized_views_commented.pdf
Or run OPTIMIZE against the inner table:
OPTIMIZE TABLE "risk_detect_test"."inner.risk_doubtful_user_daily_device_view_lyp";
clickhouse-server.log as following:
2021.08.18 16:37:11.384434 [ 128614 ] {b6de1d84-a238-4e2f-9af4-3ce0ddf8551d} <Debug> executeQuery: (from 10.200.128.91:40236) insert into dwd_risk_detect_detail(app_id, app_type, app_version, city, created_at, defense_count, defense_result, detect_count, device_code, device_id, id, ip, model, os_version, package_name, phone_brand, platform, province, region, risk_type1, risk_type2, risk_type3, role_account, role_id, sdk_version, sign_hash, ts) FORMAT TabSeparated
2021.08.18 16:37:11.384735 [ 128614 ] {b6de1d84-a238-4e2f-9af4-3ce0ddf8551d} <Trace> ContextAccess (default): Access granted: INSERT(app_id, app_type, app_version, city, created_at, defense_count, defense_result, detect_count, device_code, device_id, id, ip, model, os_version, package_name, phone_brand, platform, province, region, risk_type1, risk_type2, risk_type3, role_account, role_id, sdk_version, sign_hash, ts) ON risk_detect_online.dwd_risk_detect_detail
2021.08.18 16:37:11.385706 [ 128614 ] {b6de1d84-a238-4e2f-9af4-3ce0ddf8551d} <Debug> InterpreterSelectQuery: MergeTreeWhereOptimizer: condition "risk_type1 != 0" moved to PREWHERE
2021.08.18 16:37:11.386554 [ 128614 ] {b6de1d84-a238-4e2f-9af4-3ce0ddf8551d} <Trace> ContextAccess (default): Access granted: SELECT(id, app_id, app_type, device_id, role_id, defense_result, risk_type1, risk_type2, risk_type3, defense_count, detect_count, event_date, event_hour, event_minute) ON risk_detect_online.dwd_risk_detect_detail
2021.08.18 16:37:11.386764 [ 128614 ] {b6de1d84-a238-4e2f-9af4-3ce0ddf8551d} <Trace> ContextAccess (default): Access granted: INSERT(app_id, app_type, event_date, event_hour, event_minute, risk_type1, risk_type2, risk_type3, defense_result, defense_count, detect_count, device_id, role_id, id) ON risk_detect_online.`.inner.risk_stat_view`
2021.08.18 16:37:11.387323 [ 128614 ] {b6de1d84-a238-4e2f-9af4-3ce0ddf8551d} <Trace> ContextAccess (default): Access granted: SELECT(app_id, app_type, device_id, role_id, event_date) ON risk_detect_online.dwd_risk_detect_detail
2021.08.18 16:37:11.387434 [ 128614 ] {b6de1d84-a238-4e2f-9af4-3ce0ddf8551d} <Trace> ContextAccess (default): Access granted: INSERT(app_id, app_type, event_date, device_id, role_id) ON risk_detect_online.`.inner.risk_total_user_stat_view`
2021.08.18 16:37:11.578506 [ 128861 ] {819b05a8-5ad0-414f-a0a7-111c765cac57} <Debug> executeQuery: (from 127.0.0.1:40932) OPTIMIZE TABLE risk_detect_online.risk_doubtful_user_daily_device_view
2021.08.18 16:37:11.578659 [ 128861 ] {819b05a8-5ad0-414f-a0a7-111c765cac57} <Trace> ContextAccess (default): Access granted: OPTIMIZE ON risk_detect_online.risk_doubtful_user_daily_device_view
2021.08.18 16:37:11.580097 [ 128861 ] {819b05a8-5ad0-414f-a0a7-111c765cac57} <Error> executeQuery: Code: 999, e.displayText() = Coordination::Exception: Can't get data for node /clickhouse/tables/online/01-02/risk_doubtful_user_daily_device_view/replicas/risk-olap6.dg.163.org (multiple leaders Ok)/host: node doesn't exist (No node) (version 20.4.4.18 (official build)) (from 127.0.0.1:40932) (in query: OPTIMIZE TABLE risk_detect_online.risk_doubtful_user_daily_device_view), Stack trace (when copying this message, always include the lines below):
0. Poco::Exception::Exception(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, int) # 0x104191d0 in /usr/bin/clickhouse
1. DB::Exception::Exception(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, int) # 0x8fff8ad in /usr/bin/clickhouse
2. Coordination::Exception::Exception(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, int, int) # 0xdddf7d8 in /usr/bin/clickhouse
3. Coordination::Exception::Exception(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, int) # 0xdddfe02 in /usr/bin/clickhouse
4. ? # 0xddf1f60 in /usr/bin/clickhouse
5. DB::StorageReplicatedMergeTree::sendRequestToLeaderReplica(std::__1::shared_ptr<DB::IAST> const&, DB::Context const&) # 0xd76117e in /usr/bin/clickhouse
6. DB::StorageReplicatedMergeTree::optimize(std::__1::shared_ptr<DB::IAST> const&, std::__1::shared_ptr<DB::IAST> const&, bool, bool, DB::Context const&) # 0xd762546 in /usr/bin/clickhouse
7. DB::StorageMaterializedView::optimize(std::__1::shared_ptr<DB::IAST> const&, std::__1::shared_ptr<DB::IAST> const&, bool, bool, DB::Context const&) # 0xd6d5a9d in /usr/bin/clickhouse
8. DB::InterpreterOptimizeQuery::execute() # 0xd225346 in /usr/bin/clickhouse
9. ? # 0xd5499f9 in /usr/bin/clickhouse
10. DB::executeQuery(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, DB::Context&, bool, DB::QueryProcessingStage::Enum, bool, bool) # 0xd54d025 in /usr/bin/clickhouse
11. DB::TCPHandler::runImpl() # 0x9106678 in /usr/bin/clickhouse
12. DB::TCPHandler::run() # 0x9107650 in /usr/bin/clickhouse
13. Poco::Net::TCPServerConnection::start() # 0x10304f4b in /usr/bin/clickhouse
14. Poco::Net::TCPServerDispatcher::run() # 0x103053db in /usr/bin/clickhouse
15. Poco::PooledThread::run() # 0x104b2fa6 in /usr/bin/clickhouse
16. Poco::ThreadImpl::runnableEntry(void*) # 0x104ae260 in /usr/bin/clickhouse
17. start_thread # 0x74a4 in /lib/x86_64-linux-gnu/libpthread-2.24.so
18. __clone # 0xe8d0f in /lib/x86_64-linux-gnu/libc-2.24.so
2021.08.18 16:37:11.580526 [ 128861 ] {819b05a8-5ad0-414f-a0a7-111c765cac57} <Debug> MemoryTracker: Peak memory usage (for query): 0.00 B.
2021.08.18 16:37:11.580592 [ 128861 ] {} <Information> TCPHandler: Processed in 0.002 sec.

What does runtime.memclrNoHeapPointers do?

I am profiling a library and see that a function called runtime.memclrNoHeapPointers is taking up about 0.82 seconds of the CPU time.
What does this function do, and what does this tell me about the memory usage of the library I am profiling?
The output, for completeness:
File: gribtest.test
Type: cpu
Time: Feb 12, 2019 at 8:27pm (CET)
Duration: 5.21s, Total samples = 5.11s (98.15%)
Showing nodes accounting for 4.94s, 96.67% of 5.11s total
Dropped 61 nodes (cum <= 0.03s)
flat flat% sum% cum cum%
1.60s 31.31% 31.31% 1.81s 35.42% github.com/nilsmagnus/grib/griblib.(*BitReader).readBit
1.08s 21.14% 52.45% 2.89s 56.56% github.com/nilsmagnus/grib/griblib.(*BitReader).readUint
0.37s 7.24% 59.69% 0.82s 16.05% encoding/binary.(*decoder).value
0.35s 6.85% 66.54% 0.35s 6.85% runtime.memclrNoHeapPointers
func memclrNoHeapPointers(ptr unsafe.Pointer, n uintptr)
memclrNoHeapPointers clears n bytes starting at ptr.
Usually you should use typedmemclr. memclrNoHeapPointers should be
used only when the caller knows that *ptr contains no heap pointers
because either:
*ptr is initialized memory and its type is pointer-free.
*ptr is uninitialized memory (e.g., memory that's being reused
for a new allocation) and hence contains only "junk".
(implemented in memclr_*.s; marked //go:noescape)
See https://github.com/golang/go/blob/9e277f7d554455e16ba3762541c53e9bfc1d8188/src/runtime/stubs.go#L78
This is part of the runtime's memory management. You can see the declaration here.
The specifics of what it does are CPU dependent. See the various memclr_*.s files in the runtime for the implementation.
This does seem like a long time spent clearing memory, but it's hard to say much about the memory usage of the library with just the data you've shown, I think.

Windows 10 x64: Unable to get PXE on Windbg

I can't understand how the Windows Memory Manager works.
I am looking at an attached user process (dbgview.exe).
It is a WOW64 process. At the specified address (0x76560000) there is the .text section of the kernel32.dll module (also WOW64).
Why are there no PTE and other table entries in the process page tables pointing to that virtual address?
kd> db 76560000
00000000`76560000 8b ff 55 8b ec 51 56 57-33 f6 89 55 fc 56 68 80 ..U..QVW3..U.Vh.
<...>
kd> !pte 76560000
VA 0000000076560000
PXE at FFFFF6FB7DBED000 PPE at FFFFF6FB7DA00008 PDE at FFFFF6FB40001D90 PTE at FFFFF680003B2B00
Unable to get PXE FFFFF6FB7DBED000
kd> db FFFFF680003B2B00
fffff680`003b2b00 ?? ?? ?? ?? ?? ?? ?? ??-?? ?? ?? ?? ?? ?? ?? ?? ???????????????
<...>
I know that pages are allocated after the first access (when a page fault occurs), but why is there no prototype PTE either?
Firstly, translate an arbitrary virtual address to a physical one using !vtop, which shows the dirbase of the process as part of the translation, or use !process to find the dirbase of the process:
lkd> .process /p fffffa8046a2e5f0
Implicit process is now fffffa80`46a2e5f0
lkd> .context 77fa90000
lkd> !vtop 0 13fe60000
Amd64VtoP: Virt 00000001`3fe60000, pagedir 7`7fa90000
Amd64VtoP: PML4E 7`7fa90000
Amd64VtoP: PDPE 1`c2e83020
Amd64VtoP: PDE 7`84e04ff8
Amd64VtoP: PTE 4`be585300
Amd64VtoP: Mapped phys 6`3efae000
Virtual address 13fe60000 translates to physical address 63efae000.
Then find that physical frame in the PFN database (in this case the physical page of the PML4 (the cr3 page, aka dirbase) is 77fa90, with full physical address 77fa90000):
lkd> !pfn 77fa90
PFN 0077FA90 at address FFFFFA80167EFB00
flink FFFFFA8046A2E5F0 blink / share count 00000005 pteaddress FFFFF6FB7DBEDF68
reference count 0001 used entry count 0000 Cached color 0 Priority 0
restore pte 00000080 containing page 77FA90 Active M
Modified
The address FFFFF6FB7DBED000 is therefore the virtual address of the PML4 page and FFFFF6FB7DBEDF68 is the virtual address of the PML4E self reference entry (1ed*8 = f68).
FFFFF6FB7DBED000 = 1111111111111111111101101111101101111101101111101101000000000000
1111111111111111 111101101 111101101 111101101 111101101 000000000000
The PML4 can only be at a virtual address where the PML4E, PDPTE, PDE and PTE indexes are the same, so there are actually 2^9 different combinations of that, and Windows 7 always selects 0x1ed, i.e. 111101101. The reason for this is that the PML4 contains a PML4E that points to itself, i.e. to the physical frame of the PML4, so the walk keeps indexing to that same location at every level of the hierarchy.
The PML4, being a page table page, must reside in the kernel, and kernel addresses are high-canonical, i.e. prefixed with 1111111111111111; kernel addresses begin with 00001 through 11111, i.e. from 08 to ff.
The range of possible addresses that a 64 bit OS that uses 8TiB for user address space can place it at is therefore 31*(2^4) = 496 different possible locations and not actually 2^9:
1111111111111111 000010000 000010000 000010000 000010000 000000000000
1111111111111111 111111111 111111111 111111111 111111111 000000000000
I.e. the first is FFFF080402010000, the second is FFFF088442211000, the last is FFFFFFFFFFFFF000.
Note:
Up until Windows 10 TH2, the magic index for the Self-Reference PML4 entry was 0x1ed, as mentioned above. But what about Windows 10 from 1607? Well, Microsoft upped their game: in the constant battle to improve Windows security, the index is randomized at boot time, so 0x1ed is now just one of the 512 [sic. (496)] possible values (i.e. a 9-bit index) that the Self-Reference entry index can have. As a side effect, it also broke some of their own tools, like the !pte2va WinDbg command.
0xFFFFF68000000000 is the address of the first PTE in the first page table page, so basically MmPteBase. However, because on Windows 10 1607 the PML4E index can be other than 0x1ed, the base is not always 0xFFFFF68000000000, and the kernel uses a variable, nt!MmPteBase, to know instantly where the base of the page table page allocations begins. Previously this symbol did not exist in ntoskrnl.exe, because the base 0xFFFFF68000000000 was hardcoded. The addresses of the first and last page table pages are going to be:
first last
* pml4e_offset : 0x1ed 0x1ed
* pdpe_offset : 0x000 0x1ff
* pde_offset : 0x000 0x1ff
* pte_offset : 0x000 0x1ff
* offset : 0x000 0x000
This gives 0xFFFFF68000000000 for the first and 0xFFFFF6FFFFFFF000 for the last page table page when the PML4E index is 0x1ed. PDEs + PDPTEs + PML4Es + PTEs are assigned in this range.
Therefore, to translate a virtual address to its PTE virtual address (and !pte2va is the reverse of this), you affix 111101101 to the start of the virtual address, truncate the last 12 bits (the page offset, which is no longer useful), and multiply by 8 bytes (the PTE size), i.e. append 3 zero bits, which creates a new page offset: the last-level index into the page that contains the PTEs, times the size of a PTE structure. Concatenating the PML4E index to the start simply causes the walk to loop back one extra time, so that you get the PTE itself rather than what the PTE points to. Concatenating it to the start is the same thing as adding the shifted address to MmPteBase.
Here is simple C++ code to do it (the base must be unsigned, since 0xFFFFF68000000000 does not fit in a signed long long):
// pte.cpp
#include <iostream>
#include <string>
int main(int argc, char *argv[]) {
if (argc < 2) { std::cerr << "usage: pte <hex virtual address>\n"; return 1; }
unsigned long long int input = std::stoull(argv[1], nullptr, 16);
unsigned long long int ptebase = 0xFFFFF68000000000; // assumes the fixed pre-1607 PML4E index 0x1ed
unsigned long long int pteaddress = ptebase + ((input >> 12) << 3); // drop page offset, times sizeof(PTE)
std::cout << "0x" << std::hex << pteaddress;
}
C:\> pte 13fe60000
0xfffff680009ff300
To get the PDE virtual address you affix it twice, truncate the last 21 bits, and multiply by 8. This is how !pte works, and it is the opposite of !pte2va.
Similarly, PDEs + PDPTEs + PML4Es are assigned in the range:
first last
* pml4e_offset : 0x1ed 0x1ed
* pdpe_offset : 0x1ed 0x1ed
* pde_offset : 0x000 0x1ff
* pte_offset : 0x000 0x1ff
* offset : 0x000 0x000
Because when you get to 0x1ed for the pdpte offset within the page table page range, all of a sudden, you are looping back in the PML4 once again, so you get the PDE.
If it says there is no PTE for an address within a virtual page whose corresponding physical frame is shown to be part of the working set by VMMap, then you might be experiencing my issue: when doing live kernel debugging (local or remote), you need to use .process /P to explicitly tell the debugger that you want to translate user and kernel addresses in the context of the process, not the debugger.
I have found that since the Windows 10 Anniversary Update (1607, 10.0.14393) the PML4 table has been randomized to mitigate kernel heap spraying.
It means that the page table is probably not placed at 0xFFFFF68000000000.

How to maximize data transfer speed over USB (configured as virtual com port)

I'm having trouble getting my streaming over OTG-USB-FS configured as a VCP to work. At my disposal I have a Nucleo-H743ZI board that seems to do a good job of sending data, but on the PC side I have a problem receiving that data.
for(;;) {
#define number_of_ccr 1024
unsigned int lpBuffer[number_of_ccr] = {0};
unsigned long nNumberOfBytesToRead = number_of_ccr*4;
unsigned long lpNumberOfBytesRead;
QueryPerformanceCounter(&startCounter);
ReadFile(
hSerial,
lpBuffer,
nNumberOfBytesToRead,
&lpNumberOfBytesRead,
NULL
);
if(!strcmp((char *)lpBuffer, "end\r\n")) { // cast: lpBuffer is unsigned int[]; zero-init keeps it NUL-terminated
CloseHandle(FileHandle); // close the output file; hSerial stays open
fprintf(stderr, "end flag was received\n");
break;
}
else if(lpNumberOfBytesRead > 0) {
// NOTE(): succeed
QueryPerformanceCounter(&endCounter);
time = Win32GetSecondsElapsed(startCounter, endCounter);
char *copyString = "copy";
WriteFile(hSerial, copyString , strlen(copyString), &bytes_written, NULL);
DWORD BytesWritten;
// write data to file
WriteFile(FileHandle, lpBuffer, nNumberOfBytesToRead, &BytesWritten, 0);
}
}
QPC shows 0.00733297970 s for one successful data-block transfer (1024*4 bytes).
This is the listener code. I bet this is not how it should be done, so I'm here to seek advice. I was hoping that full streaming without control sequences ("copy") would be possible, but in that case I can't receive adjacent data (within one transfer block it's OKAY, but two consecutive received blocks aren't adjacent).
Example:
block_1: 1 2 3 4 5 6
block_2: 13 14 15 16 17 18
Is there any way to speed up my receiving?
(I tried the /O2 compiler switch, without any success.)
You need to configure a buffer on the PC side that is 2 or 3 times the size of the buffer you transfer from your board, and use something like a double-buffer scheme for transferring the data: you receive into the first buffer while processing the second, then alternate.
A good thing to do is to activate the caches and place the buffers in fast memory on the STM32H7 (its D1-domain RAM).
But if your interface does not match the speed you need, no tricks will help. Except maybe one: if your controller is fast enough, you can implement lossless compression and transfer compressed data. If you transmit low-entropy data, this can give you a solid boost in speed.
