(NEAR protocol) How to get the Validator Node up

I am trying to run a validator node following the https://docs.near.org/docs/develop/node/validator/deploy-on-mainnet instructions. I have successfully deployed the mainnet staking pool with the following command (step 2 of the guide):
near call poolv1.near create_staking_pool '{"staking_pool_id":"<name_of_pool>", "owner_id":"<wallet_name>.near", "stake_public_key":"ed25519:3QohztWwCktk3j3MBiCuGaB6vXxeqjUasLan6ChSnduh", "reward_fee_fraction": {"numerator": 3, "denominator": 100}}' --account_id <wallet_name>.near --amount 30 --gas 300000000000000
The transaction went through:
https://explorer.mainnet.near.org/transactions/93xQC8UozL6toVddkPk14qiExdRZMt3gJqCfHz9BBNpV
But after starting the NEAR node, database synchronization does not start (step 3 of the guide):
target/release/neard run
The node listens on ports 3030 and 24567, and both ports are open in the firewall.
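To see whether the node is making progress at all, the local RPC can be queried directly (a quick check; the RPC status endpoint is served on port 3030):

# Query the local RPC; sync_info in the response shows the latest block height and a syncing flag.
curl -s http://127.0.0.1:3030/status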

It turned out I needed to add boot_nodes to config.json:
rm ~/.near/config.json
wget -O ~/.near/config.json https://s3-us-west-1.amazonaws.com/build.nearprotocol.com/nearcore-deploy/mainnet/config.json
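Alternatively, only the network.boot_nodes field could be patched, so other local edits to config.json survive (a sketch, assuming jq is installed; the peer entry below is a placeholder, not a real boot node):

# Patch boot_nodes in place; replace the placeholder with real boot node addresses.
jq '.network.boot_nodes = "ed25519:<node_key>@<ip>:24567"' ~/.near/config.json > /tmp/config.json && mv /tmp/config.json ~/.near/config.json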
With the new config in place, the node started syncing:
Nov 01 12:08:09.253 INFO stats: #51601485 EgEtVtQuGmvAkpRdjMb6rRwc7qjh2wYEKtmoWuqTDxf5 -/60 33/27/40 peers ⬇ 314.7kiB/s ⬆ 252.9kiB/s 0.80 bps 100.56 Tgas/s CPU: 26%, Mem: 6.0 GiB
Nov 01 12:08:19.256 INFO stats: #51601494 73BhJpUURFQU36Npgi5VajgJxPNdfuF3TJKFHneeA3Pd -/60 33/27/40 peers ⬇ 318.9kiB/s ⬆ 253.2kiB/s 0.90 bps 109.87 Tgas/s CPU: 24%, Mem: 6.0 GiB
Nov 01 12:08:29.299 INFO stats: #51601503 4o7iuNUaypY6KQphhrEYoGU5ZRECyPwyM7Vge8AVnsYq -/60 33/27/40 peers ⬇ 313.7kiB/s ⬆ 247.1kiB/s 0.90 bps 116.37 Tgas/s CPU: 28%, Mem: 6.0 GiB
Nov 01 12:08:39.303 INFO stats: #51601511 GxQePJAQBrirSNwt7uraNeWqXGFZG1YkpKLJfX4Wd9Pb -/60 33/27/40 peers ⬇ 313.6kiB/s ⬆ 247.9kiB/s 0.80 bps 78.89 Tgas/s CPU: 22%, Mem: 6.0 GiB
Nov 01 12:08:49.306 INFO stats: #51601520 9HzEv22KpcGmip83s9T3FqmKB9qbH6Mu1Cdh8cULf9BN -/60 33/27/40 peers ⬇ 319.0kiB/s ⬆ 250.8kiB/s 0.90 bps 98.36 Tgas/s CPU: 25%, Mem: 6.0 GiB
Nov 01 12:08:59.309 INFO stats: #51601529 7NaEtQW8qp1jtZ3CbzRFcjzjjFx4DLtT9bUCc1CVRCB8 -/60 33/27/40 peers ⬇ 318.7kiB/s ⬆ 250.9kiB/s 0.90 bps 115.87 Tgas/s CPU: 24%, Mem: 6.0 GiB

Related

Cannot download blocks

I'm trying to run an RPC node. After downloading the headers, it shows some warnings and never starts downloading blocks. Can anyone help me?
Mar 27 00:06:43.791 INFO neard: Version: 1.25.0, Build: crates-0.12.0-31-g9b3d6ba55, Latest Protocol: 52
Mar 27 00:06:43.796 INFO near: Opening store database at "/home/aurora/.near/mainnet/data"
Mar 27 00:06:44.033 INFO near_network::peer_manager::peer_manager_actor: Bandwidth stats total_bandwidth_used_by_all_peers=0 total_msg_received_count=0 max_max_record_num_messages_in_progress=0
Mar 27 00:06:54.078 INFO stats: #62165908 Waiting for peers 1 peer ⬇ 0.1kiB/s ⬆ 3 B/s 0.00 bps 0 gas/s
Mar 27 00:07:04.080 INFO stats: #62165908 Waiting for peers 2 peers ⬇ 507.6kiB/s ⬆ 62.4kiB/s 0.00 bps 0 gas/s CPU: 30%, Mem: 145.0 MiB
Mar 27 00:07:14.082 INFO stats: #62165908 Waiting for peers 2 peers ⬇ 1015.8kiB/s ⬆ 616.2kiB/s 0.00 bps 0 gas/s CPU: 156%, Mem: 206.6 MiB
Mar 27 00:07:24.085 INFO stats: #62165908 Downloading headers 100.00% (468) 4 peers ⬇ 1018.6kiB/s ⬆ 2.0MiB/s 0.00 bps 0 gas/s CPU: 14%, Mem: 314.0 MiB
Mar 27 00:07:34.088 INFO stats: #62165908 Downloading headers 100.00% (473) 5 peers ⬇ 1.5MiB/s ⬆ 2.0MiB/s 0.00 bps 0 gas/s CPU: 38%, Mem: 366.0 MiB
Mar 27 00:07:44.091 INFO stats: #62165908 Downloading headers 100.00% (484) 5 peers ⬇ 2.0MiB/s ⬆ 2.5MiB/s 0.00 bps 0 gas/s CPU: 7%, Mem: 378.2 MiB
Mar 27 00:07:44.362 WARN near_network::peer_manager::peer_manager_actor: Peer bandwidth exceeded threshold peer_id=ed25519:DCiEjHES1eRwj8zYbU5EdFyWQRa8zrjq7hhrBmb3Seop bandwidth_used=31287035 msg_received_count=124
Mar 27 00:07:44.362 WARN near_network::peer_manager::peer_manager_actor: Peer bandwidth exceeded threshold peer_id=ed25519:5DNVteGRxgUv4WSpJu4337aQud5P9m8TN1uAwAGzTjFP bandwidth_used=31235365 msg_received_count=123
Mar 27 00:07:44.362 WARN near_network::peer_manager::peer_manager_actor: Peer bandwidth exceeded threshold peer_id=ed25519:EVqGxpKoP3rxqMruuTmkQrhZnVE3um8XF2gwap4HhVqd bandwidth_used=31395111 msg_received_count=248
Mar 27 00:07:44.362 WARN near_network::peer_manager::peer_manager_actor: Peer bandwidth exceeded threshold peer_id=ed25519:CzYbBGrPx3XrdJeoNHs1YAz7TY3garm1RhZqruJXGPY5 bandwidth_used=34275095 msg_received_count=383
Mar 27 00:07:44.362 WARN near_network::peer_manager::peer_manager_actor: Peer bandwidth exceeded threshold peer_id=ed25519:GVWThRijD1ZyqTf6pNvS4BBYBd9gCB2P1Qjfi9DAMi7G bandwidth_used=10089910 msg_received_count=2
Mar 27 00:07:44.362 INFO near_network::peer_manager::peer_manager_actor: Bandwidth stats total_bandwidth_used_by_all_peers=138282516 total_msg_received_count=880 max_max_record_num_messages_in_progress=52
Mar 27 00:08:44.364 INFO near_network::peer_manager::peer_manager_actor: Bandwidth stats total_bandwidth_used_by_all_peers=16194710 total_msg_received_count=1826 max_max_record_num_messages_in_progress=53
Mar 27 00:09:44.365 INFO near_network::peer_manager::peer_manager_actor: Bandwidth stats total_bandwidth_used_by_all_peers=6119474 total_msg_received_count=1653 max_max_record_num_messages_in_progress=85
Mar 27 00:10:44.367 INFO near_network::peer_manager::peer_manager_actor: Bandwidth stats total_bandwidth_used_by_all_peers=10920258 total_msg_received_count=1615 max_max_record_num_messages_in_progress=110
Mar 27 00:11:44.368 INFO near_network::peer_manager::peer_manager_actor: Bandwidth stats total_bandwidth_used_by_all_peers=4072216 total_msg_received_count=1857 max_max_record_num_messages_in_progress=169
Mar 27 00:12:44.369 INFO near_network::peer_manager::peer_manager_actor: Bandwidth stats total_bandwidth_used_by_all_peers=1962496 total_msg_received_count=1741 max_max_record_num_messages_in_progress=249
Mar 27 00:13:44.370 INFO near_network::peer_manager::peer_manager_actor: Bandwidth stats total_bandwidth_used_by_all_peers=2022483 total_msg_received_count=1689 max_max_record_num_messages_in_progress=290
Mar 27 00:14:44.373 INFO near_network::peer_manager::peer_manager_actor: Bandwidth stats total_bandwidth_used_by_all_peers=1901363 total_msg_received_count=1710 max_max_record_num_messages_in_progress=332
Mar 27 00:15:44.374 INFO near_network::peer_manager::peer_manager_actor: Bandwidth stats total_bandwidth_used_by_all_peers=1901830 total_msg_received_count=1635 max_max_record_num_messages_in_progress=372
Mar 27 00:16:44.376 INFO near_network::peer_manager::peer_manager_actor: Bandwidth stats total_bandwidth_used_by_all_peers=1944505 total_msg_received_count=1753 max_max_record_num_messages_in_progress=414

near indexer does not add anything to the database

I've tried to run https://github.com/near/near-indexer-for-explorer
There is no firewall, and the IP is accessible (tested just now).
With an empty data directory, it waits for peers forever. With the data directory from a run started a few days ago:
./target/release/indexer-explorer --home-dir ../.near/mainnet run --store-genesis --stream-while-syncing --allow-missing-relations-in-first-blocks 1000 sync-from-latest
It does something:
Nov 01 18:42:23.293 INFO indexer_for_explorer: AccessKeys from genesis were added/updated successful.
Nov 01 18:42:33.188 INFO stats: # 9820210 Waiting for peers 1/1/40 peers ⬇ 0 B/s ⬆ 0 B/s 0.00 bps 0 gas/s CPU: 0%, Mem: 0 B
Nov 01 18:42:43.190 INFO stats: # 9820210 Downloading headers 68.72% (13074549) 3/3/40 peers ⬇ 149.3kiB/s ⬆ 6.0kiB/s 0.00 bps 0 gas/s CPU: 23%, Mem: 510.7 MiB
Nov 01 18:42:53.192 INFO stats: # 9820210 Downloading headers 68.72% (13074559) 2/2/40 peers ⬇ 299.4kiB/s ⬆ 297.5kiB/s 0.00 bps 0 gas/s CPU: 40%, Mem: 621.3 MiB
Nov 01 18:43:03.194 INFO stats: # 9820210 Downloading headers 68.72% (13074569) 1/1/40 peers ⬇ 150.1kiB/s ⬆ 148.9kiB/s 0.00 bps 0 gas/s CPU: 42%, Mem: 520.7 MiB
Nov 01 18:43:13.196 INFO stats: # 9820210 Downloading headers 68.72% (13074578) 2/2/40 peers ⬇ 150.3kiB/s ⬆ 148.8kiB/s 0.00 bps 0 gas/s CPU: 10%, Mem: 631.6 MiB
Nov 01 18:43:23.198 INFO stats: # 9820210 Downloading headers 68.72% (13074590) 2/1/40 peers ⬇ 294.1kiB/s ⬆ 297.6kiB/s 0.00 bps 0 gas/s CPU: 14%, Mem: 601.5 MiB
Nov 01 18:43:33.200 INFO stats: # 9820210 Downloading headers 68.72% (13074598) 1/1/40 peers ⬇ 149.4kiB/s ⬆ 148.8kiB/s 0.00 bps 0 gas/s CPU: 2%, Mem: 602.9 MiB
Nov 01 18:43:43.203 INFO stats: # 9820210 EPnLgE7iEq9s7yTkos96M3cWymH5avBAPm3qx3NXqR8H -/4 2/2/40 peers ⬇ 150.0kiB/s ⬆ 148.8kiB/s 0.00 bps 0 gas/s CPU: 9%, Mem: 657.0 MiB
Nov 01 18:43:53.209 INFO stats: # 9820210 Downloading headers 68.72% (13074608) 1/1/40 peers ⬇ 150.5kiB/s ⬆ 148.8kiB/s 0.00 bps 0 gas/s CPU: 3%, Mem: 661.0 MiB
Nov 01 18:44:03.212 INFO stats: # 9820210 EPnLgE7iEq9s7yTkos96M3cWymH5avBAPm3qx3NXqR8H -/4 1/1/40 peers ⬇ 148.6kiB/s ⬆ 148.8kiB/s 0.00 bps 0 gas/s CPU: 4%, Mem: 664.8 MiB
Nov 01 18:44:13.213 INFO stats: # 9820210 EPnLgE7iEq9s7yTkos96M3cWymH5avBAPm3qx3NXqR8H -/4 0/0/40 peers ⬇ 0 B/s ⬆ 0 B/s 0.00 bps 0 gas/s CPU: 2%, Mem: 664.8 MiB
Nov 01 18:44:23.215 INFO stats: # 9820210 EPnLgE7iEq9s7yTkos96M3cWymH5avBAPm3qx3NXqR8H -/4 0/0/40 peers ⬇ 0 B/s ⬆ 0 B/s 0.00 bps 0 gas/s CPU: 1%, Mem: 666.8 MiB
Nov 01 18:44:33.217 INFO stats: # 9820210 Downloading headers 68.72% (13074655) 1/1/40 peers ⬇ 150.0kiB/s ⬆ 148.8kiB/s 0.00 bps 0 gas/s CPU: 11%, Mem: 614.7 MiB
Nov 01 18:44:43.219 INFO stats: # 9820210 EPnLgE7iEq9s7yTkos96M3cWymH5avBAPm3qx3NXqR8H -/4 0/0/40 peers ⬇ 0 B/s ⬆ 0 B/s 0.00 bps 0 gas/s CPU: 1%, Mem: 614.9 MiB
Nov 01 18:44:53.224 INFO stats: # 9820210 EPnLgE7iEq9s7yTkos96M3cWymH5avBAPm3qx3NXqR8H -/4 0/0/40 peers ⬇ 0 B/s ⬆ 0 B/s 0.00 bps 0 gas/s CPU: 1%, Mem: 614.9 MiB
Nov 01 18:45:03.227 INFO stats: # 9820210 EPnLgE7iEq9s7yTkos96M3cWymH5avBAPm3qx3NXqR8H -/4 0/0/40 peers ⬇ 0 B/s ⬆ 0 B/s 0.00 bps 0 gas/s CPU: 1%, Mem: 616.4 MiB
Nov 01 18:45:13.232 INFO stats: # 9820210 EPnLgE7iEq9s7yTkos96M3cWymH5avBAPm3qx3NXqR8H -/4 0/0/40 peers ⬇ 0 B/s ⬆ 0 B/s 0.00 bps 0 gas/s CPU: 1%, Mem: 616.4 MiB
But nothing gets added to the database.
What am I doing wrong?
Your concern about the indexing part (no data in the database) will get resolved once the node reaches the "syncing blocks" stage. Currently, your node is still at the "syncing block headers" stage. To speed up this process, start from a backup: https://docs.near.org/docs/develop/node/validator/running-a-node#starting-a-node-from-backup
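For reference, starting from a backup amounts to replacing the node's data directory with an unpacked snapshot before restarting (a sketch, assuming the default ~/.near home; <SNAPSHOT_URL> is a placeholder for the current link on the docs page above):

# Stop neard first, then swap in the snapshot; <SNAPSHOT_URL> is a placeholder.
rm -rf ~/.near/data
wget -c <SNAPSHOT_URL> -O - | tar -x -C ~/.near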
As to the fact that the node dropped off the p2p network, I have no clue why that could have happened. I recommend starting with a plain neard node and reporting any issues there before you get to the Indexer (which is just an extension of nearcore, so you can use the same home and data folders). A minimal invocation is sketched below.
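For example (a sketch, reusing the home directory from the indexer command above):

# Run plain neard against the same home directory the indexer was using.
./target/release/neard --home ../.near/mainnet run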
First of all, the indexer requires a full archive node. The links to the 5-epoch backups are misleading; they are not usable for the indexer.
Second (this may save you a lot of download time): the indexer requires the AVX CPU extension to run. If your CPU does not support AVX, don't bother building nearcore. That should be mentioned in the docs: nearcore depends on a wasm runtime, and that runtime requires AVX. Without AVX, the indexer will run for some time and then fail miserably.
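A quick way to check for AVX on Linux before building (no output means no AVX):

# Print the AVX-related CPU flags, if any.
grep -o 'avx[^ ]*' /proc/cpuinfo | sort -u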

Hadoop: Diagnose a long-running job

I need help diagnosing why a particular job in the JobTracker is long-running, and finding workarounds for improving it.
Here is an excerpt of the job in question (please pardon the formatting):
Hadoop job_201901281553_38848
User: mapred
Job-ACLs: All users are allowed
Job Setup: Successful
Status: Running
Started at: Fri Feb 01 12:39:05 CST 2019
Running for: 3hrs, 23mins, 58sec
Job Cleanup: Pending
Kind    % Complete  Num Tasks  Pending  Running  Complete  Killed  Failed/Killed Task Attempts
map     100.00%     1177       0        0        1177      0       0 / 0
reduce  95.20%      12         0        2        10        0       0 / 0
Counter Map Reduce Total
File System Counters FILE: Number of bytes read 1,144,088,621 1,642,723,691 2,786,812,312
FILE: Number of bytes written 3,156,884,366 1,669,567,665 4,826,452,031
FILE: Number of read operations 0 0 0
FILE: Number of large read operations 0 0 0
FILE: Number of write operations 0 0 0
HDFS: Number of bytes read 11,418,749,621 0 11,418,749,621
HDFS: Number of bytes written 0 8,259,932,078 8,259,932,078
HDFS: Number of read operations 2,365 5 2,370
HDFS: Number of large read operations 0 0 0
HDFS: Number of write operations 0 12 12
Job Counters Launched map tasks 0 0 1,177
Launched reduce tasks 0 0 12
Data-local map tasks 0 0 1,020
Rack-local map tasks 0 0 157
Total time spent by all maps in occupied slots (ms) 0 0 4,379,522
Total time spent by all reduces in occupied slots (ms) 0 0 81,115,664
Map-Reduce Framework Map input records 77,266,616 0 77,266,616
Map output records 77,266,616 0 77,266,616
Map output bytes 11,442,228,060 0 11,442,228,060
Input split bytes 177,727 0 177,727
Combine input records 0 0 0
Combine output records 0 0 0
Reduce input groups 0 37,799,412 37,799,412
Reduce shuffle bytes 0 1,853,727,946 1,853,727,946
Reduce input records 0 76,428,913 76,428,913
Reduce output records 0 48,958,874 48,958,874
Spilled Records 112,586,947 62,608,254 175,195,201
CPU time spent (ms) 2,461,980 14,831,230 17,293,210
Physical memory (bytes) snapshot 366,933,626,880 9,982,947,328 376,916,574,208
Virtual memory (bytes) snapshot 2,219,448,848,384 23,215,755,264 2,242,664,603,648
Total committed heap usage (bytes) 1,211,341,733,888 8,609,333,248 1,219,951,067,136
AcsReducer ColumnDeletesOnTable- 0 3,284,862 3,284,862
ColumnDeletesOnTable- 0 3,285,695 3,285,695
ColumnDeletesOnTable- 0 3,284,862 3,284,862
ColumnDeletesOnTable- 0 129,653 129,653
ColumnDeletesOnTable- 0 129,653 129,653
ColumnDeletesOnTable- 0 129,653 129,653
ColumnDeletesOnTable- 0 129,653 129,653
ColumnDeletesOnTable- 0 517,641 517,641
ColumnDeletesOnTable- 0 23,786 23,786
ColumnDeletesOnTable- 0 594,872 594,872
ColumnDeletesOnTable- 0 597,739 597,739
ColumnDeletesOnTable- 0 595,665 595,665
ColumnDeletesOnTable- 0 36,101,345 36,101,345
ColumnDeletesOnTable- 0 11,791 11,791
ColumnDeletesOnTable- 0 11,898 11,898
ColumnDeletesOnTable-0 176 176
RowDeletesOnTable- 0 224,044 224,044
RowDeletesOnTable- 0 224,045 224,045
RowDeletesOnTable- 0 224,044 224,044
RowDeletesOnTable- 0 17,425 17,425
RowDeletesOnTable- 0 17,425 17,425
RowDeletesOnTable- 0 17,425 17,425
RowDeletesOnTable- 0 17,425 17,425
RowDeletesOnTable- 0 459,890 459,890
RowDeletesOnTable- 0 23,786 23,786
RowDeletesOnTable- 0 105,910 105,910
RowDeletesOnTable- 0 107,829 107,829
RowDeletesOnTable- 0 105,909 105,909
RowDeletesOnTable- 0 36,101,345 36,101,345
RowDeletesOnTable- 0 11,353 11,353
RowDeletesOnTable- 0 11,459 11,459
RowDeletesOnTable- 0 168 168
WholeRowDeletesOnTable- 0 129,930 129,930
deleteRowsCount 0 37,799,410 37,799,410
deleteRowsMicros 0 104,579,855,042 104,579,855,042
emitCount 0 48,958,874 48,958,874
emitMicros 0 201,996,180 201,996,180
rollupValuesCount 0 37,799,412 37,799,412
rollupValuesMicros 0 234,085,342 234,085,342
As you can see, it's been running for almost 3.5 hours now. There were 1,177 map tasks, and they completed some time ago. The reduce phase is incomplete at 95%.
So I drill into the 'reduce' link and it takes me to the task list. Drilling into the first incomplete task, here it is:
Job job_201901281553_38848
All Task Attempts
Task Attempt: attempt_201901281553_38848_r_000000_0
Status: RUNNING, 70.81% progress
Start Time: 2/1/2019 12:39
Shuffle Finished: 1-Feb-2019 12:39:59 (18sec)
Sort Finished: 1-Feb-2019 12:40:01 (2sec)
Counters: 60
From there I can see the machine/datanode running the task, so I SSH into it and look at the log (filtering on just the task in question).
From the datanode, in /var/log/hadoop-0.20-mapreduce/hadoop-mapred-tasktracker-.log:
2019-02-01 12:39:40,836 INFO org.apache.hadoop.mapred.TaskTracker: LaunchTaskAction (registerTask): attempt_201901281553_38848_r_000000_0 task's state:UNASSIGNED
2019-02-01 12:39:40,838 INFO org.apache.hadoop.mapred.TaskTracker: Trying to launch : attempt_201901281553_38848_r_000000_0 which needs 1 slots
2019-02-01 12:39:40,838 INFO org.apache.hadoop.mapred.TaskTracker: In TaskLauncher, current free slots : 21 and trying to launch attempt_201901281553_38848_r_000000_0 which needs 1 slots
2019-02-01 12:39:40,925 INFO org.apache.hadoop.mapred.TaskController: Writing commands to /disk12/mapreduce/tmp-map-data/ttprivate/taskTracker/mapred/jobcache/job_201901281553_38848/attempt_201901281553_38848_r_000000_0/taskjvm.sh
2019-02-01 12:39:41,904 INFO org.apache.hadoop.mapred.TaskTracker: JVM with ID: jvm_201901281553_38848_r_-819481850 given task: attempt_201901281553_38848_r_000000_0
2019-02-01 12:39:49,011 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201901281553_38848_r_000000_0 0.09402435% reduce > copy (332 of 1177 at 23.66 MB/s) >
2019-02-01 12:39:56,250 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201901281553_38848_r_000000_0 0.25233644% reduce > copy (891 of 1177 at 12.31 MB/s) >
2019-02-01 12:39:59,206 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201901281553_38848_r_000000_0 0.25233644% reduce > copy (891 of 1177 at 12.31 MB/s) >
2019-02-01 12:39:59,350 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201901281553_38848_r_000000_0 0.33333334% reduce > sort
2019-02-01 12:40:01,599 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201901281553_38848_r_000000_0 0.33333334% reduce > sort
2019-02-01 12:40:02,469 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201901281553_38848_r_000000_0 0.6667039% reduce > reduce
2019-02-01 12:40:05,565 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201901281553_38848_r_000000_0 0.6667039% reduce > reduce
2019-02-01 12:40:11,666 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201901281553_38848_r_000000_0 0.6668788% reduce > reduce
2019-02-01 12:40:14,755 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201901281553_38848_r_000000_0 0.66691136% reduce > reduce
2019-02-01 12:40:17,838 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201901281553_38848_r_000000_0 0.6670001% reduce > reduce
2019-02-01 12:40:20,930 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201901281553_38848_r_000000_0 0.6671631% reduce > reduce
2019-02-01 12:40:24,016 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201901281553_38848_r_000000_0 0.6672566% reduce > reduce
...and these lines repeat in this manner for hours.
So it appears the shuffle/sort phases went very quickly, but after that the reduce phase just crawls: the percentage increases slowly, and it takes hours before the task completes.
1) That looks like the bottleneck here. Am I correct in identifying that the cause of my long-running job is this task (and many tasks like it) taking a very long time in the reduce phase?
2) If so, what are my options for speeding it up?
Load appears to be reasonably low on the datanode assigned that task, as well as its iowait:
top - 15:20:03 up 124 days, 1:04, 1 user, load average: 3.85, 5.64, 5.96
Tasks: 1095 total, 2 running, 1092 sleeping, 0 stopped, 1 zombie
Cpu(s): 3.8%us, 1.5%sy, 0.9%ni, 93.6%id, 0.2%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 503.498G total, 495.180G used, 8517.543M free, 5397.789M buffers
Swap: 2046.996M total, 0.000k used, 2046.996M free, 432.468G cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
82236 hbase 20 0 16.9g 16g 17m S 136.9 3.3 26049:16 java
30143 root 39 19 743m 621m 13m R 82.3 0.1 1782:06 clamscan
62024 mapred 20 0 2240m 1.0g 24m S 75.1 0.2 1:21.28 java
36367 mapred 20 0 1913m 848m 24m S 11.2 0.2 22:56.98 java
36567 mapred 20 0 1898m 825m 24m S 9.5 0.2 22:23.32 java
36333 mapred 20 0 1879m 880m 24m S 8.2 0.2 22:44.28 java
36374 mapred 20 0 1890m 831m 24m S 6.9 0.2 23:15.65 java
and a snippet of iostat -xm 4:
avg-cpu: %user %nice %system %iowait %steal %idle
2.15 0.92 0.30 0.17 0.00 96.46
Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sda 0.00 350.25 0.00 30.00 0.00 1.49 101.67 0.02 0.71 0.00 0.71 0.04 0.12
sdb 0.00 2.75 0.00 6.00 0.00 0.03 11.67 0.00 0.00 0.00 0.00 0.00 0.00
sdd 0.00 9.75 0.00 1.25 0.00 0.04 70.40 0.00 0.00 0.00 0.00 0.00 0.00
sdf 0.00 6.50 0.00 0.75 0.00 0.03 77.33 0.00 0.00 0.00 0.00 0.00 0.00
sdg 0.00 5.75 0.00 0.50 0.00 0.02 100.00 0.00 0.00 0.00 0.00 0.00 0.00
sdc 0.00 8.00 0.00 0.75 0.00 0.03 93.33 0.00 0.00 0.00 0.00 0.00 0.00
sdh 0.00 6.25 0.00 0.50 0.00 0.03 108.00 0.00 0.00 0.00 0.00 0.00 0.00
sdi 0.00 3.75 93.25 0.50 9.03 0.02 197.57 0.32 3.18 3.20 0.00 1.95 18.30
sdj 0.00 3.50 0.00 0.50 0.00 0.02 64.00 0.00 0.00 0.00 0.00 0.00 0.00
sdk 0.00 7.00 0.00 0.75 0.00 0.03 82.67 0.00 0.33 0.00 0.33 0.33 0.03
sdl 0.00 6.75 0.00 0.75 0.00 0.03 80.00 0.00 0.00 0.00 0.00 0.00 0.00
sdm 0.00 7.75 0.00 5.75 0.00 0.05 18.78 0.00 0.04 0.00 0.04 0.04 0.03
#<machine>:~$ df -h
Filesystem Size Used Avail Use% Mounted on
/dev/sda3 40G 5.9G 32G 16% /
tmpfs 252G 0 252G 0% /dev/shm
/dev/sda1 488M 113M 350M 25% /boot
/dev/sda8 57G 460M 54G 1% /tmp
/dev/sda7 9.8G 1.1G 8.2G 12% /var
/dev/sda5 40G 17G 21G 45% /var/log
/dev/sda6 30G 4.4G 24G 16% /var/log/audit.d
/dev/sdb1 7.2T 3.3T 3.6T 48% /disk1
/dev/sdc1 7.2T 3.3T 3.6T 49% /disk2
/dev/sdd1 7.2T 3.3T 3.6T 48% /disk3
/dev/sde1 7.2T 3.3T 3.6T 48% /disk4
/dev/sdf1 7.2T 3.3T 3.6T 48% /disk5
/dev/sdi1 7.2T 3.3T 3.6T 48% /disk6
/dev/sdg1 7.2T 3.3T 3.6T 48% /disk7
/dev/sdh1 7.2T 3.3T 3.6T 48% /disk8
/dev/sdj1 7.2T 3.3T 3.6T 48% /disk9
/dev/sdk1 7.2T 3.3T 3.6T 48% /disk10
/dev/sdm1 7.2T 3.3T 3.6T 48% /disk11
/dev/sdl1 7.2T 3.3T 3.6T 48% /disk12
This is Hadoop 2.0.0-cdh4.3.0. It's highly available, with 3 ZooKeeper nodes, 2 NameNodes, and 35 DataNodes. YARN is not installed. We use HBase and Oozie; jobs mainly come in via Hive and Hue.
Each DataNode has 2 physical CPUs, each with 22 cores; hyper-threading is enabled.
If you need more information, please let me know. My guess is that I need more reducers, that some mapred-site.xml settings need tuning, that the input data from the map phase is too large, or that the Hive query needs to be written better. I'm a fairly new Hadoop administrator, so any detailed advice is appreciated.
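For example, this is how the reducer count could be raised for a single Hive run (a sketch; mapred.reduce.tasks is the MRv1 property name, since YARN is not installed, and 48 is only an illustrative value, not a recommendation):

# Run the original query with more reducers; <original query> is a placeholder.
hive -e 'SET mapred.reduce.tasks=48; <original query>'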
Thanks!

Phoneme Recognition with PocketSphinx

I need real-time phoneme recognition from the microphone on a Windows 8 desktop. So I followed http://cmusphinx.sourceforge.net/wiki/phonemerecognition and built pocketsphinx_continuous from the Subversion source in VS2013. I ran it on the command line as Administrator:
D:\_SPHINX\cmusphinx-code-13103-trunk\pocketsphinx\bin\Release\Win32>pocketsphinx_continuous.exe -infile ../../../test/data/goforward.raw -hmm ../../../model/en-us/en-us -allphone ../../../model/en-us/en-us-phone.lm.bin -backtrace yes -beam 1e-20 -pbeam 1e-20 -lw 2.0
INFO: pocketsphinx.c(145): Parsed model-specific feature parameters from ../../../model/en-us/en-us/feat.params
Current configuration:
[NAME] [DEFLT] [VALUE]
-agc none none
-agcthresh 2.0 2.000000e+000
-allphone ../../../model/en-us/en-us-phone.lm.bin
-allphone_ci no no
-alpha 0.97 9.700000e-001
-ascale 20.0 2.000000e+001
-aw 1 1
-backtrace no yes
-beam 1e-48 1.000000e-020
-bestpath yes yes
-bestpathlw 9.5 9.500000e+000
-ceplen 13 13
-cmn current current
-cmninit 8.0 40,3,-1
-compallsen no no
-debug 0
-dict
-dictcase no no
-dither no no
-doublebw no no
-ds 1 1
-fdict ../../../model/en-us/en-us/noisedict
-feat 1s_c_d_dd 1s_c_d_dd
-featparams ../../../model/en-us/en-us/feat.params
-fillprob 1e-8 1.000000e-008
-frate 100 100
-fsg
-fsgusealtpron yes yes
-fsgusefiller yes yes
-fwdflat yes yes
-fwdflatbeam 1e-64 1.000000e-064
-fwdflatefwid 4 4
-fwdflatlw 8.5 8.500000e+000
-fwdflatsfwin 25 25
-fwdflatwbeam 7e-29 7.000000e-029
-fwdtree yes yes
-hmm ../../../model/en-us/en-us
-input_endian little little
-jsgf
-keyphrase
-kws
-kws_delay 10 10
-kws_plp 1e-1 1.000000e-001
-kws_threshold 1 1.000000e+000
-latsize 5000 5000
-lda
-ldadim 0 0
-lifter 0 22
-lm
-lmctl
-lmname
-logbase 1.0001 1.000100e+000
-logfn
-logspec no no
-lowerf 133.33334 1.300000e+002
-lpbeam 1e-40 1.000000e-040
-lponlybeam 7e-29 7.000000e-029
-lw 6.5 2.000000e+000
-maxhmmpf 30000 30000
-maxwpf -1 -1
-mdef ../../../model/en-us/en-us/mdef
-mean ../../../model/en-us/en-us/means
-mfclogdir
-min_endfr 0 0
-mixw
-mixwfloor 0.0000001 1.000000e-007
-mllr
-mmap yes yes
-ncep 13 13
-nfft 512 512
-nfilt 40 25
-nwpen 1.0 1.000000e+000
-pbeam 1e-48 1.000000e-020
-pip 1.0 1.000000e+000
-pl_beam 1e-10 1.000000e-010
-pl_pbeam 1e-10 1.000000e-010
-pl_pip 1.0 1.000000e+000
-pl_weight 3.0 3.000000e+000
-pl_window 5 5
-rawlogdir
-remove_dc no no
-remove_noise yes yes
-remove_silence yes yes
-round_filters yes yes
-samprate 16000 1.600000e+004
-seed -1 -1
-sendump ../../../model/en-us/en-us/sendump
-senlogdir
-senmgau
-silprob 0.005 5.000000e-003
-smoothspec no no
-svspec 0-12/13-25/26-38
-tmat ../../../model/en-us/en-us/transition_matrices
-tmatfloor 0.0001 1.000000e-004
-topn 4 4
-topn_beam 0 0
-toprule
-transform legacy dct
-unit_area yes yes
-upperf 6855.4976 6.800000e+003
-uw 1.0 1.000000e+000
-vad_postspeech 50 50
-vad_prespeech 20 20
-vad_startspeech 10 10
-vad_threshold 2.0 2.000000e+000
-var ../../../model/en-us/en-us/variances
-varfloor 0.0001 1.000000e-004
-varnorm no no
-verbose no no
-warp_params
-warp_type inverse_linear inverse_linear
-wbeam 7e-29 7.000000e-029
-wip 0.65 6.500000e-001
-wlen 0.025625 2.562500e-002
INFO: feat.c(715): Initializing feature stream to type: '1s_c_d_dd', ceplen=13, CMN='current', VARNORM='no', AGC='none'
INFO: cmn.c(143): mean[0]= 12.00, mean[1..12]= 0.0
INFO: acmod.c(164): Using subvector specification 0-12/13-25/26-38
INFO: mdef.c(518): Reading model definition: ../../../model/en-us/en-us/mdef
INFO: mdef.c(531): Found byte-order mark BMDF, assuming this is a binary mdef file
INFO: bin_mdef.c(336): Reading binary model definition: ../../../model/en-us/en-us/mdef
INFO: bin_mdef.c(516): 42 CI-phone, 137053 CD-phone, 3 emitstate/phone, 126 CI-sen, 5126 Sen, 29324 Sen-Seq
INFO: tmat.c(206): Reading HMM transition probability matrices: ../../../model/en-us/en-us/transition_matrices
At the last INFO line, Windows 8 throws an error [screenshot].
Is anything wrong with the PocketSphinx debug output or my command-line options? Or is it a pure Windows problem? I noticed this folder: /bin/Release/Win32; my Windows 8 is 64-bit, on an Intel NUC. Sphinxbase.dll was compiled from Subversion in Debug mode, while PocketSphinx had only a Release mode.
Also, I read somewhere that phoneme timing information is available; how do I get it?
ADDITION: Following Nikolay's advice, with these parameters I eliminated the errors but got no phonemes:
D:\_SPHINX\pocketsphinx\bin\Debug>pocketsphinx_continuous.exe -infile ../../test/data/goforward.raw -hmm ../../model/en-us/en-us -allphone ../../model/en-us/en-us.lm.dmp -backtrace yes -beam 1e-20 -pbeam 1e-20 -lw 2.0 -debug 3 -verbose yes
INFO: cmd_ln.c(697): Parsing command line:
pocketsphinx_continuous.exe \
-infile ../../test/data/goforward.raw \
-hmm ../../model/en-us/en-us \
-allphone ../../model/en-us/en-us.lm.dmp \
-backtrace yes \
-beam 1e-20 \
-pbeam 1e-20 \
-lw 2.0 \
-debug 3 \
-verbose yes
. . . .
INFO: acmod.c(252): Parsed model-specific feature parameters from ../../model/en-us/en-us/feat.params
INFO: fe_interface.c(177): Current FE Parameters:
INFO: fe_interface.c(178): Sampling Rate: 16000.000000
INFO: fe_interface.c(179): Frame Size: 410
INFO: fe_interface.c(180): Frame Shift: 160
INFO: fe_interface.c(181): FFT Size: 512
INFO: fe_interface.c(183): Lower Frequency: 130
INFO: fe_interface.c(185): Upper Frequency: 6800
INFO: fe_interface.c(186): Number of filters: 25
INFO: fe_interface.c(187): Number of Overflow Samps: 0
INFO: fe_interface.c(188): Start Utt Status: 0
INFO: fe_interface.c(190): Will not remove DC offset at frame level
INFO: fe_interface.c(196): Will not add dither to audio
INFO: fe_interface.c(200): Will apply sine-curve liftering, period 22
INFO: fe_interface.c(203): Will normalize filters to unit area
INFO: fe_interface.c(205): Will round filter frequencies to DFT points
INFO: fe_interface.c(207): Will not use double bandwidth in mel filter
INFO: feat.c(715): Initializing feature stream to type: '1s_c_d_dd', ceplen=13, CMN='current', VARNORM='no', AGC='none'
INFO: cmn.c(143): mean[0]= 12.00, mean[1..12]= 0.0
INFO: acmod.c(171): Using subvector specification 0-12/13-25/26-38
INFO: mdef.c(518): Reading model definition: ../../model/en-us/en-us/mdef
INFO: mdef.c(531): Found byte-order mark BMDF, assuming this is a binary mdef file
INFO: bin_mdef.c(336): Reading binary model definition: ../../model/en-us/en-us/mdef
INFO: bin_mdef.c(516): 42 CI-phone, 137053 CD-phone, 3 emitstate/phone, 126 CI-sen, 5126 Sen, 29324 Sen-Seq
INFO: tmat.c(206): Reading HMM transition probability matrices: ../../model/en-us/en-us/transition_matrices
INFO: acmod.c(124): Attempting to use PTM computation module
INFO: ms_gauden.c(198): Reading mixture gaussian parameter: ../../model/en-us/en-us/means
INFO: ms_gauden.c(292): 42 codebook, 3 feature, size:
INFO: ms_gauden.c(294): 128x13
INFO: ms_gauden.c(294): 128x13
INFO: ms_gauden.c(294): 128x13
INFO: ms_gauden.c(198): Reading mixture gaussian parameter: ../../model/en-us/en-us/variances
INFO: ms_gauden.c(292): 42 codebook, 3 feature, size:
INFO: ms_gauden.c(294): 128x13
INFO: ms_gauden.c(294): 128x13
INFO: ms_gauden.c(294): 128x13
INFO: ms_gauden.c(354): 222 variance values floored
INFO: ptm_mgau.c(476): Loading senones from dump file ../../model/en-us/en-us/sendump
INFO: ptm_mgau.c(500): BEGIN FILE FORMAT DESCRIPTION
INFO: ptm_mgau.c(563): Rows: 128, Columns: 5126
INFO: ptm_mgau.c(595): Using memory-mapped I/O for senones
INFO: ptm_mgau.c(835): Maximum top-N: 4
INFO: phone_loop_search.c(115): State beam -225 Phone exit beam -225 Insertion penalty 0
INFO: dict.c(320): Allocating 4101 * 20 bytes (80 KiB) for word entries
INFO: dict.c(342): Reading filler dictionary: ../../model/en-us/en-us/noisedict
INFO: dict.c(213): Allocated 0 KiB for strings, 0 KiB for phones
INFO: dict.c(345): 5 words read
INFO: dict2pid.c(396): Building PID tables for dictionary
INFO: dict2pid.c(406): Allocating 42^3 * 2 bytes (144 KiB) for word-initial triphones
INFO: dict2pid.c(132): Allocated 21336 bytes (20 KiB) for word-final triphones
INFO: dict2pid.c(196): Allocated 21336 bytes (20 KiB) for single-phone word triphones
INFO: ngram_model_arpa.c(77): No \data\ mark in LM file
INFO: ngram_model_dmp.c(142): Will use memory-mapped I/O for LM file
INFO: ngram_model_dmp.c(196): ngrams 1=19794, 2=1377200, 3=3178194
INFO: ngram_model_dmp.c(242): 19794 = LM.unigrams(+trailer) read
INFO: ngram_model_dmp.c(288): 1377200 = LM.bigrams(+trailer) read
INFO: ngram_model_dmp.c(314): 3178194 = LM.trigrams read
INFO: ngram_model_dmp.c(339): 57155 = LM.prob2 entries read
INFO: ngram_model_dmp.c(359): 10935 = LM.bo_wt2 entries read
INFO: ngram_model_dmp.c(379): 34843 = LM.prob3 entries read
INFO: ngram_model_dmp.c(407): 2690 = LM.tseg_base entries read
INFO: ngram_model_dmp.c(463): 19794 = ascii word strings read
INFO: allphone_search.c(239): Building PHMM net of 137095 phones
INFO: allphone_search.c(312): 29324 nodes, 1958591 links
INFO: allphone_search.c(611): Allphone(beam: -450, pbeam: -450)
INFO: continuous.c(299): pocketsphinx_continuous.exe COMPILED ON: Aug 23 2015, AT: 14:00:33
INFO: cmn_prior.c(131): cmn_prior_update: from < 40.00 3.00 -1.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 >
INFO: cmn_prior.c(149): cmn_prior_update: to < 44.50 -4.13 0.15 6.94 4.06 -5.38 -2.56 -3.13 -6.12 -1.20 -7.44 -2.25 0.48 >
INFO: allphone_search.c(852): 214 frames, 214 HMMs (1/fr), 642 senones (3/fr), 214 history entries (1/fr)
INFO: allphone_search.c(865): allphone 0.61 CPU 0.283 xRT
INFO: allphone_search.c(867): allphone 0.62 wall 0.290 xRT
INFO: allphone_search.c(911): Hyp: SIL
INFO: pocketsphinx.c(1133): SIL (-858993460)
word start end pprob ascr lscr lback
SIL 51 264 1.000 -1627 0 0
INFO: allphone_search.c(911): Hyp: SIL
SIL
INFO: cmn_prior.c(131): cmn_prior_update: from < 44.50 -4.13 0.15 6.94 4.06 -5.38 -2.56 -3.13 -6.12 -1.20 -7.44 -2.25 0.48 >
INFO: cmn_prior.c(149): cmn_prior_update: to < 44.50 -4.13 0.15 6.94 4.06 -5.38 -2.56 -3.13 -6.12 -1.20 -7.44 -2.25 0.48 >
INFO: allphone_search.c(852): 0 frames, 0 HMMs (0/fr), 0 senones (0/fr), 0 history entries (0/fr)
INFO: allphone_search.c(651): TOTAL fwdflat 0.61 CPU 0.285 xRT
INFO: allphone_search.c(654): TOTAL fwdflat 0.64 wall 0.298 xRT
What is the correct set of command-line parameters to get phoneme output?
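For reference: the ADDITION above passes the word LM (en-us.lm.dmp) to -allphone, whereas the wiki-documented first attempt passes the phonetic LM. This is the same run with the phonetic LM restored, plus -time yes to print start/end times in the transcription (a sketch, assuming en-us-phone.lm.bin is present in this checkout's model directory):

rem Same arguments as the ADDITION, but with the phonetic LM and -time yes for timings.
pocketsphinx_continuous.exe -infile ../../test/data/goforward.raw -hmm ../../model/en-us/en-us -allphone ../../model/en-us/en-us-phone.lm.bin -backtrace yes -beam 1e-20 -pbeam 1e-20 -lw 2.0 -time yes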

Unknown memory leak in Android

adb shell dumpsys meminfo for my package shows the following. The native allocated size keeps increasing and eventually causes the phone to restart. Is this a memory leak? How can I fix it?
                 native  dalvik   other   total   limit  bitmap  nativeBmp
size:            445456    5955     N/A  451411   32768     N/A        N/A
allocated:       445024    3726     N/A  448750     N/A   10948       1912
free:                43    2229     N/A    2272     N/A     N/A        N/A
(Pss):           132631     870  300292  433793     N/A     N/A        N/A
(shared dirty):    2532    1656    5552    9740     N/A     N/A        N/A
(priv dirty):    132396     708  298960  432064     N/A     N/A        N/A
Objects
Views: 0 ViewRoots: 0
AppContexts: 0 Activities: 0
Assets: 6 AssetManagers: 6
Local Binders: 5 Proxy Binders: 14
Death Recipients: 1
OpenSSL Sockets: 0
SQL
heap: 0 MEMORY_USED: 0
PAGECACHE_OVERFLOW: 0 MALLOC_SIZE: 50
Asset Allocations
zip:/data/app/com.outlook.screens-2.apk:/resources.arsc: 25K
zip:/data/app/com.outlook.screens-2.apk:/assets/font/RobotoCondensedRegular.ttf: 156K
zip:/data/app/com.outlook.screens-2.apk:/assets/font/RobotoCondensedBold.ttf: 158K
zip:/data/app/com.outlook.screens-2.apk:/assets/font/RobotoCondensedLight.ttf: 157K
Uptime: 190161845 Realtime now=432619753
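A minimal loop to confirm that the growth really is on the native heap (a sketch, assuming the package name com.outlook.screens taken from the asset paths above, and a POSIX shell on the host):

# Sample the process's meminfo every 30 seconds; a steadily growing native
# "allocated" figure confirms the leak is on the native side.
while true; do date; adb shell dumpsys meminfo com.outlook.screens | grep -i native; sleep 30; done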
