Impala memory limit exceeded with simple count query - hadoop
Edit:
There were some corrupt Avro files in the table. After removing them, everything works fine. I had decompressed those files to JSON with avro-tools, and the decompressed output is not very large either, so the ~100 GB allocation presumably comes from a bogus block size read out of a corrupt file rather than from real data; it seems to be a bug in how Impala handles corrupt Avro files.
I have an Impala table stored as gzip-compressed Avro, partitioned by "day". When I execute the query:
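In case it helps anyone else, this is roughly how the bad files can be spotted with avro-tools; the warehouse path and the avro-tools jar name are illustrative, not the exact ones from my environment:
# Pull the partition's files out of HDFS (path is an example).
hdfs dfs -get /user/hive/warehouse/adhoc_data_fast.db/log/day=2017-04-05 ./day=2017-04-05
# A corrupt data file makes `tojson` exit non-zero, so the bad ones stand out.
for f in ./day=2017-04-05/*.avro; do
  if ! java -jar avro-tools-1.8.1.jar tojson "$f" > /dev/null 2>&1; then
    echo "possibly corrupt: $f"
  fi
done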
select count(0) from adhoc_data_fast.log where day='2017-04-05';
It fails with:
Query: select count(0) from adhoc_data_fast.log where day='2017-04-05'
Query submitted at: 2017-04-06 13:35:56 (Coordinator: http://szq7.appadhoc.com:25000)
Query progress can be monitored at: http://szq7.appadhoc.com:25000/query_plan?query_id=ef4698db870efd4d:739c89ef00000000
WARNINGS:
Memory limit exceeded
GzipDecompressor failed to allocate 109051904000 bytes.
Each node is configured with 96 GB of memory, and the single pool memory limit is set to 300 GB.
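For reference, the per-daemon limit can be eyeballed from the impalad debug web UI; the hostname below is one of my nodes and 25000 is the default debug port, but treat the exact endpoints as an assumption about your Impala version:
# /memz shows the process memory limit and current usage, /varz the startup flags.
curl -s http://szq7.appadhoc.com:25000/memz | head -n 20
curl -s http://szq7.appadhoc.com:25000/varz | grep -i mem_limit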
None of the compressed files is larger than 250 MB (how I list them is sketched right after the file list):
62M log.2017-04-05.1491321605834.avro
79M log.2017-04-05.1491323647211.avro
62M log.2017-04-05.1491327241311.avro
60M log.2017-04-05.1491330839609.avro
52M log.2017-04-05.1491334439092.avro
59M log.2017-04-05.1491338038503.avro
93M log.2017-04-05.1491341639694.avro
130M log.2017-04-05.1491345239969.avro
147M log.2017-04-05.1491348843931.avro
183M log.2017-04-05.1491352442955.avro
218M log.2017-04-05.1491359648079.avro
181M log.2017-04-05.1491363247597.avro
212M log.2017-04-05.1491366845827.avro
207M log.2017-04-05.1491370445873.avro
197M log.2017-04-05.1491374045830.avro
164M log.2017-04-05.1491377650935.avro
155M log.2017-04-05.1491381249597.avro
203M log.2017-04-05.1491384846366.avro
185M log.2017-04-05.1491388450262.avro
198M log.2017-04-05.1491392047694.avro
206M log.2017-04-05.1491395648818.avro
214M log.2017-04-05.1491399246407.avro
167M log.2017-04-05.1491402846469.avro
77M log.2017-04-05.1491406180615.avro
3.2M log.2017-04-05.1491409790105.avro
1.3M log.2017-04-05.1491413385884.avro
928K log.2017-04-05.1491416981829.avro
832K log.2017-04-05.1491420581588.avro
1.1M log.2017-04-05.1491424180191.avro
2.6M log.2017-04-05.1491427781339.avro
3.8M log.2017-04-05.1491431382552.avro
3.3M log.2017-04-05.1491434984679.avro
5.2M log.2017-04-05.1491438586674.avro
5.1M log.2017-04-05.1491442192541.avro
2.3M log.2017-04-05.1491445789230.avro
884K log.2017-04-05.1491449386630.avro
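The sizes above come from something like the following (warehouse path is illustrative):
# Human-readable per-file sizes for the partition directory in HDFS.
hdfs dfs -du -h /user/hive/warehouse/adhoc_data_fast.db/log/day=2017-04-05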
I fetched them from HDFS and used avro-tools to convert them to JSON in order to decompress them (the commands are sketched after the listing). None of the decompressed files is larger than 1 GB:
16M log.2017-04-05.1491321605834.avro.json
308M log.2017-04-05.1491323647211.avro.json
103M log.2017-04-05.1491327241311.avro.json
150M log.2017-04-05.1491330839609.avro.json
397M log.2017-04-05.1491334439092.avro.json
297M log.2017-04-05.1491338038503.avro.json
160M log.2017-04-05.1491341639694.avro.json
95M log.2017-04-05.1491345239969.avro.json
360M log.2017-04-05.1491348843931.avro.json
338M log.2017-04-05.1491352442955.avro.json
71M log.2017-04-05.1491359648079.avro.json
161M log.2017-04-05.1491363247597.avro.json
628M log.2017-04-05.1491366845827.avro.json
288M log.2017-04-05.1491370445873.avro.json
162M log.2017-04-05.1491374045830.avro.json
90M log.2017-04-05.1491377650935.avro.json
269M log.2017-04-05.1491381249597.avro.json
620M log.2017-04-05.1491384846366.avro.json
70M log.2017-04-05.1491388450262.avro.json
30M log.2017-04-05.1491392047694.avro.json
114M log.2017-04-05.1491395648818.avro.json
370M log.2017-04-05.1491399246407.avro.json
359M log.2017-04-05.1491402846469.avro.json
218M log.2017-04-05.1491406180615.avro.json
29M log.2017-04-05.1491409790105.avro.json
3.9M log.2017-04-05.1491413385884.avro.json
9.3M log.2017-04-05.1491416981829.avro.json
8.3M log.2017-04-05.1491420581588.avro.json
2.3M log.2017-04-05.1491424180191.avro.json
25M log.2017-04-05.1491427781339.avro.json
24M log.2017-04-05.1491431382552.avro.json
5.7M log.2017-04-05.1491434984679.avro.json
35M log.2017-04-05.1491438586674.avro.json
5.8M log.2017-04-05.1491442192541.avro.json
23M log.2017-04-05.1491445789230.avro.json
4.3M log.2017-04-05.1491449386630.avro.json
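The JSON files above were produced with a loop along these lines (jar name illustrative); since tojson fully decodes each file, the output size is a reasonable proxy for the uncompressed data:
# Decode every downloaded Avro file to JSON, then list the resulting sizes.
for f in ./day=2017-04-05/*.avro; do
  java -jar avro-tools-1.8.1.jar tojson "$f" > "$f.json"
done
ls -lh ./day=2017-04-05/*.json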
Here is the Impala profile:
[szq7.appadhoc.com:21000] > profile;
Query Runtime Profile:
Query (id=ef4698db870efd4d:739c89ef00000000):
Summary:
Session ID: f54bb090170bcdb6:621ac5796ef2668c
Session Type: BEESWAX
Start Time: 2017-04-06 13:35:56.454441000
End Time: 2017-04-06 13:35:57.326967000
Query Type: QUERY
Query State: EXCEPTION
Query Status:
Memory limit exceeded
GzipDecompressor failed to allocate 109051904000 bytes.
Impala Version: impalad version 2.7.0-cdh5.9.1 RELEASE (build 24ad6df788d66e4af9496edb26ac4d1f1d2a1f2c)
User: ubuntu
Connected User: ubuntu
Delegated User:
Network Address: ::ffff:192.168.1.7:29026
Default Db: default
Sql Statement: select count(0) from adhoc_data_fast.log where day='2017-04-05'
Coordinator: szq7.appadhoc.com:22000
Query Options (non default):
Plan:
----------------
Estimated Per-Host Requirements: Memory=410.00MB VCores=1
WARNING: The following tables are missing relevant table and/or column statistics.
adhoc_data_fast.log
03:AGGREGATE [FINALIZE]
| output: count:merge(0)
| hosts=13 per-host-mem=unavailable
| tuple-ids=1 row-size=8B cardinality=1
|
02:EXCHANGE [UNPARTITIONED]
| hosts=13 per-host-mem=unavailable
| tuple-ids=1 row-size=8B cardinality=1
|
01:AGGREGATE
| output: count(0)
| hosts=13 per-host-mem=10.00MB
| tuple-ids=1 row-size=8B cardinality=1
|
00:SCAN HDFS [adhoc_data_fast.log, RANDOM]
partitions=1/7594 files=38 size=3.45GB
table stats: unavailable
column stats: all
hosts=13 per-host-mem=400.00MB
tuple-ids=0 row-size=0B cardinality=unavailable
----------------
Estimated Per-Host Mem: 429916160
Estimated Per-Host VCores: 1
Tables Missing Stats: adhoc_data_fast.log
Request Pool: default-pool
Admission result: Admitted immediately
ExecSummary:
Operator #Hosts Avg Time Max Time #Rows Est. #Rows Peak Mem Est. Peak Mem Detail
-------------------------------------------------------------------------------------------------------------
03:AGGREGATE 1 52.298ms 52.298ms 0 1 4.00 KB -1.00 B FINALIZE
02:EXCHANGE 1 676.993ms 676.993ms 0 1 0 -1.00 B UNPARTITIONED
01:AGGREGATE 0 0.000ns 0.000ns 0 1 0 10.00 MB
00:SCAN HDFS 0 0.000ns 0.000ns 0 -1 0 400.00 MB adhoc_data_fast.log
Planner Timeline: 69.589ms
- Analysis finished: 6.642ms (6.642ms)
- Equivalence classes computed: 6.980ms (337.753us)
- Single node plan created: 13.302ms (6.322ms)
- Runtime filters computed: 13.368ms (65.984us)
- Distributed plan created: 15.131ms (1.763ms)
- Lineage info computed: 16.488ms (1.356ms)
- Planning finished: 69.589ms (53.101ms)
Query Timeline: 874.026ms
- Start execution: 63.320us (63.320us)
- Planning finished: 72.764ms (72.701ms)
- Submit for admission: 73.592ms (827.496us)
- Completed admission: 73.775ms (183.088us)
- Ready to start 13 remote fragments: 126.950ms (53.175ms)
- All 13 remote fragments started: 161.919ms (34.968ms)
- Rows available: 856.761ms (694.842ms)
- Unregister query: 872.527ms (15.765ms)
- ComputeScanRangeAssignmentTimer: 356.136us
ImpalaServer:
- ClientFetchWaitTimer: 0.000ns
- RowMaterializationTimer: 0.000ns
Execution Profile ef4698db870efd4d:739c89ef00000000:(Total: 782.712ms, non-child: 0.000ns, % non-child: 0.00%)
Number of filters: 0
Filter routing table:
ID Src. Node Tgt. Node(s) Targets Target type Partition filter Pending (Expected) First arrived Completed Enabled
----------------------------------------------------------------------------------------------------------------------------
Fragment start latencies: Count: 13, 25th %-ile: 1ms, 50th %-ile: 1ms, 75th %-ile: 1ms, 90th %-ile: 2ms, 95th %-ile: 2ms, 99.9th %-ile: 35ms
Per Node Peak Memory Usage: szq15.appadhoc.com:22000(0) szq1.appadhoc.com:22000(0) szq13.appadhoc.com:22000(0) szq12.appadhoc.com:22000(0) szq11.appadhoc.com:22000(0) szq20.appadhoc.com:22000(0) szq14.appadhoc.com:22000(0) szq8.appadhoc.com:22000(0) szq5.appadhoc.com:22000(0) szq9.appadhoc.com:22000(0) szq4.appadhoc.com:22000(0) szq6.appadhoc.com:22000(0) szq7.appadhoc.com:22000(0)
- FiltersReceived: 0 (0)
- FinalizationTimer: 0.000ns
Coordinator Fragment F01:(Total: 729.811ms, non-child: 0.000ns, % non-child: 0.00%)
MemoryUsage(500.000ms): 12.00 KB
- AverageThreadTokens: 0.00
- BloomFilterBytes: 0
- PeakMemoryUsage: 12.00 KB (12288)
- PerHostPeakMemUsage: 0
- PrepareTime: 52.291ms
- RowsProduced: 0 (0)
- TotalCpuTime: 0.000ns
- TotalNetworkReceiveTime: 676.991ms
- TotalNetworkSendTime: 0.000ns
- TotalStorageWaitTime: 0.000ns
BlockMgr:
- BlockWritesOutstanding: 0 (0)
- BlocksCreated: 0 (0)
- BlocksRecycled: 0 (0)
- BufferedPins: 0 (0)
- BytesWritten: 0
- MaxBlockSize: 8.00 MB (8388608)
- MemoryLimit: 102.40 GB (109951164416)
- PeakMemoryUsage: 0
- TotalBufferWaitTime: 0.000ns
- TotalEncryptionTime: 0.000ns
- TotalIntegrityCheckTime: 0.000ns
- TotalReadBlockTime: 0.000ns
CodeGen:(Total: 63.837ms, non-child: 63.837ms, % non-child: 100.00%)
- CodegenTime: 828.728us
- CompileTime: 2.957ms
- LoadTime: 0.000ns
- ModuleBitcodeSize: 1.89 MB (1984232)
- NumFunctions: 7 (7)
- NumInstructions: 96 (96)
- OptimizationTime: 8.070ms
- PrepareTime: 51.769ms
AGGREGATION_NODE (id=3):(Total: 729.291ms, non-child: 52.298ms, % non-child: 7.17%)
ExecOption: Codegen Enabled
- BuildTime: 0.000ns
- GetResultsTime: 0.000ns
- HTResizeTime: 0.000ns
- HashBuckets: 0 (0)
- LargestPartitionPercent: 0 (0)
- MaxPartitionLevel: 0 (0)
- NumRepartitions: 0 (0)
- PartitionsCreated: 0 (0)
- PeakMemoryUsage: 4.00 KB (4096)
- RowsRepartitioned: 0 (0)
- RowsReturned: 0 (0)
- RowsReturnedRate: 0
- SpilledPartitions: 0 (0)
EXCHANGE_NODE (id=2):(Total: 676.993ms, non-child: 676.993ms, % non-child: 100.00%)
BytesReceived(500.000ms): 0
- BytesReceived: 0
- ConvertRowBatchTime: 0.000ns
- DeserializeRowBatchTimer: 0.000ns
- FirstBatchArrivalWaitTime: 0.000ns
- PeakMemoryUsage: 0
- RowsReturned: 0 (0)
- RowsReturnedRate: 0
- SendersBlockedTimer: 0.000ns
- SendersBlockedTotalTimer(*): 0.000ns
Averaged Fragment F00:
split sizes: min: 114.60 MB, max: 451.79 MB, avg: 271.65 MB, stddev: 104.16 MB
completion times: min:694.632ms max:728.356ms mean: 725.379ms stddev:8.878ms
execution rates: min:157.45 MB/sec max:620.68 MB/sec mean:374.89 MB/sec stddev:144.30 MB/sec
num instances: 13
Fragment F00:
Instance ef4698db870efd4d:739c89ef00000001 (host=szq5.appadhoc.com:22000):
Instance ef4698db870efd4d:739c89ef00000002 (host=szq8.appadhoc.com:22000):
Instance ef4698db870efd4d:739c89ef00000003 (host=szq14.appadhoc.com:22000):
Instance ef4698db870efd4d:739c89ef00000004 (host=szq20.appadhoc.com:22000):
Instance ef4698db870efd4d:739c89ef00000005 (host=szq11.appadhoc.com:22000):
Instance ef4698db870efd4d:739c89ef00000006 (host=szq12.appadhoc.com:22000):
Instance ef4698db870efd4d:739c89ef00000007 (host=szq13.appadhoc.com:22000):
Instance ef4698db870efd4d:739c89ef00000008 (host=szq1.appadhoc.com:22000):
Instance ef4698db870efd4d:739c89ef00000009 (host=szq15.appadhoc.com:22000):
Instance ef4698db870efd4d:739c89ef0000000a (host=szq6.appadhoc.com:22000):
Instance ef4698db870efd4d:739c89ef0000000b (host=szq4.appadhoc.com:22000):
Instance ef4698db870efd4d:739c89ef0000000c (host=szq9.appadhoc.com:22000):
Instance ef4698db870efd4d:739c89ef0000000d (host=szq7.appadhoc.com:22000):
So why does Impala need so much memory?
It could be that Impala is missing statistics on your table for that partition. The explain plan highlights the following:
Estimated Per-Host Requirements: Memory=410.00MB VCores=1
WARNING: The following tables are missing relevant table and/or column statistics.
adhoc_data_fast.log
Try running a COMPUTE STATS on the table, or a COMPUTE INCREMENTAL STATS for the partition.
e.g.
COMPUTE INCREMENTAL STATS adhoc_data_fast.log PARTITION (day='2017-04-05');
This will help Impala with its resource planning. I would be surprised if this alone fixes the error, but it's worth trying first.
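Once the stats run finishes, you can confirm that Impala now has row counts for that partition (run via impala-shell, adding -i <impalad-host> if you are not on a node). In the SHOW TABLE STATS output, #Rows should change from -1 to a real number for day='2017-04-05':
impala-shell -q "COMPUTE INCREMENTAL STATS adhoc_data_fast.log PARTITION (day='2017-04-05')"
impala-shell -q "SHOW TABLE STATS adhoc_data_fast.log"
impala-shell -q "SHOW COLUMN STATS adhoc_data_fast.log"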
Related
Can't add node to the CockroachDB cluster
I'm staking to join a CockroachDB node to a cluster. I've created first cluster, then try to join 2nd node to the first node, but 2nd node created new cluster as follows. Does anyone knows whats are wrong steps on the following my steps, any suggestions are wellcome. I've started first node as follows: cockroach start --insecure --advertise-host=163.172.156.111 * Check out how to secure your cluster: https://www.cockroachlabs.com/docs/v19.1/secure-a-cluster.html * CockroachDB node starting at 2019-05-11 01:11:15.45522036 +0000 UTC (took 2.5s) build: CCL v19.1.0 # 2019/04/29 18:36:40 (go1.11.6) webui: http://163.172.156.111:8080 sql: postgresql://root#163.172.156.111:26257?sslmode=disable client flags: cockroach <client cmd> --host=163.172.156.111:26257 --insecure logs: /home/ueda/cockroach-data/logs temp dir: /home/ueda/cockroach-data/cockroach-temp449555924 external I/O path: /home/ueda/cockroach-data/extern store[0]: path=/home/ueda/cockroach-data status: initialized new cluster clusterID: 3e797faa-59a1-4b0d-83b5-36143ddbdd69 nodeID: 1 Then, start secondary node to join to 163.172.156.111, but can't join: cockroach start --insecure --advertise-addr=128.199.127.164 --join=163.172.156.111:26257 CockroachDB node starting at 2019-05-11 01:21:14.533097432 +0000 UTC (took 0.8s) build: CCL v19.1.0 # 2019/04/29 18:36:40 (go1.11.6) webui: http://128.199.127.164:8080 sql: postgresql://root#128.199.127.164:26257?sslmode=disable client flags: cockroach <client cmd> --host=128.199.127.164:26257 --insecure logs: /home/ueda/cockroach-data/logs temp dir: /home/ueda/cockroach-data/cockroach-temp067740997 external I/O path: /home/ueda/cockroach-data/extern store[0]: path=/home/ueda/cockroach-data status: restarted pre-existing node clusterID: a14e89a7-792d-44d3-89af-7037442eacbc nodeID: 1 The cockroach.log of joining node shows some gosip error: cat cockroach-data/logs/cockroach.log I190511 01:21:13.762309 1 util/log/clog.go:1199 [config] file created at: 2019/05/11 01:21:13 I190511 01:21:13.762309 1 util/log/clog.go:1199 [config] running on machine: amfortas I190511 01:21:13.762309 1 util/log/clog.go:1199 [config] binary: CockroachDB CCL v19.1.0 (x86_64-unknown-linux-gnu, built 2019/04/29 18:36:40, go1.11.6) I190511 01:21:13.762309 1 util/log/clog.go:1199 [config] arguments: [cockroach start --insecure --advertise-addr=128.199.127.164 --join=163.172.156.111:26257] I190511 01:21:13.762309 1 util/log/clog.go:1199 line format: [IWEF]yymmdd hh:mm:ss.uuuuuu goid file:line msg utf8=✓ I190511 01:21:13.762307 1 cli/start.go:1033 logging to directory /home/ueda/cockroach-data/logs W190511 01:21:13.763373 1 cli/start.go:1068 RUNNING IN INSECURE MODE! - Your cluster is open for any client that can access <all your IP addresses>. - Any user, even root, can log in without providing a password. - Any user, connecting as root, can read or write any data in your cluster. - There is no network encryption nor authentication, and thus no confidentiality. Check out how to secure your cluster: https://www.cockroachlabs.com/docs/v19.1/secure-a-cluster.html I190511 01:21:13.763675 1 server/status/recorder.go:610 available memory from cgroups (8.0 EiB) exceeds system memory 992 MiB, using system memory W190511 01:21:13.763752 1 cli/start.go:944 Using the default setting for --cache (128 MiB). A significantly larger value is usually needed for good performance. If you have a dedicated server a reasonable setting is --cache=.25 (248 MiB). 
I190511 01:21:13.764011 1 server/status/recorder.go:610 available memory from cgroups (8.0 EiB) exceeds system memory 992 MiB, using system memory W190511 01:21:13.764047 1 cli/start.go:957 Using the default setting for --max-sql-memory (128 MiB). A significantly larger value is usually needed in production. If you have a dedicated server a reasonable setting is --max-sql-memory=.25 (248 MiB). I190511 01:21:13.764239 1 server/status/recorder.go:610 available memory from cgroups (8.0 EiB) exceeds system memory 992 MiB, using system memory I190511 01:21:13.764272 1 cli/start.go:1082 CockroachDB CCL v19.1.0 (x86_64-unknown-linux-gnu, built 2019/04/29 18:36:40, go1.11.6) I190511 01:21:13.866977 1 server/status/recorder.go:610 available memory from cgroups (8.0 EiB) exceeds system memory 992 MiB, using system memory I190511 01:21:13.867002 1 server/config.go:386 system total memory: 992 MiB I190511 01:21:13.867063 1 server/config.go:388 server configuration: max offset 500000000 cache size 128 MiB SQL memory pool size 128 MiB scan interval 10m0s scan min idle time 10ms scan max idle time 1s event log enabled true I190511 01:21:13.867098 1 cli/start.go:929 process identity: uid 1000 euid 1000 gid 1000 egid 1000 I190511 01:21:13.867115 1 cli/start.go:554 starting cockroach node I190511 01:21:13.868242 21 storage/engine/rocksdb.go:613 opening rocksdb instance at "/home/ueda/cockroach-data/cockroach-temp067740997" I190511 01:21:13.894320 21 server/server.go:876 [n?] monitoring forward clock jumps based on server.clock.forward_jump_check_enabled I190511 01:21:13.894813 21 storage/engine/rocksdb.go:613 opening rocksdb instance at "/home/ueda/cockroach-data" W190511 01:21:13.896301 21 storage/engine/rocksdb.go:127 [rocksdb] [/go/src/github.com/cockroachdb/cockroach/c-deps/rocksdb/db/version_set.cc:2566] More existing levels in DB than needed. max_bytes_for_level_multiplier may not be guaranteed. W190511 01:21:13.905666 21 storage/engine/rocksdb.go:127 [rocksdb] [/go/src/github.com/cockroachdb/cockroach/c-deps/rocksdb/db/version_set.cc:2566] More existing levels in DB than needed. max_bytes_for_level_multiplier may not be guaranteed. I190511 01:21:13.911380 21 server/config.go:494 [n?] 1 storage engine initialized I190511 01:21:13.911417 21 server/config.go:497 [n?] RocksDB cache size: 128 MiB I190511 01:21:13.911427 21 server/config.go:497 [n?] store 0: RocksDB, max size 0 B, max open file limit 10000 W190511 01:21:13.912459 21 gossip/gossip.go:1496 [n?] no incoming or outgoing connections I190511 01:21:13.913206 21 server/server.go:926 [n?] Sleeping till wall time 1557537673913178595 to catches up to 1557537674394265598 to ensure monotonicity. Delta: 481.087003ms I190511 01:21:14.251655 65 vendor/github.com/cockroachdb/circuitbreaker/circuitbreaker.go:322 [n?] circuitbreaker: gossip [::]:26257->163.172.156.111:26257 tripped: initial connection heartbeat failed: rpc error: code = Unknown desc = client cluster ID "a14e89a7-792d-44d3-89af-7037442eacbc" doesn't match server cluster ID "3e797faa-59a1-4b0d-83b5-36143ddbdd69" I190511 01:21:14.251695 65 vendor/github.com/cockroachdb/circuitbreaker/circuitbreaker.go:447 [n?] circuitbreaker: gossip [::]:26257->163.172.156.111:26257 event: BreakerTripped W190511 01:21:14.251763 65 gossip/client.go:122 [n?] 
failed to start gossip client to 163.172.156.111:26257: initial connection heartbeat failed: rpc error: code = Unknown desc = client cluster ID "a14e89a7-792d-44d3-89af-7037442eacbc" doesn't match server cluster ID "3e797faa-59a1-4b0d-83b5-36143ddbdd69" I190511 01:21:14.395848 21 gossip/gossip.go:392 [n1] NodeDescriptor set to node_id:1 address:<network_field:"tcp" address_field:"128.199.127.164:26257" > attrs:<> locality:<> ServerVersion:<major_val:19 minor_val:1 patch:0 unstable:0 > build_tag:"v19.1.0" started_at:1557537674395557548 W190511 01:21:14.458176 21 storage/replica_range_lease.go:506 can't determine lease status due to node liveness error: node not in the liveness table I190511 01:21:14.458465 21 server/node.go:461 [n1] initialized store [n1,s1]: disk (capacity=24 GiB, available=18 GiB, used=2.2 MiB, logicalBytes=41 MiB), ranges=20, leases=0, queries=0.00, writes=0.00, bytesPerReplica={p10=0.00 p25=0.00 p50=0.00 p75=6467.00 p90=26940.00 pMax=43017435.00}, writesPerReplica={p10=0.00 p25=0.00 p50=0.00 p75=0.00 p90=0.00 pMax=0.00} I190511 01:21:14.458775 21 storage/stores.go:244 [n1] read 0 node addresses from persistent storage I190511 01:21:14.459095 21 server/node.go:699 [n1] connecting to gossip network to verify cluster ID... W190511 01:21:14.469842 96 storage/store.go:1525 [n1,s1,r6/1:/Table/{SystemCon…-11}] could not gossip system config: [NotLeaseHolderError] r6: replica (n1,s1):1 not lease holder; lease holder unknown I190511 01:21:14.474785 21 server/node.go:719 [n1] node connected via gossip and verified as part of cluster "a14e89a7-792d-44d3-89af-7037442eacbc" I190511 01:21:14.475033 21 server/node.go:542 [n1] node=1: started with [<no-attributes>=/home/ueda/cockroach-data] engine(s) and attributes [] I190511 01:21:14.475393 21 server/status/recorder.go:610 [n1] available memory from cgroups (8.0 EiB) exceeds system memory 992 MiB, using system memory I190511 01:21:14.475514 21 server/server.go:1582 [n1] starting http server at [::]:8080 (use: 128.199.127.164:8080) I190511 01:21:14.475572 21 server/server.go:1584 [n1] starting grpc/postgres server at [::]:26257 I190511 01:21:14.475605 21 server/server.go:1585 [n1] advertising CockroachDB node at 128.199.127.164:26257 W190511 01:21:14.475655 21 jobs/registry.go:341 [n1] unable to get node liveness: node not in the liveness table I190511 01:21:14.532949 21 server/server.go:1650 [n1] done ensuring all necessary migrations have run I190511 01:21:14.533020 21 server/server.go:1653 [n1] serving sql connections I190511 01:21:14.533209 21 cli/start.go:689 [config] clusterID: a14e89a7-792d-44d3-89af-7037442eacbc I190511 01:21:14.533257 21 cli/start.go:697 node startup completed: CockroachDB node starting at 2019-05-11 01:21:14.533097432 +0000 UTC (took 0.8s) build: CCL v19.1.0 # 2019/04/29 18:36:40 (go1.11.6) webui: http://128.199.127.164:8080 sql: postgresql://root#128.199.127.164:26257?sslmode=disable client flags: cockroach <client cmd> --host=128.199.127.164:26257 --insecure logs: /home/ueda/cockroach-data/logs temp dir: /home/ueda/cockroach-data/cockroach-temp067740997 external I/O path: /home/ueda/cockroach-data/extern store[0]: path=/home/ueda/cockroach-data status: restarted pre-existing node clusterID: a14e89a7-792d-44d3-89af-7037442eacbc nodeID: 1 I190511 01:21:14.541205 146 server/server_update.go:67 [n1] no need to upgrade, cluster already at the newest version I190511 01:21:14.555557 149 sql/event_log.go:135 [n1] Event: "node_restart", target: 1, info: {Descriptor:{NodeID:1 Address:128.199.127.164:26257 Attrs: 
Locality: ServerVersion:19.1 BuildTag:v19.1.0 StartedAt:1557537674395557548 LocalityAddress:[] XXX_NoUnkeyedLiteral:{} XXX_sizecache:0} ClusterID:a14e89a7-792d-44d3-89af-7037442eacbc StartedAt:1557537674395557548 LastUp:1557537671113461486} I190511 01:21:14.916458 59 gossip/gossip.go:1510 [n1] node has connected to cluster via gossip I190511 01:21:14.916660 59 storage/stores.go:263 [n1] wrote 0 node addresses to persistent storage I190511 01:21:24.480247 116 storage/store.go:4220 [n1,s1] sstables (read amplification = 2): 0 [ 51K 1 ]: 51K 6 [ 1M 1 ]: 1M I190511 01:21:24.480380 116 storage/store.go:4221 [n1,s1] ** Compaction Stats [default] ** Level Files Size Score Read(GB) Rn(GB) Rnp1(GB) Write(GB) Wnew(GB) Moved(GB) W-Amp Rd(MB/s) Wr(MB/s) Comp(sec) Comp(cnt) Avg(sec) KeyIn KeyDrop ---------------------------------------------------------------------------------------------------------------------------------------------------------- L0 1/0 50.73 KB 0.5 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 8.0 0 1 0.006 0 0 L6 1/0 1.26 MB 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 0.000 0 0 Sum 2/0 1.31 MB 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 8.0 0 1 0.006 0 0 Int 0/0 0.00 KB 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 8.0 0 1 0.006 0 0 Uptime(secs): 10.6 total, 10.6 interval Flush(GB): cumulative 0.000, interval 0.000 AddFile(GB): cumulative 0.000, interval 0.000 AddFile(Total Files): cumulative 0, interval 0 AddFile(L0 Files): cumulative 0, interval 0 AddFile(Keys): cumulative 0, interval 0 Cumulative compaction: 0.00 GB write, 0.00 MB/s write, 0.00 GB read, 0.00 MB/s read, 0.0 seconds Interval compaction: 0.00 GB write, 0.00 MB/s write, 0.00 GB read, 0.00 MB/s read, 0.0 seconds Stalls(count): 0 level0_slowdown, 0 level0_slowdown_with_compaction, 0 level0_numfiles, 0 level0_numfiles_with_compaction, 0 stop for pending_compaction_bytes, 0 slowdown for pending_compaction_bytes, 0 memtable_compaction, 0 memtable_slowdown, interval 0 total count estimated_pending_compaction_bytes: 0 B I190511 01:21:24.481565 121 server/status/runtime.go:500 [n1] runtime stats: 170 MiB RSS, 114 goroutines, 0 B/0 B/0 B GO alloc/idle/total, 14 MiB/16 MiB CGO alloc/total, 0.0 CGO/sec, 0.0/0.0 %(u/s)time, 0.0 %gc (7x), 50 KiB/1.5 MiB (r/w)net What is the possibly cause to block to join? Thank you for your suggestion!
It seems you had previously started the second node (the one running on 128.199.127.164) by itself, creating its own cluster. This can be seen in the error message:
W190511 01:21:14.251763 65 gossip/client.go:122 [n?] failed to start gossip client to 163.172.156.111:26257: initial connection heartbeat failed: rpc error: code = Unknown desc = client cluster ID "a14e89a7-792d-44d3-89af-7037442eacbc" doesn't match server cluster ID "3e797faa-59a1-4b0d-83b5-36143ddbdd69"
To be able to join the cluster, the data directory of the joining node must be empty. You can either delete cockroach-data or specify an alternate directory with --store=/path/to/data-dir
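A minimal sketch of the first option, assuming the node on 128.199.127.164 holds nothing you need to keep (flags and paths mirror the ones in the question):
# On the second node: stop cockroach, wipe the stray single-node cluster's data,
# then start it again pointing --join at the first node.
cockroach quit --insecure --host=128.199.127.164:26257
rm -rf /home/ueda/cockroach-data
cockroach start --insecure --advertise-addr=128.199.127.164 --join=163.172.156.111:26257
Alternatively, keep the old directory untouched and pass --store=/path/to/empty-dir to the start command.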
Is hive.exec.parallel broken?
Apparently, there is a reason why hive.exec.parallel is false by default. When I set it to true (as recommended by an answer to my previous question), my process dies with this message: MapReduce Jobs Launched: Job 0: Map: 2 Reduce: 1 Cumulative CPU: 6.43 sec HDFS Read: 556 HDFS Write: 96 SUCCESS Job 1: Map: 1 Reduce: 1 Cumulative CPU: 3.15 sec HDFS Read: 475 HDFS Write: 96 SUCCESS Job 2: Map: 1 Reduce: 1 Cumulative CPU: 3.36 sec HDFS Read: 475 HDFS Write: 96 SUCCESS Job 3: Map: 1 Reduce: 1 Cumulative CPU: 2.19 sec HDFS Read: 475 HDFS Write: 0 SUCCESS Total MapReduce CPU Time Spent: 15 seconds 130 msec OK normalized_keyword pixel_id count sum_log events Time taken: 72.419 seconds ... 14.98user 0.62system 1:16.79elapsed 20%CPU (0avgtext+0avgdata 851392maxresident)k 8inputs+2096outputs (0major+83271minor)pagefaults 0swaps text: java.io.EOFException at java.io.DataInputStream.readShort(Unknown Source) at org.apache.hadoop.fs.shell.Display$Text.getInputStream(Display.java:113) at org.apache.hadoop.fs.shell.Display$Cat.processPath(Display.java:81) at org.apache.hadoop.fs.shell.Command.processPaths(Command.java:306) at org.apache.hadoop.fs.shell.Command.processPathArgument(Command.java:278) at org.apache.hadoop.fs.shell.Command.processArgument(Command.java:260) at org.apache.hadoop.fs.shell.Command.processArguments(Command.java:244) at org.apache.hadoop.fs.shell.Command.processRawArguments(Command.java:190) at org.apache.hadoop.fs.shell.Command.run(Command.java:154) at org.apache.hadoop.fs.FsShell.run(FsShell.java:254) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84) at org.apache.hadoop.fs.FsShell.main(FsShell.java:304) No useful data is produced. set hive.exec.parallel.thread.number=2 has no effect (same failure) Suggestions? EDIT: hive --version does not work, but when it starts, it prints Logging initialized using configuration in jar:file:/opt/cloudera/parcels/CDH-4.4.0-1.cdh4.4.0.p0.39/lib/hive/lib/hive-common-0.10.0-cdh4.4.0.jar!/hive-log4j.properties so, I guess, the version is 0.10.0.
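For what it's worth, the whole comparison boils down to running the same statement with the flag off and on; the table and columns below are placeholders for the real aggregation:
# Placeholder query; substitute the real one that produces normalized_keyword/pixel_id/etc.
Q="SELECT normalized_keyword, pixel_id, count(*) FROM my_table GROUP BY normalized_keyword, pixel_id"
hive -e "set hive.exec.parallel=false; ${Q};"
hive -e "set hive.exec.parallel=true; set hive.exec.parallel.thread.number=2; ${Q};"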
Cassandra read latency high even with row caching, why?
I am testing cassandra performance with a simple model. CREATE TABLE "NoCache" ( key ascii, column1 ascii, value ascii, PRIMARY KEY (key, column1) ) WITH COMPACT STORAGE AND bloom_filter_fp_chance=0.010000 AND caching='ALL' AND comment='' AND dclocal_read_repair_chance=0.000000 AND gc_grace_seconds=864000 AND read_repair_chance=0.100000 AND replicate_on_write='true' AND populate_io_cache_on_flush='false' AND compaction={'class': 'SizeTieredCompactionStrategy'} AND compression={'sstable_compression': 'SnappyCompressor'}; I am fetching 100 columns of a row key using pycassa, get/xget function (). but getting read latency about 15ms in the server. colums=COL_FAM.get(row_key, column_count=100) nodetool cfstats Column Family: NoCache SSTable count: 1 Space used (live): 103756053 Space used (total): 103756053 Number of Keys (estimate): 128 Memtable Columns Count: 0 Memtable Data Size: 0 Memtable Switch Count: 0 Read Count: 20 Read Latency: 15.717 ms. Write Count: 0 Write Latency: NaN ms. Pending Tasks: 0 Bloom Filter False Positives: 0 Bloom Filter False Ratio: 0.00000 Bloom Filter Space Used: 976 Compacted row minimum size: 4769 Compacted row maximum size: 557074610 Compacted row mean size: 87979499 Latency of this type is amazing! When nodetool info shows that read hits directly in the row cache. Row Cache : size 4834713 (bytes), capacity 67108864 (bytes), 35 hits, 38 requests, 1.000 recent hit rate, 0 save period in seconds Can anyone tell me why is cassandra taking so much time while reading from row cache?
Enable tracing and see what it's doing. http://www.datastax.com/dev/blog/tracing-in-cassandra-1-2
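A minimal way to capture such a trace, assuming your cqlsh supports the TRACING command and piped input (the keyspace and row key below are placeholders):
# TRACING ON prints a per-step breakdown (cache hit, sstable reads, network) after each query.
cqlsh <<'EOF'
TRACING ON;
SELECT * FROM my_keyspace."NoCache" WHERE key = 'some_row_key' LIMIT 100;
TRACING OFF;
EOF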
Caching not Working in Cassandra
I dont seem to have any caching enabled when checking in Opscenter or cfstats. Im running Cassandra 1.1.7 with Solandra on Debian. I have set the required global options in cassandra.yaml: key_cache_size_in_mb: 800 key_cache_save_period: 14400 row_cache_size_in_mb: 800 row_cache_save_period: 15400 row_cache_provider: SerializingCacheProvider Column Families were created as follows: create column family example with column_type = 'Standard' and comparator = 'BytesType' and default_validation_class = 'BytesType' and key_validation_class = 'BytesType' and read_repair_chance = 1.0 and dclocal_read_repair_chance = 0.0 and gc_grace = 864000 and min_compaction_threshold = 4 and max_compaction_threshold = 32 and replicate_on_write = true and compaction_strategy = 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy' and caching = 'ALL'; Opscenter shows no data available on caching graphs and CFSTATS doesn't show any cache related fields: Column Family: charsets SSTable count: 1 Space used (live): 5558 Space used (total): 5558 Number of Keys (estimate): 128 Memtable Columns Count: 0 Memtable Data Size: 0 Memtable Switch Count: 0 Read Count: 61381 Read Latency: 0.123 ms. Write Count: 0 Write Latency: NaN ms. Pending Tasks: 0 Bloom Filter False Postives: 0 Bloom Filter False Ratio: 0.00000 Bloom Filter Space Used: 16 Compacted row minimum size: 1917 Compacted row maximum size: 2299 Compacted row mean size: 2299 Any help or suggestions are appreciated. Sam
The caching stats have been moved from cfstats to info in Cassandra 1.1. If you run nodetool info you should see something like:
Key Cache : size 5552 (bytes), capacity 838860800 (bytes), 38 hits, 47 requests, 0.809 recent hit rate, 14400 save period in seconds
Row Cache : size 0 (bytes), capacity 838860800 (bytes), 0 hits, 0 requests, NaN recent hit rate, 15400 save period in seconds
This is because there are now global caches, rather than per-CF. It seems that Opscenter needs updating for this change - maybe there is a later version available that will work.
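A quick way to check after restarting with the new cassandra.yaml values (the host is a placeholder):
# The global key/row cache counters now live in `nodetool info`.
nodetool -h 127.0.0.1 info | grep -i cache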
Need help analysing the VarnishStat results
I am a newbie with Varnish. I have successfully installed it and now its working, but I need some guidance from the more knowledgeable people about how the server is performing. I read this article - http://kristianlyng.wordpress.com/2009/12/08/varnishstat-for-dummies/ but I am still not sure howz the server performance. The server has been running since last 9 hours. I understand that more content will be cached with time so cache hit ratio will better, but right now my concern is about intermediate help from your side on server performance. Hitrate ratio: 10 100 613 Hitrate avg: 0.2703 0.3429 0.4513 239479 8.00 7.99 client_conn - Client connections accepted 541129 13.00 18.06 client_req - Client requests received 157594 1.00 5.26 cache_hit - Cache hits 3 0.00 0.00 cache_hitpass - Cache hits for pass 313499 9.00 10.46 cache_miss - Cache misses 67377 4.00 2.25 backend_conn - Backend conn. success 316739 7.00 10.57 backend_reuse - Backend conn. reuses 910 0.00 0.03 backend_toolate - Backend conn. was closed 317652 8.00 10.60 backend_recycle - Backend conn. recycles 584 0.00 0.02 backend_retry - Backend conn. retry 3 0.00 0.00 fetch_head - Fetch head 314040 9.00 10.48 fetch_length - Fetch with Length 4139 0.00 0.14 fetch_chunked - Fetch chunked 5 0.00 0.00 fetch_close - Fetch wanted close 386 . . n_sess_mem - N struct sess_mem 55 . . n_sess - N struct sess 313452 . . n_object - N struct object 313479 . . n_objectcore - N struct objectcore 38474 . . n_objecthead - N struct objecthead 368 . . n_waitinglist - N struct waitinglist 12 . . n_vbc - N struct vbc 61 . . n_wrk - N worker threads 344 0.00 0.01 n_wrk_create - N worker threads created 2935 0.00 0.10 n_wrk_queued - N queued work requests 1 . . n_backend - N backends 47 . . n_expired - N expired objects 149425 . . n_lru_moved - N LRU moved objects 1 0.00 0.00 losthdr - HTTP header overflows 461727 10.00 15.41 n_objwrite - Objects sent with write 239468 8.00 7.99 s_sess - Total Sessions 541129 13.00 18.06 s_req - Total Requests 64678 3.00 2.16 s_pipe - Total pipe 5346 0.00 0.18 s_pass - Total pass 318187 9.00 10.62 s_fetch - Total fetch 193589421 3895.84 6459.66 s_hdrbytes - Total header bytes 4931971067 14137.41 164569.09 s_bodybytes - Total body bytes 117585 3.00 3.92 sess_closed - Session Closed 2283 0.00 0.08 sess_pipeline - Session Pipeline 892 0.00 0.03 sess_readahead - Session Read Ahead 458468 10.00 15.30 sess_linger - Session Linger 414010 9.00 13.81 sess_herd - Session herd 36912073 880.96 1231.68 shm_records - SHM records
What VCL are you using? If the answer is 'none' then you are probably not getting a very good hitrate. On a fresh install, Varnish is quite conservative about what it caches (and rightly so), but you can probably improve matters by reading how to achieve a high hitrate. If it's safe to, you can selectively unset cookies and normalise requests with your VCL, which will result in fewer backend calls. How much of your website is cacheable? Is your object cache big enough? If you can answer those two questions, you ought to be able to achieve a great hitrate with Varnish.
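As a rough way to watch whether VCL changes are helping, the relevant counters can be pulled from varnishstat; the counter names below are the Varnish 3.x ones, so adjust for your version:
# Hit/miss volume plus eviction pressure; a steadily growing n_lru_nuked suggests the cache is too small.
varnishstat -1 | egrep 'cache_hit|cache_miss|n_lru_nuked|n_object'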