NEAR Lake Indexer - Stuck on "#9820210 - Waiting for peers" - nearprotocol

So I was trying to use the NEAR Lake Indexer, but after completing every step, when I run
./target/release/near-lake --home ~/.near/mainnet run --endpoint https://my-endpoint --bucket my-bucket --region us-east-1 --stream-while-syncing sync-from-latest
the only output is
INFO stats: # 9820210 Waiting for peers 0 peers ⬇ 0 B/s ⬆ 0 B/s 0.00 bps 0 gas/s CPU: 0%, Mem: 32.8 MB
Why is it trying to fetch block 9820210 but not the latest one, even though I passed in "sync-from-latest"?

You need to follow the standard nearcore network sync procedures: https://github.com/near/near-lake-indexer#syncing
INFO stats: # 9820210 Waiting for peers 0 peers ⬇ 0 B/s ⬆ 0 B/s 0.00 bps 0 gas/s CPU: 0%, Mem: 32.8 MB
This means that your node does not have the network state and is waiting to get synced (why that is not happening is not clear to me; it should be raised on https://github.com/near/nearcore, providing them with a way to reproduce the issue by running neard instead of the indexer).
Why is it trying to fetch block 9820210 but not the latest one, even though I passed in "sync-from-latest"?
The stats log line is emitted by nearcore, and it reports the most recently synced block on the node, not the latest block on the network.
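As a minimal sketch of two checks that usually narrow this down (assuming the default home directory and the default RPC port 3030; adjust if yours differ):
grep -A 2 '"boot_nodes"' ~/.near/mainnet/config.json   # the node needs boot nodes configured to find peers
curl -s http://127.0.0.1:3030/status                   # nearcore RPC status; check sync_info (the /network_info endpoint lists connected peers)
If boot_nodes is empty or the node reports no peers, the problem is in the underlying nearcore networking setup rather than in near-lake itself.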

Related

JHipster Elasticsearch Fails to Start on Kubernetes

When trying to deploy the JHipster Console on Kubernetes, the jhipster-elasticsearch-client pod fails to start. The pod is killed with reason OOMKilled and exit code 137.
Increasing the default memory limit from 512M to 1G did not solve the issue.
The node also has plenty of memory left:
Non-terminated Pods: (9 in total)
Namespace Name CPU Requests CPU Limits Memory Requests Memory Limits
--------- ---- ------------ ---------- --------------- -------------
default gateway-mysql-5c66b69cb6-r84xb 0 (0%) 0 (0%) 0 (0%) 0 (0%)
default jhipster-console-84c54fbd79-k8hjt 0 (0%) 0 (0%) 0 (0%) 0 (0%)
default jhipster-elasticsearch-client-7cb576d5d7-s48mn 200m (10%) 400m (20%) 512Mi (6%) 1Gi (13%)
default jhipster-import-dashboards-s9k2g 0 (0%) 0 (0%) 0 (0%) 0 (0%)
default jhipster-registry-0 0 (0%) 0 (0%) 0 (0%) 0 (0%)
default jhipster-zipkin-6df799f5d8-7fhz9 0 (0%) 0 (0%) 0 (0%) 0 (0%)
kube-system calico-node-hc5p9 250m (12%) 0 (0%) 0 (0%) 0 (0%)
kube-system kube-proxy-cgmqj 0 (0%) 0 (0%) 0 (0%) 0 (0%)
kube-system tiller-deploy-5c688d5f9b-zxnnp 0 (0%) 0 (0%) 0 (0%) 0 (0%)
Allocated resources:
(Total limits may be over 100 percent, i.e., overcommitted.)
CPU Requests CPU Limits Memory Requests Memory Limits
------------ ---------- --------------- -------------
450m (22%) 400m (20%) 512Mi (6%) 1Gi (13%)
The default installation of Elasticsearch is configured with a 1 GB heap, so the JVM (heap plus off-heap overhead) exceeds the pod's 1 Gi memory limit and the container gets OOM-killed. You can configure the memory requirements of the Docker Elasticsearch image by adding an environment variable to your container:
- "ES_JAVA_OPTS=-Xms512m -Xmx512m"
Related Docs:
https://www.elastic.co/guide/en/elasticsearch/guide/current/heap-sizing.html
https://www.elastic.co/guide/en/elasticsearch/reference/current/docker.html
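For reference, a rough sketch of where that variable would go in the Kubernetes deployment of the client pod (the container and image names here are illustrative, not taken from the actual JHipster manifests):
containers:
  - name: jhipster-elasticsearch-client
    image: jhipster/jhipster-elasticsearch-client   # illustrative image name
    env:
      - name: ES_JAVA_OPTS
        value: "-Xms512m -Xmx512m"
    resources:
      limits:
        memory: 1Gi
With the heap capped at 512 MB, the JVM plus its off-heap overhead should fit comfortably under the 1 Gi container limit.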

Building software using Bazel is slower than make?

My team has a project that is not too big. Built with make -js it takes 40 seconds; with Bazel the time increased to 70 seconds. Here is the profile of the Bazel build; I noticed that SKYFUNCTION takes 47% of the total time. Is that reasonable?
The last section of the profile:
Type Total Count Average
ACTION 0.03% 77 0.70 ms
ACTION_CHECK 0.00% 4 0.90 ms
ACTION_EXECUTE 40.40% 77 912 ms
ACTION_UPDATE 0.00% 74 0.02 ms
ACTION_COMPLETE 0.19% 77 4.28 ms
INFO 0.00% 1 0.05 ms
VFS_STAT 1.07% 117519 0.02 ms
VFS_DIR 0.27% 4613 0.10 ms
VFS_MD5 0.22% 151 2.56 ms
VFS_DELETE 4.43% 53830 0.14 ms
VFS_OPEN 0.01% 232 0.11 ms
VFS_READ 0.06% 3523 0.03 ms
VFS_WRITE 0.00% 4 0.97 ms
WAIT 0.05% 156 0.56 ms
SKYFRAME_EVAL 6.23% 1 10.830 s
SKYFUNCTION 47.01% 687 119 ms
@ittai, @Jin, @Ondrej K: I have tried switching off sandboxing in Bazel, and it is much faster than with it switched on. Here is the comparison:
SWITCHED ON: 70s
SWITCHED OFF: 33s±2
SKYFUNCTION still takes 47% of the total execution time, but the average time per call dropped from 119 ms to 21 ms.
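For anyone trying to reproduce the comparison, a sketch of the commands involved (flag names as used in the Bazel versions of that era; treat them as assumptions and check bazel help build on your version):
bazel clean && bazel build --profile=/tmp/prof.gz //...       # default build, sandboxing on
bazel analyze-profile /tmp/prof.gz                            # prints a summary like the table above
bazel clean && bazel build --spawn_strategy=standalone //...  # sandboxing switched off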

Why does Spark only use one executor on my 2 worker node cluster if I increase the executor memory past 5 GB?

I am using a 3-node cluster: 1 master node and 2 worker nodes, all t2.large EC2 instances.
The "free -m" command gives me the following info:
Master:
total used free shared buffers cached
Mem: 7733 6324 1409 0 221 4555
-/+ buffers/cache: 1547 6186
Swap: 1023 0 1023
Worker Node 1:
total used free shared buffers cached
Mem: 7733 3203 4530 0 185 2166
-/+ buffers/cache: 851 6881
Swap: 1023 0 1023
Worker Node 2:
total used free shared buffers cached
Mem: 7733 3402 4331 0 185 2399
-/+ buffers/cache: 817 6915
Swap: 1023 0 1023
In the yarn-site.xml file, I have the following properties set:
<property>
<name>yarn.scheduler.maximum-allocation-mb</name>
<value>7733</value>
</property>
<property>
<name>yarn.nodemanager.resource.memory-mb</name>
<value>7733</value>
</property>
In $SPARK_HOME/conf/spark-defaults.conf I am setting spark.executor.cores to 2 and spark.executor.instances to 2.
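(As a sketch, the relevant spark-defaults.conf entries described here would look like the following; the exact file contents were not posted, and spark.executor.memory is the value varied below:)
spark.executor.instances  2
spark.executor.cores      2
spark.executor.memory     6g   # runs on both workers at 5g and below, only on one at 6g and above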
When looking at the spark-history UI after running my spark application, both executors (1 and 2) show up in the "Executors" tab along with the driver. In the cores column on that same page, it says 2 for both executors.
When I set executor-memory to 5G or lower, my Spark application runs fine with executors on both worker nodes. When I set the executor memory to 6G or more, only one worker node runs an executor. Why does this happen? Note: I have tried increasing yarn.nodemanager.resource.memory-mb and it doesn't change this behavior.

Spark + Parquet + S3n : Seems to read parquet file many times

I have Parquet files laid out in a Hive-style partitioned layout on an S3n bucket. No metadata files are created; the Parquet footers are stored in the files themselves.
I ran a sample Spark job in local mode (v1.6.0) that reads a single file of size 5.2 MB:
import org.apache.hadoop.fs.Path
import org.apache.spark.{SparkConf, SparkContext}

val filePath = "s3n://bucket/trackingPackage/dpYear=2016/dpMonth=5/dpDay=10/part-r-00004-1c86d6b0-4f6f-4770-a930-c42d77e3c729-1462833064172.gz.parquet"
val path: Path = new Path(filePath)
val conf = new SparkConf()
  .setMaster("local[2]")
  .set("spark.app.name", "parquet-reader-s3n")
  .set("spark.eventLog.enabled", "true")
val sc = new SparkContext(conf)
val sqlc = new org.apache.spark.sql.SQLContext(sc)
val df = sqlc.read.parquet(filePath).select("referenceCode")
Thread.sleep(1000 * 10) // intentionally added to separate the two jobs in the event log
println(df.schema)
val output = df.collect()
The log generated is:
..
[22:21:56.505][main][INFO][BlockManagerMaster:58] Registered BlockManager
[22:21:56.909][main][INFO][EventLoggingListener:58] Logging events to file:/tmp/spark-events/local-1463676716372
[22:21:57.307][main][INFO][ParquetRelation:58] Listing s3n://bucket//trackingPackage/dpYear=2016/dpMonth=5/dpDay=10/part-r-00004-1c86d6b0-4f6f-4770-a930-c42d77e3c729-1462833064172.gz.parquet on driver
[22:21:59.927][main][INFO][SparkContext:58] Starting job: parquet at InspectInputSplits.scala:30
[22:21:59.942][dag-scheduler-event-loop][INFO][DAGScheduler:58] Got job 0 (parquet at InspectInputSplits.scala:30) with 2 output partitions
[22:21:59.942][dag-scheduler-event-loop][INFO][DAGScheduler:58] Final stage: ResultStage 0 (parquet at InspectInputSplits.scala:30)
[22:21:59.943][dag-scheduler-event-loop][INFO][DAGScheduler:58] Parents of final stage: List()
[22:21:59.944][dag-scheduler-event-loop][INFO][DAGScheduler:58] Missing parents: List()
[22:21:59.954][dag-scheduler-event-loop][INFO][DAGScheduler:58] Submitting ResultStage 0 (MapPartitionsRDD[1] at parquet at InspectInputSplits.scala:30), which has no missing parents
[22:22:00.218][dag-scheduler-event-loop][INFO][MemoryStore:58] Block broadcast_0 stored as values in memory (estimated size 64.5 KB, free 64.5 KB)
[22:22:00.226][dag-scheduler-event-loop][INFO][MemoryStore:58] Block broadcast_0_piece0 stored as bytes in memory (estimated size 21.7 KB, free 86.2 KB)
[22:22:00.229][dispatcher-event-loop-0][INFO][BlockManagerInfo:58] Added broadcast_0_piece0 in memory on localhost:54419 (size: 21.7 KB, free: 1088.2 MB)
[22:22:00.231][dag-scheduler-event-loop][INFO][SparkContext:58] Created broadcast 0 from broadcast at DAGScheduler.scala:1006
[22:22:00.234][dag-scheduler-event-loop][INFO][DAGScheduler:58] Submitting 2 missing tasks from ResultStage 0 (MapPartitionsRDD[1] at parquet at InspectInputSplits.scala:30)
[22:22:00.235][dag-scheduler-event-loop][INFO][TaskSchedulerImpl:58] Adding task set 0.0 with 2 tasks
[22:22:00.278][dispatcher-event-loop-1][INFO][TaskSetManager:58] Starting task 0.0 in stage 0.0 (TID 0, localhost, partition 0,PROCESS_LOCAL, 2076 bytes)
[22:22:00.281][dispatcher-event-loop-1][INFO][TaskSetManager:58] Starting task 1.0 in stage 0.0 (TID 1, localhost, partition 1,PROCESS_LOCAL, 2395 bytes)
[22:22:00.290][Executor task launch worker-0][INFO][Executor:58] Running task 0.0 in stage 0.0 (TID 0)
[22:22:00.291][Executor task launch worker-1][INFO][Executor:58] Running task 1.0 in stage 0.0 (TID 1)
[22:22:00.425][Executor task launch worker-1][INFO][ParquetFileReader:151] Initiating action with parallelism: 5
[22:22:00.447][Executor task launch worker-0][INFO][ParquetFileReader:151] Initiating action with parallelism: 5
[22:22:00.463][Executor task launch worker-0][INFO][Executor:58] Finished task 0.0 in stage 0.0 (TID 0). 936 bytes result sent to driver
[22:22:00.471][task-result-getter-0][INFO][TaskSetManager:58] Finished task 0.0 in stage 0.0 (TID 0) in 213 ms on localhost (1/2)
[22:22:00.586][pool-20-thread-1][INFO][NativeS3FileSystem:619] Opening 's3n://bucket//trackingPackage/dpYear=2016/dpMonth=5/dpDay=10/part-r-00004-1c86d6b0-4f6f-4770-a930-c42d77e3c729-1462833064172.gz.parquet' for reading
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
[22:22:25.890][Executor task launch worker-1][INFO][Executor:58] Finished task 1.0 in stage 0.0 (TID 1). 4067 bytes result sent to driver
[22:22:25.898][task-result-getter-1][INFO][TaskSetManager:58] Finished task 1.0 in stage 0.0 (TID 1) in 25617 ms on localhost (2/2)
[22:22:25.898][dag-scheduler-event-loop][INFO][DAGScheduler:58] ResultStage 0 (parquet at InspectInputSplits.scala:30) finished in 25.656 s
[22:22:25.899][task-result-getter-1][INFO][TaskSchedulerImpl:58] Removed TaskSet 0.0, whose tasks have all completed, from pool
[22:22:25.905][main][INFO][DAGScheduler:58] Job 0 finished: parquet at InspectInputSplits.scala:30, took 25.977801 s
StructType(StructField(referenceCode,StringType,true))
[22:22:36.271][main][INFO][DataSourceStrategy:58] Selected 1 partitions out of 1, pruned 0.0% partitions.
[22:22:36.325][main][INFO][MemoryStore:58] Block broadcast_1 stored as values in memory (estimated size 89.3 KB, free 175.5 KB)
[22:22:36.389][main][INFO][MemoryStore:58] Block broadcast_1_piece0 stored as bytes in memory (estimated size 20.2 KB, free 195.7 KB)
[22:22:36.389][dispatcher-event-loop-0][INFO][BlockManagerInfo:58] Added broadcast_1_piece0 in memory on localhost:54419 (size: 20.2 KB, free: 1088.2 MB)
[22:22:36.391][main][INFO][SparkContext:58] Created broadcast 1 from collect at InspectInputSplits.scala:34
[22:22:36.520][main][INFO][deprecation:1174] mapred.min.split.size is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize
[22:22:36.522][main][INFO][ParquetRelation:58] Reading Parquet file(s) from s3n://bucket//trackingPackage/dpYear=2016/dpMonth=5/dpDay=10/part-r-00004-1c86d6b0-4f6f-4770-a930-c42d77e3c729-1462833064172.gz.parquet
[22:22:36.554][main][INFO][SparkContext:58] Starting job: collect at InspectInputSplits.scala:34
[22:22:36.556][dag-scheduler-event-loop][INFO][DAGScheduler:58] Got job 1 (collect at InspectInputSplits.scala:34) with 1 output partitions
[22:22:36.556][dag-scheduler-event-loop][INFO][DAGScheduler:58] Final stage: ResultStage 1 (collect at InspectInputSplits.scala:34)
[22:22:36.556][dag-scheduler-event-loop][INFO][DAGScheduler:58] Parents of final stage: List()
[22:22:36.557][dag-scheduler-event-loop][INFO][DAGScheduler:58] Missing parents: List()
[22:22:36.557][dag-scheduler-event-loop][INFO][DAGScheduler:58] Submitting ResultStage 1 (MapPartitionsRDD[4] at collect at InspectInputSplits.scala:34), which has no missing parents
[22:22:36.571][dag-scheduler-event-loop][INFO][MemoryStore:58] Block broadcast_2 stored as values in memory (estimated size 7.6 KB, free 203.3 KB)
[22:22:36.575][dag-scheduler-event-loop][INFO][MemoryStore:58] Block broadcast_2_piece0 stored as bytes in memory (estimated size 4.0 KB, free 207.3 KB)
[22:22:36.576][dispatcher-event-loop-1][INFO][BlockManagerInfo:58] Added broadcast_2_piece0 in memory on localhost:54419 (size: 4.0 KB, free: 1088.2 MB)
[22:22:36.577][dag-scheduler-event-loop][INFO][SparkContext:58] Created broadcast 2 from broadcast at DAGScheduler.scala:1006
[22:22:36.577][dag-scheduler-event-loop][INFO][DAGScheduler:58] Submitting 1 missing tasks from ResultStage 1 (MapPartitionsRDD[4] at collect at InspectInputSplits.scala:34)
[22:22:36.577][dag-scheduler-event-loop][INFO][TaskSchedulerImpl:58] Adding task set 1.0 with 1 tasks
[22:22:36.585][dispatcher-event-loop-3][INFO][TaskSetManager:58] Starting task 0.0 in stage 1.0 (TID 2, localhost, partition 0,PROCESS_LOCAL, 2481 bytes)
[22:22:36.586][Executor task launch worker-1][INFO][Executor:58] Running task 0.0 in stage 1.0 (TID 2)
[22:22:36.605][Executor task launch worker-1][INFO][ParquetRelation$$anonfun$buildInternalScan$1$$anon$1:58] Input split: ParquetInputSplit{part: s3n://bucket//trackingPackage/dpYear=2016/dpMonth=5/dpDay=10/part-r-00004-1c86d6b0-4f6f-4770-a930-c42d77e3c729-1462833064172.gz.parquet start: 0 end: 5364897 length: 5364897 hosts: []}
[22:22:38.253][Executor task launch worker-1][INFO][NativeS3FileSystem:619] Opening 's3n://bucket//trackingPackage/dpYear=2016/dpMonth=5/dpDay=10/part-r-00004-1c86d6b0-4f6f-4770-a930-c42d77e3c729-1462833064172.gz.parquet' for reading
[22:23:04.249][Executor task launch worker-1][INFO][NativeS3FileSystem:619] Opening 's3n://bucket//trackingPackage/dpYear=2016/dpMonth=5/dpDay=10/part-r-00004-1c86d6b0-4f6f-4770-a930-c42d77e3c729-1462833064172.gz.parquet' for reading
[22:23:28.337][Executor task launch worker-1][INFO][CodecPool:181] Got brand-new decompressor [.gz]
[22:23:28.400][dispatcher-event-loop-1][INFO][BlockManagerInfo:58] Removed broadcast_0_piece0 on localhost:54419 in memory (size: 21.7 KB, free: 1088.2 MB)
[22:23:28.408][Spark Context Cleaner][INFO][ContextCleaner:58] Cleaned accumulator 1
[22:23:49.993][Executor task launch worker-1][INFO][Executor:58] Finished task 0.0 in stage 1.0 (TID 2). 9376344 bytes result sent to driver
[22:23:50.191][task-result-getter-2][INFO][TaskSetManager:58] Finished task 0.0 in stage 1.0 (TID 2) in 73612 ms on localhost (1/1)
[22:23:50.191][task-result-getter-2][INFO][TaskSchedulerImpl:58] Removed TaskSet 1.0, whose tasks have all completed, from pool
[22:23:50.191][dag-scheduler-event-loop][INFO][DAGScheduler:58] ResultStage 1 (collect at InspectInputSplits.scala:34) finished in 73.612 s
[22:23:50.195][main][INFO][DAGScheduler:58] Job 1 finished: collect at InspectInputSplits.scala:34, took 73.640193 s
The Spark UI snapshot (image not included here):
Questions:
In the logs, I can see that the parquet file is read a total of 3 times: once by the [pool-21-thread-1] thread (on the driver) and two more times by the [Executor task launch worker-1] thread, which I assume to be a worker thread. While debugging, I can see that before the first read, two s3n requests were made specifically for the footer (they had the content-range HTTP header): first to get the size of the footer and then to get the footer itself. My question is: when we already had the footer information, why did the [pool-21-thread-1] thread still have to read the entire file? And why did the executor thread make 2 requests to read the s3 file?
The Spark UI shows that only 670 KB is taken as input. Since I was not convinced this was true, I looked at the network activity, and it seems 20+ MB was received. The attached snapshot shows roughly 5+ MB of received data for the first read and later on 15+ MB for the 2 reads after Thread.sleep(1000*10). I could not reach the debug point for the last 2 reads by the [pool-21-thread-1] thread due to IDE issues, so I am not sure whether only the particular column ("referenceCode") is being read or the entire file. I understand that there is overhead from network packets at the tcp/udp layers, but 20+ MB seems like a lot for just one column.
After debugging into the application, it turned out that S3N still uses the jets3t library, while S3A has a new implementation based on the AWS SDK (HADOOP-10400).
Hadoop's NativeS3FileSystem implementation does not support seek (partial-content reads) on S3 files; it downloads the whole file first, which is why the 5.2 MB file is fetched in full on every open.
EDIT: This scenario was not seen on EMR. On EMR, Amazon provides a highly optimized S3 connector, EMRFS, for all schemes, which overrides the connector provided by Hadoop.
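As a rough sketch (not from the original answer), switching the same read to the s3a connector could look like the following, assuming the hadoop-aws jar and its AWS SDK dependency are on the classpath; the credential settings use the standard fs.s3a.* property names and the keys are placeholders:
sc.hadoopConfiguration.set("fs.s3a.access.key", "<ACCESS_KEY>")  // placeholder credentials
sc.hadoopConfiguration.set("fs.s3a.secret.key", "<SECRET_KEY>")
val dfA = sqlc.read
  .parquet(filePath.replace("s3n://", "s3a://"))  // same object, s3a scheme
  .select("referenceCode")
Because S3AFileSystem supports seeks, Parquet can fetch just the footer and the projected column chunks instead of downloading the whole object.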

SonarQube WebServer process spikes CPU after a while

We're running SonarQube 5.1.2 on an AWS node. After a short period of use, typically a day or two, the Sonar web server becomes unresponsive and spikes the server's CPUs:
top - 01:59:47 up 2 days, 3:43, 1 user, load average: 1.89, 1.76, 1.11
Tasks: 93 total, 1 running, 92 sleeping, 0 stopped, 0 zombie
Cpu(s): 94.5%us, 0.0%sy, 0.0%ni, 5.5%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 7514056k total, 2828772k used, 4685284k free, 155372k buffers
Swap: 0k total, 0k used, 0k free, 872440k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
2328 root 20 0 3260m 1.1g 19m S 188.3 15.5 62:51.79 java
11 root 20 0 0 0 0 S 0.3 0.0 0:07.90 events/0
2284 root 20 0 3426m 407m 19m S 0.3 5.5 9:51.04 java
1 root 20 0 19356 1536 1224 S 0.0 0.0 0:00.23 init
The 188% CPU load is coming from the WebServer process:
$ ps -eF|grep "root *2328"
root 2328 2262 2 834562 1162384 0 Mar01 ? 01:06:24 /usr/java/jre1.8.0_25/bin/java -Djava.awt.headless=true -Dfile.encoding=UTF-8 -Djruby.management.enabled=false -Djruby.compile.invokedynamic=false -Xmx768m -XX:MaxPermSize=160m -XX:+HeapDumpOnOutOfMemoryError -Djava.io.tmpdir=/opt/sonar/temp -cp ./lib/common/*:./lib/server/*:/opt/sonar/lib/jdbc/mysql/mysql-connector-java-5.1.34.jar org.sonar.server.app.WebServer /tmp/sq-process615754070383971531properties
We initially thought we were running on far too small a node and recently upgraded to an m3.large instance, but we're seeing the same problem (except now it spikes 2 CPUs instead of one).
The only interesting info in the log is this:
2016.03.04 01:52:38 WARN web[o.e.transport] [sonar-1456875684135] Received response for a request that has timed out, sent [39974ms] ago, timed out [25635ms] ago, action [cluster:monitor/nodes/info], node [[#transport#-1][xxxxxxxx-build02-us-west-2b][inet[/127.0.0.1:9001]]], id [43817]
2016.03.04 01:53:19 INFO web[o.e.client.transport] [sonar-1456875684135] failed to get node info for [#transport#-1][xxxxxxxx-build02-us-west-2b][inet[/127.0.0.1:9001]], disconnecting...
org.elasticsearch.transport.ReceiveTimeoutTransportException: [][inet[/127.0.0.1:9001]][cluster:monitor/nodes/info] request_id [43817] timed out after [14339ms]
at org.elasticsearch.transport.TransportService$TimeoutHandler.run(TransportService.java:366) ~[elasticsearch-1.4.4.jar:na]
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) [na:1.8.0_25]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) [na:1.8.0_25]
at java.lang.Thread.run(Unknown Source) [na:1.8.0_25]
Does anyone know what might be going on here or has some ideas how to further diagnose this problem?
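One way to dig further is a per-thread CPU breakdown plus a thread dump (a sketch; 2328 is the WebServer PID from the output above, and jstack needs a JDK, while the process runs under a JRE, so you may need to install a matching JDK or send kill -3 2328 to get the dump written to the process logs):
top -H -p 2328                        # list per-thread CPU usage inside the WebServer JVM
jstack 2328 > /tmp/sonar-web.jstack   # thread dump; convert the hot thread IDs to hex and match them against nid=0x... entries
That usually shows whether the CPU is burning in Elasticsearch client threads (as the timeouts in the log hint), in the JRuby/web threads, or in GC.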
