I am using Grafana 5.1.3 (commit: 087143285) and InfluxDB 1.5.2 (shell version) along with JMeter.
The dashboard has 13 panels. A panel takes 5 to 8 seconds to load.
The query below runs for a panel (when I run the same query on the DB server, it runs very fast):
SELECT mean("startedThreads") FROM "virtualUsers" WHERE time >= 1537865329564ms and time <= 1537867129564ms GROUP BY time(60s) fill(null);
EXPLAIN ANALYZE
execution_time: 157.341µs
planning_time: 626.44µs
total_time: 783.781µs
SELECT count("responseTime")/60 FROM "requestsRaw" WHERE time >= 1537865329564ms and time <= 1537867129564ms GROUP BY time(60s) fill(null)
execution_time: 535.011µs
planning_time: 1.805892ms
total_time: 2.340903ms
Below are the memory and CPU details. InfluxDB and Grafana are hosted on the same server.
free -g
              total        used        free      shared  buff/cache   available
Mem:             15           3          11           0           1          12
Swap:             7           0           6
CPU(s): 2
On-line CPU(s) list: 0,1
Thread(s) per core: 1
Core(s) per socket: 2
Socket(s): 1
As far as I understand, Grafana's minimum memory requirement is 249MB, so memory should not be a problem for Grafana.
Please let me know if you need more details.
It is odd that the query runs fast while Grafana needs a long time. Panels should be displayed as soon as Grafana gets a response.
AFAIK rendering is done in the browser, so this could be the bottleneck. If your browser runs on a Raspberry Pi 1, please try a different computer.
It is not clear whether all panels need a long time to load or just one of them. You should try to find out whether the loading time is related to a single panel.
Lastly, consider that all queries are sent at the same time, so making just one query to the server from the CLI may not be representative. You could try splitting the dashboard into multiple dashboards to improve the loading time.
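To check whether firing all panel queries at once behaves differently from a single CLI query, something like the following rough sketch could be used. It only shows the timing harness: `fake_query` is a hypothetical stand-in that sleeps, and you would replace it with your own call to the database. The helper name `time_concurrent` is also made up for this example.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def time_concurrent(run_query, n_queries):
    """Start n_queries calls to run_query at once.

    Returns (wall_seconds, per_query_seconds).
    """
    def timed(_):
        start = time.perf_counter()
        run_query()
        return time.perf_counter() - start

    wall_start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=n_queries) as pool:
        per_query = list(pool.map(timed, range(n_queries)))
    return time.perf_counter() - wall_start, per_query


def fake_query():
    # Hypothetical stand-in for a real request to the database.
    time.sleep(0.05)


if __name__ == "__main__":
    # 13 matches the number of panels in the question.
    wall, per_query = time_concurrent(fake_query, 13)
    print(f"wall: {wall:.3f}s, slowest query: {max(per_query):.3f}s")
```

If the slowest query under load is still far below the 5 to 8 seconds seen in the dashboard, the database is probably not the bottleneck and the browser side deserves a closer look.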
I have a large database (100M rows) indexed by SphinxSearch. Each search takes 0.1-0.5s. However, if I run 10 searches concurrently, they take 20s on average.
Is it the expected behaviour of SphinxSearch?
Should I adjust the config or move to another search engine for concurrency?
My config file is simple:
searchd
{
listen = 9312
listen = 9306:mysql41
pid_file = /var/searchd.pid
read_timeout = 30
log = /var/log/sphinxsearch/searchd.log
query_log = /var/log/sphinxsearch/query.log
}
Is it the expected behaviour of SphinxSearch?
It heavily depends on the number of CPUs. If you have more than 10 physical CPUs, then latency degrading from 0.5 sec to 20 sec when concurrency goes from 1 to 10 is definitely not expected. In that case, first make sure all your CPUs are busy under the concurrent load. If they are not, then depending on your Sphinx version and multi-tasking mode, let it run with more threads.
Should I adjust the config or move to another search engine for concurrency?
I recommend Manticore Search as:
it's open source - https://github.com/manticoresoftware/manticoresearch/
it's the only fork of Sphinx and if you are familiar with Sphinx in general it shouldn't be a problem to migrate
hundreds of bugs have been fixed
the multi-tasking mode is completely different (coroutines)
Using the Android Management API, I'm trying to collect the device's storage consumption information.
I found some information in memoryInfo and memoryEvents.
In "memoryInfo" there is an attribute called "totalInternalStorage" and in "memoryEvents" there is an event of type "INTERNAL_STORAGE_MEASURED".
Questions:
Please, what does the value shown in "totalInternalStorage" mean? Does it mean the total amount of storage available?
What does the value shown in "INTERNAL_STORAGE_MEASURED" mean? Does it mean the consumed value of internal storage?
How is a "memoryEvents" fired? Can I collect this information at any time or do I have to wait for Google to do it in their time?
I took a test and collected the following information:
totalInternalStorage = 0.1 GB
memoryEvents = 4 GB (INTERNAL_STORAGE_MEASURED, 3 days ago)
This information, to me, is very confusing and that's why I need your help.
Thanks
totalInternalStorage in memoryInfo is the total size of the root "system" partition.
Each memoryEvent carries three values: eventType, createTime, and byteCount. In the test you made, the values you received are as follows:
eventType - INTERNAL_STORAGE_MEASURED: the storage that was measured is the internal storage, i.e. the read-only system partition
byteCount - 4 GB: the number of free bytes in the medium, here your internal storage
createTime - 3 days ago: when the event occurred
The memoryInfo measurements are taken asynchronously on the device, either when a change is detected or when there is a periodic refresh of the device status. You can check the status every time you call device.get().
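As a minimal sketch of how these fields could be read together, the function below takes a device resource that has already been fetched and parsed into a dict. The field names are the ones discussed above. The function name `summarize_storage` and the sample values are made up for illustration, and the code assumes that byte counts arrive as strings and that memoryEvents is in chronological order.

```python
def summarize_storage(device):
    """Pick the storage fields discussed above out of a device dict."""
    total = device.get("memoryInfo", {}).get("totalInternalStorage")
    measured = [
        event for event in device.get("memoryEvents", [])
        if event.get("eventType") == "INTERNAL_STORAGE_MEASURED"
    ]
    # Assumption: events are in chronological order, so the last one is newest.
    latest = measured[-1] if measured else None
    return {
        "total_internal_bytes": int(total) if total is not None else None,
        "free_internal_bytes": int(latest["byteCount"]) if latest else None,
        "measured_at": latest["createTime"] if latest else None,
    }


# Illustrative sample only; these are not values returned by a real device.
sample_device = {
    "memoryInfo": {"totalInternalStorage": "100000000"},
    "memoryEvents": [
        {"eventType": "RAM_MEASURED",
         "createTime": "2018-09-20T10:00:00Z", "byteCount": "1000000000"},
        {"eventType": "INTERNAL_STORAGE_MEASURED",
         "createTime": "2018-09-22T10:00:00Z", "byteCount": "4000000000"},
    ],
}

if __name__ == "__main__":
    print(summarize_storage(sample_device))
```

The sketch deliberately does not subtract one value from the other: according to the answer above, the two numbers describe different things (partition size versus measured free bytes), so their difference is not necessarily "used storage".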
I am trying to query 15 days of data from S3. Querying each day separately works fine, and querying 14 days together works fine as well. But when I query all 15 days, the job keeps running forever (hangs) and the task # does not update.
My settings :
I am using a 51-node cluster of r3.4xlarge instances with dynamic allocation and maximize resource allocation turned on.
All I am doing is:
val startTime="2017-11-21T08:00:00Z"
val endTime="2017-12-05T08:00:00Z"
val start = DateUtils.getLocalTimeStamp( startTime )
val end = DateUtils.getLocalTimeStamp( endTime )
val days: Int = Days.daysBetween( start, end ).getDays
val files: Seq[String] = (0 to days)
.map( start.plusDays )
.map( d => s"$input_path${DateTimeFormat.forPattern( "yyyy/MM/dd" ).print( d )}/*/*" )
sqlSession.sparkContext.textFile( files.mkString( "," ) ).count
When I run the same with 14 days, I get a count of 197337380, and when I run the 15th day separately I get 27676788. But when I query all 15 days together, the job hangs.
Update :
The job works fine with :
var df = sqlSession.createDataFrame(sc.emptyRDD[Row], schema)
for(n <- files ){
val tempDF = sqlSession.read.schema( schema ).json(n)
df = df.union(tempDF)
}
df.count
But can someone explain why it works now but not before?
UPDATE : After setting mapreduce.input.fileinputformat.split.minsize to 256 GB it works fine now.
Dynamic allocation and maximize resource allocation are two different settings; one is disabled when the other is active. With maximize resource allocation in EMR, 1 executor per node is launched, and all the cores and memory of the node are allocated to that executor.
I would recommend taking a different route. You seem to have a pretty big cluster with 51 nodes, and I am not sure it is even required. However, follow this rule of thumb to begin with, and you will get the hang of how to tune these configurations.
Cluster memory - minimum of 2X the data you are dealing with.
Now assuming 51 nodes is what you require, try below:
r3.4xlarge has 16 CPUs, so you can put 15 of them to use, leaving one for the OS and other processes.
Set your number of executors to 150. This allocates roughly 3 executors per node.
Set the number of cores per executor to 5 (3 executors per node x 5 cores = 15 cores per node).
Set your executor memory to roughly total host memory/3 = 35G.
You have to control the parallelism (default partitions): set this to the total number of cores you have, ~800.
Adjust shuffle partitions: make this twice the number of cores, 1600.
Above configurations have been working like a charm for me. You can monitor the resource utilization on Spark UI.
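The arithmetic behind these numbers can be sketched as below. This is only an illustration of the rule of thumb, not an official sizing formula; the function name `size_executors` is made up, and the Spark property names in the result are the standard ones these values would normally be assigned to. Memory is left out because it depends on how much of the host YARN is allowed to hand out.

```python
def size_executors(nodes, cpus_per_node, cores_per_executor=5, reserved_cores=1):
    """Derive executor counts from the cluster shape, per the rule of thumb above."""
    usable_cores = cpus_per_node - reserved_cores      # leave one core for the OS
    executors_per_node = usable_cores // cores_per_executor
    num_executors = nodes * executors_per_node
    total_cores = num_executors * cores_per_executor
    return {
        "executors_per_node": executors_per_node,
        "spark.executor.instances": num_executors,
        "spark.executor.cores": cores_per_executor,
        "spark.default.parallelism": total_cores,
        "spark.sql.shuffle.partitions": 2 * total_cores,
    }


if __name__ == "__main__":
    # 51 nodes with 16 CPUs each, as in the question.
    for key, value in size_executors(51, 16).items():
        print(key, value)
```

For 51 nodes this gives 153 executors, 765 cores and 1530 shuffle partitions, which the list above rounds to 150, ~800 and 1600.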
Also, in your yarn config /etc/hadoop/conf/capacity-scheduler.xml file, set yarn.scheduler.capacity.resource-calculator to org.apache.hadoop.yarn.util.resource.DominantResourceCalculator - which will allow Spark to really go full throttle with those CPUs. Restart yarn service after change.
You should increase the executor memory and the number of executors. If the data is huge, try increasing the driver memory as well.
My suggestion is to not use dynamic resource allocation, let the job run, and see whether it still hangs. (Please note that the Spark job can then consume the entire cluster's resources and starve other applications, so try this approach when no other jobs are running.) If it does not hang, the problem lies in the resource allocation: start hardcoding the resources and keep increasing them until you find the best allocation you can use.
The links below can help you understand resource allocation and how to optimize it.
http://site.clairvoyantsoft.com/understanding-resource-allocation-configurations-spark-application/
https://community.hortonworks.com/articles/42803/spark-on-yarn-executor-resource-allocation-optimiz.html
I'm running several spiders over different websites. Most runs take 2 - 3 days and many work fine. But sometimes the crawl just stops or crashes.
With:
scrapy crawl myspider > logs/myspider.log 2>&1 &
I'm writing the output into a file and for one crawl for instance the last entry is:
[scrapy.extensions.logstats] INFO: Crawled 1975 pages (at 1 pages/min), scraped 1907 items (at 1 items/min)
and it simply stops there. It did not dump any stats and it did not get to the end of everything.
Now I assume that could be a network issue or similar?
The machine has an average load of 0.10, I'm scraping with a 40 sec delay and running 5 - 10 spiders. The hardware is old but RAM and CPU are usually bored in htop. I didn't change the LOG_LEVEL, so it should be DEBUG by default.
How can I find out what happens?
I have a medium-sized neo4j database with about 700000 nodes and 1-5 outgoing relations on each node.
If I use the browser interface to query nodes on an indexed attribute and find adjacent nodes, it takes about 1500ms, which is fine for me.
MATCH (n {id_str : 'some_id'})-->(child) return child.id_str
...
Returned 2 rows in 1655 ms
But if I run a similar Cypher query that names the relation type, using the Ruby Neography library, it takes a couple of minutes to complete.
lookup_links = "MATCH (n {id_str : {id_str}})-[:internal_link]->(child) return child.id_str"
links = @neo.execute_query(lookup_links, :id_str => id_str)
And after that regular browser queries become extremely slow too, taking about two minutes each.
MATCH (n2 {id_str : 'some_id'})-->(child) return child.id_str
Returned 2 rows in 116201 ms
I run the experiments on a 64-bit Ubuntu 14.04 laptop with 8GB RAM and a 1GB heap for Neo4j. The Neo4j version is 2.1.3, installed from the official deb package. The Neography version is 1.6.0. I use MRI 1.9.3.
I took a stack dump using kill -3 while Neo4j was busy serving the query.
https://gist.github.com/akamaus/a06bc9e04c7209c480e9
Any ideas what's going wrong and how to investigate it?