I know about optimizing scripts to use getValues over a range in order to reduce the number of calls.
I've tremendously sped up the overall script by creating a global object that holds all the named ranges for the sheet. Very fast, and roughly a 4x improvement in the overall operation of all functions that get/set ranges.
I went to sleep last night with everything running well. I woke up this morning to this:
[20-02-22 07:39:55:825 MST] SpreadsheetApp.Range.setValue([FALSE]) [0 seconds]
[20-02-22 07:39:55:825 MST] SpreadsheetApp.Range.setValue([TRUE]) [0 seconds]
[20-02-22 07:39:55:826 MST] SpreadsheetApp.getActiveSpreadsheet() [0 seconds]
[20-02-22 07:39:55:826 MST] SpreadsheetApp.Spreadsheet.getSheetByName([Status]) [0 seconds]
[20-02-22 07:39:55:827 MST] SpreadsheetApp.Sheet.getRange([17:17]) [0 seconds]
[20-02-22 07:40:16:209 MST] SpreadsheetApp.Range.getValues() [20.381 seconds]
[20-02-22 07:40:26:373 MST] SpreadsheetApp.Range.getValues() [10.164 seconds]
[20-02-22 07:40:26:373 MST] SpreadsheetApp.Spreadsheet.getSheetByName([Sender]) [0 seconds]
[20-02-22 07:40:26:374 MST] SpreadsheetApp.Sheet.activate() [0 seconds]
[20-02-22 07:40:26:374 MST] SpreadsheetApp.Range.getColumn() [0 seconds]
[20-02-22 07:40:26:375 MST] Logger.log([Settings:Replies,,Complete, []]) [0 seconds]
[20-02-22 07:40:26:375 MST] SpreadsheetApp.Range.setValue([NU6COI7]) [0 seconds]
[20-02-22 07:40:46:818 MST] SpreadsheetApp.Range.getValues() [20.442 seconds]
Set calls are very fast, but each and every get (even for a single cell) is taking forever and then some. Ideas? Is this Google having a bad morning, or some odd change on the scripting side?
I know that big formulas in the cells being read can slow down a get, as Google waits for the formula results, but I don't have that. There isn't a circular dependency or anything like that. My internet connection is consistent.
Yes, it was Google. I decided to just watch some videos and ignore this for a while. An hour later, without touching anything, all was back to normal. I tested on different computers and they all showed the same slow speeds; now it's as fast as can be. Internet speed is still the same.
What is the best practice for Max Wait (ms) value in JDBC Connection Configuration?
I am executing two types of tests:
20 loops for each number of threads - to get the maximum throughput
30 min runtime for each number of threads - to get response times
With Max Wait = 10000 ms I can execute JDBC requests with 10, 20, 30, 40, 60 and 80 threads without an error. With Max Wait = 20000 ms I can go higher and execute with 100, 120 and 140 threads without an error. That seems like logical behaviour.
Now the questions:
Can I increase the Max Wait value as much as I want? Is that a correct way to get more test results?
Should I stop testing and not increase the number of threads if any errors occur in a report? I got, e.g., 0.06% errors from 10000 samples. Is this a stopping point for my testing?
Thanks.
Everything depends on what your requirements are and how you defined your performance baseline.
Can I increase the Max Wait value as much as I want? Is that a correct way to get more test results?
If you are OK with higher response times as long as the functionality works, then you can set Max Wait as high as you want. Practically, though, there will be a threshold on response times (say, 2 seconds to perform a login transaction) that you define as part of your performance SLA or performance baseline. So even though increasing Max Wait makes your requests succeed, they will eventually be counted as failed requests because their response times cross those threshold values.
Note: higher response times for DB operations eventually result in higher response times for the web application (and its end users).
Should I stop testing and not increase the number of threads if any errors occur in a report?
The same applies to error rates. If the SLA agrees on some % error rate, then you can consider the test to be meeting the SLA or performance baseline as long as the actual error rate is less than that. E.g. if the requirement says 0% error rate, then even 0.1% is considered a failure.
Is this a stopping point for my testing?
You can stop the test at whatever point you want; it depends entirely on which metrics you want to capture. In my experience it is better to continue the test until there is no point in continuing, e.g. the error rate has reached 99%. If you are getting an error rate of 0.06%, I suggest continuing the test to find the breaking point of the system: a server crash, response times reaching unacceptable values, memory issues, etc.
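For example, a quick sanity check of a run against such a baseline might look like the sketch below. The sample and failure counts are illustrative (loosely based on the 0.06% figure above), and the SLA thresholds are assumptions, not anything JMeter itself produces:

public class ErrorRateCheck {
    public static void main(String[] args) {
        // Hypothetical figures: 10000 samples with 6 failures (~0.06%).
        long totalSamples = 10_000;
        long failedSamples = 6;

        double errorRatePercent = 100.0 * failedSamples / totalSamples;
        System.out.printf("Error rate: %.2f%%%n", errorRatePercent);

        // An SLA of 0% means any failure fails the test; a looser SLA may pass it.
        System.out.println("Meets 0% SLA: " + (errorRatePercent <= 0.0));
        System.out.println("Meets 1% SLA: " + (errorRatePercent <= 1.0));
    }
}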
Following are some good references:
https://www.nngroup.com/articles/response-times-3-important-limits/
http://calendar.perfplanet.com/2011/how-response-times-impact-business/
difference between baseline and benchmark in performance of an application
https://msdn.microsoft.com/en-us/library/ms190943.aspx
https://msdn.microsoft.com/en-us/library/bb924375.aspx
http://searchitchannel.techtarget.com/definition/service-level-agreement
This setting maps to DBCP -> BasicDataSource -> maxWaitMillis parameter, according to the documentation:
The maximum number of milliseconds that the pool will wait (when there are no available connections) for a connection to be returned before throwing an exception, or -1 to wait indefinitely
It should match the corresponding setting in your application's database configuration. If your goal is to determine the maximum performance, just put -1 there and the timeout will be disabled.
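To make the mapping concrete, here is a minimal sketch of the same setting configured directly on a DBCP2 BasicDataSource (assuming commons-dbcp2 and a JDBC driver on the classpath; the driver class, URL and credentials are placeholders, and JMeter's JDBC Connection Configuration exposes the same pool parameter through its GUI):

import java.sql.Connection;
import org.apache.commons.dbcp2.BasicDataSource;

public class PoolConfigExample {
    public static void main(String[] args) throws Exception {
        BasicDataSource ds = new BasicDataSource();
        ds.setDriverClassName("com.mysql.cj.jdbc.Driver"); // placeholder driver
        ds.setUrl("jdbc:mysql://localhost:3306/testdb");   // placeholder URL
        ds.setUsername("user");
        ds.setPassword("secret");

        ds.setMaxTotal(80);          // upper bound on pooled connections
        ds.setMaxWaitMillis(10_000); // like JMeter's Max Wait (ms); -1 waits indefinitely

        try (Connection conn = ds.getConnection()) {
            // If no connection becomes free within maxWaitMillis,
            // getConnection() throws an SQLException instead of blocking forever.
            System.out.println("Got connection: " + conn.getMetaData().getURL());
        }
        ds.close();
    }
}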
In regards to "Is this a stopping point for my testing?" - it depends on multiple factors, like what the application is doing, what you are trying to achieve, and what type of testing is being conducted. If you are testing a database that orchestrates nuclear plant operations, then a zero error threshold is the only acceptable one. If it is a picture gallery of cats, this error level can be considered acceptable.
In the majority of cases performance testing is divided into several test executions, like:
Load Testing - putting the system under the anticipated load to see whether it is capable of handling the forecast number of users
Soak Testing - basically the same as Load Testing, but keeping the load up for a prolonged duration. This allows you to detect e.g. memory leaks
Stress Testing - determining the boundaries of the application, saturation points, bottlenecks, etc. Start from zero load and gradually increase it until the application breaks, noting the maximum number of users, the correlation of other metrics (response time, throughput, error rate, etc.) with the increasing number of users, whether the application recovers when the load gets back to normal, etc.
See the Why 'Normal' Load Testing Isn't Enough article for the above testing types described in detail.
I have the feeling that the server's logging is quite verbose. Is there a way to disable or reduce the logging output? It seems that if I send a document to the server, it writes the content to stdout, which might be a performance killer.
Can I do that somehow?
Update
I found a way to suppress the output from the server. My question remains whether (and how) I can do this with a command-line argument for the actual server; however, as a dirty workaround, the following seems to ease the overhead.
Running the server with
java -mx6g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer -prettyPrint false 2&>1 >/dev/null
where >/dev/null redirects the output into nothing. Unfortunately this alone did not help; 2&>1 seems to do the trick here. I confess that I do not know what it is actually doing. However, I compared two runs.
Running with 2&>1 >/dev/null
Processed 100 sentences
Overall time: 2.1797 sec
Time per sentence: 0.0218 sec
Processed 200 sentences
Overall time: 6.5694 sec
Time per sentence: 0.0328 sec
...
Processed 1300 sentences
Overall time: 30.482 sec
Time per sentence: 0.0234 sec
Processed 1400 sentences
Overall time: 32.848 sec
Time per sentence: 0.0235 sec
Processed 1500 sentences
Overall time: 35.0417 sec
Time per sentence: 0.0234 sec
Running without additional arguments
ParagraphVectorTrainer - Epoch 1 of 6
Processed 100 sentences
Overall time: 2.9826 sec
Time per sentence: 0.0298 sec
Processed 200 sentences
Overall time: 5.5169 sec
Time per sentence: 0.0276 sec
...
Processed 1300 sentences
Overall time: 54.256 sec
Time per sentence: 0.0417 sec
Processed 1400 sentences
Overall time: 59.4675 sec
Time per sentence: 0.0425 sec
Processed 1500 sentences
Overall time: 64.0688 sec
Time per sentence: 0.0427 sec
This was a very shallow test, but it appears that this can have quite an impact. The difference here is a factor of 1.828, which is quite a difference over time.
However, this was just a quick test and I cannot guarantee that my results are completely sane!
Further update:
I assume this has to do with how the JVM optimizes the code over time, but the time per sentence has become comparable with what I get on my local machine. Keep in mind that I got the results below using 2&>1 >/dev/null to eliminate the stdout logging.
Processed 68500 sentences
Overall time: 806.644 sec
Time per sentence: 0.0118 sec
Processed 68600 sentences
Overall time: 808.2679 sec
Time per sentence: 0.0118 sec
Processed 68700 sentences
Overall time: 809.9669 sec
Time per sentence: 0.0118 sec
You're now the third person that's asked for this :) -- see Preventing Stanford Core NLP Server from outputting the text it receives. In the HEAD of the GitHub repo, and in versions 3.6.1 onwards, there's a -quiet flag that prevents the server from outputting the text it receives. Other logging can then be configured with SLF4J, if it's on your classpath.
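If you end up embedding the pipeline in your own Java process rather than talking to the standalone server, a blunt programmatic equivalent of the shell redirection above is to silence System.out around the processing loop. This is plain Java and nothing CoreNLP-specific; it simply discards everything written to stdout, so treat it as a stopgap rather than proper logging configuration:

import java.io.OutputStream;
import java.io.PrintStream;

public class QuietStdout {
    public static void main(String[] args) {
        PrintStream original = System.out;
        // Discard anything written to System.out (the moral equivalent of >/dev/null).
        System.setOut(new PrintStream(new OutputStream() {
            @Override public void write(int b) { /* swallow output */ }
        }));
        try {
            // ... run the annotation / processing loop here ...
            System.out.println("this line is silently dropped");
        } finally {
            System.setOut(original); // restore stdout afterwards
            System.out.println("stdout restored");
        }
    }
}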
I searched the site and found another question about this, but it has no answers.
I'm running the YCSB tool against a Cassandra cluster, and the output of YCSB is:
[OVERALL], RunTime(ms), 302016.0 -> 05 mins 02 secs
[OVERALL], Throughput(ops/sec), 3311.0828565374018
[UPDATE], Operations, 499411
[UPDATE], AverageLatency(us), 2257.980987603397
[UPDATE], MinLatency(us), 389
[UPDATE], MaxLatency(us), 169380
[UPDATE], 95thPercentileLatency(ms), 4
[UPDATE], 99thPercentileLatency(ms), 8
[UPDATE], Return=0, 499411
[UPDATE], 0, 50039
[UPDATE], 1, 222610
[UPDATE], 2, 138349
[UPDATE], 3, 49465
and it continues until about number 70. What does it mean? Are those the numbers of seconds in which that number of operations ran? That seems strange, because the test ran for more than 5 minutes, as you can see from the [OVERALL] entry.
Thank you for your time!
The output indicates
The total execution time was 05 mins 02 secs
The average throughput was 3311.0828565374018 ops/sec across all threads
There were 499411 update operations
The average, minimum, maximum, 95th and 99th percentile latencies
499411 operations gave a return code of zero (all were successful; a non-zero return indicates a failed operation)
50039 operations completed in less than 1ms.
222610 operations completed between 1 and 2ms.
138349 operations completed between 2 and 3ms.
...and so on; they will probably go up to 1000ms. A rough way to read these buckets is sketched below.
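The sketch takes the bucket counts shown in the question (bucket i holding operations that completed in less than i + 1 ms, per the explanation above) and turns them into cumulative percentages; the remaining operations fall into the higher buckets that were trimmed from the output:

public class YcsbHistogram {
    public static void main(String[] args) {
        long totalOps = 499_411; // [UPDATE], Operations
        // Bucket index -> count, taken from the [UPDATE] lines above.
        long[] bucketCounts = {50_039, 222_610, 138_349, 49_465};

        long cumulative = 0;
        for (int ms = 0; ms < bucketCounts.length; ms++) {
            cumulative += bucketCounts[ms];
            double pct = 100.0 * cumulative / totalOps;
            System.out.printf("< %d ms: %d ops (%.1f%% cumulative)%n",
                    ms + 1, cumulative, pct);
        }
        // Cumulative coverage reaches about 92% below 4 ms here, which fits the
        // reported 95th percentile of 4 ms: the 95% mark is crossed in the 4-5 ms bucket.
    }
}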
It is also possible to get a time-series for the latencies by adding the -p timeseries.granularity=2000 switch to the ycsb command.
More information is available in the documentation
In my MR job, which does bulk loading using HFileOutputFormat, 87 map tasks are spawned, and in around 20 minutes all of the tasks reach 100%. Yet the individual task status is still 'Running' in the Hadoop admin page and none are moved to the completed state. The reducer stays in the pending state and never starts. I just waited, but the job errored out after the 30 minute timeout.
My job has to load around 150+ columns. I tried running the same MR job with fewer columns and it completed easily. Any idea why the map tasks are not moved to the completed state even after reaching 100%?
One probable cause would be that the amount of output data emitted is huge; sorting it and writing it back to disk is a time-consuming thing to do, although this is typically not the case.
It would also be wise to check the logs and look for ways to improve your map-reduce code.
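This is not from the original answer, but one commonly suggested tweak when the map output volume is the bottleneck is to compress the intermediate (map-side) output, so there is less data to spill, sort and shuffle. A minimal sketch of the job setup, assuming a codec such as Snappy is available on the cluster (class and job names are placeholders):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.io.compress.SnappyCodec;
import org.apache.hadoop.mapreduce.Job;

public class BulkLoadJobSetup {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Compress intermediate map output to cut down spill/sort/shuffle volume.
        conf.setBoolean("mapreduce.map.output.compress", true);
        conf.setClass("mapreduce.map.output.compress.codec",
                SnappyCodec.class, CompressionCodec.class);

        Job job = Job.getInstance(conf, "hfile-bulk-load");
        // ... set mapper, reducer, HFileOutputFormat, input/output paths here ...
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}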
Waiting time is defined as how long each process has to wait before it gets its time slice.
In scheduling algorithms such as Shortest Job First and First Come First Served, we can find the waiting time easily: we just queue up the jobs and see how long each one had to wait before it got serviced.
When it comes to Round Robin or other preemptive algorithms, a long-running job spends a little time on the CPU, is preempted, waits a while for its next turn to execute, and at some point during one of its turns runs to completion. I want to find the best way to understand the 'waiting time' of jobs under such a scheduling algorithm.
I found a formula which gives waiting time as:
Waiting Time = (Final Start Time - Previous Time in CPU - Arrival Time)
But I fail to understand the reasoning behind this formula. For example, consider a job A that has a burst time of 30 units, with round robin happening every 5 units. There are two more jobs, B (10) and C (15).
The order in which these will be serviced would be:
0 A 5 B 10 C 15 A 20 B 25 C 30 A 35 C 40 A 45 A 50 A 55
Waiting time for A = 40 - 5 - 0
I chose 40 because after 40, A never waits. It just gets its time slices and goes on and on.
I chose 5 because A previously ran between 30 and 35.
0 is the arrival time.
Well, I have a doubt about this formula: why is the slice 15 A 20 not accounted for?
Intuitively, I am unable to see how this gives us the waiting time for A when we account only for the penultimate execution and then subtract the arrival time.
According to me, the waiting time for A should be:
Final Start Time - (sum of all the time it previously spent executing)
If this formula is wrong, why is it?
Please help clarify my understanding of this concept.
You've misunderstood what the formula means by "previous time in CPU". This actually means the same thing as what you call the "sum of all the time it previously spent executing". (I guess "previous time in CPU" is short for "total time previously spent running on the CPU", where "previously" means "before the final start".)
You still need to subtract the arrival time because the process obviously wasn't waiting before it arrived. (Just in case this is unclear: The "arrival time" is the time when the job was submitted to the scheduler.) In your example, the arrival time for all processes is 0, so this doesn't make a difference there, but in the general case, the arrival time needs to be taken into account.
Edit: If you look at the example on the webpage you linked to, process P1 takes two time slices of four time units each before its final start, and its "previous time in CPU" is calculated as 8, consistent with the interpretation above.
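Applying that reading to the example in the question: A's total time on the CPU before its final start at 40 is 5 + 5 + 5 = 15, so its waiting time is 40 - 15 - 0 = 25, which matches adding up the gaps directly (5 to 15, 20 to 30, and 35 to 40). The sketch below simulates the round-robin schedule from the question and computes the waiting time as turnaround minus burst, which is what the formula reduces to once "previous time in CPU" is read as the total time previously spent running:

import java.util.ArrayDeque;
import java.util.Deque;

public class RoundRobinWaitingTime {
    public static void main(String[] args) {
        String[] names = {"A", "B", "C"};
        int[] burst    = {30, 10, 15};  // burst times from the example
        int[] arrival  = {0, 0, 0};     // all jobs arrive at time 0 here
        int quantum    = 5;

        int[] remaining = burst.clone();
        int[] completion = new int[names.length];
        Deque<Integer> ready = new ArrayDeque<>();
        for (int i = 0; i < names.length; i++) ready.add(i); // arrival order A, B, C

        int time = 0;
        while (!ready.isEmpty()) {
            int p = ready.poll();
            int slice = Math.min(quantum, remaining[p]);
            time += slice;
            remaining[p] -= slice;
            if (remaining[p] > 0) {
                ready.add(p);            // preempted, back to the end of the queue
            } else {
                completion[p] = time;    // job finished at this point
            }
        }

        for (int i = 0; i < names.length; i++) {
            int turnaround = completion[i] - arrival[i];
            // Same as: final start time - previous time in CPU - arrival time
            int waiting = turnaround - burst[i];
            System.out.printf("%s: completion=%d, waiting=%d%n",
                    names[i], completion[i], waiting);
        }
    }
}

Running it prints completion times 55, 25 and 40 and waiting times 25, 15 and 25 for A, B and C, matching the Gantt chart above.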
Waiting time = last waiting value - (time quantum × (n - 1))
Here n denotes the number of times the process appears in the Gantt chart.