PowerShell Start-Job and Start-ThreadJob performance

I need to run a function more than 200 times simultaneously with PowerShell. So far I have two options: Start-Job and Start-ThreadJob. In both cases I use a "launcher" like this:
$MyFunction = [scriptblock]::Create(@"
Function FunctionName {$function:FunctionName}
"@)
foreach ($i in $loop) {
    Start-ThreadJob -Name $JobName -ThrottleLimit 30 -InitializationScript $MyFunction -ScriptBlock {FunctionName $Using:var1 $Using:var2}
    #Start-Job -Name $JobName -InitializationScript $MyFunction -ScriptBlock {FunctionName $Using:var1 $Using:var2}
}
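To collect the results afterwards, a minimal sketch using the standard job cmdlets (assuming $JobName is the same name used in the loop above):
# Wait for every job with that name to finish, then read its output.
Get-Job -Name $JobName | Wait-Job | Receive-Job
# Remove the finished jobs so they don't pile up in the session.
Get-Job -Name $JobName | Remove-Job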
With Start-Job, it takes about 30 seconds for a job to start. I noticed two things:
This time depends on the function (here FunctionName): the more variables it has, the longer it takes. My function has about 15 variables. I create a PSCustomObject, and some data is stored in a MySQL DB every 2 seconds.
If I comment out all my code, the time remains the same! But if I remove all the code, the job is created instantly.
With Start-Job, it then takes about 30 * 200 = 6000 seconds in total, which is really too long.
Now with Start-ThreadJob, every job starts instantly. There is a "but" (otherwise I would not post here): the data is written to MySQL only every 10 seconds, which is far too slow. I don't believe this is a MySQL issue but a performance issue with PowerShell.
Do you have other options to propose, or do you know how to improve performance?
Thank you for your time.
Yann

Related

Continuously check if a process is running

I have a little code I wrote that checks to see if Outlook is running, and if it is not, opens Outlook. The problem is that my work PC tends to idle around 7% usage, but spikes up to the upper 30s while the script is running. If it detects that Outlook is no longer active, CPU usage can spike up to nearly 100% while opening Outlook. This ~33% increase while the script is running could cause problems when I am working. Is there another way to accomplish the functionality of my code while using less processing power?
do {
    $running = Get-Process outlook -ErrorAction SilentlyContinue
    if (!$running) {
        Start-Process outlook
    }
} while (1 -eq 1)
You need to add a Start-Sleep in there, to keep the script from continuously using CPU time. Otherwise it's continuously looping without rest, making sure Outlook is running. At the end of your do-block:
Start-Sleep -s 60
You can adjust the number of seconds, or even specify milliseconds instead with the -m parameter if you require it.
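Put together, the loop could look like this (same logic as above, only with the sleep added):
do {
    $running = Get-Process outlook -ErrorAction SilentlyContinue
    if (!$running) {
        Start-Process outlook
    }
    # Rest for a minute so the loop isn't burning CPU the whole time.
    Start-Sleep -s 60
} while (1 -eq 1)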
Another way of solving this problem is running the batch file below on a schedule:
@echo off
SET outlookpath=C:\Program Files\Microsoft Office 15\root\office15\outlook.exe
for /f "usebackq" %%f in (`tasklist /FI "IMAGENAME eq outlook.exe"`) do set a=%%f
REM echo A:%a%
if not "%a%"=="outlook.exe" start "" "%outlookpath%"
If you schedule this to run every 5 minutes, then within 5 minutes after closing Outlook, it will start again. If you think 5 minutes is too long, schedule it more often. 😉
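As a hedged example, such a schedule could be created from the command line with schtasks (the task name and script path are hypothetical):
REM Run the batch file above every 5 minutes.
schtasks /create /sc minute /mo 5 /tn "KeepOutlookRunning" /tr "C:\scripts\check-outlook.bat"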

How do I execute part of a shell script every 5 minutes and another part every 1 hour?

I have a shell script that collects some data and sends it to a destination. Part of the data should be copied every 5 minutes and the other part every 20 minutes. How can this be achieved in a single script? As of now I'm collecting the data every 5 minutes by scheduling with cron.
Best practice would be to use two separate files with two different cron entries. If you need to reuse part of your code, consider using functions.
If you still want to do it in only one file, you should run it every 5 minutes and on each run check whether or not you should also execute the other part (every 20 min).
modulo=$(( $(date +%_M) % 20 ))
# do whatever has to be done every 5 min
[...]
# check the modulo of the current minute / 20
if [ $modulo -eq 0 ]; then
    echo "Current minute is $(date +%_M), must execute part 2"
    # whatever has to be done every 20 min
else
    : # do nothing
fi
The reason the variable modulo is defined on the first line is that whatever has to be done every 5 min can potentially take longer than 1 min to execute, so by the time it is done the minute may no longer be 20 but 21. Defining the variable beforehand is a safeguard against this.
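For completeness, the matching crontab entry would look something like this (the script path is hypothetical):
# Run the collector script every 5 minutes; the script itself decides about the 20-minute part.
*/5 * * * * /path/to/collect.sh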

Run bash script between time variables

I need to run a task at a dynamic time presented in a variable (whose value is in HH.mm.ss format), plus 2 minutes from its value and less than 5 minutes. Then I could add this job to crontab to schedule it every minute, and I hope the script will run when the time variable matches the current time + 2 (or a bit more) minutes (but no more than 5 minutes).
Thank you.
Update:
Thanks to l0b0, all that is left is to find a way to subtract 2 minutes from the HH.mm.ss variable, to get for example 05:28:00 after subtracting from 05:30:00. I think it must be fairly simple. Thanks for the help.
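One possible way, assuming GNU date is available (the value is the example from the update):
var='05:30:00'
# GNU date can parse a time of day combined with a relative offset.
start=$(date -d "$var 2 minutes ago" +%H:%M:%S)
echo "$start"   # prints 05:28:00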
at should do the trick. Based on man at and an offset variable $offset you should be able to use this (untested):
echo 'some_command with arguments' | at "now + ${offset}"
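For instance, with a two-minute offset (at(1) accepts units such as minutes and hours):
offset='2 minutes'
echo 'some_command with arguments' | at "now + ${offset}"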

Excel Power Query - Sleep or Wait Command to Wait on API Rate Limit

In Excel Power Query (PQ) 2016, is there a function that can insert a "SLEEP 15 seconds" before proceeding? Not a pause, but a sleep function.
Problem:
I wrote a function in PQ to query: https://westus.api.cognitive.microsoft.com/text/analytics/v2.0. That function works fine, as designed.
I have a worksheet with 10K tweets that I want to pass to that function. When I do, it gets to ~60 or so completed and then I get an ERROR line in PQ. A look at Fiddler says this:
message=Rate limit is exceeded. Try again in 11 seconds. statusCode=429
I think if I insert a SLEEP 5 second (equivalent) command into the PQ function, it won't do this.
Help & thanks.
You want Function.InvokeAfter
Function.InvokeAfter(function as function, delay as duration) as any
Here's an example:
= Function.InvokeAfter( () => 2 + 2, #duration(0,0,0,5))
Returns 4 after waiting 5 seconds.
To answer a question you didn't ask yet, if you're going to execute the exact same Web.Contents call a second time, you may need to use the
[IsRetry = true]
option of Web.Contents to indicate you actually want to run the web request again.
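Combining both pieces, a hedged sketch of a delayed retry in M (the URL is the one from the question; the 15-second delay matches the rate-limit message):
// Wait 15 seconds, then re-issue the same request, flagged as a retry.
= Function.InvokeAfter(
    () => Web.Contents(
        "https://westus.api.cognitive.microsoft.com/text/analytics/v2.0",
        [IsRetry = true]),
    #duration(0, 0, 0, 15))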

Neo4j 2.0.0 - Poor performance for dev/test in a virtual machine

I have Neo4j server running inside a virtual machine using Ubuntu 13.10 and I am accessing via REST using Cypher queries. The virtual machine has 4 GB of memory allocated to it.
I've changed the open file count to 40000, set the initial JVM heap to 1G and my neo4j.properties file is as follows:
neostore.nodestore.db.mapped_memory=250M
neostore.relationshipstore.db.mapped_memory=100M
neostore.propertystore.db.mapped_memory=100M
neostore.propertystore.db.strings.mapped_memory=100M
neostore.propertystore.db.arrays.mapped_memory=100M
keep_logical_logs=3 days
node_auto_indexing=true
node_keys_indexable=id
I've also updated sysctl based on the Neo4j Linux tuning guide:
vm.dirty_background_ratio = 50
vm.dirty_ratio = 80
Since I am testing queries, the basic routine is to run my suite of tests and then delete all of the nodes and run them all again. At the start of each test run, the database has 0 nodes in it. My suite of tests of about 100 queries is taking 22 seconds to run. Basic parameterized creates such as:
CREATE (x:user { email: {param0},
name: {param1},
displayname: {param2},
id: {param3},
href: {param4},
object: {param5} })
CREATE x-[:LOGIN]->(:login { password: {param6},
salt: {param7} } )
are currently taking over 170ms to execute (and that's the average, first query time is 700ms). During a test run, the CPU in the VM never exceeds 50% and memory usage is at a steady 1.4Gb.
Why would creating a single node in an empty database take 170ms? At this point unit testing is becoming almost impossible since it is so slow. This is my first time trying to tune Neo4j so I'm not really sure how to figure out where the problem is or what changes should be made.
Additional Details
I'm using Go 1.2 to make REST calls to the cypher endpoint (http://localhost:7474/db/data/cypher) of a locally installed Neo4j instance. I'm setting the request headers for content-type to "application/json", accept to "application/json" and "X-Stream" to true. I always return either an array of maps or nothing depending on the query.
It seems like the creates are the problem and are taking forever. For example:
2014/01/15 11:35:51 NewUser took 123.314938ms
2014/01/15 11:35:51 NewUser took 156.101784ms
2014/01/15 11:35:52 NewUser took 167.439442ms
2014/01/15 11:35:52 ValidatePassword took 4.287416ms
NewUser creates two new nodes and one relationship and is taking 167ms, while ValidatePassword is a read-only operation and it completes in 4ms. Also note that the three calls to NewUser are identical parameterized queries. While the creates are the big problem, I'm also a little concerned that Neo4j is taking 4ms to just find a labeled node when there are only 100 nodes in the database.
I do not restart the server in between test runs or delete the database. I issue a single delete all nodes query MATCH (n) OPTIONAL MATCH (n)-[r]-() DELETE n,r at the end of the test run. Running the same test suite multiple times back to back does not improve the query times.
Are your 100 queries all the same only with different parameters, or actually 100 different queries?
What you see is actually setup work. The parser has to load the parsing rules initially, which takes a few ms. Also, new queries that haven't been seen before are compiled, planned and put into the query cache.
So the first query always takes a bit longer. But as you parametrize them, all subsequent ones should be fast.
Can you confirm that?
I think you see the transactional overhead of flushing the transaction to disk.
Did you try to batch more requests into one, e.g. with the transactional endpoint? Or /db/data/batch (but I'd rather use the new tx endpoint /db/data/transaction).
Did you create an index for your lookup property for your validate query?
Can you do me a favor and test your create query without a label? I found some perf issues when testing that myself earlier this week.
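For the batching suggestion, a hedged sketch of a transactional-endpoint call (Neo4j 2.0 REST format; the statements and parameter values are placeholders), with the index from the earlier point created via CREATE INDEX ON :user(id):
curl -i -H content-type:application/json -H accept:application/json \
  -d '{"statements":[
    {"statement":"CREATE (x:user {email: {p0}})","parameters":{"p0":"a@example.com"}},
    {"statement":"CREATE (x:user {email: {p1}})","parameters":{"p1":"b@example.com"}}
  ]}' \
  http://localhost:7474/db/data/transaction/commit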
Just ran a test with curl
for i in `seq 1 10`; do time curl -i -H content-type:application/json -H accept:application/json -H X-Stream:true -d @perf_test.json http://localhost:7474/db/data/cypher; done
I'm getting between 16 and 30 ms per request externally, including starting curl:
HTTP/1.1 200 OK
Content-Type: application/json; charset=UTF-8; stream=true
Access-Control-Allow-Origin: *
Transfer-Encoding: chunked
Server: Jetty(9.0.5.v20130815)
{"columns":[],"data":[]}
real 0m0.016s
user 0m0.005s
sys 0m0.005s
Perhaps it is rather the VM (disk or network) or the cross-VM communication?
Did another test with ab and 1000 requests for both endpoints, got a mean of about 5 ms both times.
https://gist.github.com/jexp/8452037
