Let go tool pprof collect new data periodically - go

I'm using go pprof like this:
go tool pprof -no_browser -http=0.0.0.0:8081 http://localhost:6060/debug/pprof/profile?seconds=60
How can I ask pprof to fetch the profiling data periodically?

Here's a python script that uses wget to grab the data every hour, putting the output into a file whose name includes the timestamp.
Each file can be inspected by running
go tool pprof pprof_data_YYYY-MM-DD_HH
Here's the script:
import subprocess
import time
from datetime import datetime

while True:
    now = datetime.now()
    sleepTime = 3601 - (60 * now.minute + now.second + 1e-6 * now.microsecond)
    time.sleep(sleepTime)
    now = datetime.now()
    tag = f"{now.year}-{now.month:02d}-{now.day:02d}_{now.hour:02d}"
    subprocess.run(["wget", "-O", f"pprof_data_{tag}", "-nv", "-o", "/dev/null", "http://localhost:6060/debug/pprof/profile?seconds=60"])
The 3601 causes wget to run about 1 second after the top of the hour, to avoid the race condition where time.sleep returns just before the top of the hour.
You could obviously write a similar script in bash or your favorite language.

Related

pexpect timed out before script ends

I am using pexpect to connect to a remote server using ssh.
The following code works but I have to use time.sleep to make a delay.
Especially when I am sending a command to run a script on the remote server.
The script takes up to a minute to run, and if I don't use a 60-second delay, the script ends prematurely.
The same issue occurs when I use sftp to download a file: if the file is large, it downloads only partially.
Is there a way to control this without using a delay?
#!/usr/bin/python3
import pexpect
import time
from subprocess import call
siteip = "131.235.111.111"
ssh_new_conn = 'Are you sure you want to continue connecting'
password = 'xxxxx'
child = pexpect.spawn('ssh admin#' + siteip)
time.sleep(1)
child.expect('admin#.* password:')
child.sendline('xxxxx')
time.sleep(2)
child.expect('admin#.*')
print('ssh to abcd - takes 60 seconds')
child.sendline('backuplog\r')
time.sleep(50)
child.sendline('pwd')
Many pexpect functions take an optional timeout= keyword, and the one you give in spawn() sets the default, e.g.
child.expect('admin#',timeout=70)
You can use the value None to never time out.
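Applying that to the script in the question, here is a minimal sketch that waits for the remote prompt instead of sleeping (the admin@ login, the prompt patterns, and the timeout values are assumptions based on the snippet above):
#!/usr/bin/python3
import pexpect

siteip = "131.235.111.111"
password = 'xxxxx'

# timeout=70 becomes the default for every expect() on this connection
child = pexpect.spawn('ssh admin@' + siteip, timeout=70)
child.expect('password:')
child.sendline(password)
child.expect('admin#')

child.sendline('backuplog')
# Wait for the prompt to return, however long the remote script takes
# (up to 120 s here), instead of sleeping a fixed 50-60 seconds.
child.expect('admin#', timeout=120)
child.sendline('pwd')
child.expect('admin#')
The same idea applies to the sftp transfer: expect the prompt (or pexpect.EOF) after the download finishes rather than sleeping for a fixed time.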

running subprocesses in parallel with Python

I am trying to understand how I can build a parallel computing pipeline for multiple subprocesses.
As I see it, each subprocess call waits for the previous one to finish, whereas my steps have no dependency on the previous run and can be handled in parallel. I want to understand whether this is possible, and if so, a sample syntax showing how to do that would be a great help! Thanks in advance.
import sys
import os
import subprocess
subprocess.run("python pipelinecode1.py".split() +
[run_date, this_wk, last_wk, prev_wk], shell=True)
subprocess.run("python pipelinecode2.py".split() +
[run_date, this_wk, last_wk, prev_wk], shell=True)
subprocess.run("python pipelinecode3.py".split() +
[run_date, this_wk, last_wk, prev_wk], shell=True)
The MCVE as-is shows zero dependency on the Python interpreter, so the most efficient step for running a set of mutually independent tasks (not a pipeline, where the one-step-after-another order of processing steps "forms" the "pipeline") is GNU parallel:
$ parallel python {} run_date this_wk last_wk prev_wk ::: pipelinecode1.py \
pipelinecode2.py \
pipelinecode3.py
This way you do not waste CPU/cache resources, and you avoid re-serialising the execution of the independent steps, without any add-on overhead costs.
For all the available options, read the respective details in man parallel.
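If you would rather stay in Python than call GNU parallel, a minimal sketch of the same idea with subprocess.Popen (the script names and arguments are taken from the question; the placeholder values are assumptions) looks like this:
import subprocess

# Placeholder values; in the question these variables are defined elsewhere.
run_date, this_wk, last_wk, prev_wk = "2021-01-01", "wk10", "wk09", "wk08"

scripts = ["pipelinecode1.py", "pipelinecode2.py", "pipelinecode3.py"]

# Start all three processes without waiting for one to finish before the next.
procs = [
    subprocess.Popen(["python", script, run_date, this_wk, last_wk, prev_wk])
    for script in scripts
]

# Block only until every process has exited.
for proc in procs:
    proc.wait()
Unlike subprocess.run, Popen returns immediately, so the three scripts run concurrently and the final loop waits for all of them to complete.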

How to run a query and return the results in csv format

I am trying to convert a query that I run with the BigQuery command-line tool so that it is done using Go instead, but I am not finding how I should configure the query correctly. The command I have sets the format to CSV, the maximum number of rows to output, the maximum bytes billed, and the project ID; it then runs a standard SQL query and writes the output to a CSV file, removing the headers and blank lines at the top.
Below is the command I have working correctly using the command-line tool:
bq query --format csv \
--max_rows <max_row_int> --maximum_bytes_billed <max_bytes_billed_int> \
--project_id <project_id> "#standardSQL
<standard_sql_statment>
" \
| tail -n +3 >results.csv
I can see from the docs that to run a query in Go I need to create the client/connection, and then I should be able to run the query as below:
ctx := context.Background()
client, err := bigquery.NewClient(ctx, "<project_id_string>")
if err != nil {
    return nil, err
}
q := client.Query(<standard_sql_query>)
How can I set the configuration flags I use above with the command-line tool in my Go code?
You can use QueryConfig from the bigquery package [1] [2], which holds the configuration for a query job. There, for example, you will find MaxBytesBilled, which is the analogue of the --maximum_bytes_billed flag.
Saving the output result is not a specific feature of the BigQuery package. You can use the encoding/csv package [3] for this purpose.
[1] - https://godoc.org/cloud.google.com/go/bigquery
[2] - https://github.com/googleapis/google-cloud-go/blob/master/bigquery/query.go#L26
[3] - https://golang.org/pkg/encoding/csv/
Here is the reference documentation for the client libraries: https://cloud.google.com/bigquery/docs/quickstarts/quickstart-client-libraries
Then you can run your program:
GOOGLE_CLOUD_PROJECT=project-id go run script.go

how to pass specific arguments in batch commands during jenkins builds

I'm trying to use Jenkins to automate performance testing with JMeter;
each build is a single JMeter test, and I want to increase the number of users (threads) for each Jenkins build if the previous one was successful.
I have configured most of the build: with the SSH plugin I can restart Tomcat and copy catalina.out, and with performance publishing I can open the .jtl file and determine whether the build was successful.
What I want is to execute a different batch command for the next build (to increase the number of users (threads) and the user IDs).
For example:
jmeter -Jthreads=10 -n -t C:\TestScripts\script.jmx -l C:\TestScripts\Jenkins.jtl
jmeter -Jthreads=20 -n -t C:\TestScripts\script.jmx -l C:\TestScripts\Jenkins.jtl
jmeter -Jthreads=30 -n -t C:\TestScripts\script.jmx -l C:\TestScripts\Jenkins.jtl...
Is there some good JMeter plugin or counter that I can use to increase a variable by 10 each time:
jmeter -Jthreads=%variable1%...
I have tried setting an environment variable and then incrementing it with:
"SET /A thread+=10"
but it doesn't change the variable, because Jenkins opens its own CMD in a new process:
("cmd /c call C:\WINDOWS\TEMP\jenkins556482303577128680.bat")
Use the following SET command to increase the threads variable by 10:
SET /A threads=threads+10
Or inside double quotes:
SET /A "threads+=10"
Not knowing your Jenkins configuration, which plugins you have installed, and how you run the test, it is quite hard to come up with the best solution.
The only "universal" workaround I can think of is writing the current number of threads into a file in Jenkins workspace and reading the value of threads from the file on next execution.
Add setUp Thread Group to your Test Plan
Add JSR223 Sampler to your Thread Group
Put the following Groovy code into "Script" area:
import org.apache.jmeter.threads.ThreadGroup
import org.apache.jorphan.collections.SearchByClass
import org.apache.commons.io.FileUtils

SampleResult.setIgnore()

def file = new File(System.getenv('WORKSPACE') + System.getProperty('file.separator') + 'threads.number')
if (file.exists()) {
    def newThreadNum = (FileUtils.readFileToString(file, 'UTF-8') as int) + 10
    FileUtils.writeStringToFile(file, newThreadNum as String)
    def engine = ctx.getEngine()
    def test = org.apache.commons.lang.reflect.FieldUtils.getField(engine.getClass(), 'test', true)
    def testPlanTree = test.get(engine)
    SearchByClass<ThreadGroup> threadGroupSearch = new SearchByClass<>(ThreadGroup.class)
    testPlanTree.traverse(threadGroupSearch)
    def threadGroups = threadGroupSearch.getSearchResults()
    threadGroups.each {
        it.setNumThreads(newThreadNum)
    }
} else {
    FileUtils.writeStringToFile(file, props.get('threads'))
}
The code writes the current number of threads into a file called threads.number in the Jenkins workspace; on subsequent runs it reads the value from the file, adds 10, writes it back, and applies the new value to all Thread Groups.
For now I am creating 20 .jmx files (1.jmx, 2.jmx, 3.jmx ...), each with a different number of users, and calling them with this command:
jmeter -n -t C:\TestScripts\%BUILD_NUMBER%.jmx -l C:\TestScripts\%BUILD_NUMBER%.jtl
The first build will call 1.jmx, the second 2.jmx, and so on.
It isn't the best method, but it works for now. I will try your advice over the weekend when I have more time.
I have found a solution that works for me; it isn't pretty. I created a Python script which changes the .csv file from which JMeter reads the number of threads and the starting user ID. The script increments the starting user ID by the number of threads of the previous build, and the number of threads by 10.
# eggs.csv holds two comma-separated values: the thread count (a) and the starting user id (b)
file = open('C:\\Users\\mp\\AppData\\Local\\Programs\\Python\\Python37-32\\eggs.csv', 'r')
a, b = file.readlines()[0].split(",")
file.close()
print(a, b)
a = int(a)
b = int(b)
# new starting user id = old id + previous thread count; new thread count = old count + 10
b = a + b
a = a + 10
print(a, b)
f = open("C:\\Users\\mp\\AppData\\Local\\Programs\\Python\\Python37-32\\eggs2.csv", "a")
f.write(str(a) + "," + str(b))
f.close()
I have Python on my PC, and I am calling the script in Jenkins as a Windows batch command:
C:\Users\mp\AppData\Local\Programs\Python\Python37-32\python.exe C:\Users\mp\AppData\Local\Programs\Python\Python37-32\rename_write_file.py
I am much better in Python than Java, so I implemented this in Python.
So for each new test, the CSV file from which JMeter reads its values is changed.

SpaCy model won't load in AWS Lambda

Has anyone gotten SpaCy 2.0 to work in AWS Lambda? I have everything zipped and packaged correctly, since I can get a generic string to return from my lambda function if I test it. But when I do the simple function below to test, it stalls for about 10 seconds and then returns empty, and I don't get any error messages. I did set my Lambda timeout at 60 seconds so that isn't the problem.
import spacy
nlp = spacy.load('en_core_web_sm') #model package included

def lambda_handler(event, context):
    doc = nlp(u'They are')
    msg = doc[0].lemma_
    return msg
When I load the model package without using it, it also returns empty, but if I comment it out it sends me the string as expected, so it has to be something about loading the model.
import spacy
nlp = spacy.load('en_core_web_sm') #model package included

def lambda_handler(event, context):
    msg = 'message returned'
    return msg
To optimize model loading you have to store the model on S3, download it with your own script to the /tmp folder in Lambda, and then load it into spaCy from there.
It takes about 5 seconds to download it from S3 and run. The good optimization here is to keep the model on a warm container and check whether it has already been downloaded; on a warm container the code takes about 0.8 seconds to run.
Here is the link to the code and package with example:
https://github.com/ryfeus/lambda-packs/blob/master/Spacy/source2.7/index.py
import spacy
import boto3
import os

def download_dir(client, resource, dist, local='/tmp', bucket='s3bucket'):
    # Recursively download every object under the given prefix into the local directory.
    paginator = client.get_paginator('list_objects')
    for result in paginator.paginate(Bucket=bucket, Delimiter='/', Prefix=dist):
        if result.get('CommonPrefixes') is not None:
            for subdir in result.get('CommonPrefixes'):
                download_dir(client, resource, subdir.get('Prefix'), local, bucket)
        if result.get('Contents') is not None:
            for file in result.get('Contents'):
                if not os.path.exists(os.path.dirname(local + os.sep + file.get('Key'))):
                    os.makedirs(os.path.dirname(local + os.sep + file.get('Key')))
                resource.meta.client.download_file(bucket, file.get('Key'), local + os.sep + file.get('Key'))

def handler(event, context):
    client = boto3.client('s3')
    resource = boto3.resource('s3')
    # Only download the model if this (warm) container does not already have it.
    if os.path.isdir("/tmp/en_core_web_sm") == False:
        download_dir(client, resource, 'en_core_web_sm', '/tmp', 'ryfeus-spacy')
    spacy.util.set_data_path('/tmp')
    nlp = spacy.load('/tmp/en_core_web_sm/en_core_web_sm-2.0.0')
    doc = nlp(u'Apple is looking at buying U.K. startup for $1 billion')
    for token in doc:
        print(token.text, token.pos_, token.dep_)
    return 'finished'
P.S. To package spacy within AWS Lambda you have to strip shared libraries.
Knew it was probably going to be something simple. The answer is that there wasn't enough allocated memory to run the Lambda function: I found that I had to increase it to 2816 MB, near the maximum, to get the example above to work. It is notable that before last month it wasn't possible to go this high:
https://aws.amazon.com/about-aws/whats-new/2017/11/aws-lambda-doubles-maximum-memory-capacity-for-lambda-functions/
I turned it up to the max of 3008 MB to handle more text and everything seems to work just fine now.
What worked for me was cd-ing into <YOUR_ENV>/lib/Python<VERSION>/site-packages/ and removing the language models I didn't need. For example, I only needed the English language model, so once in my own site-packages directory I just needed to run `ls -d */ | grep -v en | xargs rm -rf`, and then zip up the contents to get it under Lambda's limits.
