NewRelic doesn't graph any data in a Python + Gevent based system

I have a Gevent-based backend in which a set of external APIs is called in parallel. NewRelic doesn't report the time consumed by these APIs in its UI.
I tried to push these timings myself using custom metrics. Here is the code snippet:
import time
import newrelic.agent
t1 = time.time()
instance.getAval()  # the external API call being timed
newrelic.agent.record_custom_metric("Custom/BUS_EXT_CALL", time.time() - t1)
This didn't work either. I looked into the Python agent code and made the following changes:
def record_custom_metric(name, value, application=None, app_name=None):
    if app_name is None and application is None:
        transaction = current_transaction()
        if transaction:
            transaction.record_custom_metric(name, value)
    elif app_name is not None:
        application = newrelic.api.application.application_instance(app_name)
        application.record_custom_metric(name, value)
    else:
        if application.enabled:
            application.record_custom_metric(name, value)
With this change, the call newrelic.agent.record_custom_metric("Custom/BUS_EXT_CALL", time.time() - t1, app_name="ControllerTest") succeeded, but the data still doesn't appear in NewRelic's dashboard.
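One possible explanation, offered as an assumption rather than a confirmed diagnosis: when no application is passed, the function shown above records the metric only if current_transaction() returns a transaction, and a greenlet spawned outside a monitored request may not have one. The sketch below is a small timing helper that takes the recording callable as a parameter, so the destination of the metric is explicit. The names timed_metric, collect and samples are hypothetical, and the stand-in recorder only appends to a list; in a real service the recorder could wrap newrelic.agent.record_custom_metric called with an application object.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed_metric(name, recorder):
    """Time the enclosed block and pass (name, seconds) to `recorder`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        # Record even if the block raises, so failed calls are timed too.
        recorder(name, time.perf_counter() - start)


# Stand-in recorder for illustration only; it just collects the samples.
samples = []


def collect(name, value):
    samples.append((name, value))


with timed_metric("Custom/BUS_EXT_CALL", collect):
    time.sleep(0.01)  # placeholder for instance.getAval()

print(samples)
```

Keeping the recorder separate from the timing logic also makes it easy to check that a value is produced at all before looking at why it does not show up in the dashboard.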

Related

Iterate through asyncio loop

I am very new to aiohttp and asyncio, so apologies for my ignorance up front. I am having difficulty with the event loop portion of the documentation, and I don't think my code below is executing asynchronously. I am trying to take all combinations of two lists via itertools and POST each one as XML. A more complete version uses the requests module, but that is not ideal because I may need to POST 1000+ requests at a time. Here is a sample of how it looks now:
import aiohttp
import asyncio
import itertools
skillid = ['7715','7735','7736','7737','7738','7739','7740','7741','7742','7743','7744','7745','7746','7747','7748' ,'7749','7750','7751','7752','7753','7754','7755','7756','7757','7758','7759','7760','7761','7762','7763','7764','7765','7766','7767','7768','7769','7770','7771','7772','7773','7774','7775','7776','7777','7778','7779','7780','7781','7782','7783','7784']
agent= ['5124','5315','5331','5764','6049','6076','6192','6323','6669','7690','7716']
url = 'https://url'
user = 'user'
password = 'pass'
headers = {
'Content-Type': 'application/xml'
}
async def main():
    async with aiohttp.ClientSession() as session:
        for x in itertools.product(agent,skillid):
            payload = "<operation><operationType>update</operationType><refURLs><refURL>/unifiedconfig/config/agent/" + x[0] + "</refURL></refURLs><changeSet><agent><skillGroupsRemoved><skillGroup><refURL>/unifiedconfig/config/skillgroup/" + x[1] + "</refURL></skillGroup></skillGroupsRemoved></agent></changeSet></operation>"
            async with session.post(url,auth=aiohttp.BasicAuth(user, password), data=payload,headers=headers) as resp:
                print(resp.status)
                print(await resp.text())
loop = asyncio.get_event_loop()
loop.run_until_complete(main())
I see that coroutines can be used, but I'm not sure that applies since there is only a single task to execute. Any clarification is appreciated.
Because you make a request and then immediately await it, you only ever have one request in flight at a time. To run the requests concurrently, you need to separate making the request from waiting for the response, and use something like asyncio.gather to wait for the requests in bulk.
In the following example, I've modified your code to connect to a local httpbin instance for testing; I'm making requests to the /delay/<value> endpoint so that each request takes a random amount of time to complete.
The theory of operation here is:
- Move the request code into the asynchronous one_request function, which we use to build a list of tasks.
- Use asyncio.gather to run all the tasks at once.
- The one_request function returns an (agent, skillid, response) tuple, so that when we iterate over the responses we can tell which combination of parameters produced each response.
import aiohttp
import asyncio
import itertools
import random
skillid = [
"7715", "7735", "7736", "7737", "7738", "7739", "7740", "7741", "7742",
"7743", "7744", "7745", "7746", "7747", "7748", "7749", "7750", "7751",
"7752", "7753", "7754", "7755", "7756", "7757", "7758", "7759", "7760",
"7761", "7762", "7763", "7764", "7765", "7766", "7767", "7768", "7769",
"7770", "7771", "7772", "7773", "7774", "7775", "7776", "7777", "7778",
"7779", "7780", "7781", "7782", "7783", "7784",
]
agent = [
"5124", "5315", "5331", "5764", "6049", "6076", "6192", "6323", "6669",
"7690", "7716",
]
user = 'user'
password = 'pass'
headers = {
'Content-Type': 'application/xml'
}
async def one_request(session, agent, skillid):
    # I'm setting `url` here because I want a random parameter for
    # each request. You would probably just set this once globally.
    delay = random.randint(0, 10)
    url = f'http://localhost:8787/delay/{delay}'
    payload = (
        "<operation>"
        "<operationType>update</operationType>"
        "<refURLs>"
        f"<refURL>/unifiedconfig/config/agent/{agent}</refURL>"
        "</refURLs>"
        "<changeSet>"
        "<agent>"
        "<skillGroupsRemoved><skillGroup>"
        f"<refURL>/unifiedconfig/config/skillgroup/{skillid}</refURL>"
        "</skillGroup></skillGroupsRemoved>"
        "</agent>"
        "</changeSet>"
        "</operation>"
    )
    # This shows when the task actually executes.
    print('req', agent, skillid)
    async with session.post(
            url, auth=aiohttp.BasicAuth(user, password),
            data=payload, headers=headers) as resp:
        return (agent, skillid, await resp.text())
async def main():
    tasks = []
    async with aiohttp.ClientSession() as session:
        # Add tasks to the `tasks` array
        for x in itertools.product(agent, skillid):
            task = asyncio.ensure_future(one_request(session, x[0], x[1]))
            tasks.append(task)
        print(f'making {len(tasks)} requests')
        # Run all the tasks and wait for them to complete. Return
        # values will end up in the `responses` list.
        responses = await asyncio.gather(*tasks)
    # Just print everything out.
    print(responses)
loop = asyncio.get_event_loop()
loop.run_until_complete(main())
The above code makes 561 requests (11 agents × 51 skill IDs) and runs in about 30 seconds with the random delay I've introduced.
This code runs all the requests at once. If you wanted to limit the maximum number of concurrent requests, you could introduce a Semaphore to make one_request block when there are too many active requests.
If you wanted to process responses as they arrive, rather than waiting for everything to complete, you could investigate the asyncio.wait function instead.
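The two suggestions above can be sketched together. This is a minimal illustration under stated assumptions, not the code from the answer: the HTTP call is replaced by asyncio.sleep so the example runs without a server, MAX_CONCURRENT and the stats dictionary are hypothetical additions used only to show the limit taking effect, and asyncio.as_completed is used here in place of asyncio.wait to handle each response as soon as it is ready.

```python
import asyncio
import random

MAX_CONCURRENT = 3  # assumed limit, chosen only for illustration


async def one_request(sem, agent, skillid, stats):
    # The semaphore makes this coroutine wait while MAX_CONCURRENT
    # other requests are already in flight.
    async with sem:
        stats["active"] += 1
        stats["peak"] = max(stats["peak"], stats["active"])
        # Placeholder for the real `session.post(...)` call.
        await asyncio.sleep(random.uniform(0.01, 0.05))
        stats["active"] -= 1
    return (agent, skillid, "ok")


async def main(agents, skillids):
    sem = asyncio.Semaphore(MAX_CONCURRENT)
    stats = {"active": 0, "peak": 0}
    tasks = [
        asyncio.ensure_future(one_request(sem, a, s, stats))
        for a in agents
        for s in skillids
    ]
    results = []
    # as_completed yields awaitables in the order they finish, so each
    # response can be handled as soon as it arrives.
    for finished in asyncio.as_completed(tasks):
        results.append(await finished)
    return results, stats["peak"]


results, peak = asyncio.run(main(["5124", "5315"], ["7715", "7735", "7736"]))
print(len(results), "responses; peak concurrency:", peak)
```

All tasks are still created up front; the semaphore only limits how many of them are inside the request section at the same time.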

Asyncio with multiprocessing : Producers-Consumers model

I am trying to retrieve stock prices and process them as they come in. I am a beginner with concurrency, but I thought this setup seemed suited to an asyncio producers-consumers model in which each producer retrieves a stock price and passes it to the consumers via a queue. The consumers have to do the stock price processing in parallel (multiprocessing), since the work is CPU intensive. Therefore I would have multiple consumers already working while not all of the producers have finished retrieving data. In addition, I would like to implement a step in which, if a consumer finds that the stock price it is working on is invalid, we spawn a new consumer job for that stock.
So far, I have the following toy code that sort of gets me there, but it has issues with my process_data function (the consumer).
from concurrent.futures import ProcessPoolExecutor
import asyncio
import random
import time
random.seed(444)
#producers
async def retrieve_data(ticker, q):
    '''
    Pretend we're using aiohttp to retrieve stock prices from a URL
    Place a tuple of stock ticker and price into asyncio queue as it becomes available
    '''
    start = time.perf_counter() # start timer
    await asyncio.sleep(random.randint(4, 8)) # pretend we're calling some URL
    price = random.randint(1, 100) # pretend this is the price we retrieved
    print(f'{ticker} : {price} retrieved in {time.perf_counter() - start:0.1f} seconds')
    await q.put((ticker, price)) # place the price into the asyncio queue
#consumers
async def process_data(q):
    while True:
        data = await q.get()
        print(f"processing: {data}")
        with ProcessPoolExecutor() as executor:
            loop = asyncio.get_running_loop()
            result = await loop.run_in_executor(executor, data_processor, data)
            #if output of data_processing failed, send ticker back to queue to retrieve data again
            if not result[2]:
                print(f'{result[0]} data invalid. Retrieving again...')
                await retrieve_data(result[0], q) # add a new task
                q.task_done() # end this task
            else:
                q.task_done() # so that q.join() knows when the task is done
async def main(tickers):
    q = asyncio.Queue()
    producers = [asyncio.create_task(retrieve_data(ticker, q)) for ticker in tickers]
    consumers = [asyncio.create_task(process_data(q))]
    await asyncio.gather(*producers)
    await q.join() # Implicitly awaits consumers, too. blocks until all items in the queue have been received and processed
    for c in consumers:
        c.cancel() #cancel the consumer tasks, which would otherwise hang up and wait endlessly for additional queue items to appear
'''
RUN IN JUPYTER NOTEBOOK
'''
start = time.perf_counter()
tickers = ['AAPL', 'AMZN', 'TSLA', 'C', 'F']
await main(tickers)
print(f'total elapsed time: {time.perf_counter() - start:0.2f}')
'''
RUN IN TERMINAL
'''
# if __name__ == "__main__":
# start = time.perf_counter()
# tickers = ['AAPL', 'AMZN', 'TSLA', 'C', 'F']
# asyncio.run(main(tickers))
# print(f'total elapsed time: {time.perf_counter() - start:0.2f}')
The data_processor() function below, called by process_data() above needs to be in a different cell in Jupyter notebook, or a separate module (from what I understand, to avoid a PicklingError)
import random
import time
from multiprocessing import current_process
def data_processor(data):
    ticker = data[0]
    price = data[1]
    print(f'Started {ticker} - {current_process().name}')
    start = time.perf_counter() # start time counter
    time.sleep(random.randint(4, 5)) # mimic some random processing time
    # pretend we're processing the price. Let the processing outcome be invalid if the price is an odd number
    if price % 2==0:
        is_valid = True
    else:
        is_valid = False
    print(f"{ticker}'s price {price} validity: --{is_valid}--"
          f' Elapsed time: {time.perf_counter() - start:0.2f} seconds')
    return (ticker, price, is_valid)
THE ISSUES
Instead of using Python's multiprocessing module, I used concurrent.futures' ProcessPoolExecutor, which I read is compatible with asyncio (What kind of problems (if any) would there be combining asyncio with multiprocessing?). But it seems that I have to choose between retrieving the output (result) of the function called by the executor and being able to run several subprocesses in parallel. With the construct below, the subprocesses run sequentially, not in parallel.
with ProcessPoolExecutor() as executor:
    loop = asyncio.get_running_loop()
    result = await loop.run_in_executor(executor, data_processor, data)
Removing result = await in front of loop.run_in_executor(executor, data_processor, data) allows several consumers to run in parallel, but then I can't collect their results in the parent process; I need the await for that. And then, of course, the rest of the code block fails.
How can I have these subprocesses run in parallel and still return their output? Perhaps this needs a different construct, or something other than the producers-consumers model.
The part of the code that requests invalid stock prices to be retrieved again works (provided I can get the result from above), but it runs in the subprocess that calls it and blocks new consumers from being created until the request is fulfilled. Is there a way to address this?
#if output of data_processing failed, send ticker back to queue to retrieve data again
if not result[2]:
    print(f'{result[0]} data invalid. Retrieving again...')
    await retrieve_data(result[0], q) # add a new task
    q.task_done() # end this task
else:
    q.task_done() # so that q.join() knows when the task is done
"But it seems that I have to choose between retrieving the output (result) of the function called by the executor and being able to run several subprocesses in parallel."
Luckily this is not the case: you can also use asyncio.gather() to wait for multiple items at once. However, you obtain data items one by one from the queue, so you don't have a batch of items to process. The simplest solution is to just start multiple consumers. Replace
# the single-element list looks suspicious anyway
consumers = [asyncio.create_task(process_data(q))]
with:
# now we have an actual list
consumers = [asyncio.create_task(process_data(q)) for _ in range(16)]
Each consumer will wait for an individual task to finish, but that's ok because you'll have a whole pool of them working in parallel, which is exactly what you wanted.
Also, you might want to make executor a global variable and not use with, so that the process pool is shared by all consumers and lasts as long as the program. That way consumers will reuse the worker processes already spawned instead of having to spawn a new process for each job received from the queue. (That's the whole point of having a process "pool".) In that case you probably want to add executor.shutdown() at the point in the program where you don't need the executor anymore.
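Both suggestions, several consumers and one long-lived pool, can be sketched as follows. This is a reduced illustration under assumptions rather than a drop-in replacement: run_pipeline and its results list are hypothetical names, the producers are replaced by directly queued items, the retry of invalid prices is left out, and the executor is passed in as an argument instead of being a global so that it can be swapped out.

```python
import asyncio
import concurrent.futures as cf


def data_processor(data):
    # Stand-in for the CPU-bound work: even prices count as valid.
    ticker, price = data
    return (ticker, price, price % 2 == 0)


async def process_data(q, executor, results):
    loop = asyncio.get_running_loop()
    while True:
        data = await q.get()
        # Each consumer awaits only its own job; the others keep running.
        result = await loop.run_in_executor(executor, data_processor, data)
        results.append(result)
        q.task_done()


async def run_pipeline(items, executor, n_consumers=4):
    q = asyncio.Queue()
    results = []
    consumers = [
        asyncio.create_task(process_data(q, executor, results))
        for _ in range(n_consumers)
    ]
    for item in items:
        await q.put(item)
    await q.join()  # wait until every queued item has been processed
    for c in consumers:
        c.cancel()
    # Let the cancelled consumers finish unwinding.
    await asyncio.gather(*consumers, return_exceptions=True)
    return results


def main():
    # One pool for the whole program, created once and shut down at the end.
    executor = cf.ProcessPoolExecutor()
    try:
        return asyncio.run(run_pipeline([("AAPL", 10), ("TSLA", 7)], executor))
    finally:
        executor.shutdown()
```

When run as a script, main() would be called under an if __name__ == "__main__": guard, which process pools need on platforms that start workers by spawning a fresh interpreter.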

Trying to access an object from a listener python web framework

I'm pretty new to async, so thank you in advance. This is a very simple question that I might be overthinking.
I am trying to access this Cassandra client outside of the listeners defined below, which get registered on a Sanic main app.
I need the session in order to run an update query that executes asynchronously. I can definitely connect and even query from the setup_cassandra_session_listener method below, but I am having a tough time figuring out how to expose this Cassandra session so that I can access it elsewhere.
import logging
from aiocassandra import aiosession
from cassandra.cluster import Cluster
from sanic import Sanic
from config import CLUSTER_HOST, TABLE_NAME, CASSANDRA_KEY_SPACE, CASSANDRA_PORT, DATA_CENTER, DEBUG_LEVEL, LOGGER_FORMAT
log = logging.getLogger('sanic')
log.setLevel('INFO')
cassandra_cluster = None
def setup_cassandra_session_listener(app, loop):
    global cassandra_cluster
    cassandra_cluster = Cluster([CLUSTER_HOST], CASSANDRA_PORT, DATA_CENTER)
    session = cassandra_cluster.connect(CASSANDRA_KEY_SPACE)
    metadata = cassandra_cluster.metadata
    app.session = cassandra_cluster.connect(CASSANDRA_KEY_SPACE)
    log.info('Connected to cluster: ' + metadata.cluster_name)
    aiosession(session)
    app.cassandra = session
def teardown_cassandra_session_listener(app, loop):
    global cassandra_cluster
    cassandra_cluster.shutdown()
def register_cassandra(app: Sanic):
    app.listener('before_server_start')(setup_cassandra_session_listener)
    app.listener('after_server_stop')(teardown_cassandra_session_listener)
Here is a working example that should do what you need. It does not actually run Cassandra (since I have no experience doing that). But, in principle this should work with any database connection you need to manage across the lifespan of your running server.
from sanic import Sanic
from sanic.response import text
app = Sanic()
class DummyCluster:
    def connect(self):
        print("Connecting")
        return "session"
    def shutdown(self):
        print("Shutting down")
def setup_cassandra_session_listener(app, loop):
    # No global variables needed
    app.cluster = DummyCluster()
    app.session = app.cluster.connect()
def teardown_cassandra_session_listener(app, loop):
    app.cluster.shutdown()
def register_cassandra(app: Sanic):
    # Changed these listeners to be more friendly if running with an ASGI server
    app.listener('after_server_start')(setup_cassandra_session_listener)
    app.listener('before_server_stop')(teardown_cassandra_session_listener)
@app.get("/")
async def get(request):
    # The session attached in the listener is reachable through request.app
    return text(request.app.session)
if __name__ == "__main__":
    register_cassandra(app)
    app.run(debug=True)
The idea is that you attach the session to your app instance (as you did) and can then simply access it inside your routes via request.app.

Run a specific task in the end of play

How do I run a specific task after all other tasks in a playbook have completed? This needs to happen in every playbook, and adding the task to each one individually is not a good idea; I need a common solution. Every playbook includes one common role, but it runs at the beginning. Is it possible to add a task to that role that would run at the very end? Or is there some other way to always run it at the end without editing each playbook?
You could do it by writing a callback plugin. This is Python code that executes predefined functions when an (Ansible-internal) event occurs.
The v2_playbook_on_stats method would be of interest to you, as it is one of the last steps executed.
For this, please check out the basic Developer Guidelines page of Ansible:
https://docs.ansible.com/ansible/latest/dev_guide/index.html
But more importantly the Plugins Guide:
https://docs.ansible.com/ansible/latest/dev_guide/developing_plugins.html
The basic structure as outlined in the document is:
from ansible.plugins.callback import CallbackBase
class CallbackModule(CallbackBase):
    pass
They even provide a proper example executing the v2_playbook_on_stats method:
# Make coding more python3-ish, this is required for contributions to Ansible
from __future__ import (absolute_import, division, print_function)
__metaclass__ = type
# not only visible to ansible-doc, it also 'declares' the options the plugin requires and how to configure them.
DOCUMENTATION = '''
    callback: timer
    callback_type: aggregate
    requirements:
      - whitelist in configuration
    short_description: Adds time to play stats
    version_added: "2.0"
    description:
      - This callback just adds total play duration to the play stats.
    options:
      format_string:
        description: format of the string shown to user at play end
        ini:
          - section: callback_timer
            key: format_string
        env:
          - name: ANSIBLE_CALLBACK_TIMER_FORMAT
        default: "Playbook run took %s days, %s hours, %s minutes, %s seconds"
'''
from datetime import datetime
from ansible.plugins.callback import CallbackBase
class CallbackModule(CallbackBase):
    """
    This callback module tells you how long your plays ran for.
    """
    CALLBACK_VERSION = 2.0
    CALLBACK_TYPE = 'aggregate'
    CALLBACK_NAME = 'namespace.collection_name.timer'
    # only needed if you ship it and don't want to enable by default
    CALLBACK_NEEDS_WHITELIST = True
    def __init__(self):
        # make sure the expected objects are present, calling the base's __init__
        super(CallbackModule, self).__init__()
        # start the timer when the plugin is loaded, the first play should start a few milliseconds after.
        self.start_time = datetime.now()
    def _days_hours_minutes_seconds(self, runtime):
        ''' internal helper method for this callback '''
        minutes = (runtime.seconds // 60) % 60
        # seconds left over after whole minutes; the original subtracted only
        # minutes * 60, which overcounts once the run exceeds an hour
        r_seconds = runtime.seconds % 60
        return runtime.days, runtime.seconds // 3600, minutes, r_seconds
    # this is only event we care about for display, when the play shows its summary stats; the rest are ignored by the base class
    def v2_playbook_on_stats(self, stats):
        end_time = datetime.now()
        runtime = end_time - self.start_time
        # Shows the usage of a config option declared in the DOCUMENTATION variable. Ansible will have set it when it loads the plugin.
        # Also note the use of the display object to print to screen. This is available to all callbacks, and you should use this over printing yourself
        self._display.display(self._plugin_options['format_string'] % (self._days_hours_minutes_seconds(runtime)))
I also want to highlight the importance of the DOCUMENTATION string. I first thought that it was only used for generating the doc help page, but it is not. Check out this example:
options:
  format_string:
    description: format of the string shown to user at play end
    ini:
      - section: callback_timer
        key: format_string
    env:
      - name: ANSIBLE_CALLBACK_TIMER_FORMAT
    default: "Playbook run took %s days, %s hours, %s minutes, %s seconds"
In there you have ini, env, and default sections. These are actually used to inject options into your callback plugin, which you can read with self._plugin_options['format_string'] or with self.get_option("format_string").
For a list of all callback methods that can be overridden, please refer to https://github.com/ansible/ansible/blob/devel/lib/ansible/plugins/callback/__init__.py
For you, the methods starting with v2_ are the interesting ones, because these are for Ansible 2+.
Check out https://github.com/ansible/ansible/tree/devel/lib/ansible/plugins/callback for more examples.
However, it seems that they are cleaning up quite a lot there at the moment, so I would suggest checking out a version tag instead, such as:
https://github.com/ansible/ansible/tree/v2.9.6/lib/ansible/plugins/callback
Hope this helps.

How to send jmeter test results to datadog?

I wanted to ask if anyone has ever saved JMeter test results (sampler names, duration, pass/fail) to Datadog, similar to the backend listener for InfluxDB/Graphite, but for Datadog. JMeter-plugins has no such plugin. Datadog seems to offer something called "JMX integration", but I'm not sure whether that is what I need.
I figured out how to do this using the Datadog API https://docs.datadoghq.com/api/?lang=python#post-timeseries-points. The following Python script takes the JTL file (JMeter results) and posts the transaction name, response time, and status (pass/fail) to Datadog.
#!/usr/bin/env python3
import sys
import pandas as pd
from datadog import initialize, api
options = {
    'api_key': '<API_KEY>',
    'app_key': '<APPLICATION_KEY>'
}
metrics = []
def get_current_metric(timestamp, label, elapsed, success):
    # Build one time-series entry for a single JMeter sample.
    return {
        'metric': 'jmeter',
        'points': [(timestamp, elapsed)],
        # Datadog expects tags as a list of "key:value" strings.
        'tags': [f'testcase:{label}', f'success:{success}'],
    }
initialize(**options)
jtl_file = sys.argv[1]
df = pd.read_csv(jtl_file)
for index, row in df.iterrows():
    timestamp = row['timeStamp']/1000  # JTL timestamps are in milliseconds
    label = row['label']
    elapsed = row['elapsed']
    success = str(row['success'])
    metric = get_current_metric(timestamp, label, elapsed, success)
    metrics.append(metric)
# Send all collected points in one request.
api.Metric.send(metrics)
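The row-to-metric conversion can be checked on its own, without Datadog credentials or pandas. The sketch below is an illustration under assumptions: jtl_to_metrics is a hypothetical helper, the sample JTL text is made up and reduced to the four columns the script reads, and tags are written as a list of "key:value" strings, which is the form Datadog documents for series tags. It only builds the payload; sending it would still go through api.Metric.send as in the script.

```python
import csv
import io


def jtl_to_metrics(jtl_text, metric_name="jmeter"):
    """Convert JTL (CSV) text into a list of Datadog-style series dicts."""
    metrics = []
    for row in csv.DictReader(io.StringIO(jtl_text)):
        metrics.append({
            "metric": metric_name,
            # JTL timestamps are in milliseconds; the points use seconds.
            "points": [(int(row["timeStamp"]) / 1000, int(row["elapsed"]))],
            "tags": [f"testcase:{row['label']}", f"success:{row['success']}"],
        })
    return metrics


# Hypothetical two-sample JTL file, reduced to the columns used here.
sample = (
    "timeStamp,elapsed,label,success\n"
    "1600000000000,120,Login,true\n"
    "1600000000500,340,Search,false\n"
)
metrics = jtl_to_metrics(sample)
print(metrics[0])
```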
