Pyppeteer: Browser closed unexpectedly in AWS Lambda - aws-lambda

I'm running into this error in AWS Lambda. It appears that the devtools websocket is not up. Not sure how to fix it. Any ideas? Thanks for your time.
The exception originates in get_ws_endpoint() because the websocket response times out: https://github.com/pyppeteer/pyppeteer/blob/ad3a0a7da221a04425cbf0cc92e50e93883b077b/pyppeteer/launcher.py#L225
Lambda code:
import os
import json
import asyncio
import logging
import boto3
import pyppeteer
from pyppeteer import launch

logger = logging.getLogger()
logger.setLevel(logging.INFO)
pyppeteer.DEBUG = True  # print suppressed errors as error log


def lambda_handler(event, context):
    asyncio.get_event_loop().run_until_complete(main())


async def main():
    browser = await launch({
        'headless': True,
        'args': [
            '--no-sandbox'
        ]
    })
    page = await browser.newPage()
    await page.goto('http://example.com')
    await page.screenshot({'path': '/tmp/example.png'})
    await browser.close()
    return {
        'statusCode': 200,
        'body': json.dumps('Hello from Lambda!')
    }
Exception:
Response:
{
"errorMessage": "Browser closed unexpectedly:\n",
"errorType": "BrowserError",
"stackTrace": [
" File \"/var/task/lambda_handler.py\", line 23, in lambda_handler\n asyncio.get_event_loop().run_until_complete(main())\n",
" File \"/var/lang/lib/python3.8/asyncio/base_events.py\", line 616, in run_until_complete\n return future.result()\n",
" File \"/var/task/lambda_handler.py\", line 72, in main\n browser = await launch({\n",
" File \"/opt/python/pyppeteer/launcher.py\", line 307, in launch\n return await Launcher(options, **kwargs).launch()\n",
" File \"/opt/python/pyppeteer/launcher.py\", line 168, in launch\n self.browserWSEndpoint = get_ws_endpoint(self.url)\n",
" File \"/opt/python/pyppeteer/launcher.py\", line 227, in get_ws_endpoint\n raise BrowserError('Browser closed unexpectedly:\\n')\n"
]
}
Request ID:
"06be0620-8b5c-4600-a76e-bc785210244e"
Function Logs:
START RequestId: 06be0620-8b5c-4600-a76e-bc785210244e Version: $LATEST
---- files in /tmp ----
[W:pyppeteer.chromium_downloader] start chromium download.
Download may take a few minutes.
0%| | 0/108773488 [00:00<?, ?it/s]
11%|█▏ | 12267520/108773488 [00:00<00:00, 122665958.31it/s]
27%|██▋ | 29470720/108773488 [00:00<00:00, 134220418.14it/s]
42%|████▏ | 46172160/108773488 [00:00<00:00, 142570388.86it/s]
58%|█████▊ | 62607360/108773488 [00:00<00:00, 148471487.93it/s]
73%|███████▎ | 79626240/108773488 [00:00<00:00, 154371569.93it/s]
88%|████████▊ | 95754240/108773488 [00:00<00:00, 156353972.12it/s]
100%|██████████| 108773488/108773488 [00:00<00:00, 161750092.47it/s]
[W:pyppeteer.chromium_downloader]
chromium download done.
[W:pyppeteer.chromium_downloader] chromium extracted to: /tmp/local-chromium/588429
-----
/tmp/local-chromium/588429/chrome-linux/chrome
[ERROR] BrowserError: Browser closed unexpectedly:
Traceback (most recent call last):
  File "/var/task/lambda_handler.py", line 23, in lambda_handler
    asyncio.get_event_loop().run_until_complete(main())
  File "/var/lang/lib/python3.8/asyncio/base_events.py", line 616, in run_until_complete
    return future.result()
  File "/var/task/lambda_handler.py", line 72, in main
    browser = await launch({
  File "/opt/python/pyppeteer/launcher.py", line 307, in launch
    return await Launcher(options, **kwargs).launch()
  File "/opt/python/pyppeteer/launcher.py", line 168, in launch
    self.browserWSEndpoint = get_ws_endpoint(self.url)
  File "/opt/python/pyppeteer/launcher.py", line 227, in get_ws_endpoint
    raise BrowserError('Browser closed unexpectedly:\n')
END RequestId: 06be0620-8b5c-4600-a76e-bc785210244e
REPORT RequestId: 06be0620-8b5c-4600-a76e-bc785210244e Duration: 33370.61 ms Billed Duration: 33400 ms Memory Size: 3008 MB Max Memory Used: 481 MB Init Duration: 445.58 ms

I think BrowserError: Browser closed unexpectedly is just the error you get when Chrome crashes for whatever reason. It would be nice if pyppeteer printed out the error, but it doesn't.
To track things down, it's helpful to pull up the exact command that pyppeteer runs. You can do that this way:
>>> from pyppeteer.launcher import Launcher
>>> ' '.join(Launcher().cmd)
/root/.local/share/pyppeteer/local-chromium/588429/chrome-linux/chrome --disable-background-networking --disable-background-timer-throttling --disable-breakpad --disable-browser-side-navigation --disable-client-side-phishing-detection --disable-default-apps --disable-dev-shm-usage --disable-extensions --disable-features=site-per-process --disable-hang-monitor --disable-popup-blocking --disable-prompt-on-repost --disable-sync --disable-translate --metrics-recording-only --no-first-run --safebrowsing-disable-auto-update --enable-automation --password-store=basic --use-mock-keychain --headless --hide-scrollbars --mute-audio about:blank --no-sandbox --remote-debugging-port=33423 --user-data-dir=/root/.local/share/pyppeteer/.dev_profile/tmp5cj60q6q
When I ran that command in my Docker image, I got the following error:
$ /root/.local/share/pyppeteer/local-chromium/588429/chrome-linux/chrome # ...
/root/.local/share/pyppeteer/local-chromium/588429/chrome-linux/chrome:
error while loading shared libraries:
libnss3.so: cannot open shared object file: No such file or directory
So I installed libnss3:
apt-get install -y libnss3
Then I ran the command again and got a different error:
$ /root/.local/share/pyppeteer/local-chromium/588429/chrome-linux/chrome # ...
[0609/190651.188666:ERROR:zygote_host_impl_linux.cc(89)] Running as root without --no-sandbox is not supported. See https://crbug.com/638180.
So I needed to change my launch command to something like:
browser = await launch(headless=True, args=['--no-sandbox'])
and now it works!
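Since pyppeteer does not show Chrome's own error output, here is a minimal sketch of the debugging step above: run the launch command yourself and capture whatever it writes to stderr. The helper name run_and_capture and the 10-second timeout are my own assumptions for illustration, not part of pyppeteer; only Launcher().cmd comes from the steps above.

```python
import subprocess
import sys


def run_and_capture(cmd, timeout=10):
    """Run cmd and return (exit_code, stderr_text).

    exit_code is None if the process was still running when the timeout
    expired, which may mean the browser started and is waiting for a client.
    """
    try:
        proc = subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.PIPE,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired as exc:
        # subprocess.run kills the child on timeout; keep any stderr collected so far.
        stderr = exc.stderr or b""
        return None, stderr.decode(errors="replace")
    return proc.returncode, proc.stderr.decode(errors="replace")


# Usage (assumes pyppeteer is installed):
#     from pyppeteer.launcher import Launcher
#     code, err = run_and_capture(Launcher().cmd)
#     print("exit code:", code)
#     print(err, file=sys.stderr)
```

A missing shared library or a sandbox complaint like the ones above should then show up in the returned stderr text instead of being swallowed.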

Answering my own question.
Finally, I was able to run pyppeteer (v0.2.2) with Python 3.6 and 3.7 (not 3.8) after bundling the Chromium binary in a Lambda layer.
In summary, it appears to work only when it's configured with a user-provided Chromium executable path, not with the automatically downloaded Chromium. Possibly a race condition or something similar.
I got Chromium from https://github.com/adieuadieu/serverless-chrome/releases/download/v1.0.0-41/stable-headless-chromium-amazonlinux-2017-03.zip
browser = await launch(
    headless=True,
    executablePath='/opt/python/headless-chromium',
    args=[
        '--no-sandbox',
        '--single-process',
        '--disable-dev-shm-usage',
        '--disable-gpu',
        '--no-zygote'
    ])
Issue posted on repo https://github.com/pyppeteer/pyppeteer/issues/108
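If you go the bundled-binary route, a quick pre-flight check can tell you whether the layer was unpacked where you expect before pyppeteer tries to start it. This is only a sketch: check_executable and build_launch_kwargs are hypothetical helper names of mine, and the flags are simply the ones listed above.

```python
import os


def check_executable(path):
    """Return True if path is a regular file the current user may execute."""
    return os.path.isfile(path) and os.access(path, os.X_OK)


def build_launch_kwargs(executable_path):
    """Keyword arguments for pyppeteer's launch(), using the flags from the answer."""
    return {
        "headless": True,
        "executablePath": executable_path,
        "args": [
            "--no-sandbox",
            "--single-process",
            "--disable-dev-shm-usage",
            "--disable-gpu",
            "--no-zygote",
        ],
    }


# Usage inside an async function (assumes pyppeteer is installed):
#     path = "/opt/python/headless-chromium"
#     if not check_executable(path):
#         raise FileNotFoundError(path)
#     browser = await launch(**build_launch_kwargs(path))
```

Failing early with the path in the message is easier to read in the Lambda logs than the generic "Browser closed unexpectedly" error.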

I have been trying to run pyppeteer in a Docker container and ran into the same issue.
Finally managed to fix it thanks to this comment: https://github.com/miyakogi/pyppeteer/issues/14#issuecomment-348825238
I installed Chrome manually through apt
curl -sSL https://dl.google.com/linux/linux_signing_key.pub | apt-key add -
echo "deb [arch=amd64] https://dl.google.com/linux/chrome/deb/ stable main" > /etc/apt/sources.list.d/google-chrome.list
apt update -y && apt install -y google-chrome-stable
and then specified the path when launching the browser.
You also have to run it headless and pass "--no-sandbox" in args:
browser = await launch(executablePath='/usr/bin/google-chrome-stable', headless=True, args=['--no-sandbox'])
Hope this will help!

If anybody is running on Heroku and facing the same error:
Add the buildpack. Its URL is:
https://github.com/jontewks/puppeteer-heroku-buildpack
Make sure you launch with --no-sandbox:
launch({'args': ['--no-sandbox']})

Make sure all the necessary dependencies are installed. You can run ldd /path/to/your/chrome | grep not on a Linux machine to check which dependencies are missing.
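In my case, I got this:
libatk-bridge-2.0.so.0 => not found
libgtk-3.so.0 => not found
and then installed the missing dependencies (on Debian/Ubuntu these libraries are provided by the following packages):
sudo apt-get install -y libatk-bridge2.0-0 libgtk-3-0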
and now it works!
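The same ldd check can be scripted, for example to run it inside the container at build time. A minimal sketch, assuming a Linux machine with ldd on the PATH; parse_missing and missing_libraries are names I made up for illustration.

```python
import subprocess


def parse_missing(ldd_output):
    """Return the library names that ldd reported as 'not found'."""
    missing = []
    for line in ldd_output.splitlines():
        if "=> not found" in line:
            # Lines look like: "\tlibgtk-3.so.0 => not found"
            missing.append(line.split("=>")[0].strip())
    return missing


def missing_libraries(binary_path):
    """Run ldd on binary_path (Linux only) and return the unresolved libraries."""
    result = subprocess.run(
        ["ldd", binary_path],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
    )
    return parse_missing(result.stdout)


# Usage:
#     print(missing_libraries("/path/to/your/chrome"))
```

An empty list means every shared library resolved; anything else is a library name you still need to find a package for.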

Related

How could I resolve xmlrpc.client.Fault while trying to search a package through pip3? [duplicate]

I am getting this error from pip search while learning Python.
The output below is what I get when I run pip search. Can you tell me how to fix it?
$ pip search pdbx
ERROR: Exception:
Traceback (most recent call last):
  File "*/lib/python3.7/site-packages/pip/_internal/cli/base_command.py", line 224, in _main
    status = self.run(options, args)
  File "*/lib/python3.7/site-packages/pip/_internal/commands/search.py", line 62, in run
    pypi_hits = self.search(query, options)
  File "*/lib/python3.7/site-packages/pip/_internal/commands/search.py", line 82, in search
    hits = pypi.search({'name': query, 'summary': query}, 'or')
  File "/usr/lib/python3.7/xmlrpc/client.py", line 1112, in __call__
    return self.__send(self.__name, args)
  File "/usr/lib/python3.7/xmlrpc/client.py", line 1452, in __request
    verbose=self.__verbose
  File "*/lib/python3.7/site-packages/pip/_internal/network/xmlrpc.py", line 46, in request
    return self.parse_response(response.raw)
  File "/usr/lib/python3.7/xmlrpc/client.py", line 1342, in parse_response
    return u.close()
  File "/usr/lib/python3.7/xmlrpc/client.py", line 656, in close
    raise Fault(**self._stack[0])
xmlrpc.client.Fault: <Fault -32500: 'RuntimeError: This API has been temporarily disabled due to unmanageable load and will be deprecated in the near future. Please use the Simple or JSON API instead.'>
The pip search command queries PyPI's servers. PyPI's maintainers have explained that the API endpoint behind pip search is very resource intensive and too expensive to keep open to the public at all times. Consequently they sometimes throttle access, and they plan to remove it completely.
See this GitHub issues thread ...
The solution I am using for now is to pip install pip-search (a utility created by GitHub user @victorgarric).
So, instead of 'pip search', I use pip_search. It definitely beats searching PyPI via a web browser.
Following JRK's suggestion in the discussion on GitHub (last comment): the search command is temporarily disabled, so use your browser to search for packages in the meantime.
Check the thread on GitHub and give him a thumbs up ;)
Search on the website, https://pypi.org/, then install the package you want.
The error says:
Please use the Simple or JSON API instead
You can try pypi-simple to query PyPI:
https://pypi.org/project/pypi-simple/
Its page gives an example too. I tried to use it here:
pypi-simple version 0.8.0 'DistributionPackage' object has no attribute 'get_digest':
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
Created on Thu Nov 11 17:40:03 2020
@author: Pietro
"""
from pypi_simple import PyPISimple


def simple():
    package = input('\npackage to be checked ')
    try:
        with PyPISimple() as client:
            requests_page = client.get_project_page(package)
    except Exception:
        print("\n SOMETHING WENT WRONG !!!!! \n\n",
              "CHECK INTERNET CONNECTION OR DON'T KNOW WHAT HAPPENED !!!\n")
        return  # requests_page is undefined at this point, so stop here
    pkg = requests_page.packages[0]
    print(pkg)
    print(type(pkg))
    print('\n', pkg, '\n')
    print('\n' + pkg.filename + '\n')
    print('\n' + pkg.url + '\n')
    print('\n' + pkg.project + '\n')
    print('\n' + pkg.version + '\n')
    print('\n' + pkg.package_type + '\n')
    # print('\n' + pkg.get_digest() + '\n', 'ENDs HERE !!!!')  # wasn't working


if __name__ == '__main__':
    simple()
I got -4 so far for this answer and don't know why. I figured out that I can check for a package with the JSON API:
import requests

# package_name = input('insert package name : ')
package_name = 'numpy'
url = 'https://pypi.org/pypi/' + package_name + '/json'
try:
    r = requests.get(url)
    data = r.json()
    for i in data:
        if i == 'info':
            print('ok')
            for j in data[i]:
                if j == 'name':
                    print(data[i][j])
    print([k for k in data['releases']])
except Exception:
    print('something went south !!!!!!!!!!')
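For what it's worth, the JSON API lookup can be written without the nested loops and without third-party packages. A minimal sketch, assuming the https://pypi.org/pypi/<name>/json endpoint used above; summarize and fetch_project are hypothetical names. Note that this looks up one exact project name, so it is not a replacement for a keyword search.

```python
import json
import urllib.request


def summarize(payload):
    """Extract the project name and its release versions from a JSON API payload."""
    name = payload["info"]["name"]
    versions = list(payload.get("releases", {}))
    return name, versions


def fetch_project(package_name, timeout=10):
    """Download https://pypi.org/pypi/<name>/json and return the decoded payload."""
    url = "https://pypi.org/pypi/" + package_name + "/json"
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)


# Usage (needs network access):
#     name, versions = summarize(fetch_project("numpy"))
#     print(name, versions)
```

A project that does not exist makes urlopen raise urllib.error.HTTPError, so wrap the call in try/except if you want a friendlier message.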

Local AWS Serverless Application Model Python application is not able to load the jira module

My local AWS Serverless Application Model (SAM) Python application is not able to load the jira module. It works fine with PyCharm's Run option, but whenever I use the Debug option it throws an exception.
My requirements.txt is as follows:
cherrypy
dash-table
dash
dash_core_components
dash_html_components
dash_renderer
flask
Jinja2
jira
jupyter-core
markupsafe
nltk
oauthlib[signedtoken]
pandas
pbr
plotly
pymongo
python-dateutil
requests
splunklib
boto3
botocore
sklearn
I also tried pinning the jira version, e.g. jira==2.0.0 or jira==1.0.8.
The whole exception message:
START RequestId: 52fdfc07-2182-154f-163f-5f0f9a621d72 Version: $LATEST
END RequestId: 52fdfc07-2182-154f-163f-5f0f9a621d72
REPORT RequestId: 52fdfc07-2182-154f-163f-5f0f9a621d72 Init Duration: 15853.40 ms Duration: 0.02 ms Billed Duration: 100 ms Memory Size: 1024 MB Max Memory Used: 133 MB
{
"errorType": "Exception",
"errorMessage": "Versioning for this project requires either an sdist tarball, or access to an upstream git repository. It's also possible that there is a mismatch between the package name in setup.cfg and the argument given to pbr.version.VersionInfo. Project name jira was given, but was not able to be found.",
"stackTrace": [
" File \"/var/lang/lib/python3.7/imp.py\", line 234, in load_module\n return load_source(name, filename, file)\n",
" File \"/var/lang/lib/python3.7/imp.py\", line 171, in load_source\n module = _load(spec)\n",
" File \"\u003cfrozen importlib._bootstrap\u003e\", line 696, in _load\n",
" File \"\u003cfrozen importlib._bootstrap\u003e\", line 677, in _load_unlocked\n",
" File \"\u003cfrozen importlib._bootstrap_external\u003e\", line 728, in exec_module\n",
" File \"\u003cfrozen importlib._bootstrap\u003e\", line 219, in _call_with_frames_removed\n",
" File \"/var/task/app.py\", line 5, in \u003cmodule\u003e\n import project.helper as helper\n",
" File \"/var/task/project/helper.py\", line 15, in \u003cmodule\u003e\n import project.jira_helper as JiraHandling\n",
" File \"/var/task/project/jira_helper.py\", line 6, in \u003cmodule\u003e\n from qbojira import JiraController\n",
" File \"/var/task/qbojira/JiraController.py\", line 2, in \u003cmodule\u003e\n print (pbr.version.VersionInfo('jira').version_string())\n",
" File \"/var/task/pbr/version.py\", line 467, in version_string\n return self.semantic_version().brief_string()\n",
" File \"/var/task/pbr/version.py\", line 462, in semantic_version\n self._semantic = self._get_version_from_pkg_resources()\n",
" File \"/var/task/pbr/version.py\", line 449, in _get_version_from_pkg_resources\n result_string = packaging.get_version(self.package)\n",
" File \"/var/task/pbr/packaging.py\", line 874, in get_version\n name=package_name))\n"
]
}
I was getting the same issue when I uploaded my code with its packages to an AWS Lambda function. After much research and trial and error, adding the idna package along with the jira package (including their .dist-info directories) worked for me:
idna
idna-2.10.dist-info
jira
jira-2.0.0.dist-info
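To confirm that the metadata directories actually made it into the deployment package, you can ask Python whether it sees them. A minimal sketch, assuming pbr's lookup relies on the same installed-distribution metadata that importlib.metadata reads (my assumption; I have not checked pbr's source). importlib.metadata needs Python 3.8+, and installed_version and report are hypothetical helper names.

```python
from importlib import metadata  # Python 3.8+


def installed_version(dist_name):
    """Return the installed version string, or None if no metadata is found."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def report(dist_names):
    """Map each distribution name to its version (None when metadata is missing)."""
    return {name: installed_version(name) for name in dist_names}


# Usage:
#     print(report(["jira", "idna"]))
```

If jira maps to None while import jira still succeeds, the package code was copied without its .dist-info directory, which matches the symptom in the error message above.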

Cloudify File Plugin "Operation not permitted" error

I'm attempting to copy a file to a VM using the cloudify.nodes.File type, but am running into a permission error that I'm having trouble figuring out.
According to the documentation, I should be able to copy a file by using:
docker_yum_repo:
  type: cloudify.nodes.File
  properties:
    resource_config:
      resource_path: resources/docker.repo
      file_path: /etc/yum.repos.d/docker.repo
      owner: root:root
      mode: 644
The relevant portions of my blueprint are:
vm_0:
  type: cloudify.nodes.aws.ec2.Instances
  properties:
    client_config: *client_config
    agent_config:
      install_method: none
      user: ubuntu
    resource_config:
      kwargs:
        ImageId: { get_attribute: [ ami, aws_resource_id ] }
        InstanceType: t2.micro
        UserData: { get_input: install_script }
        KeyName: automation
  relationships:
    - type: cloudify.relationships.depends_on
      target: ami
    - type: cloudify.relationships.depends_on
      target: nic_0
...
file_0:
  type: cloudify.nodes.File
  properties:
    resource_config:
      resource_path: resources/config/file.conf
      file_path: /home/ubuntu/file.conf
      owner: root:root
      mode: 644
  relationships:
    - type: cloudify.relationships.contained_in
      target: vm_0
But, I keep receiving the error:
2019-02-20 15:36:59.128 CFY <sbin> 'install' workflow execution failed: RuntimeError: Workflow failed: Task failed 'cloudify_files.tasks.create' -> [Errno 1] Operation not permitted: './file.conf'
Execution of workflow install for deployment sbin failed. [error=Traceback (most recent call last):
  File "/opt/mgmtworker/env/lib/python2.7/site-packages/cloudify/dispatch.py", line 571, in _remote_workflow_child_thread
    workflow_result = self._execute_workflow_function()
  File "/opt/mgmtworker/env/lib/python2.7/site-packages/cloudify/dispatch.py", line 600, in _execute_workflow_function
    result = self.func(*self.args, **self.kwargs)
  File "/opt/mgmtworker/env/lib/python2.7/site-packages/cloudify/plugins/workflows.py", line 30, in install
    node_instances=set(ctx.node_instances))
  File "/opt/mgmtworker/env/lib/python2.7/site-packages/cloudify/plugins/lifecycle.py", line 29, in install_node_instances
    processor.install()
  File "/opt/mgmtworker/env/lib/python2.7/site-packages/cloudify/plugins/lifecycle.py", line 102, in install
    graph.execute()
  File "/opt/mgmtworker/env/lib/python2.7/site-packages/cloudify/workflows/tasks_graph.py", line 237, in execute
    raise self._error
RuntimeError: Workflow failed: Task failed 'cloudify_files.tasks.create' -> [Errno 1] Operation not permitted: './file.conf'
I've tried a few different values for file_path: "/home/ubuntu/file.conf", "/tmp/file.conf", and "./file.conf" (shown in the error output above), but I receive the same permission error each time. I've also tried the relationship: cloudify.relationships.depends_on without any success as well.
I'm using Cloudify Manager 4.5.5 via their Docker image.
Has anyone seen this issue? Am I using the plugin incorrectly? And is this "best-practice" or should I create a new VM that already has all of the files necessary and have that spun-up on AWS?
Thanks in advance!
Update
I forgot to mention that if I try to set the owner of the file to ubuntu:ubuntu, I get an error about the user not being found:
2019-02-20 16:19:21.743 CFY <sbin> 'install' workflow execution failed: RuntimeError: Workflow failed: Task failed 'cloudify_files.tasks.create' -> 'getpwnam(): name not found: ubuntu'
Execution of workflow install for deployment sbin failed. [error=Traceback (most recent call last):
  File "/opt/mgmtworker/env/lib/python2.7/site-packages/cloudify/dispatch.py", line 571, in _remote_workflow_child_thread
    workflow_result = self._execute_workflow_function()
  File "/opt/mgmtworker/env/lib/python2.7/site-packages/cloudify/dispatch.py", line 600, in _execute_workflow_function
    result = self.func(*self.args, **self.kwargs)
  File "/opt/mgmtworker/env/lib/python2.7/site-packages/cloudify/plugins/workflows.py", line 30, in install
    node_instances=set(ctx.node_instances))
  File "/opt/mgmtworker/env/lib/python2.7/site-packages/cloudify/plugins/lifecycle.py", line 29, in install_node_instances
    processor.install()
  File "/opt/mgmtworker/env/lib/python2.7/site-packages/cloudify/plugins/lifecycle.py", line 102, in install
    graph.execute()
  File "/opt/mgmtworker/env/lib/python2.7/site-packages/cloudify/workflows/tasks_graph.py", line 237, in execute
    raise self._error
RuntimeError: Workflow failed: Task failed 'cloudify_files.tasks.create' -> 'getpwnam(): name not found: ubuntu'
It looks like the VM isn't yet ready to receive the file (since it's failing in the install lifecycle).
Try "use_sudo: true" in the "resource_config" block. Also add an interfaces block like this:
interfaces:
  cloudify.interfaces.lifecycle:
    create:
      executor: host_agent
    delete:
      executor: host_agent
If you don't override the executor, it will run on the manager (which is probably why you see "ubuntu" user not existing).

PIP search - HTTPError: 503 Server Error: Backend is unhealthy for url

I'm trying to install python-telegram-bot on my router (asuswrt), which has Entware.
I'm getting "503 Server Error: Backend is unhealthy for url".
I tried on Windows 10 but keep getting the same error.
$ pip search python
HTTP error 503 while getting https://pypi.org/pypi
Exception:
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/pip/_internal/cli/base_command.py", line 143, in main
    status = self.run(options, args)
  File "/usr/lib/python2.7/site-packages/pip/_internal/commands/search.py", line 48, in run
    pypi_hits = self.search(query, options)
  File "/usr/lib/python2.7/site-packages/pip/_internal/commands/search.py", line 65, in search
    hits = pypi.search({'name': query, 'summary': query}, 'or')
  File "/usr/lib/python2.7/xmlrpclib.py", line 1243, in __call__
    return self.__send(self.__name, args)
  File "/usr/lib/python2.7/xmlrpclib.py", line 1602, in __request
    verbose=self.__verbose
  File "/usr/lib/python2.7/site-packages/pip/_internal/download.py", line 789, in request
    response.raise_for_status()
  File "/usr/lib/python2.7/site-packages/pip/_vendor/requests/models.py", line 939, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
HTTPError: 503 Server Error: Backend is unhealthy for url: https://pypi.org/pypi
PyPI was down for scheduled maintenance.
In the future, you can check https://status.python.org/ when you have issues like this.

azure-sdk-python status code not found GraphRbacManagementClient

I am trying to enumerate Azure AD users from an azure subscription, with this code:
import os
from pprint import pprint

from azure.common.client_factory import get_client_from_auth_file
from azure.graphrbac import GraphRbacManagementClient

WORKING_DIRECTORY = os.getcwd()
TENANT_ID = "REDACTED_AZURE_ID_OF_MY_AZURE_AD_USER"
AZURE_AUTH_LOCATION = os.path.join(WORKING_DIRECTORY, "mycredentials.json")  # from: az ad sp create-for-rbac --sdk-auth > mycredentials.json
# I've tried with get_client_from_cli_profile() while logged in azure CLI
# I've tried with and without parameters auth_path and tenant_id
rbac_client = get_client_from_auth_file(GraphRbacManagementClient, auth_path=AZURE_AUTH_LOCATION, tenant_id=TENANT_ID)
# Try to list users
for user in rbac_client.users.list():
    pprint(user.__dict__)
As detailed in the code comments, I've made a couple of unsuccessful attempts to fix the issue. Here is the stack trace:
/home/guillaumedsde/.virtualenvs/champollion/bin/python /home/guillaumedsde/PycharmProjects/champollion/champollion/champollion.py
Traceback (most recent call last):
  File "/home/guillaumedsde/PycharmProjects/champollion/champollion/champollion.py", line 582, in <module>
    gitlab_project_member.access_level)
  File "/home/guillaumedsde/PycharmProjects/champollion/champollion/champollion.py", line 306, in create_role_assignment
    "principal_id": get_user_azure_id(user)} # get_user_azure_id(user)} # TODO
  File "/home/guillaumedsde/PycharmProjects/champollion/champollion/champollion.py", line 329, in get_user_azure_id
    for user in rbac_client.users.list():
  File "/home/guillaumedsde/.virtualenvs/champollion/lib/python3.6/site-packages/msrest/paging.py", line 131, in __next__
    self.advance_page()
  File "/home/guillaumedsde/.virtualenvs/champollion/lib/python3.6/site-packages/msrest/paging.py", line 117, in advance_page
    self._response = self._get_next(self.next_link)
  File "/home/guillaumedsde/.virtualenvs/champollion/lib/python3.6/site-packages/azure/graphrbac/operations/users_operations.py", line 158, in internal_paging
    raise models.GraphErrorException(self._deserialize, response)
azure.graphrbac.models.graph_error.GraphErrorException: Operation returned an invalid status code 'Not Found'
Process finished with exit code 1
This was a bug, fixed in azure-common 1.1.13:
https://pypi.org/project/azure-common/1.1.13/
You can now simply do this (with no tenant ID):
rbac_client = get_client_from_auth_file(GraphRbacManagementClient,auth_path=AZURE_AUTH_LOCATION)
I took this opportunity to fix the CLI version of this method as well.
(I own this code at MS)
