My Spider crashes whenever I schedule to production on Heroku - heroku

My spider is working both on the local machine and on the Heroku platform, however, I wish to schedule the spider so it works according to the time given through cron expression.
This is my periodic_requests.py file:
from pytz import timezone
import requests, pytz
from apscheduler.schedulers.twisted import TwistedScheduler
from twisted.internet import reactor
def send_request():
requests.post(
url="https://datacrawlapp.herokuapp.com/schedule.json",
data={"project":"datahubsitecrawler",
"spider":"datahubcrawlapp",
"data_url": "https://drive.google.com/file/d/1zBfsWZdoFzoAq6ZELfw-tvvBFcVmCMI7/view?usp=sharing"
}
)
if __name__ == '__main__':
scheduler = TwistedScheduler(timezone=pytz.utc),
scheduler.add_job(send_request, 'cron', day_of_week='mon-sun', hour='19', minute='47')
scheduler.start()
try:
reactor.run() # the script will block here until the crawling is finished
except (KeyboardInterrupt, SystemExit):
pass
Procfile:
web: scrapyd
clock: python periodic_requests
I get error:
File "/app/.heroku/python/lib/python3.8/site-packages/scrapyd/website.py", line 212,
in prep_tab_finished
2022-02-07T08:58:06.697647+00:00 app[web.1]: return '\n'.join(
2022-02-07T08:58:06.697647+00:00 app[web.1]: builtins.TypeError: 'NoneType' object is not iterable:
and then it crashes as shown below:
2022-02-07T08:48:32.411005+00:00 heroku[web.1]: State changed from starting to up
2022-02-07T08:48:32.611649+00:00 heroku[clock.1]: Process exited with status 0
2022-02-07T08:48:32.721748+00:00 heroku[clock.1]: State changed from up to crashed
2022-02-07T08:48:32.742567+00:00 heroku[clock.1]: State changed from crashed to starting
2022-02-07T08:48:41.000000+00:00 app[api]: Build succeeded
2022-02-07T08:48:42.473990+00:00 heroku[clock.1]: Starting process with command `python periodic_requests.py`
2022-02-07T08:48:43.195961+00:00 heroku[clock.1]: State changed from starting to up
2022-02-07T08:48:44.920927+00:00 heroku[clock.1]: Process exited with status 0
2022-02-07T08:48:44.984024+00:00 heroku[clock.1]: State changed from up to crashed
Will appreciate your help!

Related

Hypercorn runs with duplicated process

I am not sure whether this is really hypercorn issue, but could not imagine what else can be. I have searched the net but have not found any topic close to this, so please bear with me.
I am running a server with hypercorn on Ubuntu 20.04, with python3.8.10.
The problem is that it is runs with a duplicated process in background.
root 2278497 0.8 0.1 41872 33568 pts/7 S 10:03 0:00 /usr/bin/python3 /usr/local/bin/hypercorn -c config.toml main:app --reload
root 2278499 0.0 0.0 17304 11332 pts/7 S 10:03 0:00 /usr/bin/python3 -c from multiprocessing.resource_tracker import main;main(4)
root 2278500 0.7 0.1 41648 34148 pts/7 S 10:03 0:00 /usr/bin/python3 -c from multiprocessing.spawn import spawn_main; spawn_main(tracker_fd=5, pipe_handle=7) --multiprocessing-fork
The main process is 2278497, but there are duplicated processes 2278499 and 2278450. I do not know why these are started.
This causes unwanted effects by executing twice the same tasks.
How can I avoid that?
EDIT:
A minimal example:
# test_main.py
from fastapi import FastAPI
app = FastAPI()
#app.get("/")
async def root():
return {"message": "Hello World"}
print("main module loaded.")
I then type:
sudo hypercorn test_main:app
and the stdout is:
main module loaded.
main module loaded.
[2022-11-02 15:08:45 +0100] [2364437] [INFO] Running on http://127.0.0.1:8000 (CTRL + C to quit)
I get the impression you're using Hypercorn wrong and it's not designed for you to run your own code in the same process.
That said, what you're seeing in your MWE is a master process and a worker process. You can distinguish between these by checking whether the current process is a "daemon" process (i.e. Unix-style background task) or not via:
import multiprocessing
print(multiprocessing.current_process().daemon)
this will output False for the master process, and True for all worker processes. E.g. this would increase to 5 Trues when executed as hypercorn -w5 test_main:app.
I think I'd suggest not using this hack in production and using another system (e.g. systemd or supervisord) to make sure that any background tasks are kept running. This would give you more control over them. You could still have the code in the same file, just behind the normal if __name__ == '__main__': guard.
Update with more complete example:
from fastapi import FastAPI
from multiprocessing import current_process
app = FastAPI()
# see https://asgi.readthedocs.io/en/latest/specs/lifespan.html
#app.on_event('startup')
async def on_startup():
print("asgi lifecycle startup event")
#app.on_event('shutdown')
async def on_shutdown():
print("asgi lifecycle shutdown event")
#app.get("/")
async def root():
return {"message": "Hello World"}
def main():
print("running as main module")
# see https://docs.python.org/3/library/__main__.html
if __name__ == "__main__":
import sys
sys.exit(main())
# warning, these will also execute if this module imported
if not current_process().daemon:
print("main module loaded into master process")
else:
print("main module loaded into worker process")
can be run as:
$ hypercorn -w2 test_main:app
main module loaded into master process
main module loaded into worker process
asgi lifecycle startup event
[2022-11-04 11:39:31 +0000] [24243] [INFO] Running on http://127.0.0.1:8000 (CTRL + C to quit)
main module loaded into worker process
asgi lifecycle startup event
[2022-11-04 11:39:31 +0000] [24244] [INFO] Running on http://127.0.0.1:8000 (CTRL + C to quit)
^C
asgi lifecycle shutdown event
asgi lifecycle shutdown event
$ python -m test_hyper
running as main module
not the the first line mentions "master process". This is Hypercorn's supervisor process which is responsible for looking after worker processes (e.g. clean shutdown / restarting). I also show that this code can recognise that it's being run as a main module, and could do different things there. This is because Hypercorn is importing this module into each process (i.e. whether it's a master or worker).

Issue Deploying Heroku App - RQ Worker: wrong number of arguments for 'HSET' command

Modifying an app based on this example, I am having a problem when deploying my updated Heroku app during initialization of the web.1 dyno.
Everything is working fine in my development envr; when I start the 'app' and 'worker' executables directly, I can access the working app from 127.0.0.1:5000
I already have the postgresql and redistogo addons created as part of the app...
heroku config file
=== app Config Vars
APP_SETTINGS: config.StagingConfig
DATABASE_URL: postgres://[points-to-AWS-EC2].compute-1.amazonaws.com:[port]/[identifier]
REDISTOGO_URL: redis://redistogo:[identifier]#pike.redistogo.com:[port]/
Procfile
web: bash heroku.sh
heroku.sh
#!/bin/bash
gunicorn app:app --daemon
python worker.py
app.py
from collections import Counter
from datetime import datetime
import json
import re
import operator
import os
from flask import Flask, render_template, request, jsonify
from flask_sqlalchemy import SQLAlchemy
from requests import get, exceptions as rx
import nltk
from rq import Queue
from rq.job import Job
from validators import url as vurl
from worker import conn
app = Flask(__name__, template_folder='app-templates/')
app.config.from_object(os.environ['APP_SETTINGS'])
db = SQLAlchemy(app)
q = Queue(connection=conn)
def worker_task(url):
'''
takes website url and ...
worker.py
import os
import redis
from rq import Worker, Queue, Connection
listen = ['default']
redis_url = os.getenv('REDISTOGO_URL', 'redis://localhost:6379')
conn = redis.from_url(redis_url)
if __name__ == '__main__':
# for tracking worker job in a terminal
# assumptions: redis-server is running
try:
with Connection(conn):
worker = Worker(list(map(Queue, listen)))
--->>> worker.work() <<<---
except Exception as e:
print(f"{str(e)}")
During the dyno startup, I get the following errors:
heroku[web.1]: State changed from crashed to starting
heroku[web.1]: Starting process with command bash heroku.sh
app[web.1]: Command # 2 (HSET rq:worker:2bd1d3d3a7f04d2fb7a09b34786689bf
birth 2020-05-23T11:28:04.485474Z
last_heartbeat 2020-05-23T11:28:04.485474Z
queues default pid 11
hostname b428750b-5d7f-4837-b9c2-b2195e234978
version 1.4.1 python_version 3.7.4 (default, Sep 12 2019, 01:19:52)
app[web.1]: [GCC 7.4.0]) of pipeline caused error: wrong number of arguments for 'hset' command
heroku[web.1]: Process exited with status 0
heroku[web.1]: State changed from starting to crashed
Im fairly confident that the error is in the highlighted worker.work() line from the worker.py file, because I get the same error when I run heroku run python worker.py (or directly copying those lines into the heroku run python REPL interpreter)
I had a similar problem, and I fixed it by downgrading my version of rq from 1.4.1 to 1.3.0.
SOLVED was an issue with my package versions...
the original example I was using had the following package version dependencies:
redis = "==3.4.1"
rq = "==1.2.2"
but I was just using the most recently updated packages = "*" for both, and when I originally deployed the package in Heroku, it installed RQ version "1.4.0", but then (apparently) when my new app was deployed, the Heroku installer used the latest 1.4.1 package, which broke during the worker.work() method...
Worker RQ v 1.4.0 output:
{
'connection': Redis<ConnectionPool<Connection<host=pike.redistogo.com,port=10038,db=0>>>,
'hostname': 'e868b6d3-70b4-4e17-98c2-b96aae04d9a8',
'pid': 39,
'job_class': <class 'rq.job.Job'>,
'queue_class': <class 'rq.queue.Queue'>, 'version': '1.4.0',
**'python_version': '3.7.4 (default, Sep 12 2019, 01:19:52) \n[GCC 7.4.0]',**
**'serializer': <module 'pickle' from '/app/.heroku/python/lib/python3.7/pickle.py'>,**
'name': '05c7af34c21f44f0b4374b40525779af',
'queues': [Queue('default')],
'_exc_handlers': [], ... ...
}
Worker RQ v1.4.1 output:
{
'connection': Redis<ConnectionPool<Connection<host=pike.redistogo.com,port=11022,db=0>>>,
'hostname': '19a476fa-b4e6-4e63-b40e-ac8779ae0f9e',
'pid': 39,
'job_class': <class 'rq.job.Job'>,
'queue_class': <class 'rq.queue.Queue'>,
'version': '1.4.1',
**'python_version': '3.7.4 (default, Sep 12 2019, 01:19:52) \n[GCC 7.4.0]',**
**'serializer': <class 'rq.serializers.DefaultSerializer'>,**
'name': 'b63389b3499f4d73be889a33f3777b46',
'queues': [Queue('default')],
'_exc_handlers': [], ... ...
}
You can see a difference in the 'serializer' key between the two, possibly causing the HSET exception that was being throw...
Reverting to RQ v1.2.2 solved the issue for me; I probably could have used the same v1.4.0...

Why is the bot on Heroku Server not responding?

I programmed a simple bot that is connected to Google Sheets API. It asks a few questions to the user and stores it in the Sheet I have. When I use bot.polling it works fine, but I was trying the webhook so I could use Heroku to have my bot running 24/7.
I introduced the code:
#server.route('/' + token, methods=['POST'])
def getMessage():
bot.process_new_updates([telebot.types.Update.de_json(request.stream.read().decode("utf-8"))])
return "!", 200
#server.route("/")
def webhook():
bot.remove_webhook()
bot.set_webhook(url = '<Heroku App Link>' + token)
return "!", 200
if __name__ == "__main__":
server.run(host="0.0.0.0", port = int(os.environ.get('PORT', 5000)))
I have the requirements.txt file and the Procfile that currently is like this:
web: python3 bot.py
I tried a bunch of things in the Procfile but this is the only one that didn't gave an error.
When I saw the log this is what I got:
2019-09-24T10:18:54.369238+00:00 heroku[web.1]: State changed from crashed to starting
2019-09-24T10:19:04.416777+00:00 heroku[web.1]: Starting process with command `python3 bot.py`
2019-09-24T10:19:09.868915+00:00 heroku[web.1]: State changed from starting to up
2019-09-24T10:19:09.790773+00:00 app[web.1]: * Serving Flask app "bot" (lazy loading)
2019-09-24T10:19:09.790800+00:00 app[web.1]: * Environment: production
2019-09-24T10:19:09.790853+00:00 app[web.1]: WARNING: This is a development server. Do not use it in a production deployment.
2019-09-24T10:19:09.790940+00:00 app[web.1]: Use a production WSGI server instead.
2019-09-24T10:19:09.790991+00:00 app[web.1]: * Debug mode: off
2019-09-24T10:19:09.799470+00:00 app[web.1]: * Running on http://0.0.0.0:57826/ (Press CTRL+C to quit)
2019-09-24T10:19:15.000000+00:00 app[api]: Build succeeded
Can anyone help me understand why the bot doesn't answer any commands?

Deploy Phoenix application in Heroku, but result in application error

I'm deployed my application in Heroku, but although deploy ok, in browser show view:
The Only clue of error is in log of application:
: State changed from crashed to starting
2017-03-31T020 heroku[web.1]: Process exited with status 1
1]: Starting process with command `MIX_ENV=prod mix phoenix.server`
=INFO REPORT==== 31-Mar-2017::02:20:04 ===
application: boom_one
exited: {{shutdown,
{failed_to_start_child,'Elixir.BoomOne.Endpoint',
{shutdown,
{failed_to_start_child,'Elixir.Phoenix.Endpoint.Server',
{shutdown,
{failed_to_start_child,
{ranch_listener_sup,'Elixir.BoomOne.Endpoint.HTTP'},
{'EXIT',
{noproc,
{gen_server,call,
[ranch_server,
{set_new_listener_opts,'Elixir.BoomOne.Endpoint.HTTP',
16384,
[{env,
[{'_',[],
[{[<<"socket">>,<<"websocket">>],
[{dispatch,
[],'Elixir.Phoenix.Endpoint.CowboyWebSocket',
{'Elixir.Phoenix.Transports.WebSocket',
{'Elixir.BoomOne.Endpoint',
'Elixir.BoomOne.UserSocket',websocket}}},
{'Elixir.BoomOne.Endpoint',
{'_',[],'Elixir.Plug.Adapters.Cowboy.Handler',
[]}}]}]}]}]}]}}}}}}}}},
{'Elixir.BoomOne',start,[normal,[]]}}
type: permanent
** (exit) exited in: GenServer.call(Mix.State, {:get, {Map, :get, [:debug, false]}}, 5000)
** (EXIT) no process
(mix) lib/mix/cli.ex:65: Mix.CLI.run_task/2
(elixir) lib/code.ex:363: Code.require_file/2
(elixir) lib/gen_server.ex:596: GenServer.call/3
{"init terminating in do_boot",{noproc,{gen_server,call,[elixir_config,{get_and_put,at_exit,[]}]}}}
init terminating in do_boot ()
Crash dump is being written to: erl_crash.dump...done
1]: Process exited with status 1
1]: State changed from starting to crashed
I followed this phoenix page tutorial on heroku.
I encountered the same problem and solved it by hardcoding "secret_key_base" setting instead of reading from system environment variable. This solution is not generally recommended as it exposes the secret key in the code repository but it might help debug the issue.
The problem has solution. As mentioned in elixirforum.com, you have to change the elixir version from 1.3 to 1.4

Heroku: no module named flask though it is in requirements.txt

Can anyone help with this error on Heroku?
2015-01-12T22:26:00.575669+00:00 heroku[web.1]: Starting process with command `python hangman.py`
2015-01-12T22:26:01.066240+00:00 app[web.1]: Traceback (most recent call last):
2015-01-12T22:26:01.066256+00:00 app[web.1]: File "hangman.py", line 3, in <module>
2015-01-12T22:26:01.066262+00:00 app[web.1]: from flask import (
2015-01-12T22:26:01.066268+00:00 app[web.1]: ImportError: No module named flask
2015-01-12T22:26:01.742444+00:00 heroku[web.1]: Process exited with status 1
2015-01-12T22:26:01.751614+00:00 heroku[web.1]: State changed from starting to crashed
My git repo is here: https://github.com/PJBentham/Hangman
I believe I have done everything correct as far as set up is concerned but I'm still getting the above error and can't figure out why?
You have to show the last or pull traceback to debug further ...! You git repo working perfectly fine.
From your traceback snippet flask is not there. Check if your app environ having flask or not using heroku run pip freeze

Resources