Heroku - prevent scraping

I am looking at my log drain and I see this:
327 <158>1 2018-04-17T22:03:27.578702+00:00 heroku router - - at=info method=GET path="/{url}" host={my_host} request_id=11bb9b05-dea3-42c2-b57a-9be6fb9b93d2 fwd="80.6.26.72,141.101.107.25" dyno=web.1 connect=0ms service=1ms status=200 bytes=6265 protocol=http
I am certain that this request doesn't come from a legitimate user. How can I dig in further and get the IP of the remote machine making it? I used https://stackoverflow.com/a/6837689/2513428 inside my script to check the IPs, but I assume it returned the IP of Heroku's proxy servers.

Heroku makes the IP address of the client making the request available in the fwd field of the router log: https://devcenter.heroku.com/articles/http-routing#heroku-router-log-format
You can also read it within your code by looking at the X-Forwarded-For HTTP header.
So in your case, the IP of the client making this request was 80.6.26.72.
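Here is a minimal sketch of reading that header inside a Python WSGI app. Flask runs on WSGI, so the same value is available there through `request.environ`. The function and app names are hypothetical. The sketch assumes, as in the log above, that the client address is the leftmost entry. Keep in mind that this entry is supplied by whoever sent the request and can be forged, so treat it as a strong hint rather than proof when deciding what to block.

```python
from typing import Callable, Iterable, Optional


def leftmost_forwarded_ip(xff: Optional[str]) -> Optional[str]:
    """Return the first (leftmost) address in an X-Forwarded-For value, or None."""
    if not xff:
        return None
    first = xff.split(",")[0].strip()
    return first or None


def app(environ: dict, start_response: Callable) -> Iterable[bytes]:
    # WSGI exposes request headers as HTTP_<NAME> keys in environ.
    client_ip = leftmost_forwarded_ip(environ.get("HTTP_X_FORWARDED_FOR"))
    # REMOTE_ADDR is the immediate peer, which on Heroku is typically the router,
    # not the client (the likely reason the original script saw a proxy address).
    peer = environ.get("REMOTE_ADDR", "")
    body = f"client={client_ip} peer={peer}\n".encode("utf-8")
    start_response("200 OK", [("Content-Type", "text/plain"),
                              ("Content-Length", str(len(body)))])
    return [body]


print(leftmost_forwarded_ip("80.6.26.72,141.101.107.25"))  # 80.6.26.72
```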

Related

Heroku Request blocked

In my application (deployed on Heroku), a GET request is blocked at the infrastructure layer and never gets to execute my application's code. It returns status=400 with connect=0ms, and the log line carries no associated Heroku error code or description. The request never reaches the application.
It only happens with this GET request, and only when it comes from the production server. If I make the same request from Postman, it is received correctly with status=200.
The other requests have no problem and execute correctly from the production server.
This is an example:
2021-08-20T10:27:02.217551+00:00 heroku[router]: at=info method=GET path="/api/get" host=myapp.herokuapp.com request_id=2920634e-87f2-4b2c-be60-b38497c53e58 dyno=web.1 connect=0ms service=1ms status=400 bytes=47 protocol=https
The problem was identified and corrected: one of the headers of the GET request was being sent with a null value, so the request was rejected as a Bad Request before it reached the app.
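On the calling side, the fix is to avoid emitting a header whose value is null. Here is a minimal sketch in Python. It assumes the production client builds its headers from a dictionary in which some values may be None. The original client's language is not stated, and the helper name `build_headers` and the header names below are hypothetical.

```python
import urllib.request
from typing import Mapping, Optional


def build_headers(raw: Mapping[str, Optional[str]]) -> dict:
    """Drop headers whose value is None instead of sending them."""
    return {name: value for name, value in raw.items() if value is not None}


# Hypothetical header set in which a token lookup failed and returned None.
raw_headers = {"Accept": "application/json", "X-Api-Token": None}
req = urllib.request.Request(
    "https://myapp.herokuapp.com/api/get",
    headers=build_headers(raw_headers),
    method="GET",
)
# Only the non-null header is attached; nothing is sent until urlopen(req) is called.
print(req.header_items())  # [('Accept', 'application/json')]
```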

MeteorJS high response time on Heroku

I am running a Meteor app on Heroku, and the Heroku response time chart is constantly filled with slow requests hitting the 30-second limit.
The router logs show that all of the 30-second requests are some kind of SockJS XHR call:
14 Aug 2021 11:37:33.249334 <158>1 2021-08-14T03:37:32.837295+00:00 heroku router - - at=info method=POST path="/sockjs/950/pwiv10fk/xhr" host=www.abc.com request_id=edbe326a-cddd-4e5e-9347-0f5c1c49651b fwd="121.6.80.103,162.158.165.53" dyno=web.1 connect=0ms service=29493ms status=200 bytes=384 protocol=http
14 Aug 2021 11:37:33.358335 <158>1 2021-08-14T03:37:32.837093+00:00 heroku router - - at=info method=POST path="/sockjs/707/7haw05h4/xhr" host=www.abc.com request_id=20ca52a2-5aa4-43fa-bb61-39462eb0ab86 fwd="121.6.80.103,162.158.167.169" dyno=web.1 connect=0ms service=29484ms status=200 bytes=384 protocol=http
14 Aug 2021 11:38:03.405335 <158>1 2021-08-14T03:38:02.836998+00:00 heroku router - - at=info method=POST path="/sockjs/950/pwiv10fk/xhr" host=www.abc.com request_id=f3a51bc3-9319-4b80-91b5-ed76bf179493 fwd="121.6.80.103,162.158.166.136" dyno=web.1 connect=0ms service=29480ms status=200 bytes=384 protocol=http
In reality there are only a few users on the app and no unusual activity on the server. The users did not experience any slowdown either, so I suspect that the Meteor app's socket connection mechanism is polluting the Heroku response time charts. I also had to turn off the slow response time alerts because there were too many of them.
These non-stop 30-second calls create a problem: I do not really know the true speed of the web app.
What is happening and how to avoid this?
It seems that the issue was introduced by Cloudflare. They seem to have been interrupting the websocket connections, causing Meteor to fall back to polling, which is not efficient.
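Until the WebSocket problem is fixed, you can still get a more meaningful number from the router logs by leaving the SockJS XHR requests (the long-lived polling calls) out of the response-time statistics. This is a rough sketch, not a fix for the Cloudflare issue. It assumes router lines in the format shown above and that every `/sockjs/` path belongs to Meteor's transport. The function names and the choice of the median are arbitrary, for illustration only.

```python
import re
import statistics
from typing import Iterable, List

# key=value pairs in a Heroku router line; values may be double-quoted.
FIELD_RE = re.compile(r'(\w+)=("[^"]*"|\S+)')


def router_fields(line: str) -> dict:
    """Parse key=value pairs from one router log line, stripping quotes."""
    return {k: v.strip('"') for k, v in FIELD_RE.findall(line)}


def app_service_times(lines: Iterable[str]) -> List[int]:
    """Service times in ms for router lines, skipping SockJS transport requests."""
    times = []
    for line in lines:
        f = router_fields(line)
        service = f.get("service", "")
        if not service.endswith("ms"):
            continue  # not a router request line
        if f.get("path", "").startswith("/sockjs/"):
            continue  # polling fallback: duration is mostly the poll held open
        times.append(int(service[:-2]))
    return times


sample = [
    'heroku router - - at=info method=POST path="/sockjs/950/pwiv10fk/xhr" host=www.abc.com '
    'dyno=web.1 connect=0ms service=29493ms status=200 bytes=384 protocol=http',
    'heroku router - - at=info method=GET path="/" host=www.abc.com '
    'dyno=web.1 connect=0ms service=42ms status=200 bytes=6265 protocol=http',
]
times = app_service_times(sample)
print(times)                     # [42]
print(statistics.median(times))  # 42
```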

How do I suppress http request logs on heroku (using flask, if it matters)?

Very simple question: I'm running a simple Flask app on Heroku with no changes to the default logging settings, but my logs are filled with all kinds of terrible HTTP request noise.
For example, I don't have a favicon or anything like that set up on my app, and I don't need one. But every browser, of course, requests one, so whenever I try to look at my logs, I get floods of requests returning 404 for the favicon and such, which is totally useless information to me.
Example garbage logs (with sensitive information stripped):
2018-02-01T04:11:32.538658+00:00 heroku[router]: at=info method=GET path="/apple-touch-icon-precomposed.png" host=[MY_HOSTNAME_CENSORED] request_id=[A_UUID] fwd="[AN_IP_ADDRESS]" dyno=web.1 connect=0ms service=17ms status=404 bytes=386 protocol=https
2018-02-01T04:11:32.675406+00:00 heroku[router]: at=info method=GET path="/favicon.ico" host=[MY_HOSTNAME_CENSORED] request_id= fwd="[AN_IP_ADDRESS]" dyno=web.1 connect=0ms service=2ms status=404 bytes=386 protocol=https
I think these logs are generated by Heroku itself rather than by the application (that's what the bit after the timestamp means, right?), but I can't find any documentation on how to change that.
There's an earlier related SO question, but the latest relevant answer saying that you can't disable these logs is from 2014, so I'd like to think this might have changed.
Alternatively, is there some way to instruct browsers not to request favicons and such?
You could easily do this kind of filtering in whatever tool you are using for reading your logs.
For example, if you attach the Papertrail add-on to your Heroku app, you can easily configure it to filter out any log patterns you want, even if you are using their free plan.
Such configuration is done via the Papertrail "Settings" menu, under "Filter logs".
See Log Filtering for details.
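The same idea also works without an add-on if you filter the stream yourself. Here is a minimal sketch of a line filter, assuming you pipe the output of `heroku logs --tail` into it. The script name `filter_logs.py` and the pattern list are hypothetical; adjust the patterns to your own noise.

```python
import re
import sys
from typing import Iterable, Iterator

# Hypothetical noise patterns: browser icon probes that 404 on an app without icons.
NOISE = [
    re.compile(r'path="/favicon\.ico"'),
    re.compile(r'path="/apple-touch-icon[^"]*"'),
]


def drop_noise(lines: Iterable[str]) -> Iterator[str]:
    """Yield only the lines that match none of the NOISE patterns."""
    for line in lines:
        if not any(p.search(line) for p in NOISE):
            yield line


def main() -> None:
    """Read log lines from stdin and echo everything except the noise."""
    for line in drop_noise(sys.stdin):
        sys.stdout.write(line)
        sys.stdout.flush()  # keep output live while tailing


# Saved as a script that calls main() at the bottom, run it as:
#   heroku logs --tail | python filter_logs.py
```

Unlike the Papertrail filter, this only hides the lines from your own view. The router still emits them.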
There isn't any way to get rid of the router logs entirely. But if what really annoys you is the router output showing up while you live-tail your logs (which is what annoyed me), you can add "--source app" to the tail command to drop the router logs, like this:
heroku logs --tail --source app --remote whateveryounamedit
Then you'll only see logs generated by your app.

What does fwd mean in the heroku logs?

In the Heroku logs, I have the following line:
Aug 06 21:50:18 coolApp heroku/router: at=info method=GET path="/about.jpg" host=coolapp.com fwd="78.7.88.177,643.198.55.55" dyno=web.1 connect=0ms service=43ms status=500 bytes=420
My question is: what does fwd represent? I can see they are IP addresses. Are they the user's IP addresses?
From heroku docs:
fwd: HTTP request X-Forwarded-For header value
From Wikipedia:
The X-Forwarded-For (XFF) HTTP header field is a common method for identifying the originating IP address of a client connecting to a web server through an HTTP proxy or load balancer.
So I think you are right, at least when the HTTP requests come from a user's browser.
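By the usual X-Forwarded-For convention, the list is ordered: the leftmost entry is the originating client, and each proxy along the way appends the address it received the request from. Here is a minimal sketch that pulls the chain out of a router line like the one in the question. The function name is hypothetical, and the code does not validate that each entry is a well-formed IP address.

```python
import re
from typing import List

# Capture everything between the quotes of the fwd="..." field.
FWD_RE = re.compile(r'\bfwd="([^"]*)"')


def forwarded_chain(router_line: str) -> List[str]:
    """Return the fwd addresses from a router log line, leftmost first."""
    m = FWD_RE.search(router_line)
    if m is None:
        return []
    return [ip.strip() for ip in m.group(1).split(",") if ip.strip()]


line = ('heroku/router: at=info method=GET path="/about.jpg" host=coolapp.com '
        'fwd="78.7.88.177,643.198.55.55" dyno=web.1 connect=0ms service=43ms '
        'status=500 bytes=420')
chain = forwarded_chain(line)
print(chain[0])   # 78.7.88.177 (originating client, as reported)
print(chain[1:])  # ['643.198.55.55'] (addresses appended by proxies)
```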

Proximo Heroku addon is timing out

I have a simple Node proxy to which I have added Proximo so that I can use IP whitelisting for an API. It worked before, but after following the tutorial I now get this in my logs:
2014-08-19T16:23:21.376311+00:00 heroku[router]: at=error code=H12 desc="Request timeout" method=GET path="/?url=http://www.google.com" host=warm-cliffs-7633.herokuapp.com request_id=ecf77eea-a027-4115-86ff-5acf527c7333 fwd="82.24.137.140" dyno=web.1 connect=1ms service=30001ms status=503 bytes=623
If I try to access the page, I get an error message, but the page works fine if no URL is requested from the proxy.
There isn't much documentation, and I'm not sure whether this line in my Procfile is right:
web: bin/proximo node proxy.js
