IP addresses and detection of bot/spam traffic - proxy

I am trying to detect bot traffic to an application using a list of session IPs.
The simplistic solution would be to count occurrences of identical IPs and, if the count exceeds a threshold, say that the traffic is coming from a bot.
I got to thinking, did some research, and was left wondering:
Could traffic from a single IP be coming from multiple users behind a subnet or proxy, in which case it is definitely not a bot?
(Also, I don't really understand how subnetting or proxies work, so be gentle.)

There is more to a visit than just its IP, and it is possible to get different visitors from the same IP (especially if the visitor is using a dial-up connection).
The way I would catch bots is by process of elimination, from obvious to probable:
if userAgent is empty
if userAgent is short or not descriptive
if userAgent contains obvious signatures of rogue bots that I don't want accessing my site
if the visitor's average time on a page is less than 3 seconds, it's a bot
In these cases I bounce the hit.
And then the not so obvious:
I record the IP, timestamp and userAgent of every visit for 30 minutes and compare every new visit to the pool.
If an IP is accessing the site too quickly, it's most likely a bot
if an IP is accessing the site with different userAgents, it's probably a bot
In these cases I present a captcha.
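
A minimal sketch of the pool-based checks described above, in Python. The class and function names are hypothetical, and the thresholds (100 hits, 3 userAgents, 10 characters) are assumptions chosen for illustration; only the 30-minute window comes from the answer. The signature and time-on-page checks are left out to keep it short.

```python
from collections import deque
from dataclasses import dataclass

WINDOW_SECONDS = 30 * 60      # the 30-minute pool from the answer
MAX_HITS_PER_WINDOW = 100     # assumed threshold, tune for your traffic
MAX_USER_AGENTS = 3           # assumed threshold, tune for your traffic
MIN_UA_LENGTH = 10            # assumed cut-off for a "short" userAgent


@dataclass
class Visit:
    ip: str
    timestamp: float          # seconds, e.g. from time.time()
    user_agent: str


class VisitPool:
    def __init__(self):
        self.visits = deque()

    def classify(self, visit):
        """Return 'bounce', 'captcha' or 'ok' for a new visit."""
        # Obvious case first: empty or very short userAgent.
        if len(visit.user_agent.strip()) < MIN_UA_LENGTH:
            return "bounce"
        # Drop visits that fell out of the window, then add the new one.
        while self.visits and visit.timestamp - self.visits[0].timestamp > WINDOW_SECONDS:
            self.visits.popleft()
        self.visits.append(visit)
        # Compare the new visit against the pool for the same IP.
        same_ip = [v for v in self.visits if v.ip == visit.ip]
        agents = {v.user_agent for v in same_ip}
        if len(same_ip) > MAX_HITS_PER_WINDOW or len(agents) > MAX_USER_AGENTS:
            return "captcha"
        return "ok"
```

Returning "captcha" rather than blocking outright fits the concern in the question: several real users behind one proxy can share an IP, and a captcha lets them through.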

Related

How to improve/minimize varying response time of api

I created a REST API and I am not very happy with its performance. I spent some time investigating and stumbled across a tool to easily track the performance of my API (www.apiscience.com).
They split the overall response time into 4 categories: connect, resolve, processing and transfer. The resolve part often takes about 150ms while the processing of the call itself only takes about 18ms, which results in an average response time of 160ms (the call I tried here is really simple, so the average would normally be higher).
My question is: how can I improve/minimize the resolve time for my calls?
(side info: my servers are placed in Ireland and I chose Ireland as location for the tests too)
Thanks in advance!
Edit - What do they mean with Resolve Time?
(https://www.apiscience.com/blog/what-do-api-sciences-curl-based-timings-mean/)
API Science’s “Resolve Time” is the equivalent of Ken’s “DNS Lookup.”
DNS stands for Domain Name System. A URL consists of text (and
sometimes numbers); however, the communication addresses that compose
the Internet are formulated as IP (Internet Protocol) addresses, for
example, 208.80.152.2. Before a request can be routed between the
requesting client and the server that will process the request, the IP
address that the URL refers must be looked up. A request is sent to a
DNS resolver by curl, and the resolver returns the correlated IP
address. API Science’s “Resolve Time” is the time in milliseconds that
it took this operation to complete.
As the documentation mentions, the DNS resolution time is the amount of time an API-consuming client waits before finding out where to route the actual calls to your API server, i.e. the mapping between your server's name and its IP address.
Where you host your DNS can be completely independent of both where you host your API service and where your domain name is registered, and there are multiple choices in the market for DNS hosting. DNSPerf (with which I have no affiliation) compares these services and is probably a good starting point for further research if you'd like to select a new DNS provider.
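
To see the resolve time from your own client rather than through a monitoring service, a rough sketch in Python is shown below. The function name and the `resolver` parameter are hypothetical; `socket.getaddrinfo` is the standard-library lookup call. Note that the operating system or a local resolver may cache answers, so repeated calls for the same name can look much faster than the first.

```python
import socket
import time


def resolve_time_ms(hostname, resolver=socket.getaddrinfo):
    """Return how long a single name lookup took, in milliseconds."""
    start = time.perf_counter()
    resolver(hostname, 443)   # the port does not affect the DNS query itself
    return (time.perf_counter() - start) * 1000.0
```

Calling `resolve_time_ms("api.example.com")` (a placeholder hostname) before and after switching DNS providers gives a crude comparison from one vantage point.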

How to protect websocket connection ip from being modified

I am working on a small project to help me understand websockets better. I am making a simple browser game that connects to an IP via a websocket. There will be 3 IP addresses; however, I want to assign the user an IP and not let them modify it, so that they are unable to get on the same server as friends.
I will assign the IP based on how full the games are etc., and this will be done via PHP. Currently, although it connects to this IP, the user is able to use the browser console to change the IP to one of the other ones.
I was thinking of sending a check number: the web server sends it to the user along with the IP, and also sends it to the websocket server. Then, when a user connects, the websocket server rejects the connection if the check number doesn't match.
I'm new to websockets so I'm not sure if this would be easy to implement. Are there any easy solutions to this?
That seems to be the duty of another component, in particular the load balancer. How are you balancing the requests across those 3 servers? Does your load balancer support sticky sessions?
If not, you can probably record which IP address the user connected to first, and then if they connect to one of the other two later, return an HTTP 302 (redirect) pointing to the server you want.
Cheers.
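
For what it's worth, the "check number" idea from the question can also be sketched without the web server having to contact the websocket server for every user: the web server signs the assignment with a secret that all servers share, and the game server verifies the signature. This is only an illustration in Python (the question uses PHP); the function names, the token layout and the secret are assumptions, and it presumes user IDs never contain the `|` separator.

```python
import hashlib
import hmac

SECRET = b"shared-secret-known-to-all-servers"  # placeholder value


def issue_token(user_id, server_ip, expires_at, secret=SECRET):
    """Web server side: sign the assignment handed to the browser."""
    message = f"{user_id}|{server_ip}|{expires_at}".encode()
    signature = hmac.new(secret, message, hashlib.sha256).hexdigest()
    return f"{user_id}|{server_ip}|{expires_at}|{signature}"


def accept_connection(token, my_ip, now, secret=SECRET):
    """Game server side: accept only unexpired tokens issued for this server."""
    try:
        user_id, server_ip, expires_at, signature = token.split("|")
        expiry = int(expires_at)
    except ValueError:
        return False              # malformed token
    message = f"{user_id}|{server_ip}|{expires_at}".encode()
    expected = hmac.new(secret, message, hashlib.sha256).hexdigest()
    return (hmac.compare_digest(signature.encode(), expected.encode())
            and server_ip == my_ip
            and now <= expiry)
```

A user who edits the IP in the browser console ends up presenting a token signed for a different server, so the connection is rejected.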

Redirect To Specific Page

What is the problem when we cannot connect to a specific domain?
For example, we cannot visit hotmail.com.
Without more information it's hard to tell but here are a few possibilities:
An issue on your connection. If you can visit other remote sites, that's obviously not the problem.
An issue on one of your ISP connections. Can you visit other sites in the same area/country as the site that you cannot visit?
An explicit filter that restricts access to that site. For example, some ISPs block YouTube, corporations may block their competitors' networks, governments block sites that allow their political opponents to speak up, educational institutions (attempt to) block porn sites and aware parents block as much as they can on the computers of their children.
A DNS server issue that does not allow that site to be resolved. If you know its IP address you can try that directly.
Connectivity problems at that remote site or its ISP. A DDoS attack on the network of an ISP or hosting provider can easily disable a large number of sites at the same time.
The problem site could simply be experiencing server problems or be overloaded. Major sites like Hotmail are far less likely to be affected like this, although a DDoS attack can bring a site to its knees.
Someone in your corner of the Internet (or you, for that matter) has been bad (sic), and the remote site has temporarily blocked your IP address range to protect themselves.
There are other alternatives, of course, but debugging network issues is impossible with a problem description of "it don't works anymore"...
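
A couple of the possibilities above (a DNS problem versus a connectivity problem) can be told apart with a quick script. The following is a rough sketch in Python; the function name, the returned labels and the injectable `resolve`/`connect` parameters are made up for illustration. It cannot distinguish a filter from an outage: both simply show up as a failed connection.

```python
import socket


def diagnose(hostname, port=443, timeout=5.0,
             resolve=socket.gethostbyname,
             connect=socket.create_connection):
    """Return a rough label for where a connection attempt fails."""
    try:
        address = resolve(hostname)          # DNS step
    except OSError:
        return "dns-failure"
    try:
        connection = connect((address, port), timeout)   # TCP step
    except OSError:
        return "connect-failure"
    connection.close()
    return "reachable"
```

Running `diagnose("hotmail.com")` and comparing the result with another site in the same region narrows down which item on the list applies.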

I can't add the last Google Apps Email MX record because Zerigo DNS only allows 10 records. Will it matter?

I am using Heroku for my site hosting and Zerigo for DNS management. I'm trying to set up Google Apps email, but I can't add the last MX record (ASPMX5.GOOGLEMAIL.COM.) because I have hit my limit of 10 total records on Zerigo. Will it matter if I don't add this last record?
No, it won't. Google Apps gives you that many MX servers for heavy redundancy in case of problems. Each MX record names a server to attempt delivery to and a priority (i.e. where that server stands in line to receive mail for the domain if the others aren't available). You weren't able to add the lowest-priority server to your DNS. Normally no mail server would attempt to connect to it (aspmx5.googlemail.com) unless all the others were unavailable. In an instance like that we'd probably have bigger problems to worry about :)
It's very unlikely that it will matter. Each MX record is given a priority, so whenever an email server is delivering to that domain, it will go through the records in ascending order of priority value until it finds a server that responds.
Google's servers are very stable, so if they're to the point that 9 previous records are failing, chances are good that it's catastrophic and your email will be down regardless of DNS.
tl;dr: you're fine
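
The fallback behaviour both answers describe can be sketched in a few lines of Python. The hostnames and priority values below are placeholders, not Google's actual records, and the function names are hypothetical. Real mail servers typically pick at random among records with equal priority; this sketch just sorts them.

```python
def delivery_order(mx_records):
    """Sort (priority, hostname) pairs the way a sending server tries them."""
    return [host for _, host in sorted(mx_records)]


def first_available(mx_records, is_up):
    """Return the first host, in priority order, for which is_up(host) is true."""
    for host in delivery_order(mx_records):
        if is_up(host):
            return host
    return None


records = [
    (10, "mx3.example.com"),   # lowest priority: tried last
    (1, "mx1.example.com"),    # lowest number: tried first
    (5, "mx2.example.com"),
]
```

Dropping the last entry from `records` only changes the outcome when every earlier host is down, which is the point made above.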

Why do some websites require "www"? [duplicate]

When browsing through the internet for the last few years, I'm seeing more and more pages getting rid of the 'www' subdomain.
Are there any good reasons to use or not to use the 'www' subdomain?
There are a ton of good reasons to include it, the best of which is here:
Yahoo Performance Best Practices
Due to the dot rule with cookies, if you don't have the 'www.' then you can't set two-dot cookies or cross-subdomain cookies a la *.example.com. There are two pertinent impacts.
First it means that any user you're giving cookies to will send those cookies back with requests that match the domain. So even if you have a subdomain, images.example.com, the example.com cookie will always be sent with requests to that domain. This creates overhead that wouldn't exist if you had made www.example.com the authoritative name. Of course you can use a CDN, but that depends on your resources.
Also, you then don't have the ability to set a cross-subdomain cookie. This seems evident, but this means allowing authenticated users to move between your subdomains is more of a technical challenge.
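
The matching rule behind this can be sketched in Python as below. This is a simplified version of the domain-match test from RFC 6265 (it ignores the IP-address special case), and it assumes the cookie was set with an explicit Domain attribute; in current browsers a cookie set without one is sent only to the exact host that set it. The function name is hypothetical.

```python
def domain_match(request_host, cookie_domain):
    """Would a cookie with this Domain be sent to this host?"""
    host = request_host.lower()
    domain = cookie_domain.lower().lstrip(".")
    # Either the same host, or a subdomain of the cookie's domain.
    return host == domain or host.endswith("." + domain)
```

With these rules, `domain_match("images.example.com", "example.com")` is true, while `domain_match("images.example.com", "www.example.com")` is false, which is the overhead argument above in miniature.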
So ask yourself some questions. Do I set cookies? Do I care about potentially needless bandwidth expenditure? Will authenticated users be crossing subdomains? If you're really concerned with inconveniencing the user, you can always configure your server to take care of the www/no www thing automatically.
See dropwww and yes-www (saved).
Just after asking this question I came over the no-www page which says:
...Succinctly, use of the www subdomain
is redundant and time consuming to
communicate. The internet, media, and
society are all better off without it.
Take it from a domainer: use both www.domainname.com and the bare domainname.com, otherwise you are just throwing your traffic away to the browser's search engine (DNS error).
Actually it is amazing how many domains out there, especially amongst the top 100, correctly resolve for www.domainname.com but not domainname.com
There are MANY reasons to use the www sub-domain!
When writing a URL, it's easier to handwrite and type "www.stackoverflow.com", rather than "http://stackoverflow.com". Most text editors, email clients, word processors and WYSIWYG controls will automatically recognise both of the above and create hyperlinks. Typing just "stackoverflow.com" will not result in a hyperlink, after all it's just a domain name.. Who says there's a web service there? Who says the reference to that domain is a reference to its web service?
What would you rather write/type/say.. "www." (4 chars) or "http://" (7 chars) ??
"www." is an established shorthand way of unambiguously communicating the fact that the subject is a web address, not a URL for another network service.
When verbally communicating a web address, it should be clear from the context that it's a web address so saying "www" is redundant. Servers should be configured to return HTTP 301 (Moved Permanently) responses forwarding all requests for #.stackoverflow.com (the root of the domain) to the www subdomain.
In my experience, people who think WWW should be omitted tend to be people who don't understand the difference between the web and the internet and use the terms interchangeably, like they're synonymous. The web is just one of many network services.
If you want to get rid of www, why not change your HTTP server to use a different port as well? TCP port 80 is sooo yesterday.. Let's change that to port 1234, YAY now people have to say and type "http://stackoverflow.com:1234" (aitch tee tee pee colon slash slash stack overflow dot com colon one two three four) but at least we don't have to say "www" eh?
There are several reasons, here are some:
1) The person wanted it this way on purpose
People use DNS for many things, not only the web. They may need the main dns name for some other service that is more important to them.
2) Misconfigured dns servers
If someone does a lookup of www to your dns server, your DNS server would need to resolve it.
3) Misconfigured web servers
A web server can host many different web sites. It distinguishes which site you want via the Host header. You need to specify which host names you want to be used for your website.
4) Website optimization
It is better to not handle both, but to forward one with a moved permanently http status code. That way the 2 addresses won't compete for inbound link ranks.
5) Cookies
To avoid problems with cookies not being sent back by the browser. This can also be solved with the moved permanently http status code.
6) Client side browser caching
Web browsers may not cache an image if you make a request to www and another without. This can also be solved with the moved permanently http status code.
There is no huge advantage to including-it or not-including-it and no one objectively-best strategy. “no-www.org” is a silly load of old dogma trying to present itself as definitive fact.
If the “big organisation that has many different services and doesn't want to have to dedicate the bare domain name to being a web server” scenario doesn't apply to you (and in reality it rarely does), which address you choose is a largely cultural matter. Are people where you are used to seeing a bare “example.org” domain written on advertising materials, would they immediately recognise it as a web address without the extra ‘www’ or ‘http://’? In Japan, for example, you would get funny looks for choosing the non-www version.
Whichever you choose, though, be consistent. Make both www and non-www versions accessible, but make one of them definitive, always link to that version, and make the other redirect to it (permanently, status code 301). Having both hostnames respond directly is bad for SEO, and serving any old hostname that resolves to your server leaves you open to DNS rebinding attacks.
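
As a rough illustration of that advice, here is a sketch in Python of the decision a server would make from the Host header. The hostnames, the function name and the choice of 400 for unknown hosts are assumptions; in practice this usually lives in the web server's configuration rather than in application code.

```python
CANONICAL_HOST = "www.example.org"                  # hypothetical choice
KNOWN_HOSTS = {"www.example.org", "example.org"}    # names you actually serve


def route(host_header, path):
    """Return (status, location) for a request based on its Host header."""
    host = host_header.lower().split(":")[0]        # strip any port
    if host not in KNOWN_HOSTS:
        return 400, None                            # refuse unknown hostnames
    if host != CANONICAL_HOST:
        # Permanent redirect to the definitive hostname.
        return 301, f"https://{CANONICAL_HOST}{path}"
    return 200, None
```

Refusing hostnames outside the known set is what addresses the DNS rebinding concern; the 301 is what keeps the two names from competing with each other.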
Apart from the load optimization regarding cookies, there is also a DNS-related reason for using the www subdomain: you can't use a CNAME for the naked domain. On yes-www.org (saved) it says:
When using a provider such as Heroku or Akamai to host your web site, the provider wants to be able to update DNS records in case it needs to redirect traffic from a failing server to a healthy server. This is set up using DNS CNAME records, and the naked domain cannot have a CNAME record. This is only an issue if your site gets large enough to require highly redundant hosting with such a service.
As jdangel points out the www is good practice in some cookie situations but I believe there is another reason to use www.
Isn't it our responsibility to care for and protect our users? As most people expect www, you will give them a less than perfect experience by not programming for it.
To me it seems a little arrogant not to set up a DNS entry just because in theory it's not required. There is no overhead in carrying the DNS entry, and visitors can still be redirected to a non-www address.
Seriously, don't lose valuable traffic by leaving your potential visitor with an unnecessary "site not found" error.
Additionally in a windows only network you might be able to set up a windows DNS server to avoid the following problem, but I don't think you can in a mixed environment of mac and windows. If a mac does a DNS query against a windows DNS mydomain.com will return all the available name servers not the webserver. So if in your browser you type mydomain.com you will have your browser query a name server not a webserver, in this case you need a subdomain (eg www.mydomain.com ) to point to the specific webserver.
Some sites require it because the service is configured on that particular set up to deliver web content via the www sub-domain only.
This is correct as www is the conventional sub-domain for "World Wide Web" traffic.
Just as port 80 is the standard port. Obviously there are other standard services and ports as well (http tcp/ip on port 80 is nothing special!)
Imagine mycompany...
mx1.mycompany.com 25 smtp, etc
ftp.mycompany.com 21 ftp
www.mycompany.com 80 http
Sites that don't require it basically have forwarding in dns or redirection of some-kind.
e.g.
*.mycompany.com 80 http
The only reason to do it, as far as I can see, is if you prefer it and you want to.

Resources