Some weeks ago, we noticed someone requesting the robots.txt file on our site:
http://www.ourdomain.com/robots.txt
I've been doing some research, and it said that robots.txt sets the permissions for search engines on our site?
I'm not certain of that...
The reason why I'm asking this is because he is trying to get into that file once again today...
The thing is that we do not have this file on our website... So why is someone trying to access that file? Is it dangerous? Should we be worried?
We have tracked the IP address and it says the location is in Texas, and some weeks ago, it was in Venezuela... Is he using a VPN? Is this a bot?
Can someone explain what this file does and why he is trying to access it?
In a robots.txt (a simple text file) you can specify which URLs of your site should not be crawled by bots (like search engine crawlers).
The location of this file is fixed so that bots always know where to find the rules: the file named robots.txt has to be placed in the document root of your host. For example, when your site is http://example.com/blog, the robots.txt must be accessible from http://example.com/robots.txt.
Polite bots will always check this file before trying to access your pages; impolite bots will ignore it.
If you don’t provide a robots.txt, polite bots assume that they are allowed to crawl everything. To get rid of the 404s, use this robots.txt (which says the same: all bots are allowed to crawl everything):
User-agent: *
Disallow:
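As a rough illustration of the fixed location described above, here is a minimal Python sketch that derives the robots.txt URL a bot would request for any page on a host. The function name robots_url is made up for this example.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url: str) -> str:
    """Return the robots.txt URL for the host that serves page_url."""
    parts = urlsplit(page_url)
    # Keep only scheme and host; path, query and fragment are dropped,
    # because robots.txt always lives in the document root.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_url("http://example.com/blog"))  # http://example.com/robots.txt
```

This is why requests for /robots.txt show up in your logs even though nothing on your site links to that file.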
I uploaded a new .php file to GoDaddy via cPanel. It works locally, but when I try to run it on the live site, the console shows a 404. It seems like the server cannot see the new file.
Is there some sort of latency on GoDaddy servers, e.g. uploaded files only appearing X hours later? Or what else may be wrong?
No, cPanel is pretty close to instant. Be sure your .php file is named index.php and is in a folder that matches the desired URL. (This may not be what's preventing you from seeing the file; it's just a common issue on my quick checklist.) Do not use capital letters or spaces when naming your index file.
You also get hosting support over the phone with GoDaddy. This is their number: +1-480-463-8389.
But feel free to look up their support number on their website.
Two of our products, www.nbook.in and www.a4auto.com, have been removed from all search engines. These projects have subdomains, and the links from the subdomains are still available. The domain is not blacklisted. A custom sitemap was created and is still being indexed. I analyzed the URL with Google Search Console and it seems fine, but 'site:nbook.in' in Google doesn't produce any results. We used to have more than 75,000 links available on Google, and it was working fine until last week. This is affecting our products' Alexa rank and reach. There is no issue with the robots.txt file in the server document root folder; no rule is set to deny any bot or user agent. It is not just Google: all search engines removed our links, and this is confusing me. We have other products that are designed and developed the same way, and they have absolutely no issue. I don't think there is anything more to do in Google Search Console/Webmaster. I'm ready to provide more data on request. Please help.
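Create a robots.txt in the document root and put this in it. Leave Disallow empty so that all bots may crawl everything; "Disallow: /" would block the whole site:
User-agent: *
Disallow: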
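The difference between an empty Disallow and "Disallow: /" matters a lot for a site that has dropped out of search results, so it is worth checking. A minimal sketch using Python's standard urllib.robotparser; the helper name allowed and the agent name "ExampleBot" are assumptions for this example:

```python
from urllib.robotparser import RobotFileParser


def allowed(robots_txt: str, url: str, agent: str = "ExampleBot") -> bool:
    """Check whether `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


allow_all = "User-agent: *\nDisallow:"    # empty value: nothing is disallowed
block_all = "User-agent: *\nDisallow: /"  # "/" matches every path on the host

print(allowed(allow_all, "http://example.com/page"))  # True
print(allowed(block_all, "http://example.com/page"))  # False
```

If the second form is what the server returns, polite crawlers will stop fetching the site entirely.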
I'm looking to reference the sitemaps for multiple domain-name aliases, which are spun off by logic within a Laravel framework, in my robots.txt file, but I'm not quite sure what the correct way to do this is. The sitemaps exist and are present and correct; I'm just unsure about the format Google expects, so I'm really looking for SEO-based answers rather than ways to achieve this.
I'm thinking I can do this in robots.txt, i.e.:
Sitemap: https://www.main-domain.com/sitemap.xml
Sitemap: https://www.domain-alias1.com/sitemap.xml
Sitemap: https://www.domain-alias2.com/sitemap.xml
Any pro-seo tips would be much appreciated!
Assuming your code is about generating a proper robots.txt that references multiple sitemaps, then listing them one per line with the Sitemap: directive is the right way. Each entry must be a full (absolute) URL; placing the sitemaps at root level, alongside the robots.txt file, is the usual choice.
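To sanity-check that a parser picks up one sitemap per Sitemap: line, here is a small sketch using Python's standard urllib.robotparser (site_maps() requires Python 3.8 or later). The robots.txt text simply reuses the three lines from the question.

```python
from urllib.robotparser import RobotFileParser

robots_txt = """\
Sitemap: https://www.main-domain.com/sitemap.xml
Sitemap: https://www.domain-alias1.com/sitemap.xml
Sitemap: https://www.domain-alias2.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# site_maps() returns a list of URLs, or None when no Sitemap lines exist.
sitemaps = parser.site_maps()
for url in sitemaps or []:
    print(url)
```

This only shows how one standard-library parser reads the file; it says nothing about how a particular search engine treats sitemaps hosted on a different domain.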
I am using WordPress and the Easy Digital Downloads plugin to sell digital files.
I have the following how-to questions:
How can I prevent a user from seeing or using the direct download link?
How can I create a download link that expires, like a session?
How can I secure the wp-content/uploads folder?
You might be interested in this:
deny direct access to a folder and file by htaccess
Simply put a .htaccess file with the content "deny from all" in the folder.
Then only scripts from your webspace should be able to read files from there.
This should be a first step. You would then need a PHP file that serves the data instead of accessing those files directly.
E.g. like this: http://www.kavoir.com/2009/05/php-hide-the-real-file-url-and-provide-download-via-a-php-script.html
(I am not aware of WordPress plugins for this, so maybe just google for them. The link explains how to write such a script in PHP; if you can't do that, you're pretty much stuck with WordPress plugins.)
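For the expiring-link part of the question, one common approach is to sign the file name and an expiry timestamp with a server-side secret and have the serving script verify both before sending the file. The sketch below shows the idea in Python rather than PHP; it is not how Easy Digital Downloads or any particular plugin works, and SECRET_KEY, the /download path and all function names are hypothetical.

```python
import hashlib
import hmac
import time
from typing import Optional
from urllib.parse import urlencode

SECRET_KEY = b"change-me"  # hypothetical secret; must stay on the server


def sign(file_id: str, expires: int) -> str:
    """HMAC-SHA256 signature over the file id and expiry timestamp."""
    message = f"{file_id}:{expires}".encode()
    return hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()


def make_link(file_id: str, ttl_seconds: int, now: Optional[float] = None) -> str:
    """Build a download URL that stops validating after ttl_seconds."""
    start = time.time() if now is None else now
    expires = int(start + ttl_seconds)
    query = urlencode({"file": file_id, "expires": expires, "sig": sign(file_id, expires)})
    return f"/download?{query}"  # hypothetical endpoint of the serving script


def is_valid(file_id: str, expires: int, sig: str, now: Optional[float] = None) -> bool:
    """Reject expired links and links whose signature does not match."""
    current = time.time() if now is None else now
    if current > expires:
        return False
    return hmac.compare_digest(sig, sign(file_id, expires))
```

The serving script would call is_valid with the query parameters and only then read the file from the protected folder, so the real file path never appears in the link.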
Nah, nah, nah... It's easy, man: update your plugin and check your settings...
Step 2. You will see in your plugin settings that the download link in the forwarded email has a 24-hour expiration.
"WP Secure Links" is a WordPress plugin on CodeCanyon that lets you create a secure link to downloadable files.
http://codecanyon.net/item/wp-secure-links/4418678
http://sixthlife.net/product/wp-secure-links/
Securing the uploads folder needs some good .htaccess work.
Check the comments section of this; it's explained there.
Still relevant in 2020, so I have written a small plugin that enables secure and temporary downloads in WordPress:
WP DISPATCHER
It is free and simple. Check it out :)
I'm working on a simple bot for a project, and I noticed that a lot of sites do not list sitemaps in their robots.txt files. There is of course the option to simply crawl all possible pages of the sites in question, but that often takes much more time than simply downloading the sitemap.
What is the best way to detect a sitemap if it is not mentioned in robots.txt?
Normally it should be placed in the root directory of a domain, like xydomain.xyz/sitemap.xml.
I would only add the sitemap to the robots file if it is placed elsewhere. If a site uses more than one sitemap, located in other places, they should be listed in a sitemap index.
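For a bot, that suggests a simple fallback: when robots.txt lists no sitemap, probe a few conventional locations in the document root. A minimal sketch that only builds the candidate URLs; the list of paths is an assumption based on common naming, not a standard, and the function name is made up for this example.

```python
from typing import List
from urllib.parse import urlsplit, urlunsplit

# Conventional file names to probe; this list is an assumption, not a standard.
COMMON_SITEMAP_PATHS = ["/sitemap.xml", "/sitemap_index.xml"]


def candidate_sitemap_urls(site_url: str) -> List[str]:
    """Return root-level sitemap URLs worth requesting for the given site."""
    parts = urlsplit(site_url)
    return [
        urlunsplit((parts.scheme, parts.netloc, path, "", ""))
        for path in COMMON_SITEMAP_PATHS
    ]


for url in candidate_sitemap_urls("https://xydomain.xyz/some/page"):
    print(url)
```

The bot would request each candidate in turn and fall back to crawling only if none of them returns a valid sitemap.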
You can use this online tool to scan your site and create a bespoke sitemap.xml file for your site.
To help your sitemap be discovered through robots.txt, add the URL of your sitemap at the very top of your robots.txt file (see the example below).
So, the robots.txt file looks like this:
Sitemap: http://www.example.com/sitemap.xml
User-agent: *
Disallow: