We use Joomla 3.4.4 for our company website, with mod_rewrite and SEF URLs enabled.
On our company website, we use categories only to organize articles internally, not for public access.
Nevertheless, Google has somehow found the category pages and displays them in the search results. People clicking on these category results land on a page with several articles, which is not intended.
How can I prevent Google from indexing the category pages?
I'll try to set the robots field in the category options to "noindex, follow". Hope this helps.
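If that works, the category pages should end up with a robots meta tag in their head along these lines (standard Joomla output, shown just for reference):

    <meta name="robots" content="noindex, follow" />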
A quick workaround: add some RewriteRules to .htaccess that redirect the unwanted category requests to the main page.
I scanned all the Google results, and by now I have about 10 RewriteRules for unwanted URIs.
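For anyone wanting to do the same, each rule is a one-liner. A minimal sketch, assuming RewriteEngine On is already in place (it is in Joomla's default .htaccess) and a hypothetical category slug of 23-internal:

    # Send an unwanted category URL (and anything under it) back to the home page
    RewriteRule ^23-internal(/.*)?$ / [R=301,L]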
This was a major problem with our websites. Google searches would show several unwanted categories, including their number prefix (10-videos). Clicking the Google result would show a dozen assorted articles that were all marked noindex, nofollow. Since the category itself was marked noindex, nofollow and the global default was noindex, nofollow, it was a complete mystery why this was happening.
After several years of frustration, I finally solved it. There are two parts: a long-term permanent solution and a short-term temporary solution, which will remove them from Google searches within 24 hours.
The long-term permanent solution is to disallow the categories in robots.txt; when the site is re-crawled, they should go away. Add the offending categories at the end of robots.txt (e.g. Disallow: /10-videos). This will also take care of their sub-categories. These are case-sensitive in Google, so make sure to use only lower case.
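For a concrete sketch, the appended lines (under the existing User-agent: * block of Joomla's default robots.txt) look like this; 10-videos was my offending category, and the second line is just a placeholder for any others:

    Disallow: /10-videos
    Disallow: /another-offending-category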
The short-term, 90-day solution is to manually remove the URLs in Google Search Console. This can currently only be done in the old version of Search Console; it is not yet available in the new version.
In the new Search Console click Index : Coverage. Make sure the Valid tab is highlighted. At the bottom, click on “Indexed, not submitted in sitemap” to see the offending links.
In the old version go to Google Index : Remove URLs. Click Temporarily Hide. Enter just the category (10-videos) as it will automatically add the website link. This will also take care of their sub-categories. By the next day the bad URLs will be gone (for 90 days). These are case sensitive so make sure to use only lower case.
I already added two versions, one with www and the other without, both pointing to the same location. I set the non-www version as the preferred one and added the sitemap only to the non-www version. I added all the links to the sitemap, yet my sitemap index status is still Pending. I have an article that is current and relevant, but when I search for its keyword in Google, I get all the way to the last results page and my website still doesn't appear, while websites that aren't even related to my search keyword do. It's a bit frustrating because I am confident my page is relevant and properly created, yet Google still doesn't show it in the search results :(
Is there anything I still need to do?
It takes a couple of days for Google to crawl through and update the results after you make each change. Wait a couple of days and see what happens.
Quick question: since I added the Magento cookie options in 1.7.0.2, Google has swapped my description (the bit of text under the main link in search results) for the text in my cookie confirmation box. Not only is this terrible for people who find us through Google, I doubt Googlebot will be all too pleased with it either. All my pages have descriptions set, but for some reason they are not being used; the cookie explanation text is used instead. Does anyone know how I can change this, or stop it happening?
Many Thanks
I was facing the exact same problem: Google was showing the cookie warning text as description in search results for my Magento store.
The problem turned out to be my meta description being too short. The solution for me was making the meta description longer, at least about 150 characters (including spaces).
What goes in your meta description tag is set in Magento's back office: System > Configuration > General > Design, under HTML Head, Default Description.
After saving, I cleared the cache and checked the page source to confirm the updated meta description was showing. To make things with Google go faster, I used their Webmaster Tools to submit the store URL for crawling. After a little wait, Google was showing the store's description in the search results just like it's supposed to.
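For reference, the page source should then contain something like this (hypothetical store text, written long enough to clear the roughly 150-character mark mentioned above):

    <meta name="description" content="Example Store sells handmade leather bags, wallets and belts. Browse the full collection, read about our workshop, and enjoy free shipping on orders over 50 euros." />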
Hope this can still help you!
Cheers
Could you paste your cookie confirmation box and how it works, as well as some of your meta descriptions?
Blank out as necessary, just need the gist of the structure.
I'm in the process of creating a sitemap for my website. I'm doing this because I have a large number of pages that users can normally only reach via a search form.
I've created an automated method for pulling the links out of the database and compiling them into a sitemap. However, for all the pages that are regularly accessible, and do not live in the database, I would have to manually go through and add these to the sitemap.
It strikes me that the regular pages are the ones that get found anyway by ordinary crawlers, so it seems like a hassle to manually add those pages and then make sure the sitemap stays up to date with any changes to them.
Is it bad to just leave those out, if they're already being indexed, and have my sitemap contain only my dynamic pages?
Google will crawl any URLs (as allowed by robots.txt) it discovers, even if they are not in the sitemap. So long as your static pages are all reachable from the other pages in your sitemap, it is fine to exclude them. However, there are other features of sitemap XML that may incentivize you to include static URLs in your sitemap (such as modification dates and priorities).
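If you do decide to list the static pages, a minimal entry with those extra fields looks like this (the URL and values are placeholders):

    <?xml version="1.0" encoding="UTF-8"?>
    <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
      <url>
        <loc>https://www.example.com/about.html</loc>
        <lastmod>2015-06-01</lastmod>
        <priority>0.5</priority>
      </url>
    </urlset>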
If you're willing to write a script to automatically generate a sitemap for database entries, then take it one step further and make your script also generate entries for static pages. This could be as simple as searching through the webroot and looking for *.html files. Or if you are using a framework, iterate over your framework's static routes.
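A minimal sketch of that extra step in Python, assuming the static pages are plain *.html files under a single webroot directory (the paths and URL prefix are assumptions to adjust):

    import os

    WEBROOT = "/var/www/html"             # assumption: where the static pages live
    BASE_URL = "https://www.example.com"  # assumption: public URL prefix

    def static_page_entries():
        # Yield a sitemap <url> entry for every .html file under the webroot.
        for dirpath, _dirnames, filenames in os.walk(WEBROOT):
            for name in filenames:
                if name.endswith(".html"):
                    path = os.path.join(dirpath, name)
                    rel = os.path.relpath(path, WEBROOT).replace(os.sep, "/")
                    yield "  <url><loc>{}/{}</loc></url>".format(BASE_URL, rel)

    # Append these lines to whatever your database-driven script already builds.
    for entry in static_page_entries():
        print(entry)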
Yes, I think it is a bad idea to leave them out. I think it would also be advisable to look for a way for your search pages to be found by a crawler without a sitemap. For example, you could add some kind of advanced search page where a user can select the search term in a form. Crawlers can also fill in those forms.
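As a small sketch of what such a page could contain, assuming the search runs over GET so the resulting URLs are plain links a crawler can reach (names are made up):

    <form action="search.php" method="get">
      <select name="term">
        <option value="widgets">widgets</option>
        <option value="gadgets">gadgets</option>
      </select>
      <button type="submit">Search</button>
    </form>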
I am building a forum site where the post is retrieved on the same page as the listing via AJAX. When a new post is shown, the URI fragment is changed (e.g. .php#1_This-is-the-first-post), and the title and meta tags are changed as well.
My question is this: I have read that search engines aren't able to use #these-words, so my entire site won't be able to be indexed (as it will look like one page).
What can I do to get around this, or at least make my sub-pages able to get indexed?
NOTE: I have built almost all of the site, so radical changes would be hard. SEO is my weakest geek-skill.
Add non-AJAX versions of every page, and link to them from your popups as "permalinks" (or whatever you want to call them). Not only are your pages unavailable to search engines, they also can't be bookmarked or emailed to friends. I recently worked with some designers on a site and talked them out of using an AJAX-only design. They ended up putting article "teasers" in popups and making users go to a page with a bookmarkable URL to read the complete texts.
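As a rough sketch of the idea (the class names and permalink URL are made up): each teaser keeps its AJAX behaviour, but also carries a real, crawlable URL.

    <div class="post-teaser">
      <!-- opened in the popup via AJAX, as the site does now -->
      <a href="#1_This-is-the-first-post" class="ajax-open">This is the first post</a>
      <!-- real URL that search engines can crawl and users can bookmark or email -->
      <a href="post.php?id=1" rel="bookmark">Permalink</a>
    </div>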
As difficult as it may be, the "best" answer may be to re-architect your site to use the hash-based URL scheme more sparingly.
Short of that, I'd suggest the following:
Create an alternative, non-hash based URL scheme. This is a must.
Create a site-map that allows search engines to find your existing pages through the new URL scheme.
Slowly port your site over. You might consider adding these deeper links on the page, or encouraging users to share those links instead of the hash-based ones, etc.
Hope this helps!
Situation: Google has indexed a page in a forum. The thread is now deleted. How (if at all) can I make Google and other search engines delete the cached copy? I doubt they would have anything against that, since the linked page does not exist anymore and keeping the index updated and valid should be in their best interest.
Is this possible or do I have to wait months for an index update? Or will the page now stay there forever?
I am not the owner of the site in question, so I can't change robots.txt, for example. I would like to force the update as a "third party".
I also noticed that a new page I created on that site two days ago is already in the cache. Given that, can I estimate how long it will take for a page that is no longer valid to be dropped from this domain?
EDIT: So I did the test. It took Google just under 2 months to drop the page. Quite a long time...
It's damn near impossible to get it removed; however, replacing the page with entirely blank content will ensure that you nuke its ranking when it is re-spidered.
You can't really make Google delete anything, except perhaps in extreme circumstances. You can adjust your robots.txt file to promote a revisit interval that might update things sooner, but if it is a low traffic site, you might not get a revisit very soon.
EDIT:
Since you are not the site owner, the most you could do is ask whoever runs the site to add "revisit-after" meta tags to the page, as discussed here.
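For completeness, that tag is usually written like this (whether a given search engine honours it is up to that engine):

    <meta name="revisit-after" content="7 days" />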
You can't make search engines remove the link, but don't worry: it will soon be removed on its own, since it is no longer active. You need not wait months for this to happen.
If your site is registered with Google Webmaster Tools, you can request to have pages removed from the index. It works; I tried and used it in the past.
EDIT: Since you are not the owner, I am afraid that this solution would not work.