Sitemap.xml has the same product in multiple URLs

I am sorry if this question has been asked before, but I really don't know what terms to use to look it up.
My sitemap (https://www.zeroohm.com/sitemap.xml) has the same product (resistor 470) repeated 5 times. Is this bad? Should I clean my sitemap so that each product appears only once?
I am concerned because the sitemap submits about 4,000 URLs, but Google indexes only about 1,130 of them.
Here is a sample:
https://www.zeroohm.com/stackpole-electronics-inc.-sei/resistor-470
https://www.zeroohm.com/components/discrete/resistors/resistor-470
https://www.zeroohm.com/components/discrete/resistor-470
https://www.zeroohm.com/components/resistor-470
https://www.zeroohm.com/resistor-470/
Thanks, folks!

Yes, you should. Avoid having duplicate content on different pages, and where you do, use the canonical URL tag. The sitemap as it stands also doesn't make much sense; it should follow a consistent logic. You can have links to the resistor-470 page spread throughout your site, but they should all point to the same page/URL, and the sitemap should list that single URL only.
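For reference, here is a minimal sketch of that canonical tag in a PHP product template. The chosen URL is an assumption (use whichever of the five variants you decide to keep); the point is that every path serving this product emits the same one:

    <?php
    // Hypothetical product template: whichever of the five resistor-470 paths
    // served this request, declare one preferred URL for search engines.
    // The URL below is an assumption -- keep whichever variant you prefer.
    $canonical = 'https://www.zeroohm.com/components/discrete/resistors/resistor-470';
    ?>
    <link rel="canonical" href="<?= htmlspecialchars($canonical, ENT_QUOTES) ?>">

The cleaned sitemap would then list only that same URL for the product.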

Related

Internationalizing Title/Meta Tags: OK or bad practice?

Is there a problem if I have both English and Chinese versions of the same title/meta tags under the exact same URL? I detect the language the user has set in the browser (through the HTTP "Accept-Language" header) and change the titles/meta tags based on that language. I get a large percentage of my traffic from China and felt this was a better-localized user experience for those users, but I have no idea how Google would view this. My gut feeling tells me that this is not good for SEO.
Baidu.com, a major Chinese search engine, does in fact pick up my translated tags; however, for other US-based sites it does not translate their English title/meta tags into Chinese, and I would think Chinese users are less likely to click on those.
Creating subdomains and/or separate domains for other countries is not an option at this point. That being said, should I use only one language (English) for my title/meta tags to avoid any search engine issues?
Thanks for any advice / wisdom you can offer. Really hoping to get clarity on best practices.
Thanks all!
Yes, it probably is a problem. Search engines see mixed-language content. You don't describe how you "detect and change the titles/meta tags based on the user's browser language", but you are probably doing it client-side and relying on the "browser language", which is flawed however it works in detail (it does not reliably specify the user's preferred language).
To get a more targeted answer, ask a more concrete question and include a URL.
If you want to get search traffic in both English and Chinese, you should have two URLs instead of one.
When Googlebot crawls a page, it does not even send the "Accept-Language" header, so you end up serving it your default language. With only one URL, there is no way for your second language to be indexed, and you won't rank in search engines in multiple languages.
For best SEO, use separate top-level domains, subdomains, or folders for different languages, for example:
http://example.de/
http://example.es/
http://example.com/
http://de.example.com/
http://es.example.com/
http://www.example.com/
http://example.com/de/
http://example.com/es/
http://example.com/en/
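One common way to tie such language-specific URLs together (an addition on my part, not mentioned in the answer above) is to cross-link them with hreflang alternate links. A minimal PHP sketch, with the example.com paths as placeholders:

    <?php
    // Hypothetical sketch: one URL per language, cross-linked with hreflang
    // link elements so search engines know which version targets which audience.
    // The example.com paths are placeholders.
    $alternates = [
        'en' => 'http://example.com/en/',
        'zh' => 'http://example.com/zh/',
    ];
    foreach ($alternates as $lang => $url) {
        printf(
            '<link rel="alternate" hreflang="%s" href="%s">' . "\n",
            $lang,
            htmlspecialchars($url, ENT_QUOTES)
        );
    }

Each language version would emit the same set of alternates, including one that points to itself.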
I think there is no problem when you use English and Chinese in the same meta tags.

#! (hashbang) and Google SEO [closed]

I've read over the Google specification for crawling AJAX-enabled pages. Since part of Google's indexing method uses the URL itself, will converting to #! negatively affect SEO?
For instance, if I have a page at www.mysite.com/surfing, Google will be likely to rate it highly if a user searches for "surfing" because it has "surfing" in the URL. Would the same be true for www.mysite.com/#!surfing or does it ignore the hash fragments for the purposes of weighting the URL itself?
Perhaps you have already read in the Google AJAX-crawling instructions that the #! is actually transformed into ?_escaped_fragment_= by the Google crawler. So let's use your example: for www.mysite.com/#!surfing, the Google crawler will see the link as www.mysite.com/?_escaped_fragment_=surfing. So the question becomes: what is better for Google SEO, a link with a ?_escaped_fragment_=surfing parameter, or one without, like /surfing?
Search engine representatives have confirmed on numerous occasions that URLs with more than 2 dynamic parameters may not be spidered unless they are perceived as significantly important (i.e. have many, many links pointing to them). So unless you're using too many parameters in the URL, you don't have much to worry about. If you haven't done it already, you can read the detailed Google documentation: https://developers.google.com/webmasters/ajax-crawling/docs/getting-started. Now, just a piece of advice: don't rely on # in your AJAX website. Use history.pushState() to change your URL to whatever you wish; I use #! only on browsers that don't support history.pushState(), like IE. The SEO problem with #! doesn't come from the URL itself but from the difficulty of the server-side processing needed to provide an HTML snapshot for the crawler.
The question is old. Google no longer supports AJAX crawling:
https://webmasters.googleblog.com/2015/10/deprecating-our-ajax-crawling-scheme.html
And this document is officially deprecated:
https://developers.google.com/search/docs/ajax-crawling/docs/getting-started
So don't use hashbangs in URLs.
Traditionally, from an SEO perspective, the hash character (#) is used to avoid the following issues:
- Cannibalization issues
- Affiliate URLs (here is a good article about how to use a hash for tracking purposes instead of a question mark in the URL)
- Showing limited content on the page (pagination issues)
The usage you are referring to is what Google recommends for making AJAX pages readable by Google: https://support.google.com/webmasters/answer/174992?hl=en
For more info about the hash character and its SEO benefits, check this blog post: https://digitalreadymarketing.com/adding-hash-in-urls-seo-benefits/
In my personal opinion, after 8 years in SEO and development, it won't do harm; it depends more on the site's other parameters, so adding the #! shouldn't hurt.
Do you have the site URL so I can take a more in-depth look?
That could cause a problem if Google's crawler thought there could be an infinite number of possibilities, as with a ? in the URL. But beyond that, the answer is clear:
website.com/oreo-cookies
is more semantic and easier to understand for both people and crawlers than
website.com/#!oreo-cookies
But is this going to have a major impact? If you were a client paying me for SEO, I would tell you that incoming text links with relevant keyword phrases from relevant, related websites are far more important. I would also say that if you are submitting an XML sitemap for Google to digest, and lots of popular websites are using #!, Google will figure it out and ignore it.
So, bottom line: if my content were worth linking to, and I made sure Google was finding and indexing all my pages, I would not worry about it.
I think it will not harm your SEO in any way. I have been in SEO for the last 5 years and haven't experienced such a problem yet, so don't worry about it. In my opinion, you can go ahead and add the #!; no harm!

Canonical, SiteMap and Index Files?

I've set all my website URLs to be displayed without any index.* references via .htaccess, making my canonical definitions simpler. My question is: do the sitemap.xml definitions also need to lose the index.* references?
The ultimate aim is not to confuse Google.
You probably need to give some examples for us to be 100% sure we understand you correctly.
But yes, naturally your XML sitemap should reflect your real URLs :)
So if instead of somedir/index.html you now use somedir/, your XML sitemap should reflect that :)
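As a rough illustration (the paths are made up), a PHP sitemap generator that emits the directory-style URLs rather than the index.* versions might look like this:

    <?php
    // Sketch of a sitemap generator that lists the directory-style URLs
    // (somedir/ rather than somedir/index.html). The paths are hypothetical.
    $pages = [
        'https://www.example.com/',
        'https://www.example.com/somedir/',
    ];

    header('Content-Type: application/xml; charset=utf-8');
    echo '<?xml version="1.0" encoding="UTF-8"?>' . "\n";
    echo '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">' . "\n";
    foreach ($pages as $loc) {
        printf("  <url><loc>%s</loc></url>\n", htmlspecialchars($loc, ENT_QUOTES));
    }
    echo "</urlset>\n";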

Pretty URLs Vs. Duplicate Content

I'm trying to clear up a grey area about this much-talked-about topic...
Like most devs, I've made some pretty URLs with mod_rewrite. My site's internal links point to the pretty URLs and things are working nicely.
But, I can still access the old URL if I point to it directly.
Now, this is most certainly going to cause duplicate content issues, so after doing some research it seems that 301 redirects are the way to go.
But.... and here's the grey bit...
If you are working on a site with thousands of URLs, what's the best practice to achieve this? I don't want to list 1k+ lines in .htaccess. I thought of a regexp in my rewrite rule, but my pretty URLs have names from the database in them... and I can't access that from .htaccess :)
Have I hit a dead end? Is there a way around this? Would Google's canonical tag be a possibility?
Well, I don't know if this is the "definitive" answer, but I have a bunch of "functional" URLs like:
http://www.flipscript.com/product.aspx?cid=7&pid=42&ds=asdjlf8i7sdfkhsjfd978
but I remap the URLs, link to them and list them in my site map as:
http://www.flipscript.com/ambigram-ring.aspx
I haven't seen ANY evidence that multiple URLs pointing to the same content within the same domain have any negative impact on SEO.
In fact, over the past year, I have climbed to the #1 position on Google with this in place for my primary keyword.
My theory about why this should be so is that Google applies the duplicate content penalty to entire "clone sites", not to merely linking to the same content with different URLs within a single site.
A quick and dirty way would be to route everything on the site through a PHP file that checks whether the path is still valid, querying the database if necessary, and issues a 301 redirect if the path has permanently moved. Soon enough these "grey URLs" should hardly ever come up, and search engine indexes should be updated, at which point you can remove the router.
If you could specify what your "grey URL" looks like, I may be able to suggest a better alternative.
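A rough sketch of that router approach in PHP (the database credentials, table, and column names are all hypothetical placeholders for whatever your schema actually looks like):

    <?php
    // Front-controller sketch: every request is funnelled here (e.g. via a
    // single mod_rewrite rule). Old-style URLs get a permanent 301 to the
    // current pretty URL; anything else is rendered as normal.
    $path = parse_url($_SERVER['REQUEST_URI'], PHP_URL_PATH);

    $pdo  = new PDO('mysql:host=localhost;dbname=shop', 'user', 'pass');
    $stmt = $pdo->prepare('SELECT pretty_path FROM pages WHERE old_path = ?');
    $stmt->execute([$path]);
    $pretty = $stmt->fetchColumn();

    if ($pretty !== false && $pretty !== $path) {
        header('Location: ' . $pretty, true, 301);  // permanent redirect
        exit;
    }
    // ...otherwise render the requested page as normal.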
"Would Google's canonical tag be a possibility??" -- Why not?
--> It automatically transfers page rank
--> Google recommends the canonical tag even if the content differs slightly but is more or less similar.
--> Too many 301 redirects to pages within site are bad for SEO (my personal experience with Bing).
--> Too many 301 redirects increase the effective load time of content for your users (especially bad if the ping time from their location to your server is high).

mod_rewrite and redundant/old URLs: some SEO best practices needed

Having a look at how Google perceives our site at the moment and coming up short...
Basically, we use a bog-standard URL-rewriting structure to make URLs look SEO-friendly.
For instance, a product URL takes the shape of any string_([0-9]+).html and so forth. Of course, this allows us to link to whatever we want before the product ID... which we have done. In the past, a product page was Product_Name_79.html and then became Brand_Name_Product_Name_79.html. Apache does not really care, and ID 79 gets passed on in either case. However, Google now has 2 versions of this product cached under different URLs, and that's not a good thing, as it keeps arriving at the first URL and spidering it.
The same thing applies to our rewrite rules for brands and categories, some of which have been dropped and some of which have been modified.
There are over 11k URLs in site:domain, whereas our sitemap has only some 5.8k. How would you prevent spiders from fetching older versions of URLs that you no longer link to (considering it's not a manual process and such URLs can often be very dynamic)?
E.g., Mens_Merrell_Trail_Running_Shoes__50-100__10____024/ is a dynamic URL for the Merrell brand, narrowed down to trail running shoes that cost between 50 and 100, in size 10, with gender set to men's.
If we decide to nofollow any size and price filter URLs, Google can still access them through its old cache...
What is the best practice for disallowing a particular type of URL? As the combinations above are nearly infinite, I cannot produce a list, and it certainly cannot be backdated against whatever brands and categories Google may hold for us historically.
Should we add noindex when such filters are applied? Should we exclude them in robots.txt? Do nothing and hope that Google stops returning?
To put it into perspective, we have 2,600 product page URLs that are now redundant/disabled. What would you do with them? Redirect to the homepage or a brand page, serve a 404, or do nothing?
Thanks for any advice.
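For what it's worth, the "noindex when such filters are applied" option raised above could look roughly like this in PHP, assuming the rewritten filter values reach the script as query parameters (the parameter names are made up):

    <?php
    // Sketch: keep filtered listing pages (price range, size, etc.) out of the
    // index while leaving their links crawlable, so stale filter URLs drop out
    // of the index over time. Adjust the hypothetical parameter names to
    // however mod_rewrite hands the filter values to your script.
    $isFiltered = isset($_GET['price_min'], $_GET['price_max'])
               || isset($_GET['size']);

    if ($isFiltered) {
        header('X-Robots-Tag: noindex, follow');
        // or emit <meta name="robots" content="noindex,follow"> in the page <head>
    }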
I think you're looking for rel="canonical". Google should start ignoring your links if they're really not linked to. You can check any incoming links with a tool like this: http://www.seomoz.org/linkscape.
Also, if your old URLs match (or don't match) a consistent pattern, you could set up a 301 redirect in Apache either for pages matching the old pattern or for pages not matching the new pattern...
Hope this helps!
Just be sure to set up redirects for any URL you change. Also, I don't recommend using rel=nofollow since it indicates to Google that your site is not trustworthy.

Resources