Canonical, SiteMap and Index Files? - sitemap

I've set all my website URLs to be displayed without any index.* references, through .htaccess, which makes my canonical definitions simpler. My question is, do the sitemap.xml definitions also need to lose the index.* references?
The ultimate aim is not to confuse Google...

You probably need to give some examples for us to make 100% sure we understand you correctly.
But yes, naturally your XML sitemap should reflect your real URLs :)
So if instead of somedir/index.html you now use somedir/ your XML sitemap should reflect that :)
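As a rough illustration of keeping the sitemap in step with the rewritten URLs, a small Python helper could strip a trailing index.* file name from each URL before it is written to sitemap.xml. The function name and the example.com URLs are made up for illustration, and the pattern assumes that any index.<extension> file name should be dropped.

```python
import re
from urllib.parse import urlsplit, urlunsplit

# Matches a final path segment such as index.html or index.php
INDEX_FILE = re.compile(r"(^|/)index\.[A-Za-z0-9]+$")

def strip_index(url):
    """Return the URL with a trailing index.* file name removed from its path."""
    parts = urlsplit(url)
    path = INDEX_FILE.sub(r"\1", parts.path)
    return urlunsplit(parts._replace(path=path))

print(strip_index("https://example.com/somedir/index.html"))
```

URLs that already end in somedir/ pass through unchanged, so the helper can be run over the whole list.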

Related

Sitemap.xml has the same product in multiple URLS

I am sorry if this question has been asked before, but I really don't know what terms to use to look it up.
My sitemap: https://www.zeroohm.com/sitemap.xml
has the same product (resistor 470) repeated 5 times. Is this bad? Should I clean up my sitemap to show the same product only once?
I am concerned because the sitemap submits about 4000 URLs but Google indexes only about 1130 of them.
Here is a sample:
https://www.zeroohm.com/stackpole-electronics-inc.-sei/resistor-470
https://www.zeroohm.com/components/discrete/resistors/resistor-470
https://www.zeroohm.com/components/discrete/resistor-470
https://www.zeroohm.com/components/resistor-470
https://www.zeroohm.com/resistor-470/
thanks folks!
Yes, you should. Avoid having duplicate content on different pages, and if you cannot avoid it, use the canonical URL tag to point to the preferred page. Also, the sitemap as it stands doesn't make much sense; it should follow a consistent logic. You can have links to the resistor-470 page spread throughout your site, but they should all point to the same page/URL.
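To get a feel for how many sitemap entries are affected, one possible check is to group the URLs by their last path segment and list the groups with more than one entry. This is only a sketch: it assumes the last path segment identifies the product, which holds for the sample URLs in the question but may not hold for the whole site. The function name is made up.

```python
from collections import defaultdict
from urllib.parse import urlsplit

def duplicate_groups(urls):
    """Group URLs by last path segment and keep groups with more than one URL."""
    groups = defaultdict(list)
    for url in urls:
        # Ignore a trailing slash, then take the final path segment
        segment = urlsplit(url).path.rstrip("/").rsplit("/", 1)[-1]
        groups[segment].append(url)
    return {seg: found for seg, found in groups.items() if len(found) > 1}

sample = [
    "https://www.zeroohm.com/stackpole-electronics-inc.-sei/resistor-470",
    "https://www.zeroohm.com/components/discrete/resistors/resistor-470",
    "https://www.zeroohm.com/components/discrete/resistor-470",
    "https://www.zeroohm.com/components/resistor-470",
    "https://www.zeroohm.com/resistor-470/",
]
print(duplicate_groups(sample))
```

Each group would then be reduced to the one URL you choose as canonical, and only that URL listed in the sitemap.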

What does the sharp and exclamation mark (#!) stand for in a url? Don't even know how to look for an answer

I have seen these "domain.com/#!/" formatted URLs and, driven merely by curiosity, I chose to ask you people... what is that used for? A kind of "exclamated hashtag", if you know what I mean.
I see it on sites such as "hypem.com" or "buzzchips.com", both of them delivering asynchronous dynamic content in a similar way.
I uploaded a tiny shot just so you actually see what I see, here and there.
It appears to be a standard for allowing dynamically created content to be crawled.
You can see a good explanation of this under the SEO heading for the following answer:
https://softwareengineering.stackexchange.com/questions/46716/what-should-a-developer-know-before-building-a-public-web-site/46760#46760
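As far as I know, the convention in question is Google's AJAX crawling scheme, under which a crawler requests a "#!" URL with the fragment moved into an _escaped_fragment_ query parameter, so the server can return a static snapshot. Google has since deprecated the scheme, so the Python sketch below is a historical illustration only. The function name is made up, and the set of characters left unescaped is a simplification of the original specification.

```python
from urllib.parse import quote, urlsplit, urlunsplit

def escaped_fragment_url(url):
    """Map a #! URL to the form a crawler would have requested under the scheme."""
    parts = urlsplit(url)
    if not parts.fragment.startswith("!"):
        return url  # not a hashbang URL, nothing to map
    # Simplified escaping: keep common path characters, percent-encode the rest
    fragment = quote(parts.fragment[1:], safe="=/:,;!$'()*@?-._~")
    separator = "&" if parts.query else ""
    query = f"{parts.query}{separator}_escaped_fragment_={fragment}"
    return urlunsplit((parts.scheme, parts.netloc, parts.path, query, ""))

print(escaped_fragment_url("http://domain.com/#!/tracks/popular"))
```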

Pretty URLs Vs. Duplicate Content

I'm trying to clear up a grey area about this much talked about topic...
Like most devs, I've made some pretty URLs with mod_rewrite. My site's internal links point to the pretty URLs and things are working nicely.
But, I can still access the old URL if I point to it directly.
Now, this is most certainly going to cause duplicate content issues so after doing some research it seems that 301 redirects are the way to go.
But.... and here's the grey bit...
If you are working on a site with thousands of URLs, what's the best practice to achieve this? I don't want to list 1k+ lines in .htaccess. I thought of a regexp in my rewrite rule, but my pretty URLs have names from the database in them... and I can't access that from .htaccess :)
Have I hit a dead end? Is there a way around this? Would Google's canonical tag be a possibility??
Well, I don't know if this is the "definitive" answer, but I have a bunch of "functional" URLS like:
http://www.flipscript.com/product.aspx?cid=7&pid=42&ds=asdjlf8i7sdfkhsjfd978
but I remap the URLs, link to them and list them in my site map as:
http://www.flipscript.com/ambigram-ring.aspx
I haven't seen ANY evidence that different URLs pointing to the same content within the same domain have any negative impact on SEO.
In fact, over the past year, I have climbed to the #1 position on Google with this in place for my primary keyword.
My theory about why this should be so is that Google applies the duplicate content penalty for entire "clone sites", not for just linking with different URLs to the same content within a single site.
A quick and dirty way would be to re-route everything on the site via a PHP file that checks whether the path is still valid, querying the database if necessary. Use a 301 redirect if the path has permanently moved. Soon enough these "grey URLs" should hardly ever be requested, and search engines should have updated their indexes, at which point you can remove the router.
If you could specify what your "grey url" looks like I may be able to suggest a better alternative.
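The router idea can be sketched roughly as follows. The answer suggests PHP; this sketch uses Python only to show the logic. The old URL shape (/product.php?id=...), the pretty URL shape, and the SLUGS dictionary standing in for the database lookup are all hypothetical.

```python
from urllib.parse import parse_qs, urlsplit

# Hypothetical stand-in for a database query: product id -> pretty URL slug
SLUGS = {"42": "blue-widget"}

def route(url):
    """Return (status, location) for a request; location is set only for redirects."""
    parts = urlsplit(url)
    if parts.path == "/product.php":
        product_id = parse_qs(parts.query).get("id", [""])[0]
        slug = SLUGS.get(product_id)
        if slug is None:
            return 404, None  # unknown product: no redirect target
        return 301, f"/products/{slug}"  # permanently moved to the pretty URL
    return 200, None  # already a pretty URL: serve it normally

print(route("/product.php?id=42"))
```

Because the lookup happens in application code rather than in .htaccess, the database names are available without listing every URL by hand.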
"Would Google's canonical tag be a possibility??" -- Why not?
--> It automatically transfers page rank
--> Google recommends the canonical tag even if the content differs slightly but is more or less similar.
--> Too many 301 redirects to pages within a site are bad for SEO (my personal experience with Bing).
--> Too many 301 redirects increase the effective load time of content for your users (especially bad if the ping times from their location to your server are high).
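For reference, the canonical tag is a link element placed in the head of every variant of a page, all naming the same preferred URL. A minimal Python sketch that builds it (the helper name is made up; the URL is the pretty URL from the earlier answer):

```python
from html import escape

def canonical_link(preferred_url):
    """Build the <link rel="canonical"> element for a page's <head>."""
    return f'<link rel="canonical" href="{escape(preferred_url, quote=True)}">'

print(canonical_link("http://www.flipscript.com/ambigram-ring.aspx"))
```

Both the old "functional" URL and the pretty URL would emit the same element, so search engines can consolidate them without any redirect.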

What simple syntax can be used for rich text?

In an application, I want a simple text input enriched with some marks for formatting or semantic labeling. I want the syntax to be as easy as possible, and I want to include self-defined labels.
Example:
[bold]Stackoverflow[/bold] is a [tag]good[/tag] resource for programmers.
Tables would be needed too.
HTML/XML and LaTeX are powerful enough to allow this, but too complicated. Wiki syntax seems simple, but it uses a different symbol for each markup, has unclear quoting, and every wiki seems to have its own syntax. For tables and similar things, wiki syntax becomes very complicated.
Is there a language/syntax that matches my needs or can be slightly changed to do so? Or do I have to invent something myself? In that case, do you have suggestions?
Definitely do NOT invent your own. There are plenty of simple markup languages already, and users HATE learning new ones. Trust me on this!
I would suggest using one of the following:
Textile
Markdown
BBCode
Make your decision based on your userbase, as well as what tools and parsers are available in your chosen language. For my site, we went with Textile, but I've found that BBCode tends to be the language that most people already know. However, this will vary with different user demographics.
StackOverflow, along with several other sites, uses Markdown. I think it will give you the best balance between features and simplicity.
Let me add ReStructuredText to the list.
An additional benefit of using it is the availability of the "ReStructuredText to Anything" service, which makes it extremely easy to create HTML or PDF versions of the document.
As already pointed out, there are a lot of lightweight markup languages (many are listed in the Wikipedia article on the subject), so there should be no need to create your own.

What are the url parameters naming convention or standards to follow

Are there any naming conventions or standards to be followed for URL parameters? I generally use camel casing, like userId or itemNumber. As I am about to start a new project, I searched for anything on this and could not find it. I am not looking at this from the perspective of a language or framework, but more as a general web standard.
I recommend reading "Cool URIs don't change" by Tim Berners-Lee for an insight into this question. If you're using parameters in your URI, it might be better to rewrite them to reflect what the data actually means.
So instead of having the following:
/index.jsp?isbn=1234567890
/author-details.jsp?isbn=1234567890
/related.jsp?isbn=1234567890
You'd have
/isbn/1234567890/index
/isbn/1234567890/author-details
/isbn/1234567890/related
It creates a more obvious data structure, and means that if you change the platform architecture, your URIs don't change. Without the above structure,
/index.jsp?isbn=1234567890
becomes
/index.aspx?isbn=1234567890
which means all the links on your site are now broken.
In general, you should only use query strings when the user could reasonably expect the data they're retrieving to be generated, e.g. with a search. If you're using a query string to retrieve an unchanging resource from a database, then use URL-rewriting.
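A rough sketch of how the rewritten URIs could be resolved on the server side, independent of the platform's file extensions. The pattern, the set of view names (taken from the example above), and the function name are assumptions for illustration; a real project would normally use its framework's router or the web server's rewrite rules.

```python
import re

# Accepts a 10-character or 13-digit ISBN followed by one of the known views
ROUTE = re.compile(
    r"^/isbn/(?P<isbn>[0-9]{9}[0-9Xx]|[0-9]{13})/"
    r"(?P<view>index|author-details|related)$"
)

def resolve(path):
    """Return (view, params) for a rewritten URI, or None if it does not match."""
    match = ROUTE.match(path)
    if match is None:
        return None
    return match.group("view"), {"isbn": match.group("isbn")}

print(resolve("/isbn/1234567890/author-details"))
```

Whether the view is then served by a .jsp or .aspx page is an internal detail that never appears in the public URI.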
There are no standards that I'm aware of. Just be mindful of IE's URL length limit of 2,083 characters.
The standard for URIs is defined by RFC 2396 (since obsoleted by RFC 3986).
Anything after the standardized portion of the URL is left to you.
You probably only want to follow a particular convention on your parameters based on the framework you use.
Most of the time you won't really care, because these are not under your control. When they are, you probably want to at least be consistent and try to generate user-friendly bits that:
are short,
are easy to remember, if they are meant to be directly accessible by users,
are case-insensitive (this may be hard depending on the server OS),
follow some SEO guidelines and best practices, which may help you a lot.
I would say that cleanliness and user-friendliness are laudable goals to strive for when presenting URLs.
StackOverflow does a fairly good job of it.
I use lowercase. Depending on the technology you use, the query string is either treated as case-sensitive (e.g. PHP) or not (e.g. ASP). Using lowercase avoids possible confusion.
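If incoming links may already use mixed case, one option is to lowercase the parameter names when reading the query string, so userId and userid are treated alike. A minimal sketch (the function name is made up; values are left untouched on purpose, and a repeated name keeps only its last value):

```python
from urllib.parse import parse_qsl

def lowercase_params(query):
    """Parse a query string into a dict with lowercased parameter names."""
    return {name.lower(): value for name, value in parse_qsl(query)}

print(lowercase_params("userId=7&ItemNumber=42"))
```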
Like the other answers I've not heard about any conventions.
The only "standard" I would adhere to is to use the more search engine friendly practice of using a URL rewriter.
There are no standards that I know of, and case shouldn't matter.
However within your application (website), you should stick to your own standards. For your own sanity if nothing else.

Resources