I've read over the Google specification for crawling AJAX-enabled pages. Since part of Google's indexing method uses the URL itself, will converting to #! negatively affect SEO?
For instance, if I have a page at www.mysite.com/surfing, Google is likely to rank it well for a search on "surfing" because "surfing" is in the URL. Would the same be true for www.mysite.com/#!surfing, or does Google ignore the hash fragment when weighting the URL itself?
Perhaps you have already read in the Google AJAX-crawling instructions that the #! is actually transformed into ?_escaped_fragment_= by the Google crawler. So let's use your example:
for www.mysite.com/#!surfing, the Google crawler will see the link as www.mysite.com/?_escaped_fragment_=surfing. So it comes down to the question: which is better for Google SEO, a link with the parameter ?_escaped_fragment_=surfing or one without it, /surfing?
Search engine representatives have confirmed on numerous occasions that URLs with more than 2 dynamic parameters may not be spidered unless they are perceived as significantly important (i.e., have many, many links pointing to them). So unless you're using too many parameters in the URL, you don't have much to worry about. If you haven't done it already, you can always read the detailed Google documentation: https://developers.google.com/webmasters/ajax-crawling/docs/getting-started

Now, a piece of advice: don't rely on # in your AJAX website. Use history.pushState() to change your URL to whatever you wish. I use #! only on browsers that don't support history.pushState(), like IE. The SEO problem with #! doesn't come from the URL itself but from the difficulty of producing, on the server side, the HTML snapshot the crawler needs.
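To make that mapping concrete, here is a minimal illustrative sketch in Python (my own example, not anything Google shipped) of how a #! URL was rewritten into the ?_escaped_fragment_= form the crawler requested:

    from urllib.parse import quote

    def escaped_fragment_url(url):
        """Rewrite a #! URL into the ?_escaped_fragment_= form that the
        (now deprecated) AJAX-crawling scheme asked crawlers to request."""
        if "#!" not in url:
            return url  # nothing to rewrite
        base, fragment = url.split("#!", 1)
        # The fragment is percent-encoded and appended as a query parameter.
        separator = "&" if "?" in base else "?"
        return f"{base}{separator}_escaped_fragment_={quote(fragment, safe='')}"

    print(escaped_fragment_url("http://www.mysite.com/#!surfing"))
    # -> http://www.mysite.com/?_escaped_fragment_=surfing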
The question is old.
Google no longer supports AJAX crawling:
https://webmasters.googleblog.com/2015/10/deprecating-our-ajax-crawling-scheme.html
And this document has been officially deprecated:
https://developers.google.com/search/docs/ajax-crawling/docs/getting-started
So don't use hashbangs in URLs.
Traditionally, from an SEO perspective, the hash (#) is used to avoid the following issues:
- Cannibalization issues
- Affiliate URLs (here is a good article about how to use the hash for tracking purposes instead of a question mark in the URL)
- Showing limited content on the page (pagination issues)
The usage you are referring to is what Google recommends for making AJAX pages readable by Google - https://support.google.com/webmasters/answer/174992?hl=en
For more info about hash tag and its SEO benefits, check this blog post - https://digitalreadymarketing.com/adding-hash-in-urls-seo-benefits/
In my personal opinion, after 8 years in SEO and development, it won't harm; it depends more on the site's other parameters, so adding the #! shouldn't do any harm...
Do you have the site URL so I can take a more in-depth look?
That could cause a problem if Google's crawler thought there might be an infinite number of possibilities, as with a ? in the URL. But beyond that, the answer is clear:
website.com/oreo-cookies
is more semantic and easier to understand for both people and crawlers than
website.com/#!oreo-cookies
But is this going to have a major impact? If you were a client paying me for SEO, I would tell you that incoming text links with relevant keyword phrases from relevant, related websites are far more important. I would also say that if you are submitting an XML sitemap for Google to digest, and lots of popular websites are using #!, Google will figure it out and ignore it.
So, bottom line: if my content was worth linking to, and I made sure Google was finding all my pages and indexing them, I would not worry about it.
I don't think it will harm your SEO in any way. I've been in SEO for the last 5 years and haven't run into such a problem, so don't worry about it. In my opinion you can go ahead and add the #! with no harm!
Related
I'm brand new to programming (though I'm willing to learn), so apologies in advance for my very basic question.
The SEC makes all of its filings available via FTP, and eventually I would like to download a subset of these files in bulk. However, before creating such a script, I need to generate a list of the locations of these files, which follow this format:
/edgar/data/51143/000005114313000007/0000051143-13-000007-index.htm
51143 = the company ID, and I already accessed the list of company IDs I need via FTP
000005114313000007/0000051143-13-000007 = the report ID, aka "accession number"
I'm struggling with how to figure this out as the documentation is fairly light. If I already have the 000005114313000007/0000051143-13-000007 (what the SEC calls the "accession number") then it's pretty straightforward. But I'm looking for ~45k entries and would obviously need to generate these automatically for a given CIK ID (which I already have).
Is there an automated way to achieve this?
Welcome to SO.
I'm currently scraping the same site, so I'll explain what I've done so far. What I am assuming is that you'll have the CIK numbers of the companies you're looking to scrape. If you search the company's CIK, you'll get a list of all of the files that are available for the company in question. Let's use Apple as an example (since they have a TON of files):
Link to Apple's Filings
From here you can set a search filter. The document you linked was a 10-Q, so let's use that. If you filter by 10-Q, you'll have a list of all of the 10-Q documents. You'll notice that the URL changes slightly to accommodate the filter.
You can use Python and its web scraping libraries to take that URL and scrape all of the URLs of the documents in the table on that page. For each of these links you can scrape whatever links or information you want off the page. I personally use BeautifulSoup4, but lxml is another choice for web scraping, should you choose Python as your programming language. I would recommend using Python, as it's fairly easy to learn the basics and some intermediate programming constructs.
Past that, the project is yours. Good luck, I've posted some links below to get you started. I'm only allowed to post two links since I'm new to the site, so I'll give you the beautiful soup link:
Beautiful Soup Home Page
If you choose to use Python and are new to the language, check out the codecademy python course, and don't forget to check out lxml, as some people prefer it over BeautifulSoup (some people also use both in conjunction, so it's all a matter of personal preference).
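To make the approach described above concrete, here is a rough Python sketch using requests and BeautifulSoup4. The browse-edgar URL pattern, the User-Agent placeholder, and the assumption that the filing links end in "-index.htm" are my additions, not part of the answer above, so verify them against the live EDGAR pages before relying on this:

    import requests
    from bs4 import BeautifulSoup

    # Assumed EDGAR company-browse URL pattern; verify it against the live site.
    BROWSE_URL = (
        "https://www.sec.gov/cgi-bin/browse-edgar"
        "?action=getcompany&CIK={cik}&type={form}&dateb=&owner=include&count=40"
    )

    # The SEC asks automated clients to identify themselves; replace the placeholder.
    HEADERS = {"User-Agent": "your-name your-email@example.com"}

    def list_filing_index_pages(cik, form="10-Q"):
        """Return the filing index page URLs listed for a CIK, filtered by form type."""
        resp = requests.get(BROWSE_URL.format(cik=cik, form=form), headers=HEADERS)
        resp.raise_for_status()
        soup = BeautifulSoup(resp.text, "html.parser")
        links = []
        for a in soup.find_all("a", href=True):
            href = a["href"]
            # The "Documents" links in the results table point at ...-index.htm pages.
            if href.startswith("/Archives/edgar/data/") and "-index" in href:
                links.append("https://www.sec.gov" + href)
        return links

    # Example: the CIK from the question above.
    for url in list_filing_index_pages("0000051143"):
        print(url)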
Is there a problem if I have both English and Chinese versions of the same title/meta tags under the exact same URL? I detect the language the user has set in the browser (via the "Accept-Language" HTTP header) and change the titles/meta tags based on that language. I get a large percentage of my traffic from China and felt this was a better-localized experience for those users, BUT I have no idea how Google would view this. My gut feeling tells me that this is not good for SEO.
Baidu.com, a major Chinese search engine, does in fact pick up my translated tags; however, for other US-based sites it does not translate their English title/meta tags into Chinese. I would think Chinese users are less likely to click on those.
Creating subdomains and/or separate domains for other countries is not an option at this point. That being said, should I only have one language (English) for my title/meta tags to avoid any search engine issues?
Thanks for any advice / wisdom you can offer. Really hoping to get clarity on best practices.
Thanks all!
Yes, it probably is a problem: search engines see mixed-language content. You are not describing how you "detect and change the titles/meta tags based on the user's browser language", but you are probably doing it client-side using the "browser language", which is wrong whatever that means in detail (it does not specify the user's preferred language).
To get a more targeted answer, ask a more real question, with a URL.
If you want to get search traffic in both English and Chinese, you should have two URLs instead of one.
When Googlebot crawls a page, it does not even send an "Accept-Language" header, so you end up serving it your default language. With a single URL there is no way to get your second language indexed, and you won't rank in search engines in multiple languages.
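To illustrate why a single URL only ever gets one language indexed, here is a small hypothetical sketch of Accept-Language-based selection (the function and titles are placeholders, not anyone's real code); a request with no header, like Googlebot's, always falls back to the default:

    # A single URL whose titles depend on the Accept-Language header.
    # The titles and helper are placeholders for illustration only.
    TITLES = {"en": "Example Product Page", "zh": "示例产品页面"}

    def pick_title(accept_language, default="en"):
        """Return the title for the best-matching language, falling back to the default."""
        if not accept_language:              # Googlebot's case: no header at all
            return TITLES[default]
        for part in accept_language.split(","):
            lang = part.split(";")[0].strip().lower()[:2]   # "zh-CN;q=0.9" -> "zh"
            if lang in TITLES:
                return TITLES[lang]
        return TITLES[default]

    print(pick_title("zh-CN,zh;q=0.9"))   # Chinese browser -> Chinese title
    print(pick_title(None))               # crawler sending no header -> English title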
For best SEO, use separate top-level domains, subdomains, or folders for different languages:
http://example.de/
http://example.es/
http://example.com/
http://de.example.com/
http://es.example.com/
http://www.example.com/
http://example.com/de/
http://example.com/es/
http://example.com/en/
I don't think there is any problem with using English and Chinese in the same meta tags.
I have seen these "domain.com/#!/"-formatted URLs, and driven merely by curiosity I chose to ask you people... what is that used for? A kind of "exclaimed hashtag", if you know what I mean.
I see it on sites such as "hypem.com" or "buzzchips.com", both of them delivering asynchronous dynamic content in a similar way.
I uploaded a tiny shot just so you actually see what I see, here and there.
It appears to be a standard for allowing dynamically created content to be crawled.
You can see a good explanation of this under the SEO heading for the following answer:
https://softwareengineering.stackexchange.com/questions/46716/what-should-a-developer-know-before-building-a-public-web-site/46760#46760
New osCommerce user.
I've been fiddling around with Chemo's Ultimate SEO add-on for the last few days. I've mostly got it working (minus one bizarre redirect loop on category pages?), but I'm a little disappointed in the limited options for formatting URLs.
I'm seeing:
http://www.website.com/category-awesomeproduct-p-1735.html
When we'd really like to do something more in line with:
http://www.website.com/category/awesomeproduct
What are my options? Am I out of luck?
I fear that the stock URL parameters are rigidly defined and that there's no way to hide the less friendly ones.
After researching this for quite a while, and receiving no answers here, I believe the answer is: no.
The view controller expects that data and it can't be omitted, even with customizations installed.
What do you think: are clean URLs a backend or a frontend 'discipline'?
The answer is BOTH.
For example:
https://stackoverflow.com/questions/203278/are-clean-urls-a-backend-or-a-frontend-thing
The number above is a database id, a back-end thing. Chop off the pretty part and the URL still goes to the same page. The "are-clean-urls-a-backend-or-a-frontend-thing" slug, therefore, is the front-end part.
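As a small illustration of that split (a hypothetical route, not Stack Overflow's actual code), the backend only needs the numeric id; the slug exists purely for people and search engines:

    import re

    # Hypothetical route pattern: /questions/<numeric id>/<optional slug>
    ROUTE = re.compile(r"^/questions/(\d+)(?:/[\w-]*)?/?$")

    def resolve_question_id(path):
        """Return the database id for a question URL, ignoring the slug entirely."""
        match = ROUTE.match(path)
        return int(match.group(1)) if match else None

    # Both paths resolve to the same record; only the id matters to the backend.
    print(resolve_question_id("/questions/203278/are-clean-urls-a-backend-or-a-frontend-thing"))
    print(resolve_question_id("/questions/203278"))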
If we're talking about URLs being 'clean' from an end-user-experience point of view, then I'm going to break the mould a bit and say that URLs in general are not intuitive and never will be; they are intended to be machine-readable.
There is no standard for the format of a URL, so when navigating from site to site humans will never remember how to reach a resource purely by recalling URLs and their 'friendly syntax'. We can argue the toss about whether '?' and '&' or '/' is the better way to identify a resource via a URL; it doesn't matter. At the end of the day a machine parses it and sends back the result.
We should stop deluding ourselves that people actually type these things in and realise that URIs are for machines, not people.
I have yet to use or remember a URI that goes beyond the first few characters of the http://domain.com/ part of an address, and I've been using the web for a long time. That's what bookmarks are for. Nowhere on a website does it say 'change this part of our URL to view some other resource', because URLs are usually undocumented and opaque.
Yes, make your URIs SEO-friendly (hell, they even change periodically), but forget about the whole 'human/clean' resource identifier thing; it's a mystical pipe dream.
I agree with Vlion that URLs should provide a unique mechanism to bookmark a resource and return to it (unlike some of these abominable web 2.0 AJAX/Silverlight/Flash creations), but the bookmark will never be something for humans to comprehend and understand. There seems to be a lot of preoccupation and energy spent on dreaming up URL strategies that humans can remember and type in; it's a waste of energy. Let's get on and solve real problems.
Sorry for the rant, but there's a lot of web 2.0 nonsense related to URLs going on in certain circles that is just a total waste of time.
Now that Firefox's Awesome Bar and Google Chrome's Omnibox can be used to search browsing history, it is much easier for users to find previously visited sites, so having clean URLs may help users locate sites in their history more easily.
Making sure the page has an appropriate title is important (both browsers search the title as well as the URL), but if the URL contains relevant keywords too, then when those keywords are typed into the address bar the URL is more likely to rank higher in the suggestions, since the keyword is matched twice: in the URL and in the title.
Also, once a user has typed the name of a site they will be presented with example URLs from the site, which they can then use as a template for narrowing down their search. So using verbs and nouns in the URL for different sections or actions of the site will help the user narrow their search to just the part of the site they are interested in, e.g. the /questions/ or /tag/ sections of Stack Overflow, or the "/doc" at the end of docs.google.com/doc that can be used to view just document pages on Google Docs*.
Since both Firefox and Chrome search for each space-separated word typed into the address bar, it could be argued that the URL doesn't need to be completely human-readable for searching; but to let the user actually pick out the keywords they are interested in from the URL, the amount of "noise" should be kept to a minimum.
* which are of the form http://docs.google.com/Doc?id=gibberish
My perspective is simple:
every place I visit with my browser (with various edge-case exceptions) should be bookmarkable, and Forward/Back should be usable without destroying any data entry.
Backend for sure. Your server is the one that has to take care of the routing to the resources requested by the URL.
I think the main reasons for using friendly URLs are:
Ease of linking / sharing
Presentation
SEO
So I think it's purely a client-side pleasure. While they're nice on the server as well, they're not mission critical.