Restructured text (rst) http links underscore ('__' vs '_' use) - python-sphinx

With restructured text, I've seen both these used:
`Some Link <http://www.some.com>`_
`Some Link <http://www.some.com>`__
Both generate the same output from Sphinx,
What's the difference between using a single underscore _ and a double underscore __ for http URL links?
Why would you use one over the other?

In short, if it's a one-off (anonymous) URL which you don't intend to reference, use double underscore.
In practice you could use either in most cases; for example, they generate the same HTML output.
However, using single underscores for links means that by default you're creating a reference target - which could conflict with other references of the same name.
So this for example will warn:
.. _Thing:
Title
=====
Text with `Thing <http://link.com>`_.
WARNING: Duplicate target name, cannot be used as a unique reference: "thing".
While this could be overlooked in most cases, it could make for confusing situations, especially for anyone inexperienced with reStructuredText. So you may prefer to avoid this entirely, defining targets only when that is your intention.
According to:
http://docutils.sourceforge.net/docs/ref/rst/restructuredtext.html#anonymous-hyperlinks
With a single trailing underscore, the reference is named and the same target URI may be referred to again. With two trailing underscores, the reference and target are both anonymous, and the target cannot be referred to again. These are "one-off" hyperlinks.
There are examples on the linked page.
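The name clash described above can be approximated with a rough check. The sketch below is a simplified, regex-based approximation in Python, not docutils' real parser; the function name find_conflicting_targets and both patterns are illustrative assumptions. It collects names defined by explicit targets (.. _Name:) and by single-underscore links, and ignores double-underscore (anonymous) links because they define no name.

```python
import re
from collections import defaultdict

# `text <url>`_ with exactly one trailing underscore (a named reference).
NAMED_LINK = re.compile(r"`([^`<]+?)\s*<([^>`]+)>`_(?!_)")
# ".. _Name:" explicit target at the start of a line.
EXPLICIT_TARGET = re.compile(r"^\.\. _([^:`]+):", re.MULTILINE)


def normalize(name):
    """Collapse whitespace and lowercase, since names are compared that way."""
    return " ".join(name.split()).lower()


def find_conflicting_targets(rst_text):
    """Return the sorted names that are defined with more than one destination."""
    destinations = defaultdict(set)
    for match in EXPLICIT_TARGET.finditer(rst_text):
        # Each explicit target counts as its own destination.
        destinations[normalize(match.group(1))].add(("target", match.start()))
    for match in NAMED_LINK.finditer(rst_text):
        destinations[normalize(match.group(1))].add(("uri", match.group(2).strip()))
    return sorted(name for name, dests in destinations.items() if len(dests) > 1)


sample = ".. _Thing:\n\nTitle\n=====\n\nText with `Thing <http://link.com>`_.\n"
# The named link reuses the name of the explicit target.
print(find_conflicting_targets(sample))
# The anonymous form defines no name, so nothing is reported.
print(find_conflicting_targets(sample.replace("`_.", "`__.")))
```

Two single-underscore links that share both name and URL are deliberately not reported here, on the assumption that reusing the same target is harmless.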

Related

How to create external link references in AsciiDoc without repeating the URL multiple times?

In markdown I can write:
[example1][myid]
[example2][myid]
[myid]: http://example.com
so I don't have to retype the full external link multiple times.
Is there an analogous feature in AsciiDoc? Specially interested in the Asciidoctor implementation.
So far I could only find:
internal cross references with <<>>
I think I saw a replacement feature of type :myid:, but I can't find it anymore. And I didn't see how to use different texts for each link however.
Probably you mean something like this:
Userguide Chapter 28.1. Setting configuration entries
...
Attribute entries promote clarity and eliminate repetition
URLs and file names in AsciiDoc3 macros are often quite long — they break paragraph flow and readability suffers. The problem is compounded by redundancy if the same name is used repeatedly. Attribute entries can be used to make your documents easier to read and write, here are some examples:
:1: http://freshmeat.net/projects/asciidoc3/
:homepage: http://asciidoc3.org[AsciiDoc3 home page]
:new: image:./images/smallnew.png[]
:footnote1: footnote:[A meaningless latin term]
Using previously defined attributes: See the {1}[Freshmeat summary]
or the {homepage} for something new {new}. Lorem ispum {footnote1}.
...
BTW, there is a 100% Python3 port available now: https://asciidoc3.org
I think you are looking for this (and both will work just fine),
https://www.google.com[Google]
or
link:https://google.com[Google]
Reference:
Ascii Doc User Manual Link
Update #1: Use of link along with variables in asciidoc
Declare variable
:url: https://www.google.com
Use variable feature using format mentioned above
Using ' Link with label '
{url}[Google]
Using a relative link
link:{url}[Google]
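To illustrate why one attribute can back several links with different texts, here is a toy Python model of attribute substitution. It is only a sketch of the idea, not how Asciidoctor or AsciiDoc3 is implemented; expand_attributes and both patterns are hypothetical names, and only simple ":name: value" entries are handled.

```python
import re

# ":name: value" attribute entry on its own line.
ATTR_ENTRY = re.compile(r"^:([A-Za-z0-9_][\w-]*):\s+(.*)$")
# "{name}" attribute reference inside a line.
ATTR_REF = re.compile(r"\{([A-Za-z0-9_][\w-]*)\}")


def expand_attributes(source):
    """Replace {name} references with previously declared attribute values."""
    attrs = {}
    output = []
    for line in source.splitlines():
        entry = ATTR_ENTRY.match(line)
        if entry:
            attrs[entry.group(1)] = entry.group(2).strip()
            continue  # entries declare a value and produce no output
        # Unknown references are left untouched.
        output.append(ATTR_REF.sub(lambda m: attrs.get(m.group(1), m.group(0)), line))
    return "\n".join(output)


doc = ":url: https://www.google.com\n{url}[Google]\nlink:{url}[Search]"
print(expand_attributes(doc))
```

The URL is written once and each use supplies its own link text, which is the Markdown reference-link behaviour the question asks about.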

Can I use underscore in url instead of hyphen?

I would like to use underscore in my url instead of hyphen.
I mean like this wikipedia link
My Current url:
www.example.com/2013/01/hello-this-is-a-test-post/
Desired url
www.example.com/2013/01/hello_this_is_a_test_post/
But a good programmer on WordPress Stack Exchange advised me that Google treats - as a word separator, but not _.
He also mentioned that rule doesn't apply for MediaWiki sites.
Is it true?
It is true that Google treats hyphens as word separators.
The reasoning behind it I recall is based on programmers searching for functions which usually (if not always) have underscores in them. So instead Google treats underscores as word joiners.
This article elaborates: http://www.ecreativeim.com/blog/2011/03/seo-basics-hyphen-or-underscore-for-seo-urls/

Insert a hyperlink to another file (Word) into Visual Studio code file

I am currently developing some functionality that implements some complex calculations. The calculations themselves are explained and defined in Word documents.
What I would like to do is create a hyperlink in each code file that references the associated Word document - just as you can in Word itself. Ideally this link would be placed in or near the XML comments for each class.
The files reside on a network share and there are no permissions to worry about.
So far I have the following but it always comes up with a file not found error.
file:///\\165.195.209.3\engdisk1\My Tool\Calculations\111-07 MyToolCalcOne.docx
I've worked out the problem is due to the spaces in the folder and filenames.
My Tool
111-07 MyToolCalcOne.docx
I tried replacing the spaces with %20, thus:
file:///\\165.195.209.3\engdisk1\My%20Tool\Calculations\111-07%20MyToolCalcOne.docx
but with no success.
So the question is; what can I use in place of the spaces?
Or, is there a better way?
One way that works beautifully is to write your own URL handler. It's absolutely trivial to do, but so very powerful and useful.
A registry key can be set to make the OS execute a program of your choice when the registered URL is launched, with the URL text being passed in as a command-line argument. It takes just a few lines of code to parse the URL in any way you see fit in order to locate and launch the documentation.
The advantages of this:
You can use a much more compact and readable form, e.g. mydocs://MyToolCalcOne.docx
A simplified format means no trouble trying to encode tricky file paths
Your program can search anywhere you like for the file, making the document storage totally portable and relocatable (e.g. you could move your docs into source control or onto a website and just tweak your URL handler to locate the files)
Your URL is unique, so you can differentiate files, web URLs, and documentation URLs
You can register many URLs, so can use different ones for specs, designs, API documentation, etc.
You have complete control over how the document is presented (does it launch Word, an Internet Explorer, or a custom viewer to display the docs, for example?)
I would advise against using spaces in filenames and URLs - spaces have never worked properly under Windows, and always cause problems (or require ugliness like %20) sooner or later. The easiest and cleanest solution is simply to remove the spaces or replace them with something like underscores, dashes or periods.
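The parsing half of such a handler could look roughly like the Python sketch below. The mydocs scheme comes from the example above; DOC_ROOT (taken from the share in the question), resolve_doc_url and the validation rules are assumptions for illustration, and registering the scheme with the OS and launching Word are left out.

```python
from pathlib import PureWindowsPath
from urllib.parse import unquote, urlsplit

# Assumed location of the documents (the share mentioned in the question).
DOC_ROOT = PureWindowsPath(r"\\165.195.209.3\engdisk1\My Tool\Calculations")


def resolve_doc_url(url):
    """Map a mydocs:// URL to a file path under DOC_ROOT."""
    parts = urlsplit(url)
    if parts.scheme != "mydocs":
        raise ValueError("not a mydocs URL: " + url)
    # In mydocs://Name.docx the document name is parsed as the network location.
    name = unquote(parts.netloc + parts.path).strip("/")
    # Reject empty names and anything that could escape DOC_ROOT.
    if not name or "/" in name or "\\" in name or ".." in name:
        raise ValueError("unsupported document name: " + repr(name))
    return DOC_ROOT / name


print(resolve_doc_url("mydocs://MyToolCalcOne.docx"))
print(resolve_doc_url("mydocs://111-07%20MyToolCalcOne.docx"))
```

Because the handler owns the mapping, the link in the source file stays short, and the awkward UNC path with spaces lives in one place.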

Do I really need to encode '&' as '&amp;'?

I'm using an '&' symbol with HTML5 and UTF-8 in my site's <title>. Google shows the ampersand fine on its SERPs, as do all the browsers in their titles.
http://validator.w3.org is giving me this:
& did not start a character reference. (& probably should have been escaped as &amp;.)
Do I really need to do &amp;?
I'm not fussed about my pages validating for the sake of validating, but I'm curious to hear people's opinions on this and if it's important and why.
Yes. Just as the error said, in HTML, attributes are #PCDATA meaning they're parsed. This means you can use character entities in the attributes. Using & by itself is wrong and if not for lenient browsers and the fact that this is HTML not XHTML, would break the parsing. Just escape it as &amp; and everything will be fine.
HTML5 allows you to leave it unescaped, but only when the data that follows does not look like a valid character reference. However, it's better just to escape all instances of this symbol than worry about which ones should be and which ones don't need to be.
Keep this point in mind: if you're not escaping & to &amp;, that's bad enough for data that you create (where the code could very well be invalid), but you might also not be escaping tag delimiters, which is a huge problem for user-submitted data and could very well lead to HTML and script injection, cookie stealing and other exploits.
Please just escape your code. It will save you a lot of trouble in the future.
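As a minimal sketch of "just escape it", Python's standard html.escape handles the ampersand and the tag delimiters in one call. The render_title helper is a hypothetical name used only for this illustration.

```python
from html import escape


def render_title(raw_title):
    """Build a <title> element from untrusted text, escaping &, <, > and quotes."""
    return "<title>" + escape(raw_title) + "</title>"


print(render_title("Dolce & Gabbana"))
# A hostile value is rendered as text instead of becoming markup.
print(render_title('<script src="http://attacker.com/evil.js">'))
```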
Validation aside, the fact remains that encoding certain characters is important to an HTML document so that it can render properly and safely as a web page.
Encoding & as &amp; under all circumstances, for me, is an easier rule to live by, reducing the likelihood of errors and failures.
Compare the following: which is easier? Which is easier to bugger up?
Methodology 1
Write some content which includes ampersand characters.
Encode them all.
Methodology 2
(with a grain of salt, please ;) )
Write some content which includes ampersand characters.
On a case-by-case basis, look at each ampersand. Determine if:
It is isolated, and as such unambiguously an ampersand. eg. volt & amp > In that case don't bother encoding it.
It is not isolated, but you feel it is nonetheless unambiguous, as the resulting entity does not exist and will never exist since the entity list could never evolve. E.g., amp&volt >. In that case, don't bother encoding it.
It is not isolated, and ambiguous. E.g., volt&amp > Encode it.
??
HTML5 rules are different from HTML4. It's not required in HTML5 - unless the ampersand looks like it starts a parameter name. "&copy=2" is still a problem, for example, since &copy; is the copyright symbol (©).
However it seems to me that it's harder work to decide to encode or not to encode depending on the following text. So the easiest path is probably to encode all the time.
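The ambiguity can be seen by decoding a few strings with Python's html.unescape, which is documented to follow the HTML 5 rules for character references in text. This is only an illustration: browsers apply an extra exception inside attribute values that this function does not model.

```python
from html import unescape

samples = [
    "volt & amp",    # isolated ampersand: left alone
    "amp&volt",      # no entity is named "volt": left alone
    "volt&amp",      # legacy name without a semicolon: still decoded
    "&copy=2",       # decoded as the copyright sign followed by "=2"
    "&amp;copy=2",   # escaped properly: decodes to the intended text
]

for text in samples:
    print(repr(text), "->", repr(unescape(text)))
```

Only the last form survives regardless of what follows the ampersand, which is the argument for escaping every time.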
I think this has turned into more of a question of "why follow the spec when browsers don't care." Here is my generalized answer:
Standards are not a "present" thing. They are a "future" thing. If we, as developers, follow web standards, then browser vendors are more likely to correctly implement those standards, and we move closer to a completely interoperable web, where CSS hacks, feature detection, and browser detection are not necessary. Where we don't have to figure out why our layouts break in a particular browser, or how to work around that.
Specifically, if HTML5 does not require using &amp; in your specific situation, and you're using an HTML5 doctype (and also expecting your users to be using HTML5-compliant browsers), then there is no reason to do it.
Well, if it comes from user input then absolutely yes, for obvious reasons. Think if this very website didn't do it: the title of this question would show up as Do I really need to encode ‘&’ as ‘&’?
If it's just something like echo '<title>Dolce & Gabbana</title>'; then strictly speaking you don't have to. It would be better, but if you don't, no user will notice the difference.
Could you show us what your title actually is? When I submit
<!DOCTYPE html>
<html>
<title>Dolce & Gabbana</title>
<body>
<p>Am I allowed loose & mpersands?</p>
</body>
</html>
to http://validator.w3.org/ - explicitly asking it to use the experimental HTML 5 mode - it has no complaints about the &s...
In HTML, a & marks the beginning of a reference, either of a character reference or of an entity reference. From that point on, the parser expects either a # denoting a character reference, or an entity name denoting an entity reference, both followed by a ;. That’s the normal behavior.
But if the reference name or just the reference opening & is followed by a white space or other delimiters like ", ', <, >, &, the ending ; and even a reference to represent a plain & can be omitted:
<p title="&amp;">foo &amp; bar</p>
<p title="&amp">foo &amp bar</p>
<p title="&">foo & bar</p>
Only in these cases can the ending ; or even the reference itself be omitted (at least in HTML 4). I think HTML 5 requires the ending ;.
But the specification recommends always using a reference like the character reference &#38; or the entity reference &amp; to avoid confusion:
Authors should use "&amp;" (ASCII decimal 38) instead of "&" to avoid confusion with the beginning of a character reference (entity reference open delimiter). Authors should also use "&amp;" in attribute values since character references are allowed within CDATA attribute values.
Update (March 2020): The W3C validator no longer complains about escaping URLs.
I was checking why image URLs need escaping and hence tried it in https://validator.w3.org. The explanation is pretty nice. It highlights that even URLs need to be escaped. [PS: I guess it will be unescaped when it's consumed since URLs need &. Can anyone clarify?]
<img alt="" src="foo?bar=qut&qux=fop" />
An entity reference was found in the document, but there is no
reference by that name defined. Often this is caused by misspelling
the reference name, unencoded ampersands, or by leaving off the
trailing semicolon (;). The most common cause of this error is
unencoded ampersands in URLs as described by the WDG in "Ampersands in
URLs". Entity references start with an ampersand (&) and end with a
semicolon (;). If you want to use a literal ampersand in your document
you must encode it as "&amp;" (even inside URLs!). Be careful to end
entity references with a semicolon or your entity reference may get
interpreted in connection with the following text. Also keep in mind
that named entity references are case-sensitive; &Aelig; and &aelig;
are different characters. If this error appears in some markup
generated by PHP's session handling code, this article has
explanations and solutions to your problem.
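On the question of whether the escaped URL is unescaped again when consumed: an HTML parser decodes character references in attribute values, so the consumer sees the plain & again. The Python sketch below shows the round trip with the standard library; SrcCollector is a hypothetical helper, and the URL is the one from the example above.

```python
from html import escape
from html.parser import HTMLParser
from urllib.parse import urlencode


class SrcCollector(HTMLParser):
    """Collect the decoded src attribute of every <img> tag."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            self.sources.extend(value for name, value in attrs if name == "src")


# Step 1: build the URL; the query string uses a plain "&" separator.
url = "foo?" + urlencode({"bar": "qut", "qux": "fop"})
# Step 2: escape the URL for embedding in an HTML attribute.
markup = '<img alt="" src="{}" />'.format(escape(url, quote=True))
# Step 3: parsing the markup yields the original URL again.
collector = SrcCollector()
collector.feed(markup)

print(markup)
print(collector.sources)
```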
It depends on the likelihood of a semicolon ending up near your &, causing it to display something quite different.
For example, when dealing with input from users (say, if you include the user-provided subject of a forum post in your title tags), you never know where they might be putting random semicolons, and it might randomly display strange entities. So always escape in that situation.
For your own static HTML content, sure, you could skip it, but it's so trivial to include proper escaping, that there's no good reason to avoid it.
If the user passes it to you, or it will wind up in a URL, you need to escape it.
If it appears in static text on a page? All browsers will get this one right either way, and you don't worry much about it, since it will work.
Yes, you should try to serve valid code if possible.
Most browsers will silently correct this error, but there is a problem with relying on the error handling in the browsers. There is no standard for how to handle incorrect code, so it's up to each browser vendor to try to figure out what to do with each error, and the results may vary.
Some examples where browsers are likely to react differently are if you put elements inside a table but outside the table cells, or if you nest links inside each other.
For your specific example it's not likely to cause any problems, but error correction in the browser might for example cause the browser to change from standards compliant mode into quirks mode, which could make your layout break down completely.
So, you should correct errors like this in the code, if not for anything else so to keep the error list in the validator short, so that you can spot more serious problems.
A couple of years ago, we got a report that one of our web apps wasn't displaying correctly in Firefox. It turned out that the page contained a tag that looked like
<div style="..." ... style="...">
When faced with a repeated style attribute, Internet Explorer combines both of the styles, while Firefox only uses one of them, hence the different behavior. I changed the tag to
<div style="...; ..." ...>
and sure enough, it fixed the problem! The moral of the story is that browsers have more consistent handling of valid HTML than of invalid HTML. So, fix your damn markup already! (Or use HTML Tidy to fix it.)
If & is used in HTML then you should escape it.
If & is used in JavaScript strings, e.g., an alert('This & that'); or document.href, you don't need to escape it.
If you're using document.write then you should escape it, e.g. document.write("<p>this &amp; that</p>").
If you're really talking about the static text
<title>Foo & Bar</title>
stored in some file on the hard disk and served directly by a server, then yes: it probably doesn't need to be escaped.
However, since there is very little HTML content nowadays that's completely static, I'll add the following disclaimer that assumes that the HTML content is generated from some other source (database content, user input, web service call result, legacy API result, ...):
If you don't escape a simple &, then chances are you also don't escape a &amp; or a &nbsp; or <b> or <script src="http://attacker.com/evil.js"> or any other invalid text. That would mean that you are at best displaying your content wrongly and more likely are susceptible to XSS attacks.
In other words: when you're already checking and escaping the other more problematic cases, then there's almost no reason to leave the not-totally-broken-but-still-somewhat-fishy standalone-& unescaped.
The link has a fairly good example of when and why you may need to escape & to &amp;
https://jsfiddle.net/vh2h7usk/1/
Interestingly, I had to escape the character in order to represent it properly in my answer here. If I were to use the built-in code sample option (from the answer panel), I can just type in & and it appears as it should. But if I were to manually use the <code></code> element, then I have to escape in order to represent it correctly :)

URL Rewriting, SEO and encoding

I found this article regarding URL Rewriting most useful.
But here are a couple of questions.
I would love to use a URL (before rewriting, with spaces inside the query string)
http://www.store.com/products.aspx?category=CD s-Dvd s
First of all, should I replace the spaces with the plus sign (+) for any reason? Like this:
http://www.store.com/products.aspx?category=CD+s-Dvd+s
Secondly, my native language is Greek. Should I encode the parameters? Generally speaking, would the result with URL encoding on be different, regarding S.E.O.?
Actually you should replace spaces with hyphens. That actually is better for SEO than using an underscore.
If the value must come through unaltered, then yes you must use escaping. In a URL query parameter value, a space may be encoded as + or %20. mod_rewrite will generally do this for you as long as the external version was suitably spelled.
In the external version of the URL, only %20 can be used:
http://www.store.com/products/CD%20s-Dvd%20s
http://www.store.com/products.php?category=CD%20s-Dvd%20s
because a + in a URL path part would literally mean a plus.
(Are you sure you want a space there? “CDs-DVDs” without the spaces would seem to be a better title.)
It is non-trivial to get arbitrary strings through from a path part to a parameter. Apart from the escaping issues, you've got problems with /, which should be encoded as %2F in a path part. However Apache will by default block any URL containing %2F for security reasons. (\ is similarly affected under Windows.) You can turn this behaviour off using the AllowEncodedSlashes config, but it means if you want to be portable you can't use “CDs/DVDs” as a category name.
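The escaping rules above can be tried out with Python's urllib.parse, used here purely as an illustration; the question's site is ASP.NET, and the Greek sample word is an assumption added for the demo.

```python
from urllib.parse import quote, quote_plus, unquote, unquote_plus

category = "CD s-Dvd s"

# Path-style escaping: a space becomes %20.
print(quote(category))
# Query-string (form) escaping: a space becomes +.
print(quote_plus(category))

# A + only means "space" when decoded with the query-string rules.
print(unquote("CD+s"))
print(unquote_plus("CD+s"))

# A slash inside a single path segment must be escaped as %2F.
print(quote("CDs/DVDs", safe=""))

# Non-ASCII text such as Greek is percent-encoded as UTF-8 bytes and round-trips.
greek = "μουσική"
print(quote(greek))
print(unquote(quote(greek)) == greek)
```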
For this reason, and because having a load of %20​s in your URL is a bit ugly, strings are usually turned into ‘slugs’ before being put in a URL, where all the contentious ASCII characters that would result in visible %-escapes are replaced with filler characters such as hyphen or underscore. This does mean you can't round-trip the string, so you need to store either a separate title and slug in the database to be able to look up the right entity for a given slug, or just use an additional ID in the URL (like Stack Overflow does).
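A slug function along these lines might look like the Python sketch below. It is a minimal ASCII-only version under stated assumptions: slugify and its sep parameter are hypothetical names, and text with no ASCII equivalent (Greek, for instance) collapses to an empty slug, so a real site would need transliteration or the ID-in-URL approach mentioned above.

```python
import re
import unicodedata


def slugify(title, sep="-"):
    """Lowercase the title and replace runs of non-alphanumerics with sep."""
    # Split accented letters into base letter + combining mark, then drop non-ASCII.
    normalized = unicodedata.normalize("NFKD", title)
    ascii_only = normalized.encode("ascii", "ignore").decode("ascii")
    return re.sub(r"[^a-z0-9]+", sep, ascii_only.lower()).strip(sep)


print(slugify("CDs / DVDs"))
print(slugify("Hello, this is a test post!", sep="_"))
print(slugify("Café Crème"))
```

Because different titles can produce the same slug, the slug alone cannot be mapped back to the title, which is why it is stored alongside the title or paired with an ID.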
General practice is to replace spaces with underscores, à la http://www.store.com/products.aspx?category=CD_s-Dvd_s

Resources