Link conventions - coding-style

There are multiple ways of declaring a link with a title or without. I'm wondering whether there is a preferred way of doing it, either official or widely used in the community.
Taken from markdownguide.org/basic-syntax/#formatting-the-second-part-of-the-link, here are all the ways of declaring reference-style links (this also applies to inline-style links, there would just be a ( instead of : and a ) at the end):
[1]: https://en.wikipedia.org/wiki/Hobbit#Lifestyle
[1]: https://en.wikipedia.org/wiki/Hobbit#Lifestyle "Hobbit lifestyles"
[1]: https://en.wikipedia.org/wiki/Hobbit#Lifestyle 'Hobbit lifestyles'
[1]: https://en.wikipedia.org/wiki/Hobbit#Lifestyle (Hobbit lifestyles)
[1]: <https://en.wikipedia.org/wiki/Hobbit#Lifestyle> "Hobbit lifestyles"
[1]: <https://en.wikipedia.org/wiki/Hobbit#Lifestyle> 'Hobbit lifestyles'
[1]: <https://en.wikipedia.org/wiki/Hobbit#Lifestyle> (Hobbit lifestyles)
I assume the whitespace after [1]: should be included, since the guide shows no variant without it. Is that correct?
I've looked at the official documentation by John Gruber at daringfireball.net/projects/markdown and it looks like they prefer [name]: link "title".
If there's not a "preferred" way, which one do you use and why? Are any of them functionally different in any way whatsoever?

Whitespace after [1]: should be included. The guide even says so in Formatting the Second Part of the Link: "The label, in brackets, followed immediately by a colon and at least one space".
My preferred style is [1]: https://en.wikipedia.org/wiki/Hobbit#Lifestyle "Hobbit lifestyles". I try to avoid single quotes and brackets.
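On the question of whether the variants differ functionally: as far as the syntax described above goes, the three title delimiters and the optional angle brackets should all yield the same URL and title. Here is a minimal sketch to illustrate that. The regex is a simplified, hypothetical pattern written for this example (it is not how any real Markdown parser works), and it follows the guide's "at least one space" wording, although individual parsers may be more lenient.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RefLinkDemo {
    // Simplified, hypothetical pattern for one reference definition:
    //   [label]: url          (the url may be wrapped in < >)
    //   followed by an optional title in "...", '...' or (...)
    private static final Pattern REF = Pattern.compile(
        "\\[([^\\]]+)\\]:\\s+<?([^\\s>]+)>?"
        + "(?:\\s+(?:\"([^\"]*)\"|'([^']*)'|\\(([^)]*)\\)))?\\s*");

    /** Returns {label, url, title} (title is null when absent), or null if the line does not match. */
    static String[] parse(String line) {
        Matcher m = REF.matcher(line);
        if (!m.matches()) {
            return null;
        }
        // Exactly one of the three title groups can be set.
        String title = m.group(3) != null ? m.group(3)
                     : m.group(4) != null ? m.group(4)
                     : m.group(5);
        return new String[] { m.group(1), m.group(2), title };
    }

    public static void main(String[] args) {
        String url = "https://en.wikipedia.org/wiki/Hobbit#Lifestyle";
        String[] variants = {
            "[1]: " + url + " \"Hobbit lifestyles\"",
            "[1]: " + url + " 'Hobbit lifestyles'",
            "[1]: " + url + " (Hobbit lifestyles)",
            "[1]: <" + url + "> \"Hobbit lifestyles\"",
        };
        for (String v : variants) {
            String[] p = parse(v);
            System.out.println(p[1] + " | " + p[2]);
        }
    }
}
```

Every variant prints the same URL and title pair, which suggests the choice is purely a matter of style.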

Related

Find HTML Tags in Properties

My current issue is finding HTML tags inside property values. I thought it would be easy to search with a query like /jcr:root/content/xgermany//*[jcr:contains(., '<strong>')] order by @jcr:score
It looks like there is a problem with the characters < and >, because this query finds everything which has strong in its property. It finds <strong>Some Text</strong> but also This is a strong man.
The Query Builder API didn't help me either.
Is it possible to solve this with an XPath or SQL query, or do I have to iterate through the whole content?
I don't fully understand why it finds This is a strong man as a result for '<strong>', but it sounds like the unexpected behavior comes from the "simple search-engine syntax" for the second argument to jcr:contains(). Apparently the < > are just being ignored as "meaningless" punctuation.
You could try quoting the search term:
/jcr:root/content/xgermany//*[jcr:contains(., '"<strong>"')]
though you may have to tweak that if your whole XPath expression is enclosed in double quotes.
Of course this will not be very robust even if it works, since you're trying to find HTML elements by searching for fixed strings, instead of actually parsing the HTML.
If you have a specific jcr:primaryType and know the targeted properties, you can do something like this:
select * from nt:unstructured where text like '%<strong>%'
I tested it, but you need to know the properties you are interested in.
This is JCR-SQL syntax.
Start using predicates like a champ; this way all of this will make sense to you!
HTML Encode: &lt;strong&gt;
HTML Decimal: &#60;strong&#62;
Query builder is your friend:
Predicates: (like a CHAMP!)
path=/content/geometrixx
type=nt:unstructured
property=text
property.operation=like
property.value=%<strong>%
Have a go here:
http://localhost:4502/libs/cq/search/content/querydebug.html?charset=UTF-8&query=path%3D%2Fcontent%2Fgeometrixx%0D%0Atype%3Dnt%3Aunstructured%0D%0Aproperty%3Dtext%0D%0Aproperty.operation%3Dlike%0D%0Aproperty.value%3D%25%3Cstrong%3E%25
Predicates: (like a CHAMP!)
path=/content/geometrixx
type=nt:unstructured
property=text
property.operation=like
property.value=%&lt;strong&gt;%
Have a go here:
http://localhost:4502/libs/cq/search/content/querydebug.html?charset=UTF-8&query=path%3D%2Fcontent%2Fgeometrixx%0D%0Atype%3Dnt%3Aunstructured%0D%0Aproperty%3Dtext%0D%0Aproperty.operation%3Dlike%0D%0Aproperty.value%3D%25%26lt%3Bstrong%26gt%3B%25
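The debugger URLs above appear to be nothing more than the predicate lines joined with CRLF and percent-encoded into the query parameter. A minimal sketch of building them follows. The class and method names are hypothetical, and URLEncoder is a form encoder (it would turn spaces into +), which happens not to matter for these predicates.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

class QueryDebugUrl {
    // Base URL copied from the links above (local author instance).
    static final String BASE =
        "http://localhost:4502/libs/cq/search/content/querydebug.html?charset=UTF-8&query=";

    /** Joins predicate lines with CRLF and percent-encodes the result. */
    static String encodePredicates(String... lines) {
        return URLEncoder.encode(String.join("\r\n", lines), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Raw tag and entity-encoded tag, one URL each.
        for (String value : new String[] { "%<strong>%", "%&lt;strong&gt;%" }) {
            System.out.println(BASE + encodePredicates(
                "path=/content/geometrixx",
                "type=nt:unstructured",
                "property=text",
                "property.operation=like",
                "property.value=" + value));
        }
    }
}
```

Which of the two values matches depends on whether the property stores the tag raw or entity-encoded, so it may be worth trying both.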
XPath:
/jcr:root/content/geometrixx//element(*, nt:unstructured)
[
jcr:like(@text, '%<strong>%')
]
SQL2 (already covered... NASTY YUK..)
SELECT * FROM [nt:unstructured] AS s WHERE ISDESCENDANTNODE([/content/geometrixx]) and text like '%<strong>%'
Although I'm sure it's entirely possible with a string of predicates, it's possibly heading down the wrong route. Ideally it would be better to parse the HTML when it is stored or published.
The required information would be stored in simple properties on the node in question. The query will then be a lot simpler, just a property = value query rather than lots of overly complex query syntax.
It will probably be faster too.
So if you read in your HTML with something like HTMLClient and then parse it with an OSGi service, the service can accurately save these properties for you. Every time the HTML changes, the process would update these properties as necessary. Just some thoughts if your SQL is getting too much.

How to use substring() with Import.io?

I'm having some issues with XPath and import.io and I hope you'll be able to help me. :)
The html code:
<a href="page.php?var=12345">
For the moment, I manage to extract the content of the href ( page.php?var=12345 ) with this:
./td[3]/a[1]/@href
Though, I would like to just collect: 12345
substring might be the solution but it does not seem to work on import.io as I use it...
substring(./td[3]/a[1]/@href,13)
Any ideas of what the problem is?
Thanks a lot in advance!
Try using this for the xpath: (Have the field selected as Text)
.//*[@class='oeil']/a/@href
Then use this for your regex:
([^=]*)$
This will get you the ISBN number you are looking for.
import.io only supports functions in XPath when they return a node list.
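To sanity-check the regex outside import.io, here is a small sketch using java.util.regex. The class and method names are made up for the example. The pattern captures everything after the last = sign, so it returns the whole string when there is no = at all.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LastParamValue {
    // Same pattern as suggested above: the longest run of non-'=' characters at the end.
    private static final Pattern AFTER_LAST_EQUALS = Pattern.compile("([^=]*)$");

    static String extract(String href) {
        Matcher m = AFTER_LAST_EQUALS.matcher(href);
        return m.find() ? m.group(1) : "";
    }

    public static void main(String[] args) {
        System.out.println(extract("page.php?var=12345"));
    }
}
```

Note that with several parameters (for example a=1&b=2) this picks the value of the last one only.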
Your path expression is fine, but perhaps it should be
substring(./td[3]/a[1]/@href,14)
"Does not seem to work" is not a very clear description of what is wrong. Do you get error messages? Is the output wrong? Do you have any code surrounding the path expression you could show?
You can use substring, but using substring-after() would be even better.
substring-after(/a/@href,'=')
assuming as input the tiny snippet you have shown:
<a href="page.php?var=12345"/>
will select
12345
and taking into account the structure of your input
substring-after(./td[3]/a[1]/@href,'=')
A leading . makes the path expression relative to the current context node, so ./td[3] only considers td elements that are immediate children of that node. I trust you know what you are doing.
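The expressions above can be tried with the XPath 1.0 engine that ships with the JDK. This is only a sketch: the surrounding tr/td markup is an assumption made up for the example (the question shows just the a element), and import.io may still behave differently, as noted earlier.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

class HrefParamDemo {
    // Hypothetical table row; only the <a> element comes from the question.
    static final String ROW =
        "<tr><td/><td/><td><a href=\"page.php?var=12345\"/></td></tr>";

    /** Evaluates an XPath 1.0 expression against ROW and returns its string value. */
    static String eval(String expr) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        try {
            return xpath.evaluate(expr, new InputSource(new StringReader(ROW)));
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Bad XPath: " + expr, e);
        }
    }

    public static void main(String[] args) {
        // XPath string positions are 1-based, and '=' is the 13th character.
        System.out.println(eval("substring(/tr/td[3]/a[1]/@href,13)"));
        System.out.println(eval("substring(/tr/td[3]/a[1]/@href,14)"));
        System.out.println(eval("substring-after(/tr/td[3]/a[1]/@href,'=')"));
    }
}
```

Starting at 13 keeps the = sign, which is why 14 is needed. substring-after avoids counting characters altogether.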

What the heck are these characters?

I recently read this post on stack overflow:
RegEx match open tags except XHTML self-contained tags
The top reply contains text which appears to 'bleed':
ea͠ki̧n͘g fr̶ǫm ̡yo​͟ur eye͢s̸ ̛l̕ik͏e liq​uid pain, the song of re̸gular exp​ression parsing will exti​nguish the voices of mor​tal man from the sp​here I can see it can you see ̲͚̖͔̙î̩́t̲͎̩̱͔́̋̀ it is beautiful t​he final snuffing of the lie​s of Man ALL IS LOŚ͖̩͇̗̪̏̈́T ALL I​S LOST the pon̷y he comes he c̶̮omes he comes the ich​or permeates all MY FACE MY FACE ᵒh god no NO NOO̼O​O NΘ stop the an​*̶͑̾̾​̅ͫ͏̙̤g͇̫͛͆̾ͫ̑͆l͖͉̗̩̳̟̍ͫͥͨe̠̅s ͎a̧͈͖r̽̾̈́͒͑e n​ot rè̑ͧ̌aͨl̘̝̙̃ͤ͂̾̆ ZA̡͊͠͝LGΌ ISͮ̂҉̯͈͕̹̘̱ TO͇̹̺ͅƝ̴ȳ̳ TH̘Ë͖́̉ ͠P̯͍̭O̚​N̐Y̡ H̸̡̪̯ͨ͊̽̅̾̎Ȩ̬̩̾͛ͪ̈́̀́͘ ̶̧̨̱̹̭̯ͧ̾ͬC̷̙̲̝͖ͭ̏ͥͮ͟Oͮ͏̮̪̝͍M̲̖͊̒ͪͩͬ̚̚͜Ȇ̴̟̟͙̞ͩ͌͝S̨̥̫͎̭ͯ̿̔̀ͅ
..
Looking at these individually, they look like single characters. How are they created? How can I find more information about them? For example, the "A" character:
A̡͊͠͝
WTF is that?
Those are combined Unicode characters.
http://es.wikipedia.org/wiki/Unicode
and
http://en.wikipedia.org/wiki/Combining_character
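To see what such a "character" is made of, you can list its code points. A rough sketch follows. The sample string is an illustrative sequence I picked (a plain A followed by four combining marks), not necessarily the exact code points from the post, and the class name is made up.

```java
import java.text.Normalizer;

class CombiningDemo {
    /** True for code points in a Unicode mark category (Mn, Me, Mc). */
    static boolean isMark(int cp) {
        int type = Character.getType(cp);
        return type == Character.NON_SPACING_MARK
            || type == Character.ENCLOSING_MARK
            || type == Character.COMBINING_SPACING_MARK;
    }

    static long countMarks(String s) {
        return s.codePoints().filter(CombiningDemo::isMark).count();
    }

    /** Decomposes the text, then drops every mark, leaving the base characters. */
    static String stripMarks(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFD).replaceAll("\\p{M}+", "");
    }

    public static void main(String[] args) {
        String sample = "A\u0321\u034A\u0360\u035D"; // assumed example sequence
        sample.codePoints().forEach(cp ->
            System.out.println(String.format("U+%04X %s", cp, Character.getName(cp))));
        System.out.println(countMarks(sample) + " marks on base \"" + stripMarks(sample) + "\"");
    }
}
```

The base letter stays an ordinary A. Everything stacked on it is a separate code point that the renderer draws above or below the base.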

How do I define a SemgrexPattern in Stanford API, to match nodes' text without using {lemma:...}?

I am working with the edu.stanford.nlp.semgrex and edu.stanford.nlp.trees.semgraph packages and am looking for a way to match nodes by their text using something other than the lemma: directive.
I couldn't find all possible attribute names in javadoc for SemgrexPattern, only those for lemma, tag, and relational operators - is there a comprehensive list available?
For example, in the following sentence
My take-home pay is $20.
extracting the 'take-home' node is not possible, because
(SemgrexPattern.compile( "{lemma:take-home}"))
.matcher( "My take-home pay is $20.").find()
yields false: take-home is deemed not to be a lemma.
What do I need to do to match nodes with non-lemma, arbitrary text?
Thanks for any advice or comment.
Sorry - I realize that {word:take-home} would work in the example above.
Thanks..

Using xpath in selenium.getText and selenium.click

I have a link Адреса магазинов ("Store addresses") on the page and want to store its text, then click the link and verify that the page I land on contains this text in its headers. So I tried to find the element by XPath: selenium.getText returns the right result, but selenium.click goes to another link. Where have I made a mistake? Thanks in advance!
String m_1 = selenium.getText("xpath=html/body/div[3]/div[2]/div[1]/h4[1]");
selenium.click("xpath=html/body/div[3]/div[2]/div[1]/h4[1]");
selenium.waitForPageToLoad("30000");
assertTrue(selenium.getText("css=h3").contains(m_1));
page:http://www.svyaznoy.ru/map/
Summary:
using xpath=//descendant::a[@href='/address_shops/'][2] or css=div.deff_one_column a[href='/address_shops/'] gets the right results
using xpath=//a[@href='/address_shops/'] - Element is not currently visible
xpath=//a[@href='/address_shops/'][2] - Element not found
There is a missing slash at the beginning of the expression. I am kind of surprised this got through at all - the first slash means "begin at root node".
Also, it is better to select the <a> element instead of the <h4>. Otherwise sometimes it works, sometimes it misclicks, and sometimes the click doesn't do anything at all. Try to be as concrete as you can be.
Try this one.
String m1 = selenium.getText("xpath=/html/body/div[3]/div[2]/div/h4/a");
selenium.click("xpath=/html/body/div[3]/div[2]/div/h4/a");
selenium.waitForPageToLoad("30000");
// your variable is named m1, but m_1 was used here
assertTrue(selenium.getText("css=h3").contains(m1));
By the way, there are even better XPath expressions you could use. See the documentation, it really is helpful. Just an example, this would work, too, and is much easier to write and read:
String m1 = selenium.getText("xpath=//a[@href='/address_shops/']");
selenium.click("xpath=//a[@href='/address_shops/']");
Sorry, I didn't notice the page link. A CSS locator for the second link could be something like css=div.deff_one_column a[href='/address_shops/']
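The different results reported in the summary come down to how XPath applies a position predicate: in //a[...][2] the [2] counts among the matching children of each parent, while in //descendant::a[...][2] or (//a[...])[2] it counts across the document. Here is a small sketch with the JDK's XPath engine. The markup is a made-up stand-in for the real page, assuming the first matching link sits in a hidden menu and the second in div.deff_one_column.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

class PositionPredicateDemo {
    // Hypothetical page: two links with the same href under different parents.
    static final String PAGE = "<html><body>"
        + "<div class=\"menu\"><a href=\"/address_shops/\">hidden link</a></div>"
        + "<div class=\"deff_one_column\"><h4><a href=\"/address_shops/\">visible link</a></h4></div>"
        + "</body></html>";

    /** Evaluates an XPath 1.0 expression against PAGE and returns its string value. */
    static String eval(String expr) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        try {
            return xpath.evaluate(expr, new InputSource(new StringReader(PAGE)));
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Bad XPath: " + expr, e);
        }
    }

    public static void main(String[] args) {
        String link = "a[@href='/address_shops/']";
        // First match in document order: the hidden one.
        System.out.println(eval("//" + link));
        // No parent has two matching <a> children, so nothing is selected.
        System.out.println(eval("count(//" + link + "[2])"));
        // Both of these count across the document instead of per parent.
        System.out.println(eval("//descendant::" + link + "[2]"));
        System.out.println(eval("(//" + link + ")[2]"));
    }
}
```

Of the two working forms, (//a[...])[2] is probably the easier one to read.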
