I have a project that contains a library and some applications, and I am including man pages. I am generating the man pages using ronn, and I have a rule in my makefile that looks like:
%.1 : %.1.ronn
	$(RONN) -r $<
This works well: I write the pages in Markdown, and ronn happily spits out the man pages.
The problem is that not all of my man pages are for section 1 of the manual. I would like to put the library pages in section 3, and I may need other sections later as well. I could define more rules, one for each section, changing the %.1 to %.2, %.3, etc. I was wondering, though, if there is a way to simply write %.n (where n would match any digit, or maybe any single character), to cut down on the number of rules in my makefile.
Is this possible? My Google searches are not turning up anything, and nothing I have tried so far has worked.
Thanks in advance for any help
Use % instead of %.1, and %.ronn instead of %.1.ronn.
If you have files name.1.ronn, name.2.ronn, the % will match name.1 and name.2.
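As a minimal sketch, assuming GNU make and that RONN is already defined, the generalized rule would look like this:
# Match-anything pattern rule: for any target name.N, make looks for name.N.ronn
% : %.ronn
	$(RONN) -r $<
Since % matches any target whatsoever, you may want to make the rule terminal by writing it with a double colon (%:: %.ronn), so make only applies it when the .ronn source actually exists.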
I want to read out, and later process, a value from a website (Facebook Ads) in a bash script that runs daily. Unfortunately, I need to be logged in to get this value.
So far I've figured out how to log into the website in Firefox and save the HTML file from which the value could theoretically be read out.
The only unique identifier in this file is the first instance of "Gesamtausgaben" (total spend). Is there any way, given this, to cut out everything besides "100,10"?
I'd also be happy with a different way of getting this value. And no, I don't have any API access.
I appreciate all ideas.
Thanks,
Patrick
How to Parse HTML (Badly) with PCRE
You can't reliably parse HTML with regular expressions alone, so you'll need an XML/HTML or XPath parser to do this properly. That said, if you have a PCRE-compatible grep, the following will likely work provided the HTML is minified and the class isn't reused elsewhere on the page.
$ pcregrep -o 'span class=".*_3df[ij].*>\K[^<]+' foo.html
100,10 €
If your target HTML spreads across multiple lines, or if you have multiple spans with the same classes assigned, then you'll have to do some work to refine the regular expression and differentiate between which matches are important to you. Context lines or subsequent matches may be helpful, but your mileage will definitely vary.
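For the daily bash run, here is a minimal sketch built on the same regex; foo.html, the _3df[ij] class fragment, and the first-match assumption are all carried over from above, so adjust them to your page:
#!/bin/bash
# Grab the text of the first matching span, e.g. "100,10 €"
value=$(pcregrep -o 'span class=".*_3df[ij].*>\K[^<]+' foo.html | head -n 1)
# Strip the trailing currency sign, keeping just the number for later processing
number=${value%% *}
echo "Gesamtausgaben: $number"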
In markdown I can write:
[example1][myid]
[example2][myid]
[myid]: http://example.com
so I don't have to retype the full external link multiple times.
Is there an analogous feature in AsciiDoc? Specially interested in the Asciidoctor implementation.
So far I could only find:
internal cross references with <<>>
I think I saw a replacement feature of the form :myid:, but I can't find it anymore, and I couldn't see how to use different link text for each occurrence either.
You probably mean something like this, from the AsciiDoc3 user guide, Chapter 28.1, "Setting configuration entries":
...
Attribute entries promote clarity and eliminate repetition
URLs and file names in AsciiDoc3 macros are often quite long; they break paragraph flow, and readability suffers. The problem is compounded by redundancy if the same name is used repeatedly. Attribute entries can be used to make your documents easier to read and write; here are some examples:
:1: http://freshmeat.net/projects/asciidoc3/
:homepage: http://asciidoc3.org[AsciiDoc3 home page]
:new: image:./images/smallnew.png[]
:footnote1: footnote:[A meaningless latin term]
Using previously defined attributes: See the {1}[Freshmeat summary]
or the {homepage} for something new {new}. Lorem ispum {footnote1}.
...
BTW, there is a 100% Python3 port available now: https://asciidoc3.org
I think you are looking for this (both forms below work just fine):
https://www.google.com[Google]
or
link:https://google.com[Google]
Reference:
AsciiDoc User Manual
Update #1: Use of link along with variables in asciidoc
Declare the variable:
:url: https://www.google.com
Then use the variable in either of the formats mentioned above.
Using a link with a label:
{url}[Google]
Using the explicit link macro:
link:{url}[Google]
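Put together, here is a minimal sketch of the Markdown example from the question rewritten with an Asciidoctor attribute (the attribute name myid is carried over from the question):
:myid: http://example.com

{myid}[example1]
{myid}[example2]
Because the attribute resolves to a bare URL before the inline macros are processed, each occurrence can carry its own link text.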
I've got a rather large asciidoc document that I translate dynamically to PDF for our developer guide. Since the doc often refers to Java classes that are documented in our developer guide we converted them into links directly in the docs e.g.:
In this block we create a new
https://www.codenameone.com/javadoc/com/codename1/ui/Form.html[Form]
named `hi`.
This works rather well for the most part and looks great in HTML as every reference to a class leads directly to its JavaDoc making the reference/guide process much simpler.
However, when we generate a PDF, some pages end up with the same link footnoted several times.
Normally I wouldn't mind a lot of footnotes or even repeats from a previous page. However, in this case the link to Container appears 3 times.
I could remove some of the links, but I'd rather not since they make a lot of sense in the web version. Since I also have no idea where the page breaks will land, I'd rather not prune them myself.
This looks to me like a bug somewhere: if the link is the same, the footnote for it should only be generated once.
I'm fine with removing all link footnotes in the document if that is the price to pay, although I'd rather be able to do this on a case-by-case basis so that some links would remain printable.
Adding these two parameters to fo-pdf.xsl removes the footnotes:
<xsl:param name="ulink.footnotes" select="0"></xsl:param>
<xsl:param name="ulink.show" select="0"></xsl:param>
The first parameter disables footnotes, which makes the URLs reappear inline.
The second parameter removes the URLs from the text; links remain active and clickable.
Setting either parameter to a non-zero value turns the corresponding behaviour back on.
Source:
http://docbook.sourceforge.net/release/xsl/1.78.1/doc/fo/ulink.show.html
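For context, a minimal sketch of how the two parameters sit in a customization layer; the import path is an assumption, but any docbook-xsl based fo-pdf.xsl that imports the stock FO stylesheet looks similar:
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <!-- import the stock DocBook FO stylesheet (path is an assumption) -->
  <xsl:import href="docbook-xsl/fo/docbook.xsl"/>

  <!-- 0 = don't render ulinks as footnotes -->
  <xsl:param name="ulink.footnotes" select="0"/>
  <!-- 0 = don't print the URL after the link text; links stay clickable -->
  <xsl:param name="ulink.show" select="0"/>
</xsl:stylesheet>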
We were looking for something similar in a slightly different situation and didn't find a solution. We ended up writing a processor that just stripped away some of the links, e.g. every repeated link to the same URL within a section that started with '==='.
Not an ideal solution, but as far as I know it's the only way.
I am working on a project using a bash shell script. The idea is to grep a wget-retrieved page in order to pick out a certain paragraph. The area I would like to copy usually starts with a
<p><b>
but the paragraph also contains other bits of HTML code, such as anchor tags, that I don't want to be in the output of the grep.
I have tried
cat page.html| grep "<p><b>" >grep.txt
and then I grep the output file, which now contains the paragraph I want
cat grep.txt|grep -v '<p>|<b>|<a>' >grep.txt
but then all it does is clear everything from the file and not read anything. How can I get it to exclude only the HTML code?
I am also trying to follow the links that are in the paragraph I grep, in order to do the same thing with those pages, only 2 levels deep: the main page, and then whatever sub-page(s) stem from its first paragraph. I know this is a tricky idea; hopefully I explained it well enough to get some help. If you have any ideas, any help is appreciated.
Do you have to do this in bash? It seems to me that Python would lend itself to this problem, in particular a library called Beautiful Soup.
I've used it for parsing HTML in the past, and it's the easiest tool I could find; it also has good documentation for dealing with HTML.
Perhaps you could write a standalone Python script that extracts the paragraph and then echoes the string you're after. The script could then be called from inside your bash script if you have bash functions you want to run on the string.
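For example, here is a minimal sketch with Beautiful Soup, assuming Python 3 with bs4 installed; page.html and the <p><b> marker are taken from the question:
# extract_paragraph.py
from bs4 import BeautifulSoup

with open("page.html", encoding="utf-8") as f:
    soup = BeautifulSoup(f, "html.parser")

# Find the first <p> that starts with a <b>, as described in the question
for p in soup.find_all("p"):
    if p.find("b") is not None:
        # get_text() drops the inner tags (<b>, <a>, ...) and keeps only the text
        print(p.get_text(" ", strip=True))
        # The links to follow for the second level are the paragraph's <a href> values
        for a in p.find_all("a", href=True):
            print(a["href"])
        break
From bash you could then capture the output with something like out=$(python3 extract_paragraph.py).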
I know this is 7 years old, but I'm just posting the solution I have with bash:
https://api.jquery.com/jquery.grep/
I want to start a site with a collection of BSD man pages, similar to man.cgi but as static HTML, and one that also includes all the stuff from the ports trees.
I've tried unpacking man/ from all the OpenBSD packages for a recent release, and I've noticed that although some packages provide mdoc pages, in man/man?/page.?, some others only provide terminal formatted pages in man/cat?/page.0.
I can use groff -mdoc -Thtml or mandoc -Txhtml for the mdoc files in man/man?/, but how do I convert the cat files from man/cat?/ into XHTML?
How do those man.cgi scripts at FreeBSD.org and NetBSD.org do this?
In MirBSD we’re delivering all online manpages as static HTML (the actual web CGI is thus very small), and use a crafty script to convert the output of nroff -Tcol foo.1 | col -x to XHTML/1.1 – although, for this to work, we had to tweak nroff(1) and the mdoc and man macropackages (and ms and me etc.) slightly. We only ship all manpages from base, as well as the historic BSD docs, though.
Also, GNU groff has no -Tcol, but -Tascii will work; if you want to use the script with groff output, though, you might need to change the regexps accordingly.
Be extra careful when editing this file: it contains normal UTF-8 stuff as well as extra control characters and invalid byte sequences; if you’re not careful, your editor will corrupt it. (I’m using jupp myself.)
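To give a rough idea of the kind of transformation involved (an illustration, not MirBSD's actual converter): a cat page in man/cat?/page.0 is terminal output with backspace overstriking, so the crudest fallback is to strip the overstriking and wrap the plain text in a <pre> block, losing the bold/underline information a real converter recovers from those backspace sequences:
#!/bin/sh
# Crude cat-page to XHTML fallback; the page path is an example
page=man/cat1/page.0
{
  printf '<pre>'
  # col -b removes the backspace overstriking, -x turns tabs into spaces;
  # sed then escapes the three XML special characters (& must come first)
  col -bx < "$page" | sed -e 's/&/\&amp;/g' -e 's/</\&lt;/g' -e 's/>/\&gt;/g'
  printf '</pre>\n'
} > page.html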
For more live feedback on this, feel free to visit the MirBSD IRC channel.
As to your original goal: I suggest only harvesting manpages from binary packages, because they often get changed during compilation (for example by AC_SUBST in autoconf) or are even generated only as part of the package build.
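A rough sketch of that harvesting step (the packages/ mirror layout and the .tgz naming are assumptions; OpenBSD binary packages are gzipped tarballs):
#!/bin/sh
# Unpack only the man/ subtree of each binary package into its own directory
for pkg in packages/*.tgz; do
  name=$(basename "$pkg" .tgz)
  mkdir -p "extracted/$name"
  # bsdtar globs extraction patterns by default; GNU tar needs --wildcards
  tar -xzf "$pkg" -C "extracted/$name" 'man/*' 2>/dev/null
done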