breakable slashes everywhere but URLs - python-sphinx

I generate pdf (latex) from restructured text using python sphinx (1.4.6) .
I use narrow table column headers with texts like "stuff/misc/other". I need the slashes to be breakable, so the table headers don't overflow into the next column.
The LaTeX solution is to use \BreakableSlash or \slash where necessary. I can use python code to replace all slashes:
from sphinx.util.texescape import tex_replacements
# \BreakableSlash needs package hyphenat to be loaded
tex_replacements.append((u'/', ur'\BreakableSlash ') )
# tex_replacements.append((u'/', ur'\slash ') )
But that will break any URL like http://www.example.com/ into something like
http:\unhbox\voidb#x\penalty\#M\hskip\z#skip/\discretionary{-}{}{}\penalty\#M\hskip\z#skip\unhbox\voidb#x\penalty\#M\hskip\z#skip/\discretionary{-}{}{}\penalty\#M\hskip\z#skipwww.example.com
or
http:/\penalty\exhyphenpenalty/\penalty\exhyphenpenaltywww.example.com
I'd like to use a general solution that works in both cases, where the editor of the documentation can still use normal ReST and doesn't have to worry about latex.
Any idea how to get classic slashes in URLs and breakable slashes everywhere else?

You have not really given data and source code and only asked for an idea, so I take the liberty of only sketching a solution in pseudo code:
Split the document into a list of strings at each position of a space using .split()
For each string, check whether it is an URL by comparing its left side to http:// (and maybe also ftp://, https:// or similar tags)
Do replacements, but only in strings which are no URLs
Recombine all strings including the spaces again, using a command such as " ".join(my_list)

One way to do it, might be to write a Transform subclass. And then use add transform in setup(app) to use it in every read.
I could use DefaultSubstitutions from transforms.py as template for my own class.

Related

How do I filter file names out of a SQLite dump?

I'm trying to filter out all file names from an SQLite text dump using Ruby. I'm not very handy/familiar with regex and need a way to read, and write to a file, another dump of image files that are within the SQLite dump. I can filter out everything except stuff like this:
VALUES(3,5,1,43,'/images/e/e5/Folder%2FOrders%2FFinding_Orders%2FView_orders3.JPG','1415',NULL);
and this:
src="/images/9/94/folder%2FGraph.JPG"
I can't figure out the easiest way to filter through this. I've tried using split and other functions, but instead of splitting the string into an array by the character specified, it just removed the character.
You should be able to use .gsub('%2', ' ') the %2 with a space, while quoted, it should be fine.
Split does remove the character that is being split, though. So you may not want to do that, or if you do, you may want to use the Array#join method with the argument of the character you split with to put it back in.
I want to 'extract' the file name from the statements above. Say I have src="/images/9/94/folder%2FGraph.JPG", I want folder%2FGraph.JPG to be extracted out.
If you want to extract what is inside the src parameter:
foo = 'src="/images/9/94/folder%2FGraph.JPG"'
foo[/^src="(.+)"/, 1]
=> "/images/9/94/folder%2FGraph.JPG"
That returns a string without the surrounding parenthesis.
Here's how to do the first one:
bar = "VALUES(3,5,1,43,'/images/e/e5/Folder%2FOrders%2FFinding_Orders%2FView_orders3.JPG','1415',NULL);"
bar.split(',')[4][1..-2]
=> "/images/e/e5/Folder%2FOrders%2FFinding_Orders%2FView_orders3.JPG"
Not everything in programming is a regex problem. Somethings, actually, in my opinion, most things, are not candidates for a pattern. For instance, the first example could be written:
foo.split('=')[1][1..-2]
and the second:
bar[/'(.+?)'/, 1]
The idea is to use whichever is most clean and clear and understandable.
If all you want is the filename, then use a method designed to return only the filename.
Use one of the above and pass its output to File.basename. Filename.basename returns only the filename and extension.

How to handle two dashes in ReST

I'm using Sphinx to document a command line utility written in Python. I want to be able to document a command line option, such as --region like this:
**--region** <region_name>
in ReST and then use Sphinx to to generate my HTML and man pages for me.
This works great when generating man pages but in the generated HTML, the -- gets turned into - which is incorrect. I have found that if I change my source ReST document to look like this:
**---region** <region_name>
The HTML generates correctly but now my man pages have --- instead of --. Also incorrect.
I've tried escaping the dashes with a backslash character (e.g. \-\-) but that had no effect.
Any help would be much appreciated.
This is a configuration option in Sphinx that is on by default: the html_use_smartypants option (http://sphinx-doc.org/config.html?highlight=dash#confval-html_use_smartypants).
If you turn off the option, then you will have to use the Unicode character '–' if you want an en-dash.
With
**-\\-region** <region_name>
it should work.
In Sphinx 1.6 html_use_smartypants has been deprecated, and it is no longer necessary to set html_use_smartypants = False in your conf.py or as an argument to sphinx-build. Instead you should use smart_quotes = False.
If you want to use the transformations formerly provided by html_use_smartypants, instead it is recommended to use smart_quotes, e.g., smart_quotes = True.
Note that at the time of this writing Read the Docs pins sphinx==1.5.3, which does not support the smart_quotes option. Until then, you'll need to continue using html_use_smartypants.
EDIT It appears that Sphinx now uses smartquotes instead of docutils smart_quotes. h/t #bad_coder.
To add two dashes, add the following:
.. include:: <isotech.txt>
|minus|\ |minus|\ region
Note the backward-slash and the space. This avoids having a space between the minus signs and the name of the parameter.
You only need to include isotech.txt once per page.
With this solution, you can keep the extension smartypants and write two dashes in every part of the text you need. Not just in option lists or literals.
As commented by #mzjn, the best way to address the original submitter's need is to use Option Lists.
The format is simple: a sequence of lines that start with -, --, + or /, followed by the actual option, (at least) two spaces and then the option's description:
-l long listing
-r reversed sorting
-t sort by time
--all do not ignore entries starting with .
The number of spaces between option and description may vary by line, it just needs to be at least two, which allows for a clear presentation (as above) on the source, as well as on the generated document.
Option Lists have syntax for an option argument as well (just put an additional word or several words enclosed in <> before the two spaces); see the linked page for details.
The other answers on this page targeted the original submitter's question, this one addresses their actual need.

C# MVC3 and non-latin characters

I have my database results (áéíóúàâêô...) and when I display any of this characters I get codes like:
á
My controller is like this:
ViewBag.EstadosDeAlma = (from e in db.EstadosDeAlma select e.Title).ToList();
My cshtml page is like this:
var data = '#foreach (dynamic item in ViewBag.EstadosDeAlma){ #(item + " ") }';
In addition, if I use any rich text editor as Tiny MCE all non-latin characters are like this too.
What should I do to avoid this problem?
What output encoding are you using on your web pages? I would suggest using UTF-8 since you want a lot of non-ascii characters to work.
I think you should HTML encode/decode the values before comparing them.
Since you are using jQuery you can take advantage of the encoding functions built-in into it. For example:
$('<div/>').html('& #225;gil').html()
gives you "ágil" (notice that I added an extra space between the & and the # so that stackoverflow does not encode it, you won't need it)
This other question has more information about this.
HTML-encoding lost when attribute read from input field

Inserting characters before whatever is on a line, for many lines

I have been looking at regular expressions to try and do this, but the most I can do is find the start of a line with ^, but not replace it.
I can then find the first characters on a line to replace, but can not do it in such a way with keeping it intact.
Unfortunately I don´t have access to a tool like cut since I am on a windows machine...so is there any way to do what I want with just regexp?
Use notepad++. It offers a way to record an sequence of actions which then can be repeated for all lines in the file.
Did you try replacing the regular expression ^ with the text you want to put at the start of each line? Also you should use the multiline option (also called m in some regex dialects) if you want ^ to match the start of every line in your input rather than just the first.
string s = "test test\ntest2 test2";
s = Regex.Replace(s, "^", "foo", RegexOptions.Multiline);
Console.WriteLine(s);
Result:
footest test
footest2 test2
I used to program on the mainframe and got used to SPF panels. I was thrilled to find a Windows version of the same editor at Command Technology. Makes problems like this drop-dead simple. You can use expressions to exclude or include lines, then apply transforms on just the excluded or included lines and do so inside of column boundaries. You can even take the contents of one set of lines and overlay the contents of another set of lines entirely or within column boundaries which makes it very easy to generate mass assignments of values to variables and similar tasks. I use Notepad++ for most stuff but keep a copy of SPFSE around for special-purpose editing like this. It's not cheap but once you figure out how to use it, it pays for itself in time saved.

Sphinx char_set resource?

I need to have Sphinx index some characters that are apparently part of the internal char_set, notably "," and "&".
As far as I understand there is no way (?!) to simply add these as indexable chars, I need to
Make a char_set table in my index
Not only include the "," and "&" as indexable characters but now manually add the char_set Sphinx uses
This is a little frustrating as it seems one should be able to add back in chars without manually recreating the char_set used internally. However if that is the situation that is the situation, yet in the documentation
http://sphinxsearch.com/docs/current/conf-charset-table.html
I don't see or understand how I would manually specify a char_set table and exclude / index the two or three characters I want.
You do need to duplicate the 'stock' charset_table before you can add to it.
At less than 100 characters, even if you typed it rather than copy/pasting, would still be les effort than typing this whole question!
(even less if use the the english shorthand)
Just copy what there and add your new chars to the end. If you don't know the unicode mapping, something like asciitable.com is useful at least for the chars in ASCII.

Resources