I'm using YAML as the input format for a simulator, so that both computers and humans can read and edit it. For human readability, some parts of the input suit block style, while flow style suits others better.
The default for PyYAML is to use block style wherever there are nested maps or sequences, and flow style everywhere else. *default_flow_style* allows one to choose all-flow-style or all-block-style.
But I'd like to output files more of the form
bonds:
- { strength: 2.0 }
- ...
tiles:
- { color: red, edges: [1, 0, 0, 1], stoic: 0.1}
- ...
args:
  block: 2
  Gse: 9.4
As can be seen, this doesn't follow a consistent style throughout; the style changes depending on the part of the file. Essentially, I'd like to be able to specify that all values in some block-style sequences be in flow style. Is there some way to get that sort of fine-grained control over dumping? Being able to dump the top-level mapping in a particular order without requiring that order (e.g., omap) would be nice as well for readability.
It turns out this can be done by defining subclasses with representers for each item I want not to follow default_flow_style, and then converting everything necessary to those before dumping. In this case, that means I get something like:
import yaml

class blockseq(dict): pass

def blockseq_rep(dumper, data):
    # Force block style for this mapping, regardless of default_flow_style.
    return dumper.represent_mapping(u'tag:yaml.org,2002:map', data, flow_style=False)

class flowmap(dict): pass

def flowmap_rep(dumper, data):
    # Force flow style for this mapping.
    return dumper.represent_mapping(u'tag:yaml.org,2002:map', data, flow_style=True)

yaml.add_representer(blockseq, blockseq_rep)
yaml.add_representer(flowmap, flowmap_rep)

def dump(st):
    # Convert the parts that should not follow the default style before dumping.
    st['tiles'] = [flowmap(x) for x in st['tiles']]
    st['bonds'] = [flowmap(x) for x in st['bonds']]
    if 'xgrowargs' in st.keys(): st['xgrowargs'] = blockseq(st['xgrowargs'])
    return yaml.dump(st)
Annoyingly, the easier-to-use dumper.represent_list and dumper.represent_dict don't allow flow_style to be specified, so I have to specify the tag, but the system does work.
I am trying to customize my Awesome Window Manager to change the tag numbers into Roman numerals (changing 1 to I, 2 to II...). In order to achieve this, I am modifying my /etc/xdg/awesome/rc.lua file, specifically the {{tags}} section.
I have found this blog post, in which the author manages to edit the tag names at will; have a look at the top left corner:
I also read the rc.lua file attached to the theme, and realized the technique used for what I want to do is a for loop in combination with some tables.
This is the code snippet of interest in the file:
-- {{{ Tags
-- Define a tag table which hold all screen tags.
tags = {}
tagnames = { "irc", "mpd", "net", "usr", "png", "msg", }
taglayouts = {
awful.layout.suit.tile.top,
awful.layout.suit.tile.bottom,
awful.layout.suit.floating,
awful.layout.suit.fair,
awful.layout.suit.floating,
awful.layout.suit.floating }
for s = 1, screen.count() do
-- Each screen has its own tag table.
tags[s] = {}
for tagnumber = 1, 6 do
-- Add tags and name them.
tags[s][tagnumber] = tag(tagnames[tagnumber])
-- Add tags to screen one by one, giving them their layouts at the same time.
tags[s][tagnumber].screen = s
awful.layout.set(taglayouts[tagnumber], tags[s][tagnumber])
end
-- I'm sure you want to see at least one tag.
tags[s][1].selected = true
end
-- }}}
...and this is my rc.lua file:
-- {{{ Tags
-- Define a tag table which hold all screen tags.
tags = {}
tagnames = { "I", "II", "III", "IV", "V", "VI", "VII", "VIII", "IX", }
taglayouts = {
awful.layout.suit.tile.top,
awful.layout.suit.tile.bottom,
awful.layout.suit.floating,
awful.layout.suit.fair,
awful.layout.suit.floating,
awful.layout.suit.floating }
for s = 1, screen.count() do
-- Each screen has its own tag table.
-- tags[s] = awful.tag({ "1", "2", "3", "4", "5", "6", "7", "8",$
tags[s] = {}
for tagnumber = 1, 9 do
tags[s][tagnumber] = tag(tagnames[tagnumber])
tags[s][tagnumber].screen = s
awful.layout.set(taglayouts[tagnumber], tags[s][tagnumber])
end
tags[s][1].selected = true
end
--- }}}
As you can see, they are pretty much the same, with the difference that I have nine tags instead of six (I changed the code accordingly). When I try to debug the setup using Xephyr, an error appears in the console and I am only able to see my wallpaper:
error while running function
stack traceback:
[C]: in global 'tag'
/etc/xdg/awesome/rc.lua:100: in main chunk
error: /etc/xdg/awesome/rc.lua:100: bad argument #2 to 'tag' (table expected, got string)
error while running function
stack traceback:
[C]: in global 'tag'
/etc/xdg/awesome/rc.lua:100: in main chunk
error: /etc/xdg/awesome/rc.lua:100: bad argument #2 to 'tag' (table expected, got string)
E: awesome: main:605: couldn't find any rc file
I can't see where the error is, as I am not able to detect any language violation in the error line tags[s][tagnumber] = tag(tagnames[tagnumber]): it's just filling the tags array with my custom names, telling it to treat them as a tag and not as a random string.
UPDATE: I have just realized that there are six layouts in taglayouts, the same number as tags in the original Lua file. I think I should have nine tag layouts, but I don't know which ones I should add. Also, I don't see this as a critical impediment for the code to compile properly, as the error line does not have anything to do with the layout list.
UPDATE 2: Added three more awful.layout.suit.floating to taglayouts. Same error.
Following another answer, I replaced my {Tags} section with:
-- {{{ Tags
-- Define a tag table which hold all screen tags.
tagnum = { "I", "II", "III", "IV", "V", "VI", "VII",
"VIII", "IX" }
for i = 1, 9 do
awful.tag.add((tagnum[i]), {
layout = awful.layout.suit.tile,
master_fill_policy = "master_width_factor",
gap_single_client = true,
gap = 15,
screen = s,
})
end
-- }}}
This creates nine tags, one per loop iteration, with their names taken from the tagnum table. It is only useful if you want to create identical tags, but it is much cleaner than typing out nine separate definitions.
A MUCH BETTER, CLEANER WAY:
The initial solution was useful, but it had a problem: when starting AwesomeWM, you won't appear in a defined tag, but in all of them at the same time. That is, if you open a terminal, you will open it in every tag you have unless you previously selected one with Mod4+TagNum (following default conf.).
Trying to solve this problem, I compared the default configuration file with the modded one, and I realized it all worked well in the default one. So I started modifying the code in order to find a solution. In all, I have discovered that with a minimal modification of the default code you are able to customize your tag names at will. This is how I did it:
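-- {{{ Tags
tags = {}
-- Generates tags with custom names
for s = 1, screen.count() do
    -- Assumption: s and layouts[1] are passed as in the default rc.lua,
    -- where layouts is the layout table defined earlier in that file.
    tags[s] = awful.tag({ "I", "II", "III", "IV", "V", "VI", "VII", "VIII", "IX" }, s, layouts[1])
end
-- }}}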
P.S. I keep the old solution in case someone would wish to use the code for another purpose.
Not an official answer yet, but yesterday I wrote more doc about this:
https://github.com/awesomeWM/awesome/pull/1279/files#diff-df495cc7fcbd48cd2698645bca070ff9R39
It is for Awesome 4.0, but in this case not much changed, so the example is almost valid (the gap property is not available in 3.4/3.5).
Also, if you wish to set up complex tags, I would suggest my Tyrannical module (Awesome 3.5+) or Shifty (Awesome 3.2-3.4). They are designed to make this much easier.
I'm making an app that will translate roleplaying-style messages into something much more generic. The user has the ability to specify their preferences, like:
Moves
- /me <move>
- *<move>*
Speech
- <speech>
- "<speech>"
Out-of-Character
- [<ooc>]
- ((ooc))
- //ooc
I need to parse a message like this:
/me eats food "This is *munch* good!" [You're good at this]
or like this:
*eats food* This is *munch* good! ((You're good at this))
into a more generic, XML-like string like this:
<move>eats food <speech>This is <move>munch</move> good!</speech> <ooc>You're good at this</ooc></move>
but with regard to which is inside which. For example:
*eats food "This is munch* good" // You're good at this
should be parsed as:
<move>eats food "This is munch</move><speech> good" </speech><ooc> You're good at this</ooc>
even if that's not what the user intended. Note that the quotes in this last example weren't parsed because they didn't wrap a complete segment, and the current move segment had not finished by the time the first was encountered, and speech had already started when the second one was, and the second one didn't have another after it to surround a separate speech segment.
I've tried doing this iteratively, recursively, with trees, and even with regexes, but I haven't found a solution that works like I want it to. How do I parse the above RP-style messages into the above generic XML-style messages?
Also important is that the spacing is preserved.
Here are some other examples using the above-listed preferences:
I like roller coasters.
[what are you like?]
/me eats a hamburger // wanna grab lunch after this?
*jumps up and down* This ((the party)) is great!
/me performs *an action* within an action "And that's just fine [As is *an action* in ooc in speech]"
And messages /me can change contexts // at any point
[But ill-formatted ones *must be parsed] according "to* the rules"
-And text formatted in <non-specified ways> is ¬ treated; specially-
become:
<speech>I like roller coasters.</speech>
<ooc>what are you like?</ooc>
<move>eats a hamburger <ooc> wanna grab lunch after this?</ooc></move>
<move>jumps up and down</move><speech> This <ooc>the party</ooc> is great!</speech>
<move>performs <move>an action</move> within an action <speech>And that's just fine <ooc>As is <move>an action</move> in ooc in speech</ooc></speech></move>
<speech>And messages <move>can change contexts <ooc> at any point</ooc></move></speech>
<ooc>But ill-formatted ones *must be parsed</ooc><speech> according <speech>to* the rules</speech></speech>
<speech>-And text formatted in <non-specified ways> is ¬ treated; specially-</speech>
What you have is a bunch of tokens that should trigger an xml tag. It is fairly straightforward to implement this using a function for each tag.
void move() {
    xmlPrintWriter.println("<move>");
    parse();
    xmlPrintWriter.println(content);
    xmlPrintWriter.println("</move>");
}
Where the parse() consumes and classifies the input text.
void parse() {
    if (text.startsWith("*")) action = MOVE;
    // ... other cases
    if (action == MOVE) {
        move();
    }
    // ... other actions
}
The parse method has to check for all possible state changers: "*" -> move, "((" -> ooc, "\"" -> speech, and so on.
Here MOVE is a class constant and action is a state variable, along with text and xmlPrintWriter. move and parse are both methods.
This approach will not work though if you allow your last example. Then the situation becomes extremely hairy and would need to be decided on a case by case basis.
Something to this effect might do:
public static RPMessageSegment split(RPMessageSegment text)
{
    ArrayList<RPMessageSegment> majorSegments = new ArrayList<>();
    scan: for(int i = 0, l = text.length() - 1; i < l; i++)
    {
        for(Delimiter d : delimiters)
        {
            if (d.startsWith(text, i))
            {
                RPMessageSegment newSegment = d.extractSegment(text, i);
                i += newSegment.lengthWithOriginalDelimiters();
                majorSegments.add(newSegment);
                continue scan;
            }
        }
    }
    if (majorSegments.size() == 1)
        return majorSegments.get(0);
    for(int i = 0, l = majorSegments.size(); i < l; i++)
    {
        majorSegments.set(i, split(majorSegments.get(i)));
    }
    return new RPMessageSegment(majorSegments);
}
Of course, this presumes that the referenced classes have these methods that respond as one might expect. They shouldn't be terribly hard to imagine, not to mention write.
After it's parsed into RPMessageSegments, those can easily be echoed out into strings surrounded by XML-style tags
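The scan-and-recurse idea above can be sketched in Python. This is only a minimal sketch under simplifying assumptions: it handles paired delimiters that are well-formed, it does not implement /me, //, or the default speech context, and it does not reproduce the ill-formatted cases from the question. The DELIMITERS table and the to_xml name are made up for illustration.

```python
# Hypothetical delimiter table: (opener, closer, tag). Not part of the original posts.
DELIMITERS = [
    ("((", "))", "ooc"),
    ("[", "]", "ooc"),
    ("*", "*", "move"),
    ('"', '"', "speech"),
]


def to_xml(text):
    """Wrap each paired-delimiter segment in its tag, recursing into the segment."""
    out = []
    i = 0
    while i < len(text):
        for opener, closer, tag in DELIMITERS:
            if not text.startswith(opener, i):
                continue
            start = i + len(opener)
            end = text.find(closer, start)
            if end == -1:
                continue  # no closing delimiter: fall through and keep it as plain text
            inner = to_xml(text[start:end])  # nested segments are handled recursively
            out.append("<%s>%s</%s>" % (tag, inner, tag))
            i = end + len(closer)
            break
        else:
            # No delimiter matched here; copy the character unchanged (spacing is preserved).
            out.append(text[i])
            i += 1
    return "".join(out)


print(to_xml('*eats food* "This is *munch* good!" ((You\'re good at this))'))
# prints: <move>eats food</move> <speech>This is <move>munch</move> good!</speech> <ooc>You're good at this</ooc>
```

Because the closing delimiter is found with a plain forward search, a segment that contains its own delimiter type (speech inside ooc inside speech, say) would be cut short; handling that would need a stack of open contexts.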
This is a repost of my question in the Google Group. Hopefully I will get some response here.
Frequently I run into this problem. I want to generate a line of text if the text is not empty. If it is empty, do not generate the line. Illustration template:
namespace @classSpec.getNamespace()
@classSpec.getComment()
class @classSpec.getName() {
...
}
If @classSpec.getComment() returns meaningful comment text, the result looks like
namespace com.example
// this is comment
class MyClass {
...
}
But if there is no comment, it will be
namespace com.example
class MyClass {
...
}
Notice the extra empty line? I do not want it. Currently the solution is to write template as
namespace @classSpec.getNamespace()
@classSpec.getComment()class @classSpec.getName() {
...
}
and make sure the getComment() will append a "\n" to the return value. This makes the template much less readable. Also, imagine I need to generate a function with multiple parameters in a for loop. If each parameter requires complex logic of template code, I need to make them all written in one line as above. Otherwise, the result file will have function like
function myFunction(
String stringParam,
Integer intParam,
Long longParam
)
The core problem is, the template file does not only contain scripts, but also raw text to be written in the output. For script part, we want newlines and indentations. We want the space to be trimmed just like what compilers usually do. But for raw text, we want the spaces to be exact as specified in the file. I feel we need a bit more raw text control mechanism to reconcile the two parts.
Specific to this case, is there some special symbol that makes multiple template lines come out as a single line in the output? For example, could we write something like
namespace @classSpec.getNamespace()
@classSpec.getComment()\\
class @classSpec.getName() {
...
}
Thanks!
This is a known bug; see:
https://github.com/greenlaw110/Rythm/issues/259
https://github.com/greenlaw110/Rythm/issues/232
Unfortunately there is no proper work-around for this yet. You might want to add your comments to the bugs above and reference your question.
Take the example below which you can try out at http://fiddle.rythmengine.org/#/editor
@def setXTimesY(int x,int y) { @{ int result=x*y;} @(result)}
1
2 a=@setXTimesY(2,3)
3 b=@setXTimesY(3,5)
4 c=@setXTimesY(4,7)
5
this will properly create the output:
1
2 a= 6
3 b= 15
4 c= 28
5
now try to beautify the #def setXTimesY ...
@def setXTimesY(int x,int y) {
@{
int result=x*y;
}@(result)}
1
2 a=@setXTimesY(2,3)
3 b=@setXTimesY(3,5)
4 c=@setXTimesY(4,7)
will give a wrong result
1
2 a=(result)
3 b=(result)
4 c=(result)
@def setXTimesY(int x,int y) {
@{
int result=x*y;
} @(result)}
1
2 a=@setXTimesY(2,3)
3 b=@setXTimesY(3,5)
4 c=@setXTimesY(4,7)
is better but adds a space
So
https://github.com/greenlaw110/Rythm/issues/270
is another bug along the same lines
I'm experiencing the same problem. I've not been able to find a solution within Rythm itself.
To obtain a single line as the result of processing several lines in the template, I've had to implement my own mechanism, in the form of a post-processing step. In the template, at the end of each line that I want to join to the next one, I use a custom symbol/tag as a token. Then, once the template has been processed, I replace that symbol/tag, together with the line break character(s) right after it, with an empty string.
For example, if you used a tag called "#join-next-line#", the template would look like this:
@for (Bar bar : foo.getBars()).join (", ") {
@bar.name#join-next-line#
}
It's not the perfect solution, but it has worked for me.
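The post-processing step described above is independent of the template engine, so it can be sketched in a few lines of Python. This is an illustration only: the function name join_marked_lines is made up, and the token is the "#join-next-line#" tag used in the example above.

```python
import re

# The custom token from the example above; any string that cannot occur in real output works.
JOIN_TAG = "#join-next-line#"

# Match the token plus the line break right after it (if there is one).
_JOIN_RE = re.compile(re.escape(JOIN_TAG) + r"(?:\r\n|\r|\n)?")


def join_marked_lines(rendered):
    """Remove every join token and the line break that follows it."""
    return _JOIN_RE.sub("", rendered)


rendered = "String stringParam, #join-next-line#\nInteger intParam\n"
print(join_marked_lines(rendered))
```

Note that only the token and the line break are removed; any indentation at the start of the following line is kept, so the next template line should start at column zero if no extra spaces are wanted.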
I want to create several files from a single template, which differ only by a variable name. For example :
(file1.rst):
.. |variable| replace:: 1
.. include:: template.rst
(template.rst) :
Variable |variable|
=====================
Image
-------
.. image:: ./images/|variable|-image.png
where of course I have an image called "./images/1-image.png". The substitution of "|variable|" by "1" works well in the title, but not in the image file name, and at compilation I get :
WARNING: image file not readable: ./images/|variable|-image.png
How can I get reST to make the substitution in the image file name too? (In case it changes anything, I am using Sphinx.)
There are two problems here: a substitution problem, and a parsing order problem.
For the first problem, the substitution reference |variable| cannot have adjacent characters (besides whitespace or maybe _ for hyperlinking) or else it won't parse as a substitution reference, so you need to escape it:
./images/\ |variable|\ -image.png
However, the second problem is waiting around the corner. While I'm not certain of the details, it seems reST is unable to parse substitutions inside other directives. I think it first parses the image directive, which puts it in the document tree and thus out of reach of the substitution mechanism. Similarly, I don't think it's possible to use a substitution to insert content intended to be parsed (e.g. .. |img1| replace::`.. image:: images/1-image.png`). This is all speculative based on some tests and my incomplete comprehension of the official documentation, so someone more knowledgeable can correct what I've said here.
I think you're aware of the actual image substitution directive (as opposed to text substitution), but I don't think it attains the generality you're aiming for (you'll still need a separate directive for the image, apart from |variable|). In any case, it looks like this:
.. |img1| image:: images/1-image.png
Since you're using Sphinx, you can try creating your own directive extension (see this answer for information), but it won't solve the substitutions-inside-markup problem.
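A different route, not part of the reST substitution mechanism discussed above, is to generate the .rst sources from the template before Sphinx runs. The sketch below uses plain Python string formatting; the TEMPLATE text mirrors template.rst from the question, while the render_page name and the {variable} placeholder are assumptions made for illustration.

```python
# Template text modelled on template.rst from the question;
# {variable} is a str.format placeholder, not reST syntax.
TEMPLATE = """\
Variable {variable}
=====================

Image
-------

.. image:: ./images/{variable}-image.png
"""


def render_page(variable):
    """Return the reST source for one value of the variable."""
    return TEMPLATE.format(variable=variable)


# Example: build the sources for file1.rst, file2.rst, ... in memory.
pages = {"file%s.rst" % v: render_page(v) for v in ("1", "2", "3")}
print(pages["file1.rst"])
```

Writing each entry of pages to disk before the build gives Sphinx ordinary files with literal image paths, so no substitution inside the image directive is needed.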
You have to create a custom directive in this case, as Sphinx doesn't allow you to substitute image paths. You can adapt the Sphinx figure directive as follows and use it instead of the image directive.
from typing import Any, Dict, List, cast
from docutils import nodes
from docutils.nodes import Node
from docutils.parsers.rst import directives
from docutils.parsers.rst.directives import images
if False:
    # For type annotation
    from sphinx.application import Sphinx
class CustomFigure(images.Figure):
    """The figure directive which applies `:name:` option to the figure node
    instead of the image node.
    """
    def run(self) -> List[Node]:
        name = self.options.pop('name', None)
        path = self.arguments[0]  # e.g. path = ./images/variable-image.png
        # replace 'variable' in the path with the desired value
        self.arguments[0] = path.replace("variable", "string substitution")
        result = super().run()
        if len(result) == 2 or isinstance(result[0], nodes.system_message):
            return result
        assert len(result) == 1
        figure_node = cast(nodes.figure, result[0])
        if name:
            # set ``name`` to figure_node if given
            self.options['name'] = name
            self.add_name(figure_node)
        # copy lineno from image node
        if figure_node.line is None and len(figure_node) == 2:
            caption = cast(nodes.caption, figure_node[1])
            figure_node.line = caption.line
        return [figure_node]
def setup(app: "Sphinx") -> Dict[str, Any]:
    # register under the name used in the documents: .. customfigure::
    directives.register_directive('customfigure', CustomFigure)
    return {
        'version': 'builtin',
        'parallel_read_safe': True,
        'parallel_write_safe': True,
    }
You can add this CustomFigure.py module to the extensions list in the project's conf.py and then use the customfigure directive across the Sphinx project instead of the image directive. Refer to http://www.sphinx-doc.org/en/master/usage/extensions/index.html for how to add a custom directive to your Sphinx project.
I'm using XPath with Scrapy to scrape data off of a movie website BoxOfficeMojo.com.
As a general question: I'm wondering how to select certain child nodes of one parent node all in one Xpath string.
Depending on the movie page I'm scraping, the data I need is sometimes located in different child nodes, for instance depending on whether or not there is a link. I will be going through about 14000 movies, so this process needs to be automated.
Using this as an example. I will need actor/s, director/s and producer/s.
This is the XPath to the director. Note: the %s corresponds to a determined index where that information is found; in the Action Jackson example, the director is found at [1] and the actors at [2].
//div[@class="mp_box_content"]/table/tr[%s]/td[2]/font/text()
However, should a link exist to a page on the director, this would be the XPath:
//div[@class="mp_box_content"]/table/tr[%s]/td[2]/font/a/text()
Actors are a bit more tricky, as there are <br> elements included for subsequent actors listed, which may be children of an /a or children of the parent /font, so:
//div[@class="mp_box_content"]/table/tr[%s]/td[2]/font//a/text()
gets almost all of the actors (except those with font/br).
Now, the main problem here, I believe, is that there are multiple //div[@class="mp_box_content"] elements: everything I have works EXCEPT that I also end up getting some digits from other mp_box_content boxes. Also, I have added numerous try:, except: statements in order to get everything (actors, directors, producers who both have and do not have links associated with them). For example, the following is my Scrapy code for actors:
actors = hxs.select('//div[@class="mp_box_content"]/table/tr[%s]/td[2]/font//a/text()' % (locActor,)).extract()
try:
    second = hxs.select('//div[@class="mp_box_content"]/table/tr[%s]/td[2]/font/text()' % (locActor,)).extract()
    for n in second:
        actors.append(n)
except:
    actors = hxs.select('//div[@class="mp_box_content"]/table/tr[%s]/td[2]/font/text()' % (locActor,)).extract()
This is an attempt to cover for the facts that: the first actor may not have a link associated with him/her and subsequent actors do, the first actor may have a link associated with him/her but the rest may not.
I appreciate the time taken to read this and any attempts to help me find/address this problem! Please let me know if any more information is needed.
I am assuming you are only interested in textual content, not the links to actors' pages etc.
Here is a proposition using lxml.html (and a bit of lxml.etree) directly
First, I recommend you select td[2] cells by the text content of td[1], with expressions like .//tr[starts-with(td[1], "Director")]/td[2] to account for "Director", or "Directors"
Second, testing various expressions with or without <font>, with or without <a> etc., makes code difficult to read and maintain, and since you're interested only in the text content, you might as well use string(.//tr[starts-with(td[1], "Actor")]/td[2]) to get the text, or use lxml.html.tostring(e, method="text", encoding=unicode) on selected elements
As for the <br> issue with multiple names, what I generally do is modify the lxml tree containing the targeted content to add a special formatting character to the <br> elements' .text or .tail, for example a \n, using one of lxml's iter() functions. This can be useful on other HTML block elements, like <hr> for example.
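The marker idea can be tried in isolation. The sketch below uses the standard library's xml.etree.ElementTree as a stand-in for lxml (an assumption made to keep it dependency-free; ElementTree needs well-formed markup), and the table cell is a made-up sample shaped like the page described in the question.

```python
import xml.etree.ElementTree as ET

MARKER = "|"


def split_on_br(cell):
    """Return the text of `cell` as a list, split at every <br> element."""
    # Put the marker inside each <br>, so it appears in the extracted text.
    for br in cell.iter("br"):
        br.text = MARKER
    # itertext() yields all text in document order, links included.
    text = "".join(cell.itertext())
    return [name.strip() for name in text.split(MARKER) if name.strip()]


# Made-up sample: first and last names are links, the middle one is bare text.
cell = ET.fromstring(
    '<td><font><a href="#">Carl Weathers</a><br/>'
    'Craig T. Nelson<br/><a href="#">Sharon Stone</a></font></td>'
)
print(split_on_br(cell))
# prints: ['Carl Weathers', 'Craig T. Nelson', 'Sharon Stone']
```

The same list comes out whether or not a given name is wrapped in a link, which is what removes the need for separate expressions with and without <a>.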
You may see better what I mean with some spider code:
from scrapy.spider import BaseSpider
from scrapy.selector import HtmlXPathSelector
import lxml.etree
import lxml.html

MARKER = "|"

def br2nl(tree):
    for element in tree:
        for elem in element.iter("br"):
            elem.text = MARKER

def extract_category_lines(tree):
    if tree is not None and len(tree):
        # modify the tree by adding a MARKER after <br> elements
        br2nl(tree)
        # use lxml's .tostring() to get a unicode string
        # and split lines on the marker we added above
        # so we get lists of actors, producers, directors...
        return lxml.html.tostring(
            tree[0], method="text", encoding=unicode).split(MARKER)

class BoxOfficeMojoSpider(BaseSpider):
    name = "boxofficemojo"
    start_urls = [
        "http://www.boxofficemojo.com/movies/?id=actionjackson.htm",
        "http://www.boxofficemojo.com/movies/?id=cloudatlas.htm",
    ]
    # locate 2nd cell by text content of first cell
    XPATH_CATEGORY_CELL = lxml.etree.XPath('.//tr[starts-with(td[1], $category)]/td[2]')

    def parse(self, response):
        root = lxml.html.fromstring(response.body)
        # locate the "The Players" table
        players = root.xpath('//div[@class="mp_box"][div[@class="mp_box_tab"]="The Players"]/div[@class="mp_box_content"]/table')
        # we have only one table in "players" so the for loop is not really necessary
        for players_table in players:
            directors_cells = self.XPATH_CATEGORY_CELL(players_table,
                category="Director")
            actors_cells = self.XPATH_CATEGORY_CELL(players_table,
                category="Actor")
            producers_cells = self.XPATH_CATEGORY_CELL(players_table,
                category="Producer")
            writers_cells = self.XPATH_CATEGORY_CELL(players_table,
                category="Writer")
            composers_cells = self.XPATH_CATEGORY_CELL(players_table,
                category="Composer")
            directors = extract_category_lines(directors_cells)
            actors = extract_category_lines(actors_cells)
            producers = extract_category_lines(producers_cells)
            writers = extract_category_lines(writers_cells)
            composers = extract_category_lines(composers_cells)
            print "Directors:", directors
            print "Actors:", actors
            print "Producers:", producers
            print "Writers:", writers
            print "Composers:", composers
            # here you should of course populate scrapy items
The code can be simplified for sure, but I hope you get the idea.
You can do similar things with HtmlXPathSelector of course (with the string() XPath function for example), but without modifying the tree for <br> (how to do that with hxs?) it works only for non-multiple names in your case:
>>> hxs.select('string(//div[@class="mp_box"][div[@class="mp_box_tab"]="The Players"]/div[@class="mp_box_content"]/table//tr[contains(td, "Director")]/td[2])').extract()
[u'Craig R. Baxley']
>>> hxs.select('string(//div[@class="mp_box"][div[@class="mp_box_tab"]="The Players"]/div[@class="mp_box_content"]/table//tr[contains(td, "Actor")]/td[2])').extract()
[u'Carl WeathersCraig T. NelsonSharon Stone']