How can I parse this plaintext RP-style string into a more generic XML-style one?

I'm making an app that will translate roleplaying-style messages into something much more generic. The user has the ability to specify their preferences, like:
Moves
- /me <move>
- *<move>*
Speech
- <speech>
- "<speech>"
Out-of-Character
- [<ooc>]
- ((<ooc>))
- //<ooc>
I need to parse a message like this:
/me eats food "This is *munch* good!" [You're good at this]
or like this:
*eats food* This is *munch* good! ((You're good at this))
into a more generic, XML-like string like this:
<move>eats food <speech>This is <move>munch</move> good!</speech> <ooc>You're good at this</ooc></move>
but with regard to which is inside which. For example:
*eats food "This is munch* good" // You're good at this
should be parsed as:
<move>eats food "This is munch</move><speech> good" </speech><ooc> You're good at this</ooc>
even if that's not what the user intended. Note that the quotes in this last example weren't parsed as speech delimiters because they didn't wrap a complete segment: the current move segment had not finished when the first quote was encountered, speech had already started when the second one was, and no further quote followed the second one to enclose a separate speech segment.
I've tried doing this iteratively, recursively, with trees, and even with regexes, but I haven't found a solution that works like I want it to. How do I parse the above RP-style messages into the above generic XML-style messages?
Also important is that the spacing is preserved.
Here are some other examples using the above-listed preferences:
I like roller coasters.
[what are you like?]
/me eats a hamburger // wanna grab lunch after this?
*jumps up and down* This ((the party)) is great!
/me performs *an action* within an action "And that's just fine [As is *an action* in ooc in speech]"
And messages /me can change contexts // at any point
[But ill-formatted ones *must be parsed] according "to* the rules"
-And text formatted in <non-specified ways> is &not treated; specially-
become:
<speech>I like roller coasters.</speech>
<ooc>what are you like?</ooc>
<move>eats a hamburger <ooc> wanna grab lunch after this?</ooc></move>
<move>jumps up and down</move><speech> This <ooc>the party</ooc> is great!</speech>
<move>performs <move>an action</move> within an action <speech>And that's just fine <ooc>As is <move>an action</move> in ooc in speech</ooc></speech></move>
<speech>And messages <move>can change contexts <ooc> at any point</ooc></move></speech>
<ooc>But ill-formatted ones *must be parsed</ooc><speech> according <speech>to* the rules</speech></speech>
<speech>-And text formatted in <non-specified ways> is &not treated; specially-</speech>

What you have is a bunch of tokens, each of which should trigger an XML tag. It is fairly straightforward to implement this using a function for each tag.
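void move() {
    // print (not println) so that no extra line breaks are added to the output
    xmlPrintWriter.print("<move>");
    parse();                        // consume and classify the text inside the move
    xmlPrintWriter.print(content);
    xmlPrintWriter.print("</move>");
}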
Where the parse() consumes and classifies the input text.
void parse() {
    if (text.startsWith("*")) action = MOVE;
    // ... other cases
    if (action == MOVE) {
        move();
    }
    // ... other actions
}
The parse method has to check for all possible state-changers: "*" -> move, "((" -> ooc, "\"" -> speech, and so on.
Here MOVE is a class constant, and action is a state variable, along with text and xmlPrintWriter. move and parse are both methods.
This approach will not work though if you allow your last example. Then the situation becomes extremely hairy and would need to be decided on a case by case basis.
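For the well-formed cases, the one-function-per-context idea above can be sketched as a small recursive-descent parser. This is a minimal Python sketch, not a complete solution. The names DELIMITERS, parse and to_xml are made up here. It assumes that an opener with no closer anywhere later in the message is literal text, that "/me " and "//" run to the end of the message, and that bare top-level text opens an implicit speech segment running to the end. It does not implement the question's rules for overlapping, ill-formed delimiters, so those cases would still need their own handling.

```python
# (opener, closer, tag); a closer of None means "runs to the end of the message".
# These follow the preferences listed in the question.
DELIMITERS = [
    ("/me ", None, "move"),
    ("*", "*", "move"),
    ('"', '"', "speech"),
    ("((", "))", "ooc"),
    ("[", "]", "ooc"),
    ("//", None, "ooc"),
]


def parse(text, pos=0, closer=None, top_level=True):
    """Parse text[pos:] and return (xml, next_pos).

    Stops just after `closer` when one is given, otherwise at the end of text.
    """
    out = []
    while pos < len(text):
        if closer is not None and text.startswith(closer, pos):
            return "".join(out), pos + len(closer)
        for opener, close, tag in DELIMITERS:
            if not text.startswith(opener, pos):
                continue
            start = pos + len(opener)
            if close is not None and text.find(close, start) == -1:
                continue  # assumption: an opener with no closer is literal text
            inner, pos = parse(text, start, close, top_level=False)
            out.append(f"<{tag}>{inner}</{tag}>")
            break
        else:
            if top_level:
                # assumption: bare top-level text is speech up to the end
                inner, pos = parse(text, pos, None, top_level=False)
                out.append(f"<speech>{inner}</speech>")
            else:
                out.append(text[pos])  # ordinary character, kept verbatim
                pos += 1
    return "".join(out), pos


def to_xml(message):
    """Convert one RP-style message to the XML-like form."""
    return parse(message)[0]


print(to_xml("*jumps up and down* This ((the party)) is great!"))
# <move>jumps up and down</move><speech> This <ooc>the party</ooc> is great!</speech>
```

Because every character that is not part of a recognised delimiter is copied through unchanged, spacing is preserved, and nothing is XML-escaped, which matches the expected outputs in the question.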

Something to this effect might do:
public static RPMessageSegment split(RPMessageSegment text)
{
    ArrayList<RPMessageSegment> majorSegments = new ArrayList<>();
    scan: for (int i = 0, l = text.length() - 1; i < l; i++)
    {
        for (Delimiter d : delimiters)
        {
            if (d.startsWith(text, i))
            {
                RPMessageSegment newSegment = d.extractSegment(text, i);
                // skip past the whole segment; the -1 offsets the loop's own i++
                i += newSegment.lengthWithOriginalDelimiters() - 1;
                majorSegments.add(newSegment);
                continue scan;
            }
        }
    }
    // a single segment needs no further splitting
    if (majorSegments.size() == 1)
        return majorSegments.get(0);
    // otherwise split each major segment recursively
    for (int i = 0, l = majorSegments.size(); i < l; i++)
    {
        majorSegments.set(i, split(majorSegments.get(i)));
    }
    return new RPMessageSegment(majorSegments);
}
Of course, this presumes that the referenced classes have these methods that respond as one might expect. They shouldn't be terribly hard to imagine, not to mention write.
After it's parsed into RPMessageSegments, those can easily be echoed out into strings surrounded by XML-style tags.


InDesign Scripting: View array's actual content (strings) in ExtendScript console

I'm a beginning learner of InDesign scripting and would like to help myself with debugging, but my attempts seem to run into walls. Hope someone has some insights that will help me going forward.
I'm working on a little project that loops through some selected tables, puts the 3 tables into an array/variable (accomplished that) and then loops through the content of those tables to find a GREP match and store those in an array/variable (for further uses I won't get into now)
My main objective at this point: See exactly what text characters the .findGrep(); function is catching and display those in the Javascript Console of the ExtendScript Toolkit app.
So here's a bit of the journey up to this point, including codes tried and suggestions from others. (All of my attempted uses of these has failed...why I'm here now... and why this is long; my apologies)
Initial try.
var myTables = []; // in the Data Browser this shows values of [object Table], [object Table], [object Table]
var myFinds = [];
var myTest = [];
var myCharacters = [];
app.findGrepPreferences = null;
app.findGrepPreferences.findWhat = "\"";
for (x = 0; x < myTables.length; x++) {
var myFinds = myTables[x].findGrep();
$.writeln(myFinds);
};
Notes on this code: Because not every table has the characters in the findWhat, sometimes in this loop myFinds has nothing, but when it does have something, it shows this in console [object Character],[object Character],[object Character]
So someone (firstHelp) gave me this, and it did not work: an error was thrown on .contents.toString(), "undefined is not an object". I thought, "OK, yes, I see that at times in the loop myFinds has nothing in it... more on this later".
var stringArray = [];
for( var n=0; n<myFinds.length; n++ ) {
stringArray[n] = myFinds[n].contents.toString();
};
$.writeln(myFinds.join("\r"));
Code revamp: I gave up on the $.writeln(myFinds); within the loop and tried this instead, in order to gather the GREP finds in a variable/array that could be dealt with outside of the loop.
for (x = 0; x < myTables.length; x++) {
$.writeln(myTables[x].cells.firstItem().texts[0].contents[0]);
myFinds.push(myTables[x].findGrep());
};
$.writeln(myFinds);
ExtendScript Toolkit console now showing this for myFinds:
myFinds = [Array], [object Character], [object Character], [object...
+ (object symbol) 0 =
+ (object symbol) 1 = [object Character], [object Character], [object Character]
+ (object symbol) 2 =
+ (object symbol) _proto_ =
Again I tried the .contents.toString(); on myFinds and still got the same error, "undefined...", including when targeting the array element that clearly had something in it.
So then I got this tip-off (but no helpful code to apply to what I already have):
"you are dealing with arrays of arrays mixed with texts.
So you have to check with each item of the result array if it is text
or another array of texts.
If it is an array loop that array."
And later this bit of code that is supposed to "flatten" my array... a = [].concat.apply([],a);
Replacing a with myFinds like this, myFinds = [].concat.apply([],myFinds); did absolutely nothing. The array and its contents showed no change in the console... and I have no idea how to loop through each item of this array within an array, find out if it's text or another array and then show its real contents to console.
Really... how many loops and if/thens etc. do I need to run on one array to show its actual contents in the console? But I know I struggle with breaking down every little step I want to its minute scripting granularity, and so my ignorance regularly impedes me. I welcome any suggestions/tips to move me closer to my main objective as stated above. Thanks.
Regarding the first help: the real reason you get an error when accessing the contents property is that you don't check the type of the object and presume it will be a Text object. As findGrep may not find any Text occurrence, you may actually get an empty array, and an Array has no contents property, hence the error.
Also, $.writeln is a legacy of the Adobe ExtendScript Toolkit, the IDE for ExtendScript. That product is no longer developed or maintained by Adobe. You should consider other debugging tools, such as the ExtendScript Debugger extension for Visual Studio Code, which lets you use breakpoints and everything else you need.

MS Bot Framework: Is there a way to cancel a prompt dialog? [duplicate]

The PromptDialog.Choice in the Bot Framework displays the choice list, which is working well. However, I would like to be able to cancel/escape/exit the dialog without adding a cancel/escape/exit option to the list. Is there anything in PromptDialog.Choice that can be overridden? I have not found any cancel option.
here is my code in c#..
PromptDialog.Choice(
context: context,
resume: ChoiceSelectAsync,
options: getSoftwareList(softwareItem),
prompt: "We have the following software items matching " + softwareItem + ". (1), (2), (3). Which one do you want?:",
retry: "I didn't understand. Please try again.",
promptStyle: PromptStyle.PerLine);
Example:
Bot: We have the following software items matching Photoshop. (1), (2), (3). Which one do you want
Version 1
Version 2
Version 3
What I want: if the user enters "none of the above", or a command or number (cancel, exit) that bypasses the options above, the dialog should exit without triggering the retry error message.
How do we do that?
There are two ways of achieving this:
1. Add cancel as an option, as suggested. While this would definitely work, in the long term you will find yourself repeating this a lot, and the cancel option will appear in the list of choices, which may not be desired.
2. A better approach would be to extend the current PromptChoice to add your exit/cancellation logic. The good news is that there is something already implemented that you could use as is, or as a base, to achieve your needs. Take a look at the CancelablePromptChoice included in the BotBuilder-Samples repository. Here is how to use it.
Just add the option "cancel" to the list and use a switch-case in the method that gets the user input, then call your main menu, or whatever you want to do on cancel.
The current PromptChoice does not allow the user to select by number. I have overridden the ScoreMatch function in CancelablePromptChoice as below:
public override Tuple<bool, int> ScoreMatch(T option, string input)
{
    var trimmed = input.Trim();
    var text = option.ToString();
    // custom logic to allow users to select by number
    int isInt;
    // guard the lower bound too, so that 0 or a negative number cannot index out of range
    if (int.TryParse(input, out isInt) && isInt >= 1 && isInt <= promptOptions.Options.Count())
    {
        text = promptOptions.Options.ElementAt(isInt - 1).ToString();
        trimmed = option.ToString().Equals(text) ? text : trimmed;
    }
    bool occurs = text.IndexOf(trimmed, StringComparison.CurrentCultureIgnoreCase) >= 0;
    bool equals = text == trimmed;
    return occurs ? Tuple.Create(equals, trimmed.Length) : null;
}
@Ezequiel Once again, thank you!

How to remove a newline from template?

This is a repost of my question in the Google Group. Hopefully I will get some response here.
Frequently I run into this problem. I want to generate a line of text if the text is not empty. If it is empty, do not generate the line. Illustration template:
namespace #classSpec.getNamespace()
#classSpec.getComment()
class #classSpec.getName() {
...
}
If #classSpec.getComment() returns meaningful comment text, the result looks like
namespace com.example
// this is comment
class MyClass {
...
}
But if there is no comment, it will be
namespace com.example
class MyClass {
...
}
Notice the extra empty line? I do not want it. Currently the solution is to write template as
namespace #classSpec.getNamespace()
#classSpec.getComment()class #classSpec.getName() {
...
}
and make sure the getComment() will append a "\n" to the return value. This makes the template much less readable. Also, imagine I need to generate a function with multiple parameters in a for loop. If each parameter requires complex logic of template code, I need to make them all written in one line as above. Otherwise, the result file will have function like
function myFunction(
String stringParam,
Integer intParam,
Long longParam
)
The core problem is, the template file does not only contain scripts, but also raw text to be written in the output. For script part, we want newlines and indentations. We want the space to be trimmed just like what compilers usually do. But for raw text, we want the spaces to be exact as specified in the file. I feel we need a bit more raw text control mechanism to reconcile the two parts.
Specific to this case, is there some special symbol to treat multiple lines as single line in the output? For example, like if we can write
namespace #classSpec.getNamespace()
#classSpec.getComment()\\
class #classSpec.getName() {
...
}
Thanks!
This is a known bug; see
https://github.com/greenlaw110/Rythm/issues/259
https://github.com/greenlaw110/Rythm/issues/232
Unfortunately there is no proper workaround for this yet. You might want to add your comments to the bugs above and reference your question.
Take the example below which you can try out at http://fiddle.rythmengine.org/#/editor
#def setXTimesY(int x,int y) { #{ int result=x*y;} #(result)}
1
2 a=#setXTimesY(2,3)
3 b=#setXTimesY(3,5)
4 c=#setXTimesY(4,7)
5
this will properly create the output:
1
2 a= 6
3 b= 15
4 c= 28
5
now try to beautify the #def setXTimesY ...
#def setXTimesY(int x,int y) {
#{
int result=x*y;
}#(result)}
1
2 a=#setXTimesY(2,3)
3 b=#setXTimesY(3,5)
4 c=#setXTimesY(4,7)
will give a wrong result
1
2 a=(result)
3 b=(result)
4 c=(result)
#def setXTimesY(int x,int y) {
#{
int result=x*y;
} #(result)}
1
2 a=#setXTimesY(2,3)
3 b=#setXTimesY(3,5)
4 c=#setXTimesY(4,7)
is better but adds a space
So
https://github.com/greenlaw110/Rythm/issues/270
is another bug along the same lines
I'm experiencing the same problem. I've not been able to find a solution within Rythm itself.
To obtain a single line as result of processing several lines in the template, I've had to implement my own mechanism, in form of a post-processing. In the template, at the end of each line that I want to join the next one, I use a custom symbol/tag as token. Then, once the template has been processed, I replace that symbol/tag, together with the line break character(s) right after it, with an empty string.
For example, if you used a tag called "#join-next-line#", the template would look like this:
#for (Bar bar : foo.getBars()).join (", ") {
#bar.name#join-next-line#
}
It's not the perfect solution, but it has worked for me.
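The post-processing step described above amounts to one regular-expression replacement over the rendered output. Here is a minimal sketch in Python rather than Java, just to show the idea; join_marked_lines is a hypothetical helper name, and the token is the "#join-next-line#" marker from the example. One assumption added here: a token that is not followed by a line break (for instance at the very end of the output) is simply removed.

```python
import re

JOIN_TOKEN = "#join-next-line#"  # marker name taken from the example above


def join_marked_lines(rendered: str, token: str = JOIN_TOKEN) -> str:
    """Drop each token together with the line break that directly follows it."""
    # \r\n is tried first so a Windows line ending is removed as a whole
    pattern = re.escape(token) + r"(?:\r\n|\r|\n)?"
    return re.sub(pattern, "", rendered)


print(join_marked_lines("String a, #join-next-line#\nInteger b"))
# String a, Integer b
```

Note that any indentation at the start of the following line is kept, so the template still has to avoid leading whitespace there, or the pattern would need to strip it as well.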

Specifying styles for portions of a PyYAML dump

I'm using YAML for a computer and human-editable and readable input format for a simulator. For human readability, some parts of the input are mostly amenable to block style, while flow style suits others better.
The default for PyYAML is to use block style wherever there are nested maps or sequences, and flow style everywhere else. default_flow_style allows one to choose all-flow-style or all-block-style.
But I'd like to output files more of the form
bonds:
- { strength: 2.0 }
- ...
tiles:
- { color: red, edges: [1, 0, 0, 1], stoic: 0.1}
- ...
args:
block: 2
Gse: 9.4
As can be seen, this doesn't follow a consistent pattern for styles throughout, and instead changes depending upon the part of the file. Essentially, I'd like to be able to specify that all values in some block style sequences be in flow style. Is there some way to get that sort of fine-level control over dumping? Being able to dump the top-level mapping in a particular order while not requiring that order (eg, omap) would be nice as well for readability.
It turns out this can be done by defining subclasses with representers for each item I want not to follow default_flow_style, and then converting everything necessary to those before dumping. In this case, that means I get something like:
import yaml

# marker subclass: dump this mapping in block style
class blockseq( dict ): pass
def blockseq_rep(dumper, data):
    return dumper.represent_mapping( u'tag:yaml.org,2002:map', data, flow_style=False )

# marker subclass: dump this mapping in flow style
class flowmap( dict ): pass
def flowmap_rep(dumper, data):
    return dumper.represent_mapping( u'tag:yaml.org,2002:map', data, flow_style=True )

yaml.add_representer(blockseq, blockseq_rep)
yaml.add_representer(flowmap, flowmap_rep)

def dump( st ):
    # convert the parts that should not follow default_flow_style before dumping
    st['tiles'] = [ flowmap(x) for x in st['tiles'] ]
    st['bonds'] = [ flowmap(x) for x in st['bonds'] ]
    if 'xgrowargs' in st.keys(): st['xgrowargs'] = blockseq(st['xgrowargs'])
    return yaml.dump(st)
Annoyingly, the easier-to-use dumper.represent_list and dumper.represent_dict don't allow flow_style to be specified, so I have to specify the tag, but the system does work.
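As for dumping the top-level mapping in a particular order, here is a minimal sketch that combines the representer approach with the sort_keys=False option of yaml.dump. It assumes PyYAML 5.1 or later (where sort_keys is available) and Python 3.7 or later (where plain dicts keep insertion order). FlowMap, represent_flow_map and dump_ordered are illustrative names, and the sample data is made up to resemble the format in the question.

```python
import yaml


class FlowMap(dict):
    """Marker type: mappings of this type are dumped in flow style."""


def represent_flow_map(dumper, data):
    # Same technique as above: force flow style for this one mapping type.
    return dumper.represent_mapping("tag:yaml.org,2002:map", data, flow_style=True)


yaml.add_representer(FlowMap, represent_flow_map)


def dump_ordered(doc):
    # sort_keys=False keeps the insertion order of every mapping, so the
    # top-level keys come out in the order they were added to the dict.
    return yaml.dump(doc, default_flow_style=False, sort_keys=False)


doc = {
    "bonds": [FlowMap(strength=2.0)],
    "tiles": [FlowMap(color="red", edges=[1, 0, 0, 1], stoic=0.1)],
    "args": {"block": 2, "Gse": 9.4},
}
print(dump_ordered(doc))
```

With this, no omap is needed in the file itself: the order lives only in the Python dict being dumped, and the loader is free to ignore it.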

Ruby Regular Expression: Setting $1 variable in a hash

Everything in this code works properly, except the contents of the $1 variable aren't being properly displayed. According to my tests, all the matching is being done properly, I am just having trouble figuring out how to actually output the contents of $1.
codeTags = {
/\[b\](.+?)\[\/b\]/m => "<strong>#{$1}</strong>",
/\[i\](.+?)\[\/i\]/m => "<em>#{$1}</em>"
}
regexp = Regexp.new(/(#{Regexp.union(codeTags.keys)})/)
message = (message).gsub(/#{regexp}/) do |match|
codeTags[codeTags.keys.select {|k| match =~ Regexp.new(k)}[0]]
end
return message.html_safe
Thank you!
As soon as you do this:
codeTags = {
/\[b\](.+?)\[\/b\]/m => "<strong>#{$1}</strong>",
/\[i\](.+?)\[\/i\]/m => "<em>#{$1}</em>"
}
The #{$1} bits in the values are interpolated using whatever happens to be in $1 at the time. The values will most likely be "<strong></strong>" and "<em></em>" and those aren't very useful.
And regexp is already a regular expression object, so gsub(/#{regexp}/) should be just gsub(regexp). Similar things apply to the keys of codeTags: they're already regular expression objects, so you don't need Regexp.new(k).
I'd change the whole structure; you're overcomplicating things. Something simple like this would be fine for only two replacements:
message = message.gsub(/\[b\](.*?)\[\/b\]/) { '<strong>' + $1 + '</strong>' }
message = message.gsub(/\[i\](.*?)\[\/i\]/) { '<em>' + $1 + '</em>' }
If you try to do it all at once you'll have problems with nesting in something like this:
message = 'Where [b]is[/b] pancakes [b]house [i]and[/i] more[/b] stuff?'
You'd end up having to use a recursive gsub and possibly some lambdas if you wanted to properly handle things like that with a single expression.
There are better things to spend your time on than trying to be clever on something like this.
Response to comments: If you have more bb-tags and some smilies to worry about and several messages per page then you should HTMLify each message when you create it. You could store only the HTML version or both HTML and BB-Code versions if you want the BB-Code stuff around for some reason. This way you'd only pay for the HTMLification once per message and producing your big lists would be nearly free.
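For what it's worth, the nested "[b]house [i]and[/i] more[/b]" case mentioned above can also be handled without recursion, by re-running the simple per-tag substitutions until the message stops changing. The following is a minimal sketch of that idea in Python rather than Ruby; BB_TAGS and bb_to_html are names invented here. The replacement templates use the \1 backreference, which is resolved at substitution time, so the problem of interpolating $1 when the table is built does not arise. The sketch does no HTML escaping and does not handle a tag nested inside the same tag.

```python
import re

# Each pattern is paired with a replacement template. r"\1" refers to the
# first capture group and is only filled in when the substitution runs.
BB_TAGS = [
    (re.compile(r"\[b\](.+?)\[/b\]", re.DOTALL), r"<strong>\1</strong>"),
    (re.compile(r"\[i\](.+?)\[/i\]", re.DOTALL), r"<em>\1</em>"),
]


def bb_to_html(message: str) -> str:
    """Apply every tag substitution repeatedly until nothing changes."""
    previous = None
    while previous != message:
        previous = message
        for pattern, template in BB_TAGS:
            message = pattern.sub(template, message)
    return message


print(bb_to_html("Where [b]is[/b] pancakes [b]house [i]and[/i] more[/b] stuff?"))
# Where <strong>is</strong> pancakes <strong>house <em>and</em> more</strong> stuff?
```

Adding another tag is then a matter of adding one more entry to the table.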
