multiline graphviz node with math mode


I have a node that should show a multiline label in math mode. However, I have not managed to put the newline symbol \n into the node label. Here is my attempt:
\begin{dot2tex}[neato,scale=.5,options=-t math]
digraph G
{
c[shape=none,label="x_1",pos="1,.25!"];
d[label="D",pos=".5,-1.6!"];
}
\end{dot2tex}
How can I add newline for the node c?

From the dot2tex documentation page:
The \ character needs to be escaped with \\ if used in the label attribute.
Therefore, label="first line\\\\second line" (four backslashes) should result in a LaTeX newline sequence (I can't test it, though).
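A minimal Python sketch of the escaping rule quoted above. It assumes that the only transformation dot2tex applies to a label is collapsing each doubled backslash back to one. `escape_for_dot2tex` is a hypothetical helper, not part of dot2tex.

```python
def escape_for_dot2tex(latex):
    """Double every backslash so that dot2tex's un-escaping restores `latex`."""
    # Assumption: dot2tex collapses each doubled backslash in a label to a single one.
    return latex.replace("\\", "\\\\")


latex_label = r"first line\\second line"  # in LaTeX, a doubled backslash is a line break
print(f'c[label="{escape_for_dot2tex(latex_label)}"];')
# c[label="first line\\\\second line"];
```

This is why the two-backslash LaTeX line break turns into four backslashes in the DOT source.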

I found a solution using the matrix environment (from amsmath) inside the texlbl attribute:
\begin{dot2tex}[neato,scale=.8,options=-t math]
digraph G
{
c[shape=none,texlbl="$
\begin{matrix}
x_1
\\ x_2
\end{matrix}$"
,pos="1,-1.2!"];
}
\end{dot2tex}
This adds a node with two lines, $x_1$ and $x_2$.
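If the node text is generated by a script, the same texlbl can be built for any number of lines. This is a small Python sketch; `matrix_texlbl_node` is a hypothetical helper, and the shape=none and pos values simply copy the node above.

```python
def matrix_texlbl_node(node_id, lines, pos):
    """Return a dot2tex node statement whose texlbl stacks `lines` vertically."""
    rows = r" \\ ".join(lines)  # LaTeX row separator between the math lines
    texlbl = r"$\begin{matrix} " + rows + r" \end{matrix}$"
    # shape=none and a pinned pos, as in the node from the answer
    return f'{node_id}[shape=none,texlbl="{texlbl}",pos="{pos}"];'


print(matrix_texlbl_node("c", ["x_1", "x_2", "x_3"], "1,-1.2!"))
```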

Related

C backslash newline in the middle of word emits error

The GNU CPP manual says this about backslash-newlines:
A continued line is a line which ends with a backslash, ‘\’. The backslash is removed and the following line is joined with the current one. No space is inserted, so you may split a line anywhere, even in the middle of a word. (It is generally more readable to split lines only at white space.)
When I compile the following, which splits the word printf across two lines, I get an error like:
error: unknown type name 'prin'
// Online C compiler https://www.programiz.com/c-programming/online-compiler/
#include <stdio.h>
int main() {
    // Write C code here
    prin\
    tf("Hello world");
    return 0;
}
Why are the lines not joined together?

The lines are joined. However, because the second line is indented, the joined text is prin, then that leading whitespace, then tf: two separate tokens, prin and tf. Remove all leading white space on the line with tf, and it will compile.
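Here is a simplified Python model of the splicing step. It is only an illustration, not how any real preprocessor is implemented. It deletes each backslash-newline pair and nothing else, which is why the indentation survives.

```python
def splice_lines(source):
    """Mimic line splicing: drop every backslash-newline pair and nothing else."""
    # Leading whitespace on the continuation line is untouched, so it stays in the text.
    return source.replace("\\\n", "")


broken = '    prin\\\n    tf("Hello world");'
fixed = '    prin\\\ntf("Hello world");'
print(repr(splice_lines(broken)))  # '    prin    tf("Hello world");'
print(repr(splice_lines(fixed)))   # '    printf("Hello world");'
```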

sed to get string between two patterns

I am working on a latex file from which I need to pick out the references marked by \citep{}. This is what I am doing using sed.
cat file.tex | grep citep | sed 's/.*citep{\(.*\)}.*/\1/g'
This works if there is only one pattern on a line. If there is more than one pattern (i.e. \citep) on a line, it fails. It also fails when there is only one pattern but more than one closing brace }. What should I do so that it works for all the patterns on a line and stops at the closing brace I am actually looking for?
I am using bash, and part of the file looks like this:
of the Asian crust further north \citep{TapponnierM76, WangLiu2009}. This has led to widespread deformation both within and
\citep{BilhamE01, Mitraetal2005} and by distributed seismicity across the region (Fig. \ref{fig1_2}). Recent GPS Geodetic
across the Dawki fault and Naga Hills, increasing eastwards from $\sim$3~mm/yr to $\sim$13~mm/yr \citep{Vernantetal2014}.
GPS velocity vectors \citep{TapponnierM76, WangLiu2009}. Sikkim Himalaya lies at the transition between this relatively simple
this transition includes deviation of the Himalaya from a perfect arc beyond 89\deg\ longitude \citep{BendickB2001}, reduction
\citep{BhattacharyaM2009, Mitraetal2010}. Rivers Tista, Rangit and Rangli run through Sikkim eroding the MCT and Ramgarh
thrust to form a mushroom-shaped physiography \citep{Mukuletal2009,Mitraetal2010}. Within this sinuous physiography,
\citep{Pauletal2015} and also in accordance with the findings of \citet{Mitraetal2005} for northeast India. In another study
field results corroborate well with seismic studies in this region \citep{Actonetal2011, Arunetal2010}. From studies of
On one line, I get output like this:
BilhamE01, TapponnierM76} and by distributed seismicity across the region (Fig. \ref{fig1_2
whereas I am looking for:
BilhamE01, TapponnierM76
Another example, with more than one citation pattern on the line, gives output like this:
Pauletal2015} and also in accordance with the findings of \citet{Mitraetal2005} for northeast India. In another study
whereas I am looking for:
Pauletal2015 Mitraetal2005
Can anyone please help?

It's a greedy match; change the regex so that it stops at the first closing brace:
.*citep{\([^}]*\)}
Test:
$ echo "\citep{string} xyz {abc}" | sed 's/.*citep{\([^}]*\)}.*/\1/'
string
Note that it will only match one instance per line.
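The same greedy-versus-negated-class difference can be seen with Python's re module. This is an illustration of the regex behaviour only, not a replacement for the sed command.

```python
import re

line = r"\citep{string} xyz {abc}"

# .* is greedy: it runs to the LAST closing brace on the line.
greedy = re.search(r"citep\{(.*)\}", line).group(1)
# [^}]* cannot cross a brace, so the match stops at the FIRST closing brace.
negated = re.search(r"citep\{([^}]*)\}", line).group(1)

print(greedy)   # string} xyz {abc
print(negated)  # string
```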

If you are using grep anyway, you may as well stick with it (assuming GNU grep built with -P support):
$ echo "$str" | grep -oP '(?<=\\citep{)[^}]+(?=})'
BilhamE01, TapponnierM76
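Python's re.findall uses the same lookbehind/lookahead idea and returns every match on the line. The second pattern also accepts \citet. That is an assumption based on the desired output in the question, which includes Mitraetal2005 from a \citet.

```python
import re

# Lookbehind for "\citep{", lookahead for "}", as in the grep -oP pattern.
CITEP = re.compile(r"(?<=\\citep\{)[^}]+(?=\})")
# Assumed variant that also accepts \citet (the lookbehind is still fixed-width).
CITE_PT = re.compile(r"(?<=\\cite[pt]\{)[^}]+(?=\})")

line = (r"\citep{Pauletal2015} and also in accordance with the findings "
        r"of \citet{Mitraetal2005} for northeast India.")
print(CITEP.findall(line))    # ['Pauletal2015']
print(CITE_PT.findall(line))  # ['Pauletal2015', 'Mitraetal2005']
```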

For what it's worth, this can be done with sed:
echo "\citep{string} xyz {abc} \citep{string2},foo" | \
sed 's/\\citep{\([^}]*\)}/\n\1\n\n/g; s/^[^\n]*\n//; s/\n\n[^\n]*\n/, /g; s/\n.*//g'
output:
string, string2
But wow, is that ugly. The sed script is more easily understood in this form, which happens to be suitable to be fed to sed via a -f argument:
# change every \citep{string} to <newline>string<newline><newline>
s/\\citep{\([^}]*\)}/\n\1\n\n/g
# remove any leading text before the first wanted string
s/^[^\n]*\n//
# replace text between wanted strings with comma + space
s/\n\n[^\n]*\n/, /g
# remove any trailing unwanted text
s/\n.*//
This makes use of the fact that sed can match and substitute the newline character, even though reading a new line of input does not put a newline into the pattern space. The newline is the one character that we can be certain will appear in the pattern space (or in the hold space) only if sed puts it there intentionally.
The initial substitution is purely to make the problem manageable by simplifying the target delimiters. In principle, the remaining steps could be performed without that simplification, but the regular expressions involved would be horrendous.
This does assume that the string in every \citep{string} contains at least one character; if the empty string must be accommodated, too, then this approach needs a bit more refinement.
Of course, I can't imagine why anyone would prefer this to Lev's straight grep approach, but the question does ask specifically for a sed solution.
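To check what each step does, here is a Python replay of the four substitutions, one re.sub per sed command. It is a sketch for understanding the sed script, not part of it. re.DOTALL is assumed to stand in for GNU sed's "." matching newlines in the pattern space.

```python
import re

def citep_list(line):
    """Replay the four sed substitutions above with Python's re module."""
    # 1. every \citep{string} becomes <newline>string<newline><newline>
    s = re.sub(r"\\citep\{([^}]*)\}", r"\n\1\n\n", line)
    # 2. remove any leading text before the first wanted string
    s = re.sub(r"^[^\n]*\n", "", s, count=1)
    # 3. replace the text between wanted strings with comma + space
    s = re.sub(r"\n\n[^\n]*\n", ", ", s)
    # 4. remove trailing unwanted text; DOTALL lets "." cross newlines
    return re.sub(r"\n.*", "", s, flags=re.DOTALL)


print(citep_list(r"\citep{string} xyz {abc} \citep{string2},foo"))  # string, string2
```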

f.awk
BEGIN {
    pat = "\\citep"
    latex_tok = "\\\\[A-Za-z_][A-Za-z_]*"  # match \aBcD
}
{
    f = f $0  # store the content of the input file as a string
}
function store(args,    n, k, i) {  # store `keys' in `d'
    gsub("[ \t]", "", args)  # remove spaces
    n = split(args, keys, ",")
    for (i = 1; i <= n; i++) {
        k = keys[i]
        d[k]
    }
}
function ntok() {  # next token
    if (match(f, latex_tok)) {
        tok = substr(f, RSTART, RLENGTH)
        f = substr(f, RSTART+RLENGTH-1)
        return 1
    }
    return 0
}
function parse(    i, rc, args) {
    for (;;) {  # infinite loop
        while ((rc = ntok()) && tok != pat) ;
        if (!rc) return
        i = index(f, "{")
        if (!i) return  # saw `pat' but no '{'
        f = substr(f, i+1)
        i = index(f, "}")
        if (!i) return  # no closing '}'
        # extract `args' from \citep{`args'}
        args = substr(f, 1, i-1)
        store(args)
    }
}
END {
    parse()
    for (k in d)
        print k
}
f.example
of the Asian crust further north \citep{TapponnierM76, WangLiu2009}. This has led to widespread deformation both within and
\citep{BilhamE01, Mitraetal2005} and by distributed seismicity across the region (Fig. \ref{fig1_2}). Recent GPS Geodetic
across the Dawki fault and Naga Hills, increasing eastwards from $\sim$3~mm/yr to $\sim$13~mm/yr \citep{Vernantetal2014}.
GPS velocity vectors \citep{TapponnierM76, WangLiu2009}. Sikkim Himalaya lies at the transition between this relatively simple
this transition includes deviation of the Himalaya from a perfect arc beyond 89\deg\ longitude \citep{BendickB2001}, reduction
\citep{BhattacharyaM2009, Mitraetal2010}. Rivers Tista, Rangit and Rangli run through Sikkim eroding the MCT and Ramgarh
thrust to form a mushroom-shaped physiography \citep{Mukuletal2009,Mitraetal2010}. Within this sinuous physiography,
\citep{Pauletal2015} and also in accordance with the findings of \citet{Mitraetal2005} for northeast India. In another study
field results corroborate well with seismic studies in this region \citep{Actonetal2011, Arunetal2010}. From studies of
Usage:
awk -f f.awk f.example
Expected output (awk's for (k in d) loop visits keys in an unspecified order, so the order may differ):
BendickB2001
Arunetal2010
Pauletal2015
Mitraetal2005
BilhamE01
Mukuletal2009
TapponnierM76
WangLiu2009
BhattacharyaM2009
Mitraetal2010
Actonetal2011
Vernantetal2014
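For comparison, here is a Python sketch of the same key collection. Instead of a hand-written tokenizer, it uses a regex over the whole text. Keys are kept in first-seen order, so the output is deterministic. `citep_keys` is a hypothetical name, and like the awk script it ignores \citet.

```python
import re

CITEP_ARGS = re.compile(r"\\citep\{([^}]*)\}")  # [^}]* may also span line breaks

def citep_keys(text):
    r"""Return the unique keys cited with \citep{...}, in first-seen order."""
    keys = {}
    for args in CITEP_ARGS.findall(text):
        for key in args.split(","):
            key = key.strip()  # drop spaces (and newlines) around each key
            if key:
                keys[key] = None  # dicts keep insertion order (Python 3.7+)
    return list(keys)


sample = "\n".join([
    r"\citep{BilhamE01, Mitraetal2005} and by distributed seismicity",
    r"\citep{Pauletal2015} and also in accordance with the findings of \citet{Mitraetal2005}",
    r"thrust to form a mushroom-shaped physiography \citep{Mukuletal2009,Mitraetal2010}.",
])
print(citep_keys(sample))
```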

Regex to find a newline character ("\n") and replace with empty string from address

We have a string that contains addresses, like the one below:
"first-name, last-name, email, address\n Ashok, G, \"Hyderabad\nTelangana\n India\"\n John, M, \"Mayur Vihar\nNew Delhi\n110096, India\"\n"
The requirement is to replace all newline characters ("\n") with "" in the address fields only (inside \" \").
The expected output should be:
"first-name, last-name, email, address\n Ashok, G, \"Hyderabad Telangana India\"\n John, M, \"Mayur Vihar, New Delhi 110096, India\"\n "

\\n(?=(?:(?!\\").)*\\"(?:(?:(?!\\").)*\\"(?:(?!\\").)*\\")*(?:(?!\\").)*$)
Try this, and replace the matches with an empty string. See the demo:
https://www.regex101.com/r/rG7gX4/7
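The lookahead only lets a \n match when an odd number of \" follow it, which means the \n sits inside a quoted field. The Python sketch below runs the same pattern. It assumes the text contains the escape sequences literally (a backslash followed by n, a backslash followed by a quote), because that is what \\n and \\" in the pattern match.

```python
import re

# The pattern from the answer; \\n and \\" match a literal backslash followed by n or ".
INSIDE_QUOTES_NL = re.compile(
    r'\\n(?=(?:(?!\\").)*\\"(?:(?:(?!\\").)*\\"(?:(?!\\").)*\\")*(?:(?!\\").)*$)'
)

# Assumption: the escape sequences are present literally, as shown in the question.
data = (r'first-name, last-name, email, address\n Ashok, G, \"Hyderabad\nTelangana'
        r'\n India\"\n John, M, \"Mayur Vihar\nNew Delhi\n110096, India\"\n')
print(INSIDE_QUOTES_NL.sub("", data))
```

Only the \n inside the quoted fields disappears. The record separators after address, after the first closing \", and at the end are kept.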

I suggest you do it as follows:
str.gsub(/(?<=\").*?(?=\")/) { |s| s.gsub(/\n/,' ') }
#=> "first-name, last-name, email, address\n Ashok, G, \"heyderabad |
Telangana India\" ABCD, L, \"Guntur AP 500505, India\"\n"
This matches each string bracketed by \", which in turn is passed to the block for removal of all \n's. (?<=\") is a positive lookbehind; (?=\") is a postive lookahead. ? is needed to make .* non-greedy, so the match stops before the first matching postive lookahead.
This doesn't give quite the spacing contained in your desired output. That spacing seems somewhat inconsistent, however. For example, where did the single space at the end of the string come from? You said you wanted to replace \n between pairs of \", but you didn't say what you want to replace it with. (I assumed one space.) If you want different spacing, you could adjust the regex used by gsub inside the block. For example, you might have /\s*\n\s*/.
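For comparison, here is a Python sketch of the same field-by-field idea, using the /\s*\n\s*/ variant mentioned above to normalise the spacing. `join_quoted_lines` is a hypothetical helper. It assumes fields are delimited by plain double quotes and contain no escaped quotes.

```python
import re

def join_quoted_lines(text, sep=" "):
    """Replace each newline (plus surrounding whitespace) inside double-quoted fields."""
    def squash(match):
        # the lambda inserts `sep` literally, without template processing
        return re.sub(r"\s*\n\s*", lambda _: sep, match.group(0))
    # [^"]* spans real newlines but cannot run past the closing quote
    return re.sub(r'"[^"]*"', squash, text)


csv_text = ('first-name, last-name, email, address\n Ashok, G, "Hyderabad\nTelangana\n India"\n'
            ' John, M, "Mayur Vihar\nNew Delhi\n110096, India"\n')
print(join_quoted_lines(csv_text))
```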

How to handle Combining Diacritical Marks with UnicodeUtils?

I am trying to insert spaces into a string of IPA characters, e.g. to turn ɔ̃wɔ̃tɨ into ɔ̃ w ɔ̃ t ɨ. Using split/join was my first thought:
s = "ɔ̃w̃ɔtɨ"
s.split('').join(' ') #=> ̃ ɔ w ̃ ɔ p t ɨ
As I discovered by examining the results, letters with diacritics are in fact encoded as two code points: the base letter followed by a combining mark. After some research I found the UnicodeUtils module, and used the each_grapheme method:
UnicodeUtils.each_grapheme(s) {|g| g + ' '} #=> ɔ ̃w ̃ɔ p t ɨ
This worked fine, except for the inverted breve mark. The code changes ̑a into ̑ a. I tried normalization (UnicodeUtils.nfc, UnicodeUtils.nfd), but to no avail. I don't know why the each_grapheme method has a problem with this particular diacritic mark, but I noticed that in gedit, the breve is also treated as a separate character, as opposed to tildes, accents etc. So my question is as follows: is there a straightforward method of normalization, i.e. turning the combination of Latin Small Letter A and Combining Inverted Breve into Latin Small Letter A With Inverted Breve?

I understand your question concerns Ruby, but I suppose the problem is much the same in Python. A simple solution is to test for combining diacritical marks explicitly and attach each one to the character before it:
import unicodedata
s = u"ɔ̃w̃ɔtɨ"
clusters = []
for char in s:
    if unicodedata.combining(char) and clusters:
        # a combining mark belongs to the character before it
        clusters[-1] += char
    else:
        clusters.append(char)
print(" ".join(clusters))
# ɔ̃ w̃ ɔ t ɨ
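For the normalization part of the question, a Python sketch: NFC does compose LATIN SMALL LETTER A plus COMBINING INVERTED BREVE into the precomposed U+0203. It only does so when the mark follows the letter, though. If the breve comes before the a in the string (which the ̑a in the question may indicate), there is no base character for it to combine with, so NFC leaves it alone.

```python
import unicodedata

a_breve = "a\u0311"  # LATIN SMALL LETTER A + COMBINING INVERTED BREVE
composed = unicodedata.normalize("NFC", a_breve)
print(len(a_breve), len(composed), unicodedata.name(composed))
# 2 1 LATIN SMALL LETTER A WITH INVERTED BREVE

# A mark placed before its letter has no base to compose with, so nothing changes.
print(len(unicodedata.normalize("NFC", "\u0311a")))  # 2
```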

Xpath with htmlagilitypack

I am trying to select the "string b" text node using XPath with the HtmlAgilityPack.
<div>
string a<br/>
string b<br/>
string c<br/>
</div>
I am not sure how to select the text.
This doesn't work: //div/text(1)
Does anybody have any suggestions?

There are two problems with your expression:
XPath starts counting at 1, so you want the second text node.
text() is a node test, which does not accept arguments. If you want to limit the result to the second text node, use the predicate [position() = 2] or its short form [2].
Use this expression:
//div/text()[2]
Selecting text nodes can involve some hassles: whether leading and trailing whitespace is trimmed, and whether whitespace-only text nodes are kept, is implementation-dependent.
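HtmlAgilityPack is a .NET library, so this is only an illustration of the 1-based counting. It uses Python's ElementTree on the same markup, which happens to be well-formed XML. ElementTree's XPath subset has no text() node test, so the sketch lists the div's text nodes by hand and drops whitespace-only ones. This is one of the implementation-dependent choices mentioned above.

```python
import xml.etree.ElementTree as ET

html = "<div>\nstring a<br/>\nstring b<br/>\nstring c<br/>\n</div>"
div = ET.fromstring(html)

# Text nodes of the div: its own .text, then the .tail of each child element.
text_nodes = [div.text] + [child.tail for child in div]
text_nodes = [t for t in text_nodes if t and t.strip()]  # skip whitespace-only nodes

# XPath positions are 1-based, so text()[2] corresponds to index 1 here.
print(text_nodes[2 - 1].strip())  # string b
```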

Try:
//div/br[1]/following-sibling::text()[1]
This selects the first text node directly following the first br.
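As another Python stand-in for HtmlAgilityPack, here is the same idea with ElementTree. In that model, the text node that directly follows an element is stored as the element's .tail. The first br's tail therefore plays the role of br[1]/following-sibling::text()[1].

```python
import xml.etree.ElementTree as ET

div = ET.fromstring("<div>\nstring a<br/>\nstring b<br/>\nstring c<br/>\n</div>")

# The text right after the first <br/> is stored as that element's .tail.
first_br = div.find("br")
print(first_br.tail.strip())  # string b
```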
