Chinese NER doesn't recognize some locations contained in Weibo texts - stanford-nlp

I'm currently doing some work on classifying Chinese Weibo texts, in which one of the steps is to extract the Geo-locations contained in the texts. I followed the steps described on the Stanford NLP website, i.e., use the Chinese Word Segmenter first to segment the Chinese text and then apply the Chinese NER model to the segmented text.
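For reference, the pipeline looks roughly like this (just a sketch of how I invoke the two tools; the script, jar, and classifier file names are whatever came with the downloads I used, so they may differ on another machine):
# Rough sketch of the two-step pipeline (paths, jar and model file names are
# from my local downloads and may well differ on another machine).
import subprocess

RAW = "weibo_raw.txt"      # raw Weibo posts, UTF-8, one per line
SEG = "weibo_seg.txt"      # segmenter output

# Step 1: Stanford Chinese Word Segmenter (segment.sh ships with the download).
with open(SEG, "w", encoding="utf-8") as out:
    subprocess.run(["bash", "segment.sh", "ctb", RAW, "UTF-8", "0"],
                   stdout=out, check=True)

# Step 2: Stanford NER on the segmented text. The classifier path points to
# wherever the Chinese CRF model sits in my download; adjust to your copy.
with open("weibo_ner.txt", "w", encoding="utf-8") as out:
    subprocess.run(["java", "-mx2g", "-cp", "stanford-ner.jar",
                    "edu.stanford.nlp.ie.crf.CRFClassifier",
                    "-loadClassifier", "classifiers/chinese.misc.distsim.crf.ser.gz",
                    "-textFile", SEG],
                   stdout=out, check=True)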
However, I've seen many false negatives where the texts do contain Geo-locations but the NER software fails to recognize them. Some examples are listed below (italics mark the manually labeled Geo-locations).
【开展防汛排查】6月29日,紫阳县红椿镇强降雨引发了山体滑坡和泥石流,为避免发生不安全事故,红椿派出所与交警中队民警冒雨开展重点路段巡查,疏导交通,排查险情。目前,共排查险情3处,救助因山体落石被砸的伤员1名。#安康警务
【 开展/O 防汛/O 排查/O 】 6月/MISC 29日/MISC ,/O 紫阳/O 县/O 红椿镇/O 强/O 降雨/O 引发/O
了/O 山体/O 滑坡/O 和/O 泥石流/O ,/O 为/O 避免/O 发生/O 不安全/O 事故/O ,/O 红椿/O 派出所/O
与/O 交警/O 中队/O 民警/O 冒雨/O 开展/O 重点/O 路段/O 巡查/O ,/O 疏导/O 交通/O ,/O 排查/O
险情/O 。/O 目前/O ,/O 共/O 排查/O 险情/O 3/O 处/O ,/O 救助/O 因/O 山体/O 落石/O 被/O 砸/O
的/O 伤员/O 1/O 名/O 。/O #/O 安康/O 警务/O
【云南预警发布中心】沧源县气象台2015年7月16日14时00分发布暴雨蓝色预警信号:预计未来12小时,我县西部、南部的部分地区降雨量将达50毫米以上,请注意防范局地洪涝、滑坡和泥石流等灾害。
【 云南/ORG 预警/ORG 发布/ORG 中心/ORG 】 沧源/ORG 县/ORG 气象台/ORG 2015年/MISC
7月/MISC 16日/MISC 14时00/MISC 分/MISC 发布/O 暴雨/O 蓝色/O 预警/O 信号/O :/O 预计/O
未来/O 12/MISC 小时/MISC ,/O 我/O 县/O 西部/O 、/O 南部/O 的/O 部分/O 地区/O 降雨量/O 将/O
达/O 50/O 毫米/O 以上/O ,/O 请/O 注意/O 防范/O 局地/O 洪涝/O 、/O 滑坡/O 和/O 泥石流/O 等/O
灾害/O 。/O
【张掖肃南县遭受山洪泥石流灾害 暂无人员伤亡报告】
【 张掖肃/PERSON 南县/O 遭受/O 山洪/O 泥石流/O 灾害/O 暂/O 无/O 人员/O 伤亡/O 报告/O 】
马尔康县马江街红苕沟泥石流!
马尔康县/O 马江/O 街/O 红苕/O 沟/O 泥石流/O !/O
走G214时候已经见过了陡坡,急弯,泥石流,滑坡,临水临崖,积雪泥泞等各种路况,今天出左贡县这段几十公里简直想骂娘,这种烂泥搓板路简直专治肾结石,哪儿结石都给你颠出来……
走/O G214/O 时候/O 已经/O 见/O 过/O 了/O 陡坡/O ,/O 急弯/O ,/O 泥石流/O ,/O 滑坡/O ,/O 临/O 水/O 临崖/O ,/O 积雪/O 泥泞/O 等/O 各/O 种/O 路况/O ,/O 今天/MISC 出/O 左贡县/O 这/O 段/O 几十/MISC 公里/MISC 简直/O 想/O 骂娘/O ,/O 这/O 种/O 烂泥/O 搓板/O 路/O 简直/O 专治/O 肾/O 结石/O ,/O 哪儿/O 结石/O 都/O 给/O 你颠/O 出来/O .../O .../O
One weird thing about the last example is that the online demo correctly classifies just the word "左贡县" by itself as GPE, whereas when I run it on my computer it prints "左贡县/O".
I don't know whether I'm using the software correctly, and, if I am, I don't know how I'm supposed to handle these problems. What can I do to correct them? Should I train my own model?
I really appreciate any help.

Have the Chinese characters been embedded in your software? A missing embedded font might be the problem.

Why does the number of observations change the prediction of a sarimax model with fixed coefficients?

After training a SARIMAX model, I had hoped to be able to perform forecasts in the future using it with new observations, without having to retrain it. However, I noticed that the number of observations I use in the newly applied forecast changes the predictions.
From my understanding, provided that enough observations are given to allow the autoregressive and moving-average terms to be calculated correctly, the model would not even use the earlier historic observations to inform itself, as the coefficients are not being retrained. In a (3,0,1) example I would have thought it would need at least 3 observations to apply its trained coefficients. However, this does not seem to be the case, and I am questioning whether I have understood the model correctly.
As an example and test, I have applied a trained SARIMAX model to the exact same data with the initial few observations removed, to test the effect of the number of rows on the prediction, using the following code:
import pandas as pd
from statsmodels.tsa.statespace.sarimax import SARIMAX, SARIMAXResults
y = [348, 363, 435, 491, 505, 404, 359, 310, 337, 360, 342, 406, 396, 420, 472, 548, 559, 463, 407, 362, 405, 417, 391, 419, 461, 472, 535, 622, 606, 508, 461, 390, 432]
ynew = y[10:]
print(ynew)
model = SARIMAX(endog=y, order=(3,0,1))
model = model.fit()
print(model.params)
pred1 = model.predict(start=len(y), end = len(y)+7)
model2 = model.apply(ynew)
print(model.params)
pred2 = model.predict(start=len(ynew), end = len(ynew)+7)
print(pd.DataFrame({'pred1': pred1, 'pred2':pred2}))
The results are as follows:
pred1 pred2
0 472.246996 472.711770
1 494.753955 495.745968
2 498.092585 499.427285
3 489.428531 490.862153
4 477.678527 479.035869
5 469.023243 470.239459
6 465.576002 466.673790
7 466.338141 467.378903
Based on this, it seems that if I were to produce a forecast from a trained model with new observations, the change in the number of observations itself would impact the integrity of the forecast.
What is the explanation for this? What is the standard practice for applying a trained model to new observations, given that their number may change?
If I wanted to update the model but could not control whether or not I had all of the original observations from the very start of my training set, this test would indicate that my forecast might as well be random numbers.
Main issue
The main problem here is that you are not using your new results object (model2) for your second set of predictions. You have:
pred2 = model.predict(start=len(ynew), end = len(ynew)+7)
but you should have:
pred2 = model2.predict(start=len(ynew), end = len(ynew)+7)
If you fix this, you get very similar predictions:
pred1 pred2
0 472.246996 472.711770
1 494.753955 495.745968
2 498.092585 499.427285
3 489.428531 490.862153
4 477.678527 479.035869
5 469.023243 470.239459
6 465.576002 466.673790
7 466.338141 467.378903
To understand why they're not identical, there is a second issue (which is not a problem in your code, but just a statistical feature of your data/model).
Secondary issue
Your estimated parameters imply an extremely persistent model:
print(model.params)
gives
ar.L1 2.134401
ar.L2 -1.683946
ar.L3 0.549369
ma.L1 -0.874801
sigma2 1807.187815
which is associated with a near-unit-root process (largest eigenvalue = 0.99957719).
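If you want to check this yourself, you can compute the roots of the AR characteristic polynomial (equivalently, the eigenvalues of the AR companion matrix) directly from the printed coefficients. A quick sketch with plain numpy, using the values above:
import numpy as np

# AR coefficients as printed above (ar.L1, ar.L2, ar.L3)
ar = [2.134401, -1.683946, 0.549369]

# Roots of z^3 - ar1*z^2 - ar2*z - ar3 = 0, i.e. the eigenvalues of the
# companion matrix of the AR part; a modulus close to 1 means high persistence.
roots = np.roots([1.0, -ar[0], -ar[1], -ar[2]])
print(np.abs(roots).max())   # ~0.9996, i.e. a near-unit-root process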
What this means is that it takes a very long time for the effects of a particular datapoint on the forecast to die out. In your case, this just means that there are still small effects on the forecasts from the first 10 periods.
This isn't a problem, it's just the way this particular estimated model works.

Plotting a list of lines in Mathematica and trimming to an area

I have specified some lines in a list, for example:
linelist = {Line[{{-390, 1}, {1690, 1}}],
Line[{{-390, 40}, {1690, 40}}], Line[{{-390, 79}, {1690, 79}}],
Line[{{-390, 118}, {1690, 118}}], Line[{{-390, 781}, {1690, 781}}],
Line[{{-390, 820}, {1690, 820}}], Line[{{-390, 859}, {1690, 859}}],
Line[{{-390, 898}, {1690, 898}}], Line[{{-498, 460}, {1185, 1682}}],
Line[{{-521, 491}, {1162, 1714}}],
Line[{{-544, 523}, {1139, 1745}}],
Line[{{-567, 554}, {1116, 1777}}],
Line[{{-590, 586}, {1093, 1809}}],
Line[{{-613, 617}, {1070, 1840}}],
Line[{{-636, 649}, {1047, 1872}}],
Line[{{-659, 681}, {1024, 1903}}],
Line[{{946, -541}, {1588, 1437}}],
Line[{{908, -528}, {1551, 1449}}],
Line[{{871, -517}, {1514, 1462}}],
Line[{{834, -504}, {1477, 1473}}],
Line[{{797, -493}, {1440, 1486}}],
Line[{{760, -481}, {1402, 1498}}],
Line[{{723, -469}, {1366, 1510}}],
Line[{{686, -457}, {1328, 1522}}],
Line[{{1291, -237}, {648, 1741}}],
Line[{{1255, -250}, {611, 1729}}],
Line[{{1217, -261}, {575, 1717}}],
Line[{{1181, -274}, {538, 1705}}],
Line[{{1143, -285}, {501, 1693}}],
Line[{{1107, -296}, {463, 1681}}],
Line[{{1069, -309}, {427, 1668}}],
Line[{{1032, -321}, {389, 1657}}], Line[{{995, -333}, {352, 1646}}],
Line[{{958, -345}, {316, 1633}}],
Line[{{1002, -638}, {-680, 584}}], Line[{{979, -668}, {-703, 553}}]}
Graphics@linelist
I'm trying to figure out a way to iterate through each line to perform a test, for example computing the distance from the {0, 0} coordinate.
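(For example, for a Line through {x1, y1} and {x2, y2}, the perpendicular distance of the infinite line from the origin would be d = |x1*y2 - x2*y1| / sqrt((x2 - x1)^2 + (y2 - y1)^2); I just haven't figured out how to map a test like that over every element of linelist.)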
Also, the end points are outside of my area of concern. I would like to constrain the lines to a boxed area, say from {-1600, -1600} to {1600, 1600}.
I've been playing with this for hours, trying to make a For loop work over the Line statements, but then I can't get the results back onto the same graph.
The plot I get without clipping is:
The plot I get with clipping works for the horizontal lines, but the slanted lines are no longer parallel (from the suggested answer below):
newlinelist = Map[({{x1, y1}, {x2, y2}} = #[[1]];
   Line[{{Clip[x1, {0, 1300}], Clip[y1, {0, 1300}]},
         {Clip[x2, {0, 1300}], Clip[y2, {0, 1300}]}}]) &, linelist]
Other kinds of programming languages depend heavily on you writing For loops. It is possible to do that in Mathematica, but there are other ways to do things. For example:
linelist={
Line[{{-390,1},{1690,1}}],Line[{{-390,40},{1690,40}}],Line[{{-390,79},{1690,79}}],
Line[{{-390,118},{1690,118}}],Line[{{-390,781},{1690,781}}],Line[{{-390,820},{1690,820}}],
Line[{{-390,859},{1690,859}}],Line[{{-390,898},{1690,898}}],Line[{{-498,460},{1185,1682}}],
Line[{{-521,491},{1162,1714}}],Line[{{-544,523},{1139,1745}}],Line[{{-567,554},{1116,1777}}],
Line[{{-590,586},{1093,1809}}],Line[{{-613,617},{1070,1840}}],Line[{{-636,649},{1047,1872}}],
Line[{{-659,681},{1024,1903}}],Line[{{946,-541},{1588,1437}}],Line[{{908,-528},{1551,1449}}],
Line[{{871,-517},{1514,1462}}],Line[{{834,-504},{1477,1473}}],Line[{{797,-493},{1440,1486}}],
Line[{{760,-481},{1402,1498}}],Line[{{723,-469},{1366,1510}}],Line[{{686,-457},{1328,1522}}],
Line[{{1291,-237},{648,1741}}],Line[{{1255,-250},{611,1729}}],Line[{{1217,-261},{575,1717}}],
Line[{{1181,-274},{538,1705}}],Line[{{1143,-285},{501,1693}}],Line[{{1107,-296},{463,1681}}],
Line[{{1069,-309},{427,1668}}],Line[{{1032,-321},{389,1657}}],Line[{{995,-333},{352,1646}}],
Line[{{958,-345},{316,1633}}],Line[{{1002,-638},{-680,584}}],Line[{{979,-668},{-703,553}}]};
newlinelist=Map[({{x1,y1},{x2,y2}}=#[[1]];
Line[{{Clip[x1,{-1600,1600}],Clip[y1,{-1600,1600}]},
{Clip[x2,{-1600,1600}],Clip[y2,{-1600,1600}]}}])&,linelist
]
returns
{Line[{{-390,1},{1600,1}}],Line[{{-390,40},{1600,40}}],Line[{{-390,79},{1600,79}}],
Line[{{-390,118},{1600,118}}],Line[{{-390,781},{1600,781}}],Line[{{-390,820},{1600,820}}],
Line[{{-390,859},{1600,859}}],Line[{{-390,898},{1600,898}}],Line[{{-498,460},{1185,1600}}],
Line[{{-521,491},{1162,1600}}],Line[{{-544,523},{1139,1600}}],Line[{{-567,554},{1116,1600}}],
Line[{{-590,586},{1093,1600}}],Line[{{-613,617},{1070,1600}}],Line[{{-636,649},{1047,1600}}],
Line[{{-659,681},{1024,1600}}],Line[{{946,-541},{1588,1437}}],Line[{{908,-528},{1551,1449}}],
Line[{{871,-517},{1514,1462}}],Line[{{834,-504},{1477,1473}}],Line[{{797,-493},{1440,1486}}],
Line[{{760,-481},{1402,1498}}],Line[{{723,-469},{1366,1510}}],Line[{{686,-457},{1328,1522}}],
Line[{{1291,-237},{648,1600}}],Line[{{1255,-250},{611,1600}}],Line[{{1217,-261},{575,1600}}],
Line[{{1181,-274},{538,1600}}],Line[{{1143,-285},{501,1600}}],Line[{{1107,-296},{463,1600}}],
Line[{{1069,-309},{427,1600}}],Line[{{1032,-321},{389,1600}}],Line[{{995,-333},{352,1600}}],
Line[{{958,-345},{316,1600}}],Line[{{1002,-638},{-680,584}}],Line[{{979,-668},{-703,553}}]}
What that does is use the Map function, which takes a list and another function, applies that function to each item in the list, and returns a list of the results. For your application, the function extracts the x1, y1, x2, y2 values out of each Line, uses the Mathematica Clip function to constrain those values, and finally constructs a new Line from the clipped values.
That # and & function stuff may be difficult for a new user to understand.
Here is an alternate way of writing that which should do the same thing.
f[Line[{{x1_,y1_},{x2_,y2_}}]]:=Line[{{Clip[x1,{-1600,1600}],Clip[y1,{-1600,1600}]},
{Clip[x2,{-1600,1600}],Clip[y2,{-1600,1600}]}}];
newlinelist=Map[f,linelist]
You should verify that it did correctly trim each of your lines
to lie in the -1600...1600 box that you desired.
I am a little worried about the result. If you compare these two graphics
Graphics[linelist]
Graphics[newlinelist]
you can see that the upper parts of those are different, and it doesn't seem to be just because of trimming the ranges of x and y. Notice that some of the lines are no longer parallel in the second one. You should try to convince yourself whether that is really what you want.
A completely different way of getting the restricted graphic, without changing the underlying list of lines, is to compare these two:
Graphics[linelist]
Graphics[linelist,PlotRange->{{-1600,1600},{-1600,1600}}]
Notice that all the lines remain parallel in the second one.
You wrote that you had tried using PlotRange without success; I think you should study exactly why that didn't work for you and whether this does.

What do the four digit sequences in a JPEG mean?

So when I open a JPEG file in Sublime or another text editor, I get a very long list of four-digit sequences, such as:
84ac b7ac 5b2a ccda 5557 5541 af6a c5ae
17a8 d11c ec18 da5e c4c7 6b7a 9f25 896c
44b4 cf7b 52af 8ac9 4179 ec95 858c 0756
7395 3b36 71d7 99b3 d21e 2ae5 dbbe 72de
37d0 b2f3 b3d6 d352 cb46 7c3d c6de 7c47
0be9 a7ab 3f8b a7d7 a744 7cab d2fa d56f
c873 f49e bcb1 7469 d856 51af 743f 3fc8
f59e 4b58 5d79 f669 37d1 73e5 4d7f 5ecc
6f8b caf4 3c6e 8ad9 b69b e74e 6df5 823a
7cde a3f2 be27 b2e2 eeda d79d beea c785
b3b3 4e9c fcb1 b69e 9c73 871a c643 a29b
5880 6d15 f126 57a7 2af0 f93a 7a7d 1e47
522a d2c7 3d30 0bf3 ba4b aa69 a8b6 679a
fa47 3bbb c5d3 a466 e8a9 5ce3 a6d5 7c64
f4dc c34e 71dc c699 b4db 4ac6 fb39 41ae
8d7c dd0a 771d 1dae 5e7e 3f47 78f3 a6c3
52f3 162a 88a2 8e90 0578 c40e 010c 8c64
2ccc d5ed 96b3 55b2 a040 ed4b 3f9a f519
375e 3317 bb3d b979 4b7d 0e0d 12fb 4e37
63cd db07 cabe b3e4 749f 0bb7 b3de f4f1
f29a 7d4f 439e fc97 a7e9 dbc3 a60a fa73
What do these sequences of 4 digits mean? I tried googling this but haven't found an answer. Does each one relate to a pixel, or to the color of a pixel?
It's the text editor's hexadecimal representation of the binary data that makes up the image file: each four-character group is two bytes of the file shown in base 16 (digits 0-9 and a-f). Because JPEG data is compressed, a given group does not necessarily relate to one pixel or to one attribute of a pixel.
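If you want to see the same thing programmatically, here is a small sketch (the filename is just a placeholder):
# Print the first bytes of a JPEG as hex, grouped two bytes (four hex digits)
# at a time, much like the editor's display.
with open("photo.jpg", "rb") as f:   # placeholder filename
    data = f.read(32)

groups = [data[i:i + 2].hex() for i in range(0, len(data), 2)]
print(" ".join(groups))
# A JPEG always starts with the Start-Of-Image marker ff d8, followed by more
# marker segments; the bulk of the file is entropy-coded (compressed) data,
# so individual byte groups do not map to individual pixels.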

Smoothing measured data in MATLAB?

I have measured data in MATLAB and I'm wondering how best to smooth it.
Example data (first column = x-data, second column = y-data):
33400 209.11
34066 210.07
34732 212.3
35398 214.07
36064 215.61
36730 216.95
37396 218.27
38062 219.52
38728 220.11
39394 221.13
40060 221.4
40726 222.5
41392 222.16
42058 223.29
42724 222.77
43390 223.97
44056 224.42
44722 225.4
45388 225.32
46054 225.98
46720 226.7
47386 226.53
48052 226.61
48718 227.43
49384 227.84
50050 228.41
50716 228.57
51382 228.92
52048 229.67
52714 230.02
53380 229.54
54046 231.19
54712 231.00
55378 231.5
56044 231.5
56710 231.79
57376 232.26
58042 233.12
58708 232.65
59374 233.51
60040 234.16
60706 234.21
The data in the second column should be monotonic, but it isn't. How can I make it smooth?
I could probably invent a short algorithm myself, but I think it's better to use an established and proven one. Do you know a good way to somehow absorb the outliers so that the result is a monotonic curve?
Thanks in advance
Monotonic in your case means always increasing!
See the options below (1. Cobb-Douglas; 2. Quadratic; 3. Cubic):
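In other words, each option fits a curve to your points by ordinary least squares: y = a*x^b for the Cobb-Douglas (power-law) case, via a regression of log(y) on log(x); y = c + b*x + a*x^2 for the quadratic; and y = d + c*x + b*x^2 + a*x^3 for the cubic. The fitted curve is then plotted over the measured data: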
clear all
close all
load needSmooth.dat % Your data
x=needSmooth(:,1);
y=needSmooth(:,2);
n=length(x);
% Figure 1: Cobb-Douglas (power-law) fit, y = a*x^b, via regression on logs
logX=log(x);
logY=log(y);
Y=logY;
X=[ones(n,1),logX];
B=regress(Y,X);
a=exp(B(1,1));
b=B(2,1);
figure(1)
plot(x,y,'k*')
hold
for i=1:n-1
plot([x(i,1);x(i+1,1)],[a*x(i,1)^b;a*x(i+1,1)^b],'k-')
end
% Figure 2: quadratic fit, y = c + b*x + a*x^2
X=[ones(n,1),x,x.*x];
Y=y;
B=regress(Y,X);
c=B(1,1);
b=B(2,1);
a=B(3,1);
figure(2)
plot(x,y,'k*')
hold
for i=1:n-1
plot([x(i,1);x(i+1,1)],[c+b*x(i,1)+a*x(i,1)^2; c+b*x(i+1,1)+a*x(i+1,1)^2],'k-')
end
% Figure 3: cubic fit, y = d + c*x + b*x^2 + a*x^3
X=[ones(n,1),x,x.*x,x.*x.*x];
Y=y;
B=regress(Y,X);
d=B(1,1);
c=B(2,1);
b=B(3,1);
a=B(4,1);
figure(3)
plot(x,y,'k*')
hold
for i=1:n-1
plot([x(i,1);x(i+1,1)],[d+c*x(i,1)+b*x(i,1)^2+a*x(i,1)^3; d+c*x(i+1,1)+b*x(i+1,1)^2+a*x(i+1,1)^3],'k-')
end
There are also built-in functions in MATLAB, such as smooth and spline, that should also work in your case, since your data is already almost monotonic.

Highlight numbers like keywords in a Notepad++ custom language (for access logs)

I want to write a custom language for access logs in Notepad++.
The problem is that numbers (here: HTTP status codes) won't be highlighted like real keywords (e.g. GET); Notepad++ only provides a single highlight color for numbers in general.
How do I handle numbers like text?
Sample log file
192.23.0.9 - - [10/Sep/2012:13:46:42 +0200] "GET /js/jquery-ui.custom.min.js HTTP/1.1" 200 206731
192.23.0.9 - - [10/Sep/2012:13:46:43 +0200] "GET /js/onmediaquery.min.js HTTP/1.1" 200 1229
192.23.0.9 - - [10/Sep/2012:13:46:43 +0200] "GET /en/contact HTTP/1.1" 200 12836
192.23.0.9 - - [10/Sep/2012:13:46:44 +0200] "GET /en/imprint HTTP/1.1" 200 17380
192.23.0.9 - - [10/Sep/2012:13:46:46 +0200] "GET /en/nothere HTTP/1.1" 404 2785
Sample custom languages
http://sourceforge.net/apps/mediawiki/notepad-plus/index.php?title=User_Defined_Language_Files
I also tried editing and importing a predefined language like this:
http://notepad-plus.sourceforge.net/commun/userDefinedLang/Log4Net.xml
I thought the custom language should look like this:
<KeywordLists>
[...]
<Keywords name="Words1">404 501</Keywords>
<Keywords name="Words2">301 303</Keywords>
<Keywords name="Words3">200</Keywords>
</KeywordLists>
<Styles>
<WordsStyle name="DEFAULT" styleID="11" fgColor="000000" bgColor="FFFFFF" colorStyle="0" fontName="Courier New" fontStyle="0"/>
[...]
<WordsStyle name="KEYWORD1" styleID="5" fgColor="FF0000" bgColor="FFFFFF" colorStyle="1" fontName="" fontStyle="0"/>
<WordsStyle name="KEYWORD2" styleID="6" fgColor="0000FF" bgColor="FFFFFF" colorStyle="1" fontName="" fontStyle="1"/>
<WordsStyle name="KEYWORD3" styleID="7" fgColor="00FF00" bgColor="FFFFFF" colorStyle="1" fontName="" fontStyle="0"/>
[...]
<!-- This line causes the number highlighting. Deleting it doesn't work either. -->
<WordsStyle name="NUMBER" styleID="4" fgColor="0F7F00" bgColor="FFFFFF" fontName="" fontStyle="0"/>
</Styles>
Unfortunately, all the numbers end up colored in the same color. I'd like to color them per status-code group, as in the keyword lists above.
Any suggestions? How can I handle the numbers like keywords?
It isn't possible to highlight numbers as keywords, because the built-in lexers (parsers/language definitions) treat a numeric as a single token. The only way to differentiate between a plain numeric and one of your keywords would be to parse the whole numeric block and then compare it to the keyword list, in which case you would also need to parse the delimiters around the numeric block to ensure that .200. doesn't highlight as 200. This is why your numbers all highlighted in the same color, namely the 'number' color.
While this could be done with a custom lexer, using either fixed-position tokens or regex matching, the user-defined languages (the last I heard) do not have this capability.
As your request is actually fairly simple, from what I understand, stated as generally as possible (as requested in your comment):
Highlight space-delimited numeric values contained in a given set.
[[:space:]](200|301|404)[[:space:]]
We can use the 'Mark' feature of the 'Find' dialog with that regex, but then everything is marked in the same color, just like in your failed experiment.
Perhaps what would be simple and would suit your needs is to use an npp PythonScript together with the Mark Style settings in the Style Configurator to get the desired result.
Something like this crude macro-style script:
from Npp import *

def found(line, m):
    global first
    pos = editor.positionFromLine(line)
    if first:
        editor.setSelection(pos + m.end(), pos + m.start())
        first = False
    else:
        editor.addSelection(pos + m.end(), pos + m.start())

editor.setMultipleSelection(True)
lines = editor.getUserLineSelection()

# Use space padded search since MARKALLEXT2 will act just
# like the internal lexer if only the numeric is selected
# when it is called.
first = True
editor.pysearch( " 200 ", found, 0, lines[0], lines[1])
notepad.menuCommand(MENUCOMMAND.SEARCH_MARKALLEXT1)
first = True
editor.pysearch( " 301 ", found, 0, lines[0], lines[1])
notepad.menuCommand(MENUCOMMAND.SEARCH_MARKALLEXT2)
first = True
editor.pysearch( " 404 ", found, 0, lines[0], lines[1])
notepad.menuCommand(MENUCOMMAND.SEARCH_MARKALLEXT3)
To use it, install Python Script with the Plugin Manager, go to the Plugins menu and select New Script, paste the code in, save, select the tab of the document you want to parse, and execute the script (again from the Plugins menu).
Obviously you could use all 5 Mark styles for different terms, you could assign the script to a shortcut, and you could get more into the 'scripting' vs. 'macro' style of nppPython and write a full-blown script to parse whatever you want... having a script trigger whenever you select a particular lexer is doable too.
