I want to improve the readability of pyparsing's debugging output by adding indentation. For example, instead of this:
Match part at loc 0(1,1)
Match subpart1 at loc 0(1,1)
Match subsubpart1 at loc 0(1,1)
Matched subsubpart1 at loc 10(2,1) -> ...
Matched subpart1 at loc 20(3,1) -> ...
Match subpart2 at loc 20(3,1)
Match subsubpart2 at loc 20(3,1)
Matched subsubpart2 at loc 30(4,1) -> ...
Matched subpart2 at loc 40(5,1) -> ...
Matched part at loc 50(6,1) -> ...
I would like to have it indented like this to better understand what's going on during parsing:
Match part at loc 0(1,1)
    Match subpart1 at loc 0(1,1)
        Match subsubpart1 at loc 0(1,1)
        Matched subsubpart1 at loc 10(2,1) -> ...
    Matched subpart1 at loc 20(3,1) -> ...
    Match subpart2 at loc 20(3,1)
        Match subsubpart2 at loc 20(3,1)
        Matched subsubpart2 at loc 30(4,1) -> ...
    Matched subpart2 at loc 40(5,1) -> ...
Matched part at loc 50(6,1) -> ...
So in pyparsing.py, I just changed _defaultStartDebugAction, _defaultSuccessDebugAction and _defaultExceptionDebugAction to:
pos = -1

def _defaultStartDebugAction( instring, loc, expr ):
    global pos
    pos = pos + 1
    print ("\t" * pos + ("Match " + _ustr(expr) + " at loc " + _ustr(loc) + "(%d,%d)" % ( lineno(loc,instring), col(loc,instring) )))

def _defaultSuccessDebugAction( instring, startloc, endloc, expr, toks ):
    global pos
    print ("\t" * pos + "Matched " + _ustr(expr) + " -> " + str(toks.asList()))
    pos = pos - 1

def _defaultExceptionDebugAction( instring, loc, expr, exc ):
    global pos
    print ("\t" * pos + "Exception raised:" + _ustr(exc))
    pos = pos - 1
(I just added the pos counter and the "\t" * pos prefix to the output to get my desired result.)
However, I don't like tampering directly with the pyparsing library. On the other hand, I don't want to call the .setDebugActions method on every parser element I define; I want them all to use my modified default debug actions.
Is there a way I can achieve this without having to tamper with the pyparsing.py library directly?
Thanks!
Python modules are just like any other Python object, and you can manipulate their symbols with ordinary attribute assignment and function wrapping. Often referred to as "monkeypatching", this can be done entirely from your own code, without modifying the actual library source.
The simplest way to implement this change is to just overwrite the symbols. In your code, write:
import pyparsing
from pyparsing import lineno, col   # used in the debug output below

# have to import _ustr explicitly, since it does not get pulled in with '*' import
_ustr = pyparsing._ustr

pos = -1

def defaultStartDebugAction_with_indent( instring, loc, expr ):
    global pos
    pos = pos + 1
    print ("\t" * pos + ("Match " + _ustr(expr) + " at loc " + _ustr(loc) + "(%d,%d)" % ( lineno(loc,instring), col(loc,instring) )))

def defaultSuccessDebugAction_with_indent( instring, startloc, endloc, expr, toks ):
    global pos
    print ("\t" * pos + "Matched " + _ustr(expr) + " -> " + str(toks.asList()))
    pos = pos - 1

def defaultExceptionDebugAction_with_indent( instring, loc, expr, exc ):
    global pos
    print ("\t" * pos + "Exception raised:" + _ustr(exc))
    pos = pos - 1

pyparsing._defaultStartDebugAction = defaultStartDebugAction_with_indent
pyparsing._defaultSuccessDebugAction = defaultSuccessDebugAction_with_indent
pyparsing._defaultExceptionDebugAction = defaultExceptionDebugAction_with_indent
A cleaner version is to wrap the original functions with your own code, decorator-style:
pos = -1

def incr_pos(fn):
    # bump the nesting level, print the indent, then delegate to the original action
    def _inner(*args):
        global pos
        pos += 1
        print("\t" * pos, end="")
        return fn(*args)
    return _inner

def decr_pos(fn):
    # print the indent, drop the nesting level, then delegate to the original action
    def _inner(*args):
        global pos
        print("\t" * pos, end="")
        pos -= 1
        return fn(*args)
    return _inner
import pyparsing
pyparsing._defaultStartDebugAction = incr_pos(pyparsing._defaultStartDebugAction)
pyparsing._defaultSuccessDebugAction = decr_pos(pyparsing._defaultSuccessDebugAction)
pyparsing._defaultExceptionDebugAction = decr_pos(pyparsing._defaultExceptionDebugAction)
This way, if you update pyparsing and the original code changes, your monkeypatch will get the updates without your having to modify your copies of the original methods.
To make your intentions even clearer, and to avoid duplicating those function names (DRY), this will replace those last 3 lines:
def monkeypatch_decorate(module, name, deco_fn):
    # replace module.name with deco_fn(module.name)
    setattr(module, name, deco_fn(getattr(module, name)))

monkeypatch_decorate(pyparsing, "_defaultStartDebugAction", incr_pos)
monkeypatch_decorate(pyparsing, "_defaultSuccessDebugAction", decr_pos)
monkeypatch_decorate(pyparsing, "_defaultExceptionDebugAction", decr_pos)
Related
We have a CSV file with a column that contains nested double quotes.
For example: 1,John,26,"how are you "Jim""
In this example we have 4 columns: id, name, age and message.
Here the message column contains nested double quotes, which causes a parsing failure ("could not parse incoming data" error) in the ConvertRecord NiFi processor. Is there any way we can escape the nested double quotes and read the data properly?
We are using the same properties in both the CSVReader and CSVRecordSetWriter controller services.
We had the exact same issue, and as @daggett highlighted: how could you detect which quote is the end of the field? We even spoke with Cloudera, and everything boils down to the fact that the data does not conform to the CSV standard rules.
So we wrote a small Python script, called via the ExecuteScript processor, which is able to escape almost all of the special characters, except when both a double quote and the delimiter are part of the data, e.g. "field_1","field_2 this is very invalid", data","field_3"
Give it a go and please comment if it works, so that we can fold the logic into a custom processor!
from org.apache.commons.io import IOUtils
from java.nio.charset import StandardCharsets
from org.apache.nifi.processor.io import StreamCallback
from org.apache.nifi.processors.script import ExecuteScript
from org.python.core.util.FileUtil import wrap
from io import StringIO
import re

# Define a subclass of StreamCallback for use in session.write()
class PyStreamCallback(StreamCallback):
    def __init__(self):
        pass

    def process(self, inputStream, outputStream):
        with wrap(inputStream) as f:
            lines = f.readlines()

        outer_new_value_list = []
        is_header_row = True
        for row in lines:
            if is_header_row:
                is_header_row = False
                outer_new_value_list.append(row)
                continue
            char_list = list(row.strip())
            for position, char in enumerate(char_list):
                # print(position, char)
                if (position + 1) == len(char_list):
                    continue
                if position == 0:
                    continue
                else:
                    if char == '"':
                        if char_list[position - 1] == ',' or char_list[position + 1] == ',':
                            # this double quote is the quote character at the start or end of a field
                            continue
                        if char_list[position - 1] != ',' and char_list[position + 1] != ',':
                            # this double quote is embedded in the field data, add an escape character to it
                            replace_char = '\\' + char
                            char_list[position] = replace_char
                    if char == ',':
                        # Int values are not in double quotes, so check whether the previous and next chars are digits
                        previous_char_type = ''
                        next_char_type = ''
                        try:
                            previous_char = char_list[position - 1]
                            if isinstance(int(previous_char), int):
                                previous_char_type = 'Int'
                        except:
                            pass
                        # print('previous_char : ' + str(previous_char))
                        try:
                            next_char = char_list[position + 1]
                            if isinstance(int(next_char), int):
                                next_char_type = 'Int'
                        except:
                            pass
                        # print(" next_char: " + str(next_char))
                        if previous_char_type == 'Int' or next_char_type == 'Int':
                            # comma between digits is a real delimiter, leave it alone
                            continue
                        if char_list[position - 1] == '"' or char_list[position + 1] == '"':
                            # delimiter comma next to a quote character
                            continue
                        if char_list[position - 1] != '"' and char_list[position + 1] != '"':
                            # embedded comma inside field data, add an escape character to it
                            replace_char = '\\' + char
                            char_list[position] = replace_char
                    if char == '\\':
                        # drop any pre-existing backslashes
                        replace_char = ''
                        char_list[position] = replace_char
            new_data_line = ''.join([str(elem) for elem in char_list])
            outer_new_value_list.append(new_data_line + '\r\n')

        with wrap(outputStream, 'w') as filehandle:
            filehandle.writelines("%s" % line for line in outer_new_value_list)
# end class

flowFile = session.get()
if (flowFile != None):
    flowFile = session.write(flowFile, PyStreamCallback())
    session.transfer(flowFile, ExecuteScript.REL_SUCCESS)
# implicit return at the end
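If you want to experiment with the quote-escaping rule outside NiFi first, here is a minimal standalone sketch of the same idea (escape any double quote that is embedded in field data, i.e. not adjacent to the delimiter); the function name and the sample line are mine, not part of the processor script, and it deliberately skips the embedded-comma handling done above:

def escape_embedded_quotes(line, delimiter=','):
    # Escape any double quote that is not the first/last character of the line
    # and not directly next to the delimiter, i.e. a quote embedded in field data.
    chars = list(line.rstrip('\r\n'))
    out = []
    for i, ch in enumerate(chars):
        if ch == '"' and 0 < i < len(chars) - 1:
            if chars[i - 1] != delimiter and chars[i + 1] != delimiter:
                out.append('\\"')
                continue
        out.append(ch)
    return ''.join(out)

if __name__ == '__main__':
    print(escape_embedded_quotes('1,John,26,"how are you "Jim""'))
    # prints: 1,John,26,"how are you \"Jim\""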
I am trying to write a small parser with the ANTLR Go target, not using visitors or walkers, but I am not able to find any sample code to build my parser on.
For example, the following is the grammar code which I am trying to replicate in Go:
// Expr.g4:
grammar Expr;
@header {
}

@parser::members {
def eval(self, left, op, right):
    if ExprParser.MUL == op.type:
        return left * right
    elif ExprParser.DIV == op.type:
        return left / right
    elif ExprParser.ADD == op.type:
        return left + right
    elif ExprParser.SUB == op.type:
        return left - right
    else:
        return 0
}
stat: e NEWLINE {print($e.v);}
| ID '=' e NEWLINE {self.memory[$ID.text] = $e.v}
| NEWLINE
;
e returns [int v]
: a=e op=('*'|'/') b=e {$v = self.eval($a.v, $op, $b.v)}
| a=e op=('+'|'-') b=e {$v = self.eval($a.v, $op, $b.v)}
| INT {$v = $INT.int}
| ID
{
id = $ID.text
$v = self.memory.get(id, 0)
}
| '(' e ')' {$v = $e.v}
;
MUL : '*' ;
DIV : '/' ;
ADD : '+' ;
SUB : '-' ;
ID : [a-zA-Z]+ ; // match identifiers
INT : [0-9]+ ; // match integers
NEWLINE:'\r'? '\n' ; // return newlines to parser (is end-statement signal)
WS : [ \t]+ -> skip ; // toss out whitespace
And this is the python tester code for it:
# test_expr.py:
import sys
from antlr4 import *
from antlr4.InputStream import InputStream
from ExprLexer import ExprLexer
from ExprParser import ExprParser
if __name__ == '__main__':
    parser = ExprParser(None)
    parser.buildParseTrees = False
    parser.memory = {}  # how to add this to the generated constructor?

    line = sys.stdin.readline()
    lineno = 1
    while line != '':
        line = line.strip()
        istream = InputStream(line + "\n")
        lexer = ExprLexer(istream)
        lexer.line = lineno
        lexer.column = 0
        token_stream = CommonTokenStream(lexer)
        parser.setInputStream(token_stream)
        parser.stat()

        line = sys.stdin.readline()
        lineno += 1
Can anybody please post sample Go code that is equivalent to the above Python and inlined action code?
I have a problem running the below program, which reads a text file as input. The input file is like a matrix with 3 columns -- 3 numbers per line with the format (3(F3.6,1x)) -- and 4368 rows.
The input file is:
602340.440000 129706.190000 28.892939
602340.880000 129706.390000 28.955128
602884.500000 128780.700000 29.876873
602884.380000 128781.190000 29.875114
602884.250000 128781.660000 29.885448
602884.130000 128782.150000 29.895996
602883.940000 128782.630000 29.899380
602883.810000 128783.120000 29.903221
602883.690000 128783.590000 29.907070
The program is:
USE BIEF
USE DECLARATIONS_TELEMAC2D
IMPLICIT NONE
INTEGER LNG,LU, ITRAC,I, NSOM,J, K, NDOWN
INTEGER, PARAMETER :: NLINE =4368
DOUBLE PRECISION, PARAMETER:: BATHY_RADIER_up= 29.84D0
DOUBLE PRECISION, PARAMETER:: DEPTH_up = 2.15D0
REAL :: A(5000),B(5000),C(5000)
DOUBLE PRECISION :: XPOLYD(14), YPOLYD(14), INPOLYD(14)
COMMON/INFO/LNG,LU
DOUBLE PRECISION XPOLY(6), YPOLY(6),COTE_RADIER_up
NSOM = 6
XPOLY(1) = 602883.13
XPOLY(2) = 602886.15
XPOLY(3) = 602887.15
XPOLY(4) = 602905.46
XPOLY(5) = 602902.52
XPOLY(6) = 602884.13
YPOLY(1) = 128779.99
YPOLY(2) = 128780.80
YPOLY(3) = 128777.12
YPOLY(4) = 128741.21
YPOLY(5) = 128739.75
YPOLY(6) = 128775.96
AT = 0.D0
CALL OS( 'X=0 ' , X=U )
CALL OS( 'X=0 ' , X=V )
IF(CDTINI(1:10).EQ.'COTE NULLE'.OR.
* CDTINI(1:14).EQ.'ZERO ELEVATION') THEN
CALL OS( 'X=C ' , H , H , H , 0.D0 )
CALL OS( 'X=X-Y ' , H , ZF , H , 0.D0 )
ELSEIF(CDTINI(1:14).EQ.'COTE CONSTANTE'.OR.
* CDTINI(1:18).EQ.'CONSTANT ELEVATION') THEN
CALL OS( 'X=C ' , H , H , H , COTINI )
CALL OS( 'X=X-Y ' , H , ZF , H , 0.D0 )
ELSEIF(CDTINI(1:13).EQ.'HAUTEUR NULLE'.OR.
* CDTINI(1:10).EQ.'ZERO DEPTH') THEN
CALL OS( 'X=C ' , H , H , H , 0.D0 )
ELSEIF(CDTINI(1:17).EQ.'HAUTEUR CONSTANTE'.OR.
* CDTINI(1:14).EQ.'CONSTANT DEPTH') THEN
CALL OS( 'X=C ' , H , H , H , HAUTIN )
ELSEIF(CDTINI(1:13).EQ.'PARTICULIERES'.OR.
* CDTINI(1:10).EQ.'PARTICULAR'.OR.
* CDTINI(1:07).EQ.'SPECIAL') THEN
NDOWN = 14
XPOLYD(1) = 602883.13
XPOLYD(2) = 602886.15
XPOLYD(3) = 602864.47
XPOLYD(4) = 602837.90
XPOLYD(5) = 602821.91
XPOLYD(6) = 602649.77
XPOLYD(7) = 602634.35
XPOLYD(8) = 602345.08
XPOLYD(9) = 602326.07
XPOLYD(10) = 602619.31
XPOLYD(11) = 602638.33
XPOLYD(12) = 602811.64
XPOLYD(13) = 602831.52
XPOLYD(14) = 602857.16
YPOLYD(1) = 128779.99
YPOLYD(2) = 128780.80
YPOLYD(3) = 128867.74
YPOLYD(4) = 128936.74
YPOLYD(5) = 128953.95
YPOLYD(6) = 129105.43
YPOLYD(7) = 129143.43
YPOLYD(8) = 129713.38
YPOLYD(9) = 129708.26
YPOLYD(10) = 129136.41
YPOLYD(11) = 129094.72
YPOLYD(12) = 128941.16
YPOLYD(13) = 128931.09
YPOLYD(14) = 128865.81
      PRINT *, 'opening file'
      DO 10 J=1,NPOIN
        IF(INPOLY(X(J),Y(J),XPOLY,YPOLY,NSOM)) THEN
          PRINT *, 'upstream area'
          H%R(J)=MAX(0.D0,COTE_RADIER_up-ZF%R(J))
          U%R(J)=0.0D0
          PRINT *, 'upstream area'
          write(lu,*) 'upstream ....',J,H%r(J)
        ELSE
          IF(INPOLY(X(J),Y(J),XPOLYD,YPOLYD,NDOWN)) THEN
            OPEN(UNIT=90, FILE='cunnette_xyz.txt', FORM='FORMATTED')
            PRINT *, 'downstream area'
            READ(90,*) A(K),B(K),C(K)
            PRINT *, 'already read'
            DO K=1,NLINE
              PRINT *, "number of lines read:", NLINE
              IF(A(K).EQ.X(J).AND.B(K).EQ.Y(J)) THEN
                PRINT *, 'Nodes are inside'
                H%R(K)=0.45D0
                U%R(K)=0.D0
              ELSE
                H%R(K)=0.0D0
                U%R(K)=0.0D0
              ENDIF
            ENDDO
            CLOSE(90)
          ENDIF
        ENDIF
10    CONTINUE
      ELSE
        IF(LNG.EQ.1) THEN
          WRITE(LU,*) 'CONDIN : CONDITION INITIALE NON PREVUE : ',CDTINI
        ENDIF
        IF(LNG.EQ.2) THEN
          WRITE(LU,*) 'CONDIN: INITIAL CONDITION UNKNOWN: ',CDTINI
        ENDIF
        STOP
      ENDIF
      IF(NTRAC.GT.0) THEN
        DO ITRAC=1,NTRAC
          CALL OS( 'X=C ' , X=T%ADR(ITRAC)%P , C=TRAC0(ITRAC) )
        ENDDO
      ENDIF
      CALL OS( 'X=C ' , VISC , VISC , VISC , PROPNU )
      RETURN
      END
The error message when running is:
at line read file: Fortran runtime error: End of file.
My last output is 'downstream area', right after the OPEN command. Could anybody help me please?
First of all, you are opening the file repeatedly for every value of J. This seems wrong.
Secondly, you have not given the variable NPOIN a value (greater than 1), so your outer loop on J will not stop running (I guess), and this will eventually lead to you trying to read beyond the end of the file.
You have nested loops, one on J and one on K, which does not seem logical. You should have only one.
Open the file outside of the J loop, and then read line per line until you reach the end of the file:
      INTEGER :: Reason
      INTEGER :: NPOIN

      OPEN(UNIT=90, IOSTAT=Reason, FILE='cunnette_xyz.txt', FORM='FORMATTED')
      IF (Reason > 0) THEN
        ! ... something went wrong ...
        PRINT *, 'Error opening file'
        STOP
      END IF

      NPOIN = 0
      DO
        NPOIN = NPOIN + 1
        ! read from unit 90 (the file), not from standard input
        READ(90,*, IOSTAT=Reason) A(NPOIN), B(NPOIN), C(NPOIN)
        IF (Reason > 0) THEN
          PRINT *, 'Error reading input from file. Aborted'
          STOP
        ELSE IF (Reason < 0) THEN
          ! ... end of file reached ...
          NPOIN = NPOIN - 1
          PRINT *, 'All data read from file'
          EXIT
        END IF
      END DO
      CLOSE(90)

      ! All input should be in the A, B, C arrays now, with NPOIN entries
      DO K = 1, NPOIN
        ! Your processing comes here. No more file I/O.
      END DO
I'm making a JSON parser and I am looking for an algorithm that can find all of the matching brackets ([]) and braces ({}) and put them into a table with the positions of the pair.
Examples of returned values:
table[x][firstPos][secondPos] = type
table[x] = {firstPos, secondPos, bracketType}
EDIT: Let parse() be the function that returns the bracket pairs. Let table be the value returned by the parse() function. Let codeString be the string containing the brackets that I want to detect. Let firstPos be the position of the first bracket in the Nth pair of brackets. Let secondPos be the position of the second bracket in the Nth pair of brackets. Let bracketType be the type of the bracket pair ("bracket" or "brace").
Example:
If you called:
table = parse(codeString)
table[N][firstPos][secondPos] would be equal to type.
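For reference, the classic way to do this is a single pass with a stack of open positions; the answers below are in Lua, but here is a short sketch in Python with hypothetical names, just to illustrate the algorithm:

def parse(code_string):
    # One pass with a stack of open positions: push on '[' or '{',
    # pop on the matching closer. Positions are 1-based to match Lua.
    openers = {'[': ']', '{': '}'}
    names = {'[': 'bracket', '{': 'brace'}
    stack = []   # (open_char, open_pos)
    pairs = []   # (first_pos, second_pos, bracket_type)
    for pos, ch in enumerate(code_string, start=1):
        if ch in openers:
            stack.append((ch, pos))
        elif ch in (']', '}'):
            if not stack or openers[stack[-1][0]] != ch:
                raise ValueError("unbalanced %r at position %d" % (ch, pos))
            open_char, open_pos = stack.pop()
            pairs.append((open_pos, pos, names[open_char]))
    if stack:
        raise ValueError("unclosed %r at position %d" % stack[-1])
    return pairs

print(parse('{"a": [1, 2, {"b": []}]}'))
# [(20, 21, 'bracket'), (14, 22, 'brace'), (7, 23, 'bracket'), (1, 24, 'brace')]

Note that a real JSON parser would also have to skip brackets that appear inside string literals.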
Well, in plain Lua, you could do something like this, also taking into account nested brackets:
function bm(s)
    local res = {}
    if not s:match('%[') then
        return s
    end
    for k in s:gmatch('%b[]') do
        res[#res+1] = bm(k:sub(2,-2))
    end
    return res
end
Of course you can easily generalize this to braces, parentheses, whatever (do keep in mind the necessary escaping of [] in patterns, except after the %b pattern).
If you're not restricted to plain Lua, you could use LPeg for more flexibility.
If you are not looking for the contents of the brackets but for their locations, the recursive approach is harder to implement, since you have to keep track of where you are. It is easier to just walk through the string and match the brackets as you go:
function bm(s,i)
    local res = {}
    res.par = res -- root
    local lev = 0
    for loc = 1, #s do
        if s:sub(loc,loc) == '[' then
            lev = lev + 1
            local t = {par = res, start = loc, lev = lev} -- keep track of the parent
            res[#res+1] = t  -- add to the parent
            res = t          -- make this the current working table
            print('[', lev, loc)
        elseif s:sub(loc,loc) == ']' then
            lev = lev - 1
            if lev < 0 then error('too many ]') end -- more closing than opening
            print(']', lev, loc)
            res.stop = loc   -- save bracket closing position
            res = res.par    -- revert to the parent
        end
    end
    return res
end
Now that you have all matched brackets, you can loop through the table, extracting all locations.
I figured out my own algorithm.
function string:findAll(query)
    local firstSub = 1
    local lastSub = #query
    local result = {}
    while lastSub <= #self do
        if self:sub(firstSub, lastSub) == query then
            result[#result + 1] = firstSub
        end
        firstSub = firstSub + 1
        lastSub = lastSub + 1
    end
    return result
end

function string:findPair(openPos, openChar, closeChar)
    local counter = 1
    local closePos = openPos
    while closePos <= #self do
        closePos = closePos + 1
        if self:sub(closePos, closePos) == openChar then
            counter = counter + 1
        elseif self:sub(closePos, closePos) == closeChar then
            counter = counter - 1
        end
        if counter == 0 then
            return closePos
        end
    end
    return -1
end

function string:findBrackets(bracketType)
    local openBracket = ""
    local closeBracket = ""
    local openBrackets = {}
    local result = {}
    if bracketType == "[]" then
        openBracket = "["
        closeBracket = "]"
    elseif bracketType == "{}" then
        openBracket = "{"
        closeBracket = "}"
    elseif bracketType == "()" then
        openBracket = "("
        closeBracket = ")"
    elseif bracketType == "<>" then
        openBracket = "<"
        closeBracket = ">"
    else
        error("IllegalArgumentException: Invalid or unrecognized bracket type "..bracketType.."\nFunction: findBrackets()")
    end
    local openBrackets = self:findAll(openBracket)
    if not openBrackets[1] then
        return {}
    end
    for i, j in pairs(openBrackets) do
        result[#result + 1] = {j, self:findPair(j, openBracket, closeBracket)}
    end
    return result
end
Will output:
5 14
6 13
7 12
8 11
9 10
I've been parsing Excel documents in Perl successfully with Spreadsheet::ParseExcel (as recommended in "What's the best way to parse Excel file in Perl?"), but I can't figure out how to extract cell comments.
Any ideas? A solution in Perl or Ruby would be ideal.
The Python xlrd library will parse cell comments (if you turn on xlrd.sheet.OBJ_MSO_DEBUG, you'll see them), but it doesn't expose them through its API. You could either parse the debug dump or hack on the library a bit so you can get at them programmatically. Here's a start (tested extremely minimally):
diff --git a/xlrd/sheet.py b/xlrd/sheet.py
--- a/xlrd/sheet.py
+++ b/xlrd/sheet.py
@@ -206,6 +206,7 @@
self._dimncols = 0
self._cell_values = []
self._cell_types = []
+ self._cell_notes = []
self._cell_xf_indexes = []
self._need_fix_ragged_rows = 0
self.defcolwidth = None
@@ -252,6 +253,7 @@
return Cell(
self._cell_types[rowx][colx],
self._cell_values[rowx][colx],
+ self._cell_notes[rowx][colx],
xfx,
)
@@ -422,12 +424,14 @@
if self.formatting_info:
self._cell_xf_indexes[nrx].extend(aa('h', [-1]) * nextra)
self._cell_values[nrx].extend([''] * nextra)
+ self._cell_notes[nrx].extend([None] * nextra)
if nc > self.ncols:
self.ncols = nc
self._need_fix_ragged_rows = 1
if nr > self.nrows:
scta = self._cell_types.append
scva = self._cell_values.append
+ scna = self._cell_notes.append
scxa = self._cell_xf_indexes.append
fmt_info = self.formatting_info
xce = XL_CELL_EMPTY
@@ -436,6 +440,7 @@
for _unused in xrange(self.nrows, nr):
scta([xce] * nc)
scva([''] * nc)
+ scna([None] * nc)
if fmt_info:
scxa([-1] * nc)
else:
@@ -443,6 +448,7 @@
for _unused in xrange(self.nrows, nr):
scta(aa('B', [xce]) * nc)
scva([''] * nc)
+ scna([None] * nc)
if fmt_info:
scxa(aa('h', [-1]) * nc)
self.nrows = nr
@@ -454,6 +460,7 @@
aa = array_array
s_cell_types = self._cell_types
s_cell_values = self._cell_values
+ s_cell_notes = self._cell_notes
s_cell_xf_indexes = self._cell_xf_indexes
s_dont_use_array = self.dont_use_array
s_fmt_info = self.formatting_info
@@ -465,6 +472,7 @@
nextra = ncols - rlen
if nextra > 0:
s_cell_values[rowx][rlen:] = [''] * nextra
+ s_cell_notes[rowx][rlen:] = [None] * nextra
if s_dont_use_array:
trow[rlen:] = [xce] * nextra
if s_fmt_info:
@@ -600,6 +608,7 @@
bk_get_record_parts = bk.get_record_parts
bv = self.biff_version
fmt_info = self.formatting_info
+ txos = {}
eof_found = 0
while 1:
# if DEBUG: print "SHEET.READ: about to read from position %d" % bk._position
@@ -877,13 +886,23 @@
break
elif rc == XL_OBJ:
# handle SHEET-level objects; note there's a separate Book.handle_obj
- self.handle_obj(data)
+ obj = self.handle_obj(data)
+ if obj:
+ obj_id = obj.id
+ else:
+ obj_id = None
elif rc == XL_MSO_DRAWING:
self.handle_msodrawingetc(rc, data_len, data)
elif rc == XL_TXO:
- self.handle_txo(data)
+ txo = self.handle_txo(data)
+ if txo and obj_id:
+ txos[obj_id] = txo
+ obj_id = None
elif rc == XL_NOTE:
- self.handle_note(data)
+ note = self.handle_note(data)
+ txo = txos.get(note.object_id)
+ if txo:
+ self._cell_notes[note.rowx][note.colx] = txo.text
elif rc == XL_FEAT11:
self.handle_feat11(data)
elif rc in bofcodes: ##### EMBEDDED BOF #####
@@ -1387,19 +1406,16 @@
def handle_obj(self, data):
- if not OBJ_MSO_DEBUG:
- return
- DEBUG = 1
if self.biff_version < 80:
return
o = MSObj()
data_len = len(data)
pos = 0
- if DEBUG:
+ if OBJ_MSO_DEBUG:
fprintf(self.logfile, "... OBJ record ...\n")
while pos < data_len:
ft, cb = unpack('<HH', data[pos:pos+4])
- if DEBUG:
+ if OBJ_MSO_DEBUG:
hex_char_dump(data, pos, cb, base=0, fout=self.logfile)
if ft == 0x15: # ftCmo ... s/b first
assert pos == 0
@@ -1430,16 +1446,14 @@
else:
# didn't break out of while loop
assert pos == data_len
- if DEBUG:
+ if OBJ_MSO_DEBUG:
o.dump(self.logfile, header="=== MSOBj ===", footer= " ")
+ return o
def handle_note(self, data):
- if not OBJ_MSO_DEBUG:
- return
- DEBUG = 1
if self.biff_version < 80:
return
- if DEBUG:
+ if OBJ_MSO_DEBUG:
fprintf(self.logfile, '... NOTE record ...\n')
hex_char_dump(data, 0, len(data), base=0, fout=self.logfile)
o = MSNote()
@@ -1453,13 +1467,11 @@
o.original_author, endpos = unpack_unicode_update_pos(data, 8, lenlen=2)
assert endpos == data_len - 1
o.last_byte = data[-1]
- if DEBUG:
+ if OBJ_MSO_DEBUG:
o.dump(self.logfile, header="=== MSNote ===", footer= " ")
+ return o
def handle_txo(self, data):
- if not OBJ_MSO_DEBUG:
- return
- DEBUG = 1
if self.biff_version < 80:
return
o = MSTxo()
@@ -1477,8 +1489,9 @@
rc3, data3_len, data3 = self.book.get_record_parts()
assert rc3 == XL_CONTINUE
# ignore the formatting runs for the moment
- if DEBUG:
+ if OBJ_MSO_DEBUG:
o.dump(self.logfile, header="=== MSTxo ===", footer= " ")
+ return o
def handle_feat11(self, data):
if not OBJ_MSO_DEBUG:
@@ -1638,11 +1651,12 @@
class Cell(BaseObject):
- __slots__ = ['ctype', 'value', 'xf_index']
+ __slots__ = ['ctype', 'value', 'note', 'xf_index']
- def __init__(self, ctype, value, xf_index=None):
+ def __init__(self, ctype, value, note=None, xf_index=None):
self.ctype = ctype
self.value = value
+ self.note = note
self.xf_index = xf_index
def __repr__(self):
Then you could write something like:
import xlrd
xlrd.sheet.OBJ_MSO_DEBUG = True
xls = xlrd.open_workbook('foo.xls')
for sheet in xls.sheets():
    print 'sheet %s (%d x %d)' % (sheet.name, sheet.nrows, sheet.ncols)
    for rownum in xrange(sheet.nrows):
        for cell in sheet.row(rownum):
            print cell, cell.note
One option is to use Ruby's win32ole library.
The following (somewhat verbose) example connects to an open Excel worksheet and gets the comment text from cell B2.
require 'win32ole'
xl = WIN32OLE.connect('Excel.Application')
ws = xl.ActiveSheet
cell = ws.Range('B2')
comment = cell.Comment
text = comment.Text
More info and examples of using Ruby's win32ole library to automate Excel can be found here:
http://rubyonwindows.blogspot.com/search/label/excel