I'm following the http://antlr3.org/api/C/buildrec.html tutorial.
It's my understanding that, in order to remove/alter tokens before they are consumed by the parser, I have to use a non-buffered stream, COMMON_TREE_NODE_STREAM.
Given that, how should I feed the parser?
Currently I use tstream = antlr3CommonTokenStreamSourceNew(ANTLR3_SIZE_HINT, TOKENSOURCE(lxr));
to feed the parser.
Any advice is appreciated.
No, the COMMON_TREE_NODE_STREAM is the source for a tree parser, not the normal parser. The input for a normal parser is an ANTLR3_TOKEN_STREAM, which has a default implementation in the C runtime known as ANTLR3_COMMON_TOKEN_STREAM_struct. Look up its implementation to learn how to create your own token stream.
I am using the Apache Arrow Go library to read Parquet. Non-repeated columns seem straightforward, but how can I read a repeated field?
For reading repeated fields in Parquet there are really two answers: a complex way and an easy way.
The easy way is to use the pqarrow package and just read directly into an Arrow list array of some kind, letting the complexity be handled for you. (https://pkg.go.dev/github.com/apache/arrow/go/v10@v10.0.1/parquet/pqarrow)
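For instance, here's a minimal sketch of the easy way. The file name is illustrative, and it assumes a local Parquet file whose schema contains a repeated field:

package main

import (
	"context"
	"fmt"

	"github.com/apache/arrow/go/v10/arrow/memory"
	"github.com/apache/arrow/go/v10/parquet/file"
	"github.com/apache/arrow/go/v10/parquet/pqarrow"
)

func main() {
	// Open the parquet file (path is illustrative).
	rdr, err := file.OpenParquetFile("example.parquet", false)
	if err != nil {
		panic(err)
	}
	defer rdr.Close()

	// Wrap it in a pqarrow FileReader; repeated fields come back
	// as Arrow list columns with the nesting already resolved.
	fr, err := pqarrow.NewFileReader(rdr, pqarrow.ArrowReadProperties{}, memory.DefaultAllocator)
	if err != nil {
		panic(err)
	}

	tbl, err := fr.ReadTable(context.Background())
	if err != nil {
		panic(err)
	}
	defer tbl.Release()

	// A repeated Parquet field shows up as a list-typed column here.
	for i := 0; i < int(tbl.NumCols()); i++ {
		col := tbl.Column(i)
		fmt.Println(col.Name(), col.DataType())
	}
}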
To read them the complex way, you have to understand repetition and definition levels and how Parquet uses them. Instead of trying to explain them here, I'll point you to the excellent write-up on the Apache Arrow blog: https://arrow.apache.org/blog/2022/10/08/arrow-parquet-encoding-part-2/ which explains how to decode definition and repetition levels (yes, it's in the context of the Rust implementation of Parquet, but the basic concepts are the same for the Go implementation).
All of the ColumnChunkReader types allow you to retrieve those definition and repetition levels in their ReadBatch methods. For an example, have a look at https://pkg.go.dev/github.com/apache/arrow/go/v10@v10.0.1/parquet/file#Float32ColumnChunkReader.ReadBatch
When you call ReadBatch you can pass []int16 slices for the definition and repetition levels to be filled in alongside the data, and then use those to decode the repeated field accordingly. Personally, I prefer to use the pqarrow package, which does it for you, but sometimes you do need the granular access.
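As a rough sketch of the complex way (assuming column 0 of the first row group is a repeated float32 field; the file name, column index, and batch size are all illustrative):

package main

import (
	"fmt"

	"github.com/apache/arrow/go/v10/parquet/file"
)

func main() {
	rdr, err := file.OpenParquetFile("example.parquet", false)
	if err != nil {
		panic(err)
	}
	defer rdr.Close()

	// Assume column 0 of row group 0 is a repeated float32 field.
	cc, err := rdr.RowGroup(0).Column(0)
	if err != nil {
		panic(err)
	}
	fcr := cc.(*file.Float32ColumnChunkReader)

	maxDef := fcr.Descriptor().MaxDefinitionLevel()

	vals := make([]float32, 64)
	defLvls := make([]int16, 64)
	repLvls := make([]int16, 64)

	for fcr.HasNext() {
		// total counts level slots (nulls and empty lists included);
		// the second return value is how many actual values landed in vals.
		total, _, err := fcr.ReadBatch(int64(len(vals)), vals, defLvls, repLvls)
		if err != nil {
			panic(err)
		}
		vi := 0
		for i := 0; i < int(total); i++ {
			if repLvls[i] == 0 {
				fmt.Println("-- new row --") // rep level 0 starts a new top-level record
			}
			if defLvls[i] == maxDef {
				fmt.Println(vals[vi]) // fully defined: a real value is present
				vi++
			} else {
				fmt.Println("null or empty list") // lower def level: null/empty along the path
			}
		}
	}
}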
I'm currently trying to customize the standard writer built into Pandoc to produce output in the ConTeXt format from Markdown input. Unfortunately, the documentation for creating a custom writer found on the Pandoc website doesn't give me much information beyond how to write a custom HTML writer. So, I would like to ask for help with some fundamental ideas:
What would be the preferable way to add some (probably) very simple functionality to the ConTeXt writer? E.g., I would like to rewrite the sequence of characters " - " (in a Markdown document) as another sequence "~-- " (in the resulting ConTeXt document).
If I understood correctly, I'm supposed to base my custom writer on the standard (built-in) writers... but where can I find these? There doesn't seem to be anything in /usr/share/pandoc/ (I'm working on Linux).
The website mentions the "classic style" and the "new style". Apart from one obviously being newer, which style am I supposed to use?
I know that these questions may sound rather simple, but there doesn't seem to be a lot of information available beyond the usual basic stuff. Any help would be much appreciated.
Pandoc has another feature similar to custom writers, called "Lua filters". Filters are quite likely a simpler and better choice in this case: they allow you to modify the internal document representation. E.g.:
function Inlines (inlines)
  -- Iterate backwards through the list so removals don't shift
  -- the elements we haven't visited yet.
  for i = #inlines, 3, -1 do
    if inlines[i-2].t == 'Space' and inlines[i-1] == pandoc.Str '-' and
       inlines[i].t == 'Space' then
      -- Replace the Space, "-", Space elements with raw ConTeXt
      -- (trailing space included, matching the desired "~-- " output).
      inlines[i-2] = pandoc.RawInline('context', '~-- ')
      inlines:remove(i)
      inlines:remove(i-1)
    end
  end
  return inlines
end
The above would be used by writing it to a file and then passing that file to pandoc via --lua-filter FILENAME.lua
The documentation for Lua filters is also less sparse and hopefully more approachable than the docs for custom writers.
I am looking for a sample bank of SWIFT messages. For example, for MT101, SWIFT provides extensive documentation on its website,
but there is no full sample, i.e. a complete SWIFT message that could be used for testing and constructing a parser.
I did try Google but could not find samples for all the message types.
I had some free time, so I thought I could do some good by writing a parser.
I'm one of the authors of the Prowide libraries for SWIFT. You can use our open source SWIFT MT parser instead of writing your own:
https://github.com/prowide/prowide-core
For the message structure, our library's model Javadoc might be useful.
Take a look at the Javadoc for each MTnnn class, where you will find the message structure in terms of its sequences and mandatory/optional fields.
Then, in the Javadoc for each Fieldnnn class, you can see the internal structure of each field's components.
https://www.javadoc.io/doc/com.prowidesoftware/pw-swift-core
As for samples for all message types, you might be able to find some by googling, but there is no comprehensive sample store that I know of.
Say I have some custom file format with a defined logical structure, and I'd like to "unmarshall" the "objects" from the file. How can I use Java 8 Streams in a parallel fashion to unmarshall the objects?
Is this unreasonable? If so, can you explain a more reasonable approach?
Is this not possible in Java 8? If not, is it possible in Java 9 or Scala? Can you provide an example?
[
abc:123,
xy:"yz",
s12:13,
],
...
[
abc:1
s:133,
]
It seems that "Parallel message unmarshalling from a token delimited input stream with Java8 stream API" is asking something similar, but not necessarily from a file perspective. It wasn't clear to me, but I think for that issue it's not possible in Java 8.
I'd like to create a Base16 encoder and decoder for Lion's new Security.framework to complement kSecBase32Encoding and kSecBase64Encoding. Apple's documentation shows how to write a custom transform (Caesar cipher) using SecTransformRegister. As far as I can tell, custom transforms registered this way have to operate symmetrically on the data and can't be used to encode and decode data differently. Does anyone know if writing a custom encoder/decoder is possible, and if so, how?
I don't see a way to tie custom encoders into SecEncodeTransformCreate(), which is what kSecBase32Encoding and the others are based on. But it's easy to create a transform that accepts an "encode" bool and uses it to decide whether to encode or decode. In the CaesarTransform example, they attach an attribute called key with SecTransformSetAttribute(). You'd do the same thing, but with a bool encode.
And of course you could just create an encoding transform and a decoding transform.