pycparser - Getting source line number from AST

I'm looking into parsing a C-file with pycparser and I'm trying to get the source line number from the AST generated by pycparser. Is this possible?

Yes, it is possible through the Coord object: each AST node has a coord attribute that records the file, line, and column the node came from. Have a look at the Coord class in plyparser.py:
https://bitbucket.org/eliben/pycparser/src/b169b693a194/pycparser/plyparser.py?at=default

I couldn't figure out exactly how to use the Coord class, but after reading that part of the code I found that AST nodes have a .show() method which accepts a showcoord boolean argument, so you can write in your Python code:
ast_node.show(showcoord=True)
... and this will print out the structure of the node, annotated with filename and line number and column number - something like:
Decl: my_array, ['const'], [], ['extern'], [] (at my_header.h:41:20)
  ArrayDecl: [] (at my_header.h:41:20)
    PtrDecl: [] (at my_header.h:41:18)
      TypeDecl: my_array, ['const'], None (at my_header.h:41:20)
        IdentifierType: ['void'] (at my_header.h:41:14)
...
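If you need the line number as a value rather than printed output, the coord attribute can be read directly. The following is a minimal sketch, assuming pycparser is installed and that the source is already preprocessed; the SOURCE snippet, the file name example.c, and the helper names FuncDefLines and function_lines are made up for illustration.

```python
from pycparser import c_ast, c_parser

# Hypothetical, already-preprocessed C source used only for this sketch.
SOURCE = """int counter;
int add(int a, int b)
{
    return a + b;
}
"""


class FuncDefLines(c_ast.NodeVisitor):
    """Collects (name, file, line) for every function definition."""

    def __init__(self):
        self.found = []

    def visit_FuncDef(self, node):
        # node.coord carries the file, line and column of the node.
        self.found.append((node.decl.name, node.coord.file, node.coord.line))


def function_lines(source, filename="example.c"):
    ast = c_parser.CParser().parse(source, filename=filename)
    visitor = FuncDefLines()
    visitor.visit(ast)
    return visitor.found


print(function_lines(SOURCE))
```

The same pattern works for other node types by adding visit_<NodeName> methods; coord can be None for nodes that were not created from source text, so check for that before reading .line.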

how to iterate over a list of values returning from ops to jobs in dagster

I am new to Dagster and am working with the ops and jobs concepts.
My requirement is to read a list of data from config_schema, pass it to an @op function, and return the same list to the job.
The code is shown below:
@op(config_schema={"table_name": list})
def read_tableNames(context):
    lst = context.op_config['table_name']
    return lst
@job
def write_db():
    tableNames_frozenList = read_tableNames()
    print(f'-------------->', type(tableNames_frozenList))
    print(f'-------------->{tableNames_frozenList}')
When the list is read inside the @op function it shows up as a frozenlist type, but when I return it to the job it is converted into the <class 'dagster._core.definitions.composition.InvokedNodeOutputHandle'> data type.
My requirement is to fetch the list and iterate over it, performing some operations on each item using @ops.
Please help me understand this.
Thanks in advance!
When using ops, graphs, and jobs in Dagster, it's very important to understand that the code inside a @graph or @job definition is executed only when your code is loaded by Dagster, NOT when the graph actually runs. That code is essentially a compilation step that only serves to define the dependencies between ops, so there shouldn't be any general-purpose Python code in those definitions. This is why read_tableNames() returns an InvokedNodeOutputHandle there rather than the list itself: it is a placeholder for the op's future output. Whatever operations you want to perform on data flowing through your job should take place within the @op definitions. So if you wanted to print the values of the list that is passed in via a config schema, you might do something like:
@op(config_schema={"table_name": list})
def read_tableNames(context):
    lst = context.op_config['table_name']
    # Log from inside the op, where the real list is available.
    context.log.info(f'-------------->{type(lst)}')
    context.log.info(f'-------------->{lst}')
Here's an example using two ops to do this data flow:
from dagster import job, op
@op(config_schema={"table_name": list})
def read_tableNames(context):
    lst = context.op_config['table_name']
    return lst
@op
def print_tableNames(context, table_names):
    # table_names is the actual list returned by read_tableNames.
    context.log.info(f'-------------->{type(table_names)}')
@job
def simple_flow():
    # Only wires the ops together; nothing is computed here.
    print_tableNames(read_tableNames())
Have a look at some of the Dagster tutorials for more examples.

Problems with '._ElementUnicodeResult'

While trying to help another user with a question, I ran into the following problem myself.
The goal is to find the country of origin of a list of wines on the page. So we start with:
import requests
from lxml import etree
url = "https://www.winepeople.com.au/wines/Dry-Red/_/N-1z13zte"
res = requests.get(url)
content = res.content
tree = etree.fromstring(content, parser=etree.HTMLParser())
tree_struct = etree.ElementTree(tree)
Next, for reasons I'll get into in a separate question, I'm trying to compare the xpath of two elements with certain attributes. So:
wine = tree.xpath("//div[contains(@class, 'row wine-attributes')]")
country = tree.xpath("//div/text()[contains(., 'Australia')]")
So far, so good. What are we dealing with here?
type(wine),type(country)
>> (list, list)
They are both lists. Let's check the type of the first element in each list:
type(wine[0]),type(country[0])
>> (lxml.etree._Element, lxml.etree._ElementUnicodeResult)
And this is where the problem starts. Because, as mentioned, I need to find the xpath of the first elements of the wine and country lists. And when I run:
tree_struct.getpath(wine[0])
The output is, as expected:
'/html/body/div[13]/div/div/div[2]/div[6]/div[1]/div/div/div[2]/div[2]'
But with the other:
tree_struct.getpath(country[0])
The output is:
TypeError: Argument 'element' has incorrect type (expected
lxml.etree._Element, got lxml.etree._ElementUnicodeResult)
I couldn't find much information about _ElementUnicodeResult, so what is it? And, more importantly, how do I fix the code so that I get an xpath for that node?
You're selecting a text() node instead of an element node. This is why you end up with a lxml.etree._ElementUnicodeResult type instead of a lxml.etree._Element type.
Try changing your xpath to the following in order to select the div element instead of the text() child node of div...
country = tree.xpath("//div[contains(., 'Australia')]")
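If you want to keep the text() query, another option is to go from the text result back to its element: lxml's string results expose a getparent() method. The sketch below uses a small made-up HTML snippet (HTML, parent_path, and text_nodes are illustrative names) instead of the live page, and assumes lxml is installed. Keep in mind that for tail text getparent() returns the element the tail belongs to, so check is_text or is_tail if that distinction matters.

```python
from lxml import etree

# Hypothetical stand-in for the real page.
HTML = """
<html><body>
  <div class="row wine-attributes"><div>Shiraz</div><div>Australia</div></div>
</body></html>
"""

tree = etree.fromstring(HTML, parser=etree.HTMLParser())
tree_struct = etree.ElementTree(tree)

# Same kind of query as in the question: it returns text results, not elements.
text_nodes = tree.xpath("//div/text()[contains(., 'Australia')]")

# Each text result remembers the element it came from.
parent = text_nodes[0].getparent()
parent_path = tree_struct.getpath(parent)
print(parent_path)
```

Alternatively, a predicate such as //div[text()[contains(., 'Australia')]] selects only divs that directly contain the matching text, whereas //div[contains(., 'Australia')] also matches every ancestor div, since their string value includes the text of their descendants.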

Jackrabbit XPath Query: UUID with leading number in path

I have what I think is an interesting problem executing queries in Jackrabbit when a node in the query path is a UUID that starts with a number.
For example, this query works fine, as the UUID node starts with a letter, 'f':
/*/JCP/feeadeaf-1dae-427f-bf4e-842b07965a93/label//*[@sequence]
This query, however, does not, if the first 'f' is replaced with '2':
/*/JCP/2eeadeaf-1dae-427f-bf4e-842b07965a93/label//*[@sequence]
The exception:
Encountered "-" at line 1, column 26.
Was expecting one of:
<IntegerLiteral> ...
<DecimalLiteral> ...
<DoubleLiteral> ...
<StringLiteral> ...
... rest omitted for brevity ...
for statement: for $v in /*/JCP/2eeadeaf-1dae-427f-bf4e-842b07965a93/label//*[@sequence] return $v
My code in general
def queryString = queryFor path
def queryManager = session.workspace.queryManager
def query = queryManager.createQuery queryString, Query.XPATH // fails here
query.execute().nodes
I'm aware my query, with the leading asterisk, may not be the best, but I'm just starting out with querying in general. Maybe using a language other than XPath would work.
I tried the advice in the post "Jackrabbit Running Queries against UUID", adding a save before creating the query, but no luck.
Thanks in advance for any input!
A solution that worked was to properly escape parts of the query path, namely the individual steps used to build up the path into the repository. The exception message was somewhat misleading, at least to me, as it made me think that the hyphens were part of the root cause. The root problem was that the leading number in the node name created an illegal XPath query, as suggested above.
A solution in this case is to encode each step of the path individually and then build the rest of the query around the encoded steps. As a result, only the leading number is escaped:
/*/JCP/_x0032_eeadeaf-1dae-427f-bf4e-842b07965a93//*[@sequence]
Code that represents a list of steps or a path into the Jackrabbit repository:
import java.util.ArrayList;
import java.util.List;
import org.apache.commons.lang3.StringUtils;
import org.apache.jackrabbit.util.ISO9075;
class Path {
    List<String> steps; //...
    public String asQuery() {
        return steps.size() > 0 ? "/*" + asPathString(encodedSteps()) + "//*" : "//*";
    }
    private String asPathString(List<String> steps) {
        return '/' + StringUtils.join(steps, '/');
    }
    // Encode each step on its own so that the '/' separators stay intact.
    private List<String> encodedSteps() {
        List<String> encodedSteps = new ArrayList<>();
        for (String step : steps) {
            encodedSteps.add(ISO9075.encode(step));
        }
        return encodedSteps;
    }
}
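To make the escaping rule itself visible, here is a rough Python sketch of the idea behind the encoding: a character that is not allowed at its position in an XML name is replaced by _xHHHH_, where HHHH is its hex code point. This is a simplified illustration, not Jackrabbit's ISO9075 implementation: it only treats ASCII letters, digits, '_', '-' and '.' as legal, it ignores namespace prefixes, and the names encode_step and as_query are made up here.

```python
import re

# A literal "_xHHHH_" in a name must itself be escaped so that decoding
# stays unambiguous.
_ESCAPE_LOOKALIKE = re.compile(r"_x[0-9a-fA-F]{4}_")


def _escape(ch):
    return "_x{:04x}_".format(ord(ch))


def _is_name_start(ch):
    # Simplification: ASCII letters and underscore only.
    return ch.isascii() and (ch.isalpha() or ch == "_")


def _is_name_char(ch):
    return _is_name_start(ch) or (ch.isascii() and ch.isdigit()) or ch in "-."


def encode_step(step):
    """Escape one path step so it can be used as an XPath name test."""
    out = []
    for i, ch in enumerate(step):
        if ch == "_" and _ESCAPE_LOOKALIKE.match(step, i):
            out.append(_escape(ch))
        elif i == 0 and not _is_name_start(ch):
            out.append(_escape(ch))
        elif i > 0 and not _is_name_char(ch):
            out.append(_escape(ch))
        else:
            out.append(ch)
    return "".join(out)


def as_query(steps):
    """Mirror of Path.asQuery(): encode steps one by one, then join them."""
    if not steps:
        return "//*"
    return "/*/" + "/".join(encode_step(s) for s in steps) + "//*"


print(as_query(["JCP", "2eeadeaf-1dae-427f-bf4e-842b07965a93"]))
```

Because each step is encoded separately, the '/' separators and the '*' wildcards are never passed through the encoder.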
Some more notes:
If we escape more of the query string as in:
/_x002a_/JCP/_x0032_eeadeaf-1dae-427f-bf4e-842b07965a93//_x002a_[@sequence]
Or the original path encoded as a whole as in:
_x002f_a_x002f_fffe4dcf0-360c-11e4-ad80-14feb59d0ab5_x002f_2cbae0dc-35e2-11e4-b5d6-14feb59d0ab5_x002f_c
The queries do not produce the wanted results.
Thanks to @matthias_h and @LarsH.
An XML element name cannot start with a digit. See the XML spec's rules for STag, Name, and NameStartChar. Therefore, the "XPath expression"
/*/JCP/2eeadeaf-1dae-427f-bf4e-842b07965a93/label//*[@sequence]
is illegal, because the name test 2eead... isn't a legal XML name.
As such, you can't just use any old UUID as an XML element name nor as a name test in XPath. However if you put a legal NameStartChar on the front (such as _), you can probably use any UUID.
I'm not clear on whether you think you already have XML data with an element named <2eead...> (and are trying to query that element's descendants); if so, whatever tool produced it is broken, as it emits illegal XML. On the other hand if the <2eead...> is something that you yourself are creating, then presumably you have the option of modifying the element name to be a legal XML name.

Pygments lexer for AspectJ

I just asked the support guys on GitHub why AspectJ (*.aj) files are not syntax-highlighted. The answer was that they are using Pygments, but are unaware of any existing lexer for AspectJ. I did a quick web search and did not find any either. Has anyone here written one or can point me to a link for an existing one?
Long ago I wrote a lexer for Kconfig (Linux kernel configuration) files, but it was rather hard for me because I do not speak Python. So before I start torturing my brain again, I thought I had better ask first instead of possibly re-inventing the wheel.
After initially creating a "copy, paste & modify" solution based on JavaLexer, because I really do not speak Python, I managed to hack together another quick'n'dirty solution which subclasses JavaLexer and delegates lexing to it for the most part. The exceptions are:
AspectJ-specific keywords,
handling of inter-type declarations followed by a colon without a space not as Java labels, but as AspectJ keywords plus the ":" operator, and
handling of inter-type annotation declarations as AspectJ keywords and not as Java name decorators.
I am sure my little heuristic solution misses some details, but as Andrew Eisenberg said: an imperfect, but working solution is better than a non-existent perfect one:
from pygments.lexers import JavaLexer
from pygments.token import Keyword, Name, Operator
class AspectJLexer(JavaLexer):
    """
    For `AspectJ <http://www.eclipse.org/aspectj/>`_ source code.
    """
    name = 'AspectJ'
    aliases = ['aspectj']
    filenames = ['*.aj']
    mimetypes = ['text/x-aspectj']
    aj_keywords = [
        'aspect', 'pointcut', 'privileged', 'call', 'execution',
        'initialization', 'preinitialization', 'handler', 'get', 'set',
        'staticinitialization', 'target', 'args', 'within', 'withincode',
        'cflow', 'cflowbelow', 'annotation', 'before', 'after', 'around',
        'proceed', 'throwing', 'returning', 'adviceexecution', 'declare',
        'parents', 'warning', 'error', 'soft', 'precedence', 'thisJoinPoint',
        'thisJoinPointStaticPart', 'thisEnclosingJoinPointStaticPart',
        'issingleton', 'perthis', 'pertarget', 'percflow', 'percflowbelow',
        'pertypewithin', 'lock', 'unlock', 'thisAspectInstance'
    ]
    aj_inter_type = ['parents:', 'warning:', 'error:', 'soft:', 'precedence:']
    aj_inter_type_annotation = ['@type', '@method', '@constructor', '@field']
    def get_tokens_unprocessed(self, text):
        # Let JavaLexer do the work, then re-tag the AspectJ-specific tokens.
        for index, token, value in JavaLexer.get_tokens_unprocessed(self, text):
            if token is Name and value in self.aj_keywords:
                yield index, Keyword, value
            elif token is Name.Label and value in self.aj_inter_type:
                # Split "parents:" into the keyword and the ":" operator;
                # the operator starts at the last character of the label.
                yield index, Keyword, value[:-1]
                yield index + len(value) - 1, Operator, value[-1]
            elif token is Name.Decorator and value in self.aj_inter_type_annotation:
                yield index, Keyword, value
            else:
                yield index, token, value
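As a quick way to check the effect, the following sketch tokenizes a one-line aspect and lists the tokens tagged as plain Keyword. It assumes a Pygments release that already ships a lexer registered under the alias 'aspectj' (recent releases do); with the class above you would instantiate AspectJLexer() directly instead. SNIPPET and aspectj_keywords are made-up names.

```python
from pygments.lexers import get_lexer_by_name
from pygments.token import Keyword

# Hypothetical AspectJ source used only for this sketch.
SNIPPET = "public aspect Tracing { }"


def aspectj_keywords(code):
    """Return the values of all tokens tagged exactly as Keyword."""
    lexer = get_lexer_by_name("aspectj")
    return [value for token, value in lexer.get_tokens(code) if token is Keyword]


print(aspectj_keywords(SNIPPET))
```

The Java lexer on its own would report aspect as an ordinary Name, so seeing it among the keywords confirms that the AspectJ-specific re-tagging is applied.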
Syntax highlighting for AspectJ should be quite straightforward to implement if you start with a Java lexer. The lexer would be identical to Java's with some extra keywords.
See here for a list of the AspectJ-specific keywords:
http://git.eclipse.org/c/ajdt/org.eclipse.ajdt.git/tree/org.eclipse.ajdt.core/src/org/eclipse/ajdt/core/AspectJPlugin.java
And here for the Java keywords:
http://git.eclipse.org/c/ajdt/org.eclipse.ajdt.git/tree/org.eclipse.ajdt.ui/src/org/eclipse/ajdt/internal/ui/editor/AspectJCodeScanner.java

How do I marshal a lambda (Proc) in Ruby?

Joe Van Dyk asked the Ruby mailing list:
Hi,
In Ruby, I guess you can't marshal a lambda/proc object, right? Is
that possible in lisp or other languages?
What I was trying to do:
l = lambda { ... }
Bj.submit "/path/to/ruby/program", :stdin => Marshal.dump(l)
So, I'm sending BackgroundJob a lambda object, which contains the
context/code for what to do. But, guess that wasn't possible. I
ended up marshaling a normal ruby object that contained instructions
for what to do after the program ran.
Joe
You cannot marshal a lambda or Proc. This is because both of them are closures: they capture the bindings (local variables, self) of the scope in which they were defined and can reference them later. (In order to marshal them you'd have to marshal all of the memory they could access at the time they were created.)
As Gaius pointed out though, you can use ruby2ruby to get a hold of the string of the program. That is, you can marshal the string that represents the ruby code and then reevaluate it later.
You could also just enter your code as a string:
code = %{
lambda {"hello ruby code".split(" ").each{|e| puts e + "!"}}
}
Then execute it with eval:
eval code
which will return a Ruby lambda.
The %{} format delimits a string that only closes on an unmatched brace, i.e. you can nest braces like this %{ [] {} } and it's still enclosed.
Most syntax highlighters don't realize this is a string, so they still display regular code highlighting.
If you're interested in getting a string version of Ruby code using Ruby2Ruby, you might like this thread.
Try ruby2ruby
I've found proc_to_ast to do the best job: https://github.com/joker1007/proc_to_ast.
It works for sure in Ruby 2+, and I've created a PR for Ruby 1.9.3+ compatibility (https://github.com/joker1007/proc_to_ast/pull/3).
Once upon a time, this was possible using ruby-internal gem (https://github.com/cout/ruby-internal), e.g.:
p = proc { 1 + 1 } #=> #<Proc>
s = Marshal.dump(p) #=> #<String>
u = Marshal.load(s) #=> #<UnboundProc>
p2 = u.bind(binding) #=> #<Proc>
p2.call() #=> 2
There are some caveats, but it has been many years and I cannot remember the details. As an example, I'm not sure what happens if a variable is a dynvar in the binding where it is dumped and a local in the binding where it is re-bound. Serializing an AST (on MRI) or bytecode (on YARV) is non-trivial.
The above code works on YARV (up to 1.9.3) and MRI (up to 1.8.7). There's no reason why it cannot be made to work on Ruby 2.x, with a small amount of effort.
If the proc is defined in a file, you can get the proc's file location and serialize that; after deserializing, use the location to get back to the proc again. Note that this only recovers the source text of a single-line { ... } block, not the variables the original proc closed over.
proc_location_array = proc.source_location
After deserializing:
file_name = proc_location_array[0]
line_number = proc_location_array[1]
proc_line_code = IO.readlines(file_name)[line_number - 1]
proc_hash_string = proc_line_code[proc_line_code.index("{")..proc_line_code.length]
proc = eval("lambda #{proc_hash_string}")
