How to build a Control Flow Graph (CFG) from a JSON object (AST) - static-analysis

I want to build a control flow graph (CFG) from an AST given in JSON format. This AST is automatically created by TouchDevelop for each script. Since TouchDevelop is not an object-oriented programming language, can I still use the Visitor pattern? Any useful pointers would be appreciated.
Update 1: My problem is that I don't understand where to start. From what I've read on the internet, I am supposed to use the Visitor pattern to walk the AST, visit each node, and collect information; from there I can build a CFG and then do data flow analysis. But there are two issues:
1) AFAIK, I need an object-oriented programming model to use the Visitor pattern (I might be wrong), and TouchDevelop is NOT object oriented.
2) The AST given below is not in the AST format I find on the internet; it's in JSON format. I think I could parse the JSON to convert it into the desired AST structure, but I am not so sure.
Source code of a sample script
meta version "v2.2,nothing";
meta name "DivideByZero";
//
meta platform "current";
action main() {
(5 / 0)→post_to_wall;
}
Resulting AST (JSON formatted) is given below:
{
  "type": "app",
  "version": "v2.2,nothing",
  "name": "DivideByZero",
  "icon": null,
  "color": null,
  "comment": "",
  "things": [
    {
      "type": "action",
      "name": "main",
      "isEvent": false,
      "outParameters": [],
      "inParameters": [],
      "body": [
        {
          "type": "exprStmt",
          "tokens": [
            { "type": "operator",    "data": "(" },
            { "type": "operator",    "data": "5" },
            { "type": "operator",    "data": "/" },
            { "type": "operator",    "data": "0" },
            { "type": "operator",    "data": ")" },
            { "type": "propertyRef", "data": "post to wall" }
          ]
        }
      ],
      "isPrivate": false
    }
  ]
}

I haven't found a reference for the TouchDevelop scripting language yet, so I don't know what you can and can't do with it.
You don't necessarily have to use a Visitor pattern. The Visitor pattern is the method used when your abstract syntax tree is described by instances of nodes from a class hierarchy; the conversion from AST to CFG is more general than that. An abstract syntax tree is an abstract data type, a special case of a tree. Like any other abstract data type, it can be represented in many ways. It doesn't matter how it is represented; the only thing you need is to be able to iterate over the tree, and the iteration method you have depends on the language you are using. This should answer your question 2): a JSON string may be a representation of an AST. The AST is an abstract data type, while the JSON string is an implementation of this abstract data type.
In JSON you can have values, arrays, or sets of (key, value) associations. I can probably assume that your AST nodes are the sets of (key, value) associations. I assume as well that each of these nodes has a key named type which allows you to identify what kind of node it is.
If I am correct, this answers the question of why you don't need a Visitor pattern. A Visitor pattern allows us to recover the type of each node (this is what is called "double dispatch"). But here you don't need it, since the kind of each node is encoded in the type field.
Typically, the conversion from AST to CFG is done using a set of functions: one function for each type of node in the AST. Each of these functions needs to build the CFG part associated with the node it takes as a parameter, and it will recursively call the conversion functions for the child nodes. (This is what a visitor would do in the case of an OO AST.)
For instance, you'll have a function ConvertNode. This function will read the type field of a node and call the corresponding conversion function with that node. Your root node has type app, so ConvertNode will dispatch to the ConvertApp function. ConvertApp will read some fields like name, iterate over the things array, and call ConvertNode for each of these nodes. Then again ConvertNode will dispatch the call to the appropriate function.
The way those conversion functions are called follows exactly the AST structure. How the CFG is created as you iterate over the tree depends on the input language. Each conversion function may return a constructed node or transition of your CFG so that the caller can reuse it, or the caller might pass a node or transition as a parameter so that the called function can continue the construction from there. You are free to choose the appropriate way to build the CFG and to break these general rules: there may be cleverer ways to simplify the construction.
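To make this concrete, here is a minimal sketch in Go (any language with a JSON parser works the same way), assuming the AST is decoded into generic maps. BasicBlock and the convert* function names are made up for illustration, and only the node kinds that appear in the sample above are handled:
package main

import (
	"encoding/json"
	"fmt"
)

// astJSON is a trimmed-down version of the AST posted above.
const astJSON = `{"type":"app","name":"DivideByZero","things":[
  {"type":"action","name":"main","body":[
    {"type":"exprStmt","tokens":[{"type":"operator","data":"5"},
      {"type":"operator","data":"/"},{"type":"operator","data":"0"}]}]}]}`

// BasicBlock is a placeholder CFG node: a label, the statements it holds,
// and edges to its successor blocks.
type BasicBlock struct {
	Label      string
	Statements []string
	Successors []*BasicBlock
}

type jsonNode = map[string]interface{}

// convertNode reads the "type" field and dispatches to the matching
// conversion function, the moral equivalent of a visitor's double dispatch.
func convertNode(node jsonNode, current *BasicBlock) *BasicBlock {
	switch node["type"] {
	case "app":
		return convertApp(node, current)
	case "action":
		return convertAction(node, current)
	case "exprStmt":
		return convertExprStmt(node, current)
	default:
		return current // node kinds not needed for the sketch are ignored
	}
}

func convertApp(node jsonNode, current *BasicBlock) *BasicBlock {
	for _, thing := range node["things"].([]interface{}) {
		current = convertNode(thing.(jsonNode), current)
	}
	return current
}

func convertAction(node jsonNode, current *BasicBlock) *BasicBlock {
	// Each action gets its own entry block, linked from the current block.
	entry := &BasicBlock{Label: node["name"].(string)}
	current.Successors = append(current.Successors, entry)
	block := entry
	for _, stmt := range node["body"].([]interface{}) {
		block = convertNode(stmt.(jsonNode), block)
	}
	return current
}

func convertExprStmt(node jsonNode, current *BasicBlock) *BasicBlock {
	// A straight-line statement introduces no branching, so it simply
	// joins the current basic block.
	tokens, _ := json.Marshal(node["tokens"])
	current.Statements = append(current.Statements, string(tokens))
	return current
}

func main() {
	var root jsonNode
	if err := json.Unmarshal([]byte(astJSON), &root); err != nil {
		panic(err)
	}
	entry := &BasicBlock{Label: "entry"}
	convertNode(root, entry)
	fmt.Println(entry.Successors[0].Label) // prints "main"
}
Real scripts will contain more node kinds (conditionals, loops, and so on), and those conversion functions are exactly where new basic blocks and the edges between them get created.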

Related

How to unmarshal protobuf data into a custom struct in Golang

I have a proxy service that translates protobuf into another struct. I can just write some manual code to do that, but that is inefficient and boilerplate. I can also transform the protobuf data to JSON and deserialize the JSON data into the destination struct, but that is slow and CPU heavy.
The Unmarshaler interface is now deprecated, and the Message interface has internal types which I cannot implement in my project.
Is there a way I can do this now?
Pseudo code: basically, if Go's reflection supports setting and getting struct/class fields by some sort of field identifier, then you can do this. Something like this in C# works, as long as the field types in the two classes are the same (because in C# I'm doing object = object, which ends up being OK if they're the same actual type).
SourceStructType sourceStruct = new SourceStructType();
DestStructType destStruct = new DestStructType();
// Requires using System.Reflection; copies every public field whose name and type match.
foreach (FieldInfo sourceField in sourceStruct.GetType().GetFields())
{
    FieldInfo destField = destStruct.GetType().GetField(sourceField.Name);
    if (destField != null && destField.FieldType == sourceField.FieldType)
        destField.SetValue(destStruct, sourceField.GetValue(sourceStruct));
}
If the structs are more complex - i.e. they have structs within them, then you'll have to recurse down into them. It can get fiddly, but once written you'll never have to write it ever again!
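For Go specifically, the same idea can be sketched with the standard reflect package; Source and Dest below are made-up stand-ins for the generated protobuf struct and the custom destination struct:
package main

import (
	"fmt"
	"reflect"
)

type Source struct {
	Name string
	Age  int32
}

type Dest struct {
	Name string
	Age  int32
	Note string // fields with no counterpart just keep their zero value
}

// copyFields copies every exported field of src into the field of dst that
// has the same name and type. dst must be a pointer to a struct.
func copyFields(dst, src interface{}) {
	dstVal := reflect.ValueOf(dst).Elem()
	srcVal := reflect.ValueOf(src)
	for i := 0; i < srcVal.NumField(); i++ {
		name := srcVal.Type().Field(i).Name
		dstField := dstVal.FieldByName(name)
		if dstField.IsValid() && dstField.CanSet() &&
			dstField.Type() == srcVal.Field(i).Type() {
			dstField.Set(srcVal.Field(i))
		}
	}
}

func main() {
	src := Source{Name: "proxy", Age: 3}
	var dst Dest
	copyFields(&dst, src)
	fmt.Printf("%+v\n", dst) // {Name:proxy Age:3 Note:}
}
As noted above, nested structs need a recursive version of this; unexported (lowercase) fields, including the bookkeeping fields in protobuf-generated structs, are skipped by the IsValid/CanSet checks.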

Data Structure - Abstract data type VS Concrete data type

Abstract data type (ADT): organized data and operations on this data.
Examples: Stack, Queue.
What is the meaning of a concrete data type (CDT)?
Please explain with examples.
One way to understand it is that an ADT is a specification of an object with certain methods.
For example if we talk about a List we are referring to an object that performs list operations such as:
add to the beginning
add to the end
insert at position
size
etc..
A "concrete data type" in this context would refer to the actual Data Structure you use to implement the list.
For example, one implementation of a List is to create nodes with a value and next pointer to point to the next node in the list.
Another is to have a value array, and a next array to tell you where the next node is (this is a more popular implementation for parallelism).
And yet another is to have a dynamic array (known as an ArrayList in Java) where you use an array until it fills up, then you double its size and copy the values to the new array.
So the concrete data type refers to the data structure actually being used, whereas the ADT is the abstract concept like List, Dictionary, Stack, Queue, Graph, etc..
There are many ways to implement an ADT.
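To make the distinction concrete, here is a small sketch in Go (the language choice is incidental): IntList plays the role of the ADT, while LinkedList and ArrayList are two concrete data structures that both implement it.
package main

import "fmt"

// IntList is the ADT: only the operations are specified, not the layout.
type IntList interface {
	Append(v int)
	Get(i int) int
	Size() int
}

// LinkedList is one concrete data type: nodes with a value and a next pointer.
type node struct {
	value int
	next  *node
}

type LinkedList struct {
	head, tail *node
	size       int
}

func (l *LinkedList) Append(v int) {
	n := &node{value: v}
	if l.tail == nil {
		l.head = n
	} else {
		l.tail.next = n
	}
	l.tail = n
	l.size++
}

func (l *LinkedList) Get(i int) int {
	cur := l.head
	for ; i > 0; i-- {
		cur = cur.next
	}
	return cur.value
}

func (l *LinkedList) Size() int { return l.size }

// ArrayList is another concrete data type: a growable array (Go slices
// already do the "double and copy" trick described above).
type ArrayList struct {
	items []int
}

func (a *ArrayList) Append(v int)  { a.items = append(a.items, v) }
func (a *ArrayList) Get(i int) int { return a.items[i] }
func (a *ArrayList) Size() int     { return len(a.items) }

func main() {
	// Client code depends only on the ADT, not on the concrete type.
	for _, list := range []IntList{&LinkedList{}, &ArrayList{}} {
		list.Append(10)
		list.Append(20)
		fmt.Println(list.Get(1), list.Size()) // 20 2
	}
}
Client code written against IntList keeps working no matter which concrete data type is plugged in, which is exactly the point of the distinction.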

Why are there "two names" for each GraphQL query/mutation?

I am learning GraphQL and one basic point has me puzzled. I know there is an easy explanation, but I can't find it. Specifically, from the Apollo documentation (https://www.apollographql.com/docs/apollo-server/essentials/data.html#operation):
...it makes sense to name the operation in order to quickly identify
operations during debugging or to aggregate similar operations
together...Operations can be named by placing an identifier after the
query or mutation keyword, as we’ve done with HomeBookListing here:
query HomeBookListing {
getBooks {
title
}
}
If HomeBookListing is the name of the query, what, then, is getBooks? The name of the resolver?
Similarly, when you pass variables to a query, why are there "two levels" of parameters, like this
mutation HomeQuickAddBook($title: String, $author: String = "Anonymous") {
addBook(title: $title, author: $author) {
title
}
}
So, would $title: String, $author: String = "Anonymous" be the variables passed to the query, and title: $title, author: $author variables passed to the resolver?
Of course I can memorise the pattern, but I'm keen to understand, conceptually, what the different pieces are doing here. Any insights much appreciated!
You may find it helpful to review the spec, but what follows is a somewhat shorter explanation:
What is an operation?
There are three operations in GraphQL (query, mutation and subscription). Typically, a GraphQL request consists of only one of these three operations, and it forms the root of the request, or the entry point into the rest of the schema.
Each operation has a single object type associated with it. By convention, these types are named Query, Mutation and Subscription, but their naming is functionally irrelevant to your schema. Other than their association with a particular operation, there's nothing special about these object types -- each has a name, description and fields just like any other object type in your schema. Collectively, we call these three types root operation types.
In your example, the query root type has a field called getBooks. That field is resolved according to the same rules as any other field in your schema. The only special thing about this field is that it's at the root -- there is no "parent" field that was resolved before it.
Operation names are optional because they do not impact the data returned by the server -- they are there generally for debugging purposes (although some clients and tools use them to provide other features, so it's always good to have them). Specifying at least one field name for your root operation type, however, is necessary, otherwise your operation would not actually do anything (i.e. query the server for the data). Again, these fields are your entry point into the rest of the schema and the starting point for your data graph.
Ok, but what about the variables?
According to the spec:
Variables must be defined at the top of an operation and are in scope throughout the execution of that operation.
While we do not initialize a variable inside the document with a value, we do need to define it by telling GraphQL what the type of the variable is. This allows GraphQL to then validate the usages of your variables throughout the document. For example, if you define a variable as a String and then attempt to use it at an input field that is an Int, validation will fail and your request will blow up before it is even executed.
Variables are always defined as part of the operation definition -- they can be used anywhere in the document, though, even multiple times. So there are no "two levels of parameters" here -- one line is simply the definition, the other line is usage.
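One thing that may make the definition/usage split click: when the request is actually sent, the document (containing the variable definitions) and the variable values travel as two separate pieces of the payload. A rough sketch in Go, with a made-up endpoint URL and book title:
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

func main() {
	// The document: operation name, variable definitions, and field selections.
	document := `
mutation HomeQuickAddBook($title: String, $author: String = "Anonymous") {
  addBook(title: $title, author: $author) {
    title
  }
}`

	// The conventional GraphQL-over-HTTP payload: the document goes under
	// "query", the variable values go under "variables".
	payload, _ := json.Marshal(map[string]interface{}{
		"query": document,
		"variables": map[string]string{
			"title": "The Hobbit", // $author is omitted, so its default is used
		},
	})

	resp, err := http.Post("https://example.com/graphql", "application/json",
		bytes.NewReader(payload))
	if err != nil {
		fmt.Println(err)
		return
	}
	defer resp.Body.Close()
	fmt.Println(resp.Status)
}
The response handling is elided; the point is only that the variable values never appear inside the document itself.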
A word on semantics
Even though we have a spec, the language around GraphQL has evolved past the terms outlined inside it. The term "query" has taken on multiple meanings that you may encounter while reviewing various docs and articles. It helps to keep these definitions in mind to avoid getting confused:
By convention, we name the root operation type associated with the query operation the Query type
Informally, the fields on that Query type (i.e. getBooks) are often referred to as the "queries" of your schema (just like the fields on the Mutation type are often called the "mutations" of your schema).
The complete request string we send to the server, which includes the whole operation and any relevant fragments, is officially called the document. However, we often refer to making a request as querying your server. This has led to the document itself often being called a query, whether the operation it contains is actually a query or a different operation like a mutation.

Best way to validate and extend constructor parameters in Scala 2.10

I want to have a class that has a number of fields such as String, Boolean, etc., and when the class is constructed I want to have a fieldName associated with each field and to verify the field (using a regex for strings). Ideally I would just like to specify in the constructor that the parameter needs to meet certain criteria.
Some sample code of how this might look:
case class Data(val name: String ..., val fileName: String ...) {
name.verify
// Access fieldName associated with the name parameter.
println(name.fieldName) // "Name"
println(fileName.fieldName) // "File Name"
}
val x = Data("testName", "testFile")
// Treat name as if it was just a string field in Data
x.name // Is of type string, does not expose fieldName, etc
Is there an elegant way to achieve this?
EDIT:
I don't think I have been able to get across clearly what I am after.
I have a class with a number of string parameters. Each of those parameters needs to be validated in a specific way, and I also want to have a string fieldName associated with each parameter. However, I still want to be able to treat the parameter as if it were just a normal string (see the example).
I could code the logic into Data, or into an apply method of the Data companion object, for each parameter, but I was hoping to have something more generic.
Putting logic (such as parameter validation) in constructors is dubious. Throwing exceptions from constructors is doubly so.
Usually this kind of creational pattern is best served with one or more factory methods or a builder of some sort.
For a basic factory, just define a companion with the factory methods you want. If you want the same short-hand construction notation (new-free) you can overload the predefined apply (though you may not replace the one whose signature matches the case class constructor exactly).
If you want to spare your client code the messiness of dealing with exceptions when validation fails, you can return Option[Data] or Either[ErrorIndication, Data] instead. Or you can go with ScalaZ's Validation, which I'm going to arbitrarily declare to be beyond the scope of this answer ('cause I'm not sufficiently familiar with it...)
However, you cannot have instances that differ in what properties they present. Not even subclasses can subtract from the public API. If you need to be able to do that, you'll need a more elaborate construct such as a trait for the common parts and separate case classes for the variants and / or extensions.
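The general shape of that advice, independent of the Scala specifics, is a factory that validates its inputs and returns a failure value rather than throwing from a constructor. A rough sketch in Go (the Data fields and the validation regexes are purely illustrative):
package main

import (
	"fmt"
	"regexp"
)

type Data struct {
	Name     string
	FileName string
}

var (
	namePattern     = regexp.MustCompile(`^[A-Za-z][A-Za-z0-9 ]*$`)
	fileNamePattern = regexp.MustCompile(`^[\w.-]+$`)
)

// NewData plays the role of the companion-object factory: it validates the
// parameters and returns an error value instead of throwing.
func NewData(name, fileName string) (Data, error) {
	if !namePattern.MatchString(name) {
		return Data{}, fmt.Errorf("Name %q is not valid", name)
	}
	if !fileNamePattern.MatchString(fileName) {
		return Data{}, fmt.Errorf("File Name %q is not valid", fileName)
	}
	return Data{Name: name, FileName: fileName}, nil
}

func main() {
	if d, err := NewData("testName", "testFile"); err == nil {
		fmt.Println(d.Name) // the caller still sees a plain string field
	}
}
In Scala, an apply on the companion object returning Option[Data] or Either[ErrorIndication, Data] plays the same role.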

How can I extract a part of a xaml object graph via linq to xml?

I have an object graph serialized to xaml. A rough sample of what it looks like is:
<MyObject xmlns.... >
<MyObject.TheCollection>
<PolymorphicObjectOne .../>
<HiImPolymorphic ... />
</MyObject.TheCollection>
</MyObject>
I want to use Linq to XML in order to extract the serialized objects within the TheCollection.
Note: MyObject may be named differently at runtime; I'm interested in any object that implements the same interface, which has a public collection called TheCollection that contains types of IPolymorphicLol.
The only things I know at runtime are the depth at which I will find the collection and that the collection element is named *.TheCollection. Everything else will change.
The xml will be retrieved from a database using Linq; if I could combine both queries so instead of getting the entire serialized graph and then extracting the collection objects I would just get back the collection that would be sweet.
Will,
It is not possible to find out whether an object implements some interface by looking at XAML.
With the constraints given, though, you can find every XML element that has a child whose name ends with .TheCollection.
You can use the following code; it will return all elements that have such a child:
static IEnumerable<XElement> FindElement(XElement root)
{
foreach (var element in root.Elements())
{
if (element.Name.LocalName.EndsWith(".TheCollection"))
{
yield return element.Parent;
}
foreach (var subElement in FindElement(element))
{
yield return subElement;
}
}
}
To make sure that the object represented by this element implements some interface, you need to read metadata from your assemblies. I would recommend using the Mono.Cecil framework to analyze the types in your assemblies without using reflection.
#aku
Yes, I know that xaml doesn't include any indication of base types or interfaces. But I do know the interface of the root objects, and the interface that the collection holds, at compile time.
The serialized graphs are stored in a SQL database as XML, and we're using LINQ to retrieve them as XElements. Currently, along with your solution, we are limited to deserializing the graphs, iterating through them, pulling out the objects we want from the collection, removing all references to them from their parents, and then disposing of the parents. It's all very kludgy. I was hoping for a single-stroke solution; something along the lines of an XPath, but inline with our LINQ to SQL query, that returns just the elements we're looking for...
