Java Bytecode manipulation libraries - bytecode

I am starting to work on a project and for one of the tasks I need to analyze the source code in order to gather information about the classes and their methods. More specifically, for each method I need to know which internal attributes and external objects (references) it uses throughout the entire method body.
I discussed it with my supervisors and they think that Bytecode manipulation libraries is the way to go. I already looked at BCEL, ASM and Javassist but I'm not sure which one I need to use. Do they all provide access to the method body where I can see all the instructions and get the information I need?
Any advice would be appreciate it. Thank you!

If you really “need to analyze the source code”, then libraries which allow to inspect the bytecode are not the way to go.
Otherwise, you really need to define your task precisely. Either, you are about to analyze classes, regardless of whether you will look at their source code or byte code, or you want to analyze source code and consider doing it by compiling first, followed by analyzing the compiled result. In the latter case, you have to compare the effort of both steps with alternative solution, which may, e.g. incorporate direct source code analysis.
Parsing byte code is rather easy, easier than analyzing source code, which is the reason why bytecode is produced prior to the execution of Java programs. To answer your concrete question, yes, all three libraries offer you a way to analyze the instructions and associated information. Which one is the best to fit your needs, is a question that is beyond the scope of Stackoverflow.
Whether analyzing the byte code helps, depends on your exact requirements. When it comes to field and method access, you may precisely get most of them using that approach. Only inlined compile-time constants lack their origins. When it comes to type use, you have to consider that not every source code artifact has an existing counterpart in the byte code, e.g. widening casts produce no actual code and and local variables usually don’t have a declared type (debugging information aside), but only an implied type which depends on how they are actually used. They also have no information about Generics, unless debugging information has been included.

Related

Including ComplexExpr and ComplexFunc classes in Halide API

The ComplexExpr and ComplexFunc classes in the links below seem very convenient to work with complex numbers. Is there a plan to include them into the official Halide API? Or is there a reason why they are not included?
https://github.com/halide/Halide/blob/master/apps/fft/complex.h
https://github.com/halide/Halide/blob/be1269b15f4ba8b83df5fa0ef1ae507017fe1a69/apps/fft/funct.h
Speaking as a Halide developer...
Or is there a reason why they are not included?
We haven't included these historically since we didn't want to bless a particular representation for complex numbers. There are a few valid ways of dealing with them and the headers in question are just one.
Is there a plan to include them into the official Halide API?
We've started talking about packaging some of this type of code into a set of header-only "Halide tools" libraries, so named to avoid the normative implication of calling it something like "stdlib". So as of right now, there is no concrete plan, but the odds are nonzero.
In the meantime, the code is MIT licensed, so you should feel free to use those files, regardless.

Will go compilers ignore unused functions

If there is a function from an external package that is not used at all in my project, will the compiler remove the function from the generated machine code?
This question could be targeted at any language compiler in general. But, I think the behaviour may vary language to language. So, I am interested in knowing what does go compilers do.
I would appreciate any help on understanding this.
The language spec does not mention this anywhere, and from a correctness point of view this is irrelevant.
But know that the current version does remove certain constructs that the compiler can prove is not used and will not change the runtime behaviour of the app.
Quoting from The Go Blog: Smaller Go 1.7 binaries:
The second change is method pruning. Until 1.6, all methods on all used types were kept, even if some of the methods were never called. This is because they might be called through an interface, or called dynamically using the reflect package. Now the compiler discards any unexported methods that do not match an interface. Similarly the linker can discard other exported methods, those that are only accessible through reflection, if the corresponding reflection features are not used anywhere in the program. That change shrinks binaries by 5–20%.
Methods are a "harder" case than functions because methods can be listed and called with reflection (unlike functions), but the Go tools do what they can even to remove unused methods too.
You can see examples and proof of removed / unlinked code in this answer:
How to remove unused code at compile time?
Also see other relevant questions:
Splitting client/server code
Call all functions with special prefix or suffix in Golang

Scala dynamic class management

I would like to know if the following is possible in Scala (but I think the question can be applied also to Java):
Create a Scala file dynamically (ok, no problem here)
Compile it (I don't think this would be a real problem)
Load/Unload the new class dynamically
Aside from knowing if dynamic code loading/reloading is possible (it's possible in Java so I think it's feasible also in Scala) I would like also to know the implication of this in terms of performance degradation (I could have many many classes, with no name clash but really many of them!).
TIA!
P.S.: I know other questions about class loading in Scala exist, but I haven't been able to find an answer about performance!
Yes, everything you want to do is certainly possible. You might like to take a look at ScalaMock, which is an example of creating Scala source code dynamically. And at SBT which is an example of calling the compiler from code. And then there are many different systems that load classes dynamically - look at the documentation for loadLibrary as a starting point.
But, depending on what you want to achieve, you might like to look at Scala Macros instead. They provide the same kind of flexibility as you would get by generating source code and then compiling it, but without many of the downsides of that approach. The original version of ScalaMock used to work by generating source code, but I'm in the process of moving to using macros instead.
It's all possible in Scala, as is clearly demonstrated by the REPL. It's even going to be relatively easy with Scala 2.10.

How can one get a list of Mathematica's built-in global rewrite rules?

I understand that over a thousand built-in rewrite rules in Mathematica populate the global rules table by default. Is there any way to get Mathematica to give a full or even partial list of those rules?
The best way is to get a job at Wolfram Research.
Failing that, I think that for things not completely compiled into the kernel you can recover most of the rules/definitions. Look at
Attributes[fn]
where fn is the command that you're interested in. If it returns
{Protected, ReadProtected}
then there's something you can get a look at (although often it's just a MakeBoxes (formatting) definition or a AutoLoad/Stub type definition). To see what's there run
Unprotect[fn];
ClearAttributes[fn, ReadProtected];
??fn
Quite often you'll have to run an example of the command to load it if it was a stub. You'll also have to dig down from the user-facing commands to the back-end implementations.
Eventually you'll most likely reach a core command that is compiled into the kernel that you can not see the details of.
I previously mentioned this in tips for creating Graph diagrams and it got a mention in What is in your Mathematica tool bag?.
An good example, with a nice bite-sized and digestible bit of code is Experimental`AngularSlider[] mentioned in Circular/Angular slider. I'll leave it up to you to look at the code produced.
Another example is something like BoxWhiskerChart, where you need to call it once in order to load all of the code. Then you see that BoxWhiskerChart proceeds to call Charting`iBoxWhiskerChart which you'll have to unprotect to look at, etc...

Is there any downside to redundant qualifiers? Any benefit?

For example, referencing something as System.Data.Datagrid as opposed to just Datagrid. Please provide examples and explanation. Thanks.
The benefit is that you don't need to add an import for everything you use, especially if it's the only thing you use from a particular namespace, it also prevents collisions.
The downside, of course, is that the code balloons out in size and gets harder to read the more you use specific qualifiers.
Personally I tend to use imports for most things unless I know for sure I will only be using something from a particular namespace once or twice, so it won't impact the readability of my code.
You're being very explicit about the type you're referencing, and that is a benefit. Although, in the very same process you're giving up code clarity, which clearly is a downside in my case, as I want code to be readable and understandable. I go for the short version unless I have a conflict in different namespaces which can only be solved with the explicit referencing to classes.. Unless I make an alias for it with the keyword using:
using Datagrid = System.Data.Datagrid;
Actually the full path is global::System.Data.DataGrid. The point of using a more qualified path is to avoid having to use additional using statements, especially if the introduction of another using will cause problems with type resolution. More fully qualified identifiers exist so that you can be explicit when you need to be explicit, but if the class's namespace is clear, then the DataGrid version is clearer to many.
I generally use the shortest form available in order to keep the code as clean and readable as possible. That's what using directives are for, after all, and tooltips in the VS editor give you instant detail on the provenance of a type.
I also tend to use a namespace tag for RCWs in a COM interop layer, to call out those variables explicitly in the code (they may need special attention on lifecycle and collection), eg
using _Interop = Some.Interop.Namespace;
In terms of performance there is no upside/downside. Everything is resolved at compile time and the generated MSIL is identical whether you use fully-qualified names or not.
The reason why its use is prevalent in the .NET world is because of auto-generated code, such as designer markup. In that case it would be better to fully-qualify names like class names because of possible conflicts with other classes you may have in your code.
If you have a tool like ReSharper, it will actually tell you what fully-qualified references you have are unnecessary (e.g. by graying them out) so you can lop them off. If you frequently cut-paste code across your various code bases, it would be a must to fully qualify them. (then again, why would you want to do cut-paste all the time; it's a bad form of code reuse!)
I don't think there is really a downside, just readability vs actual time spent coding. In general if you don't have namespaces with ambiguous object I don't think it's really needed. Another thing to consider is level of use. If you have one method that uses reflection and you are alright with typeing System.Reflection 10 times, then it's not a big deal but if you plan on using a namespace alot then I would recommend an include.
Depending on your situation, extra qualifiers will generate a warning (if this is what you mean by redundant). If you then treat warnings as errors, that's a pretty serious downside.
I've run into this with GCC for example.
struct A {
int A::b; // warning!
}

Resources