Finding patterns in source code - algorithm

If I wanted to learn about pattern recognition in general what would be a good place to start (recommend a book)?
Also, does anybody have any experience/knowledge on how to go about applying these algorithms to find abstraction patterns in programs? (repeated code, chunks of code that do the same thing, but in slightly different ways, etc.)
Thanks
Edit: I don't mind mathematically intensive books. In fact, that would be a good thing.

If you are reasonably mathematically confident then either of Chris Bishop's books "Pattern Recognition and Machine Learning" or "Neural Networks for Pattern Recognition" are very good for learning about pattern recognition.

It helps if you have access to the parse tree generated during compilation. This way you can look for pieces of the tree which are similar, ignoring the nodes which are deeper than what you are looking at, this way you can pick out e.g. nodes which multiply together two sub-expressions, ignoring the contents of the sub-expressions. You can apply the same logic to a collection of nodes, e.g. you want to find a multiplication of two sub-expressions where those two sub-expressions are additions of more sub-expressions. You first look for multiplies, then check if the two nodes underneath the multiply are additions, ignoring anything any deeper.

I'd suggest looking at the code of some open source project (e.g. FindBugs or SIM)
that does the kind of thing you're talking about.

If you're working in one of the supported languages, IntelliJ idea has a really smart structural search and replace that would fit your problem.

Other interesting projects are PMD and Eclipse.
Eclipse uses AST (abstract syntax trees) for all source code in any project. Tools can then register for certain types of ASTs (like Java source) and get a preprocessed view where they can add additional information (like links to documentation, error markers, etc).

Another project you can look into is Duplo - it's an open-source/GPL project, so you can pore over their approach by grabbing the code from SourceForge.

This is specific to .Net and visual studio, but it finds duplicate code in your project. It does report some false positives I've found but it could be a good place to start.
Clone Detective

One kind of pattern is code that has been cloned by copy and paste methods. See CloneDR for a tool that automatically finds such code in spite of variations in layout and even changes in the body of the clone, by comparing abstract syntax trees for the language in question.
CloneDR works with a variety of langauges: C, C++, C#, Java, JavaScript, PHP, COBOL, Python, ... The website shows clone detection reports for a variety of programming languages.

Related

Marble diagram generator java/javascript for documentation using rxjava/rxjs or reactor

I am looking to create documentation for a project created with reactor library.
I searched but did not found any useful tool that generates photo diagrams after running a piece of reactor(or rx in general) code. The only thing i found is a text based syntax like this.Which I guess is a solution follow if i dont find anything else.
libraries found that use this syntax
https://flames-of-code.netlify.com/blog/rx-marbles/
https://github.com/cescoffier/rx-marble-docker
Ideally i would like to run a piece of code eg.
Flux.from(f1)
.bufferTimeout(writeDbBuffer, Duration.ofSeconds(10))
.parallel()
.runOn(Schedulers.parallel()).subscribe(photosBatch -> {
photoRepository.saveAll(photosBatch);
});
And generate marble diagram in photo or ever text based.
As a solution to the text based syntax mentioned above i could create text generators based on this syntax but this would require a lot of effort and time.
There is any way to generate images with marble diagrams with rxjava, rxjs or preferable reactor library from pieces of code?(I am including rx because is way more popular that reactor)
There is any library generating the above text based syntax from pieces of code?
What other options i have for documentation over these libraries?
also a similar question but not exactly what i am looking for
Something that dynamic is, to my knowledge, not yet available in the Java world. Closest thing I know of is rxfiddle, and to an extent rxmarbles.com (although the later doesn't allow generation from arbitrary pieces of code).
Generating clean and good looking visualization of arbitrary reactive sequences dynamically is no small task, but that's something the Reactor team would love to see at some point (either done officially or by the community).
The text-based solutions are great for simple marbles and simple operators, because you are in essence drawing the marble yourself, using the syntax of each tool (and thus being limited by it).
Higher-order sequences, parallelization, etc... introduce far greater complexity and start to stretch these tools to their limits.

Hadoop starter project suggestions

I would love to find a few topics, thanks.
MergeSort is a fantastic/easy one to start with. You could also go with generating word counts for all words in a file. A good source of data is the Project Gutenberg library of public domain books (you could always concatenate a few of them together).
If you want something more advanced but in the same vein as word count, you could write a very simple distributed spell checker. Peter Norvig as an awesome simple demonstration of a spell checker written in Python. A good exercise would be extending this algorithm to operate on a file in a distributed fashion.
You have a few projects here
There is a few nice and interesting examples of small hadoop projects. Everything is described very well, additionally you can find the source code and all needed theory.

Automate Finding Pertinent Methods in Large Project

I have tried to be disciplined about decomposing into small reusable methods when possible. As the project growing, I am re-implementing the exact same method.
I would like to know how to deal with this in an automated way. I am not looking for an IDE specific solution. Dependency on method names may not be sufficient. Unix and scripting are solutions that would be extremely beneficial. Answers such as "take care" etc. are not the solutions I am seeking.
I think the cheapest solution to implement might be to use Google Desktop. A more accurate solution would probably be much harder to implement - treat your code base as a collection of documents where the identifiers (or tokens in the identifiers) are words of the document, and then use document clustering techniques to find the closest matching code to a query. I'm aware of some research similar to that, but nothing close to out-of-the-box code that you could use. You might try looking on Google Code Search for something. I don't think they offer a desktop version, but you might be able to find some applicable code you can adapt.
Edit: And here's a list of somebody's favorite code search engines. I don't know whether any are adaptable for local use.
Edit2: Source Code Search Engine is mentioned in the comments following the list of code search engines. It appears to be a commercial product (with a free evaluation version) that is intended to be used for searching local code.

Minimum CompSci Knowledge Needed for Writing Desktop Apps

Having been a hobbyist programmer for 3 years (mainly Python and C) and never having written an application longer than 500 lines of code, I find myself faced with two choices :
(1) Learn the essentials of data structures and algorithm design so I can become a l33t computer scientist.
(2) Learn Qt, which would help me build projects I have been itching to build for a long time.
For learning (1), everyone seems to recommend reading CLRS. Unfortunately, reading CLRS would take me at least an year of study (or more, I'm not Peter Krumins). I also understand that to accomplish any moderately complex task using (2), I will need to understand at least the fundamentals of (1), which brings me to my question : assuming I use C++ as the programming language of choice, which parts of CLRS would give me sufficient knowledge of algorithms and data structures to work on large projects using (2)?
In other words, I need a list of theoretical CompSci topics absolutely essential for everyday application programming tasks. Also, I want to use CLRS as a handy reference, so I don't want to skip any material critical to understanding the later sections of the book.
Don't get me wrong here. Discrete math and the theoretical underpinnings of CompSci have been on my "TODO: URGENT" list for about 6 months now, but I just don't have enough time owing to college work. After a long time, I have 15 days off to do whatever the hell I like, and I want to spend these 15 days building applications I really want to build rather than sitting at my desk, pen and paper in hand, trying to write down the solution to a textbook problem.
(BTW, a less-math-more-code resource on algorithms will be highly appreciated. I'm just out of high school and my math is not at the level it should be.)
Thanks :)
This could be considered heresy, but the vast majority of application code does not require much understanding of algorithms and data structures. Most languages provide libraries which contain collection classes, searching and sorting algorithms, etc. You generally don't need to understand the theory behind how these work, just use them!
However, if you've never written anything longer than 500 lines, then there are a lot of things you DO need to learn, such as how to write your application's code so that it's flexible, maintainable, etc.
For a less-math, more code resource on algorithms than CLRS, check out Algorithms in a Nutshell. If you're going to be writing desktop applications, I don't consider CLRS to be required reading. If you're using C++ I think Sedgewick is a more appropriate choice.
Try some online comp sci courses. Berkeley has some, as does MIT. Software engineering radio is a great podcast also.
See these questions as well:
What are some good computer science resources for a blind programmer?
https://stackoverflow.com/questions/360542/plumber-programmers-vs-computer-scientists#360554
Heed the wisdom of Don and just do it. Can you define the features that you want your application to have? Can you break those features down into smaller tasks? Can you organize the code produced by those tasks into a coherent structure?
Of course you can. Identify any 'risky' areas (areas that you do not understand, e.g. something that requires more math than you know, or special algorithms you would have to research) and either find another solution, prototype a solution, or come back to SO and ask specific questions.
Moving from 500 loc to a real (eve if small) application it's not that easy.
As Don was pointing out, you'll need to learn a lot of things about code (flexibility, reuse, etc), you need to learn some very basic of configuration management as well (visual source safe, svn?)
But the main issue is that you need a way to don't be overwhelmed by your functiononalities/code pair. That it's not easy. What I can suggest you is to put in place something to 'automatically' test your code (even in a very basic way) via some regression tests. Otherwise it's going to be hard.
As you can see I think it's no related at all to data structure, algorithms or whatever.
Good luck and let us know
I must say that sitting down with a dry old textbook and reading it through is not the way to learn how to do anything effectively, even if you are making notes. Doing it is the best way to learn, using the textbooks as a reference. Indeed, using sites like this as a reference.
As for data structures - learn which one is good for whatever situation you envision: Sets (sorted and unsorted), Lists (ArrayList, LinkedList), Maps (HashMap, TreeMap). Complexity of doing basic operations - adding, removing, searching, sorting, etc. That will help you to select an appropriate library data structure to use in your application.
And also make sure you're reasonably warm with MVC - i.e., ensure your model is separate from your view (the QT front-end) as best as possible. Best would be to have the model and algorithms working on their own, and then put the GUI on top. Or a unit test on top. Etc...
Good luck!
It's like saying you want to move to France, so should you learn french from a book, and what are the essential words - or should you just go to France and find out which words you need to know from experience and from copying the locals.
Writing code is part of learning computer science. I was writing code long before I'd even heard of the term, and lots of people were writing code before the term was invented.
Besides, you say you're itching to write certain applications. That can't be taught, so just go ahead and do it. Some things you only learn by doing.
(The theoretical foundations will just give you a deeper understanding of what you wind up doing anyway, which will mainly be copying other people's approaches. The only caveat is that in some cases the theoretical stuff will tell you what's futile to attempt - e.g. if one of your itches is to solve an NP complete problem, you probably won't succeed :-)
I would say the practical aspects of coding are more important. In particular, source control is vital if you don't use that already. I like bzr as an easy to set up and use system, though GUI support isn't as mature as it could be.
I'd then move on to one or both of the classics about the craft of coding, namely
The Pragmatic Programmer
Code Complete 2
You could also check out the list of recommended books on Stack Overflow.

Do you know any patterns for GUI programming? (Not patterns on designing GUIs)

I'm looking for patterns that concern coding parts of a GUI. Not as global as MVC, that I'm quite familiar with, but patterns and good ideas and best practices concerning single controls and inputs.
Let say I want to make a control that display some objects that may overlap. Now if I click on an object, I need to find out what to do (Just finding the object I can do in several ways, such as an quad-tree and Z-order, thats not the problem). And also I might hold down a modifier key, or some object is active from the beginning, making the selection or whatever a bit more complicated. Should I have an object instance representing a screen object, handle the user-action when clicked, or a master class. etc.. What kind of patterns or solutions are there for problems like this?
I think to be honest you a better just boning up on your standard design patterns and applying them to the individual problems that you face in developing your UI.
While there are common UI "themes" (such as dealing with modifier keys) the actual implementation may vary widely.
I have O'Reilly's Head First Design Patterns and The Poster, which I have found invaluable!
Shameless Plug : These links are using my associates ID.
Object-Oriented Design and Patterns by Cay Horstmann has a chapter entitled "Patterns and GUI Programming". In that chapter, Horstmann touches on the following patterns:
Observer Layout Managers and the
Strategy Pattern Components,
Containers, and the Composite Pattern
Scroll Bars and the Decorator Pattern
I don't think the that benefit of design patterns come from trying to find a design pattern to fit a problem. You can however use some heuristics to help clean up your design in this quite a bit, like keeping the UI as decoupled as possible from the rest of the objects in your system.
There is a pattern that might help out in this case, the Observer Pattern.
I know you said not as global as MVC, but there are some variations on MVC - specifically HMVC and PAC - which I think can answer questions such as the ones you pose.
Other than that, try to write new code "in the spirit" of existing patterns even if you don't apply them directly.
perhaps you're looking for something like the 'MouseTrap' which I saw in some articles on codeproject (search for UI Platform)?
I also found this series very useful http://codebetter.com/jeremymiller/2007/07/26/the-build-your-own-cab-series-table-of-contents/ where you might have a look at embedded controllers etc.
Micha.
You are looking at a professional application programming. I searched for tips and tricks a long time, without success. Unfortunately you will not find anything useful, it is a complicated topic and only with many years of experience you will be able to understand how to write efficiently an application. For example, almost every program opens a file, extracts information, shows it in different forms, allow processing, saving, ... but nobody explains exactly what the good strategy is and so on. Further, if you are writing a big application, you need to look at some strategies to reduce your compilation time (otherwise you will wait hours at every compilation). Impls idioms in C++ help you for example. And then there is a lot more. For this reason software developers are well paid and there are so many jobs :-)

Resources