Looking for source code repositories for AI Planning systems written in CLIPS

I have gone through GitHub extensively. (As an aside, the last time I checked, the CLIPS tag had been removed.)
I am looking for CLIPS code repositories that contain AI planners. Since the word "CLIPS" confuses many search engines, I was hoping the wider community might have better answers. Even a single example planner that is more than 40 lines long, i.e. slightly more than a homework assignment, would help.

GitHub is the largest repository of CLIPS projects that I've found. I've seen a few planning projects there, mostly programs that move a robot to accomplish some goals.

I found a generalized planning system written in CLIPS that was part of a Masters Thesis here: https://github.com/arcra/PlanningEngine

Related

AlphaGo Improving Itself

I've read a couple of news articles about AlphaGo, and they all mention that AlphaGo became better by first playing human games and then playing games against itself. One thing I am curious about is how AlphaGo actually improved itself. Does it modify variables in the code? Does it rewrite its own code entirely? Or did the creators add the improvements? How does it actually learn? A generalised answer is fine, as it's just for my general knowledge.
Maybe I'm misunderstanding the whole concept, news articles tend to give a broad and sometimes misinformed understanding. Some clarity would be great or links to useful information.
AlphaGo uses machine learning.
In machine learning you have a function (say, ax + b) that gives you a result, and you tune the parameters of that function (a and b) so that the result matches the examples you have more and more closely. In the case of AlphaGo there were two functions, one to pick the next move and one to say who is winning, and both are very complex, with many thousands of parameters.
When they played a game between two instances of AlphaGo they would record the result and use it as an example to train the functions, so that the next version plays even better.
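To make "tuning the parameters" concrete, here is a toy sketch in Python; this is not AlphaGo's actual code (its two functions are huge neural networks), just the simplest possible version of the same loop: nudge the a and b of a*x + b until the output matches the examples.

```python
# Toy parameter tuning: fit y = a*x + b to example data by repeatedly
# nudging a and b to shrink the prediction error (gradient descent).
# AlphaGo does conceptually the same thing, with far more parameters.
examples = [(1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]  # (x, y) pairs from y = 2x + 1

a, b = 0.0, 0.0
learning_rate = 0.05
for _ in range(2000):
    for x, y in examples:
        error = (a * x + b) - y         # how far off the prediction is
        a -= learning_rate * error * x  # adjust each parameter slightly,
        b -= learning_rate * error      # in the direction that reduces error

print(a, b)  # converges close to a = 2, b = 1
```

In self-play, the recorded games play the role of `examples`: the training loop stays the same, only the data comes from AlphaGo's own matches.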
There are great tutorials on the web on how machine learning works if you want to know more.

Automate Finding Pertinent Methods in Large Project

I have tried to be disciplined about decomposing into small, reusable methods when possible. As the project grows, I find myself re-implementing the exact same method.
I would like to know how to deal with this in an automated way. I am not looking for an IDE-specific solution. Relying on method names alone may not be sufficient. Unix tools and scripting would be extremely beneficial as solutions. Answers such as "take care" are not the solutions I am seeking.
I think the cheapest solution to implement might be to use Google Desktop. A more accurate solution would probably be much harder to implement: treat your code base as a collection of documents where the identifiers (or the tokens within the identifiers) are the words of the documents, then use document-clustering techniques to find the code that most closely matches a query. I'm aware of some research along those lines, but nothing close to out-of-the-box code that you could use. You might try looking on Google Code Search for something; I don't think they offer a desktop version, but you might be able to find some applicable code you can adapt.
Edit: And here's a list of somebody's favorite code search engines. I don't know whether any are adaptable for local use.
Edit2: Source Code Search Engine is mentioned in the comments following the list of code search engines. It appears to be a commercial product (with a free evaluation version) that is intended to be used for searching local code.
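For a flavor of the document-clustering idea from the answer above, here is a rough Python sketch, assuming scikit-learn is available; the `src` tree, the `*.java` glob, and the query string are invented for illustration. Each source file becomes a document whose words are its split-apart identifiers, and files are ranked by cosine similarity to a query:

```python
import re
from pathlib import Path

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def identifier_words(source):
    """Lowercase words from a file's identifiers, split on
    underscores and camelCase boundaries."""
    words = []
    for ident in re.findall(r'[A-Za-z_]\w*', source):
        for part in re.split(r'_|(?<=[a-z])(?=[A-Z])', ident):
            if part:
                words.append(part.lower())
    return ' '.join(words)

files = sorted(Path('src').rglob('*.java'))      # hypothetical source tree
docs = [identifier_words(f.read_text(errors='ignore')) for f in files]

vectorizer = TfidfVectorizer()
matrix = vectorizer.fit_transform(docs)

# Describe the method you are about to write; the top hits are the
# places where something similar may already exist.
query = vectorizer.transform([identifier_words('parse config file key value')])
for score, path in sorted(zip(cosine_similarity(query, matrix)[0], files),
                          reverse=True)[:5]:
    print(f'{score:.2f}  {path}')
```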

How to go about a large refactoring project? [closed]

I am about to start planning a major refactoring of our codebase, and I would like to get some opinions and answers to some questions. I have seen quite a few discussions on similar topics, such as https://stackoverflow.com/questions/108141/how-do-i-work-effectively-with-very-messy-legacy-code and Strategy for large scale refactoring, but I have some specific questions (at the bottom):
We develop a complex application. Some 25 developers work on the codebase, and the total man-years put into the product to date are roughly 150.
The current codebase is a single project, built with ant. The high level goal of the project I'm embarking on is to modularize the codebase into its various infrastructures and applicative components.
There is currently no good separation between the various logical components, so it's clear that any modularization effort will need to include some API definitions and serious untangling to enable the separation.
Quality standards are low - there are almost no tests, and definitely no tests running as part of the build process.
Another very important point is that this project needs to take place in parallel to active product development and versions being shipped to customers.
Goals of project:
allow reuse of components across different projects
separate application from infrastructure, and allow them to evolve independently
improve testability (by creating APIs)
simplify developers' dev env (less code checked out and compiled)
My thoughts and questions:
What are your thoughts regarding the project's goals? Anything you would change?
Do you have experience with such projects? What would some of your recommendations be?
I'm very concerned about the lack of tests, and hence my lack of any way to know that the refactoring process is not breaking anything as I go. This is a catch-22, because one of the goals of this project is to make our code more testable...
I was very influenced by Michael Feathers' Working Effectively with Legacy Code. According to it, a bottom-up approach is the way to solve my problem: don't jump head-first into the codebase and try to fix it, but rather start small by adding unit tests around new code for several months, and watch the code (and the team) become much better, to the point where abstractions emerge, APIs surface, and essentially the modularization starts happening by itself.
Does anyone have experience with such a direction?
As seen in many other questions on this topic - the main problem here is managerial disbelief. "how is testing class by class (and spending a lot of time doing so) gonna bring us to a stable system? It's a nice theory which doesn't work in real life". Any tips on selling this?
Well, I guess it's better now than later, but you've definitely got a task ahead of you. I was once on a team of three responsible for refactoring a product of similar size. It was procedural code, but I'll describe some of the issues we had that will apply similarly.
We started at the bottom and eased into it by picking functions that should have been highly reusable but weren't. We'd write a bunch of unit tests on the existing code (none existed at all!), but before long we faced our first big problem: the existing code had bugs that had been lying dormant.
Do we fix them? If we do, then we've gone beyond a refactoring. So we'd log an issue with the existing code, hoping to get a fixed and freshly tested code base, but of course management decided there were more important priorities than fixing bugs that had never surfaced. Understandable.
So we thought we'd try fixing the bugs in our new code. Then we discovered that these bugs in the original code made other code work, so they were really 'conceptual bugs' rather than 'functional bugs'. Well, maybe. There were occasional intermittent spasms in the original software that had never been tracked down.
So then we changed tack and decided to keep the bugs in place, as a true refactoring should do. It's easy to introduce bugs unintentionally; it's far harder to do it intentionally!
The next problem was that the code was in such a mess that the initial unit tests we wrote had to change substantially to cater for the refactoring. In other words, two moving targets. Not good. Just writing the tests was taking ages and eroded our belief in the worthiness of the project. It really was something you just wanted to walk away from.
We found in the end that we really had to tone down the extent of the refactoring if we were going to finish this millennium, which meant the codebase we dreamed of wouldn't be achieved. We declared that the most feasible solution was just to clean and trim the code, and at least make it conceptually easier for future developers to understand and modify.
The reduced benefits of the limited refactoring were deemed not worth the effort by management, and given that similar reliability issues were being found in the hardware platform (an embedded project), the company decided it was their chance to renew the entire product, with the software written from scratch in a new language, with objects. It was only the extensive system test specs in place from the original product that gave this a chance.
Clearly the absence of tests is going to make people nervous when you attempt to refactor the code. Where will anybody get any faith that your refactoring doesn't break the application? Most of the answers you'll get, I think, will be "this is going to be very hard and not very successful", largely because you are facing a huge manual task and nobody has faith in the outcome.
There are only two ways out.
Build a bunch of tests. Unfortunately, this will cost a lot of time, and most managers don't see any value in it; after all, you've gotten along without them so far. Pointing back to the faith question won't help; you're still spending a lot of time before anything useful happens. If they do let you build tests, you'll have the problem of evolving the tests as you refactor; they may not change functionality one bit, but as you build new APIs the tests will have to change to match the new APIs. That's additional work beyond refactoring the code base. (A small sketch of this kind of test follows after the second option below.)
Automate the refactoring process. If you apply trustworthy automated transformations, you can argue (often unsuccessfully) that the refactored code preserves the original system's function. The way to beat the unsuccessful argument is to write those tests (see the first method) and apply the refactoring process to both the application and the tests; as the application's structure changes, the tests have to change too. But they are just application code from the point of view of the automated machinery.
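Returning to the first option: the kind of test that helps here is what Michael Feathers calls a characterization test, one that pins down what the code does today, bugs and all. A minimal Python sketch, where `legacy_price` is a made-up stand-in for a real legacy routine:

```python
import unittest

def legacy_price(qty, unit_price, vip):
    """Hypothetical stand-in for legacy code we dare not change yet."""
    total = qty * unit_price
    if vip and qty > 10:
        total *= 0.9        # undocumented discount: bug or feature?
    return round(total, 2)

class CharacterizationTests(unittest.TestCase):
    # The expected values were produced by *running* the legacy code,
    # not by reading a spec: we freeze current behaviour, bugs and all,
    # so any refactoring that changes observable behaviour fails loudly.
    def test_current_behaviour(self):
        self.assertEqual(legacy_price(5, 9.99, False), 49.95)
        self.assertEqual(legacy_price(20, 9.99, True), 179.82)
        self.assertEqual(legacy_price(20, 9.99, False), 199.8)

if __name__ == '__main__':
    unittest.main()
```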
Not a lot of people do the latter; where do you get the tools that can do such things?
In fact, such tools exist. They are called program transformation tools and are used to carry out massive transformations on code.
Think of these as tools for literally refactoring in the large; because of that scale, they tend not to be interactive.
It does take effort to configure them for the task at hand; you have to write custom rules to accomplish your custom desired result. You likely can't do that in a week, but it is a lot less work than manually modifying a large system. And you should consider that you have 150 man-years invested in the existing software; it took that long to make the mess. It seems reasonable that "some" effort, small in comparison, should be acceptable.
I only know of 3 such tools that have a chance of working on real code: TXL, Stratego/XT, and our tool, the DMS Software Reengineering Toolkit. The first two are academic products (although TXL has been used for commercial activities in the past); DMS is commercial.
DMS has been used for a wide variety of large-scale software analysis and massive transformation tasks. One task was automated translation between languages for the B-2 Stealth Bomber. Another, much closer to your refactoring problem, was the automated re-architecting of a large-scale component-based system (C++ components) from a legacy proprietary RTOS, with its idiosyncratic rules about how components are organized, to CORBA/RT, in which the component APIs had to be changed from ad hoc structures to CORBA-style facet and receptacle interfaces, as well as using CORBA/RT services in place of the legacy RTOS services. (Both of these tasks were done with 1-2 man-years of actual effort, by pretty smart and DMS-savvy guys.)
There's still the test-construction problem (both of the examples above already had great system tests). Here I'm going to go out on a limb. I believe there is hope in getting such tools to automate test generation by instrumenting running code to collect function input-output results. We've built all kinds of instrumentation for source code (obviously you have to compile it after instrumentation) and think we know how to do this. YMMV.
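As an illustration of that instrumentation idea (the names below are invented; this is not DMS's actual machinery): wrap functions so that every call records its inputs and output, and an ordinary run of the program yields data you can later replay as regression assertions.

```python
import functools
import json

CAPTURE_LOG = []   # grows as the instrumented program runs

def capture(fn):
    """Record each call's inputs and output for later replay as tests."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        result = fn(*args, **kwargs)
        CAPTURE_LOG.append({'fn': fn.__name__, 'args': list(args),
                            'kwargs': kwargs, 'result': result})
        return result
    return wrapper

@capture
def normalize(s):                     # illustrative function under capture
    return s.strip().lower()

normalize('  Hello World ')
print(json.dumps(CAPTURE_LOG, indent=2))  # becomes assertion data later
```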
There is something you can do that is considerably less ambitious: identify the reusable parts of the code by finding out what has already been reused in the code. Most software systems contain a lot of cloned code (our experience is 10-20%; I'm surprised by the PHP report of smaller numbers in another answer, and I suspect they are using a weak clone detector). Cloned code is a hint of a missing abstraction in the application software. If you can find the clones and see how they vary, you can pretty easily see how to abstract them into functions (or whatever) to make them explicit and reusable.
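To make the clone-finding idea concrete, here is a deliberately crude Python sketch: it records every window of N normalized lines and reports windows that occur in more than one place. Unlike CloneDR's syntax-tree comparison, it finds only exact (whitespace-insensitive) duplicates, and the `src` tree and `*.java` glob are assumptions for illustration.

```python
from collections import defaultdict
from pathlib import Path

WINDOW = 6                     # minimum clone size, in lines
locations = defaultdict(list)  # window contents -> places it occurs

for path in Path('src').rglob('*.java'):   # hypothetical source tree
    lines = [ln.strip() for ln in path.read_text(errors='ignore').splitlines()]
    for i in range(len(lines) - WINDOW + 1):
        window = tuple(lines[i:i + WINDOW])
        if any(window):                    # skip all-blank windows
            locations[window].append((str(path), i + 1))

for window, places in locations.items():
    if len(places) > 1:
        print(f'{WINDOW}-line clone at: {places}')
```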
Salion Inc. did clone detection and abstraction. The paper doesn't explore the abstraction activity; what Salion actually did was a periodic review of the detected clones, with manual remediation of the egregious ones, or of those that made sense, into (often library) methods. The net result was that the code base actually shrank in size, and the programmers became more effective because they had better ("more reusable") libraries.
They used our CloneDR, a tool for finding clones by using the program syntax as a guide. CloneDR finds exact clones and near misses (replacement of identifiers or statements) and provides a specific list of clone locations and clone parameterizations, regardless of layout and comments. You can see clone reports for a number of languages at the link. (I'm the originator and author of CloneDR, among my many hats.)
Regarding the "small clone percentage" for the PHP project discussed in another answer: I don't know what was being used for a clone detector. The only clone detector focused on PHP that I know is PHPCPD, which IMHO is a terrible clone detector; it only finds exact clones if I understand the claimed implementation. See the PHP example at our site for comparative purposes.
This is exactly what we've been doing for web2project for the past couple of years. We forked from an existing system (dotProject) that had terrible metrics: high cyclomatic complexity (low: 17, avg: 27, high: 195M), lots of duplicate code (8% of the overall code), and zero tests.
Since the split, we've reduced the duplicate code (to 2.1% overall), reduced the total code (200 kloc to 155 kloc), added nearly 500 unit tests, and improved the cyclomatic complexity (low: 1, avg: 11, high: 145M). Yes, we still have a ways to go.
Our strategy is detailed in my slides here:
http://caseysoftware.com/blog/phpbenelux-2011-recap - Project Triage & Recovery; and here:
http://www.phparch.com/2010/11/codeworks-2010-slides/ - Unit Testing Strategies; and in various posts like this one:
http://caseysoftware.com/blog/technical-debt-doesn039t-disappear
And just to warn you: it's not fun at first. It can be fun and satisfying once your metrics start improving, but that takes a while.
Good luck.

Three Way Merge Algorithms for Text

So I've been working on a wiki-type site. What I'm trying to decide is which algorithm is best for merging an article that is simultaneously being edited by two users.
So far I'm considering using Wikipedia's method: merge the documents if two unrelated areas are edited, but throw away the older change if the two edits conflict.
My question is as follows: If I have the original article, and two changes to it, what are the best algorithms to merge them and then deal with conflicts as they arise?
Bill Ritcher's excellent paper "A Trustworthy 3-Way Merge" talks about some of the common gotchas with three way merging and clever solutions to them that commercial SCM packages have used.
The 3-way merge will automatically apply all the changes (which are not overlapping) from each version. The trick is to automatically handle as many almost overlapping regions as possible.
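As a rough illustration of that behavior, here is a minimal line-based three-way merge sketch in Python. It is far simpler than a production diff3 (for instance, it applies two insertions at the same point one after the other instead of flagging them), but it shows the core idea: apply non-overlapping change regions from each side, and turn genuinely overlapping ones into conflict blocks.

```python
from difflib import SequenceMatcher

def changed_regions(base, other):
    """(lo, hi, replacement) for each region where `other` differs from
    `base`; lo/hi are half-open line indexes into `base`."""
    regions = []
    for tag, i1, i2, j1, j2 in SequenceMatcher(None, base, other).get_opcodes():
        if tag != 'equal':
            regions.append((i1, i2, other[j1:j2]))
    return regions

def apply_side(base, regions, lo, hi):
    """Rebuild base[lo:hi] with one side's change regions applied."""
    out, pos = [], lo
    for rlo, rhi, rep in regions:
        out.extend(base[pos:rlo])
        out.extend(rep)
        pos = rhi
    out.extend(base[pos:hi])
    return out

def merge3(base, a, b):
    """Merge edits `a` and `b` (lists of lines) against `base`."""
    events = [(r, 'a') for r in changed_regions(base, a)] + \
             [(r, 'b') for r in changed_regions(base, b)]
    events.sort(key=lambda e: (e[0][0], e[0][1]))
    merged, pos, i = [], 0, 0
    while i < len(events):
        (lo, hi, _), _ = events[i]
        cluster = [events[i]]
        i += 1
        while i < len(events) and events[i][0][0] < hi:  # regions that touch
            hi = max(hi, events[i][0][1])
            cluster.append(events[i])
            i += 1
        merged.extend(base[pos:lo])
        pos = hi
        va = apply_side(base, [r for r, s in cluster if s == 'a'], lo, hi)
        vb = apply_side(base, [r for r, s in cluster if s == 'b'], lo, hi)
        if va == vb:                 # both sides made the same change
            merged.extend(va)
        elif vb == base[lo:hi]:      # only side a changed this region
            merged.extend(va)
        elif va == base[lo:hi]:      # only side b changed this region
            merged.extend(vb)
        else:                        # genuine conflict: emit both versions
            merged += ['<<<<<<< a'] + va + ['|||||||'] + base[lo:hi] \
                    + ['======='] + vb + ['>>>>>>> b']
    merged.extend(base[pos:])
    return merged

base = 'one\ntwo\nthree'.splitlines()
ours = 'one\nTWO\nthree'.splitlines()
theirs = 'one\ntwo\nTHREE'.splitlines()
print('\n'.join(merge3(base, ours, theirs)))  # both edits merge cleanly
```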
There's a formal analysis of the diff3 algorithm, with pseudocode, in this paper:
http://www.cis.upenn.edu/~bcpierce/papers/diff3-short.pdf
It is titled "A Formal Investigation of Diff3" and was written by Sanjeev Khanna, Keshav Kunal, and Benjamin C. Pierce.
Frankly, I'd rely on diff3. It's on pretty much every Unix distro, and you can always build and bundle an .EXE for Windows to ensure it is there for your purposes.

Finding patterns in source code

If I wanted to learn about pattern recognition in general what would be a good place to start (recommend a book)?
Also, does anybody have any experience/knowledge on how to go about applying these algorithms to find abstraction patterns in programs? (repeated code, chunks of code that do the same thing, but in slightly different ways, etc.)
Thanks
Edit: I don't mind mathematically intensive books. In fact, that would be a good thing.
If you are reasonably mathematically confident, then either of Chris Bishop's books, "Pattern Recognition and Machine Learning" or "Neural Networks for Pattern Recognition", is very good for learning about pattern recognition.
It helps if you have access to the parse tree generated during compilation. That way you can look for pieces of the tree which are similar, ignoring the nodes that are deeper than the level you are looking at; for example, you can pick out nodes which multiply together two sub-expressions, ignoring the contents of those sub-expressions. You can apply the same logic to a collection of nodes, e.g. to find a multiplication of two sub-expressions where those two sub-expressions are additions of further sub-expressions: first look for multiplies, then check whether the two nodes underneath the multiply are additions, ignoring anything deeper.
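In Python this is easy to experiment with using the standard `ast` module (Python 3.9+ for `ast.unparse`); here is a small sketch of exactly that multiply-of-two-additions search, with the code being searched invented for illustration:

```python
import ast

source = """
y = (a + b) * (c + d)
z = a * b
w = (p + q) * (r + s + t)
"""

tree = ast.parse(source)
for node in ast.walk(tree):
    # Match a multiply whose two children are additions, ignoring
    # whatever is nested deeper inside those additions.
    if (isinstance(node, ast.BinOp) and isinstance(node.op, ast.Mult)
            and isinstance(node.left, ast.BinOp)
            and isinstance(node.left.op, ast.Add)
            and isinstance(node.right, ast.BinOp)
            and isinstance(node.right.op, ast.Add)):
        print(f'line {node.lineno}: {ast.unparse(node)}')
# prints the y = ... and w = ... multiplications, but not z = a * b
```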
I'd suggest looking at the code of some open-source project (e.g. FindBugs or SIM) that does the kind of thing you're talking about.
If you're working in one of the supported languages, IntelliJ IDEA has a really smart structural search and replace that would fit your problem.
Other interesting projects are PMD and Eclipse.
Eclipse uses ASTs (abstract syntax trees) for all source code in any project. Tools can then register for certain types of AST (like Java source) and get a preprocessed view where they can add additional information (like links to documentation, error markers, etc.).
Another project you can look into is Duplo - it's an open-source/GPL project, so you can pore over their approach by grabbing the code from SourceForge.
Clone Detective is specific to .NET and Visual Studio, but it finds duplicate code in your project. It does report some false positives in my experience, but it could be a good place to start.
One kind of pattern is code that has been cloned by copy-and-paste. See CloneDR for a tool that automatically finds such code, in spite of variations in layout and even changes in the body of the clone, by comparing abstract syntax trees for the language in question.
CloneDR works with a variety of languages: C, C++, C#, Java, JavaScript, PHP, COBOL, Python, ... The website shows clone-detection reports for a variety of programming languages.
