I have complex log files, which are full of noise.
Can someone recommend a simple utility that lets me define, using wildcards or any other method, which lines to filter out and which to highlight?
I'd also like a utility that can find, among a directory full of logs, the ones that fulfill a certain condition (e.g., containing a line that matches a certain template).
Regards
Have a look at LogParser from Microsoft. It has a SQL-like query language to allow you to filter log files based on conditions. Jeff Atwood has a brief overview of it here.
There's always good old "grep".
I have used BareTail with great success. It comes in a free as well as a paid version.
If you have a bit of time to play with Perl and regular expressions, this is the kind of thing they do beautifully.
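The same idea, sketched in Python rather than Perl (a plain swap of language, not a specific tool): shell-style wildcards from the standard fnmatch module decide which lines are dropped as noise and which are highlighted, and a second helper lists the logs under a directory that contain at least one line matching a template. The function names, the ANSI colour codes and the `*.log` naming are assumptions for illustration.

```python
import fnmatch
from pathlib import Path

# ANSI escape codes for highlighting (assumes an ANSI-capable terminal).
HIGHLIGHT = "\033[1;31m"
RESET = "\033[0m"


def filter_lines(lines, drop=(), highlight=()):
    """Yield lines, skipping those matching a `drop` wildcard and wrapping
    those matching a `highlight` wildcard in ANSI colour codes."""
    for line in lines:
        text = line.rstrip("\n")
        if any(fnmatch.fnmatchcase(text, pat) for pat in drop):
            continue
        if any(fnmatch.fnmatchcase(text, pat) for pat in highlight):
            text = f"{HIGHLIGHT}{text}{RESET}"
        yield text


def logs_matching(directory, template, file_glob="*.log"):
    """Return the log files under `directory` (recursively) that contain at
    least one line matching the wildcard `template`."""
    hits = []
    for path in sorted(Path(directory).rglob(file_glob)):
        with path.open(errors="replace") as fh:
            if any(fnmatch.fnmatchcase(line.rstrip("\n"), template) for line in fh):
                hits.append(path)
    return hits


if __name__ == "__main__":
    sample = [
        "DEBUG heartbeat ok\n",
        "INFO request served in 12 ms\n",
        "ERROR database connection lost\n",
    ]
    for out in filter_lines(sample, drop=["DEBUG *"], highlight=["ERROR *"]):
        print(out)
```

Wildcards keep the rule files easy to write; switching `fnmatch.fnmatchcase` to `re.search` gives full regular expressions instead.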
I'm currently in the process of making a site i18n-aware, marking hardcoded strings as translatable.
I wonder if there's any automated tool that would let me browse the site and quickly see which strings are marked and which still aren't. I saw a few projects like django-i18n-helper that try to highlight translated strings using HTML facilities, but this doesn't work well with JavaScript.
So I thought FДЦЖ CУЯILLIC, 𝔅𝔩𝔞𝔠𝔨𝔩𝔢𝔱𝔱𝔢𝔯 or ʇxǝʇ uʍop-ǝpısdn (or something along those lines) should do the trick. Easy to distinguish visually, still readable, yet doesn't depend on any rich text formatting besides Unicode support.
The problem is, I can't find any readily available tool that would eat gettext .po/.pot file(s) and spew out such a translation. Still, the idea seems pretty obvious, so there must be something out there already.
In my case I'm using Python/Django, but I suppose this question applies to anything that uses a gettext-compatible library. The only thing the tool should be aware of is that there could be HTML fragments in translation strings.
The msgfilter program will let you run your translations through any program you want. It works especially well with GNU sed.
For example, to turn all your translations into uppercase (HTML is mostly case-insensitive, so this should work):
msgfilter -i django.po sed -e 's/\(.*\)/\U\1/'
The only strings in your app that have lowercase letters in them would then be the hardcoded ones.
If you really want to do faux Cyrillic, you just have to write a program or script that reads Latin text and outputs the look-alike characters, and feed that program to msgfilter instead of sed.
If your distribution has a talkfilters package, it may provide a few programs that are useful in this specific case. All of these should work as msgfilter filters. (My personal favorite is chef. Bork bork bork!)
I haven't tried it myself yet, but I found the podebug tool from the Translate Toolkit. Based on its documentation (the flipped and unicode rewrite options), it looks like exactly the tool I wished for.
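A minimal sketch of such a filter in Python, assuming (as the GNU gettext manual describes) that msgfilter hands each translation to the filter on standard input and reads the result from standard output. The letter mapping is a small hand-picked illustration, and the protected patterns (HTML tags, entities, Python `%` placeholders, `{}` fields) are an assumption about what a Django catalog contains; they are skipped so markup and string formatting keep working.

```python
import re
import sys

# Illustrative Latin -> Cyrillic look-alike mapping (hand-picked, not exhaustive).
FAUX = str.maketrans({
    "A": "Д", "B": "Б", "N": "И", "R": "Я", "U": "Ц", "W": "Ш", "X": "Ж", "Y": "У",
    "a": "д", "b": "б", "n": "и", "r": "я", "u": "ц", "w": "ш", "x": "ж", "y": "у",
})

# Spans left untouched: HTML tags, entities, %(name)s / %s placeholders, {name} fields.
PROTECTED = re.compile(r"(<[^>]*>|&#?\w+;|%\(\w+\)[a-zA-Z]|%[a-zA-Z%]|\{\w*\})")


def faux_cyrillic(text):
    # With a capturing group, re.split puts the protected spans at odd indices.
    parts = PROTECTED.split(text)
    return "".join(
        part if i % 2 else part.translate(FAUX) for i, part in enumerate(parts)
    )


if __name__ == "__main__" and sys.argv[1:] == ["-"]:
    # "-" (a convention chosen for this sketch) means: filter standard input.
    sys.stdout.write(faux_cyrillic(sys.stdin.read()))
```

Saved under a hypothetical name such as faux_cyrillic.py, it could be run as `msgfilter --keep-header -i django.po -o django-faux.po python3 faux_cyrillic.py -`; `--keep-header` leaves the catalog header (including its charset declaration) unfiltered. If the msgstr fields are still empty, as in a fresh catalog for the source language, GNU gettext's msgen can first copy each msgid into its msgstr.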
I am writing a complex application (a compiler analysis). To debug it I need to examine the application's execution trace to determine how its values and data structures evolve during its execution. It is quite common for me to generate megabytes of text output for a single run, and sifting my way through all that is very labor-intensive. To help me manage these logs I've written my own library that formats them in HTML and makes it easy to color text from different code regions and to indent the output of called functions. An example of the output is here.
My question is: is there any better solution than my own home-spun library? I need some way to emit debug logs that may include arbitrary text and images, visually structure them and, if possible, index them so that I can easily find the region of the output I'm most interested in. Is there anything like this out there?
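For comparison, the core of such a library can be small. A minimal sketch using Python's standard html module (the class and method names are hypothetical, not the author's library): each region becomes an indented, coloured, collapsible `<details>` element, nested calls nest naturally through a context manager, and a linked index at the top makes regions easy to jump to.

```python
import html
from contextlib import contextmanager


class HtmlTraceLog:
    """Illustrative trace logger producing nested, indexed HTML."""

    def __init__(self):
        self.body = []
        self.toc = []    # (depth, title, anchor id) entries for the index
        self.depth = 0

    def log(self, text):
        # Escape so values containing <, > or & cannot break the markup.
        self.body.append(f"<div>{html.escape(text)}</div>")

    @contextmanager
    def region(self, title, color="black"):
        anchor = f"r{len(self.toc)}"
        self.toc.append((self.depth, title, anchor))
        self.body.append(
            f"<details open id='{anchor}' style='margin-left: 2em; "
            f"color: {html.escape(color)}'><summary>{html.escape(title)}</summary>"
        )
        self.depth += 1
        try:
            yield self
        finally:
            self.depth -= 1
            self.body.append("</details>")

    def render(self):
        index = "".join(
            f"<div style='margin-left: {2 * d}em'><a href='#{a}'>{html.escape(t)}</a></div>"
            for d, t, a in self.toc
        )
        return (
            "<html><body style='font-family: monospace'>"
            f"<nav>{index}</nav><hr>{''.join(self.body)}</body></html>"
        )


if __name__ == "__main__":
    log = HtmlTraceLog()
    with log.region("analyze(main)", color="navy"):
        log.log("worklist = [entry]")
        with log.region("transfer(block 3)", color="darkgreen"):
            log.log("x -> {1, 2}")
    print(log.render())
```

Because `<details>` elements collapse in any modern browser, large subtrees can be folded away without extra JavaScript.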
Although you didn't mention which language you're using, I'd like to suggest the Apache log4xxx family (log4j, log4net, log4cxx, and so on): http://logging.apache.org/
It offers configurable detail levels as well as named loggers. Its GUI tool, Chainsaw, can be combined with the good old grep approach (so you see only what you're interested in at the moment).
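Python's standard logging module follows a design similar to the log4xxx family: loggers are named in a dot-separated hierarchy and each can have its own level. A minimal sketch, with hypothetical logger names for parts of an analysis, that keeps the whole tree at INFO while turning one noisy subsystem up to DEBUG; the output stays plain text, so grep still works on it.

```python
import logging
import sys


def configure(stream=sys.stderr):
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("%(levelname)-5s %(name)s: %(message)s"))
    top = logging.getLogger("analysis")
    top.addHandler(handler)
    top.setLevel(logging.INFO)  # default detail level for the whole subtree
    # Zoom in on one region only; its children inherit this level.
    logging.getLogger("analysis.dataflow").setLevel(logging.DEBUG)
    return top


if __name__ == "__main__":
    configure()
    logging.getLogger("analysis.dataflow").debug("worklist size = %d", 3)  # emitted
    logging.getLogger("analysis.parser").debug("token stream ...")  # suppressed
    logging.getLogger("analysis.parser").info("parsed 12 functions")  # emitted
```

Messages from child loggers propagate up to the handler on "analysis", so a single handler serves the whole tree.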
Colorizing, search and filtering using an expression syntax are available in the latest developer snapshot of Chainsaw. The expression syntax also supports regular expressions (using the 'like' keyword).
Chainsaw can parse any regular text log file, not just log files generated by log4j.
The latest developer snapshot of Chainsaw is available here:
http://people.apache.org/~sdeboy
The File > Load Chainsaw configuration menu item is where you define the 'format' and location of the log file you want to process, and the expression syntax can be found in the tutorial, available from the Help menu.
Feel free to email the log4j users list if you have additional questions.
I created a framework that might help you: https://github.com/pablito900/VisualLogs
I'm looking for a library or technique to detect the input language of blocks of text provided by users. Online lookups (like Google translate) won't work for this task as I'm writing an app which must run offline.
Thanks.
Here are two more n-gram-based gems you might want to try. They work offline.
https://github.com/echen/unsupervised-language-identification, optimized for separating English from other languages (has a live demo)
https://github.com/feedbackmine/language_detector, less specialized; it detects more languages. Some languages may need extra training: I found it not precise enough for German text.
For anyone interested, I've found http://rubygems.org/gems/kenwaln-whatlanguage, which performs excellently.
I'm using CLD (Google's Compact Language Detector), which I really like: it's succinct and easy to use. Give it a try.
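Both gems are n-gram based. As a rough illustration of the idea (not of either gem's actual code), here is a toy Python sketch that builds character-trigram profiles from tiny hand-written English and German samples (assumed, and far too small for real use) and picks the language whose profile has the highest cosine similarity with the input. It runs fully offline.

```python
import math
from collections import Counter

# Tiny illustrative training samples; real detectors train on far more text.
SAMPLES = {
    "en": "the cat sat on the mat and the dog is in the house. "
          "this is the garden where the children play with the ball.",
    "de": "die katze sitzt auf der matte und der hund ist im haus. "
          "das ist der garten, wo die kinder mit dem ball spielen.",
}


def trigrams(text):
    """Count character trigrams of lower-cased text, padded with spaces."""
    padded = " " + " ".join(text.lower().split()) + " "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def cosine(a, b):
    # Counter returns 0 for missing keys, so unseen trigrams add nothing.
    dot = sum(count * b[gram] for gram, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


PROFILES = {lang: trigrams(text) for lang, text in SAMPLES.items()}


def detect(text):
    """Return the language whose trigram profile is most similar to `text`."""
    grams = trigrams(text)
    return max(PROFILES, key=lambda lang: cosine(grams, PROFILES[lang]))


if __name__ == "__main__":
    print(detect("der hund ist im garten"))
```

Accuracy depends almost entirely on the amount and variety of training text, which is also why some languages need extra training in the gems above.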
A quick demo of WhatLanguage in Ruby:
http://www.youtube.com/watch?v=lNqZ2cqOReo&list=UUJ_3fstMOH-g4yBxtvgAWkw&index=0&feature=plcp
Sadly, a project that I have been working on lately has a large amount of copy-and-paste code, even within single files. Are there any tools or techniques that can detect duplication or near-duplication within a single file? I have Beyond Compare 3 and it works well for comparing separate files, but I'm at a loss when it comes to finding duplication within a single file.
Thanks in advance.
Edit:
Thanks for all the great tools! I'll definitely check them out.
This project is an ASP.NET/C# project, but I work with a variety of languages including Java; I'm interested in what tools are best (for any language) to remove duplication.
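The core of within-file detection can be approximated with a sliding window: collapse whitespace, slide a window of N consecutive non-blank lines over the file, and report every window that occurs more than once. A minimal Python sketch; the window size of 4 is an arbitrary assumption, and unlike dedicated clone detectors that compare tokens or syntax trees, it will not catch copies with renamed identifiers.

```python
import sys
from collections import defaultdict


def normalize(line):
    # Collapse runs of whitespace so that re-indented copies still compare equal.
    return " ".join(line.split())


def duplicate_blocks(lines, window=4):
    """Return {block: [start lines]} for every run of `window` consecutive
    non-blank lines (after normalization) that occurs more than once.
    Start lines are 1-based line numbers in the original input."""
    numbered = [(i, normalize(text)) for i, text in enumerate(lines, start=1) if text.strip()]
    seen = defaultdict(list)
    for k in range(len(numbered) - window + 1):
        block = tuple(text for _, text in numbered[k:k + window])
        seen[block].append(numbered[k][0])
    return {block: starts for block, starts in seen.items() if len(starts) > 1}


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], encoding="utf-8") as fh:
        for block, starts in duplicate_blocks(fh.read().splitlines()).items():
            print(f"starting at lines {starts}: {block[0]} ...")
```

Because windows overlap, a duplicated stretch longer than `window` lines is reported once per window position; merging adjacent hits is left out to keep the sketch short.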
Check out Atomiq. It finds duplicated code that is ripe for extracting into one location.
http://www.getatomiq.com/
If you're using Eclipse, you can use the copy/paste detector (CPD): https://olex.openlogic.com/packages/cpd.
You don't say what language you are using, which is going to affect what tools you can use.
For Python there is CloneDigger. It also supports Java, but I have not tried that. It can find code duplication both within a single file and between files, and gives you the result as a diff-like report in HTML.
See SD CloneDR, a tool for detecting copy-paste-edit code within and across multiple files. It detects exact copies, copies that have been reformatted, and near-miss copies with different identifiers, literals, and even different sequences of statements.
CloneDR handles many languages, including Java (1.4, 1.5, 1.6) and C# up to C# 4.0. You can see sample clone detection reports on the website, including one for C#.
ReSharper does this automagically: it suggests when it thinks code should be extracted into a method, and will do the extraction for you.
Check out PMD. Once you have configured it (which is fairly simple), you can run its copy/paste detector to find duplicate code.
Someone with some Office skills can do the following sequence in about a minute:
use an ordinary formatter to unify the code style, preferably without line wrapping
feed the code text into Microsoft Excel as a single column
search and replace all double spaces with single ones, and do any other normalizing replacements
sort the column
At this point the duplicates will already stand out, since identical lines end up next to each other after sorting. To go further:
add a comparison formula to the 2nd column and a counter to the 3rd
copy and paste them as values, sort again, and look at the most repetitive lines
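The same steps can be scripted when a spreadsheet isn't at hand. A Python sketch under the same assumption that the code has first been run through a formatter: collapsing whitespace plays the role of the "replace double spaces" step, and counting identical lines replaces the sort, compare and counter columns. The function name is illustrative.

```python
import sys
from collections import Counter


def repeated_lines(lines, min_count=2):
    """Return (count, line) pairs for lines occurring at least `min_count`
    times, most repeated first, after collapsing whitespace."""
    normalized = (" ".join(line.split()) for line in lines)
    counts = Counter(text for text in normalized if text)  # blank lines skipped
    # most_common() sorts by count; ties keep first-seen order.
    return [(n, text) for text, n in counts.most_common() if n >= min_count]


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], encoding="utf-8") as fh:
        for n, text in repeated_lines(fh):
            print(f"{n:5d}  {text}")
```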
There is an analysis tool called Simian, which I haven't tried yet. Supposedly it can be run on any kind of text and point out duplicated items. It can be used via a command-line interface.
Another option similar to those above, but with a different tool chain: https://www.npmjs.com/package/jscpd
First of all, great praise goes out to PowerGREP. It's a great program.
But it's not free. Some of the features I'm looking for:
Being able to use .NET regexes (or similar) to find things in a filtered list of files through subdirectories.
Replacing the matches with regex-based replacements.
Being able to jump to that part of the file in some sort of editor.
Not command-line based (a GUI).
Being able to copy the results / filename and occurrences of the text.
Low overhead would also be nice, so not too many dependencies, etc.
And I need it on Windows.
I would suggest trying the new dnGrep. It's a .NET application that provides grep-like functionality and has almost all the features you specified.
Here are its features:
Shell integration (ability to search from Windows Explorer)
Plain text/regex/XPath search (including case-insensitive search)
Phonetic search (using Bitap and Needleman-Wunsch algorithms)
File move/copy/delete actions
Search inside archives (via plug-ins)
Search Microsoft Word documents (via plug-ins)
Search PDF documents (via plug-ins)
Undo functionality
Optional integration with a text editor (like Notepad++)
Bookmarks (ability to save regex searches for the future)
Pattern test form
Search result highlighting
Search result preview
Does not require installation (can be run from a USB drive)
Feature-wise nothing even comes close to PowerGREP, so the question is, how many compromises are you willing to make? I agree that PowerGREP's price tag is a bit steep (not that I have ever regretted a single penny I spent on it), so perhaps something cheaper might do?
UltraEdit is an excellent text editor with very good regex support. It supports Perl-style regular expressions, and you can do find/replace operations in multiple (optionally pre-filtered) files with it. I'd say it can do everything you want to do according to your question.
RegexBuddy, apart from being the best regex editor/debugger on the market, also has limited grep functionality, allowing search/replace in (pre-filtered) subdirectories. It's also not free, but considerably less expensive than PowerGREP, and its regex engine has all the features you could ask for (the current version even introduced recursive regexes, and the extremely useful ability to translate regexes between flavors). Big pluses here are the ability to do a non-destructive preview for all operations, and to have backups created automatically of all files that are modified during a grep.
I use GrepWin extensively during development and on production servers - it doesn't support all the features you specify, but it gets the job done (your mileage may vary).
For a fast-loading, fast-executing program that only finds (no search and replace), I've found BareGrep to be pretty good. It searches subdirectories too.
You might have a look at this:
Open Source PowerGREP Alternatives
Currently there are six alternatives to PowerGREP.
Get Cygwin for a bunch of free alternatives!
grep, sed, awk, perl, python... goes on.
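For instance, the core of the requested feature set (regex search through a filtered set of files in subdirectories, listing file name, line number and matching line, with optional replacement) takes only a few lines of Python. A minimal sketch, not a substitute for the GUI tools: the function name and parameters are hypothetical, it assumes UTF-8 text files, it uses Python's `re` syntax (close to, but not identical with, .NET's), and unlike RegexBuddy or PowerGREP it makes no backups.

```python
import fnmatch
import os
import re
import sys


def grep(root, pattern, file_glob="*", replacement=None):
    """Search files under `root` whose names match `file_glob` for the regex
    `pattern`; return (path, line_number, line) tuples. If `replacement` is
    given, matches are also substituted in place (no backups: use a copy)."""
    regex = re.compile(pattern)
    hits = []
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # visit subdirectories in a stable order
        for name in sorted(fnmatch.filter(filenames, file_glob)):
            path = os.path.join(dirpath, name)
            # newline="" keeps the file's own line endings when rewriting.
            with open(path, encoding="utf-8", newline="") as fh:
                text = fh.read()
            for number, line in enumerate(text.splitlines(), start=1):
                if regex.search(line):
                    hits.append((path, number, line))
            if replacement is not None:
                new_text = regex.sub(replacement, text)
                if new_text != text:
                    with open(path, "w", encoding="utf-8", newline="") as fh:
                        fh.write(new_text)
    return hits


if __name__ == "__main__" and len(sys.argv) >= 3:
    # usage: python grep_sketch.py ROOT PATTERN [GLOB]
    glob = sys.argv[3] if len(sys.argv) > 3 else "*"
    for path, number, line in grep(sys.argv[1], sys.argv[2], glob):
        print(f"{path}:{number}: {line}")
```

The search is line-based while the replacement works on the whole file text, so patterns that span lines or rely on `^`/`$` can behave differently in the two modes.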
But, oops! You want to stick to a GUI.
I always wonder at how people wrap a GUI around things like grep and get cash for that!
WinGrep seems to be free, though, and still packs quite a punch.
Windows Grep is designed for searching plain-ASCII text files, such as program source, HTML, RTF and batch files, but it can also search binary files such as word processor documents, databases, spreadsheets and executables.
I don't know PowerGREP, but grepWin lets you search for regexes in directories.
You can get GNU grep or Gawk.