Beyond Compare 4: How to show only lines containing a specific string? - filter

I am trying to compare two log files containing a list of transactions. The fields of these transactions are defined within the line itself, e.g.:
transactionID: 1, transactionType: 6, transactionData: 123456
transactionID: 2, transactionType: 6, transactionData: 654321
In one log file, transactionType 6 transactions may come consecutively, while in the other file they may be interlaced with other transaction types. So while transactionID may be different, they would still be in the same order and contain the same data.
How can I filter or otherwise only show the lines in both files which contain the string "transactionType: 6"? This would filter out all the other transactions and allow me to see only the ones with Type 6.
Thank you.

What you are asking is not possible in Beyond Compare 4.1.1 (current version).
The closest you can get to what you're describing is to only display differences within text matching a specific string.
Define a regular expression grammar element that matches on ".*transactionType: 6.*", following the "Define Unimportant Text in Beyond Compare" instructions.
After you've defined the grammar element, click the Rules toolbar button (referee icon). In the Importance tab, check the box next to your new grammar element and uncheck all other grammar elements to make them unimportant. This will only highlight differences in lines that match the grammar element you defined.

Here's the way I was able to accomplish the desired behavior in BC4.
BC supports running a "pre-processor" application as it opens a file for comparison.
So I wrote a simple executable that takes three arguments (argv[]):
1. The path to the original file
2. The path to the processed file (which BC will ultimately open for comparison)
3. The path to a text file containing the search substrings, one per line (more on this below)
The file in item 3 may contain a single entry, such as "transactionType: 6" (to use the example from the original question). The executable searches each line of the original file (item 1) for the substrings listed in item 3. Every time there is a hit, the whole line is appended to the output file (item 2).
Next, you need to define a File Format in BC (on a Mac you go to Beyond Compare menu and click on File Formats...). Select your file's extension and click on the Conversion tab. Use the screenshot below as example.
Note: %s is defined by BC to refer to the path of the original file (item 1) and %t refers to the path of the processed file (item 2).
[Screenshot: Conversion tab of the File Formats dialog]
So, when you open File1.txt and File2.txt for comparison, BC will invoke your executable (once for each) and then open the resulting files (item 2). This will effectively show a "filtered" version of both files showing only lines containing the search substring.
Also note that the %t argument will be some sort of temp path generated internally by BC.
Below is a quick-and-dirty implementation of the executable described above:
#include <fstream>
#include <iostream>
#include <list>
#include <string>
using namespace std;
int main(int argc, const char * argv[])
{
    // Expect three arguments: original file, processed file, substring file
    if (argc < 4)
    {
        cerr << "usage: filter INPUT OUTPUT SUBSTRINGS\n";
        return 1;
    }
    ifstream inputFile(argv[1]);
    ofstream outputFile(argv[2]);
    ifstream substringFile(argv[3]);
    if (!inputFile || !outputFile || !substringFile)
    {
        cerr << "could not open one of the files\n";
        return 1;
    }
    // Build the list of substrings from the substring input file
    list<string> substringList;
    string line;
    while (getline(substringFile, line))
    {
        // Skip blank entries: an empty substring would match every line
        if (!line.empty())
            substringList.push_back(line);
    }
    // For each line of the input file, in its original order
    while (getline(inputFile, line))
    {
        // Copy the line once if it contains any of the substrings
        for (list<string>::const_iterator iter = substringList.begin(); iter != substringList.end(); ++iter)
        {
            if (line.find(*iter) != string::npos)
            {
                outputFile << line << "\n";
                break;
            }
        }
    }
    // The streams are closed automatically when they go out of scope
    return 0;
}
Hope this helps!

Related

string size limit input cin.get() and getline()

In this project the user can type in a text(maximum 140 characters).
so for this limitation I once used getline():
string text;
getline(cin, text);
text = text.substr(1, 140);
but in this case the result of cout << text << endl; is an empty string.
so I used cin.get() like:
cin.get(text, 140);
this time I get this error: no matching function for call to ‘std::basic_istream::get(std::__cxx11::string&, int)’
note that I have included <iostream>
So the question is: why is this happening, and how can I fix it?
Your first approach is sound with one correction - you need to use
text = text.substr(0, 140);
instead of text = text.substr(1, 140);. Strings in C++ are indexed from 0, so substr(1, 140) skips the first character and returns up to 140 characters starting at the second one. If the string happens to be only one character long, text.substr(1, 140) does not crash, but it does not produce the desired output either.
According to this source, substr throws an out-of-range exception only if the starting position is greater than the string length. For a one-character string, position 1 equals the string length, which is well defined: the call returns an empty string, as it did in your case and mine. I recommend you test it yourself in the interactive coding section following the link above.
Your second approach tried to pass a string to a function that expected C-style character arrays. Again, more can be found here. Like the error said, the compiler couldn't find a matching function because the argument was a string and not the char array. Some functions will perform a conversion of string to char, but this is not the case here. You could convert the string to char array yourself, as for instance described in this post, but the first approach is much more in line with C++ practices.
Last note - currently you're only reading a single line of input, I assume you will want to change that.

How is CTRL-R (reverse-i-search) implemented in the bash terminal?

Example of the reverse search:
(reverse-i-search)`grep': git log | grep master
What is the algorithm used to find a suggestion?
Where does its search space come from ?
A pointer to its source code would be greatly appreciated.
Reverse-i-search is part of the GNU Readline library, which provides line input together with editing facilities. The entire source code can be found here.
Source of search space
Following code snippet from the source shows how the source file for history is determined :
/* Return the string that should be used in the place of this
   filename.  This only matters when you don't specify the
   filename to read_history (), or write_history (). */
static char *
history_filename (filename)
     const char *filename;
{
  char *return_val;
  const char *home;
  int home_len;
  return_val = filename ? savestring (filename) : (char *)NULL;
  if (return_val)
    return (return_val);
  home = sh_get_env_value ("HOME");
#if defined (_WIN32)
  if (home == 0)
    home = sh_get_env_value ("APPDATA");
#endif
  if (home == 0)
    return (NULL);
  else
    home_len = strlen (home);
  return_val = (char *)xmalloc (2 + home_len + 8); /* strlen(".history") == 8 */
  strcpy (return_val, home);
  return_val[home_len] = '/';
#if defined (__MSDOS__)
  strcpy (return_val + home_len + 1, "_history");
#else
  strcpy (return_val + home_len + 1, ".history");
#endif
  return (return_val);
}
savestring() is defined in savestring.c; it simply returns a copy of the string filename if one was given.
sh_get_env_value() is implemented using getenv() (provided by <stdlib.h>), which reads an environment variable (see the getenv(3) man page).
As shown, when no file name is passed the function falls back to .history in the home directory. On a Linux system, bash normally passes the value of HISTFILE, which defaults to ~/.bash_history, so one of these two files is the source for the search.
Source : histfile.c
How history is stored
The searchable history is stored in an array of HIST_ENTRY (the history list). The data from .bash_history is added to this array. Source: history.c
The record of the commands entered in the current session is saved in _rl_saved_line_for_history.
These two are combined into the lines array of a _rl_search_cxt instance (cxt->lines[]), on which the search is performed.
Algorithm
The actual search is performed by the _rl_isearch_dispatch() and _rl_search_getchar() functions.
Short summary:
The algorithm reads the input character by character and decides what to do with each one. If the search is not interrupted, it appends the character to the search string and searches for that string in the array. If the string is not found in the current element, the search moves on to the next one, skipping over entries identical to the one already found and entries shorter than the current search string. (See the default: case of the switch in _rl_isearch_dispatch() for the exact details.)
If the string is not found at all, the bell is rung. Otherwise the matching line is displayed, but the position in the history list does not actually change until the user accepts it.
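The summary above can be illustrated with a small Python sketch. It is a simplified model written for this explanation, not a transcription of the Readline source: the function names find_backward and reverse_isearch are made up, the history is assumed to be a plain list ordered oldest to newest, and matching is a plain substring test.

```python
def find_backward(history, query, start, skip_line=None):
    """Return the index of the newest entry at or before `start` that
    contains `query`, or -1 if there is none (hypothetical helper)."""
    for i in range(start, -1, -1):
        line = history[i]
        if len(line) < len(query):   # too short to contain the query
            continue
        if line == skip_line:        # same text as the match already shown
            continue
        if query in line:
            return i
    return -1


def reverse_isearch(history, typed):
    """Feed `typed` to the search one character at a time.

    Returns one entry per keystroke: the line displayed after that
    keystroke, or None where the search failed (the bell)."""
    query = ""
    index = len(history) - 1         # start at the newest entry
    shown = []
    for ch in typed:
        query += ch                  # extend the search string
        found = find_backward(history, query, index)
        if found < 0:
            shown.append(None)       # nothing matches: ring the bell
        else:
            index = found            # remember where the match was
            shown.append(history[found])
    return shown


history = [
    "grep -r TODO src",
    "git log | grep master",
    "ls",
    "git log | grep master",
]
print(reverse_isearch(history, "grep"))
# Asking for the next older match skips the identical line at index 1.
print(find_backward(history, "grep", 2, skip_line=history[3]))
```

Each keystroke restarts the search from the last matched position rather than from the newest entry, which is why a longer search string can only move the match further back in history.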

Find lines that have partial matches

So I have a text file that contains a large number of lines. Each line is one long string with no spacing; however, the line contains several pieces of information. The program knows how to differentiate the important information in each line. The program identifies that the first 4 numbers/letters of the line correspond to a specific instrument. Here is a small example portion of the text file.
example text file
1002IPU3...
POIPIPU2...
1435IPU1...
1812IPU3...
BFTOIPD3...
1435IPD2...
As you can see, two lines in this text file contain 1435, which corresponds to a specific instrument. However, these lines are not identical. The program I'm using cannot do its calculation if the same station appears more than once (i.e., there are two 1435* stations). I need a way to search through my text files and identify any duplicates of the partial strings that represent the stations, so that I can delete one or both of the duplicates. If a Bash script could output the line numbers of the duplicates and what the duplicate lines say, that would be appreciated. I think there might be an easy way to do this, but I haven't been able to find any examples. Your help is appreciated.
If all you want to do is detect if there are duplicates (not necessarily count or eliminate them), this would be a good starting point:
awk '{ if (++seen[substr($0, 1, 4)] > 1) printf "Duplicates found : %s\n",$0 }' inputfile.txt
For that matter, it's a good starting point for counting or eliminating, too, it'll just take a bit more work...
If you want the count of duplicates:
awk '{a[substr($0,1,4)]++} END {for (i in a) {if(a[i]>1) print i": "a[i]}}' test.in
1435: 2
or:
{
    a[substr($0,1,4)]++               # put prefixes to array and count them
}
END {                                 # in the end
    for (i in a) {                    # go thru all indexes
        if(a[i]>1) print i": "a[i]    # and print out the duplicate prefixes and their counts
    }
}
Slightly roundabout but this should work-
cut -c 1-4 file.txt | sort -u > list
for i in `cat list`;
do
    echo -n "$i "
    grep -c ^"$i" file.txt   # This tells you how many occurrences of each 'station'
done
Then you can do whatever you want with the ones that occur more than once.
Use the following Python script (Python 3 syntax):
#!/usr/bin/env python3
file_name = "device.txt"
device = {}
with open(file_name) as f1:
    for line_count, line in enumerate(f1, start=1):
        key = line[:4]  # the first 4 characters are the device name
        if key in device:
            device[key] += "," + str(line_count)
        else:
            device[key] = str(line_count)
print(device)
The script reads each line and treats its first 4 characters as the device name. It builds a dictionary, device, whose keys are the device names and whose values are the line numbers on which each name appears.
For the example file above, the output would be:
{'1002': '1', 'POIP': '2', '1435': '3,6', '1812': '4', 'BFTO': '5'}
I hope this helps!
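The question also asked for the line numbers and the text of the duplicated lines only. A minimal Python 3 sketch of that, assuming the station name is always the first 4 characters; the function name find_duplicate_stations and the prefix_len parameter are made up for illustration:

```python
from collections import defaultdict


def find_duplicate_stations(lines, prefix_len=4):
    """Map each prefix that occurs more than once to a list of
    (line_number, line_text) pairs, in file order."""
    seen = defaultdict(list)
    for number, raw in enumerate(lines, start=1):
        text = raw.rstrip("\n")
        if text:                                  # ignore blank lines
            seen[text[:prefix_len]].append((number, text))
    # Keep only the prefixes that appear on more than one line.
    return {prefix: hits for prefix, hits in seen.items() if len(hits) > 1}


sample = [
    "1002IPU3...",
    "POIPIPU2...",
    "1435IPU1...",
    "1812IPU3...",
    "BFTOIPD3...",
    "1435IPD2...",
]
for prefix, hits in find_duplicate_stations(sample).items():
    for number, text in hits:
        print(f"{prefix}: line {number}: {text}")
```

Because the function accepts any iterable of lines, an open file object can be passed in place of the sample list.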

How to make a syntax manipulator?

Firstly, sorry for the question, I know I've heard something that could help, but I just can't remember.
Basically I would like to create my own syntax for a programming language. For example this code:
WRITE OUT 'Hello World!'
NEW LINE
would turn into this Java code:
System.out.print("Hello World!");
System.out.println();
How could I achieve this? Is there a method?
Hello.
There are established techniques and algorithms for this.
Search for "compiler techniques" and the "Interpreter pattern".
A first approach could be a basic pattern interpreter.
Assuming simple sentences and only one sentence per line, you can read the input file line by line and search for predefined patterns (regular expressions).
The patterns describe the structure of the commands in your invented language.
If you get a match, you do the translation.
In the example below I use the regex.h library in C to perform the regular expression search.
Regular expressions are of course also available in Java.
Ex. NEW LINE matches the pattern " *NEW +LINE *"
The * means that the preceding character occurs 0 or more times.
The + means that the preceding character occurs 1 or more times.
Thus, this pattern matches the command " NEW LINE " with any number of spaces around and between the words.
Ex. WRITE OUT 'Hello World!' matches the pattern "WRITE OUT '([[:print:]]*)'"
or, if you want to allow extra spaces, " *WRITE +OUT +'([[:print:]]*)' *"
[[:print:]] means: match one printable character (ex. 'a' or 'Z' or '0' or '+')
Thus, [[:print:]]* matches a sequence of 0 or more printable characters.
If a line of your input file matches the pattern of some command, you can do the translation, but in most cases you will first need to retrieve some information from it,
for example the arbitrary text after WRITE OUT. That is why you need to put parentheses around [[:print:]]*: they tell the search function that you want to retrieve that particular part of the match.
By coincidence, I recently helped a friend with a college project similar to the problem you want to solve: a translator from C to BASIC. I reused that code to make an example for you.
I tested the code and it works.
It can translate:
WRITE OUT 'some text'
WRITE OUT variable
NEW LINE
#include <stdio.h>
#include <stdlib.h>
#include <regex.h>
#include <string.h>
#define STR_SHORT 100
#define MATCHES_SIZE 10
/**************************************************************
Returns the string of a match
**************************************************************/
char * GetExp(char *Source, char *Destination, regmatch_t Matches) {
    //Source       The string that was searched
    //Destination  Will contain the matched string
    //Matches      One element of the vector passed to regexec
    int Length = Matches.rm_eo - Matches.rm_so;
    strncpy(Destination, Source+Matches.rm_so, Length);
    Destination[Length]=0;
    return Destination;
}
/**************************************************************
MAIN
**************************************************************/
int main(int argc, char *argv[]) {
    //Usage
    if (argc==1) {
        printf("Usage:\n");
        printf("interpreter source_file\n");
        printf("\n");
        printf("Implements a very basic interpreter\n");
        return 0;
    }
    //Open the source file
    FILE *SourceFile;
    if ( (SourceFile=fopen(argv[1], "r"))==NULL )
        return 1;
    //This variable is used to get the strings that matched the pattern
    //Matches[0] -> the whole matched text
    //Matches[1] -> first parenthetical
    //Matches[2] -> second parenthetical
    regmatch_t Matches[MATCHES_SIZE];
    char MatchedStr[STR_SHORT];
    //Regular expression for NEW LINE
    regex_t Regex_NewLine;
    regcomp(&Regex_NewLine, " *NEW +LINE *", REG_EXTENDED);
    //Regular expression for WRITE OUT 'some text'
    regex_t Regex_WriteOutStr;
    regcomp(&Regex_WriteOutStr, " *WRITE +OUT +'([[:print:]]*)' *", REG_EXTENDED);
    //Regular expression for WRITE OUT variable
    regex_t Regex_WriteOutVar;
    regcomp(&Regex_WriteOutVar, " *WRITE +OUT +([_[:alpha:]][[:alnum:]]*) *", REG_EXTENDED);
    //Regular expression for an empty line
    regex_t Regex_EmptyLine;
    regcomp(&Regex_EmptyLine, "^([[:space:]]+)$", REG_EXTENDED);
    //Now we read the file line by line
    char Buffer[STR_SHORT];
    while( fgets(Buffer, STR_SHORT, SourceFile)!=NULL ) {
        //printf("%s", Buffer);
        //Shortcut for an empty line
        if ( regexec(&Regex_EmptyLine, Buffer, MATCHES_SIZE, Matches, 0)==0 ) {
            printf("\n");
            continue;
        }
        //NEW LINE
        if ( regexec(&Regex_NewLine, Buffer, MATCHES_SIZE, Matches, 0)==0 ) {
            printf("System.out.println();\n");
            continue;
        }
        //WRITE OUT 'some text'
        if ( regexec(&Regex_WriteOutStr, Buffer, MATCHES_SIZE, Matches, 0)==0 ) {
            printf("System.out.print(\"%s\");\n", GetExp(Buffer, MatchedStr, Matches[1]));
            continue;
        }
        //WRITE OUT variable
        //System.out.print takes a single argument, so the variable is passed directly
        if ( regexec(&Regex_WriteOutVar, Buffer, MATCHES_SIZE, Matches, 0)==0 ) {
            printf("System.out.print(%s);\n", GetExp(Buffer, MatchedStr, Matches[1]));
            continue;
        }
        //Unknown command
        printf("Unknown command: %s", Buffer);
    }
    //Release the compiled patterns and close the file
    regfree(&Regex_NewLine);
    regfree(&Regex_WriteOutStr);
    regfree(&Regex_WriteOutVar);
    regfree(&Regex_EmptyLine);
    fclose(SourceFile);
    return 0;
}
A proper solution to this problem requires the following steps:
1. Parse the code in the original syntax and create a syntax tree. That is commonly done with tools like ANTLR.
2. Go through the syntax tree and convert it either to Java code or to a Java syntax tree.
Both of those steps have their own complexity, so it would be better to ask separate questions about specific issues you encounter while implementing them.
Strictly speaking, you can skip step 2 and generate Java directly while parsing, but unless your language is a very simple renaming of Java concepts, you won't be able to do that easily.
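To make the two steps concrete, here is a hand-written Python sketch limited to the two statements from the question. It is only an illustration under simplifying assumptions: one statement per line, no nesting, and a flat list of nodes standing in for a real syntax tree. The names WriteOut, NewLine, parse and to_java are invented for this example; a real language would use a parser generator such as ANTLR for the first step.

```python
import re
from dataclasses import dataclass


@dataclass
class WriteOut:
    """Node for: WRITE OUT 'some text'"""
    text: str


@dataclass
class NewLine:
    """Node for: NEW LINE"""


def parse(source):
    """Step 1: turn the source text into a list of statement nodes."""
    nodes = []
    for number, line in enumerate(source.splitlines(), start=1):
        line = line.strip()
        if not line:
            continue                                  # skip blank lines
        match = re.fullmatch(r"WRITE\s+OUT\s+'(.*)'", line)
        if match:
            nodes.append(WriteOut(match.group(1)))
        elif re.fullmatch(r"NEW\s+LINE", line):
            nodes.append(NewLine())
        else:
            raise SyntaxError(f"line {number}: unknown statement: {line}")
    return nodes


def to_java(nodes):
    """Step 2: walk the nodes and emit one Java statement for each."""
    out = []
    for node in nodes:
        if isinstance(node, WriteOut):
            # Escape backslashes and double quotes for a Java string literal.
            escaped = node.text.replace("\\", "\\\\").replace('"', '\\"')
            out.append(f'System.out.print("{escaped}");')
        elif isinstance(node, NewLine):
            out.append("System.out.println();")
    return "\n".join(out)


program = "WRITE OUT 'Hello World!'\nNEW LINE"
print(to_java(parse(program)))
```

Keeping parsing and code generation separate means a new target language only needs another function like to_java, while the parser stays unchanged.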

QString Remove numbers not associated with letters

I am looking for a way to remove all numbers and letters in brackets, as well as numbers not associated with a letter (i.e. I want to keep 'v2' or 'vol.2').
For instance:
"My Notes v02 003 (2009) (My sillyness)"
would become:
"My Notes v02".
I have found ways to remove the data in the braces and the braces themselves, however the issue I have now is removing the numbers not associated with a volume identifier.
Currently I have:
QString myItem = "My Notes v02 003 (2009) (My sillyness)";
myItem = myItem.remove( QRegExp( "\\[.*\\]|\\(.*\\)" ) );
Do I need to break the strings up into individual words and check manually? Or is there a better solution?
First, I recommend using the Boost library to make string manipulation easier:
http://www.boost.org/
If your QString myItem always has this structure, you can get what you want by splitting the string at every space:
#include <boost/algorithm/string.hpp>
#include <QString>
#include <iostream>
#include <string>
#include <vector>
using namespace std;
using namespace boost;
// ...
QString myItem = "My Notes v02 003 (2009) (My sillyness)";
string text = myItem.toStdString(); // convert to std::string for Boost
vector< string > newItem;
split( newItem, text, is_any_of(" "));
// The first three tokens are "My", "Notes" and "v02"
cout <<newItem.at(0) <<" "<<newItem.at(1) <<" "<<newItem.at(2) <<endl;
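Taking the first three tokens only works when the title always has exactly that shape. A more general idea is to stay with regular expressions and add a second pass that removes tokens made up only of digits. The sketch below shows the idea in Python rather than Qt, so the patterns would still need to be ported to Qt's regex classes; clean_title is a hypothetical name.

```python
import re


def clean_title(title):
    """Remove bracketed groups and standalone numbers from a title."""
    # 1. Drop anything in [...] or (...); non-greedy, so each group
    #    is removed separately.
    title = re.sub(r"\[.*?\]|\(.*?\)", "", title)
    # 2. Drop tokens that consist only of digits, i.e. digits with
    #    whitespace or the string edge on both sides. "v02" and "vol.2"
    #    survive because their digits are attached to other characters.
    title = re.sub(r"(?<!\S)\d+(?!\S)", "", title)
    # 3. Collapse the leftover runs of spaces.
    return " ".join(title.split())


print(clean_title("My Notes v02 003 (2009) (My sillyness)"))
```

The whitespace lookarounds are used instead of \b because \b would also treat the "2" in "vol.2" as a separate number.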
