How is CTRL-R (reverse-i-search) implemented in the bash terminal?

Example of the reverse search:
(reverse-i-search)`grep': git log | grep master
What is the algorithm used to find a suggestion?
Where does its search space come from?
A pointer to its source code would be greatly appreciated.

Reverse-i-search is part of the GNU Readline Library, which reads a line of input and provides editing facilities for it. The entire source code is publicly available.
Source of search space
The following snippet from the source shows how the name of the history file is determined:
/* Return the string that should be used in the place of this
   filename.  This only matters when you dont specify the
   filename to read_history (), or write_history (). */
static char *
history_filename (filename)
     const char *filename;
{
  char *return_val;
  const char *home;
  int home_len;
  return_val = filename ? savestring (filename) : (char *)NULL;
  if (return_val)
    return (return_val);
  home = sh_get_env_value ("HOME");
#if defined (_WIN32)
  if (home == 0)
    home = sh_get_env_value ("APPDATA");
#endif
  if (home == 0)
    return (NULL);
  else
    home_len = strlen (home);
  return_val = (char *)xmalloc (2 + home_len + 8); /* strlen(".history") == 8 */
  strcpy (return_val, home);
  return_val[home_len] = '/';
#if defined (__MSDOS__)
  strcpy (return_val + home_len + 1, "_history");
#else
  strcpy (return_val + home_len + 1, ".history");
#endif
  return (return_val);
}
savestring() is defined in savestring.c. It simply returns a newly allocated copy of filename when one is passed.
sh_get_env_value() is implemented with getenv() (declared in <stdlib.h>), which returns the value of an environment variable (see the getenv(3) man page).
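As shown, the search space on a Linux system is whichever history file was read. Bash passes the file named by $HISTFILE (by default ~/.bash_history) to read_history(). Only when no file name is given does Readline fall back to $HOME/.history (_history on MS-DOS). history_filename() returns NULL only if HOME (or APPDATA on Windows) is not set.
Source: histfile.c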
How history is stored
The searchable history is stored in an array of HIST_ENTRY pointers (the history list). The entries read from .bash_history are added to this array. Source: history.c
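Commands entered earlier in the current session are added to the same list. The line currently being edited is kept in _rl_saved_line_for_history, or it is copied from rl_line_buffer when that variable is not set.
These two are combined into the lines[] array of an _rl_search_cxt instance (cxt->lines[]), and the search runs over that array.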
Algorithm
The actual search is performed by the _rl_isearch_dispatch() and _rl_search_getchar() functions.
Short summary:
The algorithm reads the input one character at a time and decides what to do with each one. If the character is not a special key (such as another Ctrl-R or Ctrl-G), it is appended to the search string, and that string is searched for in the array. If the current line does not contain the string, the search moves to the next element. It skips lines identical to the previous match and lines shorter than the current search string. (See the default: case of the switch in _rl_isearch_dispatch() for the exact details.)
If the string is not found, the bell rings. Otherwise, the matching line is displayed, but the position in the history list does not actually change until the user accepts the match.
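To make the summary concrete, here is a minimal C sketch of one backward search step. It assumes the search space is a plain array of strings with the newest entry last, like cxt->lines[]. The function reverse_isearch and its parameters are hypothetical. Readline's real code in _rl_isearch_dispatch() also handles key dispatch, redisplay and moving the history position.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: scan lines[start], lines[start-1], ..., lines[0] for
   the first line containing `needle`. `prev` is the text of the previous
   match (or NULL); identical lines are skipped so that pressing Ctrl-R again
   lands on a different entry. Returns the index of the match, or -1 when
   nothing matches (where Readline would ring the bell). */
int reverse_isearch(const char *const lines[], int start,
                    const char *needle, const char *prev)
{
    size_t nlen = strlen(needle);
    for (int i = start; i >= 0; i--) {
        const char *line = lines[i];
        if (strlen(line) < nlen)
            continue;                  /* too short to contain the needle */
        if (prev != NULL && strcmp(line, prev) == 0)
            continue;                  /* same text as the last match */
        if (strstr(line, needle) != NULL)
            return i;
    }
    return -1;
}
```

Pressing Ctrl-R again corresponds to calling the function with start set just below the previous match and prev set to that match's text.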


Beyond Compare 4: How to show only lines containing a specific string?

I am trying to compare two log files containing a list of transactions. The fields of these transactions are defined within the line itself, e.g.:
transactionID: 1, transactionType: 6, transactionData: 123456
transactionID: 2, transactionType: 6, transactionData: 654321
In one log file, transactionType 6 transactions may come consecutively, while in the other file they may be interlaced with other transaction types. So while transactionID may be different, they would still be in the same order and contain the same data.
How can I filter or otherwise only show the lines in both files which contain the string "transactionType: 6"? This would filter out all the other transactions and allow me to see only the ones with Type 6.
Thank you.
What you are asking is not possible in Beyond Compare 4.1.1 (current version).
The closest you can get to what you're describing is to only display differences within text matching a specific string.
Define a regular expression grammar element that matches ".*transactionType: 6.*", following the "Define Unimportant Text in Beyond Compare" instructions.
After you've defined the grammar element, click the Rules toolbar button (referee icon). In the Importance tab, check the box next to your new grammar element and uncheck all other grammar elements to make them unimportant. This will only highlight differences in lines that match the grammar element you defined.
Here's the way I was able to accomplish the desired behavior in BC4.
BC supports running a "pre-processor" application as it opens a file for comparison.
So what I did was make a simple executable that takes 3 arguments (argv[1] to argv[3]):
1) The path to the original file
2) The path to the processed file (which will ultimately be opened for comparison by BC)
3) The path to a txt file containing line-delimited search substrings (more on this below)
Item number 3 above could contain only one entry (to use the same example as the original question) such as "transactionType: 6". The executable then searches each line of the original file (item 1 above) for the search substrings defined in item 3 above. Every time there is a hit, the whole line is copied (appended) into the output file (item 2 above).
Next, you need to define a File Format in BC (on a Mac, go to the Beyond Compare menu and click File Formats...). Select your file's extension and click the Conversion tab. Use the screenshot below as an example.
Note: %s is defined by BC to refer to the path of the original file (item 1) and %t refers to the path of the processed file (item 2).
[Screenshot: Conversion tab settings of the File Format]
So, when you open File1.txt and File2.txt for comparison, BC will invoke your executable (once for each) and then open the resulting files (item 2). This will effectively show a "filtered" version of both files showing only lines containing the search substring.
Also note that the %t argument will be some sort of temp path generated internally by BC.
Below is a quick-and-dirty implementation of the executable described above:
#include <fstream>
#include <iostream>
#include <list>
#include <string>
using namespace std;
int main(int argc, const char * argv[])
{
    //argv[1] = original file, argv[2] = processed file, argv[3] = substring list
    if (argc < 4)
    {
        cerr << "usage: " << argv[0] << " <input> <output> <substrings>" << endl;
        return 1;
    }
    ifstream inputFile (argv[1]);
    ifstream substringFile (argv[3]);
    ofstream outputFile (argv[2]);
    if (!inputFile.is_open() || !substringFile.is_open() || !outputFile.is_open())
    {
        cerr << "could not open one of the files" << endl;
        return 1;
    }
    //build the list of substrings from the substring input file
    list<string> substringList;
    string line;
    while (getline(substringFile, line))
    {
        //drop a trailing '\r' left by Windows line endings
        if (!line.empty() && line[line.size() - 1] == '\r')
            line.erase(line.size() - 1);
        //skip blank lines: an empty substring would match every line
        if (!line.empty())
            substringList.push_back(line);
    }
    //for all the lines in the file, keeping their original order
    while (getline(inputFile, line))
    {
        //copy the line once if it contains any of the substrings
        for (list<string>::const_iterator iter = substringList.begin(); iter != substringList.end(); ++iter)
        {
            if (line.find(*iter) != string::npos)
            {
                outputFile << line << "\n";
                break;
            }
        }
    }
    return 0;
}
Hope this helps!

LoadRunner concatenating correlated values

I have captured the LoadRunner correlation parameter "Date" using ORD=ALL, as shown below:
vuser_init.c(165): Notify: Saving Parameter "Date_1 = 101".
vuser_init.c(165): Notify: Saving Parameter "Date_2 = 102".
vuser_init.c(165): Notify: Saving Parameter "Date_3 = 103".
vuser_init.c(165): Notify: Saving Parameter "Date_4 = 104".
...
Now I want to substitute these values into the subsequent request in comma-separated format, as follows:
101, 102, 103, 104...
How can I achieve this?
// Declarations (put these at the top of the function)
int count, i;
char value[32];
char commaSep[1024] = "";   // must start empty because strcat appends to it
// Storing the number of matches in count: Date_count holds the number of matches
count = atoi(lr_eval_string("{Date_count}"));
// Building "{Date_1},{Date_2},..." in commaSep
for (i = 1; i <= count; i++)
{
    sprintf(value, "{Date_%d}", i);
    strcat(commaSep, value);
    if (i < count)
        strcat(commaSep, ",");   // no delimiter after the last value
}
// Printing the result in the Output console; lr_eval_string() replaces each {Date_n}
// with its value, so you can use lr_eval_string(commaSep) wherever you want in your script
lr_output_message("Comma Separated value is = %s", lr_eval_string(commaSep));
In short: a C variable, a loop up to the size of your array, and sprintf().
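Here is a plain-C sketch of that loop outside LoadRunner. It assumes the values are already in a C array; in a script they would come from lr_paramarr_idx(). The names join_values and append are hypothetical. Unlike a bare strcat or sprintf into a fixed buffer, the sketch checks the remaining space before every copy, and it inserts the delimiter only between values.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: append s to out (capacity size, *used bytes already
   written), keeping the string NUL-terminated. Returns -1 if it won't fit. */
static int append(char *out, size_t size, size_t *used, const char *s)
{
    size_t len = strlen(s);
    if (*used + len >= size)          /* keep room for the terminating NUL */
        return -1;
    memcpy(out + *used, s, len);
    *used += len;
    out[*used] = '\0';
    return 0;
}

/* Hypothetical helper: join count strings with delim between them.
   Returns 0 on success, -1 if out is too small (out stays terminated). */
int join_values(char *out, size_t size, const char *const values[],
                int count, const char *delim)
{
    size_t used = 0;
    if (size == 0)
        return -1;
    out[0] = '\0';
    for (int i = 0; i < count; i++) {
        if (i > 0 && append(out, size, &used, delim) != 0)
            return -1;
        if (append(out, size, &used, values[i]) != 0)
            return -1;
    }
    return 0;
}
```

With the values 101 to 104 and "," as the delimiter, the result is "101,102,103,104".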
There are a number of functions for manipulating LR parameters, and the ones you need are:
char * lr_paramarr_idx( const char * paramArrayName, unsigned int index);
int lr_paramarr_len( const char * paramArrayName);
I've quickly made a function you can use for this:
/**
 * ArrayToString( char *ResultParam, char *ArrayParam, char *Delimiter)
 *
 * @param ResultParam The resulting LoadRunner variable to contain result
 * @param ArrayParam  The Array param where the elements are
 * @param Delimiter   Delimiter to use between elements in list
 *
 */
int ArrayToString( char *ResultParam, char *ArrayParam, char *Delimiter)
{
    int idx,count;
    char buf[1024];
    // Create a tmp buffer with "{ResultParam}"
    sprintf(buf,"{%s}",ResultParam);
    // Get the Count of params in the array that was passed in
    count = lr_paramarr_len(ArrayParam);
    // Clear the variable
    lr_save_string("",ResultParam);
    // Nothing to concatenate
    if (count < 1)
        return 0;
    // Add 1st variable
    lr_param_sprintf(ResultParam, "%s",lr_paramarr_idx(ArrayParam, 1));
    // Loop all variables, appending them to ResultParam, starting at idx=2
    for (idx=2; idx<=count; idx++) {
        lr_param_sprintf (ResultParam, "%s%s%s",
            lr_eval_string(buf),
            Delimiter,
            lr_paramarr_idx(ArrayParam, idx)
        );
    }
    // Return the length of the Concatenated buffer. 0=Nothing in buffer
    return strlen(lr_eval_string(buf));
}
Example Usage
// Take "Date" array param, output to "DateConcat" param
ArrayToString("DateConcat", "Date", ",");
// Debug output
lr_error_message( "DateConcat='%s'", lr_eval_string("{DateConcat}") );
In your case, the resulting DateConcat string would be 101,102,103,104.
In the code posted by K.Sandell I don't see concatenation of the values of a parameter array; I see the value of ResultParam being replaced rather than concatenated. My goal was to output all the values of the array in a single line (which I collected to use in another script). The following code, using strcat, got the job done for me:
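// Declarations (stringOfValues must start empty because strcat appends to it)
char stringOfValues[4096] = "";
char *Delimiter = ", ";
int idx;
int TotalNumberOfMatches = lr_paramarr_len("arrParameter");
// Loop all variables, adding them to "stringOfValues"
for (idx=1; idx<=TotalNumberOfMatches; idx++)
{
    // Delimiter only between values, so there is none after the last one
    if (idx > 1)
        strcat(stringOfValues, Delimiter);
    strcat(stringOfValues, lr_paramarr_idx("arrParameter", idx));
}
// Final Output will print concatenated values
lr_output_message("ResultParam =%s", stringOfValues);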
Sample Output (a string of values in a line separated by the delimiter):
ResultParam =BU21, BU25, BU28, BU44, BU100
Incidentally, the normal, better way to pass parameter values would be to pass the parameter array itself to the next script or action where you want to use it (not convert it to strings). But in my case I wanted to run the above code on the LoadRunner controller, in a multithreaded manner, and collect the output for use in a report. I then used Windows findstr /s /C:"ResultParam" *.log to search through the controller's output to find these values. This approach required that all the data be on a single line in the controller log file.

How to make a syntax manipulator?

First, sorry for the question. I know I've heard of something that could help, but I just can't remember what it is.
Basically I would like to create my own syntax for a programming language. For example this code:
WRITE OUT 'Hello World!'
NEW LINE
would turn into this Java code:
System.out.print("Hello World!");
System.out.println();
How could I achieve this? Is there a method?
Hello.
There are techniques and proper algorithms to do that.
Search for "compiler techniques" and "Interpreter pattern".
An initial approach could be a basic pattern interpreter.
Assuming simple sentences and only one sentence per line, you could read the input file line by line and search for defined patterns (regular expressions).
The patterns describe the structure of the commands in your invented language.
If you get a match then you do the translation.
In this example, we use the POSIX regex.h library in C to perform the regular expression search.
Regular expressions are, of course, also available in Java.
Example: NEW LINE matches the pattern " *NEW +LINE *"
The * means that the preceding character occurs 0 or more times.
The + means that the preceding character occurs 1 or more times.
Thus, this pattern can match the command " NEW LINE " with arbitrary spaces between the words.
Example: WRITE OUT 'Hello World!' matches the pattern "WRITE OUT '([[:print:]]*)'"
or, if you want to allow extra spaces, " *WRITE +OUT +'([[:print:]]*)' *"
[[:print:]] means: match one printable character (e.g. 'a' or 'Z' or '0' or '+')
Thus, [[:print:]]* matches a sequence of 0, 1 or more printable characters.
If a line of your input file matches the pattern of some command, you can do the translation. In most cases, however, you first need to extract some information from the line,
e.g. the arbitrary text after WRITE OUT. That's why you need to put parentheses around [[:print:]]*: they tell the search function that you want to retrieve that particular part of the pattern.
A nice coincidence is that I recently helped a friend with a college project similar to the problem you want to solve: a translator from C to BASIC. I reused that code to make an example for you.
I tested the code and it works.
It can translate:
WRITE OUT 'some text'
WRITE OUT variable
NEW LINE
#include <stdio.h>
#include <stdlib.h>
#include <regex.h>
#include <string.h>
#define STR_SHORT 100
#define MATCHES_SIZE 10
/**************************************************************
 Returns the string of a match
**************************************************************/
char * GetExp(const char *Source, char *Destination, regmatch_t Matches) {
    //Source       The string that was searched
    //Destination  Will contain the matched string (needs STR_SHORT bytes)
    //Matches      One element of the vector passed to regexec
    int Length = Matches.rm_eo - Matches.rm_so;
    strncpy(Destination, Source + Matches.rm_so, Length);
    Destination[Length] = 0;
    return Destination;
}
/**************************************************************
 Compiles a pattern, stopping the program if it is invalid
**************************************************************/
void Compile(regex_t *Regex, const char *Pattern) {
    if (regcomp(Regex, Pattern, REG_EXTENDED) != 0) {
        fprintf(stderr, "Invalid pattern: %s\n", Pattern);
        exit(1);
    }
}
/**************************************************************
 MAIN
**************************************************************/
int main(int argc, char *argv[]) {
    //Usage
    if (argc == 1) {
        printf("Usage:\n");
        printf("interpreter source_file\n");
        printf("\n");
        printf("Implements a very basic interpreter\n");
        return 0;
    }
    //Open the source file
    FILE *SourceFile;
    if ( (SourceFile = fopen(argv[1], "r")) == NULL )
        return 1;
    //This variable is used to get the strings that matched the pattern
    //Matches[0] -> the text matched by the whole pattern
    //Matches[1] -> first parenthetical
    //Matches[2] -> second parenthetical
    regmatch_t Matches[MATCHES_SIZE];
    char MatchedStr[STR_SHORT];
    //The patterns are anchored with ^ and $ so that, e.g., WRITE OUT 'NEW LINE'
    //is not mistaken for NEW LINE; [[:space:]]* also absorbs the '\n' kept by fgets
    //Regular expression for NEW LINE
    regex_t Regex_NewLine;
    Compile(&Regex_NewLine, "^ *NEW +LINE[[:space:]]*$");
    //Regular expression for WRITE OUT 'some text'
    regex_t Regex_WriteOutStr;
    Compile(&Regex_WriteOutStr, "^ *WRITE +OUT +'([[:print:]]*)'[[:space:]]*$");
    //Regular expression for WRITE OUT variable
    regex_t Regex_WriteOutVar;
    Compile(&Regex_WriteOutVar, "^ *WRITE +OUT +([_[:alpha:]][_[:alnum:]]*)[[:space:]]*$");
    //Regular expression for an empty line
    regex_t Regex_EmptyLine;
    Compile(&Regex_EmptyLine, "^([[:space:]]+)$");
    //Now we read the file line by line
    char Buffer[STR_SHORT];
    while ( fgets(Buffer, STR_SHORT, SourceFile) != NULL ) {
        //Shortcut for an empty line
        if ( regexec(&Regex_EmptyLine, Buffer, MATCHES_SIZE, Matches, 0) == 0 ) {
            printf("\n");
            continue;
        }
        //NEW LINE
        if ( regexec(&Regex_NewLine, Buffer, MATCHES_SIZE, Matches, 0) == 0 ) {
            printf("System.out.println();\n");
            continue;
        }
        //WRITE OUT 'some text' (quotes and backslashes in the text are not escaped)
        if ( regexec(&Regex_WriteOutStr, Buffer, MATCHES_SIZE, Matches, 0) == 0 ) {
            printf("System.out.print(\"%s\");\n", GetExp(Buffer, MatchedStr, Matches[1]));
            continue;
        }
        //WRITE OUT variable (System.out.print accepts a variable of any type)
        if ( regexec(&Regex_WriteOutVar, Buffer, MATCHES_SIZE, Matches, 0) == 0 ) {
            printf("System.out.print(%s);\n", GetExp(Buffer, MatchedStr, Matches[1]));
            continue;
        }
        //Unknown command
        printf("Unknown command: %s", Buffer);
    }
    //Release resources
    fclose(SourceFile);
    regfree(&Regex_NewLine);
    regfree(&Regex_WriteOutStr);
    regfree(&Regex_WriteOutVar);
    regfree(&Regex_EmptyLine);
    return 0;
}
A proper solution to this question requires the following steps:
1) Parse the original syntax code and create a syntax tree. That is commonly done with tools like ANTLR.
2) Walk the syntax tree and either convert it to Java code or to a Java syntax tree.
Both of those steps have their own complexity, so it would be better to ask separate questions about specific issues you encounter while implementing them.
Strictly speaking, you can skip step 2 and generate Java directly while parsing. However, unless your language is a very simple renaming of Java concepts, you won't be able to do that easily.
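Here is a toy C sketch of the two steps for the two commands in the question. It assumes one statement per line, with exact spelling and single spaces. The names parse_line, emit_java and struct node are hypothetical. With only two flat statement kinds, the "tree" degenerates to one node per line. A real language would use a grammar and a parser generator such as ANTLR.

```c
#include <stdio.h>
#include <string.h>

/* Output of step 1: one node per statement. */
enum kind { WRITE_OUT, NEW_LINE };
struct node {
    enum kind kind;
    char text[64];                 /* the literal for WRITE_OUT */
};

/* Step 1: parse one line (without its newline) into a node. 0 on success. */
int parse_line(const char *line, struct node *out)
{
    static const char prefix[] = "WRITE OUT '";
    size_t plen = sizeof prefix - 1;
    if (strcmp(line, "NEW LINE") == 0) {
        out->kind = NEW_LINE;
        out->text[0] = '\0';
        return 0;
    }
    if (strncmp(line, prefix, plen) == 0) {
        const char *rest = line + plen;
        const char *close = strrchr(rest, '\'');
        if (close == NULL || close[1] != '\0')
            return -1;             /* missing closing quote or trailing text */
        size_t len = (size_t)(close - rest);
        if (len >= sizeof out->text)
            return -1;             /* literal too long for this sketch */
        out->kind = WRITE_OUT;
        memcpy(out->text, rest, len);
        out->text[len] = '\0';
        return 0;
    }
    return -1;                     /* unknown statement */
}

/* Step 2: generate Java for one node into buf; returns snprintf's result. */
int emit_java(const struct node *nd, char *buf, size_t size)
{
    if (nd->kind == NEW_LINE)
        return snprintf(buf, size, "System.out.println();");
    /* quotes and backslashes in the literal are not escaped in this sketch */
    return snprintf(buf, size, "System.out.print(\"%s\");", nd->text);
}
```

A driver would read each line, call parse_line(), and then call emit_java() on the resulting nodes. Because the two steps are kept separate, the generator can change without any change to the parser.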

Trimming a String

I'm using Windows 7 and Visual C++. I have a console program, and I am trying to trim a string at the beginning and the end. TrimLeft() and TrimRight() don't seem to work without MFC. Here is what I have so far:
pBrowser->get_LocationURL(&bstr);
wprintf(L" URL: %s\n\n", bstr);
SysFreeString(bstr);
std::wstring s;
s = bstr;
s.TrimStart("http://");
s.TrimEnd("/*");
wprintf(L" URL: %s\n\n", s);
I'm trying to go from this:
"http://www.stackoverflow.com/questions/ask"
to this:
"www.stackoverflow.com"
TrimStart/End usually return a value, so you would have to set 's' to equal the value of s.TrimStart() and s.TrimEnd() respectively.
Try:
s = s.TrimStart("http://");
s = s.TrimEnd("/*");
You should use find/rfind (rfind searches from the right) and substr (substring), in sequence, to do what you need:
1) Find the index of the starting pattern (such as http://) with find. You already know the pattern's length, so add it to that index to get the origin of your trimmed string.
2) Find the index of the ending pattern with find, starting the search at the origin (here, the first / after the host). Use rfind instead if the ending pattern should be searched for from the right.
3) Create the substring from the origin up to that index using substr.
These methods all exist on std::string (and std::wstring).
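Here are the same three steps sketched in C. strstr and strchr stand in for find, and a bounded copy stands in for substr. The function host_from_url is hypothetical. For wide strings such as a BSTR, wcsstr and wcschr from <wchar.h> work the same way.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy the part of url between the first "://" and the
   next '/' into out (capacity size). Returns 0 on success, -1 if too long. */
int host_from_url(const char *url, char *out, size_t size)
{
    const char *start = strstr(url, "://");      /* step 1: find the prefix */
    start = (start != NULL) ? start + 3 : url;     /* skip its known length */
    const char *end = strchr(start, '/');          /* step 2: find the end */
    size_t len = (end != NULL) ? (size_t)(end - start) : strlen(start);
    if (len >= size)
        return -1;
    memcpy(out, start, len);                       /* step 3: the "substring" */
    out[len] = '\0';
    return 0;
}
```

For "http://www.stackoverflow.com/questions/ask", this yields "www.stackoverflow.com".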

Best word wrap algorithm? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 3 years ago.
Word wrap is one of the must-have features in a modern text editor.
How should word wrap be handled? What is the best algorithm for word wrap?
If text is several million lines, how can I make word-wrap very fast?
Why do I need the solution? Because my project must draw text at various zoom levels while still looking good.
The target environment is Windows Mobile devices: at most a 600 MHz CPU and very little memory.
How should I handle line information? Let's assume original data has three lines.
THIS IS LINE 1.
THIS IS LINE 2.
THIS IS LINE 3.
Afterwards, the break text will be shown like this:
THIS IS
LINE 1.
THIS IS
LINE 2.
THIS IS
LINE 3.
Should I allocate three more lines? Or are there other suggestions?
Here is a word-wrap algorithm I've written in C#. It should be fairly easy to translate into other languages (except perhaps for IndexOfAny).
using System;
using System.Collections.Generic;
using System.Text;
public static class TextWrapper
{
    static char[] splitChars = new char[] { ' ', '-', '\t' };
    public static string WordWrap(string str, int width)
    {
        string[] words = Explode(str, splitChars);
        int curLineLength = 0;
        StringBuilder strBuilder = new StringBuilder();
        for (int i = 0; i < words.Length; i += 1)
        {
            string word = words[i];
            // If adding the new word to the current line would be too long,
            // then put it on a new line (and split it up if it's too long).
            if (curLineLength + word.Length > width)
            {
                // Only move down to a new line if we have text on the current line.
                // Avoids situation where wrapped whitespace causes empty lines in text.
                if (curLineLength > 0)
                {
                    strBuilder.Append(Environment.NewLine);
                    curLineLength = 0;
                }
                // If the current word is too long to fit on a line even on its own,
                // then split the word up (width must be at least 2 for this to terminate).
                while (word.Length > width)
                {
                    strBuilder.Append(word.Substring(0, width - 1) + "-");
                    word = word.Substring(width - 1);
                    strBuilder.Append(Environment.NewLine);
                }
                // Remove leading whitespace from the word so the new line starts flush to the left.
                word = word.TrimStart();
            }
            strBuilder.Append(word);
            curLineLength += word.Length;
        }
        return strBuilder.ToString();
    }
    private static string[] Explode(string str, char[] splitChars)
    {
        List<string> parts = new List<string>();
        int startIndex = 0;
        while (true)
        {
            int index = str.IndexOfAny(splitChars, startIndex);
            if (index == -1)
            {
                parts.Add(str.Substring(startIndex));
                return parts.ToArray();
            }
            string word = str.Substring(startIndex, index - startIndex);
            char nextChar = str[index];
            // Dashes and the like should stick to the word occurring before them. Whitespace doesn't have to.
            if (char.IsWhiteSpace(nextChar))
            {
                parts.Add(word);
                parts.Add(nextChar.ToString());
            }
            else
            {
                parts.Add(word + nextChar);
            }
            startIndex = index + 1;
        }
    }
}
It's fairly primitive: it splits on spaces, tabs and dashes. It does make sure that dashes stick to the word before them (so you don't end up with stack\n-overflow), though it doesn't favour moving small hyphenated words to a new line rather than splitting them. It does split up words that are too long for a line.
It's also fairly culturally specific, as I don't know much about the word-wrapping rules of other cultures.
Donald E. Knuth did a lot of work on the line breaking algorithm in his TeX typesetting system. This is arguably one of the best algorithms for line breaking - "best" in terms of visual appearance of result.
His algorithm avoids the problems of greedy line filling where you can end up with a very dense line followed by a very loose line.
An efficient algorithm can be implemented using dynamic programming.
A paper on TeX's line breaking.
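Here is a compact C sketch of the dynamic-programming idea, much simpler than TeX's model. Words are measured in characters and separated by single spaces. The cost of a line is the square of its trailing space, and the last line is free. The function min_raggedness and the MAX_WORDS bound are hypothetical.

```c
#include <limits.h>

#define MAX_WORDS 256   /* assumed upper bound for this sketch */

/* Hypothetical helper: len[0..n-1] are word lengths, w is the line width.
   Chooses breaks that minimise the sum of squared trailing spaces over all
   lines except the last. On return, brk[i] is one past the last word of the
   line starting at word i. Returns the minimal cost, or -1 if n exceeds
   MAX_WORDS or some word is wider than w. */
long min_raggedness(const int len[], int n, int w, int brk[])
{
    long best[MAX_WORDS + 1];          /* best[i]: cost of laying out words i..n-1 */
    if (n > MAX_WORDS)
        return -1;
    best[n] = 0;
    for (int i = n - 1; i >= 0; i--) {
        best[i] = LONG_MAX;
        int width = -1;                /* cancels the space counted before word i */
        for (int j = i; j < n; j++) {
            width += len[j] + 1;       /* words i..j separated by single spaces */
            if (width > w)
                break;
            long slack = w - width;
            long cost = (j == n - 1) ? 0 : slack * slack;  /* last line is free */
            if (best[j + 1] != LONG_MAX && cost + best[j + 1] < best[i]) {
                best[i] = cost + best[j + 1];
                brk[i] = j + 1;
            }
        }
    }
    return best[0] == LONG_MAX ? -1 : best[0];
}
```

Take the word lengths {3, 2, 2, 5} ("aaa bb cc ddddd") with a width of 6. Greedy filling gives "aaa bb / cc / ddddd" with cost 16. The sketch picks "aaa / bb cc / ddddd" with cost 10, which avoids the dense-then-loose pattern described above.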
I had occasion to write a word wrap function recently, and I want to share what I came up with.
I used a TDD approach almost as strict as the one from the Go example. I started with the test that wrapping the string "Hello, world!" at 80 width should return "Hello, world!". Clearly, the simplest thing that works is to return the input string untouched. Starting from that, I wrote more and more complex tests and ended up with a recursive solution that (at least for my purposes) handles the task quite efficiently.
Pseudocode for the recursive solution:
Function WordWrap (inputString, width)
    Trim the input string of leading and trailing spaces.
    If the trimmed string's length is <= the width,
        Return the trimmed string.
    Else,
        Find the index of the last space in the trimmed string, searching backwards from width.
        If there are no spaces, use the width as the index.
        Split the trimmed string into two pieces at the index.
        Trim trailing spaces from the portion before the index,
            and leading spaces from the portion after the index.
        Concatenate and return:
            the trimmed portion before the index,
            a line break,
            and the result of calling WordWrap on the trimmed portion after
                the index (with the same width as the original call).
This only wraps at spaces, and if you want to wrap a string that already contains line breaks, you need to split it at the line breaks, send each piece to this function and then reassemble the string. Even so, in VB.NET running on a fast machine, this can handle about 20 MB/second.
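Here is a C rendering of the pseudocode above that writes into a caller-supplied buffer. The names word_wrap, wrap_rec and put are hypothetical, and width must be at least 1. The recursion depth equals the number of output lines, so a loop would be preferable for very long text.

```c
#include <stddef.h>
#include <string.h>

/* Append s[0..len) to out at *pos if it fits (out has capacity size). */
static int put(char *out, size_t size, size_t *pos, const char *s, size_t len)
{
    if (*pos + len >= size)
        return -1;
    memcpy(out + *pos, s, len);
    *pos += len;
    out[*pos] = '\0';
    return 0;
}

/* The recursive step from the pseudocode, on the n characters at s. */
static int wrap_rec(const char *s, size_t n, size_t width,
                    char *out, size_t size, size_t *pos)
{
    while (n > 0 && *s == ' ') { s++; n--; }       /* trim leading spaces */
    while (n > 0 && s[n - 1] == ' ') n--;          /* trim trailing spaces */
    if (n <= width)
        return put(out, size, pos, s, n);
    size_t idx = width;                            /* last space at or before width */
    while (idx > 0 && s[idx] != ' ')
        idx--;
    if (idx == 0)                                  /* no space: split at width */
        idx = width;
    size_t head = idx;
    while (head > 0 && s[head - 1] == ' ') head--; /* trim end of first piece */
    if (put(out, size, pos, s, head) != 0 || put(out, size, pos, "\n", 1) != 0)
        return -1;
    return wrap_rec(s + idx, n - idx, width, out, size, pos);
}

/* Returns 0 on success, -1 if out is too small or width is 0. */
int word_wrap(const char *s, size_t width, char *out, size_t size)
{
    size_t pos = 0;
    if (size == 0 || width == 0)
        return -1;
    out[0] = '\0';
    return wrap_rec(s, strlen(s), width, out, size, &pos);
}
```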
I don't know of any specific algorithms, but the following could be a rough outline of how it should work:
1) For the current text size, font, display size, window size, margins, etc., determine how many characters can fit on a line (for a fixed-width font) or how many pixels can fit on a line (for a proportional font).
2) Go through the line character by character, calculating how many characters or pixels have been used since the beginning of the line.
3) When you go over the maximum characters/pixels for the line, move back to the last space or punctuation mark, and move all the text after it to the next line.
4) Repeat until you have gone through all the text in the document.
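Here is a C sketch of these steps that records where each wrapped line starts instead of copying text. It assumes a caller-supplied measuring function, so the same loop serves character counts (fixed-width font) or pixel widths (proportional font). The names measure_fn, char_count and greedy_breaks are hypothetical; a real renderer would measure with its font API. The sketch re-measures the line prefix on every character, which is quadratic, so an incremental measure is better for long lines.

```c
#include <stddef.h>
#include <string.h>

/* Width of s[0..len) in the caller's unit (characters, pixels, ...). */
typedef int (*measure_fn)(const char *s, size_t len);

/* Example measure for a fixed-width font: one unit per character. */
int char_count(const char *s, size_t len) { (void)s; return (int)len; }

/* Greedy wrap: remember the last space on the current line; when the line
   grows wider than max_width, start the next line just after that space
   (or at the current character if the line has no space). Stores the start
   offset of each line in starts[] (at most cap entries), returns the count. */
size_t greedy_breaks(const char *text, int max_width, measure_fn measure,
                     size_t starts[], size_t cap)
{
    size_t n = strlen(text), line = 0, last_space = 0, count = 0;
    int have_space = 0;
    if (cap > 0)
        starts[count++] = 0;
    for (size_t i = 0; i < n; i++) {
        if (text[i] == ' ') { last_space = i; have_space = 1; }
        if (i > line && measure(text + line, i - line + 1) > max_width) {
            line = have_space ? last_space + 1 : i;   /* move back, then wrap */
            have_space = 0;
            if (count < cap)
                starts[count++] = line;
        }
    }
    return count;
}
```

With char_count and a width of 8, "THIS IS LINE 1." yields the starts {0, 8}, i.e. "THIS IS " and "LINE 1.", which matches the layout in the question.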
In .NET, word wrapping is built into controls like TextBox. Similar built-in functionality likely exists in other UI frameworks as well.
With or without hyphenation?
Without hyphenation, it's easy. Just wrap your text in one word object per word and give each object a getWidth() method. Then, starting at the first word, add up the row length until it is greater than the available space. When it is, move the last word to the next row and start counting again from that word, and so on.
With hyphenation, you need hyphenation rules in a common format, such as hy-phen-a-tion.
Then it's the same as above, except that you need to split the last word, the one that caused the overflow.
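Here is a small C sketch of that last step. It assumes the hyphenation points are already marked with '-' as in hy-phen-a-tion; producing those marks (for example from a dictionary or pattern file) is outside this sketch. The function split_hyphenated is hypothetical. Widths are counted in characters, and the hyphen added at the break takes one column.

```c
#include <stddef.h>

/* Given a word marked like "hy-phen-a-tion" and the columns left on the
   current line, return how many letters can stay on this line (followed by a
   hyphen), using the latest allowed break point that fits. Returns 0 if no
   break point fits, i.e. the whole word must move to the next line. */
size_t split_hyphenated(const char *marked, size_t room)
{
    size_t letters = 0, best = 0;
    for (const char *p = marked; *p != '\0'; p++) {
        if (*p == '-') {
            if (letters + 1 <= room)   /* letters so far plus the hyphen */
                best = letters;
            else
                break;                 /* later break points are wider still */
        } else {
            letters++;
        }
    }
    return best;
}
```

With 7 columns left, "hy-phen-a-tion" yields 6: "hyphen-" stays on the current line, and "ation" moves to the next one.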
A good example and tutorial of how to structure your code for an excellent text editor is given in the Gang of Four Design Patterns book. It is one of the main examples the book uses to demonstrate the patterns.
I wondered about the same thing for my own editor project. My solution was a two-step process:
1) Find the line ends and store them in an array.
2) For very long lines, find suitable break points at roughly 1K intervals and save them in the line array, too. This is to catch the "4 MB of text without a single line break" case.
When you need to display the text, find the lines in question and wrap them on the fly. Remember this information in a cache for quick redraw. When the user scrolls a whole page, flush the cache and repeat.
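Here is a C sketch of step 1, assuming the whole text is in memory. It records the offset where each line starts. Inside very long lines, it adds an extra entry roughly every CHUNK bytes, preferring a position just after a space. CHUNK and line_index are hypothetical names, and the cached on-the-fly wrapping is not shown.

```c
#include <stddef.h>

#define CHUNK 1024   /* assumed "roughly 1K" interval for very long lines */

/* Fill starts[] (capacity cap) with offsets where a physical line or a
   forced chunk begins. Returns the number of entries found, which may exceed
   cap (only the first cap entries are stored). */
size_t line_index(const char *text, size_t len, size_t starts[], size_t cap)
{
    size_t count = 1, line_start = 0, last_space = 0;
    int have_space = 0;
    if (cap > 0)
        starts[0] = 0;
    for (size_t i = 0; i < len; i++) {
        size_t next = 0;                       /* 0 = no new entry here */
        if (text[i] == '\n') {
            next = i + 1;                      /* ordinary line end */
        } else {
            if (text[i] == ' ') {
                last_space = i;
                have_space = 1;
            }
            if (i + 1 - line_start >= CHUNK)   /* very long line: force a break */
                next = have_space ? last_space + 1 : i + 1;
        }
        if (next != 0 && next < len) {
            if (count < cap)
                starts[count] = next;
            count++;
            line_start = next;
            have_space = 0;
        }
    }
    return count;
}
```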
If you can, do the loading and analysis of the whole text in a background thread. This way, you can already display the first page of text while the rest of the document is still being examined. The simplest solution here is to cut the first 16 KB of text away and run the algorithm on that substring. This is very fast and allows you to render the first page instantly, even if your editor is still loading the text.
You can use a similar approach when the cursor is initially at the end of the text; just read the last 16 KB of text and analyze that. In this case, use two edit buffers and load all but the last 16 KB into the first while the user is locked into the second buffer. And you'll probably want to remember how many lines the text has when you close the editor, so the scroll bar doesn't look weird.
It gets hairy when the user can start the editor with the cursor somewhere in the middle, but ultimately it's only an extension of the end-of-text problem. You just need to remember the byte position, the current line number, and the total number of lines from the last session. You also need either three edit buffers or an edit buffer where you can cut away 16 KB in the middle.
Alternatively, lock the scrollbar and other interface elements while the text is loading; that allows the user to look at the text while it loads completely.
I can't vouch for this being bug-free, but I needed a function that word wraps and respects indentation. I claim nothing about this code other than that it has worked for me so far. It is an extension method that modifies the StringBuilder in place, but it could be adapted to whatever inputs and outputs you need.
using System;
using System.Linq;
using System.Text;
public static class StringBuilderExtensions
{
    public static void WordWrap(this StringBuilder sb, int tabSize, int width)
    {
        string[] lines = sb.ToString().Replace("\r\n", "\n").Split('\n');
        sb.Clear();
        for (int i = 0; i < lines.Length; ++i)
        {
            var line = lines[i];
            if (line.Length < 1)
                sb.AppendLine(); //empty lines
            else
            {
                int indent = line.TakeWhile(c => c == '\t').Count(); //tab indents
                line = line.Replace("\t", new String(' ', tabSize)); //need to expand tabs here
                string lead = new String(' ', indent * tabSize); //create the leading space (assumes width > lead.Length)
                do
                {
                    //get the string that fits in the window
                    string subline = line.Substring(0, Math.Min(line.Length, width));
                    if (subline.Length < line.Length && subline.Length > 0)
                    {
                        //find the last space, unless the subline already ends in one
                        int lastword = subline.LastOrDefault() == ' ' ? -1 : subline.LastIndexOf(' ', subline.Length - 1);
                        //only break after the indentation; a break inside the lead would
                        //never shorten the line, so hard-split at width instead
                        if (lastword > lead.Length)
                            subline = subline.Substring(0, lastword);
                        sb.AppendLine(subline);
                        //next part
                        line = lead + line.Substring(subline.Length).TrimStart();
                    }
                    else
                    {
                        sb.AppendLine(subline); //everything fits
                        break;
                    }
                }
                while (true);
            }
        }
    }
}
Here is mine that I was working on today for fun in C:
Here are my considerations:
No copying of characters, just printing to standard output. Since I don't like modifying the argv[x] arguments, and because I like a challenge, I wanted to do it without modifying them. So I did not go for the idea of inserting '\n'.
I don't want
This line breaks here
to become
This line breaks
here
so changing characters to '\n' is not an option given this objective.
If the line width is set at, say, 80, and the 80th character is in the middle of a word, the entire word must be put on the next line. So as you scan, you have to remember the position of the end of the last word that didn't go over 80 characters.
So here is mine. It's not clean; I've been racking my brain for the past hour trying to get it to work, adding something here and there. It works for all the edge cases that I know of.
#include <stdlib.h>
#include <string.h>
#include <stdio.h>
int isDelim(char c){
    switch(c){
    case '\0':
    case '\t':
    case ' ' :
        return 1;
        break; /* As a matter of style, put the 'break' anyway even if there is a return above it.*/
    default:
        return 0;
    }
}
void printLine(const char * start, const char * end){
    const char * p = start;
    while ( p <= end )
        putchar(*p++);
    putchar('\n');
}
int main ( int argc , char ** argv ) {
    if( argc <= 2 )
        exit(1);
    char * start = argv[1];
    char * lastChar = argv[1];
    char * current = argv[1];
    int wrapLength = atoi(argv[2]);
    int chars = 1;
    while( *current != '\0' ){
        while( chars <= wrapLength ){
            while ( !isDelim( *current ) ) ++current, ++chars;
            if( chars <= wrapLength){
                if(*current == '\0'){
                    puts(start);
                    return 0;
                }
                lastChar = current-1;
                current++,chars++;
            }
        }
        if( lastChar == start )
            lastChar = current-1;
        printLine(start,lastChar);
        current = lastChar + 1;
        while(isDelim(*current)){
            if( *current == '\0')
                return 0;
            else
                ++current;
        }
        start = current;
        lastChar = current;
        chars = 1;
    }
    return 0;
}
So basically, I have start and lastChar that I want to set as the start of a line and the last character of a line. When those are set, I output to standard output all the characters from start to end, then output a '\n', and move on to the next line.
Initially everything points to the start, then I skip words with the while(!isDelim(*current)) ++current,++chars;. As I do that, I remember the last character that was before 80 chars (lastChar).
If, at the end of a word, I have passed my number of chars (80), then I get out of the while(chars <= wrapLength) block. I output all the characters between start and lastChar and a newline.
Then I set current to lastChar+1 and skip delimiters (and if that leads me to the end of the string, we're done, return 0). Set start, lastChar and current to the start of the next line.
The
if(*current == '\0'){
puts(start);
return 0;
}
part is for strings that are too short to be wrapped even once. I added this just before writing this post because I tried a short string and it didn't work.
I feel like this might be doable in a more elegant way. If anyone has anything to suggest I'd love to try it.
And as I wrote this, I asked myself: "What's going to happen if I have a string that is one word longer than my wrapLength?" Well, it didn't work. So I added the
if( lastChar == start )
lastChar = current-1;
before the printLine() statement (if lastChar hasn't moved, then we have a word that is too long for a single line so we just have to put the whole thing on the line anyway).
I took the comments out of the code since I'm writing this but I really feel that there must be a better way of doing this than what I have that wouldn't need comments.
So that's the story of how I wrote this thing. I hope it can be of use to people and I also hope that someone will be unsatisfied with my code and propose a more elegant way of doing it.
It should be noted that it works for all edge cases: words too long for a line, strings that are shorter than one wrapLength, and empty strings.
I may as well chime in with a Perl solution I wrote, because GNU fold -s was leaving trailing spaces and showing other bad behavior. This solution does not (properly) handle text containing tabs, backspaces, embedded carriage returns or the like, although it does handle CRLF line endings, converting them all to just LF. It makes minimal changes to the text. In particular, it never splits a word (it doesn't change wc -w), and for text with no more than one space in a row (and no CR) it doesn't change wc -c, because it replaces a space with LF rather than inserting an LF.
#!/usr/bin/perl
use strict;
use warnings;
my $WIDTH = 80;
if (@ARGV && $ARGV[0] =~ /^[1-9][0-9]*$/) {
    $WIDTH = $ARGV[0];
    shift @ARGV;
}
while (<>) {
    s/\r\n$/\n/;
    chomp;
    if (length $_ <= $WIDTH) {
        print "$_\n";
        next;
    }
    @_ = split /(\s+)/;
    # make @_ start with a separator field and end with a content field
    unshift @_, "";
    push @_, "" if @_ % 2;
    my ($sep,$cont) = splice(@_, 0, 2);
    do {
        if (length $cont > $WIDTH) {
            print "$cont";
            ($sep,$cont) = splice(@_, 0, 2);
        }
        elsif (length($sep) + length($cont) > $WIDTH) {
            printf "%*s%s", $WIDTH - length $cont, "", $cont;
            ($sep,$cont) = splice(@_, 0, 2);
        }
        else {
            my $remain = $WIDTH;
            { do {
                print "$sep$cont";
                $remain -= length $sep;
                $remain -= length $cont;
                ($sep,$cont) = splice(@_, 0, 2) or last;
              }
              while (length($sep) + length($cont) <= $remain);
            }
        }
        print "\n";
        $sep = "";
    }
    while ($cont);
}
@ICR, thanks for sharing the C# example.
I did not manage to get it working for me, but I came up with another solution. If there is any interest in it, please feel free to use it:
WordWrap function in C#. The source is available on GitHub.
I've included unit tests / samples.
