error not all control paths return a value - controls

My compiler says that not all control paths return a value and it points to a overloaded>> variable function. I am not exactly sure what is causing the problem and any help would be appreciated. I am trying to overload the stream extraction operator to define if input is valid. if it is not it should set a failbit to indicate improper input.
std::istream &operator >> (std::istream &input, ComplexClass &c)//overloading extraction operator
{
int number;
int multiplier;
char temp;//temp variable used to store input
input >> number;//gets real number
// test if character is a space
if (input.peek() == ' ' /* Write a call to the peek member function to
test if the next character is a space ' ' */) // case a + bi
{
c.real = number;
input >> temp;
multiplier = (temp == '+') ? 1 : -1;
}
// set failbit if character not a space
if (input.peek() != ' ')
{
/* Write a call to the clear member function with
ios::failbit as the argument to set input's fail bit */
input.clear(ios::failbit);
}
else
// set imaginary part if data is valid
if (input.peek() == ' ')
{
input >> c.imaginary;
c.imaginary *= multiplier;
input >> temp;
}
if (input.peek() == '\n' /* Write a call to member function peek to test if the next
character is a newline \n */) // character not a newline
{
input.clear(ios::failbit); // set bad bit
} // end if
else
{
input.clear(ios::failbit); // set bad bit
} // end else
// end if
if (input.peek() == 'i' /* Write a call to member function peek to test if
the next character is 'i' */) // test for i of imaginary number
{
input >> temp;
// test for newline character entered
if (input.peek() == '\n')
{
c.real = 0;
c.imaginary = number;
} // end if
else if (input.peek() == '\n') // set real number if it is valid
{
c.real = number;
c.imaginary = 0;
} // end else if
else
{
input.clear(ios::failbit); // set bad bit
}
return input;
}
}

The return statement is only executed inside the last if:
if (input.peek() == 'i' ) {
...
return input;
}
It should be changed to:
if (input.peek() == 'i' ) {
...
}
return input;

Related

Filtering return on serial port

I have a CO2 sensor on my Arduino Mega and sometimes randomly when I'm reading the CO2 measurement, the sensor will return a "?". The question mark causes my program to crash and return "input string was not in a correct format".
I haven't tried anything because I don't know what approach would be the best for this. The CO2 sensor returns the measurement in the form of "Z 00000" but when this question mark appears it shows that all that returned was a "\n". Currently, I have the program just reading the 5 digits after the Z.
if (returnString != "")
{
val = Convert.ToDouble(returnString.Substring(returnString.LastIndexOf('Z')+ 1));
}
What I expect to return is the digits after Z which works but every so often I will get a random line return which crashes everything.
According to the C# documentation the ToDouble method throws FormatException whenever the input string is invalid. You should catch the exception to avoid further issues.
try {
val = Convert.ToDouble(returnString.Substring(returnString.LastIndexOf('Z')+ 1));
}
catch(FormatException e) {
//If you want to do anything in case of an error
//Otherwise you can leave it blank
}
Also I'd recommend using some sort of statemachine for parsing the data in your case, that could discard all invalid characters. Something like this:
bool z_received = false;
int digits = 0;
int value = 0;
//Called whenever you receive a byte from the serial port
void onCharacter(char input) {
if(input == 'Z') {
z_received = true;
}
else if(z_received && input <= '9' && input >= '0') {
value *= 10;
value += (input - '0');
digits++;
if(digits == 5) {
onData(value);
value = 0;
z_received = false;
digits = 0;
}
}
else {
value = 0;
z_received = false;
digits = 0;
}
}
void onData(int data) {
//do something with the data
}
This is just a mock-up, should work in your case if you can direct the COM port's byte stream into the onCharacter function.

split text file according to brackets or parantheses (top-level only) in terminal

I have several text files (utf-8) that I want to process in shell script. They aren't excactly the same format, but if I could only break them up into edible chunks I can handle that.
This could be programmed in C or python, but I prefer not.
EDIT: I wrote a solution in C; see my own answer. I think this may be the simplest approach after all. If you think I'm wrong please test your solution against the more complicated example input from my answer below.
-- jcxz100
For clarity (and to be able to debug more easily) I want the chunks to be saved as separate text files in a sub-folder.
All types of input files consist of:
junk lines
lines with junk text followed by start brackets or parentheses - i.e. '[' '{' '<' or '(' - and possibly followed by payload
payload lines
lines with brackets or parentheses nested within the top-level pairs; treated as payload too
payload lines with end brackets or parantheses - i.e. ']' '}' '>' or ')' - possibly followed by something (junk text and/or start of a new payload)
I want to break up the input according to only the matching pairs of top-level brackets/parantheses.
Payload inside these pairs must not be altered (including newlines and whitespace).
Everything outside the toplevel pairs should be discarded as junk.
Any junk or payload inside double-quotes must be considered atomic (handled as raw text, thus any brackets or parentheses inside should also be treated as text).
Here is an example (using only {} pairs):
junk text
"atomic junk"
some junk text followed by a start bracket { here is the actual payload
more payload
"atomic payload"
nested start bracket { - all of this line is untouchable payload too
here is more payload
"yet more atomic payload; this one's got a smiley ;-)"
end of nested bracket pair } - all of this line is untouchable payload too
this is payload too
} trailing junk
intermittent junk
{
payload that goes in second output file }
end junk
...sorry: Some of the input files really are as messy as that.
The first output file should be:
{ here is the actual payload
more payload
"atomic payload"
nested start bracket { - all of this line is untouchable payload too
here is more payload
"yet more atomic payload; this one's got a smiley ;-)"
end of nested bracket pair } - all of this line is untouchable payload too
this is payload too
}
... and the second output file:
{
payload that goes in second output file }
Note:
I haven't quite decided whether it's necesary to keep the pair of start/end characters in the output or if they themselves should be discarded as junk.
I think a solution that keeps them in is more general use.
There can be a mix of types of top-level bracket/paranthesis pairs in the same input file.
Beware: There are * and $ characters in the input files, so please avoid confusing bash ;-)
I prefer readability over brevity; but not at an exponential cost of speed.
Nice-to-haves:
There are backslash-escaped double-quotes inside the text; preferably they should be handled
(I have a hack, but it's not pretty).
The script oughtn't break over mismatched pairs of brackets/parentheses in junk and/or payload (note: inside the atomics they must be allowed!)
More-far-out-nice-to-haves:
I haven't seen it yet, but one could speculate that some input might have single-quotes rather than double-quotes to denote atomic content... or even a mix of both.
It would be nice if the script could be easily modified to parse input of similar structure but with different start/end characters or strings.
I can see this is quite a mouthful, but I think it wouldn't give a robust solution if I broke it down into simpler questions.
The main problem is splitting up the input correctly - everything else can be ignored or "solved" with hacks, so
feel free to ignore the nice-to-haves and the more-far-out-nice-to-haves.
Given:
$ cat file
junk text
"atomic junk"
some junk text followed by a start bracket { here is the actual payload
more payload
"atomic payload"
nested start bracket { - all of this line is untouchable payload too
here is more payload
"yet more atomic payload; this one's got a smiley ;-)"
end of nested bracket pair } - all of this line is untouchable payload too
this is payload too
} trailing junk
intermittent junk
{
payload that goes in second output file }
end junk
This perl file will extract the blocks you describe into files block_1, block_2, etc:
#!/usr/bin/perl
use v5.10;
use warnings;
use strict;
use Text::Balanced qw(extract_multiple extract_bracketed);
my $txt;
while (<>){$txt.=$_;} # slurp the file
my #blocks = extract_multiple(
$txt,
[
# Extract {...}
sub { extract_bracketed($_[0], '{}') },
],
# Return all the fields
undef,
# Throw out anything which does not match
1
);
chdir "/tmp";
my $base="block_";
my $cnt=1;
for my $block (#blocks){ my $fn="$base$cnt";
say "writing $fn";
open (my $fh, '>', $fn) or die "Could not open file '$fn' $!";
print $fh "$block\n";
close $fh;
$cnt++;}
Now the files:
$ cat block_1
{ here is the actual payload
more payload
"atomic payload"
nested start bracket { - all of this line is untouchable payload too
here is more payload
"yet more atomic payload; this one's got a smiley ;-)"
end of nested bracket pair } - all of this line is untouchable payload too
this is payload too
}
$ cat block_2
{
payload that goes in second output file }
Using Text::Balanced is robust and likely the best solution.
You can do the blocks with a single Perl regex:
$ perl -0777 -nlE 'while (/(\{(?:(?1)|[^{}]*+)++\})|[^{}\s]++/g) {if ($1) {$cnt++; say "block $cnt:== start:\n$1\n== end";}}' file
block 1:== start:
{ here is the actual payload
more payload
"atomic payload"
nested start bracket { - all of this line is untouchable payload too
here is more payload
"yet more atomic payload; this one's got a smiley ;-)"
end of nested bracket pair } - all of this line is untouchable payload too
this is payload too
}
== end
block 2:== start:
{
payload that goes in second output file }
== end
But that is a little more fragile than using a proper parser like Text::Balanced...
I have a solution in C. It would seem there's too much complexity for this to be easily achieved in shell script.
The program isn't overly complicated but nevertheless has more than 200 lines of code, which include error checking, some speed optimization, and other niceties.
Source file split-brackets-to-chunks.c:
#include <stdio.h>
/* Example code by jcxz100 - your problem if you use it! */
#define BUFF_IN_MAX 255
#define BUFF_IN_SIZE (BUFF_IN_MAX+1)
#define OUT_NAME_MAX 31
#define OUT_NAME_SIZE (OUT_NAME_MAX+1)
#define NO_CHAR '\0'
int main()
{
char pcBuff[BUFF_IN_SIZE];
size_t iReadActual;
FILE *pFileIn, *pFileOut;
int iNumberOfOutputFiles;
char pszOutName[OUT_NAME_SIZE];
char cLiteralChar, cAtomicChar, cChunkStartChar, cChunkEndChar;
int iChunkNesting;
char *pcOutputStart;
size_t iOutputLen;
pcBuff[BUFF_IN_MAX] = '\0'; /* ... just to be sure. */
iReadActual = 0;
pFileIn = pFileOut = NULL;
iNumberOfOutputFiles = 0;
pszOutName[OUT_NAME_MAX] = '\0'; /* ... just to be sure. */
cLiteralChar = cAtomicChar = cChunkStartChar = cChunkEndChar = NO_CHAR;
iChunkNesting = 0;
pcOutputStart = (char*)pcBuff;
iOutputLen = 0;
if ((pFileIn = fopen("input-utf-8.txt", "r")) == NULL)
{
printf("What? Where?\n");
return 1;
}
while ((iReadActual = fread(pcBuff, sizeof(char), BUFF_IN_MAX, pFileIn)) > 0)
{
char *pcPivot, *pcStop;
pcBuff[iReadActual] = '\0'; /* ... just to be sure. */
pcPivot = (char*)pcBuff;
pcStop = (char*)pcBuff + iReadActual;
while (pcPivot < pcStop)
{
if (cLiteralChar != NO_CHAR) /* Ignore this char? */
{
/* Yes, ignore this char. */
if (cChunkStartChar != NO_CHAR)
{
/* ... just write it out: */
fprintf(pFileOut, "%c", *pcPivot);
}
pcPivot++;
cLiteralChar = NO_CHAR;
/* End of "Yes, ignore this char." */
}
else if (cAtomicChar != NO_CHAR) /* Are we inside an atomic string? */
{
/* Yup; we are inside an atomic string. */
int bBreakInnerWhile;
bBreakInnerWhile = 0;
pcOutputStart = pcPivot;
while (bBreakInnerWhile == 0)
{
if (*pcPivot == '\\') /* Treat next char as literal? */
{
cLiteralChar = '\\'; /* Yes. */
bBreakInnerWhile = 1;
}
else if (*pcPivot == cAtomicChar) /* End of atomic? */
{
cAtomicChar = NO_CHAR; /* Yes. */
bBreakInnerWhile = 1;
}
if (++pcPivot == pcStop) bBreakInnerWhile = 1;
}
if (cChunkStartChar != NO_CHAR)
{
/* The atomic string is part of a chunk. */
iOutputLen = (size_t)(pcPivot-pcOutputStart);
fprintf(pFileOut, "%.*s", iOutputLen, pcOutputStart);
}
/* End of "Yup; we are inside an atomic string." */
}
else if (cChunkStartChar == NO_CHAR) /* Are we inside a chunk? */
{
/* No, we are outside a chunk. */
int bBreakInnerWhile;
bBreakInnerWhile = 0;
while (bBreakInnerWhile == 0)
{
/* Detect start of anything interesting: */
switch (*pcPivot)
{
/* Start of atomic? */
case '"':
case '\'':
cAtomicChar = *pcPivot;
bBreakInnerWhile = 1;
break;
/* Start of chunk? */
case '{':
cChunkStartChar = *pcPivot;
cChunkEndChar = '}';
break;
case '[':
cChunkStartChar = *pcPivot;
cChunkEndChar = ']';
break;
case '(':
cChunkStartChar = *pcPivot;
cChunkEndChar = ')';
break;
case '<':
cChunkStartChar = *pcPivot;
cChunkEndChar = '>';
break;
}
if (cChunkStartChar != NO_CHAR)
{
iNumberOfOutputFiles++;
printf("Start '%c' '%c' chunk (file %04d.txt)\n", *pcPivot, cChunkEndChar, iNumberOfOutputFiles);
sprintf((char*)pszOutName, "output/%04d.txt", iNumberOfOutputFiles);
if ((pFileOut = fopen(pszOutName, "w")) == NULL)
{
printf("What? How?\n");
fclose(pFileIn);
return 2;
}
bBreakInnerWhile = 1;
}
else if (++pcPivot == pcStop)
{
bBreakInnerWhile = 1;
}
}
/* End of "No, we are outside a chunk." */
}
else
{
/* Yes, we are inside a chunk. */
int bBreakInnerWhile;
bBreakInnerWhile = 0;
pcOutputStart = pcPivot;
while (bBreakInnerWhile == 0)
{
if (*pcPivot == cChunkStartChar)
{
/* Increase level of brackets/parantheses: */
iChunkNesting++;
}
else if (*pcPivot == cChunkEndChar)
{
/* Decrease level of brackets/parantheses: */
iChunkNesting--;
if (iChunkNesting == 0)
{
/* We are now outside chunk. */
bBreakInnerWhile = 1;
}
}
else
{
/* Detect atomic start: */
switch (*pcPivot)
{
case '"':
case '\'':
cAtomicChar = *pcPivot;
bBreakInnerWhile = 1;
break;
}
}
if (++pcPivot == pcStop) bBreakInnerWhile = 1;
}
iOutputLen = (size_t)(pcPivot-pcOutputStart);
fprintf(pFileOut, "%.*s", iOutputLen, pcOutputStart);
if (iChunkNesting == 0)
{
printf("File done.\n");
cChunkStartChar = cChunkEndChar = NO_CHAR;
fclose(pFileOut);
pFileOut = NULL;
}
/* End of "Yes, we are inside a chunk." */
}
}
}
if (cChunkStartChar != NO_CHAR)
{
printf("Chunk exceeds end-of-file. Exiting gracefully.\n");
fclose(pFileOut);
pFileOut = NULL;
}
if (iNumberOfOutputFiles == 0) printf("Nothing to do...\n");
else printf("All done.\n");
fclose(pFileIn);
return 0;
}
I've solved the nice-to-haves and one of the more-far-out-nice-to-haves.
To show this the input is a little more complex than the example in the question:
junk text
"atomic junk"
some junk text followed by a start bracket { here is the actual payload
more payload
'atomic payload { with start bracket that should be ignored'
nested start bracket { - all of this line is untouchable payload too
here is more payload
"this atomic has a literal double-quote \" inside"
"yet more atomic payload; this one's got a smiley ;-) and a heart <3"
end of nested bracket pair } - all of this line is untouchable payload too
this is payload too
"here's a totally unprovoked $ sign and an * asterisk"
} trailing junk
intermittent junk
<
payload that goes in second output file } mismatched end bracket should be ignored >
end junk
Resulting file output/0001.txt:
{ here is the actual payload
more payload
'atomic payload { with start bracket that should be ignored'
nested start bracket { - all of this line is untouchable payload too
here is more payload
"this atomic has a literal double-quote \" inside"
"yet more atomic payload; this one's got a smiley ;-) and a heart <3"
end of nested bracket pair } - all of this line is untouchable payload too
this is payload too
"here's a totally unprovoked $ sign and an * asterisk"
}
... and resulting file output/0002.txt:
<
payload that goes in second output file } mismatched end bracket should be ignored >
Thanks #dawg for your help :)

How to Read and write into/from a text file?

Please correct the below code:only.
file already contains entries : 1st row username; 2nd row password.
checkbox status required to write to the third line and need to read or alter only the checkbox status value in the file.
Currently this code is working if there already is a value for the checkbox status value, then it is overwriting, else UI is hanging.
WriteCheckStatusToFile(BOOL& locVar)
{
FILE *l_pFile = NULL;
CString l_strRememberCheck;
l_strRememberCheck = GetExePath() + _T("password");
CString sVar;
sVar.Format(_T("%d"),locVar);
if(NULL != (l_pFile = fopen(l_strRememberCheck, _T("r+"))) )
{
int count = 0;
char c;
while(count != 2)
{
if((c = fgetc(l_pFile)) == '\n') count++;
}
fseek(l_pFile,ftell(l_pFile),SEEK_SET);
fprintf(l_pFile, sVar);
}
l_strRememberCheck.ReleaseBuffer();
fclose(l_pFile);
}
thanks in advance to all!
sam.
This line
fprintf(l_pFile, sVar);
doesn't look right. I think it should be
fprintf(l_pFile, "%s\n", (LPCTSTR) sVar);
The loop could become infinite if the file has less than two linefeeds:
while(count != 2)
I think it should be:
while( (count < 2) && ( ! feof(l_pFile) ) && ( c != EOF ) )
Probably unrelated to your error, but - at least for this code snippet - CString::ReleaseBuffer() doesn't need to be called since you have not called CString::GetBuffer().
l_strRememberCheck.ReleaseBuffer();
This line may be unnecessary as it appears to fseek() to where the file pointer already is:
fseek(l_pFile,ftell(l_pFile),SEEK_SET);
In the event a two-line file is not terminated with a '\n' you would need to print like this:
if ( count == 2 )
{
fprintf(l_pFile, "%s\n", (LPCTSTR) sVar);
}
else
{
fprintf(l_pFile, "\n%s\n", (LPCTSTR) sVar);
}

String: replacing spaces by a number

I would like to replace every blank spaces in a string by a fixnum (which is the number of blank spaces).
Let me give an example:
s = "hello, how are you ?"
omg(s) # => "hello,3how10are2you1?"
Do you see a way (sexy if possible) to update a string like this?
Thank you Rubists :)
gsub can be fed a block for the "replace with" param, the result of the block is inserted into place where the match was found. The argument to the block is the matched string. So to implement this we capture as much whitespace as we can ( /\s+/ ) and feed that into the block each time a section is found, returning that string's length, which gets put back where the whitespace was originally found.
Code:
s = "hello, how are you ?"
res = s.gsub(/\s+/) { |m| m.length }
puts res
# => hello,3how10are2you1?
it is possible to do this via an array split : Javascript example
var s = "hello, how are you ?";
function omg( str ) {
var strArr = str.split('');
var count = 0;
var finalStr = '';
for( var i = 0; i < strArr.length; i++ ) {
if( strArr[i] == ' ' ) {
count++;
}
else
{
if( count > 0 ) {
finalStr += '' + count;
count = 0;
}
finalStr += strArr[i];
}
}
return finalStr
}
alert( omg( s ) ); //"hello,3how10are2you1?"
Lol, this seems the best it can be for javascript

Canonical way to parse the command line into arguments in plain C Windows API

In a Windows program, what is the canonical way to parse the command line obtained from GetCommandLine into multiple arguments, similar to the argv array in Unix? It seems that CommandLineToArgvW does this for a Unicode command line, but I can't find a non-Unicode equivalent. Should I be using Unicode or not? If not, how do I parse the command line?
Here is an implementation of CommandLineToArgvA that delegate the work to CommandLineToArgvW, MultiByteToWideChar and WideCharToMultiByte.
LPSTR* CommandLineToArgvA(LPSTR lpCmdLine, INT *pNumArgs)
{
int retval;
retval = MultiByteToWideChar(CP_ACP, MB_ERR_INVALID_CHARS, lpCmdLine, -1, NULL, 0);
if (!SUCCEEDED(retval))
return NULL;
LPWSTR lpWideCharStr = (LPWSTR)malloc(retval * sizeof(WCHAR));
if (lpWideCharStr == NULL)
return NULL;
retval = MultiByteToWideChar(CP_ACP, MB_ERR_INVALID_CHARS, lpCmdLine, -1, lpWideCharStr, retval);
if (!SUCCEEDED(retval))
{
free(lpWideCharStr);
return NULL;
}
int numArgs;
LPWSTR* args;
args = CommandLineToArgvW(lpWideCharStr, &numArgs);
free(lpWideCharStr);
if (args == NULL)
return NULL;
int storage = numArgs * sizeof(LPSTR);
for (int i = 0; i < numArgs; ++ i)
{
BOOL lpUsedDefaultChar = FALSE;
retval = WideCharToMultiByte(CP_ACP, 0, args[i], -1, NULL, 0, NULL, &lpUsedDefaultChar);
if (!SUCCEEDED(retval))
{
LocalFree(args);
return NULL;
}
storage += retval;
}
LPSTR* result = (LPSTR*)LocalAlloc(LMEM_FIXED, storage);
if (result == NULL)
{
LocalFree(args);
return NULL;
}
int bufLen = storage - numArgs * sizeof(LPSTR);
LPSTR buffer = ((LPSTR)result) + numArgs * sizeof(LPSTR);
for (int i = 0; i < numArgs; ++ i)
{
assert(bufLen > 0);
BOOL lpUsedDefaultChar = FALSE;
retval = WideCharToMultiByte(CP_ACP, 0, args[i], -1, buffer, bufLen, NULL, &lpUsedDefaultChar);
if (!SUCCEEDED(retval))
{
LocalFree(result);
LocalFree(args);
return NULL;
}
result[i] = buffer;
buffer += retval;
bufLen -= retval;
}
LocalFree(args);
*pNumArgs = numArgs;
return result;
}
Apparently you can use __argv outside main() to access the pre-parsed argument vector...
I followed the source for parse_cmd (see "argv_parsing.cpp" in the latest SDK) and modified it to match the paradigm and operation for CommandLineToArgW and developed the following. Note: instead of using LocalAlloc, per Microsoft recommendations (see https://msdn.microsoft.com/en-us/library/windows/desktop/aa366723(v=vs.85).aspx) I've substituted HeapAlloc. Additionally one change in the SAL notation. I deviate slightly be stating _In_opt_ for lpCmdLine - as CommandLineToArgvW does allow this to be NULL, in which case it returns an argument list containing just the program name.
A final caveat, parse_cmd will parse the command line slightly different from CommandLineToArgvW in one aspect only: two double quote characters in a row while the state is 'in quote' mode are interpreted as an escaped double quote character. Both functions consume the first one and output the second one. The difference is that for CommandLineToArgvW, there is a transition out of 'in quote' mode, while parse_cmdline remains in 'in quote' mode. This is properly reflected in the function below.
You would use the below function as follows:
int argc = 0;
LPSTR *argv = CommandLineToArgvA(GetCommandLineA(), &argc);
HeapFree(GetProcessHeap(), NULL, argv);
LPSTR* CommandLineToArgvA(_In_opt_ LPCSTR lpCmdLine, _Out_ int *pNumArgs)
{
if (!pNumArgs)
{
SetLastError(ERROR_INVALID_PARAMETER);
return NULL;
}
*pNumArgs = 0;
/*follow CommandLinetoArgvW and if lpCmdLine is NULL return the path to the executable.
Use 'programname' so that we don't have to allocate MAX_PATH * sizeof(CHAR) for argv
every time. Since this is ANSI the return can't be greater than MAX_PATH (260
characters)*/
CHAR programname[MAX_PATH] = {};
/*pnlength = the length of the string that is copied to the buffer, in characters, not
including the terminating null character*/
DWORD pnlength = GetModuleFileNameA(NULL, programname, MAX_PATH);
if (pnlength == 0) //error getting program name
{
//GetModuleFileNameA will SetLastError
return NULL;
}
if (*lpCmdLine == NULL)
{
/*In keeping with CommandLineToArgvW the caller should make a single call to HeapFree
to release the memory of argv. Allocate a single block of memory with space for two
pointers (representing argv[0] and argv[1]). argv[0] will contain a pointer to argv+2
where the actual program name will be stored. argv[1] will be nullptr per the C++
specifications for argv. Hence space required is the size of a LPSTR (char*) multiplied
by 2 [pointers] + the length of the program name (+1 for null terminating character)
multiplied by the sizeof CHAR. HeapAlloc is called with HEAP_GENERATE_EXCEPTIONS flag,
so if there is a failure on allocating memory an exception will be generated.*/
LPSTR *argv = static_cast<LPSTR*>(HeapAlloc(GetProcessHeap(),
HEAP_ZERO_MEMORY | HEAP_GENERATE_EXCEPTIONS,
(sizeof(LPSTR) * 2) + ((pnlength + 1) * sizeof(CHAR))));
memcpy(argv + 2, programname, pnlength+1); //add 1 for the terminating null character
argv[0] = reinterpret_cast<LPSTR>(argv + 2);
argv[1] = nullptr;
*pNumArgs = 1;
return argv;
}
/*We need to determine the number of arguments and the number of characters so that the
proper amount of memory can be allocated for argv. Our argument count starts at 1 as the
first "argument" is the program name even if there are no other arguments per specs.*/
int argc = 1;
int numchars = 0;
LPCSTR templpcl = lpCmdLine;
bool in_quotes = false; //'in quotes' mode is off (false) or on (true)
/*first scan the program name and copy it. The handling is much simpler than for other
arguments. Basically, whatever lies between the leading double-quote and next one, or a
terminal null character is simply accepted. Fancier handling is not required because the
program name must be a legal NTFS/HPFS file name. Note that the double-quote characters are
not copied.*/
do {
if (*templpcl == '"')
{
//don't add " to character count
in_quotes = !in_quotes;
templpcl++; //move to next character
continue;
}
++numchars; //count character
templpcl++; //move to next character
if (_ismbblead(*templpcl) != 0) //handle MBCS
{
++numchars;
templpcl++; //skip over trail byte
}
} while (*templpcl != '\0' && (in_quotes || (*templpcl != ' ' && *templpcl != '\t')));
//parsed first argument
if (*templpcl == '\0')
{
/*no more arguments, rewind and the next for statement will handle*/
templpcl--;
}
//loop through the remaining arguments
int slashcount = 0; //count of backslashes
bool countorcopychar = true; //count the character or not
for (;;)
{
if (*templpcl)
{
//next argument begins with next non-whitespace character
while (*templpcl == ' ' || *templpcl == '\t')
++templpcl;
}
if (*templpcl == '\0')
break; //end of arguments
++argc; //next argument - increment argument count
//loop through this argument
for (;;)
{
/*Rules:
2N backslashes + " ==> N backslashes and begin/end quote
2N + 1 backslashes + " ==> N backslashes + literal "
N backslashes ==> N backslashes*/
slashcount = 0;
countorcopychar = true;
while (*templpcl == '\\')
{
//count the number of backslashes for use below
++templpcl;
++slashcount;
}
if (*templpcl == '"')
{
//if 2N backslashes before, start/end quote, otherwise count.
if (slashcount % 2 == 0) //even number of backslashes
{
if (in_quotes && *(templpcl +1) == '"')
{
in_quotes = !in_quotes; //NB: parse_cmdline omits this line
templpcl++; //double quote inside quoted string
}
else
{
//skip first quote character and count second
countorcopychar = false;
in_quotes = !in_quotes;
}
}
slashcount /= 2;
}
//count slashes
while (slashcount--)
{
++numchars;
}
if (*templpcl == '\0' || (!in_quotes && (*templpcl == ' ' || *templpcl == '\t')))
{
//at the end of the argument - break
break;
}
if (countorcopychar)
{
if (_ismbblead(*templpcl) != 0) //should copy another character for MBCS
{
++templpcl; //skip over trail byte
++numchars;
}
++numchars;
}
++templpcl;
}
//add a count for the null-terminating character
++numchars;
}
/*allocate memory for argv. Allocate a single block of memory with space for argc number of
pointers. argv[0] will contain a pointer to argv+argc where the actual program name will be
stored. argv[argc] will be nullptr per the C++ specifications. Hence space required is the
size of a LPSTR (char*) multiplied by argc + 1 pointers + the number of characters counted
above multiplied by the sizeof CHAR. HeapAlloc is called with HEAP_GENERATE_EXCEPTIONS
flag, so if there is a failure on allocating memory an exception will be generated.*/
LPSTR *argv = static_cast<LPSTR*>(HeapAlloc(GetProcessHeap(),
HEAP_ZERO_MEMORY | HEAP_GENERATE_EXCEPTIONS,
(sizeof(LPSTR) * (argc+1)) + (numchars * sizeof(CHAR))));
//now loop through the commandline again and split out arguments
in_quotes = false;
templpcl = lpCmdLine;
argv[0] = reinterpret_cast<LPSTR>(argv + argc+1);
LPSTR tempargv = reinterpret_cast<LPSTR>(argv + argc+1);
do {
if (*templpcl == '"')
{
in_quotes = !in_quotes;
templpcl++; //move to next character
continue;
}
*tempargv++ = *templpcl;
templpcl++; //move to next character
if (_ismbblead(*templpcl) != 0) //should copy another character for MBCS
{
*tempargv++ = *templpcl; //copy second byte
templpcl++; //skip over trail byte
}
} while (*templpcl != '\0' && (in_quotes || (*templpcl != ' ' && *templpcl != '\t')));
//parsed first argument
if (*templpcl == '\0')
{
//no more arguments, rewind and the next for statement will handle
templpcl--;
}
else
{
//end of program name - add null terminator
*tempargv = '\0';
}
int currentarg = 1;
argv[currentarg] = ++tempargv;
//loop through the remaining arguments
slashcount = 0; //count of backslashes
countorcopychar = true; //count the character or not
for (;;)
{
if (*templpcl)
{
//next argument begins with next non-whitespace character
while (*templpcl == ' ' || *templpcl == '\t')
++templpcl;
}
if (*templpcl == '\0')
break; //end of arguments
argv[currentarg] = ++tempargv; //copy address of this argument string
//next argument - loop through it's characters
for (;;)
{
/*Rules:
2N backslashes + " ==> N backslashes and begin/end quote
2N + 1 backslashes + " ==> N backslashes + literal "
N backslashes ==> N backslashes*/
slashcount = 0;
countorcopychar = true;
while (*templpcl == '\\')
{
//count the number of backslashes for use below
++templpcl;
++slashcount;
}
if (*templpcl == '"')
{
//if 2N backslashes before, start/end quote, otherwise copy literally.
if (slashcount % 2 == 0) //even number of backslashes
{
if (in_quotes && *(templpcl+1) == '"')
{
in_quotes = !in_quotes; //NB: parse_cmdline omits this line
templpcl++; //double quote inside quoted string
}
else
{
//skip first quote character and count second
countorcopychar = false;
in_quotes = !in_quotes;
}
}
slashcount /= 2;
}
//copy slashes
while (slashcount--)
{
*tempargv++ = '\\';
}
if (*templpcl == '\0' || (!in_quotes && (*templpcl == ' ' || *templpcl == '\t')))
{
//at the end of the argument - break
break;
}
if (countorcopychar)
{
*tempargv++ = *templpcl;
if (_ismbblead(*templpcl) != 0) //should copy another character for MBCS
{
++templpcl; //skip over trail byte
*tempargv++ = *templpcl;
}
}
++templpcl;
}
//null-terminate the argument
*tempargv = '\0';
++currentarg;
}
argv[argc] = nullptr;
*pNumArgs = argc;
return argv;
}
CommandLineToArgvW() is in shell32.dll. I'd guessthat the Shell developers created the function for their own use, and it was made public either because someone decided that 3rd party devs would find it useful or because some court action made them do it.
Since the Shell developers only ever needed a Unicode version that's all they ever wrote. It would be fairly simple to write an ANSI wrapper for the function that converts the ANSI to Unicode, calls the function and converts the Unicode results to ANSI (and if Shell32.dll ever provided an ANSI variant of this API, that's probably exactly what would do).
None of these solved the problem perfectly when don't want parsing UNICODE, so my solution is modified from WINE projects, they contains source code of CommandLineToArgvW of shell32.dll, modified it to below and it's work perfectly for me:
/*************************************************************************
* CommandLineToArgvA [SHELL32.#]
*
* MODIFIED FROM https://www.winehq.org/ project
* We must interpret the quotes in the command line to rebuild the argv
* array correctly:
* - arguments are separated by spaces or tabs
* - quotes serve as optional argument delimiters
* '"a b"' -> 'a b'
* - escaped quotes must be converted back to '"'
* '\"' -> '"'
* - consecutive backslashes preceding a quote see their number halved with
* the remainder escaping the quote:
* 2n backslashes + quote -> n backslashes + quote as an argument delimiter
* 2n+1 backslashes + quote -> n backslashes + literal quote
* - backslashes that are not followed by a quote are copied literally:
* 'a\b' -> 'a\b'
* 'a\\b' -> 'a\\b'
* - in quoted strings, consecutive quotes see their number divided by three
* with the remainder modulo 3 deciding whether to close the string or not.
* Note that the opening quote must be counted in the consecutive quotes,
* that's the (1+) below:
* (1+) 3n quotes -> n quotes
* (1+) 3n+1 quotes -> n quotes plus closes the quoted string
* (1+) 3n+2 quotes -> n+1 quotes plus closes the quoted string
* - in unquoted strings, the first quote opens the quoted string and the
* remaining consecutive quotes follow the above rule.
*/
LPSTR* WINAPI CommandLineToArgvA(LPSTR lpCmdline, int* numargs)
{
DWORD argc;
LPSTR *argv;
LPSTR s;
LPSTR d;
LPSTR cmdline;
int qcount,bcount;
if(!numargs || *lpCmdline==0)
{
SetLastError(ERROR_INVALID_PARAMETER);
return NULL;
}
/* --- First count the arguments */
argc=1;
s=lpCmdline;
/* The first argument, the executable path, follows special rules */
if (*s=='"')
{
/* The executable path ends at the next quote, no matter what */
s++;
while (*s)
if (*s++=='"')
break;
}
else
{
/* The executable path ends at the next space, no matter what */
while (*s && *s!=' ' && *s!='\t')
s++;
}
/* skip to the first argument, if any */
while (*s==' ' || *s=='\t')
s++;
if (*s)
argc++;
/* Analyze the remaining arguments */
qcount=bcount=0;
while (*s)
{
if ((*s==' ' || *s=='\t') && qcount==0)
{
/* skip to the next argument and count it if any */
while (*s==' ' || *s=='\t')
s++;
if (*s)
argc++;
bcount=0;
}
else if (*s=='\\')
{
/* '\', count them */
bcount++;
s++;
}
else if (*s=='"')
{
/* '"' */
if ((bcount & 1)==0)
qcount++; /* unescaped '"' */
s++;
bcount=0;
/* consecutive quotes, see comment in copying code below */
while (*s=='"')
{
qcount++;
s++;
}
qcount=qcount % 3;
if (qcount==2)
qcount=0;
}
else
{
/* a regular character */
bcount=0;
s++;
}
}
/* Allocate in a single lump, the string array, and the strings that go
* with it. This way the caller can make a single LocalFree() call to free
* both, as per MSDN.
*/
argv=LocalAlloc(LMEM_FIXED, (argc+1)*sizeof(LPSTR)+(strlen(lpCmdline)+1)*sizeof(char));
if (!argv)
return NULL;
cmdline=(LPSTR)(argv+argc+1);
strcpy(cmdline, lpCmdline);
/* --- Then split and copy the arguments */
argv[0]=d=cmdline;
argc=1;
/* The first argument, the executable path, follows special rules */
if (*d=='"')
{
/* The executable path ends at the next quote, no matter what */
s=d+1;
while (*s)
{
if (*s=='"')
{
s++;
break;
}
*d++=*s++;
}
}
else
{
/* The executable path ends at the next space, no matter what */
while (*d && *d!=' ' && *d!='\t')
d++;
s=d;
if (*s)
s++;
}
/* close the executable path */
*d++=0;
/* skip to the first argument and initialize it if any */
while (*s==' ' || *s=='\t')
s++;
if (!*s)
{
/* There are no parameters so we are all done */
argv[argc]=NULL;
*numargs=argc;
return argv;
}
/* Split and copy the remaining arguments */
argv[argc++]=d;
qcount=bcount=0;
while (*s)
{
if ((*s==' ' || *s=='\t') && qcount==0)
{
/* close the argument */
*d++=0;
bcount=0;
/* skip to the next one and initialize it if any */
do {
s++;
} while (*s==' ' || *s=='\t');
if (*s)
argv[argc++]=d;
}
else if (*s=='\\')
{
*d++=*s++;
bcount++;
}
else if (*s=='"')
{
if ((bcount & 1)==0)
{
/* Preceded by an even number of '\', this is half that
* number of '\', plus a quote which we erase.
*/
d-=bcount/2;
qcount++;
}
else
{
/* Preceded by an odd number of '\', this is half that
* number of '\' followed by a '"'
*/
d=d-bcount/2-1;
*d++='"';
}
s++;
bcount=0;
/* Now count the number of consecutive quotes. Note that qcount
* already takes into account the opening quote if any, as well as
* the quote that lead us here.
*/
while (*s=='"')
{
if (++qcount==3)
{
*d++='"';
qcount=0;
}
s++;
}
if (qcount==2)
qcount=0;
}
else
{
/* a regular character */
*d++=*s++;
bcount=0;
}
}
*d='\0';
argv[argc]=NULL;
*numargs=argc;
return argv;
}
Be careful when parsing empty string "", it's return NULL instead of executable path, that's the different behavior with the standard CommandLineToArgvW, the recommanded usage is below:
int argc;
LPSTR * argv = CommandLineToArgvA(GetCommandLineA(), &argc);
// AFTER consumed argv
LocalFree(argv);
The following is about the simplest way I can think of to obtain an old-fashioned argc/argv pair at the top of WinMain. Assuming that the command-line really was ANSI text, you don't actually need any conversions fancier than this.
int WINAPI WinMain(HINSTANCE hInstance, HINSTANCE hPrevInstance, LPSTR lpCmdLine, int nShowCmd) {
int argc;
LPWSTR *szArglist = CommandLineToArgvW(GetCommandLineW(), &argc);
char **argv = new char*[argc];
for (int i=0; i<argc; i++) {
int lgth = wcslen(szArglist[i]);
argv[i] = new char[lgth+1];
for (int j=0; j<=lgth; j++)
argv[i][j] = char(szArglist[i][j]);
}
LocalFree(szArglist);

Resources