Maximum Length of Command Line String - Windows

In Windows, what is the maximum length of a command line string? That is, if I run a program that takes arguments on the command line, such as abc.exe -name=abc.
A simple console application I wrote takes parameters via the command line, and I want to know the maximum allowable length.

From the Microsoft documentation: Command prompt (Cmd.exe) command-line string limitation
On computers running Microsoft Windows XP or later, the maximum length of the string that you can use at the command prompt is 8191 characters.

Sorry for digging out an old thread, but I think sunetos' answer isn't correct (or isn't the full answer). I've done some experiments (using ProcessStartInfo in C#) and it seems that the 'arguments' string for a command-line command is limited to 2048 characters in XP and 32768 characters in Win7. I'm not sure what the 8191 limit refers to, but I haven't found any evidence of it yet.

Like #Sugrue, I'm also digging out an old thread.
To explain why there is a 32768-character limitation (I think it should be 32767, but let's trust the experimental result), we need to dig into the Windows API.
No matter how you launch a program with command line arguments, the call goes through ShellExecute, CreateProcess, or one of their extended versions. These APIs basically wrap other NT-level APIs that are not officially documented. As far as I know, these calls wrap NtCreateProcess, which requires an OBJECT_ATTRIBUTES structure as a parameter; InitializeObjectAttributes is used to create that structure, and that is where we meet UNICODE_STRING. So let's take a look at this structure:
typedef struct _UNICODE_STRING {
    USHORT Length;
    USHORT MaximumLength;
    PWSTR  Buffer;
} UNICODE_STRING;
It uses a USHORT (a 16-bit value, range [0; 65535]) to store the length, and according to the documentation this length is the size in bytes, not in characters. So we have 65535 / 2 = 32767 (because a WCHAR is 2 bytes long).
It takes a few steps to dig down to this number, but I hope it is clear.
Also, to support #sunetos' answer, which is the accepted one: 8191 is the maximum number of characters you can enter into cmd.exe; if you exceed this limit, the "The input line is too long." error is generated. So that answer is correct, despite the fact that cmd.exe is not the only way to pass arguments to a new process.
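To reproduce that experiment outside of cmd.exe, a rough probe along the following lines can be used (a sketch only: it assumes cmd.exe is resolvable through PATH, and it merely reports whether CreateProcessW accepts a command line of the given length):

#include <windows.h>
#include <string>
#include <cstdio>

// Rough probe: build a command line of roughly totalChars characters and see
// whether CreateProcessW accepts it (the spawned cmd.exe just exits).
static bool TryCommandLine(size_t totalChars) {
    std::wstring cmd = L"cmd.exe /c exit ";
    if (totalChars > cmd.size())
        cmd.append(totalChars - cmd.size(), L'x');   // pad with dummy argument text

    STARTUPINFOW si = { sizeof(si) };
    PROCESS_INFORMATION pi = {};
    BOOL ok = CreateProcessW(NULL, &cmd[0], NULL, NULL, FALSE,
                             CREATE_NO_WINDOW, NULL, NULL, &si, &pi);
    if (ok) { CloseHandle(pi.hThread); CloseHandle(pi.hProcess); }
    return ok != FALSE;
}

int main() {
    std::printf("32766 chars: %s\n", TryCommandLine(32766) ? "accepted" : "rejected");
    std::printf("32768 chars: %s\n", TryCommandLine(32768) ? "accepted" : "rejected");
    return 0;
}

Whether the cut-off you observe lands on 32767 or 32768 characters is exactly the off-by-one question raised above, and it may vary between Windows versions, so treat this as a probe rather than a definitive measurement.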

In Windows 10, it's still 8191 characters...at least on my machine.
It just cuts off any text after 8191 characters. Well, actually, I got 8196 characters, and after 8196 it just won't let me type any more.
Here's a script that will test how long a statement you can use, assuming you have gawk/awk installed.
echo rem this is a test of how long of a line that a .cmd script can generate >testbat.bat
gawk 'BEGIN {printf "echo -----";for (i=10;i^<=100000;i +=10) printf "%%06d----",i;print;print "pause";}' >>testbat.bat
testbat.bat

Related

ReadFile truncating console input data containing multibyte characters, how to get correct input?

I was trying to implement a unified input interface using the Windows API function ReadFile for my application, which should be able to handle both console input and redirection. It didn't work as expected with console input containing multibyte (like CJK) characters.
According to the Microsoft documentation, for console input handles ReadFile just behaves like ReadConsoleA. (FYI, results are encoded in the console's current code page, so the A family of console functions is acceptable, and there's no ReadFileW as ReadFile works on bytes.) The third and fourth arguments of ReadFile are nNumberOfBytesToRead and lpNumberOfBytesRead respectively, but they are nNumberOfCharsToRead and lpNumberOfCharsRead in ReadConsole. To find out the exact mechanism, I did the following test:
#include <windows.h>
#include <stdio.h>

int main(void) {
    HANDLE in = GetStdHandle(STD_INPUT_HANDLE);
    BYTE buf[8];
    DWORD len;
    BOOL f = ReadFile(in, buf, 4, &len, NULL);
    if (f) {
        printf("ReadFile: %lu bytes\n", len);
        // check count of remaining characters (4 WCHARs fit in the 8-byte buf)
        ReadConsoleW(in, buf, 4, &len, NULL);
        printf("ReadConsoleW: %lu characters\n", len);
    }
    return 0;
}
For input like 字, len is set to 4 first (the character plus CRLF), indicating that the arguments count bytes.
For 文字 or a字, len is still 4 and only the first 4 bytes of buf are used at first, but the second read does not get the CRLF. Only when more than 3 characters are input does the second read get the unread LF, then CR. It means that ReadFile is actually consuming up to 4 logical characters and discarding the part of the input after the first 4 bytes.
The behavior of ReadConsoleA is identical to ReadFile.
Obviously, this is more likely to be a bug than a design decision. I did some searching and found related feedback dating back to 2009. It seems that ReadConsoleA and ReadFile originally did read data fully from the console input, but because that was inconsistent with the ReadFile specification and could cause severe buffer overflows threatening system processes, Microsoft did a makeshift repair by simply discarding the excess bytes, ignoring support for multibyte charsets. (This is an issue about the behavior after that fix, limiting the buffer to 1 byte.)
Currently the only practical solution I have come up with to make input correct is to check whether the input handle is a console and, if so, process it differently using ReadConsoleW, which adds complexity to the implementation. Are there other ways to get it correct?
Maybe I could still keep ReadFile by providing a buffer large enough to hold any input at one time. However, I don't have any idea how to check or set the input buffer size. (I can only enter 256 characters (254 plus CRLF) in my application on my computer, but cmd.exe allows entering 8,192 characters, so this is really a problem.) It would also be helpful if more information about this could be provided.
P.S.: Maybe _getws could also help, but this question is about the Windows API, and my application needs to use some low-level console functions.
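For what it's worth, the console check mentioned above is usually done with GetConsoleMode, which fails for handles that are not consoles. A minimal sketch of that branching (the helper name is mine, and it assumes redirected input happens to be UTF-8, purely for illustration):

#include <windows.h>

// Hypothetical helper: read up to cchBuf UTF-16 characters from `in`.
// Assumes redirected (non-console) input is UTF-8; adjust for your encoding.
static DWORD ReadInputW(HANDLE in, wchar_t *buf, DWORD cchBuf) {
    DWORD mode, got = 0;
    if (GetConsoleMode(in, &mode)) {
        // Real console handle: read UTF-16 directly and avoid the
        // byte-oriented truncation of ReadFile/ReadConsoleA.
        ReadConsoleW(in, buf, cchBuf, &got, NULL);
        return got;                                  // characters read
    }
    // Redirected handle (file/pipe): read raw bytes, then decode.
    // NB: a real implementation must handle multibyte sequences that are
    // split across ReadFile calls; that is glossed over here.
    char raw[4096];
    DWORD bytes = 0;
    if (!ReadFile(in, raw, sizeof(raw), &bytes, NULL) || bytes == 0)
        return 0;
    return (DWORD)MultiByteToWideChar(CP_UTF8, 0, raw, (int)bytes, buf, (int)cchBuf);
}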

Unexpected blank lines in Python output to Windows console

I have a little program that prints out a directory structure.
It works fine except when the directory names contain German umlaut characters.
In that case it prints a blank line after the directory line.
I'm running Python 3.5.0 on Windows 7 64-bit.
This code ...
class dm():
    ...
    def print(self, rootdir=None, depth=0):
        if rootdir is None:
            rootdir = self.initialdir
        if rootdir in self.dirtree:
            print('{}{} ({} files)'.format(' '*depth,
                                           rootdir,
                                           len(self.dirtree[rootdir]['files'])))
            for _dir in self.dirtree[rootdir]['dirs']:
                self.print(os.path.join(rootdir, _dir), depth+1)
        else:
            pass
...produces the following output:
B:\scratch (11 files)
 B:\scratch\Test1 (3 files)
 B:\scratch\Test1 - Kopie (0 files)
 B:\scratch\Test1 - Übel (0 files)

 B:\scratch\Test2 (3 files)
  B:\scratch\Test2\Test21 (0 files)
This happens with the codepage set to 65001. If I change the codepage to e.g. 850, the blank line disappears, but of course the "Ü" isn't printed correctly.
The structure self.dirtree is a dict of dicts of lists; it is built with os.walk and seems OK.
Python or Windows? Any suggestions?
Marvin
There are several bugs when using codepage 65001 (UTF-8) -- all of which are due to the Windows console (i.e. conhost.exe), not Python. The best solution is to avoid this buggy codepage, and instead use the wide-character API, such as by loading win_unicode_console.
You're experiencing a bug that exists in the legacy console that was used prior to Windows 10. (It's still available in Windows 10 if you select the option "Use legacy console".) The console decodes the UTF-8 buffer to UTF-16 and reports back that it writes b'\xc3\x9c' (i.e. "Ü" encoded as UTF-8) as one character, but it's supposed to report back the number of bytes that it writes, which is two. Python's buffered sys.stdout sees that apparently one byte wasn't written, so it dutifully writes the last byte of the line again, which is b'\n'. That's why you get an extra newline. The result can be far worse if a written buffer has many non-ASCII characters, especially codes above U+07FF that get encoded as three UTF-8 bytes.
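To watch that write-side behavior directly, a small native sketch can print what the console reports back through WriteFile's lpNumberOfBytesWritten; on a correctly behaving console this is 3 (bytes), while the legacy console under codepage 65001 reports the decoded character count the answer describes:

#include <windows.h>
#include <stdio.h>

int main(void) {
    // "Ü" encoded as UTF-8 plus a newline: 3 bytes in total.
    const char line[] = "\xC3\x9C\n";
    DWORD written = 0;

    HANDLE out = GetStdHandle(STD_OUTPUT_HANDLE);
    WriteFile(out, line, 3, &written, NULL);

    // A correct console reports 3 (bytes written). The legacy console under
    // chcp 65001 reports the number of decoded characters instead, which is
    // what tricks buffered writers into re-sending the trailing byte.
    printf("\nbytes passed: 3, reported written: %lu\n", written);
    return 0;
}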
There's a worse bug if you try to paste "Ü" into the interactive REPL. This bug is still present even in Windows 10. In this case a process is reading the console's wide-character (UTF-16) input buffer encoded as UTF-8. The console does the conversion via WideCharToMultiByte with a buffer that assumes one Unicode character is a single byte in the target codepage. But that's completely wrong for UTF-8, in which one UTF-16 code may map to as many as three bytes. In this case it's two bytes, and the console only allocates one byte in the translation buffer. So WideCharToMultiByte fails, but does the console try to increase the translation buffer size? No. Does it fail the call? No. It actually returns back that it 'successfully' read 0 bytes. To Python's REPL that signals EOF (end of file), so the interpreter just exits as if you had entered Ctrl+Z at the prompt.
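The read-side failure comes down to the console calling WideCharToMultiByte with a one-byte-per-character output buffer. A standalone sketch of that call pattern (it doesn't touch the console at all) shows why it fails for "Ü" under UTF-8:

#include <windows.h>
#include <stdio.h>

int main(void) {
    const wchar_t wide[] = L"\u00DC";   // "Ü": one UTF-16 code unit, two UTF-8 bytes
    char out[8];

    // Ask how many bytes the UTF-8 encoding actually needs.
    int needed = WideCharToMultiByte(CP_UTF8, 0, wide, 1, NULL, 0, NULL, NULL);
    printf("UTF-8 bytes needed: %d\n", needed);                       // 2

    // What the legacy console effectively does: assume one byte per character.
    int written = WideCharToMultiByte(CP_UTF8, 0, wide, 1, out, 1, NULL, NULL);
    DWORD err = GetLastError();
    printf("with a 1-byte buffer: %d (error %lu)\n", written, err);   // 0, ERROR_INSUFFICIENT_BUFFER
    return 0;
}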

Using int64 type for SNMP v2c OID?

I am debugging some SNMP code for an integer overflow problem. Basically we use an integer to store disk/RAID capacity in KB. However, when a disk/RAID larger than 2 TB is used, it overflows.
I read on some internet forums that SNMP v2c supports Integer64 or Unsigned64. In my test it still just sends the lower 32 bits, even though I have set the type to Integer64 or Unsigned64.
Here is how I did it:
A standalone program obtains the capacity and writes the data to a file. Example lines for RAID capacity:
my-sub-oid
Counter64
7813857280
/etc/snmp/snmpd.conf has a clause to pass the OIDs through:
pass_persist mymiboid /path/to/snmpagent
In the mysnmpagent source, I read the OID map from the file into an oid/type/value structure and print it to stdout:
printf("%s\n", it->first.c_str());
printf("%s\n", it->second.type.c_str());
printf("%s\n", it->second.value.c_str());
fflush(stdout);
Using snmpget to get the sub-OID returns:
mysuboid = Counter32: 3518889984
I used tcpdump, and the last segment of the value portion is:
41 0500 d1be 0000
41 should be the tag, 05 should be the length, and the value only carries the lower 32 bits of the capacity. (Note that 7813857280 is 0x1D1BE0000.)
I do find that using a string type sends the correct value (in OctetString format), but I want to know if there is a way to use a 64-bit integer in SNMP v2c.
I am running NET-SNMP 5.4.2.1 though.
thanks a lot.
Update:
I found the following about pass (and probably also pass_persist) in the snmpd.conf section of the Net-SNMP documentation. I guess it is forcing the Counter64 down to Counter32.
Note:
The SMIv2 type counter64 and SNMPv2 noSuchObject exception are not supported.
You are supposed to use two Unsigned32 objects for the lower and upper 32 bits of your large number.
Counter64 is not meant to be used for large numbers this way.
For reference: 17 Common MIB Design Errors (the last one).
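If you go that route, the split is plain shift-and-mask arithmetic. A minimal sketch (variable names are mine; the value is the one from the question):

#include <cstdint>
#include <cstdio>

int main() {
    uint64_t capacityKB = 7813857280ULL;                              // value from the question
    uint32_t high = static_cast<uint32_t>(capacityKB >> 32);          // upper 32 bits
    uint32_t low  = static_cast<uint32_t>(capacityKB & 0xFFFFFFFFu);  // lower 32 bits
    std::printf("high = %u, low = %u\n", high, low);                  // high = 1, low = 3518889984
    return 0;
}

Note that the low half is exactly the Counter32 value snmpget reported above, consistent with the agent truncating to 32 bits.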
SNMP SMIv2 defines a new type, Counter64 (https://www.rfc-editor.org/rfc/rfc2578#page-24), which is in fact an unsigned 64-bit integer. So if your data falls into that range, using Counter64 is proper.
"In my test it'll still just send the lower 32 bits even though I have set the type to integer64 or unsigned64" sounds like a problem, but unless you show more details (such as some code) on how you tested it and received that result, nobody can help further.

8086 Assembly / MS-DOS, passing file name from the command line

Say I have PROGRAM.ASM - I have the following in the data segment:
.data
Filename db 'file.txt', 0
Fhndl dw ?
Buffer db ?
I want 'file.txt' to be dynamic I guess? Once compiled, PROGRAM.exe needs to be able to accept a file name via the command line:
c:\> PROGRAM anotherfile.txt
EXECUTION GOES HERE
How do I enable this? Thank you in advance.
DOS stores the command line in a legacy structure called the Program Segment Prefix ("PSP"). And I do mean legacy. This structure was designed to be backwards-compatible with programs ported from CP/M.
Where's the PSP?
You know how programs built as .COM files always start with ORG 100h? The reason is precisely that: for .COM programs, the PSP is always stored at the beginning of the code segment (at CS:0h). The PSP is 100h (256) bytes long, and the actual program code starts right after it (that is, at CS:100h).
The address is also conveniently available at DS:00h and ES:00h, since the key characteristic of the .COM format is that all the segment registers start with the same value (and a COM program typically never changes them).
To read the command line from a .COM program, you can pick up its length at CS:80h (or DS:80h, etc., as long as you haven't changed those registers). The command line itself starts at CS:81h and takes up the rest of the PSP, ending with a Carriage Return (0Dh) as a terminator, so the command line is never more than 126 bytes long.
(And that is why the command line has stayed at 126 bytes in DOS forever, despite the fact that we all wished for years it could be made longer. Since Windows NT provides a different mechanism to access the command line, the WinNT/XP/etc. command line doesn't suffer from this size limitation.)
For an .EXE program, you can't rely on CS:00h because the startup code segment can be just about anywhere in memory. However, when the program starts, DOS sets DS and ES to the PSP segment, so at startup DS:00h and ES:00h will always point to the PSP, for both .EXE and .COM programs.
If you didn't keep track of the PSP address at the beginning of the program, and you have since changed both DS and ES, you can always ask DOS for the segment value at any time via INT 21h, function 62h. The segment portion of the PSP address is returned in BX (the offset being, of course, 0h).

Why does the 260 character path length limit exist in Windows?

I have come up against this problem a few times at inopportune moments:
Trying to work on open source Java projects with deep paths
Storing deep Fitnesse wiki trees in source control
An error trying to use Bazaar to import my source control tree
Why does this limit exist?
Why hasn't it been removed yet?
How do you cope with the path limit?
And no, switching to Linux or Mac OS X is not a valid answer to this question ;)
Quoting this article https://learn.microsoft.com/en-us/windows/desktop/FileIO/naming-a-file#maximum-path-length-limitation
Maximum Path Length Limitation
In the Windows API (with some exceptions discussed in the following paragraphs), the maximum length for a path is MAX_PATH, which is defined as 260 characters. A local path is structured in the following order: drive letter, colon, backslash, name components separated by backslashes, and a terminating null character. For example, the maximum path on drive D is "D:\some 256-character path string<NUL>" where "<NUL>" represents the invisible terminating null character for the current system codepage. (The characters < > are used here for visual clarity and cannot be part of a valid path string.)
Now we see that it is 1+2+256+1 or [drive][:\][path][null] = 260. One could assume that 256 is a reasonable fixed string length from the DOS days. And going back to the DOS APIs we realize that the system tracked the current path per drive, and we have 26 (32 with symbols) maximum drives (and current directories).
The INT 0x21 AH=0x47 call says "This function returns the path description without the drive letter and the initial backslash." So we see that the system stores the CWD as a pair (drive, path), and you ask for the path by specifying the drive (1=A, 2=B, …); if you specify 0, it assumes the path for the current drive, the one returned by INT 0x21 AH=0x19. So now we know why it is 260 and not 256: those 4 bytes are not stored in the path string.
Why a 256-byte path string? Because 640K was enough RAM.
This is not strictly true, as the NTFS filesystem supports paths up to 32k characters. You can use the Win32 API and prefix the path with "\\?\" to use paths longer than 260 characters.
A detailed explanation of long paths can be found on the .NET BCL team blog.
A small excerpt highlights the issue with long paths:
Another concern is inconsistent behavior that would result by exposing long path support. Long paths with the \\?\ prefix can be used in most of the file-related Windows APIs, but not all Windows APIs. For example, LoadLibrary, which maps a module into the address of the calling process, fails if the file name is longer than MAX_PATH. So this means MoveFile will let you move a DLL to a location such that its path is longer than 260 characters, but when you try to load the DLL, it would fail. There are similar examples throughout the Windows APIs; some workarounds exist, but they are on a case-by-case basis.
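As a concrete illustration of how opt-in the prefix is, here is a minimal sketch that opens a file through the \\?\ form with the wide-character API (the path is made up; only the wide Win32 file functions honour the prefix, and higher-level APIs such as LoadLibrary may still refuse long paths, as the excerpt notes):

#include <windows.h>
#include <stdio.h>

int main(void) {
    // Made-up long path, given in the \\?\ form that bypasses MAX_PATH
    // normalization for the wide-character file APIs.
    const wchar_t *path = L"\\\\?\\C:\\very\\deep\\tree\\file.txt";

    HANDLE h = CreateFileW(path, GENERIC_READ, FILE_SHARE_READ, NULL,
                           OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
    if (h == INVALID_HANDLE_VALUE) {
        wprintf(L"CreateFileW failed: %lu\n", GetLastError());
        return 1;
    }
    wprintf(L"opened %ls\n", path);
    CloseHandle(h);
    return 0;
}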
The question is why the limitation still exists. Surely modern Windows could increase the size of MAX_PATH to allow longer paths. Why has the limitation not been removed?
The reason it cannot be removed is that Windows promised it would never change.
Through API contract, Windows has guaranteed all applications that the standard file APIs will never return a path longer than 260 characters.
Consider the following correct code:
WIN32_FIND_DATA findData;
HANDLE hFind = FindFirstFile(TEXT("C:\\Contoso\\*"), &findData);
Windows guaranteed my program that it would populate my WIN32_FIND_DATA structure:
WIN32_FIND_DATA {
    DWORD    dwFileAttributes;
    FILETIME ftCreationTime;
    FILETIME ftLastAccessTime;
    FILETIME ftLastWriteTime;
    //...
    TCHAR    cFileName[MAX_PATH];
    //...
}
My application didn't declare the value of the constant MAX_PATH; the Windows API did. My application used that defined value.
My structure is correctly defined and only allocates 592 bytes in total. That means that I am only able to receive a filename that is less than 260 characters. Windows promised me that if I wrote my application correctly, it would continue to work in the future.
If Windows were to allow filenames longer than 260 characters then my existing application (which used the correct API correctly) would fail.
For anyone calling for Microsoft to change the MAX_PATH constant: they first need to ensure that no existing application fails. For example, I still own and use a Windows application that was written to run on Windows 3.11. It still runs on 64-bit Windows 10. That is what backwards compatibility gets you.
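That fixed size is baked into every compiled caller, and you can check it yourself; a small sketch using the Unicode variant of the structure just asserts and prints it:

#include <windows.h>
#include <cstdio>

int main() {
    // The filename buffer is a fixed MAX_PATH-sized array inside the struct,
    // so raising MAX_PATH would change a layout that existing binaries have
    // already compiled in.
    static_assert(sizeof(WIN32_FIND_DATAW::cFileName) == MAX_PATH * sizeof(WCHAR),
                  "cFileName is a fixed MAX_PATH-sized buffer");
    // Should print 592 for the Unicode variant, the figure mentioned above.
    std::printf("sizeof(WIN32_FIND_DATAW) = %zu bytes\n", sizeof(WIN32_FIND_DATAW));
    return 0;
}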
Microsoft did create a way to use full 32,768-character path names, but they had to create a new API contract to do it. For one, you should use the Shell API to enumerate files (as not all files exist on a hard drive or network share).
But they also had to not break existing user applications. The vast majority of applications do not use the shell API for file work; everyone just calls FindFirstFile/FindNextFile and calls it a day.
From Windows 10 you can remove the limitation by modifying a registry key.
Tip: Starting in Windows 10, version 1607, MAX_PATH limitations have been removed from common Win32 file and directory functions. However, you must opt-in to the new behavior.
A registry key allows you to enable or disable the new long path behavior. To enable long path behavior set the registry key at HKLM\SYSTEM\CurrentControlSet\Control\FileSystem LongPathsEnabled (Type: REG_DWORD). The key's value will be cached by the system (per process) after the first call to an affected Win32 file or directory function (list follows). The registry key will not be reloaded during the lifetime of the process. In order for all apps on the system to recognize the value of the key, a reboot might be required because some processes may have started before the key was set.
The registry key can also be controlled via Group Policy at Computer Configuration > Administrative Templates > System > Filesystem > Enable NTFS long paths.
You can also enable the new long path behavior per app via the manifest:
<application xmlns="urn:schemas-microsoft-com:asm.v3">
  <windowsSettings xmlns:ws2="http://schemas.microsoft.com/SMI/2016/WindowsSettings">
    <ws2:longPathAware>true</ws2:longPathAware>
  </windowsSettings>
</application>
You can mount a folder as a drive. From the command line, if you have a path C:\path\to\long\folder you can map it to the drive letter X: using:
subst x: c:\path\to\long\folder
One way to cope with the path limit is to shorten path entries with symbolic links.
For example:
create a C:\p directory to keep short links to long paths
mklink /J C:\p\foo C:\Some\Crazy\Long\Path\foo
add C:\p\foo to your path instead of the long path
You can enable long path names using PowerShell:
Set-ItemProperty -Path 'HKLM:\SYSTEM\CurrentControlSet\Control\FileSystem' -Name LongPathsEnabled -Type DWord -Value 1
Another option is to use the Group Policy setting under Computer Configuration > Administrative Templates > System > Filesystem > Enable NTFS long paths.
As to why this still exists - MS doesn't consider it a priority, and values backwards compatibility over advancing their OS (at least in this instance).
A workaround I use is to use the "short names" for the directories in the path, instead of their standard, human-readable versions. So, e.g., for C:\Program Files\ I would use C:\PROGRA~1\. You can find the short name equivalents using dir /x.
As to how to cope with the path size limitation on Windows - using 7zip to pack (and unpack) your path-length sensitive files seems like a viable workaround. I've used it to transport several IDE installations (those Eclipse plugin paths, yikes!) and piles of autogenerated documentation and haven't had a single problem so far.
Not really sure how it evades the 260 char limit set by Windows (from a technical PoV), but hey, it works!
More details on their SourceForge page here:
"NTFS can actually support pathnames up to 32,000 characters in
length."
7-zip also support such long names.
But it's disabled in SFX code. Some users don't like long paths, since
they don't understand how to work with them. That is why I have
disabled it in SFX code.
and release notes:
9.32 alpha 2013-12-01
Improved support for file pathnames longer than 260 characters.
4.44 beta 2007-01-20
7-Zip now supports file pathnames longer than 260 characters.
IMPORTANT NOTE: For this to work properly, you'll need to specify the destination path in the 7zip "Extract" dialog directly, rather than dragging & dropping the files into the intended folder. Otherwise the "Temp" folder will be used as an interim cache and you'll bounce into the same 260 char limitation once Windows Explorer starts moving the files to their "final resting place". See the replies to this question for more information.
It does still exist, and it is the default for some reason, but you can easily override it with this registry key:
[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\FileSystem]
"LongPathsEnabled"=dword:00000001
See: https://blogs.msdn.microsoft.com/jeremykuhne/2016/07/30/net-4-6-2-and-long-paths-on-windows-10/
Another way to cope with it is to use Cygwin, depending on what you want to do with the files (i.e. if the Cygwin commands suit your needs).
For example, it allows you to copy, move, or rename files that even Windows Explorer can't, or of course deal with their contents using md5sum, grep, gzip, etc.
Also, for programs that you are coding, you could link them against the Cygwin DLL and that would enable them to use long paths (I haven't tested this, though).
