File-size and line-length limits for Windows batch files - windows

I am generating a Windows batch file that might become quite large, say a few megabytes. I've searched for possible limits regarding the maximum file size and maximum length of a line in a batch file, but I couldn't find anything. Any practical experiences?

I think the file size can be anything up to 2 GB, perhaps even more. Batch is an interpreted language, so if the interpreter is written sensibly, the file-size limit should simply be the file system's limit. I have never had any errors with batch files being too large, and some of those I created were several MB in size.
There should be a line-length limit, but it has to be more than 256 characters. This is easy to test: do something like "set A=123456789012...endofline", then "echo %A%", and you'll see how far you can go.
It works for me with very long lines (around 4K), but at 8K echo prints "Line too long", so 8192 bytes seems to be a limit of some kind.
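A minimal sketch of that test as a batch file (the doubling loop and the exact failure point are assumptions; the error message varies by Windows version):
@echo off
setlocal EnableDelayedExpansion
rem Build a long value by repeated doubling: 16 characters become 8192 after 9 rounds.
set "A=0123456789ABCDEF"
for /L %%i in (1,1,9) do set "A=!A!!A!"
rem Around the 8191-character mark SET and/or ECHO start to fail;
rem with 8 rounds (4096 characters) both commands behave normally.
echo !A!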
Now tested for file size, too: with "echo off", thousands of set lines and an "echo end" at the bottom, an 11 MB file worked (although it took a few seconds to finish :) - no limit in sight here.
110 MB worked, too. Is this enough? ;)

The maximum length of any command line (or variable) within CMD is 8191 characters.

There are many different limits.
There is no (known) limit for the file itself, and code blocks also seem to be unlimited.
The maximum size of a variable is 8191 characters.
Obviously the limit for a line has to be larger than that, so a maximum-sized variable can be assigned at all.
Even the variable name can be 8191 characters long.
It's possible to build a line with 16387 characters:
set <varname_8191_chars_long>=<content_8191_chars_long>
But the batch parser is able to read much longer lines (tested with 100k in one line).
It is required, though, that the effective length after percent expansion is below ~8191 characters.
Like in
echo %======%%=====%%======%%=====% ..... %=====%END
This works because after the expansion the line contains only echo END.
(One parser bug: the parser drops one character at every multiple of 8192.)

Some ideas, not necessarily mutually exclusive:
Switch to PowerShell.
Switch to a data-driven approach: keep all the variable data in a data file (CSV, plain text, whatever), so that the script itself becomes a small, boilerplate loop that opens the data file and operates on it.
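A minimal sketch of that idea in batch, assuming a hypothetical agents.csv with a header row and semicolon-separated columns:
@echo off
rem Loop over the data file instead of generating thousands of near-identical lines.
rem "agents.csv" and its column layout are made-up names for illustration.
for /F "usebackq skip=1 tokens=1-3 delims=;" %%A in ("agents.csv") do (
    echo Processing agent %%A: calls=%%B talktime=%%C
)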

It should work at least up to 2 GB. The lines are read directly from the BAT file on the disk (there is no caching involved). I make this statement because of the following: in fact, you can edit a BAT file while it is running! And it will keep working even though a text editor may rename the original version and save the new version in a new location on the disk. As long as you are careful not to insert text above the currently executing command, lines can be changed, inserted or deleted below the currently executing command and the new lines will be the ones executed. I have often done this with BAT files containing a long list of wget commands, each taking tens of minutes to execute.

According to Microsoft, there is no limit to a batch file's size. However, a batch file line should not exceed 127 bytes or it will be truncated at execution. (That 127-byte figure comes from the old MS-DOS/COMMAND.COM documentation; as noted above, cmd.exe accepts lines up to 8191 characters.)
See Maximum Line Length and Count for Batch Files & CONFIG.SYS

Related

FB_FileGets vs FB_FileRead in twincat

There are two similar function blocks for reading files in the TwinCAT software from Beckhoff: FB_FileGets and FB_FileRead. I would appreciate it if someone could explain the differences between these function blocks and clarify when to use each of them. Do they have the same prerequisites, and are they used the same way in programs? Which one is faster (for reading different file formats), and is there any other information that would make them clearer for better programming?
FB_FileGets reads the file line by line. So when you call it, you always get one line of the text file as a string. The maximum length of a line is 255 characters. Using this function block, it's very easy to read all lines of a file: no need for buffers and memory copying, if the 255-character line-length limit is OK.
FB_FileRead reads a given number of bytes from the file. So you can read files with, for example, 65000 characters in a single line.
I would use FB_FileGets in all cases where you know that the lines are less than 255 characters and you handle the data line by line. It's very simple to use. If you have no idea of the line sizes, you need all the data at once, or the file is very big, I would use FB_FileRead.
I haven't tested it, but I think FB_FileRead is probably faster, as it just copies the bytes to a buffer. And you can read the whole file at once, not line by line.

VBscript alternatives to writing variables to a file needed (FSO.write is too slow)

TL;DR: I need something way faster than FSO.write OR another way to share a variable in memory between different script instances.
Hello, I am running CCPulse (on Windows 7), which is a call-center monitoring tool. Agents are represented as "objects" and can have various statistics (like calls taken, total talk duration, etc.). CCPulse allows you to apply thresholds and actions to any statistic. These are basically VBScripts and, as far as I can tell, there are no restrictions.
This allows me to take the "Threshold StatValue" and do things with it, i.e. write it to a file. The issue is that if I apply a threshold to a statistic for all agents, the script executes for each agent object separately (in sequence, not in parallel). However, I want to export all the agent stats to a single CSV file.
I already got it working by creating the file if it doesn't exist, then opening it and ReadAll-ing it into a string. If an agent has not been written to the file yet, his stat values get appended as a new line to the string; if he already exists in the file, I search and replace his line using a regex pattern. I then write the entire multiline string back to the file:
' Overwrite the export file with the updated in-memory copy.
Set objFile = objFSO.OpenTextFile(inFile, 2)   ' 2 = ForWriting
objFile.Write strMemoryBuffer
objFile.Close
Set objFile = Nothing
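For context, a minimal sketch of the read/modify step described above; the variable names, the CSV layout and the regex are assumptions rather than the exact production code:
Const ForReading = 1
Dim objFSO, objFile, objRE, strMemoryBuffer, strLine
Set objFSO = CreateObject("Scripting.FileSystemObject")

' Read the existing export file into memory.
Set objFile = objFSO.OpenTextFile(inFile, ForReading)
strMemoryBuffer = objFile.ReadAll
objFile.Close

' Build this agent's line; replace it if it already exists, otherwise append it.
strLine = loginID & ";" & calls & ";" & totalTalkTime
Set objRE = New RegExp
objRE.Pattern = "^" & loginID & ";.*$"
objRE.Multiline = True
If objRE.Test(strMemoryBuffer) Then
    strMemoryBuffer = objRE.Replace(strMemoryBuffer, strLine)
Else
    strMemoryBuffer = strMemoryBuffer & vbCrLf & strLine
End If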
strMemoryBuffer contains the file's original content, with either a new line added or one line modified. This string (and subsequently the export file) is around 30 KB in size after all agents have been exported. It looks like this (simplified):
LoginID;Calls;TotalTalkTime
2243;08;9403
2132;12;8439
As I said, since the script runs separately for each agent, only one line is ever added or modified per pass (CCPulse will execute the script one object at a time, until all are finished).
The write process is very slow, however: using Timer() it takes between 0.10 and 0.15 seconds! That is way too slow, as I need to run the script on almost 500 agents (ideally at intervals of no more than 30 seconds), but all the writing would take over a minute (CCPulse would create a backlog of threshold operations which could never be finished; I can decrease the recalculation frequency, but that is detrimental in other ways).
If I comment out only the above block, execution time decreases dramatically to ~0.02 seconds. So reading the file and manipulating the string takes almost no time at all; just the write process is slow.
I am writing the file locally to a hard drive (no SSD though). I cannot use a RAM Disk.
I also already tried writing to the volatile environment, but somehow this is even slower (it does work, but for some reason the explorer process goes crazy with up to 50% CPU usage and CCPulse locks up, although the export file is still being updated).
The ideal solution would be to have the string manipulated repeatedly in memory only, and then written to the file just once every 30 seconds or so, but I don't know how I can make the strMemoryBuffer variable available to the "next" agent. Any ideas?

True in-place file editing using GNU tools

I have a very large (multiple gigabytes) file that I want to do simple operations on:
Add 5-10 lines in the end of the file.
Add 2-3 lines in the beginning of the file.
Delete a few lines in the beginning, up to a certain substring. Specifically, I need to traverse the file up to a line that says "delete me!\n" and then delete all lines in the file up to and including that line.
I'm struggling to find a tool that can do the editing in place, without creating a temporary file (a very slow operation) that is essentially a copy of my original file. Basically, I want to minimize the number of I/O operations against the disk.
Both sed -i and awk -i do exactly that slow thing (https://askubuntu.com/questions/20414/find-and-replace-text-within-a-file-using-commands) and are inefficient as a result. What's a better way?
I'm on Debian.
Adding lines at the beginning of a multi-GB file will always require fully rewriting the contents of that file, unless you're using an OS and filesystem that provides nonstandard syscalls. (You can avoid needing multiple GB of temporary space by writing back to a point in the file you're modifying from which you've already read into a buffer, but you can't avoid rewriting everything past the point of the edit.)
This is because UNIX only permits adding new content to a file in a manner that changes its overall size at or past its existing end. You can edit part of a file in place -- that is to say, you can seek 1 GB in and write 1 MB of new contents -- but this overwrites the 1 MB of content that had previously been in that location; it doesn't change the total size of the file. Similarly, you can truncate and rewrite a file at a location of your choice, but everything past the point of truncation needs to be rewritten.
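A rough C sketch of the "write back behind your read position" idea, applied to the question's "delete everything up to and including the marker line" case; the file name and marker string come from the question, error handling is minimal, and this is an illustration rather than a hardened tool:
/* Drop everything up to and including the line "delete me!\n" in place:
 * copy the tail of the file forward over the head, then truncate. */
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void) {
    FILE *f = fopen("bigfile", "r+");
    if (!f) { perror("fopen"); return 1; }

    char line[65536];
    long src = -1;                          /* offset just past the marker line */
    while (fgets(line, sizeof line, f)) {
        if (strcmp(line, "delete me!\n") == 0) { src = ftell(f); break; }
    }
    if (src < 0) { fclose(f); return 1; }   /* marker not found: do nothing */

    static char buf[1 << 20];
    long dst = 0;
    size_t n;
    for (;;) {
        fseek(f, src, SEEK_SET);
        n = fread(buf, 1, sizeof buf, f);
        if (n == 0) break;
        src += (long)n;
        fseek(f, dst, SEEK_SET);            /* write behind the read position */
        fwrite(buf, 1, n, f);
        dst += (long)n;
    }
    fflush(f);
    ftruncate(fileno(f), dst);              /* cut off the now-duplicated tail */
    fclose(f);
    return 0;
}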
Examples of the nonstandard operations referred to above are the FALLOC_FL_INSERT_RANGE and FALLOC_FL_COLLAPSE_RANGE operations, which with very new Linux kernels allow blocks to be inserted into or removed from an existing file. This is unlikely to be helpful to you here:
Only whole blocks (i.e. 4 KB, or whatever your filesystem is formatted for) can be inserted, not individual lines of text of arbitrary size.
Only XFS and ext4 are supported.
See the documentation for fallocate(2).
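A minimal C sketch of the collapse operation, assuming a 4096-byte filesystem block size and a kernel/filesystem combination that supports it:
/* Remove the first filesystem block of "bigfile" in place, without
 * rewriting the rest of the file.  Offsets and lengths passed to
 * FALLOC_FL_COLLAPSE_RANGE must be multiples of the block size. */
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void) {
    int fd = open("bigfile", O_RDWR);
    if (fd < 0) { perror("open"); return 1; }

    if (fallocate(fd, FALLOC_FL_COLLAPSE_RANGE, 0, 4096) < 0)
        perror("fallocate(FALLOC_FL_COLLAPSE_RANGE)");   /* e.g. EOPNOTSUPP */

    close(fd);
    return 0;
}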
Here is a recommendation for editing large files (change the line count and the number of suffix digits based on your file length and the number of sections you want to work on):
split -l 1000 -a 4 -d bigfile bigfile_
For that you need extra space, since bigfile won't be removed.
Insert a header as the first line:
sed -i '1iheader' bigfile_0000
Search for a specific pattern, get the file name, and remove the previous sections:
grep pattern bigfile_*
etc.
Once all editing is done, just cat the remaining pieces back together:
cat bigfile_* > edited_bigfile

How to display command output in a whiptail textbox

The whiptail command has an option --textbox that has the following description:
--textbox <file> <height> <width>
The first option requires a file as input; I would like to use the output of a command in its place. It seems like this should be possible in sh or bash. For the sake of the question, let's say I'd like to view the output of ls -l in a whiptail textbox.
Note that process substitution does not appear to work in whiptail (e.g. whiptail --textbox <(ls -l) 40 80 does not work).
This question is a re-asking of this other stackoverflow question, which technically was answered.
For the record, the question says that
whiptail --textbox <(ls -l) 40 80
"doesn't work". It's most certainly worth stating clearly that the nature of the failure is that whiptail displays an empty textbox. (That's mentioned in a comment to an answer to the original question, linked in this question, but that's a pretty obscure place to look for a problem report.)
In 2014, this workaround was available (and was the original contents of this answer):
whiptail --textbox /dev/stdin 40 80 <<<"$(ls -l)"
That will still work in 2022, if ls -l produces enough output (at least 64k on a standard Linux/Bash install).
Another possible workaround is to use a msgbox instead of a textbox:
whiptail --msgbox "$(ls -l)" 40 80
However, that will fail if the output from the command is too large to use as a command-line argument, which might be the case at 128k.
So if you can guess reasonably accurately how big the output will be, one of those solutions will work. Up to around 100k, you can use the msgbox solution; beyond that, you can use the textbox with a here-string.
But that's far from ideal, since it's really hard to reliably guess the size of the output of a command, even within a factor of two.
What will always work is to save the output of the command to a temporary file, then provide the file to whiptail, and then delete the file. (In fact, you can delete the file immediately, since Posix systems don't delete files until there are no open file handles.) But no matter how hard you try, you will occasionally end up with a file which should have been deleted. On Linux, your best bet is to create temporary files in the /tmp directory, which is an in-memory filesystem which is emptied automatically on reboot.
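A minimal sketch of that temporary-file approach (the trap-based cleanup is just one reasonable way to do the deletion):
# Capture the command output in a temporary file, show it, then clean up.
tmpfile=$(mktemp) || exit 1
trap 'rm -f "$tmpfile"' EXIT      # best-effort cleanup, even on early exit
ls -l > "$tmpfile"
whiptail --textbox "$tmpfile" 40 80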
Why does this happen?
I came up with the solution above, eight years prior to this edit, on the assumption that OP was correct in their guess that the problem had to do with not being able to seek() a process substituted fd. Indeed, it's true that you can't seek() /dev/fd/63. It was also true at the time that bash implemented here-strings and here-docs by creating a temporary file to hold the expanded text, and then redirecting standard input (or whatever fd was specified) to that temporary file. So the suggested workaround did work; it ensured that /dev/stdin was just a regular file.
But in 2022, the same question was asked, this time because the suggested workaround failed. As it turns out, the reason is that Bash v5.1, which was released in late 2020, attempts to optimise small here-strings and here-docs:
c. Here documents and here strings now use pipes for the expanded document if it's smaller than the pipe buffer size, reverting to temporary files if it's larger.
(from the Bash CHANGES file; changes between bash 5.1alpha and bash 5.0, in section 3, New features in Bash.)
So with the new Bash version, unless the here-string is bigger than a pipe buffer (on Linux, 16 pages), it will no longer be a regular file.
One slightly confusing aspect of this issue is that whiptail does not, in fact, try to call lseek() on the textbox file. So the initial guess about the nature of the problem was not exact. That's not all that surprising, since using lseek() on a FIFO to position the stream at SEEK_END produces an explicit error, and it's reasonable to expect software to actually report error returns. But whiptail does not attempt to get the filesize by seeking to the end. Instead, it fstat()s the file and gets the file size from the st_size field in the returned struct stat. It then allocates exactly enough memory to hold the contents of the file, and reads the indicated number of bytes.
Of course, fstat cannot report the size of a FIFO, since that's not known until the FIFO is completely drained. But unlike lseek, that's not considered an error. fstat is documented as not filling in the st_size field on FIFOs, sockets, character devices, and other stream types. As it happens, on Linux the st_size field is filled in as 0 for such file descriptors, but Posix actually allows it to be unset. In any case, there is no error indication, and it's essentially impossible to distinguish between a stream which doesn't have a known size and a stream which is known to have size 0, like an empty file. Thus, whiptail treats a FIFO as though it were an empty file, reading 0 bytes and presenting an empty textbox.
What about dialog?
One alternative to whiptail is Dialog, currently maintained by Thomas Dickey. You can often directly substitute dialog for whiptail, and it has some additional widgets which can be useful. However, it does not provide a simple solution in this case.
Unlike whiptail, dialog's textbox attempts to avoid reading the entire file into memory before drawing the widget. As a result, it does depend on lseek() in order to read out of order, and thus cannot work on special files at all. Attempting to use a FIFO with dialog produces an error message, rather than drawing an empty textbox; that makes diagnosis easier but doesn't really solve the underlying problem. Dialog does have a variety of widgets which can read from stdin, but as far as I know none of them allow scrolling, so they're only useful if the command output fits in a single window. But it's possible that I've missed something.
Drawing out a moral: just read until you reach the end
(Skip this section if you're only interested in using command-line utilities, not writing them.)
The tragic part of this complicated tale is that it was all completely unnecessary. Whiptail is going to read the entire file in any case; trying to get the size of the file before reading was either laziness or a misguided attempt to optimise. Had the code just read until an end of file indication, possibly allocating more memory as needed, all these problems would have been avoided. And not just these problems. As it happens, Posix does not guarantee that the st_size field is accurate even for apparently normal files. Stat is only required to report accurate sizes for symlinks (the length of the link itself) and shared memory objects. The Linux documentation indicates that st_size will be returned as 0 for certain automatically-generated files:
For example, the value 0 is returned for many files under the /proc directory, while various files under /sys report a size of 4096 bytes, even though the file content is smaller. For such files, one should simply try to read as many bytes as possible (and append '\0' to the returned buffer if it is to be interpreted as a string). (from man 7 inode).
lseek will also fail on many autogenerated files, as well as sockets, FIFOs and character devices. So the more common idiom for this particular optimization is also unreliable, as well as leading to TOCTOU-like race conditions when the file is truncated or appended to while it is being read.
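For utility authors, a minimal sketch of that "just read until you reach the end" idiom in C, growing the buffer as needed instead of trusting st_size (the function name is made up):
/* Read an entire stream into memory without asking for its size first.
 * Works on regular files, pipes, FIFOs and /proc files alike. */
#include <stdio.h>
#include <stdlib.h>

char *slurp(FILE *f, size_t *out_len) {
    size_t cap = 4096, len = 0, n;
    char *buf = malloc(cap);
    if (!buf) return NULL;
    while ((n = fread(buf + len, 1, cap - len, f)) > 0) {
        len += n;
        if (len == cap) {                  /* buffer full: double it */
            char *tmp = realloc(buf, cap *= 2);
            if (!tmp) { free(buf); return NULL; }
            buf = tmp;
        }
    }
    *out_len = len;
    return buf;                            /* caller frees */
}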

compare 2 files and copy source if different from destination - vbscript?

I'm working on Windows XP and I need to make a script that would compare 2 files (1 on a server and 1 on a client). Basically, I need my script to check if the file from the client is different from the server version and replace the client version if it finds a difference (in the file itself, not only the modification date).
As you suggest, you can skip the date check as that can be changed without the contents changing.
First check whether the sizes are different. If so, that may be enough to conclude that the files are different. This can produce false positives too, though, depending on the types of files. For example, a Unicode text file may contain the exact same content as an ANSI text file, but be encoded with two bytes per character. If it's a script, it would execute with exactly the same results, but be twice the size.
If the sizes are the same, they may still contain different bytes. The brute force test would be to load each file into a string and compare them for equality. If they are big files and you don't want to read them all into memory if not necessary, then read them line by line until you encounter a difference. That's assuming they are text files. If they aren't text files, you can do something similar by reading them in fixed size chunks and comparing those.
Another option would be to run the "fc" file compare command on the two files, capture the result, and do your update based on that.
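A minimal VBScript sketch of the size-then-content approach described above (the paths are placeholders, and the ReadAll comparison assumes the files fit comfortably in memory):
' Compare the server copy with the client copy; overwrite the client copy on mismatch.
Const ForReading = 1
Dim fso, fServer, fClient, ts1, ts2, same
Set fso = CreateObject("Scripting.FileSystemObject")
Set fServer = fso.GetFile("\\server\share\config.txt")
Set fClient = fso.GetFile("C:\client\config.txt")

same = (fServer.Size = fClient.Size)
If same And fServer.Size > 0 Then
    ' Sizes match, so compare the actual contents.
    Set ts1 = fServer.OpenAsTextStream(ForReading)
    Set ts2 = fClient.OpenAsTextStream(ForReading)
    If ts1.ReadAll <> ts2.ReadAll Then same = False
    ts1.Close
    ts2.Close
End If

If Not same Then fso.CopyFile fServer.Path, fClient.Path, True   ' True = overwrite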
