FB_FileGets vs FB_FileRead in TwinCAT

There are two similar function blocks for reading files in Beckhoff's TwinCAT software: FB_FileGets and FB_FileRead. I would appreciate it if someone could explain the differences between these function blocks and when to use each of them. Do they have the same prerequisites, and are they used the same way in programs? Which one is faster (for reading different file formats), and is there any other information that would make the choice clearer for better programming?

FB_FileGets reads the file line by line. Each call returns one line of the text file as a string. The maximum length of a line is 255 characters. This makes it very easy to read all the lines of a file: no buffers or memory copying are needed, as long as the 255-character line limit is acceptable.
FB_FileRead reads a given number of bytes from the file, so you can read files that have, for example, 65000 characters on a single line.
I would use FB_FileGets in all cases where you know the lines are shorter than 255 characters and you handle the data line by line; it's very simple to use. If you have no idea of the line sizes, if you need all the data at once, or if the file is very big, I would use FB_FileRead.
I haven't tested it, but I think FB_FileRead is probably faster, as it just copies bytes into a buffer, and you can read the whole file at once rather than line by line.
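Structured Text aside, the contrast maps onto a pattern most languages share: a line-oriented reader with a bounded line length versus a raw read of N bytes into a caller-supplied buffer. Here is a rough sketch of the two access patterns in Go (the file name is hypothetical; this is an analogy, not Beckhoff's API):

package main

import (
    "bufio"
    "fmt"
    "io"
    "os"
)

func main() {
    // FB_FileGets-style access: one line per call, with a hard cap
    // on line length (255 characters in the TwinCAT block).
    f, err := os.Open("datafile.txt") // hypothetical file name
    if err != nil {
        panic(err)
    }
    scanner := bufio.NewScanner(f)
    scanner.Buffer(make([]byte, 256), 256) // mimic the fixed line-length cap
    for scanner.Scan() {
        fmt.Println("line:", scanner.Text())
    }
    if err := scanner.Err(); err != nil {
        fmt.Println("read error (e.g. line too long):", err)
    }
    f.Close()

    // FB_FileRead-style access: read a given number of bytes into a
    // buffer in one call, regardless of line structure.
    f2, err := os.Open("datafile.txt")
    if err != nil {
        panic(err)
    }
    defer f2.Close()
    buf := make([]byte, 65000)
    n, err := f2.Read(buf)
    if err != nil && err != io.EOF {
        panic(err)
    }
    fmt.Printf("read %d bytes in one call\n", n)
}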

Related

readString vs readLine

I am writing an application that reads from a list of files, line by line, and does some processing. I want to use as little RAM as I can.
I came across this answer: https://stackoverflow.com/a/41741702/3531263
The poster says ReadString uses more RAM than ReadLine, and they have posted some code.
What I don't understand is how one uses more RAM. Ultimately, the way their code is written, they are still writing an entire line to their buffer. So wouldn't that mean that if they had just used ReadString, it would have been the same thing?
the way their code is written, they are still writing an entire line to their buffer
Their code, yes. Your code might not need the whole line to be in memory at the same time. For example, suppose your program is filtering a log file by request ID, which is at the beginning of the line. It doesn't need to read a whole line that may be a few megabytes or more, only to reject it because of a wrong request ID. But with ReadString you don't have that choice.
I agree with Sergio. Also, have a look at the current implementation in the standard library. ReadLine calls ReadSlice('\n') once, then runs through a few branches to make sure the appropriate sentinel values or errors are returned with the converted data. On the other hand, ReadBytes and ReadString both loop over repeated calls to ReadSlice(delim), so it follows that they would necessarily be copying at least as much data into memory as ReadLine, and potentially much more if the delimiter wasn't found in the first call.
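As a minimal sketch of that difference (a hypothetical log-filtering loop; the tiny buffer size is just to force chunking): ReadLine hands back a slice of bufio's internal buffer, at most one buffer's worth at a time, while ReadString keeps allocating until it holds the entire line.

package main

import (
    "bufio"
    "bytes"
    "fmt"
    "strings"
)

func main() {
    input := "req=42 interesting payload\nreq=7 uninteresting payload\n"

    // ReadLine returns a slice into the reader's internal buffer. With
    // a small buffer, a long line arrives in chunks (isPrefix == true),
    // so we can inspect the start of the line and skip the rest without
    // ever holding the whole line in memory.
    r := bufio.NewReaderSize(strings.NewReader(input), 16)
    for {
        chunk, isPrefix, err := r.ReadLine()
        if err != nil {
            break
        }
        if bytes.HasPrefix(chunk, []byte("req=42")) {
            fmt.Printf("matched line starting with: %s\n", chunk)
        }
        for isPrefix { // discard the remainder of an overlong line
            _, isPrefix, err = r.ReadLine()
            if err != nil {
                break
            }
        }
    }

    // ReadString loops over ReadSlice('\n') internally, appending until
    // it finds the delimiter, so the entire line is always allocated.
    r2 := bufio.NewReaderSize(strings.NewReader(input), 16)
    for {
        line, err := r2.ReadString('\n')
        if len(line) > 0 {
            fmt.Printf("whole line in memory: %d bytes\n", len(line))
        }
        if err != nil {
            break
        }
    }
}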

Accessing memory location using pseudo "file handle" in MATLAB

There are lots of questions about dealing with large data sets by avoiding loading the whole thing into memory. My question is kind of the opposite: I've written code that reads files line by line to avoid memory overflow problems. However, I've just been given access to a powerful workstation with several hundred GB of memory, which removes that problem and makes disk access the bottleneck.
Thing is, my code is written to access data files line by line using functions like fgetl. Is it possible for me to somehow replace the file handle f = fopen('datafile.txt') with something else that acts in exactly the same way with respect to functions reading from a file, but instead of reading from the disk just returns values stored in memory?
I'm thinking of, for example, having a large cell array with the contents of the file split by line, where fgetl just returns the next one. If I have to write my own wrapper for this, how should I go about it?
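The pattern being asked about, hiding an in-memory copy behind the same read-a-line interface that a file handle exposes, is not MATLAB-specific. A minimal sketch of the idea in Go (the file name is hypothetical; this is an illustration, not a MATLAB solution), where the consuming code cannot tell whether it is reading from disk or from memory:

package main

import (
    "bufio"
    "fmt"
    "io"
    "os"
    "strings"
)

// processLines plays the role of the fgetl-driven code: it only sees
// an io.Reader, so a disk file and an in-memory buffer look identical.
func processLines(r io.Reader) {
    scanner := bufio.NewScanner(r)
    for scanner.Scan() {
        fmt.Println("line:", scanner.Text())
    }
}

func main() {
    // Reading line by line from disk.
    if f, err := os.Open("datafile.txt"); err == nil { // hypothetical file
        processLines(f)
        f.Close()
    }

    // The same code, but the "file" lives entirely in memory.
    inMemory := "first line\nsecond line\n"
    processLines(strings.NewReader(inMemory))
}

In MATLAB terms, the analogous wrapper would read the whole file into a cell array of lines once and hand out the next line on each call, exactly as suggested above.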

Trim Illumina reads in a bam/sam file

I have found plenty of tools for trimming reads in a fastq format, but are there any available for trimming already aligned reads?
I would personally discourage trimming reads after they have been aligned, especially if the sequences you're trying to trim are adapter sequences.
The presence of these adapter sequences will prevent your reads from aligning properly to the genome (you'll get a much lower percentage of alignments than you should, in my experience). Since your alignment is already inaccurate, it is quite pointless to trim the sequences after alignment (garbage in, garbage out).
You'll be much better off trimming the fastq files before aligning them.
Do you want the alignment to inform the trimming protocol, or do you want to trim on things like quality values? One approach would be to simply convert back to FASTQ and then use any of the myriad conventional trimming tools available. You can do this with Picard:
http://picard.sourceforge.net/command-line-overview.shtml#SamToFastq
One possibility would be to use the GATK toolset, for example ClipReads. If you want to remove adapters, you can use ReadAdaptorTrimmer. No converting back to FASTQ is needed (documentation: http://www.broadinstitute.org/gatk/gatkdocs/).
Picard is, of course, another possibility.
One scenario for trimming reads in a BAM file is when you want to normalize the reads to the same length after you have already done a substantial amount of alignment work. Remapping after trimming the FASTQ reads wastes that effort, so trimming the reads in place in the BAM file is the preferable solution.
Try reformat.sh from the BBMap package, which can trim reads and accepts BAM as an input format.
reformat.sh in=test.bam out=test_trim.bam allowidenticalnames=t overwrite=true forcetrimright=74 sam=1.4
## The default output format of reformat.sh is SAM 1.4; however, many tools only recognize version 1.3, so the following step converts the 1.4 output to version 1.3.
reformat.sh in=test_trim.bam out=test_trim_1.3.bam allowidenticalnames=t overwrite=true sam=1.3

compare 2 files and copy source if different from destination - vbscript?

I'm working on Windows XP and I need to make a script that compares 2 files (1 on a server and 1 on a client). Basically, I need my script to check whether the client's file is different from the server version, and replace the client version if it finds a difference (in the file itself, not just the modification date).
As you suggest, you can skip the date check as that can be changed without the contents changing.
First check whether the sizes are different; if so, that may be enough to conclude that the files are different. Depending on the types of files, though, this can produce false positives: for example, a Unicode text file may contain exactly the same content as an ANSI text file but be encoded with two bytes per character. If it's a script, it would execute with exactly the same results, yet be twice the size.
If the sizes are the same, the files may still contain different bytes. The brute-force test would be to load each file into a string and compare the two for equality. If they are big files and you don't want to read them entirely into memory unless necessary, read them line by line until you encounter a difference. That assumes they are text files; if they aren't, you can do something similar by reading them in fixed-size chunks and comparing those.
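For illustration, here is that size-check-then-chunk-compare logic sketched in Go rather than VBScript (the paths are hypothetical):

package main

import (
    "bytes"
    "fmt"
    "io"
    "os"
)

// filesDiffer reports whether two files differ, checking sizes first
// and then comparing fixed-size chunks so that large files are never
// held in memory all at once.
func filesDiffer(pathA, pathB string) (bool, error) {
    infoA, err := os.Stat(pathA)
    if err != nil {
        return false, err
    }
    infoB, err := os.Stat(pathB)
    if err != nil {
        return false, err
    }
    if infoA.Size() != infoB.Size() {
        return true, nil // different sizes: the bytes must differ
    }

    fa, err := os.Open(pathA)
    if err != nil {
        return false, err
    }
    defer fa.Close()
    fb, err := os.Open(pathB)
    if err != nil {
        return false, err
    }
    defer fb.Close()

    bufA := make([]byte, 64*1024)
    bufB := make([]byte, 64*1024)
    for {
        nA, errA := io.ReadFull(fa, bufA)
        nB, _ := io.ReadFull(fb, bufB) // sizes match, so fb tracks fa
        if !bytes.Equal(bufA[:nA], bufB[:nB]) {
            return true, nil
        }
        if errA == io.EOF || errA == io.ErrUnexpectedEOF {
            return false, nil // end of both files, no difference found
        }
        if errA != nil {
            return false, errA
        }
    }
}

func main() {
    // Hypothetical server and client paths.
    differ, err := filesDiffer(`\\server\share\app.cfg`, `C:\client\app.cfg`)
    if err != nil {
        panic(err)
    }
    if differ {
        fmt.Println("files differ: copy the server version to the client")
    }
}

In VBScript itself, the same shape can be built from FileSystemObject's File.Size property for the size check and TextStream.Read(n) for the fixed-size chunks.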
Another option would be to run the "fc" file compare command on the two files, capture the result, and do your update based on that.

File-size and line-length limits for Windows batch files

I am generating a Windows batch file that might become quite large, say a few megabytes. I've searched for possible limits regarding the maximum file size and maximum length of a line in a batch file, but I couldn't find anything. Any practical experiences?
I think the file size can be anything up to 2 GB, perhaps even more. Batch is an interpreted language, so if the interpreter is written sensibly, the file-size limit should simply be the file-size limit of the file system. I never had any errors with batch files being too large, and some of those I created were several MB in size.
There should be a line-length limit, but it should be more than 256. This is easy to test: do something like "set A=123456789012...endofline", then "echo %A%", and you'll see how far you can go.
It works for me with very long lines (around 4K), but at 8K echo gives the message "Line too long", so 8192 bytes appears to be the limit.
Now I've tested the file size too, with "echo off", thousands of set lines, and then "echo end": it worked for an 11 MB file (although it took some seconds to finish :) - no limit in sight here.
110 MB worked, too. Is this enough? ;)
The maximum length of any command line (or variable) within CMD is 8191 characters.
There are many different limits.
There is no (known) limit for the file itself, and code blocks also seem to be unlimited.
The maximal size of a variable is 8191 characters.
Obviously, the limit for a line has to be larger than that, in order to assign a maximal-sized variable.
Even the variable name can be 8191 characters long.
So it's possible to build a line with 16387 characters (8191 for the name, 8191 for the content, plus "set " and the "="):
set <varname_8191_chars_long>=<content_8191_chars_long>
But the batch parser is able to read much longer lines (tested with 100k in one line).
But it is required that the effective length is less than ~8191 characters after percent expansion, as in:
echo %======%%=====%%======%%=====% ..... %=====%END
This works because after the expansion the line contains only echo END.
(One parser bug: the parser drops one character at every multiple of 8192.)
Some ideas, not necessarily mutually exclusive:
Switch to PowerShell.
Switch to a data-driven application, so that all the variable stuff is kept in a data file (CSV, text, whatever); as a result you can have a smaller boilerplate script that opens the data file and operates on it.
It should work at least up to 2 GB. The lines are read directly from the BAT file on the disk (there is no caching involved). I make this statement because of the following: you can in fact edit a BAT file while it is running! And it will keep working, even though a text editor may rename the original version and save the new version in a new location on the disk, as long as you are careful not to insert text above the currently executing command. Lines can be changed, inserted, or deleted below the currently executing command, and the new lines will be the ones executed. I have often done this with BAT files containing a long list of wget commands, each taking tens of minutes to execute.
According to Microsoft, there is no limit to the size of a batch file. However, a batch-file line should not exceed 127 bytes or it will be truncated at execution.
See Maximum Line Length and Count for Batch Files & CONFIG.SYS
