How to read a PS file in reverse order? - sorting

I have a PS (physical sequential) file that needs to be read in reverse order and processed accordingly. Is there a way to specify in the FD of a COBOL module that the file should be read in reverse order? Or is there something to achieve the same using SORT?
Note: Reading the records into a buffer (array) and using it in reverse order is the first idea that comes to mind, but that approach doesn't work for a file with millions of records.
Your suggestions will be appreciated.

I do not believe there is a standard method for doing this in COBOL. However, if the file contains fixed-length records you might try processing it as a relative file and just run through it by descending record number. The other option is, as you suggest, to sort it in reverse order and then process it as "normal".

If the device the file is on supports it, you can use "OPEN INPUT fname REVERSED". But the file will need to be on a tape or a device that is pretending to be a tape.

Some versions of COBOL support a READ LAST statement to get the last record on the file. Then use READ PRIOR to read the file in reverse order. Not sure what COBOL version you're working with.

Related

Ruby miscalculation of MD5 for file

I am calculating an MD5 sum for a file to compare it with values supplied in a text file. I use the following line to create the checksum:
cksum = File.open(File.join(File.dirname(path), file), 'rb') do |f|
  MD5.hexdigest(f.read)
end
Every once in a while I get one that does not match but running the md5 manually at the system level shows the file has the correct MD5.
Does anyone see any issue with the process I am using to calculate the MD5 value or have any idea why they sometimes do not match when calculated by this ruby method?
For followers, there's also a method for hashing a file directly: Digest::MD5.file('filename').hexdigest
At this point MD5 is a well-exercised message digest with an extensive suite of test vectors. It is extremely unlikely that there is an issue with Ruby's implementation of it.
There is almost certainly another explanation, such as the file not yet having been fully written (i.e. by another process) when your checksum code executes. In troubleshooting, it may be helpful to note the length of the result from f.read and verify it against the file size. You could even save the read contents to a separate file for later comparison when you discover a discrepancy. That could offer a clue.
You're correctly opening the file with binary mode, so that is good.

How to split a large csv file into multiple files in GO lang?

I am a novice Go programmer trying to learn the language's features. I want to split a large CSV file into multiple files in Go, each file containing the header. How do I do this? I have searched everywhere but couldn't find the right solution. Any help in this regard will be greatly appreciated.
Also, please suggest a good book for reference.
Thank you.
Depending on your shell-fu this problem might be better suited to common shell utilities, but you specifically mentioned Go.
Let's think through the problem.
How big is this CSV file? Are we talking 100 lines or is it 5 GB?
If it's smallish I typically use this:
http://golang.org/pkg/io/ioutil/#ReadFile
However, this package also exists:
http://golang.org/pkg/encoding/csv/
Regardless - let's return to the abstraction of the problem. You have a header (which is the first line) and then the rest of the document.
So what we probably want to do (if ignoring csv for the moment) is to read in our file.
Then we want to split the file body by all the newlines in it.
You can use this to do so:
http://golang.org/pkg/strings/#Split
You didn't mention it, but do you know how many files you want to split into, or would you rather split by line count or byte count? What's the actual limitation here?
Generally it's not going to be file count, but if we pretend it is, we simply divide our line count by our expected file count to get lines per file.
Now we can take slices of the appropriate size and write the file back out via:
http://golang.org/pkg/io/ioutil/#WriteFile
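Putting those pieces together, here's a rough sketch of that approach in Go. The input name big.csv, the part_N.csv output names, and the fixed count of four output files are placeholder choices, and it uses the naive newline split discussed above (encoding/csv is the safer route if fields can contain quoted newlines):
package main

import (
	"fmt"
	"io/ioutil"
	"strings"
)

func main() {
	// Read the whole file at once; fine for smallish files, as noted above.
	data, err := ioutil.ReadFile("big.csv")
	if err != nil {
		panic(err)
	}

	// Naive newline split, ignoring CSV quoting for the moment.
	lines := strings.Split(strings.TrimRight(string(data), "\n"), "\n")
	header, body := lines[0], lines[1:]

	const numFiles = 4 // placeholder: pretend the limitation is file count
	linesPerFile := (len(body) + numFiles - 1) / numFiles

	for i := 0; i < numFiles; i++ {
		start := i * linesPerFile
		if start >= len(body) {
			break
		}
		end := start + linesPerFile
		if end > len(body) {
			end = len(body)
		}

		// Each chunk gets its own copy of the header line.
		chunk := append([]string{header}, body[start:end]...)
		out := strings.Join(chunk, "\n") + "\n"

		name := fmt.Sprintf("part_%d.csv", i+1)
		if err := ioutil.WriteFile(name, []byte(out), 0644); err != nil {
			panic(err)
		}
	}
}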
A trick I use sometimes to help me think through these things is to write down our mission statement.
"I want to split a large csv file into multiple files in go"
Then I start breaking that up into pieces but take the divide/conquer approach - don't try to solve the entire problem in one go - just break it up to where you can think about it.
Also - make gratuitous use of pseudo-code until you can comfortably write the real code itself. Sometimes it helps to just write a short comment inline with how you think the code should flow and then get it down to the smallest portion that you can code and work from there.
By the way - many of the golang.org packages have example links where you can run the example code right in your browser and copy/paste it into your own local environment.
Also, I know I'll catch some haters with this - but as for books - imo - you are going to learn a lot faster just by trying to get things working rather than reading. Action trumps passivity always. Don't be afraid to fail.
Here is a package that might help. You can set the chunk size you need in bytes and the file will be split into the appropriate number of chunks.

Why are files returned by a For Each loop sorted, but not always?

I'm not sure if this is the correct place to post this question because I have a hunch that the behavior I witness will also be observed using other methods. But anyway, here it goes.
I have a VBscript that contains code like this:
For Each objFile In colFiles
...
Next
I've been running this code for quite some time on many different systems. I never bothered to order the files alphabetically. But today I found out by accident that the logic of my program depends on it. I ran the code on a new system (under Citrix) and the files were returned in a seemingly random order.
Does anybody know why Windows sometimes returns the files sorted alphabetically while sometimes it doesn't?
Added note: It might be relevant to note that the script as well as the input folder are on a network share (where my script outputs randomly ordered files).
Ordering is not supported for FileSystemObject. See KB 189751 http://support.microsoft.com/kb/189751/en-us
Also check out this SO answer on how to deal with that: Order of Files collection in FileSystemObject
The docs do not specify an ordering. Thus, you cannot depend on it to have an order. The Files property needs to ask the underlying file system for the files, and then gives them to you as is, without any processing. If that file system happens to return the files in order, that's great. If not, you'll have to sort them yourself. Regardless of whether they happen to come back in order, you should always sort them if you expect a certain order, because the implementation may change tomorrow (as you've just witnessed).
It depends on what data structure you are looping through.
You will obviously get a different order from a foreach loop over an array versus a hash set, for example.
Personally, I don't know anything about VB, but it does work this way in C#.

How do I delete data/characters from a file? [duplicate]

This question already has answers here:
How do I insert and delete some characters in the middle of a file?
I'm writing a program to edit a txt file.
But I found that the Windows API WriteFile can only add data/characters to a file; it cannot delete data from a file.
The only solution I've come up with is to read the whole file into a buffer using ReadFile, then use a loop to shift the data one element at a time, and then replace the old file with the new file. But I think this will probably make my program really slow.
Can anyone help, please? Thanks.
If you're trying to delete from the end of the file it can be very fast with truncate() and ftruncate().
Where are you trying to delete the data from? If it's from the middle, you'll have to do some copying: if the file contains "ABCDEFG" and you want to delete "DEF", use fseek() to get to "G", copy "G" into a buffer, fseek() back to where "D" was (right after "C"), then write() the buffer there. Then truncate the file to the correct size with ftruncate().
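Here is that same seek/overwrite/truncate sequence sketched in Go rather than with the raw Win32 calls, just to make the steps concrete; the file name and offsets are made up for illustration:
package main

import (
	"fmt"
	"io"
	"os"
)

// removeRange deletes length bytes starting at offset by copying everything
// after the removed range down over it, then truncating the file.
// Sketch only: it assumes offset+length is within the file and loads the
// whole tail into memory, which you would do chunk-by-chunk for big files.
func removeRange(path string, offset, length int64) error {
	f, err := os.OpenFile(path, os.O_RDWR, 0)
	if err != nil {
		return err
	}
	defer f.Close()

	info, err := f.Stat()
	if err != nil {
		return err
	}
	size := info.Size()

	// Read the tail that follows the range being removed.
	tail := make([]byte, size-(offset+length))
	if _, err := f.ReadAt(tail, offset+length); err != nil && err != io.EOF {
		return err
	}

	// Write the tail back, starting where the removed range began.
	if _, err := f.WriteAt(tail, offset); err != nil {
		return err
	}

	// Drop the now-duplicated bytes at the end.
	return f.Truncate(size - length)
}

func main() {
	// With a file containing "ABCDEFG", removing 3 bytes at offset 3
	// ("DEF") leaves "ABCG".
	if err := removeRange("example.txt", 3, 3); err != nil {
		fmt.Println(err)
	}
}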
If this really becomes a performance issue for you, you'll want to either design your file in a way that accounts for this or use a database of some kind. You may also want to use memory-mapped files, but usually this is better done by a database that someone else wrote instead of reinventing the wheel.
Files are linear streams of data. If you want to remove content from a file, you must re-write all the content of the file that follows the part that you have removed. So, unless the content to be removed is at the end of the file, you will need to perform some writing. In the worst-case scenario, in order to remove the first byte of a file, you need to re-write the entire file apart from the byte that you removed.
FWIW, Raymond Chen wrote a nice article on this subject: How do I delete bytes from the beginning of a file?

Prepending to a multi-gigabyte file

What would be the most performant way to prepend a single character to a multi-gigabyte file (in my practical case, a 40 GB file)?
There is no limitation on the implementation to do this. Meaning it can be through a tool, a shell script, a program in any programming language, ...
There is no really simple solution. There are no system calls to prepend data, only append or rewrite.
But depending on what you're doing with the file, you may get away with tricks.
If the file is used sequentially, you could make a named pipe, run cat onecharfile.txt bigfile > namedpipe, and then use "namedpipe" as the file. The same can be achieved with cat onecharfile.txt bigfile | program if your program takes stdin as input.
For random access a FUSE filesystem could be done, but that is probably way too complicated for this.
If you want to get your hands really dirty, figure out how to:
allocate a data block (read up on inode and data block structures)
insert it into the file's chain as the second block (or the first, and then you're practically done)
write the beginning of the file into that block
write the single character as the first byte in the file
mark the first block as using only one byte of its available payload (this is possible for the last block; I don't know if it's possible for blocks in the middle of a file's chain).
This has the potential to majorly wreck your filesystem though, so it's not recommended; good fun, though.
Let the file have an initial block of null characters. When you prepend a character, read the block, insert the character right-to-left, and write back the block. When the block is full, do the more expensive full rewrite in order to prepend another null block. That way, you reduce the number of full rewrites you have to do by a large factor.
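A rough Go sketch of that scheme, to make it concrete. The block size, the file name, and the rule that the real data never starts with a NUL byte are all assumptions of this sketch, not part of the original suggestion:
package main

import (
	"io"
	"os"
	"path/filepath"
)

// padSize is the size of the reserved block of NUL bytes at the front of
// the file; the value is an arbitrary choice for this sketch.
const padSize = 4096

// prependByte writes the new byte into the rightmost free slot of the pad.
// Assumptions made by this sketch: the file was created with padSize NUL
// bytes in front of the real data, and the real data never starts with a
// NUL byte (so the first non-NUL byte marks the end of the pad). Readers
// of the file must skip the leading NUL bytes.
func prependByte(path string, b byte) error {
	f, err := os.OpenFile(path, os.O_RDWR, 0)
	if err != nil {
		return err
	}
	defer f.Close()

	pad := make([]byte, padSize)
	if _, err := f.ReadAt(pad, 0); err != nil {
		return err
	}

	// Count the free (still NUL) slots at the front of the pad.
	free := 0
	for free < padSize && pad[free] == 0 {
		free++
	}
	if free == 0 {
		// Pad exhausted: slow path, rewrite the file behind a fresh pad.
		return rewriteWithNewPad(path, b)
	}

	// Fast path: one small in-place write, no rewrite of the big file.
	_, err = f.WriteAt([]byte{b}, int64(free-1))
	return err
}

// rewriteWithNewPad streams the whole file into a temporary file behind a
// new pad of NUL bytes followed by the new byte, then swaps it into place.
func rewriteWithNewPad(path string, b byte) error {
	src, err := os.Open(path)
	if err != nil {
		return err
	}
	defer src.Close()

	tmp, err := os.CreateTemp(filepath.Dir(path), "prepend-*")
	if err != nil {
		return err
	}

	if _, err := tmp.Write(make([]byte, padSize-1)); err != nil {
		tmp.Close()
		return err
	}
	if _, err := tmp.Write([]byte{b}); err != nil {
		tmp.Close()
		return err
	}
	if _, err := io.Copy(tmp, src); err != nil {
		tmp.Close()
		return err
	}
	if err := tmp.Close(); err != nil {
		return err
	}
	return os.Rename(tmp.Name(), path)
}

func main() {
	// "padded.dat" is a made-up name; the file must start out with padSize
	// NUL bytes. Prepending 'c', then 'b', then 'a' leaves the logical
	// content starting with "abc".
	for _, b := range []byte{'c', 'b', 'a'} {
		if err := prependByte("padded.dat", b); err != nil {
			panic(err)
		}
	}
}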
Added: Keep the file in two subfiles: A (a short one) and B (a long one). Prepend to A any way you like. When A gets "big enough", prepend A to B (by re-writing), and clear A.
Another way: Keep the file as a directory of small files ..., A000003, A000002, A000001.
Just prepend to the largest-numbered file. When it's big enough, make the next file in sequence.
When you need to read the file, just read them all in descending order.
You might be able to invert your implementation depending on your problem: append single characters to the end of your file. When it comes time to read the file, read it in reverse.
Hide this behind enough of an abstraction layer and it may not make a difference to your code how the bytes are physically stored.
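For example, a small Go sketch of that abstraction; the type and file name are made up, and the physical file simply stores the bytes in reverse order:
package main

import (
	"fmt"
	"os"
)

// reversedFile hides the storage trick behind a small abstraction: every
// "prepend" is really an append, and the physical file holds the bytes in
// reverse order. Sketch only: it assumes the file is written exclusively
// through Prepend, and Contents reverses the whole file in memory, which
// you would replace with backwards, chunked reads for a 40 GB file.
type reversedFile struct {
	path string
}

func (r reversedFile) Prepend(b byte) error {
	f, err := os.OpenFile(r.path, os.O_APPEND|os.O_CREATE|os.O_WRONLY, 0644)
	if err != nil {
		return err
	}
	defer f.Close()
	_, err = f.Write([]byte{b})
	return err
}

func (r reversedFile) Contents() ([]byte, error) {
	data, err := os.ReadFile(r.path)
	if err != nil {
		return nil, err
	}
	// Reverse in place so the most recently prepended byte comes out first.
	for i, j := 0, len(data)-1; i < j; i, j = i+1, j-1 {
		data[i], data[j] = data[j], data[i]
	}
	return data, nil
}

func main() {
	rf := reversedFile{path: "reversed.dat"}
	for _, b := range []byte("cba") {
		if err := rf.Prepend(b); err != nil {
			panic(err)
		}
	}
	data, err := rf.Contents()
	if err != nil {
		panic(err)
	}
	fmt.Println(string(data)) // "abc" if the file started out empty
}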
If you use Linux you could try a custom version of read(2) loaded with LD_PRELOAD and have it prepend your data on the first read.
See https://zlibc.linux.lu/zlibc.html for implementation inspiration.
If you mean prepending that character to the start of the entire file, one way is:
$ echo "C" > tmp
$ cat my40gbfile >> tmp
$ mv tmp my40gbfile
Or using sed:
$ sed -i '1i C' my40gbfile
If you mean prepending the character to every line of the file:
$ awk '{print "C"$0}' my40gbfile > temp && mv temp my40gbfile
As I understand, this is handled on the file system level, meaning if you prepend data to a file, it effectively rewrites the file. This is the same reason why the ID3 tags in MP3 files are zero padded, so that future updates don't rewrite the entire file, but just update those reserved bytes.
So whichever way you use will give roughly similar results. What you can try is to do some tests with a custom copy function that reads/writes in bigger chunks than the default system copy, say 2 MB or 5 MB, which might improve performance. Ultimately your disk I/O is the bottleneck here.
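For instance, a rough Go sketch of such a chunked copy; the 5 MB buffer size and the file names are arbitrary choices:
package main

import (
	"fmt"
	"io"
	"os"
)

// prependWithChunkedCopy writes the new leading bytes into a temporary
// file, streams the original after them in large chunks, and then renames
// the temporary file over the original.
func prependWithChunkedCopy(path string, prefix []byte) error {
	src, err := os.Open(path)
	if err != nil {
		return err
	}
	defer src.Close()

	tmp, err := os.Create(path + ".tmp")
	if err != nil {
		return err
	}

	if _, err := tmp.Write(prefix); err != nil {
		tmp.Close()
		return err
	}

	// Copy in 5 MB chunks instead of a small default buffer.
	buf := make([]byte, 5*1024*1024)
	for {
		n, rerr := src.Read(buf)
		if n > 0 {
			if _, werr := tmp.Write(buf[:n]); werr != nil {
				tmp.Close()
				return werr
			}
		}
		if rerr == io.EOF {
			break
		}
		if rerr != nil {
			tmp.Close()
			return rerr
		}
	}
	if err := tmp.Close(); err != nil {
		return err
	}
	return os.Rename(path+".tmp", path)
}

func main() {
	if err := prependWithChunkedCopy("my40gbfile", []byte("C")); err != nil {
		fmt.Println("prepend failed:", err)
	}
}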
The absolutely most high-performance way would seem to be to get down to the level of sectors and how the file is actually stored. I'm not sure whether the OS becomes a factor then, but the target platform might; in any case it would be useful to know what you're running on.
I think this is a case where C is the obvious choice, this kind of low-level stuff is exactly what a systems programming language is for.
Can you tell us what you end up doing, would be interesting.
Here's the Windows command line ("DOS") way:
Put your 1 char into prepend.txt
copy /b prepend.txt + myHugeFile fileNameOfCombinedFile
