Unexpected EOF when using golang bufio in gzip

The Go code (four files) is in this gist:
https://gist.github.com/kmahyyg/02a2da2970001de455f847f4e7525aff
With the definitions above, compress a big file (512 MB here, a binary created by dd from /dev/urandom).
When you call SetWriter(out), pass out as a bufio.Writer while keeping the struct field declared as io.Writer, and do the same for the Reader part.
Then try to decompress: you will get an unexpected EOF error.
But if you pass out as an io.Writer, everything is fine.
The compress function returns no errors.
Why does using bufio.Writer cause an unexpected EOF?
Note:
After some observation, it seems that files smaller than a certain size (337 MB on my machine) do not trigger the unexpected EOF.
Running the official gunzip on the same gzip file that caused the unexpected EOF extracts only about the first 337 MB of data and then reports a "corrupted file" message.
Edit:
1. Full code attached.
2. Using zstd instead of gzip gives the same result.

@leafbebop has the correct answer. A bufio.Writer does not flush its buffer automatically when the underlying writer is closed. So when you pass a bufio.Writer as the io.Writer, you must call Flush on it manually before closing.


Opening file with write throws "No implicit conversion of String into Integer"

It's been quite a while since I last wrote code in Ruby (Ruby 2 was new, and wow, it's 3 already), so I feel like an idiot.
I have a text file containing only the word:
hello
My ruby file contains the following code:
content = File.read("test_file_str.txt","w")
puts content
When I run it, I get:
`read': no implicit conversion of String into Integer (TypeError)
I've never had this happen before, but it has been quite a while since I wrote code, so clearly PEBKAC.
However, when I run this without ,"w" all is seemingly well. What am I doing wrong?
ruby 3.0.3p157 (2021-11-24 revision 3fb7d2cadc) [x64-mingw32]
As per the docs, the second argument to File.read is the number of bytes to read from the given file, which must be an integer.
Opens the file, optionally seeks to the given offset, then returns length bytes (defaulting to the rest of the file). read ensures the file is closed before returning.
So, in your case the error happens because you're passing a string where an integer is required. The docs for File.read don't state this per se, but the docs for File#read do:
Reads length bytes from the I/O stream.
length must be a non-negative integer or nil.
If you want to specify the mode, you can use the mode option for that:
File.read("filename", mode: "r") # "r" or any other
# or
File.new("filename", mode: "r").read(1)
Open Files for Reading Don't Accept Write Mode
In general, it doesn't make sense to open a filehandle for reading in write mode. So, you need to refactor your method to something like:
content = File.read("test_file_str.txt")
or perhaps:
content = File.new("test_file_str.txt", "r+").read
depending on exactly what you're trying to do.
See Also: File Modes in IO.new
The documentation for File in Ruby 3.0.3 points you to IO.new for the available modes. You might take a look there if you don't see exactly the options you're looking for.

X937 file decoding in golang?

I am trying to open and parse an x937 file - which I BELIEVE is usually encoded in EBCDIC 0037.
I am using the following library to decode the main bytes of the file:
"github.com/gdumoulin/goebcdic"
and the code I am using is as follows, for now.
// Bytes in file.
b, _ := ioutil.ReadFile("testingFile.x937")
fmt.Println(string(goebcdic.ASCIItoEBCDICofBytes(b)))
But when I dump the output, I still don't get anything that matches what I expected to see.
Any ideas on how I can work with this?

Opening filehandles for use with TabularMSA in skbio

Hey there skbio team.
So I need to allow either DNA or RNA MSAs. When I do the following, if I leave out the alignment_fh.close() skbio reads the 'non header' line in the except block making me think I need to close the file first so it will start at the beginning, but if I add alignment_fh.close() I cannot get it to read the file. I've tried opening it via a variety of methods, but I believe TabularMSA.read() should allow files OR file handles. Thoughts? Thank you!
try:
    aln = skbio.TabularMSA.read(alignment_fh, constructor=skbio.RNA)
except:
    alignment_fh.close()
    aln = skbio.TabularMSA.read(alignment_fh, constructor=skbio.DNA)
I've tried opening it via a variety of methods, but I believe TabularMSA.read() should allow files OR file handles.
You're correct: scikit-bio generally supports reading and writing files using open file handles or file paths.
The issue you're running into is that your first TabularMSA.read() call reads the entire contents of the open file handle, so that when the second TabularMSA.read() call is hit within the except block, the file pointer is already at the end of the open file handle -- this is why you're getting an error message hinting that the file is empty.
This behavior is intentional; when scikit-bio is given an open file handle, it will read from or write to the file but won't attempt to manage the handle's file pointer (that type of management is up to the caller of the code).
Now, when asking scikit-bio to read a file path (i.e. a string containing the path to a file on disk or accessible at some URI), scikit-bio will handle opening and closing the file handle for you, so that's often the easier way to go.
You can use file paths or file handles to accomplish your goal. In the following examples, suppose aln_filepath is a str pointing to your alignment file on disk (e.g. "/path/to/my/alignment.fasta").
With file paths: You can simply pass the file path to both TabularMSA.read() calls; no open() or close() calls are necessary on your part.
try:
    aln = skbio.TabularMSA.read(aln_filepath, constructor=skbio.RNA)
except ValueError:
    aln = skbio.TabularMSA.read(aln_filepath, constructor=skbio.DNA)
With file handles: You'll need to open a file handle and reset the file pointer within your except block before reading a second time.
with open(aln_filepath, 'r') as aln_filehandle:
    try:
        aln = skbio.TabularMSA.read(aln_filehandle, constructor=skbio.RNA)
    except ValueError:
        aln_filehandle.seek(0)  # reset file pointer to beginning of file
        aln = skbio.TabularMSA.read(aln_filehandle, constructor=skbio.DNA)
Note: In both examples, I've used except ValueError instead of a "catch-all" except statement. I recommend catching specific error types (e.g. ValueError) instead of any exception because the code could be failing in different ways than what you're expecting. For example, with a "catch-all" except statement, users won't be able to interrupt your program with Ctrl-C because KeyboardInterrupt will be caught and ignored.

How to read a large file into a string

I'm trying to save and load the states of Matrices (using Matrix) during the execution of my program with the functions dump and load from Marshal. I can serialize the matrix and get a ~275 KB file, but when I try to load it back as a string to deserialize it into an object, Ruby gives me only the beginning of it.
# when I want to save
mat_dump = Marshal.dump(@mat) # serialize object - OK
File.open('mat_save', 'w') {|f| f.write(mat_dump)} # write String to file - OK
# somewhere else in the code
mat_dump = File.read('mat_save') # read String from file - only reads like 5%
@mat = Marshal.load(mat_dump) # deserialize object - "ArgumentError: marshal data too short"
I tried to change the arguments for load but didn't find anything yet that doesn't cause an error.
How can I load the entire file into memory? If I could read the file chunk by chunk, then loop to store it in the String and then deserialize, it would work too. The file has basically one big line so I can't even say I'll read it line by line, the problem stays the same.
I saw some questions about the topic:
"Ruby serialize array and deserialize back"
"What's a reasonable way to read an entire text file as a single string?"
"How to read whole file in Ruby?"
but none of them seem to have the answers I'm looking for.
Marshal is a binary format, so you need to read and write in binary mode. The easiest way is to use IO.binread and IO.binwrite.
...
IO.binwrite('mat_save', mat_dump)
...
mat_dump = IO.binread('mat_save')
@mat = Marshal.load(mat_dump)
Remember that marshaled data depends on the Ruby version: it is compatible with other Ruby versions only under specific circumstances. As the documentation puts it:
In normal use, marshaling can only load data written with the same major version number and an equal or lower minor version number.

How to convert bson.Binary to []byte in Go

I'm writing a small application that receives messages in BSON format from the network (it's not MongoDB) and has to save fields to files on the local machine. I'm using gopkg.in/mgo.v2/bson for message unmarshaling and it works fine.
Almost everything works except one thing. There is a "userdefined" binary field in the message and I have to save it to a separate file. I tried to use:
var pwr = msg["pwr"].([]byte)
but got the error "panic: interface conversion: interface is bson.Binary, not []uint8".
Can someone point me to an example of how to convert bson.Binary to []byte, so I can save it to a file?
This does what you want:
pwr := msg["pwr"].(bson.Binary).Data
But this assumes msg["pwr"] can never be anything other than a bson.Binary. If that's not an invariant, use the two-value form of the type assertion first, handle the type mismatch case when it happens, and only then read the Data field.
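A checked version of that assertion might look like the sketch below. To keep it runnable without the mgo dependency, Binary is a local stand-in assumed to mirror the Kind and Data fields of bson.Binary. The helpers fieldBytes and saveField, the sample message and the output path are all hypothetical.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// Binary is a local stand-in for bson.Binary (assumed fields: Kind, Data),
// used so that this sketch compiles without gopkg.in/mgo.v2/bson.
type Binary struct {
	Kind byte
	Data []byte
}

// fieldBytes returns the payload of a Binary field and reports an error,
// instead of panicking, when the field is missing or has another type.
func fieldBytes(msg map[string]interface{}, key string) ([]byte, error) {
	bin, ok := msg[key].(Binary) // two-value form: ok is false on a mismatch
	if !ok {
		return nil, fmt.Errorf("field %q is %T, not Binary", key, msg[key])
	}
	return bin.Data, nil
}

// saveField writes the payload of a Binary field to <dir>/<key>.bin
// and returns the path it wrote to.
func saveField(msg map[string]interface{}, key, dir string) (string, error) {
	data, err := fieldBytes(msg, key)
	if err != nil {
		return "", err
	}
	path := filepath.Join(dir, key+".bin")
	if err := os.WriteFile(path, data, 0o644); err != nil {
		return "", err
	}
	return path, nil
}

func main() {
	// Sample message; in the real program this would come from bson.Unmarshal.
	msg := map[string]interface{}{
		"pwr": Binary{Kind: 0x80, Data: []byte{1, 2, 3}},
	}
	path, err := saveField(msg, "pwr", os.TempDir())
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("saved to", path)
}
```

With the real package, the only change should be asserting to bson.Binary instead of the local Binary type.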
