How do I go from io.ReadCloser to io.ReadSeeker? - go

I'm trying to download a file from S3 and upload that file to another bucket in S3. The Copy API won't work here because I've been told not to use it.
Getting an object from S3 returns a response.Body that's an io.ReadCloser, and to upload that file, the upload payload takes a Body that's an io.ReadSeeker.
The only way I can figure this out is by saving the response.Body to a file and then passing that file as an io.ReadSeeker. This would require writing the entire file to disk first and then reading it all back from disk, which sounds pretty wrong.
What I would like to do is:
resp, _ := conn.GetObject(&s3.GetObjectInput{Key: "bla"})
conn.PutObject(&s3.PutObjectInput{Body: resp.Body}) // resp.Body is an io.ReadCloser and the field type expects an io.ReadSeeker
Question is, how do I go from an io.ReadCloser to an io.ReadSeeker in the most efficient way possible?

io.ReadSeeker is the interface that groups the basic Read() and Seek() methods. The definition of the Seek() method:
Seek(offset int64, whence int) (int64, error)
An implementation of the Seek() method must be able to seek anywhere in the source, which requires the whole source to be available or reproducible. A file is a perfect example: it is saved permanently on your disk, and any part of it can be read at any time.
response.Body is implemented to read from the underlying TCP connection. Reading from the underlying TCP connection gives you the data that the peer at the other end sends you. The data is not cached, and the peer won't send you the data again upon request. That's why response.Body does not implement io.Seeker (and thus not io.ReadSeeker either).
So in order to obtain an io.ReadSeeker from an io.Reader or io.ReadCloser, you need something that caches all the data, so that it can seek to any position in it on demand.
This caching mechanism may be writing the data to a file as you mentioned, or you can read everything into memory, into a []byte, using ioutil.ReadAll() (or io.ReadAll since Go 1.16), and then use bytes.NewReader() to obtain an io.ReadSeeker from that []byte. Of course this has its limitations: the whole content must fit into memory, and you might not want to reserve that much memory just for a file copy operation.
All in all, an implementation of io.Seeker or io.ReadSeeker requires all the source data to be available, so your best bet is writing it to a file, or, for small files, reading everything into a []byte and serving the content from that byte slice.
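A minimal sketch of the in-memory approach (the helper name is mine, not part of any library):

import (
	"bytes"
	"io"
)

// readSeekerFromReadCloser buffers the whole body in memory and returns an
// io.ReadSeeker over it. Only suitable when the object is known to be small.
func readSeekerFromReadCloser(rc io.ReadCloser) (io.ReadSeeker, error) {
	defer rc.Close()
	data, err := io.ReadAll(rc) // ioutil.ReadAll on Go < 1.16
	if err != nil {
		return nil, err
	}
	return bytes.NewReader(data), nil
}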

As an alternative, use github.com/aws/aws-sdk-go/service/s3/s3manager.Uploader, which takes an io.Reader as input.
I imagine the reason PutObject takes an io.ReadSeeker instead of an io.Reader is that requests to S3 need to be signed (and have a content length), but you can't generate a signature until you have all the data. The stream-y way to do this would be to buffer the input into chunks as they come in and use the multipart upload API to upload each chunk separately. This is (I think) what s3manager.Uploader does behind the scenes.
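For illustration, a minimal sketch with aws-sdk-go v1 (the bucket names and key are placeholders):

package main

import (
	"log"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/s3"
	"github.com/aws/aws-sdk-go/service/s3/s3manager"
)

func main() {
	sess := session.Must(session.NewSession())

	// Download: GetObject returns a Body that is an io.ReadCloser.
	resp, err := s3.New(sess).GetObject(&s3.GetObjectInput{
		Bucket: aws.String("source-bucket"), // placeholder
		Key:    aws.String("bla"),
	})
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()

	// Upload: s3manager.Uploader accepts an io.Reader, so the response body
	// can be streamed straight through without buffering to disk or memory.
	_, err = s3manager.NewUploader(sess).Upload(&s3manager.UploadInput{
		Bucket: aws.String("dest-bucket"), // placeholder
		Key:    aws.String("bla"),
		Body:   resp.Body,
	})
	if err != nil {
		log.Fatal(err)
	}
}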

Related

Test if data is available in `io.ReadWriteCloser`

io.ReadWriteCloser has a blocking Read() that waits until data is available to read.
What if I want to test whether it has data available to read, without actually calling Read()? I need to do some other processing between the moment data becomes available and the call to:
io.Copy(thisReadWriteCloser, anotherReadWriteCloser)
Use the bufio.Reader Peek() function:
bi := bufio.NewReader(i)
bi.Peek(1)
But I have a follow-up issue: I'm not able to re-use the original io.ReadWriteCloser after executing bi.Peek(1) (see: convert from `bufio.Reader` to `io.ReadWriteCloser`).
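One hedged sketch for that follow-up, assuming you control the variable holding the stream: keep reading through the same bufio.Reader (which holds the peeked bytes) and wrap it together with the original value so the result still satisfies io.ReadWriteCloser. The type and helper names here are mine.

import (
	"bufio"
	"io"
)

// bufferedRWC reads through the bufio.Reader (so peeked bytes are not lost)
// while delegating Write and Close to the original stream.
type bufferedRWC struct {
	br *bufio.Reader
	io.ReadWriteCloser
}

func (b bufferedRWC) Read(p []byte) (int, error) { return b.br.Read(p) }

func wrap(rwc io.ReadWriteCloser) (bufferedRWC, *bufio.Reader) {
	br := bufio.NewReader(rwc)
	return bufferedRWC{br: br, ReadWriteCloser: rwc}, br
}

// usage:
//   w, br := wrap(conn)
//   if _, err := br.Peek(1); err == nil { // at least one byte is available
//       io.Copy(dst, w)                   // dst is another writer; the peeked byte is not lost
//   }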

Why do sends of "large" arrays/slices using the net/rpc/jsonrpc codec over a unix socket connection hang?

I'm trying to send an array of data as an rpc reply using golang's built-in net/rpc server and client and the net/rpc/jsonrpc codec. But I'm running into some trouble.
The data I'm sending is around 48 bytes, and the client will just hang in client.Call.
I've made a playground that replicates the problem:
https://go.dev/play/p/_IQ9SF7TSdc
If you change the constant "N" in the above program to 5, things work as expected!
Another playground shows how the issue seems to crop up only when the slice/array in question exceeds 49 bytes:
https://go.dev/play/p/R8CQa0mv7vB
Does anyone know what might be the issue? Go's tests for the array and slice data types are not exactly designed with "large" arrays in mind. Thanks in advance.
On the line where the listener is set up:
listener, err := net.ListenUnix("unixpacket", &net.UnixAddr{RPCPath, "unixpacket"})
Don't use unixpacket. It corresponds to the underlying SOCK_SEQPACKET socket type, which is not a stream protocol; large payloads were likely split into packets in a way the receiver could not reassemble. Use unix instead, which corresponds to SOCK_STREAM.
See this SO post for more.
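Assuming RPCPath is the same constant used in the playground, the corrected listener would look like this (struct field names added for clarity):

listener, err := net.ListenUnix("unix", &net.UnixAddr{Name: RPCPath, Net: "unix"})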

Difficulty in using io.Pipe

Hi friends, I want to write data to a writer and pass it to a library through a reader so that the library can read it.
The problem I have is that png.Encode never continues and gets stuck there:
r, w := io.Pipe()
err := png.Encode(w, img)
Please tell me the solution if possible. I don't mind how this particular problem is resolved: if you know another way to have the data written to a writer and read from a reader, please suggest it. There were workarounds, but I'm using two libraries where one wants only a writer and the other only a reader.
w blocks waiting for a reader to read the data written to the pipe, which in turn blocks Encode.
Reading from r will unblock the writes and let Encode continue. As the io.Pipe documentation puts it:
each Write to the PipeWriter blocks until it has satisfied one or more Reads from the PipeReader that fully consume the written data
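A minimal sketch of the usual pattern: run the encoder in its own goroutine and hand the read end to the consumer (uploadPNG below is a hypothetical stand-in for whatever library function wants an io.Reader):

import (
	"image"
	"image/png"
	"io"
)

func encodeToReader(img image.Image) io.Reader {
	r, w := io.Pipe()
	go func() {
		// CloseWithError lets the reader observe an encoding failure;
		// with a nil error the reader simply sees EOF.
		w.CloseWithError(png.Encode(w, img))
	}()
	return r
}

// usage (uploadPNG is hypothetical):
//   err := uploadPNG(encodeToReader(img))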

Unable to send data in chunks to server in Golang

I'm completely new to Go. I am trying to send a file from the client to the server. The client should split it into smaller chunks and send them to the REST endpoint exposed by the server. The server should combine those chunks and save the file.
This is the client and server code I have written so far. When I run this to copy a file of size 39 bytes, the client sends two requests to the server, but the server displays the following errors:
2017/05/30 20:19:28 Was not able to access the uploaded file: unexpected EOF
2017/05/30 20:19:28 Was not able to access the uploaded file: multipart: NextPart: EOF
You are dividing the buffer holding the file into separate chunks and sending each of them as a separate HTTP message. This is not how multipart is intended to be used.
multipart MIME means that a single HTTP message may contain one or more entities. Quoting the HTTP RFC:
MIME provides for a number of "multipart" types -- encapsulations of one or more entities within a single message-body. All multipart types share a common syntax, as defined in section 5.1.1 of RFC 2046
You should send the whole file in a single HTTP message (the file contents should be a single entity). The HTTP protocol will take care of the rest, but you may consider using FTP if the files you plan to transfer are large (say, > 2 GB).
If you are using multipart/form-data, then the entire file is expected to be sent up as a single byte stream. Go can handle multi-gigabyte files easily this way, but your code needs to be smart about it.
ioutil.ReadAll(r.Body) is out of the question unless you know for sure that the file will be very small. Please don't do this.
Use a multipart reader: multipartReader, err := r.MultipartReader(). This iterates over the uploaded files in the order they are included in the encoding. This is important, because you can keep the file entirely out of memory and do a Copy from one file handle to another, as in the sketch below. This is how large files are handled easily.
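A hedged sketch of that streaming approach (the destination path is a placeholder, and filename sanitization is omitted):

import (
	"io"
	"net/http"
	"os"
)

func uploadHandler(w http.ResponseWriter, r *http.Request) {
	mr, err := r.MultipartReader()
	if err != nil {
		http.Error(w, err.Error(), http.StatusBadRequest)
		return
	}
	for {
		part, err := mr.NextPart()
		if err == io.EOF {
			break
		}
		if err != nil {
			http.Error(w, err.Error(), http.StatusBadRequest)
			return
		}
		if part.FileName() == "" {
			continue // not a file field
		}
		dst, err := os.Create("/tmp/" + part.FileName()) // placeholder destination
		if err != nil {
			http.Error(w, err.Error(), http.StatusInternalServerError)
			return
		}
		// Stream the part straight to disk; the file never sits in memory.
		_, err = io.Copy(dst, part)
		dst.Close()
		if err != nil {
			http.Error(w, err.Error(), http.StatusInternalServerError)
			return
		}
	}
}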
You will have issues with middle-boxes and reverse proxies. We had to change defaults in Nginx so that it would not cut off large files. Nginx (or whatever reverse proxy you might use) will need to cooperate, as they often default to some really small maximum request size like 300MB.
Even if you think you dealt with this issue on upload with some file part trick, you will then need to deal with large files on download. Go can do single large files very efficiently by doing a Copy from filehandle to filehandle. You will also end up needing to support partial content (http 206) and not modified (304) if you want great performance for downloading files that you uploaded. Some browsers will ignore your pleas to not ask for partial content when things like large video is involved. So, if you don't support this, then some content will fail to download.
If you want to use some tricks to cut up files and send them in parts, then you will end up needing to use a particular Javascript library. This is going to be quite harmful to interoperability if you are going for programmatic access from any client to your Go server. But maybe you can't fix middle-boxes that impose size limits, and you really want to cut files up into chunks. You will have a lot of work to handle downloading the files that you managed to upload in chunks.
What you are trying to do is typically written with a raw TCP connection in most other languages. In Go you can use TCP too, with net.Listen and then Accept on the listener object. Then this should be fine.
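For completeness, a bare-bones sketch of that TCP approach (the port and output path are placeholders, and there is no framing or error recovery):

package main

import (
	"io"
	"log"
	"net"
	"os"
)

func main() {
	ln, err := net.Listen("tcp", ":9000") // placeholder port
	if err != nil {
		log.Fatal(err)
	}
	for {
		conn, err := ln.Accept()
		if err != nil {
			log.Fatal(err)
		}
		go func(c net.Conn) {
			defer c.Close()
			f, err := os.Create("received.dat") // placeholder path
			if err != nil {
				return
			}
			defer f.Close()
			io.Copy(f, c) // stream the whole connection to disk
		}(conn)
	}
}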

Google Protocol Buffers - Storing messages into file

I'm using Google Protocol Buffers to serialize equity market data (i.e. timestamp, bid, ask fields).
I can store one message into a file and deserialize it without issue.
How can I store multiple messages into a single file? Not sure how I can separate the messages. I need to be able to append new messages to the file on the fly.
I would recommend using the writeDelimitedTo(OutputStream) and parseDelimitedFrom(InputStream) methods on Message objects. writeDelimitedTo writes the length of the message before the message itself; parseDelimitedFrom then uses that length to read only one message and no farther. This allows multiple messages to be written to a single OutputStream to then be parsed separately. For more information, see https://developers.google.com/protocol-buffers/docs/reference/java/com/google/protobuf/MessageLite#writeDelimitedTo(java.io.OutputStream)
From the docs:
http://code.google.com/apis/protocolbuffers/docs/techniques.html#streaming
Streaming Multiple Messages
If you want to write multiple messages to a single file or stream, it is up to you to keep track of where one message ends and the next begins. The Protocol Buffer wire format is not self-delimiting, so protocol buffer parsers cannot determine where a message ends on their own. The easiest way to solve this problem is to write the size of each message before you write the message itself. When you read the messages back in, you read the size, then read the bytes into a separate buffer, then parse from that buffer. (If you want to avoid copying bytes to a separate buffer, check out the CodedInputStream class (in both C++ and Java) which can be told to limit reads to a certain number of bytes.)
Protobuf does not include a terminator per outermost record, so you need to do that yourself. The simplest approach is to prefix the data with the length of the record that follows. Personally, I tend to use the approach of writing a string-header (for an arbitrary field number), then the length as a "varint"; this means the entire document is then itself a valid protobuf and could be consumed as an object with a "repeated" element. However, just a fixed-length (typically 32-bit little-endian) marker would do just as well. With any such storage, it is appendable as you require.
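In Go, a minimal sketch of the length-prefix approach described above could look like this (assuming the google.golang.org/protobuf module; newer versions of that module also ship an encoding/protodelim package that does essentially the same thing):

import (
	"bufio"
	"encoding/binary"
	"io"

	"google.golang.org/protobuf/proto"
)

// writeDelimited writes msg prefixed with its length as a varint.
func writeDelimited(w io.Writer, msg proto.Message) error {
	data, err := proto.Marshal(msg)
	if err != nil {
		return err
	}
	var lenBuf [binary.MaxVarintLen64]byte
	n := binary.PutUvarint(lenBuf[:], uint64(len(data)))
	if _, err := w.Write(lenBuf[:n]); err != nil {
		return err
	}
	_, err = w.Write(data)
	return err
}

// readDelimited reads one length-prefixed message from r into msg.
func readDelimited(r *bufio.Reader, msg proto.Message) error {
	size, err := binary.ReadUvarint(r)
	if err != nil {
		return err
	}
	buf := make([]byte, size)
	if _, err := io.ReadFull(r, buf); err != nil {
		return err
	}
	return proto.Unmarshal(buf, msg)
}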
If you're looking for a C++ solution, Kenton Varda submitted a patch to protobuf around August 2015 that adds support for writeDelimitedTo() and readDelimitedFrom() calls that will serialize/deserialize a sequence of proto messages to/from a file in a way that's compatible with the Java version of these calls. Unfortunately this patch hasn't been approved yet, so if you want the functionality you'll need to merge it yourself.
Another option: Google has open-sourced protobuf file reading/writing code through other projects. The or-tools library, for example, contains the classes RecordReader and RecordWriter that serialize/deserialize a proto stream to a file.
If you would like stand-alone versions of these classes that have almost no external dependencies, I have a fork of or-tools that contains only these classes. See: https://github.com/moof2k/recordio
Reading and writing with these classes is straightforward:
File* file = File::Open("proto.log", "w");
RecordWriter writer(file);
writer.WriteProtocolMessage(msg1);
writer.WriteProtocolMessage(msg2);
...
writer.Close();
An easier way is to base64-encode each message and store it as one record per line.
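A quick sketch of that approach in Go (appending one base64-encoded message per line; the helper name is mine):

import (
	"encoding/base64"
	"os"

	"google.golang.org/protobuf/proto"
)

// appendRecord appends one message to the file as a base64-encoded line.
func appendRecord(f *os.File, msg proto.Message) error {
	data, err := proto.Marshal(msg)
	if err != nil {
		return err
	}
	_, err = f.WriteString(base64.StdEncoding.EncodeToString(data) + "\n")
	return err
}

// Reading back: scan line by line with bufio.Scanner and base64-decode each
// line before calling proto.Unmarshal on it.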
