Content type of a remote file - Ruby

I would like to get the content type of a remote file because I have a problem when I download it: the content type is wrong. Here is my code to download the file:
require 'open-uri'
url = "http://my_url/my_file.mp4"
file = URI.open(url) # Ruby 3+: Kernel#open no longer opens URLs, and URI::encode was removed
content_type = file.content_type # => "text/plain" instead of "video/mpeg" or "video/mp4"
I also tried the following code to get the content type beforehand, but it still doesn't work:
require 'net/http'
url = URI.parse(url)
Net::HTTP.start(url.host, url.port) { |http| http.head(url.request_uri)['Content-Type'] }
Does anybody have an idea?
Edit
Here is the code I use to find out which content type it is:
require 'mime/types'
MIME::Types.type_for(url).map(&:content_type).join(' ')
# => "application/mp4 audio/mp4 video/mp4 video/vnd.objectvideo"
But this is the result for a video with a sound track. How am I supposed to pick only "video/mp4"? I can't check every file type to see what the result is; it's endless.

Servers aren't obligated to report the correct content type, and in many cases they get it wrong because it's not normally important. Most browsers are quite lenient about what they'll accept and process.
The only way to know for sure is to download the file and examine it with a tool like the Unix file command, which has a fairly large database of file formats and the byte signatures used to identify them.
The body returned by your request might even be an HTML error page; you won't know until you verify the file's contents.
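As a rough illustration of "look at the bytes, not the header", here is a minimal Ruby sketch that checks a few leading-byte signatures. The helper name sniff_type is hypothetical, and it only knows three cases: the MP4 "ftyp" box that usually sits at byte offset 4, the MPEG program-stream pack header (00 00 01 BA), and an HTML document (which would reveal an error page). It is a tiny subset of what file recognises, not a replacement for it.

```ruby
# Minimal content-sniffing sketch: classify data by its first bytes instead of
# trusting the server's Content-Type header. Only a few signatures are covered.
def sniff_type(head)
  head = head.b # compare raw bytes, whatever the string's encoding was
  # ISO base media files (MP4) usually start with a 4-byte size, then "ftyp"
  return "video/mp4" if head.byteslice(4, 4) == "ftyp"
  # MPEG program stream pack header
  return "video/mpeg" if head.start_with?("\x00\x00\x01\xBA".b)
  # An HTML page (for example a server error page) instead of media
  text = head.lstrip.downcase
  return "text/html" if text.start_with?("<!doctype html", "<html")
  nil # unknown: fall back to a thorough tool such as `file`
end
```

Usage might look like sniff_type(File.binread("my_file.mp4", 16)) after the download, where the path is only an example.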

Related

Add content disposition param for uploadTask using URLSession and URLRequest

I am using URLSession 'uploadTask from file'
func uploadTask(with request: URLRequest, fromFile fileURL: URL) -> URLSessionUploadTask
Almost everything works fine, but now our server needs an extra parameter, 'uploadKey', to be passed in the Content-Disposition along with the fileName.
This can be done by generating a multipart request with the Content-Disposition added, as we normally do.
I want to add it while using 'uploadTask from file' to avoid memory pressure. Please suggest how to do it.
From reading the question, I suspect that you're subtly misunderstanding what upload tasks do (and unfortunately, Apple's documentation needs some serious improvement in this area, which doesn't help matters). These tasks don't upload a file the way a web browser would if you chose a file in an upload form. Rather, they use the file as the body of an upload request. They may default to a sensible Content-Type based on the filename (I'm not certain of that), but they do not send the data form-encoded.
So, assuming I'm understanding the question correctly, your options are:
Keep using multipart encoding. Optionally write the multipart body into a file rather than keeping it in memory, and use the upload task to provide the body from that file instead of from an NSData object.
Upload the file you're trying to send, unencoded, as the entire upload body, and pass any additional parameters as query parameters in the URL itself.
Use some other encoding like JSON or protocol buffers.
In any case, the server code will determine which of these approaches is supported. If you can modify the server code, I would recommend the second approach: it is slightly more efficient than the first, much more efficient than JSON, and easier to implement than any of the others.
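The first option (write the multipart body into a file, then upload from that file) can be sketched as follows. It is written in Ruby to match the rest of this page; the byte layout is the same one you would hand to uploadTask(with:fromFile:). The helper name write_multipart_body and the field names "uploadKey" and "file" are hypothetical placeholders for whatever your server expects. The source file is streamed with IO.copy_stream, so it is never loaded into memory as a whole.

```ruby
require "securerandom"

# Sketch: stream a multipart/form-data body into `out` (any writable IO).
# Field names are hypothetical; returns the boundary the caller must put in
# the request's Content-Type header.
def write_multipart_body(source_path, upload_key, out)
  boundary = "Boundary-#{SecureRandom.hex(16)}"
  crlf = "\r\n"

  # Plain form field carrying the extra parameter
  out.write("--#{boundary}#{crlf}")
  out.write(%(Content-Disposition: form-data; name="uploadKey"#{crlf}#{crlf}))
  out.write("#{upload_key}#{crlf}")

  # File part: headers first, then the file's bytes copied in chunks
  out.write("--#{boundary}#{crlf}")
  out.write(%(Content-Disposition: form-data; name="file"; filename="#{File.basename(source_path)}"#{crlf}))
  out.write("Content-Type: application/octet-stream#{crlf}#{crlf}")
  File.open(source_path, "rb") { |src| IO.copy_stream(src, out) }
  out.write(crlf)

  # Closing boundary
  out.write("--#{boundary}--#{crlf}")
  boundary
end
```

On the Swift side you would then set the request's Content-Type to multipart/form-data; boundary=<returned boundary> and pass the written file to uploadTask(with:fromFile:).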

How do I gzip static resources using zlib?

Edit: I'm using Ruby with Sinatra.
UPDATE: here is the code I'm using, which doesn't work:
get '/' do
session[:time] = Time.now
z = Zlib::Deflate.new(6, 31)
z.deflate(File.read('public/Assets/Styles/build.css'))
z.flush
z.finish
z.close
erb :home
end
I don't get any errors, but when I check the file with Firebug's YSlow plugin, it tells me that the file isn't gzipped.
I'm trying to understand how to gzip web page content and static files like JavaScript and CSS using zlib.
I know I can pass a string of data to Zlib::Deflate.deflate, but I'm using Sinatra with ERB files. So do I pass in a path to the ERB file and the JS/CSS files? Or can I pass in the directory where the scripts/styles are stored? Would I pass in the path to the ERB file or the symbol that references it?
Unless you are writing your own HTTP server, your server needs to handle this. The client first has to tell the server that it accepts gzip content encoding (via the Accept-Encoding request header), and only then can the server deliver a gzip-encoded response (marked with Content-Encoding: gzip).
Zlib::Deflate.deflate will not produce gzip-encoded data. It will only produce zlib-encoded data. You would need to use Zlib::Deflate.new with the windowBits argument equal to 31 to start a gzip stream.
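A minimal sketch of both points above, assuming you really do want to compress by hand. The helper names gzip_bytes and accepts_gzip? are hypothetical. The key differences from the question's code are that the output of deflate and finish is actually kept, and that windowBits is 31 (a 15-bit window plus 16 to request a gzip wrapper).

```ruby
require "zlib"

# Compress `data` into gzip format (not raw zlib) with Zlib::Deflate.
def gzip_bytes(data, level = Zlib::DEFAULT_COMPRESSION)
  z = Zlib::Deflate.new(level, 31) # 31 = 15 (max window) + 16 (gzip header/trailer)
  out = z.deflate(data)            # may hold only part of the compressed stream
  out << z.finish                  # flush the remainder and write the gzip trailer
  out
ensure
  z&.close
end

# Only send gzip to clients that advertise support for it.
# Deliberately simple: honours "q=0" but ignores the "*" wildcard.
def accepts_gzip?(accept_encoding)
  accept_encoding.to_s.split(",").any? do |entry|
    coding, *params = entry.split(";").map(&:strip)
    q = params.find { |p| p.start_with?("q=") }
    coding.to_s.downcase == "gzip" && (q.nil? || q.delete_prefix("q=").to_f > 0)
  end
end
```

In a Sinatra route you would check accepts_gzip?(request.env["HTTP_ACCEPT_ENCODING"]) and, if true, set headers "Content-Encoding" => "gzip", "Vary" => "Accept-Encoding" before returning the compressed bytes. In practice Rack's Rack::Deflater middleware performs this negotiation for you.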

Fetching only X/HTML links (not images) based on MIME type

I'm crawling a site using Ruby + OpenURI + Nokogiri: fetch a page, find all the a[href] links and, if they're on the same domain and use the right protocol, follow them to crawl again.
Sometimes there are links to large binaries (e.g. jpeg, exe), and I don't want to crawl those.
I tried using the HTTP "Accept" header to get an error or empty response for the wrong MIME types, like so:
require 'open-uri'
page = URI.open(url, 'Accept' => 'text/html,application/xhtml+xml,application/xml')
...but OpenURI still downloads binaries served with other MIME types.
Other than looking at file extensions in the URL for a probable file type, how can I prevent the download (or detect a conflicting response type) for an arbitrary URL?
You could send a HEAD request first, then check the Content-Type header of the response and only make the real request if it’s acceptable:
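require 'net/http'

ACCEPTABLE_TYPES = %w{text/html application/xhtml+xml application/xml}
uri = URI(url)
type = Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == 'https') do |http|
  # request_uri keeps the query string; content_type drops any charset parameter
  http.head(uri.request_uri).content_type
end
if ACCEPTABLE_TYPES.include? type
  # fetch the url
else
  # do whatever
end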
This will need an extra request for each page, but I can’t see a way of avoiding it. It also relies on the server sending the same headers for a HEAD request as it does for a GET, which I think is a reasonable assumption but something to be aware of.
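One possible way to avoid the extra round trip, offered as a separate technique and not part of the answer above: send a single GET with Net::HTTP, inspect the headers inside the request block, and leave the block before reading the body when the type is unwanted. This is a sketch under those assumptions; fetch_if_acceptable is a hypothetical name, and redirects and error statuses are not handled.

```ruby
require "net/http"
require "uri"

ACCEPTABLE_TYPES = %w[text/html application/xhtml+xml application/xml].freeze

# Sketch: one GET; read the body only if the Content-Type is acceptable.
# Returning from inside the request block skips reading the body, and
# Net::HTTP.start then closes the connection, so the rest of a large binary
# is not downloaded (a few bytes may already have been buffered).
def fetch_if_acceptable(uri, acceptable = ACCEPTABLE_TYPES)
  Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == "https") do |http|
    http.request(Net::HTTP::Get.new(uri)) do |res|
      # content_type is e.g. "text/html", with any charset parameter removed
      return nil unless acceptable.include?(res.content_type)
      return res.body # the body is read only on this path
    end
  end
end
```

The trade-off is that the connection cannot be reused after a rejected response, whereas the HEAD approach keeps each GET clean at the cost of a second request.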

Spring MVC Upload File - How is Content Type determined?

I'm using Spring 3's file upload support. I would like to know the best way to validate that a file is of a certain type, specifically a CSV file. I'm fairly sure that checking the extension is useless, so currently I check the content type of the uploaded file and just ensure that it is "text/csv". To clarify, this file is uploaded by the client, so I have no control over its origin.
I'm curious how Spring/the browser determines what the content type is? Is this the best/safest way to determine what kind of file has been uploaded? Can I ever be 100% certain?
UPDATE: Again, I'm not asking how to find out a file's content type, but how the content type gets determined in the first place. How does Spring/the browser know that the content type is "text/csv" based on the uploaded file?
You can use the org.springframework.web.multipart.commons.CommonsMultipartFile object; it has a getContentType() method.
Look at the following example http://www.ioncannon.net/programming/975/spring-3-file-upload-example/
You can just add a simple check on the CommonsMultipartFile object and redirect to an error page if the content type is incorrect.
You can also count the number of commas on each line of the file. There should normally be the same number of commas on each line for it to be a valid CSV file, although quoted fields that contain commas will throw off a naive count.
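A sketch of the same consistency check that copes with quoted commas, written in Ruby to match the rest of this page (in Java a CSV parsing library would play the same role). It parses the text with Ruby's CSV library and accepts it only if every non-blank row has the same number of fields; consistent_csv? is a hypothetical helper name, and passing this check shows the data is well-formed CSV, not that it is the CSV you expected.

```ruby
require "csv"

# Accept the text as CSV only if it parses and all non-blank rows have the
# same number of fields. A real parser handles fields like "Smith, John".
def consistent_csv?(text)
  rows = CSV.parse(text, skip_blanks: true)
  return false if rows.empty?
  rows.map(&:length).uniq.size == 1
rescue CSV::MalformedCSVError
  false # e.g. an unclosed quoted field
end
```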
Why don't you just take the file name in your validator and split it? The file type is the last element of the resulting array: String[] parts = fileName.split("\\."); String extension = parts[parts.length - 1];
OK, in that case I suggest using the CsvReader Java library; you just have to check your CsvReader object and that's all.
As far as I'm aware, the getContentType() method gets its value from whatever the user agent sends, so you're right to be wary, as this can easily be spoofed.
For binary files you could check the magic number or use a library such as mime-util or jMimeMagic. There's also Files.probeContentType(Path) since Java 7, but it only works with files on disk, and bugs have been reported on some OSes.
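To show what "check the magic number" means in practice, here is a small Ruby sketch that rejects uploads whose first bytes match a few well-known binary signatures, or that contain NUL bytes. The helper binary_signature and its short signature table are illustrative assumptions, not an exhaustive list; a library like the ones named above covers far more formats.

```ruby
# A few well-known leading-byte signatures (illustrative, not exhaustive).
BINARY_SIGNATURES = {
  "PK\x03\x04".b => "zip (also .xlsx/.docx)",
  "%PDF".b => "pdf",
  "\x89PNG\r\n\x1A\n".b => "png",
  "GIF8".b => "gif",
  "\xFF\xD8\xFF".b => "jpeg"
}.freeze

# Returns a description if the data looks binary, or nil if it may be text.
# NUL bytes are rare in ASCII/UTF-8 text, but UTF-16 text would be flagged too.
def binary_signature(head)
  head = head.b
  BINARY_SIGNATURES.each { |magic, name| return name if head.start_with?(magic) }
  return "unknown binary (contains NUL bytes)" if head.include?("\x00")
  nil
end
```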

How do I calculate the content-length of a file upload in Ruby?

I need to include the content-length of an image/png file posted in an upload to a web service.
But how do I calculate the content-length to include in the header?
Thanks.
I am submitting it using rest-client.
The web service for the upload is Postful, and its documentation has been unclear: http://www.postful.com/developer/guide#uploading_attachments
Because I am writing the payload and headers myself, it seems I need to supply that value.
I am also looking at postalmethods, whose documentation says the content length is supplied by the user:
http://postalmethods.com/method/2009-02-26/UploadFile
The files themselves are PNGs. I am going to attach them to a model using Paperclip, so I will have a file path from that.
The file whose content length I need is stored as an attachment using Paperclip, so the specific code causing problems is:
File.size(@postalcard.postalimage.url)
Well, you presumably know how you're reading and posting the data, so you know how much data you're sending. That's the content length. If you're sending the file directly in binary as the body of the POST, it's just the length of the file. If you're Base64-encoding it, the content length will be ((file length + 2) / 3) * 4, using integer division (standard padded Base64 with no line breaks). If it's going in a SOAP envelope etc., you'll need to account for that too.
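The Base64 formula can be checked directly in Ruby. This is a small sketch; base64_length is a hypothetical helper name. It compares the formula against Array#pack("m0"), which produces strict Base64 with no line breaks (the same output as Base64.strict_encode64). Base64.encode64, by contrast, inserts a newline every 60 characters, so its output is longer than the formula predicts.

```ruby
# Length of padded Base64 output for n input bytes (integer division).
def base64_length(n)
  ((n + 2) / 3) * 4
end

# Compare against Ruby's strict (newline-free) Base64 encoding.
[0, 1, 2, 3, 100].each do |n|
  encoded = ["x" * n].pack("m0")
  puts "#{n} bytes -> #{base64_length(n)} (actual #{encoded.bytesize})"
end
```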
One way of handling complicated situations is to build the entire POST body in memory first, set the content length from its size, and then write that buffer directly as the POST body.
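A minimal sketch of the "build it in memory first" approach with Ruby's Net::HTTP; build_post is a hypothetical helper, and the upload URL in the usage below is a placeholder. The detail worth noticing is String#bytesize: Content-Length counts bytes, while String#length counts characters, and the two differ for multibyte text.

```ruby
require "net/http"
require "uri"

# Build a POST whose whole body is already in memory and set Content-Length
# from the body's size in bytes.
def build_post(uri, body, content_type)
  req = Net::HTTP::Post.new(uri)
  req["Content-Type"] = content_type
  req.body = body
  req.content_length = body.bytesize # bytes, not characters
  req
end

req = build_post(URI("http://example.com/upload"), "\x89PNG\r\n\x1A\n".b, "image/png")
puts req["Content-Length"]
```

For a PNG you would read the body with File.binread(path) so it is treated as raw bytes.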
Well, you can use File.size(filepath), but it's unlikely that you'll need to: most libraries for making HTTP requests should set it automatically. Which library are you using? (Or, what kind of web service is it?)
