Is there a way or a gem to parse an xlsx document string without having a file? I was using Roo to parse the Excel file when it was on my local machine, but I would like to do it without downloading the actual document. I receive it in my Google mailbox and can pull the Excel document as a string, but I can't find any way to parse just the string without a file path. Any ideas would be appreciated.
You should be able to wrap your string in a StringIO object, which exposes an API similar to that of File objects. Unfortunately, it looks like Roo has a lot of file handling built into its various classes and expects a pathname in the initialize method. As I see it, you have a few options:
Subclass one of Roo's spreadsheet classes to override the file handling, and substitute in your StringIO object.
Save the string to a Tempfile and pass the temporary path into one of Roo's standard initializers (I suspect this would be easier to implement).
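A minimal sketch of the Tempfile option, using only Ruby's standard library. The helper name with_tempfile_path is hypothetical, and the Roo call in the trailing comment assumes Roo::Excelx.new accepts a path, as described in the question; it is not exercised here.

```ruby
require "tempfile"

# Hypothetical helper: writes raw bytes to a temporary file and yields its
# path. The file is removed automatically when the block returns.
def with_tempfile_path(bytes, extension)
  Tempfile.create(["attachment", extension]) do |file|
    file.binmode        # the xlsx payload is binary, so avoid newline/encoding translation
    file.write(bytes)
    file.flush          # make sure the bytes are on disk before another reader opens the path
    yield file.path
  end
end

# Possible usage, assuming xlsx_string holds the attachment body:
#   with_tempfile_path(xlsx_string, ".xlsx") do |path|
#     sheet = Roo::Excelx.new(path)
#     # read everything you need from sheet inside this block
#   end
```

Because the temporary file disappears when the block ends, any reading from the spreadsheet should happen inside the block.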
You cannot parse a file without supplying it to your parser.
At one point or another you will have to download the file, so that the parser has something it can read.
I'm using Sphinx to maintain docs on a project, and I am generating a jsonschema document from a tool in which all properties of objects are listed.
Those objects properties are documented in rst files, I need to:
I've managed to read the rst files in the doctree-resolved event and match them with the json properties, but I'm not sure this is the best approach, since I need to:
a) Check that all properties are documented. This is almost done: I can mark the json properties that were found and then check the json at the end.
b) Copy the description retrieved from the doctree object into the json (adding a property to the json). The format I need is Markdown, so I need to figure out how to convert a doctree node (or node set) to Markdown. The URL links should also be working at this stage. If direct conversion to Markdown is not possible, converting the fragment to HTML and then to Markdown might be easier.
I don't know if I'm on the right path, or should I write a builder instead?
Thanks
I'd like to write a Codec plugin to enable LogStash to decode a binary data format.
The official documentation for writing a Codec shows that I need to define a decode method that accepts a single parameter: a variable called data.
I'm new to both LogStash and Ruby. Having worked mostly with statically typed languages, I'm unsure how to learn more about the data variable. I assume that it's analogous to an InputStream-type object, allowing me to read data as it becomes available, but I'm not sure.
Questions:
What type is the data object? What methods does it have?
How do Ruby developers typically go about investigating variables like this? I'm not sure I see a way to figure it out without writing a skeleton plugin and dumping a string representation of data to STDOUT.
Thanks!
The documentation for writing an input plugin hints at this. From the run() method section:
data = $stdin.sysread(16384)
@codec.decode(data) do |event|
  decorate(event)
  event.set("host", @host) if !event.include?("host")
  queue << event
end
The data variable is a Ruby String, which is being used as a buffer of arbitrary bytes. I have verified this by creating a skeleton plugin and inspecting the value at runtime.
This seems to be cause for caution: the bytes provided to your codec's decode method are not guaranteed to be a complete event.
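One way to cope with partial chunks is to buffer the bytes between calls and only emit records once they are complete. The sketch below is not Logstash API: the FrameBuffer class and the length-prefixed framing (a 4-byte big-endian length followed by the payload) are assumptions made up for illustration, and a real binary format would need its own framing rule.

```ruby
# Hypothetical framing: each record is a 4-byte big-endian length
# followed by that many payload bytes.
class FrameBuffer
  HEADER_SIZE = 4

  def initialize
    @buffer = String.new(encoding: Encoding::BINARY)
  end

  # Appends a chunk of bytes and yields every complete payload now available.
  # Incomplete trailing bytes stay buffered until the next call.
  def feed(data)
    @buffer << data.b
    loop do
      break if @buffer.bytesize < HEADER_SIZE
      length = @buffer.byteslice(0, HEADER_SIZE).unpack1("N")
      break if @buffer.bytesize < HEADER_SIZE + length
      yield @buffer.byteslice(HEADER_SIZE, length)
      @buffer = @buffer.byteslice(HEADER_SIZE + length, @buffer.bytesize)
    end
  end
end
```

A codec's decode method could hold one such buffer in an instance variable, call feed(data) on every chunk, and build an event from each yielded payload.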
Which attribute can be used to pass the File Name while ingesting a document?
How to determine the file type when a document is pulled from Documentum using DFC API
Once a file is uploaded to Documentum, it "loses" its filename. A document is linked to a content object, which is again linked to the file itself on a filestore.
There are ways to get hints about the original filename and/or file extensions:
Find the Content ID by looking at i_contents_id, and look at that object's set_file attribute. Normally, this string will contain the full path (including filename) of the original file, but there are no guarantees.
If storage extensions are on (yes, they're on by default), you could use the following API command to get the file extension: getpath,c,<doc_id>
The document's a_content_type links to the name attribute of a dm_format object. Look at that object's dos_extension attribute to see the registered file extension for that given format (there is no guarantee that this was the original file extension, however).
As for which attribute should contain the filename, there is no clear answer. It's all up to the client. Normally, using object_name should suffice, or you could create a custom type with a custom attribute if the original filename is very important to you.
A file in a Documentum repository doesn't need to have a document name that originates from the file that was uploaded from the file system.
When you export a document via the export action in a WDK application, i.e. Documentum Administrator or Webtop, the exported file will have a name based on the value that was placed in the object_name property of that specific object.
The file type of the content related to a specific document object in the repository is stored in the a_content_type attribute. Values in this attribute use internal Documentum notation, but the names are intuitive. Check this question for more info or google.
I'm using Spring 3 ability to upload a file. I would like to know the best way to validate that a file is of a certain type, specifically a csv file. I'm rather sure that checking the extension is useless and currently I am checking the content type of the file that is uploaded. I just ensure that it is of type "text/csv". And just to clarify this is a file uploaded by the client meaning I have no control of its origins.
I'm curious how Spring/the browser determines what the content type is? Is this the best/safest way to determine what kind of file has been uploaded? Can I ever be 100% certain?
UPDATE: Again, I'm not wondering how to read the content type of a file, but how the content type gets determined. How does Spring/the browser know that the content type is "text/csv" based on the file uploaded?
You can use the org.springframework.web.multipart.commons.CommonsMultipartFile object; it has a getContentType() method.
Look at the following example: http://www.ioncannon.net/programming/975/spring-3-file-upload-example/
You can just add a simple test on the CommonsMultipartFile object and redirect to an error page if the content type is incorrect.
You can also count the number of commas per line in the file. There should normally be the same number of commas on each line of the file for it to be a valid CSV file.
Why don't you just take the file name in your validator and split it? The file type is the last element of the array returned by fileName.split("\\."), i.e. parts[parts.length - 1] where parts is that array.
OK, in this case I suggest you use the CsvReader Java library. You just have to check your CsvReader object and that's all.
As far as I'm aware, the getContentType() method gets its value from whatever the user agent tells it, so you're right to be wary, as this can easily be spoofed.
For binary files you could check the magic number or use a library, such as mime-util or jMimeMagic. There's also Files.probeContentType(Path) since Java 7, but it only works with files on disk, and bugs have been reported on some OSes.
I use the Ruby gem curb to fetch an image via the body_str method of a Curl::Easy instance, and then want to use RMagick to process the image. However, Magick::Image.read needs a file name to read, and what I get is the content string of the image. Yes, I know I can first write the image content string to a file and then pass the file name to Magick::Image.read, but that adds one more IO operation.
So I want to know whether it's possible to convert an image content string into a stream directly, so that RMagick can read it without a file.
Thank you in advance.
Check the other class methods of the Image class, particularly from_blob. It sounds like what you need.
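A minimal sketch of that suggestion, assuming the rmagick gem is installed. The helper name image_from_bytes is hypothetical; Magick::Image.from_blob returns an array of images, so the sketch takes the first one.

```ruby
# Hypothetical helper: decodes in-memory image bytes with RMagick,
# without writing them to a file first.
def image_from_bytes(bytes)
  require "rmagick" # loaded lazily; assumes the rmagick gem is available
  # from_blob returns an array (one entry per frame or layer); take the first.
  Magick::Image.from_blob(bytes).first
end

# Possible usage with the curb response from the question:
#   image = image_from_bytes(curl.body_str)
#   thumbnail = image.resize_to_fit(100, 100)
```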