I have a requirement where I need to upload xls (Excel 2003) files into a SQL Server 2008 R2 database. I am using Orchard for the scheduling.
I am using the HttpPostedFileBase input stream to convert the file into a byte array and store it in the database.
After storing, a background scheduler picks up the task and processes the stored data. I need to create objects from the data in the Excel file and send them for processing. I am stuck at decoding the byte array :(
What is the best way to handle this kind of requirement? Are there any libraries I can make use of?
My web app is built with MVC3, EF 4.1, the repository pattern, and Autofac.
I have not used the HttpPostedFileBase class, but you could (a minimal sketch follows this list):
Convert the file to a byte stream
Save it as the appropriate byte/blob type in your database (store the extension in a separate field)
Retrieve bytes and add the appropriate extension to the file stream
Treat as a normal file...
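Here is a minimal sketch of that round trip, shown in Java; the same idea maps directly onto HttpPostedFileBase.InputStream in C#. saveBlob and loadBlob are hypothetical stand-ins for whatever repository call persists the bytes and the extension column in your database:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Minimal sketch of the store/reconstitute round trip described above.
// saveBlob / loadBlob are hypothetical placeholders for your repository.
public class FileBlobRoundTrip {

    public static void main(String[] args) throws IOException {
        Path uploaded = Path.of("report.xls");

        // 1. Convert the uploaded file to a byte array.
        byte[] content = Files.readAllBytes(uploaded);

        // 2. Store the bytes plus the original extension.
        String extension = ".xls";
        saveBlob(content, extension);           // hypothetical repository call

        // 3. Later (e.g. in the background job), retrieve the bytes and
        //    write them back out with the stored extension so downstream
        //    code can treat it as a normal file again.
        byte[] stored = loadBlob();             // hypothetical repository call
        Path temp = Files.createTempFile("upload-", extension);
        Files.write(temp, stored);
        // ... hand `temp` to the Excel-parsing library of your choice ...
    }

    private static void saveBlob(byte[] bytes, String ext) { /* persist to DB */ }

    private static byte[] loadBlob() { return new byte[0];  /* read from DB */ }
}
```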
But I'm actually wondering if your requirements even demand this. Why are you storing the file in the first place? If you are only using the file data to shape your business object (that I'm guessing gets saved somewhere), you could perform that data extraction, shaping, and persistence before you store the file as raw bytes so you never have to reconstitute the file for that purpose.
I was getting a segment error while uploading a large file.
I have read the file data in chunks of bytes using the Read method of io.Reader. Now, I need to upload the bytes of data continuously into Storj.
Storj, architected as an S3-compatible distributed object storage system, does not allow changing objects once uploaded. Basically, you can delete or overwrite, but you can't append.
You could, however, build something on top of Storj that appears to support append. For example, by appending an ordinal number to your object's path and incrementing it each time you want to add to it. When you want to download the whole thing, you would iterate over all the parts and fetch them all. Or, if you only want to seek to a particular offset, you could calculate which part that offset would fall in and download from there (see the sketch after the example paths below).
sj://bucket/my/object.name/000
sj://bucket/my/object.name/001
sj://bucket/my/object.name/002
sj://bucket/my/object.name/003
sj://bucket/my/object.name/004
sj://bucket/my/object.name/005
Of course, this leaves unsolved the problem of what to do when multiple clients are trying to append to your "file" at the same time. Without some sort of extra coordination layer, they would sometimes end up overwriting each other's objects.
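As a rough sketch of the idea, not tied to any particular client library: uploadObject and downloadObject below are placeholders for whatever Storj uplink or S3-gateway call you actually use.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;

// Sketch of the "append by writing numbered parts" idea. uploadObject and
// downloadObject are hypothetical placeholders for your Storj client calls.
public class AppendByParts {

    private static final String BASE = "sj://bucket/my/object.name/";

    // Append: write the next chunk as its own object with an ordinal suffix.
    static void appendChunk(int partNumber, byte[] chunk) {
        String key = BASE + String.format("%03d", partNumber);
        uploadObject(key, chunk);                          // hypothetical call
    }

    // Read back: iterate over the parts in order and concatenate them.
    static byte[] readAll(int partCount) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < partCount; i++) {
            out.write(downloadObject(BASE + String.format("%03d", i)));
        }
        return out.toByteArray();
    }

    // To seek instead of reading everything, pick the part containing the
    // offset (offset / partSize) and start downloading from there.

    private static void uploadObject(String key, byte[] data) { /* ... */ }

    private static byte[] downloadObject(String key) { return new byte[0]; }
}
```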
Following the question on how to execute a file dump in icCube, I would like to know whether it is possible to:
create a file dump
then use it as a data source
I tried to build a sequence of data views, but I cannot get it to work, and I wonder if it is even possible at all.
(The reason I would like to do this is that my main data source is an OData feed and I need a lot of data manipulation before I can load it. I anticipate that it will be much easier to do this on CSV files.)
This is not possible, as the rationale behind the ETL support is to transform data tables as returned by the data sources.
For example, I have a protocol buffer file compressed in snappy format:
file.pbuf.sn
How can I view the file's content? Which programs are recommended for working with protocol buffer files?
There are two separate steps here:
un-snappy the file container
process the contents that are presumably protobuf
If you're trying to do this through code, then obviously each will depend on your target language/platform/etc. Presumably "snappy" tools are available from Google (who created "snappy", IIRC).
Once you have the contents, it depends whether it is a .proto schema, binary data contents, JSON data contents, or some combination. If you have a schema for the data, then run it through "protoc" or the language/platform-specific tool of your choice to get the generated code that matches the schema. Then you can run either binary or JSON data through that generated code to get a populated object model.
If you don't have a schema: if it is JSON, you should be able to understand the data via the names; just run it through your chosen JSON tooling.
If it is binary data without a schema, things are tougher. Protobuf data doesn't include names, and the same values can be encoded in multiple ways (so the same bytes could have come from multiple source values). So you'll have to reverse-engineer the meaning of each field. "protoc" has a schema-less decode mode that might help with this, as does https://protogen.marcgravell.com/decode
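As a rough illustration of the two steps in code, assuming the snappy-java (org.xerial.snappy) and protobuf-java libraries; MyMessage stands in for whatever class protoc generated from your schema:

```java
import com.google.protobuf.UnknownFieldSet;
import java.nio.file.Files;
import java.nio.file.Path;
import org.xerial.snappy.Snappy;

// Sketch: un-snappy the container, then parse the protobuf payload.
public class InspectPbufSn {
    public static void main(String[] args) throws Exception {
        byte[] compressed = Files.readAllBytes(Path.of("file.pbuf.sn"));

        // Step 1: decompress the snappy container (raw block format here;
        // use SnappyFramedInputStream instead if the file uses the framed format).
        byte[] payload = Snappy.uncompress(compressed);

        // Step 2a: with a schema, parse via the generated class, e.g.
        //   MyMessage msg = MyMessage.parseFrom(payload);

        // Step 2b: without a schema, dump the raw field/wire-type structure
        // (roughly the equivalent of `protoc --decode_raw`).
        UnknownFieldSet fields = UnknownFieldSet.parseFrom(payload);
        System.out.println(fields);
    }
}
```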
I want to know which file types can be used to load data into Apache Spark, for example CSV, txt, etc.
fileStream can accept any file type as long as you can provide an input format class that can convert it to records. To be useful, the input should be splittable and easy to parse without reading the whole file, but that is not a must-have as long as you can accept the performance penalty.
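For illustration, a minimal Spark Streaming sketch; the directory path and batch interval are just examples:

```java
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;

// textFileStream covers plain line-oriented text (txt, CSV, ...), while
// fileStream with an explicit InputFormat handles anything you can describe
// with a Hadoop input format class.
public class FileStreamExample {
    public static void main(String[] args) throws InterruptedException {
        SparkConf conf = new SparkConf().setAppName("file-stream-example").setMaster("local[2]");
        JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(10));

        // Simple case: line-oriented text files dropped into the directory.
        JavaDStream<String> lines = jssc.textFileStream("/data/incoming");
        lines.print();

        // General case: any format with a Hadoop InputFormat implementation.
        jssc.fileStream("/data/incoming", LongWritable.class, Text.class, TextInputFormat.class)
            .map(pair -> pair._2().toString())
            .print();

        jssc.start();
        jssc.awaitTermination();
    }
}
```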
I want to read document content from FileNet P8 in parallel to reduce my reading time. The issue is that I write into an OutputStream. Is there any way, or any API, by which I can parallelize my reads into an OutputStream? I am asking because I am sure IBM would have provided some way to do it.
Also, if my file is, say, 1 GB, sequential reads are going to be a performance hit.
I think that from a Document instance there is only one API to retrieve the content, accessContentStream, which gives you an InputStream. However, for reading huge files there is a newer utility class called ExtendedInputStream which you might be interested in.
An ExtendedInputStream is an input stream that can retrieve content at arbitrary positions within the stream. The ExtendedInputStream class includes methods that can read a certain number of bytes from the stream or read an unspecified number of bytes. The stream keeps track of the last byte position that was read. You can specify a position in the input stream to get to a later or earlier position within the stream.
More details at :
https://www.ibm.com/support/knowledgecenter/SSGLW6_5.2.1/com.ibm.p8.ce.dev.java.doc/com/filenet/api/util/ExtendedInputStream.html
Edit:
ExtendedInputStream was introduced in v5.2.1 and is not available if you are using an older version of P8.
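For what it's worth, here is a rough sketch of one way to parallelize the retrieval. The exact positioned-read methods of ExtendedInputStream are not shown because they depend on your API version (check the Javadoc linked above), so this version falls back to plain InputStream.skip(); the chunking scheme and thread pool are my own assumptions, not an IBM-provided mechanism:

```java
import com.filenet.api.core.Document;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.*;

// Each task opens its own content stream, positions itself at its range,
// reads just that range, and the chunks are then written to the
// OutputStream in order. If the shared Document instance turns out not to
// be thread-safe in your environment, fetch a fresh Document per task.
public class ParallelContentRead {

    static void copyInParallel(Document doc, long totalSize, OutputStream out, int parts)
            throws Exception {
        long chunkSize = (totalSize + parts - 1) / parts;
        ExecutorService pool = Executors.newFixedThreadPool(parts);
        List<Future<byte[]>> futures = new ArrayList<>();

        for (int i = 0; i < parts; i++) {
            final long offset = i * chunkSize;
            final int length = (int) Math.min(chunkSize, totalSize - offset);
            futures.add(pool.submit(() -> readRange(doc, offset, length)));
        }

        // Preserve ordering: write chunk 0, then 1, then 2, ...
        for (Future<byte[]> f : futures) {
            out.write(f.get());
        }
        pool.shutdown();
    }

    private static byte[] readRange(Document doc, long offset, int length) throws IOException {
        try (InputStream in = doc.accessContentStream(0)) {  // content element 0
            in.skipNBytes(offset);          // with ExtendedInputStream you could
                                            // seek to the position instead
            return in.readNBytes(length);
        }
    }
}
```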