How to extract documents from a FileNet database - filenet-p8

I am working on a project which requires extracting documents from a FileNet system. I need to extract documents identified by their Object_ID and store them in files. The system runs on Windows and uses an Oracle 11g database.
The question is: is there a way to retrieve document content using direct database access and SQL? Can I write an SQL query that retrieves the binary content of a document by passing its Object_ID as a parameter?
Thanks

Content does not have to be stored in the database. It can be, as a BLOB, but it can also be stored in FileStores, as files, or in Fixed Content Areas. If the content is stored in the database, technically you should be able to retrieve it with a query by GUID.
However, I would suggest using the Java API to retrieve content. That will let you handle all situations (all kinds of content areas, multiple content elements, and so on). I don't know how many documents you intend to export, but the export can be significantly optimized using the API (batching, multi-threading, and so on).
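As a rough illustration of that suggestion, here is a minimal sketch (not a definitive implementation) of fetching a single document by its Object_ID with the Content Engine Java API; the connection URL, credentials, object store name and GUID are placeholders:
import javax.security.auth.Subject;
import com.filenet.api.core.*;
import com.filenet.api.util.Id;
import com.filenet.api.util.UserContext;

// All connection details below are placeholders.
Connection conn = Factory.Connection.getConnection("http://ceserver:9080/wsi/FNCEWS40MTOM/");
Subject subject = UserContext.createSubject(conn, "username", "password", null);
UserContext.get().pushSubject(subject);
Domain domain = Factory.Domain.fetchInstance(conn, null, null);
ObjectStore os = Factory.ObjectStore.fetchInstance(domain, "OBJECTSTORE_NAME", null);
// the Object_ID of the document to extract (placeholder GUID)
Document doc = Factory.Document.fetchInstance(os, new Id("{00000000-0000-0000-0000-000000000000}"), null);
// doc.get_ContentElements() then gives access to the actual content
// (see the content-extraction sketch further down)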

I could help you with this task if you like.
Usually the content of FileNet is stored in a directory called /cestore on Windows, Linux or even AIX.
Due to restrictions on the number of files per directory, especially on Unix-based systems, the files are stored in a long tree-like structure such as fn01/fn03/fn04.
The name of each file usually has the format {DocumentId}.
So what you can do is scan all the files under /cestore with a library like Apache Commons IO (or, better, a Python script), store them in a Map keyed by document id, and then you will be able to get the path of any document.
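For example, a minimal Java sketch of that scan using Apache Commons IO (the /cestore root and keying the map by file name are assumptions carried over from the answer above):
import org.apache.commons.io.FileUtils;
import java.io.File;
import java.util.HashMap;
import java.util.Map;

// Walk the content store tree once and remember where every file lives.
Map<String, File> pathById = new HashMap<>();
for (File f : FileUtils.listFiles(new File("/cestore"), null, true)) { // null = any extension, true = recurse
    pathById.put(f.getName(), f); // the file name carries the document id
}
// pathById.get("{DocumentId}") now returns the file for that document, if present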

Answering an old question, but I thought it might serve as a quick help for someone. For the situation given here, IMHO, FileNet queries are the best solution. This is how you do it:
Domain domain = Factory.Domain.fetchInstance(conn, null, null);
ObjectStore objStore = Factory.ObjectStore.fetchInstance(domain, osName, null);
SearchScope search = new SearchScope(objStore);
// your doc class and identifier (index) go here
String sql1 = "SELECT * FROM DocClassName WHERE someIndex = 'abc456'";
SearchSQL searchSQL = new SearchSQL(sql1);
DocumentSet documents = (DocumentSet) search.fetchObjects(searchSQL, Integer.valueOf(20), null, Boolean.TRUE);
Iterator it = documents.iterator();
while (it.hasNext()) {
    Document doc = (Document) it.next();
    // go nuts on doc
}
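If the goal is to write each document's content to a file (as in the original question), the body of that loop could be extended roughly as follows. This is only a sketch: it assumes a single content element per document, the usual java.io imports, and a hypothetical C:\export target directory.
ContentElementList elements = doc.get_ContentElements();
ContentTransfer ct = (ContentTransfer) elements.iterator().next(); // only the first content element is handled here
try (InputStream in = ct.accessContentStream();
     OutputStream out = new FileOutputStream("C:\\export\\" + doc.get_Id() + ".bin")) {
    byte[] buf = new byte[8192];
    for (int n; (n = in.read(buf)) != -1; ) {
        out.write(buf, 0, n);
    }
}
// ct.get_RetrievalName() could be used instead of the Id if the original file name is preferred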

maybe this will help you:
There is a tool, FileNet Enterprise Manager (or just FEM if you prefer), where you can export documents (binaries) and their metadata.
From this tool you can run a SQL search, or build a search interactively, against your object store. Then you can select the results and export them to a local directory. As a result of these tasks you will have a directory with the binaries and some XML files. These XML files hold all the metadata from your database, like IDs and such.
Hope this helps you somehow.

Related

Query to read data from URL

I am using Greenplum Database.
I need to write a query, or maybe a function, to read data from a URL.
Say there is a SharePoint URL that contains some tabular data.
I need to write a query that fetches that data from within the SQL query or function.
I can get http_get, but it's not helpful because the version is 8.2x.
I also tried Python as a PL language, but that does not work either, as it is listed as an untrusted language. Hence I am looking for some alternative.
Have you tried using web external tables:
https://gpdb.docs.pivotal.io/5170/admin_guide/external/g-creating-and-using-web-external-tables.html

Serializing query result

I have a financial system with all of its business logic located in the database, and I have to code an automated workflow for batch processing of transactions, which consists of the steps listed below:
A user or an external system inserts some data in a table
Before further processing, a snapshot of this data has to be made in the form of a CSV file with a digital signature. The CSV snapshot itself and its signature have to be saved in the same input table. The program then updates the successfully signed rows to make them available for the further steps of the code
...further steps of code
The obvious trouble is step #2: I don't know how to assign the results of a query, as a BLOB representing a CSV file, to a variable. It seems like basic stuff, but I couldn't find it. The CSV format was chosen by the users because it is human-readable. The signing itself can be done with a request to an external system, so that is not an issue.
Restrictions:
there is no application server which could process the data, so I have to do it with PL/SQL
there is no way to save a local file, everything must be done on the fly
I know that normally one would do all the work on the application layer or with some local files, but unfortunately this is not the case.
Any help would be highly appreciated, thanks in advance
I agree with #william-robertson: you just need to create a comma-delimited values string (assuming a header row and data rows) and write it to a CLOB. I recommend an "insert" trigger. (There are lots of SQL tricks you can use to make that easier.) Use of that CSV string will need to be owned by the part of the application that reads it in and needs to do something with it.
I understand you stated that you need to create a CSV, but see if you could do XML instead. Then you could use DBMS_XMLGEN to generate the necessary snapshot into a database column directly from the query.
I do not accept the notion that a CSV is human-readable (actually try reading one sometime as straight text). What is true is that Excel displays it in a human-readable form. But Excel should also be able to display the XML in a human-readable way. Further, if needed, the data in it can be ported directly back into the original columns.
Just an alternative idea.

Recover data from encrypted btrieve file

I have a simple question, but it is a huge issue for me.
I need to recover the data in an encrypted Btrieve file for migration purposes, but I can't access the record structure.
Does someone know a technique for that, or an open-source program?
Thanks for any help or direction.
By "encrypted," do you mean it has an owner name or do you mean that when you open it in a text editor, it looks strange?
Btrieve data files require the Btrieve / Pervasive PSQL engine in order to be read. Once you have the engine, you can open it and read it. You'll still need to know the record layout (or guess) in order to extract meaningful data from it. Btrieve files do not store field metadata so any Btrieve tool will only see the record as a collection of bytes.
If you know the record structure, you can create a table definition using DDF Builder or the Pervasive Control Center, and then access the table using ODBC (or JDBC, ADO.NET, PDAC, ActiveX, or OLEDB) and extract the data using your favorite tool.
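For example, once a table definition exists, a rough JDBC sketch might look like the following; the driver class and URL format follow common Pervasive PSQL conventions but may differ by version, and the server, database and table names are placeholders:
import java.sql.*;

// Sketch only: extract rows from a Btrieve table exposed through a DDF definition.
Class.forName("com.pervasive.jdbc.v2.Driver"); // driver class name may differ by PSQL version
try (Connection con = DriverManager.getConnection("jdbc:pervasive://localhost:1583/DEMODATA");
     Statement st = con.createStatement();
     ResultSet rs = st.executeQuery("SELECT * FROM MyBtrieveTable")) {
    while (rs.next()) {
        // write the values to your migration target (CSV, another database, ...)
        System.out.println(rs.getString(1));
    }
}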

CSV viewer on a Windows environment for a 10MM-line file

We need a CSV viewer that can look at 10MM-15MM rows in a Windows environment, and each column should have some filtering capability (some regex or text searching is fine).
I strongly suggest using a database instead, and running queries (e.g., with Access). With proper SQL queries you should be able to filter on the columns you need to see, without handling such huge files all at once. You may need to have someone write a script to load each row of the CSV file (and future CSV file changes) into the database.
I don't want to be the end user of that app. Store the data in SQL. Surely you can define criteria to query on before generating a .csv file. Give the user an online interface with the column headers and filters to apply. Then generate a query based on the selected filters, providing the user only with the lines they need.
This will save many people time, headaches and eye sores.
We had this same issue and used a 'report builder' to build the criteria for the reports prior to actually generating the downloadable csv/Excel file.
As others have suggested, I would also choose a SQL database. It is already optimized to perform queries over large data sets. There are a couple of embedded databases such as SQLite or Firebird (embedded):
http://www.sqlite.org/
http://www.firebirdsql.org/manual/ufb-cs-embedded.html
You can easily import the CSV into a SQL database with just a few lines of code and then build SQL queries, instead of writing your own solution to filter large tabular data.
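As an illustration in Java (a sketch only, assuming the xerial sqlite-jdbc driver on the classpath; the file names, table layout and filter column are placeholders, and the naive split should be replaced by a real CSV parser for quoted fields):
import java.io.BufferedReader;
import java.io.FileReader;
import java.sql.*;

// Load the CSV into an on-disk SQLite database once, then filter with plain SQL.
try (Connection con = DriverManager.getConnection("jdbc:sqlite:rows.db");
     BufferedReader in = new BufferedReader(new FileReader("big.csv"))) {
    con.createStatement().execute("CREATE TABLE IF NOT EXISTS rows (col1 TEXT, col2 TEXT, col3 TEXT)");
    con.setAutoCommit(false); // batch the inserts for speed
    PreparedStatement ps = con.prepareStatement("INSERT INTO rows VALUES (?, ?, ?)");
    String line;
    while ((line = in.readLine()) != null) {
        String[] parts = line.split(",", -1); // naive split, placeholder for a proper CSV parser
        for (int i = 0; i < 3; i++) ps.setString(i + 1, i < parts.length ? parts[i] : null);
        ps.addBatch();
    }
    ps.executeBatch();
    con.commit();
    try (ResultSet rs = con.createStatement().executeQuery(
            "SELECT * FROM rows WHERE col2 LIKE '%searchterm%'")) {
        while (rs.next()) System.out.println(rs.getString(1) + "," + rs.getString(2));
    }
}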

Document Management (best strategy to implement)

I have a situation where users have a primary document (a purchase order) that will, throughout its life, have various other documents added to it. The documents could be email messages, word documents or anything else.
Right now the (clunky) solution is to print each document to PDF and then append it to the purchase order, which is stored as a PDF.
I'm thinking of using a database (keyed by PO number) and linking the documents to it. The only issue with this is getting the documents into a standard (PDF) format and linking them to the PO in the database. Any suggestions on a user-friendly way to do this?
If your intention is to store the PDFs externally, your best bet is to store the document with a file name containing the DocumentID generated from your Documents database table, as in
475833.PDF
You will need another table to collect all of the related documents together, like a binder table.
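A rough JDBC sketch of that structure (the table and column names, paths and JDBC URL here are purely illustrative, not taken from the answer):
import java.nio.file.*;
import java.sql.*;

// Register the new attachment, link it to the PO, and store the file under its generated id.
try (Connection con = DriverManager.getConnection("jdbc:your-database-url")) {
    PreparedStatement ins = con.prepareStatement(
            "INSERT INTO Documents (OriginalName, MimeType) VALUES (?, ?)",
            Statement.RETURN_GENERATED_KEYS);
    ins.setString(1, "quote.pdf");
    ins.setString(2, "application/pdf");
    ins.executeUpdate();
    ResultSet keys = ins.getGeneratedKeys();
    keys.next();
    long documentId = keys.getLong(1);

    // "binder" table tying every attachment to its purchase order
    PreparedStatement link = con.prepareStatement(
            "INSERT INTO PoDocuments (PoNumber, DocumentId) VALUES (?, ?)");
    link.setString(1, "PO-2012-0001");
    link.setLong(2, documentId);
    link.executeUpdate();

    // the file itself lives outside the database, named after the generated id
    Files.copy(Paths.get("C:/incoming/quote.pdf"), Paths.get("C:/postore/" + documentId + ".PDF"));
}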
Printing to PDF does have the advantage that it is not dependent on any particular application to produce the PDF; it will work in all applications. The trick is to find software that allows you to specify the file name programmatically. CutePDF does this using registry entries.
