Document Management (best strategy to implement)

I have a situation where users have a primary document (a purchase order) that will, throughout its life, have various other documents added to it. The documents could be email messages, Word documents or anything else.
Right now the (clunky) solution is to print each document to PDF and then append it to the purchase order, which is stored as a PDF.
I'm thinking of using a database (keyed by PO number) and linking the documents to it. The only issue with this is getting the documents into a standard (PDF) format and linking them to the PO in the database. Any suggestions on a user-friendly way to do this?

If your intention is to store the PDFs externally, your best bet is to store the document with a file name containing the DocumentID generated from your Documents database table, as in
475833.PDF
You will need another table to collect all of the related documents together, like a binder table.
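To make that concrete, here is a minimal sketch of a Documents table plus a binder table, with the file name derived from the generated DocumentID. All table and column names are hypothetical, and SQLite via JDBC (the sqlite-jdbc driver) merely stands in for whatever database you use:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class BinderSchema {
    public static void main(String[] args) throws Exception {
        // sqlite-jdbc assumed on the classpath; any database would do
        Connection conn = DriverManager.getConnection("jdbc:sqlite:documents.db");
        Statement st = conn.createStatement();
        // One row per stored document; the PDF on disk is named <DocumentID>.PDF
        st.executeUpdate("CREATE TABLE IF NOT EXISTS Documents ("
            + "DocumentID INTEGER PRIMARY KEY AUTOINCREMENT, "
            + "OriginalName TEXT, AddedOn TEXT)");
        // The binder table groups every related document under its PO number
        st.executeUpdate("CREATE TABLE IF NOT EXISTS Binder ("
            + "PONumber TEXT NOT NULL, "
            + "DocumentID INTEGER NOT NULL REFERENCES Documents(DocumentID))");
        long documentId = 475833; // in practice, the key generated by the insert
        String fileName = documentId + ".PDF"; // 475833.PDF, as above
        System.out.println(fileName);
        st.close();
        conn.close();
    }
}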
Printing to PDF does have the advantage that it is not dependent on any particular application to produce the PDF; it works from any application that can print. The trick is to find software that lets you specify the output file name programmatically. CutePDF can do this via registry entries.

Elasticsearch - Modelling video catalogue information into one index vs multiple indexes

I need to model a video catalogue composed of movies, TV shows, episodes, TV channels and live-program information in Elasticsearch. Some of these entities are correlated, some are not.
The attributes of these entities are quite different, even though there are some common ones.
Since I may need to query across entities (imagine a customer searching for something that could be a movie, a TV channel or a live event program), is it better to have a single index containing a generic entity marked with a logical type attribute, or multiple indexes, one for each entity (movie, show episode, channel, program)?
In addition, some of these entities, like movies, can have metadata attributes in multiple languages.
Coming from a relational data model, I would create different indexes, one for every entity, with a language-variant index for every language. Any suggestions for a better approach that gives good search performance and usability?
Whether to use several indexes very much depends on the application, so I cannot give a definite answer, just a few thoughts.
From my experience, indexes are more a means of easing maintenance and operations than a data-modeling device. It is, for example, much easier to delete an index than to delete all documents from one source out of a bigger index. And if you support totally separate search applications which do not query across each other's data, separate indexes are the way to go.
But when you want to query documents across data sources, as you do, it makes sense to keep them in one index, if only to have comparable ranking across all items in your index. Make sure to re-use fields across your data that have a similar meaning (title, year of production, artists, etc.). For fields unique to a source we usually use prefix-marked field names, e.g. movie_... for movie-only metadata.
As for languages, you need language-specific fields, like title_en, title_es, title_de. Ideally, at query time, you know your user's language (from the browser, or because they selected it explicitly) and can search the language-specific fields where available. Be sure to use the language-specific analyzers for these fields, at query time and at index time.
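To illustrate, here is a minimal sketch of such a single-index mapping: a keyword type discriminator, language-specific title fields using Elasticsearch's built-in english and spanish analyzers, and a prefix-marked movie-only field. The index name "catalog", the field names and the localhost address are all hypothetical; the mapping is simply PUT with Java's standard HTTP client:

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class CreateCatalogIndex {
    public static void main(String[] args) throws Exception {
        // "type" discriminates movie/episode/channel/program;
        // title_* fields get the built-in language analyzers;
        // movie_director is an example of a prefix-marked, movie-only field
        String mapping = """
            { "mappings": { "properties": {
                "type":           { "type": "keyword" },
                "title_en":       { "type": "text", "analyzer": "english" },
                "title_es":       { "type": "text", "analyzer": "spanish" },
                "year":           { "type": "integer" },
                "movie_director": { "type": "text" }
            } } }
            """;
        HttpRequest req = HttpRequest.newBuilder()
            .uri(URI.create("http://localhost:9200/catalog"))
            .header("Content-Type", "application/json")
            .PUT(HttpRequest.BodyPublishers.ofString(mapping))
            .build();
        HttpResponse<String> resp = HttpClient.newHttpClient()
            .send(req, HttpResponse.BodyHandlers.ofString());
        System.out.println(resp.body());
    }
}

At query time you would then search title_en or title_es depending on the user's language, so the same analyzer runs on both the indexing and the query side.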
I see a search engine a bit as the dual of a database: a database stores data but can also index it; a search engine indexes data but can also store it. A database tends to normalize the schema to remove redundancy; a search engine works best with denormalized data for query performance.

How to make data show as separate words instead of one word from XML in Jaspersoft ad hoc

I have a lookup table that is harvested from the XML file and not physically stored in the MySQL database. Because of that, the data come out as single run-together words when queried in Jaspersoft ad hoc, for example:
ridikill
peon
thegreat
All these lookups should read like so:
ridi kill
pe on
the great
How can I make the data show correctly, as separate words?
You are going to have some trouble doing this exclusively in the Ad-Hoc editor; it simply doesn't have this kind of functionality on its own. You could create a calculated field with the following code in the formula builder:
CaseWhen("RigType" == 'deepwaterdrillship', 'deep water drill ship', "RigType" == 'standardjackup', 'Standard Jack Up',"RigType"=='standardfloater','Standard Floater')
Replace all instances of "RigType" with your original field name. Obviously this will get quite manual if you have a lot of different strings.
If you create a calculated table in the domain/topic that you are using, with logic similar to the code above, that would be more powerful, since you could join it to your other tables. However, as Petter commented, this is a data-source problem, and in my experience it is always better to fix the source if possible.

How to extract documents from a FileNet database

I am working on a project which requires extracting documents from a FileNet system. I need to extract documents identified by their Object_ID and store them in files. The system runs on Windows and uses an Oracle 11g database.
The question is: is there a way to retrieve document content using direct database access and SQL? Can I write an SQL query that retrieves the binary content of a document by passing its Object_ID as a parameter?
Thanks
Content does not have to be stored in the database. It can be, as a BLOB, but it can also be stored in file stores (as files) or in fixed content areas. If the content is stored in the database, you should technically be able to retrieve it with a query by GUID.
However, I would suggest using the Java API to retrieve content. That will let you handle all situations (all kinds of content areas, multiple content elements, and so on). I don't know how many documents you intend to export, but the job can be significantly optimized using the API (batching, multi-threading, etc.).
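As a rough sketch of that API route (assuming the CE Java API jars on the classpath and an already-authenticated connection; the GUID parameter and file name are placeholders), fetching one document's content looks something like this:

import com.filenet.api.core.Document;
import com.filenet.api.core.Factory;
import com.filenet.api.core.ObjectStore;
import com.filenet.api.util.Id;
import java.io.FileOutputStream;
import java.io.InputStream;

public class ContentExport {
    // objStore: an ObjectStore obtained as in the query example further down
    static void exportOne(ObjectStore objStore, String guid) throws Exception {
        Document doc = Factory.Document.fetchInstance(objStore, new Id(guid), null);
        InputStream in = doc.accessContentStream(0); // first content element
        FileOutputStream out = new FileOutputStream(doc.get_Id() + ".bin");
        in.transferTo(out); // stream the content to a local file
        in.close();
        out.close();
    }
}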
I could help you with this task if you like.
Usually the content of FileNet is stored in a directory called /cestore, whether on Windows, Linux or even AIX.
Due to restrictions on the number of files per directory, especially on Unix-based systems, the files are stored in a long tree of subdirectories such as fn01/fn03/fn04, and the name of each file usually follows the format {DocumentId}.
So what you can do is scan all the files under /cestore, with a library like Apache Commons IO or a Python script, and store the results in a map keyed by DocumentId; then you can get the path of any document.
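A minimal sketch of that scan with the standard library (assuming, as above, that each file's name is its DocumentId, possibly with an extension):

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.Map;
import java.util.stream.Stream;

public class CeStoreScan {
    public static void main(String[] args) throws IOException {
        Map<String, Path> pathByDocId = new HashMap<>();
        // walk the whole fn01/fn03/... tree and index every regular file
        try (Stream<Path> files = Files.walk(Paths.get("/cestore"))) {
            files.filter(Files::isRegularFile)
                 .forEach(p -> pathByDocId.put(strip(p.getFileName().toString()), p));
        }
        System.out.println(pathByDocId.size() + " documents indexed");
    }
    // drop a trailing extension so the key is just the DocumentId
    static String strip(String name) {
        int dot = name.lastIndexOf('.');
        return dot < 0 ? name : name.substring(0, dot);
    }
}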
Answering an old question, but I thought it might act as a quick help for someone. For the situation given here, IMHO, FileNet queries are the best solution. This is how you do it:
Domain domain = Factory.Domain.fetchInstance(conn, null, null);
ObjectStore objStore = Factory.ObjectStore.fetchInstance(domain, osName, null);
SearchScope search = new SearchScope(objStore);
// your doc-class and identifier (index) go here; note the quoted string literal
String sql1 = "SELECT * FROM DocClassName WHERE someIndex = 'abc456'";
SearchSQL searchSQL = new SearchSQL(sql1);
DocumentSet documents = (DocumentSet) search.fetchObjects(searchSQL, Integer.valueOf(20), null, Boolean.TRUE);
// iterate the result set and go nuts on each doc
for (java.util.Iterator it = documents.iterator(); it.hasNext(); ) {
    Document doc = (Document) it.next();
    System.out.println(doc.get_Id());
}
Maybe this will help you:
There is a tool called FileNet Enterprise Manager, or just FEM if you prefer, from which you can export documents (binaries) and their metadata.
From this tool you can run a SQL search, or build a search with the tool, against your object store. Then you can select the results and export them to a local directory. As a result you will have a directory with the binaries and some XML files; the XML files hold all the metadata from your database, like IDs and such.
Hope this helps you somehow.

ADOX Rearrange Or Insert Columns Rather than Append them in Access Vb6, VB.Net or CSharp

I need to insert a field in the middle of the current fields of a database table. I'm currently doing this in VB6 but may get the green light to do it in .NET. Anyway, since Access gives you the ability to "insert" fields in a table, I'm wondering: is there a way to do this in ADOX? If I had to I could step back and use DAO, but I'm not sure how to do it there either.
If you're wondering why I want to do this: the application's database has changed over time, and I'm being asked to create an upgrade program for some of the installations with older versions.
Any help would be great.
This should not be necessary: use the correct list of fields in your queries to retrieve them in the required order.
BUT, if you really need to do it, the only way I know is to create a new table with the fields in the required order, read the data from the old table into the new one, delete the old table and rename the new table to the old name.
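If scripting that sequence through SQL suits you, here is a rough sketch. Jet SQL has no table rename, so this version copies twice instead; the table and field names are hypothetical, and the UCanAccess JDBC driver is assumed purely so the statements can be shown in runnable form (the same SQL can be executed from VB6/ADO just as well):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class InsertFieldRebuild {
    public static void main(String[] args) throws Exception {
        // assumes ucanaccess and its dependencies on the classpath; path is hypothetical
        Connection conn = DriverManager.getConnection("jdbc:ucanaccess://C:/data/app.mdb");
        Statement st = conn.createStatement();
        // 1. park the existing rows in a backup copy
        st.executeUpdate("SELECT * INTO OrdersBackup FROM Orders");
        st.executeUpdate("DROP TABLE Orders");
        // 2. recreate the table with NewField in the middle
        st.executeUpdate("CREATE TABLE Orders (OrderID LONG, NewField TEXT(50), Amount CURRENCY)");
        // 3. copy the data back; NewField stays NULL until the upgrade fills it
        st.executeUpdate("INSERT INTO Orders (OrderID, Amount) SELECT OrderID, Amount FROM OrdersBackup");
        st.executeUpdate("DROP TABLE OrdersBackup");
        st.close();
        conn.close();
    }
}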
I hear you: in Access the order of the fields is important.
If you need a comprehensive way to work with ADOX, your go-to place is Allen Browne's website. I used it to go from novice to pro in handling Access database changes. Here it is: www.AllenBrowne.com. Go to Access Tips, then scroll down to ADOX Code.
That is also where I normally refer people with doubts about the capabilities of Access as a database :)
In your case, you will juggle through creating a new table with the new field in the right position, copying data to the new table, applying properties to the fields, deleting the original table, and renaming the new table to the required (original) name.
That is the correct order. Do not apply field properties before copying the data; some index and key properties may not be applied when the fields already have data.
Over time I have automated this, so I just run an application that detects and implements the required changes for me. But that took A LOT of work-weeks.

Does Core Data/SQLite compress redundant information?

I want to use Core Data (probably with SQLite backing) to store a large database. Much of the string data will be the same across numerous rows. Does Core Data/SQLite see such redundancy and automatically save space in the db file?
Do I need to make sure that the same text in different rows is the same string object before adding it to the db? If so, how do I detect that a new piece of text matches something anywhere in the existing db?
No, Core Data does not attempt to analyze your data to avoid duplication. If you want to save 10 million objects with the same attributes, you'll get 10 million copies.
If you want to avoid creating duplicate instances, you need to do a fetch for matching instances before creating a new one. The general approach (sketched in code after the list) is:
1. Fetch objects matching the new data, according to whatever standard indicates a duplicate for your app. Use a predicate with the fetch that contains the attribute(s) that you don't want to duplicate.
2. If you find anything, either (a) update the instances you find with any new values you have, or (b) if there are no new values, do nothing.
3. If you don't find anything, create a new instance.
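Core Data itself would express step 1 as a fetch request with a predicate; since the backing store in question is SQLite, here is the shape of the fetch-or-create pattern sketched against plain SQL instead (table and column names hypothetical, sqlite-jdbc driver assumed):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class FetchOrCreate {
    // returns the row id of the matching row, inserting a new one only if none exists
    static long fetchOrCreate(Connection conn, String text) throws Exception {
        PreparedStatement find = conn.prepareStatement("SELECT id FROM items WHERE name = ?");
        find.setString(1, text);
        ResultSet rs = find.executeQuery();
        if (rs.next()) {
            return rs.getLong(1); // duplicate found: reuse it (update other values here if needed)
        }
        PreparedStatement ins = conn.prepareStatement("INSERT INTO items (name) VALUES (?)");
        ins.setString(1, text);
        ins.executeUpdate();
        ResultSet id = conn.createStatement().executeQuery("SELECT last_insert_rowid()");
        id.next();
        return id.getLong(1);
    }

    public static void main(String[] args) throws Exception {
        Connection conn = DriverManager.getConnection("jdbc:sqlite:store.db");
        conn.createStatement().executeUpdate(
            "CREATE TABLE IF NOT EXISTS items (id INTEGER PRIMARY KEY, name TEXT)");
        System.out.println(fetchOrCreate(conn, "hello"));
        System.out.println(fetchOrCreate(conn, "hello")); // reuses the first row
        conn.close();
    }
}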
Application-layer logic can help reduce space at the cost of application complexity.
Say your name field can contain either an integer or a string (SQLite's weak typing makes this easy to do).
If it's a string, that's the name right there.
If it's an integer, go look it up in a name table, using the int as the key.
Of course, you have to create that name table, either on the fly as data is inserted, or in a once-in-a-while trawl through the data for new names that are worth surrogating in this way.
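A toy illustration of that surrogate scheme, leaning on SQLite's weak typing (all table and column names hypothetical; the hard-coded key 1 is just for the demo):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class NameSurrogate {
    public static void main(String[] args) throws Exception {
        Connection conn = DriverManager.getConnection("jdbc:sqlite:names.db");
        Statement st = conn.createStatement();
        // lookup table for names that repeat often enough to be worth surrogating
        st.executeUpdate("CREATE TABLE IF NOT EXISTS names (id INTEGER PRIMARY KEY, name TEXT UNIQUE)");
        // the name column holds either a literal string or an integer key;
        // SQLite lets a single column mix types
        st.executeUpdate("CREATE TABLE IF NOT EXISTS people (name)");
        st.executeUpdate("INSERT INTO names (name) VALUES ('John Smith')");
        st.executeUpdate("INSERT INTO people (name) VALUES (1)");           // surrogate: int key
        st.executeUpdate("INSERT INTO people (name) VALUES ('Rare Name')"); // literal string
        // resolve on read: strings pass through, integers join to the lookup table
        ResultSet rs = st.executeQuery(
            "SELECT COALESCE(n.name, p.name) FROM people p " +
            "LEFT JOIN names n ON typeof(p.name) = 'integer' AND n.id = p.name");
        while (rs.next()) System.out.println(rs.getString(1));
        conn.close();
    }
}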
