Aspose Markup fields too wide - linq

I'm using the Aspose reporting engine and everything is working fine.
The issue I have is that my document model has some larger names in it. I didn't build the model and I would rather not create a new one just for reporting, but if that's my only option I will. Thought I'd check here first.
Example:
<<[NatureOfInjury]>><<[NatureOfInjuryOptionA]>>
Hurt Hit thumb with hammer
The markup for <<[NatureOfInjury]>> is wider in the Word document than the value that will end up going in there, and it's making it difficult to format the document.
Is there any way other than changing the object model to make the markup smaller, independent of the actual text values that will go in there?
Thanks very much in advance.

Unfortunately, there is no other way to populate a template tag with a data source field name using LINQ Reporting. You need to use the same name in the template document and the data source.
I work with Aspose as Developer Evangelist.

Related

Is it possible to store the raw data as plain text but have it show as rtf? When it saves to our database it has html tags which we don't want

I want the underlying data to be plain text, but the editing experience to be rtf-like.
The data is later passed to a SharePoint column that does not have rtf support (which I realize is easy to change), but for the sake of argument, I'd like SP to receive it as plain text.
Might be a contradiction but I saw some CKEditor methods and classes that suggested it might be possible. Like with the HtmlDataProcessor class. Any ideas?

Kofax Separate Main Invoice from Supporting Document without using Separator sheet

When a batch gets created, documents should get separated automatically without using a separator sheet or barcode separator.
How can I classify documents as invoices and supporting documents?
In our project we get many invoices with supporting documents, so the scanning person has to insert the separator sheets manually. To avoid this, we want to classify the supporting documents automatically.
In general the concept would be that you would enable separation in the project and then train your classes with examples to be used for the layout or content classifiers.
However, as I'm sure you've seen, the obstacle with invoices is that they are different enough between vendors that it would not reliably classify all to an Invoice class. Similarly with "Supporting Documents" which are likely to be very different from each other, so unfortunately there isn't a completely easy answer without separator sheets (or barcode stickers affixed to supporting docs).
What you might want to do is write code in one of the separation events, such as the Document_AfterSeparate event. Despite the name, the document has not yet been split at this point, but the classifiers have run. See Scripting Help topic "Server Script Events Sequence > Document Separation > Standard Document Separation" for more detail. Setting the SplitPage property on the CDocPage (pXDoc.CDoc.Pages.ItemByIndex(lPage).SplitPage) will allow you to use your own logic to determine which pages to separate.
For example if you know that you will always have single page invoices, you can split on the first page and classify accordingly. Or you can try to search for something that indicates the end of the invoice like "Total" or other characteristics. There is an example of how you can use locators to help separation in the Scripting Help topic "Script Samples > Use Locator Results for Standard Document Separation". The example uses a Barcode Locator, but the same concept works if you wanted to try it with a Format Locator or anything else.
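The "search for something that indicates the end of the invoice" idea can be sketched independently of Kofax. The following is only an illustration in Python, not project script code: split_points, page_texts and end_marker are hypothetical names, and the page text is assumed to be already available as plain strings. In a real project the returned indices would correspond to the pages you flag through SplitPage.

```python
def split_points(page_texts, end_marker="total"):
    """Return the indices of pages that should start a new document.

    Assumption: a document ends on the first page whose text contains
    end_marker (case-insensitive); the next page starts a new document.
    """
    starts = []
    starts_new_document = True  # the first page always starts a document
    for index, text in enumerate(page_texts):
        if starts_new_document:
            starts.append(index)
        # If this page closes the document, the next page opens a new one.
        starts_new_document = end_marker in text.lower()
    return starts


if __name__ == "__main__":
    pages = ["Invoice 4711, item list", "Total: 100.00", "Delivery note", "Photo"]
    print(split_points(pages))
```

This heuristic is deliberately naive: a supporting document that also contains the marker word would be split as well, so it would need to be combined with the classifier or locator results.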
Without separator sheets you will need smart classification software like Kofax Transformation Module (KTM). It's kind of expensive, so you will need to verify the cost savings and ROI.

CouchDB, all_docs and filter design documents with endkey

First, this question - filter design documents from all_docs - already seemed to be solved as described here:
https://plus.google.com/+JasonDeRose/posts/1iP5tu3wVqw
/mydb/_all_docs?endkey=%22_%22
and it worked at first. However, in a different setup (actually just a different deploy), the query suddenly returns only an empty collection []. It seems the ordering changed: without endkey="_" the full collection is returned (including design documents). I tried various combinations of endkey/startkey but cannot get the design documents filtered out again.
Finally I added a filter and switched to _changes?include_docs=true to load the initial documents. I also thought about defining a view, but don't like that this results in data replication and some inconveniences with the changes feed (needed in another context). The filter on the other hand will be executed for every document.
Is it a bug that endkey=%22_%22 doesn't work anymore and is there a more convenient, still working way?
/_all_docs is a special case for CouchDB. Instead of the normal Unicode Collation, it uses ASCII collation.
The '_' character in ASCII order shows up between uppercase letters and lowercase letters. So if your doc id starts with lowercase letters (default behaviour), they will show up after any design docs. If your doc ids start with uppercase letters, they will show up before design docs.
Try creating a document with an id of: "ABC" You will see it show up before the design doc and your trick to filter design docs would work in this case.
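To see why the endkey trick depends on the case of your ids, here is a small Python sketch that only simulates the ordering; it does not talk to CouchDB. The function name ids_up_to_endkey and the sample ids are made up for illustration, and it assumes the ASCII ordering described above with an inclusive endkey.

```python
def ids_up_to_endkey(ids, endkey="_"):
    """Simulate which ids an ASCII-ordered listing returns up to endkey.

    For pure-ASCII strings, Python's default string comparison matches
    ASCII byte order: uppercase letters < '_' < lowercase letters.
    """
    return [doc_id for doc_id in sorted(ids) if doc_id <= endkey]


if __name__ == "__main__":
    mixed = ["abc", "ABC", "_design/app", "Zed", "zed"]
    print(sorted(mixed))            # uppercase ids, then the design doc, then lowercase ids
    print(ids_up_to_endkey(mixed))  # only the uppercase ids come before "_"

    lowercase_only = ["abc", "zed", "_design/app"]
    print(ids_up_to_endkey(lowercase_only))  # nothing sorts before "_"
```

With lowercase-only ids nothing sorts before "_", which matches the empty result described in the question.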
However, I recommend you stop using the _all_docs view altogether. Instead use the normal view functionality. When you create a view, CouchDB automatically skips design docs for you. So if your view looked like:
function(doc) {
  emit(doc._id, null);
}
You could query this with no start or end key, and get all docs without design docs.
Also, please look at Unicode Collation order, this is the order all your other views will be in, and it's important to understand as you work with CouchDB. You can read all about it here:
http://docs.couchdb.org/en/stable/ddocs/views/collation.html

Classify documents with tags

I have a huge number of documents (mainly PDFs and Word docs) that I want to classify, so I can search over them according to certain tags. These tags could either be my own (I assign the tags to the document) or be extracted from the text.
I've just seen a post related to this (Classify data using Apache Mahout), but perhaps there is something even more simple.
Mahout might be overkill for your problem - but you can get a fairly quick, easy solution by using OpenNLP.
http://opennlp.sourceforge.net/api/index.html
Specifically, look at the opennlp.tools.doccat package. Essentially, you have to go through and manually tag a small(ish) set of the items for each category you desire. If they are really distinct, you can get away with a small sample size.
You can use the DocumentCategorizerME.train() static function to train on a collection of documents, where each requires a category tag and the text block to train on. Then, you can initialize the DocumentCategorizerME with the trained model and begin classifying all the rest of your documents.
Once you do this, you can (I think) write the model to a file so you don't have to ever do that again.
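The train-then-categorize workflow can be illustrated without OpenNLP. The sketch below is plain Python and is not the OpenNLP API: it swaps in a tiny naive Bayes bag-of-words classifier, and the names train and categorize as well as the sample texts are invented for the example. It only shows the shape of the process: tag a few samples per category, build a model, then classify the rest.

```python
import math
from collections import Counter, defaultdict


def train(samples):
    """Build a simple model from (category, text) pairs."""
    word_counts = defaultdict(Counter)  # category -> word frequencies
    doc_counts = Counter()              # category -> number of training docs
    vocabulary = set()
    for category, text in samples:
        words = text.lower().split()
        word_counts[category].update(words)
        doc_counts[category] += 1
        vocabulary.update(words)
    return {"word_counts": dict(word_counts),
            "doc_counts": doc_counts,
            "vocabulary": vocabulary}


def categorize(model, text):
    """Return the most likely category (naive Bayes, add-one smoothing)."""
    words = text.lower().split()
    total_docs = sum(model["doc_counts"].values())
    vocab_size = len(model["vocabulary"])
    best_category, best_score = None, -math.inf
    for category, counts in model["word_counts"].items():
        # Log prior plus the sum of smoothed log likelihoods per word.
        score = math.log(model["doc_counts"][category] / total_docs)
        total_words = sum(counts.values())
        for word in words:
            score += math.log((counts[word] + 1) / (total_words + vocab_size))
        if score > best_score:
            best_category, best_score = category, score
    return best_category


if __name__ == "__main__":
    tagged = [
        ("invoice", "invoice total amount due payment"),
        ("invoice", "payment due invoice number total"),
        ("contract", "agreement between parties terms signed"),
        ("contract", "parties agree to the terms of this agreement"),
    ]
    model = train(tagged)
    print(categorize(model, "total amount due"))
    print(categorize(model, "terms of agreement"))
```

Because the model is an ordinary dictionary here, it could be saved with pickle or similar, which mirrors the idea of writing the trained model to a file once and reusing it.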
This post on extracting keywords and classifying webpages is related and may be helpful. In your example it sounds like you can use tags in lieu of the keyword extraction piece (although you may want to use both in combination). Weka is easy to use, I would definitely recommend giving it a look.

Sharepoint list and a word doc

Hi
I have a Word template that is used as a content type. I want the document to be used to read and write data to a SharePoint list.
I also want this document to track changes. Is this possible?
Say User1 edits the doc and saves it.
User2 makes some changes and saves it.
Now, since I am actually "databinding" to a list, can I track changes?
And eventually I want to push this data to a record center.
You can track changes using the built in versioning tools for SharePoint. Give this link a read:
http://technet.microsoft.com/en-us/library/cc262378.aspx
Not easy to achieve, but if you did, let me know how. I tried something similar and it's easier in InfoPath, but I have no use for it at work. As for version control, that is available in SP out of the box; the answer above should do it.

Resources