I realize this is possible with the FileNet P8 API; however, I'm looking for a way to find the physical document path within the database. Specifically, there are two levels of subfolders in the file store, like FN01\FN13\DocumentID, but I can't find a reference to FN01 or FN13 anywhere.
You will not find the names of the folders anywhere in the FN databases. The folder structure is determined by a hashing function. Here is an excerpt from this page on filestores:
Documents are stored among the directories at the leaf level using a hashing algorithm to evenly distribute files among these leaf directories.
The IBM answer is correct, but only from the standpoint of the intended functionality.
If you really need to find the document file names and folder locations, disable your file store(s) by making the file store folder unavailable to Content Engine. I did that for each file store by simply renaming the root FN# folders to FN#a; for instance, FN3 became FN3a. Once done, I changed the top tree folder back. I used that method so the log files would not exceed the tool's maximum output. Any method that leaves the storage location (e.g. drive, share, etc.) accessible and searchable, but renders the individual files unavailable, should produce the same results.
Then, run the Content Engine Consistency Checker. It will provide you with a full list of all files, IDs and locations.
After that, you can match the entries to the OBJECT_ID fields in the database tables. In non-MSSQL databases, the byte ordering is reversed for the first few octets of the UUID. You need to account for that and fix the byte ordering to match the CCC output.
...needs to be byte reversed so that it can be queried upon in Oracle.
When querying on GUIDs, GUIDs are stored in byte-reversed form in Oracle and DB2 (not MS SQL), whereby the first three sections are pair reversed and the last two are left alone.
Thus, the same applies in reverse: to match the Content Consistency Checker output against the database, you must apply the same byte-order reversal.
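For illustration, here is a minimal Python sketch of that reversal; the GUID value is hypothetical, and whether you keep or strip the braces and dashes depends on how you query the OBJECT_ID column:

# Sketch: convert a GUID as displayed by FileNet Enterprise Manager / the
# Consistency Checker into the byte order stored in Oracle/DB2 object_id
# columns (first three sections pair-reversed, last two sections unchanged).
def to_db_byte_order(guid: str) -> str:
    parts = guid.strip("{}").split("-")
    def reverse_pairs(section: str) -> str:
        return "".join(reversed([section[i:i + 2] for i in range(0, len(section), 2)]))
    return "".join([reverse_pairs(p) for p in parts[:3]] + parts[3:]).upper()

# Hypothetical GUID, for illustration only
print(to_db_byte_order("{DEADBEEF-0123-4567-89AB-CDEF01234567}"))
# -> EFBEADDE2301674589ABCDEF01234567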
See this IBM Tech Doc and the answer linked below for details:
IBM Technote: https://www.ibm.com/support/pages/node/469173
Stack Answer: https://stackoverflow.com/a/53319983/1854328
More detailed information on the storage mechanisms is located here:
IBM Technote: "How to translate the unique identifier as displayed within FileNet Enterprise Manager so that it matches what is stored in the Oracle and DB2 databases"
I do not suggest using this for anything but catastrophic need, such as rebuilding and rewriting an entire file store that was horrendously corrupted by your predecessor when they destroyed an NTFS volume (or some similarly nasty situation).
It is a workaround to bypass the hashing that FileNet uses to obfuscate content information from anyone looking at the file system.
We are designing an LDAP schema (specifically for OpenDJ) and we primarily need to be able to search on the mail attribute. We don't need to do a substring search as the user would provide the whole email address when they log in.
We already have an index on the mail attribute. However, we are also considering sub-dividing the user directory by the first letter of the email address (so all users with an email address that starts with the letter A would be in an ou=A subdirectory under ou=users). The only value I can see in doing this is that when we search for a user by email, we can limit the baseDN of the search, reducing the scope of the search to approximately 1/26 of the entire directory.
My primary question is: does limiting the baseDN of an LDAP search like this provide any improvement in performance if the attribute already has an index? Do indexes take the baseDN into account, or are they indexed over the whole directory?
A secondary question, if I'm allowed: is there any other benefit to splitting the users directory by first letter (or any other arrangement), other than providing a more specific baseDN when searching?
What you are thinking about seems like premature optimization when you don't even know if you have a performance issue.
Also, indexes and query processing are not a standard part of LDAP; they are an implementation detail of the technology you are using.
In OpenDJ, an index is configured and maintained for a whole database backend.
The cost of a lookup in the email equality index and returning a single entry is the same whether you have 1 entry or 1 billion entries.
I have more than 20 years of experience with LDAP and directory services, and I've never seen a directory structured by splitting entries on the first letter of an attribute.
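To illustrate the point about the backend-wide index, here is a rough sketch using the ldap3 Python library; the host, credentials and DNs are hypothetical. The single equality filter on mail is what the index serves, regardless of how narrow the base DN is:

# Sketch only: an equality search on the indexed mail attribute over the whole
# ou=users subtree; the backend-wide index answers this without scanning entries.
from ldap3 import Server, Connection, SUBTREE

conn = Connection(Server("ldap://opendj.example.com:1389"),
                  user="cn=Directory Manager", password="password", auto_bind=True)
conn.search(search_base="ou=users,dc=example,dc=com",
            search_filter="(mail=alice@example.com)",
            search_scope=SUBTREE,
            attributes=["cn", "mail"])
print(conn.entries)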
I once (and only once) encountered a problem similar to the one you're anticipating -- essentially you've got so many records that searching for a record creates an unacceptable user experience. In my case, there were over a million customers in the directory. What is now a rather old iteration of IBM's Tivoli Directory Server had several bugs that meant searching the directory took minutes to accomplish (indexes or no indexes). No one wants to wait minutes to log in and pay their bill! And we were constrained to using IBM's LDAP server.
In that case, I used the e-mail address as the naming attribute when the account was created and never searched the directory. I.e., I'm cn=lisa#example.com,ou=customers,o=example within the directory. When I log in with lisa#example.com, the site programmatically builds the bind DN as "cn=" + userInput + ",ou=customers,o=example" and validates the supplied password instead of searching for my account.
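Here is a minimal sketch of that approach using the ldap3 Python library; the host and DN layout are hypothetical, and a real implementation should escape any DN-special characters in the user-supplied input:

# Sketch: build the bind DN from the login input and bind directly,
# instead of searching the directory for the account first.
from ldap3 import Server, Connection

def authenticate(email: str, password: str) -> bool:
    bind_dn = "cn=" + email + ",ou=customers,o=example"
    conn = Connection(Server("ldap://ldap.example.com:389"), user=bind_dn, password=password)
    ok = conn.bind()   # True only if the entry exists and the password matches
    conn.unbind()
    return ok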
I am looking for a method (or, even better, a DW table) that contains report properties such as Name, Description, Type, Location, etc.
I have looked through many tables but cannot find this information. I am working to build out a web portal that includes hyperlinks for all reports on the server.
Here is an example of the report properties I am looking for:
Unfortunately, the definitions you're looking for are not stored at the database level, which is super lame, but that's the way it is. They're stored in the RPD file and the web catalog at the OS level.
The webcatalog is located:
on 10G: OracleBIData/web/catalog/
on 11G:
$ORACLE_INSTANCE/bifoundation/OracleBIPresentationServicesComponent/catalog/
on 12c: $ORACLE_HOME\user_projects\domains\bi\bidata\service_instances\ssi\metadata\content\catalog where ssi is a service instance.
If you descend into one of those directory structures you'll see files that are named with a bunch of punctuation symbols, plus the name of the report they represent.
Just to clarify the "lame" storage: What the OP is asking for is in the presentation catalog; the RPD has nothing to do with it.
And to clarify even further: every object stored in the presentation catalog is physically represented by two files on disk: one file without a file extension, which holds the object's XML definition, and one file with an .atr extension, which contains the object's properties (what the OP is looking for) as well as the object's access permissions.
Ranting's fine, but please be precise ;-)
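For anyone who wants to poke at the catalog on disk, here is a rough Python sketch that pairs each object's definition file with its .atr properties file; the catalog root path is an assumption, so point it at whichever of the locations above matches your version:

# Sketch: walk a Presentation Catalog directory and list each object's XML
# definition file together with the .atr file that holds its properties.
import os

catalog_root = "/path/to/webcatalog/root"   # hypothetical; see the 10g/11g/12c paths above

for dirpath, _dirs, files in os.walk(catalog_root):
    for name in files:
        if name.endswith(".atr"):
            continue                          # properties file, paired with its object below
        if name + ".atr" in files:
            print(os.path.join(dirpath, name), "->", os.path.join(dirpath, name + ".atr"))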
For what it's worth, in E-Business Suite, tables start with XDO_
I am designing a few dimensions with multiple data sources and wonder what other people have done to align the multiple business keys per data source.
My Example:
I have 2 data sources - the Ordering system and the Execution system. The Ordering system has details about payment and what should happen; the Execution system has details on what actually happened (how long it took, who executed the order, etc.). Data from both systems is needed to create a single fact.
In both the Ordering and Execution systems there is a Location table. The business keys from both systems are mapped via an ESB. There are attributes in both systems that make up the complete picture of a single location: billing information is in the Ordering system, latitude and longitude are in the Execution system, and Location Name exists in both systems.
How do you design an SCD to accommodate changes to the dimension from both systems?
We follow a fairly strict Kimball methodology, FYI, but I am open to looking at everyone's solutions.
Not necessarily an answer but here are my thoughts:
You've already covered the real options in your comment. Either:
A. Merge it beforehand
You need some merge functionality in staging which matches the two (or more) records, creates a new common merge key and uses that in the dimension. This requires some form of lookup or reference to be stored in addition to the normal DW data.
OR
B. Merge it in the dimension
Put both records in the dimension and allow the reporting tool to 'merge' them by, for example, grouping by location name. This means you don't need prior merging logic; you just dump it in the dimension.
However, you have two constraints that I feel make the choice between A & B clearer.
Firstly, you need an SCD (Type 2, I assume). This means option B could get very complicated: when there is a change in one source record you have to go and find the other record and change it as well - very unpleasant for option B. You still need some kind of pre-stored key to link them, which means option B is no longer simple.
Secondly, given that you have two sources for one attribute (Location Name), you need some kind of staging logic to pick a single name when these don't match.
So given these two circumstances, I suggest that option A would be best - build some pre-merging logic, as the complexity of your requirements warrants it. A rough sketch of what that staging step could look like follows below.
You'd think this would be a common issue but I've never found a good online reference explaining how someone solved this before.
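For what it's worth, here is a rough Python sketch of what that option A staging step could look like; the structures and the naive name-based match (standing in for the ESB mapping) are assumptions, not a prescribed implementation:

# Sketch: match the two sources' location records in staging, mint one durable
# merge key per location, and build a single dimension row from both sources.
ordering_locs = [{"ord_key": "ORD-7", "name": "Main Depot", "billing_acct": "B-100"}]
execution_locs = [{"exec_key": "EX-42", "name": "Main Depot", "lat": 52.1, "lon": 4.3}]

dim_rows, merge_map, next_key = [], {}, 1
for o in ordering_locs:
    e = next((x for x in execution_locs if x["name"] == o["name"]), None)
    merge_map[("ordering", o["ord_key"])] = next_key
    if e:
        merge_map[("execution", e["exec_key"])] = next_key
    dim_rows.append({
        "merge_key": next_key,
        "location_name": o["name"],          # pick-a-name rule when the sources disagree
        "billing_acct": o["billing_acct"],   # from the Ordering system
        "latitude": e["lat"] if e else None, # from the Execution system
        "longitude": e["lon"] if e else None,
    })
    next_key += 1

print(dim_rows)   # rows for the Location dimension
print(merge_map)  # (source, business key) -> merge key, kept for later loads and SCD2 changes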
My thinking is actually very simple. First you need to decide which system is your master dataset for geo + location, and at what granularity.
My method would be:
DIM loading
Say below is my target
Dim_Location = {Business_key, Longitude, Latitude, Location Name}
Dictionary
Business_key = always maps to the master record from the source system (in this case the Execution system). Imagine the unique business key for this table is the combination (longitude, latitude).
Location Name = again, since we assume the Execution system is the master for our data, this value is sourced from the Execution system.
The above table is now loaded for Fact lookup.
Fact Loading
You already have an integrated record between the Execution system and the Ordering (billing) system. It's a straightforward lookup and load in staging, since the record carries the necessary geo_location combination.
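A rough sketch of that fact-side lookup, with hypothetical values, assuming the dimension is keyed on the (longitude, latitude) business key from the master Execution system:

# Sketch: resolve the staged fact record to the Location dimension's surrogate key.
dim_location = {(4.3, 52.1): {"location_sk": 1001, "location_name": "Main Depot"}}
staged_fact = {"order_id": "ORD-7", "longitude": 4.3, "latitude": 52.1, "amount": 250.0}

dim_row = dim_location[(staged_fact["longitude"], staged_fact["latitude"])]
fact_row = {"order_id": staged_fact["order_id"],
            "location_sk": dim_row["location_sk"],
            "amount": staged_fact["amount"]}
print(fact_row)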
Challenging scenarios
What if the Execution system has late-arriving records for orders?
What if the same geo_location points to multiple location names? That shouldn't be possible, but it is worth profiling the data for errors.
I'm trying to get the number of elements that match a search query.
The point is, the search will produce SizeLimitExceededException from time to time, and I would like to know exactly how many entries match the query. Therefore, counting the results obtained from the search is not an option.
Any ideas?
Thanks in advance :)
There is a reason for the size limit: it prevents clients from trawling the directory for object information, counting the number of objects, and so on. Trawling the directory is (a) a security risk and (b) a performance burden on old legacy server software.
There is also a time-limit that constrains the number of seconds a server may spend on a particular search which may come into play.
If all the entries (and no others) that match your search filter are subordinate to a single object, configure the server to support the numSubordinates attribute. That attribute (if supported) gives the number of objects subordinate to the entry on which it appears. This method requires that all, and only, the entries matching your search filter be stored subordinate to that object.
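Here is a rough sketch of reading that attribute with the ldap3 Python library; the host, credentials and parent DN are hypothetical, and the attribute must actually be supported and enabled on your server:

# Sketch: read numSubordinates from the parent entry with a base-scope search.
from ldap3 import Server, Connection, BASE

conn = Connection(Server("ldap://ldap.example.com:389"),
                  user="cn=Directory Manager", password="password", auto_bind=True)
conn.search(search_base="ou=people,dc=example,dc=com",
            search_filter="(objectClass=*)",
            search_scope=BASE,
            attributes=["numSubordinates"])
if conn.entries:
    print(conn.entries[0]["numSubordinates"])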
A plugin could be written to provide this functionality; a plugin often has root DN access to the server database, is often not subject to access controls, and as such may be able to count entries.
On recent, professional-quality servers, a DN could be created with the appropriate privileges to count the number of entries. An application could be furnished with this DN and its credentials for the sole purpose of counting entries.
I'm writing an online tax return filing application using MVC3 and EF 4.1. Part of the application requires that the taxpayer be able to upload documents associated with their return. The users will be able to come back days or weeks later and possibly upload additional documents. Prior to finally submitting their return the user is able to view a list of files that have been uploaded. I've written the application to save the uploaded files to a directory defined in the web.config. When I display the review page to the user I loop through the files in the directory and display it as a list.
I'm now thinking that I should be saving the files to the actual SQL Server as binary data in addition to saving them to the directory. I'm trying to avoid what if scenarios.
What if
A staff member accidentally deletes a file from the directory.
The file server crashes (other agencies use the same SAN as we do)
A staff member saves other files to the same directory; the taxpayer should not see those
Any other scenario that causes us to have to request another copy of a file from a taxpayer (Failure is not an option)
I'm concerned that saving to the SQL Server database will have dire consequences that I am not aware of since I've not done this before in a production environment.
There's a really good paper by Microsoft Research called To Blob or Not To Blob.
Their conclusion after a large number of performance tests and analysis is this:
if your pictures or documents are typically below 256K in size, storing them in a database VARBINARY column is more efficient
if your pictures or documents are typically over 1 MB in size, storing them in the filesystem is more efficient (and with SQL Server 2008's FILESTREAM attribute, they're still under transactional control and part of the database)
in between those two, it's a bit of a toss-up depending on your use case
If you decide to put your pictures into a SQL Server table, I would strongly recommend using a separate table for storing those pictures - do not store the employee photo in the employee table - keep them in a separate table. That way, the Employee table can stay lean and mean and very efficient, assuming you don't always need to select the employee photo, too, as part of your queries.
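If it helps, here is a rough sketch of writing an uploaded file into such a separate table; it uses Python and pyodbc purely for illustration (the table, columns and connection string are assumptions), and the MVC3/EF application would do the equivalent in .NET:

# Sketch: store an uploaded file's bytes in a dedicated documents table,
# keeping the main return/entity table lean.
import pyodbc

conn = pyodbc.connect("DRIVER={ODBC Driver 17 for SQL Server};"
                      "SERVER=dbserver;DATABASE=TaxFiling;Trusted_Connection=yes")

def save_document(return_id, file_name, path):
    with open(path, "rb") as f:
        data = f.read()   # bytes map to a VARBINARY(MAX) column
    cur = conn.cursor()
    cur.execute("INSERT INTO dbo.ReturnDocuments (ReturnId, FileName, Content) VALUES (?, ?, ?)",
                return_id, file_name, data)
    conn.commit()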
For filegroups, check out Files and Filegroup Architecture for an intro. Basically, you would either create your database with a separate filegroup for large data structures right from the beginning, or add an additional filegroup later. Let's call it "LARGE_DATA".
Now, whenever you have a new table to create which needs to store VARCHAR(MAX) or VARBINARY(MAX) columns, you can specify this file group for the large data:
CREATE TABLE dbo.YourTable
(....... define the fields here ......)
ON Data -- the basic "Data" filegroup for the regular data
TEXTIMAGE_ON LARGE_DATA -- the filegroup for large chunks of data
Check out the MSDN intro on filegroups, and play around with it!