Multiple sub-agents for one table in Net-SNMP

I'm writing a custom MIB to expose a table over SNMP. There will be one table with a fixed set of columns but a variable number of rows. Is it possible, with Net-SNMP, to add rows to the table from multiple processes (e.g. process A creates row 1, process B creates row 2, etc.)? I would like to avoid having one "master sub-agent" if possible (other than something that is already part of Net-SNMP, like snmpd/snmptrapd/etc.).
I would like to use mib2c to help generate code if possible, but I can work around that if it can't accomplish what I need.
I'm using Net-SNMP 5.5 at the moment. Upgrading is possible if support for what I need is added in newer versions.

When writing AgentX sub-agents for snmpd, it looks like you cannot share the table OID across two or more sub-agents; snmpd responds with an error that the OID is a duplicate for some of the sub-agents. So I am continuing with my own sub-sub-agents (based on Enduro/X) which collect the data into a single AgentX sub-agent that fills the SNMP table.
According to RFC 2741, section 7.1.4.1 (https://www.rfc-editor.org/rfc/rfc2741.html#section-7.1.4.1):
7.1.4.1. Handling Duplicate and Overlapping Subtrees
As a result of this registration algorithm there are likely to be
duplicate and/or overlapping subtrees within the registration data
store of the master agent. Whenever the master agent's dispatching
algorithm (see section 7.2.1, "Dispatching AgentX PDUs") determines
that there are multiple subtrees that could potentially contain the
same MIB object instances, the master agent selects one to use,
termed the 'authoritative region', as follows:
1) Choose the one whose original agentx-Register-PDU r.subtree
contained the most subids, i.e., the most specific r.subtree.
Note: The presence or absence of a range subid has no bearing
on how "specific" one object identifier is compared to another.
2) If still ambiguous, there were duplicate subtrees. Choose the
one whose original agentx-Register-PDU specified the smaller
value of r.priority.
So in the best case, if the same OID is registered from different AgentX processes, data will effectively be collected from one sub-agent or the other according to the rules above, not merged from both.
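To make the consequence of that rule concrete, here is a small, purely illustrative Python sketch of the selection logic; the OID, sub-agent names and priority values are made up, and this is not Net-SNMP API code:

```python
# Illustrative only: how the master agent picks one "authoritative region"
# among duplicate/overlapping registrations (RFC 2741, section 7.1.4.1).

def authoritative_region(registrations):
    """registrations: list of (subagent, subtree_oid, priority) tuples,
    where subtree_oid is a tuple of sub-identifiers."""
    # Rule 1: prefer the most specific subtree (most sub-identifiers).
    # Rule 2: if still tied, prefer the smaller priority value.
    return min(registrations, key=lambda r: (-len(r[1]), r[2]))

regs = [
    ("processA", (1, 3, 6, 1, 4, 1, 8072, 9999, 1), 127),  # hypothetical table OID
    ("processB", (1, 3, 6, 1, 4, 1, 8072, 9999, 1), 100),  # same subtree, lower priority
]
print(authoritative_region(regs))  # ('processB', ..., 100) -- only one registration wins
```

In other words, for any given instance the master agent dispatches to exactly one sub-agent, which is why splitting one table's rows across sub-agents that all register the same table OID does not work.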

Related

Design a dimension with multiple data sources

I am designing a few dimensions with multiple data sources and wonder what other people have done to align the multiple business keys per data source.
My Example:
I have two data sources - the Ordering System and the Execution System. The Ordering System has details about payment and what should happen; the Execution System has details on what actually happened (how long it took, who executed the order, etc.). Data from both systems is needed to create a single fact.
In both the Ordering and Execution systems there is a Location table. The business keys from both systems are mapped via an ESB. There are attributes in both systems that make up the complete picture of a single location: billing information is in the Ordering system, latitude and longitude are in the Execution system, and Location Name exists in both systems.
How do you design an SCD to accommodate changes to the dimension from both systems?
We follow a fairly strict Kimball methodology (FYI), but I am open to looking at everyone's solutions.
Not necessarily an answer but here are my thoughts:
You've already covered the real options in your comment. Either:
A. Merge it beforehand
You need some merge functionality in staging which matches the two (or more) records, creates a new common merge key and uses that in the dimension. This requires some form of lookup or reference to be stored in addition to the normal DW data.
OR
B. Merge it in the dimension
Put both records in the dimension and allow the reporting tool to 'merge' them by, for example, grouping by location name. This means you don't need any prior merging logic; you just dump the records into the dimension.
However, you have two constraints that I feel make the choice between A and B clearer.
Firstly, you need an SCD (Type 2, I assume). This makes Option B complicated: when there is a change in one source record, you have to go and find the other record and change it as well. You still need some kind of pre-stored key to link them, which means Option B is no longer simple.
Secondly, given that you have two sources for one attribute (Location Name), you need some kind of staging logic to pick a single name when the two don't match.
So given these two circumstances, I suggest that option A would be best - build some pre-merging logic, as the complexity of your requirements warrants it.
You'd think this would be a common issue but I've never found a good online reference explaining how someone solved this before.
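To make Option A a bit more concrete, here is a rough Python sketch of the kind of staging merge described above; the field names, the esb_key_map structure and the merge-key format are all assumptions for illustration, not part of the question:

```python
# Hypothetical staging-merge sketch for Option A: one dimension row per
# location, built from the Ordering and Execution systems via the ESB key map.

def merge_locations(ordering_rows, execution_rows, esb_key_map):
    """ordering_rows / execution_rows: dicts keyed by each system's business key.
    esb_key_map: {ordering_key: execution_key}, as maintained by the ESB."""
    staged = {}
    for ord_key, exec_key in esb_key_map.items():
        o = ordering_rows.get(ord_key, {})
        e = execution_rows.get(exec_key, {})
        merge_key = f"{ord_key}|{exec_key}"      # the stored reference linking the sources
        staged[merge_key] = {
            "merge_key": merge_key,
            # one rule for the shared attribute: prefer the Execution name if present
            "location_name": e.get("location_name") or o.get("location_name"),
            "latitude": e.get("latitude"),
            "longitude": e.get("longitude"),
            "billing_info": o.get("billing_info"),
        }
    return staged
```

The stored merge key is the extra lookup/reference mentioned under Option A: when a Type 2 change arrives from either system, it can find the current dimension row through that key.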
My thinking is actually fairly simple. First you need to decide which system is your master dataset for Geo + Location, and at what granularity.
My method will be:
DIM loading
Say below is my target
Dim_Location = {Business_key, Longitude, Latitude, Location Name}
Dictionary
Business_key = always maps to the master record from the source system (in this case, the Execution system). Imagine that the business's unique key for this table is the combination (longitude, latitude).
Location Name = again, since we assume the Execution system is the master for our data, this attribute is sourced from Source = "Execution System".
The above table is now loaded for Fact lookup.
Fact Loading
You already have integrated records between the Execution system and the Ordering (billing) system, so it's a straightforward lookup and load in staging, since each record carries the necessary geo_location combination.
Challenging scenarios
What if the Execution system has late-arriving records for orders?
What if the same geo_location points to multiple location names? This should not be possible, but it is worth profiling the data for such errors.
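A rough, hypothetical sketch of that load order (column names, surrogate keys and the in-memory structures are assumptions for illustration):

```python
# Illustrative sketch of the load order described above; names are hypothetical.

dim_location = {}   # {(longitude, latitude): surrogate_key}

def load_dim(execution_rows):
    """Dimension load: the Execution system is master, keyed on (longitude, latitude)."""
    for i, row in enumerate(execution_rows, start=1):
        dim_location[(row["longitude"], row["latitude"])] = i  # assign surrogate key

def load_fact(integrated_rows):
    """Fact load: a straight lookup on the geo key already present on each record."""
    facts = []
    for row in integrated_rows:
        sk = dim_location.get((row["longitude"], row["latitude"]))
        if sk is None:
            continue  # e.g. late-arriving dimension row; park for reprocessing
        facts.append({"location_sk": sk, "order_id": row["order_id"]})
    return facts
```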

Create subsets for certain Resources to better fit existing data model?

We are trying to implement a FHIR REST server for our application. In our current data model (and thus live data), several FHIR resources are represented by multiple tables; e.g. what would all be Observations is stored in separate tables for vital values, laboratory values and diagnoses. Each table has an independent, auto-incrementing primary ID, so there are entries with the same ID in different tables. But for GET or DELETE calls to the FHIR server a unique ID is needed. What would be the most sensible way to handle this?
Searching didn't reveal an inherent way of doing this, so I'm considering these two options:
Add a prefix to all (or just the problematic) table IDs, e.g. lab-123 and vit-123
Add a UUID to every table and use that as the logical identifier
Both have drawbacks: the first needs an ID parser, and the second requires multiple database calls to identify the correct record.
Is there a FHIR way that allows to split a resource into several sub-resources, even in the Rest URL? Ideally I'd get something like GET server:port/Observation/laboratory/123
Server systems will have all sorts of different divisions of data in terms of how data is stored internally. What FHIR does is provide an interface that tries to hide those variations. So Observation/laboratory/123 would be going against what we're trying to do - because every system would have different divisions and it would be very difficult to get interoperability happening.
Either of the options you've proposed could work. I have a slight leaning towards the first, because it doesn't involve changing your persistence layer and it's a relatively straightforward transformation to convert between the external/FHIR and internal IDs.
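For what it's worth, a minimal sketch of the ID parser the first option implies might look like this; the prefixes, table names and helper functions are assumptions, not a FHIR library API:

```python
# Hypothetical sketch of the prefix scheme from option 1: "lab-123" / "vit-123"
# map back to an internal table plus its auto-incrementing ID.

PREFIX_TO_TABLE = {
    "lab": "laboratory_values",
    "vit": "vital_values",
    "dia": "diagnoses",
}

def parse_logical_id(logical_id: str) -> tuple[str, int]:
    """Split a FHIR logical id like 'lab-123' into (internal table, internal id)."""
    prefix, sep, raw_id = logical_id.partition("-")
    if not sep or prefix not in PREFIX_TO_TABLE or not raw_id.isdigit():
        raise ValueError(f"unknown Observation id: {logical_id}")
    return PREFIX_TO_TABLE[prefix], int(raw_id)

def to_logical_id(table: str, internal_id: int) -> str:
    """Inverse mapping, used when rendering resources outward."""
    prefix = {v: k for k, v in PREFIX_TO_TABLE.items()}[table]
    return f"{prefix}-{internal_id}"
```

FHIR logical ids allow letters, digits, '-' and '.', so prefixed ids like lab-123 remain valid resource ids.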
Is there a FHIR way that allows to split a resource into several
sub-resources, even in the Rest URL? Ideally I'd get something like
GET server:port/Observation/laboratory/123
What would this mean for search? For example, what would /Observation?code=xxx search through? Would that search labs, vitals etc. combined, or would you just allow access on /Observation/laboratory?
If these are truly "silos", maybe you could use http://servername/lab/Observation (i.e. swap the last two path parts), which suggests your server has multiple "endpoints" for the different kinds of observations. I think more clients will be able to handle that URL than the one you suggested.
Still, I think the best choice is one of your two original options, of which the first is indeed the easiest to implement.

Handling Duplicates in Data Warehouse

I was going through the below link for handling Data Quality issues in a data warehouse.
http://www.kimballgroup.com/2007/10/an-architecture-for-data-quality/
"
Responding to Quality Events
I have already remarked that each quality screen has to decide what happens when an error is thrown. The choices are: 1) halting the process, 2) sending the offending record(s) to a suspense file for later processing, and 3) merely tagging the data and passing it through to the next step in the pipeline. The third choice is by far the best choice.
"
In some dimensional feeds (like the Client list), we sometimes get the same Client twice, with the two records differing in certain attributes. What is the best solution in this scenario?
I don't want to reject both records (as that would mean incomplete client data).
The source systems are very slow in fixing the issue, so we get the same problems every day. That means a manual fix is also tough, as it has to be applied every day (we receive the client list daily).
Selecting a single record is not possible as we don't know what the correct value is.
Having both records in our warehouse disrupts our joins: with two rows for the same ID, the fact table rows are doubled in a join.
Any thoughts?
What is the best solution in this scenario?
There are a lot of permutations and combinations in your scenario. The big question is: are the differing details valid or invalid? This will change how you deal with them.
Valid data example: Record 1 has John Smith living at 12 Main St, Record 2 has John Smith living at 45 Main St. This is valid because John Smith moved address between the first and second records. If the data is valid, you have options such as creating a slowly changing dimension and tracking the changes (end-date the old record, start-date the new record).
Invalid data example: However, if the data is INVALID (e.g. your system somehow creates duplicate keys incorrectly), then your options are different. I doubt you want to surface this data, as it's currently invalid and, as you pointed out, you have no way to identify which duplicate record is "correct". But you don't want your whole load to fail or halt.
In this instance you would usually:
Push these duplicate rows to a "Quarantine" area
Push an alert to the people who have the power to fix this operationally
Optionally select one of the records randomly as the "golden detail" record (so your system will still tally with totals) and mark an attribute on the record saying that it's "Invalid" and under investigation.
The point Kimball is making is that Option 1 is not desirable because it halts your entire system for errors that will inevitably happen, and Option 2 isn't ideal because your aggregations will appear out of sync with your source systems; so Option 3 is the most desirable, as it still leads to a data fix but doesn't halt the process or the use of the data (while alerting users that the data is suspect).
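As an illustration of the quarantine-plus-golden-record handling above, here is a rough Python sketch; the column names and the dq_status tag are assumptions, not from the source article:

```python
# Illustrative sketch: split the daily client feed into clean rows, quarantined
# duplicates, and one arbitrarily chosen "golden" row per duplicated client id.

def screen_client_feed(rows):
    by_id = {}
    for row in rows:
        by_id.setdefault(row["client_id"], []).append(row)

    clean, quarantine, golden = [], [], []
    for client_id, group in by_id.items():
        if len(group) == 1:
            clean.append(group[0])
        else:
            quarantine.extend(group)                    # keep all versions for investigation
            chosen = dict(group[0])                     # arbitrary pick so totals still tally
            chosen["dq_status"] = "INVALID_DUPLICATE"   # tag it instead of halting the load
            golden.append(chosen)
    return clean, quarantine, golden
```

The quarantine set is what you would surface to the people who can fix the issue operationally.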

Search parameters with sort on DeviceObservationReport

Given a Resource such as DeviceObservationReport, a number of fields have cardinality 0..many. In some cases these contain reference(s) to other Resource(s) which may also have cardinality 0..many. I am having considerable difficulty in deciding how to support 'chained' queries over referenced Resources which may be two or three steps 'deep' (for want of a better term).
For example, in a single DeviceObservationReport there may be multiple Observation Resource references. It is entirely probable that a client may wish to perform a query which requests all instances of an Observation with a specific code, which have a timestamp (appliesDate) later than a specific instant in time. The named Search Parameter observation would appear to be the obvious starting point and the Path to the observation is specified as virtualDevice.channel.metric.observation. Given that the virtualDevice, channel, and metric fields have cardinality 0..*, would a 'simple' query to retrieve all DeviceObservationReport instances which contain observations with code TESTCODE and observed later than 14:00 on 10 October 2014 look something like:
../data/DeviceObservationReport?virtualDevice[0].channel[0].metric[0].observation.name=TESTCODE&virtualDevice[0].channel[0].metric[0].observation.date>2014-10-10%2014:00
Secondly, if the client requests that the result set be sorted on date, how would that be expressed in the query? From the various attempts I have made to implement this, support for the query becomes rather more complex at this point, and thus far I have not been able to come up with a satisfactory solution.
Firstly, the path for the parameter is the path within the resource, and chaining links between the defined search parameter names. So your query would look like this:
../data/DeviceObservationReport?observation.name=TESTCODE&observation.date=>2014-10-10%2014:00
i.e. the search parameters are aliases within the resource. However, the problem with this search is that the parameters are ANDed at the root, not the leaf - which means it finds all device observation reports that have an observation with code TESTCODE and that have an observation with date > DATE, which is subtly different from what you probably want: all device observation reports that have an observation with code TESTCODE and a date > DATE. This will be addressed in the next major release of FHIR.
Sorting is tough with a chained query. I ended up extracting the field that I sort by, but not actually sorting by it at match time - I insert the raw matches into a holding table, and then sort by the sort field as I access that secondary table. The principal reason for this is to make paging robust against ongoing changes to the resources.
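A very rough sketch of that holding-table approach, with hypothetical names (this is not the actual server code):

```python
# Capture the sort key while matching, stash (id, sort_key) in a holding
# structure, and sort only when paging over the snapshot.

def run_sorted_search(matches, page_size=50, offset=0):
    """matches: iterable of (report_id, observation_date) produced by the
    chained search; the date is extracted here but only used for ordering."""
    holding = list(matches)                       # the "holding table" of raw matches
    holding.sort(key=lambda m: m[1])              # sort by the extracted field
    page = holding[offset:offset + page_size]     # stable paging over a snapshot
    return [report_id for report_id, _ in page]
```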

Obtaining number of rows written to multiple targets in Informatica Power Center

I have a requirement where I need to obtain the number of rows written to multiple targets in my mapping. There are 3 targets in my mapping (T1, T2 and T3). I need the number of rows written to each target separately. These values need to be used in subsequent sessions.
I understand that there is a method where I can use separate counters and write them to a flat file and perform a lookup on this file in subsequent mappings. However, I am looking for a direct and better approach to this problem.
You can use the $PMTargetName#numAffectedRows built-in variables. In your case they would be something like:
$PMT1#numAffectedRows
$PMT2#numAffectedRows
$PMT3#numAffectedRows
Please refer to An ETL Framework for Operational Metadata Logging for details.
