I want to get only the CIs that are at the next level of a specified CI in uCMDB. Is it possible to call getAllCIs by specifying depth? Is there a more efficient way?
Related
I want to integrate fingerprint sensor in my project. For the instance I have shortlisted R307, which has capacity of 1000 fingerprints.But as project requirement is more then 1000 prints,so I will going to store print inside the host.
The procedure I understand by reading the datasheet for achieving project requirements is :
I will register the fingerprint by "GenImg".
I will download the template by "upchr"
Now whenever a fingerprint come I will follow the step 1 and step 2.
Then start some sort of matching algorithm that will match the recently downloaded template file with
the template file stored in database.
So below are the points for which I want your thoughts
Is the procedure I have written above is correct and optimized ?
Is matching algorithm is straight forward like just comparing or it is some sort of tricky ? How can
I implement that.Please suggest if some sort of library already exist.
The sensor stores the image in 256 * 288 pixels and if I take this file to host
at maximum data rate it takes ~5(256 * 288*8/115200) seconds. which seems very
large.
Thanks
Abhishek
PS: I just mentioned "HOST" from which I going to connect sensor, it can be Arduino/Pi or any other compute device depends on how much computing require for this task, I will select.
You most probably figured it out yourself. But for anyone stumbling here in future.
You're correct for the most part.
You will take finger image (GenImg)
You will then generate a character file (Img2Tz) at BufferID: 1
You'll repeat the above 2 steps again, but this time store the character file in BufferID: 2
You're now supposed to generate a template file by combining those 2 character files (RegModel).
The device combines them for you, and stores the template in both the character buffers
As a last step; you need to store this template in your storage (Store)
For searching the finger: You'll take finger image once, generate a character file in BufferID : 1 and search the library (Search). This performs a linear search and returns the finger id along with confidence score.
There's also another method (GR_Identify); does all of the above automatically.
The question about optimization isn't applicable here, you're using a 3rd party device and you have to follow the working instructions whether it's optimized or not.
The sensor stores the image in 256 * 288 pixels and if I take this file to host at maximum data rate it takes ~5(256 * 288*8/115200) seconds. which seems very large.
I don't really get what you mean by this, but the template file ( that you intend to upload to your host ) is 512 bytes, I don't think it should take much time.
If you want an overview of how this system is implemented; Adafruit's Library is a good reference.
(note: this is related to a question I posted before
H2O (open source) for K-mean clustering)
I am using K-Means for our data set of about 100 features (some of them are timestamps)
(1) I checked the “OUTPUT - CLUSTER MEANS” section and the timestamp filed is with the value like “1.4144556086883196e+22”. Our timestamp file is about data in year 2018 and the year 2018 Unix time is like “1541092918000”. Hence, it cannot be that big number “1.4144556086883196e+22”. My understand of the numbers in “OUTPUT - CLUSTER MEANS” section should be close to the raw data (before standarization). Right ?
(2) About standardization, can you use this example https://github.com/h2oai/h2o-3/blob/master/h2o-genmodel/src/test/resources/hex/genmodel/algos/kmeans/model.ini#L21-L27 and tell me how the input data is converted to standardized value? Say, I have a raw vector of value ( a,b,c,d, 1.8 ) , I only keep last element and omit others. How can I know if it’s close to center 2 below in this example. Can you show me how H2O convert the raw data using standardize_means, standardize_mults and standardize_modes. I am sure H2O has a way to compute standardized value from the model output, but I cannot find the place and the formula.
center_2 = [2.0, 0.0, -0.5466317772145349, 0.04096506994984166, 2.1628815416218337]
Thanks.
1) I'm not sure where you are seeing a timestamp in Flow or if you mean your dataset contains a timestamp that H2O-3 has converted. Either way it sounds like you may have encountered a bug. The timestamps you see in H2O-3 are milliseconds since the Unix epoch, so you have to divide by 1000 before using a unix time converter (for example you could use https://currentmillis.com/). But again given that the number is so large, I'm leaning towards a bug - any code you can provide to make it reproducible would be great.
1a) When you check standardize in flow in addition to “OUTPUT - CLUSTER MEANS” (which is not standardized) you will see "OUTPUT - STANDARDIZED CLUSTER MEANS" so the non-standardize output should reflect the unit of your input.
2) Standardization in H2O-3 is described here (which says: "standardizes numeric columns to have zero mean and unit variance. "). The link you provided points to a model for testing that has been saved as MOJO and I'm not sure it makes sense to use as an example. But in general the way standardization works for h2o-3 is as standardization is defined.
I'm very experienced with Apache Camel and EIPs and am struggling to understand how to implement equivalents in Nifi. I understand that Nifi uses a different paradigm (flow based programming) but I don't think what I'm trying to do is unreasonable.
In a nutshell I want the contents of each file to be sent to many rest services and I want to aggregate the responses into a single document which will stored in elasticsearch. I might also do some further processing and cleanup to improve what is stored (but this isn't my immediate issue)
The screenshot is a quick mock-up of what I'm trying to achieve but I don't understand enough about Nifi to know how to implement this pattern correctly.
If you are going to take a single piece of data and then fork to multiple parts of the flow and then converge back, there needs to be a way for MergeContent to know which pieces go together.
There are generally two ways this can be done...
The first is using MergeContent in "defragment mode". Think of this as reversing a split operation that was performed by one of the split processors like SplitText. For example, you split a file of 100 lines into 100 flow files of 1 line each, then do some stuff to each one, then want to converge back. The split processors produce a standard set of split attributes (described in the docs of the processors) and the defragment mode knows how to bin the splits accordingly and merge them back together. This probably doesn't apply to your example since you didn't start with a split processor.
The second approach is the "Correlation Attribute" in MergeConent. This tells merge content to only merge flow files together that have the same value for the attribute specified. In your example, when a file gets picked up by GetFile and sent to 3 InvokeHttp processors, there are 3 flow files created, and they all should have their "filename" attribute set to the name of the file picked up from disk. So telling MergeContent to correlate on filename should do the trick, and probably setting the min and max number of entries to the number you expect like 3, and a maximum time in case one of them fails or hangs.
I was wondering how can I configure Hbase in a way to store just the first version of each cell? Suppose the following Htable:
row_key cf1:c1 timestamp
----------------------------------------
1 x t1
After putting ("1","cf1:c2",t2) in the scenario of ColumnDescriptor.DEFAULT_VERSIONS = 2 the mentioned Htable becomes:
row_key cf1:c1 timestamp
----------------------------------------
1 x t1
1 x t2
where t2>t1.
My question would be how can I change this scenario in a way that the first version of cell would be the only version that could be store and retrieve. I mean in the provided example the only version would be 't1' one! Thus, I want to change hbase in a way that ignore insertion on duplicates.
I know that setting VERSIONS to 1 for Htable and putting based on Long.MAX_VALUE - System.currentTimeMillis() would solve my problem but I dont know is it the best solution or not?! What is the concern of changing tstamp to Long.MAX_VALUE - System.currentTimeMillis()? Does it has any performance issue?
There are two strategies that I can think of:
1. One version + inverted timestamp
Setting VERSIONS to 1 for Htable and putting based on Long.MAX_VALUE - System.currentTimeMillis() will generally work and does not have any major performance issues.
On write:
When multiple versions of the same cell are written to hbase, at any point in time, all versions will be written (without any impact on performance). After compaction only the cell with the highest timestamp will survive.
The cell with the highest timestamp in this scheme is the one written by the client with the lowest value for System.currentTimeMillis(). It should be noted that this might not actually be the machine who tried to write to the cell first, since hbase clients might be out of sync.
On read:
When multiple versions of the same cell are found pruning will occur at that time. This can happen at any time, since your writes can occur at any time, even after compaction. This has a very slight impact on performance.
2. checkAndPut
To get true ordering through atomicity, meaning only the first write to reach the region server will succeed, you can use the checkAndPut operation:
From the docs:
public boolean checkAndPut(byte[] row, byte[] family, byte[] qualifier, byte[] value, Put put) throws IOException
Atomically checks if a row/family/qualifier value matches the expected
value. If it does, it adds the put. If the passed value is null, the
check is for the lack of column (ie: non-existance)`
So by setting value to null your Put will only succeed if the cell did not exist. If your Put succeeded then the return value will be true. This gives true atomicity, but at a write performance cost.
On write:
A row lock is set and a Get is issued internally before existance is checked. Once non-existance is confirmed the Put is issued. As you can imagine this has a pretty big performance impact for each write, since each write now also involves a read and a lock.
During compaction nothing needs to happen, because only one Put will ever make it to hbase. Which is always the first Put to reach the region server.
It should be noted that there is no way to batch these kind of checkAndPut operations by using checkAndMutate, since each Put needs it own check. This means each put needs to be a separate request, which means you will be paying a latency cost as well when writing in batches.
On read:
Only ever one version will make it to Hbase, so there is no impact here.
Picking between strategies:
If true ordering really matters or you may need to read each row after or before you write to hbase anyway (for example to find out if your write succeeded or not), you're better of with strategy 2, otherwise, in all other cases, I'd recommend strategy 1, since its write performance is much better. In that case just make sure your clients are properly time synced.
You can insert the Put with Long.MAX_VALUE - timestampand configure the table to store only 1 version (max versions => 1). This way only the first (earliest) Put will be returned by the Scan because all successive Puts will have a smaller timestamp value.
I want to raise the maximum number of threads in the default work manager's thread pool using a wsadmin (Jython) script. What is the best approach?
I can't seem to find documentation of a fine-grained control that would let me modify just this property. The closest I can find to what I want is AdminTask.applyConfigProperties, which requires passing a file. The documentation explains that if you want to modify an existing property, you must extract the existing properties file, edit it in an editor, and then pass the edited file to applyConfigProperties.
I want to avoid the manual step of extracting the existing properties file and editing it. The scripts needs to run completely unattended. In fact, I'd prefer to not use a file at all, but just set the property to a value directly in the script.
Something like the following pseudo-code:
defaultwmId = AdminConfig.getid("wm/default")
AdminTask.setProperty(defaultwmId, ['-propertyName', maxThreads, '-propertyValue', 20])
The following represents a fairly simplistic wsadmin approach to updating the max threads on the default work managers:
workManagers = AdminConfig.getid("/WorkManagerInfo:DefaultWorkManager/").splitlines()
for workManager in workManagers :
AdminConfig.modify(workManager, '[[maxThreads "20"]]')
AdminConfig.save()
Note that the first line will retrieve all of the default work managers across all scopes, so if you want to only choose one (for example, if you only one to modify a particular application server or cluster's work manager properties), you will need to refine the containment path further. Also, you may need to synchronize the nodes and restart the modified servers in order for the property to be applied at runtime.
More information on the use of the AdminConfig scripting object can be found in the WAS InfoCenter:
http://publib.boulder.ibm.com/infocenter/wasinfo/v7r0/index.jsp?topic=/com.ibm.websphere.nd.doc/info/ae/ae/rxml_adminconfig1.html