I'm using the Weka application with a CSV file, and I need to remove the instances that have missing values. I tried using the MultiFilter with the RemoveWithValues filter, but I think I'm doing it wrong, since it filters out ALL my instances. How do I do this correctly?
To remove instances with missing values from a few attributes you can use weka.filters.unsupervised.instance.SubsetByExpression and use an expression such as
not ismissing(ATT5)
to remove instances with missing values in the attribute with index 5, or
not (ismissing(ATT5) or ismissing(ATT8))
to remove instances with missing values in attributes 5 or 8, and so on.
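To make the semantics of these expressions concrete, here is a small Python sketch that applies the same predicate to rows of a CSV file. It is an illustration only, not how Weka evaluates the filter. It assumes that ATTn is 1-based (ATT5 is the fifth attribute) and that a missing value appears in the CSV as Weka's "?" marker or as an empty field. The function names are hypothetical.

```python
import csv
import io

# Assumption: a missing value is written as Weka's "?" marker or as an empty CSV field.
MISSING_MARKERS = {"?", ""}


def is_missing(row, att):
    """Mimics ismissing(ATTn); att is 1-based, like Weka's ATTn."""
    return row[att - 1].strip() in MISSING_MARKERS


def subset_without_missing(csv_text, attributes):
    """Keeps rows for which: not (ismissing(ATTa) or ismissing(ATTb) or ...)."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    kept = [row for row in reader if not any(is_missing(row, a) for a in attributes)]
    return header, kept


if __name__ == "__main__":
    data = "a,b,c,d,e,f,g,h\n1,2,3,4,5,6,7,8\n1,2,3,4,?,6,7,8\n1,2,3,4,5,6,7,\n"
    print(subset_without_missing(data, [5])[1])     # first and third data rows survive
    print(subset_without_missing(data, [5, 8])[1])  # only the first data row survives
```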
If you were trying to use the RemoveWithValues filter, it can be done that way, but you need to clear the nominalIndices field (remove the -L argument from the filter command) and set splitPoint to a value below the minimum value of the attribute being filtered. Otherwise the filter also matches instances by nominal value or by a value below the split point, and removes every instance that meets any of these conditions. Missing values only count as a match when matchMissingValues (-M) is enabled.
I can't see any obvious way of removing instances that have missing values in any attribute, other than building an expression for SubsetByExpression that checks all of them one by one.
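Writing that expression by hand gets tedious with many attributes, so a tiny generator helps. This is a hedged Python sketch with a hypothetical helper name; you still need to supply the number of attributes in your dataset yourself, and the output is meant to be pasted into SubsetByExpression's expression field.

```python
def no_missing_expression(num_attributes):
    """Builds a SubsetByExpression expression that keeps only instances
    with no missing value in any of ATT1..ATTn (1-based, like Weka)."""
    if num_attributes < 1:
        raise ValueError("need at least one attribute")
    checks = " or ".join(f"ismissing(ATT{i})" for i in range(1, num_attributes + 1))
    return f"not ({checks})"


print(no_missing_expression(3))
# not (ismissing(ATT1) or ismissing(ATT2) or ismissing(ATT3))
```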
I currently have around 15 attributes in my flowfile. Of these 15, I only want a few: all the attributes whose names start with the prefix 'error_'. These 'error_*' attributes hold one of two kinds of values: either 'Valid' or some error code, say '945'. I want to iterate through all the attributes with the prefix 'error_': if the value is 'Valid', do nothing, and if it holds an error code, append that code to a string, separated by ';'. So basically, if I have these 5 error_ attributes:
error_field1: '123'
error_field2: 'Valid'
error_field3: '567'
error_field4: 'Valid'
error_field5: '45'
I want the output to be '123;567;45'.
Please help, as I am new to NiFi and not sure how to work with such a complex Expression Language (EL) statement.
There are a couple ways to perform this.
${anyMatchingAttribute('error_.*'):find('\\d+')}
You can use the anyMatchingAttribute() function to evaluate a predicate against multiple attributes, and the regular-expression find() function to check for the presence of digits. This gives you a boolean result, but won't enumerate and join all the values.
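As a rough Python analogue of what that expression computes (not NiFi's implementation), the sketch below treats the flowfile attributes as a dict and assumes the name pattern has to match the whole attribute name, which is why it uses 'error_.*'. The function name is hypothetical.

```python
import re


def any_error_code(attributes, name_pattern=r"error_.*"):
    """Rough analogue of ${anyMatchingAttribute('error_.*'):find('\\d+')}:
    True if any attribute whose whole name matches name_pattern has a value
    containing at least one digit."""
    name_re = re.compile(name_pattern)
    return any(
        re.search(r"\d+", value) is not None
        for name, value in attributes.items()
        if name_re.fullmatch(name)
    )


print(any_error_code({"error_field1": "123", "error_field2": "Valid"}))  # True
print(any_error_code({"error_field1": "Valid", "other": "42"}))          # False
```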
${allMatchingAttributes('error_.*'):join(';'):replaceAll('Valid;', '')}
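If you don't need to associate each error code with the field it came from, you can simply join all the matching attribute values and then use a regular expression to remove the Valid values. Note that 'Valid;' only matches a Valid value followed by a separator, so a Valid value that ends up last in the joined string is left in place.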
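The Python sketch below mirrors that join-then-strip approach and, for comparison, a variant that drops Valid values before joining, which avoids the trailing-Valid case. It is only an illustration: the dict's insertion order stands in for whatever order NiFi joins the attributes in, and both function names are hypothetical. The second variant sorts by attribute name to get a deterministic order; that sort is lexicographic, so error_field10 would come before error_field2.

```python
import re

ERROR_NAME = re.compile(r"error_.*")


def join_then_strip(attributes):
    """Analogue of ${allMatchingAttributes('error_.*'):join(';'):replaceAll('Valid;', '')}."""
    joined = ";".join(v for k, v in attributes.items() if ERROR_NAME.fullmatch(k))
    return re.sub("Valid;", "", joined)


def error_codes(attributes):
    """Drops 'Valid' (any case) before joining, so no trailing 'Valid' can remain."""
    return ";".join(
        v
        for k, v in sorted(attributes.items())
        if ERROR_NAME.fullmatch(k) and v.lower() != "valid"
    )


attrs = {"error_field1": "123", "error_field2": "Valid", "error_field3": "567",
         "error_field4": "Valid", "error_field5": "45"}
print(join_then_strip(attrs))                                   # 123;567;45
print(error_codes(attrs))                                       # 123;567;45
print(join_then_strip({"error_a": "123", "error_b": "Valid"}))  # 123;Valid
```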
Currently, when I use app.Tap I have to give it the exact string.
I want to do something like app.Tap("sstring") and still match elements marked with, e.g., "somesstring", "someSStRing", etc.
Is it possible to do that somehow? It sounds like a simple thing, but I couldn't find a way to do it, and it's surprising that there is no option to make it behave that way.
Have you tried doing it via the function overload and specifying the id?
app.Tap(e => e.Id("sstring"));
Marked searches many properties on each element to return any matches.
When I set up repeated content in a section in Orbeon, each control is repeated and their names are the same. How do I access the control from the first, second, etc. iteration of the repeated section? I'm thinking along the lines of $control-name[instance#] or something similar.
The following works, given this form:
$name[2]: returns the second value
string-join($name, ', '): joins all values with commas
count($name): returns the number of values
See also the relevant documentation.
To access this value in the "bind" section, you can use a relative path, like ../name=''.
To access this value in the "body" section, you can use context()/../name.
If you are trying to do something different, please be more specific and this answer can be updated accordingly.
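Orbeon evaluates these as XPath over the form's data, so the exact paths depend on your form. Purely to illustrate the three operations, and the fact that XPath positions are 1-based, here is a Python sketch over hypothetical instance XML; the element names people, people-iteration and name are made up for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical instance data for a repeated section; real Orbeon names depend on your form.
INSTANCE = """
<form>
  <people>
    <people-iteration><name>Alice</name></people-iteration>
    <people-iteration><name>Bob</name></people-iteration>
    <people-iteration><name>Carol</name></people-iteration>
  </people>
</form>
""".strip()


def name_values(xml_text):
    """Collects the value of every <name> control, one per iteration, in document order."""
    root = ET.fromstring(xml_text)
    return [(e.text or "") for e in root.iter("name")]


values = name_values(INSTANCE)
second = values[1]          # like $name[2]: XPath is 1-based, Python is 0-based
joined = ", ".join(values)  # like string-join($name, ', ')
count = len(values)         # like count($name)
print(second, "|", joined, "|", count)
```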
I have a CouchDB view set up using an array key, in the format:
[articleId, -timestamp]
I want to query for all entries with the same article id. All timestamps are acceptable.
Right now I am using a query like this:
?startkey=["A697CA3027682D5JSSC",-9999999999999]&endkey=["A697CA3027682D5JSSC",0]
but I would like something a bit simpler.
Is there an easy way to completely wildcard the second key element? What would be the simplest syntax for this?
First, as a comment pointed out, there is indeed a special value {} that is ordered after any non-object value, so your query becomes:
startkey=["target ID"]&endkey=["target ID",{}]
This is equivalent to a wildcard match on the second key element.
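To see why that range covers every key with the given first element, here is a simplified Python model of CouchDB's key collation (null < false < true < numbers < strings < arrays < objects; arrays compared element by element, with a shorter prefix sorting first). It is a sketch, not CouchDB's code: in particular, real CouchDB compares strings with ICU collation, whereas this compares them by code point.

```python
def _rank(v):
    # Simplified CouchDB type order: null < false < true < numbers < strings < arrays < objects
    if v is None:
        return 0
    if v is False:
        return 1
    if v is True:
        return 2
    if isinstance(v, (int, float)):
        return 3
    if isinstance(v, str):
        return 4
    if isinstance(v, list):
        return 5
    if isinstance(v, dict):
        return 6
    raise TypeError(f"not a JSON value: {v!r}")


def collate(a, b):
    """Returns -1, 0 or 1. Strings compare by code point here (real CouchDB uses ICU)."""
    ra, rb = _rank(a), _rank(b)
    if ra != rb:
        return -1 if ra < rb else 1
    if ra in (3, 4):
        return (a > b) - (a < b)
    if ra == 5:
        # Arrays: element by element; a shorter array that is a prefix sorts first
        for x, y in zip(a, b):
            c = collate(x, y)
            if c:
                return c
        return (len(a) > len(b)) - (len(a) < len(b))
    if ra == 6:
        # Objects: compared as ordered lists of [key, value] pairs
        return collate([[k, val] for k, val in a.items()], [[k, val] for k, val in b.items()])
    return 0  # null, false, true are equal to themselves


def in_range(key, startkey, endkey):
    # Inclusive on both ends, like CouchDB's default inclusive_end=true
    return collate(startkey, key) <= 0 and collate(key, endkey) <= 0


start, end = ["A697CA3027682D5JSSC"], ["A697CA3027682D5JSSC", {}]
print(in_range(["A697CA3027682D5JSSC", -1412345678901], start, end))  # True
print(in_range(["A697CA3027682D5JSSC", 0], start, end))               # True
print(in_range(["B000", 5], start, end))                              # False
```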
As a side note, there is no need to reverse the ordering in the map function by emitting a negative timestamp: you can reverse the order by passing descending=true when querying the view (your start and end keys are then swapped).
startkey=["target ID",{}]&endkey=["target ID"]&descending=true
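Keys in these query parameters are JSON values, so when building the URL in code they need to be JSON-encoded and then URL-encoded. A minimal Python sketch, using a hypothetical helper name and only the startkey, endkey and descending parameters discussed above:

```python
import json
from urllib.parse import parse_qs, urlencode


def article_range_query(article_id, descending=False, high_key=None):
    """Builds the query string for all rows whose key starts with article_id.

    high_key is the "high" sentinel for the second key element: {} by default,
    or a string such as "\\ufff0"."""
    high = {} if high_key is None else high_key
    low_key, top_key = [article_id], [article_id, high]
    if descending:
        # With descending=true the index is read backwards, so the keys swap
        params = {"startkey": json.dumps(top_key), "endkey": json.dumps(low_key),
                  "descending": "true"}
    else:
        params = {"startkey": json.dumps(low_key), "endkey": json.dumps(top_key)}
    return urlencode(params)


q = article_range_query("A697CA3027682D5JSSC", descending=True)
print("?" + q)
# Decoded, the server sees the JSON keys:
print({k: v[0] for k, v in parse_qs(q).items()})
```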
For future reference, in CouchDB 3 you can use "\ufff0" instead of {}; it is ordered after numbers and most strings, but before arrays and objects.
From the CouchDB 3 docs:
Beware that {} is no longer a suitable “high” key sentinel value. Use a string like "\ufff0" instead.
The query startkey=["foo"]&endkey=["foo",{}] will match most array keys with “foo” in the first element, such as ["foo","bar"] and ["foo",["bar","baz"]]. However it will not match ["foo",{"an":"object"}]
This may be a silly question, but is it possible to make a query using XPath without specifying the element name?
Normally I would write something like
//ElementName[@id = "some_id"]
But the thing is I have many (about 40) different element types with an id attribute and I want to be able to return any of them if the id fits. But I don't want to make this call for each type individually. Is it possible to search all of them at once, regardless of the name?
I am using this in an XQuery script, if that offers any help.
Use * instead of the element name: //*[@id = "some_id"]
It might be more efficient to look directly at the @id attributes: //* will work, but it initially selects every element in the document and then filters them!
That may not matter in a small document, of course, but here's an alternative:
//@id[.="some_id"]/..
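If you want to try the //*[@id = "some_id"] form outside XQuery, Python's standard ElementTree supports that wildcard-plus-attribute predicate, though only as part of a limited XPath subset. The sketch below is illustrative and the helper name is hypothetical. ElementTree cannot select attribute nodes, so the //@id[...]/.. alternative needs a full XPath engine (such as your XQuery processor). Also note that, unlike //* in full XPath, .//* does not test the root element itself.

```python
import xml.etree.ElementTree as ET


def find_by_id(xml_text, value):
    """Returns every element below the root whose id attribute equals value,
    in document order. Assumes value contains no single quote."""
    root = ET.fromstring(xml_text)
    return root.findall(f".//*[@id='{value}']")


doc = """<library>
  <book id="b1"/>
  <author id="some_id"/>
  <chapter><section id="some_id"/></chapter>
</library>"""
print([e.tag for e in find_by_id(doc, "some_id")])  # ['author', 'section']
```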