StringToWordVector filter under weka

My data are passed through the StringToWordVector filter. StringToWordVector can output binary presence/absence indicators, word frequencies, or TF-IDF scores. What is the default output of this filter in Weka?

According to the options listed in the documentation:
-C
Output word counts rather than boolean word presence.
So the default output is boolean word presence, and that can be changed by passing arguments such as -C.
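
To make the difference concrete, here is a small Python sketch of the two output modes. It is not Weka code: the whitespace tokenizer, the `vectorize` function and the `output_counts` parameter are made up for illustration, with `output_counts=True` standing in for what -C does.

```python
from collections import Counter

def vectorize(doc, vocabulary, output_counts=False):
    """Toy stand-in for the two modes; output_counts plays the role of -C."""
    # Hypothetical tokenizer: lower-case and split on whitespace.
    counts = Counter(doc.lower().split())
    if output_counts:
        # Word counts: how many times each vocabulary word occurs.
        return [counts[word] for word in vocabulary]
    # Boolean word presence (the documented default): 1 if present, else 0.
    return [1 if counts[word] > 0 else 0 for word in vocabulary]

vocabulary = ["good", "bad", "movie"]
doc = "good good movie"

presence = vectorize(doc, vocabulary)
word_counts = vectorize(doc, vocabulary, output_counts=True)
print(presence)     # [1, 0, 1]
print(word_counts)  # [2, 0, 1]
```

The repeated word "good" is the only place where the two vectors differ, which is exactly the information the default output discards.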

Related

What is the syntax for MariaDB 'IN NATURAL LANGUAGE MODE'?

According to the MariaDB documentation:
There are no special operators, and searches consist of one or more
comma-separated keywords.
The search clearly does not need to be comma-separated, as replacing commas with spaces gives the same result.
I assume that it breaks the string into separate keywords, but exactly how doesn't appear to be well documented.
With my test data, these two return the same results:
AGAINST('Quality Water Environment' IN NATURAL LANGUAGE MODE)
AGAINST('Quality Water åîøüé!##$%^&*()_+Environment' IN NATURAL LANGUAGE MODE)
The second search has some characters that I consider to be 'word characters' that seem to have no influence on the result.
So what exactly is accepted by this function, and what is filtered out?

Is there a way to search FHIR resources on a text search parameter using wildcards?

I'm trying to search for all Observations where "blood" is associated with the code using:
GET [base]/Observation?code:text=blood
It appears that the search is matching Observations where the associated text starts with "blood" but not matching on associated text that contains "blood".
Using the following, I get results with a Coding.display of "Systolic blood pressure" but I'd like to also get these Observations by searching using the text "blood".
GET [base]/Observation?code:text=sys
Is there a different modifier I should be using or wildcards I should use?

The servers seem to do as the spec requests: when using the modifier :text on a token search parameter (like code here), the spec says:
":text The search parameter is processed as a string that searches
text associated with the code/value"
If we look at how a server is supposed to search a string, we find:
"By default, a field matches a string query if the value of the field
equals or starts with the supplied parameter value, after both have
been normalized by case and accent."
Now, if code had been a true string search parameter, we could have applied the :contains modifier. However, modifiers cannot be stacked, so code:text:contains might seem logical here, but it is not part of the current specification.
So, I am afraid that there is currently no "standard" way to do what you want.
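
To show why "sys" finds "Systolic blood pressure" while "blood" does not, here is a rough Python sketch of the default string rule quoted above, next to a contains-style match. This is only my reading of the rule, not how any particular server implements it; the function names and the accent-stripping approach are assumptions.

```python
import unicodedata

def normalize(text):
    """Rough take on 'normalized by case and accent': strip accents, fold case."""
    decomposed = unicodedata.normalize("NFD", text)
    without_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    return without_accents.casefold()

def default_match(field, query):
    """Default string search: the field equals or starts with the query."""
    f, q = normalize(field), normalize(query)
    return f == q or f.startswith(q)

def contains_match(field, query):
    """What a contains-style match would do: the query appears anywhere."""
    return normalize(query) in normalize(field)

display = "Systolic blood pressure"
print(default_match(display, "sys"))     # True
print(default_match(display, "blood"))   # False
print(contains_match(display, "blood"))  # True
```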

How to filter a column in LibreOffice with some specific characters

I have a LibreOffice-Calc file like this:
There is a column called "code" here. How can I filter the rows whose "code" value follows the pattern "11*-1*-91*-700"?
Character * here means any character.
I mean the output is the same as this:
1121-117-912-700
1122-121-912-700
1121-117-911-700
...

Use a Standard Filter on the Code column. When you get to the Standard Filter dialog, click Options and check Regular expression.
Set the Field name to Code, the Condition to = and supply the following for Value,
^11.?.?-1.?.?-91.?-700
Each period (aka . or full stop) matches any single character, and the question mark (?) that follows it makes that character optional, so each .? pair acts as a wildcard for zero or one character.
I didn't type out all of your sample data but I did type out enough to verify an answer.
You will also have to make sure that Tools ► Options ► OpenOffice Calc ► Calculate has Enable regular expressions in formulas enabled.
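
If you want to check the pattern outside of Calc, here is a quick Python sketch. It assumes Python's `re` module treats this particular pattern the same way Calc's regular expression engine does, which should hold for something this simple; the two non-matching codes are made up for the test.

```python
import re

# Same pattern as the filter value above.
pattern = re.compile(r"^11.?.?-1.?.?-91.?-700")

codes = [
    "1121-117-912-700",
    "1122-121-912-700",
    "1121-117-911-700",
    "1221-117-912-700",  # made up: does not start with 11
    "1121-217-912-700",  # made up: second group does not start with 1
]

matched = [code for code in codes if pattern.search(code)]
print(matched)
```

Because every `.?` is optional, a shorter value such as 11-1-91-700 also passes the filter, so tighten the pattern if your codes always have a fixed width.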

What does Elasticsearch's auto_generate_phrase_queries do?

In the docs for query string query, auto_generate_phrase_queries is listed as a parameter but the only description is "defaults to false." So what does this parameter do exactly?
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-query-string-query.html

This maps directly to Lucene's org.apache.lucene.queryparser.classic.QueryParserSettings#autoGeneratePhraseQueries. When the analyzer is applied to the query string, this setting allows Lucene to generate quoted phrases instead of separate keywords.
Quoting:
SOLR-2015: Add a boolean attribute autoGeneratePhraseQueries to TextField.
autoGeneratePhraseQueries="true" (the default) causes the query parser to
generate phrase queries if multiple tokens are generated from a single
non-quoted analysis string. For example WordDelimiterFilter splitting text:pdp-11
will cause the parser to generate text:"pdp 11" rather than (text:PDP OR text:11).
Note that autoGeneratePhraseQueries="true" tends to not work well for non whitespace
delimited languages.
Here the word delimiter works as described in the WordDelimiterFilter documentation.
The important thing to note is "single non-quoted analysis string", i.e. the setting only matters when your query string is not quoted. If you are already searching for a quoted phrase, it makes no difference.
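
As a toy illustration of the pdp-11 example from the quote, here is a Python sketch. It is not Lucene or Elasticsearch code: `build_query` and its tokenizer are hypothetical and only mimic an analyzer that splits on non-alphanumeric characters.

```python
import re

def build_query(field, term, auto_generate_phrase_queries):
    """Toy model of how one unquoted term turns into a query string."""
    # Hypothetical analyzer: split on anything that is not a letter or digit.
    tokens = [t for t in re.split(r"[^A-Za-z0-9]+", term) if t]
    if len(tokens) <= 1:
        # A single token: the setting has nothing to decide.
        return f"{field}:{term}"
    if auto_generate_phrase_queries:
        phrase = " ".join(tokens)
        return f'{field}:"{phrase}"'
    return "(" + " OR ".join(f"{field}:{t}" for t in tokens) + ")"

print(build_query("text", "pdp-11", True))   # text:"pdp 11"
print(build_query("text", "pdp-11", False))  # (text:pdp OR text:11)
```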

How to use regular expression in fetching data from graphite?

I want to fetch data for several different counters from Graphite in a single request, like:
summarize(site.testing_server_2.triggers_unknown.count,'1hour','sum')&format=json
summarize(site.testing_server_2.requests_failed.count,'1hour','sum')&format=json
summarize(site.testing_server_2.core_network_bad_soap.count,'1hour','sum')&format=json
and so on, about 20 more.
But I don't want to fetch
summarize(site.testing_server_2.module_xyz_abc.count,'1hour','sum')&format=json
in that request. How can I do that?
This is what I tried:
summarize(site.testing_server_2.*.count,'1hour','sum')&format=json&from=-24hour
It gets JSON data for 'module_xyz_abc' too, which I don't want.

You can't use regular expressions per se, but you can use some similar (in concept and somewhat in format) matching techniques available within the Graphite Render URL API. There are a few ways you can "match" within a target's "bucket" (i.e. between the dots).
Target Matching
Asterisk * match
The asterisk can be used to match ANY -zero or more- character(s). It can be used to replace the entire bucket (site.*.test) or within the bucket (site.w*t.test). Here is an example:
site.testing_server_2.requests_*.count
This would match site.testing_server_2.requests_failed.count, site.testing_server_2.requests_success.count, site.testing_server_2.requests_blah123.count, and so forth.
Character range [a-z0-9] match
The character range match is used to match on a single character (site.w[0-9]t.test) in the target's bucket and is specified as a range or list. For example:
site.testing_server_[0-4].requests_failed.count
This would match on site.testing_server_0.requests_failed.count, site.testing_server_1.requests_failed.count, site.testing_server_2.requests_failed.count, and so forth.
Value list (group capture) {blah, test, ...} match
The value list match can be used to match anything in the list of values, in the specified portion of the target's bucket.
site.testing_server_2.{triggers_unknown,requests_failed,core_network_bad_soap}.count
This would match site.testing_server_2.triggers_unknown.count, site.testing_server_2.requests_failed.count, and site.testing_server_2.core_network_bad_soap.count. But nothing else, so site.testing_server_2.module_xyz_abc.count would not match.
Answer
Without knowing all of your bucket values it is difficult to be surgical with the approach (perhaps with a combination of the matching options), so I'll recommend just going with a value list match. This should allow you to get all of the values in one -somewhat long- request. For example (and keep in mind you'd need to include all of your values):
summarize(site.testing_server_2.{triggers_unknown,requests_failed,core_network_bad_soap}.count,'1hour','sum')&format=json&from=-24hour
For more, see Graphite Paths and Wildcards
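
If you want to try a pattern before sending the request, here is a small Python sketch of the three matching styles described above. It only models the behaviour as I described it and is not Graphite's own code; the helper names are made up, and it assumes no dots appear inside a {...} list.

```python
import re
from fnmatch import fnmatchcase
from itertools import product

def expand_braces(segment):
    """Expand every {a,b,c} value list in one bucket into plain alternatives."""
    parts = re.split(r"\{([^{}]*)\}", segment)
    # Odd-indexed parts are the comma-separated contents of a {...} group.
    choices = [p.split(",") if i % 2 else [p] for i, p in enumerate(parts)]
    return ["".join(combo) for combo in product(*choices)]

def matches(pattern, metric):
    """Match bucket by bucket, so * never crosses a dot."""
    p_segs, m_segs = pattern.split("."), metric.split(".")
    if len(p_segs) != len(m_segs):
        return False
    return all(
        any(fnmatchcase(m, alt) for alt in expand_braces(p))
        for p, m in zip(p_segs, m_segs)
    )

metrics = [
    "site.testing_server_2.triggers_unknown.count",
    "site.testing_server_2.requests_failed.count",
    "site.testing_server_2.core_network_bad_soap.count",
    "site.testing_server_2.module_xyz_abc.count",
]
pattern = "site.testing_server_2.{triggers_unknown,requests_failed,core_network_bad_soap}.count"

selected = [m for m in metrics if matches(pattern, m)]
print(selected)  # everything except module_xyz_abc
```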
