SpamAssassin filter for custom expression

I would like to add a rule that blocks all incoming e-mails that contain a certain expression, for example 'Test Phrase'. I have added the line
rawbody NO_SPAMW /Test" *"Phrase/i
but it doesn't seem to work. What is the correct way to match a space in a SpamAssassin rule?
Thank you!

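You can match a space with \s, which matches any whitespace character. \s* allows zero or more of them, so use \s+ if at least one is required.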
rawbody TEST_PHRASE /test\s*phrase/i
score TEST_PHRASE 0.1
describe TEST_PHRASE This is a test
More about writing custom rules here
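A quick way to see why the original pattern fails is to try both patterns outside SpamAssassin. The sketch below uses Python's re module rather than SpamAssassin's Perl regex engine, on the assumption that these simple constructs (literal characters, \s, * and the case-insensitive flag) behave the same in both. The sample strings are made up for illustration.

```python
import re

# Pattern from the question: the double quotes are literal characters, so the
# body would have to contain  Test" "Phrase  (quotes included) to match.
broken = re.compile(r'Test" *"Phrase', re.IGNORECASE)

# Pattern from the answer: \s* allows any amount of whitespace (including none).
fixed = re.compile(r'test\s*phrase', re.IGNORECASE)

# Hypothetical message bodies, for illustration only.
samples = [
    "This is a Test Phrase.",
    "TEST   PHRASE",
    "testphrase",
    "test-phrase",
]

results = {text: bool(fixed.search(text)) for text in samples}

for text, hit in results.items():
    print(f"{text!r}: {'match' if hit else 'no match'}")
```

Only "test-phrase" is rejected by the corrected pattern, while the pattern from the question matches none of the samples because none of them contains the literal quote characters.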

Related

Can you write rules in a programmatic if-then style?

Can you do if-then style statements in SpamAssassin?
I get spam sent to me that uses my email address for the sender's name and I would like to write a general rule for this.
For example, I receive spam messages with From: and To: lines like this:
From: "me#mydomain.org" <spam#spam.com>
To: <me#mydomain.org>
Below I refer to this format as:
From: "Name" <address>
To: <address>
Is it possible to write a rule that says:
if
the (From: name)
is equal to (To: email address)
but not the (From: email address)
then
give it a score?
I am asking about this specifically because my server may automatically send messages in a similar format, such as: "root#mydomain.org" <root#mydomain.org>.
I don't want the rule to accidentally score emails like that.
I only see how to write positive rules, so I can look for simple matches like these:
header LOCAL_FROM_NAME_MyAddress From =~ /\"me#mydomain.org\"/
header LOCAL_FROM_Address_MyAddress From =~ /<me#mydomain.org>/
header LOCAL_TO_Address_MyAddress To =~ /<me#mydomain.org>/
So I could create a score if they all produced a match:
meta LOCAL_FROM_ME_TO_ME ((LOCAL_FROM_NAME_MyAddress + LOCAL_FROM_Address_MyAddress + LOCAL_TO_Address_MyAddress) >2)
score LOCAL_FROM_ME_TO_ME -0.1
But that is as far as I can go. I haven't seen any way to do something more complex.
SpamAssassin meta rules support boolean expressions, so you can use the &&, ||, and ! operators to create more complex matches. In the specific example you've given, the rule is logically equivalent to:
(FROM_NAME equals MyAddress) and (FROM_ADDR does not equal MyAddress)
A ruleset to express this could be:
header __LOCAL_FROM_NAME_MyAddress From:name =~ /me\#mydomain\.org/
header __LOCAL_FROM_ADDR_MyAddress From:addr =~ /me\#mydomain\.org/
meta LOCAL_SPOOFED_FROM (__LOCAL_FROM_NAME_MyAddress && !__LOCAL_FROM_ADDR_MyAddress)
score LOCAL_SPOOFED_FROM 5.0
If meta rules and boolean expressions are not enough, you can write a Perl plugin. Check out the many examples on CPAN, and perhaps specifically Mail::SpamAssassin:FromMailSpoof.
Notes
You can write :name and :addr to parse specific parts of the From and To headers.
You can prefix your sub-rules with __ so that they will not score on their own.
Special characters like # and . should be escaped in regex patterns.
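The boolean logic of the meta rule can be checked outside SpamAssassin with a small sketch. This is not how SpamAssassin evaluates rules; it only mirrors the condition (name matches my address) and not (address matches my address). The helper names, the simple regex used to split the header, and the sample headers are assumptions made for illustration, and addresses are written with # exactly as in the question.

```python
import re

MY_ADDRESS = "me#mydomain.org"  # written as in the question

# Hypothetical, deliberately simple splitter for  "Name" <address>  headers.
_FROM_RE = re.compile(r'^\s*"?(?P<name>[^"<]*?)"?\s*<(?P<addr>[^>]*)>\s*$')


def split_from(header_value):
    """Return (display name, address), roughly like From:name and From:addr."""
    m = _FROM_RE.match(header_value)
    if not m:
        # No angle brackets: treat the whole value as the address.
        return "", header_value.strip()
    return m.group("name").strip(), m.group("addr").strip()


def is_spoofed_from(header_value, my_address=MY_ADDRESS):
    """True when the display name contains my address but the address does not."""
    name, addr = split_from(header_value)
    name_matches = my_address in name.lower()
    addr_matches = my_address in addr.lower()
    return name_matches and not addr_matches


for value in ('"me#mydomain.org" <spam#spam.com>',
              '"me#mydomain.org" <me#mydomain.org>',
              '"Someone Else" <spam#spam.com>'):
    print(value, "->", is_spoofed_from(value))
```

Only the first header is flagged. The second one, where the server legitimately uses the same address for both name and address, is left alone, which is the behaviour the question asks for.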

Rasa RegexFeaturizer: is it based on tokens or the whole sentence?

- regex: regex features for intent classification
  examples: |
    - \bon road pric/i
    - \bonroad pric/i
I have tested the regexes above and they work, so I am sure there is no issue with the expressions themselves.
Example:
training-row-1] Please tell me on road price now.
training-row-2] Please tell me price now.
Based on the regex patterns above, the regex features that should be added are:
training-row-1] Please tell me on road price now. ==> TRUE (because the regex matches)
training-row-2] Please tell me price now. ==> FALSE (the regex doesn't match)
My question is: in RegexFeaturizer, does regex matching happen on the whole sentence or on each token?
It makes sense to have it on the whole sentence.
Is the featurization I have assumed above correct or not?
I've found the following docstring in the code for the RegexFeaturizer.
"""
Given a sentence, returns a vector of {1,0} values indicating which
regexes did match. Furthermore, if the message is tokenized, the
function will mark all tokens with a dict relating the name of the
regex to whether it was matched.
"""
So I think it's taking the entire sentence as input. It's hard to see inside of the feature space in Rasa but I've confirmed that the correct entity is picked up across tokens when using the RegexEntityExtractor. This is easily verified by temporarily adding entity examples in your NLU data (make sure it appears at least twice in intents) and running rasa interactive.
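The difference between the two readings can be sketched without Rasa. The code below is not Rasa's implementation; it is a plain-Python illustration, under the assumption of whitespace tokenization, of why a multi-word pattern can only match on the whole sentence, and of how tokens overlapping the match could then be marked as the docstring describes. The trailing /i is dropped and replaced with re.IGNORECASE, since Python's re module takes flags separately from the pattern.

```python
import re

# First pattern from the question, with the flag passed the Python way.
pattern = re.compile(r"\bon road pric", re.IGNORECASE)


def sentence_feature(text):
    """1 if the pattern matches anywhere in the whole sentence, else 0."""
    return 1 if pattern.search(text) else 0


def isolated_token_features(text):
    """Match each whitespace token on its own (the per-token reading)."""
    return [1 if pattern.search(tok) else 0 for tok in text.split()]


def mark_tokens(text):
    """Match on the sentence, then flag tokens that overlap a match span."""
    spans = [m.span() for m in pattern.finditer(text)]
    marks = []
    for tok in re.finditer(r"\S+", text):
        hit = any(tok.start() < end and tok.end() > start for start, end in spans)
        marks.append((tok.group(), hit))
    return marks


row1 = "Please tell me on road price now."
row2 = "Please tell me price now."

print(sentence_feature(row1), sentence_feature(row2))
print(isolated_token_features(row1))
print([tok for tok, hit in mark_tokens(row1) if hit])
```

Matching each token in isolation never fires for row 1 because the pattern spans several tokens, whereas matching on the sentence and then marking the overlapping tokens flags "on", "road" and "price".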

How to implement complex pattern matching in Spring Batch using PatternMatchingCompositeLineMapper

How can we implement pattern matching in Spring Batch? I am using org.springframework.batch.item.file.mapping.PatternMatchingCompositeLineMapper.
I learned that I can only use ? or * to create my pattern.
My requirement is as follows:
I have a fixed-length record file, and in each record the two characters at the 35th and 36th positions give the record type.
For example, in the record below "05" is the record type at the 35th and 36th positions, and the total length of the record is 400.
0000001131444444444444445589868444050MarketsABNAKKAAAAKKKA05568551456...........
I tried to write a regular expression but it does not work; I learned that only two special characters can be used, * and ?.
In that case I can only write something like this:
??????????????????????????????????05?????????????..................
but it does not seem to be a good solution.
Please suggest how I can solve this. Thanks a lot in advance for your help.
The PatternMatchingCompositeLineMapper uses an instance of org.springframework.batch.support.PatternMatcher to do the matching. It's important to note that PatternMatcher does not use true regular expressions. It uses something closer to ant patterns (the code is actually lifted from AntPathMatcher in Spring Core).
That being said, you have two options:
1. Use a pattern like the one you are referring to (there is no shorthand for specifying how many ? to check, as there is in regular expressions).
2. Create your own composite LineMapper implementation that uses regular expressions to do the mapping.
For the record, if you choose option 2, contributing it back would be appreciated!
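Both options can be illustrated with a short language-neutral sketch, written here in Python rather than Java. It is not Spring Batch code: fnmatchcase stands in for the ?/* matcher (the two are similar but not identical), the helper names are hypothetical, and the records are synthetic. It assumes that a trailing * may replace the remaining run of ? characters, since * is one of the two wildcards the question says are supported.

```python
import re
from fnmatch import fnmatchcase

RECORD_TYPE_OFFSET = 34  # zero-based index of the 35th character
RECORD_TYPE_LENGTH = 2


def wildcard_pattern(record_type):
    """Option 1: build the ?/* pattern instead of typing 34 question marks."""
    return "?" * RECORD_TYPE_OFFSET + record_type + "*"


def regex_pattern(record_type):
    """Option 2: the same test as a regular expression."""
    return re.compile(r"^.{%d}%s" % (RECORD_TYPE_OFFSET, re.escape(record_type)))


def record_type_of(line):
    """Read the record type directly by position."""
    return line[RECORD_TYPE_OFFSET:RECORD_TYPE_OFFSET + RECORD_TYPE_LENGTH]


# Synthetic 400-character records, for illustration only.
record = "0" * 34 + "05" + "A" * 364
decoy = "0" * 34 + "07" + "05" + "A" * 362  # "05" appears, but in the wrong place

print(wildcard_pattern("05"))
print(fnmatchcase(record, wildcard_pattern("05")), fnmatchcase(decoy, wildcard_pattern("05")))
print(bool(regex_pattern("05").match(record)), bool(regex_pattern("05").match(decoy)))
```

Generating the pattern string once (for example in configuration code) keeps option 1 readable, while the regular expression shows what a custom LineMapper along the lines of option 2 would test for each line.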

Case insensitive substring LDAP search on OpenLDAP 2.4.33

The current question is not the same as this one.
I have an LDAP entry with the content "This is a SimpleTest indeed" in the "supName" field.
I need to write a filter so that when the user of my software enters any substring of this content in any case (upper, lower or mixed), it finds the entry. It must work even if the user does not input a complete word ("impletes", for example).
The supName field follows DirectoryString syntax. This means that the default matching rule is exact and case-sensitive ("caseExactMatch"). But this syntax should, in theory, also allow the "caseIgnoreMatch" and "caseIgnoreSubstringsMatch" matching rules. I thought I just needed to force the use of the last one ("caseIgnoreSubstringsMatch"), so I tried this filter:
(supName:caseIgnoreSubstringsMatch:=*impletes*)
But this does not work. I run my tests using Apache Directory Studio, and that tool refuses to accept the above filter. It complains about the asterisks, and I don't understand why, since I am using a substring match (and thus asterisks should be allowed). If I run the filter from the command line (using ldapsearch), I get this error message:
ldap_search_ext: Bad search filter (-7)
Therefore this is not an issue with Apache Directory Studio.
So my question is: What is the correct way of defining a case-insensitive substring filter on a field that is case-sensitive by default?
Further tests:
What follows are some other filters I have tested, and the reasons they do not suit me.
Test #1 filter:
(supName=*impleTes*)
This operator (=) returns my test entry, but it's not case-insensitive. If I replace "impleTes" with "impletes" it does not return anything.
Test #2 filter:
(supName~=simpletest)
This operator (~=) works, but:
It needs a complete word. If I replace "simpletest" with "impletes" it does not return anything.
As it is an "approximate" search operator, it may return unwanted results. For example, the filter above returns also a second entry: "This is a SimpleTast indeed" (notice the "a" instead of "e" in "SimpleTast"). I don't want approximate results.
Test #3 filter:
(supName:caseIgnoreMatch:=this is a simpletest indeed)
This returns the entry I was expecting, and only that entry. It is also case-insensitive. But it forces the user to write the whole content of the field: it is not a substring search, but a case-insensitive exact-match search.
Test #4 filter:
(supName:caseIgnoreMatch:=*impletes*)
This returns a "Bad search filter (-7)" error, which is expected since I am not allowed to use substring syntax in an exact matching rule.
And finally, the Test #5 filter:
(supName:caseIgnoreSubstringsMatch:=*impletes*)
Which I expected to work, but returns a "Bad search filter (-7)" error.
Additional info - Opposite example
I have found here (see the "Extensible Matching" section at the end) examples of the opposite case. In the example, the field "sn" uses the "caseIgnoreMatch" matching rule by default (making it case-insensitive). So what they want in the example is to do a case-sensitive substring search. This is the filter they use:
(sn:caseExactSubstringMatch:=*S*)
But I doubt if this example is correct, because if I try exactly the same filter on my side:
(supName:caseExactSubstringMatch:=*S*)
I get a "Bad search filter (-7)" error.
So maybe my issue is due to a limitation of OpenLDAP 2.4.33 and the filter would work with other LDAP servers, although the example comes from a guide that is supposed to cover OpenLDAP 2.x ... (?)
If I'm reading RFC 4515 §3 correctly, an extensible match can only be done with an assertion value (read: a fixed string) and not with a substring filter. If that were permitted, I would expect your original example (supName:caseIgnoreSubstringsMatch:=*impletes*) to work.
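The reading of RFC 4515 above can be made concrete with a small sketch of how filter strings are built. In that grammar an unescaped * is only allowed in the substring and presence forms; inside an assertion value it has to be escaped as \2a, which makes it a literal asterisk. The helper names below are hypothetical and the code only builds strings; it does not talk to an LDAP server.

```python
# Characters that RFC 4515 requires to be escaped inside a value.
_ESCAPES = {
    "\\": r"\5c",
    "*": r"\2a",
    "(": r"\28",
    ")": r"\29",
    "\x00": r"\00",
}


def escape_filter_value(value):
    """Escape a user-supplied string so it is treated as a literal value."""
    return "".join(_ESCAPES.get(ch, ch) for ch in value)


def substring_filter(attr, fragment):
    """Substring form: the surrounding asterisks are filter syntax, not data."""
    return "(%s=*%s*)" % (attr, escape_filter_value(fragment))


def extensible_filter(attr, rule, value):
    """Extensible match form: the value is a fixed string, asterisks included."""
    return "(%s:%s:=%s)" % (attr, rule, escape_filter_value(value))


print(substring_filter("supName", "impleTes"))
print(extensible_filter("supName", "caseIgnoreMatch", "this is a simpletest indeed"))
print(extensible_filter("supName", "caseIgnoreMatch", "*impletes*"))
```

The last line shows what a syntactically valid version of the Test #4 filter looks like: the asterisks become \2a and would be compared literally, so it cannot act as a substring search.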
Another option is the approximate match operator:
(Attribute~=value)
-- http://publib.boulder.ibm.com/tividd/td/IBMDS/IDSprogref52/en_US/HTML/progref.htm#HDRLDAPSRCH

How can I write a regex to repeatedly capture group within a larger match?

I'm getting a regex headache, so hopefully someone can help me here. I'm doing some file syntax conversion and I've got this situation in the files:
OpenMarker
keyword some expression
keyword some expression
keyword some expression
keyword some expression
keyword some expression
CloseMarker
I want to match all instances of "keyword" inside the markers. The marker areas are repeated and the keyword can appear in other places, but I don't want to match outside of the markers. What I don't seem to be able to work out is how to get a regex to pull out all the matches. I can get one to do the first or the last, but not to get all of them. I believe it should be possible and it's something to do with repeated capture groups -- can someone show me the light?
I'm using grepWin, which seems to support all the bells and whistles.
You could use:
(?<=OpenMarker((?!CloseMarker).)*)keyword(?=.*CloseMarker)
This will match the keyword between OpenMarker and CloseMarker (using the option "dot matches newline").
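sed -n -e '/OpenMarker/,/CloseMarker/p' /path/to/file | grep keyword should work: sed is line-based, so a range address is needed to print the lines from each OpenMarker through the next CloseMarker, and grep then keeps the keyword lines. Not sure if grep alone could do this.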
There are only a few regex engines that support separate captures of a repeated group (.NET for example). So your best bet is to do this in two steps:
First match the section you're interested in: OpenMarker(.*?)CloseMarker (using the option "dot matches newline").
Then apply another regex to the match repeatedly: keyword (.*) (this time without the option "dot matches newline").
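The two-step approach can be sketched in Python as follows. The sample text is made up, and Python's re module stands in for grepWin's regex engine: re.DOTALL plays the role of the "dot matches newline" option in the first step and is left off in the second.

```python
import re

# Hypothetical input: keywords appear both inside and outside marker sections.
TEXT = """keyword outside
OpenMarker
keyword first expression
keyword second expression
CloseMarker
keyword also outside
OpenMarker
keyword third expression
CloseMarker
"""

# Step 1: grab each marker section (dot matches newline, non-greedy).
SECTION_RE = re.compile(r"OpenMarker(.*?)CloseMarker", re.DOTALL)

# Step 2: within a section, capture the rest of each keyword line.
KEYWORD_RE = re.compile(r"keyword (.*)")


def keyword_expressions(text):
    """Return the expression after every keyword found inside marker sections."""
    found = []
    for section in SECTION_RE.finditer(text):
        found.extend(KEYWORD_RE.findall(section.group(1)))
    return found


print(keyword_expressions(TEXT))
```

The keyword lines outside the markers are never seen by the second regex, because it is only applied to the text captured in step one.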
