Syntax Highlighting in Notepad++: how to highlight timestamps in log files

I am using Notepad++ to check logs. I want to define custom syntax highlighting for timestamps and log levels. Highlighting log levels works fine (they are defined as keywords). However, I am still struggling with highlighting timestamps of the form
06 Mar 2014 08:40:30,193
Any idea how to do that?

If you just want simple highlighting, you can use Notepad++'s regex search mode. Open the Find dialog, switch to the Mark tab, and make sure Regular Expression is set as the search mode. Assuming the timestamp is at the start of the line, this Regex should work for you:
^\d{2}\s[A-Za-z]+\s\d{4}\s\d{2}:\d{2}:\d{2},[\d]+
Breaking it down bit by bit:
^ means the following Regex should be anchored to the start of the line. If your timestamp appears anywhere but the start of a line, delete this.
\d means match any digit (0-9). {n} is a quantifier that means to match the preceding bit of Regex exactly n times, so \d{2} means match exactly two digits.
\s means match any whitespace character.
[A-Za-z] means match any character in the set A-Z or the set a-z, and the + is a quantifier that means match the preceding bit of Regex 1 or more times. So we're looking for a sequence of one or more alphabetic characters.
\s means match any whitespace character.
\d{4} is just like \d{2} earlier, only now we're matching exactly 4 digits.
\s means match any whitespace character.
\d{2} means match exactly two digits.
: matches a colon.
\d{2} matches exactly two digits.
: matches another colon.
\d{2} matches another two digits.
, matches a comma.
[\d]+ works similarly to the alphabetic search sequence we set up earlier, only this one's for digits. This finds one or more digits.
When you run this Regex on your document, the Mark feature will highlight anything that matches it. Unlike the temporary highlighting the "Find All in Document" search type can give you, Mark highlighting lasts even after you click somewhere else in the document.
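To sanity-check the pattern outside Notepad++, a minimal Ruby sketch can run it against a couple of lines. Notepad++ and Ruby use different regex engines, so this only confirms the logic of the pattern, not Notepad++'s behaviour. The sample log lines and the names TIMESTAMP, LOG_LINES and leading_timestamp are made up for illustration.

```ruby
# Same pattern as in the answer; TIMESTAMP is a hypothetical name.
TIMESTAMP = /^\d{2}\s[A-Za-z]+\s\d{4}\s\d{2}:\d{2}:\d{2},[\d]+/

# Hypothetical log lines, invented for this sketch.
LOG_LINES = [
  "06 Mar 2014 08:40:30,193 INFO  Server started",
  "    at com.example.Main.run(Main.java:42)"
]

# Return the timestamp at the start of a line, or nil if there is none.
def leading_timestamp(line)
  line[TIMESTAMP]
end

LOG_LINES.each do |line|
  puts "#{leading_timestamp(line).inspect} <- #{line}"
end
```

The first line yields its timestamp; the indented stack-trace line yields nil because of the ^ anchor.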

Related

Discard contractions from string

I have a special use case where I want to discard all contractions from a string and select only words that are made up of letters and do not contain any special character.
For example:
string = "~ ASAP ASCII Achilles Ada Stackoverflow James I'd I'll I'm I've"
string.scan(/\b[A-z][a-z]+\b/)
#=> ["Achilles", "Ada", "Stackoverflow", "James", "ll", "ve"]
Note: it does not discard the whole words I'll and I've; the fragments ll and ve are still returned.
Can someone please explain how to discard every whole word that contains a contraction?
Try this Regex:
(?:(?<=\s)|(?<=^))[a-zA-Z]+(?=\s|$)
Explanation:
(?:(?<=\s)|(?<=^)) - finds the position immediately preceded by either start of the line or by a white-space
[a-zA-Z]+ - matches 1+ occurrences of a letter
(?=\s|$) - The substring matched above must be followed by either a whitespace or end of the line
Update:
To make sure that not all the letters are in upper case, use the following regex:
(?:(?<=\s)|(?<=^))(?=\S*[a-z])[a-zA-Z]+(?=\s|$)
The only thing added here is (?=\S*[a-z]), which means that the word must contain at least one lowercase letter.
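Both patterns from this answer can be tried on the string from the question with a short Ruby sketch. The constant names are hypothetical; the patterns are copied unchanged.

```ruby
# Sample string from the question.
CONTRACTION_SAMPLE = "~ ASAP ASCII Achilles Ada Stackoverflow James I'd I'll I'm I've"

# Whole whitespace-delimited words that consist of letters only.
LETTERS_ONLY = /(?:(?<=\s)|(?<=^))[a-zA-Z]+(?=\s|$)/

# Same, but (?=\S*[a-z]) additionally requires one lowercase letter in the word.
WITH_LOWERCASE = /(?:(?<=\s)|(?<=^))(?=\S*[a-z])[a-zA-Z]+(?=\s|$)/

p CONTRACTION_SAMPLE.scan(LETTERS_ONLY)
# => ["ASAP", "ASCII", "Achilles", "Ada", "Stackoverflow", "James"]
p CONTRACTION_SAMPLE.scan(WITH_LOWERCASE)
# => ["Achilles", "Ada", "Stackoverflow", "James"]
```

The contractions drop out in both cases because the apostrophe breaks the trailing (?=\s|$) check.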
I know that there's an accepted answer already, but I'd like to give my own shot:
(?<=\s|^)\w+[a-z]\w*
You can test it in an online regex tester. This regex is shorter and more efficient (157 steps against 315 for the accepted answer).
The explanation is rather simple:
(?<=\s|^) - This is a positive lookbehind. It means that we want strings preceded by a whitespace character or the start of a line.
\w+[a-z]\w* - This one means that we want strings composed of word characters (letters, digits and underscores) containing at least one lowercase letter, thus discarding words which are entirely uppercase. Along with the positive lookbehind, this skips the contractions in the example. Note that there is no check on what follows the match, so a word such as don't would still yield don; appending (?=\s|$) avoids that.
NOTE: this regex won't take into account one-letter words. If you want to accomplish that, then you should use \w*[a-z]\w* instead, with a little efficiency cost.
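A quick Ruby check of the shorter pattern, as a sketch only. The second constant adds a trailing lookahead (?=\s|$), which is an assumption on my part and not part of the answer; it shows what changes when the end of the word is checked as well. All names and the test sentence are hypothetical.

```ruby
# Pattern from the answer.
SHORT_PATTERN = /(?<=\s|^)\w+[a-z]\w*/

# Assumption: same pattern with a trailing lookahead, so the match must
# be followed by whitespace or the end of a line.
SHORT_BOUNDED = /(?<=\s|^)\w+[a-z]\w*(?=\s|$)/

# Hypothetical input with an apostrophe in the middle of a longer word.
SHORT_SAMPLE = "James don't"

p SHORT_SAMPLE.scan(SHORT_PATTERN)   # => ["James", "don"]
p SHORT_SAMPLE.scan(SHORT_BOUNDED)   # => ["James"]
```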

Regex incorrectly matching punctuation (including spaces)

I am trying to check if a string contains at least one lowercase letter, uppercase letter, and a number, but not punctuation (including spaces).
For example
4aBc8Fk3 should match
4aBc 8.;3 should not match
I tried the following, but it matches spaces:
^(?=.*[a-z])(?=.*[A-Z])(?=.*[0-9]).{6,}[^[:punct:]]$
Any ideas how to not match strings containing punctuation including spaces?
As far as I understand, the regular expression you have there does the following (I'm not familiar with the Ruby flavor, and still quite new to regex myself; this will give you an idea, but may not be 100% correct):
Go to the beginning of the string
Ensure the string matches any number of any characters followed by a lowercase letter, e.g. --a
Ensure the string matches any number of any characters followed by an uppercase letter, e.g.--aA
Ensure the string matches any number of any characters followed by a number, e.g. --aA0
If that is all true, make sure the beginning of the string is followed by at least 6 arbitrary characters, e.g. --aA0-
Ensure that is followed by a single non-punctuation character ([^[:punct:]] is the correct form of the negated POSIX bracket expression; a space is not punctuation, so it is accepted here), e.g. --aA0-c
Ensure that is followed directly by the end of the string
Now, the lookaheads would also allow a different order of occurrences, e.g. 0---Aa, as long as the string contains any characters followed by what they are looking for.
What you probably want is ^[a-zA-Z0-9]{6,}$, i.e. at least six characters, with the characters being letters and numbers (though that would also allow aaaaaa, for example).
Maybe try ^(?=.*[a-z])(?=.*[A-Z])(?=.*[0-9])[a-zA-Z0-9]{6,}$ to make sure each group is present, and to get alpha-numerical characters (at least six of them) only.
I always use a tool such as http://www.regexpal.com/ to slowly build up my regex and to see where I go wrong, deconstructing a "bad" regex until I get to a "good" one, then slowly adding to it again.
Hope that helps. :)
P.S.: I'm still a bit unclear how many characters you want to match in total, i.e. if the string is fixed length or not...?
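To see the difference between the two patterns, here is a small Ruby sketch. ORIGINAL is the pattern from the question; STRICT is the suggested one with \A and \z swapped in for ^ and $, which is my own change because Ruby's ^ and $ anchor to lines. The constant names and the third test string are hypothetical.

```ruby
# Pattern from the question: `.` accepts any character, so spaces and
# punctuation slip through; only the last character is restricted.
ORIGINAL = /^(?=.*[a-z])(?=.*[A-Z])(?=.*[0-9]).{6,}[^[:punct:]]$/

# Suggested pattern, anchored to the whole string with \A and \z
# (an assumption; the answer uses ^ and $).
STRICT = /\A(?=.*[a-z])(?=.*[A-Z])(?=.*[0-9])[a-zA-Z0-9]{6,}\z/

["4aBc8Fk3", "4aBc 8.;3", "aaaaaa"].each do |s|
  puts "#{s.inspect}: original=#{ORIGINAL.match?(s)} strict=#{STRICT.match?(s)}"
end
```

The original pattern accepts the first two strings, while the stricter one accepts only 4aBc8Fk3.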

any explanation on the following regular expression?

I came across the following regex in Ruby code. Could anyone explain it to me in detail?
[\w-]+\.(?:doc|txt)$
In particular, I am not clear about [\w-]+\ and ?:
It is a sequence of one or more letter/number/underscore/hyphen, followed by the period, followed by either doc or txt at the end of a line.
[\w-] is letter/number/underscore/hyphen.
\. is an escaped period.
(?:...) is a grouping (required to express options between doc and txt) that would not appear in the result as a captured substring.
It is likely written for searching a file name with the extension doc or txt, embedded within a multi-line string. Or, if the author of that regex is stupid (mistaking $ for \z), then it might have been intended to simply match a file name with that extension.
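The difference between $ and \z, and the effect of the non-capturing group, can be seen in a short Ruby sketch. The file names and constant names below are hypothetical.

```ruby
# Pattern from the question: $ matches at the end of any line.
LINE_END = /[\w-]+\.(?:doc|txt)$/

# Variant anchored to the end of the whole string.
STRING_END = /[\w-]+\.(?:doc|txt)\z/

# Hypothetical multi-line string with a file name at the end of line one.
MULTI_LINE = "see my-notes.txt\nfor details"

p MULTI_LINE[LINE_END]     # => "my-notes.txt"
p MULTI_LINE[STRING_END]   # => nil

# (?:...) groups without capturing, so there are no captures to read.
p "report.doc".match(LINE_END).captures   # => []
```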
There is an online regex tester available at https://regex101.com/
You can use it to analyse, verify or debug your regex strings. It already saved me tons of time.
Your regex detailed automatically with the help of that tool:
/[\w-]+\.(?:doc|txt)$/
[\w-]+ match a single character present in the list below
Quantifier: + Between one and unlimited times, as many times as possible, giving back as needed [greedy]
\w match any word character [a-zA-Z0-9_]
- the literal character -
\. matches the character . literally
(?:doc|txt) Non-capturing group
1st Alternative: doc
doc matches the characters doc literally (case sensitive)
2nd Alternative: txt
txt matches the characters txt literally (case sensitive)
$ assert position at end of the string (the tool's wording; in Ruby, $ asserts position at the end of a line)
\w means any word character
minus in this context just means minus char
(?:doc|txt) means match doc or txt
so any word char or a minus repeated one or more times followed by a dot followed by either doc or txt and the pattern must be at the end of the line
the author should have escaped the minus for clarity imho
It means a file name which contains only word characters (a-z, A-Z, 0-9 and underscore) and hyphens, and with an extension of either .doc or .txt.
In detail,
\w matches a word character
[\w-] matches either a word character or a hyphen
[\w-]+ matches one or more such characters
\. matches a period
(?:) forms a non-capture group
(?:doc|txt) matches either a doc sequence, or a txt sequence
In ruby, $ matches the end of a line

Regex for capital letters not matching accented characters

I am new to ruby and I'm trying to work with regex.
I have a text which looks something like:
HEADING
Some text which is always non capitalized. Headings are always capitalized, followed by a space or nothing more.
YOU CAN HAVE MULTIPLE WORDS IN HEADING
I'm using this regular expression to choose all headings:
^[A-Z]{2,}\s?([A-Z]{2,}\s?)*$
However, it only matches headings which do not contain characters such as Č, Š, Ž (Slovenian characters).
So I'm guessing [A-Z] only matches ASCII characters? How could I get utf8?
You are right: when you define the ASCII range A-Z, the match is made literally only for those characters. This has to do with the history of character encodings: more and more characters have been added over time, and they are not always laid out in an encoding in ways that are easy to use.
You could make a larger character class that matches the Slovenian characters you need, by listing them.
But there is a shortcut. Someone else has already added necessary data to the Unicode data so that you can write shorter matches for "all uppercase characters": /[[:upper:]]/. See http://ruby-doc.org//core-2.1.4/Regexp.html for more.
Altering your regular expression with just this adjustment:
^[[:upper:]]{2,}\s?([[:upper:]]{2,}\s?)*$
You may need to adjust it further, for instance it would not match the heading "I AM A HEADING" due to the match insisting each word is at least two letters long.
Without seeing all your examples, I would probably simplify the group matching and just allow spaces anywhere:
^[[:upper:]\s]+$
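A small Ruby sketch comparing the three patterns. The headings are hypothetical; the accented one is written with \u escapes (it reads ŽIVALI IN ČLOVEK) so the snippet does not depend on the source file encoding.

```ruby
ASCII_ONLY    = /^[A-Z]{2,}\s?([A-Z]{2,}\s?)*$/
UNICODE_UPPER = /^[[:upper:]]{2,}\s?([[:upper:]]{2,}\s?)*$/
LOOSE         = /^[[:upper:]\s]+$/

# Hypothetical heading containing Z-caron (U+017D) and C-caron (U+010C).
ACCENTED_HEADING = "\u017DIVALI IN \u010CLOVEK"
# Heading with one-letter words, taken from the answer.
SHORT_WORDS = "I AM A HEADING"

p ASCII_ONLY.match?(ACCENTED_HEADING)     # => false
p UNICODE_UPPER.match?(ACCENTED_HEADING)  # => true
p UNICODE_UPPER.match?(SHORT_WORDS)       # => false
p LOOSE.match?(SHORT_WORDS)               # => true
```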
You can use unicode upper case letter:
\p{Lu}
Your regex:
\b\p{Lu}{2,}(?:\s*\p{Lu}{2,})*\b
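As a sketch of \p{Lu} in Ruby: the pattern below puts a * after the group so that any number of uppercase words is accepted, which is an assumption about the intent. The sample text and constant names are hypothetical; the text reads POŠTA DANES je zaprta and uses a \u escape for the accented letter.

```ruby
# \p{Lu} matches any Unicode uppercase letter.
# Assumption: the group is repeated with * so headings of any length match.
UPPER_WORDS = /\b\p{Lu}{2,}(?:\s*\p{Lu}{2,})*\b/

# Hypothetical text: an uppercase heading followed by lowercase words.
MIXED_TEXT = "PO\u0160TA DANES je zaprta"

p MIXED_TEXT[UPPER_WORDS] == "PO\u0160TA DANES"   # => true
```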

How can I write a regex in Ruby that will determine if a string meets this criteria?

How can I write a regex in Ruby 1.9.2 that will determine if a string meets this criteria:
Can only include letters, numbers and the - character
Cannot be an empty string, i.e. cannot have a length of 0
Must contain at least one letter
/\A[a-z0-9-]*[a-z][a-z0-9-]*\z/i
It goes like
beginning of string
some (or zero) letters, digits and/or dashes
a letter
some (or zero) letters, digits and/or dashes
end of string
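The breakdown above can be checked with a few inputs in Ruby. This is a minimal sketch; the method name valid_name? and the test strings are hypothetical.

```ruby
# Pattern from the answer: only letters, digits and dashes,
# with at least one letter somewhere in the string.
NAME_PATTERN = /\A[a-z0-9-]*[a-z][a-z0-9-]*\z/i

# Hypothetical helper wrapping the pattern.
def valid_name?(str)
  NAME_PATTERN.match?(str)
end

["abc-123", "A", "123-456", "", "ab_c"].each do |s|
  puts "#{s.inspect}: #{valid_name?(s)}"
end
```

Only the first two inputs pass: 123-456 has no letter, the empty string has no characters at all, and ab_c contains an underscore.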
I suppose these two will help you: /\A[a-z0-9\-]{1,}\z/i and /[a-z]{1,}/i. The first one checks on first two rules and the second one checks for the last condition.
No regex:
str.count("a-zA-Z") > 0 && str.count("^a-zA-Z0-9-") == 0
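The same one-liner wrapped in a method, as a sketch. String#count takes tr-style character sets, where a leading ^ negates the set and a trailing - is a literal dash. The method name and the inputs are hypothetical.

```ruby
# Hypothetical helper around the regex-free check from the answer.
def valid_without_regex?(str)
  has_letter  = str.count("a-zA-Z") > 0          # at least one letter
  only_valid  = str.count("^a-zA-Z0-9-") == 0    # nothing outside the allowed set
  has_letter && only_valid
end

["abc-123", "123-456", "", "ab_c"].each do |s|
  puts "#{s.inspect}: #{valid_without_regex?(s)}"
end
```

The empty string is rejected by the first condition, since it contains zero letters.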
There are tutorials on how to use regular expressions in Ruby that are worth a look. With regard to what you need, the following covers the first two rules (it does not check that at least one letter is present):
^[A-Za-z0-9\-]+$
The ^ will instruct the regex engine to start matching at the beginning of a line (in Ruby, \A anchors to the very beginning of the string).
The [..] will instruct the regex engine to match any one of the characters they contain.
A-Z means any upper case letter, a-z means any lower case letter and 0-9 means any digit.
The \- will instruct the regex engine to match a literal -. The \ is used in front of it because, inside square brackets, the - normally denotes a range; at the very end of the brackets the escape is optional.
The $ will instruct the regex engine to stop matching at the end of the line (in Ruby, \z anchors to the end of the string).
The + instructs the regex engine to match what is contained between the square brackets one or more times.
You can also use the i flag to make your search case insensitive. It goes after the closing delimiter, so the regex might become something like this:
/^[a-z0-9\-]+$/i
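A short Ruby sketch of the case-insensitive pattern, next to a variant anchored with \A and \z. The variant is an assumption added here to show the effect of line anchors on multi-line input; the constant names and inputs are hypothetical.

```ruby
# Case-insensitive pattern with line anchors; the i flag follows the
# closing delimiter.
LINE_ANCHORED = /^[a-z0-9\-]+$/i

# Assumption: same character class, anchored to the whole string.
STRING_ANCHORED = /\A[a-z0-9\-]+\z/i

# Hypothetical input whose second line contains invalid characters.
TWO_LINES = "ok\n!!!"

p LINE_ANCHORED.match?("ABC-123")     # => true
p LINE_ANCHORED.match?(TWO_LINES)     # => true  (the first line alone matches)
p STRING_ANCHORED.match?(TWO_LINES)   # => false
p STRING_ANCHORED.match?("123")       # => true  (no letter is required)
```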

Resources