search a text in another text with the characters in the same order - text-search

I would like to check whether a text ('needle') occurs within another text ('haystack') under the following two conditions:
all the characters of the 'needle' must appear in the 'haystack' in the same order
any number of other characters may appear between consecutive characters of the 'needle' within the 'haystack'
Examples:
cde in abcde --> TRUE
cde in ab-c-de --> TRUE
cde in cabecd --> FALSE
cde in c-d!a+b5ce --> TRUE
cde in edc --> FALSE
Moreover, 'cde' is not a constant string but a variable taken from a list that I iterate over.
Any elegant solution in Python, R, or bash would be appreciated.

I suggest using a dynamically generated regex like this:
/.*c.*d.*e.*/

Regular expressions are your friend.
http://en.m.wikipedia.org/wiki/Regular_expression
https://docs.python.org/2/library/re.html

I got the solution in python:
import re
re.search('.*'.join(map(re.escape, needle)), haystack, re.DOTALL)  # escape needle chars; gaps may span newlines
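A minimal sketch of two ways to do this, assuming Python 3. The function names are my own, not from the thread. The regex variant escapes each needle character so that characters such as . or + are matched literally, and passes re.DOTALL so the gaps may span newlines. The iterator variant avoids regular expressions altogether.

```python
import re

def contains_in_order_regex(needle, haystack):
    # Escape each character so regex metacharacters in the needle are literal.
    pattern = ".*".join(re.escape(ch) for ch in needle)
    return re.search(pattern, haystack, flags=re.DOTALL) is not None

def contains_in_order(needle, haystack):
    # "ch in it" advances the iterator until ch is found, so each needle
    # character has to appear after the previously matched one.
    it = iter(haystack)
    return all(ch in it for ch in needle)

for hay in ["abcde", "ab-c-de", "cabecd", "c-d!a+b5ce", "edc"]:
    print(hay, contains_in_order("cde", hay), contains_in_order_regex("cde", hay))
```

Because the needle is just a parameter, either function can be called inside a loop over a list of needles.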


Antlr4 handling of yaml unquoted multi-line strings

I am trying to build a parser for a limited set of YAML syntax similar to what is shown below using Antlr 4.7:
name:
  last: Smith
  first: John
address:
  street: 123 Main St
    Suite 100
  city: Boston
  state: MA
  zip: 12345
I have a grammar (derived from the Python 3 grammar) that works correctly if I put quotes around the "value" strings but fails if I remove them. It seems that defining the "value" string so matching terminates before the next "tag:" portion of a new block or a "tag: " portion of a new assign statement is the trick.
Does anyone have any ideas or working samples that handle this use case?
It is the indentation of a non-empty line that should end the matching of a plain scalar. If that indentation is not more than the indentation of the current mapping, the scalar ends there.
For example:
mapping:
  key: value with
    multiple lines
  key2:
    other value
Here, the value with multiple lines ends at the line with key2:, because it is not indented more than the current mapping (i.e. the value of mapping: above). Of course, the last newline character and the indentation of key2: is not a part of that scalar's content.
In the YAML specification, this is handled by a production
s-indent(n) ::= s-space × n
Now in our case, the inner mapping has an indentation of n=2, so your scalar would be matched by something like
plain-scalar-part (s-indent(3) s-white* plain-scalar-part)*
(I don't know Antlr syntax, just assume these are all non-terminals). After the (possibly empty) first line, you match an indentation of more than the parent mapping (so 3 spaces in this case), then there might be even more whitespace (which is not part of the content), and then more content follows. For simplicity, I ignored possible empty lines.
This will not match the line key2: because it has too little indentation, which is how the matching of the scalar ends.
Now I do not know how to do something like s-indent(n) in Antlr, but the Python grammar should give you the right pointers.
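The indentation rule can be sketched in Python, which may help when porting it to a lexer action. This is only an illustration of the idea, not a YAML parser: the function name read_plain_scalar and its parameters are hypothetical, it stops at empty lines for simplicity, and joining continuation lines with a single space is a simplification of YAML's line folding.

```python
def read_plain_scalar(first_part, following_lines, parent_indent):
    """Collect continuation lines that are indented more than parent_indent.

    Returns the joined scalar text and the number of lines consumed.
    """
    parts = [first_part.strip()] if first_part.strip() else []
    consumed = 0
    for line in following_lines:
        content = line.lstrip(" ")
        indent = len(line) - len(content)
        if not content.strip():
            break  # simplification: an empty line ends the scalar
        if indent <= parent_indent:
            break  # not indented more than the mapping: the scalar ends here
        parts.append(content.strip())
        consumed += 1
    return " ".join(parts), consumed

# The inner mapping is indented by 2, so continuation lines need at least 3 spaces.
lines = ["    multiple lines", "  key2:", "    other value"]
scalar, used = read_plain_scalar("value with", lines, parent_indent=2)
print(repr(scalar), used)
```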

Regex capturing from a non capture group in ruby

I am trying to fix a bit of regex I have for a chatops bot for lita. I have the following regex:
/^(?:how\s+do\s+I\s+you\s+get\s+far\s+is\s+it\s+from\s+)?(.+)\s+to\s+(.+)/i
This is supposed to capture the words before and after 'to', with optional words in front that can form questions like: How do I get from x to y, how far from x to y, how far is it from x to y.
expected output:
match 1 : "x"
match 2 : "y"
For the most part my optional words work as expected. But when I pull my response matches, I get the words leading up to the first capture group included.
So, how far is it from sfo to lax should return:
sfo and lax.
But instead returns:
how far is it from sfo and lax
The problem is that the first part of your regex matches the words only as one fixed sequence (how do I you get far is it from), not as alternatives.
To choose from multiple options, use this syntax:
(a|b|c)
What I think you're trying to do is this:
/^(?:(?:how|do|I|you|get|far|is|it|from)\s+)*(.+)\s+to\s+(.+)/i
The regexp says to skip all the words in the multiple options, regardless of order.
If you want to preserve word order, you can use regexps such as this pseudocode:
… how (can|do|will) (I|you|we) (get|go|travel) from …
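To check the suggested pattern, here is a small sketch in Python rather than Ruby; the alternation and non-capturing group syntax used here is the same in both. The helper name parse_route is hypothetical, and the sample sentences come from the question.

```python
import re

# Skip any run of filler words, then capture what surrounds "to".
ROUTE = re.compile(
    r"^(?:(?:how|do|I|you|get|far|is|it|from)\s+)*(.+)\s+to\s+(.+)",
    re.IGNORECASE,
)

def parse_route(text):
    # Returns (origin, destination), or None when the text does not match.
    m = ROUTE.match(text)
    return m.groups() if m else None

print(parse_route("how far is it from sfo to lax"))
print(parse_route("How do I get from x to y"))
```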
When you want to match words, \w is the most natural pattern I'd use (e.g., it is used in word count tools.)
To capture one word before and one word after a "to", you can use the regex (\w+\s+to\s+\w+).
To return them as 2 different groups, you can use (\w+)\s+to\s+(\w+).

Notepad++ how to delete second set of same character in a line

I have a text file that contains multiple lines; each line has the following format:
string1/string2/string3
All three strings are arbitrary. I want to remove /string3 from every line.
Does anyone have a suggestion?
Thank you in advance!
CTRL + H
Select Regular expression
Type /(\w)+$ into Find what
Replace with nothing
Of course, you may have to adjust the regular expression to your data (for example, /[^/]*$ if string3 can contain characters other than letters, digits, and underscores), but that's the way to go.
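Outside Notepad++, the same replacement can be sketched in Python. This assumes, as in the question, that every line has the form string1/string2/string3. The pattern /[^/]*$ removes the last slash and everything after it, whatever characters string3 contains. The function name is my own.

```python
import re

def drop_last_segment(line):
    # Remove the final "/" and everything after it on the line.
    return re.sub(r"/[^/]*$", "", line)

for line in ["string1/string2/string3", "a b/c-d/e.f g"]:
    print(drop_last_segment(line))
```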

Using Ruby on a string, how can I slice between two parts of the string using RegEx?

I just want to save the text between two specific points in a string into a variable. The text would look like this:
..."content"=>"The text I want to save to a variable"}]...
I suppose I would have to use scan or slice, but not exactly sure how to pull out just the text without grabbing the RegEx identifiers before and after the text. I tried this, but it didn't work:
var = mystring.slice(/\"content\"\=\>\".\"/)
This should do the job
var = mystring[/"content"=>"(.*)"/, 1]
Note that:
.slice aliases []
none of the characters you escaped are special regexp characters where you're using them
you can "group" the bit you want to keep with ()
.slice / [] take a second parameter to pick a matched group
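For comparison, the same capture-group idea looks like this in Python (the thread itself is about Ruby, so treat this as a side-by-side sketch). It uses a non-greedy (.*?) so the match stops at the first closing quote. With a greedy .* the capture would run to the last double quote in the string, which matters if more quoted fields follow. The sample string is made up to show that case.

```python
import re

# Hypothetical input with another quoted field after the one we want.
mystring = '[{"id"=>"1", "content"=>"The text I want to save to a variable"}, {"type"=>"x"}]'

# Group 1 holds only the text between the quotes that follow "content"=>.
m = re.search(r'"content"=>"(.*?)"', mystring)
var = m.group(1) if m else None
print(var)
```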
your_text = '"content"=>"The text I want to save to a variable"'
/"content"=>"(?<hooray>.*)"/ =~ your_text
Afterwards, the local variable hooray is set to your text. This only works when the regex literal is on the left-hand side of =~, and it can set several variables at once by using several named groups.
This regex will match your string:
/\"content\"=>\"(.*)\"/
you can try rubular.com for testing
It looks like you're trying to truncate a sentence. You can split the sentence either on punctuation, or even on words.
mystring.split(".")
mystring.split("word")

Locating the node by value containing whitespaces using XPath

I need to locate the node within an xml file by its value using XPath.
The problem arises when the node to find has a value that contains whitespace.
For example:
<Root>
<Child>value</Child>
<Child>value with spaces</Child>
</Root>
I cannot construct an XPath that locates the second Child node.
The simple XPath /Root/Child works perfectly for both children, but /Root[Child=value with spaces] returns an empty collection.
I have already tried masking the spaces with %20, &#20;, and &nbsp;, and using single and double quotes.
Still no luck.
Does anybody have an idea?
Depending on your exact situation, there are different XPath expressions that will select the node, whose value contains some whitespace.
First, let us recall that any one of these characters is "whitespace":
&#x9; -- the Tab
&#xA; -- newline
&#xD; -- carriage return
' ' or &#x20; -- the space
If you know the exact value of the node, say it is "Hello World" with a space, then a most direct XPath expression:
/top/aChild[. = 'Hello World']
will select this node.
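Applied to the document from the question, the fix is simply to quote the string literal. A quick way to try this is Python's standard xml.etree.ElementTree. It supports only a small XPath subset (the [.='text'] predicate needs Python 3.7 or later), so this is a sketch rather than a full XPath 1.0 evaluation.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    "<Root><Child>value</Child><Child>value with spaces</Child></Root>"
)

# The value is an ordinary quoted string literal; the spaces need no escaping.
matches = root.findall("Child[.='value with spaces']")
print([child.text for child in matches])
```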
The difficulties with specifying a value that contains whitespace, however, come from the fact that we see all whitespace characters just as ... well, whitespace, and cannot tell whether it is a group of spaces or a single tab.
In XPath 2.0 one may use regular expressions and they provide a simple and convenient solution. Thus we can use an XPath 2.0 expression as the one below:
/*/aChild[matches(., "Hello\sWorld")]
to select any child of the top node, whose value is the string "Hello" followed by whitespace followed by the string "World". Note the use of the matches() function and of the "\s" pattern that matches whitespace.
In XPath 1.0 a convenient test if a given string contains any whitespace characters is:
not(string-length(.) = string-length(translate(., ' &#9;&#xA;&#xD;', '')))
Here we use the translate() function to eliminate any of the four whitespace characters, and compare the length of the resulting string to that of the original string.
So, if in a text editor a node's value is displayed as
"Hello World",
we can safely select this node with the XPath expression:
/*/aChild[translate(., ' &#9;&#xA;&#xD;', '') = 'HelloWorld']
In many cases we can also use the XPath function normalize-space(), which takes a string and produces another string in which leading and trailing whitespace is stripped and every run of whitespace within the string is replaced by a single space.
In the above case, we will simply use the following XPath expression:
/*/aChild[normalize-space() = 'Hello World']
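What normalize-space() does to a string can be mimicked in plain Python, which is handy for checking which nodes such a comparison would select. This sketch only imitates the function's effect on text and does not evaluate XPath. The sample document with a stray tab and newline is made up for illustration.

```python
import re
import xml.etree.ElementTree as ET

def normalize_space(text):
    # Strip leading/trailing XML whitespace (space, tab, CR, LF) and
    # collapse each inner run of it to a single space.
    trimmed = (text or "").strip(" \t\r\n")
    return re.sub(r"[ \t\r\n]+", " ", trimmed)

# Hypothetical input: the second child contains a tab and a newline.
root = ET.fromstring(
    "<top><aChild>Hello</aChild><aChild>  Hello\t\n World </aChild></top>"
)

selected = [c for c in root.iter("aChild") if normalize_space(c.text) == "Hello World"]
print(len(selected))
```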
Try either this:
/Root/Child[normalize-space(text())='value without spaces']
or
/Root/Child[contains(text(),'value without spaces')]
or (since it looks like your test value may be the issue)
/Root/Child[normalize-space(text())=normalize-space('value with spaces')]
Haven't actually executed any of these so the syntax may be wonky.
Locating the Attribute by value containing whitespaces using XPath
I have an input element whose value contains whitespace.
eg:
<input type="button" value="Import Selected File">
I solved this by using this xpath expression.
//input[contains(@value,'Import') and contains(@value,'Selected') and contains(@value,'File')]
Hope this will help you guys.
"x0020" worked for me on a jackrabbit based CQ5/AEM repository in which the property names had spaces. Below would work for a property "Record ID"-
[(jcr:contains(jcr:content/@Record_x0020_ID, 'test'))]
Did you try #x20?
I found this through a quick search (it was the second result):
try replacing the space with "x0020".
That seems to have worked for the person who posted it.
None of the above solutions really worked for me.
However, there's a much simpler solution.
When you create the XmlDocument, make sure you set the PreserveWhitespace property to true:
XmlDocument xmldoc = new XmlDocument();
xmldoc.PreserveWhitespace = true;
xmldoc.Load(xmlCollection);
