Regex: capture multiple similar blocks - ruby

I'm trying to capture all the {% tag %}...{% endtag %} blocks individually in a string but my regex always returns the whole string from the first opening tag to the last ending tag. How can I make it capture all the blocks separately instead of just one match?
Here is an example of a string:
{% tag %}Lorem ipsum dolor sit amet{% endtag %}
{% tag %}
Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod
tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam,
Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod
tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam,
{% endtag %}
And my regex (in ruby): /(\{% trans %\}.*\{% endtrans %\})/m
I know the .* is the issue but I haven't found a way to match everything except a closing tag.

Use a non-greedy (lazy) quantifier by appending ? to the quantifier. .*? matches as little as possible and so stops at the first closing tag, whereas the greedy .* matches as much as it can and runs on to the last one.
/(\{% trans %\}.*?\{% endtrans %\})/m
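A minimal sketch of the difference, using String#scan to collect every block. The sample text and the constant names are made up for illustration, and the pattern uses tag/endtag to match the sample string from the question rather than trans/endtrans:

```ruby
# Sample text modelled on the question (an assumption, shortened).
text = <<~TEXT
  {% tag %}Lorem ipsum dolor sit amet{% endtag %}
  {% tag %}
  Lorem ipsum dolor sit amet, consectetur adipisicing elit
  {% endtag %}
TEXT

# Lazy: .*? stops at the first closing tag; /m lets . match newlines.
LAZY = /\{% tag %\}.*?\{% endtag %\}/m
# Greedy, for comparison: .* runs on to the last closing tag.
GREEDY = /\{% tag %\}.*\{% endtag %\}/m

# Without capture groups, scan returns an array of the matched strings.
lazy_blocks = text.scan(LAZY)
greedy_blocks = text.scan(GREEDY)

puts lazy_blocks.length   # 2
puts greedy_blocks.length # 1
```

Note that with the capturing parentheses from the question, scan would return an array of one-element arrays instead of plain strings.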

Related

Notepad++ Syntax colouring: Delimiter style closes on end-of-line

I'm making a new colouring syntax for Notepad++.
I'd like to detect and mark the "some_method_name" in the following text:
method::some_method_name
Lorem ipsum dolor sit amet, consectetur adipiscing elit. Sed non risus. Suspendisse lectus tortor, dignissim sit amet, ...
I wanted to use a "Delimiter style" with "::" as Open param, and End of line as Close param. But I don't know how to write this "end of line".
I tried \n, \\n, \n\r, \\n\\r and \\r\\n, but none of them work.
How should I approach this syntax colouring?

in bash, create json object of key=filename and value=file-contents given sequence of pathed filenames on stdin [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 2 years ago.
For example, if the list of the filenames on stdin is /etc/alpha.txt and /tmp/beta.txt
And /etc/alpha.txt contains wibble
And /tmp/beta.txt contains fu\nbar
Then what I'd like to generate is
{"/etc/alpha.txt":"wibble","/tmp/beta.txt":"fu\nbar"}
I don't have access to any programming languages.
This is on a Linux OS.
I can install utilities like jq.
The solution from Léa Gris looks spot on. Thank you Léa. Alas my question has been closed as not being focused enough. Sorry about that. This is only my second question on StackOverflow! I'm struggling to make it more focused. This really is my exact issue. I'm trying to make the core runner service in https://cyber-dojo.org a little faster.
My attempts had got stuck at what to put before the jq -s add.
Here it is:
#!/usr/bin/env bash
files=('alpha.txt' 'beta.txt' 'delta with space name.txt')
# Filling sample files
printf 'wibble' >'alpha.txt'
printf 'fu\nbar' >'beta.txt'
cat >'delta with space name.txt' <<'EOF'
Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.
Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat.
Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur.
Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.
EOF
# Stream the test files names (1 per line)
printf '%s\n' "${files[@]}" |
xargs -L 1 -I {} jq -sR --arg key {} '{ ($key): .}' {} | jq -s 'add'
xargs -L 1 -I {}: runs the command once per line read from stdin, replacing each {} with the file name. (-I already implies -L 1, so the latter is optional.)
xargs then runs the jq command to create the filename key and value content of file objects:
jq -sR --arg key {} '{ ($key): .}' {}
Finally, the stream of JSON objects is piped into a final jq -s 'add'
to re-assemble it, into a merged object with "key": "value" pairs:
jq -s 'add'
And finally the actual output of all this:
{
"alpha.txt": "wibble",
"beta.txt": "fu\nbar",
"delta with space name.txt": "Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.\nUt enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat.\nDuis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur.\nExcepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.\n"
}
A shorter variant lets jq supply the key itself through input_filename (jq still runs once per file):
xargs -I {} jq -sR '{(input_filename):.}' {} | jq -s add
The caution about using input_filename from man jq:
input_filename
Returns the name of the file whose input is currently being filtered. Note that this will not work well unless jq is running in a UTF-8 locale.
Here's one option that assumes both jq and a bash or bash-like shell:
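while read -r d
do
  # List the regular files directly inside directory $d
  (cd "$d" && find . -maxdepth 1 -type f) | while read -r f ; do
    # Emit "directory<TAB>filename" with the leading ./ stripped
    printf '%s\t%s\n' "$d" "${f#./}"
  done
done | jq -Rn '[inputs | split("\t") | {(.[0]): .[1]}] | add'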
For reference, if you don't want to use jq:
#!/bin/bash
gulp() {
  # Join all lines of the file, replacing each newline with a literal \n
  sed -E ':a;N;$!ba;s/\r{0,1}\n/\\n/g' "$1"
}
while read -r f1 && read -r f2; do
  val1=$(gulp "$f1")
  val2=$(gulp "$f2")
  echo "{\"$f1\":\"$val1\",\"$f2\":\"$val2\"}"
done
The "gulp" subroutine reads a file and converts newlines to literal '\n'. Then the main part of the script just reads two lines at a time from stdin.

Remove blank line after a match [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 3 years ago.
Imagine the following example input file:
(1) Lorem ipsum dolor sit amet
vero eos et accusam et justo duo
(2) Lorem ipsum dolor sit amet
vero eos et accusam et justo duo
(3) Lorem ipsum -- dolor sit amet
vero eos et accusam et justo duo
(4) Lorem -- ipsum dolor sit amet
vero eos et accusam et justo duo
I am interested in finding all lines ending with the keyword amet and not containing the keyword -- in a script. If such a line is found the successor line should be removed if it is blank. So only the second (2) example has to be changed:
(1) Lorem ipsum dolor sit amet
vero eos et accusam et justo duo
(2) Lorem ipsum dolor sit amet
vero eos et accusam et justo duo
(3) Lorem ipsum -- dolor sit amet
vero eos et accusam et justo duo
(4) Lorem -- ipsum dolor sit amet
vero eos et accusam et justo duo
This sed command would work:
sed '/--/b;/amet$/{N;s/\n$//;}'
It does the following:
/--/b # If line matches "--", skip all commands
/amet$/ { # If the line ends in "amet"
N # Read next line into pattern space
s/\n$// # Delete the second line if it is blank
}
This would fail for a few edge cases: does a line ending in blamet qualify? Does -- have to be separated by blanks? Could there ever be input like this:
ends in amet
also ends in amet
next line
as the solution presented would not remove the blank line here. For the presented input, though, it would work.
Let's create it bit by bit:
Process only lines that end with amet (using the GNU \b to match a word boundary):
/\bamet$/
If it doesn't contain --
/--/!
Then print the line
n
And if the next line is empty, delete it
/^$/d
That gives this simple program:
#!/bin/sed -f
# Process only lines that end with `amet`:
/\bamet$/{
# If it doesn't contain `--`
/--/!{
# Then print the line
n
# And if the next line is empty, delete it
/^$/d
}
}
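For checking the rule outside sed, here is a rough Ruby sketch of the same logic. The helper name and the sample lines are assumptions; the blank lines in the sample are placed where the question implies them. Unlike the sed programs above, it re-checks every line it keeps, so two consecutive lines ending in amet are both handled:

```ruby
# Hypothetical helper: drop an empty line that directly follows a line
# ending in the word "amet" and not containing "--".
def drop_blank_after_amet(lines)
  result = []
  skip_blank = false
  lines.each do |line|
    if skip_blank && line.empty?
      skip_blank = false # only the first empty line is dropped
      next
    end
    result << line
    skip_blank = line.match?(/\bamet$/) && !line.include?("--")
  end
  result
end

# Assumed sample input (lines already stripped of their newline).
sample = [
  "(1) Lorem ipsum dolor sit amet",
  "vero eos et accusam et justo duo",
  "(2) Lorem ipsum dolor sit amet",
  "",
  "vero eos et accusam et justo duo",
  "(3) Lorem ipsum -- dolor sit amet",
  "",
  "vero eos et accusam et justo duo"
]

cleaned = drop_blank_after_amet(sample)
puts cleaned
```

Only the empty line after (2) is removed; the one after (3) stays because that line contains --.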

vbScript ignore many blank spaces after split

When I split a string with many spaces, is there a way to skip blank spaces?
Example string below:
Lorem ipsum dolor sit amet, consectetur
adipiscing elit. Morbi cursus quam sapien, sed ultricies diam vestibulum ac.
Morbi luctus nisl eleifend mi tincidunt,
sed vehicula magna lobortis.
When split, the array contains many positions of " " (blank spaces)
[0] Lorem
[1] " "
[2] " "
[3] " "
[4] " "
[5] Ipsum
So, is there a way to skip these blank spaces and get something like this?
[0] Lorem
[1] Ipsum
[2] dolor
Here's my code:
strTmp = split(tmpstr," ")
For each text in strTmp
'Here I validate other things
If InStr(text, textToFind) Then
print "text found"
Else
print "not found"
End If
Next
One way is to process the string before splitting it.
Sample Code
varStr = "Lorem ipsum dolor sit amet, consectetur adipiscing elit. Morbi cursus quam sapien, sed ultricies diam vestibulum ac. Morbi luctus nisl eleifend mi tincidunt, sed vehicula magna lobortis"
' this is what you are getting right now
arrStr = Split(varStr, " ")
Set objRegEx = CreateObject("VBScript.RegExp")
With objRegEx
.Global = True
.MultiLine = True
.Pattern = "\s+" 'matches one or more whitespace characters
varStr1 = objRegEx.Replace(varStr, "¬")
End With
Set objRegEx = Nothing
' this is what you want
arrStr1 = Split(varStr1, "¬")
I first replace every run of whitespace with a single ¬, which then acts as the delimiter when I split the string.
You can also loop over the string, replacing double spaces with single spaces until none are left:
Do Until InStr(text, "  ") = 0
text = Replace(text, "  ", " ")
Loop
You can also skip the blank elements inside the loop:
If trim(text) <> "" Then
Else
End if
Or
If len(trim(text)) > 0 Then
Else
End if

Elastic search - get position from result

Is there a way to get the position of a result in Elasticsearch?
Let's say I have the following document:
"Lorem ipsum dolor sit amet, john HALLO doe consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et john HALLO doe dolore magna aliquyam erat john HALLO doe, sed diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum."
Now I search for "HALLO" and get three hits, and some words before and after each hit ("john HALLO doe").
My problem is that the surrounding words can be identical for every hit.
So is there a more fancy way to get the exact position from the hit in the document, like e.g. ">HALLO< [line, char-start - char-end]"?
Yes, with the term vectors API. You have to enable the offsets parameter; follow the example in the documentation.
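As a rough sketch, the request body for such a call could be built like this. The field name content is hypothetical, and the endpoint path is left out because it differs between Elasticsearch versions. With offsets and positions enabled, the response lists, for each term, its tokens with position, start_offset and end_offset:

```ruby
require 'json'

# Hypothetical field name; use the field that holds the document text.
FIELD = 'content'

# Body for a term vectors request: ask for token positions and
# character offsets, and skip the statistics we do not need here.
body = {
  fields: [FIELD],
  positions: true,
  offsets: true,
  term_statistics: false,
  field_statistics: false
}

puts JSON.pretty_generate(body)
```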

Resources