Elastic search - get position from result - elasticsearch

Is there a way to get the position of a result in Elasticsearch?
Let's say I have the following document:
"Lorem ipsum dolor sit amet, john HALLO doe consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et john HALLO doe dolore magna aliquyam erat john HALLO doe, sed diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum."
Now I search for "HALLO" and get three hits, each with a few words before and after it ("john HALLO doe").
My problem is that these surrounding words can be identical for every hit, so they do not tell the hits apart.
So is there a better way to get the exact position of a hit in the document, e.g. ">HALLO< [line, char-start - char-end]"?

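Yes, with the term vectors API (see the Elasticsearch documentation). You have to enable the offsets parameter; each returned token then carries its start and end character offsets. Follow the example in the documentation.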
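As a rough sketch of what such a request could look like: the server URL, the index name "docs", the document id "1" and the field name "text" below are placeholders, not values from the question, and the _termvectors endpoint path is the one used by recent Elasticsearch versions (older releases used a slightly different path).

```shell
# Sketch only: URL, index, id and field name are placeholders.
ES_URL="${ES_URL:-http://localhost:9200}"

# Build a term vectors request body that asks for positions and offsets.
termvectors_body() {
  printf '{"fields":["%s"],"offsets":true,"positions":true,"term_statistics":false,"field_statistics":false}\n' "$1"
}

# Send the request; needs curl and a reachable Elasticsearch node.
# Usage: fetch_termvectors docs 1 text
fetch_termvectors() {
  curl -s -H 'Content-Type: application/json' \
    -X POST "$ES_URL/$1/_termvectors/$2" \
    -d "$(termvectors_body "$3")"
}

# Show the body that would be sent for the placeholder field.
termvectors_body text
```

In the response, each term lists its tokens with position, start_offset and end_offset. The offsets are character offsets into the field value; a line number is not part of the response. Depending on the analyzer, the term may be stored lowercased (e.g. "hallo").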

Related

Notepad++ Syntax colouring: Delimiter style closes on end-of-line

I'm making a new colouring syntax for Notepad++.
I'd like to detect and mark the "some_method_name" in the following text:
method::some_method_name
Lorem ipsum dolor sit amet, consectetur adipiscing elit. Sed non risus. Suspendisse lectus tortor, dignissim sit amet, ...
I wanted to use a "Delimiter style" with "::" as Open param, and End of line as Close param. But I don't know how to write this "end of line".
I tried \n, \\n, \n\r, \\n\\r, \\r\\n, but none are working.
How should I approach this syntax colouring?

in bash, create json object of key=filename and value=file-contents given sequence of pathed filenames on stdin [closed]

For example, if the list of the filenames on stdin is /etc/alpha.txt and /tmp/beta.txt
And /etc/alpha.txt contains wibble
And /tmp/beta.txt contains fu\nbar
Then what I'd like to generate is
{"/etc/alpha.txt":"wibble","/tmp/beta.txt":"fu\nbar"}
I don't have access to any programming languages.
This is on a Linux OS.
I can install utilities like jq.
The solution from Léa Gris looks spot on. Thank you Léa. Alas my question has been closed as not being focused enough. Sorry about that. This is only my second question on StackOverflow! I'm struggling to make it more focused. This really is my exact issue. I'm trying to make the core runner service in https://cyber-dojo.org a little faster.
My attempts had got stuck at what to put before the jq -s add.
Here it is:
#!/usr/bin/env bash
files=('alpha.txt' 'beta.txt' 'delta with space name.txt')
# Filling sample files
printf 'wibble' >'alpha.txt'
printf 'fu\nbar' >'beta.txt'
cat >'delta with space name.txt' <<'EOF'
Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.
Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat.
Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur.
Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.
EOF
# Stream the test files names (1 per line)
printf '%s\n' "${files[@]}" |
xargs -L 1 -I {} jq -sR --arg key {} '{ ($key): .}' {} | jq -s 'add'
xargs -L 1 -I {}: Runs the command once per line of stdin, substituting the file name for each pair of curly braces.
xargs then runs the jq command, which builds one object per file with the file name as key and the file content as value:
jq -sR --arg key {} '{ ($key): .}' {}
Finally, the stream of JSON objects is piped into a final jq -s 'add'
to re-assemble it into a single merged object of "key": "value" pairs:
jq -s 'add'
And finally the actual output of all this:
{
  "alpha.txt": "wibble",
  "beta.txt": "fu\nbar",
  "delta with space name.txt": "Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.\nUt enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat.\nDuis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur.\nExcepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.\n"
}
A shorter variant that takes the key from input_filename instead of --arg (with -I, xargs still starts one jq per file):
xargs -I {} jq -sR '{(input_filename):.}' {} | jq -s add
The caution about using input_filename from man jq:
input_filename
Returns the name of the file whose input is currently being filtered. Note that this will not work well unless jq is running in a UTF-8 locale.
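Since the question asks for file names arriving on stdin, here is a minimal sketch of the same jq idea wrapped in a plain read loop instead of xargs. The function name files_to_json is made up for illustration, and it assumes jq is installed and that file names contain no newlines.

```shell
# Hypothetical helper: read one file name per line from stdin and print a
# single compact JSON object mapping each name to that file's content.
files_to_json() {
  while IFS= read -r f; do
    # -R reads raw text, -s slurps the whole file into one string
    jq -sR --arg key "$f" '{($key): .}' "$f"
  done | jq -sc 'add'
}

# Usage (paths taken from the question):
#   printf '%s\n' /etc/alpha.txt /tmp/beta.txt | files_to_json
```

The read loop passes each name to jq untouched, so names with spaces or quotes need no special handling, at the cost of still starting one jq process per file.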
Here's one option that assumes both jq and a bash or bash-like shell:
while read -r d
do
  (cd "$d" && find . -maxdepth 1 -type f) | while read -r f ; do
    # Emit "dir<TAB>file" without stray spaces around the tab
    printf '%s\t%s\n' "$d" "${f#./}"
  done
done | jq -Rn '[inputs | split("\t") | {(.[0]): .[1]}] | add'
For reference, if you don't want to use jq:
#!/bin/bash
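# Print a file with every (CR)LF replaced by a literal \n
gulp() {
  sed -E ':a;N;$!ba;s/\r{0,1}\n/\\n/g' "$1"
}
# Read two file names per iteration from stdin
while read -r f1 && read -r f2; do
  val1=$(gulp "$f1")
  val2=$(gulp "$f2")
  echo "{\"$f1\":\"$val1\",\"$f2\":\"$val2\"}"
done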
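The "gulp" subroutine reads a file and converts newlines to literal '\n'. Then the main part of the script just reads two lines at a time from stdin, so it expects an even number of file names and does not escape quotes or backslashes in the file content.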

Remove blank line after a match [closed]

Imagine the following example input file:
(1) Lorem ipsum dolor sit amet
vero eos et accusam et justo duo
(2) Lorem ipsum dolor sit amet

vero eos et accusam et justo duo
(3) Lorem ipsum -- dolor sit amet

vero eos et accusam et justo duo
(4) Lorem -- ipsum dolor sit amet

vero eos et accusam et justo duo
I am interested in finding, in a script, all lines that end with the keyword amet and do not contain the keyword --. If such a line is found, the following line should be removed if it is blank. So only the second (2) example has to be changed:
(1) Lorem ipsum dolor sit amet
vero eos et accusam et justo duo
(2) Lorem ipsum dolor sit amet
vero eos et accusam et justo duo
(3) Lorem ipsum -- dolor sit amet

vero eos et accusam et justo duo
(4) Lorem -- ipsum dolor sit amet

vero eos et accusam et justo duo
This sed command would work:
sed '/--/b;/amet$/{N;s/\n$//;}'
It does the following:
/--/b          # If line matches "--", skip all commands
/amet$/ {      # If the line ends in "amet"
    N          # Read next line into pattern space
    s/\n$//    # Delete the second line if it is blank
}
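This would fail for a few edge cases: does a line ending in blamet qualify? Does -- have to be separated by blanks? And could there ever be input like this, with two consecutive lines ending in amet followed by a blank line:
ends in amet
also ends in amet

next line
The solution presented would not remove the blank line here, because the second amet line is pulled in by N while the first one is being processed. For the presented input, though, it would work.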
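If consecutive amet lines can occur, one possible way around it is a sliding two-line window with N, P and D instead of the command above. This is a different technique from the answer's one-liner, sketched here assuming GNU sed (it relies on \n inside a bracket expression); the function name is made up for illustration.

```shell
# Hypothetical helper: keep a two-line window in the pattern space so that
# every line gets its own chance to be the "amet" line.
drop_blank_after_amet() {
  sed '$!N
/^[^\n]*amet\n$/{
  /--/!s/\n$//
}
P
D'
}

# The edge case from above: the blank line after the second amet line goes away.
printf 'ends in amet\nalso ends in amet\n\nnext line\n' | drop_blank_after_amet
```

P prints only the first line of the window and D drops it without reading new input, so the second line becomes the first line of the next window and is checked in turn.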
Let's create it bit by bit:
Process only lines that end with amet (using the GNU \b to match a word boundary):
/\bamet$/
If it doesn't contain --
/--/!
Then print the line
n
And if the next line is empty, delete it
/^$/d
That gives this simple program:
#!/bin/sed -f
# Process only lines that end with `amet`:
/\bamet$/{
    # If it doesn't contain `--`
    /--/!{
        # Then print the line
        n
        # And if the next line is empty, delete it
        /^$/d
    }
}
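To try the program without saving it as a separate script file, the same commands can be passed inline with -e. A small sketch, assuming GNU sed (for \b); the wrapper name strip_blank_after_amet is made up for illustration.

```shell
# Same commands as the script above, one -e per line of the program.
strip_blank_after_amet() {
  sed -e '/\bamet$/{' -e '/--/!{' -e 'n' -e '/^$/d' -e '}' -e '}'
}

# Three input lines; the empty one directly after the "amet" line is dropped.
printf '%s\n' 'Lorem ipsum dolor sit amet' '' 'vero eos' | strip_blank_after_amet
```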

Insert two line breaks in between merged files using PASTE

I'm using the following command to merge several files: paste -d"\n \n" -s *.md > big-markdown-file.md.
My issue is that there is only one line break between the merged files:
# This is the start of
Lorem ipsum dolor sit amet, te eos solet copiosae deterruisset, mea eu augue postulant temporibus. Sit ex definiebas referrentur. This is the end of file1.
# This is the start of File2.md
Lorem ipsum dolor sit amet, te eos solet copiosae deterruisset, mea eu augue postulant temporibus. Sit ex definiebas referrentur. This is the end of file2.
# This is the start of File3.md
Lorem ipsum dolor sit amet, te eos solet copiosae deterruisset, mea eu augue postulant temporibus. Sit ex definiebas referrentur. This is the end of file3.
This causes issues when the markdown is processed, turning the lorem ipsum paragraphs into headings. Is there a way to introduce 2 line breaks between the individual pastes in the final file so that it outputs something like this:
# This is the start of
Lorem ipsum dolor sit amet, te eos solet copiosae deterruisset, mea eu augue postulant temporibus. Sit ex definiebas referrentur. This is the end of file1.
# This is the start of File2.md
Lorem ipsum dolor sit amet, te eos solet copiosae deterruisset, mea eu augue postulant temporibus. Sit ex definiebas referrentur. This is the end of file2.
# This is the start of File3.md
Lorem ipsum dolor sit amet, te eos solet copiosae deterruisset, mea eu augue postulant temporibus. Sit ex definiebas referrentur. This is the end of file3.
Maybe "cheat" and create a dummy file?
$ touch dummy
$ paste -d"\n" -s *.md dummy > big-markdown-file.md
$ rm dummy # :)
I think it will cause paste to try and consume the next line from the empty file, "fail", and create an empty line instead.
Actually, for a list of files you'll have to create a dummy for each:
$ # create dummy files
$ for f in *.md; do echo "$f"; touch "${f}_dummy.md"; done
$ # create the result file
$ paste -d"\n" -s *.md > big-markdown-file.md
$ # remove dummy files
$ find . -name '*_dummy.md' -delete
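An alternative that avoids dummy files altogether is a plain loop with cat. This is not the paste approach from the answer, just a minimal sketch; the function name is made up, and it assumes every input file ends with a newline.

```shell
# Hypothetical helper: concatenate the given files with one empty line
# between consecutive files (none before the first or after the last).
merge_with_blank_lines() {
  first=1
  for f in "$@"; do
    if [ "$first" -eq 0 ]; then
      printf '\n'
    fi
    cat -- "$f"
    first=0
  done
}

# Demo with two throwaway files in a temporary directory.
tmp=$(mktemp -d)
printf '# File1\nfirst paragraph\n' > "$tmp/file1.md"
printf '# File2\nsecond paragraph\n' > "$tmp/file2.md"
merge_with_blank_lines "$tmp"/*.md
rm -r "$tmp"
```

When redirecting the result to a file, it is safer to write it outside the directory matched by *.md (or give it another extension), so that a leftover output file from an earlier run is not picked up as input.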

Regex: capture multiple similar blocks

I'm trying to capture all the {% tag %}...{% endtag %} blocks individually in a string but my regex always returns the whole string from the first opening tag to the last ending tag. How can I make it capture all the blocks separately instead of just one match?
Here is an example of a string:
{% tag %}Lorem ipsum dolor sit amet{% endtag %}
{% tag %}
Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod
tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam,
Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod
tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam,
{% endtag %}
And my regex (in ruby): /(\{% trans %\}.*\{% endtrans %\})/m
I know the .* is the issue but I haven't found a way to match everything except a closing tag.
You should use a non-greedy (lazy) quantifier, written by appending ? to the quantifier. This means that .*? will try to match as little as possible, whereas .* matches as much as it can (greedy).
/(\{% trans %\}.*?\{% endtrans %\})/m
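The difference can also be checked quickly from a shell. This sketch uses grep's PCRE mode (-P, a GNU grep extension) rather than Ruby, and because grep works line by line it only covers blocks that sit on one line; the function names are made up for illustration.

```shell
# -o prints each match on its own line, -P enables Perl-compatible syntax.
match_greedy() { grep -oP '\{% trans %\}.*\{% endtrans %\}'; }
match_lazy()   { grep -oP '\{% trans %\}.*?\{% endtrans %\}'; }

# Usage: the greedy version prints one long match, the lazy one prints
# each block separately.
#   line='{% trans %}a{% endtrans %} {% trans %}b{% endtrans %}'
#   printf '%s\n' "$line" | match_greedy
#   printf '%s\n' "$line" | match_lazy
```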
