What is Naur Text-Processing? - algorithm

Can someone please explain to me in layman's terms what the Naur Text-Processing rules are? I'm having trouble understanding what the rules mean, such as line-by-line form and line breaks.

Imagine that you have a text, say
Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do
eiusmod tempor incididunt ut labore et dolore magna aliqua.\nUt enim
ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut
aliquip ex ea commodo consequat. Duis aute irure dolor in
reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla
pariatur. Excepteur sint occaecat cupidatat non proident, sunt in
culpa qui officia deserunt mollit anim id est laborum.
The text contains three kinds of characters:
Spaces ( )
New Line characters (\n)
Letters (all other characters: letters, digits, punctuation, ...)
You have to split the given text into lines in the most efficient way (you want to obtain as few lines as possible), but the split must meet these restrictions:
A New Line character (\n) must start a new line
You can split the text and start a new line only at a space
Each line can contain at most MaxPos (a given constant) characters.
For the sample above with MaxPos = 30, we can split the text as
Lorem ipsum dolor sit amet,
consectetur adipiscing elit,
sed do eiusmod tempor
incididunt ut labore et
dolore magna aliqua.\n <- \n (New Line) must break the line; we can't add "Ut" to this line
Ut enim ad minim veniam,
...
The following splits break the rules and are therefore invalid:
Lorem ipsum dolor sit amet, consectetur <- The line is too long, exceeds MaxPos = 30
...
Lorem ipsum dolor sit amet,
consectetur adipiscing elit,
sed do eiusmod tempor incidi <- wrong split: we can split on spaces only
dunt
...
Lorem ipsum dolor sit amet,
consectetur adipiscing elit,
sed do eiusmod tempor
incididunt ut labore et
dolore magna aliqua.\nUt enim <- \n (New Line) must start a new line
ad minim veniam, quis nostrud
...
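The rules above can be sketched as a greedy line filler: keep appending words to the current line while they fit, and start a new line when they don't or when a New Line character is reached. This is only a minimal sketch, not Naur's original formulation. The name wrap_text, the parameter max_pos, and the choices to collapse runs of spaces and to raise an error for a word longer than max_pos are assumptions made for illustration.

```ruby
# Greedy line filling under the three rules described above.
# wrap_text and max_pos are hypothetical names chosen for this sketch.
def wrap_text(text, max_pos)
  lines = []
  # Rule 1: a New Line character always forces a break,
  # so each newline-separated chunk is filled independently.
  text.split("\n", -1).each do |chunk|
    current = ""
    # Rule 2: lines may only be broken at spaces, so work word by word.
    # split(" ") also collapses runs of spaces (an assumption of this sketch).
    chunk.split(" ").each do |word|
      # Assumption: a single word longer than max_pos cannot be placed legally.
      raise ArgumentError, "word longer than max_pos: #{word}" if word.length > max_pos

      if current.empty?
        current = word
      elsif current.length + 1 + word.length <= max_pos
        # Rule 3: the word (plus one separating space) still fits.
        current += " " + word
      else
        lines << current
        current = word
      end
    end
    lines << current
  end
  lines
end

puts wrap_text("one two three four\nfive six", 9)
# one two
# three
# four
# five six
```

Filling each line as far as possible before breaking is what keeps the number of lines small: moving a word to the next line earlier than necessary can never reduce the total.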

Related

Read a variable stored into another file in bash

I want to retrieve the contents of a variable stored in another file.
Contents of my file, file.txt:
text="Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt
ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco
laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in
voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat
cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum"
My script, script.sh:
#!/bin/bash
my_var=$(grep "^text=" file.txt | awk -F"=" '{print $2}' )
echo "$my_var"
When I run my script, it retrieves only the first line of the variable text, but I want the whole content of the variable.
Assign the entire contents of the file to a variable, then use a parameter expansion operator to remove the text= prefix.
my_var=$(< file.txt)
echo "${my_var#*=}"
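${my_var#*=} expands to the value of $my_var with the shortest prefix matching the pattern *= (here, text=) removed. The surrounding double quotes are part of the file's contents, so they remain in the result.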

MigraDoc: How to apply vertical line spacing to a paragraph?

I am creating a PDF using MigraDoc.
Everything works fine except the setting of line spacing of a paragraph.
I want to have more vertical space between paragraph lines.
What I tried so far without any change in the resulting PDF:
string text = "Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet. Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet.";
Paragraph para = CreateParagraph(text , "Helvetica", 7, "0.1mm", Colors.Black, ParagraphAlignment.Left);
// tried this:
para.Format.LineSpacing = MigraDoc.DocumentObjectModel.Unit.FromMillimeter(12);
// and tried that:
para.Format.LineSpacing = 12;
Can anyone point me in the right direction?
The meaning of LineSpacing depends on the value set for LineSpacingRule.
If LineSpacingRule is set to e.g. Single or Double then the value set for LineSpacing will be ignored.
Try AtLeast or Exactly for LineSpacingRule.

Text in columns (like in a table)

I would like to have one column with a label and a second column with a longer text inside with line breaks like in a table.
Label Text: Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed
            diam nonumy eirmod tempor invidunt ut labore et dolore magna
            aliquyam erat, sed diam voluptua. At vero eos et accusam et
            justo duo dolores et ea rebum. Stet clita kasd gubergren, no
            sea takimata sanctus est Lorem ipsum dolor sit amet. Lorem
            ipsum dolor sit amet, consetetur sadipscing elitr, sed diam.
I tried:
paste label.txt long.txt | column -s $'\t'
Thank you very much in advance!
Glad you have accepted an answer. Just for others who might want to have the
text re-wrapped to avoid over-long lines, this sort of text-processing is what nroff was invented for
over 40 years ago. It's now part of the groff package. Here's an example:
(echo -e '.na\n.nh'
cat label.txt
echo "'in \\w' $(<label.txt)'u"
cat long.txt ) |
nroff | sed '/^$/d'
Nroff commands begin with . or ' at start of line.
.na stops justification, .nh stops hyphenation, 'in sets the indent
to the width of the string (\w'...'), and the sed removes the blank lines from nroff's output.
You can set the line width with .ll, e.g. .ll 80 for 80 columns.
Long live nroff!
Label Text: Lorem ipsum dolor sit amet, consetetur sadipscing
            elitr, sed diam nonumy eirmod tempor invidunt ut
            labore et dolore magna aliquyam erat, sed diam
            voluptua. At vero eos et accusam et justo duo dolores
            et ea rebum. Stet clita kasd gubergren, no sea
            takimata sanctus est Lorem ipsum dolor sit amet.
            Lorem ipsum dolor sit amet, consetetur sadipscing
            elitr, sed diam.
The following bash script might help you:
padded-paste.sh:
#!/bin/bash
label=$1
text=$2
# get the number of lines in the text
nline=$(wc -l < "$text")
# get the width of the label
padding=$(awk 'NR==1{ print length }' "$label")
# create a temp directory and remove it when the script exits
tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT
templabel=$tmpdir/label.tmp
# print the first line of the label file to a temp file:
awk 'NR==1{ print }' "$label" > "$templabel"
# add blank padding to the temp label file:
for i in $(seq 2 $nline); do
    printf "%*s\n" "$padding" "" >> "$templabel"
done
# paste the padded label to the long text
paste -d' ' "$templabel" "$text"
Based on the following inputs:
label.txt:
Label Text:
long.txt:
Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy
eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam
voluptua. At vero eos et accusam et justo duo dolores et ea rebum. Stet
clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit
amet. Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam.
You can use it like:
sh padded-paste.sh label.txt long.txt
And it will output:
Label Text: Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy
            eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam
            voluptua. At vero eos et accusam et justo duo dolores et ea rebum. Stet
            clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit
            amet. Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam.

List of substitutions in external file

I need to run a string against an external file that contains a list of substitutions, applying each one at every occurrence.
The substitution file will look like this (I'm open to suggestions on the structure; it can be CSV, YAML, etc.):
"ipsum" "foobar"
"elit" ""
"sit amet" "2312"
My ruby code should be implemented like this:
mystring = "Lorem ipsum dolor sit amet, consectetur adipiscing elit. Aliquam quis elit augue. Nulla tempus magna nec ligula dapibus malesuada. Fusce at orci augue, sit amet suscipit sem. Suspendisse potenti."
newstring = mystring.somemagichappenshere
And the newstring value should be "Lorem foobar dolor 2312, consectetur adipiscing . Aliquam quis augue. Nulla tempus magna nec ligula dapibus malesuada. Fusce at orci augue, 2312 suscipit sem. Suspendisse potenti."
How should I implement that?
Using a csv:
require 'csv'
str = "Lorem ipsum dolor sit amet, consectetur adipiscing elit. Aliquam quis elit augue. Nulla tempus magna nec ligula dapibus malesuada. Fusce at orci augue, sit amet suscipit sem. Suspendisse potenti."
replacements = "ipsum,foobar
elit,
sit amet,2312"
# Construct a hash from the CSV (an empty second column becomes nil):
transform_table = Hash[CSV.parse(replacements)]
# Take the keys from the hash and use them for a regular expression:
re = Regexp.union(transform_table.keys)
# Do all substitutions in one go (a nil value replaces the match with an empty string):
p str.gsub(re, transform_table)
It's quite simple:
Read the file.
Iterate over each line in the file and, for each entry, call mystring.gsub!(find, replace) to replace the value with its substitution.
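A minimal sketch of those two steps, assuming the substitution file holds one unquoted "find,replace" pair per line (that file format, the helper name apply_substitutions, and the temporary file used for the demo are all assumptions, not something the question specifies):

```ruby
require 'tempfile'

# Hypothetical helper: applies every "find,replace" line of the file at
# `path` to a copy of `string`, one line after another in file order.
def apply_substitutions(string, path)
  result = string.dup
  File.foreach(path) do |line|
    line = line.chomp
    next if line.empty?
    # Split on the first comma only, so the replacement may contain commas.
    find, replace = line.split(",", 2)
    # A missing or empty second field means "delete the match".
    result.gsub!(find, replace.to_s)
  end
  result
end

# Demo: a temporary file stands in for the real substitution list.
Tempfile.create("substitutions") do |file|
  file.write("ipsum,foobar\nelit,\nsit amet,2312\n")
  file.flush
  puts apply_substitutions("Lorem ipsum dolor sit amet, consectetur adipiscing elit.", file.path)
end
# Lorem foobar dolor 2312, consectetur adipiscing .
```

Because the substitutions run one after another, a later entry can match text produced by an earlier one. If that is not wanted, a single pass with one combined regular expression avoids it.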

Ruby - Find the top 3 longest words in a string

I want to be able to get the 3 longest words from a string. Is there a neat way of doing this without getting into arrays etc?
>> str = 'Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod
tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam,
quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat.'
>> str.split.map { |s| s.gsub(/\W/, '') }.sort_by(&:length)[-3..-1]
=> ["adipisicing", "consectetur", "exercitation"]
"some string with words that are of different length".split(/ /).sort_by(&:length).reverse[0..2]
Since Ruby 2.2, Enumerable's max_by, min_by, max and min take an optional argument that specifies how many elements are returned.
str.scan(/[[:alnum:]]+/).max_by(3, &:size)
# => ["exercitation", "consectetur", "adipisicing"]

Resources