trying to parse multiple patterns from single command (using sed) - bash

I have multiple files (markdown) that are used to generate different artifacts. For one of the artifacts, I need to parse for lines that begin with # AND for lines between a pattern (::: notes -> :::).
example file
# Blah
Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod
tempor incididunt ut labore et dolore magna aliqua.
- one
- two
- three
<!--
::: notes
- one is yadda yadda
- two is yadda yadda yadda
- three is wrong
:::
-->
## derp derp
Duis aute irure dolor in reprehenderit in voluptate velit esse
cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non
proident, sunt in culpa qui officia deserunt mollit anim id est laborum.
# woo hoo!
Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat.
<!--
::: notes
Aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse
cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non
proident, sunt in culpa qui officia deserunt mollit anim id est laborum.
:::
-->
I can use sed to find all the # for me
sed -n '/#/p' FILENAME.md
produces the output:
# Blah
## derp derp
# woo hoo!
and I can use sed to properly find and spit out the notes
sed -n '/::: notes/, /:::/p' FILENAME.md
produces the output:
::: notes
- one is yadda yadda
- two is yadda yadda yadda
- three is wrong
:::
::: notes
Aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse
cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non
proident, sunt in culpa qui officia deserunt mollit anim id est laborum.
:::
But what I really need is the output in the right order (same order it appears in the file) like:
# Blah
::: notes
- one is yadda yadda
- two is yadda yadda yadda
- three is wrong
:::
## derp derp
# woo hoo!
::: notes
Aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse
cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non
proident, sunt in culpa qui officia deserunt mollit anim id est laborum.
:::
Any sed gurus handy?
Thanks in advance!!

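Multiple commands can be given to a single sed invocation, each with its own -e:
sed -e 'command' -e 'command' filename
sed applies every command to each input line in turn, so the output keeps the order of the file. Your solution would look like this (with ^ added so that only lines that begin with # match):
sed -n -e '/::: notes/, /:::/p' -e '/^#/p' FILENAME.md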
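A quick way to check the combined command is to run it on a cut-down copy of the sample. This is only a sketch: sample.md is a placeholder file name, and the ^ anchors are an assumption based on the requirement that headings begin with # and that the note markers sit at the start of a line.

```bash
# Build a small sample file (sample.md is a placeholder name).
cat > sample.md <<'EOF'
# Blah
Lorem ipsum
<!--
::: notes
- one is yadda yadda
:::
-->
## derp derp
More text with a # in the middle
EOF

# One pass over the file, so matches are printed in file order.
# ^# keeps only lines that begin with '#'.
result=$(sed -n -e '/^::: notes/,/^:::/p' -e '/^#/p' sample.md)
printf '%s\n' "$result"

rm -f sample.md
```

The line with a # in the middle is skipped because of the anchor; without it, that line would be printed as well.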

Related

Read a variable stored into another file in bash

I want to retrieve the contents of a variable stored in another file.
my file content: file.txt
text="Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt
ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco
laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in
voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat
cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum"
my script : script.sh
#!/bin/bash
my_var=$(grep "^text=" file.txt | awk -F"=" '{print $2}' )
echo "$my_var"
Now when I run my script, it retrieves only the first line of the variable text, but I want the whole content of the variable.
Assign the entire contents of the file to a variable, then use a parameter expansion operator to remove the text= prefix.
my_var=$(< file.txt)
echo "${my_var#*=}"
${my_var#*=} expands to the value of $my_var with a prefix that matches the wildcard *= removed.
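Note that the expansion above leaves the surrounding double quotes in place. A minimal sketch that also strips them, assuming the file holds exactly one text="..." assignment and nothing else (the shortened file content and the variable name value are made up for illustration):

```bash
# Recreate a shortened file.txt in the same shape as the question's.
cat > file.txt <<'EOF'
text="Lorem ipsum dolor sit amet,
consectetur adipiscing elit"
EOF

my_var=$(cat file.txt)   # whole file; in bash, $(< file.txt) does the same
value=${my_var#*=}       # drop the shortest prefix ending in '='
value=${value#\"}        # drop a leading double quote, if present
value=${value%\"}        # drop a trailing double quote, if present
printf '%s\n' "$value"

rm -f file.txt
```

Command substitution strips trailing newlines, which is why the closing quote ends up as the last character and can be removed with the % operator.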

Is it possible to fold paragraphs in YAML, while preserving the newlines between paragraphs?

I have 3 paragraphs of text:
my_value:
  Lorem ipsum dolor sit amet, consectetur adipiscing elit. Nunc
  ullamcorper dolor nibh, ut vestibulum purus vestibulum eu. Nulla in
  elit sed ante maximus efficitur eu eget orci. Donec fermentum diam at
  ornare auctor. Aenean porttitor, est ac dignissim sagittis, ligula
  justo luctus risus, ac auctor turpis magna id ex.

  Nunc imperdiet dictum mi efficitur malesuada. Sed in ipsum imperdiet,
  aliquam massa vitae, vehicula nisl. Morbi eu odio imperdiet, auctor
  nulla in, convallis turpis. Pellentesque habitant morbi tristique
  senectus et netus et malesuada fames ac turpis egestas. Proin id
  egestas massa. Maecenas finibus erat ac cursus auctor.

  Nullam elementum accumsan massa id finibus. Praesent ipsum lectus,
  venenatis nec congue vel, dignissim rutrum eros. Suspendisse potenti.
  Vivamus et sodales ipsum. Mauris eu erat luctus nibh posuere sodales
  in ut diam.
What I want as a result is a string that has 3 newlines: the 2 between the paragraphs, and the final newline. The newlines within each paragraph should be folded into spaces.
If I use my_value: >, every newline is folded, so the newlines between the 3 paragraphs are not preserved. If I use my_value: |, the newlines within each paragraph are preserved, which isn't what I want either.
Is there any way to represent this to get the right output, short of just having super long lines in my yaml file?
"If I use my_value: >, every newline is folded"
That's not true. Maybe you are using a broken YAML library?
> is the indicator for the "folded block scalar".
Empty lines in a folded block scalar are not folded.
So a plain scalar like you showed and a folded block scalar can be the same:
plain:
  a
  b

  c
folded block: >-
  a
  b

  c
double quoted: "a b\nc"
So you can use a plain scalar or a folded block scalar in your case.
(More about Block Scalars in my article about quoting)

what is Naur Text-Processing

Can someone please explain to me in layman's terms what the Naur Text-Processing rules are? I'm having trouble understanding what the rules mean, such as line-by-line form and line breaks.
Imagine that you have a text, say
Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do
eiusmod tempor incididunt ut labore et dolore magna aliqua.\nUt enim
ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut
aliquip ex ea commodo consequat. Duis aute irure dolor in
reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla
pariatur. Excepteur sint occaecat cupidatat non proident, sunt in
culpa qui officia deserunt mollit anim id est laborum.
The text contains three kinds of characters:
Spaces (' ')
New Line characters (\n)
Letters (all other characters: letters, digits, punctuation...)
You have to split the given text into lines in the most efficient way (you want to obtain as few lines as possible), but the split must meet these restrictions:
New Line character \n must start a new line
You can split text and start a new line on space only
Each line can contain at most MaxPos (given constant) characters.
In the sample above for MaxPos = 30 we can split as
Lorem ipsum dolor sit amet,
consectetur adipiscing elit,
sed do eiusmod tempor
incididunt ut labore et
dolore magna aliqua.\n <- \n New Line must break; we can't add "Ut" in the line
Ut enim ad minim veniam,
...
The following splits break the rules and are therefore invalid:
Lorem ipsum dolor sit amet, consectetur <- The line is too long, exceeds MaxPos = 30
...
Lorem ipsum dolor sit amet,
consectetur adipiscing elit,
sed do eiusmod tempor incidi <- wrong split: we can split on spaces only
dunt
...
Lorem ipsum dolor sit amet,
consectetur adipiscing elit,
sed do eiusmod tempor
incididunt ut labore et
dolore magna aliqua.\nUt enim <- \n (New Line) must start a new line
ad minim veniam, quis nostrud
...
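The rules as described above can be sketched with a greedy filler: keep adding words to the current line while they fit, and start a new line otherwise. Greedy filling yields the fewest lines for this kind of problem. This is only an illustration of the rules as this answer states them, not a reference solution: the function name wrap is made up, runs of spaces are collapsed to one, and a word longer than MaxPos is printed on a line of its own.

```bash
# wrap MAXPOS: read text on stdin, print it split into lines of at most
# MAXPOS characters, breaking only at spaces.
wrap() {
  awk -v max="$1" '
    {
      # awk reads one input line per record, so every newline in the
      # input forces a line break in the output.
      line = ""
      for (i = 1; i <= NF; i++) {
        if (line == "")
          line = $i                        # first word on the line
        else if (length(line) + 1 + length($i) <= max)
          line = line " " $i               # the word still fits
        else {
          print line                       # line is full, start a new one
          line = $i
        }
      }
      print line
    }'
}

# Usage with part of the sample text and MaxPos = 30:
printf 'Lorem ipsum dolor sit amet, consectetur adipiscing elit,\nsed do\n' | wrap 30
```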

pipe output to stdout and then to command then to variable

I'm working on a TeamCity server, one of my build commands is:
xcodebuild -scheme "<myscheme>" archive
I need to retrieve the .dSYM file
code=$(cat <<-'CODE'
$lines = file("php://stdin");
foreach($lines as $line){
if(preg_match("#Touch (.*dSYM)#",$line,$m))echo "$m[1]\n";
}
CODE
)
dsym=$(xcodebuild -scheme "<myscheme>" archive | php -r "$code")
This will work. However, my issue is, I would like the logs of xcodebuild to be piped to stdout AND php -r "$code"
xcodebuild -scheme "<myscheme>" archive | tee >(php -r "$code" --)
This also works, the build log shows, and if I change php -r "$code" -- to php -r "$code" -- | cat, it logs the .dSYM file location.
But, the following doesn't work:
xcodebuild -scheme "<myscheme>" archive | tee >(dsym=$(php -r "$code" --))
#this one is the closest but is the wrong way around,
#dsym = all the output, the filename is sent to stdout
exec 5>&1
dsym=$(xcodebuild -scheme "<myscheme>" archive | tee >(php -r "$code" >&5))
And I am unable to get my head around how read -u X dsym works or is meant to be working. Does anyone know how I would go about:
Piping all output to stdout
Piping all output to an intermediate program/script (grep)
Storing the above intermediate program/script output into a variable
To test: save the following as scheme.out and replace xcodebuild... with cat scheme.out
Lorem ipsum dolor sit amet, consectetur adipiscing elit. Vivamus nibh
nulla, tempor nec dolor ac, eleifend imperdiet diam. Mauris tristique
congue condimentum. Nullam commodo erat fringilla vestibulum tempus.
Aenean mattis varius erat in venenatis. Donec eu tellus urna. Morbi
lacinia vulputate purus, eu egestas tortor varius eget. Curabitur
vitae commodo elit, vitae ullamcorper leo.
Touch some_test_dsym_file.dSYM
Nunc malesuada, nisi at ultricies lobortis, odio diam rhoncus urna,
sed scelerisque enim ipsum eget quam. Nunc ut iaculis sem. Pellentesque
massa odio, sodales nec lacinia nec, rutrum eu neque. Aenean quis neque
magna. Nam quis dictum quam. Proin ut libero tortor. Class aptent taciti
sociosqu ad litora torquent per conubia nostra, per inceptos himenaeos.
Vivamus vehicula fringilla consequat. Curabitur tincidunt est sed magna
congue tristique. Maecenas aliquam nibh eget pellentesque pellentesque.
Quisque gravida cursus neque sed interdum. Proin ornare dapibus
dignissim.
Desired output
Lorem ipsum dolor sit amet, consectetur adipiscing elit. Vivamus nibh
nulla, tempor nec dolor ac, eleifend imperdiet diam. Mauris tristique
congue condimentum. Nullam commodo erat fringilla vestibulum tempus.
Aenean mattis varius erat in venenatis. Donec eu tellus urna. Morbi
lacinia vulputate purus, eu egestas tortor varius eget. Curabitur
vitae commodo elit, vitae ullamcorper leo.
Touch some_test_dsym_file.dSYM
Nunc malesuada, nisi at ultricies lobortis, odio diam rhoncus urna,
sed scelerisque enim ipsum eget quam. Nunc ut iaculis sem. Pellentesque
massa odio, sodales nec lacinia nec, rutrum eu neque. Aenean quis neque
magna. Nam quis dictum quam. Proin ut libero tortor. Class aptent taciti
sociosqu ad litora torquent per conubia nostra, per inceptos himenaeos.
Vivamus vehicula fringilla consequat. Curabitur tincidunt est sed magna
congue tristique. Maecenas aliquam nibh eget pellentesque pellentesque.
Quisque gravida cursus neque sed interdum. Proin ornare dapibus
dignissim.
Desired output of echo $dsym
some_test_dsym_file.dSYM
Your code has a lot of dependencies. I will illustrate what I think that you need without using anything beyond standard unix tools.
This runs a command, seq 4, and sends all of its output to stdout and also sends all of its output to another command, sed 's/3/3-processed/', the output of which is captured in a variable, var:
$ exec 3>&1
$ var=$(seq 4 | tee >(cat >&3) | sed 's/3/3-processed/')
1
2
3
4
To illustrate that we successfully captured the output of the sed command:
$ echo "$var"
1
2
3-processed
4
Explanation: var=$(...) captures the output of file handle 1 (stdout) and assigns it to var. To make the output also appear on stdout, we need to duplicate stdout to another file handle before $(...) redirects it, so we use exec to duplicate stdout as file handle 3. In this way, tee >(cat >&3) sends the output of the command both to the original stdout (now called 3) and to file handle 1, which is passed on to the next stage in the pipeline.
So, using your toolchain, try:
exec 5>&1
dsym=$(xcodebuild -scheme "<myscheme>" archive | tee >(cat >&5) | php -r "$code")
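To try the same pattern without xcodebuild or PHP, here is a self-contained sketch. The build_log function and its three lines of output are stand-ins invented for the test, and sed replaces the PHP filter on the assumption that the wanted line has the form "Touch <name>.dSYM". Process substitution requires bash.

```bash
#!/usr/bin/env bash
# Stand-in for the real build command (hypothetical output).
build_log() {
  printf '%s\n' 'Compiling...' 'Touch some_test_dsym_file.dSYM' 'Done.'
}

exec 5>&1   # fd 5 is now a copy of the original stdout

# tee copies the log to fd 5 (shown on screen) and passes it down the
# pipe to sed, whose output is what $(...) captures.
dsym=$(build_log | tee >(cat >&5) | sed -n 's/^Touch \(.*dSYM\)$/\1/p')

exec 5>&-   # close the spare descriptor

echo "captured: $dsym"
```

The full log appears on the terminal, while dsym holds only the file name extracted by the filter.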

List of substitutions in external file

I need to run a string against an external file that contains a list of substitutions, applying each one at every occurrence.
The substitution file will look like this (I'm open to suggestions on the structure, it can be a csv, a yaml, etc...)
"ipsum" "foobar"
"elit" ""
"sit amet" "2312"
My ruby code should be implemented like this:
mystring = "Lorem ipsum dolor sit amet, consectetur adipiscing elit. Aliquam quis elit augue. Nulla tempus magna nec ligula dapibus malesuada. Fusce at orci augue, sit amet suscipit sem. Suspendisse potenti."
newstring = mystring.somemagichappenshere
And the newstring value should be "Lorem foobar dolor 2312, consectetur adipiscing . Aliquam quis augue. Nulla tempus magna nec ligula dapibus malesuada. Fusce at orci augue, 2312 suscipit sem. Suspendisse potenti."
How should I implement that?
Using a csv:
require 'csv'
str = "Lorem ipsum dolor sit amet, consectetur adipiscing elit. Aliquam quis elit augue. Nulla tempus magna nec ligula dapibus malesuada. Fusce at orci augue, sit amet suscipit sem. Suspendisse potenti."
replacements = "ipsum,foobar
elit,
sit amet,2312"
#construct a hash from the csv:
transform_table = Hash[CSV.parse(replacements)]
#Take the keys from the hash and use them for a regular expression:
re = Regexp.union(transform_table.keys)
#Do all substitutions in one go:
p str.gsub(re, transform_table)
It's quite simple
Read the file
Iterate each line in the file and for each entry use mystring.gsub!(find, replace) to replace the value with the substitution
