In Google cloudbuild.yaml, what is the - | argument?

In the Google Cloud Build tutorial, the example cloudbuild.yaml uses - | as one of the arguments:
args:
  - '-c'
  - |
    if [ -d "environments/$BRANCH_NAME/" ]; then
    ...
What is the purpose of - |?

That indicator is called a "literal block scalar", and it is used to span values across multiple lines. Spanning with | will include the newlines and any trailing spaces. You can also span with >, but that folds newlines into spaces.
Example:
include_newlines: |
  exactly as you see
  will appear these three
  lines of poetry
fold_newlines: >
  this is really a
  single line of text
  despite appearances
If you want to know more about the YAML syntax, you can visit this reference for information that may not be included in the Cloud Build documentation.
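To see what this means for the Cloud Build example above, here is a minimal sketch (the directory name dev is a stand-in, not from the tutorial): the literal block scalar becomes a single newline-preserving string passed as one argument to the shell after '-c'.
# Roughly equivalent shell invocation: one multi-line string
# handed to the shell as a single -c argument.
bash -c '
if [ -d "environments/dev/" ]; then
  echo "directory exists"
fi
'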


Regex: match only string C that is in between string A and string B

How can I write a regex in a shell script that targets only a specific substring between two given values? For example, given
https://www.stackoverflow.com
how can I match only the ":" between "https" and "//"?
If possible, please also explain the approach.
The context is that I need to prepare a script that fetches a config from the server and appends it to the .env file. The response comes as JSON:
{
  "GRAPHQL_URL": "https://someurl/xyz",
  "PUBLIC_TOKEN": "skml2JdJyOcrVdfEJ3Bj1bs472wY8aSyprO2DsZbHIiBRqEIPBNg9S7yXBbYkndX2Lk8UuHoZ9JPdJEWaiqlIyGdwU6O5",
  "SUPER_SECRET": "MY_SUPER_SECRET"
}
so I need to adjust it to the .env syntax. What I have managed to do so far is:
#!/bin/bash
CURL_RESPONSE="$(curl -s url)"
cat <<< ${CURL_RESPONSE} | jq -r '.property.source' | sed -r 's/:/=/g;s/[^a-zA-Z0-9=:_/-]//g' > .env.test
so basically I fetch the data, then extract the key I am after with jq, and then use sed to first replace every ":" with "=" and then strip the quotation marks, commas, and whitespace that come from the JSON, keeping only the characters that are necessary.
I am almost there, but the problem is that now my GraphQL URL (and only that value) looks like
https=//someurl/xyz
so I need to replace the = that sits between https and // back with the colon.
Thank you very much @Nic3500 for the response; not sure why, but I get an error saying
sed: 1: "s/:/=/g;s#https\(.*\)// ...": \1 not defined in the RE
I searched SO and it seems it should work, since the brackets are escaped and I use the -r flag (I tried -E too, but no difference), yet I don't know how to apply it. To be honest, I assume that the replacement block is this part
#\1#
so how can I tell it which character the match should be replaced with?
This is how I tried to use it
#!/bin/bash
CURL_RESPONSE="$(curl -s url)"
cat <<< ${CURL_RESPONSE} | jq -r '.property.source' | sed -r 's/:/=/g;s#https\(.*\)//.*#\1#;s/[^a-zA-Z0-9=:_/-]//g' > .env.test
I hope that with this context you will be able to help me.
echo "https://www.stackoverflow.com" | sed 's#https\(.*\)//.*#\1#'
:
sed operator s/regexp/replacement/
regexp: https\(.*\)//.*. So "https", followed by something (.*), followed by "//", followed by anything else (.*)
the parentheses are backslashed since they are not part of the pattern itself; in basic regular expressions, \( and \) group a part of the regex for use in the replacement part of the s### operator
replacement: \1, meaning the first group found in the regexp, \(.*\)
I used s###, but the usual form is s///. Any character can take the place of the / with the s operator; I used # since using / would have been confusing, given the / characters in the URL.
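Regarding the follow-up error: the -r (or -E) flag switches sed to extended regular expressions, where unescaped parentheses do the grouping and \( matches a literal parenthesis, which is why \1 ends up undefined. A sketch of the same substitution in extended syntax:
echo "https://www.stackoverflow.com" | sed -E 's#https(.*)//.*#\1#'
: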
The problem is that your sed substitutions are terribly imprecise. Anyway, you want to do this in jq instead, where you have more control over which parts you are substituting, and you avoid spawning a separate process for something jq quite easily does natively in the first place.
curl -s url |
jq -r '.property.source | to_entries[] |
  "\(.key)=\"\(.value)\""' > .env.test
Tangentially, capturing the output of curl into a variable just so you can immediately cat it once to standard output is a waste of memory.
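Assuming the JSON shown above really lives at .property.source, that jq pipeline would produce .env lines like:
GRAPHQL_URL="https://someurl/xyz"
PUBLIC_TOKEN="skml2JdJyOcrVdfEJ3Bj1bs472wY8aSyprO2DsZbHIiBRqEIPBNg9S7yXBbYkndX2Lk8UuHoZ9JPdJEWaiqlIyGdwU6O5"
SUPER_SECRET="MY_SUPER_SECRET"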

Whitespace characters mess up a shell script grep pattern for extracting markdown links (macOS)

I am working on a tool to convert markdown files to text bundles, revising a great piece of code from Zett to use on macOS, since I will be porting my Apple Notes files to Craft.
I am having problems parsing all the links into an array using grep. No matter how hard I try with options like --null and | xargs -0, the result ends up split on whitespace characters:
targets=($(grep '!\[.*\](.*)' "$inFile"))
An example: I have a small markdown test file containing the following:
# Allan Falk - ComicWiki
**Allan Falk - ComicWiki**
![Allan Falk - ComicWiki](images/Allan%20Falk%20-%20ComicWiki.png)
http://comicwiki.dk/wiki/Allan_Falk
Running the above code creates the following array, where the markdown link is split up like so:
![Allan
Falk
-
ComicWiki](images/Allan%20Falk%20-%20ComicWiki.png)
How can I get complete links as individual array entries (they will be processed individually later, using sed, for copying files, etc.)?
You can set IFS= (null value) and use read like this:
IFS= read -ra arr < <(grep '!\[.*\]' file)
# examine array
declare -p arr
declare -a arr='([0]="![Allan Falk - ComicWiki](images/Allan%20Falk%20-%20ComicWiki.png)")'
<(grep '!\[.*\]' file) runs grep using process substitution, and the < before it feeds that command's output to read
Working Demo
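If the file can contain several links, a bash 4+ alternative (a sketch; the -o flag makes grep emit only the matching substrings, one per line) fills one array entry per link:
# One array element per markdown image link found in the file.
mapfile -t targets < <(grep -o '!\[[^]]*\]([^)]*)' "$inFile")
declare -p targets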
After doing some digging, I found out that I was missing quotations in my statement. So instead of writing:
targets=($(grep '!\[.*\](.*)' "$inFile"))
I needed to add quotation marks inside the first set of brackets:
targets=( "$(grep '!\[.*\](.*)' "$inFile")" )
Now the array works fine – no whitespace splitting occurs. (Note that the quotes make the entire grep output a single array element, which is exactly what is needed when the file contains one link.)

Parse a nested variable from a YAML file in bash

A complex .yaml file from this link needs to be fed into a bash script that runs as part of an automation program on an EC2 instance of Amazon Linux 2. Note that the .yaml file in the link contains many objects, and that I need to extract one of the environment variables defined inside one of those objects.
Specifically, how can I extract the 192.168.0.0/16 value of the CALICO_IPV4POOL_CIDR variable into a bash variable?
- name: CALICO_IPV4POOL_CIDR
  value: "192.168.0.0/16"
I have read a lot of other postings and blog entries about parsing flatter, simpler .yaml files, but none of those examples show how to extract a nested value like the value of CALICO_IPV4POOL_CIDR here.
As others have commented, it is recommended to use yq (along with jq) if available.
Then please try the following:
value=$(yq -r 'recurse | select(.name? == "CALICO_IPV4POOL_CIDR") | .value' "calico.yaml")
echo "$value"
Output:
192.168.0.0/16
If you're able to install new dependencies, and are planning on dealing with lots of yaml files, yq is a wrapper around jq that can handle yaml. It'd allow a safe (non-grep) way of accessing nested yaml values.
Usage would look something like MY_VALUE=$(yq '.myValue.nested.value' < config-file.yaml)
Alternatively, How can I parse a YAML file from a Linux shell script? has a bash-only parser that you could use to get your value.
The right way to do this is to use a scripting language and a YAML parsing library to extract the field you're interested in.
Here's an example of how to do it in Python. If you were doing this for real you'd probably split it out into multiple functions and have better error reporting. This is literally just to illustrate some of the difficulties caused by the format of calico.yaml, which is several YAML documents concatenated together, not just one. You also have to loop over some of the lists internal to the document in order to extract the field you're interested in.
#!/usr/bin/env python3
import yaml

def foo():
    # calico.yaml holds several YAML documents; find the DaemonSet one.
    with open('/tmp/calico.yaml', 'r') as fil:
        docs = yaml.safe_load_all(fil)
        doc = None
        for candidate in docs:
            if candidate["kind"] == "DaemonSet":
                doc = candidate
                break
        else:
            raise ValueError("no YAML document of kind DaemonSet")
    # Walk down to the containers list, then scan each container's env list.
    l1 = doc["spec"]
    l2 = l1["template"]
    l3 = l2["spec"]
    l4 = l3["containers"]
    for containers_item in l4:
        l5 = containers_item["env"]
        env = l5
        for entry in env:
            if entry["name"] == "CALICO_IPV4POOL_CIDR":
                return entry["value"]
    raise ValueError("no CALICO_IPV4POOL_CIDR entry")

print(foo())
However, sometimes you need a solution right now and shell scripts are very good at that.
If you're hitting an API endpoint, then the YAML will usually be pretty-printed so you can get away with extracting text in ways that won't work on arbitrary YAML.
Something like the following should be fairly robust:
cat </tmp/calico.yaml | grep -A1 CALICO_IPV4POOL_CIDR | grep value: | cut -d: -f2 | tr -d ' "'
Although it's worth checking at the end with a regex that the extracted value really is valid IPv4 CIDR notation.
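For example, a loose sanity check along those lines might look like this sketch (it is not a strict validator; it would accept octets above 255):
value=$(grep -A1 CALICO_IPV4POOL_CIDR /tmp/calico.yaml | grep value: | cut -d: -f2 | tr -d ' "')
# Shape check: four dot-separated number groups followed by /prefix-length.
if [[ $value =~ ^([0-9]{1,3}\.){3}[0-9]{1,3}/[0-9]{1,2}$ ]]; then
    echo "looks like IPv4 CIDR: $value"
fi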
The key thing here is grep -A1 CALICO_IPV4POOL_CIDR, which prints each matching line plus the one line that follows it.
The two-element dictionary you mentioned (shown below) will always appear as one chunk since it's a subtree of the YAML document.
- name: CALICO_IPV4POOL_CIDR
  value: "192.168.0.0/16"
The keys in calico.yaml are not sorted alphabetically in general, but in {"name": <something>, "value": <something else>} constructions, name does consistently appear before value.
MYVAR=$(\
  curl https://docs.projectcalico.org/v3.3/getting-started/kubernetes/installation/hosted/kubernetes-datastore/calico-networking/1.7/calico.yaml | \
  grep -A 1 CALICO_IPV4POOL_CIDR | \
  grep value | \
  cut -d ':' -f2 | \
  tr -d ' "')
Replace curl https://docs.projectcalico.org/v3.3/getting-started/kubernetes/installation/hosted/kubernetes-datastore/calico-networking/1.7/calico.yaml with however you're sourcing the file. That gets piped to grep -A 1 CALICO_IPV4POOL_CIDR, which gives you two lines of text: the name line and the value line. That gets piped to grep value, which now gives us just the line we want. That gets piped to cut -d ':' -f2, which uses the colon as a delimiter and gives us the second field. Finally, tr -d ' "' strips the spaces and double quotes. $(...) executes the enclosed script, and the result is assigned to MYVAR. After this script, echo $MYVAR should produce 192.168.0.0/16.
You have two problems there:
How to read a YAML document from a file with multiple documents
How to select the key you want from that YAML document
I have guessed that you need the YAML document of kind 'DaemonSet' from reading Gregory Nisbett's answer.
I will try to use only tools that are likely to be installed on your system already, because you mentioned you want to do this in a Bash script. I assume you have jq because it is hard to do much in Bash without it!
For the YAML parsing I tend to use Ruby because:
Most systems have a Ruby
Ruby's Psych library has been bundled since Ruby 1.9
The PyYAML library in Python is a bit inflexible and sometimes broken compared to Ruby's in my experience
The YAML library in Perl is often not installed by default
It was suggested to use yq, but that won't help so much in this case because you still need a tool that can extract the YAML document.
Having extracted the document I am going to again use Ruby to save the file as JSON. Then we can use jq.
Extracting the YAML document
To get the YAML document using Ruby and save it as JSON:
url=...
curl -s $url | \
  ruby -ryaml -rjson -e \
    "puts YAML.load_stream(ARGF.read)
       .select{|doc| doc['kind']=='DaemonSet'}[0].to_json" \
  | jq . > calico.json
Further explanation:
The YAML.load_stream reads the YAML documents and returns them all as an Array
ARGF.read reads from a file passed via STDIN
Ruby's select allows easy selection of the YAML document according to its kind key
Then we take the first element ([0]) and convert it to JSON.
I pass that response through jq . so that it's formatted for human readability but that step isn't really necessary. I could do the same in Ruby but I'm guessing you want Ruby code kept to a minimum.
Selecting the key you want
To select the key you want the following JQ query can be used:
jq -r \
  '.spec.template.spec.containers[].env[] | select(.name=="CALICO_IPV4POOL_CIDR") | .value' \
  calico.json
Further explanation:
The first part, .spec.template.spec.containers[].env[], iterates over all containers and all env entries inside them
Then we select the Hash whose name key equals CALICO_IPV4POOL_CIDR and return its value
The -r removes the quotes around the string
Putting it all together:
#!/usr/bin/env bash
url='https://docs.projectcalico.org/v3.3/getting-started/kubernetes/installation/hosted/kubernetes-datastore/calico-networking/1.7/calico.yaml'
curl -s $url | \
  ruby -ryaml -rjson -e \
    "puts YAML.load_stream(ARGF.read)
       .select{|doc| doc['kind']=='DaemonSet'}[0].to_json" \
  | jq . > calico.json
jq -r \
  '.spec.template.spec.containers[].env[] | select(.name=="CALICO_IPV4POOL_CIDR") | .value' \
  calico.json
Testing:
▶ bash test.sh
192.168.0.0/16

Dynamic delimiter in Unix

Input:
echo "1234ABC89,234" # A
echo "0520001DEF78,66" # B
echo "46545455KRJ21,00"
From the above strings, I need to split each one into the alphabetic field and the number that follows it.
From "1234ABC89,234", the output should be:
ABC
89,234
From "0520001DEF78,66", the output should be:
DEF
78,66
I have many strings that I need to split like this.
Here is my script so far:
echo "1234ABC89,234" | cut -d',' -f1
but it gives me 1234ABC89 which isn't what I want.
Assuming that you want to discard leading digits only, and that the letters will be all upper case, the following should work:
echo "1234ABC89,234" | sed 's/^[0-9]*\([A-Z]*\)\([0-9].*\)/\1\n\2/'
This works fine with GNU sed (I have 4.2.2), but other sed implementations might not like the \n, in which case you'll need to substitute something else.
Depending on the version of sed you can try:
echo "0520001DEF78,66" | sed -E -e 's/[0-9]*([A-Z]*)([,0-9]*)/\1\n\2/'
or:
echo "0520001DEF78,66" | sed -E -e 's/[0-9]*([A-Z]*)([,0-9]*)/\1$\2/' | tr '$' '\n'
DEF
78,66
Explanation: the regular expression replaces the input with the expected output, except that instead of the newline it puts a "$" sign, which we then replace with a newline using the tr command.
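If the sed at hand supports neither \n in the replacement nor the tr workaround appeals, POSIX sed also accepts a backslash-escaped literal newline in the replacement, even in basic-regex mode (a sketch):
echo "0520001DEF78,66" | sed 's/[0-9]*\([A-Z]*\)\([,0-9]*\)/\1\
\2/'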
Where do the strings come from? Are they read from a file (or another source external to the script), or are they stored in the script? If they're in the script, you should simply reformat the data so it is easier to manage. It is therefore sensible to assume they come from an external data source, such as a file or data piped to the script.
You could simply feed the data through sed:
sed 's/^[0-9]*\([A-Z]*\)/\1 /' |
while read alpha number
do
    …process the two fields…
done
The only trick to watch there is that if you set variables in the loop, they won't necessarily be visible to the script after the done. There are ways around that problem, some of which depend on which shell you use; this much is the same in any derivative of the Bourne shell.
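In bash specifically, one way around that is to feed the loop with process substitution instead of a pipe, so the while loop runs in the current shell. A sketch, assuming the strings live in a file called input.txt:
while read -r alpha number
do
    echo "letters=$alpha digits=$number"
done < <(sed 's/^[0-9]*\([A-Z]*\)/\1 /' input.txt)
After the done, alpha and number (and anything else set in the loop) remain visible.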
You said you have many strings like this, so if possible I recommend saving them to a file, such as input.txt:
1234ABC89,234
0520001DEF78,66
46545455KRJ21,00
On your command line, try this sed command reading input.txt as file argument:
$ sed -E 's/([0-9]+)([[:alpha:]]{3})(.+)/\2\t\3/g' input.txt
ABC 89,234
DEF 78,66
KRJ 21,00
How it works
uses -E for extended regular expressions to save on typing; otherwise, for example, the grouping parentheses would have to be escaped as \( and \)
uses grouping with ( and ), searching for three groups:
first, digits, where + specifies one or more of them ([0-9] and the POSIX class [[:digit:]] are equivalent here)
next, the POSIX class [[:alpha:]] for alphabetical characters, lowercase or uppercase, where {3} specifies exactly three of them
the last group searches for ., meaning any character, one or more times (+)
\2\t\3 then returns group 2 and group 3, with a tab separator between them
Thus you are able to extract two separate fields per line, separated by a tab, for easier manipulation later.
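For instance, the tab-separated output pipes straight into a read loop (a sketch; note that \t in the replacement is a GNU sed feature):
sed -E 's/([0-9]+)([[:alpha:]]{3})(.+)/\2\t\3/g' input.txt |
while IFS=$'\t' read -r letters digits
do
    echo "letters=$letters digits=$digits"
done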

Is it possible to clean up an HTML file with grep to extract certain strings?

There is a website that I am a part of, and I want to get information out of the site on a daily basis. The page looks like this:
User1 added User2.
User40 added user3.
User13 added user71
User47 added user461
and so on.
There's no JSON endpoint to get the information and parse it, so I have to wget the page and clean up the HTML to get lines like:
User1 added user2
Is it possible to clean this up even though the username always changes?
I would divide that problem into two:
How to clean up your HTML
Yes, it is possible to use grep directly, but I would recommend using a standard tool to convert HTML to text before grepping. I can think of two (html2text is a conversion utility, and w3m is actually a text browser), but there are more:
wget -O - http://www.stackoverflow.com/ | html2text | grep "How.*\?"
w3m http://www.stackoverflow.com/ | grep "How.*\?"
These examples will get the homepage of StackOverflow and display all questions found on that page starting with How and ending with ? (it displays about 20 such lines for me, but YMMV depending on your settings).
How to extract only the desired strings
Concerning your usernames, you can just tune your expression to match different users (-E is necessary for the extended regular expression syntax; -o makes grep print only the matching part(s) of each line):
[...] | grep -o -E ".ser[0-9]+ added .ser[0-9]+"
This however assumes that users always have a name matching .ser[0-9]+. You may want to use a more general pattern like this one:
[...] | grep -o -E "[[:graph:]]+[[:space:]]+added[[:space:]]+[[:graph:]]+"
This pattern will match added surrounded by any two other words, delimited by an arbitrary number of whitespace characters. Or simpler (assuming that a word may contain everything but blank, and the words are delimited by exactly one blank):
[...] | grep -o -E "[^ ]+ added [^ ]+"
Do you intend to just strip away the HTML tags?
Then try this:
sed 's/<[^>]*>//g' infile >outfile
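Putting the two steps together, the daily job could look something like this sketch (the URL and the output file name are placeholders, not taken from the question):
# Fetch the page, strip the tags, keep only the "X added Y" lines.
wget -qO - "http://example.com/activity" |
  sed 's/<[^>]*>//g' |
  grep -o -E "[^ ]+ added [^ ]+" > added_today.txt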
