How can I get the content of values under the key images in a YAML file from the shell

YAML file like this:
http:
  Domain: {{ environment.domains.httpport }}
  images:
    emas_fe_weex: 20170810-ed0b13f
    eweex_basic_manager: 20150109-e0fafa3
  replicaCount:
    xxxx: 1
  resources:
    {}
How can I get the following with shell?
emas_fe_weex: 20170810-ed0b13f
eweex_basic_manager: 20150109-e0fafa3

It is best to process YAML with a YAML parser e.g. with Python and ruamel.yaml (disclaimer: I am the author of that package). With the input in input.yaml:
< input.yaml python -c "import sys, ruamel.yaml; yaml=ruamel.yaml.YAML(); yaml.dump(yaml.load(sys.stdin)['http']['images'], sys.stdout)"
will output:
emas_fe_weex: 20170810-ed0b13f
eweex_basic_manager: 20150109-e0fafa3

I agree with Anthon: YAML is sufficiently complicated to require the use of a YAML parser (like XML, JSON, CSV, etc.)
Here are a few examples with other scripting languages, depending on your taste:
Ruby
ruby -ryaml -e '
data = YAML.load($stdin)
puts YAML.dump(data["http"]["images"])
' < file.yaml
Perl (requires YAML::XS from CPAN)
perl -MYAML::XS -0777 -nE '
$data = Load($_);
say Dump($data->{http}{images})
' < file.yaml
Tcl (requires tcllib)
echo '
package require yaml
set data [yaml::yaml2dict -file "file.yaml"]
puts [yaml::dict2yaml [dict get $data http images]]
' | tclsh

If you are using a system where grep is available, you can get them both with it. Assuming the data is in a file called http.yaml:
grep -e emas_fe_weex -e eweex_basic_manager http.yaml
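If neither a parser nor grep's fixed strings are flexible enough, an awk sketch tied to the indentation can print every child of the images: key. This is fragile by design: it assumes the file uses two-space indent steps (as in the question once its lost indentation is restored), which YAML does not guarantee.

```shell
# print the lines nested under "images:" (assumes 2-space indent steps)
awk '
  /^  images:/ { grab = 1; next }                   # start after the images: key
  grab && /^    / { sub(/^    /, ""); print; next } # emit de-indented children
  grab { exit }                                     # first non-child line ends the block
' http.yaml
```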


Bash: decode string with url escaped hex codes [duplicate]

I'm looking for a way to turn this:
hello &lt; world
to this:
hello < world
I could use sed, but how can this be accomplished without using cryptic regex?
Try recode (archived page; GitHub mirror; Debian page):
$ echo '&lt;' | recode html..ascii
<
Install on Linux and similar Unix-y systems:
$ sudo apt-get install recode
Install on Mac OS using:
$ brew install recode
With perl:
cat foo.html | perl -MHTML::Entities -pe 'decode_entities($_);'
With php from the command line:
cat foo.html | php -r 'while(($line=fgets(STDIN)) !== FALSE) echo html_entity_decode($line, ENT_QUOTES|ENT_HTML401);'
An alternative is to pipe through a web browser -- such as:
echo '&#33;' | w3m -dump -T text/html
This worked great for me in cygwin, where downloading and installing distributions are difficult.
This answer was found here
Using xmlstarlet:
echo 'hello &lt; world' | xmlstarlet unesc
A python 3.2+ version:
cat foo.html | python3 -c 'import html, sys; [print(html.unescape(l), end="") for l in sys.stdin]'
This answer is based on Short way to escape HTML in Bash?, which works fine for grabbing answers (using wget) from Stack Exchange and converting HTML to regular ASCII characters:
sed 's/&nbsp;/ /g; s/&amp;/\&/g; s/&lt;/\</g; s/&gt;/\>/g; s/&quot;/\"/g; s/&#39;/\'"'"'/g; s/&ldquo;/\"/g; s/&rdquo;/\"/g;'
Edit 1: April 7, 2017 - Added left double quote and right double quote conversion. This is part of bash script that web-scrapes SE answers and compares them to local code files here: Ask Ubuntu -
Code Version Control between local files and Ask Ubuntu answers
Edit June 26, 2017
Using sed was taking ~3 seconds to convert HTML to ASCII on a 1K line file from Ask Ubuntu / Stack Exchange. As such I was forced to use Bash built-in search and replace for ~1 second response time.
Here's the function:
LineOut="" # Make global
HTMLtoText () {
LineOut=$1 # Parm 1= Input line
# Replace external command: Line=$(sed 's/&amp;/\&/g; s/&lt;/\</g;
# s/&gt;/\>/g; s/&quot;/\"/g; s/&#39;/\'"'"'/g; s/&ldquo;/\"/g;
# s/&rdquo;/\"/g;' <<< "$Line") -- With faster builtin commands.
LineOut="${LineOut//&nbsp;/ }"
LineOut="${LineOut//&amp;/&}"
LineOut="${LineOut//&lt;/<}"
LineOut="${LineOut//&gt;/>}"
LineOut="${LineOut//&quot;/'"'}"
LineOut="${LineOut//&#39;/"'"}"
LineOut="${LineOut//&ldquo;/'"'}" # TODO: ASCII/ISO for opening quote
LineOut="${LineOut//&rdquo;/'"'}" # TODO: ASCII/ISO for closing quote
} # HTMLtoText ()
On macOS, you can use the built-in command textutil (which is a handy utility in general):
echo '👋 hello &lt; world 🌐' | textutil -convert txt -format html -stdin -stdout
outputs:
👋 hello < world 🌐
To support the unescaping of all HTML entities only with sed substitutions would require too long a list of commands to be practical, because every Unicode code point has at least two corresponding HTML entities.
But it can be done using only sed, grep, the Bourne shell and basic UNIX utilities (the GNU coreutils or equivalent):
#!/bin/sh
htmlEscDec2Hex() {
file=$1
[ ! -r "$file" ] && file=$(mktemp) && cat >"$file"
printf -- \
"$(sed 's/\\/\\\\/g;s/%/%%/g;s/&#[0-9]\{1,10\};/\&#x%x;/g' "$file")\n" \
$(grep -o '&#[0-9]\{1,10\};' "$file" | tr -d '&#;')
[ x"$1" != x"$file" ] && rm -f -- "$file"
}
htmlHexUnescape() {
printf -- "$(
sed 's/\\/\\\\/g;s/%/%%/g
;s/&#x\([0-9a-fA-F]\{1,8\}\);/\&#x0000000\1;/g
;s/&#x0*\([0-9a-fA-F]\{4\}\);/\\u\1/g
;s/&#x0*\([0-9a-fA-F]\{8\}\);/\\U\1/g' )\n"
}
htmlEscDec2Hex "$1" | htmlHexUnescape \
| sed -f named_entities.sed
Note, however, that a printf implementation supporting \uHHHH and \UHHHHHHHH sequences is required, such as the GNU utility’s. To test, check for example that printf "\u00A7\n" prints §. To call the utility instead of the shell built-in, replace the occurrences of printf with env printf.
This script uses an additional file, named_entities.sed, in order to support the named entities. It can be generated from the specification using the following HTML page:
<!DOCTYPE html>
<head><meta charset="utf-8" /></head>
<body>
<p id="sed-script"></p>
<script type="text/javascript">
const referenceURL = 'https://html.spec.whatwg.org/entities.json';
function writeln(element, text) {
element.appendChild( document.createTextNode(text) );
element.appendChild( document.createElement("br") );
}
(async function(container) {
const json = await (await fetch(referenceURL)).json();
container.innerHTML = "";
writeln(container, "#!/usr/bin/sed -f");
const addLast = [];
for (const name in json) {
const characters = json[name].characters
.replace("\\", "\\\\")
.replace("/", "\\/");
const command = "s/" + name + "/" + characters + "/g";
if ( name.endsWith(";") ) {
writeln(container, command);
} else {
addLast.push(command);
}
}
for (const command of addLast) { writeln(container, command); }
})( document.getElementById("sed-script") );
</script>
</body></html>
Simply open it in a modern browser, and save the resulting page as text as named_entities.sed. This sed script can also be used alone if only named entities are required; in this case it is convenient to give it executable permission so that it can be called directly.
Now the above shell script can be used as ./html_unescape.sh foo.html, or inside a pipeline reading from standard input.
For example, if for some reason it is needed to process the data by chunks (it might be the case if printf is not a shell built-in and the data to process is large), one could use it as:
nLines=20
seq 1 $nLines $(grep -c $ "$inputFile") | while read n
do sed -n "$n,$((n+nLines-1))p" "$inputFile" | ./html_unescape.sh
done
Explanation of the script follows.
There are three types of escape sequences that need to be supported:
&#D; where D is the decimal value of the escaped character’s Unicode code point;
&#xH; where H is the hexadecimal value of the escaped character’s Unicode code point;
&N; where N is the name of one of the named entities for the escaped character.
The &N; escapes are supported by the generated named_entities.sed script which simply performs the list of substitutions.
The central piece of this method for supporting the code point escapes is the printf utility, which is able to:
print numbers in hexadecimal format, and
print characters from their code point’s hexadecimal value (using the escapes \uHHHH or \UHHHHHHHH).
The first feature, with some help from sed and grep, is used to reduce the &#D; escapes into &#xH; escapes. The shell function htmlEscDec2Hex does that.
The function htmlHexUnescape uses sed to transform the &#xH; escapes into printf’s \u/\U escapes, then uses the second feature to print the unescaped characters.
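The decimal-to-hexadecimal reduction that htmlEscDec2Hex performs can be seen in isolation: printf's %x prints the decimal body of an &#D; escape in the hexadecimal form the second stage consumes (167 is the decimal code point of §):

```shell
# reduce a decimal escape to the equivalent hexadecimal escape
printf '&#x%x;\n' 167   # prints &#xa7;
```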
I like the Perl answer given in https://stackoverflow.com/a/13161719/1506477.
cat foo.html | perl -MHTML::Entities -pe 'decode_entities($_);'
But it produced an unequal number of lines on plain text files (and I don't know Perl well enough to debug it).
I like the python answer given in https://stackoverflow.com/a/42672936/1506477 --
python3 -c 'import html, sys; [print(html.unescape(l), end="") for l in sys.stdin]'
but it creates a list [ ... for l in sys.stdin ] in memory, which is prohibitive for large files.
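For the record, the same stdlib call can be made streaming, reading one line at a time instead of building a list:

```shell
# stream stdin through html.unescape without accumulating a list
python3 -c '
import html, sys
for line in sys.stdin:
    sys.stdout.write(html.unescape(line))
' < foo.html
```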
Here is another easy pythonic way without buffering in memory: using awkg.
$ echo 'hello &lt; &#58; &quot; world' | \
awkg -b 'from html import unescape' 'print(unescape(R0))'
hello < : " world
awkg is a python based awk-like line processor. You may install it using pip https://pypi.org/project/awkg/:
pip install awkg
-b is awk's BEGIN{} block that runs once in the beginning.
Here we just did from html import unescape.
Each line record is in R0 variable, for which we did
print(unescape(R0))
Disclaimer:
I am the maintainer of awkg
I have created a sed script based on the list of entities so it must handle most of the entities.
sed -f htmlentities.sed < file.html
My original answer got some comments that recode does not work for UTF-8 encoded HTML files. This is correct. recode supports only HTML 4. The encoding HTML is an alias for HTML_4.0:
$ recode -l | grep -iw html
HTML-i18n 2070 RFC2070
HTML_4.0 h h4 HTML
The default encoding for HTML 4 is Latin-1. This has changed in HTML 5: the default encoding for HTML 5 is UTF-8. This is the reason why recode does not work for HTML 5 files.
HTML 5 defines the list of entities here:
https://html.spec.whatwg.org/multipage/named-characters.html
The definition includes a machine readable specification in JSON format:
https://html.spec.whatwg.org/entities.json
The JSON file can be used to perform a simple text replacement. The following example is a self modifying Perl script, which caches the JSON specification in its DATA chunk.
Note: For some obscure compatibility reasons, the specification allows entities without a terminating semicolon. Because of that, the entities are sorted by length in reverse order, to make sure that the correct entities are replaced first and do not get destroyed by entities without the ending semicolon.
#! /usr/bin/perl
use utf8;
use strict;
use warnings;
use open qw(:std :utf8);
use LWP::Simple;
use JSON::Parse qw(parse_json);
my $entities;
INIT {
if (eof DATA) {
my $data = tell DATA;
open DATA, '+<', $0;
seek DATA, $data, 0;
my $entities_json = get 'https://html.spec.whatwg.org/entities.json';
print DATA $entities_json;
truncate DATA, tell DATA;
close DATA;
$entities = parse_json ($entities_json);
} else {
local $/ = undef;
$entities = parse_json (<DATA>);
}
}
local $/ = undef;
my $html = <>;
for my $entity (sort { length $b <=> length $a } keys %$entities) {
my $characters = $entities->{$entity}->{characters};
$html =~ s/$entity/$characters/g;
}
print $html;
__DATA__
Example usage:
$ echo '😊 &amp; ٱلْعَرَبِيَّة' | ./html5-to-utf8.pl
😊 & ٱلْعَرَبِيَّة
With Xidel:
echo 'hello &lt; &#58; &quot; world' | xidel -s - -e 'parse-html($raw)'
hello < : " world

Reading strings from JSON using ksh

{
"assignedTo": [
"Daniel Raisor",
"Dalton Leslie",
"Logan Petro"
]
}
I want to extract Daniel Raisor, Dalton Leslie and Logan Petro and assign them to a single variable, like assignedTo=Daniel Raisor,Dalton Leslie,Logan Petro.
Also, my Unix doesn't support jq; I need to do this using grep or sed.
Use a proper JSON parser to parse JSON. Even if you don't have jq, there's a perfectly good one included with the Python interpreter, which the following example script uses:
#!/bin/ksh
# note: this was tested with ksh93u+ 2012-08-01
# to use with Python 3 rather than Python 2, replace "pipes" with "shlex" and iteritems() with items()
json='{"assignedTo": ["Daniel \"The Man\" Raisor","Dalton Leslie","Logan Petro"]}'
getEntries() {
python -c '
import json, sys, pipes
content = json.load(sys.stdin)
for name, values in content.iteritems():
print("%s=%s" % (pipes.quote(name), pipes.quote(",".join(values))))
'
}
eval "$(getEntries <<<"$json")"
echo "$assignedTo"
...properly emits the following output:
Daniel "The Man" Raisor,Dalton Leslie,Logan Petro
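For Python 3, apply the renames the comment mentions (pipes.quote becomes shlex.quote, iteritems() becomes items()). When only the one key is wanted, a narrower sketch skips the eval round trip entirely:

```shell
json='{"assignedTo": ["Daniel Raisor","Dalton Leslie","Logan Petro"]}'
# join the array under "assignedTo" with commas, via the stdlib json parser
assignedTo=$(printf '%s' "$json" | python3 -c '
import json, sys
print(",".join(json.load(sys.stdin)["assignedTo"]))')
echo "$assignedTo"   # Daniel Raisor,Dalton Leslie,Logan Petro
```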

To remove line based on string

I have file like test.yaml file, the text content in the file like below.
servers:
- uri: "http://demo.nginx1.com/status"
name: "NGinX Monitor1"
- uri: "http://demo.nginx2.com/status"
name: "NGinX Monitor2"
I want to remove the - uri line and the immediately following line (starting with name:) where the host name is demo.nginx1.com.
I want output like below.
servers:
- uri: "http://demo.nginx2.com/status"
name: "NGinX Monitor2"
I tried the following:
cat test.yaml | grep -v demo.nginx1.com | grep -v Monitor1 >> test_back.yaml
mv test_back.yaml test.yaml
I am getting the expected output, but it recreates the file, and I don't want to recreate the file.
Please help me with a suitable command that I can use.
Just a simple approach using GNU sed (the dots are escaped so they match literally):
sed '/demo\.nginx1\.com/,+1 d' test.yaml
servers:
- uri: "http://demo.nginx2.com/status"
name: "NGinX Monitor2"
For in-place replacement, add the -i flag, here as -i.bak to keep a backup:
sed -i.bak '/demo\.nginx1\.com/,+1 d' test.yaml
To see the in-place replacement:-
cat test.yaml
servers:
- uri: "http://demo.nginx2.com/status"
name: "NGinX Monitor2"
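The addr,+N address form is a GNU extension, not POSIX sed. Where portability matters, the same two-line deletion can be sketched with the N command, which appends the next line to the pattern space before deleting both:

```shell
# POSIX-compatible: pull in the following line, then delete the pair
sed '/demo\.nginx1\.com/{N;d;}' test.yaml
```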
As I dislike using regular expressions to hack something you can parse, here's how I'd tackle it, using perl and the YAML module:
#!/usr/bin/env perl
use strict;
use warnings;
use YAML;
use Data::Dumper;
#load yaml by reading files specified as args to stdin,
#or piped in. (Just like how you'd use 'sed')
my $config = Load ( do { local $/ ; <>} );
#output data structure for debug
print Dumper $config;
#use grep to filter the uri you don't want.
@{$config -> {servers}} = grep { not $_ -> {uri} =~ m/demo\.nginx1/ } @{$config -> {servers}};
#resultant data structure
print Dumper $config;
#output YAML to STDOUT
print Dump $config;

Search and replace with Bash

I have a mustache like template file that needs to be parsed. e.g.
abc def ghijk{{ var1 }} lmno {{ var2 }} pq
rst={{ var3 }} uvwzyx
Variables like {{ var1 }} need to be replaced by the value of the variable with the same name, which is already defined earlier in the bash script, e.g. var1="foobar".
I am thinking of using while read and awk to accomplish this, but I don't know the correct way to do the string manipulation in this case.
Thanks in advance!
export var1="duper"
export var2="tester"
export var3=1231
sed -e 's/{{ *\([^} ]*\) *}}/$\1/g' -e 's/^/echo "/' -e 's/$/"/' input | sh
Gives:
abc def ghijkduper lmno tester pq
rst=1231 uvwzyx
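One caveat with the sed-to-sh pipeline above: the template is executed by the shell, so a template containing $(...) or backticks would run arbitrary commands. A pure-bash sketch that only substitutes, using BASH_REMATCH and indirect expansion (no eval):

```shell
var1="duper" var2="tester" var3=1231
while IFS= read -r line; do
  # replace each {{ name }} with the value of the shell variable $name
  while [[ $line =~ \{\{[[:space:]]*([A-Za-z_][A-Za-z_0-9]*)[[:space:]]*\}\} ]]; do
    line=${line//"${BASH_REMATCH[0]}"/${!BASH_REMATCH[1]}}
  done
  printf '%s\n' "$line"
done < input
```

Note this loops forever if a variable's value itself contains a {{ ... }} placeholder; for simple key/value substitution it is safe.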
Here's an all awk version that requires a file of key=value pairs for the replacements. If I make a vars file like:
 var1 =foobar
 var2 =elf
 var3 =monkey data
where I "cheated" and included the whitespaces associated with vars in your data file. Then I made an executable awk file like:
#!/usr/bin/awk -f
BEGIN {FS="="}
NR==FNR {vars[$1]=$2; next}
set_delims==0 {FS="[{][{]|[}][}]"; $0=$0; set_delims=1 }
{
for(i=1;i<=NF;i++) {
printf( "%s%s", ($i in vars) ? vars[$i] : $i, i==NF?"\n":"")
}
}
If the executable awk file is called awko it can be run like awko vars data:
abc def ghijkfoobar lmno elf pq
rst=monkey data uvwzyx
I had a similar issue recently and found this nice mustache implementation in bash: https://github.com/tests-always-included/mo
If you have all your variables already defined in the current bash context it's as simple as calling:
./mo template.file
With a slightly different usage it also supports other mustache features like partials or arrays.
To do this in a scalable manner (lots of macros to expand), you'll probably want to look into using a macro processor. GNU m4 springs to mind, though it only allows alphanumeric and _ characters in macro names, so may be tricky for this particular task.
To get you started, you can use sed to do the replacements with this sed script (I called it mustache.sed):
s/{{ var1 }}/foobar/g
s/{{ var2 }}/bazbar/g
s/{{ var3 }}/quxbar/g
In use with your example:
$ sed -f mustache.sed mustache.txt
abc def ghijkfoobar lmno bazbar pq
rst=quxbar uvwzyx
$
You could put this sed script all on one sed command line, but I think using the script makes it more readable.
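Rather than hardcoding foobar and friends, the sed script can be generated from variables already set in the shell. A sketch using bash indirect expansion (the names var1..var3 are assumed from the question):

```shell
# emit one substitution command per known template variable
for v in var1 var2 var3; do
  printf 's/{{ %s }}/%s/g\n' "$v" "${!v}"   # ${!v} expands the variable named by $v
done > mustache.sed
```

Values containing / or & would need escaping before being spliced into a sed script this way.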

Storing files inside BASH scripts

Is there a way to store binary data inside a BASH script so that it can be piped to a program later in that script?
At the moment (on Mac OS X) I'm doing
play sound.m4a
# do stuff
I'd like to be able to do something like:
SOUND <<< the m4a data, encoded somehow?
END
echo $SOUND | play
#do stuff
Is there a way to do this?
Base64 encode it. For example:
$ openssl base64 < sound.m4a
and then in the script:
S=$(cat <<SOUND
YOURBASE64GOESHERE
SOUND
)
echo "$S" | openssl base64 -d | play
I know this is like beating a dead horse since this post is rather old, but I'd like to improve on Sionide21's answer, as that solution stores the binary data in a variable, which is not necessary.
openssl base64 -d <<SOUND | play
YOURBASE64DATAHERE
SOUND
Note: heredoc syntax requires that you don't indent the last 'SOUND',
and base64 decoding sometimes failed on me when I indented the
'YOURBASE64DATAHERE' section. So it's best practice to keep the Base64
data as well as the end token unindented.
I've found this while looking for a more elegant way to store binary data in shell scripts, but I had already solved it as described here. The only difference is that I'm transporting some tar-bzipped files this way. My platform has a separate base64 binary, so I don't have to use openssl.
base64 -d <<EOF | tar xj
BASE64ENCODEDTBZ
EOF
There is a Unix format called shar (shell archive) that allows you to store binary data in a shell script. You create a shar file using the shar command.
When I've done this I've used a shell here document piped through atob.
function emit_binary {
cat << 'EOF' | atob
--junk emitted by btoa here
EOF
}
the single quotes around 'EOF' prevent parameter expansion in the body of the here document.
atob and btoa are very old programs, and for some reason they are often absent from modern Unix distributions. A somewhat less efficient but more ubiquitous alternative is to use mimencode -b instead of btoa. mimencode will encode into base64 ASCII. The corresponding decoding command is mimencode -b -u instead of atob. The openssl command will also do base64 encoding.
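As a concrete sketch of that base64 route with the coreutils binary (assumed available; macOS's older base64 spells the decode flag -D):

```shell
# encode a payload for embedding, then decode it back
enc=$(printf 'binary-ish data' | base64)
printf '%s\n' "$enc" | base64 -d   # reproduces the original bytes
```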
Here's some code I wrote a long time ago that packs a choice executable into a bash script. I can't remember exactly how it works, but I suspect you could pretty easily modify it to do what you want.
#!/usr/bin/perl
use strict;
print "Stub Creator 1.0\n";
unless($#ARGV == 1)
{
print "Invalid argument count, usage: ./makestub.pl InputExecutable OutputCompressedExecutable\n";
exit;
}
unless(-r $ARGV[0])
{
die "Unable to read input file $ARGV[0]: $!\n";
}
open(my $OUTFILE, '>', $ARGV[1]) or die "Unable to create $ARGV[1]: $!\n";
print "\nCreating stub script...";
print $OUTFILE "#!/bin/bash\n";
print $OUTFILE "a=/tmp/\`date +%s%N\`;tail -n+3 \$0 | zcat > \$a;chmod 700 \$a;\$a \${*};rm -f \$a;exit;\n";
close($OUTFILE);
print "done.\nCompressing input executable and appending...";
`gzip $ARGV[0] -n --best -c >> $ARGV[1]`;
`chmod +x $ARGV[1]`;
my $OrigSize;
$OrigSize = -s $ARGV[0];
my $NewSize;
$NewSize = -s $ARGV[1];
my $Temp;
if($OrigSize == 0)
{
$OrigSize = 1; # avoid division by zero on an empty input file
}
$Temp = ($NewSize / $OrigSize) * 100;
$Temp *= 1000;
$Temp = int($Temp);
$Temp /= 1000;
print "done.\nStub successfully composed!\n\n";
print <<THEEND;
Original size: $OrigSize
New size: $NewSize
Compression: $Temp\%
THEEND
If it's a single block of data to use, the trick I've used is to put a "start of data" marker at the end of the file, then use sed in the script to filter out the leading stuff. For example, create the following as "play-sound.bash":
#!/bin/bash
sed '1,/^START OF DATA/d' "$0" | play
exit 0
START OF DATA
Then, you can just append your data to the end of this file:
cat sound.m4a >> play-sound.bash
and now, executing the script should play the sound directly.
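The whole trick can be exercised end to end with a throwaway payload; cat stands in for play here, since the extracted bytes simply go to stdout:

```shell
# build a self-extracting script around a test payload
printf 'payload-bytes' > sound.dat
{
  echo '#!/bin/bash'
  echo 'sed "1,/^START OF DATA/d" "$0" | cat   # "play" in the real script'
  echo 'exit 0'
  echo 'START OF DATA'
  cat sound.dat
} > play-sound.bash
chmod +x play-sound.bash
./play-sound.bash   # emits the appended payload
```

For genuinely binary data, beware that the payload must not itself contain a line matching the marker.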
Since Python is available on OS X by default, you can do as below:
ENCODED=$(python -m base64 foo.m4a)
Then decode it as below:
echo "$ENCODED" | python -m base64 -d | play
