Get the YAML path of a given line in a file

Using yq (or any other tool), how can I return the full YAML path of an arbitrary line number?
e.g. with this file:
a:
  b:
    c: "foo"
  d: |
    abc
    def
I want to get the full path of line 2; it should yield a.b.c. Line 0 → a, line 4 → a.d (multiline support), etc.
Any idea how I could achieve that?
Thanks

I have coded two solutions that differ slightly in their behaviour (see remarks below).
Both use the YAML processor mikefarah/yq.
I have also tried to solve the problem using kislyuk/yq, but it is not suitable,
because its input_line_number operator only works in combination with the --raw-input option.
Version 1
FILE='sample.yml'
export LINE=1
yq e '[..
  | select(line == env(LINE))
  | {"line": line,
     "path": path | join("."),
     "type": type,
     "value": .}
]' $FILE
Remarks
LINE=3 returns two results, because line 3 contains two nodes:
the key 'c' of map 'a.b'
the string value 'foo' of key 'c'
LINE=5 does not return a match, because the multiline text node starts in line 4.
The results are wrapped in an array, as multiple nodes can be returned.
Output for LINE=1
- line: 1
  path: ""
  type: '!!map'
  value:
    a:
      b:
        c: "foo"
    d: |-
      abc
      def
Output for LINE=2
- line: 2
  path: a
  type: '!!map'
  value:
    b:
      c: "foo"
Output for LINE=3
- line: 3
  path: a.b
  type: '!!map'
  value:
    c: "foo"
- line: 3
  path: a.b.c
  type: '!!str'
  value: "foo"
Output for LINE=4
- line: 4
  path: d
  type: '!!str'
  value: |-
    abc
    def
Output for LINE=5
[]
Version 2
FILE='sample.yml'
export LINE=1
if [[ $(wc -l < $FILE) -lt $LINE ]]; then
  echo "$FILE has less than $LINE lines"
  exit
fi

yq e '[..
  | select(line <= env(LINE))
  | {"line": line,
     "path": path | join("."),
     "type": type,
     "value": .}
]
| sort_by(.line, .type)
| .[-1]' $FILE
Remarks
At most one node is returned, even if there are more nodes in the selected line, so the result does not have to be wrapped in an array.
Which node of a line is returned can be controlled by the sort_by function, which can be adapted to your own needs (see the sketch after the outputs below).
In this case, text nodes are preferred over maps because "!!map" sorts before "!!str".
LINE=3 returns only the text node of line 3 (not the node of type "!!map").
LINE=5 returns the multiline text node starting at line 4.
LINE=99 does not return the last multiline text node of sample.yml because the maximum number of lines is checked in bash beforehand.
Output for LINE=1
line: 1
path: ""
type: '!!map'
value:
  a:
    b:
      c: "foo"
  d: |-
    abc
    def
Output for LINE=2
line: 2
path: a
type: '!!map'
value:
  b:
    c: "foo"
Output for LINE=3
line: 3
path: a.b.c
type: '!!str'
value: "foo"
Output for LINE=4
line: 4
path: d
type: '!!str'
value: |-
  abc
  def
Output for LINE=5
line: 4
path: d
type: '!!str'
value: |-
  abc
  def
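For example, here is an untested sketch of Version 2 with a different sort_by: it prefers the most deeply nested node on the selected line by sorting on the length of the path array (the "depth" field is my own addition):
yq e '[..
  | select(line == env(LINE))
  | {"line": line,
     "depth": path | length,
     "path": path | join("."),
     "type": type,
     "value": .}
]
| sort_by(.depth)
| .[-1]' $FILE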

Sharing my findings since I've spent too much time on this.
As @Inian mentioned, line numbers won't necessarily be accurate.
yq does provide us with the line operator, but I was not able to find a decent way of mapping that from an input.
That said, if you're sure the input file will not contain any multi-line values, you could do something like this:
Use awk to get the key of your input line, e.g. 3 → c
This assumes the value will never contain :; the regex can be edited if needed to work around this
Select row in awk
Trim leading and trailing spaces from a string in awk
export searchKey=$(awk -F':' 'FNR == 3 { gsub(/ /,""); print $1 }' ii)
Use yq to recursively (..) loop over the values, and create each path using (path | join("."))
yq e '.. | (path | join("."))' ii
Filter the values from step 2, using a regex that keeps only those paths that end in the key from step 1 (strenv(searchKey))
yq e '.. | (path | join(".")) | select(match(strenv(searchKey) + "$"))' ii
Print the path if it's found
Some examples from my local machine, where your input file is named ii and both the awk and yq commands are wrapped in a bash function:
$ function getPathByLineNumber () {
    key=$1
    export searchKey="$(awk -v key=$key -F':' 'FNR == key { gsub(/ /, ""); print $1 }' ii)"
    yq e '.. | (path | join(".")) | select(match(strenv(searchKey) + "$"))' ii
  }
$ yq e . ii
a:
  b:
    c: "foo"
$ getPathByLineNumber 1
a
$ getPathByLineNumber 2
a.b
$ getPathByLineNumber 3
a.b.c

Related

Extract values from command output to a JSON

I am extracting values from a cloud foundry command. It has to be done via the shell. Here is what the file looks like:
User-Provided:
end: 123.12.12.12
text_pass: 980
KEY: 000

Running Environment Variable Groups:
BLUEMIX_REGION: ibm:yp:us-north

Staging Environment Variable Groups:
BLUEMIX_REGION: ibm:yp:us-south
I want to extract everything from end to KEY. Please note that User-Provided will always be the start, but the last key can be any value; there will always be a blank line after the block.
How do I extract everything between "User-Provided" and the blank line, and put it in a JSON file which I will later parse?
So far I'm able to do this:
cf env space | awk -F 'end:' '{print $2}'
this gives me the value of end but not the whole object.
Expected output:
{
"end": "123.12.12.12"
"text_pass": "980"
"KEY": "000"
}
cf env space | awk '/User-Provided/{a = 1; next}/^$/{a = 0} a'
end: 123.12.12.12
text_pass: 980
KEY: 000
When the pattern User-Provided is encountered, the variable a is set; when a blank line is encountered, a is unset. Lines are printed only while a is set.
Edited answer:
cf env space | awk -F" *: *" '/User-Provided/{a=1;print"{";next}/^$/{a=0} END{print "\n}"} a{if(c)printf(","); printf("%s", "\n\""$1"\" : \""$NF"\""); c=1}'
This will give the output:
{
"end" : "123.12.12.12",
"text_pass" : "980",
"KEY" : "000"
}
Latest edit:
cf env space | awk '/User-Provided/{a=1;print"{";next}/^$/{a=0} END{print "\n}"} a{if(c)printf(","); sub(/:$/,"",$1); printf("%s", "\n\""$1"\" : \""$NF"\""); c=1}'
In awk:
$ awk '/^end:/,/^KEY:/' file
end: 123.12.12.12
text_pass: 980
KEY: 000
/.../,/.../ names the start and end markers; the lines from one to the other (inclusive) are printed.
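As a quick illustration of the range pattern on made-up input:
$ printf 'a\nstart\nb\nend\nc\n' | awk '/start/,/end/'
start
b
end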
However, the output requirements complicate the program a bit:
$ awk '
BEGIN   { FS=": *"; OFS=":" }                 # set appropriate delimiters
/^end:/ { print "{"; f=1 }                    # at the start marker, print "{" and raise the flag
f       { print "\"" $1 "\"", "\"" $2 "\"" }  # while the flag is up, print the key/value pair
/^KEY:/ { print "}"; f="" }                   # at the end marker, print "}" and lower the flag
' file
{
"end":"123.12.12.12"
"text_pass":"980"
"KEY":"000"
}
If you want to use an empty line as the end marker, use /^$/ && f instead of /^KEY:/.
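For example, an untested sketch of that variant; here the start marker is also generalized to the User-Provided: header (my own adaptation), so the name of the first data line no longer matters:
$ awk '
BEGIN             { FS=": *" }
/^User-Provided:/ { print "{"; f=1; next }    # at the header, print "{" and raise the flag
/^$/ && f         { print "}"; f=0 }          # at the first blank line, close the block
f                 { print "\"" $1 "\":\"" $2 "\"" }
END               { if (f) print "}" }        # close the block if the file ends without a blank line
' file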

Get line number (count of newlines) when piping in bash

I am converting a file of json documents to a file of differently shaped json documents using jq. I need the output documents to have a contiguous positive id. Can I access a variable that equals the number of newlines seen?
gzcat input.gz | jq -r '"{\"id\":???, \"foo\":\(.foo)}"' > output
# can anything take the place of ??? to give 0..n?
If your jq has input_line_number, you might be able to use that. Here is a typescript illustrating what it does:
$ jq 'input_line_number'
"a"
1
"b"
2
(In the above, the input line is immediately followed by the output line.)
Similarly, here is how foreach and inputs can be used together:
$ jq -n 'foreach inputs as $line (0; .+1; "line \(.) is \($line)")'
"abc"
"line 1 is abc"
123
"line 2 is 123"
If your jq does not have foreach, then you might find reduce adequate for your needs:
$ jq -s -r 'reduce .[] as $line
    ( [0,""]; .[0] += 1 | .[1] += "line \(.[0]) is \($line)\n" )
  | .[1]'
Input:
"abc"
123
Output:
line 1 is abc
line 2 is 123
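Coming back to the original question: if your jq has input_line_number and each input document sits on its own line, a sketch along these lines could produce a contiguous id starting at 0 (-c keeps each output document on one line; the -1 shifts the 1-based line count down to 0..n):
gzcat input.gz | jq -c '{id: (input_line_number - 1), foo: .foo}' > output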

Advanced AWK formatting

I am having a problem with this awk command: it is not producing the result I want given this input file. Can someone help me with this, please?
I am searching for a "Class:" value of "ABC". When I find ABC, I would like to assign the values associated with userName/serviceList/hostList/portNumber to variables (please see the output section).
awk -v q="\"" '/ABC/{f=1;c++}
  f && /userName|serviceList|hostList|portNumber/
    {sub(":",c"=",$1);
     print $1 q $3 q
    }
  /port:/{f=0;print ""}' filename
The file contains the following input:
Instance: Ths is a test
  Class: ABC
  Variables:
    udpRecvBufSize: Numeric: 8190000
    userName: String:test1
    pingInterval: Numeric: 2
    blockedServiceList: String:
    acceptAllServices: Boolean: False
    serviceList: String: ABC
    hostList: String: 159.220.108.3
    protocol: String: JJJJ
    portNumber: Numeric: 20001
    port: String: RTR_LLLL
  Children:
    Instance: The First Server in the Loop
      Class: Servers
      Variables:
        pendout: Numeric: 0
        overflows: Counter: 0
        peakBufferUsage: Numeric: 100
        bufferPercentage: Gauge: 1 (0,100)
        currentBufferUsage: Numeric: 1
        pendingBytesOut: Numeric: 0
        pendingBytesIn: Numeric: 1
        pingsReceived: Counter: 13597
        pingsSent: Counter: 87350
        clientToServerPings: Boolean: True
        serverToClientPings: Boolean: True
        numInputBuffers: Numeric: 10
        maxOutputBuffers: Numeric: 100
        guaranteedOutputBuffers: Numeric: 100
        lastOutageDuration: String: 0:00:00:00
        peakDisconnectTime: String:
        totalDisconnectTime: String: 0:00:00:00
        disconnectTime: String:
        disconnectChannel: Boolean: False
        enableDacsPermTest: Boolean: False
        enableFirewall: Boolean: False
        dacsPermDenied: Counter: 0
        dacsDomain: String:
        compressPercentage: Gauge: 0 (0,100)
        uncompBytesSentRate: Gauge: 0 (0,9223372036854775807)
Instance: Ths is a test
  Class: ABC
  Variables:
    udpRecvBufSize: Numeric: 8190000
    userName: String:test2
    pingInterval: Numeric: 4
    blockedServiceList: String:
    acceptAllServices: Boolean: False
    serviceList: String: DEF
    hostList: String: 159.220.111.2
    protocol: String: ffff
    portNumber: Numeric: 20004
    port: String: JJJ_LLLL
  Children:
This is the output I am looking for, assigning variables:
userName1="test1"
serviceList1="ABC"
hostList1="159.220.108.3"
portNumber1="20001"
userName2="test2"
serviceList2="DEF"
hostList2="159.220.111.2"
portNumber2="20004"
If your intention is to assign to a series of variables, then rather than parsing the whole file at once, perhaps you could just extract the specific parts that you're interested in one by one. For example:
$ awk -F'\n' -v RS= -v record=1 -v var=userName 'NR == record { for (i=1; i<=NF; ++i) if (sub("^\\s*" var ".*:\\s*", "", $i)) print $i }' file
test1
$ awk -F'\n' -v RS= -v record=1 -v var=serviceList 'NR == record { for (i=1; i<=NF; ++i) if (sub("^\\s*" var ".*:\\s*", "", $i)) print $i }' file
ABC
The awk script could be put inside a shell function and used like this:
parse_file() {
  record=$1
  var=$2
  file=$3
  awk -F'\n' -v RS= -v record="$record" -v var="$var" 'NR == record {
    for (i=1; i<=NF; ++i) if (sub("^\\s*" var ".*:\\s*", "", $i)) print $i
  }' "$file"
}
userName1=$(parse_file 1 userName file)
serviceList1=$(parse_file 1 serviceList file)
# etc.
$ awk -F: -v q="\"" '/Class: ABC/{f=1;c++;print ""} \
    f && /userName|serviceList|hostList|portNumber/ \
      {gsub(/ /,"",$1); \
       gsub(/ /,"",$3); \
       print $1 c "=" q $3 q} \
    /Children:/{f=0}' vars
userName1="test1"
serviceList1="ABC"
hostList1="159.220.108.3"
portNumber1="20001"
userName2="test2"
serviceList2="DEF"
hostList2="159.220.111.2"
portNumber2="20004"
It increments the counter for each "Class: ABC" pattern and sets a flag, then formats and prints the selected entries until the terminating pattern of the block. This limits the context to between the two patterns.
Assuming bash 4.0 or newer, there's no need for awk here at all:
flush() {
  if (( ${#hostvars[@]} )); then
    for varname in userName serviceList hostList portNumber; do
      [[ ${hostvars[$varname]} ]] && {
        printf '%q=%q\n' "$varname" "${hostvars[$varname]}"
      }
    done
    printf '\n'
  fi
  hostvars=( )
}

class=
declare -A hostvars=( )
while read -r line; do
  [[ $line = *"Class: "* ]] && class=${line#*"Class: "}
  [[ $class = ABC ]] || continue
  case $line in
    *:*:*)
      IFS=$': \t' read varName varType value <<<"$line"
      hostvars[$varName]=$value
      ;;
    *"Variables:"*)
      flush
      ;;
  esac
done
flush
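If the script above were saved as, say, parse_abc.sh (a hypothetical name) and fed the sample data on stdin, the expected output would look roughly like this; note that printf %q only adds quoting when a value actually needs it:
$ bash parse_abc.sh < vars
userName=test1
serviceList=ABC
hostList=159.220.108.3
portNumber=20001

userName=test2
serviceList=DEF
hostList=159.220.111.2
portNumber=20004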
Notable points:
The full set of defined variables is collected in the hostvars associative array (what other languages might call a "map" or "hash"), even though we're only printing the four names defined to be of interest. More interesting logic could thus be defined that combines multiple variables to decide what to output, &c.
The flush function is defined outside the loop so it can be used in multiple places -- both when starting a new block (as detected, here, by seeing Variables:) and at end-of-file.
The output varies from what you requested in that it includes quotes only if necessary -- but that quoting is guaranteed to be correct and sufficient for bash to parse without room for security holes even if the strings being emitted would otherwise contain security-relevant content. Think about correctly handling a case where serviceList contains $(rm -rf /*)'$(rm -rf /*)' (the duplication being present to escape single quotes); printf %q makes this easy, whereas awk has no equivalent.
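As a quick illustration of printf %q on hostile input (from an interactive bash session; the exact escaping can vary between bash versions):
$ printf '%q\n' 'pqr$(rm -rf /*)'
pqr\$\(rm\ -rf\ /\*\)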
Solution in TXR:
@(collect)
@(skip)Class: ABC
  Variables:
@  (gather)
    userName: String:@user
    serviceList: String: @servicelist
    hostList: String: @hostlist
    portNumber: Numeric: @port
@  (until)
  Children:
@  (end)
@(end)
@(deffilter shell-esc
  ("\"" "\\\"") ("$" "\\$") ("`" "\\`")
  ("\\" "\\\\"))
@(output :filter shell-esc)
@  (repeat :counter i)
userName@(succ i)="@user"
serviceList@(succ i)="@servicelist"
hostList@(succ i)="@hostlist"
portNumber@(succ i)="@port"
@  (end)
@(end)
Run:
$ txr data.txr data
userName1="test1"
serviceList1="ABC"
hostList1="159.220.108.3"
portNumber1="20001"
userName2="test2"
serviceList2="DEF"
hostList2="159.220.111.2"
portNumber2="20004"
Note 1: Escaping is necessary if the data may contain characters which are special between quotes in the target language. The shell-esc filter is based on the assumption that the generated variable assignments are shell syntax. It can easily be replaced.
Note 2: The code assumes that each Class: ABC has all of the required variables present. It will not work right if some are missing; there are two ways to address that by tweaking the @(gather) line:
Failure:
@  (gather :vars (user servicelist hostlist port))
Meaning: fail if any of these four variables are not gathered. The consequence is that an entire Class: ABC section with missing variables is skipped.
Default missing:
@  (gather :vars (user (servicelist "ABC") hostlist port))
Meaning: the four variables user, servicelist, hostlist and port must be gathered; however, if servicelist is missing, it gets the default value "ABC" and is treated as if it had been found.

Bash: tell if a file is included in another

I'm trying to compare the contents of two files and tell whether the content of one is fully included in the other (meaning: if one file has three lines A, B and C, can I find those three lines, in that order, in the second file?). I've looked at diff and grep but wasn't able to find the relevant option (if any).
Examples:
file1.txt   file2.txt   <= should return true (file2 is included in file1)
---------   ---------
abc         def
def         ghi
ghi
jkl

file1.txt   file2.txt   <= should return false (file2 is not included in file1)
---------   ---------
abc         abc
def         ghi
ghi
jkl
Any idea?
Using the answer from here
Use the following python function:
def sublistExists(list1, list2):
    # compare element-wise slices; joining the lists into plain strings
    # would give false positives (e.g. ['ab', 'c'] vs ['a', 'bc'])
    m = len(list2)
    return any(list1[i:i+m] == list2 for i in range(len(list1) - m + 1))
In action:
In [35]: a=[i.strip() for i in open("f1")]
In [36]: b=[i.strip() for i in open("f2")]
In [37]: c=[i.strip() for i in open("f3")]
In [38]: a
Out[38]: ['abc', 'def', 'ghi', 'jkl']
In [39]: b
Out[39]: ['def', 'ghi']
In [40]: c
Out[40]: ['abc', 'ghi']
In [41]: sublistExists(a, b)
Out[41]: True
In [42]: sublistExists(a, c)
Out[42]: False
Assuming your file2.txt does not contain characters with special meaning for regular expressions, you can use:
grep "$(<file2.txt)" file1.txt
This should work even if your file2.txt contains special characters:
cp file1.txt file_read.txt
while read -r a_line ; do
  first_line_found=$( fgrep -nx "${a_line}" file_read.txt 2>/dev/null | head -1 | cut -d: -f1 )  # keep only the line number from fgrep -n
  if [ -z "$first_line_found" ]; then
    exit 1   # we couldn't find a_line in file_read.txt
  else
    # delete everything up to (and including) the line found
    { echo "1,${first_line_found}d" ; echo "w" ; } | ed file_read.txt
  fi
done < file2.txt
exit 0
(the "exit 0" is there for "readability" so one can see easily that it exits with 1 only if fgrep can't find a line in file1.txt. It's not needed)
(fgrep is a literral grep, searching for a string (not a regexp))
(I haven't tested the above, it's a general idea. I hope it does work though ^^)
the "-x" force it to match lines exactly, ie, no additionnal characters (ie : "to" can no longer match "toto". Only "toto" will match "toto" when adding -x)
Please try this awk "one-liner" on your real file; for the example files in your question, it worked:
awk 'FNR==NR{a=a $0;next}{b=b $0}
  END{
    while(match(b,a,m)){
      if(m[0]==a) {print "included";exit}
      b=substr(b,RSTART+RLENGTH)
    }
    print "not included"
  }' file2 file1

shell script to search attribute and store value along with filename

I am looking for a shell script that searches for an attribute (a string) in all the files in the current directory and stores the attribute values along with the file name.
e.g. File1.txt
abc xyz = "pqr"
File2.txt
abc xyz = "klm"
Here File1 and File2 contain the desired string "abc xyz" and have the values "pqr" and "klm".
I want the result to be something like this:
File1.txt:pqr
File2.txt:klm
Well, this depends on how you define a 'shell script'. Here are 3 one-line solutions:
Using grep/sed:
egrep -o 'abc xyz = ".*"' * | sed -e 's/abc xyz = "\(.*\)"/\1/'
Using awk:
awk '/abc xyz = "(.*)"/ { print FILENAME ":" gensub("abc xyz = \"(.*)\"", "\\1", 1) }' *
Using perl one-liner:
perl -ne 'if(s/abc xyz = "(.*)"/$ARGV:$1/) { print }' *
I personally would go with the last one.
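For the two sample files above (and assuming the directory contains only them), the perl one-liner would print something like:
$ perl -ne 'if(s/abc xyz = "(.*)"/$ARGV:$1/) { print }' *
File1.txt:pqr
File2.txt:klm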
Please don't use bash scripting for this.
There is much room for small improvements in the code, but in 20 lines the damn thing does the job.
Note: the code assumes that "abc xyz" is at the beginning of the line.
#!/usr/bin/python

import os
import re

MYDIR = '/dir/you/want/to/search'

def search_file(fn):
    myregex = re.compile(r'abc xyz = \"([a-z]+)\"')
    f = open(fn, 'r')
    for line in f:
        m = myregex.match(line)
        if m:
            yield m.group(1)

for filename in os.listdir(MYDIR):
    if os.path.isfile(os.path.join(MYDIR, filename)):
        matches = search_file(os.path.join(MYDIR, filename))
        for match in matches:
            print filename + ':' + match,
Thanks to David Beazley, A.M. Kuchling, and Mark Pilgrim for sharing their vast knowledge.
I couldn't have done something like this without you guys leading the way.
