I'm parsing an HTML file successfully with xmllint, but when I combine two or more XPath expressions I get only one occurrence instead of all of them.
When I run the expressions separately I get something like this:
Expression:
xmllint --html --xpath "//h3[contains(text(),'Rodada')]/../following-sibling::div//span[contains(@class,'partida-desc')][2]/text()" 2012-campeonato-brasileiro.html 2>/dev/null
Result:
Couto Pereira - Curitiba - PR
Aflitos - Recife - PE
Serra Dourada - Goiania - GO
But when I combine the expressions:
prefix="//h3[contains(text(),'Rodada')]/../following-sibling::div"
xmllint --html --xpath "normalize-space(concat($prefix//span[contains(@class,'partida-desc')]/text(),';',$prefix//div[contains(@class,'pull-left')]//img/@title,';',$prefix//div[contains(@class,'pull-right')]//img/@title,';',$prefix//strong/span/text(),';',$prefix//span[contains(@class,'partida-desc')][2]/text()))" 2012-campeonato-brasileiro.html 2>/dev/null
Result:
Sáb, 19/05/2012 18:30 - Jogo: 3 ;Palmeiras - SP;Portuguesa - SP;1 x 1; Pacaembu - Sao Paulo - SP
It works, but it stops at the first result. I can't make it process the whole file.
To run this example, you can download the HTML like this:
curl https://www.cbf.com.br/futebol-brasileiro/competicoes/campeonato-brasileiro-serie-a/2012 --compressed > /tmp/2012-campeonato-brasileiro.html
In XPath 1.0, when a function such as normalize-space or concat receives a node-set as an argument, only the string value of the first node in the node-set is used.
In XPath 2 and later you can use e.g. //foo/normalize-space() or //foo/concat(.//bar, ';', .//baz) or string-join(//foo, ';').
With pure XPath 1.0 you would need to iterate in the host language (e.g. shell or XSLT or Java) and then concatenate in the host language.
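A minimal sketch of iterating in the shell, assuming xmllint is installed. The sample document, its element names and the class values home and away are hypothetical stand-ins for the real page; the idea is to ask XPath for count() first and then evaluate concat() once per item, so each argument selects a single node.

```shell
# Hypothetical sample document standing in for the real page.
xml=$(mktemp)
cat > "$xml" <<'EOF'
<ul>
  <li><span class="home">Palmeiras</span><span class="away">Portuguesa</span></li>
  <li><span class="home">Figueirense</span><span class="away">Nautico</span></li>
  <li><span class="home">Bahia</span><span class="away">Santos</span></li>
</ul>
EOF

# count() returns a number, so the loop bound does not have to be hard-coded.
n=$(xmllint --xpath "count(/ul/li)" "$xml")

rows=""
for i in $(seq 1 "$n"); do
  # Every path is restricted to the i-th <li>, so concat() gets one node per argument.
  row=$(xmllint --xpath "concat(/ul/li[$i]/span[@class='home'],';',/ul/li[$i]/span[@class='away'])" "$xml")
  rows="$rows$row
"
done
printf '%s' "$rows"
```

This starts xmllint once per item, which is simple but slow on large documents.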
concat operates only on the first node of a node-set.
The following command adds more processing to take advantage of the xmllint shell:
echo -e "cd //h3[contains(text(),'Rodada')]/../following-sibling::div \n cat .//span[contains(@class,'partida-desc')]/text() | .//div[contains(@class,'pull-left')]//img/@title | .//div[contains(@class,'pull-right')]//img/@title | .//strong/span/text() | .//span[contains(@class,'partida-desc')][2]/text() \nbye\n" | \
xmllint --html --shell 2012-campeonato-brasileiro.html 2>/dev/null | \
tr -s ' ' | grep -v '^ *$' | \
gawk 'BEGIN{ RS="(\n -------){3,3}"; FS="\n -------\n"; OFS=";"} {if(NR>2) { print gensub(/\n/,"","g",$1),gensub(/title="([^"]+)"/,"\\1","g",$2),gensub(/title="([^"]+)"/,"\\1","g",$3),$4,$5}}'
Result
Sáb, 19/05/2012 21:00 - Jogo: 4 ; Figueirense - SC; Náutico - PE;2 x 1; Orlando Scarpelli - Florianopolis - SC
Dom, 20/05/2012 16:00 - Jogo: 8 ; Ponte Preta - SP; Atlético - MG;0 x 1; Moisés Lucarelli - Campinas - SP
Dom, 20/05/2012 16:00 - Jogo: 5 ; Corinthians - SP; Fluminense - RJ;0 x 1; Pacaembu - Sao Paulo - SP
Dom, 20/05/2012 16:00 - Jogo: 7 ; Botafogo - RJ; São Paulo - SP;4 x 2; João Havelange - Rio de Janeiro - RJ
Dom, 20/05/2012 16:00 - Jogo: 6 ; Internacional - RS; Coritiba - PR;2 x 0; Beira-Rio - Porto Alegre - RS
Dom, 20/05/2012 18:30 - Jogo: 1 ; Vasco da Gama - RJ; Grêmio - RS;2 x 1; São Januário - Rio de Janeiro - RJ
Dom, 20/05/2012 18:30 - Jogo: 2 ; Bahia - BA; Santos - SP;0 x 0; Pituaçu - Salvador - BA
.... (more records)
More cleanup might be needed, since the fields contain leading/trailing spaces.
Note: the HTML file needs to be converted to Unix newlines first:
dos2unix 2012-campeonato-brasileiro.html
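The leading/trailing spaces mentioned above could be removed with one more awk stage. This is only a sketch: the two records are copied from the result shown earlier, and the variable names raw and clean are illustrative.

```shell
# Two records copied from the result above, used here as sample input.
raw='Sáb, 19/05/2012 21:00 - Jogo: 4 ; Figueirense - SC; Náutico - PE;2 x 1; Orlando Scarpelli - Florianopolis - SC
Dom, 20/05/2012 18:30 - Jogo: 2 ; Bahia - BA; Santos - SP;0 x 0; Pituaçu - Salvador - BA'

# Split on ';', strip spaces and tabs at both ends of every field, then rejoin with ';'.
clean=$(printf '%s\n' "$raw" | awk 'BEGIN { FS = OFS = ";" }
{
  for (i = 1; i <= NF; i++) gsub(/^[ \t]+|[ \t]+$/, "", $i)
  print
}')
printf '%s\n' "$clean"
```

In practice the same awk program would be appended to the end of the pipeline instead of reading from a variable.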
Thanks for your answers!
Considering my alternatives, this is my best solution so far:
#!/bin/bash
# Collect the text of every round heading (h3 elements containing 'Rodada ').
RODADAS=$(xmllint --html --xpath "//h3[contains(text(),'Rodada ')]/text()" "$1" 2>/dev/null)
while read -r i
do
  # Try list items 1 to 10 under each round heading.
  for x in {1..10}
  do
    PREFIX="//h3[contains(text(), '$i')]/../following-sibling::div/ul/li[$x]"
    xmllint --html --xpath "normalize-space(concat($PREFIX//span[contains(@class,'partida-desc')]/text(),';',$PREFIX//div[contains(@class,'pull-left')]//img/@title,';',$PREFIX//div[contains(@class,'pull-right')]//img/@title,';',$PREFIX//strong/span/text(),';',$PREFIX//span[contains(@class,'partida-desc')][2]/text()))" "$1" 2>/dev/null
  done
done <<< "$RODADAS"
Run:
./html-csv.sh 2012-campeonato-brasileiro.html
Result:
Sáb, 01/12/2012 19:30 - Jogo: 373 ;Santos - SP;Palmeiras - SP;3 x 1; Vila Belmiro - Santos - SP
Dom, 02/12/2012 17:00 - Jogo: 372 ;Fluminense - RJ;Vasco da Gama - RJ;1 x 2; João Havelange - Rio de Janeiro - RJ
Dom, 02/12/2012 17:00 - Jogo: 374 ;São Paulo - SP;Corinthians - SP;3 x 1; Pacaembu - Sao Paulo - SP
Given the following YAML:
charts:
  # repository with Helm charts for creation namespaces
  path: ns
  pathMonitoringPrometheus: prom
namespaces:
  first:
    description: "Description of first"
    enabled: false
    branch: master
    bootstrapChart: bootstrap
    syncAccessGroups: []
    namespace:
      role: k8s-role-of-first
      istio: disabled
      public: view
    sources: []
  second:
    description: "Description of second"
    enabled: false
    branch: HEAD
    bootstrapChart: bootstrap
    namespace:
      role: k8s-role-of-second
      istio: 1-13-2
      labels:
        label: second
    sources:
      - http://url.of.second
How could we get a list of namespaces and their istio value, but only where that value is different from "disabled"?
We are trying to use the "yq" tool; any approach would be OK, although "yq" is preferred.
Expected output:
second, 1-13-2
Using kislyuk/yq you can base your filter on jq.
to_entries splits up the object into an array of key-value pairs
select selects those items matching your criteria
String interpolation in combination with the -r option puts together your desired output
yq -r '
.namespaces
| to_entries[]
| select(.value.namespace.istio != "disabled")
| "\(.key), \(.value.namespace.istio)"
'
second, 1-13-2
Using mikefarah/yq the filter is quite similar.
to_entries[] has to be split up into to_entries | .[]
String interpolation is replaced using join and an array
yq '
.namespaces
| to_entries | .[]
| select(.value.namespace.istio != "disabled")
| [.key, .value.namespace.istio] | join(", ")
'
second, 1-13-2
This will do:
cat /path/to/your.yaml | yq -r '.namespaces | to_entries[] | "\(.key) \(.value.namespace.istio)"'
This results in:
first disabled
second 1-13-2
I have a text file looking like
text_a_3 xxx yyy
- - - - - - - - - - -
text_b_2 xyx zyz
- - - - - - - - - - -
text_b_3 xxy zyy
- - - - - - - - - - -
text_a_2 foo bar
- - - - - - - - - - -
text_a_1 foo bla
- - - - - - - - - - -
text_b_1 bla bla
I want to sort this file numerically, based on the first field, so that my output would look like:
text_a_1 foo bla
- - - - - - - - - - -
text_a_2 foo bar
- - - - - - - - - - -
text_a_3 xxx yyy
- - - - - - - - - - -
text_b_1 bla bla
- - - - - - - - - - -
text_b_2 xyx zyz
- - - - - - - - - - -
text_b_3 xxy zyy
I thought sort would do the job. I thus tried
sort -n name_of_my_file
sort -k1 -n name_of_my_file
But it gives
- - - - - - - - - - -
- - - - - - - - - - -
- - - - - - - - - - -
- - - - - - - - - - -
- - - - - - - - - - -
text_a_1 foo bla
text_a_2 foo bar
text_a_3 xxx yyy
text_b_1 bla bla
text_b_2 xyx zyz
text_b_3 xxy zyy
The option --field-separator is not of any help.
Is there any way to achieve this with a one-line, sort-based command?
Or is the only solution to extract the lines containing text, sort them, and insert the line delimiters afterwards?
Using GNU awk only, relying on its internal sort function asort() and with the record separator set to the dashes line:
awk -v RS='- - - - - - - - - - -\n' '
{a[++c]=$0}
END{
asort(a)
for(i=1;i<=c;i++)
printf "%s%s",a[i],(i==c?"":RS)
}' name_of_my_file
The script first fills the content of the input file into the array a. When the file is read, the array is sorted and then printed with the same input record separator.
When the line delimiters are all on the even lines, you can use
paste -d'\r' - - < yourfile | sort -n | tr '\r' '\n'
I actually prefer removing the delimiters first, sorting, and adding them back afterwards, so please reconsider your requirements:
grep -Ev "(- )*-" yourfile | sort -n | sed 's/$/\n- - - - - - - - - - -/'
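A variant of the same remove, sort, re-add idea, sketched with awk so that the delimiter is written only between records and not after the last one. The sample lines are taken from the question; the variable names f, sep and sorted are illustrative.

```shell
# Sample input: a few records from the question, separated by the dashes line.
f=$(mktemp)
printf '%s\n' \
  'text_a_3 xxx yyy' \
  '- - - - - - - - - - -' \
  'text_b_2 xyx zyz' \
  '- - - - - - - - - - -' \
  'text_a_1 foo bla' > "$f"

sep='- - - - - - - - - - -'

# Drop the delimiter lines, sort what is left, then print the delimiter
# before every record except the first.
sorted=$(grep -v '^- ' "$f" | LC_ALL=C sort | awk -v sep="$sep" 'NR > 1 { print sep } { print }')
printf '%s\n' "$sorted"
```

LC_ALL=C is set only to make the ordering independent of the locale.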
The following sort + awk may help you.
sort -t"_" -k2 -k3 Input_file | awk '/^-/ && !val{val=$0} !/^-/{if(prev){print prev ORS val};prev=$0} END{print prev}'
Here is a non-one-liner form of the solution too.
sort -t"_" -k2 -k3 Input_file |
awk '
/^-/ && !val{
  val=$0
}
!/^-/{
  if(prev){
    print prev ORS val
  }
  prev=$0
}
END{
  print prev
}'
I have two text files with the following line format:
Value - Value - Number
I need to merge these files in a new one that contains only the lines with the common Value - Value pairs followed by the two Number values.
For example if I have these files:
File1.txt
Jack - Mark - 12
Alex - Ryan - 15
Jack - Ryan - 22
File2.txt
Paul - Bill - 11
Jack - Mark - 18
Jack - Ryan - 20
The merged file will contain:
Jack - Mark - 12 - 18
Jack - Ryan - 22 - 20
How can I do this?
awk to the rescue!
awk -F' - ' 'BEGIN{OFS=FS}
NR==FNR{a[$1,$2]=$3;next}
($1,$2) in a{print $1,$2,a[$1,$2],$3}' file1 file2
Jack - Mark - 12 - 18
Jack - Ryan - 22 - 20
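The awk command above can be tried end to end on the question's sample data. This sketch only adds comments and a temporary directory (the variable names dir and merged are illustrative).

```shell
# Recreate the two sample files from the question in a scratch directory.
dir=$(mktemp -d)
printf '%s\n' 'Jack - Mark - 12' 'Alex - Ryan - 15' 'Jack - Ryan - 22' > "$dir/file1"
printf '%s\n' 'Paul - Bill - 11' 'Jack - Mark - 18' 'Jack - Ryan - 20' > "$dir/file2"

merged=$(awk -F' - ' '
BEGIN { OFS = FS }
# First file only (NR equals FNR): remember its number under the name pair.
NR == FNR { a[$1, $2] = $3; next }
# Second file: print only the pairs already seen, with both numbers.
($1, $2) in a { print $1, $2, a[$1, $2], $3 }
' "$dir/file1" "$dir/file2")
printf '%s\n' "$merged"
```

The output follows the line order of the second file, since lines are printed while it is being read.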
Alternatively, with decorate/join/undecorate:
$ join <(sort file1 | sed 's/ - /-/') <(sort file2 | sed 's/ - /-/') |
sed 's/-/ - /'
Jack - Mark - 12 - 18
Jack - Ryan - 22 - 20
Basically, here is what I want to do:
I have unsorted mp3 files that do not have track numbers. I downloaded the complete txt file (it contains the album's track numbers, artists and titles in the form 1 - someone - something). My next step was to export all the file names from the album folder to another txt file.
So here is an example of the first txt:
1 - CCC - C
2 - AAA - A
3 - BBB - B
And here is my generated txt file:
AAA - A
BBB - B
CCC - C
Can someone please post a full command or a script so I can get an output file like this:
2 - AAA - A
3 - BBB - B
1 - CCC - C
For the record, it's not a school assignment. I like my mp3s sorted, there are just too many unsorted albums, and using mp3tag or anything else to search for the files manually and type in the numbers is far too time-consuming.
sort -t - -k 2 first.txt
Output:
2 - AAA - A
3 - BBB - B
1 - CCC - C