Building a 'table of contents' - xpath

Consider the following 'sample.xml'
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<root>
<level>
<name>testA</name>
<level>
<name>testB</name>
</level>
<level>
<name>testC</name>
<level>
<name>testD</name>
<level>
<name>testE</name>
</level>
</level>
</level>
</level>
</root>
Using xmlstarlet I can do:
xml sel -t -m //level -v name -o " " -v "count(ancestor::*)-1" -o "." -v "count(preceding-sibling::*)" -n sample.xml
This produces:
testA 0.0
testB 1.1
testC 1.2
testD 2.1
testE 3.1
What should I do to get:
testA 0.0
testB 1.1
testC 1.2
testD 1.2.1
testE 1.2.1.1
In this example I only have 4 levels, but there can be more than 4.
I am thinking of some kind of recursion. Are there any links available that explain how to do that?

You should be able to do this using XSLT with the "tr" command in xmlstarlet...
However your desired output is a little confusing. If "testA" is the first level and you start at zero, why don't all the other entries start at zero? Or maybe "root" is supposed to be zero?
Anyway, here's an example that starts at 1 instead of zero that should get you started...
XML Input (input.xml)
<root>
<level>
<name>testA</name>
<level>
<name>testB</name>
</level>
<level>
<name>testC</name>
<level>
<name>testD</name>
<level>
<name>testE</name>
</level>
</level>
</level>
</level>
</root>
XSLT 1.0 (test.xsl)
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="text"/>
<xsl:strip-space elements="*"/>
<xsl:template match="level">
<xsl:value-of select="concat(name, ' ')"/>
<xsl:for-each select="ancestor-or-self::level">
<xsl:if test="not(position()=1)">.</xsl:if>
<xsl:number/>
</xsl:for-each>
<xsl:text>
</xsl:text>
<xsl:apply-templates select="level"/>
</xsl:template>
</xsl:stylesheet>
Command Line
xmlstarlet tr test.xsl input.xml
Output
testA 1
testB 1.1
testC 1.2
testD 1.2.1
testE 1.2.1.1
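
If xmlstarlet is not available, the same numbering can be approximated with plain awk. This is only a sketch under stated assumptions: it expects one tag per line as in the sample, it keeps a counter per nesting depth instead of walking the ancestor axis, and the function name toc is made up for illustration. It is not a real XML parser.

```shell
#!/bin/sh
# toc [FILE...]: print "<name> <outline number>" for every <level> element.
# Assumption: every tag sits on its own line, as in the sample input.
toc() {
  awk '
    /<level>/   { depth++; count[depth]++ }        # entering a level: bump its sibling counter
    /<\/level>/ { count[depth + 1] = 0; depth-- }  # leaving a level: reset the child counter
    /<name>/ && depth > 0 {
      name = $0
      sub(/.*<name>/, "", name)                    # drop everything up to the opening tag
      sub(/<\/name>.*/, "", name)                  # drop the closing tag and anything after it
      path = count[1]
      for (i = 2; i <= depth; i++) path = path "." count[i]
      print name, path
    }
  ' "$@"
}

# Demo on a cut-down version of the sample
toc <<'EOF'
<root>
<level>
<name>testA</name>
<level>
<name>testB</name>
</level>
<level>
<name>testC</name>
</level>
</level>
</root>
EOF
```

The demo prints testA 1, testB 1.1 and testC 1.2, one per line, which is the same numbering scheme as the XSLT output above.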

This problem can be solved without recursion, by iterating over
elements on the ancestor-or-self axis.
The following xmlstarlet command processes all level elements using
the inner -m (xsl:for-each) to handle each path from root to target
(as suggested in comments the shell variable base defaults to 0 but
can be set to 1).
xmlstarlet select -T -t \
-m '//level' \
-v 'concat(name," ")' \
-m 'ancestor-or-self::level' \
--if 'position() = 1' \
-v "'${base:-0}'" \
--else \
-o '.' \
-v 'count(preceding-sibling::level) + 1' \
-b \
-b \
-n \
file.xml
Output:
testA 0
testB 0.1
testC 0.2
testD 0.2.1
testE 0.2.1.1
For a more compact inner -m that produces the same output, use instead:
-m 'ancestor-or-self::level' \
--if 'position() != 1' -o '.' -b \
-v 'count(preceding-sibling::level) + number(position() != 1)' \
-b \
where the count is incremented by 1 for all except the root level
where position() is 1.
As a variation on the same theme: select elements with the shell
variable target and print their paths as XPath expressions, using the
XSLT current() function to reference the element being processed by
the inner -m:
target='//level[name="testB" or name="testE"]'
xmlstarlet select -T -t \
-m "${target}" \
-m 'ancestor-or-self::*' \
--var pos='1 + count(preceding-sibling::*[name() = name(current())])' \
-v 'concat("/",name(),"[",$pos,"]")' \
-b \
-n \
file.xml
Output:
/root[1]/level[1]/level[1]
/root[1]/level[1]/level[2]/level[1]/level[1]

Related

Extracting data from API using grep

I'm trying to make a bash scraper. I've managed to extract the data, but I'm struggling to fetch, for example, today's temperature using grep, since the date and the temperature are not on the same line. I would like the result to be written to a file.
I've tried grep -E -o '[2022]-[11]-[15]' | grep "celsius" | grep -E -o '[0-9]{1,2}.[0-9]{1,2}' > file.txt
API result
<product class="pointData">
<time datatype="forecast" from="2022-11-14T18:00:00Z" to="2022-11-14T18:00:00Z">
<location altitude="4" latitude="60.3913" longitude="5.3221">
<temperature id="TTT" unit="celsius" value="8.2"/>
<windDirection id="dd" deg="146.5" name="SE"/>
<windSpeed id="ff" mps="0.5" beaufort="1" name="Flau vind"/>
<windGust id="ff_gust" mps="1.2"/>
<humidity unit="percent" value="82.5"/>
<pressure id="pr" unit="hPa" value="1014.5"/>
<cloudiness id="NN" percent="45.1"/>
<fog id="FOG" percent="0.0"/>
<lowClouds id="LOW" percent="4.5"/>
<mediumClouds id="MEDIUM" percent="0.0"/>
<highClouds id="HIGH" percent="39.9"/>
<dewpointTemperature id="TD" unit="celsius" value="5.0"/>
</location>
</time>
<time datatype="forecast" from="2022-11-14T17:00:00Z" to="2022-11-14T18:00:00Z">
<location altitude="4" latitude="60.3913" longitude="5.3221">
<precipitation unit="mm" value="0.0" minvalue="0.0" maxvalue="0.0"/>
<symbol id="PartlyCloud" number="3" code="partlycloudy_night"/>
</location>
</time>
<time datatype="forecast" from="2022-11-14T19:00:00Z" to="2022-11-14T19:00:00Z">
<location altitude="4" latitude="60.3913" longitude="5.3221">
<temperature id="TTT" unit="celsius" value="8.7"/>
<windDirection id="dd" deg="112.5" name="SE"/>
<windSpeed id="ff" mps="0.4" beaufort="1" name="Flau vind"/>
<windGust id="ff_gust" mps="0.8"/>
<humidity unit="percent" value="75.6"/>
<pressure id="pr" unit="hPa" value="1013.8"/>
<cloudiness id="NN" percent="57.5"/>
<fog id="FOG" percent="0.0"/>
<lowClouds id="LOW" percent="1.1"/>
<mediumClouds id="MEDIUM" percent="0.4"/>
<highClouds id="HIGH" percent="55.4"/>
<dewpointTemperature id="TD" unit="celsius" value="4.4"/>
</location>
</time>
The output written to the file should be:
8.2
grep -A3 '2022-11-14' -m1 inputfile.txt | \
grep -P -o "<temperature.*celsius.*\"\K\-?[0-9]{1,2}\.[0-9]{1,2}"
8.2
-A3 print 3 lines after match
-m1 Stop after first match
-P use Perl regex
-o grep only the match
\K ignore what is before
\-? optionally match a leading - for negative temperatures
[0-9]{1,2}\.[0-9]{1,2} the temperature in celsius
You can also use xq:
$ date="2022-11-14"
$ xq -r '.product.time[0] | select (."@from" | contains("'$date'")) // null | '\
'.location|.temperature|(if ."@unit" == "celsius" then ."@value" else "error" end)' \
< input.html
8.2
Or, as @AndyLester said, using xpath:
$ date="2022-11-14"
$ xmllint --xpath '//time[starts-with(@from,"'$date'")][1]'\
'//temperature[@unit="celsius"]/@value' input.txt |\
grep -Po '[-]?\d+\.\d+'
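
Where neither xq nor xmllint is installed, the "date and temperature are on different lines" problem can also be handled by remembering the current time block in awk. A minimal sketch, assuming each element sits on its own line as in the API result above; the function name first_temp is hypothetical:

```shell
#!/bin/sh
# first_temp DAY [FILE...]: print the first celsius temperature found inside
# a <time> element whose from= attribute starts with DAY.
# Assumption: one element per line, as in the API result shown above.
first_temp() {
  day=$1; shift
  awk -v d="$day" '
    /<time /   { in_day = (index($0, "from=\"" d) > 0) }  # is this block for DAY?
    /<\/time>/ { in_day = 0 }
    in_day && /<temperature / && /unit="celsius"/ {
      if (match($0, /value="[^"]*"/)) {
        print substr($0, RSTART + 7, RLENGTH - 8)          # text between the quotes
        exit                                               # stop after the first hit
      }
    }
  ' "$@"
}

# Demo with two lines taken from the API result
first_temp 2022-11-14 <<'EOF'
<time datatype="forecast" from="2022-11-14T18:00:00Z" to="2022-11-14T18:00:00Z">
<temperature id="TTT" unit="celsius" value="8.2"/>
</time>
EOF
```

The demo prints 8.2. To write the result to a file, redirect the call, for example first_temp 2022-11-14 inputfile.txt > file.txt.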

Data Mining understanding ``` join aux_data.txt aux_cat.txt --header -1 1 -2 1 -t '|' -a 1 > aux_ticdata1.txt```

I'm starting with data mining on the terminal. Can anyone explain what the line join aux_data.txt aux_cat.txt --header -1 1 -2 1 -t '|' -a 1 > aux_ticdata1.txt does? I know that it joins two files and names the result "aux_ticdata1.txt". I also know that the line performs two consecutive instructions. But I don't understand what -a 1 > aux_ticdata1.txt does. Any suggestions would be great!
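
One way to see what each part does is to run the same options on two tiny files. The sketch below assumes GNU join (for --header) and uses made-up file contents; only the file names come from the question. Note that -a 1 and > aux_ticdata1.txt are unrelated: the first is a join option, the second is shell redirection.

```shell
#!/bin/sh
# demo_join: run the join command from the question on two small sample files.
# Assumptions: GNU join (for --header); the sample rows are invented.
demo_join() {
  dir=$(mktemp -d) || return 1
  printf '%s\n' 'id|value' '1|apple' '2|banana' '3|cherry' > "$dir/aux_data.txt"
  printf '%s\n' 'id|category' '1|fruit' '3|stone'           > "$dir/aux_cat.txt"

  # --header  : treat the first line of each file as a header, not as data
  # -1 1 -2 1 : join on field 1 of file 1 and field 1 of file 2
  # -t '|'    : fields are separated by '|'
  # -a 1      : also print lines of file 1 that have no partner in file 2
  # > file    : shell redirection (not a join option); saves the output
  join --header -1 1 -2 1 -t '|' -a 1 \
       "$dir/aux_data.txt" "$dir/aux_cat.txt" > "$dir/aux_ticdata1.txt"

  cat "$dir/aux_ticdata1.txt"
  rm -rf "$dir"
}

demo_join
```

Row 2 has no match in aux_cat.txt, yet it still appears in the result (as 2|banana, without a category) because of -a 1. Without that option it would be dropped. Both inputs must be sorted on the join field for join to pair lines correctly.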

shell - How to match content between xml tags?

I have this file:
<?xml version="1.0" encoding="utf-8"?>
<response>
<Count>1</Count>
<Messages>
<Message>
<Smstat>0</Smstat>
<Index>40001</Index>
<Phone>234</Phone>
<Content>Poin Bonstri kamu: 358
Sisa Kuota kamu :
Kuota WA.Line 18 MB s.d 06/08/2019 19:33:46
Kuota Reguler 1478 MB s.d 02/08/2019 05:36:44
Temukan beragam paket lain di bima+ https://goo.gl/RQ1DBA</Content>
<Date>2019-08-01 13:28:04</Date>
<Sca></Sca>
<SaveType>4</SaveType>
<Priority>0</Priority>
<SmsType>2</SmsType>
</Message>
</Messages>
</response>
I want to match the text between <Content> and </Content>. I've tried:
tr '\n' ' ' < input_file | grep -E "^<Content>.*</Content>$"
But it doesn't work. Please note that I use the ash shell instead of bash. How do I do this?
If you have a PCRE-capable grep, you could use positive lookbehind and lookahead:
$ tr '\n' ' ' < file | grep -Po "(?<=<Content>).*(?=</Content>)"
Output:
Poin Bonstri kamu: 358 Sisa Kuota kamu : Kuota WA.Line 18 MB s.d 06/08/2019 19:33:46 Kuota Reguler 1478 MB s.d 02/08/2019 05:36:44 Temukan beragam paket lain di bima+ https://goo.gl/RQ1DBA
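
grep -P is a GNU extension and may be missing from the grep that ships with ash-based systems. A possible fallback that uses only tr and basic sed is sketched below; the function name extract_content is hypothetical, and the sketch assumes the input holds a single <Content> element (with several, the greedy match keeps only the last one).

```shell
#!/bin/sh
# extract_content [FILE...]: print the text between <Content> and </Content>
# on one line, using only tr and basic sed (no grep -P needed).
# Assumption: the input contains a single <Content> element.
extract_content() {
  cat "$@" |
    { tr '\n' ' '; echo; } |                        # join lines, then end with a newline
    sed -n 's/.*<Content>\(.*\)<\/Content>.*/\1/p'  # keep only the captured text
}

# Demo on a shortened message
extract_content <<'EOF'
<Message>
<Content>Poin Bonstri kamu: 358
Sisa Kuota kamu :</Content>
<Date>2019-08-01 13:28:04</Date>
</Message>
EOF
```

The demo prints "Poin Bonstri kamu: 358 Sisa Kuota kamu :" on a single line. The original attempt failed because the anchors ^ and $ required the whole joined line to start with <Content> and end with </Content>.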

Simulating User Interaction In Gromacs in Bash

I am currently doing parallel cascade simulations in GROMACS 4.6.5 and I am inputting the commands using a bash script:
#!/bin/bash
pdb2gmx -f step_04_01.pdb -o step_04_01.gro -water none -ff amber99sb -ignh
grompp -f minim.mdp -c step_04_01.gro -p topol.top -o em.tpr
mdrun -v -deffnm em
grompp -f nvt.mdp -c em.gro -p topol.top -o nvt.tpr
mdrun -v -deffnm nvt
grompp -f md.mdp -c nvt.gro -t nvt.cpt -p topol.top -o step_04_01.tpr
mdrun -v -deffnm step_04_01
trjconv -s step_04_01.tpr -f step_04_01.xtc -pbc mol -o step_04_01_pbc.xtc
g_rms -s itasser_2znh.tpr -f step_04_01_pbc.xtc -o step_04_01_rmsd.xvg
Commands such as trjconv and g_rms require user interaction to select options. For instance when running trjconv you are given:
Select group for output
Group 0 ( System) has 6241 elements
Group 1 ( Protein) has 6241 elements
Group 2 ( Protein-H) has 3126 elements
Group 3 ( C-alpha) has 394 elements
Group 4 ( Backbone) has 1182 elements
Group 5 ( MainChain) has 1577 elements
Group 6 ( MainChain+Cb) has 1949 elements
Group 7 ( MainChain+H) has 1956 elements
Group 8 ( SideChain) has 4285 elements
Group 9 ( SideChain-H) has 1549 elements
Select a group:
And the user is expected to enter, e.g., 0 into the terminal to select Group 0. I have tried using expect and send, e.g.:
trjconv -s step_04_01.tpr -f step_04_01.xtc -pbc mol -o step_04_01_pbc.xtc
expect "Select group: "
send "0"
However this does not work. I have also tried using -flag like in http://www.gromacs.org/Documentation/How-tos/Using_Commands_in_Scripts#Within_Script but it says that it is not a recognised input.
Is my expect/send formatted correctly? Is there another way around this in GROMACS?
I don't know GROMACS, but I think they are just asking you to use the bash here-document syntax:
yourcommand ... <<EOF
1st answer to a question
2nd answer to a question
EOF
so you might have
trjconv -s step_04_01.tpr -f step_04_01.xtc -pbc mol -o step_04_01_pbc.xtc <<EOF
0
EOF
You can use
echo 0 | trjconv -s step_04_01.tpr -f step_04_01.xtc -pbc mol -o step_04_01_pbc.xtc
And if you need to have multiple inputs, just use
echo 4 4 | g_rms -s itasser_2znh.tpr -f step_04_01_pbc.xtc -o step_04_01_rmsd.xvg
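
Both suggestions work because the tool reads its answers from standard input, so anything that supplies stdin can stand in for the keyboard. (expect and send are commands of the separate expect program, not bash built-ins, which is why they do nothing useful inside a plain bash script.) The sketch below shows the idea with pick_group, a hypothetical stand-in for an interactive tool; it is not part of GROMACS.

```shell
#!/bin/sh
# pick_group: hypothetical stand-in for an interactive tool such as trjconv.
# It prompts on stderr and reads one answer per prompt from stdin.
pick_group() {
  printf 'Select a group: ' >&2
  read -r first
  printf 'Select a group: ' >&2
  read -r second
  echo "selected $first and $second"
}

# Answers supplied by a here-document, one per line ...
pick_group <<EOF
4
1
EOF

# ... or the same answers supplied through a pipe.
printf '%s\n' 4 1 | pick_group
```

Each call prints "selected 4 and 1". Using printf '%s\n' puts every answer on its own line, which suits tools that read line by line.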

How to grep from the output and write the greped text to a file

I want to grep only the text from http onward on every line and write it to a file.
This is the current output from the output stream:
References
1. https://soundcloud.com/sc-opensearch.xml
2. https://m.soundcloud.com/search/sounds?q=L AME IMMORTELLE
3. https://soundcloud.com/
4. http://www.enable-javascript.com/
5. https://soundcloud.com/search
6. https://soundcloud.com/search/sounds
7. https://soundcloud.com/search/sets
8. https://soundcloud.com/search/people
9. https://soundcloud.com/search/groups
10. https://soundcloud.com/thomas-rainer/l-ame-immortelle-banish
11. https://soundcloud.com/outtamyndxmetal-llc/lame-immortelle-the-heart
12. https://soundcloud.com/cyberdelic-mind/l-me-immortelle-dark-mix-i
13. https://soundcloud.com/sawthinzarhtaik/dort-drauben
14. https://soundcloud.com/lagrima-negra/lagrima-tears-in-the-rain
15. https://soundcloud.com/bathony/in-strict-confidence-zauberschlos-lame-immortelle-version
16. https://soundcloud.com/jubej-thos/sirius-5-jahre-lame-immortelle
17. https://soundcloud.com/virul3nt/lamme-immortelle-sag-mir-wann-shiv-r-remix
18. https://soundcloud.com/outtamyndxmetal-llc/lame-immortelle-no-goodbye
19. https://soundcloud.com/usefulrage/das-ich-dem-ich-den-traum
20. http://help.soundcloud.com/customer/portal/articles/552882-the-site-won-t-load-for-me-all-i-see-is-the-soundcloud-logo-what-can-i-do-
21. http://google.com/chrome
22. http://firefox.com/
23. http://apple.com/safari
24. http://windows.microsoft.com/ie
25. http://help.soundcloud.com/
and my current code, which does not grep correctly, is below:
lynx --dump -listonly https://soundcloud.com/search/sounds?q=L%20AME%20IMMORTELLE | \
tr "\t\r\n'" ' "' | \
grep -i -o 'http......HERE I NEED THE GREP STUFF' | \
sed -e 's/^.*"\([^"]\+\)".*$/\1/g' \ >k.txt
You can use grep -E:
grep -i -oE 'https?://soundcloud\.com[^[:blank:]]*'
It worked with
lynx --dump -listonly https://soundcloud.com/search/sounds?q=L%20AME%20IMMORTELLE | \
tr "\t\r\n'" ' "' | \
grep -i -oE 'https?://[^[:blank:]]+' | \
sed -e 's/^.*"\([^"]\+\)".*$/\1/g' \
>k.txt
I got the appropriate output:
https://soundcloud.com/sc-opensearch.xml
https://m.soundcloud.com/search/sounds?q=L
https://soundcloud.com/
http://www.enable-javascript.com/
https://soundcloud.com/search
https://soundcloud.com/search/sounds
https://soundcloud.com/search/sets
https://soundcloud.com/search/people
https://soundcloud.com/search/groups
https://soundcloud.com/thomas-rainer/l-ame-immortelle-banish
https://soundcloud.com/outtamyndxmetal-llc/lame-immortelle-the-heart
https://soundcloud.com/cyberdelic-mind/l-me-immortelle-dark-mix-i
https://soundcloud.com/sawthinzarhtaik/dort-drauben
https://soundcloud.com/lagrima-negra/lagrima-tears-in-the-rain
https://soundcloud.com/bathony/in-strict-confidence-zauberschlos-lame-immortelle-version
https://soundcloud.com/jubej-thos/sirius-5-jahre-lame-immortelle
https://soundcloud.com/virul3nt/lamme-immortelle-sag-mir-wann-shiv-r-remix
https://soundcloud.com/outtamyndxmetal-llc/lame-immortelle-no-goodbye
https://soundcloud.com/usefulrage/das-ich-dem-ich-den-traum
http://help.soundcloud.com/customer/portal/articles/552882-the-site-won-t-load-for-me-all-i-see-is-the-soundcloud-logo-what-can-i-do-
http://google.com/chrome
http://firefox.com/
http://apple.com/safari
http://windows.microsoft.com/ie
http://help.soundcloud.com/
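
Note that the second URL in this output is cut off at the first space (q=L), because [^[:blank:]] stops there. If the full line is wanted, one alternative is to strip the list numbering instead of matching the URL. A minimal sketch, assuming the "N. URL" layout shown in the question and applied directly to the lynx output (before any tr step); the function name list_urls is hypothetical:

```shell
#!/bin/sh
# list_urls [FILE...]: remove the "N. " numbering from a lynx -listonly dump
# and print the rest of each line, so URLs that contain spaces stay intact.
# Assumption: every link line looks like "<spaces>N. http(s)://...".
list_urls() {
  sed -n 's|^ *[0-9]\{1,\}\. \(https\{0,1\}://.*\)$|\1|p' "$@"
}

# Demo on three lines in the format shown in the question
printf '%s\n' \
  'References' \
  '   2. https://m.soundcloud.com/search/sounds?q=L AME IMMORTELLE' \
  '   4. http://www.enable-javascript.com/' | list_urls
```

The demo prints the two URLs, the first one including "L AME IMMORTELLE", and skips the References heading. Redirect the call with > k.txt to save the list.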

Resources