Convert XMLdata to facts

Part of my config-file is as follows:
<factFile name="Apps.xml">
  <directory>/home/<account>/Werk/Divers/Prolog/XMLdata/</directory>
  <field>apl_id</field>
  <field>dns_id</field>
  <field>apl_naam_kort</field>
</factFile>
<factFile name="Dienst.xml">
  <directory>/home/<account>/Werk/Divers/Prolog/XMLdata/</directory>
  <field>dns_id</field>
  <field>dns_afkorting</field>
  <field>dns_naam</field>
</factFile>
Each factFile is created by MySQL (mysql -u username -p -X -e 'use schema; select-statement' > Apps.xml).
The number of factFiles can change, as can the number of fields.
What I want is to convert the content (values) of each data file to facts. So
<row>
  <field name="apl_id">1</field>
  <field name="dns_id">7</field>
  <field name="apl_naam_kort">Risk</field>
</row>
should be converted to
assertz(apps(1, 7, 'Risk')).
What is the best approach to achieve this?

Since the number of fields may change per row and you are using SWI-Prolog, I believe that using library record (Wielemaker & O'Keefe) is a good approach here. It allows a subset of predicate arguments to be specified and performs type checking, catching some potential errors early on.
Since I do not know your XML Schema here, I have specified 3 sample field arguments:
apl_id of type integer.
dns_id with default value 0 and of type non-negative integer (i.e., nonneg).
apl_naam_kort of type atom.
It is easy to extend the record/1 declaration with additional field names. The arity of the dynamic/1 declaration would have to be upped accordingly.
Since SWI-Prolog comes with very good Web standards support (wise choice to use SWI for this!) it is straightforward to load XML DOM from file(s) (i.e., load_xml/3) and match rows and fields using XPath-like statements (i.e., xpath/3).
:- module(fact_file, [load_fact_file/1]).

:- use_module(library(record)).
:- use_module(library(sgml)).
:- use_module(library(xpath)).

% Generates make_apps/2; fields without a default value stay unbound.
:- record(apps(apl_id:integer, dns_id:nonneg=0, apl_naam_kort)).
:- dynamic(apps/3).

% Assert one apps/3 fact for every <row> element in File.
load_fact_file(File):-
  load_xml(File, Dom, []),
  forall(
    xpath(Dom, //row, Row),
    (
      % Collect each field as a Name(Value) term, as make_apps/2 expects.
      findall(
        NVPair,
        (
          xpath(Row, //field(@name=Name,text), Value1),
          value_conversion(Value1, Value2),
          NVPair =.. [Name,Value2]
        ),
        NVPairs
      ),
      make_apps(NVPairs, Apps),
      assertz(Apps)
    )
  ).

% Convert numeric text to a number; leave anything else as an atom.
value_conversion(Atom, Number):-
  atom_number(Atom, Number), !.
value_conversion(Atom, Atom).
Example use:
?- load_fact_file('<FILE-PATH>/test.xml').
true.
?- listing(apps).
:- dynamic fact_file:apps/3.
fact_file:apps(1, 7, 'Risk').
fact_file:apps(_, 0, 'Low Risk').
fact_file:apps(1, 7, _).
Contents of file test.xml:
<table>
  <row>
    <field name="apl_id">1</field>
    <field name="dns_id">7</field>
    <field name="apl_naam_kort">Risk</field>
  </row>
  <row>
    <field name="apl_naam_kort">Low Risk</field>
  </row>
  <row>
    <field name="apl_id">1</field>
    <field name="dns_id">7</field>
  </row>
</table>
Notice that missing arguments for which we did not specify a default value now appear as unnamed variables. This is because Prolog has no null value.
Possible improvements w.r.t. the above code:
Integrate value conversion into library record.
Allow fields in library record to be specified in pair notation (i.e., Name-Value or Name=Value) in addition to predicate notation (i.e., Name(Value)). This allows us to leave out code line NVPair =.. [Name,Value2].
It is possible to update the record/1 declaration dynamically. This may be needed in case the set of field names is very large, not known in advance, and/or changing over time.
If an XML Schema is given that uses XML Schema Datatypes (XSD) the value conversions can be automatically derived, e.g., xsd:nonNegativeInteger -> nonneg.

Related

Is it possible to select the properties of a node with XPath?

I have an XML of the form:
<articleslist>
  <articles>
    <originalId>507948</originalId>
    <title>Hogan Lovells Training Contract</title>
    <slug>hogan-lovells-training-contract</slug>
    <metaTitle>Hogan Lovells Training Contract</metaTitle>
    <metaDescription>Find out about the Hogan Lovells Training Contract and Application Process</metaDescription>
    <language>en</language>
    <disableAds>false</disableAds>
    <shortUrl>false</shortUrl>
    <category_slug>law</category_slug>
    <subcategory_slug>industry</subcategory_slug>
    <updatedAt>2021-03-15T18:38:51.058+00:00</updatedAt>
    <createdAt>2018-11-29T06:42:51.665+00:00</createdAt>
  </articles>
</articleslist>
I'm able to select the row values with the XPath //articles.
How can I select the child properties of articles (i.e. the column headings), so that I get back a list of the form:
originalId
title
slug
etc...

Depends on your XPath version.
In XPath 2.0 it's simply //articles/*/name()
In 1.0 it's not possible because there's no such data type as a "sequence of strings". You would have to return the set of elements as //articles/*, and then extract their names in the calling program.
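The XPath 1.0 route described above (select the elements, then read their names in the calling program) might look like the following minimal sketch. It assumes Python's standard xml.etree.ElementTree as the calling program and a shortened copy of the sample document; the helper name child_names is made up for illustration.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the sample document (assumption: three children are enough to illustrate).
XML = """<articleslist>
  <articles>
    <originalId>507948</originalId>
    <title>Hogan Lovells Training Contract</title>
    <slug>hogan-lovells-training-contract</slug>
  </articles>
</articleslist>"""


def child_names(xml_text):
    """Return the names of all child elements of <articles>, in document order."""
    root = ET.fromstring(xml_text)
    # ".//articles/*" is the ElementTree spelling of the XPath 1.0 query //articles/*
    return [child.tag for child in root.findall(".//articles/*")]


names = child_names(XML)
print(names)
```

If the document holds several articles elements, the list repeats the names once per element; removing duplicates is left to the caller.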

XPath based on node indexes only

I have an XML document:
<Section>
  <Paragraph>
    <Text>t1</Text>
    <Text>t2</Text>
  </Paragraph>
  <Paragraph>
    <Text>t3</Text>
    <Text>t4</Text>
  </Paragraph>
</Section>
and I know only the element indexes, e.g. /0/1/0, i.e. the first Section, its second Paragraph, and that Paragraph's first Text. How can I translate '0/1/0' into a valid XPath that returns the element containing t3?
Note that I don't know the element names, because they can differ; I only know the sequence of indexes, as in the example above.
Many thanks

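For the example given this will work:
/element()[1]/element()[2]/element()[1]/text()
Note that element() is an XPath 2.0 node test; in XPath 1.0 use * instead (/*[1]/*[2]/*[1]). The trailing /text() selects the text node t3; drop it to get the Text element itself.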
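Since XPath positions are 1-based while the index path in the question is 0-based, the translation is a matter of adding one to each step. The sketch below, in Python with the standard xml.etree.ElementTree, builds the XPath string and also follows the same path by hand; the function names index_path_to_xpath and resolve are hypothetical. The hand-written walk is used because ElementTree's own XPath subset counts a position predicate among same-named siblings, which is not what a name-agnostic index needs.

```python
import xml.etree.ElementTree as ET

XML = """<Section>
  <Paragraph><Text>t1</Text><Text>t2</Text></Paragraph>
  <Paragraph><Text>t3</Text><Text>t4</Text></Paragraph>
</Section>"""


def parse_steps(index_path):
    """Split '0/1/0' (or '/0/1/0') into a list of 0-based integer steps."""
    return [int(part) for part in index_path.strip("/").split("/")]


def index_path_to_xpath(index_path):
    """Turn a 0-based index path into a 1-based, name-agnostic XPath."""
    return "".join(f"/*[{step + 1}]" for step in parse_steps(index_path))


def resolve(root, index_path):
    """Follow the index path by hand, without an XPath engine."""
    steps = parse_steps(index_path)
    if steps[0] != 0:
        raise IndexError("a document has exactly one root element")
    node = root
    for step in steps[1:]:
        node = list(node)[step]  # child elements in document order, whatever their names
    return node


xpath = index_path_to_xpath("0/1/0")
target = resolve(ET.fromstring(XML), "0/1/0")
print(xpath, target.tag, target.text)
```

The generated string uses * rather than element(), so it should be usable with XPath 1.0 engines as well.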

Simplify specific XPath expression

I would like to know if the following XPath expression can be simplified:
//map[requester/@type='2' and requester/code]
Some test data:
<root>
  <map>
    <requester type="2">
      <code>a</code>
      <code>b</code>
    </requester>
  </map>
  ...
</root>
My objective is to get only map elements which have at least one requester with type attribute and value '2' and also have at least one code element.

For your use case, this is probably as simple as it can be. However, it doesn't match what you describe.
Here you are selecting map elements where:
(1) There is a requester element with a type attribute equal to 2
(2) There is a requester element with a code element
The requester elements in (1) and (2) are not necessarily the same.
For example, the map element in the following is selected:
<root>
  <map>
    <requester type="2"/>
    <requester>
      <code>a</code>
    </requester>
  </map>
</root>
If you want the elements in (1) and (2) to be the same, you should use (simplified slightly at the suggestion of kjhughes)
//map[requester[@type='2']/code]
Here we select all map elements which have a requester element that has both a type attribute with a value of 2 and a code child element.
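The difference between the two expressions can be checked with a small sketch in Python's standard xml.etree.ElementTree. Its XPath subset has no "and" and no nested predicates, so each expression is spelled out as a helper function; the id attributes and the names loose and strict are added here purely for illustration.

```python
import xml.etree.ElementTree as ET

# Two maps: in "same" one requester satisfies both tests,
# in "split" the two tests are satisfied by different requesters.
XML = """<root>
  <map id="same">
    <requester type="2"><code>a</code></requester>
  </map>
  <map id="split">
    <requester type="2"/>
    <requester><code>a</code></requester>
  </map>
</root>"""

root = ET.fromstring(XML)


def loose(m):
    # Mirrors map[requester/@type='2' and requester/code]: two independent tests.
    return (m.find("requester[@type='2']") is not None
            and m.find("requester/code") is not None)


def strict(m):
    # Mirrors map[requester[@type='2']/code]: one requester must pass both tests.
    return m.find("requester[@type='2']/code") is not None


loose_ids = [m.get("id") for m in root.iter("map") if loose(m)]
strict_ids = [m.get("id") for m in root.iter("map") if strict(m)]
print(loose_ids, strict_ids)
```

Only the first form accepts the "split" map, which is the case the answer warns about.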

Does xpath support "or" function

In the case below, the two elements do not appear at the same time:
<a title='a' />
<b title='b' />
I want to check whether either of them is present.
Does XPath support an 'or' function? I just want to write it in one line:
//a[@title='a'] or .. @title='b' ??

XPath Operators
Select either matching nodes (your case here):
//a[@title='a'] | //b[@title='b']
Select one element with either matching attribute:
//a[@title='a' or @title='b']

If you want to match either <a/> elements with the @title='a' attribute or <b/> elements with the @title='b' attribute, you can also match all elements and perform a test on their name:
//*[local-name(.) = 'a' and @title='a' or local-name(.) = 'b' and @title='b']
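To see what the two kinds of "or" select, here is a rough sketch in Python's standard xml.etree.ElementTree. That module's XPath subset offers neither | nor or, so both tests are written as plain Python over root.iter(), in the spirit of the //*[...] expression above; the sample document and the function names are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

XML = """<root>
  <b title="b"/>
  <a title="a"/>
  <a title="b"/>
  <b title="x"/>
</root>"""

root = ET.fromstring(XML)

# (element name, title value) pairs accepted by //a[@title='a'] | //b[@title='b']
WANTED = {("a", "a"), ("b", "b")}


def either_element(root):
    """Elements matched by the union; root.iter() walks every element in document order."""
    return [el for el in root.iter() if (el.tag, el.get("title")) in WANTED]


def a_with_either_title(root):
    """Elements matched by //a[@title='a' or @title='b']."""
    return [el for el in root.iter("a") if el.get("title") in ("a", "b")]


union = [(el.tag, el.get("title")) for el in either_element(root)]
a_only = [(el.tag, el.get("title")) for el in a_with_either_title(root)]
print(union, a_only)
```

An empty result list from either_element answers the original question: neither element is present.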

XPATH -- Result order defined by query

I have an xpath-expression like this:
element[@attr="a"] | element[@attr="b"] | element[@attr="c"] | … which is an »or« statement. Can I create an expression that guarantees the results appear in the order given in the query, even if the elements appear in a different order in the document?
For example, a document fragment in this order:
<doc>
  <element attr="c" />
  <element attr="b" />
  <element attr="a" />
  .
  .
  .
</doc>
and a result list ordered like this:
[0] <element attr="a" />
[1] <element attr="b" />
[2] <element attr="c" />
.
.
.

The | operator computes the union of its operands. With XPath 1.0 you simply get a set of nodes whose order is undefined, though most XPath APIs then return the result in document order, or let you say which order you want or whether order matters (see for instance http://www.w3.org/TR/DOM-Level-3-XPath/xpath.html#XPathResult).
With XPath 2.0 the union operator gives you a sequence of nodes in document order. If you want the order of your subexpressions instead, use the comma operator rather than the union operator, i.e. element[@attr="a"] , element[@attr="b"] , element[@attr="c"].

can I create an expression that guarantees the result to appear in the
order as in the query, even if the elements appear in a different
order in the document?
Not with any XPath 1.0 engine -- they return the resulting XmlNodeList in document order.
With XPath 2.0 one can specify that a sequence is to be returned, using the comma , operator, like this:
element[@attr="a"] , element[@attr="b"] , element[@attr="c"]
Finally, if you are limited to an XPath 1.0 implementation, one way of getting the results in the desired order is to evaluate these three XPath expressions separately:
element[@attr="a"]
element[@attr="b"]
element[@attr="c"]
Then you can access the results of the first expression first, those of the second expression second, and those of the third expression third.
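The XPath 1.0 workaround of running one query per value and concatenating the results in query order could be sketched as follows, using Python's standard xml.etree.ElementTree. The helper name select_in_query_order is hypothetical, and the sketch assumes the attribute values contain no quote characters.

```python
import xml.etree.ElementTree as ET

XML = """<doc>
  <element attr="c"/>
  <element attr="b"/>
  <element attr="a"/>
</doc>"""


def select_in_query_order(root, values):
    """Evaluate one expression per value and concatenate the results in the order given."""
    results = []
    for value in values:
        # Each individual query still returns its own matches in document order.
        results.extend(root.findall(f"element[@attr='{value}']"))
    return results


doc = ET.fromstring(XML)
ordered = [el.get("attr") for el in select_in_query_order(doc, ["a", "b", "c"])]
print(ordered)
```

Elements that match the same value keep their document order relative to each other; only the grouping follows the query.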
