SQL 'FOR XML' query

I am trying to write a SQL 'FOR XML' query that produces an XML block in a specific format. The query I have so far is close, but I am having problems getting it to produce the exact XML format that I need. I am hoping someone on here can help me.
Using the following SQL, I populate the table against which the FOR XML query is run:
CREATE TABLE PerfTable
(
ID INT NOT NULL,
Name VARCHAR(500) NOT NULL,
P_Performance1 NUMERIC(10,2),
B_Performance1 NUMERIC(10,2),
P_Performance2 NUMERIC(10,2),
B_Performance2 NUMERIC(10,2),
P_Performance3 NUMERIC(10,2),
B_Performance3 NUMERIC(10,2)
);
insert PerfTable(id, Name, P_Performance1, B_Performance1, P_Performance2,
B_Performance2, P_Performance3, B_Performance3)
values (111, 'Item1', -0.111, -0.112, -0.121, -0.122, -0.131, -0.132)
insert PerfTable(id, Name, P_Performance1, B_Performance1, P_Performance2,
B_Performance2, P_Performance3, B_Performance3)
values (222, 'Item2', -0.211, -0.212, -0.221, -0.222, -0.231, -0.232)
insert PerfTable(id, Name, P_Performance1, B_Performance1, P_Performance2,
B_Performance2, P_Performance3, B_Performance3)
values (333, 'Item3', -0.311, -0.312, -0.321, -0.322, -0.331, -0.332)
Then I run the following query:
SELECT
id, Name,
period as "Period_Performance/#Period",
F_Perf as "Period_Performance/F_Perf",
B_Perf as "Period_Performance/B_Perf"
FROM
(SELECT
pt.id,
pt.Name,
pt.P_Performance1 ,
pt.B_Performance1,
'WTD' as Period1,
pt.P_Performance2 ,
pt.B_Performance2,
'MTD' as Period3,
pt.P_Performance3 ,
pt.B_Performance3,
'YTD' as Period2
FROM PerfTable pt) a
UNPIVOT
(F_Perf FOR F IN
(P_Performance1,P_Performance2,P_Performance3)
) AS Fund_unpvt
UNPIVOT
(B_Perf FOR B IN
(B_Performance1,B_Performance2,B_Performance3)
) AS bmk_unpvt
UNPIVOT
(period FOR periods IN
(Period1,Period2, Period3)
) AS period_unpvt
WHERE
(RIGHT(F,1) = RIGHT(B,1))
AND (RIGHT(F,1) = RIGHT(periods,1))
FOR XML PATH('Performance')
This query produces the following XML:
<Performance>
<id>111</id>
<Name>Item1</Name>
<Period_Performance Period="WTD">
<F_Perf>-0.11</F_Perf>
<B_Perf>-0.11</B_Perf>
</Period_Performance>
</Performance>
<Performance>
<id>111</id>
<Name>Item1</Name>
<Period_Performance Period="YTD">
<F_Perf>-0.12</F_Perf>
<B_Perf>-0.12</B_Perf>
</Period_Performance>
</Performance>
<Performance>
<id>111</id>
<Name>Item1</Name>
<Period_Performance Period="MTD">
<F_Perf>-0.13</F_Perf>
<B_Perf>-0.13</B_Perf>
</Period_Performance>
</Performance>
<Performance>
<id>222</id>
<Name>Item2</Name>
<Period_Performance Period="WTD">
<F_Perf>-0.21</F_Perf>
<B_Perf>-0.21</B_Perf>
</Period_Performance>
</Performance>
<Performance>
<id>222</id>
<Name>Item2</Name>
<Period_Performance Period="YTD">
<F_Perf>-0.22</F_Perf>
<B_Perf>-0.22</B_Perf>
</Period_Performance>
</Performance>
<Performance>
<id>222</id>
<Name>Item2</Name>
<Period_Performance Period="MTD">
<F_Perf>-0.23</F_Perf>
<B_Perf>-0.23</B_Perf>
</Period_Performance>
</Performance>
<Performance>
<id>333</id>
<Name>Item3</Name>
<Period_Performance Period="WTD">
<F_Perf>-0.31</F_Perf>
<B_Perf>-0.31</B_Perf>
</Period_Performance>
</Performance>
<Performance>
<id>333</id>
<Name>Item3</Name>
<Period_Performance Period="YTD">
<F_Perf>-0.32</F_Perf>
<B_Perf>-0.32</B_Perf>
</Period_Performance>
</Performance>
<Performance>
<id>333</id>
<Name>Item3</Name>
<Period_Performance Period="MTD">
<F_Perf>-0.33</F_Perf>
<B_Perf>-0.33</B_Perf>
</Period_Performance>
</Performance>
The XML that I need to produce is below:
<Performance>
<id>1</id>
<Name>Item1</Name>
<Period_Performance Period="WTD">
<F_Perf>-0.11</F_Perf>
<B_Perf>-0.11</B_Perf>
</Period_Performance>
<Period_Performance Period="YTD">
<F_Perf>-0.12</F_Perf>
<B_Perf>-0.12</B_Perf>
</Period_Performance>
<Period_Performance Period="MTD">
<F_Perf>-0.13</F_Perf>
<B_Perf>-0.13</B_Perf>
</Period_Performance>
</Performance>
<Performance>
<id>2</id>
<Name>Item2</Name>
<Period_Performance Period="WTD">
<F_Perf>-0.21</F_Perf>
<B_Perf>-0.21</B_Perf>
</Period_Performance>
<Period_Performance Period="YTD">
<F_Perf>-0.22</F_Perf>
<B_Perf>-0.22</B_Perf>
</Period_Performance>
<Period_Performance Period="MTD">
<F_Perf>-0.23</F_Perf>
<B_Perf>-0.23</B_Perf>
</Period_Performance>
</Performance>
<Performance>
<id>3</id>
<Name>Item3</Name>
<Period_Performance Period="WTD">
<F_Perf>-0.31</F_Perf>
<B_Perf>-0.31</B_Perf>
</Period_Performance>
<Period_Performance Period="YTD">
<F_Perf>-0.32</F_Perf>
<B_Perf>-0.32</B_Perf>
</Period_Performance>
<Period_Performance Period="MTD">
<F_Perf>-0.33</F_Perf>
<B_Perf>-0.33</B_Perf>
</Period_Performance>
</Performance>
Any help in creating the desired XML is greatly appreciated.
Thanks

Resolved.
Thanks Marc. I have been able to solve this issue. What I needed was one Performance block per id, with repeating Period_Performance (WTD, MTD, YTD, etc.) blocks within the Performance block. You might notice that in the 'before' XML there is one Period_Performance block per Performance block. Thanks anyway. B
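For anyone after the same shape: the usual pattern is a correlated subquery with FOR XML PATH and the TYPE directive, so the per-period rows nest inside each Performance element. A minimal sketch, assuming the unpivoted rows (id, period, F_Perf, B_Perf) are exposed as a view or CTE named PerfRows (a hypothetical name):
SELECT
    pt.id,
    pt.Name,
    -- correlated subquery: one nested Period_Performance element per period for this id
    (SELECT
        pr.period AS "@Period",
        pr.F_Perf,
        pr.B_Perf
    FROM PerfRows pr -- hypothetical view/CTE holding the unpivoted rows
    WHERE pr.id = pt.id
    FOR XML PATH('Period_Performance'), TYPE)
FROM PerfTable pt
FOR XML PATH('Performance')
The TYPE directive is what keeps the subquery result as nested XML rather than an escaped string.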

Related

ASH: IN_PARSE and IN_SQL_EXECUTION simultaneously?

Can someone please point me towards documentation on the V$ACTIVE_SESSION_HISTORY columns IN_PARSE, IN_HARD_PARSE and IN_SQL_EXECUTION?
Version is 11.2.0.3
I have a TMPDLT query of an aggregate-mview fast refresh (http://www.oaktable.net/content/fast-refresh-aggregate-only-materialized-views-%E2%80%93-introduction) that stalls other queries via a library cache lock for over 20 minutes on 2 out of 7 days.
In ASH, this TMPDLT query shows as IN_PARSE, IN_HARD_PARSE and IN_SQL_EXECUTION simultaneously for 20 minutes. According to SQL_PLAN_OPERATION and SQL_PLAN_OPTIONS it is doing a TABLE ACCESS FULL on the one MLOG table it needs, and according to FILE# it is in the UNDO tablespace.
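For reference, these flags are ordinary columns of V$ACTIVE_SESSION_HISTORY, so the overlap can be seen with a query along these lines (a sketch; the SQL_ID is a placeholder):
-- Show the parse/execution flags for one statement over time
SELECT sample_time,
    session_id,
    in_parse,
    in_hard_parse,
    in_sql_execution,
    sql_plan_operation,
    sql_plan_options
FROM v$active_session_history
WHERE sql_id = 'abcd1234efgh' -- placeholder SQL_ID
ORDER BY sample_time;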
My question: Have you ever seen sessions in ASH being IN_PARSE, IN_HARD_PARSE and IN_SQL_EXECUTION simultaneously? For 20 minutes? Is this normal? What does it mean? How can it already be doing a full table scan and still be in hard parse? Is this maybe related to the RESULT_CACHE hint in the query? Or an effect of dynamic sampling?
Many thanks!
Update for BobC:
Plan of the query at the time the problem happened:
OTHER_XML, which is not fully shown above:
<other_xml>
<info type="db_version">11.2.0.3</info>
<info type="parse_schema">
<![CDATA["SYS"]]>
</info>
<info type="plan_hash">2866394291</info>
<info type="plan_hash_2">1823969956</info>
<outline_data>
<hint>
<![CDATA[IGNORE_OPTIM_EMBEDDED_HINTS]]>
</hint>
<hint>
<![CDATA[OPTIMIZER_FEATURES_ENABLE('11.2.0.3')]]>
</hint>
<hint>
<![CDATA[DB_VERSION('11.2.0.3')]]>
</hint>
<hint>
<![CDATA[OUTLINE_LEAF(#"SEL$335DD26A")]]>
</hint>
<hint>
<![CDATA[MERGE(#"SEL$3")]]>
</hint>
<hint>
<![CDATA[OUTLINE_LEAF(#"SEL$1")]]>
</hint>
<hint>
<![CDATA[OUTLINE_LEAF(#"SEL$ABDE6DFF")]]>
</hint>
<hint>
<![CDATA[MERGE(#"SEL$6")]]>
</hint>
<hint>
<![CDATA[OUTLINE_LEAF(#"SEL$4")]]>
</hint>
<hint>
<![CDATA[OUTLINE(#"SEL$2")]]>
</hint>
<hint>
<![CDATA[OUTLINE(#"SEL$3")]]>
</hint>
<hint>
<![CDATA[OUTLINE(#"SEL$5")]]>
</hint>
<hint>
<![CDATA[OUTLINE(#"SEL$6")]]>
</hint>
<hint>
<![CDATA[NO_ACCESS(#"SEL$4" "V4"#"SEL$4")]]>
</hint>
<hint>
<![CDATA[NO_ACCESS(#"SEL$ABDE6DFF" "DLT$"#"SEL$6")]]>
</hint>
<hint>
<![CDATA[NO_ACCESS(#"SEL$1" "MAS$"#"SEL$1")]]>
</hint>
<hint>
<![CDATA[FULL(#"SEL$335DD26A" "MAS$"#"SEL$3")]]>
</hint>
</outline_data>
SQL of the query (anonymized a bit):
WITH "TMPDLT$_XXXXXXXXXXXXXXXX" AS(
SELECT /*+ RESULT_CACHE(LIFETIME=SESSION) */
"MAS$"."RID$" "RID$",
"MAS$"."PARTITION_YYYY",
"MAS$"."ZZZZZZZZZ",
"MAS$"."KEY",
"MAS$"."VALUE",
decode("MAS$"."OLD_NEW$$",'N','I','D')"DML$$",
"MAS$"."OLD_NEW$$" "OLD_NEW$$",
"MAS$"."TIME$$" "TIME$$",
"MAS$"."DMLTYPE$$" "DMLTYPE$$"
FROM
(
SELECT
"MAS$".*,
MIN("MAS$"."SEQ$$")OVER(
PARTITION BY "MAS$"."RID$"
)"MINSEQ$$",
MAX("MAS$"."SEQ$$")OVER(
PARTITION BY "MAS$"."RID$"
)"MAXSEQ$$"
FROM
(
SELECT /*+ CARDINALITY(MAS$ 0) */
chartorowid("MAS$"."M_ROW$$")rid$,
"MAS$"."PARTITION_YYYY",
"MAS$"."ZZZZZZZZZ",
"MAS$"."KEY",
"MAS$"."VALUE",
decode("MAS$".old_new$$,'N','I','D')dml$$,
"MAS$"."DMLTYPE$$" "DMLTYPE$$",
"MAS$"."SEQUENCE$$" "SEQ$$",
"MAS$"."OLD_NEW$$" "OLD_NEW$$",
"MAS$"."SNAPTIME$$" "TIME$$"
FROM
"AAAAAAAAAAAAAA"."MLOG$_XXXXXXXXXXXXXXXX" "MAS$"
WHERE
"MAS$".snaptime$$ > :b_st0
)AS OF SNAPSHOT(:b_scn)"MAS$"
)"MAS$"
WHERE
((("MAS$"."OLD_NEW$$" = 'N')
AND("MAS$"."SEQ$$" = "MAS$"."MAXSEQ$$"))
OR(("MAS$"."OLD_NEW$$" IN(
'O',
'U'
))
AND("MAS$"."SEQ$$" = "MAS$"."MINSEQ$$")))
)
SELECT
CASE
WHEN ddt_1 = 'D'
AND ddt_2 = 'I' THEN
'U'
ELSE
ddt_1
END "DML$$",
MAX(mtime)"TIME$$"
FROM
(
SELECT
MIN(dd_type)OVER(
PARTITION BY rid,old_new
)ddt_1,
MAX(dd_type)OVER(
PARTITION BY rid,old_new
)ddt_2,
mtime
FROM
(
SELECT
"DLT$"."RID$" rid,
"DLT$"."DML$$" dd_type,
"DLT$"."TIME$$" mtime,
CASE
WHEN "DLT$"."DMLTYPE$$" = 'U'
AND "DLT$"."OLD_NEW$$" = 'N' THEN
'U'
ELSE
"DLT$"."OLD_NEW$$"
END old_new
FROM
"TMPDLT$_XXXXXXXXXXXXXXXX" "DLT$"
)v3
)v4
GROUP BY
CASE
WHEN ddt_1 = 'D'
AND ddt_2 = 'I' THEN
'U'
ELSE
ddt_1
END

Sort direct neighbor nodes (books) by an attribute of 2nd-degree neighbors (authors) for a user's book list?

By agheranimesh via Slack:
This is my graph, named LibraryGraph:
My graph query:
FOR v, e, p IN 1..2 OUTBOUND "User/001" GRAPH "LibraryGraph"
SORT p.vertices[2].Name
RETURN p.vertices[1]
It's not giving me the result I want. I want the book list sorted by author name, with books that have no author coming last (B2, B3, B1, B4, B5).
Script to re-create the data (arangosh --javascript.execute <file>):
db._createDatabase('Library')
db._useDatabase('Library')
const User = db._create('User')
const Book = db._create('Book')
const Author = db._create('Author')
const User_Book = db._createEdgeCollection('User_Book')
const Book_Author = db._createEdgeCollection('Book_Author')
User.save({ '_key': '001', 'UserName': 'U1' })
Book.save({ '_key': 'B1', 'Name': 'B1' })
Book.save({ '_key': 'B2', 'Name': 'B2' })
Book.save({ '_key': 'B3', 'Name': 'B3' })
Book.save({ '_key': 'B4', 'Name': 'B4' })
Book.save({ '_key': 'B5', 'Name': 'B5' })
Author.save({ '_key': 'A', 'Name': 'A' })
Author.save({ '_key': 'B', 'Name': 'B' })
Author.save({ '_key': 'X', 'Name': 'X' })
Author.save({ '_key': 'Y', 'Name': 'Y' })
Author.save({ '_key': 'Z', 'Name': 'Z' })
User_Book.save({ '_from': 'User/001', '_to': 'Book/B1' })
User_Book.save({ '_from': 'User/001', '_to': 'Book/B2' })
User_Book.save({ '_from': 'User/001', '_to': 'Book/B3' })
User_Book.save({ '_from': 'User/001', '_to': 'Book/B4' })
User_Book.save({ '_from': 'User/001', '_to': 'Book/B5' })
Book_Author.save({ '_from': 'Book/B2', '_to': 'Author/A' })
Book_Author.save({ '_from': 'Book/B3', '_to': 'Author/B' })
Book_Author.save({ '_from': 'Book/B1', '_to': 'Author/X' })
Book_Author.save({ '_from': 'Book/B1', '_to': 'Author/Y' })
Book_Author.save({ '_from': 'Book/B1', '_to': 'Author/Z' })
const graph_module = require('org/arangodb/general-graph')
const graph = graph_module._create('LibraryGraph')
graph._addVertexCollection('User')
graph._addVertexCollection('Book')
graph._addVertexCollection('Author')
graph._extendEdgeDefinitions(graph_module._relation('User_Book', ['User'], ['Book']))
graph._extendEdgeDefinitions(graph_module._relation('Book_Author', ['Book'], ['Author']))
Instead of a single traversal with variable depth (1..2) to cover both cases (books with and without authors), I suggest using two traversals:
FOR book IN OUTBOUND "User/001" GRAPH "LibraryGraph"
LET author = FIRST(
FOR author IN OUTBOUND book._id GRAPH "LibraryGraph"
SORT author.Name
LIMIT 1
RETURN author.Name
) OR "\uFFFF"
SORT author
RETURN book
First we traverse from User/001 to the linked books. Then we do a second traversal from each book to the linked authors. This may return 0, 1 or multiple authors. The sub-query caps the result to the alphabetically first author (e.g. X out of X, Y, Z) and returns the name.
In the scope of the main query, we take the author name or fall back to a value that sorts last (null would sort first, which is not desired here). Then we sort the books by author name and return them.
Another way to achieve this result, yet harder to understand:
FOR v, e, p IN 1..2 OUTBOUND "User/001" GRAPH "LibraryGraph"
LET name = p.vertices[2].Name OR "\uFFFF"
COLLECT book = p.vertices[1] AGGREGATE author = MIN(name)
SORT author
RETURN book
The traversal returns paths with 2 or 3 vertices...
[0] [1] [2]
User/001 --> Book/B2
User/001 --> Book/B2 --> Author/A
User/001 --> Book/B3
User/001 --> Book/B3 --> Author/B
User/001 --> Book/B4
User/001 --> Book/B5
User/001 --> Book/B1
User/001 --> Book/B1 --> Author/Y
User/001 --> Book/B1 --> Author/X
User/001 --> Book/B1 --> Author/Z
The author at index 2 (p.vertices[2]), or a fallback value, is temporarily stored in the variable name. Then the book vertices are grouped together to eliminate duplicates (caused by the variable traversal depth, which returns e.g. 001-->B2 but also the longer path 001-->B2-->A).
Aggregation is used to pick the author name with the lowest value (MIN), which usually means the alphabetically first. This probably doesn't work correctly for some languages and character sets, however, whereas SORT does sort correctly based on the rules of the configured language (there can only be one per DBMS instance).
The grouping result - distinct book documents - is sorted by author names and returned.

Handling duplicate records in Pig Latin

If there are duplicates in the file, the first record should go to the valid file and the remaining duplicate records should be moved to the invalid file, using a Pig script.
Below is the scenario.
Input:
Acc|Phone|Name
1234|333-444-5555|XYZ
4567|222-555-1111|ABC
1234|234-123-0000|DEF
9999|123-456-1890|PQR
8734|456-879-1234|QWE
4567|369-258-0147|NNN
1234|987-654-3210|BLS
output: Two files
1. Valid rec:
1234|333-444-5555|XYZ
4567|222-555-1111|ABC
9999|123-456-1890|PQR
8734|456-879-1234|QWE
2. Invalid rec:
1234|234-123-0000|DEF
4567|369-258-0147|NNN
1234|987-654-3210|BLS
The invalid records are not necessarily in this exact order. The output can also look like this:
Invalid rec:
1234|234-123-0000|DEF
1234|987-654-3210|BLS
4567|369-258-0147|NNN
Scenario 2:
Input:
1234|333-444-5555|XYZ
4567|222-555-1111|ABC
1234|234-123-0000|DEF
9999|123-456-1890|PQR
8734|456-879-1234|QWE
4567|369-258-0147|NNN
1234|087-654-3210|BLS
1234|303-444-5555|XYZ
4567|122-555-1111|ABC
1234|134-123-0000|DEF
9999|123-456-1890|PQR
8734|456-879-1234|QWE
4567|069-258-0147|NNN
1234|086-654-3210|BLS
1234|033-444-5555|XYZ
4567|200-555-1111|ABC
1234|230-123-0000|DEF
9999|023-456-1890|PQR
8734|456-779-1234|QWE
4567|309-258-0147|NNN
1234|007-654-3210|BLS
Good Rec:
1234|333-444-5555|XYZ
4567|222-555-1111|ABC
9999|123-456-1890|PQR
8734|456-879-1234|QWE
Can anyone please suggest an approach? I'm only able to get the first record.
Thanks.
Can you try this?
input.txt
1234|333-444-5555|XYZ
4567|222-555-1111|ABC
1234|234-123-0000|DEF
9999|123-456-1890|PQR
8734|456-879-1234|QWE
4567|369-258-0147|NNN
1234|987-654-3210|BLS
PigScript:
A = LOAD 'input.txt' USING PigStorage('|') AS (Acc:chararray, Phone:chararray, Name:chararray);
-- Rank the records so the original input order is preserved
B = RANK A;
C = GROUP B BY Acc;
D = FOREACH C {
    -- Sort each Acc group by its position in the input
    sortInAsc = ORDER B BY rank_A ASC;
    -- The first occurrence in the group is the valid record
    top1 = LIMIT sortInAsc 1;
    -- Everything else in the group is a duplicate
    GENERATE top1 AS goodRecord, SUBTRACT(B, top1) AS badRecord;
}
--Flatten the good records
E = FOREACH D GENERATE FLATTEN(goodRecord);
--Get the required columns and skip the rank column(ie,$0)
F = FOREACH E GENERATE $1,$2,$3;
STORE F INTO 'goodrecord' USING PigStorage('|');
--Flatten the bad records
G = FOREACH D GENERATE FLATTEN(badRecord);
--Get the required columns and skip the rank column(ie,$0)
H = FOREACH G GENERATE $1,$2,$3;
STORE H INTO 'badrecord' USING PigStorage('|');
goodrecord Output1:
1234|333-444-5555|XYZ
4567|222-555-1111|ABC
8734|456-879-1234|QWE
9999|123-456-1890|PQR
badrecord Output1:
1234|987-654-3210|BLS
1234|234-123-0000|DEF
4567|369-258-0147|NNN
Scenario2 goodrecord Output:
1234|333-444-5555|XYZ
4567|222-555-1111|ABC
8734|456-879-1234|QWE
9999|123-456-1890|PQR
Scenario2 badrecord Output:
1234|033-444-5555|XYZ
1234|007-654-3210|BLS
1234|230-123-0000|DEF
1234|303-444-5555|XYZ
1234|234-123-0000|DEF
1234|134-123-0000|DEF
1234|086-654-3210|BLS
1234|087-654-3210|BLS
4567|369-258-0147|NNN
4567|309-258-0147|NNN
4567|122-555-1111|ABC
4567|069-258-0147|NNN
4567|200-555-1111|ABC
8734|456-879-1234|QWE
8734|456-779-1234|QWE
9999|123-456-1890|PQR
9999|023-456-1890|PQR

Parsing XML to get the population of Albania?

I am trying to learn how to use Nokogiri to parse XML files; however, I can't seem to get past this issue I am having.
I have this XML file with information about countries such as population, name, religion, inflation etc.:
<cia>
<continent id='europe'
name='Europe'/>
<continent id='asia'
name='Asia'/>
<continent id='northAmerica'
name='North America'/>
<continent id='australia'
name='Australia/Oceania'/>
<continent id='southAmerica'
name='South America'/>
<continent id='africa'
name='Africa'/>
<country id='cid-cia-Albania'
continent='Europe'
name='Albania'
datacode='AL'
total_area='28750'
population='3249136'
population_growth='1.34'
infant_mortality='49.2'
gdp_agri='55'
inflation='16'
gdp_total='4100'
indep_date='28 11 1912'
government='emerging democracy'
capital='Tirane'>
<ethnicgroups name='Greeks'>3</ethnicgroups>
<ethnicgroups name='Albanian'>95</ethnicgroups>
<religions name='Muslim'>70</religions>
<religions name='Roman Catholic'>10</religions>
<religions name='Albanian Orthodox'>20</religions>
<borders country='cid-cia-Greece'>282</borders>
<borders country='cid-cia-Macedonia'>151</borders>
<borders country='cid-cia-Serbia-and-Montenegro'>287</borders>
<coasts>Adriatic Sea</coasts>
<coasts>Ionian Sea</coasts>
<coasts>Serbia</coasts>
<coasts>Montenegro</coasts>
</country>
.
.
.
</cia>
I am trying to find a country by passing the name of the country as an argument, and from there get the population of that country, but I can't for some reason. Here is my method:
@doc = Nokogiri::XML(File.read(file)) # the file comes from the initialize method
def get_population(country)
  element = @doc.xpath("//country[@name='#{country}']")
end
So if I do:
get_population('Albania')
How can I get this method to get the population for Albania? Currently all I get is the XML for that country.
Thanks for all the help in advance!
Do as below:
def get_population(country)
  element = @doc.at_xpath("//country[@name='#{country}']/@population")
  element.text
end
@doc.at_xpath("//country[@name='#{country}']/@population") will give you a Nokogiri::XML::Attr instance. Nokogiri::XML::Attr inherits from Nokogiri::XML::Node, so you can use the Nokogiri::XML::Node#text method on the Nokogiri::XML::Attr instance.
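For example, with @doc built from the XML above (the value comes from the sample document):
get_population('Albania')
# => "3249136"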
Using CSS selectors makes this very straightforward:
require 'nokogiri'
xml = "<cia>
<continent id='europe'
name='Europe'/>
<continent id='asia'
name='Asia'/>
<continent id='northAmerica'
name='North America'/>
<continent id='australia'
name='Australia/Oceania'/>
<continent id='southAmerica'
name='South America'/>
<continent id='africa'
name='Africa'/>
<country id='cid-cia-Albania'
continent='Europe'
name='Albania'
datacode='AL'
total_area='28750'
population='3249136'
population_growth='1.34'
infant_mortality='49.2'
gdp_agri='55'
inflation='16'
gdp_total='4100'
indep_date='28 11 1912'
government='emerging democracy'
capital='Tirane'>
<ethnicgroups name='Greeks'>3</ethnicgroups>
<ethnicgroups name='Albanian'>95</ethnicgroups>
<religions name='Muslim'>70</religions>
<religions name='Roman Catholic'>10</religions>
<religions name='Albanian Orthodox'>20</religions>
<borders country='cid-cia-Greece'>282</borders>
<borders country='cid-cia-Macedonia'>151</borders>
<borders country='cid-cia-Serbia-and-Montenegro'>287</borders>
<coasts>Adriatic Sea</coasts>
<coasts>Ionian Sea</coasts>
<coasts>Serbia</coasts>
<coasts>Montenegro</coasts>
</country>
</cia>
"
Here's the gist of the code:
doc = Nokogiri::XML(xml)
doc.at('country[name="Albania"]')['population']
# => "3249136"

Extract data between two words for the first occurrence in an XML file in Unix

How do I extract the data between "so" and "again" (the first occurrence only)?
cat > sedtesting.txt
this is for testing
so test
AAgainn and again
this is for testing
so test
AAgainn and again
The expected output is:
so test
AAgainn and again
But what I am getting is:
so test
AAgainn and again
so test
AAgainn and again
In the sample below, we need to extract the data between "Exp_CDL_CONTRACT_D" and "Tracing Level":
<TRANSFORMATION DESCRIPTION ="" NAME ="Exp_CDL_CONTRACT_D" OBJECTVERSION ="1" REUSABLE ="NO" TYPE ="Expression" VERSIONNUMBER ="15">
<TRANSFORMFIELD DATATYPE ="string" DEFAULTVALUE ="&apos;UNKNOWN&apos;" DESCRIPTION ="" EXPRESSION ="CONTRACT_NUM" EXPRESSIONTYPE ="GENERAL" NAME ="CONTRACT_NUM" PICTURETEXT ="" PORTTYPE ="INPUT/OUTPUT" PRECISION ="120" SCALE ="0"/>
<TRANSFORMFIELD DATATYPE ="string" DEFAULTVALUE ="-999" DESCRIPTION ="" EXPRESSION ="MASTER_AGREEMENT_NUM" EXPRESSIONTYPE ="GENERAL" NAME ="MASTER_AGREEMENT_NUM" PICTURETEXT ="" PORTTYPE ="INPUT/OUTPUT" PRECISION ="255" SCALE ="0"/>
<TRANSFORMFIELD DATATYPE ="string" DEFAULTVALUE ="" DESCRIPTION ="" EXPRESSION ="DEAL_NUM" EXPRESSIONTYPE ="GENERAL" NAME ="DEAL_NUM" PICTURETEXT ="" PORTTYPE ="INPUT/OUTPUT" PRECISION ="50" SCALE ="0"/>
<TRANSFORMFIELD DATATYPE ="date/time" DEFAULTVALUE ="" DESCRIPTION ="" EXPRESSION ="FUNDING_DT" EXPRESSIONTYPE ="GENERAL" NAME ="FUNDING_DT" PICTURETEXT ="" PORTTYPE ="INPUT/OUTPUT" PRECISION ="29" SCALE ="9"/>
<TRANSFORMFIELD DATATYPE ="date/time" DEFAULTVALUE ="TO_DATE(&apos;1/1/1900 00:00:00 &apos;,&apos;MM/DD/YYYY HH24:MI:SS&apos;)" DESCRIPTION ="" EXPRESSION ="BOOK_DT" EXPRESSIONTYPE ="GENERAL" NAME ="BOOK_DT" PICTURETEXT ="" PORTTYPE ="INPUT/OUTPUT" PRECISION ="29" SCALE ="9"/>
<TABLEATTRIBUTE NAME ="Tracing Level" VALUE ="Normal"/>
<TRANSFORMATION DESCRIPTION ="" NAME ="Exp_SEQ_CDL_CONTRACT_D" OBJECTVERSION ="1" REUSABLE ="NO" TYPE ="Expression" VERSIONNUMBER ="8">
<TRANSFORMFIELD DATATYPE ="decimal" DEFAULTVALUE ="" DESCRIPTION ="" EXPRESSION ="V_CNT+1" EXPRESSIONTYPE ="GENERAL" NAME ="V_CNT" PICTURETEXT ="" PORTTYPE ="LOCAL VARIABLE" PRECISION ="38" SCALE ="0"/>
<TRANSFORMFIELD DATATYPE ="decimal" DEFAULTVALUE ="" DESCRIPTION ="" EXPRESSION ="IIF(V_CNT=1,:SP.GET_MAX_VALUE(&apos;CILDL.CDL_CONTRACT_D&apos;,&apos;CONTRACT_KEY&apos;),V_MAX)" EXPRESSIONTYPE ="GENERAL" NAME ="V_MAX" PICTURETEXT ="" PORTTYPE ="LOCAL VARIABLE" PRECISION ="38" SCALE ="0"/>
<TRANSFORMFIELD DATATYPE ="decimal" DEFAULTVALUE ="ERROR(&apos;transformation error&apos;)" DESCRIPTION ="" EXPRESSION ="V_CNT+V_MAX" EXPRESSIONTYPE ="GENERAL" NAME ="CONTRACT_KEY" PICTURETEXT ="" PORTTYPE ="OUTPUT" PRECISION ="38" SCALE ="0"/>
<TRANSFORMFIELD DATATYPE ="decimal" DEFAULTVALUE ="" DESCRIPTION ="" EXPRESSION ="Lkp_CONTRACT_KEY" EXPRESSIONTYPE ="GENERAL" NAME =
<TABLEATTRIBUTE NAME ="Tracing Level" VALUE ="Normal"/>
<INSTANCE DESCRIPTION ="" INSTANCEID ="16" NAME ="Exp_CDL_CONTRACT_D" REUSABLE ="NO" TRANSFORMATION_NAME ="Exp_CDL_CONTRACT_D" TRANSFORMATION_TYPE ="Expression" TYPE ="TRANSFORMATION"/>
<INSTANCE DESCRIPTION ="" INSTANCEID ="17" NAME ="Lkp_CDL_CONTRACT_D" REUSABLE ="NO" TRANSFORMATION_NAME ="Lkp_CDL_CONTRACT_D" TRANSFORMATION_TYPE ="Lookup Procedure" TYPE ="TRANSFORMATION"/>
<INSTANCE DESCRIPTION ="" INSTANCEID ="18" NAME ="Rtr_CDL_CONTRACT_D" REUSABLE ="NO" TRANSFORMATION_NAME ="Rtr_CDL_CONTRACT_D"
<MAPPINGVARIABLE DATATYPE ="date/time" DEFAULTVALUE ="" DESCRIPTION ="" ISEXPRESSIONVARIABLE ="NO" ISPARAM ="YES" NAME ="$$LAST_EXTRACT_DATE" PRECISION ="29" SCALE ="9" USERDEFINED ="YES"/>
</WORKFLOW>
</FOLDER>
</REPOSITORY>
</POWERMART>
Use awk:
awk -F 'Exp_CDL_CONTRACT_D|Tracing Level' '{print "\"Exp_CDL_CONTRACT_D" $2 "Tracing Level\""; exit}' RS= file.xml
Or use grep -oP:
grep -oP '"Exp_CDL_CONTRACT_D[\s\S]*Tracing Level"' file.xml
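For the simpler sedtesting.txt case at the top of the question, a sed sketch (assuming GNU sed) that prints the first "so".."again" block and then quits:
# print from the first "so" line through the first "again" line, then stop
sed -n '/so/,/again/{p;/again/q;}' sedtesting.txt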
