XPath trying to get the third position

I want to get the authors in the third position. I am using //authors[3]. Is there any reason this does not work? When I type //authors[1], I get the whole row.
<publications>
<publication>
<publication_name> Group agency: The possibility, design, and status of corporate agents </publication_name>
<authors> C List,P Pettit </authors>
<publisher> Oxford University Press </publisher>
<year> 2011 </year>
<citation> 598 </citation>
</publication>
<publication>
<publication_name> Aggregating sets of judgments: An impossibility result</publication_name>
<authors> C List, P Pettit </authors>
<publisher> Economics and Philosophy </publisher>
<volume> 18 </volume>
<number> 1 </number>
<pages> 89- 110 </pages>
<citation> 558 </citation>
<year> 2002 </year>
</publication>
<publication>
<publication_name> Epistemic democracy: generalizing the Condorcet jury theorem</publication_name>
<authors> C List, RE Goodin </authors>
<publisher> Journal of Political Philosophy </publisher>
<volume> 9 </volume>
<number> 3 </number>
<pages> 277-306 </pages>
<citation> 409 </citation>
<year> 2001 </year>
</publication>
<publication>
<publication_name> Arrow’s theorem in judgment aggregation </publication_name>
<authors> F Dietrich, C List </authors>
<publisher> Social Choice and Welfare </publisher>
<volume> 29 </volume>
<number> 1 </number>
<pages> 19 - 33 </pages>
<citation> 220 </citation>
<year> 2007 </year>
</publication>
<publication>
<publication_name> Deliberation, single-peakedness, and the possibility of meaningful democracy: evidence from deliberative polls </publication_name>
<authors> C List, RC Luskin, JS Fishkin, I McLean </authors>
<publisher> Journal of Politics </publisher>
<volume> 75 </volume>
<number> 01 </number>
<pages> 80-95 </pages>
<citation> 143 </citation>
<year> 2013 </year>
</publication>
<publication>
<publication_name> Swarm intelligence: When uncertainty meets conflict </publication_name>
<authors> L Conradt, C List, TJ Roper </authors>
<publisher> The American Naturalist </publisher>
<volume> 182 </volume>
<number> 5 </number>
<pages> 592-610 </pages>
<citation> 10 </citation>
<year> 2013 </year>
</publication>
<publication>
<publication_name> Intradimensional Single-peakedness and the Multidimensional Arrow Problem </publication_name>
<authors> C List </authors>
<publisher> Theory and Decision </publisher>
<citation> 10 </citation>
<year> 2004 </year>
</publication>
<publication>
<publication_name> The methodology of political theory </publication_name>
<authors> C List, L Valentini </authors>
<publisher> The Oxford Handbook of Philosophical Methodology </publisher>
<citation> 8 </citation>
<year> 2016 </year>
</publication>
<publication>
<publication_name> Social choice theory and deliberative democracy: a response to Aldred </publication_name>
<authors> JS Dryzek, C List </authors>
<publisher> British Journal of Political Science </publisher>
<volume> 34 </volume>
<number> 4 </number>
<pages> 752-758 </pages>
<citation> 8 </citation>
<year> 2004 </year>
</publication>
<publication>
<publication_name> Episteme symposium on group agency: Replies to Gaus, Cariani, Sylvan, and Briggs </publication_name>
<authors> C List, P Pettit </authors>
<publisher> Episteme </publisher>
<volume> 9 </volume>
<number> 3 </number>
<pages> 293 </pages>
<citation> 5 </citation>
<year> 2012 </year>
</publication>
<publication>
<publication_name> Two intuitions about free will: Alternative possibilities and intentional endorsement </publication_name>
<authors> C List, W Rabinowicz </authors>
<publisher> Philosophical Perspectives </publisher>
<volume> 28 </volume>
<number> 1 </number>
<pages> 155-172 </pages>
<citation> 4 </citation>
<year> 2014 </year>
</publication>
<publication>
<publication_name> Reasons for (prior) belief in Bayesian epistemology </publication_name>
<authors> F Dietrich, C List </authors>
<publisher> Synthese </publisher>
<volume> 190 </volume>
<number> 5 </number>
<pages> 787-808 </pages>
<citation> 4 </citation>
<year> 2013 </year>
</publication>
<publication>
<publication_name> Freedom as independence </publication_name>
<authors> C List, L Valentini </authors>
<publisher> Ethics </publisher>
<volume> 126 </volume>
<number> 4 </number>
<pages> 1043-1074 </pages>
<citation> 3 </citation>
<year> 2016 </year>
</publication>
<publication>
<publication_name> Belief revision generalized: A joint characterization of Bayes' and Jeffrey's rules </publication_name>
<authors> F Dietrich, C List, R Bradley </authors>
<publisher> Journal of Economic Theory </publisher>
<volume> 162</volume>
<pages> 352–371 </pages>
<citation> 3 </citation>
<year> 2016 </year>
</publication>
<publication>
<publication_name>Which worlds are possible? A judgment aggregation problem</publication_name>
<authors>C List</authors>
<publisher>Journal of Philosophical Logic</publisher>
<volume>37</volume>
<number>1</number>
<pages>57-65</pages>
<citation>12</citation>
<year>2008</year>
</publication>
</publications>
I only want to get the author in the third position.
There is no result, and I get an error saying: The XPath query returned no results. XPath scope: current file.

For XPath 1.0 you can use the following expression:
//authors[substring-after(substring-after(., ','), ',')]
    /substring-before(concat(normalize-space(substring-after(substring-after(., ','), ',')), ','), ',')
The first part of the XPath selects authors elements containing at least two commas. If you want to get the name of the third author without using a programming language, the second part will do that. I've split the XPath across two lines for readability.
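For illustration, here is a minimal sketch of the same idea in R with the xml2 package; note that libxml2 speaks XPath 1.0 only and cannot use a function call as a location step, so the second part is done in R. The file name publications.xml is a hypothetical stand-in for the document above.
library(xml2)

# Hypothetical file holding the XML from the question
doc <- read_xml("publications.xml")

# First part: authors elements whose text contains at least two commas
nodes <- xml_find_all(doc, "//authors[substring-after(substring-after(., ','), ',')]")

# Second part, done in R: split on commas and take the third name
sapply(strsplit(xml_text(nodes), ","), function(a) trimws(a[3]))
#> [1] "JS Fishkin" "TJ Roper"   "R Bradley"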

This XPath 2.0 expression,
//authors[count(tokenize(.,',')) > 2]/tokenize(.,',')[3]
will select the third author listed for each publication,
JS Fishkin
TJ Roper
R Bradley
for those authors elements that have three or more authors (as identified via comma separation). An XPath 1.0 solution is left as an exercise for the reader.

In addition to what others have said, note that //authors[1] selects every authors element that is the first authors child of its parent, while (//authors)[1] selects the first authors element in the entire document, which, I think, is what you want.
(You talked of getting the "whole row". I have no idea what a "row" is in an XML context. If you want to communicate clearly, the first rule is to learn the technical vocabulary...)
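To make the difference concrete, here is a small sketch in R with the xml2 package (again assuming the document is saved as a hypothetical publications.xml):
library(xml2)
doc <- read_xml("publications.xml")

# Every authors element that is the first authors child of its parent:
length(xml_find_all(doc, "//authors[1]"))
#> [1] 15

# No publication has a third authors child, hence the original //authors[3] is empty:
length(xml_find_all(doc, "//authors[3]"))
#> [1] 0

# The first authors element in the entire document:
xml_text(xml_find_first(doc, "(//authors)[1]"))
#> [1] " C List,P Pettit "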

You can select the third publication's authors element with this XPath, by matching its text content:
//authors[contains(text(),' C List, RE Goodin ')]
Note that this hard-codes the author string, so it only works for this particular document.

Related

Quanteda: display the actual difference between texts

I managed to calculate the difference between two texts with the cosine method, using the following:
library("quanteda")
dfmat <- corpus_subset(corpusnew) %>%
tokens(remove_punct = TRUE) %>%
tokens_remove(stopwords("portuguese")) %>%
dfm()
(tstat1 <- textstat_simil(dfmat, method = "cosine", margin = "documents"))
as.matrix(tstat1)
And I get the following matrix:
text1 text2 text3 text4 text5
text1 1.000 0.801 0.801 0.801 0.798
However, I would like to know the actual words that account for the difference, not just by how much the texts differ or are alike. Is there a way?
Thanks
How about comparing tokens using setdiff()?
require(quanteda)
toks <- tokens(corpus(c("a b c d", "a e")))
toks
#> Tokens consisting of 2 documents.
#> text1 :
#> [1] "a" "b" "c" "d"
#>
#> text2 :
#> [1] "a" "e"
setdiff(toks[[1]], toks[[2]])
#> [1] "b" "c" "d"
setdiff(toks[[2]], toks[[1]])
#> [1] "e"
This question only has pairwise answers, since each computation of similarity occurs between a single pair of documents. It's also not entirely clear what output you want to see, so I'll take my best guess and demonstrate a few possibilities.
So if you wanted to see the features most different between text1 and text2, for instance, you could slice the documents you want to compare from the dfm, and then set margin = "features" to get the similarity of the documents across features.
library("quanteda")
#> Package version: 3.2.1
#> Unicode version: 13.0
#> ICU version: 69.1
#> Parallel computing: 10 of 10 threads used.
#> See https://quanteda.io for tutorials and examples.
dfmat <- tokens(data_corpus_inaugural[1:5], remove_punct = TRUE) %>%
tokens_remove(stopwords("en")) %>%
dfm()
library("quanteda.textstats")
sim <- textstat_simil(dfmat[1:2, ], margin = "features", method = "cosine")
Now we can examine the pairwise similarities (greatest and smallest) by converting the similarity matrix to a data.frame, and sorting it.
# most similar features
as.data.frame(sim) %>%
dplyr::arrange(desc(cosine)) %>%
dplyr::filter(cosine < 1) %>%
head(10)
#> feature1 feature2 cosine
#> 1 present may 0.9994801
#> 2 country may 0.9994801
#> 3 may government 0.9991681
#> 4 present citizens 0.9988681
#> 5 country citizens 0.9988681
#> 6 present people 0.9988681
#> 7 country people 0.9988681
#> 8 present united 0.9988681
#> 9 country united 0.9988681
#> 10 present government 0.9973337
# most different features
as.data.frame(sim) %>%
dplyr::arrange(cosine) %>%
head(10)
#> feature1 feature2 cosine
#> 1 government upon 0.1240347
#> 2 government chief 0.1240347
#> 3 government magistrate 0.1240347
#> 4 government proper 0.1240347
#> 5 government arrive 0.1240347
#> 6 government endeavor 0.1240347
#> 7 government express 0.1240347
#> 8 government high 0.1240347
#> 9 government sense 0.1240347
#> 10 government entertain 0.1240347
Created on 2022-03-08 by the reprex package (v2.0.1)
There are other ways to compare the words most different between documents, such as "keyness": for instance quanteda.textstats::textstat_keyness() between text1 and text2, where the head and tail of the resulting data.frame will tell you the most dissimilar features, as sketched below.
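Here is a minimal sketch of that keyness approach, reusing the dfmat from the reprex above and comparing text1 against the remaining document (text2):
library("quanteda.textstats")
tstat_key <- textstat_keyness(dfmat[1:2, ], target = "text1")
head(tstat_key) # features most characteristic of text1
tail(tstat_key) # features most characteristic of text2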

Only show 3 or more Races - a grouping issue with DAX

Setting up the problem:
Here is my data:
Primary Key Car Type Race Day Gas Type Track or City Course Place
1 Audi 1/1/2017 unleaded track 1
2 Ford 1/1/2017 unleaded track 2
3 BMW 1/1/2017 unleaded track 3
4 Audi 1/2/2017 unleaded track 3
5 Ford 1/2/2017 unleaded track 2
6 BMW 1/2/2017 unleaded track 1
7 Audi 1/3/2017 unleaded track 2
8 Ford 1/3/2017 unleaded city 3
9 BMW 1/3/2017 unleaded city 1
10 Audi 1/4/2017 unleaded city 1
11 Ford 1/4/2017 unleaded city 3
12 BMW 1/4/2017 unleaded city 2
13 Audi 1/5/2017 unleaded city 1
14 Ford 1/5/2017 unleaded city 3
15 BMW 1/5/2017 unleaded city 2
16 Audi 1/6/2017 unleaded city 2
17 Ford 1/6/2017 unleaded city 3
18 BMW 1/6/2017 leaded city 1
19 Audi 1/7/2017 leaded city 3
20 Ford 1/7/2017 leaded city 1
21 BMW 1/7/2017 leaded city 2
22 Audi 1/8/2017 leaded city 3
23 Ford 1/8/2017 leaded city 1
24 BMW 1/8/2017 leaded city 2
25 Audi 1/9/2017 leaded city 2
26 Ford 1/9/2017 leaded city 1
27 BMW 1/9/2017 leaded city 3
28 Audi 1/10/2017 leaded track 3
29 Ford 1/10/2017 leaded track 2
30 BMW 1/10/2017 leaded track 1
31 Audi 1/11/2017 leaded track 2
32 Ford 1/11/2017 leaded track 1
33 BMW 1/11/2017 leaded track 3
34 Audi 1/12/2017 leaded track 1
35 Ford 1/12/2017 leaded track 3
36 BMW 1/12/2017 leaded track 2
I’m running into a grouping problem with a DAX formula. I will walk through the dashboard and then state the problem.
The dashboard is a collection of races by three different cars, Ford, Audi, and BMW.
The cars have had 12 races on two types of courses (City or Track) and the cars had two gas options (leaded or unleaded).
This is what the dashboard looks like with no slicers selected:
On the right-hand side we see the count of races by car type; the Box and Whisker in the middle shows the race outcomes.
So, for example, when I select 'unleaded' for gas, we see that Ford does not have any 1st-place finishes with unleaded gas and normally finishes 3rd when it runs on unleaded.
And we also see, on the right-hand side, that Audi and Ford have each run six races with unleaded gas, and BMW five.
Starting to get into the problem:
I only want a car type to show in the Box and Whisker when that car type has had at least three races.
Here is an example:
In this example, the Box and Whisker graph is working exactly as I expect: BMW has fewer than three races, so it does not show up on the Box and Whisker plot.
The Box and Whisker is running off the following formula:
Show when 3 total races = IF(CALCULATE(DISTINCTCOUNT(cars[Races]), ALLEXCEPT(cars, cars[Car Type], cars[Gas Type], cars[Race Day], cars[Track or City Course])) > 2.5, SUM(cars[Place]), BLANK())
Here is an example of the issue.
Consider the following, where there are four races for each car type:
Yet when I select ‘Ford’ in the slicer, I get the following
Even though there are four races by Ford, the Box and Whisker does not show. I expect it to show, because I know Ford has been in four races; even the table on the right has four listed. The only thing that has changed between the last two pictures is that I have Ford selected in the slicer.
I want any combination of the four slicers to show in the Box and Whisker plot if the count of races is 3 or above.
Does anyone have any insight on this issue?
It is because the BoxWhiskerChart has a different evaluation context than the count of races table that you have set up.
In order to filter which car type to display in the chart, you can set up a measure to count the number of selected races:
Number of Races = CALCULATE(COUNT(Cars[Races]), ALLSELECTED(Cars[Races]))
And add it to the visual level filter:
The results should be as expected:
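For reference, the underlying rule ("only show a car type once it has at least three races under the current filters") can be sketched outside DAX; here it is in R with dplyr, on a made-up miniature of the data:
library(dplyr)

races <- data.frame(
  CarType = c("Audi", "Audi", "Audi", "Audi", "Ford", "Ford"),
  Place   = c(1, 3, 2, 1, 2, 2)
)

races %>%
  group_by(CarType) %>%
  filter(n() >= 3) # Audi (4 races) is kept, Ford (2 races) is dropped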

Obtaining a different result when evaluating Stanford NLP sentiment

I downloaded Stanford NLP 3.5.2 and ran sentiment analysis with the default configuration (i.e., I did not change anything, just unzipped and ran).
java -cp "*" edu.stanford.nlp.sentiment.Evaluate -model edu/stanford/nlp/models/sentiment/sentiment.ser.gz -treebank test.txt
EVALUATION SUMMARY
Tested 82600 labels
66258 correct
16342 incorrect
0.802155 accuracy
Tested 2210 roots
976 correct
1234 incorrect
0.441629 accuracy
Label confusion matrix
Guess/Gold 0 1 2 3 4 Marg. (Guess)
0 323 161 27 3 3 517
1 1294 5498 2245 652 148 9837
2 292 2993 51972 2868 282 58407
3 99 602 2283 7247 2140 12371
4 0 1 21 228 1218 1468
Marg. (Gold) 2008 9255 56548 10998 3791
0 prec=0.62476, recall=0.16086, spec=0.99759, f1=0.25584
1 prec=0.55891, recall=0.59406, spec=0.94084, f1=0.57595
2 prec=0.88982, recall=0.91908, spec=0.75299, f1=0.90421
3 prec=0.58581, recall=0.65894, spec=0.92844, f1=0.62022
4 prec=0.8297, recall=0.32129, spec=0.99683, f1=0.46321
Root label confusion matrix
Guess/Gold 0 1 2 3 4 Marg. (Guess)
0 44 39 9 0 0 92
1 193 451 190 131 36 1001
2 23 62 82 30 8 205
3 19 81 101 299 255 755
4 0 0 7 50 100 157
Marg. (Gold) 279 633 389 510 399
0 prec=0.47826, recall=0.15771, spec=0.97514, f1=0.2372
1 prec=0.45055, recall=0.71248, spec=0.65124, f1=0.55202
2 prec=0.4, recall=0.2108, spec=0.93245, f1=0.27609
3 prec=0.39603, recall=0.58627, spec=0.73176, f1=0.47273
4 prec=0.63694, recall=0.25063, spec=0.96853, f1=0.35971
Approximate Negative label accuracy: 0.646009
Approximate Positive label accuracy: 0.732504
Combined approximate label accuracy: 0.695110
Approximate Negative root label accuracy: 0.797149
Approximate Positive root label accuracy: 0.774477
Combined approximate root label accuracy: 0.785832
The test.txt file is downloaded from http://nlp.stanford.edu/sentiment/trainDevTestTrees_PTB.zip (which contains train.txt, dev.txt and test.txt). The download link is given at http://nlp.stanford.edu/sentiment/code.html
However, in the paper "Socher, R., Perelygin, A., Wu, J.Y., Chuang, J., Manning, C.D., Ng, A.Y. and Potts, C., 2013, October. Recursive deep models for semantic compositionality over a sentiment treebank. In Proceedings of the conference on empirical methods in natural language processing (EMNLP) (Vol. 1631, p. 1642)." on which the sentiment analysis tool is based, the authors reported that the accuracy when classifying 5 classes is 0.807.
Are the results I obtained normal?
I get the same results when I run it out of the box. It would not surprise me if the version of their system they made for Stanford CoreNLP differs slightly from the version in the paper.

DAX AverageX where table dimension is reduced by one

I'm trying to find the right way to structure a DAX formula to compute a specific average. I think I might be able to construct the average more or less explicitly using a sum/count construction, but I'm wondering if AVERAGEX with an appropriate set of table filters might get the job done.
Specifically, my problem can be explained like this: I'm trying to compute the average cost of a car in DAX, but my data includes the cost of all the components individually (call them body, wheels and engine for now).
Name Year Part Cost
Alice 2000 Engine $10
Alice 2000 Wheels $5
Alice 2000 Body $25
Alice 2001 Engine $8
Alice 2001 Wheels $6
Alice 2001 Body $2
Bob 2000 Engine $10
Bob 2000 Wheels $5
Bob 2000 Body $25
Bob 2001 Engine $8
Bob 2001 Wheels $6
Bob 2001 Body $2
Is there any way to tell DAX that I want to first sum across all the components of the car, and then compute averages on the data set where the dimensionality has been reduced by one (only the "Part" dimension removed)?
For example, the average cost for Alice would then yield
((10 + 5 + 25) + (8 + 6 + 2)) / 2 = 28
While if I had a pivot table constructed per name and per year, it would show
Alice 2000 40
Alice 2001 16
etc...
Thanks.
Try this... it works in the case where (Name, Year) provides a unique combination.
[nCombinations]:=COUNTROWS(SUMMARIZE(Table1,Table1[Name],Table1[Year]))
[TotalCost]:=SUM(Table1[Cost])
[AverageCost]:=CALCULATE([TotalCost]/[nCombinations])
Create a PivotTable with [Name] and [Year] on rows, then add [nCombinations], [TotalCost] and [AverageCost] in the body.
Row nCombinations TotalCost AverageCost
Alice 2 56 28
2000 1 40 40
2001 1 16 16
Bob 2 56 28
2000 1 40 40
2001 1 16 16
Grand Total 4 112 28
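For comparison, here is the same two-step reduction (sum out the Part dimension, then average the per-car totals) sketched in R with dplyr; this is not DAX, just an illustration of the computation the measures above implement:
library(dplyr)

cars <- data.frame(
  Name = rep(c("Alice", "Bob"), each = 6),
  Year = rep(rep(c(2000, 2001), each = 3), 2),
  Part = rep(c("Engine", "Wheels", "Body"), 4),
  Cost = rep(c(10, 5, 25, 8, 6, 2), 2)
)

cars %>%
  group_by(Name, Year) %>%
  summarise(CarCost = sum(Cost), .groups = "drop") %>% # sum out the Part dimension
  group_by(Name) %>%
  summarise(AverageCost = mean(CarCost))
#> # A tibble: 2 x 2
#>   Name  AverageCost
#>   <chr>       <dbl>
#> 1 Alice          28
#> 2 Bob            28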

Oracle7: Merge identical data in many records into one record

I would like to merge records whose data are the same into one record.
From
FO LINE FLOOR COLOR SUM
S4714EH02 EH 11F AK 9
S4714EH02 EH 11F AK 18
S4714EH02 EH 11F FE 9
S4714EH02 EH 11F FE 18
S4714EH02 EH 12F AK 9
S4714EH02 EH 12F AK 18
S4714EH02 EH 12F FE 9
S4714EH02 EH 12F FE 18
To
FO LINE FLOOR COLOR SUM
S4714EH02 EH 11F AK 9,18
S4714EH02 EH 11F FE 9,18
S4714EH02 EH 12F AK 9,18
S4714EH02 EH 12F FE 9,18
I know this can be done in SQL Server 2008, but I don't know whether it can be done in Oracle 7.
Please help me. Thank you.
Oracle 7 is a fine release of that database. It introduced many new features, it performed well, and it obviously remains exceedingly stable. But it is long in the tooth and lacks many features available to us in more recent versions of the product.
For instance, all of the normal techniques we can use to aggregate values into a list only work in 9i or higher. (Some may work in 8i; my mind is a little fuzzy here, as it's been almost a decade since I worked with an Oracle that old.)
So do you have any options in Oracle 7? The only one I can think of is to run a stored procedure as part of a reporting pre-process. This stored procedure would loop round the rows you want to query, assemble rows matching your desired output, and then insert them into a different table. This table would then serve the actual query.
This is an extremely clunky workaround, and may not be viable in your situation. But alas, that is the cost of using legacy software.
As APC has already very well said, this version is really old and lacks all kinds of functions to do string aggregation. I did work with version 7 in the previous millennium though, and I think the next sequence should work in Oracle7. I could be wrong, but obviously I can't check it.
SQL> create table t (fo,line,floor,color,sum)
2 as
3 select 'S4714EH02', 'EH', '11F', 'AK', 9 from dual union all
4 select 'S4714EH02', 'EH', '11F', 'AK', 18 from dual union all
5 select 'S4714EH02', 'EH', '11F', 'FE', 9 from dual union all
6 select 'S4714EH02', 'EH', '11F', 'FE', 18 from dual union all
7 select 'S4714EH02', 'EH', '12F', 'AK', 9 from dual union all
8 select 'S4714EH02', 'EH', '12F', 'AK', 18 from dual union all
9 select 'S4714EH02', 'EH', '12F', 'FE', 9 from dual union all
10 select 'S4714EH02', 'EH', '12F', 'FE', 18 from dual
11 /
Table created.
SQL> create function f
2 ( p_fo in t.fo%type
3 , p_line in t.line%type
4 , p_floor in t.floor%type
5 , p_color in t.color%type
6 ) return varchar2
7 is
8 cursor c
9 is
10 select t.sum
11 from t
12 where t.fo = p_fo
13 and t.line = p_line
14 and t.floor = p_floor
15 and t.color = p_color
16 order by t.sum
17 ;
18 l_concatenated_sum varchar2(2000);
19 begin
20 for r in c
21 loop
22 l_concatenated_sum := l_concatenated_sum || ',' || to_char(r.sum);
23 end loop;
24 return substr(l_concatenated_sum,2);
25 end f;
26 /
Function created.
SQL> select fo
2 , line
3 , floor
4 , color
5 , f(fo,line,floor,color) sum
6 from t
7 group by fo
8 , line
9 , floor
10 , color
11 /
FO LI FLO CO SUM
--------- -- --- -- --------------------
S4714EH02 EH 11F AK 9,18
S4714EH02 EH 11F FE 9,18
S4714EH02 EH 12F AK 9,18
S4714EH02 EH 12F FE 9,18
4 rows selected.
In the special case where you have only two records per distinct key -- as shown by your sample data -- you could do this:
SELECT fo, line, floor, color, MIN(sum) || ',' || MAX(sum)
FROM theTable
GROUP BY fo, line, floor, color;
But this can't be generalized to handle more than two values of sum per key.
