Compare two XMLs using XMLUnit, ignoring the order of elements - xmlunit

I am writing a comparison util which lets me compare the similarity of two XMLs without considering element order. I am using XMLUnit 2.4.0.
org.xmlunit.diff.Diff diff = DiffBuilder.compare(xml1)
    .withTest(xml2)
    .checkForSimilar()
    .withNodeMatcher(new DefaultNodeMatcher(ElementSelectors.byNameAndText))
    .build();
With this, the two XMLs below compare successfully:
xml1:
<multistatus>
<flowers>
<flower>Roses</flower>
<flower>Daisy</flower>
</flowers>
<flowers>
<flower>Roses</flower>
<flower>Daisy</flower>
</flowers>
</multistatus>
xml2:
<multistatus>
<flowers>
<flower>Roses</flower>
<flower>Daisy</flower>
</flowers>
<flowers>
<flower>Daisy</flower>
<flower>Roses</flower>
</flowers>
</multistatus>
However, this fails when I give the below input:
xml1:
<multistatus>
<flowers>
<flower>Roses</flower>
</flowers>
<flowers>
<flower>Daisy</flower>
</flowers>
</multistatus>
xml2:
<multistatus>
<flowers>
<flower>Daisy</flower>
</flowers>
<flowers>
<flower>Roses</flower>
</flowers>
</multistatus>
I tried creating an ElementSelector, but even that is not helping:
ElementSelector selector = ElementSelectors.conditionalBuilder()
    .whenElementIsNamed("flowers")
        .thenUse(ElementSelectors.byXPath("./flowers/flower", ElementSelectors.byNameAndText))
    .elseUse(ElementSelectors.byName)
    .build();
org.xmlunit.diff.Diff diff = DiffBuilder.compare(refSource)
    .withTest(testSource)
    .checkForSimilar()
    .ignoreWhitespace()
    .normalizeWhitespace()
    .withNodeMatcher(new DefaultNodeMatcher(selector, ElementSelectors.Default))
    .build();

Your XPath doesn't match anything. The context "." is the node you selected with whenElementIsNamed, i.e. the respective "flowers" element.
You probably mean "./flower"; with that XPath the comparison doesn't find any differences in your example.
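For readers who just want the order-insensitive idea without XMLUnit: the same "similar" check can be sketched in stdlib Python by comparing each element's tag, text, and the sorted multiset of its children's fingerprints. This is a swapped-in illustration of the technique, not XMLUnit's actual matching algorithm:

```python
import xml.etree.ElementTree as ET

def canonical(elem):
    """Order-insensitive fingerprint: tag, trimmed text, and the
    sorted tuple of the children's own fingerprints."""
    children = tuple(sorted(canonical(c) for c in elem))
    text = (elem.text or '').strip()
    return (elem.tag, text, children)

def similar(xml1, xml2):
    return canonical(ET.fromstring(xml1)) == canonical(ET.fromstring(xml2))

xml1 = "<multistatus><flowers><flower>Roses</flower></flowers><flowers><flower>Daisy</flower></flowers></multistatus>"
xml2 = "<multistatus><flowers><flower>Daisy</flower></flowers><flowers><flower>Roses</flower></flowers></multistatus>"
print(similar(xml1, xml2))  # True
```

Because siblings are sorted before comparison, the two "flowers" blocks match regardless of their position in the document.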


Include an interactive plot (wordcloud) in a dashboard

I created an interactive dashboard using hvPlot .interactive and Panel:
template = pn.template.FastListTemplate(
    title='Central Africa - Word Frequency Analysis',
    sidebar=['Frequency', yaxis],
    main=[ihvplot.panel(), ipanel.panel()],
    accent_base_color="#88d8b0",
    header_background="#88d8b0",
    sidebar_width=450,
    theme='dark',
)
template.show()
Where ihvplot and ipanel are my interactive plot and table, respectively. However, instead of the table I would like to use an interactive wordcloud which I created using ipywidgets:
list1 = ['Corbeau News', 'Journal de Bangui', 'Le Potentiel', 'Ndjoni Sango', 'RJDH',
         'Radio Lengo Songo', 'Radio Ndeke Luka']
def makingclouds(Category, frame, col, atitle):
    wordcloud_bangui = WordCloud(stopwords=stopword, width=1600, height=800,
                                 background_color="black",
                                 colormap="Set2").generate(''.join(data_file_1['Content']))
    plt.figure(figsize=(20, 10), facecolor='k')
    plt.title(atitle, fontsize=40)  # fontweight="bold"
    plt.imshow(wordcloud_bangui, interpolation="bilinear")
    plt.axis("off")
    plt.tight_layout(pad=0)
wordcloud = interact(makingclouds, Category=list1, df=fixed(data_file_1),
                     col=fixed('Content'),
                     atitle=fixed('Most used words media - Central Africa'),
                     frame=fixed(data_file_1[['Source', 'Content']]))
My question is how I can do this. I tried simply putting wordcloud in place of the table in the main list of the first snippet, but it tells me the object does not have a panel function.
What can I do?

Regex pattern not working in PySpark after applying the logic

I have data as below:
>>> df1.show()
+-----------------+--------------------+
| corruptNames| standardNames|
+-----------------+--------------------+
|Sid is (Good boy)| Sid is Good Boy|
| New York Life| New York Life In...|
+-----------------+--------------------+
So, per the data above, I need to apply a regex and create a new column holding the data shown in the second column, i.e. standardNames. I tried the code below:
spark.sql("select *, case when corruptNames rlike '[^a-zA-Z ()]+(?![^(]*))' or corruptNames rlike 'standardNames' then standardNames else 0 end as standard from temp1").show()
It throws the following error:
pyspark.sql.utils.AnalysisException: "cannot resolve '`standardNames`' given input columns: [temp1.corruptNames, temp1. standardNames];
Try this example without the select SQL. I am assuming you want to create a new column called standardNames based on corruptNames when the regex pattern matches, otherwise "do something else...".
Note: your pattern won't compile because you need to escape the second-to-last ) with \.
pattern = '[^a-zA-Z ()]+(?![^(]*))' #this won't compile
pattern = r'[^a-zA-Z ()]+(?![^(]*\))' #this will
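You can verify the escaping point quickly with Python's stdlib re module (Spark's rlike uses Java regex, but it rejects the unbalanced ) the same way):

```python
import re

bad = '[^a-zA-Z ()]+(?![^(]*))'     # unescaped ) leaves an unbalanced parenthesis
good = r'[^a-zA-Z ()]+(?![^(]*\))'  # escaped \) compiles fine

try:
    re.compile(bad)
except re.error as e:
    print('bad pattern rejected:', e)

compiled = re.compile(good)
# Neither sample row contains a character outside [a-zA-Z ()], so no match:
print(compiled.search('Sid is (Good boy)'))  # None
print(compiled.search('New York Life'))      # None
```

This also shows why both sample rows fall into the otherwise branch below: the character class only fires on characters other than letters, spaces, and parentheses.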
Code
import pyspark.sql.functions as F

df_text = spark.createDataFrame([('Sid is (Good boy)',), ('New York Life',)], ('corruptNames',))

pattern = r'[^a-zA-Z ()]+(?![^(]*\))'

df = df_text.withColumn('standardNames',
                        F.when(F.col('corruptNames').rlike(pattern), F.col('corruptNames'))
                         .otherwise('Do something else'))
df.show()
#+-----------------+---------------------+
#| corruptNames| standardNames|
#+-----------------+---------------------+
#|Sid is (Good boy)| Do something else|
#| New York Life| Do something else|
#+-----------------+---------------------+

VBS - Find SCCM Collection IDs, Names and Package Names a computer is a part of?

Currently I have an 'Excel SQL query' for this requirement and it executes fine without any issues.
Current Excel SQL query script: I created a SQL query which connects to the SCCM server and gets the details below.
Excel 2013/2016-> Data -> Connections-> Workbook Connections->
Excel SQL Query
Connection String:
Provider=SQLOLEDB.1;Integrated Security=SSPI;Persist Security Info=True;Initial Catalog=CM_CAS;Data Source=<SCCMServerIP>;Use Procedure for Prepare=1;Auto Translate=True;Packet Size=4096;Workstation ID=<SCCMServerHostName>;Use Encryption for Data=False;Tag with column collation when possible=False
Command Type: SQL
Command Text:
DECLARE @machname varchar(max)
SET @machname = '<DeviceHostName>'
select CollectionID, CollectionName, PackageName, @machname as MachineName
from v_AdvertisementInfo
-- select AssignmentID,AssignmentName,CollectionID,CollectionName,ApplicationName,AppModelID from v_ApplicationAssignment
where CollectionID in
    (select FCM.CollectionID from dbo.v_R_System R
     join dbo.v_FullCollectionMembership FCM on R.ResourceID = FCM.ResourceID
     join dbo.v_Collection C on C.CollectionID = FCM.CollectionID
     where R.Name0 = @machname)
  and ProgramName not like '%Remove%'
  and ProgramName not like '%Un-Install%'
  and CollectionName not like '%Test%'
  and CollectionName not like '%temp%'
union
select CollectionID, CollectionName, ApplicationName, @machname as MachineName
from v_ApplicationAssignment
where CollectionID in
    (select FCM.CollectionID from dbo.v_R_System R
     join dbo.v_FullCollectionMembership FCM on R.ResourceID = FCM.ResourceID
     join dbo.v_Collection C on C.CollectionID = FCM.CollectionID
     where R.Name0 = @machname)
  and CollectionName not like '%Test%'
  and CollectionName not like '%temp%'
An example Excel output for hostname 'IN-00001236' looks like this:
Result/Output
The requirement (for a laptop EOL refresh): gather all collection IDs that IN-00001236 (the EOL device) is a member of, add the new device host name (IN-1111) into those collection IDs, then delete IN-00001236 from SCCM and generate a report in Excel/CSV.
The challenge with the Excel SQL query above is that I have to add/replace the hostname under 'Command Text' on every run.
So I need to automate this complete process using VBS or PS.
If I understand you correctly you want a new record to have all memberships of the old record.
In my opinion you can focus only on collection memberships, whether the deployment to a collection is a package, application or update should not be important as long as the collections are the same.
So first you need to translate the device name to a ResourceID, since that is the primary key in all SCCM tables:
$WS = 'IN-00001236'
$resourceID = (Get-WmiObject -Class "SMS_R_System" -Filter "(Name = '$WS')" | Select-Object -last 1).ResourceID
Once you have that, you can get all the collection memberships:
$collectionMembers = (Get-WmiObject -Class "SMS_FullCollectionMembership" -Filter "(ResourceID = '$resourceID')" | where { $_.IsDirect -eq $true }).CollectionID
To delete a record, use this:
[void](Get-WmiObject -Class "SMS_R_System" -Filter "(ResourceID = '$resourceID')").Delete()
Now you have to add those memberships to the new record (first get the new ResourceID the same way):
Foreach ($CollID in $collectionMembers) {
    $CollectionQuery = Get-WmiObject -Class "SMS_Collection" -Filter "CollectionID = '$CollID'"
    $directRule = (Get-WmiObject -List -Class "SMS_CollectionRuleDirect").CreateInstance()
    $directRule.ResourceClassName = "SMS_R_System"
    $directRule.ResourceID = $newResourceID
    [void]$CollectionQuery.AddMemberShipRule($directRule)
}
If you want to use this for multiple devices in a loop, you can just create a CSV file with oldrecord,newrecord and import it with
$records = Import-Csv <path to csv> -Header old,new
and then wrap all of the code above in a loop like
foreach ($record in $records) {
    # inside here use $record.old or $record.new for the old/new names, e.g.
    # $WS = $record.old
}
Now I don't know if you even need the Excel file anymore once a script is doing the work, but in case you do, you can create an Excel-readable CSV from PowerShell with Export-Csv (you might need options like -Delimiter ';' -NoTypeInformation -Encoding UTF8 to make it easily readable).
In theory you can execute your whole SQL query from PowerShell as well:
$sqlText = "your query"
$SQLConnection = New-Object System.Data.SqlClient.SQLConnection("Server=<sccmserver>;Database=<sccm db>;Integrated Security=SSPI")
$SQLConnection.Open()
$cmd = New-Object System.Data.SqlClient.SqlCommand($sqlText, $SQLConnection)
$reader = $cmd.ExecuteReader()
while ($reader.Read()) {
    $reader["<columnname>"]
}
and recreate your file in the ps script
Finally I completed the VBS as below; it's working fine.
Dim connect, sql, resultSet, pth, txt
Set ObjFSO = CreateObject("Scripting.FileSystemObject")
Set connect = CreateObject("ADODB.Connection")
connect.ConnectionString = "Provider=SQLOLEDB.1;Integrated Security=SSPI;Persist Security Info=True;Initial Catalog=CM_CAS;Data Source=<SCCMServerIP>;Use Procedure for Prepare=1;Auto Translate=True;Packet Size=4096;Workstation ID=<SCCMServerHostName>;Use Encryption for Data=False;Tag with column collation when possible=False;Trusted_Connection=True;"
connect.Open
sql="select CollectionID,CollectionName,packagename,'<HostName>' as MachineName from v_AdvertisementInfo where CollectionID in (select FCM.CollectionId from dbo.v_R_System r join dbo.v_FullCollectionMembership FCM on R.ResourceID = FCM.ResourceID join dbo.v_Collection C on C.CollectionID = FCM.CollectionID Where R.Name0 = '<HostName>') and ProgramName not like '%Remove%' and ProgramName not like '%Un-Install%' and CollectionName not like '%Test%' and CollectionName not like '%temp%' union select CollectionID,CollectionName,ApplicationName,'<HostName>' as MachineName from v_ApplicationAssignment where CollectionID in (select FCM.CollectionId from dbo.v_R_System r join dbo.v_FullCollectionMembership FCM on R.ResourceID = FCM.ResourceID join dbo.v_Collection C on C.CollectionID = FCM.CollectionID Where R.Name0 = '<HostName>') and CollectionName not like '%Test%' and CollectionName not like '%temp%'"
Set resultSet = connect.Execute(sql)
pth = "C:\test\test.csv"
Set txt = ObjFSO.CreateTextFile(pth, True)
On Error Resume Next
resultSet.MoveFirst
Do While Not resultSet.EOF
    txt.WriteLine(resultSet(0) & "," & resultSet(1) & "," & resultSet(2))
    resultSet.MoveNext
Loop
resultSet.Close
connect.Close
Set connect = Nothing

How to train TfidfVectorizer for a new dataset

I am doing document classification using TfidfVectorizer and LinearSVC. I need to train the TfidfVectorizer again and again as new datasets come in. Is there any way to store the current TfidfVectorizer and mix in new features when a new dataset arrives?
Code :
if os.path.exists("trans.pkl"):
    with open("trans.pkl", "rb") as fid:
        transformer = cPickle.load(fid)
else:
    transformer = TfidfVectorizer(sublinear_tf=True, max_df=0.5, stop_words='english')
    with open("trans.pkl", "wb") as fid:
        cPickle.dump(transformer, fid)
X_train = transformer.fit_transform(train_data)
X_test = transformer.transform(test_data)
print X_train.shape[1]
if os.path.exists("store_model.pkl"):
    print "model exists"
    with open("store_model.pkl", "rb") as fid:
        classifier = cPickle.load(fid)
    print classifier
else:
    print "model created"
    classifier = LinearSVC().fit(X_train, train_target)
    with open("store_model.pkl", "wb") as fid:
        cPickle.dump(classifier, fid)
predictions = classifier.predict(X_test)
I have 2 different train files and 1 test file. When I ran the code for the 1st train file, it worked well. But when I try the 2nd train file, the number of features is different from the 1st, so it gives an error. How can I train my model if I have multiple such dataset files?
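For context on why the error occurs: each call to fit_transform builds a fresh vocabulary from whatever corpus it sees, so two train files produce two different feature spaces, and a classifier trained on one cannot score vectors from the other. TfidfVectorizer has no incremental fit, so the usual workaround is to refit on the combined corpus (old plus new) and re-train the classifier. A minimal stdlib sketch of the vocabulary effect (a toy tokenizer standing in for the vectorizer's fit step, not sklearn itself):

```python
def build_vocab(corpus):
    """Toy stand-in for a vectorizer's fit: one feature per unique token."""
    vocab = {}
    for doc in corpus:
        for token in doc.lower().split():
            vocab.setdefault(token, len(vocab))
    return vocab

train1 = ["spam offer now", "meeting at noon"]
train2 = ["new discount offer", "project deadline tomorrow"]

v1 = build_vocab(train1)                  # feature space of the first file
v2 = build_vocab(train2)                  # a different, incompatible feature space
combined = build_vocab(train1 + train2)   # one consistent feature space

print(len(v1), len(v2), len(combined))  # 6 6 11
```

The same logic applies to the real code above: persist the raw training data (or concatenate old and new), refit the vectorizer on everything, then refit LinearSVC on the refitted features before pickling both.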

Is MERGE supported in Greenplum Database 4.3.5.1 build 1

I am trying to write a MERGE statement for Greenplum DB and I am getting a syntax error. So I am wondering if MERGE is even supported the way I am writing it.
I have two approaches
Approach 1-
MERGE into public.table20 pritab
USING
(
select stgout.key1, stgout.key2, stgout.col1
from public.table20_stage stgout
where stgout.sequence_id < 1000
) as stgtab
ON (pritab.key1 = stgtab.key1
and pritab.key2 = stgtab.key2)
WHEN MATCHED THEN
UPDATE SET pritab.key1 = stgtab.key1
,pritab.key2 = stgtab.key2
,pritab.col1 = stgtab.col1
WHEN NOT MATCHED THEN
INSERT (key1, key2, col1)
values (stgtab.key1, stgtab.key2, stgtab.col1);
Approach 2:
UPDATE public.table20 pritab
SET pritab.key1 = stgtab.key1
,pritab.key2 = stgtab.key2
,pritab.col1 = stgtab.col1
from
(
select stgout.key1, stgout.key2, stgout.col1
from public.table20_stage stgout
where stgout.sequence_id < 1000
) as stgtab
ON (pritab.key1 = stgtab.key1
and pritab.key2 = stgtab.key2)
returning (stgtab.key1, stgtab.key2, stgtab.col1);
Is there any other way or something is wrong with my syntax itself?
MERGE is not supported in Greenplum, but I wrote a blog post on how to achieve the results of a MERGE statement in Greenplum:
http://www.pivotalguru.com/?p=104
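The common emulation is a two-step transaction: UPDATE the rows that match on the keys (the WHEN MATCHED branch), then INSERT the staging rows with no match (the WHEN NOT MATCHED branch). A minimal runnable sketch of that pattern using stdlib sqlite3, with table and column names borrowed from the question; sqlite's SQL dialect differs from Greenplum's, so treat this as an illustration of the pattern only:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE table20 (key1 INT, key2 INT, col1 TEXT)")
cur.execute("CREATE TABLE table20_stage (key1 INT, key2 INT, col1 TEXT, sequence_id INT)")
cur.execute("INSERT INTO table20 VALUES (1, 1, 'old')")
cur.executemany("INSERT INTO table20_stage VALUES (?,?,?,?)",
                [(1, 1, 'updated', 10), (2, 2, 'new', 20)])

# Step 1: UPDATE rows that match on the keys (WHEN MATCHED)
cur.execute("""
    UPDATE table20
    SET col1 = (SELECT s.col1 FROM table20_stage s
                WHERE s.key1 = table20.key1 AND s.key2 = table20.key2
                  AND s.sequence_id < 1000)
    WHERE EXISTS (SELECT 1 FROM table20_stage s
                  WHERE s.key1 = table20.key1 AND s.key2 = table20.key2
                    AND s.sequence_id < 1000)
""")

# Step 2: INSERT staging rows with no match (WHEN NOT MATCHED)
cur.execute("""
    INSERT INTO table20 (key1, key2, col1)
    SELECT s.key1, s.key2, s.col1 FROM table20_stage s
    WHERE s.sequence_id < 1000
      AND NOT EXISTS (SELECT 1 FROM table20 t
                      WHERE t.key1 = s.key1 AND t.key2 = s.key2)
""")
conn.commit()
print(sorted(cur.execute("SELECT * FROM table20").fetchall()))
# [(1, 1, 'updated'), (2, 2, 'new')]
```

In Greenplum you would run the equivalent UPDATE ... FROM and INSERT ... SELECT inside one transaction so the merge is atomic.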
