sort_values() got multiple values for argument 'axis' - sorting

sort_values() got multiple values for argument 'axis'
I am trying to sort this series using sort_values
Item Per Capita GSDP (Rs.)
Andhra_avg 102803
Arunachal_avg 100745
Assam_avg 55650.6
Bihar_avg 30030.6
Chattisgarh_avg 82046.7
Gujrat_avg 128755
Himachal_avg 126650
Jharkhand_avg 56385
Jammu_avg 73422.8
Karnataka_avg 128959
Kerala_avg 139140
MP_avg 61122.2
Maharashtra_avg 133512
my code:
dfGDP_percapita.sort_values("Item", axis = 0, ascending = True, inplace = True, na_position ='first')
Expected result should give "Per Capita GSDP (Rs.)" in the decreasing order with Nan on top

Try changing your code to
dfGDP_percapita.sort_values("Item", inplace=True, na_position ='first')
Some of the arguments are by default - I'm not sure why the error occurs, but my code works like this.

Related

How to solve the error ' [not a vector ]'

I ran this code to find the norm of some fundamnetal units of a biqaudratic number field, but I faced following problem
for (q=5, 200, for(p=q+1, 200, if (isprime(p)==1 && isprime(q)==1 ,k1=bnfinit(y^2-2*p,1); k2=bnfinit(y^2-q,1); k3=bnfinit(y^2-2*p*q,1); ep1=k1[8][5][1]; ep2=k2[8][5][1]; ep3=k3[8][5][1]; normep1=nfeltnorm(k1,ep1); normep2=nfeltnorm(k2,ep2); normep3=nfeltnorm(k3,ep3); li=[[q,p], [normep1, normep2, normep3]]; lis4=concat(lis4,[li]))))
and it works for small p and q. However, when I ran that for p and q greater than 150, it gives the following error:
First, I didn't use the flag=1 for bnf, but after adding that, still I get the same error.
Please, do not use indexing like ...[8][5][1] to get the fundamental units (FU). It seems that bnfinit omits FU matrix for some p and q. Instead, use the member function fu to receive FU. Please, find the example below:
> [q, p] = [23, 109];
> k = bnfinit(y^2 - 2*p*q, 1);
> k[8][5]
[;]
> k[8][5][1] \\ you will get the error here trying to index the empty matrix.
...
incorrect type in _[_] OCcompo1 [not a vector] (t_MAT).
> k.fu
[Mod(-355285121749346859670064114879166870*y - 25157598731408198132266996072608016699, y^2 - 5014)]
> norm(k.fu[1])
1

With ruamel.yaml how can I conditionally convert flow maps to block maps based on line length?

I'm working on a ruamel.yaml (v0.17.4) based YAML reformatter (using the RoundTrip variant to preserve comments).
I want to allow a mix of block- and flow-style maps, but in some cases, I want to convert a flow-style map to use block-style.
In particular, if the flow-style map would be longer than the max line length^, I want to convert that to a block-style map instead of wrapping the line somewhere in the middle of the flow-style map.
^ By "max line length" I mean the best_width that I configure by setting something like yaml.width = 120 where yaml is a ruamel.yaml.YAML instance.
What should I extend to achieve this? The emitter is where the line-length gets calculated so wrapping can occur, but I suspect that is too late to convert between block- and flow-style. I'm also concerned about losing comments when I switch the styles. Here are some possible extension points, can you give me a pointer on where I'm most likely to have success with this?
Emitter.expect_flow_mapping() probably too late for converting flow->block
Serializer.serialize_node() probably too late as it consults node.flow_style
RoundTripRepresenter.represent_mapping() maybe? but this has no idea about line length
I could also walk the data before calling yaml.dump(), but this has no idea about line length.
So, where should I and where can I adjust the flow_style whether a flow-style map would trigger line wrapping?
What I think the most accurate approach is when you encounter a flow-style mapping in the dumping process is to first try to emit it to a buffer and then get the length of the buffer and if that combined with the column that you are in, actually emit block-style.
Any attempt to guesstimate the length of the output without actually trying to write that part of a tree is going to be hard, if not impossible to do without doing the actual emit. Among other things the dumping process actually dumps scalars and reads them back to make sure no quoting needs to be forced (e.g. when you dump a string that reads back like a date). It also handles single key-value pairs in a list in a special way ( [1, a: 42, 3] instead of the more verbose [1, {a: 42}, 3]. So a simple calculation of the length of the scalars that are the keys and values and separating comma, colon and spaces is not going to be precise.
A different approach is to dump your data with a large line width and parse the output and make a set of line numbers for which the line is too long according to the width that you actually want to use. After loading that output back you can walk over the data structure recursively, inspect the .lc attribute to determine the line number on which a flow style mapping (or sequence) started and if that line number is in the set you built beforehand change the mapping to block style. If you have nested flow-style collections, you might have to repeat this process.
If you run the following, the initial dumped value for quote will be on one line.
The change_to_block method as presented changes all mappings/sequences that are too long
that are on one line.
import sys
import ruamel.yaml
yaml_str = """\
movie: bladerunner
quote: {[Batty, Roy]: [
I have seen things you people wouldn't believe.,
Attack ships on fire off the shoulder of Orion.,
I watched C-beams glitter in the dark near the Tannhäuser Gate.,
]}
"""
class Blockify:
def __init__(self, width, only_first=False, verbose=0):
self._width = width
self._yaml = None
self._only_first = only_first
self._verbose = verbose
#property
def yaml(self):
if self._yaml is None:
self._yaml = y = ruamel.yaml.YAML(typ=['rt', 'string'])
y.preserve_quotes = True
y.width = 2**16
return self._yaml
def __call__(self, d):
pass_nr = 0
changed = [True]
while changed[0]:
changed[0] = False
try:
s = self.yaml.dumps(d)
except AttributeError:
print("use 'pip install ruamel.yaml.string' to install plugin that gives 'dumps' to string")
sys.exit(1)
if self._verbose > 1:
print(s)
too_long = set()
max_ll = -1
for line_nr, line in enumerate(s.splitlines()):
if len(line) > self._width:
too_long.add(line_nr)
if len(line) > max_ll:
max_ll = len(line)
if self._verbose > 0:
print(f'pass: {pass_nr}, lines: {sorted(too_long)}, longest: {max_ll}')
sys.stdout.flush()
new_d = self.yaml.load(s)
self.change_to_block(new_d, too_long, changed, only_first=self._only_first)
d = new_d
pass_nr += 1
return d, s
#staticmethod
def change_to_block(d, too_long, changed, only_first):
if isinstance(d, dict):
if d.fa.flow_style() and d.lc.line in too_long:
d.fa.set_block_style()
changed[0] = True
return # don't convert nested flow styles, might not be necessary
# don't change keys if any value is changed
for v in d.values():
Blockify.change_to_block(v, too_long, changed, only_first)
if only_first and changed[0]:
return
if changed[0]: # don't change keys if value has changed
return
for k in d:
Blockify.change_to_block(k, too_long, changed, only_first)
if only_first and changed[0]:
return
if isinstance(d, (list, tuple)):
if d.fa.flow_style() and d.lc.line in too_long:
d.fa.set_block_style()
changed[0] = True
return # don't convert nested flow styles, might not be necessary
for elem in d:
Blockify.change_to_block(elem, too_long, changed, only_first)
if only_first and changed[0]:
return
blockify = Blockify(96, verbose=2) # set verbose to 0, to suppress progress output
yaml = ruamel.yaml.YAML(typ=['rt', 'string'])
data = yaml.load(yaml_str)
blockified_data, string_output = blockify(data)
print('-'*32, 'result:', '-'*32)
print(string_output) # string_output has no final newline
which gives:
movie: bladerunner
quote: {[Batty, Roy]: [I have seen things you people wouldn't believe., Attack ships on fire off the shoulder of Orion., I watched C-beams glitter in the dark near the Tannhäuser Gate.]}
pass: 0, lines: [1], longest: 186
movie: bladerunner
quote:
[Batty, Roy]: [I have seen things you people wouldn't believe., Attack ships on fire off the shoulder of Orion., I watched C-beams glitter in the dark near the Tannhäuser Gate.]
pass: 1, lines: [2], longest: 179
movie: bladerunner
quote:
[Batty, Roy]:
- I have seen things you people wouldn't believe.
- Attack ships on fire off the shoulder of Orion.
- I watched C-beams glitter in the dark near the Tannhäuser Gate.
pass: 2, lines: [], longest: 67
-------------------------------- result: --------------------------------
movie: bladerunner
quote:
[Batty, Roy]:
- I have seen things you people wouldn't believe.
- Attack ships on fire off the shoulder of Orion.
- I watched C-beams glitter in the dark near the Tannhäuser Gate.
Please note that when using ruamel.yaml<0.18 the sequence [Batty, Roy] never will be in block style
because the tuple subclass CommentedKeySeq does never get a line number attached.

How to get the correlation matrix of a pyspark data frame? NEW 2020

I have the same question from this topic:
How to get the correlation matrix of a pyspark data frame?
"I have a big pyspark data frame. I want to get its correlation matrix. I know how to get it with a pandas data frame.But my data is too big to convert to pandas. So I need to get the result with pyspark data frame.I searched other similar questions, the answers don't work for me. Can any body help me? Thanks!"
df4 is my dataset, he has 9 columns and all of them are integers:
reference__YM_unix:integer
tenure_band:integer
cei_global_band:integer
x_band:integer
y_band:integer
limit_band:integer
spend_band:integer
transactions_band:integer
spend_total:integer
I have first done this step:
# convert to vector column first
vector_col = "corr_features"
assembler = VectorAssembler(inputCols=df4.columns, outputCol=vector_col)
df_vector = assembler.transform(df4).select(vector_col)
# get correlation matrix
matrix = Correlation.corr(df_vector, vector_col)
And got the following output:
(matrix.collect()[0]["pearson({})".format(vector_col)].values)
Out[33]: array([ 1. , 0.0760092 , 0.09051543, 0.07550633, -0.08058203,
-0.24106848, 0.08229602, -0.02975856, -0.03108094, 0.0760092 ,
1. , 0.14792512, -0.10744735, 0.29481762, -0.04490072,
-0.27454922, 0.23242408, 0.32051685, 0.09051543, 0.14792512,
1. , -0.03708623, 0.13719527, -0.01135489, 0.08706559,
0.24713638, 0.37453265, 0.07550633, -0.10744735, -0.03708623,
1. , -0.49640664, 0.01885793, 0.25877516, -0.05019079,
-0.13878844, -0.08058203, 0.29481762, 0.13719527, -0.49640664,
1. , 0.01080777, -0.42319841, 0.01229877, 0.16440178,
-0.24106848, -0.04490072, -0.01135489, 0.01885793, 0.01080777,
1. , 0.00523737, 0.01244241, 0.01811365, 0.08229602,
-0.27454922, 0.08706559, 0.25877516, -0.42319841, 0.00523737,
1. , 0.32888075, 0.21416322, -0.02975856, 0.23242408,
0.24713638, -0.05019079, 0.01229877, 0.01244241, 0.32888075,
1. , 0.53310864, -0.03108094, 0.32051685, 0.37453265,
-0.13878844, 0.16440178, 0.01811365, 0.21416322, 0.53310864,
1. ])
I've tried to insert this result on arrays or an excel file but it didnt work.
I did:
matrix2 = (matrix.collect()[0]["pearson({})".format(vector_col)])
Then I got the following error when I tried to display this info:
display(matrix2)
Exception: ML model display does not yet support model type <class 'pyspark.ml.linalg.DenseMatrix'>.
I was expecting to insert the name of the columns back from df4 but it didnt succeed, I've read that I need to use df4.columns but I have no idea how does it works.
Finally, I was expecting to print the following graph that I've seen from medium article
https://medium.com/towards-artificial-intelligence/feature-selection-and-dimensionality-reduction-using-covariance-matrix-plot-b4c7498abd07
But also it didn't work:
from sklearn.preprocessing import StandardScaler
stdsc = StandardScaler()
X_std = stdsc.fit_transform(df4.iloc[:,range(0,7)].values)
cov_mat =np.cov(X_std.T)
plt.figure(figsize=(10,10))
sns.set(font_scale=1.5)
hm = sns.heatmap(cov_mat,
cbar=True,
annot=True,
square=True,
fmt='.2f',
annot_kws={'size': 12},
cmap='coolwarm',
yticklabels=cols,
xticklabels=cols)
plt.title('Covariance matrix showing correlation coefficients', size = 18)
plt.tight_layout()
plt.show()
AttributeError: 'DataFrame' object has no attribute 'iloc'
I've tried to replace df4 to matrix2 and didn't work too
You can use the following to get the correlation matrix in a form you can manipulate:
matrix = matrix.toArray().tolist()
From there you can convert to a dataframe pd.DataFrame(matrix) which would allow you to plot the heatmap, or save to excel etc.

Ruby each block, does not return value

This following Code doesn't work, the Problem is inside of the calc_fitness method, the each block does not return an value, and i dont know why
# takes an array, splices them into groups of 2 and returns a sum of values read from the matrix
def calc_fitness(hypothesis_arr)
hypothesis_arr.each_slice(2).to_a.map{|v| $distances[v.first,v.last]}
end
def main
# filling the matrix, with random values
$distances = create_sym_matrix
p calc_fitness((0..99).to_a)
end
main
# => [67,67,67,67....67,] #these should not be the same, which means i the block alway returns the same value. Why?
This happens because each returns itself (an instance of Enumerator) and each_slice return nil.
https://ruby-doc.org/core/Enumerable.html#method-i-each_slice
You could try changing the each to a map
To simplify and get a visual of what was generated, I used:
(1..99).to_a.each_slice(2).map { |v| { first: v.first, last: v.last } }
=> [{:first=>1, :last=>2}, {:first=>3, :last=>4}, {:first=>5, :last=>6}, {:first=>7, :last=>8}, {:first=>9, :last=>10}, {:first=>11, :last=>12}, {:first=>13, :last=>14}, {:first=>15, :last=>16}, {:first=>17, :last=>18}, {:first=>19, :last=>20}, {:first=>21, :last=>22}, {:first=>23, :last=>24}, {:first=>25, :last=>26}, {:first=>27, :last=>28}, {:first=>29, :last=>30}, {:first=>31, :last=>32}, {:first=>33, :last=>34}, {:first=>35, :last=>36}, {:first=>37, :last=>38}, {:first=>39, :last=>40}, {:first=>41, :last=>42}, {:first=>43, :last=>44}, {:first=>45, :last=>46}, {:first=>47, :last=>48}, {:first=>49, :last=>50}, {:first=>51, :last=>52}, {:first=>53, :last=>54}, {:first=>55, :last=>56}, {:first=>57, :last=>58}, {:first=>59, :last=>60}, {:first=>61, :last=>62}, {:first=>63, :last=>64}, {:first=>65, :last=>66}, {:first=>67, :last=>68}, {:first=>69, :last=>70}, {:first=>71, :last=>72}, {:first=>73, :last=>74}, {:first=>75, :last=>76}, {:first=>77, :last=>78}, {:first=>79, :last=>80}, {:first=>81, :last=>82}, {:first=>83, :last=>84}, {:first=>85, :last=>86}, {:first=>87, :last=>88}, {:first=>89, :last=>90}, {:first=>91, :last=>92}, {:first=>93, :last=>94}, {:first=>95, :last=>96}, {:first=>97, :last=>98}, {:first=>99, :last=>99}]
Now at this point I'm not sure what that function is doing that is assigned to $distances. You might want to provide the code for that or give some more detail on what your are attempting to accomplish.

Problems with IndexedFaceSet in X3Dom + D3

I'm using D3 to markup X3Dom as in this example: http://bl.ocks.org/jbeuckm/5620882
I converted the example to use simple squares instead of boxes: http://bl.ocks.org/jbeuckm/5645205
In a later version, I started loading data and calling plotAxis and plotData from various callbacks. It works as expected if I draw boxes as in the first example:
shape.append("x3d:box");
But when I substitute my 2-triangle face set...
shape.append("x3d:indexedFaceSet")
.attr("coordIndex", "0 1 2 -1 2 3 0 -1")
.attr('solid', 'false')
.append("x3d:coordinate")
.attr("point", "-1 -1 0, 1 -1 0, 1 1 0, -1 1 0")
it doesn't work and I get this error:
Uncaught TypeError: Cannot call method 'getPoints' of null
x3dom.registerNodeType.defineClass.nodeChanged x3dom.js:3175
x3dom.NodeNameSpace.setupTree x3dom.js:1929
domEventListener.onNodeInserted x3dom.js:1296
append d3.v2.js:3701
d3_selectionPrototype.select d3.v2.js:3606
d3_selectionPrototype.append d3.v2.js:3707
plotData tran_3d.html:132
(anonymous function) tran_3d.html:240
st.Callbacks.f jquery-1.9.0.min.js:1
st.Callbacks.p.fireWith jquery-1.9.0.min.js:1
st.extend.Deferred.st.each.i.(anonymous function) jquery-1.9.0.min.js:1
(anonymous function) tran_3d.html:280
d3.json d3.v2.js:2950
ready d3.v2.js:2940
It looks like maybe the "node-inserted" code analyzes the shape before the <Coordinate> child has been added to the <IndexedFaceSet>. But, I'm not sure why the same append statement would work in one context and not another. Again, just appending an x3d:box works fine in my data-loading setup, but the x3d:indexedFaceSet throws the error.
A workaround is to build up the IndexedFaceSet node separately, then generate the markup string for the node, then use D3's selection.html(str) to append the node to the shape. In the code referenced above, "shape" is a selection of data-bound nodes, so the workaround is like this:
shape.each(function(d){
var newNode = "<indexedFaceSet coordIndex="..."><coordinate point="..."/></indexedFaceSet>";
d3.select(this).html(newNode);
});

Resources