to_crs("epsg:4326") returns different coordinates - geopandas

I am trying to change the coordinate system of my geopandas dataframe from epsg:5179 to epsg:4326.
However, .to_crs("epsg:4326") returns different coordinates. How can I get the true coordinates?
geo[geometry].set_crs("epsg:5179", inplace = True)
geo_df = geo[geometry].to_crs("epsg:4326")
Original
LINESTRING (14138122.900 4519000.200, 14138248...LINESTRING (14135761.800 4518881.600, 14135799...
Changed-proj
LINESTRING (-149.90927 12.31701, -149.90912 12...LINESTRING (-149.91219 12.32162, -149.91215 12...

It seems your code already returns the true coordinates:
geo[geometry].set_crs("epsg:5179", inplace = True)
geo_df = geo[geometry].to_crs("epsg:4326")
I looked through pyproj and could not find any error in converting coordinates from epsg:5179 to epsg:4326.
For further information about the coordinates, you can visit here.
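One independent sanity check, offered as a sketch rather than a diagnosis: the original x values are around 1.4e7 metres, which is the magnitude that Web Mercator (EPSG:3857) produces for longitudes near 127°E. Assuming, hypothetically, that the source data were stored in EPSG:3857 rather than EPSG:5179, the standard spherical inverse Mercator formula can be applied by hand to the first coordinate pair from the question. The function name web_mercator_to_lonlat is illustrative and is not part of geopandas or pyproj.

```python
import math

EARTH_RADIUS_M = 6378137.0  # sphere radius used by Web Mercator (EPSG:3857)


def web_mercator_to_lonlat(x, y):
    """Invert spherical Web Mercator; returns (longitude, latitude) in degrees."""
    lon = math.degrees(x / EARTH_RADIUS_M)
    lat = math.degrees(2.0 * math.atan(math.exp(y / EARTH_RADIUS_M)) - math.pi / 2.0)
    return lon, lat


# First coordinate pair of the "Original" output in the question.
lon, lat = web_mercator_to_lonlat(14138122.900, 4519000.200)
print(f"lon={lon:.4f}, lat={lat:.4f}")
```

If the printed longitude and latitude fall where the data is expected to be, it may be worth declaring the source CRS as epsg:3857 (set_crs accepts allow_override=True to replace a CRS that is already set) before calling to_crs("epsg:4326").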

Display images on Tensorboard to have the input, the ground truth and the prediction side by side

I'm working on a deep learning model and I would like to be able to display images on Tensorboard to have the input, the ground truth and the prediction side by side.
Currently, the display looks like this:
[image: current display]
This visualization is inconvenient: the ground truth and the prediction are hard to compare when the images are not side by side, and we have to scroll from one to the other (the images are too big and we display more than 6 of them).
The current code:
for epoch in range(EPOCHS):
    for step, (x_train, y_train) in enumerate(train_ds):
        y_, gloss, dloss = pix2pix.train_step(x_train, y_train, epoch)
        if step % PRINT_STEP == 0:
            template = 'Epoch {} {}%, G-Loss: {}, D-Loss: {}'
            print(template.format(epoch+1, int(100*step/max_steps), gloss, dloss))
            with train_writer.as_default():
                tf.summary.image('GT', y_train+0.5, step=epoch*max_steps+step, max_outputs=3, description=None)
                tf.summary.image('pred', y_+0.5, step=epoch*max_steps+step, max_outputs=3, description=None)
                tf.summary.image('input', x_train+0.5, step=epoch*max_steps+step, max_outputs=3, description=None)
                tf.summary.scalar('generator loss', gloss, step=epoch*max_steps+step)
                tf.summary.scalar('discriminator loss', dloss, step=epoch*max_steps+step)
                tf.summary.flush()
Here is an example of what I would like to have:
[image: desired display]
I thought about another solution: save all image triples (input/truth/pred) in local folders (folder 1: input 1/truth 1/pred 1, folder 2: input 2/truth 2/pred 2, ...) and display them with a Python library (cv2, matplotlib, ...). But the problem is the same: I don't know how to do that, or whether it is possible.
Thanks for your help.

How to get the correlation matrix of a pyspark data frame? NEW 2020

I have the same question as in this topic:
How to get the correlation matrix of a pyspark data frame?
"I have a big pyspark data frame. I want to get its correlation matrix. I know how to get it with a pandas data frame. But my data is too big to convert to pandas. So I need to get the result with pyspark data frame. I searched other similar questions, the answers don't work for me. Can anybody help me? Thanks!"
df4 is my dataset; it has 9 columns and all of them are integers:
reference__YM_unix:integer
tenure_band:integer
cei_global_band:integer
x_band:integer
y_band:integer
limit_band:integer
spend_band:integer
transactions_band:integer
spend_total:integer
I have first done this step:
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.stat import Correlation

# convert to vector column first
vector_col = "corr_features"
assembler = VectorAssembler(inputCols=df4.columns, outputCol=vector_col)
df_vector = assembler.transform(df4).select(vector_col)
# get correlation matrix
matrix = Correlation.corr(df_vector, vector_col)
And got the following output:
(matrix.collect()[0]["pearson({})".format(vector_col)].values)
Out[33]: array([ 1. , 0.0760092 , 0.09051543, 0.07550633, -0.08058203,
-0.24106848, 0.08229602, -0.02975856, -0.03108094, 0.0760092 ,
1. , 0.14792512, -0.10744735, 0.29481762, -0.04490072,
-0.27454922, 0.23242408, 0.32051685, 0.09051543, 0.14792512,
1. , -0.03708623, 0.13719527, -0.01135489, 0.08706559,
0.24713638, 0.37453265, 0.07550633, -0.10744735, -0.03708623,
1. , -0.49640664, 0.01885793, 0.25877516, -0.05019079,
-0.13878844, -0.08058203, 0.29481762, 0.13719527, -0.49640664,
1. , 0.01080777, -0.42319841, 0.01229877, 0.16440178,
-0.24106848, -0.04490072, -0.01135489, 0.01885793, 0.01080777,
1. , 0.00523737, 0.01244241, 0.01811365, 0.08229602,
-0.27454922, 0.08706559, 0.25877516, -0.42319841, 0.00523737,
1. , 0.32888075, 0.21416322, -0.02975856, 0.23242408,
0.24713638, -0.05019079, 0.01229877, 0.01244241, 0.32888075,
1. , 0.53310864, -0.03108094, 0.32051685, 0.37453265,
-0.13878844, 0.16440178, 0.01811365, 0.21416322, 0.53310864,
1. ])
I tried to put this result into arrays or an Excel file, but it didn't work.
I did:
matrix2 = (matrix.collect()[0]["pearson({})".format(vector_col)])
Then I got the following error when I tried to display it:
display(matrix2)
Exception: ML model display does not yet support model type <class 'pyspark.ml.linalg.DenseMatrix'>.
I wanted to put the column names from df4 back on the matrix, but did not succeed. I have read that I need to use df4.columns, but I have no idea how it works.
Finally, I wanted to plot the following graph, which I saw in a Medium article:
https://medium.com/towards-artificial-intelligence/feature-selection-and-dimensionality-reduction-using-covariance-matrix-plot-b4c7498abd07
But that didn't work either:
from sklearn.preprocessing import StandardScaler
stdsc = StandardScaler()
X_std = stdsc.fit_transform(df4.iloc[:,range(0,7)].values)
cov_mat =np.cov(X_std.T)
plt.figure(figsize=(10,10))
sns.set(font_scale=1.5)
hm = sns.heatmap(cov_mat,
                 cbar=True,
                 annot=True,
                 square=True,
                 fmt='.2f',
                 annot_kws={'size': 12},
                 cmap='coolwarm',
                 yticklabels=cols,
                 xticklabels=cols)
plt.title('Covariance matrix showing correlation coefficients', size = 18)
plt.tight_layout()
plt.show()
AttributeError: 'DataFrame' object has no attribute 'iloc'
I also tried replacing df4 with matrix2, and that didn't work either.
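You can use the following to get the correlation matrix in a form you can manipulate (matrix2 is the DenseMatrix you extracted with collect()):
matrix = matrix2.toArray().tolist()
From there you can convert it to a pandas DataFrame with pd.DataFrame(matrix, columns=df4.columns, index=df4.columns), which restores the column names and allows you to plot the heatmap, save to Excel, etc.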
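If pandas is not at hand, the column names can also be put back with the standard library alone. The sketch below assumes the flat values come from the .values attribute of the DenseMatrix (81 numbers for the 9 columns); for brevity it uses only the first three columns and their pairwise correlations from the output in the question. The helper name label_matrix is hypothetical.

```python
import csv
import io


def label_matrix(flat_values, columns):
    """Reshape a flat n*n list into {row_name: {col_name: value}}."""
    n = len(columns)
    if len(flat_values) != n * n:
        raise ValueError("expected %d values, got %d" % (n * n, len(flat_values)))
    # A correlation matrix is symmetric, so row-major and column-major
    # readings of the flat list give the same table.
    return {
        row: {col: flat_values[i * n + j] for j, col in enumerate(columns)}
        for i, row in enumerate(columns)
    }


columns = ["reference__YM_unix", "tenure_band", "cei_global_band"]
flat = [1.0, 0.0760092, 0.09051543,
        0.0760092, 1.0, 0.14792512,
        0.09051543, 0.14792512, 1.0]
table = label_matrix(flat, columns)

# Build CSV text that Excel can open; to write a file instead, replace
# io.StringIO() with open(<some path>, "w", newline="").
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow([""] + columns)
for row in columns:
    writer.writerow([row] + [table[row][col] for col in columns])
csv_text = buffer.getvalue()
print(csv_text)
```

With the full data, columns would be df4.columns and flat would be the 81 values printed in the question.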

Random sample for y variable in catplot seaborn

I'm new to Python and am trying to create a catplot (stripplot and swarmplot) with jitter in seaborn for x='region' and y='amount', using a random sample of 300 from my y variable. I have tried:
data_sample = data.sample(300)
y = data_sample['amount']
plt.figure(figsize=(8,8))
sns.catplot('region', y, data=data, jitter='1', kind='strip')
Which produces:
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all()
Can anyone explain what this error means and how to resolve it? Also, below the long error output it actually shows a catplot figure, labeled 'Figure size 2160x2160 with 0 Axes.'
Thank you, help appreciated.

How to calculate shap values for ADABoost model?

I am running 3 different models (Random Forest, Gradient Boosting, AdaBoost) and a model ensemble based on these 3 models.
I managed to use SHAP for GB and RF, but not for AdaBoost, which fails with the following error:
Exception Traceback (most recent call last)
in engine
----> 1 explainer = shap.TreeExplainer(model,data = explain_data.head(1000), model_output= 'probability')
/home/cdsw/.local/lib/python3.6/site-packages/shap/explainers/tree.py in __init__(self, model, data, model_output, feature_perturbation, **deprecated_options)
110 self.feature_perturbation = feature_perturbation
111 self.expected_value = None
--> 112 self.model = TreeEnsemble(model, self.data, self.data_missing)
113
114 if feature_perturbation not in feature_perturbation_codes:
/home/cdsw/.local/lib/python3.6/site-packages/shap/explainers/tree.py in __init__(self, model, data, data_missing)
752 self.tree_output = "probability"
753 else:
--> 754 raise Exception("Model type not yet supported by TreeExplainer: " + str(type(model)))
755
756 # build a dense numpy version of all the tree objects
Exception: Model type not yet supported by TreeExplainer: <class 'sklearn.ensemble._weight_boosting.AdaBoostClassifier'>
I found this link on GitHub, which states:
"TreeExplainer creates a TreeEnsemble object from whatever model type we are trying to explain, and then works with that downstream. So all you would need to do is and add another if statement in the TreeEnsemble constructor similar to the one for gradient boosting"
But I really don't know how to implement it, since I am quite new to this.
I had the same problem. What I did was modify the file discussed in the GitHub issue you mention.
In my case I use Windows, so the file is in C:\Users\my_user\AppData\Local\Continuum\anaconda3\Lib\site-packages\shap\explainers, but you can also double-click the error message and the file will open.
The next step is to add another elif, as the answer in the GitHub issue says. In my case I added it at line 404, as follows:
1) Modify the source code.
...
    self.objective = objective_name_map.get(model.criterion, None)
    self.tree_output = "probability"
elif str(type(model)).endswith("sklearn.ensemble.weight_boosting.AdaBoostClassifier'>"): #From this line I have modified the code
    scaling = 1.0 / len(model.estimators_) # output is average of trees
    self.trees = [Tree(e.tree_, normalize=True, scaling=scaling) for e in model.estimators_]
    self.objective = objective_name_map.get(model.base_estimator_.criterion, None) #This line is done to get the decision criteria, for example gini.
    self.tree_output = "probability" #This is the last line I added
elif str(type(model)).endswith("sklearn.ensemble.forest.ExtraTreesClassifier'>"): # TODO: add unit test for this case
    scaling = 1.0 / len(model.estimators_) # output is average of trees
    self.trees = [Tree(e.tree_, normalize=True, scaling=scaling) for e in model.estimators_]
...
Note that for the other models the shap code reads the 'criterion' attribute, which the AdaBoost classifier does not expose directly. Here the attribute is taken from the "weak" classifiers that AdaBoost was trained with, which is why I use model.base_estimator_.criterion.
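One caveat about the elif condition: the class path it matches depends on the installed scikit-learn. The traceback in the question reports sklearn.ensemble._weight_boosting.AdaBoostClassifier (with a leading underscore), while the check above matches sklearn.ensemble.weight_boosting, presumably the module path of an older release. A small sketch with plain strings, needing neither shap nor sklearn, shows why the check can silently miss and one possible way to cover both spellings. The variable names are illustrative; the type strings are copied from the traceback and from the elif.

```python
# Type string exactly as printed in the question's traceback.
reported_type = "<class 'sklearn.ensemble._weight_boosting.AdaBoostClassifier'>"

# Suffix used in the patched elif.
old_suffix = "sklearn.ensemble.weight_boosting.AdaBoostClassifier'>"
# Suffix matching the module path shown in the traceback.
new_suffix = "sklearn.ensemble._weight_boosting.AdaBoostClassifier'>"

# The original check alone does not match the reported type.
matches_old_only = reported_type.endswith(old_suffix)
# str.endswith accepts a tuple, so both spellings can be tested in one call.
matches_either = reported_type.endswith((old_suffix, new_suffix))

print(matches_old_only, matches_either)  # False True
```

If the patched branch never seems to run, printing str(type(model)) for the fitted model is a quick way to see which suffix applies.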
Finally, you have to import the library again, train your model and get the SHAP values. Here is an example:
2) Import the library again and try:
from sklearn import datasets
from sklearn.ensemble import AdaBoostClassifier
import shap
# import some data to play with
iris = datasets.load_iris()
X = iris.data
y = iris.target
ADABoost_model = AdaBoostClassifier()
ADABoost_model.fit(X, y)
shap_values = shap.TreeExplainer(ADABoost_model).shap_values(X)
shap.summary_plot(shap_values, X, plot_type="bar")
Which generates the following:
3) Get your new results:
It seems that the shap package has been updated and still does not support AdaBoostClassifier. Based on the previous answer, I adapted the change to the current shap/explainers/tree.py file, at lines 598-610:
### Added AdaBoostClassifier based on the outdated StackOverflow response and Github issue here
### https://stackoverflow.com/questions/60433389/how-to-calculate-shap-values-for-adaboost-model/61108156#61108156
### https://github.com/slundberg/shap/issues/335
elif safe_isinstance(model, ["sklearn.ensemble.AdaBoostClassifier", "sklearn.ensemble._weight_boosting.AdaBoostClassifier"]):
    assert hasattr(model, "estimators_"), "Model has no `estimators_`! Have you called `model.fit`?"
    self.internal_dtype = model.estimators_[0].tree_.value.dtype.type
    self.input_dtype = np.float32
    scaling = 1.0 / len(model.estimators_) # output is average of trees
    self.trees = [Tree(e.tree_, normalize=True, scaling=scaling) for e in model.estimators_]
    self.objective = objective_name_map.get(model.base_estimator_.criterion, None) #This line is done to get the decision criteria, for example gini.
    self.tree_output = "probability" #This is the last line added
I am also working on tests so that this can be added to the package :)

Plot 3d scattered data using gnuplot

Hi, I just got some 3D scattered data (the file is named data.txt), which looks like the following:
0 0 0
-1.08051e-16 -1.73991e-16 -1.79157e-16
-1.02169e-15 -1.19283e-15 5.92632e-16
3.41114e-16 -1.02211e-15 3.19436e-15
-4.51742e-15 -5.18861e-15 -4.60754e-15
-2.00685e-15 -4.67813e-15 -4.86101e-15
-9.82727e-16 -2.24413e-15 -5.87927e-16
-7.74439e-16 -9.73515e-16 -1.69707e-15
4.32668e-16 2.15869e-15 -2.25004e-15
-3.74495e-15 -2.20596e-15 -7.33201e-16
-4.97941e-16 -5.45749e-16 -2.93136e-15
-2.40174e-15 -4.31022e-15 7.13531e-15
-4.58812e-15 -4.38568e-15 -9.99635e-16
-7.00716e-15 7.53852e-15 -8.484e-15
4.50028e-15 2.2255e-15 2.32808e-15
-8.57887e-15 3.09127e-15 -3.49207e-15
-2.0608e-16 -6.06078e-15 -6.07822e-16
-7.76829e-15 -1.47001e-14 -1.08924e-14
1.04016e-15 6.33122e-16 -2.11985e-15
2.33557e-15 -7.92667e-15 2.52748e-15
6.94335e-15 3.70286e-15 -1.44815e-15
.........
The 1st, 2nd and 3rd columns represent the x, y and z axes, respectively.
I'd like to use the splot command to plot this data. Can anyone kindly give some suggestions? Thanks.
Since your data is nicely formatted, you could start with
splot 'data.txt'
If you want to get fancy, you can add some options to change how it is plotted:
splot 'data.txt' with points pointtype 7
What kind of suggestions are you looking for?
