Area differs after dissolve in Geopandas? - geopandas

I am getting different values for area after dissolving a multipolygon using geopandas. Not sure what I'm doing wrong.
temp_file = gpd.read_file("temp.shp")
print('original area: ',sum(temp_file.area))
temp_file['dissolve']=1
dissolved = temp_file.dissolve(by='dissolve')
print ('dissolved area: ',dissolved.area.values)
print (dissolved.columns)
I get the following outputs:
original area: 13162193.658826955
dissolved area: [9087595.31178884]
Index(['geometry', 'old_index', 'CALCACRES', 'category', 'farm_id', 'luse_18'], dtype='object')

Related

to_crs("epsg:4326") retruns different coordinate

I am trying to change coordinates system on my geopandas dataframe from epsg:5179 to epsg:4236.
BUT, .to_crs("epsg:4326") retruns different coordinates... How can I get true cordinates?
geo[geometry].set_crs("epsg:5179", inplace = True)
geo_df = geo[geometry].to_crs("epsg:4326")
Original
LINESTRING (14138122.900 4519000.200, 14138248...LINESTRING (14135761.800 4518881.600, 14135799...
Changed-proj
LINESTRING (-149.90927 12.31701, -149.90912 12...LINESTRING (-149.91219 12.32162, -149.91215 12...
It seems like you got true coordinates with your code which is :
geo[geometry].set_crs("epsg:5179", inplace = True)
geo_df = geo[geometry].to_crs("epsg:4326")
I've been looking through pyproj, and couldn't find error to change coordinates epsg:5179 to epsg:4326.
If you want to get futher more information about cordinates, you can visit here.

How to calculate shap values for ADABoost model?

I am running 3 different model (Random forest, Gradient Boosting, Ada Boost) and a model ensemble based on these 3 models.
I managed to use SHAP for GB and RF but not for ADA with the following error:
Exception Traceback (most recent call last)
in engine
----> 1 explainer = shap.TreeExplainer(model,data = explain_data.head(1000), model_output= 'probability')
/home/cdsw/.local/lib/python3.6/site-packages/shap/explainers/tree.py in __init__(self, model, data, model_output, feature_perturbation, **deprecated_options)
110 self.feature_perturbation = feature_perturbation
111 self.expected_value = None
--> 112 self.model = TreeEnsemble(model, self.data, self.data_missing)
113
114 if feature_perturbation not in feature_perturbation_codes:
/home/cdsw/.local/lib/python3.6/site-packages/shap/explainers/tree.py in __init__(self, model, data, data_missing)
752 self.tree_output = "probability"
753 else:
--> 754 raise Exception("Model type not yet supported by TreeExplainer: " + str(type(model)))
755
756 # build a dense numpy version of all the tree objects
Exception: Model type not yet supported by TreeExplainer: <class 'sklearn.ensemble._weight_boosting.AdaBoostClassifier'>
I found this link on Git that state
TreeExplainer creates a TreeEnsemble object from whatever model type we are trying to explain, and then works with that downstream. So all you would need to do is and add another if statement in the
TreeEnsemble constructor similar to the one for gradient boosting
But I really don't know how to implement it since I quite new to this.
I had the same problem and what I did, was to modify the file in the git you are commenting.
In my case I use windows so the file is in C:\Users\my_user\AppData\Local\Continuum\anaconda3\Lib\site-packages\shap\explainers but you can do double click over the error message and the file will be opened.
The next step is to add another elif as the answer of the git help says. In my case I did it from the line 404 as following:
1) Modify the source code.
...
self.objective = objective_name_map.get(model.criterion, None)
self.tree_output = "probability"
elif str(type(model)).endswith("sklearn.ensemble.weight_boosting.AdaBoostClassifier'>"): #From this line I have modified the code
scaling = 1.0 / len(model.estimators_) # output is average of trees
self.trees = [Tree(e.tree_, normalize=True, scaling=scaling) for e in model.estimators_]
self.objective = objective_name_map.get(model.base_estimator_.criterion, None) #This line is done to get the decision criteria, for example gini.
self.tree_output = "probability" #This is the last line I added
elif str(type(model)).endswith("sklearn.ensemble.forest.ExtraTreesClassifier'>"): # TODO: add unit test for this case
scaling = 1.0 / len(model.estimators_) # output is average of trees
self.trees = [Tree(e.tree_, normalize=True, scaling=scaling) for e in model.estimators_]
...
Note in the other models, the code of shap needs the attribute 'criterion' that the AdaBoost classifier doesn't have in a direct way. So in this case this attribute is obtained from the "weak" classifiers with the AdaBoost has been trained, that's why I add model.base_estimator_.criterion .
Finally you have to import the library again, train your model and get the shap values. I leave an example:
2) Import again the library and try:
from sklearn import datasets
from sklearn.ensemble import AdaBoostClassifier
import shap
# import some data to play with
iris = datasets.load_iris()
X = iris.data
y = iris.target
ADABoost_model = AdaBoostClassifier()
ADABoost_model.fit(X, y)
shap_values = shap.TreeExplainer(ADABoost_model).shap_values(X)
shap.summary_plot(shap_values, X, plot_type="bar")
Which generates the following:
3) Get your new results:
It seems that the shap package has been updated and still does not contain the AdaBoostClassifier. Based on the previous answer, I've modified the previous answer to work with the shap/explainers/tree.py file in lines 598-610
### Added AdaBoostClassifier based on the outdated StackOverflow response and Github issue here
### https://stackoverflow.com/questions/60433389/how-to-calculate-shap-values-for-adaboost-model/61108156#61108156
### https://github.com/slundberg/shap/issues/335
elif safe_isinstance(model, ["sklearn.ensemble.AdaBoostClassifier", "sklearn.ensemble._weighted_boosting.AdaBoostClassifier"]):
assert hasattr(model, "estimators_"), "Model has no `estimators_`! Have you called `model.fit`?"
self.internal_dtype = model.estimators_[0].tree_.value.dtype.type
self.input_dtype = np.float32
scaling = 1.0 / len(model.estimators_) # output is average of trees
self.trees = [Tree(e.tree_, normalize=True, scaling=scaling) for e in model.estimators_]
self.objective = objective_name_map.get(model.base_estimator_.criterion, None) #This line is done to get the decision criteria, for example gini.
self.tree_output = "probability" #This is the last line added
Also working on testing to add this to the package :)

How to update bokeh active interaction with GeoJSON as data source?

I have made an interactive choropleth map with bokeh, and I'm trying to add active interactions using the dropdown widget (Select). However, most tutorials and SO questions about active interactions use ColumnDataSource, and not GeoJSONDataSource.
The issue is that GeoJSONDataSource doesn't have a .data method like ColumnDataSource does, so idk exactly how the syntax works when updating it.
My dataset is a dictionary in the form of city_dict = {'Amsterdam': <some data frame>, 'Antwerp': <some data frame>, ...}, where the dataframe is in geojson format. I have already confirmed that this format works when making glyphs.
def update(attr, old, new):
s_value = dropdown.value
p.title.text = '%s', s_value
new_src1 = make_dataset(s_value)
val1 = GeoJSONDataSource(new_src1)
r1.data_source = val1
where make_dataset is a function that transforms my original dataset into a dataset that can feed into the GeoJSONDataSource function. make_dataset requires a string (name of the city) to work eg. 'Amsterdam'. It works on passive interactions.
The main plot code (removed unnecessary stuff) is:
dropdown = Select(value='Amsterdam', options = cities)
controls = WidgetBox(dropdown)
initial_city = 'Amsterdam'
a = make_dataset(initial_city)
src1 = GeoJSONDataSource(a)
p = figure(title = 'Amsterdam', plot_height = 750 , plot_width = 900, toolbar_location = 'right')
r1 = p.patches('xs','ys', source = src1, fill_color = {'field' :'norm', 'transform' : color_mapper})
dropdown.on_change('value', update)
layout = row(controls, p)
curdoc().add_root(layout)
I've added the error I get. error handling message Message 'PATCH-DOC' (revision 1) content: {'events': [{'kind': 'ModelChanged', 'model': {'type': 'Select', 'id': '1147'}, 'attr': 'value', 'new': 'Antwerp'}], 'references': []}: ValueError("expected a value of type str, got ('%s', 'Antwerp') of type tuple",)

Pygal bar chart says "No Data"

I am trying to create a bar graph in pygal that uses the api for github and charts the most popular projects based on stars. I posted my code below, but I cannot figure out why my graph keep saying "No Data"??? Any suggestions? Thanks!
import requests
import pygal
from pygal.style import LightColorizedStyle as LCS, LightenStyle as LS
url = 'https://api.github.com/search/repositories?q=language:python&sort=stars'
r = requests.get(url)
print("Status code:", r.status_code)
response_dict = r.json()
print('Total repositories:', response_dict['total_count'])
repo_dicts = response_dict['items']
names,stars = [],[]
for repo_dict in repo_dicts:
names.append(repo_dict['name'])
stars.append(repo_dict['stargazers_count'])
my_style = LS('#333366',base_style=LCS)
chart = pygal.Bar(style=my_style,x_label_rotation=45,show_legend=False)
chart.title = 'Most Starred Python Projects on GitHub'
chart.x_labels = names
chart.add = ('',stars)
chart.render_to_file('python_repos.svg')
on the second last line of your code, chart.add=('',stars) there should not be a '=' equal sign there , it should be chart.add('',stars) then the code should work! :)

How to Repeat Table Column Headings over Page Breaks in PDF output from ReportLab

I'm using ReportLab to write tables in PDF documents and am very pleased with the results (despite not having a total grasp on flowables just yet).
However, I have not been able to figure out how to make a table that spans a page break have its column headings repeated.
The code below creates a test.pdf in C:\Temp that has a heading row followed by 99 rows of data.
The heading row looks great on the first page but I would like that to repeat at the top of the second and third pages.
I'm keen to hear of any approaches that have been used to accomplish that using the SimpleDocTemplate.
from reportlab.platypus import SimpleDocTemplate, Table, TableStyle, Paragraph, Frame, Spacer
from reportlab.lib import colors
from reportlab.lib.units import cm
from reportlab.lib.pagesizes import A3, A4, landscape, portrait
from reportlab.lib.styles import ParagraphStyle, getSampleStyleSheet
from reportlab.lib.enums import TA_LEFT, TA_RIGHT, TA_CENTER, TA_JUSTIFY
from reportlab.pdfgen import canvas
pdfReportPages = "C:\\Temp\\test.pdf"
doc = SimpleDocTemplate(pdfReportPages, pagesize=A4)
# container for the "Flowable" objects
elements = []
styles=getSampleStyleSheet()
styleN = styles["Normal"]
# Make heading for each column
column1Heading = Paragraph("<para align=center>COLUMN ONE HEADING</para>",styles['Normal'])
column2Heading = Paragraph("<para align=center>COLUMN TWO HEADING</para>",styles['Normal'])
row_array = [column1Heading,column2Heading]
tableHeading = [row_array]
tH = Table(tableHeading, [6 * cm, 6 * cm]) # These are the column widths for the headings on the table
tH.hAlign = 'LEFT'
tblStyle = TableStyle([('TEXTCOLOR',(0,0),(-1,-1),colors.black),
('VALIGN',(0,0),(-1,-1),'TOP'),
('BOX',(0,0),(-1,-1),1,colors.black),
('BOX',(0,0),(0,-1),1,colors.black)])
tblStyle.add('BACKGROUND',(0,0),(-1,-1),colors.lightblue)
tH.setStyle(tblStyle)
elements.append(tH)
# Assemble rows of data for each column
for i in range(1,100):
column1Data = Paragraph("<para align=center> " + "Row " + str(i) + " Column 1 Data" + "</font> </para>",styles['Normal'])
column2Data = Paragraph("<para align=center> " + "Row " + str(i) + " Column 2 Data" + "</font> </para>",styles['Normal'])
row_array = [column1Data,column2Data]
tableRow = [row_array]
tR=Table(tableRow, [6 * cm, 6 * cm])
tR.hAlign = 'LEFT'
tR.setStyle(TableStyle([('BACKGROUND',(0,0),(-1,-1),colors.white),
('TEXTCOLOR',(0,0),(-1,-1),colors.black),
('VALIGN',(0,0),(-1,-1),'TOP'),
('BOX',(0,0),(-1,-1),1,colors.black),
('BOX',(0,0),(0,-1),1,colors.black)]))
elements.append(tR)
del tR
elements.append(Spacer(1, 0.3 * cm))
doc.build(elements)
From the documentation (yes, I know, but it's sometimes hard to locate this stuff in the manual):
The repeatRows argument specifies the number of leading rows that
should be repeated when the Table is asked to split itself.
So when you create the table, this is one of the arguments you can pass, and it will turn the first n rows into header rows that repeat. You'll find this part of the text on page 77, but the section relating to creating a Table starts on page 76.
http://www.reportlab.com/docs/reportlab-userguide.pdf
This is the code I developed, after following Gordon's advice to reconsider using repeatRows, and it works!
from reportlab.platypus import SimpleDocTemplate, Table, TableStyle, Paragraph, Frame, Spacer
from reportlab.lib import colors
from reportlab.lib.units import cm
from reportlab.lib.pagesizes import A3, A4, landscape, portrait
from reportlab.lib.styles import ParagraphStyle, getSampleStyleSheet
from reportlab.lib.enums import TA_LEFT, TA_RIGHT, TA_CENTER, TA_JUSTIFY
from reportlab.pdfgen import canvas
pdfReportPages = "C:\\Temp\\test.pdf"
doc = SimpleDocTemplate(pdfReportPages, pagesize=A4)
# container for the "Flowable" objects
elements = []
styles=getSampleStyleSheet()
styleN = styles["Normal"]
# Make heading for each column and start data list
column1Heading = "COLUMN ONE HEADING"
column2Heading = "COLUMN TWO HEADING"
# Assemble data for each column using simple loop to append it into data list
data = [[column1Heading,column2Heading]]
for i in range(1,100):
data.append([str(i),str(i)])
tableThatSplitsOverPages = Table(data, [6 * cm, 6 * cm], repeatRows=1)
tableThatSplitsOverPages.hAlign = 'LEFT'
tblStyle = TableStyle([('TEXTCOLOR',(0,0),(-1,-1),colors.black),
('VALIGN',(0,0),(-1,-1),'TOP'),
('LINEBELOW',(0,0),(-1,-1),1,colors.black),
('BOX',(0,0),(-1,-1),1,colors.black),
('BOX',(0,0),(0,-1),1,colors.black)])
tblStyle.add('BACKGROUND',(0,0),(1,0),colors.lightblue)
tblStyle.add('BACKGROUND',(0,1),(-1,-1),colors.white)
tableThatSplitsOverPages.setStyle(tblStyle)
elements.append(tableThatSplitsOverPages)
doc.build(elements)
Use the repeatRows=1 when you create the Table...
from reportlab.platypus import Table
Table(data,repeatRows=1)
I always like to have something you can cut & paste into a .py file to run and test. So here it is...
import os
import pandas as pd
import numpy as np
import reportlab.platypus
import reportlab.lib.styles
from reportlab.lib import colors
from reportlab.lib.units import mm
from reportlab.lib.pagesizes import letter, landscape
reportoutputfilepath = os.path.join('.\\test.pdf')
pdf_file = reportlab.platypus.SimpleDocTemplate(
reportoutputfilepath,
pagesize=landscape(letter),
rightMargin=10,
leftMargin=10,
topMargin=38,
bottomMargin=23
)
ts_tables = [
('ALIGN', (4,0), (-1,-1), 'RIGHT'),
('LINEBELOW', (0,0), (-1,0), 1, colors.purple),
('FONT', (0,0), (-1,0), 'Times-Bold'),
('LINEABOVE', (0,-1), (-1,-1), 1, colors.purple),
('FONT', (0,-1), (-1,-1), 'Times-Bold'),
('BACKGROUND',(1,1),(-2,-2),colors.white),
('TEXTCOLOR',(0,0),(1,-1),colors.black),
('FONTSIZE', (0,0),(-1,-1), 8),
]
df = pd.DataFrame(np.random.randint(0,1000,size=(1000, 4)), columns=list('ABCD'))
lista = [df.columns[:,].values.astype(str).tolist()] + df.values.tolist()
#Here is where you put repeatRows=1
table = reportlab.platypus.Table(lista, colWidths=(20*mm, 20*mm, 20*mm, 20*mm),repeatRows=1)
table_style = reportlab.platypus.TableStyle(ts_tables)
table.setStyle(table_style)
elements = []
elements.append(table)
# Build the PDF
pdf_file.build(elements)
print reportoutputfilepath
t1 = Table(lista, colWidths=220, rowHeights=20, repeatRows=1)
just type repeatRows=1
I found this solution to repeat easily the header on a table which is on two pages. Add this line in your CSS for your table:
-fs-table-paginate: paginate;
I also found a class for FPDF which seems powerful (i don't need it for the moment, so I didn't test it)
http://interpid.eu/fpdf-table

Resources