Importing shapefile data using GeoDataFrame - geopandas

I am using GeoDataFrame to import data, but I run into the following problem. The call works well for some shapefiles yet fails for certain others, and I am wondering why:
from geopandas import GeoDataFrame
data = GeoDataFrame.from_file('bayarea_general.shp')
fiona/ogrext.pyx in fiona.ogrext.Iterator.__next__ (fiona/ogrext.c:17244)()
fiona/ogrext.pyx in fiona.ogrext.FeatureBuilder.build (fiona/ogrext.c:3254)()
IndexError: list index out of range
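
A diagnostic sketch (an assumption on my part, not from the original thread: the traceback suggests a single malformed feature record): iterate the shapefile with fiona directly and skip whatever fails to build.
import fiona
from geopandas import GeoDataFrame

records = []
with fiona.open('bayarea_general.shp') as src:
    it = iter(src)
    while True:
        try:
            records.append(next(it))
        except StopIteration:
            break
        except IndexError:
            # a feature failed to build; whether iteration can continue
            # past it depends on the fiona/GDAL versions in use
            print('skipped a malformed feature')
data = GeoDataFrame.from_features(records)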

How to use the TAPAS table question answering model when the table is big, e.g. containing 50000 rows?

I am trying to build a model in which I load a dataframe (an Excel file from Kaggle) and use the TAPAS-large-finetuned-wtq model to query this dataset. Querying 259 rows (62.9 KB of memory usage) worked fine, but querying 260 rows (63.1 KB) fails with the error "index out of range in self". I have attached a screenshot for reference. The data I used can be found in the Kaggle datasets.
The code I am using is:
from transformers import pipeline
import pandas as pd
import torch
# df = <the Kaggle dataframe loaded earlier from the Excel file>
question = "Which Country code has the quantity 30604?"
tqa = pipeline(task="table-question-answering", model="google/tapas-large-finetuned-wtq")
c = tqa(table=df[:100], query=question)['cells']
The last line raises the error, as shown in the screenshot.
What would be a way to work toward a solution? Any tips would be welcome.
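
One likely cause (an assumption, not confirmed in the thread) is that TAPAS has a 512-token input limit, so the flattened table plus the question stops fitting somewhere around row 260. A hedged workaround is to query the table in chunks; chunk_size below is a hypothetical value you would tune:
chunk_size = 100  # hypothetical; keep each chunk under the model's token limit
answers = []
for start in range(0, len(df), chunk_size):
    chunk = df.iloc[start:start + chunk_size].astype(str)  # TAPAS expects string cells
    answers.append(tqa(table=chunk, query=question)['cells'])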

Azure Machine Learning Service - dataset API question

I am trying to use the AutoML feature of AML. The sample notebook uses Dataset.Tabular.from_delimited_files(train_data), which only takes data from an https path. I am wondering how I can use a pandas dataframe directly in the AutoML config instead of using the dataset API. Alternatively, how can I convert a pandas dataframe to a tabular dataset to pass into the AutoML config?
You could quite easily save your pandas dataframe to parquet, upload the data to the workspace's default blob store and then create a Dataset from there:
# ws = <your AzureML workspace>
# df = <contains a pandas dataframe>
import os
from azureml.core.dataset import Dataset
os.makedirs('mydata', exist_ok=True)
df.to_parquet('mydata/myfilename.parquet')
dataref = ws.get_default_datastore().upload('mydata')
dataset = Dataset.Tabular.from_parquet_files(path=dataref.path('myfilename.parquet'))
dataset.to_pandas_dataframe()
Or you can just create the Dataset from local files in the portal at http://ml.azure.com
Once you have created it in the portal, it will provide you with the code to load it, which will look somewhat like this:
# azureml-core of version 1.0.72 or higher is required
from azureml.core import Workspace, Dataset
subscription_id = 'xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx'
resource_group = 'ignite'
workspace_name = 'ignite'
workspace = Workspace(subscription_id, resource_group, workspace_name)
dataset = Dataset.get_by_name(workspace, name='IBM-Employee-Attrition')
dataset.to_pandas_dataframe()
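
To close the loop on the original question, the tabular dataset can then be passed straight into the AutoML configuration. A minimal sketch, assuming the azureml-train-automl package is installed and a hypothetical label column named 'target':
from azureml.train.automl import AutoMLConfig
automl_config = AutoMLConfig(task='classification',
                             training_data=dataset,       # the tabular dataset created above
                             label_column_name='target')  # hypothetical label column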

Link map graph using D3.js

I have a website link map in a CSV file. It is represented by two columns, one for the origin URL and one for the target URL.
I would like to draw a chart that represents the relationships between all pages, and I'm looking for a way to implement this with D3.js.
I used the collapsible force layout from D3.js: https://bl.ocks.org/mbostock/1062288
As you can see, the JS parses a JSON file that represents the relationships between the nodes.
Here is what my mapping looks like: https://imgur.com/uzh1lnR
Currently, I have my data stored in a database. I can parse it and create a JSON/CSV file; I'm looking for an appropriate format for that JSON file.
Any help?
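
A common input format for D3 force layouts is a flat JSON object with a nodes array and a links array whose entries reference node indices (the collapsible example linked above starts from a hierarchy, but a link map is naturally a flat graph). A minimal sketch, not from the original thread, of producing that JSON from the two-column CSV; the file name and the 'origin'/'target' headers are assumptions:
import csv
import json

nodes, index, links = [], {}, []
with open('linkmap.csv', newline='') as f:  # hypothetical file name
    for row in csv.DictReader(f):
        for url in (row['origin'], row['target']):
            if url not in index:  # register each URL once as a node
                index[url] = len(nodes)
                nodes.append({'name': url})
        links.append({'source': index[row['origin']],
                      'target': index[row['target']]})
with open('linkmap.json', 'w') as f:
    json.dump({'nodes': nodes, 'links': links}, f)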

How to convert an enum data type into numeric in H2O

I have imported my dataset into H2O Flow. It has one column of categorical type that I want to convert into a numerical data type.
If I were using pandas for this task, I would do it like this:
df['category_column'] = df['category_column'].astype('category').cat.codes
How do I do this in H2O Flow?
I tried the following:
While parsing the data I changed the data type from enum to numeric, but the data then shows only ·.
I also tried the "convert to numeric" option, but it didn't work as I hoped.
I don't know whether I'm going in the right direction or not.
Please help me solve this issue.
Update on question as suggested:
Why does GLM force me to use a numerical column?
Error evaluating cell
When I use GLM to build a model with I as my response_column, I get the following error:
Error calling POST /3/ModelBuilders/glm with opts {"model_id":"glm-e2ed0066-636c-4c71-bf8...
ERROR MESSAGE: Illegal argument(s) for GLM model: glm-e2ed0066-636c-4c71-bf8c-04525eb05002. Details: ERRR on field: _response: Regression requires numeric response, got categorical. For more information visit: http://jira.h2o.ai/browse/TN-2
If you are using H2O's Python API, you can convert numeric columns to enum using .asfactor(), for example df['my_column'] = df['my_column'].asfactor()
In Flow, after you import the dataset you will see a data type drop-down menu next to each column name, where you can convert the data type to enum by selecting enum from the drop-down menu. You can also do this after you have parsed the dataset, when you view the data: there is a hyperlink within each row that you can click to convert the data type from numeric to enum.
Please see the documentation for more details: http://docs.h2o.ai/h2o/latest-stable/h2o-docs/flow.html#parsing-data
To run GLM on categorical data, set the family to "multinomial" (or "binomial" when there are only two classes).
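
For completeness, a short sketch of both conversions in the H2O Python API (the file and column names are hypothetical):
import h2o
h2o.init()
df = h2o.import_file('mydata.csv')  # hypothetical file
df['category_column'] = df['category_column'].asfactor()   # numeric -> enum
df['category_column'] = df['category_column'].asnumeric()  # enum -> numeric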

datatables + adding a filter per column

How do I get the search/filter to work per column?
I am making small steps in building, and constantly adding to, my data table, which is pretty dynamic at this stage. It basically builds a datatable based on the data that is fed into it. I have now added the footer to act as a search/filter, but unfortunately this is where I have become stuck: I cannot get the filter part to work. Advice greatly appreciated.
Here is the sample data table that I am working on: http://live.datatables.net/qociwesi/2/edit
It basically has a dTableControl object that builds my table.
To build my table I need to call loadDataFromSocket, which does the following:
//then I have this function for loading my data and creating my tables
//file is an array of objects
//formatFunc is a function that formats the data in the datatable; it is stored in options and passed to dTableControl (not used in this example)
//ch gets the keys from file[0], which will be the channel headers
//then I add the headers
//then I add the footers
//then I create the table
//then I build the rows using the correct values from file
//then I draw, which draws all the rows that were built
//now the tricky part: applying the search to each column
So I have got this far, but the search per column is not working. How do I get the search/filter to work per column?
Note this is a very basic working example that I have been working off: http://jsfiddle.net/HattrickNZ/t12w3a65/
You should use t1.oTable to access the DataTables API; see the updated example for a demonstration. (With the DataTables 1.10 API, per-column filtering is typically wired up by binding a keyup handler to each footer input and calling column().search(value).draw(); the legacy equivalent is oTable.fnFilter(value, columnIndex).)
Please compare your code with the jsFiddle in your question, notice its simplicity, and consider rewriting your code.
