Creating an alias for a pandera type: DataFrame with a particular pandera schema - python-typing

I want to define a type to be a DataFrame with a particular pandera schema. However, when I lint this code:
import pandera as pa
from pandera.typing import DataFrame, Series

class MySchema(pa.SchemaModel):
    foo: Series[str]

MyDF = DataFrame[MySchema]

def myfun(df: MyDF) -> bool:
    return True
I get
myproject: myfile.py: note: In function "myfun":
myproject: myfile.py:10:15: error: Variable "myproject.myproject.myfile.MyDF" is not valid as a type [valid-type]
I understand I can avoid this error by specifying df: DataFrame[MySchema] in the signature of myfun instead. Is there some way I can define the alias MyDF in a valid manner?
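One possible way around it (not from the original post, just a sketch): PEP 613 explicit type aliases make mypy treat the assignment as a type alias rather than a variable, which may be enough here, assuming the pandera mypy integration accepts DataFrame[MySchema] as a type:

# Sketch assuming PEP 613 support: typing.TypeAlias on Python 3.10+,
# or typing_extensions.TypeAlias on older versions.
# MySchema is the schema model defined in the question above.
from typing import TypeAlias

from pandera.typing import DataFrame

MyDF: TypeAlias = DataFrame[MySchema]

def myfun(df: MyDF) -> bool:
    return True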

Related

Streamlit Unhashable TypeError when I use st.cache

When I use the st.cache decorator to cache a hugging-face transformer model I get an
Unhashable TypeError
This is the code:
from transformers import pipeline
import streamlit as st
from io import StringIO

@st.cache(hash_funcs={StringIO: StringIO.getvalue})
def model():
    return pipeline("sentiment-analysis", model='akhooli/xlm-r-large-arabic-sent')
After searching the issues section of the streamlit repo,
I found that the hashing argument is not required; you just need to pass
allow_output_mutation = True
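For reference, a minimal sketch of that suggestion (assuming the legacy st.cache API and the same model name as in the question):

import streamlit as st
from transformers import pipeline

# Cache the pipeline and skip hashing of the (mutable) returned object.
@st.cache(allow_output_mutation=True)
def model():
    return pipeline("sentiment-analysis", model='akhooli/xlm-r-large-arabic-sent')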
This worked for me:
from transformers import pipeline
import tokenizers
import streamlit as st
import copy

@st.cache(hash_funcs={tokenizers.Tokenizer: lambda _: None, tokenizers.AddedToken: lambda _: None})
def get_model():
    return pipeline("sentiment-analysis", model='akhooli/xlm-r-large-arabic-sent')

input = st.text_input('Text')
bt = st.button("Get Sentiment Analysis")
if bt and input:
    model = copy.deepcopy(get_model())
    st.write(model(input))
Note 1:
Calling the pipeline with input, model(input), changes the model, and we shouldn't change a cached value, so we need to copy the model and run the pipeline on the copy.
Note 2:
The first run will load the model using the get_model function; the next run will use the cache.
Note 3:
You can read more about advanced caching in streamlit in their documentation.

Validate Python TypedDict at runtime

I'm working in a Python 3.8+ Django/Rest-Framework environment enforcing types in new code but built on a lot of untyped legacy code and data. We are using TypedDicts extensively for ensuring that data we are generating passes to our TypeScript front-end with the proper data type.
MyPy/PyCharm/etc. does a great job of checking that our new code spits out data that conforms, but we want to test that the output of our many RestSerializers/ModelSerializers fits the TypedDict. If I have a serializer and typed dict like:
class PersonSerializer(ModelSerializer):
    class Meta:
        model = Person
        fields = ['first', 'last']

class PersonData(TypedDict):
    first: str
    last: str
    email: str
and then run code like:
person_dict: PersonData = PersonSerializer(Person.objects.first()).data
Static type checkers aren't able to figure out that person_dict is missing the required email key, because (by design of PEP 589) it is just a normal dict. But I can write something like:
annotations = PersonData.__annotations__
for k in annotations:
    assert k in person_dict  # or something more complex.
    assert isinstance(person_dict[k], annotations[k])
and it will find that email is missing from the data of the serializer. This is well and good in this case, where I don't have any changes introduced by from __future__ import annotations (not sure if this would break it), and all my type annotations are bare types. But if PersonData were defined like:
class PersonData(TypedDict):
    email: Optional[str]
    affiliations: Union[List[str], Dict[int, str]]
then isinstance is not good enough to check if the data passes (since "Subscripted generics cannot be used with class and instance checks").
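(As an aside, not part of the original question: the standard library can take such subscripted generics apart at runtime via typing.get_origin and typing.get_args, which is roughly what a hand-rolled checker would need to do. A minimal sketch:)

from typing import Dict, List, Union, get_args, get_origin

tp = Union[List[str], Dict[int, str]]
print(get_origin(tp))  # typing.Union
print(get_args(tp))    # (typing.List[str], typing.Dict[int, str])
# isinstance() checks then have to be written against the origin of each member,
# e.g. list for List[str] and dict for Dict[int, str].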
What I'm wondering is if there already exists a callable function/method (in mypy or another checker) that would allow me to validate a TypedDict (or even a single variable, since I can iterate a dict myself) against an annotation and see if it validates?
I'm not concerned about speed, etc., since the point of this is to check all our data/methods/functions once and then remove the checks later once we're happy that our current data validates.
The simplest solution I found works using pydantic.
from typing import cast, TypedDict

import pydantic

class SomeDict(TypedDict):
    val: int
    name: str

# this could be a valid/invalid declaration
obj: SomeDict = {
    'val': 12,
    'name': 'John',
}

# validate with pydantic
try:
    obj = cast(SomeDict, pydantic.create_model_from_typeddict(SomeDict)(**obj).dict())
except pydantic.ValidationError as exc:
    print(f"ERROR: Invalid schema: {exc}")
EDIT: When type checking this, it currently returns an error, but works as expected. See here: https://github.com/samuelcolvin/pydantic/issues/3008
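A possible alternative, not from the answer above: on pydantic v2 the same check can be written with TypeAdapter (a sketch, assuming pydantic >= 2 is installed):

from typing import TypedDict

import pydantic

class SomeDict(TypedDict):
    val: int
    name: str

# TypeAdapter validates a plain TypedDict directly and raises ValidationError on mismatch.
try:
    obj = pydantic.TypeAdapter(SomeDict).validate_python({'val': 12, 'name': 'John'})
except pydantic.ValidationError as exc:
    print(f"ERROR: Invalid schema: {exc}")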
You may want to have a look at https://pypi.org/project/strongtyping/. This may help.
In the docs you can find this example:
from typing import List, TypedDict

from strongtyping.strong_typing import match_class_typing

@match_class_typing
class SalesSummary(TypedDict):
    sales: int
    country: str
    product_codes: List[str]

# works as expected
SalesSummary({"sales": 10, "country": "Foo", "product_codes": ["1", "2", "3"]})

# will raise a TypeMisMatch
SalesSummary({"sales": "Foo", "country": 10, "product_codes": [1, 2, 3]})
A little bit of a hack, but you can check two types using the mypy command line -c option. Just wrap it in a Python function:
import subprocess

def is_assignable(type_to, type_from) -> bool:
    """
    Returns true if `type_from` can be assigned to `type_to`,
    e.g. type_to := type_from

    Example:
    >>> is_assignable(bool, str)
    False
    >>> from typing import *
    >>> is_assignable(Union[List[str], Dict[int, str]], List[str])
    True
    """
    code = "\n".join((
        "import typing",
        f"type_to: {type_to}",
        f"type_from: {type_from}",
        "type_to = type_from",
    ))
    return subprocess.call(("mypy", "-c", code)) == 0
You could do something like this:
from typing import Any

def validate(typ: Any, instance: Any) -> bool:
    for property_name, property_type in typ.__annotations__.items():
        value = instance.get(property_name, None)
        if value is None:
            # Check for missing keys
            print(f"Missing key: {property_name}")
            return False
        elif property_type not in (int, float, bool, str):
            # check if property_type is an object (i.e. not a primitive)
            result = validate(property_type, value)
            if result is False:
                return False
        elif not isinstance(value, property_type):
            # Check for type equality
            print(f"Wrong type: {property_name}. Expected {property_type}, got {type(value)}")
            return False
    return True
And then test some object, e.g. one that was passed to your REST endpoint:
class MySubModel(TypedDict):
    subfield: bool

class MyModel(TypedDict):
    first: str
    last: str
    email: str
    sub: MySubModel

m = {
    'email': 'JohnDoeAtDoeishDotCom',
    'first': 'John'
}

assert validate(MyModel, m) is False
This one prints the first error and returns a bool; you could change that to raise exceptions, possibly collecting all the missing keys. You could also extend it to fail on keys beyond those defined by the model, as sketched below.
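A minimal sketch of that extension (a hypothetical helper, not part of the answer above), reusing MyModel from the example:

def has_extra_keys(typ, instance) -> bool:
    """Return True if `instance` has keys that the TypedDict `typ` does not declare."""
    extra = set(instance) - set(typ.__annotations__)
    if extra:
        print(f"Unexpected keys: {', '.join(sorted(extra))}")
    return bool(extra)

# 'nickname' is not declared on MyModel, so this prints a warning and returns True.
assert has_extra_keys(MyModel, {'first': 'John', 'nickname': 'JD'}) is True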
I like your solution! To save users from fixing errors one at a time, I added some code to your solution so it collects all the problems at once :D
from typing import Any

def validate_custom_typed_dict(instance: Any, custom_typed_dict: Any) -> bool:
    key_errors = []
    type_errors = []
    for property_name, type_ in custom_typed_dict.__annotations__.items():
        value = instance.get(property_name, None)
        if value is None:
            # Check for missing keys
            key_errors.append(f"\t- Missing property: '{property_name}' \n")
        elif type_ not in (int, float, bool, str):
            # check if type_ is an object (i.e. not a primitive): validate it recursively
            try:
                validate_custom_typed_dict(value, type_)
            except Exception:
                type_errors.append(f"\t- '{property_name}' expected {type_}, got {type(value)}\n")
        elif not isinstance(value, type_):
            # Check for type equality
            type_errors.append(f"\t- '{property_name}' expected {type_}, got {type(value)}\n")
    if len(key_errors) > 0 or len(type_errors) > 0:
        error_message = f'\n{"".join(key_errors)}{"".join(type_errors)}'
        raise Exception(error_message)
    return True
Some console output:
Exception:
- Missing property: 'Combined_cycle'
- Missing property: 'Solar_PV'
- Missing property: 'Hydro'
- 'timestamp' expected <class 'str'>, got <class 'int'>
- 'Diesel_engines' expected <class 'float'>, got <class 'int'>

How to make graphql-codegen handle a schema with string templates in typescript/javascript exports

I am using graphql-codegen to generate the typescript type files for a given schema.
All good, except that if there is a string template in the exported schema, it complains about the syntax and seems unable to compile it. Check the following files for more details.
The types.js is below:
const gql = require("graphql-tag");

const Colors = ["Red", "Yellow"];

export default gql`
  type Person {
    name: String
  }

  enum Color {
    ${Colors.join("\n")}
  }

  # if we change the above enum to be the below, it will be good
  # enum Color {
  #   Red
  #   Yellow
  # }
`;
And the config yml file is:
schema:
  - types.js
generates:
  generated.ts:
    plugins:
      - typescript
      - typescript-resolvers
When running yarn graphql:codegen, it complains as below:
Found 1 error
✖ generated.ts
Failed to load schema from types.js:
Syntax Error: Expected Name, found }
GraphQLError: Syntax Error: Expected Name, found }
at syntaxError (/Users/rliu/study/reproduce-graphql-codegen-with-string-template-in-js-export/node_modules/graphql/error/syntaxError.js:15:10)
at Parser.expectToken (/Users/rliu/study/reproduce-graphql-codegen-with-string-template-in-js-export/node_modules/graphql/language/parser.js:1404:40)
at Parser.parseName (/Users/rliu/study/reproduce-graphql-codegen-with-string-template-in-js-export/node_modules/graphql/language/parser.js:94:22)
at Parser.parseEnumValueDefinition (/Users/rliu/study/reproduce-graphql-codegen-with-string-template-in-js-export/node_modules/graphql/language/parser.js:1014:21)
at Parser.optionalMany (/Users/rliu/study/reproduce-graphql-codegen-with-string-template-in-js-export/node_modules/graphql/language/parser.js:1497:28)
at Parser.parseEnumValuesDefinition (/Users/rliu/study/reproduce-graphql-codegen-with-string-template-in-js-export/node_modules/graphql/language/parser.js:1002:17)
at Parser.parseEnumTypeDefinition (/Users/rliu/study/reproduce-graphql-codegen-with-string-template-in-js-export/node_modules/graphql/language/parser.js:986:23)
at Parser.parseTypeSystemDefinition (/Users/rliu/study/reproduce-graphql-codegen-with-string-template-in-js-export/node_modules/graphql/language/parser.js:705:23)
at Parser.parseDefinition (/Users/rliu/study/reproduce-graphql-codegen-with-string-template-in-js-export/node_modules/graphql/language/parser.js:146:23)
at Parser.many (/Users/rliu/study/reproduce-graphql-codegen-with-string-template-in-js-export/node_modules/graphql/language/parser.js:1518:26)
at Parser.parseDocument (/Users/rliu/study/reproduce-graphql-codegen-with-string-template-in-js-export/node_modules/graphql/language/parser.js:111:25)
at Object.parse (/Users/rliu/study/reproduce-graphql-codegen-with-string-template-in-js-export/node_modules/graphql/language/parser.js:36:17)
at Object.parseGraphQLSDL (/Users/rliu/study/reproduce-graphql-codegen-with-string-template-in-js-export/node_modules/@graphql-toolkit/common/index.cjs.js:572:28)
at CodeFileLoader.load (/Users/rliu/study/reproduce-graphql-codegen-with-string-template-in-js-export/node_modules/@graphql-toolkit/code-file-loader/index.cjs.js:120:31)
at async /Users/rliu/study/reproduce-graphql-codegen-with-string-template-in-js-export/node_modules/@graphql-toolkit/core/index.cjs.js:682:25
at async Promise.all (index 0)
GraphQL Code Generator supports:
- ES Modules and CommonJS exports (export as default or named export "schema")
- Introspection JSON File
- URL of GraphQL endpoint
- Multiple files with type definitions (glob expression)
- String in config file
Try to use one of above options and run codegen again.
Error: Failed to load schema
at loadSchema (/Users/rliu/study/reproduce-graphql-codegen-with-string-template-in-js-export/node_modules/@graphql-codegen/cli/bin.js:353:15)
Error: Failed to load schema
at loadSchema (/Users/rliu/study/reproduce-graphql-codegen-with-string-template-in-js-export/node_modules/@graphql-codegen/cli/bin.js:353:15)
Looks like graphql-codegen didn't like a template string such as ${Colors.join('\n')}.
Also, please have a look at the repo containing all the above files.
Can anyone help to fix this? Thanks.
It's not handled, mostly because of the complexity of loading code files and interpolating them.
But the way to work around it is:
Create a single schema.js file.
Import all typedefs with string interpolation from your source files, and use buildSchema (from graphql) or makeExecutableSchema (from graphql-tools) to build a GraphQLSchema object instance.
Export your GraphQLSchema as the default export, or as an identifier named schema.
Provide that file to codegen (by doing schema: ./schema.js); using a single code file causes the codegen to look for AST code and then try to require it.
If you are using TypeScript, you should also add require extension to the codegen (see https://graphql-code-generator.com/docs/getting-started/require-field#typescript-support)

How to access JSON from external data source in Terraform?

I am receiving JSON from a http terraform data source
data "http" "example" {
url = "${var.cloudwatch_endpoint}/api/v0/components"
# Optional request headers
request_headers {
"Accept" = "application/json"
"X-Api-Key" = "${var.api_key}"
}
}
It outputs the following.
http = [{"componentID":"k8QEbeuHdDnU","name":"Jenkins","description":"","status":"Partial Outage","order":1553796836},{"componentID":"ui","name":"ui","description":"","status":"Operational","order":1554483781},{"componentID":"auth","name":"auth","description":"","status":"Operational","order":1554483781},{"componentID":"elig","name":"elig","description":"","status":"Operational","order":1554483781},{"componentID":"kong","name":"kong","description":"","status":"Operational","order":1554483781}]
which is a string in terraform. In order to convert this string into JSON I pass it to an external data source which is a simple ruby function. Here is the terraform to pass it.
data "external" "component_ids" {
program = ["ruby", "./fetchComponent.rb",]
query = {
data = "${data.http.example.body}"
}
}
Here is the ruby function
#!/usr/bin/env ruby
require 'json'
data = JSON.parse(STDIN.read)
results = data.to_json
STDOUT.write results
All of this works. The external data source outputs the following (it appears the same as the http output), but according to the Terraform docs this should be a map:
external1 = {
data = [{"componentID":"k8QEbeuHdDnU","name":"Jenkins","description":"","status":"Partial Outage","order":1553796836},{"componentID":"ui","name":"ui","description":"","status":"Operational","order":1554483781},{"componentID":"auth","name":"auth","description":"","status":"Operational","order":1554483781},{"componentID":"elig","name":"elig","description":"","status":"Operational","order":1554483781},{"componentID":"kong","name":"kong","description":"","status":"Operational","order":1554483781}]
}
I was expecting that I could now access data inside of the external data source. I am unable.
Ultimately what I want to do is create a list of the componentID variables which are located within the external data source.
Some things I have tried
* output.external: key "0" does not exist in map data.external.component_ids.result in:
${data.external.component_ids.result[0]}
* output.external: At column 3, line 1: element: argument 1 should be type list, got type string in:
${element(data.external.component_ids.result["componentID"],0)}
* output.external: key "componentID" does not exist in map data.external.component_ids.result in:
${data.external.component_ids.result["componentID"]}
* output.external: lookup: lookup failed to find 'componentID' in:
${lookup(data.external.component_ids.*.result[0], "componentID")}
I appreciate the help.
I can't test with the variable cloudwatch_endpoint, so I have to think about the solution.
Terraform can't decode JSON directly in 0.11.x and earlier (jsondecode() only arrived in 0.12), but there is a workaround for working with nested lists.
Your Ruby script needs to be adjusted to produce output shaped like the variable http below; then you should be able to get what you need.
$ cat main.tf

variable "http" {
  type    = "list"
  default = [{componentID = "k8QEbeuHdDnU", name = "Jenkins"}]
}

output "http" {
  value = "${lookup(var.http[0], "componentID")}"
}
$ terraform apply
Apply complete! Resources: 0 added, 0 changed, 0 destroyed.
Outputs:
http = k8QEbeuHdDnU

Rscript: Why is Error in UseMethod("extract_") being indicated when attempting to use raster::extract?

I am attempting to use the raster package's extract method to extract values from a Raster* object.
RStudioPrompt> jpnpe <- extract(jpnp, jpnb, fun = mean, na.rm = T)
where jpnp is the raster object and jpnb is a SpatialPolygonsDataFrame.
However the following error is indicated:
Error in UseMethod("extract_") :
no applicable method for 'extract_' applied to an object of class "c('RasterStack', 'Raster', 'RasterStackBrick', 'BasicRaster')"
How can I get past this error?
The issue may be due to another loaded package having a function with the same name, masking raster's extract method.
The tidyr package has an extract function which may conflict with raster's extract.
Confirm by checking the loaded packages with:
>search()
[1] ".GlobalEnv" **"package:tidyr"** "package:dplyr"
[4] "package:rgeos" "package:ggplot2" "package:RColorBrewer"
[7] "package:animation" "package:rgdal" "package:maptools"
[10] **"package:raster"** "package:sp" "tools:rstudio"
[13] "package:stats" "package:graphics" "package:grDevices"
[16] "package:utils" "package:datasets" "package:methods"
[19] "Autoloads" "package:base"
You can also check which extract function is in use by typing the name of the function without brackets; the environment shown at the end tells you which package it comes from:
> extract
function (data, col, into, regex = "([[:alnum:]]+)", remove = TRUE,
convert = FALSE, ...)
{
col <- col_name(substitute(col))
extract_(data, col, into, regex = regex, remove = remove,
convert = convert, ...)
}
<environment: namespace:tidyr>
To resolve the error, just unload the offending package. In RStudio you can use the following command:
>.rs.unloadPackage("tidyr")
and re-execute the raster extract method:
>jpnpe <- extract(jpnp, jpnb, fun = mean, na.rm = T)
