Blaze cannot handle the BIGINT[], TEXT[], and HSTORE column types of a PostgreSQL database

I am trying to use Blaze to analyze data in my Postgres database.
I connect with:
from blaze import Data
from odo import resource

conn_string = 'postgresql://postgres:mysecretpassword@localhost:5432/postgres'
Data(resource(conn_string, schema='public'))
and it gives these errors:
Blaze does not understand a SQLAlchemy type.
Blaze provided the following error:
No SQL-datashape match for type HSTORE
Skipping.
Blaze does not understand a SQLAlchemy type.
Blaze provided the following error:
No SQL-datashape match for type TEXT[]
Skipping.
Blaze does not understand a SQLAlchemy type.
Blaze provided the following error:
No SQL-datashape match for type BIGINT[]
Skipping.
It seems that Blaze cannot understand some of the column types in my PostgreSQL database.
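The messages above indicate that Blaze is skipping the columns whose types it cannot map. A minimal sketch of one way to keep going, assuming there is a table whose columns only use types Blaze supports (the table name 'events' below is hypothetical), is to point Blaze at that single table with an odo-style URI instead of reflecting the whole schema:

from blaze import Data

conn_string = 'postgresql://postgres:mysecretpassword@localhost:5432/postgres'
events = Data(conn_string + '::events')   # database URI + '::' + table name
print(events.dshape)                      # datashape of just this table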

Related

Cassandra :: Mapping for Date datatype, error: No converter found capable of converting from type [java.time.LocalDate] to type [java.util.Date]

I am using Spring Data JPA to map entities to a Cassandra table.
The data type in Cassandra is Date.
The type I am using in the entity class to map the Cassandra Date is java.util.Date.
However, when I try to retrieve the data from Cassandra, the application throws the exception below:
No converter found capable of converting from type [java.time.LocalDate] to type [java.util.Date]
Also, if I use java.time.LocalDate instead, it returns the value as an array in the format [YYYY, MM, dd].
Could someone help me on this?

How to convert an Avro schema into line protocol in order to insert data into InfluxDB with Apache NiFi

I am creating a data pipeline with Apache NiFi to copy data from a remote MySQL database into InfluxDB.
I use the QueryDatabaseTable processor to extract the data from the MySQL database, then UpdateRecord to do some data transformation, and I would like to use PutInfluxDB to insert the time series into my local InfluxDB instance on Linux.
The data coming from the QueryDatabaseTable processor is in Avro format, and I need to convert it into line protocol, configuring which fields are the tags and which are the measurement values.
However, I cannot find any processor that performs this conversion.
Any hints?
Thanks,
Bernardo
There is no built-in processor for InfluxDB line protocol conversion. You could write a ScriptedRecordWriter if you wanted to do it yourself, but there is a project by InfluxData, linked here, that already implements a line protocol reader for NiFi and seems to be active and up to date.
See the documentation for adding it to NiFi here.
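To make the tags-versus-fields distinction concrete, here is a minimal Python sketch of turning one record (for example, a row deserialized from the Avro output of QueryDatabaseTable) into a line protocol string. The measurement name, tag keys, field keys, and timestamp column are assumptions for illustration only, and escaping of special characters in tag values is omitted.

def to_line_protocol(record, measurement, tag_keys, field_keys, time_key):
    # Line protocol shape: measurement,tag1=v1,... field1=v1,... timestamp_ns
    tags = ",".join(f"{k}={record[k]}" for k in tag_keys)
    fields = []
    for k in field_keys:
        v = record[k]
        if isinstance(v, bool):
            fields.append(f"{k}={str(v).lower()}")   # booleans: true/false
        elif isinstance(v, str):
            fields.append(f'{k}="{v}"')              # string fields are quoted
        elif isinstance(v, int):
            fields.append(f"{k}={v}i")               # integer fields take an 'i' suffix
        else:
            fields.append(f"{k}={v}")
    timestamp_ns = int(record[time_key]) * 1_000_000_000  # seconds -> nanoseconds
    return f"{measurement},{tags} {','.join(fields)} {timestamp_ns}"

row = {"sensor_id": "s1", "location": "lab", "temperature": 21.5, "ts": 1609459200}
print(to_line_protocol(row, "readings", ["sensor_id", "location"], ["temperature"], "ts"))
# readings,sensor_id=s1,location=lab temperature=21.5 1609459200000000000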

Azure Data Factory Converting Source Data Type to a Different Format

I am using Azure Data Factory to copy data from an Oracle database to an ADLS Gen2 container.
In the Copy activity, I added the Oracle DB as the source and ADLS as the sink.
I want to create a Parquet file in the sink.
When I click on Mapping, I can see that the NUMBER data type in the source is converted to Double in ADF.
Also, the Date type in the source is converted to DateTime in ADF.
Because of this, I am not able to load the correct data.
I even tried typecasting in the source query to convert it into the same format as the source, but ADF still converts it into Double.
Please find the screenshot below as a reference:
Here the ID column is NUMBER in the Oracle DB, but ADF treats it as Double and appends .0 to the data, which is not what I need.
Even after typecasting it to NUMBER, it does not show the correct type.
What can be the possible root cause of this issue, and why is the source data type not shown in the correct format?
Because of this, the Parquet file I am creating is not correct, and my Synapse table (the end destination) cannot load the data, since I have defined the ID column as Int in Synapse.
Ideally, ADF should show the same data type as the source.
Please let me know if you have any solutions or suggestions for me to try.
Thanks!
I am not an Oracle user, but as I understand it the NUMBER data type is generic and can be either integer- or decimal-based. Parquet does not have this concept, so when ADF converts the value it basically has to use a decimal type (such as Double) to prevent loss of data. If you really want the data to be an integer, then you'll need to use a Data Flow (instead of the Copy activity) to cast the incoming values to an integer column.
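If reworking the pipeline around a Data Flow is not an option, one post-processing workaround outside ADF is to cast the column back to an integer before loading it into Synapse. This is only a sketch, assuming every ID is a whole number with no nulls; the file names are placeholders.

import pandas as pd

# Hypothetical file names; ID is the column that ADF wrote as Double.
df = pd.read_parquet("exported_from_adf.parquet")
assert (df["ID"] % 1 == 0).all()      # verify the cast loses nothing
df["ID"] = df["ID"].astype("int64")   # 123.0 -> 123
df.to_parquet("fixed_for_synapse.parquet", index=False)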

Kafka Connect JDBC Oracle source numeric

I am trying to use the Kafka Connect JDBC connector to load data in incremental mode using a query. The data is getting loaded into the topic. The key column is converted to NUMERIC(38,0), but it is getting truncated in the topic. I suspect that some conversion is happening: the values start from 0, go up to 127, and then each digit gets repeated.
Thanks in advance
Please add more details about how the key looks in the topic.
As far as the Confluent JDBC source connector goes, there have been problems mapping numeric precision appropriately.
This blog post sheds more light on the specifics of the Oracle numeric type with respect to Kafka Connect.
Based on the problem description, I suggest using the property below in the source connector's properties file:
numeric.mapping=best_fit
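For context, here is a minimal sketch of where this property sits in a JDBC source connector properties file; everything other than connector.class and numeric.mapping (connection details, column, query, topic prefix) is a placeholder, not taken from the question.

# Hedged example of a JDBC source connector configuration (placeholder values)
name=oracle-jdbc-source
connector.class=io.confluent.connect.jdbc.JdbcSourceConnector
connection.url=jdbc:oracle:thin:@//dbhost:1521/ORCLPDB1
connection.user=connect_user
connection.password=connect_password
mode=incrementing
incrementing.column.name=ID
query=SELECT ID, PAYLOAD FROM SOURCE_TABLE
topic.prefix=oracle-
numeric.mapping=best_fit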

How to specify a schema while reading a Parquet file with PySpark?

While reading a Parquet file stored in Hadoop with either Scala or PySpark, an error occurs:
#scala
var dff = spark.read.parquet("/super/important/df")
org.apache.spark.sql.AnalysisException: Unable to infer schema for Parquet. It must be specified manually.;
at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$8.apply(DataSource.scala:189)
at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$8.apply(DataSource.scala:189)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.sql.execution.datasources.DataSource.org$apache$spark$sql$execution$datasources$DataSource$$getOrInferFileFormatSchema(DataSource.scala:188)
at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:387)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:152)
at org.apache.spark.sql.DataFrameReader.parquet(DataFrameReader.scala:441)
at org.apache.spark.sql.DataFrameReader.parquet(DataFrameReader.scala:425)
... 52 elided
or
sql_context.read.parquet(output_file)
results in the same error.
The error message is pretty clear about what has to be done: Unable to infer schema for Parquet. It must be specified manually.
But where can I specify it?
Spark 2.1.1, Hadoop 2.5; the DataFrames are created with PySpark. The files are partitioned into 10 pieces.
This error usually occurs when you try to read an empty directory as Parquet.
If, for example, you create an empty DataFrame, write it as Parquet, and then read it back, this error appears.
You could check whether the DataFrame is empty with rdd.isEmpty() before writing it.
I have done a quick implementation for the same.
Hope this helps!
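A minimal PySpark sketch of both points, guarding against writing an empty DataFrame and specifying the schema manually when reading instead of relying on inference; the column names and sample rows are assumptions, and the path is the one from the question.

from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, StringType, LongType

spark = SparkSession.builder.appName("parquet-schema-example").getOrCreate()

schema = StructType([
    StructField("id", LongType(), True),
    StructField("name", StringType(), True),
])

df = spark.createDataFrame([(1, "a"), (2, "b")], schema)

# An empty Parquet directory cannot have its schema inferred when read back,
# so only write when there is actually data.
if not df.rdd.isEmpty():
    df.write.mode("overwrite").parquet("/super/important/df")

# Specify the schema explicitly when reading, instead of relying on inference.
dff = spark.read.schema(schema).parquet("/super/important/df")
dff.show()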
