Databricks Cognitive Services: name 'TextSentiment' is not defined - azure-databricks

I am trying to implement Cognitive Services using the following guide:
https://learn.microsoft.com/en-us/azure/cognitive-services/big-data/getting-started
I have implemented the sample code:
from mmlspark.cognitive import *
from pyspark.sql.functions import col
# Add your subscription key from the Language service (or a general Cognitive Service key)
service_key = "ADD-SUBSCRIPTION-KEY-HERE"
df = spark.createDataFrame([
    ("I am so happy today, its sunny!", "en-US"),
    ("I am frustrated by this rush hour traffic", "en-US"),
    ("The cognitive services on spark aint bad", "en-US"),
], ["text", "language"])
sentiment = (TextSentiment()
    .setTextCol("text")
    .setLocation("eastus")
    .setSubscriptionKey(service_key)
    .setOutputCol("sentiment")
    .setErrorCol("error")
    .setLanguageCol("language"))
results = sentiment.transform(df)
# Show the results in a table
display(results.select("text", col("sentiment")[0].getItem("score").alias("sentiment")))
I have installed the library Azure:mmlspark:0.17, but I keep getting the error:
name 'TextSentiment' is not defined
Any thoughts?

I found that currently only one cluster type makes this tutorial work: Databricks Runtime 6.4 Extended Support (includes Apache Spark 2.4.5, Scala 2.11), because the library only supports Runtime 7.0 or below. It also seems that you need to install the latest version of the library on the cluster as follows:
Coordinate:
com.microsoft.ml.spark:mmlspark_2.11:1.0.0-rc3
Repository:
https://mmlspark.azureedge.net/maven
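Once that coordinate is attached, a quick way to confirm the JVM side of the library is installed is to construct the transformer from a Scala notebook cell. A minimal sketch, assuming the 1.0.0-rc3 package path com.microsoft.ml.spark.cognitive (adjust the import if your version lays it out differently):

// Sanity check (sketch): does TextSentiment resolve after installing the library?
// Assumes the com.microsoft.ml.spark.cognitive package of mmlspark 1.0.0-rc3.
import com.microsoft.ml.spark.cognitive.TextSentiment

val serviceKey = "ADD-SUBSCRIPTION-KEY-HERE"  // your Cognitive Services key

val sentiment = new TextSentiment()
  .setTextCol("text")
  .setLocation("eastus")
  .setSubscriptionKey(serviceKey)
  .setOutputCol("sentiment")
  .setErrorCol("error")
  .setLanguageCol("language")

If the import fails, the coordinate is not actually attached to the cluster the notebook is running on.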

Related

Using confluent kafka-schema-registry-client with basic auth against a managed Confluent schema registry in Databricks

In my Spark application I have the following Scala code:
import io.confluent.kafka.schemaregistry.client.CachedSchemaRegistryClient
import io.confluent.kafka.schemaregistry.client.rest.RestService
import scala.collection.JavaConverters._

val restService = new RestService(schemaRegistryUrl)
// basic-auth credentials are passed as client configs
val props = Map(
  "basic.auth.credentials.source" -> "USER_INFO",
  "basic.auth.user.info" -> "%s:%s".format(key, secret)
).asJava
val schemaRegistryClient = new CachedSchemaRegistryClient(restService, 100, props)
// later:
schemaRegistryClient.getSchemaById(id) // fails with 401
I've verified that I'm able to get a schema by the same id from the REST API using the same basic-auth credentials. But with this code I get an auth error, 401, so obviously I'm missing something here and I'd be glad for help.
The version of both RestService and the schema registry client is 6.2.1 (the latest, I believe).
Note that this works locally when I run the code with the same credentials against the same schema registry, and it works from Databricks when I use Python with the same connection variables. Only with Scala from Databricks do I get the 401.
The answer turned out to be that my assembly build dropped some important files when I discarded META-INF. Not discarding it wholesale, but choosing carefully what to keep, took some effort, but it worked eventually.
It also turns out that the Avro version (1.10) used by the Confluent schema registry client I picked isn't compatible with the Spark versions currently available in Databricks (3.0/3.1), which support only Avro 1.8.
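For anyone hitting the same symptom: the USER_INFO basic-auth provider is, as far as I can tell, looked up via ServiceLoader, i.e. via a META-INF/services entry, so a merge strategy that discards META-INF wholesale silently drops the provider and no Authorization header is sent (hence the 401). A minimal sbt-assembly sketch that keeps the service registrations while still discarding the usual META-INF noise (exact cases and key syntax depend on your build and sbt-assembly version):

// build.sbt (sketch): keep META-INF/services so ServiceLoader-based wiring survives the fat jar
assembly / assemblyMergeStrategy := {
  case PathList("META-INF", "services", _*) => MergeStrategy.concat  // keep service registrations
  case PathList("META-INF", _*)             => MergeStrategy.discard // drop manifests, signatures, etc.
  case x =>
    val oldStrategy = (assembly / assemblyMergeStrategy).value
    oldStrategy(x)
}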

Spring Boot with spring-data-elastic connecting to Elasticsearch 7.4.0 on an AWS server

I have 2 questions:
Can I run spring-data-elastic v4.0.1.RELEASE (with org.elasticsearch:elasticsearch 7.6.2) with an ES client running on 7.4.0? If not, what combination can I use for a 7.4.0 client? We are migrating to AWS and I need to use the 7.4.0 version of the client.
I have a parent/child relationship (configured as a join datatype field). Could somebody please provide documentation or explain how to use either ElasticsearchRestTemplate or ElasticsearchOperations to correctly insert/update both parent and child records?
Thank you.
Best regards,
Robert
ad 1): From the Elasticsearch documentation I can't at the moment find anything in the breaking-changes sections that would prevent using a 7.4.0 client library, but that does not mean there aren't any. Recently there was a breaking change in the Java classes (from 7.7 to 7.8) and I got this information:
our compatability focus is on the HTTP APIs and we don’t offer any guarantees on the code itself. There’s more background here: https://github.com/elastic/elasticsearch/issues/22707#issuecomment-274163711
So I'd say: write a small test app with the corresponding libraries, start a local ES 7.4, and test it.
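As an illustration of what such a test might look like (sketched here in Scala, but the same calls work from Java; this only exercises the RestHighLevelClient that spring-data-elasticsearch 4.0.x builds on, so also test your actual repositories and queries):

import org.apache.http.HttpHost
import org.elasticsearch.client.{RequestOptions, RestClient, RestHighLevelClient}

// Smoke test: a 7.6.2 high-level client talking to a local Elasticsearch 7.4.0.
object EsCompatCheck extends App {
  val client = new RestHighLevelClient(
    RestClient.builder(new HttpHost("localhost", 9200, "http")))
  try {
    val info = client.info(RequestOptions.DEFAULT)
    println(s"Connected, server version: ${info.getVersion.getNumber}")  // expect 7.4.0
  } finally {
    client.close()
  }
}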
ad 2): Adding the join-type mapping and implementing the corresponding inserts etc. is currently being worked on and will hopefully be available in version 4.1.

Is the GBMV3 object from the H2O server different from the GBMV3 class in the H2O library?

We are working with H2O version 3.22.0.1. We created a process in Java 10 that communicates with the REST API using Jersey 2.27 and Gson 2.3.1. The process invokes ImportFiles, followed by ParseSetup and Parse. Everything works well up to that point. Then the process invokes 3/ModelBuilders/gbm/parameters. From examining the log, it appears that the H2O server responds as expected. However, Gson throws a JsonSyntaxException caused by the following:
java.lang.IllegalStateException: Expected BEGIN_OBJECT but was BEGIN_ARRAY at line 1 column 4115 path $.parameters
Upon further analysis, it appears that the H2O server is providing a GBMV3 object with an array of ModelParameterSchemaV3 objects, while the GBMV3 class, as defined in the library that our client uses, extends SharedTreeV3, which extends ModelBuilderSchema, which has a single instance of ModelParametersSchemaV3. There is an apparent discrepancy between the way the GBMV3 object provided by the H2O server is composed, and the way the class is defined in the H2O library. One has an array of ModelParameterSchemaV3 objects, while the other has a single instance of ModelParametersSchemaV3. Is that the case? If so, could you please help us understand what we may be doing wrong, and how to correct it?
See the files located at: https://1drv.ms/f/s!AsSlPHvlhJI1hIpB2M5X49J5L-h1qw
Run the H2O server. Import the CSV file in H2O Flow. SetupParse and Parse the data. Run the test procedure. Thank you for your kind assistance.
Thanks for the detailed description. To better understand your problem, would you be able to provide a simplified example of how you are calling H2O-3 using the Java bindings?
You might be hitting a bug, so if you are able to give us a reproducer we could expedite a fix for this issue.

How to execute a query on Amazon Athena with Ruby?

How can I connect to Amazon Athena from Ruby, execute a query, and get the result?
We are not able to find any gem or example that helps us connect to Amazon Athena from Ruby.
Please provide any reference we can use to build a connection to Amazon Athena and a custom query executor in Ruby.
Just to clarify: my application is in production, so switching from Ruby to JRuby is not a suitable option for me.
As of May 19th, 2017, Amazon Athena supports query execution via the SDK and CLI.
Ruby API client for Athena documentation # docs.aws.amazon.com
Source code of aws-sdk-athena # github.com/aws/aws-sdk-ruby
I found that the official Amazon SDK for Athena was a bit complicated, so I made a new gem called Athens that wraps the SDK in a nicer interface:
conn = Athens::Connection.new(database: 'sample')
query = conn.execute("SELECT * FROM mytable")
If using JRuby is not acceptable, there is another option that could work - but be warned that it's not 100% Ruby!
You could set up a Java Lambda function that encapsulates the query logic, taking in search parameters and then connecting directly to Athena using the JDBC driver.
Then call the Lambda function from Ruby - either via HTTP or through the Ruby client.
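The Athena-facing part of such a Lambda is not much code. Here is a rough sketch of the query flow, written in Scala against the AWS SDK for Java v1 Athena client rather than the JDBC driver mentioned above (either works from a JVM Lambda); the database name and S3 output location are placeholders you would pass in:

import com.amazonaws.services.athena.AmazonAthenaClientBuilder
import com.amazonaws.services.athena.model.{GetQueryExecutionRequest, GetQueryResultsRequest, QueryExecutionContext, ResultConfiguration, StartQueryExecutionRequest}
import scala.collection.JavaConverters._

object AthenaQuery {
  private val athena = AmazonAthenaClientBuilder.defaultClient()

  // Runs a query and returns the result rows as strings (the first row is the header).
  def run(sql: String, database: String, s3OutputLocation: String): Seq[Seq[String]] = {
    // Submit the query; Athena writes its results to the given S3 location.
    val executionId = athena.startQueryExecution(
      new StartQueryExecutionRequest()
        .withQueryString(sql)
        .withQueryExecutionContext(new QueryExecutionContext().withDatabase(database))
        .withResultConfiguration(new ResultConfiguration().withOutputLocation(s3OutputLocation))
    ).getQueryExecutionId

    // Poll until the query finishes.
    var state = "QUEUED"
    while (state == "QUEUED" || state == "RUNNING") {
      Thread.sleep(1000)
      state = athena.getQueryExecution(
        new GetQueryExecutionRequest().withQueryExecutionId(executionId)
      ).getQueryExecution.getStatus.getState
    }
    require(state == "SUCCEEDED", s"Query ended in state $state")

    // Fetch the first page of results.
    athena.getQueryResults(new GetQueryResultsRequest().withQueryExecutionId(executionId))
      .getResultSet.getRows.asScala
      .map(_.getData.asScala.map(_.getVarCharValue).toSeq).toSeq
  }
}

A Lambda handler would wrap run(...) and return the rows as JSON, so the Ruby side only needs an HTTP call or the AWS Lambda client.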
Using a Lambda function is a good alternative, but if you don't want to pay for an additional service, it is better to implement a small Jetty application in Java that exposes a REST service taking the SQL query as a parameter and returning the response text as output (in your preferred format). That will give you a workaround to move on.
JRuby is required. Athena offers only a JDBC driver, and it works only in a JRE.

Is there any difference between com.cloudera.sqoop.SqoopOptions and org.apache.sqoop.SqoopOptions?

I am new to Sqoop. So far I have used Sqoop import & export through command-line arguments, but now I am trying to implement it with Java. I get a compile-time error when calling expTool.run(options) if I use SqoopOptions from the org.apache.sqoop package. If I use the Cloudera package instead of the Apache Sqoop package, there is no compile-time exception. Check the code snippet below:
// These com.cloudera.sqoop imports are the ones that compile; swapping them for
// the org.apache.sqoop equivalents triggers the compile-time error described above.
import com.cloudera.sqoop.SqoopOptions;
import com.cloudera.sqoop.tool.ExportTool;
import com.cloudera.sqoop.tool.SqoopTool;

SqoopTool expTool = new ExportTool();
SqoopOptions options = new SqoopOptions();
options.setConnectString("jdbc:mysql://localhost/sample_db");  // target MySQL database
options.setUsername("hive");
options.setPassword("hadoop");
options.setExportDir("hdfs://localhost:7002/user/warehouse/output1/part-00000");  // HDFS data to export
options.setTableName("warehouse");  // target MySQL table
options.setInputFieldsTerminatedBy(',');
expTool.run(options);
Is there any issue implementing this with the org.apache.sqoop package? Please help me.
Sqoop was originally developed openly on Cloudera's GitHub, and thus all code was stored in the com.cloudera.sqoop namespace. During incubation at the Apache Software Foundation, all functionality was moved to the org.apache.sqoop namespace. To preserve backward compatibility, Sqoop has not removed the com.cloudera.sqoop namespace; however, users are advised to use code from org.apache.sqoop instead. Details about the namespace migration can be found on the Sqoop wiki [1]; a quick code-level check is sketched after the links.
Links:
1: https://cwiki.apache.org/confluence/display/SQOOP/Namespace+Migration
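If you want to convince yourself from code that the two namespaces point at the same implementation, one crude check is to look at the class hierarchy of the legacy class. A tiny sketch (Scala here, but any JVM language works; that the compatibility layer is a thin subclass is an assumption to verify, and the call simply prints whatever the class actually extends):

// Prints the superclass of the legacy class. If the backward-compatibility layer
// is implemented as a thin subclass, this shows the org.apache.sqoop equivalent.
object SqoopNamespaceCheck extends App {
  println(classOf[com.cloudera.sqoop.SqoopOptions].getSuperclass)
}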
