When I create a table as follows:
CREATE TABLE partition_v3_cluster ON CLUSTER perftest_3shards_3replicas(
ID String,
URL String,
EventTime Date
) ENGINE = MergeTree()
PARTITION BY toYYYYMM(EventTime)
ORDER BY ID;
I get the following errors:
Query id: fe98c8b6-16af-44a1-b8c9-2bf10d9866ea
┌─host────────┬─port─┬─status─┬─error──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┬─num_hosts_remaining─┬─num_hosts_active─┐
│ 10.18.1.131 │ 9000 │ 371 │ Code: 371, e.displayText() = DB::Exception: There are two exactly the same ClickHouse instances 10.18.1.131:9000 in cluster perftest_3shards_3replicas (version 21.6.3.14 (official build)) │ 2 │ 0 │
│ 10.18.1.133 │ 9000 │ 371 │ Code: 371, e.displayText() = DB::Exception: There are two exactly the same ClickHouse instances 10.18.1.133:9000 in cluster perftest_3shards_3replicas (version 21.6.3.14 (official build)) │ 1 │ 0 │
│ 10.18.1.132 │ 9000 │ 371 │ Code: 371, e.displayText() = DB::Exception: There are two exactly the same ClickHouse instances 10.18.1.132:9000 in cluster perftest_3shards_3replicas (version 21.6.3.14 (official build)) │ 0 │ 0 │
└─────────────┴──────┴────────┴──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┴─────────────────────┴──────────────────┘
3 rows in set. Elapsed: 0.149 sec.
Received exception from server (version 21.6.3):
Code: 371. DB::Exception: Received from localhost:9000. DB::Exception: There was an error on [10.18.1.131:9000]: Code: 371, e.displayText() = DB::Exception: There are two exactly the same ClickHouse instances 10.18.1.131:9000 in cluster perftest_3shards_3replicas (version 21.6.3.14 (official build)).
And here is my metrika.xml:
<?xml version="1.0" encoding="utf-8"?>
<yandex>
    <remote_servers>
        <perftest_3shards_3replicas>
            <shard>
                <replica>
                    <host>10.18.1.131</host>
                    <port>9000</port>
                </replica>
                <replica>
                    <host>10.18.1.132</host>
                    <port>9000</port>
                </replica>
                <replica>
                    <host>10.18.1.133</host>
                    <port>9000</port>
                </replica>
            </shard>
            <shard>
                <replica>
                    <host>10.18.1.131</host>
                    <port>9000</port>
                </replica>
                <replica>
                    <host>10.18.1.132</host>
                    <port>9000</port>
                </replica>
                <replica>
                    <host>10.18.1.133</host>
                    <port>9000</port>
                </replica>
            </shard>
            <shard>
                <replica>
                    <host>10.18.1.131</host>
                    <port>9000</port>
                </replica>
                <replica>
                    <host>10.18.1.132</host>
                    <port>9000</port>
                </replica>
                <replica>
                    <host>10.18.1.133</host>
                    <port>9000</port>
                </replica>
            </shard>
        </perftest_3shards_3replicas>
    </remote_servers>
    <zookeeper>
        <node>
            <host>10.18.1.131</host>
            <port>2181</port>
        </node>
        <node>
            <host>10.18.1.132</host>
            <port>2181</port>
        </node>
        <node>
            <host>10.18.1.133</host>
            <port>2181</port>
        </node>
    </zookeeper>
    <macros>
        <shard>01</shard>
        <replica>01</replica>
    </macros>
</yandex>
I don't know where the problem is; can someone help me? Thank you.
Related
I have 3 nodes with 3 shards and 2 replicas on each:
ClickHouse cluster settings
I've also added the XML config for the shards and replicas:
<default_cluster>
    <shard>
        <internal_replication>true</internal_replication>
        <replica>
            <default_database>shard</default_database>
            <host>clickhouse-0</host>
            <port>9000</port>
            <user>default</user>
            <password>default</password>
        </replica>
        <replica>
            <default_database>replica</default_database>
            <host>clickhouse-2</host>
            <port>9000</port>
            <user>default</user>
            <password>default</password>
        </replica>
    </shard>
    <shard>
        <internal_replication>true</internal_replication>
        <replica>
            <default_database>shard</default_database>
            <host>clickhouse-1</host>
            <port>9000</port>
            <user>default</user>
            <password>default</password>
        </replica>
        <replica>
            <default_database>replica</default_database>
            <host>clickhouse-0</host>
            <port>9000</port>
            <user>default</user>
            <password>default</password>
        </replica>
    </shard>
    <shard>
        <internal_replication>true</internal_replication>
        <replica>
            <default_database>shard</default_database>
            <host>clickhouse-2</host>
            <port>9000</port>
            <user>default</user>
            <password>default</password>
        </replica>
        <replica>
            <default_database>replica</default_database>
            <host>clickhouse-1</host>
            <port>9000</port>
            <user>default</user>
            <password>default</password>
        </replica>
    </shard>
</default_cluster>
I am doing the following example:
create database test on cluster default_cluster;
CREATE TABLE test.test_distributed_order_local on cluster default_cluster
(
id integer,
test_column String
)
ENGINE = ReplicatedMergeTree('/default_cluster/test/tables/test_distributed_order_local/{shard}', '{replica}')
PRIMARY KEY id
ORDER BY id;
CREATE TABLE test.test_distributed_order on cluster default_cluster as test.test_distributed_order_local
ENGINE = Distributed(default_cluster, test, test_distributed_order_local, id);
insert into test.test_distributed_order values (1, 'test1');
insert into test.test_distributed_order values (2, 'test2');
insert into test.test_distributed_order values (3, 'test3');
The results are not the same, and they contain duplicates, e.g.:
Result 1
Result 2
What am I missing?
I expect not to have duplicated rows in the SELECT.
I think this post probably sums up what you're trying to achieve - https://altinity.com/blog/2018/5/10/circular-replication-cluster-topology-in-clickhouse
It's a little old, but the principle applies: this is not a recommended topology for ClickHouse.
Consider this simplified example:
<shard>
    // These two are replicas of each other
    <replica>
        <host>cluster_node_0</host>
    </replica>
    **<replica>
        <host>cluster_node_2</host>
    </replica>**
</shard>
<shard>
    <replica>
        <host>cluster_node_1</host>
    </replica>
    <replica>
        <host>cluster_node_0</host>
    </replica>
</shard>
<shard>
    **<replica>
        <host>cluster_node_2</host>
    </replica>**
    <replica>
        <host>cluster_node_1</host>
    </replica>
</shard>
Let's suppose data is written into the first shard on node cluster_node_0. It will then be replicated to the shard on cluster_node_2 - as the zookeeper path is the same.
Now for the issue. You have also defined the 3rd shard on cluster_node_2. When you create this table, it will physically contain data from 2 shards - the 1st and 3rd - I've attempted to highlight with **.
When a query comes in, it will be sent to each shard. The challenge is each local table will respond with results from both shards - hence you get duplicates.
Generally, avoid placing more than one shard on a host - the blog explains how you can run more than one, but it's not recommended and rarely needed.
ClickHouse shows duplicates because you use the same hosts in multiple shards.
During execution of your SELECT, the query is rewritten and executed on one replica in each shard.
Because the same replica is present in different shards, the query runs over the same data twice.
Usually a shard means its data does not intersect with the data of the other shards.
If you want a cluster with 3 shards and 2 replicas in each shard,
you need 6 different ClickHouse servers: clickhouse-0..5.
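For illustration only, here is a minimal sketch of what such a non-overlapping layout could look like, in the same style as the default_cluster config above. The hostnames clickhouse-3, clickhouse-4 and clickhouse-5 are hypothetical extra servers (not part of the original setup), and the <default_database> trick from the circular layout is dropped because it is only needed when two shards share a host:

<default_cluster>
    <!-- shard 1: clickhouse-0 and clickhouse-3 replicate each other -->
    <shard>
        <internal_replication>true</internal_replication>
        <replica>
            <host>clickhouse-0</host>
            <port>9000</port>
            <user>default</user>
            <password>default</password>
        </replica>
        <replica>
            <host>clickhouse-3</host>
            <port>9000</port>
            <user>default</user>
            <password>default</password>
        </replica>
    </shard>
    <!-- shard 2: clickhouse-1 and clickhouse-4 -->
    <shard>
        <internal_replication>true</internal_replication>
        <replica>
            <host>clickhouse-1</host>
            <port>9000</port>
            <user>default</user>
            <password>default</password>
        </replica>
        <replica>
            <host>clickhouse-4</host>
            <port>9000</port>
            <user>default</user>
            <password>default</password>
        </replica>
    </shard>
    <!-- shard 3: clickhouse-2 and clickhouse-5 -->
    <shard>
        <internal_replication>true</internal_replication>
        <replica>
            <host>clickhouse-2</host>
            <port>9000</port>
            <user>default</user>
            <password>default</password>
        </replica>
        <replica>
            <host>clickhouse-5</host>
            <port>9000</port>
            <user>default</user>
            <password>default</password>
        </replica>
    </shard>
</default_cluster>

With a layout like this, each host appears in exactly one shard, so a distributed SELECT reads each row from exactly one replica and the duplicates disappear.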
I've set up an Elasticsearch cluster in Kubernetes, but I'm getting the error "MasterNotDiscoveredException". I'm not really sure where to even begin debugging this error, as there does not appear to be anything really useful in the logs of any of the nodes:
│ elasticsearch {"type": "server", "timestamp": "2022-06-15T00:44:17,226Z", "level": "WARN", "component": "r.suppressed", "cluster.name": "logging-ek", "node.name": "logging-ek-es-master-0", "message": "path: /_bulk, params: {}", │
│ elasticsearch "stacktrace": ["org.elasticsearch.cluster.block.ClusterBlockException: blocked by: [SERVICE_UNAVAILABLE/1/state not recovered / initialized, SERVICE_UNAVAILABLE/2/no master];", │
│ elasticsearch "at org.elasticsearch.cluster.block.ClusterBlocks.globalBlockedException(ClusterBlocks.java:179) ~[elasticsearch-7.17.1.jar:7.17.1]", │
│ elasticsearch "at org.elasticsearch.action.bulk.TransportBulkAction$BulkOperation.handleBlockExceptions(TransportBulkAction.java:635) [elasticsearch-7.17.1.jar:7.17.1]", │
│ elasticsearch "at org.elasticsearch.action.bulk.TransportBulkAction$BulkOperation.doRun(TransportBulkAction.java:481) [elasticsearch-7.17.1.jar:7.17.1]", │
│ elasticsearch "at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:26) [elasticsearch-7.17.1.jar:7.17.1]", │
│ elasticsearch "at org.elasticsearch.action.bulk.TransportBulkAction$BulkOperation$2.onTimeout(TransportBulkAction.java:669) [elasticsearch-7.17.1.jar:7.17.1]", │
│ elasticsearch "at org.elasticsearch.cluster.ClusterStateObserver$ContextPreservingListener.onTimeout(ClusterStateObserver.java:345) [elasticsearch-7.17.1.jar:7.17.1]", │
│ elasticsearch "at org.elasticsearch.cluster.ClusterStateObserver$ObserverClusterStateListener.onTimeout(ClusterStateObserver.java:263) [elasticsearch-7.17.1.jar:7.17.1]", │
│ elasticsearch "at org.elasticsearch.cluster.service.ClusterApplierService$NotifyTimeout.run(ClusterApplierService.java:660) [elasticsearch-7.17.1.jar:7.17.1]", │
│ elasticsearch "at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:718) [elasticsearch-7.17.1.jar:7.17.1]", │
│ elasticsearch "at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) [?:?]", │
│ elasticsearch "at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) [?:?]", │
│ elasticsearch "at java.lang.Thread.run(Thread.java:833) [?:?]", │
│ elasticsearch "Suppressed: org.elasticsearch.discovery.MasterNotDiscoveredException", │
│ elasticsearch "\tat org.elasticsearch.action.support.master.TransportMasterNodeAction$AsyncSingleAction$2.onTimeout(TransportMasterNodeAction.java:297) ~[elasticsearch-7.17.1.jar:7.17.1]", │
│ elasticsearch "\tat org.elasticsearch.cluster.ClusterStateObserver$ContextPreservingListener.onTimeout(ClusterStateObserver.java:345) [elasticsearch-7.17.1.jar:7.17.1]", │
│ elasticsearch "\tat org.elasticsearch.cluster.ClusterStateObserver$ObserverClusterStateListener.onTimeout(ClusterStateObserver.java:263) [elasticsearch-7.17.1.jar:7.17.1]", │
│ elasticsearch "\tat org.elasticsearch.cluster.service.ClusterApplierService$NotifyTimeout.run(ClusterApplierService.java:660) [elasticsearch-7.17.1.jar:7.17.1]", │
│ elasticsearch "\tat org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:718) [elasticsearch-7.17.1.jar:7.17.1]", │
│ elasticsearch "\tat java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) [?:?]", │
│ elasticsearch "\tat java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) [?:?]", │
│ elasticsearch "\tat java.lang.Thread.run(Thread.java:833) [?:?]"] }
This is pretty much the only log output I've ever seen.
It does appear that the cluster sees all of my master nodes:
elasticsearch {"type": "server", "timestamp": "2022-06-15T00:45:41,915Z", "level": "WARN", "component": "o.e.c.c.ClusterFormationFailureHelper", "cluster.name": "logging-ek", "node.name": "logging-ek-es-master-0", "message": "master not discovered yet, this node has not previously joine │
│ d a bootstrapped (v7+) cluster, and [cluster.initial_master_nodes] is empty on this node: have discovered [{logging-ek-es-master-0}{fHLQrvLsTJ6UvR_clSaxfg}{iLoGrnWSTpiZxq59z7I5zA}{10.42.64.4}{10.42.64.4:9300}{mr}, {logging-ek-es-master-1}{EwF8WLIgSF6Q1Q46_51VlA}{wz5rg74iThicJdtzXZg29g}{ │
│ 10.42.240.8}{10.42.240.8:9300}{mr}, {logging-ek-es-master-2}{jtrThk_USA2jUcJYoIHQdg}{HMvZ_dUfTM-Ar4ROeIOJlw}{10.42.0.5}{10.42.0.5:9300}{mr}]; discovery will continue using [127.0.0.1:9300, 127.0.0.1:9301, 127.0.0.1:9302, 127.0.0.1:9303, 127.0.0.1:9304, 127.0.0.1:9305, [::1]:9300, [::1]: │
│ 9301, [::1]:9302, [::1]:9303, [::1]:9304, [::1]:9305, 10.42.0.5:9300, 10.42.240.8:9300] from hosts providers and [{logging-ek-es-master-0}{fHLQrvLsTJ6UvR_clSaxfg}{iLoGrnWSTpiZxq59z7I5zA}{10.42.64.4}{10.42.64.4:9300}{mr}] from last-known cluster state; node term 0, last-accepted version │
│ 0 in term 0" }
and I've verified that they can in fact reach each other over the network. Is there anything else, or anywhere else, I need to look for errors? I installed Elasticsearch via the Elastic operator.
Start by curling port 9300 on each node. Make sure you get a valid response going both ways.
Also make sure your MTU is set right, or this can happen as well. It's a network problem most of the time.
I'm getting this error when I try to use the gulp build command:
[09:08:11] Starting 'build:compile-css'...
Deprecation Warning: Using / for division outside of calc() is deprecated and will be removed in Dart Sass 2.0.0.
Recommendation: math.div($spacer, 2) or calc($spacer / 2)
More info and automated migrator: https://sass-lang.com/d/slash-div
╷
306 │ $headings-margin-bottom: $spacer / 2 !default;
│ ^^^^^^^^^^^
╵
build\_css\clay\bootstrap\_variables.scss 306:31 @import
build\_css\clay\base.scss 10:9 @import
build\_css\clay.scss 1:9 root stylesheet
Deprecation Warning: Using / for division outside of calc() is deprecated and will be removed in Dart Sass 2.0.0.
Recommendation: math.div($input-padding-y, 2) or calc($input-padding-y / 2)
More info and automated migrator: https://sass-lang.com/d/slash-div
╷
501 │ $input-height-inner-quarter: add($input-line-height * .25em, $input-padding-y / 2) !default;
│ ^^^^^^^^^^^^^^^^^^^^
╵
build\_css\clay\bootstrap\_variables.scss 501:73 @import
build\_css\clay\base.scss 10:9 @import
build\_css\clay.scss 1:9 root stylesheet
Deprecation Warning: Using / for division outside of calc() is deprecated and will be removed in Dart Sass 2.0.0.
Recommendation: math.div($custom-control-indicator-size, 2) or calc($custom-control-indicator-size / 2)
More info and automated migrator: https://sass-lang.com/d/slash-div
╷
571 │ $custom-switch-indicator-border-radius: $custom-control-indicator-size / 2 !default;
│ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
╵
build\_css\clay\bootstrap\_variables.scss 571:49 @import
build\_css\clay\base.scss 10:9 @import
build\_css\clay.scss 1:9 root stylesheet
Deprecation Warning: Using / for division outside of calc() is deprecated and will be removed in Dart Sass 2.0.0.
Recommendation: math.div($spacer, 2) or calc($spacer / 2)
More info and automated migrator: https://sass-lang.com/d/slash-div
╷
717 │ $nav-divider-margin-y: $spacer / 2 !default;
│ ^^^^^^^^^^^
╵
build\_css\clay\bootstrap\_variables.scss 717:37 @import
build\_css\clay\base.scss 10:9 @import
build\_css\clay.scss 1:9 root stylesheet
Deprecation Warning: Using / for division outside of calc() is deprecated and will be removed in Dart Sass 2.0.0.
Recommendation: math.div($spacer, 2) or calc($spacer / 2)
More info and automated migrator: https://sass-lang.com/d/slash-div
╷
722 │ $navbar-padding-y: $spacer / 2 !default;
│ ^^^^^^^^^^^
╵
build\_css\clay\bootstrap\_variables.scss 722:37 @import
build\_css\clay\base.scss 10:9 @import
build\_css\clay.scss 1:9 root stylesheet
[09:08:14] 'build:compile-css' errored after 3.14 s
[09:08:14] Error in plugin "sass"
Message:
build\_css\compat\components\_dropdowns.scss
Error: compound selectors may no longer be extended.
Consider `@extend .dropdown-item, .disabled` instead.
╷
34 │ @extend .dropdown-item.disabled;
│ ^^^^^^^^^^^^^^^^^^^^^^^
╵
build\_css\compat\components\_dropdowns.scss 34:11 root stylesheet
Details:
formatted: Error: compound selectors may no longer be extended.
Consider `@extend .dropdown-item, .disabled` instead.
╷
34 │ @extend .dropdown-item.disabled;
│ ^^^^^^^^^^^^^^^^^^^^^^^
╵
build\_css\compat\components\_dropdowns.scss 34:11 root stylesheet
line: 34
column: 11
file: C:\Users\fmateosg\IdeaProjects\test\themes\base-theme\build\_css\compat\components\_dropdowns.scss
status: 1
messageFormatted: build\_css\compat\components\_dropdowns.scss
Error: compound selectors may no longer be extended.
Consider `@extend .dropdown-item, .disabled` instead.
╷
34 │ @extend .dropdown-item.disabled;
│ ^^^^^^^^^^^^^^^^^^^^^^^
╵
build\_css\compat\components\_dropdowns.scss 34:11 root stylesheet
messageOriginal: compound selectors may no longer be extended.
Consider `@extend .dropdown-item, .disabled` instead.
╷
34 │ @extend .dropdown-item.disabled;
│ ^^^^^^^^^^^^^^^^^^^^^^^
╵
build\_css\compat\components\_dropdowns.scss 34:11 root stylesheet
relativePath: build\_css\compat\components\_dropdowns.scss
domainEmitter: [object Object]
domainThrown: false
[09:08:14] 'build' errored after 6.36 s
I know that there is a similar question to this one but the answers there couldn't solve my problem. This is my folder structure:
Folder structure
I copied the _dropdowns.scss file into src/css/compat/components/ and made the modification there, but it still gives me the error when I retry the build.
I had the same problem because I accidentally upgraded my liferay-theme-tasks in package.json to version 11.2.2.
If that's the case, downgrade liferay-theme-tasks to version ^10.0.2, remove the node_modules folder, and run npm install again. Gulp build should pass after that.
I'm using Node.js version 14.17.0, gulp version 4.0.2
I need to send an email when the application shuts down.
I have created the following method:
void onStop(@Observes ShutdownEvent event) throws ParserConfigurationException {
    mailer.send(Mail.withText("test@email.com", "STOP TEST", "HERE IS STOP A TEST"));
    Quarkus.waitForExit();
}
But I always receive the following error:
Caused by: java.net.ConnectException: Connection refused (Connection refused)
    at java.base/java.net.PlainSocketImpl.socketConnect(Native Method)
    at java.base/java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:412)
    at java.base/java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:255)
    at java.base/java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:237)
    at java.base/java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
    at java.base/java.net.Socket.connect(Socket.java:609)
    at org.apache.http.conn.ssl.SSLConnectionSocketFactory.connectSocket(SSLConnectionSocketFactory.java:368)
    at org.apache.http.impl.conn.DefaultHttpClientConnectionOperator.connect(DefaultHttpClientConnectionOperator.java:142)
    ... 32 more
I am quite sure that all the connections are closed by the time I try to send the email. This is very similar to the question asked here -> How to accept http requests after shutdown signal in Quarkus?, but it's not clear to me how to implement the solution proposed there.
Any help is greatly appreciated.
I'm new to Dataproc and PySpark and am facing issues while integrating a BigQuery table with a Dataproc cluster via the JupyterLab API. Below is the code I used to load the BigQuery table into the Dataproc cluster through the Jupyter Notebook API, but I am getting an error while loading the table.
from pyspark.sql import SparkSession
SparkSession.builder.appName('Jupyter BigQuery Storage').config(
'spark.jars', 'gs://spark-lib/bigquery/spark-bigquery-latest.jar').getOrCreate()
df=spark.read.format("com.google.cloud.spark.bigquery").option(
"table", "publicdata.samples.shakespeare").load()
df.printSchema()
Below is the error I'm getting:
Py4JJavaErrorTraceback (most recent call last)
<ipython-input-17-789ad67053e5> in <module>()
1 table = "publicdata.samples.shakespeare"
----> 2 df = spark.read.format("com.google.cloud.spark.bigquery").option("table",table).load()
3 df.printSchema()
/usr/lib/spark/python/pyspark/sql/readwriter.pyc in load(self, path, format, schema, **options)
170 return self._df(self._jreader.load(self._spark._sc._jvm.PythonUtils.toSeq(path)))
171 else:
--> 172 return self._df(self._jreader.load())
173
174 @since(1.4)
/opt/conda/anaconda/lib/python2.7/site-packages/py4j/java_gateway.pyc in __call__(self, *args)
1255 answer = self.gateway_client.send_command(command)
1256 return_value = get_return_value(
-> 1257 answer, self.gateway_client, self.target_id, self.name)
1258
1259 for temp_arg in temp_args:
/usr/lib/spark/python/pyspark/sql/utils.pyc in deco(*a, **kw)
61 def deco(*a, **kw):
62 try:
---> 63 return f(*a, **kw)
64 except py4j.protocol.Py4JJavaError as e:
65 s = e.java_exception.toString()
/opt/conda/anaconda/lib/python2.7/site-packages/py4j/protocol.pyc in get_return_value(answer, gateway_client, target_id, name)
326 raise Py4JJavaError(
327 "An error occurred while calling {0}{1}{2}.\n".
--> 328 format(target_id, ".", name), value)
329 else:
330 raise Py4JError(
Py4JJavaError: An error occurred while calling o254.load.
: java.lang.ClassNotFoundException: Failed to find data source: com.google.cloud.spark.bigquery. Please find packages at http://spark.apache.org/third-party-projects.html
at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:639)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:190)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:164)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:282)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:238)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.ClassNotFoundException: com.google.cloud.spark.bigquery.DefaultSource
at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$23$$anonfun$apply$15.apply(DataSource.scala:622)
at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$23$$anonfun$apply$15.apply(DataSource.scala:622)
at scala.util.Try$.apply(Try.scala:192)
at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$23.apply(DataSource.scala:622)
at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$23.apply(DataSource.scala:622)
at scala.util.Try.orElse(Try.scala:84)
at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:622)
... 13 more
Please assign the SparkSession.builder result to a variable:
spark = SparkSession.builder\
.appName('Jupyter BigQuery Storage')\
.config('spark.jars', 'gs://spark-lib/bigquery/spark-bigquery-latest.jar')\
.getOrCreate()
Also, the reference to the public datasets is bigquery-public-data, so please change the read to:
df = spark.read.format("com.google.cloud.spark.bigquery")\
.option("table", "bigquery-public-data.samples.shakespeare")\
.load()