Databricks-Connect Error

For some reason I’m getting the following error when using Databricks-Connect. The thing is, absolutely nothing has changed in my environment, and it worked yesterday. Why is the shard address invalid?

An error occurred while calling o28.csv.
: com.databricks.service.SparkServiceConnectionException: Invalid shard address: "https://adb-....azuredatabricks.net/"

To connect to a Databricks cluster, you must specify the URL of your Databricks shard.
Shard address: The URL of your shard (e.g., "https://dbc-01234567-89ab.cloud.databricks.com")
  - Get current value: spark.conf.get("spark.databricks.service.address")
  - Set via conf: spark.conf.set("spark.databricks.service.address", <your shard address>)
  - Set via environment variable: export DATABRICKS_ADDRESS=<your shard address>
      
	at com.databricks.service.SparkServiceDebugHelper$.validateSparkServiceAddress(SparkServiceDebugHelper.scala:121)
	at com.databricks.service.SparkClientManager.$anonfun$getForSession$3(SparkClient.scala:384)
	at org.sparkproject.guava.cache.LocalCache$LocalManualCache$1.load(LocalCache.java:4724)
	at org.sparkproject.guava.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3522)
	at org.sparkproject.guava.cache.LocalCache$Segment.loadSync(LocalCache.java:2315)
	at org.sparkproject.guava.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2278)
	at org.sparkproject.guava.cache.LocalCache$Segment.get(LocalCache.java:2193)
	at org.sparkproject.guava.cache.LocalCache.get(LocalCache.java:3932)
	at org.sparkproject.guava.cache.LocalCache$LocalManualCache.get(LocalCache.java:4721)
	at com.databricks.service.SparkClientManager.liftedTree1$1(SparkClient.scala:377)
	at com.databricks.service.SparkClientManager.getForSession(SparkClient.scala:376)
	at com.databricks.service.SparkClientManager.getForSession$(SparkClient.scala:353)
	at com.databricks.service.SparkClientManager$.getForSession(SparkClient.scala:401)
	at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:292)
	at org.apache.spark.sql.DataFrameReader.csv(DataFrameReader.scala:715)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:566)
	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:380)
	at py4j.Gateway.invoke(Gateway.java:295)
	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
	at py4j.commands.CallCommand.execute(CallCommand.java:79)
	at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:195)
	at py4j.ClientServerConnection.run(ClientServerConnection.java:115)
	at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: org.apache.http.client.HttpResponseException: status code: 404
	at com.databricks.service.DBAPIClient.get(DBAPIClient.scala:101)
	at com.databricks.service.SparkServiceDebugHelper$.validateSparkServiceAddress(SparkServiceDebugHelper.scala:118)
	... 26 more

After a lot of poking around I found the issue: the shard address must not end with a trailing forward slash. Presumably the slash makes the validation request hit a non-existent path, which would explain the 404 in the cause above. The trailing slash used to work in earlier versions of Databricks Connect, but not in the latest stable one (10.4 LTS).
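
As a minimal sketch of the fix (the workspace URL below is a placeholder; the config key is the one listed verbatim in the error message above), stripping any trailing slash before setting the address avoids the problem:

from pyspark.sql import SparkSession

# Placeholder workspace URL (use your own). Note: no trailing slash.
shard_address = "https://adb-1234567890123456.7.azuredatabricks.net"

spark = SparkSession.builder.getOrCreate()

# Defensively strip a trailing slash before setting the address.
spark.conf.set("spark.databricks.service.address", shard_address.rstrip("/"))

The same applies if you configure the address via the DATABRICKS_ADDRESS environment variable or via databricks-connect configure: just make sure the value you enter has no trailing slash.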

All good now, as running databricks-connect test confirms:

* PySpark is installed at ...
* Checking SPARK_HOME
* Checking java version
openjdk version "11.0.15" 2022-04-19 LTS
OpenJDK Runtime Environment Corretto-11.0.15.9.1 (build 11.0.15+9-LTS)
OpenJDK 64-Bit Server VM Corretto-11.0.15.9.1 (build 11.0.15+9-LTS, mixed mode)
WARNING: Java versions >8 are not supported by this SDK
* Skipping scala command test on Windows
* Testing python command
22/06/23 11:09:47 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
22/06/23 11:09:50 WARN MetricsSystem: Using default name SparkStatusTracker for source because neither spark.metrics.namespace nor spark.app.id is set.
View job details at ...
* Simple PySpark test passed
* Testing dbutils.fs
[FileInfo(...)]
* Simple dbutils test passed
* All tests passed.
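
For good measure, the kind of read that originally blew up (the o28.csv call in the trace) also works again. A quick smoke test, with a placeholder path:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Placeholder path; any small CSV reachable from the cluster will do.
df = spark.read.csv("/tmp/example.csv", header=True)
df.show()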

Em, excuse me! Have Android 📱 and use Databricks? You might be interested in my totally free (and ad-free) Pocket Bricks. You can get it on Google Play.


To contact me, send an email anytime or leave a comment below.