Listing Spark Databases and Tables Fast

Spark includes two useful functions to list databases and tables:

spark.catalog.listDatabases()

spark.catalog.listTables(db_name)

Both use Spark's catalog API and can run for an extremely long time, sometimes minutes (!), as they try to fetch all the possible metadata for all the objects. However, if you only need basic metadata, like database names and table names, you can use Spark SQL:

show databases

show tables from db_name

Both statements return almost instantly. To use them from Python/Scala, just wrap them in spark.sql, e.g. spark.sql("show databases") will return a DataFrame with the info you require.
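If you want plain Python lists of names rather than a DataFrame, something like the following should work. This is a minimal sketch: the helper names list_database_names and list_table_names are made up for illustration, and spark is assumed to be an existing SparkSession. The database column is read by position because, as far as I can tell, its name differs between Spark versions (databaseName vs. namespace).

```python
def list_database_names(spark):
    """Return database names using SHOW DATABASES (hypothetical helper)."""
    # SHOW DATABASES yields a single column; read it by position so the
    # code does not depend on the column's name.
    return [row[0] for row in spark.sql("SHOW DATABASES").collect()]


def list_table_names(spark, db_name):
    """Return table names in db_name using SHOW TABLES (hypothetical helper)."""
    # db_name is interpolated into the statement as-is, so only pass
    # names you trust.
    rows = spark.sql(f"SHOW TABLES FROM {db_name}").collect()
    return [row["tableName"] for row in rows]
```

Calling list_table_names(spark, "db_name") then gives you just the table names, without waiting for the catalog API to gather the rest of the metadata.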
