Listing Spark Databases and Tables Fast

Spark includes two useful functions to list databases and tables:

spark.catalog.listDatabases()

spark.catalog.listTables(db_name)

Both go through Spark's catalog API and can take an extremely long time to run, sometimes minutes (!), as they try to fetch all the possible metadata for every object. However, if you only need basic metadata, like database names and table names, you can use Spark SQL instead:

show databases

show tables from db_name

which return almost instantly. To use them from Python or Scala, just wrap the statement in spark.sql: for example, spark.sql("show databases") returns a DataFrame with the info you require.
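If you want plain Python lists rather than DataFrames, the two statements can be wrapped in small helpers. This is only a minimal sketch: the function names list_database_names and list_table_names are made up for illustration, and spark is assumed to be an existing SparkSession. The name column of SHOW DATABASES has been called different things in different Spark versions, so the sketch reads it by position instead of by name.

```python
def list_database_names(spark):
    """Return database names as a list of strings via SHOW DATABASES.

    `spark` is assumed to be an active SparkSession (hypothetical helper).
    """
    # The single output column has had different names across Spark
    # versions, so read it by position rather than by name.
    return [row[0] for row in spark.sql("SHOW DATABASES").collect()]


def list_table_names(spark, db_name):
    """Return table names in `db_name` via SHOW TABLES (hypothetical helper)."""
    # Backtick-quote the database name so unusual names still parse;
    # a literal backtick inside the name is escaped by doubling it.
    quoted = "`" + db_name.replace("`", "``") + "`"
    rows = spark.sql(f"SHOW TABLES FROM {quoted}").collect()
    # SHOW TABLES output includes a `tableName` column.
    return [row["tableName"] for row in rows]
```

With a session in hand, list_database_names(spark) gives the database names and list_table_names(spark, "db_name") gives the tables in one database. Note that collect() pulls every row to the driver, which is fine for names only but is a choice made here for convenience, not something the SQL statements require.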
