Listing Spark Databases and Tables Fast
Spark includes two useful functions to list databases and tables:
spark.catalog.listDatabases()
spark.catalog.listTables(db_name)
Both use Spark's catalog API and can run for an extremely long time, sometimes minutes (!), because they try to fetch all available metadata for every object. However, if you only need basic metadata, such as database names and table names, you can use Spark SQL instead:
show databases
show tables from db_name
Both return almost instantly. To use them from Python or Scala, just wrap the statement in spark.sql: for example, spark.sql("show databases") returns a DataFrame with the info you require.
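As a minimal sketch of the spark.sql approach in Python, the two helpers below collect the names into plain lists. The function names list_database_names and list_table_names are hypothetical, chosen for illustration, and the code assumes you already have an active SparkSession to pass in. The first column of "show databases" is read by position because its name can differ between Spark versions, while "show tables" is assumed to expose a tableName column.

```python
def list_database_names(spark):
    """Return database names using the lightweight SHOW DATABASES statement."""
    # The name of the first column can differ between Spark versions,
    # so read it by position instead of by name.
    return [row[0] for row in spark.sql("SHOW DATABASES").collect()]


def list_table_names(spark, db_name):
    """Return table names in db_name using SHOW TABLES."""
    # db_name is interpolated directly into the statement, so only pass
    # trusted identifiers here.
    rows = spark.sql(f"SHOW TABLES FROM {db_name}").collect()
    # Assumption: the result has a tableName column.
    return [row["tableName"] for row in rows]


# Example usage, assuming `spark` is an existing SparkSession:
#   for db in list_database_names(spark):
#       print(db, list_table_names(spark, db))
```

Collecting to the driver is reasonable here because the result holds only names. If you need to filter or join the listing, you can keep working with the DataFrame that spark.sql returns instead.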
