Install Maven .jar Dependencies from Jupyter Python Spark Notebook

When you create a Spark session, you can have Spark resolve and download external .jar files for you. This is a very useful feature, because many Maven artifacts have complex transitive dependencies that are hard to download and track manually. For instance, in my local Jupyter notebook I need to talk to Azure Storage, so I ask for the hadoop-azure package:

from pyspark.sql import SparkSession

# Maven coordinates have the form groupId:artifactId:version
spark = (SparkSession
         .builder
         .master("local[2]")
         .config("spark.jars.packages", "org.apache.hadoop:hadoop-azure:3.2.1")
         .getOrCreate())
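
If you need more than one package, spark.jars.packages takes a comma-separated list of coordinates. Here is a minimal sketch of a helper that builds that string; the name maven_coordinates is my own invention, not part of PySpark, and the second artifact is only there to illustrate the format:

```python
def maven_coordinates(*artifacts):
    """Join (group, artifact, version) tuples into the comma-separated
    string that spark.jars.packages expects (hypothetical helper)."""
    coords = []
    for group, artifact, version in artifacts:
        # Each coordinate is groupId:artifactId:version
        coords.append(f"{group}:{artifact}:{version}")
    return ",".join(coords)


packages = maven_coordinates(
    ("org.apache.hadoop", "hadoop-azure", "3.2.1"),
    ("com.microsoft.azure", "azure-storage", "7.0.0"),  # illustrative only
)
print(packages)
```

The resulting string can then be passed as the value in .config("spark.jars.packages", packages) when building the session.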

When Jupyter executes this cell, Spark resolves the dependencies automatically and downloads any artifacts that are not already in the local Ivy cache. In the run below everything was already cached, so nothing was downloaded:

org.apache.hadoop#hadoop-azure added as a dependency
:: resolving dependencies :: org.apache.spark#spark-submit-parent-2cb8bc79-d1f9-4b43-aa38-3034ee2adf6a;1.0
	confs: [default]
	found org.apache.hadoop#hadoop-azure;3.2.1 in central
	found org.apache.httpcomponents#httpclient;4.5.6 in central
	found org.apache.httpcomponents#httpcore;4.4.10 in central
	found commons-logging#commons-logging;1.1.3 in central
	found commons-codec#commons-codec;1.11 in central
	found com.microsoft.azure#azure-storage;7.0.0 in central
	found com.fasterxml.jackson.core#jackson-core;2.9.8 in central
	found org.slf4j#slf4j-api;1.7.25 in central
	found com.microsoft.azure#azure-keyvault-core;1.0.0 in central
	found com.google.guava#guava;27.0-jre in central
	found com.google.guava#failureaccess;1.0 in central
	found com.google.guava#listenablefuture;9999.0-empty-to-avoid-conflict-with-guava in central
	found com.google.code.findbugs#jsr305;3.0.0 in central
	found org.checkerframework#checker-qual;2.5.2 in central
	found com.google.errorprone#error_prone_annotations;2.2.0 in central
	found com.google.j2objc#j2objc-annotations;1.1 in central
	found org.codehaus.mojo#animal-sniffer-annotations;1.17 in central
	found org.eclipse.jetty#jetty-util-ajax;9.3.24.v20180605 in central
	found org.eclipse.jetty#jetty-util;9.3.24.v20180605 in central
	found org.codehaus.jackson#jackson-mapper-asl;1.9.13 in central
	found org.codehaus.jackson#jackson-core-asl;1.9.13 in central
	found org.wildfly.openssl#wildfly-openssl;1.0.7.Final in central
:: resolution report :: resolve 1116ms :: artifacts dl 63ms
	:: modules in use:
	com.fasterxml.jackson.core#jackson-core;2.9.8 from central in [default]
	com.google.code.findbugs#jsr305;3.0.0 from central in [default]
	com.google.errorprone#error_prone_annotations;2.2.0 from central in [default]
	com.google.guava#failureaccess;1.0 from central in [default]
	com.google.guava#guava;27.0-jre from central in [default]
	com.google.guava#listenablefuture;9999.0-empty-to-avoid-conflict-with-guava from central in [default]
	com.google.j2objc#j2objc-annotations;1.1 from central in [default]
	com.microsoft.azure#azure-keyvault-core;1.0.0 from central in [default]
	com.microsoft.azure#azure-storage;7.0.0 from central in [default]
	commons-codec#commons-codec;1.11 from central in [default]
	commons-logging#commons-logging;1.1.3 from central in [default]
	org.apache.hadoop#hadoop-azure;3.2.1 from central in [default]
	org.apache.httpcomponents#httpclient;4.5.6 from central in [default]
	org.apache.httpcomponents#httpcore;4.4.10 from central in [default]
	org.checkerframework#checker-qual;2.5.2 from central in [default]
	org.codehaus.jackson#jackson-core-asl;1.9.13 from central in [default]
	org.codehaus.jackson#jackson-mapper-asl;1.9.13 from central in [default]
	org.codehaus.mojo#animal-sniffer-annotations;1.17 from central in [default]
	org.eclipse.jetty#jetty-util;9.3.24.v20180605 from central in [default]
	org.eclipse.jetty#jetty-util-ajax;9.3.24.v20180605 from central in [default]
	org.slf4j#slf4j-api;1.7.25 from central in [default]
	org.wildfly.openssl#wildfly-openssl;1.0.7.Final from central in [default]
	---------------------------------------------------------------------
	|                  |            modules            ||   artifacts   |
	|       conf       | number| search|dwnlded|evicted|| number|dwnlded|
	---------------------------------------------------------------------
	|      default     |   22  |   0   |   0   |   0   ||   22  |   0   |
	---------------------------------------------------------------------
:: retrieving :: org.apache.spark#spark-submit-parent-2cb8bc79-d1f9-4b43-aa38-3034ee2adf6a
	confs: [default]
	0 artifacts copied, 22 already retrieved (0kB/17ms)
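
The last line of the log is the quickest way to tell whether anything was actually fetched. If you capture this output as text, a rough sketch like the following pulls out the two counts; the function name retrieval_summary and the regular expression are my own assumptions based on the log format shown above, not something Spark provides:

```python
import re

# Matches the final Ivy line, e.g. "0 artifacts copied, 22 already retrieved"
SUMMARY = re.compile(r"(\d+) artifacts copied, (\d+) already retrieved")


def retrieval_summary(log_text):
    """Return the copied/cached counts from captured log text,
    or None if the summary line is not present (hypothetical helper)."""
    match = SUMMARY.search(log_text)
    if match is None:
        return None
    copied, cached = (int(group) for group in match.groups())
    return {"copied": copied, "cached": cached}


sample = "\t0 artifacts copied, 22 already retrieved (0kB/17ms)"
print(retrieval_summary(sample))  # {'copied': 0, 'cached': 22}
```

A copied count of zero, as in my run, means every artifact was already in the local cache.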

