Install Maven .jar Dependencies from Jupyter Python Spark Notebook
When creating a Spark session, you can have Spark install external .jar files for you. This is a great feature, because many Maven artefacts have complex dependencies that are hard to download and track manually. For instance, in my local Jupyter notebook I need to connect to Azure Storage:
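from pyspark.sql import SparkSession

# spark.jars.packages takes Maven coordinates (groupId:artifactId:version)
spark = (SparkSession
    .builder
    .master("local[2]")
    .config("spark.jars.packages", "org.apache.hadoop:hadoop-azure:3.2.1")
    .getOrCreate())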
When Jupyter executes this cell, Spark resolves the dependencies automatically and downloads any artefacts that are not already in the local cache:
org.apache.hadoop#hadoop-azure added as a dependency
:: resolving dependencies :: org.apache.spark#spark-submit-parent-2cb8bc79-d1f9-4b43-aa38-3034ee2adf6a;1.0
confs: [default]
found org.apache.hadoop#hadoop-azure;3.2.1 in central
found org.apache.httpcomponents#httpclient;4.5.6 in central
found org.apache.httpcomponents#httpcore;4.4.10 in central
found commons-logging#commons-logging;1.1.3 in central
found commons-codec#commons-codec;1.11 in central
found com.microsoft.azure#azure-storage;7.0.0 in central
found com.fasterxml.jackson.core#jackson-core;2.9.8 in central
found org.slf4j#slf4j-api;1.7.25 in central
found com.microsoft.azure#azure-keyvault-core;1.0.0 in central
found com.google.guava#guava;27.0-jre in central
found com.google.guava#failureaccess;1.0 in central
found com.google.guava#listenablefuture;9999.0-empty-to-avoid-conflict-with-guava in central
found com.google.code.findbugs#jsr305;3.0.0 in central
found org.checkerframework#checker-qual;2.5.2 in central
found com.google.errorprone#error_prone_annotations;2.2.0 in central
found com.google.j2objc#j2objc-annotations;1.1 in central
found org.codehaus.mojo#animal-sniffer-annotations;1.17 in central
found org.eclipse.jetty#jetty-util-ajax;9.3.24.v20180605 in central
found org.eclipse.jetty#jetty-util;9.3.24.v20180605 in central
found org.codehaus.jackson#jackson-mapper-asl;1.9.13 in central
found org.codehaus.jackson#jackson-core-asl;1.9.13 in central
found org.wildfly.openssl#wildfly-openssl;1.0.7.Final in central
:: resolution report :: resolve 1116ms :: artifacts dl 63ms
:: modules in use:
com.fasterxml.jackson.core#jackson-core;2.9.8 from central in [default]
com.google.code.findbugs#jsr305;3.0.0 from central in [default]
com.google.errorprone#error_prone_annotations;2.2.0 from central in [default]
com.google.guava#failureaccess;1.0 from central in [default]
com.google.guava#guava;27.0-jre from central in [default]
com.google.guava#listenablefuture;9999.0-empty-to-avoid-conflict-with-guava from central in [default]
com.google.j2objc#j2objc-annotations;1.1 from central in [default]
com.microsoft.azure#azure-keyvault-core;1.0.0 from central in [default]
com.microsoft.azure#azure-storage;7.0.0 from central in [default]
commons-codec#commons-codec;1.11 from central in [default]
commons-logging#commons-logging;1.1.3 from central in [default]
org.apache.hadoop#hadoop-azure;3.2.1 from central in [default]
org.apache.httpcomponents#httpclient;4.5.6 from central in [default]
org.apache.httpcomponents#httpcore;4.4.10 from central in [default]
org.checkerframework#checker-qual;2.5.2 from central in [default]
org.codehaus.jackson#jackson-core-asl;1.9.13 from central in [default]
org.codehaus.jackson#jackson-mapper-asl;1.9.13 from central in [default]
org.codehaus.mojo#animal-sniffer-annotations;1.17 from central in [default]
org.eclipse.jetty#jetty-util;9.3.24.v20180605 from central in [default]
org.eclipse.jetty#jetty-util-ajax;9.3.24.v20180605 from central in [default]
org.slf4j#slf4j-api;1.7.25 from central in [default]
org.wildfly.openssl#wildfly-openssl;1.0.7.Final from central in [default]
---------------------------------------------------------------------
|                  |            modules            ||   artifacts   |
|       conf       | number| search|dwnlded|evicted|| number|dwnlded|
---------------------------------------------------------------------
|      default     |   22  |   0   |   0   |   0   ||   22  |   0   |
---------------------------------------------------------------------
:: retrieving :: org.apache.spark#spark-submit-parent-2cb8bc79-d1f9-4b43-aa38-3034ee2adf6a
confs: [default]
0 artifacts copied, 22 already retrieved (0kB/17ms)
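If you need more than one package, spark.jars.packages accepts a comma-separated list of Maven coordinates. Below is a minimal sketch of how that could look. The helper names maven_packages and build_session are my own (hypothetical, not part of PySpark), and the second coordinate is only an illustration, since hadoop-azure already pulls in azure-storage transitively, as the log above shows.

```python
def maven_packages(*coordinates: str) -> str:
    """Join Maven coordinates into the comma-separated string
    that spark.jars.packages expects (hypothetical helper)."""
    for coord in coordinates:
        parts = coord.split(":")
        # Each coordinate must look like groupId:artifactId:version.
        if len(parts) != 3 or not all(parts):
            raise ValueError(
                f"expected groupId:artifactId:version, got {coord!r}"
            )
    return ",".join(coordinates)


def build_session(packages: str):
    """Create a local Spark session with the given packages (hypothetical helper)."""
    # Imported lazily so maven_packages works even without pyspark installed.
    from pyspark.sql import SparkSession

    return (SparkSession
        .builder
        .master("local[2]")
        .config("spark.jars.packages", packages)
        .getOrCreate())


packages = maven_packages(
    "org.apache.hadoop:hadoop-azure:3.2.1",
    "com.microsoft.azure:azure-storage:7.0.0",  # illustration only
)
print(packages)
```

In a notebook you would then call spark = build_session(packages). Keep in mind that the setting is only read when the JVM starts, so it has to be in place before the first Spark session is created; if a session is already running, restart the kernel first.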