First, download Spark and extract it somewhere. In my case it’s in
/opt/spark on a server:
alg@mc:/opt/spark$ ls
bin   data      jars        LICENSE   logs    python  README.md  sbin  yarn
conf  examples  kubernetes  licenses  NOTICE  R       RELEASE    work
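For reference, a download and unpack sequence along these lines produces that layout (the version and mirror URL here are only examples; use whichever release you actually need):

wget https://archive.apache.org/dist/spark/spark-3.5.1/spark-3.5.1-bin-hadoop3.tgz
sudo mkdir -p /opt/spark
sudo tar -xzf spark-3.5.1-bin-hadoop3.tgz -C /opt/spark --strip-components=1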
Now, create a dedicated user called spark for security reasons:

useradd -m spark

The -m flag creates a home directory for the user, which Spark requires.
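The systemd services below will run as this user, and Spark writes to the logs and work directories under /opt/spark, so the spark user needs write access there. Assuming it is acceptable to hand the whole tree over to it:

chown -R spark:spark /opt/spark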
Now I will create two systemd services: one for the Spark master and one for the Spark worker.
sudo vi /etc/systemd/system/spark-master.service:
[Unit]
Description=Apache Spark Master
Wants=network-online.target
After=network-online.target

[Service]
User=spark
Group=spark
Type=forking
SuccessExitStatus=143
WorkingDirectory=/opt/spark/sbin
ExecStart=/opt/spark/sbin/start-master.sh
ExecStop=/opt/spark/sbin/stop-master.sh

[Install]
WantedBy=multi-user.target
sudo vi /etc/systemd/system/spark-worker.service:
[Unit]
Description=Apache Spark Worker
Wants=network-online.target
After=network-online.target

[Service]
User=spark
Group=spark
Type=forking
SuccessExitStatus=143
WorkingDirectory=/opt/spark/sbin
ExecStart=/opt/spark/sbin/start-worker.sh spark://192.168.1.85:7077
ExecStop=/opt/spark/sbin/stop-worker.sh

[Install]
WantedBy=multi-user.target
The start-worker.sh script requires the address of the Spark master; in my case it’s
192.168.1.85 (the server’s local IP address). The Spark master is bound to that address via
/opt/spark/conf/spark-env.sh.
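A minimal spark-env.sh for this setup could be as simple as the following (SPARK_MASTER_HOST is the standard variable for pinning the master’s bind address; add further tuning options as needed):

# Bind the Spark master to the server's local IP so workers can reach it
SPARK_MASTER_HOST=192.168.1.85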
To test, register the services by running
systemctl daemon-reload and then start both of them:
systemctl start spark-master
systemctl start spark-worker
If anything goes wrong, check for errors with
systemctl status spark-master. If everything is OK, the service will be reported as active (running).
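The master also serves a web UI, by default on port 8080, which is another quick way to confirm that the worker has registered with it; assuming the default port, a quick check from the shell could be:

curl -s http://192.168.1.85:8080 | grep -i worker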
If you need Spark to start automatically on server reboot, just enable both services:
systemctl enable spark-master
systemctl enable spark-worker
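Alternatively, systemctl enable --now spark-master (and likewise for the worker) performs the enable and the start in a single step.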
Now reboot the server and Spark should still be up and running.
To contact me, send an email anytime or leave a comment below.