import io
import zipfile

import requests

# Download the zipped CSV into memory
loc = "https://www.getthedata.com/downloads/open_postcode_geo.csv.zip"
zip_bin_data = requests.get(loc).content
byte_file = io.BytesIO(zip_bin_data)

with zipfile.ZipFile(byte_file, "r") as zip_ref:
    print(zip_ref.filelist)
    # filelist is a list of ZipInfo objects; read the first (only) entry
    entry_name = zip_ref.filelist[0]
    csv_bin_data = zip_ref.read(entry_name)

csv_data = csv_bin_data.decode("utf-8")
lines = csv_data.splitlines()

# `spark` and `display` are provided by the notebook environment
dt = spark.sparkContext.parallelize(lines)
df = spark.read.csv(
    dt,
    "postcode string, status string, usertype string, easting int, northing int, "
    "positional_quality_indicator int, country string, latitude decimal(25,20), "
    "longitude decimal(25,20), postcode_no_space string, "
    "postcode_fixed_width_seven string, postcode_fixed_width_eight string, "
    "postcode_area string, postcode_district string, postcode_sector string, "
    "outcode string, incode string",
)
display(df)
There are 2,581,934 records at the time of writing.
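The record count can also be sanity-checked without Spark by counting the lines of the CSV inside the archive. The following is a minimal sketch, not part of the original code: the helper name count_csv_rows_in_zip is hypothetical, and the tiny archive it runs against is synthetic, with made-up rows rather than real postcode data. It assumes every line is one record, which matches how the code above reads the file (no header row, no multi-line quoted fields).

```python
import io
import zipfile


def count_csv_rows_in_zip(zip_bytes: bytes) -> int:
    """Count the lines of the first entry in a zip archive held in memory."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes), "r") as zip_ref:
        first_entry = zip_ref.infolist()[0]
        with zip_ref.open(first_entry) as csv_file:
            # Iterating the binary file object yields one line at a time,
            # so the whole CSV is never decoded or held as a list.
            return sum(1 for _ in csv_file)


# Build a tiny synthetic archive (made-up rows, not real data) to exercise the helper.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("sample.csv", "AB1 0AA,terminated\nAB1 0AB,terminated\n")

sample_count = count_csv_rows_in_zip(buffer.getvalue())
print(sample_count)
```

Against the real download, the same helper could be called as count_csv_rows_in_zip(zip_bin_data), and the result compared with df.count() in Spark.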
Be aware that the data comes with licensing terms that you need to follow. As the provider states:
Free to use for any purpose - attribution required.
Open Postcode Geo is derived from the ONS Postcode Directory which is licenced under the Open Government Licence and the Ordnance Survey OpenData Licence. Northern Irish postcodes have been removed as these are covered by a more restrictive licence. You may use the additional fields provided by GetTheData without restriction.
For details of the required attribution statements see the ONS Licences page.