hello hello,
Querying population density data (only tested for Germany) works fine when choosing 'total' as the category.
However, when choosing a different category (e.g. 'women'), the downloaded files appear as .csv in the tmp folder, but the code breaks when creating the parquet files. Error message:
Exception occurred during processing of request from ('127.0.0.1', 33952)
Traceback (most recent call last):
File "/usr/local/lib/python3.10/socketserver.py", line 316, in _handle_request_noblock
self.process_request(request, client_address)
File "/usr/local/lib/python3.10/socketserver.py", line 347, in process_request
self.finish_request(request, client_address)
File "/usr/local/lib/python3.10/socketserver.py", line 360, in finish_request
self.RequestHandlerClass(request, client_address, self)
File "/usr/local/lib/python3.10/socketserver.py", line 747, in __init__
self.handle()
File "/usr/local/lib/python3.10/site-packages/pyspark/accumulators.py", line 262, in handle
poll(accum_updates)
File "/usr/local/lib/python3.10/site-packages/pyspark/accumulators.py", line 235, in poll
if func():
File "/usr/local/lib/python3.10/site-packages/pyspark/accumulators.py", line 239, in accum_updates
num_updates = read_int(self.rfile)
File "/usr/local/lib/python3.10/site-packages/pyspark/serializers.py", line 564, in read_int
raise EOFError
EOFError
----------------------------------------
ERROR:root:Exception while sending command.
Traceback (most recent call last):
File "/usr/local/lib/python3.10/site-packages/py4j/clientserver.py", line 480, in send_command
raise Py4JNetworkError("Answer from Java side is empty")
py4j.protocol.Py4JNetworkError: Answer from Java side is empty
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/lib/python3.10/site-packages/py4j/java_gateway.py", line 1038, in send_command
response = connection.send_command(command)
File "/usr/local/lib/python3.10/site-packages/py4j/clientserver.py", line 503, in send_command
raise Py4JNetworkError(
py4j.protocol.Py4JNetworkError: Error while sending or receiving
Traceback (most recent call last):
File "/opt/app/pipelines/population-density/src/main.py", line 21, in <module>
Processor.start(files, output_dir, updated_date)
File "/opt/app/pipelines/population-density/src/Processor.py", line 70, in start
df.write.mode("overwrite").parquet(f"{output_dir}{updated_date}_result.parquet")
File "/usr/local/lib/python3.10/site-packages/pyspark/sql/readwriter.py", line 885, in parquet
self._jwrite.parquet(path)
File "/usr/local/lib/python3.10/site-packages/py4j/java_gateway.py", line 1321, in __call__
return_value = get_return_value(
File "/usr/local/lib/python3.10/site-packages/pyspark/sql/utils.py", line 111, in deco
return f(*a, **kw)
File "/usr/local/lib/python3.10/site-packages/py4j/protocol.py", line 334, in get_return_value
raise Py4JError(
py4j.protocol.Py4JError: An error occurred while calling o84.parquet
ERROR: 1
Any idea how I can fix this? (I work on macOS Monterey with an Intel chip and only need the parquet files.)
Thank you so much for any help, and in general for this really awesome project!