In my case, data is aggregated with Spark and written to HDFS. Afterwards, it needs to be copied into a Postgres database. What approach have you found works best?
Right now I’m using a Python script to pull the data from HDFS into a temp directory, then using COPY to load it into Postgres. This seems highly inefficient, so I’m looking for alternatives. Sqoop is an option, but some people have mentioned JDBC connection issues.
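One variation that would at least drop the temp directory is piping the HDFS files straight into COPY ... FROM STDIN. This is only a rough sketch, not something tested against a real cluster. It assumes the `hdfs` CLI is on the PATH, psycopg2 is installed, and Spark wrote headerless CSV part files. The function names, table, columns, path, and DSN are all made up for illustration.

```python
import subprocess


def build_copy_sql(table, columns, delimiter=","):
    """Build a COPY ... FROM STDIN statement for a CSV stream.

    Table and column names are interpolated as-is, so they must come
    from trusted code, not from user input.
    """
    cols = ", ".join(columns)
    return (
        f"COPY {table} ({cols}) FROM STDIN "
        f"WITH (FORMAT csv, DELIMITER '{delimiter}')"
    )


def stream_hdfs_to_postgres(hdfs_glob, table, columns, dsn):
    """Stream `hdfs dfs -cat` output into Postgres without a temp file."""
    import psycopg2  # imported lazily so the SQL helper works without it

    # `hdfs dfs -cat` expands the glob itself and concatenates the part files.
    proc = subprocess.Popen(
        ["hdfs", "dfs", "-cat", hdfs_glob], stdout=subprocess.PIPE
    )
    conn = psycopg2.connect(dsn)
    try:
        with conn.cursor() as cur:
            # copy_expert reads from the pipe in chunks and feeds COPY.
            cur.copy_expert(build_copy_sql(table, columns), proc.stdout)
        proc.stdout.close()
        # Only commit if the HDFS read finished cleanly.
        if proc.wait() != 0:
            raise RuntimeError("hdfs dfs -cat exited with a non-zero status")
        conn.commit()
    except Exception:
        conn.rollback()
        proc.kill()
        raise
    finally:
        conn.close()


if __name__ == "__main__":
    # Hypothetical path, table, columns, and DSN.
    stream_hdfs_to_postgres(
        "/data/agg/part-*",
        "agg_results",
        ["day", "total"],
        "dbname=analytics user=etl",
    )
```

The whole load runs in a single transaction, so a failed HDFS read rolls everything back instead of leaving a partially loaded table. It is still a single-threaded copy through one machine, so it may not be the answer for very large outputs.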
Edit: title typo…RDBMS*