In my case, data is aggregated using Spark and written to HDFS. Afterwards, it needs to be copied over to a Postgres database. What approach have you found works best?

Right now I’m using a Python script to transfer the data from HDFS to a temp directory, then using COPY to write it to Postgres. That seems highly inefficient, so I’m looking for alternatives. Sqoop is an option, but some people have mentioned JDBC connection issues.
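
For reference, one way to skip the temp directory might be to pipe `hdfs dfs -cat` straight into COPY ... FROM STDIN. This is only a rough sketch under several assumptions: the Spark output is headerless CSV part files, `conn` is a psycopg2 connection, the `hdfs` CLI is on the PATH, and the table and column names are trusted. The function names and the `cat_cmd` parameter are made up for illustration.

```python
import subprocess


def build_copy_sql(table, columns):
    """Build a COPY ... FROM STDIN statement for CSV input.

    Assumes table and column names are trusted; they are only wrapped
    in double quotes, not escaped.
    """
    cols = ", ".join(f'"{c}"' for c in columns)
    return f'COPY "{table}" ({cols}) FROM STDIN WITH (FORMAT csv)'


def copy_hdfs_to_postgres(conn, hdfs_glob, table, columns,
                          cat_cmd=("hdfs", "dfs", "-cat")):
    """Stream files matching hdfs_glob into Postgres without a temp file.

    conn is assumed to be a psycopg2 connection. cat_cmd is a
    hypothetical hook so the reader command can be swapped out.
    """
    # The child process writes the file contents to a pipe we can read.
    proc = subprocess.Popen([*cat_cmd, hdfs_glob], stdout=subprocess.PIPE)
    try:
        with conn.cursor() as cur:
            # copy_expert reads from the file-like object until EOF.
            cur.copy_expert(build_copy_sql(table, columns), proc.stdout)
    except Exception:
        proc.kill()
        conn.rollback()
        raise
    finally:
        proc.stdout.close()
        returncode = proc.wait()
    if returncode != 0:
        # The reader failed part-way, so discard the partial load.
        conn.rollback()
        raise RuntimeError(f"reader command exited with code {returncode}")
    conn.commit()
```

Usage would look something like `copy_hdfs_to_postgres(conn, "/data/agg/part-*", "daily_totals", ["day", "total"])`, where the path, table, and columns are placeholders. The commit only happens after the reader exits cleanly, so a failed read should not leave a half-loaded table.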

Edit: title typo…RDBMS*
