Currently our setup uses Attunity Replicate to capture changes in DB2 and write them to Hive. The problem is that Attunity Replicate does not handle Hive's ACID properties well, especially Update and Delete operations (we know Attunity Compose exists for this, but we can't use that option right now).
I read this presentation from Hortonworks and Attunity. It suggests using Attunity Replicate and piping the output to Kafka, and from there into the data lake (I guess we could use either Spark Streaming or Flume for that part).
However, I have a few doubts:
- Can Spark/Flume handle update/insert operations in Hive?
- Is this the optimal solution (DB2 -> Attunity Replicate -> Kafka -> Spark/Flume -> Hive), or is there a better approach?
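For context on what the Hive side of this could look like: Hive ACID tables support the `MERGE` statement (since Hive 2.2), so one option is for the streaming job to land the change records in a staging table and periodically merge them into the target. A rough sketch, where `target_tbl`, `cdc_staging`, the key/columns, and the `op` change-type column are all hypothetical names for illustration:

```sql
-- Apply CDC records from a staging table to an ACID target table.
-- Assumes target_tbl is stored as ORC with transactional=true,
-- and cdc_staging has an 'op' column: 'I' = insert, 'U' = update, 'D' = delete.
MERGE INTO target_tbl AS t
USING cdc_staging AS s
ON t.id = s.id
WHEN MATCHED AND s.op = 'D' THEN DELETE
WHEN MATCHED AND s.op = 'U' THEN UPDATE SET col1 = s.col1, col2 = s.col2
WHEN NOT MATCHED AND s.op = 'I' THEN INSERT VALUES (s.id, s.col1, s.col2);
```

So in principle a Spark job consuming from Kafka could write micro-batches into the staging table and run this merge, rather than trying to issue row-level updates directly; whether that beats the alternatives is part of my question.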