I am trying to build an infrastructure (preferably open source) to solve a big problem,
so I am looking at options with Presto.

I have large fact tables, which consist of:

1) 2 snapshot facts: 40 attributes and approx. 1 TB in size.
Partition key: PRODUCT_ID
Composite sort key: (Date, Country)

2) 2 transaction facts: 200 attributes each and approx. 50 GB in size.
Partition key: PRODUCT_ID
Sort key: (Date)

Use cases to solve:
1) Volume: One snapshot fact has to be joined with the 2 transaction facts and 3-4 dimensions to generate time-series metrics, aggregated at different hierarchy levels.
2) Velocity: Output in less than 1 minute, 2 minutes at most.
3) High concurrency: 50 users querying data in parallel from Tableau (max users: 200).
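
The query shape in use case 1 can be sketched as follows. This is only an illustration under stated assumptions: it uses Python's built-in sqlite3 as a stand-in engine (not Presto or Exasol), and every table name, column name, and data value is hypothetical. It also assumes the snapshot fact has one row per product and date. Each transaction fact is aggregated to that grain before the join, so that joining two facts does not multiply rows.

```python
import sqlite3

# Stand-in engine only: sqlite3 in memory. All names and values are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE snapshot_fact (product_id INTEGER, dt TEXT, stock INTEGER);
CREATE TABLE txn_sales     (product_id INTEGER, dt TEXT, units INTEGER);
CREATE TABLE txn_returns   (product_id INTEGER, dt TEXT, units INTEGER);
CREATE TABLE dim_product   (product_id INTEGER, category TEXT);

INSERT INTO snapshot_fact VALUES (1, '2020-01-01', 10), (1, '2020-01-02', 12),
                                 (2, '2020-01-01', 5);
INSERT INTO txn_sales     VALUES (1, '2020-01-01', 3), (1, '2020-01-01', 2),
                                 (2, '2020-01-01', 1);
INSERT INTO txn_returns   VALUES (1, '2020-01-01', 1);
INSERT INTO dim_product   VALUES (1, 'toys'), (2, 'toys');
""")

QUERY = """
WITH s AS (  -- transaction fact 1, pre-aggregated to (product_id, dt)
    SELECT product_id, dt, SUM(units) AS sold
    FROM txn_sales GROUP BY product_id, dt
),
r AS (       -- transaction fact 2, pre-aggregated to (product_id, dt)
    SELECT product_id, dt, SUM(units) AS returned
    FROM txn_returns GROUP BY product_id, dt
)
SELECT d.category,
       f.dt,
       SUM(f.stock)                 AS stock,
       SUM(COALESCE(s.sold, 0))     AS sold,
       SUM(COALESCE(r.returned, 0)) AS returned
FROM snapshot_fact f
JOIN dim_product d ON d.product_id = f.product_id
LEFT JOIN s ON s.product_id = f.product_id AND s.dt = f.dt
LEFT JOIN r ON r.product_id = f.product_id AND r.dt = f.dt
GROUP BY d.category, f.dt          -- one hierarchy level: category by day
ORDER BY f.dt, d.category
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

Changing the GROUP BY columns (for example to a coarser dimension attribute) gives the same metrics at another hierarchy level. Since all facts share PRODUCT_ID as the partition key, a join of this shape is on the partition key in either engine.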

I have only a limited budget to spend per year on infrastructure.

I can either build a Presto-Hive-S3/HDFS combination on EMR or buy an Exasol (2 TB) cluster,
but I am not sure which one is better.

Please recommend.
