

I am trying to build an infrastructure (preferably open source) to solve a big problem.
Hence, I am looking at options with Presto.

I have large fact tables, which consist of:

1) 2 snapshot facts: 40 attributes and approx. 1TB in size.
Partition key: PRODUCT_ID
Composite sort key: (Date, Country)

2) 2 transaction facts: 200 attributes each and approx. 50GB in size.
Partition key: PRODUCT_ID
Sort key: (Date)

Use cases to solve :
1) Volume: one snapshot fact has to be joined with the 2 transaction facts and 3-4 dimensions to generate time-series metrics, aggregated at different hierarchy levels.
2) Velocity: output in less than 1 minute, or 2 minutes at most.
3) High concurrency: 50 users querying data in parallel from Tableau (max users: 200).

I have only a limited budget to spend per year on infrastructure.

I can either build a Presto-Hive-S3/HDFS combo on EMR or buy an Exasol (2TB) cluster,
but I am not sure which one is better.
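
For reference, here is a rough tally of the raw fact-table volume described above, as a minimal sketch. It rests on several assumptions: the sizes quoted may be per table or combined (both readings are computed), decimal units are used (1TB = 1000GB), dimension tables and any compression are ignored, and the 2TB figure for Exasol is treated as a plain capacity number, although what it actually measures is not known here. The names `total_gb` and `EXASOL_GB` are made up for illustration.

```python
# Rough tally of raw fact-table volume. All figures come from the question;
# the interpretation of those figures is an assumption (see comments).

GB_PER_TB = 1000  # assumption: decimal units

SNAPSHOT_TABLES = 2
SNAPSHOT_SIZE_GB = 1 * GB_PER_TB   # "approx 1TB"
TRANSACTION_TABLES = 2
TRANSACTION_SIZE_GB = 50           # "approx 50GB"

# Assumption: the quoted 2TB is treated as a simple capacity figure.
EXASOL_GB = 2 * GB_PER_TB


def total_gb(per_table: bool) -> int:
    """Total raw fact volume in GB.

    per_table=True  -> each quoted size applies to every table of that kind.
    per_table=False -> each quoted size is the combined size of that kind.
    """
    if per_table:
        return (SNAPSHOT_TABLES * SNAPSHOT_SIZE_GB
                + TRANSACTION_TABLES * TRANSACTION_SIZE_GB)
    return SNAPSHOT_SIZE_GB + TRANSACTION_SIZE_GB


if __name__ == "__main__":
    for label, per_table in (("combined sizes", False), ("per-table sizes", True)):
        volume = total_gb(per_table)
        fits = volume <= EXASOL_GB
        print(f"{label}: {volume} GB raw, within 2TB: {fits}")
```

Under the per-table reading the raw facts alone slightly exceed 2TB before compression, so which reading applies matters for the Exasol option.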

Please recommend.


