I’m quite new to big data, and right now I’m researching different technologies for a prototype my team needs to build. Our system will essentially stream in data from a single source (although this source contains data from several different systems, in binary format). We stream it in, decode and process the messages, store them, and then have to serve the data to our end users through some kind of subscription-based dashboard.
We have a good handle on the streaming aspect right now: we’re using Kafka with either Spark Streaming or Kafka Streams (depending on how much processing we end up needing to perform on the data). But we’ve hit a roadblock on the storage side. I’ve read a lot about Cassandra, and it seems to be quite widely used in the industry at the moment, but my team lead wants to know why we can’t just use HDFS, since we could store both the unstructured and structured data there. From what I have read, though, HDFS is a file system, not a database. It doesn’t have a query language of its own, so I don’t know how well it would work for querying data on demand from a frontend system, whereas Cassandra seems like it would be good for this. The only downside is that the data has to be structured, and you apparently need to design your tables and columns with your queries in mind, as it’s not very flexible.
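To check my understanding of the "design your tables around your queries" point, here is a rough Python sketch. It is not Cassandra code, just a toy in-memory model of the idea as I understand it; the `Message` fields, the `(system, day)` partition key, and all class and method names are made up for illustration.

```python
from bisect import insort
from collections import defaultdict
from typing import NamedTuple


class Message(NamedTuple):
    # Hypothetical decoded message; these field names are assumptions.
    ts: int        # event time in epoch seconds (plays the role of a clustering column)
    system: str    # which upstream system produced the message
    day: str       # e.g. "2024-01-01"; part of the partition key
    payload: str


class QueryFirstTable:
    """Toy stand-in for a table partitioned by (system, day) and sorted by ts."""

    def __init__(self):
        self._partitions = defaultdict(list)

    def insert(self, msg):
        # Keep rows inside a partition ordered; tuples compare on ts first.
        insort(self._partitions[(msg.system, msg.day)], msg)

    def by_partition(self, system, day, since=0):
        # The query the layout was designed for: one partition lookup, then a ts filter.
        return [m for m in self._partitions.get((system, day), []) if m.ts >= since]

    def by_payload(self, text):
        # A query the layout was NOT designed for: every partition has to be scanned.
        return [m for rows in self._partitions.values() for m in rows if text in m.payload]
```

If I have this right, a dashboard query like "all messages from one system on one day since time T" maps onto `by_partition`, while anything that filters on a different field ends up like `by_payload` and would really want its own table (or a search index).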
On the flip side, there is Elasticsearch, which can also be used to store data. I’ve read that it should not be your primary data store, as it is designed first and foremost as a search engine, but that it can be really good for querying data and can be used with tools like Kibana to build the visualizations we need. To me it seems you can go with a combination of Elasticsearch and HDFS for storage and searching, or Cassandra for storage together with Elasticsearch for storage and searching… It’s really confusing.
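The way I currently picture the "not your primary data store" advice is sketched below in Python. Again, this is only a toy model and an assumption on my part: `PrimaryStore` stands in for HDFS or Cassandra, `SearchIndex` stands in for Elasticsearch, and none of these names come from a real library.

```python
from collections import defaultdict


class PrimaryStore:
    """Toy source of truth (stands in for HDFS or Cassandra); keeps every record."""

    def __init__(self):
        self.records = {}

    def put(self, record_id, text):
        self.records[record_id] = text


class SearchIndex:
    """Toy inverted index (stands in for Elasticsearch); holds only derived data."""

    def __init__(self):
        self.postings = defaultdict(set)

    def index(self, record_id, text):
        # Map each lowercased word to the ids of the records containing it.
        for word in text.lower().split():
            self.postings[word].add(record_id)

    def search(self, word):
        return sorted(self.postings.get(word.lower(), set()))


def rebuild(store):
    # Because the index is derived, it can be discarded and rebuilt from the store.
    idx = SearchIndex()
    for record_id, text in store.records.items():
        idx.index(record_id, text)
    return idx
```

In this picture, losing the index is recoverable because `rebuild` can recreate it from the primary store, but the reverse is not true.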
Can anyone offer some clarification on which one would be best for my particular use case? General use cases for each would also be helpful in understanding these technologies. Thanks.