Despite its young age, the Big Data ecosystem already contains a plethora of complex and diverse open source frameworks. They are commonly of two kinds: data platform frameworks, which provide the required storage scalability, and processing frameworks, which aim to improve query performance. A Big Data application is generally built by combining frameworks of both kinds so that they work together smoothly. Each framework operates with its own computational model. For example, a data platform framework may manage distributed files, tuples, or graphs, and a processing framework may handle batch or real-time jobs. Building a reliable and robust Data-Intensive Application (DIA) consists of finding a suitable combination that meets the requirements. Moreover, the quality of the DIA cannot be guaranteed without careful design by developers on the one hand and an optimal configuration of the frameworks by operators on the other.
In this blog post, we share three simple principles that we learned while building our Big Data application:
- Using models to synchronize the work of developers and operators;
- Designing databases so that we do not need to update or delete data; and
- Letting operators resolve low-level production-specific issues.
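
To make the second principle more concrete, here is a minimal sketch of an append-only data model in Python. It is an illustration under our own assumptions, not the code of our application: the names `Fact`, `AppendOnlyStore`, `append`, `latest`, and `history` are hypothetical, and a plain in-memory list stands in for a real data platform framework. The idea is that a change is recorded as a new timestamped fact instead of overwriting an existing row, so the current value is derived at read time.

```python
from dataclasses import dataclass
from typing import Any, List, Optional


@dataclass(frozen=True)
class Fact:
    """An immutable record: once written, it is never updated or deleted."""
    key: str
    value: Any
    timestamp: int  # hypothetical logical clock; larger means more recent


class AppendOnlyStore:
    """Hypothetical in-memory stand-in for an append-only table."""

    def __init__(self) -> None:
        self._facts: List[Fact] = []

    def append(self, key: str, value: Any, timestamp: int) -> None:
        # A change is stored as a new fact; existing facts are left untouched.
        self._facts.append(Fact(key, value, timestamp))

    def history(self, key: str) -> List[Fact]:
        # All facts recorded for a key, ordered from oldest to newest.
        matching = [f for f in self._facts if f.key == key]
        return sorted(matching, key=lambda f: f.timestamp)

    def latest(self, key: str) -> Optional[Any]:
        # The current value is derived at read time from the newest fact.
        facts = self.history(key)
        return facts[-1].value if facts else None


store = AppendOnlyStore()
store.append("user:1:city", "Paris", timestamp=1)
store.append("user:1:city", "Lyon", timestamp=2)  # an "update" is just a new fact
print(store.latest("user:1:city"))
print(len(store.history("user:1:city")))
```

Because nothing is overwritten, the full history of a key stays available, at the cost of extra storage and of resolving the latest value when reading.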