Let us imagine that you are a software developer at a highly innovative, data-driven start-up. The company delivers a cutting-edge product called “Data Digger Solution” (DDS), which gathers raw data from various heterogeneous sources (e.g. social media, websites, CRM, online sales, servers, emails), processes it and extracts tangible insights, enriched with fresh semantics that allow concrete and profitable interpretations (e.g. in terms of sales and web presence). Your start-up is growing and signing more and more contracts with major players in various sectors (banking, insurance, retail, media, etc.), and this is great! Your boss is a visionary, or maybe he has just read the new IDC forecast, which sees the Big Data technology and services market growing at a 26.4% compound annual growth rate to $41.5 billion through 2018, driven by wide adoption across industries. So that you do not become a victim of your own success, your boss has asked you to rapidly design and implement a prototype of DDS using Big Data and Cloud technologies, and to make sure it keeps pace with the start-up’s unstoppable business acceleration, especially in terms of performance, reliability and scalability: “Do it fast, cheap, at scale and don’t lose data!”
Delivering this prototype on time will spare you from having to explain to your boss why you deserve a raise! Thus motivated, you open your favorite search engine, type “build big data application”, get thousands of articles and read some of them. At the end of the day you have a plethora of terms such as MapReduce, Hadoop, Spark, Cassandra, Storm, VM, Linux, Cloudify, ZooKeeper, Kafka, Akka, Java, Scala and maybe also Lambda Architecture. Since you are clever, you understand that these are not point-and-click technologies. Yet you are still puzzled. How do you start the project? How do you design your Big Data application? How can you satisfy all the quality requirements? What architecture should you adopt, keeping in mind the future evolution of the system? How can you accelerate quality testing for your release?
Answering these questions (and more) is the role of the DICE methodology. It is a step-by-step workflow that we are continuously testing and validating on actual Data-Intensive Applications (DIAs).
This methodology relies on the full set of DICE tools, which foster the efficient specification, design, development and deployment of DIAs for various business domains. The DICE toolset incorporates ready-to-use, built-in components supporting many DIA platforms and technologies. So far, the DICE methodology consists of ten defined and fully equipped activities, ranging from business modelling and requirement analysis to deployment and real-time feedback analysis:
- Business modelling and requirement analysis
- DIA architecture design
- DPIM simulation and verification
- DIA platform and technology mapping
- DTSM simulation and verification
- Platform- and technology-specific implementations
- DIA deployment mapping
- DDSM optimization
- DIA deployment
- Runtime feedback analysis
Each of these activities involves actors who perform identified tasks with existing tools. From design to deployment, they are guided and assisted by the DICE IDE, which interacts with the DICE development and runtime tools. The workflow allows iterations between these steps in order to better meet the designer’s requirements and to let users take full advantage of the DevOps capabilities of the DICE toolset.
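To make the idea of an ordered workflow with iterations more concrete, here is a minimal Python sketch. It is not part of the DICE toolset: the names `ACTIVITIES`, `next_activity` and `back_to` are hypothetical, and the rule that an iteration only returns to the current or an earlier activity is an assumption made for illustration.

```python
# The ten activities, in the order this post lists them.
ACTIVITIES = [
    "Business modelling and requirement analysis",
    "DIA architecture design",
    "DPIM simulation and verification",
    "DIA platform and technology mapping",
    "DTSM simulation and verification",
    "Platform- and technology-specific implementations",
    "DIA deployment mapping",
    "DDSM optimization",
    "DIA deployment",
    "Runtime feedback analysis",
]


def next_activity(current, back_to=None):
    """Return the activity to perform after `current`.

    `back_to` is a hypothetical way to express an iteration: when given,
    it must name `current` or an earlier activity, and the workflow
    resumes there. Returns None once the last activity is done.
    """
    i = ACTIVITIES.index(current)  # raises ValueError for unknown names
    if back_to is not None:
        j = ACTIVITIES.index(back_to)
        if j > i:
            # Assumption: iterating means going back, not skipping ahead.
            raise ValueError("an iteration can only go back to an earlier activity")
        return ACTIVITIES[j]
    return ACTIVITIES[i + 1] if i + 1 < len(ACTIVITIES) else None
```

For example, if the DTSM simulation reveals a performance problem, calling `next_activity("DTSM simulation and verification", back_to="DIA architecture design")` sends the designer back to the architecture step instead of moving on to implementation.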
The DICE philosophy built into the DICE IDE proposes an innovative architecture and approach that keeps the entire environment flexible and extensible. The Eclipse Platform and the Papyrus Modelling Environment were chosen deliberately, mainly because of the built-in extension mechanisms these platforms offer and because they are widely adopted by developers, i.e. the potential end-users of DICE (you). Extensibility matters even more in DICE because users can adapt and enrich the list of supported Big Data technologies, a list that will certainly keep evolving and growing. Established solutions such as Spark, Cassandra or Hadoop are already supported, but more and more new solutions will emerge. Thanks to Eclipse and to the DICE extension mechanisms, DICE users will be able to integrate these technologies with minimal effort and benefit from the whole DICE ecosystem. This extensibility is also part of the methodology itself.
To sum up, the DICE methodology will (1) guide you through the steps needed to build an efficient architecture, test it, simulate it, optimize it and deploy your DIA, and (2) let you adapt it to your needs. Before I thank you for reading this post, let me tell you that the astonishing growth of data in general will profoundly affect businesses, and this fictional story will become, in the near future, an actual challenge for many SMEs. Upcoming posts and deliverables will give more details about the DICE methodology.
Youssef Ridene (Netfective Technology)