Rich Client Platform for the DIA-integrated Development

Mar 07, 2017

DICE focuses on quality assurance for data-intensive applications (DIA) developed through the Model-Driven Engineering (MDE) paradigm. The project aims to deliver methods and tools that help satisfy quality requirements in data-intensive applications through iterative enhancement of their architecture design. One component of the tool chain developed within the project is the DICE IDE, an Integrated Development Environment (IDE) that accelerates the development of data-intensive applications.

The Eclipse-based DICE IDE integrates most of the tools of the DICE framework and forms the basis of the DICE methodology. As highlighted in the deliverable D1.1 State of the Art Analysis, no MDE IDE yet exists on the software market through which a designer can create models to describe and analyse data-intensive or Big Data applications and their underpinning technology stack. This is the motivation for defining the DICE IDE.

The DICE IDE is based on Eclipse, the de-facto standard platform for creating software engineering models with the MDE approach. DICE customizes the Eclipse IDE with plug-ins that integrate the execution of the different DICE tools, in order to minimize learning curves and simplify adoption. In this blog post we explain how the DICE tools introduced earlier have been integrated into the IDE. So, how is the DICE IDE built?


Apache Cassandra: From Design to Deployment

Feb 02, 2017

In spite of its young age, the Big Data ecosystem already contains a plethora of complex and diverse open source frameworks. They commonly come in two kinds: data platform frameworks, which provide the needed storage scalability, and processing frameworks, which aim to improve query performance [1]. A Big Data application is generally produced by combining them smoothly. Each framework operates with its own computational model: a data platform framework may manage distributed files, tuples, or graphs, while a processing framework may handle batch or real-time jobs. Building a reliable and robust Data-Intensive Application (DIA) consists in finding a suitable combination that meets the requirements. Moreover, without careful design by developers on the one hand, and optimal configuration of the frameworks by operators on the other, the quality of the DIA cannot be guaranteed.

In this blog post we would like to mention three simple principles we have learned while we were building our Big Data application:

  1. Using models to synchronize the work of developers and operators;
  2. Designing databases so that we do not need to update or delete data; and
  3. Letting operators resolve low-level production-specific issues.
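The second principle deserves a concrete illustration. In a time-keyed store such as Cassandra, an "update" can be modelled as a fresh insert with a later timestamp, so the latest state is simply the most recent entry. The sketch below is a hypothetical, framework-free analogue of that append-only design (the class and its field names are ours, not part of any Cassandra driver):

```python
import time
from collections import defaultdict

class AppendOnlyStore:
    """Illustrative append-only store: every change is a new,
    timestamped entry; nothing is ever updated or deleted."""

    def __init__(self):
        self._rows = defaultdict(list)  # key -> list of (ts, value)

    def put(self, key, value, ts=None):
        # An "update" is just another insert with a later timestamp.
        self._rows[key].append((ts if ts is not None else time.time(), value))

    def latest(self, key):
        # The current state is the entry with the highest timestamp.
        return max(self._rows[key])[1]

    def history(self, key):
        return [v for _, v in sorted(self._rows[key])]

store = AppendOnlyStore()
store.put("sensor-1", 20.5, ts=1)
store.put("sensor-1", 21.0, ts=2)   # supersedes, never overwrites
print(store.latest("sensor-1"))     # 21.0
print(store.history("sensor-1"))    # [20.5, 21.0]
```

A time-keyed Cassandra table behaves analogously: reads pick the newest cell, and old entries age out via compaction or TTLs rather than explicit deletes.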


Formal Verification of Data-Intensive Applications with Temporal Logic

Dec 05, 2016

Besides functional aspects, designers of Data-Intensive Applications have to consider various quality aspects that are specific to applications processing huge volumes of data with high throughput and running on clusters of (many) physical machines. A broad set of non-functional aspects in the areas of performance and safety should be addressed at an early stage of the design process to guarantee high-quality software development.

Evaluating the correctness of such applications, especially when functional and non-functional aspects are both involved, is far from trivial. In the case of Data-Intensive Applications, the inherently distributed architecture, the software stratification and the computational paradigm implementing the application logic raise new questions about the criteria that should be used to evaluate correctness.
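As an illustrative example (the predicates are hypothetical, not taken from a specific DICE model), temporal logic lets a designer state such non-functional requirements precisely, e.g. that every message entering a topology is eventually processed and that a queue never exceeds a bound K:

```
G ( enqueue(m) -> F process(m) )   -- globally, every enqueued message is eventually processed
G ( queue_len <= K )               -- globally, the queue length stays within the bound K
```

A verification tool can then check such formulae against a formal model of the application and report counterexample runs when a property is violated.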


Performance and Reliability in DIA Development

Oct 20, 2016

Worried about the performance and reliability of your data-intensive application?

Capgemini research shows that only 13% of organizations have achieved full-scale production for their Data-Intensive Applications (DIA). In particular, the research refers to applications built on Big Data implementations such as Hadoop MapReduce, Apache Storm or Apache Spark. Apart from correctly deploying and optimizing a DIA, software engineers face the problem of achieving performance and reliability requirements. A framework that helps guarantee these requirements in the very early phases of development could be of great help, since in later phases the ecosystem of a cluster is not completely controllable. Predictions of throughput, service times or scalability under varying numbers of users, workloads, network traffic or failures are therefore needed. Within the DICE project, a Simulation tool has been developed to help achieve that.
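The DICE Simulation tool works on design models; as a much simpler, hedged illustration of the kind of prediction involved, the classical M/M/1 queueing formula already shows how mean response time grows non-linearly as load approaches capacity (the rates below are made up for illustration):

```python
def mm1_response_time(arrival_rate, service_rate):
    """Mean response time of an M/M/1 queue: T = 1 / (mu - lambda).
    Valid only while utilization rho = lambda / mu < 1."""
    if arrival_rate >= service_rate:
        raise ValueError("system is unstable (utilization >= 1)")
    return 1.0 / (service_rate - arrival_rate)

# Service rate of 100 requests/s; watch response time grow with load.
for lam in (50, 80, 95, 99):
    t = mm1_response_time(lam, 100.0)
    print(f"load {lam}/s -> mean response time {t * 1000:.1f} ms")
```

At 50 req/s the mean response time is 20 ms, but at 99 req/s it reaches a full second: exactly the kind of early-phase insight a predictive tool is meant to surface before deployment.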


Using Apache Storm for Trend Detection in the Social Media

Oct 05, 2016

As is widely known, especially in the media industry, messages posted in social media contain valuable information related to events and trends in the real world. Industries and brands that analyze social media gain valuable insights, which they use in a number of operations.

For example, in the news industry, trend detection is useful for:

  • identifying emerging news based on the popularity of a certain topic and
  • defining areas of great public interest that should be closely monitored as even a small development affects many people and leads to emerging news.
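The actual detection logic runs inside a Storm topology; as a hedged, framework-free sketch of the underlying idea, a trend can be flagged when a term's frequency in a recent window clearly exceeds its historical baseline (the thresholds and sample terms below are purely illustrative):

```python
from collections import Counter

def trending_terms(recent, baseline, min_count=3, ratio=2.0):
    """Flag terms whose frequency in the recent window is at least
    `ratio` times their frequency in the baseline window (hypothetical
    thresholds, chosen for illustration)."""
    recent_counts = Counter(recent)
    base_counts = Counter(baseline)
    trends = []
    for term, count in recent_counts.items():
        if count >= min_count and count >= ratio * max(base_counts[term], 1):
            trends.append(term)
    return sorted(trends)

baseline = ["election"] * 5 + ["weather"] * 4
recent = ["earthquake"] * 6 + ["election"] * 5 + ["weather"]
print(trending_terms(recent, baseline))   # ['earthquake']
```

In a streaming setting, the same comparison runs continuously over sliding windows, with a bolt maintaining the counters and emitting newly trending terms downstream.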


Going for NoOps: should SysAdmins be worried for their jobs?

Aug 29, 2016

Reliable and fast automation drives an efficient, quality-driven development process. In DICE, we are factoring the deployment of services such as Storm, Cassandra or Hadoop into this process. We offer this capability in a tool called DICER, and back it up with a technology library that off-loads the installation and configuration work to a set of scripts. In effect, our technology library enables a NoOps experience, because no SysAdmins are required to set these services up. But is this bad news for the SysAdmins? Will DICE put them out of a job?


A design for life!

Aug 15, 2016

Have you ever had problems working with a data intensive application?

If so, you'll know that the difficulty comes from having to deal with various, unavoidable failures. So what do you do? Many people have found success by designing software never to fail. But there are a few things you should know before you buy and implement a solution, in order to ensure your software is actually resilient to failures of the hosting environment. This post tells you what you need to know to select a far more viable strategy for making your applications reliable, one that lets you properly test applications both during development and after deployment. Within the DICE project, a Fault Injection Tool (FIT) has been developed to help achieve exactly that.
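The FIT injects faults at the level of VMs and cloud services; as a minimal, purely illustrative in-process analogue, one can wrap a call so that it fails with a given probability and then check that the retry logic survives (the names and failure rate below are hypothetical):

```python
import random

def with_faults(fn, failure_rate, rng):
    """Wrap fn so that each call fails with probability failure_rate
    (an illustrative stand-in for injected infrastructure faults)."""
    def wrapped(*args, **kwargs):
        if rng.random() < failure_rate:
            raise RuntimeError("injected fault")
        return fn(*args, **kwargs)
    return wrapped

def call_with_retry(fn, attempts=5):
    """Naive retry loop: the resilience strategy under test."""
    for _ in range(attempts):
        try:
            return fn()
        except RuntimeError:
            pass
    raise RuntimeError("all attempts failed")

rng = random.Random(42)                 # seeded for reproducibility
flaky_fetch = with_faults(lambda: "payload", failure_rate=0.5, rng=rng)
print(call_with_retry(flaky_fetch))
```

The same idea scales up: inject failures into the hosting environment rather than the call site, and verify the whole application keeps its service-level promises.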


DICE Configuration Optimization Tool (BO4CO)

Jul 05, 2016

Big Data systems are regarded as a new class of software systems that leverage several emerging technologies to efficiently ingest, process and produce large quantities of data. Each of the constituent technologies (e.g., Hadoop, Spark, Cassandra) typically has dozens of configurable parameters that must be carefully tuned to perform optimally. Unfortunately, users of such systems, like data scientists, usually lack the technical skills to tune system internals; such users would rather use a system that can tune itself. Yet there is a shortage of automated methods to support the configuration of Big Data systems. One possible explanation is that the influence of configuration options on performance is not well understood [1].
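BO4CO itself applies Bayesian optimization; as a deliberately simpler sketch of the same workflow (sample configurations, measure a performance metric, keep the best), plain random search over a hypothetical configuration space already makes the idea concrete. The parameter names and the stand-in latency model below are invented for illustration:

```python
import random

# Hypothetical configuration space for a stream-processing job.
space = {
    "executors":  [1, 2, 4, 8],
    "batch_size": [64, 256, 1024],
    "buffer_mb":  [8, 32, 128],
}

def measure_latency(cfg):
    """Stand-in benchmark; a real tuner would deploy and measure."""
    return (100.0 / cfg["executors"]
            + cfg["batch_size"] / 256.0
            + abs(cfg["buffer_mb"] - 32) * 0.1)

def random_search(space, budget, seed=0):
    """Sample `budget` random configurations and keep the best one."""
    rng = random.Random(seed)
    best_cfg, best_lat = None, float("inf")
    for _ in range(budget):
        cfg = {k: rng.choice(v) for k, v in space.items()}
        lat = measure_latency(cfg)
        if lat < best_lat:
            best_cfg, best_lat = cfg, lat
    return best_cfg, best_lat

cfg, lat = random_search(space, budget=30)
print(cfg, round(lat, 2))
```

A Bayesian optimizer improves on this by fitting a surrogate model to past measurements and sampling where the model predicts the most gain, which matters when each measurement means an expensive cluster run.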


May 22, 2016

The IT industry is not immune to the pressure to speed up the production of its goods: applications and services. The best way to reduce the cost and time needed to build a software solution is to automate the processes that can be done better and faster by machines, without losing the essence of the process. Installing and configuring software is traditionally a manual process, and thus complex, costly and time-consuming. A much better alternative is to describe the whole application in a blueprint, then use a suitable tool to interpret the blueprint and turn it into a live application. OASIS TOSCA provides an emerging standard for describing applications in blueprints.
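As a hedged sketch of what such a blueprint looks like (the topology itself is invented; `tosca.nodes.Compute` and `tosca.nodes.WebApplication` are normative TOSCA Simple Profile node types), a TOSCA template declares the application's components and how they are hosted, leaving the "how" of installation to the orchestrator:

```yaml
tosca_definitions_version: tosca_simple_yaml_1_0

topology_template:
  node_templates:
    app_server:
      type: tosca.nodes.Compute
      capabilities:
        host:
          properties:
            num_cpus: 2
            mem_size: 4 GB
    web_app:
      type: tosca.nodes.WebApplication   # illustrative choice of type
      requirements:
        - host: app_server               # the app is hosted on app_server
```

An orchestration tool interprets such a blueprint, provisions the compute node, and installs and wires up the components, turning the description into a live application.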


May 18, 2016

Big Data is certainly surrounded by hype nowadays, and a tremendous number of frameworks are available that enable companies to develop Big Data applications. The development of data-intensive applications, like that of any other software, involves testing, validation and fine-tuning to ensure the performance and reliability the end-users expect. Throughout these processes the execution of the application needs to be constantly monitored in order to extract execution trends and spot anomalies. And this is only the beginning: once in production, monitoring the application, together with its underlying infrastructure, is a must. But Big Data applications generate Big Monitoring Data, and not only that: the data is generated in different formats and is available either in log files or via APIs.
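As a small illustration of that heterogeneity (the log format and JSON field names below are hypothetical), a monitoring platform typically normalizes records from different sources into one common shape before storing or analyzing them:

```python
import json
import re

def from_log_line(line):
    """Parse a hypothetical 'timestamp metric value' log line."""
    m = re.match(r"(\d+)\s+(\S+)\s+([\d.]+)", line)
    ts, name, value = m.groups()
    return {"ts": int(ts), "metric": name, "value": float(value)}

def from_api_payload(payload):
    """Normalize a hypothetical JSON payload from a metrics API."""
    d = json.loads(payload)
    return {"ts": d["timestamp"], "metric": d["name"], "value": d["reading"]}

# Two different sources, one common record shape.
print(from_log_line("1700000000 cpu.load 0.75"))
print(from_api_payload('{"timestamp": 1700000000, "name": "cpu.load", "reading": 0.75}'))
```

Once everything shares one record shape, downstream storage, aggregation and anomaly detection no longer need to care where a measurement came from.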