Apache Cassandra: From Design to Deployment

Feb 02, 2017
 

In spite of its young age, the Big Data ecosystem already contains a plethora of complex and diverse open source frameworks. They are commonly of two kinds: data platform frameworks, which provide the needed storage scalability, and processing frameworks, which aim to improve query performance [1]. A Big Data application is generally produced by combining them smoothly. Each framework operates with its own computational model. For example, a data platform framework may manage distributed files, tuples, or graphs, and a processing framework may handle batch or real-time jobs. Building a reliable and robust Data-Intensive Application (DIA) consists of finding a suitable combination that meets the requirements. Moreover, without careful design by developers on the one hand, and an optimal configuration of the frameworks by operators on the other, the quality of the DIA cannot be guaranteed.

In this blog post we share three simple principles we learned while building our Big Data application:

  1. Using models to synchronize the work of developers and operators;
  2. Designing databases so that we do not need to update or delete data; and
  3. Letting operators resolve low-level production-specific issues.
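
To make the second principle concrete, here is a minimal sketch of an append-only design. It is plain Python rather than Cassandra code, and every name in it (`Reading`, `AppendOnlyLog`, the sensor example) is a hypothetical illustration, not something taken from our application. The idea it shows is that a correction is stored as a new row, and the current state is derived when reading.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Reading:
    """One immutable fact: a sensor reported a value at a given time."""
    sensor_id: str
    timestamp: int  # assumed to be seconds since the epoch
    value: float


class AppendOnlyLog:
    """Hypothetical in-memory store that only ever appends rows."""

    def __init__(self) -> None:
        self._rows: List[Reading] = []

    def append(self, reading: Reading) -> None:
        # A correction is written as a new row, never as an in-place update.
        self._rows.append(reading)

    def latest(self, sensor_id: str) -> Optional[Reading]:
        # The current state is derived at read time: pick the newest row.
        candidates = [r for r in self._rows if r.sensor_id == sensor_id]
        return max(candidates, key=lambda r: r.timestamp, default=None)

    def history(self, sensor_id: str) -> List[Reading]:
        # Older rows are still available, ordered by time.
        rows = [r for r in self._rows if r.sensor_id == sensor_id]
        return sorted(rows, key=lambda r: r.timestamp)


log = AppendOnlyLog()
log.append(Reading("s1", 100, 20.5))
log.append(Reading("s1", 160, 21.0))  # supersedes the first row, does not replace it
print(log.latest("s1"))
```

Because nothing is overwritten, the full history stays available for auditing, at the cost of extra storage and a slightly more expensive read.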

Continue reading »

Formal Verification of Data-Intensive Applications with Temporal Logic

Dec 05, 2016
 

Besides functional aspects, designers of Data-Intensive Applications have to consider various quality aspects that are specific to applications processing huge volumes of data at high throughput and running on clusters of (many) physical machines. A broad set of non-functional aspects in the areas of performance and safety should be included at an early stage of the design process to guarantee high-quality software development.

Evaluating the correctness of such applications, especially when both functional and non-functional aspects are involved, is far from trivial. In the case of Data-Intensive Applications, the inherently distributed architecture, the software stratification and the computational paradigm implementing the application logic raise new questions about the criteria that should be used to evaluate their correctness.

Continue reading »

Performance and Reliability in DIA Development

Oct 20, 2016
 

Worried about the performance and reliability of your data-intensive application?

Capgemini research shows that only 13% of organizations have achieved full-scale production for their Data-Intensive Applications (DIA). In particular, the research refers to applications using Big Data implementations such as Hadoop MapReduce, Apache Storm or Apache Spark. Apart from the correct deployment and optimization of a DIA, software engineers face the problem of meeting performance and reliability requirements. A framework that helps guarantee these requirements in the very early phases of development could be of great help, since in later phases the ecosystem of a cluster is not completely controllable. Therefore, predictions of throughput, service time or scalability under varying numbers of users, workloads, network traffic or failures are needed. Within the DICE project, a Simulation tool has been developed to help achieve that.

Continue reading »

Using Apache Storm for Trend Detection in the Social Media

Oct 05, 2016
 

As is widely known, especially in the media industry, messages posted on social media contain valuable information related to events and trends in the real world. Industries and brands that analyze social media gain valuable insights, which they use in a number of operations.

For example, in the news industry, trend detection is useful for:

  • identifying emerging news based on the popularity of a certain topic; and
  • defining areas of great public interest that should be closely monitored, as even a small development affects many people and leads to emerging news.

Continue reading »

Going for NoOps: should SysAdmins be worried for their jobs?

Aug 29, 2016
 

Reliable and fast automation drives an efficient, quality-driven development process. In DICE, we are factoring the deployment of services such as Storm, Cassandra or Hadoop into this process. We offer this capability in a tool called DICER, and back it up with a technology library that off-loads the installation and configuration work to a set of scripts. In effect, our technology library gives users a NoOps experience, because no SysAdmins are required to set these services up. But is this bad news for the SysAdmins? Will DICE put them out of a job?

Continue reading »

A design for life!

Aug 15, 2016
 

Have you ever had problems working with a data-intensive application?

If so, you’ll know that the difficulty comes from unavoidably having to deal with various failures. So what do you do? Many people have found success by designing software to never fail. But there are a few things you should know before you buy and implement a solution, to ensure your software is actually resilient to failures of the hosting environment. This post tells you what you need to know to select a more viable strategy for making your applications reliable, and how to properly test applications both during development and after deployment. Within the DICE project, a Fault Injection Tool (FIT) has been developed to help achieve exactly that.

Continue reading »

DICE Configuration Optimization Tool (BO4CO)

Jul 05, 2016
 

Big Data systems are regarded as a new class of software systems that leverage several emerging technologies to efficiently ingest, process and produce large quantities of data. Each of the constituent technologies (e.g., Hadoop, Spark, Cassandra) typically has dozens of configurable parameters that should be carefully tuned for the system to perform optimally. Unfortunately, users of such systems, like data scientists, usually lack the technical skills to tune system internals. Such users would rather use a system that can tune itself. Yet, there is a shortage of automated methods to support the configuration of Big Data systems. One possible explanation is that the influence of a configuration option on performance is not well understood [1].

Continue reading »

May 22, 2016
 

The IT industry is not immune to the drive to speed up the production of its goods, namely applications and services. The best way of reducing the cost and time needed to build a software solution is to automate the steps that can be done better and faster automatically without losing the essence of the process. Installing and configuring software is traditionally a manual process, and thus complex, costly and time-consuming. A much better alternative is to describe the whole application in a blueprint, then use a suitable tool to interpret the blueprint and turn it into a live application. OASIS TOSCA provides an emerging standard for describing applications in blueprints.

Continue reading »

May 18, 2016
 

Big Data is certainly heavily hyped nowadays, and a tremendous number of frameworks are available that enable companies to develop Big Data applications. The development of data-intensive applications, like the development of any other software application, involves testing, validation and fine-tuning processes to ensure the performance and reliability that end users expect. Throughout these processes, the execution of the application needs to be constantly monitored in order to extract execution trends and spot anomalies. And this is only the beginning. Once in production, monitoring of the application, together with its underlying infrastructure, is a must. But Big Data applications generate Big Monitoring Data, and that is not all: the data is generated in different formats and is available either in log files or via APIs.

Continue reading »

DICE enables Quality-Driven DevOps for Big Data – a White Paper

Apr 25, 2016
 

The DICE project has recently concluded its first year of activity, during which a lot of progress has been made in defining an innovative framework for developing Big Data applications. A technical architecture has been defined and initial prototypes are rapidly maturing.
The DICE consortium has recently released a white paper to explain to industrial stakeholders the purpose of DICE, its architecture and tool offering, and the market-oriented demonstrators that are currently being implemented.

Download the DICE White Paper

The first complete release of the DICE tools is set for August 2016, with an integrated development environment set for release in February 2017. Stay tuned!

Giuliano Casale, DICE Project Coordinator