Performance and Reliability in DIA Development

Oct 20, 2016

Worried about the performance and reliability of your data-intensive application?

Capgemini research shows that only 13% of organizations have achieved full-scale production for their data-intensive applications (DIAs). In particular, the research refers to applications built on Big Data technologies such as Hadoop MapReduce, Apache Storm or Apache Spark. Apart from deploying and optimizing a DIA correctly, software engineers face the problem of meeting performance and reliability requirements. A framework that helps guarantee these requirements in the very early phases of development would be of great help, because in later phases the ecosystem of a cluster is not completely controllable. Predictions of throughput, service time and scalability under varying numbers of users, workloads, network traffic or failures are therefore needed. Within the DICE project, a Simulation tool has been developed to help achieve that.


Using Apache Storm for Trend Detection in the Social Media

Oct 05, 2016

As is widely known, especially in the media industry, messages posted on social media contain valuable information about events and trends in the real world. Industries and brands that analyze social media gain valuable insights, which they use in a number of operations.

For example, in the news industry, trend detection is useful for:

  • identifying emerging news based on the popularity of a certain topic and
  • defining areas of great public interest that should be closely monitored as even a small development affects many people and leads to emerging news.


Going for NoOps: should SysAdmins be worried for their jobs?

Aug 29, 2016

Reliable and fast automation drives an efficient, quality-driven development process. In DICE, we are factoring the deployment of services such as Storm, Cassandra or Hadoop into this process. We offer this capability in a tool called DICER, and back it up with a technology library that off-loads the installation and configuration work to a set of scripts. In effect, our technology library enables a NoOps experience for users, because no SysAdmins are required to set these services up. But is this bad news for the SysAdmins? Will DICE put them out of a job?


A design for life!

Aug 15, 2016

Have you ever had problems working with a data-intensive application?

If so, you’ll know that the difficulty comes from unavoidably having to deal with various failures. So what do you do? Many people have found success by designing software to never fail. But there are a few things you should know before you buy and implement a solution, to ensure your software is actually resilient to failures of the hosting environment. This post tells you what you need to know to select a much more viable strategy for making your applications reliable, and how to test applications properly both during development and after deployment. Within the DICE project, a Fault Injection Tool (FIT) has been developed to help achieve exactly that.


DICE Configuration Optimization Tool (BO4CO)

Jul 05, 2016

Big Data systems are regarded as a new class of software systems that leverage several emerging technologies to efficiently ingest, process and produce large quantities of data. Each of the constituent technologies (e.g., Hadoop, Spark, Cassandra) typically has dozens of configurable parameters that should be carefully tuned for the system to perform optimally. Unfortunately, users of such systems, like data scientists, usually lack the technical skills to tune system internals and would rather use a system that can tune itself. Yet there is a shortage of automated methods to support the configuration of Big Data systems. One possible explanation is that the influence of a configuration option on performance is not well understood [1].


May 22, 2016

The IT industry is not immune to the push to speed up the production of its goods: applications and services. The best way to reduce the cost and time needed to build a software solution is to automate the steps that can be done better and faster automatically, without losing the essence of the process. Installing and configuring software is traditionally a manual process, and thus complex, costly and time-consuming. A much better alternative is to describe the whole application in a blueprint, then use a suitable tool to interpret the blueprint and turn it into a live application. OASIS TOSCA provides an emerging standard for describing applications in blueprints.


May 18, 2016

Big Data is certainly much hyped nowadays, and a tremendous number of frameworks are available that enable companies to develop Big Data applications. The development of data-intensive applications, like the development of any other software application, involves testing, validation and fine-tuning to ensure the performance and reliability that end users expect. Throughout these processes, the execution of the application needs to be constantly monitored in order to extract execution trends and spot anomalies. And this is only the beginning. Once in production, monitoring of the application, together with its underlying infrastructure, is a must. But Big Data applications generate Big Monitoring Data, and that is not all: the data is generated in different formats and is available either in log files or via APIs.

DICE enables Quality-Driven DevOps for Big Data – a White Paper

Apr 25, 2016

The DICE project has recently concluded its first year of activity, during which a lot of progress has been made in the definition of an innovative framework to develop Big Data applications. A technical architecture has been defined and initial prototypes are rapidly maturing.
The DICE consortium has recently released a white paper to explain to industrial stakeholders the purpose of DICE, its architecture and tool offering, and the market-oriented demonstrators that are currently being implemented.

Download the DICE White Paper

The first complete release of the DICE tools is set for August 2016, with an integrated development environment set for release in February 2017. Stay tuned!

Giuliano Casale, DICE Project Coordinator

Apr 08, 2016

Let us imagine that you are a Software Developer working in a highly innovative data-driven start-up delivering a cutting-edge solution called “Data Digger Solution”. It gathers raw data from various heterogeneous sources (e.g. social media, websites, CRM, online sales, servers, emails, etc.), processes them and extracts tangible insights from them, with fresh semantics allowing concrete and profitable interpretations (e.g. in terms of sales and web presence). Your start-up is growing and signing more and more contracts with major players in various sectors (banks, insurers, retailers, media, etc.) and this is great! Your boss is a visionary man, or maybe he has just read the new IDC forecast, which sees the Big Data technology and services market growing at a 26.4% compound annual growth rate to $41.5 billion through 2018, driven by wide adoption across industries. To avoid becoming a victim of your own success, your boss has asked you to rapidly design and implement a prototype of “Data Digger Solution”, aka DDS, using Big Data and Cloud technologies, and to make sure it keeps pace with the start-up's unstoppable business acceleration, especially in terms of performance, reliability and scalability: “Do it fast, cheap, at scale and don’t lose data!”


Support for FCO in Cloudify

Mar 02, 2016

Cloudify is an important component of the DICE deployment tool. It lets users describe their applications in YAML, a human-readable text format, using a TOSCA dialect to describe the application's topology. The blueprint containing the topology normally specifies the nodes (virtual machines), the services needed by the application (e.g., Kafka, Spark, Zookeeper, etc.), which service runs on which node, and the relationships between services. Cloudify then takes care of provisioning resources and installing the components on a public or private Cloud provider of choice. The original list of supported providers includes Amazon EC2, OpenStack, vSphere and others. In DICE, we extended support to the Flexiant Cloud Orchestrator (FCO) as well.
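
To make the idea of a topology blueprint more concrete, here is a minimal sketch in plain Python. It is not Cloudify's API and not the TOSCA syntax: the dictionary layout, the node names (`vm_a`, `vm_b`), the dependency edges and the helper functions `validate` and `install_order` are all hypothetical, chosen only to illustrate the four ingredients listed above (nodes, services, placement, relationships) and how a tool might derive an installation order from them.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical topology: two virtual machines hosting three services.
# The dependency edges are illustrative assumptions, not taken from DICE.
blueprint = {
    "nodes": ["vm_a", "vm_b"],
    "services": {
        "zookeeper": {"runs_on": "vm_a", "depends_on": []},
        "kafka": {"runs_on": "vm_a", "depends_on": ["zookeeper"]},
        "spark": {"runs_on": "vm_b", "depends_on": ["kafka"]},
    },
}


def validate(bp):
    """Return a list of problems: unknown hosts or unknown dependencies."""
    problems = []
    for name, spec in bp["services"].items():
        if spec["runs_on"] not in bp["nodes"]:
            problems.append(f"{name}: unknown node {spec['runs_on']}")
        for dep in spec["depends_on"]:
            if dep not in bp["services"]:
                problems.append(f"{name}: unknown dependency {dep}")
    return problems


def install_order(bp):
    """Order services so that each one comes after everything it depends on."""
    graph = {name: set(spec["depends_on"]) for name, spec in bp["services"].items()}
    return list(TopologicalSorter(graph).static_order())


if __name__ == "__main__":
    print(validate(blueprint))
    print(install_order(blueprint))
```

Keeping the topology as declarative data, separate from the code that interprets it, is the point of the blueprint approach: the same description can be checked for consistency and then handed to an orchestrator for whichever Cloud provider is chosen.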