DevOps: A Quality Assessment Experience

Dec 13, 2017

In a previous article on this blog, we discussed the importance of assessing quality during the development of data-intensive applications (DIA). In particular, we explored the performance and reliability properties of DIA and presented a simulation tool (SimTool) that helps for this purpose. This article extends that contribution by addressing quality in the DevOps context. The core idea of DevOps is to foster close cooperation between the Dev and Ops teams. The reader may also be interested in what the DICE project proposes in this regard.

Assessment Approach

The quality assessment of a DIA can be carried out between the design and code implementation stages. The SimTool [1] helps automate the assessment activities. Concretely, it computes performance metrics (throughput, utilization and service response time) and reliability metrics. SimTool fits within a complete DevOps workflow for DIA development. A typical usage scenario arises when a new Dev cycle starts, the addition of new functionality is planned, and monitored information from Ops is received. The overall system, including the new functionality, needs to satisfy the quality requirements, which were either already defined or updated in the current iteration. Using the monitored information, the quality parameters in the design models, e.g., the actual host demands, are automatically updated.

A Tax Fraud DIA

In the European Union, tax fraud is a huge problem. It causes a fiscal loss estimated to be of the order of 1 trillion euros per year [2]. Netfective Technology has started building a tax fraud DIA, called Big Blu. First, a rapid prototype was built to validate the whole approach through focus groups and internal requirements gathering. Second, an MVP (Minimum Viable Product) was built respecting the 80/20 rule, which means that 80 per cent of the expected results must come from 20 per cent of the effort. This MVP has been developed, deployed and tested on a private Cloud, relying on an agile DevOps approach. The third and last step is to move from the MVP to a first release with a stable and scalable architecture that can be deployed in a production environment. The current version of the MVP is written in Java and Scala, on top of Cassandra and Spark clusters for data management and data processing, respectively. Next, we report an experience in assessing quality [3], using SimTool, during two iterations of the MVP development.

Assessment Experience

In a given iteration, the Ops team reported saturation of a Big Data processing node. In particular, a computational branch was required to complete within 10 minutes, but this limit was largely exceeded. By inspecting the monitored information, the Dev team realized that the issue could be due to two events:

The growth in the number of requests to the system.
The growth of the database.

Using SimTool, we tried to confirm our guess by simulating increases in the arrival rate of requests to the processing node, and also increases in the execution time required by the computational branch. Concretely, we varied the arrival rate from 0.0016 req/s up to system saturation, and the execution time from 2.5 minutes to 5 minutes.
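As a rough cross-check of this kind of sweep, the load offered to a single node can be estimated with the utilization law (arrival rate times execution time). The sketch below is not SimTool and does not reproduce its model: the function names are hypothetical, and the single-node, no-overhead assumption is ours, so it only indicates which corners of the grid are stable and says nothing precise about response times. The grid uses the extreme values discussed in this article (0.005 req/s corresponds to 3 requests every 10 minutes).

```python
# Back-of-the-envelope sweep, NOT the SimTool model.
# Assumption: one processing node, utilization law U = arrival_rate * execution_time.


def offered_load(arrival_rate_per_s: float, execution_time_s: float) -> float:
    """Fraction of one node's capacity demanded by the workload."""
    return arrival_rate_per_s * execution_time_s


def sweep(arrival_rates, execution_times_s):
    """Return (rate, time, load, saturated) for every point of the grid."""
    rows = []
    for rate in arrival_rates:
        for exec_time in execution_times_s:
            load = offered_load(rate, exec_time)
            # A load of 1.0 or more cannot be sustained by a single node.
            rows.append((rate, exec_time, load, load >= 1.0))
    return rows


# Extreme values of the two variables (illustrative grid, only the corners).
ARRIVAL_RATES = [0.0016, 0.005]   # req/s
EXECUTION_TIMES_S = [150, 300]    # 2.5 and 5 minutes

if __name__ == "__main__":
    for rate, exec_time, load, saturated in sweep(ARRIVAL_RATES, EXECUTION_TIMES_S):
        status = "saturated" if saturated else "stable"
        print(f"{rate:.4f} req/s, {exec_time:>3d} s -> load {load:.2f} ({status})")
```

Under these assumptions only the corner where both variables are at their maximum exceeds the capacity of one node; a simulation is still needed to obtain response times and to account for everything this estimate ignores.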

Figure 1(a1) depicts the simulation results. Note that we have drawn a yellow plane at a response time of 600 seconds, to make it easy to check whether the requirement is fulfilled, i.e., whether the response time is below 10 minutes. For the initial configuration (2.5 minutes of execution time and 0.0016 req/s), the system was stable and the response time was 3 minutes and 46 seconds. If only the arrival rate increases, up to 3 requests every 10 minutes, the response time is 8 minutes and 45 seconds. If only the execution time increases, up to 5 minutes, the response time is 10 minutes and 25 seconds. If both variables increase (5 minutes of execution time and 3 requests every 10 minutes), the response time grows without bound.

Figure 1(a2) depicts the utilization of a Big Data processing node with respect to our two variables. We observe that, for an arrival rate of 3 requests every 10 minutes and an execution time of 3.5 minutes, the node is almost saturated, with 97.8% utilization. For execution times above 3.5 minutes, the node is saturated, at 100% utilization.

In the subsequent stage of the new Dev cycle, developers faced two alternatives:

Acquire new nodes in the private Cloud devoted to Big Data processing, so as to parallelize the requests.
Reengineer the fraud detection activity to make it faster; they had already observed that some parts of the code could be refactored.

The second alternative could easily reduce the service time by 40%, although going beyond this reduction would require much more work. We used SimTool again to analyse in depth the first alternative and a combination of both.

Figure 1(b1) depicts the results of the first alternative. It shows that, using 3 nodes, the expected response time is between 6 and 7 minutes. In turn, with 4 nodes, the system will be able to serve more than 6 requests every 10 minutes. Part (c1) also considers the second alternative, refactoring the activity down to 3 minutes, i.e., an improvement of 40%. Then, using 2 nodes, we could offer an expected response time between 4 and 5 minutes. In turn, using 3 nodes, more than 8 requests every 10 minutes could be served. Parts (b2) and (c2) depict the utilization of the nodes for the same alternative solutions as (b1) and (c1), respectively. The decision was to use 3 processing nodes, which should satisfy the response time requirement.
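The capacity statements above can be sanity-checked with simple arithmetic: a pool of n nodes that each need t seconds per request cannot sustain more than n × 600 / t requests every 10 minutes. The sketch below applies this bound to the two execution times discussed (5 minutes, and 3 minutes after refactoring). It is an estimate under our own assumptions (perfect load balancing, no overhead) rather than the SimTool analysis, and the function name is hypothetical.

```python
# Capacity bound only, NOT the SimTool model.
# Assumption: requests are spread perfectly over identical nodes, with no overhead.

WINDOW_S = 600  # 10 minutes, the time window used in the text


def max_requests_per_window(nodes: int, execution_time_s: float) -> float:
    """Upper bound on the requests served every 10 minutes by `nodes` nodes."""
    return nodes * WINDOW_S / execution_time_s


if __name__ == "__main__":
    scenarios = (("original, 5 min", 300), ("refactored, 3 min", 180))
    for label, exec_time in scenarios:
        for nodes in (2, 3, 4):
            cap = max_requests_per_window(nodes, exec_time)
            print(f"{label}: {nodes} nodes -> at most {cap:.1f} requests per 10 min")
```

Under this bound, 4 nodes at 5 minutes allow up to 8 requests every 10 minutes, and 3 nodes at 3 minutes allow up to 10, which is consistent with the "more than 6" and "more than 8" figures read from the simulation. The bound says nothing about response time, which is what the simulation adds.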

Figure 1: Simulation results [3].

José Merseguer, Diego Perez-Palacin and José I. Requeno, Universidad de Zaragoza
Youssef Ridene, Netfective Technology

References

[1] The DICE Consortium. DICE Simulation Tool, Oct. 2017. https://github.com/dice-project/DICE-Simulation.
[2] European Commission, 2017. http://ec.europa.eu/taxation_customs/fight-against-tax-fraud-tax-evasion/a-huge-problem_en.
[3] D. Perez-Palacin, Y. Ridene, and J. Merseguer. Quality Assessment in DevOps: Automated Analysis of a Tax Fraud Detection System. In Companion Proceedings of the 8th ACM/SPEC on International Conference on Performance Engineering, pp. 133-138. L’Aquila, Italy, April 22-26, 2017.
