Recognitions Based on Bibliometric Indexes

  • 2014-2021

    Top Scientists in the Computer Science Area: 15th place in Italy and 1051st place in the world

    In 2022 I was ranked 15th in Italy and 1051st in the world in the eighth edition of the ranking of top scientists in the computer science area. The ranking was published by Research.com, a leading website for computer science research that has been offering credible data on scientific contributions since 2014.

  • 2013-2020

    Top Scientists in the Software Engineering Area: 4th place in the world

    In 2021 I was ranked 4th in the world in a bibliometric study, conducted over an eight-year period (2013-2020), aimed at identifying the most active researchers worldwide in the field of software engineering (details).

  • 2010-2017

    Top Scientists in the Software Engineering Area (Consolidator category): 2nd place in the world

    In 2018 I was ranked 2nd in the world, in the consolidator category, in a bibliometric study, conducted over an eight-year period (2010-2017), aimed at identifying the top 20 early-stage, consolidator, and experienced researchers worldwide in the field of software engineering (details).

Most Influential Paper (MIP) Awards

  • SANER 2024

    Cross-project defect prediction models: L'Union fait la force
    by A. Panichella, R. Oliveto, A. De Lucia

    31st IEEE International Conference on Software Analysis, Evolution and Reengineering

    Existing defect prediction models use product or process metrics and machine learning methods to identify defect-prone source code entities. Different classifiers (e.g., linear regression, logistic regression, or classification trees) have been investigated in the last decade. The results achieved so far are sometimes contrasting and do not show a clear winner. In this paper we present an empirical study aimed at statistically analyzing the equivalence of different defect predictors. We also propose a combined approach, coined as CODEP (COmbined DEfect Predictor), that employs the classifications provided by different machine learning techniques to improve the detection of defect-prone entities. The study was conducted on 10 open source software systems, in the context of cross-project defect prediction, which represents one of the main challenges in the defect prediction field. The statistical analysis of the results indicates that the investigated classifiers are not equivalent and can complement each other. This is also confirmed by the superior prediction accuracy achieved by CODEP when compared to stand-alone defect predictors.
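
    The combination strategy behind CODEP can be illustrated with a small, hedged sketch (Python/scikit-learn): the defect probabilities produced by several base classifiers become the features of a logistic-regression meta-model. The data names (X_train, y_train, X_test) are hypothetical, and the snippet shows the general stacking idea only, not necessarily the exact CODEP pipeline.

      # Hedged sketch of a combined defect predictor (general idea, not the exact CODEP pipeline).
      # X_train, y_train, X_test denote hypothetical metric data and defect labels.
      import numpy as np
      from sklearn.linear_model import LogisticRegression
      from sklearn.naive_bayes import GaussianNB
      from sklearn.tree import DecisionTreeClassifier

      def train_combined_predictor(X_train, y_train):
          # Train heterogeneous base classifiers on the same training data.
          base_models = [LogisticRegression(max_iter=1000),
                         DecisionTreeClassifier(max_depth=5),
                         GaussianNB()]
          for model in base_models:
              model.fit(X_train, y_train)
          # Their defect probabilities become the features of a meta-classifier.
          meta_features = np.column_stack(
              [m.predict_proba(X_train)[:, 1] for m in base_models])
          meta_model = LogisticRegression().fit(meta_features, y_train)
          return base_models, meta_model

      def predict_defect_prone(base_models, meta_model, X_test):
          meta_features = np.column_stack(
              [m.predict_proba(X_test)[:, 1] for m in base_models])
          return meta_model.predict(meta_features)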

  • SCAM 2022

    When does a Refactoring Induce Bugs? An Empirical Study
    by G. Bavota, B. De Carluccio, A. De Lucia, M. Di Penta, R. Oliveto, O. Strollo

    22nd IEEE International Working Conference on Source Code Analysis and Manipulation

    Refactorings are - as defined by Fowler - behavior-preserving source code transformations. Their main purpose is to improve maintainability or comprehensibility, or to reduce the code footprint if needed. In principle, refactorings are defined as simple operations that are "unlikely to go wrong" and introduce faults. In practice, refactoring activities can carry risks, like other changes. This paper reports an empirical study carried out on three Java software systems, namely Apache Ant, Xerces, and ArgoUML, aimed at investigating to what extent refactoring activities induce faults. Specifically, we automatically detect (and then manually validate) 15,008 refactoring operations (of 52 different kinds) using an existing tool (Ref-Finder). Then, we use the SZZ algorithm to determine whether it is likely that refactorings induced a fault. Results indicate that, while some kinds of refactorings are unlikely to be harmful, others, such as refactorings involving hierarchies (e.g., pull up method), tend to induce faults very frequently. This suggests the need for more accurate code inspection or testing activities when such specific refactorings are performed.

  • ICSME 2021

    On integrating orthogonal information retrieval methods to improve traceability recovery
    by M. Gethers, R. Oliveto, D. Poshyvanyk, A. De Lucia

    37th IEEE International Conference on Software Maintenance and Evolution

    Different Information Retrieval (IR) methods have been proposed to recover traceability links among software artifacts. To date, no single method sensibly outperforms the others; however, it has been empirically shown that some methods recover different, yet complementary traceability links. In this paper, we exploit this empirical finding and propose an integrated approach to combine orthogonal IR techniques, which have been statistically shown to produce dissimilar results. Our approach combines the following IR-based methods: Vector Space Model (VSM), probabilistic Jensen and Shannon (JS) model, and Relational Topic Modeling (RTM), which had not been used in the context of traceability link recovery before. An empirical case study conducted on six software systems indicates that the integrated method outperforms stand-alone IR methods, as well as any other combination of non-orthogonal methods, with a statistically significant margin.
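
    As a rough illustration of how orthogonal rankings can be merged, the hedged sketch below normalizes two similarity matrices and combines them with a weighted sum; the weight and the min-max normalization are assumptions made for the example, not the exact combination defined in the paper.

      # Hedged sketch: merging the rankings of two IR methods via a weighted sum of
      # normalized similarity scores (illustrative assumptions, not the paper's formulas).
      import numpy as np

      def min_max_normalize(scores):
          lo, hi = scores.min(), scores.max()
          return (scores - lo) / (hi - lo) if hi > lo else np.zeros_like(scores, dtype=float)

      def combine_similarities(vsm_scores, rtm_scores, weight=0.5):
          """vsm_scores, rtm_scores: source-artifact x target-artifact similarity matrices."""
          return (weight * min_max_normalize(vsm_scores)
                  + (1.0 - weight) * min_max_normalize(rtm_scores))

      # Candidate traceability links are then ranked by the combined score.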

  • ICPC 2020

    On the Equivalence of Information Retrieval Methods for Automated Traceability Link Recovery
    by R. Oliveto, M. Gethers, D. Poshyvanyk, A. De Lucia

    28th International Conference on Program Comprehension

    We present an empirical study to statistically analyze the equivalence of several traceability recovery methods based on Information Retrieval (IR) techniques. The analysis is based on Principal Component Analysis and on the analysis of the overlap of the set of candidate links provided by each method. The studied techniques are the Jensen-Shannon (JS) method, Vector Space Model (VSM), Latent Semantic Indexing (LSI), and Latent Dirichlet Allocation (LDA). The results show that while JS, VSM, and LSI are almost equivalent, LDA is able to capture a dimension unique to the set of techniques which we considered.

ACM SIGSOFT Distinguished Paper Awards

  • MSR 2019

    Data-driven solutions to detect API compatibility issues in Android: an empirical study
    by S. Scalabrino, G. Bavota, M. Linares-Vásquez, M. Lanza, R. Oliveto

    16th International Conference on Mining Software Repositories

    Android apps are inextricably linked to the official Android APIs. Such a strong form of dependency implies that changes introduced in new versions of the Android APIs can severely impact the apps' code, for example because of deprecated or removed APIs. In reaction to those changes, mobile app developers are expected to adapt their code and avoid compatibility issues. To support developers, approaches have been proposed to automatically identify API compatibility issues in Android apps. The state-of-the-art approach, named CiD, is a data-driven solution that learns how to detect those issues by analyzing the changes in the history of Android APIs ("API side" learning). While it can successfully identify compatibility issues, it cannot recommend coding solutions. We devised an alternative data-driven approach, named ACRYL. ACRYL learns from changes implemented in other apps in response to API changes ("client side" learning). This allows it not only to detect compatibility issues, but also to suggest a fix. When empirically comparing the two tools, we found that there is no clear winner, since the two approaches are highly complementary, in that they identify almost disjoint sets of API compatibility issues. Our results point to the future possibility of combining the two approaches, trying to learn detection/fixing rules on both the API and the client side.

  • ASE 2017

    Automatically Assessing Code Understandability: How Far Are We?
    by S. Scalabrino, G. Bavota, C. Vendome, M. Linares-Vasquez, D. Poshyvanyk, R. Oliveto

    32nd IEEE/ACM International Conference on Automated Software Engineering

    Program understanding plays a pivotal role in software maintenance and evolution: a deep understanding of code is the stepping stone for most software-related activities, such as bug fixing or testing. Being able to measure the understandability of a piece of code might help in estimating the effort required for a maintenance activity, in comparing the quality of alternative implementations, or even in predicting bugs. Unfortunately, there are no existing metrics specifically designed to assess the understandability of a given code snippet. In this paper we perform a first step in this direction, by studying the extent to which several types of metrics computed on code, documentation and developers correlate with code understandability. To perform such an investigation we ran a study with 46 participants who were asked to understand eight code snippets each. We collected a total of 324 evaluations aiming at assessing the perceived understandability, the actual level of understanding and the time needed to understand a code snippet. Our results demonstrate that none of the (existing and new) metrics we considered is able to capture code understandability, not even the ones assumed to assess quality attributes strongly related with it, such as code readability and complexity.

  • ICPC 2016

    Improving Code Readability Models with Textual Features
    by S. Scalabrino, M. Linares-Vasquez, D. Poshyvanyk, R. Oliveto

    24th International Conference on Program Comprehension

    Code reading is one of the most frequent activities in software maintenance; before implementing changes, it is necessary to fully understand source code often written by other developers. Thus, readability is a crucial aspect of source code that might significantly influence program comprehension effort. In general, models used to estimate software readability take into account only structural aspects of source code, e.g., line length and the number of comments. However, code is a particular form of text; therefore, a code readability model should not ignore the textual aspects of source code encapsulated in identifiers and comments. In this paper, we propose a set of textual features that could be used to measure code readability. We evaluated the proposed textual features on 600 code snippets manually evaluated (in terms of readability) by 5K+ people. The results show that the proposed features complement classic structural features when predicting readability judgments. Consequently, a code readability model based on a richer set of features, including the ones proposed in this paper, achieves significantly better accuracy compared to all the state-of-the-art readability models.

  • ICSE 2015

    When and Why Your Code Starts to Smell Bad
    by M. Tufano, F. Palomba, G. Bavota, R. Oliveto, M. Di Penta, A. De Lucia, and D. Poshyvanyk

    37th International Conference on Software Engineering

    In past and recent years, the issues related to managing technical debt have received significant attention from researchers in both industry and academia. There are several factors that contribute to technical debt. One of these is represented by bad code smells, i.e., symptoms of poor design and implementation choices. While the repercussions of smells on code quality have been empirically assessed, there is still only anecdotal evidence on when and why bad smells are introduced. To fill this gap, we conducted a large empirical study over the change history of 200 open source projects from different software ecosystems and investigated when bad smells are introduced by developers, and the circumstances and reasons behind their introduction. Our study required the development of a strategy to identify smell-introducing commits, the mining of over 0.5M commits, and the manual analysis of 9,164 of them (i.e., those identified as smell-introducing). Our findings mostly contradict common wisdom stating that smells are introduced during evolutionary tasks. In light of our results, we also call for the development of a new generation of recommendation systems aimed at properly planning smell refactoring activities.

  • ESEC/FSE 2015

    Optimizing Energy Consumption of GUIs in Android Apps: A Multi-objective Approach
    by M. Linares-Vasquez, G. Bavota, C. Bernal-Cardenas, R. Oliveto, M. Di Penta, D. Poshyvanyk

    10th Joint Meeting of the European Software Engineering Conference and the 23rd ACM SIGSOFT Symposium on the Foundations of Software Engineering

    The wide diffusion of mobile devices has motivated research towards optimizing the energy consumption of software systems - including apps - targeting such devices. Besides efforts aimed at dealing with various kinds of energy bugs, the adoption of Organic Light-Emitting Diode (OLED) screens has motivated research towards reducing energy consumption by choosing an appropriate color palette. Whilst past research in this area aimed at optimizing energy while keeping an acceptable level of contrast, this paper proposes an approach, named GEMMA (Gui Energy Multi-objective optiMization for Android apps), for generating color palettes using a multi-objective optimization technique, which produces color solutions optimizing energy consumption and contrast while using colors consistent with the original color palette. An empirical evaluation that we performed on 25 Android apps demonstrated not only significant improvements in terms of the three different objectives, but also confirmed that in most cases users still perceived the chosen colors as attractive. Finally, for several apps we interviewed the original developers, who in some cases expressed the intent to adopt the proposed color palette, whereas in other cases they pointed out directions for future improvements.

  • ASE 2013

    Detecting Bad Smells in Source Code Using Change History Information
    by F. Palomba, G. Bavota, M. Di Penta, R. Oliveto, A. De Lucia, D. Poshyvanyk

    28th IEEE/ACM International Conference on Automated Software Engineering

    Code smells represent symptoms of poor implementation choices. Previous studies found that these smells make source code more difficult to maintain, possibly also increasing its fault-proneness. There are several approaches that identify smells based on code analysis techniques. However, we observe that many code smells are intrinsically characterized by how code elements change over time. Thus, relying solely on structural information may not be sufficient to detect all the smells accurately. We propose an approach to detect five different code smells, namely Divergent Change, Shotgun Surgery, Parallel Inheritance, Blob, and Feature Envy, by exploiting change history information mined from versioning systems. We applied the approach, coined as HIST (Historical Information for Smell deTection), to eight software projects written in Java and, wherever possible, compared it with existing state-of-the-art smell detectors based on source code analysis. The results indicate that HIST's precision ranges between 61% and 80%, and its recall ranges between 61% and 100%. More importantly, the results confirm that HIST is able to identify code smells that cannot be identified through approaches solely based on code analysis.
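
    To give a flavour of the change-history signal such detectors rely on, the hedged sketch below simply counts how often pairs of classes are modified in the same commit; it illustrates the kind of information mined from the versioning system, not HIST's actual per-smell detection rules.

      # Hedged sketch: co-change frequencies mined from the versioning system.
      # 'commits' is a hypothetical input: one set of changed class names per commit.
      from collections import Counter
      from itertools import combinations

      def co_change_counts(commits):
          counts = Counter()
          for changed_classes in commits:
              for pair in combinations(sorted(changed_classes), 2):
                  counts[pair] += 1
          return counts

      # Pairs that co-change in almost every commit touching them may hint at smells
      # such as Shotgun Surgery or Parallel Inheritance.
      example = co_change_counts([{"Order", "OrderDAO"}, {"Order", "OrderDAO", "Invoice"}])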

Best Paper Awards

  • HEALTHINF 2020

    Combining Rhythmic and Morphological ECG Features for Automatic Detection of Atrial Fibrillation
    by G. Laudato, F. Boldi, A. Colavita, G. Rosa, S. Scalabrino, P. Torchitti, A. Lazich, R. Oliveto

    13th International Conference on Health Informatics

    Atrial fibrillation (AF) is the most common type of heart arrhythmia. AF is highly associated with other cardiovascular diseases, such as heart failure and coronary artery disease, and can lead to stroke. Unfortunately, in some cases people with atrial fibrillation have no explicit symptoms and are unaware of their condition until it is discovered during a physical examination. Thus, it is considered a priority to define highly accurate automatic approaches to detect such a pathology in the context of a massive screening. For this reason, in recent years several approaches have been defined to automatically detect AF. These approaches are often based on machine learning techniques and most of them analyse the heart rhythm to make a prediction. Even if AF can be diagnosed by analysing the rhythm, the analysis of the morphology of a heart beat is also important. Indeed, during an AF event the P wave could be absent and fibrillation waves may appear in its place. This means that the presence of arrhythmia alone may not be enough to detect an AF event. Based on the above considerations, we previously presented MORPHYTHM, an approach that uses machine learning to combine rhythm and morphological features to identify AF events. The results we achieved in an empirical evaluation seem promising. In this paper we present an extension of MORPHYTHM, called LOCAL MORPHYTHM, aimed at further improving the detection accuracy of AF events. An empirical evaluation of LOCAL MORPHYTHM has shown significantly better results in the classification process with respect to MORPHYTHM, particularly in terms of true positives and false negatives.

  • SCAM 2012

    When does a Refactoring Induce Bugs? An Empirical Study
    by G. Bavota, B. De Carluccio, A. De Lucia, M. Di Penta, R. Oliveto, O. Strollo

    12th IEEE International Working Conference on Source Code Analysis and Manipulation

    Refactorings are - as defined by Fowler - behavior-preserving source code transformations. Their main purpose is to improve maintainability or comprehensibility, or to reduce the code footprint if needed. In principle, refactorings are defined as simple operations that are "unlikely to go wrong" and introduce faults. In practice, refactoring activities can carry risks, like other changes. This paper reports an empirical study carried out on three Java software systems, namely Apache Ant, Xerces, and ArgoUML, aimed at investigating to what extent refactoring activities induce faults. Specifically, we automatically detect (and then manually validate) 15,008 refactoring operations (of 52 different kinds) using an existing tool (Ref-Finder). Then, we use the SZZ algorithm to determine whether it is likely that refactorings induced a fault. Results indicate that, while some kinds of refactorings are unlikely to be harmful, others, such as refactorings involving hierarchies (e.g., pull up method), tend to induce faults very frequently. This suggests the need for more accurate code inspection or testing activities when such specific refactorings are performed.

  • ICPC 2011

    Improving IR-based Traceability Recovery Using Smoothing Filters
    by A. De Lucia, M. Di Penta, R. Oliveto, A. Panichella, S. Panichella

    19th International Conference on Program Comprehension

    Information Retrieval methods have been largely adopted to identify traceability links based on the textual similarity of software artifacts. However, noise due to word usage in software artifacts might negatively affect the recovery accuracy. We propose the use of smoothing filters to reduce the effect of noise in software artifacts and improve the performance of traceability recovery methods. An empirical evaluation performed on two repositories indicates that the use of a smoothing filter is able to significantly improve the performance of the Vector Space Model and Latent Semantic Indexing. Such a result suggests that, besides being used for traceability recovery, the proposed filter can also improve the performance of various other software engineering approaches based on textual analysis.
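
    A minimal sketch of the filtering idea is given below, under the assumption that smoothing amounts to removing the corpus-average term vector from each artifact before computing similarities; the paper's exact filter may differ.

      # Hedged sketch: subtract the corpus-average TF-IDF vector from every artifact so that
      # terms common to all artifacts weigh less (assumed reading of the smoothing idea).
      import numpy as np
      from sklearn.feature_extraction.text import TfidfVectorizer
      from sklearn.metrics.pairwise import cosine_similarity

      def smoothed_similarities(source_texts, target_texts):
          vectorizer = TfidfVectorizer()
          matrix = vectorizer.fit_transform(source_texts + target_texts).toarray()
          smoothed = np.clip(matrix - matrix.mean(axis=0, keepdims=True), 0.0, None)
          n = len(source_texts)
          # Rank candidate links by the similarity between smoothed source and target vectors.
          return cosine_similarity(smoothed[:n], smoothed[n:])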

  • ICSM ERA 2010

    Physical and Conceptual Identifier Dispersion: Measures and Relation to Fault Proneness
    by V. Arnaoudova, L. Eshkevari, R. Oliveto, Y.-G. Guéhéneuc, G. Antoniol

    26th IEEE International Conference on Software Maintenance - ERA Track

    Poorly-chosen identifiers have been reported in the literature as misleading and as increasing the program comprehension effort. Identifiers are composed of terms, which can be dictionary words, acronyms, contractions, or simple strings. We conjecture that the use of identical terms in different contexts may increase the risk of faults. We investigate our conjecture using a measure combining term entropy and term context coverage to study whether certain terms increase the odds ratios of methods to be fault-prone. Entropy measures the physical dispersion of terms in a program: the higher the entropy, the more scattered the terms are across the program. Context coverage measures the conceptual dispersion of terms: the higher their context coverage, the more unrelated the methods using them are. We compute term entropy and context coverage of terms extracted from identifiers in Rhino 1.4R3 and ArgoUML 0.16. We show statistically that methods containing terms with high entropy and context coverage are more fault-prone than others.
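
    For illustration, term entropy can be instantiated as the Shannon entropy of a term's occurrences over the methods that use it; the formula below is an assumed sketch (LaTeX), as the exact formulation adopted in the paper may differ. Context coverage would conversely capture how textually unrelated the methods using the term are (e.g., one minus their average pairwise similarity).

      % Assumed instantiation of term entropy: Shannon entropy over the set M_t of methods
      % containing term t, where f(t,m) is the frequency of t in method m.
      H(t) = -\sum_{m \in M_t} p(t,m)\,\log p(t,m),
      \qquad
      p(t,m) = \frac{f(t,m)}{\sum_{m' \in M_t} f(t,m')}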