\section{Methodology}
\label{Methodology}
In this work we analyse the structure of a software system through its associated network.
First, to build the network of a software system we parse its source code, retrieved
from the corresponding Software Configuration Management (SCM) repository.
During this procedure, we map classes to network nodes and the relationships
between classes (inheritance, composition, etc.) to network edges.
We take the number of defects (bugs) of a software system as the main indicator of its quality, so
we collected bug data for the system by mining its Bug Tracking System (BTS).
To associate each bug with its corresponding classes, we mined the commits in the SCM repository to determine
which classes each bug fix touched. % is correctly associated to a bug.
The result is a network in which each node is labelled % annotated
with the number of bugs of its associated class.
% Specifically we are interested in extracting the community structure of a software system in order
% to figure out its modular organization. Moreover, we are interested in computing the modularity Q associated
% to a community structure \ref{}, the number of communities, and the clustering coefficient.
% In order to compute the metrics related to the community structure, we
% need to build the networks to associate to the software systems. This is done
% by parsing the source code retrieved from Software Configuration Management
% (SCM) repositories, in order to extract the various relationships among classes
% and files.
% These relationships could be inheritance, composition, dependencies,
% aggregation, association and so on. We considered Java classes as nodes of
% the software network, while we considered the relationships among classes as
% network edges.
% Once we retrieved the networks, we collected software issues
% by mining bug repositories, in order to associate to each node in the network
% the corresponding defects. Finally we analyzed the community structure of the
% software networks, computing different community metrics and some software
% metrics.
We collected the source code of 5 releases of Eclipse and analysed them; their main features are
presented in Table~\ref{tab:Eclipse}.
% We collected the source code of NetBeans and Eclipse from the CVS repository.
% We analyzed 6 releases of NetBeans and 5 releases of Eclipse. In Table \ref{tab:Eclipse}
% we report their main features.
\begin{table}[h]
\begin{center}
% \scalebox{0.9}
% {
% \begin{tabular}{|l|c|c|c|c|c|c|}
% \hline
% Release & NB 3.2 & NB 3.2.1 & NB 3.3.0 & NB 3.4 & NB 4.0 & NB 6.0.1\\ \hline
% Size & 4333 & 4348 & 5678 & 7520 & 11866 & 34591 \\
%
% Sub-Projects n.& 38 & 38 & 39 & 42 & 41 & 56 \\
%
% N. of defects & 14948 & 15043 & 19218 & 21529 & 26592 & 73230 \\ \hline
%
% \end{tabular}
% }
\scalebox{0.9}
{
\begin{tabular}{|l|c|c|c|c|c|}
\hline
Release & Eclipse 2.1 & Eclipse 3.0 & Eclipse 3.1 & Eclipse 3.2 & Eclipse 3.3 \\ \hline
Size (n. of classes) & 8257 & 11406 & 13413 & 16013 & 17517 \\
N. of sub-projects & 49 & 66 & 70 & 86 & 104 \\
N. of defects & 47788 & 59804 & 69900 & 80149 & 95337 \\ \hline
\end{tabular}
}
\end{center}
\caption{Main features of the analysed releases of Eclipse (EC): size (number of classes),
number of sub-projects (sub-networks), and total number of defects.}
\label{tab:Eclipse}
\end{table}
Each release is organised into almost independent sub-projects. Summing over the five releases
in Table~\ref{tab:Eclipse}, the total number of sub-projects analysed amounts to 375, with 66606 nodes (classes)
and 352978 defects.% 170623
We computed the community structure using the algorithm devised by Clauset et al.~\cite{Clauset:2004}.
This is an agglomerative clustering algorithm that performs a greedy optimization of the modularity $Q$~\cite{Newman:2004}.
For each network we obtained the number of communities into which it is partitioned, the corresponding value of $Q$,
and the nodes assigned to each community.
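For reference, and assuming the standard definition of modularity for an undirected network (the symbols below are introduced here for illustration), $Q$ can be written as
\[
Q = \frac{1}{2m} \sum_{v,w} \left[ A_{vw} - \frac{k_v k_w}{2m} \right] \delta(c_v, c_w),
\]
where $A_{vw}$ is the adjacency matrix, $k_v$ is the degree of node $v$, $m$ is the total number of edges, $c_v$ is the community of node $v$, and $\delta(c_v, c_w)$ equals $1$ if $v$ and $w$ belong to the same community and $0$ otherwise.
Under this definition, the greedy procedure starts with every node in its own community and repeatedly merges the pair of communities whose union yields the largest increase (or smallest decrease) in $Q$; the partition with the highest $Q$ found along the way is retained.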
We computed the clustering coefficient using the implementation included in the igraph package~\cite{igraph}
for R~\cite{R}.
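As a reminder, and assuming the usual local definition (the text above does not specify whether the local or the global variant was used), the clustering coefficient of a node $i$ with degree $k_i \geq 2$ is
\[
C_i = \frac{2\, t_i}{k_i (k_i - 1)},
\]
where $t_i$ is the number of edges among the neighbours of $i$; the symbols $C_i$, $t_i$ and $k_i$ are introduced here for illustration.
A network-level value is then typically obtained by averaging $C_i$ over the nodes, whereas the global variant is the ratio of three times the number of triangles to the number of connected triples.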
To study the evolution of the system we proceeded as follows. We first carried out the
analysis on each release separately, and then on sets of releases cumulated in chronological order.
Specifically, for the 5 releases in our dataset, we
first cumulated the first and the second releases, then added the third release
to this set, and so on.
In this way we were able to make predictions about the next release from the releases cumulated so far.
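In compact form, and using notation that is ours rather than part of the original procedure, let $R_k$ denote the data of the $k$-th release in chronological order ($k = 1, \dots, 5$). The cumulated sets can then be written as
\[
S_k = \bigcup_{j=1}^{k} R_j, \qquad k = 2, \dots, 5,
\]
and, under this reading, $S_k$ is the set used to make predictions about release $R_{k+1}$ for $k = 2, 3, 4$.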