1. CORRECT: CODE REVIEWER
RECOMMENDATION IN GITHUB BASED ON
CROSS-PROJECT & TECHNOLOGY
EXPERIENCE
Mohammad Masudur Rahman, Chanchal K. Roy and
Jason A. Collins*
Department of Computer Science
University of Saskatchewan, Canada, Google Inc., USA*
38th International Conference on Software Engineering
(ICSE 2016), Austin, TX, USA
3. CODE REVIEW
Formal inspection
Peer code review
Modern code review
Code review is a systematic
examination of source code for
detecting bugs or defects and
coding rule violations.
Early bug detection
Prevent coding rule violations
Enhance developer skills
5. EXISTING LITERATURE
Line Change History (LCH)
  ReviewBot (Balachandran, ICSE 2013)
File Path Similarity (FPS)
  RevFinder (Thongtanunam et al., SANER 2015)
  FPS (Thongtanunam et al., CHASE 2014)
  Tie (Xia et al., ICSME 2015)
Code Review Content and Comments
  Tie (Xia et al., ICSME 2015)
  SNA (Yu et al., ICSME 2014)
Issues & Limitations
They mine a developer's contributions from
within a single project only.
Library & Technology Similarity
6. OUTLINE OF THE TALK
Vendasta codebase
Exploratory study (3 research questions)
CORRECT
Evaluation using Vendasta codebase
Evaluation using open source projects
Comparative study
Conclusion
7. EXPLORATORY STUDY (3 RQS)
1: How frequently do the commercial software
projects reuse external libraries from within the
codebase?
2: Does the experience of a developer with such
libraries matter in code reviewer selection by other
developers?
3: How frequently do the commercial projects adopt
specialized technologies (e.g., taskqueue,
mapreduce, urlfetch)?
8. DATASET: EXPLORATORY STUDY
Each project has at least 750 closed pull requests.
Each library is used at least 10 times on average.
Each technology is used at least 5 times on average.
10 utility libraries (Vendasta)
10 commercial projects (Vendasta)
10 Google App Engine technologies
9. LIBRARY USES IN COMMERCIAL PROJECTS
(ANSWERED: EXP-RQ1 )
Empirical library usage frequency in 10 projects
Mostly used: vtest, vauth, and vapi
Least used: vlogs, vmonitor
10. LIBRARY USES IN
PULL REQUESTS (ANSWERED: EXP-RQ2)
30%-70% of pull requests used at least one of the 10 libraries
87%-100% of library authors recommended as code reviewers in
the projects using those libraries
Library experience really matters!
[Charts: % of pull requests using the selected libraries; % of library authors recommended as code reviewers]
11. TECHNOLOGY USES
IN PROJECTS (ANSWERED: EXP-RQ3)
Empirical technology usage frequency in top 10
commercial projects
Champion technology: mapreduce
12. TECHNOLOGY USES IN PULL REQUESTS
(ANSWERED: EXP-RQ3)
20%-60% of the pull requests used at least one of the
10 specialized technologies.
Mostly used in: ARM, CS and VBC
13. SUMMARY OF EXPLORATORY FINDINGS
About 50% of the pull requests used one or more of the
selected libraries. (Exp-RQ1)
About 98% of the library authors were later
recommended as pull request reviewers. (Exp-RQ2)
About 35% of the pull requests used one or more
specialized technologies. (Exp-RQ3)
Library experience and Specialized technology
experience really matter in code reviewer
selection/recommendation
14. CORRECT: CODE REVIEWER RECOMMENDATION
IN GITHUB USING CROSS-PROJECT &
TECHNOLOGY EXPERIENCE
17. EVALUATION OF CORRECT
Two evaluations: (1) the Vendasta codebase, and (2) open source software projects
1: Are library experience and technology experience useful
proxies for code review skills?
2: Does CORRECT outperform the baseline technique for
reviewer recommendation?
3: Does CORRECT perform equally/comparably for both
private and public codebases?
4: Does CORRECT show bias to any of the development
frameworks?
18. EXPERIMENTAL DATASET
Sliding window of 30 past requests for learning.
Metrics: Top-K Accuracy, Mean Precision (MP), Mean
Recall (MR), and Mean Reciprocal Rank (MRR).
Vendasta:     10 Python projects, 13,081 pull requests
Open source:  2 Python, 2 Java & 2 Ruby projects, 4,034 pull requests
Gold set:     code reviewers of the closed pull requests
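The sliding-window protocol on this slide can be sketched as follows. This is a minimal illustration, not the authors' implementation; `pull_requests` (a chronologically ordered list of records with a `reviewers` field) and `recommend` are hypothetical stand-ins.

```python
# Sketch of the sliding-window evaluation protocol: for each new pull
# request, learn from the 30 most recent closed requests and check
# whether a gold (actual) reviewer appears in the top-k recommendation.

WINDOW = 30  # number of past closed pull requests used for learning

def evaluate(pull_requests, recommend, k=5):
    """Top-k accuracy over a chronologically ordered list of pull requests."""
    hits = total = 0
    for i in range(WINDOW, len(pull_requests)):
        window = pull_requests[i - WINDOW:i]   # the 30 most recent closed PRs
        target = pull_requests[i]
        ranked = recommend(window, target)     # ranked reviewer candidates
        gold = set(target["reviewers"])        # actual reviewers form the gold set
        hits += bool(gold & set(ranked[:k]))   # hit: any gold reviewer in top-k
        total += 1
    return hits / total if total else 0.0
```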
19. LIBRARY EXPERIENCE & TECHNOLOGY
EXPERIENCE (ANSWERED: RQ1)
Metric    Library Similarity   Technology Similarity   Combined Similarity
          Top-3     Top-5      Top-3     Top-5         Top-3     Top-5
Accuracy  83.57%    92.02%     82.18%    91.83%        83.75%    92.15%
MRR       0.66      0.67       0.62      0.64          0.65      0.67
MP        65.93%    85.28%     62.99%    83.93%        65.98%    85.93%
MR        58.34%    80.77%     55.77%    79.50%        58.43%    81.39%
[ MP = Mean Precision, MR = Mean Recall, MRR = Mean Reciprocal Rank ]
Both library experience and technology experience are
found to be good proxies, each providing over 90% Top-5 accuracy.
Combined experience provides the maximum performance:
92.15% recommendation accuracy with 85.93% precision and
81.39% recall.
Evaluation results align with exploratory study findings.
20. COMPARATIVE STUDY FINDINGS (ANSWERED:
RQ2)
CORRECT performs better than the competing technique in all
metrics (p-value = 0.003 < 0.05 for Top-5 accuracy).
It performs better both on average and on individual projects.
RevFinder measures PR similarity using source file name and
directory matching.
Metric    RevFinder [18]   CORRECT
          (Top-5)          (Top-5)
Accuracy  80.72%           92.15%
MRR       0.65             0.67
MP        77.24%           85.93%
MR        73.27%           81.39%
[ MP = Mean Precision, MR = Mean Recall,
MRR = Mean Reciprocal Rank ]
21. COMPARISON ON OPEN SOURCE PROJECTS
(ANSWERED: RQ3)
In OSS projects, CORRECT also performs better than the
baseline technique:
85.20% accuracy with 84.76% precision and 78.73% recall,
not significantly different from the earlier results
(p-value = 0.239 > 0.05 for precision).
Results for private and public codebases are quite close.
Metric    RevFinder [18]   CORRECT (OSS)   CORRECT (VA)
          (Top-5)          (Top-5)         (Top-5)
Accuracy  62.90%           85.20%          92.15%
MRR       0.55             0.69            0.67
MP        62.57%           84.76%          85.93%
MR        58.63%           78.73%          81.39%
[ MP = Mean Precision, MR = Mean Recall, MRR = Mean Reciprocal Rank ]
22. COMPARISON ON DIFFERENT PLATFORMS
(ANSWERED: RQ4)
Metric    Python                     Java                         Ruby
          Beets    St2     Avg.      OkHttp   Orientdb  Avg.      Rubocop  Vagrant  Avg.
Accuracy  93.06%   79.20%  86.13%    88.77%   81.27%    85.02%    89.53%   79.38%   84.46%
MRR       0.82     0.49    0.66      0.61     0.76      0.69      0.76     0.71     0.74
MP        93.06%   77.85%  85.46%    88.69%   81.27%    84.98%    88.49%   79.17%   83.83%
MR        87.36%   74.54%  80.95%    85.33%   76.27%    80.80%    81.49%   67.36%   74.43%
[ MP = Mean Precision, MR = Mean Recall, MRR = Mean Reciprocal Rank ]
In OSS projects, results for the different platforms are
surprisingly close, except for recall.
Accuracy and precision are close to 85% on average.
CORRECT does NOT show bias toward any particular platform.
24. THANK YOU!! QUESTIONS?
Masud Rahman (masud.rahman@usask.ca)
CORRECT site (http://www.usask.ca/~masud.rahman/correct)
Acknowledgement: This work is supported by NSERC
25. THREATS TO VALIDITY
Threats to Internal Validity
Skewed dataset: each of the 10 selected projects is
medium sized (about 1.1K PRs), except CS.
Threats to External Validity
Limited OSS dataset: only 6 OSS projects were considered,
which is not sufficient for generalization.
Issue of heavy PRs: PRs containing hundreds of files can
make the recommendation slower.
Threats to Construct Validity
Top-K Accuracy: does the metric represent the effectiveness
of the technique? It is widely used in the relevant literature
(Thongtanunam et al., SANER 2015)
Editor's Notes
Hello everyone.
My name is Mohammad Masudur Rahman
I am a 2nd year PhD student from University of Saskatchewan, Canada.
Today, I am going to talk about code reviewer recommendation based on cross-project and technology experience.
I work with Dr. Chanchal Roy. The other co-author of the paper is Jason Collins from Google, USA.
When I searched in the web, I found this.
Obviously, code review is not a very good experience all the time
If you do not select appropriate reviewers, the review could be disastrous.
One way to handle the frustration about the code review is choosing the appropriate reviewers for your code.
We already had a talk on code review.
Anyway, just to recap, code review is a systematic examination of source code
that identifies defects and coding standard violations.
It helps in early bug detection—thus reduces cost.
It also ensures code quality by maintaining the coding standards.
Code review has also evolved. First, it was formal inspection which was time-consuming, slow and costly.
Then came a less formal code review—peer code review.
Now we do tool assisted code review—also called modern code review.
This is an example of code review at GitHub. Once a developer submits a pull request, a way of submitting changes at GitHub,
the core developers/reviewers can review the changes and provide their feedback like this.
Our goal in this research is to identify appropriate code reviewers for such a pull request.
Identifying such code reviewers is very important, especially for novice developers who do not know the skill sets of their fellow developers.
It is also very essential for distributed development where the developers rarely meet face to face.
Besides, an earlier study suggests that without appropriate code reviewers, the whole change submission could be delayed by 12 days on average.
However, identifying such reviewers is challenging since the skill is not obvious, and it would require massive mining of the revision history.
Earlier studies analyzed the line change history of source code, the file path similarity of source files, and review comments.
In short, they mostly considered the work experience of a candidate code reviewer within a single project only.
However, some skills span across multiple projects such as working experience with specific API libraries or specialized technologies.
Also, in an industrial setting, a developer's contributions are scattered throughout different projects within the company codebase.
We thus consider the external libraries and APIs included in the changed code to suggest more appropriate code reviewers.
This is the outline of my today’s talk.
We collect commercial projects and libraries from Vendasta codebase, a medium sized Canadian software company.
Then we ask 3 research questions and conduct an exploratory study to answer those questions.
Based on those findings, we then propose our recommendation technique—CORRECT.
Then, in the experiments, we evaluated on commercial projects, compared with the state-of-the-art, and also evaluated on open source projects.
Finally, we conclude the talk.
We ask these three research questions.
In a commercial codebase, there are two types of projects: customer projects and utility projects. The utility projects are also called libraries.
We ask.
How frequently do the commercial software projects reuse external libraries in their code?
Does working experience on such libraries matter in code reviewer selection? That means does a reviewer with such experience get preference over the others?
Does working experience with specialized technologies such as mapreduce, taskqueue matter in code reviewer selection?
This is the connectivity graph of core projects and internal libraries from the Vendasta codebase.
We see the graph is pretty much connected, that means most of the libraries are used by most of the projects.
For the study, we chose 10 projects and 10 internal libraries, selected under certain restrictions.
Each project should have at least 750 closed pull requests, which means they are pretty big and, most importantly, quite active.
Each internal library should be used at least 10 times on average by each of those projects.
Each specialized technology should be used at least 5 times on average by each of those projects.
We consider the Google App Engine libraries as the specialized technologies. We consider 10 of them.
This is the usage frequency of the selected libraries in the 10 projects we selected for the study.
We take the latest snapshot of each project, analyze its source files, and look for imported libraries using an AST parser.
We try to find out those 10 libraries mostly, and this is the box plot of their frequencies.
We can see that vtest, vauth and vapi are the most used, which kind of makes sense, especially for vtest and vauth, since they likely provide generic testing and authentication support.
However, vtest has a large variance, that means some projects used it extensively whereas the others didn’t use it at all.
The least used libraries are vlogs and vmonitor.
So, these are the empirical frequencies.
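The import analysis described above can be approximated with Python's standard `ast` module. This is an illustrative sketch, not the authors' code; the library names in the example are from the study, but `imported_libraries` is a hypothetical helper.

```python
import ast

def imported_libraries(source):
    """Collect the top-level module names imported by a Python source file."""
    libs = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            libs.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            libs.add(node.module.split(".")[0])
    return libs

# Counting such imports across a project snapshot yields the usage frequencies.
code = "import vauth\nfrom vapi.client import Client\n"
print(sorted(imported_libraries(code)))  # ['vapi', 'vauth']
```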
We investigated the ratio of pull requests that used any of the selected libraries.
We note that 30%-70% of all pull requests did that in different projects.
We also investigated the percentage of the library authors who are later recommended as code reviewers for the projects referring to that library.
We considered a developer as library author if he/she authored at least one pull request of the library.
We note that almost 100% of the authors are later recommended.
This is a very interesting finding, suggesting that library experience really matters.
We also calculated the empirical frequency of the ten specialized technologies in the selected Vendasta projects
And this is the box plot.
We can see that mapreduce is the champion technology here, and the rest are close competitors.
In case of the pull requests, 20%-60% pull requests used at least one of ten specialized technologies
Mostly used by ARM, CS and VBC.
So, specialized technologies are also used in our selected projects quite significantly.
So, here are the empirical findings from the exploratory studies we conducted.
They suggest that library experience and specialized technology experience really matter.
These are new findings, and we exploit them to develop the recommendation algorithm later.
Based on those exploratory findings, we propose CORRECT: Code Reviewer Recommendation Based on Cross-Project and Technology Experience.
This is our recommendation algorithm.
Once a new pull request R3 is created, we analyze its commits, then its source files, and look for the libraries referred to and the specialized technologies used. Thus, we get a library token list and a technology token list.
We combine both lists, and the combined list can be considered a summary of the libraries and technologies of the new pull request.
Now, we consider the 10 most recent closed pull requests and collect their library and technology tokens.
It should be noted that the past requests contain their code reviewers.
Now, we estimate the similarity between the new request and each of the past requests, using the cosine similarity between their token lists.
We add that score to the corresponding code reviewers.
This way, we finally get a list of reviewers whose scores accumulate over the different past reviews.
Then they are ranked, and the top reviewers are recommended.
Thus, we use the pull request similarity score to estimate the relevant expertise of code reviewers.
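The scoring scheme just described can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: each past request record carries its library/technology `tokens` and its recorded `reviewers`, and similarity is the cosine between token frequency vectors; the sample data is hypothetical.

```python
from collections import Counter
from math import sqrt

def cosine(a, b):
    """Cosine similarity between two token frequency vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def recommend(past_requests, new_tokens, top_k=5):
    """Accumulate each reviewer's similarity scores over past requests and rank."""
    new_vec = Counter(new_tokens)
    scores = Counter()
    for pr in past_requests:
        sim = cosine(Counter(pr["tokens"]), new_vec)
        for reviewer in pr["reviewers"]:
            scores[reviewer] += sim        # add this request's score to its reviewers
    return [name for name, _ in scores.most_common(top_k)]

# Hypothetical past requests: alice reviewed vauth/mapreduce code, bob vlogs code.
past = [
    {"tokens": ["vauth", "mapreduce"], "reviewers": ["alice"]},
    {"tokens": ["vlogs"], "reviewers": ["bob"]},
]
print(recommend(past, ["vauth", "taskqueue"]))  # alice ranks first
```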
Now, to be technically specific
The state-of-the-art considers two pull requests relevant/similar if they share source code files or directories.
On the other hand, we suggest that two pull requests are relevant/similar if they share the same external libraries and specialized technologies.
That’s the major difference in methodology and our core technical contribution.
We performed two evaluations: one with the Vendasta codebase, and the other with an open source codebase.
From these experiments, we try to answer four research questions.
Are library experience and technology experience useful proxies for code review skills?
Can our technique outperform the state-of-the-art technique from the literature?
Does it perform equally for closed source and open source projects?
Does it show any bias to any particular platform?
We conducted experiments using 10 projects from the Vendasta codebase and 6 projects from the open source domain.
From Vendasta, we collected 13K pull requests, and from open source, we collected 4K pull requests.
Gold reviewers are collected from the corresponding pull requests.
The Vendasta projects are Python-based, whereas the OSS projects are written in Python, Java and Ruby.
We consider four performance metrics: accuracy, precision, recall, and reciprocal rank.
For accuracy, if the recommendation contains at least one gold reviewer, we consider the recommendation accurate.
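These per-request metrics can be made concrete with a small sketch. It illustrates the standard definitions, not the authors' evaluation code; `recommended` is the ranked recommendation list and `gold` the actual reviewers of the request.

```python
def metrics(recommended, gold, k=5):
    """Top-k hit, reciprocal rank, precision and recall for one pull request."""
    top = recommended[:k]
    hit = any(r in gold for r in top)          # Top-k accuracy counts this as a hit
    rr = 0.0
    for rank, r in enumerate(top, start=1):    # reciprocal rank of first gold reviewer
        if r in gold:
            rr = 1.0 / rank
            break
    tp = len(set(top) & set(gold))             # correctly recommended reviewers
    precision = tp / len(top) if top else 0.0
    recall = tp / len(gold) if gold else 0.0
    return hit, rr, precision, recall

hit, rr, p, r = metrics(["alice", "bob", "carol"], {"bob", "dave"}, k=3)
print(hit, rr, round(p, 2), round(r, 2))  # True 0.5 0.33 0.5
```

Averaging these values over all evaluated pull requests gives Top-K Accuracy, MRR, MP and MR, respectively.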
This is how we answer the first RQ.
We see that both library similarity and technology similarity are pretty good proxies for code review skills.
Each of them provides over 90% top-5 accuracy.
However, when we combine them, we get the maximum: 92% top-5 accuracy.
The precision and recall are also greater than 80% which is highly promising according to relevant literature.
We then compare with the state-of-the-art, RevFinder.
We found that our performance is significantly better: we get a p-value of 0.003 for top-5 accuracy with the Mann-Whitney U test.
The median accuracy is 95%, and the median precision and median recall are between 85% and 90%.
On individual projects, our technique also outperformed the state-of-the-art.
We also experimented using 6 open source projects and found 85% top-5 accuracy.
Precision and recall are not significantly different from those with the Vendasta projects;
for example, with precision, we get a p-value of 0.239, which is greater than 0.05.
This slide shows how CORRECT performed with projects from 3 programming platforms: Python, Java and Ruby.
We also find quite similar performance for each of the platforms which is interesting.
This shows that our findings with commercial projects are quite generalizable.
Now to summarize
Code review could be unpleasant or unproductive without appropriate code reviewers.
We first motivated our technique using an exploratory study, which suggested that
library experience and specialized technology experience really matter for code reviewer selection.
Then we proposed our technique, CORRECT, which learns from past review history and then recommends reviewers.
We experimented using both commercial and open source projects, and compared with the state-of-the-art.
The results clearly demonstrate the high potential of our technique.
That’s all I have to say today.
Thanks for your time. Questions!!
There are a few threats to the validity of our findings.
-- The dataset from the VA codebase is a bit skewed: most of the projects are medium sized, and only one project is big.
-- Also, the number of projects considered from the open source domain is limited.
-- Also, the technique could be slower for big pull requests.