CORRECT: CODE REVIEWER
RECOMMENDATION IN GITHUB BASED ON
CROSS-PROJECT & TECHNOLOGY
EXPERIENCE
Mohammad Masudur Rahman, Chanchal K. Roy and
Jason A. Collins*
Department of Computer Science
University of Saskatchewan, Canada, Google Inc., USA*
38th International Conference on Software Engineering
(ICSE 2016), Austin, TX, USA
CODE REVIEW
2
Code review could be unpleasant!!
CODE REVIEW
Formal inspection
Peer code review
Modern code review
Code review is a systematic
examination of source code for
detecting bugs or defects and
coding rule violations.
3
Early bug detection
Prevent coding rule violations
Enhance developer skills
4
 Appropriate reviewer selection is important FOR
Novice developers
Distributed software development
Without appropriate reviewers, reviews are delayed by 12 days on average
(Thongtanunam et al, SANER 2015)
EXISTING LITERATURE
 Line Change History (LCH)
 ReviewBot (Balachandran, ICSE 2013)
 File Path Similarity (FPS)
 RevFinder (Thongtanunam et al, SANER 2015)
 FPS (Thongtanunam et al, CHASE 2014)
 Tie (Xia et al, ICSME 2015)
 Code Review Content and Comments
 Tie (Xia et al, ICSME 2015)
 SNA (Yu et al, ICSME 2014)
5
 Issues & Limitations
 Existing techniques mine developers’ contributions from
within a single project only.
 Our focus: Library & Technology Similarity
OUTLINE OF THE TALK
6
Vendasta codebase
CORRECT
Evaluation using
Vendasta codebase
Evaluation using
Open Source Projects
Conclusion
Comparative
study
Exploratory study (3 research questions)
EXPLORATORY STUDY (3 RQS)
1: How frequently do the commercial software
projects reuse external libraries from within the
codebase?
2: Does the experience of a developer with such
libraries matter in code reviewer selection by other
developers?
3: How frequently do the commercial projects adopt
specialized technologies (e.g., taskqueue,
mapreduce, urlfetch)?
7
DATASET: EXPLORATORY STUDY
8
 Each project has at least 750 closed pull requests.
 Each library is used at least 10 times on average.
 Each technology is used at least 5 times on average.
10 utility libraries
(Vendasta)
10 commercial projects
(Vendasta)
10 Google App Engine
Technologies
LIBRARY USES IN COMMERCIAL PROJECTS
(ANSWERED: EXP-RQ1)
 Empirical library usage frequency in 10 projects
 Mostly used: vtest, vauth, and vapi
 Least used: vlogs, vmonitor
9
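The speaker notes for this slide mention that library usage was measured by taking the latest snapshot of each project, analyzing its source files, and looking for imported libraries with an AST parser. Below is a minimal, illustrative Python sketch of that kind of import counting using only the standard library; the helper names and the project path are assumptions, not the authors' actual tooling.

# Illustrative sketch (not the authors' tool): count imported libraries
# in the Python files of a project snapshot using the standard ast module.
import ast
from collections import Counter
from pathlib import Path

def imported_libraries(source):
    # Return a Counter of top-level module names imported by one file.
    counts = Counter()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            for alias in node.names:
                counts[alias.name.split(".")[0]] += 1
        elif isinstance(node, ast.ImportFrom) and node.module:
            counts[node.module.split(".")[0]] += 1
    return counts

def library_usage(project_dir):
    # Aggregate import counts over all .py files of a project snapshot.
    total = Counter()
    for path in Path(project_dir).rglob("*.py"):
        try:
            total += imported_libraries(path.read_text(encoding="utf-8"))
        except (SyntaxError, UnicodeDecodeError):
            continue  # skip files that do not parse cleanly
    return total

# Example (hypothetical path): library_usage("path/to/project").most_common(10)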
LIBRARY USES IN
PULL REQUESTS (ANSWERED: EXP-RQ2)
 30%-70% of pull requests used at least one of the 10 libraries
 87%-100% of library authors recommended as code reviewers in
the projects using those libraries
 Library experience really matters!
10
[Charts: % of PRs using the selected libraries; % of library authors recommended as code reviewers]
TECHNOLOGY USES
IN PROJECTS (ANSWERED: EXP-RQ3)
 Empirical technology usage frequency in top 10
commercial projects
 Champion technology: mapreduce
11
TECHNOLOGY USES IN PULL REQUESTS
(ANSWERED: EXP-RQ3)
 20%-60% of the pull requests used at least one of the
10 specialized technologies.
 Mostly used in: ARM, CS and VBC
12
SUMMARY OF EXPLORATORY FINDINGS
13
About 50% of the pull requests used one or more of the
selected libraries. (Exp-RQ1)
About 98% of the library authors were later
recommended as pull request reviewers. (Exp-RQ2)
About 35% of the pull requests used one or more
specialized technologies. (Exp-RQ3)
Library experience and specialized technology
experience really matter in code reviewer
selection and recommendation.
CORRECT: CODE REVIEWER RECOMMENDATION
IN GITHUB USING CROSS-PROJECT &
TECHNOLOGY EXPERIENCE
14
CORRECT: CODE REVIEWER
RECOMMENDATION
15
[Diagram: a new pull request is compared against previously reviewed pull requests (reviewed by R1, R2, R3) using review similarity; each similarity score is credited to the reviewers of the matching past requests.]
OUR CONTRIBUTIONS
16
State-of-the-art (Thongtanunam et al, SANER 2015): two pull requests are similar IF they share source files or directories.
Our proposed technique, CORRECT: two pull requests are similar IF they share external libraries and specialized technologies.
[Diagram legend: new PR, reviewed PR, source file, external library & specialized technology]
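To make the contribution above concrete, here is a minimal Python sketch of the similarity-then-accumulate idea that the slides and speaker notes describe: the library and technology tokens of the new pull request are compared, via cosine similarity, against those of recently reviewed pull requests, and each similarity score is credited to the reviewers of that past request. The function names, token format, and toy data are illustrative assumptions, not the authors' implementation.

# Minimal sketch of CORRECT-style reviewer scoring (illustrative, not the authors' code).
from collections import Counter
from math import sqrt

def cosine(a, b):
    # Cosine similarity between two token multisets represented as Counters.
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def recommend_reviewers(new_pr_tokens, past_prs, top_k=5):
    # new_pr_tokens: library/technology tokens of the new pull request.
    # past_prs: (tokens, reviewers) pairs for recently closed pull requests,
    #           e.g. a sliding window of the most recent ones.
    new_vec = Counter(new_pr_tokens)
    scores = Counter()
    for tokens, reviewers in past_prs:
        sim = cosine(new_vec, Counter(tokens))
        for reviewer in reviewers:          # credit every reviewer of that past PR
            scores[reviewer] += sim
    return [r for r, _ in scores.most_common(top_k)]

# Toy usage with made-up tokens and reviewers:
past = [
    (["vauth", "vapi", "mapreduce"], ["alice"]),
    (["vtest", "taskqueue"], ["bob", "carol"]),
]
print(recommend_reviewers(["vauth", "mapreduce", "urlfetch"], past, top_k=2))

The ranking step is deliberately simple: reviewers are ordered by their accumulated similarity, which is how experience with the same libraries and technologies ends up driving the recommendation.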
EVALUATION OF CORRECT
 Two evaluations: (1) the Vendasta codebase, and (2)
open source software projects
17
1: Are library experience and technology experience useful
proxies for code review skills?
2: Does CORRECT outperform the baseline technique for
reviewer recommendation?
3: Does CORRECT perform comparably for both
private and public codebases?
4: Does CORRECT show bias to any of the development
frameworks?
EXPERIMENTAL DATASET
 Sliding window of 30 past pull requests for learning.
 Metrics: Top-K Accuracy, Mean Precision (MP), Mean
Recall (MR), and Mean Reciprocal Rank (MRR).
18
Vendasta: 10 Python projects, 13,081 pull requests
Open source: 2 Python, 2 Java & 2 Ruby projects, 4,034 pull requests
Gold set: the actual code reviewers of the corresponding pull requests
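The four metrics named above can be computed as in the short sketch below; per the speaker notes, a recommendation counts as accurate when at least one gold (actual) reviewer appears in the Top-K list. This is an illustrative sketch with made-up toy data, not the study's evaluation scripts.

# Hedged sketch of Top-K Accuracy, Mean Precision, Mean Recall, and MRR.
def evaluate(recommendations, gold_sets, k=5):
    # recommendations: one ranked reviewer list per pull request.
    # gold_sets: the set of actual reviewers for each pull request.
    hits, precisions, recalls, reciprocal_ranks = 0, [], [], []
    for ranked, gold in zip(recommendations, gold_sets):
        top_k = ranked[:k]
        relevant = [r for r in top_k if r in gold]
        hits += 1 if relevant else 0                      # Top-K accuracy hit
        precisions.append(len(relevant) / len(top_k) if top_k else 0.0)
        recalls.append(len(relevant) / len(gold) if gold else 0.0)
        rr = 0.0                                          # reciprocal rank of first hit
        for rank, reviewer in enumerate(ranked, start=1):
            if reviewer in gold:
                rr = 1.0 / rank
                break
        reciprocal_ranks.append(rr)
    n = len(gold_sets)
    return {"top_k_accuracy": hits / n,
            "mean_precision": sum(precisions) / n,
            "mean_recall": sum(recalls) / n,
            "mrr": sum(reciprocal_ranks) / n}

# Toy example with two pull requests (hypothetical names):
print(evaluate([["alice", "bob", "dave"], ["eve", "bob"]],
               [{"bob"}, {"carol"}], k=3))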
LIBRARY EXPERIENCE & TECHNOLOGY
EXPERIENCE (ANSWERED: RQ1)
Metric   | Library Similarity   | Technology Similarity | Combined Similarity
         | Top-3      Top-5     | Top-3      Top-5      | Top-3      Top-5
Accuracy | 83.57%     92.02%    | 82.18%     91.83%     | 83.75%     92.15%
MRR      | 0.66       0.67      | 0.62       0.64       | 0.65       0.67
MP       | 65.93%     85.28%    | 62.99%     83.93%     | 65.98%     85.93%
MR       | 58.34%     80.77%    | 55.77%     79.50%     | 58.43%     81.39%
19
[ MP = Mean Precision, MR = Mean Recall, MRR = Mean Reciprocal Rank ]
 Both library experience and technology experience are
found to be good proxies, each providing over 90% accuracy.
 Combined experience provides the maximum performance.
 92.15% recommendation accuracy with 85.93% precision and
81.39% recall.
 Evaluation results align with exploratory study findings.
COMPARATIVE STUDY FINDINGS (ANSWERED:
RQ2)
 CORRECT performs better than the competing technique in all
metrics (p-value = 0.003 < 0.05 for Top-5 accuracy)
 Performs better both on average and on individual projects.
 RevFinder estimates PR similarity using source file name and
directory matching
20
Metric   | RevFinder [18] | CORRECT
         | Top-5          | Top-5
Accuracy | 80.72%         | 92.15%
MRR      | 0.65           | 0.67
MP       | 77.24%         | 85.93%
MR       | 73.27%         | 81.39%
[ MP = Mean Precision, MR = Mean Recall,
MRR = Mean Reciprocal Rank ]
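The speaker notes state that the significance of this improvement was checked with Mann-Whitney U tests (p-value = 0.003 for Top-5 accuracy). A minimal SciPy sketch of such a test is shown below; the per-project accuracy values are made-up placeholders, not the study's data.

# Illustrative only: Mann-Whitney U test over per-project Top-5 accuracies.
from scipy.stats import mannwhitneyu

correct_top5 = [0.95, 0.93, 0.90, 0.92, 0.94]      # hypothetical CORRECT accuracies
revfinder_top5 = [0.82, 0.79, 0.85, 0.80, 0.78]    # hypothetical RevFinder accuracies

stat, p_value = mannwhitneyu(correct_top5, revfinder_top5, alternative="greater")
print(f"U = {stat:.1f}, p = {p_value:.4f}")  # p < 0.05 would indicate a significant improvement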
COMPARISON ON OPEN SOURCE PROJECTS
(ANSWERED: RQ3)
 In OSS projects, CORRECT also performs better than the
baseline technique.
 85.20% accuracy with 84.76% precision and 78.73% recall,
not significantly different from the earlier results
(p-value = 0.239 > 0.05 for precision)
 Results for the private and public codebases are quite close.
21
Metric   | RevFinder [18] | CORRECT (OSS) | CORRECT (VA)
         | Top-5          | Top-5         | Top-5
Accuracy | 62.90%         | 85.20%        | 92.15%
MRR      | 0.55           | 0.69          | 0.67
MP       | 62.57%         | 84.76%        | 85.93%
MR       | 58.63%         | 78.73%        | 81.39%
[ MP = Mean Precision, MR = Mean Recall, MRR = Mean Reciprocal Rank ]
COMPARISON ON DIFFERENT PLATFORMS
(ANSWERED: RQ4)
Metrics  | Python (Beets / St2 / Avg.) | Java (OkHttp / Orientdb / Avg.) | Ruby (Rubocop / Vagrant / Avg.)
Accuracy | 93.06% / 79.20% / 86.13%    | 88.77% / 81.27% / 85.02%        | 89.53% / 79.38% / 84.46%
MRR      | 0.82 / 0.49 / 0.66          | 0.61 / 0.76 / 0.69              | 0.76 / 0.71 / 0.74
MP       | 93.06% / 77.85% / 85.46%    | 88.69% / 81.27% / 84.98%        | 88.49% / 79.17% / 83.83%
MR       | 87.36% / 74.54% / 80.95%    | 85.33% / 76.27% / 80.80%        | 81.49% / 67.36% / 74.43%
22
[ MP = Mean Precision, MR = Mean Recall, MRR = Mean Reciprocal Rank ]
 In OSS projects, results across the different platforms look
surprisingly close, except for recall.
 Accuracy and precision are close to 85% on average.
 CORRECT does NOT show bias toward any particular platform.
TAKE-HOME MESSAGES
23
[Graphical summary of six take-home messages; see the speaker notes for slide 23.]
THANK YOU!! QUESTIONS?
24
Masud Rahman (masud.rahman@usask.ca)
CORRECT site (http://www.usask.ca/~masud.rahman/correct)
Acknowledgement: This work is supported by NSERC
THREATS TO VALIDITY
 Threats to Internal Validity
 Skewed dataset: Each of the 10 selected projects is
medium-sized (i.e., 1.1K PRs) except CS.
 Threats to External Validity
 Limited OSS dataset: Only 6 OSS projects considered—
not sufficient for generalization.
 Issue of heavy PRs: PRs containing hundreds of files can
make the recommendation slower.
 Threats to Construct Validity
 Top-K Accuracy: Does the metric represent the effectiveness
of the technique? It is widely used in the relevant literature
(Thongtanunam et al, SANER 2015).
25

Editor's Notes

  1. Hello everyone. My name is Mohammad Masudur Rahman. I am a second-year PhD student at the University of Saskatchewan, Canada. Today, I am going to talk about code reviewer recommendation based on cross-project and technology experience. I work with Dr. Chanchal Roy. The other co-author of the paper is Jason Collins from Google, USA.
  2. When I searched the web, I found this. Obviously, code review is not a very good experience all the time. If you do not select appropriate reviewers, the review could be disastrous. One way to handle the frustration of code review is choosing the appropriate reviewers for your code.
  3. We already had a talk on code review. Anyway, just to recap, code review is a systematic examination of source code that identifies defects and coding standard violations. It helps in early bug detection and thus reduces cost. It also ensures code quality by maintaining the coding standards. Code review has also evolved. First, it was formal inspection, which was time-consuming, slow and costly. Then came a less formal form, peer code review. Now we do tool-assisted code review, also called modern code review.
  4. This is an example of code review on GitHub. Once a developer submits a pull request, a way of submitting changes on GitHub, the core developers/reviewers can review the changes and provide their feedback like this. Our goal in this research is to identify appropriate code reviewers for such a pull request. Identifying such code reviewers is very important, especially for novice developers who do not know the skill sets of their fellow developers. It is also very essential for distributed development, where the developers rarely meet face to face. Besides, an earlier study suggests that without appropriate code reviewers the whole change submission could be 12 days late on average. However, identifying such reviewers is challenging since the skill is not obvious, and it would require massive mining of the revision history.
  5. The earlier studies analyze the line change history of source code, the file path similarity of source files, and review comments. In short, they mostly considered the work experience of a candidate code reviewer within a single project only. However, some skills span across multiple projects, such as working experience with specific API libraries or specialized technologies. Also, in an industrial setting, a developer's contributions scatter throughout different projects within the company codebase. We thus consider the external libraries and APIs included in the changed code and suggest more appropriate code reviewers.
  6. This is the outline of today's talk. We collect commercial projects and libraries from the Vendasta codebase, a medium-sized Canadian software company. Then we ask 3 research questions and conduct an exploratory study to answer those questions. Based on those findings, we then propose our recommendation technique, CORRECT. In the experiments, we evaluate it on the commercial projects, compare it with the state-of-the-art, and also experiment with open source projects. Finally, we conclude the talk.
  7. We ask these three research questions. In a commercial codebase, there are two types of projects: customer projects and utility projects. The utility projects are also called libraries. We ask: How frequently do the commercial software projects reuse external libraries in their code? Does working experience with such libraries matter in code reviewer selection? That means, does a reviewer with such experience get preference over the others? Does working experience with specialized technologies such as mapreduce and taskqueue matter in code reviewer selection?
  8. This is a connectivity graph of core projects and internal libraries from the Vendasta codebase. We see the graph is pretty much connected, which means most of the libraries are used by most of the projects. For the study, we chose 10 projects and 10 internal libraries, and they were chosen based on certain restrictions. Each project should have at least 750 closed pull requests, which means they should be pretty big and, most importantly, quite active. Each internal library should be used at least 10 times on average by each of those projects. Each specialized technology should be used at least 5 times on average by each of those projects. We consider the Google App Engine libraries as the specialized technologies. We consider 10 of them.
  9. This is the usage frequency of the selected libraries in the 10 projects we selected for the study. We take the latest snapshot of each of the projects, analyze their source files, and look for imported libraries using an AST parser. We try to find those 10 libraries mostly, and this is the box plot of their frequencies. We can see that vtest, vauth and vapi are the most used, which kind of makes sense, especially for vtest and vauth, since they presumably provide generic testing and authentication support. However, vtest has a large variance, which means some projects used it extensively whereas others didn't use it at all. The least used libraries are vlogs and vmonitor. So, these are the empirical frequencies.
  10. We investigated the ratio of pull requests that used any of the selected libraries. We note that 30%-70% of all pull requests did so in the different projects. We also investigated the percentage of the library authors who were later recommended as code reviewers for the projects referring to that library. We considered a developer a library author if he/she authored at least one pull request of the library. We note that almost 100% of the authors were later recommended. This is a very interesting finding that suggests that library experience really matters.
  11. We also calculated the empirical frequency of the ten specialized technologies in the selected Vendasta projects, and this is the box plot. We can see that mapreduce is the champion technology here, and the rest are close competitors.
  12. In the case of the pull requests, 20%-60% of them used at least one of the ten specialized technologies, mostly in ARM, CS and VBC. So, specialized technologies are also used in our selected projects quite significantly.
  13. So, here are the empirical findings from the exploratory studies we conducted. They suggest that library experience and specialized technology experience really matter. These are new findings, and we exploit them to develop the recommendation algorithm later.
  14. Based on those exploratory findings, we propose CORRECT– Code reviewer recommendation based on cross-project and technology experience.
  15. This is our recommendation algorithm. Once a new pull request R3 is created, we analyze its commits, then its source files, and look for the libraries referred to and the specialized technologies used. Thus, we get a library token list and a technology token list. We combine both lists, and the combined list can be considered a summary of libraries and technologies for the new pull request. Now, we consider the most recent closed pull requests (a sliding window of 30, as shown on the dataset slide), and collect their library and technology tokens. It should be noted that the past requests contain their code reviewers. Now, we estimate the similarity between the new request and each of the past requests. We use the cosine similarity score between their token lists. We add that score to the corresponding code reviewers. This way, finally, we get a list of reviewers who have accumulated scores from different past reviews. Then they are ranked and the top reviewers are recommended. Thus, we use the pull request similarity score to estimate the relevant expertise of candidate code reviewers.
  16. Now, to be technically specific: the state-of-the-art considers two pull requests relevant/similar if they share source code files or directories. On the other hand, we suggest that two pull requests are relevant/similar if they share the same external libraries and specialized technologies. That's the major difference in methodology and our core technical contribution.
  17. We performed two evaluations: one with the Vendasta codebase, and the other with an open source codebase. From those experiments, we try to answer four research questions. Are library experience and technology experience useful proxies for code review skills? Can our technique outperform the state-of-the-art technique from the literature? Does it perform equally for closed source and open source projects? Does it show any bias to any particular platform?
  18. We conducted experiments using 10 projects from the Vendasta codebase and 6 projects from the open source domain. From Vendasta, we collected 13K pull requests, and from open source, we collected 4K pull requests. The gold reviewers are collected from the corresponding pull requests. The Vendasta projects are Python-based, whereas the open source projects are written in Python, Java and Ruby. We consider four performance metrics: accuracy, precision, recall, and reciprocal rank. In the case of accuracy, if the recommendation contains at least one gold reviewer, we consider the recommendation accurate.
  19. This is how we answer the first RQ. We see that both library similarity and technology similarity are pretty good proxies for code review skills. Each of them provides over 90% top-5 accuracy. However, when we combine them, we get the maximum: 92% top-5 accuracy. The precision and recall are also greater than 80%, which is highly promising according to the relevant literature.
  20. We then compare with the state-of-the-art, RevFinder. We found that our performance is significantly better than theirs. We get a p-value of 0.003 for top-5 accuracy with Mann-Whitney U tests. The median accuracy is 95%. The median precision and median recall are between 85% and 90%. In the case of individual projects, our technique also outperformed the state-of-the-art.
  21. We also experimented using 6 open source projects, and found 85% Top-5 accuracy. The precision and recall are not significantly different from those with the Vendasta projects. For example, for precision, we get a p-value of 0.239, which is greater than 0.05.
  22. This slide shows how CORRECT performed with the projects from 3 programming platforms: Python, Java and Ruby. We find quite similar performance for each of the platforms, which is interesting. This shows that our findings with commercial projects are quite generalizable.
  23. Now, to summarize: code review could be unpleasant or unproductive without appropriate code reviewers. We first motivated our technique using an exploratory study, which suggested that library experience and specialized technology experience really matter for code reviewer selection. Then we proposed our technique, CORRECT, which learns from past review history and then recommends reviewers. We experimented using both commercial and open source projects, and compared with the state-of-the-art. The results clearly demonstrate the high potential of our technique.
  24. That’s all I have to say today. Thanks for your time. Questions!!
  25. There are a few threats to the validity of our findings. The dataset from the Vendasta codebase is a bit skewed: most of the projects are medium-sized and only one project is big. Also, the number of projects considered from the open source domain is limited. Finally, the technique could be slower for big pull requests.