Welcome

This website is for the Extract Method Refactoring Datasets Survey (EMRS).


EMRS is an open-source repository of Extract Method refactoring methodologies, intended to support software engineering researchers and practitioners. For each case in EMRS, we provide a summary, tools, datasets, and the original papers for Extract Method refactoring techniques. EMRS currently covers 83 Extract Method refactoring approaches.


About

This survey collects and manages the relevant papers, datasets, and tools for use by researchers. This website was designed by Matheus Paixao, and Addl Tariq added content management as part of his MS Capstone at Rochester Institute of Technology. Dr. Mohamed Wiem Mkaouer and Dr. Eman Abdullah AlOmar contributed to the first incarnation of EMRS.


By leveraging the data contained in EMRS, software engineering researchers and practitioners can review the historical research on the Extract Method topic, and can utilize the existing studies (techniques, datasets, and tools) to improve or create new empirical studies to assess Extract Method Refactoring. Moreover, the data in EMRS is a valuable source of knowledge regarding motivation for and expansion of the Extract Method Refactoring techniques in the software engineering domain.


Here you can find information to help you understand the survey, and you can download the tools and datasets.


Download Tools

EMRS contains links to tools for multiple Extract Method refactoring techniques. All tools can be accessed through the table of links below. The table details the artifacts used by each project.


Tool Language No. of Metrics Interface Usage Guide? Tool Link Last Update
Tuck Unknown Unknown Unknown No No Unknown
CloRT Java N/A Unknown No No Unknown
Nate Java Unknown Eclipse No No Unknown
CCshaper Java 6 Command Line No No Unknown
Aries Java 6 GUI-based No No Unknown
SDAR Java N/A Eclipse No No Unknown
Unnamed Java N/A Eclipse No No Unknown
Xrefactory C++ N/A Unknown Yes Yes 2007
Unnamed Ruby N/A Eclipse Yes Yes 2012
Refactoring Annotation Java Unknown Eclipse No No Unknown
JDeodorant Java 3 IntelliJ/ Eclipse Yes Yes 2019
AutoMed Java 10 Unknown No No Unknown
Wrangler Erlang/OTP N/A GUI-based / Command line Yes Yes 2023
HaRe Haskell 98 N/A GUI-based / Command line Yes Yes 2017
ReAF Java Unknown Unknown No No Unknown
Unnamed C# Unknown Visual Studio extension No No Unknown
CeDAR Java 2 Eclipse No No Unknown
FTMPAT Java 3 Eclipse No No Unknown
SPAPE Procedural / Java Unknown Unknown No No Unknown
JExtract Java Unknown Eclipse Yes Yes 2016
DCRA Java 1 Unknown No No Unknown
RASE Java N/A Eclipse Yes Yes 2015
SEMI Java 5 GUI-based / Command line Yes Yes 2017
GEMS Java 48 Eclipse Yes No 2017
PostponableRefactoring Java N/A Eclipse Yes Yes 2018
LLPM Java 4 Unknown No No Unknown
PRI Java N/A Eclipse No No Unknown
LMR Java 5 Eclipse No No Unknown
CREC Java N/A Eclipse Yes Yes 2018
Bandago Java 4 Eclipse No No Unknown
Unnamed Java N/A Eclipse No Yes 2019
Unnamed Java N/A Unknown No No Unknown
CloneRefactor Java N/A Command line No Yes 2020
TOAD Pharo N/A Pharo Yes Yes 2019
Segmentation Java 2 Eclipse No Yes 2022
LiveRef Java 20 IntelliJ Yes Yes 2022
AntiCopyPaster Java 78 IntelliJ Yes Yes 2023
REM Rust N/A IntelliJ Yes Yes 2023

In the above table, you will find links to directories containing csv, jsonl and zip files for the tools, raw data and datasets. For more information on the use of tools for Extract Method Refactoring, please see the relevant papers in the Paper Reviews section.



Download Datasets

EMRS contains links to datasets for multiple Extract Method refactoring techniques. All datasets can be accessed through the tables of links below.


The following table details the datasets used for Long Method decomposition by each project.


Study Intent Language No. of Metrics No. of Projects Projects Dataset Availability Validation Method
Tuck Long Method Unknown Unknown Unknown Unknown No Proof of Concept
JDeodorant Long Method Java 3 1 Violet 0.16 (LOC: 4,100/ 61 classes/ 144 methods) No Experiment
AutoMed Long Method Java 10 1 houtReader 1.8.0 (LOC: 20,000/ 269 classes) No Case Study
Meananeatra Long Method Java 3 Unknown Unknown No Experiment
Kaya & Fawcett Long Method C++ N/A Unknown Unknown No Case Study
Charalampidou Long Method Java 5 1 jFlex No Case Study
Charalampidou Long Method Java 8 1 jFlex No Case Study
SEMI Long Method Java 5 5 Wikidev, MyPlanner, MyWebMarket, JUnit, JHotDraw Yes Case Study
Haas & Hummel Long Method Java 2 3 Agilefant (LOC: 36,116/ 2,841 methods), JabRef (LOC: 128,145/ 5,655 methods), JChart2D (LOC: 50,728/ 1,849 methods) No Experiment
Haas & Hummel Long Method Java 9 13 Unknown No Experiment
Kaya & Fawcett Long Method C++ N/A Unknown Unknown No Experiment
LLPM Separation of Concerns Java 4 5 Wikidev (130 total methods), SelfPlanner, MyWebMarket, JUnit, JHotDraw No Experiment
LMR Long Method Java 5 1 JFreeChart 1.0.17 (LOC: 5,665/ 20 classes/ 552 methods) No Case Study
Choi Long Method Java 6 1 JEdit (LOC: 97,116 - 313,706) No Experiment
Bandago Long Method Java 4 10 Columbia 1.4 (LOC: 26,600/ 436 classes), JGraphT 0.9.0 (LOC: 14,180/ 218 classes), SportTracker 5.7 (LOC: 5,200/ 40 classes), Cayanner 4.0 (LOC: 45,000/ 533 classes), CheckStyle 6.4.1 (LOC: 60,000/ 399 classes), Jena 2.12.1 (LOC: 54,410/ 698 classes), JGroups 3.4.8 (LOC: 76,570/ 644 classes), Quartz 2.1.7 (LOC: 26,810/ 176 classes), Roller 5.1.2 (LOC: 47,460/ 452 classes), Squirrel 3.6.0 (LOC: 79,070/ 879 classes) Yes Case Study
TOAD Long Method Pharo N/A 9 GitMultipleMatrix, TestDeviator, DrTest, Regis, SmallSuiteGenerator, Roassal, Live Robot Programming, KerasBridge, GTool Documenter Yes Experiment
Shahidi Long Method Java Unknown 5 JEdit 4.5.1 (LOC: 107,212/ 1,141 classes/ 6,663 methods), FreeMind 0.9.0 (LOC: 40,933/ 696 classes/ 4,583 methods), ArgoUML 0.34 (LOC: 249,538/ 2,539 classes/ 17,485 methods), JFreeChart (LOC: 222,814/ 8,630 classes/ 619 methods), jVLT 1.3.2 (LOC: 29,161/ 420 classes/ 2,036 methods) No Experiment
Segmentation Long Method Java 2 6 JEdit, JHotDraw, MyWebMarket, EventBus, Mockito, XData Yes Experiment
LiveRef Long Method Java 20 3 Space Invaders, JHotDraw, Movie rental system Yes Experiment

The following table details the datasets used for Code Clone extraction by each project.


Study Intent Language No. of Metrics No. of Projects Projects Dataset Availability Validation Method
CloRT Code Clone Java Unknown Unknown Unknown No Proof of Concept
Komondoor & Horwitz Code Clone Procedural N/A Unknown Unknown No Proof of Concept
Komondoor & Horwitz Code Clone Procedural N/A Unknown Unknown No Proof of Concept
CCShaper Code Clone Java 6 1 Ant 1.6.0 (LOC: 180,000/ 627 files) No Case Study
Aries Code Clone Java 6 1 Ant 1.6.0 (LOC: 180,000/ 627 files) No Case Study
Juillerat & Hirsbrunner Code Clone Java N/A Unknown Unknown No Proof of Concept
Wrangler Code Clone Erlang/OTP N/A 3 Wrangler (LOC: 30,872), Mnesia (LOC: 28,152), Yaws (LOC: 29,603) No Experiment
HaRe Code Clone Haskell 98 N/A 13 Previous Work No Case Study
Choi Code Clone Java 3 1 Unknown (LOC: 110/ 296 files) No Case Study
CeDAR Code Clone Java 2 9 Ant 1.7.0 (KLOC: 67), Columbia 1.4 (KLOC: 75), EMF 2.4.1 (KLOC: 118), Hibernate (KLOC: 209), Jakarta-JMeter 2.3.2 (KLOC: 54), JEdit 4.2 (KLOC: 51), JFreeChart 1.0.10 (KLOC: 76), JRuby (KLOC: 101), Squirrel-SQL 3.0.3 (KLOC: 141) No Experiment
FTMPAT Code Clone Java 3 1 Ant 1.7.0 No Case Study
SPAPE Code Clone Java/ Procedural Unknown 10 Linux 2.6.6/kernel (LOC: 30,629), Unix/make 3.82 (LOC: 33,864), http 2.2.2/server (LOC: 36,926), dovecot 2.0.8/src/auth (LOC: 18,243), gstreamer 0.10.31/gst (LOC: 66,637), gtk 2.91.5/gdk/x11 (LOC: 30,118), iptables 1.4.10/extensions (LOC: 19,668), nginx-0.8.15/src/core (LOC: 17,126), proftpd 1.3.3c/src (LOC: 34,404), PostgreSQL 9.0.2/src/backend/access (LOC: 605,046) No Experiment
Bian Code Clone Java Unknown 5 Linux 2.6.6/arch, Linux 2.6.6/net, Linux 2.6.6/sound/drivers, Unix/make 3.82, http2.2.2/server No Experiment
JDeodorant Code Clone Java N/A 9 Ant 1.7.0/Ant 1.9 (KLOC: 67), Columbia 1.4 (KLOC: 75), EMF 2.4.1 (KLOC: 118), JMeter 2.3.2/JMeter 2.9 (KLOC: 54), JEdit 4.2 (KLOC: 51), JFreeChart 1.0.10/JFreeChart 1.0.14 (KLOC: 76), JRuby 1.4.0/JRuby 1.7.3 (KLOC: 101), Hibernate 3.3.2 (KLOC: 209), SQuirrel SQL 3.0.3 (KLOC: 141) No Experiment
DCRA Code Clone Java 1 50 Qualitas Corpus (v.20120401) No Experiment
RASE Code Clone Java N/A 2 Previous works Yes Experiment
CREC Code Clone Java N/A 6 Axis2 (8,723 commits), Eclipse.jdt.core (22,358 commits), Elastic Search (14,766 commits), JFreeChart (3,603 commits), JRuby (24,434 commits), Lucene (22,061 commits) Yes Experiment
PRI Code Clone Java N/A 6 ArgoUML (LOC: 127,145/ 1,559 files), Tomcat (LOC: 215,584/ 1,537 files), Log4j (LOC: 59,499/ 817 files), Eclipse AspectJ (LOC: 107,368/ 4,758 files), JEdit (LOC: 107,368/ 561 files), JRuby (LOC: 186,514/ 1,256 files) No Case Study
Ettinger Code Clone Java N/A Unknown Previous work (59 clone pairs) No Proof of Concept
Unnamed Code Clone Java N/A 2 JFreeChart (KLOC: 260/ 990 classes), JUnit (KLOC: 43/ 449 classes) No Experiment
Unnamed Code Clone Java N/A Unknown Unknown No Case Study
CloneRefactor Code Clone Java N/A 1,343 Previous work (LOC (AVG): 980) No Experiment
Sheneamer Code Clone Java N/A 6 Previous work, netbeans (200 paired clones), eclipse-jdtcore (400 paired clones), EITC (426 paired clones), J2sdk1.4.0-javax (482 paired clones), eclipse-ant (522 paired clones), cocoon (655 paired clones) Yes Experiment
AntiCopyPaster Code Clone Java 78 13 arthas (73,884 commits), easyexcel, camel-quarkus, commons-lang, flink, iceberg, jena, pulsar, storm, apollo, JavaGuide Yes Experiment

The following table details the datasets used for Separation of Concerns by each project.


Study Intent Language No. of Metrics No. of Projects Projects Dataset Availability Validation Method
Maruyama Separation of Concerns Java N/A Unknown Unknown No Proof of Concept
Nate Separation of Concerns Java N/A Unknown Unknown No Proof of Concept
SDAR Separation of Concerns Java N/A Unknown Unknown No Proof of Concept
Juillerat & Hirsbrunner Separation of Concerns Java N/A Unknown Unknown No Proof of Concept
Xrefactory Separation of Concerns C++ N/A Unknown Unknown No Proof of Concept
Unnamed Separation of Concerns Ruby N/A Unknown Unknown No Proof of Concept
RefactoringAnnotation Separation of Concerns Java Unknown 5 Azureus, GanttProject, JasperReports, Java 1.4.2 libraries No Experiment
Abadi Separation of Concerns Java N/A Unknown Unknown No Case Study
Abadi Separation of Concerns Java N/A Unknown Unknown No Case Study
ReAF Separation of Concerns Java Unknown 1 Ant 1.8.1 No Experiment
Sharma Separation of Concerns C/C++ N/A 1 CppCheck No Proof of Concept
Unnamed Separation of Concerns C# Unknown Unknown Unknown No Proof of Concept
JExtract Separation of Concerns Java Unknown 12 MyWebMarket, JUnit 3.8/ 4.10, JHotDraw 5.2, Ant 1.8.2, ArgoUML 0.34, CheckStyle 5.6, FindBugs 1.3.9, FreeMind 0.9.0, JFreeChart 1.0.013, Quartz 1.8.3, SQuirrel SQL 2.1.2, Tomcat 7.0.2 Yes Experiment
GEMS Separation of Concerns Java 48 5 Wikidev (56 methods), SelfPlanner (25 methods), MyWebMarket (23 methods), JUnit (12 methods), JHotDraw (14 methods) No Experiment
Imazato Separation of Concerns Java Unknown 5 Ant (LOC: 260,624/ 1,532 methods), ArgoUML (LOC: 370,750/ 1,470 methods), JEdit (LOC: 187,166/ 1,066 methods), JFreeChart (LOC: 327,865/ 180 methods), Mylyn (LOC: 166,149/ 980 methods) No Experiment
PostponableRefactoring Separation of Concerns Java N/A Unknown Unknown No Proof of Concept
Nyamawe Separation of Concerns Java N/A 55 Unknown Yes Experiment
Krasniqi & Cleland-Huang Separation of Concerns Java N/A 4 Derby (KLOC: 170/ 2,382 commits), Drools (KLOC: 371/ 840 commits), Groovy (KLOC: 141/ 4,892 commits), Infinispan (KLOC: 299/ 2,349 commits) Yes Experiment
Abid Separation of Concerns Java 8 30 Unknown Yes Experiment
Aniche Separation of Concerns Java 61 11,149 Unknown (8.8 million commits) Yes Experiment
Van der Leij Separation of Concerns Java 7 11,149 Previous work (8.8 million commits) No Experiment
Sagar Separation of Concerns Java 60 800 Previous work (748,001 commits) No Experiment
AlOmar Separation of Concerns Java N/A 800 Previous work (748,001 commits) Yes Experiment
Nyamawe Separation of Concerns Java N/A 65 Previous work (7,520 commits) No Experiment
Cui Separation of Concerns Java N/A Unknown Previous work Yes Experiment
REM Separation of Concerns Rust N/A 5 petgraph (LOC: 20,157), gitoxide (LOC: 20,211), kickoff (LOC: 1,502), sniffnet (LOC: 7,304), beerus (LOC: 302) Yes Experiment
Palit Separation of Concerns Java 61 410 Previous work (55,268 commits) Yes Experiment

In the above table, you will find links to directories/files containing csv, jsonl and zip files for datasets. For more information on the utilization of datasets for Extract Method Refactoring, please see the relevant papers in the Paper Reviews section.
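
As a minimal sketch of how downloaded files in these formats might be read, the snippet below uses only Python's standard library. The function names (`load_csv`, `load_jsonl`, `unpack_zip`) and any file names are hypothetical; the actual columns and layout of each dataset are defined by its authors, so check the linked repository before relying on specific fields.

```python
import csv
import json
import zipfile
from pathlib import Path


def load_csv(path):
    """Read a CSV file with a header row into a list of dicts."""
    with Path(path).open(newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))


def load_jsonl(path):
    """Read a JSON Lines file (one JSON value per line), skipping blank lines."""
    records = []
    with Path(path).open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                records.append(json.loads(line))
    return records


def unpack_zip(path, target_dir):
    """Extract a zip archive into target_dir and return the member names."""
    with zipfile.ZipFile(path) as archive:
        archive.extractall(target_dir)
        return archive.namelist()
```

For example, `load_jsonl("samples.jsonl")` (a hypothetical file name) returns one Python object per non-empty line, which can then be filtered or converted as needed.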



Study Summary

EMRS contains a study of multiple Extract Method refactoring techniques. The literature study, experiments, and table below summarize the details of this project.


Behind the Intent of Extract Method Refactoring: A Systematic Literature Review

Aim: In this paper, we aim to review the current body of knowledge on existing Extract Method refactoring research and explore its limitations and potential improvement opportunities for future research efforts. Extract Method is considered one of the most widely used refactorings, but it is difficult to apply in practice because it involves low-level code changes to statements, variables, parameters, return types, etc. Hence, this review helps researchers and practitioners become aware of the state of the art and identify new research opportunities in this context.


Method: We review the body of knowledge related to Extract Method refactoring in the form of a systematic literature review (SLR). After compiling an initial pool of 1,367 papers, we conducted a systematic selection, and our final pool included 83 primary studies. We define three sets of research questions and systematically develop and refine a classification schema based on several criteria, including their methodology, applicability, and degree of automation.


Results: The results form a catalog of 83 Extract Method approaches, indicating that several techniques have been proposed in the literature. Our results show that: (i) 38.6% of Extract Method refactoring studies primarily focus on addressing code clones; (ii) several of the Extract Method tools incorporate the developer’s involvement in the decision-making process when applying the method extraction; and (iii) the existing benchmarks are heterogeneous and do not contain the same type of information, making standardizing them for the purpose of benchmarking difficult.


Conclusions: Our study serves as an “index” to the body of knowledge in this area for researchers and practitioners in determining the Extract Method refactoring approach that is most appropriate for their needs. Our findings also empower the community with information to guide future refactoring tool development.


We drive our study using the following research questions:
RQ1: What approaches were considered by the primary studies (PSs) to recommend Extract Method refactoring?
RQ2: What are the main characteristics of Extract Method recommendation tools?
RQ3: What are the datasets and benchmarks used for evaluating and validating Extract Method recommendation tools?


Research Questions and Findings

RQ1. What approaches were considered by the PSs to recommend Extract Method refactoring?

The Sankey diagram in the figure below provides detailed mappings between our three dimensions. Separation of Concerns is the most popular intent driving method extraction, followed by Long Method, and finally Code Clone. Interestingly, this is not matched in terms of the toolset: the highest ratio of tools goes to Long Method with 41.2%, and the rest is split equally between Code Clone and Separation of Concerns with 29.4% each. This observation caught our attention, particularly as Separation of Concerns is the only category that relies on all existing detection techniques and has its own unique one, i.e., Evolutionary-based, and yet there is a lack of concretizing this amount of research into practical tools. As for Detection, it is no surprise that rule-based techniques are the most popular for identifying need-to-refactor code fragments. This follows from how research treats refactoring as a natural response to code smells, e.g., Long Method.


Table analysis

Each research paper's approach methodology, type of data, decision metrics, and evaluation metrics are detailed below:


Study Year Intent Design Property Representation Detection Execution Semi-Automation Validation Method
Lakhotia & Deprez 1998 Long Method Semantic Graphs Manual Semi-automated Suggest Alternatives Proof of Concept
Balazinska 1999 Code Clone Syntactic AST Fully automated Fully automated N/A Proof of Concept
Komondoor & Horwitz 2000 Code Clone Semantic Graphs Manual Fully automated N/A Proof of Concept
Maruyama 2001 Separation of Concerns Semantic Graphs Manual Semi-automated Choose Candidates Proof of Concept
Komondoor & Horwitz 2003 Code Clone Semantic Graphs Manual Fully automated N/A Proof of Concept
Ettinger & Verbaere 2004 Separation of Concerns Semantic Graphs Manual Fully automated N/A Proof of Concept
Higo 2004 Code Clone Textual Source Code Fully automated Semi-automated Choose Candidates Case Study
Higo 2004 Code Clone Semantic Graphs Fully automated Fully automated N/A Case Study
Higo 2005 Code Clone Textual Source Code Fully automated Semi-automated Execute on Approval Case Study
Higo 2008 Code Clone Textual Source Code Fully automated Semi-automated Execute on Approval Case Study
O’Connor 2005 Separation of Concerns Syntactic AST Semi-automated Semi-automated Suggest Alternatives Proof of Concept
Juillerat & Hirsbrunner 2006 Code Clone Syntactic AST Fully automated Fully automated N/A Proof of Concept
Juillerat & Hirsbrunner 2007 Separation of Concerns Syntactic AST Manual Fully automated N/A Proof of Concept
Vittek 2007 Separation of Concerns Syntactic AST Manual Semi-automated User Input Proof of Concept
Corbat 2007 Separation of Concerns Syntactic AST Manual Semi-automated Choose Candidates Proof of Concept
Murphy-Hill & Black 2008 Separation of Concerns Textual Source Code Manual Semi-automated Choose Candidates Experiment
Abadi 2008 Separation of Concerns Textual Source Code Manual Fully automated N/A Case Study
Abadi 2009 Separation of Concerns Textual Source Code Manual Fully automated N/A Case Study
Tsantalis & Chatzigeorgiou 2009 Long Method Textual Source Code Fully automated Semi-automated Suggest Alternatives Experiment
Tsantalis & Chatzigeorgiou 2011 Long Method Textual Source Code Fully automated Semi-automated Suggest Alternatives Experiment
Yang 2009 Long Method Textual Source Code Manual Semi-automated Suggest Alternatives Case Study
Li & Thompson 2009 Code Clone Hybrids AST & Tokens Manual Semi-automated Suggest Alternatives Case Study
Brown & Thompson 2010 Code Clone Hybrids AST & Tokens Manual Semi-automated Suggest Alternatives Case Study
Kanemitsu 2011 Separation of Concerns Semantic Graphs Manual Semi-automated Suggest Alternatives Experiment
Meananeatra 2011 Long Method Syntactic Metrics Manual Semi-automated Suggest Alternatives Proof of Concept
Choi 2011 Code Clone Lexical Tokens Fully automated Manual N/A Case Study
Sharma 2012 Separation of Concerns Semantic Graphs Manual Fully automated N/A Proof of Concept
Cousot 2012 Separation of Concerns Textual Source Code Manual Fully automated N/A Proof of Concept
Tairas & Gray 2012 Code Clone Syntactic AST Fully automated Semi-automated Choose Candidates Experiment
Kaya & Fawcett 2013 Long Method Textual Source Code Fully automated Manual N/A Experiment
Goto 2013 Code Clone Syntactic AST Manual Fully automated N/A Case Study
Bian 2013 Code Clone Hybrids AST & Graphs Manual Fully automated N/A Experiment
Bian 2014 Code Clone Syntactic Metrics Fully automated Manual N/A Experiment
Krishnan & Tsantalis 2013 Code Clone Textual Source Code Fully automated Semi-automated User Input Experiment
Krishnan & Tsantalis 2014 Code Clone Hybrids AST & Graphs Fully automated Semi-automated User Input Experiment
Tsantalis 2015 Code Clone Hybrids AST & Source Code & Tokens Fully automated Semi-automated User Input Experiment
Mazinanian 2016 Code Clone Hybrids AST & Source Code & Tokens Fully automated Semi-automated User Input Experiment
Tsantalis 2017 Code Clone Hybrids AST & Source Code & Tokens Fully automated Semi-automated User Input Experiment
Silva 2014 Separation of Concerns Textual Source Code Fully automated Semi-automated Suggest Alternatives Experiment
Silva 2015 Separation of Concerns Textual Source Code Fully automated Semi-automated Suggest Alternatives Experiment
Fontana 2015 Code Clone Hybrids AST & Source Code Fully automated Semi-automated Suggest Alternatives Experiment
Meng 2015 Code Clone Syntactic AST Fully automated Fully automated N/A Experiment
Charalampidou 2015 Long Method Syntactic Metrics Fully automated Fully automated N/A Case Study
Charalampidou 2016 Long Method Syntactic AST & Metrics Fully automated Fully automated N/A Case Study
Charalampidou 2018 Long Method Syntactic Metrics Fully automated Fully automated N/A Case Study
Haas & Hummel 2016 Long Method Hybrids Source Code & Graphs Manual Semi-automated Suggest Alternatives Experiment
Haas & Hummel 2017 Long Method Hybrids Source Code & Graphs Manual Semi-automated Select Alternatives Experiment
Xu 2017 Separation of Concerns Textual Source Code Fully automated Semi-automated Choose Candidates Experiment
Imazato 2017 Separation of Concerns Textual Source Code Fully automated Manual N/A Experiment
Kaya & Fawcett 2017 Long Method Semantic Graphs Fully automated Fully automated N/A Experiment
Maruyama & Hayashi 2017 Separation of Concerns Textual Source Code Manual Semi-automated Choose Candidates Proof of Concept
Xu 2017 Long Method Syntactic Metrics Fully automated Manual N/A Experiment
Chen 2017 Code Clone Syntactic AST Manual Fully automated N/A Case Study
Ettinger & Tyszberowicz 2016 Code Clone Textual Source Code Manual Fully automated N/A Proof of Concept
Ettinger 2017 Code Clone Semantic Graphs Manual Fully automated N/A Proof of Concept
Meananeatra 2018 Long Method Hybrids AST & Graphs Manual Semi-automated Execute on Approval Case Study
Choi 2018 Long Method Syntactic Metrics Fully automated Manual N/A Experiment
Yue 2018 Code Clone Syntactic AST Fully automated Manual N/A Experiment
Vidal 2018 Long Method Textual Source Code Fully automated Semi-automated Choose Candidates Case Study
Yoshida 2019 Code Clone Hybrids AST & Tokens Fully automated Semi-automated Choose Candidates Experiment
Shin 2019 Code Clone Syntactic AST Fully automated Fully automated N/A Case Study
Barrs & Oprescu 2019 Code Clone Hybrids AST & Graphs Fully automated Manual N/A Experiment
Antezana 2019 Long Method Textual Source Code Manual Semi-automated Choose Candidates Experiment
Alcocer 2020 Long Method Textual Source Code Manual Semi-automated Choose Candidates Experiment
Nyamawe 2019 Separation of Concerns Textual Text Fully automated Manual N/A Experiment
Nyamawe 2020 Separation of Concerns Textual Text Fully automated Manual N/A Experiment
Krasniqi & Cleland-Huang 2020 Separation of Concerns Textual Text Fully automated Manual N/A Experiment
Abid 2020 Separation of Concerns Textual Source Code Manual Semi-automated User Input Experiment
Sheneamer 2020 Code Clone Hybrids AST & Graphs & Tokens Fully automated Manual Choose Candidates Experiment
Aniche 2020 Separation of Concerns Syntactic Metrics Fully automated Manual N/A Experiment
Van der Leij 2021 Separation of Concerns Syntactic Metrics Fully automated Manual N/A Experiment
Sagar 2021 Separation of Concerns Hybrids Text & Metrics Fully automated Manual N/A Experiment
AlOmar 2022 Separation of Concerns Textual Text Fully automated Manual N/A Experiment
Nyamawe 2022 Separation of Concerns Textual Text Fully automated Manual N/A Experiment
Shahidi 2022 Long Method Hybrids Graphs & Metrics Fully automated Fully automated N/A Experiment
Tiwari & Joshi 2022 Long Method Semantic Graphs Fully automated Manual N/A Experiment
Fernandes 2022 Long Method Syntactic Metrics Fully automated Semi-automated Execute on Approval Experiment
Fernandes 2022 Long Method Syntactic Metrics Fully automated Semi-automated Execute on Approval Experiment
AlOmar 2022 Code Clone Syntactic Metrics Fully automated Semi-automated Execute on Approval Experiment
AlOmar 2023 Code Clone Syntactic Metrics Fully automated Semi-automated Execute on Approval Experiment
Cui 2023 Separation of Concerns Semantic Graphs Fully automated Manual N/A Experiment
Thy 2023 Separation of Concerns Textual Source Code Fully automated Fully automated N/A Case Study
Palit 2023 Separation of Concerns Semantic Graphs Fully automated Manual N/A Experiment
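
The share of studies per intent can be recomputed from the table above. The sketch below is illustrative only: the counts were tallied by hand from the Intent column as listed on this page (83 rows in total), and the names `intent_counts` and `intent_shares` are hypothetical.

```python
# Row counts tallied by hand from the Intent column of the table above.
intent_counts = {
    "Code Clone": 32,
    "Long Method": 22,
    "Separation of Concerns": 29,
}


def intent_shares(counts):
    """Return each intent's share of all rows, as a percentage with one decimal."""
    total = sum(counts.values())
    return {intent: round(100 * n / total, 1) for intent, n in counts.items()}


shares = intent_shares(intent_counts)
print(sum(intent_counts.values()))  # 83
print(shares["Code Clone"])         # 38.6
```

With these counts, the Code Clone share agrees with the 38.6% reported in the Results summary.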


RQ2. What are the main characteristics of Extract Method recommendation tools?

To help select an appropriate Extract Method refactoring tool, we report the following main characteristics, which can be considered to make an informed decision about tool usage:

  • Language: Indicates the programming language the tool supports.
  • Intent: Indicates the context in which the tool can be used.
  • Number of Metrics: Indicates the number of software metrics used by the tool.
  • Interface: Indicates what IDE the tool supports.
  • Usage Guide?: Indicates the availability of instructions on how to use the tool.
  • Tool Link: Indicates the availability of an online source code repository.
  • Last Update: Indicates when the tool was last updated, i.e., whether it has been maintained since its development.
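
To illustrate how these characteristics can drive tool selection, here is a small sketch that stores a few rows copied from the Download Tools table as records and filters them. The `ToolRecord` class, its field names, and the `select_tools` helper are hypothetical; `None` stands in for table cells marked N/A or Unknown.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ToolRecord:
    name: str
    language: str
    num_metrics: Optional[int]  # None where the table says N/A or Unknown
    interface: str
    usage_guide: bool
    tool_link: bool
    last_update: Optional[int]  # None where the table says Unknown


# A handful of rows copied from the Download Tools table.
TOOLS = [
    ToolRecord("CCshaper", "Java", 6, "Command Line", False, False, None),
    ToolRecord("JDeodorant", "Java", 3, "IntelliJ/ Eclipse", True, True, 2019),
    ToolRecord("Wrangler", "Erlang/OTP", None, "GUI-based / Command line", True, True, 2023),
    ToolRecord("LiveRef", "Java", 20, "IntelliJ", True, True, 2022),
    ToolRecord("AntiCopyPaster", "Java", 78, "IntelliJ", True, True, 2023),
]


def select_tools(tools, language, interface_keyword, updated_since):
    """Names of documented, linked tools for a language and interface, updated since a year."""
    keyword = interface_keyword.lower()
    return [
        tool.name
        for tool in tools
        if tool.language == language
        and keyword in tool.interface.lower()
        and tool.usage_guide
        and tool.tool_link
        and tool.last_update is not None
        and tool.last_update >= updated_since
    ]


print(select_tools(TOOLS, "Java", "IntelliJ", 2020))  # ['LiveRef', 'AntiCopyPaster']
```

Treating Unknown as `None` keeps tools without a recorded update year out of any recency filter, which is a design choice of this sketch rather than something the survey prescribes.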



RQ3. What are the datasets and benchmarks used for evaluating and validating Extract Method recommendation tools?

Out of the 83 primary studies analyzed, almost 78% of the datasets are unavailable to the public, with only 22% available online, which means there is a lack of online datasets for Extract Method refactoring research. Primary studies have mostly employed small or medium-scale open-source applications, often developed in Java, typically containing fewer than 225,000 lines of code. These datasets are heterogeneous and do not contain the same type of information, making standardizing them for the purpose of benchmarking difficult.
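
The availability figures can be checked against the dataset tables on this page. The sketch below counts the rows marked "Yes" under Dataset Availability in each of the three tables (tallied by hand) and assumes the percentage is taken over all 83 primary studies; the variable names are hypothetical.

```python
# Rows marked "Yes" under Dataset Availability, tallied by hand per table.
available_per_table = {
    "Long Method": 5,
    "Code Clone": 4,
    "Separation of Concerns": 9,
}

PRIMARY_STUDIES = 83  # assumption: the share is computed over all primary studies

available = sum(available_per_table.values())
available_pct = round(100 * available / PRIMARY_STUDIES)
unavailable_pct = round(100 * (PRIMARY_STUDIES - available) / PRIMARY_STUDIES)

print(available, available_pct, unavailable_pct)  # 18 22 78
```

Under that assumption, 18 available datasets out of 83 studies reproduces the rounded 22% and 78% quoted above.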


The detailed information on Tools, Datasets and Original papers can be found in the following sections:

  • Download Tools
  • Download Datasets
  • Paper Reviews

    Paper Reviews


    The following list elaborates on the papers surveyed for this study (Paper Links are clickable).


    Paper Titles
    Restructuring programs by tucking statements into functions
    Automated method-extraction refactoring by using block-based slicing
    ARIES: Refactoring support environment based on code clone analysis
    ARIES: refactoring support tool for code clone
    Refactoring Support Based on Code Clone Analysis
    A metric-based approach to identifying refactoring opportunities for merging code clones in a java software system
    Star Diagram with Automated Refactorings for Eclipse
    A C++ Refactoring Browser and Method Extraction
    Breaking the barriers to successful refactoring: Observations and tools for Extract Method
    Fine slicing for advanced method extraction
    Identification of Extract Method refactoring opportunities
    Identification of Extract Method refactoring opportunities for the decomposition of methods
    Identifying fragments to be extracted from long methods
    Using software metrics to select refactoring for long method bad smell
    A visualization method of program dependency graph for identifying Extract Method opportunity
    Identifying extract-method refactoring candidates automatically
    An abstract interpretation framework for refactoring with application to extract methods with contracts
    Identifying Extract Method Opportunities Based on Variable References (S)
    Increasing clone maintenance support by unifying clone detection and refactoring activities
    Ruby refactoring plug-in for eclipse
    SPAPE: A semantic-preserving amorphous procedure extraction method for near-miss clones
    Identifying accurate refactoring opportunities using metrics
    Recommending automated Extract Method refactorings
    JExtract: An eclipse plug-in for recommending automated Extract Method refactorings
    A duplicated code refactoring advisor
    Identifying Extract Method refactoring opportunities based on functional relevance
    Deriving Extract Method refactoring suggestions for long methods
    Learning to rank extract method refactoring suggestions for long methods
    GEMS: An Extract Method refactoring recommender
    Finding extract method refactoring opportunities by analyzing development history
    Identification of extract method refactoring opportunities through analysis of variable declarations and uses
    A tool supporting postponable refactoring
    A log-linear probabilistic model for prioritizing extract method refactorings
    Refactoring opportunity identification methodology for removing long method smells and improving code analyzability
    An Investigation of the Relationship between Extract Method and Change Metrics: A Case Study of JEdit
    Automatic clone recommendation for refactoring based on the present and the past
    Proactive clone recommendation system for Extract Method refactoring
    TOAD: A tool for recommending auto-refactoring alternatives
    Improving the success rate of applying the Extract Method refactoring
    Feature requests-based recommendation of software refactorings
    Automated recommendation of software refactorings based on feature requests
    Enhancing source code refactoring detection with explanations from commit messages
    How does refactoring impact security when improving quality? a security-aware refactoring approach
    An automatic advisor for refactoring software clones based on machine learning
    The effectiveness of supervised machine learning algorithms in predicting software refactoring
    Data-driven Extract Method recommendations: A study at ING
    Comparing commit messages and source code metrics for the prediction refactoring activities
    On the documentation of refactoring types
    Mining commit messages to enhance software refactorings recommendation: A machine learning approach
    An automated Extract Method refactoring approach to correct the long method code smell
    Identifying Extract Method Refactorings
    LiveRef: a Tool for Live Refactoring Java Code
    A Live Environment to Improve the Refactoring Experience
    AntiCopyPaster: Extracting Code Duplicates As Soon As They Are Introduced in the IDE
    Just-in-time code duplicates extraction
    REMS: Recommending Extract Method Refactoring Opportunities via Multi-view Representation of Code Property Graph
    Size and Cohesion Metrics as Indicators of the Long Method Bad Smell: An Empirical Study
    Assessing the Refactoring of Brain Methods
    JDeodorant: Clone Refactoring
    Untangling: A Slice Extraction Refactoring
    How to extract differences from similar programs? A cohesion metric approach
    A study on the method of removing code duplication using code template
    Structural quality metrics as indicators of the long method bad smell: An empirical study
    Tool Support for Managing Clone Refactorings to Facilitate Code Review in Evolving Software
    Efficient method extraction for automatic elimination of type-3 clones
    Assessing the Refactorability of Software Clones
    Unification and refactoring of clones
    Clone Refactoring with Lambda Expressions
    Refactoring clones: An optimization problem
    Effective automatic procedure extraction
    Adventure of a Lifetime: Extract Method Refactoring for Rust
    An algorithm for detecting and removing clones in java code
    Extracting Code Clones for Refactoring Using Combinations of Clone Metrics
    Improving Method Extraction: A Novel Approach to Data Flow Analysis Using Boolean Flags and Expressions
    Semantics-preserving procedure extraction
    Duplication for the Removal of Duplication
    Does Automated Refactoring Obviate Systematic Editing?
    Re-Approaching the Refactoring Rubicon
    Automatic Refactoring Candidate Identification Leveraging Effective Code Representation
    Clone Detection and Elimination for Haskell
    Clone Detection and Removal for Erlang/OTP within a Refactoring Environment
    Partial Redesign of Java Software Systems Based on Clone Analysis
    Towards Automated Refactoring of Code Clones in Object-Oriented Programming Languages

    Publications

    This is a list of publications that use the CROP dataset. If you have a published piece of work that uses CROP and it is not listed below, feel free to contact us. Your publication will be included in the list soon.


    Journal Articles

    Matheus Paixao, Jens Krinke, DongGyun Han, Chaiyong Ragkhitwetsagul, Mark Harman. 2019. In IEEE Transactions on Software Engineering (TSE). Preprint


    Conference Papers

    Luca Pascarella, Davide Spadini, Fabio Palomba, Alberto Bacchelli. 2020. On The Effect Of Code Review On Code Smells. In IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER). Preprint


    Matheus Paixao, Paulo Henrique Maia. 2019. Rebasing in Code Review Considered Harmful: A Large-scale Empirical Investigation. In International Conference on Source Code Analysis and Manipulation (SCAM). Preprint


    Matheus Paixao, Jens Krinke, DongGyun Han, and Mark Harman. 2018. CROP: Linking Code Reviews to Source Code Changes. In International Conference on Mining Software Repositories (MSR). Preprint


    Contact

    You can contact the EMRS team through the following channel:


    EMRS's mailing list