This website is for the Extract Method Refactoring Datasets Survey (EMRS).
EMRS is an open-source repository of Extract Method refactoring methodologies intended to support software engineering researchers and practitioners. For each case in EMRS, we provide a summary, tools, datasets, and the original papers for Extract Method refactoring techniques. EMRS currently provides studies for 83 refactoring methods.
This survey collects and manages the relevant papers, datasets, and tools for utilization by researchers. This website was designed by Matheus Paixao, and Addl Tariq added content management as part of his MS Capstone at Rochester Institute of Technology. Dr. Mohamed Wiem Mkaouer and Dr. Eman Abdullah AlOmar contributed to the first incarnation of EMRS.
By leveraging the data contained in EMRS, software engineering researchers and practitioners can review the historical research on the Extract Method topic, and can utilize the existing studies (techniques, datasets, and tools) to improve or create new empirical studies to assess Extract Method Refactoring. Moreover, the data in EMRS is a valuable source of knowledge regarding motivation for and expansion of the Extract Method Refactoring techniques in the software engineering domain.
Here you can find useful information about the survey conducted, and download the tools and datasets.
EMRS contains links to tools for multiple Extract Method refactoring techniques. All tools can be accessed through the table below, which contains the links. The table also details the artifacts utilized by each project.
Tool | Language | No. of Metrics | Interface | Usage Guide? | Tool Link | Last Update |
---|---|---|---|---|---|---|
Tuck | Unknown | Unknown | Unknown | No | No | Unknown |
CloRT | Java | N/A | Unknown | No | No | Unknown |
Nate | Java | Unknown | Eclipse | No | No | Unknown |
CCshaper | Java | 6 | Command Line | No | No | Unknown |
Aries | Java | 6 | GUI-based | No | No | Unknown |
SDAR | Java | N/A | Eclipse | No | No | Unknown |
Unnamed | Java | N/A | Eclipse | No | No | Unknown |
Xrefactory | C++ | N/A | Unknown | Yes | Yes | 2007 |
Unnamed | Ruby | N/A | Eclipse | Yes | Yes | 2012 |
Refactoring Annotation | Java | Unknown | Eclipse | No | No | Unknown |
JDeodorant | Java | 3 | IntelliJ/ Eclipse | Yes | Yes | 2019 |
AutoMed | Java | 10 | Unknown | No | No | Unknown |
Wrangler | Erlang/OTP | N/A | GUI-based / Command line | Yes | Yes | 2023 |
HaRe | Haskell 98 | N/A | GUI-based / Command line | Yes | Yes | 2017 |
ReAF | Java | Unknown | Unknown | No | No | Unknown |
Unnamed | C# | Unknown | Visual Studio extension | No | No | Unknown |
CeDAR | Java | 2 | Eclipse | No | No | Unknown |
FTMPAT | Java | 3 | Eclipse | No | No | Unknown |
SPAPE | Procedural / Java | Unknown | Unknown | No | No | Unknown |
JExtract | Java | Unknown | Eclipse | Yes | Yes | 2016 |
DCRA | Java | 1 | Unknown | No | No | Unknown |
RASE | Java | N/A | Eclipse | Yes | Yes | 2015 |
SEMI | Java | 5 | GUI-based / Command line | Yes | Yes | 2017 |
GEMS | Java | 48 | Eclipse | Yes | No | 2017 |
PostponableRefactoring | Java | N/A | Eclipse | Yes | Yes | 2018 |
LLPM | Java | 4 | Unknown | No | No | Unknown |
PRI | Java | N/A | Eclipse | No | No | Unknown |
LMR | Java | 5 | Eclipse | No | No | Unknown |
CREC | Java | N/A | Eclipse | Yes | Yes | 2018 |
Bandago | Java | 4 | Eclipse | No | No | Unknown |
Unnamed | Java | N/A | Eclipse | No | Yes | 2019 |
Unnamed | Java | N/A | Unknown | No | No | Unknown |
CloneRefactor | Java | N/A | Command line | No | Yes | 2020 |
TOAD | Pharo | N/A | Pharo | Yes | Yes | 2019 |
Segmentation | Java | 2 | Eclipse | No | Yes | 2022 |
LiveRef | Java | 20 | IntelliJ | Yes | Yes | 2022 |
AntiCopyPaster | Java | 78 | IntelliJ | Yes | Yes | 2023 |
REM | Java | N/A | IntelliJ | Yes | Yes | 2023 |
In the above table, you will find links to directories containing csv, jsonl and zip files for the tools, raw data and datasets. For more information on the utilization of tools for Extract Method Refactoring, please review the relevant papers in the Paper Reviews section.
EMRS contains links to datasets for multiple Extract Method refactoring techniques. All datasets can be accessed through the tables below, which contain the links.
The following table details the datasets utilized for Long Method Decomposition by each project.
Study | Intent | Language | No. of Metrics | No. of Projects | Project | Dataset Availability | Validation Method |
---|---|---|---|---|---|---|---|
Tuck | Long Method | Unknown | Unknown | Unknown | Unknown | No | Proof of Concept |
JDeodorant | Long Method | Java | 3 | 1 | Violet 0.16 (LOC: 4,100/ 61 classes/ 144 methods) | No | Experiment |
AutoMed | Long Method | Java | 10 | 1 | houtReader 1.8.0 (LOC: 20,000/ 269 classes) | No | Case Study |
Meananeatra | Long Method | Java | 3 | Unknown | Unknown | No | Experiment |
Kaya & Fawcett | Long Method | C++ | N/A | Unknown | Unknown | No | Case Study |
Charalampidou | Long Method | Java | 5 | 1 | jFlex | No | Case Study |
Charalampidou | Long Method | Java | 8 | 1 | jFlex | No | Case Study |
SEMI | Long Method | Java | 5 | 5 | Wikidev, MyPlanner, MyWebMarket, JUnit, JHotDraw | Yes | Case Study |
Haas & Hummel | Long Method | Java | 2 | 3 | Agilefant (LOC: 36,116/ 2,841 methods), JabRef (LOC: 128,145/ 5,655 methods), JChart2D (LOC: 50,728/ 1,849 methods) | No | Experiment |
Haas & Hummel | Long Method | Java | 9 | 13 | Unknown | No | Experiment |
Kaya & Fawcett | Long Method | C++ | N/A | Unknown | Unknown | No | Experiment |
LLPM | Long Method | Java | 4 | 5 | Wikidev (130 total methods), SelfPlanner, MyWebMarket, JUnit, JHotDraw | No | Experiment |
LMR | Long Method | Java | 5 | 1 | JFreeChart 1.0.17 (LOC: 5,665/ 20 classes/ 552 methods) | No | Case Study |
Choi | Long Method | Java | 6 | 1 | JEdit (LOC: 97,116 - 313,706) | No | Experiment |
Bandago | Long Method | Java | 4 | 10 | Columbia 1.4 (LOC: 26,600/ 436 classes), JGraphT 0.9.0 (LOC: 14,180/ 218 classes), SportTracker 5.7 (LOC: 5,200/ 40 classes), Cayanner 4.0 (LOC: 45,000/ 533 classes), CheckStyle 6.4.1 (LOC: 60,000/ 399 classes), Jena 2.12.1 (LOC: 54,410/ 698 classes), JGroups 3.4.8 (LOC: 76,570/ 644 classes), Quartz 2.1.7 (LOC: 26,810/ 176 classes), Roller 5.1.2 (LOC: 47,460/ 452 classes), Squirrel 3.6.0 (LOC: 79,070/ 879 classes) | Yes | Case Study |
TOAD | Long Method | Pharo | N/A | 9 | GitMultipleMatrix, TestDeviator, DrTest, Regis, SmallSuiteGenerator, Roassal, Live Robot Programming, KerasBridge, GTool Documenter | Yes | Experiment |
Shahidi | Long Method | Java | Unknown | 5 | JEdit 4.5.1 (LOC: 107,212/ 1,141 classes/ 6,663 methods), FreeMind 0.9.0 (LOC: 40,933/ 696 classes/ 4,583 methods), ArgoUML 0.34 (LOC: 249,538/ 2,539 classes/ 17,485 methods), JFreeChart (LOC: 222,814/ 8,630 classes/ 619 methods), jVLT 1.3.2 (LOC: 29,161/ 420 classes/ 2,036 methods) | No | Experiment |
Segmentation | Long Method | Java | 2 | 6 | JEdit, JHotDraw, MyWebMarket, EventBus, Mockito, XData | Yes | Experiment |
LiveRef | Long Method | Java | 20 | 3 | Space Invaders, JHotDraw, Movie rental system | Yes | Experiment |
The following table details the datasets utilized for Code Clone Extraction by each project.
Study | Intent | Language | No. of Metrics | No. of Projects | Project | Dataset Availability | Validation Method |
---|---|---|---|---|---|---|---|
CloRT | Code Clone | Java | Unknown | Unknown | Unknown | No | Proof of Concept |
Komondoor & Horwitz | Code Clone | Procedural | N/A | Unknown | Unknown | No | Proof of Concept |
Komondoor & Horwitz | Code Clone | Procedural | N/A | Unknown | Unknown | No | Proof of Concept |
CCShaper | Code Clone | Java | 6 | 1 | Ant 1.6.0 (LOC: 180,000/ 627 files) | No | Case Study |
Aries | Code Clone | Java | 6 | 1 | Ant 1.6.0 (LOC: 180,000/ 627 files) | No | Case Study |
Juillerat & Hirsbrunner | Code Clone | Java | N/A | Unknown | Unknown | No | Proof of Concept |
Wrangler | Code Clone | Erlang/OTP | N/A | 3 | Wrangler (LOC: 30,872), Mnesia (LOC: 28,152), Yaws (LOC: 29,603) | No | Experiment |
HaRe | Code Clone | Haskell 98 | N/A | 13 | Previous Work | No | Case Study |
Choi | Code Clone | Java | 3 | 1 | Unknown (LOC: 110/ 296 files) | No | Case Study |
CeDAR | Code Clone | Java | 2 | 9 | Ant 1.7.0 (KLOC: 67), Columbia 1.4 (KLOC: 75), EMF 2.4.1 (KLOC: 118), Hibernate (KLOC: 209), Jakarta-JMeter 2.3.2 (KLOC: 54), JEdit 4.2 (KLOC: 51), JFreeChart 1.10.10 (KLOC: 76), JRuby (KLOC: 101), Squirrel-SQL 3.0.3 (KLOC: 141) | No | Experiment |
FTMPAT | Code Clone | Java | 3 | 1 | Ant 1.7.0 | No | Case Study |
SPAPE | Code Clone | Java/ Procedural | Unknown | 10 | Linux 2.6.6/kernel (LOC: 30,629), Unix/make 3.82 (LOC: 33,864), http 2.2.2/server (LOC: 36,926), devecot 2.0.8/src/auth (LOC: 18,243), gstreamer 0.10.31/gst (LOC: 66,637), gtk 2.91.5/gdk/x11 (LOC: 30,118), iptables 1.4.10/extensions (LOC: 19,668), nginx-0.8.15/src/core (LOC: 17,126), proftpd 1.3.3c/src (LOC: 34,404), PostgreSQL 9.0.2/src/backend/access (LOC: 605,046) | No | Experiment |
Bian | Code Clone | Java | Unknown | 5 | Linux 2.6.6/arch, Linux 2.6.6/net, Linux 2.6.6/sound/drivers, Unix/make 3.82, http2.2.2/server | No | Experiment |
JDeodorant | Code Clone | Java | N/A | 9 | Ant 1.7.0/Ant 1.9 (KLOC: 67), Columbia 1.4 (KLOC: 75), EMF 2.4.1 (KLOC: 118), JMeter 2.3.2/JMeter 2.9 (KLOC: 54), JEdit 4.2 (KLOC: 51), JFreeChart 1.0.10/JFreeChart 1.0.14 (KLOC: 76), JRuby 1.4.0/JRuby 1.7.3 (KLOC: 101), Hibernate 3.3.2 (KLOC: 209), SQuirrel SQL 3.0.3 (KLOC: 141) | No | Experiment |
DCRA | Code Clone | Java | 1 | 50 | Qualitas Corpus (v.20120401) | No | Experiment |
RASE | Code Clone | Java | N/A | 2 | Previous works | Yes | Experiment |
CREC | Code Clone | Java | N/A | 6 | Axis2 (8,723 commits), Eclipse.jdt.core (22,358), Elastic Search (14,766 commits), JFreeChart (3,603 commits), JRuby (24,434 commits), Lucene (22,061 commits) | Yes | Experiment |
PRI | Code Clone | Java | N/A | 6 | ArgoUML (LOC: 127,145/ 1,559 files), Tomcat (LOC: 215,584/ 1,537 files), Log4j (LOC: 59,499/ 817 files), Eclipse AspectJ (LOC: 107,368/ 4,758 files), JEdit (LOC: 107,368/ 561 files), JRuby (LOC: 186,514/ 1,256 files) | No | Case Study |
Ettinger | Code Clone | Java | N/A | Unknown | Previous work (59 clone pairs) | No | Proof of Concept |
Unnamed | Code Clone | Java | N/A | 2 | JFreeChart (KLOC: 260/ 990 classes), JUnit (KLOC: 43/ 449 classes) | No | Experiment |
Unnamed | Code Clone | Java | N/A | Unknown | Unknown | No | Case Study |
CloneRefactor | Code Clone | Java | N/A | 1,343 | Previous work (LOC (AVG): 980) | No | Experiment |
Sheneamer | Code Clone | Java | N/A | 6 | Previous work, netbeans (200 paired clones), eclipse-jdtcore (400 paired clones), EITC (426 paired clones), J2sdk1.4.0-javax (482 paired clones), eclipse-ant (522 paired clones), cocoon (655 paired clones) | Yes | Experiment |
AntiCopyPaster | Code Clone | Java | 78 | 13 | arthas (73,884 commits), easyexcel, camel-quarkus, commons-lang, flink, iceberg, jena, pulsar, storm, apollo, JavaGuide | Yes | Experiment |
The following table details the datasets utilized for Separation of Concerns by each project.
Study | Intent | Language | No. of Metrics | No. of Projects | Project | Dataset Availability | Validation Method |
---|---|---|---|---|---|---|---|
Maruyama | Separation of Concerns | Java | N/A | Unknown | Unknown | No | Proof of Concept |
Nate | Separation of Concerns | Java | N/A | Unknown | Unknown | No | Proof of Concept |
SDAR | Separation of Concerns | Java | N/A | Unknown | Unknown | No | Proof of Concept |
Juillerat & Hirsbrunner | Separation of Concerns | Java | N/A | Unknown | Unknown | No | Proof of Concept |
Xrefactory | Separation of Concerns | C++ | N/A | Unknown | Unknown | No | Proof of Concept |
Unnamed | Separation of Concerns | Ruby | N/A | Unknown | Unknown | No | Proof of Concept |
RefactoringAnnotation | Separation of Concerns | Java | Unknown | 5 | Azureus, GanttProject, JasperReports, Java 1.4.2 libraries | No | Experiment |
Abadi | Separation of Concerns | Java | N/A | Unknown | Unknown | No | Case Study |
Abadi | Separation of Concerns | Java | N/A | Unknown | Unknown | No | Case Study |
ReAF | Separation of Concerns | Java | Unknown | 1 | Ant 1.8.1 | No | Experiment |
Sharma | Separation of Concerns | C/C++ | N/A | 1 | CppCheck | No | Proof of Concept |
Unnamed | Separation of Concerns | C# | Unknown | Unknown | Unknown | No | Proof of Concept |
JExtract | Separation of Concerns | Java | Unknown | 12 | MyWebMarket, JUnit 3.8/ 4.10, JHotDraw 5.2, Ant 1.8.2, ArgoUML 0.34, CheckStyle 5.6, FindBugs 1.3.9, FreeMind 0.9.0, JFreeChart 1.0.013, Quartz 1.8.3, SQuirrel SQL 2.1.2, Tomcat 7.0.2 | Yes | Experiment |
GEMS | Separation of Concerns | Java | 48 | 5 | Wikidev (56 methods), SelfPlanner (25 methods), MyWebMarket (23 methods), JUnit (12 methods), JHotDraw (14 methods) | No | Experiment |
Imazato | Separation of Concerns | Java | Unknown | 5 | Ant (LOC: 260,624/ 1,532 methods), ArgoUML (LOC: 370,750/ 1,470 methods), JEdit (LOC: 187,166/ 1,066 methods), JFreeChart (LOC: 327,865/ 180 methods), Mylyn (LOC: 166,149/ 980 methods) | No | Experiment |
PostponableRefactoring | Separation of Concerns | Java | N/A | Unknown | Unknown | No | Proof of Concept |
Nyamawe | Separation of Concerns | Java | N/A | 55 | Unknown | Yes | Experiment |
Krasniqi & Cleland-Huang | Separation of Concerns | Java | N/A | 4 | Derby (KLOC: 170/ 2,382 commits), Drools (KLOC: 371/ 840 commits), Groovy (KLOC: 141/ 4,892 commits), Infinispan (KLOC: 299/ 2,349 commits) | Yes | Experiment |
Abid | Separation of Concerns | Java | 8 | 30 | Unknown | Yes | Experiment |
Aniche | Separation of Concerns | Java | 61 | 11,149 | Unknown (8.8 million commits) | Yes | Experiment |
Van der Leij | Separation of Concerns | Java | 7 | 11,149 | Previous work (8.8 million commits) | No | Experiment |
Sagar | Separation of Concerns | Java | 60 | 800 | Previous work (748,001 commits) | No | Experiment |
AlOmar | Separation of Concerns | Java | N/A | 800 | Previous work (748,001 commits) | Yes | Experiment |
Nyamawe | Separation of Concerns | Java | N/A | 65 | Previous work (7,520 commits) | No | Experiment |
Cui | Separation of Concerns | Java | N/A | Unknown | Previous work | Yes | Experiment |
REM | Separation of Concerns | Rust | N/A | 5 | petgraph (LOC: 20,157), gitoxide (LOC: 20,211), kickof (LOC: 1,502), sniffnet (LOC: 7,304), beerus (LOC: 302) | Yes | Experiment |
Palit | Separation of Concerns | Java | 61 | 410 | Previous work (55,268 commits) | Yes | Experiment |
In the above table, you will find links to directories/files containing csv, jsonl and zip files for datasets. For more information on the utilization of datasets for Extract Method Refactoring, please see the relevant papers in the Paper Reviews section.
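Since the datasets are distributed as csv and jsonl files, a minimal Python sketch for reading both formats is given below. It uses only the standard library. The field names (`method`, `loc`) and the in-memory sample data are hypothetical placeholders, not the schema of any EMRS dataset; check each dataset's own documentation for its actual columns.

```python
import csv
import io
import json


def read_jsonl(stream):
    """Yield one dict per non-empty line of a JSON Lines stream."""
    for line in stream:
        line = line.strip()
        if line:
            yield json.loads(line)


def read_csv(stream):
    """Return all rows of a CSV stream as dicts keyed by the header row."""
    return list(csv.DictReader(stream))


# Hypothetical in-memory samples standing in for downloaded files.
sample_jsonl = io.StringIO(
    '{"method": "render", "loc": 42}\n\n{"method": "parse", "loc": 17}\n'
)
sample_csv = io.StringIO("method,loc\nrender,42\nparse,17\n")

jsonl_records = list(read_jsonl(sample_jsonl))
csv_records = read_csv(sample_csv)
```

For a downloaded file, the same functions can be called on `open(path, encoding="utf-8", newline="")` in place of the `io.StringIO` objects. Note that `csv.DictReader` returns every value as a string, whereas JSON Lines preserves numeric types.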
EMRS contains a study of multiple Extract Method refactoring techniques. The literature study, experiments, and table below summarize the details of this study.
Aim: In this paper, we aim to review the current body of knowledge on existing Extract Method refactoring research and explore its limitations and potential improvement opportunities for future research efforts. Extract Method is considered one of the most widely used refactorings, but it is difficult to apply in practice because it involves low-level code changes to statements, variables, parameters, return types, etc. Hence, researchers and practitioners can become aware of the state of the art and identify new research opportunities in this context.
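To make the low-level changes concrete, here is a small illustrative sketch of an Extract Method refactoring. It is written in Python for brevity, although most of the surveyed tools target Java, and all function names are hypothetical. The loop is moved into a new function, the data it reads becomes a parameter, and the value it computes becomes the return value.

```python
# Before: one function both computes the subtotal and formats the result.
def format_invoice_before(items, tax_rate):
    subtotal = 0.0
    for price, quantity in items:
        subtotal += price * quantity
    total = subtotal * (1 + tax_rate)
    return f"Total due: {total:.2f}"


# After: the loop has been extracted into its own function.
def compute_subtotal(items):
    """Extracted method: 'items' is its parameter, the subtotal its return value."""
    subtotal = 0.0
    for price, quantity in items:
        subtotal += price * quantity
    return subtotal


def format_invoice_after(items, tax_rate):
    # The original statements are replaced by a call to the extracted method.
    total = compute_subtotal(items) * (1 + tax_rate)
    return f"Total due: {total:.2f}"
```

The refactoring preserves behavior: both versions return the same string for the same input. Deciding which statements to extract, and which variables must become parameters or return values, is the part the surveyed techniques try to automate.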
Method: We review the body of knowledge related to Extract Method refactoring in the form of a systematic literature review (SLR). After compiling an initial pool of 1,367 papers, we conducted a systematic selection, and our final pool included 83 primary studies. We define three sets of research questions and systematically develop and refine a classification schema based on several criteria, including their methodology, applicability, and degree of automation.
Results: The results construct a catalog of 83 Extract Method approaches indicating that several techniques have been proposed in the literature. Our results show that: (i) 38.6% of Extract Method refactoring studies primarily focus on addressing code clones; (ii) several of the Extract Method tools incorporate the developer’s involvement in the decision-making process when applying the method extraction; and (iii) the existing benchmarks are heterogeneous and do not contain the same type of information, making standardizing them for the purpose of benchmarking difficult.
Conclusions: Our study serves as an “index” to the body of knowledge in this area for researchers and practitioners in determining the Extract Method refactoring approach that is most appropriate for their needs. Our findings also empower the community with information to guide future refactoring tool development.
We drive our study using the following research questions:
RQ1: What approaches were considered by the PSs to recommend Extract Method refactoring?
RQ2: What are the main characteristics of Extract Method recommendation tools?
RQ3: What are the datasets, and benchmarks used for evaluating and validating Extract Method recommendation tools?
RQ1. What approaches were considered by the PSs to recommend Extract Method refactoring?
The Sankey diagram in the figure below provides detailed mappings between our three dimensions. Separation of Concerns is the most popular intent driving method extraction, followed by Long Method, and finally Code Clone. Interestingly, this is not matched in terms of the toolset: the highest ratio of tools goes to Long Method with 41.2%, and the rest is split equally between Code Clone and Separation of Concerns with 29.4% each. This observation caught our attention, particularly because Separation of Concerns is the only category that relies on all existing detection techniques and has its own unique one, i.e., Evolutionary-based, and yet little of this research has been turned into practical tools. As for Detection, it is no surprise that rule-based techniques are the most popular for identifying need-to-refactor code fragments. This is inherited from how research treats refactoring as a natural response to code smells, e.g., Long Method.
Each research paper's approach methodology, type of data, decision metrics, and evaluation metrics are detailed below:
Study | Year | Intent | Design Property | Representation | Detection | Execution | Semi-Automation | Validation Method |
---|---|---|---|---|---|---|---|---|
Lakhotia & Deprez | 1998 | Long Method | Semantic | Graphs | Manual | Semi-automated | Suggest Alternatives | Proof of Concept |
Balazinska | 1999 | Code Clone | Syntactic | AST | Fully automated | Fully automated | N/A | Proof of Concept |
Komondoor & Horwitz | 2000 | Code Clone | Semantic | Graphs | Manual | Fully automated | N/A | Proof of Concept |
Maruyama | 2001 | Separation of Concerns | Semantic | Graphs | Manual | Semi-automated | Choose Candidates | Proof of Concept |
Komondoor & Horwitz | 2003 | Code Clone | Semantic | Graphs | Manual | Fully automated | N/A | Proof of Concept |
Ettinger & Verbaere | 2004 | Separation of Concerns | Semantic | Graphs | Manual | Fully automated | N/A | Proof of Concept |
Higo | 2004 | Code Clone | Textual | Source Code | Fully automated | Semi-automated | Choose Candidates | Case Study |
Higo | 2004 | Code Clone | Semantic | Graphs | Fully automated | Fully automated | N/A | Case Study |
Higo | 2005 | Code Clone | Textual | Source Code | Fully automated | Semi-automated | Execute on Approval | Case Study |
Higo | 2008 | Code Clone | Textual | Source Code | Fully automated | Semi-automated | Execute on Approval | Case Study |
O’Connor | 2005 | Separation of Concerns | Syntactic | AST | Semi-automated | Semi-automated | Suggest Alternatives | Proof of Concept |
Juillerat & Hirsbrunner | 2006 | Code Clone | Syntactic | AST | Fully automated | Fully automated | N/A | Proof of Concept |
Juillerat & Hirsbrunner | 2007 | Separation of Concerns | Syntactic | AST | Manual | Fully automated | N/A | Proof of Concept |
Vittek | 2007 | Separation of Concerns | Syntactic | AST | Manual | Semi-automated | User Input | Proof of Concept |
Corbat | 2007 | Separation of Concerns | Syntactic | AST | Manual | Semi-automated | Choose Candidates | Proof of Concept |
Murphy-Hill & Black | 2008 | Separation of Concerns | Textual | Source Code | Manual | Semi-automated | Choose Candidates | Experiment |
Abadi | 2008 | Separation of Concerns | Textual | Source Code | Manual | Fully automated | N/A | Case Study |
Abadi | 2009 | Separation of Concerns | Textual | Source Code | Manual | Fully automated | N/A | Case Study |
Tsantalis & Chatzigeorgiou | 2009 | Long Method | Textual | Source Code | Fully automated | Semi-automated | Suggest Alternatives | Experiment |
Tsantalis & Chatzigeorgiou | 2011 | Long Method | Textual | Source Code | Fully automated | Semi-automated | Suggest Alternatives | Experiment |
Yang | 2009 | Long Method | Textual | Source Code | Manual | Semi-automated | Suggest Alternatives | Case Study |
Li & Thompson | 2009 | Code Clone | Hybrids | AST & Tokens | Manual | Semi-automated | Suggest Alternatives | Case Study |
Brown & Thompson | 2010 | Code Clone | Hybrids | AST & Tokens | Manual | Semi-automated | Suggest Alternatives | Case Study |
Kanemitsu | 2011 | Separation of Concerns | Semantic | Graphs | Manual | Semi-automated | Suggest Alternatives | Experiment |
Meananeatra | 2011 | Long Method | Syntactic | Metrics | Manual | Semi-automated | Suggest Alternatives | Proof of Concept |
Choi | 2011 | Code Clone | Lexical | Tokens | Fully automated | Manual | N/A | Case Study |
Sharma | 2012 | Separation of Concerns | Semantic | Graphs | Manual | Fully automated | N/A | Proof of Concept |
Cousot | 2012 | Separation of Concerns | Textual | Source Code | Manual | Fully automated | N/A | Proof of Concept |
Tairas & Gray | 2012 | Code Clone | Syntactic | AST | Fully automated | Semi-automated | Choose Candidates | Experiment |
Kaya & Fawcett | 2013 | Long Method | Textual | Source Code | Fully automated | Manual | N/A | Experiment |
Goto | 2013 | Code Clone | Syntactic | AST | Manual | Fully automated | N/A | Case Study |
Bian | 2013 | Code Clone | Hybrids | AST & Graphs | Manual | Fully automated | N/A | Experiment |
Bian | 2014 | Code Clone | Syntactic | Metrics | Fully automated | Manual | N/A | Experiment |
Krishnan & Tsantalis | 2013 | Code Clone | Textual | Source Code | Fully automated | Semi-automated | User Input | Experiment |
Krishnan & Tsantalis | 2014 | Code Clone | Hybrids | AST & Graphs | Fully automated | Semi-automated | User Input | Experiment |
Tsantalis | 2015 | Code Clone | Hybrids | AST & Source Code & Tokens | Fully automated | Semi-automated | User Input | Experiment |
Mazinanian | 2016 | Code Clone | Hybrids | AST & Source Code & Tokens | Fully automated | Semi-automated | User Input | Experiment |
Tsantalis | 2017 | Code Clone | Hybrids | AST & Source Code & Tokens | Fully automated | Semi-automated | User Input | Experiment |
Silva | 2014 | Separation of Concerns | Textual | Source Code | Fully automated | Semi-automated | Suggest Alternatives | Experiment |
Silva | 2015 | Separation of Concerns | Textual | Source Code | Fully automated | Semi-automated | Suggest Alternatives | Experiment |
Fontana | 2015 | Code Clone | Hybrids | AST & Source Code | Fully automated | Semi-automated | Suggest Alternatives | Experiment |
Meng | 2015 | Code Clone | Syntactic | AST | Fully automated | Fully automated | N/A | Experiment |
Charalampidou | 2015 | Long Method | Syntactic | Metrics | Fully automated | Fully automated | N/A | Case Study |
Charalampidou | 2016 | Long Method | Syntactic | AST & Metrics | Fully automated | Fully automated | N/A | Case Study |
Charalampidou | 2018 | Long Method | Syntactic | Metrics | Fully automated | Fully automated | N/A | Case Study |
Haas & Hummel | 2016 | Long Method | Hybrids | Source Code & Graphs | Manual | Semi-automated | Suggest Alternatives | Experiment |
Haas & Hummel | 2017 | Long Method | Hybrids | Source Code & Graphs | Manual | Semi-automated | Select Alternatives | Experiment |
Xu | 2017 | Separation of Concerns | Textual | Source Code | Fully automated | Semi-automated | Choose Candidates | Experiment |
Imazato | 2017 | Separation of Concerns | Textual | Source Code | Fully automated | Manual | N/A | Experiment |
Kaya & Fawcett | 2017 | Long Method | Semantic | Graphs | Fully automated | Fully automated | N/A | Experiment |
Maruyama & Hayashi | 2017 | Separation of Concerns | Textual | Source Code | Manual | Semi-automated | Choose Candidates | Proof of Concept |
Xu | 2017 | Long Method | Syntactic | Metrics | Fully automated | Manual | N/A | Experiment |
Chen | 2017 | Code Clone | Syntactic | AST | Manual | Fully automated | N/A | Case Study |
Ettinger & Tyszberowicz | 2016 | Code Clone | Textual | Source Code | Manual | Fully automated | N/A | Proof of Concept |
Ettinger | 2017 | Code Clone | Semantic | Graphs | Manual | Fully automated | N/A | Proof of Concept |
Meananeatra | 2018 | Long Method | Hybrids | AST & Graphs | Manual | Semi-automated | Execute on Approval | Case Study |
Choi | 2018 | Long Method | Syntactic | Metrics | Fully automated | Manual | N/A | Experiment |
Yue | 2018 | Code Clone | Syntactic | AST | Fully automated | Manual | N/A | Experiment |
Vidal | 2018 | Long Method | Textual | Source Code | Fully automated | Semi-automated | Choose Candidates | Case Study |
Yoshida | 2019 | Code Clone | Hybrids | AST & Tokens | Fully automated | Semi-automated | Choose Candidates | Experiment |
Shin | 2019 | Code Clone | Syntactic | AST | Fully automated | Fully automated | N/A | Case Study |
Barrs & Oprescu | 2019 | Code Clone | Hybrids | AST & Graphs | Fully automated | Manual | N/A | Experiment |
Antezana | 2019 | Long Method | Textual | Source Code | Manual | Semi-automated | Choose Candidates | Experiment |
Alcocer | 2020 | Long Method | Textual | Source Code | Manual | Semi-automated | Choose Candidates | Experiment |
Nyamawe | 2019 | Separation of Concerns | Textual | Text | Fully automated | Manual | N/A | Experiment |
Nyamawe | 2020 | Separation of Concerns | Textual | Text | Fully automated | Manual | N/A | Experiment |
Krasniqi & Cleland-Huang | 2020 | Separation of Concerns | Textual | Text | Fully automated | Manual | N/A | Experiment |
Abid | 2020 | Separation of Concerns | Textual | Source Code | Manual | Semi-automated | User Input | Experiment |
Sheneamer | 2020 | Code Clone | Hybrids | AST & Graphs & Tokens | Fully automated | Manual | Choose Candidates | Experiment |
Aniche | 2020 | Separation of Concerns | Syntactic | Metrics | Fully automated | Manual | N/A | Experiment |
Van der Leij | 2021 | Separation of Concerns | Syntactic | Metrics | Fully automated | Manual | N/A | Experiment |
Sagar | 2021 | Separation of Concerns | Hybrids | Text & Metrics | Fully automated | Manual | N/A | Experiment |
AlOmar | 2022 | Separation of Concerns | Textual | Text | Fully automated | Manual | N/A | Experiment |
Nyamawe | 2022 | Separation of Concerns | Textual | Text | Fully automated | Manual | N/A | Experiment |
Shahidi | 2022 | Long Method | Hybrids | Graphs & Metrics | Fully automated | Fully automated | N/A | Experiment |
Tiwari & Joshi | 2022 | Long Method | Semantic | Graphs | Fully automated | Manual | N/A | Experiment |
Fernandes | 2022 | Long Method | Syntactic | Metrics | Fully automated | Semi-automated | Execute on Approval | Experiment |
Fernandes | 2022 | Long Method | Syntactic | Metrics | Fully automated | Semi-automated | Execute on Approval | Experiment |
AlOmar | 2022 | Code Clone | Syntactic | Metrics | Fully automated | Semi-automated | Execute on Approval | Experiment |
AlOmar | 2023 | Code Clone | Syntactic | Metrics | Fully automated | Semi-automated | Execute on Approval | Experiment |
Cui | 2023 | Separation of Concerns | Semantic | Graphs | Fully automated | Manual | N/A | Experiment |
Thy | 2023 | Separation of Concerns | Textual | Source Code | Fully automated | Fully automated | N/A | Case Study |
Palit | 2023 | Separation of Concerns | Semantic | Graphs | Fully automated | Manual | N/A | Experiment |
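The table above can be tallied by any column with a few lines of code. The following Python sketch parses pipe-delimited rows and counts studies per intent. Only the first four rows of the table are embedded here as a sample, so the counts it produces are for that sample only. The helper names (`parse_rows`, `share`) are illustrative, and the figure of 32 Code Clone studies passed to `share` comes from a hand count of the rows above, so treat it as an assumption.

```python
from collections import Counter

# Sample: the first four rows of the table above, copied verbatim.
ROWS = """\
Lakhotia & Deprez | 1998 | Long Method | Semantic | Graphs | Manual | Semi-automated | Suggest Alternatives | Proof of Concept |
Balazinska | 1999 | Code Clone | Syntactic | AST | Fully automated | Fully automated | N/A | Proof of Concept |
Komondoor & Horwitz | 2000 | Code Clone | Semantic | Graphs | Manual | Fully automated | N/A | Proof of Concept |
Maruyama | 2001 | Separation of Concerns | Semantic | Graphs | Manual | Semi-automated | Choose Candidates | Proof of Concept |
"""

COLUMNS = [
    "Study", "Year", "Intent", "Design Property", "Representation",
    "Detection", "Execution", "Semi-Automation", "Validation Method",
]


def parse_rows(text):
    """Split each pipe-delimited line into a dict keyed by column name."""
    records = []
    for line in text.splitlines():
        cells = [cell.strip() for cell in line.strip().rstrip("|").split("|")]
        if len(cells) == len(COLUMNS):  # skip blank or malformed lines
            records.append(dict(zip(COLUMNS, cells)))
    return records


def share(count, total):
    """Percentage of count in total, rounded to one decimal place."""
    return round(100 * count / total, 1)


records = parse_rows(ROWS)
intent_counts = Counter(record["Intent"] for record in records)

# Assumption: 32 Code Clone rows out of 83 studies (hand count of the table).
code_clone_share = share(32, 83)
```

Pasting all 83 rows into `ROWS` would let the same code reproduce the per-intent distribution, and swapping `"Intent"` for `"Detection"` or `"Execution"` tallies the other dimensions.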
RQ2. What are the main characteristics of Extract Method recommendation tools?
To help select an appropriate Extract Method refactoring tool, we report the following main characteristics that can be considered to make an informed decision about tool usage:
Out of the 83 primary studies analyzed, almost 78% of the datasets are unavailable to the public, with only 22% available online, which means there is a lack of online datasets for Extract Method refactoring research. Primary studies have mostly employed small or medium-scale open-source applications, often developed in Java, typically containing fewer than 225,000 lines of code. These datasets are heterogeneous and do not contain the same type of information, making standardizing them for the purpose of benchmarking difficult.
The detailed information on Tools, Datasets and Original papers can be found in the following sections:
The following list elaborates on the papers surveyed for this study (Paper Links are clickable).
You can contact the EMRS team through the following channel: