Extract Method Refactoring Survey

Welcome

This website is for the Extract Method Refactoring Datasets Survey (EMRS).

EMRS is an open-source repository of Extract Method refactoring methodologies, intended to support software engineering researchers and practitioners. For each case in EMRS, we provide a summary, tools, datasets, and the original papers for Extract Method refactoring techniques. EMRS currently provides studies for 83 refactoring methods.

About

This survey collects and manages the relevant papers, datasets, and tools for utilization by researchers. This website was designed by Matheus Paixao, and Addl Tariq added content management as part of his MS Capstone at Rochester Institute of Technology. Dr. Mohamed Wiem Mkaouer and Dr. Eman Abdullah AlOmar contributed to the first incarnation of EMRS.

By leveraging the data contained in EMRS, software engineering researchers and practitioners can review the historical research on the Extract Method topic, and can utilize the existing studies (techniques, datasets, and tools) to improve or create new empirical studies to assess Extract Method Refactoring. Moreover, the data in EMRS is a valuable source of knowledge regarding motivation for and expansion of the Extract Method Refactoring techniques in the software engineering domain.

Here you can find useful information for understanding the survey conducted, and download the tools and datasets.

Download Tools

EMRS contains links to tools for multiple Extract Method refactoring techniques. All tools can be accessed through the links in the table below, which details the artifacts used by each project.

Tool	Language	No. of Metrics	Interface	Usage Guide?	Tool Link	Last Update
Tuck	Unknown	Unknown	Unknown	No	No	Unknown
CloRT	Java	N/A	Unknown	No	No	Unknown
Nate	Java	Unknown	Eclipse	No	No	Unknown
CCshaper	Java	6	Command Line	No	No	Unknown
Aries	Java	6	GUI-based	No	No	Unknown
SDAR	Java	N/A	Eclipse	No	No	Unknown
Unnamed	Java	N/A	Eclipse	No	No	Unknown
Xrefactory	C++	N/A	Unknown	Yes	Yes	2007
Unnamed	Ruby	N/A	Eclipse	Yes	Yes	2012
Refactoring Annotation	Java	Unknown	Eclipse	No	No	Unknown
JDeodorant	Java	3	IntelliJ/ Eclipse	Yes	Yes	2019
AutoMed	Java	10	Unknown	No	No	Unknown
Wrangler	Erlang/OTP	N/A	GUI-based / Command line	Yes	Yes	2023
HaRe	Haskell 98	N/A	GUI-based / Command line	Yes	Yes	2017
ReAF	Java	Unknown	Unknown	No	No	Unknown
Unnamed	C#	Unknown	Visual Studio extension	No	No	Unknown
CeDAR	Java	2	Eclipse	No	No	Unknown
FTMPAT	Java	3	Eclipse	No	No	Unknown
SPAPE	Procedural / Java	Unknown	Unknown	No	No	Unknown
JExtract	Java	Unknown	Eclipse	Yes	Yes	2016
DCRA	Java	1	Unknown	No	No	Unknown
RASE	Java	N/A	Eclipse	Yes	Yes	2015
SEMI	Java	5	GUI-based / Command line	Yes	Yes	2017
GEMS	Java	48	Eclipse	Yes	No	2017
PostponableRefactoring	Java	N/A	Eclipse	Yes	Yes	2018
LLPM	Java	4	Unknown	No	No	Unknown
PRI	Java	N/A	Eclipse	No	No	Unknown
LMR	Java	5	Eclipse	No	No	Unknown
CREC	Java	N/A	Eclipse	Yes	Yes	2018
Bandago	Java	4	Eclipse	No	No	Unknown
Unnamed	Java	N/A	Eclipse	No	Yes	2019
Unnamed	Java	N/A	Unknown	No	No	Unknown
CloneRefactor	Java	N/A	Command line	No	Yes	2020
TOAD	Pharo	N/A	Pharo	Yes	Yes	2019
Segmentation	Java	2	Eclipse	No	Yes	2022
LiveRef	Java	20	IntelliJ	Yes	Yes	2022
AntiCopyPaster	Java	78	IntelliJ	Yes	Yes	2023
REM	Rust	N/A	IntelliJ	Yes	Yes	2023

In the above table, you will find links to directories containing csv, jsonl and zip files for the tools, raw data and datasets. For more information on the utilization of tools for Extract Method Refactoring, please review the relevant papers in the Paper Reviews section.
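For readers who copy the tool table into a tab-separated file, a minimal Python sketch along the following lines could list the tools that have a public link, most recently updated first. The header and three sample rows are adapted from the table above; the function name `linked_tools` and the choice to keep the table as TSV text are assumptions made for illustration and are not part of EMRS.

```python
import csv
import io

# Header and three rows adapted from the tool table, kept as tab-separated text.
SAMPLE_TSV = (
    "Tool\tLanguage\tNo of Metric\tInterface\tUsage Guide?\tTool Link\tLast Update\n"
    "CCshaper\tJava\t6\tCommand Line\tNo\tNo\tUnknown\n"
    "JExtract\tJava\tUnknown\tEclipse\tYes\tYes\t2016\n"
    "LiveRef\tJava\t20\tIntelliJ\tYes\tYes\t2022\n"
)


def linked_tools(tsv_text):
    """Return (tool, last update) pairs for rows whose 'Tool Link' is 'Yes', newest first."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    rows = [row for row in reader if row["Tool Link"] == "Yes"]

    def year(row):
        # Rows with an unknown update year sort last.
        value = row["Last Update"]
        return int(value) if value.isdigit() else 0

    rows.sort(key=year, reverse=True)
    return [(row["Tool"], row["Last Update"]) for row in rows]


print(linked_tools(SAMPLE_TSV))
```

The same function should work on the full table once it is saved with its header row, since it only relies on the Tool, Tool Link, and Last Update columns.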

Download Datasets

EMRS contains links to datasets for multiple Extract Method refactoring techniques. All datasets can be accessed through the links in the tables below.

The following table details the datasets used by each project for Long Method decomposition.

Study	Intent	Language	No. of Metrics	No. of Projects	Projects	Dataset Availability	Validation Method
Tuck	Long Method	Unknown	Unknown	Unknown	Unknown	No	Proof of Concept
JDeodorant	Long Method	Java	3	1	Violet 0.16 (LOC: 4,100/ 61 classes/ 144 methods)	No	Experiment
AutoMed	Long Method	Java	10	1	houtReader 1.8.0 (LOC: 20,000/ 269 classes)	No	Case Study
Meananeatra	Long Method	Java	3	Unknown	Unknown	No	Experiment
Kaya & Fawcett	Long Method	C++	N/A	Unknown	Unknown	No	Case Study
Charalampidou	Long Method	Java	5	1	jFlex	No	Case Study
Charalampidou	Long Method	Java	8	1	jFlex	No	Case Study
SEMI	Long Method	Java	5	5	Wikidev, MyPlanner, MyWebMarket, JUnit, JHotDraw	Yes	Case Study
Haas & Hummel	Long Method	Java	2	3	Agilefant (LOC: 36,116/ 2,841 methods), JabRef (LOC: 128,145/ 5,655 methods), JChart2D (LOC: 50,728/ 1,849 methods)	No	Experiment
Haas & Hummel	Long Method	Java	9	13	Unknown	No	Experiment
Kaya & Fawcett	Long Method	C++	N/A	Unknown	Unknown	No	Experiment
LLPM	Separation of Concerns	Java	4	5	Wikidev (130 total methods), SelfPlanner, MyWebMarket, JUnit, JHotDraw	No	Experiment
LMR	Long Method	Java	5	1	JFreeChart 1.0.17 (LOC: 5,665/ 20 classes/ 552 methods)	No	Case Study
Choi	Long Method	Java	6	1	JEdit (LOC: 97,116 - 313,706)	No	Experiment
Bandago	Long Method	Java	4	10	Columbia 1.4 (LOC: 26,600/ 436 classes), JGraphT 0.9.0 (LOC: 14,180/ 218 classes), SportTracker 5.7 (LOC: 5,200/ 40 classes), Cayanner 4.0 (LOC: 45,000/ 533 classes), CheckStyle 6.4.1 (LOC: 60,000/ 399 classes), Jena 2.12.1 (LOC: 54,410/ 698 classes), JGroups 3.4.8 (LOC: 76,570/ 644 classes), Quartz 2.1.7 (LOC: 26,810/ 176 classes), Roller 5.1.2 (LOC: 47,460/ 452 classes), Squirrel 3.6.0 (LOC: 79,070/ 879 classes)	Yes	Case Study
TOAD	Long Method	Pharo	N/A	9	GitMultipleMatrix, TestDeviator, DrTest, Regis, SmallSuiteGenerator, Roassal, Live Robot Programming, KerasBridge, GTool Documenter	Yes	Experiment
Shahidi	Long Method	Java	Unknown	5	JEdit 4.5.1 (LOC: 107,212/ 1,141 classes/ 6,663 methods), FreeMind 0.9.0 (LOC: 40,933/ 696 classes/ 4,583 methods), ArgoUML 0.34 (LOC: 249,538/ 2,539 classes/ 17,485 methods), JFreeChart (LOC: 222,814/ 8,630 classes/ 619 methods), jVLT 1.3.2 (LOC: 29,161/ 420 classes/ 2,036 methods)	No	Experiment
Segmentation	Long Method	Java	2	6	JEdit, JHotDraw, MyWebMarket, EventBus, Mockito, XData	Yes	Experiment
LiveRef	Long Method	Java	20	3	Space Invaders, JHotDraw, Movie rental system	Yes	Experiment

The following table details the datasets used by each project for Code Clone extraction.

Study	Intent	Language	No. of Metrics	No. of Projects	Projects	Dataset Availability	Validation Method
CloRT	Code Clone	Java	Unknown	Unknown	Unknown	No	Proof of Concept
Komondoor & Horwitz	Code Clone	Procedural	N/A	Unknown	Unknown	No	Proof of Concept
Komondoor & Horwitz	Code Clone	Procedural	N/A	Unknown	Unknown	No	Proof of Concept
CCShaper	Code Clone	Java	6	1	Ant 1.6.0 (LOC: 180,000/ 627 files)	No	Case Study
Aries	Code Clone	Java	6	1	Ant 1.6.0 (LOC: 180,000/ 627 files)	No	Case Study
Juillerat & Hirsbrunner	Code Clone	Java	N/A	Unknown	Unknown	No	Proof of Concept
Wrangler	Code Clone	Erlang/OTP	N/A	3	Wrangler (LOC: 30,872), Mnesia (LOC: 28,152), Yaws (LOC: 29,603)	No	Experiment
HaRe	Code Clone	Haskell 98	N/A	13	Previous Work	No	Case Study
Choi	Code Clone	Java	3	1	Unknown (LOC: 110/ 296 files)	No	Case Study
CeDAR	Code Clone	Java	2	9	Ant 1.7.0 (KLOC: 67), Columbia 1.4 (KLOC: 75), EMF 2.4.1 (KLOC: 118), Hibernate (KLOC: 209), Jakarta-JMeter 2.3.2 (KLOC: 54), JEdit 4.2 (KLOC: 51), JFreeChart 1.0.10 (KLOC: 76), JRuby (KLOC: 101), Squirrel-SQL 3.0.3 (KLOC: 141)	No	Experiment
FTMPAT	Code Clone	Java	3	1	Ant 1.7.0	No	Case Study
SPAPE	Code Clone	Java/ Procedural	Unknown	10	Linux 2.6.6/kernel (LOC: 30,629), Unix/make 3.82 (LOC: 33,864), http 2.2.2/server (LOC: 36,926), dovecot 2.0.8/src/auth (LOC: 18,243), gstreamer 0.10.31/gst (LOC: 66,637), gtk 2.91.5/gdk/x11 (LOC: 30,118), iptables 1.4.10/extensions (LOC: 19,668), nginx-0.8.15/src/core (LOC: 17,126), proftpd 1.3.3c/src (LOC: 34,404), PostgreSQL 9.0.2/src/backend/access (LOC: 605,046)	No	Experiment
Bian	Code Clone	Java	Unknown	5	Linux 2.6.6/arch, Linux 2.6.6/net, Linux 2.6.6/sound/drivers, Unix/make 3.82, http2.2.2/server	No	Experiment
JDeodorant	Code Clone	Java	N/A	9	Ant 1.7.0/Ant 1.9 (KLOC: 67), Columbia 1.4 (KLOC: 75), EMF 2.4.1 (KLOC: 118), JMeter 2.3.2/JMeter 2.9 (KLOC: 54), JEdit 4.2 (KLOC: 51), JFreeChart 1.0.10/JFreeChart 1.0.14 (KLOC: 76), JRuby 1.4.0/JRuby 1.7.3 (KLOC: 101), Hibernate 3.3.2 (KLOC: 209), SQuirrel SQL 3.0.3 (KLOC: 141)	No	Experiment
DCRA	Code Clone	Java	1	50	Qualitas Corpus (v.20120401)	No	Experiment
RASE	Code Clone	Java	N/A	2	Previous works	Yes	Experiment
CREC	Code Clone	Java	N/A	6	Axis2 (8,723 commits), Eclipse.jdt.core (22,358), Elastic Search (14,766 commits), JFreeChart (3,603 commits), JRuby (24,434 commits), Lucene (22,061 commits)	Yes	Experiment
PRI	Code Clone	Java	N/A	6	ArgoUML (LOC: 127,145/ 1,559 files), Tomcat (LOC: 215,584/ 1,537 files), Log4j (LOC: 59,499/ 817 files), Eclipse AspectJ (LOC: 107,368/ 4,758 files), JEdit (LOC: 107,368/ 561 files), JRuby (LOC: 186,514/ 1,256 files)	No	Case Study
Ettinger	Code Clone	Java	N/A	Unknown	Previous work (59 clone pairs)	No	Proof of Concept
Unnamed	Code Clone	Java	N/A	2	JFreeChart (KLOC: 260/ 990 classes), JUnit (KLOC: 43/ 449 classes)	No	Experiment
Unnamed	Code Clone	Java	N/A	Unknown	Unknown	No	Case Study
CloneRefactor	Code Clone	Java	N/A	1,343	Previous work (LOC (AVG): 980)	No	Experiment
Sheneamer	Code Clone	Java	N/A	6	Previous work, netbeans (200 paired clones), eclipse-jdtcore (400 paired clones), EITC (426 paired clones), J2sdk1.4.0-javax (482 paired clones), eclipse-ant (522 paired clones), cocoon (655 paired clones)	Yes	Experiment
AntiCopyPaster	Code Clone	Java	78	13	arthas (73,884 commits), easyexcel, camel-quarkus, commons-lang, flink, iceberg, jena, pulsar, storm, apollo, JavaGuide	Yes	Experiment

The following table details the datasets used by each project for Separation of Concerns.

Study	Intent	Language	No. of Metrics	No. of Projects	Projects	Dataset Availability	Validation Method
Maruyama	Separation of Concerns	Java	N/A	Unknown	Unknown	No	Proof of Concept
Nate	Separation of Concerns	Java	N/A	Unknown	Unknown	No	Proof of Concept
SDAR	Separation of Concerns	Java	N/A	Unknown	Unknown	No	Proof of Concept
Juillerat & Hirsbrunner	Code Clone	Java	N/A	Unknown	Unknown	No	Proof of Concept
Xrefactory	Separation of Concerns	C++	N/A	Unknown	Unknown	No	Proof of Concept
Unnamed	Separation of Concerns	Ruby	N/A	Unknown	Unknown	No	Proof of Concept
RefactoringAnnotation	Separation of Concerns	Java	Unknown	5	Azureus, GanttProject, JasperReports, Java 1.4.2 libraries	No	Experiment
Abadi	Separation of Concerns	Java	N/A	Unknown	Unknown	No	Case Study
Abadi	Separation of Concerns	Java	N/A	Unknown	Unknown	No	Case Study
ReAF	Separation of Concerns	Java	Unknown	1	Ant 1.8.1	No	Experiment
Sharma	Separation of Concerns	C/C++	N/A	1	CppCheck	No	Proof of Concept
Unnamed	Separation of Concerns	C#	Unknown	Unknown	Unknown	No	Proof of Concept
JExtract	Separation of Concerns	Java	Unknown	12	MyWebMarket, JUnit 3.8/ 4.10, JHotDraw 5.2, Ant 1.8.2, ArgoUML 0.34, CheckStyle 5.6, FindBugs 1.3.9, FreeMind 0.9.0, JFreeChart 1.0.013, Quartz 1.8.3, SQuirrel SQL 2.1.2, Tomcat 7.0.2	Yes	Experiment
GEMS	Separation of Concerns	Java	48	5	Wikidev (56 methods), SelfPlanner (25 methods), MyWebMarket (23 methods), JUnit (12 methods), JHotDraw (14 methods)	No	Experiment
Imazato	Separation of Concerns	Java	Unknown	5	Ant (LOC: 260,624/ 1,532 methods), ArgoUML (LOC: 370,750/ 1,470 methods), JEdit (LOC: 187,166/ 1,066 methods), JFreeChart (LOC: 327,865/ 180 methods), Mylyn (LOC: 166,149/980 methods)	No	Experiment
PostponableRefactoring	Separation of Concerns	Java	N/A	Unknown	Unknown	No	Proof of Concept
Nyamawe	Separation of Concerns	Java	N/A	55	Unknown	Yes	Experiment
Krasniqi & Cleland-Huang	Separation of Concerns	Java	N/A	4	Derby (KLOC: 170/ 2,382 commits), Drools (KLOC: 371/ 840 commits), Groovy (KLOC: 141/ 4,892 commits), Infinispan (KLOC: 299/ 2,349 commits)	Yes	Experiment
Abid	Separation of Concerns	Java	8	30	Unknown	Yes	Experiment
Aniche	Separation of Concerns	Java	61	11,149	Unknown (8.8 million commits)	Yes	Experiment
Van der Leij	Separation of Concerns	Java	7	11,149	Previous work (8.8 million commits)	No	Experiment
Sagar	Separation of Concerns	Java	60	800	Previous work (748,001 commits)	No	Experiment
AlOmar	Separation of Concerns	Java	N/A	800	Previous work (748,001 commits)	Yes	Experiment
Nyamawe	Separation of Concerns	Java	N/A	65	Previous work (7,520 commits)	No	Experiment
Cui	Separation of Concerns	Java	N/A	Unknown	Previous work	Yes	Experiment
REM	Separation of Concerns	Rust	N/A	5	petgraph (LOC: 20,157), gitoxide (LOC: 20,211), kickoff (LOC: 1,502), sniffnet (LOC: 7,304), beerus (LOC: 302)	Yes	Experiment
Palit	Separation of Concerns	Java	61	410	Previous work (55,268 commits)	Yes	Experiment

In the above tables, you will find links to directories/files containing csv, jsonl and zip files for datasets. For more information on the utilization of datasets for Extract Method Refactoring, please see the relevant papers in the Paper Reviews section.
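Since several of the linked datasets are distributed as jsonl files (one JSON object per line), a short Python sketch of reading such a file may be useful. The field names in the sample records (`project`, `method`, `label`) are hypothetical; the actual schema depends on each dataset and should be checked against its own documentation.

```python
import json


def parse_jsonl(lines):
    """Parse an iterable of JSON Lines strings into a list of Python objects."""
    records = []
    for line in lines:
        line = line.strip()
        if line:  # skip blank lines
            records.append(json.loads(line))
    return records


# Hypothetical records; real field names depend on the dataset.
sample = [
    '{"project": "JHotDraw", "method": "draw", "label": 1}',
    "",
    '{"project": "JUnit", "method": "run", "label": 0}',
]
records = parse_jsonl(sample)
print(len(records), records[0]["project"])
```

To read a downloaded file instead of the in-memory sample, the same function can be passed an open file handle, for example `with open(path, encoding="utf-8") as fh: records = parse_jsonl(fh)`, where `path` is wherever the file was saved.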

Study Summary

EMRS contains a study of multiple Extract Method refactoring techniques. The literature study, experiments, and tables below summarize the details of this study.

Behind the Intent of Extract Method Refactoring: A Systematic Literature Review

Aim: In this paper, we aim to review the current body of knowledge on existing Extract Method refactoring research and explore their limitations and potential improvement opportunities for future research efforts. That is, the Extract Method is considered one of the most widely-used refactorings, but difficult to apply in practice as it involves low-level code changes such as statements, variables, parameters, return types, etc. Hence, researchers and practitioners begin to be aware of the state-of-the-art and identify new research opportunities in this context.

Method: We review the body of knowledge related to Extract Method refactoring in the form of a systematic literature review (SLR). After compiling an initial pool of 1,367 papers, we conducted a systematic selection, and our final pool included 83 primary studies. We define three sets of research questions and systematically develop and refine a classification schema based on several criteria, including their methodology, applicability, and degree of automation.

Results: The results construct a catalog of 83 Extract Method approaches indicating that several techniques have been proposed in the literature. Our results show that: (i) 38.6% of Extract Method refactoring studies primarily focus on addressing code clones; (ii) Several of the Extract Method tools incorporate the developer’s involvement in the decision-making process when applying the method extraction, and (iii) these existing benchmarks are heterogeneous and do not contain the same type of information, making standardizing them for the purpose of benchmarking difficult.

Conclusions: Our study serves as an “index” to the body of knowledge in this area for researchers and practitioners in determining the Extract Method refactoring approach that is most appropriate for their needs. Our findings also empower the community with information to guide future refactoring tool development.
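The Aim above notes that Extract Method involves low-level changes to statements, variables, parameters, and return types. As a small illustration of what that means in practice, the sketch below shows a method before and after extraction. It is written in Python for brevity, although most surveyed tools target Java, and the function names are invented for this example rather than taken from any surveyed study.

```python
def report_before(orders):
    """Original method: totalling and formatting are mixed together."""
    total = 0.0
    for price, quantity in orders:
        total += price * quantity
    return f"Total: {total:.2f}"


def order_total(orders):
    """Extracted method: 'orders' becomes a parameter, 'total' the return value."""
    total = 0.0
    for price, quantity in orders:
        total += price * quantity
    return total


def report_after(orders):
    """Refactored method: delegates the computation to the extracted method."""
    return f"Total: {order_total(orders):.2f}"


orders = [(2.50, 4), (10.00, 1)]
# The refactoring must preserve behaviour.
assert report_before(orders) == report_after(orders)
print(report_after(orders))
```

Even in this tiny case, the extraction requires deciding which statements move, which variables become parameters, and what the new method returns, which is the kind of decision the surveyed approaches try to recommend or automate.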

We drive our study using the following research questions:
RQ1: What approaches were considered by the primary studies (PSs) to recommend Extract Method refactoring?
RQ2: What are the main characteristics of Extract Method recommendation tools?
RQ3: What are the datasets, and benchmarks used for evaluating and validating Extract Method recommendation tools?

Research Questions and Findings

RQ1. What approaches were considered by the PSs to recommend Extract Method refactoring?

The Sankey diagram (figure below) provides detailed mappings between our three dimensions. Separation of Concerns is the most popular intent driving method extraction, followed by Long Method, and finally Code Clone. Interestingly, this is not matched in the toolset: the highest share of tools goes to Long Method with 41.2%, and the remainder is split equally between Code Clone and Separation of Concerns with 29.4% each. This observation caught our attention, particularly as Separation of Concerns is the only category that relies on all existing detection techniques and has its own unique one, i.e., Evolutionary-based, and yet little of this research has been concretized into practical tools. As for Detection, it is no surprise that rule-based techniques are the most popular for identifying need-to-refactor code fragments. This stems from how research treats refactoring as a natural response to code smells, e.g., Long Method.

Table analysis

Each research paper's approach methodology, type of data, decision metrics, and evaluation metrics are detailed below:

Study	Year	Intent	Design Property	Representation	Detection	Execution	Semi-Automation	Validation Method
Lakhotia & Deprez	1998	Long Method	Semantic	Graphs	Manual	Semi-automated	Suggest Alternatives	Proof of Concept
Balazinska	1999	Code Clone	Syntactic	AST	Fully automated	Fully automated	N/A	Proof of Concept
Komondoor & Horwitz	2000	Code Clone	Semantic	Graphs	Manual	Fully automated	N/A	Proof of Concept
Maruyama	2001	Separation of Concerns	Semantic	Graphs	Manual	Semi-automated	Choose Candidates	Proof of Concept
Komondoor & Horwitz	2003	Code Clone	Semantic	Graphs	Manual	Fully automated	N/A	Proof of Concept
Ettinger & Verbaere	2004	Separation of Concerns	Semantic	Graphs	Manual	Fully automated	N/A	Proof of Concept
Higo	2004	Code Clone	Textual	Source Code	Fully automated	Semi-automated	Choose Candidates	Case Study
Higo	2004	Code Clone	Semantic	Graphs	Fully automated	Fully automated	N/A	Case Study
Higo	2005	Code Clone	Textual	Source Code	Fully automated	Semi-automated	Execute on Approval	Case Study
Higo	2008	Code Clone	Textual	Source Code	Fully automated	Semi-automated	Execute on Approval	Case Study
O’Connor	2005	Separation of Concerns	Syntactic	AST	Semi-automated	Semi-automated	Suggest Alternatives	Proof of Concept
Juillerat & Hirsbrunner	2006	Code Clone	Syntactic	AST	Fully automated	Fully automated	N/A	Proof of Concept
Juillerat & Hirsbrunner	2007	Separation of Concerns	Syntactic	AST	Manual	Fully automated	N/A	Proof of Concept
Vittek	2007	Separation of Concerns	Syntactic	AST	Manual	Semi-automated	User Input	Proof of Concept
Corbat	2007	Separation of Concerns	Syntactic	AST	Manual	Semi-automated	Choose Candidates	Proof of Concept
Murphy-Hill & Black	2008	Separation of Concerns	Textual	Source Code	Manual	Semi-automated	Choose Candidates	Experiment
Abadi	2008	Separation of Concerns	Textual	Source Code	Manual	Fully automated	N/A	Case Study
Abadi	2009	Separation of Concerns	Textual	Source Code	Manual	Fully automated	N/A	Case Study
Tsantalis & Chatzigeorgiou	2009	Long Method	Textual	Source Code	Fully automated	Semi-automated	Suggest Alternatives	Experiment
Tsantalis & Chatzigeorgiou	2011	Long Method	Textual	Source Code	Fully automated	Semi-automated	Suggest Alternatives	Experiment
Yang	2009	Long Method	Textual	Source Code	Manual	Semi-automated	Suggest Alternatives	Case Study
Li & Thompson	2009	Code Clone	Hybrids	AST & Tokens	Manual	Semi-automated	Suggest Alternatives	Case Study
Brown & Thompson	2010	Code Clone	Hybrids	AST & Tokens	Manual	Semi-automated	Suggest Alternatives	Case Study
Kanemitsu	2011	Separation of Concerns	Semantic	Graphs	Manual	Semi-automated	Suggest Alternatives	Experiment
Meananeatra	2011	Long Method	Syntactic	Metrics	Manual	Semi-automated	Suggest Alternatives	Proof of Concept
Choi	2011	Code Clone	Lexical	Tokens	Fully automated	Manual	N/A	Case Study
Sharma	2012	Separation of Concerns	Semantic	Graphs	Manual	Fully automated	N/A	Proof of Concept
Cousot	2012	Separation of Concerns	Textual	Source Code	Manual	Fully automated	N/A	Proof of Concept
Tairas & Gray	2012	Code Clone	Syntactic	AST	Fully automated	Semi-automated	Choose Candidates	Experiment
Kaya & Fawcett	2013	Long Method	Textual	Source Code	Fully automated	Manual	N/A	Experiment
Goto	2013	Code Clone	Syntactic	AST	Manual	Fully automated	N/A	Case Study
Bian	2013	Code Clone	Hybrids	AST & Graphs	Manual	Fully automated	N/A	Experiment
Bian	2014	Code Clone	Syntactic	Metrics	Fully automated	Manual	N/A	Experiment
Krishnan & Tsantalis	2013	Code Clone	Textual	Source Code	Fully automated	Semi-automated	User Input	Experiment
Krishnan & Tsantalis	2014	Code Clone	Hybrids	AST & Graphs	Fully automated	Semi-automated	User Input	Experiment
Tsantalis	2015	Code Clone	Hybrids	AST & Source Code & Tokens	Fully automated	Semi-automated	User Input	Experiment
Mazinanian	2016	Code Clone	Hybrids	AST & Source Code & Tokens	Fully automated	Semi-automated	User Input	Experiment
Tsantalis	2017	Code Clone	Hybrids	AST & Source Code & Tokens	Fully automated	Semi-automated	User Input	Experiment
Silva	2014	Separation of Concerns	Textual	Source Code	Fully automated	Semi-automated	Suggest Alternatives	Experiment
Silva	2015	Separation of Concerns	Textual	Source Code	Fully automated	Semi-automated	Suggest Alternatives	Experiment
Fontana	2015	Code Clone	Hybrids	AST & Source Code	Fully automated	Semi-automated	Suggest Alternatives	Experiment
Meng	2015	Code Clone	Syntactic	AST	Fully automated	Fully automated	N/A	Experiment
Charalampidou	2015	Long Method	Syntactic	Metrics	Fully automated	Fully automated	N/A	Case Study
Charalampidou	2016	Long Method	Syntactic	AST & Metrics	Fully automated	Fully automated	N/A	Case Study
Charalampidou	2018	Long Method	Syntactic	Metrics	Fully automated	Fully automated	N/A	Case Study
Haas & Hummel	2016	Long Method	Hybrids	Source Code & Graphs	Manual	Semi-automated	Suggest Alternatives	Experiment
Haas & Hummel	2017	Long Method	Hybrids	Source Code & Graphs	Manual	Semi-automated	Select Alternatives	Experiment
Xu	2017	Separation of Concerns	Textual	Source Code	Fully automated	Semi-automated	Choose Candidates	Experiment
Imazato	2017	Separation of Concerns	Textual	Source Code	Fully automated	Manual	N/A	Experiment
Kaya & Fawcett	2017	Long Method	Semantic	Graphs	Fully automated	Fully automated	N/A	Experiment
Maruyama & Hayashi	2017	Separation of Concerns	Textual	Source Code	Manual	Semi-automated	Choose Candidates	Proof of Concept
Xu	2017	Long Method	Syntactic	Metrics	Fully automated	Manual	N/A	Experiment
Chen	2017	Code Clone	Syntactic	AST	Manual	Fully automated	N/A	Case Study
Ettinger & Tyszberowicz	2016	Code Clone	Textual	Source Code	Manual	Fully automated	N/A	Proof of Concept
Ettinger	2017	Code Clone	Semantic	Graphs	Manual	Fully automated	N/A	Proof of Concept
Meananeatra	2018	Long Method	Hybrids	AST & Graphs	Manual	Semi-automated	Execute on Approval	Case Study
Choi	2018	Long Method	Syntactic	Metrics	Fully automated	Manual	N/A	Experiment
Yue	2018	Code Clone	Syntactic	AST	Fully automated	Manual	N/A	Experiment
Vidal	2018	Long Method	Textual	Source Code	Fully automated	Semi-automated	Choose Candidates	Case Study
Yoshida	2019	Code Clone	Hybrids	AST & Tokens	Fully automated	Semi-automated	Choose Candidates	Experiment
Shin	2019	Code Clone	Syntactic	AST	Fully automated	Fully automated	N/A	Case Study
Baars & Oprescu	2019	Code Clone	Hybrids	AST & Graphs	Fully automated	Manual	N/A	Experiment
Antezana	2019	Long Method	Textual	Source Code	Manual	Semi-automated	Choose Candidates	Experiment
Alcocer	2020	Long Method	Textual	Source Code	Manual	Semi-automated	Choose Candidates	Experiment
Nyamawe	2019	Separation of Concerns	Textual	Text	Fully automated	Manual	N/A	Experiment
Nyamawe	2020	Separation of Concerns	Textual	Text	Fully automated	Manual	N/A	Experiment
Krasniqi & Cleland-Huang	2020	Separation of Concerns	Textual	Text	Fully automated	Manual	N/A	Experiment
Abid	2020	Separation of Concerns	Textual	Source Code	Manual	Semi-automated	User Input	Experiment
Sheneamer	2020	Code Clone	Hybrids	AST & Graphs & Tokens	Fully automated	Manual	Choose Candidates	Experiment
Aniche	2020	Separation of Concerns	Syntactic	Metrics	Fully automated	Manual	N/A	Experiment
Van der Leij	2021	Separation of Concerns	Syntactic	Metrics	Fully automated	Manual	N/A	Experiment
Sagar	2021	Separation of Concerns	Hybrids	Text & Metrics	Fully automated	Manual	N/A	Experiment
AlOmar	2022	Separation of Concerns	Textual	Text	Fully automated	Manual	N/A	Experiment
Nyamawe	2022	Separation of Concerns	Textual	Text	Fully automated	Manual	N/A	Experiment
Shahidi	2022	Long Method	Hybrids	Graphs & Metrics	Fully automated	Fully automated	N/A	Experiment
Tiwari & Joshi	2022	Long Method	Semantic	Graphs	Fully automated	Manual	N/A	Experiment
Fernandes	2022	Long Method	Syntactic	Metrics	Fully automated	Semi-automated	Execute on Approval	Experiment
Fernandes	2022	Long Method	Syntactic	Metrics	Fully automated	Semi-automated	Execute on Approval	Experiment
AlOmar	2022	Code Clone	Syntactic	Metrics	Fully automated	Semi-automated	Execute on Approval	Experiment
AlOmar	2023	Code Clone	Syntactic	Metrics	Fully automated	Semi-automated	Execute on Approval	Experiment
Cui	2023	Separation of Concerns	Semantic	Graphs	Fully automated	Manual	N/A	Experiment
Thy	2023	Separation of Concerns	Textual	Source Code	Fully automated	Fully automated	N/A	Case Study
Palit	2023	Separation of Concerns	Semantic	Graphs	Fully automated	Manual	N/A	Experiment
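To summarize the table above by any of its columns, the rows can be tallied with a few lines of Python. This is only a sketch: it assumes the rows are available as tab-separated strings, uses three rows copied from the table as sample input, and the helper name `tally` is introduced here for illustration.

```python
from collections import Counter

# Three rows copied from the table above (tab-separated, header omitted).
SAMPLE_ROWS = [
    "Balazinska\t1999\tCode Clone\tSyntactic\tAST\tFully automated\tFully automated\tN/A\tProof of Concept",
    "Maruyama\t2001\tSeparation of Concerns\tSemantic\tGraphs\tManual\tSemi-automated\tChoose Candidates\tProof of Concept",
    "Meng\t2015\tCode Clone\tSyntactic\tAST\tFully automated\tFully automated\tN/A\tExperiment",
]

# Zero-based column positions, following the table header order.
INTENT, REPRESENTATION = 2, 4


def tally(rows, column):
    """Count how often each value appears in the given column."""
    return Counter(row.split("\t")[column] for row in rows)


print(tally(SAMPLE_ROWS, INTENT))
print(tally(SAMPLE_ROWS, REPRESENTATION))
```

Applied to all rows, the same call gives the distribution of studies per intent, representation, or validation method, which can then be compared with the percentages reported in the study summary.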

RQ2. What are the main characteristics of Extract Method recommendation tools?

To help select an appropriate Extract Method refactoring tool, we report the following main characteristics that can be considered to make an informed decision about tool usage:

Language: Indicates the programming language the tool supports.
Intent: Indicates the context in which the tool can be used.
Number of Metrics: Indicates the number of software metrics used by the tool.
Interface: Indicates what IDE the tool supports.
Usage Guide?: Indicates the availability of instructions on how to use the tool.
Tool Link: Indicates whether a link to the online source code repository is available.
Last Update: Indicates the year of the tool's last update, which shows whether it has been maintained since its development.

RQ3. What are the datasets, and benchmarks used for evaluating and validating Extract Method recommendation tools?

Out of the 83 primary studies analyzed, almost 78% of the datasets are unavailable to the public, with only 22% available online, which means there is a lack of online datasets for Extract Method refactoring research. Primary studies have mostly employed small or medium-scale open-source applications, often developed in Java, typically containing fewer than 225,000 lines of code. These datasets are heterogeneous and do not contain the same type of information, making standardizing them for the purpose of benchmarking difficult.

The detailed information on Tools, Datasets and Original papers can be found in the following sections:

Download Tools

Download Data

Paper Reviews

The following list elaborates on the papers surveyed for this study (Paper Links are clickable).

Paper Titles
Restructuring programs by tucking statements into functions
Automated method-extraction refactoring by using block-based slicing
ARIES: Refactoring support environment based on code clone analysis
ARIES: refactoring support tool for code clone
Refactoring Support Based on Code Clone Analysis
A metric-based approach to identifying refactoring opportunities for merging code clones in a java software system
Star Diagram with Automated Refactorings for Eclipse
A C++ Refactoring Browser and Method Extraction
Breaking the barriers to successful refactoring: Observations and tools for Extract Method
Fine slicing for advanced method extraction
Identification of Extract Method refactoring opportunities
Identification of Extract Method refactoring opportunities for the decomposition of methods
Identifying fragments to be extracted from long methods
Using software metrics to select refactoring for long method bad smell
A visualization method of program dependency graph for identifying Extract Method opportunity
Identifying extract-method refactoring candidates automatically
An abstract interpretation framework for refactoring with application to extract methods with contracts
Identifying Extract Method Opportunities Based on Variable References (S)
Increasing clone maintenance support by unifying clone detection and refactoring activities
Ruby refactoring plug-in for eclipse
SPAPE: A semantic-preserving amorphous procedure extraction method for near-miss clones
Identifying accurate refactoring opportunities using metrics
Recommending automated Extract Method refactorings
JExtract: An eclipse plug-in for recommending automated Extract Method refactorings
A duplicated code refactoring advisor
Identifying Extract Method refactoring opportunities based on functional relevance
Deriving Extract Method refactoring suggestions for long methods
Learning to rank extract method refactoring suggestions for long methods
GEMS: An Extract Method refactoring recommender
Finding extract method refactoring opportunities by analyzing development history
Identification of extract method refactoring opportunities through analysis of variable declarations and uses
A tool supporting postponable refactoring
A log-linear probabilistic model for prioritizing extract method refactorings
Refactoring opportunity identification methodology for removing long method smells and improving code analyzability
An Investigation of the Relationship between Extract Method and Change Metrics: A Case Study of JEdit
Automatic clone recommendation for refactoring based on the present and the past
Proactive clone recommendation system for Extract Method refactoring
TOAD: A tool for recommending auto-refactoring alternatives
Improving the success rate of applying the Extract Method refactoring
Feature requests-based recommendation of software refactorings
Automated recommendation of software refactorings based on feature requests
Enhancing source code refactoring detection with explanations from commit messages
How does refactoring impact security when improving quality? a security-aware refactoring approach
An automatic advisor for refactoring software clones based on machine learning
The effectiveness of supervised machine learning algorithms in predicting software refactoring
Data-driven Extract Method recommendations: A study at ING
Comparing commit messages and source code metrics for the prediction refactoring activities
On the documentation of refactoring types
Mining commit messages to enhance software refactorings recommendation: A machine learning approach
An automated Extract Method refactoring approach to correct the long method code smell
Identifying Extract Method Refactorings
LiveRef: a Tool for Live Refactoring Java Code
A Live Environment to Improve the Refactoring Experience
AntiCopyPaster: Extracting Code Duplicates As Soon As They Are Introduced in the IDE
Just-in-time code duplicates extraction
REMS: Recommending Extract Method Refactoring Opportunities via Multi-view Representation of Code Property Graph
Size and Cohesion Metrics as Indicators of the Long Method Bad Smell: An Empirical Study
Assessing the Refactoring of Brain Methods
JDeodorant: Clone Refactoring
Untangling: A Slice Extraction Refactoring
How to extract differences from similar programs? A cohesion metric approach
A study on the method of removing code duplication using code template
Structural quality metrics as indicators of the long method bad smell: An empirical study
Tool Support for Managing Clone Refactorings to Facilitate Code Review in Evolving Software
Efficient method extraction for automatic elimination of type-3 clones
Assessing the Refactorability of Software Clones
Unification and refactoring of clones
Clone Refactoring with Lambda Expressions
Refactoring clones: An optimization problem
Effective automatic procedure extraction
Adventure of a Lifetime: Extract Method Refactoring for Rust
An algorithm for detecting and removing clones in java code
Extracting Code Clones for Refactoring Using Combinations of Clone Metrics
Improving Method Extraction: A Novel Approach to Data Flow Analysis Using Boolean Flags and Expressions
Semantics-preserving procedure extraction
Duplication for the Removal of Duplication
Does Automated Refactoring Obviate Systematic Editing?
Re-Approaching the Refactoring Rubicon
Automatic Refactoring Candidate Identification Leveraging Effective Code Representation
Clone Detection and Elimination for Haskell
Clone Detection and Removal for Erlang/OTP within a Refactoring Environment
Partial Redesign of Java Software Systems Based on Clone Analysis
Towards Automated Refactoring of Code Clones in Object-Oriented Programming Languages

Data Collection

Publications

This is a list of publications that use the CROP dataset. If you have a published piece of work that uses CROP and it is not listed above, feel free to contact us. Your publication will be included in the list soon.

Journal Articles

Matheus Paixao, Jens Krinke, DongGyun Han, Chaiyong Ragkhitwetsagul, Mark Harman. 2019. In IEEE Transactions on Software Engineering (TSE). Preprint

Conference Papers

Luca Pascarella, Davide Spadini, Fabio Palomba, Alberto Bacchelli. 2020. On The Effect Of Code Review On Code Smells. In IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER). Preprint

Matheus Paixao, Paulo Henrique Maia. 2019. Rebasing in Code Review Considered Harmful: A Large-scale Empirical Investigation. In International Conference on Source Code Analysis and Manipulation (SCAM). Preprint

Matheus Paixao, Jens Krinke, DongGyun Han, and Mark Harman. 2018. CROP: Linking Code Reviews to Source Code Changes. In International Conference on Mining Software Repositories (MSR). Preprint

Contact

You can contact the EMRS's team through the following channel:

EMRS's mailing list
