Publication Type
Journal Article
Version
publishedVersion
Publication Date
1-2014
Abstract
The paper presents a method for finding repetitions in DocBook/DRL or plain-text documents. An algorithm has been designed based on software clone detection. The algorithm supports filtering of results: clone groups are rejected if the clone length in the group is less than 5 symbols, intersections of clone groups are eliminated, meaningless clones are removed, and groups containing clones that consist only of XML are eliminated. A search over the remaining text is also supported: found clones are extracted from the documentation, and the clone search is repeated. One such step is proved to be enough. The adaptive reuse technique of Paul Bassett and Stan Jarzabek has been implemented. A software tool has been developed on the basis of the algorithm. The tool supports setting parameters for repetition detection and visualization of the obtained results. The tool is integrated into the DocLine document development environment and provides refactoring of documents using the found clones. The Clone Miner clone detection utility is used for the clone search. The method has been evaluated on the Linux Kernel Documentation (29 documents, 25,000 lines). Five semantic kinds of clones have been identified: terms (abbreviations, one-word and two-word terms), hyperlinks, license agreements, functionality descriptions, and code examples. In total, 451 meaningful clone groups have been found; the average clone length is 4.43 tokens, and the average number of clones in a group is 3.56.
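The filtering rules named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation and does not use Clone Miner: the `CloneGroup` type, the `filter_groups` function, and the rule that the longer clone wins when two groups intersect are all assumptions made for illustration. The "meaningless clone" filter is omitted because the abstract does not define it.

```python
import re
from dataclasses import dataclass

MIN_CLONE_LENGTH = 5  # symbols; threshold taken from the abstract

@dataclass(frozen=True)
class CloneGroup:
    """Hypothetical representation: one repeated fragment and its start offsets."""
    text: str
    positions: tuple

XML_TAG = re.compile(r"<[^>]+>")

def is_xml_only(text):
    # True when nothing but whitespace remains after removing XML tags.
    return XML_TAG.sub("", text).strip() == ""

def overlaps(a, b):
    # True when any occurrence of group a shares characters with any occurrence of b.
    return any(
        pa < pb + len(b.text) and pb < pa + len(a.text)
        for pa in a.positions
        for pb in b.positions
    )

def filter_groups(groups):
    kept = []
    # Assumption: visit longer clones first so they win when groups intersect.
    for group in sorted(groups, key=lambda g: len(g.text), reverse=True):
        if len(group.text) < MIN_CLONE_LENGTH:
            continue  # too short
        if is_xml_only(group.text):
            continue  # markup only, no textual content
        if any(overlaps(group, other) for other in kept):
            continue  # intersects a group that was already accepted
        kept.append(group)
    return kept

groups = [
    CloneGroup("kernel module", (0, 100)),
    CloneGroup("module", (7, 200)),           # intersects "kernel module" at offset 7
    CloneGroup("<para></para>", (300, 400)),  # XML only
    CloneGroup("abc", (500, 600)),            # shorter than 5 symbols
]
result = filter_groups(groups)
```

With this sample input, only the "kernel module" group survives the three filters.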
Keywords
software documentation, documentation reuse, software clone detection, adaptive reuse, refactoring, DocBook, DocLine, DRL
Discipline
Programming Languages and Compilers | Software Engineering
Research Areas
Software and Cyber-Physical Systems
Publication
Scientific and Technical Journal of Information Technologies, Mechanics and Optics
Volume
14
Issue
4
First Page
106
Last Page
114
ISSN
2226-1494
Publisher
Saint Petersburg National Research University of Information Technologies, Mechanics and Optics (ITMO University)
Citation
LUTSIV, Dmitry V.; KOZNOV, Dmitry; BASIT, Hamid A.; OUH, Eng Lieh; SMIRNOV, Mikhail N.; and ROMANOVSKY, Konstantin Y.
An approach for clone detection in documentation reuse. (2014). Scientific and Technical Journal of Information Technologies, Mechanics and Optics. 14, (4), 106-114.
Available at: https://ink.library.smu.edu.sg/sis_research/3984
Copyright Owner and License
Authors
Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 4.0 International License.
Additional URL
http://ntv.ifmo.ru/file/article/10381.pdf