DLRL Hadoop Cluster Upgrade - Virginia Tech


CCLearner: A Deep Learning-Based Clone Detection Approach
Liuqing Li, He Feng, Wenjie Zhuang, Na Meng and Barbara Ryder
{liuqing, fenghe, kaito, nm8247}@vt.edu, [email protected]
Department of Computer Science
Virginia Polytechnic Institute and State University
Blacksburg, VA, USA
September 22, 2017

Code Clone
- Developers copy and paste code to improve programming productivity.
- Clone detection tools are needed to help with bug fixes or program changes.
[Chart; surviving labels: 15, 19%, 25%]

Related Work
Code clone detection approaches:
- Text-based
- Tree-based
- Token-based
- Metrics-based
- Graph-based

Research Question
- How can we automatically characterize the similarity between clones?
- How can we leverage the characteristics to detect clones?

Methodology
- Treat the code clone detection problem as a classification problem.

Approach
- Training: clone pairs and non-clone pairs -> feature extraction -> deep learning -> classifier
- Testing: source code -> method extraction -> method pair enumerator -> feature extraction -> classifier -> clone pairs and non-clone pairs
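The testing path relies on a method pair enumerator. A minimal Python sketch of such an enumerator is given here, assuming the methods have already been extracted into a list. The `Method` record and the `enumerate_pairs` and `pair_count` helpers are hypothetical names used for illustration, not part of CCLearner. The sketch also shows that the number of candidate pairs grows quadratically with the number of methods.

```python
from itertools import combinations
from typing import Iterator, List, NamedTuple, Tuple


class Method(NamedTuple):
    """Hypothetical record for one extracted method."""
    name: str
    body: str


def enumerate_pairs(methods: List[Method]) -> Iterator[Tuple[Method, Method]]:
    """Yield every unordered pair of distinct methods exactly once."""
    return combinations(methods, 2)


def pair_count(n: int) -> int:
    """Number of unordered pairs among n methods: n * (n - 1) / 2."""
    return n * (n - 1) // 2


methods = [Method(f"m{i}", "") for i in range(5)]
pairs = list(enumerate_pairs(methods))
print(len(pairs), pair_count(len(methods)))  # 10 10
print(pair_count(1000))                      # 499500
```

Even a codebase with 1,000 methods already yields roughly half a million candidate pairs, so any per-pair work has to be cheap or filtered early.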

Hypothesis
Code clones are more likely to share certain kinds of tokens than other tokens.
- Tokens likely to be shared: keywords, method names, ...
- Tokens less likely to be shared: variable names, literals, ...

Feature Extraction
A method pair: methodA, methodB
- methodA -> token_freq_listA = [token_freq_catA1, ..., token_freq_catA8]

Category Name          Example
Reserved words
Operators              <+=, 2>
Markers                <;, 2>
Type identifiers
Method identifiers
Qualified

Feature Extraction (continued)
A method pair: methodA, methodB
- methodA -> token_freq_listA = [token_freq_catA1, ..., token_freq_catA8]
- methodB -> token_freq_listB = [token_freq_catB1, ..., token_freq_catB8]
- (token_freq_listA, token_freq_listB) -> [sim_score1, ..., sim_score8]

Training
Input
- Clones: <[sim_score1, ..., sim_score8], 1>
- Non-clones: <[sim_score1, ..., sim_score8], 0>
Training process
- DeepLearning4j*
Output
- A well-trained classifier (.mdl)
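To make the feature vector concrete, here is a minimal Python sketch that turns a method pair into per-category similarity scores and a labeled training sample. Everything specific in it is an assumption for illustration: the whitespace tokenizer, the four toy categories (the slides use eight), the helper names, and in particular the score formula 1 - sum|a - b| / sum(a + b), which the slides do not spell out.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Toy category tables (assumption): the slides use eight categories.
RESERVED = {"if", "else", "for", "while", "return", "int"}
OPERATORS = {"+", "-", "*", "/", "=", "+=", "<", ">"}
MARKERS = {";", ",", "(", ")", "{", "}"}
CATEGORIES = ["reserved", "operator", "marker", "other"]


def categorize(token: str) -> str:
    """Map a token to one of the toy categories."""
    if token in RESERVED:
        return "reserved"
    if token in OPERATORS:
        return "operator"
    if token in MARKERS:
        return "marker"
    return "other"


def token_freq_list(code: str) -> Dict[str, Counter]:
    """Count token frequencies per category (whitespace tokenizer, assumption)."""
    freqs = {cat: Counter() for cat in CATEGORIES}
    for token in code.split():
        freqs[categorize(token)][token] += 1
    return freqs


def sim_score(freq_a: Counter, freq_b: Counter) -> float:
    """Assumed score: 1 - sum|a - b| / sum(a + b); 1.0 when both are empty."""
    tokens = set(freq_a) | set(freq_b)
    total = sum(freq_a[t] + freq_b[t] for t in tokens)
    if total == 0:
        return 1.0
    diff = sum(abs(freq_a[t] - freq_b[t]) for t in tokens)
    return 1.0 - diff / total


def sim_vector(code_a: str, code_b: str) -> List[float]:
    """One similarity score per category for a method pair."""
    fa, fb = token_freq_list(code_a), token_freq_list(code_b)
    return [sim_score(fa[cat], fb[cat]) for cat in CATEGORIES]


def training_sample(code_a: str, code_b: str, is_clone: bool) -> Tuple[List[float], int]:
    """Build a <[sim_score...], label> sample: 1 for clones, 0 for non-clones."""
    return sim_vector(code_a, code_b), int(is_clone)


method_a = "int sum = a + b ; return sum ;"
method_b = "int total = x + y ; return total ;"
vector, label = training_sample(method_a, method_b, True)
print(vector, label)  # [1.0, 1.0, 1.0, 0.0] 1
```

In this toy pair the reserved words, operators, and markers match exactly while every variable name differs, which is the pattern the hypothesis describes.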

* Deeplearning4j, http://deeplearning4j.org/, accessed: 2017-09-04

Testing
Input
- A codebase
Output
- 2 nodes in DNN
- Predict the likelihood of clones and non-clones
Challenges
- Time cost: O(n^2)
Solution
- Two filters

Evaluation
Benchmark: BigCloneBench*
- 10 source code folders
- Clone types: T1, T2, VST3, ST3, MT3 and WT3/4
Data set construction
- Training data (Folder #4)
  - T1, T2, VST3 and ST3 clones
  - Randomly choose a subset of false clone pairs
- Testing data (other 9 folders)
  - All source files

* Jeffrey Svajlenko, Judith F. Islam, Iman Keivanloo, Chanchal K. Roy and Mohammad Mamun Mia, "Towards a Big Data Curated Benchmark of Inter-Project Code Clones", In Proceedings of the Early Research Achievements track of the 30th International Conference on Software

Evaluation
Recall
$$R_{T1\text{-}ST3} = \frac{\#\ \text{true clone pairs detected (T1-ST3)}}{\#\ \text{true clone pairs (T1-ST3)}}$$
Precision
$$P = \frac{\#\ \text{true clone pairs}}{385}$$
F-score
$$F_{T1\text{-}ST3} = \frac{2 \cdot P \cdot R_{T1\text{-}ST3}}{P + R_{T1\text{-}ST3}}$$

Evaluation
Inter-comparison results

Recall (%)
        CCLearner   SourcererCC   Deckard   NiCad
T1      100         100           100       96
T2      98          97            85        82
VST3    98          92            98        78
ST3     89          67            77        78

Precision (%)
        CCLearner   SourcererCC   Deckard   NiCad
        93          98            68        71

F-score (%)
        CCLearner   SourcererCC   Deckard   NiCad
        93          88            76        77

Conclusion
Novel approach
- Use deep learning to characterize and detect clones
Comprehensive empirical study
- Compare CCLearner with existing tools
- Evaluate the importance of different features
  - E.g., reserved words, type identifiers, and method identifiers

Thank you!
Questions?
http://people.cs.vt.edu/~liuqing/
https://github.com/liuqingli/CCLearner
https://goo.gl/k6rjDn

Backup

