A Study on the Performance of Statistical Machine Translation and Neural Machine Translation for Prefix Resolution in Software Engineering
Neural Machine Translation (NMT) is the currently dominant approach in Natural Language Processing (NLP) to the problem of automatically inferring target-language content from a source-language input. The strength of NMT lies in learning deep representations of language through deep learning. However, prior work shows that NMT has its own drawbacks, both in NLP and in some research problems of Software Engineering (SE). In this work, we hypothesize that SE corpora have inherent characteristics that pose challenges for NMT compared to state-of-the-art translation engines based on Statistical Machine Translation (SMT). We introduce a problem, called Prefix Mapping, which is significant in SE and whose characteristics challenge the ability of NMT to learn correct sequences. We implement and optimize the original SMT and NMT engines to mitigate those challenges. Our evaluation shows that SMT outperforms NMT for this research problem, which suggests potential directions for optimizing current NMT engines on specific classes of parallel corpora. By achieving accuracy from 65% to 90% for code token generation on the 1000 GitHub code corpus, we demonstrate the potential of using MT for token-level code completion.
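To make the framing concrete, the sketch below treats token-level code completion as a translation-style mapping from a token prefix to its next token, using simple count-based n-gram statistics in the spirit of SMT's frequency models. This is a hypothetical illustration only, not the implementation evaluated in the talk; the corpus, function names, and bigram formulation are all assumptions for the example.

```python
from collections import Counter, defaultdict

def train_bigram_model(token_sequences):
    """Count, for each token, which tokens most often follow it."""
    successors = defaultdict(Counter)
    for seq in token_sequences:
        for prev, nxt in zip(seq, seq[1:]):
            successors[prev][nxt] += 1
    return successors

def complete(prefix, model):
    """Predict the next token from the last token of the prefix."""
    last = prefix[-1]
    if last not in model:
        return None  # unseen context: no prediction
    return model[last].most_common(1)[0][0]

# Toy "corpus" of tokenized code lines (assumed data, for illustration).
corpus = [
    ["for", "i", "in", "range", "(", "n", ")", ":"],
    ["for", "x", "in", "items", ":"],
    ["if", "x", "in", "items", ":"],
]
model = train_bigram_model(corpus)
print(complete(["for", "i", "in"], model))  # most frequent successor of "in"
```

A real SMT engine conditions on longer contexts and combines translation and language-model scores, but the core idea is the same: predictions come from corpus statistics rather than a learned neural representation.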
Committee: Ali Jannesari (major professor), Wei Le, and Carl Chang.
Join on WebEx: https://iastate.webex.com/iastate/j.php?MTID=mdf3348ed0e7af2617851bafba877a815
Meeting number: 120 849 2665