Towards new material discovery from CdTe solar cell literature with machine learning

Liu, Xiaolei (2021) Towards new material discovery from CdTe solar cell literature with machine learning. Masters thesis, Middlesex University. [Thesis]



CdTe solar cells are the most successful second-generation solar technology and produce the lowest-cost electricity in the solar industry. The overarching aim of this project is to apply natural language processing (NLP) to accelerate research on CdTe photovoltaic devices by automatically discovering new material applications. Language models are used to extract the words most similar to a given term, and a knowledge diagram is then built by connecting these related words. The language models (word2vec, GloVe, fastText and BERT) are trained on a dataset of more than 22,500 paper abstracts. Their performance is evaluated on a custom test dataset of 62 word pairs that are conceptually related in the field of CdTe solar cells. The more similar the first word of a pair is to the second, the higher the trained language model scores; the evaluation therefore measures how well a model surfaces related concepts among its most similar words. The GloVe model achieves the highest score on the custom test dataset. The knowledge diagram established in this work shows the relationships between materials and concepts of interest. In addition, language models trained on consecutive time periods are used to track the timeline of material applications. The top 500 words most similar to “defect” are tracked over time, and “selenium” is observed to appear in the GloVe model trained on paper abstracts published between 2010 and 2020. This corresponds to a journal paper abstract published in 2019, which discussed the passivation effect of selenium on bulk defects in CdTe. The knowledge diagram and the timeline of material applications therefore provide useful insights for future research and can help accelerate material discovery in the field of CdTe solar cells.
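
The abstract does not state the exact scoring formula or tooling, so the following is only a minimal sketch of the two ideas it describes: scoring a model on related word pairs, and spotting words that newly enter a term's nearest neighbours between training periods. It assumes cosine similarity over word vectors and a "second word appears in the top-N neighbours of the first" hit rule. The function names, the hit rule, and the tiny two-dimensional vectors are all hypothetical illustrations, not data or methods taken from the thesis.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def most_similar(word, vectors, top_n):
    """Return the top_n words closest to `word`, most similar first."""
    scored = [
        (other, cosine(vectors[word], vec))
        for other, vec in vectors.items()
        if other != word
    ]
    scored.sort(key=lambda item: item[1], reverse=True)
    return [other for other, _ in scored[:top_n]]


def pair_score(pairs, vectors, top_n):
    """Fraction of pairs whose second word is among the top_n neighbours
    of the first word (an assumed scoring rule, not the thesis's)."""
    hits = sum(
        1
        for first, second in pairs
        if first in vectors and second in most_similar(first, vectors, top_n)
    )
    return hits / len(pairs)


def new_neighbours(word, earlier, later, top_n):
    """Words in the later model's top_n for `word` that were absent
    from the earlier model's top_n."""
    before = set(most_similar(word, earlier, top_n))
    return [w for w in most_similar(word, later, top_n) if w not in before]


# Toy vectors invented for illustration only; real models would be
# trained on abstracts from each period.
toy_earlier = {
    "defect": [1.0, 0.0],
    "grain": [0.9, 0.1],
    "selenium": [0.0, 1.0],
    "glass": [0.1, 0.9],
}
toy_later = {
    "defect": [1.0, 0.0],
    "grain": [0.9, 0.1],
    "selenium": [0.8, 0.2],
    "glass": [0.1, 0.9],
}
toy_pairs = [("defect", "grain"), ("defect", "selenium")]

print(pair_score(toy_pairs, toy_earlier, top_n=2))
print(pair_score(toy_pairs, toy_later, top_n=2))
print(new_neighbours("defect", toy_earlier, toy_later, top_n=2))
```

With real models, the vector dictionaries would come from embeddings trained on each period's abstracts, and `top_n` would be set to the neighbourhood size of interest, such as the 500 used for "defect" above.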

Item Type: Thesis (Masters)
Research Areas: A. > School of Science and Technology > Computer Science
B. > Theses
Item ID: 36682
Depositing User: Lisa Blanshard
Date Deposited: 31 Oct 2022 16:22
Last Modified: 31 Oct 2022 16:24
