
The impact of imbalanced training data on machine learning for author name disambiguation

Jinseok Kim, Jenna Kim

paper, 2018-07-30, English

machine learning, finance, arxiv

Description

In supervised machine learning for author name disambiguation, negative training instances often far outnumber positive ones. This paper examines how the ratio of negative to positive training data affects the performance of machine learning algorithms that disambiguate author names in bibliographic records. On multiple labeled datasets, three classifiers (Logistic Regression, Naïve Bayes, and Random Forest) are trained on representative features such as coauthor names,...
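
To make the idea of a negative-to-positive ratio concrete, the following is a minimal sketch of how a training set with a chosen ratio could be built by undersampling negatives. It assumes the usual pairwise setup, in which a positive instance is a pair of name mentions belonging to the same author and a negative instance is a pair belonging to different authors. The function name `sample_at_ratio` and the toy data are hypothetical; the sampling procedure actually used in the paper is not shown in this excerpt.

```python
import random


def sample_at_ratio(positives, negatives, ratio, seed=0):
    """Return labeled training data with about `ratio` negatives per positive.

    Hypothetical helper for illustration only. Negatives are undersampled
    without replacement; if too few negatives exist, all of them are kept.
    """
    rng = random.Random(seed)
    n_neg = min(len(negatives), int(round(ratio * len(positives))))
    sampled_negatives = rng.sample(negatives, n_neg)
    # Label 1 = same author (positive), 0 = different authors (negative).
    data = [(x, 1) for x in positives] + [(x, 0) for x in sampled_negatives]
    rng.shuffle(data)
    return data


# Toy stand-ins for feature vectors of name-instance pairs (assumed data).
positives = [f"pos_pair_{i}" for i in range(10)]
negatives = [f"neg_pair_{i}" for i in range(200)]

for ratio in (1, 5, 10):
    train = sample_at_ratio(positives, negatives, ratio)
    n_pos = sum(label for _, label in train)
    n_neg = len(train) - n_pos
    print(f"ratio {ratio}:1 -> {n_pos} positive, {n_neg} negative")
```

Each resulting set could then be passed to a classifier of choice, so that performance can be compared across ratios while the positive instances stay fixed.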