ウシオ ユウスケ   Ushio Yuusuke
  潮 雄介
   Affiliation   School of Medicine, Department of Medicine (Tokyo Women's Medical University Hospital)
   Position   Assistant Professor
Article type Original article
Language English
Peer review Peer-reviewed
Title Machine learning for morbid glomerular hypertrophy.
Journal Full name: Scientific reports
Abbreviation: Sci Rep
ISSN: 2045-2322/2045-2322
Publication category International
Volume, issue, pages 12(1), pp.19155
Authors and co-authors Ushio Yusuke, Kataoka Hiroshi, Iwadoh Kazuhiro, Ohara Mamiko, Suzuki Tomo, Hirata Maiko, Manabe Shun, Kawachi Keiko, Akihisa Taro, Makabe Shiho, Sato Masayo, Iwasa Naomi, Yoshida Rie, Hoshino Junichi, Mochizuki Toshio, Tsuchiya Ken, Nitta Kosaku
Publication date 2022/11
Abstract A practical research method integrating data-driven machine learning with conventional model-driven statistics is sought after in medicine. Although glomerular hypertrophy (or a large renal corpuscle) on renal biopsy has pathophysiological implications, it is often misdiagnosed as adaptive/compensatory hypertrophy. Using a generative machine learning method, we aimed to explore the factors associated with a maximal glomerular diameter of ≥ 242.3 μm. Using the frequency-of-usage variable ranking in generative models, we defined the machine learning scores with symbolic regression via genetic programming (SR via GP). We compared important variables selected by SR with those selected by a point-biserial correlation coefficient using multivariable logistic and linear regressions to validate discriminatory ability, goodness-of-fit, and collinearity. Body mass index, complement component C3, serum total protein, arteriolosclerosis, C-reactive protein, and the Oxford E1 score were ranked among the top 10 variables with high machine learning scores using SR via GP, while the estimated glomerular filtration rate was ranked 46 among the 60 variables. In multivariable analyses, the R² value was higher (0.61 vs. 0.45), and the corrected Akaike Information Criterion value was lower (402.7 vs. 417.2) with variables selected with SR than those selected with point-biserial r. There were two variables with variance inflation factors higher than 5 in those using point-biserial r and none in SR. Data-driven machine learning models may be useful in identifying significant and insignificant correlated factors. Our method may be generalized to other medical research due to the procedural simplicity of using top-ranked variables selected by machine learning.
DOI 10.1038/s41598-022-23882-7
PMID 36351996
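
The abstract above relies on three standard statistics: the point-biserial correlation coefficient, the variance inflation factor (with 5 as the collinearity cutoff), and the corrected Akaike Information Criterion. The following is a minimal sketch of their textbook definitions, not the authors' code. All function names, variable names, and data are hypothetical, and the data are synthetic. Only the 242.3 μm threshold is taken from the abstract.

```python
import numpy as np


def point_biserial(binary, values):
    """Point-biserial r: the Pearson correlation between a 0/1 variable
    and a continuous variable."""
    binary = np.asarray(binary, dtype=float)
    values = np.asarray(values, dtype=float)
    return float(np.corrcoef(binary, values)[0, 1])


def variance_inflation_factors(X):
    """VIF_j = 1 / (1 - R_j^2), where R_j^2 is obtained by regressing
    column j on all other columns (with an intercept)."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    vifs = []
    for j in range(p):
        y = X[:, j]
        others = np.delete(X, j, axis=1)
        design = np.column_stack([np.ones(n), others])
        coef, *_ = np.linalg.lstsq(design, y, rcond=None)
        resid = y - design @ coef
        r2 = 1.0 - (resid @ resid) / ((y - y.mean()) ** 2).sum()
        vifs.append(float(1.0 / (1.0 - r2)))
    return vifs


def aicc(log_likelihood, k, n):
    """Corrected AIC: AIC + 2k(k + 1) / (n - k - 1),
    with k estimated parameters and n observations."""
    aic = 2 * k - 2 * log_likelihood
    return aic + (2 * k * (k + 1)) / (n - k - 1)


# Synthetic illustration only; none of these numbers come from the study.
rng = np.random.default_rng(0)
diameter = rng.normal(220.0, 30.0, size=200)           # hypothetical diameters (μm)
large = (diameter >= 242.3).astype(float)              # threshold from the abstract
bmi = 22.0 + 0.05 * (diameter - 220.0) + rng.normal(0.0, 3.0, size=200)
bmi_like = bmi + rng.normal(0.0, 0.3, size=200)        # deliberately collinear with bmi
crp = rng.normal(0.5, 0.2, size=200)                   # unrelated predictor

print("point-biserial r:", point_biserial(large, bmi))
print("VIFs:", variance_inflation_factors(np.column_stack([bmi, bmi_like, crp])))
```

In this sketch the two near-duplicate predictors should show variance inflation factors well above 5, while the unrelated one stays near 1. This is the pattern the abstract reports for two of the variables selected by point-biserial r.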