TY  - GEN
AB  - Accurate classification of cancer subtypes based on genomic data is a key component of precision oncology. Subtyping allows for better alignment of therapeutics with the specific biology of an individual patient's tumor. Training predictive models for classification requires the collection, sequencing, and labeling of patient samples, which incurs significant cost and thus motivates estimating the minimum number of samples required to attain a particular classification accuracy. In this work, a minimum sample size estimation strategy for machine learning-based cancer subtype prediction is developed by combining a subsampling method for learning curve generation with an inverse power law curve fitting method.
AD  - Oregon Health and Science University
AD  - Oregon Health and Science University
AD  - Oregon Health and Science University
AD  - Oregon Health and Science University
AD  - Oregon Health and Science University
AU  - Karlberg, Brian
AU  - Lee, Jordan
AU  - Wong, Chris
AU  - Stuart, Josh
AU  - Ellrott, Kyle
DA  - 2022-04-21
DO  - 10.6083/g158bj19x
ID  - 9633
KW  - Machine Learning
KW  - Neoplasms
KW  - cancer subtypes
KW  - power law curve fitting
KW  - feature engineering
KW  - The Cancer Genome Atlas
KW  - precision oncology
L1  - https://digitalcollections.ohsu.edu/record/9633/files/Karlberg_sample_count_abstract_2022-04-20.pdf
L2  - https://digitalcollections.ohsu.edu/record/9633/files/Karlberg_sample_count_abstract_2022-04-20.pdf
L4  - https://digitalcollections.ohsu.edu/record/9633/files/Karlberg_sample_count_abstract_2022-04-20.pdf
LK  - https://digitalcollections.ohsu.edu/record/9633/files/Karlberg_sample_count_abstract_2022-04-20.pdf
N2  - Accurate classification of cancer subtypes based on genomic data is a key component of precision oncology. Subtyping allows for better alignment of therapeutics with the specific biology of an individual patient's tumor. Training predictive models for classification requires the collection, sequencing, and labeling of patient samples, which incurs significant cost and thus motivates estimating the minimum number of samples required to attain a particular classification accuracy. In this work, a minimum sample size estimation strategy for machine learning-based cancer subtype prediction is developed by combining a subsampling method for learning curve generation with an inverse power law curve fitting method.
PB  - Oregon Health and Science University
PY  - 2022-04-21
T1  - Machine learning for cancer subtyping: sampling effects on predictive accuracy and feature selection
TI  - Machine learning for cancer subtyping: sampling effects on predictive accuracy and feature selection
UR  - https://digitalcollections.ohsu.edu/record/9633/files/Karlberg_sample_count_abstract_2022-04-20.pdf
Y1  - 2022-04-21
ER  - 