TY  - GEN
TI  - Machine learning for cancer subtyping: sampling effects on predictive accuracy and feature selection
AU  - Karlberg, Brian
AU  - Lee, Jordan
AU  - Wong, Chris
AU  - Stuart, Josh
AU  - Ellrott, Kyle
AD  - Oregon Health and Science University
PB  - Oregon Health and Science University
DA  - 2022-04-21
PY  - 2022-04-21
ID  - 9633
DO  - 10.6083/g158bj19x
KW  - Machine Learning
KW  - Neoplasms
KW  - cancer subtypes
KW  - power law curve fitting
KW  - feature engineering
KW  - the cancer genome atlas
KW  - precision oncology
AB  - Accurate classification of cancer subtypes based on genomic data is a key component of precision oncology. Subtyping allows for better alignment of therapeutics with the specific biology of an individual patient's tumor. Training predictive models for classification requires collection, sequencing, and labeling of patient samples, which incurs significant cost and thus motivates an estimation of the minimum number of samples required to attain a particular classification accuracy. In this work, a minimum sample size estimation strategy for machine learning cancer subtype prediction is developed by combining a subsampling method for learning curve generation with an inverse power law curve fitting method.
UR  - https://digitalcollections.ohsu.edu/record/9633/files/Karlberg_sample_count_abstract_2022-04-20.pdf
L1  - https://digitalcollections.ohsu.edu/record/9633/files/Karlberg_sample_count_abstract_2022-04-20.pdf
ER  - 