TY  - GEN
AB  - Accurate classification of cancer subtypes based on genomic data is a key component of precision oncology. Subtyping allows therapeutics to be better aligned with the specific biology of an individual patient's tumor. Training predictive models for classification requires collection, sequencing, and labeling of patient samples, which incurs significant cost and thus motivates estimating the minimum number of samples required to attain a given classification accuracy. In this work, a minimum sample size estimation strategy for machine learning cancer subtype prediction is developed by combining a subsampling method for learning curve generation with an inverse power law curve fitting method.
AD  - Oregon Health and Science University
AD  - Oregon Health and Science University
AD  - Oregon Health and Science University
AD  - Oregon Health and Science University
AD  - Oregon Health and Science University
AU  - Karlberg, Brian
AU  - Lee, Jordan
AU  - Wong, Chris
AU  - Stuart, Josh
AU  - Ellrott, Kyle
DA  - 2022-04-21
DO  - 10.6083/g158bj19x
ID  - 9633
KW  - Machine Learning
KW  - Neoplasms
KW  - cancer subtypes
KW  - power law curve fitting
KW  - feature engineering
KW  - The Cancer Genome Atlas
KW  - precision oncology
L1  - https://digitalcollections.ohsu.edu/record/9633/files/Karlberg_sample_count_abstract_2022-04-20.pdf
N2  - Accurate classification of cancer subtypes based on genomic data is a key component of precision oncology. Subtyping allows therapeutics to be better aligned with the specific biology of an individual patient's tumor. Training predictive models for classification requires collection, sequencing, and labeling of patient samples, which incurs significant cost and thus motivates estimating the minimum number of samples required to attain a given classification accuracy. In this work, a minimum sample size estimation strategy for machine learning cancer subtype prediction is developed by combining a subsampling method for learning curve generation with an inverse power law curve fitting method.
PB  - Oregon Health and Science University
PY  - 2022
T1  - Machine learning for cancer subtyping: sampling effects on predictive accuracy and feature selection
TI  - Machine learning for cancer subtyping: sampling effects on predictive accuracy and feature selection
UR  - https://digitalcollections.ohsu.edu/record/9633/files/Karlberg_sample_count_abstract_2022-04-20.pdf
Y1  - 2022-04-21
ER  -