000003064 001__ 3064
000003064 005__ 20250220094651.0
000003064 0247_ $$2DOI$$a10.6083/M4P84BF8
000003064 037__ $$aETD
000003064 245__ $$aNoise accumulation in high dimensional classification
000003064 260__ $$bOregon Health and Science University
000003064 269__ $$a2018
000003064 336__ $$aThesis
000003064 502__ $$bM.S.
000003064 502__ $$gBiostatistics
000003064 520__ $$aA tremendous amount of attention has been paid to Big Data in recent years. Such data hold promise for scientific discoveries but also pose challenges for analysis. In their 2014 article “Challenges to Big Data analysis,” Fan and colleagues propose that the high dimensionality of Big Data introduces statistical problems, including noise accumulation. This thesis explores noise accumulation in high dimensional two-group classification problems. First, it aims to determine whether noise accumulation threatens the discriminative ability of classifiers developed with three common machine learning approaches: random forest, support vector machine, and boosted classification trees. Four scenarios with differing amounts of signal strength are simulated to evaluate each method. After determining that noise accumulation may impact the performance of these classifiers, the thesis characterizes factors that affect noise accumulation. Simulations varying sample size, signal strength, signal strength proportional to the number of predictors, and signal magnitude are conducted with random forest classifiers. Finally, this thesis develops the Total Signal Index to summarize the amount of signal relative to noise in a two-group classification problem. Theoretical and empirical versions of this measure are defined, and simulations are used to assess them.
000003064 540__ $$fCC BY
000003064 542__ $$fIn copyright - single owner
000003064 650__ $$aMachine Learning$$011449
000003064 650__ $$aClassification$$016768
000003064 650__ $$aSupport Vector Machine$$039736
000003064 650__ $$aBig Data$$012808
000003064 650__ $$aRandom Forest$$013949
000003064 650__ $$aBiostatistics$$038720
000003064 691__ $$aOHSU-PSU School of Public Health$$041366
000003064 692__ $$aDepartment of Public Health and Preventative Medicine$$041444
000003064 7001_ $$aElman, Miriam R.$$uOregon Health and Science University$$041354$$10000-0003-3162-6500
000003064 7201_ $$aChoi, Dongseok$$uOregon Health and Science University$$041354$$7Personal$$eAdvisor
000003064 8564_ $$99cedd3fc-fe8a-403b-bc88-bedeeba79fb6$$s3797965$$uhttps://digitalcollections.ohsu.edu/record/3064/files/3997_etd.pdf
000003064 905__ $$a/rest/prod/vq/27/zn/63/vq27zn63h
000003064 909CO $$ooai:digitalcollections.ohsu.edu:3064$$pstudent-work
000003064 980__ $$aTheses and Dissertations
000003064 980__ $$aDual Author Affiliations Cleanup