Restricted Research - Award List, Note/Discussion Page

Fiscal Year: 2023

1532  The University of Texas at Arlington  (143420)

Principal Investigator: Suvra Pal, suvra.pal@uta.edu, (682) 248-7470

Total Amount of Contract, Award, or Gift (Annual before 2011): $ 452,082

Exceeds $250,000 (Is it flagged?): Yes

Start and End Dates: 6/1/23 - 5/31/26

Restricted Research: YES

Academic Discipline: Department of Mathematics

Department, Center, School, or Institute: none

Title of Contract, Award, or Gift: Using Machine Learning to Improve the Predictive Accuracy of Disease Cure

Name of Granting or Contracting Agency/Entity: National Institutes of Health
CFDA Link: HHS 93.859

Program Title: NIH R15
CFDA Linked: Biomedical Research and Research Training

Note:

(SAM Category 1.1.1.) With recent advancements in screening, diagnosis, and treatment, many diseases are identified at an early stage, and a significant proportion of patients suffering from these diseases are clinically cured. These patients will never experience recurrence, metastasis, or death due to the primary disease. Among patients with early-stage disease, it is clinically important to identify cured patients early, based on their pre-treatment characteristics, so that they can be protected from the additional risks of high-intensity treatments. Similarly, identifying uncured patients early is important so that they can be treated in a timely manner, before their disease progresses to an advanced stage for which therapeutic options are limited. Such identification is also crucial for clinical trials that aim to develop effective adjuvant therapies. Thus, there is an immense need for a predictive model that takes patient survival data and any available patient-related characteristics (or features) as simple inputs and predicts the cured or uncured status of patients with high accuracy. Existing state-of-the-art models capable of such prediction have several drawbacks that make them ill-suited to the increasing needs of advanced applications. These include a lack of biological motivation; restrictive assumptions; non-robustness and global convergence problems in the associated estimation procedures; an inability to handle high-dimensional data efficiently, which leads to imprecise prediction of cured/uncured status; and the unavailability of the models as ready-to-use software packages, with most requiring substantial programming experience for successful implementation. The proposed research seeks to address these issues by developing a next-generation model, with decreased complexity and lower computational cost, for highly accurate prediction of cured or uncured status in the presence of high-dimensional data.
The novel idea is to integrate machine learning with a modern predictive statistical model to capture complex relationships in the data, specifically in a high-dimensional setting. We hypothesize that capturing such complex relationships will greatly improve the predictive accuracy of cured/uncured status and will also improve prediction of the survival distribution of uncured patients. In particular, the following specific aims are proposed. Aim 1: To develop a novel support vector machine-based predictive model that can capture the patient population as a mixture of cured and uncured patients; Aim 2: To develop new computationally efficient estimation and feature selection methods that can handle high-dimensional data; Aim 3: To develop a new method for validating the proposed model using existing data and to develop an R software package for free and non-profit use. Successful completion of this research will aid in treatment assignment and help address the need to develop effective adjuvant therapies for the overall benefit of patients.
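
Aim 1 describes the patient population as a mixture of cured and uncured patients. The note does not give the model's form, so the following is only a minimal sketch of the standard mixture cure formulation, S_pop(t) = (1 - p) + p * S_u(t), where p is the probability of being uncured and S_u is the survival function of uncured patients. The function name, the parameter names, and the exponential form of S_u are assumptions made purely for illustration; in the proposed research, p would presumably come from a support vector machine-based classifier rather than being supplied as a fixed number.

```python
import math


def population_survival(t, uncured_prob, hazard_rate):
    """Population survival under a standard mixture cure model.

    S_pop(t) = (1 - p) + p * S_u(t), where p = uncured_prob.
    S_u is ASSUMED exponential here (illustration only); the award
    note does not specify the survival distribution of the uncured.
    """
    if not 0.0 <= uncured_prob <= 1.0:
        raise ValueError("uncured_prob must lie in [0, 1]")
    if t < 0 or hazard_rate <= 0:
        raise ValueError("t must be >= 0 and hazard_rate must be > 0")
    # Survival of uncured patients at time t (hypothetical exponential form).
    uncured_survival = math.exp(-hazard_rate * t)
    # Cured patients never experience the event, so they contribute (1 - p).
    return (1.0 - uncured_prob) + uncured_prob * uncured_survival
```

Under these assumptions the curve starts at 1 when t = 0 and levels off at the cured fraction, 1 - p, as t grows, which is the plateau that distinguishes a cure model from an ordinary survival model.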

Discussion: No discussion notes

 
