Colorectal Cancer

An innovative method to identify and screen patients at silent risk for colorectal cancer


Colorectal cancer (CRC) is readily amenable to early diagnosis through fecal occult blood testing (FOBT) and/or colonoscopy, as well as subsequent curative treatment through polyp removal. However, many health systems report low participation in screening. There is a need to identify members at highest risk for developing the disease in order to target intensive outreach for screening.

Context and Aims

Low participation in CRC screening has been reported in various countries. One approach to addressing this challenge is to educate the general population on the importance of undergoing screening; another is to conduct targeted screening among the patients who are most likely to develop CRC.

The goal of this project was to identify early predictors for CRC and to create a risk-scoring model in order to conduct directed screening outreach.

Research Approach

We identified people who should be screened based on their age (50-75 years) but had not undergone a recent FOBT and/or colonoscopy. Among more than 850,000 members, we then identified those who developed CRC during the course of a year.
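The eligibility rule described above can be sketched as a simple filter. This is a minimal illustration only: the `Member` fields, the function names, and the two look-back windows (one year for FOBT, ten years for colonoscopy) are assumptions made for the example, since the text does not define what counts as "recent".

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Member:
    # Hypothetical record layout; the real data warehouse schema is not described.
    member_id: str
    age: int
    last_fobt: Optional[date] = None
    last_colonoscopy: Optional[date] = None


def is_recent(test_date: Optional[date], as_of: date, window_days: int) -> bool:
    """True if a test exists and was done within the look-back window."""
    return test_date is not None and (as_of - test_date).days <= window_days


def needs_screening(member: Member, as_of: date,
                    fobt_window_days: int = 365,
                    colonoscopy_window_days: int = 3650) -> bool:
    """Age 50-75 (from the text) and no recent FOBT or colonoscopy.

    The window lengths are assumed defaults, not values from the project.
    """
    if not 50 <= member.age <= 75:
        return False
    recently_screened = (
        is_recent(member.last_fobt, as_of, fobt_window_days)
        or is_recent(member.last_colonoscopy, as_of, colonoscopy_window_days)
    )
    return not recently_screened


as_of = date(2020, 1, 1)
members = [
    Member("a", 60),                                     # never screened
    Member("b", 60, last_fobt=date(2019, 6, 1)),         # recent FOBT
    Member("c", 45),                                     # outside age range
    Member("d", 70, last_colonoscopy=date(2005, 1, 1)),  # colonoscopy too old
]
outreach_ids = [m.member_id for m in members if needs_screening(m, as_of)]
```

Keeping the windows as parameters makes it easy to rerun the selection under a different definition of "recent".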

An iterative machine learning-based model was created using over 1,000 clinical variables to identify which constellations of indicators and symptoms were strongly associated with the onset of CRC.
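The text does not say which learning algorithm was used. As a stand-in, the sketch below fits a plain logistic regression by gradient descent on a toy data set, to show how an iteratively fitted model can turn binary clinical indicators into a risk score. The two indicators, the data, and every function name are hypothetical and chosen only for illustration.

```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_risk(weights, bias, x):
    """Predicted probability of the outcome for one feature vector."""
    return sigmoid(bias + sum(w * v for w, v in zip(weights, x)))


def fit_logistic(rows, labels, epochs=500, lr=0.1):
    """Fit weights by batch gradient descent on the log-loss."""
    n = len(rows)
    weights = [0.0] * len(rows[0])
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for x, y in zip(rows, labels):
            error = predict_risk(weights, bias, x) - y
            for j, v in enumerate(x):
                grad_w[j] += error * v
            grad_b += error
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


# Toy data: columns are [indicator_a, indicator_b]; label 1 = developed the outcome.
rows = [[1, 0], [1, 1], [1, 0], [0, 1], [0, 0], [0, 1], [0, 0], [1, 1]]
labels = [1, 1, 0, 0, 0, 0, 0, 1]

weights, bias = fit_logistic(rows, labels)
risk_with_a = predict_risk(weights, bias, [1, 0])
risk_without_a = predict_risk(weights, bias, [0, 0])
```

In this toy set the outcome occurs only among members with `indicator_a`, so the fitted weight for that column is positive and the predicted risk is higher when it is present. A model over more than 1,000 variables would also need regularization and validation on held-out data, which this sketch omits.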


Nursing staff at local clinics are to receive, from the central data warehouse, a list of high-risk patients determined by the risk model on the basis of each patient's integrated medical history.
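One way such per-clinic lists could be assembled from model output is sketched below. The tuple layout, the clinic names, and the 0.01 cutoff are assumptions made for the example; the text does not state how the high-risk threshold was set.

```python
from collections import defaultdict


def build_clinic_lists(scored_members, threshold=0.01):
    """Group members at or above the threshold by clinic, highest score first.

    scored_members: iterable of (member_id, clinic, risk_score) tuples.
    threshold: assumed cutoff for "high risk"; not a value from the project.
    """
    by_clinic = defaultdict(list)
    for member_id, clinic, score in scored_members:
        if score >= threshold:
            by_clinic[clinic].append((member_id, score))
    return {
        clinic: sorted(entries, key=lambda entry: entry[1], reverse=True)
        for clinic, entries in by_clinic.items()
    }


# Hypothetical scores produced by a risk model.
scored = [
    ("m1", "north", 0.020),
    ("m2", "north", 0.004),
    ("m3", "south", 0.015),
    ("m4", "north", 0.031),
]
clinic_lists = build_clinic_lists(scored)
```

Sorting each list by descending score lets clinic staff start outreach with the patients the model ranks highest.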

Patients are contacted for a primary care visit, where the physician screens them for initial suggestive signs and symptoms. If indicated, they are advised to undergo an FOBT or screening colonoscopy and are referred to a specialist for additional consultation.



Key Findings and/or Potential Impact

The rate of CRC was ten times higher in the high-risk group than in the general population (~1% versus ~0.1%, respectively). The algorithm was applied to flag several thousand patients at increased risk of developing CRC in the upcoming year. Over 3,500 patients were identified who would have been unlikely to be captured by the previous screening policy.
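The arithmetic behind the tenfold figure, and what it implies for outreach effort, can be written out as follows. The two rates are the approximate values quoted above. The "people per expected case" figures are an illustrative derivation that assumes each person's one-year risk equals the group rate; they are not results reported by the project.

```python
# Approximate one-year CRC rates quoted in the text.
high_risk_rate = 0.01   # ~1% in the high-risk group
general_rate = 0.001    # ~0.1% in the general population

# How many times more frequent CRC is in the high-risk group.
rate_ratio = high_risk_rate / general_rate

# People who would need to be followed to expect one CRC case,
# under the simplifying assumption stated in the lead-in.
people_per_case_high_risk = 1 / high_risk_rate
people_per_case_general = 1 / general_rate

print(f"Rate ratio: about {rate_ratio:.0f}x")
print(f"People per expected case (high risk): about {people_per_case_high_risk:.0f}")
print(f"People per expected case (general): about {people_per_case_general:.0f}")
```

Under these assumptions, outreach to the high-risk group reaches one expected case for roughly every 100 people contacted, compared with roughly 1,000 in the general population.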

An integrated health care system that combines clinical data with multiple levels of care provision can readily identify signals that are associated with populations at high risk for CRC and create a program for targeted screening among a small group that is likely to have a high positive predictive value.

This innovative type of intervention is more practical than employing broad educational efforts among the general population and is more likely to be successful at identifying and treating patients who are most likely to develop CRC. We hypothesize that if a smaller sample of highly targeted patients were to be screened, we could identify up to an additional 10% of CRC cases prior to the onset of clinical symptoms.