The Ctrl+F of Cancer Testing (Featuring A.I.)

How Cancer Diagnosis is Being Made Preventative, and Scalable.

Published in Good Audience, Mar 11, 2019

When we get sick, we rely on our symptoms to give us an idea of what’s up inside our bodies. Fevers, coughing, shivers, they all do an alright job of giving us a good guess of what we’ve come down with.

With cancer, there is no cough, no runny nose, no fever. The symptoms of human cancers differ so much depending on what you have, and it’s hard to know what you don’t know about.

The mole on your skin that you’ve been worried about? The weird sense of nausea you’ve been getting from time to time? The pain in your bladder? Any of these could mean many things, and cancer generally isn’t the first thought.

Wait, pause, there’s already something wrong…

The event that usually kickstarts cancer testing and diagnosis is the patient reacting to something that isn’t quite right. When we rely on reactionary testing to catch a disease that specializes in hiding from the immune system, it’s no wonder so many patients are diagnosed too late.

Cancer Kills Silently

In the realm of cancer testing, the next best thing besides finding someone cancer-negative is to diagnose a patient with early-stage cancer (Stage I or II).

Although the patient is still cancer-positive in this case, they have a high chance of survival because the cancer hasn’t metastasized, meaning it hasn’t spread from its original tumour site to other areas of the body. During the early stages, an operation, combined with the right drugs and monitoring, should usually do the job.

This isn’t the case for stage III and IV cancer patients, who need to deal with both their existing large tumours and likely metastasized tumours elsewhere in the body. It’s no surprise, therefore, that these stages of cancer are more fatal on average.

Preventative diagnosis aims to integrate disease testing and screening into the average physician visit or clinical checkup. Applied to cancer, this would shift the patient from being at the mercy of a reactionary testing process to being able to constantly monitor their body for cancer.

A recent paper published by the University Health Network (UHN) in Toronto is bringing this new generation of cancer testing to light.

Relying on immune responses is not dependable with a disease that naturally avoids the immune system. Only recently have we started figuring out immunotherapies for cancer.

Cell-Free DNA

We can liken the cells in our body to little molecular cities. Just as a city produces municipal waste, our cells do too. Some of this waste comes in the form of DNA junk, known as cell-free DNA. In the context of cancer, healthy cells and tumour cells alike release this cell-free DNA.

Depending on where this cell-free DNA comes from, we classify it differently.

Free-flowing DNA that comes from healthy cells is called circulating-free DNA (cfDNA), while DNA that comes from cancer cells is called circulating-tumour DNA (ctDNA).

On the surface, ctDNA and cfDNA may look like two of a kind: junk DNA just floating in the blood. That is, until we take a deeper look at the epigenetics of these fragments.

DNA Methylation is Where It’s At

All the cells in our body have mostly the same genomic data, so why do we have completely different cells: from intestinal cells to neurons?

The answer lies in epigenetics, and how different enzymes in the body interact with the genome.

The genes in our body are covered in epigenetic markers, which are chemical tags that regulate the expression of such genes, as well as how these genes are wrapped and structured within the cell nucleus.

One of the best-known epigenetic tags is DNA methylation, in which a methyl group is added to the cytosine base at specific DNA regions.

When DNA gets methylated, it usually results in that specific gene or region being repressed, and not fulfilling its function. The opposite is true when areas of DNA don’t possess these methylation tags; they remain actively expressed and fulfil their function.

Looking at the human genome, you will usually find a lot of methylation tags placed in noncoding parts of the genome, known as intergenic regions.

Meanwhile, in crucial genomic regions called promoters, the average methylation is relatively low. This allows the promoters to fulfil their roles in being areas that kickstart the expression of their respective genes.

The kicker? Cancerous ctDNA fragments show the exact opposite pattern of methylation across the genome. Many ctDNA promoters are heavily methylated (hypermethylated), while their intergenic regions are left untouched (hypomethylated).
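To make the "opposite pattern" concrete, here is a toy sketch in Python. The methylation numbers are invented purely for illustration and do not come from the paper, and the helper names (`mean_by_region`, `looks_inverted`) are hypothetical. It simply checks whether promoters end up more methylated than intergenic regions.

```python
from statistics import mean

# Hypothetical methylation fractions (0 = unmethylated, 1 = fully methylated).
# These numbers are invented to illustrate the pattern, not measured data.
healthy = {"promoter": [0.05, 0.10, 0.08], "intergenic": [0.85, 0.90, 0.80]}
tumour = {"promoter": [0.75, 0.80, 0.90], "intergenic": [0.20, 0.15, 0.30]}


def mean_by_region(fragment_set):
    """Average the methylation fractions for each region type."""
    return {region: mean(values) for region, values in fragment_set.items()}


def looks_inverted(profile):
    """True when promoters are more methylated than intergenic regions."""
    return profile["promoter"] > profile["intergenic"]


healthy_profile = mean_by_region(healthy)
tumour_profile = mean_by_region(tumour)

print("healthy inverted?", looks_inverted(healthy_profile))
print("tumour inverted?", looks_inverted(tumour_profile))
```

A real sample would be far noisier than this, which is why the actual test leans on statistics and machine learning rather than a single comparison.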

Profiling Cancer, Harvard Style

Between ctDNA and cfDNA samples, we can amplify the differences in methylation patterns to highlight the regions where the samples differ, known as differentially methylated regions (DMRs). This amplification process allows us to differentiate ctDNA from cfDNA, and thus determine whether cancer is present in a patient, judging by the presence of the ctDNA biomarker.

However, the test can do better than just determine the presence of cancer in a patient. Between different ctDNA samples, ~520,000 DMRs remained. By using ML algorithms to match these DMRs against plasma samples from pre-diagnosed cancer patients, the test managed to differentiate ctDNA samples between 20 different human cancers!

An easy way to visualize this is to imagine our cfDNA test like an Ivy League university admission process.

Between your pool of applicants, you would group the applicants with the highest marks/SAT scores, and separate them from those who applied with lower scores.

This is like separating the ctDNA and cfDNA samples, where we want to find traces of ctDNA, in order to diagnose the presence of cancer in a patient.

However, within your ctDNA (the high-scoring pool), you still need to narrow things down further. Maybe you’re looking for a certain type of person for your school, or in our case, you’re looking for a specific cancer type.

An Ivy League school would then differentiate its high-scoring applicants by their essays and extracurriculars; for us, it’s the 520,000 DMRs between ctDNAs that highlight specific cancer types.

At the end of one thorough process, we can determine whether or not a patient has cancer and what specific type of cancer the patient has, all with ~95% accuracy.

Analyzing DMRs with Machine Learning

The process doesn’t end with our 520,000 DMRs, as the fun has just begun!

Organizing the Data Points

To get a solid data set that is concise and follows the specific trends we want to observe, the DMRs have to be run through Machine Learning algorithms that compress, train on, and visualize the data.

For this test, our algorithms are written in “R”, a data-science programming language designed for visualization and manipulation of data points. By using programs written by members of the R community (aka “packages”), the test data can be worked with effectively.

Our DMRs are expressed as beta values (β values), rating the degree of methylation in the DMR from 0 (no methylation) to 1 (full methylation).

With the beta value, we can visualize a spectrum of methylation between DMRs, distinguishing hypermethylated DNA from hypomethylated DNA.
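As a rough sketch of how a beta value can place a region on that spectrum, the Python snippet below computes beta as the methylated share of the total signal. This definition is an assumption borrowed from common practice, since the article does not say how the test derives its values. The function names and the 0.2/0.8 cut-offs are hypothetical, chosen only for illustration.

```python
def beta_value(methylated, unmethylated):
    """Assumed definition: methylated signal divided by total signal."""
    total = methylated + unmethylated
    if total == 0:
        raise ValueError("no signal measured for this region")
    return methylated / total


def label(beta, low=0.2, high=0.8):
    """Label a region by its beta value; the thresholds are illustrative only."""
    if beta >= high:
        return "hypermethylated"
    if beta <= low:
        return "hypomethylated"
    return "intermediate"


# Hypothetical (methylated, unmethylated) signal counts for three regions.
for counts in [(90, 10), (5, 95), (50, 50)]:
    beta = beta_value(*counts)
    print(counts, round(beta, 2), label(beta))
```

The useful property is that every region lands on the same 0-to-1 scale, so regions with very different raw signal can still be compared directly.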

The data is first put through the “caret” R package, which cuts our overwhelming 520,000 DMRs down to a short and sweet 300, made up of 150 hypermethylated (more methylated) DMRs and 150 hypomethylated (less methylated) DMRs.
From here, the samples are differentiated between cancerous and healthy patients by running the DMR samples through two R packages: “MEDIPS” and “DESeq2”. This identifies the methylation pattern differences between the ctDNA and cfDNA.
Next, our 300 DMRs are run through another R package, “glmnet”, using its binomial models. This package takes our 300 DMRs and trains on them across 7 different data classes, essentially creating 2100 (300 x 7) high-quality training points for our algorithm.
To finally differentiate between ctDNA samples, Harvard style, we run our DMR samples through the “methylKit” R package, in order to profile ctDNAs from 24 different human cancers.
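The first step above, keeping only the most hypermethylated and most hypomethylated regions, can be sketched in a few lines of Python. This is a simplified stand-in that ranks regions by the difference in average beta value. It is not the statistical testing that the R packages perform, and the region names, beta values, and `select_dmrs` helper are all hypothetical.

```python
from statistics import mean


def select_dmrs(tumour, healthy, k):
    """Pick the k most hypermethylated and k most hypomethylated regions.

    tumour and healthy map a region name to its beta values across samples.
    """
    diffs = {region: mean(tumour[region]) - mean(healthy[region]) for region in tumour}
    ranked = sorted(diffs, key=diffs.get)  # most negative difference first
    hypo = [r for r in ranked[:k] if diffs[r] < 0]
    hyper = [r for r in ranked[::-1][:k] if diffs[r] > 0]
    return hyper, hypo


# Invented beta values for four regions, two samples per group.
tumour = {"region_a": [0.9, 0.8], "region_b": [0.1, 0.2],
          "region_c": [0.5, 0.5], "region_d": [0.7, 0.9]}
healthy = {"region_a": [0.1, 0.2], "region_b": [0.8, 0.9],
           "region_c": [0.5, 0.4], "region_d": [0.6, 0.6]}

hyper, hypo = select_dmrs(tumour, healthy, k=1)
print("hyper:", hyper)
print("hypo:", hypo)
```

With k set to 150, the same idea would yield the 150 + 150 = 300 regions described above, which across 7 classes gives the 300 x 7 = 2100 training points.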

What we end up with: a lot of data points in high dimensions, and an accurate profile of a patient’s cancer, all before any formal screening or imaging.

Sandwiching 7D Data

It’s hard to imagine 7-dimensional or 8-dimensional anything, let alone cancer data. These high-dimensional data points therefore need to be compressed into something we can comprehend, such as 2D or 3D.

Using a special tool called t-distributed stochastic neighbour embedding (t-SNE), we can compress the high-dimensional data and approximate it in an equivalent 2D or 3D graph.

Some approximation is involved in translating this high-dimensional data to 2D/3D; however, t-SNE is designed to keep points that are neighbours in high dimensions close together in the compressed view.

This is like squishing down a sandwich in a sandwich grill: your toppings (the high-dimensional data) will get their physical form slightly altered.

A simplified explanation of t-SNE’s main purpose, minus the fundamental equation. (PC: https://www.slideshare.net/ssuserb667a8/visualization-data-using-tsne)

Once our data is compressed and organized, our algorithms are prepared to do their job. Data biases have to be accounted for afterwards, and from there the test also needs to figure out the cancer stage.
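For the curious, here is a minimal Python sketch of just the first step of t-SNE: turning distances between high-dimensional points into "neighbour probabilities". It assumes a single fixed `sigma` for simplicity, whereas real t-SNE tunes it per point through a perplexity setting. The 7-dimensional points are invented for illustration, and the later steps (building the low-dimensional map and optimizing it) are left out.

```python
import math


def neighbour_probabilities(points, i, sigma=1.0):
    """Conditional probabilities p(j|i) that point i picks point j as a neighbour."""
    weights = []
    for j, other in enumerate(points):
        if j == i:
            weights.append(0.0)  # a point is never its own neighbour
            continue
        sq_dist = sum((a - b) ** 2 for a, b in zip(points[i], other))
        weights.append(math.exp(-sq_dist / (2 * sigma ** 2)))
    total = sum(weights)
    return [w / total for w in weights]


# Three invented 7-dimensional points: the first two are close, the third is far away.
points = [[0.0] * 7, [0.1] * 7, [3.0] * 7]

probs = neighbour_probabilities(points, i=0)
print(probs)
```

Nearby points receive almost all of the probability and far-away points almost none. t-SNE then arranges the 2D or 3D map so that its own neighbour probabilities match these as closely as it can.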

Final Thoughts and Key Takeaways

Cancer is the real “Silent Killer” of human health, sneaking past our immune system and outsmarting our natural bodily reactions by design. Reactionary testing cannot be expected to outperform something that by nature often doesn’t cause a reaction until it’s too late.

As we’re dealing with a disease that kills with time and stealth, both speed and detection are our greatest assets. Preventative cancer testing is putting us in a position to save so many lives from having to go through the prolonged fight against cancer!

Here’s what you gotta know about this cancer detection tool of the future:

Our means of cancer diagnosis right now often rely on a reaction to kickstart everything.
Differences in DNA methylation patterns between cancerous and healthy cell-free DNA (ctDNA and cfDNA respectively) allow us to thoroughly profile cancer, if present, in a patient.
This test uses a series of packages in the “R” data science language, in order to break down, filter, and train our algorithms on the data inputs.
A preventive cancer diagnosis will help save countless lives from late-stage cancer diagnosis, giving more and more patients a fighting chance of survival!

With the cancer test, a patient’s fight against cancer may start by noticing the presence of ctDNA in her blood plasma, during a regular clinical checkup.

Imagine your grandmother getting her regular clinical blood test. Her blood plasma is run through the test, and her DMRs indicate that she has early-stage breast cancer.

Because she was seamlessly being tested for cancer in her previous checkups, the sudden development of cancer isn’t life-changing news. She starts early-stage cancer treatment, monitors her condition, and is on the path to full recovery.