#DF15: Translating A Trillion Data Points Into Insights

Dr. Atul Butte of UCSF gave a keynote address at Dreamforce today on the value we can gain from big data. The session abstract:

There is an urgent need to take advantage of our vast new understanding of human genetics and computer technology to improve health and make the practice of medicine more precise. Dr. Atul Butte’s lab at UCSF focuses on building and applying computational tools to convert hundreds of trillions of points of molecular, clinical and epidemiological data collected by researchers worldwide, commonly known as “big data,” into new diagnostics, therapeutics and insights into both rare and common diseases. Dr. Butte, a computer scientist and pediatrician, will highlight how publicly-available molecular measurements can be used to find new uses for drugs, including drugs for inflammatory bowel disease and cancer, and to discover new diagnostics for complications during pregnancy, and how the next generation of biotech companies might even start in your garage.

Dr. Butte started with a list of conflicts of interest. He noted the achievement in that list he is most proud of: the students of his who have started companies.

Statistic: The human species produces two zettabytes of data every year. That’s a REALLY huge number.

Statistic: By the end of the decade, NASA says they will produce an exabyte of data every day from their telescopes alone.

We have so much scientific data now that science itself is getting obsolete. OK, more precisely, it’s the scientific method: the data deluge makes the scientific method obsolete. The new challenge is in understanding the data.

Where does the data come from?

Think of the Gene Chip. It used to cost $60,000 each, but now you just put a cell onto the chip and get a DNA result. And times have changed again: you can now run 384 samples at a time for just a couple hundred dollars. The amazing part is not the chips; those are well known. It’s that scientists are now sharing huge amounts of data. There are 1.8 million samples of cancer, diabetes, heart disease, etc.

Quote: The data is just sitting there waiting for citizen scientists to do something magical. Any high school kid can go to these websites and download 60,000 samples of breast cancer.

Note: Brittany Wenger did exactly that. Her project received the top prize that year. The next year, she got the top prize again, this time for leukemia.

The database has more samples than any one researcher has. 2,445 labs have already shared their data with this database.

Dr. Butte refers to this as retroactive crowd-sourcing.

Linnaeus is known for categorizing all living species. But he was also the first one to put diseases into categories. He got it completely wrong. However, doctors still use nosology (the classification of disease) every day. Today it’s known as ICD-9 and ICD-10. ICD started in the 1860s. It became a League of Nations standard. We laugh at ICD, but Dr. Butte says that 20 years from now, people will laugh at ICD-10.

Quote: We are in a molecular revolution.

But it costs between $3B and $12B to develop a new drug. We are reaching the point where it’s getting hard to sustain that level of development cost. The pharma industry might lose $250 trillion in 2018. If that’s the case, where are the new drugs going to come from?

It’s going to come down to us to solve the problem and find new drugs.   You can put disease genes in a database and then put in drug gene expressions.  That’s a lot of data and a lot of ideas.  You can find new uses for drugs.
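The notes do not spell out how the matching works. One common way to frame it, sketched here purely as an assumption, is to look for drugs whose gene expression changes run opposite to a disease’s changes. The gene names, drug names, and numbers below are made up for illustration, and `pearson` and `rank_drugs` are hypothetical helpers, not anything shown in the talk.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def rank_drugs(disease_sig, drug_sigs):
    """Sort drugs from most opposite to most similar to the disease signature."""
    genes = sorted(disease_sig)
    disease_values = [disease_sig[g] for g in genes]
    scores = {
        name: pearson(disease_values, [sig[g] for g in genes])
        for name, sig in drug_sigs.items()
    }
    # Most negative correlation first: those drugs push genes the other way.
    return sorted(scores.items(), key=lambda item: item[1])

# Toy data (hypothetical): positive = gene turned up, negative = turned down.
disease = {"GENE_A": 2.0, "GENE_B": -1.5, "GENE_C": 0.5, "GENE_D": -0.8}
drugs = {
    "drug_x": {"GENE_A": -1.8, "GENE_B": 1.2, "GENE_C": -0.4, "GENE_D": 0.9},
    "drug_y": {"GENE_A": 1.9, "GENE_B": -1.0, "GENE_C": 0.7, "GENE_D": -0.5},
    "drug_z": {"GENE_A": 0.1, "GENE_B": 0.2, "GENE_C": -0.1, "GENE_D": 0.0},
}

ranked = rank_drugs(disease, drugs)
for name, score in ranked:
    print(f"{name}: {score:+.2f}")
```

In this toy setup, the drug at the top of the list is the one that most strongly reverses the disease pattern, which would make it a candidate worth testing. Real analyses involve thousands of genes and careful statistics.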

That leads you to new services. Assay Depot will let you buy a variety of experiments. Choose a test and find out how much it costs and how many mice are needed. The price for this is $9,000. You can get the entire experiment done via ecommerce. This highlights the increasing commoditization of these tests. It’s so cheap, you can order the test twice and see if it’s reproducible.


  • Found an anti-seizure drug that can treat inflammatory bowel disease.
  • Found that a psychiatric drug, imipramine, can treat lung cancer.

The total cost to take imipramine to trials is only $50,000. That’s a lot less than $3B-$11B. It started with data that led to mouse tests that led to clinical trials.

How to get there

Venture capital can help this knowledge get out of academia to private parties. Dr. Butte has a new company that received $3.5M in funding. It can work. But that’s not the only reason for doing this. It’s the letters from terminally ill cancer patients who need these drugs and are begging for anything to help.

What else can you do?

This data has possibilities with diagnostics or tests. You can create computer predictions.  Of course, you can go to an ecommerce company which can find the right blood samples you need to test your predictions.  In other words, the hard part is not getting a blood sample of someone who has ALS.  The hard part

Example: Preeclampsia (a big cause of death in pregnant mothers and unborn children). They saw an unmet need, looked at the data, designed a test, tried it on samples, got a result, and can now test for it. Carmenta Bioscience was started, received venture funding, and was sold within 24 months.

Progress: DNA sequencing has shrunk from a factory to a USB stick and now costs only $1,500. In 10 years it will cost $33.
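For a sense of scale, those two prices imply a steady yearly drop. Here is a small sketch of the arithmetic, assuming (this is not from the talk) that the price falls by the same percentage every year:

```python
# Figures quoted in the notes above.
price_now = 1500.0    # dollars today
price_future = 33.0   # dollars in 10 years
years = 10

# Assumption (not from the talk): the price shrinks by a constant factor each year.
yearly_factor = (price_future / price_now) ** (1 / years)
yearly_drop_pct = (1 - yearly_factor) * 100

print(f"Implied price drop: about {yearly_drop_pct:.0f}% per year")
```

Under that assumption, the price would need to fall by roughly a third every year for a decade.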

It will soon be routine to get your cancer sequenced to see if another drug might work on it.


Just measuring something has an effect. If you use a Fitbit or another tool, you can get results.

Think of the opportunity of getting this data to your doctor. Scanadu has a simple diagnostic tool which can measure temperature, ECG, heart rate, etc. They are funded on Indiegogo.

Next Big Data Opportunity

Half of all clinical trials fail. No one even writes a paper on them. But what if you could share all that data? There’s so much you could do with all that data.

What is UCSF Doing?

UCSF is leading a public-private effort to advance data-driven medicine. It’s a $3M investment, based at the Institute for Computational Health Sciences. They want to build the strongest team in the world to gain insights from the onslaught of data.

Bottom Line

Dr. Butte fundamentally believes the next Genentech will be founded in a dorm room or someone’s garage.

Big data is about a lot of things like mobile, etc. But it’s really about HOPE.


Michael Porter

Mike Porter leads the Strategic Advisors team for Perficient. He has more than 21 years of experience helping organizations with technology and digital transformation, specifically around solving business problems related to CRM and data.
