BIOMEDIN 206
Informatics in Industry
A repository of notes from a seminar series in the School of Medicine.
Week 8 - IBM Research
- Dr. Jianying Hu, Distinguished Research Staff Member and Senior Manager, Health Informatics Research
IBM Research
- IBM Research has 12 labs globally
- 8 of the 12 labs do healthcare-related research
- IBM Watson Healthcare is a subdivision of the Watson research team that applies Watson's NLP to real-world problems
Data-Driven Research vs. Knowledge-Driven Research
- Knowledge-driven research leverages Watson's NLP abilities to draw inferences from published literature, books, and guidelines
- Think crowd-sourced knowledge aggregation
- Personalized decisions in Healthcare. Linking patient similarities to construct predictive models from longitudinal data.
Patient Similarity Analysis
- Applying ML algorithms to automatically learn the best similarity metric from observational data and labels provided by experts. Training a similarity matrix to find clinically similar patients.
- Use Cases:
- Identifying patients likely to have an Acute Hypotensive Episode - pulling out patients with similar trajectories to predict what might happen to the current patient.
- Patient CarePathFlow - visualizes care pathways and associated outcomes. Virtualizes and scales the traditional way that physicians probe their own and their colleagues' knowledge of what worked and what didn't.
- CHF Predictive Modeling (NIH grant): Data-driven insights that found over 20k features with a novel feature selection algorithm.
- Temporal Event Pattern Mining - identifies frequent patterns in the treatment of a particular group of patients.
- Network-based approach for combining patient and drug similarity
- Edges between patients represent similarity of patients
- A patient-drug prior association indicates how likely a drug is to work on a patient.
- PARAMO (Parallel Predictive Modeling Platform)
- Their deployment strategy is not unlike Palantir's: they deploy engineers to integrate technologies into the client's database
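The patient similarity idea above can be illustrated with a small sketch. This is not IBM's algorithm: the feature names, the patient IDs, and the hand-set `weights` are all hypothetical stand-ins for a metric that would be learned from expert labels. It only shows how a weighted distance ranks a cohort against a query patient.

```python
import math

def weighted_distance(a, b, weights):
    """Weighted Euclidean distance; the weights stand in for a learned metric."""
    return math.sqrt(sum(w * (x - y) ** 2 for x, y, w in zip(a, b, weights)))

def most_similar(query, cohort, weights, k=2):
    """Return the IDs of the k cohort patients closest to the query patient."""
    ranked = sorted(
        cohort.items(),
        key=lambda item: weighted_distance(query, item[1], weights),
    )
    return [patient_id for patient_id, _ in ranked[:k]]

# Hypothetical normalized features: [blood pressure, heart rate, age]
cohort = {
    "p1": [0.90, 0.10, 0.5],
    "p2": [0.20, 0.80, 0.5],
    "p3": [0.85, 0.20, 0.1],
}
# Assumed weights: age is treated as clinically less relevant here
weights = [1.0, 1.0, 0.1]
query = [0.88, 0.15, 0.9]

print(most_similar(query, cohort, weights, k=2))
```

In a learned-metric setting, the weights (or a full matrix) would be fit so that patients the experts label as similar end up close together.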
How to address sparsity/bias?
- EMR Densification uses matrix decomposition
- EMR includes outpatient visits and primary care physicians (more comprehensive than just the bias in the hospitalization record)
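The notes only say that densification "uses matrix decomposition", so the following is a rough sketch of one common approach, not necessarily the one presented: fill missing entries with column means, take a truncated SVD, and use the low-rank reconstruction for the missing cells only. The function name `densify`, the `rank` parameter, and the toy matrix are assumptions for illustration.

```python
import numpy as np

def densify(matrix, rank=1):
    """Fill NaN entries of a patient-by-feature matrix from a low-rank SVD."""
    m = np.asarray(matrix, dtype=float)
    missing = np.isnan(m)
    # Initial guess for missing cells: the mean of each column's observed values
    col_means = np.nanmean(m, axis=0)
    filled = np.where(missing, col_means, m)
    # Keep only the top `rank` singular components
    u, s, vt = np.linalg.svd(filled, full_matrices=False)
    low_rank = (u[:, :rank] * s[:rank]) @ vt[:rank, :]
    # Observed values are kept as-is; only missing cells are replaced
    return np.where(missing, low_rank, m)

# Toy record matrix: rows are patients, columns are measurements, NaN is unobserved
records = [
    [1.0, 2.0, np.nan],
    [2.0, 4.0, 6.0],
    [3.0, np.nan, 9.0],
]
dense = densify(records, rank=1)
print(dense)
```

This sketch assumes every column has at least one observed value; a real pipeline would also need to iterate and validate the imputed values.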
Week 7 - Janssen Research
- Patrick Ryan - Head of the Epidemiology analytics group
- Cornell grad, Pfizer, Arizona Arthritis, GlaxoSmithKline, OMOP (Observational Medical Outcomes Partnership), UNC-PhD (Pharmaceutical Policy), Janssen/J&J, OHDSI, Columbia Faculty (Biomedical Informatics)
What do we do?
- Janssen R&D is the pharmaceutical research arm of Johnson & Johnson
- We answer questions founded on observational data
Where do we get data?
- Government resources (Medicare/Medicaid), insurance providers, EMR inpatient/outpatient data
- Healthcare is like blind men and an elephant - datasets all have their own particular biases.
- No single data source is enough; we need to aggregate and distill answers from the different data sources that we have
Data pipeline
- Convert and structure data into schemas
- ETL design, implement, OMOP CDM, ETL test
- ...ended up skipping this slide
Evidence Sharing paradigms
- Single study: write protocol, code, execute, evidence (publish paper)
- Real-time query: develop app, design query, submit job, gather evidence
- Large-scale analytics: develop app, execute, explore results (making information more accessible to more people)
Case study: Profiling diabetes treatment utilization
- Visualizing who is taking these treatments and which treatments they're taking, in order to determine who the treatments will be most effective for.
What the future holds
- Patrick envisions a world that no longer is driven by single studies but rather by large-scale collaborative efforts (think: a massive grid that cross-references drugs and what effect they have)
Week 6 - Optum Labs
- Paul Wallace (MD) CMO, VP of Optum Labs
- Started off at Kaiser and led its integration with Epic's EMR system
How do we create a learning healthcare system?
- Key Tenets:
- Decision-making based on data
- Iteration/ongoing improvement of practices
- Shared learning across providers, patients, networks
- Patient-consumer centered and engaged
- Dynamism and Complexity are assumed
- Dynamic Change
- We're data-gatherers (think hunter-gatherers)
Can we create learning in other environments?
- Value proposition in healthcare is changing: volume -> value
- How can we get for-profit and non-profit entities to work on the same project?
- e.g. Life sciences + delivery companies
- Research -> Clinical Translation -> Innovation -> Policy change
- Get systematic about innovation (e.g. Bell Labs, Procter & Gamble): make it reproducible, reliable, flexible
- Architect systems that are not just technical but also social
UnitedHealth and Optum
- UnitedHealth Group is the publicly traded parent company
- 60 million people in the US get benefits from UnitedHealthcare
- Optum drives $50B+ and employs 80,000 to provide health services in addition to the benefits that UnitedHealthcare provides.
- Optum now has ~24 partners
Data today
- Data is deidentified (common hashing + salt across different services)
- Aggregates data that comes from clinical records and insurance claims
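As a rough illustration of the "common hashing + salt" point above: if every source hashes patient identifiers with the same salt, the same person gets the same opaque token in clinical and claims data, so records can be linked without exposing the identifier. This is a minimal sketch, not Optum's actual scheme; the salt value, the identifier format, and the function name `pseudonymize` are made up for the example.

```python
import hashlib

def pseudonymize(patient_id: str, shared_salt: bytes) -> str:
    """Return a salted SHA-256 token for an identifier (hex string)."""
    return hashlib.sha256(shared_salt + patient_id.encode("utf-8")).hexdigest()

# Hypothetical salt shared by all contributing services (kept secret in practice)
SHARED_SALT = b"example-shared-salt"

clinical_token = pseudonymize("patient-12345", SHARED_SALT)
claims_token = pseudonymize("patient-12345", SHARED_SALT)
other_token = pseudonymize("patient-67890", SHARED_SALT)

# Same person, same salt: tokens match, so the two sources can be joined
print(clinical_token == claims_token)
# Different person: tokens differ
print(clinical_token == other_token)
```

Hashing identifiers is only one piece of deidentification; the remaining fields (dates, zip codes, rare diagnoses) can still be identifying on their own.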
Precision Medicine
- Much more than just personalized genomics
- Started out as just trying to better understand heart disease and heart failure cases
- Optum provides a commons where different groups can work on a problem with the same data
- What about deep phenotyping?
- Can we identify people that would benefit more from different types of delivery rather
- It's not just a medical problem. We need to include more than just the data about the body, and look at people's environments as well (e.g. improved prognosis from having a wife that cooks rather than living alone)
- "n of 1"
- Decision support: when a doctor sees a patient, they can look at a population of similar patients and know what has worked well in that population or community
- Can we create an Amazon-like service that caters to this?
How much do we waste in healthcare?
- 30% is wasted (doesn't benefit the patient)
- Categories: unnecessary services, inefficiently delivered services, prices that are too high, excessive administrative costs, fraud, missed prevention opportunities
- "pay-go" structure
Week 5 - DNAnexus
- George Asimenos (PhD) - Director of Science at DNAnexus, EE undergrad (from Greece), '09 PhD in CS.
- Joined startup started by his advisor after looking for post-docs
- Role model was Craig Venter, was interested in why he was always one step ahead of other people
- Crazy idea: if we can get blood from a crime scene, can we construct a genetic profile of the suspect?
- Powers of ten - a ten-fold magnification into the DNA
- Worked on ENCODE project while at Stanford
Genetics Background
- Mapping - comparing chunks of the genome to see what parts are identical and what bases have been mutated
- Variation Calling - creates a file with variants that details where the current genome has deviated from a reference genome
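To make the variant-file idea concrete, here is a toy sketch. It assumes the sample has already been mapped and is the same length as the reference, and it only reports single-base substitutions. Real variant callers work from many overlapping reads and handle insertions, deletions, and sequencing errors; the function name `call_variants` and the sequences are invented for illustration.

```python
def call_variants(reference: str, sample: str):
    """List (1-based position, reference base, sample base) for each mismatch."""
    if len(reference) != len(sample):
        raise ValueError("this toy example assumes equal-length, pre-aligned sequences")
    return [
        (index + 1, ref_base, sample_base)
        for index, (ref_base, sample_base) in enumerate(zip(reference, sample))
        if ref_base != sample_base
    ]

reference = "ACGTACGT"
sample = "ACGAACGC"

for position, ref_base, alt_base in call_variants(reference, sample):
    print(f"pos {position}: {ref_base} -> {alt_base}")
```

Storing only the differences is what keeps a variant file far smaller than the full genome it describes.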
DNAnexus
- Funded by Google Ventures
- Computation happens dynamically and elastically - computers are spun up on the spot to handle computation (sounds like AWS for genomes...)
- Global and collaborative - their code is also only for developers and researchers to collaborate all over the world
- Demo: showed us a cloud-based data analytics platform to manage DNA sequence data
- Their software interfaces with and creates a secure network between R&D departments and clinical centers
- Key distinguishing factors: doesn't focus on the analytical techniques (these are all open source), and is compliant with regulatory requirements for data in the cloud
- They estimate that 1/3 of all the sequencing that has happened has touched DNAnexus. They're currently branching out to China in collaboration.
- Do you guys have competitors?
- They started in 2009, early mover on the cloud and there's nobody competing with them at their scale. Smaller companies focus on filling in the niche needs of organizations with smaller needs.
- What does your R&D focus on?
- Has some R&D focused on developing query tools to make it easier for researchers to download only the relevant genomes they need
- How customizable is the pipeline?
- "Bring your own code" for your own customizable analyses. DNAnexus stresses that their platform is meant for large-scale data analysis. You can take their output and give it to intelligent services like StationX to make predictions from the data.
- How are you guys secure?
- They use layers of security on top of AWS. He didn't elaborate on the specifics of these layers, but he talked about how quickly they had to patch vulnerabilities from Heartbleed
- Who initiates this process (chicken or the egg)?
- More common case is the pharma companies who ask their vendors to get onboard DNAnexus.
Garvan Institute (DNAnexus client)
- Illumina - announced sequencing at $1000/genome on a $10 million machine (HiSeq X Ten)
- Garvan was first purchaser of the Illumina HiSeq X Ten.
- Clinical Diagnostics - using the instrument to create actionable findings from DNA. Still at a very early stage, but holds potential to be very life changing.
Regeneron (DNAnexus client)
- Pharmaceutical company from Tarrytown, NY
- George D. Yancopoulos, Chief Scientific Officer, launched the largest sequencing effort to sequence the genomes of these people
- Collaborating with Geisinger - a healthcare provider from Pennsylvania with very well-curated EMR records.
- PCSK9 - mutation in this gene leads to low cholesterol
- Developed a drug (in phase 3 clinical trials) targeting this gene to lower cholesterol.
- On track to sequence 80,000 exomes this year. This puts them at 250,000 people sequenced in total
Week 4 - Wearables, Data, and Plumbing
- Rachel Kalmar (PhD) - Physics undergrad @UCSB, Neuroscience PhD @Stanford. Formerly of Misfit Wearables.
Came to Stanford interested in neuroscience. The research didn't pan out, but a project at a hackathon did! Her team got into the Rock Health incubator program and got a crash course in entrepreneurship with an open-source integration platform that started off as an Arduino on a t-shirt.
Sometimes data is too hard to clean - you need to fix the plumbing that the data comes from
- Big problems: bad data, and time is hard (hard to get time variances right). How do you sanitize input effectively?
- Time is Hard
- standardized time isn't even that old - we only needed it when cities started centralizing people from towns that were on their own time
- this is where we are with wearables now - every device is in its own time zone
- "OnStar" for the body - early detection of disease.
- Closing feedback loop - smart navigation for our health.
- What are the 3 biggest challenges in wearables:
- Battery Life. Everything is a tradeoff with battery life.
- Standards - none of our devices are really speaking the same language. Most are using low-level Bluetooth. It's still not possible to compare stuff like "steps" across different devices or across different activities. (ActivityScore, NikeFuel, etc...)
- Why not? We're still prototyping and it's still too soon for standards.
- Data sharing - how do we make it easier to share data? Is it fear of getting scooped, inability to reproduce findings, platform differences?
- ResearchKit. Allowing phone to be used as a data collection tool.
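The "time is hard" point from earlier in this week's notes can be sketched in a few lines. One common way to line up readings from devices that each report local time is to convert everything to UTC before comparing. This is only an illustration under simple assumptions: each device's UTC offset is known and fixed, and the device names and readings are made up.

```python
from datetime import datetime, timedelta, timezone

def to_utc(local_iso: str, utc_offset_hours: float) -> datetime:
    """Interpret a naive local timestamp with a known offset and convert to UTC."""
    device_tz = timezone(timedelta(hours=utc_offset_hours))
    return datetime.fromisoformat(local_iso).replace(tzinfo=device_tz).astimezone(timezone.utc)

# Hypothetical readings: (device, local timestamp, device's UTC offset in hours)
readings = [
    ("wristband", "2015-03-01T08:00:00", -8),
    ("scale", "2015-03-01T11:00:00", -5),
    ("phone", "2015-03-01T16:30:00", 0),
]

# Sort all readings on one shared timeline
timeline = sorted((to_utc(stamp, offset), device) for device, stamp, offset in readings)
for moment, device in timeline:
    print(moment.isoformat(), device)
```

Here the wristband and scale readings turn out to be simultaneous even though their local clocks disagree by three hours. Daylight saving changes, clock drift, and devices with unknown offsets are where the real difficulty starts.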
Week 3 - Google[x] Baseline study
- Vikram Bajaj (PhD) - initially classics @UPenn (Undergrad) + MIT (Grad)
- Faculty at Berkeley and then 2 startups, now at Google[x]
- Comes from spectroscopy background (NMR)
- Functional MRI: noninvasive spectroscopic techniques to see structure + dynamic view of function
- Vision evolved from proton gradient as a signaling mechanism (rather than an energy-generating mechanism)
- Biomimetic nanoparticle probes for imaging
- SkinScan: targeting cosmetic dermatology - wounding and the healing skin.
- handheld device to detect types of skin filler (the part that heals)
- Chevron startup: Oil well logging - building MRI spectrometers that work inside-out from oil well drills to tell you about the oil around the drill.
What's happening at Google[x]?
- Intersection of:
- Huge Problem
- Radical Solution
- Breakthrough Technology
- Not to be too afraid of failure
- "It's hard to think of making something 10x better" but this is hard in healthcare
- Problems:
- Self-driving cars - car accidents are among the leading causes of death. Replacing drivers to make a 10x improvement on this problem.
- "Medical Tricorder" - diagnostics
- Baseline Study
Baseline Study
- Disease pathology: traditionally H&E stain (hematoxylin and eosin stain) but moving towards "molecular markers" and quantitative analyses of biological pathways
- Cancer is not one disease, but heterogeneous in profile.
- More data beats better algorithms - we learn from the quality of the data and the parts that we pick out to be significant
- Limits of big data:
- 1854 Soho Cholera outbreak. Big data detected the cause of the outbreak. But removing the cause doesn't tell you anything about the disease or lead to any mechanistic insights.
Week 2 - Ginger.io: Behavioral Analytics for Healthcare
- Gourab De (PhD) - biostatistics + Machine Learning. Expert in Risk & Outcomes modeling
- Health economics - someone is paying for hospitalization. If you reduce hospitalization, you're saving costs overall
Key Idea: Making data more granular in the time between visits:
- Right now doctors are only getting a glimpse (or a snapshot) of patients from visit to visit. Doctors are at a loss to learn from the time between visits
- Ginger.io provides: Continuous cycle for engagement, feedback and measurement: passive data, in-app surveys. Analyzing behavior, enable interventions -> measuring outcomes
- Integrating provided data with patient data from hospitals
- Virtual silos are used to merge and hold information from the app and the current hospital infrastructure
- cannot integrate with EMR (this is a very hard problem from a privacy perspective and who owns the data)
Insights from ginger.io data:
- Helped uncover a pattern between stress at home + low mobility.
- A patient caring for two mentally unstable children improved with higher mobility
What's next for Sensor Data + Modeling in Healthcare
- Smartphones and wearables have gone mainstream: 58% of the US population owns a smartphone. 76% of clinicians think health apps would be helpful for patients with chronic conditions
- Major players entering space - Apple/HealthKit/ResearchKit etc...