Workshop on Applied Data Science for Healthcare

News

Invited Speaker lineup published.
Accepted paper list published (see Accepted Papers) and program announced.
Paper acceptance notification updated to Jul 07, 2023
Paper submission deadline extended to Jun 15, 2023. See updated dates

Overview

Generative models have a long history, with many application areas in medical machine learning (ML) and artificial intelligence (AI). With advances in deep neural networks, researchers in recent years have focused on Generative Adversarial Networks (GANs), Variational Autoencoders (VAEs), and autoregressive models. More recently, very large deep generative models have gained popularity, including large language models (LLMs) such as Generative Pre-trained Transformer 3 (GPT-3) and text-to-image diffusion models such as Stable Diffusion.
In healthcare research, one of the most common applications of generative models has been the generation of synthetic data for training machine learning models. Synthetic data is often used to increase the representation of patient subgroups in order to improve generalization and mitigate algorithmic bias, which is especially valuable in application domains where data is hard to come by. Generative models can also be used for specific model evaluation purposes (e.g., within a robustness or generalizability assessment, or in virtual clinical trials), and they can help generate synthetic ground truth data when labeling is extremely burdensome. Moreover, generative models have been successfully applied in data preprocessing and enhancement, such as deep learning algorithms for image reconstruction or denoising in medical imaging.
While such generative models have proven their utility in the health domain, many open questions remain about how to evaluate their effectiveness and safety. Testing and evaluating such models requires specific considerations. Take the gap between generated data and reality (the so-called Sim2Real challenge) as an example: it is often unclear how to (i) quantify this domain gap and its impact on downstream performance in a meaningful manner and (ii) reduce it in order to fully leverage the potential of generative models.
New challenges are also emerging on a grander scale. Recent advances in LLMs make the generation of data even more effortless. However, the misinformation generated by such models may “pollute” the data used for future model training. We can expect an increased need for effective fact-checking approaches. Despite the huge growth of this area of research, the actual use of natural language processing (NLP) technology for fact-checking is still in its infancy. In this one-day workshop, we would like to discuss some of the most common applications of generative models in ML/AI research in the healthcare domain and the current challenges, and to explore potential new areas of application.

Previous Iterations

KDD Health Day - DSHealth 2022: 2022 KDD Workshop on Applied Data Science for Healthcare: Transparent and Human-centered AI
KDD Health Day - DSHealth 2021: Joint KDD 2021 Health Day and 2021 KDD Workshop on Applied Data Science for Healthcare: State of XAI and Trustworthiness in Health
DSHealth 2020: 2020 KDD Workshop on Applied Data Science for Healthcare: Trustable and Actionable AI for Healthcare
DSHealth 2019: 2019 KDD Workshop on Applied Data Science for Healthcare: Bridging the Gap between Data and Knowledge
MLMH 2018: 2018 KDD Workshop on Machine Learning for Medicine and Healthcare

Speakers at-a-glance

See detailed information under Invited Speakers.

Deepti Pandita
Chief Medical Information Officer, UC Irvine Health System;
AMIA Physician of the Year

Prasanna Sattigeri
Principal Research Scientist at IBM Research

Nitin Aggarwal
Head of Cloud AI Service at Google

Hoifung Poon
General Manager, Health Futures, Microsoft

Program

DSHealth 2023 will be held on August 07, 2023. See the detailed schedule below.

Please refer to KDD 2023 for up-to-date changes to venues and timings.

DSHealth Schedule

Time slot (PDT)	Event
Aug 07, 1:00 pm - 3:00 pm	Session 1
1:00 pm - 1:35 pm	Invited Talk: Deepti Pandita
1:35 pm - 2:10 pm	Invited Talk: Prasanna Sattigeri
2:10 pm - 2:45 pm	Invited Talk: Nitin Aggarwal
2:45 pm - 3:00 pm	Spotlight Presentations: #1 Improving Primary Healthcare Workflow Using Extreme Summarization of Scientific Literature Based on Generative AI; #13 Reasoning with Language Modeling for Efficient Longitudinal Understanding of Unstructured Electronic Medical Records
3:00 pm - 3:30 pm	Break
Aug 07, 3:30 pm - 5:00 pm	Session 2
3:30 pm - 3:45 pm	Spotlight Presentations: #12 Bio+Clinical BERT, BERT Base, and CNN Performance Comparison for Predicting Drug-Review Satisfaction; #14 A Novel U-Net Architecture for Denoising of Real-world Noise Corrupted Phonocardiogram Signal
3:45 pm - 4:20 pm	Invited Talk: Hoifung Poon
4:20 pm - 5:00 pm	Panel with all speakers, moderated by Sabrina Hsueh

Invited Speakers

Deepti Pandita, Chief Medical Information Officer, UC Irvine Health System; AMIA Physician of the Year

Bio: Dr. Deepti Pandita is currently the Chief Medical Information Officer at University of California Irvine Healthsystem, the only safety-net academic medical center, and also a Level One adult and pediatric trauma center, in Orange County. Dr. Pandita is board certified in Internal Medicine and in Clinical Informatics. She was previously CHIO and Program Director of the Clinical Informatics Fellowship at Hennepin Healthcare in Minneapolis, MN. Dr. Pandita is a Board member of the American Medical Informatics Association and Chair of the Medical Informatics Committee for the American College of Physicians, leading several national committees and initiatives. She has led numerous sessions at AMIA CIC on Generative AI for Health.

Prasanna Sattigeri, Principal Research Scientist at IBM Research

Bio: Dr. Prasanna Sattigeri is a Principal Research Scientist at IBM Research. His main research goal is building reliable AI solutions. His research interests span several areas in machine learning and artificial intelligence, including Bayesian inference, deep generative modeling, uncertainty quantification, and learning with limited data. His current work focuses on developing theory and practical systems for machine learning applications that demand constraints such as reliability, fairness, and interpretability. He is a core contributor to several open-source trustworthy AI toolkits: AI Fairness 360, AI Explainability 360, and Uncertainty Quantification 360.

Nitin Aggarwal, Head of Cloud AI Service at Google

Bio: Nitin is a machine learning and artificial intelligence (AI) professional with over a decade of industry experience. He currently heads the Cloud AI Services team (US Central and HCLS) for Google. He leads the development of AI solutions and products for Google's strategic enterprise customers and advises Fortune 500 organizations on building enterprise-level AI strategy. Analytics India Magazine named him one of its '40 under 40 Data Scientists' in 2021, and Global AI Hub named him a Global AI Thought Leader for its 10million.AI project. He also co-authored two advanced ML courses on Coursera for Google Cloud.

Hoifung Poon, General Manager, Health Futures, Microsoft

Bio: Hoifung Poon is General Manager at Health Futures in Microsoft Research and an affiliated professor at the University of Washington Medical School. He leads biomedical AI research and incubation, with the overarching goal of structuring medical data to optimize delivery and accelerate discovery for precision health. His team and collaborators are among the first to explore large language models (LLMs) in health applications, from foundational research to incubations at large health systems and life science companies, and ultimately to productization. He has given tutorials on these topics at top conferences such as the Association for Computational Linguistics (ACL), the Association for the Advancement of Artificial Intelligence (AAAI), and Knowledge Discovery and Data Mining (KDD). His research spans a wide range of problems in machine learning and natural language processing (NLP), and his prior work has been recognized with Best Paper Awards from premier venues such as the North American Chapter of the Association for Computational Linguistics (NAACL), Empirical Methods in Natural Language Processing (EMNLP), and Uncertainty in AI (UAI). He received his PhD in Computer Science and Engineering from the University of Washington, specializing in machine learning and NLP.

Accepted Papers

We have accepted 4 papers for presentation at the workshop. All papers will be presented as posters within the workshop. PDF versions of the final papers, if provided by the authors, are hyperlinked below.

#1 Improving Primary Healthcare Workflow Using Extreme Summarization of Scientific Literature Based on Generative AI. Gregor Stiglic, Lucija Gosak, Primož Kocbek, Leon Kopitar, Prithwish Chakraborty, Pablo Meyer, Zhe He and Jiang Bian.

#13 Reasoning with Language Modeling for Efficient Longitudinal Understanding of Unstructured Electronic Medical Records. Shivani Shekhar, Simran Tiwari, Thomas Rensink, Ramy Eskander and Wael Salloum.

#12 Bio+Clinical BERT, BERT Base, and CNN Performance Comparison for Predicting Drug-Review Satisfaction. Yue Ling.

#14 A Novel U-Net Architecture for Denoising of Real-world Noise Corrupted Phonocardiogram Signal. Ayan Mukherjee, Rohan Banerjee and Avik Ghose.

Call for Papers

We invite full papers, as well as work-in-progress papers, on the application of data science in healthcare. Topics include, but are not limited to, the following, with a special focus on generative models for healthcare (for more information, see the workshop overview).

Synthetic data
- Training data augmentation, e.g., in computer vision and medical imaging algorithms
- Physics- and chemistry-based generative models
- Simulated data and privacy-preserving algorithms
- In-silico clinical trials
- Testing data, e.g., synthetic ground truth
- Generative AI for tabular data
- Interpretability
Privacy and security of generative AI
- Inverse models for source verification
- Watermarking of AI-generated data
- Factual capabilities of generative AI
Testing and evaluation of the generative models
- Sim2Real domain gap
- Data selection & quality aspects of the data (distribution shifts, monitoring of the models)
- Fact-checking
- Generating new healthcare-specific benchmarks
- Bias detection and mitigation in healthcare
- Reliability and trustworthiness of the generative models (actionable plans)
Application of LLMs
- Systematic literature review
- Modernizing pharmaceutical call center operations
- Chatbot for patient registration, triage, scheduling, and rooming
- Semantic data augmentation
- Others
Responsible use of Generative AI
- Generative AI fairness and bias detection
- Generative AI bias mitigation (e.g., adversarial training)
- Generative AI model transparency
- Generative AI ethics and responsible AI risk management
Other
- Knowledge representation learning

Papers must be submitted in PDF format to EasyChair (https://easychair.org/conferences/?conf=dshealth2023) and formatted according to the new Standard ACM Conference Proceedings Template. Authors are encouraged to use the Overleaf template. Papers are limited to 4 pages, excluding references.

The program committee will select the papers based on originality, presentation, and technical quality for spotlight and/or poster presentation.

Key Dates (AOE)

Paper Submission opens: Apr 30, 2023
Paper Submission deadline: ~~May 23, 2023~~ Jun 15, 2023
Acceptance Notice: ~~Jun 23, 2023~~ Jul 07, 2023
Workshop Date: Aug 07, 2023

Unless otherwise specified, all deadlines correspond to 11:59 PM Hawaii Standard Time (HST). The workshop will be held based on Eastern Daylight Time (EDT).

Organizers

Fei Wang, Cornell University, USA
Prithwish Chakraborty, Amazon Science, USA
Tao Xu, F. Hoffmann-La Roche, Switzerland
Gregor Stiglic, University of Maribor, Slovenia
Pei-Yun Sabrina Hsueh, Pfizer Inc, USA
Lixia Yao, Polygon Health Analytics LLC, USA
Jiang Bian, University of Florida, USA
Alexej Gossmann, FDA, USA
Florian Buettner, Frankfurt University/German Cancer Research Center (DKFZ), Germany