Workshop on Applied Data Science for Healthcare

Applications and New Frontiers of Generative Models for Healthcare

Workshop Date: August 07, 2023
For registration, check the KDD 2023 website for the latest information.



Overview

Generative models have a long history and many application areas in medical machine learning (ML) and artificial intelligence (AI). With the development of deep neural networks, research in recent years has focused on Generative Adversarial Networks (GANs), Variational Autoencoders (VAEs), and autoregressive models. More recently, very large deep generative models have gained popularity, including large language models (LLMs) such as Generative Pre-trained Transformer 3 (GPT-3) and text-to-image diffusion models such as Stable Diffusion.

In healthcare research, one of the most common applications of generative models has been the generation of synthetic data for training machine learning models. Synthetic data is often used to increase the representation of patient subgroups in order to improve generalization and mitigate algorithmic biases, which is especially valuable in application domains where data is hard to come by. Generative models can also be used for specific model evaluation purposes (e.g., within a robustness or generalizability assessment, or in virtual clinical trials), and they can help generate synthetic ground truth when labeling data is extremely burdensome. Moreover, generative models have been successfully applied in data preprocessing and enhancement, such as deep learning algorithms for image reconstruction or denoising in medical imaging.

While such generative models have proven their utility in the health domain, many open questions remain regarding how to evaluate their effectiveness and safety, and their testing and evaluation require specific considerations. Taking the assessment of the gap between generated data and reality (the so-called Sim2Real challenge) as an example, it is often unclear how to (i) quantify this domain gap and its impact on downstream performance in a meaningful manner and (ii) reduce it in order to fully leverage the potential of generative models.

New challenges are also emerging on a grander scale. Recent advances in LLMs make the generation of data even more effortless. However, the misinformation generated with such models may cause a "pollution" of the data used for future model training, so we can expect an increased need for effective fact-checking approaches. Despite the huge growth of this area of research, the actual use of NLP technology for fact checking is still in its infancy.

In this one-day workshop, we will discuss some of the most common applications of generative models in ML/AI research for healthcare, examine the current challenges, and explore potential new areas of application.
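As a purely illustrative aside on point (i) above, the sketch below shows one common way a Sim2Real domain gap can be quantified: estimating the maximum mean discrepancy (MMD) between feature representations of real and synthetic samples. This is a minimal sketch rather than a method prescribed by the workshop; the feature dimensionality, kernel bandwidth, and data are placeholder assumptions.

```python
# Minimal sketch (illustrative only): quantifying a Sim2Real gap via the
# maximum mean discrepancy (MMD) between real and synthetic feature vectors.
# Features would normally come from a pretrained encoder; here they are
# random placeholders.
import numpy as np


def rbf_kernel(x, y, bandwidth):
    # Pairwise squared Euclidean distances, mapped through a Gaussian (RBF) kernel.
    d2 = np.sum(x**2, axis=1)[:, None] + np.sum(y**2, axis=1)[None, :] - 2.0 * x @ y.T
    return np.exp(-d2 / (2.0 * bandwidth**2))


def mmd_rbf(real_feats, synth_feats, bandwidth=1.0):
    # Biased (V-statistic) estimate of the squared MMD between the two feature
    # distributions; larger values indicate a larger domain gap.
    k_rr = rbf_kernel(real_feats, real_feats, bandwidth).mean()
    k_ss = rbf_kernel(synth_feats, synth_feats, bandwidth).mean()
    k_rs = rbf_kernel(real_feats, synth_feats, bandwidth).mean()
    return k_rr + k_ss - 2.0 * k_rs


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    real = rng.normal(size=(200, 64))          # placeholder "real" features
    synth = rng.normal(size=(200, 64)) + 0.3   # placeholder "synthetic" features
    print(f"Estimated squared MMD (Sim2Real gap proxy): {mmd_rbf(real, synth):.4f}")
```

In practice, such a score is most informative when tracked alongside downstream task performance, so that reducing the measured gap can be checked against actual improvements in the trained models.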

Previous Iterations

  • KDD Health Day - DSHealth 2022: 2022 KDD Workshop on Applied Data Science for Healthcare: Transparent and Human-centered AI
  • KDD Health Day - DSHealth 2021: Joint KDD 2021 Health Day and 2021 KDD Workshop on Applied Data Science for Healthcare: State of XAI and Trustworthiness in Health
  • DSHealth 2020: 2020 KDD Workshop on Applied Data Science for Healthcare: Trustable and Actionable AI for Healthcare
  • DSHealth 2019: 2019 KDD Workshop on Applied Data Science for Healthcare: Bridging the Gap between Data and Knowledge
  • MLMH 2018: 2018 KDD Workshop on Machine Learning for Medicine and Healthcare

Speakers at-a-glance

See detailed speaker information in the Invited Speakers section below.

Deepti Pandita
Chief Medical Information Officer, UC Irvine Health System;
AMIA Physician of the Year
Prasanna Sattigeri
Principal Research Scientist at IBM Research
Nitin Aggarwal
Head of Cloud AI Service at Google
Hoifung Poon
General Manager, Health Futures, Microsoft


Program

DSHealth 2023 will be held on August 07, 2023. See detailed schedule below.

Please refer to the KDD 2023 website for up-to-date changes to venues and timings.

DSHealth Schedule

All times are in Pacific Daylight Time (PDT).

Session 1 (Aug 07, 1:00 pm - 3:00 pm)
  • 1:00 pm - 1:35 pm  Invited Talk: Deepti Pandita
  • 1:35 pm - 2:10 pm  Invited Talk: Prasanna Sattigeri
  • 2:10 pm - 2:45 pm  Invited Talk: Nitin Aggarwal
  • 2:45 pm - 3:00 pm  Spotlight Presentations:
    • #1: Improving Primary Healthcare Workflow Using Extreme Summarization of Scientific Literature Based on Generative AI
    • #13: Reasoning with Language Modeling for Efficient Longitudinal Understanding of Unstructured Electronic Medical Records

Break (3:00 pm - 3:30 pm)

Session 2 (Aug 07, 3:30 pm - 5:00 pm)
  • 3:30 pm - 3:45 pm  Spotlight Presentations:
    • #12: Bio+Clinical BERT, BERT Base, and CNN Performance Comparison for Predicting Drug-Review Satisfaction
    • #14: A Novel U-Net Architecture for Denoising of Real-world Noise Corrupted Phonocardiogram Signal
  • 3:45 pm - 4:20 pm  Invited Talk: Hoifung Poon
  • 4:20 pm - 5:00 pm  Panel with all speakers, moderated by Sabrina Hsueh

Invited Speakers

Deepti Pandita, Chief Medical Information Officer, UC Irvine Health System; AMIA Physician of the Year


Bio: Dr. Deepti Pandita is currently the Chief Medical Information Officer at the University of California Irvine Health System, the only safety-net academic medical center and Level One Adult and Pediatric Trauma Center in Orange County. Dr. Pandita is board certified in Internal Medicine and in Clinical Informatics. She was previously CHIO and Program Director of the Clinical Informatics Fellowship at Hennepin Healthcare in Minneapolis, MN. Dr. Pandita is a board member of the American Medical Informatics Association and Chair of the Medical Informatics Committee for the American College of Physicians, leading several national committees and initiatives. She has led numerous sessions at AMIA CIC on Generative AI for Health.

Prasanna Sattigeri, Principal Research Scientist at IBM Research


Bio: Dr. Prasanna Sattigeri is a Principal Research Scientist at IBM Research. His main research goal is building reliable AI solutions. His research interests span several areas of machine learning and artificial intelligence, including Bayesian inference, deep generative modeling, uncertainty quantification, and learning with limited data. His current work focuses on developing theory and practical systems for machine learning applications that demand constraints such as reliability, fairness, and interpretability. He is a core contributor to several open-source trustworthy AI toolkits: AI Fairness 360, AI Explainability 360, and Uncertainty Quantification 360.

Nitin Aggarwal, Head of Cloud AI Service at Google


Bio: Nitin is a machine learning and artificial intelligence (AI) professional with over a decade of industry experience. He currently heads the Cloud AI Services team (US Central and HCLS) at Google. He leads the development of AI solutions and products for Google's strategic enterprise customers and advises Fortune 500 organizations on building enterprise-level AI strategy. Analytics India Magazine named him one of its '40 Under 40 Data Scientists' in 2021, and Global AI Hub added him as a Global AI Thought Leader for its 10million.AI project. He also co-authored two advanced ML courses on Coursera for Google Cloud.

Hoifung Poon, General Manager, Health Futures, Microsoft


Bio: Hoifung Poon is General Manager at Health Futures in Microsoft Research and an affiliated professor at the University of Washington Medical School. He leads biomedical AI research and incubation, with the overarching goal of structuring medical data to optimize delivery and accelerate discovery for precision health. His team and collaborators are among the first to explore large language models (LLMs) in health applications, from foundational research to incubations at large health systems and life science companies, and ultimately to productization. He has given tutorials on these topics at top conferences such as the Association for Computational Linguistics (ACL), the Association for the Advancement of Artificial Intelligence (AAAI), and Knowledge Discovery and Data Mining (KDD). His research spans a wide range of problems in machine learning and natural language processing (NLP), and his prior work has been recognized with Best Paper Awards from premier venues such as the North American Chapter of the Association for Computational Linguistics (NAACL), Empirical Methods in Natural Language Processing (EMNLP), and Uncertainty in AI (UAI). He received his PhD in Computer Science and Engineering from the University of Washington, specializing in machine learning and NLP.


Accepted Papers

We have accepted four papers for presentation at the workshop. All papers will be presented as posters during the workshop. PDF versions of the final papers, where provided by the authors, are hyperlinked below.

#1   Improving Primary Healthcare Workflow Using Extreme Summarization of Scientific Literature Based on Generative AI. Gregor Stiglic, Lucija Gosak, Primož Kocbek, Leon Kopitar, Prithwish Chakraborty, Pablo Meyer, Zhe He and Jiang Bian.
#13   Reasoning with Language Modeling for Efficient Longitudinal Understanding of Unstructured Electronic Medical Records. Shivani Shekhar, Simran Tiwari, Thomas Rensink, Ramy Eskander and Wael Salloum.
#12   Bio+Clinical BERT, BERT Base, and CNN Performance Comparison for Predicting Drug-Review Satisfaction. Yue Ling.
#14   A Novel U-Net Architecture for Denoising of Real-world Noise Corrupted Phonocardiogram Signal. Ayan Mukherjee, Rohan Banerjee and Avik Ghose.

Call for Papers

We invite full papers, as well as work-in-progress papers, on the application of data science in healthcare. Topics may include, but are not limited to, the following (see the workshop overview for more information), with a special focus on generative models for healthcare.

  • Synthetic data
    • Training data augmentation, e.g., in computer vision and medical imaging algorithms
    • Physics- and chemistry-based generative models
    • Simulated data and privacy preserving algorithms
    • In-silico clinical trials
    • Testing data, e.g. synthetic ground truth
    • Generative AI for tabular data
    • Interpretability
  • Privacy and security of generative AI
    • Inverse models for source verification
    • Watermark for AI generated data
    • Factual capabilities of generative AI
  • Testing and evaluation of the generative models
    • Sim2Real domain gap
    • Data selection & quality aspects of the data (distribution shifts, monitoring of the models)
    • Fact-checking
    • Generating new healthcare-specific benchmarks
    • Bias detection and mitigation in healthcare
    • Reliability and trustworthiness of the generative models (actionable plans)
  • Application of LLMs
    • Systematic literature review
    • Modernizing pharmaceutical call center operations
    • Chatbot for patient registration, triage, scheduling, and rooming
    • Semantic data augmentation
    • Others
  • Responsible use of Generative AI
    • Generative AI Fairness and Bias detection
    • Generative AI bias mitigation (e.g., adversarial training)
    • Generative AI model transparency
    • Generative AI ethics and responsible AI risk management
  • Other
    • Knowledge representation learning

Papers must be submitted in PDF format via EasyChair (https://easychair.org/conferences/?conf=dshealth2023) and formatted according to the new standard ACM Conference Proceedings Template. Authors are encouraged to use the Overleaf template. Papers may be a maximum of 4 pages in length, excluding references.

The program committee will select papers for spotlight and/or poster presentation based on originality, presentation, and technical quality.


Key Dates (AOE)

  • Paper Submission opens: Apr 30, 2023
  • Paper Submission deadline: Jun 15, 2023 (extended from May 23, 2023)
  • Acceptance Notice: Jul 07, 2023 (extended from Jun 23, 2023)
  • Workshop Date: Aug 07, 2023

Unless otherwise specified, all deadlines correspond to 11:59 PM Hawaii Standard Time (HST). The workshop will be held based on Pacific Daylight Time (PDT).


Organizers