Generative models have a long history and many application areas in medical machine learning (ML) and artificial intelligence (AI). With the development of deep neural networks, research in recent years has focused on Generative Adversarial Networks (GANs), Variational Autoencoders (VAEs), and autoregressive models. More recently, very large deep generative models have gained popularity, including large language models (LLMs) such as Generative Pre-trained Transformer 3 (GPT-3) and text-to-image diffusion models such as Stable Diffusion. In healthcare research, one of the most common applications of generative models has been generating synthetic data for training machine learning models. Synthetic data is often used to increase the representation of patient subgroups, improving generalization and mitigating algorithmic bias; this is especially valuable in domains where data are scarce. Generative models can also serve specific model evaluation purposes (e.g., robustness or generalizability assessments, or virtual clinical trials), and they can generate synthetic ground-truth data when labeling is extremely burdensome. Moreover, generative models have been successfully applied to data preprocessing and enhancement, such as deep learning algorithms for image reconstruction or denoising in medical imaging. While such models have proven their utility in the health domain, many open questions remain about how to evaluate their effectiveness and safety, and testing them requires specific considerations. Take the gap between generated data and reality, the so-called Sim2Real challenge, as an example: it is often unclear how to (i) quantify this domain gap and its impact on downstream performance in a meaningful manner and (ii) reduce it in order to fully leverage the potential of generative models.
New challenges are also emerging on a larger scale. Recent advances in large language models (LLMs) make generating data even more effortless. However, misinformation generated with such models may “pollute” the data used to train future models, so we can expect an increased need for effective fact-checking approaches. Despite the rapid growth of this research area, the actual use of NLP technology for fact checking is still in its infancy. In this one-day workshop, we will discuss some of the most common applications of generative models in ML/AI research in the healthcare domain and their current challenges, and explore potential new areas of application.
DSHealth 2023 will be held on August 7, 2023. See the detailed schedule below.
Please refer to the KDD 2023 website for up-to-date changes to venues and timings.
1:00 pm - 3:00 pm

| Time | Session |
|---|---|
| 1:00 pm - 1:35 pm | Invited Talk: |
| 1:35 pm - 2:10 pm | Invited Talk: |
| 2:10 pm - 2:45 pm | Invited Talk: |
| 2:45 pm - 3:00 pm | Spotlight Presentations: #1 Improving Primary Healthcare Workflow Using Extreme Summarization of Scientific Literature Based on Generative AI; #13 Reasoning with Language Modeling for Efficient Longitudinal Understanding of Unstructured Electronic Medical Records |
| 3:00 pm - 3:30 pm | Break |

3:30 pm - 5:00 pm

| Time | Session |
|---|---|
| 3:30 pm - 3:45 pm | Spotlight Presentations: #12 Bio+Clinical BERT, BERT Base, and CNN Performance Comparison for Predicting Drug-Review Satisfaction; #14 A Novel U-Net Architecture for Denoising of Real-world Noise Corrupted Phonocardiogram Signal |
| 3:45 pm - 4:20 pm | Invited Talk: |
| 4:20 pm - 5:00 pm | Panel with all speakers, moderated by Sabrina Hsueh |
Bio: Dr. Deepti Pandita is currently the Chief Medical Information Officer at the University of California Irvine health system, the only academic safety-net medical center and Level I adult and pediatric trauma center in Orange County. Dr. Pandita is board certified in Internal Medicine and in Clinical Informatics. She was previously CHIO and Program Director of the Clinical Informatics Fellowship at Hennepin Healthcare in Minneapolis, MN. Dr. Pandita is a board member of the American Medical Informatics Association and Chair of the Medical Informatics Committee of the American College of Physicians, and has led several national committees and initiatives. She has led numerous sessions on Generative AI for Health at AMIA CIC.
Bio: Dr. Prasanna Sattigeri is a Principal Research Scientist at IBM Research. His main research goal is building reliable AI solutions. His research interests span several areas of machine learning and artificial intelligence, including Bayesian inference, deep generative modeling, uncertainty quantification, and learning with limited data. His current work focuses on developing theory and practical systems for machine learning applications that demand constraints such as reliability, fairness, and interpretability. He is a core contributor to several open-source trustworthy AI toolkits: AI Fairness 360, AI Explainability 360, and Uncertainty Quantification 360.
Bio: Nitin is a machine learning and artificial intelligence (AI) professional with over a decade of industry experience. He currently heads the Cloud AI Services team (US Central and HCLS) at Google. He leads AI solution and product development for Google's strategic enterprise customers and advises Fortune 500 organizations on building enterprise-level AI strategies. Analytics India Magazine named him one of its '40 under 40 Data Scientists' in 2021, and Global AI Hub named him a Global AI Thought Leader for its 10million.AI project. He also co-authored two advanced ML courses on Coursera for Google Cloud.
Bio: Hoifung Poon is General Manager at Health Futures in Microsoft Research and an affiliated professor at the University of Washington Medical School. He leads biomedical AI research and incubation, with the overarching goal of structuring medical data to optimize delivery and accelerate discovery for precision health. His team and collaborators are among the first to explore large language models (LLMs) in health applications, from foundational research to incubations at large health systems and life science companies, and ultimately to productization. He has given tutorials on these topics at top conferences such as the Association for Computational Linguistics (ACL), the Association for the Advancement of Artificial Intelligence (AAAI), and Knowledge Discovery and Data Mining (KDD). His research spans a wide range of problems in machine learning and natural language processing (NLP), and his prior work has been recognized with Best Paper Awards from premier venues such as the North American Chapter of the Association for Computational Linguistics (NAACL), Empirical Methods in Natural Language Processing (EMNLP), and Uncertainty in AI (UAI). He received his PhD in Computer Science and Engineering from the University of Washington, specializing in machine learning and NLP.
We have accepted four papers for presentation at the workshop. All papers will be presented as posters at the workshop. PDF versions of the final papers, where provided by the authors, are linked below.
| # | Title | Authors |
|---|---|---|
| 1 | Improving Primary Healthcare Workflow Using Extreme Summarization of Scientific Literature Based on Generative AI | Gregor Stiglic, Lucija Gosak, Primož Kocbek, Leon Kopitar, Prithwish Chakraborty, Pablo Meyer, Zhe He and Jiang Bian |
| 13 | Reasoning with Language Modeling for Efficient Longitudinal Understanding of Unstructured Electronic Medical Records | Shivani Shekhar, Simran Tiwari, Thomas Rensink, Ramy Eskander and Wael Salloum |
| 12 | Bio+Clinical BERT, BERT Base, and CNN Performance Comparison for Predicting Drug-Review Satisfaction | Yue Ling |
| 14 | A Novel U-Net Architecture for Denoising of Real-world Noise Corrupted Phonocardiogram Signal | Ayan Mukherjee, Rohan Banerjee and Avik Ghose |
We invite full papers as well as work in progress on the application of data science in healthcare. Topics include, but are not limited to, the following, with a special focus on generative models for healthcare (see the workshop overview for more information).
Papers must be submitted in PDF format via EasyChair (https://easychair.org/conferences/?conf=dshealth2023) and formatted according to the new Standard ACM Conference Proceedings Template. Authors are encouraged to use the Overleaf template. Papers must not exceed 4 pages, excluding references.
The program committee will select papers for spotlight and/or poster presentation based on originality, presentation, and technical quality.