Step-by-Step Exercises for Aspiring Data Scientists

Work through the challenges on this page to gain experience in clinical informatics and AI in healthcare.

Many apps require a Northwestern NetID. If you would like access but don’t have a NetID, please make your request here. Include a description of why you would like access.

Create an AI Generated Podcast

Here’s a simple activity to get you started — use Google’s NotebookLM to generate your own podcast-style audio overview using AI.

We followed these steps to create a podcast discussing a new study on LLMs and diagnoses:

  1. Upload a PDF of the content

  2. Click on “Customize” and enter prompts

  3. Select “Generate” and wait for your podcast

Listen to our podcast here (must sign in to Google for access). Our result explained AI, large language models, and the research fairly well, but it was also wordy and repetitive.

Try it out and let us know how it goes for you!

Try Running Your Own Local AI

Not running your own local AI yet? It’s worth a few minutes - free and easy

If you would like to have your own private, pretty smart AI, it’s easy as 1, 2, 3:

  1. Install Ollama: https://ollama.com/download. Ollama is an open-source tool that enables users to build and run large language models (LLMs) on their own computers.

  2. Choose the model:

    a. For a high performance laptop (e.g., M3, M4 with at least 64Gb RAM and 45 Gb free disk), use Meta’s latest: Open your terminal and enter: ollama run llama3.

    b. For other computers, use one of Google’s latest: Open your terminal and enter: ollama run gemma2

  3. Ask your questions! (Much more sophisticated setups are possible, but this will quickly and easily get you chatting.)

This model will run securely on your machine without privacy concerns (assuming you have rights to have the content on your laptop in the first place). Pipelines that leverage up-to-date medical references while maintaining data privacy are possible, too.

View an example from llama3.3.

Colab Notebooks for Cloud-Based
Analytics Using Python

Dive into the world of cloud-based data analysis with our series of Colab Notebooks. These interactive tutorials are designed to guide you through foundational aspects of Python programming.

Google Collab logo
  • Begin your journey with an Introduction to Colab Notebooks. This module provides an overview of the Colab environment, setting up your workspace, and executing your first Python commands. Perfect for those starting or needing a refresher.

  • Discover the power of data visualization in Part 2. Learn how to create compelling visual narratives with your data using Python libraries such as Matplotlib and Seaborn.

  • Delve into predictive modeling and AI in Part 3. Explore machine learning algorithms, train models, and evaluate their performance in predicting health outcomes.

  • Complete an exercise that applies machine learning techniques for diabetes prediction and chest X-ray analysis using ML workflows from data preparation to model training, evaluation, and interpretation.

  • You'll soon learn how oddly powerful neural network models are. They are even referred to as "deep learning" when they have multiple layers and can be used to generate highly accurate predictions!

  • Query and retrieve content from a large language model (LLM) and learn about more features, culminating in your own app!

Full Colab Notebook for Data Science

Orange for Data Science

Orange is an open-source data visualization and analysis tool for both novices and experts. With its intuitive graphical user interface, you can explore data science without deep programming knowledge. A NU NetID is required for access to the tutorials and assignments.

First Complete Video Tutorials

Then Complete Assignments

Orange logo

AutoAnalyzer for Simplifying Analysis

AutoAnalyzer is a state-of-the-art tool that helps users process datasets and learn about data science and machine learning. It simplifies the process of data analysis, allowing you to focus on interpreting the results.

  • Analyze hosted datasets, define outcomes, and generate receiver operating characteristic (ROC) curves and confusion matrices to evaluate predictive models.

  • Understand and explain the benefits of a Shapley force plot, a tool for interpreting machine learning models.

  • Upload public access datasets, create violin plots, and perform advanced predictive modeling.

LLM Use Cases for Saving Time, Avoiding Burnout, and Learning

Large language models (LLMs) have the potential to help physicians and medical students save time, avoid burnout, and learn. They can synthesize data to help create patient notes and summaries, provide opportunities for practice, and make learning about specific health topics a little easier. 

Take a look at these short videos describing some of the benefits of LLMs for medical education, and then try out the apps.

Learn more about LLM use cases here

Run a Local LLM from a Jupyter Notebook in VSCode

Experiment with a free local language model (LLM) that you download and run on your local computer. Learn about LLM underpinnings and interact with your own chatbot.

Because it doesn’t run online, privacy and security issues are minimized.

Use this project to design your own:

  • Workflow

  • Design

  • Data processing pipeline

  • And more

Design Your Own App with Replit

Check out our tutorial on editing code for an app. This Replit link allows any user to modify a base functioning Streamlit app and customize it. Just visit Replit and enter your OpenAI API Key. Then click “Run” at the top, wait couple minutes, and you will be able to edit the code. For a fee, Replit has options for immediate scalable deployment.

If you don’t have an OpenAI API Key, sign up for one at OpenAI.com.

Increase Skills in Data Science

Kaggle is a community of people interested in artificial intelligence and machine learning. It offers a platform to stay current on the newest technologies and techniques. The HDG enjoys exploring Kaggle learning opportunities that we use to hone our data science skills.

Customize an app

NUIT Research Computing and Data Services

Northwestern University Information Technology (NUIT) supports faculty, student, and staff researchers by providing data science and visualization, computing, and data-management expertise (NU NetID required).

photo of poeple

Community

Whether you’re just starting out or are broadening your health data science skills, being part of a community helps. Ideas for new datasets or tools? Suggestions for building the community? Let us know.

Join our email list

Stay connected by receiving the latest news and information about opportunities.