Data insights with quality tools

Pump some iron! Listed below are a number of tools we’ve found most useful for our research and data analysis.

Tool types explained

  • Data mining: Find anomalies, correlations, relationships, and patterns

  • Machine learning frameworks: Create, train, test, and implement machine learning models

  • Mapping: Connect data to geographic locations, such as how home location affects health

  • Notebooks: Convenient, interactive way to present programs, descriptions, and program output

  • Package managers: Used to install, update, and keep track of collections of software products

  • Programming languages: The most flexible tools; they provide the executable code inside notebooks and the code that drives frameworks

  • Statistics: Tools that range from simple measures such as minimum, maximum, mean, and median to various types of regression and classification trees

Get in some reps with Orange and Google Colab

Wondering where to begin exploring tools? Check out "Getting Started: Digital Healthcare and Data Science" for exercises in Orange and Google Colab.
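Both Colab and Jupyter save notebooks as .ipynb files, which are JSON documents made of an ordered list of cells. The minimal sketch below, assuming the nbformat version 4 layout and a hypothetical file name, example.ipynb, builds a two-cell notebook (one formatted-text cell, one code cell) with Python's standard library and then reads the cell types back.

```python
# Minimal sketch of how a notebook stores its cells on disk.
# Assumes the nbformat 4 JSON layout; no Jupyter installation is needed.
import json

notebook = {
    "cells": [
        {
            # A formatted-text (Markdown) cell
            "cell_type": "markdown",
            "metadata": {},
            "source": "# Blood pressure summary\nThis cell holds formatted text.",
        },
        {
            # An executable code cell that has not been run yet
            "cell_type": "code",
            "execution_count": None,
            "metadata": {},
            "outputs": [],  # filled in when the cell is executed
            "source": "print(sum([118, 121, 126]) / 3)",
        },
    ],
    "metadata": {"kernelspec": {"name": "python3", "display_name": "Python 3"}},
    "nbformat": 4,
    "nbformat_minor": 4,
}

# Hypothetical file name; opening it in Jupyter or uploading it to Colab
# should show the two cells in order.
with open("example.ipynb", "w", encoding="utf-8") as f:
    json.dump(notebook, f, indent=1)

# Read the file back and list each cell's type, in order.
with open("example.ipynb", encoding="utf-8") as f:
    cell_types = [cell["cell_type"] for cell in json.load(f)["cells"]]
print(cell_types)
```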

The Tools

  • Anaconda

    Anaconda is a popular platform that offers an easy way to perform data science and machine learning using Python or R. It includes the conda package manager and a graphical user interface (GUI), Anaconda Navigator, for managing packages. Thousands of open-source packages and libraries are available. The full Anaconda installation is comprehensive but very large.
    —Type: Package manager with collection of packages
    —License: Depends on the edition; the "Anaconda Distribution" is free
    —Usage: Download

  • ArcGIS

    ArcGIS is geographic information system (GIS) software used to analyze location data and create interactive maps, which can be shared with the general public or with specific groups.
    —Type: Mapping
    —License: Commercial; Northwestern University has site license
    —Usage: Use online

  • DeGAUSS

    DeGAUSS (Decentralized Geomarker Assessment for Multi-Site Studies) is a decentralized method for geocoding that maintains the privacy of protected health information. It is useful for studying environmental and social determinants of health.
    —Type: Mapping
    —License: Free
    —Usage: Local computer

  • Google Colab

    Colab, or "Colaboratory," is a ready-to-use environment for program development. It lets you write and execute Python in your browser with: 1) zero configuration, 2) access to GPUs, which can process many pieces of data simultaneously, 3) no charge, and 4) easy sharing.
    —Type: Notebook
    —License: Free; requires a Google account
    —Usage: Visit the website and use

  • Jupyter

    Jupyter provides a notebook programming environment that is heavily used in machine learning and other areas of science. A notebook is a sequence of cells, each of which can contain plain text, formatted text, or an executable program written in any of dozens of languages, including Python and R.
    —Type: Notebook
    —License: Free
    —Usage: Run on your computer or a web server

  • Orange

    Orange helps users create visual workflows for data mining, machine learning, and data visualization, so they can focus on exploratory data analysis instead of coding.
    —Type: Data mining
    —License: Free
    —Usage: Download

  • Python

    Python is among the most widely used programming languages in the world and is especially popular in artificial intelligence, machine learning, and data science. It can be extended with thousands of add-on "modules."
    —Type: Programming language
    —License: Free
    —Usage: Download
    —Resources: Tutorial; Beginner's Guide

  • PyTorch

    PyTorch is an open-source machine learning framework used to create, train, and run models. It can be programmed in Python or C++.
    —Type: Machine learning framework
    —License: Free
    —Usage: Install locally or use a supported cloud platform

  • R

    R is a software environment for statistical computing and graphics, with its own programming language.
    —Type: Statistics
    —License: Free
    —Usage: Download

  • SAS

    SAS is software for statistical analysis, data mining, graphics, and forecasting.
    —Type: Statistics
    —License: Commercial product with a university discount; the free "University Edition" has been retired, and SAS OnDemand for Academics now provides free academic access
    —Usage: Download

  • TensorFlow

    TensorFlow is an end-to-end, open-source platform for machine learning, used to create, train, and run models. It was created by Google. Google Cloud offers Tensor Processing Units (TPUs), custom chips that accelerate machine learning workloads such as TensorFlow models.
    —Type: Machine learning framework
    —License: Free
    —Usage: Download or use online

  • Streamlit

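    Streamlit is a free, open-source Python framework for building and sharing machine learning and data science apps. It is helpful for machine learning engineers and data scientists who want to turn analysis scripts into interactive apps.
    —Type: App framework for machine learning and data science
    —License: Free
    —Usage: Install locally; share apps online, for example through Streamlit Community Cloud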
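Machine learning frameworks such as PyTorch and TensorFlow automate a loop that is easy to see in miniature. The framework-free sketch below is plain Python, not either framework's API: it "creates" a two-parameter linear model, "trains" it by gradient descent on mean squared error, and "runs" it on a new input. The training data are hypothetical, generated from y = 2x + 1, and the learning rate and step count are illustrative choices. A real framework would compute the gradients automatically and could run the arithmetic on GPUs or TPUs.

```python
# Framework-free sketch of "create, train, run" for a model y = w*x + b.
# The gradients are derived by hand here; PyTorch/TensorFlow would compute
# them automatically (automatic differentiation).

# Hypothetical training data generated from y = 2x + 1
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2 * x + 1 for x in xs]
n = len(xs)

# "Create" the model: two parameters, initialized to zero.
w, b = 0.0, 0.0
learning_rate = 0.05  # illustrative value, small enough to converge here

# "Train": repeatedly step w and b against the gradient of the loss
# MSE = (1/n) * sum((w*x + b - y)^2).
for step in range(2000):
    grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
    grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b


# "Run" (inference): apply the trained parameters to a new input.
def predict(x: float) -> float:
    return w * x + b


print(f"w={w:.3f}, b={b:.3f}, predict(10)={predict(10):.2f}")
```

The learned parameters should end up very close to w = 2 and b = 1. Frameworks scale this same pattern to models with millions of parameters.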
