
Data insights with quality tools
Pump some iron! Listed below are a number of tools we’ve found most useful for our research and data analysis.
Tool types explained
Data mining: Find anomalies, correlations, relationships, and patterns
Machine learning frameworks: Create, train, test, and implement machine learning models
Mapping: Connect data to geographic locations, such as how home location affects health
Notebooks: Convenient, interactive way to present programs, descriptions, and program output
Package managers: Used to install, update, and keep track of collections of software products
Programming languages: Most flexible tools; form the active elements of notebooks and the structure around frameworks
Statistics: Range from simple min, max, average, and median to various types of regression and classification trees
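As a small illustration of the simplest statistics named above, here is a minimal Python sketch that uses only the standard library. The heart-rate values are invented for this example and do not come from any study.

```python
import statistics

# Hypothetical sample: resting heart rates in beats per minute (made-up values)
heart_rates = [62, 71, 68, 75, 90, 66, 71]

# Collect the basic descriptive statistics in one dictionary
summary = {
    "min": min(heart_rates),
    "max": max(heart_rates),
    "average": statistics.mean(heart_rates),
    "median": statistics.median(heart_rates),
}

for name, value in summary.items():
    print(f"{name}: {value:.1f}")
```

The same few lines run unchanged in a Colab or Jupyter notebook, which makes them a reasonable first exercise before moving on to regression or classification trees.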
Get in some reps with Orange and Google Colab
Wondering where to begin exploring tools? Check out "Getting Started: Digital Healthcare and Data Science" for exercises in Orange and Google Colab.
The Tools
-
Anaconda is a popular platform that offers an easy way to perform data science and machine learning using Python or R. It includes a package manager with a graphical user interface (GUI). Thousands of open-source packages and libraries are available. The full Anaconda installation is comprehensive but very large.
—Type: Package manager with collection of packages
—License: Depends on the edition; the "Anaconda Distribution" is free
—Usage: Download -
ArcGIS is geographic information system (GIS) software used to analyze location data and create interactive maps. You can share insights with the general public or with specific groups.
—Type: Mapping
—License: Commercial; Northwestern University has a site license
—Usage: Use online -
DeGAUSS (Decentralized Geomarker Assessment for Multi-Site Studies) is a decentralized method for geocoding that maintains the privacy of protected health information. It is useful in studying environmental and social determinants of health.
—Type: Mapping
—License: Free
—Usage: Local computer -
Colab, or "Colaboratory", is a fully built environment for program development. It lets you write and execute Python in your browser with 1) zero configuration required, 2) access to GPUs, which process many pieces of data simultaneously, 3) no charge, and 4) easy sharing.
—Type: Notebook
—License: Requires Google account
—Usage: Visit the website and use -
Jupyter provides a notebook programming environment that is heavily used in machine learning and other areas of science. A notebook consists of a sequence of cells, each of which can contain plain text, formatted text, or executable code written in any of dozens of languages, including Python and R.
—Type: Notebook
—License: Free
—Usage: Run on your computer or a Web server -
Orange helps users create visual workflows for data mining, machine learning, and data visualization. Focus on exploratory data analysis instead of coding.
—Type: Data mining
—License: Free
—Usage: Download -
Python is one of the most widely used programming languages worldwide, especially in artificial intelligence, machine learning, and data science. It can be extended with thousands of "modules."
—Type: Programming language
—License: Free
—Tutorial
—Beginner's Guide
—Usage: Download -
PyTorch is an open-source machine learning framework used to create, train, and run models. PyTorch runs with Python or C++.
—Type: Machine learning framework
—License: Free
—Usage: Install locally or use a supported cloud platform -
R is a software environment for statistical computing and graphics, with its own programming language.
—Type: Statistics
—License: Free
—Usage: Download -
SAS is software for statistical analysis, data mining, graphics, and forecasting.
—Type: Statistics
—License: Commercial product; University discount; "University Edition" is free
—Usage: Download -
TensorFlow is an end-to-end open-source platform for machine learning used to create, train, and run models. It was created by Google. Google Cloud Platform offers Tensor Processing Units (TPUs) that accelerate TensorFlow workloads.
—Type: Machine learning framework
—License: Free
—Usage: Download or use online -
Streamlit is a free, open-source Python framework for building and sharing machine learning and data science apps. It is especially helpful for machine learning engineers.
—Type: Machine learning framework
—License: Free
—Usage: Use online
Join our community
Sign up and stay connected to receive the latest news and information about opportunities.