For more than a decade, coronary heart disease and stroke have retained grim titles as the world’s two leading causes of death, and now claim over 15 million victims each year. Earlier this month, 3Clouders hit the streets once again, and marched alongside survivors, volunteers, and other engaged community members as participants in the American Heart Association’s annual Tampa Bay Heart Walk. We’re proud to have contributed to a record-breaking fundraising cycle, which supports health education, community outreach and the utilization of advanced algorithms in life-saving research.
The American Heart Association, our nation’s leading funder of both heart disease and stroke research outside the U.S. Government, has embarked on an aggressive campaign to promote cardiovascular health, aiming to reduce stroke and heart disease fatalities by 20%. It has contributed billions of dollars to cutting-edge medical studies, helping to fund the research of 13 Nobel laureates. At their Institute for Precision Cardiovascular Medicine, researchers are employing machine learning techniques in a cloud-based platform to perform image recognition tasks on echocardiograms (images of the heart captured via ultrasound) to evaluate cardiovascular health. While a cardiologist would need to review images one by one, machine learning algorithms can process enormous medical data sets much more quickly. It’s encouraging to know that the funds we help raise for the AHA each year will help support this level of life-saving research innovation.
While none of us 3Clouders are medical experts, we were curious what trends might present themselves through a brief analysis of some county-level data on cardiovascular disease obtained from the National Cardiovascular Disease Surveillance System. We joined these data with a variety of county-level socioeconomic indicators and health statistics courtesy of the U.S. Census Bureau, the Institute for Health Metrics and Evaluation, and the Centers for Disease Control and Prevention’s Division of Population Health.
For this data exploration exercise, we made use of Azure Machine Learning Studio, an integrated environment for development, testing, and deployment of predictive analytics solutions and machine learning workflows.
From the CDC’s county-level diabetes data set, we selected prevalence of diabetes, physical inactivity, and obesity as features to explore. We cleansed the data of records with unreported values, then joined them with county-level smoking rates, median household income, and our variable of interest for this exercise, heart-disease mortality.
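For readers who want to try the cleanse-and-join step outside Azure Machine Learning Studio, here is a minimal sketch in plain Python. It is not the workflow we built in the Studio. The county identifiers, field names, and every value below are hypothetical placeholders, and `drop_incomplete` and `inner_join` are helper names invented for this illustration.

```python
# Two hypothetical county-level tables keyed by a placeholder county ID.
# None marks an unreported value. All numbers are made up for illustration.
health = {
    "county_a": {"diabetes_pct": 10.1, "inactivity_pct": 24.3, "obesity_pct": 27.9},
    "county_b": {"diabetes_pct": 9.4, "inactivity_pct": None, "obesity_pct": 25.2},
    "county_c": {"diabetes_pct": 11.0, "inactivity_pct": 27.5, "obesity_pct": 30.4},
}
outcomes = {
    "county_a": {"smoking_pct": 18.2, "median_income": 50000, "hd_mortality": 330.0},
    "county_c": {"smoking_pct": 21.0, "median_income": 45000, "hd_mortality": 352.0},
    "county_d": {"smoking_pct": 19.5, "median_income": 47000, "hd_mortality": 341.0},
}


def drop_incomplete(table):
    """Keep only the rows in which every field was reported."""
    return {
        key: row
        for key, row in table.items()
        if all(value is not None for value in row.values())
    }


def inner_join(left, right):
    """Merge rows whose key appears in both tables."""
    return {key: {**left[key], **right[key]} for key in left if key in right}


# county_b is dropped (unreported value); county_d has no match in `health`.
joined = inner_join(drop_incomplete(health), drop_incomplete(outcomes))
print(sorted(joined))
```

An inner join keeps only counties present in every source, which trades coverage for complete rows. Whether that trade is acceptable depends on how many counties are lost.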
A correlation matrix produced with these features suggests a moderate-to-high degree of linear association among the variables, so as predictors in a statistical model, these features may contribute overlapping information.
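A correlation matrix of this kind can be sketched in a few lines of plain Python using the Pearson correlation coefficient. This is an illustration rather than the Studio output. The column names and values are hypothetical, and `pearson` and `correlation_matrix` are helper names chosen for this example.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return covariance / (spread_x * spread_y)


def correlation_matrix(columns):
    """Pairwise Pearson correlations for a dict of name -> list of values."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


# Hypothetical feature columns, one value per county (made-up numbers).
columns = {
    "inactivity_pct": [20.0, 24.0, 28.0, 31.0, 35.0],
    "obesity_pct": [22.0, 27.0, 29.0, 33.0, 36.0],
    "median_income_k": [62.0, 55.0, 51.0, 44.0, 40.0],
}

matrix = correlation_matrix(columns)
for name, row in matrix.items():
    print(name, [round(row[other], 2) for other in columns])
```

Off-diagonal entries close to 1 or -1 are the signal that two predictors carry largely the same linear information.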
After some exploratory data visualization, we found evidence of non-linear relationships between our features and heart disease mortality. One instance of such evidence is reflected below in the curved nature of the plot of physical inactivity versus heart disease mortality.
We selected decision forest regression, which is robust to non-linear relationships, to predict heart disease mortality by county after splitting the data into training and test sets. Both sets were fed into an Azure Machine Learning Studio module called Permutation Feature Importance, which measures model performance before and after the values of each predictor are randomly shuffled, one predictor at a time. Shuffling breaks the link between a predictor and the outcome while leaving the rest of the data intact. The predictors are then ranked by the decrease in the model’s coefficient of determination, R², caused by the shuffling. This method, as opposed to simply comparing the correlations between our features and heart disease mortality, accounts for possible non-linear relationships.
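The idea behind permutation feature importance can be sketched in plain Python. This is a simplified illustration, not the Studio module’s implementation. It assumes a model that is already trained and exposed as a `predict` function, and it uses a fixed stand-in function in place of a decision forest so the example stays short. The data, the `n_repeats` averaging, and all names are assumptions made for this sketch.

```python
import random


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


def permutation_importance(predict, rows, target, n_repeats=5, seed=0):
    """Average drop in R^2 when each feature column is shuffled in turn."""
    rng = random.Random(seed)
    baseline = r_squared(target, [predict(row) for row in rows])
    importances = []
    for j in range(len(rows[0])):
        total_drop = 0.0
        for _ in range(n_repeats):
            column = [row[j] for row in rows]
            rng.shuffle(column)  # break the link between feature j and the target
            shuffled = [row[:j] + (v,) + row[j + 1:] for row, v in zip(rows, column)]
            total_drop += baseline - r_squared(target, [predict(row) for row in shuffled])
        importances.append(total_drop / n_repeats)
    return importances


def predict(row):
    """Stand-in for a trained model: non-linear in feature 0, ignores feature 1."""
    return row[0] ** 2


# Hypothetical test set: feature 0 drives the target, feature 1 is unrelated.
rows = [(float(i), float((7 * i) % 10)) for i in range(10)]
target = [row[0] ** 2 for row in rows]

importance = permutation_importance(predict, rows, target)
print(importance)
```

Because the stand-in model never reads feature 1, shuffling it cannot change any prediction, so its importance is exactly zero. Feature 0 has a curved relationship with the target, yet its importance is still detected, which is the advantage over ranking by linear correlation alone.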
The output of the permutation feature importance task conveys similar information to the correlation matrix, which reinforces a key message the AHA continues to promote: to minimize our risk of developing cardiovascular diseases, it’s imperative that we take care of ourselves.
The pace of everyday life is increasingly brisk, and our frenetic modern lifestyles have exposed the limits of our cognitive bandwidth. It’s all too easy to slip into unhealthy habits and to pass up important measures for reducing our overall susceptibility to these deadly diseases. Having the heart to support groups like the AHA is a great way to help advance medical science, education, and training. Check out their recommendations for healthy eating, fitness, stress management, and more on their website. And if you find yourself in the Tampa Bay area this time next fall, we’d love for you to come walk with us!
Learn more about incorporating advanced analytics and machine learning into your business by contacting us here.