Why Genomics environments need to be privatized in Azure to meet PHI compliance.

The data used in an environment dictates some of the infrastructure requirements it must meet to pass muster with security organizations. When we are dealing with PHI regulated under HIPAA, or with FedRAMP requirements, we don’t have a choice: we have to provide configuration options for the Cromwell architecture that meet Microsoft standards for regulatory compliance. Beyond the Microsoft requirements, most organizations have strict PHI controls and tooling that need to be integrated to provide end-to-end security for the Cromwell environment. To accomplish this, and to provide a means of doing development work and data orchestration, we build an enclave.

When running Genomics workloads, there is typically optional open-source data and anonymized clinical data for research, along with a need to leverage live patient data for clinical operations. When there are no security control requirements around the data sets, we are free to use the base Cromwell on Azure deployment without special consideration of the data layer, which is incredibly useful for initial development and testing of pipelines. To make an environment like this actionable, we need to provide a means of bringing workloads into production and a framework for onboarding to a net-new security footprint. This involves factoring in security tooling and monitoring requirements, mapping dependencies, and fully planning the data flows in the end-state architecture.

For a research hospital in Rochester, New York, we had to secure each piece of the Cromwell on Azure architecture as well as lay in modern data warehousing for staging data. The first step was working directly with the Microsoft Biomedical Platforms & Genomics team for Cromwell to refactor the execution to leverage private container registries and supporting resources. Once this was complete, we had to define a method of staging containers from the Microsoft Container Registry into the environment via an intermediary system with internal access. Finally, a secure data ingress and egress method needed to be defined, in this case leveraging Blob storage and data connections from on premises, which could easily be augmented to use Azure-based data warehousing. To pass security muster, this entire process had to be firewalled: a method was put in place for validating and testing the Microsoft containers before pushing them into the Azure Container Registry, data source and destination firewalls and network paths were created, and all Cromwell assets logged every stage of usage centrally in Log Analytics, with a fork to Event Hubs for outside SIEM integration.
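The container staging step described above can be sketched as a simple planning function. This is a minimal illustration, not the method used in the engagement: it only computes where each Microsoft Container Registry image would land in a private Azure Container Registry, and it refuses anything that does not come from mcr.microsoft.com. The function and class names, the image name, and the registry name exampleenclave.azurecr.io are all hypothetical. The actual validation, testing, and push would happen on the intermediary system and are out of scope here.

```python
from dataclasses import dataclass

# Public source registry named in the text.
MCR_HOST = "mcr.microsoft.com"


@dataclass(frozen=True)
class StagingStep:
    """One planned copy from the public registry to the private one."""
    source: str
    target: str


def plan_staging(images, private_registry):
    """Map each MCR image reference to its private-registry equivalent.

    Raises ValueError for any image that does not come from MCR, so that
    nothing from an unexpected source is planned for the enclave.
    """
    steps = []
    for image in images:
        # Split "host/repository:tag" at the first slash only.
        host, _, repository_and_tag = image.partition("/")
        if host != MCR_HOST or not repository_and_tag:
            raise ValueError(f"not an MCR image reference: {image!r}")
        steps.append(StagingStep(image, f"{private_registry}/{repository_and_tag}"))
    return steps


if __name__ == "__main__":
    # Hypothetical image and registry names, for illustration only.
    for step in plan_staging(
        ["mcr.microsoft.com/example/task-runner:1.0"],
        "exampleenclave.azurecr.io",
    ):
        print(f"{step.source} -> {step.target}")
```

Keeping the plan separate from the push makes it easy to review the list of images with the security team before anything crosses the firewall.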

Privatizing Cromwell on Azure introduces a number of moving pieces that need to be dealt with, but it creates a secure space to help scale out research and a procedure for long-term usage. The deployment is procedural and could be automated; once it is set up and properly governed, scaling should be easy. The underlying components that power the platform can burst to meet demand, and there are multiple governance and cost recovery models that can be applied to the overall environment to ensure operational standards. In the case of our research hospital implementation, we deployed a single environment per organization, which tied its cost centers back to an internal project. With some additional effort, this can also be scaled out via automation to pre-stage and build environments on demand.
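One way to picture the per-organization cost recovery model is as a set of tags attached to each environment, with spend rolled up by internal project. The sketch below is an assumption about how that could be modeled, not a description of the hospital's implementation: the tag keys (organization, costCenter, project), the function names, and the sample values are all hypothetical, and the cost figures are made up for illustration.

```python
from collections import defaultdict


def build_environment_tags(organization, cost_center, project):
    """Return the tag set for one organization's environment.

    Tag keys are hypothetical; use whatever the governance policy requires.
    """
    tags = {
        "organization": organization,
        "costCenter": cost_center,
        "project": project,
    }
    for key, value in tags.items():
        if not value.strip():
            raise ValueError(f"tag {key!r} must not be empty")
    return tags


def roll_up_costs(records):
    """Sum cost per internal project from (tags, cost) pairs."""
    totals = defaultdict(float)
    for tags, cost in records:
        totals[tags["project"]] += cost
    return dict(totals)


if __name__ == "__main__":
    # Illustrative values only.
    oncology = build_environment_tags("Oncology", "CC-1001", "genomics-research")
    cardiology = build_environment_tags("Cardiology", "CC-2002", "genomics-research")
    print(roll_up_costs([(oncology, 100.0), (cardiology, 50.5)]))
```

Generating the tags from one function keeps every environment consistent, which matters once environments are built on demand by automation.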

The end result is the ability to securely deploy and onboard data into a Cromwell on Azure environment with all of the security boxes checked to ensure adherence to regulatory requirements, along with options for integration with an organization’s existing security footprint.