Destroying data silos is a quest any organization transitioning to a data-driven culture must undertake. Some see success while others fail after valiant (i.e. expensive) efforts. Even for those that think they’ve succeeded in killing off their data silos, the need to stay vigilant is ever present because data silos are like cockroaches. You think you’ve killed them all, so you relax for just a minute, and they’re back!
In this series, we’ll discuss some options for eliminating your existing data silos, how to ensure new ones don’t pop up and, finally, how to make the most of your data once it’s unified.
- The first option is to build a data warehouse. Here you bring select data from select systems into a central repository where the data is normalized and prepped.
- The second option is to build a data services layer where data engineers (technical users) can query disparate repositories and deliver a variety of blended data sets.
- The third option is a hybrid that includes both a data warehouse and a data services layer.
- The fourth option is a “data lake,” where all data is moved into a massively scalable storage layer, such as Hadoop’s HDFS, and query engines like Spark are placed on top.
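To make the first option concrete, here is a minimal sketch of the “normalize and prep” step at the heart of a data warehouse load, using in-memory SQLite databases as stand-ins for two siloed source systems and the central repository. All table names, fields, and records here are hypothetical, and a real pipeline would of course involve far more sources and far messier data.

```python
import sqlite3

# Stand-in for the central warehouse (in-memory SQLite for illustration).
warehouse = sqlite3.connect(":memory:")
warehouse.execute("""
    CREATE TABLE patients (
        patient_id TEXT PRIMARY KEY,
        name TEXT,
        department TEXT
    )
""")

# Raw records as they might arrive from two siloed systems,
# with inconsistent field names, casing, and whitespace.
clinical_rows = [{"pid": "p-001", "full_name": "ada lovelace", "dept": "REHAB"}]
billing_rows = [{"patientID": "P-002", "name": "Alan Turing ", "department": "finance"}]

def normalize(pid, name, dept):
    """Prep step: uniform ID casing, trimmed and title-cased names."""
    return (pid.upper(), name.strip().title(), dept.strip().lower())

# Load both silos into the one central schema.
for r in clinical_rows:
    warehouse.execute("INSERT INTO patients VALUES (?, ?, ?)",
                      normalize(r["pid"], r["full_name"], r["dept"]))
for r in billing_rows:
    warehouse.execute("INSERT INTO patients VALUES (?, ?, ?)",
                      normalize(r["patientID"], r["name"], r["department"]))

# The warehouse now holds a single normalized view of both silos.
rows = warehouse.execute(
    "SELECT patient_id, name, department FROM patients ORDER BY patient_id"
).fetchall()
```

The point of the sketch is the shape of the work, not the tooling: every source keeps its own quirks, and the warehouse load is where those quirks get reconciled into one schema.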
Bridging Your Data Silos Using a Data Warehouse
Let’s talk healthcare for a minute. Healthcare data is ugly. It’s big. It’s a mix of structured and unstructured data. It must be secured. It’s stored in a variety of different systems. The combination of these traits makes sharing healthcare data a bit of a nightmare for even the most technically sophisticated hospital networks. However, the upside of being able to efficiently share data across multiple departments is better patient outcomes, fewer claim denials and improved financial performance, so it’s worth the effort. Let’s take a look at what Shirley Ryan AbilityLab did with Sagence Consulting (a Knowi partner) to break down its data silos and implement a solution that enables data sharing across multiple departments within its hospital network.
The Shirley Ryan AbilityLab, formerly the Rehabilitation Institute of Chicago (RIC), is the #1-ranked global leader in physical medicine and rehabilitation for adults and children with the most severe, complex conditions — from traumatic brain and spinal cord injury to stroke, amputation, and cancer-related impairment. Shirley Ryan AbilityLab is the first-ever “translational” research hospital in which clinicians, scientists, innovators, and technologists work together in the same space, 24/7, surrounding patients, discovering new approaches and applying (or “translating”) research in real time.
Obviously, data plays a core role in their mission but was often locked in disparate repositories across the hospital, limiting the ability of administrators and clinicians to fully leverage it. A textbook example of data silos limiting an otherwise sophisticated data-driven culture.
AbilityLab decided to implement a healthcare data warehouse strategy to serve data to all its departments, from patient outcomes to finance. They selected Sagence Consulting to assist them in building the first iteration of the data warehouse.
Before they coded their first query or built their first data pipeline, the Sagence and AbilityLab team spent a considerable amount of time planning. Data warehouses take time to develop, so doing the right preparation upfront is essential.
Set Impactful but Achievable Goals
This can be summed up in the old adage “Don’t try to boil the ocean.” A critical factor for success in a data warehouse project is to build something that actually makes things better for people. This means giving people access to data they didn’t have before or making it significantly easier and faster for them to access existing data. I know it sounds obvious, but you’d be surprised.
At the same time, be careful not to get too far over your skis and try to deliver something so “revolutionary” that it requires specialized technology or skills to implement or use. If you keep saying to the team, “I know this sounds complicated, but it will change everything if we can do it,” stop. Step back. Rethink.
Get Buy-in at All Levels
I hear a lot that “we’ve got management buy-in and executive-level sponsorship,” so teams think they are all set and that, once the data warehouse is up, people will line up to get their user account. Well… not so much. Change is hard for most people, especially ones who perceive their roles as data gurus, the keepers of “the spreadsheet.” These people are incredibly vital to the success of your project, so dismiss them at your own peril.
They can help you understand where the bodies are buried when it comes to data-related processes. They know what data is good and what data is bad and, usually, why. The key is to show them how the technology will make their lives better so they can start using data to further their goals rather than spending all their time collecting, cleaning and preparing data for others to use. They will become the data warehouse’s greatest advocates and help with the significant task of change management.
Understand Current State of Available Data
This step can take the longest because it often morphs into a data quality and data entry process analysis exercise. Data quality is the elephant in the room when it comes to building a data warehouse or any kind of data analytics platform, for that matter. You want to start off your new data warehouse with pristinely accurate and complete data. Good luck with that.
Did I mention data is ugly? Naturally, some cleansing and improvement of data must happen, but don’t obsess over making every field complete and every piece of information validated. Your time is better spent addressing the root cause of the data quality issues and adjusting data collection and data entry processes. That will resolve data quality issues in the long term. With a couple of concentrated efforts to address legacy data issues, your data quality will get there.
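Fixing quality at the root usually means catching bad records at the point of entry instead of cleansing them in the warehouse later. Here is a minimal sketch of entry-time validation; the field names and rules are hypothetical stand-ins for whatever your intake forms actually collect.

```python
import re

# Hypothetical entry rules: each field must match a simple pattern.
# In practice these would mirror your real intake requirements.
RULES = {
    "patient_id": lambda v: bool(re.fullmatch(r"P-\d{3}", v)),
    "admit_date": lambda v: bool(re.fullmatch(r"\d{4}-\d{2}-\d{2}", v)),
}

def validate(record):
    """Return the list of fields that are missing or fail their rule."""
    return [field for field, ok in RULES.items()
            if field not in record or not ok(record[field])]

good = {"patient_id": "P-001", "admit_date": "2024-03-01"}
bad = {"patient_id": "001", "admit_date": "03/01/2024"}
```

A check this small, run at entry time, prevents the inconsistent IDs and date formats that otherwise have to be scrubbed out of the warehouse forever after.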
Build Processes People Can Actually Follow
That gets to my last point: data collection and data entry processes. If you have data quality issues, they can probably be traced to requiring people to enter too much data into too many systems. Wherever possible, automate integration between systems. For data that must be entered, keep the amount of data required to a minimum, at least in the beginning. Expecting people to enter data into one system and then turn around and enter similar data into another is not going to help your data quality issues.
If you cannot automate the integration, try to reduce the number of systems that need the data, and use the data warehouse to provide a centralized view of information rather than maintaining it in each system. I know business needs often dictate a different path, but think about how you can leverage your data warehouse to actually minimize the amount of data that is duplicated across systems.
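The “enter it once, sync it everywhere” idea can be sketched as a one-way sync that treats a single system as the point of entry and pushes its records downstream. The systems and records below are hypothetical; real integrations would go through APIs or ETL jobs, but the principle is the same.

```python
# Hypothetical single point of entry: staff type data into one system,
# and downstream systems receive its records automatically, so nobody
# re-keys the same data twice.
source_of_truth = {
    "P-001": {"name": "Ada Lovelace", "department": "rehab"},
}

downstream = {}  # e.g. the billing system's local copy

def sync(src, dst):
    """Copy new or changed records from the source system downstream.

    Returns the keys that were updated, so a real job could log them.
    """
    changed = []
    for key, record in src.items():
        if dst.get(key) != record:
            dst[key] = dict(record)  # copy so later edits flow through sync
            changed.append(key)
    return changed
```

Because the sync is idempotent (a second run with no changes copies nothing), it can run on a schedule without creating duplicates, which is exactly the property manual re-entry lacks.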
Sagence Consulting are experts in data, and they helped AbilityLab create a strategy that resulted in the successful deployment of an enterprise data warehouse built on PostgreSQL within six months of kicking off the project. Knowi provides the analytics and visualizations for the embedded dashboards used by multiple departments across the Shirley Ryan AbilityLab hospital network.
We recently did a webinar with Sagence where they went through, in detail, the architecture they deployed to support Shirley Ryan’s healthcare data warehouse. In the webinar, the team from Sagence walked through three different use cases: managing research project financials, claims management and data quality management.