One topic that hasn’t been covered on Dear Analyst is master data management (MDM). I’m surprised it took this long before someone brought it up. I had never heard of the term before, but it looks like it’s a core strategy many large corporations use to manage their data. Korhonda Randolph studied systems engineering at the University of Pennsylvania and started her career in engineering. She went on to specialize in master data management at companies like AutoTrader, Cox Automotive, and SunTrust/BB&T (now merged). In this episode, Korhonda discusses what master data management is, cleansing CRM data at AutoTrader, and the various data issues you have to work through during a merger between two banks.
A “master” record in master data management
The definition of master data management according to Wikipedia is pretty generic:
Master data management (MDM) is a technology-enabled discipline in which business and information technology work together to ensure the uniformity, accuracy, stewardship, semantic consistency and accountability of the enterprise’s official shared master data assets.
After doing some quick research, I found that MDM is closely associated with data quality and data governance. The cynical side of me says this is one of those disciplines that was created by data vendors way back when. But given the size and scope of the projects MDM is used on, it’s more likely I just haven’t worked with people who practice the discipline.
At a high level, the goal of MDM is very simple: create a “master” record for a customer, product, or some other entity that doesn’t change very much. Korhonda discusses working on customer data, where properties like a customer’s first and last name would be an output of MDM. This data should stay consistent no matter which team or department is looking at the customer data.
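To make the idea concrete, here is a minimal sketch of consolidating per-system customer records into one “master” record. The field names and the survivorship rule (the most recently updated non-empty value wins) are my own illustrative assumptions, not the actual logic Korhonda’s teams used:

```python
def build_master_record(source_records):
    """Merge per-system customer records into one master record.

    source_records: list of dicts, each with an 'updated_at' timestamp
    and customer attributes like 'first_name' and 'last_name'.
    """
    # Apply oldest records first so newer non-empty values overwrite them.
    ordered = sorted(source_records, key=lambda r: r["updated_at"])
    master = {}
    for record in ordered:
        for field, value in record.items():
            if field != "updated_at" and value:  # skip empty values
                master[field] = value
    return master

crm = {"first_name": "Kay", "last_name": "", "updated_at": "2021-03-01"}
billing = {"first_name": "Kay", "last_name": "Smith", "updated_at": "2020-11-15"}
print(build_master_record([crm, billing]))
# {'first_name': 'Kay', 'last_name': 'Smith'}
```

Real MDM platforms make these “survivorship” decisions with configurable, per-field rules, but the shape of the problem is the same: many sources in, one trusted record out.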
Data cleaning CRM data at AutoTrader
AutoTrader was a trendsetter in the field of data. Early on, its data architects built their own MDM system to manage customer data. If the MDM system wasn’t built properly, the systems downstream of it wouldn’t function correctly. Korhonda’s team used Hadoop because AutoTrader works with many car dealerships that need data to help run their businesses.
Korhonda started as a project manager at AutoTrader helping coordinate all the moving parts of AutoTrader’s MDM system. Eventually she became a solutions architect on the data side.
I’ve talked about data cleaning in multiple episodes and I’ve discovered a few things about the process over the years:
- Excel and SQL are still the main tools used for data cleansing
- The same type of data problems exist at startups and large corporations alike
At AutoTrader, they were trying to figure out whether client A in the sales system was the same as client A in another system. There was missing data across systems, and AutoTrader would look for 3rd-party data sources to fill the gaps in the customer data. They might even contact the customer directly to get the data they need. At the end of the day, this type of data problem is not unique to AutoTrader. To this day, it still surprises me how simple and universal these data quality issues are.
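The “is client A in system 1 the same as client A in system 2” question is a record-matching problem. Here’s a hedged sketch of the simplest version: normalize the fields both systems share and compare them. (The field names are assumptions for illustration; real MDM matching engines use much more sophisticated, often probabilistic, rules.)

```python
def normalize(record):
    """Reduce a record to a comparable key: lowercased, trimmed name + 5-digit zip."""
    name = " ".join(record.get("name", "").lower().split())
    zip_code = record.get("zip", "").strip()[:5]
    return (name, zip_code)

def same_client(record_a, record_b):
    """Exact match on the normalized key."""
    return normalize(record_a) == normalize(record_b)

sales = {"name": "  ACME Motors ", "zip": "30301"}
support = {"name": "acme motors", "zip": "30301-1234"}
print(same_client(sales, support))  # True
```

Even this toy version shows why the problem is hard: every normalization choice (casing, whitespace, zip+4 suffixes) is a business rule someone has to sign off on.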
Korhonda also discusses “systems of engagement.” These are the interfaces (e.g. a form on a website) where data is entered by a customer. These systems of engagement have to ensure that all the required information, such as birthdays, is captured. It’s like Amazon validating that you entered your address correctly before shipping you a package.
“Analysts make the data flow”
Once the MDM system was in place, AutoTrader had a single source of truth for things like customers and dealerships. There was no more duplicate data. According to Korhonda, this had a profound operational impact on the business.
Korhonda talks about how data analysts are becoming more important at organizations where there are tons of data that needs to be analyzed. She says data analysts are just as important as the data engineers who are creating the back-end systems.
Analysts make the data flow.
Engineers are great at building systems, but knowing the right data to include in the system is where business owners come into play. Business owners are subject-matter experts who know about the business rules in the organization, and what type of data would make sense to include in the system.
Merging client data between SunTrust and BB&T
In 2019, BB&T Corporation and SunTrust Banks merged to become Truist Financial Corporation. SunTrust and BB&T were banks based primarily in the southeastern U.S. The two banks had an overlapping footprint, so many customers belonged to both. Behind the scenes, Korhonda was in charge of merging the customer data between the two banks. The customer data had missing birthdays and missing names, and a lot of legacy processes were creating dirty data. Needless to say, it was a mess.
There are a variety of bank regulations that I won’t get into, but it’s interesting to note how these regulations shaped the data processes Korhonda dealt with. For instance, there are federal rules about how much a customer can deposit at a bank: deposit too much, and the customer gets added to a special report. As a result, a clean list of customers was needed for the regulators before the merger could go through.
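As a toy illustration of why regulators need a clean customer list, consider flagging deposits over a reporting threshold. If the same customer exists under two IDs, a flagged deposit can be attributed to the wrong (or a duplicate) record. The threshold and field names below are assumptions for illustration only:

```python
REPORTING_THRESHOLD = 10_000  # assumed threshold for this sketch

def flag_for_report(deposits):
    """Return the customer ids whose single deposits exceed the threshold."""
    return sorted({d["customer_id"] for d in deposits
                   if d["amount"] > REPORTING_THRESHOLD})

deposits = [
    {"customer_id": "C1", "amount": 12_500},
    {"customer_id": "C2", "amount": 3_000},
    {"customer_id": "C1", "amount": 9_999},
]
print(flag_for_report(deposits))  # ['C1']
```

The report is only as trustworthy as the customer ids feeding it, which is exactly why the merged master list had to be clean first.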
Korhonda acted as the project manager and worked with business stakeholders to sign off on all the rules for the MDM system that was being developed. Each bank had thousands of processes for collecting and storing data, and small differences had a large impact on the project.
For instance, one system might have a 40-character limit for an address while the other has a 50-character limit. Do you increase the field size to the larger 50 characters? Do you truncate longer addresses? Korhonda and her team had to make decisions like this thousands of times, taking into account feedback from a variety of stakeholders.
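Here’s a small sketch of why the truncation option is risky. If the merged system keeps the stricter 40-character limit from the example above, longer addresses coming from the other system silently lose data:

```python
TARGET_LIMIT = 40  # the stricter of the two systems' limits in the example

def migrate_address(address, limit=TARGET_LIMIT):
    """Truncate to the target limit and report whether any data was lost."""
    truncated = address[:limit]
    return truncated, len(address) > limit

addr = "1234 Peachtree Industrial Boulevard Northwest, Suite 500"
migrated, lost_data = migrate_address(addr)
print(lost_data)  # True: the suite number was cut off
```

Widening the field avoids the data loss but may break downstream systems that assume the old limit, which is why each of these decisions needed stakeholder sign-off.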
Advice for companies working with dirty data
We ended the conversation with advice Korhonda has for organizations working with a lot of data that needs to be queried and cleaned up. Data lineage is a hot buzzword in the data infrastructure world (see episode #59 for a list of some companies in the data lineage space). In a nutshell, data lineage tools help you visualize how your data flows from the source all the way to the point where it gets consumed (typically by data analysts and business users). Referring back to the merger example, Korhonda said a robust data lineage platform would help you with issues like field lengths changing.
In addition to maintaining these data flow diagrams, Korhonda made a final plug for having MDM professionals maintain an organization’s MDM systems. Sometimes MDM systems are owned by a systems architect or a DBA, but those people may not see the overall picture of the data system.
In terms of advice for data analysts, Korhonda said that it’s more than just knowing how to write SQL. You have to know how to tell the story if you want to make an impact. The data storytelling skill has been repeated quite a few times in previous episodes.
Be a visionary and display data in a way that’s easy to understand.
Other Podcasts & Blog Posts
No other podcasts mentioned in this episode!