
Data Management Strategy: Part 3

Data Integration, Security & Master Data

Picture from Unsplash

Introduction

This is part 3 of a series of articles about designing and implementing a successful Data Management Strategy within an aspiring Digital Organization.

You can find the introduction to this series here.

In this article we will focus on the following topics:

  • Data Integration
  • Data Security
  • Master Data

These are key aspects of every Data Management initiative, and each will be discussed in depth. Specifically, they will be explored along the following dimensions:

  • People involved (organization)
  • Processes (activities)
  • Technology (the minimum capabilities a technological solution must offer to support each stage)

So, without further ado, let’s jump into it!

Data Integration

Data integration refers to the processes and technologies that move data from source systems to target systems. Along the way, data is transformed into information that suits business requirements.

Data from any source must be available when it is needed, in the form in which it is needed.

Data Integration Scenarios – ETL

ETL (Extraction, Transformation and Loading) refers to the data integration approach in which data is extracted from the source systems, transformed, and finally loaded into the destination system.

ETL is the approach typically used in Data Warehouse systems.

Figure by Author
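To make the sequence concrete, here is a minimal Python sketch of the ETL pattern using only the standard library. It assumes hypothetical order rows as the source and uses an in-memory SQLite database as a stand-in for the Data Warehouse. The `extract`, `transform` and `load` functions and the `sales` table are illustrative names, not part of any specific tool. The key point is that the data is cleaned in the integration layer before it reaches the destination.

```python
import sqlite3

# Hypothetical source system: raw order rows as an operational application would export them.
def extract() -> list[dict]:
    return [
        {"order_id": "1", "country": " es ", "amount": "120.50"},
        {"order_id": "2", "country": "FR", "amount": "80"},
    ]

# Transformation happens outside the target, before loading (the "T" comes before the "L").
def transform(rows: list[dict]) -> list[tuple]:
    return [
        (int(r["order_id"]), r["country"].strip().upper(), float(r["amount"]))
        for r in rows
    ]

# Load the already-clean rows into the destination (SQLite stands in for the warehouse).
def load(conn: sqlite3.Connection, rows: list[tuple]) -> None:
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales (order_id INTEGER PRIMARY KEY, country TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    conn.commit()

warehouse = sqlite3.connect(":memory:")
load(warehouse, transform(extract()))
print(warehouse.execute("SELECT * FROM sales ORDER BY order_id").fetchall())
```

In a traditional setup, the transformation step would typically run on a dedicated ETL engine between the source and the warehouse.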

Data Integration Scenarios – ELT

ELT (Extraction, Loading and Transformation) refers to the data integration approach in which data is extracted from the source systems and loaded into the destination system without transformation. The transformation is performed later, inside the target system. The ELT scenario is typical of Big Data/Hadoop systems.
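For contrast, here is a minimal sketch of ELT under the same assumptions: hypothetical rows, with an in-memory SQLite database standing in for the target platform. The raw values are landed untouched in a staging table. The transformation is then expressed in SQL and executed by the target system itself.

```python
import sqlite3

# Hypothetical raw extract; values are loaded exactly as received (no transformation yet).
raw_rows = [
    ("1", " es ", "120.50"),
    ("2", "FR", "80"),
]

target = sqlite3.connect(":memory:")  # stands in for the Big Data / warehouse platform

# E + L: land the raw data untouched in a staging table inside the target system.
target.execute("CREATE TABLE raw_sales (order_id TEXT, country TEXT, amount TEXT)")
target.executemany("INSERT INTO raw_sales VALUES (?, ?, ?)", raw_rows)

# T: the transformation runs later, inside the target, using the target's own engine (SQL).
target.execute(
    """
    CREATE TABLE sales AS
    SELECT CAST(order_id AS INTEGER) AS order_id,
           UPPER(TRIM(country))      AS country,
           CAST(amount AS REAL)      AS amount
    FROM raw_sales
    """
)
print(target.execute("SELECT * FROM sales ORDER BY order_id").fetchall())
```

Because the heavy lifting happens in the target's own engine, no separate transformation server is needed in this sketch.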

Traditional ETL Architecture

  • Lower performance
  • Higher cost
  • Requires dedicated transformation hardware
Figure by Author

ODI (Oracle Data Integrator) – ELT Architecture

  • Better performance
  • Single load step
  • No additional hardware costs

Batch Processing vs Real Time in Data Integration

Batch

  • In Batch Processing, a large group of transactions is collected and the data is processed during a single execution.
  • Because of the large volume of data, the process must run when system resources are less busy (usually at night).
  • Batch processing delays access to data, which may be unavailable for a period of time, and it requires close monitoring.
  • Because of this delay, the insight the data could provide is unavailable until processing is complete.
  • Problems that occur during batch processing can delay the entire process, so staff must monitor the job while it runs.
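Here is a minimal Python sketch of the batch pattern. It assumes hypothetical `Transaction` records and a made-up validation rule: non-positive amounts are rejected. Transactions accumulate during the day and are processed together in a single run. Scheduling that run (for example, overnight) is assumed to happen outside the script. Failed records are returned rather than silently dropped, so the staff monitoring the job can review them.

```python
from dataclasses import dataclass

@dataclass
class Transaction:
    account: str
    amount: float

def run_nightly_batch(pending: list[Transaction]) -> tuple[dict[str, float], list[Transaction]]:
    """Process every accumulated transaction in a single execution.

    Returns per-account totals plus the records that failed validation,
    so the staff monitoring the run can review them afterwards.
    """
    totals: dict[str, float] = {}
    failed: list[Transaction] = []
    for tx in pending:
        if tx.amount <= 0:  # hypothetical validation rule
            failed.append(tx)
            continue
        totals[tx.account] = totals.get(tx.account, 0.0) + tx.amount
    return totals, failed

# Transactions collected during the day; none of them is reflected in totals until the batch runs.
day_queue = [
    Transaction("A", 100.0),
    Transaction("B", 50.0),
    Transaction("A", -5.0),
    Transaction("A", 25.0),
]
totals, failed = run_nightly_batch(day_queue)
print(totals, len(failed))
```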

Real Time

  • Real-time processing handles small groups of transactions on demand, as they occur.
  • Its advantage is that it provides instant access to data, runs with fewer resources, and improves uptime.
  • With real-time Data Integration, you know the state of your business as transactions occur.
  • If an error occurs, it can be handled immediately.
  • Real-time processing design is more complex.
  • Although real-time processing systems require more effort to design and implement, the benefit to the business can be enormous.
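The real-time counterpart can be sketched the same way, again with hypothetical names and the same made-up validation rule. Each transaction is handled the moment it arrives, so the running totals are current after every event and a bad record is dealt with immediately. The in-memory list below only simulates a stream. In practice, events would more likely come from a message queue or a change-data-capture feed; that is an assumption, not something the article prescribes.

```python
from dataclasses import dataclass, field

@dataclass
class Transaction:
    account: str
    amount: float

@dataclass
class RealTimeProcessor:
    """Processes each transaction as soon as it arrives (hypothetical sketch)."""
    totals: dict[str, float] = field(default_factory=dict)
    errors: list[str] = field(default_factory=list)

    def on_event(self, tx: Transaction) -> None:
        if tx.amount <= 0:  # same hypothetical validation rule as the batch sketch
            # The problem is handled immediately, not discovered after a nightly run.
            self.errors.append(f"rejected {tx.account}: {tx.amount}")
            return
        self.totals[tx.account] = self.totals.get(tx.account, 0.0) + tx.amount

processor = RealTimeProcessor()
# Simulated stream of incoming transactions.
for tx in [Transaction("A", 100.0), Transaction("B", 50.0), Transaction("A", -5.0)]:
    processor.on_event(tx)
    print(processor.totals)  # the business view is up to date after every event
```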

Data Integration Roles

The Data Integration Specialist is primarily responsible for carrying out the activities associated with Data Integration. They will work closely with business owners, data managers, technical owners, data custodians, MDM specialists and data architects.

Their main responsibilities relate to the design and implementation of data integration applications, including defining mapping specifications and designing and implementing data integration jobs…

Figure by Author
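Mapping specifications, one of the deliverables of the Data Integration Specialist, describe which source field feeds each target field and which rule transforms it. The minimal sketch below assumes hypothetical source columns (`CUST_NO`, `NAME`, `CTRY`) and target fields. It shows how such a specification can be kept as data and applied directly by an integration job.

```python
from typing import Any, Callable

# Hypothetical mapping specification: target field -> (source field, transformation rule).
MAPPING_SPEC: dict[str, tuple[str, Callable[[Any], Any]]] = {
    "customer_id": ("CUST_NO", int),
    "full_name": ("NAME", lambda v: v.strip().title()),
    "country_code": ("CTRY", lambda v: v.strip().upper()),
}

def apply_mapping(source_row: dict[str, Any], spec=MAPPING_SPEC) -> dict[str, Any]:
    """Build one target record from one source record by following the spec."""
    return {target: rule(source_row[source]) for target, (source, rule) in spec.items()}

row = {"CUST_NO": "0042", "NAME": "  ada LOVELACE ", "CTRY": "gb"}
print(apply_mapping(row))
```

Keeping the specification as data, rather than scattering the rules through the job's code, arguably makes it easier for business owners and data custodians to review.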

Tools for Data Integration

The minimum requirements for every Data Integration tool are:

  • Ability to perform batch and real-time processing.
  • Ability to perform data change detection (identify modified records).
  • Ability to perform powerful transformations of structured and unstructured data.
  • Comprehensive error handling.
Figure by Author
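To illustrate the change-detection requirement, here is a minimal sketch of one simple technique: comparing two snapshots of a table by primary key and content hash. This is only one way to identify modified records; many tools instead read the database transaction log. The snapshot data and function names are hypothetical.

```python
import hashlib
import json

def row_hash(row: dict) -> str:
    # Stable fingerprint of a record's content (keys sorted so column order doesn't matter).
    return hashlib.sha256(json.dumps(row, sort_keys=True).encode("utf-8")).hexdigest()

def detect_changes(previous: dict[int, dict], current: dict[int, dict]) -> dict[str, list[int]]:
    """Compare two snapshots keyed by primary key and classify the differences."""
    inserted = [k for k in current if k not in previous]
    deleted = [k for k in previous if k not in current]
    updated = [
        k for k in current
        if k in previous and row_hash(current[k]) != row_hash(previous[k])
    ]
    return {"inserted": inserted, "updated": updated, "deleted": deleted}

yesterday = {1: {"name": "Ana", "city": "Madrid"}, 2: {"name": "Luis", "city": "Lima"}}
today = {1: {"name": "Ana", "city": "Sevilla"}, 3: {"name": "Eva", "city": "Quito"}}
print(detect_changes(yesterday, today))
```

Only the keys listed in the result need to be propagated to the target, instead of reloading the whole table.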

Master Data & Reference Data

Master Data

Master Data refers to the data that is agreed upon and shared across the entire organization, e.g., customer, employee and product data…

Reference Data

Reference Data is a subset of Master Data that defines the set of allowed values other data fields can take, e.g., country codes, classifications of industrial activities…

Managing this data centrally guarantees that the different departments that share it all work with the same values.
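As a small illustration of reference data acting as the set of allowed values, the sketch below validates a `country` field against a reference table. The table holds a hypothetical subset of ISO 3166-1 alpha-2 country codes. The records and the function name are made up for the example.

```python
# Hypothetical reference table: a small subset of ISO 3166-1 alpha-2 country codes.
COUNTRY_CODES = {"ES": "Spain", "FR": "France", "DE": "Germany", "US": "United States"}

def invalid_country_rows(rows: list[dict]) -> list[dict]:
    """Return the rows whose 'country' field is not an allowed reference value."""
    return [r for r in rows if r.get("country") not in COUNTRY_CODES]

customers = [
    {"id": 1, "country": "ES"},
    {"id": 2, "country": "Spain"},  # free text instead of the reference code
    {"id": 3, "country": "FR"},
]
print(invalid_country_rows(customers))
```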

Master Data Management

There are three key steps in every successful Master Data Management process:

  1. Master Data Management identifies records with potential matches
  2. Master Data Management applies business rules to combine and merge records
  3. Master Data Management creates the master record with trusted attributes
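The three steps can be sketched in Python for a toy customer domain. Everything here is an assumption for illustration: the records, the match key (a normalized e-mail address) and the merge rule (the most recently updated non-empty value wins). Real MDM products use far richer matching and configurable business rules.

```python
from collections import defaultdict
from datetime import date

# Hypothetical customer records for the same people, coming from two source systems.
records = [
    {"source": "CRM", "name": "Ana Garcia", "email": "ANA@EXAMPLE.COM", "phone": None, "updated": date(2023, 1, 10)},
    {"source": "ERP", "name": "Ana García", "email": "ana@example.com", "phone": "+34 600 000 000", "updated": date(2023, 3, 2)},
    {"source": "CRM", "name": "Luis Pérez", "email": "luis@example.com", "phone": None, "updated": date(2022, 11, 5)},
]
ATTRIBUTES = ("name", "email", "phone")

# Step 1: identify potential matches. Assumed match key: the normalized e-mail address.
def match_key(rec: dict) -> str:
    return rec["email"].strip().lower()

groups: dict[str, list[dict]] = defaultdict(list)
for rec in records:
    groups[match_key(rec)].append(rec)

# Step 2: apply a business rule to merge each group. Assumed rule: for every
# attribute, the most recently updated non-empty value wins.
# Step 3: the result is the master ("golden") record with trusted attributes.
def golden_record(key: str, group: list[dict]) -> dict:
    master: dict = {attr: None for attr in ATTRIBUTES}
    for rec in sorted(group, key=lambda r: r["updated"]):  # oldest first, newer values overwrite
        for attr in ATTRIBUTES:
            if rec[attr]:
                master[attr] = rec[attr]
    master["email"] = key  # store the normalized form
    master["sources"] = sorted({r["source"] for r in group})
    return master

masters = [golden_record(k, g) for k, g in groups.items()]
for m in masters:
    print(m)
```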

Reference Data Management

There are three key steps in every successful Reference Data Management process:

  1. Reference Data Management determines which records need to be looked up
  2. Reference Data Management provides the reference tables for those lookups
  3. Reference Data Management updates records to the business standard
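Here is a similar sketch for Reference Data Management. It assumes a hypothetical lookup table that maps the country spellings found in source data to ISO 3166-1 alpha-2 codes. Values that cannot be resolved are set aside for a data custodian to review instead of being guessed.

```python
# Hypothetical reference table mapping the variants found in source data
# to the business-standard value (ISO 3166-1 alpha-2 codes).
COUNTRY_LOOKUP = {
    "spain": "ES", "españa": "ES", "es": "ES",
    "france": "FR", "fr": "FR",
}

def standardize_country(records: list[dict]) -> tuple[list[dict], list[dict]]:
    """Update records to the standard code; return (standardized, unresolved)."""
    standardized, unresolved = [], []
    for rec in records:
        # 1. Determine the value that needs to be looked up.
        key = str(rec.get("country", "")).strip().lower()
        # 2. Search the reference table.
        code = COUNTRY_LOOKUP.get(key)
        if code is None:
            unresolved.append(rec)  # left for a data custodian to review
            continue
        # 3. Update the record to the business standard.
        standardized.append({**rec, "country": code})
    return standardized, unresolved

ok, todo = standardize_country([
    {"id": 1, "country": "España"},
    {"id": 2, "country": " fr "},
    {"id": 3, "country": "Atlantis"},
])
print(ok, todo)
```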

The criteria for differentiating Master Data from Reference Data are summarized in the following table:

Table by Author

Master and Reference Data Management Roles

The Master Data Management (MDM) specialist is the primary role responsible for carrying out the activities associated with Master and Reference Data Management. Although the role is specific to Master and Reference Data Management, MDM specialists work closely with Business Owners, Data Administrators, Technical Owners and Data Custodians.

The main responsibilities of the MDM Specialist relate to the design and implementation of the MDM product (e.g., defining the attributes in the MDM and creating MDM mapping documents).

Figure by Author

Master and Reference Data Tools

The minimum requirements for every Master and Reference Data tool are:

  • Support for multiple domains such as Customer, Product, Location and Accounts.
  • Efficient management of the relationships between domains, e.g., from customers to products.
  • Categorization, grouping and hierarchy of master data entities.
  • Powerful algorithms for identifying duplicate data and eliminating duplicates in cascade.
  • Ability to easily configure critical data elements to identify data matches according to organizational requirements.
  • Ability to configure business rules that keep information up to date and as recent as possible, resulting in a single data record, or Golden Record.
Figure by Author
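The requirement to configure critical data elements for matching can be sketched as a weighted, fuzzy comparison. The fields, weights and threshold below are assumptions. `difflib.SequenceMatcher` from the Python standard library is used as a simple stand-in for the more sophisticated matching algorithms found in dedicated MDM tools.

```python
from difflib import SequenceMatcher

# Hypothetical configuration: which critical data elements drive matching and how much each weighs.
MATCH_CONFIG = {"name": 0.5, "birth_date": 0.3, "postcode": 0.2}
MATCH_THRESHOLD = 0.85  # assumed value; each organization would tune its own

def similarity(a: str, b: str) -> float:
    """Similarity ratio in [0, 1] between two normalized strings."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()

def match_score(r1: dict, r2: dict, config: dict[str, float] = MATCH_CONFIG) -> float:
    """Weighted similarity over the configured critical data elements."""
    return sum(weight * similarity(r1[f], r2[f]) for f, weight in config.items())

def is_match(r1: dict, r2: dict) -> bool:
    return match_score(r1, r2) >= MATCH_THRESHOLD

a = {"name": "Jon Smith", "birth_date": "1980-04-12", "postcode": "28001"}
b = {"name": "John Smith", "birth_date": "1980-04-12", "postcode": "28001"}
c = {"name": "Jane Smyth", "birth_date": "1975-09-30", "postcode": "08002"}
print(is_match(a, b), is_match(a, c))
```

Changing `MATCH_CONFIG` or `MATCH_THRESHOLD` changes which records are treated as duplicates without touching the matching code, which is the kind of configurability this requirement asks for.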

Data Security

Data Security (data protection or data privacy) refers to the processes, policies and technology needed to protect confidential information from unauthorized access, both internal and external.

Examples of confidential information: health care number, date of birth, ethnicity, credit card number, sales plans…

Data Governance refers to the rules for how content is built.

Data Security refers to the rules for how content is protected and used.

Sensitive Information vs. Non-Sensitive Information

Non-Sensitive Information

  • Public Information: information that is already in the public domain (e.g., sex offender registration and voter registration files).
  • Routine business information: business information that is not subject to any special protection and that can be routinely shared with anyone inside and outside the company.

Sensitive Information

  • Personal and Private Information: information that belongs to a private person, who may choose to share it with others for personal or business reasons (e.g., Social Security number).
  • Business Confidential Information: information whose disclosure could harm the company (e.g., sales and marketing plans).
  • Classified Information: generally, information subject to the special security classification rules imposed by many national governments.
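One way to put this classification to work is to tag each column with a category and mask the sensitive ones before data is shared. The sketch below assumes a hypothetical column catalogue and a simple masking rule that keeps the last four characters. Unclassified columns are treated as the strictest category by default.

```python
from enum import Enum

class Sensitivity(Enum):
    PUBLIC = "public"                       # non-sensitive
    ROUTINE = "routine business"            # non-sensitive
    PERSONAL = "personal and private"       # sensitive
    CONFIDENTIAL = "business confidential"  # sensitive
    CLASSIFIED = "classified"               # sensitive

SENSITIVE = {Sensitivity.PERSONAL, Sensitivity.CONFIDENTIAL, Sensitivity.CLASSIFIED}

# Hypothetical catalogue tagging each column of a customer table with a category.
COLUMN_CLASSIFICATION = {
    "customer_id": Sensitivity.ROUTINE,
    "country": Sensitivity.PUBLIC,
    "ssn": Sensitivity.PERSONAL,
    "credit_card": Sensitivity.PERSONAL,
}

def mask(value: str, visible: int = 4) -> str:
    """Hide all but the last `visible` characters; values that are too short are hidden entirely."""
    hidden = max(len(value) - visible, 0)
    if hidden == 0:
        return "*" * len(value)
    return "*" * hidden + value[hidden:]

def redact_record(record: dict) -> dict:
    """Mask every field classified as sensitive; unknown columns default to the strictest level."""
    out = {}
    for col, val in record.items():
        level = COLUMN_CLASSIFICATION.get(col, Sensitivity.CLASSIFIED)
        out[col] = mask(str(val)) if level in SENSITIVE else val
    return out

print(redact_record({"customer_id": 7, "country": "ES", "ssn": "123-45-6789", "credit_card": "4111111111111111"}))
```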

Data Security Process

A successful Data Security Process follows four recommended steps:

  1. Define the Data Security Policy: identify the requirements related to data privacy, define the data protection policy, and define the guidelines for implementing Data Security.
  2. Provide Technological Support: implement appropriate technological tools to support the data protection policy.
  3. Implement the Data Security Policies: put the policies into practice and train personnel properly.
  4. Supervise and Control the Application of the Data Security Policies: ensure that the standards defined in the data protection policy apply to the entire organization.
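Several of these steps can be illustrated with a minimal policy-as-code sketch. The policy is written down as data (step 1), a small component enforces it on every access request (steps 2 and 3), and every decision is logged so that compliance can be supervised (step 4). The roles, sensitivity levels and the `AccessController` class are hypothetical. A production system would rely on the access controls of its actual platforms rather than on code like this.

```python
from dataclasses import dataclass, field

# Step 1 (hypothetical policy): which roles may read data at each sensitivity level.
POLICY = {
    "public": {"employee", "analyst", "dpo"},
    "confidential": {"analyst", "dpo"},
    "personal": {"dpo"},
}

@dataclass
class AccessController:
    """Step 2: a hypothetical technical component that supports the policy."""
    policy: dict[str, set[str]]
    audit_log: list[tuple[str, str, bool]] = field(default_factory=list)

    def can_read(self, role: str, level: str) -> bool:
        # Step 3: enforce the policy; unknown levels are denied by default.
        allowed = role in self.policy.get(level, set())
        # Step 4: record every decision so compliance can be supervised later.
        self.audit_log.append((role, level, allowed))
        return allowed

ac = AccessController(POLICY)
print(ac.can_read("analyst", "confidential"), ac.can_read("analyst", "personal"))
```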

Data Security Roles

The Data Privacy Officer is the key role responsible for implementing data protection policies. Although this function is specific to data security, the Data Privacy Officer works closely with all other data management functions.

The key responsibilities include:

  • Develop and implement the data security policy
  • Provide information and guidance on the processing of all personal data
  • Develop a ‘Best Practice’ Guide for staff
  • Provide training to staff
  • Process, coordinate and respond to all requests for information
Figure by Author

Technological Tools for Data Security

Email Protection

  • Inbound message filtering
  • Automatically filtering and deleting spam
  • Analysis of attachments and encryption of B2B communications

Antivirus

  • For employees with internet access
  • Installed and running on all computers in the network

Firewall

  • A hardware system that increases the security of the organization's computer network

Encrypted Wi-Fi

  • If the company has a Wi-Fi network, it must be encrypted, with access restricted to authorized personnel using an access code.

Data Storage in the Cloud

  • Save and work with documents in the cloud
  • Enable cloud security tools so that multiple users can securely access documents and files from a mobile device or a remote location.

Secure Network Access

  • Protect the organization's Internet access
  • Protect servers and systems from hacker attacks

Conclusion

This has been the last article in this series on how to implement a successful Data Management Strategy in an organization. If you liked this article, do not miss the introduction here.

If you liked this post then you can take a look at my other posts on Data Science and Machine Learning here.

If you want to learn more about Machine Learning, Data Science and Artificial Intelligence, follow me on Medium and stay tuned for my next posts!

