DATA ENGINEERING

In today’s digital landscape, marketing is increasingly driven by data. In traditional marketing (e.g., television, radio, billboards, signs, and print), the impact of channels, strategies, and ads was often unclear; digital marketing lets us measure these impacts with precision. In fact, digital marketing has become more effective not just because of the large audiences online, but because it enables us to evaluate our marketing efforts more accurately and adjust targeting accordingly.
Therefore, marketing teams need robust and scalable data infrastructure that allows them to ingest, process, and analyze data from multiple sources and channels. This is where Marketing Data Engineering comes in: it is the backbone of modern digital marketing. The quality of the data products built in this field enables data-driven decisions that can significantly affect a company’s goals.
While Marketing Data Engineering is a vast field that can’t be fully covered in a single article, I’ve aimed to provide a concise introduction packed with essential information. Each of the topics mentioned has much more depth, which I hope to explore in future articles. To stay updated and receive the latest insights, be sure to subscribe to my Medium channel.
Introduction
Marketing Data Engineering involves the design, development, and maintenance of data products (including tables, data pipelines, dashboards, etc.) that ingest and process data from various marketing channels and partners. It requires a deep understanding of both Digital Marketing concepts and data engineering principles.
In one sentence, the goal of Marketing Data Engineering is to ensure that marketing data are:
- Reliable: accurate and continuously validated
- Available: consistently available and meeting its SLA
- Scalable: capable of accommodating more data, partners, or channels
- Actionable: ready for use by decision-makers
At its core, Marketing Data Engineering is about transforming raw marketing and business data into actionable marketing insights. A later section of this article covers the responsibilities of Marketing Data Engineers in detail.
Digital Marketing Channels
Before diving into the roles of Marketing Data Engineers, it’s essential to understand marketing channels. Marketing channels are the various platforms through which companies reach their target audience, including search engines, media outlets, social media platforms, and email, among others. Each marketing channel may encompass different partners and platforms. For instance, Google and Bing both fall under the search engine marketing channel, even though they are distinct platforms (often referred to as marketing partners). They share similar marketing components and concepts, so they’re grouped within the same channel but recognized as different partners. However, the data generated from search marketing campaigns is vastly different from that of social media campaigns. The challenge for Marketing Data Engineers is to standardize this diverse data to enable consistent and meaningful analysis.
Here, we briefly introduce a few widely used marketing channels.
Search Engine Channels
Search engines are critical digital marketing platforms and can be divided into three main marketing channels: Organic (SEO), Paid (SEM), and Metasearch.
SEO (Search Engine Optimization) or Organic Search Channel: This involves optimizing your website and content to rank higher in search engine results pages (SERPs) without paying for placement. The data collected here should show how to rank higher in each search engine (impressions) and attract a larger audience (clicks). Google, Bing, and Yahoo are well-known search engines for which many SEO techniques have been studied and developed.
SEM (Search Engine Marketing) or Paid Search Channel: This involves paying for ads to appear in SERPs. Unlike SEO, SEM pricing involves bidding on keywords. Your goal is to collect data that identifies which keywords, at which bid prices, are effective for your ads and which are not. As with SEO, Google, Bing, and Yahoo are big names in the SEM world.
Metasearch Engine Channel: This channel has gradually become popular in recent years, especially for e-commerce, hotel, and air travel advertising. Google, Bing, TripAdvisor, Kayak, Trivago, and Skyscanner all offer metasearch.
The following figure shows all three mentioned marketing channels associated with search engines on a sample Google result page.

Regardless of which search engine channels we use, a marketing data engineer must be able to handle data from all kinds of search engine campaigns, ensuring that it can be analyzed side by side to measure overall search marketing effectiveness.
Media Marketing Channel
Media marketing is traditionally associated with channels like TV, radio, and newspaper advertisements. However, many of these traditional media outlets have recently transitioned to the digital realm or recognized the value of incorporating digital marketing strategies. As a result, they have adopted digital tools to create, monitor, and measure the effectiveness of ad campaigns on their platforms. While it may still be challenging to track every user interaction with ads through these channels, we can assess their effectiveness in a broader sense.
Social Media Marketing (SMM) Channels
Social media marketing is another key area with both organic and paid components. Meta (with its Facebook and Instagram platforms), TikTok, X (formerly Twitter), LinkedIn, Snapchat, YouTube, and Pinterest are a few examples of partners in this type of marketing. Although SMM may look like a single marketing channel, it contains two distinct channels with different campaigns and ways of measurement.
Organic Social Media Marketing Channel: This refers to content you post on social media platforms without paying for distribution (for example, through your company’s social media pages). Here, key metrics include likes, shares, comments, and follower growth. We may also be able to link conversions on our website to this channel if a good tracking system is in place.
Paid Social Media Marketing Channel: This involves investing in ads or boosted posts on social media platforms. Key metrics include ad spend, impressions, clicks, and conversions. The challenge lies in crafting effective campaigns and managing them regularly to control costs while maximizing conversions. Common pricing models in this channel include Cost Per Click (pay each time a user clicks), Cost Per Thousand Impressions (pay for every thousand times the ad is shown), Cost Per Action (pay when a user takes a specific action), and Cost Per Engagement (pay based on user interactions with the ad, such as likes, shares, comments, or other forms of engagement).
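To make these pricing models concrete, here is a minimal Python sketch that computes what an advertiser would be billed for the same delivery under each model. It assumes a flat rate per billable unit, which is a simplification: real platforms usually set prices through auctions and bidding. The `Delivery` class and `billed_cost` function are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Delivery:
    """Hypothetical delivery counts for one ad over a reporting period."""
    impressions: int
    clicks: int
    actions: int
    engagements: int


def billed_cost(model: str, rate: float, d: Delivery) -> float:
    """Return the amount billed under a pricing model at a flat rate.

    rate is the price per click (CPC), per 1,000 impressions (CPM),
    per action (CPA), or per engagement (CPE).
    """
    if model == "CPC":
        return rate * d.clicks
    if model == "CPM":
        return rate * d.impressions / 1000
    if model == "CPA":
        return rate * d.actions
    if model == "CPE":
        return rate * d.engagements
    raise ValueError(f"unknown pricing model: {model}")


delivery = Delivery(impressions=50_000, clicks=400, actions=20, engagements=1_200)
for model, rate in [("CPC", 0.50), ("CPM", 5.00), ("CPA", 10.00), ("CPE", 0.10)]:
    print(model, round(billed_cost(model, rate, delivery), 2))
```

Running the same delivery through each model is a quick way to see which billing option would be cheapest for a given campaign profile.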
Email Marketing Channel
Though email marketing is considered a traditional digital marketing channel, it remains one of the most widely used. Recently, it has been successfully combined with advanced content marketing techniques such as automation and personalization. A prominent service provider in this space is Mailchimp. Pricing for email marketing is usually based on the number of emails sent and contacts per month.
App Marketing Channel
This channel is particularly important for businesses that promote their own apps. Currently, the Apple App Store and Google Play Store are the two most important platforms for promoting apps. The goal of these campaigns is to get the targeted audience to install the app, so the way we define goals and measure conversions differs from other channels. Apple Search Ads and Google Ads are two important marketing partners in this type of marketing. Cost Per Tap (pay when a user taps your ad) and Cost Per Install (pay when a user installs your app) are two pricing models in this channel.
Generic Ad Structure
Understanding the structure of digital advertising is crucial for effective campaign management and data analysis. The typical hierarchy in most advertising platforms, such as Google Ads or Facebook Ads, follows a multi-level structure that includes the Ad Account, Ad Campaign, Ad Group (called an Ad Set on some platforms), and Ads (see the following figure).

Each level serves a specific purpose and allows for detailed targeting, budgeting, and optimization.
Ad Account
The Ad Account is the highest level in the advertising hierarchy. It represents the organization or individual running the ads and is where billing and payment information is managed. An Ad Account typically contains multiple campaigns and is the central hub for tracking overall ad spend and performance across all campaigns.
Key responsibilities at the Ad Account level include setting up payment methods, managing user permissions (e.g., who can create or view ads), and accessing reporting dashboards that provide insights into the performance of all campaigns under the account.
Campaign
The next level is the Ad Campaign, which is where you define the broader objectives of your advertising efforts. An Ad Campaign is designed around a specific goal, such as increasing brand awareness, driving traffic to a website, or generating leads.
Each campaign has its own budget and schedule, and you can choose different campaign types depending on your objectives, such as search campaigns, display campaigns, or video campaigns. The settings at the campaign level, such as targeting criteria and bid strategy, apply to all the ad groups and ads within that campaign.
Since each campaign can be given any name, it’s more effective to use a standardized naming convention than free-form descriptive names. By establishing clear naming patterns within the marketing team, data engineers can later parse essential information, such as campaign objectives or point of sale, directly from the campaign name.
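As an illustration, suppose the team agrees on a hypothetical convention `<OBJECTIVE>_<POINT_OF_SALE>_<CHANNEL>_<free-text label>`, for example `CONV_US_SEM_brand_terms`. A minimal Python sketch of a parser for that convention might look like this; both the convention and the `parse_campaign_name` helper are assumptions for illustration, not a standard.

```python
from typing import NamedTuple, Optional


class CampaignInfo(NamedTuple):
    objective: str
    point_of_sale: str
    channel: str
    label: str


def parse_campaign_name(name: str) -> Optional[CampaignInfo]:
    """Parse a name following the hypothetical convention
    <OBJECTIVE>_<POINT_OF_SALE>_<CHANNEL>_<free-text label>.

    Returns None when the name does not follow the convention.
    """
    # maxsplit=3 keeps any underscores inside the free-text label intact
    parts = name.strip().split("_", 3)
    if len(parts) != 4 or not all(parts):
        return None
    objective, pos, channel, label = parts
    return CampaignInfo(objective.upper(), pos.upper(), channel.upper(), label)


print(parse_campaign_name("CONV_US_SEM_brand_terms"))
print(parse_campaign_name("Summer sale 2024"))
```

Returning `None` for non-conforming names lets a pipeline flag them for the marketing team instead of silently mis-parsing them. The same approach works for ad group names that encode device type or geo-location.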
Ad Group (Ad Set)
Within each campaign, you have Ad Groups (or Ad Sets, depending on the platform). Ad Groups allow for more granular control over your ads, enabling you to organize them based on specific targeting criteria, such as audience demographics, interests, or keywords.
Ad Groups are where you define more detailed settings, such as the bid for each click or impression and the specific targeting options, like location or device type. Each Ad Group can contain multiple ads, all of which share the same targeting and bidding settings defined at the group level.
Just like campaign names, ad group names can follow a specific pattern to include key objectives, such as targeted device type or geo-location. It’s crucial that everyone on the marketing team adheres to these naming conventions. This consistency allows data engineering teams to extract valuable information from the ad group names and provide more context for the measured data.
Ads
At the bottom of the hierarchy are the Ads themselves, which are the individual pieces of content that users will see. This is where you define the creative elements of your campaign, such as the headline, text, images, or video, as well as the call to action (CTA).
Each Ad within an Ad Group is subject to the targeting and bidding settings of the group but can be tested against other ads in the same group to see which performs best. This process, known as A/B testing, allows advertisers to optimize their creative assets by comparing the performance of different ad variations.
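One common way to decide whether one ad variant outperforms another is a two-proportion z-test on click-through rate (clicks divided by impressions). The sketch below assumes impressions are independent trials, which is only an approximation because the same user can see an ad several times; advertising platforms also offer their own experiment tools.

```python
import math


def ctr_z_test(clicks_a: int, imps_a: int, clicks_b: int, imps_b: int):
    """Two-proportion z-test comparing the CTRs of ad variants A and B.

    Returns (z, two-sided p-value). Treats each impression as an
    independent trial, which is an approximation.
    """
    ctr_a = clicks_a / imps_a
    ctr_b = clicks_b / imps_b
    # Pooled CTR under the null hypothesis that both variants perform equally
    pooled = (clicks_a + clicks_b) / (imps_a + imps_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / imps_a + 1 / imps_b))
    z = (ctr_b - ctr_a) / se
    # Two-sided p-value from the standard normal distribution
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


z, p = ctr_z_test(clicks_a=120, imps_a=10_000, clicks_b=165, imps_b=10_000)
print(f"z = {z:.2f}, p = {p:.4f}")
```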
Integrating the Structure
The structure of Ad Accounts, Campaigns, Ad Groups, and Ads is designed to give marketers flexibility and control over their advertising efforts. By organizing your ads in this hierarchical way, you can manage large-scale campaigns more effectively, ensuring that your marketing budget is allocated efficiently and that your ads are reaching the right audience with the right message.
For Marketing Data Engineers, understanding this structure is essential. It informs how data is collected and organized within the platform and impacts how performance metrics are reported and analyzed. Properly structuring your ad campaigns can lead to more accurate data insights, enabling better optimization and more effective marketing strategies.
Marketing Data
After understanding different marketing channels and a generic ad structure, it is essential to get familiar with marketing data. Data engineers commonly divide data into dimensions and facts, following data warehousing practice. I divide marketing data into these two groups here as well to make it easier to understand.
Context and Dimensions
Anything that gives context to a marketing activity (such as an ad) is considered a dimension. Here are some common dimensions for marketing measurement.
Ad structure dimensions: As mentioned in the previous section, each ad sits in a hierarchy of ad account, ad campaign, and ad group (or ad set). The IDs and names of each level (account ID, account name, campaign ID, campaign name, and so on) are dimensions, and they make it easy to aggregate reports at different levels.
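Because ad-level report rows usually carry the IDs of their parent levels, aggregating to any level of the hierarchy becomes a simple group-by. The following sketch assumes rows are dictionaries with hypothetical keys `account_id`, `campaign_id`, `ad_group_id`, and `ad_id`.

```python
from collections import defaultdict
from typing import Dict, Iterable, Mapping


def roll_up(rows: Iterable[Mapping], level: str, metric: str) -> Dict[str, float]:
    """Sum a metric over ad-level rows at the requested hierarchy level,
    e.g. level="campaign_id" or level="account_id"."""
    totals: Dict[str, float] = defaultdict(float)
    for row in rows:
        totals[row[level]] += row[metric]
    return dict(totals)


ad_rows = [
    {"account_id": "acc1", "campaign_id": "c1", "ad_group_id": "g1", "ad_id": "a1", "spend": 10.0},
    {"account_id": "acc1", "campaign_id": "c1", "ad_group_id": "g2", "ad_id": "a2", "spend": 5.5},
    {"account_id": "acc1", "campaign_id": "c2", "ad_group_id": "g3", "ad_id": "a3", "spend": 4.5},
]
print(roll_up(ad_rows, "campaign_id", "spend"))
print(roll_up(ad_rows, "account_id", "spend"))
```

This only works for additive metrics such as spend, impressions, and clicks; unique counts like reach cannot simply be summed across ad groups or campaigns.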
Device and User Attributes: Nowadays, platforms provide far more detail on user interactions with ads. This includes information about the impression device (where the ad is shown), such as device OS (iOS, Android) and device type (mobile, tablet, or desktop). Sometimes the same information is available for the device on which the conversion eventually happened (usually called the conversion or action device). In addition, if user breakdowns are selected properly, you might get aggregated user attributes such as age, gender, and location. Location breakdowns in particular are important for many advertisers; two common subdivisions are Country and Metro.
Attribution Windows: Conversion metrics are closely tied to attribution windows, which define the time period during which a conversion is attributed to a particular marketing effort. For instance, a 7-day click attribution window means that if a user clicks on an ad and converts within 7 days, the conversion is attributed to that ad. Conversions here cover many kinds of actions that might be the purpose of an ad or campaign, such as purchasing, checking out, booking, installing, registering, signing up, and donating. Different platforms may have different default attribution windows, which can significantly impact your analysis. Some platforms can provide data in different attribution windows depending on your report request; for example, they may let you choose 1-day view, 7-day click, or 7-day view/30-day click. It is highly recommended to review each partner’s attribution windows and find common (or similar) windows for further analytics. Understanding and customizing attribution windows is crucial for accurately measuring the impact of your marketing activities.
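Here is a minimal sketch of the 7-day click rule. It assumes the window is measured in exact 24-hour periods from the click and that both boundaries are inclusive; platforms differ on such details, so check each partner's definition.

```python
from datetime import datetime, timedelta


def within_click_window(click_time: datetime, conversion_time: datetime,
                        window_days: int = 7) -> bool:
    """True if the conversion happened after the click and no more than
    window_days * 24 hours later (both boundaries inclusive)."""
    return click_time <= conversion_time <= click_time + timedelta(days=window_days)


click = datetime(2024, 5, 1, 10, 0)
print(within_click_window(click, datetime(2024, 5, 6, 18, 30)))
print(within_click_window(click, datetime(2024, 5, 9, 9, 0)))
```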
Facts and Metrics
Basic Metrics: These fundamental metrics are essential for any digital marketing campaign and often serve as a starting point for more detailed analysis and advanced KPIs.
- Spend: This represents the total amount of money invested in a marketing campaign. It can be tracked at various levels, such as by day, platform, or ad unit.
- Impressions: This metric counts how many times an ad is displayed to users. It measures the exposure of your campaign at the top of the funnel.
- Clicks: This measures the number of times users interact with your ad by clicking on it. It’s crucial for evaluating the effectiveness of your ad copy and targeting.
- Engagement: This includes various user interactions with ads, such as likes, comments, shares, and video plays. It reflects how engaging the ads are to those who see them.
- Reach: Unlike impressions, which count the total number of ad displays, reach measures the number of unique users who saw the ad, regardless of how many times they viewed it.
- Frequency: This metric indicates the average number of times each reached user has seen your ad, i.e., impressions divided by reach.
- Video Metrics: For ads with video content, metrics such as the number of video plays and milestones (e.g., 50% or 100% completion) provide insights into video effectiveness and viewer engagement.
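Several of these basic metrics are usually combined into ratios. The sketch below derives frequency (impressions divided by reach), click-through rate, and a video completion rate; the function names and the choice to return `None` when a denominator is zero are illustrative assumptions.

```python
from typing import Dict, Optional


def _ratio(numerator: float, denominator: float) -> Optional[float]:
    # Return None instead of raising when the denominator is zero
    return numerator / denominator if denominator else None


def derived_metrics(impressions: int, reach: int, clicks: int,
                    video_plays: int = 0, video_completions: int = 0) -> Dict[str, Optional[float]]:
    return {
        "frequency": _ratio(impressions, reach),  # average impressions per reached user
        "ctr": _ratio(clicks, impressions),       # click-through rate
        "video_completion_rate": _ratio(video_completions, video_plays),  # 100% milestone / plays
    }


print(derived_metrics(impressions=10_000, reach=4_000, clicks=200,
                      video_plays=1_000, video_completions=250))
```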
Action and Conversion Metrics: These metrics go beyond the basics to measure user engagement and conversion actions. Conversions can vary widely, including actions such as purchases, checkouts, bookings, installations, registrations, sign-ups, and donations. Conversion metrics are closely tied to the attribution window, which defines the time frame within which a conversion is attributed to an ad interaction. For example, you might want to track whether a user makes a purchase 1 day or 30 days after interacting with an ad. As the time between ad interaction and conversion increases, the direct causal link between the ad and the conversion may weaken. Additionally, each platform reports conversions based on its own data and is blind to your marketing efforts elsewhere. For instance, Google might attribute a specific conversion to itself based on a 30-day click window, while Facebook (Meta) might attribute the same conversion to itself based on a 1-day click window. Both platforms are claiming the same conversion, but ultimately it’s up to you to determine how to attribute it based on your internal data and analysis. This is where the value of marketing analytics becomes evident, as you’ll see in the following sections.
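One way to settle competing claims is to apply a single attribution model to your own touchpoint data. The sketch below uses last-click attribution, which is just one of several models (first-click, linear, and data-driven are others), and assumes internal tracking records which partner each click came from for a given user.

```python
from datetime import datetime, timedelta
from typing import List, Optional, Tuple

# One user's clicks from internal tracking: (partner, click_time)
Touchpoint = Tuple[str, datetime]


def last_click_partner(touchpoints: List[Touchpoint], conversion_time: datetime,
                       window_days: int = 7) -> Optional[str]:
    """Credit the conversion to the partner of the most recent click that
    falls inside the attribution window (last-click model)."""
    window_start = conversion_time - timedelta(days=window_days)
    eligible = [(click_time, partner) for partner, click_time in touchpoints
                if window_start <= click_time <= conversion_time]
    if not eligible:
        return None  # no click inside the window: unattributed under this model
    return max(eligible)[1]


clicks = [("google", datetime(2024, 5, 1, 9, 0)), ("meta", datetime(2024, 5, 5, 12, 0))]
print(last_click_partner(clicks, conversion_time=datetime(2024, 5, 6, 8, 0)))
```

Because every conversion is credited to at most one partner, the totals across partners no longer double count, unlike the sum of each platform's self-reported conversions.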
Marketing Data Engineering Responsibilities
Now that you have a grasp of digital marketing fundamentals, the structure of generic ads, and key marketing data features, let’s explore the specific roles and responsibilities of data engineers in supporting marketing goals.
Data Lake Design, Implementation, and Maintenance
One of the primary responsibilities in Marketing Data Engineering is designing, implementing, and maintaining marketing data lakes. Data engineers must ingest two main types of data from various sources into a centralized data lake:
1. Ads Data from Marketing Platforms: This involves downloading data from platforms like Google, Meta, Bing, etc. Integrating these platforms through their APIs requires a deep understanding of APIs, data formats, and data transformation techniques. Many platforms provide SDKs (e.g., Meta Business SDK), while others rely on RESTful APIs (e.g., [TikTok](https://www.catchr.io/metrics/tiktok-ads-metrics) API for Business). As a data engineer, you’ll spend considerable time exploring API capabilities and limitations by reviewing official documentation and testing the APIs with tools like Postman. Official documentation is often insufficient, so third-party resources, such as those available for Facebook and TikTok, can provide more comprehensive insights. A proof-of-concept (POC) phase is highly recommended to thoroughly explore the APIs before building data pipelines. Understanding the granularity of the data available from each API is crucial: some only provide daily data at the ad group level, while others allow more detailed breakdowns by age, device, gender, and location. Assessing the available metrics, such as basic metrics or action/conversion metrics with varying attribution windows, is also vital for future analytics. Testing API thresholds, such as request-rate limits and daily data volume, is critical as well. To work within these restrictions, you may need to divide requests into smaller, parallel ones using threading, which brings its own complexities.
2. Business Performance Data: This includes data from business activities such as orders, invoicing, bookings, transactions, and clickstream data. Typically stored in transactional systems (OLTP), these data must first be ingested into the marketing data lake using batch- or stream-processing pipelines. It’s essential to consider how to store this data in the marketing data lake, as some may be sensitive customer or financial information that cannot be stored in its raw form. The sheer volume of this data can also significantly increase costs if not managed carefully.
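As one way to avoid landing sensitive data in raw form, a pipeline can drop columns that marketing analysis does not need and pseudonymize identifiers with a keyed hash (HMAC-SHA-256) before writing to the lake. The column names below are hypothetical, and in practice the secret key would come from a secrets manager; this is a sketch, not a compliance recipe.

```python
import hashlib
import hmac

# Hypothetical column names for an orders table
DROP_COLUMNS = {"card_number", "billing_address"}
PSEUDONYMIZE_COLUMNS = {"customer_email"}


def sanitize_order(record: dict, secret_key: bytes) -> dict:
    """Drop sensitive columns and replace identifiers with a keyed hash."""
    clean = {}
    for column, value in record.items():
        if column in DROP_COLUMNS:
            continue  # never land these in the marketing data lake
        if column in PSEUDONYMIZE_COLUMNS and value is not None:
            normalized = str(value).strip().lower()
            # HMAC-SHA-256: the same input always maps to the same token,
            # so joins still work, but tokens cannot be recomputed without the key
            value = hmac.new(secret_key, normalized.encode("utf-8"), hashlib.sha256).hexdigest()
        clean[column] = value
    return clean


order = {"order_id": 1, "customer_email": " Alice@Example.com ",
         "card_number": "4111111111111111", "amount": 20.0}
print(sanitize_order(order, secret_key=b"replace-with-a-managed-secret"))
```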
Regardless of the data source or type, the primary goal is to ensure seamless data flow from source to destination. Given that this data serves as the foundation for many downstream processes, defining a reasonable SLA and developing a strategy to meet it is crucial. The processes at this stage often consume the most resources and incur the highest costs, making good design essential. For example, indiscriminately dumping large amounts of clickstream data without a clear business use case can lead to significant storage costs, while running API request processes without leveraging threading can result in longer processing times and higher costs. These examples underscore the importance of efficiency in this stage to meet SLAs and minimize expenses.
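To illustrate the threading point, here is a sketch that splits a long date range into smaller chunks and requests them concurrently with Python's `ThreadPoolExecutor`. `fetch_report` stands in for a hypothetical partner API call; retries, backoff, and partner-specific rate limits are left out, and `max_workers` should be tuned to what each API actually allows.

```python
from concurrent.futures import ThreadPoolExecutor
from datetime import date, timedelta
from typing import Callable, List, Tuple

DateRange = Tuple[date, date]


def split_date_range(start: date, end: date, chunk_days: int) -> List[DateRange]:
    """Split the inclusive range [start, end] into chunks of at most chunk_days days."""
    chunks = []
    current = start
    while current <= end:
        chunk_end = min(current + timedelta(days=chunk_days - 1), end)
        chunks.append((current, chunk_end))
        current = chunk_end + timedelta(days=1)
    return chunks


def fetch_in_parallel(fetch_report: Callable[[date, date], list], start: date, end: date,
                      chunk_days: int = 7, max_workers: int = 4) -> list:
    """Run fetch_report (a hypothetical partner API call) once per chunk in a
    thread pool and concatenate the rows in chronological order."""
    chunks = split_date_range(start, end, chunk_days)
    rows: list = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in input order, even if calls finish out of order
        for chunk_rows in pool.map(lambda r: fetch_report(*r), chunks):
            rows.extend(chunk_rows)
    return rows


def fake_fetch(start: date, end: date) -> list:
    # Stand-in for a real API call; returns one row per requested chunk
    return [(start, end)]


print(fetch_in_parallel(fake_fetch, date(2024, 1, 1), date(2024, 1, 10), chunk_days=4))
```

Threads suit this workload because API calls spend most of their time waiting on the network rather than using the CPU.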
Data Warehousing and Dimensional Modeling
After data is ingested into marketing data lakes, it often arrives in an unorganized, unstructured, and unclean state. Before business or analytics teams can utilize this data effectively, it needs to be cleaned, transformed, and validated – tasks where data engineers focus much of their efforts.
Applying data warehousing principles is critical in this process, with dimensional modeling remaining a popular approach in marketing data warehousing. By organizing marketing data into fact and dimension tables, stakeholders, data analysts, and data scientists can more easily conduct further analysis, driving insights that inform business decisions.
One common challenge in applying dimensional modeling to ad data is handling slowly changing dimensions. Many marketing teams reuse an ad account or campaign for different purposes over time, so how to track these changes must be decided as part of the dimensional model.
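A common answer is a Type 2 slowly changing dimension: when a tracked attribute changes, the current row is closed and a new version is opened. Below is a minimal in-memory sketch for campaigns; the tracked attributes (name and objective) and the half-open validity interval `[valid_from, valid_to)` are assumptions for illustration.

```python
from dataclasses import dataclass, replace
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class CampaignVersion:
    campaign_id: str
    campaign_name: str
    objective: str
    valid_from: date
    valid_to: Optional[date] = None  # None marks the current version


def apply_scd2(history: List[CampaignVersion], campaign_id: str, name: str,
               objective: str, as_of: date) -> List[CampaignVersion]:
    """Return a new history: if the tracked attributes changed, close the
    current version at as_of and append a new open version."""
    current = next((v for v in history
                    if v.campaign_id == campaign_id and v.valid_to is None), None)
    if current is not None and (current.campaign_name, current.objective) == (name, objective):
        return history  # no change, keep the current version open
    updated = [replace(v, valid_to=as_of) if v is current else v for v in history]
    updated.append(CampaignVersion(campaign_id, name, objective, valid_from=as_of))
    return updated


h = apply_scd2([], "c1", "CONV_US_SEM_brand", "conversions", date(2024, 1, 1))
h = apply_scd2(h, "c1", "AWARE_US_SEM_brand", "awareness", date(2024, 3, 1))
for version in h:
    print(version)
```

Joining facts on the campaign ID and a date between `valid_from` and `valid_to` then attributes each day's spend to the campaign's purpose at that time.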
Since marketing data come from different advertising platforms, defining a similar grain and schema across them is a challenge. Each platform has its own limitations and naming conventions, but stakeholders find it much easier to work with a common grain and naming convention across all channels and partners.
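One way to reach a shared schema is a per-partner field map that renames source columns and converts units. Both partners and all their field names below are hypothetical; the second one reports cost in micros (millionths of the currency unit), a convention some platforms use.

```python
from typing import Any, Callable, Dict, Tuple

Converter = Callable[[Any], Any]


def _same(value: Any) -> Any:
    return value


def _micros_to_units(value: Any) -> Any:
    # 1,000,000 micros = 1 unit of the account currency
    return value / 1_000_000 if value is not None else None


# Hypothetical partners and field names: source field -> (common field, converter)
FIELD_MAPS: Dict[str, Dict[str, Tuple[str, Converter]]] = {
    "partner_a": {"day": ("report_date", _same), "campaign": ("campaign_id", _same),
                  "spend": ("spend", _same), "impressions": ("impressions", _same),
                  "clicks": ("clicks", _same)},
    "partner_b": {"date": ("report_date", _same), "campaign_ref": ("campaign_id", _same),
                  "cost_micros": ("spend", _micros_to_units), "impr": ("impressions", _same),
                  "clk": ("clicks", _same)},
}
COMMON_COLUMNS = ["partner", "report_date", "campaign_id", "spend", "impressions", "clicks"]


def to_common_schema(partner: str, row: Dict[str, Any]) -> Dict[str, Any]:
    """Rename and convert one source row into a shared daily campaign-level schema."""
    out: Dict[str, Any] = {"partner": partner}
    for source, (target, convert) in FIELD_MAPS[partner].items():
        out[target] = convert(row.get(source))
    # Fixed column order; columns a partner cannot provide stay None
    return {column: out.get(column) for column in COMMON_COLUMNS}


print(to_common_schema("partner_b", {"date": "2024-05-01", "campaign_ref": "c9",
                                     "cost_micros": 2_500_000, "impr": 1_000, "clk": 12}))
```

Keeping the mappings as data rather than code makes onboarding a new partner mostly a matter of adding one more entry.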
Another challenge in marketing data warehousing is choosing the right approach for complex conversion metrics. Sometimes conversion metrics are reported in STRUCT or MAP formats due to their complex nature, and it is the data engineer’s design choice how to expose these metrics in business tables. Remember, your stakeholders are not data engineers, and one of your responsibilities is to simplify the data as much as possible so that most of your stakeholders can use it. Keeping complex data structures makes your data products less appealing to use in their decision-making process.
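For example, if a partner returns conversions as a nested list of `{"action_type": ..., "value": ...}` entries, the pipeline can flatten the conversion types stakeholders care about into plain numeric columns. The input shape and the `conv_` column prefix are assumptions; verify the actual structure against each partner's API documentation.

```python
from typing import Dict, Iterable, List, Mapping, Optional


def flatten_actions(actions: Optional[Iterable[Mapping]], wanted: List[str],
                    prefix: str = "conv_") -> Dict[str, float]:
    """Flatten [{"action_type": "purchase", "value": "3"}, ...] into flat
    numeric columns such as {"conv_purchase": 3.0}. Types not present become 0.0;
    types not listed in `wanted` are ignored."""
    flat = {prefix + name: 0.0 for name in wanted}
    for item in actions or []:
        column = prefix + str(item.get("action_type"))
        if column in flat:
            flat[column] += float(item.get("value", 0))
    return flat


raw = [{"action_type": "purchase", "value": "3"},
       {"action_type": "add_to_cart", "value": "7"},
       {"action_type": "video_view", "value": "40"}]
print(flatten_actions(raw, wanted=["purchase", "add_to_cart", "app_install"]))
```

Zero-filling keeps the column set stable across partners, but it conflates "not reported" with "zero conversions"; if that distinction matters to your stakeholders, fill with `None` instead.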
Data Sharing with Marketing Partners
In certain situations, it becomes necessary to share marketing and business data with your marketing partners. But why is this sharing important? While marketing and business data are valuable assets, sharing specific data with partners can optimize the marketing process and improve targeting efforts.
Consider this: if your marketing partner drives traffic to your site but doesn’t know whether that traffic leads to conversions, they’re essentially working blind. You could rely on their algorithm and hope for the best, but a more strategic approach involves sharing conversion data with them. By providing insights into which redirected traffic results in conversions – such as purchases or app installs – you empower their algorithm to focus on the audience most likely to convert. This collaboration enhances the effectiveness of the partnership, benefiting both parties.
One of the key responsibilities of a data engineer is to establish secure and reliable connections between your business data resources and those of your partners. This allows for the seamless sharing of marketing and conversion data. Typically, this involves creating customized data pipelines and collaborating closely with the engineers on the marketing platform.
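When sharing conversions, raw customer identifiers should not leave your systems. Partners that match users by email typically expect a SHA-256 hash of the normalized (trimmed, lowercased) address; this must be a plain, unkeyed hash so the partner can compute the same value on their side. The payload fields below are hypothetical, since each partner's conversion API defines its own schema.

```python
import hashlib
import json
from datetime import datetime, timezone


def normalize_and_hash_email(email: str) -> str:
    """Trim and lowercase the address, then return its SHA-256 hex digest."""
    return hashlib.sha256(email.strip().lower().encode("utf-8")).hexdigest()


def build_conversion_event(order: dict) -> str:
    """Build a JSON payload for a hypothetical partner conversion endpoint.
    Field names are illustrative; follow the partner's own schema in practice.
    Expects order["created_at"] to be a timezone-aware datetime."""
    event = {
        "event_name": "purchase",
        "event_time": int(order["created_at"].timestamp()),  # Unix seconds
        "hashed_email": normalize_and_hash_email(order["email"]),  # never the raw email
        "value": round(order["amount"], 2),
        "currency": order["currency"],
        "click_id": order.get("click_id"),  # partner click ID captured on landing, if any
    }
    return json.dumps(event, sort_keys=True)


order = {"created_at": datetime(2024, 5, 1, tzinfo=timezone.utc), "email": " Bob@Example.COM",
         "amount": 49.99, "currency": "USD", "click_id": "abc123"}
print(build_conversion_event(order))
```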
Mapping Website Events to Metrics
A crucial task in Marketing Data Engineering is mapping website events to marketing metrics. Website events are user actions on your site, such as page views, purchases, bookings, or form submissions. These events need to be tracked, labeled, and mapped to relevant marketing metrics to provide a complete picture of user behavior.
For example, a button click on a product page might be mapped to a "product interest" metric, while a form submission could be mapped to a "lead generation" metric. This mapping process is essential for accurate attribution and performance measurement, enabling businesses to understand and optimize user interactions effectively.
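The mapping itself can be as simple as a lookup table maintained together with the marketing team. This sketch uses hypothetical event names and counts events per metric, skipping events that have no mapping.

```python
from collections import Counter
from typing import Dict, Iterable, Mapping

# Hypothetical event names agreed between the marketing and engineering teams
EVENT_TO_METRIC: Dict[str, str] = {
    "product_page_button_click": "product_interest",
    "lead_form_submit": "lead_generation",
    "checkout_complete": "purchase",
}


def count_metrics(events: Iterable[Mapping]) -> Counter:
    """Count tracked website events per marketing metric, skipping unmapped events."""
    counts: Counter = Counter()
    for event in events:
        metric = EVENT_TO_METRIC.get(event.get("event_name"))
        if metric is not None:
            counts[metric] += 1
    return counts


events = [{"event_name": "product_page_button_click"},
          {"event_name": "product_page_button_click"},
          {"event_name": "lead_form_submit"},
          {"event_name": "page_scroll"}]
print(count_metrics(events))
```

In a production pipeline, also counting the unmapped events separately is a cheap way to notice when the site's tracking plan changes.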
Analytics
Once data is collected and processed, it needs to be analyzed to derive insights. Analytics involves applying statistical and machine learning techniques to marketing data to uncover trends, predict outcomes, optimize campaigns, and distribute budget among channels and partners.
Common analytics tasks include A/B testing, cohort analysis, and customer segmentation. The insights gained from these analyses can then be used to inform marketing strategy and drive better decision-making.
Conclusion
In conclusion, Marketing Data Engineering is a complex but essential field that bridges the gap between marketing strategy and data-driven insights. By understanding the fundamentals of digital marketing, integrating data from various channels, and applying advanced analytics, Marketing Data Engineers empower marketing teams to make informed decisions that drive business success.
In future articles, I will cover some of these topics in more detail. Please follow me and subscribe to my channel for upcoming articles.