Your first task is to clean the data by replacing missing values and identifying and removing outliers, impossible values, and incorrect values.
Objectives
You’ve been given access to four Buhi files, containing many rows of data:
Customer data
U.S. demographics (third-party) data
Social media campaign data
Email campaign data
You can access these datasets and explore their contents by downloading the attached zip file containing the corresponding CSV files. Open the CSV files in Microsoft Excel or Google Sheets.
Your first task is to clean the data by replacing missing values and identifying and removing outliers, impossible values, and incorrect values.
You can clean the data manually or use formulas and functions to speed up the process. Make sure to check the Guides tab inside the Help menu in the top right corner. If you need help using formulas, watch the Excel tutorial videos.
To transform the data properly, you must:
Replace missing data
Find and remove outliers, impossible values, and incorrect values
You can replace missing data, outliers, and bad data in the customer data with the U.S. demographics third-party dataset based on the customers’ zip codes.
Assume that all of the zip codes in all of the datasets are accurate and verified.
Each dataset will require different data-cleaning transformations.
You’ll come across different types of data in the social media campaign file than the email campaign file.
Be sure to adjust your transformations accordingly.
For both datasets, you have the option to use part of your $10,000 budget to buy premium data transformation services from a contractor, McNelson Company.
Buying this premium service means McNelson will clean the data for you, but there are a couple of trade-offs.
Option 1: Buy a data transformation service for a particular file.
Your data is guaranteed to be clean, but you’ll have less money left in your budget for marketing. Also, your ultimate profits may be mediocre.
Option 2: Clean all the files yourself.
This is a higher-risk, higher-reward option. If you transform the data well, you’ll achieve similar results without paying for transformation service. This means more money for marketing, but it only works if you properly clean your dataset.
You will answer a series of questions after you’ve transformed each dataset.
Then, you’ll come up with a new budget for email and social media campaigns.
This should be based on the calculated profitability per campaign.
_________________________________________________________________________________________________________
1. Customer Data:
Recent inconsistencies and errors in Buhi’s customer data have made it hard to collect insightful information. You must clean the customer data to move forward with your analysis.
1 – Download Customer Data
The customer data contains the following information:
Age
Gender
State
Income
Education
Zip code
Customer ID
The U.S. demographics data will help you replace missing cells within the customer data. This file was purchased from McNelson Company and does not need transformation.
2 – Perform Data Transformations
Interpolate missing data using the estimated values from the U.S. demographics dataset based on the customers’ zip codes
Remove extreme outliers
Correct incorrect value formats (e.g., age = “twelve” instead of 12)
Remove impossible values (e.g., age = 203)
Identify categories from granular data (e.g., determine the state to which a zip code belongs)
Identify problem data with complex problems (e.g., the state is Montana, but the country is Canada; the purchase was $100, but the item is a new car; one column says the gender is female, but another says it is male)
3 – Upload Cleaned CSV File
Upload Customer Data
Accepted File Type(s): csv
File must contain 9 columns with the names: “ID”, “Customer ID”, “Age”, “Gender”, “State”, “Annual Income”, “Annual Disposal”, “Education Level”, “Zip Code”
File must contain 1500 rows (+- 50) .
_________________________________________________________________________________________________________
2.Email Campaigns
Recent inconsistencies and errors in Buhi’s email campaign data have made it hard to collect insightful information. You must clean the email campaign data to move forward with your analysis.
1 – Download Email Campaign Data
The email campaign data contains the following information:
Campaign ID
Bounces
Opens
Message and image classification
Emails sent
Clickthrough
Conversions
Profit per campaign
Timestamp
Customer profile ID
2 – Perform or Purchase Data Transformations
For the email campaign data, there is an added twist. In this round, you have the option to use part of your $10,000 budget to buy premium data transformation services from a contractor, McNelson Company. Buying this premium service means McNelson will clean the data for you.
Note: For the customer data file, you were instructed to replace the incorrect or missing values, but you do not need to do that for this file. During this round, you had the choice to purchase a third-party service to clean the data for you. If you chose to do it yourself, you will return to clearing the cell only.
***Purchase Decision****
$1,500.00
Contract McNelson Company to clean the email campaign data.
Warning: By clicking “Pay,” you are agreeing to purchase data transformation services from McNelson Company, and $1,500 will be deducted from your marketing budget. Yes or no ?
Basic Data transformations:
Remove extreme outliers
Remove incorrect value formats (e.g., age = “twelve” instead of 12)
Remove impossible values (e.g., age = 203)
Identify and remove problem data with complex problems (e.g., the state is Montana, but the country is Canada; the purchase was $100, but the item is a new car; one column says the gender is female, but another says it is male)
3 – Upload Cleaned CSV File
Upload Email Campaign Data
Accepted File Type(s): csv
File must contain 12 columns with the names: “ID”, “Campaign ID”, “Bounces”, “Opens”, “Message Classification”, “Image Classification”, “Emails Sent”, “Clickthrough”, “Conversions”, “Profit Per Campaign”, “Timestamp”, “Customer Profile ID”
File must contain 1123 rows (+- 40) .
_________________________________________________________________________________________________________
3.Social Media Campaign
Recent inconsistencies and errors in Buhi’s social media campaign data have made it hard to collect insightful information. You must clean the social media campaign data to move forward with your analysis.
1 – Download Social Media Campaign Data
The social media campaign data contains the following information:
Campaign ID
Click through on ads
Ad impressions
Time of day
Conversions
Profit per campaign
Likes
Price per campaign
Message and image classification
Customer profile ID
2 – Perform or Purchase Data Transformations
For the social media campaign data, there is an added twist. In this round, you have the option to use part of your $10,000 budget to buy premium data transformation services from a contractor, McNelson Company. Buying this premium service means McNelson will clean the data for you.
Note: For the customer data file, you were instructed to replace the incorrect or missing values, but you do not need to do that for this file. During this round, you had the choice to purchase a third-party service to clean the data for you. If you chose to do it yourself, you will return to clearing the cell only.
*****Purchase Decision****
$1,500.00
Contract McNelson Company to clean the social media campaign data.
Warning: By clicking “Pay,” you are agreeing to purchase data transformation services from McNelson Company, and $1,500 will be deducted from your marketing budget. Yes or no?
Basic Data transformations:
Remove extreme outliers
Remove incorrect value formats (e.g., age = “twelve” instead of 12)
Remove impossible values (e.g., age = 203)
Identify and remove problem data with complex problems (e.g., the state is Montana, but the country is Canada; the purchase was $100, but the item is a new car; one column says the gender is female, but another says it is male)
3 – Upload Cleaned CSV File
Upload Social Media Campaign Data
Accepted File Type(s): csv
File must contain 12 columns with the names: “ID”, “Campaign ID”, “Click through on ads”, “Ad impressions”, “Time of day”, “Conversions”, “Profit per Campaign”, “Likes”, “Price per Campaign”, “Message classification”, “Image classification”, “Customer Profile ID”
File must contain 953 rows (+- 30) .
———————————————————————————————————————————————————————————————-
Now that you’ve finished transforming the data, answer the questions below. Make sure to use the cleaned and transformed data for your calculations, rather than the original data files.
Analytics Report – Advanced Data Transformation
Answer the 10 questions below worth a total of 100 points.
According to the customer data, what is the average annual income for all customers?
Enter a whole number (ex. 1, 10, 100, 1000)
2. According to the social media campaign data, how many impressions were there in total?
Enter a whole number (ex. 1, 10, 100, 1000)
3. According to the customer data, what is the average annual disposable fund for all customers?
Enter a whole number (ex. 1, 10, 100, 1000)
4. Identify the customer profile in the social media campaign data with the highest number of likes. How many likes, in total, does that customer profile have?
Enter a whole number (ex. 1, 10, 100, 1000)
5.According to the email campaign data, how many conversions were there in total?
Enter a whole number (ex. 1, 10, 100, 1000)
6. Using total emails sent and total clickthroughs by customer profile from the email campaign file, identify the customer profile which had the highest click-through rate. What is the total click-through rate for that customer profile? Your manager defines click-through rate for emails as clickthroughs divided by emails sent.
Enter a percentage value (ex. 100%, 100.00%, .01%, 10)
7. According to the customer data, how many customers are in Idaho?
Enter a whole number (ex. 1, 10, 100, 1000)
8. According to the email campaign data, how many messages were classified as Newsletter?
Enter a whole number (ex. 1, 10, 100, 1000)
9. According to the customer data, what is the average customer age for all customers?
Enter a decimal number (ex. 1, 1.0, 10.01, .1001)
10. According to the social media campaign data, what is the average price per campaign across all campaigns?
Enter a whole number (ex. 1, 10, 100, 1000)
———————————————————————————————————————————————————————————————
Recommend Budget Allocation
Determine which customer profiles to target with email and social media campaigns, and recommend marketing budget allocations for the campaigns based on your transformed data and the questions you answered.
In each dataset, there is a column that lists one of seven customer profiles.
Based on your decision, you will measure the average profit per customer profile in both the email and social media campaigns. If the data is perfectly cleaned and the budget is allocated wisely, you should see an optimal performance.
You’ll need to select one customer profile to target from both the email campaign data and the social media campaign data.
Use the boxes below to recommend budget allocations for each campaign’s most profitable customer profile.
You must allocate a minimum of $1,000 to each campaign.
You can allocate a maximum of $5,000 to each campaign.
*If you purchased third-party transformation services, the cost has been deducted from your budget below.