
Imagine you have two pucks that you want to fit snugly in an opening. If you only needed these three parts (the two pucks and the opening) to fit together once, you could measure how tall the pucks were, make your cut to that size, and adjust it until it’s perfect. If you are trying to fit these three parts together many times, that’s when things get dicey. Some of your stacks of pucks could be taller than your opening and not fit at all. Others could be much shorter than the opening, leaving your parts looser than you’d like.

You may design your yellow puck to be 30 mm tall, but the pucks will more likely be sized somewhere between 29 and 31 mm tall. In this scenario, your yellow puck height has a tolerance¹ of ± 1 mm.

You could measure each part and pair the larger yellow pucks with the smaller red pucks to force everything to work (a process known as binning). This does work, but measuring parts takes time. And as the old adage goes, time = money. Incidentally, time also equals time. So if you want to bin, you will need to be willing to spend a lot of both.

This is where tolerance stackups come in. When you have parts that need to fit in an opening, a tolerance stackup lets you determine whether your parts will always² fit in your opening, even if you are making hundreds of thousands of these assemblies.

A tolerance stackup is a way to create a loop through each critical dimension in the “stack.” It lets you examine the dimensions and their tolerances to figure out whether your design will work, and update it accordingly. It typically looks like this:
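Numerically, a stackup like this boils down to a worst-case sum and an RSS (root sum square) sum around the loop. Here is a minimal sketch, with made-up nominal dimensions and tolerances for the two pucks and the opening:

```python
# Hypothetical three-part stack: two pucks inside an opening.
# Dimensions and tolerances (mm) are illustrative, not from a real drawing.
dims = {
    "yellow_puck": (30.0, 1.0),   # (nominal, +/- tolerance)
    "red_puck":    (20.0, 0.5),
    "opening":     (51.0, 0.5),
}

# Nominal gap: opening minus the stack of pucks.
nominal_gap = dims["opening"][0] - dims["yellow_puck"][0] - dims["red_puck"][0]

# Worst case: every tolerance conspires against you at once.
worst_case_tol = sum(tol for _, tol in dims.values())

# RSS: tolerances treated as independent random errors.
rss_tol = sum(tol ** 2 for _, tol in dims.values()) ** 0.5

print(f"nominal gap: {nominal_gap:.2f} mm")
print(f"worst case:  +/- {worst_case_tol:.2f} mm")
print(f"RSS:         +/- {rss_tol:.2f} mm")
```

Worst case assumes every part lands at its tolerance limit simultaneously; RSS treats the errors as independent, which is why it predicts a tighter spread.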

In situations like the example above, a tolerance stackup is fairly easy to put together and it gives you all the information you need to create a good design. A tolerance stackup stops being quite so useful if:

- You have many more dimensions in your stackup.
- You are looking at the tolerance of round parts and their radial fit.
- You already know the distribution your parts are coming in at.

Each of these scenarios calls for a Monte Carlo simulation.

In a Monte Carlo simulation, you generate realistic values³ for every dimension that you had included in your tolerance stackup. Once you’ve generated these values, you add and subtract them around the same loop to generate a distribution for the critical dimension (in our example, the gap from 4 to 1). Finally, you’ll want to set an intended yield⁴ and see what that critical dimension is at your yield. This may tell you that you can make your design gap smaller, or that you need to make it larger. The distribution might also tell you there’s a strong case for binning.
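As a minimal sketch of that procedure, here is a Monte Carlo run of the hypothetical linear puck stack, treating each tolerance as ±3 standard deviations of a normal distribution (numbers are made up):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000  # simulated assemblies

# Hypothetical distributions: mean = nominal, 3 * sigma = tolerance.
yellow = rng.normal(30.0, 1.0 / 3, n)
red = rng.normal(20.0, 0.5 / 3, n)
opening = rng.normal(51.0, 0.5 / 3, n)

# Walk the same loop as the stackup: gap = opening - (yellow + red).
gap = opening - yellow - red

# Yield check against an intended spec: the gap must stay positive.
yield_frac = (gap > 0).mean()
print(f"mean gap {gap.mean():.3f} mm, std {gap.std():.3f} mm, yield {yield_frac:.4f}")
```

The simulated standard deviation of the gap lands near the RSS prediction rather than the worst-case one, which is the whole point of simulating.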

Notice that the piece parts follow the specs of the parts as anticipated in the tolerance stackup. However, the resultant gap is distributed tightly enough that the tolerance on the gap is tighter than the RSS value from the tolerance stackup.

To make things more interesting, let’s look at a radial example.

Imagine you are fitting a plastic peg into a metal opening as shown to the left. This peg has crush ribs on it⁵. You’d like to make the gap between the top plastic portion and the top metal portion as small as possible all around, without the two ever crashing into each other.

Seems simple, right? Unfortunately, it’s not. If everything were perfectly round, this would work well. But if you consider the scenarios drawn below, you’ll see that the tolerance stackup is missing some information. These images are exaggerations of realistic scenarios.

A Monte Carlo simulation allows you to simulate the radial angle each part is off center by and the radial angle at which each part is furthest from round. By simulating your result, you can account for the times the off-center features coincidentally let parts fit and the times they coincidentally cause parts to interfere. This often leads to a tighter and more realistic resultant tolerance than a tolerance stackup, because a stackup forces you to assume that the problem dimensions all sit at the same radial angle when in reality, they almost certainly don’t.

I have pasted the code I used to generate this simulation below:

# importing libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline

# Generating part measurement data
df = pd.DataFrame()
df['purple_r'] = np.random.normal(15.05, .1, 1000)
df['purple_off_center'] = np.random.normal(0, .02, 1000)
df['angle_purple_off_center'] = np.random.uniform(0, 2 * np.pi, 1000)
df['grey_r'] = np.random.normal(15.5, .03, 1000)
df['grey_off_center'] = np.random.normal(0, .02, 1000)
df['angle_grey_off_center'] = np.random.uniform(0, 2 * np.pi, 1000)

# Generating assembly measurement data
# Using df['angle_purple_off_center'] as the direction of the gap
df['Circularity Contribution Purple'] = df['purple_r']
df['Concentricity Contribution Purple'] = df['purple_off_center']
df['Circularity Contribution Grey'] = df['grey_r']
df['Concentricity Contribution Grey'] = df['grey_off_center'] * \
    np.cos(df['angle_purple_off_center'] - df['angle_grey_off_center'])

# Gap on the side where the purple part sits furthest off center:
# opening radius minus peg radius, minus the purple offset, plus the
# component of the grey offset along that same direction
df['gap'] = np.abs(df['Circularity Contribution Grey']) \
    - np.abs(df['Circularity Contribution Purple']) \
    - df['Concentricity Contribution Purple'] \
    + df['Concentricity Contribution Grey']

# Part measurement data graph
fig, ax = plt.subplots(nrows=2, ncols=3, figsize=(14, 8))
ax = ax.ravel()
ax[0].hist(df['purple_r'], bins=20, color='purple')
ax[0].set_title('Distribution of\nPurple Large Radius')
ax[0].set_xlabel('mm')
ax[1].hist(df['purple_off_center'], bins=20, color='purple')
ax[1].set_title('Distribution of Concentricity\nof Features on Purple Part')
ax[1].set_xlabel('mm')
ax[2].hist(df['angle_purple_off_center'], bins=20, color='violet')
ax[2].set_title('Distribution of Angle of\nOff Center for Purple Part')
ax[2].set_xlabel('radians')
ax[3].hist(df['grey_off_center'], bins=20, color='dimgray')
ax[3].set_title('Distribution of Concentricity\nof Features on Grey Part')
ax[3].set_xlabel('mm')
ax[4].hist(df['angle_grey_off_center'], bins=20, color='lightgray')
ax[4].set_title('Distribution of Angle of\nOff Center for Grey Part')
ax[4].set_xlabel('radians')
ax[5].hist(df['grey_r'], bins=20, color='dimgray')
ax[5].set_title('Distribution of\nGrey Large Radius')
ax[5].set_xlabel('mm')
plt.tight_layout();

# Assembly measurement data graph
fig, ax = plt.subplots(nrows=1, ncols=4, figsize=(14, 4))
ax = ax.ravel()
ax[0].hist(df['Circularity Contribution Purple'], bins=20, color='purple')
ax[0].set_title('Circularity Contribution Distribution \n Purple Outer Radius')
ax[1].hist(df['Concentricity Contribution Purple'], bins=20, color='violet')
ax[1].set_title('Concentricity Contribution Distribution \n Purple Radii Relative to Each Other')
ax[2].hist(df['Circularity Contribution Grey'], bins=20, color='dimgray')
ax[2].set_title('Circularity Contribution Distribution \n Grey Outer Radius')
ax[3].hist(df['Concentricity Contribution Grey'], bins=20, color='lightgray')
ax[3].set_title('Concentricity Contribution Distribution \n Grey Radii Relative to Each Other')
plt.tight_layout();

# Final Gap Graph
mu = df['gap'].mean()
sigma = df['gap'].std()
plt.hist(df['gap'], bins=20, color='black')
plt.title('Resultant Gap Distributions', fontsize=16)
plt.axvline(mu - 3 * sigma, color='green', alpha=0.5)
plt.axvline(mu + 3 * sigma, color='green', alpha=0.5)
plt.xlabel("Gap (mm)")
plt.ylabel("Assembly Count")
plt.tight_layout();

[1] Tolerances are determined in a number of ways. The design engineer will specify in a part drawing the tolerance for each critical dimension. This tolerance is usually based on best practices for part tolerances of that manufacturing method and on feedback from the vendor who is creating the part. If the tolerance stackup indicates that it needs to be tighter than best practices, in many cases that can be done too, but it will increase the cost of the part.

[2] By always, I mean almost always. Outliers are always going to happen. Parts will be out of spec. You may even be willing to design knowing that you’ll throw away parts. For the purposes of this blog “always” actually means “as often as your intended yield.”

[3] If this is being done before data is available, you can generate values based on how you would *expect* them to come in given your manufacturing process and quality. If you are already making parts, you can use the mean and standard deviation of each of your tools to generate data.

[4] If you’ve heard the term “Six Sigma” before and ever wondered what it means, nothing you’ve heard from Jack Donaghy is real. Strictly speaking, keeping ± 3 standard deviations in spec gives a yield of about 99.73%; the widely quoted Six Sigma yield of 99.99966% comes from assuming the process mean can drift by 1.5 sigma, leaving an effective 4.5-sigma margin (about 3.4 defects per million).
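For the curious, the sigma arithmetic can be checked directly from the normal CDF. Note that ± 3 standard deviations by itself corresponds to about 99.73%; the famous 99.99966% figure follows the Six Sigma convention of allowing a 1.5-sigma drift, which leaves an effective one-sided 4.5-sigma margin:

```python
from math import erf, sqrt

def phi(z):
    """Standard normal CDF."""
    return 0.5 * (1 + erf(z / sqrt(2)))

# Yield with +/- 3 sigma in spec: ~99.73%
three_sigma_yield = phi(3) - phi(-3)

# The "Six Sigma" 99.99966% figure: a 1.5 sigma drift leaves a
# one-sided 4.5 sigma margin (~3.4 defects per million).
six_sigma_yield = phi(4.5)

print(f"{three_sigma_yield:.4%}, {six_sigma_yield:.5%}")
```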

[5] Crush ribs are small ribs included in a softer part that will be crushed as it is pushed in to a harder part. Barring major issues with the circularity of the hard part or the crush ribs in the small part, this tends to create a fit close enough to centering the crushed feature within the crushing feature that you can assume the tolerance of the fit is zero.

Tolerance Stackups was originally published in Towards Data Science on Medium, where people are continuing the conversation by highlighting and responding to this story.

**Tips and tricks to promote brands and gain insights from data**

Suppose that some day you think of starting a company. In this day and age, one of the most important things is your company’s reach. If the company is built on products people use in day-to-day life, then you likely already have a lot of competitors in the market. What matters now is beating the heat of the competition. These days, the places where your company becomes accessible to people are not just advertising hoardings. It’s social media, where people scroll through their news feeds around three times a day. On average, an American spends 705 hours on social media every year. The chances of failing are negligible, and even if you do fail, you have nothing to lose.

So, let’s assume your company is officially participating in brand marketing on social media. You’ve set up a Facebook or Instagram page, as the brand to be marketed is a cosmetics brand (not a big fan of cosmetics, but the dataset used is that of a cosmetics brand). You respond to customer questions, follow fans, post important news, and thank your advocates for their support. Beyond that, are you taking enough action to monitor and analyze the results of your work? If you’re engaging in social media, then you should certainly measure those activities. Is there any better way of knowing? It’s easier done than said, which sounds ironic, huh?

Social media metrics are data and statistics that give you insights into your social media marketing performance. Some of these metrics are explained in the blog below.

For the purposes of this blog, we’ll segment the metrics into four different categories:

- **Awareness**: These metrics illuminate your current and potential audience.
- **Engagement**: These metrics show how audiences are interacting with your content.
- **Conversion**: These metrics demonstrate the effectiveness of your social engagement.
- **Consumer**: These metrics reflect how active customers think and feel about your brand.

I’ll be covering all the categories in general without going into the specific details for each. So, following are some of the metrics that you need to know to get a better understanding of the whole thing:

**Reach**: The total number of distinct users a post has reached.

**Impressions**: The number of times a post was displayed in users’ news feeds, irrespective of whether the post was clicked. If a post reaches n people and each of them sees it m times, the total impressions are m × n. People may see multiple impressions of the same post; for example, someone might see a Page update in their News Feed once, and then a second time if a friend shares it.

**Total number of page likes:** Likes show your audience size on Facebook. Over time, that number should be growing. If you’re stuck at around the same number of likes for months, it means one of two things:

1) You’re losing the same number of Likes as you’re gaining.

2) New people aren’t following you.

**Engagement**: It means the number of times the users have performed an action on the post. Engagement is one of the most important Facebook metrics you can track. Subjectively, engagement is a sign that people actually like the content you’re sharing. But another reason engagement is so valuable is it may give your posts more exposure to your audience. This includes liking, commenting, sharing and people who’ve viewed your video or clicked on your links and photos. And it also includes people who’ve clicked on a commenter’s name, liked a comment, clicked on your Page name and even gave negative feedback by reporting your post.

**Consumptions:** This metric is similar to Engagement, but a consumption does not necessarily produce a story.

**Total interactions:** As we know, the main motive is to increase the number of people viewing the post and the number of interactions (likes, comments, shares) with it, so that a story is created and automatically appears in the viewer’s friends’ News Feeds. The total interactions are calculated from the actions performed by Lifetime Post Engaged Users (story created) and Lifetime Post Consumers (story not created). Do note these metrics; they will be used later.

*Total interactions = Likes + Comments + Shares*

**Type of Posts:** There are four types of posts on Facebook: Video, Photo, Status, and Link. A general human tendency is to look at images, read them, and share them. Statuses are generally longer, and people are reluctant to read them. So, the natural bias is toward looking at and reacting to videos and pictures, followed by statuses.

Last year (2018), Facebook admitted that it prioritizes video in its algorithm, with extra emphasis on live video. If you can create video content, you have a much better chance of getting to the top of news feeds. So, you’ll see the later visualizations are a little biased towards Video Posts.

**Paid Likes**: Number of people who have Liked your Page as a result of a Facebook Ad campaign.

The Facebook Page analytics provide data and statistics related to these metrics. What I have is a dataset named Morto et al., which gives the metrics for the advertisement of a particular anonymized cosmetics brand on Facebook. The dataset has a total of 500 records with 19 columns in which such metric values are defined. The next part of this blog is getting started with R programming.

**R is a statistical programming language.** R possesses an extensive catalog of statistical and graphical methods, including machine learning algorithms, linear regression, time series, and statistical inference. There are various packages in R which make it easy to get tasks in the data science domain done. For background on R, visit https://en.wikipedia.org/wiki/R_(programming_language), and see https://www.tutorialspoint.com/r/index.htm for help with the syntax of the R programming language.

On to the project: I plotted various data visualizations which will help any layman make decisions as to what actually helps bring the advertisement close to the end user.

**Note**: The following observations and data analysis were done by intuition, and then some changes were made to get the best results.

**Chiefly, the total page likes depend on the following Facebook post metrics:**

- Total Reach
- Total Impressions
- Type of Post (Photo, Status, Video, Link)
- Weekday
- Hour of the day

With the help of multiple regression analysis, we get the coefficients for all the above metrics which will help us determine the importance of each metric, which helps in increasing our page likes.

**1. Multiple regression analysis:**

Why multiple regression? Multiple regression is an extension of linear regression to relationships between more than two variables. In a simple linear relationship we have one predictor and one response variable; in multiple regression we have more than one predictor variable and one response variable.

We create the regression model using the **lm()** function in R. The model determines the value of the coefficients using the input data. Next we can predict the value of the response variable for a given set of predictor variables using these coefficients.

data<-read.csv("Morto_Et_Al.csv")

input<-read.csv(file = "facebook.csv", sep = ",")[ ,c('PTL','LPTReach','LPTImpressions','Type','Weekday','Hour')]

model<-lm(PTL~LPTReach+LPTImpressions+Type+Weekday+Hour,data=input)

a <- coef(model)[1]

XLPTReach <- coef(model)[2]

XLPTImpressions <- coef(model)[3]

XTypePhoto <- coef(model)[4]

XTypeStatus <- coef(model)[5]

XTypeVideo <- coef(model)[6]

XWeekday <- coef(model)[7]

Xhour <- coef(model)[8]

**2. Prediction analysis:**

Here, the coefficients obtained from the multiple regression analysis are used to generate an equation; plugging in values for the variables, the equation predicts the number of likes depending on the type of post.

x1 <- readline("What is the total reach?")

x1 <- as.numeric(x1)

x2 <- readline("What is the value for total impressions?")

x2 <- as.numeric(x2)

x6 <- readline("What is weekday?")

x6 <- as.numeric(x6)

x7 <- readline("Which is the hour of the day?")

x7 <- as.numeric(x7)

x<-c("Photo","Status","Video")

type<-readline("What is the type of post?")

if ("Photo" %in% type) {
  Y = a + XLPTReach*x1 + XLPTImpressions*x2 + XTypePhoto*2.5 + XWeekday*x6 + Xhour*x7
  Z = a + XLPTReach*x1 + XLPTImpressions*x2 + XTypePhoto*3 + XWeekday*x6 + Xhour*x7
} else if ("Status" %in% type) {
  Y = a + XLPTReach*x1 + XLPTImpressions*x2 + XTypeStatus*1.4 + XWeekday*x6 + Xhour*x7
  Z = a + XLPTReach*x1 + XLPTImpressions*x2 + XTypeStatus*2 + XWeekday*x6 + Xhour*x7
} else if ("Video" %in% type) {
  Y = a + XLPTReach*x1 + XLPTImpressions*x2 + XTypeVideo*4 + XWeekday*x6 + Xhour*x7
  Z = a + XLPTReach*x1 + XLPTImpressions*x2 + XTypeVideo*5 + XWeekday*x6 + Xhour*x7
}

**3. Post weekday v/s Total Reach, Post weekday v/s Total Impressions:**

Here, I have plotted a line graph to show the relation, how a post weekday affects the total reach and impressions. Results show a post posted on weekday 3 gives the best results for reach, and the same post has the maximum number of impressions the following day. Makes sense, doesn’t it?

# Mean reach and impressions for each weekday (1-7)
means <- sapply(1:7, function(d) {
  day <- subset(data, Weekday == d)
  c(R = mean(day$LPTReach), I = mean(day$LPTImpressions))
})
v <- means["R", ]
t <- means["I", ]

png(file = "Weekday_TotalReach.png")
plot(v, type = "o", col = "red", xlab = "Weekday", ylab = "Total Reach",
     main = "Chart for analysis of total reach")
dev.off()

png(file = "Weekday_TotalImpressions.png")
plot(t, type = "o", col = "blue", xlab = "Weekday", ylab = "Total Impressions",
     main = "Chart for analysis of total impressions")
dev.off()

**4. Type of post v/s Total reach, Type of Post v/s Total Impressions**

Here, a bar chart is used to show the relation between the post type and reach, impressions. It helps in comparing the impact made by each of the post types. The results show that a “video” could be a better option for promoting your brand over other post types.

photodf <- subset(data,Type=="Photo")

meanPR <- mean(photodf$LPTReach)

meanPI <- mean(photodf$LPTImpressions)

#meanPLikesP <- mean(photodf$PTL)

statusdf <- subset(data,Type=="Status")

meanSR <- mean(statusdf$LPTReach)

meanSI <- mean(statusdf$LPTImpressions)

#meanPLikesS <- mean(statusdf$PTL)

videodf <- subset(data,Type=="Video")

meanVR <- mean(videodf$LPTReach)

meanVI <- mean(videodf$LPTImpressions)

#meanPLikesV <- mean(videodf$PTL)

linkdf <- subset(data,Type=="Link")

meanLR <- mean(linkdf$LPTReach)

meanLI <- mean(linkdf$LPTImpressions)

#meanPLikesL <- mean(linkdf$PTL)

ValueR <- c(meanPR,meanSR,meanVR,meanLR)

ValueI <- c(meanPI,meanSI,meanVI,meanLI)

Post <- c("Photo","Status","Video","Link")

png(file = "barchart_reach&postType.png")

barplot(ValueR,names.arg = Post,xlab = "Post Type",ylab = "Total Reach",col = "blue", main = "Total Reach v/s Post Type",border = "red")

dev.off()

png(file = "barchart_Impressions&postType.png")

barplot(ValueI,names.arg = Post,xlab = "Post Type",ylab = "Total Impressions",col = "red", main = "Total Impressions v/s Post Type",border = "blue")

dev.off()

**5. Total Interactions v/s Total reach, Total Interactions v/s Total Impressions**

Using linear regression, a graph is plotted that shows the dependency of total interactions on total reach, impressions which in turn would actually allow many viewers to visit the page.

ReachInt<-data$LPTReach

ImpressionInt<-data$LPTImpressions

LikesInt<-data$PTL

TotalInt<-data$Total.Interactions

relation <- lm(TotalInt ~ ReachInt)
png(file = "linearregression(ReachvsInteractions).png")
plot(ReachInt, TotalInt, col = "blue", main = "Reach v/s Interactions", cex = 1.3, pch = 16, ylab = "Total Interactions", xlab = "Total Reach")
abline(relation)
dev.off()

relation <- lm(TotalInt ~ ImpressionInt)
png(file = "linearregression(ImpressionsvsInteractions).png")
plot(ImpressionInt, TotalInt, col = "red", main = "Impressions v/s Interactions", cex = 1.3, pch = 16, ylab = "Total Interactions", xlab = "Total Impressions")
abline(relation)
dev.off()

relation <- lm(LikesInt ~ TotalInt)
png(file = "linearregression(LikesvsInteractions).png")
plot(TotalInt, LikesInt, col = "violet", main = "Likes v/s Interactions", cex = 1.3, pch = 16, xlab = "Total Interactions", ylab = "Total Page Likes")
abline(relation)
dev.off()

**6. Paid post v/s Total Reach:**

When you pay Facebook to promote a post, the total reach of the post is increased. So, in order to determine whether a post is paid, we use a decision tree, which shows that if the total reach > 10470, the post is a paid post. So, as per the current dataset, if you have to make the post reach around 10,000 people, you need to pay for it and Facebook ends up promoting it.

data1 <- read.csv("facebook.csv")
library(party)
png(file = "Paid(FullData).png")
# Fit the tree on the first 100 rows; Paid is treated as a factor for classification
output.tree <- ctree(as.factor(Paid) ~ LPTReach + LPTImpressions, data = data1[1:100, ])
plot(output.tree)
dev.off()

**7. How many users who have engaged with the post have already liked the page?** I plotted a pie chart to show the distribution. The results show that 70% of the people who engage with the post have already liked the page. This statistic helps the post spread among their friends and also prompts them to like the post, or maybe the page.

LPEuser1<-data$LPEuser

LEngaged1<-data$LEngaged

divi.result <- (LPEuser1/LEngaged1)*100

LPEuser1mean<-mean(divi.result)

LEngaged1mean <- 100 - LPEuser1mean

x<-c(LPEuser1mean,LEngaged1mean)

piepercent<- round(100*x/sum(x), 1)

png(file = "Engaged_users.png")
pie(x, labels = piepercent, main = "Engaged users %", col = rainbow(length(x)))
legend("topright", c("Users already liked the page", "Others"), cex = 0.8, fill = rainbow(length(x)))

dev.off()

**8. Engaged Users, Consumptions as a ratio of Impressions:**

Until now, we have focused on the parameters which directly affect the total page likes. Now the factors which indirectly affect them, i.e. engaged users, consumers, and consumptions, are taken into consideration. Here, the statistics are plotted on a pie chart, which helps in determining the percentage of consumers and engaged users from the total reach, and also the consumptions and engaged users from the total impressions, for every type of post.

This helps in showing whether a particular post type is suitable for increasing the number of interactions and promoting the post.

LPConsumptions<-photodf$LPConsumptions

LPTimpressions<-photodf$LPTImpressions

LEngaged2<-photodf$LEngaged

divi.result<-(LPConsumptions/LPTimpressions)*100

divi.result2<-(LEngaged2/LPTimpressions)*100

LEngaged2mean <- mean(divi.result2)

LPConsumptionsmean <- mean(divi.result)

others<-100-(LEngaged2mean + LPConsumptionsmean)

x<-c(LEngaged2mean,LPConsumptionsmean,others)

piepercent<- round(100*x/sum(x), 1)

png(file = "Impressions_Engaged_Consumptions(photo).png")

pie(x,labels=piepercent, main = "Engaged Users,Consumptions from total impressions(photo)%", col = rainbow(length(x)))

legend("topright", c("Engaged users%","Lifetime Post Consumptions%","Others%"), cex = 0.8,fill = rainbow(length(x)))

dev.off()

LPConsumptions<-statusdf$LPConsumptions

LPTimpressions<-statusdf$LPTImpressions

LEngaged2<-statusdf$LEngaged

divi.result<-(LPConsumptions/LPTimpressions)*100

divi.result2<-(LEngaged2/LPTimpressions)*100

LEngaged2mean <- mean(divi.result2)

LPConsumptionsmean <- mean(divi.result)

others<-100-(LEngaged2mean + LPConsumptionsmean)

x<-c(LEngaged2mean,LPConsumptionsmean,others)

piepercent<- round(100*x/sum(x), 1)

png(file = "Impressions_Engaged_Consumptions(status).png")

pie(x,labels=piepercent, main = "Engaged Users,Consumptions from total impressions(status)%", col = rainbow(length(x)))

legend("topright", c("Engaged users%","Lifetime Post Consumptions%","Others%"), cex = 0.8,fill = rainbow(length(x)))

dev.off()

LPConsumptions<-videodf$LPConsumptions

LPTimpressions<-videodf$LPTImpressions

LEngaged2<-videodf$LEngaged

divi.result<-(LPConsumptions/LPTimpressions)*100

divi.result2<-(LEngaged2/LPTimpressions)*100

LEngaged2mean <- mean(divi.result2)

LPConsumptionsmean <- mean(divi.result)

others<-100-(LEngaged2mean + LPConsumptionsmean)

x<-c(LEngaged2mean,LPConsumptionsmean,others)

piepercent<- round(100*x/sum(x), 1)

png(file = "Impressions_Engaged_Consumptions(video).png")

pie(x,labels=piepercent, main = "Engaged Users,Consumptions from total impressions(video)%", col = rainbow(length(x)))

legend("topright", c("Engaged users%","Lifetime Post Consumptions%","Others%"), cex = 0.8,fill = rainbow(length(x)))

dev.off()

LPConsumptions<-linkdf$LPConsumptions

LPTimpressions<-linkdf$LPTImpressions

LEngaged2<-linkdf$LEngaged

divi.result<-(LPConsumptions/LPTimpressions)*100

divi.result2<-(LEngaged2/LPTimpressions)*100

LEngaged2mean <- mean(divi.result2)

LPConsumptionsmean <- mean(divi.result)

others<-100-(LEngaged2mean + LPConsumptionsmean)

x<-c(LEngaged2mean,LPConsumptionsmean,others)

piepercent<- round(100*x/sum(x), 1)
png(file = "Impressions_Engaged_Consumptions(link).png")

pie(x,labels=piepercent, main = "Engaged Users,Consumptions from total impressions(link)%", col = rainbow(length(x)))

legend("topright", c("Engaged users%","Lifetime Post Consumptions%","Others%"), cex = 0.8,fill = rainbow(length(x)))

dev.off()

Thus, the above was the statistical data analysis done in R, as an amateur, to get hands-on experience with a real-world dataset. So, next time you start marketing your brand, do it on Facebook and watch it spread like wildfire. Also, do keep a check on the metrics; they will help you grade your marketing skills 😊

Brand Marketing on Facebook using Statistical Analysis was originally published in Towards Data Science on Medium, where people are continuing the conversation by highlighting and responding to this story.


Learning about statistical/machine learning can be intimidating, especially if you’re like me and coming from another field (e.g., social sciences, life sciences, business, etc.). There are hundreds of complex models to choose from and numerous schemes to validate your data. Here, we’re going to take a closer look at a simple model for prediction you probably came across if you completed a statistics course: regression. Why so simple? Regression may not be as sexy as some other methods, but gaining a deeper understanding of the basics is undoubtedly important for moving on to more complex models.

When learning about regression, the emphasis probably wasn’t on **prediction** (or where predicted values come from) but rather on jumping to the output, checking coefficients, and praying that the p-values associated with said coefficients fell below .05. Regression need not be just a tool for inferential statistics. As I touched on, if you’ve run a regression analysis, you’ve already completed a computation that involved calculating predicted values.

Below, we’ll unpack how prediction unfolds in two different contexts. In the first context, we’ll be working with data where the y value is present — a common situation where an analyst is making inferences about the relationship between the outcome and predictor variables. In the second context, we’ll be predicting new y values on left out data. We’ll do this all without the help of functions or libraries to illustrate how linear regression can be used as a basic predictive tool with the classic iris data set.

The first thing to do is to load the iris data set:

Before we get into our own calculations, let’s begin by looking at how we can fit a linear model and use that to predict some new data in base R with the “lm” function. We’ll attempt to predict Sepal Length from the other 3 numeric variables in the iris data set. To begin, we’ll grab some training data (70/30 split) and fit our model.
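The original R listing (using `lm`) is not reproduced here, but the equivalent fit can be sketched in Python/NumPy; variable names and the fixed seed are my own choices, not the post's:

```python
import numpy as np
from sklearn.datasets import load_iris

iris = load_iris()
X_all = iris.data  # sepal length, sepal width, petal length, petal width

# Predict sepal length (column 0) from the other three numeric variables.
y = X_all[:, 0]
X = X_all[:, 1:]

# 70/30 train/test split.
rng = np.random.default_rng(42)
idx = rng.permutation(len(y))
n_train = int(0.7 * len(y))
train, test = idx[:n_train], idx[n_train:]

# Pad with a column of ones for the intercept, then least-squares fit
# (the analogue of fitting with R's lm on the training data).
X_train = np.column_stack([np.ones(n_train), X[train]])
beta, *_ = np.linalg.lstsq(X_train, y[train], rcond=None)
print("coefficients:", beta)
```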

Now, we’ll create some test data and combine that with our model from the training data to create some predictions.

The predictions look great! If someone accidentally deleted a bunch of information regarding iris’ sepal lengths in a flower database, we can safely say we’d do a sound job recovering that information.

Of course, our concern here is to determine what “predict” is doing to come up with those values. We’ll take a look at that in a second; but first, we’ll examine prediction in the context of linear regression models in a more general sense.

We’ll start by examining how new values can be predicted on a set of data where **both **the y values and predictor variables are present (using the iris training data we created as an example). First, we need to separate our x values from our y values. We’ll pad the x matrix with a column of ones to represent the intercept.

Now, we need to transpose x and multiply it by itself (*X'X*). This results in a sum of squares/cross-product matrix, *SSCP*. It turns out that with a few manipulations, this matrix can depict the variance/covariance, correlation, or cosine association between a set of variables. If you’re interested in hearing a bit more about that, check out my post here.
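As a tiny illustration (a made-up 4×3 design matrix standing in for the padded x), the SSCP is just the matrix product of the transpose with itself, the analogue of R's `crossprod`:

```python
import numpy as np

# Small illustrative design matrix: intercept column plus two predictors.
X = np.array([
    [1.0, 2.0, 1.0],
    [1.0, 3.0, 0.0],
    [1.0, 5.0, 2.0],
    [1.0, 4.0, 3.0],
])

# Sum of squares / cross-product matrix: X'X.
sscp = X.T @ X
print(sscp)  # symmetric; top-left entry is n (the intercept column squared)
```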

Our calculations match the output from the actual cross-product function.

Next, we are going to calculate a projection matrix, or *hat matrix* (on its diagonal are the observation leverages).

This will allow us to map *y* into predicted y values.
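A sketch with a small made-up design matrix shows the idea: H = X(X'X)⁻¹X' is a projection (applying it twice changes nothing), and multiplying it by y "puts the hat on y":

```python
import numpy as np

# Tiny made-up example: intercept plus one predictor.
X = np.array([
    [1.0, 2.0],
    [1.0, 3.0],
    [1.0, 5.0],
    [1.0, 4.0],
])
y = np.array([1.0, 2.0, 4.0, 3.5])

# Hat (projection) matrix: H = X (X'X)^{-1} X'. Its diagonal holds leverages.
H = X @ np.linalg.inv(X.T @ X) @ X.T

# Fitted values are a projection of the observed y.
y_hat = H @ y
print(y_hat)

# Leverages sum to the number of model parameters (here, 2).
print(H.diagonal().sum())
```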

So, if we have access to the y values, we can combine them with a projection matrix to obtain predictions. That’s all well and good, but how can we predict new values of y in our test data using our training data? To do that, we need to calculate beta coefficients for our training data that contain information about the relationship between our y and x values.
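The normal-equations solution β = (X'X)⁻¹X'y uses only training data, so it can score rows whose y is unknown. A sketch with made-up numbers (not the iris values):

```python
import numpy as np

# Made-up training data: intercept column plus one predictor.
X_train = np.array([
    [1.0, 2.0],
    [1.0, 3.0],
    [1.0, 5.0],
    [1.0, 4.0],
])
y_train = np.array([1.0, 2.0, 4.0, 3.5])

# Normal equations: beta = (X'X)^{-1} X'y, what lm solves (via QR) under the hood.
beta = np.linalg.inv(X_train.T @ X_train) @ X_train.T @ y_train
print(beta)  # [-1.05, 1.05]

# New predictions need only beta, not the training y.
X_test = np.array([[1.0, 6.0]])
print(X_test @ beta)  # [5.25]
```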

We can ensure that our calculated beta values are indeed correct by comparing them to the values produced by “lm”.

We can add the model intercept to our test data and multiply the test predictors by the beta coefficients to find our predicted values.

What we’ve learned so far can easily be extended to a kfold cross validation scheme. The function below will create folds and return a list containing the original data set with the folds as a new column. In addition, a list containing the fold indices themselves will also be returned.
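The post's fold-maker was written in R; a hedged Python sketch of the same idea (names are my own) assigns each row a fold label and also returns the index lists:

```python
import numpy as np

def make_folds(n_rows, k=5, seed=0):
    """Assign each row to one of k folds; return (labels, per-fold index lists)."""
    rng = np.random.default_rng(seed)
    # Balanced labels 0..k-1, shuffled so folds are random.
    labels = rng.permutation(np.arange(n_rows) % k)
    fold_indices = [np.where(labels == fold)[0] for fold in range(k)]
    return labels, fold_indices

labels, fold_indices = make_folds(150, k=5)
print([len(idx) for idx in fold_indices])  # five folds of 30 rows each
```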

Let’s create the folds and make sure they make sense.

And fit our model using 5 fold cross-validation:

We can extract and examine the predictive summary statistics.

Finally, we can pass our fold indices to the popular predictive modeling package, caret, and confirm our calculations.

Averaging across folds, our statistics match caret’s output.

In sum, we took a closer look at how prediction functions in the context of regression. We were ultimately able to apply the computations we covered to make predictions on left out data. Hopefully this tutorial provided a bit of clarity into what happens when you’re pressing ctrl+enter on a line of code containing the “lm” function.

How to create a simple statistical learning model from scratch: emphasizing prediction in… was originally published in Towards Data Science on Medium, where people are continuing the conversation by highlighting and responding to this story.
