Tuesday, May 5, 2020

Statistical Inference & Regression Analysis Free Sample

Question:

1. Use the same data that was used in Assignment 1, which is the SWEETS-4-U2 data for the previous year. It consists of 52 weekly values of the sales and costs for the two popular product lines, namely Forgive and Rejoice. These two products are both wrapped chocolates sold by weight. The only difference between Forgive and Rejoice is that different messages are attached to each type of chocolate. The Forgive chocolates have messages like Sorry, Forgive Me and Trust Me and the Rejoice chocolates have messages like Celebrate, Have Fun and I Love You. The 52 Sales and Cost values for both types of chocolate are given in SWEETS-4-U2.xls.

The Accountant for SWEETS-4-U2 is worried that the business is spending too much on advertising for Rejoice as sales now exceed what the firm has budgeted for. In the business plan the firm had assumed that the average weekly sales for Rejoice was $450. If sales are more than $450 then the firm will be able to reduce its advertising spending. (You can use Excel or your calculator for any calculations.)

[a] Using Excel and the weekly sales data find the mean and standard deviation of Rejoice Sales.
[b] Find the 90% interval estimate of the average weekly sales for Rejoice.
[c] Using a level of significance of α = 0.05 test whether the average of the weekly sales for Rejoice are more than 450.
[d] Briefly explain what a Type I error and what a Type II error are and what are their costs or consequences in this problem.
[e] Find the p value in this situation. Explain what the p value is and how we would use the p value to test the hypotheses.
[f] Using the p value test the hypotheses when the level of significance is α = 0.10
[g] Briefly explain how and when we can use an interval estimate when testing hypotheses.

2. Suppose the Sales manager of the Sweets-4-U2 chain of confectionary stores is interested in the relationship between Sales and Total costs for the Rejoice range of chocolates, i.e. how total cost is affected by sales, for the Rejoice product line. The 52 Sales and Cost values for both types of chocolate are given in SWEETS-4-U2.xls. (Unless otherwise stated use a level of significance of α = 0.05.) (You can use Excel or your calculator for any calculations.)

[a] Obtain the scatter diagram, the covariance and the correlation coefficient for Total Costs and Sales for the Rejoice chocolates. Briefly explain what this graph and these values are telling us about the relationship between Total Costs and Sales
[b] Write down the two forms of the Population Regression function you would assume here. Briefly explain how we interpret the conditional mean E(Y | X) and the error term (e).
[c] Estimate the sample regression function. Write down your estimated model and briefly explain what the estimated intercept and estimated slope are telling us about the relationship between the Total Costs and Sales for Rejoice chocolates.
[d] Using the F statistic, the R-squared value and the p-value for the estimated slope briefly discuss whether this estimated model does or does not show that there is a significant relationship between Total Costs and Sales for Rejoice chocolates. (With a sample of n = 52 you can assume that the critical values for the t statistic are the same as the critical values for a z statistic.)
[e] Test the following hypotheses concerning the slope H0 : b1 = 0.8 and H1 : b1 ≠ 0.8
[f] Using your estimated model forecast the Total Costs when Sales are 200. Comment briefly on how useful this forecast will be. Briefly explain what we mean by the terms Prediction Interval and Confidence Interval
[g] Using the F statistic, the R-squared value and the scatter diagram which shows the Residuals on the vertical axis and the values of Sales (our X variable) on the horizontal axis briefly discuss whether our estimated model can be seen as a reliable estimate of the relationship between Total Costs and Sales for Rejoice chocolates.

Answer:

a) Using Excel and the weekly sales data find the mean and standard deviation of Rejoice Sales.

The mean and standard deviation obtained using Excel are given below:

b) Find the 90% interval estimate of the average weekly sales for Rejoice.

The 90% confidence interval estimate of the average weekly sales for Rejoice is given in the following table:

c) Using a level of significance of α = 0.05 test whether the average of the weekly sales for Rejoice are more than 450.

Here we use the one-sample t test for the population average of the weekly sales for Rejoice. The test, with its calculations, is given below:

Null hypothesis: The population average of weekly sales is $450 (H0 : μ = 450).
Alternative hypothesis: The population average of weekly sales is more than $450 (H1 : μ > 450).

One sample t test for Rejoice sales

The p-value of this test, reported in part (e), is 0.0833, which is greater than α = 0.05. Therefore we do not reject the null hypothesis that the population average of weekly sales is $450.

d) Briefly explain what a Type I error and what a Type II error are and what are their costs or consequences in this problem.

A Type I error is rejecting the null hypothesis when it is true, and a Type II error is not rejecting the null hypothesis when it is false. The probabilities of making these errors are denoted α and β respectively.

For this problem, a Type I error means concluding that the average of the weekly sales for Rejoice is more than $450 when it is actually $450. The firm would then reduce its advertising spending even though sales do not justify the cut. A Type II error means not rejecting the null hypothesis that the average of the weekly sales for Rejoice is $450 when the average is actually more than $450. The firm would then keep spending more on advertising than it needs to.

e) Find the p value in this situation. Explain what the p value is and how we would use the p value to test the hypotheses.

Here the p-value is 0.0833. The p-value is the probability, calculated assuming the null hypothesis is true, of obtaining a test statistic at least as extreme as the one observed in the sample.

Decision rule: We reject the null hypothesis when the p-value is less than the level of significance α, and we do not reject the null hypothesis when the p-value is greater than α.

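The calculations in parts (a) to (e) can be sketched in Python as follows. This is only an illustration under stated assumptions: the `sales` list is hypothetical (the 52 weekly values from SWEETS-4-U2.xls are not reproduced here), the function names are invented for this sketch, and the normal distribution is used in place of the t distribution, which is a reasonable approximation only for a sample as large as n = 52.

```python
import math
from statistics import NormalDist, mean, stdev


def one_sample_summary(values, mu0, conf_level=0.90):
    """Mean, SD, confidence interval and upper-tail test of H0: mu = mu0."""
    n = len(values)
    xbar = mean(values)
    s = stdev(values)                  # sample SD (n - 1 divisor)
    se = s / math.sqrt(n)              # standard error of the mean
    # Two-sided critical value; normal approximation to the t distribution.
    z_crit = NormalDist().inv_cdf(1 - (1 - conf_level) / 2)
    t_stat = (xbar - mu0) / se
    # Upper-tail p-value because H1 is "mean is more than mu0".
    p_value = 1 - NormalDist().cdf(t_stat)
    return {
        "mean": xbar,
        "sd": s,
        "interval": (xbar - z_crit * se, xbar + z_crit * se),
        "t_stat": t_stat,
        "p_value": p_value,
    }


def reject_null(p_value, alpha):
    """Decision rule: reject H0 when the p-value is below alpha."""
    return p_value < alpha


# Hypothetical weekly sales, NOT the assignment data.
sales = [430, 470, 455, 490, 445, 460, 480, 452]
result = one_sample_summary(sales, mu0=450)
print(result)

# The p-value quoted in the answer, tested at both significance levels.
print(reject_null(0.0833, 0.05))   # False: do not reject at 5%
print(reject_null(0.0833, 0.10))   # True: reject at 10%
```

Replacing `sales` with the 52 Rejoice values should give figures close to the Excel output, up to the small difference between z and t critical values.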
Here we have α = 0.05 and p-value = 0.0833, that is, p-value > α. Therefore, we do not reject the null hypothesis that the average of the weekly sales for Rejoice is $450.

f) Using the p value test the hypotheses when the level of significance is α = 0.10

The t test for the average with α = 0.10 is given below:

t test for Rejoice (α = 0.10)

Here also the p-value = 0.0833, and we have α = 0.10, so p-value < α. Therefore, we reject the null hypothesis that the average of the weekly sales for Rejoice is $450.

g) Briefly explain how and when we can use an interval estimate when testing hypotheses.

An interval estimate can be used to test a hypothesis about the mean when its confidence level matches the level of significance of the test. For a two-sided test at level α we use the (1 - α) confidence interval: if the hypothesised value ($450 here) lies between the lower limit and the upper limit of the interval, we do not reject the null hypothesis that the average of the weekly sales for Rejoice is $450, and if the hypothesised value lies outside the interval we reject it. For a one-sided test such as the one in part (c), only one limit is relevant: testing at α = 0.05 whether the mean is more than $450 is equivalent to checking whether $450 lies below the lower limit of the 90% interval from part (b).

For Question 2, part [a], the scatter plot of total costs against sales for the Rejoice chocolates is given below:

In this scatter plot, y represents the total costs and x represents the sales for the Rejoice chocolates.

The covariance of total cost and sales is given below:

The correlation coefficient of total cost and sales is given below:

h) Write down the two forms of the Population Regression function you would assume here. Briefly explain how we interpret the conditional mean E(Y | X) and the error term (e).

The two forms of the population regression function are:

E(Y | X) = a + b*X
Y = a + b*X + e

where Y is the dependent variable (total cost), X is the independent variable (sales), a is the y-intercept, b is the slope of the regression line and e is the error term. The first form describes the average value of Y for a given X and the second describes an individual value of Y.

The conditional mean E(Y | X) is interpreted as the average total cost over all weeks that have a given value of sales X. It is the systematic part of the model.

The error term e is the difference between the actual value of Y and its conditional mean E(Y | X). It captures the effect on total cost of all factors other than sales.

i) Estimate the sample regression function. Write down your estimated model and briefly explain what the estimated intercept and estimated slope are telling us about the relationship between the Total Costs and Sales for Rejoice chocolates.

Here we have to estimate the sample regression function. For this regression model, we take the dependent variable y as the total cost and the independent variable x as the sales of the Rejoice chocolates. The regression analysis using Excel is given below:

Here the correlation coefficient is 0.9723, which indicates a strong positive linear relationship between the dependent variable total cost and the independent variable sales of the Rejoice chocolates.

The ANOVA table for this regression model is given below:

The coefficients for the regression equation are given in the following table:

The regression equation is given below:

Y = 51.7669 + 1.0014*X
Total cost = 51.7669 + 1.0014*sales

The estimated intercept, 51.7669, is the estimated total cost when sales are zero, which can be read as the fixed cost. The estimated slope, 1.0014, means that each additional $1 of sales is associated with an increase of about $1.0014 in total cost.

j) Using the F statistic, the R-squared value and the p-value for the estimated slope briefly discuss whether this estimated model does or does not show that there is a significant relationship between Total Costs and Sales for Rejoice chocolates. (With a sample of n = 52 you can assume that the critical values for the t statistic are the same as the critical values for a z statistic.)

Here the F statistic is 865.47 and its p-value is 0.00 (to two decimal places). The coefficient of determination, R squared, is 0.9454, which means that about 94.54% of the variation in the dependent variable total cost is explained by the independent variable sales of Rejoice chocolates in this regression model.

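The regression quantities quoted above (covariance, correlation, slope, intercept, R squared and F) can be computed from raw data with a few sums. The sketch below is illustrative only: the `sales` and `cost` lists are hypothetical stand-ins for the spreadsheet columns, and `fit_line` and `f_from_r2` are names invented here, not Excel functions.

```python
import math


def fit_line(x, y):
    """Least-squares fit of y = intercept + slope * x with summary statistics."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))

    return {
        "covariance": sxy / (n - 1),           # sample covariance
        "correlation": sxy / math.sqrt(sxx * syy),
        "slope": slope,
        "intercept": intercept,
        "r_squared": 1 - sse / syy,
        "f_stat": (syy - sse) / (sse / (n - 2)),
        "se_slope": math.sqrt(sse / (n - 2)) / math.sqrt(sxx),
    }


def f_from_r2(r_squared, n):
    """F statistic of a simple regression written in terms of R squared."""
    return r_squared / (1 - r_squared) * (n - 2)


# Hypothetical data, NOT the assignment values.
sales = [400, 420, 450, 470, 500, 520]
cost = [455, 470, 505, 520, 555, 570]
fit = fit_line(sales, cost)
print(fit)

# Consistency checks on the figures quoted in the answer.
print(round(0.9723 ** 2, 4))      # 0.9454, the reported R squared
print(f_from_r2(0.9454, 52))      # close to the reported F of 865.47
```

With a single explanatory variable, R squared equals the squared correlation coefficient and F equals the squared t statistic of the slope, so the reported values can be cross-checked against one another.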
For this regression model, the p-value is 0.00 (to two decimal places), which is less than the level of significance α = 0.05. Therefore we reject the null hypothesis that there is no linear relationship (slope equal to zero) and conclude that the linear relationship between the dependent variable total cost and the independent variable sales of Rejoice chocolates is significant.

k) Test the following hypotheses concerning the slope H0 : b1 = 0.8 and H1 : b1 ≠ 0.8

The complete test procedure is given below:

Null hypothesis: H0 : b1 = 0.8
Alternative hypothesis: H1 : b1 ≠ 0.8
Level of significance = α = 0.05
Degrees of freedom = n - 2 = 52 - 2 = 50
Critical value (two-tailed) = 2.008559072

The test statistic compares the estimated slope with the hypothesised value:

t = (b1 - 0.8) / SE

where b1 is the estimated slope and SE is its standard error, given by:

SE = s_b1 = sqrt[ Σ(yi - ŷi)² / (n - 2) ] / sqrt[ Σ(xi - x̄)² ]

The standard error is given as SE = 0.1373.

Test statistic = t = (1.0014 - 0.8) / 0.1373 = 1.4669

Decision rule: Reject the null hypothesis when the absolute value of the test statistic is greater than the critical value.

Here |t| = 1.4669 is less than the critical value 2.0086, so with this standard error we do not reject the null hypothesis that the slope is 0.8. Note that in a simple regression F = t² for the slope, so F = 865.47 implies a slope standard error of about 1.0014 / sqrt(865.47) ≈ 0.0340. With that value t ≈ 5.92 and the null hypothesis would be rejected, so the standard error should be confirmed against the coefficients table.

l) Using your estimated model forecast the Total Costs when Sales are 200. Comment briefly on how useful this forecast will be. Briefly explain what we mean by the terms Prediction Interval and Confidence Interval

Here we are given sales = 200. To forecast the total costs we use the regression model estimated above:

Y = a + b*X
Total cost = a + b*Sales
Y = 51.7669 + 1.0014*X
Total cost = 51.7669 + 1.0014*sales

Now substitute sales = 200 into the regression equation to forecast total costs.

Total cost = 51.7669 + 1.0014*200 = 252.0469
Total cost = $252.0469

This forecast is useful only if sales of 200 lie within the range of sales values observed in the sample. If 200 is outside that range, the forecast is an extrapolation and should be treated with caution.

A confidence interval is an interval estimate for the mean value of Y at a given X, that is, for E(Y | X = 200). A prediction interval is an interval estimate for an individual value of Y at that X. The prediction interval is always wider than the confidence interval at the same confidence level because it also includes the variability of the error term.

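The forecast and the difference between the two kinds of interval can be sketched as follows. The intercept and slope are the estimates quoted above. Everything passed to `intervals` (the mean of sales, the sum of squared deviations of sales and the residual standard error) is a hypothetical placeholder, because those quantities are not reported in the answer, and a z critical value is used in place of t as the question permits for n = 52.

```python
import math
from statistics import NormalDist

INTERCEPT = 51.7669   # estimated intercept from the answer
SLOPE = 1.0014        # estimated slope from the answer


def forecast_cost(sales):
    """Point forecast of total cost from the estimated regression line."""
    return INTERCEPT + SLOPE * sales


def intervals(x0, n, x_bar, sxx, s_resid, conf_level=0.95):
    """Confidence interval for E(Y | X = x0) and prediction interval for Y."""
    y_hat = forecast_cost(x0)
    z = NormalDist().inv_cdf(1 - (1 - conf_level) / 2)
    # Shared term: grows as x0 moves away from the sample mean of X.
    leverage = 1 / n + (x0 - x_bar) ** 2 / sxx
    half_ci = z * s_resid * math.sqrt(leverage)        # mean of Y
    half_pi = z * s_resid * math.sqrt(1 + leverage)    # individual Y
    return {
        "forecast": y_hat,
        "confidence": (y_hat - half_ci, y_hat + half_ci),
        "prediction": (y_hat - half_pi, y_hat + half_pi),
    }


print(round(forecast_cost(200), 4))   # 252.0469

# Hypothetical summary statistics, NOT taken from the assignment data.
result = intervals(200, n=52, x_bar=460.0, sxx=250000.0, s_resid=20.0)
print(result)
```

The extra 1 under the square root is what makes the prediction interval wider, and the `(x0 - x_bar) ** 2` term shows why both intervals widen when the forecast point is far from the average level of sales.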
m) Using the F statistic, the R-squared value and the scatter diagram which shows the Residuals on the vertical axis and the values of Sales (our X variable) on the horizontal axis briefly discuss whether our estimated model can be seen as a reliable estimate of the relationship between Total Costs and Sales for Rejoice chocolates.

For this regression model, the F statistic is 865.4676, which is very large. The coefficient of determination, R squared, is 0.9454, which means that about 94.54% of the variation in the dependent variable total cost is explained by the independent variable sales of Rejoice chocolates. The scatter plot clearly shows that a linear relationship exists between total cost and sales of Rejoice chocolates. The model can be regarded as a reliable estimate if, in addition, the residuals are scattered randomly around zero with no visible pattern and with roughly constant spread across the values of sales. For this regression model, we reject the null hypothesis that there is no linear relationship and conclude that the linear relationship between total cost and sales of Rejoice chocolates is significant.
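A residual check of the kind described in part (m) can be sketched numerically. The data below are hypothetical, the helper names are invented for this illustration, and comparing the residual spread in the lower and upper halves of sales is only a crude substitute for inspecting the residual plot itself.

```python
from statistics import mean, pstdev


def residuals(x, y, intercept, slope):
    """Actual value minus fitted value for each observation."""
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]


def spread_by_half(x, resid):
    """Residual spread for the low-X half and the high-X half of the data."""
    pairs = sorted(zip(x, resid))          # order observations by X
    mid = len(pairs) // 2
    low = [r for _, r in pairs[:mid]]
    high = [r for _, r in pairs[mid:]]
    return pstdev(low), pstdev(high)


# Hypothetical data, NOT the assignment values; the line is the one
# estimated in the answer.
sales = [400, 420, 450, 470, 500, 520]
cost = [455, 470, 505, 520, 555, 570]
resid = residuals(sales, cost, intercept=51.7669, slope=1.0014)

print([round(r, 4) for r in resid])
print(round(mean(resid), 4))               # should be near zero for a good fit
print(spread_by_half(sales, resid))        # similar values suggest constant spread
```

If the two spreads differ greatly, or the residuals change sign systematically as sales increase, the straight-line model is less reliable than the F statistic and R squared alone suggest.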
