DRA/K V
Decision and Risk Analysis
Business Forecasting and Regression Analysis
Kiriakos Vlahos
Spring 2000

Session overview
• Why do we need forecasting?
• Overview of forecasting techniques
• The components of time series
  – Trend
  – Seasonality
  – Cycles
  – Randomness
• Trend curves
• Causal forecasting and regression analysis
• Judgemental forecasting
• Scenario planning

All forecasts are wrong
Those who claim to forecast the future are all lying even if, by chance, they are later proved right

Forecasting is ...
Forecasting is like trying to drive a car blindfolded and following directions given by a person who is looking out of the back window

Forecasting in business
Forecasting in business is like sex in society, we have to have it, we cannot get along without it, everyone is doing it, one way or another, but nobody is sure he is doing it the best way.
G W Plossl, Last Frontiers for Profits

Forecasting in organisations
• Marketing
  – Sales, prices, social and economic trends
• Production
  – Demand, costs, employment and machinery requirements
• Finance
  – Costs, sales, capital expenditure, economic climate
• R&D
  – Technological developments, new products
• Top management
  – Total sales, costs, pricing, economic trends, competitors' positioning

Formal vs. informal forecasting
• Forecasting is a very common activity
• The majority of forecasting is informal
• Why do we need formal forecasting?
  – Coping with complexity
  – Coping with growth
  – Coping with change
  – Need for auditability and justification
Formal forecasting provides a vehicle for communication about the forecast and a basis for systematic improvement.

Characteristics of forecasting problems
• Time horizon
  – short-term
  – long-term
• Data patterns
  – Seasonality
  – Trend
  – Cycles
  – Randomness
• Cost
• Complexity
• Accuracy

Data patterns - Trend
Medium to long term movements, upward or downward.
[Figure: example series plotted over 1978 to 1987]

Data patterns - Cycles
Long-term irregular movements, e.g. Government debt since the American revolution

Data patterns - Seasonality
Regular periodic oscillations. They can be monthly, quarterly, etc.
Additive or multiplicative
[Figure: example series of turnover (£m), Jan-84 to Jan-88]

Data patterns - Random oscillations
Unsystematic oscillations around a constant mean. No trend, cycle or seasonality.
[Figure: example series fluctuating between about 85 and 115]

Classification of forecasting methods
Forecasting methods
• Quantitative
  – Time series: trend curves, seasonal decomposition, exponential smoothing
  – Causal forecasting: regression, econometric forecasting
  – Tracking signals
• Judgemental
  – Individual
  – Group: committees, Delphi
  – Scenario planning
  – Market surveys
  – Role playing

Regression overview
• Why understanding relationships is important
• Visual tools for analysing relationships
• Correlation
  – Interpretation
  – Pitfalls
• Regression
  – Building models
  – Interpreting and evaluating models
  – Assessing model validity
  – Data transformations
  – Use of dummy variables

Why analysing relationships is important
• Development of theory in the social sciences and empirical testing
• Finance e.g.
  – How are stock prices affected by market movements?
  – What is the impact of mergers on stockholder value?
• Marketing e.g.
  – How effective are different types of advertising?
  – Do promotions simply shift sales without affecting overall volume?
• Economics e.g.
  – How do interest rates affect consumer behaviour?
  – How do exchange rates influence imports and exports?

Sales vs. advertising
[Figure: scatter plot of sales (units) against advertising (£000)]

Estimating betas
The slope of this line is called the beta of the stock and is an estimate of its market risk.

Scatter plots
• What are they?
  A graphical tool for examining the relationship between variables
• What are they good for?
  For determining
  – whether variables are related
  – the direction of the relationship
  – the type of relationship
  – the strength of the relationship

Correlation
• What is it?
  A measure of the strength of linear relationships between variables
• How to calculate?
  a) Calculate the standard deviations $s_x$, $s_y$
  b) Calculate the correlation using the formula
     $r_{xy} = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{(N - 1)\, s_x s_y}$
• Possible values
  From -1 to 1

Interpreting the correlation

Correlation pitfalls
• Correlation measures only linear relationships
• Existence of a relationship does not imply causality
• Even if there exists a causal relationship, the direction may not be obvious

Correlation and causality
Many nations see improving communications as vital to boost overall economy. A 1% increment in telephone density yields an increment of about 0.1% in per-capita GNP, according to a 1983 OECD-ITU study.
AT&T advertisement in Fortune, Dec 97

Ferric Processing
What are the factors influencing production costs?
• Plant age
• Capacity
• Plant location
• Other plant features
Predicting production cost is important for the negotiation of 5-year contracts with steel companies

Visual inspection
a) Construct a scatter plot
   [Figure: scatter plot of cost/ton ($) against capacity (000 tons/month)]
b) Calculate the correlation (Excel function CORREL)
   The correlation between cost and capacity is -0.84
c) Candidate model
   Cost = a + b Capacity

Simple Linear Regression
Simple regression estimates a linear equation, which corresponds to a straight line that passes through the data.
[Figure: cost/ton ($) against capacity (000 tons/month) with the fitted line]
Regression model: Cost = 25.2 - 4.4 Capacity
• Cost: dependent variable
• 25.2: constant or intercept
• -4.4: coefficient or slope
• Capacity: independent or explanatory variable

Least squares
[Figure: cost/ton ($) against capacity (000 tons/month) with the fitted line and residuals marked]
• Residuals are the vertical distances of the points from the regression line
• In least squares regression
  – The sum of squared residuals is minimised
  – The mean of residuals is zero
  – Residuals are assumed to be randomly distributed around the mean according to the normal distribution

Excel output
Regression Statistics
  Multiple R           0.84
  R Square             0.70
  Adjusted R Square    0.66   (observe adjusted R²)
  Standard Error       2.33   (s)
  Observations         10

ANOVA
              df   SS       MS       F       Significance F
  Regression  1    100.65   100.65   18.47   0.00
  Residual    8    43.59    5.45
  Total       9    144.23

              Coefficients   Standard Error   t Stat   P-value   Lower 95%   Upper 95%
  Intercept   25.19          1.86             13.55    0.00      20.91       29.48
  Capacity    -4.40          1.02             -4.30    0.00      -6.77       -2.04

Read the equation from the Coefficients column; the Standard Error column gives s_b; observe the t and p statistics.

The standard error s is simply the standard deviation of the residuals (a measure of variability).
R² is the most widely used measure of goodness of fit.
$R^2 = 1 - \frac{s^2}{s_y^2} = 1 - \frac{\text{residual variance}}{\text{dependent variable variance}}$
It can be interpreted as the proportion of the variance of the dependent variable explained by the model. Use the adjusted R², which accounts for the number
of observations and explanatory variables.

Hypothesis testing
Does a relationship between capacity and cost really exist? If we draw a different sample, would we still see the same relationship?
Or in stats jargon: is the slope significantly different from zero?
b = 0 implies no relationship between x and y
[Figure: scatter of y against x with a flat line, b = 0]
Hypothesis testing: test whether b = 0

t-values and p-values
[Figure: distribution of the slope estimate if b = 0; the estimate b lies t-value × s_b from zero and the p-value is the tail area beyond it]
s_b is the standard deviation of the slope estimate b
t-value = b / s_b
The p-value is the probability, if the true slope were zero, of getting a slope estimate at least as large in magnitude as b.
Equivalent tests (5% significance level):
• |t-value| > 2
• p-value < 0.05

Checking residuals
Residuals should be random. Any systematic pattern indicates that our model is incomplete.
Problematic patterns:
• Heteroscedasticity
• Autocorrelated residuals

Ferric - Residuals
[Figure: line fit plot of actual and predicted cost/ton against capacity]
[Figure: residual plot of residuals against capacity]
Are residuals random? Can you see any pattern?

Combining theory and judgement
The relationship appears to be non-linear. We can fit non-linear relationships by introducing suitable transformations, e.g.
$y = a e^{bx} \Rightarrow \ln(y) = \ln(a) + b x$
[Figure: y against x (curved) and ln(y) against x (straight line)]
What transformation is appropriate for the Ferric data? Use judgement, e.g.
Total Cost (TC) = Fixed Cost (FC) + Variable Cost
TC = FC + Unit Cost (UC) * Quantity (Q)
TC/Q = FC/Q + UC, i.e.
Average Cost = b/Q + a
This suggests that average cost varies inversely with capacity, i.e. it is linear in 1/Capacity.

Transforming the data
Regression Statistics
  Multiple R           0.97
  R Square             0.95
  Adjusted R Square    0.94
  Standard Error       0.98
  Observations         10

              Coefficients   Standard Error   t Stat   P-value   Lower 95%   Upper 95%
  Intercept   11.75          0.60             19.53    0.00      10.36       13.13
  1/Capacity  7.93           0.67             11.88    0.00      6.39        9.46

[Figure: line fit plot of actual and predicted cost/ton against 1/Capacity]
[Figure: cost/ton ($) against capacity (000 tons/month) with the fitted curve]

Model comparison
• High adjusted R²
• All coefficients significant
  – t-values or p-values
• Low standard error
• No pattern in residuals
• Is model supported by theory?
• Does the model make sense?

  Criteria                       First model   Transformed model
  High adjusted R²               66%           94%
  All coefficients significant   Yes           Yes
  Low residual st. dev. (s)      2.33          0.98
  No pattern in residuals        No            Yes
  Equation makes sense           Yes (?)       Yes

The transformed model is better: Cost = 11.75 + 7.93 * (1/Capacity)

Forecasting & confidence intervals
• If capacity is 2 what is the forecast for cost?
  – Cost = 11.75 + 7.93 (1/2) = 15.71
• Approximate 95% confidence interval: 15.71 ± 2 * s, where s = 0.98 is the standard error
• The greater the number of observations the better the approximation
• More accurate intervals can be calculated using statistical packages

Confidence intervals
[Figure: plot of fitted model, COST against 1/CAPACITY, with two sets of bands]
Statgraphics gives two sets of intervals.
• Outer bands are prediction intervals for an individual plant
• Inner bands are confidence intervals for the average cost from all plants. They can be viewed as the confidence intervals for the regression line.

Is plant age important?
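
The forecast and approximate interval from the "Forecasting & confidence intervals" slide can be reproduced with a few lines of Python. This is only a sketch of the slide's rule of thumb (forecast ± 2 s), using the coefficients reported for the transformed model; the function names are illustrative and not part of the original material.

```python
# Coefficients of the transformed Ferric model, as reported in the output above.
INTERCEPT = 11.75
SLOPE = 7.93        # coefficient on 1/Capacity
STD_ERROR = 0.98    # standard error s of the regression


def forecast_cost(capacity):
    """Point forecast of cost/ton ($) for a capacity in 000 tons/month."""
    return INTERCEPT + SLOPE * (1.0 / capacity)


def approx_interval(capacity, multiplier=2.0):
    """Approximate 95% interval: forecast +/- 2 * s (the slide's rule of thumb)."""
    centre = forecast_cost(capacity)
    half_width = multiplier * STD_ERROR
    return centre - half_width, centre + half_width


point = forecast_cost(2.0)
low, high = approx_interval(2.0)
print(f"forecast = {point:.3f}, interval = ({low:.3f}, {high:.3f})")
```

For capacity 2 the point forecast agrees with the slide's 15.71 up to rounding, and the interval is about 1.96 either side of it. A statistical package would give a slightly wider, more accurate interval that also reflects the uncertainty in the estimated coefficients.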

Multiple regression
Cost = a + b (1/Capacity) + c Year + e

Correlation matrix
               Cost/ton    Year        1/Capacity
  Cost/ton     1
  Year         -0.74237    1
  1/Capacity   0.9728      -0.67071    1

Regression analysis
Regression Statistics
  Multiple R           0.98
  R Square             0.96
  Adjusted R Square    0.95
  Standard Error       0.90
  Observations         10

              Coefficients   Standard Error   t Stat   P-value   Lower 95%   Upper 95%
  Intercept   542.01         326.41           1.66     0.14      -229.83     1313.84
  Year        -0.27          0.16             -1.62    0.15      -0.66       0.12
  1/Capacity  7.03           0.82             8.58     0.00      5.09        8.97

Is this a good model?

Multicollinearity
Multicollinearity appears when explanatory variables are highly correlated.
Effects:
• Including Year adds little information, hence fit does not improve much
• Parameter estimates become unreliable
Remedial action:
• Remove one of the correlated variables
Moral:
• Check for correlations between explanatory variables
[Figure: scatter plot of cost/ton ($) against capacity (000 tons/month), points labelled by year (81 to 87)]

Other inappropriate models
• Influential observations and outliers
• Clustering of data

Dummy variables
Bond purchases and national income (W = 1 marks the war years)

  Year   B      Y      W
  1933   2.6    2.4    0
  1934   3.0    2.8    0
  1935   3.6    3.1    0
  1936   3.7    3.4    0
  1937   3.8    3.9    0
  1938   4.1    4.0    0
  1939   4.4    4.2    0
  1940   7.1    5.1    1
  1941   8.0    6.3    1
  1942   8.9    8.1    1
  1943   9.7    8.8    1
  1944   10.2   9.6    1
  1945   10.1   9.7    1
  1946   7.9    9.6    0
  1947   8.7    10.4   0
  1948   9.1    12.0   0
  1949   10.1   12.9   0

Regression equation: B = 1.29 + 0.68 Y + 2.3 W

Regression checklist
• Visually inspect the data (scatter plots)
• Calculate correlations
• Develop and fit sensible model(s)
• Assess and compare the model(s)
  – Significance of variables (t-values, p-values)
  – Adjusted R²
  – Standard error (s)
  – Residual plots: autocorrelation, heteroscedasticity, normality, outliers, influential observations
  – Does the model make sense?
• If you are satisfied use the model for
  – developing business insights
  – forecasting

Trend curves
• Also known as growth/decay curves
• Most common curves
  – Linear
  – Quadratic
  – Exponential
  – Logarithmic
  – S-curves

Fitting trend curves
Transform the original data so that a linear equation of the form y = a + bx arises. Then apply regression analysis.
Example:
$Y_t = a b^t \Rightarrow \log(Y_t) = \log(a) + \log(b)\, t$

Credit card turnover
[Figure: Visa turnover (£bn), 1978 to 1987, actual values and predicted exponential growth curve]
How would you use such a curve for forecasting?
What role does judgement play in trend projection?

Other trend curves (S curves)
• Simple modified exponential: $Y_t = c + a b^t$, $b < 0$
• Logarithmic parabola: $Y_t = a e^{bt + ct^2}$, $c < 0$
• Gompertz curve: $Y_t = a e^{b e^{ct}}$, $b < 0,\ c < 0$
• Logistic curve: $Y_t = \frac{1}{1 + b e^{ct}}$, $c < 0$

Trend and seasonality
Quarterly data

  Time   Sales   q1   q2   q3   q4
  1      37.2    0    0    0    1
  2      15.7    1    0    0    0
  3      11      0    1    0    0
  4      26.6    0    0    1    0
  5      28.9    0    0    0    1
  6      12      1    0    0    0
  7      6.6     0    1    0    0
  8      20.9    0    0    1    0
  9      23.5    0    0    0    1

[Figure: sales ($m) against quarters, 0 to 40]
Regression with seasonal dummy variables
Sales = a + b Time + c q2 + d q3 + e q4
Include q1 in the model?

Multiple regression with seasonal dummies
Regression Statistics
  Multiple R           0.95
  R Square             0.90
  Adjusted R Square    0.88
  Standard Error       2.96
  Observations         36

              Coefficients   Standard Error   t Stat   P-value   Lower 95%   Upper 95%
  Intercept   14.78          1.31             11.30    0.00      12.11       17.44
  q2          -3.75          1.40             -2.69    0.01      -6.60       -0.91
  q3          8.53           1.40             6.10     0.00      5.68        11.38
  q4          15.66          1.40             11.23    0.00      12.82       18.51
  Time        -0.25          0.05             -5.17    0.00      -0.34       -0.15

Equation: ?
Interpretation: ?
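
One way to read the output above is as the equation Sales = 14.78 - 0.25 Time - 3.75 q2 + 8.53 q3 + 15.66 q4, with q1 as the baseline quarter (q1 is left out because the four dummies together would duplicate the intercept). The sketch below simply evaluates that equation for the first four rows of the data table; the function name and dictionary layout are illustrative assumptions, and the coefficients are the rounded values from the output.

```python
# Coefficients as reported in the regression output above (rounded to 2 d.p.).
COEF = {"intercept": 14.78, "time": -0.25, "q2": -3.75, "q3": 8.53, "q4": 15.66}


def fitted_sales(time, quarter):
    """Fitted sales ($m) for a period index and its quarter (1-4); q1 is the baseline."""
    if quarter not in (1, 2, 3, 4):
        raise ValueError("quarter must be 1, 2, 3 or 4")
    # The omitted dummy (q1) contributes nothing beyond the intercept.
    seasonal = 0.0 if quarter == 1 else COEF[f"q{quarter}"]
    return COEF["intercept"] + COEF["time"] * time + seasonal


# First four periods of the data table: (time, quarter, actual sales).
rows = [(1, 4, 37.2), (2, 1, 15.7), (3, 2, 11.0), (4, 3, 26.6)]
for time, quarter, actual in rows:
    print(time, quarter, actual, round(fitted_sales(time, quarter), 2))
```

Each seasonal coefficient is the estimated difference from a q1 period at the same point on the trend: for example, q4 sales are about 15.66 ($m) above the q1 level, while the Time coefficient says the underlying level falls by about 0.25 per quarter.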
[Figure: Time line fit plot, actual and predicted sales against time (0 to 40)]

Econometric modelling
Regression analysis
  Sales = f(GNP, price, advertising)
Econometric modelling
  Sales = f(GNP, price, advertising)
  advertising = f(sales_{t-1})
  production cost = f(sales, labour cost, materials cost)
  price = f(production cost, price of substitutes)
• Exogenous and endogenous variables
• Simultaneous parameter estimation

The CEF model
• The CEF model of the UK economy
  – Agents: individuals, banks, other financial institutions, government, overseas agents
  – Markets: market for goods and services, market for labour, market for capital goods
• Agents interact in each market influencing supply and demand, which in turn determine price and quantities.
  – 500 equations (!)

Judgemental forecasting
• Individual
  – Subjective probability assessment
• Group forecasting
  – Sales force method
  – Executive committees
  – Expert panels
  – Delphi method (feedback, reassessment)
• Problems in judgemental forecasting
  – Bias
  – Anchoring
  – Conservatism/Optimism
  – Overconfidence
• Combining forecasts

Forecasting change

Crude oil price forecasts
The dangers of straight line forecasts

Energy forecasting in West Germany
Energy consumption: forecasts vs. actual data
From Diefenbacher and Johnson, "The Politics of Energy Forecasting"
Persistence of mental models!

Airline industry forecasts

Forecasting & Planning
• Traditional view of forecasting
  – The past explains the future
  – Passive or adaptive attitude towards the future
• Modern view
  – Active and creative approaches to forecasting
  – Making things happen

Scenario planning
"It is impossible to forecast the future and it may be dangerous to do so"
Use of scenarios in planning
Develop a small number of internally consistent and credible views of how the world will look in the future, that present testing conditions for the business.
The future will of course be different from all of these views/scenarios, but if the company is prepared to cope with any of them, it will be able to cope with the real world.

Scenarios in Shell
Oil shock scenario:
• Scenario design: Shell analyse the impact of a $15/bbl price on cash flows and investment plans
• Strategic plan (early 1985): re-evaluation of up-stream plans and cash-flow position of the operating companies
• Event (early 1986): oil price falls from $28/bbl to $10/bbl
• Result

Advantages of Scenario Planning
• Challenge preconceived ideas and single point forecasts
• Explore a wide range of uncertainties
• Encourage an active and creative attitude to the future
• Provide a background for specific project evaluation
• Provide a vehicle for communication between the different parts of the organisation

Forecasting - Summary
• All forecasts are wrong!
• Never trust single point forecasts
• Data patterns
  – Trend, seasonality, cyclicality, randomness
• Time-series forecasting
  – Trend curves
• Causal forecasting
  – Regression
• Judgemental forecasting
• Scenario planning

Preparation for Regression workshop
• Read the note on Regression Analysis
• Work on the "Tutorial on Regression Analysis using Excel"
• Practice on creating descriptive statistics and histograms in Excel (ExcelStats.xls)
• Select your workshop partner
• In preparation for the exam work on regression exercises
