| THE OHIO STATE UNIVERSITY | Winter Quarter 2008 |
| City and Regional Planning | Professor Philip A. Viton |
City and Regional Planning 870.03 — Forecasting and Simulation in Planning
_________________________________________________________________________________________
| Credits: | 5 |
| Sequence No.: | 05069–4 |
| Meeting: | 259 Knowlton Hall |
| Time: | Tuesday 12:30–2:30 |
| Instructor’s Office: | 195 Knowlton Hall |
| Office Hours: | Tuesday/Thursday, 2:30–4:00 pm or by appointment |
| E-mail: | viton.1@osu.edu |
_________________________________________________________________________________________
This course surveys some statistical methods of data analysis which are used to provide empirical answers to the sorts of questions planners raise. The emphasis of the course will be two-fold: on the methods themselves, and on assessing and understanding selected examples applying those methods. Many of the applications will be to economic questions, often the analysis of costs, benefits, and demand.
Your course grade will be based entirely on a term paper. In this paper you must gather some data and perform an original analysis on them, with a view to answering some interesting question. It is important to be clear at the outset that the work you do must be original: a paper which just summarizes what someone else has done is not acceptable. Moreover, since this is the only requirement for the course, the paper must represent a significant effort on your part. In addition to doing the statistical analysis (“running the regression”) your paper must explicitly discuss the validity and applicability of the method you use.
The paper is due in my mailbox in Knowlton 200N by noon on the Monday of Exam Week.
WARNING: Data gathering often takes time. You should start to think as soon as possible about the topic you wish to work on, and how you might get data for it.
There is an interesting series of papers which in my view are a model of how to do, discuss and present empirical work: you may care to take a look at them.
C&RP 770, C&RP 771, C&RP 781. In addition, you should understand partial derivatives and have a basic understanding of matrix algebra. I’ll try to keep the mathematical content at a fairly low level (unless there’s a demand for more rigor) — our primary concern will be to understand the tools, the context which gave rise to them, and the empirical work itself.
None. Most of our work will be from journal articles. But it is a good idea to begin assembling a library of basic statistical references. Of those concerned with econometrics and its applications, I think J. Kmenta, Elements of Econometrics, Macmillan, New York, 1971 is still very good. So are G.S. Maddala, Econometrics, McGraw-Hill, New York, 1977, and William H. Greene, Econometric Analysis, Macmillan Publishing Co., New York, N.Y., 1990 (later editions are available).
At a higher level, standard texts are H. Theil, Principles of Econometrics, John Wiley and Sons, New York, 1971; Peter Schmidt, Econometrics, Marcel Dekker, New York, 1976 (basically just a book of proved theorems, very useful as a reference), and George G. Judge, W.E. Griffiths, R. Carter Hill, Helmut Lütkepohl, and Tsoung-Chao Lee, The Theory and Practice of Econometrics, John Wiley and Co., New York, 1985, or the shorter George G. Judge, R. Carter Hill, William E. Griffiths, Helmut Lütkepohl, and Tsoung-Chao Lee, Introduction to the Theory and Practice of Econometrics, John Wiley, New York, 1988.
A very good readable guide to “what it’s all about” is Peter Kennedy, A Guide to Econometrics, MIT Press, Cambridge, MA, 4th edn., 1998.
On the purely statistical side, Robert V. Hogg and Allen T. Craig, Introduction to Mathematical Statistics, Macmillan, London, 1970 is useful to have around; also James E. Huneycutt, Introduction to Probability, Merrill, Columbus OH, 1973, which makes accessible some of the measure-theoretic ideas behind probability.
Copies of the articles in the reading list (but not the books) will be available for checkout in the “Oblique File” section of the KSA Library, organized by numbered section.
I’ve set up a small website for the course at http://facweb.knowlton.ohio-state.edu/pviton/courses2/crp8703
There may be occasional notes posted there (these will also be announced in class). One feature of the website is that there will be HTML and PDF copies of the syllabus, with live links to those readings which may be found on-line.
Many programs, even spreadsheet programs like Excel, can handle the basic multivariate regression problem. But I strongly recommend that you avoid Excel, and instead teach yourself to use a more specialized statistics package. There are two reasons for this. First, for more complicated models (simultaneous equations, logit and probit analysis, for example) Excel cannot cope (or cannot cope well). Second, there is some doubt as to the accuracy and robustness of Excel’s statistical routines: see the McCullough references in Section 1 of the reading list, which conclude that “persons desiring to conduct statistical analysis of data are advised not to use Excel”.
There are many specialized packages available, for example, SAS or SPSS, and even the free R system (supposedly a clone of the commercial S+ system, which is the system of choice for professional statisticians), but I particularly recommend Bill Greene’s Limdep package, which both C&RP and Civil Engineering have site-licensed. It can handle all the statistical methods we will be discussing in the course, with the exception of the spatial regression models. Limdep may not currently be installed on the KH student lab computers, but the IT staff will install it on request. (If you experience any problems getting them to install it, let me know.) I have written an introduction to using Limdep, in the context of logit models: it is available in PDF format at:
There is a “student edition” of Limdep available for free on the KSA Faculty (aka Netstorage) drive, typically assigned as the X drive in
The main file is stat.exe. Just copy it to an empty folder on your own computer and run the file. You should accept all the defaults. You should also look at ealimdep.html in the same folder, which explains the restrictions on the student edition. Basically, except for the most recent and complicated models of discrete choice (the mixed-logit model) the student edition will suffice for almost any kind of statistical analysis, as long as your dataset isn’t huge.
Note that access to this is via the KSA network. If you are not in the KSA, come by my office with a zip disk (or writeable CD/jump drive) with at least 4MB free (11MB if you also want the Reference Guide in PDF form).
Limdep’s weakness is in the manipulation of large datasets; but it is possible to use another program for this, and then write out the results in a form that Limdep can understand. In general, however, the choice of a statistical package should be based on its ability to estimate the particular models you are interested in, with a minimal amount of programming on your part. (Almost all packages allow you to program your own estimation routines; but this is often tedious, and a waste of time if a “canned” routine already exists somewhere else.)
Like SAS and SPSS, Limdep has no facilities for spatial data analysis (though with a bit of effort you could probably program your own). For the spatial regression models, your choices (this will be discussed in more detail when the time comes) are R, Luc Anselin’s free GeoDa program, or, if you have access to Matlab, a set of free programs by James LeSage which run under that system. GeoDa has limited data import facilities (it assumes that the independent variables are contained in the same dataset as the geographical coordinates data) and will not estimate, e.g., panel data models, which the LeSage programs will; so I tend to prefer the LeSage programs. Also, LeSage’s programs include a set of non-spatial econometrics routines, so if you’re going to be doing both spatial and non-spatial estimation, using them may save you a certain amount of additional learning. On the other hand, Matlab is primarily a matrix programming language and not a data-processing language, so it doesn’t include facilities for working with variables by name: you must assemble them into an array, and then refer to numeric columns of that array.
Course Outline
Begin by reviewing Kmenta chs. 7,8,10 or your favorite elementary econometrics book. This is a good time to read through Kennedy, chs. 1–3. The final two references discuss computer packages:
We begin with some straightforward applications, stressing theoretical foundation and choice of functional form.
Extensions of the basic econometric methods.
Here we study cost functions with an additional consideration: we know from theory that these are extremal (maximizing or minimizing) functions. The question is, how can we incorporate that insight into our statistical specification?
The basic theory is covered in Domencich–McFadden, chapters 3–5; and in Maddala, chs. 3–6. The McFadden papers show how one can apply the approach to the study of government decisions; and Small–Winston is an important application of the tobit model.
The nested logit model extends the standard discrete-choice model to cope with problems where the Independence of Irrelevant Alternatives property fails. Maddala, chapters 3.3–3.12 is a simple exposition of the basic model; the McFadden readings are rather more difficult. The result is a model in which one can incorporate different assumed patterns of perceived similarity among options. Morrison–Winston and Feldman contain applications.
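As a rough illustration of the Independence of Irrelevant Alternatives property mentioned above, here is a minimal Python sketch (not taken from the readings). The alternatives and utility values are hypothetical, and the function name logit_probs is made up for this example. It shows that in the standard multinomial logit model the ratio of two choice probabilities does not change when a third, very similar alternative is added, which is the behavior the nested logit model is designed to relax.

```python
import math

def logit_probs(utilities):
    """Multinomial logit choice probabilities from systematic utilities.

    `utilities` maps each alternative's name to its systematic utility.
    """
    # Subtract the maximum utility before exponentiating, for numerical stability.
    m = max(utilities.values())
    expu = {name: math.exp(v - m) for name, v in utilities.items()}
    total = sum(expu.values())
    return {name: e / total for name, e in expu.items()}

# Hypothetical utilities, chosen only for illustration.
two = logit_probs({"car": 1.0, "red_bus": 0.5})
three = logit_probs({"car": 1.0, "red_bus": 0.5, "blue_bus": 0.5})

# Under IIA, the car / red_bus odds are the same in both choice sets.
ratio_two = two["car"] / two["red_bus"]
ratio_three = three["car"] / three["red_bus"]
print(ratio_two, ratio_three)
```

In this sketch the added bus draws share proportionally from both existing alternatives, even though intuition suggests it should mostly draw from the other bus.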
There are some significant recent developments in this area, centered around the Mixed Multinomial Logit model (MMNL), and more generally around the idea of estimation by simulation. In particular, this appears to make estimation of the multinomial probit model (in cases of more than 4 or so choices) feasible for the first time. The theoretical underpinnings are rather more sophisticated than we have hitherto encountered; moreover, software is only now beginning to be available. Two important readings are: McFadden and Train (mostly theory, with some supporting applications); and Revelt and Train (application).
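To give a feel for what "estimation by simulation" involves, the following Python sketch approximates a mixed logit choice probability by averaging ordinary logit probabilities over random draws of a coefficient. It is only a sketch under stated assumptions, not the method of any particular reading: there is a single attribute per alternative, its coefficient is assumed to be normally distributed, and all names and numbers are hypothetical.

```python
import math
import random

def logit_prob(beta, x, chosen):
    """Standard logit probability of alternative `chosen`, given coefficient beta."""
    v = [beta * xi for xi in x]          # systematic utilities
    m = max(v)                           # subtract the max for numerical stability
    e = [math.exp(vi - m) for vi in v]
    return e[chosen] / sum(e)

def simulated_mixed_logit_prob(x, chosen, mean, sd, n_draws=1000, seed=0):
    """Approximate a mixed logit probability by simulation.

    Assumes (for illustration only) that the coefficient is distributed
    Normal(mean, sd); the logit probability is averaged over random draws.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_draws):
        beta = rng.gauss(mean, sd)       # one draw of the random coefficient
        total += logit_prob(beta, x, chosen)
    return total / n_draws

# Hypothetical travel costs for three alternatives.
costs = [2.0, 3.0, 5.0]
p0 = simulated_mixed_logit_prob(costs, chosen=0, mean=-1.0, sd=0.5)
print(p0)
```

In an actual estimation, a probability of this kind would be computed for each observation and the resulting simulated log-likelihood maximized over the distribution parameters (here, mean and sd).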
Another extension: this time to situations where elements of the choice set can be either discrete — as previously — or continuous.