INFO 447 - Social and Economic Data

John M. Abowd (john.abowd@cornell.edu)

Office: 358 East Ives Hall (255-8024; assistant Brenda Hans 255-2744)

Office hours by appointment

Course URL: http://instruct1.cit.cornell.edu/courses/info447/

Computer Science Course Management System Sign-in (homework submission and grades)

Class meets MWF 10:10-11:00 in 202 Thurston. Most Friday classes will be devoted to labs that you do on your own.

Class Sessions

Labs

Exams: midterm; there will be no final exam.

Project submission instructions.

Overview

The organizing theme of INFO 447 for spring 2007 is a project centered on a major data source, which you may choose.

You will learn the basics required to acquire and transform raw information into social and economic data. Legal, statistical, computing, and social science aspects of the data “manufacturing” process will all be treated. In your professional lives, most if not all of you will actively be using data products--either publicly available or proprietary--in order to drive decisions by your organizations. In order to be a “power” user of social and economic data you need to understand the principles of data production and use. And just as importantly, the course will challenge you to think about ways to improve these data products.

The class enables students to learn, practice and execute the steps in data production—from raw confidential files on individual persons, households and business establishments; to merging individual level data files from different sources; to detailed summary tabulations; to public use data products. INFO 447 is appropriate for upper level undergraduate and professional masters students who will be:

• users of data products, from the public and private sectors; and/or
• producers of data products for their organizations, working with existing data products from public and proprietary sources, as well as administrative or survey data collected by their organization.

Objectives

• learn basic statistical principles of populations and sampling frames for data collection and processing;
• learn how data are acquired via complete counts, sample surveys, and administrative records;
• learn how statistical agencies deal with missing data and other quality issues;
• understand the law, economics and statistics of data privacy and confidentiality protection; and
• acquire working knowledge of data linking and integration techniques (probabilistic record linking; multivariate statistical matching).

Prerequisites

• introductory course in statistics
• course or equivalent experience in data analysis
• upper level social science course OR permission of instructor

Reading Assignments

All reading assignments are online at the course URL. You may wish to purchase the following as references:

- Missing Data by Paul Allison (Sage, 2001)
- Survey Methodology by Robert Groves et al. (Wiley, 2001)

Additional required readings are available on-line through links on the class web site. Many of the reading assignments are taken from source documents—journal articles, professional papers, and government documents. Working with source documents enables us to see how professionals are dealing with critical issues in the collection, processing, evaluation and distribution of social and economic data. The disadvantage is that the materials are not neatly summarized and synthesized as they would be in a textbook. This course is an advanced seminar in which we explore topics. Through the class discussions and labs we develop an integrated grasp of the materials.

Test Information

There are two preliminary examinations and both are take-home, open-book examinations of your knowledge of the readings, lectures, and lab assignments. There is no final examination.

Course Projects

The project draws on the data collection, quality, and dissemination issues explored during the semester to describe how a public or private policy issue of your choice is affected by the quality and type of data in your chosen focus source.

The project will require students to draw upon data processing, statistical analysis, and analytical writing skills.

Lab Assignments

The lab assignments provide students with hands-on experience working with a variety of social and economic data files produced by the U.S. Census Bureau, Bureau of Labor Statistics, and Department of Housing and Urban Development. These are self-graded in the weekly lab sessions. Successful completion of the lab assignments is necessary for the student’s mastery of the subject matter, as well as for completing the exams and class project.

Discussion/Participation

Students are required to prepare assigned readings and be ready not only to answer questions in lecture but also to lead discussion. Students will be assigned topics for which they are responsible for presenting a summary and leading discussion. On some occasions students will be called upon, at random from the class roster, and expected to draw upon the reading and lecture materials relating to the question being discussed. The student’s performance will be used to determine the Participation/Discussion portion of the semester grade. (See Attendance Policy regarding absences and late arrival.)

Submitting Assignments

The exams and project are to be submitted via the Computer Science Course Management System in the required format (MS-Word and/or Excel) and virus free. Files with viruses will not be accepted. It is your responsibility to verify that the files you submit are virus free. Assignments are due by midnight on the due date. Late assignments will be graded down 10 points (out of 100) for each day overdue. After 3 days, late assignments will not be accepted and a grade of zero will be assigned.
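As an illustration only, the late penalty can be worked through in a few lines of Python. This is a sketch under stated assumptions, not an official grading tool: the function name `late_adjusted_score` is hypothetical, lateness is counted in whole days, an assignment exactly 3 days late is still accepted, and scores are not allowed to fall below zero.

```python
def late_adjusted_score(raw_score: float, days_late: int) -> float:
    """Apply the late penalty to a score out of 100 (hypothetical helper).

    Assumptions not stated in the syllabus: days_late is a whole number
    of days, day 3 is still accepted, and the result is floored at zero.
    """
    if days_late < 0:
        raise ValueError("days_late must be non-negative")
    if days_late > 3:
        # More than 3 days overdue: not accepted, grade of zero.
        return 0.0
    # 10 points off (out of 100) for each day overdue.
    return max(0.0, raw_score - 10.0 * days_late)


if __name__ == "__main__":
    for days in range(5):
        print(days, late_adjusted_score(90, days))
```

Under these assumptions, a submission that would have earned 90 receives 70 if it is two days late and zero if it is four days late.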

Grading and Attendance Policies

• Examinations - 25%
• Discussion (In Class and Electronic) - 25%
• Course Project - 50%
• TOTAL - 100%
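
For readers who want to see how the weights combine, here is a minimal sketch of the arithmetic in Python. The names `WEIGHTS` and `semester_score` are hypothetical, and the sketch assumes each component is first summarized as a single numeric score out of 100; how component scores are averaged or converted to letter grades is not specified here.

```python
# Component weights taken from the list above; they sum to 1.0 (100%).
WEIGHTS = {
    "examinations": 0.25,
    "discussion": 0.25,
    "project": 0.50,
}


def semester_score(scores: dict) -> float:
    """Weighted average of component scores, each out of 100 (hypothetical helper)."""
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise KeyError(f"missing component scores: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


if __name__ == "__main__":
    example = {"examinations": 80, "discussion": 90, "project": 88}
    print(semester_score(example))
```

Because the project carries half of the weight, a given change in the project score moves the semester score twice as much as the same change in the exam or discussion score.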

Grades are based on the demonstrated level of knowledge and understanding of the subject matter, as well as perception and originality. Numeric scores will be awarded for project, exam, and discussion grades. Numeric grading starts at 85, on the assumption that you are capable of providing a good answer that demonstrates a “moderately broad knowledge and understanding of subject matter; (with) noticeable perception and/or originality.” [Cornell University Grading System] To earn additional points, you will need to provide excellent answers that demonstrate a “comprehensive knowledge and understanding of subject matter; marked perception and/or originality.” Answers that demonstrate a satisfactory level of knowledge are worth a “C,” and marginal answers demonstrating a “minimum of knowledge” and only “some perception and/or originality” are graded a “D.”

Regular attendance and participation are an important aid to learning the materials in this course. Students are expected to attend all class sessions. While attendance is not taken, students are randomly called upon from the roster. Absences are noted as a zero for that day. Students who arrive late to class risk being absent when called upon.