- 1.5 credits
- Prerequisite: MBA Core
Data mining is a popular topic, and most large companies are implementing it. These companies can collect huge quantities of data with relative ease. The problems they face are how to get the data into an appropriate form for analysis, and how to analyze the data appropriately. Business people try to solve a number of different problems with data mining, and these typically require different analysis techniques. It is therefore important to recognize that data mining is not a single data analysis technique; rather, it is a diverse collection of techniques.
Data mining in the real business world generally draws on huge databases, called data warehouses, that are collected and structured for the express purpose of data mining (as opposed to day-to-day transactions). Once the data are accessible, statistical techniques must be used to perform the data mining. A number of techniques are available, as well as a number of software packages to implement them. One particular software package, Model 1, will be the focus of the data analysis in this course. Some of the common techniques will be described and illustrated using a hands-on approach.
Emphasis in the course will be on the application of data mining to problems in marketing, finance, and other business disciplines. While there will be classroom discussions about the various data mining tools and techniques, the mathematics and statistics behind these techniques will not be emphasized. A major portion of each student's time during this 8-week course will involve applying her or his knowledge of Model 1 to a real-world data set. Students will be divided into teams, and each team will be asked to analyze the data set, generating both a written and an oral report to "management."
- What types of problems can be solved using data mining?
- What are the problems associated with getting data into a proper format for analysis?
- Building a model:
  - Pre-selection bias: problems that occur because the wrong population was sampled in a previous survey.
  - False artifact: data that appear to be relevant but actually are misleading or wrong.
  - Dynamic properties: the changing characteristics of data over time need to be incorporated into the model.
- Data mining tools and their strengths and weaknesses:
  - Clustering: grouping data that share similar trends and patterns.
  - Decision trees: dividing data into logical categories (branches) according to specified characteristics.
  - Genetic algorithms: optimizing combinations of data using a process modeled on biological evolution.
  - Neural networks: applying a learning process to data that attempts to mimic the human brain.
  - Statistical tools: various forms of regression analysis, especially where the dependent variable is binary (0/1).
- How to apply data mining software to real data, and then interpret its output for management decisions.
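To give a feel for the "statistical tools" topic above (regression where the dependent variable is 0/1), here is a minimal sketch of a logistic regression fitted by gradient descent. It is only an illustration and makes several assumptions: it is written in plain Python rather than Model 1, the data set (prior purchases versus whether a customer responded to an offer) is invented for the example, and the function names `sigmoid` and `fit_logistic` are hypothetical.

```python
import math


def sigmoid(z):
    """Map any real number to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, lr=0.1, epochs=5000):
    """Fit P(y = 1) = sigmoid(b0 + b1 * x) by plain gradient descent.

    lr (step size) and epochs (number of passes) are illustrative
    settings chosen for this toy data set, not recommended defaults.
    """
    b0, b1 = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(b0 + b1 * x) - y  # predicted probability minus outcome
            g0 += err
            g1 += err * x
        b0 -= lr * g0 / n
        b1 -= lr * g1 / n
    return b0, b1


# Invented example data: number of prior purchases per customer,
# and whether that customer responded to an offer (1) or not (0).
purchases = [0, 1, 1, 2, 3, 3, 4, 5, 6, 7]
responded = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]

b0, b1 = fit_logistic(purchases, responded)

# Estimated response probabilities for a light and a heavy purchaser.
p_low = sigmoid(b0 + b1 * 1)
p_high = sigmoid(b0 + b1 * 6)
print(f"intercept={b0:.2f}, slope={b1:.2f}")
print(f"P(respond | 1 purchase)  = {p_low:.2f}")
print(f"P(respond | 6 purchases) = {p_high:.2f}")
```

For a management report, the useful reading is the direction and size of the slope: a positive slope here would suggest that customers with more prior purchases are more likely to respond, which is the kind of output interpretation the last topic above refers to.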
Course Materials: Data Mining: Building Competitive Advantage, by Robert Groth (Prentice Hall, 2000).