Essential data mining strategies
For critical issues facing private and public sector executives today, data mining makes the difference. From customer relationship management to risk management to improved production on the factory floor to detecting fraud and abuse, more leading-edge organizations are discovering every day that data mining gives them something reports and even OLAP can't - the ability to proactively make changes that help them reach their goals.
In the past few years, many businesses and some public sector agencies have invested heavily in some combination of ERP, supply chain management, sales force automation, data warehouse, query and reporting and OLAP software. Wanting a better return on those investments, they are now wondering whether to invest in data mining.
On the surface, compared to the others, data mining seems to be the riskiest investment. Data mining jargon is thick, the math behind the scenes is mysterious and it seems to touch only a few people in the organization.
However, when you look a little closer, the risk isn't great - in fact it is probably much riskier not to take the plunge into data mining. The investments in ERP, data warehousing, OLAP, etc. were important, but lacked leverage for three key reasons. First, they primarily replaced and updated existing ways of getting things done - which means their value was only the marginal improvement in productivity they offered. Second, because so many organizations implemented them in more or less the same time period, they didn't offer a competitive advantage. And third, because all of them deal with what has already happened, they didn't offer decision-makers the predictive leverage to make the changes organizations need to succeed in the future.
The good news is that we are still in the early part of the curve of the inevitable widespread use of data mining. Today, industry leaders, innovative start-ups and progressive-thinking agencies are using data mining successfully to predict and change their futures. These data miners are realizing not only a return on their investment in data mining, but a return on their other IT investments, especially their data warehouses.
The remainder of this article passes along some tips for data miners based on the experience of these forward-thinking early adopters of data mining. You'll see that you don't have to be Einstein to do data mining and that data mining can have widespread positive impact in your organization.
You'll also see that data mining is really about achieving your organization's goals, not about the math and the statistics. Data mining enables you to go beyond reporting and OLAP - to learn not only what happened in your operations, but why things happened - and to make changes so things work out better in the future. Only data mining gives you predictive models - views of the future.
This forward-looking information enables you to make positive changes throughout your operations. And, the results of data mining can easily be deployed to all the decision-makers in your organization - including "virtual" decision-makers such as your Web site and operational systems - so they can make improvements exactly where needed.
Despite the OLAP vendors' sloganeering about "better decision-making," there is simply no way to get useful predictive information without data mining.
Below you can see the difference between reporting, OLAP and data mining.
Don't wait to get started - the competition is only a mouse click away
Data mining is a journey - an ongoing initiative - not a project. When you've addressed today's critical issues, new ones pop up due to changes in your market or the technology. Given the boom in data mining, it's probable that your competition is already employing data mining in critical areas such as customer relationship management to better attract, cross-sell and retain new customers. Or, they may be using data mining to offer an automated, but personalized, experience to your customers or constituents at their Web site.
To start before it is too late - before your reports tell you that you've lost key customers or funds (and leave you guessing about whether you've lost more since then) - you may need to outsource your initial data mining effort. The critical questions are:
• Is your staff skilled and experienced? Most organizations do not have many people on staff, in either line of business or IT roles, with much data mining experience. If your CIO tells you that the staff is experienced in data mining because they've built a data warehouse or have implemented OLAP, you know you've met the Chief "I-don't-really-get-it" Officer. You need to outsource to get the data mining/model-building expertise you need to be successful.
• Do you have the technology infrastructure? Data mining requires data (see "There's a reason it's called 'data mining'," page X). Do you have a good, clean marketing database or a data warehouse? If not, to get going quickly, you'll likely want to outsource this activity.
As you start, plan ahead. While it's true that achieving results on your initial efforts is key, because data mining is a journey it is also important to plan for growth, both of the data warehouse and of the scope of data mining participation. As you make choices about vendors, technology, etc., be sure to always consider scalability (the ability to work with very large data sets) and flexibility (the ability to apply the technology to a variety of situations).
Begin with the end in mind
Personal productivity guru Stephen Covey's maxim applies to your data mining efforts in a big way. Don't be a hammer looking for a nail - there's no point in just heading off to crunch a bunch of numbers, or even to gather data in a data warehouse, without first deciding what results you want to achieve.
The context for data mining is whatever is critical to your organization's success. Start data mining by tackling a project that is clearly linked to what you want to accomplish. Successful data mining initiatives typically start small - they choose a critical organizational issue, such as retaining customers longer, and get right to work. Review your organization's strategic plan and/or business plan. Is there an area where you aren't making the progress you hoped for? What information do you need to make decisions that get things back on track?
Most often, in organizations where data mining has really taken root, the first data mining project informed decision-makers on an important topic, and had both a short time frame and clear deliverables.
Decision-makers tend to value data mining most favorably when they can take action based on the results. So, throughout the planning phase, keep the emphasis on delivering actionable results, not on data storage or the techniques you'll use to generate the results.
Beginning with the end in mind includes:
• Defining measures which can drive improvement
• Delivering measures on which people are prepared to act
• Ensuring the measures are easy to communicate and understand
• Making sure your measures really fit the problem - that they will be useful over time, are sensitive to small changes, etc.
It's the decision maker, stupid!
Presidential candidates know that in order to be elected, they need to stay focused on the issue that is critical to voters - their pocketbooks. So, their mantra becomes, "It's the economy, stupid." Successful data mining project leaders also retain a laser-like focus throughout the project - on the people who will use the results of the project to make important decisions: the decision makers.
To be successful, both the decision makers (the line of business or public sector executives) and the IT organization must buy in to the project plan. And, typically in successful data mining projects, the decision maker is both the champion for and the lead of the project.
It is important to keep the decision maker involved, even in the portions of the project that are led by IT. Too often, decision makers and IT talk past each other - and don't discover the disconnect until effort has been wasted. Avoid these disconnects by over-communicating during all parts of the process - particularly on the first few projects.
Another important part of focusing on the decision maker is to set and manage expectations. Remember, the project should have clear, timed deliverables. Make sure that everyone involved knows (and is reminded frequently) that the project is to build a speedboat, not a battleship.
Unless there's a method, there's madness
Despite the promises of early data mining "evangelists," data mining is not a silver bullet for decision-making - you can't just push a button and expect useful results to appear. Successful data mining projects typically employ a formal and iterative process, which guides the team step-by-step from selecting a critical issue through deployment.
Fortunately, there is a tested, industry-standard approach to data mining projects. CRISP-DM (CRoss-Industry Standard Process for Data Mining) was developed with the cooperation of over 100 companies, including a mix of industry leaders such as NCR and SPSS, small consulting firms and academics. The consortium field tested the methodology and modified it based on that experience and input from the group members. For more information about CRISP-DM, visit .
There's a reason it's called "data mining"
Gold mining is a process for sifting through lots of ore to find valuable nuggets. Data mining is a process for sifting through lots of data to find useful information that aids in decision making. If there's no gold in a particular mountain or stream, even the best gold miner won't strike it rich there. Similarly, the "core" content of the data is critically important to data miners.
Once you've picked a problem to solve, you need to start thinking about data. The data is so important that three of the ten points in this paper focus on data.
This point starts at the highest level, thinking about the data you need from the perspective of the information you want to deliver. You'll want to capture, and make easily available, the detail needed for strategic analysis and tactical alerts.
Two "high-level" tips from successful data miners about data are:
• Most often, the "unit of analysis" in a data mining project is a customer or constituent. In order to do analysis at that level, you almost certainly need a unique ID by customer or constituent everywhere you capture and store data.
• Make use of metadata - information about your data - wherever you can. Metadata ranges from simple things such as the source of the data to more advanced things such as when the data was last changed or whether the data has been imputed (a missing part of a record has been filled in based on information in other parts of the record). Metadata can take many forms; however, the trend is clearly toward XML.
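The two tips above can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the field names (customer_id, total_spend, age), the sample records and the helper build_customer_view are hypothetical. It joins two data sources on a shared customer ID and keeps simple metadata alongside the values: where each value came from and whether it was imputed. For brevity the missing age is filled with the median of the known ages, which is a simpler rule than filling it from other parts of the same record.

```python
from statistics import median

# Hypothetical records from two sources, both keyed by the same customer_id.
transactions = [
    {"customer_id": 101, "total_spend": 250.0},
    {"customer_id": 102, "total_spend": 90.0},
    {"customer_id": 103, "total_spend": 410.0},
]
demographics = [
    {"customer_id": 101, "age": 34},
    {"customer_id": 102, "age": None},  # missing value
    {"customer_id": 103, "age": 52},
]


def build_customer_view(transactions, demographics):
    """Join both sources on customer_id and record simple metadata."""
    known_ages = [d["age"] for d in demographics if d["age"] is not None]
    fill_age = median(known_ages)  # simplistic imputation rule (assumption)

    # One row per customer, tagged with the source of the spend figure.
    by_id = {
        t["customer_id"]: dict(t, spend_source="transactions")
        for t in transactions
    }
    for d in demographics:
        row = by_id.setdefault(d["customer_id"], {"customer_id": d["customer_id"]})
        imputed = d["age"] is None
        row["age"] = fill_age if imputed else d["age"]
        row["age_source"] = "demographics"
        row["age_imputed"] = imputed  # metadata: was this value filled in?
    return by_id


customers = build_customer_view(transactions, demographics)
```

Without the shared ID the two sources could not be combined at the customer level, and without the age_imputed flag a later analysis could not tell a recorded age from a filled-in one.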
Better data means better results
There are only so many ways you can improve your results from data mining - better data is one.
Better data means that you can build more comprehensive and more accurate models. Typically, it's far more valuable to include more types of data in the model building process than it is to have more data (more cases or records). However, there's a tradeoff between using all data types and getting useful results. Because data mining is a journey, successful data miners typically work with the data they have, get results and realize some ROI, then add in additional data over time to become even more effective.
The best models are built using three types of data:
• Transaction data. Transaction data is very powerful - it tells you what the customer has actually done. Past behavior is widely regarded as one of the best predictors of future behavior. The good news is that most organizations have a great deal of transaction data - everything from prior purchases or donations to a record of which Web pages a person visits and how long they linger on them.
• Purchased data. When a customer or constituent interacts with you, they typically only tell you a subset of useful information. An entire industry, led by large corporations such as Acxiom, Experian and Claritas, exists to provide very useful supplemental data about the customer's current situation, including demographic and psychographic data.
• Collected data. Collected data offers a great opportunity for leverage in data mining. And, because it takes effort and skill to collect useful data, it offers a unique opportunity for competitive advantage. Collected data adds information about a customer's attitudes and opinions into the model building phase - and results in better decision-making information. You can collect data about customer satisfaction levels, customer preferences, purchase intentions and even "share-of-wallet" information - information about how much the customer spends with your competition - and why. Or, you can turn to professionals in the market research industry to collect it for you. Even a very simple example reveals the power of collected data.
Imagine that you run a Web bookstore. Last week, John Smith purchased two books from your store, one about bicycling and one about gardening. Now, you need to decide what to do when John returns to your site. Based solely on transaction data, for the next few visits you might pop up a list of suggestions that includes both bicycling and gardening titles. But, if John Smith purchased the gardening book for a friend as a gift, and has no personal interest in gardening, he might find the continued presentation of gardening titles frustrating. If you had just collected one additional piece of data at the time of purchase - is this book for you or is it a gift - you would be on the way to providing John with a better, more personalized, more valuable customer experience.
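The bookstore example can be made concrete with a short Python sketch. The record layout, the is_gift field and the function suggest_categories are hypothetical names chosen for illustration; a real recommendation system would be far richer. The point is only that one collected field changes what the site suggests.

```python
# Hypothetical purchase records; is_gift is the one extra collected field.
purchases = [
    {"customer": "John Smith", "category": "bicycling", "is_gift": False},
    {"customer": "John Smith", "category": "gardening", "is_gift": True},
]


def suggest_categories(purchases, customer):
    """Return categories to suggest, ignoring purchases marked as gifts."""
    suggestions = []
    for p in purchases:
        is_own_purchase = p["customer"] == customer and not p["is_gift"]
        if is_own_purchase and p["category"] not in suggestions:
            suggestions.append(p["category"])
    return suggestions


print(suggest_categories(purchases, "John Smith"))  # prints ['bicycling']
```

Based on transaction data alone, both categories would be suggested; with the gift flag, only bicycling titles are.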
It's still garbage in, garbage out
Some things never change. In the early days of computing, the phrase "garbage in, garbage out" was coined to reflect the reality that computing results are dramatically affected by the quality of data.
Getting access to the right data, cleaning it and preparing it for analysis is typically the most time-consuming step of the data mining process. Don't kid yourself - the common adage that 70-80 percent of the time invested in data mining projects goes into data access, cleaning and preparation is true. So, plan - and set expectations - accordingly.
This work is often time consuming and many people do not find it particularly exciting. In addition, dramatic increases in usefulness of end results can result from more analytically sophisticated approaches to data preparation. As a result, this step is often a good candidate for outsourcing.
Avoid the OLAP trap
Many vendors will try to tell you that they have "all you need" for data mining. It's unlikely to be true. Successful data mining requires three families of analytical capabilities: reporting, classification and forecasting.
Why are all three types needed? Each capability delivers a different kind of information to the customer. Reports report - they inform management of what has happened in the business. Reports (including OLAP) are popular because they are easy to understand and easy for IT to produce. However, if all you have is reporting, you can only see what has already taken place. You cannot predict what might happen later. It's as if you're trying to drive your car by only looking in the rearview mirror - the faster you need to go, the more risky it is.
Classification and forecasting are different from reporting because they enable you to gain more understanding about why things happen - and to make predictions about what is likely to happen under different scenarios. Armed with that information, you can make proactive changes in your organization and realize better results.
Together, classification and forecasting are commonly known as predictive modeling. Classification methods put things into groups - for example, customers likely to spend more with your company or constituents likely to vote for a proposition. Typically, the classification process includes two steps: establishing the groups and determining (or predicting) group membership on a case-by-case basis. Forecasting methods deal with data where time is the critical measure. Examples of forecasting are sales by product and/or region over time and population growth over time.
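As a rough illustration of the two families, here is a minimal Python sketch. It is not the method of any particular product: the nearest-centroid rule, the straight-line trend, the feature choice and all numbers are assumptions picked for brevity. The first part follows the two classification steps (establish the groups, then assign each new case to a group); the second extends a sales series one period ahead.

```python
from statistics import mean

# Step 1: establish the groups from past customers whose outcome is known.
# Features are (orders per year, average order value); all values are made up.
history = {
    "likely to spend more": [(12, 80.0), (10, 95.0), (14, 70.0)],
    "unlikely to spend more": [(2, 30.0), (1, 45.0), (3, 25.0)],
}
centroids = {
    label: tuple(mean(column) for column in zip(*rows))
    for label, rows in history.items()
}


# Step 2: predict group membership case by case (nearest group average).
# Features are left unscaled here to keep the sketch short.
def classify(case):
    def squared_distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return min(centroids, key=lambda label: squared_distance(case, centroids[label]))


# Forecasting: fit a least-squares straight line to values observed at
# periods 0, 1, 2, ... and extend it one period into the future.
def forecast_next(values):
    n = len(values)
    xs = range(n)
    x_bar, y_bar = mean(xs), mean(values)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, values)) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    return y_bar + slope * (n - x_bar)
```

For example, classify((11, 85.0)) assigns a new customer to the group whose average profile is closest, and forecast_next([100, 110, 120, 130]) extends a steadily rising quarterly sales series by one more quarter.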
It's not the purpose of this paper to delve into the pros and cons of the various methods of predictive modeling. The key point to make in this context is that, contrary to what is often heard, algorithms do matter. Algorithms matter because, in the end, just any result won't do - to be truly successful, you have to have the best answer. Just as a carpenter uses more than just a hammer to build a house, a data miner uses more than one analytical method to get the best information.
Now, let's take a quick look at how the stereotypical vendor types stack up relative to offering a full range of analytical capabilities:
Reporting and OLAP companies typically have excellent capabilities for management reporting. But, they typically offer little in the way of data mining. The simple test here is to probe the salesperson's knowledge of data mining and how this product delivers value.
Database companies love data mining - data mining requires lots of data and that means more databases will be sold. To improve their odds of selling databases for data mining applications, database vendors are beginning to include data mining algorithms in the databases themselves. This enables the database vendors to "check the box" and say they have data mining capabilities that offer performance advantages. But, successful data mining requires much more than the presence of a few specialized algorithms and as a result, even when there are algorithms in the database, additional products and services are required to see results.
Data mining companies are, with a few notable exceptions (SPSS and SAS), typically "one product wonders." They may offer a unique approach to data mining, but they typically do not offer an adequate range of capabilities, both in terms of reporting and more sophisticated analytics. Only SPSS and SAS offer the full range of analytical capabilities, especially in terms of the more sophisticated, and therefore potentially more valuable methods.
Analytical application companies deliver capabilities for a specific application. Their products typically expect data in a certain form, embody best practices in a certain area and deliver a certain type of highly focused information. The simplicity and focus that make these vendors' products intriguing are also their Achilles' heel.
Their emphasis on a specific application and its connection to a specific part of a customer's operation often comes at the expense of emphasizing the development of the model itself - and in the end it is the model that makes the difference. Imagine for example the additional value a direct marketer might get from only a few percentage points better response.
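To put rough numbers on the direct marketing example, the following Python sketch compares campaign profit at two response rates. Every figure in it (mailing size, response rates, margin per response, cost per piece) is invented for illustration, and the function campaign_profit is a hypothetical helper, not part of any product.

```python
def campaign_profit(pieces, response_rate, margin_per_response, cost_per_piece):
    """Profit = margin earned from responders minus the cost of the mailing."""
    revenue = pieces * response_rate * margin_per_response
    cost = pieces * cost_per_piece
    return revenue - cost


# Made-up campaign: 100,000 pieces at 0.50 each, 40.00 margin per response.
baseline = campaign_profit(100_000, 0.02, 40.0, 0.50)  # 2 percent response
improved = campaign_profit(100_000, 0.03, 40.0, 0.50)  # 3 percent response
```

Under these assumptions, a single extra percentage point of response lifts profit from about 30,000 to about 70,000, because the cost of the mailing stays fixed while the margin from responders grows.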
So, the secret of these vendors is that, in general, they are relatively weak in terms of modeling capabilities, model building capabilities (preparing data for analysis, exploring alternative models, etc.), a user interface for the model-builder to use and deployment capabilities (see below).
Experienced data miners offer three suggestions with regard to data mining analytics:
• Don't redo your existing reports without adding value. Typically, that means adding one of these three things: drill-down, more context by providing a view over a longer period of time (as opposed to the typical, and only marginally valuable, "this month versus last month" type "analysis" offered by most reports), or a predictive element.
• Always offer ad hoc capabilities. If all that you do is "canned" - particularly if it requires IT intervention to do something simple like change a column or create a new report - then you're setting yourself up for frustration and failure. Canned reports are a great starting place, but they are just a starting place. In today's fast-paced world, ad hoc capabilities are a requirement.
• Your data mining software should offer an easy way for you to incorporate your business knowledge. Software that doesn't let you make use of what you know about your business in building a model is simply not going to give you the best results.
Deployment is the key to data mining ROI
The ultimate goal of data warehouse ROI cannot be achieved without data mining. But, truly successful data mining cannot be achieved without deployment.
Deployment simply means getting the information, in a usable format, to the place where it is needed. There are four types of deployment:
• To decision makers. Whether it is a report, a model or the result of running new data through a model, getting information in the hands of people who can effect change is key. Typically, deployment to decision makers is done via Intranets. Two ways that people can "go one better" in this regard are offering "live" deployment (where the decision maker can interact with the results to drill down or even specify a new ad hoc analysis) and supporting off-line use of the information, particularly for mobile employees and executives.
• To "virtual" decision makers. If a customer enters a store, he or she is a candidate to receive personalized service from another person - a real decision maker. If a person comes to your Web store, you can still offer a personalized experience if you deploy the data mining models. For example, based on a combination of what you know about a customer and his or her actions or requests at your Web site today, your "virtual" decision maker can offer different Web page content, different discounts, etc. on the fly.
• To operational systems. Another customer touch point is your call center. Here, deployment can enable the same type of personalization as in the Web scenario above, by prompting your call-center representatives. In a manufacturing setting, a deployed model may take information coming from a production line and, based on that data, either send a message to a troubleshooter or even make adjustments without human intervention.
• To databases. Interactions with customers today are many faceted. For example, a retailer may interact with the same customer via a storefront, the Web, a call center and a catalog. In order to keep information about that customer current, savvy retailers store all of that data in a centralized data warehouse. So, when a customer's profile is updated on the basis of new information, the "score," which indicates the customer's category, is deployed back to the database for use in future interactions. For example, recent purchase volumes or updated demographic data cause a classification model to change the category in which a customer is classified.
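Deployment to databases, the last type above, can be sketched in a few lines of Python using the standard sqlite3 module. The table layout, the 500 threshold and the functions score_customer and record_purchase are hypothetical stand-ins; in practice the score would come from a real classification model and the table would live in the data warehouse.

```python
import sqlite3


def score_customer(recent_purchases):
    """Hypothetical stand-in for a deployed classification model."""
    return "high value" if recent_purchases >= 500 else "standard"


# In-memory database standing in for the centralized data warehouse.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers ("
    "customer_id INTEGER PRIMARY KEY, recent_purchases REAL, category TEXT)"
)
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [(101, 120.0, "standard"), (102, 300.0, "standard")],
)


def record_purchase(conn, customer_id, amount):
    """Update the profile with new data, then write the new score back."""
    conn.execute(
        "UPDATE customers SET recent_purchases = recent_purchases + ? "
        "WHERE customer_id = ?",
        (amount, customer_id),
    )
    (total,) = conn.execute(
        "SELECT recent_purchases FROM customers WHERE customer_id = ?",
        (customer_id,),
    ).fetchone()
    conn.execute(
        "UPDATE customers SET category = ? WHERE customer_id = ?",
        (score_customer(total), customer_id),
    )
    conn.commit()


record_purchase(conn, 102, 250.0)  # customer 102 is re-scored after the purchase
```

Because the category is stored with the customer record, the storefront, the Web site, the call center and the catalog operation can all read the same current score.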
In addition to speed and flexibility, successful data mining projects typically include ways of getting lots of feedback from users quickly. They include a provision for making changes in response to that feedback.
Champions train so they can win the race
Most corporations today have a handful of heavy-duty model builders, a fair number of reasonably analytical knowledge workers and lots of information consumers.
In addition to the obvious returns from training the IT personnel and data mining software users, you are likely to see greater overall return from your data mining investment if you educate the people who receive the information and use it to make decisions. While data mining is unlikely to make everyone an analyst, and certainly won't make everyone a model builder, ongoing success in data mining typically requires some education and training for your organization's staff.
The type of education and training needed varies by the individual's role in the process. Many people have paid their analytical dues in college by sitting through a statistics course - and volunteers to repeat that are few and far between. However, many benefit from a more practical refresher course that emphasizes analytical thinking, not analytical methods. The content can be delivered in a variety of ways, from a classroom setting to interactive computer-based tutorials that have the flash and accessibility of today's top-selling computer games.
Another key training consideration arises if you outsource your first data mining project to get started quickly. If you do, you probably want to train your IT staff to be ready to manage the system, make updates, etc. when you assume responsibility for it.
Today, data mining makes the difference in every industry in every area of the world. You can do data mining - and use the results to make proactive, positive change in your organization. A collection of data mining case histories, organized by industry and application, is available in the section of SPSS' web page. To read them, visit -- and take the first step on your data mining journey.
Reporting and OLAP versus data mining: example questions

Reporting/OLAP: What was the response rate to our mailing?
Data mining: What is the profile of people who are likely to respond to future mailings?

Reporting/OLAP: How many units of our new product did we sell to our existing customers?
Data mining: Which existing customers are likely to buy our next new product?

Reporting/OLAP: Which customers didn't renew their policies last month?
Data mining: Which customers are likely to switch to the competition in the next six months?

Reporting/OLAP: Who were my ten best customers last year?
Data mining: Which ten customers offer me the greatest profit potential?

Reporting/OLAP: Which customers defaulted on their loans?
Data mining: Is this customer likely to be a good credit risk?

Reporting/OLAP: What were sales by region last quarter?
Data mining: What are expected sales by region next year?

Reporting/OLAP: What percentage of the parts we produced yesterday are defective?
Data mining: What can I do to improve throughput and reduce scrap?