Chapter 5
Foundations of Business Intelligence: Database and Information Management
Student Objectives
1. Describe how a relational database organizes data and compare its approach to an object-oriented database.
2. Identify and describe the principles of a database management system.
3. Evaluate tools and technologies for providing information from databases to improve business performance and decision making.
4. Assess the role of information policy and data administration in the management of organizational data resources.
5. Assess the importance of data quality assurance for the business.
Chapter Outline
5.1 The Database Approach to Data Management
Entities and Attributes
Organizing Data in a Relational Database
Establishing Relationships
5.2 Database Management System
Operations of a Relational DBMS
Capabilities of Database Management Systems
Object-Oriented Databases
5.3 Using Databases to Improve Business Performance and Decision Making
Data Warehouses
What is a Data Warehouse?
Data Marts
Business Intelligence, Multidimensional Data Analysis, and Data Mining
Data Mining
Databases and the Web
5.4 Managing Data Resources
Establishing an Information Policy
Ensuring Data Quality
5.5 Hands-On MIS
Key Terms
The following alphabetical list identifies the key terms discussed in this chapter. The page number for each key term is provided.
Attributes, 160
Business intelligence (BI), 171
Data administration, 179
Data cleansing, 180
Data definition language, 166
Data dictionary, 166
Data manipulation language, 168
Data mart, 170
Data mining, 173
Data quality audit, 180
Data warehouse, 170
Database, 159
Database administration, 179
Database management system (DBMS), 165
Database server, 176
Entity, 160
Entity-relationship diagram, 162
Field, 161
Foreign key, 162
Information policy, 178
Key field, 161
Normalization, 163
Object-oriented DBMS, 169
Object-relational DBMS, 169
Online analytical processing (OLAP), 172
Predictive analysis, 174
Primary key, 161
Records, 161
Referential integrity, 163
Relational database, 160
Structured Query Language (SQL), 168
Tuples, 161
Teaching Suggestions
The essential message of this chapter is the statement that “Organizations need to manage their data assets very carefully to make sure that the data are easily accessed and used by managers and employees across the organization.” Data have now become central and even vital to an organization’s survival. You can illustrate these comments by referencing the opening case, “NASCAR Races to Manage Its Data”, in order to stress the importance of data and database systems for success in business.
What’s interesting and intriguing about the opening vignette is how it points out that every organization, even something as non-traditional as NASCAR, needs to manage data and information as an important resource. You could substitute almost any other company for NASCAR and the story would be the same. How businesses store, organize, and manage their data has a tremendous impact on organizational effectiveness. Companies need to manage their data to help them reduce costs, improve operational efficiency and decision making, and most of all, boost profitability.
Section 5.1, “The Database Approach to Data Management” This section introduces students to file organization terms and concepts. The section consists of three components: important database terminology, types of databases, and the elements of SQL. If you have access to a relational DBMS during class time, you can demonstrate several of the concepts presented in this section.
Section 5.2, “Database Management System” Database design and management requirements for database systems are introduced. Help your students see how a logical design allows them to analyze and understand the data from a business perspective, while physical design shows how the database is arranged on direct access storage devices. At this point, you can use the enrollment process at your university as an example. Have your students prepare a logical design for the enrollment process. If you have time and as a class activity, ask your students to prepare an entity-relationship diagram, as well as normalize the data. Your students will need guidance from you to complete this activity, but it will help them see and understand the logical design process.
Section 5.3, “Using Databases to Improve Business Performance and Decision Making” This section focuses on how data technologies are actually used: data warehouses, data marts, business intelligence, multidimensional data analysis, and data mining. Regardless of their career choice, students will probably use some or all of these in their jobs. For example, data warehouses and data marts are important to many people, partly because they are critical for those who want to use data mining, which in turn has many uses in management analysis and business decisions. Keep in mind as you teach this chapter that managing data resources can be very technical, but many students will need and want to know the business uses and business values. In the end, effectively managing data is the goal. Doing it in a way that will enable your students to contribute to the success of their organization is the reason why most students are in this course.
Interactive Session: Organizations: DNA Databases: Crime Fighting Weapon or Threat to Privacy?
Case Study Questions:
1. What are the benefits of DNA Databases?
DNA databases provide a centralized, digitized collection of one-of-a-kind data to prove the guilt or innocence of suspected criminals. The database provides a fast, economical method of comparing evidence at crime scenes with DNA profiles to help apprehend those suspected of committing crimes. Law enforcement agencies around the world can access the databases created by the states and linked through the FBI’s CODIS system. The ability to share the data saves money and time. Sharing the data ensures a wider availability of the collected data.
2. What problems do DNA databases pose?
The DNA databases pose privacy risks to the innocent if the databases contain data on people who are not convicted criminals. People who collect and analyze the DNA samples can make mistakes and improperly identify an innocent person as a criminal. Innocent people could be wrongly convicted of crimes they didn’t commit.
3. Who should be included in a national DNA database? Should it be limited to convicted felons? Explain your answer.
The answers to these questions will vary. Some students may say that only convicted criminals should be included in the DNA database. However, that assumes that only convicted criminals will commit future crimes. On the other hand, if you say that everyone’s DNA should be included in the database on the assumption that anyone is capable of committing a crime, then you run into serious privacy questions. Including everyone’s DNA assumes that everyone may at some point commit a crime.
Including families of suspected or convicted criminals also invites privacy concerns and social problems. Just because a family member commits a crime, are we to suppose that everyone in that family is a criminal or capable of committing crimes? Our legal system is designed to protect juveniles from some of the harsher rules found in the adult criminal legal system. Does including them in the DNA databases violate some of those protections? Does that mark them as a criminal for life?
4. Who should be able to use DNA databases?
Most people will say that all law enforcement agencies should have access to the DNA databases. That poses privacy concerns if the data are misused. Some students may say that the agencies that provide airport security, such as the Transportation Security Administration, should have access to the databases to help track suspected criminals who may pose threats to airline passengers. If that’s true, then why not allow access for security personnel at bus stations, train stations, and even those who run cruise ships?
MIS In Action
Explore the Web site for the Combined DNA Index System (CODIS) and answer the following questions. (Answers to the questions below are taken directly from the FBI’s Web site at the following address: http://www.fbi.gov/hq/lab/html/codis1.htm )
1. How does CODIS work? How is it designed?
The FBI Laboratory’s Combined DNA Index System (CODIS) blends forensic science and computer technology into an effective tool for solving crime. The FBI Laboratory’s CODIS project began as a pilot software project in 1990 serving 14 state and local laboratories. The DNA Identification Act of 1994 formalized the FBI’s authority to establish a national DNA Index System (NDIS) for law enforcement purposes.
CODIS supports NDIS (National DNA Index System), SDIS (State DNA Index System), and LDIS (Local DNA Index System).
NDIS is the highest level in the CODIS hierarchy and enables the laboratories participating in the program to exchange and compare DNA profiles at the national level. SDIS allows laboratories within states to exchange DNA profiles. All DNA profiles originate at LDIS and then flow to SDIS and NDIS.
2. What information does CODIS maintain?
Several indexes categorize the profiles entered into CODIS:
• Convicted Offender: Contains profiles of individuals convicted of crimes
• Forensic: Contains DNA profiles developed from crime scene evidence, such as semen stains or blood
• Arrestees: Contains profiles of arrested persons (if state law permits the collection of arrestee samples)
• Missing Persons: Contains DNA reference profiles from missing persons
• Unidentified Human Remains: Contains DNA profiles developed from unidentified human remains
• Biological Relatives of Missing Persons: Contains DNA profiles voluntarily contributed by relatives of missing persons
3. Who is allowed to use CODIS?
Today, over 170 public law enforcement laboratories participate in NDIS across the United States. Internationally, more than 40 law enforcement laboratories in over 25 countries use the CODIS software for their own database initiatives.
4. How does CODIS aid criminal investigations?
CODIS generates investigative leads in cases where biological evidence is recovered from the crime scene. Matches made among profiles in the Forensic Index can link crime scenes together, possibly identifying serial offenders. Based upon a match, police from multiple jurisdictions can coordinate their respective investigations and share the leads they developed independently. Matches made between the Forensic and Offender Indexes provide investigators with the identity of the suspected perpetrator(s).
Since names and other personally identifiable information are not stored at NDIS, qualified DNA analysts in the laboratories sharing matching profiles contact each other to confirm the candidate match.
Interactive Session: Technology: The Databases Behind MySpace
Case Study Questions
1. Describe how MySpace uses databases and database servers.
In its initial phases, MySpace operated with two Web servers communicating with one database server and a Microsoft SQL Server database. The site continued adding Web servers to handle increased user requests. After the number of accounts exceeded 500,000 the site added more SQL Server databases: one served as a master database, the others focused on retrieving data for user page requests. After two million accounts were activated, MySpace switched to a vertical partitioning model in which separate databases supported distinct functions of the Web site. After three million accounts, the site scaled out by adding many cheaper servers to share the database workload.
It eventually switched to a virtualized storage architecture in which databases write data to any available disk, thus eliminating the possibility of an application’s dedicated disk becoming overloaded. MySpace later installed a layer of servers between the database servers and the Web servers to store and serve copies of frequently accessed data objects so that the site’s Web servers wouldn’t have to query the database servers with lookups as frequently.
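The caching layer described above follows a pattern often called cache-aside: check a fast in-memory copy first and query the database only on a miss. The following Python sketch illustrates the idea under stated assumptions. The `profiles` table, the `ProfileCache` class, and its method are hypothetical, a plain dictionary stands in for the separate cache servers MySpace used, and nothing here reflects MySpace’s actual software.

```python
import sqlite3
from typing import Dict, Optional


class ProfileCache:
    """Hypothetical cache-aside layer between Web servers and a database server."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn
        self.cache: Dict[int, str] = {}  # stands in for dedicated cache servers
        self.db_queries = 0              # how many lookups actually reached the database

    def get_display_name(self, user_id: int) -> Optional[str]:
        # 1. Serve from the cache when the object is already there.
        if user_id in self.cache:
            return self.cache[user_id]
        # 2. Otherwise query the database and remember the answer for next time.
        self.db_queries += 1
        row = self.conn.execute(
            "SELECT display_name FROM profiles WHERE user_id = ?", (user_id,)
        ).fetchone()
        if row is None:
            return None
        self.cache[user_id] = row[0]
        return row[0]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE profiles (user_id INTEGER PRIMARY KEY, display_name TEXT)")
conn.execute("INSERT INTO profiles VALUES (1, 'Sarah')")

cache = ProfileCache(conn)
first = cache.get_display_name(1)   # cache miss: goes to the database
second = cache.get_display_name(1)  # cache hit: database not touched
print(first, second, cache.db_queries)  # Sarah Sarah 1
```

A production cache also has to expire or refresh entries when the underlying data change; this sketch omits that to keep the pattern visible.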
2. Why is database technology so important for a business such as MySpace?
Almost everything MySpace receives from and serves to its users consists of data objects such as pictures, audio files, and video files. The objects are highly individualized and attached to a certain entity (a person).
Its databases must make the objects readily available to anyone requesting access to that entity. Database technology is the only technology that can accomplish this mission.
3. How effectively does MySpace organize and store the data on its site?
In its infancy, MySpace used two Web servers communicating with one database server. That was adequate when the site had a small number of users who were updating or accessing database objects. Obviously that won’t work with tens of millions of users. Unfortunately, MySpace still overloads more frequently than other major Web sites. With a log-in error rate of 20 to 40 percent on some days, the site is not effectively organizing or storing data at all.
4. What data management problems have arisen? How has MySpace solved, or attempted to solve, these problems?
Some of the problems MySpace has encountered are inadequate storage space on its database servers, slow access or no access through its log-in application, and users’ inability to access data. Over the years, MySpace has attempted to fix these problems by adding more Web servers and more database servers. Some were simply “added on” without restructuring the entire system to use its hardware and software more efficiently. Workloads were not distributed evenly between servers, which caused inefficient use of resources. MySpace developers continue to redesign the Web site’s database, software, and storage systems to keep pace with its exploding growth, but their job is never done.
MIS In Action
Explore MySpace.com, examining the features and tools that are not restricted to registered members. Then answer the following questions:
1. Based on what you can view without registering, what are the entities in MySpace’s database?
Obviously, individual users are the main entity in MySpace’s databases. Other entities are video files, audio files, blogs, forums, groups, events, favorites, and email.
2. Which of these entities have some relationship to individual members?
Which of the entities have a relationship to individual members depends on what the individual decides. For instance, it’s possible that Sarah would have a list of films (video files) attached to her profile. She may also participate in forums or groups. It’s possible that all the entities have some relationship to individual members.
3. Select one of these entities and describe the attributes for that entity.
Films included in MySpace’s databases likely have these attributes: name, date produced, date released, actors, actresses, director, subject, place it was filmed, musical scores included in the film, awards given to the film, comments of film goers, and critics’ ratings.
Section 5.4, “Managing Data Resources” This section introduces students to some of the critical issues surrounding corporate data. Students should realize that setting up the database is only the beginning of the process. Managing the data is the real challenge. In fact, the main point is to show how data management has changed and the reason why data must be organized, accessed easily by those who need access, and protected from the wrong people accessing, modifying, or harming the data.
Developing a database environment requires much more than selecting database technology. It requires a formal information policy governing the maintenance, distribution, and use of information in the organization. The organization must also develop a data administration function and a data-planning methodology. Data planning may need to be performed to make sure that the organization’s data model delivers information efficiently for its business processes and enhances organizational performance. There is political resistance in organizations to many key database concepts, especially the sharing of information that has been controlled exclusively by one organizational group. Creating a database environment is a long-term endeavor requiring large up-front investments and organizational change.
Section 5.5, “Hands-On MIS”
Improving Decision Making: Redesigning the Customer Database: Dirt Bikes U.S.A.
Software skills: Database design; querying and reporting
Business skills: Customer profiling
Redesign Dirt Bikes’ customer database so that it can store and provide the information needed for marketing. You will need to develop a design for the new customer database and then implement that design using database software. Consider using multiple tables in your new design. Populate each new table with ten records.
Develop several reports that would be of great interest to Dirt Bikes’ marketing and sales department (for example, lists of repeat Dirt Bikes customers, Dirt Bike customers who attend racing events, or the average age and years of schooling of Dirt Bikes customers) and print them.
The solution file represents one of many alternative database designs that would satisfy Dirt Bikes’ requirements. The design shown here consists of four tables: Customer, Distributor, Purchase, and Model. Dirt Bikes’ old customer database was modified by breaking it down into these tables. Data on Dirt Bikes customer purchases captured from distributors, as well as customer purchases of non-Dirt Bikes models, are stored in the Purchase table. The Customer table no longer contains purchase data, but it does contain data on e-mail addresses, customer date of birth, years of education, additional sport of interest, and whether the customer attends dirt bike racing events. This design tracks repeat Dirt Bikes customers through reports of customer purchases showing which customers have purchased more than one Dirt Bike. Reports for this solution were developed using the Access query and report wizards.
An example solution file can be found in the Microsoft Access file named: Ess8ch05 running case solution.mdb.
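For instructors who want to show the logic behind the repeat-customer report outside Access, here is a minimal sketch in Python using the standard-library `sqlite3` module. The table and column names (`customer_id`, `model_id`, `brand`, and so on) and the sample rows are hypothetical, the Distributor table is omitted for brevity, and this is not the design stored in the solution file.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customer (
    customer_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL
);
CREATE TABLE Model (
    model_id INTEGER PRIMARY KEY,
    name     TEXT NOT NULL,
    brand    TEXT NOT NULL            -- 'Dirt Bikes' or a competitor's brand
);
CREATE TABLE Purchase (
    purchase_id   INTEGER PRIMARY KEY,
    customer_id   INTEGER NOT NULL REFERENCES Customer(customer_id),
    model_id      INTEGER NOT NULL REFERENCES Model(model_id),
    purchase_date TEXT
);
""")
# Hypothetical sample rows.
conn.executemany("INSERT INTO Customer VALUES (?, ?)",
                 [(1, "Ana Ruiz"), (2, "Ben Cole"), (3, "Cy Park")])
conn.executemany("INSERT INTO Model VALUES (?, ?, ?)",
                 [(10, "Moto 300", "Dirt Bikes"), (11, "Moto 450", "Dirt Bikes"),
                  (20, "Rival X", "OtherBrand")])
conn.executemany(
    "INSERT INTO Purchase (customer_id, model_id, purchase_date) VALUES (?, ?, ?)",
    [
        (1, 10, "2006-04-01"), (1, 11, "2008-06-15"),  # Ana: two Dirt Bikes purchases
        (2, 10, "2007-05-20"), (2, 20, "2008-01-10"),  # Ben: one Dirt Bikes, one competitor
        (3, 20, "2007-09-09"),                         # Cy: competitor only
    ],
)

# Repeat customers: more than one purchase of a Dirt Bikes model.
REPEAT_CUSTOMERS = """
SELECT c.name, COUNT(*) AS dirt_bikes_purchases
FROM Customer AS c
JOIN Purchase AS p ON p.customer_id = c.customer_id
JOIN Model    AS m ON m.model_id = p.model_id
WHERE m.brand = 'Dirt Bikes'
GROUP BY c.customer_id, c.name
HAVING COUNT(*) > 1
ORDER BY c.name;
"""
repeat_customers = conn.execute(REPEAT_CUSTOMERS).fetchall()
print(repeat_customers)  # [('Ana Ruiz', 2)]
```

Because competitor purchases live in the same Purchase table, the WHERE clause on brand is what separates Dirt Bikes repeat buyers from customers who simply bought two bikes of any make.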
Improving Operational Excellence: Building a Relational Database for Inventory Management
Software skills: Database design; querying and reporting
Business skills: Inventory management
This exercise requires that students know how to create queries and reports using information from multiple tables. The solutions provided here were created using the query wizard and report wizard capabilities of Access. Students can, of course, create more sophisticated reports if they wish.
The database would need some modification to answer other important questions about the business. The owners might want to know, for example, which are the fastest-selling bicycles. The existing database shows products in inventory and their suppliers. The owners might want to add an additional table (or tables) in the database to house information about product sales, such as the product identification number, date placed in inventory, date of sale, purchase price, and customer name, address, and telephone number. Management could use this enhanced database to create reports on best selling bikes over a specific period, the number of bicycles sold during a specific period, total volume of sales over a specific period, or best customers. Students should be encouraged to think creatively about what other pieces of information should be captured on the database that would help the owners manage the business.
The answers to the following questions can be found in the Microsoft Access File named: Ess8ch05solutionfile.mdb.
1. Prepare a report that identifies the five most expensive bicycles. The report should list the bicycles in descending order from most expensive to least expensive, the quantity on hand for each, and the markup percentage for each.
2. Prepare a report that lists each supplier, its products, their quantities on hand, and associated reorder levels. The report should be sorted alphabetically by supplier. Within each supplier category, the products should be sorted alphabetically.
3. Prepare a report listing only the bicycles that are low in stock and need to be reordered. The report should provide supplier information for the items identified.
4. Write a brief description of how the database could be enhanced to further improve management of the business. What tables or fields should be added? What additional reports would be useful?
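Reports 1 and 3 can be sketched as SQL queries over a hypothetical two-table design. This is a minimal Python/`sqlite3` sketch, not the contents of the Access solution file: the `Supplier` and `Bicycle` tables, their columns, and the sample rows are assumptions. Two further assumptions are flagged in the comments: markup is computed as a percentage of cost, and a bicycle is treated as low in stock when its quantity on hand is at or below its reorder level.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Supplier (
    supplier_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL,
    phone       TEXT
);
CREATE TABLE Bicycle (
    product_id    INTEGER PRIMARY KEY,
    name          TEXT NOT NULL,
    supplier_id   INTEGER NOT NULL REFERENCES Supplier(supplier_id),
    cost          REAL NOT NULL,      -- wholesale cost
    retail_price  REAL NOT NULL,
    qty_on_hand   INTEGER NOT NULL,
    reorder_level INTEGER NOT NULL
);
""")
# Hypothetical sample rows.
conn.executemany("INSERT INTO Supplier VALUES (?, ?, ?)",
                 [(1, "Alpine Cycles", "555-0101"), (2, "Zephyr Bikes", "555-0199")])
conn.executemany("INSERT INTO Bicycle VALUES (?, ?, ?, ?, ?, ?, ?)", [
    (101, "Trail 1", 1, 400.0, 600.0, 3, 5),
    (102, "Road 2", 1, 800.0, 1200.0, 10, 4),
    (103, "City 3", 2, 200.0, 300.0, 2, 2),
    (104, "Race 4", 2, 1500.0, 2100.0, 6, 3),
    (105, "Kids 5", 1, 100.0, 150.0, 20, 5),
    (106, "Tour 6", 2, 900.0, 1260.0, 1, 2),
])

# Report 1: five most expensive bicycles; markup as a percentage of cost (assumption).
TOP_FIVE = """
SELECT name, retail_price, qty_on_hand,
       ROUND((retail_price - cost) * 100.0 / cost, 1) AS markup_pct
FROM Bicycle
ORDER BY retail_price DESC
LIMIT 5;
"""
# Report 3: bicycles at or below their reorder level (assumption), with supplier details.
REORDER = """
SELECT b.name, b.qty_on_hand, b.reorder_level, s.name AS supplier, s.phone
FROM Bicycle AS b
JOIN Supplier AS s ON s.supplier_id = b.supplier_id
WHERE b.qty_on_hand <= b.reorder_level
ORDER BY s.name, b.name;
"""
top_five = conn.execute(TOP_FIVE).fetchall()
reorder = conn.execute(REORDER).fetchall()
for row in reorder:
    print(row)
```

Report 2 follows the same pattern: join Bicycle to Supplier and sort with ORDER BY s.name, b.name so that suppliers appear alphabetically and products are alphabetized within each supplier.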
Improving Decision Making: Searching Online Databases for Overseas Business Resources
Software skills: Online databases
Business skills: Researching services for overseas operations
List the companies you would contact to interview on your trip to determine whether they can help you with these and any other functions you think vital to establishing your office.
Student answers will vary based on the companies they choose to contact.
Rate the databases you used for accuracy of name, completeness, ease-of-use, and general helpfulness.
The U.S. Department of Commerce Web site contains a fair amount of economic information. However, it may be simpler to direct your students to go to http://www.aol.com. The Web site for the Nationwide Business Directory of Australia is http://www.nationwide.com.au
What does this exercise tell you about the design of databases?
Students may not understand that the World Wide Web is one massive data warehouse, but in non-technical terms that is exactly what it is. Remind them of this when they are completing this assignment. This assignment may best be accomplished in groups, where they can consolidate their findings into a written or oral presentation.
Review Questions
1. How does a relational database organize data and how does it differ from an object-oriented database?
Define and explain the significance of entities, attributes, and key fields.
• Entity is a person, place, thing, or event on which information can be obtained.
• Attribute is a piece of information describing a particular entity.
• Key field is a field in a record that uniquely identifies each instance of that record so that it can be retrieved, updated, or sorted. For example, a person’s name cannot be a key because another person can have the same name, whereas a Social Security number is unique. Similarly, a product name may not be unique, but a product number can be designed to be unique.
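A quick way to demonstrate why a key field must be unique is to let the DBMS enforce it. In this minimal Python sketch using `sqlite3`, the hypothetical `Product` table declares `product_number` as its primary key: two products may share a name, but a second row with an existing product number is rejected.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE Product (
    product_number TEXT PRIMARY KEY,   -- key field: must be unique
    product_name   TEXT NOT NULL       -- ordinary attribute: may repeat
)""")
conn.execute("INSERT INTO Product VALUES ('P-100', 'Widget')")
conn.execute("INSERT INTO Product VALUES ('P-200', 'Widget')")  # same name is allowed


def try_insert(number: str, name: str) -> bool:
    """Return True if the row was stored, False if the key field rejected it."""
    try:
        conn.execute("INSERT INTO Product VALUES (?, ?)", (number, name))
        return True
    except sqlite3.IntegrityError:  # raised for a duplicate primary key
        return False


duplicate_key_accepted = try_insert("P-100", "Gadget")
print(duplicate_key_accepted)  # False
```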
Define a relational database and explain how it organizes and stores information.
The relational database is the primary method for organizing and maintaining data in information systems today. It organizes data in two-dimensional tables, called relations, that consist of rows and columns. Each table contains data about an entity and its attributes. Each row represents a record, and each column represents an attribute or field. Each table also contains a key field to uniquely identify each record for retrieval or manipulation.
Explain the role of entity-relationship diagrams and normalization in database design.
An entity-relationship diagram graphically depicts the relationship between entities (tables) in a relational database. A well-designed relational database will not have many-to-many relationships, and all attributes for a specific entity will only apply to that entity. The process of breaking down complex groupings of data and streamlining them to minimize redundancy and awkward many-to-many relationships is called normalization.
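One common normalization step is replacing a many-to-many relationship with a linking table, so that each side has a one-to-many relationship with it. The sketch below, in Python with `sqlite3`, uses a hypothetical Student, Course, and Enrollment design in the spirit of the university enrollment exercise. Foreign keys on Enrollment also illustrate referential integrity: an enrollment that points to a nonexistent student is refused. SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled
conn.executescript("""
CREATE TABLE Student (
    student_id INTEGER PRIMARY KEY,
    name       TEXT NOT NULL
);
CREATE TABLE Course (
    course_id TEXT PRIMARY KEY,
    title     TEXT NOT NULL
);
-- Linking table: turns the Student <-> Course many-to-many relationship
-- into two one-to-many relationships.
CREATE TABLE Enrollment (
    student_id INTEGER NOT NULL REFERENCES Student(student_id),
    course_id  TEXT    NOT NULL REFERENCES Course(course_id),
    PRIMARY KEY (student_id, course_id)
);
""")
conn.executemany("INSERT INTO Student VALUES (?, ?)", [(1, "Lee"), (2, "Kim")])
conn.executemany("INSERT INTO Course VALUES (?, ?)",
                 [("MIS101", "Intro to MIS"), ("ACC201", "Accounting")])
conn.executemany("INSERT INTO Enrollment VALUES (?, ?)",
                 [(1, "MIS101"), (1, "ACC201"), (2, "MIS101")])


def enroll(student_id: int, course_id: str) -> bool:
    """Return True if stored; False if referential integrity rejects the row."""
    try:
        conn.execute("INSERT INTO Enrollment VALUES (?, ?)", (student_id, course_id))
        return True
    except sqlite3.IntegrityError:
        return False


orphan_accepted = enroll(99, "MIS101")  # student 99 does not exist
print(orphan_accepted)  # False
```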
Define an object-oriented database and explain how it differs from a relational database.
An object-oriented DBMS stores the data and procedures that act on those data as objects that can be automatically retrieved and shared. Object-oriented database management systems (OODBMS) are becoming popular because they can be used to manage the various multimedia components or Java applets used in Web applications, which typically integrate pieces of information from a variety of sources.
Although object-oriented databases can store more complex types of information than relational DBMS, they are relatively slow compared with relational DBMS for processing large numbers of transactions.
2. What are the principles of a database management system?
Define a database management system (DBMS) and describe how it works and its benefits to organizations.
A database management system (DBMS) is a specific type of software for creating, storing, organizing, and accessing data from a database. A DBMS consists of software that permits centralization of data and data management so that businesses have a single, consistent source for all their data needs. A single database services multiple applications. The most important feature of the DBMS is its ability to separate the logical and physical views of data. The user works with a logical view of data. The DBMS retrieves information so that the user does not have to be concerned with its physical location.
Define and compare the logical and physical views of data.
The DBMS relieves the end user or programmer from the task of understanding where and how the data are actually stored by separating the logical and physical views of the data. The logical view presents data as end users or business specialists would perceive them, whereas the physical view shows how data are actually organized and structured on physical storage media, such as a hard disk.
Define and describe the three operations of a relational database management system.
In a relational database, three basic operations are used to develop useful sets of data: select, project, and join.
• Select operation creates a subset consisting of all records in the file that meet stated criteria. In other words, select creates a subset of rows that meet certain criteria.
• Join operation combines relational tables to provide the user with more information than is available in individual tables.
• Project operation creates a subset consisting of columns in a table, permitting the user to create new tables that contain only the information required.
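To make the three operations concrete, the following Python sketch implements select, project, and join over tables represented as lists of dictionaries. It is a teaching illustration of the relational operations, not how a DBMS implements them internally; the part and supplier data are hypothetical. In SQL, the same result would come from a single SELECT with a WHERE clause (select), a column list (project), and a JOIN.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]
Table = List[Row]


def select(table: Table, predicate: Callable[[Row], bool]) -> Table:
    """Select: keep only the rows (records) that meet the stated criteria."""
    return [row for row in table if predicate(row)]


def project(table: Table, columns: List[str]) -> Table:
    """Project: keep only the named columns, dropping any duplicate rows that result."""
    result: Table = []
    for row in table:
        reduced = {col: row[col] for col in columns}
        if reduced not in result:
            result.append(reduced)
    return result


def join(left: Table, right: Table, key: str) -> Table:
    """Join: combine each pair of rows whose values in the shared `key` column match."""
    return [{**l_row, **r_row} for l_row in left for r_row in right if l_row[key] == r_row[key]]


parts = [
    {"part_number": 101, "part_name": "Bolt", "supplier_number": 501},
    {"part_number": 102, "part_name": "Hinge", "supplier_number": 502},
    {"part_number": 103, "part_name": "Latch", "supplier_number": 501},
]
suppliers = [
    {"supplier_number": 501, "supplier_name": "Acme Supply"},
    {"supplier_number": 502, "supplier_name": "Best Parts"},
]

# Which suppliers provide parts 101 and 102?
wanted = select(parts, lambda row: row["part_number"] in (101, 102))
result = project(join(wanted, suppliers, "supplier_number"),
                 ["part_number", "part_name", "supplier_name"])
print(result)
```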
Name and describe the three major capabilities of a DBMS.
A DBMS includes capabilities and tools for organizing, managing, and accessing the data in the database. The principal capabilities of a DBMS include data definition language, data dictionary, and data manipulation language.
• The data definition language specifies the structure and content of the database.
• The data dictionary is an automated or manual file that stores information about the data in the database, including names, definitions, formats, and descriptions of data elements.
• The data manipulation language, such as SQL, is a specialized language for accessing and manipulating the data in the database.
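The three capabilities can be shown side by side with SQLite, which ships with Python. In this minimal sketch the CREATE TABLE statement plays the role of the data definition language, SQLite’s `PRAGMA table_info` serves as a very limited, automated data dictionary (it reports column names and types, not business definitions or descriptions), and INSERT and SELECT are data manipulation language statements. The `Employee` table is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Data definition language (DDL): declare the structure of the database.
conn.execute("""
CREATE TABLE Employee (
    employee_id INTEGER PRIMARY KEY,
    last_name   TEXT NOT NULL,
    department  TEXT
)""")

# Data dictionary (limited): ask the DBMS catalog for each column's name and type.
# Each PRAGMA table_info row is (cid, name, type, notnull, default, pk).
dictionary = [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(Employee)")]

# Data manipulation language (DML): add and retrieve data with SQL.
conn.execute("INSERT INTO Employee (last_name, department) VALUES (?, ?)", ("Garcia", "Sales"))
conn.execute("INSERT INTO Employee (last_name, department) VALUES (?, ?)", ("Nguyen", "Finance"))
sales_staff = conn.execute("SELECT last_name FROM Employee WHERE department = 'Sales'").fetchall()

print(dictionary)
print(sales_staff)  # [('Garcia',)]
```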
3. What are the principal tools and technologies for accessing information from databases to improve business performance and decision making?
Define a data warehouse and describe how it works.
A data warehouse is a database with archival, querying, and data exploration tools (e.g., statistical tools) that is used for storing historical and current data of potential interest to managers throughout the organization, including data from external sources (e.g., competitor sales or market share).
The data originate in many of the operational areas and are copied into the data warehouse as often as needed. The data in the warehouse are organized according to company-wide standards so that they can be used for management analysis and decision making. Data warehouses support looking at the data of the organization through many views or directions. The data warehouse makes the data available to anyone to access as needed, but it cannot be altered. A data warehouse system also provides a range of ad hoc and standardized query tools, analytical tools, and graphical reporting facilities. The data warehouse system allows managers to look at products by customer, by year, by salesperson, essentially different slices of the data. Normal operational databases do not permit such different views.
Define business intelligence and explain how it is related to database technology.
Powerful tools are available to analyze and access information that has been captured and organized in data warehouses and data marts. These tools enable users to analyze the data to see new patterns, relationships, and insights that are useful for guiding decision making. These tools for consolidating, analyzing, and providing access to vast amounts of data to help users make better business decisions are often referred to as business intelligence. Principal tools for business intelligence include software for database query and reporting, tools for multidimensional data analysis, and data mining.
Describe the capabilities of online analytical processing (OLAP).
Data warehouses support multidimensional data analysis, also known as online analytical processing (OLAP), which enables users to view the same data in different ways using multiple dimensions. Each aspect of information represents a different dimension.
OLAP represents relationships among data as a multidimensional structure, which can be visualized as cubes of data and cubes within cubes of data, enabling more sophisticated data analysis. OLAP enables users to obtain online answers to ad hoc questions fairly rapidly, even when the data are stored in very large databases. Online analytical processing and data mining enable the manipulation and analysis of large volumes of data from many perspectives, for example, sales by item, by department, by store, or by region, in order to find patterns in the data. Such patterns are difficult to find with normal database methods, which is why data warehouses, OLAP, and data mining are usually used together.
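The multidimensional view can be sketched by aggregating one measure (sales amount) over whichever dimensions the user chooses. This Python example is a toy illustration of the idea behind an OLAP cube, not a real OLAP engine; the products, regions, years, and amounts are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Each fact is one sale; product, region, and year are the dimensions, amount is the measure.
sales = [
    {"product": "Nuts", "region": "East", "year": 2008, "amount": 50},
    {"product": "Nuts", "region": "West", "year": 2008, "amount": 60},
    {"product": "Bolts", "region": "East", "year": 2008, "amount": 40},
    {"product": "Bolts", "region": "West", "year": 2009, "amount": 70},
    {"product": "Nuts", "region": "East", "year": 2009, "amount": 30},
]


def aggregate(facts: List[dict], dims: Tuple[str, ...]) -> Dict[tuple, int]:
    """Sum the 'amount' measure over every combination of the chosen dimensions."""
    totals: Dict[tuple, int] = defaultdict(int)
    for fact in facts:
        totals[tuple(fact[d] for d in dims)] += fact["amount"]
    return dict(totals)


by_product = aggregate(sales, ("product",))                    # one face of the cube
by_product_region = aggregate(sales, ("product", "region"))    # a finer-grained view
east_2008 = aggregate(                                         # a slice: East region, 2008
    [f for f in sales if f["region"] == "East" and f["year"] == 2008], ("product",)
)
print(by_product)  # {('Nuts',): 140, ('Bolts',): 110}
```

Choosing different dimension tuples corresponds to viewing the cube from different sides or drilling down; filtering first, as in `east_2008`, corresponds to taking a slice.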
Define data mining, describe what types of information can be obtained from it, and explain how it differs from OLAP.
Data mining is more discovery-driven than OLAP. It finds hidden patterns and relationships in large databases and infers rules from them to predict future behavior, providing insights into corporate data that cannot be obtained with OLAP. The patterns and rules are used to guide decision making and forecast the effect of those decisions. The types of information obtained from data mining include associations, sequences, classifications, clusters, and forecasts.
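Associations are the easiest of these to illustrate. The sketch below is a simplified pairwise support-and-confidence count, not a full association-mining algorithm such as Apriori. The baskets and thresholds are hypothetical. A rule "A -> B" is kept when the pair appears in enough baskets (support) and B usually appears when A does (confidence).

```python
from collections import Counter
from itertools import combinations

# Hypothetical purchase transactions: each set is one customer's basket
BASKETS = [
    {"corn chips", "cola"},
    {"corn chips", "cola", "salsa"},
    {"corn chips", "salsa"},
    {"cola", "bread"},
]


def association_rules(baskets, min_support=0.5, min_confidence=0.6):
    """Return (lhs, rhs, support, confidence) rules for item pairs."""
    n = len(baskets)
    item_counts = Counter(item for basket in baskets for item in basket)
    pair_counts = Counter(
        pair for basket in baskets for pair in combinations(sorted(basket), 2)
    )
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n  # share of all baskets containing both items
        if support < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]  # P(rhs in basket | lhs in basket)
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return sorted(rules)


if __name__ == "__main__":
    for lhs, rhs, support, confidence in association_rules(BASKETS):
        print(f"{lhs} -> {rhs}: support={support:.2f}, confidence={confidence:.2f}")
```

Unlike an OLAP query, nobody asks "do salsa buyers also buy corn chips?" in advance. The rule emerges from the data.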
Explain how users can access information from a company’s internal databases through the Web.
Conventional databases can be linked via middleware to the Web or a Web interface to facilitate user access to an organization’s internal data. A user employs Web browser software on a client PC to access a corporate Web site over the Internet. The browser requests data from the organization’s database by sending a request, typically from an HTML form, to the Web server. Because back-end databases cannot interpret these Web requests directly, the Web server passes the requests for data to special middleware software that translates them into SQL so that they can be processed by the DBMS working with the database. The DBMS receives the SQL requests and provides the required data. The middleware transfers information from the organization’s internal database back to the Web server for delivery in the form of a Web page to the user. The software working between the Web server and the DBMS can be an application server, a custom program, or a series of software scripts.
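The middleware's translation step can be sketched as follows. This is a minimal illustration under stated assumptions: SQLite stands in for the back-end DBMS, and the URL, the `max_price` parameter, and the `product` table are hypothetical. Real deployments would use an application server or Web framework rather than a hand-written function.

```python
import html
import sqlite3
from urllib.parse import parse_qs, urlparse


def make_demo_db():
    """Hypothetical internal database with a small product table."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE product (sku TEXT PRIMARY KEY, name TEXT, price REAL)")
    con.executemany(
        "INSERT INTO product VALUES (?, ?, ?)",
        [("A1", "Stapler", 7.5), ("B2", "Desk lamp", 24.0)],
    )
    return con


def handle_request(url, con):
    """Middleware step: turn a Web request into SQL, and the result into HTML."""
    params = parse_qs(urlparse(url).query)
    max_price = float(params.get("max_price", ["1e9"])[0])
    # A parameterized query keeps user input from being run as SQL
    rows = con.execute(
        "SELECT name, price FROM product WHERE price <= ? ORDER BY name",
        (max_price,),
    ).fetchall()
    # Escape database values before placing them in the returned Web page
    items = "".join(f"<li>{html.escape(name)}: ${price:.2f}</li>" for name, price in rows)
    return f"<ul>{items}</ul>"


if __name__ == "__main__":
    db = make_demo_db()
    print(handle_request("http://example.com/products?max_price=10", db))
```

The browser never talks to the DBMS directly. It sees only the HTML fragment that the middleware hands back to the Web server.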
4. What is the role of information policy and data administration in the management of organizational data resources?
Define information policy and data administration and explain how they help organizations manage their data.
An information policy specifies the organization’s rules for sharing, disseminating, acquiring, standardizing, classifying, and inventorying information. Information policy lays out specific procedures and accountabilities, identifying which users and organizational units can share information, where information can be distributed, and who is responsible for updating and maintaining the information.
Data administration is responsible for the specific policies and procedures through which data can be managed as an organizational resource. These responsibilities include developing information policy, planning for data, overseeing logical database design and data dictionary development, and monitoring how information systems specialists and end-user groups use data.
In large corporations, a formal data administration function is responsible for information policy, as well as for data planning, data dictionary development, and monitoring data usage in the firm.
5. Why is data quality assurance so important for a business?
List and describe the most common data quality problems.
Data that are inaccurate, incomplete, or inconsistent create serious operational and financial problems for businesses: they can introduce errors into product pricing, customer accounts, and inventory data, and lead to poor decisions about the actions the firm should take. Firms must take special steps to make sure they have a high level of data quality. These include using enterprise-wide data standards, databases designed to minimize inconsistent and redundant data, data quality audits, and data cleansing software.
List and describe the most important tools and techniques for assuring data quality.
A data quality audit is a structured survey of the accuracy and level of completeness of the data in an information system. Data quality audits can be performed by surveying entire data files, surveying samples from data files, or surveying end users for their perceptions of data quality.
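The "survey the data file" form of an audit can be sketched as a completeness and validity report. The customer records and the validity rules below are hypothetical; surveying end users about their perceptions of data quality obviously cannot be automated this way. Passing `sample_size` surveys a random sample instead of the entire file.

```python
import random
import re

# Hypothetical customer file; None marks a missing value
CUSTOMERS = [
    {"id": 1, "name": "Ana Ruiz", "zip": "10001", "email": "ana@example.com"},
    {"id": 2, "name": "Bo Chen", "zip": None, "email": "bo@example"},
    {"id": 3, "name": None, "zip": "9021", "email": None},
    {"id": 4, "name": "Cy Diaz", "zip": "60614", "email": "cy@example.com"},
]

# Hypothetical validity rules; fields without a rule only need to be present
RULES = {
    "zip": lambda v: re.fullmatch(r"\d{5}", v) is not None,
    "email": lambda v: re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", v) is not None,
}


def audit(records, sample_size=None, seed=0):
    """Return completeness and validity rates for each field.

    Pass sample_size to survey a random sample instead of the entire file.
    """
    if sample_size is not None:
        records = random.Random(seed).sample(records, sample_size)
    report = {}
    for field in records[0]:
        values = [r[field] for r in records]
        present = [v for v in values if v is not None]
        rule = RULES.get(field)
        valid = [v for v in present if rule is None or rule(v)]
        report[field] = {
            "complete": len(present) / len(values),
            "valid": len(valid) / len(values),
        }
    return report


if __name__ == "__main__":
    for field, scores in audit(CUSTOMERS).items():
        print(f"{field:6} complete={scores['complete']:.0%} valid={scores['valid']:.0%}")
```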
Data cleansing consists of activities for detecting and correcting data in a database that are incorrect, incomplete, improperly formatted, or redundant. Data cleansing not only corrects data but also enforces consistency among different sets of data that originated in separate information systems.
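Both halves of that definition, correcting formats and enforcing consistency across systems, appear in the following sketch. The two source systems, their records, the partial state lookup table, and the (name, phone) match key are all hypothetical. Commercial cleansing software uses far richer matching rules. For example, Python's `str.title()` would turn "McDonald" into "Mcdonald".

```python
import re

# Hypothetical records from two systems that format the same facts differently
SALES_SYSTEM = [
    {"name": "  ana RUIZ ", "phone": "(212) 555-0101", "state": "new york"},
    {"name": "Bo Chen", "phone": "312.555.0199", "state": "IL"},
]
BILLING_SYSTEM = [
    {"name": "Ana Ruiz", "phone": "212-555-0101", "state": "NY"},
]

STATE_CODES = {"new york": "NY", "illinois": "IL"}  # partial lookup table, for illustration


def clean(record):
    """Correct formatting so records from different systems are consistent."""
    name = " ".join(record["name"].split()).title()  # trim and fix capitalization
    digits = re.sub(r"\D", "", record["phone"])       # keep digits only
    state = record["state"].strip()
    state = STATE_CODES.get(state.lower(), state.upper())  # standard two-letter code
    return {"name": name, "phone": digits, "state": state}


def merge_and_dedupe(*sources):
    """Clean every record, then keep one copy per (name, phone) match key."""
    seen = {}
    for source in sources:
        for record in source:
            cleaned = clean(record)
            key = (cleaned["name"], cleaned["phone"])
            seen.setdefault(key, cleaned)  # first occurrence wins
    return list(seen.values())


if __name__ == "__main__":
    for row in merge_and_dedupe(SALES_SYSTEM, BILLING_SYSTEM):
        print(row)
```

Deduplication works only because cleansing runs first. Before cleaning, the two Ana Ruiz records share neither a name spelling nor a phone format.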
Discussion Questions
1. It has been said that you do not need database management software to create a database environment. Discuss.
A database is a collection of data organized to serve many applications at the same time by storing and managing data so that they appear to be in one location. A database does not strictly require a DBMS. What is most important is the concept of a database: a model for organizing information so that it can be stored and accessed flexibly and efficiently. Without the right vision of a database and data model, a DBMS is not effective. A DBMS is special software used to create and maintain a database. It enables individual business applications to extract the data they need without having to create separate files or data definitions in their computer programs. That said, using a DBMS can reduce program-data dependence along with program development and maintenance costs. Access to and availability of information increase because users and programmers can perform ad hoc queries of the data in the database. The DBMS also allows the organization to manage data, data use, and data security centrally.
2. To what extent should end users be involved in the selection of a database management system and database design?
End users should be involved in the selection of a database management system and the database design. Developing a database environment requires much more than just selecting the technology. It requires a change in the corporation’s attitude toward information. The organization must develop a data administration function and a data planning methodology. End-user involvement can be instrumental in mitigating the political resistance organizations may have to many key database concepts, especially to sharing information that has been controlled exclusively by one organizational group.
Video Case Questions
You will find a video case illustrating some of the concepts in this chapter on the Laudon Web site at www.prenhall.com/laudon along with questions to help you analyze the case.
Teamwork: Identifying Entities and Attributes in an Online Database
With a group of two or three of your fellow students, select an online database to explore, such as AOL Music or the Internet Movie Database. Explore these Web sites to see what information they provide. Then list the entities and attributes that they must keep track of in their databases. If possible, diagram the relationship between the entities you have identified. If possible, use electronic presentation software to present your findings to the class.
Direct your students to these Web sites. In their analysis, students should quickly recognize that many of these sites use the same entities and attributes to keep track of their data.
There are hundreds of online movie databases, so students will have to select the one that interests them. The Web sites for AOL Music and Gracenote.com are listed below.
http://music.aol.com/
http://gracenote.com/
Business Problem-Solving Case: Can HP Mine Success from an Enterprise Data Warehouse?
1. Identify the problem described in this case. What people, organization, and technology factors were responsible for creating this problem?
At one time HP had:
• 5,000 information system applications
• 85 computer centers
• Between 19,000 and 22,000 servers
• 17 different database technologies
• 14,000 different databases in use
With all of that computing capacity the organization had these data-related problems:
• It couldn’t collect and analyze “consistent, timely data spanning different parts of the business”
• Systems tracked sales data differently
• Commonly used financial information was calculated differently in different business units
• Compiling information from various systems could take up to a week
• Seemingly simple questions were difficult to answer
Without a consistent view of the enterprise, senior executives struggled with decisions on matters such as the size of sales and service teams assigned to particular systems.
Factors that were responsible for creating this problem include:
People: As with most companies, HP experienced political turf issues. Not all departments want to depend on a central data warehouse supported by a centralized information systems staff for their data-analysis needs. HP’s departmental users initially resisted the idea of a central data warehouse. Many of them preferred smaller data marts configured to their particular needs.
Organization: HP had too many different information system applications in too many computer centers. It had too many different database technologies and way too many different databases. As with most organizations, departments were allowed to create, manage, and use their own databases without regard for sharing the data with other departments: islands of information at their finest. HP wanted its data warehouse to give its workforce access to data in real time with no departmental or geographic boundaries, but its old system fell far short of that goal.
Technology: All-inclusive data warehouses require enormous work to organize and integrate all the data. Knowledge of database technology and design principles is hard to find in a large pool of potential employees, techies and non-techies alike. HP lacked the hardware and software that would allow it to build such a large, consolidated database that is easily and quickly available to over 50,000 users.
2. What solution has HP chosen to fix this problem? Did management select the best solution alternative?
HP CIO Randy Mott began consolidating hundreds of data marts into a single data warehouse. He created a 300-person team that had experience in running data marts and charged them with modeling the enterprise-wide database. He had three goals for the database: it had to always be up-to-date, consistent for the entire enterprise, and complete.
The new database uses proprietary software developed by internal employees. At implementation, the warehouse contained 180 terabytes of raw data and 75 terabytes of functional data. Since the company anticipates the database will double in size by completion, it is reasonable to assume the team built scalability into the new hardware and software.
Whether management selected the best solution alternative depends on individual perceptions and experiences. Those students who’ve had good success working with very large, consolidated data warehouses will probably agree with HP’s solution. Others who’ve not had good success working with data warehouses probably will not agree with HP’s solution. The fact remains that the company had to do something about its data problems, especially the inability to serve timely, complete, and consistent information to managers. Apparently HP has had good success, since it has been able to market the home-grown system to other companies.
3. How much will HP’s database experience and technology help HP and its clients build all-inclusive data warehouses?
The fact that HP built its own data warehouse and had to experience the pain first-hand lends credence to the Neoview system as a potential product and service it can sell to other organizations. HP will understand the people, organization, and technology problems that other companies will have to work through, and it can offer real-world advice and expertise based on its own experiences.
4. How much will Neoview help HP and its clients create enterprise-wide data warehouses? Explain your answer.
HP promotes Neoview by differentiating it from typical data warehouses, which are costly, use proprietary technology (although so does Neoview), and tend to focus on one area of a business rather than an entire enterprise. The Neoview system was designed from the ground up to be an all-inclusive data warehouse that provides dexterity with table joins and gives the system the ability to perform analysis functions at the same time that it’s managing new incoming data. It includes all of the data used by a company rather than just partial segments of the data or of the company. Most warehouses don’t have that feature.
5. If you were in charge of developing an enterprise-wide data warehouse for your company, describe the steps you would have to take to complete this project. List and describe all of the people, organization, and technology issues that must be addressed to build an enterprise-wide data warehouse successfully.
The first step is to identify the real problem. In HP’s case the real problem was that data was inconsistent across the organization and the current system was slow to provide information to users. It simply did not give the organization a clear, concise, and consistent view of the entire enterprise.
The second step is to assemble the right people, both technical staff and business-unit users, who can develop an acceptable solution for the entire enterprise. The third step is to implement the solution, and the fourth step is to maintain the new system and processes.
Issues that must be addressed to build an enterprise-wide data warehouse successfully include:
People: Perhaps the most important issue is to convince employees, managers, and executives that the new system will be better than the old one. The organization’s change agent is responsible for ensuring that all the people in the organization accept the new system. Assemble the right people, techies and non-techies, who have the business knowledge and technical knowledge to build the database. Train, train, and train some more so users have complete knowledge of the new system.
Organization: Solve, or at least reduce, the political turf battles inherent in the old system. Show how the organization will benefit from a better system by having consistent, complete, and up-to-date information across all organizational boundaries.
Technology: HP’s new system uses familiar components, which will create a larger pool of people with the knowledge to run most data warehouses. The new system will emphasize cost and flexibility. Neoview’s hardware can be used to run other applications aside from those connected to the data warehouse. Most other warehouses in use do not incorporate 100 percent of a company’s data, as HP contends Neoview will. Neoview’s system uses servers with Itanium processors from Intel, so they meet industry standards and are more versatile than servers with proprietary technology. The system is highly scalable and promises availability 99.999 percent of the time.
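To put the 99.999 percent availability figure in perspective, a quick calculation (assuming a 365-day year and counting all hours) shows how little downtime it allows:

```latex
\[
(1 - 0.99999) \times 365 \times 24 \times 60\ \text{min}
  = 0.00001 \times 525{,}600\ \text{min}
  \approx 5.26\ \text{minutes of downtime per year}
\]
```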
Chapter Summary
Section 5.1: The Database Approach to Data Management
The relational database is the primary method for organizing and maintaining data in information systems today. It organizes data in two-dimensional tables, called relations, with rows and columns. Each table contains data about an entity and its attributes. Each row represents a record, and each column represents an attribute or field. Each table also contains a key field to uniquely identify each record for retrieval or manipulation. An entity-relationship diagram graphically depicts the relationships between entities (tables) in a relational database. A well-designed relational database will not have many-to-many relationships, and all attributes for a specific entity will apply only to that entity. The process of breaking down complex groupings of data and streamlining them to minimize redundancy and awkward many-to-many relationships is called normalization.
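Key fields, foreign keys, and the payoff of normalization can be seen in a small relational example. It assumes SQLite as the DBMS; the SUPPLIER and PART tables and their rows are hypothetical. Each supplier's name is stored once, each part row carries the supplier's key as a foreign key, and a join reassembles the combined view when it is needed.

```python
import sqlite3


def build_parts_db():
    """Create two related tables: each PART row points to one SUPPLIER row."""
    con = sqlite3.connect(":memory:")
    con.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
    con.execute("""
        CREATE TABLE supplier (
            supplier_number INTEGER PRIMARY KEY,  -- key field
            supplier_name   TEXT NOT NULL
        )""")
    con.execute("""
        CREATE TABLE part (
            part_number     INTEGER PRIMARY KEY,  -- key field
            part_name       TEXT NOT NULL,
            unit_price      REAL,
            supplier_number INTEGER REFERENCES supplier(supplier_number)  -- foreign key
        )""")
    con.executemany("INSERT INTO supplier VALUES (?, ?)",
                    [(1, "Acme Parts"), (2, "Zenith Supply")])
    con.executemany("INSERT INTO part VALUES (?, ?, ?, ?)",
                    [(101, "Door latch", 22.00, 1),
                     (102, "Side mirror", 12.00, 1),
                     (103, "Door molding", 6.00, 2)])
    return con


def parts_with_suppliers(con):
    """Join the tables on the foreign key to answer a question spanning both entities."""
    return con.execute("""
        SELECT p.part_name, s.supplier_name
        FROM part AS p JOIN supplier AS s ON p.supplier_number = s.supplier_number
        ORDER BY p.part_number""").fetchall()


if __name__ == "__main__":
    print(parts_with_suppliers(build_parts_db()))
```

Because "Acme Parts" is stored in one place, renaming that supplier means updating one row, not every part it supplies. The foreign key also stops a part from referring to a supplier that does not exist.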
An object-oriented DBMS stores data and procedures that act on the data as objects, and it can handle multimedia as well as characters and numbers.
Section 5.2: Database Management Systems
A database management system (DBMS) consists of software that permits centralization of data and data management so that businesses have a single consistent source for all their data needs. A single database services multiple applications. The most important feature of the DBMS is its ability to separate the logical and physical views of data. The user works with a logical view of data. The DBMS retrieves information so that the user does not have to be concerned with its physical location.
The principal capabilities of a DBMS include a data definition capability, a data dictionary capability, and a data manipulation language. The data definition language specifies the structure and content of the database. The data dictionary is an automated or manual file that stores information about the data in the database, including names, definitions, formats, and descriptions of data elements. The data manipulation language, such as SQL, is a specialized language for accessing and manipulating the data in the database.
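The three capabilities can be shown side by side, assuming SQLite as the DBMS; the table, columns, and dictionary entries are hypothetical. `CREATE TABLE` is data definition language. SQLite's `PRAGMA table_info` exposes the automated catalog of names and types. A separate, manually maintained table stands in for the business definitions a fuller data dictionary would hold. `INSERT`, `UPDATE`, and `SELECT` are data manipulation language.

```python
import sqlite3

con = sqlite3.connect(":memory:")

# Data definition language (DDL): specify the structure and content of the database
con.execute("""CREATE TABLE employee (
    emp_id INTEGER PRIMARY KEY,
    name   TEXT NOT NULL,
    dept   TEXT,
    salary REAL)""")

# Data dictionary, automated part: the DBMS catalog records names and types
catalog = [(row[1], row[2]) for row in con.execute("PRAGMA table_info(employee)")]

# Data dictionary, manual part (hypothetical): business definitions and formats
con.execute("CREATE TABLE data_dictionary (element TEXT PRIMARY KEY, definition TEXT, format TEXT)")
con.executemany("INSERT INTO data_dictionary VALUES (?, ?, ?)", [
    ("emp_id", "Unique number assigned to each employee", "integer"),
    ("salary", "Annual base salary in U.S. dollars", "decimal, 2 places"),
])

# Data manipulation language (DML): add, change, and retrieve data with SQL
con.executemany("INSERT INTO employee VALUES (?, ?, ?, ?)",
                [(1, "Kim", "Sales", 52000.0), (2, "Ortiz", "IT", 61000.0)])
con.execute("UPDATE employee SET salary = salary * 1.05 WHERE dept = 'Sales'")
high_earners = con.execute(
    "SELECT name FROM employee WHERE salary > 50000 ORDER BY name").fetchall()

if __name__ == "__main__":
    print(catalog)
    print(high_earners)
```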
Section 5.3: Using Databases to Improve Business Performance and Decision Making
Powerful tools are available to analyze and access the information in databases. A data warehouse consolidates current and historical data from many different operational systems in a central database for reporting and analysis. Data warehouses support multidimensional data analysis, also known as online analytical processing (OLAP).
OLAP represents relationships among data as a multidimensional structure, which can be visualized as cubes of data and cubes within cubes of data, enabling more sophisticated data analysis. Data mining analyzes large pools of data, including the contents of data warehouses, to find patterns and rules that can be used to predict future behavior and guide decision making. Conventional databases can be linked via middleware to the Web or a Web interface to facilitate user access to an organization’s internal data.
Section 5.4: Managing Data Resources
Developing a database environment requires policies and procedures for managing organizational data as well as a good data model and database technology. A formal information policy governs the maintenance, distribution, and use of information in the organization. In large corporations, a formal data administration function is responsible for information policy, as well as for data planning, data dictionary development, and monitoring data usage in the firm.
Data that are inaccurate, incomplete, or inconsistent create serious operational and financial problems for businesses because they may create inaccuracies in product pricing, customer accounts, and inventory data, and lead to inaccurate decisions about the actions that should be taken by the firm. Firms must take special steps to make sure they have a high level of data quality. These include using enterprise-wide data standards, databases designed to minimize inconsistent and redundant data, data quality audits, and data cleansing software.