Business Analysis for Business Intelligence

Preparing to load PDF file. please wait...

0 of 0
Business Analysis for Business Intelligence

Transcript Of Business Analysis for Business Intelligence

Business Analysis for
Business Intelligence
Business Analysis for Business Intelligence Author: Bert Brijs; ISBN: 978-1-4398-5834-9 (Hardback)
© 2013 by Taylor & Francis Group, LLC
Bert Brijs

CRC Press Taylor & Francis Group 6000 Broken Sound Parkway NW, Suite 300 Boca Raton, FL 33487-2742
© 2013 by Taylor & Francis Group, LLC CRC Press is an imprint of Taylor & Francis Group, an Informa business
No claim to original U.S. Government works
Printed in the United States of America on acid-free paper Version Date: 20120620
International Standard Book Number: 978-1-4398-5834-9 (Hardback)
This book contains information obtained from authentic and highly regarded sources. Reasonable efforts have been made to publish reliable data and information, but the author and publisher cannot assume responsibility for the validity of all materials or the consequences of their use. The authors and publishers have attempted to trace the copyright holders of all material reproduced in this publication and apologize to copyright holders if permission to publish in this form has not been obtained. If any copyright material has not been acknowledged please write and let us know so we may rectify in any future reprint.
Exce Business pt as permitted under Analysis U.S. Copyright Law, nfooprartBusiness of this book may be r Intelligence eprinted, reproduced, transmitAuthor: Bert Brijs; ISBN: 978-1-4398-5834-9 (Hardback) ted, or utilized in any form by any electronic, mechanical, or other means, now known or hereafter invented,
including photocopying, microfilming, and recording, or in any information storage or retrieval system,
without ©writte2013 n permissio by n from Taylor the publishers. & Francis Group, LLC
For permission to photocopy or use material electronically from this work, please access www.copyright. com ( or contact the Copyright Clearance Center, Inc. (CCC), 222 Rosewood Drive, Danvers, MA 01923, 978-750-8400. CCC is a not-for-profit organization that provides licenses and registration for a variety of users. For organizations that have been granted a photocopy license by the CCC, a separate system of payment has been arranged.
Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and are used only for identification and explanation without intent to infringe.

Library of Congress Cataloging‑in‑Publication Data

Brijs, Bert. Business analysis for business intelligence / Bert Brijs. p. cm. Includes bibliographical references and index. ISBN 978-1-4398-5834-9 (hbk. : alk. paper) 1. Business intelligence. 2. Decision making. 3. Strategic planning. I. Title.

HD38.7.B75 2012 658.4’72--dc23


Visit the Taylor & Francis Web site at
and the CRC Press Web site at

Mastering Data Management
Note that I do not use the term “master data management.” Not that I don’t like the concept but I am afraid it will go the same route as the corporate data warehouse in the pre-Kimball era: a nice idea but extremely hard to realize in everyday business practice. Again the tension between providing the business with rapid response to their demands and maintaining a well-architected master data management may push the choice
toBanuismipneerfescst bAutnuaselfyulseinsvifroonrmBenut isnisnteeadsosf tIhnetpeerllfiegctesonlucteion, Authowrh:icBh eis rntevBerrgijosin;gItSo BcoNme: o9ut7o8f -th1e-p4la3n9ni8ng-5ro8om3.4T-h9e t(ruHeavarldueback)
of Ralph Kimball’s work to the development of working corporate data
ware©hou2se0s,1e3mebrgyingTfaroymlohran&ds-Fonradnatca imsaGrtsr, oisurpap,idLlyLdCeliver-
ing business value, but using conformed dimensions to facilitate large integrated warehouses addressing all the subject areas cannot be underestimated. Yet, data management in any of the projects approached (whether the old top-down from Bill Inmon or the bottom-up approach from Kimball) is often an orphan in the process. Just like documenting the project and its deliverables, it is regarded a tedious and superfluous extra on which neither the technicians nor the analysts like to spend their time.
Data management in a Business Intelligence (BI) project is like the blood circulation in an organism: it is needed everywhere but appears in various doses and throughput levels through various blood vessels from the aorta of the extract, transform, load (ETL) process to the capillaries of everyday atomic data capture on the shop floor. The following sections intend to create awareness for clarity and understanding of uniform master data such as “product” or “party” (internal and external, such as supplier and customer) during the business analysis process.

256  •  Business Analysis for Business Intelligence
Data management with regard to BI and your role as a business analyst reflects on the following aspects:
• Managing master data and the system of record • Source analysis • Data profiling • Source-to-target mapping • Metadata • Data management architecture • Business analysis issues
Master Data
DaBtaumsainnaegesmsenAt nfoar lByussiinsesfsoIrntBelluigesnicne eresqsuirIens tteimlleigaendncefefort Authotor:gaBtheerrtanBdruijnsit;e IeSveBryNde:p9ar7tm8en-1t i-n4t3he98or-g5an8iz3at4io-n9ar(oHunad rtdheback)
major entities:
© 2013 by Taylor & Francis Group, LLC
• Party which is a generalization of: • Internal party, which in its turn can be a: −− Person, that is, DIM_EMPLOYEE or DIM_PARTNER −− Organization, that is, DIM_ORGANIZATION or DIM_BUSINESS_UNIT • External party, which can also be a: −− Person, that is, DIM_CUSTOMER −− Organization,thatis,DIM_CUSTOMERorDIM_CHANNEL
• Product describes the physical or service delivery to internal and external customers. It can comprise purchase items, saleables, or consumables but not gross plant addition (GPA).
• Geography is the territory or region that can be strongly determined by the type of operation or the department looking at it.
• Time can be trickier than one may think at first sight: business hours may vary in full continuous operations like the process industry or logistics and in a global or multicultural environment an eight-hour workday may be distributed over eight, ten, or even twelve hours. “Siesta” ring a bell?

Mastering Data Management  •  257

• Time can be registered from three viewpoints: • The happening time: which represents when the event actually happened or “will happen” as in budgeted events. • The valid time: this can be both a point in time and a period when an event or a status was valid. It is a derived value from a slowly changing dimension type 2 or type 3. • The system time: also known as “SYSDATE,” or the time when the event was recorded.
• Calendar where the business days vary over time and per country or even per region may also require a little bit more attention than expected.

More on these concepts in the section on generic definitions. In complex IT architectures, it can be useful to construct an
extended CRUD matrix, superseding the level of one system or database. This high-level matrix shows the relationships between (master) data needed for BI and the applications where the data are created
(CB), uresadin(Re),suspdAatneda(Uly)s, oirsdfeoletredB(uD)s. PinoteenstsialIinctoenlsliisgteennciecseare Authoears:ieBr teorttraBceriinjsa;CIRSUBDNm:at9rix7.8D-o1n’-t4ex3p9ec8t -to5d8o3th4is-9in (aHonae-rodffback)
Tab©le 2140.11t3ellsbuys Timamyeldoiarte&ly wFhrearentcheismGastreor udapta,’sLsLysCtem of
record resides: for customer, order, and territory it is the customer relationship management (CRM) system, for product it is the inventory system, and for the Human Resources (HR) member, the Human Resources Management (HRM) system. But the U’s here and there point to applications that add data to the master data such as accounts receivable adding a customer contact responsible for the accounts payable at the customer site or the territory specification for staff members other than sales and marketing people in the HRM application.

TABLE 14.1

Example of an Extended CRUD Matrix Typea CRM Accts Receivable










HR Member





a Application x target data element.

R R –


Accts Payable
R R R R –

258  •  Business Analysis for Business Intelligence
Extended CRUD matrices can be constructed in various flavors depending on the viewpoint:
• [Application x target data element] as in the example above • [Application x source data element], which produces a much more
refined view as each source record’s life cycle is mapped in the matrix • [User x source data element], which is only relevant in the case where
one user has CRUD access to many applications. In that case a drilldown of “applications per user” will be necessary.
Source Analysis
The source analysis should yield the following information per entity and its attributes:
• Entity name, definition, source application, table with the master file, and the data definition owner.
B• uAsttirnibeutsesnaAmen, adelfyinsitiisonf, odartaBtyupes, idnomeasins(tIhnetleisltliogf veanlucesefor Author: BdiescrrtetBe vrailjuses; aInSdBthNe d:at9a r7a8ng-e1fo-r4c3on9ti8nu-o5u8s v3al4ue-s9). (Hardback)
• Physical data model.
• ©Acc2es0s 1an3d sbecyurTityamyeltohord&s (ruFlersa, pnascswisorGds,reotcu.).p, LLC
• Check how the data are used by business processes and complete the extended CRUD matrix in complex environments.
• Check how the data are used by applications and complete the extended CRUD matrix.
• Check for gaps in data domains such as: insufficient data currency, availability, or relationship gaps.
• Assess a rough approach to the necessary transformations for BI purposes.
Data Profiling
Some practitioners exclude this exercise from the business analyst’s responsibilities. I tend to disagree as data profiling can provide the acid test for the assumptions made during the analysis. Sometimes the business user (or even the source IT specialist) has spent less time with the company than the specific tables he is talking about. It wouldn’t be the first time the client tells you he has 78,000 clients, which a simple count rejects with the proof of the existence of half a million customer records!

Mastering Data Management  •  259

TABLE 14.2
Simple Data Profiling Analysis

# of Instances
300,895,742 300,895,700 300,895,645
56,800 89,478,484 12,874,595

# Nulls
5,894,755 42 107 0
58,7999 450,022

Datatype Not Conform
215 0 0 0 0 0

Table 14.2 teaches us that a large proportion of values is missing and the source system does not support data consistency.
Data profiling comes in two flavors: business or contextual data profiling and technical or out of context profiling. The first checks the assumptions from the business community: that is, “Every employee has a manager,” or, “Every active customer has received at least one invoice in the last three years,” and so on. The technical profiling checks the feasibility for ETL and
anBswuerss ianllethsesneAcesnsaarylyqsueisstiofnos rtoBscuopseitnheecsosmpIlnextiteylalingdewnorckleoad Authoofr:thBe EeTrLt. Brijs; ISBN: 978-1-4398-5834-9 (Hardback)
Bef©ore2y0ou1s3tarbt tyheTbausyinleossr p&rofFilirnag,nycouisshGourldoauskp,yoLurLseClf (and
your customer) about her expectations about the quality of this job. The higher the quality requirement, the more thorough work is needed and the larger the sample size is required, up to profiling large tables entirely. Don’t underestimate the time and resources needed for this job. The sample may be random using SQL functionality to pick every nth record for profiling or it may be a stratified sample using an important decision variable to spread the sampling in a meaningful way. Common strata are “Product,” “Customer,” “Employee,” and “Time.”
Unless you are asked to estimate the end-to-end project workload you needn’t be involved in the technical data-profiling exercise. Nevertheless, don’t take on this job without the involvement of the technician(s) who will actually do the ETL design and build the data warehouse. Ignore this rule and you may find yourself in big trouble as soon as the builders come in.
Source-to-Target Mapping
Some practitioners also exclude this exercise from the business analyst’s responsibilities and here I tend to agree. Nevertheless, one should be aware of this exercise and consult the results as this may prove to be a reality

260  •  Business Analysis for Business Intelligence
check for the assumptions made during the analysis. Make sure you and the ETL designer have an exchange of ideas and information about this stage. In Table 14.3 the transformation column at least will certainly be of interest to the analyst.
Metadata Management for Business Analysts
I couldn’t agree more with Kimball (2004) who says the following about metadata tools in his book, The Data Warehouse ETL Toolkit:
Metadata is an interesting topic because every tool space in the data warehouse arena including business intelligence (BI) tools, ETL tools, databases, and dedicated repositories claims to have a metadata solution, and many books are available to advise you on the best metadata strategies. Yet, after years of implementing and reviewing data warehouses, we’ve yet to encounter a true end-to-end metadata solution. Instead, most data warehouses have manually maintained pieces of metadata that separately exist
BacurosssitnheeirscosmAponneantlsy. sis for Business Intelligence AuthorK: iBmbearllt’s Bdersicjrsip;tiIoSn Bof Nme:ta9d7ata8s-t1ru-c4tu3re9s 8is-c5om8p3le4te-9and(Hseraversdaback)
basis©for2an0y1bu3sinbesys aTnaalyystlowhro&waFntrsatongectidseeGperroinutopth,e LsuLbjCect. Let
me pick out the metadata questions about metadata the business analyst should ask before, during, and after the project.
Before the Project
These questions are mainly about the front room metadata that describe the source data from a business point of view as well as the dimensions and their attributes and the measures in the fact tables. These are to be found in the report requirements or in the business requirements. In complex environments, an application landscape can provide a high-level lineage view on crucial data such as customers and employees. For example, an organization may have different input possibilities like a checkout and a call center, and external customer data sources that need to be reconciled in a customer master data repository.
In this phase, a first estimate of data volumes may be useful to determine the technical scope of the project. In the case of heavy security demands (e.g., HRM analytics or BI for legal or policing purposes) the metadata should also include which profiles (or even persons) have access rights to

TABLE 14.3

Example of a Mapping Table

Target Table

Target Column





Mastering Data Management  •  261
Business Analysis for Business Intelligence Author: Bert Brijs; ISBN: 978-1-4398-5834-9 (Hardback)
© 2013 by Taylor & Francis Group, LLC

Source Table

Source Column

Transformation Rule
If empty THEN take same as DESCRIPTION_UK
Replace ‘Z’ by ‘9’ e.g., 7070ZZZZ becomes 70709999

262  •  Business Analysis for Business Intelligence
which data. Finally, a conceptual or logical data model of the data warehouse or the data mart is both the result and the documentation of the business analysis process. This should provide enough input for the project definition and the project charter.
During the Project
As the business analyst delves deeper into the analysis levels and the metadata are well documented from a business context perspective, the source metadata are described in technical terms: for example, datatypes, record lengths, source to target lineage, results from data profiling, default values, null values, and missing data handling, mandatory flags, slowly changing dimension attribute handling, data quality policies and strategies, business rules on the source systems, and so on. During the project, the physical data model serves as a guide and is also used for technical documentation purposes.
AfBteur tsheinPerosjesctAnalysis for Business Intelligence Author: Bert Brijs; ISBN: 978-1-4398-5834-9 (Hardback)
I advise the business analyst to play more of a documentalist role as this
third©laye2r0of1m3etabdyataTisamyolsotlyra&bouFt trhae nETcLisproGcersos, ua fpur,thLeLr rCefining
of data lineage and a description of the business translations of technical row and column headers. This is the realm of the developers.
The following recommendations are not the gospel. Whatever classification and representation framework you choose, as long as they are consistent, practical, and first and foremost manageable, you are avoiding the two major pitfalls of data management: getting bogged down in unworkable procedures on the one hand or creating data chaos leading to new stovepipes of “Never-mind-I’ll-do-it-myself” solutions. I combine the elegance of the Dublin Core’s metadata catalog with the complexityreducing Zachman framework and deliver this blend with the usability of the structured writing approach. Before you get too mixed up by all this, let us describe the three building blocks.