
The Plan: 7 Days to OLAP

One of the formative influences on my tech career was Douglas Coupland’s novel “Microserfs”. The depiction of the isolation and focus, as the developers eschewed everything else and poured themselves into writing code, sounded exciting… in a very nerdy way.

For the past 4 years, PASS has been my annual adaptation of that immersion experience (it’s also as close to living on a campus in Redmond as I’m comfortable with). I live and work in Toronto, Canada, so coming to Seattle means putting some real distance between myself and home/office life. While I’m here I pour in a lot of hours and energy, and always come out ahead: diagrams sketched on the back of bar napkins, chunks of pseudo-code written in Notepad and saved on the desktop, and poorly formatted TSQL clearly written after midnight.

It’s not just the keynotes, seminars, and demos – it’s the dialogue with my SQL Server brethren at the conference, the time spent at the hotel hacking it all together, and the lack of distractions that really make it a productive experience. There’s great coffee on every block, a gym in the hotel, and a bed three feet from my desk. I never see any sights or come home bearing gifts. I eat the same takeout from about two different burrito spots and get scolded by my wife for not inviting her along year after year. But I love it – once a year, the immersion works for me.

This year, I’m upping the ante.

Instead of spreading myself over several different tracks, wandering promiscuously from sessions on index tuning using DMVs to sessions on formatting pie-charts in SSRS, I’m focusing exclusively on OLAP. I’ve got zero experience on that aspect of SQL Server, but I can’t wait to dive in.

So, here’s what I’m gunning for: it’s Friday night in Seattle. I have 7 days before I fly back to Toronto, and I want to fly back with an OLAP solution on my laptop. It’s not going to be pretty: I’m not going to plan it very carefully, or document it, or commit it to source control, or anything like that. It’s a frantic sprint, to use the Agile (and running) metaphor. The kind of sprint where you’re wearing jeans and dress shoes, and haven’t done any stretching. I’ll surely trip, stumble, and fall, but with any luck, I’ll make it to the proverbial finish line anyway.

Away we go…

Day 327: From Prototype to Production

Cube sketch

In November of 2012, I embarked on a Quixotic attempt at building a BI prototype in 7 days. I dubbed the experiment ‘OLAP Sprint’.

The plan was to leverage everything great about the PASS Summit (access to experts, time away from the office to focus, a stimulating environment) and build a functional multi-dimensional cube from scratch. I had taken some courses and done some reading, but this was to be my first time putting everything together. By the end of my 7-day sprint in Seattle, I had my scotch-tape-and-paper-clips multidimensional cube built. It was a totally engrossing experience: frustrating, exhilarating, stimulating, and ultimately, totally rewarding.

Aside from the invaluable learning experience, the OLAP Sprint also produced a proof-of-concept that I could share with the business leaders at my office, to show them the light of BI, and set things in motion to upgrade our relational reporting solution to something more scalable and dynamic using SQL Server Analysis Services.

I am happy to report that, almost a year later, we have transformed that prototype into a production-level BI solution, and are primed to release our first set of reports.

What follows are some of my observations on the stretch of development that has led me to this point – from Prototype to Production.

Learning Methodology

I underestimated the challenge of going from thinking about the world in two-dimensional OLTP objects to thinking about it in a multi-dimensional OLAP cube of facts and dimensions. The deep-seated rules of normalization, referential integrity, and constraints made it awkward and uncomfortable when we first started building out the cube’s dimensions.

In order to get myself in the right frame of mind to deal with that transition, I tried to replicate some of the environmental aspects of the PASS Summit – most importantly the focus and concentration – by blocking most of my days for BI work only, and taking care of operational duties en masse on other days. I also attended the inaugural PASS Business Analytics conference, which helped to solidify the new themes and concepts that I was learning.

I also alternated between conceptual learning and technical learning. On some days, I’d carve out time to read from “The Data Warehouse Lifecycle Toolkit” (Ralph Kimball), which built my understanding of BI themes and concepts. On other days, I’d focus on executing a certain technical task – for example creating a simple report in SSRS that reported on the cube, from “Professional MS SQL Server 2012 Reporting Services” (Turley, Bruckner, et al). The combination of the two approaches kept my learning balanced.

Expert Help

Before we got started, we had a ‘buy vs. build’ debate internally, to assess the time and risk of doing the build ourselves versus using a consultant to help us. In the end, we engaged a consulting company that specializes in the SQL Server product stack to help us. This was a huge time-saver for us, and also reduced the risk involved, because they were responsible for the major architecture. Our consultant was not only technically very astute, but also a good teacher. The experience was stressful at times, but enjoyable, and we were totally engaged in the development along the way, which means we’re comfortable owning it now.

Development Methodology

When it came time to actually build the SSIS packages, or attribute dimensions, or reports, or whatever, we diagrammed things out on a whiteboard beforehand. It allowed us to make mistakes, and when we sat down to do the work in SSDT, we had a goal to work toward. We also prioritized our deliverables, and in many cases, sought to simplify whatever it was we were working on. We gave ourselves realistic, achievable targets and made sure we hit our marks. If we got stuck, we simplified the deliverable. A good example was user hierarchies: when we first began, we envisioned them as robust and complex, but they were difficult to reproduce. In the end, in order to stay on track, we reduced their complexity quite a bit, in order to make sure that we could deliver something working. This is Agile development methodology 101, and it worked great for us.

I also tried to respect my limits. There were days when I’d want to put in more time, but my brain stopped being able to digest or assimilate the information. Knowing when to push a little harder to solve a stubborn bug and when to throw in the towel and come back fresh in the morning was vital. More often, a fresh mind solved problems faster than a stubborn one.

The ETL Takes a While

I had heard estimates that building the ETL/Data warehouse tiers of a BI project could consume as much as 80% of total time spent on the project. That number always sounded exaggerated to me, but when I counted up the days spent doing development on that tier, it was totally accurate.

In my prototype, I used views based on the OLTP source system to act as my OLAP fact and dimension tables, so I effectively skipped the ETL tier. That obviously wasn’t a feasible model for our Production solution. We spent a lot of time designing and testing packages in SSIS, to make sure that the data coming out of the source system made it to the data warehouse properly.

Like many others, we assumed that our data was clean. Boy were we wrong, especially with our geographic/address data. As we tried to build out our attribute relationships, we would be notified of duplicate keys in our dimension attributes, and it led to quite a period of data investigation and cleaning. The Data Profiling task in SSIS helped with that.
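For anyone hitting the same wall: the check that eventually became routine for us was a simple grouping query against the dimension’s source, run before processing. This is only a sketch with invented table and column names, but it shows the shape of it – an attribute key that maps to more than one value of a related attribute will surface as a duplicate key error when the dimension is processed.

-- Hypothetical staging table and columns, for illustration only:
-- a postal code that maps to more than one city will break the
-- PostalCode -> City attribute relationship when the dimension is processed
SELECT
    PostalCode,
    COUNT(DISTINCT City) AS DistinctCities
FROM dbo.DimGeographyStaging
GROUP BY PostalCode
HAVING COUNT(DISTINCT City) > 1;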

Don’t even get me started on MDX

Having progressed nicely from building the data warehouse and ETL tiers, I ran into MDX like a steel wall. I was at first confident that my knowledge of TSQL would help me make the transition, but was sorely disappointed. In fact, my background in TSQL actually interfered with my learning MDX, because some of the familiar constructs (SELECT, FROM, WHERE) have substantially different applications in MDX.

The biggest difference, aside from syntax nuances, is of course that TSQL deals with the 2-dimensional world of columns and rows, but MDX isn’t limited to two dimensions. Visualizing the multi-dimensional result set that I wanted to produce was difficult, so writing the code to get there was challenging. The life-saver is the query designer that Microsoft sprinkles throughout SQL Server to help you get what you want (in SSAS, SSMS, SSDT, etc.). You can build the query visually, and then hit the ‘Design Mode’ button to see the MDX behind it.

A life-saver for learning MDX.
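To make the contrast with TSQL concrete, here’s a minimal query against my cube (the measure and attribute names are the same ones used in the queries later in this post), with comments on where the familiar keywords behave differently:

-- SELECT places sets on axes (COLUMNS, ROWS), not a list of columns
SELECT
    [Measures].[Total Price] ON COLUMNS,
    [DimDate].[Year Name].MEMBERS ON ROWS
-- FROM names a cube, not a table
FROM [OLAP Sprint DW]
-- WHERE is a slicer tuple that filters the whole result, not a row-by-row predicate
WHERE ( [DimProduct].[Confection Type Major Desc].[Chocolate] );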

Oh My God is it Ever Fast

The performance results have been unbelievable. We were giggling like school girls when we ran our first set of reports, and hugely relieved to see just how well the BI solution performed. Queries that had taken minutes to run on OLTP were taking fractions of a second on OLAP. Having been the one to pioneer and advocate for a BI solution, this was immensely satisfying for me. There was a huge investment of time, money, and effort, and to see it all pay off when we ran those reports made everyone breathe a sigh of relief, and smile at the possibilities.

Conclusion

One of the more illuminating quotes I heard from a presenter at the PASS BA conference in Chicago was the following: “BI projects aren’t projects. They’re Programs”. With this in mind, I’m excited about the future potential of this program, fully aware that just because the build is over, it doesn’t mean the work is done. Hopefully, the best is yet to come.

Day 7: 9:00pm: Parting Shots (of Glenlivet 12 Y/O)

PASS Summit 2012 Day 3
Up at the crack of dawn for Lynn Langit’s great session on Amazon’s cloud offerings, called AWS. They’ve got several managed service offerings of interest for DBAs, including:

  • EC2: a VM service
  • S3: a storage service
  • Glacier: a backup service
  • RDS: a managed database service

The most compelling for me was obviously RDS, the managed database service. According to Lynn, AWS has been in the market the longest, is usually the most cost-effective, and is most often used in production (highest level of confidence). That’s a compelling trifecta.

We recently had some problems with the HVAC in our server room, and it made us all take pause and wonder if we shouldn’t be moving things off-premises and into the Cloud. So, although it doesn’t apply directly to my BI project, I would consider moving my OLTP environment there, including the Dev, Testing, and Production instances. I’d love to hear some stories of people who have migrated from on-premises to RDS, so if you’re out there, please share.

I also picked up a great resource from Microsoft, at their pavilion. It’s a one-page document that diagrams all of the components of the BI offerings. For context, it takes up about half of my bed here in the hotel room – that should give you an idea of how invested Microsoft is in BI. You can download it here, and then head over to Kinkos and have them print it out on a 5 x 3 ft poster for you.

Microsoft BI at a Glance

RT 90, Anyone?
Lynn also mentioned an NFP that she co-founded, which aims to teach kids programming. The programming language that they use is based on Logo, which is what they taught us in Grade 6. Anyone else remember this:

Who else remembers the turtle?

The other highlight of the day was the Birds of a Feather lunch with some other BI professionals. It’s always great to connect in person with professional confrères, and share stories.
(Home run buffet lunch, btw.)

A BI Sam Witwicky
As far as my OLAP Sprint goes, I leave here tomorrow with the following in tow:

  • 1 SSDT MOLAP Solution
  • 1 Fact Table
  • 5 Dimension Tables
  • 1 Cube
  • Roles, KPIs, Named Calculations, Translations, etc.
  • 2 SSRS Reports
  • 0 trips to the Space Needle, Pike St Market, etc.

Barring something terrible happening to my laptop, I’ll return to the office on Monday with a prototype cube on my laptop, which was my goal. I have to say, I feel a little bit like Sam Witwicky in Transformers at this point – ready to protect the cube at all costs.

Protect the Cube at all Costs

I’m very pleased with what I managed to accomplish this week.
I realize that I am lucky to have had the opportunity to execute my experiment, and I don’t think I could have done it in just 7 days without the following key elements:

  • Focus (on one goal)
  • Isolation (from regular work/life responsibilities)
  • Immersion (in SQL Server)
  • Access to Resources (Experts, Community Members, Vendors, Books, etc.)

The PASS Summit is undoubtedly the ideal environment for those elements to come together, and make the experiment work.

Please, Try This At Home
For anyone out there contemplating doing a similar experiment, I can’t recommend it enough. As IT professionals, we’re often responsible for planning and executing our own training, and I couldn’t be happier that I decided to take this on, and that I had executive support for it. It was intimidating at the outset, and frustrating at several points along the way (I had some Winnebago Man {NSFW} moments with attribute relationships and MDX), but boy was it worth it. If you’re thinking of doing something similar, find a way to make it work with the resources that you have – you won’t regret it.

I’m also completely exhausted. You could use the BIT datatype to count how many times I went to bed before midnight, but you’d need at least a TINYINT datatype to count the cups of coffee I’ve had in 7 days.

Yes, I know, that table would not scale well.

“Is E-mail Down?”
No VCs have emailed me yet about my buffet fly-over idea, so maybe our Exchange server was down for a while or something. Shouldn’t be too much longer. I’ll submit a helpdesk ticket for our sysadmin to make sure. He’ll like that.

Next Steps
The last chicken burrito is down the hatch (sniffle), which means that Chipotle should beat estimates when they next report earnings, but also that my time in Seattle has come to an end. The next steps for me involve a discussion with the executives about how and whether to implement this for real. Those discussions will require careful consideration of the costs involved, and how to scale our resources to build and support a BI platform.

All that can wait until Monday, though.

Day 6: 11:20pm: Dimension Security for the Common Man

PASS Summit 2012 Day 2
Unfortunately, I accidentally slept through this morning’s Keynote address. I was sorry to have missed it, but I think I needed the extra sleep: I’ve been putting in a lot of hours each day, both on my OLAP project and soaking in all that the PASS Summit has to offer. I wish I could say that I woke up to the sun streaming in through the windows, but I don’t think I’ve seen the sun once this week. Nobody at the PASS Summit seems to mind, though – myself included. Clearly we’re a sheltered people.

I attended some great sessions today. My favourite was a session on the ColumnStore index, delivered by Klaus Aschenbrenner. As I noted yesterday, the access to high-level expertise is energizing and inspiring, and Klaus’ session was no different. I went in to the session knowing just about nothing about ColumnStore indexes, and came out wanting to try them out. From what I gathered, they’re a new type of non-clustered index suitable for Fact tables in data warehouses. The indexes work on the column level, rather than the row level, so the level of compression that they can achieve is impressive, and it parlays into ridiculous query speed. Klaus took us through an exploration of the storage and memory footprint, the execution plan, restrictions and optimizations, and a highlight reel of ‘watch this’ moments where the ColumnStore index ripped through 2 billion records of data in seconds.
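I haven’t tried one on my own fact table yet, but the syntax looks simple enough. A minimal sketch, with a hypothetical fact table and column list:

-- Hypothetical fact table; include the columns the warehouse queries aggregate and group on
CREATE NONCLUSTERED COLUMNSTORE INDEX NCCI_FactSales
ON dbo.FactSales (DateKey, ProductKey, RetailerKey, Quantity, TotalPrice);

-- Note: in SQL Server 2012 the table becomes read-only while this index exists,
-- so it has to be dropped or disabled before loading new data, then rebuilt.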

I also attended a Power View for MOLAP session, and got to see the coveted tool in action, popping a multidimensional cube to life. I spoke with a Microsoft Engineer at the end of the session, and he hinted that it might be released as early as December. He did caution that “the QA team will kill us if we announce an official date”, so I guess they’re not quite ready for release yet.

Buffet Innovation
Lunch was dynamite once again (salad, beans, tomatoes, mashed potatoes and salmon), and while standing in the buffet line, I thought of a great idea: a buffet fly-over. It would be something similar to the fly-overs that you see watching golf on TV – a camera that swoops over the hole and gives the viewer a sense of how it plays (hazards, fairways, optimal lies, etc). The challenge at the front end of the buffet line is knowing what lies ahead, and how to ration the real estate on your plate. A buffet fly-over video playing on a loop at the front of the line would solve that.

Okay – back to reality.

Securing the Cube
I spent most of this evening experimenting with the various ways of securing cube data. I was really impressed by how flexible the security model is, and how easy it was to implement.

I started with the following goal in mind: within the Product dimension of my cube, allow suppliers to see only their data. SQL Server Analysis Services implements this with the concept of Roles, and Dimension Security, in tandem with Windows users and groups.

So, the first thing I did was create some Windows users on my laptop, to mimic product suppliers, so I could test things out with them. Having spent most of my creative energy conceiving of the buffet fly-over, there wasn’t much left in the well for clever supplier company names. So we’re stuck with the following: Dee Licious, Yumm Inc, Choc Inc, Sugarize, and GumGum.

New Windows Users

With just a few clicks in SSDT, I created a new role for the Dee Licious company, and mapped it to the Windows user that I created. That was the first point to really understand: Windows users are mapped to SQL Server Analysis Services Roles, and Roles take it from there – defining who gets to see what Dimension Members and Cells.

From there, all I had to do to ensure that the DeeLicious user could only retrieve results for its products was configure the Dimension Data tab inside the Role designer. It was as simple as selecting the appropriate Dimension (DimProduct), and then deselecting all other supplier members from the SupplierCompanyNameEN attribute. When I flipped over to the Advanced tab, the MDX required to implement the security was revealed, presumably so that I could edit it to implement something more complex (no thanks).

Dimension Security
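Out of curiosity, I looked at what that Advanced tab actually holds: the ‘allowed member set’ is just an MDX set expression. Something along these lines – the exact member syntax depends on how the attribute is keyed, so treat this as an approximation:

-- Allowed member set for the Dee Licious role on the supplier attribute
{ [DimProduct].[Supplier Company Name EN].&[Dee Licious] }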

I used the cube browser to make sure that I had done everything properly (there’s a button in the toolbar that lets you change the security context), and was pleased to see just one supplier member show up for the attribute when I queried it:

Just DeeLicious

But I wasn’t done there.
I decided to see if I could implement the concept of a category manager: an Analysis Services role that was restricted to seeing data only for the particular categories they managed.

I went back to the Computer Management console and created some Windows groups (Chocolate, Gum, and Snack), and dropped my fictitious users into them, in the following way:

Windows Groups Representing Product Categories

Back in SSDT, I deleted my existing role, and created three new ones: Chocolate, Gum, and Snack. I mapped each role to the Windows group with the same name (i.e. the Chocolate role to the Chocolate Windows group, etc.). I then changed the properties inside the Dimension Data tab to only allow the role to have access to the attribute member of the same name.

Just Chocolate

I repeated the step for my two other roles: Gum, and Snack.

This design pattern works for the following reason: a Windows user can be part of more than one role inside a MOLAP solution. Per BOL: “If a Microsoft Windows user or group belongs to multiple database roles, the effective permissions for the user or group are additive across all database roles (a union of permissions).” In my setup, Windows users were scattered across different groups, and those groups mapped to roles in a one-to-one relationship. So, when a user connects, Analysis Services resolves the Windows groups they belong to, those groups determine which roles apply, and the user gets the union of those roles’ permissions. (If this sounds convoluted, I apologize.) The main difference here is that you are effectively managing dimension security by managing Windows users and groups, instead of within the MOLAP solution’s roles, where you would have to modify and re-deploy the solution.

To test it out, I switched over to the Cube Browser, and changed the security context to the DeeLicious user, which is a member of both the Chocolate and Gum groups, but not the Snack group. I dragged some measures onto the results pane, set my date dimension filter, took a deep breath, and dragged the Category (Confection Type Major Desc) and Company (Supplier Company Name EN) attributes over, too.

It worked – all I saw were the Chocolate and Gum categories.

Success!

I had Songza running in the background and at the moment when I realized that it was working, Aaron Copland’s “Fanfare for the Common Man” came on. It was the perfect song, both in title and tune. (Give it a listen next time you need a soundtrack for a dramatic moment.)

“Fanfare for the Common Man”, Aaron Copland

Analysis Services goes even deeper with Dimension Security, and also offers the concept of Cell Security, but this common man had had enough security for one night, so that will be saved for another day.

Tomorrow is the last day of the PASS Summit, and also the last day of my project; I fly back to Toronto on Saturday morning. I’ve got mixed feelings about winding it all down. I’ll blog tomorrow about how I’m going to proceed from here, and share some final thoughts on the experience as a whole.

Day 5: 11:30pm: The 34 Were Heard!

PASS Summit 2012 Day 1
Day 5 of my OLAP sprint – which I can now more accurately term a MOLAP sprint (or maybe MOLAP meander) – began with the official opening of PASS Summit 2012. DBAs, BI Developers, etc. from near and far packed into the ballroom at the Seattle Convention Center to hear Ted Kummert (VP of the Business Platform Division at Microsoft) give a keynote address, highlighting some of the new features of SQL Server 2012, as well as giving a sense of Microsoft’s vision for the future of the SQL Server product stack. There were some good demos of things like columnstore indexes, and a few interesting announcements, including:

  • SQL Server 2012, SP1 is now available for download
  • Project ‘Hekaton’, which will introduce in-memory processing for OLTP databases. This is scheduled to be released in ‘the next major release’ of SQL Server.

You can catch the Keynote and all the demos here if you missed it.

Amir Netz and Angelina Jolie
The unequivocal star of the show this morning was Amir Netz, a Technical Fellow at Microsoft, who works as an architect on their BI products. The title ‘Technical Fellow’ is one I enjoy. It’s refreshingly clear and unpretentious compared to the litany of acronyms that often follow people’s names (myself included, an MCTS) and yet it’s also somehow humorous in an old-timey, British way. The weight that the title bears, however, is nothing to smirk at – Technical Fellows are the highest level of technical leadership at Microsoft, and apparently there aren’t very many of them at all.

The most engaging thing about Amir is his passion and exuberance for BI. He took us through a Power View demonstration that brought the following conclusion to bear: if you want to make profitable movies with Angelina Jolie, cast her in animated movies where you can’t see her on screen.

Angelina Jolie in “Shark Tale”

Power View for MOLAP
The punchline had the whole room laughing, but the best part was yet to come. Amir announced that Power View was going to be available for reporting against MOLAP models! They’re implementing it by supporting full DAX querying against multidimensional cubes. Since I skirted the tabular model, I don’t know much about DAX, except that it more closely resembles the Excel expression language than anything else, and it’s not as mind-destroying (to borrow Pete Myers’ phrase) to learn.

Now, those of you who have been following along at home will know that I’ve got some humble pie to eat. Just last night, I suggested that the fervor for Power View on MOLAP probably hadn’t reached critical mass (just 34 votes on Connect), and made irreverent comments about farting in the wind, and the cathartic – if hopeless – act of voting.

Well, the joke’s on me, because as of Amir’s announcement this morning, Microsoft has proven me wrong. So, there’s pie on my face for sure, but a big smile under that pie, because I’ll get to play with Power View on my cube.

As for the actual release date for the support: I spoke with a couple of Microsoft BI engineers tonight at the Microsoft pavilion, but nobody seemed to know exactly when it would be happening. Fingers crossed it’s sometime soon.

A Home for my Cube
Having put together a halfway decent Analysis Services cube, I started to wonder what the Best Practices were for the underlying hardware layer. I found Karan Gulati, a terrific and very friendly Microsoft Support Escalation Engineer who had come to the PASS Summit all the way from Bangalore, India. Karan was one of many Microsoft Engineers working in the SQL Clinic here at PASS. The SQL Clinic is arguably reason enough on its own to justify coming to PASS. The ability to sit down and have real conversations with people who know the product inside and out is fantastic. Last year, I had a question about the Reporting Services .NET control, and got introduced to Brian Hartman, who turned out to be the guy who wrote the control! It’s a real treat to have that kind of access to the experts.

I had done some quick projections on how my underlying OLTP database might grow over the next 5 years, and came up with a decent approximation of what the size of my corresponding data warehouse and Analysis Services cube would be as well. I’m on the very low end of what the Microsoft SQL Server product stack is capable of supporting, so although I knew I’d be able to fit my solution on relatively inexpensive hardware, I wanted to know what the recommendations were. Not surprisingly, the recommendation was to separate my OLTP system from my data warehouse, my data warehouse from my OLAP cube, and my Reporting Services instance from my OLAP cube. Everything on its own box. The reason is fairly straightforward: the different services have different hardware requirements.

While I accept this answer from a holistic technical perspective, I’m likely going to have to house some of the same services on the same boxes, in order to be cost-effective.

I also spoke with a Dell representative tonight, about their appliance model (HP has a similar one), which is an interesting alternative. Basically, you buy a server whose pricing includes not only the hardware, but also the OS license, SQL Server license, and support. This is a more high-end solution than I will need if I get around to implementing an OLAP solution any time soon, but an intriguing option nonetheless. There’s more info on Microsoft’s site, here. Dell also had a very cool interactive SQL Server Advisor tool, which takes you through a series of questions about your data assets and environment preferences and spits out a very comprehensive solution, including Hardware, Software, and Services elements.

My results from the Dell SQL Server Advisor Tool

The day ended well: I was one of the first 200 people in line at the Pragmatic Works booth, and got a free signed copy of Wrox’s “SQL Server Integration Services 2012”, which will help nicely when it comes time to load my data warehouse and process my cube. Thank you, Pragmatic Works!

I stopped by Chipotle’s for a chicken burrito – no surprises there – before heading back to the hotel, completely exhausted from a long day.
Looking forward to some great sessions tomorrow.
The sprint continues.

Day 4: 9:55pm: The Halfway Point

I spent most of the day today in Pete Myers’ full-day precon session, “Zero to BI Semantic Model with SQL Server 2012”. Pete gave a great presentation, and like yesterday, I came away with some key insights and points of clarification. (He also referred to MDX as “Mind Destroying Expression”, which I got a laugh out of.) The session focused on comparing the Tabular and Multidimensional models. The choice between the two models is multifaceted, and though Multidimensional has been around for much longer, it would seem that Tabular has all of the momentum. I’ve already made my bed for the week with the Multidimensional model, but weighing the pros and cons for a long-term solution is going to take some careful thought and planning.

Power View
One obvious advantage of the Tabular model is Power View, a very slick BI delivery tool. The demos of Power View that I’ve seen are impressive – data coming to life in reports with animated measures growing and shrinking as they move across dimension axes – very interactive stuff. Take it for a spin if you haven’t seen it yet. If you watched John King on CNN tonight, working that electoral map like a sushi chef, you’ve got an idea of what Power View can do.

Before I learned that it was only available in Tabular mode, I downloaded a trial license of Excel 2013, and tried to connect Power View to my Multidimensional model, but was rejected unceremoniously with the following error message:

No Power View for Multidimensional

So, in the spirit of democracy and today’s US Presidential elections, I joined 34 other unhappy Multidimensional BI developers on the Microsoft Connect thread and registered my vote for enabling Power View to access Multidimensional models. And while I doubt that 35 votes will get any late-night product road-map meetings convened over in Redmond, the voting was cathartic. (NB: this is the kind of thing my colleague John would call farting in the wind.)

SSRS Matrix Report Comparison (OLTP vs OLAP)
On the topic of BI delivery, Pete Myers had an interesting statistic regarding response times. He referenced a study that said if end users had to wait longer than 5 – 8 seconds between submitting a request for data and receiving the response, they not only lost faith in the BI system, but also forgot what it was they were querying.

If Business Analysts are dozing off at 5-8 seconds, have I got a report for you.

Back at the office, we have an SSRS report that gets run a few times a month, on an ad-hoc basis. It’s a gnarly thing with lots of aggregates, and even a few sub-reports (to accomplish distinct counts where the report matrix couldn’t), and it’s sitting on an OLTP relational database.

I remoted in to my workstation, ran the report, and then ran a quick query on the ExecutionLog3 view on the ReportServer. It returned a total execution time of about 75 seconds, which was right in the wheelhouse for that report.

75 seconds on OLTP
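For reference, the check against ExecutionLog3 was nothing fancy – roughly this, run in the ReportServer catalog database (the column names come from the standard view; the TOP clause just grabs the latest runs):

-- Most recent report executions, with total time broken out in milliseconds
SELECT TOP (10)
    ItemPath,
    TimeStart,
    TimeDataRetrieval + TimeProcessing + TimeRendering AS TotalMilliseconds
FROM dbo.ExecutionLog3
ORDER BY TimeStart DESC;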

I wondered what the performance would be on the same report sitting on top of an OLAP cube. So, I put a similar report together quickly in Report Builder and uploaded it to the instance of Reporting Services running on my laptop. The results weren’t surprising, but still very impressive: 0.185 seconds.

0.185 seconds on OLAP

Every BI developer out there is saying “Duh” right now, but I wanted to see this for myself: OLAP delivering on its promise of high-speed querying against previously aggregated results.

Half Time
Today represents roughly the mid-way point of my OLAP sprint. I’m pleased with my progress thus far – I’ve got a prototype cube built with some basic proof-of-concept reports, and my knowledge is growing in big leaps daily. Wednesday through Friday, I’m hoping to interact with some of the great people here in Seattle at the PASS Summit, and take advantage of all the expertise.

More specifically, I’d like to get some questions answered around hardware platforms, licensing, and security, so hopefully those answers are out there.

Day 3: 11:29pm: BI Immersion

Great day today.

Brian and Devin Knight’s full-day precon session was filled with lots of great BI info and demos. They’re both entertaining, so it’s not difficult to stay engaged for the whole day, though the information download is a bit like drinking from a fire hose. The big takeaways for me were the explanations and discussions on cube dimensions, including dimension hierarchies and attribute relationships. The explanation of how compound key columns in dimension attributes are necessary in hierarchies (to resolve cryptic Visual Studio error messages) relieved a lot of pain for me. The other main takeaway for me was the importance of a solid data warehouse in the MOLAP model (it doesn’t seem to exist in the Tabular model). Brian made the statement that 70% of a BI project takes place in SSIS, tending to the data warehouse, which surprised me.

So, I spent some time cleaning up my pseudo-data warehouse, which is really just a handful of views that I’ve built in my OLTP system. This would be an untenable model in any environment other than on a laptop in a hotel room. I modified my views to include a few more joins, so that I could create more comprehensive dimensions, and removed columns that I knew I wasn’t going to be using. So, my cube is now looking not so hacky:

Monday’s Cube
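For a sense of what those pseudo-data warehouse views look like, here’s a rough, made-up sketch (table and column names invented – the real ones map to my OLTP schema). The joins flatten the related lookup tables into one dimension-shaped result, and the column list drops everything the cube doesn’t need:

-- A product 'dimension' faked as a view over the OLTP tables
CREATE VIEW dbo.vw_DimProduct
AS
SELECT
    p.ProductID   AS ProductKey,
    p.ProductName,
    ct.ConfectionTypeMajorDesc,
    s.SupplierCompanyNameEN
FROM dbo.Product AS p
JOIN dbo.ConfectionType AS ct ON ct.ConfectionTypeID = p.ConfectionTypeID
JOIN dbo.Supplier AS s ON s.SupplierID = p.SupplierID;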

I also spent some time in SSMS playing with MDX, and came away feeling that MDX is not for dilettantes. I managed to put together a few queries, which I’ll include at the end of this post, but this was very much me playing the role of Don Quixote, galloping on my steed, with a bucket on my head, tilting at windmills.

What?! No code completion for MDX?

I finished the day on a good note. I downloaded Report Builder 3, and an ESRI shapefile from Statistics Canada, and put together a report which visualizes gross sales by province nicely. In the same way that it was nice to see my data come to life in my first experience with the cube browser late on Saturday night, it was great to see my data reflected in a visualization tonight.

As far as data visualizations go, it’s just the tip of the iceberg, but I wanted to get it under my belt. With any luck, I’ll be able to move forward to the more interactive and dynamic tools.

On another note, the food at PASS was outstanding. A well-rounded plate (two, actually) of salad, potatoes, veggies, and salmon was a nice complement to the rest of my not-so-balanced diet of chicken burritos and Heineken.

Okay – here’s the MDX:

MDX Query #1
A query to retrieve the Total Price measure (gross sales) with tuples specified on 3 different dimensions:
- DimRetailer (City)
- DimProduct (Confection Type Major)
- DimDate (Year)
In plain English, the measure being retrieved means something like: the gross sales of chocolate sold in Summerside (PEI), in 2012
SELECT Measures.[Total Price] ON COLUMNS
FROM [OLAP Sprint DW]
WHERE
(
[DimRetailer].[Geography].[CAN].[Atlantic].[PE].[Summerside],
[DimProduct].[Confection Type Major Desc].[Chocolate],
[DimDate].[Year Name].[Calendar 2012]
);

MDX Query #2
- Added a grouping attribute from the DimRetailer dimension (Province) to the SELECT clause
- Used a range for the Calendar Year attribute in the WHERE clause.
SELECT
Measures.[Total Price] ON COLUMNS,
[DimRetailer].[Retailer State Prov Abr].ALLMEMBERS ON ROWS
FROM [OLAP Sprint DW]
WHERE
(
[DimRetailer].[Retailer Country Abr].[CAN],
[DimProduct].[Confection Type Major Desc].[Chocolate],
[DimDate].[Year Name].[Calendar 2011]:[DimDate].[Year Name].[Calendar 2012]
);

MDX Query #3
- Used the TopCount function in the SELECT clause to achieve the following:
Return the top 5 Sales Reps (no house accounts) in 2012, measured by the Total Price measure. (The count argument is 6 because the MEMBERS set includes the All member, which takes up one of the slots.)
SELECT
{[Measures].[DCount of Retailers], [Measures].[Total Price] } ON COLUMNS,
TopCount( [DimSalesRep].[Sales Rep Full Name].MEMBERS, 6, [Measures].[Total Price]) ON ROWS
FROM [OLAP Sprint DW]
WHERE
(
[DimSalesRep].[Sales Rep Is House Account].[False],
[DimDate].[Year Name].[Calendar 2012]
)

Day 2: 11:47pm: Hoisted by my own Petard!

“Chump change”, I said.
“Stuff the Goose”, I said.

Well, I’m eating those words now.
It would seem that trying to cram 150 million rows into the fact table in my pseudo-data warehouse did not agree with my disk drive.

I started the data generation job, using Red Gate’s SQL Data Generator (which is a great tool; this is clearly the carpenter’s fault, to use the analogy), and then went off to the gym, had a shower, grabbed a bite to eat, etc. I sat back down to work, and wasn’t paying attention to the growing size of my data and log files. I was making progress in SSMS writing some basic MDX, but was eventually harangued to the point of interruption by the low disk space warnings, and the subsequent fallout from having no space on my C:\ drive. And so I had to abort my MDX fun and take some action. I spent more time than I would have liked undoing the damage, but can’t help but laugh at my predicament.
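Lesson learned: when generating millions of rows, keep an eye on file growth. Even a dead-simple query like this, run every so often, would have spared me the cleanup (sys.master_files stores sizes in 8 KB pages, hence the arithmetic):

-- Current size of every data and log file, in MB, largest first
SELECT
    DB_NAME(database_id) AS DatabaseName,
    name                 AS LogicalFileName,
    type_desc,
    size * 8 / 1024      AS SizeMB
FROM sys.master_files
ORDER BY size DESC;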

Tomorrow the PASS Summit 2012 begins, and I’ve got a full-day session with Brian and Devin Knight on Building a SQL Server 2012 BI Platform, which should be great. I’ll find some time after all that to post my thoughts on MDX – assuming my laptop is still in working order and I haven’t blown everything up.

This wouldn’t be fun if it was all a piece of cake, right?

Day 2: 9:35am

Dawn of Day 2 – 6 days of dev to go.

I’m pleased with my progress thus far: in something like 36 hours I’ve used SQL Server Data Tools to build a very rudimentary Multidimensional OLAP Cube, with one Measure and a handful of Dimensions. I spent some time exploring different sections of my data using the Cube Browser and was excited with the results and the myriad possibilities for analysis.

Today will be my last full day of unstructured time for development, so I want to take full advantage. Tomorrow PASS Summit 2012 kicks off across the street, and I’ve got two full-day seminars to look forward to on Monday and Tuesday, followed by three days of conference sessions before heading back to Toronto.

I’d like to accomplish a few things today:

Stuff the Goose:
I’ve got just under 7 million rows in my lone Fact table.
Chump change.
To really understand the scaling potential, I’m going to fill that skinny Fact table with a lot more data. I’ve always been a fan of Red Gate‘s DBA tools, and will use their SQL Data Generator to that end today.

Speak the Language:
Cube Browsing in SSDT was fun – seeing the results grid light up instantaneously as I dragged and dropped Measures and Dimensions – so I’ll be venturing back into familiar territory with SSMS and trying to replicate the results using MDX. I’m already feeling quite saturated with all of the new OLAP nomenclature, so I’ll have to find some room in my head for the MDX syntax. I’m comforted by the sight of the familiar SELECT, FROM, and WHERE structures, but know that the experience is going to be different from writing TSQL.

Time for some breakfast, and then it’s back to battle.
Looks like the weather will cooperate with 6 more days of programming:

Programming Weather

Day 1: 1:58am – Eureka

I went back to the drawing board with my dimensions, as they were the ones causing me all the trouble.

To be more specific, I took out the elephant gun and started them from scratch.

Because I’m working from an OLTP datasource, and not a data warehouse that might be a bit cleaner, I was getting into trouble. Trouble to the tune of missing values, duplicate values, and the like, in my dimension attributes.

So, I wrote some quick and dirty TSQL UPDATE statements against my base OLTP tables, and then wrote some views in order to accomplish a bit of horizontal and vertical filtering.
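To give a flavour of what I mean (names invented – the real tables and columns come from my OLTP schema), the clean-up was along these lines:

-- Backfill missing supplier names so the dimension attribute has no NULL or blank members
UPDATE dbo.Supplier
SET SupplierCompanyNameEN = 'Unknown Supplier'
WHERE SupplierCompanyNameEN IS NULL
   OR LTRIM(RTRIM(SupplierCompanyNameEN)) = '';
GO

-- Vertical and horizontal filtering in a view: only the columns and rows the dimension needs
CREATE VIEW dbo.vw_DimSupplier
AS
SELECT SupplierID AS SupplierKey, SupplierCompanyNameEN
FROM dbo.Supplier
WHERE IsActive = 1;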

Back in SSDT, I defined some hierarchies, and processed the dimensions one at a time, so that I could review the errors and warnings as they came up, and not be crushed by 29 errors at once.

I was rewarded for my patience:

“Ta Da!”

Those three words – “Deployment Completed Successfully” – were music to my ears.

I flipped over to the Cube Browser tab in my newly-minted Cube and started to experiment. The Cube Browser is where the whole OLAP/Analysis Services experience really started to come to life. It was exciting to see the hierarchies and measures in action – a visual confirmation that my data was starting to take its proper shape. I was able to drag and drop the Dimension attributes and Measures quickly around the interface and was giddy at how quickly the results were updated. The filtering was straight-forward and intuitive, and allowed me to quickly spot trends in the data.

I got through 200 pages of my Wrox book, and managed to get a very rudimentary cube up and running today, so I’ll sleep well tonight!

Day 1: 5:49pm – Progress, and then 28 Errors and 7 Warnings

Spent the morning and afternoon reading up on the basic elements of a multidimensional model, including the data warehouse, facts, measures, dimensions, attributes, hierarchies, cubes, etc. The lingua franca of SQL Server Analysis Services is a bit of a headache, but I’m sure I’ll get used to thinking in those terms before too much longer. Two-dimensional life in OLTP was so much simpler: rows and columns!

I ‘modeled’ my first schema on a notepad at the pub across the street from the hotel, while taking down a cheeseburger. Although it looks more like a grocery list than a star or snowflake schema, it’s a start. From what I’ve read, I don’t understand why everyone wouldn’t want a snowflake schema, to take advantage of more complex hierarchies, but I’m in no position to be asking that question yet.

Poor Man’s Schema

Once back at the hotel, I created a data warehouse in the relational database engine in SSMS from a copy of our production OLTP database. I’m not worrying about using SSIS to load the data warehouse from the OLTP source yet – I’ll save that for later.

From there, I jumped over to SSDT, and in about 30 minutes, quickly knocked out the following:

  1. Created an Analysis Services Multidimensional Project
  2. Created a data source
  3. Created a data source view
  4. Created a cube
  5. Created some dimensions

All of the wizards in SSDT made it very easy, and I had a ‘this is too good to be true’ feeling, but nonetheless couldn’t resist feeling a bit pleased with myself for having filled the design surface with so many shiny new things.

Progress!

My fears were confirmed when I went to deploy the solution back to the server and received no less than 28 errors and 7 warnings:

Oh dear.

That’s a lot of errors and warnings. So, I’m going to take a break, find some food, and dive back in later tonight when my confidence has recovered.

Day 1: 10:06am – The First Fork – Tabular vs Multidimensional

The first fork in the road seems to be picking an OLAP model – Tabular or Multidimensional.

I’m going with Multidimensional.

As I stated at the outset, it’s not my intention to be thorough in my research and decision-making this week. So, this Microsoft White Paper, and Chapter 1 from my Wrox book have given me enough input to choose the Multidimensional model.

Loosely, the factors in my decision are:

The Multidimensional model has been around a lot longer (5 releases of SQL Server), and is described as the ‘Traditional’ model. Presumably this means it is more stable and mature. I’m also hoping there are more resources out there for me to learn it.

The Tabular model is intriguing to me mostly for its newness. The in-memory xVelocity (aka VertiPaq) engine sounds like lots of fun, and I know that I’m ignoring an opportunity to build a quicker, simpler solution using the Tabular model, but I can’t help but feel like it is somehow a shortcut that will bite me later on. It also seems to involve a lot of Excel. With all due respect to Excel and its legion of followers, I didn’t come all the way to PASS in Seattle to play in Excel all day. :)

Like I said, I’m making snap decisions to get this built, so there we are – Multidimensional it is.

The 6am breakfast of coffee and Pringles has run its course, so my next decision will be where to get some food.

Day 1: 6:58am

Up early this morning, probably due to my body still being on Toronto time (EST).

As the first cup of coffee makes it way through my blood stream, I’m trying to set out some goals for today, my first ‘real’ day of this OLAP sprint experiment.

My target is to get through the first 200 pages of the Wrox book.

The first 200 pages should give me some idea of the fundamentals, including:

  • Multidimensional BISM
  • MDX
  • Data Sources and Data Views
  • Dimension Design
  • Cube Design

I’m also hoping that this will help me to shape the agenda for the rest of the week, as it would be nice to know where I’m going.

As I type, the source code and samples are downloading from the Wrox site, and I’ve already restored a copy of my production OLTP database to my laptop, because I’d like to work with my data this week as I build and explore.

The poor-man’s version of a Keurig/Tassimo coffee maker that they’ve got here in the hotel is passable, but I may have to venture out and get something that doesn’t taste so much like puddle water.

Day 0: Supplies

I have the following supplies, to assist me in my task (in no particular order):

I also ran out to the Kress Supermarket down the road, for sundry nerd food (Pringles, Beer, Snacks, etc.).

I’ll post more of the resources as I stumble upon them.
