Proof of Concept (POC) initiatives are key to gaining insight into full-scale project feasibility and success. SQL Server Parallel Data Warehouse (PDW) POC engagements differ in scope from traditional symmetric multiprocessing (SMP) server or data warehouse POC implementations and can be defined, at a minimum, by the following best practice guidelines.
Guideline 1: Identify Environmental Challenges and Prioritize
Challenges to any data-driven POC typically reside in data acquisition, malformed data structures, mismanaged ETL frameworks, reconsolidation processes (or lack thereof), and non-optimized procedural code, any of which can hinder the progress of a POC initiative. When encountering challenges in these areas, the ability to quickly identify bottlenecks, apply sustainable workarounds, and prioritize the remaining issues can greatly increase the likelihood of success in a PDW POC engagement.
For example, an environmental constraint may exist where a source system is not available during the day due to testing or development activities. Those activities need to be rescheduled so that the project team can load from these systems into the PDW during working hours. Similarly, people assigned to the POC should be continually available for support and should maintain team synergy during working hours to ensure that daily milestones are met.
Guideline 2: Requirements Gathering
Typically, expectations and metrics with regard to volume vs. performance will be at the top of a client's requirements list. Most often, the client wants to get an idea of what kind of throughput they can expect when implementing a PDW. Although there may be other items to address within the requirements scope (backup/restore metrics, column-store indexing, partitioning, etc.), keep these expectations manageable and implement the requirements that matter most to the client's expectations. Furthermore, working within the confines of the SMART methodology (Specific, Measurable, Achievable, Realistic, and Timescale) adds value to client communications, engagement, and expectations. Lastly, take into consideration any time limits on the engagement and plan accordingly.
Guideline 3: The Testing Phase
The PDW partner must work closely with the client to understand the current pain points and clearly define testing requirements. As many of us have experienced, testing can be a time-consuming process in any initiative, and the project team may end up focused primarily on data quality rather than overall process performance. From test case creation to reconsolidation, there are many bases to be covered. However, we must keep in mind that although data transformations may need to be functionally tested, they are secondary to acquiring core PDW performance baselines.
The recommendation here is to narrow the scope of testing to critical dataset processing times and weigh those findings against varying data volumes. Work with the client to get specific definitions for a successful testing phase and acceptable performance baselines. For example, company XYZ has determined that in order for PDW to be a viable option, they must load 4 TB of prior-day transaction detail in under an hour. The requirement may also state that the loads must complete successfully every day over the course of a week. Additionally, the client may be interested in documenting ad-hoc query times and performing ad-hoc data loads during a large batch load.
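To make a baseline like the XYZ example concrete, it can help to translate it into a minimum sustained load rate and a simple pass/fail check. The following is only a minimal sketch: it assumes the figure means 4 terabytes in decimal units (1 TB = 1000 GB), and the function names and the sample week of load times are hypothetical, not taken from any real engagement or PDW tooling.

```python
def required_throughput_gb_per_s(volume_tb: float, window_hours: float) -> float:
    """Minimum sustained load rate (GB/s) needed to load volume_tb within window_hours."""
    if volume_tb < 0 or window_hours <= 0:
        raise ValueError("volume must be non-negative and the window must be positive")
    gigabytes = volume_tb * 1000      # assumption: decimal units, 1 TB = 1000 GB
    seconds = window_hours * 3600
    return gigabytes / seconds


def meets_baseline(measured_load_minutes: list[float], window_hours: float) -> bool:
    """True only if every recorded daily load finished strictly within the window."""
    limit_minutes = window_hours * 60
    return all(minutes < limit_minutes for minutes in measured_load_minutes)


if __name__ == "__main__":
    # 4 TB in under one hour, as in the XYZ example.
    rate = required_throughput_gb_per_s(volume_tb=4, window_hours=1)
    print(f"Required sustained rate: {rate:.2f} GB/s")

    # Hypothetical load durations (minutes) for seven consecutive daily loads.
    week = [52.0, 55.5, 49.0, 58.2, 61.0, 50.1, 53.3]
    print("Week meets baseline:", meets_baseline(week, window_hours=1))
```

Stating the target as a rate makes it easier to compare against loads of other volumes, and the weekly check reflects the requirement that every daily load, not just the average, stays within the window.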
In summary, the following items should be addressed to define and simplify the testing phase.
-Mixed Workload Concurrency
-Data Load
-Query Performance
-Query Concurrency
-Data Expansion
Guideline 4: Resource Planning and Communication
One of the key components of a successful PDW POC is having the roles and responsibilities of the project clearly defined while ensuring that both the client and partner teams have the appropriate resources assigned to the project. As most POC engagements are limited to 2-3 weeks, it is also critical that both the partner and the client prioritize the duties of their internal resources accordingly. At a minimum, the client roster should include a DBA, a Data Developer, and a Business Analyst. On the partner side, a PDW Architect and a Data Developer should be assigned to the project. Of these resources, some may be allocated full-time to the POC and others half-time, depending on the current daily and/or weekly milestone. Each assigned resource should also have a designated backup to serve as a project alternate, as unforeseeable issues, personal or otherwise, can arise.
Communication throughout the relatively short engagement is also critical to the project's success. It should be understood that frequent updates via a daily Scrum (or other format), along with EOD emails, play an essential role in meeting the POC deadlines and effectively communicating progress. To keep things simple, one could also communicate a weekly progress summary that includes simple graphical indicators, such as RAG traffic lights, to visually indicate project progress. Additionally, when new requirements arise and their inclusion in the POC is agreed upon, they should be explicitly communicated to the project manager and sponsor along with implementation recommendations and updated estimates for the overall POC timeline. Lastly, it is recommended to add two to three days to the initial estimate/plan to account for contingencies and unforeseen resource constraints.
Guideline 5: Establishing a Partnership with Microsoft PDW COE
Microsoft has established a great support structure around both PDW POCs and full-scale implementation engagements. Having previously been engaged in a couple of full-scale PDW engagements, I can vouch for the effectiveness of the partnerships formed between the client, vendor, and MS COE. Throughout an engagement, the project team may come up with questions or request direction on query optimization, data loading, and/or general architecture. Not only will the MS COE provide a fanatical level of support when summoned, they are also proactive in communicating upcoming releases and developments. Additionally, the MS PDW team maintains a wiki that cultivates PDW innovation and revolves around a technical community composed of both internal and external resources. In summary, it is highly recommended that the client and vendor maintain a close relationship with the MS COE, as this support group is poised to define, communicate, and demonstrate best practice solutions and help ensure the timely delivery of a PDW POC.