High Level Roadmap

The Common Data Platform (CDP) program will deliver value to campus incrementally each year while continuing to maintain the legacy data warehouse infrastructure. The CDP is currently targeting the end of 2027 to transition into operations, at which point it will be delivering the majority of campus' data needs and the legacy data warehouse infrastructure can begin to sunset.

CDP Roadmap: 2023-2027

[Figure: CDP roadmap, 2023-2027 (cdp-roadmap-2023-to-2027.jpeg)]


Workstream Summaries

Data Lake

CDP's Data Lake is a centralized repository that stores large volumes of data in its original form, effectively replicating the data from our major campus systems. The data lake is a new capability relative to the legacy data warehouse and enables the modern Extract-Load-Transform (ELT) methodology. Whereas the legacy data warehouse requires schemas to be developed upfront before data from campus systems can be ingested, the data lake provides a central location where data can be saved "as is" and transformed later as needed.
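
To illustrate the difference, the following is a minimal Python sketch of the ELT pattern, not the CDP's actual pipeline. The record fields, `SOURCE_EXPORT`, `extract_and_load`, and `transform` are hypothetical names chosen for illustration. Records are landed exactly as received (Extract + Load), and a schema is applied only afterward, when a curated model needs specific fields (Transform).

```python
import json
from decimal import Decimal

# Hypothetical export from a source system: one JSON document per record.
# Field names and values are illustrative only.
SOURCE_EXPORT = [
    '{"gift_id": 1, "donor_id": "D100", "amount": "250.00", "date": "2024-01-15"}',
    '{"gift_id": 2, "donor_id": "D200", "amount": "1000", "date": "2024-02-03", "campaign": "Library"}',
]


def extract_and_load(raw_records):
    """Extract + Load: land each record exactly as received, with no schema applied."""
    return list(raw_records)


def transform(lake):
    """Transform: apply a schema later, only to the fields a curated model needs."""
    rows = []
    for raw in lake:
        record = json.loads(raw)
        rows.append({
            "gift_id": int(record["gift_id"]),
            "donor_id": record["donor_id"],
            "amount": Decimal(record["amount"]),  # Decimal avoids float rounding on currency
            "gift_date": record["date"],
        })
    return rows


lake = extract_and_load(SOURCE_EXPORT)
curated = transform(lake)
```

Because nothing was rejected at load time, the extra `campaign` field on the second record is still available in the lake, so a later transform could use it without re-extracting from the source system.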

The following campus systems will be added to the Data Lake in 2024:

Application | System Owner | Data Domain
Affinaquest | University Advancement | Advancement
CruzFix | Physical Planning, Development & Operations | Space

Refer to Program History below to see previously incorporated campus systems.

Curated Data Models

Curated Data Models are collections of datasets that have been selected and modeled to meet the needs and interests of UCSC stakeholder groups. The primary goal of these data models is to prepare data for analysis, decision-making, and broader use by improving its usability, context, and relevance. The roadmap shows the timeline for developing the first version of each data model; ongoing maintenance and enhancements will continue for the operational lifetime of each model.

The following Data Models will be developed in 2024:

Active Enrollment (Data Domain: Student Analytics)
  • Daily refresh of student enrollment data
  • Enables longitudinal analysis via effective-dated records

Incremental Enrollment (Data Domain: Student Analytics)
  • Academic term-based snapshots of student enrollment data that are incrementally appended to prior terms' data
  • Enables point-in-time longitudinal analysis
  • Enables 3rd Week and End-of-Term analysis for each academic term

Financial Hierarchy (Data Domain: Financial Analytics)
  • Daily refresh of the chart of accounts (FOAPAL) hierarchy and associated attributes
  • Enables a common definition of FOAPAL elements and their relationships that can be used in other data models with disparate data sources

Employee Job (Data Domain: Personnel Analytics)
  • Daily refresh of employee job data and associated attributes
  • Enables a common definition of employee job elements and their relationships that can be used in other data models with disparate data sources

Space Hierarchy (Data Domain: Space Analytics)
  • Facilities and space hierarchy and associated attributes
  • Enables a common definition of space elements and their relationships that can be used in other data models with disparate data sources
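
The two enrollment models above take different approaches to longitudinal analysis. As a rough sketch, assuming a hypothetical `EnrollmentRecord` layout with half-open effective-date ranges (not the actual CDP schema), an effective-dated history can answer "what was true on this date?", and a term snapshot can freeze that answer so later changes do not alter it. The function names, dates, and snapshot label are illustrative.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, List, Optional


@dataclass(frozen=True)
class EnrollmentRecord:
    student_id: str
    units: int
    eff_start: date          # first day this version of the record applies
    eff_end: Optional[date]  # first day it no longer applies; None means still current


def as_of(records: List[EnrollmentRecord], day: date) -> List[EnrollmentRecord]:
    """Point-in-time view: the version of each record in effect on `day`."""
    return [
        r for r in records
        if r.eff_start <= day and (r.eff_end is None or day < r.eff_end)
    ]


def append_snapshot(snapshots: Dict[str, List[EnrollmentRecord]], label: str,
                    records: List[EnrollmentRecord], day: date) -> None:
    """Freeze the as-of view under a term label; existing labels are never overwritten."""
    if label in snapshots:
        raise ValueError(f"snapshot {label!r} already exists")
    snapshots[label] = as_of(records, day)


# Illustrative history: S1 added a course on Jan 20, so S1 has two versions.
history = [
    EnrollmentRecord("S1", 12, date(2024, 1, 2), date(2024, 1, 20)),
    EnrollmentRecord("S1", 15, date(2024, 1, 20), None),
    EnrollmentRecord("S2", 10, date(2024, 1, 2), None),
]

snapshots: Dict[str, List[EnrollmentRecord]] = {}
append_snapshot(snapshots, "Winter 2024 - 3rd Week", history, date(2024, 1, 19))
```

In this sketch, `as_of(history, date(2024, 1, 25))` reflects S1's later change to 15 units, while the stored snapshot keeps the 12 units that were in effect on the snapshot date. Whether the CDP derives its term snapshots from effective-dated data in this way is an assumption made here for illustration.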

Reporting & Plan for InfoView

The ultimate goal of modernizing our data infrastructure is to deliver more reliable and timely information to UCSC stakeholders, who will largely interact with data through reporting tools. While reporting tools may vary, the CDP will address the needs for two main types of tools: 1) Operational Reporting and 2) Business Intelligence (BI) Analysis / Visualization.

1) Operational Reporting: The next version of the campus' primary operational reporting tool, InfoView, is a sizeable upgrade that will require significant migration effort from the Data Management unit and Report Owners across the organization. According to SAP's product roadmap, the current version of InfoView is scheduled for end-of-life in 2027. Before migration activities begin, an Options Assessment will be conducted to evaluate the upgraded version of InfoView and other operational reporting tools and determine the best course of action.

2) BI Analysis & Visualization: UCSC's current BI / Visualization tool will be integrated into the CDP to better enable data analysts and data scientists to perform analysis. A key design principle for the CDP is to build critical business definitions and logic into curated data models rather than requiring each visualization to reconstruct them, which inadvertently invites differing approaches to the same concepts. The CDP refers to this as establishing a shared "Semantic Layer" that reporting tools can leverage as a single source of truth.
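
A semantic layer can be pictured as a single registry of named business definitions that every reporting tool looks up instead of re-implementing. The sketch below uses hypothetical metric names, and the 12-unit full-time threshold is an illustrative assumption rather than a statement of UCSC policy. In practice, semantic layers are usually provided by BI or data modeling tools rather than hand-written code.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]

# Hypothetical shared definitions, maintained in one place.
# The 12-unit full-time threshold is an assumption for illustration.
SEMANTIC_LAYER: Dict[str, Callable[[List[Row]], int]] = {
    "headcount": lambda rows: len({r["student_id"] for r in rows}),
    "full_time_headcount": lambda rows: len(
        {r["student_id"] for r in rows if r["units"] >= 12}
    ),
}


def metric(name: str, rows: List[Row]) -> int:
    """Look up a metric by name so every report applies the same logic."""
    return SEMANTIC_LAYER[name](rows)


enrollment = [
    {"student_id": "S1", "units": 15},
    {"student_id": "S2", "units": 10},
    {"student_id": "S3", "units": 12},
]

# Two different "reports" consume the same definition rather than re-deriving it.
dashboard_value = metric("full_time_headcount", enrollment)
operational_report_value = metric("full_time_headcount", enrollment)
```

If the definition of a full-time student changes, only the registry entry changes, and both reports pick up the new logic.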

Data Sharing (Reverse ETL)

A Reverse ETL tool provides the ability to share curated datasets from the CDP with a variety of our major campus systems. Today's data sharing approaches vary and lack source-to-destination transparency, which limits confidence in data quality and hinders effective data governance. In 2024, the CDP will pilot a Reverse ETL tool in partnership with University Advancement to evaluate its benefits. Should the pilot prove successful, UCSC would be positioned to have a standardized and secure tool for sharing data between campus systems.

This pilot implementation will include the development of data models that address common data-sharing requests:

  • Person Dimension (Student, Staff, and Academic Personnel)
  • Chart of Accounts
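
At its core, a Reverse ETL sync compares a curated dataset with what a destination system already holds, maps field names, and sends only the differences, recording where each change came from. The sketch below shows that planning step under stated assumptions: `plan_sync`, the field mapping, and the `cdp.person_dim` and `crm.contacts` names are hypothetical, and the code does not reflect the API of any particular Reverse ETL product.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class SyncPlan:
    inserts: List[dict] = field(default_factory=list)
    updates: List[dict] = field(default_factory=list)
    lineage: List[str] = field(default_factory=list)  # source-to-destination trail


def plan_sync(curated: List[dict], destination: Dict[str, dict],
              field_map: Dict[str, str], key: str,
              source_name: str, dest_name: str) -> SyncPlan:
    """Diff a curated dataset against a destination system and plan upserts."""
    plan = SyncPlan()
    for row in curated:
        # Rename curated fields to the destination system's field names.
        mapped = {dest: row[src] for src, dest in field_map.items()}
        existing = destination.get(row[key])
        if existing is None:
            plan.inserts.append(mapped)
        elif existing != mapped:
            plan.updates.append(mapped)
        else:
            continue  # already in sync; nothing to send
        plan.lineage.append(f"{source_name}[{key}={row[key]}] -> {dest_name}")
    return plan


# Hypothetical Person Dimension rows and destination records.
person_dim = [
    {"person_id": "P1", "preferred_name": "Alex"},
    {"person_id": "P2", "preferred_name": "Sam"},
    {"person_id": "P3", "preferred_name": "Jo"},
]
crm_records = {
    "P1": {"ExternalId": "P1", "Name": "Alexander"},
    "P2": {"ExternalId": "P2", "Name": "Sam"},
}

plan = plan_sync(person_dim, crm_records,
                 field_map={"person_id": "ExternalId", "preferred_name": "Name"},
                 key="person_id", source_name="cdp.person_dim", dest_name="crm.contacts")
```

Here P1 is planned as an update, P2 is skipped because it already matches, and P3 is planned as an insert; the `lineage` list is a simple form of the source-to-destination transparency that the pilot aims to provide.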

Refer to Program History to see previously implemented tools and capabilities for CDP.

Preparing for Operations

Once the foundational functionality is in place, the CDP will transition into operations, with processes for continuous improvement. By delivering value to campus incrementally during each year of implementation, the CDP aims to reduce the burden on stakeholders of migrating away from the current data warehouse environment. The CDP is currently targeting the transition to operations by the end of 2027, at which point it will be delivering the majority of campus' data needs and the current data warehouse can begin to sunset. With this goal in mind, each implementation year will need to support a successful transition to operations.

Beginning in 2024, resources within the Data Management team will prioritize development of the CDP and limit effort within the existing data warehouse to only what is necessary for secure operations:

  • For any new system approved for implementation by the IT Systems & Data Governance Committee, the CDP will serve as the tool for its data repository, modeling, sharing, and reporting needs.
  • Requests for new data asset development from existing systems will be delivered through the CDP.
  • Requests to enhance existing data assets will be assessed for the level of effort required to deliver a production-grade solution. High-effort enhancements will be delivered through the CDP; low-effort changes may be considered within the current environment.
  • Requests for new data sharing integrations, whether for new or existing campus systems, will be delivered through the CDP.