Context
As a business owner or manager, you want the insight that good reporting can provide - visibility into that data gives you better control over your establishment - and that's what Data Pipeline is about.
What is it?
Data Pipeline is PAR Ordering's data sharing feature. It gives you access, in bulk, to the data gathered and generated by our system.
It's designed to feed data from PAR Ordering to you, allowing you to load that data into your data lake and/or data warehouse as part of a broader analytics/BI strategy. The data belongs to you, so you should be able to access it in its raw form; you can draw meaningful conclusions by:
- running your own custom reporting
- connecting a BI tool
- merging the data with other data you own
- powering additional functionality, such as a CRM platform
You will benefit from the Data Pipeline by obtaining data models for (to name just a few):
- Customer Accounts
- Orders
- Order Items
- Menu Items
- Loyalty & Discounts
What's good about it?
Here are a few benefits you may find especially important:
Rich user experience
The interface includes:
- pipeline set up and management
- a monitoring dashboard that displays operational metrics of the pipeline
- a data catalog to aid with data discovery
- an announcements center to notify and communicate with Users
Robust, reliable and scalable pipeline
- optimized infrastructure
- fault tolerant design
- monitoring and alerting capabilities
- change announcement capabilities
Value
The Pipeline is:
- cost efficient
- consistent
- secure and reliable
Education and data literacy
The data catalog enables data discovery and helps you understand and use our data effectively to make informed decisions.
Logical Model
The data model is delivered as ETL-ready ('extract, transform, load') data files sent to your cloud storage location, from which you can ETL them into your data lake or warehouse.
Frequent updates
The Pipeline includes a subset of data from a given set of PAR Ordering tables. These are updated regularly, at one of the following frequencies:
- near real-time (< 15 minutes)
- every 6 hours
- daily
- weekly
Please note: there is an increase in cost with higher frequencies.
Supported Data File Formats
We will offer one, possibly two, data formats. Either format will be 'ETL friendly', meaning it is supported by all or nearly all popular ETL tools. The options are (see the reading sketch after this list):
- compressed NDJSON - which has the advantage of being human-readable and supporting row-level success/failure during import
- Parquet - which has the advantage of being a very popular data lake format
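For illustration, here's a minimal sketch of how either format could be read on the consumer side. This is a sketch under assumptions: the file names are hypothetical, and it assumes pandas (with a Parquet engine such as pyarrow) is installed.

```python
import gzip
import json

import pandas as pd

# Read a compressed NDJSON file line by line. Because each line is an
# independent JSON document, a single bad row can be logged and skipped
# without failing the whole import (row-level success/failure).
rows, failed_lines = [], []
with gzip.open("orders-2023-09-01.ndjson.gz", "rt", encoding="utf-8") as f:
    for line_number, line in enumerate(f, start=1):
        try:
            rows.append(json.loads(line))
        except json.JSONDecodeError:
            failed_lines.append(line_number)

orders = pd.DataFrame(rows)

# Parquet is columnar and self-describing, so a single call suffices.
orders_parquet = pd.read_parquet("orders-2023-09-01.parquet")
```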
Supported Destination Locations
The Product will deliver data files, in the file format specified, to a highly available cloud storage destination provisioned by you, with write permission on your storage instance granted to PAR Ordering:
- AWS S3
- Azure Data Lake Storage (ADLS)
Amazon Simple Storage Service (Amazon S3) is an object storage service that offers scalability, data availability, security, and performance. Amazon S3 can be used to store and retrieve any amount of data at any time, from anywhere.
Customers of all sizes can use Amazon S3 to store and protect any amount of data for a range of use cases, such as data lakes, websites, mobile applications, backup and restore, archive, enterprise applications, IoT devices, and big data analytics.
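For example, once your bucket is provisioned, you could run a quick write check yourself. This is a minimal sketch using boto3; the bucket name and key are placeholders, not values PAR Ordering prescribes.

```python
import boto3

s3 = boto3.client("s3")

# Write and then remove a small probe object to confirm write access.
# "your-pipeline-bucket" is a placeholder for your actual bucket name.
s3.put_object(
    Bucket="your-pipeline-bucket",
    Key="par-pipeline/_write_test",
    Body=b"ok",
)
s3.delete_object(Bucket="your-pipeline-bucket", Key="par-pipeline/_write_test")
print("Write access confirmed")
```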
Azure Data Lake Storage Gen2 (ADLS) is a cloud-based repository for both structured and unstructured data. For example, you could use it to store everything from documents to images to social media streams. Data Lake Storage Gen2 is built on top of Azure Blob Storage, a general-purpose, scalable object store designed for a wide variety of storage scenarios, and is optimized for big data analytics workloads.
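A similar write check for ADLS Gen2, as a minimal sketch assuming the azure-storage-file-datalake and azure-identity packages; the account URL and filesystem name are placeholders.

```python
from azure.identity import DefaultAzureCredential
from azure.storage.filedatalake import DataLakeServiceClient

# The account URL and filesystem name below are placeholders.
service = DataLakeServiceClient(
    account_url="https://youraccount.dfs.core.windows.net",
    credential=DefaultAzureCredential(),
)
fs = service.get_file_system_client("par-pipeline")

# Upload and then delete a small probe file to confirm write access.
probe = fs.get_file_client("_write_test")
probe.upload_data(b"ok", overwrite=True)
probe.delete_file()
print("Write access confirmed")
```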
How does it work?
Here's the flow - please keep in mind we're getting more technical here:
- As a prerequisite, the storage location you choose will need to be configured on your side first. Follow the step-by-step how-to guide here
- Once your storage location is set up, you can configure your data pipeline within the Management Center
The setup requires:
- The name of the pipeline
- Your storage provider – Destination Type
- Your storage location URL – Destination URL:
  - If the type is AWS, provide the destination_url
  - If the type is Azure, fetch the URL from your secret manager
  If you need more information, you can see the details here
- The format you wish the data to be provided in – Destination File Type
- Finally, the frequency of data updates – Frequency
A sketch of what such a configuration could look like follows this list.
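To make those fields concrete, here is what the configuration could look like, expressed as a simple mapping. This is purely illustrative: the keys mirror the form fields above and are not an actual PAR Ordering API; pipelines are configured through the Management Center UI.

```python
# Illustrative only: field names mirror the setup form, not a real API.
pipeline_config = {
    "name": "main-ordering-pipeline",
    "destination_type": "AWS",  # or "Azure"
    "destination_url": "s3://your-pipeline-bucket/par/",  # placeholder
    "destination_file_type": "ndjson",  # or "parquet"
    "frequency": "6h",  # e.g. near real-time, 6-hourly, daily, weekly
}
```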
- Based on the configuration, the credentials are verified by running a test connection from the database
- If the connectivity test fails, a response with an appropriate error message is sent to the Management Center
- If the connection succeeds, a response is sent to the Management Center, which updates the database with a 'pending activation' status. The database creates two pipeline jobs: one for the full load and one for the incremental load. The jobs continuously update an audit table with statistics/metrics for both the full load and CDC (Change Data Capture, used for incremental loads); a consumer-side sketch of combining the two loads follows this list
- Once the pipeline is active, the data will start flowing to your destination in your requested format. If there are any schema changes at the source (column additions, column drops, datatype changes), the change will flow to the destination without breaking the pipeline
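On the consumer side, a common pattern is to apply each incremental (CDC) file on top of the initial full load. Here is a minimal pandas sketch; the file names and the `id` key column are assumptions for illustration.

```python
import pandas as pd

# Hypothetical file names; the key column "id" is an assumption.
snapshot = pd.read_parquet("orders_full_load.parquet")
increment = pd.read_parquet("orders_incremental_2023-09-01.parquet")

# Upsert: drop rows that reappear in the increment, then append the
# incremental versions so the latest record wins.
snapshot = pd.concat(
    [snapshot[~snapshot["id"].isin(increment["id"])], increment]
).reset_index(drop=True)
```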
Users will be able to monitor the pipeline status via the metrics below:
- View pipeline status - availability metrics: the Management Center will fetch the pipeline status and activation date from the database
- View alerts in case of delays and failures: the Management Center will update the status/details
- Error notifications: the Data Pipeline will proactively send error statuses and messages to the Management Center, which in turn can be sent via email notifications
Availability
When will it be available? It's live and available as of Q3 2023.
Who can use it? If you are interested in unlocking the true potential of your data, please contact your Sales Representative for further details.
Which Users can access it? Brand Managers and Admins.