Context
As a business owner or manager, you want the insight that good reporting can provide - visibility into that data gives you better control over your establishment - and that's what Data Pipeline is about.
What is it?
Data Pipeline is PAR Ordering's data sharing feature. It gives you access, in bulk, to the data gathered and generated by our system.
It's designed to feed data from PAR Ordering to you, allowing you to load that data into your data lake and/or data warehouse as part of a broader analytics/BI strategy. The data belongs to you, so you should be able to access it in its raw form; you can draw meaningful conclusions by:
- running your own custom reporting
- connecting a BI tool
- merging the data with other data you own
- powering additional functionality, such as a CRM platform
You will benefit from the Data Pipeline by obtaining data models for (to name just a few):
- Customer Accounts
- Orders
- Order Items
- Menu Items
- Loyalty & Discounts
What's good about it?
Here are a few benefits you may find especially important:
Rich user experience
The interface includes:
- pipeline set up and management
- a monitoring dashboard that displays operational metrics of the pipeline
- a data catalog to aid with data discovery
- an announcements center to notify and communicate with Users
Robust, reliable and scalable pipeline
- optimized infrastructure
- fault tolerant design
- monitoring and alerting capabilities
- change announcement capabilities
Value
The Pipeline is:
- cost efficient
- consistent
- secure and reliable
Education and data literacy
The data catalog enables data discovery and helps you understand and use our data effectively to make informed decisions.
Logical Model
The data model is delivered as ETL-ready ('extract, transform, load') data files sent to your cloud storage location, from which you can ETL them into your data lake or warehouse.
Frequent updates
The Pipeline includes a subset of data from a given set of PAR Ordering tables. These are updated regularly, at one of the following frequencies:
- near real-time (< 15 minutes)
- every 6 hours
- daily
- weekly
Please note: there is an increase in cost with higher frequencies.
Supported Data File Formats
We will offer one, possibly two, data formats. Either format will be 'ETL friendly', meaning it is supported by all or nearly all popular ETL tools. The options are (see the reading sketch after this list):
- compressed NDJSON - which has the advantage of being human-readable and supporting row-level success/failure during import
- Parquet - which has the advantage of being a very popular data lake format
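For illustration, here's a minimal sketch of how either format could be read on the consumer side. This is a sketch under assumptions: the file names are hypothetical, and it assumes pandas (with a Parquet engine such as pyarrow) is installed.

```python
import gzip
import json

import pandas as pd

# Read a compressed NDJSON file line by line. Because each line is an
# independent JSON document, a single bad row can be logged and skipped
# without failing the whole import (row-level success/failure).
rows, failed_lines = [], []
with gzip.open("orders-2023-09-01.ndjson.gz", "rt", encoding="utf-8") as f:
    for line_number, line in enumerate(f, start=1):
        try:
            rows.append(json.loads(line))
        except json.JSONDecodeError:
            failed_lines.append(line_number)

orders = pd.DataFrame(rows)

# Parquet is columnar and self-describing, so a single call suffices.
orders_parquet = pd.read_parquet("orders-2023-09-01.parquet")
```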
Supported Destination Locations
The Product will deliver data files, in the file format specified, to a highly available cloud storage destination provisioned by you, with write permission on your storage instance granted to PAR Ordering:
- AWS S3
- Azure Data Lake Storage (ADLS)
Amazon Simple Storage Service (Amazon S3) is an object storage service that offers scalability, data availability, security, and performance. Amazon S3 can be used to store and retrieve any amount of data at any time, from anywhere.
Customers of all sizes can use Amazon S3 to store and protect any amount of data for a range of use cases, such as data lakes, websites, mobile applications, backup and restore, archive, enterprise applications, IoT devices, and big data analytics.
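For example, once your bucket is provisioned, you could run a quick write check yourself. This is a minimal sketch using boto3; the bucket name and key are placeholders, not values PAR Ordering prescribes.

```python
import boto3

s3 = boto3.client("s3")

# Write and then remove a small probe object to confirm write access.
# "your-pipeline-bucket" is a placeholder for your actual bucket name.
s3.put_object(
    Bucket="your-pipeline-bucket",
    Key="par-pipeline/_write_test",
    Body=b"ok",
)
s3.delete_object(Bucket="your-pipeline-bucket", Key="par-pipeline/_write_test")
print("Write access confirmed")
```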
Azure Data Lake Storage Gen2 (ADLS) is a cloud-based repository for both structured and unstructured data. For example, you could use it to store everything from documents to images to social media streams. Data Lake Storage Gen2 is built on top of Azure Blob Storage, a general-purpose, scalable object store designed for a wide variety of storage scenarios, and is optimized for big data analytics workloads.
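A similar write check for ADLS Gen2, as a minimal sketch assuming the azure-storage-file-datalake and azure-identity packages; the account URL and filesystem name are placeholders.

```python
from azure.identity import DefaultAzureCredential
from azure.storage.filedatalake import DataLakeServiceClient

# The account URL and filesystem name below are placeholders.
service = DataLakeServiceClient(
    account_url="https://youraccount.dfs.core.windows.net",
    credential=DefaultAzureCredential(),
)
fs = service.get_file_system_client("par-pipeline")

# Upload and then delete a small probe file to confirm write access.
probe = fs.get_file_client("_write_test")
probe.upload_data(b"ok", overwrite=True)
probe.delete_file()
print("Write access confirmed")
```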
How does it work?
Here's the flow - please keep in mind we're getting more technical here:
- As a prerequisite, the storage location you choose will need to be configured on your side first. Follow the step-by-step how-to guide here
- Once your storage location is set up, you can configure your data pipeline within the Management Center
The setup requires:
- The name of the pipeline
- Your storage provider – Destination Type
- Your storage location URL – Destination URL:
  - If the type is AWS, provide the destination_url
  - If the type is Azure, fetch the URL from your secret manager
  If you need more information, you can see the details here
- The format you wish the data to be provided in – Destination File Type
- Finally, the frequency of data updates – Frequency
A sketch of what such a configuration could look like follows this list.
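To make those fields concrete, here is what the configuration could look like, expressed as a simple mapping. This is purely illustrative: the keys mirror the form fields above and are not an actual PAR Ordering API; pipelines are configured through the Management Center UI.

```python
# Illustrative only: field names mirror the setup form, not a real API.
pipeline_config = {
    "name": "main-ordering-pipeline",
    "destination_type": "AWS",  # or "Azure"
    "destination_url": "s3://your-pipeline-bucket/par/",  # placeholder
    "destination_file_type": "ndjson",  # or "parquet"
    "frequency": "6h",  # e.g. near real-time, 6-hourly, daily, weekly
}
```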
- Based on the configuration, the credentials are verified by running a test connection from the database
- If the connectivity test fails, a response with an appropriate error message is sent to the Management Center
- If the connection succeeds, a response is sent to the Management Center, which updates the database with a 'pending activation' status. The database creates two pipeline jobs: one for the full load and one for the incremental load. The jobs continuously update an audit table with statistics/metrics for both the full load and CDC (Change Data Capture, used for incremental loads); a consumer-side sketch of combining the two loads follows this list
- Once the pipeline is active, the data will start flowing to your destination in your requested format. If there are any schema changes at the source (column additions, column drops, datatype changes), the change will flow to the destination without breaking the pipeline
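On the consumer side, a common pattern is to apply each incremental (CDC) file on top of the initial full load. Here is a minimal pandas sketch; the file names and the `id` key column are assumptions for illustration.

```python
import pandas as pd

# Hypothetical file names; the key column "id" is an assumption.
snapshot = pd.read_parquet("orders_full_load.parquet")
increment = pd.read_parquet("orders_incremental_2023-09-01.parquet")

# Upsert: drop rows that reappear in the increment, then append the
# incremental versions so the latest record wins.
snapshot = pd.concat(
    [snapshot[~snapshot["id"].isin(increment["id"])], increment]
).reset_index(drop=True)
```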
Users will be able to monitor the pipeline status via the metrics below:
- View pipeline status - availability metrics: the Management Center will fetch the pipeline status and activation date from the database
- View alerts in case of delays and failures: the Management Center will update the status/details
- Error notifications: the Data Pipeline will proactively send error statuses and messages to the Management Center, which in turn can be sent via email notifications
Availability
When will it be available? It's live and available as of Q3 2023.
Who can use it? If you are interested in unlocking the true potential of your data, please contact your Sales Representative for further details.
Which Users can access it? Brand Managers and Admins.