How Canada’s #1 Online Eyewear Store Drives Insights in Real Time


Industry: eCommerce
Use case: Near real-time reporting; ad-hoc analytics; machine learning
Technical challenge: Building a codeless, low-maintenance, low-latency cloud data platform; upserts and slowly changing dimensions on Amazon S3
15 minutes end to end from event time to BI dashboard
6 months of implementation saved
lines of code required

The Backstory

Founded in 2000, Clearly is the biggest online eyewear retailer in Canada, with customer bases in Canada, the USA, Australia and New Zealand. They sell prescription glasses, sunglasses, contact lenses, and other eye health products, as well as provide access to innovative, customer-centric tools like Virtual Try On and same-day contact lens delivery. They also have retail stores across Canada where customers can shop and access optometrist services.

Clearly believes everyone deserves to see, and everything they do serves their mission to eliminate poor vision. They strive to use their platform to make eyewear affordable, make vision care accessible, and spread awareness about the importance of eye health.

The Business Scenario: Ensuring Smooth eCommerce Operations with Fresh Data

As a large-scale retail operation that is mostly online, Clearly needs to move quickly and make data-driven decisions in real time. The team must be able to tell whether a promotion is working as expected, whether a website is down, whether there is an issue with payments, or whether business metrics show anomalies, and to act on that information within hours rather than days.

In order to respond to events as they happen, avoid costly errors and ensure a high level of performance and customer satisfaction, Clearly needs timely access to data. The data and analytics team needs to be able to surface insights coming from multiple sources in order to engage executives and intervene when a business-critical issue arises.

The Technical Challenge: Building a High-performance, Low-maintenance Cloud Data Platform

Salman Hameed is the Director of Data & Analytics at Clearly, and leads a growing team of analysts, data scientists, data engineers and machine learning engineers. He set out to build a best-in-class data platform for ecommerce, built on the following principles:

  • Data lake architecture: enabling access to granular raw data for machine learning and skipping slow and cumbersome batch processing
  • Low-latency analytics: ability to detect anomalies in near real time – last 15 minutes, last 30 minutes and last hour
  • Low maintenance and low code: avoiding the need to maintain a complex code base for data transformation
  • Broad access to data: Salman’s team consists of a diverse range of data practitioners, so the ability to use SQL rather than Scala or Java was important.
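The low-latency principle above (checking the last 15 minutes, last 30 minutes and last hour) can be illustrated with a minimal Python sketch. This is not Clearly's or Upsolver's implementation; the function names, the expected order rate and the tolerance are hypothetical values chosen only for illustration.

```python
from typing import Dict, Iterable

# Trailing windows named in the text: last 15 minutes, 30 minutes and hour.
WINDOWS_MINUTES = (15, 30, 60)


def trailing_counts(event_times: Iterable[int], now: int) -> Dict[int, int]:
    """Count events (epoch seconds) that fall inside each trailing window."""
    counts = {w: 0 for w in WINDOWS_MINUTES}
    for t in event_times:
        age_minutes = (now - t) / 60
        if age_minutes < 0:
            continue  # ignore events stamped in the future
        for w in WINDOWS_MINUTES:
            if age_minutes <= w:
                counts[w] += 1
    return counts


def flag_drops(counts: Dict[int, int], expected_per_minute: float,
               tolerance: float = 0.5) -> Dict[int, bool]:
    """Flag a window when its count falls below a fraction of the expected volume.

    expected_per_minute and tolerance are assumed inputs, not values from the source.
    """
    return {w: c < expected_per_minute * w * tolerance for w, c in counts.items()}


# Hypothetical order events: none in the last 15 minutes, several earlier in the hour.
now = 3600
order_times = [now - 60 * m for m in (20, 22, 25, 28, 35, 40, 50, 58)]

counts = trailing_counts(order_times, now)
flags = flag_drops(counts, expected_per_minute=0.25)
print(counts)  # {15: 0, 30: 4, 60: 8}
print(flags)   # {15: True, 30: False, 60: False}
```

In this example only the 15-minute window is flagged, which is the kind of signal that a longer reporting interval would smooth over.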
“Since we need to act on data in near real-time, we couldn’t accept the traditional framework of data warehouses and scheduled batch processing. We wanted a low latency, near codeless platform that could move data across our systems in minutes, and Upsolver lets us do this with simple SQL.”

The Solution: Serving Curated Datasets On-demand with Upsolver and Amazon Athena

Clearly’s source data resides on multiple different systems – including ERP, fulfillment and order management tools, managed on SQL Server databases. To enable near real-time analytics, reporting and machine learning, data needs to be moved to Amazon S3 and made available for multiple different target systems – including Amazon QuickSight, Microsoft Power BI, and Amazon SageMaker.

Reference Architecture

“In 25 years in the industry, I’ve never seen anything that can do what Upsolver does so effortlessly.”

Clearly uses AWS Database Migration Service (DMS) to replicate its various source databases into an Amazon S3 bucket. Upsolver then picks up the data as it arrives in this landing zone and processes it into curated datasets. These datasets relate to different business processes such as demand sales, shipped sales, cost, margin and product attributes.

All detailed curated datasets reside on Amazon S3 and can be queried instantly with Amazon Athena thanks to Upsolver’s integration with the Glue Data Catalog. Using Athena, Clearly can provide timely data to support a wide range of analytic use cases:

  • Ad-hoc analytics using Microsoft Power BI
  • Near real-time dashboards using Amazon QuickSight
  • Machine learning using Amazon SageMaker and managed ML models
“‘Don’t reinvent the wheel’ is one of the pillars of our data strategy. With Upsolver, I can see the most up-to-date data on Amazon S3, and I don’t need to manage complex architecture that provides the same functionality.”

Dealing with Upserts and Slowly Changing Dimensions

One of the most challenging aspects of building the cloud architecture at Clearly was the need to update data on S3. S3 is object storage in which objects are immutable: events are written in order of ingestion rather than occurrence, and past data cannot easily be updated.

Solving the upsert challenge proved to be extremely difficult: it would have required managing a dedicated EMR cluster running Apache Hudi, which in turn would have required extensive custom coding. With Upsolver, Clearly was able to abstract this complexity into a few clicks in a visual interface.
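To make the upsert problem concrete, here is a minimal Python sketch of the underlying idea: an append-only log of change events, written in ingestion order, is collapsed into the latest state per key. It is an illustration under stated assumptions, not how Upsolver or Apache Hudi implement upserts; the `ChangeEvent` type, its fields and the sample orders are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass(frozen=True)
class ChangeEvent:
    key: str             # primary key of the source row (hypothetical)
    op: str              # "insert", "update" or "delete"
    event_time: int      # when the change occurred at the source (epoch seconds)
    row: Optional[dict]  # full row image; None for deletes


def materialize_latest(events: Iterable[ChangeEvent]) -> Dict[str, dict]:
    """Collapse an append-only change log into the current row per key."""
    latest: Dict[str, ChangeEvent] = {}
    for ev in events:  # events arrive in ingestion order, not occurrence order
        seen = latest.get(ev.key)
        # Keep the change with the newest event_time; on ties the later arrival wins.
        if seen is None or ev.event_time >= seen.event_time:
            latest[ev.key] = ev
    # Keys whose most recent change is a delete drop out of the result.
    return {k: ev.row for k, ev in latest.items()
            if ev.op != "delete" and ev.row is not None}


# Hypothetical change log; the "paid" update arrives after the later "shipped" update.
log = [
    ChangeEvent("order-1", "insert", 100, {"status": "placed"}),
    ChangeEvent("order-2", "insert", 105, {"status": "placed"}),
    ChangeEvent("order-1", "update", 130, {"status": "shipped"}),
    ChangeEvent("order-1", "update", 120, {"status": "paid"}),
    ChangeEvent("order-2", "delete", 140, None),
]

current = materialize_latest(log)
print(current)  # {'order-1': {'status': 'shipped'}}
```

The sketch holds everything in memory; doing the same continuously over files on S3 is the part that otherwise calls for a dedicated cluster and custom code.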

Read more: Solving the Upserts Challenge on S3

“We looked at every major analytics platform - Alteryx, Matillion, Snowflake, Azure, and Google Cloud tools - but Upsolver and AWS proved to be the partners we needed to build a world class modern data platform.”

Benefits Achieved

  • 15-minute end-to-end latency from event creation in the source system to a QuickSight dashboard reading from Athena
  • Ability to act on data in real-time, such as responding to a server outage during Black Friday
  • Solution up and running in two months with no additional headcount
  • No coding in Scala or Java, only SQL; the basic, user-friendly interface requires no ramp-up time
  • Great and timely tech support reduces data pipeline downtime.
“A near-codeless data solution is another pillar of our data strategy. Without Upsolver, I would need a complex set of services and hire people to write Scala/Python. It would take 6-9 months to ramp 2-3 developers before we could even plan to go to production. With Upsolver, we went live 70% faster and we didn’t need to write any code.”