Founded in 2000, Clearly is the biggest online eyewear retailer in Canada, with customer bases in Canada, the USA, Australia and New Zealand. They sell prescription glasses, sunglasses, contact lenses, and other eye health products, and provide access to innovative, customer-centric tools like Virtual Try On and same-day contact lens delivery. They also have retail stores across Canada where customers can shop and access optometrist services.
Clearly believes everyone deserves to see, and everything they do serves their mission to eliminate poor vision. They strive to use their platform to make eyewear affordable, make vision care accessible, and spread awareness about the importance of eye health.
As a large-scale operation that is mostly online but also runs retail stores, Clearly needs to move quickly and make data-driven decisions in real time: for example, to identify whether a promotion is working as expected, whether a website is down, whether there is an issue with payments, or whether there are anomalies in business metrics. It must then be able to act on that information within hours rather than days.
In order to respond to events as they happen, avoid costly errors and ensure a high level of performance and customer satisfaction, Clearly needs timely access to data. The data and analytics team needs to be able to surface insights coming from multiple sources in order to engage executives and intervene when a business-critical issue arises.
Salman Hameed is the Director of Data & Analytics at Clearly, and leads a growing team of analysts, data scientists, data engineers and machine learning engineers. He set out to build a best-in-class data platform for ecommerce, built on the following principles:
Clearly’s source data resides on several systems, including ERP, fulfillment and order management tools, all managed on SQL Server databases. To enable near real-time analytics, reporting and machine learning, data needs to be moved to Amazon S3 and made available to several target systems, including Amazon QuickSight, Microsoft Power BI, and Amazon SageMaker.
Clearly uses AWS Database Migration Service (AWS DMS) to replicate its various source databases into an Amazon S3 bucket. Upsolver then moves and processes the data that arrives in the landing zone, preparing it for curation. The resulting datasets relate to different business processes, such as demand sales, shipped sales, cost, margin and product attributes.
All detailed curated datasets reside on Amazon S3 and can be queried instantly with Amazon Athena thanks to Upsolver’s integration with the Glue Data Catalog. Using Athena, Clearly can provide timely data to support a wide range of analytic use cases:
One of the most challenging aspects of building the cloud architecture at Clearly was the need to update data on S3. S3 is designed as immutable object storage: events are written in order of ingestion rather than in order of occurrence, and past data cannot easily be updated.
The upsert challenge proved extremely difficult to solve: doing so meant managing a dedicated EMR cluster with Apache Hudi, which in turn would require extensive custom coding. With Upsolver, Clearly was able to abstract this complexity into a few clicks in a visual interface.
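To make the upsert problem concrete, here is a minimal Python sketch of the logic that has to run somewhere when change events sit in append-only storage. It is an illustration only, not Upsolver's or Hudi's implementation. The `ChangeEvent` fields, the `I`/`U`/`D` operation codes and the sample order data are all hypothetical, and the timestamps are assumed to be ISO-8601 strings in a single time zone so that they compare correctly as text.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass(frozen=True)
class ChangeEvent:
    """One replicated change record (hypothetical schema, for illustration)."""
    op: str                # "I" = insert, "U" = update, "D" = delete
    order_id: int          # primary key of the source row
    updated_at: str        # when the change occurred at the source (ISO-8601)
    status: Optional[str]  # row payload; None for deletes


def materialize_latest(events: Iterable[ChangeEvent]) -> Dict[int, ChangeEvent]:
    """Collapse an append-only change log into the current state per key.

    Events are read in ingestion order, but the winner for each key is chosen
    by source timestamp, so a late-arriving older change never overwrites a
    newer one. Keys whose latest change is a delete are dropped.
    """
    latest: Dict[int, ChangeEvent] = {}
    for event in events:
        current = latest.get(event.order_id)
        # On equal timestamps the later-ingested event wins.
        if current is None or event.updated_at >= current.updated_at:
            latest[event.order_id] = event
    return {key: ev for key, ev in latest.items() if ev.op != "D"}


# Ingestion order differs from occurrence order: the "paid" update for
# order 1 arrives after the "shipped" update, although it happened first.
events = [
    ChangeEvent("I", 1, "2021-03-01T10:00:00", "placed"),
    ChangeEvent("I", 2, "2021-03-01T10:05:00", "placed"),
    ChangeEvent("U", 1, "2021-03-01T11:00:00", "shipped"),
    ChangeEvent("U", 1, "2021-03-01T10:30:00", "paid"),
    ChangeEvent("D", 2, "2021-03-01T12:00:00", None),
]

table = materialize_latest(events)
for order_id, event in table.items():
    print(order_id, event.status)
```

Even in this toy form, the whole history of a key has to be considered before its current value is known. On S3 this means rewriting or compacting files rather than updating a row in place, which is the work that would otherwise fall to custom code.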