In a previous article, we covered one of the main challenges in machine learning: the need to set up, maintain, and orchestrate two separate ETL flows, one for offline processing to create the training dataset, and one for real-time serving and inference.
Amazon Redshift remains one of the most popular cloud data warehouses and continues to receive new features and capabilities. According to a recent press release, over 10,000 companies worldwide use Redshift as part of their AWS deployments.
Updating existing records (upserts) and deleting data are basic operations in a database, but they are surprisingly difficult to perform in data lake storage. In this article, we explain why upserts are challenging on a data lake, and how we built a solution that enables efficient, fast update and delete operations on object storage using Upsolver’s SQL-based data transformation engine.
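To see why this is hard, consider what a single-row update looks like when files on object storage are immutable. The minimal sketch below is illustrative only, not Upsolver's actual implementation: it assumes a hypothetical two-column `users.parquet` table keyed on `user_id` and uses pandas for brevity, showing the copy-on-write pattern a data lake upsert has to follow.

```python
import pandas as pd

# Files on object storage (e.g., Amazon S3) are immutable: there is no way
# to modify a single row in place. An upsert therefore means copy-on-write:
# read the affected file, merge the change in memory, and write a
# replacement file.

# Hypothetical table for illustration, keyed on user_id.
existing = pd.read_parquet("users.parquet")
updates = pd.DataFrame([{"user_id": 42, "email": "new@example.com"}])

# Rows whose key appears in `updates` replace the old version;
# keys not present in `existing` are inserted.
merged = (
    pd.concat([existing, updates])
      .drop_duplicates(subset="user_id", keep="last")
)

# The entire file is rewritten even though only one row changed.
merged.to_parquet("users.parquet")
```

At data lake scale this read-merge-rewrite cycle has to be coordinated across many large files and concurrent writers, which is exactly the bookkeeping an engine has to handle for you.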
Data lakes are a cornerstone of modern big data architecture, but getting them right can be tricky. How do you design a data lake that will serve the business, rather than weigh down your IT department with technical debt and constant data pipeline rejiggering? In this article we cover the four essential principles for effectively architecting your data lake.
Recent surveys show that the data lake market is expected to grow to $20.1 billion by 2024, with a growing number of organizations looking to deploy a data lake in the coming years. Despite this growing interest in big data initiatives, a roadblock many organizations run into is the complex, manual nature of building a data lake, which requires hiring skilled engineers who are in short supply.
Amazon CloudWatch is a monitoring service for AWS cloud resources and the applications you run on AWS. While CloudWatch lets you view logs and track basic metrics, it’s often necessary to perform additional operations on the data, such as aggregation, cleansing, and SQL querying, which CloudWatch does not support out of the box.
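A common workaround is to pull the raw events out of CloudWatch Logs and process them elsewhere. The sketch below, which assumes a hypothetical log group named `/aws/lambda/my-function` and standard AWS credentials, uses boto3's `filter_log_events` paginator to stream events out for downstream cleansing and aggregation:

```python
import boto3

# Read raw log events out of CloudWatch Logs so they can be cleansed,
# aggregated, or queried with SQL in a downstream system.
logs = boto3.client("logs")

# filter_log_events is paginated; the paginator handles continuation
# tokens so we can iterate over the full log group.
paginator = logs.get_paginator("filter_log_events")
for page in paginator.paginate(logGroupName="/aws/lambda/my-function"):
    for event in page["events"]:
        # `timestamp` is epoch milliseconds; `message` is the raw log line,
        # which still needs parsing before it can be aggregated or queried.
        print(event["timestamp"], event["message"])
```

From there, the events can be landed in object storage and transformed like any other streaming source.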