Data engineering is a relatively new field within computer science, and is tied closely to data science on the one hand and to DevOps and DBA work on the other. Keeping up with the latest trends and developments can help ensure that your organization works with the best possible infrastructure for handling big data. To help you do so, we’ve gathered this list of the 14 best data engineering resources – from podcasts to blogs to video libraries. Enjoy!
The increasing popularity of podcasts has not skipped the field of data engineering – there aren’t many out there, but the few that exist are highly recommended. Our favorites include:
Data Engineering Podcast
This podcast is one of our favorite ways to stay abreast of industry goings-on and to hear in-depth discussions of new products and developments in data engineering. The Data Engineering Podcast, produced and hosted by Tobias Macey, a DevOps manager at MIT, is an absolute must-listen for anyone who wants to keep up with data engineering trends or dive deeper into the field.
Visit the website: https://www.dataengineeringpodcast.com/
Software Engineering Daily
True to its name, this is a daily podcast about software engineering, but many episodes are highly pertinent to data engineering, with commonly discussed topics including NoSQL, infrastructure optimization and AI architecture. The podcast features a wide variety of guests – both established thought leaders and up-and-coming developers – and, with tons of new content published each week, is another great way to keep up with the industry.
Visit the website: https://softwareengineeringdaily.com/
Engineering and Technical Blogs
Learning by example is always a good way to deepen your knowledge – and luckily, some of the most data-intensive companies regularly publish great educational content on their engineering blogs:
Pinterest Engineering Blog
Data engineers at Pinterest tackle a variety of complex data engineering challenges, including Kafka optimization, real-time anomaly detection, knowledge graphs and building a Kubernetes platform.
Visit the blog: https://medium.com/@Pinterest_Engineering
Yelp Engineering and Product Blog
The architecture that powers Yelp’s recommendation systems is fascinating, and this blog offers a glimpse into the types of complex architectures Yelp’s data engineers build using Kafka, Cassandra, TensorFlow and other technologies.
Visit the blog: https://engineeringblog.yelp.com/
Uber Engineering Blog
Uber is one of the companies that have really epitomized big data’s impact on the world around us (for better or worse), and the Uber Engineering blog contains tons of useful information on how the company manages infrastructure for truly massive volumes of data.
Visit the blog: https://eng.uber.com/
The Netflix Tech Blog
Netflix is another company changing traditional industries with big data. Its technical blog details how the streaming giant runs its cloud data architecture to power personalized recommendations and other aspects of its service.
Visit the blog: https://medium.com/netflix-techblog
Additional publications and individual authors you should be following:
AWS Big Data blog
While obviously most relevant to AWS users, the AWS blog is highly technical and oriented towards solving specific engineering problems, with very little in the way of promoting specific AWS products. It publishes a lot of content by both AWS writers and guest contributors and is worth keeping an eye on.
Visit the blog: https://aws.amazon.com/blogs/big-data/
Maxime Beauchemin
This blog is written by a former data engineer at Facebook and Airbnb and the creator of Apache Airflow and Apache Superset. He writes about data engineering as a discipline and the day-to-day life of a data engineer, as well as more technical content related to Airflow.
Visit the blog: https://medium.com/@maximebeauchemin
O’Reilly on Data
The renowned technical publisher behind the Strata data conference regularly publishes excellent in-depth content related to data engineering and data science topics.
Visit the website: https://www.oreilly.com/radar/topics/data/
Communities and Content Hubs
The following websites mostly curate user-generated content around data engineering and data science, but we like their editorial standards and the selection of articles they offer:
Towards Data Science
A community devoted to “concepts, ideas and codes”. While much of the content skews towards the data science and analytics side, there’s plenty of data engineering goodness to be found here, such as in-depth guides to Apache Spark.
Visit the website: https://towardsdatascience.com/
Analytics Vidhya
A website dedicated to analytics learning, the Analytics Vidhya community offers a mix of content around various data-related topics. While the content is of varying quality and relevance, you can find some excellent contributions on the site.
Visit the website: https://www.analyticsvidhya.com/blog/
KDnuggets
Don’t be fooled by its simple and unassuming design – KDnuggets is one of the best places on the web for data science, analytics and AI content. Gregory Piatetsky-Shapiro and his team regularly curate original research and in-depth, thought-provoking articles contributed by users.
Visit the website: https://www.kdnuggets.com/
Prefer moving pictures? Check out some of these YouTube collections:
Martin Kleppmann
Martin Kleppmann is an established thought leader on stream processing and distributed computing – you can definitely learn something from watching one of his video lectures.
Watch the videos: https://www.youtube.com/results?search_query=martin+kleppmann
PyData
A YouTube channel for the community of users and developers of data analysis tools. Tons of new video content every week covering a wide variety of topics.
Watch the videos: https://www.youtube.com/user/PyDataTV/videos