As the end of the year rapidly approaches, it’s time to take a look at what the next one might have in store.
We’ve spent the past 12 months talking to hundreds of organizations about their streaming data issues: storage, analysis, governance and more. In addition to incredible new customers, these conversations have helped us understand what companies are doing with their streams today, and what they hope to achieve tomorrow.
Here are our predictions for the top trends that will dominate the conversation around streaming data in 2019.
1. Industry interest in streaming goes beyond the usual suspects
When we started Upsolver about five years ago, most of the interest in analyzing streaming data came from ad-tech: programmatic, real-time bidding, mobile monetization... companies in this space have always worked with high-velocity, high-volume data. Companies that work with IoT sensors were in a similar boat. But outside these core verticals, streaming wasn’t much of a thing for most companies.
This is no longer the case. With the continued growth of SaaS, web and mobile applications, alongside the increased interest in data science and advanced analytics, we are seeing a far broader range of companies involved in streaming. Today it’s difficult to find a mid-size or larger enterprise that doesn’t have a streaming data project on the horizon - driven by a desire to analyze web or product logs, clickstream data, customer information and a myriad of other use cases.
2. Separation of storage and compute
The enterprise data warehouse has been losing its luster for a while, but now more than ever we are seeing the database break apart into separate components. Data teams are gravitating towards clusters and distributed computing, and towards solutions that let them leverage inexpensive storage that is decoupled from compute resources - including cloud object storage from Amazon, Microsoft and Google, as well as cloud data warehouses such as Snowflake and Google BigQuery, which are built around this separation.
Streaming data is a particularly good fit for this trend, as it requires analysts to “fish” for insights. A single event is almost never insightful - rather, a large amount of data needs to pile up into a haystack before you go searching for the needle. Use cases are fluid, innovation-driven and not always predictable, which creates a further need to store all the data now. Separation of storage and compute is key to this approach, and will continue to grow in popularity throughout 2019 and beyond.
3. Data science drives schemaless development
Data science projects are often exploratory in nature, relying on machine learning and neural networks to uncover insights that traditional analysis would not have uncovered. Unlike the data analyst of yesteryear, a data scientist does not always know the questions she is going to ask when she approaches a new dataset. This creates an acute need for storing data in its original state, without imposing schema or performing transformations as it is ingested.
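The schema-on-read idea described above can be sketched in a few lines of Python. This is a hypothetical illustration with invented event data, not any specific product's API: raw events are stored verbatim at ingestion, and structure is imposed only when a question is finally asked.

```python
import json

# Raw events are kept exactly as they arrive -- no schema is imposed
# at ingestion time, so new or missing fields never break the pipeline.
raw_events = [
    '{"user": "a1", "action": "click", "page": "/home"}',
    '{"user": "b2", "action": "purchase", "amount": 19.99}',   # new field appears
    '{"user": "a1", "action": "click", "page": "/pricing", "referrer": "ads"}',
]

def query(events, **filters):
    """Apply structure at read time: parse each stored event and keep
    those whose fields match the given filters. Records lacking a
    filtered field simply don't match -- nothing fails."""
    for line in events:
        record = json.loads(line)
        if all(record.get(k) == v for k, v in filters.items()):
            yield record

# A question the data scientist didn't know she'd ask at ingestion time:
clicks = list(query(raw_events, action="click"))
print(len(clicks))  # 2
```

Because the raw JSON is preserved, a later question about, say, `referrer` can be answered without re-ingesting anything.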
Streaming data and data science go hand-in-hand: recommender systems, predictive decisioning and predictive forecasting are all built on large-scale analysis of event-based information. Hence we will see more organizations adopting schemaless development paradigms to facilitate more agile and effective data science.
4. Data management and governance become key concerns
At the end of 2018, most organizations still have shockingly little insight into their streaming data. Developers often rely on convoluted manual coding to pull periodic samples from each stream, and end up with only a vague notion of what they are actually bringing in to their data lakes.
As the big data landscape continues to mature and pervade additional industries, this is likely going to change. Organizations that work with streaming data will spend more time and resources on data discovery, governance and access management in the coming year.
5. Data plumbing is unsustainable
Organizations have been pouring resources into big data projects for the past decade; in recent years, the conversation has increasingly shifted towards seeing a clear return on that investment. With this came the realization that companies can’t keep spending endless time and money on ‘data plumbing’: organizations want to spend less on data engineering, and more on extracting insights from their data.
This trend manifests itself in growing demand for managed services - storage, stream processing and orchestration come easily to mind - as well as self-service tools for transforming, analyzing and visualizing streaming data. These tools allow data platform teams to empower data analysts, data scientists and business users to be self-reliant, rather than writing endless ETL jobs.
6. More users need access to streaming data
Streaming data used to be the domain of a very small group of people within the organization - big data engineers and data scientists. These highly skilled individuals were familiar with the complex toolset needed to work with streams - Spark, Flink, MapReduce, Scala, etc. - while BI and business analysts focused on running SQL queries against relational databases.
In 2019, this will no longer be the case. As additional business processes generate and rely on streaming sources, businesses expect to be able to work with this data as they do with any other dataset - in interactive dashboards, ad-hoc analytics, and software development workflows. This creates a need to make data accessible to the masses within the organization (which in turn drives the aforementioned demand for self-service).
7. Rapid transition from batch to streaming architectures
Batch processing has always been the bread and butter of stream analysis. A 12-24 hour turnaround on an analytical query over streaming data used to be perfectly reasonable for most organizations - or at the very least, an accepted limitation of technology and resources.
Today’s businesses are less patient, and this is likely to become even more pronounced in the near future. Data teams are expected to deliver insights in or near real time - especially in domains such as IoT analytics, where insights are perishable and require immediate action. To meet this need for timely information, we are going to see more companies replacing batch processing with streaming architectures.
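The batch-versus-streaming distinction above can be sketched in a few lines of Python with invented sensor data (the names and numbers here are illustrative, not from any real system): a batch job recomputes its aggregate over everything collected so far on a schedule, while a streaming consumer updates running state as each event arrives, so the answer is always current.

```python
from collections import defaultdict

# Invented example events: (sensor id, temperature reading)
events = [
    ("sensor-1", 20.5),
    ("sensor-2", 18.0),
    ("sensor-1", 21.0),
]

def batch_average(all_events):
    """Batch style: periodically re-scan the full dataset and
    recompute the per-sensor average from scratch."""
    totals, counts = defaultdict(float), defaultdict(int)
    for sensor, value in all_events:
        totals[sensor] += value
        counts[sensor] += 1
    return {s: totals[s] / counts[s] for s in totals}

class StreamingAverage:
    """Streaming style: maintain running state and fold each event
    in as it arrives, so the aggregate is fresh after every update."""
    def __init__(self):
        self.totals = defaultdict(float)
        self.counts = defaultdict(int)

    def update(self, sensor, value):
        self.totals[sensor] += value
        self.counts[sensor] += 1
        return self.totals[sensor] / self.counts[sensor]

stream = StreamingAverage()
for sensor, value in events:
    stream.update(sensor, value)

streaming_result = {s: stream.totals[s] / stream.counts[s] for s in stream.totals}
# Both approaches agree on the final answer -- the streaming version
# simply had it available the moment each event arrived.
print(batch_average(events) == streaming_result)  # True
```

Real streaming engines add windowing, fault tolerance and exactly-once semantics on top of this basic idea, but the core trade-off - scheduled recomputation versus incremental state - is the one driving the architectural shift.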
Want to learn more about streaming architectures? Check out our previous posts on Apache Kafka with and without a data lake, or the 5 signs you've outgrown your data architecture; or schedule a call with one of our experts to learn how you can gain the most value out of your streaming data in 2019.