Data infrastructure exists to protect, secure, preserve, and process the large volumes of structured and unstructured data that the modern company generates. It comprises cloud or managed resources, hardware, and software, together with the data engineers who maintain it and ensure that it serves its fundamental purpose: converting data into meaningful information that forms the basis for strategic decision-making.
The Information Technology industry is exceptionally good at building extensive, complex software applications that are now constructed around data. In other words, the primary value to the business comes from the system’s ability to analyze data, rather than the software itself.
Data infrastructure trends are pushing data analysis into enterprise software applications. Software is now being built to manage data rather than the other way around. This fast-moving trend has had, and continues to have, an impact across the IT industry, including shifts in customer spending. As a result, data and the analysis of data have become valuable currency to business organizations.
Many of today’s tech startups are building products to manage and analyze data. These products are fundamentally analytic systems that support data-driven decisions, paired with data-powered operational systems that collect the data and display the analyzed results.
Unfortunately, there is still tremendous confusion about which technologies and architectures are driving the modern data infrastructure. With that in mind, let’s consider several of the most prominent data infrastructure trends of 2020, since they set the direction for 2021 and beyond.
2021 Trends in Data Infrastructure
1. Gargantuan growth in the data infrastructure industry
The data infrastructure industry has grown aggressively over the last couple of years and will continue to increase rapidly.
The article titled “Emerging Architectures for Modern Data Infrastructure” offers statistical support for this growth. It reports that, according to Gartner, data infrastructure spending reached a record high of $66 billion (USD) in 2019, which represents 24% of all 2019 infrastructure spend. Additionally, Pitchbook reported that the top 30 data infrastructure startups have “raised over $8 billion in venture capital in the last five years at an aggregate value of $35 billion.”
Most of the data that companies generate is stored in the cloud or in data centers across the globe. The website prnewswire.com reports that the worldwide data center infrastructure market is projected to grow at a Compound Annual Growth Rate (CAGR) of “6.79% during the forecast period, reaching a total market size of US$230.169 billion in 2025 from US$155.201 billion in 2019.”
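As a quick sanity check on the projection quoted above, the growth rate can be recomputed from the two market sizes. The following is a minimal sketch: the function name `cagr` is purely illustrative, and it assumes six annual compounding periods between 2019 and 2025.

```python
def cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate as a fraction (0.05 means 5% per year)."""
    return (end_value / start_value) ** (1 / years) - 1


# Figures quoted above, in billions of US dollars.
market_2019 = 155.201
market_2025 = 230.169
periods = 2025 - 2019  # assumption: six annual compounding periods

rate = cagr(market_2019, market_2025, periods)
print(f"Implied CAGR: {rate:.2%}")
```

Under that assumption, the implied rate comes out at roughly 6.79%, which is consistent with the quoted forecast.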
This growth is predominantly driven by the increased demand for cloud computing architecture, such as the data storage offered by Amazon Web Services (AWS) and Google Cloud.
2. Emerging data analytics capabilities
Gartner highlighted the fact that data and data analytics are the blueprints for digital transformation. The leading global organizations use data and its subsequent analysis as “competitive weapons, operational accelerants and innovation catalysts.” Consequently, new and innovative ways of analyzing and interpreting data must be found.
Hence, new data analytics capabilities are emerging, and they require new toolsets and infrastructure to meet these unique data and information requirements.
This trend is partially driven by the need to find new ways to analyze data, a need spurred by the widespread advancement and adoption of the Fourth Industrial Revolution (4IR).
Klaus Schwab of the World Economic Forum describes the impact that 4IR has had, and will continue to have, on the world as we know it:
“We… [are in] …a technological revolution that will fundamentally alter the way we live, work, and relate to one another. In its scale, scope, and complexity, the transformation will be unlike anything humankind has experienced before… It is characterized by a fusion of technologies that is blurring the lines between the physical, digital, and biological spheres. The speed of current breakthroughs has no historical precedent… [4IR] …is evolving at an exponential rather than a linear pace. Moreover, it is disrupting almost every industry in every country.”
3. The confluence of data analytics and AI or machine learning
As noted at the beginning of this article, data infrastructure serves two purposes:
- To provide the necessary information that helps management make better decisions (analytic systems or use cases)
- To build data intelligence, utilizing AI and machine learning, into customer-facing applications (operational systems or use cases)
Two data architectures have developed around these use cases:
- The data warehouse forms the basis for the data analytics use case because it stores data in a structured format, giving users quick and relatively simple access to it.
- The data lake supports the operational use case. It stores data in its raw format, allowing custom-developed data analysis applications that use AI and machine learning to interpret the data and derive meaningful information from it.
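The difference between the two architectures can be illustrated with a toy example. The sketch below is not how any particular warehouse or lake product works. It uses an in-memory SQLite table as a stand-in for a warehouse, a list of raw JSON strings as a stand-in for a lake, and event fields (`customer`, `amount`, `device`, `note`) that are invented for illustration.

```python
import json
import sqlite3

# Hypothetical raw events, as they might land in a data lake (no fixed schema).
raw_events = [
    '{"customer": "a", "amount": 10.0, "device": "ios"}',
    '{"customer": "b", "amount": 5.5}',
    '{"customer": "a", "note": "free-text feedback"}',
]

# Warehouse style: the structure is fixed up front, so only events that
# fit the table are loaded, and they can then be queried simply with SQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (customer TEXT NOT NULL, amount REAL NOT NULL)")
for line in raw_events:
    event = json.loads(line)
    if "customer" in event and "amount" in event:
        conn.execute(
            "INSERT INTO sales VALUES (?, ?)",
            (event["customer"], event["amount"]),
        )
warehouse_totals = dict(
    conn.execute("SELECT customer, SUM(amount) FROM sales GROUP BY customer")
)

# Lake style: everything is kept in raw form, and each application decides
# how to interpret it when it reads the data.
lake_fields = set()
for line in raw_events:
    lake_fields.update(json.loads(line).keys())

print(warehouse_totals)
print(sorted(lake_fields))
```

In this sketch the warehouse path answers a fixed question quickly but drops the free-text event, while the lake path keeps every field available for later, custom analysis.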
Note: After a fifteen-year battle between the two constructs, the data warehouse and the data lake are starting to resemble one another. In his article titled “What is an Open Data Lakehouse? A World Without Monoliths,” Ori Rafael describes how the data warehouse and the data lake are converging to store and analyze data, delivering the benefits of both architectures simultaneously.
As stated at the outset of this discussion, the modern business organization considers data an extremely valuable currency and precious commodity. However, data on its own, without analysis and interpretation, has no value. As a result, the need to analyze and interpret massive data volumes and turn them into useful, valuable information drives the latest trends in data infrastructure.
Succinctly stated, data infrastructure or architecture has to support the storage and analysis of both structured and unstructured data in data lakes, data warehouses, or a combination of both, as in the Upsolver data lakehouse architecture. And as data volumes grow and the variety of data types needing analysis increases, the data architecture and the analysis tools must develop at the same pace.