By Destin Valine on May 26th, 2021
Read time: 4 minutes.
Globally, corporations have been spending big on data lakes, racking up an estimated $7.9 billion in costs in 2019. That figure is projected to exceed $20 billion by the mid-2020s, and it will keep growing even as the lakes themselves shift from on-premise to cloud at an ever-increasing rate.
Data lakes were designed to bring multiple sources of data together into a single location so the business could have on-demand reporting from data of acceptable age. Since then, data lakes have been shoehorned into reconciling disparate data in live systems. With this questionable use, engineers have sought to balance timeliness with cost: the more frequent the reconciliation, the less likely the data is stale, but that assurance comes at the cost of network, storage, and compute.
The solutions have taken many forms.
This approach to disparate data sources has increased IT costs, especially the labor required to architect, build, and maintain work-arounds for the very pattern data lakes were designed to solve. Ironically, businesses were sold on the belief that putting this data in a lake would reap financial rewards.
Serve the stakeholders for whom static business intelligence is acceptable. Reduce your acquisition and storage costs. Challenge the sources you have stored, remove the data you don't need, and set reasonable retention practices. Treat stored data as static within a refresh interval that meets your return-on-investment goals.
Reverse the data lake paradigm. Embrace, when possible and practical, the staleness inherent in your data lake's design, and combine the lake with a healthy on-demand data acquisition and action strategy. Stream the lake's data and join it with real-time systems of record (SoRs) as needed. Using this pattern, you keep your organization's data close at hand while combining it with the critical systems your business relies on. As data streams to your operators, augment it with other contextual data so they have everything they need to perform their jobs effectively.
The primary complaint about this hybrid model has been that the integration cost is duplicated each time an external system is connected.
Software-savvy organizations place abstractions between systems and applications; data-savvy organizations place data lakes there. Low-code solutions like edgeCore™ offer a new option: leave the data where it belongs and create a powerful decisioning platform for your users. Combine streams of data into impactful visualizations that drive decisions, bringing data lakes together with on-demand data and decision support.
edgeCore™ enables companies to utilize their lakes, placing aggregations and other lake-performant capabilities within easy reach. Adding data to a pipeline is a matter of creating the node, setting the connection properties, and publishing it. Connecting to APIs, text files, shell results, and more follows the same pattern.
Additionally, with the data sources published, a user with a modest level of SQL experience can custom-craft the user experience, reducing, securing, and transforming captured data.
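To illustrate the kind of SQL-level reduction and transformation described above (this is a generic sketch, not edgeCore™ itself; the table and column names are hypothetical), a modest transform might pare a published source down to only the fields and rows a user group acts on:

```python
import sqlite3

# Hypothetical published source: raw alert records from a data lake feed.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE alerts (id INTEGER, severity TEXT, host TEXT, raw_payload TEXT)"
)
conn.executemany(
    "INSERT INTO alerts VALUES (?, ?, ?, ?)",
    [
        (1, "critical", "db01", "..."),
        (2, "info", "web02", "..."),
        (3, "critical", "web03", "..."),
    ],
)

# A modest SQL transform: reduce to the columns operators need,
# filter out low-value rows, and normalize values for display.
rows = conn.execute(
    """
    SELECT id, UPPER(severity) AS severity, host
    FROM alerts
    WHERE severity = 'critical'
    ORDER BY id
    """
).fetchall()

for row in rows:
    print(row)
```

The same idea scales up: by securing and reducing data at the query layer, each audience sees only the slice of the source that is relevant to its decisions.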
Most importantly, when seeking to reduce dependence on the lake, the selected data, pared down to only what matters to the users acting on it, can be augmented. This enables dramatic improvements to corporate applications. Augmentation looks up infrequently changing reference data and joins it to frequently changing data, efficiently connecting specific records to business support systems and AI. Through this connection, businesses reduce the costs of processing unneeded information.
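The augmentation pattern described above can be sketched generically (again outside edgeCore™; the tables and fields below are hypothetical) as a join between frequently changing operational data and a slowly changing reference table that is looked up infrequently:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Slowly changing reference data: looked up infrequently and cacheable.
conn.execute("CREATE TABLE hosts (host TEXT PRIMARY KEY, owner TEXT, tier TEXT)")
conn.executemany(
    "INSERT INTO hosts VALUES (?, ?, ?)",
    [
        ("db01", "dba-team", "gold"),
        ("web03", "web-team", "silver"),
    ],
)

# Frequently changing operational data: acquired on demand.
conn.execute("CREATE TABLE metrics (host TEXT, cpu_pct REAL)")
conn.executemany(
    "INSERT INTO metrics VALUES (?, ?)",
    [
        ("db01", 97.5),
        ("web03", 41.0),
    ],
)

# Augment: attach contextual reference data only to the records
# that actually need action, rather than enriching everything.
augmented = conn.execute(
    """
    SELECT m.host, m.cpu_pct, h.owner, h.tier
    FROM metrics m
    JOIN hosts h ON h.host = m.host
    WHERE m.cpu_pct > 90
    """
).fetchall()

print(augmented)
```

Because the enrichment happens only on the reduced, high-value subset, the downstream support system or AI never pays to process records no one will act on.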
In this month’s CTO Masterclass, join Destin Valine as he explores, explains, and demonstrates the “Augment Transform” feature in the edgeCore™ platform. Destin will explain how reduction and targeting can help you identify the routes needed to act efficiently on data, how this process can reduce your dependence on frequent updates to otherwise low-value data, and how these changes, taken to their logical conclusion, will help organizations utilize their systems more efficiently and effectively.