bipp, like many companies, wouldn’t exist if it weren’t for cloud computing. It gives us agility, enables scale, lowers costs…yet the cloud as we know it today is a relatively new concept. We take Zoom and Netflix for granted, but back (back!) in the 2000s, this new way of thinking about data was in its infancy.
Cast your mind back to those wacky, crazy days. IBM was talking ‘on demand’. Amazon’s Mechanical Turk provided cloud-based services, including something called “human intelligence.” And by 2010, ‘cloud’ was the suffix of choice.
Data, Data, Everywhere
Companies were starting to recognize that data was a valuable asset for influencing business decisions. But most lacked the IT/engineering resources to realize the opportunities. Then the software industry stepped in, offering data analytics through ERP software suites and BI point solutions.
While this helped non-IT teams move faster and generate more value, it also created several problems. These new point tools played by their own rules and didn’t talk across teams, let alone to the data warehouse. Ironically, tools designed to make data easier to manage and to empower business users created more data silos and security concerns.
In response, early data layering tools like Segment and Snowplow disaggregated the data layer from software systems, requiring engineers to collect data into a flexible, centralized warehouse and layer on analytics tools. But the proliferation of individual tools and datasets across teams made it challenging to collect the data, which created a crisis of integration.
There were also issues managing the views in the main data warehouse, and an increasing need to bring in data from a wide variety of sources. The lack of a single source of truth also led to inconsistent metrics and a lack of version control, both of which made it harder to trust the data.
In addition, security breaches and customer concerns forced companies to look at data governance. Historically this function was relegated to the IT department and referred to a process for cataloging large quantities of transactional data. But with business success depending on reliable, secure, and available data, data governance was moving well beyond record-keeping. Clearly, the data explosion needed rules.
A solution to the integration and governance challenges required a robust infrastructure for collecting, transforming, storing, and serving data. And someone with the skillset to make it all work.
The Rise (and Rise) of the Data Engineer
Historically, data engineers were bricklayers - systems builders, data transporters, and back-end engineers. In the pre-cloud days, engineers moved data to and from databases. They would also be on hand to optimize data while servers were upgraded or installed.
Then came the cloud.
Today’s data engineer plays a critical role: architecting and operating the modern data stack by connecting the tools that make it up. They are constantly assessing new tools and figuring out how to integrate them with existing ones.
This requires a comprehensive understanding of data structures and storage technologies, distributed and cloud computing knowledge, and SQL skills. Armed with SQL expertise, data engineers can read and understand database execution plans, and can access, read, manipulate, create, modify, and remove tables, views, and indexes.
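To make the SQL side concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table, view, and index names are hypothetical, and SQLite stands in for whatever warehouse you actually run. It creates a table and a view, then compares the execution plan for the same query before and after adding an index.

```python
import sqlite3

# In-memory database; all table, view, and index names are hypothetical.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

cur.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL)"
)
cur.executemany(
    "INSERT INTO orders (customer_id, amount) VALUES (?, ?)",
    [(1, 20.0), (1, 35.5), (2, 12.0)],
)

# A view that exposes a per-customer total on top of the raw table.
cur.execute(
    "CREATE VIEW customer_totals AS "
    "SELECT customer_id, SUM(amount) AS total FROM orders GROUP BY customer_id"
)


def plan_for(query):
    """Return the human-readable 'detail' column of EXPLAIN QUERY PLAN."""
    return [row[3] for row in cur.execute("EXPLAIN QUERY PLAN " + query)]


query = "SELECT amount FROM orders WHERE customer_id = 1"

# Without an index on customer_id, SQLite has to scan the table.
plan_before = plan_for(query)

# After adding an index, the planner can search it instead.
cur.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
plan_after = plan_for(query)

# Read the view like any other table.
totals = dict(cur.execute("SELECT customer_id, total FROM customer_totals"))

print(plan_before)
print(plan_after)
print(totals)
```

The exact wording of the plan text varies between SQLite versions, so the useful habit is comparing the two plans rather than memorizing the output.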
This makes the data engineering skillset ideal for solving the business-critical integration and governance problems born of the data explosion, problems we’re still dealing with today. As a result, data engineers are becoming critical business and technical partners, especially as companies adopt analytics cultures to get the most from their data.
New Solutions to New Problems
Data engineers now own standards, best practices, and certification processes for data objects, and the role has evolved to match. No longer a bricklayer, the modern data engineer is more of a master builder, working in tandem with data architects to conceptualize, visualize, then build data management frameworks.
Data engineers also increasingly have responsibility for data modeling, bringing real-world context and associations to flat data to make it come alive. Reusable data models provide more contextual insights by building an extra layer of logic over the top of your databases. By linking data sources with a data model to represent a real-life system and its interactions, data engineers can help companies create a more accurate picture.
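As a rough illustration of what a reusable data model adds, the sketch below links two flat, made-up sources (a CRM export and a billing export) into a single Customer object with a derived metric. Every name and value here is hypothetical, and real models usually live in a modeling layer rather than in application code.

```python
from dataclasses import dataclass, field

# Two flat, hypothetical sources: a CRM export and a billing export.
crm_rows = [
    {"customer_id": 1, "name": "Acme", "segment": "enterprise"},
    {"customer_id": 2, "name": "Birch", "segment": "smb"},
]
billing_rows = [
    {"invoice_id": 10, "customer_id": 1, "amount": 500.0},
    {"invoice_id": 11, "customer_id": 1, "amount": 250.0},
    {"invoice_id": 12, "customer_id": 2, "amount": 80.0},
]


@dataclass
class Customer:
    """A modeled entity: CRM attributes plus the invoices linked to it."""
    customer_id: int
    name: str
    segment: str
    invoices: list = field(default_factory=list)

    @property
    def lifetime_value(self) -> float:
        # Business logic defined once in the model and reused everywhere.
        return sum(inv["amount"] for inv in self.invoices)


def build_model(crm, billing):
    """Link the two sources on customer_id."""
    customers = {row["customer_id"]: Customer(**row) for row in crm}
    for inv in billing:
        owner = customers.get(inv["customer_id"])
        if owner is not None:  # skip invoices with no matching customer
            owner.invoices.append(inv)
    return customers


def revenue_by_segment(customers):
    """A contextual metric that neither flat source can answer alone."""
    totals = {}
    for c in customers.values():
        totals[c.segment] = totals.get(c.segment, 0.0) + c.lifetime_value
    return totals


model = build_model(crm_rows, billing_rows)
print(revenue_by_segment(model))
```

Because lifetime_value is defined once on the model, every report that uses it gets the same number.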
But the data boom and our reliance on the cloud wait for no one. Data engineers are also responsible for owning three changes shaping BI’s role in the modern data stack.
Automation of data governance processes
Driven by the growth of self-service models for business operations, BI systems with row and column level security will take the guesswork out of who has permission to access which data. In addition, automating data governance will provide an efficient, user-friendly way for employees across various business functions (marketing, sales, customer support, finance…) to access the data they need.
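Row and column level security can be pictured as a policy applied to every query result. The sketch below is a simplified, hypothetical illustration in plain Python, not how any particular BI platform implements it: each role gets a set of visible columns and a row filter, and unknown roles see nothing.

```python
# Hypothetical policies: visible columns and a row filter per role.
POLICIES = {
    "sales": {
        "columns": {"region", "account", "revenue"},
        # Sales users only see rows for their own region.
        "row_filter": lambda row, user: row["region"] == user["region"],
    },
    "finance": {
        "columns": {"region", "account", "revenue", "margin"},
        # Finance users see every row.
        "row_filter": lambda row, user: True,
    },
}


def apply_policy(rows, user):
    """Return only the rows and columns the user's role is allowed to see."""
    policy = POLICIES.get(user["role"])
    if policy is None:
        return []  # deny by default when the role has no policy
    visible = []
    for row in rows:
        if policy["row_filter"](row, user):
            visible.append(
                {k: v for k, v in row.items() if k in policy["columns"]}
            )
    return visible


rows = [
    {"region": "EMEA", "account": "Acme", "revenue": 100, "margin": 0.4},
    {"region": "APAC", "account": "Birch", "revenue": 70, "margin": 0.3},
]

sales_view = apply_policy(rows, {"role": "sales", "region": "EMEA"})
finance_view = apply_policy(rows, {"role": "finance", "region": "EMEA"})

print(sales_view)
print(finance_view)
```

Keeping the rules in one declared policy, rather than scattered across individual reports, is what lets access decisions be automated and audited.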
Predictive analytics will become BI’s primary role
Predictive analytics, driven by machine learning, will advance organizations’ business intelligence maturity. We’ll see BI platforms prioritize forward-looking, autonomous decision support over descriptive analytics based on old data. Data engineers will drive this shift, writing the code for training and preparing models, and deploying them to live production environments. They’ll build the production infrastructure, including automation, testing, monitoring, and logs.
Accounting for new sources and types of data
The staggering growth of new data sources and types is forcing a constant state of change. As a result, data governance and integration approaches will have to adapt to emerging data sources, from advanced mobile technologies and new social media platforms to countless future developments, so that teams can decide how that data is best housed and leveraged.
Just as e-commerce became ‘commerce,’ cloud computing is now simply ‘computing.’ The challenges of governance and integration are only becoming more complex. And data engineers have become the most critical business and technology partner.
If you want to talk more about data engineering and the future of BI, tweet us @bippanalytics