Enterprise data analysis can be performed in many different ways. It is not a single set of defined processes that work for every data set or every type of business question.
In-database analytics is one way to analyze data where the processing of data is performed within a database. It has multiple benefits over traditional analytics techniques, where databases and their processing are separated.
Traditionally, BI data analysis is achieved by moving data out of a database or data warehouse and into a BI product before you can perform the operations to answer your complex business questions.
This approach worked when databases and data warehouses were small enough that extracting the data as an initial step took minimal time and placed no great burden on BI tools.
In 2003, an Encyclopedia of Information Systems article noted: “For large organizations, several gigabytes of data may be recorded in the enterprise database within a week or even a day.”
In the 18 years since, gigabytes have turned into terabytes and petabytes of data collected each week. The volume of business data has exploded due to automated internal logging, IoT data capture, and app and cloud integrations.
The data extraction step now comes at a high cost in performance.
Most BI products still import data from the database before processing it. Despite the performance cost, it remains an easy way to manipulate data, as vendors can create their own custom ways to model and query it.
There are multiple problems BI products encounter with this conventional approach to data analysis.
- Importing large datasets impacts performance
Despite increasing bandwidth capabilities, importing very large datasets to BI platforms takes time. This means you are waiting to perform any analysis. Performance and scalability are both hindered.
- Importing large datasets cuts off rapidly changing data
Lengthy import times result in quickly changing data not being included in your data set. This is a show-stopper if you need near-real-time analysis.
- Security implications with full data import
Your data is at risk while in motion on external networks as it moves from the database to your BI tools. Sending only your analysis results across the network enhances your security and limits the exposure of your valuable business data.
- Proprietary data manipulation techniques
While manipulating data within a BI product may be easy for experienced users, proprietary data modeling and manipulation techniques mean that you have vendor lock-in. You may need to start again from scratch with modeling and analysis if you want to change products.
In-database analytics performs analytical querying in the database itself, rather than importing the data into the BI tool and then operating on it there.
This is a push function from the BI tool rather than a traditional pull.
How do we work directly with the database across relational, NoSQL, and columnar databases, speaking a language they all recognize? The answer is SQL. SQL has been the language of databases since the 1970s and is supported by every database worth its salt.
BI products convert the business question to a SQL query for in-database analytics, then pass it along to the database.
With operations performed in the database rather than in the BI tool, you remove the resource-heavy pull step, and you have portable code within the database that you can take to another tool. From a security standpoint, only the model, the query, and the results travel between the database and your BI tool. The entire dataset is never in motion.
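The pull-versus-push contrast can be sketched in a few lines. This is a minimal illustration using Python's built-in `sqlite3` module as a stand-in for an enterprise database; the table and column names are invented for the example:

```python
import sqlite3

# In-memory database standing in for an enterprise data warehouse.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("east", 100.0), ("east", 250.0), ("west", 75.0), ("west", 125.0)],
)

# Pull: import every row into the BI layer, then aggregate locally.
rows = conn.execute("SELECT region, amount FROM sales").fetchall()
pulled = {}
for region, amount in rows:
    pulled[region] = pulled.get(region, 0.0) + amount

# Push: send one SQL query; only the aggregated result crosses the boundary.
pushed = dict(
    conn.execute("SELECT region, SUM(amount) FROM sales GROUP BY region")
)

print(pulled == pushed)  # True: same answer, far less data in motion
```

Both approaches produce the same totals, but in the push case the database does the heavy lifting and only two summary rows ever leave it.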
Some BI tools use additional data constructs inside the database, such as C++ or Java modules, to perform complex querying. This results in queries that cannot be ported, as the constructs are not standard across the different types of databases, unlike pure SQL.
The cloud has given rise to a slew of business intelligence tools that promise insights by helping you write and organize SQL queries. These tools are excellent for single-use reports and analyses by technical users who can write SQL. However, they are overly simplistic and do not allow for ad-hoc analyses and reports by business users.
Even with pure SQL, writing queries against large and complex datasets is error-prone, and the resulting code is often not reusable, tricky to understand, and hard to collaborate on.
The bipp approach to in-database analytics is different. bipp enables technical users to specify a data model using SQL fragments and bippLang, our lightweight data modeling language. bippLang is git-enabled, produces reusable code, and simplifies the query creation and collaboration processes.
Once the data model is created, business users can analyze the data on their own. They can create ad-hoc reports and queries, with bipp auto-generating the SQL. Users can create dashboards, view their results using various visual representations, and share their work.
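To make the auto-generation idea concrete, here is a hedged sketch of how a BI tool might turn a business user's point-and-click selection into SQL. This is not bipp's implementation and does not show bippLang syntax; the `build_query` function and the `orders` table are hypothetical, purely for illustration:

```python
import sqlite3

def build_query(table, dimension, measure, aggregate="SUM"):
    """Turn a point-and-click selection into SQL.
    Hypothetical generator; bipp's actual mechanism is not shown here."""
    return (
        f"SELECT {dimension}, {aggregate}({measure}) AS value "
        f"FROM {table} GROUP BY {dimension} ORDER BY value DESC"
    )

# Sample data standing in for a modeled enterprise table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (country TEXT, revenue REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [("US", 500.0), ("DE", 300.0), ("US", 200.0)])

# A business user picks a dimension and a measure; the SQL is generated
# and pushed to the database, which returns only the summary rows.
sql = build_query("orders", "country", "revenue")
print(conn.execute(sql).fetchall())  # [('US', 700.0), ('DE', 300.0)]
```

The business user never writes the `GROUP BY`; they choose "country" and "revenue", and the generated query runs in the database.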
To see in-database analytics in action…