A Case for a Data Modeling Language (and Better Wheels)

A Case for a Data Modeling Language (and Better Wheels)

There’s a lot of talk about point-and-click BI. Some companies claim to make SQL the lingua franca of business (anyone can code!). And others still are talking about AI, ML and the death of SQL.

At bipp, we certainly are looking ahead to the future of BI. And we also tend to be proud of our technical strength. Technical passion and excellence are in our DNA. Most critically, we believe in solving real world problems, not by re-inventing the wheel, but by constantly striving to build a better wheel. And not just because marketers want us to build a better wheel.

Build Features for Users (Not Your Marketing Team)

Going after low-hanging fruit is a worthy endeavor, but if that’s all human beings ever did, Google would not have beaten out all the competitors in the “mature” search engine market of the early 2000s. I imagine someone asking Brin or Page in 2003, “What’s your differentiator? Aren’t you doing the same thing as Alta Vista or MetaCrawler or Lycos?”

What would they answer? Maybe they would refer to their landmark Pagerank paper. Maybe they would wait for the pudding to be eaten.

I am not audacious (read: crazy) enough to compare bipp to Google in it’s world-changing effect. The point is simply this: it is not always easy or possible to come up with a convincing elevator pitch that would clarify a technical advantage.

bipp has several so-called differentiators that make it stand apart from other worthy BI companies. Some of them are easy to describe. Some of them are too technical to fit in a pithy English sentence that business demands. But once you use such a tool, you will not go back.

SQL is SQL - Right?

I will illustrate one such technical strength here: structured SQL. I will try to illustrate the idea with as simple and short an example as possible.

You have 100 stores, each selling a myriad products to customers. You want to compute the average daily customer spend for every store and every customer. You have one table with 6 columns:

date, storeID, customerID, itemID, quantitySold, pricePerItem

Easy, right?

You first need to add up quantitySold * pricePerItem for every (date, storeID, customerID)-triple.
Then you will need to compute the average of those sums over all dates for every (storeID, customerID)-pair.

In a simple SELECT statement, you can compute sums and averages, but not averages of sums! A nested SELECT statement will do the trick:

SELECT storeID, customerID, AVG(spend) AS averageSpend FROM (
  SELECT date, storeID, customerID, 
    SUM(quantitySold*pricePerItem) AS spend 
  FROM yourTable 
  GROUP BY date, storeID, customerID)
GROUP BY storeID, customerID

While a simple SELECT statement is powerful, a lot of the real expressive power of SQL comes from these nested SELECTs. You can call them higher order SQL.

How well does your BI platform support it? If the BI platform has a data modeling language, you can probably define a virtual table (a.k.a. unmaterialized view) for the inner SELECT. When the user chooses the columns for the outer SELECT, the required final SQL statement is produced.

At this point, any BI analyst, who has had the experience of ever-changing requirements, will notice a gotcha: What about filters?


What if you still want to compute the average daily customer spend, but *subject* to one or more of the following *constraints*:
  1. in the last 6 months
  2. only for one specific store
  3. only for a handful of specific items
  4. only for items with price at least $10
Maybe you will add a filter to your outer SQL like so:

SELECT storeID, customerID, AVG(spend) AS averageSpend FROM (
  SELECT ... FROM yourTable ...)
GROUP BY storeID, customerID
WHERE <your condition goes here>

If your data is for 10 years and your condition is A above (last 6 months), most of the data that the inner SELECT will process will be in vain. A similar unnecessary computation cost will be paid if you have 100 stores and your condition is B (one specific store).

If your condition is C or D, then no filter on the outer SELECT will solve your problem.

What you need is a dynamic filter on the inner SELECT.

Does your BI platform allow it? bipp does.

We saw the problem in real world large enterprise scenarios. We did what we do best: come up with a robust technical solution. We built bippLang, a data modeling language that streamlines SQL queries. With appropriate bippLang data models, bipp auto-generates arbitrarily nested SELECT statements, each SELECT remaining accessible to dynamic filtering.

It doesn’t exactly roll off the tongue. But you can’t deny, if you work in SQL every day, it really is a better wheel…

If you want to learn more, visit https://bipp.io/request-demo/ to see bippLang data modeling in action.

Posted under:

Get notified of new posts

Invalid email address.
Thank You! submission has been received.