Introduction to Semantic Models in Looker
Slawomir ZakrzewskiKeeping analytics consistent is hard. But it’s not magic. If your company, like many others, is experiencing difficulties in maintaining multiple stand-alone models, created independently by different analytical teams, here is what you can do to improve. In this article we will discuss how these issues can be addressed with the help of Looker, an enterprise platform for business intelligence, and then we will show how to create a foundation for a centralized analytical platform with Looker.
In the world of business intelligence many technologies are focusing on the individual freedom of a single analyst and enabling the analyst to slice and dice the data any way they see fit, with little emphasis on creating a coherent analytics environment. This approach, while allowing the analysts to work independently of each other, often does not scale well. When the work of multiple analysts needs to be combined in a larger enterprise, it can lead to inconsistent metrics, conflicting business logic, KPIs not matching across different departments, etc.. These issues can undermine the trust between business users and analytics teams, thus complicating data-driven decision making. One industry-tested way of solving these issues is creating a single source of truth that can be used across the organization. How can we ensure that every analysis in the company uses data that is consistent and reliable? Through the use of Looker!
Looker is an enterprise platform for business intelligence, data applications and embedded analytics aimed at complex data sources. Together with Looker Studio, it forms Google’s family of data-centric products that enable different user groups to make data-driven decisions. Since Looker Studio is aimed at less complex, single or pre-aggregated data sources and one-off visualizations, in this article we will be focusing solely on Looker. The main differentiators in Lookers approach to solving above-mentioned issues is declaring and storing semantic models centrally, and full integration with Git.
The first point is enabling analysts to declare semantic models. For this purpose Looker provides Looker Modelling Language, henceforth called “LookML”. LookML provides a way to describe dimensions, aggregates, calculations and data relationships in your data model. Looker uses its SQL Generator to translate the user’s LookML queries into an SQL query which is then executed against the source database. This process, combined with the fact that LookML is independent of particular SQL dialects, allows the separation of the structure of the queries from the content. Consequently, users can focus on a more high-level understanding of their analysis rather than the specific SQL code.
Secondly, storing the model on a centralized platform enables the users to reuse its elements once they are created. Business users can view the model definition and utilize all the elements of the declared model in their analyses, with only the analysts/developers having editor access. Dimensions and measures, once created, can be reused across different analyses, reports and dashboards. The biggest advantage of the reusability of models and their elements is that it ensures that the metrics are consistent across all analyses within an organization. This in turn is a step towards bolstering the trust between business and analytics. An added benefit of this approach is that it saves time (by analysts not being forced to declare the same metrics multiple times in different analyses).
Thirdly, to further improve the trust in analytics, Looker ensures that all model changes can be tracked, reviewed, approved, and audited. Having the entire semantic model defined in code makes it possible to incorporate tools and development best practices commonly used in software development, which makes Looker well suited for scaling analytics in an organization. Furthermore, Looker integrates seamlessly with Git, which means that analysts can work on their changes to the semantic model on separate development branches. This ability to work simultaneously and track changes in an auditable way is especially important in large companies with a more mature analytical environment where the models are growing complex, support multiple reports for various business users, and the manageability and maintainability of data models is paramount, because it allows for seamless collaboration of large analytical teams. It is worth noting that the concept of defining our analytics with LookML, and version control does not apply only to model definitions, but building visualizations and dashboards as well, which allows large teams to seamlessly collaborate on visual analytics.
During the course of this tutorial article, we will:
- Create a new LookML project containing the semantic data model
- Configure a Git repository for the project
- Adjust the model to add additional measures Make adjustments to the model
- Create our first dashboard
We will use publicly available data on Airbnb listings from http://insideairbnb.com/
Create a new LookML Project
In order to create a LookML project we need to set up a data connection. This operation can be done in the “Connections” menu of the administration panel. When creating a data connection we need to provide details such as the SQL dialect that is used in the database. Looker supports a wide variety of SQL dialects, from T-SQL and PostgreSQL to massively parallel relational database management systems like Amazon Redshift or Google BigQuery. For each type of database, the connection details may vary. We are using BigQuery, therefore, we provide the ID of the GCP project used for billing, and the dataset that will be used in the connection. We also provide an authentication key for the service account that Looker will use to connect to BigQuery.
After the connection is created we can proceed with creating a Looker project. To create a new project, enter “Projects” panel in the “Develop” section of the left hand menu and in the project management panel select “New LookML Project”
Select the connection that was created previously and schema in the datasource. We can choose between including all tables or a single table from the data source. We select “All tables” for our model, which enables Looker to create a model that includes all the tables automatically. Including all tables allows us to use them for any joins that are necessary for our analysis. After creating a LookML project it will contain separate definition files for the model and for each of the views it contains. The model definition includes details about the datasource, which views are included in the model, relationship between tables, and which views are available to explore for the end users. View definitions include details about all the dimensions and measures in the view, defines which column is the primary key etc..
Configure a Git repository for the project
Our next step is to configure a Git repository to ensure that analysts will be able to work on the code in parallel, and all changes to the code will be trackable and auditable via Git. In order to do it we need to provide an existing empty git repository. Looker supports multiple Git services like GitHub, GitLab, Bitbucket, Phabricator Diffusion etc.. For the purpose of this exercise we created a repository using Google Cloud Source Repositories. You can quickly make your own with these instructions: https://cloud.google.com/source-repositories/docs/creating-an-empty-repository
After providing the repository address, Looker will provide an SSH key that needs to be added to the repository.
This operation will enable the “Git Action” menu, which will allow us to create branches for features they are working on, as well as deploy the changes to the production model when they are completed. When creating a new branch, provide the name for the branch as well as indicate which branch should be used as the source. In case the code is not pulled automatically, we may use the “Pull from” operation and pull the code from Production, or another branch in case of a more complex release cycle.
Adjust the model
You might want to adjust your model in order to add additional measures or dimensions that fit your business requirements. Here is how you can do that: for the purpose of this article we will add a location type dimension to the “listings” view, so that we can display the listings on a map type visualization. We may also want to add a new measure that will calculate an average price.
After adding the code we save and validate the changes.
Once the changes are validated they can be committed, and finally deployed to production.
Creating a dashboard
With the initial model established, we are ready to create our first dashboard. Apart from its built-in visualization capabilities, Looker provides integration with other tools (e.g. Looker Studio), so that the semantic models created in Looker can be reused across other tools used in the organization. For the sake of simplicity, we will focus on the in-built Looker capabilities. In order to create a new dashboard we open the main page of the Looker interface and navigate to “My Folder” in the left hand side menu, where we select a new dashboard.
When created the dashboard will be empty. In order to fill it with content, we select the “Edit Dashboard” button, adding new visualizations from the top menu. Next we select the view from the model that the visualization will be based on.
From there we create a map visualization using the Listing Location dimension which we created previously. We can also add additional visualizations, filters etc.. The dashboards are interactive and can be used to dynamically filter other visualizations.
We mentioned before that we can define and modify dashboards using LookML, here’s how that works. We can write LookML dashboard code ourselves or we can review the LookML code reference for the visualizations that were created in the UI. For the purpose of this article let’s view the LookML code for the dashboard we just created.
We use the “Copy to Clipboard” button and put this dashboard directly in the model. To do that lets first create a dedicated folder for the dashboards. Within the newly created folder we create a new dashboard file.
We modify the model file so that the newly created dashboards folder is included. Afterwards we paste the previously copied LookML dashboard code and deploy it to production following the same process (validating, commiting and deploying) as with any other code changes.
After this operation the newly created code-based dashboard is available in the “LookML Dashboards” of the main UI. As mentioned above this dashboard can be modified by editing code, with all the advantages of using Git workflows.
To summarize, we identified that inconsistent analytics are a pain point that can erode trust and complicate data-driven decision making in large organizations. We showed how we can solve this challenge by taking advantage of Lookers semantic model platform and by using Lookers Git integration to ensure that all changes to semantic models and dashboard are trackable, reviewable, and auditable. Looker provides both data visualization and modeling capabilities, with focus on providing a very robust and consistent central data platform and enabling collaboration by taking advantage of Git integration.
Have thoughts on Looker capabilities or are interested in hearing more about the platform? Feel free to let us know in the comments, or reach out to us directly!