

Getting Started with the Databricks Unity Catalog Integration

Databricks Unity Catalog allows you to manage and access data in your Databricks account across all of your workspaces. With Immuta’s Databricks Unity Catalog integration, you can write your policies in Immuta and have them enforced automatically by Databricks across data in your Unity Catalog metastore.

Permissions

  • APPLICATION_ADMIN Immuta permission for the user configuring the integration in Immuta.
  • Databricks privileges:
    • An account with the CREATE CATALOG privilege on the Unity Catalog metastore to create an Immuta-owned catalog and tables. For automatic setups, this privilege must be granted to the Immuta system account user. For manual setups, the user running the Immuta script must have this privilege.
    • The Immuta system account user requires the following Databricks privileges:

      • OWNER permission on the Immuta catalog you configure.
      • OWNER permission on catalogs with schemas and tables registered as Immuta data sources so that Immuta can administer Unity Catalog row-level and column-level security controls. To allow for multiple owners, this permission can be applied by granting OWNER on a catalog to a Databricks group that includes the Immuta system account user. If the OWNER permission cannot be applied at the catalog or schema level, each table registered as an Immuta data source must individually have the OWNER permission granted to the Immuta system account user.
      • USE CATALOG and USE SCHEMA on parent catalogs and schemas of tables registered as Immuta data sources so that the Immuta system account user can interact with those tables.
      • SELECT and MODIFY on all tables registered as Immuta data sources so that the system account user can grant and revoke access to tables and apply Unity Catalog row- and column-level security controls.
      • USE CATALOG on the system catalog for native query audit.
      • USE SCHEMA on the system.access schema for native query audit.
      • SELECT on the following system tables for native query audit:

        • system.access.audit
        • system.access.table_lineage
        • system.access.column_lineage
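
For reference, the privileges above correspond to Databricks SQL statements along the following lines. This is a minimal sketch, not Immuta's own setup script: the principal immuta_system@example.com, the group immuta_owners, and the analytics catalog are placeholder names for your own system account, owners group, and data.

    -- Let the system account see, query, and modify registered tables
    GRANT USE CATALOG ON CATALOG analytics TO `immuta_system@example.com`;
    GRANT USE SCHEMA ON SCHEMA analytics.sales TO `immuta_system@example.com`;
    GRANT SELECT, MODIFY ON TABLE analytics.sales.orders TO `immuta_system@example.com`;

    -- Assign OWNER through a group so the catalog effectively has multiple owners
    ALTER CATALOG analytics OWNER TO `immuta_owners`;

    -- Privileges for native query audit
    GRANT USE CATALOG ON CATALOG system TO `immuta_system@example.com`;
    GRANT USE SCHEMA ON SCHEMA system.access TO `immuta_system@example.com`;
    GRANT SELECT ON TABLE system.access.audit TO `immuta_system@example.com`;
    GRANT SELECT ON TABLE system.access.table_lineage TO `immuta_system@example.com`;
    GRANT SELECT ON TABLE system.access.column_lineage TO `immuta_system@example.com`;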

Requirements

Before you configure the Databricks Unity Catalog integration, ensure that you have fulfilled the following requirements:

  • Unity Catalog metastore created and attached to a Databricks workspace. Immuta supports configuring a single metastore for each configured integration, and that metastore may be attached to multiple Databricks workspaces.
  • Unity Catalog enabled on your Databricks cluster or SQL warehouse. All SQL warehouses have Unity Catalog enabled if your workspace is attached to a Unity Catalog metastore. Immuta recommends linking a SQL warehouse to your Immuta instance rather than a cluster for both performance and availability reasons.
  • Personal access token generated for the user that Immuta will use to manage policies in Unity Catalog.
  • No Databricks SQL integrations are configured in your Immuta instance. The Databricks Unity Catalog integration replaces the Databricks SQL integration entirely and cannot coexist with it. If any Databricks SQL integrations are configured, remove them and add a Databricks Unity Catalog integration in their place. Databricks data sources will also need to be migrated if they are defined in the hive_metastore catalog.
  • No Databricks Spark integrations with Unity Catalog support are configured in your Immuta instance. Immuta does not support that integration and the Databricks Unity Catalog integration concurrently. See the Unity Catalog overview for supported cluster configurations.
  • Unity Catalog system tables enabled for native query audit.

Best practices

Ensure your integration with Unity Catalog goes smoothly by following these guidelines:

  • Use a Databricks SQL warehouse to configure the integration. Databricks SQL warehouses are faster to start than traditional clusters, require less management, and can run all the SQL that Immuta requires for policy administration. A serverless warehouse provides nearly instant startup time and is the preferred option for connecting to Immuta.

  • Move all data into Unity Catalog before configuring Immuta with Unity Catalog. The default catalog used once Unity Catalog support is enabled in Immuta is the hive_metastore, which is not supported by the Unity Catalog native integration. Data sources in the Hive Metastore must be managed by the Databricks Spark integration. Existing data sources will need to be re-created after they are moved to Unity Catalog and the Unity Catalog integration is configured.
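
To confirm which catalog a session resolves unqualified table names against (and that you are not still defaulting to hive_metastore), you can run the following from your SQL warehouse or cluster:

    -- Returns the session's current default catalog
    SELECT current_catalog();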

Migrate data to Unity Catalog

  1. Disable existing Databricks SQL and Databricks Spark with Unity Catalog Support integrations.
  2. Ensure that all Databricks clusters that have Immuta installed are stopped and the Immuta configuration is removed from the cluster. Immuta-specific cluster configuration is no longer needed with the Databricks Unity Catalog integration.
  3. Move all data into Unity Catalog before configuring Immuta with Unity Catalog. Existing data sources will need to be re-created after they are moved to Unity Catalog and the Unity Catalog integration is configured. If you don't move all data before configuring the integration, metastore magic will protect your existing data sources throughout the migration process.
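
As an illustration of the move itself, a table in the legacy Hive metastore can be copied into a Unity Catalog table with CREATE TABLE AS SELECT or, for Delta tables, a deep clone. The analytics catalog and the sales.orders schema and table below are placeholders for your own objects:

    -- Create the destination schema and copy a legacy table into Unity Catalog
    CREATE SCHEMA IF NOT EXISTS analytics.sales;
    CREATE TABLE analytics.sales.orders
    AS SELECT * FROM hive_metastore.sales.orders;

    -- Alternatively, deep clone a Delta table to copy its data and metadata
    CREATE OR REPLACE TABLE analytics.sales.orders
    DEEP CLONE hive_metastore.sales.orders;

Because the copied tables are new securables, re-register them as Immuta data sources once they exist in Unity Catalog.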

Configure the Databricks Unity Catalog integration

Existing data source migration

If you have existing Databricks data sources, complete these migration steps before proceeding.

You have two options for configuring your Databricks Unity Catalog integration:

  • Automatic setup: Immuta creates the catalogs, schemas, tables, and functions using the integration's configured personal access token.
  • Manual setup: Run the Immuta script in Databricks yourself to create the catalog. You can also modify the script to customize your storage location for tables, schemas, or catalogs.

Automatic setup

Required permissions

When performing an automatic setup, the Databricks personal access token you configure below must be attached to an account with the following permissions for the metastore associated with the specified Databricks workspace:

  • USE CATALOG and USE SCHEMA on parent catalogs and schemas of tables registered as Immuta data sources so that the Immuta system account user can interact with those tables.
  • SELECT and MODIFY on all tables registered as Immuta data sources so that the system account user can grant and revoke access to tables and apply Unity Catalog row- and column-level security controls.
  • OWNER permission on the Immuta catalog created below.
  • OWNER permission on catalogs with schemas and tables registered as Immuta data sources so that Immuta can administer Unity Catalog row-level and column-level security controls. To allow for multiple owners, this permission can be applied by granting OWNER on a catalog to a Databricks group that includes the Immuta system account user. If the OWNER permission cannot be applied at the catalog or schema level, each table registered as an Immuta data source must individually have the OWNER permission granted to the Immuta system account user.
  • CREATE CATALOG on the workspace metastore.
  • USE CATALOG on the system catalog for native query audit.
  • USE SCHEMA on the system.access schema for native query audit.
  • SELECT on the following system tables for native query audit:

    • system.access.audit
    • system.access.table_lineage
    • system.access.column_lineage
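
Of these, only CREATE CATALOG is granted at the metastore level; the rest mirror the privileges listed under Permissions above. A minimal sketch, with a placeholder principal:

    -- Allow Immuta to create its own catalog during the automatic setup
    GRANT CREATE CATALOG ON METASTORE TO `immuta_system@example.com`;

With those privileges in place, configure the integration in Immuta:
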
  1. Click the App Settings icon in the left sidebar.
  2. Scroll to the Native Integration Settings section and check the Enable Databricks Unity Catalog support in Immuta checkbox. The additional settings in this section are only relevant to the Databricks Spark with Unity Catalog integration and will not have any effect on the Unity Catalog integration. These can be left with their default values.
  3. Scroll to the Native Integrations section, and click + Add Native Integration.
  4. Select Databricks Unity Catalog from the dropdown menu.
  5. Complete the following fields:

    • Server Hostname is the hostname of your Databricks workspace.
    • HTTP Path is the HTTP path of your Databricks cluster or SQL warehouse.
    • Immuta Catalog is the name of the catalog Immuta will create to store internal entitlements and other user data specific to Immuta. This catalog will only be readable by the Immuta service principal, and access should not be granted to other users. The catalog name may only contain letters, numbers, and underscores and cannot start with a number.
    • Exemption Group is the name of a group in Databricks that will be excluded from having data policies applied and must not be changed from the default value. Create this account-level group for privileged users and service accounts that require an unmasked view of data before configuring the integration in Immuta.
  6. Unity Catalog query audit is enabled by default; you can disable it by unchecking the Enable Native Query Audit checkbox. Ensure you have enabled system tables in Unity Catalog and provided the required access to the Immuta system account.

    1. Configure the audit frequency by scrolling to Integrations Settings and finding the Unity Catalog Audit Sync Schedule section.
    2. Enter how often, in hours, you want Immuta to ingest audit events from Unity Catalog as an integer between 1 and 24.
    3. Continue with your integration configuration.
  7. Enter a Databricks Personal Access Token. This is the access token for the Immuta service principal. This service principal must have the metastore privileges listed above for the metastore associated with the Databricks workspace. If this token is configured to expire, update this field regularly for the integration to continue to function.

  8. Click Test Databricks Unity Catalog Connection.
  9. Save and Confirm your changes.

Manual setup

Required permissions

When performing a manual setup, the following Databricks permissions are required:

  • The user running the script must have the CREATE CATALOG permission on the workspace metastore.
  • The Databricks personal access token you configure below must be attached to an account with the following permissions:

    • USE CATALOG and USE SCHEMA on parent catalogs and schemas of tables registered as Immuta data sources so that the Immuta system account user can interact with those tables.
    • SELECT and MODIFY on all tables registered as Immuta data sources so that the system account user can grant and revoke access to tables and apply Unity Catalog row- and column-level security controls.
    • OWNER permission on the Immuta catalog created below.
    • OWNER permission on catalogs with schemas and tables registered as Immuta data sources so that Immuta can administer Unity Catalog row-level and column-level security controls. To allow for multiple owners, this permission can be applied by granting OWNER on a catalog to a Databricks group that includes the Immuta system account user. If the OWNER permission cannot be applied at the catalog or schema level, each table registered as an Immuta data source must individually have the OWNER permission granted to the Immuta system account user.
    • USE CATALOG on the system catalog for native query audit.
    • USE SCHEMA on the system.access schema for native query audit.
    • SELECT on the following system tables for native query audit:

      • system.access.audit
      • system.access.table_lineage
      • system.access.column_lineage
  1. Click the App Settings icon in the left sidebar.
  2. Scroll to the Native Integration Settings section and check the Enable Databricks Unity Catalog support in Immuta checkbox. The additional settings in this section are only relevant to the Databricks Spark with Unity Catalog integration and will not have any effect on the Unity Catalog integration. These can be left with their default values.
  3. Scroll to the Native Integrations section, and click + Add Native Integration.
  4. Select Databricks Unity Catalog from the dropdown menu.
  5. Complete the following fields:

    • Server Hostname is the hostname of your Databricks workspace.
    • HTTP Path is the HTTP path of your Databricks cluster or SQL warehouse.
    • Immuta Catalog is the name of the catalog Immuta will create to store internal entitlements and other user data specific to Immuta. This catalog will only be readable by the Immuta service principal, and access should not be granted to other users. The catalog name may only contain letters, numbers, and underscores and cannot start with a number.
    • Exemption Group is the name of a group in Databricks that will be excluded from having data policies applied and must not be changed from the default value. Create this account-level group for privileged users and service accounts that require an unmasked view of data before configuring the integration in Immuta.
  6. Unity Catalog query audit is enabled by default; you can disable it by unchecking the Enable Native Query Audit checkbox. Ensure you have enabled system tables in Unity Catalog and provided the required access to the Immuta system account.

    1. Configure the audit frequency by scrolling to Integrations Settings and finding the Unity Catalog Audit Sync Schedule section.
    2. Enter how often, in hours, you want Immuta to ingest audit events from Unity Catalog as an integer between 1 and 24.
    3. Continue with your integration configuration.
  7. Enter a Databricks Personal Access Token. This is the access token for the Immuta service principal. This service principal must have the metastore privileges listed above for the metastore associated with the Databricks workspace. If this token is configured to expire, update this field regularly for the integration to continue to function.

  8. Select the Manual toggle and copy or download the script. You can modify the script to customize your storage location for tables, schemas, or catalogs (see the sketch after these steps).
  9. Run the script in Databricks.
  10. Click Test Databricks Unity Catalog Connection.
  11. Save and Confirm your changes.
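
The script itself is generated by Immuta and varies by version, but the storage customization it allows amounts to setting a managed location on the catalog it creates. A hypothetical sketch only, with the catalog name and cloud path as placeholders (a managed location also requires a matching external location configured in Unity Catalog):

    -- Create the Immuta catalog with a custom managed storage location
    CREATE CATALOG IF NOT EXISTS immuta_internal
      MANAGED LOCATION 's3://your-bucket/immuta/';

    -- Hand ownership of the catalog to the Immuta system account
    ALTER CATALOG immuta_internal OWNER TO `immuta_system@example.com`;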

Enable native query audit for Unity Catalog

To enable native query audit for Unity Catalog, complete the following steps before configuring the integration:

  1. Enable the system schema whose <SCHEMA_NAME> is access.
  2. Grant your Immuta system account user access to the Databricks Unity Catalog system tables. For Databricks Unity Catalog audit to work, Immuta must have, at minimum, the following access:

    • USE CATALOG on the system catalog
    • USE SCHEMA on the system.access schema
    • SELECT on the following system tables:

      • system.access.audit
      • system.access.table_lineage
      • system.access.column_lineage
  3. Enable verbose audit logs in Unity Catalog.

  4. In the configuration above, enter a Databricks Personal Access Token for the account to which you just granted system table access. This account will be the Immuta system account user.
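
Once the grants are in place, you can sanity-check them by querying the audit table as the system account from a SQL warehouse; for example:

    -- Confirm the system account can read recent audit events
    SELECT event_time, user_identity.email, action_name
    FROM system.access.audit
    ORDER BY event_time DESC
    LIMIT 10;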

Register your data

Register Unity Catalog tables as Immuta data sources.

Warning

External data connectors and query-federated tables are preview features in Databricks. See the Databricks documentation for details about the support and limitations of these features before registering them as data sources in the Unity Catalog integration.

Protect your data

  1. Map Databricks usernames to Immuta to ensure Immuta properly enforces policies and audits user queries.
  2. Build global policies in Immuta to enforce table-, column-, and row-level security.