An Alternate Approach to Generating Salted Hashes in Power Bi

Table of Contents

Challenge

In an earlier post I wrote about how I used R to generate hashes in Power Bi. The approach worked well, the need to use a personal gateway meant that it didn't fit my use case. I needed to find an alternate approach.

Solution

I developed two custom functions that kept the functionality self contained to the report. This approach also supports automatic scheduled refreshing of the data in the Power BI service.

getPseudoHash

The basis for the first function is a code snippet I found in the Power BI Community. Purists will argue that this function is misusing using a CRC value. I argue that it is similar enough to a hash function for this use case.

( clearText as text ) => Binary.ToText(
    Binary.FromList(
        List.FirstN(
            List.LastN(
                Binary.ToList(
                    Binary.Compress( Text.ToBinary( clearText, BinaryEncoding.Base64), Compression.GZip)
                ),
            8),
        4)
    ),
BinaryEncoding.Hex)

generateRandomString

The second function is based on the comments in this blog post. The random string that it generates is used as a salt.

let
    Source = () => let
        // Specify the length of the random string.
        StringLength = 32,

        // Specify the list of characters to choose from.
        ValidCharacters = Text.ToList( "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789!@#$%^&*()" ),

        // Accumulate the output of the generate list function into the final string
        Source = Text.Combine(
            List.Transform(
                { 1..StringLength },
                each ValidCharacters { Number.IntegerDivide(
                        Number.RandomBetween( 0, List.Count( ValidCharacters ) -0.01 ), 1
                    )
                }
            )
        )
        // Return the random string.
    in Source
in
    Source

Putting them together

The screen capture below shows how I'm using these two functions. I've had to blur out some of the identifying information in the query.

An example of how I use these two custom functions

An example of how I use these two custom functions

I've added numbers to highlight the important lines. Specifically:

  1. Sorting the rows to make further analysis easier to verify.
  2. Cleaning the values that I need to hash.
  3. Trimming the values.
  4. Converting to lowercase the values.
  5. Replacing all spaces with an empty string.
  6. Generating a random string using the generateRandomString and storing it in a variable.
  7. Using the getPseudoHash to generate a hash using the value combined with the random string.
  8. Removing the original values.

Benefits

The benefits of this approach are similar to the earlier approach:

  1. The report supports trend analysis by using the hashed values as keys.
  2. Removal of the PII maintains the privacy of the people.
  3. Salted hash values do not support reverse engineering the PII.
  4. The salted hash is random on each refresh, and is not stored in the report.

The biggest benefit is that it can be used in the Power BI service without the need for a data gateway.