Deriving a migration order from the Jaccard index of properties

Many factors come into play when planning the sequence of multiple migrations to the cloud. Whether the objects are separate applications or the tenants of a platform, a sequence must be designed in each case. A good strategy is to start with a simple case and then gradually increase the complexity by moving on to similar applications or tenants. To determine how similar two objects are, suitable properties can be selected and compared using the Jaccard Index.

In the following example, the migration order is to be defined for the customers of a platform, where each customer uses only some of the platform's features. Assuming that every feature causes the same amount of migration effort, you should start with customers who use few features. Customers with the same or only slightly more features can be migrated afterwards. The mapping table contains the assignment of the 27 features to the 17 customers:

A 1 means that the customer uses the feature; a 0 means that the customer does not need it. Cust8 and Cust16 are therefore good first migration candidates because they use few features. The next customer should be very similar to the customers already migrated. Two customers are very similar if they have their 1s and 0s in the same places: they use the same range of features and can probably be migrated at the same time. A 0, however, only means that a feature is not relevant for that customer. Customers with the same or fewer features than an already migrated customer are therefore also easy to migrate:

Feature	Migrated Customer	Potential Migration Candidate	Hint
Example 1	1	1	similar
Example 2	0	0	similar
Example 3	1	0	similar
Example 4	0	1	not similar
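
The four cases in the table can be expressed as a single comparison. The following Python sketch is only an illustration and not part of the Excel workbook; the function name counts_as_similar is made up for this example. It uses the same >= comparison that the Excel formula in this article relies on.

```python
def counts_as_similar(migrated: int, candidate: int) -> bool:
    """True when the candidate needs nothing the migrated customer lacks.

    Both arguments are 0/1 flags for a single feature.
    """
    return migrated >= candidate


# The four rows of the table above as (migrated, candidate) pairs.
cases = [(1, 1), (0, 0), (1, 0), (0, 1)]
hints = ["similar" if counts_as_similar(m, c) else "not similar" for m, c in cases]

for (m, c), hint in zip(cases, hints):
    print(m, c, hint)
```

Only the last case, where the candidate needs a feature that the migrated customer does not use, is treated as a difference.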

To represent this similarity in Excel, the Jaccard Index can be calculated using a matrix function that compares every customer with every other. A very good guide for this exists on Stack Overflow, although two adjustments were needed. Firstly, if the migrated customer uses a feature and the candidate does not, this should still count as a match. Secondly, the formula must find the elements (features) to be compared in rows. The following formula is entered in cell B2 of the second worksheet, Jaccard:

=IF(COLUMN(B2)>ROW(B2);"";AVERAGE(IF(INDIRECT("Mapping!"&ADDRESS(2;ROW($B2); 1)):INDIRECT("Mapping!"&ADDRESS(28;ROW($B2); 1))>=Mapping!B$2:Mapping!B$28;1;0)))

It must be entered as a matrix function (array formula), i.e. confirmed with CTRL + SHIFT + ENTER. The formula can then be filled down and to the right across the worksheet. The result shows the similarities between the customers in the rows and the customers in the columns:

Cell E8, for example, shows that customer Cust7 is 93% similar to customer Cust4. Cust7 uses 11 features and Cust4 uses 8; Cust4 still needs features 2 and 3, which Cust7 does not use. Of the 27 feature rows compared (rows 2 to 28), 25 therefore meet the condition, and 25/27 is roughly 93%. The formula for cell E8 is as follows:

{
  =IF(
    COLUMN(E8)>ROW(E8);
    "";
    AVERAGE(
      IF(
        INDIRECT("Mapping!"&ADDRESS(2;ROW($B8); 1))
        :INDIRECT("Mapping!"&ADDRESS(28;ROW($B8); 1))
        >= Mapping!E$2:Mapping!E$28;
        1;
        0
      )
    )
  )
}

The curly brackets in lines 1 and 15 are created by entering the formula as a matrix function and do not need to be typed manually.
Lines 2-4 only keep the result tidy when the formula is filled across the sheet: cells whose column number is greater than their row number stay empty, so the formula is only evaluated in the lower left part of the table.
Line 7 calculates the start cell and line 8 the end cell of the source customer's range. The row number (ROW) of $B8 is used as the column number of that customer on the Mapping worksheet. ADDRESS then builds the references for rows 2 to 28 of column 8 (H). Since the column is on the Mapping worksheet, Mapping! is placed in front of the cell coordinates. INDIRECT turns the resulting character string into a real reference.
This column is then compared with column E (line 9). The comparison operator is >=, which means that 0 and 0, 1 and 1, and also 1 and 0 meet the condition. If the condition is true, a 1 is returned, otherwise a 0 (lines 10 and 11).
The matrix function applies this comparison to each feature row of the two columns and then calculates the average.
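
For readers who prefer code to spreadsheet formulas, the same calculation can be sketched in Python. This is a minimal illustration, not a translation supplied with the workbook. The function names are invented, and the two feature sets are hypothetical: they are only chosen to agree with what the article states about Cust7 (11 features) and Cust4 (8 features, including features 2 and 3, which Cust7 lacks).

```python
from typing import Dict, List, Optional

N_FEATURES = 27  # rows 2 to 28 of the Mapping sheet


def flags(used: set) -> List[int]:
    """Turn a set of 1-based feature numbers into a 0/1 column."""
    return [1 if n in used else 0 for n in range(1, N_FEATURES + 1)]


def similarity(row_customer: List[int], col_customer: List[int]) -> float:
    """Share of features where row_customer >= col_customer.

    Mirrors AVERAGE(IF(row_range >= col_range; 1; 0)) from the formula.
    """
    if len(row_customer) != len(col_customer):
        raise ValueError("both customers need the same number of feature flags")
    matches = [1 if r >= c else 0 for r, c in zip(row_customer, col_customer)]
    return sum(matches) / len(matches)


def similarity_matrix(mapping: Dict[str, List[int]]) -> List[List[Optional[float]]]:
    """Fill only the lower left triangle, like IF(COLUMN(...) > ROW(...); ""; ...)."""
    names = list(mapping)
    matrix: List[List[Optional[float]]] = []
    for i, row_name in enumerate(names):
        row: List[Optional[float]] = []
        for j, col_name in enumerate(names):
            if j > i:
                row.append(None)  # cell above the diagonal stays empty
            else:
                row.append(similarity(mapping[row_name], mapping[col_name]))
        matrix.append(row)
    return matrix


# Hypothetical feature sets, consistent with the counts given in the text.
mapping = {
    "Cust4": flags({2, 3, 4, 5, 6, 7, 8, 9}),
    "Cust7": flags({1, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13}),
}

matrix = similarity_matrix(mapping)
e8 = matrix[1][0]  # row Cust7, column Cust4, the counterpart of cell E8
print(round(e8, 2))  # 0.93
```

With these assumed columns, 25 of the 27 comparisons meet the condition, which reproduces the 93% shown for cell E8.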

A 1 in the result therefore means that the customer in the column needs no feature that the customer in the row does not already use. A value close to 1 is a sign of high similarity. For better visualization, conditional formatting can also be used. The entire Excel file MigrationOrderByJaccardIdx.xlsx can be downloaded from GitHub.
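
The article reads the order off the matrix by eye. One possible way to automate that step is a greedy selection, sketched below in Python. This is an assumption for illustration and not a method taken from the workbook: it starts with the customer that uses the fewest features and then repeatedly picks the remaining customer with the highest similarity to any customer already migrated, preferring fewer features on a tie. The customer names and feature flags are made up.

```python
from typing import Dict, List


def similarity(migrated: List[int], candidate: List[int]) -> float:
    """Share of features where migrated >= candidate (the rule from the table)."""
    matches = [1 if m >= c else 0 for m, c in zip(migrated, candidate)]
    return sum(matches) / len(matches)


def migration_order(mapping: Dict[str, List[int]]) -> List[str]:
    """Greedy ordering (an assumed heuristic, not the author's procedure)."""
    remaining = list(mapping)
    # Start with the customer that uses the fewest features.
    first = min(remaining, key=lambda name: sum(mapping[name]))
    order = [first]
    remaining.remove(first)

    while remaining:
        def score(name: str):
            # Best similarity to any already migrated customer;
            # fewer features wins a tie.
            best = max(similarity(mapping[done], mapping[name]) for done in order)
            return (best, -sum(mapping[name]))

        nxt = max(remaining, key=score)
        order.append(nxt)
        remaining.remove(nxt)
    return order


# Hypothetical customers with four features each.
example = {
    "A": [1, 0, 0, 0],
    "B": [1, 1, 0, 0],
    "C": [1, 1, 1, 1],
    "D": [0, 0, 1, 1],
}

order = migration_order(example)
print(order)
```

Other heuristics are equally plausible, for example comparing each candidate with the union of all features migrated so far; the sketch only shows how the similarity values could drive the sequence.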


Thomas Zühlke
