Abusing Levenshtein automaton for syncing trees

Dr. Vladimir I. Levenshtein

Interpretation by Jan @Novoj Novotný

Go through presentation with me
http://bit.ly/jopenspace2019

Problem definition

Compute minimal set of operations to apply on left tree to get to the structure of the right one.

Levenshtein distance

Levenshtein distance is a string metric for measuring the difference between two sequences. Informally, the Levenshtein distance between two words is the minimum number of single-character edits required to change one word into the other. It is named after the Soviet mathematician Vladimir Levenshtein, who considered this distance in 1965.

Allowed operations:

  • insert
  • delete
  • substitute

Example:


Levenshtein distance here is 3

Sequencing tree

A A( B C C( D D( E F )D )C G H )A I I( J )I

Applying Levenshtein automaton


Algorithm uses stack of operations instead of plain numbers. Imagine that under the number 3 is following stack of operations:

  • INSERT A(
  • INSERT B
  • DELETE C

Reconstructing tree

  1. removing opening and closing elements and pointing all inner operations to correct parents
  2. replacing pair operations INSERT + DELETE targeting the same element to MOVE operation

And finally apply modification recipe applied on source tree.

Real life usage

Real life usage

Real life usage - the why

  • FG developers create page structure -> Git
  • Customer creates it's own modifications to structure on base of existing page structure
  • FG developers update the page structure -> Git
  • We need to merge original Customer modifications to new page structure pulled from Git
  • We create modification recipe skipping changes to nodes present in Git

Conflicts may still occur.

Thank you for your attention

Sources:

Contact me @Novoj or novotnaci@gmail.com