<input id="0qass"><u id="0qass"></u></input>
  • <input id="0qass"><u id="0qass"></u></input>
  • <menu id="0qass"><u id="0qass"></u></menu>

    算法:編輯距離(萊文斯坦距離)

    算法 專欄收錄該內容
    5 篇文章 0 訂閱

    題目描述

    編輯距離介紹

    編輯距離是針對二個字符串(例如英文字)的差異程度的量化量測,量測方式是看至少需要多少次的處理才能將一個字符串變成另一個字符串。編輯距離可以用在自然語言處理中,例如拼寫檢查可以根據一個拼錯的字和其他正確的字的編輯距離,判斷哪一個(或哪幾個)是比較可能的字。DNA也可以視為用A、C、G和T組成的字符串,因此編輯距離也用在生物信息學中,判斷二個DNA的類似程度。Unix下的diff及patch即是利用編輯距離來進行文本編輯對比的例子。
    編輯距離有幾種不同的定義,差異在可以對字符串進行的處理。

    在萊文斯坦距離中,可以刪除、加入、取代字符串中的任何一個字元,也是較常用的編輯距離定義,常常提到編輯距離時,指的就是萊文斯坦距離。
    也存在其他編輯距離的定義方式,例如 Damerau-Levenshtein 距離是一種萊文斯坦距離的變種,但允許以單一操作交換相鄰的兩個字符(稱為字符轉置),如 AB→BA 的距離是 1(交換)而非 2(先刪除再插入、或者兩次替換)。
    LCS(最長公共子序列)距離只允許刪除、加入字元。
    Jaro 距離只允許字符轉置。
    漢明距離只允許取代字元。 [1]
    

    萊文斯坦距離

    萊文斯坦距離,又稱Levenshtein距離,是編輯距離的一種。指兩個字串之間,由一個轉成另一個所需的最少編輯操作次數。允許的編輯操作包括將一個字符替換成另一個字符,插入一個字符,刪除一個字符。俄羅斯科學家弗拉基米爾·萊文斯坦在1965年提出這個概念。 [1]
    例如將kitten一字轉成sitting:

    sitten (k→s)
    sittin (e→i)
    sitting (→g)
    

    示例

    示例1:
    kitten和sitting的萊文斯坦距離是3。將kitten變為sitting的最小處理方式如下:

    kitten →sitten(將k改為s)
    sitten → sittin(將e改為i)
    sittin → sitting(最后加入g)
    

    若是考慮LCS距離(只考慮加入及刪除),LCS距離是5:

    刪除位在第1個字的k
    在第1個字之前加入s
    刪除位在第4個字的e
    在第4個字之前加入i
    在第6個字之后加入g [1]
    

    示例2:
    輸入:word1 = “horse”, word2 = “ros”
    輸出:3
    解釋:

    horse -> rorse (將 'h' 替換為 'r')
    rorse -> rose (刪除 'r')
    rose -> ros (刪除 'e')
    

    示例 3:
    輸入:word1 = “intention”, word2 = “execution”
    輸出:5
    解釋:

    intention -> inention (刪除 't')
    inention -> enention (將 'i' 替換為 'e')
    enention -> exention (將 'n' 替換為 'x')
    exention -> exection (將 'n' 替換為 'c')
    exection -> execution (插入 'u')
    

    代碼

    import pysnooper
    class Solution:
        # @pysnooper.snoop()  # pysnooper是裝飾器,可以修改函數,作用是打印函數的參數
        def editDistance(self , word1 , word2 ):
            # write code here
            temp = [[i+j for j in range(len(word2)+1)] for i in range(len(word1)+1)]
            for i in range(1,len(word1)+1):
                for j in range(1,len(word2)+1):
                    if word1[i-1] == word2[j-1]:
                        d = 0
                    else:
                        d =1
                    temp[i][j]=min(temp[i-1][j]+1,temp[i][j-1]+1,temp[i-1][j-1]+d)
            return temp[-1][-1]
    
    word1,word2 = "horse","ros"
    s = Solution()
    print(s.editDistance(word1 , word2))
    

    運行過程分析

    Starting var:.. word1 = 'horse'
    Starting var:.. word2 = 'ros'
    11:24:46.749754 call         4     def editDistance(self , word1 , word2 ):
    11:24:46.750755 line         6         temp = [[i+j for j in range(len(word2)+1)] for i in range(len(word1)+1)]
    New var:....... temp = [[0, 1, 2, 3], [1, 2, 3, 4], [2, 3, 4, 5], [3, 4, 5, 6], [4, 5, 6, 7], [5, 6, 7, 8]]
    11:24:46.750755 line         7         for i in range(1,len(word1)+1):
    New var:....... i = 1
    11:24:46.750755 line         8             for j in range(1,len(word2)+1):
    New var:....... j = 1
    11:24:46.750755 line         9                 if word1[i-1] == word2[j-1]:
    11:24:46.750755 line        12                     d =1
    New var:....... d = 1
    11:24:46.750755 line        13                 temp[i][j]=min(temp[i-1][j]+1,temp[i][j-1]+1,temp[i-1][j-1]+d)
    Modified var:.. temp = [[0, 1, 2, 3], [1, 1, 3, 4], [2, 3, 4, 5], [3, 4, 5, 6], [4, 5, 6, 7], [5, 6, 7, 8]]
    11:24:46.750755 line         8             for j in range(1,len(word2)+1):
    Modified var:.. j = 2
    11:24:46.751755 line         9                 if word1[i-1] == word2[j-1]:
    11:24:46.751755 line        12                     d =1
    11:24:46.751755 line        13                 temp[i][j]=min(temp[i-1][j]+1,temp[i][j-1]+1,temp[i-1][j-1]+d)
    Modified var:.. temp = [[0, 1, 2, 3], [1, 1, 2, 4], [2, 3, 4, 5], [3, 4, 5, 6], [4, 5, 6, 7], [5, 6, 7, 8]]
    11:24:46.751755 line         8             for j in range(1,len(word2)+1):
    Modified var:.. j = 3
    11:24:46.751755 line         9                 if word1[i-1] == word2[j-1]:
    11:24:46.751755 line        12                     d =1
    11:24:46.751755 line        13                 temp[i][j]=min(temp[i-1][j]+1,temp[i][j-1]+1,temp[i-1][j-1]+d)
    Modified var:.. temp = [[0, 1, 2, 3], [1, 1, 2, 3], [2, 3, 4, 5], [3, 4, 5, 6], [4, 5, 6, 7], [5, 6, 7, 8]]
    11:24:46.751755 line         8             for j in range(1,len(word2)+1):
    11:24:46.751755 line         7         for i in range(1,len(word1)+1):
    Modified var:.. i = 2
    11:24:46.751755 line         8             for j in range(1,len(word2)+1):
    Modified var:.. j = 1
    11:24:46.751755 line         9                 if word1[i-1] == word2[j-1]:
    11:24:46.751755 line        12                     d =1
    11:24:46.751755 line        13                 temp[i][j]=min(temp[i-1][j]+1,temp[i][j-1]+1,temp[i-1][j-1]+d)
    Modified var:.. temp = [[0, 1, 2, 3], [1, 1, 2, 3], [2, 2, 4, 5], [3, 4, 5, 6], [4, 5, 6, 7], [5, 6, 7, 8]]
    11:24:46.751755 line         8             for j in range(1,len(word2)+1):
    Modified var:.. j = 2
    11:24:46.751755 line         9                 if word1[i-1] == word2[j-1]:
    11:24:46.751755 line        10                     d = 0
    Modified var:.. d = 0
    11:24:46.751755 line        13                 temp[i][j]=min(temp[i-1][j]+1,temp[i][j-1]+1,temp[i-1][j-1]+d)
    Modified var:.. temp = [[0, 1, 2, 3], [1, 1, 2, 3], [2, 2, 1, 5], [3, 4, 5, 6], [4, 5, 6, 7], [5, 6, 7, 8]]
    11:24:46.751755 line         8             for j in range(1,len(word2)+1):
    Modified var:.. j = 3
    11:24:46.752755 line         9                 if word1[i-1] == word2[j-1]:
    11:24:46.752755 line        12                     d =1
    Modified var:.. d = 1
    11:24:46.752755 line        13                 temp[i][j]=min(temp[i-1][j]+1,temp[i][j-1]+1,temp[i-1][j-1]+d)
    Modified var:.. temp = [[0, 1, 2, 3], [1, 1, 2, 3], [2, 2, 1, 2], [3, 4, 5, 6], [4, 5, 6, 7], [5, 6, 7, 8]]
    11:24:46.752755 line         8             for j in range(1,len(word2)+1):
    11:24:46.752755 line         7         for i in range(1,len(word1)+1):
    Modified var:.. i = 3
    11:24:46.752755 line         8             for j in range(1,len(word2)+1):
    Modified var:.. j = 1
    11:24:46.752755 line         9                 if word1[i-1] == word2[j-1]:
    11:24:46.752755 line        10                     d = 0
    Modified var:.. d = 0
    11:24:46.752755 line        13                 temp[i][j]=min(temp[i-1][j]+1,temp[i][j-1]+1,temp[i-1][j-1]+d)
    Modified var:.. temp = [[0, 1, 2, 3], [1, 1, 2, 3], [2, 2, 1, 2], [3, 2, 5, 6], [4, 5, 6, 7], [5, 6, 7, 8]]
    11:24:46.752755 line         8             for j in range(1,len(word2)+1):
    Modified var:.. j = 2
    11:24:46.756755 line         9                 if word1[i-1] == word2[j-1]:
    11:24:46.756755 line        12                     d =1
    Modified var:.. d = 1
    11:24:46.756755 line        13                 temp[i][j]=min(temp[i-1][j]+1,temp[i][j-1]+1,temp[i-1][j-1]+d)
    Modified var:.. temp = [[0, 1, 2, 3], [1, 1, 2, 3], [2, 2, 1, 2], [3, 2, 2, 6], [4, 5, 6, 7], [5, 6, 7, 8]]
    11:24:46.756755 line         8             for j in range(1,len(word2)+1):
    Modified var:.. j = 3
    11:24:46.756755 line         9                 if word1[i-1] == word2[j-1]:
    11:24:46.756755 line        12                     d =1
    11:24:46.756755 line        13                 temp[i][j]=min(temp[i-1][j]+1,temp[i][j-1]+1,temp[i-1][j-1]+d)
    Modified var:.. temp = [[0, 1, 2, 3], [1, 1, 2, 3], [2, 2, 1, 2], [3, 2, 2, 2], [4, 5, 6, 7], [5, 6, 7, 8]]
    11:24:46.756755 line         8             for j in range(1,len(word2)+1):
    11:24:46.756755 line         7         for i in range(1,len(word1)+1):
    Modified var:.. i = 4
    11:24:46.756755 line         8             for j in range(1,len(word2)+1):
    Modified var:.. j = 1
    11:24:46.756755 line         9                 if word1[i-1] == word2[j-1]:
    11:24:46.757757 line        12                     d =1
    11:24:46.757757 line        13                 temp[i][j]=min(temp[i-1][j]+1,temp[i][j-1]+1,temp[i-1][j-1]+d)
    Modified var:.. temp = [[0, 1, 2, 3], [1, 1, 2, 3], [2, 2, 1, 2], [3, 2, 2, 2], [4, 3, 6, 7], [5, 6, 7, 8]]
    11:24:46.757757 line         8             for j in range(1,len(word2)+1):
    Modified var:.. j = 2
    11:24:46.757757 line         9                 if word1[i-1] == word2[j-1]:
    11:24:46.757757 line        12                     d =1
    11:24:46.757757 line        13                 temp[i][j]=min(temp[i-1][j]+1,temp[i][j-1]+1,temp[i-1][j-1]+d)
    Modified var:.. temp = [[0, 1, 2, 3], [1, 1, 2, 3], [2, 2, 1, 2], [3, 2, 2, 2], [4, 3, 3, 7], [5, 6, 7, 8]]
    11:24:46.757757 line         8             for j in range(1,len(word2)+1):
    Modified var:.. j = 3
    11:24:46.757757 line         9                 if word1[i-1] == word2[j-1]:
    11:24:46.757757 line        10                     d = 0
    Modified var:.. d = 0
    11:24:46.757757 line        13                 temp[i][j]=min(temp[i-1][j]+1,temp[i][j-1]+1,temp[i-1][j-1]+d)
    Modified var:.. temp = [[0, 1, 2, 3], [1, 1, 2, 3], [2, 2, 1, 2], [3, 2, 2, 2], [4, 3, 3, 2], [5, 6, 7, 8]]
    11:24:46.758758 line         8             for j in range(1,len(word2)+1):
    11:24:46.758758 line         7         for i in range(1,len(word1)+1):
    Modified var:.. i = 5
    11:24:46.758758 line         8             for j in range(1,len(word2)+1):
    Modified var:.. j = 1
    11:24:46.758758 line         9                 if word1[i-1] == word2[j-1]:
    11:24:46.758758 line        12                     d =1
    Modified var:.. d = 1
    11:24:46.758758 line        13                 temp[i][j]=min(temp[i-1][j]+1,temp[i][j-1]+1,temp[i-1][j-1]+d)
    Modified var:.. temp = [[0, 1, 2, 3], [1, 1, 2, 3], [2, 2, 1, 2], [3, 2, 2, 2], [4, 3, 3, 2], [5, 4, 7, 8]]
    11:24:46.758758 line         8             for j in range(1,len(word2)+1):
    Modified var:.. j = 2
    11:24:46.758758 line         9                 if word1[i-1] == word2[j-1]:
    11:24:46.758758 line        12                     d =1
    11:24:46.758758 line        13                 temp[i][j]=min(temp[i-1][j]+1,temp[i][j-1]+1,temp[i-1][j-1]+d)
    Modified var:.. temp = [[0, 1, 2, 3], [1, 1, 2, 3], [2, 2, 1, 2], [3, 2, 2, 2], [4, 3, 3, 2], [5, 4, 4, 8]]
    11:24:46.758758 line         8             for j in range(1,len(word2)+1):
    Modified var:.. j = 3
    11:24:46.758758 line         9                 if word1[i-1] == word2[j-1]:
    11:24:46.759756 line        12                     d =1
    11:24:46.759756 line        13                 temp[i][j]=min(temp[i-1][j]+1,temp[i][j-1]+1,temp[i-1][j-1]+d)
    Modified var:.. temp = [[0, 1, 2, 3], [1, 1, 2, 3], [2, 2, 1, 2], [3, 2, 2, 2], [4, 3, 3, 2], [5, 4, 4, 3]]
    11:24:46.759756 line         8             for j in range(1,len(word2)+1):
    11:24:46.759756 line         7         for i in range(1,len(word1)+1):
    11:24:46.759756 line        14         return temp[-1][-1]
    11:24:46.759756 return      14         return temp[-1][-1]
    Return value:.. 3
    3
    

    參考文獻

    [1]Navarro, Gonzalo. A guided tour to approximate string matching (PDF). ACM Computing Surveys. 1 March 2001, 33 (1): 31–88 [19 March 2015]. doi:10.1145/375360.375365.

    • 6
      點贊
    • 4
      評論
    • 6
      收藏
    • 一鍵三連
      一鍵三連
    • 掃一掃,分享海報

    ??2020 CSDN 皮膚主題: 程序猿惹誰了 設計師:白松林 返回首頁
    實付
    使用余額支付
    點擊重新獲取
    掃碼支付
    錢包余額 0

    抵扣說明:

    1.余額是錢包充值的虛擬貨幣,按照1:1的比例進行支付金額的抵扣。
    2.余額無法直接購買下載,可以購買VIP、C幣套餐、付費專欄及課程。

    余額充值
    多乐彩