Get start and end of string comparison (.NET, C#)
2021-06-10

I'm trying to create a string-based diff algorithm of my own. I iterate through every paragraph of a text document and compare the local version of each paragraph with the server version.

What I'm struggling with is finding where the difference between the two strings starts and where it ends.

Consider having the two strings:

This is a test-text.

This is a very long test-text.

This means 10 characters were inserted in the second line ('very long ': 8 letters and 2 spaces).

These characters should be highlighted accordingly. I've already come up with a way to find where the differences start (say, index n is where the strings first differ):

// Length of the common prefix = index of the first difference (requires using System.Linq).
int diffIndexStart = localText.Zip(serverText, (c1, c2) => c1 == c2).TakeWhile(b => b).Count();

Now, how can I detect where the strings match again, so that I can stop highlighting there instead of highlighting the rest of the row (everything from diffIndexStart onward)?
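
For a single contiguous change, one possible approach is to run the same scan from the back of both strings and stop where they diverge. The following is a minimal sketch under that assumption, not a general diff; the names `SingleChangeDiff` and `FindChange` are hypothetical and chosen only for illustration.

```csharp
using System;

public static class SingleChangeDiff
{
    // Returns (Start, LocalEnd, ServerEnd): the two texts agree on [0, Start) and on
    // everything from LocalEnd / ServerEnd onward; only the middle parts differ.
    // Assumption: there is at most one contiguous changed region.
    public static (int Start, int LocalEnd, int ServerEnd) FindChange(string localText, string serverText)
    {
        int maxCommon = Math.Min(localText.Length, serverText.Length);

        // Common prefix length: the same value as diffIndexStart in the question.
        int start = 0;
        while (start < maxCommon && localText[start] == serverText[start])
            start++;

        // Common suffix length, capped so that it never overlaps the prefix.
        int suffix = 0;
        while (suffix < maxCommon - start
               && localText[localText.Length - 1 - suffix] == serverText[serverText.Length - 1 - suffix])
            suffix++;

        return (start, localText.Length - suffix, serverText.Length - suffix);
    }
}
```

With the two strings above, `FindChange` returns `(10, 10, 20)`, so `serverText.Substring(10, 20 - 10)` is `"very long "`. If a line contains several separate changes, this sketch reports one range that spans all of them, including the unchanged text in between.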

There's another issue as well: what happens when there are multiple changes within one line? For example:

This is a test-text.

This, apparently, is a very long test-text.

Now I've got two changes: ', apparently,' and 'very long '.

Solution 1

You're looking at the classic Longest Common Subsequence (LCS) problem. There are numerous papers on it (the Wikipedia page gives some links as a start), and Wikipedia already outlines several common approaches.
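
As a rough illustration of the LCS idea, the sketch below builds the textbook dynamic-programming table and then walks it to collect the character ranges of the second string that are not part of a longest common subsequence. It is only one of the possible approaches, uses O(n·m) memory, and the names `LcsDiff` and `InsertedRanges` are hypothetical.

```csharp
using System;
using System.Collections.Generic;

public static class LcsDiff
{
    // Returns the (Start, Length) ranges of serverText that are not part of a
    // longest common subsequence with localText, i.e. candidate spans to highlight.
    public static List<(int Start, int Length)> InsertedRanges(string localText, string serverText)
    {
        int n = localText.Length, m = serverText.Length;

        // lcs[i, j] = LCS length of localText[i..] and serverText[j..].
        var lcs = new int[n + 1, m + 1];
        for (int i = n - 1; i >= 0; i--)
            for (int j = m - 1; j >= 0; j--)
                lcs[i, j] = localText[i] == serverText[j]
                    ? lcs[i + 1, j + 1] + 1
                    : Math.Max(lcs[i + 1, j], lcs[i, j + 1]);

        var ranges = new List<(int Start, int Length)>();
        int a = 0, b = 0;
        while (b < m)
        {
            if (a < n && localText[a] == serverText[b])
            {
                a++; b++;   // character is common to both texts
            }
            else if (a < n && lcs[a + 1, b] >= lcs[a, b + 1])
            {
                a++;        // character exists only in localText (a deletion)
            }
            else
            {
                // serverText[b] exists only in serverText: extend the last range or open a new one.
                int last = ranges.Count - 1;
                if (last >= 0 && ranges[last].Start + ranges[last].Length == b)
                    ranges[last] = (ranges[last].Start, ranges[last].Length + 1);
                else
                    ranges.Add((b, 1));
                b++;
            }
        }
        return ranges;
    }
}
```

For "This is a test-text." versus "This is a very long test-text." this yields the single range starting at index 10 with length 10. Several alignments can be equally long, so with repeated characters a range boundary may land on a different but equivalent position (for example, which of two spaces counts as inserted); merging or snapping ranges to word boundaries would be a separate post-processing step.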

Comment 1:
Is this an answer or a comment? And I think it's more related to Levenshtein distance.
Comment 2:
Do you claim my answer does not give a direction to work in? Or did you expect to see a cheap freelancer-style "this code piece works" non-answer?
Comment 3:
"And I think it's more related to Levenshtein distance": yes and no. The OP speaks only of insertions here, at least in the examples.
Reposted from: https://stackoverflow.com/questions/19082041/get-start-and-end-of-string-comparison