How to tell which delimiter string was split on
2021-07-21

I'm trying to parse out line items from text extracted from a PDF. The text extracted comes out poorly formatted and in one long string per page. There aren't any useful delimiters, but the lines start with one of two strings. I've set up the Split() using a string array with both of those strings, but I need to know which delimiter the elements were split on.

I found this link, but I'm not that great at RegEx. Can someone assist in writing the RegEx string?

    var lineItems = page.PageText.Split(new string[] { "First String Delimiter", "Second String Delimiter" }, StringSplitOptions.None);

What I need to know is whether element[x] was produced by "First String Delimiter" or "Second String Delimiter".

EDIT: I don't care if Regex is the solution. Linq may be equally suited. Linq didn't come out until after I earned my degrees, so I'm similarly unfamiliar with it.

Imagine a page with about 15-20 of these end to end, coming back as one long string with no carriage returns. Since they all start with "Corporate Trade Payment Credit" or "Preauthorized ACH Credit", I can split on those, but I need to know which type each one was.

Preauthorized ACH Credit (165) 10,000.00 489546541 0000000000 Text Some long description about transaction- Preauthorized ACH Credit (165) 5,310.99 8465498461 0000000000 Text Another long description Corporate Trade Payment Credit (165) 4,933.17 8478632458775 0000000000 Text Another confidential string description.

Talk1:
Please give some examples.
Solutions1

Why don't you just run the split twice, once with the first delimiter, then again with the second delimiter?

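    // Split(string) only exists on .NET Core 2.0 and later; the string[] overload also compiles on .NET Framework.
    var firstDelimiterItems = page.PageText.Split(new[] { "First String Delimiter" }, StringSplitOptions.None);

    var secondDelimiterItems = page.PageText.Split(new[] { "Second String Delimiter" }, StringSplitOptions.None);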
Talk1:
That may be what I'm forced to do, but I would like, if possible, to have each line item in its own element from the start.
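For what it's worth, a single pass with a regular expression can probably give you each line item in its own element together with the delimiter it started with. The following is only a sketch, not a tested answer against your real PDF text: the class name `LineItemParser`, the method `Parse`, and the group names `type` and `body` are made up for illustration, and it assumes C# 7 or later for the tuple syntax. Any text before the first delimiter is skipped.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper; the names here are illustrative, not from the question.
public static class LineItemParser
{
    // The two line-item prefixes quoted in the question.
    private static readonly string[] Delimiters =
    {
        "Corporate Trade Payment Credit",
        "Preauthorized ACH Credit"
    };

    public static List<(string Type, string Body)> Parse(string pageText)
    {
        // Escape each delimiter so it is matched literally, then join with '|'.
        string alternation = string.Join("|", Array.ConvertAll(Delimiters, Regex.Escape));

        // "type" captures the delimiter itself; "body" lazily captures everything
        // up to the next delimiter (checked by the lookahead) or the end of input.
        string pattern = $@"(?<type>{alternation})(?<body>.*?)(?=(?:{alternation})|\z)";

        var items = new List<(string Type, string Body)>();
        foreach (Match m in Regex.Matches(pageText, pattern, RegexOptions.Singleline))
        {
            items.Add((m.Groups["type"].Value, m.Groups["body"].Value.Trim()));
        }
        return items;
    }
}
```

Each returned element carries the delimiter in `Type`, so there is no need to split twice or to guess afterwards which string an element came from.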
Solutions2

Sometimes the simplest solutions are the best ones. Don't know why this didn't occur to me earlier.

    var pageText = page.PageText.Replace("Corporate Trade Payment", "\r\nCorporate Trade Payment").Replace("Preauthorized ACH Credit", "\r\nPreauthorized ACH Credit");

This gives me the line items on their own lines. No Regex needed. Thank you all for your help, and if you find a way to answer the original question with Regex, please post it. I'm always up for learning more.
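Once the items are on their own lines, the type of each one can be recovered by checking which prefix the line starts with. A minimal sketch along those lines, assuming the page text contains no line breaks of its own (as described in the question); `LineItemClassifier` and `Classify` are hypothetical names, and lines that start with neither prefix are simply dropped:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper; the names here are illustrative only.
public static class LineItemClassifier
{
    public static List<KeyValuePair<string, string>> Classify(string pageText)
    {
        // Same two prefixes used in the Replace() calls above.
        string[] prefixes = { "Corporate Trade Payment", "Preauthorized ACH Credit" };

        // Put a line break in front of every prefix, as in the Replace() approach.
        string text = pageText;
        foreach (string prefix in prefixes)
        {
            text = text.Replace(prefix, "\r\n" + prefix);
        }

        var result = new List<KeyValuePair<string, string>>();
        string[] lines = text.Split(new[] { "\r\n" }, StringSplitOptions.RemoveEmptyEntries);
        foreach (string line in lines)
        {
            foreach (string prefix in prefixes)
            {
                // Key = the prefix the line starts with, Value = the whole line item.
                if (line.StartsWith(prefix, StringComparison.Ordinal))
                {
                    result.Add(new KeyValuePair<string, string>(prefix, line.Trim()));
                    break;
                }
            }
        }
        return result;
    }
}
```

This keeps the no-Regex spirit of the answer; the cost is a second scan over the prefixes for every line, which should not matter for 15-20 items per page.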

Reposted from: https://stackoverflow.com/questions/17430233/how-to-tell-which-delimiter-string-was-split-on
