Split a string by comma only when the comma is outside double quotes (C#)
2021-01-20
"TIMESTAMP (UTC)","LOG TYPE","DEVICE TYPE","DEVICE","MESSAGE","PARAMETERS"
"2014-08-12 17:30:34.437","Warning","DiverGate","141403G00294","Diver gate(s) did not connect since","2014-08-08 06:37:31 (UTC)"
"2014-08-12 17:30:34.577","Warning","DiverGate","141403G00120","Diver gate(s) did not connect since","2014-08-08 06:46:22 (UTC)"
"2014-08-13 06:45:18.890","Error","DiverGate","141403G00294","Was set to inactive, because it did not connect since","2014-08-08 06:37:31 (UTC)"
"2014-08-13 07:00:18.903","Error","DiverGate","141403G00120","Was set to inactive, because it did not connect since","2014-08-08 06:46:22 (UTC)"

This is my .csv file. I need to read the data from it, but I must split only on commas that are outside double quotes, because in some other files a field (especially the message or the log type) can itself contain a comma.

string url = @"E:\Project.csv";
string[] lines = null;

// FileShare.ReadWrite lets other processes keep the file open for reading or writing
using (Stream stream = File.Open(url, FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
using (StreamReader sr = new StreamReader(stream))
{
    string str = sr.ReadToEnd();
    lines = Regex.Split(str, /* what expression goes here? */);
}
Solution 1

You can try lookaround assertions.

They do not consume characters in the string; they only assert whether a match is possible at the current position.

(?<="),(?=")

Here is an online demo; the pattern was also tested at regexstorm.

The pattern explanation is simple:

  (?<=                     look behind to see if there is:
    "                        '"'
  )                        end of look-behind
  ,                        ','
  (?=                      look ahead to see if there is:
    "                        '"'
  )                        end of look-ahead
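
As a minimal sketch of how this pattern might be applied in C#, the class below splits a single line and strips the surrounding quotes. It assumes every field is wrapped in double quotes and that no field contains the sequence "," itself. The names QuotedCsv and SplitLine are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class QuotedCsv
{
    // Matches a comma only when it sits directly between two double quotes.
    private static readonly Regex FieldSeparator = new Regex("(?<=\"),(?=\")");

    // Hypothetical helper: splits one CSV line and removes the enclosing quotes.
    public static string[] SplitLine(string line)
    {
        string[] fields = FieldSeparator.Split(line);
        for (int i = 0; i < fields.Length; i++)
        {
            fields[i] = fields[i].Trim('"');
        }
        return fields;
    }
}
```

Calling QuotedCsv.SplitLine on one of the "Was set to inactive, because ..." rows should give six fields, with the comma inside the message left intact.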
Comment 1:
This is great, but I have one more problem. When it reaches the end of a row (in my case, 4), I get \"PARAMETERS\"\r\n\"2014-08-12 17:30:34.437\" as a single element. I understand what \r\n means, but how can I put the text to the right of \r\n into a new string instead?
Comment 2:
regexstorm: this is how it looks in my string array. Every fourth index looks like this.
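
One way to deal with the line-break problem raised in the comments is to split the text into rows first and only then split each row into fields. The sketch below does that; it assumes no field contains a line break, and the names CsvRows and Parse are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class CsvRows
{
    // Hypothetical helper: returns one string[] of unquoted fields per row.
    public static List<string[]> Parse(string text)
    {
        var rows = new List<string[]>();

        // Split into rows first, so "\r\n" never ends up inside a field.
        string[] lines = text.Split(new[] { "\r\n", "\n" }, StringSplitOptions.RemoveEmptyEntries);

        foreach (string line in lines)
        {
            // Then split each row on commas that sit between two double quotes.
            string[] fields = Regex.Split(line, "(?<=\"),(?=\")");
            for (int i = 0; i < fields.Length; i++)
            {
                fields[i] = fields[i].Trim('"');
            }
            rows.Add(fields);
        }
        return rows;
    }
}
```

When reading from a stream as in the question, calling StreamReader.ReadLine in a loop gives the same row-by-row behaviour without loading the whole file with ReadToEnd.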
Solution 2

This is just basic CSV parsing, and there are already libraries that do it. I would recommend taking a look at CsvHelper, which I've used before, rather than trying to reinvent the wheel.

You can include this in your project really easily by using the Package Manager Console and typing:

Install-Package CsvHelper
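
For orientation, here is a rough sketch of reading the file above with CsvHelper. It assumes a recent CsvHelper version in which the CsvReader constructor takes a CultureInfo; older versions differ, so treat the exact calls as something to check against the version you install. The class and method names are made up for illustration.

```csharp
using System;
using System.Globalization;
using System.IO;
using CsvHelper;

public static class CsvHelperExample
{
    // Hypothetical helper: prints the log type and message of every data row.
    public static void PrintMessages(string path)
    {
        using (var reader = new StreamReader(path))
        using (var csv = new CsvReader(reader, CultureInfo.InvariantCulture))
        {
            csv.Read();        // advance to the header row
            csv.ReadHeader();  // register the column names

            while (csv.Read())
            {
                // Quoted fields are unwrapped by the library, embedded commas included.
                string logType = csv.GetField("LOG TYPE");
                string message = csv.GetField("MESSAGE");
                Console.WriteLine(logType + ": " + message);
            }
        }
    }
}
```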

Solution 3

Instead of rolling your own CSV parser, use an existing library. The TextFieldParser class ships with Visual Basic: add a reference to Microsoft.VisualBasic under the project references, and then you can do:

// requires: using Microsoft.VisualBasic.FileIO;
using (TextFieldParser textFieldParser = new TextFieldParser(@"E:\Project.csv"))
{
    textFieldParser.TextFieldType = FieldType.Delimited;
    textFieldParser.SetDelimiters(",");
    while (!textFieldParser.EndOfData)
    {
        // ReadFields understands quoted fields, so commas inside quotes are kept
        string[] values = textFieldParser.ReadFields();
        Console.WriteLine(string.Join("---", values)); // print the row
    }
}
Solution 4

You can also use this regex, which splits on a comma only when it is followed by an even number of double quotes:

var result = Regex.Split(samplestring, ",(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)");
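
A small sketch of this lookahead idea, written for double quotes and applied to one line at a time. Unlike the look-around in Solution 1, it does not require every field to be quoted. It assumes the quotes in each line are balanced; it does not unescape doubled quotes. QuoteAwareSplit and SplitLine are hypothetical names.

```csharp
using System;
using System.Text.RegularExpressions;

public static class QuoteAwareSplit
{
    // A comma is a separator only if an even number of double quotes follows it,
    // which means the comma itself is not inside a quoted field.
    private static readonly Regex Separator =
        new Regex(",(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)");

    // Hypothetical helper: splits one line and removes enclosing quotes.
    public static string[] SplitLine(string line)
    {
        string[] fields = Separator.Split(line);
        for (int i = 0; i < fields.Length; i++)
        {
            fields[i] = fields[i].Trim('"');
        }
        return fields;
    }
}
```

Because the lookahead rescans the rest of the line for every comma, this is best suited to short lines; for large files a real CSV parser is likely the safer choice.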
Source: https://stackoverflow.com/questions/25384056/split-string-by-comma-only-when-is-coma-outside-of-double-quotes-c-sharp
