I am trying to match and remove multiple newlines between quotation marks to clean up a CSV file. sed is what I am most familiar with, but I am happy to use any other tool I have access to.
Example
"ABC","This is a test

","1","2","This is another test"
Expected End Result
"ABC","This is a test","1","2","This is another test"
I've tried multiple patterns on regex101.com and looked around the "similar questions," but can't seem to find anything remotely close to working. Any help would be appreciated.
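You may try this GNU awk command, which uses each quoted field as the record separator (RS) and cleans up the matched text (RT) before printing it back: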
awk -v RS='"[^"]+"' 'RT {gsub(/"\n+|\n+"/, "\"", RT); gsub(/\n+/, " ", RT)} {ORS=RT} 1' file.csv

"ABC","This is a test","1","2","This is another test"
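If GNU awk is not available (RT is a gawk extension), a rough equivalent can be sketched in Python with the standard csv module. This is only a sketch under a few assumptions: the function name clean_csv is made up for illustration, every field is written back quoted because the sample quotes every field, and newlines at the edges of a field are dropped while runs inside a field become a single space.

```python
import csv
import io
import re


def clean_csv(text: str) -> str:
    """Remove newline runs inside quoted CSV fields (hypothetical helper)."""
    # newline="" leaves embedded line breaks untouched so csv can parse them.
    reader = csv.reader(io.StringIO(text, newline=""))
    out = io.StringIO()
    # Assumption: quote every field on output, as in the sample data.
    writer = csv.writer(out, quoting=csv.QUOTE_ALL, lineterminator="\n")
    for row in reader:
        cleaned = [
            # Drop newlines at the field edges, collapse inner runs to a space.
            re.sub(r"[\r\n]+", " ", field.strip("\r\n"))
            for field in row
        ]
        writer.writerow(cleaned)
    return out.getvalue()


sample = '"ABC","This is a test\n\n","1","2","This is another test"\n'
print(clean_csv(sample), end="")
```

Unlike the awk one-liner, this parses the file as CSV, so escaped quotes ("") inside a field should also be handled, at the cost of re-quoting every field on output.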