Hi guys,

After trawling through the forums and interweb, hopefully someone here can offer me some help with sed!
I have a CSV file that contains records. Each field is delimited by a comma, and each row is delimited by a \n. However, in some cases, I need to remove a \n that occurs inside a string between two commas.

e.g. 1 (correct) TAP,NO,NO,Country,NSW,NSW,No,Yes,,LLL,,Rural and Remote,,,,,,

e.g. 2 (incorrect) TAP,NO,NO,Country,NSW,NSW,No,Yes,,LLL,,"Regional Towns
Added 22 May 04 by Mwun-LLL/8M",,,,,,

As you can see in example two, after "Regional Towns", there is a \n that needs to be removed.
I can't remove every single occurrence of \n because that would just give me one huge line. I can only remove the \n when it occurs between commas.

I am quite new to sed, so any help would be much appreciated!

I haven't used it very often, but look up the sed command "N". It appends the next line of input to the pattern space. Try something like

/"[^"]*$/{
N
s/\n//
}


Hey There,

The above solution is good for 2 lines. It can be modified for an indefinite number, also, but that doesn't come to my mind right this second ;)
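For what it's worth, here is one way the modification for an indefinite number of lines might look. It is only a sketch, not something posted in this thread: it assumes a broken record can be recognised by an odd number of double quotes, and the sample data and the variable names (sample, joined) are made up for illustration. The loop keeps pulling in lines with N until the quotes balance.

```shell
# Hypothetical sample data: the second record is split over three physical lines.
sample='TAP,NO,"Rural",,
TAP,NO,"Regional Towns
Added 22 May 04
by Mwun",,'

# :a   is a label to loop back to.
# The address matches a pattern space with an odd number of double quotes,
# i.e. a quoted field that is still open.
# $!N  appends the next input line, unless we are already on the last line.
# ta   jumps back to :a if the substitution actually removed a newline.
joined=$(printf '%s\n' "$sample" | sed ':a
/^[^"]*\("[^"]*"[^"]*\)*"[^"]*$/{
$!N
s/\n//
ta
}')

printf '%s\n' "$joined"
```

Using $!N rather than a bare N is a portability precaution, since sed implementations differ on what N does when there is no next line. The newline is removed outright, as the original question asks; use s/\n/ / instead if you would rather keep a space where the line break was.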

If the \n you want to remove is always at the exact end of a line, and you have a variable number of \n's to contend with, you can also try the following, provided you know what the exact format of a line should be. From what you've posted, it looks like a proper line ends with a comma and a broken line never has a comma directly before the \n. That might not always be the case, in which case this won't help, sorry:

sed 's/^.*,\n$//'

and it should take care of you in that limited circumstance

Best wishes,

Mike

Hi guys,

Thanks for your help, it's certainly put me on the right track!

Mike, you are correct about masijade's sed statement: it breaks when there are more than 2 \n in a line.
However, running your sed statement produces no results for some reason. Looking at it, it seems to do what I need, i.e. if a line ends with ",\n", remove that \n. All lines need to end with a comma.

Any thoughts as to why it might not be working? I am pretty confident that the text file is UNIX format, as masijade's statement works okay.

Thanks in advance

Actually, looking back at my previous post, I am contradicting myself.
What I need is a sed statement that searches for all lines that DO NOT end with a comma, and removes the \n at the end of each. All lines need to end with one or more commas.

Hope that makes more sense

Hey There,

My bad on the sed statement. I meant to do the substitution, and left out the match operators :(

Given your extra post, Masijade's post is dead on.

just sub

/"[^"]*$/

with

/.*[^,]$/
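Put together, the swap might look something like the sketch below. The address is shortened to /[^,]$/, which matches the same lines, and the :a label, $!N and ta are additions so that records broken over more than two lines are handled as well. The sample data and the variable names (sample, fixed) are only for illustration.

```shell
# Hypothetical sample based on the two examples in the question.
sample='TAP,NO,Yes,,LLL,,Rural and Remote,,,,,,
TAP,NO,Yes,,LLL,,"Regional Towns
Added 22 May 04 by Mwun-LLL/8M",,,,,,'

# Any line that does not end with a comma is treated as incomplete:
# append the next line ($!N), delete the embedded newline, and test again (ta).
fixed=$(printf '%s\n' "$sample" | sed ':a
/[^,]$/{
$!N
s/\n//
ta
}')

printf '%s\n' "$fixed"
```

This relies on the rule that every complete record ends with a comma. If a line break inside a field happens to come right after a comma, that line will look complete and won't be joined.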

Hopefully we were able to help out in tandem :)

, Mike

Just to clarify (in case you don't realise it)

sed 's/^.*,\n$//'

doesn't work because sed processes its input one line at a time. The "\n" at the end of each line is stripped before the line is placed in the pattern space, so the end of the line is simply $, and "\n$" will never match. For the same reason, you cannot remove or replace that newline directly. That is why the "N" command exists: it pulls an additional line into the pattern space, so the \n between the two lines becomes part of the pattern rather than its end (now the end of the second line is the end of the pattern, until you read in a third line, etc.).

At least that is the way I understand it. I could be wrong though.

Edit: And, by the way, just as a side note, if that command had actually done anything, it would have deleted the entire line. ;-)
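A quick way to see the difference for yourself, as a small sketch (the two-line input and the variable names are made up for the demonstration):

```shell
# Two hypothetical input lines: "x,y" and "z,".
input='x,y
z,'

# Without N, each line sits in the pattern space on its own, with no newline
# in it, so the \n in the regex has nothing to match and nothing changes.
unchanged=$(printf '%s\n' "$input" | sed 's/^.*,\n$//')

# With N, the second line is appended to the first, and the newline between
# them is now inside the pattern space where s/\n// can remove it.
joined=$(printf '%s\n' "$input" | sed -e '$!N' -e 's/\n//')

printf 'without N:\n%s\n' "$unchanged"
printf 'with N:\n%s\n' "$joined"
```

The first command prints the input back untouched, while the second prints the two lines joined into one.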

Yes, I understand - you have to cut me some slack for answering questions in the middle of the night ;) Corrected in my follow-up post.

Thanks :)

, Mike

It's cool. I was mainly explaining for the OP's benefit. The more he understands the less he has to ask. ;-)

Definitely, no hard feelings, just kidding around with ya,

I really do need to get some sleep, though ;)

Take it easy :)

, Mike
