Hi,

My CSV files contains embedded double quotes, as it does not conform to the RFC 4180 standard and results in parser complication.

I tried using Perl Module "Text::CSV_XS" to properly escape with double-quotes, the output file gives me the result as desired, however some of the line breaks into multiple lines because few fields contain breakline characters.

Any help in this regard will be appreciated. Please let me know if you need any info.

Regards,
John

just check for the newline characters and replace them...something like:

$linefromfile =~ s/\n//gi;

Just remember though, that you will have to manually add your own newline character ("\n") whenever you want to save the data to the new file... otherwise it will be one big ass line. :icon_eek:

Hi Kevin,

Thanks for the reply. I am already using the steps as suggested by the reference link you provided above. But the problem still persist, I mean breakline characters in fields values creates a new line. I am pasting the script for your reference/comments.

Please let me know if you need any further info.

Thanks in Advance,
John

#!/pro/bin/perl

# This script can be used as a base to parse unreliable CSV streams
# Modify to your own needs

use strict;
use warnings;

use Text::CSV_XS;

my $csv = Text::CSV_XS->new ({ binary             => 1,
			       blank_is_undef     => 1,
			       always_quote     =>1,
			       eol                => $/,
			       			       });
my $csa = Text::CSV_XS->new ({ binary             => 1,
			       allow_loose_quotes => 1,
			       blank_is_undef     => 1,
			       escape_char        => undef,
	 #              escape_char        => "\"\",
			       });

my $file = @ARGV ? shift : "test.csv";
open my $fh, "<", $file or die "$file: $!\n";

my %err_eol = map { $_ => 1 } 2010, 2027, 2031, 2032;

print STDERR "Reading $file with Text::CSV_XS $Text::CSV_XS::VERSION\n";
while (1) {
    my $row = $csv->getline ($fh);

    unless ($row) {	# Parsing failed

	# Could be end of file
	$csv->eof and last;

	# Diagnose and show what was wrong
	my @diag = $csv->error_diag;
	print STDERR "$file line $./$diag[2] - $diag[0] - $diag[1]\n";
	my $ep  = $diag[2] - 1; # diag[2] is 1-based
	my $ein = $csv->error_input;	# The line scanned so far
	my $err = $ein . "         ";
	substr $err, $ep + 1, 0, "*";	# Bad character marked between **
	substr $err, $ep,     0, "*";
	($err = substr $err, $ep - 5, 12) =~ s/ +$//;
	print STDERR "    |$err|\n";

	REPARSE: {	# Now retry with allowed options
	    if ($csa->parse ($ein)) {
		print STDERR "Accepted in allow mode ...\n";
		$row = [ $csa->fields ];
		}
	    else {	# Still fails
		my @diag = $csa->error_diag;
		if (exists $err_eol{$diag[0]}) { # \r or \n inside field
		    print STDERR "  Extending line with next chunk\n";
		    $ein .= scalar <$fh>;
		    goto REPARSE;
		    }

		print STDERR "  Also could not parse it in allow mode\n";
		print STDERR "  $./$diag[2] - $diag[0] - $diag[1]\n";
		print STDERR "  Line skipped\n";
		next;
		}
	    }
	}

    # Data was fine, print data properly quoted
    $csv->print (*STDOUT, $row);
    }

Well, I don't know what the character is that is causing the problem but it appears you will need to remove it manually, something like:

my $row = $csv->getline ($fh);
s/[\r\n]// for @{$row};

Hi,

Thanks for the reply. I tried using the command as suggested by you, but the script is not coming to an end, I believe it is going to an infinite loop.

I have just inserted the following line in my script;

s/[\r\n]// for @{$row};

am I missing something?

On the other hand, I would like show my original Perl script which is generating a LDAP report in a csv format.

Can't we remove quotes from every field while generating csv report file?

#!/usr/bin/perl
#
# Copyright (c) 2004
#      Ali Onur Cinar &060;cinar&064;zdo.com&062;
#
# License:
#
#   Permission  to  use,  copy, modify, and  distribute  this software and its
#   documentation for  non-commercial  purposes  and  without  fee  is  hereby
#   granted,  provided  that the  above  copyright notice appear in all copies
#   and  that both  the  copyright  notice  and  this  permission  notice  and
#   warranty  disclaimer appear in supporting documentation, and that the name
#   of Ali Onur Cinar not be  used  in  advertising or publicity pertaining to
#   distribution of the software without specific, written prior permission.
#
# @author  A. Onur Cinar &lt;cinar(a)zdo.com&gt;
# @version $Id: ldap2csv.pl,v 1.1 2004/12/07 15:12:38 cinar Exp $
#


use Net::LDAP;

@fields =                                             # fields to export
(
#  Label         Field Name        Index
  ['First Name', 'givenName'      , 0],
  ['Last Name' , 'sn'             , 0],
  ['Title'     , 'title'          , 0],
  ['Email'     , 'mail'           , 0],
  ['Email 2'   , 'mail'           , 1],
  ['Phone'     , 'telephoneNumber', 0],
  ['Phone'     , 'telephoneNumber', 1],
  ['Phone'     , 'telephoneNumber', 2],
  ['Phone'     , 'telephoneNumber', 3],
  ['Phone'     , 'telephoneNumber', 4],
  ['Mobile'    , 'mobile'         , 0],
  ['Birthday'  , 'birthday'       , 0],
  ['Address'   , 'street'         , 0],
  ['Address'   , 'street'         , 1],
  ['Address'   , 'street'         , 2],
);


$ldap = Net::LDAP->new('localhost') or die "$@";      # open connection
$mesg = $ldap->bind(                                  # login
  "uid=test", password => "password");

$mesg = $ldap->search(                                # search
 base => "dc=abc",
 filter => "(objectClass=inetOrgPerson)"
);

$DATETIME=`date`;

sub get                                               # get field value
{
  return $_[0]->exists($_[1])
    ? $_[0]->get($_[1])->[$_[2] ? $_[2] : 0]
    : '';
}

print "$DATETIME";                # first line with file created date/time

print '"'.(join '","', "Report",                # header
  map {$_->[0]} @fields)."\"\n";

foreach $entry ($mesg->all_entries)                   # for each entry
{
  @dn = map {s/[a-z]+=//gi; $_ = ucfirst}             # organization
     reverse split /,/, $entry->dn;
  shift @dn;
  shift @dn;
  shift @dn;
  pop @dn;

  print '"'.(join '","', "@dn",                       # row
    map {get $entry, $_->[1], $_->[2]} @fields)."\"\n";
}

$mesg = $ldap->unbind;                                # close connection

Here is a simple test of the regexp I posted:

$row = [qq{this is\n a test},qq{this is\ranother test}];
s/[\r\n]// for @{$row};
print @{$row};

That appears to work fine. Why it causes your code to seemingly hang, I don't know. In the code you posted before, do this:

my $row = $csv->getline ($fh);
print $row,"\n";

and see what $row is. If it does not return something like ARRAY(0x155526c) then what does it print?

HI,

I am getting the following characters printed along with distorted data:

ARRAY(0x199bfa4)

On the other hand, is it possible to remove the quotes on the first perl script as mentioned above, which uses NET::LDAP module.
It is because if the quotes are removed from fields then there is no need to use the second perl script which escape to the quotes. We are seeing newline character issue on the output of second perl script, which uses Text::CSV_XS module, both of these script are pasted in this thread.

Can't this line be modified as per the need?

print '"'.(join '","', "@dn", # row
map {get $entry, $_->[1], $_->[2]} @fields)."\"\n";
}

csv files typically use quotes around fields that have text or non-digit data in them. If you don't want the quotes you can try this:

print (join ',' , "@dn", # row
map {get $entry, $_->[1], $_->[2]} @fields)."\n";
}

I see you posted the same question on another forum and provided more detail. My suggestion above will not do what you want.

Actually, I want the quotes to be removed from the fields and not the quotes which cover them.

For example:

my CSV file consists some entries like the following, notice the quotes within field 6,

"abc","1234","HCM","John D","JOHN D","rocks" 12","seigo"

my expected line should be something like this:

"abc","1234","HCM","John D","JOHN D","rocks 12","seigo"

Thanks in Advance.

This question has already been answered. Start a new discussion instead.