Hi,

I am attempting to read in a CSV file, modify certain arrays and output it into a new file. It works fine on a mac, however on Windows it does not do anything. Any ideas what im doing wrong? Code posted below;

open(FILE,"file.csv") or die $!;
open(TARGET,"> newfile.csv") or die $!;

while($line = <FILE>){
	chomp($line);
	next unless($line =~ m/\S+/);
	@array = split("‰",$line);
	
	
     if ($array[3] == 2)
	{
		
		$array[3] = "Read";
	}
			
	$line = join("‰",@array);
	
	print TARGET $line ."\n";
}
close FILE;
close TARGET;

Thank you for your help,
Tango

Recommended Answers

All 22 Replies

Hi,

I am attempting to read in a CSV file, modify certain arrays and output it into a new file. It works fine on a mac, however on Windows it does not do anything. Any ideas what im doing wrong? Code posted below;

open(FILE,"file.csv") or die $!;
open(TARGET,"> newfile.csv") or die $!;

while($line = <FILE>){
	chomp($line);
	next unless($line =~ m/\S+/);
	@array = split("‰",$line);
	
	
     if ($array[3] == 2)
	{
		
		$array[3] = "Read";
	}
			
	$line = join("‰",@array);
	
	print TARGET $line ."\n";
}
close FILE;
close TARGET;

Thank you for your help,
Tango

Why not attach your file.csv to your post? (Look for the "Manage Attachments" button and attach your file.) Then someone who has Windows can try your script and may find the problem.

Hi,

Example CSV attached - it wouldn't allow me to upload as a CSV so I have renamed the extension to TXT so you'll need to just rename it back.

Thank you,
Tango

Why not attach your file.csv to your post? (Look for the "Manage Attachments" button and attach your file.) Then someone who has Windows can try your script and may find the problem.

Hi,
Thank you for your reply, I have now attached the CSV file too.

Cheers,
Tango

Hi,

I am attempting to read in a CSV file, modify certain arrays and output it into a new file. It works fine on a mac, however on Windows it does not do anything. Any ideas what im doing wrong? Code posted below;

open(FILE,"file.csv") or die $!;
open(TARGET,"> newfile.csv") or die $!;

while($line = <FILE>){
	chomp($line);
	next unless($line =~ m/\S+/);
	@array = split("‰",$line);
	
	
     if ($array[3] == 2)
	{
		
		$array[3] = "Read";
	}
			
	$line = join("‰",@array);
	
	print TARGET $line ."\n";
}
close FILE;
close TARGET;

Thank you for your help,
Tango

When you say "on Windows it does not do anything" do you mean that it creates the output file newfile.csv and writes the contents of file.csv into newfile.csv so that the output file contains the same data as the input file? That is what your script did when I ran it on my computer, which has Ubuntu (Linux), not Windows. Also, I noticed that when I added use strict; use warnings to the start of the script (always recommended) it gave me the following non-fatal errors:

Use of uninitialized value $array[3] in numeric eq (==) at /home/david/Programming/Perl/temp.pl line 14, <FILE> line 1.
Use of uninitialized value $array[3] in numeric eq (==) at /home/david/Programming/Perl/temp.pl line 14, <FILE> line 2.
Use of uninitialized value $array[3] in numeric eq (==) at /home/david/Programming/Perl/temp.pl line 14, <FILE> line 3.
Use of uninitialized value $array[3] in numeric eq (==) at /home/david/Programming/Perl/temp.pl line 14, <FILE> line 4.
Use of uninitialized value $array[3] in numeric eq (==) at /home/david/Programming/Perl/temp.pl line 14, <FILE> line 5.

Can you tell us what encoding was used to create file.csv? My text editor guesses that it is encoded in Latin-1/ISO-8859-1 but that is just a guess. My text editor (Komodo Edit) displays file.csv as follows:

messageidâ°Nameâ°Time / Dateâ°Sent/Receivedâ°Readâ°Messageâ°Data Terminator
1â°Joâ°1â°Monâ°3â°1â°Message 1 here,°.
2â°Joâ°2â°Monâ°2â°1â°Message 2 hereâ°.
3â°Jimâ°1â°Wedâ°2â°1â°Message 3 hereâ°.
4â°Janeâ°2â°Wedâ°3â°1â°Message 4 here,°.

Your script tries to split on characters that look like "‰" so your split doesn't work as expected. My guess is that what you are doing wrong is not specifying the encoding for the input and output files. Perl has to guess what encoding to use and it seems to guess differently on a mac than it does on Windows or Linux.

Hi, Thank you for your reply, sorry for the late reply back.

Yep, to create the CSV file I just logged into an SQL database and outputted the table using Ä∞ as the separator and .Ä∞ as the data terminator. How would I specify the input and output encoding in perl? I haven't specified any encoding on the mac side but im guess it could be defaulting and using some encoding scheme. I've also tried extracting the data from the SQL database on windows but I still run into the same problem when using the perl script.

The snippet you've put above of your readout is consistent with mine so your text editor is reading it correctly.

Hi, Thank you for your reply, sorry for the late reply back.

Yep, to create the CSV file I just logged into an SQL database and outputted the table using Ä∞ as the separator and .Ä∞ as the data terminator. How would I specify the input and output encoding in perl? I haven't specified any encoding on the mac side but im guess it could be defaulting and using some encoding scheme. I've also tried extracting the data from the SQL database on windows but I still run into the same problem when using the perl script.

The snippet you've put above of your readout is consistent with mine so your text editor is reading it correctly.

You can specify the encoding when opening the files (see script below), but we still don't know what the encoding is. You logged into an SQL database and exported the contents of a table so the SQL database encoded the resulting file in whatever the default encoding was for that SQL database. Although our text editors tell us the encoding is Latin-1/ISO-8859-1 that can't be correct because the infinity symbol that you use, , does not belong to the Latin-1 character set.

The following script assumes the correct encoding is utf-8. Running it results in some non-fatal error messages saying some of the line terminator values are not valid, but since we don't know the actual encoding it's the best I can do.

#!/usr/bin/perl
use strict;
use warnings;
use 5.010;
use utf8;

my $in_filename = 'file.csv';
my $out_filename = 'newfile.csv';
my $csv_delim = chr(0x2030);

open my $in_fh, '<:encoding(utf8)', $in_filename or die $!;  # auto UTF-8 decoding on read
open(my $out_fh,'>:encoding(utf8)', $out_filename) or die $!;

while(my $line = <$in_fh>){
    
    chomp($line);
    next unless($line =~ m/\S+/);
    my @array = split($csv_delim,$line);

    if ($array[2] eq '2'){
        
        $array[2] = "Read";
    }
                    
    $line = join($csv_delim,@array);
    
    print $out_fh $line ."\n";
}
close $in_fh;
close $out_fh;

Thank you very much. Im sorry I am unable to provide you with any more detailed information, I will see if I can work out the encoding. I will test out the code shortly and get back to you.

Thank you again for your time and effort

Update: the responce I get when running that snippet is

utf8 "\89" does not map to Unicode at test.pl line 14, ($in_fh) line 1.
(for every line in csv file)

I guess the error messages indicate that utf8 is not the correct encoding. Please try the following script. It doesn't give me any errors on my Linux platform:

#!/usr/bin/perl
use strict;
use warnings;
use 5.010;

my $in_filename = 'file.csv';
my $out_filename = 'newfile.csv';
my $csv_delim = chr(0x00e2) . chr(0x0080) . chr(0x00b0);

#Open files with auto Latin-1/ISO-8859-1 decoding/encoding on I/O
open my $in_fh, '<:encoding(ISO-8859-1)', $in_filename or die $!;
open(my $out_fh,'>:encoding(ISO-8859-1)', $out_filename) or die $!;

while(my $line = <$in_fh>){
    
    chomp($line);
    next unless($line =~ m/\S+/);
    my @array = split($csv_delim,$line);

    if ($array[2] eq '2'){
        
        $array[2] = "Read";
    }
                    
    $line = join($csv_delim,@array);
    
    print $out_fh $line ."\n";
}
close $in_fh;
close $out_fh;

I guess the error messages indicate that utf8 is not the correct encoding. Please try the following script. It doesn't give me any errors on my Linux platform:

#!/usr/bin/perl
use strict;
use warnings;
use 5.010;

my $in_filename = 'file.csv';
my $out_filename = 'newfile.csv';
my $csv_delim = chr(0x00e2) . chr(0x0080) . chr(0x00b0);

#Open files with auto Latin-1/ISO-8859-1 decoding/encoding on I/O
open my $in_fh, '<:encoding(ISO-8859-1)', $in_filename or die $!;
open(my $out_fh,'>:encoding(ISO-8859-1)', $out_filename) or die $!;

while(my $line = <$in_fh>){
    
    chomp($line);
    next unless($line =~ m/\S+/);
    my @array = split($csv_delim,$line);

    if ($array[2] eq '2'){
        
        $array[2] = "Read";
    }
                    
    $line = join($csv_delim,@array);
    
    print $out_fh $line ."\n";
}
close $in_fh;
close $out_fh;

Hi,

Sorry for late reply, bank holiday breaks have been causing problems.

I have tested the code however it still displays the same error on my windows laptop
'Use of uninitialized value $array[2] in string eq at FILEPATH line 20, <$in_fh> line (EVERYLINE)


I have also tried modifying the line:

if ($array[2] eq '2'){
        
        $array[2] = "Read";
    }

to:

if ($array[2] == '2'){

if ($array[2] == 2){

if ($array[2] eq 2){

Mac and Linux don't seem to have a problem with it and it works fine, however on windows, it generates the new file and copies over all the data but does not modify the set array value for 2.

Cheers

Without knowing what encoding the database used when exporting the data to a file there's not much you can do except guess. Here's a test script in which I guess that the encoding might be UTF8:

#!/usr/bin/perl
use strict;
use warnings;
use 5.008;

use Encode qw(encode decode);
my $line   = decode('UTF8', readline DATA);

#Looked up the following at http://www.unicodemap.org/search.asp
#Character Information	Unicode Range	Unicode Binary	UCS-2 Binary
#"â"	LATIN SMALL LETTER A WITH CIRCUMFLEX	Latin-1 Supplement	0x00E2	0xE200
#""		Latin-1 Supplement	0x0080	0x8000
#"°"	DEGREE SIGN	Latin-1 Supplement	0x00B0	0xB000

my $csv_delim = chr(0x00E2) . chr(0x0080) . chr(0x00B0);

binmode STDOUT, ':encoding(UTF8)';
print "Let's try splitting with $csv_delim:\n";

my @array = split($csv_delim,$line);

my $index = 0;
foreach(@array){
    print "Array element $index = $array[$index]\n";
    $index++;
}

__DATA__
messageidâ°Nameâ°Time / Dateâ°Sent/Receivedâ°Readâ°Messageâ°Data Terminator
1â°Joâ°1â°Monâ°3â°1â°Message 1 here,°.
2â°Joâ°2â°Monâ°2â°1â°Message 2 hereâ°.
3â°Jimâ°1â°Wedâ°2â°1â°Message 3 hereâ°.
4â°Janeâ°2â°Wedâ°3â°1â°Message 4 here,°.

When I run it under Linux, and also on http://codepad.org/YkfPTjPb (try it there and see), I get the following output:

Let's try splitting with â°:
Array element 0 = messageid
Array element 1 = Name
Array element 2 = Time / Date
Array element 3 = Sent/Received
Array element 4 = Read
Array element 5 = Message
Array element 6 = Data Terminator

What version of Windows do you use? I ran the following script using your file.csv on a Windows Vista SP2 platform and did not get any errors. Komodo Edit text editor for Windows says the file encoding is CP1253 so I modified the script based on that assumption.

#!/usr/bin/perl
use strict;
use warnings;
use 5.008;
use utf8;
use Encode qw(encode decode);
my $in_filename = 'C:/Users/David/Programming/Perl/file.csv';
my $out_filename = 'C:/Users/David/Programming/Perl/newfile.csv';

#When viewed under Windows Vista in Komodo Edit 6.1 delimiter looks like ‰
#Looked up ‰ at http://www.unicodemap.org/search.asp
my $csv_delim = chr(0x03B2) . chr(0x20AC) . chr(0x00B0);

open my $in_fh, '<', $in_filename or die $!;
open(my $out_fh,'>', $out_filename) or die $!;

while(my $line = <$in_fh>){
    chomp($line);
    next unless($line =~ m/\S+/);
    
    my $line   = decode('CP1253', $line);
    my @array = split($csv_delim,$line);

    if ($array[2] eq '2'){
        $array[2] = "Read";
    }
                    
    $line = join($csv_delim,@array);
    $line = encode('CP1253', $line);
    print $out_fh $line ."\n";
}
close $in_fh;
close $out_fh;

After running the script in Windows Vista, newfile.csv contains the following:

messageid‰Name‰Time / Date‰Sent/Received‰Read‰Message‰Data Terminator
1‰Jo‰1‰Mon‰3‰1‰Message 1 here,€°.
2‰Jo‰Read‰Mon‰2‰1‰Message 2 here‰.
3‰Jim‰1‰Wed‰2‰1‰Message 3 here‰.
4‰Jane‰Read‰Wed‰3‰1‰Message 4 here,€°.

Ah I can see an issue here -- change the line

if ($array[2] eq '2'){
        $array[2] = "Read";
    }

to

if ($array[4] eq '2'){
        $array[4] = "Read";
    }

The result were looking for is:

messageid‰Name‰Time / Date‰Sent/Received‰Read‰Message‰Data Terminator
1‰Jo‰1‰Mon‰3‰1‰Message 1 here,€°.
2‰Jo‰Read‰Mon‰Read‰1‰Message 2 here‰.      --- 'Read' is displayed rather than 2
3‰Jim‰1‰Wed‰Read‰1‰Message 3 here‰.         --- 'Read' is displayed rather than 2
4‰Jane‰Read‰Wed‰3‰1‰Message 4 here,€°.

The idea of this script is to substitute numbers in certain array positions (i.e. array 4 if number 2 is there, change it to 'Read') than can then be read into something else

I'm sorry I have no idea how i've made this mistake and put array 2 rather than array 4, sorry if i've wasted anyones time, i really appreciate your help

Ah I can see an issue here -- change the line

if ($array[2] eq '2'){
        $array[2] = "Read";
    }

to

if ($array[4] eq '2'){
        $array[4] = "Read";
    }

The result were looking for is:

messageid‰Name‰Time / Date‰Sent/Received‰Read‰Message‰Data Terminator
1‰Jo‰1‰Mon‰3‰1‰Message 1 here,€°.
2‰Jo‰Read‰Mon‰Read‰1‰Message 2 here‰.      --- 'Read' is displayed rather than 2
3‰Jim‰1‰Wed‰Read‰1‰Message 3 here‰.         --- 'Read' is displayed rather than 2
4‰Jane‰Read‰Wed‰3‰1‰Message 4 here,€°.

The idea of this script is to substitute numbers in certain array positions (i.e. array 4 if number 2 is there, change it to 'Read') than can then be read into something else

I'm sorry I have no idea how i've made this mistake and put array 2 rather than array 4, sorry if i've wasted anyones time, i really appreciate your help

That's OK. So would seem that once you make that change in the array index (4 instead of 2) the script should run correctly on both platforms, right? If so, please mark this thread solved.

Hi,

I have change the array to array 4 and to array 5 in separate tests - still NO luck im afraid. I can't work out what i'm doing wrong :-S .

It reads in fine, and outputs fine but doesn't change the value 2 to 'Read'.

I got the same error again on my cmd window:

Use of uninitialized value $array[5] in string eq at test.pl line 24, <in_fh> line EVERYLINENO

Question: To use perl on windows XP I had to install a freeware program called 'Strawberry-perl' and im wondering if its that. What does everyone else use?

Thank you for your persistence and time,
Tango

Update: I've just replaced strawberry perl with active perl but it has made no difference, same errors etc.

I'm stumped. It works for me with ActivePerl on Windows Vista. I can't test it on Windows XP because I don't have XP. When you run the following test script, to see if the split populated your array correctly, what does it output?

#!/usr/bin/perl
use strict;
use warnings;
use 5.008;
use utf8;
use Encode qw(encode decode);
use Data::Dumper;

my $in_filename = 'file.csv';

#When viewed under Windows Vista in Komodo Edit 6.1 delimiter looks like ‰
#Looked up ‰ at http://www.unicodemap.org/search.asp
my $csv_delim = chr(0x03B2) . chr(0x20AC) . chr(0x00B0);

open my $in_fh, '<', $in_filename or die $!;

my $line;
$line = <$in_fh>;#Read and skip the first couple of records
$line = <$in_fh>;#Read and skip the first couple of records
$line = <$in_fh>;#Read the message 2 record
$line   = decode('CP1253', $line);
my @array = split($csv_delim,$line);

print Dumper(\@array); #Let's see what's in the array. Did the split work?

close $in_fh;

When I run it on Linux or Windows Vista it prints the following:

$VAR1 = [
          '2',
          'Jo',
          '2',
          'Mon',
          '2',
          '1',
          'Message 2 here',
          '.
'
        ];

If it prints something different for you on Windows XP then the split is not separating the data record into separate elements on Windows XP.

Hi,

Thank you for the test script. The script appears to split the array correctly which has made me go back and rebuild my csv files where I still ran into the same problem.

However, when I manually changed all the delimiters from Ä∞" (which looks like a percent sign with an extra o at the back on the mac first created on (i.e. o/oo - called per mill and can be seen http://www.nouilles.info/keyboard_shortcuts.html) to the delimiter you are using it works fine!). Any ideas how I can get this delimited to work without having to change the csv building method changing the delimiter?

Thank you so much for your help and patience with this, I have added rep points for your help and hard work. Im sorry if i've wasted anytime through this issue,

cheers, Tango

When I Googled "per mille sign" encoding I saw this document which says

There is a "per mille
# sign" variant of the Mac OS Gaelic encoding that still
# maps 0xE4 to U+2030; this can be used for older fonts.
# Note: U+204A is new with Unicode 3.0; for earlier Unicode
# versions, Mac OS Gaelic was unified with AMPERSAND.

which seems too arcane for me to understand except it made me think "Let's try the 'MacGaelic' encoding." Unfortunately, that doesn't work for me because I don't have a 'MacGaelic' encoding on my computer. However, I did try 'MacRoman' in the following script which seems to work OK under Linux (and under Windows Vista also).

#!/usr/bin/perl
use strict;
use warnings;
use 5.008;
use utf8;
use Encode qw(encode decode);
my $in_filename = 'file.csv';
my $out_filename = 'newfile.csv';

#http://www.unicodemap.org/search.asp?search=%E2%80%9A%C3%84%E2%88%9E
my $csv_delim = chr(0x201A) . chr(0x00C4) . chr(0x221E);

open my $in_fh, '<', $in_filename or die $!;
open(my $out_fh,'>', $out_filename) or die $!;

while(my $line = <$in_fh>){
    chomp($line);
    next unless($line =~ m/\S+/);
    
    my $line   = decode('MacRoman', $line);
    my @array = split($csv_delim,$line);

    if ($array[4] eq '2'){
        $array[4] = "Read";
    }
                    
    $line = join($csv_delim,@array);
    $line = encode('MacRoman', $line);
    print $out_fh $line ."\n";
}
close $in_fh;
close $out_fh;

On Linux, running the above script puts the following in newfile.csv:

messageid‰Name‰Time / Date‰Sent/Received‰Read‰Message‰Data Terminator
1‚Ä∞Jo‚Ä∞1‚Ä∞Mon‚Ä∞3‚Ä∞1‚Ä∞Message 1 here,Ä∞.
2‰Jo‰2‰Mon‰Read‰1‰Message 2 here‰.
3‰Jim‰1‰Wed‰Read‰1‰Message 3 here‰.
4‚Ä∞Jane‚Ä∞2‚Ä∞Wed‚Ä∞3‚Ä∞1‚Ä∞Message 4 here,Ä∞.

Hi, on the example it works however on the files extracted from SQL databases it doesn't. It only seems to work on text files I have manually created. Can you test to see if this is just my machine causing this problem please? I have attached an example extracted file to this message. I have also attached previous perl script.

Due to upload restrictions;
Please rename NewTesting.txt to NewTesting.pl
Please rename file.txt to file.csv

Thank you so much

Hi, on the example it works however on the files extracted from SQL databases it doesn't. It only seems to work on text files I have manually created. Can you test to see if this is just my machine causing this problem please? I have attached an example extracted file to this message. I have also attached previous perl script.

Due to upload restrictions;
Please rename NewTesting.txt to NewTesting.pl
Please rename file.txt to file.csv

Thank you so much

The script doesn't split the data properly from the file you attached most recently. However when I look at the file you previously attached and the most recent file the delimiter characters don't appear the same. My text editor assumes they are both encoded as Latin-1 and they look like this:

Previous file.csv:

messageidâ°Nameâ°Time / Dateâ°Sent/Receivedâ°Readâ°Messageâ°Data Terminator
1â°Joâ°1â°Monâ°3â°1â°Message 1 here,°.
2â°Joâ°2â°Monâ°2â°1â°Message 2 hereâ°.
3â°Jimâ°1â°Wedâ°2â°1â°Message 3 hereâ°.
4â°Janeâ°2â°Wedâ°3â°1â°Message 4 here,°.

Current file.csv:

NumberAB FirstTime / DateSent/ReceivedReadMessageData Terminator
1JoMon32message 1.
2JoMon21message 2.
3JimWed32message 3.
4JaneWed21message 4.

See? The first one seems to have extra characters in the delimiter.

To get it to work for me with the most recent file I had to change the delimiter character used to split and join and change my guess for the encoding back to LATIN-1, in the following:

#!/usr/bin/perl
use strict;
use warnings;
use 5.008;
use utf8;
use Encode qw(encode decode);
my $in_filename = 'file.csv';
my $out_filename = 'newfile.csv';

#http://www.unicodemap.org/search.asp?search=%E2%80%9A%C3%84%E2%88%9E
#my $csv_delim = chr(0x201A) . chr(0x00C4) . chr(0x221E);#Worked for your old file
my $csv_delim = chr(0x0089);#Works for your recently attached file

open my $in_fh, '<', $in_filename or die $!;
open(my $out_fh,'>', $out_filename) or die $!;

while(my $line = <$in_fh>){
    chomp($line);
    next unless($line =~ m/\S+/);
    
    my $line   = decode('LATIN-1', $line);
    my @array = split($csv_delim,$line);

    if ($array[4] eq '2'){
        $array[4] = "Read";
    }
                    
    $line = join($csv_delim,@array);
    $line = encode('LATIN-1', $line);
    print $out_fh $line ."\n";
}
close $in_fh;
close $out_fh;

This puts the following in newfile.csv:

NumberAB FirstTime / DateSent/ReceivedReadMessageData Terminator
1JoMon3Readmessage 1.
2JoMon21message 2.
3JimWed3Readmessage 3.
4JaneWed21message 4.
Be a part of the DaniWeb community

We're a friendly, industry-focused community of developers, IT pros, digital marketers, and technology enthusiasts meeting, networking, learning, and sharing knowledge.