Hi,
I am attempting to read in a CSV file, modify certain arrays and output it into a new file. It works fine on a mac, however on Windows it does not do anything. Any ideas what im doing wrong? Code posted below;
open(FILE,"file.csv") or die $!;
open(TARGET,"> newfile.csv") or die $!;
while($line = <FILE>){
chomp($line);
next unless($line =~ m/\S+/);
@array = split("‰",$line);
if ($array[3] == 2)
{
$array[3] = "Read";
}
$line = join("‰",@array);
print TARGET $line ."\n";
}
close FILE;
close TARGET;
Thank you for your help,
Tango
Why not attach your file.csv to your post? (Look for the "Manage Attachments" button and attach your file.) Then someone who has Windows can try your script and may find the problem.
d5e5
Practically a Posting Shark
810 posts since Sep 2009
Reputation Points: 159
Solved Threads: 159
Hi,
I am attempting to read in a CSV file, modify certain arrays and output it into a new file. It works fine on a mac, however on Windows it does not do anything. Any ideas what im doing wrong? Code posted below;
open(FILE,"file.csv") or die $!;
open(TARGET,"> newfile.csv") or die $!;
while($line = <FILE>){
chomp($line);
next unless($line =~ m/\S+/);
@array = split("‰",$line);
if ($array[3] == 2)
{
$array[3] = "Read";
}
$line = join("‰",@array);
print TARGET $line ."\n";
}
close FILE;
close TARGET;
Thank you for your help,
Tango
When you say "on Windows it does not do anything" do you mean that it creates the output file newfile.csv and writes the contents of file.csv into newfile.csv so that the output file contains the same data as the input file? That is what your script did when I ran it on my computer, which has Ubuntu (Linux), not Windows. Also, I noticed that when I added use strict; use warnings to the start of the script (always recommended) it gave me the following non-fatal errors:
Use of uninitialized value $array[3] in numeric eq (==) at /home/david/Programming/Perl/temp.pl line 14, <FILE> line 1.
Use of uninitialized value $array[3] in numeric eq (==) at /home/david/Programming/Perl/temp.pl line 14, <FILE> line 2.
Use of uninitialized value $array[3] in numeric eq (==) at /home/david/Programming/Perl/temp.pl line 14, <FILE> line 3.
Use of uninitialized value $array[3] in numeric eq (==) at /home/david/Programming/Perl/temp.pl line 14, <FILE> line 4.
Use of uninitialized value $array[3] in numeric eq (==) at /home/david/Programming/Perl/temp.pl line 14, <FILE> line 5.
d5e5
Practically a Posting Shark
810 posts since Sep 2009
Reputation Points: 159
Solved Threads: 159
Can you tell us what encoding was used to create file.csv? My text editor guesses that it is encoded in Latin-1/ISO-8859-1 but that is just a guess. My text editor (Komodo Edit) displays file.csv as follows:
messageidâ°Nameâ°Time / Dateâ°Sent/Receivedâ°Readâ°Messageâ°Data Terminator
1â°Joâ°1â°Monâ°3â°1â°Message 1 here,°.
2â°Joâ°2â°Monâ°2â°1â°Message 2 hereâ°.
3â°Jimâ°1â°Wedâ°2â°1â°Message 3 hereâ°.
4â°Janeâ°2â°Wedâ°3â°1â°Message 4 here,°.
Your script tries to split on characters that look like "‰" so your split doesn't work as expected. My guess is that what you are doing wrong is not specifying the encoding for the input and output files. Perl has to guess what encoding to use and it seems to guess differently on a mac than it does on Windows or Linux.
d5e5
Practically a Posting Shark
810 posts since Sep 2009
Reputation Points: 159
Solved Threads: 159
Hi, Thank you for your reply, sorry for the late reply back.
Yep, to create the CSV file I just logged into an SQL database and outputted the table using Ä∞ as the separator and .Ä∞ as the data terminator. How would I specify the input and output encoding in perl? I haven't specified any encoding on the mac side but im guess it could be defaulting and using some encoding scheme. I've also tried extracting the data from the SQL database on windows but I still run into the same problem when using the perl script.
The snippet you've put above of your readout is consistent with mine so your text editor is reading it correctly.
You can specify the encoding when opening the files (see script below), but we still don't know what the encoding is. You logged into an SQL database and exported the contents of a table so the SQL database encoded the resulting file in whatever the default encoding was for that SQL database. Although our text editors tell us the encoding is Latin-1/ISO-8859-1 that can't be correct because the infinity symbol that you use, ∞ , does not belong to the Latin-1 character set .
The following script assumes the correct encoding is utf-8. Running it results in some non-fatal error messages saying some of the line terminator values are not valid, but since we don't know the actual encoding it's the best I can do.
#!/usr/bin/perl
use strict;
use warnings;
use 5.010;
use utf8;
my $in_filename = 'file.csv';
my $out_filename = 'newfile.csv';
my $csv_delim = chr(0x2030);
open my $in_fh, '<:encoding(utf8)', $in_filename or die $!; # auto UTF-8 decoding on read
open(my $out_fh,'>:encoding(utf8)', $out_filename) or die $!;
while(my $line = <$in_fh>){
chomp($line);
next unless($line =~ m/\S+/);
my @array = split($csv_delim,$line);
if ($array[2] eq '2'){
$array[2] = "Read";
}
$line = join($csv_delim,@array);
print $out_fh $line ."\n";
}
close $in_fh;
close $out_fh;
d5e5
Practically a Posting Shark
810 posts since Sep 2009
Reputation Points: 159
Solved Threads: 159
I guess the error messages indicate that utf8 is not the correct encoding. Please try the following script. It doesn't give me any errors on my Linux platform:
#!/usr/bin/perl
use strict;
use warnings;
use 5.010;
my $in_filename = 'file.csv';
my $out_filename = 'newfile.csv';
my $csv_delim = chr(0x00e2) . chr(0x0080) . chr(0x00b0);
#Open files with auto Latin-1/ISO-8859-1 decoding/encoding on I/O
open my $in_fh, '<:encoding(ISO-8859-1)', $in_filename or die $!;
open(my $out_fh,'>:encoding(ISO-8859-1)', $out_filename) or die $!;
while(my $line = <$in_fh>){
chomp($line);
next unless($line =~ m/\S+/);
my @array = split($csv_delim,$line);
if ($array[2] eq '2'){
$array[2] = "Read";
}
$line = join($csv_delim,@array);
print $out_fh $line ."\n";
}
close $in_fh;
close $out_fh;
d5e5
Practically a Posting Shark
810 posts since Sep 2009
Reputation Points: 159
Solved Threads: 159
Without knowing what encoding the database used when exporting the data to a file there's not much you can do except guess. Here's a test script in which I guess that the encoding might be UTF8:
#!/usr/bin/perl
use strict;
use warnings;
use 5.008;
use Encode qw(encode decode);
my $line = decode('UTF8', readline DATA);
#Looked up the following at http://www.unicodemap.org/search.asp
#Character Information Unicode Range Unicode Binary UCS-2 Binary
#"â" LATIN SMALL LETTER A WITH CIRCUMFLEX Latin-1 Supplement 0x00E2 0xE200
#"" Latin-1 Supplement 0x0080 0x8000
#"°" DEGREE SIGN Latin-1 Supplement 0x00B0 0xB000
my $csv_delim = chr(0x00E2) . chr(0x0080) . chr(0x00B0);
binmode STDOUT, ':encoding(UTF8)';
print "Let's try splitting with $csv_delim:\n";
my @array = split($csv_delim,$line);
my $index = 0;
foreach(@array){
print "Array element $index = $array[$index]\n";
$index++;
}
__DATA__
messageidâ°Nameâ°Time / Dateâ°Sent/Receivedâ°Readâ°Messageâ°Data Terminator
1â°Joâ°1â°Monâ°3â°1â°Message 1 here,°.
2â°Joâ°2â°Monâ°2â°1â°Message 2 hereâ°.
3â°Jimâ°1â°Wedâ°2â°1â°Message 3 hereâ°.
4â°Janeâ°2â°Wedâ°3â°1â°Message 4 here,°.
When I run it under Linux, and also on http://codepad.org/YkfPTjPb (try it there and see), I get the following output:
Let's try splitting with â°:
Array element 0 = messageid
Array element 1 = Name
Array element 2 = Time / Date
Array element 3 = Sent/Received
Array element 4 = Read
Array element 5 = Message
Array element 6 = Data Terminator
d5e5
Practically a Posting Shark
810 posts since Sep 2009
Reputation Points: 159
Solved Threads: 159
What version of Windows do you use? I ran the following script using your file.csv on a Windows Vista SP2 platform and did not get any errors. Komodo Edit text editor for Windows says the file encoding is CP1253 so I modified the script based on that assumption.
#!/usr/bin/perl
use strict;
use warnings;
use 5.008;
use utf8;
use Encode qw(encode decode);
my $in_filename = 'C:/Users/David/Programming/Perl/file.csv';
my $out_filename = 'C:/Users/David/Programming/Perl/newfile.csv';
#When viewed under Windows Vista in Komodo Edit 6.1 delimiter looks like ‰
#Looked up ‰ at http://www.unicodemap.org/search.asp
my $csv_delim = chr(0x03B2) . chr(0x20AC) . chr(0x00B0);
open my $in_fh, '<', $in_filename or die $!;
open(my $out_fh,'>', $out_filename) or die $!;
while(my $line = <$in_fh>){
chomp($line);
next unless($line =~ m/\S+/);
my $line = decode('CP1253', $line);
my @array = split($csv_delim,$line);
if ($array[2] eq '2'){
$array[2] = "Read";
}
$line = join($csv_delim,@array);
$line = encode('CP1253', $line);
print $out_fh $line ."\n";
}
close $in_fh;
close $out_fh;
After running the script in Windows Vista, newfile.csv contains the following:
messageid‰Name‰Time / Date‰Sent/Received‰Read‰Message‰Data Terminator
1‰Jo‰1‰Mon‰3‰1‰Message 1 here,€°.
2‰Jo‰Read‰Mon‰2‰1‰Message 2 here‰.
3‰Jim‰1‰Wed‰2‰1‰Message 3 here‰.
4‰Jane‰Read‰Wed‰3‰1‰Message 4 here,€°.
d5e5
Practically a Posting Shark
810 posts since Sep 2009
Reputation Points: 159
Solved Threads: 159
Ah I can see an issue here -- change the line
if ($array[2] eq '2'){
$array[2] = "Read";
}
to
if ($array[4] eq '2'){
$array[4] = "Read";
}
The result were looking for is:
messageid‰Name‰Time / Date‰Sent/Received‰Read‰Message‰Data Terminator
1‰Jo‰1‰Mon‰3‰1‰Message 1 here,€°.
2‰Jo‰Read‰Mon‰Read‰1‰Message 2 here‰. --- 'Read' is displayed rather than 2
3‰Jim‰1‰Wed‰Read‰1‰Message 3 here‰. --- 'Read' is displayed rather than 2
4‰Jane‰Read‰Wed‰3‰1‰Message 4 here,€°.
The idea of this script is to substitute numbers in certain array positions (i.e. array 4 if number 2 is there, change it to 'Read') than can then be read into something else
I'm sorry I have no idea how i've made this mistake and put array 2 rather than array 4, sorry if i've wasted anyones time, i really appreciate your help
That's OK. So would seem that once you make that change in the array index (4 instead of 2) the script should run correctly on both platforms, right? If so, please mark this thread solved.
d5e5
Practically a Posting Shark
810 posts since Sep 2009
Reputation Points: 159
Solved Threads: 159
I'm stumped. It works for me with ActivePerl on Windows Vista. I can't test it on Windows XP because I don't have XP. When you run the following test script, to see if the split populated your array correctly, what does it output?
#!/usr/bin/perl
use strict;
use warnings;
use 5.008;
use utf8;
use Encode qw(encode decode);
use Data::Dumper;
my $in_filename = 'file.csv';
#When viewed under Windows Vista in Komodo Edit 6.1 delimiter looks like ‰
#Looked up ‰ at http://www.unicodemap.org/search.asp
my $csv_delim = chr(0x03B2) . chr(0x20AC) . chr(0x00B0);
open my $in_fh, '<', $in_filename or die $!;
my $line;
$line = <$in_fh>;#Read and skip the first couple of records
$line = <$in_fh>;#Read and skip the first couple of records
$line = <$in_fh>;#Read the message 2 record
$line = decode('CP1253', $line);
my @array = split($csv_delim,$line);
print Dumper(\@array); #Let's see what's in the array. Did the split work?
close $in_fh;
When I run it on Linux or Windows Vista it prints the following:
$VAR1 = [
'2',
'Jo',
'2',
'Mon',
'2',
'1',
'Message 2 here',
'.
'
];
If it prints something different for you on Windows XP then the split is not separating the data record into separate elements on Windows XP.
d5e5
Practically a Posting Shark
810 posts since Sep 2009
Reputation Points: 159
Solved Threads: 159