Hello Dear DaniWeb community

Here I am again, with a new problem :-)

For those of you that are reading every post you probably have already seen that I'm trying to implement an auto-extracting process which extract data from ActiveDirectory, process it in order to make a filtered csv file with specific datas (these data are published in an intranet phonebook). User's data are processed in the filter, and converted into Excel file so that a manager can modify manually the data. And then they are re-imported into the AD in order to maintain updated data in it.

Well my problem is that because of special characters in user's info in ActiveDirectory, I have to use operator "-u" in csvde. And even with that I encounter hex values instead of string.

I think that these hex values are generated every time there is a \n -newline- in the string.

Here is an example:

Header:
DN,cn,sn,c,l,st,title,postalCode,telephoneNumber,facsimileTelephoneNumber,givenName,co,department,streetAddress,mailNickname,userAccountControl,lastLogon,accountExpires,logonCount,objectCategory,mail,mobile,physicalDeliveryOfficeName

"CN=BERRY Richard,OU=TestFinance,OU=HR,OU=CORP,OU=CR,DC=mycorp,DC=com","BERRY Richard",BERRY,FR,"Puylouvier",FRANCE,,56000,"+33 1 23 45 67 89","+ 33 1 23 45 67 80",Richard,FRANCE,IT,X'332c206176656e7565204e6577746f6e0d0a',Bob.Malone,Richard.BERRY@mycorp.net," -",

For the field "streetAddress" we have the value "X'332c206176656e7565204e6577746f6e0d0a'", which is a string encoded in hex, and means "3, avenue Newton
" once decoded. As you can see there's a newline just after "Newton".

When I try to filter the csv file I try to search values starting with "X'" and then convert the rest of the value from hex to string, but I have some trouble...

Any idea ?

Bastien
PS: sorry for the very long explanation !

The matching of the beginning X and removal of the ticks is left up to you...

my $mem='332c206176656e7565204e6577746f6e0d0a';
 $mem=~s/([a-fA-F0-9]{2})/chr(hex $1)/eg;
 print "$mem\n";
Comments
Great decode function, works for me !
Nice regex.

BTW "0d0a" is line feed/carriage return together - like a line terminator in Windows, so whoever input this information probably was using windows to input that line and hit "enter" at the end. Any line with those codes ("0d0a") usually comes from windows. You might want to remove those lines after decoding. It's up to you.

Thanks a lot Mike !

I'll try to implement that and publish the code I'll be using.

Regards,
Bastien

Hi Mike and the rest of Daniweb community,

I've been trying to implement that but I always fail to do it exactly how I want it.

My goal is to:
- first search into the CSV file for some value that starts with "X'"
- then take this value, drop the X' and the ending ' which is a the end of the field (always the 2 first character of the string and the last one)
- into this string drop the contained values "0d0a" (thanks for the Microsoft line terminator you found out Mike) and use the resulting string (a "0d0a" can be met several times, and in the middle of the string...)
- convert the value into a human-understandable format -with the " $mem=~s/([a-fA-F0-9]{2})/chr(hex $1)/eg; " Mike previously provided
- and then finaly print the resultant line

And I just fail to do all that stuff...

A little precision: the value X'abcdefgh' can be met 2 or 3 times in the same line of the CSV file, and of course into several lines...

Bastien

Mike,

Yes there's a header, but I can drop it very easily and put it back after processing the file with two separate one-liners perl command (I already got them).

The fields remains quoted because I have this issue with the column "DN" that has some comas in it... I can also drop the quotes (and has already done so with other fields), but then I encounter problems with the DN column that adds some comas and thus gave me a heart attack the first time I processed a csv file without quotes ;-)

Bastien

Edited 6 Years Ago by BastienP: n/a

#!/usr/bin/perl
use strict;
use warnings;

#For testing, let's assign a string constructed with q(), which puts single
#quotes around text that may contain quotes. You can read from your file instead.
my $rec = q("CN=BERRY Richard,OU=TestFinance,OU=HR,OU=CORP,OU=CR,DC=mycorp,DC=com","BERRY Richard",BERRY,FR,"Puylouvier",FRANCE,,56000,"+33 1 23 45 67 89","+ 33 1 23 45 67 80",Richard,FRANCE,IT,X'332c206176656e7565204e6577746f6e0d0a',Bob.Malone,Richard.BERRY@mycorp.net," -",);
#Remove X and ticks, call convert subroutine with data from between the ticks
#Substitute with string returned from convert subroutine.
$rec =~ s/X'([a-fA-F0-9]+)'/convert($1)/eg;
print $rec;

sub convert{
    #This subroutine contains Mike's regex to substitute hex digits with
    #the corresponding characters
    my $mem = $_[0];
    $mem =~ s/0d0a//; #Remove 0d0a from string of hex digits
    $mem=~s/([a-fA-F0-9]{2})/chr(hex $1)/eg;
    chomp $mem;
    return $mem;
}

Why not combine the

Text::CSV

Example David provided a few threads ago with the convert subroutine, based on David's code, based on my code? I think it should work fine.

Hi Mike, David,

Thanks to both of you, I'll try this (probably tonight, French time), and will keep you posted !

I'll put the final code I'm using of course.

Bastien

Hello guys,

I've finaly found some time to test/implement the hex decryption algorythm.

I've written this:

#! /usr/bin/perl -w

use strict;
use warnings;
use diagnostics;
#use Encode qw( _utf8_on );
binmode STDOUT, ":utf8";


#open(INFILE,"<:raw:encoding(utf-16):crlf:utf8", "1.csv");
open(INFILE, "<1.csv");
open(OUTFILE, ">out.csv");
while(<INFILE>)
{
	my($rec) = $_;
	chomp($rec);
	$rec =~ s/X'([a-fA-F0-9]+)'/convert($1)/eg;

 print OUTFILE "$rec\n";
}
close(INFILE);
close(OUTFILE);

sub convert{
    #This subroutine contains Mike's regex to substitute hex digits with
    #the corresponding characters
    my $mem = $_;
    #Remove 0d0a from string of hex digits
    $mem =~ s/0d0a//; 
    $mem=~s/([a-fA-F0-9]{2})/chr(hex $1)/eg;
    chomp $mem;
    return $mem;
}

The problem I have is that the decode is done on the entire line and isn't dropping the "X'".

I tried to implement it using David's algorythm and Text::CSV module but haven't found how to search for an value of array begining with X'.

Regards,
Bastien

Of course I've tried to add a line like this one:

$rec =~ s/X'//eg;

for the X' deletion. But I thought that there could be some line without the hex encryption and containing X', so I wanted to drop that only for the values beginning with X'. These values are of course hex encrypted values, I don't see why some other value would start with that.

Edited 6 Years Ago by BastienP: n/a

Weird...

If I add the line like that:

#! /usr/bin/perl -w

use strict;
use warnings;
use diagnostics;
#use Encode qw( _utf8_on );
binmode STDOUT, ":utf8";


#open(INFILE,"<:raw:encoding(utf-16):crlf:utf8", "1.csv");
open(INFILE, "<1.csv");
open(OUTFILE, ">out.csv");
while(<INFILE>)
{
	my($rec) = $_;
	chomp($rec);
	$rec =~ s/X\'//;
	$rec =~ s/X'([a-fA-F0-9]+)'/convert($1)/eg;
 print OUTFILE "$rec\n";
}
close(INFILE);
close(OUTFILE);

sub convert{
    #This subroutine contains Mike's regex to substitute hex digits with
    #the corresponding characters
    my $mem = $_;
    #Remove 0d0a from string of hex digits
    $mem =~ s/0d0a//; 
    $mem =~ s/X\'//;
    $mem=~s/([a-fA-F0-9]{2})/chr(hex $1)/eg;
    chomp $mem;
    return $mem;
}

Then it drops only some of them, in what appears to be a random order.

And if I modify the line by "$rec =~ s/^X\'//eg;" then it drops nothing.

However, I still don't find how to find the correct field in csv values (the ones starting with X'), and do a decrypt only on this field.

Anyway thanks for the help/advices !
Bastien

Let me try to rewrite your program from scratch... I can't do it right this minute, but maybe later tonight.

Something has changed in the convert subroutine in the last couple of scripts you posted.

##my $mem = $_; #This does not contain the argument passed to the sub
my $mem = $_[0]; #The first (and only) element in the array of args, @_

When I run the following script, which reads the attached file, I get the attached output file. (If, by chance, this version of the script works for you, then there's no need to send us your data files. Otherwise, please attach your input file and we'll have look at it.)

#!/usr/bin/perl
#TextCSVBastienP.pl
use strict;
use warnings;
use diagnostics;
#use Encode qw( _utf8_on );
#binmode STDOUT, ":utf8";
my $dir = '/home/david/Programming/Perl/data';
my $in = $dir . '/' . '1.csv.txt'; #Added .txt in order to attach files to post
my $out = $dir . '/' . 'out.csv.txt';#Daniweb requires .txt extension to attach 
#open(INFILE,"<:raw:encoding(utf-16):crlf:utf8", "1.csv");
open(INFILE, '<', $in);
open(OUTFILE, '>', $out);
while(<INFILE>)
{
	my $rec = $_;
	chomp($rec);
	$rec =~ s/X'([a-fA-F0-9]+)'/convert($1)/eg;
        print OUTFILE "$rec\n";
}
close(INFILE);
close(OUTFILE);

sub convert{
    #This subroutine contains Mike's regex to substitute hex digits with
    #the corresponding characters
    my $mem = $_[0]; #The first (and only) element in the array of args, @_
    $mem =~ s/0d0a//; #Remove 0d0a from string of hex digits
    $mem=~s/([a-fA-F0-9]{2})/chr(hex $1)/eg;
    chomp $mem;
    return $mem;
}
Comments
As usual, a very helpful post :-)
Attachments
"CN=BERRY Richard,OU=TestFinance,OU=HR,OU=CORP,OU=CR,DC=mycorp,DC=com","BERRY Richard",BERRY,FR,"Puylouvier",FRANCE,,56000,"+33 1 23 45 67 89","+ 33 1 23 45 67 80",Richard,FRANCE,IT,X'332c206176656e7565204e6577746f6e0d0a',Bob.Malone,Richard.BERRY@mycorp.net," -",
"CN=MCDONALD Ronald,OU=TestFinance,OU=HR,OU=CORP,OU=CR,DC=mycorp,DC=com","MCDONALD Ronald",MCDONALD,CA,"Puylouvier",CANADA,,56000,"+33 1 23 45 67 89","+ 33 1 23 45 67 80",Ronald,CANADA,IT,X'3434302c20426f756c65766172642052656ec3a92d4cc3a976657371756520572c204d6f6e747265616c2c2051432048325a3156372c2043410d0a',Bob.Malone,Ronald.MCDONALD@mycorp.net," -",
"CN=BERRY Richard,OU=TestFinance,OU=HR,OU=CORP,OU=CR,DC=mycorp,DC=com","BERRY Richard",BERRY,FR,"Puylouvier",FRANCE,,56000,"+33 1 23 45 67 89","+ 33 1 23 45 67 80",Richard,FRANCE,IT,3, avenue Newton,Bob.Malone,Richard.BERRY@mycorp.net," -",
"CN=MCDONALD Ronald,OU=TestFinance,OU=HR,OU=CORP,OU=CR,DC=mycorp,DC=com","MCDONALD Ronald",MCDONALD,CA,"Puylouvier",CANADA,,56000,"+33 1 23 45 67 89","+ 33 1 23 45 67 80",Ronald,CANADA,IT,440, Boulevard Ren-Lvesque W, Montreal, QC H2Z1V7, CA,Bob.Malone,Ronald.MCDONALD@mycorp.net," -",

This works for me....

use Text::CSV;

my @rows;
 my $csv = Text::CSV->new ( { binary => 1 } )  # should set binary attribute.
                 or die "Cannot use CSV: ".Text::CSV->error_diag ();
 
 open my $fh, "<:encoding(utf8)", "1.csv" or die "1.csv: $!";
 while ( my $row = $csv->getline( $fh ) ) {
     #$row->[2] =~ m/pattern/ or next; # 3rd field should match
     push @rows, $row;
 }
 $csv->eof or $csv->error_diag();
 close $fh;
my $x=0;
open (OUT, ">","out.csv");
for (@rows){
   $x++;
   if($x==1){
   	print OUT "@$_\n"; #prints out the header fields
   	next;
   }
   print OUT "\"";
   for (@$_){ #this runs through the fields one at a time
 		my $rec=$_;
 		if(/^X\'/){
 			$rec =~ s/^X\'//eg;
 			$rec =~ s/\'$//eg;	
 			$rec=~s/([a-fA-F0-9]{2})/chr(hex $1)/eg;
 	    $rec=~s/\n//eg;
 	    $rec=~s/\r//eg;
 		}
 		print OUT "$rec\"\,\"";
	 } 	
	 print OUT "\n";
}
close OUT;

Here is my input file

cn,name,last,lang,city,country,unknown,postal_code,phone1,phone2,first,country,unknown2,street,other_name,email,unknown3,unknown4
"CN=BERRY Richard,OU=TestFinance,OU=HR,OU=CORP,OU=CR,DC=mycorp,DC=com","BERRY Richard",BERRY,FR,"Puylouvier",FRANCE,,56000,"+33 1 23 45 67 89","+ 33 1 23 45 67 80",Richard,FRANCE,IT,X'332c206176656e7565204e6577746f6e0d0a',Bob.Malone,Richard.BERRY@mycorp.net," -",
"CN=BERRY Richard,OU=TestFinance,OU=HR,OU=CORP,OU=CR,DC=mycorp,DC=com","BERRY Richard",BERRY,FR,"Puylouvier",FRANCE,,56000,"+33 1 23 45 67 89","+ 33 1 23 45 67 80",Richard,FRANCE,IT,X'332c206176656e7565204e6577746f6e0d0a',Bob.Malone,Richard.BERRY@mycorp.net," -",
"CN=BERRY Richard,OU=TestFinance,OU=HR,OU=CORP,OU=CR,DC=mycorp,DC=com","BERRY Richard",BERRY,FR,"Puylouvier",FRANCE,,56000,"+33 1 23 45 67 89","+ 33 1 23 45 67 80",Richard,FRANCE,IT,X'332c206176656e7565204e6577746f6e0d0a',Bob.Malone,Richard.BERRY@mycorp.net," -",
"CN=BERRY Richard,OU=TestFinance,OU=HR,OU=CORP,OU=CR,DC=mycorp,DC=com","BERRY Richard",BERRY,FR,"Puylouvier",FRANCE,,56000,"+33 1 23 45 67 89","+ 33 1 23 45 67 80",Richard,FRANCE,IT,X'332c206176656e7565204e6577746f6e0d0a',Bob.Malone,Richard.BERRY@mycorp.net," -",

Hi there !

David, you code has done everything just like I needed !

I've just added 2 lines in order to replace some weirdly encoded accentuated letters (such as â or é), because the decode wasn't done correctly:

#!/usr/bin/perl
#TextCSVBastienP.pl
use strict;
use warnings;
use diagnostics;

#use Encode qw( _utf8_on );
#binmode STDOUT, ":utf8";

my $dir = 'C:\temp\test_hex_decode';
my $in = $dir . '/' . '1.csv.txt'; #Added .txt in order to attach files to post
my $out = $dir . '/' . 'out.csv.txt';#Daniweb requires .txt extension to attach 

#open(INFILE,"<:raw:encoding(utf-16):crlf:utf8", "1.csv");

open(INFILE, '<', $in);
open(OUTFILE, '>', $out);
while(<INFILE>)
{
	my $rec = $_;
	chomp($rec);
	$rec =~ s/X'([a-fA-F0-9]+)'/convert($1)/eg;
        print OUTFILE "$rec\n";
}

close(INFILE);
close(OUTFILE);

sub convert{
    #This subroutine contains Mike's regex to substitute hex digits with
    #the corresponding characters
    my $mem = $_[0]; #The first (and only) element in the array of args, @_
    $mem =~ s/0d0a//; #Remove 0d0a from string of hex digits
    $mem =~ s/c3a2/e2/; #change a circumflex strangely encoded by correctly decoded
    $mem =~ s/c3a9/e9/; #change e acute strangely encoded by correctly encoded
    $mem=~s/([a-fA-F0-9]{2})/chr(hex $1)/eg;
    chomp $mem;
    return $mem;
}

Your code works very well with more than 8000 lines :-)

Mike, as for yours well I've tested some things, and on a 8000+ lines files it takes more than 1 hour and a half to run, but just dunno why... I'm taking a look at this, it seems that the encoding of 1.csv file is an issue. It should be a UTF8 encoded file but it seems to be some different kind of encoding, not exactly/really UTF8.

I'll take a look at that and will let you know, just for info because that's what whe're still testing things and raising some questions, perpetually improving !

Real huge thanks to both of you guys, you've helped me so much !

Regards,
Bastien

An hour and 1/2 huh? Well, admittedly, my code was sloppy. I want to do some profiling to see where the time is spent. I have never used the CSV module before and I put in some unnecessary code. Glad you got it solved though!

An hour and 1/2 huh? Well, admittedly, my code was sloppy. I want to do some profiling to see where the time is spent. I have never used the CSV module before and I put in some unnecessary code. Glad you got it solved though!

Mike's script works for me too. Now Bastien has two solutions, so that's great. I like the way Text::CSV parses fields with multiple embedded commas; and I use it for that, but it has more options and features that I haven't figured out how to use properly. There also is a separate Text::CSV::Encoded module which I have not installed or tried. Hopefully you won't need it. (In fact, aren't most or all of the French accented characters represented in ascii?)

My preference would be to convert all the hex in each record first, perhaps create a temporary 'converted' file that could be input to a second script that would use Text::CSV to parse the records into fields, if necessary. But breaking processing up into multiple steps is just my preference, probably ingrained from years of writing batch programs on old mainframes.

Edited 6 Years Ago by d5e5: n/a

Hi There

Mike I know where the problem was... A nasty error when I copied the code !

I've found it, corrected it and it works well. And it takes the same time as David's code.

Sorry for the suspiscion of a faulty code, my mistake ! :o)

David yes the french accentuated characters are all represented in ascii, but I had to work also with norvegian and sweeden characters and it seems that some of them are missing.

Thanks once again guys.

Regards,
Bastien

Glad to be vindicated - haha. I'd hate to have some code I write take 1 hour plus to run! I do have a bit of perl code in my regular job that takes 10 hours to run, but it creates over 150 200 page powerpoint documents from a database. I just kick it off the night before and let it go.

This question has already been answered. Start a new discussion instead.