
Hello, I want to parse a file that contains the following log entries:

1183245991.961 0.079 137.157.56.34 200 1277 GET http://linux.pacific.net.au/linux/packman/suse/10.2/repodata/repomd.xml "text/xml"
1183327698.250 2.568 137.157.56.212 200 57891 GET http://csc3-2004-crl.verisign.com/CSC3-2004.crl "application/pkix-crl"
1183328737.107 0.570 137.157.56.223 301 777 GET http://www.starnet.com/expiredialog/demo.php?product_name=xwin32&time_remain=1800&version=8.0.2216&locale=en_AU&w32iso639ulang=en&uuid={dfe167ce-f51c-4fd4-af80-25f31246f6bc} "text/html"
1183328737.908 0.781 137.157.56.139 200 5696 GET http://www.starnet.com/expiredialog/demo.php/xwin32/1800?version=8.0.2216&locale=en_AU&w32iso639ulang=en&uuid=%7bdfe167ce-f51c-4fd4-af80-25f31246f6bc%7d "text/html"
1183328738.777 0.759 137.157.56.91 200 3726 GET http://www.starnet.com/css/starnet.css "text/css"
1183679515.608 0.059 137.157.56.51 200 520 GET http://fdimages.fairfax.com.au/crtvs/bulletpoint.gif "image/gif"
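Each line appears to be a set of space-separated fields, which is why $entry[6] ends up holding the URL. Below is a minimal Perl sketch of that split. It assumes a field order inferred from the sample lines above: timestamp, elapsed time, client IP, status, bytes, method, URL, and quoted content type. parse_entry is a hypothetical helper name.

```perl
use strict;
use warnings;

# Split one access-log line into whitespace-separated fields.
# Assumed field order (inferred from the sample lines, not from a spec):
# 0 timestamp, 1 elapsed seconds, 2 client IP, 3 HTTP status,
# 4 bytes, 5 method, 6 URL, 7 quoted content type.
sub parse_entry {
    my ($line) = @_;
    # ' ' is Perl's awk-style split: it ignores leading whitespace
    # and splits on runs of whitespace.
    return split ' ', $line;
}

my $line = '1183328738.777 0.759 137.157.56.91 200 3726 GET '
         . 'http://www.starnet.com/css/starnet.css "text/css"';
my @entry = parse_entry($line);
print "$entry[6]\n";    # the URL field
```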

Basically, I want to extract the FQDN (fully qualified domain name) from the URL in each entry. I only want the www. hosts, but I am also getting other http:// hosts, such as image servers. An example of correct output would be:

http://www.carsales.com http://www.sensis.com.cn http://www.smh.com.au

but I am also getting hosts like images.google.com.
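One way to keep only the www hosts is to pull the host out of each URL first and then test that host for a leading "www.", instead of splitting the whole URL. Here is a minimal Perl sketch under those assumptions. url_host and is_www_host are hypothetical helper names. The regex only handles plain http/https URLs and does not handle a user:password@ part.

```perl
use strict;
use warnings;

# Return the host part of an http/https URL (lowercased), or undef if
# the URL does not match. The host is everything after "scheme://" up
# to the first "/", ":", "?" or "#".
sub url_host {
    my ($url) = @_;
    return $url =~ m{^https?://([^/:?#]+)}i ? lc $1 : undef;
}

# True only for hosts whose first label is "www" (e.g. www.starnet.com),
# so hosts such as fdimages.fairfax.com.au are rejected.
sub is_www_host {
    my ($host) = @_;
    return defined $host && $host =~ /^www\./;
}

my @urls = (
    'http://www.starnet.com/css/starnet.css',
    'http://fdimages.fairfax.com.au/crtvs/bulletpoint.gif',
    'http://csc3-2004-crl.verisign.com/CSC3-2004.crl',
);

for my $url (@urls) {
    my $host = url_host($url);
    print "http://$host\n" if is_www_host($host);
}
```

With the three sample URLs it prints only http://www.starnet.com. Testing the host rather than the whole URL matters because a path or query string could also contain the text "www".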

Below is my code:

@FQDN = split(/\./, $entry[6]);    # Grab the domain name (e.g. aol.com, attbi.com)
$length = 1;
$temp = $FQDN[$length] . "." . $FQDN[$length + 1];
$Domain{$temp}++;
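To see what these four lines actually produce, here is a sketch that runs the same split on two of the sample URLs. original_domain is a hypothetical wrapper around the posted logic. Nothing in the snippet checks for a www. prefix, so every URL adds a key to %Domain. Because the split runs over the whole URL rather than just the host, the second piece can also carry path text with it.

```perl
use strict;
use warnings;

# Reproduce the posted logic: split the whole URL on "." and join
# elements 1 and 2. Element 0 keeps "http://www" (or "http://fdimages"),
# and element 2 can include part of the path.
sub original_domain {
    my ($url) = @_;
    my @FQDN   = split /\./, $url;
    my $length = 1;
    return $FQDN[$length] . "." . $FQDN[$length + 1];
}

for my $url ('http://www.starnet.com/css/starnet.css',
             'http://fdimages.fairfax.com.au/crtvs/bulletpoint.gif') {
    printf "%-55s -> %s\n", $url, original_domain($url);
}
```

It prints starnet.com/css/starnet for the first URL and fairfax.com for the second, so the image host is counted just like the www one.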

Please advise.

Last Post by JeoSaurus