I'm from Chennai, India, and my university is Anna University. Our results are published on the university's website and on a couple of others. Our college staff open the URL for each student, copy the marks, and then do their analysis by hand. I'm working on a project to help them do it automatically.

I've written a Ruby script (my project is in Rails) that scrapes the data from the website by iterating through the URL for each student. The catch is that Anna University's servers are very slow, and on the first day several hundred thousand students access them, so it's nearly impossible to get the data quickly.

I think this might cause request timeouts and some students' results may not be scraped.

There is a page for old results. I would really appreciate it if someone could give me an idea for getting the data as fast as possible. The results are served by a Perl CGI script on their website.

Some result pages:

http://result.annauniv.edu/cgi-bin/result/result11gr.pl

I'll add a few other result pages later.

This is the Ruby script I wrote. I used the Nokogiri gem to parse the HTML with CSS selectors.

require 'nokogiri'
require 'open-uri'

reg_nos = [23009104071, 23009104072, 23009104073]

reg_nos.each do |reg_no|
    url = "http://result.annauniv.edu/cgi-bin/result/result11gr.pl?regno=#{reg_no}"
    # URI.open is needed on Ruby 3.0+, where Kernel#open no longer accepts URLs
    doc = Nokogiri::HTML(URI.open(url))

    # Name and register number as printed in the page header
    name = doc.css("th:nth-child(4) font").text
    page_reg_no = doc.css("th:nth-child(2) font").text

    # Each subject takes up four consecutive cells
    cells = doc.css("td center:nth-child(1)")
    subs = cells.length / 4

    puts "______________________________________"
    puts "Name: #{name}"
    puts "Reg No: #{page_reg_no}"
    puts "Grades:"
    subs.times do |i|
        puts "Subject Code: #{cells[i * 4].text}"
        puts "Grade: #{cells[i * 4 + 2].text}"
        puts "--------"
    end
end

Thanks in advance :)

All 4 Replies

Wouldn't it be easier to request some other method of accessing this data?

It would be rather expensive... :) Can you give me any ideas? :)

Please help

If the servers are really that slow, there is not much you can do. Be sure to implement a retry mechanism and wait it out.
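
One way the retry suggestion could look in Ruby is sketched below. This is only a rough illustration, not tested against the real server: with_retries and fetch_result_page are hypothetical helper names, and the attempt count, delays, and timeout values are guesses that would need tuning. It waits longer after each failure (exponential backoff) so the script does not hammer a server that is already overloaded.

```ruby
require 'open-uri'
require 'timeout'

# Hypothetical helper (not part of the original script): runs the block and
# retries it with exponential backoff when a likely transient error occurs.
# `sleeper` is injectable so the waiting can be stubbed out when testing.
def with_retries(max_attempts: 5, base_delay: 2.0, sleeper: ->(seconds) { sleep(seconds) })
  attempt = 0
  begin
    attempt += 1
    yield attempt
  rescue Timeout::Error, SystemCallError, SocketError, OpenURI::HTTPError
    # Give up and re-raise the last error once the attempts are used up
    raise if attempt >= max_attempts
    # Wait base_delay, then 2x, 4x, ... before the next attempt
    sleeper.call(base_delay * (2**(attempt - 1)))
    retry
  end
end

# Assumed usage: fetch one student's result page, retrying on failure.
# The timeout values here are illustrative guesses.
def fetch_result_page(reg_no)
  url = "http://result.annauniv.edu/cgi-bin/result/result11gr.pl?regno=#{reg_no}"
  with_retries do
    URI.open(url, open_timeout: 10, read_timeout: 30).read
  end
end
```

The string returned by fetch_result_page(reg_no) can be passed to Nokogiri::HTML in place of the direct open call. If a register number still fails after the last attempt, rescuing the error in the loop and saving that number to a list makes it possible to rerun just the missing students later.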
