I'm from Chennai, India, and my university is Anna University. Our results are published on the university's website and on a couple of others. Our college staff open the URL for each student, copy the marks, and then do their analysis by hand. I'm working on a project to help them do this automatically.

I've written a script in Ruby (my project is in Rails) that scrapes the data from the website by iterating through the URL for each student. The catch is that Anna University's servers are very slow, and on the first day several hundred thousand students access them, so it is nearly impossible to get the data quickly.

I think this might cause request timeouts, so some students' results may not be scraped.

There is a page for old results. I would really appreciate it if someone could give me an idea for getting the data as fast as possible. The results are served by a Perl CGI script on their website.

Some result pages:

http://result.annauniv.edu/cgi-bin/result/result11gr.pl 

I'll add a few other result pages later.

This is the Ruby script I wrote. I used the nokogiri gem to parse the HTML using CSS selectors.

require 'rubygems'
require 'nokogiri'
require 'open-uri'

reg_nos = [23009104071, 23009104072, 23009104073]

reg_nos.each do |reg_no|
    url = "http://result.annauniv.edu/cgi-bin/result/result11gr.pl?regno=#{reg_no}"
    # URI.open is the open-uri entry point on current Rubies (plain open(url) fails on Ruby 3)
    doc = Nokogiri::HTML(URI.open(url))
    name = doc.css("th:nth-child(4) font").text
    # Register number as printed on the page, kept separate from the loop variable
    page_reg_no = doc.css("th:nth-child(2) font").text
    cells = doc.css("td center:nth-child(1)")

    # Each subject spans four cells: subject code at offset 0, grade at offset 2
    subs = cells.length / 4
    puts "______________________________________"
    puts "Name: #{name}"
    puts "Reg No: #{page_reg_no}"
    puts "Grades:"
    subs.times do |i|
        puts "Subject Code: #{cells[i * 4].text}"
        puts "Grade: #{cells[i * 4 + 2].text}"
        puts "--------"
    end
end

Thanks in advance :)

Last Post by pritaeas

Wouldn't it be easier to request some other method of accessing this data?

It would be rather expensive... :) Can you give me any ideas? :)

If the servers are really that slow, there is not much you can do. Be sure to implement a retry mechanism, and wait it out.
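
A retry mechanism of that kind could look roughly like the sketch below. It is only an illustration, not tested against the Anna University servers: the helper name with_retries, the RETRYABLE list, and the attempt and delay values are all assumptions made for the example. It retries a block on timeout and connection errors, waiting twice as long after each failure.

```ruby
require 'timeout'
require 'socket'

# Errors treated as transient here (an assumption; adjust to what the server really produces).
# Net::OpenTimeout and Net::ReadTimeout are subclasses of Timeout::Error, so they are covered.
RETRYABLE = [Timeout::Error, Errno::ECONNRESET, Errno::ECONNREFUSED, SocketError].freeze

# Hypothetical helper: runs the block and retries it on a transient error,
# waiting base_delay, 2 * base_delay, 4 * base_delay, ... seconds between attempts.
# The sleeper argument exists only so the waiting can be replaced in tests.
def with_retries(max_attempts: 5, base_delay: 2, sleeper: ->(seconds) { sleep(seconds) })
  attempt = 0
  begin
    attempt += 1
    yield attempt
  rescue *RETRYABLE
    # Give up and re-raise the last error once the attempts are used up.
    raise if attempt >= max_attempts
    sleeper.call(base_delay * (2**(attempt - 1)))
    retry
  end
end

# Possible use inside the scraping loop (not executed here):
#   doc = with_retries { Nokogiri::HTML(URI.open(url, read_timeout: 30)) }
```

Register numbers that still fail after the last attempt raise the original error, so the caller can rescue it, record the number, and try that student again later instead of silently skipping the result.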
