0

Hi everyone,

I’m currently working on my Master Thesis which in few words concentrates about NoSQL databases and performance and tuning tests of particular MongoDB.

I’m trying to create some test dummy data to put in so I can run some initially tests to create a baseline for comparison. I’ve found a MovieSet which should contain 10 million records and a parser to get the data from the movie files into MongoDB.

My only problem is that the script is in Ruby which I’ve absolutely no knowledge about what so ever. Because I’m working under a deadline I rather not use too much time studying the Ruby programming language. I’m though fairly familiar to PHP, so I was wonder if there was any Ruby & PHP expert that very graciously would convert the 70+ line Ruby script to PHP?

Thank you

Sincere
- Mestika

The Ruby script, in its orginal form (From Wrox.com Code Library):

require 'rubygems' #can skip this line in Ruby 1.9
require 'mongo'

field_map = {
    "users" => %w(_id gender age occupation zip_code),
    "movies" => %w(_id title genres),
    "ratings" => %w(user_id movie_id rating timestamp)
}

db = Mongo::Connection.new.db("mydb")
collection_map = {
    "users" => db.collection("users"),
    "movies" => db.collection("movies"),
    "ratings" => db.collection("ratings")
}

unless ARGV.length == 1
    puts "Usage: movielens_dataloader data_filename"
    exit(0)
end

class Array
  def to_h(key_definition)
    result_hash = Hash.new()
    
    counter = 0
    key_definition.each do |definition|
      if not self[counter] == nil then
          if self[counter].is_a? Array or self[counter].is_a? Integer then
              result_hash[definition] = self[counter]
          else
              result_hash[definition] = self[counter].strip
          end
      else
        # Insert the key definition with a empty value.
        # Because we probably still want the hash to contain the key.
        result_hash[definition] = ""
      end
      # For some reason counter.next didn't work here....
      counter = counter + 1
    end
    
    return result_hash
  end
end

if File.exists?(ARGV[0])
    file = File.open(ARGV[0], 'r')
    data_set = ARGV[0].chomp.split(".")[0]
    file.each { |line|
        field_names = field_map[data_set] 
        field_values = line.split("::").map { |item|
            if item.to_i.to_s == item
                item = item.to_i
            else
                item
            end
        }
        puts "field_values: #{field_values}"
        #last_field_value = line.split("::").last
        last_field_value = field_values.last
        puts "last_field_value: #{last_field_value}"
        if last_field_value.split("|").length > 1
           field_values.pop 
           field_values.push(last_field_value.split().join('\n').split("|"))
        end
        field_values_doc = field_values.to_h(field_names)
        collection_map[data_set].insert(field_values_doc)
    }
    puts "inserted #{collection_map[data_set].count()} records into the #{collection_map[data_set].to_s} collection"
end
2
Contributors
1
Reply
2
Views
5 Years
Discussion Span
Last Post by curlissue657
This topic has been dead for over six months. Start a new discussion instead.
Have something to contribute to this discussion? Please be thoughtful, detailed and courteous, and be sure to adhere to our posting rules.