Friday, August 19, 2011

Faster XML Parser in Ruby on Rails

Hello Guys,

Last week I was trying to parse XML with different XML parsers such as Hpricot. With a large amount of data, however, a segmentation fault occurred, so I moved to a better and faster XML parser called libxml-ruby.

Parsing a large XML file with libxml-ruby takes only a few simple steps. First, install libxml-ruby:

  gem install libxml-ruby

Let's say we have the following sample XML:

sample.xml
<users>
  <user>
    <name>Priyanka Pathak</name>
    <mark subject="biology"> 80 </mark>
  </user>
  <user>
    <name>Rahul Pathak</name>
    <mark subject="biology"> 85 </mark>
  </user>
</users>

Now create a method to parse the XML:

require 'rubygems'
require 'libxml'
require 'benchmark'

def parse_xml
  # bmbm runs the block twice: once as a rehearsal, then the measured run
  Benchmark.bmbm do |r|
    r.report("Process XML") {
      # Parse the whole file into an in-memory document
      parser = LibXML::XML::Parser.file('sample.xml', :encoding => LibXML::XML::Encoding::UTF_8)
      doc, collect_data = parser.parse, []
      # Collect the name and mark of every <user> element
      doc.find('//users/user').each do |e|
        data = {}
        data['name'] = e.find('name').first.content
        mark = e.find('mark').first
        data['mark'] = {:subject => mark.attributes.first.value, :value => mark.content}
        collect_data << data
      end

      puts "collect data: " + collect_data.inspect
    }
  end
end

parse_xml

The benchmark output shows the time spent parsing the XML. In my experience, libxml-ruby is faster than the other XML parsers I tried.

For more information about libxml-ruby, see http://libxml.rubyforge.org/rdoc/
I hope this post helps you.
