Hello guys,
Last week I was trying to parse XML with different parsers such as Hpricot, but with a large amount of data a segmentation fault occurred. So I moved to a better and faster XML parser, libxml-ruby.
Parsing large XML with libxml-ruby takes a few simple steps. First of all, install libxml-ruby:
gem install libxml-ruby
Let's say we have this sample XML:
sample.xml
xml = %{
<users>
  <user>
    <name>Priyanka Pathak</name>
    <mark subject="biology"> 80 </mark>
  </user>
  <user>
    <name>Rahul Pathak</name>
    <mark subject="biology"> 85 </mark>
  </user>
</users>
}
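The sample is shown as a Ruby string, while the parsing method in this post reads a file named sample.xml. Here is a minimal sketch of one way to write that string to disk first. The helper name `write_sample` is made up for this example and is not part of libxml-ruby.

```ruby
# Hypothetical helper (not part of libxml-ruby): saves an XML string to a file
# so that a file-based parser can read it later.
def write_sample(xml_string, path = 'sample.xml')
  # Strip the blank lines that the %{ ... } literal adds around the XML.
  File.write(path, xml_string.strip)
  path
end
```

With the `xml` string defined as above, calling `write_sample(xml)` should leave a sample.xml file in the current directory.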
Now create a method to parse the XML:
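require 'rubygems'
require 'libxml'
require 'benchmark'

# Parses sample.xml and collects each user's name and mark.
# Benchmark.bmbm runs the block twice (rehearsal, then the timed run),
# so the collected data is printed twice.
def parse_xml
  Benchmark.bmbm do |r|
    r.report("Process XML") {
      # Use the fully qualified constant; a bare XML::Encoding only works after `include LibXML`.
      parser = LibXML::XML::Parser.file('sample.xml', :encoding => LibXML::XML::Encoding::UTF_8)
      doc, collect_data = parser.parse, []
      doc.find('//users/user').each do |e|
        data = {}
        data['name'] = e.find('name').first.content
        mark = e.find('mark').first
        # The first (and only) attribute of <mark> is the subject.
        data['mark'] = {:subject => mark.attributes.first.value, :value => mark.content}
        collect_data << data
      end
      puts "collect data: " + collect_data.inspect
    }
  end
end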
Benchmark shows the time taken to parse the XML, and in my experience libxml-ruby is faster than the other XML parsers I tried.
For more information about libxml, see http://libxml.rubyforge.org/rdoc/
I hope this post helps you.