The British Library to preserve terabytes of information on the web

The British Library has initiated a project that will “preserve and analyse terabytes of information on the web before it is lost forever”.
 
The project uses new analytics software called IBM BigSheets, which helps users extract, annotate and visually analyse vast amounts of web information through a web browser.
 
IBM BigSheets is a technology prototype. Users can explore the data and generate new insights through a web application. The software then publishes Web 2.0 standard data feeds that British Library patrons can search. The prototype is helping the British Library archive and preserve web pages, speeding up the archival process before web data is lost forever.
 
The companies highlighted that recent research puts the average life expectancy of a website at just 44 to 75 days. In addition, 10% of web pages on the UK domain are lost every six months.
 
Helen Hockx-Yu, web archiving programme manager at the British Library, estimates that the UK web space will contain over 11 million websites by 2011.
 
“To take on the enormous challenge of capturing this content, we need a system capable of taking the UK Web Archive to web-scale,” said Hockx-Yu. “IBM can help us analyse the web archive containing millions of pages and unlock embedded knowledge which otherwise is difficult to discover using traditional search methods.”
 
Web Archiving Programme
 
The programme collects, makes accessible and preserves web resources of scholarly and cultural importance from the UK domain. The British Library’s objectives are:
 
•         to build a comprehensive web archive as part of the British Library’s digital collection,
•         to preserve the archive so that it remains accessible in the future,
•         to put in place people, processes and systems so that the Library can fulfil its obligations with respect to legal deposit of web resources.
 

