Friday, September 26, 2008

AI Inquisitors

While OCLC’s WorldCat Resource Sharing™ brings together and centralizes vast amounts of information and holdings, it is still a far cry from creating the world’s largest possible library, one consisting of all libraries fully networked together. This ideal World Wide Library, or WWL, would theoretically resemble the fictitious Library of Babel described by Jorge Luis Borges and would therefore encounter the same sort of problems Mr. Borges lays out. In “The Library of Babel,” Mr. Borges describes what amounts to a seemingly infinite library that contains every possible piece of information, including every possible combination of the alphabets and characters known to humans. While this may seem like an ideal, Mr. Borges warns us that such a library would have inherent problems that render its holdings useless. A WWL that recorded all the library holdings in the world would be very similar to the Library of Babel and would be in danger of running into the same problems.
One of the major problems inherent in the Library of Babel is that it stores so much information that finding anything becomes nearly impossible.
"There are official searchers, inquisitors. I have seen them in the performance of their function: they always arrive extremely tired from their journeys; they speak of a broken stairway which almost killed them; they talk with the librarian of galleries and stairs; sometimes they pick up the nearest volume and leaf through it, looking for infamous words. Obviously, no one expects to discover anything."
Obviously these human inquisitors, or librarian searchers, are far too slow to actually find anything within the vast holdings of this library, and a WWL would contain so many holdings that it would pose a similar problem. While the search tools in WorldCat’s database would provide some computer help, it seems like it would still be difficult to sort through all the information available. What would be needed is the next great progression in computers and technology: Artificial Intelligence (AI).
Conventional computers seem powerful because they can perform billions of simple calculations in sequence within a second, which makes them extremely fast. But speed is not everything. Humans are much slower at these simple calculations, but billions of neural pathways can be used at the same time, giving us the ability to have and follow more complicated thoughts. Artificial intelligence, the ability to simulate human thought and emotion in computers, is still in its infancy but has the potential to open up a wide array of applications. By combining the thought patterns of humans (whether through neural networks or some other AI methodology) with the speed of computers, one could make a very useful search tool. The result would be an AI inquisitor: an intelligent agent that could go out into the vast holdings and rapidly search for and find information that it knew was relevant. A search tool like this would make a vast library system usable and would therefore enable us to develop a large WWL that resembled the Library of Babel.
Unfortunately it seems like that sort of AI inquisitor is a long way from being developed, so as yet, the problem of actually being able to find anything in the Library of Babel and the WWL still persists and hinders the actual development of such a library system, whether or not WorldCat Resource Sharing™ could really be developing such a network.

Tuesday, September 23, 2008

WWL: World Wide Library

With the introduction of computers into libraries, librarians and patrons were able to gather greater amounts of knowledge and information at much greater speeds. Now, as computers become increasingly powerful and databases and search tools become more versatile, it makes sense that the libraries in the world are becoming closer.
OCLC promises to take the search for information yet another step forward with the product WorldCat Resource Sharing™. With this new product (which I am sure comes with a nice shiny price-tag), OCLC is in effect attempting to make the world’s largest library by linking many independent libraries together through this central WorldCat database. According to their brochure for WorldCat Resource Sharing™, “the more than 9,100-library system has at its core the comprehensive WorldCat® database. With more than one billion holdings—both physical and digital—WorldCat crosses all manner of subjects, languages, cultures and uniqueness.” ( brochures/211370usb_resourcesharing.pdf) One billion holdings…. Now that’s a library! By combining the holdings of those 9,100 libraries into one database, OCLC and WorldCat are making a strong effort to build the world’s largest catalogue or database, giving the patrons of these libraries unprecedented access to vast amounts of information. OCLC’s product brochure promises member libraries that “fulfillment grows dramatically when your customers can place requests electronically” (http:// and receive their requested material from wherever it resides within the various holdings. WorldCat Resource Sharing™ gives libraries and patrons the ability to share and find a great deal of information from all over the world.
Judging from this advancement, the next and perhaps ideal progression is the development of the ultimate library by connecting all the libraries in the world and recording all of their holdings in one central database. Since this would encompass libraries around the world, its theoretical structure would resemble that of the World Wide Web. Instead of the Library of Babel, one could call this the WWL (World Wide Library), and it would obviously have a much more powerful search tool than Jorge Luis Borges’ perpetually lost inquisitors. Even though it would be possible to search for information and materials in this system, one major problem would remain. While digital information would be easy to find and access, a physical source such as a book might well need to be shipped from across the world, delaying access to that information. Problems aside, this would represent a giant advance in information science and retrieval.
While WorldCat Resource Sharing™ does not bring us to this ultimate library, it does go far in networking a large number of formerly independent libraries and building up an extensive database of holdings. This product represents a large advance towards the ideal of an information network consisting of all the world’s libraries and known as the WWL.

Friday, September 19, 2008

Digital Archives as Profit

OK, as I have written before, I am concerned about the preservation of digital information in light of the rather quick deterioration of such files and the rapid introduction of new technologies and software. Apparently, I was not thinking like a businessman, as certain people see this as the perfect opportunity to make some money. There are two major organizations that purport to offer long-term storage and preservation of digital information and collections: OCLC’s Digital Archive™ and Amazon’s S3 (Simple Storage Service). These two services seem to offer solutions to some of the problems that I outlined in my post entitled “The Question of Digital Durability.” These services do not come free, however: Amazon’s S3 charges $1.80 per gigabyte per year, while OCLC’s Digital Archive™ costs about $7.50 per gigabyte per year. Digital Archive™ costs over four times as much as storing information in S3! As Peter E. Murray describes in his blog Disruptive Library Technology Jester ( ), there are important reasons for that price difference.
As Murray points out, while Amazon’s S3 does provide an accessible place to store digital information, this provider has some serious drawbacks. S3 does make certain guarantees, but they seem to be limited to the performance of the service, which relates more to the accessibility of the information you are storing with them. There seems to be no guarantee on the long-term preservation of that data. In fact, they explicitly say that the customer is responsible for the security and backups, which are a large part of digital preservation, of the information S3 is storing. In section 7.2 of their Customer Agreement, they state, “you acknowledge that you bear sole responsibility for adequate security, protection and backup of Your Content.” In light of this, when doing business with S3, you are mainly paying for information storage and not preservation.
According to Murray, OCLC’s Digital Archive™ goes a step further than Amazon’s S3 and specializes more in the preservation of digital masters. In the “Our Commitment” statement, OCLC describes what they aim to do: “OCLC is actively developing processes for full preservation of digital assets to ensure complete renderability, regardless of technology changes. This preservation system will likely involve a combination of migration and emulation.” Not only do they protect the information more than S3 does, they also aim to ensure its accessibility “regardless of technology changes.” This goes much further in attempting to preserve the digital information they are storing for the customer. However, it also translates into a cost that is over four times what S3 charges.
While the costs listed above may not seem exorbitant, when you start dealing with storage in the TB range, as Murray plans to do for his institution, you can quickly run up costs into the hundreds of thousands of dollars per year for the storage of your digital information. This could quickly put these services, especially OCLC’s Digital Archive™ (I love how they trademarked that term already), out of reach for many libraries, which still leaves the problem of storing digital information. However, it is comforting to see businesses attempting to help remedy this particular problem. Perhaps one day soon it will be affordable to store data with companies that guarantee the preservation of the digital information in their care (I guess as long as that company stays in business).
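To get a feel for how quickly those per-gigabyte prices add up, here is a rough sketch of the arithmetic in Python. The two prices are the ones quoted above; the collection sizes and the choice of 1,024 GB per TB are my own assumptions for illustration, and neither provider's actual billing rules (tiers, transfer fees, and so on) are modeled.

```python
# Per-gigabyte annual prices quoted in the post (2008 figures).
S3_PER_GB_YEAR = 1.80
OCLC_PER_GB_YEAR = 7.50

GB_PER_TB = 1024  # assumption: binary terabytes; use 1000 for decimal


def annual_cost(terabytes, price_per_gb_year):
    """Return the yearly storage cost in dollars for a collection size in TB."""
    return terabytes * GB_PER_TB * price_per_gb_year


def cost_table(sizes_tb):
    """Return (size, S3 cost, Digital Archive cost) rows for each size."""
    return [
        (tb, annual_cost(tb, S3_PER_GB_YEAR), annual_cost(tb, OCLC_PER_GB_YEAR))
        for tb in sizes_tb
    ]


if __name__ == "__main__":
    # Hypothetical collection sizes, chosen only to show the scale.
    for tb, s3, oclc in cost_table([1, 5, 10, 20]):
        print(f"{tb:>3} TB  S3: ${s3:>10,.2f}  Digital Archive: ${oclc:>11,.2f}")
```

Under these assumptions a single terabyte in Digital Archive™ already runs several thousand dollars a year, and a collection somewhere past a dozen terabytes crosses the hundred-thousand-dollar mark.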

Simple Tech

I recently heard from an acquaintance about an issue at a certain library (which shall remain nameless here) that sounded rather archaic for this day and age. It seems that there was a problem with lost and misplaced books at this library, and the management decided to try to track it. They decided that all their re-shelvers would carry a chart with them, and as they performed their duties, they would record the call numbers of the books, their names, the date, and the time. This method would effectively trace the last person to shelve a book, so if a book ever went missing or turned up in the wrong location, management would know who re-shelved it last. While there are other ways to lose a book in the library, if it turned out one person had handled a disproportionately high percentage of the misplaced books, management would be able to determine that at least part of the problem rested with certain employees. They could then use that information to identify who needs to be retrained or watched more closely. Incorporating this form of accountability into the re-shelving process could help management reduce, at least in part, the frequency with which books are misplaced or incorrectly re-shelved.
As it turns out, the re-shelvers at this institution had a problem with this new policy. They didn’t mind the newly introduced accountability, though. They felt more annoyance towards the fact that this new policy made them terribly inefficient at their jobs. It now took them at least twice as much time to re-shelve a given number of books, because each time they re-shelved a book, they had to manually record the call-number, their name, the date, and the time. While the information derived from this new policy could be put to good use, the policy itself greatly increased the amount of time to perform a certain job, which made the re-shelving employees unhappy.
One of the stated benefits of technology is that it makes our lives easier and makes us more efficient at what we do. It seems to me like a very simple task to use some rather simple (for this day and age) technology to improve this particular policy. All you would need is a few handheld scanners that the re-shelvers could carry with them as they re-shelved books. The re-shelvers would be given log-in names and passwords so they could log on to the scanners, which would be wirelessly connected to a server and central database, as a particular user. Each time they re-shelved a book, all they would have to do is scan the already existing bar code. This one quick scan would record the call number of the book, the re-shelver’s log-in name/real name, the date, and the time, and the scanner could then send the information back to be stored in the central “Re-shelving” database. This is a fairly simple use of a database and would not take long to set up. This technology would record all the necessary information without increasing the amount of time it took to re-shelve books. It allows the library to increase accountability while maintaining the efficiency of its employees. While the scanners might represent a moderate initial expense, the long-term savings on labor costs would quickly repay that investment. These scanners could also be used to quickly record additional information, which would only increase their value to the library. However, if the re-shelvers in the library are interns and don’t get paid, it might be difficult to argue that buying the equipment and technology is going to save the library money.
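To show how little is involved, here is a minimal sketch of such a “Re-shelving” database in Python, using the standard sqlite3 module. Everything in it is hypothetical: the table layout, the function names, and the idea that the scanner hands over a bar code and a log-in name are my own assumptions, not a description of any real scanner product or library system. A real setup would also look up the call number from the bar code in the catalog.

```python
import sqlite3
from datetime import datetime, timezone


def open_log(path=":memory:"):
    """Create (or open) the hypothetical re-shelving database."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS reshelving (
               barcode    TEXT NOT NULL,
               shelver    TEXT NOT NULL,
               shelved_at TEXT NOT NULL
           )"""
    )
    return conn


def record_scan(conn, barcode, shelver, when=None):
    """Store one scan: which book, who shelved it, and when (UTC, ISO 8601)."""
    when = when or datetime.now(timezone.utc)
    conn.execute(
        "INSERT INTO reshelving VALUES (?, ?, ?)",
        (barcode, shelver, when.isoformat(timespec="seconds")),
    )
    conn.commit()


def last_shelver(conn, barcode):
    """Return the log-in of whoever most recently shelved this book, or None."""
    row = conn.execute(
        "SELECT shelver FROM reshelving WHERE barcode = ? "
        "ORDER BY shelved_at DESC, rowid DESC LIMIT 1",
        (barcode,),
    ).fetchone()
    return row[0] if row else None


def misplaced_share(conn, misplaced_barcodes):
    """Map each shelver to the fraction of misplaced books they last handled."""
    counts = {}
    for barcode in misplaced_barcodes:
        shelver = last_shelver(conn, barcode)
        if shelver is not None:
            counts[shelver] = counts.get(shelver, 0) + 1
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()} if total else {}
```

With something like this in place, one scan replaces the handwritten chart entry, and `misplaced_share` answers the question management cared about in the first place: whether one person accounts for an outsized share of the misplaced books.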

Friday, September 12, 2008

The Question of Digital Durability

As technology enhances our ability to observe and obtain new observations and information, we have an increasing amount of information to store and organize. Fortunately, technology has also developed digital forms of storage that can save large amounts of information in small areas. All of our collective knowledge and information needs to be stored and organized in an accessible manner, because the only way for us to move forward in our thinking and knowledge is to work from the knowledge and information that has previously been developed. For this reason the storage and preservation of our collective knowledge and information is of the utmost importance. Since we now develop such vast amounts of information on a daily basis, digital libraries, archives, and databases have become tools both popular and necessary for the purpose of preserving our knowledge and information. However, there remains one important and nagging doubt in my mind about all this digital storage. Durability. Will these methods of storing data last and be accessible in the distant future? Not being an expert on the subject, perhaps I am missing some vital information myself. Regardless, it seems to me that there is the very real danger of losing vast amounts of information to the detriment of our future progress.
In the past, humans have generally stored useful information on very tangible things: stone, wood, clay, and paper. Sometimes these pieces were lost but subsequently dug up by people thousands of years later. The information stored on these relics could generally be retrieved, whether through vigorous research or simply by reading it. The information stored on these tangible objects remained accessible to us and has proven useful in understanding past civilizations and the human species itself. What would happen if we stored some vital or interesting piece of information digitally on a CD or on a USB flash drive and then lost it? If found by someone in the future, would the information that was stored be accessible to future generations?
Information that is stored digitally is stored merely as 1’s and 0’s on some sort of digital storage device, whether it be a CD, a DVD, a hard drive, or a flash drive. Depending on the storage device and on what type of information is stored (text, music, pictures, video, etc.), one needs a certain piece of hardware with specific software to retrieve the stored information and render it into something that means something to the human brain. This method of storing information seems very specific to both the hardware and software being used to store it. If one does not have the correct combination of hardware and software, is the information stored in a digital format reduced to a meaningless sequence of 1’s and 0’s? And if so, isn’t that information in effect lost to future generations? Of course, all of this ignores the fact that many digital storage mediums don’t actually last long anyway. It is said that certain magnetic storage devices last “years, even decades, before deteriorating” (Stair, Ralph M., and Kenneth Baldauf. 2008. Succeeding with Technology: Computer Concepts for Real Life. 3rd Edition. Boston, MA: Thomson Course Technology. p.79). Decades?! A few decades is a mere blink of an eye in the grand scheme of things. Storage mediums that deteriorate after a few decades are not durable storage mediums.
All of this seems to suggest that, on top of storing all the vast amounts of new information being developed, we also need to constantly resave or restore all the old information from the past. This may mean simply resaving it in the same medium; however, since technology keeps advancing at such a fast pace, it probably means moving old information onto new storage mediums. This process of restoring and restoring and restoring the information seems to increase the likelihood of losing some of it. Once something is left behind, it may rather quickly become difficult or even impossible to access, since our technology rapidly develops new hardware and software. To me, this situation seems to make it alarmingly possible to lose large quantities of important data; however, as I stated before, I may be missing something here.
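A tiny Python sketch may make the dependence on software concrete. The eight bytes below are made up for illustration; the point is only that the same 1’s and 0’s yield completely different “information” depending on which rules the reader applies, and nothing in the bytes themselves says which rules are the right ones.

```python
import struct

# Eight bytes as they might sit on a disk, with no label saying what they are.
raw = bytes([0x48, 0x65, 0x6C, 0x6C, 0x6F, 0x21, 0x21, 0x21])

# Interpretation 1: ASCII text.
as_text = raw.decode("ascii")

# Interpretation 2: two little-endian unsigned 32-bit integers.
as_integers = struct.unpack("<2I", raw)

# Interpretation 3: one little-endian 64-bit floating-point number.
as_float = struct.unpack("<d", raw)[0]

if __name__ == "__main__":
    print("as text:    ", as_text)
    print("as integers:", as_integers)
    print("as a float: ", as_float)
```

Only the first reading gives something a person would recognize, and a future reader who has lost the format's rules has no way to know that from the bits alone.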
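One common safeguard when moving files from old storage to new is to compare a checksum taken before and after the copy. The sketch below, in Python, is only an illustration of that idea under my own assumptions: the function names are hypothetical, and it guards only against bits being lost or changed in transit. It does nothing about the other half of the problem, which is whether anyone will still have software that can read the format.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def migrate(source, destination):
    """Copy a file to new storage and confirm the copy is bit-for-bit identical."""
    before = sha256_of(source)
    shutil.copyfile(source, destination)
    after = sha256_of(destination)
    if before != after:
        raise IOError(f"copy of {source} is corrupt: {before} != {after}")
    return after


if __name__ == "__main__":
    # Demonstration with a throwaway file standing in for an archived document.
    with tempfile.TemporaryDirectory() as folder:
        old = Path(folder) / "old_medium.dat"
        new = Path(folder) / "new_medium.dat"
        old.write_bytes(b"some vital or interesting piece of information")
        print("verified copy, checksum:", migrate(old, new))
```

Keeping the checksum alongside the file also lets a later migration confirm that nothing decayed while the file sat on the shelf.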

Technology Preserves Librarians

While advances in technology continue to rapidly improve our daily lives and enhance our ability to expand our collective knowledge, these advances carry the potential undesirable effect of information overload, which in turn could hamper our ability to further our knowledge. Technological advances have increased our ability to observe and collect data and have increased the speed of communicating any new information, which has the effect of increasing both the amount of new knowledge and information and the rate at which it is discovered. While new technologies are developed to help cope with storing all of this new information, one is left to wonder if there is too much.
Technology has always helped our species drive forward in our quest for knowledge, understanding, and information, but now as we use technology to improve technology, the rate of technological advancement has vastly increased and continues to increase dramatically, thereby rapidly increasing our knowledge base. Technological advances in communications have also improved the ease and speed of communicating information across the globe and beyond. This improvement in communication and the spread of knowledge helps foster the generation of new information as other people use and build on that newly communicated information. Technology has also made publishing much more efficient, meaning we as a species can turn out much more information quickly and cheaply. This also means that the publishing community in general does not have to be as critical about what actually gets published, since such a great quantity can be published. This phenomenon increases the amount of information being spread, but it also spreads useless or bad information into our collective knowledge base. All of these situations help grow our knowledge base at an ever-increasing rate.
Fortunately, technology has also developed methods that aid us in storing all of this information. Digital libraries and databases allow us to store vast amounts of information in relatively small areas. However, now that we have all of this knowledge and information, scholars and other people need to sift through it all in order to find the relevant information they are seeking. With all the information out there, this can seem like a daunting task. It would almost be a full-time job for someone in a particular discipline to keep up with all the information in their field alone. In order to move their field forward, they need to be able to build on the information already generated, which means they need to be able to find it in the first place. Just sitting down and attempting to sift through all the information that is out there could easily overwhelm a person and prevent them from finding what they need to know, which in turn prevents them from generating any new information. This situation places librarians in a new and very important position. Having trained individuals in place, working with other disciplines to help gather, organize, and maneuver through the seemingly endless amounts of information, becomes indispensable if our species is to continue in our quest for knowledge and information. Far from negating the need for librarians, technology has created a situation where librarians, with new technologies and skills, are desperately needed to help control the storage and flow of information.