Fluxing the Future of The Filesystem
Meta-data: Data About Data
So, lately, I have been on the AppleInsider boards, where there has been a lot of talk about how files should be organized. Right now, we live in a world where files are organized into folders, and if someone wants the same file in more than one folder, that file needs to be either duplicated (which causes problems) or aliased/shortcutted. The future is meta-data: using meta-data to organize files not into folders but into lists of documents.
Starting Point
Look at iTunes, which has a great system for organizing files. It has a Library of all the files, and then, instead of folders, there are playlists. Even better are special playlists called "smart playlists," which use meta-data to fill themselves with things such as "Last 25 Played" or "1960s." (These smart playlists are defined by rules such as "all tracks whose year is 1960" or "all tracks that have been played most recently, limit to 25.") This works great with music files, which have a lot of meta-data to work with: title, artist, album, composer, year, track number, genre, last played, date added, play count, rating, and more. In this case, the keywords are the artist and album (the two most specific groups, and the ones people search by the most). But how can this be applied to documents?
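To make the idea of a rule-driven playlist concrete, here is a minimal Python sketch, assuming a smart playlist is nothing more than a filter (a yes/no rule), an optional sort, and an optional limit applied to the Library. The `Track` fields and the `smart_playlist` helper are hypothetical illustrations, not iTunes' actual internals, and the "1960s" rule is assumed to mean the whole decade.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Callable, List, Optional

@dataclass
class Track:
    title: str
    artist: str
    album: str
    year: int
    last_played: Optional[datetime] = None

# A rule is just a yes/no question asked of each track.
Rule = Callable[[Track], bool]

def smart_playlist(library: List[Track], rule: Rule, sort_key=None,
                   descending: bool = False, limit: Optional[int] = None) -> List[Track]:
    """Collect the library tracks that satisfy `rule`, then sort and cap the list.
    The playlist holds references to the same Track objects; nothing is copied."""
    matches = [t for t in library if rule(t)]
    if sort_key is not None:
        matches.sort(key=sort_key, reverse=descending)
    return matches if limit is None else matches[:limit]

library = [
    Track("Yesterday", "The Beatles", "Help!", 1965, datetime(2005, 5, 1)),
    Track("Imagine", "John Lennon", "Imagine", 1971, datetime(2005, 5, 8)),
    Track("Something", "The Beatles", "Abbey Road", 1969),
]

# "1960s": every track whose year falls in that decade.
sixties = smart_playlist(library, lambda t: 1960 <= t.year <= 1969)

# "Last 25 Played": played tracks, most recently played first, capped at 25.
last_25 = smart_playlist(library, lambda t: t.last_played is not None,
                         sort_key=lambda t: t.last_played, descending=True, limit=25)

print([t.title for t in sixties])   # ['Yesterday', 'Something']
print([t.title for t in last_25])   # ['Imagine', 'Yesterday']
```

Because the playlist only holds references, the same track can sit in any number of smart playlists without ever being duplicated.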
For documents, the keywords would be title, author, last used, and date created. But the most important keywords would be data that you enter yourself. Unlike music, documents are created by you, so there is no online document database to download meta-data from (the way the Gracenote CD Database supplies it for CDs). Therefore, when the document is saved, keywords would have to be associated with the file. Let's say I just wrote a science essay for a final exam. I would file it under Schoolwork, Science, Essays, and Finals (let's suppose the computer is smart enough to treat words like "finals," "final exams," and "final" as one group). After saving my essay, I could look in my Recent Work Smart Folder, which has the rule "all files modified in the past two weeks that have the keyword work," and there is my new essay. But I could also look in my Schoolwork Smart Folder and find it there, in a list sortable by title, date created, subject, or type (essay, letter, or simple homework question).
In every Smart Folder, the entry is only a reference back to a file listed in my Library of files. The real file could be anywhere on my hard drive, but to me it lives in my Smart Folders. What gets moved around between Smart Folders (or ordinary folders you populate yourself) is a reference that, when opened, opens the file wherever it resides on the hard drive. This would be an incredible way to get at your data. There are problems in getting to this momentous point, however.
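Here is one way the essay example could work, as a small Python sketch. It assumes a Library that maps each file's real path to its meta-data, a hypothetical synonym table that folds "final," "final exam," and "final exams" into one keyword, and Smart Folders that return paths (references) rather than copies. The paths, the `SYNONYMS` table, and the reading of "keyword work" as "any keyword containing work" (so that Schoolwork matches) are all illustrative assumptions.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta

# Hypothetical synonym table: every variant maps to one canonical keyword.
SYNONYMS = {"final": "finals", "final exam": "finals", "final exams": "finals"}

def normalize(keyword: str) -> str:
    k = keyword.strip().lower()
    return SYNONYMS.get(k, k)

@dataclass
class Document:
    path: str            # where the real file lives on disk
    title: str
    modified: datetime
    keywords: set = field(default_factory=set)

class Library:
    def __init__(self):
        self._docs = {}  # path -> Document

    def save(self, doc: Document, keywords):
        doc.keywords = {normalize(k) for k in keywords}
        self._docs[doc.path] = doc

    def smart_folder(self, rule):
        """Return references (paths) to matching documents, never copies."""
        return sorted(p for p, d in self._docs.items() if rule(d))

    def resolve(self, path: str) -> Document:
        """Follow a reference back to the document, wherever it lives."""
        return self._docs[path]

def recent_work(now: datetime):
    cutoff = now - timedelta(weeks=2)
    # Assumption: "keyword work" matches any keyword containing "work", e.g. "schoolwork".
    return lambda d: d.modified >= cutoff and any("work" in k for k in d.keywords)

now = datetime(2005, 5, 9)
lib = Library()
lib.save(Document("/Users/me/essay.doc", "Photosynthesis", now - timedelta(days=1)),
         ["Schoolwork", "Science", "Essays", "Final Exams"])
lib.save(Document("/Users/me/old.doc", "Old letter", now - timedelta(days=60)), ["Personal"])

print(lib.smart_folder(recent_work(now)))                      # ['/Users/me/essay.doc']
print(lib.smart_folder(lambda d: "schoolwork" in d.keywords))  # ['/Users/me/essay.doc']
```

The essay shows up in both Smart Folders, yet only one copy of it exists; each folder just lists its path.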
Investing in the Future
First of all, most people do not want to spend time, after they have finished an essay, typing in keywords about what they just did. You don't want to have to remember whether the keyword was "science" or "biology" or "bio." People already have to type in the title, so why make them do more? (Well, for the reasons above.) Some meta-data comes for free: Author, Date Created, and Date Modified. File size (in bytes) is not as useful as it is for an MP3, where the length can be worked out from the size and the bit rate. With documents, different fonts, line heights, and margins mean the size cannot reliably tell you the length. So maybe the Word Count and Length of the document (with the current font, line height, and margins) could be included as well. With just these fields we could already have a Smart Folder with a rule such as "all documents written by Me in the last 9 months that are at least 2 pages long." That would be a good way to gather all the essays you have written this past school year. Yet there is still no subject or type of document, and that is where either context or user-provided keywords come into play.
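The "school year" Smart Folder needs no typed keywords at all, only the free meta-data. A minimal Python sketch, assuming the word processor records a page count (Length) at save time and that "the last 9 months" is approximated as 270 days; the `DocInfo` record and the small rule helpers are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass
class DocInfo:
    title: str
    author: str
    created: datetime
    word_count: int
    pages: int  # Length as laid out with the current font, line height, and margins (assumed supplied by the app)

# Small rule builders; each returns a yes/no test on one document.
def written_by(name):
    return lambda d: d.author == name

def created_within(days, now):
    return lambda d: d.created >= now - timedelta(days=days)

def at_least_pages(n):
    return lambda d: d.pages >= n

def all_of(*rules):
    """Combine rules so a document must pass every one of them."""
    return lambda d: all(r(d) for r in rules)

now = datetime(2005, 5, 9)
# "All documents written by Me in the last 9 months (~270 days) that are at least 2 pages long."
school_year = all_of(written_by("Me"), created_within(270, now), at_least_pages(2))

docs = [
    DocInfo("Essay on Hamlet", "Me", datetime(2004, 10, 1), 1800, 6),
    DocInfo("Grocery list", "Me", datetime(2005, 4, 1), 40, 1),
    DocInfo("Report from 2003", "Me", datetime(2003, 3, 1), 2500, 9),
]

print([d.title for d in docs if school_year(d)])  # ['Essay on Hamlet']
```

Building rules out of small pieces like this mirrors how a Smart Folder's conditions are usually stacked one line at a time.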
The Importance of Keywords
Context keywords would be ones that the system itself adds as meta-data. Much as a junk mail filter goes through a piece of e-mail to figure out whether it is spam, another filter would go through a document and try to figure out what kind of document it is and what it is related to. The Context Scanner would look for letters (words like "dear" and "sincerely," and clues such as an address and date at the top), essays (which would be long and contain headings or other characteristic features), or maybe a screenplay (names in all CAPS and centered content). The list goes on and on. Once it has figured out what type of document it is, it can go on to look for the subject and assign a keyword that way.
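The clues listed above can be turned into a toy Context Scanner. This Python sketch is only a set of hand-written heuristics, not a trained classifier like a real spam filter; the thresholds (500 words for an essay, 20% of lines for a screenplay) and the use of deep indentation as a stand-in for "centered" in plain text are assumptions made for illustration.

```python
import re

def classify(text: str) -> str:
    """Guess a document type from surface clues. A toy heuristic, not a trained classifier."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    lower = text.lower()
    words = len(text.split())

    # Letter: a line starting with "Dear" plus a sign-off such as "Sincerely".
    if re.search(r"^\s*dear\b", lower, re.MULTILINE) and re.search(r"\b(sincerely|yours truly|regards)\b", lower):
        return "letter"

    # Screenplay: many short all-caps lines, deeply indented (assumed proxy for centered character names).
    caps = [ln for ln in lines
            if ln.strip().isupper() and len(ln) - len(ln.lstrip()) >= 10 and len(ln.split()) <= 3]
    if lines and len(caps) / len(lines) >= 0.2:
        return "screenplay"

    # Essay: long, with at least one heading-like line (short, no closing punctuation).
    headings = [ln for ln in lines
                if len(ln.split()) <= 6 and not ln.rstrip().endswith((".", "?", "!", ","))]
    if words >= 500 and headings:
        return "essay"

    return "unknown"

letter = "May 9, 2005\n\nDear Mr. Smith,\nThank you for the interview.\nSincerely,\nMe"
script = "INT. CASTLE - NIGHT\n\n            HAMLET\n    To be or not to be.\n            HORATIO\n    My lord."
essay = "Introduction\n" + "word " * 600

print(classify(letter), classify(script), classify(essay))  # letter screenplay essay
```

Each rule is checked in order, so a document that looks like a letter is never mistaken for an essay; a real scanner would weigh the clues together instead.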
The Context Scanner would be able to figure out whether the essay you wrote was for English or for science, and whether a letter was personal or business. I have no idea how that could be done, but it would be great. Let's say you have just created a PowerPoint presentation for a math class. When you save it, the system already knows it is a slide show because its Document Type is "PowerPoint Presentation," but then the Context Scanner goes to work and decides it would best go under the keywords Schoolwork and Math. It would not add a Presentation keyword, because that is already covered by the Document Type. If the Context Scanner did not exist, users would have to punch in the data themselves, which would get annoying and tedious (although auto-completion would help), and the dream of a meta-data-driven file system would die.
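A rough Python sketch of the second step, subject detection, under heavy assumptions: subjects are scored by counting words from small hypothetical vocabularies, any school-related word adds Schoolwork, and any keyword already implied by the Document Type (here, "presentation" for "PowerPoint Presentation") is dropped. None of these word lists come from a real product; they only show the shape of the idea.

```python
from collections import Counter

# Hypothetical subject vocabularies; a real scanner would need far richer models.
SUBJECT_TERMS = {
    "math": {"equation", "theorem", "algebra", "derivative", "integral", "proof"},
    "science": {"cell", "experiment", "hypothesis", "molecule", "energy"},
    "english": {"novel", "poem", "author", "metaphor", "theme"},
}
SCHOOL_TERMS = {"homework", "class", "teacher", "exam", "assignment", "quiz"}

# Keywords a Document Type already implies, so they are not added again.
TYPE_KEYWORDS = {"PowerPoint Presentation": {"presentation"}}

def suggest_keywords(text, document_type, detected_kind=None):
    words = Counter(w.strip(".,;:!?()\"'").lower() for w in text.split())
    keywords = []
    if any(words[t] for t in SCHOOL_TERMS):
        keywords.append("schoolwork")
    # Pick the subject whose vocabulary appears most often, if any appears at all.
    scores = {s: sum(words[t] for t in terms) for s, terms in SUBJECT_TERMS.items()}
    best = max(scores, key=scores.get)
    if scores[best] > 0:
        keywords.append(best)
    if detected_kind:
        keywords.append(detected_kind)
    implied = TYPE_KEYWORDS.get(document_type, set())
    return [k for k in keywords if k not in implied]

slides = "Class 12: Derivatives\nThe derivative of x^2 is 2x. Proof on next slide. Homework: problems 1-5."
print(suggest_keywords(slides, "PowerPoint Presentation", detected_kind="presentation"))
# ['schoolwork', 'math']
```

The Presentation keyword is filtered out exactly as described above, because the Document Type already says it.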
An Excellent Metaphor
Users would also have to get used to the idea that they are now using a card catalog to browse their files. A card catalog has deliberate redundancy: one book is filed under several cards to make it easier to find. Once you find the card, the matching book is very quick to find. (Thank you, Mr. Dewey.)
The Smart Folders are the card catalog drawers, the Library holds all your files, and finding things can be very quick. The computer is now your librarian, and the card catalog, instead of being alphabetized, has headings like "Work," "Projects," or "Personal." This is much better than having to be your own file clerk, as we do now, putting things away by subject when they might fit just as well by date, until it gets hard to remember which one went where.
The problems here are how much software engineers are willing to invest in a meta-data-driven system, and how to protect novice computer users from a radically changed file system without angering power users who want the customizability and choice to put things where they want them.
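In software, a card catalog is essentially an inverted index: each heading points to the set of files filed under it, so one file can appear under many headings at the cost of one small entry each. A minimal Python sketch, where the `CardCatalog` class and the file paths are hypothetical; looking up several headings at once returns only the files filed under all of them.

```python
from collections import defaultdict

class CardCatalog:
    """Files one reference under every heading it belongs to, like a library card catalog."""

    def __init__(self):
        self._cards = defaultdict(set)  # heading -> set of file paths

    def file(self, path, headings):
        for h in headings:
            self._cards[h.lower()].add(path)

    def look_up(self, *headings):
        """Paths filed under every given heading (set intersection)."""
        sets = [self._cards.get(h.lower(), set()) for h in headings]
        return set.intersection(*sets) if sets else set()

catalog = CardCatalog()
catalog.file("/Users/me/essay.doc", ["Schoolwork", "Science", "Projects"])
catalog.file("/Users/me/budget.xls", ["Personal", "Projects"])

print(sorted(catalog.look_up("Projects")))     # ['/Users/me/budget.xls', '/Users/me/essay.doc']
print(catalog.look_up("Projects", "Science"))  # {'/Users/me/essay.doc'}
```

Looking up a heading is a single dictionary access, which is why the librarian, not the user, can afford all that redundancy.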
Comments
This is a great idea. I could get used to typing in 2 or 3 keywords after creating or downloading a document or file. I don't know how much I'd trust an automated "classifier," though. Isn't there any implementation of this concept for some existing filesystem? I'd implement it myself if I had the time and, ahem, the expertise.
Posted by: Frank | May 9, 2005 08:50 AM