July 12, 2005
Opening Up the Wayback Can of Worms
Posted by Alan Wexelblat
Back in November of last year I noted a court case in which saved Web pages from the Internet Archive (informally the Wayback Machine) were used as evidence. At the time I expressed surprise at the judge's ready acceptance of the evidence and noted that this is an extremely murky and untried legal area.
Now, if the report in the NJ Star Ledger is correct, we may see some litigation of a few of the issues raised by an archive of this sort and its involvement in copyright and court proceedings.
As best I understand it, what seems to be happening is the operators of the Wayback Machine are themselves being sued. Geist pointed to a Geocities page for a copy of the actual complaint, but it was 404 when I went to look.
What Kevin Coughlin's story says is that Healthcare Advocates is suing Wayback because the operators of Wayback failed to block access to certain archived materials during a 2003 trade secrets dispute. According to the complaint, the opposing counsel at the time obtained pages from the Wayback Machine. One issue is how those pages were obtained - did they come from normal searching or from some kind of "hacking?"
Another issue is the copyright in the pages - if the pages were copyrighted by Healthcare Advocates, then what was Wayback doing with copies of them in the first place? And why was it serving up copies of material it didn't own the copyrights to? Did opposing counsel knowingly obtain, by extra-judicial means, material they knew was supposed to be protected by IP laws? And does the Internet Archive bear some responsibility because of what it apparently admits were broken "blocking procedures"? (My instinctive guess is that their spider wasn't properly obeying robot exclusion directives.)
Kurt Opsahl, staff attorney for the EFF, is quoted as opining that the doctrine of fair use generally allows the gathering of copyrighted materials as evidence in trade secret cases. In which case, the whole thing may get chucked out quickly and no legal precedents will emerge. But I remain convinced that this is the barest tip of a huge legal iceberg that is going to crash into the business of search engines and other 'net archives, soon. Maybe not this specific case, but the issues I pointed to last year remain completely unresolved, and in the absence of guiding legislation, parties wishing to establish principles have little choice except to litigate their claims and hope for good precedents.
Category: Laws and Regulations
Comments (12)
1. Donna on July 12, 2005 1:03 PM writes...
William Patry has more...
2. Crosbie Fitch on July 12, 2005 1:42 PM writes...
Maybe the next Web should require all participants to agree that everything they publish on their websites is published under a Creative Commons license?
Otherwise it's petulance of the highest order to publish information and then expect to be able to remove it from the historical record at any time thereafter - like some kind of gentleman's DRM agreement. "You can look, but don't record"
Almost as bad as "You can look, but don't copy"
3. Dr. wex on July 12, 2005 2:01 PM writes...
"it's petulance of the highest order to publish information and then expect to be able to remove it from the historical record at any time thereafter"
This is pretty much precisely what the NYTimes/Boston Globe do now. Also it strikes me this is what the various DRM schemes used by music downloaders propose to do as well.
4. Crosbie Fitch on July 12, 2005 2:35 PM writes...
What would you do or say if someone who'd commented on Copyfight over the years said the following?
"I have decided to join the dark side of strong IP and therefore require that all my comments (largely containing viewpoints of a libertarian nature) be deleted from your archives. I am the author and copyright owner (irrespective of your site's obscure policy which I did not consent to and have only just become aware of) and will sue for infringement if you do not comply within 28 days"
1) Not a problem - didn't like 'em anyway
2) Sure, but you'll have to ask umpteen other syndicated sites to expunge their records yourself - let alone search engines and archival sites.
3) Blog off, you plonker!
5. Kevin Wimberly on July 12, 2005 2:51 PM writes...
I'm just throwing this out there - it's not exactly on point, but it's sort of interesting in light of your comment about possible copyright infringement by Wayback. Remember the 4 DMCA exceptions that the Copyright Office approved in October of 2003? The third one - "Computer programs and video games distributed in formats that have become obsolete and which require the original media or hardware as a condition of access" - was lobbied for by the Internet Archive. They wanted the exception to broadly apply to "Literary and audiovisual works embodied in software whose access control systems prohibit access to replicas of the works." It seems to me that that would have included web pages and protection measures found on servers. This is a big "if," but IF robots.txt files are access control systems under the DMCA, and IF the CO had gone with IA's broad request, then the Internet Archive would be permitted to break in and archive any website it wanted to - provided it also met the condition that it is an "archivist" as section 108 requires (the comments to this 3rd exemption seem to suggest that it is limited to section 108(c) uses). Since the IA makes the works available to the public, it would not have been able to take advantage of the proposed exception anyway.
Just a thought...
See p. 41 (bottom)
http://www.copyright.gov/1201/docs/registers-recommendation.pdf
6. Seth Finkelstein on July 12, 2005 3:40 PM writes...
Here's an important comment from Patry's blog post:
[repost, not my words, FvL's comment below]
"Fred von Lohmann said...
I'd like to suggest a different interpretation of their DMCA claim (while acknowledging that the complaint is not clear): that the robot.txt file operates as a TPM as used by the Internet Archive. The standing provisions of the DMCA have been interpreted broadly, so perhaps the plaintiff here is arguing that the Internet Archive has implemented a TPM that controls access to its archived materials. The robot.txt file is intended to block external access to these materials, and was bypassed by the defendants. (I'll admit, this sounds like the Archive's claim to bring, not the plaintiffs', but the DMCA's standing provision has been stretched before.)
I think the claim still fails for the other reason you note. But I don't think the complaint need necessarily be construed as arguing that robot.txt is a TPM generally.
10:03 AM"
7. Daniel Brandt on July 13, 2005 1:57 PM writes...
On August 12, 2004, I sent a fax to Brewster Kahle requesting a permanent block of all pages -- past, present, and future -- on all 12 of my domains. The Wayback Machine (Internet Archive) complied within two days.
The Archive does not use robots.txt in the conventional manner. They have an arrangement with Alexa, owned by Amazon but which has close relations with Brewster Kahle, to get Alexa's crawl. Alexa uses the crawl to sell packaged sets of the Internet.
The Archive checks for a robots.txt in real time when a request is made. If it finds an exclusion that includes "ia_archiver" it will say so and not show the page. However, it still has those pages and it still keeps them.
I know this because I sold a domain that I had always excluded from ia_archiver to someone who does not use robots.txt. Without a real-time robots.txt exclusion, all the old pages on that domain, which I had assumed were not crawled when I owned the domain, were suddenly available on the Archive.
Now that would have been a strong lawsuit, and that's why the Archive honored my request to block all my domains. However, they still have those pages. I now exclude Alexa's bots entirely in my routing table, just to be safe.
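The request-time check described in this comment can be sketched roughly as follows. This is an illustration only, not the Archive's actual code: the function name `may_display`, the example URL, and the sample robots.txt strings are all assumptions, and Python's standard `urllib.robotparser` stands in for whatever parser the Archive uses. The point it shows is that display depends on the robots.txt served at the moment of the request, not on what was in place when the pages were crawled.

```python
from urllib.robotparser import RobotFileParser


def may_display(current_robots_txt: str, page_url: str,
                agent: str = "ia_archiver") -> bool:
    """Decide whether a stored page may be shown, based only on the
    robots.txt the site serves right now (hypothetical helper)."""
    parser = RobotFileParser()
    parser.parse(current_robots_txt.splitlines())
    return parser.can_fetch(agent, page_url)


# robots.txt while the original owner excluded the archive's agent
EXCLUDING = "User-agent: ia_archiver\nDisallow: /\n"
# robots.txt after the domain changed hands: no rules at all
NO_RULES = ""

PAGE = "http://example.com/old.html"  # placeholder URL

# The stored copy exists in both cases; only its visibility changes.
shown_under_old_owner = may_display(EXCLUDING, PAGE)
shown_under_new_owner = may_display(NO_RULES, PAGE)
```

Under these assumptions the same stored page is hidden while the exclusion is live and becomes visible once the new owner stops serving one, which matches the behavior reported for the sold domain.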
8. Dr. wex on July 13, 2005 2:20 PM writes...
Thanks for the story, Daniel. It's interesting. For the record, Alexa used to be Kahle's company before he sold it to Amazon. I also wonder how effective it is to simply block Alexa's bots - what's to say that the Archive doesn't also buy data from other crawlers?
9. Daniel Brandt on July 13, 2005 2:41 PM writes...
I should have added that when I did some experiments last August, I tried blocking the Archive's real-time fetch of robots.txt to see what would happen. It turned out that after a 20-second timeout, the Archive went ahead and provided the requested page.
That's when I registered the domain "archive-watch.org" but it's only been parked so far. When I'm done with Google's copyright heist at the University of Michigan library, maybe I'll get more interested in the Archive and put up a page or two under archive-watch.org.
I really don't like Brewster Kahle's approach. He's less rich and more nonprofit than Google, but he still takes liberties with copyright law that I don't think can be justified.
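The timeout experiment in this comment amounts to the difference between a "fail-open" and a "fail-closed" policy. A minimal sketch, assuming a hypothetical `decide` function and an injected `fetch_robots` callable in place of a real network request (nothing here is taken from the Archive's code, and the 20-second figure comes only from the comment above):

```python
from typing import Callable
from urllib.robotparser import RobotFileParser


def decide(fetch_robots: Callable[[], str], page_url: str,
           fail_open: bool, agent: str = "ia_archiver") -> bool:
    """Return True if the stored page should be shown.

    fetch_robots is a stand-in for the live robots.txt request; it is
    expected to raise TimeoutError when the site does not answer.
    """
    try:
        robots_txt = fetch_robots()
    except TimeoutError:
        # Fail-open: no answer is treated as "no exclusion".
        # Fail-closed: no answer is treated as "do not show".
        return fail_open
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, page_url)


def blocked_fetch() -> str:
    """Simulates a site that drops the robots.txt request."""
    raise TimeoutError("no response to robots.txt request")


PAGE = "http://example.com/old.html"  # placeholder URL

served_when_open = decide(blocked_fetch, PAGE, fail_open=True)
served_when_closed = decide(blocked_fetch, PAGE, fail_open=False)
```

The behavior reported in the comment corresponds to the fail-open branch: blocking the robots.txt fetch does not hide the pages, it just delays them.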
10. Ned Ulbricht on July 13, 2005 3:41 PM writes...
A text version of the complaint is now up at
http://www.ip-wars.net/story/2005/7/12/185442/034
11. Jack on July 15, 2005 3:17 AM writes...
They have an arrangement with Alexa, owned by Amazon but which has close relations with Brewster Kahle, to get Alexa's crawl. Alexa uses the crawl to sell packaged sets of the Internet.
The Archive checks for a robots.txt in real time when a request is made. If it finds an exclusion that includes "ia_archiver" it will say so and not show the page. However, it still has those pages and it still keeps them.
12. Seth Finkelstein on July 15, 2005 11:34 AM writes...
I've been writing about this at length over at my own blog (Infothought), with some technical speculations and a counter-argument:
Internet Archive DMCA Circumvention Lawsuit
http://sethf.com/infothought/blog/archives/000877.html
Internet Archive DMCA "Circumvention" - Access vs. Copying
http://sethf.com/infothought/blog/archives/000878.html
Proposition: OPT-OUT controls are not DMCA access controls
http://sethf.com/infothought/blog/archives/000879.html