|
The amount and substantiality of
the portion taken |
Small portions of the original
text, rather than full-text access |
Under no circumstances whatsoever
do end users have access to entire texts (e.g. newspaper,
magazine, or journal articles, or short stories). All access is
via the web interface, and the vast majority of what users see
consists of frequency charts showing the frequency of words or
phrases in different parts of the corpus. Access to small
portions of the original text is more of an "afterthought"
than the central feature of the interface.
Access to actual portions of the
original text is limited to very short
"Keyword in Context" displays, where users see just a handful of words to
the left and the right of the word(s) searched for. In addition,
all access is logged, and users can only perform a limited
number of searches per day. As a result, it would be difficult
for end users to re-create even one paragraph from the original
text, and it would be virtually impossible to re-create
an entire page of text, much less the entire article.
This "snippet defense" (which
relies on limited access to the original text via small snippets
from the web interface) is the same one that
Google Books relies on for its use
of millions of copyrighted materials. In addition, we have
consulted two lawyers who specialize in Internet copyright law
(names available upon request). They have both stated that
because of the limited access we give end users, as well as our
status with regard to the other three factors shown here, we
are clearly in accord with the provisions of the Fair Use
statute. |
|
The nature of the copyrighted work
|
Non-creative works |
There are some creative works (e.g.
short stories and small sections of novels) in the corpus, but
more than 80% of the corpus consists of TV show transcripts and
articles from newspapers, magazines, and academic journals. |
|
The effect of the use upon the
potential market |
Little or no effect on the
copyright holder |
Because of the very limited access
via our web interface (see the first item above), it is
extremely unlikely that anyone would use this corpus as a
"substitute" for other access to the original texts. Other
sources make these texts available as "complete articles", which
are meant to be read in their entirety. That is completely
impossible with our interface.
Our interface and the other sources of
these texts serve two
completely different audiences. Our interface is designed for
linguists and language learners who want to see the frequency of
words, phrases, synonyms, etc., and it is completely inadequate
for anyone who wishes to read the entire text of an article. As
a result, there is very little or no "competition" between our
service and that provided by others, and therefore virtually no
market impact. |