
Re: [port-peer-review] reviews



> Review 1
> --------    (01)

(Message reformatted to fit <80 columns.)    (02)

It's not immediately obvious whether responses to reviews are desired,
but since discussion has been light... see below:    (03)

> Paper's title: Creating Conceptual Access:
>                Faceted Knowledge Organization in the Unrev-II email archives
>
> Paper's author: Kathryn La Barre and Chris Dent
>
> Summary. The authors describe an experiment to use, combine and
> compare various document indexation tools (especially in latent
> semantic analysis) for the creation of "conceptual" clusters of
> terms/concepts/facets/documents (however, the only relations
> between the terms/concepts/facets/documents/clusters seem to be
> measures of similarity/divergence calculated by the semantic
> analysis tools). The evaluation (and refinement) of the created
> clusters, and hence what the authors call the "access
> structure" of the set of documents, is expected to be done by
> people via a manual ranking and tagging (keywords or short
> phrase) of documents. It is also expected that each phase of
> refinement will be usable as input for another phase of
> automated clustering.    (04)

I think this is a fairly accurate overview of what we wrote.
Unfortunately I think the review is correct when it suggests that
what we wrote did not explain well enough how the eventual system
is supposed to work. The tool in progress is designed to accept
input of any kind that indicates clusterings of documents (or
document elements). The testing thus far has been done with LSA,
but any system that provides a method of grouping can be used. The
tool is an infrastructure for the human evaluation of document
groupings, to determine whether the groupings are valid (i.e.
whether they have relevant meaning). If the groupings do have
meaning, they can be used as facets in a faceted access structure
that provides a concept-based information retrieval interface to
the dataset. The tool will support evaluating and labelling
clusters.    (05)
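To make the workflow above concrete, here is a minimal sketch of the
data model such a tool might use. All names here are illustrative
inventions, not the actual PORT tool's code: groupings arrive from any
clustering source, a human reviewer marks each one valid or not and
attaches a concept label, and only validated, labelled groupings
become facets in the access structure.

```python
# Hypothetical sketch of the evaluation workflow; names are invented.
from dataclasses import dataclass, field


@dataclass
class Grouping:
    doc_ids: list          # documents the clustering source put together
    source: str            # e.g. "LSA" -- any grouping method can feed in
    label: str = ""        # concept label chosen by a human reviewer
    valid: bool = False    # does the grouping have relevant meaning?


@dataclass
class AccessStructure:
    facets: dict = field(default_factory=dict)  # label -> doc ids

    def accept(self, g: Grouping):
        """Groupings judged valid become facets in the structure."""
        if g.valid and g.label:
            self.facets[g.label] = list(g.doc_ids)


# A reviewer inspects a grouping, judges it meaningful, and labels it.
g = Grouping(doc_ids=["msg-101", "msg-245"], source="LSA")
g.valid, g.label = True, "bootstrapping"

structure = AccessStructure()
structure.accept(g)
print(structure.facets)   # {'bootstrapping': ['msg-101', 'msg-245']}
```

The point of the design is that the clustering source is pluggable:
nothing above depends on LSA in particular.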

> Clarity and precision. I found the description too general and
> hence difficult to understand. There are many repetitions but
> at the same level of generality, i.e. without precision,
> definition or example, even about the most frequently used
> words/expressions: "cluster", "facet", "access structure",
> "coding messages".    (06)

Yes. The only excuse I can offer is that we were in a hurry and
did not feel we had time for much local review. As it turned out we
probably had plenty of time, given the flexibility with
deadlines.    (07)

> The figures are not much helpful since they do not show any
> term/concept/facet and the nature of the relations between the
> nodes is not explicited.    (08)

That's because no term/concept/facet has been determined at that
stage. That particular dataset is a similarity matrix of
documents generated with LSA. The matrix _may_ indicate some
conceptual groupings in the dataset. What the associated concepts
are is unknown, and LSA provides no method of discovering them.
Our tool is designed to help with that: a reviewer can look at
the documents in a group, determine whether they are in fact
related, and choose a conceptual label for that relationship.
Others can then evaluate that evaluation.    (09)
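For readers unfamiliar with what such a similarity matrix is, here is
an illustrative sketch in plain numpy of the standard LSA recipe
(truncated SVD of a term-document matrix, then cosine similarity in
the reduced space). The term counts are made up, and our actual runs
used an LSA toolkit rather than this code.

```python
# Toy LSA-style document-similarity matrix; data is invented.
import numpy as np

# Rows = terms, columns = documents (a toy term-document count matrix).
A = np.array([[2, 0, 1, 0],
              [1, 1, 0, 0],
              [0, 2, 0, 1],
              [0, 0, 1, 2]], dtype=float)

# Truncated SVD: keep k latent dimensions.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2
docs = (np.diag(s[:k]) @ Vt[:k]).T       # documents in latent space

# Cosine similarity between documents in the reduced space.
norms = np.linalg.norm(docs, axis=1, keepdims=True)
sim = (docs / norms) @ (docs / norms).T
print(np.round(sim, 2))
```

Note what the output does and does not say: high entries suggest which
documents group together, but nothing in the matrix says *why*.
Attaching a concept label to a group is exactly the human step our
tool supports.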

> My understanding of the article relies on the information that
> the output of classic document indexation tools is used, and my
> conviction that therefore there cannot be anything really
> "conceptual" or "structured" to exploit. Hence, "facet" must
> refer to a simple keyword, and "access structure" to some
> calculated similarity relations which do not have any
> commonsense meaning. Which document indexation tools have been
> used, and how, is also unspecified.    (010)

See above; also, thus far only LSA has been used, and only to
exercise the (incomplete) interface.    (011)

> Originality. The proceedings of the WWW conferences are full of
> descriptions of tools creating clusters of documents (based on
> classic document indexation techniques) and permitting to
> navigate within and between them. I do not know (or like the
> output of) these approaches enough to appreciate their
> originality.    (012)

It is here that it becomes most apparent that our writing
failed. Our system is neither for creating clusters nor,
primarily, for navigating within them, except in the context of
evaluating the clusters under consideration. Clusters that
evaluate well can be used in the facet analysis process. Kathryn
has more knowledge of formal facet analysis, so if we need more
discussion of that, perhaps she'll chime in?    (013)

Perhaps the hypothesis in the abstract was poorly worded:    (014)

http://lab.bootstrap.org/port/papers/2002/labarredent/index.html#nid05    (015)

  Tools such as latent semantic analysis, vector space models,
  traditional concordancing, and self-organizing maps may be
  worthwhile tools to generate meaningful clusters in the dataset.
  These clusters would then be used as aids in the human process of
  facet analysis in order to generate a faceted access structure1
  for the conceptual content of the archive or similar textual
  repositories.    (016)

It's the second sentence that matters. Faceted classification
holds a great deal of promise for electronic resources, but the
generation of the system is labor intensive; much like, perhaps,
the process of getting document elements into FCG.    (017)

Our project is an effort to create tools to augment the humans
doing the process. Not replace them, augment them.    (018)

> I was more surprized by the absence of references to the use of
> Formal Concept Analysis (or similar methods) for the
> structuration and navigation of a base of documents and the
> terms used for indexing it. Indeed, this approach has the
> advantages of producing a genuinely understandable and    (019)

According to who?    (020)

[snip]    (021)

> Interest of the approach. This question is not discussed in the
> article. The abstract mentions that the hypothesis is that
> classic document indexation techniques "may be worthwhile tools
> to generate meaningful clusters in the dataset". However, there
> is no indication this is so in the article.    (022)

A purpose of PORT is to evaluate tools that may be useful.
Classic document indexation techniques may have a place, especially
when used in tandem with faceted classification. Our tool is a
tool for evaluating tools. It is a first step in a multi-stage
inquiry.    (023)

> I personally do not think this is so. I do not even have much
> interest in using the "Email Concept Analysis" tool of my
> friend because it is not enough "conceptual"/"knowledge-based"
> to me: I am not interested in retrieving sets of
> e-mails/documents according to terms/authors/..., I am
> interested in getting precise answers to precise questions,    (024)

This is a classic division in information retrieval. Are you
trying to find a precise answer to a precise question or are you
trying to learn more generally, without precise goals?    (025)

Interestingly, my own experience is that "classic document
indexation techniques" are more effective when questions are
precise, if I query well. Where those techniques fail is when I
want to know some stuff, kind of related to some other stuff,
that might have something to do with this and that. That's when I
want faceting.    (026)
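A toy sketch may make the contrast concrete. The data and function
names below are invented for illustration: a keyword search answers a
precise question, while intersecting facets supports the "stuff
related to other stuff" browsing described above.

```python
# Hypothetical contrast between keyword search and faceted browsing.
docs = {
    "msg-1": {"text": "augmenting human intellect",
              "facets": {"augmentation", "tools"}},
    "msg-2": {"text": "latent semantic analysis of archives",
              "facets": {"indexing", "tools"}},
    "msg-3": {"text": "facet analysis workflow",
              "facets": {"indexing", "classification"}},
}


def keyword_search(term):
    """Precise question: which messages mention this exact term?"""
    return sorted(k for k, d in docs.items() if term in d["text"])


def facet_browse(*facets):
    """Open-ended browsing: messages at this facet intersection."""
    return sorted(k for k, d in docs.items()
                  if set(facets) <= d["facets"])


print(keyword_search("semantic"))         # ['msg-2']
print(facet_browse("indexing"))           # ['msg-2', 'msg-3']
print(facet_browse("indexing", "tools"))  # ['msg-2']
```

Keyword search needs you to already know the right term; the facet
intersection lets you narrow from a broad region of the collection
without a precise question in mind.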

Thank you very much for the comments. They have been very
productive for me. I had hoped that there might be more and that
we might have a constructive dialog. Perhaps we still can.    (027)

-- 
Chris Dent  <cdent@burningchrome.com>  http://www.burningchrome.com/~cdent/
"Mediocrities everywhere--now and to come--I absolve you all! Amen!"
 -Salieri, in Peter Shaffer's Amadeus    (028)