
Re: [port-peer-review] reviews


On Wed, 26 Jun 2002, Philippe Martin wrote:    (01)

> An answer to your question and a few comments on your comments.    (02)

In the spirit of maintaining the discussion (which is fantastic to
finally see) I'll continue. We may be straying off a clean path, but
this path is interesting.    (03)

[fca and being understood]
> > According to who?
> 
> 1) myself. Even the "cosinus measure", the simplest and 
>   easiest-to-understand vector-space indexation technique,
>   is a statistical measure: the results are numbers, the
>   meaning of which is far from being intuitive, and represent
>   a huge loss of information. In FCA, the extracted relations
>   between documents/terms are clear (simple subset relationships),    (04)
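To make the quoted contrast concrete, here is a minimal sketch in Python. The toy corpus, the document names, and both helper functions are assumptions made up for illustration; nothing here comes from the tools under discussion. It compares a cosine score over binary term vectors with the plain subset test that underlies FCA's ordering. It does not build a full concept lattice.

```python
import math

# Hypothetical toy corpus: each document is reduced to the set of terms it contains.
docs = {
    "d1": {"graph", "concept"},
    "d2": {"graph", "concept", "lattice"},
    "d3": {"vector", "cosine"},
}

def cosine(a, b):
    """Cosine of the angle between two binary term vectors, given as term sets."""
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))

def subset_pairs(docs):
    """Pairs (x, y) where the terms of x are a proper subset of the terms of y."""
    return sorted((x, y) for x in docs for y in docs if docs[x] < docs[y])

# The cosine measure yields a bare number between 0 and 1 ...
score = cosine(docs["d1"], docs["d2"])

# ... while the subset test yields an explicit, readable relation between documents.
pairs = subset_pairs(docs)
```

The score says that d1 and d2 are similar without saying how; the subset pair says that everything in d1 also appears in d2, which is the kind of relation the quoted text calls clear.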

Our use of LSA in our tool is not because we believe it to have great
semantic value, but simply because it was easily available. The work
that led to my involvement with Kathryn's project required I use LSA
as a data mining technique. We're interested in evaluating any process
that automatically generates clusters. We've identified cluster
generation as a roadblock that _might_ be made automatic. It is this
identification of small roadblocks that leads to augmentation (I
think).    (05)

My general process for augmenting individuals or teams is this:    (06)

- analyze a large task
- break it down into small tasks
- which of those tasks may respond to small pieces of automation,
  _while still maintaining the larger task and human oversight_
- try some methods of automation
- keep the ones that work well    (07)

I agree that LSA has _severe_ limitations, but it does (easily)
exercise a methodology that we are trying to evaluate. You may enjoy
reading this:    (08)

 http://ella.slis.indiana.edu/~jodmarek/l697/project3/joe_marek_project_3.html    (09)

It's a bit of a screed on why LSA is lame, in response to the
assignment (mentioned above) that required it.    (010)

[...]    (011)

>   are used for the indexation, but it is less relevant. This ratio
>   is irrelevant with knowledge representations since then you
>   can have perfect precision and completeness (ok, not for "all"
>   questions, and with respect to the closed world of a KB).    (012)

This last statement strikes me as one of the cruxes. In the closed
world of a KB, that is being generated in the present day, a complex
representation such as Frame-CGs may in fact be ideal because:    (013)

 - the KB world is closed
 - new information is being generated, may as well do it in an
   articulate, sharable form    (014)

However, in a situation such as the PORT project, where much of the
information will be "old", translating to new representations probably
won't happen. Therefore, access structures, especially simple (as in
both easy for a human to use and/or easy for a computer to automate)
ones will be very important. The ability to find stuff and make
reference will be key. Thus the tool on which Kathryn and I are working
and my own insistence that purple is cool (because it is _so_ simple,
but provides a great deal).     (015)

So, that suggests to me that although there is an apparent chasm
caused by our different and religious attitudes towards knowledge
representation, what we actually have here is a classic case of different
jobs needing different tools.    (016)

Annotative systems for PORT _may_ benefit from formal knowledge
representations, and be facilitated by authoring tools of that sort.
Or they _may_ benefit from human facilitators who recast the more
informal ramblings of humans into more formal structures (like the IBIS
hoistings and happenings of this list).    (017)

Archival systems for PORT _may_ benefit from a variety of access
structures, ranging from simple vector space indexation methods to
classification systems. Which of those is best is a question that
responds well to testing and measurement.     (018)

Access structures for or based on the annotative systems are likely to
provide an _excellent_ method for getting into the archived content.    (019)

> 2) the authors of an article that I read about two months ago and
>   comparing FCA, LSA, the "cosinus" measure, and another
>   clusterization technique. Unfortunately, I forgot the reference.
>   FCA compared well (it precised and explained the clusters obtained
>   via the other techniques) but was not used alone: some pre-filtering on
>   the documents or terms to index was done via LSA-style technique.    (020)

This seems common: combining techniques to get usable data.    (021)

[...]    (022)

> Here, the formalism is simple enough not to be an obstacle.
> Examples: [http://foo.bar/doc.html#nid43, example: "an example"];
> or ["birds fly", corrective_specialization: "most healthy birds are 
> able to fly"]. Alternatively, connecting DEs by rhetorical relations
> can be done via the interface of any good hypertext tool (this excludes
> usual Web browsers).    (023)

[...]    (024)

> In fact, I seriously doubt that scholars will take the time to create 
> representations like the ones I illustrated in the Examples section.
> This is why I conclude my paper by "It may be that a future easier-to-use
> and less knowledge-oriented version of WebKB-2 is required" (for PORT).
> And I was thinking about concept maps, although I do not like them.    (025)

Simple is, as you imply, relative. Formal languages present problems
that all boil down to the creation of division between the priests and
laity. I vacillate between two reactions to that problem: "so?" and
"we ignore this at our peril."     (026)

> Humans can be precise, and in my opinion, actual collaboration
> (even without computers) requires it too. Most of the discussions
> I hear (e.g. political, ethical, technical) go on and on, and
> full circle, because of generalizations/imprecisions. People
> rarely contradict precise facts. Precise discussions quickly end
> (often by recognizing that different persons want different things).    (027)

That doesn't strike me as a productive conclusion. In fact "end"
doesn't seem like a good goal. From my perspective the point of
collaboration is not to collect facts so that people agree they
disagree, but to change perspectives so that new understandings
emerge as part of ongoing dialog.    (028)

However: you are absolutely correct, most discussions go in circles,
covering the same ground. I don't think this problem is solved with
precise facts, but with tools that help people place knowledge of
what's come before into their current awareness.    (029)

Thus my attention to access and referential structures.     (030)

> This may not be achievable but it's much more a social problem than a 
> technical problem.    (031)

No debate from me on this point. It's all much more a social problem.    (032)

> The point is that peer-to-peer discussions or 
> writing/reading articles are extremely inefficient and time-consuming
> ways of publishing/searching precise and complete information compared 
> to publishing/searching doing it in a genuine Semantic Web (clearly not
> the Semantic Web envisaged by Berners-Lee). In fact, each person would do
> very little publishing since most would have already been written by others.    (033)

How incredibly boring! It implies that the world is not really subject
to interpretation.     (034)

> Again, this could only work with a "semantic" network of precise 
> nodes: it cannot work if the nodes can be collections of sentences
> (e.g. paragraphs) or general sentences such as those that (have to) fill
> research articles (including mine) or e-mails (including this one).
> Linear writing is different from very-fine-grained hypertextual writing.
> 
> Sorry for all that dreaming.    (035)

It is the dreaming that makes the example of the funny and fuzzy
associations that you ask for. Your email would have no power to
change my perspective if you presented it in a highly formal symbolic
representation.    (036)

We are collaborating for the sake of perspective change (learning),
yes?    (037)

Emails and research articles are going to happen, so we have to be
prepared to deal with them with whatever not quite perfect tools we
can make.    (038)

> > While it is true that computers must have formal constructs to
> > operate, we deny ourselves the elegance of our humanity if we
> > ignore the "interface and integration issues" which are the
> > stress of this year's workshop
> > (http://lml.bas.bg/iccs2002/PORT.htmI) and show that humans make
> > funny and fuzzy associations. It's those fuzzy associations that
> > I think are the best ones: they reach across voids and form new
> > knowledge.
> 
> Any example?    (039)

Have I made one, or should I think again?    (040)

> > Formalisms amongst humans work in small groups and fail in the
> > world at large. A group that insists on formalisms runs the risk
> > of not reaching across voids, but only reaching within to find
> > the same thing over and over again, thinking it new because it
> > was found by a new type of link in the graph.
> 
> Unless we are talking about different things, that is in
> direct opposition to the above presented vision which was supposed to
> eliminate the current world of redundancy, repetitions and difficulty of
> getting/publishing information, that is brought by the (sole) use
> of peer-to-peer communications, documents or other linear ways of
> communications. Ok, I won't convince you.    (041)

Right, thus my assertion that we are talking religion here.    (042)

> > Interestingly, I think small simple tools, such as the purple
> > numbers used to provide a high(er) level of granularity of access
> > to the papers in this review process promise a much more
> > immediate style of progress that will accelerate (and bootstrap)
> > inquiry.
> 
> I'd have used them much more if they had permitted to refer to sentences
> instead of whole paragraphs. Something like sec4.1_par3_sent2 would suit
> me, if I also have the possibility of providing the base URL just once,
> e.g. as in HTML: <base href="http://www.foo.bar/doc.html">    (043)

I mentioned purple simply as an example of a small tool that doesn't
require a fullscale revision of authoring and reading and thus can be
used soon, to significant effect.    (044)
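The sentence-level addressing suggested in the quoted text could be sketched roughly as follows. This is an illustration under assumptions, not how purple numbers actually work: the sec/par/sent label format is taken from the quoted example, while the regular expression, the function names, and the idea of carrying the label as a URL fragment are hypothetical. The base URL is given once and each short label is resolved against it.

```python
import re
from urllib.parse import urljoin

# Base URL stated once, as the quoted text asks (the example host from the quote).
BASE = "http://www.foo.bar/doc.html"

# Hypothetical label format modelled on the quoted "sec4.1_par3_sent2".
LABEL = re.compile(r"^sec(?P<sec>\d+(?:\.\d+)*)_par(?P<par>\d+)_sent(?P<sent>\d+)$")

def parse_label(label):
    """Split a label into (section, paragraph number, sentence number)."""
    m = LABEL.match(label)
    if m is None:
        raise ValueError(f"not a sec/par/sent label: {label!r}")
    return m.group("sec"), int(m.group("par")), int(m.group("sent"))

def resolve(label, base=BASE):
    """Turn a short label into a full URL by attaching it to the base as a fragment."""
    parse_label(label)  # reject malformed labels before building a link
    return urljoin(base, "#" + label)

address = parse_label("sec4.1_par3_sent2")
link = resolve("sec4.1_par3_sent2")
```

A reviewer would then write only the short label in running text, and a tool would expand it against the shared base when rendering links.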

Thanks again for this continued conversation.    (045)

-- 
Chris Dent  <cdent@burningchrome.com>  http://www.burningchrome.com/~cdent/
"Mediocrities everywhere--now and to come--I absolve you all! Amen!"
 -Salieri, in Peter Shaffer's Amadeus    (046)