Re: [port-peer-review] reviews
Dear Chris, (01)
An answer to your question and a few comments on your comments. (02)
Cheers, (03)
Philippe (04)
> > I was more surprised by the absence of references to the use of
> > Formal Concept Analysis (or similar methods) for the
> > structuring and navigation of a base of documents and the
> > terms used for indexing it. Indeed, this approach has the
> > advantages of producing a genuinely understandable and
>
> According to whom? (05)
1) Myself. Even the "cosine measure", the simplest and
easiest-to-understand vector-space indexing technique,
is a statistical measure: the results are numbers whose
meaning is far from intuitive and which represent
a huge loss of information. In FCA, the extracted relations
between documents/terms are clear (simple subset relationships)
and are all kept: you can use them to compare any set of
documents/terms to another and hence, most often, you can use them
to regenerate (and explain) the particular views/clusters that
a particular LSA-style technique has encoded via numbers.
Furthermore, the LSA-style techniques actually used are much more
complex than the "cosine measure" and involve a lot of
tweaking of the weights to get acceptable
completeness/precision ratios for a given set of documents.
There is no such thing with FCA. The completeness/precision
ratio still applies because words (instead of category identifiers)
are used for the indexing, but it is less relevant. This ratio
is irrelevant with knowledge representations since you can then
have perfect precision and completeness (ok, not for "all"
questions, and only with respect to the closed world of a KB). (06)
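To make the contrast concrete, here is a small sketch in Python (the
toy documents and function names are mine, purely illustrative): the
cosine measure reduces the overlap of two documents to one
hard-to-interpret number, whereas the FCA view keeps the shared-term
sets and their subset relationships, which can then be shown and
explained.

  import math

  # Toy incidence data: which terms occur in which documents.
  docs = {"d1": {"bird", "fly", "wing"},
          "d2": {"bird", "fly"},
          "d3": {"fly", "plane"}}
  vocab = sorted(set().union(*docs.values()))

  def cosine(a, b):
      # Cosine measure over binary term vectors: a single number;
      # which terms actually overlap is lost.
      va = [t in docs[a] for t in vocab]
      vb = [t in docs[b] for t in vocab]
      dot = sum(x and y for x, y in zip(va, vb))
      return dot / math.sqrt(sum(va) * sum(vb))

  def shared_terms(doc_set):
      # The FCA view: the exact set of terms common to a set of
      # documents; concepts are ordered by plain subset inclusion.
      common = set(vocab)
      for d in doc_set:
          common &= docs[d]
      return common

  print(cosine("d1", "d2"))          # 0.816...: similar, but why?
  print(shared_terms({"d1", "d2"}))  # {'bird', 'fly'}: d2's terms are a
                                     # subset of d1's, an explicit relation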
2) The authors of an article that I read about two months ago
comparing FCA, LSA, the "cosine" measure, and another
clustering technique. Unfortunately, I forgot the reference.
FCA compared well (it refined and explained the clusters obtained
via the other techniques) but was not used alone: some pre-filtering of
the documents or terms to index was done via an LSA-style technique. (07)
> Once the knowledge has been formally represented it is possible
> to perform transformations and computations, but how does the
> knowledge go in? Unless I'm missing something (entirely possible,
> as the entry aspect of the examples was rather glossed)
> there is some step between identification of document elements (DE)
> and connecting "them to other DEs or statements" in the WebKB formal
> structure. The author acknowledges this step is "the most
> difficult and time-consuming for the users."
> http://lab.bootstrap.org/port/papers/2002/martin.html#nid013 (08)
The sentence you refer to was: "The use of formal or
semi-formal statements (instead of types) to represent some of the
content of DEs or other statements, connect them to other DEs or
statements, and make judgements or hypotheses about them, ...". (09)
I added "connecting DEs or statements" to be complete but this is
the easiest and least precise/knowledge-based/exploitable part.
Anyone can easily connect document elements (DEs) by rhetorical relations
(e.g. specialization, proof, example, ...), argumentation relations
(e.g. answer, contradiction) and the few more precise relations
(e.g. corrective_specialization and corrective_generalization)
proposed and exploited by WebKB-2 uses for reducing redundancy and
increasing consistency and comparability in the KB, and hence
cooperation between users.
Here, the formalism is simple enough not to be an obstacle.
Examples: [http://foo.bar/doc.html#nid43, example: "an example"];
or ["birds fly", corrective_specialization: "most health birds are
able to fly"]. Alternatively, connecting DEs by rhetorical relations
can be done via the interface of any good hypertext tool (this excludes
usual Web browsers).
The problem is that the granularity of such "representations" is far too
coarse. The content of the DEs is not represented (i.e. the actual
objects of interest and their interrelations are not made explicit) and
therefore there is almost nothing semantic to exploit for retrieving,
comparing and cross-linking information in answer to questions. (010)
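For what it is worth, here is a minimal sketch (illustrative Python,
not WebKB code) of such coarse links stored as triples; it also shows
the granularity problem, since the DEs themselves remain opaque strings.

  # Rhetorical/argumentation links between DEs, stored as triples.
  links = [("http://foo.bar/doc.html#nid43", "example", "an example"),
           ("birds fly", "corrective_specialization",
            "most healthy birds are able to fly")]

  def linked_from(source):
      # Return all (relation, target) pairs attached to a DE.
      return [(rel, tgt) for src, rel, tgt in links if src == source]

  print(linked_from("birds fly"))
  # [('corrective_specialization', 'most healthy birds are able to fly')]
  # The triples record THAT two DEs are related, but the content of each
  # DE stays an opaque string: nothing inside it can be queried.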
Hence, what I was mainly referring to was "to represent some of the
content of DEs or other statements". And that's what I illustrated
in the "4. Examples" section (019) of my article.
Such representations, I do agree, are difficult.
I emphasized this and explained why in that Examples section, e.g. in
http://lab.bootstrap.org/port/papers/2002/martin.html#nid031
In fact, I seriously doubt that scholars will take the time to create
representations like the ones I illustrated in the Examples section.
This is why I conclude my paper with "It may be that a future easier-to-use
and less knowledge-oriented version of WebKB-2 is required" (for PORT).
And I was thinking about concept maps, although I do not like them. (011)
I do not really understand what "missing step" you are looking for.
You want to represent something; WebKB proposes notations, conventions,
an ontology to extend, and procedures to check the entered knowledge.
There is not much choice left in the order of the operations:
1) selecting or creating the categories (concept/relation types, individuals),
2) writing statements using categories and quantifiers (sketched below).
The choice of categories and statements depends on what you want to express.
If you don't really know what to express, methodologies such as KADS might
guide you, but if you want to represent your thoughts (or, more precisely,
your sentences) you do not need methodologies. (012)
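As an illustration only (Python with made-up data structures, not
actual WebKB notation or checking code), the two operations and a
minimal category check might look like:

  # Step 1: select or create the categories.
  concept_types = {"bird": "animal", "wing": "body_part"}  # type -> supertype
  relation_types = {"part": ("animal", "body_part")}       # type -> signature

  # Step 2: write statements using the categories and quantifiers,
  # e.g. "every bird has some wing as part".
  statements = [("every", "bird", "part", "some", "wing")]

  # A minimal check that statements only use declared categories.
  for q1, c1, rel, q2, c2 in statements:
      assert c1 in concept_types and c2 in concept_types, "unknown category"
      assert rel in relation_types, "unknown relation type"
  print("all statements use declared categories")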
> Perhaps it is the nature of the beast,
> but the need for a "precise, flexible and
> exploitable-for-inferencing approach"
> (http://lab.bootstrap.org/port/papers/2002/martin.html#nid013)
> gets, to my eye, in the way of collaboration. Humans are not
> precise, computers are not flexible. (013)
Humans can be precise and, in my opinion, actual collaboration
(even without computers) requires it too. Most of the discussions
I hear (e.g. political, ethical, technical) go on and on, and
in circles, because of generalizations/imprecision. People
rarely contradict precise facts. Precise discussions quickly end
(often by recognizing that different people want different things). (014)
For me, the optimum collaboration between people would be achieved
by permitting them to store precise facts "at the right place"
in a "Semantic Web", that is, in a structured way, once and for all,
without repetition in many forms and in many documents, and such that:
- from any usual idea, one could see an organized collection of
pro/con facts or fact-supported hypotheses,
- from any usual goal, one could see an organized collection of
methods or tools permitting one to reach it,
- from any usual object/service, one could see an organized collection of
ways to acquire it, use it, learn its advantages/drawbacks, etc. (015)
This does NOT imply any central authority (anyone could add new
links/objects from any object), nor any centralized physical architecture
(as long as the Web services that store knowledge about a certain domain
piggy-back on each other, it does not matter where people store their
knowledge, or where they search). (016)
This may not be achievable but it's much more a social problem than a
technical problem. The point is that peer-to-peer discussions or
writing/reading articles are extremely inefficient and time-consuming
ways of publishing/searching precise and complete information compared
to doing it in a genuine Semantic Web (clearly not
the Semantic Web envisaged by Berners-Lee). In fact, each person would do
very little publishing since most of it would already have been written
by others. Again, this could only work with a "semantic" network of precise
nodes: it cannot work if the nodes can be collections of sentences
(e.g. paragraphs) or general sentences such as those that (have to) fill
research articles (including mine) or e-mails (including this one).
Linear writing is different from very-fine-grained hypertextual writing. (017)
Sorry for all that dreaming. (018)
> While it is true that computers must have formal constructs to
> operate, we deny ourselves the elegance of our humanity if we
> ignore the "interface and integration issues" which are the
> stress of this year's workshop
> (http://lml.bas.bg/iccs2002/PORT.html) and show that humans make
> funny and fuzzy associations. It's those fuzzy associations that
> I think are the best ones: they reach across voids and form new
> knowledge. (019)
Any example? (020)
> Formalisms amongst humans work in small groups and fail in the
> world at large. A group that insists on formalisms runs the risk
> of not reaching across voids, but only reaching within to find
> the same thing over and over again, thinking it new because it
> was found by a new type of link in the graph. (021)
Unless we are talking about different things, that is in
direct opposition to the vision presented above, which was supposed to
eliminate the current world of redundancy, repetition and difficulty of
getting/publishing information that is brought about by the (sole) use
of peer-to-peer communications, documents or other linear means of
communication. OK, I won't convince you. (022)
> Interestingly, I think small simple tools, such as the purple
> numbers used to provide a high(er) level of granularity of access
> to the papers in this review process promise a much more
> immediate style of progress that will accelerate (and bootstrap)
> inquiry. (023)
I'd have used them much more if they had permitted referring to sentences
instead of whole paragraphs. Something like sec4.1_par3_sent2 would suit
me, if I also had the possibility of providing the base URL just once,
e.g. as in HTML: <base href="http://www.foo.bar/doc.html"> (024)
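Resolving such references would be trivial; a hypothetical sketch in
Python (the ID scheme and names are mine):

  BASE = "http://www.foo.bar/doc.html"  # the base URL, provided just once

  def resolve(ref, base=BASE):
      # Expand a short sentence-level reference into a full URL.
      return base + "#" + ref

  print(resolve("sec4.1_par3_sent2"))
  # http://www.foo.bar/doc.html#sec4.1_par3_sent2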
> Waiting for the perfect knowledge representation will take forever. (025)
Having created my own "ideal" KRLs (Frame-CGs and Formalized English)
to solve the problems of KIF (too low-level) and CGLF (more high-level
but not precise and expressive enough, too much syntactic sugar, and
an unintuitive way of representing quantifiers), I have to disagree. :-)
Frame-CGs is very concise, while Formalized English looks like some
bad but understandable-by-anyone pidgin English and, unlike other
forms of structured English (e.g. ACE), is very expressive and does not
rely on ontological assumptions. (026)