Research
Current research focus: Learning from folksonomies.
What is a folksonomy?
According to Wikipedia, a folksonomy is “an Internet-based
information retrieval methodology consisting of collaboratively
generated, open-ended labels that categorize content such as Web pages,
online photographs, and Web links. A folksonomy is most notably
contrasted from a taxonomy in that the authors of the labeling system
are often the main users (and sometimes originators) of the content to
which the labels are applied. The labels are commonly known as tags and
the labeling process is called tagging.”
Notice that I used Wikipedia as the source of our definition. This is
not accidental: if we looked for “folksonomy” in the
latest Merriam-Webster dictionary, we would not find it (as of Jan
2007). This is an example of a much bigger phenomenon: the collective creation of knowledge.
Types of folksonomies.
Folksonomies have different characteristics depending on the design of
the system that supports the tagging process. The tagged objects range
from webpages to images, music, video, physical objects such as
products or locations, and other users. Tags can be labels, numerical
values (as in the case of ratings), icons, votes or geographical
information.
The potential value of folksonomies
Folksonomies are a real-world example of ontology creation through
emergent semantics: they arise naturally from the interaction of a
community of users rather than from a directed effort to create a
consistent and complete ontology of a domain. The directed approach,
favored by the proponents of the Semantic Web, faces two challenges:
the sheer size of the project and the inability to accommodate
fuzziness and change over time (ontological drift).
In a nutshell, folksonomies offer quick, practical and dynamic
classification. The drawbacks are synonymy, polysemy and the possible
semantic gap between tags.
It has been shown that, over time, stable tagging patterns emerge for a
particular object of interest (for example, a bookmarked webpage); in
other words, the vocabulary is self-limiting.
Machine learning and its applications for folksonomies:
I. Folksonomy visualization – data reduction - What are the proper ways in which folksonomies can be visualized?
II. Social network analysis - Find communities, opinion leaders, roles,
central actors, prestigious actors, interesting or valuable objects in folksonomies.
III. Search - Update search techniques so that, for a given query, factors such as
tags, users and the content of webpages are all taken into account.
IV. Structuring - Find
Past approaches and their limitations
Recent papers have already started to investigate methods for
representing and exploiting folksonomies through the two-way
relations between the users, tags and objects in the folksonomy.
However, the natural relations that appear in the context of
folksonomies are ternary relations of the form user-tag-object, which
cannot be represented by binary graph relations without losing information.
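To make the information loss concrete, here is a minimal sketch in Python. The two toy folksonomies and the helper pairwise_projections are invented for illustration and do not come from any real dataset. They contain different user-tag-object triples, yet their user-tag, user-object and tag-object projections are identical, so no pairwise graph can tell them apart.

```python
from collections import Counter

def pairwise_projections(triples):
    """Project (user, tag, object) triples onto the three two-way relations."""
    return (
        Counter((u, t) for u, t, o in triples),  # user-tag edges
        Counter((u, o) for u, t, o in triples),  # user-object edges
        Counter((t, o) for u, t, o in triples),  # tag-object edges
    )

# Hypothetical folksonomy A: u1 tags o1 with t1 and o2 with t2, u2 the reverse.
folksonomy_a = {
    ("u1", "t1", "o1"), ("u1", "t2", "o2"),
    ("u2", "t1", "o2"), ("u2", "t2", "o1"),
}

# Hypothetical folksonomy B: same users, tags and objects, tags swapped.
folksonomy_b = {
    ("u1", "t1", "o2"), ("u1", "t2", "o1"),
    ("u2", "t1", "o1"), ("u2", "t2", "o2"),
}

different_triples = folksonomy_a != folksonomy_b
same_projections = (
    pairwise_projections(folksonomy_a) == pairwise_projections(folksonomy_b)
)
print(different_triples, same_projections)
```

In A, user u1 calls object o1 "t1"; in B, the same user calls it "t2". That distinction survives only in the ternary representation.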
Solution - I will present a possible solution in an upcoming paper to be submitted to KDD 2007.
Spectral techniques for Machine Learning.
Reason: Spectral clustering has produced good results in machine vision and relational learning, and it scales fairly well.
Current plans: Adapt spectral techniques to folksonomies.
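For readers unfamiliar with spectral techniques, the following is a rough sketch of standard two-way spectral bisection applied to a tag co-occurrence graph. It is not the adaptation to ternary relations planned above; it works on a pairwise projection and is only meant to show the basic mechanism. The triples, the function names tag_graph and spectral_bisect, and the choice of shifted power iteration to obtain the Fiedler vector (the Laplacian eigenvector of the second-smallest eigenvalue) are all assumptions made for this illustration.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical (user, tag, object) assignments, invented for illustration.
triples = [
    ("ann", "python", "doc1"), ("ann", "code", "doc1"),
    ("bob", "python", "doc2"), ("bob", "code", "doc2"),
    ("bob", "programming", "doc2"),
    ("cat", "code", "doc3"), ("cat", "programming", "doc3"),
    ("dan", "recipe", "doc4"), ("dan", "food", "doc4"),
    ("eve", "recipe", "doc5"), ("eve", "food", "doc5"),
    ("eve", "cooking", "doc5"),
    ("fay", "food", "doc6"), ("fay", "cooking", "doc6"),
    ("gus", "programming", "doc7"), ("gus", "cooking", "doc7"),  # weak bridge
]

def tag_graph(triples):
    """Weight each tag pair by the number of objects carrying both tags."""
    tags_on = defaultdict(set)
    for _user, tag, obj in triples:
        tags_on[obj].add(tag)
    weights = defaultdict(float)
    for tags in tags_on.values():
        for a, b in combinations(sorted(tags), 2):
            weights[(a, b)] += 1.0
    return weights

def spectral_bisect(weights, iterations=300):
    """Split the nodes by the sign of the Fiedler vector of L = D - W."""
    nodes = sorted({t for pair in weights for t in pair})
    index = {t: i for i, t in enumerate(nodes)}
    n = len(nodes)
    lap = [[0.0] * n for _ in range(n)]
    for (a, b), w in weights.items():
        i, j = index[a], index[b]
        lap[i][j] -= w
        lap[j][i] -= w
        lap[i][i] += w
        lap[j][j] += w
    # Every eigenvalue of L is at most twice the largest degree, so
    # shift*I - L is positive semidefinite and its dominant eigenvector
    # orthogonal to the constant vector is the Fiedler vector of L.
    shift = 2.0 * max(lap[i][i] for i in range(n))
    v = [float(i) for i in range(n)]  # arbitrary non-constant start
    for _ in range(iterations):
        mean = sum(v) / n
        v = [x - mean for x in v]     # remove the constant component
        norm = sum(x * x for x in v) ** 0.5
        v = [x / norm for x in v]
        v = [shift * v[i] - sum(lap[i][j] * v[j] for j in range(n))
             for i in range(n)]
    mean = sum(v) / n
    v = [x - mean for x in v]
    left = {t for t in nodes if v[index[t]] < 0}
    return left, set(nodes) - left

left, right = spectral_bisect(tag_graph(triples))
print(sorted(left), sorted(right))
```

On this toy data the cut separates the programming-related tags from the cooking-related ones, crossing only the single bridging edge. A production implementation would more likely use a sparse eigensolver than hand-written power iteration.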
Some other things I'm working on:
Value-based Markov Blankets
Error localization in programs using MINCOST SATISFIABILITY
Some other things I'm interested in:
Wavelets and frames for Machine Learning.
Machine learning for Artificial Creativity - Learning Artistic Styles
- I'm especially interested in extracting visual styles from images
grouped by artist, school or technique. Introductory work in
this field dates back to the 70's, when George Stiny and James Gips introduced the idea of shape grammars.
However, an image has different levels of interpretation, such as its subject, which could be represented by the objects and their relative positions, and its technique, possibly represented by the colour histogram, colour density per unit of surface and so on.
Past results. With
TRIPPER we managed to learn useful abstract rules from data. This
is especially exciting for text mining, where the explicitness of
results is crucial. For example:
The two non-abstracted rules:
If document contains dollar and contains bank then subject is financial.
If document contains euro and contains bank then subject is financial.
are replaced by the more abstract:
If document contains currency and contains bank then subject is financial.
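The abstraction step in this example can be sketched in a few lines of Python. This is not TRIPPER's actual algorithm, only an illustration of how a term taxonomy lets two specific rules collapse into one. The taxonomy dictionary and the functions abstract_rules and classify are hypothetical names introduced here.

```python
# Hypothetical is-a taxonomy: maps a specific term to its abstract parent.
taxonomy = {"dollar": "currency", "euro": "currency"}

# Each rule is (set of required terms, predicted subject).
rules = [
    (frozenset({"dollar", "bank"}), "financial"),
    (frozenset({"euro", "bank"}), "financial"),
]

def abstract_rules(rules, taxonomy):
    """Replace each term by its parent (if any) and drop duplicate rules."""
    lifted = {
        (frozenset(taxonomy.get(term, term) for term in terms), label)
        for terms, label in rules
    }
    return sorted(lifted, key=lambda rule: (sorted(rule[0]), rule[1]))

def classify(words, rules, taxonomy):
    """Return the label of the first rule whose terms all match the document."""
    # A document matches an abstract term if it contains any of its children.
    expanded = set(words) | {taxonomy[w] for w in words if w in taxonomy}
    for terms, label in rules:
        if terms <= expanded:
            return label
    return None

abstract = abstract_rules(rules, taxonomy)
print(abstract)
print(classify({"euro", "bank", "loan"}, abstract, taxonomy))
print(classify({"river", "bank"}, abstract, taxonomy))
```

The single abstract rule covers both original rules, and it also covers any other term that the taxonomy files under currency.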