Iowa State University

email: flavian@cs.iastate.edu   phone: 515-294-7331
Flavian C. Vasile
Artificial Intelligence Research Laboratory

Department of Computer Science

Research 
Current research focus: Learning from folksonomies.
What is a folksonomy? According to Wikipedia, a folksonomy is “an Internet-based information retrieval methodology consisting of collaboratively generated, open-ended labels that categorize content such as Web pages, online photographs, and Web links. A folksonomy is most notably contrasted from a taxonomy in that the authors of the labeling system are often the main users (and sometimes originators) of the content to which the labels are applied. The labels are commonly known as tags and the labeling process is called tagging.”
Notice that I used Wikipedia as the source of this definition. This is not accidental: if we looked up “folksonomy” in the latest Merriam-Webster dictionary, we would not find it (as of Jan 2007). This is an example of a much bigger phenomenon: the collective creation of knowledge.

Types of folksonomies. Folksonomies have different characteristics depending on the design of the system that supports the tagging process. The tagged objects range from webpages to images, music, video, physical objects such as products or locations, and other users. Tags can be labels, numerical values (as in the case of ratings), icons, votes or geographical information.

The potential value of folksonomies
Folksonomies are a real-world example of ontology creation through emergent semantics: they arise naturally from the interactions of a community of users rather than from a directed effort to create a consistent and complete ontology of a domain. The directed approach, favored by proponents of the Semantic Web, is challenged by the sheer size of the project and by its inability to accommodate fuzziness and change over time (ontological drift).
In a nutshell, folksonomies offer quick, practical and dynamic classification. The drawbacks are synonymy, polysemy and the possible semantic gap between tags.
It has been shown that, over time, stable tagging patterns emerge for a particular object of interest (such as a bookmarked webpage); that is, the vocabulary is self-limiting.

Machine learning and its applications for folksonomies:
I. Folksonomy visualization and data reduction - What are the proper ways in which folksonomies can be visualized?
II. Social network analysis - Find communities, opinion leaders, roles, central actors, prestigious actors, interesting or valuable objects in folksonomies.
III. Search - Update search techniques so that, for a given query, factors such as tags, users and the content of webpages are all taken into account.
IV. Structuring - Find

Past approaches and their limitations
Recent papers have begun to investigate methods for representing and exploiting folksonomies through the pairwise (two-way) relations between the users, tags and objects in the folksonomy. However, the relations that naturally arise in folksonomies are ternary relations of the form user-tag-object, which cannot be represented by ordinary graph relations without losing information.
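The information loss can be illustrated with a small sketch. The user, tag and object names below are made up for illustration, and the helper pairwise_projections is a hypothetical name, not something taken from the papers mentioned above. The sketch builds two different sets of user-tag-object triples whose three pairwise projections are identical, so the pairwise relations alone cannot tell the two folksonomies apart.

```python
def pairwise_projections(triples):
    """Project (user, tag, object) triples onto the three pairwise relations."""
    user_tag = {(u, t) for u, t, _ in triples}
    user_object = {(u, o) for u, _, o in triples}
    tag_object = {(t, o) for _, t, o in triples}
    return user_tag, user_object, tag_object


# Two different toy folksonomies (all names are illustrative assumptions).
folksonomy_a = {
    ("u1", "t1", "o1"), ("u1", "t2", "o2"),
    ("u2", "t1", "o2"), ("u2", "t2", "o1"),
}
folksonomy_b = {
    ("u1", "t1", "o2"), ("u1", "t2", "o1"),
    ("u2", "t1", "o1"), ("u2", "t2", "o2"),
}

# The ternary relations differ, yet every pairwise projection coincides.
same_projections = pairwise_projections(folksonomy_a) == pairwise_projections(folksonomy_b)
print(folksonomy_a != folksonomy_b, same_projections)
```

In folksonomy_a, user u1 applied tag t1 to object o1; in folksonomy_b the same user applied t1 to o2 instead. Once the triples are split into pairs, that distinction is gone.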

Solution - I will present a possible solution in an upcoming paper to be submitted to KDD 2007.

Spectral techniques for Machine Learning.
Reason: The existing results of spectral clustering in machine vision and relational learning, and its fair scalability.
Current plans: Adapt spectral techniques to folksonomies.
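For readers unfamiliar with spectral techniques, here is a minimal sketch of the basic idea on an ordinary graph. It is not the folksonomy adaptation planned above. It splits a graph in two using the sign of the Fiedler vector (the eigenvector of the second-smallest eigenvalue of the graph Laplacian), approximated here by a shifted power iteration in pure Python. The function name fiedler_partition, the toy edge list and the iteration count are assumptions made for illustration.

```python
def fiedler_partition(n, edges, iterations=200):
    """Bipartition an undirected graph by the sign of an approximate Fiedler vector."""
    # Unnormalized graph Laplacian L = D - A as a dense matrix.
    lap = [[0.0] * n for _ in range(n)]
    for u, v in edges:
        lap[u][u] += 1.0
        lap[v][v] += 1.0
        lap[u][v] -= 1.0
        lap[v][u] -= 1.0

    # Eigenvalues of L are at most 2 * (max degree), so shift*I - L is
    # positive semidefinite and its dominant eigenvector is the constant one.
    shift = 2.0 * max(lap[i][i] for i in range(n))

    x = [float(i) for i in range(n)]  # deterministic, non-constant start
    for _ in range(iterations):
        # Remove the constant component so the iteration converges to the
        # eigenvector of the second-smallest eigenvalue of L instead.
        mean = sum(x) / n
        x = [xi - mean for xi in x]
        y = [shift * x[i] - sum(lap[i][j] * x[j] for j in range(n))
             for i in range(n)]
        norm = sum(yi * yi for yi in y) ** 0.5
        x = [yi / norm for yi in y]

    negative = {i for i in range(n) if x[i] < 0}
    return negative, set(range(n)) - negative


# Toy graph: two triangles (0-1-2 and 3-4-5) joined by the single edge 2-3.
EDGES = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
part_one, part_two = fiedler_partition(6, EDGES)
print(sorted(part_one), sorted(part_two))
```

On this toy graph the two triangles end up in separate parts. A practical implementation would use a sparse eigensolver rather than dense power iteration.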

Some other things I'm working on:
Value-based Markov Blankets
Error localization in programs using MINCOST SATISFIABILITY

Some other things I'm interested in:
Wavelets and frames for Machine Learning.

Machine learning for Artificial Creativity - Learning Artistic Styles - I'm especially interested in extracting visual styles from images grouped by artist, school or technique. Introductory work in this field dates back to the 1970s, when George Stiny and James Gips introduced the idea of shape grammars.
However, an image has different levels of interpretation, such as its subject, which could be represented by the objects and their relative positions, and its technique, possibly represented by the colour histogram, the colour density per unit of surface, and so on.


Past results. With TRIPPER we managed to learn useful abstract rules from data. This is especially exciting for text mining, where the explicitness of the results is crucial. For example:
The two non-abstracted rules:
If document contains dollar and contains bank then subject is financial.
If document contains euro and contains bank then subject is financial.
are replaced by the more abstract:
If document contains currency and contains bank then subject is financial.
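The replacement above can be mimicked with a toy sketch. This is not TRIPPER's actual algorithm, which is not described here. It assumes a hypothetical is-a taxonomy (parent_of) in which dollar and euro are the only children of currency, and it merges rules that differ in a single term only when every child of the shared parent is covered. The function name abstract_rules and the rule encoding are illustrative choices.

```python
from collections import defaultdict


def abstract_rules(rules, parent_of):
    """Merge rules differing in one term when all children of its parent occur."""
    children_of = defaultdict(set)
    for child, parent in parent_of.items():
        children_of[parent].add(child)

    # (parent, remaining terms, label) -> child terms seen in matching rules
    groups = defaultdict(set)
    for terms, label in rules:
        for term in terms:
            if term in parent_of:
                groups[(parent_of[term], terms - {term}, label)].add(term)

    result = set(rules)
    for (parent, rest, label), seen in groups.items():
        if seen == children_of[parent]:  # every child of the parent is covered
            for child in seen:
                result.discard((rest | {child}, label))
            result.add((rest | {parent}, label))
    return result


# Hypothetical taxonomy and the two rules from the example above.
parent_of = {"dollar": "currency", "euro": "currency"}
rules = {
    (frozenset({"dollar", "bank"}), "financial"),
    (frozenset({"euro", "bank"}), "financial"),
}

for terms, label in abstract_rules(rules, parent_of):
    print("If document contains", " and contains ".join(sorted(terms)),
          "then subject is", label)
```

Requiring full coverage of the children is a deliberately conservative choice for this sketch. A rule learner working from data could instead decide to generalize based on accuracy over the training examples.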