WEKA --- A Starter's Guide 


 

You are about to learn WEKA, a practical machine learning toolkit that will help you with the upcoming lab assignments. "WEKA" stands for the Waikato Environment for Knowledge Analysis, which was developed at the University of Waikato in New Zealand. WEKA is extensible and has grown into a collection of machine learning algorithms for solving real-world data mining problems. It is written in Java and runs on almost every platform.

WEKA is easy to use and can be applied at several different levels. You can access the WEKA class library from your own Java program, and you can implement new machine learning algorithms.

WEKA implements three major kinds of schemes: (1) schemes for classification, (2) schemes for numeric prediction, and (3) "meta-schemes".

Besides the actual learning schemes, WEKA also contains a large variety of tools for pre-processing datasets, so that you can focus on your algorithm without worrying too much about details such as reading the data from files, implementing filtering algorithms and providing code to evaluate the results.

 

We have installed one copy of the WEKA package in our CS573 course account. You don't have to install your own copy; by following the steps below, you will be able to run it:

1) Log into your CS account.

2) Create a directory called weka using: 

    > mkdir weka

    > cd weka

3) Create a symbolic link to our installed WEKA package:

    > ln -s /home/course/cs573x/weka-3-4/weka.jar

4) You can now run WEKA:

    > java -jar weka.jar

    *note: this command starts the GUI version of WEKA, so it will only work on Linux or Solaris workstations

5) Please create the following two links so that you can access the WEKA source code and some sample datasets.

    > ln -s /home/course/cs573x/weka-3-4/weka

    > ln -s /home/course/cs573x/weka-3-4/data
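Steps 2, 3 and 5 above can be collected into one small shell function. This is only a sketch under a few assumptions: link_weka is a hypothetical helper name (it is not part of WEKA), and it skips any link that already exists so that it can be rerun safely.

```shell
# Sketch: create a working directory holding links to a WEKA installation.
# link_weka is a hypothetical helper written for this guide, not a WEKA tool.
link_weka() {
    # $1 = directory of the installed WEKA package
    # $2 = working directory to create (for example, $HOME/weka)
    mkdir -p "$2" || return 1
    for name in weka.jar weka data; do
        # Leave an existing file or link untouched so reruns are harmless
        if [ ! -e "$2/$name" ] && [ ! -L "$2/$name" ]; then
            ln -s "$1/$name" "$2/$name" || return 1
        fi
    done
}
```

For example, running link_weka /home/course/cs573x/weka-3-4 "$HOME/weka" should leave the same three links that the manual steps create.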

The next step is to set the CLASSPATH environment variable before using the command line. The variable must give the location of the weka.jar file.

1) For sh, ksh and bash users:

  Please add "export CLASSPATH=/home/course/cs573x/weka-3-4/weka.jar:$CLASSPATH" to your shell configuration file.

2) For csh and tcsh users:

  Please add "setenv CLASSPATH /home/course/cs573x/weka-3-4/weka.jar" to your shell configuration file.

3) To test if it is set correctly:

Type "java weka.classifiers.trees.j48.J48 " (note the change from previous versions, in case you are familiar with any). It should display a list of all learning options for J48. If it displays an exception error message instead, check whether you set the environment variable correctly. Please also make sure that the path to Java is set correctly, so that the system can locate and run Java.
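Before running the Java test, it can help to confirm that CLASSPATH actually names weka.jar. The following is a minimal sketch for sh, ksh and bash; classpath_has_weka is a hypothetical helper (not part of WEKA) that only inspects the text of the variable and does not check that the file exists or that Java can load it.

```shell
# Sketch: report whether a CLASSPATH-style string names a weka.jar entry.
# classpath_has_weka is a hypothetical helper, not part of WEKA.
classpath_has_weka() {
    # Wrap the value in ':' so every entry is followed by a colon, then
    # look for an entry that is weka.jar or ends in /weka.jar.
    case ":$1:" in
        *:weka.jar:*|*/weka.jar:*) return 0 ;;
        *) return 1 ;;
    esac
}

if classpath_has_weka "${CLASSPATH:-}"; then
    echo "CLASSPATH mentions weka.jar"
else
    echo "CLASSPATH does not mention weka.jar"
fi
```

If the second message appears, the export line was probably not picked up by the current shell; opening a new login session or re-reading the configuration file is a reasonable first thing to try.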

Now the installation is done! You can run WEKA either from the command line or through the graphical user interface. Remember that the GUI runs only on a Linux or Solaris workstation; it will not work over telnet.

 

If you have a home computer and would like to install WEKA by yourself, please check the following:

There are two stable versions of WEKA to download: the self-extracting executable that includes the Java Virtual Machine 1.4 (weka-3-4jre.exe; 19,543,851 bytes), or the self-extracting executable without the Java VM (weka-3-4.exe; 6,467,165 bytes). This new version comes with the GUI, which provides the user with more flexibility than the command line.

After extracting the files, you will need to set your CLASSPATH environment variable to the complete path to weka.jar. For example, if you extracted WEKA to C:\Weka, add "C:\Weka\weka.jar;" to the list of values of the CLASSPATH environment variable in Windows.

If you don't have administrator privileges, you can still install WEKA. To do so, download the jar archive (weka-3-4.jar; 6,322,417 bytes). Make sure that Java J2SE 1.4 (available for download from Sun), which includes the jar utility, is installed on your system. Then open a command line console, change into the directory containing weka-3-4.jar, and enter

jar -xvf weka-3-4.jar

This will create a new directory called weka-3-4. To un-jar (install) the source code, change into the newly created weka-3-4 directory and type

jar -xvf weka-src.jar

This will create a new directory, weka, containing the source code. Since WEKA is open source software issued under the GNU General Public License, you can use and modify the source code under the terms of that license.

NOTE: It seems that Windows will not set up your CLASSPATH properly if any of the WEKA directory names contains spaces. Therefore, installing WEKA in the Program Files folder is not a good idea.

From your weka-3-4 directory, you will find:

The most detailed and up-to-date information can be found in the online documentation on the WEKA Web Site, which has extensive documentation and guides on installation and usage.


This page was written by Jun Zhang (updated by Facundo Bromberg, Oksana Yakhnenko and Rafael Jordan after version 3-4 was installed).

For questions, write to rjordan at iastate.edu