Perl data objects

Perl program format and execution

Perl is an interpreted language: your source code is executed directly by the Perl interpreter. The interpreter is just a program that reads your source code as text and executes your instructions. It is usually located at /usr/bin/perl but may be somewhere else on your system. To find out where Perl is installed, use the shell command 'which perl'.

 

Since it is the Perl interpreter that actually carries out your instructions, you need to link your source code to that interpreter somehow. You can always run your program by calling Perl directly and giving your source file as its argument: 'perl your_program.pl'. However, most people prefer to run Perl programs as if they were directly executable by the operating system. To achieve that, put a special first line in your Perl source code that names the interpreter which will read and execute the program. On most systems, this will be '#!/usr/bin/perl'. However, do use the 'which perl' command to find out where Perl is actually installed on your system. The file must also be marked executable, for example with 'chmod +x your_program.pl'.

 

For new users, it is recommended that you also add two pragma lines to your source code to prevent many unforeseen errors: 'use strict;' and 'use warnings;'. The first makes Perl refuse risky constructs, such as variables you never declared, and the second makes Perl warn you about suspicious code instead of silently guessing what you meant. Together, these two pragmas greatly help you locate programming bugs.
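Putting the three special lines together, a minimal program might look like the sketch below. The path /usr/bin/perl is only the common default, and the variable name and message text are arbitrary placeholders chosen for illustration.

```perl
#!/usr/bin/perl
# The line above must be the very first line of the file; adjust the path
# to whatever 'which perl' reports on your system.
use strict;      # refuse risky constructs such as undeclared variables
use warnings;    # report suspicious code, e.g. use of undefined values

# 'my' declares a variable; the name and text are placeholders.
my $message = "Perl is ready\n";
print $message;
```

Saved as your_program.pl, this can be run with 'perl your_program.pl', or directly as './your_program.pl' once the file has been made executable.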

Numerical and textual data

Perl is convenient because it does not make a hard distinction between numerical and textual data. You can write a number as 123 or as '123', and depending on how you use the data, Perl will convert the string '123' to the value 123 dynamically. Of course, if you know ahead of time that 123 will be used as a number, there is no reason to type it as a string and spend CPU time converting it later. In general, when a string is converted to a number, Perl uses the number at the start of the string, or zero if the string does not start with anything that looks like a number. If you try to convert 'abc' to a number, Perl will warn you about it, provided 'use warnings;' is in effect.
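The conversion rules can be tried out with a short sketch like the following. The variable names are made up for illustration, and the "isn't numeric" warnings are switched off for one block only so that the demonstration runs quietly.

```perl
use strict;
use warnings;

my $sum     = '123' + 1;      # the string '123' is used as the number 123
my $product = '3.5' * 2;      # decimal strings convert as well

my ($partial, $none);
{
    # Silence the "isn't numeric" warnings inside this block only;
    # normally you would want to see them.
    no warnings 'numeric';
    $partial = '12abc' + 0;   # leading digits are used, the rest is ignored
    $none    = 'abc' + 0;     # no leading number at all, so the result is 0
}

print $sum, ' ', $product, ' ', $partial, ' ', $none, "\n";   # prints "124 7 12 0"
```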

 

The standard mathematical operators tell Perl that you intend to use the data as numbers: +, -, *, / and % are all numerical operators. The numerical comparison operators <, <=, ==, !=, >= and > likewise tell Perl that you are comparing numbers. A separate set of operators, lt, le, eq, ne, ge and gt, tells Perl that you are comparing strings. String comparison and number comparison can give different results on the same data. For example, 2 is smaller than 10 when compared as numbers, but '2' is greater than '10' when compared as strings: strings are compared character by character, and the first character '2' sorts after the first character '1', which settles the comparison before the '0' is even looked at.
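The difference between the two families of comparison operators is easy to check with a small sketch. The variable names are invented for this illustration, and the ?: operator is used only to turn each comparison result into the word yes or no.

```perl
use strict;
use warnings;

# condition ? 'yes' : 'no' picks 'yes' when the condition is true.
my $num_less = (2 < 10)       ? 'yes' : 'no';   # as numbers, 2 is less than 10
my $str_less = ('2' lt '10')  ? 'yes' : 'no';   # as strings, '2' sorts after '1'
my $num_same = ('1.0' == 1)   ? 'yes' : 'no';   # equal when compared as numbers
my $str_same = ('1.0' eq '1') ? 'yes' : 'no';   # different when compared as strings

print $num_less, ' ', $str_less, ' ', $num_same, ' ', $str_same, "\n";   # prints "yes no yes no"
```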

 

Textual data needs to be quoted inside Perl to avoid confusing the interpreter. The command 'print print;' does not print the word print, because Perl reads the second 'print' as another call to the print function, but 'print 'print';' does, since the quotes tell Perl to treat print as a string. Perl has both single quotes and double quotes. Double-quoted strings are interpolated, which makes them the more commonly used form; single-quoted strings are taken literally. We will explain interpolation in a moment.

 

There are many functions that work on data and produce new data from it. For example, length('abc') tells you that the length of the string 'abc' is 3, substr('abc', 1, 1) gives you the second letter, b (offsets are counted from 0), and sin(3.14159) gives you the sine of 3.14159 radians. Almost all common mathematical functions are directly available in Perl. Use the 'perldoc' command to find out how to use a specific function, for example 'perldoc -f sin'.
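The functions named above can be tried in a few lines. This is only a sketch: the variable names are arbitrary, and sqrt is included as one more example of a built-in mathematical function.

```perl
use strict;
use warnings;

my $length = length('abc');         # number of characters: 3
my $second = substr('abc', 1, 1);   # offset 1, length 1: the letter b
my $sine   = sin(3.14159);          # very close to 0, since 3.14159 is nearly pi
my $root   = sqrt(16);              # square root: 4

print $length, ' ', $second, ' ', $root, "\n";   # prints "3 b 4"
print $sine, "\n";                               # a tiny positive number
```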

Scalar variables

A programming language would not be very useful if it only handled data written statically into the source code. The most important dynamic data object in Perl is the scalar variable, or simply variable. A variable name starts with a dollar sign, as in $variable. You have to declare a variable before using it (this is enforced by the 'use strict;' pragma mentioned earlier) by typing 'my $variable;'. To declare several variables at once, put them in parentheses: 'my ($i, $j, $k);'. To assign a value to a variable, use the = operator; note that it is different from the == comparison operator, which compares two values. The same variable can hold a string at one point and a number at another, as in '$variable = 'abc'; ...; $variable = 123;', although people generally intend one type of data for one variable. Naturally, by combining operators and functions, you can do a lot of computation and save the results in Perl variables:

  • $sin_value = sin(3.14159);
  • $combine = 'abc' . 'def';
  • $letter_2 = substr('abc', 1, 1);

Note that in the second example above, we used the concatenation operator . to combine two strings into one. In all examples above, the source data can come from other variables:

  • $sin_value = sin($radian);
  • $combine = $string1 . $string2;
  • $letter_2 = substr($some_string, 1, 1);
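The fragments above can be assembled into a complete program once every variable is declared with 'my', as 'use strict;' requires. The input values below are made up for illustration.

```perl
use strict;
use warnings;

# Source data held in variables (the values are arbitrary examples).
my $radian  = 3.14159 / 2;
my $string1 = 'abc';
my $string2 = 'def';

# Results computed from those variables.
my $sin_value = sin($radian);               # close to 1
my $combine   = $string1 . $string2;        # 'abcdef'
my $letter_2  = substr($combine, 1, 1);     # 'b'

# One variable can hold a string first and a number later.
my $variable = 'abc';
$variable = 123;
$variable = $variable + 1;                  # now 124

print $combine, ' ', $letter_2, ' ', $variable, "\n";   # prints "abcdef b 124"
```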

Why would you need a variable? When your program takes data from outside its source code, such as input files or the keyboard, the values are not known when you write the program, so you cannot write them into your computation directly. Instead, you store the data in variables and refer to the variables in your computation. Variables can be used almost everywhere that Perl expects data to work on, e.g., expressions, function arguments, the print command, etc.

String interpolation

With variables comes the possibility of string interpolation. In double-quoted strings, you can refer to variables by name just as you did in the examples above. Perl then replaces each variable name with the variable's value, leaving the other text in the string as is. For example, all of the following print "hello, world" to the screen:

  • print "hello, world\n";
  • my $string = "hello, world\n";
    print $string;
  • my $string = "hello, world";
    print "$string\n";

The last example uses string interpolation and appends the newline character to your string. String interpolation in double-quoted strings is one of the most versatile Perl features, and you will come to appreciate it when you need to format your output.
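To see how the two kinds of quotes differ, here is a small sketch that builds the same text both ways. The variable names and values are invented for the example.

```perl
use strict;
use warnings;

my $name  = 'world';
my $count = 3;

my $double = "hello, $name\n";              # $name is replaced by its value
my $single = 'hello, $name\n';              # single quotes keep the text as typed
my $mixed  = "$count lines read by $name";  # several variables in one string

print $double;          # prints "hello, world" followed by a newline
print $single, "\n";    # prints hello, $name\n exactly as written
print $mixed, "\n";     # prints "3 lines read by world"
```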

Input and output

To output data from your program, simply use the 'print' command as we have seen. There are more sophisticated output formatting commands that will be covered later in the class. To take input, you can start with the simplest input operator, <>, which is a pair of angle brackets with no space in between. Its content comes from different sources, depending on how you run your program:

  • If you run your program with input file names as arguments, the <> operator will take one line at a time from all files you named in the argument list, and finally return the 'undef' value to signal the end of all input:
    your_perl_program.pl input_file1 input_file2 input_file3
  • If you run your program without giving any argument, the <> operator will take its input from the standard input, which usually is your keyboard.
  • Finally, you can use the I/O redirection of your shell (tcsh, for example) to connect the standard input to an input file, like this:
    your_perl_program.pl < input_file1

The last form is less commonly used, since the first form already loads the content of a file.
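As a rough sketch of the <> operator at work, the program below reads the first two lines of its input. The helper name first_two_lines is hypothetical, and the guard on @ARGV (Perl's list of command-line arguments) is an assumption added so that the sketch only reads when input files are named and never waits on the keyboard.

```perl
use strict;
use warnings;

# Hypothetical helper: each use of <> returns the next input line,
# or undef once the input is exhausted.
sub first_two_lines {
    my $first  = <>;
    my $second = <>;
    return ($first, $second);
}

# @ARGV holds the file names given on the command line.
if (@ARGV) {
    my ($first, $second) = first_two_lines();
    print "1: $first"  if defined $first;    # lines keep their own newline
    print "2: $second" if defined $second;
}
```

Run as 'your_perl_program.pl input_file1', this prints the first two lines of input_file1, each with a short label in front.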

Default variable

Many Perl functions and operators fall back on a default variable when you do not specify one. This variable is $_ (that's an underscore after the dollar sign). For example, 'print;' on its own prints the content of $_. The input operator <> uses $_ in one special situation: when <> is the only thing in the condition of a 'while' loop (introduced in the next section), as in 'while (<>)', Perl stores each input line in $_ for you. Anywhere else, a bare '<>;' statement reads a line and throws it away, so you must assign the result yourself, as in 'my $input_line = <>;'. Many people prefer the convenience of the default variable, since it saves typing the whole assignment statement.
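The default variable is not tied to input; several built-in functions use it when called without an argument. The sketch below sets $_ by hand, purely for illustration, and shows print and length falling back on it.

```perl
use strict;
use warnings;

$_ = "hello, world\n";        # put a value into the default variable by hand

my $implicit = length;        # no argument given, so length works on $_
my $explicit = length($_);    # the same call written out in full

print;                        # no argument given, so print outputs $_
print $implicit, ' ', $explicit, "\n";   # prints "13 13"
```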

Basic looping construct

Each time the <> operator is evaluated, it reads one line from the input and returns it, so that you can store it in a variable. Since you do not know how many lines there are in your input, you cannot keep typing <>, <>, <>, ... to read line 1, line 2, line 3. This is the perfect case for a 'loop', which goes through all your input lines without repeating the <> operator, as in the following example:

  • while (<>) {
      print length($_), "\t$_";
    }

This three-line code block loops through all your input lines, assigning them one at a time to the $_ variable by default. Inside the loop, you print the length of the current line (i.e., $_ passed to the length function), add a tab character \t, and output the line itself. No extra newline is needed because <> reads each line, including its ending newline, into $_. Note that the length therefore counts the newline as well. When there are no more lines in the input, <> returns undef, which causes the 'while' loop to stop. We will talk more about looping constructs in later lectures.
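For comparison, the same loop can be written with the assignment and the end-of-input test spelled out, which is what Perl does behind the scenes for 'while (<>)'. In this sketch the loop is wrapped in a hypothetical helper, print_line_lengths, that also counts the lines; the guard on @ARGV is an assumption added so the sketch runs only when input files are named.

```perl
use strict;
use warnings;

# Hypothetical helper wrapping the loop from the text. It returns the
# number of lines processed so the result can be checked afterwards.
sub print_line_lengths {
    my $lines = 0;
    # 'while (<>)' is shorthand for this explicit assign-and-test form.
    while (defined($_ = <>)) {
        print length($_), "\t$_";   # length, a tab, then the line itself
        $lines++;
    }
    return $lines;
}

print_line_lengths() if @ARGV;      # only runs when input files are named
```

The test against undef, rather than against a false value, matters for a final line such as "0" with no trailing newline, which would otherwise end the loop one line early.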

 

Last modified July 13, 2007. All rights reserved.