Perl arrays
Why we need arrays
Previously, in Perl data objects, we mentioned that we need scalar variables to hold values that we do not know at programming time. For example, if we want to read input from a file, check the length of each line, and output the length together with the line, we need a variable to represent each input line, as in the following:
- while (<>) {
      print length($_), "\t$_";
  }
Here the scalar variable $_ automatically holds each input line in turn until the end of the input file. Now suppose that instead of the length of the whole line, we want to output the length of each word in each line. In other words, we want to map an input file with words on each line to their lengths on each line. In this case, plain scalar variables are not going to be very helpful. One might think that we can create scalar variables $word0, $word1, $word2, etc. to represent each word on a line. However, even if this might work on the input side, outputting these words would be a tedious typing task:
- print length($word0), " ", length($word1), " ", length($word2), ...
For as many words as might possibly occur in a line, you would need to add matching length($wordN), " " pairs to the print command. The extra " " separates the length numbers so they do not run together. Obviously, this is not the right way to do the job. We should use a Perl array to store the words from a line, and refer to each one by its index number like this: $words[$i].
Array syntax
Each element of an array is referred to by an index value, e.g., $words[0], $words[1], $words[2], etc. As mentioned previously, almost everywhere a value is expected, you can use a variable. Therefore, you can also write $words[$i], $words[$j], and $words[$k] to refer to the elements at indices $i, $j, and $k of the words array, whatever $i, $j, and $k may be. An array as a whole is written with the @ symbol, e.g., @words. There are situations where you manipulate an array as a whole, as in the following example:
- my @words = split " ", $_;
The split function separates a scalar value (here, $_) into its constituent parts based on a pattern. Here we used a special but simple pattern, a string of one space character, which means any run of whitespace characters. Normally, patterns are written between / characters, not string quotes. Patterns can be very powerful and complicated, and will be the topic of later lectures. The above example splits the line stored in $_ into individual words and returns them as a list; naturally, we store the list in the @words array so we can refer to the words later. This example shows one convenient feature of Perl arrays: the array size is dynamic. There can be any number of words in each line, and we do not know the number in advance, but we do not have to specify the length of the array because it is determined automatically by how many words the split function actually returns.
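To make the dynamic sizing concrete, here is a minimal sketch that splits two hard-coded strings instead of reading input. The strings and the array names @short and @long are made up for illustration only.

```perl
use strict;
use warnings;

# Two hypothetical input lines with different numbers of words.
my @short = split " ", "hello world\n";
my @long  = split " ", "  the quick   brown fox\n";

# Each array holds exactly as many elements as split returned;
# leading spaces, repeated spaces, and the newline are all discarded.
print "$short[0] $short[1]\n";   # prints "hello world"
print "$long[0] $long[3]\n";     # prints "the fox"
```

No array size was declared anywhere: @short ends up with two elements and @long with four.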
With the help of an array, the above problem can be solved much more simply. For example, to map words to their lengths in the output, we can use the following code:
- while (<>) {
      my @words = split " ", $_;
      for (@words) {
          print length($_), " ";
      }
      print "\n";
  }
The logical flow of this code is as follows. As before, we get each line into the $_ variable by way of the <> operator. We then split the line into words stored in the @words array. Then, for each word in the array, we output its length using another loop construct, the for loop. The for loop assigns each word in the @words array in turn to the default variable $_, and the print command outputs its length plus a space character. After the inner loop, we output a newline to end the output line, since the original newline in the input was discarded by the split function.
If you are used to the for loop construct in other programming languages such as C or Java, the equivalent version in Perl is the following:
- while (<>) {
      my @words = split " ", $_;
      for (my $i=0; $i<@words; $i++) {
          print length($words[$i]), " ";
      }
      print "\n";
  }
Here we declared a variable $i right inside the for loop initialization, and compared it against the size of the @words array. As long as $i is less than that size, we print the length of the word at index $i. The output works the same way as before: we add a space after each length value, and a newline after the loop. Note that in this case you must refer to each array element as $words[$i], since this form of the loop does not set the $_ variable.
There is also a special variable for each array that holds the index of the last element. It is always one less than the size of the array because array indices start from zero. For the @words array, this special variable is $#words. The dollar sign $ tells you that it is just another scalar value: the last index of the array with the same name.
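As a small illustration of the last-index variable, the following sketch uses a hard-coded three-element array (the contents are an assumption for the example) and loops with $i <= $#words, which visits the same elements as the $i < @words test used earlier.

```perl
use strict;
use warnings;

# A hypothetical array with three elements, so indices run 0..2.
my @words = ("alpha", "beta", "gamma");

my $last_index = $#words;                        # index of the last element
print "last index: ", $last_index, "\n";         # prints "last index: 2"
print "last word: ", $words[$#words], "\n";      # prints "last word: gamma"

# Loop over every valid index, from 0 up to and including $#words.
for (my $i = 0; $i <= $#words; $i++) {
    print $i, " ", $words[$i], "\n";
}
```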
Scalar and list context
If you are careful, you might have noticed that we are comparing a scalar variable $i with the array @words in the above example. Is that an error, and if not, what does it mean? To answer this, we need to understand the difference between scalar and list contexts. When the < operator is used with an array, it puts the array in scalar context, and in scalar context an array represents its length, not its elements. Thus, we are actually comparing the scalar variable $i with the length of the array @words, which is correct and commonly done. In other situations, an array is in list context and represents its whole content, as in the statement print @words; which passes the array to the print command.
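A minimal sketch of list context, assuming a small hard-coded @words array for illustration. Both print and an assignment to another array (the hypothetical @copy) see every element of @words:

```perl
use strict;
use warnings;

# Hypothetical contents, chosen only for the example.
my @words = ("to", "be", "or", "not");

# print supplies list context: every element is printed, back to back.
print @words;        # prints "tobeornot"
print "\n";

# Assigning to another array is also list context: all elements are copied.
my @copy = @words;
print $copy[3], "\n";    # prints "not"
```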
Remember that the print command can take a list of elements separated by commas and print each of them consecutively? It provides list context to the array @words as well, so all words of the array are printed one by one. Actually, they are printed without any space between them, so we may need to add space characters between the words using the join function, for example by giving join(" ", @words) to print instead of @words itself.
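The following sketch shows join putting separators between the elements, again on a hard-coded array. The variable name $line and the "-" separator in the second call are only illustrative.

```perl
use strict;
use warnings;

# Hypothetical contents, chosen only for the example.
my @words = ("to", "be", "or", "not");

# join glues the elements together with its first argument in between.
my $line = join " ", @words;
print $line, "\n";                 # prints "to be or not"

# Any separator string works, not just a space.
print join("-", @words), "\n";     # prints "to-be-or-not"
```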
The join function is the counterpart of the split function; it connects all items in the @words array back into one long string, using its first argument as the separator. Although we mentioned the difference between scalar and list context here, in most cases you do not need to worry about it, since the particular usage of an array automatically dictates one of the two contexts, e.g., the comparison operator < supplies scalar context, and the print command supplies list context. In special situations, however, you may want to change the context manually. For example, if you wish to output the length of an array, you have to force scalar context even within the print command list. The scalar keyword achieves that: writing scalar @words in the print list outputs the number of elements rather than the elements themselves.
You may omit the scalar keyword by first saving the length of the array in a scalar variable and then printing that variable instead. In the following example, $array_size is passed to print as a one-element list and printed:
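A short sketch of forcing scalar context inside print, assuming the same kind of hard-coded @words array as an illustration:

```perl
use strict;
use warnings;

# Hypothetical contents, chosen only for the example.
my @words = ("to", "be", "or", "not");

# List context (the default inside print): the elements themselves.
print @words, "\n";            # prints "tobeornot"

# Forced scalar context: the number of elements.
print scalar(@words), "\n";    # prints "4"
```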
- my $array_size = @words;
print $array_size;
Perhaps the most notorious confusion between scalar and list contexts is with the reverse function. The reverse function reverses all characters in a string under scalar context, or the order of all elements in a list under list context. When given just one argument, however, it is not obvious which context it is in!
You will not see any difference in the output of a command such as print reverse "abc"; because reverse is in list context due to the print command. It therefore treats its argument as a one-element list and reverses the list. Since there is only one element, the reversed list is the same as the original, so nothing seems to have changed. To be sure that reverse is really doing something, try it on a two-element list:
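The one-argument case can be sketched as follows. The variable names are hypothetical; the context comes from what the result is assigned to, or from print.

```perl
use strict;
use warnings;

my $word = "abc";    # hypothetical single argument

# Assigning to an array gives list context: a one-element list, unchanged.
my @as_list = reverse $word;

# Assigning to a scalar gives scalar context: the characters are reversed.
my $as_string = reverse $word;

print reverse($word), "\n";    # list context from print: prints "abc"
print $as_string, "\n";        # prints "cba"
```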
- print reverse "abc", "def";
This time, it reverses the order of the two words in the output, but not their characters. To actually reverse the characters, you need to force scalar context, as when printing the length of an array:
- print scalar reverse "abc", "def";
The map function
In the word length example above, we applied the length function to each element of the array and printed each result individually. Since that kind of job is very common, there is a special function, map, that can convert words to their lengths in one command. Coupled with the join function, it achieves the same thing with the following shorter code:
- while (<>) {
      my @words = split;
      print join (" ", map { length; } @words), "\n";
  }
The parentheses are necessary so that the newline at the end is not swallowed into the list given to map. The map function takes a fragment of Perl code in braces { }, in this case the length function, and applies it to each word in the @words array. It returns a new list of length values, which is immediately taken up by the join function to form a single string of values separated by spaces, and finally the print command prints that string together with a newline. Since split uses " " and $_ as its default arguments, we may omit those arguments as well.
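The same split, map, and join pipeline can be tried without an input file. The sketch below wraps it in a small subroutine; the name word_lengths and the sample string are assumptions made for this example, not part of the lecture code.

```perl
use strict;
use warnings;

# Hypothetical helper: turn a line of words into a line of word lengths.
sub word_lengths {
    my ($line) = @_;
    my @words = split " ", $line;              # line -> list of words
    return join(" ", map { length } @words);   # words -> lengths -> string
}

print word_lengths("the quick brown fox\n"), "\n";   # prints "3 5 5 3"
```

Inside the map block, length with no argument operates on $_, which map sets to each word in turn.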
Sorting and searching in arrays
Two other common jobs with arrays are sorting the elements of an array and finding elements in an array. To sort the elements, use the sort function. Like map, it takes a code block in braces { }; here the block compares two array elements, temporarily named $a and $b, and returns -1, 0, or 1 depending on whether $a should come before, is the same as, or should come after $b in the sorted list. For example, to sort an array of words alphabetically, use one of the following commands:
- sort { if ($a lt $b) { -1 } elsif ($a eq $b) { 0 } else { 1 } } @words;
- sort { $a cmp $b } @words;
- sort @words;
The first line spells out the comparisons directly: if $a is less than $b, you get -1, and if they are equal, you get 0. Otherwise, you get 1. Since this kind of comparison is very common, Perl has a special comparison operator, cmp, that compares two strings and returns -1, 0, or 1 depending on their order. The similar comparison operator for numbers is <=>. The second line above uses the cmp operator in lieu of the lengthy if-elsif-else code. Finally, since the default is to sort words in ascending string order, you can even omit the code block as in line 3 above, and Perl will automatically apply the default cmp comparison for you. Of course, if you want to sort things in descending order or in other fancy ways, you need to provide your own code block to determine the order between two items. Note that sort returns a new sorted list and leaves the original array unchanged, so you normally assign the result to an array, e.g., @sorted = sort @words.
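The sketch below shows a few custom orderings on a hard-coded array. The words and the result array names are made up for illustration. Swapping $a and $b gives descending order, and <=> compares the numeric lengths.

```perl
use strict;
use warnings;

# Hypothetical words to sort.
my @words = ("pear", "apple", "fig", "banana");

# sort returns a new list; @words itself is left unchanged.
my @ascending  = sort { $a cmp $b } @words;
my @descending = sort { $b cmp $a } @words;

# Sort by length (numeric <=>), falling back to cmp for equal lengths.
my @by_length = sort { length($a) <=> length($b) or $a cmp $b } @words;

print join(" ", @ascending), "\n";    # prints "apple banana fig pear"
print join(" ", @descending), "\n";   # prints "pear fig banana apple"
print join(" ", @by_length), "\n";    # prints "fig pear apple banana"
```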
To find elements in an array, just use a loop. For example, the following pseudo code (meaning that it offers concept but may not be actually executable code) will allow you to look up values and do something about them:
- for (@arrays) {
      if ($_ contains something I am looking for) {
          do something about $_;
      }
  }
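One concrete way to fill in the pseudo code is sketched here, under the assumption that "contains something" means "contains a given substring" and that "do something" means "collect the match". It uses the built-in index function, which returns -1 when the substring is absent, and push, which appends to an array. The array contents and the name @found are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical data and search string.
my @words  = ("apple", "banana", "cherry", "date");
my $needle = "an";

my @found;
for (@words) {
    # index returns the position of $needle in $_, or -1 if it is absent.
    if (index($_, $needle) >= 0) {
        push @found, $_;          # "do something": remember the match
    }
}

print join(" ", @found), "\n";    # prints "banana"
```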
There are more array-related techniques and functions than we can cover here, but the above should be enough to solve many problems you will encounter. Later lectures will offer more examples.