Book Reviews

The following book reviews are the copyright of their respective authors and no part should be reproduced without the express permission of the author. Publishers and Authors of the books reviewed may reproduce the whole or extracts of a review for their book. To request copyright permission please email webmaster@birmingham.pm.org.

All the reviews herein are the opinions of the reviewer and are not necessarily the views of Birmingham Perl Mongers and its members. If you feel a review or comment has been made in error, please contact webmaster@birmingham.pm.org to rectify the situation.

Perl Books

Static Link: http://birmingham.pm.org/reviews/11

 
Perl and LWP
Title: Perl and LWP
Author(s): Sean M. Burke
ISBN: 0-596-00178-9
Publisher: O'Reilly Media
Reviewer: Barbie

Reading the POD entries for the entire LWP suite of modules can be mind blowing. The aim of this book is to demystify the use of the LWP suite, and to separate what you need to know from what lies beneath and just does the job.

The first few chapters cover much of the basics and introduce concepts, such as Cookies, which are covered in more depth later. We are introduced to GET and POST via the LWP::Simple module. This is followed by the $browser object via LWP::UserAgent and HTTP::Response, which are neatly wrapped up into the single LWP module.

The methods are explained in reasonable depth, as are the parameters and return values. One aspect which is covered very well is the ability to read and change the HTTP header lines. Much of the server and response interrogation is concerned with the header attributes, and getting the best out of them can be very useful. There are many functions that analyse the response and content, of which the important ones are dealt with here. Handling success or failure, redirection or authentication, it's all covered.
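As a rough illustration of the header handling described above, here is a minimal sketch that is not taken from the book. It builds a request and a response by hand so that it runs without a network connection; the URL, the agent name and the response content are invented for the example.

```perl
use strict;
use warnings;
use HTTP::Request;
use HTTP::Response;

# Build a request and set a couple of header lines on it.
# In real use this would be handed to an LWP::UserAgent object
# via $browser->request($request).
my $request = HTTP::Request->new(GET => 'http://www.example.com/');
$request->header('Accept-Language' => 'en-gb');
$request->header('User-Agent'      => 'ExampleFetcher/0.1');   # hypothetical agent name

# A hand-made response, standing in for what a server would return.
my $response = HTTP::Response->new(
    200, 'OK',
    [ 'Content-Type' => 'text/html; charset=UTF-8' ],
    '<html><body>Hello</body></html>',
);

# Interrogate the response: success or failure, status and headers.
my $status = $response->status_line;
my $type   = $response->content_type;    # media type only, without the charset

if ($response->is_success) {
    print "Fetched OK ($status), content type $type\n";
}
else {
    print "Request failed: $status\n";
}
```

The same header() and is_success() calls apply to a response returned from a live request.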

Next up are URLs. Interrogating URLs can be a tricky business, as there are some very longwinded rules as to how a URL can be broken up. Thankfully the URI module is our friend. Splitting URLs, or even changing them on the fly, can again be very useful, particularly if you are writing a Fetch style application or are buried deep in an Apache module. Being able to build query strings and just rely on the URI module to get it right is quite a relief.
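To give a flavour of what the URI module does, here is a small sketch (not an example from the book) that splits a URL into its parts and builds a query string. The URL and the form field names are made up for the illustration.

```perl
use strict;
use warnings;
use URI;

# Split a URL into its component parts.
my $url    = URI->new('http://www.example.com/search');
my $scheme = $url->scheme;
my $host   = $url->host;
my $path   = $url->path;

# Build the query string from name/value pairs; the URI module
# takes care of escaping the space and ampersand in the value.
$url->query_form(q => 'perl & lwp', page => 2);
print "$url\n";

# Reading the pairs back returns the original, unescaped values.
my %form = $url->query_form;
print "Searching $host$path for '$form{q}'\n";
```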

Forms are the next subject on the menu, and this is perhaps the only part of the book that made me wonder "why?". The first part of the chapter looks at all the possible form fields, and how they look in HTML. If someone is screen scraping, they should know all this, and if they don't then they really need a decent HTML book first. If you're not screen scraping, do you need to know what HTML tags and attributes look like? To my mind, when creating a query pair I just need to know whether I need a name and value or a name and list. Perhaps passing a filename from a FILE field is the only tricky one. The Perl code for the chapter does explain what to do very well, but I just felt the first part of the chapter was a bit needless.

The next two chapters deal with HTML processing, or screen scraping. The first shows the methods used when utilizing regular expressions to match characters, whereas the second shows you how to use token parsing, using HTML::TokeParser. Both have their place and the examples used show the usage of each method very clearly.

In many instances you will likely use a combination of both methods, and in the next chapter the author does just that. By taking the preceding two chapters and showing how to use them in real world examples, we get to see how well they work together. The chapter highlights some of HTML::TokeParser's special qualities, such as how get_text() and get_trimmed_text() treat IMG tags as virtual text, and use the contents of the ALT attribute as if the tag were just a text entry. Not something you automatically think about.
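The ALT behaviour is easy to see in a few lines. This is a minimal sketch using an invented HTML fragment rather than anything from the book, and it relies on HTML::TokeParser's default settings.

```perl
use strict;
use warnings;
use HTML::TokeParser;

# An invented fragment: an image sitting in the middle of a sentence.
my $html = '<p>Visit  <img src="logo.png" alt="Birmingham.pm">  today</p>';

# Parse from a string by passing a reference to it.
my $parser = HTML::TokeParser->new(\$html)
    or die "Cannot create parser";

# Skip forward to the opening P tag, then collect the text up to
# its closing tag. The IMG tag contributes its ALT text to the result.
$parser->get_tag('p');
my $text = $parser->get_trimmed_text('/p');

print "$text\n";
```

With the defaults in place, the text returned includes the ALT value between "Visit" and "today", with the surrounding whitespace collapsed.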

However, in and of themselves, token parsing and regular expressions can be a longwinded way to get at what you want. Thankfully there is a module that can help simplify things further, the author's very own HTML::TreeBuilder. HTML::TreeBuilder together with HTML::Element are explained in the next two chapters. Taking the examples used in the earlier chapters, we are shown a set of simpler methods to extract what we want. The chapters do go further and highlight the ability to totally transform an HTML page by juggling around the tags and nodes.
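For comparison, here is a short sketch of the tree approach, again using an invented page rather than one of the book's examples. It pulls out every link with its text.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# An invented page with two links.
my $html = '<html><body><p>See <a href="http://www.perl.org/">Perl</a> '
         . 'and <a href="http://www.cpan.org/">CPAN</a>.</p></body></html>';

# Parse the whole document into a tree of HTML::Element nodes.
my $tree = HTML::TreeBuilder->new_from_content($html);

# look_down() finds every element matching the criteria, here all A tags.
my @links = map { [ $_->attr('href'), $_->as_text ] }
            $tree->look_down(_tag => 'a');

print "$_->[1] => $_->[0]\n" for @links;

# Free the tree explicitly once finished with it.
$tree->delete;
```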

Then on to Cookies and Authentication. Both are looked at briefly in earlier chapters, but here they are given a bit more depth. There are many instances when you may wish to access a secure site to reap pages; this chapter shows you how.

Finally, Spiders make an appearance. The preceding chapters provide the building blocks to build your own spider, which is one of the most commonly written applications of LWP. Spiders come in several different flavours, but essentially they all pull back pages and search them for links. How detailed that goes is an exercise for the reader.

The Appendices I found useful long before I'd even got halfway through the book. The LWP Module list contains all the public functions contained within the LWP bundle, although many of them you'll never directly use. It is also good to have the HTTP Status Codes, Common MIME Types, Language Tags and Common Content Encodings gathered together, as many times I've found myself off searching the web to remind myself of what they are. Of most use to me has been the ASCII table. Not only does it include the most frequently used characters in the ASCII table, it also provides the Decimal, Octal, Hexadecimal, raw encoding and UTF8 encoding, as well as the symbolic representation and HTML entity code of the character where applicable.

The last Appendix is an article the author wrote for The Perl Journal. On the face of it, the article does seem like filler text. However, as it deals with Object Orientated Programming and much of the LWP interface relies on OOP, users who haven't had much experience of this style of programming will find this appendix a worthwhile read.

On the whole this book covers a lot of ground very quickly. However, many of the examples have been styled to teach you as much as possible without going too far. If you're happy to wade through the POD documents accompanying the associated modules, then this book will be a bit tame. However, if you are new to LWP or have found some of its construction and usage difficult, then you'll love it.