DOC: perldelta.pod for 5.005_55 

=head1 NAME

perldelta - what's new for perl5.006 (as of 5.005_55)

=head1 DESCRIPTION

This document describes differences between the 5.005 release and this one.

=head1 Incompatible Changes

=head2 Perl Source Incompatibilities

None known at this time.

=head2 C Source Incompatibilities

=over 4

=item C<PERL_POLLUTE>

Release 5.005 grandfathered old global symbol names by providing preprocessor
macros for extension source compatibility.  As of release 5.006, these
preprocessor definitions are not available by default.  You need to explicitly
compile perl with C<-DPERL_POLLUTE> in order to get these definitions.

=item C<PERL_POLLUTE_MALLOC>

Enabling the use of Perl's malloc in release 5.005 and earlier caused
the namespace of system versions of the malloc family of functions to
be usurped by the Perl versions of these functions, since they used the
same names by default.

Besides causing problems on platforms that do not allow these functions to
be cleanly replaced, this also meant that the system versions could not
be called in programs that used Perl's malloc.  Previous versions of Perl
have allowed this behavior to be suppressed with the HIDEMYMALLOC and
EMBEDMYMALLOC preprocessor definitions.

As of release 5.006, the functions in Perl's malloc family have default names
distinct from the system versions.  You need to explicitly compile perl with
C<-DPERL_POLLUTE_MALLOC> in order to get the older behavior.  HIDEMYMALLOC
and EMBEDMYMALLOC have no effect, since the behavior they enabled is now
the default.

Note that these functions do B<not> constitute Perl's memory allocation API.
See L<perlguts/"Memory Allocation"> for further information about that.

=item C<PL_na> and C<dTHR> Issues

The C<PL_na> global is now thread local, so a C<dTHR> declaration is needed
in the scope in which it appears.  XSUBs should handle this automatically,
but if you have used C<PL_na> in support functions, you either need to
change the C<PL_na> to a local variable (which is recommended), or put in
a C<dTHR>.

=back

=head2 Compatible C Source API Changes

=over

=item C<PATCHLEVEL> is now C<PERL_VERSION>

The cpp macros C<PERL_REVISION>, C<PERL_VERSION> and C<PERL_SUBVERSION>
are now available by default from perl.h, and reflect the base revision,
patchlevel and subversion respectively.  C<PERL_REVISION> had no
prior equivalent, while C<PERL_VERSION> and C<PERL_SUBVERSION> were
previously available as C<PATCHLEVEL> and C<SUBVERSION>.

The new names cause less pollution of the cpp namespace, and reflect what
the numbers have come to stand for in common practice.  For compatibility,
the old names are still supported when patchlevel.h is explicitly
included (as required before), so there is no source incompatibility
due to the change.
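From Perl code, the same three numbers can be read through the core Config module. The sketch below assumes a current perl, where %Config carries keys named after the macros (C<PERL_REVISION>, C<PERL_VERSION>, C<PERL_SUBVERSION>), and checks them against C<$^V>, the running interpreter's version.

```perl
use strict;
use warnings;
use Config;

# The %Config keys carry the same names as the cpp macros in perl.h.
my ($revision, $version, $subversion) =
    @Config{qw(PERL_REVISION PERL_VERSION PERL_SUBVERSION)};

# $^V holds the running perl's version as a v-string; "%vd" joins its parts with dots.
my $running = sprintf "%vd", $^V;

print "perl.h macros: $revision.$version.$subversion\n";
print "running perl:  $running\n";
```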

=back

=head2 Binary Incompatibilities

This release is not binary compatible with the 5.005 release and its
maintenance versions.

=head1 Core Changes

=head2 Binary numbers supported

Binary numbers are now supported as literals, in C<printf> and C<sprintf>
formats (C<%b>), and by C<oct()>:

        $answer = 0b101010;
        printf "The answer is: %b\n", oct("0b101010");

=head2 syswrite() ease-of-use

The length argument of C<syswrite()> is now optional.
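A minimal sketch of the shorter call, run against a scratch file from the core File::Temp module (used here only to have somewhere to write): with the length left out, C<syswrite()> writes the whole buffer and returns the number of bytes written.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# A scratch file to write to; UNLINK removes it when the program exits.
my ($fh, $path) = tempfile(UNLINK => 1);

my $data = "hello, world\n";

# LENGTH omitted: syswrite() writes all of $data (unbuffered).
my $written = syswrite($fh, $data);
defined $written or die "syswrite failed: $!";

# The three-argument form still works and can write just a prefix.
my $prefix = syswrite($fh, $data, 5);
defined $prefix or die "syswrite failed: $!";
close $fh or die "close failed: $!";

print "wrote $written bytes, then $prefix more\n";
```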

=head2 64-bit support

64-bit support has improved, but full support is still a distant goal.
You must run Configure with C<-Duse64bits> to make it probe for the
extent of 64-bit support.  Depending on the platform (its hints file),
more or less 64-bit awareness becomes available.  As of 5.005_54, the
platforms that are at least somewhat 64-bit aware include HP-UX 11 or
better, Solaris 2.6 or better, and IRIX 6.2 or better.  Naturally 64-bit
platforms such as Digital UNIX and UNICOS also have 64-bit support.
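To see what a given build ended up with, the Config module reports the values Configure recorded. This sketch reads C<ivsize> (bytes in perl's native integer) and C<use64bitint>, the switch name used by later releases; treat the latter as an assumption for this development release, where the option was spelled C<-Duse64bits>.

```perl
use strict;
use warnings;
use Config;

# ivsize: bytes in perl's native integer type (IV), as chosen at Configure time.
my $ivsize = $Config{ivsize};

# use64bitint is the later releases' name for the switch (assumption for this
# release); it is undefined in builds that did not enable it.
my $use64 = defined $Config{use64bitint} ? $Config{use64bitint} : 'undef';

printf "IV is %d bits (use64bitint=%s)\n", 8 * $ivsize, $use64;
```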

=head2 Better syntax checks on parenthesized unary operators

Expressions such as:

        print defined(&foo,&bar,&baz);
        print uc("foo","bar","baz");
        undef($foo,&bar);

used to be accidentally allowed in earlier versions, and produced
unpredictable behavior.  Some of them produced ancillary warnings
when used in this way, while others silently did the wrong thing.

The parenthesized forms of most unary operators that expect a single
argument will now ensure that they are not called with more than one
argument, making the above cases syntax errors.  Note that the usual
behavior of:

        print defined &foo, &bar, &baz;
        print uc "foo", "bar", "baz";
        undef $foo, &bar;

remains unchanged.  See L<perlop>.
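A sketch of the difference on a current perl: the parenthesized call is compiled inside a string C<eval> so the new compile-time error can be inspected, while the unparenthesized form still applies C<uc> only to its first operand.

```perl
use strict;
use warnings;

# The parenthesized form with surplus arguments is a compile-time error,
# so compile it with a string eval and look at the result.
my $compiled = eval q{ my $s = uc("foo", "bar", "baz"); 1 };
my $error    = $compiled ? '' : $@;

# Without parentheses, uc() takes only its first operand; the rest of the
# comma-separated list is left alone.
my @list = (uc "foo", "bar", "baz");

print $compiled ? "compiled\n" : "syntax error: $error";
print "@list\n";
```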

=head2 Improved C<qw//> operator

The C<qw//> operator is now evaluated at compile time into a true list
instead of being replaced with a run time call to C<split()>.  This
removes the confusing behavior of C<qw//> in scalar context, which the
older implementation inherited from C<split()>.

Thus:

    $foo = ($bar) = qw(a b c); print "$foo|$bar\n";

now correctly prints "3|a", instead of "2|a".

=head2 pack() format 'Z' supported

The new format type 'Z' is useful for packing and unpacking null-terminated
strings.  See L<perlfunc/"pack">.
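A short sketch of 'Z' in both directions. The truncation behaviour shown for an explicit count (room is always kept for the trailing null) is as documented for current perls; check L<perlfunc/"pack"> for the release you run.

```perl
use strict;
use warnings;

# "Z*" packs the string followed by a single null byte.
my $packed = pack "Z*", "hello";

# With an explicit count, "Z" keeps the last byte for the null,
# so at most count-1 characters of the string survive.
my $fixed = pack "Z5", "hello";

# Unpacking "Z*" stops at the first null byte.
my ($first) = unpack "Z*", "hello\0world";

printf "%d bytes, %d bytes, '%s'\n", length $packed, length $fixed, $first;
```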

=head1 Significant bug fixes

=head2 E<lt>HANDLEE<gt> on empty files

With C<$/> set to C<undef>, slurping an empty file returns a string of
zero length (instead of C<undef>, as it used to) the first time the
HANDLE is read.  Subsequent reads yield C<undef>.

This means that the following will append "foo" to an empty file
(previously it did nothing):

    perl -0777 -pi -e 's/^/foo/' empty_file

Note that the behavior of:

    perl -pi -e 's/^/foo/' empty_file

is unchanged (it continues to leave the file empty).
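The same behaviour seen from inside a script, as a sketch: an empty scratch file (created with the core File::Temp module, an assumption of this example) is read twice in slurp mode, which is the mode C<-0777> selects.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Create an empty file.
my ($tmp, $path) = tempfile(UNLINK => 1);
close $tmp;

open my $in, '<', $path or die "open $path: $!";

# Slurp mode: $/ undefined for the duration of this block.
my ($first, $second);
{
    local $/;
    $first  = <$in>;   # first read of an empty file: ""
    $second = <$in>;   # later reads: undef
}
close $in;

printf "first:  %s\n", defined $first  ? "'$first'"  : 'undef';
printf "second: %s\n", defined $second ? "'$second'" : 'undef';
```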

=head2 pack() format modifier '_' supported

The new format type modifier '_' is useful for packing and unpacking
native shorts, ints, and longs.  See L<perlfunc/"pack">.

=head1 Supported Platforms

=over 4

=item *

VM/ESA is now supported.

=item *

Siemens BS2000 is now supported.

=item *

The Mach CThreads (NeXTstep) are now supported by the Thread extension.

=back

=head1 New tests

=over 4

=item   op/io_const

IO constants (SEEK_*, _IO*).

=item   op/io_dir

Directory-related IO methods (new, read, close, rewind, tied delete).

=item   op/io_multihomed

INET sockets with multi-homed hosts.

=item   op/io_poll

IO poll().

=item   op/io_unix

UNIX sockets.

=item   op/filetest

File test operators.

=item   op/lex_assign

Verify operations that access pad objects (lexicals and temporaries).

=back

=head1 Modules and Pragmata

=head2 Modules

=over 4

=item Dumpvalue

The new Dumpvalue module provides screen dumps of Perl data.

=item Benchmark

You can now run benchmarks for I<x> seconds instead of guessing the
right number of iterations to run.

=item Fcntl

More Fcntl constants have been added: F_SETLK64, F_SETLKW64, and
O_LARGEFILE for large (more than 4G) file access (the 64-bit support is
not yet working, though, so there is no need to get overly excited); the
Free/Net/OpenBSD locking behaviour flags F_FLOCK and F_POSIX; Linux
F_SHLCK; and O_ACCMODE, the mask of O_RDONLY, O_WRONLY, and O_RDWR.

=item Math::Complex

The accessor methods Re, Im, arg, abs, rho, and theta ($z->Re()) can now
also act as mutators ($z->Re(3)).

=item Math::Trig

A little bit of radial trigonometry (cylindrical and spherical) has been
added, for example the great circle distance.

=item Time::Local

The timelocal() and timegm() functions used to silently return bogus
results when the date exceeded the machine's integer range.  They now
consistently croak() if the date falls in an unsupported range.

=back
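A sketch exercising several of the changes above on a current perl. The function and constant names are the ones mentioned in the list; the particular inputs (a quarter turn along the equator of a unit sphere, an invalid month for C<timegm()>) are illustrative choices, and the C<O_ACCMODE> line assumes a platform that defines that constant.

```perl
use strict;
use warnings;
use Fcntl qw(O_ACCMODE O_WRONLY O_CREAT);
use Math::Complex qw(cplx);
use Math::Trig qw(great_circle_distance pi);
use Time::Local qw(timegm);

# Fcntl: O_ACCMODE masks out everything except the access mode.
my $flags     = O_WRONLY | O_CREAT;
my $access    = $flags & O_ACCMODE;
my $is_wronly = ($access == O_WRONLY);

# Math::Complex: Re() reads the real part, Re(VALUE) sets it.
my $z = cplx(1, 2);
$z->Re(3);

# Math::Trig: two points on the equator (phi = pi/2) a quarter turn apart
# in theta, on a unit sphere (the default radius).
my $quarter = pi() / 2;
my $dist    = great_circle_distance(0, $quarter, $quarter, $quarter);

# Time::Local: a valid date converts; an out-of-range month croaks.
my $epoch   = timegm(0, 0, 0, 1, 0, 2000);               # 2000-01-01 00:00:00 UTC
my $croaked = !eval { timegm(0, 0, 0, 1, 12, 2000); 1 };  # month 12 is invalid

print "write-only access mode: ", ($is_wronly ? "yes" : "no"), "\n";
print "z = $z\n";
printf "distance: %.6f\n", $dist;
print "epoch: $epoch, croaked: ", ($croaked ? "yes" : "no"), "\n";
```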

=head2 Pragmata

Lexical warnings pragma, "use warning;", to control optional warnings.
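The pragma shipped as C<warnings> in perl 5.6 and later, so the sketch below uses that name. It shows the lexical scoping: a category switched off inside a block stays on outside it. Warnings are collected through C<$SIG{__WARN__}> rather than printed.

```perl
use strict;
use warnings;

# Collect warnings instead of printing them.
my @caught;
local $SIG{__WARN__} = sub { push @caught, $_[0] };

my $undef;
my $loud = "value: " . $undef;        # warns: uninitialized value

{
    # Lexical scope: switch one category off for this block only.
    no warnings 'uninitialized';
    my $quiet = "value: " . $undef;   # no warning here
}

my $loud_again = "value: " . $undef;  # warns again outside the block

print scalar(@caught), " warnings caught\n";
```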

Filetest pragma, to control the behaviour of filetests (C<-r> C<-w> ...).
Currently only one subpragma is implemented, C<use filetest 'access';>,
which enables the use of access(2) or equivalent to check permissions
instead of the usual stat(2).  This matters on filesystems with ACLs
(access control lists), where stat(2) might lie while access(2) knows
better.
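A sketch comparing the two ways of answering C<-r> for a scratch file (from the core File::Temp module). On an ordinary file without ACLs the answers agree; they can differ where an ACL grants or denies access that the mode bits do not show.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

my ($fh, $path) = tempfile(UNLINK => 1);
close $fh;

# Default: -r is answered from the stat(2) permission bits.
my $by_stat = -r $path ? 1 : 0;

# Inside this block only, filetests consult access(2) or its equivalent.
my $by_access = do {
    use filetest 'access';
    -r $path ? 1 : 0;
};

print "stat says $by_stat, access says $by_access\n";
```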

=head1 Utility Changes

Todo.

=head1 Documentation Changes

=over 4

=item perlopentut.pod

A tutorial on using open() effectively.

=item perlreftut.pod

A tutorial that introduces the essentials of references.

=back

=head1 New Diagnostics

=over 4

=item /%s/: Unrecognized escape \\%c passed through

(W) You used a backslash-character combination which is not recognized
by Perl.  This combination appears in an interpolated variable or a
C<'>-delimited regular expression.

=item Unrecognized escape \\%c passed through

(W) You used a backslash-character combination which is not recognized
by Perl.

=item Missing command in piped open

(W) You used the C<open(FH, "| command")> or C<open(FH, "command |")>
construction, but the command was missing or blank.

=back
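A sketch that provokes two of these warnings on a current perl and collects them through C<$SIG{__WARN__}>. C<\y> is just an arbitrary unrecognized escape chosen for illustration.

```perl
use strict;
use warnings;

my @warnings;
local $SIG{__WARN__} = sub { push @warnings, $_[0] };

# "\y" is not a recognized escape; compiling this string warns.
eval q{ my $s = "a\yb"; 1 } or die $@;

# A piped open with nothing but whitespace after the "|" has no command.
my $opened = open(my $pipe, "| ");

print for @warnings;
print "piped open ", ($opened ? "succeeded" : "failed: $!"), "\n";
```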

=head1 Obsolete Diagnostics

Todo.

=head1 Configuration Changes

You can use "Configure -Uinstallusrbinperl", which causes installperl
to skip also installing perl as /usr/bin/perl.  This is useful if you
prefer not to modify /usr/bin for some reason, but it may be harmful
because many scripts expect to find Perl in /usr/bin/perl.

=head1 BUGS

If you find what you think is a bug, you might check the headers of
recently posted articles in the comp.lang.perl.misc newsgroup.
There may also be information at http://www.*-*-*.com/ , the Perl
Home Page.

If you believe you have an unreported bug, please run the B<perlbug>
program included with your release.  Make sure you trim your bug down
to a tiny but sufficient test case.  Your bug report, along with the

analysed by the Perl porting team.

=head1 SEE ALSO

The F<Changes> file for exhaustive details on what changed.

The F<INSTALL> file for how to build Perl.

The F<README> file for general stuff.

The F<Artistic> and F<Copying> files for copyright information.

=head1 HISTORY


from The Perl Porters.


=cut
--
If I had to choose between System V and 4.2, I'd resign. --Peter Honeyman



Sun, 05 Aug 2001 03:00:00 GMT  


> =head1 NAME

> perldelta - what's new for perl5.006 (as of 5.005_55)

I don't see anything about Unicode/utf-8 support. I've not been
following p5p lately so I might be unaware of something, but I assumed
that was one of the major changes.

--

| Fastnet Software Ltd              |   Perl in Active Server Pages   |
| Perl Consultancy, Web Development |   Database Design   |    XML    |
| http://come.to/fastnet            |    Information Consolidation    |



Mon, 06 Aug 2001 03:00:00 GMT  
 [courtesy cc of this posting sent to cited author via email]

In comp.lang.perl.misc,

:I don't see anything about Unicode/utf-8 support. I've not been
:following p5p lately so I might be unaware of something, but I assumed
:that was one of the major changes.

You're right.

perlfunc.pod:For example, C<chr(65)> is C<"A"> in either ASCII or Unicode, and
perlfunc.pod:chr(0x263a) is a Unicode smiley face (but only within the scope of a
perlfunc.pod:Returns the numeric (ASCII or Unicode) value of the first character of EXPR.  If
perlfunc.pod:    C      An unsigned char value.  Only does bytes.  See U for Unicode.
perlfunc.pod:    U      A Unicode character number.  Encodes to UTF-8 internally.
perlfunc.pod:    # same thing with Unicode circled letters
perlfunc.pod:Under Unicode (C<use utf8>) it uses the standard Unicode uppercase mappings.  (It
perlfunc.pod:in uppercase (titlecase in Unicode).  This is
perlop.pod:    tr/\0-\xFF//CU;          # translate Latin-1 to Unicode
perlop.pod:    tr/\0-\x{FF}//UC;                # translate Unicode to Latin-1
perlre.pod:    \x{263a} wide hex char         (Unicode SMILEY)
perlre.pod:    \X       Match eXtended Unicode "combining character sequence",
perltodo.pod:=head2 Unicode tutorial
perltodo.pod:Unicode support that Larry has created.

And here's the utf8.pm pragma.  There's also a utf8_heavy.pl.

--tom

package utf8;

sub import {
    $^H |= 0x00000008;
    $enc{caller()} = $_[1] if $_[1];
}

sub unimport {
    $^H &= ~0x00000008;
}

sub AUTOLOAD {
    require "utf8_heavy.pl";
    goto &$AUTOLOAD;
}

1;
__END__

=head1 NAME

utf8 - Perl pragma to turn on UTF-8 and Unicode support

=head1 SYNOPSIS

    use utf8;
    no utf8;

=head1 DESCRIPTION

The utf8 pragma tells Perl to use UTF-8 as its internal string
representation for the rest of the enclosing block.  (The "no utf8"
pragma tells Perl to switch back to ordinary byte-oriented processing
for the rest of the enclosing block.)  Under utf8, many operations that
formerly operated on bytes change to operating on characters.  For
ASCII data this makes no difference, because UTF-8 stores ASCII in
single bytes, but for any character greater than C<chr(127)>, the
character is stored in a sequence of two or more bytes, all of which
have the high bit set.  But by and large, the user need not worry about
this, because the utf8 pragma hides it from the user.  A character
under utf8 is logically just a number ranging from 0 to 2**32 or so.
Larger characters encode to longer sequences of bytes, but again, this
is hidden.

Use of the utf8 pragma has the following effects:

=over 4

=item *

Strings and patterns may contain characters that have an ordinal value
larger than 255.  Presuming you use a Unicode editor to edit your
program, these will typically occur directly within the literal strings
as UTF-8 characters, but you can also specify a particular character
with an extension of the C<\x> notation.  UTF-8 characters are
specified by putting the hexadecimal code within curlies after the
C<\x>.  For instance, a Unicode smiley face is C<\x{263A}>.  A
character in the Latin-1 range (128..255) should be written C<\x{ab}>
rather than C<\xab>, since the former will turn into a two-byte UTF-8
code, while the latter will continue to be interpreted as generating an
8-bit byte rather than a character.  In fact, if C<-w> is turned on, it will
produce a warning that you might be generating invalid UTF-8.

=item *

Identifiers within the Perl script may contain Unicode alphanumeric
characters, including ideographs.  (You are currently on your own when
it comes to using the canonical forms of characters--Perl doesn't (yet)
attempt to canonicalize variable names for you.)

=item *

Regular expressions match characters instead of bytes.  For instance,
"." matches a character instead of a byte.  (However, the C<\C> pattern
is provided to force a match of a single byte ("C<char>" in C, hence
C<\C>).)

=item *

Character classes in regular expressions match characters instead of
bytes, and match against the character properties specified in the
Unicode properties database.  So C<\w> can be used to match an ideograph,
for instance.

=item *

Named Unicode properties and block ranges may be used as character
classes via the new C<\p{}> (matches property) and C<\P{}> (doesn't
match property) constructs.  For instance, C<\p{Lu}> matches any
character with the Unicode uppercase property, while C<\p{M}> matches
any mark character.  Single-letter properties may omit the braces, so
C<\p{M}> can also be written C<\pM>.  Many predefined character classes
are available, such as C<\p{IsMirrored}> and C<\p{InTibetan}>.

=item *

The special pattern C<\X> matches any extended Unicode sequence
(a "combining character sequence" in Standardese), where the first
character is a base character and subsequent characters are mark
characters that apply to the base character.  It is equivalent to
C<(?:\PM\pM*)>.

=item *

The C<tr///> operator translates characters instead of bytes.  It can also
be forced to translate between 8-bit codes and UTF-8 regardless of the
surrounding utf8 state.  For instance, if you know your input is in Latin-1,
you can say:

    use utf8;
    while (<>) {
        tr/\0-\xff//CU;         # latin1 char to utf8
        ...
    }

Similarly you could translate your output with

    tr/\0-\x{ff}//UC;           # utf8 to latin1 char

No, C<s///> doesn't take /U or /C (yet?).

=item *

Case translation operators use the Unicode case translation tables.
Note that C<uc()> translates to uppercase, while C<ucfirst> translates
to titlecase (for languages that make the distinction).  Naturally
the corresponding backslash sequences have the same semantics.

=item *

Most operators that deal with positions or lengths in the string will
automatically switch to using character positions, including C<chop()>,
C<substr()>, C<pos()>, C<index()>, C<rindex()>, C<sprintf()>,
C<write()>, and C<length()>.  Operators that specifically don't switch
include C<vec()>, C<pack()>, and C<unpack()>.  Operators that really
don't care include C<chomp()>, as well as any other operator that
treats a string as a bucket of bits, such as C<sort()>, and the
operators dealing with filenames.

=item *

The C<pack()>/C<unpack()> letters "C<c>" and "C<C>" do I<not> change,
since they're often used for byte-oriented formats.  (Again, think
"C<char>" in the C language.)  However, there is a new "C<U>" specifier
that will convert between UTF-8 characters and integers.  (It works
outside of the utf8 pragma too.)

=item *

The C<chr()> and C<ord()> functions work on characters.  This is like
C<pack("U")> and C<unpack("U")>, not like C<pack("C")> and
C<unpack("C")>.  In fact, the latter are how you now emulate
byte-oriented C<chr()> and C<ord()> under utf8.

=item *

And finally, C<scalar reverse()> reverses by character rather than by byte.

=back
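On current perls these character semantics apply without C<use utf8>, which now only declares that the program source is UTF-8. The sketch below illustrates several of the items above under that assumption, using C<\x{...}> escapes so it needs no Unicode editor; C<utf8::encode> is used to look at the underlying bytes.

```perl
use strict;
use warnings;

# "\x{263A}" is a single character (WHITE SMILING FACE) above 255.
my $smiley = "\x{263A}";
my $str    = "a${smiley}b";

my $length   = length $str;                   # counted in characters
my $ordinal  = ord substr($str, 1, 1);        # the character's code point
my $via_pack = pack("U", 0x263A) eq $smiley;  # pack "U" builds the same character
my $reversed = scalar reverse $str;           # reversed by character

# The byte view: utf8::encode turns a copy into its UTF-8 byte sequence.
my $bytes = $smiley;
utf8::encode($bytes);
my $byte_count = length $bytes;

# \p{...} matches Unicode properties; \X matches a base character plus marks.
my $all_lower = ("\x{E9}t\x{E9}" =~ /^\p{Ll}+$/) ? 1 : 0;
my @clusters  = ("e\x{301}x" =~ /(\X)/g);     # "e" + combining acute, then "x"

# Case mapping: uc gives uppercase, ucfirst gives titlecase.
my $dz = "\x{1C6}";                           # LATIN SMALL LETTER DZ WITH CARON
my ($upper, $title) = (uc $dz, ucfirst $dz);

printf "length=%d ord=0x%X bytes=%d clusters=%d\n",
    $length, $ordinal, $byte_count, scalar @clusters;
```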

=head1 CAVEATS

As of yet, there is no method for automatically coercing input and
output to some encoding other than UTF-8.  This is planned in the near
future, however.

In any event, you'll need to keep track of whether interfaces to other
modules expect UTF-8 data or something else.  The utf8 pragma does not
magically mark strings for you in order to remember their encoding, nor
will any automatic coercion happen (other than that eventually planned
for I/O).  If you want such automatic coercion, you can build yourself
a set of pretty object-oriented modules.  Expect it to run considerably
slower than this low-level support.

Use of locales with utf8 may lead to odd results.  Currently there is
some attempt to apply 8-bit locale info to characters in the range
0..255, but this is demonstrably incorrect for locales that use
characters above that range (when mapped into Unicode).  It will also
tend to run slower.  Avoidance of locales is strongly encouraged.

=cut
--
Welcome to Microsoft!
Please set your watch back 20 years.



Mon, 06 Aug 2001 03:00:00 GMT  
 