#!/usr/bin/perl
# $Id: fork-and-rename 140 2009-05-25 19:58:41Z whynot $

package main;

=head1 NAME

fork-and-rename - rename bunch of files and put your system on knees

=head1 README

That renames files found in directory, if applied rule matches.
Target names are pseudo-randomized.
And BTW attempts to DOS your system.

=head1 USAGE

    fork-and-rename --destination=target/ --filter=. source0/ source1/
    fork-and-rename --move --filter=sh --filter=txt place0/ place1/

=head1 DESCRIPTION

Neither current nor previous mobile phone that I use is capable to name saved
files any useful way.
What's even worse either have no grasp of what overwrithing is.
So I have a lots of files incredibly named that I have to maintain somehow.

B<fork-and-rename> (hereafter B<f-a-r>) takes file, counts it B<CRC-32>, looks
its B<mtime> and renames it this way
(after renaming, mtime of source is applied on target):

    ppS-IY9QqxUM.jpg -> 911692DA-20080520-112926.jpg

where:

=over

=item 911692DA

is CRC-32, in hexadecimal, all caps

=item 20080520

is date of mtime, in 4 decimals of year, then 2 decimals of month and then 2
decimals of day of month

=item 112926

is time part of mtime, in hours, minutes, and seconds in 2 decimals each.

=back

That's the purpose part.
That's not that interestening, isn't it?
So what B<fork> does in its name?
The complete processing of each file is done in separate process
(one file -- one process).
The main process finds suitable file, forks, collects all already finished
zombies, and when there's none zombie left, goes for next file.

So does it achieve its target of putting the system on knees?
No and yes.
On a snapshot of my mobile's memory card B<f-a-r> stabilizes on 17..20
processes first, then spikes to 22..25 processes.
At that point audio starts to glitch.
Most number of zombies reaped at once was 3, sometime 4.
I fail to see any difference either between modes (see below) or
filesystems (ext3 and ext2).
I still have no resources to check bigger files
(such as found in F<E<sol>usrZ<>E<sol>shareZ<>E<sol>doc>).

That seems that B<fork(2)> (or whatever it's emulated by) is a way costly.

Have you read those 2 paragraphes above?
Looks bad, don't it?
Forget it.
All that was experienced when B<Date::Manip> was in use
(I don't rant about B<Date::Manip> per se).
Looking for timezone it B<fork>s B<date(1)>.
After finding that, I've proudly dropped B<D::M> and rewritten those 2 lines
with B<POSIX::strftime> (in mind and in use).
And...
It's hard to say how many processes run at once -- roughly 2..5, up to 9
zombies are collected at once, and (what I like most) PIDs of B<fork>s are
highly sequential.
The copy-to mode somewhat differs -- processes don't come in batches
(as they do for rename-in-place mode).
However everything is a way fast.

One interesting observation.
Whatever wrapping is choosen (S<C<if( $pid )>> or S<C<unless( $pid )>>),
B<fork>ed process reports target-source pair before parent reports B<fork>ed
PID.

I think, that B<fork>ing B<perl> and B<fork>ing B<shell> are a way different
things.

=cut

use strict;
use warnings;
use         5.006;
#use version 0.50;
use File::Find;
use Digest::CRC;
use Fcntl qw| :DEFAULT                       |;
use POSIX qw| strftime :limits_h :sys_wait_h |;
use Getopt::Long;

=head1 PREREQUISITES

File::Find
Getopt::Long
Digest::CRC
Fcntl
POSIX

=head1 DEPENDENCIES

=over

=item B<File::Find>

Provides directory traversing facility.
Subject to be distributed with Perl.
I<1.12> works for me.

=item B<Getopt::Long>

Command line parsing.
Subject to be distributed with Perl.
I<2.37> works for me.

=item B<Digest::CRC>

It's used to provide distribution among filenames
(B<f-a-r> doesn't randomize, remember).
In use is 32bit variant.
I think, 16bit variant would have chance for clashes
(although, that's untested).
While 32bit variant is short enough.
The next step would be 128byte hash,
but do you really want that long filenames?
I<0.14> works for me.

=item B<Fcntl>

B<fork>ed process (when in copy-to mode) copies files by itself block-by-block.
So it uses B<sysread> and B<syswrite>, and thus requires constants.
Subject to be distributed with Perl.
I<1.06> works for me.

=item B<POSIX>

The block size for the system for pipe reads
(B<f-a-r> doens't B<pipe>, but I've found that constant useful).
B<strftime> is used, remember?
And contstant for unhanging B<waitpid>.
Subject to be distributed with Perl.
I<1.13> works for me.

=back

=cut

#=head1 OSNAMES

=head1 INCOMPATIBILITIES

POSIX-clean slash (B</>) is used in constructing and parsing full pathnames.
You know.

=cut

our $VERSION = 0.000_005;

my $template = q|%08X-%04d%02d%02d-%02d%02d%02d|;
my $mask = qr|^\w{8}-\d{8}-\d{6}|;

my( $dst, @filter, $move, $resuffix );

sub short_help {
    print <<END_OF_HELP;
  --destination=dir sets fork-and-rename in 'copy-to' mode
                    and assigns a target directory 'dir'
  --move            sets fork-and-rename in 'rename-in-place' mode
        either one of above is required

  --filter=suffix   adds one suffix to the list of filters
  --suffix=suffix   sets suffix for renamed files
  --help            this help
  --version         version of this
END_OF_HELP
    exit 0;     };

sub short_version {
    print <<"END_OF_VERSION";
fork-and-rename - $VERSION
AS-IS, NO-WARRANTY, HOPE-TO-BE-USEFUL - GNU GPLv3
END_OF_VERSION
    exit 0;        };

GetOptions
  q|help!|         => \&short_help,
  q|version!|      => \&short_version,
  q|destination=s| => \$dst,
  q|filter=s@|     => \@filter,
  q|move!|         => \$move,
  q|suffix=s|      => \$resuffix, ;

=head1 ARGUMENTS

=over

=item I<--destination> C<targetZ<>E<sol>>

Sets B<f-a-r> in copy-to mode and assigns the target directory.
In that mode files are copied in the I<--destination> directory.
The source directory tree isn't recreated.

=item I<--move>

Sets B<f-a-r> in rename-on-place mode.
In that mode files are renamed in a directory they were found.

=item I<--filter>

B<f-a-r> has 2 modes

=over

=item multiple filters

Each filter names one suffix (with neither leading nor inter dots).
If file has simple suffix, and that suffix is equal (case-blindly) with one of
I<--filter>s, then file is processed.
(B<Simple suffix> means that anything on the left of rightmost dot isn't
suffix.
If there's no dot at all, then there's no suffix.)
The file is ignored otherwise.

=item one filter for all

However, in case you want to process all the files specifying I<--filter>s for
every suffix would be error-prone, ridiculous etc.
And you can't specify I<--filter> for empty suffix anyway.
You can set I<--filter> to dot

    --filter=.

And then any file will match -- with any suffix or without suffix at all.
That magic filter must be alone.

=back

Yeah, such a brain-dead construct.

And one more note one filtering.
If file looks like already renamed
(8 hexadecimals, 8 decimals, and 6 decimals separated by hyphen (B<->))
then file is skipped unconditionally.
If filename starts with a dot (B<.>) then the file is skipped too.

=back

=head1 OPTIONS

=over

=item I<--suffix> C<suffix>

Renamed files keep a suffix of source.
This option is supposed to maintain that any-case zoo.
This sets a suffix for a target file --
If a source file happens to have a suffix it will be replaced;
In case there's no source's suffix, it will be added
(think: L</one filter for all>).

=back

=cut

die qq|destination ($dst) is set simumltaneously with move\n|
  if $move && $dst;
die qq|neither mode has been choosen\n|
  unless $move || $dst;

if( $dst )           {
    -d $dst or
      die qq|destination ($dst) isn't a directory|;
    $dst =~ s{/+$}{}; };

-d $_ or
  die qq|($_) isn't a directory|
  foreach @ARGV;
s{/+$}{}
  foreach @ARGV;

@filter or
  die qq|missing filter|;
@filter = ()
  if 1 == @filter && '.' eq $filter[0];

my $digest = Digest::CRC->new( type => q|crc32| );

sub process_this ( $ ) {
    my $file = shift @_;
    defined( my $pid = fork )  or
      die qq|fork ($file) failed: $!|;
    if( $pid ) {
        printf qq|[%i]: came\n|, $pid;
        my @gone;
        push @gone, $_
          until 0 >= ($_ = waitpid -1, WNOHANG);
        printf qq|[%s]: gone\n|, join q|] [|, @gone
          if @gone;
        return; };
    my $suffix =
      $resuffix || ( split m{\.}, ( split m{/}, $file )[-1] )[-1] || '';
    $suffix = ''
      if $suffix eq ( split m{/}, $file )[-1];
    my $mtime = ( stat $file )[9];
    open my $fh, q|<|, $file or
      die qq|can't open ($file): $!|;
    $digest->addfile( $fh );
    if( $move )   {
        $file =~ m{^(.+)/};
        $dst = $1; };
    my $target = sprintf qq|%s/$template%s|,
      $dst,
      $digest->digest,
      split( m{ }, strftime q|%Y %m %d %H %M %S|, localtime $mtime ),
      $suffix ? qq|.$suffix| : '';
    printf qq|[%i]: %s %s\n|,
      $$, ( split m{/}, $target )[-1], ( split m{/}, $file )[-1];
    unless( $move )                                   {
        sysopen my $fhi, $file, O_RDONLY                      or
          die qq|sysopen ($file) for read failed: $!|;
        sysopen my $fho, $target, O_WRONLY | O_EXCL | O_CREAT or
          die qq|sysopen ($target) for write failed: $!|;
        my( $chunk, $buf );
# XXX:20090525211216:whynot: What if $chunk != syswrite()?
        defined syswrite $fho, $buf, $chunk                   or
          die qq|syswrite ($target) failed: $!|
          while $chunk = sysread $fhi, $buf, PIPE_BUF;
        utime +( stat $fhi )[8,9], $fho                       or
          die qq|utime ($file -> $target) failed: $!|; }
    else                                              {
        -f $target and
          die qq|target ($target) for ($file) exists|;
        rename $file, $target;                         };
    exit;               };

=head1 DIAGNOSTICS

B<f-a-r> reports its progress, and that's unavoidable.
B<fork>s, B<waitpid>s, and source-target pairs are reported.
Zombies ripped in main cycle after B<fork> are reported on one line.
In final cleanup -- immediately after ripping.

One more note on "forked" reports.
No directories are reported;
The filenames are dumped in misleading reverse order -- I believe that
increases readability
(target filename is almost constant lenght (subject to suffix variation), while
source filename length can change a lot).

=head1 BUGS AND LIMITATIONS

=over

=item *

(I<caveat>, may be I<bug>)
As already mentioned, copy-to mode doesn't recreate directory tree.

=item *

(I<caveat>)
And then if two (or more) files are met (in possibly different directories)
that have equal CRC-32 and mtime's, then the target filenames will be the same.
So the only first file will be copied.
In two cases when I stepped in that -- offending files were plainly the same
(icons distributed with some app).

=item *

(I<caveat>)
The very same situation (however, that seems to be quite improbably) could
happen in rename-in-place mode too.

=item *

(I<bug>?)
The atime of source could be collected before file is opened for CRC-32
calculation.

=back

=cut

find
  { wanted => sub                    {
      my $file = ( split m{/} )[-1];
      !-f $_                                             ||
        $file =~ m{^\.}                                  ||
        $file =~ m{$mask}                                ||
        @filter && !grep $file =~ m{\.\Q$_\E$}i, @filter and
        return;
      process_this $_; },
    no_chdir => 1, },
  @ARGV;

printf qq|[%i]: gone\n|, $_
  until 0 >= ($_ = waitpid -1, 0);

=head1 SCRIPT CATEGORIES

UNIX/Administration

=head1 AUTHOR

Eric Pozharski, E<lt>whynot@cpan.orgZ<>E<gt>

=head1 COPYRIGHT & LICENSE

Copyright 2009 by Eric Pozharski

This utility is free in sense: AS-IS, NO-WARANRTY, HOPE-TO-BE-USEFUL.
This utility is released under GNU GPLv3.

=cut

# vim: set filetype=perl
