#!/usr/bin/perl # $Id: fork-and-rename 140 2009-05-25 19:58:41Z whynot $ package main; =head1 NAME fork-and-rename - rename bunch of files and put your system on knees =head1 README That renames files found in directory, if applied rule matches. Target names are pseudo-randomized. And BTW attempts to DOS your system. =head1 USAGE fork-and-rename --destination=target/ --filter=. source0/ source1/ fork-and-rename --move --filter=sh --filter=txt place0/ place1/ =head1 DESCRIPTION Neither current nor previous mobile phone that I use is capable to name saved files any useful way. What's even worse either have no grasp of what overwrithing is. So I have a lots of files incredibly named that I have to maintain somehow. B (hereafter B) takes file, counts it B, looks its B and renames it this way (after renaming, mtime of source is applied on target): ppS-IY9QqxUM.jpg -> 911692DA-20080520-112926.jpg where: =over =item 911692DA is CRC-32, in hexadecimal, all caps =item 20080520 is date of mtime, in 4 decimals of year, then 2 decimals of month and then 2 decimals of day of month =item 112926 is time part of mtime, in hours, minutes, and seconds in 2 decimals each. =back That's the purpose part. That's not that interestening, isn't it? So what B does in its name? The complete processing of each file is done in separate process (one file -- one process). The main process finds suitable file, forks, collects all already finished zombies, and when there's none zombie left, goes for next file. So does it achieve its target of putting the system on knees? No and yes. On a snapshot of my mobile's memory card B stabilizes on 17..20 processes first, then spikes to 22..25 processes. At that point audio starts to glitch. Most number of zombies reaped at once was 3, sometime 4. I fail to see any difference either between modes (see below) or filesystems (ext3 and ext2). I still have no resources to check bigger files (such as found in FusrZ<>EshareZ<>Edoc>). That seems that B (or whatever it's emulated by) is a way costly. Have you read those 2 paragraphes above? Looks bad, don't it? Forget it. All that was experienced when B was in use (I don't rant about B per se). Looking for timezone it Bs B. After finding that, I've proudly dropped B and rewritten those 2 lines with B (in mind and in use). And... It's hard to say how many processes run at once -- roughly 2..5, up to 9 zombies are collected at once, and (what I like most) PIDs of Bs are highly sequential. The copy-to mode somewhat differs -- processes don't come in batches (as they do for rename-in-place mode). However everything is a way fast. One interesting observation. Whatever wrapping is choosen (S> or S>), Bed process reports target-source pair before parent reports Bed PID. I think, that Bing B and Bing B are a way different things. =cut use strict; use warnings; use 5.006; #use version 0.50; use File::Find; use Digest::CRC; use Fcntl qw| :DEFAULT |; use POSIX qw| strftime :limits_h :sys_wait_h |; use Getopt::Long; =head1 PREREQUISITES File::Find Getopt::Long Digest::CRC Fcntl POSIX =head1 DEPENDENCIES =over =item B Provides directory traversing facility. Subject to be distributed with Perl. I<1.12> works for me. =item B Command line parsing. Subject to be distributed with Perl. I<2.37> works for me. =item B It's used to provide distribution among filenames (B doesn't randomize, remember). In use is 32bit variant. I think, 16bit variant would have chance for clashes (although, that's untested). While 32bit variant is short enough. The next step would be 128byte hash, but do you really want that long filenames? I<0.14> works for me. =item B Bed process (when in copy-to mode) copies files by itself block-by-block. So it uses B and B, and thus requires constants. Subject to be distributed with Perl. I<1.06> works for me. =item B The block size for the system for pipe reads (B doens't B, but I've found that constant useful). B is used, remember? And contstant for unhanging B. Subject to be distributed with Perl. I<1.13> works for me. =back =cut #=head1 OSNAMES =head1 INCOMPATIBILITIES POSIX-clean slash (B) is used in constructing and parsing full pathnames. You know. =cut our $VERSION = 0.000_005; my $template = q|%08X-%04d%02d%02d-%02d%02d%02d|; my $mask = qr|^\w{8}-\d{8}-\d{6}|; my( $dst, @filter, $move, $resuffix ); sub short_help { print < \&short_help, q|version!| => \&short_version, q|destination=s| => \$dst, q|filter=s@| => \@filter, q|move!| => \$move, q|suffix=s| => \$resuffix, ; =head1 ARGUMENTS =over =item I<--destination> CE> Sets B in copy-to mode and assigns the target directory. In that mode files are copied in the I<--destination> directory. The source directory tree isn't recreated. =item I<--move> Sets B in rename-on-place mode. In that mode files are renamed in a directory they were found. =item I<--filter> B has 2 modes =over =item multiple filters Each filter names one suffix (with neither leading nor inter dots). If file has simple suffix, and that suffix is equal (case-blindly) with one of I<--filter>s, then file is processed. (B means that anything on the left of rightmost dot isn't suffix. If there's no dot at all, then there's no suffix.) The file is ignored otherwise. =item one filter for all However, in case you want to process all the files specifying I<--filter>s for every suffix would be error-prone, ridiculous etc. And you can't specify I<--filter> for empty suffix anyway. You can set I<--filter> to dot --filter=. And then any file will match -- with any suffix or without suffix at all. That magic filter must be alone. =back Yeah, such a brain-dead construct. And one more note one filtering. If file looks like already renamed (8 hexadecimals, 8 decimals, and 6 decimals separated by hyphen (B<->)) then file is skipped unconditionally. If filename starts with a dot (B<.>) then the file is skipped too. =back =head1 OPTIONS =over =item I<--suffix> C Renamed files keep a suffix of source. This option is supposed to maintain that any-case zoo. This sets a suffix for a target file -- If a source file happens to have a suffix it will be replaced; In case there's no source's suffix, it will be added (think: L). =back =cut die qq|destination ($dst) is set simumltaneously with move\n| if $move && $dst; die qq|neither mode has been choosen\n| unless $move || $dst; if( $dst ) { -d $dst or die qq|destination ($dst) isn't a directory|; $dst =~ s{/+$}{}; }; -d $_ or die qq|($_) isn't a directory| foreach @ARGV; s{/+$}{} foreach @ARGV; @filter or die qq|missing filter|; @filter = () if 1 == @filter && '.' eq $filter[0]; my $digest = Digest::CRC->new( type => q|crc32| ); sub process_this ( $ ) { my $file = shift @_; defined( my $pid = fork ) or die qq|fork ($file) failed: $!|; if( $pid ) { printf qq|[%i]: came\n|, $pid; my @gone; push @gone, $_ until 0 >= ($_ = waitpid -1, WNOHANG); printf qq|[%s]: gone\n|, join q|] [|, @gone if @gone; return; }; my $suffix = $resuffix || ( split m{\.}, ( split m{/}, $file )[-1] )[-1] || ''; $suffix = '' if $suffix eq ( split m{/}, $file )[-1]; my $mtime = ( stat $file )[9]; open my $fh, q|<|, $file or die qq|can't open ($file): $!|; $digest->addfile( $fh ); if( $move ) { $file =~ m{^(.+)/}; $dst = $1; }; my $target = sprintf qq|%s/$template%s|, $dst, $digest->digest, split( m{ }, strftime q|%Y %m %d %H %M %S|, localtime $mtime ), $suffix ? qq|.$suffix| : ''; printf qq|[%i]: %s %s\n|, $$, ( split m{/}, $target )[-1], ( split m{/}, $file )[-1]; unless( $move ) { sysopen my $fhi, $file, O_RDONLY or die qq|sysopen ($file) for read failed: $!|; sysopen my $fho, $target, O_WRONLY | O_EXCL | O_CREAT or die qq|sysopen ($target) for write failed: $!|; my( $chunk, $buf ); # XXX:20090525211216:whynot: What if $chunk != syswrite()? defined syswrite $fho, $buf, $chunk or die qq|syswrite ($target) failed: $!| while $chunk = sysread $fhi, $buf, PIPE_BUF; utime +( stat $fhi )[8,9], $fho or die qq|utime ($file -> $target) failed: $!|; } else { -f $target and die qq|target ($target) for ($file) exists|; rename $file, $target; }; exit; }; =head1 DIAGNOSTICS B reports its progress, and that's unavoidable. Bs, Bs, and source-target pairs are reported. Zombies ripped in main cycle after B are reported on one line. In final cleanup -- immediately after ripping. One more note on "forked" reports. No directories are reported; The filenames are dumped in misleading reverse order -- I believe that increases readability (target filename is almost constant lenght (subject to suffix variation), while source filename length can change a lot). =head1 BUGS AND LIMITATIONS =over =item * (I, may be I) As already mentioned, copy-to mode doesn't recreate directory tree. =item * (I) And then if two (or more) files are met (in possibly different directories) that have equal CRC-32 and mtime's, then the target filenames will be the same. So the only first file will be copied. In two cases when I stepped in that -- offending files were plainly the same (icons distributed with some app). =item * (I) The very same situation (however, that seems to be quite improbably) could happen in rename-in-place mode too. =item * (I?) The atime of source could be collected before file is opened for CRC-32 calculation. =back =cut find { wanted => sub { my $file = ( split m{/} )[-1]; !-f $_ || $file =~ m{^\.} || $file =~ m{$mask} || @filter && !grep $file =~ m{\.\Q$_\E$}i, @filter and return; process_this $_; }, no_chdir => 1, }, @ARGV; printf qq|[%i]: gone\n|, $_ until 0 >= ($_ = waitpid -1, 0); =head1 SCRIPT CATEGORIES UNIX/Administration =head1 AUTHOR Eric Pozharski, Ewhynot@cpan.orgZ<>E =head1 COPYRIGHT & LICENSE Copyright 2009 by Eric Pozharski This utility is free in sense: AS-IS, NO-WARANRTY, HOPE-TO-BE-USEFUL. This utility is released under GNU GPLv3. =cut # vim: set filetype=perl