archivemail — archive and compress your old email
archivemail [options] {MAILBOX ...}
archivemail is a tool for archiving and compressing old email in mailboxes. By default it reads the mailbox MAILBOX and moves messages that are older than the specified number of days (180 by default) to an mbox(5)-format mailbox in the same directory, compressed with gzip(1). It can also simply delete old email rather than archive it.
By default, archivemail derives the archive filename from the mailbox name by appending an _archive suffix to the mailbox name. For example, if you run archivemail on a mailbox called exsouthrock, the archive will be created with the filename exsouthrock_archive.gz. This default behavior can be overridden with command line options, choosing a custom suffix, a prefix, or a completely custom name for the archive.
archivemail supports reading IMAP, Maildir, MH and mbox-format mailboxes, but always writes mbox-format archives.
Messages that are flagged important are not archived or deleted unless explicitly requested with the --include-flagged option. archivemail can also be configured not to archive unread mail, or to archive only messages of at least a specified size.
To archive an IMAP-format mailbox, use the format imap://username:password@server/mailbox to specify the mailbox. archivemail will expand wildcards in IMAP mailbox names according to [RFC 3501], which says: “The character "*" is a wildcard, and matches zero or more characters at this position. The character "%" is similar to "*", but it does not match a hierarchy delimiter.”
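The wildcard matching itself is done by the IMAP server, but the difference between "*" and "%" is easy to see in a small sketch. The function name imap_pattern_to_regex is hypothetical, and the hierarchy delimiter is assumed to be "/" unless another one is passed in; real servers report their own delimiter.

```python
import re


def imap_pattern_to_regex(pattern, delimiter="/"):
    """Translate an RFC 3501 LIST pattern into a compiled regular expression.

    '*' matches zero or more characters, including the hierarchy delimiter;
    '%' matches zero or more characters but never the delimiter.
    Use .fullmatch() on the result so the whole mailbox name must match.
    """
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")
        elif ch == "%":
            parts.append("[^" + re.escape(delimiter) + "]*")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts))


names = ["foo", "foo/bar", "foo/bar/baz", "other"]
print([n for n in names if imap_pattern_to_regex("foo/%").fullmatch(n)])
print([n for n in names if imap_pattern_to_regex("foo/*").fullmatch(n)])
```

With the pattern foo/% only direct children of foo match, while foo/* also descends into deeper levels.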
You can omit the password from the URL; use the --pwfile option to make archivemail read the password from a file, or simply enter it when prompted. If the --pwfile option is set, archivemail does not look for a password in the URL, and the colon is not treated as a delimiter. Replace imap with imaps in the URL, and archivemail will establish a secure SSL connection. See below for more IMAP peculiarities.
-d NUM, --days=NUM
Archive messages older than NUM days. The default is 180. This option is incompatible with the --date option below.
-D DATE, --date=DATE
Archive messages older than DATE. DATE can be a date string in ISO format (e.g. “2002-04-23”), Internet format (e.g. “23 Apr 2002”) or Internet format with full month names (e.g. “23 April 2002”). Two-digit years are not supported. This option is incompatible with the --days option above.
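The two ways of choosing a cut-off can be sketched with Python's datetime module. This is an illustration, not archivemail's own parser: parse_cutoff_date and cutoff_from_days are hypothetical names, and the sketch assumes the default "C" locale so that month names are English.

```python
from datetime import datetime, timedelta
from typing import Optional

# The three formats accepted by --date, e.g. "2002-04-23", "23 Apr 2002"
# and "23 April 2002". strptime's %Y requires four digits, so two-digit
# years are rejected, matching the restriction above.
DATE_FORMATS = ("%Y-%m-%d", "%d %b %Y", "%d %B %Y")


def parse_cutoff_date(text: str) -> datetime:
    """Cut-off for --date: try each accepted format in turn."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt)
        except ValueError:
            continue
    raise ValueError("unsupported date: %r" % text)


def cutoff_from_days(days: int = 180, now: Optional[datetime] = None) -> datetime:
    """Cut-off for --days: messages older than this are candidates."""
    if now is None:
        now = datetime.now()
    return now - timedelta(days=days)
```

Usage: parse_cutoff_date("1 Jan 2002") and parse_cutoff_date("2002-01-01") yield the same cut-off, mirroring the two --date examples further down.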
-o PATH, --output-dir=PATH
Use the directory name PATH to store the mailbox archives. The default is the same directory as the mailbox to be read.
-P FILE, --pwfile=FILE
Read the IMAP password from file FILE instead of from the command line. Note that this will probably not work if you are archiving folders from more than one IMAP account.
-F STRING, --filter-append=STRING
Append STRING to the IMAP filter string. For IMAP wizards.
-p NAME, --prefix=NAME
Prepend NAME to the archive name. NAME is expanded by the python(1) function time.strftime(), which means that you can specify special directives in NAME to make an archive named after the archive cut-off date. See the discussion of the --suffix option for a list of valid strftime() directives. The default is not to add a prefix.
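-s NAME, --suffix=NAME
Use the suffix NAME to create the filename used for archives. The default is _archive, unless a prefix is specified.
Like a prefix, the suffix NAME is expanded by the python(1) function time.strftime() with the archive cut-off date. time.strftime() understands the following directives:
%a  Locale's abbreviated weekday name.
%A  Locale's full weekday name.
%b  Locale's abbreviated month name.
%B  Locale's full month name.
%c  Locale's appropriate date and time representation.
%d  Day of the month as a decimal number [01,31].
%H  Hour (24-hour clock) as a decimal number [00,23].
%I  Hour (12-hour clock) as a decimal number [01,12].
%j  Day of the year as a decimal number [001,366].
%m  Month as a decimal number [01,12].
%M  Minute as a decimal number [00,59].
%p  Locale's equivalent of either AM or PM.
%S  Second as a decimal number [00,61].
%U  Week number of the year (Sunday as the first day of the week) as a decimal number [00,53].
%w  Weekday as a decimal number [0(Sunday),6].
%W  Week number of the year (Monday as the first day of the week) as a decimal number [00,53].
%x  Locale's appropriate date representation.
%X  Locale's appropriate time representation.
%y  Year without century as a decimal number [00,99].
%Y  Year with century as a decimal number.
%Z  Time zone name (no characters if no time zone exists).
%%  A literal “%” character.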
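A minimal sketch of how a prefix and suffix could be expanded with the cut-off date follows. The helper expand_archive_name is hypothetical. In this sketch only the prefix and suffix go through time.strftime(), and the mailbox name is inserted verbatim. The .gz extension added on compression is left out, and the default "C" locale is assumed for %B.

```python
import time
from datetime import date
from typing import Optional


def expand_archive_name(mailbox: str, cutoff: date, prefix: str = "",
                        suffix: Optional[str] = None) -> str:
    """Build prefix + mailbox + suffix, expanding strftime directives.

    As described above, the suffix defaults to "_archive" unless a
    prefix is given. Directives are expanded with the cut-off date.
    """
    if suffix is None:
        suffix = "" if prefix else "_archive"
    when = cutoff.timetuple()

    def expand(fmt: str) -> str:
        return time.strftime(fmt, when) if fmt else ""

    return expand(prefix) + mailbox + expand(suffix)


print(expand_archive_name("debian-user", date(2001, 10, 25), suffix="_%B_%Y"))
```

This mirrors the --suffix '_%B_%Y' example further down: a cut-off in October 2001 produces debian-user_October_2001.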
-a NAME, --archive-name=NAME
Use NAME as the archive name, ignoring the name of the mailbox that is archived. Like prefixes and suffixes, NAME is expanded by time.strftime() with the archive cut-off date. Because it hard-codes the archive name, this option cannot be used when archiving multiple mailboxes.
-S NUM, --size=NUM
Only archive messages that are NUM bytes or greater.
-n, --dry-run
Don't write to any files; just show what would have been done. This is useful for testing to see how many messages would have been archived.
-u, --preserve-unread
Do not archive any messages that have not yet been read. archivemail determines whether a message in an mbox-format or MH-format mailbox has been read by looking at the Status header (if it exists). If the Status header is equal to “RO” or “OR”, archivemail assumes the message has been read. archivemail determines whether a Maildir message has been read by looking at the filename. If the filename contains an “S” after :2, then it assumes the message has been read.
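The two "has this been read?" rules above can be expressed as a pair of small predicates. Both function names are hypothetical, and the sketch takes the raw Status header value and the Maildir filename as plain strings rather than reading them from disk.

```python
from typing import Optional


def mbox_message_is_read(status_header: Optional[str]) -> bool:
    """mbox/MH rule: read if the Status header is exactly "RO" or "OR"."""
    return (status_header or "").strip() in ("RO", "OR")


def maildir_message_is_read(filename: str) -> bool:
    """Maildir rule: read if an "S" flag appears after the ":2," marker."""
    _, marker, flags = filename.partition(":2,")
    return bool(marker) and "S" in flags
```

A missing Status header is treated as unread, so such messages would be kept when --preserve-unread is given.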
--dont-mangle
Do not mangle lines in message bodies beginning with “From ”. When archiving a message from a mailbox not in mbox format, by default archivemail mangles such lines by prepending a “>” to them, since mail user agents might otherwise interpret these lines as message separators. Messages from mbox folders are never mangled. See mbox(5) for more information.
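The default mangling can be sketched as a single pass over the body lines. mangle_from_lines is a hypothetical name. The sketch only touches lines that begin with exactly "From " (with the trailing space), so a header-like "From:" line in a body is left alone.

```python
def mangle_from_lines(body: str) -> str:
    """Prefix ">" to every body line that starts with "From "."""
    lines = body.splitlines(keepends=True)
    return "".join(">" + line if line.startswith("From ") else line
                   for line in lines)


print(mangle_from_lines("Hi,\nFrom here on it gets tricky.\nFrom:x\n"))
```

Python's standard email.generator module offers a similar mangle_from_ switch when writing messages out.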
--delete
Delete rather than archive old mail. Use this option with caution!
--copy
Copy rather than archive old mail. This creates an archive, but the archived messages are not deleted from the originating mailbox, which is left unchanged. This is a complement to the --delete option, and mainly useful for testing purposes. Note that multiple passes will create duplicates, since messages are blindly appended to an existing archive.
--all
Archive all messages, without distinction.
--include-flagged
Normally messages that are flagged important are not archived or deleted. If you specify this option, these messages can be archived or deleted just like any other message.
--no-compress
Do not compress any archives.
--warn-duplicate
Warn about duplicate Message-IDs that appear in the input mailbox.
-v, --verbose
Report lots of extra debugging information about what is going on.
--debug-imap=NUM
Set the IMAP debugging level. This makes archivemail dump its conversation with the IMAP server and some internal IMAP processing to stdout. Higher values for NUM give more elaborate output. Set NUM to 4 to see all exchanged IMAP commands. (Actually, NUM is just passed literally to imaplib.Debug.)
-q, --quiet
Turn on quiet mode. Do not print any statistics about how many messages were archived. This should be used if you are running archivemail from cron.
-V, --version
Display the version of archivemail and exit.
-h, --help
Display brief summary information about how to run archivemail.
archivemail requires python(1) version 2.3 or later. When reading an mbox-format mailbox, archivemail will create a lockfile with the extension .lock so that procmail(1) will not deliver to the mailbox while it is being processed. It will also create an advisory lock on the mailbox using lockf(2). The archive is locked in the same way when it is updated. archivemail will also complain and abort if a third party modifies the mailbox while it is being read.
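The two layers of locking described above, a .lock file plus an advisory lockf(2) lock, can be sketched as a context manager. This is a simplified, Unix-only illustration and not archivemail's code. locked_mbox is a hypothetical name, and a real implementation would also retry, handle stale lockfiles and detect concurrent modification.

```python
import fcntl
import os
from contextlib import contextmanager


@contextmanager
def locked_mbox(path):
    """Hold a dot-lock file and an advisory lockf(2) lock on *path*."""
    lock_path = path + ".lock"
    # O_EXCL makes creation fail if another program (e.g. procmail) already
    # holds the dot-lock; the caller sees FileExistsError.
    dot_fd = os.open(lock_path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o644)
    try:
        with open(path, "r+b") as mbox:
            # Exclusive advisory lock; LOCK_NB raises OSError instead of waiting.
            fcntl.lockf(mbox.fileno(), fcntl.LOCK_EX | fcntl.LOCK_NB)
            try:
                yield mbox
            finally:
                fcntl.lockf(mbox.fileno(), fcntl.LOCK_UN)
    finally:
        os.close(dot_fd)
        os.unlink(lock_path)
```

Usage: wrap the read-and-rewrite of a mailbox in "with locked_mbox(path) as f:" so that both locks are released even if processing fails.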
archivemail will always attempt to preserve the last-access and last-modify times of the input mailbox. Archive mailboxes are always created with a mode of 0600. If archivemail finds a pre-existing archive mailbox, it will append to that archive rather than overwrite it. archivemail will refuse to operate on mailboxes that are symbolic links.
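The file-handling rules above map onto a few standard os calls. The sketch below is an assumption-laden illustration with hypothetical helper names. Note that the 0600 mode passed to os.open only applies when the file is newly created, and the process umask can only remove bits from it.

```python
import os


def refuse_symlink(path: str) -> None:
    """Raise if *path* is a symbolic link, mirroring the rule above."""
    if os.path.islink(path):
        raise ValueError("refusing to operate on symbolic link: %s" % path)


def open_archive_for_append(path: str):
    """Open an archive for appending, creating it with mode 0600 if absent.

    O_APPEND means an existing archive is extended, never truncated.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o600)
    return os.fdopen(fd, "ab")


def restore_times(path: str, saved: os.stat_result) -> None:
    """Put back the access and modification times recorded in *saved*."""
    os.utime(path, ns=(saved.st_atime_ns, saved.st_mtime_ns))
```

Usage: take os.stat(mailbox) before processing, then call restore_times(mailbox, saved) once the mailbox has been rewritten.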
archivemail attempts to find the delivery date of a message by looking for valid dates in the following headers, in order of precedence: Delivery-date, Received, Resent-Date and Date. If it cannot find any valid date in these headers, it will use the last-modified file timestamp on MH and Maildir format mailboxes, or the date on the From_ line on mbox-format mailboxes.
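The header precedence above can be sketched with the standard email package. delivery_time is a hypothetical name. The sketch assumes that the timestamp in a Received header follows its last ";", and the fallback (file mtime or From_ line date) is simply passed in by the caller.

```python
import email
from email.utils import mktime_tz, parsedate_tz

# Headers searched in order of precedence, as listed above.
DATE_HEADERS = ("Delivery-date", "Received", "Resent-Date", "Date")


def delivery_time(msg, fallback):
    """Return the first valid header date as a Unix timestamp, else *fallback*."""
    for name in DATE_HEADERS:
        for value in msg.get_all(name, []):
            if name == "Received":
                # Assumed layout: "from ... by ...; <date>".
                value = value.rpartition(";")[2]
            parsed = parsedate_tz(value.strip())
            if parsed is not None:
                return mktime_tz(parsed)
    return fallback


raw = ("Received: from relay.example.org by mx.example.org;\n"
       "\tTue, 23 Apr 2002 10:00:00 +0000\n"
       "Date: Mon, 1 Jan 2001 00:00:00 +0000\n"
       "\n"
       "body\n")
print(delivery_time(email.message_from_string(raw), fallback=0))
```

Here the Received timestamp wins over the older Date header because Received comes first in the precedence list.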
When archiving mailboxes with leading dots in the name, archivemail will strip the dots off the archive name, so that the resulting archive file is not hidden. This is not done if the --prefix or --archive-name option is used. If there really are mailboxes distinguished only by leading dots in the name, they will therefore be archived to the same archive file by default.
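The dot-stripping rule, and the name collision it can cause, fit in a few lines. archive_stem and its prefix_given flag are hypothetical. With --archive-name the mailbox name is not used at all, so the flag here only models the --prefix case.

```python
def archive_stem(mailbox_name: str, prefix_given: bool = False) -> str:
    """Part of the archive name derived from the mailbox name.

    Leading dots are stripped so the archive is not a hidden file,
    unless a prefix is in use.
    """
    if prefix_given:
        return mailbox_name
    return mailbox_name.lstrip(".")


print(archive_stem(".Sent"), archive_stem("Sent"))
```

Both calls return Sent, which is exactly the collision described above for mailboxes that differ only by leading dots.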
A conversion from other formats to mbox(5) will silently overwrite existing Status and X-Status message headers.
When archivemail processes an IMAP folder, all messages in that folder will have their \Recent flag unset, and they will probably not show up as “new” in your user agent later on. There is no way around this; it's just how IMAP works. This does not apply, however, if you run archivemail with the options --dry-run or --copy.
archivemail relies on server-side searches to determine the messages that should be archived. When matching message dates, IMAP servers refer to server-internal message dates, and these may differ from both the delivery time of a message and its Date header. Also, some broken servers do not implement server-side searches.
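To illustrate what a server-side search looks like, here is a hypothetical sketch that composes an RFC 3501 SEARCH string from options described on this page. It is not the filter string archivemail actually sends. BEFORE, SEEN, UNFLAGGED and LARGER are standard RFC 3501 search keys, and the mapping of options onto them is an assumption.

```python
from datetime import date
from typing import Optional

# RFC 3501 dates use English month abbreviations regardless of locale.
_MONTHS = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
           "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")


def imap_search_criteria(cutoff: date, preserve_unread: bool = False,
                         include_flagged: bool = False,
                         min_size: Optional[int] = None,
                         filter_append: Optional[str] = None) -> str:
    """Compose a SEARCH string selecting messages to archive (sketch)."""
    # BEFORE compares against the server's internal date, not the Date header.
    day = "%d-%s-%d" % (cutoff.day, _MONTHS[cutoff.month - 1], cutoff.year)
    criteria = ["BEFORE", day]
    if preserve_unread:
        criteria.append("SEEN")          # only messages with the \Seen flag
    if not include_flagged:
        criteria.append("UNFLAGGED")     # skip messages flagged important
    if min_size:
        # LARGER n means "size > n", so n = min_size - 1 gives "size >= min_size".
        criteria += ["LARGER", str(min_size - 1)]
    if filter_append:
        criteria.append(filter_append)   # raw text, in the spirit of --filter-append
    return " ".join(criteria)


print(imap_search_criteria(date(2002, 4, 23)))
```

With Python's imaplib, such a string could be passed as conn.search(None, criteria) on a selected mailbox.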
archivemail's IMAP URL parser was written with the RFC 2822 (Internet Message Format) rules for the local-part of email addresses in mind. So, rather than enforcing a URL-style encoding of non-ASCII and reserved characters, it allows you to double-quote the username and password. If your username or password contains the delimiter characters “@” or “:”, just quote it like this:
imap://"username@bogus.com":"password"@imap.bogus.com/mailbox
You can use a backslash to escape double-quotes that are part of a quoted username or password. Note that quoting only a substring will not work, and be aware that your shell will probably remove unprotected quotes or backslashes.
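The quoting rules above can be sketched as a tiny tokenizer. This is a hypothetical parser written for illustration, not archivemail's own. It handles optional double quotes and backslash escapes in the username and password, and it does not model the --pwfile case, where the colon is not treated as a delimiter.

```python
def _take_token(s: str, i: int, stops: str):
    """Read one possibly double-quoted token from s, starting at index i.

    A quoted token runs to the matching '"' and may contain backslash-escaped
    characters; an unquoted token ends at the first character in *stops*.
    Returns (token, index_after_token).
    """
    if i < len(s) and s[i] == '"':
        out = []
        i += 1
        while i < len(s):
            ch = s[i]
            if ch == "\\" and i + 1 < len(s):
                out.append(s[i + 1])  # escaped character, e.g. \"
                i += 2
            elif ch == '"':
                return "".join(out), i + 1
            else:
                out.append(ch)
                i += 1
        raise ValueError("unterminated quoted string")
    j = i
    while j < len(s) and s[j] not in stops:
        j += 1
    return s[i:j], j


def parse_imap_url(url: str):
    """Split an imap:// or imaps:// URL into (scheme, user, password, server, mailbox)."""
    scheme, sep, rest = url.partition("://")
    if not sep or scheme not in ("imap", "imaps"):
        raise ValueError("not an IMAP URL: %r" % url)
    user, i = _take_token(rest, 0, "@:")
    password = None
    if i < len(rest) and rest[i] == ":":
        password, i = _take_token(rest, i + 1, "@")
    if i >= len(rest) or rest[i] != "@":
        raise ValueError("expected '@' after the user name")
    server, _, mailbox = rest[i + 1:].partition("/")
    return scheme, user, password, server, mailbox


print(parse_imap_url('imap://"username@bogus.com":"password"@imap.bogus.com/mailbox'))
```

Because quotes are only recognized at the start of a token, a partially quoted username such as user"@x" is taken literally, which matches the warning that quoting only a substring will not work.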
Similarly, there is no need to percent-encode non-ASCII characters in IMAP mailbox names. As long as your locale is configured properly, archivemail should handle these without problems. Note, however, that due to limitations of the IMAP protocol, non-ASCII characters do not mix well with wildcards in mailbox names.
archivemail tries to be smart when handling mailbox paths. In particular, it will automatically add an IMAP NAMESPACE prefix to the mailbox path if necessary; and if you are archiving a subfolder, you can use the slash as a path separator instead of the IMAP server's internal representation.
To archive all messages in the mailbox debian-user that are older than 180 days to a compressed mailbox called debian-user_archive.gz in the current directory:
bash$ archivemail debian-user
To archive all messages in the mailbox debian-user that are older than 180 days to a compressed mailbox called debian-user_October_2001.gz (where the current month and year is April, 2002) in the current directory:
bash$ archivemail --suffix '_%B_%Y' debian-user
To archive all messages in the mailbox cm-melb that are older than the first of January 2002 to a compressed mailbox called cm-melb_archive.gz in the current directory:
bash$ archivemail --date='1 Jan 2002' cm-melb
Exactly the same as the above example, using an ISO date format instead:
bash$ archivemail --date=2002-01-01 cm-melb
To delete all messages in the mailbox spam that are older than 30 days:
bash$ archivemail --delete --days=30 spam
To archive all read messages in the mailbox incoming that are older than 180 days to a compressed mailbox called incoming_archive.gz in the current directory:
bash$ archivemail --preserve-unread incoming
To archive all messages in the mailbox received that are older than 180 days to an uncompressed mailbox called received_archive in the current directory:
bash$ archivemail --no-compress received
To archive all messages older than 90 days from every mailbox in the directory $HOME/Mail to compressed mailboxes in the $HOME/Mail/Archive directory:
bash$ archivemail -d90 -o $HOME/Mail/Archive $HOME/Mail/*
To archive all mail older than 180 days from the given IMAP INBOX to a compressed mailbox INBOX_archive.gz in the $HOME/Mail/Archive directory, quoting the password and reading it from the environment variable PASSWORD:
bash$ archivemail -o $HOME/Mail/Archive imaps://user:'"'$PASSWORD'"'@example.org/INBOX
Note the protected quotes.
To archive all mail older than 180 days in subfolders of foo on the given IMAP server to corresponding archives in the current working directory, reading the password from the file ~/imap-pass.txt:
bash$ archivemail --pwfile=$HOME/imap-pass.txt imaps://user@example.org/foo/*
Probably the best way to run archivemail is from your crontab(5) file, using the --quiet option. Don't forget to try the --dry-run and perhaps the --copy option for non-destructive testing.
If an IMAP mailbox path contains slashes, the archive filename will be derived from the basename of the mailbox. If the server's folder separator differs from the Unix slash and is used in the IMAP URL, however, the whole path will be considered the basename of the mailbox. For example, the two URLs imap://user@example.com/folder/subfolder and imap://user@example.com/folder.subfolder will be archived in subfolder_archive.gz and folder.subfolder_archive.gz, respectively, although they might refer to the same IMAP mailbox.
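The basename rule above can be reproduced in one line. imap_archive_filename is a hypothetical helper, and the sketch assumes the default _archive suffix and .gz compression. Only the Unix slash is treated as a separator, which is why a "."-separated path stays whole.

```python
def imap_archive_filename(mailbox_path: str, suffix: str = "_archive",
                          compress: bool = True) -> str:
    """Derive an archive filename from an IMAP mailbox path (sketch)."""
    # Only "/" splits the path; a server delimiter such as "." is kept.
    base = mailbox_path.rsplit("/", 1)[-1]
    return base + suffix + (".gz" if compress else "")


print(imap_archive_filename("folder/subfolder"))
print(imap_archive_filename("folder.subfolder"))
```

The two calls reproduce subfolder_archive.gz and folder.subfolder_archive.gz from the example above.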
archivemail does not support reading MMDF or Babyl-format mailboxes. In fact, it will probably think it is reading an mbox-format mailbox and cause all sorts of problems.
archivemail is still too slow, but if you are running from crontab(5) you won't care. Archiving maildir-format mailboxes should be a lot quicker than mbox-format mailboxes since it is less painful for the original mailbox to be reconstructed after selective message removal.
This manual page was written by Paul Rodger <paul at paulrodger dot com>. Updated and supplemented by Nikolaus Schulz <microschulz@web.de>.