More chiseling away at the replacement inbox filter.

author Art Cancro <ajc@citadel.org>

Sun, 23 Aug 2020 18:34:24 +0000 (14:34 -0400)

committer Art Cancro <ajc@citadel.org>

Sun, 23 Aug 2020 18:34:24 +0000 (14:34 -0400)
author Art Cancro <ajc@citadel.org>
Sun, 23 Aug 2020 18:34:24 +0000 (14:34 -0400)
committer Art Cancro <ajc@citadel.org>
Sun, 23 Aug 2020 18:34:24 +0000 (14:34 -0400)
diff --git a/citadel/docs/databaselayout.md b/citadel/docs/databaselayout.md

deleted file mode 100644 (file)

index 9c2cb56..0000000
--- a/citadel/docs/databaselayout.md
+++ /dev/null
@@ -1,450 +0,0 @@
-The totally incomplete guide to Citadel internals
------------------------------------------------------
------------------------------------------------------
-
-Citadel has evolved quite a bit since its early days, and the data structures
-have evolved with it.  This document provides a rough overview of how the
-system works internally.  For details you're going to have to dig through the
-code, but this'll get you started. 
-
-
-DATABASE TABLES
----------------
-As you probably already know by now, Citadel uses a group of tables stored
-with a record manager (usually Berkeley DB).  Since we're using a record
-manager rather than a relational database, all record structures are managed
-by Citadel.  Here are some of the tables we keep on disk:
-
-
-USER RECORDS
-------------
-This table contains all user records.  It's indexed by
-user name (translated to lower case for indexing purposes).  The records in
-this file look something like this:
-
-struct ctdluser {                       /* User record                      */
-        int version;                    /* Cit vers. which created this rec */
-        uid_t uid;                      /* Associate with a unix account?   */
-        char password[32];              /* password (for Citadel-only users)*/
-        unsigned flags;                 /* See US_ flags below              */
-        long timescalled;               /* Total number of logins           */
-        long posted;                    /* Number of messages posted (ever) */
-        CIT_UBYTE axlevel;              /* Access level                     */
-        long usernum;                   /* User number (never recycled)     */
-        time_t lastcall;                /* Last time the user called        */
-        int USuserpurge;                /* Purge time (in days) for user    */
-        char fullname[64];              /* Name for Citadel messages & mail */
-};
- 
- Most fields here should be fairly self-explanatory.  The ones that might
-deserve some attention are:
-
- - *uid* -- if uid is not the same as the *unix uid* Citadel is running as, then the
-   account is assumed to belong to the user on the underlying Unix system with
-   that uid.  This allows us to require the user's OS password instead of having
-   a separate Citadel password.
- 
- - *usernum* -- these are assigned sequentially, and **NEVER REUSED**. This is
-  important because it allows us to use this number in other data structures
-  without having to worry about users being added/removed later on, as you'll
-  see later in this document.
- 
- 
-ROOM RECORDS
-------------
-These are room records.  There is a room record for every room on the
-system, public or private or mailbox.  It's indexed by room name (also in
-lower case for easy indexing) and it contains records which look like this:
-
-    struct ctdlroom {
-        char QRname[ROOMNAMELEN];       /* Name of room                     */
-        char QRpasswd[10];              /* Only valid if it's a private rm  */
-        long QRroomaide;                /* User number of room aide         */
-        long QRhighest;                 /* Highest message NUMBER in room   */
-        time_t QRgen;                   /* Generation number of room        */
-        unsigned QRflags;               /* See flag values below            */
-        char QRdirname[15];             /* Directory name, if applicable    */
-        long QRinfo;                    /* Info file update relative to msgs*/
-        char QRfloor;                   /* Which floor this room is on      */
-        time_t QRmtime;                 /* Date/time of last post           */
-        struct ExpirePolicy QRep;       /* Message expiration policy        */
-        long QRnumber;                  /* Globally unique room number      */
-        char QRorder;                   /* Sort key for room listing order  */
-        unsigned QRflags2;              /* Additional flags                 */
-        int QRdefaultview;              /* How to display the contents      */
-    };
-
-Again, mostly self-explanatory.  Here are the interesting ones:
- 
-*QRnumber* is a globally unique room ID, while QRgen is the "generation number"
-of the room (it's actually a timestamp).  The two combined produce a unique
-value which identifies the room.  The reason for two separate fields will be
-explained below when we discuss the visit table.  For now just remember that
-*QRnumber* remains the same for the duration of the room's existence, and QRgen
-is timestamped once during room creation but may be restamped later on when
-certain circumstances exist.
-
-FLOORTAB
---------
-Floors.  This is so simplistic it's not worth going into detail about, except
-to note that we keep a reference count of the number of rooms on each floor.
- 
-MSGLISTS
---------
-Each record in this table consists of a bunch of message  numbers
-which represent the contents of a room.  A message can exist in more than one
-room (for example, a mail message with multiple recipients -- 'single instance
-store').  This table is never, ever traversed in its entirety.  When you do
-any type of read operation, it fetches the msglist for the room you're in
-(using the room's ID as the index key) and then you can go ahead and read
-those messages one by one.
-
-Each room is basically just a list of message numbers.  Each time
-we enter a new message in a room, its message number is appended to the end
-of the list.  If an old message is to be expired, we must delete it from the
-message base.  Reading a room is just a matter of looking up the messages
-one by one and sending them to the client for display, printing, or whatever.
- 
-
-VISIT
------
-This is the tough one.  Put on your thinking cap and grab a fresh cup of
-coffee before attempting to grok the visit table.
- 
-This table contains records which establish the relationship between users
-and rooms.  Its index is a hash of the user and room combination in question.
-When looking for such a relationship, the record in this table can tell the
-server things like "this user has zapped this room," "this user has access to
-this private room," etc.  It's also where we keep track of which messages
-the user has marked as "old" and which are "new" (which are not necessarily
-contiguous; contrast with older Citadel implementations which simply kept a
-"last read" pointer).
- 
-
-Here's what the records look like:
- 
-    struct visit {
-        long v_roomnum;
-        long v_roomgen;
-        long v_usernum;
-        long v_lastseen;
-        unsigned int v_flags;
-        char v_seen[SIZ];
-        int v_view;
-    };
-
-    #define V_FORGET        1       /* User has zapped this room        */
-    #define V_LOCKOUT       2       /* User is locked out of this room  */
-    #define V_ACCESS        4       /* Access is granted to this room   */
- 
-This table is indexed by a concatenation of the first three fields.  Whenever
-we want to learn the relationship between a user and a room, we feed that
-data to a function which looks up the corresponding record.  The record is
-designed in such a way that an "all zeroes" record (which is what you get if
-the record isn't found) represents the default relationship.
- 
-With this data, we now know which private rooms we're allowed to visit: if
-the *V_ACCESS* bit is set, the room is one which the user knows, and it may
-appear in his/her known rooms list.  Conversely, we also know which rooms the
-user has zapped: if the *V_FORGET* flag is set, we relegate the room to the
-zapped list and don't bring it up during new message searches.  It's also
-worth noting that the *V_LOCKOUT* flag works in a similar way to administratively
-lock users out of rooms.
- 
-Implementing the "cause all users to forget room" command, then, becomes very
-simple: we simply change the generation number of the room by putting a new
-timestamp in the *QRgen* field.  This causes all relevant visit records to
-become irrelevant, because they appear to point to a different room.  At the
-same time, we don't lose the messages in the room, because the msglists table
-is indexed by the room number (*QRnumber*), which never changes.
- 
-*v_seen* contains a string which represents the set of messages in this room
-which the user has read (marked as 'seen' or 'old').  It follows the same
-syntax used by IMAP and NNTP.  When we search for new messages, we simply
-return any messages that are in the room that are **not** represented by this
-set.  Naturally, when we do want to mark more messages as seen (or unmark
-them), we change this string.  Citadel BBS client implementations are naive
-and think linearly in terms of "everything is old up to this point," but IMAP
-clients want to have more granularity.
-
-
-DIRECTORY
----------
-This table simply maps Internet e-mail addresses to Citadel network addresses
-for quick lookup.  It is generated from data in the Global Address Book room.
-
-USETABLE
---------
-This table keeps track of message ID's of messages arriving over a network,
-to prevent duplicates from being posted if someone misconfigures the network
-and a loop is created.  This table goes unused on a non-networked Citadel.
-
-THE MESSAGE STORE
------------------
-This is where all message text is stored.  It's indexed by message number:
-give it a number, get back a message.  Messages are numbered sequentially, and
-the message numbers are never reused.
- 
-We also keep a "metadata" record for each message.  This record is also stored
-in the msgmain table, using the index (0 - msgnum).  We keep in the metadata
-record, among other things, a reference count for each message.  Since a
-message may exist in more than one room, it's important to keep this reference
-count up to date, and to delete the message from disk when the reference count
-reaches zero.
- 
-#Here's the format for the message itself:
-
- - Each message begins with an 0xFF 'start of message' byte.
- 
- - The next byte denotes whether this is an anonymous message.  The codes
-   available are *MES_NORMAL*, *MES_ANON*, or *MES_AN2* (defined in citadel.h).
- 
- - The third byte is a "message type" code.  The following codes are defined:
-  - 0 - "Traditional" Citadel format.  Message is to be displayed "formatted."
-  - 1 - Plain pre-formatted ASCII text (otherwise known as text/plain)
-  - 4 - MIME formatted message.  The text of the message which follows is
-        expected to begin with a "Content-type:" header.
- 
- - After these three opening bytes, the remainder of
-   the message consists of a sequence of character strings.  Each string
-   begins with a type byte indicating the meaning of the string and is
-   ended with a null.  All strings are printable ASCII: in particular,
-   all numbers are in ASCII rather than binary.  This is for simplicity,
-   both in implementing the system and in implementing other code to
-   work with the system.  For instance, a database driven off Citadel archives
-   can do wildcard matching without worrying about unpacking binary data such
-   as message ID's first.  To provide later downward compatability
-   all software should be written to IGNORE fields not currently defined.
-
-
-#The type bytes currently defined are:
-
-
-| BYTE  |       Enum        | NW   | Mnemonic       |  Enum / Comments
-|-------|-------------------|------|----------------|---------------------------------------------------------
-| A     |    eAuthor        | from | Author         |  *eAuthor*
-|       |                   |      |                |  Name of originator of message.
-| B     |    eBig\_message  |      | Big message    |  *eBig\_message*
-|       |                   |      |                |  This is a flag which indicates that the message is
-|       |                   |      |                |  big, and Citadel is storing the body in a separate
-|       |                   |      |                |  record.  You will never see this field because the
-|       |                   |      |                |  internal API handles it.
-| C     |    eRemoteRoom    |      | RemoteRoom     |  *eRemoteRoom*
-|       |                   |      |                |  when sent via Citadel Networking, this is the room
-|       |                   |      |                |  its going to be put on the remote site.
-| D     |    eDestination   |      | Destination    |  *eDestination*
-|       |                   |      |                |  Contains name of the system this message should
-|       |                   |      |                |  be sent to, for mail routing (private mail only).
-| E     |    eExclusiveID   | exti | Exclusive ID   |  *eExclusiveID*
-|       |                   |      |                |  A persistent alphanumeric Message ID used for
-|       |                   |      |                |  network replication.  When a message arrives that
-|       |                   |      |                |  contains an Exclusive ID, any existing messages which
-|       |                   |      |                |  contain the same Exclusive ID and are *older* than this
-|       |                   |      |                |  message should be deleted.  If there exist any messages
-|       |                   |      |                |  with the same Exclusive ID that are *newer*, then this
-|       |                   |      |                |  message should be dropped.
-| F     |    erFc822Addr    | rfca | rFc822 address |  *erFc822Addr*
-|       |                   |      |                |  For Internet mail, this is the delivery address of the
-|       |                   |      |                |  message author.
-| H     |    eHumanNode     | hnod | Human node name|  *eHumanNode*
-|       |                   |      |                |  Human-readable name of system message originated on.
-| I     |    emessageId     | msgn | Message ID     |  *emessageId*
-|       |                   |      |                |  An RFC822-compatible message ID for this message.
-| J     |    eJournal       | jrnl | Journal        |  *eJournal*
-|       |                   |      |                |  The presence of this field indicates that the message
-|       |                   |      |                |  is disqualified from being journaled, perhaps because
-|       |                   |      |                |  it is itself a journalized message and we wish to
-|       |                   |      |                |  avoid double journaling.
-| K     |    eReplyTo       | rep2 | Reply-To       |  *eReplyTo*
-|       |                   |      |                |  the Reply-To header for mailinglist outbound messages
-| L     |    eListID        | list | List-ID        |  *eListID*
-|       |                   |      |                |  Mailing list identification, as per RFC 2919
-| M     |    eMesageText    | text | Message Text   |  *eMesageText*
-|       |                   |      |                |  Normal ASCII, newlines seperated by CR's or LF's,
-|       |                   |      |                |  null terminated as always.
-| N     |    eNodeName      | node | Nodename       |  *eNodeName*
-|       |                   |      |                |  Contains node name of system message originated on.
-| O     |    eOriginalRoom  | room | Room           |  *eOriginalRoom* - Room of origin.
-| P     |    eMessagePath   | path | Path           |  *eMessagePath*
-|       |                   |      |                |  Complete path of message, as in the UseNet news
-|       |                   |      |                |  standard.  A user should be able to send Internet mail
-|       |                   |      |                |  to this path. (Note that your system name will not be
-|       |                   |      |                |  tacked onto this until you're sending the message to
-|       |                   |      |                |  someone else)
-| R     |    eRecipient     | rcpt | Recipient      |  *eRecipient* - Only present in Mail messages.
-| S     |    eSpecialField  | spec | Special field  |  *eSpecialField*
-|       |                   |      |                |  Only meaningful for messages being spooled over a
-|       |                   |      |                |  network.  Usually means that the message isn't really
-|       |                   |      |                |  a message, but rather some other network function:
-|       |                   |      |                |  -> "S" followed by "FILE" (followed by a null, of
-|       |                   |      |                |     course) means that the message text is actually an
-|       |                   |      |                |     IGnet/Open file transfer.  (OBSOLETE)
-|       |                   |      |                |  -> "S" followed by "CANCEL" means that this message
-|       |                   |      |                |     should be deleted from the local message base once
-|       |                   |      |                |     it has been replicated to all network systems.
-| T     |    eTimestamp     | time | date/Time      |  *eTimestamp*
-|       |                   |      |                |  Unix timestamp containing the creation date/time of
-|       |                   |      |                |  the message.
-| U     |    eMsgSubject    | subj | sUbject        |  *eMsgSubject* - Optional.
-|       |                   |      |                |  Developers may choose whether they wish to
-|       |                   |      |                |  generate or display subject fields.
-| V     |    eenVelopeTo    | nvto | enVelope-to    |  *eenVelopeTo*
-|       |                   |      |                |  The recipient specified in incoming SMTP messages.
-| W     |    eWeferences    | wefw | Wefewences     |  *eWeferences*
-|       |                   |      |                |  Previous message ID's for conversation threading.  When
-|       |                   |      |                |  converting from RFC822 we use References: if present, or
-|       |                   |      |                |  In-Reply-To: otherwise.
-|       |                   |      |                |  (Who in extnotify spool messages which don't need to know
-|       |                   |      |                |  other message ids)
-| Y     |    eCarbonCopY    | cccc | carbon copY    |  *eCarbonCopY*
-|       |                   |      |                |  Optional, and only in Mail messages.
-| %     |    eHeaderOnly    | nhdr | oNlyHeader     |  we will just be sending headers. for the Wire protocol only.
-| %     |    eFormatType    | type | type           |  type of citadel message: (Wire protocol only)
-|       |                   |      |                |     FMT\_CITADEL     0   Citadel vari-format (proprietary) 
-|       |                   |      |                |     FMT\_FIXED       1   Fixed format (proprietary)
-|       |                   |      |                |     FMT\_RFC822      4   Standard (headers are in M field)
-| %     |    eMessagePart   | part | emessagePart   |  *eMessagePart* is the id of this part in the mime hierachy
-| %     |       eSubFolder     | suff | eSubFolder     |  descend into a mime sub container
-| %     |       ePevious       | pref | ePevious       |  exit a mime sub container
-| 0     |    eErrorMsg      |      | Error          |  *eErrorMsg*
-|       |                   |      |                |  This field is typically never found in a message on
-|       |                   |      |                |  disk or in transit.  Message scanning modules are
-|       |                   |      |                |  expected to fill in this field when rejecting a message
-|       |                   |      |                |  with an explanation as to what happened (virus found,
-|       |                   |      |                |  message looks like spam, etc.)
-| 1     |    eSuppressIdx   |      | suppress index |  *eSuppressIdx*
-|       |                   |      |                |  The presence of this field indicates that the message is
-|       |                   |      |                |  disqualified from being added to the full text index.
-| 2     |    eExtnotify     |      | extnotify      |  *eExtnotify* - Used internally by the serv_extnotify module.
-| 3     |    eVltMsgNum     |      | msgnum         |  *eVltMsgNum*
-|       |                   |      |                |  Used internally to pass the local message number in the
-|       |                   |      |                |  database to after-save hooks.  Discarded afterwards.
-
-EXAMPLE
--------
-Let *<FF>* be a *0xFF* byte, and *<0>* be a null *(0x00)* byte.  Then a message
-which prints as...
-
-    Apr 12, 1988 23:16 From Test User In Network Test> @lifesys (Life Central)
-    Have a nice day!
-
-might be stored as...
-
-    <FF><40><0>I12345<0>Pneighbor!lifesys!test_user<0>T576918988<0>    (continued)
-    -----------|Mesg ID#|--Message Path---------------|--Date------
-    
-    AThe Test User<0>ONetwork Test<0>Nlifesys<0>HLife Central<0>MHave a nice day!<0>
-    |-----Author-----|-Room name-----|-nodename-|Human Name-|--Message text-----
-
-Weird things can happen if fields are missing, especially if you use the
-networker.  But basically, the date, author, room, and nodename may be in any
-order.  But the leading fields and the message text must remain in the same
-place.  The H field looks better when it is placed immediately after the N
-field.
-
-
-EUID (EXCLUSIVE MESSAGE ID'S)
------------------------------
-This is where the groupware magic happens.  Any message in any room may have
-a field called the Exclusive message *ID*, or *EUID*.  We keep an index in the
-table *CDB_EUIDINDEX* which knows the message number of any item that has an
-*EUID*.  This allows us to do two things:
- 
- - If a subsequent message arrives with the same *EUID*, it automatically
-   *deletes* the existing one, because the new one is considered a replacement
-   for the existing one.
- - If we know the *EUID* of the item we're looking for, we can fetch it by *EUID*
-   and get the most up-to-date version, even if it's been updated several times.
-
-This functionality is made more useful by server-side hooks.  For example,
-when we save a vCard to an address book room, or an iCalendar item to a
-calendar room, our server modules detect this condition, and automatically set
-the *EUID* of the message to the *UUID* of the *vCard* or *iCalendar* item.
-Therefore when you save an updated version of an address book entry or
-a calendar item, the old one is automatically deleted.
-
-NETWORKING (REPLICATION)
-------------------------
-Citadel nodes network by sharing one or more rooms. Any Citadel node
-can choose to share messages with any other Citadel node, through the sending
-of spool files.  The sending system takes all messages it hasn't sent yet, and
-spools them to the recieving system, which posts them in the rooms.
-
-The *EUID* discussion above is extremely relevant, because *EUID* is carried over
-the network as well, and the replacement rules are followed over the network
-as well.  Therefore, when a message containing an *EUID* is saved in a networked
-room, it replaces any existing message with the same *EUID* *on every node in
-the network*.
-
-Complexities arise primarily from the possibility of densely connected
-networks: one does not wish to accumulate multiple copies of a given
-message, which can easily happen.  Nor does one want to see old messages
-percolating indefinitely through the system.
-
-This problem is handled by keeping track of the path a message has taken over
-the network, like the UseNet news system does.  When a system sends out a
-message, it adds its own name to the bang-path in the *<P>* field of the
-message.  If no path field is present, it generates one.
-   
-With the path present, all the networker has to do to assure that it doesn't
-send another system a message it's already received is check the <P>ath field
-for that system's name somewhere in the bang path.  If it's present, the system
-has already seen the message, so we don't send it.
-
-We also keep a small database, called the "use table," containing the ID's of
-all messages we've seen recently.  If the same message arrives a second or
-subsequent time, we will find its ID in the use table, indicating that we
-already have a copy of that message.  It will therefore be discarded.
-
-The above discussion should make the function of the fields reasonably clear:
-
- o  Travelling messages need to carry original message-id, system of origin,
-    date of origin, author, and path with them, to keep reproduction and
-    cycling under control.
-
-(Uncoincidentally) the format used to transmit messages for networking
-purposes is precisely that used on disk, serialized.  The current
-distribution includes serv_network.c, which is basically a database replicator;
-please see network.txt on its operation and functionality (if any).
-
-PORTABILITY ISSUES
-------------------
-Citadel is 64-bit clean, architecture-independent, and Year 2000
-compliant.  The software should compile on any POSIX compliant system with
-a full pthreads implementation and TCP/IP support.  In the future we may
-try to port it to non-POSIX systems as well.
-
-On the client side, it's also POSIX compliant.  The client even seems to
-build ok on non-POSIX systems with porting libraries (such as Cygwin).
-
-SUPPORTING PRIVATE MAIL
------------------------
-Can one have an elegant kludge?  This must come pretty close.
-
-Private mail is sent and recieved in the *Mail>* room, which otherwise
-behaves pretty much as any other room.        To make this work, we have a
-separate Mail> room for each user behind the scenes.  The actual room name
-in the database looks like *"0000001234.Mail"* (where *'1234'* is the user
-number) and it's flagged with the *QR_MAILBOX* flag.  The user number is
-stripped off by the server before the name is presented to the client.  This
-provides the ability to give each user a separate namespace for mailboxes
-and personal rooms.
-
-This requires a little fiddling to get things just right. For example,
-*make_message()* has to be kludged to ask for the name of the recipient
-of the message whenever a message is entered in *Mail>*. But basically
-it works pretty well, keeping the code and user interface simple and
-regular.
-
-PASSWORDS AND NAME VALIDATION
------------------------------
-This has changed a couple of times over the course of Citadel's history.  At
-this point it's very simple, again due to the fact that record managers are
-used for everything.    The user file (user) is indexed using the user's
-name, converted to all lower-case.  Searching for a user, then, is easy.  We
-just lowercase the name we're looking for and query the database.  If no
-match is found, it is assumed that the user does not exist.
-
-This makes it difficult to forge messages from an existing user.  (Fine
-point: nonprinting characters are converted to printing characters, and
-leading, trailing, and double blanks are deleted.)
diff --git a/citadel/docs/databaselayout.txt b/citadel/docs/databaselayout.txt

new file mode 100644 (file)

index 0000000..7dbcf1d
--- /dev/null
+++ b/citadel/docs/databaselayout.txt
@@ -0,0 +1,415 @@
+The totally incomplete guide to Citadel internals
+-----------------------------------------------------
+
+Citadel has evolved quite a bit since its early days, and the data structures
+have evolved with it.  This document provides a rough overview of how the
+system works internally.  For details you're going to have to dig through the
+code, but this'll get you started. 
+
+
+DATABASE TABLES
+---------------
+As you probably already know by now, Citadel uses a group of tables stored
+with a record manager (usually Berkeley DB).  Since we're using a record
+manager rather than a relational database, all record structures are managed
+by Citadel.  Here are some of the tables we keep on disk:
+
+
+USER RECORDS
+------------
+This table contains all user records.  It's indexed by
+user name (translated to lower case for indexing purposes).  The records in
+this file look something like this:
+
+struct ctdluser {                       /* User record                      */
+        int version;                    /* Cit vers. which created this rec */
+        uid_t uid;                      /* Associate with a unix account?   */
+        char password[32];              /* password (for Citadel-only users)*/
+        unsigned flags;                 /* See US_ flags below              */
+        long timescalled;               /* Total number of logins           */
+        long posted;                    /* Number of messages posted (ever) */
+        CIT_UBYTE axlevel;              /* Access level                     */
+        long usernum;                   /* User number (never recycled)     */
+        time_t lastcall;                /* Last time the user called        */
+        int USuserpurge;                /* Purge time (in days) for user    */
+        char fullname[64];              /* Name for Citadel messages & mail */
+};
+ 
+ Most fields here should be fairly self-explanatory.  The ones that might
+deserve some attention are:
+
+ - *uid* -- if uid is not the same as the *unix uid* Citadel is running as, then the
+   account is assumed to belong to the user on the underlying Unix system with
+   that uid.  This allows us to require the user's OS password instead of having
+   a separate Citadel password.
+ 
+ - *usernum* -- these are assigned sequentially, and **NEVER REUSED**. This is
+  important because it allows us to use this number in other data structures
+  without having to worry about users being added/removed later on, as you'll
+  see later in this document.
+ 
+ 
+ROOM RECORDS
+------------
+These are room records.  There is a room record for every room on the
+system, public or private or mailbox.  It's indexed by room name (also in
+lower case for easy indexing) and it contains records which look like this:
+
+    struct ctdlroom {
+        char QRname[ROOMNAMELEN];       /* Name of room                     */
+        char QRpasswd[10];              /* Only valid if it's a private rm  */
+        long QRroomaide;                /* User number of room aide         */
+        long QRhighest;                 /* Highest message NUMBER in room   */
+        time_t QRgen;                   /* Generation number of room        */
+        unsigned QRflags;               /* See flag values below            */
+        char QRdirname[15];             /* Directory name, if applicable    */
+        long QRinfo;                    /* Info file update relative to msgs*/
+        char QRfloor;                   /* Which floor this room is on      */
+        time_t QRmtime;                 /* Date/time of last post           */
+        struct ExpirePolicy QRep;       /* Message expiration policy        */
+        long QRnumber;                  /* Globally unique room number      */
+        char QRorder;                   /* Sort key for room listing order  */
+        unsigned QRflags2;              /* Additional flags                 */
+        int QRdefaultview;              /* How to display the contents      */
+    };
+
+Again, mostly self-explanatory.  Here are the interesting ones:
+ 
+*QRnumber* is a globally unique room ID, while QRgen is the "generation number"
+of the room (it's actually a timestamp).  The two combined produce a unique
+value which identifies the room.  The reason for two separate fields will be
+explained below when we discuss the visit table.  For now just remember that
+*QRnumber* remains the same for the duration of the room's existence, and QRgen
+is timestamped once during room creation but may be restamped later on when
+certain circumstances exist.
+
+FLOORTAB
+--------
+Floors.  This is so simplistic it's not worth going into detail about, except
+to note that we keep a reference count of the number of rooms on each floor.
+ 
+MSGLISTS
+--------
+Each record in this table consists of a bunch of message  numbers
+which represent the contents of a room.  A message can exist in more than one
+room (for example, a mail message with multiple recipients -- 'single instance
+store').  This table is never, ever traversed in its entirety.  When you do
+any type of read operation, it fetches the msglist for the room you're in
+(using the room's ID as the index key) and then you can go ahead and read
+those messages one by one.
+
+Each room is basically just a list of message numbers.  Each time
+we enter a new message in a room, its message number is appended to the end
+of the list.  If an old message is to be expired, we must delete it from the
+message base.  Reading a room is just a matter of looking up the messages
+one by one and sending them to the client for display, printing, or whatever.
+ 
+
+VISIT
+-----
+This is the tough one.  Put on your thinking cap and grab a fresh cup of
+coffee before attempting to grok the visit table.
+ 
+This table contains records which establish the relationship between users
+and rooms.  Its index is a hash of the user and room combination in question.
+When looking for such a relationship, the record in this table can tell the
+server things like "this user has zapped this room," "this user has access to
+this private room," etc.  It's also where we keep track of which messages
+the user has marked as "old" and which are "new" (which are not necessarily
+contiguous; contrast with older Citadel implementations which simply kept a
+"last read" pointer).
+ 
+
+Here's what the records look like:
+ 
+    struct visit {
+        long v_roomnum;
+        long v_roomgen;
+        long v_usernum;
+        long v_lastseen;
+        unsigned int v_flags;
+        char v_seen[SIZ];
+        int v_view;
+    };
+
+    #define V_FORGET        1       /* User has zapped this room        */
+    #define V_LOCKOUT       2       /* User is locked out of this room  */
+    #define V_ACCESS        4       /* Access is granted to this room   */
+ 
+This table is indexed by a concatenation of the first three fields.  Whenever
+we want to learn the relationship between a user and a room, we feed that
+data to a function which looks up the corresponding record.  The record is
+designed in such a way that an "all zeroes" record (which is what you get if
+the record isn't found) represents the default relationship.
+ 
+With this data, we now know which private rooms we're allowed to visit: if
+the *V_ACCESS* bit is set, the room is one which the user knows, and it may
+appear in his/her known rooms list.  Conversely, we also know which rooms the
+user has zapped: if the *V_FORGET* flag is set, we relegate the room to the
+zapped list and don't bring it up during new message searches.  It's also
+worth noting that the *V_LOCKOUT* flag works in a similar way to administratively
+lock users out of rooms.
+ 
+Implementing the "cause all users to forget room" command, then, becomes very
+simple: we simply change the generation number of the room by putting a new
+timestamp in the *QRgen* field.  This causes all relevant visit records to
+become irrelevant, because they appear to point to a different room.  At the
+same time, we don't lose the messages in the room, because the msglists table
+is indexed by the room number (*QRnumber*), which never changes.
+ 
+*v_seen* contains a string which represents the set of messages in this room
+which the user has read (marked as 'seen' or 'old').  It follows the same
+syntax used by IMAP and NNTP.  When we search for new messages, we simply
+return any messages that are in the room that are **not** represented by this
+set.  Naturally, when we do want to mark more messages as seen (or unmark
+them), we change this string.  Citadel BBS client implementations are naive
+and think linearly in terms of "everything is old up to this point," but IMAP
+clients want to have more granularity.
+
+
+DIRECTORY
+---------
+This table simply maps Internet e-mail addresses to Citadel network addresses
+for quick lookup.  It is generated from data in the Global Address Book room.
+
+USETABLE
+--------
+This table keeps track of message ID's of messages arriving over a network,
+to prevent duplicates from being posted if someone misconfigures the network
+and a loop is created.  This table goes unused on a non-networked Citadel.
+
+THE MESSAGE STORE
+-----------------
+This is where all message text is stored.  It's indexed by message number:
+give it a number, get back a message.  Messages are numbered sequentially, and
+the message numbers are never reused.
+ 
+We also keep a "metadata" record for each message.  This record is also stored
+in the msgmain table, using the index (0 - msgnum).  We keep in the metadata
+record, among other things, a reference count for each message.  Since a
+message may exist in more than one room, it's important to keep this reference
+count up to date, and to delete the message from disk when the reference count
+reaches zero.
+ 
+#Here's the format for the message itself:
+
+ - Each message begins with an 0xFF 'start of message' byte.
+ 
+ - The next byte denotes whether this is an anonymous message.  The codes
+   available are *MES_NORMAL*, *MES_ANON*, or *MES_AN2* (defined in citadel.h).
+ 
+ - The third byte is a "message type" code.  The following codes are defined:
+  - 0 - "Traditional" Citadel format.  Message is to be displayed "formatted."
+  - 1 - Plain pre-formatted ASCII text (otherwise known as text/plain)
+  - 4 - MIME formatted message.  The text of the message which follows is
+        expected to begin with a "Content-type:" header.
+ 
+ - After these three opening bytes, the remainder of
+   the message consists of a sequence of character strings.  Each string
+   begins with a type byte indicating the meaning of the string and is
+   ended with a null.  All strings are printable ASCII: in particular,
+   all numbers are in ASCII rather than binary.  This is for simplicity,
+   both in implementing the system and in implementing other code to
+   work with the system.  For instance, a database driven off Citadel archives
+   can do wildcard matching without worrying about unpacking binary data such
+   as message ID's first.  To provide later downward compatability
+   all software should be written to IGNORE fields not currently defined.
+
+
+#The type bytes currently defined are:
+
+
+| BYTE  |       Enum        | NW   | Mnemonic       |  Enum / Comments
+|-------|-------------------|------|----------------|---------------------------------------------------------
+| A     |    eAuthor        | from | Author         |  Name of originator of the message.
+| B     |    eBig_message   |      | Big message    |  This is a flag which indicates that the message is
+|       |                   |      |                |  big, and Citadel is storing the body in a separate
+|       |                   |      |                |  record.  You will never see this field because the
+|       |                   |      |                |  internal API handles it.
+| E     |    eExclusiveID   | exti | Exclusive ID   |  A persistent alphanumeric Message ID used for
+|       |                   |      |                |  replication control.  When a message arrives that
+|       |                   |      |                |  contains an Exclusive ID, any existing messages which
+|       |                   |      |                |  contain the same Exclusive ID and are *older* than this
+|       |                   |      |                |  message should be deleted.  If there exist any messages
+|       |                   |      |                |  with the same Exclusive ID that are *newer*, then this
+|       |                   |      |                |  message should be dropped.
+| F     |    erFc822Addr    | rfca | rFc822 address |  For Internet mail, this is the delivery address of the
+|       |                   |      |                |  message author.
+| I     |    emessageId     | msgn | Message ID     |  An RFC822-compatible message ID for this message.
+|       |                   |      |                |  
+| J     |    eJournal       | jrnl | Journal        |  The presence of this field indicates that the message
+|       |                   |      |                |  is disqualified from being journaled, perhaps because
+|       |                   |      |                |  it is itself a journalized message and we wish to
+|       |                   |      |                |  avoid double journaling.
+| K     |    eReplyTo       | rep2 | Reply-To       |  the Reply-To header for mailinglist outbound messages
+| L     |    eListID        | list | List-ID        |  Mailing list identification, as per RFC 2919
+| M     |    eMesageText    | text | Message Text   |  Normal ASCII, newlines seperated by CR's or LF's,
+|       |                   |      |                |  null terminated as always.
+| O     |    eOriginalRoom  | room | Room           |  Room of origin.
+| P     |    eMessagePath   | path | Path           |  Complete path of message, as in the UseNet news
+|       |                   |      |                |  standard.  A user should be able to send Internet mail
+|       |                   |      |                |  to this path. (Note that your system name will not be
+|       |                   |      |                |  tacked onto this until you're sending the message to
+|       |                   |      |                |  someone else)
+| R     |    eRecipient     | rcpt | Recipient      |  Only present in Mail messages.
+| T     |    eTimestamp     | time | date/Time      |  Unix timestamp containing the creation date/time of
+|       |                   |      |                |  the message.
+| U     |    eMsgSubject    | subj | sUbject        |  Message subject.  Optional.
+|       |                   |      |                |  Developers may choose whether they wish to
+|       |                   |      |                |  generate or display subject fields.
+| V     |    eenVelopeTo    | nvto | enVelope-to    |  The recipient specified in incoming SMTP messages.
+| W     |    eWeferences    | wefw | Wefewences     |  Previous message ID's for conversation threading.  When
+|       |                   |      |                |  converting from RFC822 we use References: if present, or
+|       |                   |      |                |  In-Reply-To: otherwise.
+|       |                   |      |                |  (Who in extnotify spool messages which don't need to know
+|       |                   |      |                |  other message ids)
+| Y     |    eCarbonCopY    | cccc | carbon copY    |  Carbon copy (CC) recipients.
+|       |                   |      |                |  Optional, and only in Mail messages.
+| %     |    eHeaderOnly    | nhdr | oNlyHeader     |  we will just be sending headers. for the Wire protocol only.
+| %     |    eFormatType    | type | type           |  type of citadel message: (Wire protocol only)
+|       |                   |      |                |     FMT\_CITADEL     0   Citadel vari-format (proprietary) 
+|       |                   |      |                |     FMT\_FIXED       1   Fixed format (proprietary)
+|       |                   |      |                |     FMT\_RFC822      4   Standard (headers are in M field)
+| %     |    eMessagePart   | part | emessagePart   |  eMessagePart is the id of this part in the mime hierachy
+| %     |       eSubFolder     | suff | eSubFolder     |  descend into a mime sub container
+| %     |       ePevious       | pref | ePevious       |  exit a mime sub container
+| 0     |    eErrorMsg      |      | Error          |  This field is typically never found in a message on
+|       |                   |      |                |  disk or in transit.  Message scanning modules are
+|       |                   |      |                |  expected to fill in this field when rejecting a message
+|       |                   |      |                |  with an explanation as to what happened (virus found,
+|       |                   |      |                |  message looks like spam, etc.)
+| 1     |    eSuppressIdx   |      | suppress index |  The presence of this field indicates that the message is
+|       |                   |      |                |  disqualified from being added to the full text index.
+| 2     |    eExtnotify     |      | extnotify      |  Used internally by the serv_extnotify module.
+| 3     |    eVltMsgNum     |      | msgnum         |  Used internally to pass the local message number in the
+|       |                   |      |                |  database to after-save hooks.  Discarded afterwards.
+
+EXAMPLE
+-------
+Let *<FF>* be a *0xFF* byte, and *<0>* be a null *(0x00)* byte.  Then a message
+which prints as...
+
+    Apr 12, 1988 23:16 From Test User In Network Test> @lifesys (Life Central)
+    Have a nice day!
+
+might be stored as...
+
+    <FF><40><0>I12345<0>Pneighbor!lifesys!test_user<0>T576918988<0>    (continued)
+    -----------|Mesg ID#|--Message Path---------------|--Date------
+    
+    AThe Test User<0>ONetwork Test<0>Nlifesys<0>HLife Central<0>MHave a nice day!<0>
+    |-----Author-----|-Room name-----|-nodename-|Human Name-|--Message text-----
+
+Weird things can happen if fields are missing, especially if you use the
+networker.  But basically, the date, author, room, and nodename may be in any
+order.  But the leading fields and the message text must remain in the same
+place.  The H field looks better when it is placed immediately after the N
+field.
+
+
+EUID (EXCLUSIVE MESSAGE ID'S)
+-----------------------------
+This is where the groupware magic happens.  Any message in any room may have
+a field called the Exclusive message *ID*, or *EUID*.  We keep an index in the
+table *CDB_EUIDINDEX* which knows the message number of any item that has an
+*EUID*.  This allows us to do two things:
+ 
+ - If a subsequent message arrives with the same *EUID*, it automatically
+   *deletes* the existing one, because the new one is considered a replacement
+   for the existing one.
+ - If we know the *EUID* of the item we're looking for, we can fetch it by *EUID*
+   and get the most up-to-date version, even if it's been updated several times.
+
+This functionality is made more useful by server-side hooks.  For example,
+when we save a vCard to an address book room, or an iCalendar item to a
+calendar room, our server modules detect this condition, and automatically set
+the *EUID* of the message to the *UUID* of the *vCard* or *iCalendar* item.
+Therefore when you save an updated version of an address book entry or
+a calendar item, the old one is automatically deleted.
+
+NETWORKING (REPLICATION)
+------------------------
+Citadel nodes network by sharing one or more rooms. Any Citadel node
+can choose to share messages with any other Citadel node, through the sending
+of spool files.  The sending system takes all messages it hasn't sent yet, and
+spools them to the recieving system, which posts them in the rooms.
+
+The *EUID* discussion above is extremely relevant, because *EUID* is carried over
+the network as well, and the replacement rules are followed over the network
+as well.  Therefore, when a message containing an *EUID* is saved in a networked
+room, it replaces any existing message with the same *EUID* *on every node in
+the network*.
+
+Complexities arise primarily from the possibility of densely connected
+networks: one does not wish to accumulate multiple copies of a given
+message, which can easily happen.  Nor does one want to see old messages
+percolating indefinitely through the system.
+
+This problem is handled by keeping track of the path a message has taken over
+the network, like the UseNet news system does.  When a system sends out a
+message, it adds its own name to the bang-path in the *<P>* field of the
+message.  If no path field is present, it generates one.
+   
+With the path present, all the networker has to do to assure that it doesn't
+send another system a message it's already received is check the <P>ath field
+for that system's name somewhere in the bang path.  If it's present, the system
+has already seen the message, so we don't send it.
+
+We also keep a small database, called the "use table," containing the ID's of
+all messages we've seen recently.  If the same message arrives a second or
+subsequent time, we will find its ID in the use table, indicating that we
+already have a copy of that message.  It will therefore be discarded.
+
+The above discussion should make the function of the fields reasonably clear:
+
+ o  Travelling messages need to carry original message-id, system of origin,
+    date of origin, author, and path with them, to keep reproduction and
+    cycling under control.
+
+(Uncoincidentally) the format used to transmit messages for networking
+purposes is precisely that used on disk, serialized.  The current
+distribution includes serv_network.c, which is basically a database replicator;
+please see network.txt on its operation and functionality (if any).
+
+PORTABILITY ISSUES
+------------------
+Citadel is 64-bit clean and architecture-independent.  The software is
+developed and primarily run on the Linux operating system (which uses the
+Linux kernel) but it should compile and run on any reasonably POSIX
+compliant system.
+
+On the client side, it's also POSIX compliant.  The client even seems to
+build ok on non-POSIX systems with porting libraries (such as Cygwin and
+WSL).
+
+SUPPORTING PRIVATE MAIL
+-----------------------
+Can one have an elegant kludge?  This must come pretty close.
+
+Private mail is sent and recieved in the *Mail>* room, which otherwise
+behaves pretty much as any other room.        To make this work, we have a
+separate Mail> room for each user behind the scenes.  The actual room name
+in the database looks like *"0000001234.Mail"* (where *'1234'* is the user
+number) and it's flagged with the *QR_MAILBOX* flag.  The user number is
+stripped off by the server before the name is presented to the client.  This
+provides the ability to give each user a separate namespace for mailboxes
+and personal rooms.
+
+This requires a little fiddling to get things just right. For example,
+*make_message()* has to be kludged to ask for the name of the recipient
+of the message whenever a message is entered in *Mail>*. But basically
+it works pretty well, keeping the code and user interface simple and
+regular.
+
+PASSWORDS AND NAME VALIDATION
+-----------------------------
+This has changed a couple of times over the course of Citadel's history.  At
+this point it's very simple, again due to the fact that record managers are
+used for everything.    The user file (user) is indexed using the user's
+name, converted to all lower-case.  Searching for a user, then, is easy.  We
+just lowercase the name we're looking for and query the database.  If no
+match is found, it is assumed that the user does not exist.
+
+This makes it difficult to forge messages from an existing user.  (Fine
+point: nonprinting characters are converted to printing characters, and
+leading, trailing, and double blanks are deleted.)
diff --git a/citadel/modules/inboxrules/serv_inboxrules.c b/citadel/modules/inboxrules/serv_inboxrules.c

index fa9cca82acd06320d5d0c7aafafccc061b9b6e1c..96fb2d21b9bbb7c90ef50608ab8778648737dc75 100644 (file)
--- a/citadel/modules/inboxrules/serv_inboxrules.c
+++ b/citadel/modules/inboxrules/serv_inboxrules.c
@@ -834,8 +834,8 @@ char *final_keys[] = {
  
  // This data structure represents ONE inbox rule within the configuration.
  struct irule {
-       int field_compare_op;
         int compared_field;
+       int field_compare_op;
         char compared_value[128];
         int size_compare_op;
         long compared_size;
@@ -967,18 +967,18 @@ struct inboxrules *deserialize_inbox_rules(char *serialized_rules) {
                         free(decoded_rule);
  
                         // if we re-serialized this now, what would it look like?
-                       //syslog(LOG_DEBUG, "test reserialize: 0|%s|%s|%s|%s|%ld|%s|%s|%s|%s|%s",
-                               //field_keys[new_rule->compared_field],
-                               //fcomp_keys[new_rule->field_compare_op],
-                               //new_rule->compared_value,
-                               //scomp_keys[new_rule->size_compare_op],
-                               //new_rule->compared_size,
-                               //action_keys[new_rule->action],
-                               //new_rule->file_into,
-                               //new_rule->redirect_to,
-                               //new_rule->autoreply_message,
-                               //final_keys[new_rule->final_action]
-                       //);
+                       syslog(LOG_DEBUG, "test reserialize: 0|%s|%s|%s|%s|%ld|%s|%s|%s|%s|%s",
+                               field_keys[new_rule->compared_field],
+                               fcomp_keys[new_rule->field_compare_op],
+                               new_rule->compared_value,
+                               scomp_keys[new_rule->size_compare_op],
+                               new_rule->compared_size,
+                               action_keys[new_rule->action],
+                               new_rule->file_into,
+                               new_rule->redirect_to,
+                               new_rule->autoreply_message,
+                               final_keys[new_rule->final_action]
+                       );
                         // delete the above after moving it to a reserialize function
  
                 }
@@ -1000,13 +1000,108 @@ struct inboxrules *deserialize_inbox_rules(char *serialized_rules) {
   */
  void inbox_do_msg(long msgnum, void *userdata) {
         struct inboxrules *ii = (struct inboxrules *) userdata;
-       struct CtdlMessage *msg;
+       struct CtdlMessage *msg = NULL;
+       int headers_loaded = 0;
+       int body_loaded = 0;
+       int metadata_loaded = 0;
+       int i;
+
         syslog(LOG_DEBUG, "inboxrules: processing message #%ld which is higher than %ld, we are in %s", msgnum, ii->lastproc, CC->room.QRname);
  
-       // FIXME    you are here
+       if (ii->num_rules <= 0) {
+               syslog(LOG_DEBUG, "inboxrules: rule set is empty");
+               return;
+       }
+
+       for (i=0; i<ii->num_rules; ++i) {
+               syslog(LOG_DEBUG, "inboxrules: processing rule %d is %s", i, field_keys[ ii->rules[i].compared_field ]);
+
+               // Before doing a field compare, check to see if we have the correct parts of the message in memory.
+
+               switch(ii->rules[i].compared_field) {
+                       // These fields require loading only the top-level headers
+                       case field_from:                // From:
+                       case field_tocc:                // To: or Cc:
+                       case field_subject:             // Subject:
+                       case field_replyto:             // Reply-to:
+                       case field_listid:              // List-ID:
+                       case field_envto:               // Envelope-to:
+                       case field_envfrom:             // Return-path:
+                               if (!headers_loaded) {
+                                       syslog(LOG_DEBUG, "inboxrules: loading headers for message %ld", msgnum);
+                                       msg = CtdlFetchMessage(msgnum, 0);
+                                       headers_loaded = 1;
+                               }
+                               break;
+                       // These fields are not stored as Citadel headers, and therefore require a full message load.
+                       case field_sender:
+                       case field_resentfrom:
+                       case field_resentto:
+                       case field_xmailer:
+                       case field_xspamflag:
+                       case field_xspamstatus:
+                               if (!body_loaded) {
+                                       syslog(LOG_DEBUG, "inboxrules: loading all of message %ld", msgnum);
+                                       if (msg != NULL) {
+                                               CM_Free(msg);
+                                       }
+                                       msg = CtdlFetchMessage(msgnum, 1);
+                                       headers_loaded = 1;
+                                       body_loaded = 1;
+                               }
+                               break;
+                       case field_size:
+                               if (!metadata_loaded) {
+                                       syslog(LOG_DEBUG, "inboxrules: loading metadata for message %ld", msgnum);
+                                       // FIXME do this
+                                       metadata_loaded = 1;
+                               }
+                               break;
+                       case field_all:
+                               syslog(LOG_DEBUG, "this is an always-on rule");
+                               break;
+                       default:
+                               syslog(LOG_DEBUG, "inboxrules: unknown rule key");
+               }
+
+               // Message data to compare is loaded, now do something.
+               switch(ii->rules[i].compared_field) {
+                       default:
+                               TRACE;
+                               break;
+               }
  
+       
+       }
  
-       //msg = CtdlFetchMessage(msgnum, 
+       TRACE;
+       if (msg != NULL) {
+               CM_Free(msg);
+       }
+       // FIXME you are here YOU ARE HERE
+
+//struct irule {
+       //int field_compare_op;
+       //int compared_field;
+       //char compared_value[128];
+       //int size_compare_op;
+       //long compared_size;
+       //int action;
+       //char file_into[ROOMNAMELEN];
+       //char redirect_to[1024];
+       //char autoreply_message[SIZ];
+       //int final_action;
+//};
+
+//struct inboxrules {
+       //long lastproc;
+       //int num_rules;
+       //struct irule *rules;
+
+
+       // Fetch the message, including the body, we need all of it to run our rules.
+       //msg = CtdlFetchMessage(msgnum, 0);
+       //CM_Free(msg);
  
  }
author	Art Cancro <ajc@citadel.org>
	Sun, 23 Aug 2020 18:34:24 +0000 (14:34 -0400)
committer	Art Cancro <ajc@citadel.org>
	Sun, 23 Aug 2020 18:34:24 +0000 (14:34 -0400)
citadel/docs/databaselayout.md	[deleted file]	patch \| blob \| history
citadel/docs/databaselayout.txt	[new file with mode: 0644]	patch \| blob
citadel/modules/inboxrules/serv_inboxrules.c		patch \| blob \| history