From: Art Cancro Date: Sun, 23 Aug 2020 18:34:24 +0000 (-0400) Subject: More chiseling away at the replacement inbox filter. X-Git-Tag: v939~234 X-Git-Url: https://code.citadel.org/?a=commitdiff_plain;h=42bc3578dcbbcf3b205b27fff06f333c2270e60b;p=citadel.git More chiseling away at the replacement inbox filter. --- diff --git a/citadel/docs/databaselayout.md b/citadel/docs/databaselayout.md deleted file mode 100644 index 9c2cb5653..000000000 --- a/citadel/docs/databaselayout.md +++ /dev/null @@ -1,450 +0,0 @@ -The totally incomplete guide to Citadel internals ------------------------------------------------------ ------------------------------------------------------ - -Citadel has evolved quite a bit since its early days, and the data structures -have evolved with it. This document provides a rough overview of how the -system works internally. For details you're going to have to dig through the -code, but this'll get you started. - - -DATABASE TABLES ---------------- -As you probably already know by now, Citadel uses a group of tables stored -with a record manager (usually Berkeley DB). Since we're using a record -manager rather than a relational database, all record structures are managed -by Citadel. Here are some of the tables we keep on disk: - - -USER RECORDS ------------- -This table contains all user records. It's indexed by -user name (translated to lower case for indexing purposes). The records in -this file look something like this: - -struct ctdluser { /* User record */ - int version; /* Cit vers. which created this rec */ - uid_t uid; /* Associate with a unix account? */ - char password[32]; /* password (for Citadel-only users)*/ - unsigned flags; /* See US_ flags below */ - long timescalled; /* Total number of logins */ - long posted; /* Number of messages posted (ever) */ - CIT_UBYTE axlevel; /* Access level */ - long usernum; /* User number (never recycled) */ - time_t lastcall; /* Last time the user called */ - int USuserpurge; /* Purge time (in days) for user */ - char fullname[64]; /* Name for Citadel messages & mail */ -}; - - Most fields here should be fairly self-explanatory. The ones that might -deserve some attention are: - - - *uid* -- if uid is not the same as the *unix uid* Citadel is running as, then the - account is assumed to belong to the user on the underlying Unix system with - that uid. This allows us to require the user's OS password instead of having - a separate Citadel password. - - - *usernum* -- these are assigned sequentially, and **NEVER REUSED**. This is - important because it allows us to use this number in other data structures - without having to worry about users being added/removed later on, as you'll - see later in this document. - - -ROOM RECORDS ------------- -These are room records. There is a room record for every room on the -system, public or private or mailbox. It's indexed by room name (also in -lower case for easy indexing) and it contains records which look like this: - - struct ctdlroom { - char QRname[ROOMNAMELEN]; /* Name of room */ - char QRpasswd[10]; /* Only valid if it's a private rm */ - long QRroomaide; /* User number of room aide */ - long QRhighest; /* Highest message NUMBER in room */ - time_t QRgen; /* Generation number of room */ - unsigned QRflags; /* See flag values below */ - char QRdirname[15]; /* Directory name, if applicable */ - long QRinfo; /* Info file update relative to msgs*/ - char QRfloor; /* Which floor this room is on */ - time_t QRmtime; /* Date/time of last post */ - struct ExpirePolicy QRep; /* Message expiration policy */ - long QRnumber; /* Globally unique room number */ - char QRorder; /* Sort key for room listing order */ - unsigned QRflags2; /* Additional flags */ - int QRdefaultview; /* How to display the contents */ - }; - -Again, mostly self-explanatory. Here are the interesting ones: - -*QRnumber* is a globally unique room ID, while QRgen is the "generation number" -of the room (it's actually a timestamp). The two combined produce a unique -value which identifies the room. The reason for two separate fields will be -explained below when we discuss the visit table. For now just remember that -*QRnumber* remains the same for the duration of the room's existence, and QRgen -is timestamped once during room creation but may be restamped later on when -certain circumstances exist. - -FLOORTAB --------- -Floors. This is so simplistic it's not worth going into detail about, except -to note that we keep a reference count of the number of rooms on each floor. - -MSGLISTS --------- -Each record in this table consists of a bunch of message numbers -which represent the contents of a room. A message can exist in more than one -room (for example, a mail message with multiple recipients -- 'single instance -store'). This table is never, ever traversed in its entirety. When you do -any type of read operation, it fetches the msglist for the room you're in -(using the room's ID as the index key) and then you can go ahead and read -those messages one by one. - -Each room is basically just a list of message numbers. Each time -we enter a new message in a room, its message number is appended to the end -of the list. If an old message is to be expired, we must delete it from the -message base. Reading a room is just a matter of looking up the messages -one by one and sending them to the client for display, printing, or whatever. - - -VISIT ------ -This is the tough one. Put on your thinking cap and grab a fresh cup of -coffee before attempting to grok the visit table. - -This table contains records which establish the relationship between users -and rooms. Its index is a hash of the user and room combination in question. -When looking for such a relationship, the record in this table can tell the -server things like "this user has zapped this room," "this user has access to -this private room," etc. It's also where we keep track of which messages -the user has marked as "old" and which are "new" (which are not necessarily -contiguous; contrast with older Citadel implementations which simply kept a -"last read" pointer). - - -Here's what the records look like: - - struct visit { - long v_roomnum; - long v_roomgen; - long v_usernum; - long v_lastseen; - unsigned int v_flags; - char v_seen[SIZ]; - int v_view; - }; - - #define V_FORGET 1 /* User has zapped this room */ - #define V_LOCKOUT 2 /* User is locked out of this room */ - #define V_ACCESS 4 /* Access is granted to this room */ - -This table is indexed by a concatenation of the first three fields. Whenever -we want to learn the relationship between a user and a room, we feed that -data to a function which looks up the corresponding record. The record is -designed in such a way that an "all zeroes" record (which is what you get if -the record isn't found) represents the default relationship. - -With this data, we now know which private rooms we're allowed to visit: if -the *V_ACCESS* bit is set, the room is one which the user knows, and it may -appear in his/her known rooms list. Conversely, we also know which rooms the -user has zapped: if the *V_FORGET* flag is set, we relegate the room to the -zapped list and don't bring it up during new message searches. It's also -worth noting that the *V_LOCKOUT* flag works in a similar way to administratively -lock users out of rooms. - -Implementing the "cause all users to forget room" command, then, becomes very -simple: we simply change the generation number of the room by putting a new -timestamp in the *QRgen* field. This causes all relevant visit records to -become irrelevant, because they appear to point to a different room. At the -same time, we don't lose the messages in the room, because the msglists table -is indexed by the room number (*QRnumber*), which never changes. - -*v_seen* contains a string which represents the set of messages in this room -which the user has read (marked as 'seen' or 'old'). It follows the same -syntax used by IMAP and NNTP. When we search for new messages, we simply -return any messages that are in the room that are **not** represented by this -set. Naturally, when we do want to mark more messages as seen (or unmark -them), we change this string. Citadel BBS client implementations are naive -and think linearly in terms of "everything is old up to this point," but IMAP -clients want to have more granularity. - - -DIRECTORY ---------- -This table simply maps Internet e-mail addresses to Citadel network addresses -for quick lookup. It is generated from data in the Global Address Book room. - -USETABLE --------- -This table keeps track of message ID's of messages arriving over a network, -to prevent duplicates from being posted if someone misconfigures the network -and a loop is created. This table goes unused on a non-networked Citadel. - -THE MESSAGE STORE ------------------ -This is where all message text is stored. It's indexed by message number: -give it a number, get back a message. Messages are numbered sequentially, and -the message numbers are never reused. - -We also keep a "metadata" record for each message. This record is also stored -in the msgmain table, using the index (0 - msgnum). We keep in the metadata -record, among other things, a reference count for each message. Since a -message may exist in more than one room, it's important to keep this reference -count up to date, and to delete the message from disk when the reference count -reaches zero. - -#Here's the format for the message itself: - - - Each message begins with an 0xFF 'start of message' byte. - - - The next byte denotes whether this is an anonymous message. The codes - available are *MES_NORMAL*, *MES_ANON*, or *MES_AN2* (defined in citadel.h). - - - The third byte is a "message type" code. The following codes are defined: - - 0 - "Traditional" Citadel format. Message is to be displayed "formatted." - - 1 - Plain pre-formatted ASCII text (otherwise known as text/plain) - - 4 - MIME formatted message. The text of the message which follows is - expected to begin with a "Content-type:" header. - - - After these three opening bytes, the remainder of - the message consists of a sequence of character strings. Each string - begins with a type byte indicating the meaning of the string and is - ended with a null. All strings are printable ASCII: in particular, - all numbers are in ASCII rather than binary. This is for simplicity, - both in implementing the system and in implementing other code to - work with the system. For instance, a database driven off Citadel archives - can do wildcard matching without worrying about unpacking binary data such - as message ID's first. To provide later downward compatability - all software should be written to IGNORE fields not currently defined. - - -#The type bytes currently defined are: - - -| BYTE | Enum | NW | Mnemonic | Enum / Comments -|-------|-------------------|------|----------------|--------------------------------------------------------- -| A | eAuthor | from | Author | *eAuthor* -| | | | | Name of originator of message. -| B | eBig\_message | | Big message | *eBig\_message* -| | | | | This is a flag which indicates that the message is -| | | | | big, and Citadel is storing the body in a separate -| | | | | record. You will never see this field because the -| | | | | internal API handles it. -| C | eRemoteRoom | | RemoteRoom | *eRemoteRoom* -| | | | | when sent via Citadel Networking, this is the room -| | | | | its going to be put on the remote site. -| D | eDestination | | Destination | *eDestination* -| | | | | Contains name of the system this message should -| | | | | be sent to, for mail routing (private mail only). -| E | eExclusiveID | exti | Exclusive ID | *eExclusiveID* -| | | | | A persistent alphanumeric Message ID used for -| | | | | network replication. When a message arrives that -| | | | | contains an Exclusive ID, any existing messages which -| | | | | contain the same Exclusive ID and are *older* than this -| | | | | message should be deleted. If there exist any messages -| | | | | with the same Exclusive ID that are *newer*, then this -| | | | | message should be dropped. -| F | erFc822Addr | rfca | rFc822 address | *erFc822Addr* -| | | | | For Internet mail, this is the delivery address of the -| | | | | message author. -| H | eHumanNode | hnod | Human node name| *eHumanNode* -| | | | | Human-readable name of system message originated on. -| I | emessageId | msgn | Message ID | *emessageId* -| | | | | An RFC822-compatible message ID for this message. -| J | eJournal | jrnl | Journal | *eJournal* -| | | | | The presence of this field indicates that the message -| | | | | is disqualified from being journaled, perhaps because -| | | | | it is itself a journalized message and we wish to -| | | | | avoid double journaling. -| K | eReplyTo | rep2 | Reply-To | *eReplyTo* -| | | | | the Reply-To header for mailinglist outbound messages -| L | eListID | list | List-ID | *eListID* -| | | | | Mailing list identification, as per RFC 2919 -| M | eMesageText | text | Message Text | *eMesageText* -| | | | | Normal ASCII, newlines seperated by CR's or LF's, -| | | | | null terminated as always. -| N | eNodeName | node | Nodename | *eNodeName* -| | | | | Contains node name of system message originated on. -| O | eOriginalRoom | room | Room | *eOriginalRoom* - Room of origin. -| P | eMessagePath | path | Path | *eMessagePath* -| | | | | Complete path of message, as in the UseNet news -| | | | | standard. A user should be able to send Internet mail -| | | | | to this path. (Note that your system name will not be -| | | | | tacked onto this until you're sending the message to -| | | | | someone else) -| R | eRecipient | rcpt | Recipient | *eRecipient* - Only present in Mail messages. -| S | eSpecialField | spec | Special field | *eSpecialField* -| | | | | Only meaningful for messages being spooled over a -| | | | | network. Usually means that the message isn't really -| | | | | a message, but rather some other network function: -| | | | | -> "S" followed by "FILE" (followed by a null, of -| | | | | course) means that the message text is actually an -| | | | | IGnet/Open file transfer. (OBSOLETE) -| | | | | -> "S" followed by "CANCEL" means that this message -| | | | | should be deleted from the local message base once -| | | | | it has been replicated to all network systems. -| T | eTimestamp | time | date/Time | *eTimestamp* -| | | | | Unix timestamp containing the creation date/time of -| | | | | the message. -| U | eMsgSubject | subj | sUbject | *eMsgSubject* - Optional. -| | | | | Developers may choose whether they wish to -| | | | | generate or display subject fields. -| V | eenVelopeTo | nvto | enVelope-to | *eenVelopeTo* -| | | | | The recipient specified in incoming SMTP messages. -| W | eWeferences | wefw | Wefewences | *eWeferences* -| | | | | Previous message ID's for conversation threading. When -| | | | | converting from RFC822 we use References: if present, or -| | | | | In-Reply-To: otherwise. -| | | | | (Who in extnotify spool messages which don't need to know -| | | | | other message ids) -| Y | eCarbonCopY | cccc | carbon copY | *eCarbonCopY* -| | | | | Optional, and only in Mail messages. -| % | eHeaderOnly | nhdr | oNlyHeader | we will just be sending headers. for the Wire protocol only. -| % | eFormatType | type | type | type of citadel message: (Wire protocol only) -| | | | | FMT\_CITADEL 0 Citadel vari-format (proprietary) -| | | | | FMT\_FIXED 1 Fixed format (proprietary) -| | | | | FMT\_RFC822 4 Standard (headers are in M field) -| % | eMessagePart | part | emessagePart | *eMessagePart* is the id of this part in the mime hierachy -| % | eSubFolder | suff | eSubFolder | descend into a mime sub container -| % | ePevious | pref | ePevious | exit a mime sub container -| 0 | eErrorMsg | | Error | *eErrorMsg* -| | | | | This field is typically never found in a message on -| | | | | disk or in transit. Message scanning modules are -| | | | | expected to fill in this field when rejecting a message -| | | | | with an explanation as to what happened (virus found, -| | | | | message looks like spam, etc.) -| 1 | eSuppressIdx | | suppress index | *eSuppressIdx* -| | | | | The presence of this field indicates that the message is -| | | | | disqualified from being added to the full text index. -| 2 | eExtnotify | | extnotify | *eExtnotify* - Used internally by the serv_extnotify module. -| 3 | eVltMsgNum | | msgnum | *eVltMsgNum* -| | | | | Used internally to pass the local message number in the -| | | | | database to after-save hooks. Discarded afterwards. - -EXAMPLE -------- -Let ** be a *0xFF* byte, and *<0>* be a null *(0x00)* byte. Then a message -which prints as... - - Apr 12, 1988 23:16 From Test User In Network Test> @lifesys (Life Central) - Have a nice day! - -might be stored as... - - <40><0>I12345<0>Pneighbor!lifesys!test_user<0>T576918988<0> (continued) - -----------|Mesg ID#|--Message Path---------------|--Date------ - - AThe Test User<0>ONetwork Test<0>Nlifesys<0>HLife Central<0>MHave a nice day!<0> - |-----Author-----|-Room name-----|-nodename-|Human Name-|--Message text----- - -Weird things can happen if fields are missing, especially if you use the -networker. But basically, the date, author, room, and nodename may be in any -order. But the leading fields and the message text must remain in the same -place. The H field looks better when it is placed immediately after the N -field. - - -EUID (EXCLUSIVE MESSAGE ID'S) ------------------------------ -This is where the groupware magic happens. Any message in any room may have -a field called the Exclusive message *ID*, or *EUID*. We keep an index in the -table *CDB_EUIDINDEX* which knows the message number of any item that has an -*EUID*. This allows us to do two things: - - - If a subsequent message arrives with the same *EUID*, it automatically - *deletes* the existing one, because the new one is considered a replacement - for the existing one. - - If we know the *EUID* of the item we're looking for, we can fetch it by *EUID* - and get the most up-to-date version, even if it's been updated several times. - -This functionality is made more useful by server-side hooks. For example, -when we save a vCard to an address book room, or an iCalendar item to a -calendar room, our server modules detect this condition, and automatically set -the *EUID* of the message to the *UUID* of the *vCard* or *iCalendar* item. -Therefore when you save an updated version of an address book entry or -a calendar item, the old one is automatically deleted. - -NETWORKING (REPLICATION) ------------------------- -Citadel nodes network by sharing one or more rooms. Any Citadel node -can choose to share messages with any other Citadel node, through the sending -of spool files. The sending system takes all messages it hasn't sent yet, and -spools them to the recieving system, which posts them in the rooms. - -The *EUID* discussion above is extremely relevant, because *EUID* is carried over -the network as well, and the replacement rules are followed over the network -as well. Therefore, when a message containing an *EUID* is saved in a networked -room, it replaces any existing message with the same *EUID* *on every node in -the network*. - -Complexities arise primarily from the possibility of densely connected -networks: one does not wish to accumulate multiple copies of a given -message, which can easily happen. Nor does one want to see old messages -percolating indefinitely through the system. - -This problem is handled by keeping track of the path a message has taken over -the network, like the UseNet news system does. When a system sends out a -message, it adds its own name to the bang-path in the *

* field of the -message. If no path field is present, it generates one. - -With the path present, all the networker has to do to assure that it doesn't -send another system a message it's already received is check the

ath field -for that system's name somewhere in the bang path. If it's present, the system -has already seen the message, so we don't send it. - -We also keep a small database, called the "use table," containing the ID's of -all messages we've seen recently. If the same message arrives a second or -subsequent time, we will find its ID in the use table, indicating that we -already have a copy of that message. It will therefore be discarded. - -The above discussion should make the function of the fields reasonably clear: - - o Travelling messages need to carry original message-id, system of origin, - date of origin, author, and path with them, to keep reproduction and - cycling under control. - -(Uncoincidentally) the format used to transmit messages for networking -purposes is precisely that used on disk, serialized. The current -distribution includes serv_network.c, which is basically a database replicator; -please see network.txt on its operation and functionality (if any). - -PORTABILITY ISSUES ------------------- -Citadel is 64-bit clean, architecture-independent, and Year 2000 -compliant. The software should compile on any POSIX compliant system with -a full pthreads implementation and TCP/IP support. In the future we may -try to port it to non-POSIX systems as well. - -On the client side, it's also POSIX compliant. The client even seems to -build ok on non-POSIX systems with porting libraries (such as Cygwin). - -SUPPORTING PRIVATE MAIL ------------------------ -Can one have an elegant kludge? This must come pretty close. - -Private mail is sent and recieved in the *Mail>* room, which otherwise -behaves pretty much as any other room. To make this work, we have a -separate Mail> room for each user behind the scenes. The actual room name -in the database looks like *"0000001234.Mail"* (where *'1234'* is the user -number) and it's flagged with the *QR_MAILBOX* flag. The user number is -stripped off by the server before the name is presented to the client. This -provides the ability to give each user a separate namespace for mailboxes -and personal rooms. - -This requires a little fiddling to get things just right. For example, -*make_message()* has to be kludged to ask for the name of the recipient -of the message whenever a message is entered in *Mail>*. But basically -it works pretty well, keeping the code and user interface simple and -regular. - -PASSWORDS AND NAME VALIDATION ------------------------------ -This has changed a couple of times over the course of Citadel's history. At -this point it's very simple, again due to the fact that record managers are -used for everything. The user file (user) is indexed using the user's -name, converted to all lower-case. Searching for a user, then, is easy. We -just lowercase the name we're looking for and query the database. If no -match is found, it is assumed that the user does not exist. - -This makes it difficult to forge messages from an existing user. (Fine -point: nonprinting characters are converted to printing characters, and -leading, trailing, and double blanks are deleted.) diff --git a/citadel/docs/databaselayout.txt b/citadel/docs/databaselayout.txt new file mode 100644 index 000000000..7dbcf1da3 --- /dev/null +++ b/citadel/docs/databaselayout.txt @@ -0,0 +1,415 @@ +The totally incomplete guide to Citadel internals +----------------------------------------------------- + +Citadel has evolved quite a bit since its early days, and the data structures +have evolved with it. This document provides a rough overview of how the +system works internally. For details you're going to have to dig through the +code, but this'll get you started. + + +DATABASE TABLES +--------------- +As you probably already know by now, Citadel uses a group of tables stored +with a record manager (usually Berkeley DB). Since we're using a record +manager rather than a relational database, all record structures are managed +by Citadel. Here are some of the tables we keep on disk: + + +USER RECORDS +------------ +This table contains all user records. It's indexed by +user name (translated to lower case for indexing purposes). The records in +this file look something like this: + +struct ctdluser { /* User record */ + int version; /* Cit vers. which created this rec */ + uid_t uid; /* Associate with a unix account? */ + char password[32]; /* password (for Citadel-only users)*/ + unsigned flags; /* See US_ flags below */ + long timescalled; /* Total number of logins */ + long posted; /* Number of messages posted (ever) */ + CIT_UBYTE axlevel; /* Access level */ + long usernum; /* User number (never recycled) */ + time_t lastcall; /* Last time the user called */ + int USuserpurge; /* Purge time (in days) for user */ + char fullname[64]; /* Name for Citadel messages & mail */ +}; + + Most fields here should be fairly self-explanatory. The ones that might +deserve some attention are: + + - *uid* -- if uid is not the same as the *unix uid* Citadel is running as, then the + account is assumed to belong to the user on the underlying Unix system with + that uid. This allows us to require the user's OS password instead of having + a separate Citadel password. + + - *usernum* -- these are assigned sequentially, and **NEVER REUSED**. This is + important because it allows us to use this number in other data structures + without having to worry about users being added/removed later on, as you'll + see later in this document. + + +ROOM RECORDS +------------ +These are room records. There is a room record for every room on the +system, public or private or mailbox. It's indexed by room name (also in +lower case for easy indexing) and it contains records which look like this: + + struct ctdlroom { + char QRname[ROOMNAMELEN]; /* Name of room */ + char QRpasswd[10]; /* Only valid if it's a private rm */ + long QRroomaide; /* User number of room aide */ + long QRhighest; /* Highest message NUMBER in room */ + time_t QRgen; /* Generation number of room */ + unsigned QRflags; /* See flag values below */ + char QRdirname[15]; /* Directory name, if applicable */ + long QRinfo; /* Info file update relative to msgs*/ + char QRfloor; /* Which floor this room is on */ + time_t QRmtime; /* Date/time of last post */ + struct ExpirePolicy QRep; /* Message expiration policy */ + long QRnumber; /* Globally unique room number */ + char QRorder; /* Sort key for room listing order */ + unsigned QRflags2; /* Additional flags */ + int QRdefaultview; /* How to display the contents */ + }; + +Again, mostly self-explanatory. Here are the interesting ones: + +*QRnumber* is a globally unique room ID, while QRgen is the "generation number" +of the room (it's actually a timestamp). The two combined produce a unique +value which identifies the room. The reason for two separate fields will be +explained below when we discuss the visit table. For now just remember that +*QRnumber* remains the same for the duration of the room's existence, and QRgen +is timestamped once during room creation but may be restamped later on when +certain circumstances exist. + +FLOORTAB +-------- +Floors. This is so simplistic it's not worth going into detail about, except +to note that we keep a reference count of the number of rooms on each floor. + +MSGLISTS +-------- +Each record in this table consists of a bunch of message numbers +which represent the contents of a room. A message can exist in more than one +room (for example, a mail message with multiple recipients -- 'single instance +store'). This table is never, ever traversed in its entirety. When you do +any type of read operation, it fetches the msglist for the room you're in +(using the room's ID as the index key) and then you can go ahead and read +those messages one by one. + +Each room is basically just a list of message numbers. Each time +we enter a new message in a room, its message number is appended to the end +of the list. If an old message is to be expired, we must delete it from the +message base. Reading a room is just a matter of looking up the messages +one by one and sending them to the client for display, printing, or whatever. + + +VISIT +----- +This is the tough one. Put on your thinking cap and grab a fresh cup of +coffee before attempting to grok the visit table. + +This table contains records which establish the relationship between users +and rooms. Its index is a hash of the user and room combination in question. +When looking for such a relationship, the record in this table can tell the +server things like "this user has zapped this room," "this user has access to +this private room," etc. It's also where we keep track of which messages +the user has marked as "old" and which are "new" (which are not necessarily +contiguous; contrast with older Citadel implementations which simply kept a +"last read" pointer). + + +Here's what the records look like: + + struct visit { + long v_roomnum; + long v_roomgen; + long v_usernum; + long v_lastseen; + unsigned int v_flags; + char v_seen[SIZ]; + int v_view; + }; + + #define V_FORGET 1 /* User has zapped this room */ + #define V_LOCKOUT 2 /* User is locked out of this room */ + #define V_ACCESS 4 /* Access is granted to this room */ + +This table is indexed by a concatenation of the first three fields. Whenever +we want to learn the relationship between a user and a room, we feed that +data to a function which looks up the corresponding record. The record is +designed in such a way that an "all zeroes" record (which is what you get if +the record isn't found) represents the default relationship. + +With this data, we now know which private rooms we're allowed to visit: if +the *V_ACCESS* bit is set, the room is one which the user knows, and it may +appear in his/her known rooms list. Conversely, we also know which rooms the +user has zapped: if the *V_FORGET* flag is set, we relegate the room to the +zapped list and don't bring it up during new message searches. It's also +worth noting that the *V_LOCKOUT* flag works in a similar way to administratively +lock users out of rooms. + +Implementing the "cause all users to forget room" command, then, becomes very +simple: we simply change the generation number of the room by putting a new +timestamp in the *QRgen* field. This causes all relevant visit records to +become irrelevant, because they appear to point to a different room. At the +same time, we don't lose the messages in the room, because the msglists table +is indexed by the room number (*QRnumber*), which never changes. + +*v_seen* contains a string which represents the set of messages in this room +which the user has read (marked as 'seen' or 'old'). It follows the same +syntax used by IMAP and NNTP. When we search for new messages, we simply +return any messages that are in the room that are **not** represented by this +set. Naturally, when we do want to mark more messages as seen (or unmark +them), we change this string. Citadel BBS client implementations are naive +and think linearly in terms of "everything is old up to this point," but IMAP +clients want to have more granularity. + + +DIRECTORY +--------- +This table simply maps Internet e-mail addresses to Citadel network addresses +for quick lookup. It is generated from data in the Global Address Book room. + +USETABLE +-------- +This table keeps track of message ID's of messages arriving over a network, +to prevent duplicates from being posted if someone misconfigures the network +and a loop is created. This table goes unused on a non-networked Citadel. + +THE MESSAGE STORE +----------------- +This is where all message text is stored. It's indexed by message number: +give it a number, get back a message. Messages are numbered sequentially, and +the message numbers are never reused. + +We also keep a "metadata" record for each message. This record is also stored +in the msgmain table, using the index (0 - msgnum). We keep in the metadata +record, among other things, a reference count for each message. Since a +message may exist in more than one room, it's important to keep this reference +count up to date, and to delete the message from disk when the reference count +reaches zero. + +#Here's the format for the message itself: + + - Each message begins with an 0xFF 'start of message' byte. + + - The next byte denotes whether this is an anonymous message. The codes + available are *MES_NORMAL*, *MES_ANON*, or *MES_AN2* (defined in citadel.h). + + - The third byte is a "message type" code. The following codes are defined: + - 0 - "Traditional" Citadel format. Message is to be displayed "formatted." + - 1 - Plain pre-formatted ASCII text (otherwise known as text/plain) + - 4 - MIME formatted message. The text of the message which follows is + expected to begin with a "Content-type:" header. + + - After these three opening bytes, the remainder of + the message consists of a sequence of character strings. Each string + begins with a type byte indicating the meaning of the string and is + ended with a null. All strings are printable ASCII: in particular, + all numbers are in ASCII rather than binary. This is for simplicity, + both in implementing the system and in implementing other code to + work with the system. For instance, a database driven off Citadel archives + can do wildcard matching without worrying about unpacking binary data such + as message ID's first. To provide later downward compatability + all software should be written to IGNORE fields not currently defined. + + +#The type bytes currently defined are: + + +| BYTE | Enum | NW | Mnemonic | Enum / Comments +|-------|-------------------|------|----------------|--------------------------------------------------------- +| A | eAuthor | from | Author | Name of originator of the message. +| B | eBig_message | | Big message | This is a flag which indicates that the message is +| | | | | big, and Citadel is storing the body in a separate +| | | | | record. You will never see this field because the +| | | | | internal API handles it. +| E | eExclusiveID | exti | Exclusive ID | A persistent alphanumeric Message ID used for +| | | | | replication control. When a message arrives that +| | | | | contains an Exclusive ID, any existing messages which +| | | | | contain the same Exclusive ID and are *older* than this +| | | | | message should be deleted. If there exist any messages +| | | | | with the same Exclusive ID that are *newer*, then this +| | | | | message should be dropped. +| F | erFc822Addr | rfca | rFc822 address | For Internet mail, this is the delivery address of the +| | | | | message author. +| I | emessageId | msgn | Message ID | An RFC822-compatible message ID for this message. +| | | | | +| J | eJournal | jrnl | Journal | The presence of this field indicates that the message +| | | | | is disqualified from being journaled, perhaps because +| | | | | it is itself a journalized message and we wish to +| | | | | avoid double journaling. +| K | eReplyTo | rep2 | Reply-To | the Reply-To header for mailinglist outbound messages +| L | eListID | list | List-ID | Mailing list identification, as per RFC 2919 +| M | eMesageText | text | Message Text | Normal ASCII, newlines seperated by CR's or LF's, +| | | | | null terminated as always. +| O | eOriginalRoom | room | Room | Room of origin. +| P | eMessagePath | path | Path | Complete path of message, as in the UseNet news +| | | | | standard. A user should be able to send Internet mail +| | | | | to this path. (Note that your system name will not be +| | | | | tacked onto this until you're sending the message to +| | | | | someone else) +| R | eRecipient | rcpt | Recipient | Only present in Mail messages. +| T | eTimestamp | time | date/Time | Unix timestamp containing the creation date/time of +| | | | | the message. +| U | eMsgSubject | subj | sUbject | Message subject. Optional. +| | | | | Developers may choose whether they wish to +| | | | | generate or display subject fields. +| V | eenVelopeTo | nvto | enVelope-to | The recipient specified in incoming SMTP messages. +| W | eWeferences | wefw | Wefewences | Previous message ID's for conversation threading. When +| | | | | converting from RFC822 we use References: if present, or +| | | | | In-Reply-To: otherwise. +| | | | | (Who in extnotify spool messages which don't need to know +| | | | | other message ids) +| Y | eCarbonCopY | cccc | carbon copY | Carbon copy (CC) recipients. +| | | | | Optional, and only in Mail messages. +| % | eHeaderOnly | nhdr | oNlyHeader | we will just be sending headers. for the Wire protocol only. +| % | eFormatType | type | type | type of citadel message: (Wire protocol only) +| | | | | FMT\_CITADEL 0 Citadel vari-format (proprietary) +| | | | | FMT\_FIXED 1 Fixed format (proprietary) +| | | | | FMT\_RFC822 4 Standard (headers are in M field) +| % | eMessagePart | part | emessagePart | eMessagePart is the id of this part in the mime hierachy +| % | eSubFolder | suff | eSubFolder | descend into a mime sub container +| % | ePevious | pref | ePevious | exit a mime sub container +| 0 | eErrorMsg | | Error | This field is typically never found in a message on +| | | | | disk or in transit. Message scanning modules are +| | | | | expected to fill in this field when rejecting a message +| | | | | with an explanation as to what happened (virus found, +| | | | | message looks like spam, etc.) +| 1 | eSuppressIdx | | suppress index | The presence of this field indicates that the message is +| | | | | disqualified from being added to the full text index. +| 2 | eExtnotify | | extnotify | Used internally by the serv_extnotify module. +| 3 | eVltMsgNum | | msgnum | Used internally to pass the local message number in the +| | | | | database to after-save hooks. Discarded afterwards. + +EXAMPLE +------- +Let ** be a *0xFF* byte, and *<0>* be a null *(0x00)* byte. Then a message +which prints as... + + Apr 12, 1988 23:16 From Test User In Network Test> @lifesys (Life Central) + Have a nice day! + +might be stored as... + + <40><0>I12345<0>Pneighbor!lifesys!test_user<0>T576918988<0> (continued) + -----------|Mesg ID#|--Message Path---------------|--Date------ + + AThe Test User<0>ONetwork Test<0>Nlifesys<0>HLife Central<0>MHave a nice day!<0> + |-----Author-----|-Room name-----|-nodename-|Human Name-|--Message text----- + +Weird things can happen if fields are missing, especially if you use the +networker. But basically, the date, author, room, and nodename may be in any +order. But the leading fields and the message text must remain in the same +place. The H field looks better when it is placed immediately after the N +field. + + +EUID (EXCLUSIVE MESSAGE ID'S) +----------------------------- +This is where the groupware magic happens. Any message in any room may have +a field called the Exclusive message *ID*, or *EUID*. We keep an index in the +table *CDB_EUIDINDEX* which knows the message number of any item that has an +*EUID*. This allows us to do two things: + + - If a subsequent message arrives with the same *EUID*, it automatically + *deletes* the existing one, because the new one is considered a replacement + for the existing one. + - If we know the *EUID* of the item we're looking for, we can fetch it by *EUID* + and get the most up-to-date version, even if it's been updated several times. + +This functionality is made more useful by server-side hooks. For example, +when we save a vCard to an address book room, or an iCalendar item to a +calendar room, our server modules detect this condition, and automatically set +the *EUID* of the message to the *UUID* of the *vCard* or *iCalendar* item. +Therefore when you save an updated version of an address book entry or +a calendar item, the old one is automatically deleted. + +NETWORKING (REPLICATION) +------------------------ +Citadel nodes network by sharing one or more rooms. Any Citadel node +can choose to share messages with any other Citadel node, through the sending +of spool files. The sending system takes all messages it hasn't sent yet, and +spools them to the recieving system, which posts them in the rooms. + +The *EUID* discussion above is extremely relevant, because *EUID* is carried over +the network as well, and the replacement rules are followed over the network +as well. Therefore, when a message containing an *EUID* is saved in a networked +room, it replaces any existing message with the same *EUID* *on every node in +the network*. + +Complexities arise primarily from the possibility of densely connected +networks: one does not wish to accumulate multiple copies of a given +message, which can easily happen. Nor does one want to see old messages +percolating indefinitely through the system. + +This problem is handled by keeping track of the path a message has taken over +the network, like the UseNet news system does. When a system sends out a +message, it adds its own name to the bang-path in the *

* field of the +message. If no path field is present, it generates one. + +With the path present, all the networker has to do to assure that it doesn't +send another system a message it's already received is check the

ath field +for that system's name somewhere in the bang path. If it's present, the system +has already seen the message, so we don't send it. + +We also keep a small database, called the "use table," containing the ID's of +all messages we've seen recently. If the same message arrives a second or +subsequent time, we will find its ID in the use table, indicating that we +already have a copy of that message. It will therefore be discarded. + +The above discussion should make the function of the fields reasonably clear: + + o Travelling messages need to carry original message-id, system of origin, + date of origin, author, and path with them, to keep reproduction and + cycling under control. + +(Uncoincidentally) the format used to transmit messages for networking +purposes is precisely that used on disk, serialized. The current +distribution includes serv_network.c, which is basically a database replicator; +please see network.txt on its operation and functionality (if any). + +PORTABILITY ISSUES +------------------ +Citadel is 64-bit clean and architecture-independent. The software is +developed and primarily run on the Linux operating system (which uses the +Linux kernel) but it should compile and run on any reasonably POSIX +compliant system. + +On the client side, it's also POSIX compliant. The client even seems to +build ok on non-POSIX systems with porting libraries (such as Cygwin and +WSL). + +SUPPORTING PRIVATE MAIL +----------------------- +Can one have an elegant kludge? This must come pretty close. + +Private mail is sent and recieved in the *Mail>* room, which otherwise +behaves pretty much as any other room. To make this work, we have a +separate Mail> room for each user behind the scenes. The actual room name +in the database looks like *"0000001234.Mail"* (where *'1234'* is the user +number) and it's flagged with the *QR_MAILBOX* flag. The user number is +stripped off by the server before the name is presented to the client. This +provides the ability to give each user a separate namespace for mailboxes +and personal rooms. + +This requires a little fiddling to get things just right. For example, +*make_message()* has to be kludged to ask for the name of the recipient +of the message whenever a message is entered in *Mail>*. But basically +it works pretty well, keeping the code and user interface simple and +regular. + +PASSWORDS AND NAME VALIDATION +----------------------------- +This has changed a couple of times over the course of Citadel's history. At +this point it's very simple, again due to the fact that record managers are +used for everything. The user file (user) is indexed using the user's +name, converted to all lower-case. Searching for a user, then, is easy. We +just lowercase the name we're looking for and query the database. If no +match is found, it is assumed that the user does not exist. + +This makes it difficult to forge messages from an existing user. (Fine +point: nonprinting characters are converted to printing characters, and +leading, trailing, and double blanks are deleted.) diff --git a/citadel/modules/inboxrules/serv_inboxrules.c b/citadel/modules/inboxrules/serv_inboxrules.c index fa9cca82a..96fb2d21b 100644 --- a/citadel/modules/inboxrules/serv_inboxrules.c +++ b/citadel/modules/inboxrules/serv_inboxrules.c @@ -834,8 +834,8 @@ char *final_keys[] = { // This data structure represents ONE inbox rule within the configuration. struct irule { - int field_compare_op; int compared_field; + int field_compare_op; char compared_value[128]; int size_compare_op; long compared_size; @@ -967,18 +967,18 @@ struct inboxrules *deserialize_inbox_rules(char *serialized_rules) { free(decoded_rule); // if we re-serialized this now, what would it look like? - //syslog(LOG_DEBUG, "test reserialize: 0|%s|%s|%s|%s|%ld|%s|%s|%s|%s|%s", - //field_keys[new_rule->compared_field], - //fcomp_keys[new_rule->field_compare_op], - //new_rule->compared_value, - //scomp_keys[new_rule->size_compare_op], - //new_rule->compared_size, - //action_keys[new_rule->action], - //new_rule->file_into, - //new_rule->redirect_to, - //new_rule->autoreply_message, - //final_keys[new_rule->final_action] - //); + syslog(LOG_DEBUG, "test reserialize: 0|%s|%s|%s|%s|%ld|%s|%s|%s|%s|%s", + field_keys[new_rule->compared_field], + fcomp_keys[new_rule->field_compare_op], + new_rule->compared_value, + scomp_keys[new_rule->size_compare_op], + new_rule->compared_size, + action_keys[new_rule->action], + new_rule->file_into, + new_rule->redirect_to, + new_rule->autoreply_message, + final_keys[new_rule->final_action] + ); // delete the above after moving it to a reserialize function } @@ -1000,13 +1000,108 @@ struct inboxrules *deserialize_inbox_rules(char *serialized_rules) { */ void inbox_do_msg(long msgnum, void *userdata) { struct inboxrules *ii = (struct inboxrules *) userdata; - struct CtdlMessage *msg; + struct CtdlMessage *msg = NULL; + int headers_loaded = 0; + int body_loaded = 0; + int metadata_loaded = 0; + int i; + syslog(LOG_DEBUG, "inboxrules: processing message #%ld which is higher than %ld, we are in %s", msgnum, ii->lastproc, CC->room.QRname); - // FIXME you are here + if (ii->num_rules <= 0) { + syslog(LOG_DEBUG, "inboxrules: rule set is empty"); + return; + } + + for (i=0; inum_rules; ++i) { + syslog(LOG_DEBUG, "inboxrules: processing rule %d is %s", i, field_keys[ ii->rules[i].compared_field ]); + + // Before doing a field compare, check to see if we have the correct parts of the message in memory. + + switch(ii->rules[i].compared_field) { + // These fields require loading only the top-level headers + case field_from: // From: + case field_tocc: // To: or Cc: + case field_subject: // Subject: + case field_replyto: // Reply-to: + case field_listid: // List-ID: + case field_envto: // Envelope-to: + case field_envfrom: // Return-path: + if (!headers_loaded) { + syslog(LOG_DEBUG, "inboxrules: loading headers for message %ld", msgnum); + msg = CtdlFetchMessage(msgnum, 0); + headers_loaded = 1; + } + break; + // These fields are not stored as Citadel headers, and therefore require a full message load. + case field_sender: + case field_resentfrom: + case field_resentto: + case field_xmailer: + case field_xspamflag: + case field_xspamstatus: + if (!body_loaded) { + syslog(LOG_DEBUG, "inboxrules: loading all of message %ld", msgnum); + if (msg != NULL) { + CM_Free(msg); + } + msg = CtdlFetchMessage(msgnum, 1); + headers_loaded = 1; + body_loaded = 1; + } + break; + case field_size: + if (!metadata_loaded) { + syslog(LOG_DEBUG, "inboxrules: loading metadata for message %ld", msgnum); + // FIXME do this + metadata_loaded = 1; + } + break; + case field_all: + syslog(LOG_DEBUG, "this is an always-on rule"); + break; + default: + syslog(LOG_DEBUG, "inboxrules: unknown rule key"); + } + + // Message data to compare is loaded, now do something. + switch(ii->rules[i].compared_field) { + default: + TRACE; + break; + } + + } - //msg = CtdlFetchMessage(msgnum, + TRACE; + if (msg != NULL) { + CM_Free(msg); + } + // FIXME you are here YOU ARE HERE + +//struct irule { + //int field_compare_op; + //int compared_field; + //char compared_value[128]; + //int size_compare_op; + //long compared_size; + //int action; + //char file_into[ROOMNAMELEN]; + //char redirect_to[1024]; + //char autoreply_message[SIZ]; + //int final_action; +//}; + +//struct inboxrules { + //long lastproc; + //int num_rules; + //struct irule *rules; + + + // Fetch the message, including the body, we need all of it to run our rules. + //msg = CtdlFetchMessage(msgnum, 0); + //CM_Free(msg); }