- hack.txt for Citadel/UX
- (possibly a little out of date)
-
- Much of this document is borrowed from the original hack.doc from
-Citadel-CP/M and Citadel-86, because many of the concepts are the same. Hats
-off to whoever wrote the original, for a fine document that inspired the
-implementation of Citadel for Unix.
+ ------------------------------------------------------
+ The totally incomplete guide to Citadel/UX internals
+ ------------------------------------------------------
+
+ Citadel has evolved quite a bit since its early days, and the data structures
+have evolved with it. This document provides a rough overview of how the
+system works internally. For details you're going to have to dig through the
+code, but this'll get you started.
+
+
+ Database tables
- Note that this document is really out of date. It doesn't cover anything
-about the threaded server architecture or any of the network stuff. What is
-covered here is the basic architecture of the databases.
+
+ As you probably already know by now, Citadel uses a group of tables stored
+with a record manager (usually Berkeley DB). Since we're using a record
+manager rather than a relational database, all record structures are managed
+by Citadel. Here are some of the tables we keep on disk:
+
+
+ USERSUPP
+ --------
- But enough of the preamble. Here's how Citadel/UX works :)
-
- Here are the major databases to be discussed:
+ This table contains all user records. It's called 'usersupp' because it was
+once a supplementary file (at one point in ancient history, we created a user
+record on the underlying operating system for each user). It's indexed by
+user name (translated to lower case for indexing purposes). The records in
+this file look something like this:
+
+struct usersupp { /* User record */
+ int version; /* Cit vers. which created this rec */
+ uid_t uid; /* Associate with a unix account? */
+ char password[32]; /* password (for BBS-only users) */
+ unsigned flags; /* See US_ flags below */
+ long timescalled; /* Total number of logins */
+ long posted; /* Number of messages posted (ever) */
+ CIT_UBYTE axlevel; /* Access level */
+ long usernum; /* User number (never recycled) */
+ time_t lastcall; /* Last time the user called */
+ int USuserpurge; /* Purge time (in days) for user */
+ char fullname[64]; /* Name for Citadel messages & mail */
+ CIT_UBYTE USscreenwidth; /* Screen width (for textmode users)*/
+ CIT_UBYTE USscreenheight; /* Screen height(for textmode users)*/
+};
+
+ Most fields here should be fairly self-explanatory. The ones that might
+deserve some attention are:
+
+ uid -- if uid is not the same as the uid Citadel is running as, then the
+account is assumed to belong to the user on the underlying Unix system with
+that uid. This allows us to require the user's OS password instead of having
+a separate Citadel password.
+
+ usernum -- these are assigned sequentially, and NEVER REUSED. This is
+important because it allows us to use this number in other data structures
+without having to worry about users being added/removed later on, as you'll
+see later in this document.
+
+ The screenwidth and screenheight fields are almost never used anymore. Back
+when people were calling into dialup systems we had no way of knowing the
+user's screen dimensions, but modern networks almost always transmit this
+information so we set it up dynamically.
+
+
+ QUICKROOM
+ ---------
+
+ These are room records. One per room. It's called 'quickroom' because at
+one time it was a quick index hash type of thing (there was a pair called
+quickroom and fullroom). There is a quickroom record for every room on the
+system, public or private or mailbox. It's indexed by room name (also in
+lower case for easy indexing) and it contains records which look like this:
+
+struct quickroom {
+ char QRname[ROOMNAMELEN]; /* Name of room */
+ char QRpasswd[10]; /* Only valid if it's a private rm */
+ long QRroomaide; /* User number of room aide */
+ long QRhighest; /* Highest message NUMBER in room */
+ time_t QRgen; /* Generation number of room */
+ unsigned QRflags; /* See flag values below */
+ char QRdirname[15]; /* Directory name, if applicable */
+ long QRinfo; /* Info file update relative to msgs*/
+ char QRfloor; /* Which floor this room is on */
+ time_t QRmtime; /* Date/time of last post */
+ struct ExpirePolicy QRep; /* Message expiration policy */
+ long QRnumber; /* Globally unique room number */
+ char QRorder; /* Sort key for room listing order */
+ unsigned QRflags2; /* Additional flags */
+ int QRdefaultview; /* How to display the contents */
+};
+
+ Again, mostly self-explanatory. Here are the interesting ones:
+
+ QRnumber is a globally unique room ID, while QRgen is the "generation number"
+of the room (it's actually a timestamp). The two combined produce a unique
+value which identifies the room. The reason for two separate fields will be
+explained below when we discuss the visit table. For now just remember that
+QRnumber remains the same for the duration of the room's existence, and QRgen
+is timestamped once during room creation but may be restamped later on (for
+example, to make all users forget the room).
+
- msgmain The big circular file that contains message text
- quickroom Contains room info such as room names, stats, etc.
- fullroom One fullrm file per room: message numbers and pointers.
- usersupp Contains info for each user on the system.
-
- The fundamental structure of the system differs greatly from the way
-Citadels used to work. Citadel now depends on a record manager or database
-manager of some sort. Thanks to the API which is in place for connecting to
-a data store, any record manager may be used as long as it supports the
-storage and retrieval of large binary objects (blobs) indexed by unique keys.
-Please see database.c for more information on data store primitives.
-
- The message base (MSGMAIN) is a big file of messages indexed by the message
-number. Messages are numbered consecutively and start with an FF (hex)
-byte. Except for this FF start-of-message byte, all bytes in the message
-file have the high bit set to 0. This means that in principle it is
-trivial to scan through the message file and locate message N if it
-exists, or return error. (Complexities, as usual, crop up when we
-try for efficiency...)
-
- Each room is basically just a list of message numbers. Each time
+
+ FLOORTAB
+ --------
+
+ Floors. This is so simplistic it's not worth going into detail about, except
+to note that we keep a reference count of the number of rooms on each floor.
+
+
+
+ MSGLISTS
+ --------
+ Each record in this table consists of a bunch of message numbers
+which represent the contents of a room. A message can exist in more than one
+room (for example, a mail message with multiple recipients -- 'single instance
+store'). This table is never, ever traversed in its entirety. When you do
+any type of read operation, the server fetches the msglist for the room
+you're in (using the room's ID as the index key) and then you can go ahead
+and read those messages one by one.
+
+ Each room is basically just a list of message numbers. Each time
we enter a new message in a room, its message number is appended to the end
of the list. If an old message is to be expired, we must delete it from the
message base. Reading a room is just a matter of looking up the messages
one by one and sending them to the client for display, printing, or whatever.
- Implementing the "new message" function is also trivial in principle:
-we just keep track, for each caller in the userlog, of the highest-numbered
-message which existed on the *last* call. (Remember, message numbers are
-simply assigned sequentially each time a message is created. This
-sequence is global to the entire system, not local within a room.) If
-we ignore all message-numbers in the room less than this, only new messages
-will be printed. Voila!
+
+ VISIT
+ -----
+
+ This is the tough one. Put on your thinking cap and grab a fresh cup of
+coffee before attempting to grok the visit table.
+
+ This table contains records which establish the relationship between users
+and rooms. Its index is a hash of the user and room combination in question.
+When looking for such a relationship, the record in this table can tell the
+server things like "this user has zapped this room," "this user has access to
+this private room," etc. It's also where we keep track of which messages
+the user has marked as "old" and which are "new" (which are not necessarily
+contiguous; contrast with older Citadel implementations which simply kept a
+"last read" pointer).
+
+ Here's what the records look like:
+
+struct visit {
+ long v_roomnum;
+ long v_roomgen;
+ long v_usernum;
+ long v_lastseen;
+ unsigned int v_flags;
+ char v_seen[SIZ];
+ int v_view;
+};
+
+#define V_FORGET 1 /* User has zapped this room */
+#define V_LOCKOUT 2 /* User is locked out of this room */
+#define V_ACCESS 4 /* Access is granted to this room */
+
+ This table is indexed by a concatenation of the first three fields. Whenever
+we want to learn the relationship between a user and a room, we feed that
+data to a function which looks up the corresponding record. The record is
+designed in such a way that an "all zeroes" record (which is what you get if
+the record isn't found) represents the default relationship.
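
The lookup described above might be sketched as follows. This is only an
illustration: the in-memory array stands in for the record manager, the
function name find_visit is hypothetical, and the value of SIZ is an
assumption (the real one comes from the Citadel headers).

```c
#include <string.h>

#define SIZ 4096   /* assumption: the real value is defined by Citadel */

struct visit {
    long v_roomnum;
    long v_roomgen;
    long v_usernum;
    long v_lastseen;
    unsigned int v_flags;
    char v_seen[SIZ];
    int v_view;
};

/* Hypothetical lookup over an array standing in for the visit table.
 * The key is the combination of room number, room generation and user
 * number.  If nothing matches, *out is left all zeroes, which is the
 * default relationship.  Returns 1 if a record was found, 0 otherwise. */
static int find_visit(const struct visit *table, size_t count,
                      long roomnum, long roomgen, long usernum,
                      struct visit *out)
{
    size_t i;

    memset(out, 0, sizeof *out);
    for (i = 0; i < count; i++) {
        if (table[i].v_roomnum == roomnum &&
            table[i].v_roomgen == roomgen &&
            table[i].v_usernum == usernum) {
            *out = table[i];
            return 1;
        }
    }
    return 0;
}
```

Because a miss yields a zeroed record, the caller can test v_flags without
checking whether the record actually existed.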
+
+ With this data, we now know which private rooms we're allowed to visit: if
+the V_ACCESS bit is set, the room is one which the user knows, and it may
+appear in his/her known rooms list. Conversely, we also know which rooms the
+user has zapped: if the V_FORGET flag is set, we relegate the room to the
+zapped list and don't bring it up during new message searches. It's also
+worth noting that the V_LOCKOUT flag works in a similar way to administratively
+lock users out of rooms.
+
+ Implementing the "cause all users to forget room" command, then, becomes very
+simple: we simply change the generation number of the room by putting a new
+timestamp in the QRgen field. This causes all relevant visit records to
+become irrelevant, because they appear to point to a different room. At the
+same time, we don't lose the messages in the room, because the msglists table
+is indexed by the room number (QRnumber), which never changes.
+
+ v_seen contains a string which represents the set of messages in this room
+which the user has read (marked as 'seen' or 'old'). It follows the same
+syntax used by IMAP and NNTP. When we search for new messages, we simply
+return any messages that are in the room that are *not* represented by this
+set. Naturally, when we do want to mark more messages as seen (or unmark
+them), we change this string. Citadel BBS client implementations are naive
+and think linearly in terms of "everything is old up to this point," but IMAP
+clients want to have more granularity.
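
As a rough illustration of how such a set string could be tested, here is a
minimal sketch. It assumes the IMAP-style form of comma-separated items,
each either a single number or a "lo:hi" range (for example "1:5,7,10:12"),
and does not handle the "*" wildcard. The function name is_seen is
hypothetical and not taken from the Citadel source.

```c
#include <stdlib.h>

/* Return 1 if msgnum is covered by the set string, 0 otherwise.
 * Assumed syntax: comma-separated items, each "n" or "lo:hi". */
static int is_seen(const char *seen, long msgnum)
{
    const char *p = seen;

    while (*p != '\0') {
        char *end;
        long lo, hi;

        lo = strtol(p, &end, 10);
        if (end == p)
            break;                  /* not a number: stop parsing */
        hi = lo;
        p = end;

        if (*p == ':') {            /* range item "lo:hi" */
            p++;
            hi = strtol(p, &end, 10);
            if (end == p)
                break;
            p = end;
        }

        if (msgnum >= lo && msgnum <= hi)
            return 1;

        if (*p == ',')
            p++;                    /* move on to the next item */
        else
            break;
    }
    return 0;
}
```

A "new messages" search would then walk the room's message list and keep
every message number for which is_seen() returns 0.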
+
+
+ DIRECTORY
+ ---------
+
+ This table simply maps Internet e-mail addresses to Citadel network addresses
+for quick lookup. It is generated from data in the Global Address Book room.
+
+
+ USETABLE
+ --------
+ This table keeps track of message ID's of messages arriving over a network,
+to prevent duplicates from being posted if someone misconfigures the network
+and a loop is created. This table goes unused on a non-networked Citadel.
+
+ MSGMAIN
+ -------
+
+ This is where all message text is stored. It's indexed by message number:
+give it a number, get back a message. Messages are numbered sequentially, and
+the message numbers are never reused.
+ We also keep a "metadata" record for each message. This record is also stored
+in the msgmain table, using the index (0 - msgnum). We keep in the metadata
+record, among other things, a reference count for each message. Since a
+message may exist in more than one room, it's important to keep this reference
+count up to date, and to delete the message from disk when the reference count
+reaches zero.
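
The indexing scheme and the reference count can be pictured with a small
sketch. The struct msg_meta and both function names are hypothetical; the
real metadata record holds more than is shown here.

```c
/* Hypothetical, cut-down metadata record. */
struct msg_meta {
    long msgnum;     /* message this record describes */
    int refcount;    /* number of rooms that still list the message */
};

/* The metadata record lives in the same table as the message itself,
 * under the negated message number: (0 - msgnum). */
static long metadata_key(long msgnum)
{
    return 0L - msgnum;
}

/* Drop one reference.  Returns 1 when the count has reached zero, which
 * is the point at which the message should be deleted from disk. */
static int release_reference(struct msg_meta *meta)
{
    if (meta->refcount > 0)
        meta->refcount--;
    return meta->refcount == 0;
}
```

Since message numbers are positive, the negated keys can never collide with
the keys of the messages themselves.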
- Message format on disk (MSGMAIN)
+ Here's the format for the message itself:
- As discussed above, each message begins with an FF byte.
+ Each message begins with an 0xFF 'start of message' byte.
The next byte denotes whether this is an anonymous message. The codes
available are MES_NORMAL, MES_ANON, or MES_AN2 (defined in citadel.h).
BYTE Mnemonic Comments
A Author Name of originator of message.
-B Phone number The dialup number of the system this message
- originated on. This is optional, and is only
- defined for helping implement C86Net gateways.
D Destination Contains name of the system this message should
be sent to, for mail routing (private mail only).
E Extended ID A persistent alphanumeric Message ID used for
message should be deleted. If there exist any messages
with the same Extended ID that are *newer*, then this
message should be dropped.
-F rFc821 address For Internet mail, this is the delivery address of the
+F rFc822 address For Internet mail, this is the delivery address of the
message author.
-G Gateway domain This field is provided solely for the implementation
- of C86Net gateways, and holds the C86Net domain of
- the system this message originated on. Unless you're
- implementing such a gateway, there's no need to even
- bother with this field.
H HumanNodeName Human-readable name of system message originated on.
I Original ID A 32-bit integer containing the message ID on the
system the message *originated* on.
U Subject Optional. Developers may choose whether they wish to
generate or display subject fields. Citadel/UX does
not generate them, but it does print them when found.
+0 Error This field is typically never found in a message on
+ disk or in transit. Message scanning modules are
+ expected to fill in this field when rejecting a message
+ with an explanation as to what happened (virus found,
+ message looks like spam, etc.)
EXAMPLE
place. The H field looks better when it is placed immediately after the N
field.
+
+
+
+
Networking
Citadel nodes network by sharing one or more rooms. Any Citadel node
cycling under control.
(Uncoincidentally) the format used to transmit messages for networking
-purposes is precisely that used on disk, except that there may be any amount
-of garbage between the null ending a message and the <FF> starting the next
-one. This allows greater compatibility if slight problems crop up. The current
-distribution includes netproc.c, which is basically a database replicator;
+purposes is precisely that used on disk, serialized. The current
+distribution includes serv_network.c, which is basically a database replicator;
please see network.txt on its operation and functionality (if any).
+
+
Portability issues
- At this point, all hardware-dependent stuff has been removed from the
-system. On the server side, most of the OS-dependent stuff has been isolated
-into the sysdep.c source module. The server should compile on any POSIX
-compliant system with a full pthreads implementation and TCP/IP support. In
-the future, we may try to port it to non-POSIX systems as well.
+ Citadel/UX is 64-bit clean, architecture-independent, and Year 2000
+compliant. The software should compile on any POSIX compliant system with
+a full pthreads implementation and TCP/IP support. In the future we may
+try to port it to non-POSIX systems as well.
On the client side, it's also POSIX compliant. The client even seems to
-build ok on non-POSIX systems with porting libraries (such as the Cygnus
-Win32 stuff).
+build ok on non-POSIX systems with porting libraries (such as Cygwin).
- "Room" records (quickroom)
-
-The rooms are basically indices into msgmain, the message database.
-As noted in the overview, each is essentially an array of pointers into
-the message file. The pointers consist of a 32-bit message ID number
-(we will wrap around at 32 bits for these purposes).
-
-Since messages are numbered sequentially, the
-set of messages existing in msgmain will always form a continuous
-sequence at any given time.
-
-That should be enough background to tackle a full-scale room. From citadel.h:
-struct quickroom {
- char QRname[20]; /* Max. len is 19, plus null term */
- char QRpasswd[10]; /* Only valid if it's a private rm */
- long QRroomaide; /* User number of room aide */
- long QRhighest; /* Highest message NUMBER in room */
- long QRgen; /* Generation number of room */
- unsigned QRflags; /* See flag values below */
- char QRdirname[15]; /* Directory name, if applicable */
- char QRfloor; /* (not yet implemented) */
- };
-
-#define QR_BUSY 1 /* Room is being updated, WAIT */
-#define QR_INUSE 2 /* Set if in use, clear if avail */
-#define QR_PRIVATE 4 /* Set for any type of private room */
-#define QR_PASSWORDED 8 /* Set if there's a password too */
-#define QR_GUESSNAME 16 /* Set if it's a guessname room */
-#define QR_DIRECTORY 32 /* Directory room */
-#define QR_UPLOAD 64 /* Allowed to upload */
-#define QR_DOWNLOAD 128 /* Allowed to download */
-#define QR_VISDIR 256 /* Visible directory */
-#define QR_ANONONLY 512 /* Anonymous-Only room */
-#define QR_ANON2 1024 /* Anonymous-Option room */
-#define QR_NETWORK 2048 /* Shared network room */
-#define QR_PREFONLY 4096 /* Preferred users only */
-
-[Note that all components start with "QR" for quickroom, to make sure we
- don't accidentally use an offset in the wrong structure. Be very careful
- also to get a meaningful sequence of components --
- some C compilers don't check this sort of stuff either.]
-
-QRgen handles the problem of rooms which have died and been reborn
-under another name. This will be clearer when we get to the userlog.
-For now, just note that each room has a generation number which is
-bumped by one each time it is recycled.
-
-QRflags is just a bag of bits recording the status of the room. The
-defined bits are:
-
-QR_BUSY This is to insure that two processes don't update the same
- record at the same time, even though this hasn't been
- implemented yet.
-QR_INUSE 1 if the room is valid, 0 if it is free for re-assignment.
-QR_PRIVATE 1 if the room is not visible by default, 0 for public.
-QR_PASSWORDED 1 if entry to the room requires a password.
-QR_GUESSNAME 1 if the room can be reached by guessing the name.
-QR_DIRECTORY 1 if the room is a window onto some disk/userspace, else 0.
-QR_UPLOAD 1 if users can upload into this room, else 0.
-QR_DOWNLOAD 1 if users can download from this room, else 0.
-QR_VISDIR 1 if users are allowed to read the directory, else 0.
-QR_ANONONLY 1 if all messages are to recieve the "****" anon header.
-QR_ANON2 1 if the user will be asked if he/she wants an anon message.
-QR_NETWORK 1 if this room is shared on a network, else 0.
-QR_PREFONLY 1 if the room is only accessible to preferred users, else 0.
-
-QRname is just an ASCII string (null-terminated, like all strings)
-giving the name of the room.
-
-QRdirname is meaningful only in QR_DIRECTORY rooms, in which case
-it gives the directory name to window.
-
-QRpasswd is the room's password, if it's a QR_PASSWORDED room. Note that
-if QR_PASSWORDED or QR_GUESSNAME are set, you MUST also set QR_PRIVATE.
-QR_PRIVATE by itself designates invitation-only. Do not EVER set all three
-flags at the same time.
-
-QRroomaide is the user number of the room's room-aide (or zero if the room
-doesn't have a room aide). Note that if a user is deleted, his/her user number
-is never used again, so you don't have to worry about a new user getting the
-same user number and accidentally becoming a room-aide of one or more rooms.
-
-The only field new to us in quickroom is QRhighest, recording the
-most recent message in the room. When we are searching for rooms with
-messages a given caller hasn't seen, we can check this number
-and avoid a whole lot of extra disk accesses.
-
- There used to also be a structure called "fullroom" which resided in one
-file for each room on the system. This has been abandoned in favour of
-"message lists" which are variable sized and simply contain zero or more
-message numbers. The message numbers, in turn, point to messages on disk.
-
- User records (usersupp)
-
-This is the fun one. Get some fresh air and plug in your thinking cap
-first. (Time, space and complexity are the eternal software rivals.
-We've got lots of log entries times lots of messages spread over up to nnn
-rooms to worry about, and with multitasking, disk access time is important...
-so perforce, we opt for complexity to keep time and space in bounds.)
-
-To understand what is happening in the log code takes a little persistence.
-You also have to disentangle the different activities going on and
-tackle them one by one.
-
- o We want to remember some random things such as terminal screen
- size, and automatically set them up for each caller at login.
-
- o We want to be able to locate all new messages, and only new
- messages, efficiently. Messages should stay new even if it
- takes a caller a couple of calls to get around to them.
-
- o We want to remember which private rooms a given caller knows
- about, and treat them as normal rooms. This means mostly
- automatically seeking out those with new messages. (Obviously,
- we >don't< want to do this for unknown private rooms!) This
- has to be secure against the periodic recycling of rooms
- between calls.
-
- o We want to support private mail to a caller.
-
- o We want to provide some protection of this information (via
- passwords at login) and some assurance that messages are from
- who they purport to be from (within the system -- one shouldn't
- be able to forge messages from established users).
-
-Lifting another page from citadel.h gives us:
-
-struct usersupp { /* User record */
- int USuid; /* uid account is logged in under */
- char password[20]; /* password */
- long lastseen[MAXROOMS]; /* Last message seen in each room */
- char generation[MAXROOMS]; /* Generation # (for private rooms) */
- char forget[MAXROOMS]; /* Forgotten generation number */
- unsigned flags; /* See US_ flags below */
- int screenwidth; /* For formatting messages */
- int timescalled; /* Total number of logins */
- int posted; /* Number of messages posted (ever) */
- char fullname[26]; /* Bulletin Board name for messages */
- char axlevel; /* Access level */
- long usernum; /* Eternal user number */
- long lastcall; /* Last time the user called */
- };
-
-#define US_PERM 1 /* Permanent user; don't scroll off */
-#define US_LASTOLD 16 /* Print last old message with new */
-#define US_EXPERT 32 /* Experienced user */
-#define US_UNLISTED 64 /* Unlisted userlog entry */
-#define US_NOPROMPT 128 /* Don't prompt after each message */
-#define US_PREF 1024 /* Preferred user */
-
-Looks simple enough, doesn't it? One topic at a time:
-
- Random configuration parameters:
--screenwidth is the caller's screen width. We format all messages to this
-width, as best we can. flags is another bit-bag, recording whether we want
-prompts, people who want to suppress the little automatic hints all through
-the system, etc.
-
- Attachments, names & numbers:
--USuid is the uid the account was established under. For most users it will
-be the same as BBSUID, but it won't be for users that logged in from the shell.
--fullname is the user's full login name.
--usernum is the user's ID number. It is unique to the entire system:
-once someone has a user number, it is never used again after the user is
-deleted. This allows an easy way to numerically represent people.
--password is the user's password.
--axlevel is the user's access level, so we know who's an Aide, who's a problem
-user, etc. These are defined and listed in the system.
-
- Feeping Creatures:
--timescalled is the number of times the user has called.
--posted is the number of messages the user has posted, public or private.
-
- Misc stuff:
--lastcall holds the date and time (standard Unix format) the user called, so
-we can purge people who haven't called in a given amount of time.
-
- Finding new messages:
-This is the most important. Thus, it winds up being the most
-elaborate. Conceptually, what we would like to do is mark each
-message with a bit after our caller has read it, so we can avoid
-printing it out again next call. Unfortunately, with lots of user
-entries this would require adding lots of bits to each message... and
-we'd wind up reading off disk lots of messages which would never
-get printed. So we resort to approximation and a small table.
-
-The approximation comes in doing things at the granularity of
-rooms rather than messages. Messages in a given room are "new"
-until we visit it, and "old" after we leave the room... whether
-we read any of them or not. This can actually be defended: anyone
-who passes through a room without reading the contents probably just
-isn't interested in the topic, and would just as soon not be dragged
-back every visit and forced to read them. Given that messages are
-numbered sequentially, we can simply record the most recent message ID#
-of each room as of the last time we visited it. Very simple.
-
-Putting it all together, we can now compute whether a given room
-has new messages for our current caller without going to the message base
-index (fullroom) at all:
-
- > We get the usersupp.lastseen[] for the room in question
- > We compare this with the room's quickroom.QRhighest, which tells us
- what the most recent message in the room is currently.
-
-
- REMEMBERING WHICH PRIVATE ROOMS TO VISIT
-
-This looks trivial at first glance -- just record one bit per room per
-caller in the log records. The problem is that rooms get recycled
-periodically, and we'd rather not run through all the log entries each
-time we do it. So we adopt a kludge which should work 99% of the time.
-
-As previously noted, each room has a generation number, which is bumped
-by one each time it is recycled. As not noted, this generation number
-runs from 0 -> 127 (and then wraps around and starts over).
- When someone visits a room, we set usersupp.generation for the room
-equal to that of the room. This flags the room as being available.
-If the room gets recycled, on our next visit the two generation numbers
-will no longer match, and the room will no longer be available -- just
-the result we're looking for. (Naturally, if a room is public,
-all this stuff is irrelevant.)
-
-This leaves only the problem of an accidental matchup between the two
-numbers giving someone access to a Forbidden Room. We can't eliminate
-this danger completely, but it can be reduced to insignificance for
-most purposes. (Just don't bet megabucks on the security of this system!)
-Each time someone logs in, we set all "wrong" generation numbers to -1.
-So the room must be recycled 127 times before an accidental matchup
-can be achieved. (We do this for all rooms, INUSE or dead, public
-or private, since any of them may be reincarnated as a Forbidden Room.)
-
-Thus, for someone to accidentally be led to a Forbidden Room, they
-must establish an account on the system, then not call until some room
-has been recycled 127 to 128 times, which room must be
-reincarnated as a Forbidden Room, which someone must now call back
-(having not scrolled off the userlog in the mean time) and read new
-messages. The last clause is about the only probable one in the sequence.
-The danger of this is much less than the danger that someone will
-simply guess the name of the room outright (if it's a guess-name room)
-or some other human loophole.
-
- FORGOTTEN ROOMS
-
- This is exactly the opposite of private rooms. When a user chooses to
-forget a room, we put the room's generation number in usersupp.forget for
-that room. When doing a <K>nown rooms list or a <G>oto, any matchups cause
-the room to be skipped. Very simple.
+
SUPPORTING PRIVATE MAIL
separate Mail> room for each user behind the scenes. The actual room name
in the database looks like "0000001234.Mail" (where '1234' is the user
number) and it's flagged with the QR_MAILBOX flag. The user number is
-stripped off by the server before the name is presented to the client.
+stripped off by the server before the name is presented to the client. This
+provides the ability to give each user a separate namespace for mailboxes
+and personal rooms.
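
A minimal sketch of that naming scheme, assuming a ten-digit zero-padded
user number followed by a dot, as in the "0000001234.Mail" example. Both
function names are hypothetical, and a real server would consult the
QR_MAILBOX flag rather than guess from the name alone.

```c
#include <ctype.h>
#include <stdio.h>

/* Build the database name of a per-user room, e.g. "0000001234.Mail". */
static void make_mailbox_name(char *buf, size_t buflen,
                              long usernum, const char *room)
{
    snprintf(buf, buflen, "%010ld.%s", usernum, room);
}

/* Return the name as a client would see it: skip the ten digits and the
 * dot if they are present, otherwise return the name unchanged. */
static const char *client_room_name(const char *dbname)
{
    int i;

    for (i = 0; i < 10; i++) {
        if (!isdigit((unsigned char)dbname[i]))
            return dbname;
    }
    return (dbname[10] == '.') ? dbname + 11 : dbname;
}
```

Two users can therefore each own a room called "Mail" without the database
names colliding.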
This requires a little fiddling to get things just right. For example,
make_message() has to be kludged to ask for the name of the recipient
regular.
+
PASSWORDS AND NAME VALIDATION
This has changed a couple of times over the course of Citadel's history. At