+++ /dev/null
- hack.txt for Citadel/UX
- (possibly a little out of date)
-
- Much of this document is borrowed from the original hack.doc from
-Citadel-CP/M and Citadel-86, because many of the concepts are the same. Hats
-off to whoever wrote the original, for a fine document that inspired the
-implementation of Citadel for Unix.
-
- Note that this document is really out of date. It doesn't cover anything
-about the threaded server architecture or any of the network stuff. What is
-covered here is the basic architecture of the databases.
-
- But enough of the preamble. Here's how Citadel/UX works :)
-
- Here are the major databases to be discussed:
-
-   msgmain     The big circular file that contains message text.
-   quickroom   Contains room info such as room names, stats, etc.
-   fullroom    One fullrm file per room: message numbers and pointers
-               (obsolete; since replaced by message lists, see below).
-   usersupp    Contains info for each user on the system.
-
- The fundamental structure of the system differs greatly from the way
-Citadels used to work. Citadel now depends on a record manager or database
-manager of some sort. Thanks to the API which is in place for connecting to
-a data store, any record manager may be used as long as it supports the
-storage and retrieval of large binary objects (blobs) indexed by unique keys.
-Please see database.c for more information on data store primitives.
-
- The message base (MSGMAIN) is a big file of messages indexed by the message
-number. Messages are numbered consecutively and start with an FF (hex)
-byte. Except for this FF start-of-message byte, all bytes in the message
-file have the high bit set to 0. This means that in principle it is
-trivial to scan through the message file and locate message N if it
-exists, or return error. (Complexities, as usual, crop up when we
-try for efficiency...)
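-
- Here is a minimal C sketch of that linear scan. It assumes the file has
-been read into memory, that it does not wrap around (the real msgmain is
-circular), and that the number of the first message in the buffer is
-already known. find_message() is a hypothetical name, not a function from
-the Citadel sources.
-
```c
#include <stddef.h>

/* Hypothetical helper, not part of Citadel: return the byte offset of
 * message number n in an in-memory, non-wrapping copy of msgmain whose
 * first message is numbered first, or -1 if there is no such message.
 * It relies only on the rule that 0xFF never occurs except as the
 * start-of-message byte. */
long find_message(const unsigned char *buf, size_t len, long first, long n)
{
    long current = first - 1;   /* number of the message we are inside */

    if (n < first)
        return -1;
    for (size_t i = 0; i < len; i++) {
        if (buf[i] == 0xFF) {   /* start of the next message */
            current++;
            if (current == n)
                return (long)i;
        }
    }
    return -1;                  /* ran off the end: no such message */
}
```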
-
- Each room is basically just a list of message numbers. Each time
-we enter a new message in a room, its message number is appended to the end
-of the list. If an old message is to be expired, we must delete it from the
-message base. Reading a room is just a matter of looking up the messages
-one by one and sending them to the client for display, printing, or whatever.
-
- Implementing the "new message" function is also trivial in principle:
-we just keep track, for each caller in the userlog, of the highest-numbered
-message which existed on the *last* call. (Remember, message numbers are
-simply assigned sequentially each time a message is created. This
-sequence is global to the entire system, not local within a room.) If
-we ignore all message-numbers in the room less than this, only new messages
-will be printed. Voila!
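-
- In code, the whole "new message" rule is one comparison per message
-number. A minimal sketch, assuming the room's message list is an ascending
-array of longs and that the output array has space for every entry;
-new_messages() is a hypothetical name:
-
```c
#include <stddef.h>

/* Hypothetical sketch: copy into out only the message numbers greater
 * than lastseen (the highest message number that existed on the caller's
 * previous call) and return how many were copied.  out must have space
 * for count entries. */
size_t new_messages(const long *msglist, size_t count, long lastseen,
                    long *out)
{
    size_t n = 0;

    for (size_t i = 0; i < count; i++)
        if (msglist[i] > lastseen)
            out[n++] = msglist[i];
    return n;
}
```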
-
-
- Message format on disk (MSGMAIN)
-
- As discussed above, each message begins with an FF byte.
-
- The next byte denotes whether this is an anonymous message. The codes
-available are MES_NORMAL, MES_ANON, or MES_AN2 (defined in citadel.h).
-
- The third byte is a "message type" code. The following codes are defined:
- 0 - "Traditional" Citadel format. Message is to be displayed "formatted."
- 1 - Plain pre-formatted ASCII text (otherwise known as text/plain)
- 4 - MIME formatted message. The text of the message which follows is
- expected to begin with a "Content-type:" header.
-
- After these three opening bytes, the remainder of
-the message consists of a sequence of character strings. Each string
-begins with a type byte indicating the meaning of the string and is
-ended with a null. All strings are printable ASCII: in particular,
-all numbers are in ASCII rather than binary. This is for simplicity,
-both in implementing the system and in implementing other code to
-work with the system. For instance, a database driven off Citadel archives
-can do wildcard matching without worrying about unpacking binary data such
-as message IDs first. To stay compatible with later additions to the format,
-all software should be written to IGNORE fields not currently defined.
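-
- A reader for this layout only needs to skip the three header bytes and
-walk the null-terminated fields. The sketch below is hypothetical
-(find_field() is not a Citadel function) and assumes the message is in
-memory starting at its FF byte; it steps over field types it does not
-recognize, as the rule above requires.
-
```c
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch: msg points at a message's 0xFF byte and len bytes
 * are available.  Byte 1 is the anonymous flag, byte 2 the message type;
 * after that come fields of the form <type byte><text><0>.  Returns the
 * text of the first field whose type byte is type, or NULL if absent.
 * Unknown field types are simply skipped. */
const char *find_field(const unsigned char *msg, size_t len, char type)
{
    size_t i = 3;                        /* skip FF, anon flag, msg type */

    if (len < 3 || msg[0] != 0xFF)
        return NULL;
    while (i < len && msg[i] != 0xFF) {  /* 0xFF starts the next message */
        const unsigned char *end = memchr(msg + i, 0, len - i);

        if (end == NULL)
            return NULL;                 /* unterminated field */
        if (msg[i] == (unsigned char)type)
            return (const char *)msg + i + 1;
        i = (size_t)(end - msg) + 1;     /* step past the null */
    }
    return NULL;
}
```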
-
- The type bytes currently defined are:
-
-BYTE Mnemonic        Comments
-
-A    Author          Name of originator of message.
-B    Phone number    The dialup number of the system this message
-                     originated on. This is optional, and is only
-                     defined for helping implement C86Net gateways.
-D    Destination     Contains name of the system this message should
-                     be sent to, for mail routing (private mail only).
-E    Extended ID     A persistent alphanumeric Message ID used for
-                     network replication. When a message arrives that
-                     contains an Extended ID, any existing messages which
-                     contain the same Extended ID and are *older* than this
-                     message should be deleted. If there exist any messages
-                     with the same Extended ID that are *newer*, then this
-                     message should be dropped.
-G    Gateway domain  This field is provided solely for the implementation
-                     of C86Net gateways, and holds the C86Net domain of
-                     the system this message originated on. Unless you're
-                     implementing such a gateway, there's no need to even
-                     bother with this field.
-H    HumanNodeName   Human-readable name of system message originated on.
-I    Original ID     A 32-bit integer containing the message ID on the
-                     system the message *originated* on.
-M    Message Text    Normal ASCII, with line breaks as CRs or LFs,
-                     null terminated as always.
-N    Nodename        Contains node name of system message originated on.
-O    Room            Room of origin.
-P    Path            Complete path of message, as in the UseNet news
-                     standard. A user should be able to send Internet mail
-                     to this path. (Note that your system name will not be
-                     tacked onto this until you're sending the message to
-                     someone else.)
-R    Recipient       Only present in Mail messages.
-S    Special field   Only meaningful for messages being spooled over a
-                     network. Usually means that the message isn't really
-                     a message, but rather some other network function:
-                     -> "S" followed by "FILE" (followed by a null, of
-                     course) means that the message text is actually an
-                     IGnet/Open file transfer.
-T    Date/Time       A 32-bit integer containing the date and time of
-                     the message in standard UNIX format (the number
-                     of seconds since January 1, 1970 GMT).
-U    Subject         Optional. Developers may choose whether they wish to
-                     generate or display subject fields. Citadel/UX does
-                     not generate them, but it does print them when found.
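-
- The E field rule in the table above boils down to a three-way comparison.
-A minimal sketch; it assumes "older" and "newer" are judged by the
-messages' T (date/time) fields, which the description does not actually
-specify, and the enum and function names are hypothetical:
-
```c
#include <time.h>

/* Hypothetical sketch of the Extended ID rule.  incoming is the T field
 * of the newly arrived message, existing that of a stored message with
 * the same Extended ID.  Comparing T fields is an assumption. */
enum eid_action { EID_DELETE_EXISTING, EID_DROP_INCOMING, EID_NO_ACTION };

enum eid_action extended_id_rule(time_t incoming, time_t existing)
{
    if (existing < incoming)
        return EID_DELETE_EXISTING;  /* stored copy is older: delete it */
    if (existing > incoming)
        return EID_DROP_INCOMING;    /* a newer copy is already here */
    return EID_NO_ACTION;            /* same age: left to the caller */
}
```
-
- The description does not say what happens when both copies carry the
-same timestamp, so the sketch reports no action and leaves that case to
-the caller.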
-
- EXAMPLE
-
-Let <FF> be a 0xFF byte, and <0> be a null (0x00) byte. Then a message
-which prints as...
-
-Apr 12, 1988 23:16 From Test User In Network Test> @lifesys (Life BBS)
-Have a nice day!
-
- might be stored as...
-<FF><40><0>I12345<0>Pneighbor!lifesys!test_user<0>T576918988<0> (continued)
------------|Mesg ID#|--Message Path---------------|--Date------
-
-AThe Test User<0>ONetwork Test<0>Nlifesys<0>HLife BBS<0>MHave a nice day!<0>
-|-----Author-----|-Room name-----|-nodename-|Human Name-|--Message text-----
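-
- Producing this layout is just as simple: write the three header bytes,
-then append each field as its type byte, its text, and a null. A minimal
-sketch with a hypothetical put_field() helper; it returns 0 when the buffer
-is too small, which is unambiguous because the first field always starts
-at offset 3.
-
```c
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch: append one <type><text><0> field to buf (capacity
 * cap) at offset pos.  Returns the offset just past the new field, or 0
 * if the field would not fit. */
size_t put_field(unsigned char *buf, size_t cap, size_t pos,
                 char type, const char *text)
{
    size_t n = strlen(text);

    if (pos + n + 2 > cap)      /* type byte + text + null */
        return 0;
    buf[pos] = (unsigned char)type;
    memcpy(buf + pos + 1, text, n);
    buf[pos + 1 + n] = 0;
    return pos + n + 2;
}
```
-
- Writing <FF><40><0> into the first three bytes and then calling
-put_field() with 'I' and "12345", then with 'T' and "576918988", reproduces
-the first two fields of the example above.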
-
- Weird things can happen if fields are missing, especially if you use the
-networker. In general, the date, author, room, and nodename fields may
-appear in any order, but the leading bytes and the message text must stay
-where they are: the three header bytes first and the message text last.
-The H field looks better when it is placed immediately after the N field.
-
- Networking
-
-Citadel nodes network by sharing one or more rooms. Any Citadel node
-can choose to share messages with any other Citadel node by sending
-spool files. The sending system takes all messages it hasn't sent yet and
-spools them to the receiving system, which posts them in the rooms.
-
-Complexities arise primarily from the possibility of densely connected
-networks: one does not wish to accumulate multiple copies of a given
-message, which can easily happen. Nor does one want to see old messages
-percolating indefinitely through the system.
-
-This problem is handled by keeping track of the path a message has taken over
-the network, like the UseNet news system does. When a system sends out a
-message, it adds its own name to the bang-path in the <P> field of the
-message. If no path field is present, it generates one.
-
-With the path present, all the networker has to do to ensure that it doesn't
-send another system a message that system has already received is check the
-<P>ath field for that system's name somewhere in the bang path. If it's
-present, the system has already seen the message, so we don't send it. (Note
-that the current implementation does not allow for "loops" in the network --
-if you build your net this way you will see lots of duplicate messages.)
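-
- Both path operations are plain string handling. A minimal sketch with
-hypothetical helper names: path_contains() matches whole bang-path
-elements only (so "life" does not match "lifesys"), and path_prepend()
-puts the sending system's name in front, following the UseNet convention;
-that placement is an assumption here, not something stated above.
-
```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: true if node is one complete element of the bang
 * path, e.g. "lifesys" in "neighbor!lifesys!test_user". */
bool path_contains(const char *path, const char *node)
{
    size_t nlen = strlen(node);
    const char *p = path;

    while (*p) {
        const char *bang = strchr(p, '!');
        size_t elen = bang ? (size_t)(bang - p) : strlen(p);

        if (elen == nlen && strncmp(p, node, nlen) == 0)
            return true;
        if (bang == NULL)
            break;
        p = bang + 1;               /* move to the next element */
    }
    return false;
}

/* Hypothetical helper: write "self!path" into out (size cap); returns
 * false if out is too small. */
bool path_prepend(char *out, size_t cap, const char *self, const char *path)
{
    int n = snprintf(out, cap, "%s!%s", self, path);

    return n >= 0 && (size_t)n < cap;
}
```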
-
-The above discussion should make the function of the fields reasonably clear:
-
- o Travelling messages need to carry original message-id, system of origin,
- date of origin, author, and path with them, to keep reproduction and
- cycling under control.
-
-Not coincidentally, the format used to transmit messages for networking
-purposes is precisely that used on disk, except that there may be any amount
-of garbage between the null ending a message and the <FF> starting the next
-one. This allows greater compatibility if slight problems crop up. The current
-distribution includes netproc.c, which is basically a database replicator;
-please see network.txt for its operation and functionality (if any).
-
- Portability issues
-
- At this point, all hardware-dependent stuff has been removed from the
-system. On the server side, most of the OS-dependent stuff has been isolated
-into the sysdep.c source module. The server should compile on any POSIX
-compliant system with a full pthreads implementation and TCP/IP support. In
-the future, we may try to port it to non-POSIX systems as well.
-
- The client is also POSIX compliant. It even seems to build OK on non-POSIX
-systems with porting libraries (such as the Cygnus Win32 stuff).
-
-
- "Room" records (quickroom)
-
-The rooms are basically indices into msgmain, the message database.
-As noted in the overview, each is essentially an array of pointers into
-the message file. The pointers consist of a 32-bit message ID number
-(we will wrap around at 32 bits for these purposes).
-
-Since messages are numbered sequentially, the
-set of messages existing in msgmain will always form a continuous
-sequence at any given time.
-
-That should be enough background to tackle a full-scale room. From citadel.h:
-
-struct quickroom {
- char QRname[20]; /* Max. len is 19, plus null term */
- char QRpasswd[10]; /* Only valid if it's a private rm */
- long QRroomaide; /* User number of room aide */
- long QRhighest; /* Highest message NUMBER in room */
- long QRgen; /* Generation number of room */
- unsigned QRflags; /* See flag values below */
- char QRdirname[15]; /* Directory name, if applicable */
- char QRfloor; /* (not yet implemented) */
- };
-
-#define QR_BUSY 1 /* Room is being updated, WAIT */
-#define QR_INUSE 2 /* Set if in use, clear if avail */
-#define QR_PRIVATE 4 /* Set for any type of private room */
-#define QR_PASSWORDED 8 /* Set if there's a password too */
-#define QR_GUESSNAME 16 /* Set if it's a guessname room */
-#define QR_DIRECTORY 32 /* Directory room */
-#define QR_UPLOAD 64 /* Allowed to upload */
-#define QR_DOWNLOAD 128 /* Allowed to download */
-#define QR_VISDIR 256 /* Visible directory */
-#define QR_ANONONLY 512 /* Anonymous-Only room */
-#define QR_ANON2 1024 /* Anonymous-Option room */
-#define QR_NETWORK 2048 /* Shared network room */
-#define QR_PREFONLY 4096 /* Preferred users only */
-
-[Note that all components start with "QR" for quickroom, to make sure we
- don't accidentally use an offset in the wrong structure. Be very careful
- also to get a meaningful sequence of components --
- some C compilers don't check this sort of stuff either.]
-
-QRgen handles the problem of rooms which have died and been reborn
-under another name. This will be clearer when we get to the userlog.
-For now, just note that each room has a generation number which is
-bumped by one each time it is recycled.
-
-QRflags is just a bag of bits recording the status of the room. The
-defined bits are:
-
-QR_BUSY        Intended to ensure that two processes don't update the same
-               record at the same time. (Not yet implemented.)
-QR_INUSE       1 if the room is valid, 0 if it is free for re-assignment.
-QR_PRIVATE     1 if the room is not visible by default, 0 for public.
-QR_PASSWORDED  1 if entry to the room requires a password.
-QR_GUESSNAME   1 if the room can be reached by guessing the name.
-QR_DIRECTORY   1 if the room is a window onto some disk/userspace, else 0.
-QR_UPLOAD      1 if users can upload into this room, else 0.
-QR_DOWNLOAD    1 if users can download from this room, else 0.
-QR_VISDIR      1 if users are allowed to read the directory, else 0.
-QR_ANONONLY    1 if all messages are to receive the "****" anon header.
-QR_ANON2       1 if the user will be asked whether he/she wants an anon
-               message.
-QR_NETWORK     1 if this room is shared on a network, else 0.
-QR_PREFONLY    1 if the room is only accessible to preferred users, else 0.
-
-QRname is just an ASCII string (null-terminated, like all strings)
-giving the name of the room.
-
-QRdirname is meaningful only in QR_DIRECTORY rooms, in which case
-it gives the directory name to window.
-
-QRpasswd is the room's password, if it's a QR_PASSWORDED room. Note that
-if QR_PASSWORDED or QR_GUESSNAME is set, you MUST also set QR_PRIVATE.
-QR_PRIVATE by itself designates invitation-only. Do not EVER set all three
-flags at the same time.
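-
- The flag rule above can be checked mechanically. A minimal sketch using
-the QR_ values from citadel.h shown earlier; qrflags_valid() is a
-hypothetical name:
-
```c
#include <stdbool.h>

#define QR_PRIVATE    4
#define QR_PASSWORDED 8
#define QR_GUESSNAME  16

/* Hypothetical check: PASSWORDED or GUESSNAME require PRIVATE, and the
 * three flags must never all be set together. */
bool qrflags_valid(unsigned flags)
{
    bool priv  = flags & QR_PRIVATE;
    bool pass  = flags & QR_PASSWORDED;
    bool guess = flags & QR_GUESSNAME;

    if ((pass || guess) && !priv)
        return false;
    if (pass && guess)          /* PRIVATE is set here, so all three are */
        return false;
    return true;
}
```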
-
-QRroomaide is the user number of the room's room-aide (or zero if the room
-doesn't have a room aide). Note that if a user is deleted, his/her user number
-is never used again, so you don't have to worry about a new user getting the
-same user number and accidentally becoming a room-aide of one or more rooms.
-
-The only field new to us in quickroom is QRhighest, recording the
-most recent message in the room. When we are searching for rooms with
-messages a given caller hasn't seen, we can check this number
-and avoid a whole lot of extra disk accesses.
-
- There used to also be a structure called "fullroom" which resided in one
-file for each room on the system. This has been abandoned in favour of
-"message lists" which are variable sized and simply contain zero or more
-message numbers. The message numbers, in turn, point to messages on disk.
-
- User records (usersupp)
-
-This is the fun one. Get some fresh air and plug in your thinking cap
-first. (Time, space and complexity are the eternal software rivals.
-We've got lots of log entries times lots of messages spread over up to nnn
-rooms to worry about, and with multitasking, disk access time is important...
-so perforce, we opt for complexity to keep time and space in bounds.)
-
-To understand what is happening in the log code takes a little persistence.
-You also have to disentangle the different activities going on and
-tackle them one by one.
-
- o We want to remember some random things such as terminal screen
- size, and automatically set them up for each caller at login.
-
- o We want to be able to locate all new messages, and only new
- messages, efficiently. Messages should stay new even if it
- takes a caller a couple of calls to get around to them.
-
- o We want to remember which private rooms a given caller knows
- about, and treat them as normal rooms. This means mostly
- automatically seeking out those with new messages. (Obviously,
- we >don't< want to do this for unknown private rooms!) This
- has to be secure against the periodic recycling of rooms
- between calls.
-
- o We want to support private mail to a caller.
-
- o We want to provide some protection of this information (via
- passwords at login) and some assurance that messages are from
- who they purport to be from (within the system -- one shouldn't
- be able to forge messages from established users).
-
-Lifting another page from citadel.h gives us:
-
-struct usersupp { /* User record */
- int USuid; /* uid account is logged in under */
- char password[20]; /* password */
- long lastseen[MAXROOMS]; /* Last message seen in each room */
- char generation[MAXROOMS]; /* Generation # (for private rooms) */
- char forget[MAXROOMS]; /* Forgotten generation number */
- unsigned flags; /* See US_ flags below */
- int screenwidth; /* For formatting messages */
- int timescalled; /* Total number of logins */
- int posted; /* Number of messages posted (ever) */
- char fullname[26]; /* Bulletin Board name for messages */
- char axlevel; /* Access level */
- long usernum; /* Eternal user number */
- long lastcall; /* Last time the user called */
- };
-
-#define US_PERM 1 /* Permanent user; don't scroll off */
-#define US_LASTOLD 16 /* Print last old message with new */
-#define US_EXPERT 32 /* Experienced user */
-#define US_UNLISTED 64 /* Unlisted userlog entry */
-#define US_NOPROMPT 128 /* Don't prompt after each message */
-#define US_PREF 1024 /* Preferred user */
-
-Looks simple enough, doesn't it? One topic at a time:
-
- Random configuration parameters:
--screenwidth is the caller's screen width. We format all messages to this
-width, as best we can. flags is another bit-bag, recording whether the user
-wants prompts, whether he/she wants to suppress the little automatic hints
-scattered through the system, etc.
-
- Attachments, names & numbers:
--USuid is the uid the account was established under. For most users it will
-be the same as BBSUID, but it won't be for users that logged in from the shell.
--fullname is the user's full login name.
--usernum is the user's ID number. It is unique to the entire system:
-once someone has a user number, it is never used again after the user is
-deleted. This allows an easy way to numerically represent people.
--password is the user's password.
--axlevel is the user's access level, so we know who's an Aide, who's a problem
-user, etc. These are defined and listed in the system.
-
- Feeping Creatures:
--timescalled is the number of times the user has called.
--posted is the number of messages the user has posted, public or private.
-
- Misc stuff:
--lastcall holds the date and time (standard Unix format) the user last
-called, so we can purge people who haven't called in a given amount of time.
-
- Finding new messages:
-This is the most important. Thus, it winds up being the most
-elaborate. Conceptually, what we would like to do is mark each
-message with a bit after our caller has read it, so we can avoid
-printing it out again next call. Unfortunately, with lots of user
-entries this would require adding lots of bits to each message... and
-we'd wind up reading off disk lots of messages which would never
-get printed. So we resort to approximation and a small table.
-
-The approximation comes in doing things at the granularity of
-rooms rather than messages. Messages in a given room are "new"
-until we visit it, and "old" after we leave the room... whether
-we read any of them or not. This can actually be defended: anyone
-who passes through a room without reading the contents probably just
-isn't interested in the topic, and would just as soon not be dragged
-back every visit and forced to read them. Given that messages are
-numbered sequentially, we can simply record the most recent message ID#
-of each room as of the last time we visited it. Very simple.
-
-Putting it all together, we can now compute whether a given room
-has new messages for our current caller without going to the message base
-index (fullroom) at all:
-
- > We get the usersupp.lastseen[] for the room in question
- > We compare this with the room's quickroom.QRhighest, which tells us
- what the most recent message in the room is currently.
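-
- As a sketch (struct layouts trimmed to the one field each that matters,
-MAXROOMS given an illustrative value, and room_has_new() a hypothetical
-name), the whole test is a single comparison:
-
```c
#include <stdbool.h>

#define MAXROOMS 8   /* illustrative only; the real value is in citadel.h */

/* Trimmed-down stand-ins for the structures described in this file. */
struct quickroom { long QRhighest; };
struct usersupp  { long lastseen[MAXROOMS]; };

/* True if room number r holds messages newer than the last one the user
 * saw there; no message list has to be read. */
bool room_has_new(const struct usersupp *us, const struct quickroom *qr,
                  int r)
{
    return qr->QRhighest > us->lastseen[r];
}
```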
-
-
- REMEMBERING WHICH PRIVATE ROOMS TO VISIT
-
-This looks trivial at first glance -- just record one bit per room per
-caller in the log records. The problem is that rooms get recycled
-periodically, and we'd rather not run through all the log entries each
-time we do it. So we adopt a kludge which should work 99% of the time.
-
-As previously noted, each room has a generation number, which is bumped
-by one each time it is recycled. As not noted, this generation number
-runs from 0 -> 127 (and then wraps around and starts over).
- When someone visits a room, we set usersupp.generation for the room
-equal to that of the room. This flags the room as being available.
-If the room gets recycled, on our next visit the two generation numbers
-will no longer match, and the room will no longer be available -- just
-the result we're looking for. (Naturally, if a room is public,
-all this stuff is irrelevant.)
-
-This leaves only the problem of an accidental matchup between the two
-numbers giving someone access to a Forbidden Room. We can't eliminate
-this danger completely, but it can be reduced to insignificance for
-most purposes. (Just don't bet megabucks on the security of this system!)
-Each time someone logs in, we set all "wrong" generation numbers to -1.
-So the room must be recycled 127 times before an accidental matchup
-can be achieved. (We do this for all rooms, INUSE or dead, public
-or private, since any of them may be reincarnated as a Forbidden Room.)
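-
- The generation bookkeeping can be sketched as follows. This is a
-hypothetical illustration, not Citadel's code: plain arrays stand in for
-usersupp.generation and each room's QRgen, MAXROOMS is given an
-illustrative value, and signed char is used so that -1 is representable
-(plain char may be unsigned on some compilers).
-
```c
#include <stdbool.h>

#define MAXROOMS 8   /* illustrative only */

/* Visiting a room records the generation the user saw. */
void visit_room(signed char *usgen, const signed char *qrgen, int r)
{
    usgen[r] = qrgen[r];
}

/* Recycling bumps the room's generation, wrapping 127 -> 0. */
void recycle_room(signed char *qrgen, int r)
{
    qrgen[r] = (signed char)((qrgen[r] + 1) % 128);
}

/* A private room stays known only while the generations match. */
bool knows_private_room(const signed char *usgen, const signed char *qrgen,
                        int r)
{
    return usgen[r] == qrgen[r];
}

/* At login, every stale entry becomes -1, which never equals a live
 * generation (0..127), so a reset entry can never match by accident. */
void login_reset(signed char *usgen, const signed char *qrgen)
{
    for (int r = 0; r < MAXROOMS; r++)
        if (usgen[r] != qrgen[r])
            usgen[r] = -1;
}
```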
-
-Thus, for someone to accidentally be led to a Forbidden Room, they
-must establish an account on the system, then not call until some room
-has been recycled 127 to 128 times, which room must be
-reincarnated as a Forbidden Room, which someone must now call back
-(having not scrolled off the userlog in the meantime) and read new
-messages. The last clause is about the only probable one in the sequence.
-The danger of this is much less than the danger that someone will
-simply guess the name of the room outright (if it's a guess-name room)
-or some other human loophole.
-
- FORGOTTEN ROOMS
-
- This is exactly the opposite of private rooms. When a user chooses to
-forget a room, we put the room's generation number in usersupp.forget for
-that room. When doing a <K>nown rooms list or a <G>oto, any matchups cause
-the room to be skipped. Very simple.
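-
- A minimal sketch of the forget check, with plain arrays standing in for
-usersupp.forget and each room's QRgen; the function names are
-hypothetical. Entries have to start out as -1 (never a valid generation),
-otherwise every generation-0 room would look forgotten.
-
```c
#include <stdbool.h>

/* Forgetting a room records its current generation. */
void forget_room(signed char *forget, const signed char *qrgen, int r)
{
    forget[r] = qrgen[r];
}

/* <K>nown rooms and <G>oto skip the room while this is true.  In this
 * sketch a recycled room gets a new generation and so reappears. */
bool room_is_forgotten(const signed char *forget, const signed char *qrgen,
                       int r)
{
    return forget[r] == qrgen[r];
}
```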
-
- SUPPORTING PRIVATE MAIL
-
- Can one have an elegant kludge? This must come pretty close.
-
- Private mail is sent and received in the Mail> room, which otherwise
-behaves pretty much as any other room. To make this work, we have a
-separate Mail> room for each user behind the scenes. The actual room name
-in the database looks like "0000001234.Mail" (where '1234' is the user
-number) and it's flagged with the QR_MAILBOX flag. The user number is
-stripped off by the server before the name is presented to the client.
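-
- The name mangling can be sketched as below. The 10-digit, zero-padded
-width is inferred from the "0000001234.Mail" example, and both function
-names are hypothetical:
-
```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Build the database name of a user's private room, e.g. user 1234's
 * Mail> becomes "0000001234.Mail". */
void mailbox_name(char *out, size_t cap, long usernum, const char *room)
{
    snprintf(out, cap, "%010ld.%s", usernum, room);
}

/* Strip a leading "NNNNNNNNNN." before the name goes to the client;
 * names without that prefix are returned unchanged. */
const char *client_room_name(const char *dbname)
{
    for (int i = 0; i < 10; i++)
        if (!isdigit((unsigned char)dbname[i]))  /* also stops at '\0' */
            return dbname;
    return dbname[10] == '.' ? dbname + 11 : dbname;
}
```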
-
- This requires a little fiddling to get things just right. For example,
-make_message() has to be kludged to ask for the name of the recipient
-of the message whenever a message is entered in Mail>. But basically
-it works pretty well, keeping the code and user interface simple and
-regular.
-
-
- PASSWORDS AND NAME VALIDATION
-
- This has changed a couple of times over the course of Citadel's history. At
-this point it's very simple, again due to the fact that record managers are
-used for everything. The user file (usersupp) is indexed using the user's
-name, converted to all lower-case. Searching for a user, then, is easy. We
-just lowercase the name we're looking for and query the database. If no
-match is found, it is assumed that the user does not exist.
-
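- Because lookups ignore case (and the name clean-up noted below also removes
-stray blanks), nobody can register a second account whose name differs from
-an existing user's only in capitalization or spacing. This makes it difficult
-to forge messages from an existing user. (Fine point: nonprinting characters
-are converted to printing characters, and leading, trailing, and double
-blanks are deleted.)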
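-
- A minimal sketch of building the lookup key. The order of operations and
-the choice of '_' as the replacement for nonprinting characters are
-assumptions (the text only says they become printing characters), and
-user_key() is a hypothetical name; the output buffer must hold at least
-strlen(in) + 1 bytes.
-
```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical sketch: clean up a user name and lower-case it to form
 * the usersupp key.  Nonprinting characters become '_' (an assumption);
 * leading, trailing, and doubled blanks are dropped. */
void user_key(const char *in, char *out)
{
    size_t n = 0;

    for (; *in; in++) {
        unsigned char c = (unsigned char)*in;

        if (c == ' ') {
            if (n == 0 || out[n - 1] == ' ')  /* leading or doubled blank */
                continue;
        } else if (!isprint(c)) {
            c = '_';
        }
        out[n++] = (char)tolower(c);
    }
    if (n > 0 && out[n - 1] == ' ')           /* at most one trailing blank */
        n--;
    out[n] = '\0';
}
```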