From 0b5f1ab8ec8a4ccc55c328b87fe06ea5c9fbb81b Mon Sep 17 00:00:00 2001 From: Art Cancro Date: Sat, 20 Jul 2002 04:05:12 +0000 Subject: [PATCH] * Updated hack.txt with some fresh new information --- citadel/ChangeLog | 4 + citadel/techdoc/hack.txt | 510 +++++++++++++++++---------------------- 2 files changed, 223 insertions(+), 291 deletions(-) diff --git a/citadel/ChangeLog b/citadel/ChangeLog index 58acdcbd4..225987e93 100644 --- a/citadel/ChangeLog +++ b/citadel/ChangeLog @@ -1,4 +1,7 @@ $Log$ + Revision 591.69 2002/07/20 04:05:10 ajc + * Updated hack.txt with some fresh new information + Revision 591.68 2002/07/19 02:28:11 ajc * citadel_ipc.c: changed various buffer lengths from 256 to SIZ in order to accomodate long lines which often spew out (usually from spam unfortunately) @@ -3818,3 +3821,4 @@ Sat Jul 11 00:20:48 EDT 1998 Nathan Bryant Fri Jul 10 1998 Art Cancro * Initial CVS import + diff --git a/citadel/techdoc/hack.txt b/citadel/techdoc/hack.txt index af5ce2cd1..5cd4d57ea 100644 --- a/citadel/techdoc/hack.txt +++ b/citadel/techdoc/hack.txt @@ -1,57 +1,222 @@ - hack.txt for Citadel/UX - (some of this stuff is *very* out of date.) - - Much of this document is borrowed from the original hack.doc from -Citadel-CP/M and Citadel-86, because many of the concepts are the same. Hats -off to whoever wrote the original, for a fine document that inspired the -implementation of Citadel for Unix. + ------------------------------------------------------ + The totally incomplete guide to Citadel/UX internals + ------------------------------------------------------ + + Citadel has evolved quite a bit since its early days, and the data structures +have evolved with it. This document provides a rough overview of how the +system works internally. For details you're going to have to dig through the +code, but this'll get you started. + + + Database tables - Note that this document is really out of date. 
It doesn't cover anything -about the threaded server architecture or any of the network stuff. What is -covered here is the basic architecture of the databases. + + As you probably already know by now, Citadel uses a group of tables stored +with a record manager (usually Berkeley DB). Since we're using a record +manager rather than a relational database, all record structures are managed +by Citadel. Here are some of the tables we keep on disk: + + + USERSUPP + -------- - But enough of the preamble. Here's how Citadel/UX works :) - - Here are the major databases to be discussed: + This table contains all user records. It's called 'usersupp' because it was +once a supplementary file (at one point in ancient history, we created a user +record on the underlying operating system for each user). It's indexed by +user name (translated to lower case for indexing purposes). The records in +this file look something like this: + +struct usersupp { /* User record */ + int version; /* Cit vers. which created this rec */ + uid_t uid; /* Associate with a unix account? */ + char password[32]; /* password (for BBS-only users) */ + unsigned flags; /* See US_ flags below */ + long timescalled; /* Total number of logins */ + long posted; /* Number of messages posted (ever) */ + CIT_UBYTE axlevel; /* Access level */ + long usernum; /* User number (never recycled) */ + time_t lastcall; /* Last time the user called */ + int USuserpurge; /* Purge time (in days) for user */ + char fullname[64]; /* Name for Citadel messages & mail */ + CIT_UBYTE USscreenwidth; /* Screen width (for textmode users)*/ + CIT_UBYTE USscreenheight; /* Screen height(for textmode users)*/ +}; + + Most fields here should be fairly self-explanatory. The ones that might +deserve some attention are: + + uid -- if uid is not the same as the uid Citadel is running as, then the +account is assumed to belong to the user on the underlying Unix system with +that uid. 
This allows us to require the user's OS password instead of having +a separate Citadel password. + + usernum -- these are assigned sequentially, and NEVER REUSED. This is +important because it allows us to use this number in other data structures +without having to worry about users being added/removed later on, as you'll +see later in this document. + + The screenwidth and screenheight fields are almost never used anymore. Back +when people were calling into dialup systems we had no way of knowing the +user's screen dimensions, but modern networks almost always transmit this +information so we set it up dynamically. + + + QUICKROOM + --------- + + These are room records. One per room. It's called 'quickroom' because at +one time it was a quick index hash type of thing (there was a pair called +quickroom and fullroom). There is a quickroom record for every room on the +system, public or private or mailbox. It's indexed by room name (also in +lower case for easy indexing) and it contains records which look like this: + +struct quickroom { + char QRname[ROOMNAMELEN]; /* Name of room */ + char QRpasswd[10]; /* Only valid if it's a private rm */ + long QRroomaide; /* User number of room aide */ + long QRhighest; /* Highest message NUMBER in room */ + time_t QRgen; /* Generation number of room */ + unsigned QRflags; /* See flag values below */ + char QRdirname[15]; /* Directory name, if applicable */ + long QRinfo; /* Info file update relative to msgs*/ + char QRfloor; /* Which floor this room is on */ + time_t QRmtime; /* Date/time of last post */ + struct ExpirePolicy QRep; /* Message expiration policy */ + long QRnumber; /* Globally unique room number */ + char QRorder; /* Sort key for room listing order */ + unsigned QRflags2; /* Additional flags */ + int QRdefaultview; /* How to display the contents */ +}; + + Again, mostly self-explanatory. 
Here are the interesting ones: + + QRnumber is a globally unique room ID, while QRgen is the "generation number" +of the room (it's actually a timestamp). The two combined produce a unique +value which identifies the room. The reason for two separate fields will be +explained below when we discuss the visit table. For now just remember that +QRnumber remains the same for the duration of the room's existence, and QRgen +is timestamped once during room creation but may be restamped later on, for +example when all users are made to forget the room (see the visit table below). + - msgmain The big circular file that contains message text - quickroom Contains room info such as room names, stats, etc. - fullroom One fullrm file per room: message numbers and pointers. - usersupp Contains info for each user on the system. - - The fundamental structure of the system differs greatly from the way -Citadels used to work. Citadel now depends on a record manager or database -manager of some sort. Thanks to the API which is in place for connecting to -a data store, any record manager may be used as long as it supports the -storage and retrieval of large binary objects (blobs) indexed by unique keys. -Please see database.c for more information on data store primitives. - - The message base (MSGMAIN) is a big file of messages indexed by the message -number. Messages are numbered consecutively and start with an FF (hex) -byte. Except for this FF start-of-message byte, all bytes in the message -file have the high bit set to 0. This means that in principle it is -trivial to scan through the message file and locate message N if it -exists, or return error. (Complexities, as usual, crop up when we -try for efficiency...) - - Each room is basically just a list of message numbers. Each time + + FLOORTAB + -------- + + Floors. This is so simplistic it's not worth going into detail about, except +to note that we keep a reference count of the number of rooms on each floor.
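The per-floor reference count can be pictured with the minimal C sketch below. It is not Citadel's actual floor record: `struct floor_sketch`, `create_floor()`, `floor_adjust()`, `move_room()`, `can_delete_floor()` and the `MAXFLOORS` value of 16 are all hypothetical, and the rule that only an empty floor may be deleted is an assumption about how such a count would typically be used.

```c
#include <assert.h>
#include <stdbool.h>

#define MAXFLOORS 16   /* assumed limit, for illustration only */

/* Hypothetical, trimmed-down floor record: only the fields needed here. */
struct floor_sketch {
    bool in_use;
    int ref_count;     /* number of rooms currently on this floor */
};

static struct floor_sketch floortab[MAXFLOORS];

/* Mark a floor as in use with no rooms; fails if the slot is taken. */
static bool create_floor(int floornum)
{
    assert(floornum >= 0 && floornum < MAXFLOORS);
    if (floortab[floornum].in_use)
        return false;
    floortab[floornum].in_use = true;
    floortab[floornum].ref_count = 0;
    return true;
}

/* Called with +1 when a room is created on a floor and -1 when a room
 * on it is deleted. Returns the new count. */
static int floor_adjust(int floornum, int delta)
{
    assert(floornum >= 0 && floornum < MAXFLOORS);
    assert(floortab[floornum].in_use);
    floortab[floornum].ref_count += delta;
    assert(floortab[floornum].ref_count >= 0);
    return floortab[floornum].ref_count;
}

/* Moving a room between floors touches both counts. */
static void move_room(int from, int to)
{
    if (from == to)
        return;
    floor_adjust(to, +1);
    floor_adjust(from, -1);
}

/* Assumed policy: only an empty floor may be deleted. */
static bool can_delete_floor(int floornum)
{
    assert(floornum >= 0 && floornum < MAXFLOORS);
    return floortab[floornum].in_use && floortab[floornum].ref_count == 0;
}
```

Keeping the count on the floor record presumably lets a question like "is this floor empty?" be answered without scanning every room record.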
+ + + + MSGLISTS + -------- + Each record in this table consists of a bunch of message numbers +which represent the contents of a room. A message can exist in more than one +room (for example, a mail message with multiple recipients -- 'single instance +store'). This table is never, ever traversed in its entirety. When you do +any type of read operation, it fetches the msglist for the room you're in +(using the room's ID as the index key) and then you can go ahead and read +those messages one by one. + + Each room is basically just a list of message numbers. Each time we enter a new message in a room, its message number is appended to the end of the list. If an old message is to be expired, we must delete it from the message base. Reading a room is just a matter of looking up the messages one by one and sending them to the client for display, printing, or whatever. - Implementing the "new message" function is also trivial in principle: -we just keep track, for each caller in the userlog, of the highest-numbered -message which existed on the *last* call. (Remember, message numbers are -simply assigned sequentially each time a message is created. This -sequence is global to the entire system, not local within a room.) If -we ignore all message-numbers in the room less than this, only new messages -will be printed. Voila! + + VISIT + ----- + + This is the tough one. Put on your thinking cap and grab a fresh cup of +coffee before attempting to grok the visit table. + + This table contains records which establish the relationship between users +and rooms. Its index is a hash of the user and room combination in question. +When looking for such a relationship, the record in this table can tell the +server things like "this user has zapped this room," "this user has access to +this private room," etc. 
It's also where we keep track of which messages +the user has marked as "old" and which are "new" (which are not necessarily +contiguous; contrast with older Citadel implementations which simply kept a +"last read" pointer). + + Here's what the records look like: + +struct visit { + long v_roomnum; + long v_roomgen; + long v_usernum; + long v_lastseen; + unsigned int v_flags; + char v_seen[SIZ]; + int v_view; +}; + +#define V_FORGET 1 /* User has zapped this room */ +#define V_LOCKOUT 2 /* User is locked out of this room */ +#define V_ACCESS 4 /* Access is granted to this room */ + + This table is indexed by a concatenation of the first three fields. Whenever +we want to learn the relationship between a user and a room, we feed that +data to a function which looks up the corresponding record. The record is +designed in such a way that an "all zeroes" record (which is what you get if +the record isn't found) represents the default relationship. + + With this data, we now know which private rooms we're allowed to visit: if +the V_ACCESS bit is set, the room is one which the user knows, and it may +appear in his/her known rooms list. Conversely, we also know which rooms the +user has zapped: if the V_FORGET flag is set, we relegate the room to the +zapped list and don't bring it up during new message searches. It's also +worth noting that the V_LOCKOUT flag works in a similar way to administratively +lock users out of rooms. + + Implementing the "cause all users to forget room" command, then, becomes very +simple: we simply change the generation number of the room by putting a new +timestamp in the QRgen field. This causes all relevant visit records to +become irrelevant, because they appear to point to a different room. At the +same time, we don't lose the messages in the room, because the msglists table +is indexed by the room number (QRnumber), which never changes. 
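The lookup rules above can be modeled in a few lines of C. This is a minimal sketch, not Citadel's code: an in-memory array stands in for the on-disk visit table, `SIZ` is given an assumed value of 4096, and the helpers `find_visit()`, `get_visit()`, `grant_visit_flags()` and `classify_room()` are hypothetical. Because the key includes the room generation, a record that isn't found reads as all zeroes (the default relationship), and restamping QRgen makes every old record unreachable.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define SIZ 4096          /* assumed buffer size */
#define MAX_VISITS 32     /* capacity of this in-memory stand-in */

#define V_FORGET   1      /* User has zapped this room */
#define V_LOCKOUT  2      /* User is locked out of this room */
#define V_ACCESS   4      /* Access is granted to this room */

struct visit {
    long v_roomnum;
    long v_roomgen;
    long v_usernum;
    long v_lastseen;
    unsigned int v_flags;
    char v_seen[SIZ];
    int v_view;
};

static struct visit visit_table[MAX_VISITS];
static int num_visits = 0;

/* Find the stored record for this (room, generation, user) key, or NULL. */
static struct visit *find_visit(long roomnum, long roomgen, long usernum)
{
    for (int i = 0; i < num_visits; i++) {
        struct visit *v = &visit_table[i];
        if (v->v_roomnum == roomnum && v->v_roomgen == roomgen
            && v->v_usernum == usernum)
            return v;
    }
    return NULL;
}

/* Copy the relationship into *out; a missing record reads as all zeroes. */
static void get_visit(long roomnum, long roomgen, long usernum,
                      struct visit *out)
{
    const struct visit *v = find_visit(roomnum, roomgen, usernum);
    if (v != NULL)
        *out = *v;
    else
        memset(out, 0, sizeof *out);
}

/* OR flags into the record, creating it if needed; returns the new flags. */
static unsigned int grant_visit_flags(long roomnum, long roomgen,
                                      long usernum, unsigned int flags)
{
    struct visit *v = find_visit(roomnum, roomgen, usernum);
    if (v == NULL) {
        assert(num_visits < MAX_VISITS);
        v = &visit_table[num_visits++];
        memset(v, 0, sizeof *v);
        v->v_roomnum = roomnum;
        v->v_roomgen = roomgen;
        v->v_usernum = usernum;
    }
    v->v_flags |= flags;
    return v->v_flags;
}

enum room_status { ROOM_HIDDEN, ROOM_KNOWN, ROOM_ZAPPED };

/* Decide how a room appears to a user, given the room's current key. */
static enum room_status classify_room(long roomnum, long roomgen,
                                      bool is_private, long usernum)
{
    struct visit v;
    get_visit(roomnum, roomgen, usernum, &v);
    if (v.v_flags & V_LOCKOUT)
        return ROOM_HIDDEN;
    if (is_private && !(v.v_flags & V_ACCESS))
        return ROOM_HIDDEN;
    if (v.v_flags & V_FORGET)
        return ROOM_ZAPPED;
    return ROOM_KNOWN;
}
```

Restamping a private room's generation from 1000 to 1001, for instance, revokes every user's access without touching a single visit record: the old records are still stored but are never found again.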
+ + v_seen contains a string which represents the set of messages in this room +which the user has read (marked as 'seen' or 'old'). It follows the same +syntax used by IMAP and NNTP. When we search for new messages, we simply +return any messages that are in the room that are *not* represented by this +set. Naturally, when we do want to mark more messages as seen (or unmark +them), we change this string. Citadel BBS client implementations are naive +and think linearly in terms of "everything is old up to this point," but IMAP +clients want to have more granularity. + + + DIRECTORY + --------- + + This table simply maps Internet e-mail addresses to Citadel network addresses +for quick lookup. It is generated from data in the Global Address Book room. + + + USETABLE + -------- + This table keeps track of message IDs of messages arriving over a network, +to prevent duplicates from being posted if someone misconfigures the network +and a loop is created. This table goes unused on a non-networked Citadel. + + MSGMAIN + ------- + + This is where all message text is stored. It's indexed by message number: +give it a number, get back a message. Messages are numbered sequentially, and +the message numbers are never reused. + We also keep a "metadata" record for each message. This record is also stored +in the msgmain table, using the index (0 - msgnum). We keep in the metadata +record, among other things, a reference count for each message. Since a +message may exist in more than one room, it's important to keep this reference +count up to date, and to delete the message from disk when the reference count +reaches zero. - Message format on disk (MSGMAIN) + Here's the format for the message itself: - As discussed above, each message begins with an FF byte. + Each message begins with an 0xFF 'start of message' byte. The next byte denotes whether this is an anonymous message. The codes available are MES_NORMAL, MES_ANON, or MES_AN2 (defined in citadel.h).
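The metadata scheme for MSGMAIN can be sketched in C as follows, assuming a simple in-memory array in place of the real record manager. Message text is stored under key `msgnum` and its metadata under key `0 - msgnum`, as described above; `struct msg_metadata`, `save_message()`, `adjust_refcount()` and the other helpers are hypothetical, and the assumption that a freshly saved message starts with a count of zero (each room that links it adds one) belongs to this sketch only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#define MAX_SLOTS 64   /* capacity of this in-memory stand-in for msgmain */

/* Hypothetical metadata record; the real one holds more than this. */
struct msg_metadata {
    int refcount;      /* number of rooms whose msglists include the message */
};

struct slot {
    bool used;
    long key;          /* msgnum for text, (0 - msgnum) for metadata */
    void *data;
};

static struct slot msgmain[MAX_SLOTS];

static struct slot *find_slot(long key)
{
    for (int i = 0; i < MAX_SLOTS; i++)
        if (msgmain[i].used && msgmain[i].key == key)
            return &msgmain[i];
    return NULL;
}

static void store(long key, void *data)
{
    struct slot *s = find_slot(key);
    if (s != NULL) {
        free(s->data);             /* overwrite an existing record */
    } else {
        for (int i = 0; i < MAX_SLOTS && s == NULL; i++)
            if (!msgmain[i].used)
                s = &msgmain[i];
        assert(s != NULL);         /* sketch: fixed capacity */
        s->used = true;
        s->key = key;
    }
    s->data = data;
}

static void delete_key(long key)
{
    struct slot *s = find_slot(key);
    if (s != NULL) {
        free(s->data);
        s->data = NULL;
        s->used = false;
    }
}

/* Save message text under msgnum and a zeroed metadata record under
 * (0 - msgnum). Message numbers are assumed to be positive. */
static void save_message(long msgnum, const char *text)
{
    assert(msgnum > 0);
    size_t len = strlen(text) + 1;
    char *copy = malloc(len);
    struct msg_metadata *meta = calloc(1, sizeof *meta);
    assert(copy != NULL && meta != NULL);
    memcpy(copy, text, len);
    store(msgnum, copy);
    store(0 - msgnum, meta);
}

/* Add delta to the reference count; when it drops to zero, delete both the
 * message and its metadata. Returns the new count (0 once deleted). */
static int adjust_refcount(long msgnum, int delta)
{
    struct slot *s = find_slot(0 - msgnum);
    if (s == NULL)
        return 0;
    struct msg_metadata *meta = s->data;
    meta->refcount += delta;
    if (meta->refcount > 0)
        return meta->refcount;
    delete_key(msgnum);
    delete_key(0 - msgnum);
    return 0;
}

static bool message_exists(long msgnum)
{
    return find_slot(msgnum) != NULL;
}
```

Because one message may sit in several rooms' msglists, removing it from a single room only calls `adjust_refcount(msgnum, -1)`; the text is freed only when the last room lets go of it.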
@@ -144,6 +309,10 @@ order. But the leading fields and the message text must remain in the same place. The H field looks better when it is placed immediately after the N field. + + + + Networking Citadel nodes network by sharing one or more rooms. Any Citadel node @@ -179,6 +348,8 @@ purposes is precisely that used on disk, serialized. The current distribution includes serv_network.c, which is basically a database replicator; please see network.txt on its operation and functionality (if any). + + Portability issues Citadel/UX is 64-bit clean, architecture-independent, and Year 2000 @@ -190,254 +361,8 @@ try to port it to non-POSIX systems as well. build ok on non-POSIX systems with porting libraries (such as Cygwin). - "Room" records (quickroom) - -The rooms are basically indices into msgmain, the message database. -As noted in the overview, each is essentially an array of pointers into -the message file. The pointers consist of a 32-bit message ID number -(we will wrap around at 32 bits for these purposes). - -Since messages are numbered sequentially, the -set of messages existing in msgmain will always form a continuous -sequence at any given time. - -That should be enough background to tackle a full-scale room. From citadel.h: -struct quickroom { - char QRname[20]; /* Max. 
len is 19, plus null term */ - char QRpasswd[10]; /* Only valid if it's a private rm */ - long QRroomaide; /* User number of room aide */ - long QRhighest; /* Highest message NUMBER in room */ - long QRgen; /* Generation number of room */ - unsigned QRflags; /* See flag values below */ - char QRdirname[15]; /* Directory name, if applicable */ - char QRfloor; /* (not yet implemented) */ - }; - -#define QR_BUSY 1 /* Room is being updated, WAIT */ -#define QR_INUSE 2 /* Set if in use, clear if avail */ -#define QR_PRIVATE 4 /* Set for any type of private room */ -#define QR_PASSWORDED 8 /* Set if there's a password too */ -#define QR_GUESSNAME 16 /* Set if it's a guessname room */ -#define QR_DIRECTORY 32 /* Directory room */ -#define QR_UPLOAD 64 /* Allowed to upload */ -#define QR_DOWNLOAD 128 /* Allowed to download */ -#define QR_VISDIR 256 /* Visible directory */ -#define QR_ANONONLY 512 /* Anonymous-Only room */ -#define QR_ANON2 1024 /* Anonymous-Option room */ -#define QR_NETWORK 2048 /* Shared network room */ -#define QR_PREFONLY 4096 /* Preferred users only */ - -[Note that all components start with "QR" for quickroom, to make sure we - don't accidentally use an offset in the wrong structure. Be very careful - also to get a meaningful sequence of components -- - some C compilers don't check this sort of stuff either.] - -QRgen handles the problem of rooms which have died and been reborn -under another name. This will be clearer when we get to the userlog. -For now, just note that each room has a generation number which is -bumped by one each time it is recycled. - -QRflags is just a bag of bits recording the status of the room. The -defined bits are: - -QR_BUSY This is to insure that two processes don't update the same - record at the same time, even though this hasn't been - implemented yet. -QR_INUSE 1 if the room is valid, 0 if it is free for re-assignment. -QR_PRIVATE 1 if the room is not visible by default, 0 for public. 
-QR_PASSWORDED 1 if entry to the room requires a password. -QR_GUESSNAME 1 if the room can be reached by guessing the name. -QR_DIRECTORY 1 if the room is a window onto some disk/userspace, else 0. -QR_UPLOAD 1 if users can upload into this room, else 0. -QR_DOWNLOAD 1 if users can download from this room, else 0. -QR_VISDIR 1 if users are allowed to read the directory, else 0. -QR_ANONONLY 1 if all messages are to recieve the "****" anon header. -QR_ANON2 1 if the user will be asked if he/she wants an anon message. -QR_NETWORK 1 if this room is shared on a network, else 0. -QR_PREFONLY 1 if the room is only accessible to preferred users, else 0. - -QRname is just an ASCII string (null-terminated, like all strings) -giving the name of the room. - -QRdirname is meaningful only in QR_DIRECTORY rooms, in which case -it gives the directory name to window. - -QRpasswd is the room's password, if it's a QR_PASSWORDED room. Note that -if QR_PASSWORDED or QR_GUESSNAME are set, you MUST also set QR_PRIVATE. -QR_PRIVATE by itself designates invitation-only. Do not EVER set all three -flags at the same time. - -QRroomaide is the user number of the room's room-aide (or zero if the room -doesn't have a room aide). Note that if a user is deleted, his/her user number -is never used again, so you don't have to worry about a new user getting the -same user number and accidentally becoming a room-aide of one or more rooms. - -The only field new to us in quickroom is QRhighest, recording the -most recent message in the room. When we are searching for rooms with -messages a given caller hasn't seen, we can check this number -and avoid a whole lot of extra disk accesses. - - There used to also be a structure called "fullroom" which resided in one -file for each room on the system. This has been abandoned in favour of -"message lists" which are variable sized and simply contain zero or more -message numbers. The message numbers, in turn, point to messages on disk. 
- - User records (usersupp) - -This is the fun one. Get some fresh air and plug in your thinking cap -first. (Time, space and complexity are the eternal software rivals. -We've got lots of log entries times lots of messages spread over up to nnn -rooms to worry about, and with multitasking, disk access time is important... -so perforce, we opt for complexity to keep time and space in bounds.) - -To understand what is happening in the log code takes a little persistence. -You also have to disentangle the different activities going on and -tackle them one by one. - - o We want to remember some random things such as terminal screen - size, and automatically set them up for each caller at login. - - o We want to be able to locate all new messages, and only new - messages, efficiently. Messages should stay new even if it - takes a caller a couple of calls to get around to them. - - o We want to remember which private rooms a given caller knows - about, and treat them as normal rooms. This means mostly - automatically seeking out those with new messages. (Obviously, - we >don't< want to do this for unknown private rooms!) This - has to be secure against the periodic recycling of rooms - between calls. - - o We want to support private mail to a caller. - - o We want to provide some protection of this information (via - passwords at login) and some assurance that messages are from - who they purport to be from (within the system -- one shouldn't - be able to forge messages from established users). 
- -Lifting another page from citadel.h gives us: - -struct usersupp { /* User record */ - int USuid; /* uid account is logged in under */ - char password[20]; /* password */ - long lastseen[MAXROOMS]; /* Last message seen in each room */ - char generation[MAXROOMS]; /* Generation # (for private rooms) */ - char forget[MAXROOMS]; /* Forgotten generation number */ - unsigned flags; /* See US_ flags below */ - int screenwidth; /* For formatting messages */ - int timescalled; /* Total number of logins */ - int posted; /* Number of messages posted (ever) */ - char fullname[26]; /* Bulletin Board name for messages */ - char axlevel; /* Access level */ - long usernum; /* Eternal user number */ - long lastcall; /* Last time the user called */ - }; - -#define US_PERM 1 /* Permanent user; don't scroll off */ -#define US_LASTOLD 16 /* Print last old message with new */ -#define US_EXPERT 32 /* Experienced user */ -#define US_UNLISTED 64 /* Unlisted userlog entry */ -#define US_NOPROMPT 128 /* Don't prompt after each message */ -#define US_PREF 1024 /* Preferred user */ - -Looks simple enough, doesn't it? One topic at a time: - - Random configuration parameters: --screenwidth is the caller's screen width. We format all messages to this -width, as best we can. flags is another bit-bag, recording whether we want -prompts, people who want to suppress the little automatic hints all through -the system, etc. - - Attachments, names & numbers: --USuid is the uid the account was established under. For most users it will -be the same as BBSUID, but it won't be for users that logged in from the shell. --fullname is the user's full login name. --usernum is the user's ID number. It is unique to the entire system: -once someone has a user number, it is never used again after the user is -deleted. This allows an easy way to numerically represent people. --password is the user's password. --axlevel is the user's access level, so we know who's an Aide, who's a problem -user, etc. 
These are defined and listed in the system. - - Feeping Creatures: --timescalled is the number of times the user has called. --posted is the number of messages the user has posted, public or private. - - Misc stuff: --lastcall holds the date and time (standard Unix format) the user called, so -we can purge people who haven't called in a given amount of time. - - Finding new messages: -This is the most important. Thus, it winds up being the most -elaborate. Conceptually, what we would like to do is mark each -message with a bit after our caller has read it, so we can avoid -printing it out again next call. Unfortunately, with lots of user -entries this would require adding lots of bits to each message... and -we'd wind up reading off disk lots of messages which would never -get printed. So we resort to approximation and a small table. - -The approximation comes in doing things at the granularity of -rooms rather than messages. Messages in a given room are "new" -until we visit it, and "old" after we leave the room... whether -we read any of them or not. This can actually be defended: anyone -who passes through a room without reading the contents probably just -isn't interested in the topic, and would just as soon not be dragged -back every visit and forced to read them. Given that messages are -numbered sequentially, we can simply record the most recent message ID# -of each room as of the last time we visited it. Very simple. - -Putting it all together, we can now compute whether a given room -has new messages for our current caller without going to the message base -index (fullroom) at all: - - > We get the usersupp.lastseen[] for the room in question - > We compare this with the room's quickroom.QRhighest, which tells us - what the most recent message in the room is currently. - - - REMEMBERING WHICH PRIVATE ROOMS TO VISIT - -This looks trivial at first glance -- just record one bit per room per -caller in the log records. 
The problem is that rooms get recycled -periodically, and we'd rather not run through all the log entries each -time we do it. So we adopt a kludge which should work 99% of the time. - -As previously noted, each room has a generation number, which is bumped -by one each time it is recycled. As not noted, this generation number -runs from 0 -> 127 (and then wraps around and starts over). - When someone visits a room, we set usersupp.generation for the room -equal to that of the room. This flags the room as being available. -If the room gets recycled, on our next visit the two generation numbers -will no longer match, and the room will no longer be available -- just -the result we're looking for. (Naturally, if a room is public, -all this stuff is irrelevant.) - -This leaves only the problem of an accidental matchup between the two -numbers giving someone access to a Forbidden Room. We can't eliminate -this danger completely, but it can be reduced to insignificance for -most purposes. (Just don't bet megabucks on the security of this system!) -Each time someone logs in, we set all "wrong" generation numbers to -1. -So the room must be recycled 127 times before an accidental matchup -can be achieved. (We do this for all rooms, INUSE or dead, public -or private, since any of them may be reincarnated as a Forbidden Room.) - -Thus, for someone to accidentally be led to a Forbidden Room, they -must establish an account on the system, then not call until some room -has been recycled 127 to 128 times, which room must be -reincarnated as a Forbidden Room, which someone must now call back -(having not scrolled off the userlog in the mean time) and read new -messages. The last clause is about the only probable one in the sequence. -The danger of this is much less than the danger that someone will -simply guess the name of the room outright (if it's a guess-name room) -or some other human loophole. - - FORGOTTEN ROOMS - - This is exactly the opposite of private rooms. 
When a user chooses to -forget a room, we put the room's generation number in usersupp.forget for -that room. When doing a <K>nown rooms list or a <G>oto, any matchups cause -the room to be skipped. Very simple. + SUPPORTING PRIVATE MAIL @@ -448,7 +373,9 @@ behaves pretty much as any other room. To make this work, we have a separate Mail> room for each user behind the scenes. The actual room name in the database looks like "0000001234.Mail" (where '1234' is the user number) and it's flagged with the QR_MAILBOX flag. The user number is -stripped off by the server before the name is presented to the client. +stripped off by the server before the name is presented to the client. This +provides the ability to give each user a separate namespace for mailboxes +and personal rooms. This requires a little fiddling to get things just right. For example, make_message() has to be kludged to ask for the name of the recipient @@ -457,6 +384,7 @@ it works pretty well, keeping the code and user interface simple and regular. + PASSWORDS AND NAME VALIDATION This has changed a couple of times over the course of Citadel's history. At -- 2.30.2