   1  hack.doc for Citadel/UX
   2  written by Art Cancro (ajc@uncnsrd.mt-kisco.ny.us)
   3
   4    Much of this document is borrowed from the original hack.doc from
   5 Citadel-CP/M and Citadel-86, because many of the concepts are the same.  Hats
   6 off to whoever wrote the original, for a fine document that inspired the
   7 implementation of Citadel for Unix.
   8
   9    Note that this document is really out of date.  It doesn't cover anything
  10 about the threaded server architecture or any of the network stuff.  What is
  11 covered here is the basic architecture of the databases.
  12
  13    But enough of the preamble.  Here's how Citadel/UX works :)
  14
  15    Here are the major databases to be discussed:
  16
  17   msgmain         The big circular file that contains message text
  18   quickroom       Contains room info such as room names, stats, etc.
  19   fullroom        One fullrm file per room: message numbers and pointers.
  20   usersupp        Contains info for each user on the system.
  21
  22    The fundamental structure of the system differs greatly from the way
  23 Citadels used to work.  Citadel now depends on a record manager or database
  24 manager of some sort.  Thanks to the API which is in place for connecting to
  25 a data store, any record manager may be used as long as it supports the
  26 storage and retrieval of large binary objects (blobs) indexed by unique keys.
  27 Please see database.c for more information on data store primitives.
  28
  29    The message base (msgmain) is a big file of messages indexed by the message
  30 number.  Messages are numbered consecutively and start with an FF (hex)
  31 byte.  Except for this FF start-of-message byte, all bytes in the message
  32 file have the high bit set to 0.  This means that in principle it is
  33 trivial to scan through the message file and locate message N if it
  34 exists, or return error.  (Complexities, as usual, crop up when we
  35 try for efficiency...)
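
The "trivial in principle" scan can be sketched in a few lines of C.  This is
only an illustration: it assumes the message file has been read into memory
as a byte buffer, and the function names are made up for this example rather
than taken from the Citadel/UX source.

```c
#include <stddef.h>

/* Return the offset of the next 0xFF start-of-message byte at or after
 * `from`, or -1 if none is left.  `buf` and `len` describe an in-memory
 * copy of the message file (an assumption made for this sketch). */
long next_message_start(const unsigned char *buf, size_t len, size_t from)
{
    size_t i;
    for (i = from; i < len; i++) {
        if (buf[i] == 0xFF)
            return (long)i;
    }
    return -1;
}

/* Count the messages in the buffer by counting start bytes. */
int count_messages(const unsigned char *buf, size_t len)
{
    int n = 0;
    long pos = next_message_start(buf, len, 0);
    while (pos >= 0) {
        n++;
        pos = next_message_start(buf, len, (size_t)pos + 1);
    }
    return n;
}
```

Because no other byte has the high bit set, the scan needs no knowledge of
the fields inside a message in order to resynchronize.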
  36
  37     Each room is basically just a list of message numbers.  Each time
  38 we enter a new message in a room, we slide all the old message-numbers
  39 down a slot, and probably the oldest one falls off the bottom (in which case
  40 we must delete it from the message base).  Reading a room is just a matter
  41 of looking up the messages one by one and sending them to the client for
  42 display, printing, or whatever.
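
A minimal sketch of the sliding list, assuming slot 0 holds the oldest
message number, the last slot holds the newest, and 0 marks an empty slot.
SLOTS is a tiny stand-in for the real per-room limit, and room_add_message()
is a hypothetical name used only here.

```c
#include <string.h>

#define SLOTS 4   /* stand-in for MSGSPERRM, kept tiny for illustration */

/* Add `newnum` to a room.  Assumed layout for this sketch: slot 0 holds
 * the oldest message number, the last slot the newest, 0 means empty.
 * Returns the message number that fell off (0 if the slot was empty). */
long room_add_message(long slots[SLOTS], long newnum)
{
    long dropped = slots[0];
    memmove(&slots[0], &slots[1], (SLOTS - 1) * sizeof(long));
    slots[SLOTS - 1] = newnum;
    return dropped;
}
```

A nonzero return value is the message number that fell off the bottom, which
the caller would then delete from the message base.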
  43
  44     Implementing the "new message" function is also trivial in principle:
  45 we just keep track, for each caller in the userlog, of the highest-numbered
  46 message which existed on the >last< call.  (Remember, message numbers are
  47 simply assigned sequentially each time a message is created.  This
  48 sequence is global to the entire system, not local within a room.)  If
  49 we ignore all message-numbers in the room less than this, only new messages
  50 will be printed.  Voila!
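
The filter described above might look like this.  It is a sketch under the
assumption that empty slots hold 0; count_new_messages() is a made-up name.

```c
/* Count the messages in a room that are newer than `last_seen`, the
 * highest message number that existed on the caller's previous call.
 * Empty slots are assumed to hold 0 (an assumption of this sketch). */
int count_new_messages(const long *msgnums, int n, long last_seen)
{
    int i, fresh = 0;
    for (i = 0; i < n; i++) {
        if (msgnums[i] != 0 && msgnums[i] > last_seen)
            fresh++;
    }
    return fresh;
}
```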
  51
  52                 message format on disk  (msgmain)
  53
  54   Each message begins with an FF byte. The next byte will then be MES_NORMAL,
  55 MES_ANON, or MES_ANON2, depending on whether the message is anonymous or not.
  56 The next byte is either a 0 or 1. If it is 0, the message will be printed
  57 with the Citadel formatter.  If it is a 1, the
  58 message is printed directly to the screen, as is.  External editors generate
  59 this type of message.  After these three opening bytes, the remainder of
  60 the message consists of a sequence of character strings.  Each string
  61 begins with a type byte indicating the meaning of the string and is
  62 ended with a null.  All strings are printable ASCII: in particular,
  63 all numbers are in ASCII rather than binary.  This is for simplicity,
  64 both in implementing the system and in implementing other code to
  65 work with the system.  For instance, a database driven off Citadel archives
  66 can do wildcard matching without worrying about unpacking binary data such
  67 as message IDs first.  To provide later downward compatibility,
  68 all software should be written to IGNORE fields not currently defined.
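
To make the layout concrete, here is one way a reader of this format could
walk the fields of a single message.  It is a sketch, not the code Citadel/UX
uses: find_field() is a hypothetical helper, and it assumes the message text
(type byte M) is the last field of a message.  Note that it skips field types
it does not know, as recommended above.

```c
#include <stddef.h>
#include <string.h>

/* Find the first field of type `type` in one on-disk message.
 * `msg` points at the 0xFF start byte and `len` is the number of bytes
 * available.  Returns a pointer to the field's null-terminated text
 * (just past the type byte), or NULL if the field is absent or the
 * message is malformed.  Unknown field types are skipped, not rejected. */
const char *find_field(const unsigned char *msg, size_t len, char type)
{
    size_t pos = 3;                      /* skip FF, anon byte, format byte */
    if (len < 3 || msg[0] != 0xFF)
        return NULL;
    while (pos < len && msg[pos] != 0xFF) {
        const unsigned char *text = msg + pos + 1;
        const unsigned char *end = memchr(text, 0, len - pos - 1);
        if (end == NULL)
            return NULL;                 /* unterminated string */
        if ((char)msg[pos] == type)
            return (const char *)text;
        if (msg[pos] == 'M')
            break;                       /* assumed: text is the last field */
        pos = (size_t)(end - msg) + 1;   /* move to the next type byte */
    }
    return NULL;
}
```

Since every value is printable ASCII, the returned pointer can be handed
straight to the usual string routines.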
  69
  70                   The type bytes currently defined are:
  71
  72 BYTE    Mnemonic        Comments
  73
  74 T       Date/Time       A 32-bit integer containing the date and time of
  75                         the message in standard UNIX format (the number
  76                         of seconds since January 1, 1970 GMT).
  77 P       Path            Complete path of message, as in the UseNet news
  78                         standard.  A user should be able to send UUCP mail to
  79                         this path. (Note that your system name will not be
  80                         tacked onto this until you're sending the message to
  81                         someone else)
  82 I       ID on orig      A 32-bit integer containing the message ID on the
  83                         system the message *originated* on.
  84 #       ID on local     A 32-bit integer containing the message ID on the
  85                         system the message is *currently* on (obviously this
  86                         is meaningless for a message being transmitted over
  87                         a network).
  88 A       Author          Name of originator of message.
  89 R       Recipient       Only present in Mail messages.
  90 O       Room            Room of origin.
  91 N       Nodename        Contains node name of system message originated on.
  92 H       HumanNodeName   Contains human name of system message originated on.
  93 D       Destination     Contains name of the system this message should
  94                         be sent to, for mail routing (private mail only).
  95 U       Subject         Optional.  Developers may choose whether they wish to
  96                         generate or display subject fields.  Citadel/UX does
  97                         not generate them, but it does print them when found.
  98 B       Phone number    The dialup number of the system this message
  99                         originated on.  This is optional, and is only
 100                         defined for helping implement C86Net gateways.
 101 G       Gateway domain  This field is provided solely for the implementation
 102                         of C86Net gateways, and holds the C86Net domain of
 103                         the system this message originated on.  Unless you're
 104                         implementing such a gateway, there's no need to even
 105                         bother with this field.
 106 S       Special field   Only meaningful for messages being spooled over a
 107                         network.  Usually means that the message isn't really
 108                         a message, but rather some other network function:
 109                         -> "S" followed by "FILE" (followed by a null, of
 110                         course) means that the message text is actually an
 111                         IGnet/Open file transfer.
 112 M       Message Text    Normal ASCII, lines separated by CRs or LFs,
 113                         null terminated as always.
 114
 115                         EXAMPLE
 116
 117 Let <FF> be a 0xFF byte, and <0> be a null (0x00) byte.  Then a message
 118 which prints as...
 119
 120 Apr 12, 1988 23:16 From The Test User In Network Test> @lifesys (Life BBS)
 121 Have a nice day!
 122
 123  might be stored as...
 124 <FF><40><0>I12345<0>Pneighbor!lifesys!test_user<0>T576918988<0>    (continued)
 125 -----------|Mesg ID#|--Message Path---------------|--Date------
 126
 127 AThe Test User<0>ONetwork Test<0>Nlifesys<0>HLife BBS<0>MHave a nice day!<0>
 128 |-----Author-----|-Room name-----|-nodename-|Human Name-|--Message text-----
 129
 130  Weird things can happen if fields are missing, especially if you use the
 131 networker.  The date, author, room, and nodename fields may appear in any
 132 order, but the leading fields and the message text must stay in the same
 133 place.  The H field looks better when it is placed immediately after the N
 134 field.
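
The stored form of the example can be produced with two small helpers.  This
is a sketch only: begin_message() and append_field() are hypothetical names,
and the second opening byte is passed in by the caller because the numeric
values of MES_NORMAL and friends are not given in this document.

```c
#include <stddef.h>
#include <string.h>

/* Write the three opening bytes.  Returns the next free offset (3),
 * or 0 if the buffer is too small. */
size_t begin_message(unsigned char *buf, size_t cap,
                     unsigned char anon_type, unsigned char format)
{
    if (cap < 3)
        return 0;
    buf[0] = 0xFF;
    buf[1] = anon_type;
    buf[2] = format;
    return 3;
}

/* Append one field (type byte, text, terminating null) at offset `pos`.
 * Returns the new offset, or 0 if the field does not fit or `pos` is 0
 * (so a failure from an earlier call propagates). */
size_t append_field(unsigned char *buf, size_t cap, size_t pos,
                    char type, const char *text)
{
    size_t n = strlen(text);
    if (pos == 0 || pos > cap || n + 2 > cap - pos)
        return 0;
    buf[pos] = (unsigned char)type;
    memcpy(buf + pos + 1, text, n + 1);  /* copies the null as well */
    return pos + n + 2;
}
```

Calling begin_message() and then append_field() for I, P, T, A, O, N, H and M
in that order, with the values from the example, yields the byte sequence
shown above.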
 135
 136                             Networking
 137
 138 Citadel nodes network by sharing one or more rooms. Any Citadel node
 139 can choose to share messages with any other Citadel node, through the sending
 140 of spool files.  The sending system takes all messages it hasn't sent yet, and
 141 spools them to the receiving system, which posts them in the rooms.
 142
 143 Complexities arise primarily from the possibility of densely connected
 144 networks: one does not wish to accumulate multiple copies of a given
 145 message, which can easily happen.  Nor does one want to see old messages
 146 percolating indefinitely through the system.
 147
 148 This problem is handled by keeping track of the path a message has taken over
 149 the network, like the UseNet news system does.  When a system sends out a
 150 message, it adds its own name to the bang-path in the <P> field of the
 151 message.  If no path field is present, it generates one.
 152
 153 With the path present, all the networker has to do to assure that it doesn't
 154 send another system a message it's already received is check the <P>ath field
 155 for that system's name somewhere in the bang path.  If it's present, the system
 156 has already seen the message, so we don't send it.  (Note that the current
 157 implementation does not allow for "loops" in the network -- if you build your
 158 net this way you will see lots of duplicate messages.)
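
The path check comes down to a component-wise search of the bang path.  The
sketch below is not lifted from netproc.c; the function names are invented
for illustration.  It assumes a sending system puts its name at the front of
the path (the UseNet convention, and consistent with the example path
"neighbor!lifesys!test_user"), and it compares every component, including the
trailing user name.

```c
#include <stdio.h>
#include <string.h>

/* Return 1 if `node` appears as a complete component of the bang path,
 * 0 otherwise.  A substring match ("life" in "lifesys") does not count. */
int path_has_node(const char *path, const char *node)
{
    size_t nlen = strlen(node);
    const char *p = path;
    while (*p != '\0') {
        size_t clen = strcspn(p, "!");   /* length of this component */
        if (clen == nlen && strncmp(p, node, nlen) == 0)
            return 1;
        p += clen;
        if (*p == '!')
            p++;
    }
    return 0;
}

/* Put our own node name at the front of the path before sending.
 * Assumption: new names go at the front, UseNet style.
 * Returns 0 on success, -1 if `out` is too small. */
int path_prepend(char *out, size_t cap, const char *self, const char *path)
{
    int n = (path[0] != '\0')
        ? snprintf(out, cap, "%s!%s", self, path)
        : snprintf(out, cap, "%s", self);
    return (n >= 0 && (size_t)n < cap) ? 0 : -1;
}
```

A networker would skip any message for which path_has_node() reports that
the receiving system's name is already in the path.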
 159
 160 The above discussion should make the function of the fields reasonably clear:
 161
 162  o  Travelling messages need to carry original message-id, system of origin,
 163     date of origin, author, and path with them, to keep reproduction and
 164     cycling under control.
 165
 166 (Uncoincidentally) the format used to transmit messages for networking
 167 purposes is precisely that used on disk, except that there may be any amount
 168 of garbage between the null ending a message and the <FF> starting the next
 169 one.  This allows greater compatibility if slight problems crop up. The current
 170 distribution includes netproc.c, which is basically a database replicator;
 171 please see network.txt on its operation and functionality (if any).
 172
 173                         portability problems
 174
 175  At this point, most hardware-dependent stuff has been removed from the
 176 system.  On the server side, most of the OS-dependent stuff has been isolated
 177 into the sysdep.c source module.  The server should compile on any POSIX
 178 compliant system with a full pthreads implementation and TCP/IP support.  In
 179 the future, we may try to port it to non-POSIX systems as well.
 180
 181  On the client side, it's also POSIX compliant.  The client even seems to
 182 build ok on non-POSIX systems with porting libraries (such as the Cygnus
 183 Win32 stuff).
 184
 185
 186                    "Room" records (quickroom/fullroom)
 187
 188 The rooms are basically indices into msgmain, the message database.
 189 As noted in the overview, each is essentially an array of pointers into
 190 the message file.  The pointers consist of a 32-bit message ID number
 191 (we will wrap around at 32 bits for these purposes).
 192
 193 Since messages are numbered sequentially, the
 194 set of messages existing in msgmain will always form a continuous
 195 sequence at any given time.
 196
 197 That should be enough background to tackle a full-scale room.  From citadel.h:
 198
 199 struct quickroom {
 200         char QRname[20];                /* Max. len is 19, plus null term   */
 201         char QRpasswd[10];              /* Only valid if it's a private rm  */
 202         long QRroomaide;                /* User number of room aide         */
 203         long QRhighest;                 /* Highest message NUMBER in room   */
 204         char QRgen;                     /* Generation number of room        */
 205         unsigned QRflags;               /* See flag values below            */
 206         char QRdirname[15];             /* Directory name, if applicable    */
 207         char QRfloor;                   /* (not yet implemented)            */
 208                 };
 209
 210 #define QR_BUSY         1               /* Room is being updated, WAIT      */
 211 #define QR_INUSE        2               /* Set if in use, clear if avail    */
 212 #define QR_PRIVATE      4               /* Set for any type of private room */
 213 #define QR_PASSWORDED   8               /* Set if there's a password too    */
 214 #define QR_GUESSNAME    16              /* Set if it's a guessname room     */
 215 #define QR_DIRECTORY    32              /* Directory room                   */
 216 #define QR_UPLOAD       64              /* Allowed to upload                */
 217 #define QR_DOWNLOAD     128             /* Allowed to download              */
 218 #define QR_VISDIR       256             /* Visible directory                */
 219 #define QR_ANONONLY     512             /* Anonymous-Only room              */
 220 #define QR_ANON2        1024            /* Anonymous-Option room            */
 221 #define QR_NETWORK      2048            /* Shared network room              */
 222 #define QR_PREFONLY     4096            /* Preferred users only             */
 223
 224 struct fullroom {
 225         long FRnum[MSGSPERRM];          /* Message NUMBERS                  */
 226                 };
 227
 228 [Note that all components start with "QR" for quickroom, to make sure we
 229  don't accidentally use an offset in the wrong structure. Be very careful
 230  also to get a meaningful sequence of components --
 231  some C compilers don't check this sort of stuff either.]
 232
 233 QRgen handles the problem of rooms which have died and been reborn
 234 under another name.  This will be clearer when we get to the userlog.
 235 For now, just note that each room has a generation number which is
 236 bumped by one each time it is recycled.
 237
 238 QRflags is just a bag of bits recording the status of the room.  The
 239 defined bits are:
 240
 241 QR_BUSY         This is to ensure that two processes don't update the same
 242                 record at the same time, even though this hasn't been
 243                 implemented yet.
 244 QR_INUSE        1 if the room is valid, 0 if it is free for re-assignment.
 245 QR_PRIVATE      1 if the room is not visible by default, 0 for public.
 246 QR_PASSWORDED   1 if entry to the room requires a password.
 247 QR_GUESSNAME    1 if the room can be reached by guessing the name.
 248 QR_DIRECTORY    1 if the room is a window onto some disk/userspace, else 0.
 249 QR_UPLOAD       1 if users can upload into this room, else 0.
 250 QR_DOWNLOAD     1 if users can download from this room, else 0.
 251 QR_VISDIR       1 if users are allowed to read the directory, else 0.
 252 QR_ANONONLY     1 if all messages are to receive the "****" anon header.
 253 QR_ANON2        1 if the user will be asked if he/she wants an anon message.
 254 QR_NETWORK      1 if this room is shared on a network, else 0.
 255 QR_PREFONLY     1 if the room is only accessible to preferred users, else 0.
 256
 257 QRname is just an ASCII string (null-terminated, like all strings)
 258 giving the name of the room.
 259
 260 QRdirname is meaningful only in QR_DIRECTORY rooms, in which case
 261 it gives the directory name to window.
 262
 263 QRpasswd is the room's password, if it's a QR_PASSWORDED room. Note that
 264 if QR_PASSWORDED or QR_GUESSNAME is set, you MUST also set QR_PRIVATE.
 265 QR_PRIVATE by itself designates invitation-only. Do not EVER set all three
 266 flags at the same time.
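
The rules for combining the three access flags are easy to get wrong, so a
check along these lines may be worth having.  qr_access_flags_valid() is a
hypothetical helper, not something found in citadel.h; the flag values are
the ones listed above.

```c
#define QR_PRIVATE      4
#define QR_PASSWORDED   8
#define QR_GUESSNAME    16

/* Return 1 if the private/passworded/guessname bits form a legal
 * combination according to the rules in the text, 0 otherwise. */
int qr_access_flags_valid(unsigned flags)
{
    int priv  = (flags & QR_PRIVATE) != 0;
    int passw = (flags & QR_PASSWORDED) != 0;
    int guess = (flags & QR_GUESSNAME) != 0;

    if ((passw || guess) && !priv)
        return 0;               /* either one requires QR_PRIVATE */
    if (priv && passw && guess)
        return 0;               /* never all three at once */
    return 1;
}
```

That leaves four legal states: public (no bits), invitation-only (QR_PRIVATE
alone), passworded, and guess-name.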
 267
 268 QRroomaide is the user number of the room's room-aide (or zero if the room
 269 doesn't have a room aide). Note that if a user is deleted, his/her user number
 270 is never used again, so you don't have to worry about a new user getting the
 271 same user number and accidentally becoming a room-aide of one or more rooms.
 272
 273 The only field new to us in quickroom is QRhighest, recording the
 274 most recent message in the room.  When we are searching for rooms with
 275 messages a given caller hasn't seen, we can check this number
 276 and avoid a whole lot of extra disk accesses.
 277
 278    The fullroom is the array of pointers into the message file. We keep one
 279 file for each fullroom array to keep the quickroom file small (and access time
 280 efficient).  FRnum holds the message number of each message in the room.
 281 (An empty slot is marked with a zero.)
 282
 283                         user records (usersupp)
 284
 285 This is the fun one.  Get some fresh air and plug in your thinking cap
 286 first.  (Time, space and complexity are the usual software rivals.
 287 We've got lots of log entries times lots of messages spread over up to nnn
 288 rooms to worry about, and with multitasking, disk access time is important...
 289 so perforce, we opt for complexity to keep time and space in bounds.)
 290
 291 To understand what is happening in the log code takes a little persistence.
 292 You also have to disentangle the different activities going on and
 293 tackle them one by one.
 294
 295  o      We want to remember some random things such as terminal screen
 296         size, and automatically set them up for each caller at login.
 297
 298  o      We want to be able to locate all new messages, and only new
 299         messages, efficiently.  Messages should stay new even if it
 300         takes a caller a couple of calls to get around to them.
 301
 302  o      We want to remember which private rooms a given caller knows
 303         about, and treat them as normal rooms.  This means mostly
 304         automatically seeking out those with new messages.  (Obviously,
 305         we >don't< want to do this for unknown private rooms!)  This
 306         has to be secure against the periodic recycling of rooms
 307         between calls.
 308
 309  o      We want to support private mail to a caller.
 310
 311  o      We want to provide some protection of this information (via
 312         passwords at login) and some assurance that messages are from
 313         who they purport to be from (within the system -- one shouldn't
 314         be able to forge messages from established users).
 315
 316 Lifting another page from citadel.h gives us:
 317
 318 struct usersupp {                       /* User record                      */
 319         int USuid;                      /* uid account is logged in under   */
 320         char password[20];              /* password (for BBS-only users)    */
 321         long lastseen[MAXROOMS];        /* Last message seen in each room   */
 322         char generation[MAXROOMS];      /* Generation # (for private rooms) */
 323         char forget[MAXROOMS];          /* Forgotten generation number      */
 324         long mailnum[MAILSLOTS];        /* Message #'s of each mail message */
 325         long mailpos[MAILSLOTS];        /* Disk positions of each mail      */
 326         unsigned flags;                 /* See US_ flags below              */
 327         int screenwidth;                /* For formatting messages          */
 328         int timescalled;                /* Total number of logins           */
 329         int posted;                     /* Number of messages posted (ever) */
 330         char fullname[26];              /* Bulletin Board name for messages */
 331         char axlevel;                   /* Access level                     */
 332         char spare[3];                  /* spare bytes for future use       */
 333         long usernum;                   /* Eternal user number              */
 334         long lastcall;                  /* Last time the user called        */
 335                                 };
 336
 337 #define US_PERM         1               /* Permanent user; don't scroll off */
 338 #define US_LASTOLD      16              /* Print last old message with new  */
 339 #define US_EXPERT       32              /* Experienced user                 */
 340 #define US_UNLISTED     64              /* Unlisted userlog entry           */
 341 #define US_NOPROMPT     128             /* Don't prompt after each message  */
 342 #define US_PREF         1024            /* Preferred user                   */
 343
 344 Looks simple enough, doesn't it?  One topic at a time:
 345
 346  Random configuration parameters:
 347 -screenwidth is the caller's screen width.  We format all messages to this
 348 width, as best we can.  flags is another bit-bag, recording things such as
 349 whether the caller wants a prompt after each message, or wants the little
 350 automatic hints suppressed throughout the system, etc.
 351
 352   Attachments, names & numbers:
 353 -USuid is the uid the account was established under. For most users it will
 354 be the same as BBSUID, but it won't be for users that logged in from the shell.
 355 -fullname is the user's full login name.
 356 -usernum is the user's ID number.  It is unique to the entire system:
 357 once someone has a user number, it is never used again after the user is
 358 deleted. This allows an easy way to numerically represent people.
 359 -password is the user's password.
 360 -axlevel is the user's access level, so we know who's an Aide, who's a problem
 361 user, etc.  These are defined and listed in the system.
 362
 363   Feeping Creatures:
 364 -timescalled is the number of times the user has called.
 365 -posted is the number of messages the user has posted, public or private.
 366
 367   Misc stuff:
 368 -lastcall holds the date and time (standard Unix format) the user called, so
 369 we can purge people who haven't called in a given amount of time.
 370
 371   Finding new messages:
 372 This is the most important.  Thus, it winds up being the most
 373 elaborate.  Conceptually, what we would like to do is mark each
 374 message with a bit after our caller has read it, so we can avoid
 375 printing it out again next call.  Unfortunately, with lots of user
 376 entries this would require adding lots of bits to each message... and
 377 we'd wind up reading off disk lots of messages which would never
 378 get printed.  So we resort to approximation and a small table.
 379
 380 The approximation comes in doing things at the granularity of
 381 rooms rather than messages.  Messages in a given room are "new"
 382 until we visit it, and "old" after we leave the room... whether
 383 we read any of them or not.  This can actually be defended: anyone
 384 who passes through a room without reading the contents probably just
 385 isn't interested in the topic, and would just as soon not be dragged
 386 back every visit and forced to read them.  Given that messages are
 387 numbered sequentially, we can simply record the most recent message ID#
 388 of each room as of the last time we visited it. Very simple.
 389
 390 Putting it all together, we can now compute whether a given room
 391 has new messages for our current caller without going to the message base
 392 index (fullroom) at all:
 393
 394  > We get the usersupp.lastseen[] for the room in question
 395  > We compare this with the room's quickroom.QRhighest, which tells us
 396    what the most recent message in the room is currently.
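
In code, the comparison is a single test per room.  The sketch below uses a
cut-down stand-in for struct quickroom holding only the two members that
matter here; struct room_summary and next_room_with_new() are invented for
this example.

```c
#define QR_INUSE 2   /* same value as in the flag list */

/* Cut-down stand-in for struct quickroom (illustration only). */
struct room_summary {
    unsigned QRflags;
    long QRhighest;
};

/* Return the index of the first in-use room at or after `start` whose
 * highest message number is newer than the caller's lastseen[] entry,
 * or -1 if there is none.  The fullroom arrays are never touched. */
int next_room_with_new(const struct room_summary rooms[],
                       const long lastseen[], int nrooms, int start)
{
    int i;
    for (i = start; i < nrooms; i++) {
        if ((rooms[i].QRflags & QR_INUSE) &&
            rooms[i].QRhighest > lastseen[i])
            return i;
    }
    return -1;
}
```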
 397
 398
 399              REMEMBERING WHICH PRIVATE ROOMS TO VISIT
 400
 401 This looks trivial at first glance -- just record one bit per room per
 402 caller in the log records.  The problem is that rooms get recycled
 403 periodically, and we'd rather not run through all the log entries each
 404 time we do it.  So we adopt a kludge which should work 99% of the time.
 405
 406 As previously noted, each room has a generation number, which is bumped
 407 by one each time it is recycled.  As not noted, this generation number
 408 runs from 0 -> 127 (and then wraps around and starts over).
 409   When someone visits a room, we set usersupp.generation for the room
 410 equal to that of the room.  This flags the room as being available.
 411 If the room gets recycled, on our next visit the two generation numbers
 412 will no longer match, and the room will no longer be available -- just
 413 the result we're looking for.  (Naturally, if a room is public,
 414 all this stuff is irrelevant.)
 415
 416 This leaves only the problem of an accidental matchup between the two
 417 numbers giving someone access to a Forbidden Room.  We can't eliminate
 418 this danger completely, but it can be reduced to insignificance for
 419 most purposes.  (Just don't bet megabucks on the security of this system!)
 420 Each time someone logs in, we set all "wrong" generation numbers to -1.
 421 So the room must be recycled 127 times before an accidental matchup
 422 can be achieved.  (We do this for all rooms, INUSE or dead, public
 423 or private, since any of them may be reincarnated as a Forbidden Room.)
 424
 425 Thus, for someone to accidentally be led to a Forbidden Room, they
 426 must establish an account on the system, then not call until some room
 427 has been recycled 127 to 128 times, which room must be
 428 reincarnated as a Forbidden Room, after which that someone must call back
 429 (having not scrolled off the userlog in the meantime) and read new
 430 messages.  The last clause is about the only probable one in the sequence.
 431 The danger of this is much less than the danger that someone will
 432 simply guess the name of the room outright (if it's a guess-name room)
 433 or some other human loophole.
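
The generation bookkeeping can be sketched as follows.  The function names
are hypothetical.  One deliberate difference from the declarations in
citadel.h: the sketch uses signed char rather than plain char, because
whether plain char can hold -1 depends on the compiler.

```c
/* Bump a room's generation number when the room is recycled.
 * Values run 0..127 and then wrap, as described in the text. */
signed char next_generation(signed char gen)
{
    return (signed char)((gen + 1) % 128);
}

/* At login: wipe every stored generation number that no longer
 * matches its room, so a stale value can never match again later. */
void sync_generations(signed char user_gen[], const signed char room_gen[],
                      int nrooms)
{
    int i;
    for (i = 0; i < nrooms; i++) {
        if (user_gen[i] != room_gen[i])
            user_gen[i] = -1;
    }
}

/* Record that the caller has visited (and so knows about) a room. */
void visit_room(signed char user_gen[], const signed char room_gen[],
                int room)
{
    user_gen[room] = room_gen[room];
}

/* Does the caller know about this room? */
int knows_room(const signed char user_gen[], const signed char room_gen[],
               int room)
{
    return user_gen[room] == room_gen[room];
}
```

Since room generations never take the value -1, a wiped entry cannot match
any room until the caller actually visits it again.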
 434
 435                      FORGOTTEN ROOMS
 436
 437   This is exactly the opposite of private rooms. When a user chooses to
 438 forget a room, we put the room's generation number in usersupp.forget for
 439 that room. When doing a <K>nown rooms list or a <G>oto, any matchups cause
 440 the room to be skipped. Very simple.
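
Combined with the private-room rule, the decision whether to show a room in
a <K>nown rooms list might be sketched like this.  room_is_listed() is a
made-up name, and the sketch assumes a forget entry that has never been set
holds -1, a value no room generation can take.

```c
/* Decide whether a room belongs in the caller's known-rooms list.
 * `user_gen` and `user_forget` are the caller's usersupp.generation and
 * usersupp.forget entries for this room; `room_gen` is the room's QRgen.
 * Assumption: an unused forget entry holds -1. */
int room_is_listed(int is_private, signed char room_gen,
                   signed char user_gen, signed char user_forget)
{
    if (user_forget == room_gen)
        return 0;               /* caller chose to forget this room */
    if (is_private && user_gen != room_gen)
        return 0;               /* private room the caller doesn't know */
    return 1;
}
```

Because the forget entry stores a generation number, a forgotten room shows
up again once it is recycled under a new generation.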
 441
 442                      SUPPORTING PRIVATE MAIL
 443
 444 Can one have an elegant kludge?  This must come pretty close.
 445
 446 Private mail is sent and received in the Mail> room, which otherwise
 447 behaves pretty much as any other room.  To make this work, we store
 448 the actual message pointers in mailnum[] and mailpos[] in the caller's
 449 log record, and then copy them into the Mail> room array whenever we
 450 enter the room.  This requires a little fiddling to get things just
 451 right.  We have to update quickroom[1].QRhighest at login
 452 to reflect the presence or absence of new messages, for example.  And
 453 make_message() has to be kludged to ask for the name of the recipient
 454 of the message whenever a message is entered in Mail>.  But basically
 455 it works pretty well, keeping the code and user interface simple and
 456 regular.
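
The copy into the Mail> room array might look like the sketch below.  The
constants are tiny stand-ins for the real MAILSLOTS and MSGSPERRM, the
function name is invented, and the placement of the mail in the last slots
of the room (with 0 for empty slots) is an assumption of this example.

```c
#define MAILSLOTS 3   /* tiny stand-ins for the real constants */
#define MSGSPERRM 5

/* Rebuild the Mail> room array from one user's mailnum[] table.
 * Assumed layout: 0 marks an empty slot and mail fills the last
 * MAILSLOTS entries of the room.  Returns the highest message number
 * found, which is the value QRhighest for Mail> should take. */
long load_mail_room(long room[MSGSPERRM], const long mailnum[MAILSLOTS])
{
    long highest = 0;
    int i;
    for (i = 0; i < MSGSPERRM; i++)
        room[i] = 0;
    for (i = 0; i < MAILSLOTS; i++) {
        room[MSGSPERRM - MAILSLOTS + i] = mailnum[i];
        if (mailnum[i] > highest)
            highest = mailnum[i];
    }
    return highest;
}
```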
 457
 458
 459                    PASSWORDS AND NAME VALIDATION
 460
 461   This has changed a couple of times over the course of Citadel's history.  At
 462 this point it's very simple, again due to the fact that record managers are
 463 used for everything.  The user file (usersupp) is indexed using the user's
 464 name, converted to all lower-case.  Searching for a user, then, is easy.  We
 465 just lowercase the name we're looking for and query the database.  If no
 466 match is found, it is assumed that the user does not exist.
 467
 468   This makes it difficult to forge messages from an existing user.  (Fine
 469 point: nonprinting characters are converted to printing characters, and
 470 leading, trailing, and double blanks are deleted.)
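
A sketch of the name cleanup and key generation, folded into one pass.
make_user_key() is a hypothetical name, and replacing each nonprinting
character with a blank is an assumption: the text only says such characters
become printing ones.

```c
#include <ctype.h>
#include <stddef.h>

/* Build the lookup key for a user name: nonprinting characters become
 * blanks (an assumption of this sketch), leading and trailing blanks
 * are dropped, runs of blanks collapse to one, and everything is
 * lowercased.  Output is truncated to fit `cap` bytes including the
 * terminating null. */
void make_user_key(char *out, size_t cap, const char *name)
{
    size_t n = 0;
    int pending_space = 0;

    if (cap == 0)
        return;
    for (; *name != '\0'; name++) {
        unsigned char c = (unsigned char)*name;
        if (!isprint(c))
            c = ' ';
        if (c == ' ') {
            pending_space = (n > 0);     /* ignore leading blanks */
            continue;
        }
        if (pending_space) {
            if (n + 2 >= cap)
                break;                   /* no room for blank + char */
            out[n++] = ' ';
            pending_space = 0;
        }
        if (n + 1 >= cap)
            break;
        out[n++] = (char)tolower(c);
    }
    out[n] = '\0';
}
```

With this, "  Art   Cancro " and "art cancro" map to the same key, so neither
can be registered while the other exists.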