X-Git-Url: https://code.citadel.org/?a=blobdiff_plain;f=citadel%2Ftechdoc%2Fhack.txt;h=1af927555f80139ae118509743c5f3dd9773da18;hb=HEAD;hp=718797a15db092db32097687710dcec6af541a76;hpb=21445583daf279a856a775604be2ae8ebaddbbb1;p=citadel.git

diff --git a/citadel/techdoc/hack.txt b/citadel/techdoc/hack.txt
deleted file mode 100644
index 718797a15..000000000
--- a/citadel/techdoc/hack.txt
+++ /dev/null
@@ -1,464 +0,0 @@
-                    hack.doc for Citadel/UX
-      written by Art Cancro (ajc@uncnsrd.mt-kisco.ny.us)
-
-   Much of this document is borrowed from the original hack.doc from
-Citadel-CP/M and Citadel-86, because many of the concepts are the same. Hats
-off to whoever wrote the original, for a fine document that inspired the
-implementation of Citadel for Unix.
-
-   Note that this document is really out of date. It doesn't cover anything
-about the threaded server architecture or any of the network stuff. What is
-covered here is the basic architecture of the databases.
-
-   But enough of the preamble. Here's how Citadel/UX works :)
-
-   Here are the major databases to be discussed:
-
-   msgmain     The big circular file that contains message text
-   quickroom   Contains room info such as room names, stats, etc.
-   fullroom    One fullrm file per room: message numbers and pointers.
-   usersupp    Contains info for each user on the system.
-
-   The fundamental structure of the system differs greatly from the way
-Citadels used to work. Citadel now depends on a record manager or database
-manager of some sort. Thanks to the API which is in place for connecting to
-a data store, any record manager may be used as long as it supports the
-storage and retrieval of large binary objects (blobs) indexed by unique keys.
-Please see database.c for more information on data store primitives.
-
-   The message base (MSGMAIN) is a big file of messages indexed by the message
-number. Messages are numbered consecutively and start with an FF (hex)
-byte. Except for this FF start-of-message byte, all bytes in the message
-file have the high bit set to 0. This means that in principle it is
-trivial to scan through the message file and locate message N if it
-exists, or return error. (Complexities, as usual, crop up when we
-try for efficiency...)
-
-   Each room is basically just a list of message numbers. Each time
-we enter a new message in a room, its message number is appended to the end
-of the list. If an old message is to be expired, we must delete it from the
-message base. Reading a room is just a matter of looking up the messages
-one by one and sending them to the client for display, printing, or whatever.
-
-   Implementing the "new message" function is also trivial in principle:
-we just keep track, for each caller in the userlog, of the highest-numbered
-message which existed on the >last< call. (Remember, message numbers are
-simply assigned sequentially each time a message is created. This
-sequence is global to the entire system, not local within a room.) If
-we ignore all message-numbers in the room less than this, only new messages
-will be printed. Voila!
-
-                Message format on disk (MSGMAIN)
-
-   Each message begins with an FF byte. The next byte will then be MES_NORMAL,
-MES_ANON, or MES_ANON2, depending on whether the message is anonymous or not.
-The next byte is either a 0 or 1. If it is 0, the message will be printed
-with the Citadel formatter. If it is a 1, the
-message is printed directly to the screen, as is. External editors generate
-this type of message. After these three opening bytes, the remainder of
-the message consists of a sequence of character strings. Each string
-begins with a type byte indicating the meaning of the string and is
-ended with a null. All strings are printable ASCII: in particular,
-all numbers are in ASCII rather than binary. This is for simplicity,
-both in implementing the system and in implementing other code to
-work with the system. For instance, a database driven off Citadel archives
-can do wildcard matching without worrying about unpacking binary data such
-as message IDs first. To provide later downward compatibility,
-all software should be written to IGNORE fields not currently defined.
-
-   The type bytes currently defined are:
-
-BYTE    Mnemonic        Comments
-
-T       Date/Time       A 32-bit integer containing the date and time of
-                        the message in standard UNIX format (the number
-                        of seconds since January 1, 1970 GMT).
-P       Path            Complete path of message, as in the UseNet news
-                        standard. A user should be able to send UUCP mail to
-                        this path. (Note that your system name will not be
-                        tacked onto this until you're sending the message to
-                        someone else)
-I       Original ID     A 32-bit integer containing the message ID on the
-                        system the message *originated* on.
-#       Local ID        A 32-bit integer containing the message ID on the
-                        system the message is *currently* on (obviously this
-                        is meaningless for a message being transmitted over
-                        a network).
-A       Author          Name of originator of message.
-R       Recipient       Only present in Mail messages.
-O       Room            Room of origin.
-N       Nodename        Contains node name of system message originated on.
-H       HumanNodeName   Human-readable name of system message originated on.
-D       Destination     Contains name of the system this message should
-                        be sent to, for mail routing (private mail only).
-U       Subject         Optional. Developers may choose whether they wish to
-                        generate or display subject fields. Citadel/UX does
-                        not generate them, but it does print them when found.
-B       Phone number    The dialup number of the system this message
-                        originated on. This is optional, and is only
-                        defined for helping implement C86Net gateways.
-G       Gateway domain  This field is provided solely for the implementation
-                        of C86Net gateways, and holds the C86Net domain of
-                        the system this message originated on. Unless you're
-                        implementing such a gateway, there's no need to even
-                        bother with this field.
-S       Special field   Only meaningful for messages being spooled over a
-                        network. Usually means that the message isn't really
-                        a message, but rather some other network function:
-                        -> "S" followed by "FILE" (followed by a null, of
-                           course) means that the message text is actually an
-                           IGnet/Open file transfer.
-M       Message Text    Normal ASCII, newlines separated by CR's or LF's,
-                        null terminated as always.
-X       eXtension field Extension fields are used to carry additional RFC822
-                        type lines. X fields contain the X byte followed by
-                        the RFC822 field name, a colon, a space, and the value.
-
-                        EXAMPLE
-
-Let <FF> be a 0xFF byte, and <0> be a null (0x00) byte. Then a message
-which prints as...
-
-Apr 12, 1988 23:16 From Test User In Network Test> @lifesys (Life BBS)
-Have a nice day!
-
- might be stored as...
-<FF><40><0>I12345<0>Pneighbor!lifesys!test_user<0>T576918988<0> (continued)
------------|Mesg ID#|--Message Path---------------|--Date------
-
-AThe Test User<0>ONetwork Test<0>Nlifesys<0>HLife BBS<0>MHave a nice day!<0>
-|-----Author-----|-Room name-----|-nodename-|Human Name-|--Message text-----
-
-   Weird things can happen if fields are missing, especially if you use the
-networker. But basically, the date, author, room, and nodename may be in any
-order. But the leading fields and the message text must remain in the same
-place. The H field looks better when it is placed immediately after the N
-field.
-
-                        Networking
-
-Citadel nodes network by sharing one or more rooms. Any Citadel node
-can choose to share messages with any other Citadel node, through the sending
-of spool files. The sending system takes all messages it hasn't sent yet, and
-spools them to the receiving system, which posts them in the rooms.
-
-Complexities arise primarily from the possibility of densely connected
-networks: one does not wish to accumulate multiple copies of a given
-message, which can easily happen. Nor does one want to see old messages
-percolating indefinitely through the system.
-
-This problem is handled by keeping track of the path a message has taken over
-the network, like the UseNet news system does. When a system sends out a
-message, it adds its own name to the bang-path in the <P> field of the
-message. If no path field is present, it generates one.
-
-With the path present, all the networker has to do to assure that it doesn't
-send another system a message it's already received is check the <P>ath field
-for that system's name somewhere in the bang path. If it's present, the system
-has already seen the message, so we don't send it. (Note that the current
-implementation does not allow for "loops" in the network -- if you build your
-net this way you will see lots of duplicate messages.)
-
-The above discussion should make the function of the fields reasonably clear:
-
- o Travelling messages need to carry original message-id, system of origin,
-   date of origin, author, and path with them, to keep reproduction and
-   cycling under control.
-
-(Uncoincidentally) the format used to transmit messages for networking
-purposes is precisely that used on disk, except that there may be any amount
-of garbage between the null ending a message and the <FF> starting the next
-one. This allows greater compatibility if slight problems crop up. The current
-distribution includes netproc.c, which is basically a database replicator;
-please see network.txt on its operation and functionality (if any).
-
-                        Portability issues
-
-   At this point, most hardware-dependent stuff has been removed from the
-system. On the server side, most of the OS-dependent stuff has been isolated
-into the sysdep.c source module. The server should compile on any POSIX
-compliant system with a full pthreads implementation and TCP/IP support. In
-the future, we may try to port it to non-POSIX systems as well.
-
-   On the client side, it's also POSIX compliant. The client even seems to
-build ok on non-POSIX systems with porting libraries (such as the Cygnus
-Win32 stuff).
-
-
-                   "Room" records (quickroom)
-
-The rooms are basically indices into msgmain, the message database.
-As noted in the overview, each is essentially an array of pointers into
-the message file. The pointers consist of a 32-bit message ID number
-(we will wrap around at 32 bits for these purposes).
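
As a rough illustration of "a room is a list of message numbers", here is a minimal sketch in C. The `msglist` structure and both function names are hypothetical (they do not come from citadel.h or database.c); the sketch only assumes what the text states: message numbers are 32-bit, assigned sequentially system-wide, and appended to the end of a room's list. Wraparound at 32 bits is deliberately ignored.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical in-memory message list for one room (not from citadel.h). */
struct msglist {
    uint32_t *nums;   /* message numbers, oldest first */
    size_t    count;  /* entries currently in use */
    size_t    cap;    /* entries allocated */
};

/* Append a newly posted message number to the end of the room's list.
 * Returns 0 on success, -1 if memory could not be allocated. */
int msglist_append(struct msglist *ml, uint32_t msgnum)
{
    if (ml->count == ml->cap) {
        size_t ncap = ml->cap ? ml->cap * 2 : 8;
        uint32_t *p = realloc(ml->nums, ncap * sizeof *p);
        if (p == NULL)
            return -1;
        ml->nums = p;
        ml->cap = ncap;
    }
    ml->nums[ml->count++] = msgnum;
    return 0;
}

/* Count the messages in the room numbered higher than last_seen, i.e. the
 * ones a caller would be shown as "new". Ignores 32-bit wraparound. */
size_t msglist_count_new(const struct msglist *ml, uint32_t last_seen)
{
    size_t n = 0;
    for (size_t i = 0; i < ml->count; i++)
        if (ml->nums[i] > last_seen)
            n++;
    return n;
}
```

A caller would start from a zero-initialized `struct msglist`, append each number as a message is posted, and release `nums` with `free()` when the room is discarded.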
-
-Since messages are numbered sequentially, the
-set of messages existing in msgmain will always form a continuous
-sequence at any given time.
-
-That should be enough background to tackle a full-scale room. From citadel.h:
-
-struct quickroom {
-        char QRname[20];        /* Max. len is 19, plus null term   */
-        char QRpasswd[10];      /* Only valid if it's a private rm  */
-        long QRroomaide;        /* User number of room aide         */
-        long QRhighest;         /* Highest message NUMBER in room   */
-        char QRgen;             /* Generation number of room        */
-        unsigned QRflags;       /* See flag values below            */
-        char QRdirname[15];     /* Directory name, if applicable    */
-        char QRfloor;           /* (not yet implemented)            */
-        };
-
-#define QR_BUSY         1       /* Room is being updated, WAIT      */
-#define QR_INUSE        2       /* Set if in use, clear if avail    */
-#define QR_PRIVATE      4       /* Set for any type of private room */
-#define QR_PASSWORDED   8       /* Set if there's a password too    */
-#define QR_GUESSNAME    16      /* Set if it's a guessname room     */
-#define QR_DIRECTORY    32      /* Directory room                   */
-#define QR_UPLOAD       64      /* Allowed to upload                */
-#define QR_DOWNLOAD     128     /* Allowed to download              */
-#define QR_VISDIR       256     /* Visible directory                */
-#define QR_ANONONLY     512     /* Anonymous-Only room              */
-#define QR_ANON2        1024    /* Anonymous-Option room            */
-#define QR_NETWORK      2048    /* Shared network room              */
-#define QR_PREFONLY     4096    /* Preferred users only             */
-
-[Note that all components start with "QR" for quickroom, to make sure we
- don't accidentally use an offset in the wrong structure. Be very careful
- also to get a meaningful sequence of components --
- some C compilers don't check this sort of stuff either.]
-
-QRgen handles the problem of rooms which have died and been reborn
-under another name. This will be clearer when we get to the userlog.
-For now, just note that each room has a generation number which is
-bumped by one each time it is recycled.
-
-QRflags is just a bag of bits recording the status of the room. The
-defined bits are:
-
-QR_BUSY         This is to ensure that two processes don't update the same
-                record at the same time, even though this hasn't been
-                implemented yet.
-QR_INUSE        1 if the room is valid, 0 if it is free for re-assignment.
-QR_PRIVATE      1 if the room is not visible by default, 0 for public.
-QR_PASSWORDED   1 if entry to the room requires a password.
-QR_GUESSNAME    1 if the room can be reached by guessing the name.
-QR_DIRECTORY    1 if the room is a window onto some disk/userspace, else 0.
-QR_UPLOAD       1 if users can upload into this room, else 0.
-QR_DOWNLOAD     1 if users can download from this room, else 0.
-QR_VISDIR       1 if users are allowed to read the directory, else 0.
-QR_ANONONLY     1 if all messages are to receive the "****" anon header.
-QR_ANON2        1 if the user will be asked if he/she wants an anon message.
-QR_NETWORK      1 if this room is shared on a network, else 0.
-QR_PREFONLY     1 if the room is only accessible to preferred users, else 0.
-
-QRname is just an ASCII string (null-terminated, like all strings)
-giving the name of the room.
-
-QRdirname is meaningful only in QR_DIRECTORY rooms, in which case
-it gives the directory name to window.
-
-QRpasswd is the room's password, if it's a QR_PASSWORDED room. Note that
-if QR_PASSWORDED or QR_GUESSNAME are set, you MUST also set QR_PRIVATE.
-QR_PRIVATE by itself designates invitation-only. Do not EVER set all three
-flags at the same time.
-
-QRroomaide is the user number of the room's room-aide (or zero if the room
-doesn't have a room aide). Note that if a user is deleted, his/her user number
-is never used again, so you don't have to worry about a new user getting the
-same user number and accidentally becoming a room-aide of one or more rooms.
-
-The only field new to us in quickroom is QRhighest, recording the
-most recent message in the room. When we are searching for rooms with
-messages a given caller hasn't seen, we can check this number
-and avoid a whole lot of extra disk accesses.
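
The flag rules and the QRhighest shortcut described above can be sketched in C as follows. The flag values are copied from the listing; the two function names are hypothetical and are not taken from the Citadel sources. The second function assumes the caller's last-seen message number for the room is already at hand (the user record section covers where it is kept).

```c
/* Flag values as given in the citadel.h excerpt above. */
#define QR_INUSE        2
#define QR_PRIVATE      4
#define QR_PASSWORDED   8
#define QR_GUESSNAME    16

/* Hypothetical helper: returns 1 if the privacy bits obey the stated rules
 * (QR_PASSWORDED or QR_GUESSNAME require QR_PRIVATE, and the three bits are
 * never all set together), otherwise 0. */
int qr_privacy_flags_ok(unsigned flags)
{
    unsigned sub = flags & (QR_PASSWORDED | QR_GUESSNAME);

    if (sub == 0)
        return 1;               /* public, or invitation-only private */
    if (!(flags & QR_PRIVATE))
        return 0;               /* password/guessname without QR_PRIVATE */
    if (sub == (QR_PASSWORDED | QR_GUESSNAME))
        return 0;               /* all three set at once */
    return 1;
}

/* Hypothetical helper: returns 1 if a room is in use and holds a message
 * newer than the caller's last-seen number, using only QRflags and
 * QRhighest from the room record (no message list access). */
int room_has_new(unsigned qr_flags, long qr_highest, long last_seen)
{
    return (qr_flags & QR_INUSE) && qr_highest > last_seen;
}
```

For example, `qr_privacy_flags_ok(QR_PRIVATE | QR_PASSWORDED)` is accepted, while `QR_GUESSNAME` on its own is rejected because QR_PRIVATE is missing.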
-
-   There used to also be a structure called "fullroom" which resided in one
-file for each room on the system. This has been abandoned in favour of
-"message lists" which are variable sized and simply contain zero or more
-message numbers. The message numbers, in turn, point to messages on disk.
-
-                   User records (usersupp)
-
-This is the fun one. Get some fresh air and plug in your thinking cap
-first. (Time, space and complexity are the eternal software rivals.
-We've got lots of log entries times lots of messages spread over up to nnn
-rooms to worry about, and with multitasking, disk access time is important...
-so perforce, we opt for complexity to keep time and space in bounds.)
-
-To understand what is happening in the log code takes a little persistence.
-You also have to disentangle the different activities going on and
-tackle them one by one.
-
- o We want to remember some random things such as terminal screen
-   size, and automatically set them up for each caller at login.
-
- o We want to be able to locate all new messages, and only new
-   messages, efficiently. Messages should stay new even if it
-   takes a caller a couple of calls to get around to them.
-
- o We want to remember which private rooms a given caller knows
-   about, and treat them as normal rooms. This means mostly
-   automatically seeking out those with new messages. (Obviously,
-   we >don't< want to do this for unknown private rooms!) This
-   has to be secure against the periodic recycling of rooms
-   between calls.
-
- o We want to support private mail to a caller.
-
- o We want to provide some protection of this information (via
-   passwords at login) and some assurance that messages are from
-   who they purport to be from (within the system -- one shouldn't
-   be able to forge messages from established users).
-
-Lifting another page from citadel.h gives us:
-
-struct usersupp {                       /* User record                      */
-        int USuid;                      /* uid account is logged in under   */
-        char password[20];              /* password                         */
-        long lastseen[MAXROOMS];        /* Last message seen in each room   */
-        char generation[MAXROOMS];      /* Generation # (for private rooms) */
-        char forget[MAXROOMS];          /* Forgotten generation number      */
-        unsigned flags;                 /* See US_ flags below              */
-        int screenwidth;                /* For formatting messages          */
-        int timescalled;                /* Total number of logins           */
-        int posted;                     /* Number of messages posted (ever) */
-        char fullname[26];              /* Bulletin Board name for messages */
-        char axlevel;                   /* Access level                     */
-        long usernum;                   /* Eternal user number              */
-        long lastcall;                  /* Last time the user called        */
-        };
-
-#define US_PERM         1               /* Permanent user; don't scroll off */
-#define US_LASTOLD      16              /* Print last old message with new  */
-#define US_EXPERT       32              /* Experienced user                 */
-#define US_UNLISTED     64              /* Unlisted userlog entry           */
-#define US_NOPROMPT     128             /* Don't prompt after each message  */
-#define US_PREF         1024            /* Preferred user                   */
-
-Looks simple enough, doesn't it? One topic at a time:
-
-        Random configuration parameters:
--screenwidth is the caller's screen width. We format all messages to this
-width, as best we can. flags is another bit-bag, recording whether we want
-prompts, people who want to suppress the little automatic hints all through
-the system, etc.
-
-        Attachments, names & numbers:
--USuid is the uid the account was established under. For most users it will
-be the same as BBSUID, but it won't be for users that logged in from the shell.
--fullname is the user's full login name.
--usernum is the user's ID number. It is unique to the entire system:
-once someone has a user number, it is never used again after the user is
-deleted. This allows an easy way to numerically represent people.
--password is the user's password.
--axlevel is the user's access level, so we know who's an Aide, who's a problem
-user, etc. These are defined and listed in the system.
-
-        Feeping Creatures:
--timescalled is the number of times the user has called.
--posted is the number of messages the user has posted, public or private.
-
-        Misc stuff:
--lastcall holds the date and time (standard Unix format) the user called, so
-we can purge people who haven't called in a given amount of time.
-
-        Finding new messages:
-This is the most important. Thus, it winds up being the most
-elaborate. Conceptually, what we would like to do is mark each
-message with a bit after our caller has read it, so we can avoid
-printing it out again next call. Unfortunately, with lots of user
-entries this would require adding lots of bits to each message... and
-we'd wind up reading off disk lots of messages which would never
-get printed. So we resort to approximation and a small table.
-
-The approximation comes in doing things at the granularity of
-rooms rather than messages. Messages in a given room are "new"
-until we visit it, and "old" after we leave the room... whether
-we read any of them or not. This can actually be defended: anyone
-who passes through a room without reading the contents probably just
-isn't interested in the topic, and would just as soon not be dragged
-back every visit and forced to read them. Given that messages are
-numbered sequentially, we can simply record the most recent message ID#
-of each room as of the last time we visited it. Very simple.
-
-Putting it all together, we can now compute whether a given room
-has new messages for our current caller without going to the message base
-index (fullroom) at all:
-
- > We get the usersupp.lastseen[] for the room in question
- > We compare this with the room's quickroom.QRhighest, which tells us
-   what the most recent message in the room is currently.
-
-
-        REMEMBERING WHICH PRIVATE ROOMS TO VISIT
-
-This looks trivial at first glance -- just record one bit per room per
-caller in the log records. The problem is that rooms get recycled
-periodically, and we'd rather not run through all the log entries each
-time we do it. So we adopt a kludge which should work 99% of the time.
-
-As previously noted, each room has a generation number, which is bumped
-by one each time it is recycled. As not noted, this generation number
-runs from 0 -> 127 (and then wraps around and starts over).
-   When someone visits a room, we set usersupp.generation for the room
-equal to that of the room. This flags the room as being available.
-If the room gets recycled, on our next visit the two generation numbers
-will no longer match, and the room will no longer be available -- just
-the result we're looking for. (Naturally, if a room is public,
-all this stuff is irrelevant.)
-
-This leaves only the problem of an accidental matchup between the two
-numbers giving someone access to a Forbidden Room. We can't eliminate
-this danger completely, but it can be reduced to insignificance for
-most purposes. (Just don't bet megabucks on the security of this system!)
-Each time someone logs in, we set all "wrong" generation numbers to -1.
-So the room must be recycled 127 times before an accidental matchup
-can be achieved. (We do this for all rooms, INUSE or dead, public
-or private, since any of them may be reincarnated as a Forbidden Room.)
-
-Thus, for someone to accidentally be led to a Forbidden Room, they
-must establish an account on the system, then not call until some room
-has been recycled 127 to 128 times, which room must be
-reincarnated as a Forbidden Room, which someone must now call back
-(having not scrolled off the userlog in the meantime) and read new
-messages. The last clause is about the only probable one in the sequence.
-The danger of this is much less than the danger that someone will
-simply guess the name of the room outright (if it's a guess-name room)
-or some other human loophole.
-
-        FORGOTTEN ROOMS
-
-   This is exactly the opposite of private rooms. When a user chooses to
-forget a room, we put the room's generation number in usersupp.forget for
-that room. When doing a <K>nown rooms list or a <G>oto, any matchups cause
-the room to be skipped. Very simple.
-
-        SUPPORTING PRIVATE MAIL
-
-Can one have an elegant kludge? This must come pretty close.
-
-Private mail is sent and received in the Mail> room, which otherwise
-behaves pretty much as any other room. To make this work, we store
-the actual message pointers in a message list for each user (MAILBOXES)
-instead of MSGLISTS. This requires a little fiddling to get things just
-right. We have to update quickroom[1].QRhighest at login
-to reflect the presence or absence of new messages, for example. And
-make_message() has to be kludged to ask for the name of the recipient
-of the message whenever a message is entered in Mail>. But basically
-it works pretty well, keeping the code and user interface simple and
-regular.
-
-
-        PASSWORDS AND NAME VALIDATION
-
-   This has changed a couple of times over the course of Citadel's history. At
-this point it's very simple, again due to the fact that record managers are
-used for everything. The user file (usersupp) is indexed using the user's
-name, converted to all lower-case. Searching for a user, then, is easy. We
-just lowercase the name we're looking for and query the database. If no
-match is found, it is assumed that the user does not exist.
-
-   This makes it difficult to forge messages from an existing user. (Fine
-point: nonprinting characters are converted to printing characters, and
-leading, trailing, and double blanks are deleted.)
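
The name normalization described above might look roughly like the following C sketch. The function name `make_user_key` is hypothetical, and one detail is an assumption: the text does not say which printing character replaces a nonprinting one, so this sketch maps it to a blank before collapsing blanks. Names that do not fit the output buffer are truncated.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: build a lower-case lookup key from a user name.
 * Leading, trailing, and repeated blanks are dropped. Nonprinting
 * characters are treated as blanks (an assumption, see above).
 * out must point to at least outsz bytes; the result is always
 * null-terminated when outsz > 0. */
void make_user_key(const char *name, char *out, size_t outsz)
{
    size_t n = 0;
    int pending_space = 0;

    if (outsz == 0)
        return;

    for (const unsigned char *p = (const unsigned char *)name; *p; p++) {
        unsigned char c = *p;

        if (!isprint(c))
            c = ' ';                    /* assumption: nonprinting -> blank */

        if (c == ' ') {
            if (n > 0)
                pending_space = 1;      /* remember one blank, skip leading */
            continue;
        }
        if (pending_space) {
            if (n + 2 >= outsz)
                break;                  /* no room for blank plus character */
            out[n++] = ' ';
            pending_space = 0;
        }
        if (n + 1 >= outsz)
            break;                      /* keep room for the terminator */
        out[n++] = (char)tolower(c);
    }
    out[n] = '\0';
}
```

With this sketch, "  Art   Cancro " and "ART<tab>CANCRO" both normalize to the same key, so a lookup cannot be sidestepped by padding or re-casing an existing user's name.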