Peck by MarkKordusic

MarkKordusic via treechat · 2mo
{
  "txid": "f75d290cc5852a7df7f420def757f2ee1c77907391812b9679e5d25429a19a4e",
  "block_height": 0,
  "time": null,
  "app": "treechat",
  "type": "reply",
  "map_content": "Supplementary quotes.\r\nChuck Moore: \"Forth is the only programming language I would consider programming a computer with.\"\r\nThis is from memory, so the wording may be off, but I believe he said something very close to this.\r\nAnother transcript, for reference, with the video.\r\nI am Chuck Moore. I created Forth 45 years ago. It is still being used, which is moderately amazing; it's certainly being used by me. It's the only language I would ever consider programming a computer in. I'm the CTO of a company called GreenArrays, which is a startup that's been in business about five years, so it's not exactly a startup, but we're still looking for customers, a killer app, investors. We have a high-speed, low-power, multi-core chip: 144 computers on a 5-millimeter chip. It's a remarkable little chip. It's a lot of fun to program, and it has enormous computing capability; I'll show you that in the next slide. So first I'm going to talk about Forth, and then I'm going to talk about the Forth engine, which is a computer that runs Forth very efficiently. I'll show you some Forth programming and then point out how you can program for low energy. I recommend that all of you keep energy in mind when you're writing programs, because PCs are going to be going away. Smartphones are going to be embedded, embedded meaning embedded in your body: in your ear, in your eye, in your heart. Power is going to be crucial in those applications. If we talk about smart dust, we have untold trillions of computers floating in the air. They aren't going to have very much memory, and they aren't going to have very much power. If you talk about molecular computers circulating in your bloodstream, they aren't going to have much power, they aren't going to have much memory, and they aren't going to be able to use much energy. So keep that in mind as you move through the next 10 years. Oh, let's go to the next slide. All right, here are 144 computers, just so that you know what we're talking
about. There are eight rows of 18 computers, and with the geometry of the chip that turns out to be square. Each computer can talk to its neighbors; it's a mesh of computers, four neighbors each. Each computer takes one and a half nanoseconds to execute an instruction, which means it's running at 666 MIPS. It takes four milliwatts if it's running flat out, which is 7 picojoules per instruction, which is small. Now, I'm going to be talking about energy. There's energy and there's power. Power is energy per unit time: power is watts, and a watt is a joule per second. So it really doesn't matter how long it takes to perform an operation; an operation is going to require a certain amount of energy. It requires energy because you have to charge capacitances and you have to run currents through resistors. But energy is the key, and one of our instructions takes roughly 7 picojoules, more or less. One of these computers takes 100 nanowatts if it's idle (remember, there's milliwatts and microwatts and nanowatts), so it takes very little energy just sitting there. The 144 computers all power up idle; the chip at reset is drawing no power. The entire chip can run 96 GIPS, 96,000 MIPS, which is a lot of PCs, and it only takes 550 milliwatts to do that. That's half a watt. If the chip is idle at reset, it draws 15 microwatts of leakage current: no transistors are changing state, but you still get a little bit of leakage through the gate. This is what I mean by low power. This is the lowest-power computer that I know of. One of GreenArrays' customers developed a chart showing all existing computers, commercial and experimental, and we're right at the top of the list. This is about the best you can do; the only people that can do better are those that deal with subthreshold logic, with the high noise margins. And this is a good computer. The computers are numbered from zero by row and column, from 000 to 717, and
I'll be referring to those numbers. So that is my motivation for talking about these things. I wasn't paying any attention to energy until a couple of years ago, when we started realizing that this chip is remarkably low-energy, and now that we know how important it is, I can reduce the amount of energy it's using, and I will be doing that. I've been to a number of hardware conferences where people are talking about designing ASICs, large ASICs with millions and millions of transistors, and I've always considered that size and speed were the important parameters. But in the modern context there are only three parameters that matter: power, power and power. We've got to reduce the power for portable applications, and also to reduce the energy used by these server farms that require a nuclear power station to run. Okay, here's one of the computers. It's 18 bits, 18-bit words, for not very important reasons, but it's a very happy number: it lets you count up to 256K. It has a 64-word RAM. That's not kilowords, that's words. You can pack four instructions per word, so each computer can hold up to 256 instructions. That's what I mean by saying it's a small computer, and that has implications for the way you program it. As is characteristic of Forth and Forth computers, it has two push-down stacks. One is common to all computers: a stack on which you store return addresses for nested subroutine calls. The other, which is pretty much unique to Forth, is the one you store parameters on. Anything you read goes onto the stack; anything you write comes off of the stack. There are three registers of interest there: T is the top of the stack, S is the second element in the stack, and R is the top element of the return stack. The programmer has got to be aware of these two stacks and manipulate them nicely; nothing is done automatically. There are three address registers: P is the program counter, which is where you are in the instruction stream, and A and B are just
registers. There are four communication ports to your neighbors (up, down, right and left), and you can read and write to your neighbors. You can read and write all four of them if you want to; there's a bit for each port in the address. So that is the context; those are the facilities available to a programmer. Here is the instruction set. There are five-bit instructions, and we can pack four per word, except that in slot 3 you have to add a couple of zeros. So in the list of instructions, those in red can be put in slot 3, and all the others can only be put in slots 0, 1 or 2. Jump instructions can be in slot 0, 1 or 2, and all the remaining bits in the word are the address for the jump. I'll just briefly go through the instructions. This is a perfectly conventional instruction set, except for a few. In the right-hand column you've got jump, you've got call, you've got jump if zero, you've got jump if positive, and you've got jump if the R register is non-zero and decrement, so that's a decrementing count for a for-next loop. In the second column you've got fetches and stores: fetch from the address in register A, fetch from the address in register B, or fetch inline from the next word in the instruction stream, and the equivalent stores. In the third column are the ALU operations. We've got binary operations, add, and, and or, and unary operations, 2* (two-star), 2/ (two-slash) and minus. The binary operations take as input the contents of T and S, the two things on the top of the data stack, and they return a result in T, throwing away S, so it's a stack operation. These are called zero-operand instructions. The only reason I can pack them into five bits is that you don't have to specify their operands; the operands are known. In the fourth column you've got some stack operations. You can duplicate the top of the stack, you can drop the top of the stack, you can pop from the return stack onto the data stack, and vice versa you can push the data stack onto the return stack. And then three interesting
instructions: you can store into register A, store into register B, and read back from register A. And you can do nothing: the dot is a no-op. A no-op is essential in order to fill in the slots in an instruction word that you haven't used. That's 32 instructions; if you haven't memorized them already, it would only take you another five minutes, and that's partly what I mean by saying this is an easy computer to program. Now, if you're familiar at all with Forth, those are Forth words. There's a one-to-one mapping between your source code and your machine language. The compiler is a few thousand lines of code; it doesn't optimize, it merely translates your source into machine language. This is a different world from what I've been hearing in the last couple of days. It may be a world that dates back to 1968, and it's not interesting to you, but again I caution that the future is going to be different from the present, and I'm happy to deal with a very transparent, very small compiler. We can argue about whether you call it a compiler or an assembler, because it really doesn't do anything, but compiler is a more impressive word. Here's code for one of these computers, and this is an example of colorForth. I created Forth in about 1970 and I've used it ever since; I call that classic Forth. colorForth I devised in 2000. It has a number of nice features; mostly I felt I needed a new language for the 21st century. But one of the things it does is reduce the amount of punctuation. Forth has very little punctuation compared to, say, C, but colorForth has almost no punctuation; instead it uses color. Each word in the source code has a tag that indicates its function, and the function is reflected in this editor as color. White words are comments; they're ignored. Red words are being defined, so go is being defined, and the green words that follow it are its definition. Those green words are compiled. The yellow words are executed at
compile time; they're direct commands to the compiler. The gray words you see there are inserted by the compiler to tell you where you are in your 64 words. When we finish the definition of go, we've used one word of memory, so your program counter is set at 01 hex. The word init marks the end of the code that is compiled into memory and begins the code that is executed at load time. These computers have got four ports, and you can read and write to the ports. They're memory-mapped, so you can also jump to them. If you jump to a port, you are waiting for your neighbor to give you an instruction. If you jump to all four ports at the same time, you're waiting for one of your neighbors to give you an instruction to execute, and it is the programmer's responsibility that only one of those neighbors will actually provide an instruction; if you get two at the same time, you're dead. At load time, while we're loading this program into memory, we're executing those green words after init: up is the address of the up port, and it is stored in register A; down is the address of the down port, and it's stored in register B; and then we jump to go. So this is how you can initialize a computer without actually having to execute code from its RAM, and I'll show you some more examples. It's very nice to be able to tell your neighbor to do something for you. For instance, there's a hierarchy of memory available on these computers. You can get at the data on your data stack instantly. You get data from your return stack with one instruction, pop, in 1.5 nanoseconds. You can read data from your RAM, but it takes you five nanoseconds to do that. You can read data from your neighbor's RAM: if you have written a little program and sent it to him, he can give you sequential data from RAM without you having to give him explicit addresses, or if you're doing an FFT he can give you
scrambled data from his RAM, knowing what you're going to do with it. You can have smart memory. If your neighbor doesn't have enough memory, he can ask his neighbor; I've had as many as 10 computers chained up passing data to me, and I can get the data as fast as I can handle it. You can actually use your neighbor's stack. These stacks are circular: you've got the two registers on top of the data stack and then eight circular registers beneath them. So if you push things onto the stack, eventually they fall off the end. If you pop things from the stack, after the first two you're going to get the next eight circularly, and you can circulate the stack indefinitely, which is useful; I'll show an example. You can do the same with your neighbor: your neighbor can give you things off his stack indefinitely, and you've got four neighbors. So there's lots of memory available to you in these small chunks of 64 words: 144 times 64 words is about 9,000 words of memory on chip. These are 18-bit words; you can think of them as about 18,000 9-bit bytes if you wish. I actually use 6-bit bytes, so I can get about 27,000 characters. Here's an example of some of the programming, and this has direct relevance to energy. 2* is a left shift of T. This is something we do remarkably often; there's a lot of bit twiddling. + is an add. Next is a countdown loop: if you run out of count, it falls through and the count is discarded. So here's an example: a for-next loop with a call to drive in the middle, which is going to be done ten times. The loop count is one less than the number of repetitions; that's confusing. Micronext is another kind of loop: instead of jumping to an address, you jump to the beginning of your current instruction word. So a micronext loop can repeat up to three instructions, because micronext can be in slot 3.
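A toy Python model can make the off-by-one in for ... next concrete. This is a sketch under assumptions, not GreenArrays' definition: it assumes, as described here, that for hands the count to R, and that next falls through when R is zero and otherwise decrements R and loops back, so a count of n runs the body n + 1 times. The helper name run_for_next is hypothetical.

```python
def run_for_next(count, body):
    # Hypothetical helper modelling 'count for ... next'.
    # for: the count goes to R (the top of the return stack).
    # next: if R is zero, fall through (discarding the count);
    #       otherwise decrement R and jump back to the loop start.
    r = count
    passes = 0
    while True:
        body()
        passes += 1
        if r == 0:
            return passes
        r -= 1


calls = []
passes = run_for_next(9, lambda: calls.append('drive'))
print(passes, len(calls))  # 10 10: a count of 9 calls drive ten times
```

Because the test sits at the end of the loop, the body always runs at least once, even with a count of zero.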
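The slot rules can be checked with a small Python sketch that packs four five-bit opcodes into one 18-bit word. Assumptions: slot 0 sits in the high bits, slot 3 keeps only the top three bits of its opcode (the couple of zeros mentioned in the instruction-set description), and the opcode values in the usage lines are placeholders, not real F18 encodings. jump_address_bits reads the talk literally (all remaining bits are the jump address); the real chip may not use every one of slot 0's leftover bits.

```python
WORD_BITS = 18
OPCODE_BITS = 5
SLOT3_BITS = WORD_BITS - 3 * OPCODE_BITS   # 3 bits left for the last slot


def pack(slots):
    # Pack four opcodes (slot 0 first, in the high bits) into one 18-bit word.
    # Slot 3 stores only the top three bits, so its opcode must end in two zero bits.
    if len(slots) != 4:
        raise ValueError('need exactly four slots')
    word = 0
    for op in slots[:3]:
        if not 0 <= op < 2 ** OPCODE_BITS:
            raise ValueError(f'opcode {op} is not five bits')
        word = (word << OPCODE_BITS) | op
    last = slots[3]
    if not 0 <= last < 2 ** OPCODE_BITS or last & 0b11:
        raise ValueError(f'opcode {last} cannot go in slot 3')
    return (word << SLOT3_BITS) | (last >> 2)


def unpack(word):
    # Recover the four opcodes; slot 3 gets its two implied zero bits back.
    slots = [(word >> shift) & 0b11111 for shift in (13, 8, 3)]
    slots.append((word & 0b111) << 2)
    return slots


def jump_address_bits(slot):
    # Read literally from the talk: a jump's address is every bit after its opcode.
    return WORD_BITS - OPCODE_BITS * (slot + 1)


NOP = 0b11100        # placeholder opcode ending in 00
MICRONEXT = 0b00100  # placeholder opcode ending in 00
word = pack([NOP, NOP, NOP, MICRONEXT])
print(f'{word:018b}', unpack(word) == [NOP, NOP, NOP, MICRONEXT])
print([jump_address_bits(slot) for slot in range(3)])  # [13, 8, 3]; 3 bits = an 8-word page
try:
    pack([NOP, NOP, NOP, 0b00101])
except ValueError as err:
    print(err)  # opcode 5 cannot go in slot 3
```

The same arithmetic explains the later remark that a jump in slot 2 can only reach a little eight-word page.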
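The circular stack is easy to misread, so here is a toy Python model of the data stack as described: T and S on top, eight circular cells beneath. It is a sketch under assumptions (hypothetical class and method names, no claim about the real circuit). Popping never clears a cell, which is what lets a preloaded stack be cycled indefinitely. xor models the F18 or instruction, which the talk describes as exclusive-or, so dup followed by or yields a zero.

```python
class CircularDataStack:
    # Toy model with hypothetical names: T and S registers on top,
    # eight circular cells beneath them. Not a description of the real circuit.
    RING_SIZE = 8

    def __init__(self):
        self.t = 0                          # top of stack
        self.s = 0                          # second element
        self.ring = [0] * self.RING_SIZE    # the eight circular cells
        self.p = 0                          # ring cell just beneath S

    def push(self, value):
        # Old S moves into the ring, overwriting its oldest cell.
        self.p = (self.p + 1) % self.RING_SIZE
        self.ring[self.p] = self.s
        self.s = self.t
        self.t = value

    def pop(self):
        # Cells are read but never cleared, so popping past the bottom
        # keeps cycling through the same eight values.
        value = self.t
        self.t = self.s
        self.s = self.ring[self.p]
        self.p = (self.p - 1) % self.RING_SIZE
        return value

    def dup(self):
        self.push(self.t)

    def xor(self):
        # Binary operation: result in T, S refilled from the ring.
        self.t ^= self.s
        self.s = self.ring[self.p]
        self.p = (self.p - 1) % self.RING_SIZE


stack = CircularDataStack()
for n in range(1, 11):
    stack.push(n)
# 10 and 9 were in T and S; after them the eight ring values repeat.
print([stack.pop() for _ in range(18)])
# [10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 8, 7, 6, 5, 4, 3, 2, 1]

stack.push(42)
stack.dup()
stack.xor()     # dup or: anything xor itself is zero
print(stack.t)  # 0
```

The repeating tail of the first printout is the property the crystal driver relies on when it preloads constants with dups and overs and reads them back without fetching literals.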
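The figures quoted in the talk can be cross-checked with a few lines of arithmetic. This Python snippet only plugs in the numbers as stated (1.5 ns per instruction, 4 mW per busy computer, 100 nW per idle computer, 144 computers with 64 18-bit words each); the constant names are illustrative. It gives about 6 pJ per instruction (the talk rounds to roughly seven), 96,000 MIPS and about 576 mW for the whole chip (quoted as 550 mW), 14.4 µW idle (quoted as 15 µW), and 9,216 words, which is 18,432 nine-bit bytes or 27,648 six-bit characters.

```python
# Figures as quoted in the talk; this only checks that they agree with one another.
NODES = 144
NS_PER_INSTRUCTION = 1.5
ACTIVE_WATTS_PER_NODE = 4e-3      # one computer running flat out
IDLE_WATTS_PER_NODE = 100e-9      # one idle computer
WORDS_PER_NODE = 64
BITS_PER_WORD = 18

mips_per_node = 1e3 / NS_PER_INSTRUCTION              # 1 / 1.5 ns, in millions per second
joules_per_instruction = ACTIVE_WATTS_PER_NODE / (mips_per_node * 1e6)  # energy = power / rate
chip_mips = NODES * mips_per_node
chip_active_watts = NODES * ACTIVE_WATTS_PER_NODE
chip_idle_watts = NODES * IDLE_WATTS_PER_NODE
total_words = NODES * WORDS_PER_NODE
nine_bit_bytes = total_words * BITS_PER_WORD // 9
six_bit_chars = total_words * BITS_PER_WORD // 6

print(f'{mips_per_node:.1f} MIPS per computer')
print(f'{joules_per_instruction * 1e12:.1f} pJ per instruction')
print(f'{chip_mips:,.0f} MIPS for the chip')
print(f'{chip_active_watts * 1e3:.0f} mW busy, {chip_idle_watts * 1e6:.1f} µW idle')
print(f'{2 ** BITS_PER_WORD:,} distinct values in one word')
print(f'{total_words:,} words = {nine_bit_bytes:,} 9-bit bytes = {six_bit_chars:,} 6-bit characters')
```

Since energy is power divided by instruction rate, the per-instruction figure does not depend on how fast the chip runs, which is the point made about energy versus power.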
So the instructions can be in slots 0, 1 and 2, or any combination of those. If I say 416 for micronext, I'm executing an empty loop with only the micronext instruction; I'm going around 416 times, which is a microsecond. Another way of doing that would be to put a no-op in there. Now I have a two-instruction loop: I execute the no-op, then I execute the micronext, and repeat, and I get a microsecond again. The only advantage of the second one is that I can count longer than 256K; I can count up to a million instruction delays. So those are basically delay loops, and there are two problems with delay loops. One problem: they take energy. The computer is running, and micronext doesn't take as much energy as some of the other instructions, but you still want to keep that in mind. Of those two loops, the empty loop and the no-op loop, the first uses less energy, because micronext is executed repeatedly. In the second case you're alternating between two instructions, no-op and micronext, so the instruction-decode transistors are toggling. In the first case nothing is changing in the computer except the count, so you're much better off with the first construct than the second if you're interested in energy, and having become interested in energy, that's the way I do it. I'll show you some more. @, the at sign, is Forth's symbol for the fetch operator, and !, the exclamation point or bang, is its symbol for store. Fetch means read; store means write. Those are particular symbols for historic reasons, and also because these words are used so frequently that they deserve a small word. Now, the communication between neighbors has a handshake involved. If I'm reading, as all the computers start out doing, I raise my read line, and my neighbors can look at that if they want. But everyone is reading, so there's one massive collision of 144 times four ports. No problem: somebody will break that symmetry by writing. When that happens, when both the read and write lines on a communication
port are high, both computers take off and run. The data is transferred from my stack to his stack when both lines are high; there's no logic, no latches, no delays. It's sort of a blocking read: if I want to read, I just hang; I just wait until I've got something to read. It takes no energy while I'm waiting; the computer is idle, and no transistors are toggling. Communication is key. When you design an application, you've got to do two things: write code for each of the computers involved, and then manage their communication. It's fun; there are all kinds of nice tricks you can play to make things work properly. Here's another example of code. This is put into block 200; the yellow line at the bottom describes how it is put into 200. This code, which is a message if you will, will be sent eight nodes left and three nodes down across what I call my ether interface, code that pre-exists in all the computers and passes messages back and forth. This is a clock. Actually it's more than a clock: it's driving a ceramic oscillator, which is like a crystal except it's easier to drive. The word drive is going to drive the crystal. The crystal is attached between a pin, which this computer has access to, and ground. I hit it with a positive voltage and the pin will rise; I hit it with a negative voltage and the pin will drop. I do that 10 times, and the crystal sees those drive signals and starts oscillating, so I can start the crystal running; I'll show you how I do that in a minute. Then, having started it, I don't have to delay any longer. The drive signal was a micronext loop, and it was preprogrammed to have the right frequency so that the crystal would start to oscillate. But drive is running the computer, and it's using energy. Actually, you use energy when you put an output on the pin, and that energy probably dominates everything else. But anyway, we go to pump. I no longer have to
have a delay loop. I can read the pin, and this is a blocking read: when the pin transitions from low to high, I wake up and do something. One of the things I do is change the polarity of the edge I'm interested in, so next time I wake up when the pin goes from high to low; otherwise I'm asleep. So it takes no energy to keep this crystal running: I just wake up, kick it, and go back to sleep. The word ring avoids the kicking: I can pump it, then let it ring a few times while the amplitude gradually decreases, and then pump it again. It's a strategy for driving a crystal using minimum energy, and it works beautifully. Now, in the case of initializing, when I load this code I'm initializing the state of the computer. In this case I have all these dups and overs: I'm filling the data stack with numbers and taking advantage of the fact that it's circular. So in drive, when I do a store B, I'm storing something into the address in register B, and that address is the address of the pin. The thing I am storing is already on the stack and will remain on the stack indefinitely, so I don't have to fetch a literal, which makes the code smaller. Actually it doesn't matter, because the code is only 16 words long, but still, you like to do it nicely, and you like to explore the limits of how nice is possible. So I put these numbers on the stack in exactly the right way, and then I can reference them in the code without any cost. This is a good crystal driver for several reasons. One, it only uses one pin, and pins are a limited resource; we have 88 pins on this chip. It is not as efficient as one of these little surface-mount crystal oscillators. We're a very low-power computer, but we're not as low-power as a custom ASIC. We're better than an FPGA, much better than an FPGA, but if you design transistors unique to a particular application, we can't compete. We've got versatility, we can program all kinds of things, but for a dedicated application you're better off with custom
silicon. All right, optimum programming. This isn't anything you aren't already familiar with. The absolute key, the elephant in the room: minimize the number of instructions you execute. That's almost synonymous with minimizing the size of your source code. Actually, most of the instructions you execute are going to be in a loop, so you minimize the size of your loop. You can do that by factoring your application very carefully, by unrolling loops, by doing all the tricks to minimize the number of instructions you execute. Now, in the case of these computers you can only have 256 instructions, so you have to minimize the total number of instructions as well as the number that are actually inside the loops. You want to use all the slots in the word. The slots you don't use are going to be at the right-hand end of the word; they're going to contain no-ops, but they are going to be executed, they're going to take energy and they're going to take time, so you want to avoid that. It is hard and perhaps impossible to use all your slots; that's just the cost of doing business. Also, when you're trying to minimize energy, you want to recognize that there's a floor on that. You can reduce the amount of energy you use, but eventually you're going to hit the point where there's a certain amount of work to be done, a certain number of transistors to switch, and you're just going to have to swallow that cost. You can't reduce it; accept it. As the last line says, all you can do is minimize waste. Use all the slots. Fetches and stores are best put early in the word, in slot 0 or 1, instead of late in the word, because as soon as you stop using the address bus, instruction prefetch can take place, fetching the next instruction word so it's ready for you when you finish the current instruction word. So there's a left bias to those I/O operations. Then it matters where you put the code in memory. A jump in slot 2 only has three address bits,
which means it can jump within a little eight-word page. You want to position your jumps where they matter, so that you can use a slot to jump, and that just means moving things around in memory. That means you need to be aware of where your jumps are and what slot they're in, and the compiler tells you that. But the best thing you can do, and actually the thing that is the most fun, is to come up with a better algorithm. Whatever you think you're doing, think about it and say: do I really have to do it this way? Is there a different way which might be better, better at matching the constraints that the little computer puts upon you? That's to make it fast; to make it compact and fit in 256 instructions, the same rules apply, really. You want to minimize the number of instructions, and you want to avoid literals. A literal is a fetch from the address in the P register, which is pointing to the next word in the instruction stream, and you can store a literal there. When you fetch it, the P register is incremented over the literal, and you've just picked up something inline. When you put a number in your source code, that's what happens. That's certainly cheaper than making an explicit reference to RAM, but it's not as cheap as having something already on the stack, which is why I preload the stack with up to eight numbers that I'm going to cycle through. dup or gives you a zero: duplicate the thing on top of the stack, exclusive-or it with itself, and you've got a zero. It's a lot cheaper than fetching a literal zero, which would require one slot for the fetch instruction and four slots for the literal itself. But this only works with zero; it works a little bit with minus one, but most literals you can't construct, you have to actually pick them up. dup or destroys the top of the stack; if you care about the top of the stack, then you have to have dup dup or, but it's still cheaper. Micronext is better than next if you can use it. Micronext is often used with shift instructions. If you want to shift left 10 places, you can set up a for micronext loop; it'll count down and do a shift each time. This is slower than actually unrolling that loop and putting ten 2*s in line, but until you get up to eight or so shifts it's cheaper to put them in line. Initialize from the port. But again, the better algorithm is the better way of making your code small. Hey, timing is working out nicely. Here's another page of code; this is what I call a block, and I found that this is the amount of source code that you can fit in 256 instructions. This is an impressive little piece of code. It goes into node 8, and it's accessing SRAM. You've got these words, read, write and read-modify-write, which let you address off-chip RAM either randomly or sequentially. Off-chip RAM is absolutely essential; it's the only way we can get a decent amount of memory. We've got a million words of RAM on this particular RAM chip, and we can address it in 50 nanoseconds. There are three nodes involved in reading RAM: one node has the address bus, one node has the data bus, and one node has the control lines. It's worth talking about, but I don't have time now. For low-energy programs, the first thing you have to do is be able to measure the energy. You do that with a microammeter attached to the input trace to your chip. Such meters cost a couple of thousand dollars, but they're absolutely worth the investment, and they let you measure the things which I'm trying to optimize. What you want is a low duty cycle: you want your computers to be sleeping. When they're sleeping they're not using any energy; if they're spinning, they're draining energy. A 32-kilohertz crystal will give you real clock timing without using any energy, because you're only waking up every 15 or 30 microseconds, whatever it is, to do something. If you zero your stacks, you avoid stack thrashing. Whenever you read
something and put it on the stack, you're pushing the stack. If the stack contains all zeros, you're pushing a zero into what was previously a zero, and it's not taking very much energy. If your stack is random, you're going to be using measurably more energy than if it's empty. Same with the return stack. And there's another thing, which we only recently discovered: it matters where you put your loops. If your loop has a lot of address bits changing, it's going to cost you more energy than if it doesn't. So if you're jumping within a page, you're better off than if you're jumping way back, where more address bits change. And so it goes. They're fun to program, and they're challenging to program. You have to factor in Forth, and on this chip you have to factor into these tiny computers. I didn't talk about wires, but I showed you the code for wires: you fetch from one port, write to another port, and you've programmed a node to be nothing but a wire; it just passes messages through itself. You've got to do that. You are free to put your active computers anywhere you want to put them, but they must connect to your I/O pins, because if you aren't reading input and writing output, you aren't doing anything. And it's a layout problem: you've got to lay out the code in a way that minimizes the number of wire nodes you have to pass messages through. And that's what I have to say. Thank you.\r\n\r\n\r\nhttps://youtube.com/watch?v=0PclgBd6_Zs&si=Mxn_OEflFc12GeF7",
  "media_type": "text/markdown",
  "filename": "|",
  "author": "14aqJ2hMtENYJVCJaekcrqi12fiZJzoWGK",
  "display_name": "MarkKordusic",
  "channel": null,
  "parent_txid": "4dbf3da439f7920b0c17db5df1fcd65a4c2736b8c14f8ef2ed892128d58cf5f6",
  "ref_txid": null,
  "tags": null,
  "reply_count": 0,
  "like_count": 0,
  "timestamp": "2026-03-17T08:11:59.000Z",
  "media_url": null,
  "aip_verified": true,
  "thread_root_tx": null,
  "engagement_score": 0,
  "token_ref": null,
  "token_type": null,
  "kind": null,
  "lat": null,
  "lng": null,
  "category": null,
  "has_access": true,
  "attachments": [],
  "ui_name": "MarkKordusic",
  "ui_display_name": "MarkKordusic",
  "ui_handle": "MarkKordusic",
  "ui_display_raw": "MarkKordusic",
  "ui_signer": "14aqJ2hMtENYJVCJaekcrqi12fiZJzoWGK",
  "ref_ui_name": "unknown",
  "ref_ui_signer": "unknown"
}
Signed by 14aqJ2hMtENYJVCJaekcrqi12fiZJzoWGK (AIP)