Why Japan's Moon Lander Crashed Due to An Unbelievable Computer Bug

903,187 views

1 year ago

The investigation into why the Hakuto-R lander crashed into the moon last month after an otherwise perfect mission has revealed the answer: The software encountered unexpected terrain and didn't believe what the sensors were showing, so it started ignoring them.
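A minimal sketch of the failure mode described above, under loudly hypothetical assumptions: the onboard estimate is dead-reckoned from a known descent per step, a radar reading that disagrees with it by more than a fixed threshold is treated as a fault, and that fault is latched so the radar is ignored from then on. The function name `descend`, the 1 km threshold and the altitude numbers are all illustrative, not taken from the ispace report.

```python
# Hypothetical sketch: fixed-threshold plausibility gate with a latched fault.
# Not ispace's flight software; all numbers are illustrative.

def descend(radar_readings, descent_per_step, threshold=1000.0):
    """Return (altitude estimates after each reading, radar still trusted?)."""
    estimate = radar_readings[0]
    radar_ok = True
    history = []
    for measured in radar_readings[1:]:
        estimate -= descent_per_step  # dead-reckoned prediction from the IMU
        if radar_ok and abs(measured - estimate) > threshold:
            radar_ok = False  # big disagreement: radar declared faulty, for good
        if radar_ok:
            estimate = measured  # radar looks sane, so snap to it
        history.append(estimate)
    return history, radar_ok


# True height above the local surface, 500 m of descent per step; the ground
# drops away by 3 km at a crater rim between the 3rd and 4th reading.
readings = [5000, 4500, 4000, 6500, 6000, 5500, 5000, 4500, 4000, 3500]
estimates, trusted = descend(readings, 500.0)
print(trusted)        # False
print(estimates[-1])  # 500.0, while the true height is still 3500 m
```

In this toy run the genuine terrain step trips the gate, and the estimate keeps counting down toward zero while the lander is still kilometres above the surface.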
iSpace Report
ispace-inc.com...
LROC images of the crash site:
lroc.sese.asu.e...
Follow me on Twitter for more updates:
/ djsnm
I have a discord server where I regularly turn up:
/ discord
If you really like what I do you can support me directly through Patreon
/ scottmanley

Comments: 3,100

@mcarpenter2917 1 year ago

That's what happens when you keep changing the software specs of a project. It's a bit hard to believe that they changed the landing site without rerunning the simulations.

@MarlinMay 1 year ago

This! This all day.

@pjotrtje0NL 1 year ago

Your first remark is very true, and not just in an aerospace environment!

@Powertampa 1 year ago

That's like releasing software without running unit tests right after the remote guy pushed ten thousand lines of code.

@ailivac 1 year ago

I feel like something in the sensor processing design isn't fundamentally robust enough if it can be this easily confused by real terrain features. Maybe they can add a second radar or lidar sensor for dissimilar redundancy or to differentiate unexpected yet real inputs from sensor faults. We all know what happens when you run a safety-critical algorithm on a single AoA sensor...
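One way to read the suggestion above, as a sketch only: with a second, dissimilar altimeter, a sudden jump seen by both sensors is treated as real terrain, while a jump seen by only one points at that sensor. `classify_jump`, the lidar channel and the 1 km threshold are hypothetical.

```python
# Hypothetical sketch of dissimilar redundancy: a radar and a lidar altimeter
# are cross-checked against the onboard estimate and against each other.

def classify_jump(estimate, radar, lidar, threshold=1000.0):
    """Label a disagreement with the estimate as terrain or a sensor fault."""
    radar_jumped = abs(radar - estimate) > threshold
    lidar_jumped = abs(lidar - estimate) > threshold
    if not radar_jumped and not lidar_jumped:
        return "nominal"
    if radar_jumped and lidar_jumped:
        # Both independent sensors moved; if they also agree with each other,
        # the estimate is the thing that is wrong, not the sensors.
        return "terrain" if abs(radar - lidar) <= threshold else "ambiguous"
    return "radar_fault" if radar_jumped else "lidar_fault"


print(classify_jump(3500.0, 6500.0, 6480.0))  # terrain
print(classify_jump(3500.0, 6500.0, 3520.0))  # radar_fault
```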

@Ni999 1 year ago

Exactly! Mission creep eats into the project timeline, and system tests degrade into delta testing for success instead of system testing for non-failure.

@kjgoebel7098 1 year ago

I'd like to see an episode of "Things KSP Doesn't Teach" about instrumentation. How air/spacecraft instruments work, their limitations and quirks, and how they can fail.

@JohnWilliamNowak 1 year ago

I'll second that. The Soviets had a number of uncrewed vehicle losses because they used ion sensors to determine the orientation of the vehicle, which would fail on occasion. On the other hand, the gyroscopes aboard Apollo 13 held true despite being pushed well outside their comfort zone. Some sort of video about orientation sensors would be very enlightening.

@BabyMakR 1 year ago

Yes please!!! We need more of those videos please Scott.

@ferdievanschalkwyk1669 1 year ago

Another vote. I see it in Formula racing, where drivers sometimes have to "fail" various sensors to work around powertrain issues.

@eekee6034 1 year ago

Yep, me too. I think I'm aware of the issues already, but I'd like to know how different real sensors would be.

@Spacedog49 1 year ago

@@ferdievanschalkwyk1669 As a former Formula 500 driver, the fastest lap times were NOT the shortest distance lines around the track. A computer simulation takes the shortest distance, while the faster drivers took a slightly longer, but faster path that defied logic.

@miroslavhoudek7085 1 year ago

In my personal experience, people don't care enough about aerospace software. I worked at a software company that did work for ESA, and we were always pretty much ignored (e.g. in all the presentations of our local space agency). But when some other company made a screw for a satellite, it was plastered all over their presentations. There were literal delegations going to look at the space-screw-producing machine. Such an interesting visit, you see, to a hall with machining equipment and clean rooms: that's the "space stuff" in people's minds. Something you can touch and see. How do you brag about a company of people sitting at their PCs? Nobody cares. Even if these are the people whose work ultimately decides whether those magical screws end up doing something or get splattered over the Moon. I don't care about the publicity, but it's the mindset. Everyone focuses on the aluminum this and titanium that, and software is always the afterthought. We can change that anytime. We can even send an update to space... so why should we think about it too hard? Bam!

@deang5622 1 year ago

Good point. And I think it is because it takes a higher level of intelligence and technical knowledge to understand software systems, and the media and others can't understand it. You only have to look at any news article published by the mainstream media, on television or in newspapers, and you will see the errors journalists make, the incorrect use of terminology, the lack of detail, and you walk away realising the article has told you almost nothing.

@jayasuriyas2604 1 year ago

oof

@windywaz 1 year ago

Boom! As a retired architect for space sensor payloads, I can say you are spot on. I watched management spend all sorts of money on convenience tooling but if SW wanted licenses for software production and testing tools, oh God, you got run through the gauntlet. So how many times must a company learn these lessons? Simple, once per program.

@Mernom 1 year ago

It's the same attitude all over the place. Games no longer ship as completed projects... 'we can just patch it later'. Many other fields also do shit like this.

@B_dev 1 year ago

Software in general too

@rhymereason3449 1 year ago

It fascinates me, looking at the history of disasters, how many of them are ultimately caused by cutting corners to meet time pressures or budget targets. In this case you have to wonder (A) why the target zone was changed late in the game, and (B) why simulations with the new target zone weren't run. I would bet a dollar that engineers thought of it, but were overruled because of time pressure or a budget target.

@aarondavis8943 1 year ago

Your question (A) is a great one. It could be that the new landing site could be reached with less expenditure of propellant or something like that. They thought it was a lower margin of error. Or was it the opposite? Was there a "better" more ambitious site with more interesting geography?

@rhymereason3449 1 year ago

@@aarondavis8943 It is interesting to speculate. IMHO "less expenditure of propellant" would fit the theory about disasters and cutting corners to meet budget targets. On the "better geography" thought... unless an asteroid suddenly impacted an area close to their original site, one would think the geography question would have been settled long ago... the lunar surface is pretty well documented (at least the near side).

@pierQRzt180 1 year ago

Proverbs have a sort of statistical truth. "Haste makes waste" exists precisely because of that. The sad part is that we seem to keep making the same mistakes.

@rhymereason3449 1 year ago

@@pierQRzt180 Yes, it is sad to think about all the people who have lost their lives because someone decided to save a few bucks by cutting corners. One of the latest examples appears to be the partial collapse of the apartment building in Davenport. It looks like the owner went with a cheaper contractor who would forgo shoring up the building before proceeding.

@Beregorn88 1 year ago

And (C) why there weren't redundant systems with a majority check before deciding to discard the most vital part of your data...

@maurice_walker 1 year ago

In their official debriefing, ispace actually admitted that it's primarily a (project / program) management issue, not an engineering issue. That gives me hope that they might actually learn something from this.

@rspawn 1 year ago

most underrated comment

@curtislowe4577 1 year ago

Life imitates art: a common problem in the Dilbert comic results in utter failure.

@philkarn1761 1 year ago

It's almost *ALWAYS* a project/program management issue, not an engineering issue. This was also true for Mars Polar Lander and for Mars Climate Orbiter (the one that famously mixed up imperial and metric units).

@tomhenry897 1 year ago

Don’t bet on it

@SayAhh 1 year ago

@@Josh_728 Get with the program: in 2023, we measure things in bananas

@ReverendTed 1 year ago

It continues to amaze me that we managed to safely land astronauts on the moon AND have them take off from the lunar surface and return home, several times. Obviously, having actual humans present makes a ton of difference, but the number of things that could have gone wrong but didn't is mind-boggling.

@MarlinMay 1 year ago

The brain is a wonderful flight computer. Lander: I'm going to land here. Human: Dummy, there's a rock the size of a McMansion there! Gimme manual control.

@a4d9 1 year ago

The first moon landing was saved by the astronauts: the automation on the lander was going to put it down in a field of big boulders.

@unflexian 1 year ago

Think about it like this: humans have managed to control powered airplanes since the start of the 20th century, while autonomous aircraft have only appeared in the last decade or two. Humans are just that versatile.

@raifikarj6698 1 year ago

@@MarlinMay I'm howling picturing an astronaut slapping the computer and calling it dumb.

@technocracy90 1 year ago

One of the NASA research reports justified the cost and risk of sending human astronauts to the Moon with the observation: "The human brain is the most lightweight and easy-to-acquire real-time non-linear computer."

@robertbarron7660 1 year ago

It's very interesting that this is almost the exact reverse of the famous 1201 alarm on Apollo 11. In that case the computer restarted and raised alarms on the astronauts' control panel. But because they knew they were at the right altitude per the flight plan, they were confident they were still flying correctly, and Neil Armstrong brought the lander down safely.

@warrenpierce5542 1 year ago

The source of the 1202 and 1201 alarms was traced to the rendezvous radar, used for rejoining the command/service module, which was inadvertently left on while the landing radar, the only one needed for the descent phase, was running. This overloaded the lunar module computer, but mission control knew it was still safe to land because of one man in Houston.

@robertbarron7660 1 year ago

@@warrenpierce5542 yes, when you go into the details then these are different cases. But in the abstract, in both cases the computer was confused because it got signals which were unexpected and didn't handle them well. In Apollo's case, the human was able to use additional information to recognize that the problem wasn't severe and in this case - there was no human.

@larrybud 1 year ago

@@warrenpierce5542 In Mike Collins' excellent book, he mentioned that the 1201 and 1202 weren't exactly "well-known" issues. It took a bit of "looking up" (albeit quickly).

@richardmogie9675 1 year ago

That second antenna wasn't inadvertently left on. In an interview, I saw Buzz sheepishly confess that the engineers didn't think the same way he did.

@purnachandran87 1 year ago

Just realized that manned missions are technologically easier (thanks to the pilot's skill) than unmanned soft landings, which are only possible now thanks to progress in software systems.

@subhakantagmail 11 months ago

Finally the software bug is fixed, and ISRO's Vikram lander from Chandrayaan-3 has landed safely on the lunar surface. I hope most space agencies share data among themselves so that space progress accelerates, instead of each one reinventing the wheel. Knowledge for Humanity...👍

@henrikibjensen3869 11 months ago

Sorry, Humanity doesn't land on the Moon, nations do - or don't.

@martinmacphee3262 1 year ago

Scott - great video as usual - thank you! But really this is not a software 'bug', is it? It's a systemic design and control failure. The software was designed to work as it did, but the specifications do not seem to have included passing over a crater like this. In other words, the initial flight plan was intended to avoid this situation, and the software was designed to work within that flight plan. The first error was changing the flight plan without checking whether the software could still function with the new one. The second error was not testing the software under the revised conditions it would have to work in. Both errors are symptomatic of inadequate control over change management. In other words, the flaw did not lie in the programming, but in the organization's approach to change management.

@anotheruser676 1 year ago

...and perhaps a Third error of the program disregarding the radar altimeter instead of querying it again. 'Say what? That result is outside of parameters. Please take your reading again'
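A sketch of that "take your reading again" idea, assuming (hypothetically) that the software waits for several consecutive outliers and re-baselines on the radar if those outliers agree with each other, instead of condemning the sensor on the first surprise. The function name, `confirm=3` and the numbers are illustrative, not from the mission report.

```python
# Hypothetical sketch: require several self-consistent outliers before deciding
# whether the radar is wrong or the onboard estimate is. Not ispace's logic.

def descend_with_confirmation(readings, descent_per_step,
                              threshold=1000.0, confirm=3):
    estimate = readings[0]
    outliers = []  # recent readings that disagreed with the estimate
    history = []
    for measured in readings[1:]:
        estimate -= descent_per_step
        # Shift earlier outliers by the descent so they stay comparable.
        outliers = [o - descent_per_step for o in outliers]
        if abs(measured - estimate) <= threshold:
            estimate = measured  # agreement: use the radar, forget outliers
            outliers = []
        else:
            outliers.append(measured)
            # Several outliers that agree with each other look like real
            # terrain, so re-baseline on the radar instead of ignoring it.
            if len(outliers) >= confirm and max(outliers) - min(outliers) <= threshold:
                estimate = measured
                outliers = []
        history.append(estimate)
    return history


# A 3 km step in height above ground, as when crossing a crater rim.
print(descend_with_confirmation([5000, 4500, 4000, 6500, 6000, 5500, 5000], 500.0))
```

A single spike is still discarded, because the next reading agrees with the estimate again and clears the pending outliers.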

@LezamaDamian 1 year ago

I agree this probably shouldn't be called a bug. Requirements were not properly validated, so it's a failure in their systems engineering process.

@nosuchanimal6947 1 year ago

Came here to say that! Also, even if the result later came back within parameters, the device has already been shown to be unreliable. It might be an intermittent error, or it might be a bias that was only noticed on this occasion but existed all along. Revalidating system reliability would be a tough nut to crack on its own without a redundant second and third system, though it should have notified ground control and received an update/patch. To my understanding, that is generally how system failures are resolved. I don't know if their mission profile imposed an artificial time delay on that to prepare for longer-range versions, or what happened.

@TheSheepwall 1 year ago

Haven't read the report, so I might be wrong, but if they use something like a Kalman filter, it is likely that they are not simply ignoring the sensor; rather, the variance assigned to its readings spiked. In that case the sensor would still be queried, but _effectively_ disregarded, since its effect on the output would be so small (due to the change in the assigned variance). Someone can correct me if I am wrong here.
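A minimal 1-D illustration of the mechanism described above, assuming a scalar Kalman filter on altitude with a known descent per step and a simple innovation gate that inflates the measurement variance instead of dropping the reading. All parameter values and the gating rule are assumptions, not the lander's actual filter.

```python
def kalman_step(x, p, z, descent, q=25.0, r=100.0, gate=3.0, inflate=1e6):
    """One predict/update cycle of a scalar Kalman filter on altitude.

    x, p: prior altitude estimate (m) and its variance (m^2)
    z: radar altitude measurement (m); descent: known drop this step (m)
    q, r: process and nominal measurement noise variances (m^2)
    """
    # Predict: move down by the known descent; uncertainty grows by q.
    x = x - descent
    p = p + q
    innovation = z - x
    s = p + r  # innovation variance under the nominal sensor model
    # Gate: a reading more than `gate` sigma away gets its variance inflated,
    # so it is still "queried" but carries almost no weight.
    r_eff = r * inflate if innovation ** 2 > gate ** 2 * s else r
    k = p / (p + r_eff)  # Kalman gain
    x = x + k * innovation
    p = (1.0 - k) * p
    return x, p, k


print(kalman_step(4000.0, 100.0, 3510.0, 500.0))  # gain ~0.56: reading is used
print(kalman_step(4000.0, 100.0, 6500.0, 500.0))  # gain ~1e-6: effectively ignored
```

Inflating the variance rather than skipping the update keeps the filter equations unchanged; the gain simply collapses toward zero for the implausible reading.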

@sciencecompliance235 1 year ago

There's also the design of the spacecraft that has to be called into question, specifically the AD&C architecture. Relying on a single altimeter means that you can't verify the data with a redundant sensor. Since accelerometers and gyroscopes can't really capture things like topography from orbit, it's like flying with one eye. I don't know how much mass, power, and space another altimeter would have taken up, but perhaps a redundant altitude sensor, possibly one with a lower resolution and/or sample rate, could have been used to verify the data coming from the primary one.

@_Mentat 1 year ago

My experience of being a software engineer is that the code has to be tested every time. It's amazing how often things that can't go wrong do go wrong.

@hanskloss7726 1 year ago

It isn't the software that changed but the flight parameters. You may of course argue that the software was made for that particular landing zone, which I do not buy. I may be mistaken, as the video is my only source of knowledge about the situation (sort of like this radar was). So you take a peek at the surface with radar and see this crater; or rather, a human with a visual would have seen the crater, while the landing module saw just a point on the surface that gave a reading 3 km higher than the previous point it peeked at. I suspect what they needed was for the radar to measure more points, especially from a distance, and average them, or to use some other technique to work out where they are. Much lower down, the same would also be needed to check that no big stone occupies part of the landing zone. I suppose this last part was ruled out by the assumption that mission control would choose a flat, empty landing surface. I suspect that if they were landing on a water or liquid surface, this radar error could only occur due to a massive tsunami. Well, no water surface and no tsunamis, but a hard landing. Interesting to know all this though, isn't it?

@simonmultiverse6349 1 year ago

Been there! Written lots of software... made some unbelievable bone-headed mistakes, which are all *BLINDINGLY OBVIOUS* in retrospect. "This change is SOOOOOOOOOO OBVIOUS that we don't need to test it" ... ha ha ha... this is when reality bites you on the backside, informing you that you definitely *DO* need to test it again.

@simonmultiverse6349 1 year ago

@@hanskloss7726 HA! Then you discover it's high tide instead of low tide... maybe you simulated it with mean sea level but a mile away someone opened the sluice gates and there was a large wave from the reservoir... etc.

@roguedrones 1 year ago

This moon lander crash is an example of space sabotage. Deliberate.

@hanskloss7726 1 year ago

@@simonmultiverse6349 low tide v. high tide does not cut it here - the surface is mostly flat still at least from a 5km perspective. The crater is a different story so you need to have many points possibly also a map? Not sure what is easier here but their method obviously failed. We know this is not a shame - we all have been there....

@sharizabel2582 1 year ago

I flew fighters for over 20 years. The Kalman filter was the bane of the navigation and bombing solution. It would actually discount most of the updates I would insert. It thought it knew more than I did … it didn't.

@peterweston1356 1 year ago

Makes the Apollo landings even more amazing, considering the precision of the sensors and the computational resources available back then, both to simulate and to support the landing.

@Nioub 1 year ago

There was a similar bug in the LEM: if the module had flown over a circular crater of a certain size, the radar altimeter would have shut off all propulsion, probably leading to a crash. Fortunately the bug was never triggered (mainly because the onboard crew had taken over manual control at that point), and it was found decades after the landings.

@alamrasyidi4097 1 year ago

Why aren't manned lunar missions done anymore these days?

@jessepollard7132 1 year ago

@@alamrasyidi4097 Congress dropped funding, so NASA had no money for going to the moon (canceled the last planned 4 trips).

@vast634 1 year ago

@@alamrasyidi4097 No Soviets to beat

@dr.cheeze5382 1 year ago

@@alamrasyidi4097 Isn't NASA planning to go back? Starting with an (unmanned?) mission sometime after 2024?

@alamrasyidi4097 1 year ago

@@dr.cheeze5382 So I've heard. But compared to the alternative of losing these spacecraft to software errors, I think "no Soviets to beat" is a ridiculous excuse. So I still really don't understand why lunar exploration has been strictly rover-based these past few years...

@johnbuchman4854 1 year ago

This is why you also have timers for expected milestones (earliest and latest time a milestone can be validly sensed). My background is that I worked on the Attitude and Articulation Flight Software for the Galileo and Cassini spacecraft when I worked at JPL. For a very simple and solid method they could have used what the Surveyor landers did.
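A minimal sketch of the milestone-window idea, assuming each event has an earliest and a latest mission time at which it can validly be sensed. The `Milestone` class, `check_milestone` and the touchdown window are hypothetical illustrations, not JPL or Surveyor code; the caller would check once per cycle until the result is "accept" or "overdue".

```python
from dataclasses import dataclass


@dataclass
class Milestone:
    name: str
    earliest_s: float  # earliest mission time the event can validly be sensed
    latest_s: float    # after this, a missing event is itself a fault


def check_milestone(m: Milestone, t: float, sensed: bool) -> str:
    """Classify one observation of a milestone at mission time t (seconds)."""
    if t > m.latest_s:
        return "overdue"   # window closed without an accepted event
    if not sensed:
        return "waiting"
    if t < m.earliest_s:
        return "ignore"    # too early to be real: treat the signal as spurious
    return "accept"


touchdown = Milestone("touchdown", earliest_s=600.0, latest_s=700.0)
print(check_milestone(touchdown, 300.0, True))   # ignore
print(check_milestone(touchdown, 710.0, False))  # overdue
```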

@danrbarlow 1 year ago

Thanks for your awesome contribution to space science!

@nocturnal6863 1 year ago

I'm sure mission control had a plot of the expected altitude changes; the lander may have had one as well. The problem is that the expected rate of change of the altitude was outside what had been set as acceptable for the altitude radar. It was probably written in the specs somewhere. Proper simulation of the landing would have caught this; it could possibly even have been dealt with after launch. It's changing the landing site without simulating it that screwed them.

@nocturnal6863 1 year ago

What did Galileo and Cassini use for altitude readings? and would they have been equally screwed if forced to switch over to gyro / accelerometer readings with an apparent failed altitude radar?

@u1zha 1 year ago

@@nocturnal6863 John's point was that "forced" switch is averted, if the switch algorithm is completely disabled at such an early phase of flight. Reread about "earliest time.. a milestone can be validly sensed".

@nocturnal6863 1 year ago

@@u1zha Except you wouldn't disable the software monitoring a sensor for failure, not unless you knew in advance it might give faulty readings at that point. Thinking further, I think I see what you are suggesting: that it should have been expecting the dip in altitude, and its failure to see it means it should have known its altitude was off.

@ezequielblanco8659 1 year ago

Being a software developer, I have seen this happen countless times at multiple companies. Software is often overlooked. Testing is usually considered redundant and a waste of time and money. Developers' warnings and requests are normally disregarded or displaced by other departments' concerns, which are non-technical and even non-functional.

@old_guard2431 1 year ago

In my experience the software developers/engineers are kept out of the decision-making inner circle. Actually, this goes for engineering/tech in general. It’s fine, just change this, this and that: what’s the worst that can happen? (Changing the Moon’s landscape to more closely resemble a seedy neighborhood in Brooklyn, one spacecraft at a time.)

@harshu2651 11 months ago

Even after full testing, I still fear my code will break in some case we haven't looked at 😂. It's scary for a space mission.

@henrymalone422 1 year ago

Been watching you since 2015! You have helped keep me interested in spaceflight! Thank you for doing what you do, Mr. Manley.

@ksbs2036 1 year ago

About 30 years ago I had a single page, photocopied from Computerworld or some such industry publication, taped to the outside of my cubicle. It listed the ten most expensive software defects (bugs). I was astounded that the most expensive defects caused hundreds of millions of dollars of losses. When you read the top five defects on the list (again, multi-million losses), you found out they were all losses of spacecraft and/or their payloads. Flight software is tremendously complex, and a single error will cost you your whole vehicle and years of effort. I expect that page would now have to be scaled to losses of nearly billions.

@a.p.2356 1 year ago

Maybe not most expensive, but Therac-25 should be on that list somewhere. Ya know, because it ended up maiming and killing a bunch of people with intense doses of radiation.

@RoryMacdonald-pfff 1 year ago

There you go Scott - that’s an epic video right there. Top 10 most expensive Astro/Software defects.

@o0alessandro0o 1 year ago

@@a.p.2356 In a way, that is possibly the most expensive software bug ever; in another, it's quite cheap. Consider: we know for a fact that cars kill people, all the time, in every way, yet we do not ban cars. The value of a human life has been calculated, and apparently it's cheaper than you would expect. Electricity production has a cost measured in lives per TWh. You can look it up. Biofuel costs 12 lives per TWh, solar 0.44, wind 0.15, and nuclear 0.07. The average American consumes 0.1-0.2 GWh per year. In other words, over the course of your entire life you will likely kill less than one fiftieth of a person to keep the lights on. This does stack with the people you kill while driving, however; I'm talking about tyre particulates and excess deaths from pollution, not running somebody over. Ain't that grand?

@travelbugse2829 1 year ago

@@o0alessandro0o It's not easy to respond to that kind of information. I do know that training and regular checking of pilots contributes to a high level safety for commercial aviation (ignoring mechanical failures). For drivers, I reckon that similar processes should be followed. It would not be popular among the general public, but I have said for years that licenses should be graded, based on years of experience and how many training courses a driver takes. Governments balk at the idea, however, and go on putting up cameras and roadside radars, more draconian speed limits, but never addressing the fact that poor situational awareness, slow and inappropriate reactions, and limited skills are the biggest factors in car accident rates. But I'm going down a rabbit hole!

@malbacato91 1 year ago

Not strictly a bug, rather bad design: implicit nullability, first introduced in ALGOL W in 1965 and later copied into most programming languages, was famously called a "billion-dollar mistake" by its creator. I think I read somewhere that at the time the estimate was quite accurate, but that was 2009, so by now it wouldn't be surprising if it were an order of magnitude too low.
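For readers outside software, a small illustration of the alternative to implicit nullability: making "no value" part of the type, so the caller has to handle it. This sketch uses Python's `typing.Optional` (and `list[float]`, which needs Python 3.9+); Python does not enforce it at runtime, but a static checker such as mypy reports an error if `alt` is used as a float before the `None` check. The function names are hypothetical.

```python
from typing import Optional


def latest_altitude(readings: list[float]) -> Optional[float]:
    """Newest altimeter reading, or None if none has arrived yet."""
    return readings[-1] if readings else None


def altitude_or_fallback(readings: list[float], fallback: float) -> float:
    alt = latest_altitude(readings)
    # `alt` is Optional[float]: a type checker rejects arithmetic on it until
    # the None case has been handled explicitly, as it is here.
    if alt is None:
        return fallback
    return alt


print(altitude_or_fallback([], 42.0))                # 42.0
print(altitude_or_fallback([5000.0, 4500.0], 0.0))   # 4500.0
```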

@user-jz1su8bh5t 1 year ago

Another outstanding episode, Scott! Being a software safety engineer for the last 39 years, I have to agree with previous comments that point out this is not a software bug, but more of a people problem during design, testing, management, etc. I believe the first Ariane 5 launch was a similar issue, where the software worked perfectly per its specifications (from Ariane 4) and doomed the flight to failure. As in this case, proper testing would have prevented the expensive tragedy. I also wanted to give a shout-out to "How To Destroy Wayward Rockets - Flight Termination Systems Explained". My 39 years were all spent on Range Safety Software, with the last 13 years working on autonomous flight termination systems. That was another outstanding episode! Keep up the awesome work!

@xGOKOPx 1 year ago

It is a software bug though. People problem is that the bug wasn't caught

@vast634 1 year ago

Have you ever seen a flight termination system act not instantly but 50 seconds late, as with the Starship launch?

@user-jz1su8bh5t 1 year ago

@@vast634 It depends on the type of flight termination system (FTS). For solid rocket motors, they use a linear shaped charge that opens the casing and exposes the propellant, which burns up quickly in an impressive display. (I think Scott mentioned that in his previous video.) For liquid-fuelled rockets, things are different. You have more choices. The basic idea is to stop the vehicle's thrust so it falls into an unpopulated area, such as a broad ocean area in the case of SpaceX. Based on the video of the flight, the FTS worked properly and detonated explosive devices that created holes in the fuel tanks. That reduced or stopped the fuel flow to the engines. The FTS did its job. After that, it's all physics. If the propellants are hypergolic, they will combust on contact and you get a near-instant explosion. Otherwise, you need combustible fuel, oxygen, and an ignition source. Guessing, it took about 40 seconds before the three elements came together in the right quantities in the case of SpaceX. An FTS doesn't need to create an explosion: rather than connecting to explosives, the FTS can connect to fuel valves that cut off the fuel flow.

@user-jz1su8bh5t 1 year ago

@@xGOKOPx I understand your perspective. My point is that the bug should have been avoided during design or implementation, and if not, then detected during development testing. Find and correct all the bugs before deployment. Since their development testing failed to react properly to "unexpected terrain" (kind of a silly term considering the moon's terrain is pretty stable), the people failed in the software development cycle and left in a failure mode (i.e., the bug) so it could be exposed during execution. The software did what it was designed to do so it worked properly. The people failed to account for something. The same thing happens with hardware but folks don't usually blame the hardware. The failure of Galloping Gertie wasn't blamed on the bridge. The people who designed and built it were blamed for not accounting for potential wind loading.

@AllAmericanGuyExpert 1 year ago

My Dad helped design the Apollo lunar landing software... and curiously enough, it was never used due to a sensor overload... the famous 1202 alarm on the DSKY. When Neil Armstrong disabled my Dad's software for Apollo 11, that was the end of it. The LM landing program was always overridden by later LM pilots, and the LM was landed manually. The fault was in a completely unrelated system... I guess a lot of people wonder whether it would have done its job. My dad says it was pretty robust, and he never saw a simulation in which it would have failed if given the chance to run to completion. It's a good thing Armstrong was a good pilot! My dad would go on to be famous for mockups, and later he worked on the avionics of the world's most capable fighter jet. He's getting old, but still with us. I wish he were more of a storyteller... but the story he thought was the funniest (and most irrelevant) was meeting the president in the restroom at NASA... as in, _um, nice day, isn't it Lyndon?_ as they conducted their business. I am guessing it was during LBJ's visit to Houston in 1968, the same time frame that my dad was working there.

@PT-xi5rt 1 year ago

You still believe in this fable? Open your eyes

@AllAmericanGuyExpert 1 year ago

@@PT-xi5rt You didn't know that LBJ was president? Or that he used the bathroom like the rest of us?

@AMeierhoefer 1 year ago

Scott, I am surprised that you did not touch on redundancy. I was a fighter jet aviator, and one of the things we always did was use multiple sensors so the software could compare them and then estimate probabilities. If they had three radar altimeters, they could see the rate of change of the surface as the spacecraft travels. Even if each one showed the cliff, a probability calculation would have told it that it is virtually impossible for all three to suddenly go bad at once. Redundancy would be one answer in my book.
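As a sketch only: with three altimeters, a common and simpler alternative to an explicit probability calculation is mid-value selection, where the median reading is used and any channel far from it is flagged. `vote` and the 200 m tolerance are hypothetical.

```python
import statistics


def vote(readings, tolerance=200.0):
    """Mid-value select across redundant altimeters.

    Returns the median reading and the indices of channels that disagree
    with it by more than `tolerance` metres.
    """
    mid = statistics.median(readings)
    suspects = [i for i, r in enumerate(readings) if abs(r - mid) > tolerance]
    return mid, suspects


print(vote([6500.0, 6480.0, 6510.0]))  # (6500.0, []): all three saw the cliff
print(vote([6500.0, 3500.0, 3510.0]))  # (3510.0, [0]): channel 0 is the odd one out
```

Because all three channels jumping together is accepted as the new median, a real cliff is followed rather than rejected, while a single wild channel is outvoted.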

@thierrybriand2413 1 year ago

Agree and also on my part, I always thought that radar altimeters were used « closer » to the surface.

@drill_fiend1097 1 year ago

Probably budget constrained.

@AMeierhoefer 1 year ago

@@drill_fiend1097 This is a commercial effort so they could have just gotten one normally used in aircraft. It's not NASA where they cost $750K each just because...

@Papershields001 1 year ago

I feel such compassion for the Hakuto-R team. They are going to accomplish it!

@serronserron1320 1 year ago

I hope that they can build a new one and land on the Moon in the next few years.

@emileriksson76 1 year ago

I watched the landing livestream and felt so bad for them. Their nervous faces really hurt me too. I bet they do it next time!

@abarratt8869 1 year ago

They may not accomplish it. Very often such incidents reveal a whole load of issues that have been swept under the carpet, and the necessary organisational change required to address them all can easily break a small team / organisation. Even big companies can be killed by this. This is what is going on in Boeing right now. They caused the crashes of two 737MAXes and killed people. Since then they've tried to institute root and branch reform of how they run their business. Yet, they're still having problems. The most recent one was a fuselage manufacturing defect (they were building them wrong) that had gone unnoticed for approx 700 airframes (yep they're flying, possibly with Southwest today!). Fine, they've found it, repairs needed, not immediately dangerous, but cannot be ignored. Trouble is the manner of them finding it was accidental; someone was in the right place, at the right time and realised what was going wrong. The issue is that, if despite the introduction of a root and branch reform about how they approach quality (= safety, reliability) they're still finding major issues by chance, then the root and branch reforms are junk and are not working. They should be finding such problems as part of a systematic continuous improvement process, and they're not. So the bet-your-life question is, what else have they missed, given that they've essentially admitted that they've not been looking hard enough? It's similar with 787 (fuselage barrel joints), brand new 737MAXs with FOD and rodent damage, etc. This suggests to me that Boeing are in no way adequately reformed following the MAX crashes, the problem most likely being in the senior management who never understood it before and are still there today. It's worryingly possible that they're going to make another fatal mistake. 
Ok, the FAA is now (belatedly) keeping a much beadier eye on Boeing, but they can't see and check everything; certification engineers / inspectors are not there to do basic QC and basic QC improvement. The Hakuto team's best bet, if they're to try it again, is to just fix that one core issue and try again, and do as much simming as they can muster. Unlike Boeing, crashes are just disappointments and money.

@99guspuppet8 1 year ago

❤❤❤❤❤❤❤❤❤❤ Yes they will succeed…… After they spend a lot of someone else’s money……… Let’s all go to Sugar rock Candy Mountain

@thePronto 1 year ago

But they launched knowing that their testing was invalid. Kinda like practicing parachute landings in a field, then jumping over water. I hope they don't ask me for a donation, because polite refusal often offends.

@dmacpher 1 year ago

Such a bummer that an error-correction filter with an edge case nailed them. Lots of amazing data, and at least it's a software fix!

@sliceofbread2611 1 year ago

Cliff case*

@dmacpher 1 year ago

🎢

@thePronto 1 year ago

Edge case? A crater on the moon? But it's not just a software fix is it? Or are we talking about a KSP do-over?

@slcpunk2740 1 year ago

Seems a pretty basic error. In what universe did they think they could figure out the exact altitude without the radar? Even if it was broken, too bad; damned if you do, damned if you don't.

@dmacpher 1 year ago

@@thePronto They moved their landing site to align with NASA South Pole targets super late in development (post validation). The threshold for culling/re-baselining seems to be the issue. The sudden change in relative altitude wasn’t expected from their simulations.

@yashrajb5251 11 months ago

India's Chandrayaan-3 has finally soft-landed near the moon's south pole successfully. 🎉

@dandeprop 1 year ago

Hi Scott: Very nicely done! (but then, I say that a lot about your stuff...). This scenario is directly reminiscent of the situation on the Apollo landings where passing over a crater (or any other feature like that) would cause a 'jump' in the Radar Altimeter-portrayed altitude, and it would 'jump' from the PGNCS altitude. Remember 'Delta H'? The difference between RA and PGNCS altitudes. In order to keep things from diverging in the PGNCS, they had to incorporate a 'terrain map' into the software that accounted for local differences in surface elevation. Remember the landing of Apollo 17? At some time in the PDI maneuver, one of the crewmen (I can't tell which one--they sound a lot alike) said 'We went over the hump, and Delta H just jumped'. It sounds (at least at first blush) like a feature similar to the Apollo 'terrain map' might have been appropriate here (?) Thank you.

@regolith1350 1 year ago

Software may have been the proximate cause but you can argue the real problem was somewhere in the development and quality control procedures. How can you not re-run a full landing simulation after changing the landing location? It reminds me of Starliner's problems in 2019. The software glitch where the flight computer grabbed the wrong "time" was the proximate cause, but the real problem was Boeing never ran a full end-to-end launch simulation.

@srinitaaigaura 1 year ago

Actually these days so much of manufacturing and coding is outsourced that the management, hardware and software teams are no longer next to each other - quality control begins to suffer massively. The more people outsource stuff, the more the work gets into the hands of rookies paid on cheap wages, who then end up making rookie mistakes that then require even more time and energy to fix. Boeing turned from an engineering firm to a management firm and the rest is history - 787, 737 max, 777x, Starliner. And as more and more automation comes in there's less and less human intervention to take care of the times where the computers reach their limits.

@user-cr4sc1ht9t 1 year ago

Feels like they might not have great CI indeed; probably more of a bunch-of-artifacts-in-Git-LFS type of management. But the Starliner glitch might be a slightly different topic IMO

@BubblefishOfTrem 1 year ago

I was also wondering how expensive such a simulation would be. If they aren't too expensive, couldn't you run landing simulations from randomized positions and flag anomalies from there? Not so much that you can just fling the lander at the moon arbitrarily, but more so you can find starting conditions which result in something weird. IDK, maybe we're getting into a space where "moon lander software testing" and later "asteroid lander software testing" might be a market, that would be amazing. With the costs of these missions, there might be some money on the line for a testing company, especially if they end up with a body of "known problematic situations" like the one from the video.

@MrJdsenior 1 year ago

How can you put a tank that has experienced both problems and damage in test into Apollo 13? Exactly like that, only different. Or get km and miles crossed up and smash a probe into Mars (IIRC), or ... ad infinitum. You can run all the simulations in the universe and still have problems, but not running ANY sims to cover a deviation in the program...yeah, that's just begging for it. I would think, in this day and age, that you could pretty much run that sim real time in parallel with the mission, for the problem they had there, knowing the path and surface profile, I'm guessing, and have it fire up a quick "do not ignore the damned properly functioning radar" command, or some such. It might even be good to have REAL TIME simulations running against the truth of the mission. Having done some aerospace hardware design, I'm guessing that there were schedulers and/or bean counters directly in the problematical loop. Or maybe idiotic MBA wielding managers that think they are engineers, or worse know BETTER than the engineers, because they know a few buzz words, and then maybe hold people's feet to the fire to get them to sign off on VERY cold Shuttle launches, or what have you. That's the sort of feedback you do NOT want in, say, a servo. :-/ Sometimes I look back and am glad I am retired, frankly. Some of it was fun, some of it SUCKED. Doc requirements come to mind as some of the latter. I had one junior documentation fiefdom wannabe tell me that the real output of a program was the documentation. When I finally quit laughing I told her that if she actually believed that she should go talk to some F16 pilot and ask them which they'd rather have with them on a mission, a working LANTIRN pod, or the documentation that describes it. She wasn't happy, because then a couple of people standing around laughed too. She wasn't a nice person (that's putting it mildly), or I wouldn't have said it that way. My bad, I guess.

@i-love-space390 1 year ago

Armchair quarterbacks are a dime a dozen. You can certainly crow if you ever land a vehicle on the moon or even achieve orbit. Perhaps we can talk about "how obvious" the solution was when we stop whining about how LONG it takes to build and fly a vehicle and how the contractors are "milking the American public" for so much money. I thank Providence every day for Kathy Lueders and NASA for riding herd on SpaceX to make the Dragon 2 safe. Everyone had lots of criticism for NASA for being conservative and "delaying" the first launch of the manned spacecraft. But all that effort kept the astronauts safe. (Also SpaceX had a real leg up on Boeing, because they had a working cargo spacecraft in Dragon 1 to build on. The last time Boeing designed a manned spacecraft was the 1970s and the Space Shuttle. All those engineers are long since retired.)

@BeardyBaldyBob 1 year ago

I'd argue it's due to inadequate testing and making assumptions they shouldn't make rather than just blaming the software. To move the landing site and NOT run a series of full simulations for the new site is just an astonishing degree of incompetence!

@mcgilliman 1 year ago

This.

@BeardyBaldyBob 1 year ago

@@mcgilliman I like to think of an F1 analogy... Imagine if you set your car up to race in good sunny weather in Monaco at sea level, and they changed the race to be in Mexico in soaking wet weather at 2,260m above sea level... You would NEVER just race the car with the exact same set up and no testing before the race!!

@Myndale 1 year ago

True, but if history has taught us anything it's that the incompetence almost certainly wasn't the software engineers themselves and was instead a cumulative effect of multiple levels of bureaucracy repeatedly ignoring the recommendations and pleas of the people who actually knew what they were doing and what additional work had to be done. I suspect this is a scaled-down version of Challenger all over again, albeit thankfully with no loss of life this time.

@i_Kruti 1 year ago

7:50 Yeah, the VIKRAM lander from CHANDRAYAAN-2 lost communication and went out of control, but with improvements in software, dampers etc., we are again ready to launch CHANDRAYAAN-3 in July, according to the official message......

@Anacronian 1 year ago

It's crazy to me that they didn't redo the simulations when a new landing site was chosen.

@bobboonstra3484 1 year ago

Not a software bug, it was a design bug. The software functioned as specified.

@pigsnoutman 1 year ago

How do you know? Did you read the design spec? If the design spec stated it should be able to handle multiple lunar landing locations, then it's not a design spec issue.

@simongeard4824 1 year ago

Definitely a process bug that this wasn't picked up in testing - but premature to say that it wasn't also a software bug.

@marcusdirk 1 year ago

@@pigsnoutman 6:17

@DavidEsp1 1 year ago

Mismatch at Requirements and/or Expectation levels. Activated by beyond test envelope operation. Needed a calm (seasoned?) "captain" to hold a steady, pre-planned course.

@Spillerrec 1 year ago

@@simongeard4824 I think the video was quite clear on that the software started ignoring that sensor because it was programmed to do so. An intentional feature that behaved differently than expected *because* it was put into a situation that was not considered while designing it. And that this only happened because they changed the mission plan after the software was developed and did not test it again with the new landing site, because their tests would have detected the issue. That last part really hurts because they reasonably could have avoided the crash.

@IsMaski 1 year ago

Unfortunate to see what led to the failure of this mission. But glad to see that they have found the issue. Really hoping they succeed on their next attempt. Thanks Scott for the comprehensive explanation on this!

@MrPaxio 1 year ago

They didn't find the issue, they made the issue

@MonkeyJedi99 1 year ago

Sounds like the software took the path of flat-Earth "science". What I see doesn't fit my preconceptions, ignore it!

@togowack 1 year ago

People need to wake up, controversy surrounded moon landings because there is stuff there. The issue / bug was in there on purpose. They will probably never let us see the real moon.

@davidbeppler3032 1 year ago

They did not find the issue. The issue was management. The software was fine. Software did not change the landing location, management did.

@togowack 1 year ago

@@davidbeppler3032 The whole thing was planned; it is every time, with every country. Why do people not see this? Every single machine that lands on the moon has issues: #1 because the surface is covered in glass domes and other hanging debris, #2 to cover up such things from the public in a convincing way.

@wChris_ 1 year ago

It's amazing how Apollo didn't have such bugs, despite it being written in pure assembly!

@PMA65537 1 year ago

They chose tamer landing sites.

@phloxie 1 year ago

@@PMA65537 Apollo 15 would like to have a word with you

@castafioreomg 11 months ago

Apollo missions had some issues but they handled them well. The engineers couldn't even visit their families because of the work pressure

@bretthoffstadt 1 year ago

I can't believe they didn't simulate their final landing site but that's what you are saying. Thanks for the explanation. Such a shame, they picked the wrong thing for a shortcut!

@hjalfi 1 year ago

There's an argument to be made that if a sensor is so critical that, if it fails, you're going to land on non-existent terrain 5 km up, then you just assume it won't fail. If you handle failure gracefully but then don't have enough data to avoid crashing, what's the point of handling it gracefully? Of course, ideally you'd have a backup. Like another radar, or GPS, or a video camera capable of estimating height using machine vision and a map, so you can sanity check it. The next best thing is just have a map: the vehicle knows where it is, so if it knows the terrain it can estimate what the radar values _should_ be, so instead of going 'eek, a delta of 3km in ten seconds is clearly wrong' you go 'the radar has shown a delta of 3km in ten seconds, what does the map say the delta should be? Right, 3km, moving on'.
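A minimal sketch of the map-based check described above, assuming a made-up one-dimensional terrain profile and that the vehicle's own altitude change over the interval is known from inertial navigation. `TERRAIN`, `ground_elevation` and `radar_jump_is_plausible` are hypothetical names, the 200 m tolerance is arbitrary, and none of this is ispace's actual flight software.

```python
# Hypothetical terrain profile as (downrange distance m, ground elevation m) points.
# The 3 km drop between 1000 m and 1200 m stands in for a crater rim; values are invented.
TERRAIN = [(0.0, 0.0), (1000.0, 0.0), (1200.0, -3000.0), (5000.0, -3000.0)]


def ground_elevation(x):
    """Linearly interpolate the ground elevation at downrange position x (m)."""
    for (x0, z0), (x1, z1) in zip(TERRAIN, TERRAIN[1:]):
        if x0 <= x <= x1:
            return z0 + (z1 - z0) * (x - x0) / (x1 - x0)
    raise ValueError("position outside terrain map")


def radar_jump_is_plausible(x_prev, x_now, radar_prev, radar_now,
                            inertial_alt_change, tol=200.0):
    """Check a change in radar altitude against what the map predicts.

    Radar altitude = vehicle altitude - ground elevation, so the expected change
    is the vehicle's own altitude change minus the change in ground elevation.
    """
    measured = radar_now - radar_prev
    expected = inertial_alt_change - (ground_elevation(x_now) - ground_elevation(x_prev))
    return abs(measured - expected) <= tol


# Crossing the rim: the radar jumps ~3 km, and the map says it should.
print(radar_jump_is_plausible(1000.0, 1200.0, 5000.0, 7950.0, inertial_alt_change=-50.0))  # True
# The same jump over flat ground is inconsistent with the map.
print(radar_jump_is_plausible(0.0, 200.0, 5000.0, 7950.0, inertial_alt_change=-50.0))  # False
```

This only helps if the map and the estimated ground-track position are good enough; a position error right at a rim would produce the same kind of mismatch.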

@stoic.little 1 year ago

You can have a video camera that is very good at finding the distance by using phase detect autofocus, same principle as a rangefinder.

@driedurchin 1 year ago

I work in flight software and you're right. At a certain point, if a system is so critical and irreplaceable, you just have to trust it won't fail, because as you said, detecting the failure isn't helpful if you're SOL.

@Spillerrec 1 year ago

There is an argument to be made that if a $90 million project can go up in smoke due to a single sensor failure that you expect could potentially happen, you should really have some sort of redundancy, even if the failure is unlikely, or some other form of backup plan. The question is whether it was actually considered that this sensor could fail, or if it just used the same failure detection and handling behavior as any other sensor without further consideration.

@CodeKujo 1 year ago

My reaction to just the title is "There are no unbelievable computer bugs". Now that I've watched the video: *very* believable. Accumulation of error is nasty and dead reckoning is very hard. Changing something that "can't possibly affect the outcome" late in the process and not doing a full test happens often enough that it's a subject of comic strips and many high profile failures.

@Hebdomad7 1 year ago

Except the one that flew into one of the first computers and caused a short circuit.

@Ergzay 1 year ago

Scott's been moving to more and more clickbait titles of late. It's unfortunate to see him doing it.

@winebartender6653 1 year ago

When you're using accelerometer and gyroscopic data alone for position on a 2D plane, it can become hilariously inaccurate quickly, no matter how good your algo is. Doing this in 3D would be basically impossible, if I'm being honest. As an example, there is a reason VR relies so heavily on video processing for limb positioning. Obviously these aren't in the same ballpark of cost/importance, but the same rules apply.
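To put a number on how quickly dead reckoning drifts, here is a small Python sketch that double-integrates a constant, uncorrected accelerometer bias. The 0.01 m/s^2 bias and 10 ms step are assumptions chosen for illustration; real IMU error models also include noise, scale-factor error and gyro drift, which make things worse.

```python
def dead_reckoning_error(bias, dt, duration):
    """Position error (m) from integrating a constant acceleration bias (m/s^2) twice."""
    steps = round(duration / dt)
    vel_err = 0.0
    pos_err = 0.0
    for _ in range(steps):
        vel_err += bias * dt      # first integration: velocity error grows linearly
        pos_err += vel_err * dt   # second integration: position error grows quadratically
    return pos_err


for t in (10, 60, 600):
    # Compare the step-by-step integration with the closed form 0.5 * bias * t**2.
    print(t, round(dead_reckoning_error(0.01, 0.01, t), 1), 0.5 * 0.01 * t ** 2)
```

Ten times the flight time gives roughly a hundred times the error, which is why an inertial-only fallback is normally trusted only for short gaps.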

@VarenRoth 1 year ago

The unbelievable part here, honestly, is how someone expected this to work without simulating the actual final flight plan at least once.

@CodeKujo 1 year ago

@@winebartender6653 US missile submarines can pull it off, but their inertial navigation hardware is larger than the entire lunar probe and submarines experience much smaller accelerations. It does seem like it was selected as a fallback with rather optimistic expectations of how well it would stay accurate. In hindsight, it would have been better to try turning the radar off and back on, relying on inertial navigation only as long as it took the radar to come back on. Also, redundant radar.

@Songfugel 1 year ago

Having seen in person how Japanese programmers work, how specialized and narrow their programming skills are, how ridiculously rigid their management approaches are, and how many non-unified standards they use, this sort of thing doesn't surprise me at all. P.S. The Ron Burgundy clip was priceless and so on the money xD

@goodlife1302 1 year ago

I actually did not get your point. Could you please explain a little bit more?

@JosePineda-cy6om 1 year ago

The point being that this was a bug that should've been relatively easy to find if they had simulated a couple of "landing site changed at the last minute" scenarios that included heavily cratered areas or craters with steep walls. Just doing some tests on random landing sites would've triggered this. But nobody thought of this, and because of corporate culture, everybody was disincentivized to even raise the question

@StudioVRM 1 year ago

The software was built by Astrobotic, an American company. Not sure how stereotypes of Japanese corporate culture come into this.

@goodlife1302 1 year ago

@@JosePineda-cy6om Oh, OK. Thanks a lot for the explanation

@Dr.Kraig_Ren 1 year ago

They outsource programming. It happened due to budget and time constraints. I'm pretty sure engineers wanted to rerun the simulation

@Aditya-gp2ih 11 months ago

Came here after the successful landing of India's Chandrayaan-3... Best of luck to Japan for future projects...

@perishmokrat8257 1 year ago

Working as a software tester, I often see managers accept the risk of malfunctioning SW to save some money, especially when it comes to error handling.

@Henglaar 1 year ago

Which is a shame, really. The more expensive the project, the less management should feel like cutting corners on error handling and verification. Ah, well, what "should" happen in the real world doesn't agree closely with what actually happens in the real world.

@connecticutaggie 1 year ago

Yea, that is the challenge of small projects with limited resources. It is great that this is not a problem for larger projects (cough-cough-Starliner) that have the money and resources to allocate to proper SW verification.😆

@AleXsSpaceXTalks 1 year ago

Very good explanation and top video! I guess the loss of the Mars Polar Lander was also caused by a software issue: spurious touchdown signals from the legs deploying were taken as a landing, and the descent engines were shut down too early, about 40 m above the surface...

@ytashu33 1 year ago

Love this! Thank you for reminding me of Kalman filters; I studied those in my M.Tech., loved them, but never thought I would ever hear of them again. I still remember how the "location estimation" part, based on current velocity and direction integrated over time (aka dead reckoning), can provide smooth and accurate predictions over short durations, but errors tend to accumulate in a physics-based predictor like this, and it needs to be augmented with an independent measurement (i.e. the radar), even if the radar data is not accurate. Amazing to see how stuff like that led to this outcome. It is a tough one though... I wish you had shared your thoughts on how a "faulty sensor" should be detected, then. I mean, you could say that a sudden 3 km jump in the sensor output means the sensor is probably broken, right? If not, how else would you do that and handle the case when the sensor actually is broken?
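As a rough illustration of how a sensible-looking fault check can reject real data, here is a minimal scalar Kalman filter in Python that predicts altitude by dead reckoning and refuses any radar reading whose innovation (measurement minus prediction) exceeds a fixed number of standard deviations. The filter structure, noise values, 5-sigma gate and the 3 km step are all assumptions for illustration; ispace has not published its flight code, and this is not it.

```python
from dataclasses import dataclass


@dataclass
class GatedAltitudeFilter:
    """Scalar Kalman filter on altitude with an innovation gate (illustrative values)."""
    h: float            # altitude estimate (m)
    p: float = 100.0    # estimate variance (m^2)
    q: float = 1.0      # process noise added per prediction step (m^2)
    r: float = 25.0     # radar measurement noise variance (m^2)
    gate: float = 5.0   # reject readings more than this many sigma from the prediction
    rejected: int = 0

    def predict(self, vertical_speed, dt):
        # Dead reckoning: propagate the estimate with the inertial vertical speed.
        self.h += vertical_speed * dt
        self.p += self.q

    def update(self, z):
        innovation = z - self.h
        s = self.p + self.r                      # innovation variance
        if innovation ** 2 > self.gate ** 2 * s:
            self.rejected += 1                   # treated as a sensor fault
            return
        k = self.p / s                           # Kalman gain
        self.h += k * innovation
        self.p *= 1.0 - k


def simulate_crater_pass(steps=60, jump_step=10, jump=3000.0):
    """Descend at 10 m/s; at jump_step the ground drops away by `jump` metres."""
    f = GatedAltitudeFilter(h=5000.0)
    true_alt = 5000.0                            # true height above the local surface
    for step in range(steps):
        true_alt -= 10.0
        if step == jump_step:
            true_alt += jump                     # passing over a crater rim
        f.predict(-10.0, 1.0)
        f.update(true_alt)                       # noiseless radar, for clarity
    return true_alt, f.h, f.rejected


true_alt, estimate, rejected = simulate_crater_pass()
print(true_alt, estimate, rejected)  # 7400.0 4400.0 50
```

The estimate ends up 3 km low and every later radar reading is thrown away, because the innovation never shrinks while the gate only widens slowly. Checking the jump against a terrain map, or against redundant sensors, are two possible mitigations; which suits a real lander is a design question this sketch does not answer.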

@Beregorn88 1 year ago

Redundant systems and majority check: if all three of your radar sensors report a sudden altitude change, then that's what actually happened. What surprises me is that the sudden altitude change eventuality was never accounted for...
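A minimal sketch of the majority check described above, using median (mid-value) selection over three hypothetical radar altimeters: the median survives one bad unit, and any unit far from it is flagged. The function name and the 100 m tolerance are assumptions. If all three see the same real crater rim they will agree, so voting separates sensor faults from terrain, but it only helps if the navigation filter then accepts the agreed value.

```python
def mid_value_select(readings, tol=100.0):
    """Vote over three redundant readings.

    Returns (selected_value, indices_of_disagreeing_sensors). The median is
    unaffected by any single wild reading.
    """
    if len(readings) != 3:
        raise ValueError("expected exactly three readings")
    mid = sorted(readings)[1]
    faulty = [i for i, r in enumerate(readings) if abs(r - mid) > tol]
    return mid, faulty


# One altimeter glitches: the other two outvote it.
print(mid_value_select([5000.0, 7950.0, 5010.0]))   # (5010.0, [1])
# All three report the same jump: treat it as real terrain, not a fault.
print(mid_value_select([7950.0, 7948.0, 7951.0]))   # (7950.0, [])
```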

@dust1209 1 year ago

This reminds me of an Alastair Reynolds novel where an automated system recorded the sudden vanishing of a planet but disregarded the data because the event was so far out of expected results that it assumed there was some kind of fault.

@letsburn00 1 year ago

It then accidentally creates a cult.

@yogiwp_ 1 year ago

Which novel is this?

@dust1209 1 year ago

@@yogiwp_ Absolution Gap, it's the third book in the Revelation Space series which is kind of weird. If you're looking to check out the author, I'd recommend Pushing Ice!

@ShoeTheGreyCat 1 year ago

@@letsburn00 And also liquefying the poor guy's wife stuck in the scrimshaw suit

@letsburn00 1 year ago

@@ShoeTheGreyCat I forgot about that bit. Given that series largely relates to characters that are functionally aging immortal, it's wild how easily they torture and kill each other.

@kennethng8346 1 year ago

I've never done it, but from what I have read, sensor fusion is an enormously complicated and fuzzy technique. You have to take a bunch of sensors, account for non linearities and malfunctions, and you need to figure out which ones are correct, which ones are sorta correct and by how much, and which to ignore. On top of this you have enormous weight and power constraints. And there must be a million fudge factors that have to be played with. Move it one way and you get a false positive, move it the other way and you get a false negative.

@andrewahern3730 1 year ago

I wonder if this would be a good application for AI? A computer would definitely be able to interpret way more inputs than a human pilot ever could and in real time

@JKa244 1 year ago

It's a satisfying problem to work on.

@Niosus 1 year ago

@@andrewahern3730 AI isn't a magic fix. Those sensor fusion algorithms are supported by a deep understanding of the system and statistics. Like with the Falcon 9, they are extremely reliable once properly tuned. Obviously an advanced enough AI system can always do the job. But if, like in this case, you simply didn't test the system with enough variations of inputs, you're not going to get good results either. The amount of simulations needed to properly train the AI would also have been plenty to find this bug in the old control code. The lesson here is that more robust testing is needed. I have a feeling that spaceflight is often seen as hardware-first. That's understandable, but without proper software the hardware is useless. I think more modern software engineering practices could be useful here.

@Orieni 1 year ago

IRL, nothing says you can’t have false positives and false negatives at the same time, while you struggle to understand the data. That’s no fun at all.

@GeorgeTsiros 1 year ago

Kalman filtering is pretty damn straightforward. It's a basic method, not something extraordinary. Known for more than 50 years and optimal for typical sensors (i.e. linear systems with Gaussian noise).

@gonun13 1 year ago

Putting aside changes in mission plans, missing redundant systems or even software bugs, I think the main issue here is overly strict programming. Assuming something is defective just because of a sudden change that is out of scope is a bit extreme. It baffles me how it could hover waiting for the moon while letting propellant go to zero without at some point trying to salvage itself with something like "this is not working, maybe I should take another look at that system I think is dead".

@ahadsuleymanli9572 1 year ago

What you're describing is human decision making: you're ready to scratch the plan and try something better when the moment comes. You can't just imagine every scenario branching out at every step and hard-code solutions to each. At some point you'll realize you need a generic decision-making algorithm. In fact, the mission failed because they had a specific solution, switching off a reading, since that had allowed them to succeed in previous simulations.

@thetooginator153 1 year ago

It would be interesting to try an optical parallax system to verify the radar readings. If both systems agree, then the data is correct. Cameras could be a few meters apart, so, the parallax would be measurable from pretty far away.
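For a sense of scale, the standard pinhole stereo relation gives range Z = f*B/d, where f is the focal length in pixels, B the camera baseline and d the disparity in pixels; differentiating gives a range error of about Z^2/(f*B) per pixel of disparity error. The camera parameters below (2000 px focal length, 3 m baseline) are assumptions, not numbers from any real lander.

```python
def stereo_range(focal_px, baseline_m, disparity_px):
    """Range (m) from a rectified stereo pair: Z = f * B / d."""
    return focal_px * baseline_m / disparity_px


def range_error_per_pixel(focal_px, baseline_m, range_m):
    """Approximate range error (m) per pixel of disparity error: |dZ/dd| = Z^2 / (f * B)."""
    return range_m ** 2 / (focal_px * baseline_m)


f_px, baseline = 2000.0, 3.0                          # hypothetical camera setup
print(stereo_range(f_px, baseline, 1.5))              # 4000.0 m for 1.5 px of disparity
print(range_error_per_pixel(f_px, baseline, 4000.0))  # about 2667 m per pixel
print(range_error_per_pixel(f_px, baseline, 100.0))   # about 1.7 m per pixel
```

With a baseline of a few metres, one pixel of matching error is worth kilometres at 4 km but under 2 m at 100 m, so under these assumptions a parallax cross-check would be most useful late in the descent, with sub-pixel matching helping further out.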

@EnricoGolfettoMasella 1 year ago

That’s a very creative solution! ✌🏼✌🏼 Pretty sure it would work!

@xonx209 1 year ago

If they don't agree, then what do you do?

@4k8t 1 year ago

@@xonx209 In sci-fi usually it would be three independent systems with two having to agree as to what they were seeing. A two-system setup would require that both systems agree, and if one system cut out a sensor as malfunctioning and the other didn't, something would have to be present to break the disagreement deadlock.

@Alex-og3ev 1 year ago

A similar thing happened in 2017 with the second launch from the new cosmodrome Vostochny: old software logic applied to new geography without a double check. It didn't happen at first because they used the very rare Volga upper stage, but the second launch was in the default configuration that had flown for decades from launch pads everywhere, including South America. So after final separation, the Fregat upper stage was scheduled to make a 10 degree turn counterclockwise, but due to the geography of the new cosmodrome and the flight trajectory, the software decided that it needed a 350° clockwise turn instead. Didn't end well. It turned out that there was a narrow set of input parameters that could make the upper stage behave like this, and the new launch pad won the jackpot.

@JohnMullee 1 year ago

Wasn't there something about thermal modelling and pipes freezing in the Fregat upper stage? Or am I misremembering?

@Alex-og3ev 1 year ago

@@JohnMullee No, that was definitely some other story

@firefly4f4 1 year ago

By, "unbelievable", I'm pretty sure you meant, "Completely realistic, very common scenario when the software is put in an untested environment." Note that I am saying this as a software developer myself. I actually just identified a scenario where our existing tests were thought to be sufficient, but then some surrounding parameters changed and a bug was found.

@jarisundell8859 1 year ago

As a software developer myself, I'm actually asking myself why those simulations were not set up to run like a CI.

@firefly4f4 1 year ago

@@jarisundell8859 Good question. Seems like actually running the sim again once the final site was chosen should have caught this, maybe allowing them to upload the fix. For the record, CI is how the one I looked at was caught... prior to release 👍

@danstenger1 1 year ago

Scott is a dev by trade too, lol; he works at Apple.

@cinquine1 1 year ago

@@firefly4f4 I think it's a joke, since the bug happened because the computer didn't "believe" the radar

@scottmanley 1 year ago

By unbelievable, I mean the software stopped believing the radar

@elleryhorton44 1 year ago

Redundant systems to help the mission don't matter if the mission never starts. I worked on a Single/Dual/Triple redundancy system a long time ago. I think the probability of a single incorrect signal per million samples for each device was 75/93/98 percent (roughly, I don't recall the exact number). A huge bonus from single to dual redundancy but rarely worth the extra 33% in cost between Dual and Triple. However, each module had to boot up on its own and if they did not, then the system wouldn't run anyway.

@bertram-raven 1 year ago

I would add optical recognition and stored high resolution images to the package. These would optically compare the expected position and orientation to the visible information and so call out anomalies. This entire apparatus would be as small as a Raspberry Pi, using off the shelf components. DJI drones use something similar in their RTB software. When the drone sets off, it takes photographs of its starting location and compares them to the downward facing camera live images when returning to land.

@Topcoatdetail 1 year ago

One of the reasons Chandrayaan-2 failed was the mapping. When the lander moved away from the photographed landing site, it tried to overcorrect and failed.

@thePronto 1 year ago

A lunar lander encountered a crater and got confused. Total freak accident: one in a million. I can totally relate: today, I encountered a Starbucks in a strip mall.

@dorsetdumpling5387 1 year ago

Unbelievable that they had only one method of determining altitude!

@manuelsilva8640 1 year ago

My thought exactly.

@theqwert3305 1 year ago

And that that one method could be turned off for the rest of the landing!

@EmpereurHector 1 year ago

I guess that's part and parcel for those very small landers.

@GlutenEruption 1 year ago

I mean to be fair, even the Apollo lunar module only had a single non-redundant landing radar altimeter for determining exact altitude. The astronauts were fairly confident they could manage to land without it but if it failed, mission rules called for an immediate abort. The weight constraints for landers are so tight, engineers have no choice but to make those trade offs.

@dorsetdumpling5387 1 year ago

@@GlutenEruption Ah, but they had the backup that was the Mk. 1 Eyeball and its associated biological computer!

@kaineis 1 year ago

I love the KSP2 animations you added. That was really nice to watch.

@noahserio4182 1 year ago

I’m surprised they didn’t have a redundant altimeter to verify the suspect altimeter reading against.

@chouseification 1 year ago

hey Scott - thanks for the analysis. I remember this one (as well as the Israeli and Indian ones) and seeing the disbelief in the control room was sad. It is easy to tell who has a clue and who is a bureaucrat by their expressions, etc. :P

@adarsh4764 1 year ago

Hope there's no software issue when NASA lands back on the moon!😂

@chouseification 1 year ago

@@adarsh4764 agreed - one would have thought that even a small lander would have a pretty robust navigation system these days, but obviously they met an edge condition they hadn't properly tested for... and a sad oversight too as nearly all landing trajectories will have the radar return affected by craters you're passing over. There are many of them after all, and although most are small, many are large/deep and you need to keep their profile in mind as you use the radar/laser/etc surface measurement. The state vector routine needs a sanity check to make sure the drift never disagrees from projected too much without it doing some form of reliable recheck.

@glennpearson9348 1 year ago

Excellent explanation, Scott. Thanks for putting it all together for us to easily digest. Nice Kerbal recreation, too!

@therealzilch 1 year ago

Another fascinating and instructive example of Robert Burns' "The best laid schemes o' mice an' men / Gang aft a-gley.”. Cheers from sunny Vienna, Scott.

@mikeburch2998 1 year ago

I'm so sorry to hear that this happened. I hope they try again and maybe send back some remarkable pictures. Don't give up. Greetings from Arizona.

@joelcorley3478 1 year ago

But what if the radar altimeter actually did fail around the time it passed over that crater? It sounds like it would have produced the same result. I think the only way to deal with this in the design is to have at least one redundant sensor for something this mission-critical. Of course, the problem with just one redundant sensor is that you need to try to figure out which one is actually the broken sensor. That's why there are often 3 sensors or 3 computer systems used in this kind of redundancy...

@sonaxaton 1 year ago

Sounds like a redundant sensor wouldn't have helped this particular issue though, because it would have just gotten the same confusing measurements of the cliff wall. I think they just need to thoroughly run simulations of the actual mission to catch edge cases like this early.

@a4d9 1 year ago

On a vehicle like this, without humans onboard, the space and weight requirements might be too costly compared to the risk of a failed sensor.

@SashaNaronin 1 year ago

@@sonaxaton Exactly. A proper simulation campaign would've caught that.

@Damien.D 1 year ago

@@sonaxaton 3 redundant sensor and a voting system is the way to go. Worked flawlessly in many aeronautical things, from Concorde autopilot to missile guidance system.

@travcollier 1 year ago

The dead reckoning system combined with prior knowledge (a map of roughly what is expected) should have been enough of a redundant system. Seems like they should have included a reassessment/recovery routine to check if that apparent altimeter glitch (which wasn't a glitch of course) cleared and the instrument was giving reasonable data. This stuff is really tricky without a human in the loop.
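One way to sketch the reassessment idea above in Python: once the radar has been flagged, keep watching it, and re-admit it after several consecutive samples whose rate of change matches the inertial vertical speed, even if the absolute value disagrees with the (possibly wrong) estimate. `RadarHealthMonitor`, the 5-sample streak and the 5 m/s tolerance are hypothetical; a sensor that fails into a smooth but wrong ramp would also pass this test, so it is a heuristic, not a proof of health.

```python
class RadarHealthMonitor:
    """Hypothetical recovery logic for a radar altimeter that has been flagged as suspect.

    While suspect, the radar is re-admitted after `needed` consecutive samples whose
    rate of change agrees with the inertial vertical speed to within `rate_tol` m/s.
    """

    def __init__(self, needed=5, rate_tol=5.0):
        self.needed = needed
        self.rate_tol = rate_tol
        self.healthy = True
        self.streak = 0
        self.last = None

    def mark_suspect(self):
        # Called by the navigation filter when it rejects a radar reading.
        self.healthy = False
        self.streak = 0

    def observe(self, radar_alt, inertial_vspeed, dt):
        """Feed one radar sample (m); return True if the radar should be trusted."""
        if not self.healthy and self.last is not None:
            radar_rate = (radar_alt - self.last) / dt
            if abs(radar_rate - inertial_vspeed) <= self.rate_tol:
                self.streak += 1
                if self.streak >= self.needed:
                    self.healthy = True
            else:
                self.streak = 0
        self.last = radar_alt
        return self.healthy


monitor = RadarHealthMonitor()
monitor.observe(7890.0, -10.0, 1.0)   # reading just after the crater rim
monitor.mark_suspect()                # the filter rejected it as a fault
# The radar keeps descending smoothly at 10 m/s, matching the inertial speed.
print([monitor.observe(7890.0 - 10.0 * k, -10.0, 1.0) for k in range(1, 7)])
# [False, False, False, False, True, True]
```

After re-admission, the navigation filter would presumably need to reset its altitude to the radar value rather than blend it in; otherwise the same gate that rejected the first reading would reject the next one too.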

@Hagop64 1 year ago

If it slowed to a speed of 0, then fell to a speed of 500 km/h, then it would have had to fall for ~86 seconds. Moon gravity acceleration = 1.62 m/s^2. That means it was in free fall for a distance of about 6.0 km. That's all based off of the "500 km/h" crash speed given.

@scottmanley Жыл бұрын

Actually, I figured out 500km/h based upon the amateur radar measurments of 88seconds of freefall.

@Hagop64 Жыл бұрын

@@scottmanley Love how reliable basic physics equations are! With either bits of data it still comes up with the same results! If only the rest of landing on the moon were that simple.
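
The arithmetic in this exchange can be checked both ways with drag-free free fall from rest, using v = g·t and v² = 2·g·h with the 1.62 m/s² value from the thread:

```python
G_MOON = 1.62  # m/s^2, lunar surface gravity (value used in the thread)


def fall_from_rest(speed_kmh: float, g: float = G_MOON) -> tuple[float, float]:
    """Time (s) and height (m) needed to reach speed_kmh from rest, no drag."""
    v = speed_kmh / 3.6    # km/h -> m/s
    t = v / g              # v = g t
    h = v ** 2 / (2 * g)   # v^2 = 2 g h
    return t, h


def impact_speed_kmh(fall_time_s: float, g: float = G_MOON) -> float:
    """Speed (km/h) after falling from rest for fall_time_s seconds, no drag."""
    return g * fall_time_s * 3.6


t, h = fall_from_rest(500.0)
print(f"{t:.1f} s, {h / 1000:.2f} km")       # 85.7 s, 5.95 km
print(f"{impact_speed_kmh(88.0):.0f} km/h")  # 513 km/h
```

Going from 500 km/h gives about 86 s and 6 km; going from 88 s gives about 513 km/h, so the two estimates agree to within rounding.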

@travelbugse2829 1 year ago

What I want to know is how that equates to a violent impact on Earth. Do I divide by six, which comes to 83.3 km/h or just under 52 mph? That's bad enough to need airbags...

@highdefinist9697 1 year ago

@@travelbugse2829 You multiply by the square root of six, assuming there is no air resistance; with air resistance, you might end up with something not too different from 500 km/h for this type of vehicle.
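
The two replies answer different questions, and neither involves dividing the speed by six. For a fall from the same height, v = √(2gh), so the speed scales with √g and the Earth figure is about √(9.81/1.62) ≈ 2.46 times larger. For the same 500 km/h impact, the speed is simply the same, and the question becomes how far something would have to fall on Earth to reach it. A drag-free sketch of both:

```python
import math

G_MOON = 1.62   # m/s^2
G_EARTH = 9.81  # m/s^2


def same_height_on_earth(moon_speed_kmh: float) -> float:
    """Impact speed (km/h) on Earth for a fall from the same height, ignoring drag.

    From v = sqrt(2 g h): at fixed h, speed scales with sqrt(g).
    """
    return moon_speed_kmh * math.sqrt(G_EARTH / G_MOON)


def earth_drop_height(speed_kmh: float) -> float:
    """Height (m) an object must fall on Earth, ignoring drag, to hit at speed_kmh."""
    v = speed_kmh / 3.6
    return v ** 2 / (2 * G_EARTH)


print(round(same_height_on_earth(500.0)))  # 1230
print(round(earth_drop_height(500.0)))     # 983
```

So the lunar impact is like a drag-free drop of roughly a kilometre on Earth, and well beyond anything airbags are designed for.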

@Kromaatikse 1 year ago

@@travelbugse2829 When it comes to the moment of impact, 500 km/h is 500 km/h. That's about Mach 0.4 at sea level. You know those old war movies where they show fighters shot down and augering in? *That.*

@LightsEnd304 1 year ago

Your explanation reminded me quite a bit of dynamic positioning systems on ships / oil rigs.

@mrpocock 1 year ago

FYI, if the landing location and approach are part of the software spec, then a change to the landing site and approach is a change to the software spec and requires a full end-to-end revalidation of the software.

@ns219000 1 year ago

Japan, sorry for your loss, but thanks for the software design lesson. Rockets are hard and this is how we learn. Thanks for sharing this one, Scott!

@Anvilshock 1 year ago

Hardly a "bug" when it worked correctly for the data input it was programmed to handle. At best, it encountered data it _wasn't_ programmed to handle, which makes this more of a missing feature.

@mikehartsough489 1 year ago

I was thinking the same thing. Sounds like the software did exactly what it was supposed to do.

@1224chrisng 1 year ago

Well, a bug is just unintended behaviour. The computer did exactly what you told it to do, just not what you wanted it to do.

@ddnguyen278 1 year ago

Can't imagine why they didn't run simulations of this. It's not like the Moon's topography isn't known down to the meter. Stick it in Kerbal and run simulations.

@RemyPorter 1 year ago

@@ddnguyen278 Uh, the moon's topography *isn't* known down to the meter. Some areas of the moon are, but generating meaningful maps of the moon is actually quite hard and time-consuming. There are folks whose entire job is to take lower-res digital elevation maps and apply reasonable interpolations to generate higher-fidelity maps than we actually have. Not saying they shouldn't have done more sims, but it's harder than it sounds.

@davidwright7193 1 year ago

Repeat after me: “That’s not a bug, it’s a feature.”

@synergy021 1 year ago

Is there a requirement that all titles must be clickbait and include one of these words: Unbelievable, Shocking, Terrifying? No, the reason wasn't unbelievable; it's actually quite believable and simply an oversight.

@scottmanley 1 year ago

It’s unbelievable because the navigation software stopped believing the altimeter.

@synergy021 1 year ago

@@scottmanley Lol, good save. It wasn't really directed at your video per se, just at the KZfaq titling trend among youtubers these days. Although yours is actually technically accurate hah 🙂

@kiereluurs1243 1 year ago

What was the 'REAL TRUTH?!!'

@scvcebc 1 year ago

Neil Armstrong took over the controls and manually landed on the moon when he saw rougher terrain than expected on the final approach of the first manned landing in 1969. He was a true test pilot who was able to think fast and take action without losing his nerve. He barely had enough fuel for the extra maneuver, so he was also lucky. The problem with depending on robotics is that software doesn't have "common sense" and enough experience to handle the unexpected. However, these crashed robot landers are much cheaper than manned missions, so with trial and error they will eventually work.

@ClickClack_Bam 1 year ago

And then a unicorn ran up & they rode the unicorn all around the Moon going 240,000 miles back to Earth. The Unicorn didn't run 28,000mph like they would've had to go in the pop rivet aluminum can they brought them there.

@lyoha5028 1 year ago

I wonder what all these people in mission control were doing during the landing. Were they analyzing the telemetry in real time? I assume they were supposed to notice that the radar altimeter was considered faulty and disabled. If so, perhaps they could have reviewed its readings and realized that after passing the edge of the crater, the readings returned to normal. In that case, they could have just manually re-enabled the radar altimeter. Since it is not Mars, the signal delay is small enough to allow for manual corrections during the landing.
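
A back-of-envelope check of the signal-delay point, using the speed of light and the mean Earth-Moon distance. This is only the light-time lower bound; ground-station processing and human reaction time (not modelled here) would add to it.

```python
C_KM_S = 299_792.458       # speed of light in vacuum, km/s
EARTH_MOON_KM = 384_400.0  # mean Earth-Moon distance, km (about 363,000 to 406,000 km)

one_way_s = EARTH_MOON_KM / C_KM_S
round_trip_s = 2 * one_way_s
print(f"one-way {one_way_s:.2f} s, round trip {round_trip_s:.2f} s")
# one-way 1.28 s, round trip 2.56 s
```

A couple of seconds is short compared with the many minutes of one-way light time to Mars, although whether operators could have diagnosed and overridden the altimeter rejection in time is a separate question.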

@katho8472 1 year ago

Word!

@ooooneeee 1 year ago

They lost telemetry. If they had had a connection, they could have saved it.

@pavanshetty9806 1 year ago

There might also be a delay in communication.

@rorykeegan1895 1 year ago

Seems pretty sloppy not realising a change in landing site might cause the craft problems... Sounds like bad project management to me.

@ankoku37 1 year ago

"Actually I'm underground, so I should cut my parachute" is the funniest conclusion an AI spacecraft could make before murking itself.

@zrohit 1 year ago

Maybe multiple countries could drop beacons around common landing areas that everyone could use during landing. Not foolproof, but it could help.

@codediporpal 1 year ago

I'm very impressed with the ability to diagnose what went wrong. Even amateurs helped! Another case study for future designers of "fail-safe" systems.

@riccardob9026 1 year ago

To be honest (and a bit philosophical), I would not call this a "bug," in the sense that by "bug" you sometimes mean an error in the software that makes it behave differently from the behavior specified at the design phase. In this case the software had to face a situation that was not expected, that is, a sudden increase in altitude due to a deep crater. It was not an error introduced at implementation time (that is, when they wrote the software), but at design time. Like a bridge that breaks down, not because of some error during construction, but because of a strong wind that was not considered at design time.

@DrDeuteron 1 year ago

I agree. This was a planning error, or a failure-to-test error, or a change of the landing into a regime that had not been tested, or all of the above. It's been known for a long time that radar altimeters can be spoofed by terrain; it is nothing new.

@serronserron1320 1 year ago

An engineering oversight.

@aspuzling 1 year ago

As a software engineer I agree lol, but that's not to say it isn't also partly the responsibility of software engineers to raise potential bugs in the design.

@chaz720 1 year ago

Agreed, and came to write this. As a space systems engineer, this was a systems engineering failure, not a software bug.

@bbgun061 1 year ago

They should have tested their software with real data.

@jeechun 1 year ago

This story reminds me that I once planned to make a simulation for spaceships/probes that goes down almost to the hardware level, where the subsystems (sensors) could be configured with a certain precision, sampling rate, processing delay, and the way they communicate with the CPU (the flight computer), so the design of such a vehicle architecture would be closer to reality. The propulsion units could also be configured with a delay to start/stop/change thrust, and a function describing how that happens. Maybe in KSP3? :D (Feel free to use this idea; most probably I won't have time to develop it.)
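
A small piece of that idea can be sketched as a sensor model with a sampling rate, Gaussian noise, and a fixed delivery latency. The class `SimulatedSensor` and its parameters are hypothetical names for illustration; a fuller simulator would add quantisation, dropouts, bus timing, and so on.

```python
import random
from collections import deque


class SimulatedSensor:
    """Hypothetical sensor model: samples a true value at a fixed rate, adds
    Gaussian noise, and delivers each sample to the flight computer after a
    fixed latency. Call step() at least once per sample period; if calls are
    sparser, missed samples reuse the true value passed to the current call."""

    def __init__(self, rate_hz: float, noise_std: float, latency_s: float, seed: int = 0):
        self.period = 1.0 / rate_hz
        self.noise_std = noise_std
        self.latency = latency_s
        self.rng = random.Random(seed)  # seeded so runs are repeatable
        self.next_sample_t = 0.0
        self.pending = deque()          # (delivery_time, value), oldest first

    def step(self, t: float, true_value: float) -> list[float]:
        """Advance simulated time to t and return readings that have arrived."""
        while self.next_sample_t <= t:
            noisy = true_value + self.rng.gauss(0.0, self.noise_std)
            self.pending.append((self.next_sample_t + self.latency, noisy))
            self.next_sample_t += self.period
        arrived = []
        while self.pending and self.pending[0][0] <= t:
            arrived.append(self.pending.popleft()[1])
        return arrived


# A 10 Hz altimeter with 5 m noise and 0.2 s latency, stepped at 100 Hz.
altimeter = SimulatedSensor(rate_hz=10.0, noise_std=5.0, latency_s=0.2)
received = []
for k in range(106):
    t = k * 0.01
    for reading in altimeter.step(t, true_value=5000.0):
        received.append((round(t, 2), reading))
print(len(received), received[0][0])
```

Feeding guidance code from models like this, instead of from perfect simulator state, is what exposes latency and noise effects that "game data" hides.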

@robertst-laurent6452 10 months ago

Mr. Manley, for the whole planet you are our 21st-century Eugene Kranz. At 01:13 your video proves that we now have available ‘A da Vinci World of Creativity at Home’. The video shows that they used a $170 Airspy R2 receiver (with a $620 LNC + antenna) with the mind-blowing power of the software available for the Airspy, so for less than $900 USD you can have the same setup at home! Your use of the Kerbal simulator to help us better understand the sequence of events is of jaw-dropping beauty.

@BILLY-px3hw 1 year ago

It tore me apart watching the team come so close. It really has to weigh on the people who didn't catch the glitch; I am sure some are still lying in bed awake at night. Can't wait to see the team bounce back with a flawless mission.

@OhNiceMatt 1 year ago

Those software engineers were laid off, hence the lying in bed awake at night.

@0x8badbeef 1 year ago

6:20 Planned landing site change? In the industry, that would normally require a revalidation of the software. I would blame this on the people who decided not to do that. I would investigate those guys and the reason for the change. I would not blame the software, as the software was not designed to be used that way.

@carlwill5009 1 year ago

😅 A buddy's could hear from you again. Thanks for your good update videos.

@rayoflight62 1 year ago

Thank you for all the detailed explanation! Greetings, Anthony

@RogHawk 1 year ago

Thank you, Scott! You answered questions I've had for the last few years about the landers crashing on the moon.

@bobbun9630 1 year ago

From this description it sounds like the software worked as intended given the circumstances. It sounds more like they need to rethink the system-level design to have more inputs that can be used to sanity-check one another, and perhaps have a means for a one-time instrument glitch (at least in the design's interpretation) to be "forgiven" if later sanity checks pass.

@u1zha 1 year ago

Yes, that makes sense, and the "forgiving" part is commonly solved by Kalman filtering, which Scott also mentioned. Here it sounds like ispace overengineered a little bit, overeagerly dropping sensor data on the floor before giving the filter a chance.
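
A minimal sketch of "gate, but forgive": a scalar Kalman filter on height above ground that rejects readings outside a 4-sigma gate but, after a few consecutive rejections, re-initialises from the sensor instead of ignoring it forever. This is not ispace's design. `AltitudeFilter`, the noise values, `reset_after`, and the rim scenario are assumptions, and the blunt reset rule would adopt a genuinely broken sensor too; a real system would want corroboration (other sensors, agreement among the rejected readings) first.

```python
from dataclasses import dataclass


@dataclass
class AltitudeFilter:
    """Scalar Kalman filter for height above ground with an innovation gate and a
    crude 'forgiveness' rule. All parameter values are illustrative assumptions."""
    x: float                 # estimated height above ground, m
    p: float                 # variance of that estimate, m^2
    q: float = 4.0           # process noise added per predict step, m^2
    r: float = 25.0          # altimeter noise variance, m^2
    gate_sigma: float = 4.0  # reject readings further than this many sigma away
    reset_after: int = 3     # consecutive rejections before trusting the sensor again
    rejects: int = 0

    def predict(self, vertical_speed: float, dt: float) -> None:
        """Dead-reckoning step: move the estimate by the known vertical speed."""
        self.x += vertical_speed * dt
        self.p += self.q

    def update(self, z: float) -> bool:
        """Fuse altimeter reading z. Returns True if the reading was used."""
        y = z - self.x       # innovation
        s = self.p + self.r  # innovation variance
        if y * y <= self.gate_sigma ** 2 * s:
            k = self.p / s   # Kalman gain
            self.x += k * y
            self.p *= 1.0 - k
            self.rejects = 0
            return True
        self.rejects += 1
        if self.rejects >= self.reset_after:
            # "Forgive" the sensor: after repeated disagreement, assume the
            # estimate is what's wrong and re-initialise from the reading.
            # (No cross-check against other sensors is done here.)
            self.x, self.p = z, self.r
            self.rejects = 0
            return True
        return False


def run_rim_scenario(reset_after: int) -> tuple[float, float]:
    """Descend at 10 m/s; at step 10 the ground drops by 3 km. Returns (estimate, truth)."""
    f = AltitudeFilter(x=5000.0, p=25.0, reset_after=reset_after)
    true_h = 5000.0
    for step in range(20):
        f.predict(vertical_speed=-10.0, dt=1.0)
        true_h -= 10.0
        if step == 10:
            true_h += 3000.0  # crossing the rim: height above ground jumps 3 km
        f.update(true_h)      # noise-free readings keep the example readable
    return f.x, true_h


print(run_rim_scenario(reset_after=3))      # (7800.0, 7800.0)
print(run_rim_scenario(reset_after=10**9))  # (4800.0, 7800.0): never recovers
```

With forgiveness disabled, the estimate stays 3 km low for the rest of the descent, which is the failure pattern described in the video.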

@TECHN01200 1 year ago

As someone in software, I find it concerning that data just gets thrown away on a whim like that.

@gonun13 1 year ago

Exactly. There are even retry-and-resume techniques for when data is out of range, loosening the checks on each retry to salvage as much as possible (e.g., a harder landing is better than spinning out of control).

@dwarftoad 1 year ago

Seems like it might have landed OK if it hadn't run out of fuel? Was it taking much longer to land, or using more fuel for some other reason in this contingency without radar guidance? Seems like that's a gap in design/testing as well. Also, was it too conservative in rejecting the radar data? Did it rely too much on an incorrect elevation model of the surface, or just wait too long to start using it again, or turn it off permanently for some reason?

@JohnSmith-fz1ih 1 year ago

My understanding was that it believed 5 km elevation was the Moon's surface, so it was going to hover there, slowly descending until touchdown. In other words, it was always going to run out of fuel. (I think the video said it was descending at around 1 km/hour, so it would have needed an additional 5 hours' worth of fuel!)

@skougi 1 year ago

After watching both India and Israel do the same thing (live), I decided it must be tradition to crash on the moon at least once before landing there. Seriously though, both of those crashed at the last minute too. It's like their up becomes down and they rocket full speed into the moon trying to avoid, well, just that. I think one Chinese probe even had to resort to optical recognition tech to get around the weird landing issues. Thanks for posting!

@brentboswell1294 1 year ago

Didn't Neil Armstrong have to do some on-the-fly recoding to overcome the 1202 error when the Eagle lunar module was getting overwhelmed with input? (Which was fixed on later Apollo missions through code fixes and turning off an unneeded radar as part of the checklist?) Seems like they could have used an altimeter, but on the moon the altimeter setting is always "00.00" 😅

@caturlifelive 1 year ago

That's why I love Scott Manley's videos, so detailed.

@AeroGraphica 1 year ago

I suppose that with the rapid advances in technology and AI, these kinds of problems will soon disappear. A simple camera pair, for example, could recreate human-like vision and give enough information to an AI to perform a landing, especially if combined with all the already existing sensors.

@LaBamba690 1 year ago

Excellent point.

@alwayshiking_ 1 year ago

And do you really think the Japanese didn't deploy that?

@AeroGraphica 1 year ago

@@alwayshiking_ Well, apparently not, since it crashed after thinking for too long that it had landed, 5 km above the surface...

@drill_fiend1097 1 year ago

With AI comes the need for more processing power to multiply large matrices. This increases the electrical power requirements and requires more R&D to create radiation-hardened variants of processors. The Snapdragon 801 in the Ingenuity Mars drone is probably the state-of-the-art SoC in use, but that's the same one used in the Galaxy S5 a decade ago.

@pavanshetty9806 1 year ago

I doubt we'll see AI in spacecraft any time soon unless we develop more efficient processors.

@jbirdmax 1 year ago

Enjoyed hearing you on NSF, Mr. Manley.

@snwendland 1 year ago

I seem to recall the University of Wyoming having a "Missile Guidance for Dummies" audio description of a guidance system that knows where the missile is by knowing where it isn't; it seemed pretty rock solid. I have to wonder why this method hasn't been adapted for spacecraft yet.

@frodo9649 1 year ago

It subtracts where it should be from where it wasn't.

@H-S. 1 year ago

Exactly. It would be especially helpful in this case; if the lander knew where it isn't, it would not waste fuel by trying to land as if it were just above the surface. :)

@u1zha 1 year ago

I believe that's just a sentence for lulz, engineers being purposefully obtuse. Kalman filters are exactly the "knowing" part, and a closed-loop control system is exactly the "subtracting" part.
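
The "subtracting" half of that split is an ordinary feedback loop. A minimal proportional controller holding a descent rate in lunar gravity, assuming the vertical speed is known perfectly (in a real lander it would come from the estimator, the "knowing" half); the gain, time step, and target rate are illustrative assumptions:

```python
G_MOON = 1.62  # m/s^2


def hold_descent_rate(target_rate: float, kp: float = 0.5, dt: float = 0.1, steps: int = 600) -> float:
    """Proportional control of vertical speed in lunar gravity; returns the final speed.

    Assumes the vertical speed v is known exactly; in a real lander that
    "knowing" part would come from the state estimator.
    """
    v = 0.0  # vertical speed, m/s (negative means descending)
    for _ in range(steps):
        error = target_rate - v                     # "where it should be" minus "where it is"
        thrust_acc = max(G_MOON + kp * error, 0.0)  # cancel gravity, then correct; no negative thrust
        v += (thrust_acc - G_MOON) * dt
    return v


print(f"{hold_descent_rate(-2.0):.3f} m/s")  # -2.000 m/s
```

The loop is only as good as the estimate it subtracts from: feed it a height or speed that is wrong by kilometres and it will steer precisely to the wrong place.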

@frodo9649 1 year ago

@@u1zha kzfaq.info/get/bejne/mMCVaK1puLyniYU.html This video is full of these sentences that are close to how control loops work, but not quite, which I find quite funny, especially if you know how it works.

@simongeard4824 1 year ago

@@u1zha Unfortunately, it also inspires a lot of morons to quote that line constantly on KZfaq, perhaps under the mistaken impression that it makes them look smart.

@fuffy66 1 year ago

"Dead reckoning" is the phrase you were looking for to describe how it tried to calculate its height after the radar was cut off.

@kendokaaa 1 year ago

Interestingly, this is the kind of problem that could also occur if you make a kOS (or kRPC) landing program in KSP that uses instruments rather than the game's data.

@tertiaryobjective 1 year ago

Like when you're walking down the stairs and miss that last step.

@mballer 1 year ago

Why haven't they orbited GPS satellites around the moon yet? Why don't they drop transmitters to the surface first as beacons?

@TheMonthlyJack 1 year ago

The Moon has very few stable orbits, and they still require propellant. The moon is very lumpy and the earth tends to fling you off.

@samuraidriver4x4 1 year ago

Dropping beacons on the surface is the exact same thing as putting a lander like this down.

@jhonbus 1 year ago

@@samuraidriver4x4 To make it easier to land our probe, we will first land three probes.

@dalel3608 1 year ago

Apollo did that once using a Surveyor lander as the beacon. But that was just for position, not altitude.

@mballer 1 year ago

@@samuraidriver4x4 Really? Can you describe your design? I was thinking of a lightweight baseball-sized package on the end of a long collapsible pole; throw a dozen of them out in the hope that a few survive to do the job. Or how about a huge airbag with the probe suspended by rubber bands in the center, so there's no need for an accurate landing of the beacon. Did y'all really think I was suggesting landing a huge piece of equipment to be a beacon?

@perwestermark8920 1 year ago

Kalman filters are mentioned in the video, but they aren't so much about sensor fusion as about handling noisy sensor data.

@baylinkdashyt 1 year ago

Can KSP generate a track projection line on imaging like this? That seems like it would help in visualizing what a landing like this looks like. [Sidebar: Lunar Lander (in FOCAL on a PDP-8/e in 1976 or so) was the first computer game I ever played... on a paper teletype. I wasn't any good at it. There's a photo of an Apollo astronaut playing a port of that on a laptop that's the Best Photo Ever.]

@seann4678 1 year ago

Hi Scott, during the iSpace debriefing, they reported that their velocimeter did not start reporting data when it expected to be 2 km above the surface (event 9 in the schedule). Do you know if this is a separate issue or a consequence of being too high above the ground?

@SashaNaronin 1 year ago

Sounds like they tightened the Mahalanobis check margins in the Kalman filter. At each time step, it checks that the real measurement, the expected measurement, and the estimated measurement errors are consistent with each other. You usually hardcode the acceptable margins for that, i.e., real minus expected measurement must be < 4 times the expected error. If it isn't, the measurement is bad (e.g., the accelerometer physically fell off its mounting). Unfortunately, the margins are often set too tight. It could have been another problem, though, related to algorithms similar to simultaneous localization and mapping, but I don't have enough experience with them to judge.
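
For a single measurement, the check described above reduces to |z − ẑ| < 4σ. With several measured quantities, the same gate uses the Mahalanobis distance √(yᵀS⁻¹y) of the innovation y under the innovation covariance S. A minimal 2-D sketch; the choice of (height, vertical speed) and the covariance values are assumptions for illustration, and the 4-sigma threshold is the one quoted in the comment:

```python
import math


def mahalanobis_2d(y: tuple[float, float],
                   s: tuple[tuple[float, float], tuple[float, float]]) -> float:
    """sqrt(y^T S^-1 y) for a 2-element innovation y and 2x2 innovation covariance S."""
    (a, b), (c, d) = s
    det = a * d - b * c
    if a <= 0 or det <= 0:
        raise ValueError("S must be positive definite")
    y0, y1 = y
    # S^-1 = [[d, -b], [-c, a]] / det, expanded into the quadratic form.
    q = (d * y0 * y0 - (b + c) * y0 * y1 + a * y1 * y1) / det
    return math.sqrt(q)


def passes_gate(y, s, k_sigma: float = 4.0) -> bool:
    """Accept the measurement only if it lies within k_sigma 'standard deviations'."""
    return mahalanobis_2d(y, s) < k_sigma


# Hypothetical 2-D measurement: (height above ground in m, vertical speed in m/s).
S = ((100.0, 0.0), (0.0, 1.0))        # 10 m and 1 m/s standard deviations
print(passes_gate((30.0, 1.0), S))    # True  (distance = sqrt(10), about 3.16)
print(passes_gate((3000.0, 0.0), S))  # False (distance = 300)
```

A genuine 3 km terrain step lands hundreds of sigma outside any reasonable gate, so the threshold alone cannot tell a real cliff from a broken sensor; that decision has to come from elsewhere in the design.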

@dsewtz3139 1 year ago

Good point. However, I believe (hope?) Scott was only using it as a well-known example and they don't really use a "naive"/textbook implementation of either Kalman filters or Mahalanobis distance... 🤷🏻 I mean, the Moon's surface is NOT a hidden probabilistic distribution, so at least four dimensions could use Euclidean distance, verified against the same using surface maps, not some exhaustive search in simulation training data for planned approach vectors 🧐 ...if they need help implementing that, I actually wanted to visit a friend in Japan for some time now 🤣 (just kidding, compared to them I also don't have enough experience, but I'm 100% sure that if the control loop "broke", it was more in-depth than a wrong geometry of the probability space)

@yahoolane 1 year ago

My first feeling was that one of the issues is that they all seem to be young people. I know they are very, very smart young people, but I don't see the 'old guy' in any of the pictures: an old guy who has experience with developers and engineers.

@junaid-vc3js 1 year ago

I was hoping to learn that a line of assembly code or C code caused the error, but blaming it on testing, verification and validation makes the software coders think "phew, ‘it wasn’t us, gov’". To be honest, I just feel for the engineers involved.

@antonioloma2327 1 year ago

If they tested by simulating landings on other spots but not on the selected one, then they didn't test! This isn't a software bug but a project management issue (specifically testing). It's like "testing" your computer program on your desktop but then deploying it on a server where the faster hardware exposes a race condition that borks the system. Testing is expensive, testing is hard, but not testing the actual flight plan is dumb.

@RobertBlair 1 year ago

Software engineering does not start at the keyboard and end when the code gets sent to a testing team that is somehow not software engineering. Engineers have a responsibility to work with the testing crew to validate the test scenarios. The teams failed to run enough variations of realistic input, so inputs outside the limited sets caused a fault. Specifically, there were several bugs in the system as a whole:
1. The spacecraft is unable to land without altimeter inputs. Relying only on inertial guidance cannot be accurate enough to land, due to inherent input noise. If the altimeter signal is discarded more than X seconds before touchdown, error margins cause failure rates approaching 100%.
2. The guidance system (apparently) had no way to recover confidence in sensors.
3. The guidance system would erroneously flag valid altimeter inputs as a broken sensor.
4. Testing was not done to cover the new landing site (and yes, a senior engineer should have balked at the change).

@DishNetworkDealerNEO 1 year ago

Software is hard…

@dorsetdumpling5387 1 year ago

So, as the lander found, is the moon.

@bobbun9630 1 year ago

There are very few human activities that are as complex and as routine and yet have such potential to fail catastrophically from even the smallest of errors. So yes, it's objectively a hard thing to consistently do well.

@davidboyle1902 1 year ago

Sounds a tad like what would have happened to Apollo 11 had Neil not been ‘the computer’. Their planned landing site was a disaster waiting to happen and was why Neil had to take control of Eagle and fly it clear of a boulder field. Nice presentation and a wake-up call to the folks who think computers are infallible.