|
Post by icefisher on Jul 20, 2009 7:02:39 GMT
[quote]V&V is not a program. It's simply the two words "Verification and Validation" and how software companies do each is up to them. What you are demanding is external auditing, not V&V.[/quote]

Where do you get this garbage, Socold? Do you just spout it out? V&V is a program and so is auditing. Audit programs and V&V programs are custom designed. In fact the trade calls their procedures "the audit program" and it's different for every audit. And I didn't say anything about external auditing. Internal auditing does pretty much the same thing as external auditing. The only difference is that external auditors have their fingers on the pulse of the industry and tend to be a little more knowledgeable about problem areas. Other than that they are both pretty independent (internal auditors are usually hired by the same people that hire the external auditors, namely the BOD), so internal auditors are often independent of management.

You are like a babble machine, Socold. You have an opinion on everything. I bet you can tell us where in the universe there is a planet ruled by turtles. Can't you? Come on, don't be modest now.

[quote]I am also saying that external auditing is pointless in this situation because the product is implemented multiple times by multiple independent teams anyway, therefore alleviating the possibility of fraud and bugs.[/quote]

Of course the corollary to that is that you don't need to audit banks, because they have such heavy investments in credit default swaps there is no way they could go under, and we know that CDSs have been audited by huge numbers of independent audit teams and are as sound as a bug in a rug. Obviously QC is not your strong suit.
|
|
|
Post by steve on Jul 20, 2009 8:44:21 GMT
Icefisher,
Why are you getting all hot under the collar when you have no idea what level of internal and external "auditing", "verification" and "validation" the models actually undergo?
Popularity may not be perfect, but socold is not talking about popularity. He is talking about independent replication of results by competitive organisations. Yes, there are always questions to be asked, but the success or otherwise of models at hindcasting 20th century temperatures really is not the touchstone of a good model.
Comparing climate modelling with an industry where rating agencies were paid to value paper *by the owners* of the paper is invalid, particularly as even forensic accountancy could not accurately work out the value of some of the contracts till they'd unwound. And a fact you've skated over, but I'll repeat, is that engineering studies often *do not* get any closer to the result than climate models do. They build uncertainty into their predictions and designs.
The standards in science are different from the standards in finance and mining engineering, but they're as high as or higher than those standards, and they're definitely much more open.
|
|
|
Post by socold on Jul 20, 2009 12:42:16 GMT
[quote][quote]V&V is not a program. It's simply the two words "Verification and Validation" and how software companies do each is up to them. What you are demanding is external auditing, not V&V.[/quote]Where do you get this garbage, Socold? Do you just spout it out? V&V is a program and so is auditing. Audit programs and V&V programs are custom designed. In fact the trade calls their procedures "the audit program" and it's different for every audit.[/quote]

I think you agree with my key point that they are "custom designed", i.e. in most cases a software company will decide how to do V&V rather than follow some external industry standard.

No need for analogies in this case. I think it's quite clear that multiple independent implementations will alleviate the possibility of bugs and fraud: the chance of many of the models suffering the same bug or fraud is lower than the chance of any single model suffering it, and a buggy or fraudulent model will be an outlier of the set. We could add auditors, and auditors who audit the auditors, and so on, but in this case there doesn't seem to me to be any point. If there was only one model it would make sense, but science works by independent verification rather than by auditing of work.
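To put rough numbers on that outlier point (a back-of-envelope sketch in Python; the per-model bug probability is made up for illustration, not a measurement of anything):

[code]
# Back-of-envelope: if each independently written model has an
# independent probability p of containing a given bug, the chance
# that ALL n of them contain the same bug falls off geometrically.
# p = 0.10 is invented purely for illustration.
p = 0.10

for n in (1, 5, 10, 20):
    print("%2d models: P(all share the bug) = %.1e" % (n, p ** n))
[/code]

The arithmetic leans entirely on the implementations being independent, of course; a mistake in shared theory or shared input data would not be caught this way.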
|
|
|
Post by icefisher on Jul 20, 2009 15:30:10 GMT
[quote]He is talking about independent replication of results by competitive organisations.[/quote]

Banks compete also, Steve. But it didn't stop them from a massive failure, for failing to recognize the underwriting risk that was occurring as a result of the government abandoning its traditional securities regulation role and of subsidies to poor credit risks. As I said, the prerequisite for building a model is believing you can build a model. A necessary premise for that is the belief that you understand all the physical processes surrounding global warming. Mann would not have had to reconstruct temperatures in the manner he did if that was well understood. The fact that you have 5, 10, 20, 50 teams all believing they understand what drives global temperatures isn't at all remarkable in a world with almost 7 billion people, especially considering all the incompetently handled trust-fund-baby money running around the system.

[quote]Yes, there are always questions to be asked, but the success or otherwise of models at hindcasting 20th century temperatures really is not the touchstone of a good model.[/quote]

Hindcasting becomes important when you have no experience with the model, because without it you have absolutely nothing.

[quote]Comparing climate modelling with an industry where rating agencies were paid to value paper *by the owners* of the paper is invalid, particularly as even forensic accountancy could not accurately work out the value of some of the contracts till they'd unwound.[/quote]

Why not? You are failing to see the very close parallels here. This is about prediction and models that predict. If you think finance is more complicated than climate, fine. But I think you should have to prove that.

[quote]And a fact you've skated over, but I'll repeat, is that engineering studies often *do not* get any closer to the result than climate models do. They build uncertainty into their predictions and designs.[/quote]

Life is a risk. Understand that the risk in a bridge is reaching for commerce and prosperity. It's a bit different from reaching for poverty. Rest assured we can achieve whatever we reach for, because the minute you stop believing that, you are reaching for dying.

[quote]The standards in science are different from the standards in finance and mining engineering, but they're as high as or higher than those standards, and they're definitely much more open.[/quote]

That's complete rot! You sit here and claim no need for audits... and oh, by the way, engineers wear the liability for their failures. Tell me where the accountability is in science. It's a bunch of tenured folks with gold-plated retirement plans and absolutely no personal liability. You can claim gold-plated standards, Steve. But rest assured, Steve, that like any law on the books without enforcement and punishment, the standard you speak of is just an illusion.
|
|
|
Post by steve on Jul 20, 2009 17:07:37 GMT
[quote][quote]He is talking about independent replication of results by competitive organisations.[/quote]Banks compete also, Steve. But it didn't stop them from a massive failure [...][/quote]

Just because banks "compete" doesn't make banks the same as climate institutions (or boxers). The failure of the attacks on the global temperature metrics is all down to this wrong-headed focus on "replication and auditing of the code" rather than on independent research and analysis. And when we do get highly publicised papers about modellers disagreeing (e.g. on decadal forecasts), somehow that is a criticism, and not evidence that the science process is working well.

The principal argument is not about risk-reward. It is about bogus claims that mining and engineering studies are somehow better than climate studies. The fact is they all assess and deal with relatively large risks and uncertainties.

[quote][quote]The standards in science are different from the standards in finance and mining engineering, but they're as high as or higher than those standards, and they're definitely much more open.[/quote]That's complete rot! You sit here and claim no need for audits... and oh, by the way, engineers wear the liability for their failures. Tell me where the accountability is in science. [...][/quote]

Strawman, strawman. The level of "audit" is good - I'm not claiming "gold-plated" or any other sort of excessive adjective. Anyone with a bit of resource can carry out research on many of the climate models that are in wide use. Yes, one can probably trundle through a 40-year career in the backwoods of climate science, but reputations are made by novel science that is not undermined or shown to be downright wrong. Bankers have up to now received instantaneous rewards for bad work; they're the exception. If you were a 30-year-old climate scientist who believed in global cooling and believed that the models were soon going to be shown to be rubbish, would you bet your pension on a career in global warming?
|
|
|
Post by icefisher on Jul 20, 2009 17:38:52 GMT
[quote][...] If you were a 30-year-old climate scientist who believed in global cooling and believed that the models were soon going to be shown to be rubbish, would you bet your pension on a career in global warming?[/quote]

So you are measuring this by the attitudes of the greenhorns coming out of the universities and choosing a career path, huh? LOL! First of all, these kids are indoctrinated; second of all, by the time they get an advanced degree they are heavily invested already. The bottom line is that the only place for a real job for these guys is government or teaching institutions. This is not the cream of the crop to begin with.

Finally, I only want two responses from you. Do you believe a standard without consequences to enforce it is a valuable standard? And in what way does a government employee or a tenured professor bet his pension?

As far as bankers being rewarded, that was pure BS and fraud on the part of the government that chose to bail them out. What it was was hush money to keep them from screaming about the cause. You can bet every one of them took the opportunity to move into a quiet dark corner with a bag of money and voluntarily put the muzzle on.
|
|
|
Post by steveeasterbrook on Jul 20, 2009 17:59:23 GMT
[quote]NASA software standards are by far the highest in the world. And for good reason. When you have a one-off shot to get a multimillion-dollar spacecraft right, you try to be very, very careful indeed.[/quote]

Hello people! I heard my name mentioned, so I thought I'd pop in and join the conversation. It's interesting that folks here should be asking about V&V of climate models, because that's exactly what I'm doing. As Steve mentioned, I've worked on the software V&V projects for the Shuttle, Space Station, and various planetary probes. I'm now examining climate models.

The shuttle software is reputedly the most expensive (per line of code) in the world: $35 million per year. If you want to offer that much funding for software development to climate modeling centres, I'm sure they'd be delighted. Commercial software, e.g. for the banking sector, tends to have around 1 error for every 100 lines of code. (I guess those claiming to be "auditors" on this board aren't doing their job very well.) The shuttle software is two orders of magnitude better: about 1 error per 10,000 lines of code (the shuttle flight software is about 400,000 LOC altogether).

Even more interestingly, my preliminary measurement of one particular climate model (from the Hadley centre) is even better than the shuttle: around 0.3 errors per 10,000 lines of code. Now, I have a lot more work to do to validate that number, and to see if other models are similar. And there are all sorts of methodological problems with comparing defect rates of different types of software, which I'd be happy to explain if you're interested. But if my initial measurements are anywhere near correct, it's extremely impressive.

How do they do it? Well, it turns out that they spend a huge amount of time doing something that every other software organisation (including NASA) skimps on: end-to-end regression testing. Every time a scientist makes a small change to the model, they run both the old and new versions of the code in full simulation mode, and check that the model exhibits the expected change in behaviour without breaking anything else. It's a scientific experiment, with proper controls and everything. It is expensive (maybe more expensive than the shuttle software), but it doesn't get measured as "expensive software development", because it gets measured as "scientists doing science". Hundreds of them, doing it every day, on some of the most expensive supercomputers in the world.

Oh, and they have very extensive Model Intercomparison Projects too, in which the outputs of models from different centres are compared in intricate detail on the same set of benchmark scenarios. Again, I know of no other software in the world for which this would be standard practice. It's remarkably effective for finding errors and understanding the software.

So, I have plenty of evidence that at least one climate modeling centre has software development practices that are more effective and more mature than NASA's flight software organisations. I'll be publishing detailed results of these studies later this year.
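In outline, one of those end-to-end regression checks looks something like the sketch below. This is a minimal illustration in Python; the names, interfaces and tolerance are all invented, not the modeling centres' actual harness.

[code]
import numpy as np

def run_model(version, scenario, seed=0):
    # Stand-in for a full simulation run of one code version.
    # In reality this would launch the model on a supercomputer;
    # here it just returns a synthetic monthly-mean series.
    rng = np.random.default_rng(seed)
    base = 14.0 + 0.001 * np.arange(120)           # invented climatology
    return base + rng.normal(0.0, 0.01, size=120)  # invented variability

def regression_test(old_version, new_version, scenario,
                    expected_delta=0.0, tol=0.05):
    # Run old and new code on the same scenario and check that the
    # new version changes the simulated climate only as expected.
    old = run_model(old_version, scenario)
    new = run_model(new_version, scenario)
    delta = float(np.mean(new - old))
    assert abs(delta - expected_delta) < tol, \
        "unexpected behaviour change: mean delta %.4f" % delta

regression_test("v1.0", "v1.0.1", scenario="20th-century-hindcast")
print("regression test passed")
[/code]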
|
|
|
Post by icefisher on Jul 20, 2009 21:57:21 GMT
[quote]Even more interestingly, my preliminary measurement of one particular climate model (from the Hadley centre) is even better than the shuttle: around 0.3 errors per 10,000 lines of code.[/quote]

Of course NASA's missile code actually has to get to where it's going. With the climate models they can land on that planet ruled by turtles and nobody would know the difference.
|
|
|
Post by Ratty on Jul 20, 2009 22:23:43 GMT
Software without coding errors doesn't guarantee a quality result ...
|
|
|
Post by sigurdur on Jul 20, 2009 22:33:37 GMT
[quote]Software without coding errors doesn't guarantee a quality result ...[/quote]

It is not the absence of coding errors that produces credible results; it is the validity of the basic hypothesis.
|
|
|
Post by steveeasterbrook on Jul 21, 2009 0:48:45 GMT
[quote]Of course NASA's missile code actually has to get to where it's going. With the climate models they can land on that planet ruled by turtles and nobody would know the difference.[/quote]

I'm sorry, I mistook this forum for one where intelligent conversation was possible. You might as well just stick your fingers in your ears and shout la la la. I shall stop wasting my time.
|
|
|
Post by slh1234 on Jul 21, 2009 1:32:10 GMT
[quote][...] So, I have plenty of evidence that at least one climate modeling centre has software development practices that are more effective and more mature than NASA's flight software organisations.[/quote]

How are floating-point errors or loss of significance accounted for? Or the drift that floating-point errors or loss of significance can contribute to subsequent mathematical operations? What floating-point data types are used? Those are the first questions I have about the process. I've wondered about this for a while, but haven't found anyone familiar with the code who could answer them.
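To make the question concrete, this is the sort of single-precision drift I mean (a self-contained toy in Python, nothing to do with any real model code):

[code]
import numpy as np

def naive_sum(xs):
    # Accumulate in single precision the straightforward way.
    s = np.float32(0.0)
    for x in xs:
        s = np.float32(s + x)
    return s

def kahan_sum(xs):
    # Kahan compensated summation: carry the rounding error forward
    # instead of losing it at every addition.
    s = np.float32(0.0)
    c = np.float32(0.0)
    for x in xs:
        y = np.float32(x - c)
        t = np.float32(s + y)
        c = np.float32((t - s) - y)
        s = t
    return s

xs = [np.float32(0.1)] * 1000000    # the exact answer would be 100000
print("naive f32:", naive_sum(xs))  # drifts visibly away from 100000
print("kahan f32:", kahan_sum(xs))  # stays very close to 100000
[/code]

If a model accumulates millions of such operations over a simulated century, the choice of data type and summation scheme plausibly matters, which is why I'm asking.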
|
|
|
Post by socold on Jul 21, 2009 1:41:29 GMT
[quote]How are floating-point errors or loss of significance accounted for? [...][/quote]

That's a good question.
|
|
|
Post by poitsplace on Jul 21, 2009 4:46:54 GMT
[quote]Even more interestingly, my preliminary measurement of one particular climate model (from the Hadley centre) is even better than the shuttle: around 0.3 errors per 10,000 lines of code. [...] But if my initial measurements are anywhere near correct, it's extremely impressive.[/quote]

I believe you're missing the part of the concept of "validation" we're critical of here. If someone reprograms a car's computer so all it does is play a tune, it makes no difference whether it's full of errors or completely without errors; it's still completely useless as a car's computer.

Modeling climate without a REAL understanding of what's going on is like you trying to model my personal finances under these conditions (see the toy calculation at the end of this post):

- You only know what my monthly pay is on average.
- You don't know what all of my expenses are.
- Of the expenses you do know, you only have estimates, accurate to +/-25%.
- For each of those estimates there's about a 10% chance the estimate is so wrong it may even involve a change in sign.

They've recently realized aerosols (released by man) cause strong warming in the arctic and may even cause some warming in other areas. Overall they might actually be positive. This is a huge change in the models and would dramatically change the sensitivity to CO2 once the models were adjusted.

We also don't know the full role of the sun. While the TSI may vary by a known amount, there are dramatic changes in the sun's magnetic field, solar wind and spectrum. For instance, the overall output of the sun only drops by 0.1%... but UV output drops several percent, so the loss of UV is greater than the drop in the sun's overall output. Most of the deficit is emitted as visible light... of which 36% is reflected back into space (as opposed to 100% absorption when it's UV).

And then we've got that secondary issue people seldom think about. The interactions with the sun are almost entirely one-way. While the sun may have a weak "direct" influence, its regular cycles would provide a synchronizing force for many of the longer-term climate cycles here on earth. Do you think the models accurately model this?

This is not like crash-test simulation software, where we can actually crash cars repeatedly to hone the models. Even if the models were tweaked to hindcast to some very rough "precision", they still might have the relationships wrong. We only have the one example of past history. No climate model has ever been validated.
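Here is that toy calculation: a Monte Carlo of the budget analogy above, with every number invented, just to show how wide the spread gets under those conditions.

[code]
import numpy as np

rng = np.random.default_rng(42)
trials = 100000

income = 3000.0                                     # known "on average"
estimates = np.array([800.0, 600.0, 400.0, 300.0])  # invented expenses

# Each estimate is off by up to +/-25%, and with 10% probability the
# term is so wrong it even has the opposite sign.
scale = rng.uniform(0.75, 1.25, size=(trials, estimates.size))
sign = np.where(rng.random((trials, estimates.size)) < 0.10, -1.0, 1.0)
balance = income - (estimates * scale * sign).sum(axis=1)

lo, hi = np.percentile(balance, [5, 95])
print("central estimate: %7.0f" % (income - estimates.sum()))
print("90%% spread:      %7.0f .. %7.0f" % (lo, hi))
[/code]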
|
|
|
Post by steve on Jul 21, 2009 9:05:20 GMT
Thanks for dropping by, Steve Easterbrook, but I'm kind of embarrassed now that you found the link from here to your site and got involved.

Well, I for one found it interesting that an independent "audit", or whatever you might call it, comes up with a lower rate of errors than NASA software, and it seems to fit with my observation that you can iron out bugs by repeated testing just as well as you can iron them out by high-quality V&V procedures in the "one-shot" scenario, in which you can't run in full production mode till the critical moment. It'd be nice for people to acknowledge that maybe these models *are* adequately verified and audited before moving on to the question of whether they are validated as well.

One method of validation is comparing a model against a chosen set of observations in a different set of scenarios (e.g. Pinatubo, winter/summer differences, the 1998 El Niño). Another is what socold said: comparing with results from different organisations who are doing similar work. In both cases, given that a model is not and cannot be a perfect representation of the earth, a high degree of judgement is required as to which observations are useful, and peer review is one way of getting that judgement.
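At its very simplest, that first kind of validation is just scoring the hindcast against the chosen observations, something like the sketch below (the anomaly numbers are placeholders, not real data):

[code]
import numpy as np

# Hypothetical annual temperature anomalies (deg C) around a
# Pinatubo-like eruption. A real check would use an observed series
# (e.g. HadCRUT) and the model's hindcast for the same years.
observed = np.array([0.15, 0.20, -0.10, -0.05, 0.10, 0.18])
modelled = np.array([0.12, 0.22, -0.14, -0.02, 0.08, 0.20])

rmse = np.sqrt(np.mean((modelled - observed) ** 2))
bias = np.mean(modelled - observed)
print("RMSE: %.3f C, bias: %+.3f C" % (rmse, bias))
[/code]

Whether that score counts as "good enough" is exactly where the judgement and the peer review come in.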
|
|