Reliability of GCM's and related |

curiousgeorge
Level 5 Rank

Posts: 987

Reliability of GCM's and related Apr 5, 2010 11:54:49 GMT

Quote

Post by curiousgeorge on Apr 5, 2010 11:54:49 GMT

Apr 5, 2010 2:45:11 GMT magellan said:

Apr 4, 2010 15:12:30 GMT curiousgeorge said:

Perhaps this will help you: www.easterbrook.ca/steve/?p=1558

Be sure to read Steve Easterbrook's bio in case you have questions about his qualifications. There is a great deal of information on his site that pertains to this specific issue.

NASA has had a software V&V requirement for many years. Why hasn't it been applied to climate models? Hmmm?
www.goes-r.gov/procurement/flight_documents/NASA-STD-8739-8-1.pdf

Perhaps this is why - www.zdnet.co.uk/news/it-strategy/2010/01/28/are-we-taking-supercomputing-code-seriously-40004192/

In particular the last 2 paragraphs.

Someone remarked to me recently that the problem with scientific software is that most of it is written by amateurs. Harsh perhaps, but it got me thinking. The point behind the remark is that most of the software used for simulation in scientific research, especially on supercomputers, is written by scientists rather than by professional numerical software engineers.

By implication, this state of affairs might be responsible for much of what some people see as the mess we are in with respect to assurance of results from the models and portable performance of the codes. The same argument might also be extended to engineering packages and data modelling.

Immediate need
A scientist writing software for research is obviously focused on creating code that is good enough to get a useful result in a reasonable timeframe. It is hard for individual scientists and institutions to spend the extra effort in time and money to look beyond the immediate need and make sure the software meets certain standards.

Ideally, they should examine whether the implementation has been rigorously tested, has specified areas of assured validity, and allows for potential future use. For example, is the software portable — both in terms of performance and robustness — and extendable?

Once the researcher has a piece of code that gives a believable result for the parameter space of immediate interest, the focus switches back to using the code for science rather than adding engineering quality to the code.

The idea of building in comprehensive software engineering from the start in the code itself and in the development and testing process will often be dismissed before it gets serious consideration. From a scientist's viewpoint, such an approach would look like designing the software, and then adding in the science at the last stage.

Rush to do science
Part of the problem is that in their rush to do science, scientists fail to spot the software for what it is: the analogue of the experimental instrument. Consequently, it needs to be treated with the same respect that a physical experiment would receive.

Any reputable physical experiment would ensure the instruments are appropriate to the job and have been tested. They would be checked for known error behaviour in the parameter regions of study, and chosen for their ability to give a satisfactory result within a useful timeframe and budget. Those same principles should apply to a software model.

socold
Level 5 Rank

Posts: 3,723

Reliability of GCM's and related Apr 5, 2010 13:39:06 GMT

Quote

Post by socold on Apr 5, 2010 13:39:06 GMT

I bet climate models are better tested and quality controlled compared to most scientific software. There are also different teams implementing the same systems independently - not many software systems can claim to enjoy that level of redundancy and independent validation!

Climate models can be compared against each other and against hundreds of aspects of observed climate. This represents a massive bank of regression tests that will help prevent bugs being introduced. There are also independently organized test sets which climate model developers apply to take part in, the objective being sheer competition - to see which models better represent various aspects of observed climate.

Climate models are also heavily developed over time, they are extensible with extra components being added over time - even with separate models being merged together (eg carbon cycle model being inserted into a GCM). Given all the above any climate model that was insufficiently software engineered would fall by the wayside and be overtaken by those that are better developed. So there is an active pressure for good software engineering practices in the field.

curiousgeorge
Level 5 Rank

Posts: 987

Reliability of GCM's and related Apr 6, 2010 1:52:33 GMT

Quote

Post by curiousgeorge on Apr 6, 2010 1:52:33 GMT

Steve Easterbrook has a list of climate models here: www.easterbrook.ca/steve/?p=667 . I checked a few of the links, and not much info about the software engineering/reliability/quality control of these models. Mostly focused on advertising new features, which as any software engineer knows, tends to result in new problems for a variety of reasons.

CCSM4 www.ccsm.ucar.edu/csm/models/ccsm4.0/ at least provided release notes and a test record - www.ccsm.ucar.edu/models/ccsm4.0/tags/ccsm4_0_rel/ - kudos for that much anyway. The others, nothing that I could find; some went to a 404 error page.

Overall, I'm not reassured that the many different models and the updates to them (which happen on a variety of schedules ) are properly managed (configuration control, etc. ) in a way that would provide confidence in their outputs.

There are some statements on some of the model web pages that "plug-in" modules have been incorporated in the model software. I'm very uneasy with this since potential problems may have been introduced via modules of unknown pedigree or from unknown sources. Problems could manifest themselves in any number of ways; similar to the sort of issue's a browser can have with plugins such as Flash, or others. Although it is currently fashionable to sing the praises of Open-Source software, this practice when applied to this level of software , does not inspire confidence. Especially given the downstream social, political and economic consequences this has.

Some will allow download of source code, some won't. Enjoy.

PS: It's understandable that these models are still "in-work" in many cases, but as a corollary; Who among us would fly in a plane in which the flight control software was still in beta-test?

Last Edit: Apr 6, 2010 2:01:22 GMT by curiousgeorge

scpg02 Level 5 Rank Posts: 2,072	Reliability of GCM's and related Apr 6, 2010 4:57:01 GMT Quote Select Post Deselect Post Link to Post Member Give Gift Back to Top Post by scpg02 on Apr 6, 2010 4:57:01 GMT Apr 6, 2010 1:52:33 GMT curiousgeorge said: Mostly focused on advertising new features, which as any software engineer knows, tends to result in new problems for a variety of reasons. LOL from a joke about engineers:
	When government gains the power to control the use of private property, it becomes possible for the politically dominant to profit by high commodity prices using government regulation to constrain supply. One merely drives competitors out of business by manipulating the perception of risk to a land use preferred by a democratic majority. - Mark Edward Vande Pol

poitsplace
Level 5 Rank

Posts: 1,296

Reliability of GCM's and related Apr 6, 2010 5:48:56 GMT

Quote

Post by poitsplace on Apr 6, 2010 5:48:56 GMT

Apr 5, 2010 13:39:06 GMT socold said:

I bet climate models are better tested and quality controlled compared to most scientific software. There are also different teams implementing the same systems independently - not many software systems can claim to enjoy that level of redundancy and independent validation!

Climate models are among the LEAST verified. Do not ever presume that just because there aren't a lot of errors in the code that cause crashing or oddball behavior...that the underlying concepts of the models are intact. A video game might not crash but the world it models has nothing at all to do with the REAL world (at least, I haven't seen any zombies about).

Climate models can be compared against each other and against hundreds of aspects of observed climate. This represents a massive bank of regression tests that will help prevent bugs being introduced. There are also independently organized test sets which climate model developers apply to take part in, the objective being sheer competition - to see which models better represent various aspects of observed climate.

Again you mix up software errors with errors in the basic concepts behind the models. You cannot verify a model with other models...I'm sorry but you're just an idiot if you believe this. They just released my favorite video game on the Mac...and STILL there are no zombies in the real world. By your logic, if there are two or more versions of the same simulation there MUST be zombies (if the simulation is of zombies).

Reality is simply...inconvenient.

We had an underlying warming trend. Is our warming that continued warming? Is our warming entirely CO2? Is it a mix of the two?

We had a change of the ocean currents right when there was cooling...as well as an increase in industrial output...both starting at the end of WWII. Was there suddenly a MASSIVE increase in aerosol output that caused cooling? Was it almost exclusively the PDO? Was it a mix of the two?

Then...darn it...we had ANOTHER change in the PDO right around the time we pushed newer clean air standards. Was the warming from the PDO and other currents? Was it form the loss of aerosol cooling? Was it some of both? What about that pre-existing warming trend?

There is just no way the basic premise of the models can be verified. That any "scientist" would continue to assert their validity should be a warning flag.

steve
Level 5 Rank

Posts: 3,643

Reliability of GCM's and related Apr 6, 2010 10:40:29 GMT

Quote

Post by steve on Apr 6, 2010 10:40:29 GMT

Steve Easterbrook's preliminary validation of one major climate model identified 0.3 errors per 10,000 lines of code. Space Shuttle software has around 3 times as many errors per 10,000 lines of code according to Easterbrook. He reported that here, but was sent away with a flea in his ear.

solarcycle24com.proboards.com/index.cgi?action=gotopost&board=globalwarming&thread=735&post=24666

Obviously, given the complexity of the problem, identifying that a model is doing what you intended it to do is just a first step. But surely even a headline figure that suggests such a low error level should undermine any claim that the software standards are uniformally bad.

Poitsplace claims it is impossible to verify a model. I don't know what he means by that since most models in many areas of science are currently unverifiable yet produce successful predictions. Demanding that GCMs accurately forecast a certain weather phenomenon when we know that the supercomputers are not powerful enough to reach the required resolution and that the observations are not good enough to initialise such a model is simply an attempt to set the bar impossibly high.

The converse argument is not to allow any major changes to atmospheric content until an engineering quality study is complete. I think I will stay away from mining analogies today, but hopefully you will get my drift.

If we could produce a model that was as good (or bad) at simulating the weather and climate as the current crop, yet demonstrated feedbacks substantially less than those existing GCMs, then that would be interesting.

poitsplace
Level 5 Rank

Posts: 1,296

Reliability of GCM's and related Apr 6, 2010 11:03:21 GMT

Quote

Post by poitsplace on Apr 6, 2010 11:03:21 GMT

I said "You cannot verify a model with other models...I'm sorry but you're just an idiot if you believe this."

If you write two models based on the same hypothesis...they may not be identically coded but they should get the same results (or fairly close). But if your hypothesis is crap...then both of the models will be wrong in spite of the fact that they were in agreement. I used an example of a video game written for two different platforms. The fact that both model an imaginary world does not mean their results (which match extremely well) apply to the REAL world.

ALSO
Verify (verb) to confirm the truth of.
Validate (verb) to prove valid; show or confirm the validity of something.

The models have not been verified or validated. They do terribly in the short term and if it's medium to long term the predictions have not come to pass. The models are NOT validated or verified and I'll thank you to not make this assertion again as you are now aware that it is without any shadow of a doubt...a lie

curiousgeorge
Level 5 Rank

Posts: 987

Reliability of GCM's and related Apr 6, 2010 12:14:11 GMT

Quote

Post by curiousgeorge on Apr 6, 2010 12:14:11 GMT

It seems that folks tend to focus on coding errors. Coding errors are only one of several issues, and can be equated to typo's. Buffer overflow's, logic and syntax errors, security holes, and relational anomalies in databases are much more difficult to track down and eliminate. Proper design and documentation (including version control ) can prevent much of this, and that is what seems to be lacking. A software development process that would provide confidence in that software would follow IEEE/EIA 12207 or similar standards. An overview of this standard which replaced the earlier MIL-STD-498 can be found here: sepo.spawar.navy.mil/SW_Standards.html . As can be seen, it encompasses far more than a simple "bug hunt" .

Last Edit: Apr 6, 2010 12:47:16 GMT by curiousgeorge

steve
Level 5 Rank

Posts: 3,643

Reliability of GCM's and related Apr 6, 2010 12:20:32 GMT

Quote

Post by steve on Apr 6, 2010 12:20:32 GMT

I said "You cannot verify a model with other models...I'm sorry but you're just an idiot if you believe this."

If you write two models based on the same hypothesis...they may not be identically coded but they should get the same results (or fairly close). But if your hypothesis is crap...then both of the models will be wrong in spite of the fact that they were in agreement. I used an example of a video game written for two different platforms. The fact that both model an imaginary world does not mean their results (which match extremely well) apply to the REAL world.

True. But you have ignored the fact that the models *have* been validated against the real world (which is not the same as saying that they are perfect representations of the real world). As you said in your follow-up post you are unhappy with the level of validation. But you seem not to understand that 98% of model validation is done before CO2 levels in the model are increased. The models are validated against climatology, not against global warming. Good examples of the 2% of validation done after changes in atmospheric components would be projections of warming done in the 1970s and 1980s that were followed by sequentially warmest decades, cooling following Pinatubo, stratospheric cooling and increases in humidity. Though for many phenomena, the climatological changes are hard to determine due to poorer levels observations in the past.

Technically, verification is shorthand for proving that you've followed good procedures to convert the plan or theory into good quality code. The error levels in the code measured by Steve Easterbrook suggest that the models are well-verified - the projected warming might be incomplete science but is not a memory leak.

steve
Level 5 Rank

Posts: 3,643

Reliability of GCM's and related Apr 6, 2010 12:39:17 GMT

Quote

Post by steve on Apr 6, 2010 12:39:17 GMT

Apr 6, 2010 12:14:11 GMT curiousgeorge said:

It seems that folks tend to focus on coding errors. Coding errors are only one of several issues, and can be equated to typo's. Buffer overflow's, logic and syntax errors, security holes, and relational anomalies in databases are much more difficult to track down and eliminate. Proper design and documentation (including version control ) can prevent much of this, and that is what seems to be lacking.

Errors in climate models can be and are detected by running them repeatedly, varying input parameters and varying computing platform (eg. the climateprediction.net project involved running a climate model on a PC which I expect has a different configuration to many supercomputers!)

My own experience of building models (not climate models) is that you rely as much on a physics-based understanding of what should or should not happen happen which gives you lots of ideas of how to test the model to make it show its bugs.

Most of the current crop of models have evolved over many years and many generations of supercomputer technology. This sort of testing would identify buffer overflows, logic and syntax errors, and so forth which I assume would be included in the 0.3 errors per 10,000 LOC figure that was given.

www.cs.toronto.edu/~sme/papers/2008/Easterbrook-Johns-2008.pdf

curiousgeorge
Level 5 Rank

Posts: 987

Reliability of GCM's and related Apr 6, 2010 13:00:32 GMT

Quote

Post by curiousgeorge on Apr 6, 2010 13:00:32 GMT

Apr 6, 2010 12:39:17 GMT steve said:

Apr 6, 2010 12:14:11 GMT curiousgeorge said:

It seems that folks tend to focus on coding errors. Coding errors are only one of several issues, and can be equated to typo's. Buffer overflow's, logic and syntax errors, security holes, and relational anomalies in databases are much more difficult to track down and eliminate. Proper design and documentation (including version control ) can prevent much of this, and that is what seems to be lacking.

Errors in climate models can be and are detected by running them repeatedly, varying input parameters and varying computing platform (eg. the climateprediction.net project involved running a climate model on a PC which I expect has a different configuration to many supercomputers!)

My own experience of building models (not climate models) is that you rely as much on a physics-based understanding of what should or should not happen happen which gives you lots of ideas of how to test the model to make it show its bugs.

Most of the current crop of models have evolved over many years and many generations of supercomputer technology. This sort of testing would identify buffer overflows, logic and syntax errors, and so forth which I assume would be included in the 0.3 errors per 10,000 LOC figure that was given.

www.cs.toronto.edu/~sme/papers/2008/Easterbrook-Johns-2008.pdf

I should have waited a few minutes before editing my last, so I'll repeat it here as a courtesy: " A software development process that would provide confidence in that software would follow IEEE/EIA 12207 or similar standards. An overview of this standard which replaced the earlier MIL-STD-498 can be found here: sepo.spawar.navy.mil/SW_Standards.html . As can be seen, it encompasses far more than a simple "bug hunt" . "

Simply because a system has been in development for many years is not evidence of it's quality - MS Windows for example.

If those models have been developed with the above standard or similar adhered to, then it would be to the developers benefit to say so, and provide documentation of same. I haven't been able to find such documentation (of adherence to best practices/standards ), so if you have please share.

steve
Level 5 Rank

Posts: 3,643

Reliability of GCM's and related Apr 6, 2010 13:50:03 GMT

Quote

Post by steve on Apr 6, 2010 13:50:03 GMT

curiousgeorge, I did not reference to the longevity to validate the development process, I referenced it to point out that the error count of 0.3 errors per 10,000 more than likely would have included the sorts of errors you referred to, as they are the sorts of errors that get picked up when you move from platform to platform and from compiler to compiler. eg buffer overflows on one machine will silently overwrite another array with garbage, and on another machine will cause a fatal error.

There are many standards around described by acronyn+number, but essentially they amount to procedures for ensuring that code does what is expected, usually by ensuring, through process of recording issues, reviewing changes and doing appropriate testing, that it is has an acceptable design and contains few errors. What is your evidence that there is no acceptable procedure for climate models?

The Easterbrook paper says in its conclusions:

for climate science...model validation is routinely performed, as it is built into a systematic integration and regression testing process, with each model run set up as a controlled experiment....Climate models have a ... mature, domain-specific software development process. Hence it is hard to identify potential for radical improvements in the efficiency of what is a "grand challenge" science and software engineering problem

Last Edit: Apr 6, 2010 13:52:15 GMT by steve

curiousgeorge
Level 5 Rank

Posts: 987

Reliability of GCM's and related Apr 6, 2010 14:29:53 GMT

Quote

Post by curiousgeorge on Apr 6, 2010 14:29:53 GMT

Apr 6, 2010 13:50:03 GMT steve said:

curiousgeorge, I did not reference to the longevity to validate the development process, I referenced it to point out that the error count of 0.3 errors per 10,000 more than likely would have included the sorts of errors you referred to, as they are the sorts of errors that get picked up when you move from platform to platform and from compiler to compiler. eg buffer overflows on one machine will silently overwrite another array with garbage, and on another machine will cause a fatal error.

There are many standards around described by acronyn+number, but essentially they amount to procedures for ensuring that code does what is expected, usually by ensuring, through process of recording issues, reviewing changes and doing appropriate testing, that it is has an acceptable design and contains few errors. What is your evidence that there is no acceptable procedure for climate models?

The Easterbrook paper says in its conclusions:

for climate science...model validation is routinely performed, as it is built into a systematic integration and regression testing process, with each model run set up as a controlled experiment....Climate models have a ... mature, domain-specific software development process. Hence it is hard to identify potential for radical improvements in the efficiency of what is a "grand challenge" science and software engineering problem

I didn't say there was no evidence of acceptable procedure, I said I couldn't find any documentation to support adherence to recognized standards, which I think would be of value in making the case for believing the predicted outcomes of these models; which are being used to justify enormous expenditure and redistribution of wealth on a global scale, radical modification of living standards, population reduction, abandonment of fossil fuels, etc., etc.

If I had some model which I expected to use as justification to literally change the world, you can bet I would have every single duck I could find lined up and marching in step, and there would be a brass band advertising it. That isn't happening from what I can tell.

steve
Level 5 Rank

Posts: 3,643

Reliability of GCM's and related Apr 6, 2010 15:37:01 GMT

Quote

Post by steve on Apr 6, 2010 15:37:01 GMT

curiousgeorge,

OK. Well whether or not you could find any documentation, it appears that an external expert with experience with space flight software is more than happy with the procedures.

But given the way that models are validated (through comparison with elements of the real world), arguably the most important, and certainly the most interesting, documentation is the validation documentation which is the results published in scientific papers. The most perfectly designed and structured model is uninteresting if it predicts an ice age next Christmas. The worst designed piece of code written with lots of GOTO statements, recursive loops in one giant subroutine, that manages to predict weather and climate for the next 2 months would be very interesting, though its design would probably make adding new science very hard.

Also, do you believe the results from interplanetary space missions even though you have never seen the documentation for the space craft or their instrumentation?

poitsplace
Level 5 Rank

Posts: 1,296

Reliability of GCM's and related Apr 6, 2010 17:04:25 GMT

Quote

Post by poitsplace on Apr 6, 2010 17:04:25 GMT

Apr 6, 2010 12:20:32 GMT steve said:

Good examples of the 2% of validation done after changes in atmospheric components would be projections of warming done in the 1970s and 1980s that were followed by sequentially warmest decades, cooling following Pinatubo, stratospheric cooling and increases in humidity. Though for many phenomena, the climatological changes are hard to determine due to poorer levels observations in the past.

It is always amazing to me...even though I know what deficiency causes the problem...when people do things like this. Yes, they made a prediction for the 80s and 90s...and then at the end of the 90s they suddenly discovered the PDO and the temperature increase leveled off NOT where the CO2 forcing hypothesis said...but where an ocean-current dominated model said.

Before this period you people literally have nothing. There are literally no explanations of the numerous flip-flops of the holocene. Also, the behavior relative to CO2 during the glacial stages simply doesn't rule out CO2 forcing with absolute certainty. Most importantly it, sure as heck doesn't support it in any way.

Last Edit: Apr 6, 2010 17:05:49 GMT by poitsplace

Reliability of GCM's and related

Post by curiousgeorge on Apr 5, 2010 11:54:49 GMT

Post by socold on Apr 5, 2010 13:39:06 GMT

Post by curiousgeorge on Apr 6, 2010 1:52:33 GMT

Post by scpg02 on Apr 6, 2010 4:57:01 GMT

Post by poitsplace on Apr 6, 2010 5:48:56 GMT

Post by steve on Apr 6, 2010 10:40:29 GMT

Post by poitsplace on Apr 6, 2010 11:03:21 GMT

Post by curiousgeorge on Apr 6, 2010 12:14:11 GMT

Post by steve on Apr 6, 2010 12:20:32 GMT

Post by steve on Apr 6, 2010 12:39:17 GMT

Post by curiousgeorge on Apr 6, 2010 13:00:32 GMT

Post by steve on Apr 6, 2010 13:50:03 GMT

Post by curiousgeorge on Apr 6, 2010 14:29:53 GMT

Post by steve on Apr 6, 2010 15:37:01 GMT

Post by poitsplace on Apr 6, 2010 17:04:25 GMT