Integration tests on 3 top-model CAS systems

Posted: 07 Jun 2015, 22:43
by quinyu
311 indefinite integration problems were tested on TI-Nspire CX CAS, HP Prime and Casio ClassPad II emulators, with the latest OS versions available to us. The quick summary:

  • The Casio ClassPad II solved 69% of the integrals correctly, the TI-Nspire CX CAS 71% and the HP Prime 81%.
  • At a 95% confidence level (significance level 0.05), it is safe to state that the HP Prime performed significantly better than the other two calculators tested.
  • We would love it if the respective manufacturers fixed the issues.
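
The significance claim above can be sanity-checked with a rough sketch. The report's own statistical method is not described in this thread, so the two-proportion z-test below is an assumption, as are the solved counts, which are reconstructed by rounding the quoted percentages of 311. The test also treats the three results as independent samples, although all calculators faced the same problems; a paired test would need the per-problem data from the report.

```python
import math
from itertools import combinations


def two_proportion_z(successes_a, successes_b, n):
    """Two-sided z-test for two proportions, each measured on n trials."""
    p_a, p_b = successes_a / n, successes_b / n
    pooled = (successes_a + successes_b) / (2 * n)
    std_err = math.sqrt(pooled * (1 - pooled) * (2 / n))
    z = (p_a - p_b) / std_err
    # Two-sided tail probability of the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


N = 311
# ASSUMPTION: counts reconstructed from the rounded percentages in the post.
solved = {
    "HP Prime": round(0.81 * N),
    "TI-Nspire CX CAS": round(0.71 * N),
    "Casio ClassPad II": round(0.69 * N),
}

for (name_a, k_a), (name_b, k_b) in combinations(solved.items(), 2):
    z, p = two_proportion_z(k_a, k_b, N)
    verdict = "significant" if p < 0.05 else "not significant"
    print(f"{name_a} vs {name_b}: z = {z:.2f}, p = {p:.4f} ({verdict} at the 5% level)")
```

With these assumed counts, the HP Prime differs significantly from each of the other two, while the TI-Nspire CX CAS and the Casio ClassPad II cannot be told apart.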

The detailed report can be found here: http://tiplanet.org/modules/archives/download.php?id=251888

No funding of any sort was received for this ongoing test. We claim no conflict of interest. No rabbits were harmed in the procedure. Yet. ;~)

Re: Integration tests on 3 top-model CAS systems

Posted: 07 Jun 2015, 22:50
by Excale
Nice :).

You put the result in red when it was wrong or unsuccessful. Did you also count the number of wrong answers for each calculator (1+1 -> 3 is wrong; 1+2 -> 4-1 is correct, although it's not what you want)?

Re: Integration tests on 3 top-model CAS systems

Posted: 07 Jun 2015, 23:03
by quinyu
As long as the derivative of the answer given by the calculator is identical to the expression that was integrated, it was counted as correct; otherwise (or if the calculator froze/rebooted/started doing weird things) as a fail.

There is no single correct integration result (for example, you can rewrite hyperbolic functions in terms of logarithms, and that's just one case out of hundreds), but all valid results must have the same derivative. That is: given Int(f(x),x)=g(x), if f(x)-deriv(g(x),x)=0, then the result is good. Simplifications and rewrites were taken into account. If the difference does not reduce to zero, the integration is wrong. Luckily, finding a derivative (as the check requires) is much simpler and quicker than integrating (this can be proven; it is less simple over the complex numbers, but still).
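
The derivative criterion can be illustrated with a small numeric spot-check. This is only a sketch, not the procedure used for the report: it compares f(x) with a central-difference derivative of g(x) at a few sample points, which can expose a wrong antiderivative but cannot prove a correct one. The helper names, sample points and tolerance are all assumptions made for illustration.

```python
import math


def derivative(g, x, h=1e-5):
    """Central-difference approximation of g'(x)."""
    return (g(x + h) - g(x - h)) / (2 * h)


def looks_like_antiderivative(f, g, points, tol=1e-6):
    """True if f(x) - g'(x) is numerically zero at every sample point."""
    return all(
        abs(f(x) - derivative(g, x)) <= tol * max(1.0, abs(f(x)))
        for x in points
    )


# Integrand: 1/sqrt(x^2 + 1)
f = lambda x: 1 / math.sqrt(x * x + 1)

# Two differently written but equivalent antiderivatives:
g_hyperbolic = math.asinh                                   # asinh(x)
g_logarithmic = lambda x: math.log(x + math.sqrt(x * x + 1))  # ln(x + sqrt(x^2+1))

# A wrong candidate, for contrast: atan(x) has derivative 1/(x^2 + 1).
g_wrong = math.atan

points = [-2.0, -0.5, 0.3, 1.0, 4.0]
print(looks_like_antiderivative(f, g_hyperbolic, points))
print(looks_like_antiderivative(f, g_logarithmic, points))
print(looks_like_antiderivative(f, g_wrong, points))
```

The hyperbolic and logarithmic forms both pass, which is the point made above: the two results look different but share the same derivative. A symbolic check in a CAS would be needed for an actual proof.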

In some places I have used blue as well (spot them all and figure out what was meant :P)

So, to answer your question: 1+1 -> 2 was considered correct, just like 1+2 -> 4-1. In places I complained about the bulkiness of the results (don't we all?), but as long as a result was in closed form and could be shown to give the same derivative, it was accepted.

Re: Integration tests on 3 top-model CAS systems

Posted: 07 Jun 2015, 23:04
by Adriweb
Nice document indeed!

Bernard Parisse reads TI-Planet, so I'm sure he'll stumble upon this topic sooner or later, but in the meantime I'm going to share this with TI; maybe it can help improve the CAS engine :)

(one more thing : it would have been fun to add Wolfram Mathematica as another CAS engine, it probably would have obliterated all 3 calcs :P)

Re: Integration tests on 3 top-model CAS systems

Posted: 07 Jun 2015, 23:06
by Excale
I was more thinking about putting it in red when it answered with an integral.

It's a fail, but not a wrong result.

And... I really prefer to get a fail over a wrong result.

Edit:
The other fun fact is that if you combine all 3 calcs, you get a very good result. So: buy all of them :P.

Re: Integration tests on 3 top-model CAS systems

Posted: 07 Jun 2015, 23:22
by quinyu
As for TI, I don't know; I keep submitting this stuff (about once every 100 integrals covered) to TI as well as Casio (I couldn't find a mail address for HP yet). As for Wolfram Mathematica, you would be surprised. It can sometimes go very wrong. RUBI is my choice of integrator there.

As for a partial result - I still count it as a bad thing, since the calculator throws the bit that it ultimately could not solve back at us. That will stay that way.

I did the statistics check: 33 of the 311 problems were uncrackable for all three calculators. That's about a tenth of the problems. And currently I'm pretty damn fine with my FX-991DE plus - not to mention that programmable and graphing calcs are not permitted in my school, at least not in tests. No reason to invest in any of the three. Maybe later.
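
For reference, the arithmetic behind "about a tenth", and the combined success rate it implies if all three calculators are used together, can be written out as below. Only the figures 311 and 33 come from the thread; the variable names are illustrative.

```python
TOTAL_PROBLEMS = 311
UNSOLVED_BY_ALL_THREE = 33

# Problems that at least one of the three calculators integrated correctly.
solved_by_at_least_one = TOTAL_PROBLEMS - UNSOLVED_BY_ALL_THREE

unsolved_rate = UNSOLVED_BY_ALL_THREE / TOTAL_PROBLEMS
combined_rate = solved_by_at_least_one / TOTAL_PROBLEMS

print(f"Unsolved by all three: {unsolved_rate:.1%}")   # 10.6%
print(f"Solved by at least one: {solved_by_at_least_one} "
      f"of {TOTAL_PROBLEMS} ({combined_rate:.1%})")    # 278 of 311 (89.4%)
```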

Re: Integration tests on 3 top-model CAS systems

Posted: 07 Jun 2015, 23:28
by Adriweb
quinyu wrote: As for TI, I don't know; I keep submitting this stuff (about once every 100 integrals covered) to TI as well as Casio (I couldn't find a mail address for HP yet);

For TI, I (and some more people here) happen to know some TIers directly, so we can report bugs etc. directly to them (instead of going through TI-Cares etc.).
For HP, the CAS engine of the Prime is giac/xcas, which is developed by Bernard Parisse, who is a member of this forum :)
And I don't know about Casio.

By the way, maybe I skipped/didn't see it in the .pdf but did you use the student software for the Nspire tests, or an actual device ?
I have actually developed a REPL for the Nspire's CAS (even though it is not public yet), which can run several dozen (hundreds?) calculations per second. It can work either as a REPL, or take its input from a file and write the output to stdout. That would probably help for a test suite.
And I suppose that having some kind of a repl/commandline interface for giac is trivial to get.

Re: Integration tests on 3 top-model CAS systems

Posted: 07 Jun 2015, 23:36
by quinyu
Only software for all three. But since I don't like time-limited options, that's kArmTI running there for the TI. Casio and HP run the manufacturer-released emus (with Casio running virtualised). And as for hundreds of calculations per second - most of the time goes into the actual typing, so it's nice but wouldn't help me much. Thanks for mentioning it anyhow. Casio's emu lags one release behind the real deal, but since I found no way to install the new OS on it, I'm leaving it as it is for now.

Re: Integration tests on 3 top-model CAS systems

Posted: 07 Jun 2015, 23:39
by Bisam
I can see many issues in the paper:
  • Why is TI Nspire's answer for #18 not accepted ? I suppose that it is because of automatic verification... but it is perfectly correct.
  • The same for #39...
  • #50 is counted wrong for both Nspire and Classpad... when it is correct for both. Why is that ?
  • #52 and #53 are again correct for Nspire but counted as wrong
  • #68 is counted as a fail where it should be the best answer !! The other two answers are wrong when n is -1. However, the Nspire doesn't give an answer even if n is specified to be positive, for example.

I didn't run through all answers but I'd like to know the reasons for excluding some good answers...

Re: Integration tests on 3 top-model CAS systems

Posted: 07 Jun 2015, 23:45
by Adriweb
quinyu wrote:Only software for all three. But since I don't like time limited options, that's kArmTI running there for the TI.

I can't help but mention Firebird Emu, now that it's out :)

quinyu wrote:Casio and HP run the manufacturer-released emus (with Casio running virtualised.)

Note: the official software packages are actually simulators, not emulators. They [try to] reproduce the software's behaviour by compiling the source code for the desktop architecture rather than the calc's; they do not reproduce the hardware, as an emulator would.

quinyu wrote:And as for hundreds of calculations per second - most of the time goes into the actual typing, so it's nice but wouldn't help me much. Thanks for mentioning it anyhow.

Well, that would precisely allow you to keep all the tests in a .txt file. Running them all and comparing each output to the expected result would give you test results in a matter of seconds, which is infinitely faster than typing every single one by hand and comparing the output manually :P
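
A minimal sketch of such a file-driven test run might look like the following. Everything specific here is an assumption for illustration: the `expression => expected` line format, the function names, and above all `fake_cas`, a stand-in lookup table used because the Nspire REPL mentioned above is not public and no real CAS interface is being described.

```python
def parse_cases(lines):
    """Yield (expression, expected) pairs from lines like 'expr => expected'."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        expression, _, expected = line.partition("=>")
        yield expression.strip(), expected.strip()


def run_suite(lines, evaluate):
    """Run every case through evaluate() and record whether it matched."""
    results = []
    for expression, expected in parse_cases(lines):
        try:
            actual = evaluate(expression)
        except Exception as exc:  # a crash or an unsupported input counts as a fail
            actual = f"ERROR: {exc}"
        results.append((expression, expected, actual, actual == expected))
    return results


def fake_cas(expression):
    """HYPOTHETICAL stand-in for a real CAS back end (REPL, CLI, ...)."""
    table = {
        "integrate(2*x,x)": "x^2",
        "integrate(cos(x),x)": "sin(x)",
    }
    return table[expression]


# In practice these lines would be read from the .txt file.
cases = """
# expression => expected output
integrate(2*x,x) => x^2
integrate(cos(x),x) => sin(x)
integrate(exp(x^2),x) => ?
""".splitlines()

results = run_suite(cases, fake_cas)
for expression, expected, actual, ok in results:
    print("PASS" if ok else "FAIL", expression, "->", actual)
```

Plain string comparison is the weak point of this sketch: two equivalent antiderivatives can be written differently, so a real suite would more likely differentiate the returned result and compare it with the integrand instead of matching text.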