JikesRVM / RVM-687

Cattrack performance: regression report unusable

    Details

    • Type: Bug
    • Status: Closed
    • Priority: High
    • Resolution: Not A Bug
    • Affects Version/s: None
    • Fix Version/s: hg tip
    • Labels: None

      Description

For the last several weeks, cattrack has pretty much always timed out or failed when I try to get it to show me a regression report.

The server as a whole isn't down (one can follow a failure link and navigate around program output, etc.), but asking for a regression or performance report results in the request "hanging" until it times out.

I suspect some performance pathology in the queries used to construct the regression report (looking for the last 10 versions of the run?), but I don't know enough about how the system works to figure out what the problem is.

              Activity

Peter Donald added a comment -

Re: (1) I suspect this will have no significant performance impact, and given the effort required to achieve it, I wouldn't bother. If there is a significant performance impact, it is likely because of missing indexes.

Re: (2) Try putting "config.log_level = :debug" into environment.rb and restarting the server. Then hit the regression report again. This should give you timings for each individual query, which should help to isolate the problematic one. Then it should be easy to either index the columns causing the issues or rework the query to be less of a monster. (Afterwards, comment the "config.log_level ..." statement back out and restart the server again; it produces heaps of debug output and you don't want to fill up the drive with that.)
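For reference, a minimal sketch of that change, assuming Cattrack uses the standard Rails::Initializer block in config/environment.rb:

    # config/environment.rb -- minimal sketch; the surrounding initializer
    # block is assumed from a standard Rails application layout.
    Rails::Initializer.run do |config|
      # Log every SQL statement with timings so slow queries stand out.
      # Comment this back out after profiling: :debug output grows fast
      # and can fill the disk.
      config.log_level = :debug
    end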

Re: (3) - I wouldn't bother unless you plan to do significant new development. There is no huge performance gain to be had there, and the number of little things that have changed would require a fair bit of effort to get going again.

David Grove added a comment -

Thanks for the quick response, Peter.

I attached a log from a regression_report. I could be reading it wrong, but it looks like the problem queries are the ones one might suspect: the computation of new/intermittent/consistent failures by doing complex joins across the last 10 runs for all the test cases. Is that right? If so, is there anything to be done? They are taking about 0.6 seconds each, and the entire page view takes about 5 seconds.
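To make the suspicion concrete, here is the rough shape such a query might have. This is purely illustrative: the model and column names (TestRun, TestCaseExecution, test_run_id, result) are invented for the sketch, not taken from Cattrack's schema.

    # Hypothetical sketch of the suspected query shape. Pulling the last
    # 10 runs and grouping failures per test case forces a large scan
    # when each run contains thousands of test-case rows.
    recent_ids = TestRun.find(:all, :order => 'id DESC', :limit => 10).map(&:id)
    failures = TestCaseExecution.find_by_sql(
      ["SELECT test_case_id, COUNT(*) AS failure_count " +
       "FROM test_case_executions " +
       "WHERE test_run_id IN (?) AND result = 'FAILURE' " +
       "GROUP BY test_case_id",
       recent_ids])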

Daniel Frampton added a comment -

There are a bunch of different things we will ultimately need to do:

1) We can try to improve individual query performance by changing the queries and database indexes.
              2) We can pre-calculate and cache results (and/or images) for the slower queries into the tables when adding result runs.
              3) We can make the system gracefully handle a slow query (such that this doesn't kill the entire system).

              I think issue (3) is the real priority at the moment. Then I would look at (1) and (2). While there may be some low hanging fruit in (1) as Peter suggests, I think the fundamental problem we have at the moment is how the system deals with slow queries and peak demand: a slow page should time out, and not bring down the entire system. Also, it appears that our current system is single threaded; a slow result prevents even a fast cached page from being returned.
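As a starting point for (1), an index covering the columns the report filters on is the usual first step. The sketch below is a hypothetical Rails migration; the table and column names are placeholders matching the illustration earlier in the thread, not Cattrack's real schema:

    # Hypothetical migration -- table/column names are assumptions,
    # not Cattrack's actual schema.
    class AddReportIndexes < ActiveRecord::Migration
      def self.up
        # Composite index matching the report's WHERE clause, so the
        # planner can find a run's failures without a full table scan.
        add_index :test_case_executions, [:test_run_id, :result],
                  :name => 'idx_executions_run_result'
      end

      def self.down
        remove_index :test_case_executions, :name => 'idx_executions_run_result'
      end
    end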

Peter Donald added a comment -

I just had a look and can't see any easy way to tackle (1). It looks like most of the slow queries now generate a huge temp table (roughly 22 thousand largish rows) which exceeds the shared memory cache. It may be possible to tune Postgres, but I don't know how off the top of my head.
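One knob that might be relevant here (an assumption, not something verified against Cattrack's setup) is Postgres's work_mem, which bounds how much memory a single sort or hash operation may use before spilling to disk. It can be raised for just the reporting connection:

    # Hypothetical: give the report's heavy joins more sort/hash memory
    # for this connection only, then restore the server default.
    ActiveRecord::Base.connection.execute("SET work_mem = '64MB'")
    begin
      # ... run the expensive regression-report queries here ...
    ensure
      ActiveRecord::Base.connection.execute("RESET work_mem")
    end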

(3) can be tackled by RVM-307 ... it should be easy to do, but I am reluctant to tackle it as I can't monitor the situation.

(2) is probably the way forward. At one stage I had a version that did produce a nice, easily queryable OLAP-like table (see tags/pre-olap-deletion), so this could potentially be revived for performing queries against.

The generated images/HTML are already cached (see the public/results/* directories), but the page needs to be hit once. It looks like someone (maybe me?) tried to do this via wget after an import but disabled it, and the code had some bad syntax. I fixed the syntax errors and re-enabled it, so after an import the performance report should be regenerated and thus served as a static page when a human requests it. (To disable it, in case I stuffed something up, change perform_wget=true to perform_wget=false in app/services/test_run_importer.rb.)
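A rough sketch of how that warm-up hook might look; the method name, flag handling, and URL are guesses based on the description above, not the actual contents of test_run_importer.rb:

    # app/services/test_run_importer.rb (hypothetical excerpt)
    # After an import, hit the report URL once so Rails writes the
    # static copy under public/results/ before any human asks for it.
    def warm_report_cache(test_run_id, perform_wget = true)
      return unless perform_wget  # set to false to disable the warm-up
      # Quiet mode, output discarded: we only want the side effect of
      # the page cache being populated.
      system('wget', '-q', '-O', '/dev/null',
             "http://localhost/regression_report/#{test_run_id}")
    end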

David Grove added a comment -

              This particular problem has been "solved" by reducing the size of the database.

Miscellaneous improvements will be handled in other issues.


People

• Assignee: David Grove
• Reporter: David Grove
• Votes: 0
• Watchers: 0
