Blag

He's not dead, he's resting

Yay for git

Linux 2.6.26-rc2 wouldn’t boot on my desktop. Linux 2.6.25 worked. In the good old days, tracking down why would be a major pain in the ass. But now, a quick git bisect and fifteen reboots later, I have the exact commit: 3def3d6ddf43dbe20c00c3cbc38dfacc8586998f, also known as:

Author: Yinghai Lu
Date:   Fri Feb 22 17:07:16 2008 -0800

    x86: clean up e820_reserve_resources on 64-bit
    
    e820_resource_resources could use insert_resource instead of request_resource
    also move code_resource, data_resource, bss_resource, and crashk_res
    out of e820_reserve_resources.
    
    Signed-off-by: Yinghai Lu
    Signed-off-by: Ingo Molnar

Verifying that this really is the offender is equally easy — a quick ‘git revert’ on head and another reboot and the kernel’s working again. Now, I know nothing about what an e820 is beyond what Google tells me, but hopefully someone else will.
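
For anyone wondering why fifteen reboots is enough: bisection is a binary search over the history, so each build-and-reboot halves the range of suspect commits. Here is a minimal Python sketch of the idea, not of how git implements it. It assumes a linear history (real kernel history is a graph with merges, which git bisect handles itself), and the commit names, the history size of 20,000, and the `is_bad` test are all made up for illustration.

```python
from typing import Callable, Sequence, Tuple


def bisect_first_bad(
    commits: Sequence[str], is_bad: Callable[[str], bool]
) -> Tuple[str, int]:
    """Return (first bad commit, number of tests run).

    Assumes commits[0] is known good, commits[-1] is known bad, and
    every commit after the first bad one is bad as well.
    """
    lo, hi = 0, len(commits) - 1  # lo: known good, hi: known bad
    tests = 0
    while hi - lo > 1:
        mid = (lo + hi) // 2
        tests += 1  # stands in for one build-and-reboot cycle
        if is_bad(commits[mid]):
            hi = mid  # the breakage is at mid or earlier
        else:
            lo = mid  # the breakage is after mid
    return commits[hi], tests


# Hypothetical linear history: "c0" is the good release, "c20000" the bad one.
commits = [f"c{i}" for i in range(20001)]
FIRST_BAD = 12345  # made-up position of the offending commit

culprit, tests = bisect_first_bad(commits, lambda c: int(c[1:]) >= FIRST_BAD)
print(culprit, tests)
```

With 20,000 commits between the good and bad ends, the loop needs at most 15 tests, since 2^15 is the first power of two above 20,000.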

As much as I hate to say it, if this were Subversion I’d still be tracking down the bug. And if it were CVS, I wouldn’t’ve bothered.

11 responses to “Yay for git”

  1. Chris May 18, 2008 at 4:46 pm

    Obviously, you’re not familiar with svn-bisect: http://search.cpan.org/perldoc?svn-bisect

  2. Ciaran McCreesh May 18, 2008 at 5:57 pm

    Actually, I am. It’s just really really really really slow…

  3. Dave Witbrodt August 20, 2008 at 1:11 pm

    Are you still experiencing this regression? I am at LKML working on a fix for a regression just like this (identical, I think) and was wondering how your situation has gone since May. I use Debian, and stock kernels will boot if I add “hpet=disable”. Does this work for you? Also, are you willing to recompile kernels as we try to triage this thing to identify the root cause of the problem?

  4. Ciaran McCreesh August 20, 2008 at 1:20 pm

    Yes, it’s still broken. I’ve spoken to Yinghai Lu about the problem, but neither of us managed to get anywhere towards solving it, so I’ve taken to just reverting the commit locally.

    I’ve no objection to trying further patches etc if you think you can get anywhere. I do often need to keep the box up for several days at a time, though, so I can’t always guarantee particularly quick testing. I’ll let you know about hpet=disable when I can next reboot.

  5. Dave Witbrodt August 21, 2008 at 4:59 am

    Well, thank you for offering to help. I believe we have the same problem, even though we have much different hardware. On 2 of my 3 systems, both 3def3d6d and the very next commit have to be reverted in order to prevent the hang at boot.

    I was hoping to inject some printk()’s into your code to find out where the hang occurs; the kernel team probably would like some /proc/iomem data, most likely from a kernel that works without lockups. (You could provide the latter without a reboot.)

    I have a very long thread going at LKML since 8/4. I thought I was getting close to finally pinning down the problem today, but that is now in doubt.

    If you could post that /proc/iomem data at linux-kernel (AT) vger.kernel.org with the subject line:
    HPET regression in 2.6.26 versus 2.6.25

    I would be most grateful. Also, briefly remind everyone of the facts — you were there in May, but left having to revert the problem commit yourself in future versions of the kernel. That may or may not bring a tear to their eye, but it will underline the fact that they will be facing a major problem when 2.6.26 hits the major distros, and all sorts of complaints about 3def3d6d start flooding in.

    Thanks,
    DW

  6. Dave Witbrodt August 21, 2008 at 5:02 am

    Oops, never mind about /proc/iomem! I see that you already submitted that info back in May.

    A short post reminding them of the facts, using the subject line I mentioned, would be nice.

    THX

  7. Dave Witbrodt August 21, 2008 at 3:15 pm

    Oh, and one more thing… ;)

    I am not subscribed there, so it would be a real help if you could add my email address to the CC line, and then I can reply — putting all of the others that I have on my CC line onto yours, in case you want to make future replies.

  8. Ciaran McCreesh August 22, 2008 at 3:45 pm

    Alright, I’ve looked over the thread on marc. I’ll give hpet=disable a try at the weekend and then post details to lkml.

  9. Dave Witbrodt August 24, 2008 at 2:28 am

    I have some good news for me, and some bad news for you. These kernel hackers are really good, and Yinghai was finally able to use a “quirk” approach to solve the problem with 2 systems that were experiencing the regression.

    Once I discovered your May posts to LKML, I had hoped to get you involved quickly enough to prevent a quirks-based approach, because I suspect that the changes in the original problematic commit will affect more people than just the two of us. No way is the kernel team going to want separate quirks for each person who reports a hanging kernel!

    My advice is for you to get involved again while this is still fresh. I actually have to leave LKML ASAP because of an embarrassing problem with my email client (I was temporarily forced to use my ISP’s webmail system, but its broken header support messes up threading in their inboxes), but I made a last argument that they should consider the possibility that they will be facing dozens of quirks when 2.6.26 hits the major distros. (I hope I’m wrong, of course.)

    My hardware is radically different from yours, so I’m very sure that the patch which solves my troubles will not help you at all. Sorry I didn’t find you sooner!

  10. Dave Witbrodt August 26, 2008 at 12:56 pm

    One last update before I leave you alone.

    Ingo Molnar and Yinghai Lu continued to work on this after my last comment here. They provided a more generic solution — which works on my hardware, and *should* work on other hardware as well — so hopefully you are covered now.

    The fix should make it into 2.6.27-rc5, and I am hoping to get it into stable 2.6.26.X as well (if possible).

  11. Ciaran McCreesh August 26, 2008 at 6:25 pm

    Oh good. Looks like git head works just fine now for me, so whatever it was appears to be fixed.
