Hypervisor fatal page fault XEN 4.3.1

Monday, November 04 2013, 07:20 PM
I have a 32-core system running Xen 4.3.1 with 30 Windows XP VMs.
Dom0 is CentOS 6.3 based, with Linux kernel 3.10.16.
In my configuration, all of the Windows HVM guests are running after being restored from images created with xl save.
VMs are destroyed or restored on demand. After some time, Xen hits a fatal page fault while restoring one of the Windows HVM guests. This does not happen often, perhaps once in a 16 to 48 hour period.
The stack trace from Xen follows. Thanks in advance for any help.

(XEN) ----[ Xen-4.3.1 x86_64 debug=n Tainted: C ]----
(XEN) CPU: 52
(XEN) RIP: e008:[] domain_page_map_to_mfn+0x86/0xc0
(XEN) RFLAGS: 0000000000010246 CONTEXT: hypervisor
(XEN) rax: 000ffffffffff000 rbx: ffff8300bb163760 rcx: 0000000000000000
(XEN) rdx: ffff810000000000 rsi: 0000000000000000 rdi: 0000000000000000
(XEN) rbp: ffff8300bb163000 rsp: ffff8310333e7cd8 r8: 0000000000000000
(XEN) r9: 0000000000000000 r10: 0000000000000000 r11: 0000000000000000
(XEN) r12: ffff8310333e7f18 r13: 0000000000000000 r14: 0000000000000000
(XEN) r15: 0000000000000000 cr0: 0000000080050033 cr4: 00000000000426f0
(XEN) cr3: 000000211bee5000 cr2: ffff810000000000
(XEN) ds: 0000 es: 0000 fs: 0000 gs: 0000 ss: 0000 cs: e008
(XEN) Xen stack trace from rsp=ffff8310333e7cd8:
(XEN) 0000000000000001 ffff82c4c01de869 ffff82c4c0182c70 ffff8300bb163000
(XEN) 0000000000000014 ffff8310333e7f18 0000000000000000 ffff82c4c01d7548
(XEN) ffff8300bb163490 ffff8300bb163000 ffff82c4c01c65b8 ffff8310333e7e60
(XEN) ffff82c4c01badef ffff8300bb163000 0000000000000003 ffff833144d8e000
(XEN) ffff82c4c01b4885 ffff8300bb163000 ffff8300bb163000 ffff8300bdff1000
(XEN) 0000000000000001 ffff82c4c02f2880 ffff82c4c02f2880 ffff82c4c0308440
(XEN) ffff82c4c01d0ea8 ffff8300bb163000 ffff82c4c015ad6c ffff82c4c02f2880
(XEN) ffff82c4c02cf800 00000000ffffffff ffff8310333f5060 ffff82c4c02f2880
(XEN) 0000000000000282 0010000000000000 0000000000000000 0000000000000000
(XEN) 0000000000000000 ffff82c4c02f2880 ffff8300bdff1000 ffff8300bb163000
(XEN) 000031a10f2b16ca 0000000000000001 ffff82c4c02f2880 ffff82c4c0308440
(XEN) ffff82c4c0124444 0000000000000034 ffff8310333f5060 0000000001c9c380
(XEN) 00000000c0155965 ffff82c4c01c6146 0000000001c9c380 ffffffffffffff00
(XEN) ffff82c4c0128fa8 ffff8300bb163000 ffff8327d50e9000 ffff82c4c01bc490
(XEN) 0000000000000000 ffff82c4c01dd254 0000000080549ae0 ffff82c4c01cfc3c
(XEN) ffff8300bb163000 ffff82c4c01d6128 ffff82c4c0125db9 ffff82c4c0125db9
(XEN) ffff8310333e0000 ffff8300bb163000 000000000012ffc0 0000000000000000
(XEN) 0000000000000000 0000000000000000 0000000000000000 ffff82c4c01deaa3
(XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN) 000000000012ffc0 000000007ffdf000 0000000000000000 0000000000000000
(XEN) Xen call trace:
(XEN) [] domain_page_map_to_mfn+0x86/0xc0
(XEN) [] nvmx_handle_vmlaunch+0x49/0x160
(XEN) [] __update_vcpu_system_time+0x240/0x310
(XEN) [] vmx_vmexit_handler+0xb58/0x18c0
(XEN) [] pt_restore_timer+0xa8/0xc0
(XEN) [] hvm_io_assist+0xef/0x120
(XEN) [] hvm_do_resume+0x195/0x1c0
(XEN) [] vmx_do_resume+0x148/0x210
(XEN) [] context_switch+0x1bc/0xfc0
(XEN) [] schedule+0x254/0x5f0
(XEN) [] pt_update_irq+0x256/0x2b0
(XEN) [] timer_softirq_action+0x168/0x210
(XEN) [] hvm_vcpu_has_pending_irq+0x50/0xb0
(XEN) [] nvmx_switch_guest+0x54/0x1560
(XEN) [] vmx_intr_assist+0x6c/0x490
(XEN) [] vmx_vmenter_helper+0x88/0x160
(XEN) [] __do_softirq+0x69/0xa0
(XEN) [] __do_softirq+0x69/0xa0
(XEN) [] vmx_asm_do_vmentry+0/0xed
(XEN) Pagetable walk from ffff810000000000:
(XEN) L4[0x102] = 000000211bee5063 ffffffffffffffff
(XEN) L3[0x000] = 0000000000000000 ffffffffffffffff
(XEN) ****************************************
(XEN) Panic on CPU 52:
(XEN) [error_code=0000]
(XEN) Faulting linear address: ffff810000000000
(XEN) ****************************************
(XEN) Reboot in five seconds...
(XEN) Resetting with ACPI MEMORY or I/O RESET_REG.
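
For anyone reading the trace, here is a minimal sketch of how the "Pagetable walk" indices and the error code can be decoded by hand. It assumes standard x86-64 4-level paging (9 index bits per level, 4 KiB pages) and the architectural page-fault error-code bit layout. The function names are made up for illustration and are not part of Xen.

```python
# Sketch only: decode a faulting linear address and a page-fault error code.
# Assumes x86-64 4-level paging; helper names are hypothetical, not Xen APIs.

def pagetable_indices(linear_addr):
    """Split a linear address into its L4..L1 page-table indices (9 bits each)."""
    return {
        "L4": (linear_addr >> 39) & 0x1FF,
        "L3": (linear_addr >> 30) & 0x1FF,
        "L2": (linear_addr >> 21) & 0x1FF,
        "L1": (linear_addr >> 12) & 0x1FF,
    }

def decode_pf_error_code(code):
    """Interpret the low bits of an x86 page-fault error code."""
    return {
        "present": bool(code & 0x01),            # 0 = page not present
        "write": bool(code & 0x02),              # 0 = read access
        "user": bool(code & 0x04),               # 0 = supervisor mode
        "reserved_bit": bool(code & 0x08),
        "instruction_fetch": bool(code & 0x10),
    }

fault_addr = 0xFFFF810000000000   # "Faulting linear address" from the log
idx = pagetable_indices(fault_addr)
err = decode_pf_error_code(0x0000)  # "[error_code=0000]" from the log

print({level: hex(value) for level, value in idx.items()})
print(err)
```

The computed L4 and L3 indices (0x102 and 0x000) match the walk printed in the log, and error code 0000 decodes to a supervisor-mode read of a not-present page, which agrees with the L3 entry being zero. The log also shows the L4[0x102] entry pointing at the same frame as cr3.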

Wednesday, November 06 2013, 05:27 PM - #permalink
The discussion is continuing at http://lists.xenproject.org/archives/html/xen-devel/2013-11/msg00340.html, so I will mark this one as resolved.

For future reference on raising bugs, see wiki.xenproject.org/wiki/Reporting_Bugs_against_Xen
    Monday, November 04 2013, 07:57 PM - #permalink
    Hi, I cross-posted to http://lists.xenproject.org/archives/html/xen-devel/2013-11/msg00340.html
    You may want to follow that thread. I will include you on it if you PM me your e-mail address. These are the questions that came back:

    Which version of Xen were these images saved on?

    Are you expecting to be using nested-virt? (It is still very definitely experimental)
    Monday, November 04 2013, 09:04 PM - #permalink
    We were careful to regenerate all the images after upgrading to 4.3.1. We also saw the same problem on 4.3.0.
    Not using nested-virt.
