4.1.1 - Fix for a potential session initialization failure when running
against 2.6.30 or later x86_64 kernel dumpfiles whose pages have been
filtered by the makedumpfile facility. Without the patch, the
session may fail with the error message "crash: page excluded: kernel
virtual address: <address> type: cpu number (per_cpu)", but will
initialize OK if the "--zero_excluded" command line option is used.
(anderson@redhat.com)
- Added "lsmod" as a built-in alias for the "mod" command.
(anderson@redhat.com)
- Added a defensive mechanism to handle corrupt Elf32_Nhdr/Elf64_Nhdr
structures in an ELF vmcore. The code no longer presumes that all
Elf32_Nhdr/Elf64_Nhdr structure contents are legitimate; if an
invalid Elf32_Nhdr or Elf64_Nhdr structure is encountered, it is
ignored, a warning message showing the structure contents is
displayed, and the crash session continues. Without the patch, an
invalid n_namesz or n_descsz value could cause a segmentation
violation when attempting to read the bogus note contents.
(anderson@redhat.com)
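The kind of sanity check described above can be sketched as follows. This is a minimal, hypothetical illustration, not the crash source: it assumes the notes have been read into a memory buffer of known length, uses the standard <elf.h> Elf64_Nhdr type, and the helper names note_align4() and note_is_sane() are invented. A note is accepted only if its header fits and its 4-byte-aligned name and descriptor sizes fit within the bytes that remain after the header.

```c
#include <elf.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Round a note field size up to the 4-byte alignment used by ELF notes. */
static uint64_t note_align4(uint64_t n)
{
	return (n + 3) & ~(uint64_t)3;
}

/*
 * Hypothetical helper: return the total size of the note that starts at
 * buf[offset] if its header and its n_namesz/n_descsz contents fit within
 * the buffer, or 0 if the note looks corrupt and should be ignored.
 */
static size_t note_is_sane(const unsigned char *buf, size_t buflen,
			   size_t offset)
{
	Elf64_Nhdr nhdr;
	uint64_t remaining, namesz, descsz;

	if (offset > buflen || buflen - offset < sizeof(nhdr))
		return 0;
	memcpy(&nhdr, buf + offset, sizeof(nhdr));	/* avoid unaligned reads */
	remaining = (uint64_t)(buflen - offset - sizeof(nhdr));

	namesz = note_align4(nhdr.n_namesz);
	descsz = note_align4(nhdr.n_descsz);
	if (namesz > remaining || descsz > remaining - namesz)
		return 0;	/* bogus sizes: caller warns and skips it */

	return (size_t)(sizeof(nhdr) + namesz + descsz);
}
```

A caller walking the PT_NOTE segment would print a warning and stop or skip when note_is_sane() returns 0, rather than trusting the sizes and reading past the end of the buffer.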
- Fix for "mach -c" command option on 2.6.30 and later x86_64 kernels
in which the per-cpu array x8664_pda data structures were replaced
with per-cpu variables. Without the patch, the command displays
just the boot cpu's cpuinfo data structure and then fails with the
error message: "mach: invalid structure name: x8664_pda".
(anderson@redhat.com)
- Fix to properly set the DEBUG exception stack size and stack base
address on 2.6.18 and later x86_64 kernels. Without the patch, the
DEBUG exception stack was presumed to be the same size as all of the
other exception stacks, so in the extremely rare occurrence that a
kernel crash started while running on a per-cpu DEBUG stack, the
backtrace code would not recognize it as such, and would either start
the trace using stale starting stack hooks, typically from "schedule"
while running on the process stack, or the backtrace attempt would
fail with the error message "bt: cannot transition from exception
stack to current process stack".
(anderson@redhat.com)
- Related to the above, when the x86_64 "bt" command displays a trace
segment from one of the five exception stacks, change the output from
showing just "--- <exception stack> ..." to showing which exception
stack it's working from, for example, "--- <NMI exception stack> ---"
or "--- <DEBUG exception stack> ---", etc.
(anderson@redhat.com)
- Fix for a session initialization failure when running against 2.6.30
or later x86_64 kernels if the number of possible cpus equals the
kernel's configured NR_CPUS. Without the patch, the session fails
with the error message "crash: invalid kernel virtual address: cc08
type: cpu number (per_cpu)".
(bob.montgomery@hp.com)
- Preparations in the top-level source code for the integration of
gdb-7.0. The current embedded version remains gdb-6.1.
(anderson@redhat.com)
- 4.1.0 to 4.1.1 incremental patch
(11/20/09)
4.1.0 - Fix for s390x and x86 "extend" command regression created by the
"crash -x" option introduced in crash version 4.0.9. Without the
patch, the "extend" command on s390x and x86 machines fails with the
error message: "extend: <module>.so: not an ELF format object file".
(holzheu@linux.vnet.ibm.com, anderson@redhat.com)
- Cleanup of top-level source files to address compiler warnings
generated by the CFLAGS used in the Fedora build environment:
main.c ppc64.c tools.c symbols.c defs.h qemu-load.c qemu.c
xen_hyper_command.c xendump.c netdump.c s390_dump.c lkcd_common.c
remote.c cmdline.c x86_64.c net.c dev.c kernel.c task.c filesys.c
memory.c lkcd_x86_trace.c x86.c s390.c s390x.c s390dbf.c
Only two bugs (s390/s390x) were discovered as a result of this
exercise. The vast majority of the warnings were benign
"may be used uninitialized in this function" false-positive warnings,
but were addressed nonetheless. A few "dereferencing type-punned
pointer will break strict-aliasing rules" warnings still exist, but a
fix attempt may prove more troublesome or dangerous than it's worth.
(anderson@redhat.com)
- Fix for "pte" command on s390 and s390x machines if the pte value
argument evaluates as not present. Without the patch, the command
would not display the pte value, but would either print random stack
data (if ASCII), or in the worst case, cause a segmentation violation.
(anderson@redhat.com)
- Allow command redirection to pipes or files when using gdb commands
alone on the command line without preceding the command string with
"gdb". Without the patch, the pipe/redirection data on the command
line would be appended to the command string passed to gdb, leading
to bizarre results when gdb attempts to evaluate the redirection
pieces of the command string.
(bob.montgomery@hp.com)
- Fix for the processing of bit fields on big endian systems in the
SIAL extension module. Without the patch, bits are not copied to
the correct position and are not shifted the right way.
(holzheu@linux.vnet.ibm.com)
- Fix for "dis -l" to properly display line-number information for
2.6.21 and later x86_64 kernel module text addresses. Without the
patch, a single erroneous file/line-number indication would be
displayed prior to the disassembly output, typically from the file
"include/linux/cpumask.h". This was due to an abnormal text block
descriptor from a function in hpet.c, which starts in the kernel
text segment and extends up into the vsyscall FIXMAP region,
effectively encompassing all kernel module address space.
(john.wright@hp.com)
- Related to the line number patch above, fix to prevent querying the
embedded gdb module for line numbers of kernel module text addresses
if the module's debuginfo data has not been loaded. Without the
patch, the same erroneous file/line-number could be displayed by
commands like "dis -l" or "bt -l" when a module's debuginfo data
has not been loaded, on 2.6.21 and later x86_64 kernels.
(anderson@redhat.com)
- Implemented a new "ps -G" option, which restricts the process status
output to show only the data of the thread group leader of a thread
group. The original request was to avoid the display of redundant
RSS data shared by many threads.
(anderson@redhat.com)
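On Linux, the thread group leader is the task whose pid equals its tgid, so the "ps -G" restriction amounts to a filter on that condition. The sketch below illustrates the idea using a hypothetical, simplified struct task_info and an invented filter_group_leaders() function; it is not crash's internal task table or implementation.

```c
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical, simplified per-task record (not crash's task_context). */
struct task_info {
	pid_t pid;		/* thread id */
	pid_t tgid;		/* thread group id: the pid of the group leader */
	unsigned long rss;	/* resident pages, shared by the whole group */
};

/*
 * Copy only thread group leaders (pid == tgid) from tasks[] into out[],
 * so that RSS shared by all threads of a process is shown only once.
 * Returns the number of entries written to out[].
 */
static size_t filter_group_leaders(const struct task_info *tasks, size_t n,
				   struct task_info *out)
{
	size_t i, count = 0;

	for (i = 0; i < n; i++)
		if (tasks[i].pid == tasks[i].tgid)
			out[count++] = tasks[i];
	return count;
}
```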
- Several fixes for the "repeat" command when used in conjunction
with an input file. Without the patch:
(1) Depending upon the command executed from the input file, a
SIGINT would kill the command currently being executed from
the input file, but the "repeat" command would then restart it.
(2) If a command in the input file redirected its output to a pipe,
the repeat operation could stop prematurely after executing
that particular command.
(3) If a command in the input file redirected its output to a pipe,
the zombies of the command being piped to would not be cleaned
up until the repeat command was stopped.
(4) If the last command in the input file redirected its output to a
pipe, all subsequent executions of the input file would only
display the output of that last command.
(anderson@redhat.com)
- Added "trace.c" to the extensions subdirectory, where it will get
built automatically when "make extensions" is run from the top-level
source directory. The trace.so extension module has also been added
to the crash-extensions-<version>.rpm subpackage that is created
by the crash-<version>.src.rpm, which installs extension modules
in the /usr/lib[64]/crash/extensions directory.
(anderson@redhat.com)
- Fix for a potential failure to initialize the kmem slab cache
subsystem on 2.6.22 and later CONFIG_SLAB kernels if the dumpfile
has pages excluded by the makedumpfile facility. Without the patch,
the following error message would be displayed during initialization:
"crash: page excluded: kernel virtual address: <address> type:
kmem_cache_s buffer", followed by "crash: unable to initialize kmem
slab cache subsystem".
(anderson@redhat.com)
- Fix for a potential session initialization failure on x86_64 kernels
if the dumpfile has pages excluded by the makedumpfile facility.
Without the patch, the following error message would be displayed:
"crash: page excluded: kernel virtual address: <address> type:
tss_struct ist array".
(anderson@redhat.com)
- Fix for "kmem -z" option on 2.6.29 and later kernels. Without the
patch, against 2.6.29 and 2.6.30 kernels, the embedded zone VM_STAT
contents would not be displayed after the top line showing the SIZE,
PRESENT, MIN/LOW/HIGH and FREE page counts; on 2.6.31 kernels, the
command would fail with the error message: "kmem: invalid (optional)
structure member offsets: zone_pages_min or zone_struct_pages_min".
(anderson@redhat.com)
- Fix for "irq" command on 2.6.29 and later CONFIG_SPARSE_IRQ kernels.
Without the patch, the "irq [number]" command would fail on x86_64
with the error message: "irq: x86_64_dump_irq: irq_desc[] does not
exist?", on ia64: "ia64_dump_irq: neither irq_desc or _irq_desc
exist", and on the other architectures: "irq: neither irq_desc nor
_irq_desc symbols exist".
(anderson@redhat.com)
- Fix for the "kmem -i" option on 2.6.31 kernels. Without the patch,
the SHARED column may erroneously indicate 0 pages.
(anderson@redhat.com)
- Fix for the "kmem -i" option on 2.6.26 through 2.6.30 x86_64 kernels.
Without the patch, the swap page information would not be displayed,
and the error message "kmem: swap_info[0].swap_map at <address> is
unaccessible" would be displayed.
(anderson@redhat.com)
- Fix for "kmem -p" option on older 64-bit kernels that have a 32-bit
page.flags field. Without the patch, the page.count field in the
page structure would get merged with the page.flags field, and the
result displayed as a 64-bit value in the FLAGS column.
(anderson@redhat.com)
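The merging effect can be reproduced by reading 8 bytes at the offset of a 4-byte flags member, which also picks up the member that follows it. The sketch below assumes a hypothetical, simplified page layout with a 32-bit flags field immediately followed by a 32-bit count; it is not the exact layout of any particular kernel, and the function names are invented. On little-endian machines the count ends up in the upper 32 bits of the bogus value.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical old-style page structure with a 32-bit flags field. */
struct old_page {
	uint32_t flags;
	int32_t count;
};

/* Faulty read: treats flags as 64 bits wide and swallows the count. */
static uint64_t read_flags_64(const struct old_page *page)
{
	uint64_t value;

	memcpy(&value, page, sizeof(value));
	return value;
}

/* Correct read: honors the real 32-bit width of the member. */
static uint64_t read_flags_32(const struct old_page *page)
{
	uint32_t value;

	memcpy(&value, &page->flags, sizeof(value));
	return value;
}
```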
- Fix for "kmem -i" option on older kernels whose unreferenced
page.count value was -1 (instead of 0). Without the patch,
the SHARED column would contain invalid values.
(anderson@redhat.com)
- Change the cursor location when cycling through the command history
when in "vi" editing mode (the default). When using the arrow keys,
or when using CTRL-n and CTRL-p, the cursor will be placed after the
last character in each line, and will be in "insert" mode. When
using ESC followed by j or k, the cursor will be placed on the last
character in the line, and will be in "command" mode. Without the
patch, the cursor would be placed on the first character in the line
regardless of the keys used to cycle through the history.
(anderson@redhat.com)
- 4.0.9 to 4.1.0 incremental patch
(10/07/09)
4.0.9 - Versioning has been changed such that the crash-<version>.tar.gz
file no longer contains a "-" in the <version> number, and the
crash-<version>-0.src.rpm will always have a crash.spec release
number of "0". When the crash binary is built from the src.rpm file,
the "-0" will not be included/displayed as part of the crash binary's
version number, so that it will match the crash binary version that
is built from the crash-<version>.tar.gz file. This is being done
so that distributions can take the crash-<version>.tar.gz file
and append their own crash.spec file release numbering scheme onto
the base <version> number when creating their own src.rpm package.
(anderson@redhat.com)
- Also available in Fedora Rawhide devel branch:
build: dist-f12 crash-4.0.9-2.fc12
http://koji.fedoraproject.org/koji/buildinfo?buildID=131574
- Wholesale replacement of the x86/x86_64 disassembly code in the
embedded gdb-6.1 module with that used in gdb-6.8. The primary motive
is for CONFIG_FUNCTION_TRACER kernels, which contain 5-byte nopl
instructions that can be overwritten during runtime for dynamic
ftracing. That particular nop format was not recognized by the older
disassembly code in gdb-6.1, which printed a "(bad)" instruction
followed by an incorrect "add" instruction. For example, without the
patch, the instructions at sys_write+11 and sys_write+13 below are
not correct:
crash> dis sys_write
0xffffffff8113c56b <sys_write>: push %rbp
0xffffffff8113c56c <sys_write+1>: mov %rsp,%rbp
0xffffffff8113c56f <sys_write+4>: push %r12
0xffffffff8113c571 <sys_write+6>: push %rbx
0xffffffff8113c572 <sys_write+7>: sub $0x30,%rsp
0xffffffff8113c576 <sys_write+11>: (bad)
0xffffffff8113c578 <sys_write+13>: add %r8b,(%rax)
0xffffffff8113c57b <sys_write+16>: mov %rsi,%r12
...
With the patch, the 5-byte instruction is properly translated:
crash> dis sys_write
0xffffffff8113c56b <sys_write>: push %rbp
0xffffffff8113c56c <sys_write+1>: mov %rsp,%rbp
0xffffffff8113c56f <sys_write+4>: push %r12
0xffffffff8113c571 <sys_write+6>: push %rbx
0xffffffff8113c572 <sys_write+7>: sub $0x30,%rsp
0xffffffff8113c576 <sys_write+11>: nopl 0x0(%rax,%rax,1)
0xffffffff8113c57b <sys_write+16>: mov %rsi,%r12
...
There are other side-effects/changes, such as the output of negative
relative offsets from registers. For example, instructions that were
displayed like this without the patch:
mov 0xffffffffffffffc8(%rbp),%rdx
are now displayed in an easier-to-understand format:
mov -0x38(%rbp),%rdx
There are undoubtedly other subtle changes as well.
(anderson@redhat.com)
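The two displacement forms shown above encode the same 64-bit value; the newer output simply prints it as a signed quantity (0x100 - 0x38 = 0xc8, so 0xffffffffffffffc8 is -0x38 in two's complement). A small C check of that arithmetic, using an invented signed_disp() helper:

```c
#include <stdint.h>

/*
 * Reinterpret a raw 64-bit displacement as a signed value, e.g.
 * 0xffffffffffffffc8 -> -0x38. Written without an out-of-range
 * unsigned-to-signed cast, which is implementation-defined in C.
 */
static int64_t signed_disp(uint64_t raw)
{
	if (raw <= (uint64_t)INT64_MAX)
		return (int64_t)raw;
	/* raw == 2^64 - k with 1 <= k <= 2^63, so return -k */
	return -(int64_t)(UINT64_MAX - raw) - 1;
}
```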
- Fix for compressed diskdump/kdump vmcores to properly handle
page descriptor structures that are located beyond a 4GB file
offset in the vmcore file.
(oomichi@mxs.nes.nec.co.jp)
- Fix for x86_64 "bt" command to properly recognize vsyscall FIXMAP
virtual addresses when encountered as the RIP in an exception frame.
Without the patch, the exception frame would be followed by the
warning message: "bt: WARNING: possibly bogus exception frame".
(anderson@redhat.com)
- Fix for the "sym <address>" command option when the address
references a symbol in the vsyscall FIXMAP virtual address page
in certain x86_64 kernel versions. Without the patch, the command
would fail with a "symbol not found" message. This would also affect
commands that perform symbolic translations of virtual addresses,
such as "rd -s".
(anderson@redhat.com)
- Fix for the x86_64 "bt" command, which could start the backtrace
of an active non-crashing task on its per-cpu IRQ stack instead of
starting from the NMI exception stack. This could only occur on
a kdump-generated vmcore, and as a result, the backtrace would make
a faulty transition back to the process stack, dump a bogus exception
frame, and display: "bt: WARNING: possibly bogus exception frame".
(anderson@redhat.com)
- Fix for the x86_64 "bt" command in determining the frame just above
an IRQ interrupt exception frame, or above an exception frame that
gets handled on the process stack, such as a page fault. Without
the patch, the frame size of the interrupted function was being
incorrectly calculated, and could result in the display of an invalid
stale frame just above the exception frame register dump.
(anderson@redhat.com)
- Fix for the x86_64 "bt" command's frame size calculating mechanism
to differentiate between text return addresses and the precise text
RIP address of an exception. Without the patch, the instruction of
the text return address location was being incorrectly scanned for
instructions that modify the frame size, and could result in the
skipping of a stack frame.
(anderson@redhat.com)
- Fix for usage of a System.map file argument with 2.6.30 and later
kernels (which should only be done if the vmlinux file does not match
the vmcore or live system being analyzed). Without the patch, there
may be several hundred "crash: symbol count overflow (trace_kmalloc)"
messages displayed during the back-patching of the gdb minimal_symbol
table phase.
(anderson@redhat.com)
- Fix for usage of a System.map file argument whose symbol list does
not contain an "_end" symbol. Without the patch, the crash session
fails during initialization with the error message: "crash: cannot
resolve _end".
(anderson@redhat.com)
- Fix for "kmem -p <address>" or "kmem <address>" options when the
<address> is not a page structure address. Starting with crash
version 4.0-8.11, and without the patch, harmless but annoying "kmem:
WARNING: sparsemem: invalid section number: 8192" messages would be
displayed.
(anderson@redhat.com)
- Fix for the snap.so extension module when run on pre-2.6.31 x86_64
kernels with more than 4GB of physical memory. Without the patch,
the resultant vmcore would not include memory above 4GB because
the /proc/iomem file did not display it. A typical crash session
would fail during initialization with an error message such as
"crash: read error: kernel virtual address: 1020009d024 type:
tss_struct ist array".
(anderson@redhat.com)
- Fix for the build of the sial.so extension module if /usr/bin/bison
and /usr/bin/flex do not exist on the host build system. When those
files do not exist, the build of sial.so generates a huge number of
error messages, ending with "make[3]: [sial.so] Error 1 (ignored)".
Since it is preferable to avoid extra BuildRequires entries in the
crash.spec file for extension modules, and given that crash is often
built from a tar.gz installation, the failed build will now simply
indicate: "sial.so: build failed: requires /usr/bin/flex and /usr/bin/bison".
(anderson@redhat.com)
- Fix for the build of the snap.so extension module on older systems
running with "make" versions 3.80 or earlier. Without the patch,
the build of snap.so would fail like so:
snap.mk:4: Extraneous text after `else' directive
snap.mk:7: Extraneous text after `else' directive
snap.mk:7: *** only one `else' per conditional. Stop.
make[2]: [snap.so] Error 2 (ignored)
The snap.mk file has been modified to conform to the older format.
(anderson@redhat.com)
- Fix for the "rd" and "vtop" commands on RHEL4 x86_64 Xen paravirtual
kernels in the reading or translation of vmalloc addresses that are
not in kernel module vmalloc address space. In that kernel version
(and none other that I am aware of), the PAGE_OFFSET unity-map kernel
virtual address of 0xffffff8000000000 is larger than the address of
its VMALLOC_START, 0xffffff0000000000. Because of that, without the
patch, "rd" would fail with the error message "rd: invalid user
virtual address: <address> type: 64-bit UVADDR", "vtop" would
fail with the error message "vtop: ambiguous address: <address>
(requires -u or -k)", and "vtop -k" would incorrectly report that
the <address> was "(not a kernel virtual address)".
(anderson@redhat.com)
- Implemented a new "-x" command line option that will automatically
load extension modules from a particular directory. The search for
the extension module directory will be done in the following order,
and the first one (if any) that exists will be selected as the
target directory:
1. the directory specified in the CRASH_EXTENSIONS shell
environment variable
2. /usr/lib64/crash/extensions (64-bit architectures)
3. /usr/lib/crash/extensions
4. ./extensions
All extension modules that are found in the target directory will
be loaded automatically.
(anderson@redhat.com)
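The directory selection order above can be sketched in C. This is a hypothetical illustration, not crash's implementation: the function name pick_extension_dir() is invented, the existence test is passed in as a callback so the ordering logic can be exercised on its own, and a stat()-based dir_exists() is supplied for real use. The 64-bit candidate is included only when is_64bit is set.

```c
#include <stddef.h>
#include <sys/stat.h>

/* Existence check for real use: true if path names a directory. */
static int dir_exists(const char *path)
{
	struct stat sb;

	return stat(path, &sb) == 0 && S_ISDIR(sb.st_mode);
}

/*
 * Return the first existing candidate directory in the documented order,
 * or NULL if none exists. env_dir is the CRASH_EXTENSIONS value (or NULL).
 */
static const char *pick_extension_dir(const char *env_dir, int is_64bit,
				      int (*exists)(const char *))
{
	const char *candidates[4];
	int i, n = 0;

	if (env_dir && *env_dir)
		candidates[n++] = env_dir;			/* 1. */
	if (is_64bit)
		candidates[n++] = "/usr/lib64/crash/extensions";	/* 2. */
	candidates[n++] = "/usr/lib/crash/extensions";		/* 3. */
	candidates[n++] = "./extensions";			/* 4. */

	for (i = 0; i < n; i++)
		if (exists(candidates[i]))
			return candidates[i];
	return NULL;
}
```

A caller might use pick_extension_dir(getenv("CRASH_EXTENSIONS"), sizeof(long) == 8, dir_exists) and then load every module found in the returned directory.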
- 4.0-8.12 to 4.0.9 incremental patch
(9/10/09)
4.0-8.12 - Fix to support 2.6.30 and later x86 CONFIG_4KSTACKS kernels, where
the hardirq_ctx[] and softirq_ctx[] NR_CPUS-bounded arrays were
replaced with per-cpu variables. Without the patch, the crash
session would fail during initialization with the error message
"crash: cannot resolve: hardirq_ctx".
(oomichi@mxs.nes.nec.co.jp)
- Clean up gdb header files that generate warning messages when
compiling the top-level cmdline.c file with "make warn" or
"make Warn". (anderson@redhat.com)
- If an attempt is made to use an x86 vmlinux file on an x86_64 host,
bail out with a "not a supported file format" error immediately
instead of later on when trying to match the linux_banner string.
(anderson@redhat.com)
- Fix for "bt" command on x86 Xen hypervisor dumpfiles where a vcpu
received a shutdown NMI while running in an interrupt handler.
Without the patch, the backtrace would indicate "bt: cannot resolve
stack trace", and dump the text symbols on the stack.
(anderson@redhat.com)
- Implemented support for the KVM "save-vm" file format, which is
also proposed as the dumpfile format for the "virsh dump" command
for KVM guests.
(pbonzini@redhat.com, anderson@redhat.com)
- Increase NR_CPUS from 512 to 4096 for x86_64.
(caiqian@redhat.com)
- Correct cpu accounting when processors have been taken offline using
a new get_highest_cpu_online() utility function. Without the patch,
commands that have per-cpu displays may not show a cpu's information
and/or may show information for an offline cpu. This patch only
addresses 2.6.30 and later x86_64 kernels in which the per-cpu array
x8664_pda data structures were replaced with per-cpu variables.
(anderson@redhat.com)
- Replace the CFLAGS definition in the Makefile with a CRASH_CFLAGS
definition, which in turn contains ${CFLAGS}. This will allow the
issuing of user-defined CFLAGS on the "make" command line, as is done
according to the Fedora build guidelines.
(lkundrak@v3.sk)
- Fix for a segmentation violation within the embedded gdb module
during session invocation, when running against kernels built with
Fedora gcc version 4.4.0-12 and later (2.6.31-0.62.rc2.git4.fc12
and later Fedora kernels). The gcc update introduced a more
compact Dwarf 3 DW_AT_data_member_location, which in turn required
a patch to all versions of gdb.
(lkundrak@v3.sk)
- 4.0-8.11 to 4.0-8.12 incremental patch
(8/11/09)
4.0.8.11-2.fc12 - Fedora release only
- Fix for a segmentation violation within the embedded gdb module
during session invocation, when running against kernels built with
Fedora gcc version 4.4.0-12 and later (2.6.31-0.62.rc2.git4.fc12
and later Fedora kernels). The gcc update introduced a more
compact Dwarf 3 DW_AT_data_member_location, which in turn required
a patch to all versions of gdb.
(lkundrak@v3.sk)
- Available in Fedora Rawhide devel branch:
build: dist-f12 crash-4.0.8.11-2.fc12
http://koji.fedoraproject.org/koji/buildinfo?buildID=126403
(8/09/09)
4.0-8.11 - Also available in Fedora Rawhide devel branch:
build: dist-f12 crash-4.0.8.11-1.fc12
http://koji.fedoraproject.org/koji/buildinfo?buildID=125683
- Kdump ELF vmcores contain NT_PRSTATUS notes for online cpus only, so
if cpus have been offlined prior to a crash, there will be fewer
notes than the number of cpus in the system, and therefore there will
not be a one-to-one correlation between each cpu and its associated
NT_PRSTATUS note. That causes backtrace failures for architectures
like ppc64 that depend upon the contents of the NT_PRSTATUS notes for
gathering the starting stack location.
(chandru@in.ibm.com, anderson@redhat.com)
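Because notes exist only for online cpus, a cpu's NT_PRSTATUS note has to be located by counting the online cpus that precede it, rather than by using the cpu number directly as a note index. A minimal sketch of that mapping, assuming the notes were written in ascending cpu order and using a hypothetical array of online flags in place of the kernel's online cpu map; prstatus_note_index() is an invented name.

```c
/*
 * Return the index of cpu's NT_PRSTATUS note, given that notes were
 * written only for online cpus in ascending cpu order, or -1 if the cpu
 * was offline (and therefore has no note). online[] is a hypothetical
 * per-cpu flag array.
 */
static int prstatus_note_index(const int *online, int nr_cpus, int cpu)
{
	int i, index = 0;

	if (cpu < 0 || cpu >= nr_cpus || !online[cpu])
		return -1;
	for (i = 0; i < cpu; i++)
		if (online[i])
			index++;
	return index;
}
```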
- Fix and enhancement for the "dev" command. When the command was run
against 2.6.26 or later kernels, it would fail with the error message
"dev: invalid structure member offset: char_device_struct_fops".
Additionally, even when the command did work, more often than not it
would fail to determine the file_operations structure associated with
the block or character device, and erroneously display "(none)" or
"(unused)". This patch makes a more comprehensive search for the
file_operations structure, and instead of just displaying its address
and symbolic translation, it will display the address of the data
structure that contains the pointer to the file_operations structure,
along with the symbolic translation of the file_operations structure.
For character devices, the containing structure is a "cdev", and for
block devices the containing structure is a "gendisk". The command
output adds new CDEV and GENDISK columns, and under the OPERATIONS
column is the symbolic translation of its file_operations structure.
(anderson@redhat.com, bob.montgomery@hp.com)
- Fix for a potential segmentation violation when running "foreach bt"
on a very active live system with many processes starting and ending.
Without the patch, a segmentation violation could occur when a "bt"
was attempted on a task that had become non-existent. This would
happen on x86_64 or ppc64 machines, and was due to the usage of a
kernel stack pointer taken from a stale/invalid task_struct. The
command will now recognize the bad stack pointer and display the
error message "bt: task no longer exists" or "bt: invalid/stale
stack pointer for this task: <address>".
(anderson@redhat.com)
- Fix to correctly read LKCD Version 8 and later x86 dumpfile headers.
(talk90091e@gmail.com)
- If a kdump NMI issued to a non-crashing x86_64 cpu was received while
running in schedule(), after having set the next task as "current" in
the cpu's runqueue, but prior to changing the kernel stack to that of
the next task, then a backtrace would fail to make the transition
from the NMI exception stack back to the process stack, with the
error message "bt: cannot transition from exception stack to current
process stack". This patch will report inconsistencies found between
a task marked as the current task in a cpu's runqueue, and the task
found in the per-cpu x8664_pda "pcurrent" field (2.6.29 and earlier)
or the per-cpu "current_task" variable (2.6.30 and later). If it can
be safely determined that the runqueue setting (used by default) is
premature, then the crash utility's internal per-cpu active task will
be changed to the task indicated by the appropriate architecture-specific
value. Also, a new "set -a <task>" option has been added
to manually set a task to be the "active" task on its cpu.
(anderson@redhat.com)
- Fix for x86_64 "bt" command when transitioning from the IRQ stack
back to the process stack on 2.6.29 and later kernels. Without the
patch, the interrupt exception frame address on the process stack
would be incorrectly determined, and its display would typically be
preceded by "[exception RIP: unknown or invalid address]", and the
backtrace would fail from that point on.
(anderson@redhat.com)
- Enhancement to the "runq" command to show the current task in each
cpu's runqueue, plus a few formatting changes to make the output
easier to understand.
(anderson@redhat.com)
- Fix for a memory leak when running on live systems, due to the
repetitive reallocation of the internal array of active tasks.
(anderson@redhat.com)
- Fix for usage with vmlinux debuginfo files using Dwarf 3 format,
for example, the Fedora 2.6.31-0.24.rc0.git18.fc12 kernel. Without
the patch, the crash session fails during initialization with the
error message: "Dwarf Error: wrong version in compilation unit header
(is 3, should be 2) [in module <path-to>/vmlinux]", followed by
the erroneous message "crash: <path-to>/vmlinux: no debugging
data available". The patch simply accepts the Dwarf 3 header, and
the embedded gdb-6.1 version still appears to work with the updated
vmlinux debuginfo file format.
(anderson@redhat.com)
- Fix for an invocation failure when a System.map file is used as
an argument with a compressed diskdump or compressed kdump dumpfile.
If the System.map argument appears after the vmcore file on the
command line, as in: "crash vmcore System.map vmlinux", the crash
session fails immediately with the error message: "crash: vmcore:
initialization failed". With the patch, the arguments may be entered
in any order.
(anderson@redhat.com)
- Fix for a potential segmentation violation during invocation if a
vmcore file, a System.map file, and a non-matching vmlinux file are
used as command line arguments. The problem is that whenever a
System.map file is used, it is presumed that the user knows what he
is doing, and that the vmlinux file is not the same as the kernel
that generated the vmcore; therefore the vmlinux/vmcore matching and
verification routines are not performed. However, if the kernel data
structures in the non-matching vmlinux vary widely enough from the
kernel that generated the vmcore, all manner of bogus data may be
read and consumed. The reported segmentation violation occurred when
using a vmcore created from a "stock" Red Hat kernel with a vmlinux
file from a Red Hat "debug" kernel, where the kernel data structures
are significantly different. The patch adds several new defensive
mechanisms, and displays additional warning messages when invalid or
questionable data is read; as a result, the crash session will fail
in a more reasonable manner.
(anderson@redhat.com)
- Adjusted several virtual and physical memory address definitions for
2.6.31 x86_64 kernels: MAX_PHYSMEM_BITS, VMALLOC_START, VMALLOC_END,
VMEMMAP_VADDR, VMEMMAP_END, MODULES_VADDR and MODULES_END. Without
the patch, when run against CONFIG_SPARSEMEM_VMEMMAP 2.6.31 kernels,
the "kmem -i" option would hang, and when run against CONFIG_SLUB and
CONFIG_SPARSEMEM_VMEMMAP 2.6.31 kernels, the "kmem -s" option would
report numerous errors indicating "kmem: read error: kernel virtual
address: <address> type: page inuse", where the <address> was
a legitimate virtual-memmap page structure address.
(anderson@redhat.com)
- Improvement for CONFIG_SLUB "kmem -s" or "kmem -S" options when an
invalid slab page link address is encountered. Without the patch,
the commands fail with a generic "invalid kernel virtual address"
read error message, and "kmem -s" would not display any previously
collected statistics. With the patch, the error message displays
the slab cache name, the list type, and the invalid pointer found,
for example, "kmem: dentry: partial list: page.lru.next: 100100".
(anderson@redhat.com)
- 4.0-8.10 to 4.0-8.11 incremental patch
(6/30/09)
4.0-8.10 - Enhancement for currently-existing "mod -S <directory>" option to
make it search for the module debuginfo tree in the same specified,
non-standard, directory tree. When "mod -S" is used without a
specified directory argument, the "<module>.ko" object files are
searched for in the standard "/lib/modules/<release>" directory
tree, and their associated "<module>.ko.debug" files are searched for
in the standard "/usr/lib/debug/lib/modules/<release>" directory
tree. Without this patch, "mod -S <directory>" would search the
specified non-standard directory tree for the kernel's "<module>.ko"
files, but the associated "<module>.ko.debug" files would not be found.
With the patch, the search for the associated "<module>.ko.debug"
files will be made in the following order and manner:
1. in the same directory containing the "<module>.ko" file.
2. in the ".debug" subdirectory of the directory containing the
"<module>.ko" file.
3. if the "<module>.ko" file was found in a directory pathname
containing the "/lib/modules" component, then the search will be
made in the associated "/usr/lib/debug/lib/modules" location.
This enhancement will allow an alternate module/module-debuginfo
directory tree to be set up like so:
# cd <directory>
# rpm2cpio kernel-<release>.rpm | cpio -idv
# rpm2cpio kernel-debuginfo-<release>.rpm | cpio -idv
Having done that, the currently-existing "mod -S <directory>"
option will find both the "<module>.ko" and "<module>.ko.debug"
files. In addition, a new "--mod" command-line option may be used
to specify the directory tree:
# crash vmlinux [vmcore] --mod <directory>
When the "--mod <directory>" command line option is used, then
"mod -S" (without a directory argument) will search that directory
tree by default instead of using the standard location.
(anderson@redhat.com)
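The three-step search for "<module>.ko.debug" can be sketched as candidate-path construction. This is a hypothetical helper, not crash's code: debuginfo_candidates() and CAND_LEN are invented, and step 3 assumes the associated location is formed by inserting "/usr/lib/debug" in front of the "/lib/modules/" component and appending ".debug", which matches the layout produced by the rpm2cpio extraction shown above.

```c
#include <stdio.h>
#include <string.h>

#define CAND_LEN 512	/* hypothetical maximum candidate path length */

/*
 * Build the candidate debuginfo pathnames, in search order, for a module
 * object file at ko_path. Returns the number of candidates written to out[].
 */
static int debuginfo_candidates(const char *ko_path, char out[3][CAND_LEN])
{
	const char *slash = strrchr(ko_path, '/');
	const char *base = slash ? slash + 1 : ko_path;
	const char *dir = slash ? ko_path : ".";
	int dirlen = slash ? (int)(slash - ko_path) : 1;
	const char *mods;
	int n = 0;

	/* 1. the directory containing the .ko file */
	snprintf(out[n++], CAND_LEN, "%.*s/%s.debug", dirlen, dir, base);
	/* 2. the ".debug" subdirectory of that directory */
	snprintf(out[n++], CAND_LEN, "%.*s/.debug/%s.debug", dirlen, dir, base);
	/* 3. the associated /usr/lib/debug/lib/modules location, if any */
	mods = strstr(ko_path, "/lib/modules/");
	if (mods)
		snprintf(out[n++], CAND_LEN, "%.*s/usr/lib/debug%s.debug",
			 (int)(mods - ko_path), ko_path, mods);
	return n;
}
```

For a module extracted under /tmp/x, the third candidate becomes /tmp/x/usr/lib/debug/lib/modules/<release>/.../<module>.ko.debug; the first candidate that exists would be selected.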
- Fix to handle the 2.6.29 replacement of the symbols "cpu_online_map",
"cpu_present_map" and "cpu_possible_map" with analogous symbols
"cpu_online_mask", "cpu_present_mask" and "cpu_possible_mask".
Without this patch, crash would fail during initialization on s390
and s390x systems with the error message "crash: cannot resolve
cpu_online_map", or with the error message "crash: PPC64: cannot find
cpu_present_map or cpu_online_map symbols" on ppc64 systems.
(holzheu@linux.vnet.ibm.com, anderson@redhat.com)
- Added several function prototypes for the SIAL extension module file
sial.c because of its inability to #include "defs.h". Without
the patch, the compiler would presume that several un-prototyped
functions would have a return value of int, and therefore would
truncate 64-bit return values to 32 bits.
(holzheu@linux.vnet.ibm.com)
- Fix for a segmentation violation during crash session initialization
that would occur if, by remote chance, the panic task could not be
determined from a ppc64 kdump vmcore.
(anderson@redhat.com)
- An additional directory has been added to the currently-existing
list of directories that the "extend" command searches when the
extension module file is not expressed with a fully-qualified
pathname. The following directories will be searched in the order
shown, and the first instance of the file that is found will be
selected:
1. the current working directory
2. the directory specified in the CRASH_EXTENSIONS shell
environment variable
3. /usr/lib64/crash/extensions (64-bit architectures)
4. /usr/lib/crash/extensions
5. ./extensions
(anderson@redhat.com)
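The five-step search order above can be sketched like this. It is a hypothetical illustration, not crash's implementation. The caller passes in the CRASH_EXTENSIONS value (for example from getenv()) and a 64-bit flag. Readability is checked with POSIX access().

```c
#include <stdio.h>
#include <string.h>
#include <unistd.h>

#define MAX_CANDIDATES 5

/*
 * Build the ordered list of locations for an extension module that was
 * not given as a fully-qualified pathname. env_dir is the value of
 * CRASH_EXTENSIONS (may be NULL). Returns the number of candidates.
 */
int extension_candidates(const char *name, const char *env_dir,
                         int is_64bit, char out[MAX_CANDIDATES][256])
{
    int n = 0;

    snprintf(out[n++], 256, "./%s", name);                   /* 1. cwd */
    if (env_dir != NULL && *env_dir != '\0')
        snprintf(out[n++], 256, "%s/%s", env_dir, name);     /* 2. env */
    if (is_64bit)                                             /* 3. */
        snprintf(out[n++], 256, "/usr/lib64/crash/extensions/%s", name);
    snprintf(out[n++], 256, "/usr/lib/crash/extensions/%s", name); /* 4. */
    snprintf(out[n++], 256, "./extensions/%s", name);        /* 5. */
    return n;
}

/* Return the first readable candidate, or NULL if none is found. */
const char *find_extension(char out[MAX_CANDIDATES][256], int n)
{
    for (int i = 0; i < n; i++)
        if (access(out[i], R_OK) == 0)
            return out[i];
    return NULL;
}
```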
- Fix the "extensions/Makefile" to force a rebuild of extension modules
when the "defs.h" file is newer than the module source.
(anderson@redhat.com)
- Added "snap.c" and "snap.mk" files to the extensions directory. The
new module contains a "snap" command that creates a kdump or netdump
dumpfile from a live system. Currently the x86, x86_64, ppc64 and
ia64 architectures are supported. The snap.so extension module has
been added to the crash-extensions-<release>.rpm, which is built from
the crash-<release>.src.rpm and installs extension modules in the
/usr/lib[64]/crash/extensions directory.
(anderson@redhat.com)
- Added a set of functions that, for an active task, return a pointer
to the associated register set found in an NT_PRSTATUS note in
netdump and kdump ELF dumpfiles if one exists. They are not used by
the crash source code, but are available to extension modules.
(sharyath@in.ibm.com)
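The following is a minimal sketch of how an NT_PRSTATUS register set could be located in a PT_NOTE segment that is already in memory. It is not the interface of the functions described above. It assumes the glibc <elf.h> definitions, 4-byte note padding, and one NT_PRSTATUS note per cpu in cpu order. It also rejects headers whose n_namesz or n_descsz would run past the segment.

```c
#include <elf.h>
#include <stddef.h>
#include <string.h>

/* Notes pad the name and descriptor to 4 bytes; widen first to avoid overflow. */
#define NOTE_ALIGN(x) (((size_t)(x) + 3) & ~(size_t)3)

/*
 * Return a pointer to the descriptor (the prstatus contents) of the cpu'th
 * NT_PRSTATUS note in buf, or NULL if it is absent or a header is bogus.
 * *desc_size receives the descriptor length.
 */
const void *find_prstatus(const unsigned char *buf, size_t len, int cpu,
                          size_t *desc_size)
{
    size_t off = 0;

    while (len - off >= sizeof(Elf64_Nhdr)) {
        Elf64_Nhdr nh;
        memcpy(&nh, buf + off, sizeof nh);   /* avoid unaligned access */

        size_t name = NOTE_ALIGN(nh.n_namesz);
        size_t desc = NOTE_ALIGN(nh.n_descsz);
        size_t rest = len - off - sizeof nh;

        if (name > rest || desc > rest - name)
            return NULL;                     /* corrupt header: stop */
        if (nh.n_type == NT_PRSTATUS && cpu-- == 0) {
            *desc_size = nh.n_descsz;
            return buf + off + sizeof nh + name;
        }
        off += sizeof nh + name + desc;
    }
    return NULL;
}
```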
- Use the "crashing_cpu" kernel symbol as a more efficient means of
determining the panic task in x86_64 kdump dumpfiles.
(anderson@redhat.com)
- Fix to handle the replacement of the per-cpu array of x8664_pda data
structures with per-cpu variables in 2.6.30. Without the patch, an
x86_64 crash session would die during initialization with the error
message: "crash: invalid structure size: x8664_pda".
(anderson@redhat.com, nishimura@mxp.nes.nec.co.jp)
- 4.0-8.9 to 4.0-8.10 incremental patch
(5/29/09)
4.0-8.9 - Tentatively scheduled as the baseline version for the RHEL5.4 crash
utility errata release.
- Implemented a new "bt -g" option, which will display the backtraces
of all threads in the targeted task's thread group. The thread
group leader's backtrace will be displayed first, regardless of
which task was the target of the "bt" command.
(anderson@redhat.com)
- Implement support for the kdump "split-dumpfile" format, which can
split /proc/vmcore into multiple dumpfiles as specified by the
"makedumpfile --split" command option. It simply requires that all
of the split dumpfile names be entered on the crash command line.
(tindoh@redhat.com)
- Fix for "kmem -i", "kmem -n" and "kmem -p" on x86_64 CONFIG_SPARSEMEM
and CONFIG_SPARSEMEM_EXTREME kernels that have MAX_PHYSMEM_BITS
increased from 40 to 44. Without the patch, erroneous page-related
data could be displayed depending upon the amount of physical memory
contained by the target system.
(anderson@redhat.com)
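The arithmetic below shows why MAX_PHYSMEM_BITS matters. The kernel sizes its section table as 2^(MAX_PHYSMEM_BITS - SECTION_SIZE_BITS). With CONFIG_SPARSEMEM_EXTREME, that table is split into roots of PAGE_SIZE / sizeof(struct mem_section) entries each. The values here are assumptions for illustration: SECTION_SIZE_BITS of 27 for x86_64, 4096-byte pages, and a 16-byte mem_section. crash would take the structure size from the debuginfo.

```c
#include <stddef.h>

/* NR_MEM_SECTIONS = 2^(MAX_PHYSMEM_BITS - SECTION_SIZE_BITS) */
unsigned long nr_mem_sections(unsigned max_physmem_bits,
                              unsigned section_size_bits)
{
    return 1UL << (max_physmem_bits - section_size_bits);
}

/* CONFIG_SPARSEMEM_EXTREME: number of roots, rounded up. */
unsigned long nr_section_roots(unsigned long nr_sections, size_t page_size,
                               size_t mem_section_size)
{
    unsigned long per_root = page_size / mem_section_size;

    return (nr_sections + per_root - 1) / per_root;
}
```

Under these assumptions, raising the value from 40 to 44 grows the table from 8192 to 131072 sections, and from 32 to 512 roots. A tool that still assumed 40 would therefore index the wrong entries.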
- For the architectures that support it, the "--machdep option=value"
command line option has been modified to allow more than one machine-
dependent argument. (anderson@redhat.com)
- The starting backtrace locations of active, non-crashing, xen dom0
tasks are not available in kdump dumpfiles, nor is there anything
that can be searched for in their respective stacks. Therefore, for
those tasks, the "bt" command will indicate: "bt: starting
backtrace locations of the active (non-crashing) xen tasks cannot be
determined: try -t or -T options". Without the patch, the backtrace
would either be empty, or it would show an invalid backtrace starting
at the last location where schedule() had been called.
(anderson@redhat.com)
- Fix for potentially empty "bt -t" output, and for "bt -T" potentially
dumping the text return addresses in the hard or soft IRQ stacks
instead of the process stack. This could occur if the targeted task
was the last task that used the hard or soft IRQ stack (x86 only).
(anderson@redhat.com)
- 4.0-8.8 to 4.0-8.9 incremental patch
(4/16/09)
4.0-8.8 - If a live kernel crash session fails during initialization due to
read errors, and it appears to be because the running kernel was
configured with CONFIG_STRICT_DEVMEM, display this warning message:
"crash: This kernel may be configured with CONFIG_STRICT_DEVMEM,
which renders /dev/mem unusable as a live memory source."
(anderson@redhat.com)
- Fix for the "bt" command to prevent a segmentation violation seen
with an x86_64 Egenera/LKCD dumpfile where the starting stack hooks
for the active tasks in the dumpfile header were nonsensical.
(anderson@redhat.com)
- Fix for the chronological display of the kernel printk buffer data
by the "log" command if the administrator has cleared the buffer
with syslog() or klogctl(). (oomichi@mxs.nes.nec.co.jp)
- Change the message displayed when supplying a non-process stack
address as an argument to "bt -S". Because the supplied address
is typically valid, such as a hard or soft IRQ stack address,
the message will indicate "non-process address" instead of
"invalid stack address". (anderson@redhat.com)
- The crash-<release>.src.rpm will create an additional binary
crash-extensions-<release>.rpm file containing the sial.so and
dminfo.so extension modules. The modules will be installed in the
/usr/lib[64]/crash/extensions directory.
(holzheu@linux.vnet.ibm.com, anderson@redhat.com)
- If a shared-object filename passed to the "extend" command is not
expressed with a fully-qualified pathname, the following directories
will be searched in the order shown, and the first instance of the
file that is found will be selected:
1. the current working directory
2. the directory specified in the CRASH_EXTENSIONS shell
environment variable
3. /usr/lib64/crash/extensions (64-bit architectures)
4. /usr/lib/crash/extensions
The same rules will be applied when unloading shared object files
with "extend -u <shared-object>". Without the patch, only files
in the current directory or those specified with a fully-qualified
pathname were accepted. (anderson@redhat.com)
- Changed the manner in which the "bt" command determines which PID 0
swapper task was interrupted by an ia64 INIT or MCA exception.
There is an existing ia64 INIT/MCA handler bug which incorrectly
writes the pseudo task's command name in its comm[] name string
such that the cpu number may not be part of the string. If that
happens, then without this patch the "bt" command fails to make the link
back to the interrupted task, and displays the error message:
"bt: unwind: failed to locate return link (ip=0x0)!"
(anderson@redhat.com)
- Removed an unused initialized variable in get_task_mem_usage().
(junkoi2004@gmail.com)
- Added a debug-level 8 statement in readmem() that will display the
current input address and its translated physical address under the
existing debug-level 4 "<readmem: ...>" debug line, put in place to
aid in debugging read and/or seek errors.
(anderson@redhat.com)
- 4.0-7.7 to 4.0-8.8 incremental patch
(3/20/09)
4.0-7.7 - Also available in Fedora Rawhide devel branch:
build: dist-f11,devel:crash-4.0-7.7.2.f11
http://koji.fedoraproject.org/koji/buildinfo?buildID=83451
build: dist-f11-rebuild,devel:crash-4.0-8.7.2.f11
http://koji.fedoraproject.org/koji/buildinfo?buildID=84905
build: dist-f12-rebuild,devel:crash-4.0-9.7.2.fc12
http://koji.fedoraproject.org/koji/buildinfo?buildID=116824
- Because the ia64 and ppc64 architectures have configurable page
sizes, a host system running a crash session against a dumpfile may
have a different page size than the system that generated the
dumpfile. If the dumpfile is a compressed kdump vmcore or a
diskdump vmcore, the page size will be reset to the dumpfile header's
block_size variable if it does not agree with the host system's page
size. If the dumpfile is a 64-bit kdump ELF vmcore with vmcoreinfo
data that includes the crashing system's page size, that page size
will be used if the architecture is an ia64 or ppc64.
(holt@sgi.com, bwalle@suse.de)
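The page-size rules above can be summarized as a small decision function. The enum and parameter names are hypothetical. The sketch assumes that both rules apply only on architectures with configurable page sizes (ia64, ppc64), and that the vmcoreinfo value comes from its "PAGESIZE=" entry (0 if absent).

```c
#include <stdbool.h>

/* Hypothetical classification of the dumpfile being opened. */
enum dump_kind { DUMP_COMPRESSED_KDUMP, DUMP_DISKDUMP, DUMP_ELF64_KDUMP,
                 DUMP_OTHER };

long choose_page_size(enum dump_kind kind, long host_page_size,
                      long header_block_size, long vmcoreinfo_page_size,
                      bool arch_has_variable_pagesize)
{
    if (!arch_has_variable_pagesize)
        return host_page_size;

    switch (kind) {
    case DUMP_COMPRESSED_KDUMP:
    case DUMP_DISKDUMP:
        /* trust the dumpfile header's block_size if it disagrees */
        if (header_block_size > 0 && header_block_size != host_page_size)
            return header_block_size;
        break;
    case DUMP_ELF64_KDUMP:
        /* use the crashing system's page size from vmcoreinfo */
        if (vmcoreinfo_page_size > 0)
            return vmcoreinfo_page_size;
        break;
    default:
        break;
    }
    return host_page_size;
}
```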
- Fix for "mod -[sS]" command if the target module object filename
contains both underscore and dash characters. Without the patch,
the module load would fail with the error message: "mod: cannot
find or load object file for <name> module". Examples are
the "aes_x86_64" module from the "aes-x86_64.ko" object file, and
the "dm_region_hash" module from the "dm-region_hash.ko" object file.
(anderson@redhat.com)
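The kernel reports module names with dashes converted to underscores, so an object filename can only be matched to a module name if '-' and '_' are treated as equal. Here is a minimal sketch of such a comparison. It is a hypothetical helper, not crash's code.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Return true if objfile (e.g. "dm-region_hash.ko") is the object file
 * for module modname (e.g. "dm_region_hash"), treating '-' and '_' as
 * the same character.
 */
bool module_matches_object(const char *modname, const char *objfile)
{
    size_t len = strlen(modname);

    for (size_t i = 0; i < len; i++) {
        char a = modname[i] == '-' ? '_' : modname[i];
        char b = objfile[i] == '-' ? '_' : objfile[i];

        if (a != b)      /* also stops at the end of a shorter objfile */
            return false;
    }
    return strcmp(objfile + len, ".ko") == 0;
}
```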
- Reject s390 and s390x "L2^B" local label symbols from the kernel
symbol list. (bwalle@suse.de)
- Enlarge the string format buffer in the show_last_run() function to
prevent a buffer overflow when running "ps -l".
(sachinp@in.ibm.com)
- Fix for "bt -a" to continue with the backtraces of the remaining
active tasks when one of them encounters a fatal error. Without
the patch, the command is aborted when any of the backtraces fail.
(anderson@redhat.com)
- Only allow trusted versions of .crashrc and .gdbinit files to be
sourced during session initialization. (anderson@redhat.com)
- Fix for a potential but highly unlikely buffer overflow in the gdb
dwarfread.c and dwarf2read.c files, which requires a hand-crafted
object file with a location block (DW_FORM_block) that contains a
large number of operations. (anderson@redhat.com)
- Fix for a potential but highly unlikely integer overflow in the
Binary File Descriptor (BFD) library, which requires a hand-crafted
object file that specifies a large number of section headers,
leading to a heap-based buffer overflow. (anderson@redhat.com)
- Enable stack unwind on ia64 when using a kerntypes file as the
kernel namelist. (cpw@sgi.com)
- Fix for failure of "files -R" command option if an inode is unknown
due to a NULL f_dentry pointer in any open file structure because of
a kernel error condition. Without the patch, the command aborts
prematurely with the error message: "files: invalid input: ?".
(anderson@redhat.com)
- Allow an LKCD kerntypes debuginfo file created from a kernel module
to be loaded with the command: "mod -s <module> <kerntypes-file>".
(cpw@sgi.com)
- Increased NR_CPUS from 256 to 512 for x86_64, and from 128 to 1024
for ppc64. Made several NR_CPUS-bound static arrays in the internal
task_table and kernel_table structures dynamically allocated only
upon demand. (anderson@redhat.com)
- 4.0-7.6 to 4.0-7.7 incremental patch
(2/06/09)
4.0-7.6 - Fix for initialization-time failure if the kernel was built without
CONFIG_SWAP. Without the patch, it would fail during initialization
with the error: "crash: cannot resolve: nr_swapfiles"
(anderson@redhat.com)
- Fix for the "bt" command when run on x86_64 kernels that contain the
x86/x86_64 merger patch. Without the patch, the backtraces of
non-active (blocked) tasks do not start with "schedule", and as a
result may contain stale frame entries. (anderson@redhat.com)
- Fix for the usage of an input file of commands redirected during
runtime via "<", where more than one command in the input file
results in a fatal error. Without the patch, the handling of the
input file would go into an infinite loop repeatedly running the
second failed command. (anderson@redhat.com)
- Clean up causes for warning messages when compiling with gcc 4.3.2.
(anderson@redhat.com)
- Fix to prevent a segmentation violation during initialization when
parsing (corrupted) module symbols. Without the patch, if a kernel
module's Elf32_Sym/Elf64_Sym data structure contains a corrupt
"st_index" field, the resultant string table access could cause a
segmentation violation. (anderson@redhat.com)
- If an active task experiences a kernel stack overflow, the task's
thread_info structure located at the very bottom of the stack will
likely have its "cpu" field corrupted. Without the patch, any task
with a corrupt cpu value is not accepted, and the error message
"crash: invalid task: <task-address>" is displayed. With the
patch, an active task will be accepted based upon its existence as
the current task in a per-cpu runqueue structure, and there will be
a warning message indicating that the cpu value is corrupt.
(anderson@redhat.com)
- Modification of the "files" command when a task has an open file
referenced by a file descriptor, but the file structure's f_dentry
field is NULL. This is a kernel error condition, but without this
patch the "files" command does not display anything for that file
descriptor, as if the file has been closed or is not in use. This
patch displays the file descriptor number and the file structure's
virtual address. (anderson@redhat.com)
- Fix for the "bt" command on x86 Xen architectures when the backtrace
starts on the hard IRQ stack. Without the patch, the backtrace
may not properly make the transition back to the process stack
with the error message "bt: invalid stack address for this task",
or it may cause a segmentation violation. (anderson@redhat.com)
- 4.0-7.5 to 4.0-7.6 incremental patch
(1/09/09)
4.0-7.5 - Fix for "kmem -i" and "kmem -p" on 2.6.26 x86 CONFIG_SPARSEMEM
PAE kernels to account for the change in value of SECTION_SIZE_BITS.
(oomichi@mxs.nes.nec.co.jp)
- Fix for "bt -[tT]" options on x86 architectures when the backtrace
starts on the hard IRQ stack. Without the patch, the backtrace
may not properly make the transition back to the process stack.
(anderson@redhat.com)
- Fix for the "bt" command when run on a xen hypervisor in which the
backtrace leads to either "process_softirqs" or "page_fault".
Without the patch, the backtrace indicates: "bt: cannot resolve stack
trace", and then the recovery code terminates the command with the
nonsensical error message: "bt: invalid structure size: task_struct".
(oda@valinux.co.jp, anderson@redhat.com)
- Fix for the "kmem -[sS]" options that could cause a segmentation
violation or bogus "bad slab pointer" and "bad inuse counter" error
messages. Reported on 2.6.25-based CONFIG_DEBUG_SLAB kernels, but
could conceivably occur on any kernel with a kmem_cache.nodelists[]
array. (anderson@redhat.com)
- Fix for a bug in the SIAL extension when dealing with bitfields.
(olaf@sgi.com, hedi@sgi.com)
- Fix for the "files" command when run on 2.6.25 and later kernels,
which would either fail with an "invalid kernel virtual address"
error of type "fill_dentry_cache", or would show nonsensical/garbage
"ROOT" and "CWD" pathnames. This was due to the change in format
of the kernel's fs_struct. (anderson@redhat.com)
- Addition of a new "null-stop" environment variable that can be turned
on/off with the "set" command. It simply controls the embedded gdb's
"null-stop" print setting, which, if on, will stop printing character
arrays when the first NULL is encountered. The default setting is
still "off", so there will be no behavioral changes unless it is
turned on during runtime or in .crashrc files.
(anderson@redhat.com)
- Fix for the builtin "g" alias, which would fail with an "Ambiguous
command" error from the embedded gdb module.
(anderson@redhat.com)
- Fix to handle the 2.6.27 kernel's change of the module structure's
num_symtab, core_size and core_text_size members from long to int.
Without the patch, initialization-time failures would result when
running against 64-bit big-endian kernels, and potentially on little-
endian 64-bit kernels. (bwalle@suse.de)
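The following sketch shows why reading a 4-byte member as 8 bytes fails. On a big-endian kernel the intended value ends up in the upper 32 bits. On a little-endian kernel the result is only correct if the adjacent member happens to be zero. The byte layouts are made-up examples, and the member size would in practice come from the debuginfo.

```c
#include <stddef.h>
#include <stdint.h>

/* Decode a big-endian unsigned integer of size bytes. */
uint64_t read_be(const unsigned char *p, size_t size)
{
    uint64_t v = 0;

    for (size_t i = 0; i < size; i++)
        v = (v << 8) | p[i];
    return v;
}

/* Decode a little-endian unsigned integer of size bytes. */
uint64_t read_le(const unsigned char *p, size_t size)
{
    uint64_t v = 0;

    for (size_t i = size; i-- > 0; )
        v = (v << 8) | p[i];
    return v;
}
```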
- Implement support for the /dev/crash driver being built into x86 or
x86_64 Red Hat kernels with the restricted /dev/mem driver. Without
the patch, if the kernel was built with CONFIG_CRASH configured as
"y" instead of "m", and crash was run against the resultant live
kernel, it would fail during initialization attempting to use the
restricted /dev/mem device. (anderson@redhat.com)
- If the /dev/crash driver module has been loaded prior to a live crash
session, then it will not be unloaded when the crash session exits.
Normally the module gets loaded by the crash utility during its
initialization on a live system, and then unloaded when the crash
session exits, regardless of whether the module was loaded by the crash
utility itself or if it was pre-loaded manually. However, if a cpu
subsequently hangs, then a live crash session attempt would also hang
when it tries to load the module. This patch will allow the crash.ko
module to be pre-loaded -- for example during kernel boot-time -- and
if a cpu subsequently hangs, a live crash session can be initiated to
investigate the problem. (anderson@redhat.com)
- Fix to recognize the 2.6.25 re-naming of the x86 user_regs_struct
structure members. Without the patch, running against a kdump
dumpfile would fail with the error: "crash: invalid structure member
offset: user_regs_struct_ebp". (anderson@redhat.com)
- Fix for initialization-time failure when running against 2.6.27
x86_64 xen kernels, which indicate "crash: cannot resolve: end_pfn".
(bwalle@suse.de)
- Fix for initialization-time failure when running against Xen 4.4
hypervisor binaries, which indicate "crash: invalid structure member
offset: domain_is_polling". (bwalle@suse.de)
- Added a new "p -u" option, which indicates that the gdb expression
argument evaluates to a user virtual address in the current context.
This option could be used, for example, if a known kernel data
structure exists at a user virtual address in the current context,
or if the debuginfo data of a user program were loaded into the
crash session via the gdb "add-symbol-file" command.
(anderson@redhat.com)
- Fix for "bt -a" command when running against the xen hypervisor where
the number of physical cpus exceeds the MAX_VIRT_CPUS value for the
processor type. Without the patch on such a system, "bt -a" would
fail after displaying backtraces for the first 32 (MAX_VIRT_CPUS)
pcpus with the error message: "bt: invalid vcpu". The patch also
corrects the "vcpus" command output to show the vcpus associated with
pcpus 32 through 63, and the "doms" command output to show the second
idle domain associated with pcpus 32 through 63.
(oda@valinux.co.jp)
- Fix for the display of the processor speed on IBM Power6 hardware.
Without the patch, "MACHINE: ppc64 (unknown Mhz)" would be displayed
upon initialization and by the "sys" command.
(sachinp@in.ibm.com, acv@linux.vnet.ibm.com)
- 4.0-7.4 to 4.0-7.5 incremental patch
(12/05/08)
4.0-7.4 - Fix for a build regression for non-xen architectures introduced in
version 4.0-7.3. The ppc64, s390 and s390x architectures fail to
compile due to an undefined reference to "xen_hyper_print_bt_header".
(bwalle@suse.de)
- 4.0-7.3 to 4.0-7.4 incremental patch
(10/14/08)
4.0-7.3 - Fix for nonsensical usage of the "set" command when running
against the xen hypervisor binary. If entered alone on the
command line, the command would cause a segmentation violation,
because there is no concept of a "context" in the xen hypervisor.
In addition, more reasonable error messages are displayed if
"set", "set -c <cpu>", "set -p", or "set <address>" are
attempted when running against a xen hypervisor.
(anderson@redhat.com)
- Fix for "bt" command on x86 architectures when the backtrace
starts on the hard IRQ stack. Without the patch, the backtrace
may not properly make the transition back to the process
stack, and therefore not display the interrupt exception frame
or any kernel functions leading up to the interrupt.
(anderson@redhat.com)
- Fix for "search -k" option on some ia64 hardware, depending
upon the underlying physical memory layout. Without the patch
the command could fail prematurely with the error message:
"search: ia64_VTOP(a000000200000000): unexpected region 5 address".
(anderson@redhat.com)
- Fixes for the "bt" command when running against the xen hypervisor
binary. The "bt -o" option, and setting it to run by default with
"bt -O", would fail with the vmlinux-specific error message "bt:
invalid structure size: desc_struct" with a stack trace leading
to read_idt_table(); with the patch it will display the generic
error message "bt: -o option not supported or applicable on this
architecture or kernel". The "bt -e" and "bt -E" options will also
display the same error message, as opposed to the command usage message.
Lastly, the "bt -R" option would cause a segmentation violation;
it has been fixed to work as it was designed.
(anderson@redhat.com)
- The "foreach" command has been removed from the set of commands
supported for usage with the xen hypervisor. If attempted, it
would always silently fail. (anderson@redhat.com)
- Fix for "irq -d" option when run on x86_64 xen kernels. Without the
patch it would indicate: "irq: invalid structure size: gate_struct"
and dump a stack trace leading to x86_64_display_idt_table(). Now it
will indicate that the -d option is not applicable.
(anderson@redhat.com)
- Avoid the symbolic translation of ia64 unity-mapped region 7 kernel
virtual addresses as they are displayed by the "bt -r" and "rd -[sS]"
commands. Without the patch, they are shown as "v+<offset>"
because "v" is an absolute symbol equal to 0xe000000000000000.
(anderson@redhat.com)
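On ia64 the region number is held in the top three bits of a virtual address, so 0xe000000000000000 (the absolute symbol "v") is the start of region 7. The check that skips symbolic translation for such addresses can be sketched as follows. The function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* ia64 virtual addresses carry their region number in bits 63..61. */
static inline unsigned ia64_region(uint64_t vaddr)
{
    return (unsigned)(vaddr >> 61);
}

/* Do not translate unity-mapped region 7 addresses into "v+<offset>". */
bool want_symbolic_translation(uint64_t vaddr)
{
    return ia64_region(vaddr) != 7;
}
```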
- Remove redundant storage of "swapper_pg_dir" symbol value during x86
initialization. (junkoi2004@gmail.com)
- Recognize the removal of the "jiffies" variable when running against
newer versions of the xen hypervisor by indicating "--:--:--" next
to the UPTIME display. (oda@valinux.co.jp)
- Fix to determine whether an x86 or x86_64 xen hypervisor was built
with a PERCPU_SHIFT value of 12 or 13. Without the patch, crash
sessions running against a xen-3.3 hypervisor would fail during
initialization with the error message: "crash: cannot read elf note
core." (oda@valinux.co.jp)
- 4.0-7.2 to 4.0-7.3 incremental patch
(10/10/08)
4.0-7.2 - Fix for initialization-time failure when running against 2.6.27
x86_64 kernels, which indicate "crash: cannot resolve: end_pfn".
The patch sets the new 2.6.27 x86_64 PAGE_OFFSET value, handles
the change in the x86_64 "_cpu_pda" variable declaration, and
distinguishes paravirtual "pv_ops" kernels from traditional xen
kernels. (oomichi@mxs.nes.nec.co.jp, anderson@redhat.com)
- When an improper structure member offset or structure size is
attempted, a partial crash backtrace is displayed in the ensuing
error message. However, if the crash binary was stripped, it would
show "/usr/bin/nm: /tmp/crash: no symbols" instead of the address
and name of the symbol. This has been fixed to work with stripped
binaries if the crash symbol can be found in the crash binary; if
the crash symbol cannot be found, such as for static text symbols,
it will just display its address and "(undetermined)".
(bwalle@suse.de)
- crash.spec file addition: Requires: binutils
(anderson@redhat.com)
- Fix for LKCD kerntypes debuginfo files to use "node_states" when
"node_online_map" is not in use. (cpw@sgi.com)
- Implement support for s390/s390x CONFIG_SPARSEMEM kernels. Without
the patch, crash sessions would fail during initialization with the
error message: "crash: CONFIG_SPARSEMEM kernels not supported for
this architecture". (holzheu@linux.vnet.ibm.com)
- Fix for "kmem -[sS]" when running against 2.6.27 CONFIG_SLUB kernels,
in which the kmem_cache.objects and .order members were replaced by
a kmem_cache_order_objects structure. Without the patch, the command
would fail with the error message: "kmem: invalid structure member
offset: kmem_cache_objects". The fix also recognizes and supports
potentially variable slab sizes as introduced by the kernel patch.
(anderson@redhat.com)
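The kmem_cache_order_objects value packs the slab page order and the object count into one word. The sketch below decodes it. It assumes the 2.6.27 SLUB encoding from mm/slub.c (OO_SHIFT of 16, order in the high bits, object count in the low 16 bits). Treat these constants as assumptions to be checked against the kernel version in use.

```c
/* Assumed 2.6.27 SLUB encoding: order << OO_SHIFT | objects. */
#define OO_SHIFT 16
#define OO_MASK  ((1UL << OO_SHIFT) - 1)

unsigned long oo_order(unsigned long x)   { return x >> OO_SHIFT; }
unsigned long oo_objects(unsigned long x) { return x & OO_MASK; }

/* Bytes spanned by one slab described by the encoded value. */
unsigned long oo_slab_bytes(unsigned long x, unsigned long page_size)
{
    return page_size << oo_order(x);
}
```

Because each slab can carry its own encoded size, a walker that honors variable slab sizes would decode this per slab and not once per cache.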
- Increased the maximum number of SIAL commands from 100 to 200.
(cpw@sgi.com)
- 4.0-7.1 to 4.0-7.2 incremental patch
(9/15/08)
4.0-7.1 - Fix to address RT kernel's renaming of the address_space.nrpages
member to address_space.__nrpages. Without the patch, "kmem -i"
would fail with the error message "kmem: invalid structure member
offset: address_space_nrpages". (bwalle@suse.de)
- For crash utility debug backtraces displayed in error conditions,
the usage of __builtin_return_address() has been replaced with the
backtrace() function. This prevents crashes if the Makefile is
modified to compile with -O2. (bwalle@suse.de, anderson@redhat.com)
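A minimal sketch of a debug backtrace based on glibc's backtrace() and backtrace_symbols_fd() from <execinfo.h>. This is not crash's actual code. The GCC documentation warns that calling __builtin_return_address() with a nonzero argument can crash the program when frame pointers are not available, which is common at -O2. The glibc functions do not rely on that. Symbol names only appear for functions exported to the dynamic symbol table (for example when linked with -rdynamic).

```c
#include <execinfo.h>
#include <unistd.h>

/* Print the current call chain to stderr; returns the number of frames. */
int dump_backtrace(void)
{
    void *frames[32];
    int n = backtrace(frames, 32);

    /* writes directly to the fd, without calling malloc() */
    backtrace_symbols_fd(frames, n, STDERR_FILENO);
    return n;
}
```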
- Fix for ia64 hypervisor backtraces when the entries in the cpu map
are not contiguous. (takebe_akio@jp.fujitsu.com)
- Fix to make shell-escaped commands in a crash input file direct
their output properly. Without the patch, if the output of an
input file was redirected to a file or pipe, the output of any
shell-escaped commands in the input file only went to stdout.
(anderson@redhat.com)
- Fix to allow the usage of the "-i inputfile" command line option
when operating from an init script. Without the patch, the crash
session would fail during initialization with the error message:
"crash: /dev/tty: No such device or address". (anderson@redhat.com)
- Fix for the "kmem -P <address>" option, where <address> is an
invalid physical address. Without the patch, the command causes
a segmentation violation on an ia64; on other architectures an
unnecessary mem_map header is displayed prior to the error message.
(wency@cn.fujitso.com)
- Fix for a potential endless cascade of SIGFPE exceptions during
session initialization when a vmlinux and vmcore do not match,
and a correct System.map or a non-debug vmlinux file is not supplied.
Running with such a mismatch is allowable, but is certainly not
recommended. In this case, an incorrect kernel HZ value of 0 was
calculated and used for the initial "UPTIME:" display. (anderson@redhat.com)
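The SIGFPE comes from integer division by a zero HZ. A defensive uptime formatter can be sketched as follows. It is hypothetical and simplified: it ignores details such as the kernel's INITIAL_JIFFIES offset and falls back to "--:--:--" when HZ is not usable.

```c
#include <stdio.h>
#include <string.h>

/*
 * Format jiffies as HH:MM:SS. Returns -1 and writes "--:--:--" if hz is
 * not positive, instead of dividing by zero.
 */
int format_uptime(unsigned long long jiffies, long hz, char *buf, size_t len)
{
    if (hz <= 0) {
        snprintf(buf, len, "--:--:--");
        return -1;
    }

    unsigned long long secs = jiffies / (unsigned long long)hz;

    snprintf(buf, len, "%02llu:%02llu:%02llu",
             secs / 3600, (secs / 60) % 60, secs % 60);
    return 0;
}
```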
- More gracefully handle a nonsensical "search -u <address>" command
attempt on a kernel thread or any context with no user address space.
Without the patch, the error message was related to a failed user
virtual address translation attempt; with the patch it now indicates:
"search: current context has no user address space".
(anderson@redhat.com)
- Reworked the "search" command for usage with the Xen hypervisor.
When run against a Xen hypervisor, depending upon the arguments
used, the command could cause a segmentation violation or display a
nonsensical error message; otherwise, it could appear to work but
fail to find the search target value even though it was present
in the specified memory range. To address the various shortcomings,
the following restrictions have been put in place for usage with the
Xen hypervisor:
(1) A starting virtual address must be supplied either symbolically
or with the "-s <address>" option.
(2) The (nonsensical) "-u" option is no longer accepted.
(3) The "-k" option is no longer accepted.
(4) When cycling through virtual memory, as soon as an address
cannot be read, the search will be quietly suspended. To
determine where the search was suspended, enter "set debug 1",
and then re-run the command.
(anderson@redhat.com)
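Restriction (4) can be illustrated with a search loop that stops quietly at the first unreadable address and records where it stopped. The memory source here is a hypothetical single readable window standing in for the dumpfile. A return value of 0 is used as "not found", which is acceptable for a sketch but ambiguous if address 0 itself could match.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical memory source: one readable window [base, base + len). */
struct memwin {
    uint64_t base;
    const unsigned char *data;
    size_t len;
};

/* Read 8 bytes at vaddr; returns 0 on success, -1 if unreadable. */
static int read_u64(const struct memwin *m, uint64_t vaddr, uint64_t *val)
{
    if (vaddr < m->base || vaddr - m->base + 8 > m->len)
        return -1;
    memcpy(val, m->data + (vaddr - m->base), 8);
    return 0;
}

/*
 * Search [start, end) for value on 8-byte boundaries. Returns the match
 * address or 0. *stopped_at is set to the first unreadable address, or 0
 * if the whole range could be read.
 */
uint64_t search_u64(const struct memwin *m, uint64_t start, uint64_t end,
                    uint64_t value, uint64_t *stopped_at)
{
    *stopped_at = 0;
    for (uint64_t a = start & ~7ULL; a < end; a += 8) {
        uint64_t v;

        if (read_u64(m, a, &v) < 0) {
            *stopped_at = a;   /* suspend quietly; report only on request */
            return 0;
        }
        if (v == value)
            return a;
    }
    return 0;
}
```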
- Fix for initialization-time segmentation violation due to a module
allocating and creating an exported symbol list outside of its own
virtual address space, and then overwriting its own symbol list
pointer. (anderson@redhat.com)
- Implementation of a "--minimal" command line option, which brings
up a crash session that is restricted to the "log", "dis", "rd",
"sym", "eval" and "exit" commands. This option may provide a way to
extract some minimal/quick information from a corrupted or truncated
dumpfile, or in situations where one of the several kernel subsystem
initialization routines, which are not called, would abort the
crash session. (sharyath@in.ibm.com, sachinp@in.ibm.com)
- 4.0-6.3 to 4.0-7.1 incremental patch
(8/19/08)
4.0-7 - License tag in crash.spec changed from "GPL" to "GPLv2"; otherwise
identical to 4.0-6.3. (spot@fedoraproject.org)
- Available only in Fedora Rawhide devel branch:
build: dist-f10,devel:crash-4.0-7
http://koji.fedoraproject.org/koji/buildinfo?buildID=56268
(7/15/08)
4.0-6.3 - Support for Fedora FC9 kernels containing the linux-2.6.utrace.patch,
which removes the task_struct.parent member. Without the patch, the
crash session fails during initialization with the error message:
"crash: invalid structure member offset: task_struct_parent".
(anderson@redhat.com)
- Available in Fedora Rawhide devel branch:
build: dist-f10,devel:crash-4.0-6.3
http://koji.fedoraproject.org/koji/buildinfo?buildID=47600
- Further scalability improvements to the "search -k" mechanisms.
(anderson@redhat.com)
- Changed ppc64 manner of determining the number of cpus to first check
the cpu_present_map, and only if that doesn't exist, continue to use
the cpu_online_map. Without the patch, depending upon which cpus
were offline, crash sessions could fail during initialization with
the error message: "crash: cannot determine idle task addresses from
init_tasks[] or runqueues[]". (anderson@redhat.com)
- Fix/workaround for the ppc64 "bt" command on panic/active tasks when
run against dumpfiles whose kernel had crashed with one or more
cpus offline. Without the patch, the "bt" command could cause a
segmentation violation, or fail because the starting stack location
and instruction pointer were invalid. With the patch, an error
message will be displayed, indicating that the NT_PRSTATUS note for
that task could not be determined. (anderson@redhat.com)
- Added support for vtop translation of 1MB large pages available on
new z10 (s390x) systems. (holzheu@linux.vnet.ibm.com)
- Prevent misleading init-time warning message for s390/s390x when
verifying the vmlinux file with respect to the host machine type.
Without the patch, this message would appear when running on s390
or s390x machines: "WARNING: machine type mismatch: crash utility:
S390X /usr/lib/debug/lib/modules/2.6.18-86.el5/vmlinux: (unknown)"
(holzheu@linux.vnet.ibm.com)
- Minor documentation fix to crash.8 man page, moving the "wr" command
from being munged into the "whatis" description into its own list
entry. (yamato@redhat.com)
- Support for running against an x86 xen-syms hypervisor binary based
upon xen 3.1.2 or later. Without the patch, the session would fail
to recognize that it was PAE, and "bt" commands on non-active
tasks would fail with the error messages "bt: cannot resolve stack
trace" and "bt: invalid structure size: task_struct".
(oda@valinux.co.jp, anderson@redhat.com)
- Support for running against an x86_64 xen-syms hypervisor binary
based upon xen 3.1.2 or later. Without the patch, the session would
fail during initialization with the error message: "crash: cannot
resolve idle_pg_table_4". In addition, the x86_64 xen-syms
hypervisor is now relocatable, but the kdump vmcore does not
(currently) export the base physical address of the relocated
hypervisor text and static data. Without that knowledge, the crash
utility cannot make virtual to physical address translations, and
therefore cannot navigate through the vmcore. To address that
shortcoming, a patch is required for either the xen hypervisor code
or the kexec-tools package to export the value of the hypervisor's
"xen_phys_start" symbol to the vmcore. Until such time, however, a
workaround has been put in place to pass the value with a new command
line option that is invoked like so:
# crash --xen_phys_start <address> xen-syms vmcore
The value of the xen_phys_start <address> argument can be
determined in two ways, either from /proc/iomem on the live
system running the dom0 kernel that generated the kdump, or by
running crash on the target vmcore using the dom0 vmlinux file.
For example, on this system, the <address> argument would be
3ee00000:
# cat /proc/iomem | grep Hypervisor
3ee00000-3fdfffff : Hypervisor code and data
#
Alternatively, the vmcore file in this example indicates that the
<address> argument would be 0x3f000000:
# crash vmlinux vmcore
...
crash> px xen_hypervisor_res
xen_hypervisor_res = $3 = {
start = 0x3f000000,
end = 0x3fffffff,
name = 0xffffffff8049ab72 "Hypervisor code and data",
flags = 0x80000200,
parent = 0xffff880000001180,
sibling = 0x0,
child = 0xffff8800000000a8
}
If the --xen_phys_start command line option is not used, the session
will fail during initialization. However, there will be a warning
message preceding the failure indicating: "WARNING: This hypervisor
is relocatable; if initialization fails below, try using the
--xen_phys_start <address> command line option". Eventually the
value of the hypervisor's "xen_phys_start" will be passed in the
vmcore header, obviating the need for this workaround.
(oda@valinux.co.jp, anderson@redhat.com)
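A minimal C sketch of reading the value from /proc/iomem on the live
system, assuming the hypervisor region is listed with the "Hypervisor
code and data" label shown above. The function names are hypothetical
and are not part of the crash utility:

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not part of crash): parse one /proc/iomem line like
 *   "3ee00000-3fdfffff : Hypervisor code and data"
 * Returns 1 and stores the region's start address if the line describes the
 * hypervisor region; otherwise returns 0.
 */
static int parse_hypervisor_line(const char *line, unsigned long long *start)
{
    unsigned long long lo, hi;

    if (strstr(line, "Hypervisor code and data") == NULL)
        return 0;
    /* Leading space skips the indentation used for nested iomem entries. */
    if (sscanf(line, " %llx-%llx", &lo, &hi) != 2)
        return 0;
    *start = lo;
    return 1;
}

/* Scan an open /proc/iomem stream; returns 0 if no hypervisor entry exists. */
unsigned long long find_xen_phys_start(FILE *fp)
{
    char line[256];
    unsigned long long start;

    while (fgets(line, sizeof(line), fp) != NULL)
        if (parse_hypervisor_line(line, &start))
            return start;
    return 0;
}
```

On the example system above, find_xen_phys_start() would return
0x3ee00000, which would then be passed as "--xen_phys_start 3ee00000".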
- 4.0-6.2 to 4.0-6.3 incremental patch
(4/30/08)
4.0-6.2 - Implemented a new "rd -S" option which, like the "-s" option,
displays the symbolic translation of kernel virtual addresses,
but also recognizes the virtual addresses of slab objects, and when
found, the address is replaced by the kmem_cache slab name string
inside brackets. (anderson@redhat.com)
- If the <address> argument to "kmem -[sS] <address>" is offset from
the beginning of a slab object, the found address that is displayed
is now the address of the containing object. This only applies to
kernels using mm/slab.c; CONFIG_SLUB kernels already display
the address of the containing object.
(anderson@redhat.com)
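The rounding described above can be sketched as follows, assuming (for
illustration only) that a slab's objects are laid out contiguously from
a known first-object address with a fixed stride. Real mm/slab.c slabs
also carry colouring and management data, which this hypothetical helper
ignores:

```c
/*
 * Hypothetical sketch: round an address that points into the middle of a
 * slab object down to the start of that object.  Assumes the objects are
 * contiguous, starting at first_obj, each obj_size bytes apart.
 * Returns 0 if the address falls outside the slab's object area.
 */
unsigned long containing_object(unsigned long addr, unsigned long first_obj,
                                unsigned long obj_size, unsigned int nr_objs)
{
    unsigned long index;

    if (obj_size == 0 || addr < first_obj)
        return 0;
    index = (addr - first_obj) / obj_size;   /* which object the address hits */
    if (index >= nr_objs)
        return 0;
    return first_obj + index * obj_size;     /* start of that object */
}
```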
- Fix for "kmem -[sS] [address]" in 2.6.25 CONFIG_SLUB kernels, which
addresses changes in the kernel's per-slab free list tracking. Without
the patch, error messages of the type "kmem: invalid kernel virtual
address: 10700 type: get_freepointer" would be seen when the full
list of objects in a per-cpu slab was displayed.
(anderson@redhat.com)
- Fix for "kmem -[sS] <slab-address>" in 2.6.25 CONFIG_SLUB kernels,
in which the slab structure is actually a page struct. Some slab
addresses would not be recognized as such, and therefore without the
patch, error messages of the type "kmem: address is not allocated in
slab subsystem: <slab-address>" would be seen.
(anderson@redhat.com)
- Fix for an initialization-time failure with Ubuntu kernels because
of a mismatch between the /proc/version string and the linux_banner
string, due to additional information appended to the linux_banner
string in Ubuntu kernels. (anderson@redhat.com, asid@hp.com)
- Fix for the "net" command in 2.6.22 and 2.6.23 kernels, where the
"dev_base" net_device structure was replaced by the "dev_base_head"
list_head. Without the patch, the "net" command with no arguments
would fail with the error message: "net: dev_base does not exist!".
(eteo@redhat.com)
- Fix for the "net" command in 2.6.24 and later kernels where the
global "dev_base_head" list_head has been removed, and the network
devices are linked from the "init_net" net structure. Without the
patch, the "net" command with no arguments would fail with the
error message: "net: dev_base does not exist!".
(anderson@redhat.com)
- For kernels configured with CONFIG_SLUB, "kmem -S" has been updated
to properly differentiate between a cache whose "full" slabs are
tracked but whose full list is empty, and a cache whose full slabs
are not tracked at all. Without this patch, a cache's full list could
be indicated as "(empty)" instead of the more correct indication of
"(not tracked)". (i-kitayama@ap.jp.nec.com, anderson@redhat.com)
- Fix for the "vm" command when the crash session was invoked with
the -s command line option. Without the patch, if invoked prior to
a "set", "ps" or "vtop" command, the "vm" command run against a
task other than the initial context would mistakenly indicate that
the task contained no virtual memory.
(anderson@redhat.com, baiwd@cn.fujitsu.com)
- Fix/workaround for the "search -k" command option on relocatable
2.6-era ia64 machines configured with CONFIG_SPARSEMEM. Without
the patch, an immediate segmentation violation occurs.
(anderson@redhat.com, yzgcsu@cn.fujitsu.com)
- 4.0-6.1 to 4.0-6.2 incremental patch
(3/31/08)
4.0-6.1 - Support for 2.6.25 x86_64 kernels with the x86/x86_64 merger patch.
Without the patch, attempting a crash session would fail during
initialization with the error message: "crash: invalid structure
member offset: tss_struct_ist". (anderson@redhat.com)
- Support for 2.6.25 x86 kernels with the x86/x86_64 merger patch.
Without the patch, attempting a crash session on a dumpfile would
fail during initialization with the error message: "crash: invalid
structure size: user_regs_struct". (anderson@redhat.com)
- Fix for "bt" command when running on a live 2.6.25 x86 kernel with
the x86/x86_64 merger patch. Without the patch, "bt" would fail
with the error message: "bt: invalid structure member offset:
task_struct_thread_eip". (anderson@redhat.com)
- Fix for the "timer" command in 2.6.25 kernels. Without the patch,
the command would fail with the error message: "timer: zero-size
memory allocation! (called from <user address>)".
(anderson@redhat.com)
- Cosmetic change to the x86 "bt" command to recognize the entry point
name change from "sysenter_entry" to "ia32_sysenter_target". Without
the patch, the entry point would indicate the "sysenter_past_esp"
assembly code label. (anderson@redhat.com)
- 4.0-6.0 to 4.0-6.1 incremental patch
(2/29/08)
4.0-6.0 - Available only as version 4.0-6.0.5 in Fedora's dist-f9/devel branch:
http://koji.fedoraproject.org/koji/buildinfo?buildID=37614
- When compiling within a 2.6.25-based build environment, four
"typedef unsigned int u32;" declarations are required due to a new
structure declaration in "asm-x86/ptrace-abi.h" that uses u32
members, but u32 is only defined in "asm-x86/types.h" within an
#ifdef __KERNEL__ section. I posted a patch on LKML to address the
ptrace-abi.h problem by changing the structure member declarations
to use __u32 typedefs, which was accepted in the -mm tree.
(anderson@redhat.com)
- 4.0-5.1 to 4.0-6.0 incremental patch
(2/20/08)
4.0-5.1 - Update "ps -l" to use task_struct.sched_info.last_arrival value
on 2.6.23 and later kernels that don't have a task_struct.last_ran
member. Without the patch, the option would fail with the error
message: "ps: neither task_struct.last_run nor task_struct.timestamp
exist in this kernel". (anderson@redhat.com)
- Fix for potential initialization-time failure when running against
2.4-era x86 netdump dumpfiles if the ebp and esp contents in the
ELF header's NT_PRSTATUS register dump do not contain a vestige of
the panic task's kernel stack address. Without the patch, there may
be one or more warning messages complaining about tasks not being in
the PID hash, followed by a fatal error message: "crash: invalid
kernel virtual address: <bad-address> type: 32-bit KVADDR", where
the <bad-address> can be any bogus kernel virtual address.
(anderson@redhat.com)
- Fix to make the unused do_radix_tree() function work as advertised.
(atyson@hp.com)
- Added zlib-devel to the crash-devel package-dependency Requires line
in the crash.spec file. (anderson@redhat.com)
- 4.0-5.0 to 4.0-5.1 incremental patch
(2/19/08)
4.0-5.0 - Tentatively scheduled as the baseline version for RHEL4.7 and RHEL5.2
crash utility errata releases; also built in Fedora Rawhide:
4.0-5.0.0 - RHEL4.7 errata version
4.0-5.0.2 - RHEL5.2 errata version
4.0-5.0.3 - Fedora Rawhide (devel branch)
- Fix for a potential segmentation violation during crash session
initialization if a task's kernel stack has been completely overrun,
corrupting its thread_info structure at the bottom of the stack.
This could occur running against kernels from 2.6.8 through 2.6.18.
With the patch, the suspect task will be reported during the task
initialization sequence. (anderson@redhat.com)
- Fix for the "bt" command when run on xen x86 dom0 dumpfiles, which
may potentially show empty backtraces for one or more active tasks.
(oomichi@mxs.nes.nec.co.jp)
- Initial support for OpenVZ kernels. (kshileev@sw.ru)
- 4.0-4.13 to 4.0-5.0 incremental patch
(1/17/08)
4.0-4.13 - If the machine type of the vmlinux file or dumpfile does not match
that of the crash utility binary, or, far less likely, there is a ppc64
or ia64 endian mismatch, the crash session will fail during initialization
with the generic error message, "crash: <filename>: not a
supported file format". To aid the user in understanding what
caused the failure, this patch prepends an additional error
message that clarifies the reason behind the mismatch.
(anderson@redhat.com, bwalle@suse.de)
- The "kmem -V" option, which currently displays the kernel's
"vm_stat" counter values, will now also display the "vm_event_states"
counter values, both of which were introduced in 2.6.18. For 2.6
kernels prior to 2.6.18, the precursor "page_states" counter values
will be displayed. (anderson@redhat.com)
- Implemented a new "kmem -z" option to display per-zone memory
statistics. The amount of data displayed is dependent upon the
kernel version. At a minimum, the size, min/low/high and free
page counts are shown. If the zone struct contains nr_active,
nr_inactive, pages_scanned and all_unreclaimable members, those
fields are shown. If the zone struct contains a per-zone vm_stat[]
array (identical to the system-wide vm_stat[] array), its contents
are dumped. For any other data in the zone, the address of the
zone structure is displayed.
(anderson@redhat.com)
- Fix for the RSS amounts displayed by the "ps" and "vm" commands
on 2.6 kernels prior to 2.6.13. (anderson@redhat.com)
- Fix for the x86 "bt" command when running a version of crash built
on a pre-2.6.20 host against a 2.6.20 or later dumpfile, or when
running a version of crash built on a 2.6.20 or later host against
a pre-2.6.20 dumpfile. Without the patch, kernel exception frames
would be mistaken for, and displayed as, user exception frames, and
parts of the backtrace above the kernel exception frame would be
truncated. (atyson@hp.com)
- Fix for FC8 xen x86 kernels (2.6.21-2952.fc8xen) that fail during
initialization after reporting "WARNING: cannot read linux_banner
string", followed by the fatal error message "crash: vmlinux and
vmcore do not match!". This required a change to the virtual
address mask value used to determine the base value of the x86
kernel's unity-mapped virtual address region. (anderson@redhat.com)
- Set a default "phys_base" value for recent fully-virtualized
relocatable x86_64 kernels whose text start address is not equal
to the __START_KERNEL_map value. Without the patch, the crash
session fails during initialization with the warning message
"WARNING: cannot read linux_banner string", followed by the fatal
error message "crash: vmlinux and vmcore do not match!". The
error can alternatively be worked around if the "phys_base" value
is first determined by running a crash session on the live system
that generated the dumpfile, by entering: "help -m | grep phys_base".
The value shown can then be used when running against the dumpfile
like so: "crash --machdep phys_base=<value> vmlinux vmcore"
(anderson@redhat.com)
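For reference, the kernel-text translation that phys_base feeds into can
be sketched like this, assuming the 2.6-era x86_64 __START_KERNEL_map
value of 0xffffffff80000000. The helper name is hypothetical:

```c
#include <stdint.h>

/* __START_KERNEL_map on 2.6-era x86_64 kernels (assumed for this sketch). */
#define START_KERNEL_MAP 0xffffffff80000000ULL

/*
 * Translate a kernel text/static-data virtual address to a physical address.
 * On a relocatable x86_64 kernel, phys_base is the amount by which the kernel
 * was physically relocated from the address it was linked for.
 */
uint64_t kernel_text_vtop(uint64_t vaddr, uint64_t phys_base)
{
    return (vaddr - START_KERNEL_MAP) + phys_base;
}
```

With a phys_base of 0, the text address ffffffff80200000 maps to physical
0x200000; with phys_base=0x1000000 the same address maps to 0x1200000,
which is how a wrong default phys_base value can lead to the linux_banner
read failure described above.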
- Debug: implemented a new "--active" crash command line option, which
will gather only the active tasks from each runqueue, skipping the
traversal of the kernel's pid_hash mechanism.
(anderson@redhat.com)
- Debug: "help -n" formats and displays ASCII VMCOREINFO data.
(anderson@redhat.com)
- 4.0-4.12 to 4.0-4.13 incremental patch
(1/11/08)
4.0-4.12 - Fix for the "kmem -n" command to handle the 2.6.24 kernel replacement
of the "node_online_map" nodemask with its appropriate entry in the
new "node_states[]" nodemask array. Without the patch, the per-node
zone data would not be displayed, and any commands depending upon
the node table data would be affected. (anderson@redhat.com)
- Fix for "kmem -p" on 2.6.24 x86_64 kernels that are configured with
CONFIG_SPARSEMEM_VMEMMAP, which use a virtually-mapped page struct
array. Without the patch, the virtual-to-physical translation of
each page structure was invalid, and "kmem -p" would display invalid
data. This also affected other commands, such as the
output of "kmem -i", and the output of a "vtop" command on a mapped
page address. Also, the virtual base address of the region is now
displayed by the "mach" command.
(oomichi@mxs.nes.nec.co.jp, anderson@redhat.com)
- Fix for the "dev" command's character device name string output to
recognize the change of the name structure member from a pointer
to an embedded string. Without the patch, 2.6.16 and later kernels
would display "(unknown)" character device names.
(olivier.daudel@u-paris10.fr, anderson@redhat.com)
- Fix for the "kmem -[sS]" command to handle the 2.6.24 change to
the CONFIG_SLUB kmem_cache structure, which re-worked the manner
in which the per-cpu slabs get referenced. Without the patch,
the command would fail with several error messages of the type:
"kmem: page_to_nid: invalid page: ffff81003993f4b0".
(anderson@redhat.com)
- Fix for the "kmem -[fF]" command to handle the 2.6.24 kernel change
of the free_area struct, which replaced the singular linked list
of pages with 5 (MIGRATE_TYPES) linked lists. Without the patch,
the command would fail with the error message: "kmem: unrecognized
free_area struct size: 88". (anderson@redhat.com)
- Fix for the "runq" command to handle the 2.6.24 kernel change to
the CFS scheduler that introduced per-cpu init_cfs_rq structures
for task group scheduling. Without the patch, no queued tasks
were displayed, because the rb_root of queued tasks was being
taken from the embedded cfs_rq in each per-cpu runqueue.
(anderson@redhat.com)
- 4.0-4.11 to 4.0-4.12 incremental patch
(12/12/07)
4.0-4.11 - Fix for task-gathering to handle the 2.6.24 pid_namespace-related
changes to the kernel pid_hash array. Without the patch, the crash
session fails during initialization with the message "crash: cannot
gather a stable task list via pid_hash (500 retries)".
(anderson@redhat.com)
- Fix for "kmem -f <address>" and "kmem <address>" commands on
x86 kernels, which may incorrectly indicate that the address is in
the kernel's free page list. Without this patch, if the address
argument is a physical address over 4GB, or a page struct address
referencing a physical address over 4GB, it is possible that the
address would incorrectly be shown as being in the kernel's free
page list. (anderson@redhat.com)
- Fix for x86 "bt" command for active tasks in Egenera dumpfiles
based upon LKCD version 7. Without the patch, the starting points
for the active task backtraces were erroneous.
(anderson@redhat.com)
- Fix for "kmem -S" error message if a slab object is found in both
a per-cpu list and on a slab's global free list. Without the patch,
the object address and cpu number values are flip-flopped in the
error message. (bob.montgomery@hp.com)
- 4.0-4.10 to 4.0-4.11 incremental patch
(12/6/07)
4.0-4.10 - Fix a regression introduced in 4.0-4.9 that causes the "kmem -p"
command to fail in SPARSEMEM kernels that have the struct
page.index member embedded in an anonymous union, which occurred
when the CONFIG_SLUB-related modifications were made to the page
struct in 2.6.22. Without the patch, "kmem -p" fails with the error
message "kmem: invalid structure member offset: page_index".
(anderson@redhat.com)
- 4.0-4.9 to 4.0-4.10 incremental patch
(11/21/07)
4.0-4.9 - Fix for the "kmem -p" command in kernels configured with
CONFIG_SPARSEMEM, i.e., not CONFIG_SPARSEMEM_EXTREME. Without
the patch, the page structure address for each physical page
was erroneous. (oomichi@mxs.nes.nec.co.jp)
- Fix for the "kmem -p" command output of MAPPING and INDEX values
on kernels where the mapping and index members of the page structure
are contained within anonymous unions. Without the patch, those
fields may be dashed-out.
(bob.montgomery@hp.com, anderson@redhat.com)
- Fix for the "mod" command to search for module object files in the
/lib/modules/<release>/updates directory tree before looking
in /lib/modules/<release>. (charlotte.richardson@stratus.com)
- Fix for the "waitq" command for 2.6.15-era and later kernels, which
replaced the __wait_queue.task member with the __wait_queue.private
member. Without the patch, the command would fail with the error
message: "waitq: invalid structure member offset: __wait_queue_task".
(atyson@hp.com)
- SIAL interpreter fix for an "operation on 'v1' may be undefined"
warning in sial_exeop(). (bwalle@suse.de)
- Fix for several unpredictable failure modes when attempting
"crash -h [command] > outputfile" from a shell command line.
(anderson@redhat.com)
- Addressed compiler warnings generated by extensions/echo.c and
extensions/dminfo.c. (bwalle@suse.de, anderson@redhat.com)
- Addressed compiler warnings generated by lkcd_common.c, lkcd_v8.c
and symbols.c when using:
-O2 -fmessage-length=0 -Wall -D_FORTIFY_SOURCE=2 -fstack-protector
-fno-builtin-memset -fno-strict-aliasing
(bwalle@suse.de)
- Fix for "kmem -p" on i386 CONFIG_SPARSEMEM kernels with greater than
4GB of memory. Without the patch, the physical address value wraps
back to zero after physical page ffff0000.
(oomichi@mxs.nes.nec.co.jp)
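A wrap at 4GB is characteristic of a physical address being computed in
a 32-bit type. The following C sketch illustrates that general class of
bug, assuming 4KB i386 pages; it is not taken from the crash sources:

```c
#include <stdint.h>

#define PAGE_SHIFT 12   /* 4KB pages on i386 */

/* Buggy form: the shift is done in 32 bits, so addresses >= 4GB wrap to 0. */
uint32_t pfn_to_phys_32(uint32_t pfn)
{
    return pfn << PAGE_SHIFT;
}

/* Fixed form: widen to 64 bits before shifting. */
uint64_t pfn_to_phys_64(uint32_t pfn)
{
    return (uint64_t)pfn << PAGE_SHIFT;
}
```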
- Fix to redirect SIAL script command output to pipes, files, etc., in
the same manner as native crash commands.
(Robert.Denman@teradata.com, anderson@redhat.com)
- Fix for ppc64 kernels with 64K pages whose PTE_RPN_SHIFT has changed
from 32 to 30. Without the patch, an initialization-time warning
message "WARNING: cannot access vmalloc'd module memory" would occur,
the "mod" command would fail with the same message, and "kmem -s"
failures could occur when attempting to read a kmem slab cache name
string. Translations and reads of vmalloc'd kernel virtual addresses
and user virtual addresses would appear to work, but bogus data was
returned because the resultant physical address that was read was
incorrect. (anderson@redhat.com)
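A sketch of why the shift value matters, assuming the physical address
is recovered as (pte >> PTE_RPN_SHIFT) << PAGE_SHIFT on a 64K-page ppc64
kernel. The helper name is hypothetical:

```c
#include <stdint.h>

#define PAGE_SHIFT_64K 16   /* ppc64 kernel configured with 64K pages */

/*
 * Recover the physical address mapped by a PTE, assuming the real page
 * number sits above rpn_shift.  Decoding with a stale shift (32 instead
 * of 30) silently produces a different, bogus physical address.
 */
uint64_t pte_to_paddr(uint64_t pte, unsigned int rpn_shift)
{
    return (pte >> rpn_shift) << PAGE_SHIFT_64K;
}
```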
- Fix for "kmem -s" if a slab cache whose name string cannot be read
is encountered. Without the patch, a fatal error message would be
displayed and the command aborted. With this patch, a non-fatal
warning message is displayed, and the cache name is indicated as
"(unknown)". (anderson@redhat.com)
- Fix for x86-64 SPARSEMEM kernels with CONFIG_NUMA off. Without the
patch, the crash session fails during initialization with the message
"crash: invalid structure member offset: pglist_data_node_mem_map".
(sachinp@in.ibm.com)
- Fix to use the ia64 physical start address from the LKCD dump header
instead of the default value. This was reported as a bug on an SGI
machine. (bwalle@suse.de)
- For s390[x] kernels, the page table allocation method will be changed
such that instead of 3 levels, it will now be possible to allocate 4
levels. The current implementation of the page table walk functions
in the crash utility makes assumptions about how the page tables are
allocated by the kernel, e.g., 3 levels are hard coded. This patch
changes that, so that the page table walk is done only according to the
s390 architecture, without assumptions about the implementation in the
kernel. (holzheu@linux.vnet.ibm.com)
- Fix for LKCD dumpfile access failures that abort() the crash session
after displaying an error message indicating a problem with physical
memory zones in the dumpfile. Without the patch, the crash session
would end immediately after displaying an error message of the sort:
"conflicting page: zone 0, page 0: 0, 177160130 != 65536". That
error message will now be displayed only if the crash debug mode is 1
or more. Instead, a readmem() "seek error" will be displayed, and the
session will return to the "crash>" prompt. (anderson@redhat.com)
- 4.0-4.8 to 4.0-4.9 incremental patch
(11/20/07)
4.0-4.8 - Implemented support for kernels configured with CONFIG_SLUB, which
completely replaces the venerable "mm/slab.c" with the new
"mm/slub.c" kmalloc() slab subsystem. Accordingly, the
"kmem -s [address]", "kmem -S [address]", and "kmem <address>"
commands will display slab-related information in a similar manner
to what they currently do, with additional per-node information.
It should be noted that, due to slub.c's design, the verbose
"kmem -S" output will be pared down slightly to not display the
list of all "full" slabs unless the proper kernel slub debugging
has been turned on. However, given an address of an object from a
full slab page, or of the full slab page itself, that address
will then be traced back to its original slab cache and its data
displayed. (anderson@redhat.com)
- Change for support of LKCD dumpfile version 8 and later to determine
the backtrace starting registers from the dumpfile header. Also
increased the maximum NR_CPUS for ia64 to 4096.
(bwalle@suse.de)
- The SIAL interpreter extension module has been updated to support
the ia64, ppc64, s390 and s390x architectures. Several fixes have
been applied, and three new debug commands, sdebug, sclass and sname
have been added. (lucchouina@yahoo.com)
- Fixed a bug in the CONFIG_SPARSEMEM patch (contributed in 4.0-3.22)
in which a static pointer variable was initializing itself with a
buffer that was returned from a command-time-only GETBUF() call,
instead of using malloc(). It would then continue to use the buffer,
trampling on the buffer contents set up by whatever command that
subsequently allocated the buffer. I only caught this during the
CONFIG_SLUB development, so I have no examples (if any) of how this
would have ever manifested itself in a crash command error.
(anderson@redhat.com)
- Fixed the "mach" command in CONFIG_SLUB kernels which would abort
with the error message: "mach: cannot resolve cache_cache" when
trying to determine the value for the L1 CACHE SIZE display. Since
the generic manner of determining the cache size no longer worked
correctly anyway, the L1 CACHE SIZE display has been removed.
(anderson@redhat.com)
- Fix for missing NODE header in NUMA "kmem -f" output.
(anderson@redhat.com)
- Fix for the chronology of the contents of the kernel message buffer
output by the "log" command. (atyson@hp.com)
- Display a WARNING message if a PT_LOAD segment in an ELF-style
dumpfile advertises a memory segment that would go beyond the end
of the dumpfile. (bwalle@suse.de, anderson@redhat.com)
- 4.0-4.7 to 4.0-4.8 incremental patch
(10/30/07)
4.0-4.7 - Incorporation of Luc Chouinard's SIAL interpreter (Simple Image
Access Language) as a crash extension module. When loaded with
the "extend" command, the sial.so module provides three commands,
"load" to load a SIAL script, "unload" to unload it, and "edit",
which unloads the script, brings up an $EDITOR-based edit session
of the script, and then loads it again. Also, when the sial.so
module is loaded, it will automatically load any SIAL scripts
found in the /usr/share/sial/crash or $HOME/.sial directories.
Therefore, by putting "extend <path-to>/sial.so" in either
./.crashrc or $HOME/.crashrc, all desired SIAL scripts may be
loaded on a particular machine in a hands-off manner. For details,
consult the README and README.sial files in the extensions/libsial
subdirectory. (lucchouina@yahoo.com)
- Removed hardwired-dependencies in the top-level and extensions
subdirectory Makefiles for building extension modules. Now it is
possible to copy an extension module's .c file into the extensions
subdirectory, and enter "make extensions" from the top-level to build
it. If the build of the module requires special handling, a .mk
makefile with the same prefix as the .c file may be provided, and
it will be used automatically to build the module.
(jmoyer@redhat.com, anderson@redhat.com)
- When a 32-bit x86 xenU guest is run on an x86_64 dom0 host, the
new-style xen ELF format dumpfile contains an ELF header with an
e_machine type of EM_X86_64 (instead of EM_386). Such dumpfiles were
being rejected with the error message "crash: vmcore: not a supported
file format". The fix simply accepts the e_machine type mismatch,
since the new-style ELF format dumpfiles are 64-bit by default.
(anderson@redhat.com)
- Enhanced the "kmem <address>" option to also search for task_struct
and kernel stack addresses, and report them with the "set" output.
Also fixed a problem in which the header for the mem_map data was not
displayed when "kmem <vmalloc-address>" was entered. (anderson@redhat.com)
- Fix for determining starting rip/rsp backtrace hooks for the panic
task in x86_64 xen dom0 kdumps; newer kernels have replaced the
call to "xen_machine_kexec" with "machine_kexec", and without this
patch may display back-traces with missing frames. Also on x86_64
non-xen kdump panic task backtraces, it is possible that the wrong
stack instance of "crash_kexec" is used as the starting hook, which
may also lead to missing frames. (anderson@redhat.com)
- Fix for ia64 LKCD dumpfiles where it is not possible to read the task
structure of the task that follows a task which is in the task address
"fixup list", and zeroes are returned instead. (atyson@hp.com)
- Fix for potential "mod -[sS]" failures with modules whose object
files contain an unusually large number of sections; module
loading attempts may issue a "<segmentation violation in gdb>"
message followed by the error message: "mod: [module name]: gdb
add-symbol-file command failed".
(carl.hsieh@teradata.com, anderson@redhat.com)
- Fix to prevent dumpfile reads beyond EOF when reading new (optimized)
xen ELF core xendumps. Without the patch, error messages of the sort:
"crash: cannot read index page [number]" may occur during session
initialization, with unpredictable run-time results.
(yamahata@valinux.co.jp)
- In x86_xen_kdump_p2m_create(), the same variable was being used as
the for-loop index in both an outer and an embedded inner for-loop.
As a result, if the debug level was 7 or greater, the outer
for-loop was repeated only once. (nishimura@mxp.nes.nec.co.jp)
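The failure mode is easy to reproduce in isolation. This hypothetical C
sketch (not the actual x86_xen_kdump_p2m_create() code) counts how many
times an outer loop body runs when the inner loop reuses its index:

```c
/*
 * Buggy form: the inner loop clobbers the outer index.  For inner >= 0,
 * the outer body runs only once when inner >= outer - 1; with a smaller
 * inner count the outer loop would never terminate.
 */
int outer_passes_shared_index(int outer, int inner)
{
    int i, passes = 0;

    for (i = 0; i < outer; i++) {
        passes++;
        for (i = 0; i < inner; i++) {
            /* inner work would go here */
        }
    }
    return passes;
}

/* Fixed form: each loop has its own index variable. */
int outer_passes_separate_index(int outer, int inner)
{
    int i, j, passes = 0;

    for (i = 0; i < outer; i++) {
        passes++;
        for (j = 0; j < inner; j++) {
            /* inner work would go here */
        }
    }
    return passes;
}
```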
- 4.0-4.6 to 4.0-4.7 incremental patch
(9/25/07)
4.0-4.6 - Also released as:
4.0-4.6.1 - RHEL5.1 errata version (beta)
4.0-4.6.2 - Fedora Rawhide (devel branch)
- Implemented the "runq" command for 2.6.20 and later kernels that have
replaced the O(1) scheduler with the CFS scheduler. If the kernel
was configured to use CFS, the command will display the tasks queued
in each cpu's RT and CFS runqueues. (anderson@redhat.com)
- The initial support put in place for the usage of "kerntypes"
debuginfo files only recognized files created by the LKCD
"dwarfextract" utility run against a -g built vmlinux kernel.
This version adds a new "-k" command line option that allows the
usage of standard -g compiled LKCD Kerntypes files.
(holzheu@linux.vnet.ibm.com)
- Update of "xencrash" support to properly handle dom0/hypervisor
kdumps taken under xen version 3.1 in addition to those taken under
xen 3.0.x. Without this patch, the following warning message
would be displayed during initialization of a xen-syms hypervisor
session: "WARNING: unsupported elf note format". Fixes x86 "bt"
command segmentation violation when running against a xen-syms
hypervisor. Fixes x86_64 session initialization failure when running
against a xen-syms hypervisor, which would display the error
message "crash: invalid structure member offset: tss_struct_rsp0".
(oda@valinux.co.jp)
- 4.0-4.5 to 4.0-4.6 incremental patch
(8/27/07)
4.0-4.5 - Addresses FC7/upstream x86 kernels that have been configured such
that the vmlinux symbol values do not match their relocated values
when loaded. If CONFIG_PHYSICAL_START contains a value that is
greater than CONFIG_PHYSICAL_ALIGN, this mismatch occurs.
Since the crash utility and its embedded gdb have always expected
that the compiled-in kernel symbol addresses are "real", the virtual
to physical translation fails, leading to an initialization-time
failure with the message: "crash: vmlinux and /dev/crash do not
match!" (/dev/mem or the dumpfile name may replace /dev/crash).
To deal with this issue, there are several alternatives:
1) Configure the kernel with CONFIG_PHYSICAL_START less than
or equal to CONFIG_PHYSICAL_ALIGN. Having done that, there
is no problem; the resultant vmlinux file will be loaded at
the address for which it was compiled, which has always
been the case.
2) Since /proc/kallsyms uses the same format as a System.map file,
and since it reflects the relocated symbol addresses, it
can be placed on the crash command line as if it were
a System.map file. (Note that the System.map file created
by these relocated kernels contains the same "wrong" symbol
values as the vmlinux file from which it was created.)
3) On a live system that has /proc/kallsyms (i.e., the kernel was
configured with CONFIG_KALLSYMS), this version of the crash
utility will replace/patch the vmlinux symbol values with those
seen in /proc/kallsyms. The relocation value will be displayed
as a WARNING message during initialization.
4) On a dumpfile, the relocation will not be performed automatically
as it is on a live system. It requires adding /proc/kallsyms to
the command line, or, if run on a different host, a copy of the
crashed system's /proc/kallsyms may be used.
5) Alternatively on a dumpfile, a new command line option has been
created to specify the relocation amount. For example, if a
kernel was configured with a CONFIG_PHYSICAL_START value of 16MB
and a CONFIG_PHYSICAL_ALIGN of 4MB, that results in a relocation
of 12MB. To specify that, enter "crash --reloc=12m ..." on the
command line. (Recall that if crash is run on the live system,
a WARNING message will specify the relocation amount.)
Using /proc/kallsyms or a --reloc=[size] as a command line argument
is similar to using a System.map file, in that it results in the loss
of the use of line number debug data. (anderson@redhat.com)
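In the example above, the relocation is simply 16MB - 4MB = 12MB, i.e.
"--reloc=12m". The C sketch below shows how such a size argument and a
System.map-style /proc/kallsyms line might be parsed. Both helpers are
hypothetical, and the k/g suffixes are assumptions since only the "m"
form appears above:

```c
#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical helper: convert a size string such as "12m" into bytes.
 * Accepts an optional k/K, m/M or g/G suffix (assumed).  Returns 0 on error.
 */
unsigned long long parse_reloc_size(const char *s)
{
    char *end;
    unsigned long long val = strtoull(s, &end, 0);

    switch (*end) {
    case '\0':          return val;
    case 'k': case 'K': return end[1] == '\0' ? val << 10 : 0;
    case 'm': case 'M': return end[1] == '\0' ? val << 20 : 0;
    case 'g': case 'G': return end[1] == '\0' ? val << 30 : 0;
    default:            return 0;
    }
}

/*
 * Parse one System.map or /proc/kallsyms line, e.g. "c0400000 T _text".
 * Returns 1 on success.  Any trailing "[module]" field is ignored.
 */
int parse_symbol_line(const char *line, unsigned long long *addr,
                      char *type, char name[128])
{
    return sscanf(line, "%llx %c %127s", addr, type, name) == 3;
}
```

Comparing a symbol's value from such a line against its value in the
vmlinux file should yield the same relocation amount that the
live-system WARNING message reports.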
- Fix for x86 2.6.22 kernel initialization-time failure indicating:
"crash: invalid size request: 0 type: __per_cpu_offset"
(oomichi@mxs.nes.nec.co.jp)
- Fix to recognize the 2.6.22 kernel's replacement of the "./mm/slab.c"
kmalloc slab subsystem by the "./mm/slub.c" infrastructure in
CONFIG_SLUB-configured kernels. Without this
fix, crash sessions would fail during initialization with the message
"crash: invalid structure member offset: kmem_cache_s_c_num".
(anderson@redhat.com)
- Cliff Wickman sent an additional patch for the LKCD kerntypes
support he introduced in version 4.0-4.4, which addresses this
message that is seen during initialization on 2.6.22 kernels:
"WARNING: cannot determine pgdat list for this kernel/architecture".
(cpw@sgi.com)
- NOTE: The CONFIG_SLUB change in the 2.6.22 kernel will require a
significant update in the crash utility in order for "kmem -[sS]"
options to work again.
- NOTE: 2.6.20 and later kernels may have replaced the O(1) scheduler
with the new CFS scheduler. If configured to use CFS, the "runq"
command fails, which will require a crash utility update to recognize
and display the contents of each cpu's RT and CFS run queue.
- 4.0-4.4 to 4.0-4.5 incremental patch
(7/27/07)
4.0-4.4 - Fix for kernels in which the irq_desc_t typedef is not included in
the vmlinux debuginfo data, by using the 2.6-era struct irq_desc.
Without the patch, the "irq" command fails with the error message,
"irq: cannot determine size of irq_desc_t". (hugh@mimosa.com)
- Implemented new "irq -u" option that displays only in-use IRQs, now
that there can be several thousand entries in the irq_desc[] array.
(anderson@redhat.com)
- Prevent occasional 99% cpu usage waiting for the built-in less
command to complete. (anderson@redhat.com)
- Implemented support for the use of "kerntypes" debuginfo files that
are created by the LKCD "dwarfextract" utility, as an alternative to
the use of the vmlinux file. This requires the use of the matching
System.map file, as in this example:
# crash kerntypes System.map [vmcore]
This capability was written by Cliff Wickman of SGI, and he has
generously offered to maintain its functionality. (cpw@sgi.com)
- Fixes, code improvement and cleanup for "crash -h [command]".
(hugh@mimosa.com)
- The output of command data exceeding a terminal page-size has been
traditionally fed by default to "/usr/bin/less -E -X" with a prompt;
if the /usr/bin/less command was not available on the host system,
output would be fed to "/bin/more" instead. Scrolling can be turned
off with "set scroll off" or the built-in alias "sf", and back on
with "set scroll on" or the built-in alias "sn". This release
allows the user to specify an alternative scrolling program by
creating a CRASHPAGER environment variable, which will be used by default
if it exists. Also, the "set scroll [arg]" internal variable setting
command, which until now accepted "on" and "off" as arguments, now
accepts "less", "more" and "CRASHPAGER" as alternative arguments,
either at runtime or in .crashrc files. In addition, new crash command
line arguments have been added to override the default and/or
.crashrc settings: --more, --less, and --CRASHPAGER. Lastly, the
output of the "crash -h [command]" will also use the relevant scroll
command selection. (anderson@redhat.com)
- Updated crash(8) man page. (hugh@mimosa.com, anderson@redhat.com)
- 4.0-4.3 to 4.0-4.4 incremental patch
(7/20/07)
4.0-4.3 - Tentatively scheduled as the baseline version for RHEL4.6 and RHEL5.1
crash utility errata releases:
4.0-4.3.0 - RHEL4.6 errata version
4.0-4.3.1 - RHEL5.1 errata version
- Fix for "kmem -f" command on 2.6.17 and later CONFIG_DISCONTIGMEM
kernels. Without the patch, the command would fail with the error
message "kmem: cannot determine zone mem_map: TBD".
(troy.heber@hp.com)
- Fix for segmentation violation when using the wrong vmlinux file
command line argument on a live system on either the x86_64 or
ia64 architectures. If attempted with this version, the normal
"WARNING: vmlinux and /proc/version do not match!" message will
be followed by an additional warning message that displays the
Linux version number from /proc/version, and then the final message:
"crash: please use the vmlinux file for that kernel version, or
try using the System.map for that kernel version as an additional
argument." (anderson@redhat.com)
- For all 4 types of input-file processing:
1) $HOME/.crashrc
2) ./.crashrc
3) "crash -i input"
4) session-time "< input"
If a command in the input file encounters a FATAL error, the
remainder of the commands will still be executed. Until now, if any
command in the input file caused a FATAL error, the processing
of the remainder of the commands would be aborted.
(anderson@redhat.com)
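The "keep going after a FATAL error" behavior can be sketched with setjmp()/longjmp(), a common C pattern for recovering from a per-command fatal error and resuming with the next command. This is a minimal, hypothetical sketch: run_input_commands(), run_command(), fatal_error() and the command strings are illustrative names and do not reproduce crash's actual internals.

```c
#include <setjmp.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical recovery point, re-armed before each input-file command. */
static jmp_buf command_restart;

/* Hypothetical FATAL error handler: report, then unwind to the restart point. */
static void fatal_error(const char *msg)
{
    fprintf(stderr, "FATAL: %s\n", msg);
    longjmp(command_restart, 1);
}

/* Hypothetical command dispatcher: any command named "bad" fails fatally. */
static void run_command(const char *cmd)
{
    if (strcmp(cmd, "bad") == 0)
        fatal_error(cmd);
    printf("ran: %s\n", cmd);
}

/*
 * Execute every command read from an input file. A FATAL error aborts only
 * the command that raised it; processing resumes with the next command.
 * Returns the number of commands that failed.
 */
int run_input_commands(const char *const cmds[], int ncmds)
{
    /* volatile as a precaution: automatic variables changed between
     * setjmp() and longjmp() are otherwise indeterminate after the jump. */
    volatile int failures = 0;

    for (volatile int i = 0; i < ncmds; i++) {
        if (setjmp(command_restart) != 0) {
            failures++;     /* reached via longjmp() from fatal_error() */
            continue;       /* skip the failed command and keep going */
        }
        run_command(cmds[i]);
    }
    return failures;
}
```

With the previous behavior, the loop would instead have returned on the first FATAL error, dropping the remaining commands.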
- 4.0-4.2 to 4.0-4.3 incremental patch
(6/22/07)
4.0-4.2 - Fix for support of 2.6.22 kernels, which have changed the name
of the task_struct's "thread_info" member to the "stack" member.
This would cause the crash session to fail during initialization.
(troy.heber@hp.com, anderson@redhat.com)
- Fix to account for the number of pgdata nodes being less than the
number of cpus. Without the patch, the crash session would fail
during initialization with the error message: "crash: numnodes out
of sync with pgdat_list?" (sharyath@in.ibm.com)
- Implemented support for ia64 dom0/HV kdump dumpfiles, taken
either via the traditional kdump process, or simulated via the
Fujitsu "sadump" facility. (oda@valinux.co.jp)
- Created a "--no_panic" command line option to avoid the panic-task
search during initialization. (anderson@redhat.com)
- Implemented a new "ps -r" option to display resource limits (ulimits):
crash> ps -r 20618
PID: 20618 TASK: 1003cb82030 CPU: 1 COMMAND: "bash"
RLIMIT       CURRENT       MAXIMUM
CPU          (unlimited)   (unlimited)
FSIZE        (unlimited)   (unlimited)
DATA         (unlimited)   (unlimited)
STACK        10485760      (unlimited)
CORE         0             (unlimited)
RSS          (unlimited)   (unlimited)
NPROC        8180          8180
NOFILE       1024          1024
MEMLOCK      32768         32768
AS           (unlimited)   (unlimited)
LOCKS        (unlimited)   (unlimited)
SIGPENDING   1024          1024
MSGQUEUE     819200        819200
(anderson@redhat.com)
- Implement support for the registration of CLEANUP extension commands
that do not show up in help menu, but get called by restore_sanity().
Extension modules may also register HIDDEN_COMMAND functions; and the
"help -e" debug output has been enhanced. (anderson@redhat.com)
- Implemented a new symbol_value_module() primitive, primarily for use
by extension modules to quickly access the address of a module symbol
in cases where a name-clash may exist between the base kernel and/or
other modules. (anderson@redhat.com)
- The crash-4.0-4.2.src.rpm package will create an additional package
named crash-devel-4.0-4.2.i386.rpm, which is for use by extension
modules. The -devel package installs the top-level "defs.h" file in
"/usr/include/crash/defs.h". (anderson@redhat.com)
- 4.0-4.1 to 4.0-4.2 incremental patch
(6/04/07)
4.0-4.1 - Implemented dependable backtraces for the x86_64 architecture. (!!!)
This feature builds upon the current "low_budget" backtrace function,
and also required the fix for the BUG()/ud2a disassembly problem
addressed in 4.0-3.22. It does not require kernel unwind support,
but rather it calculates function framesizes by disassembling the
code from the beginning of the function to the point where it calls
the next function, parsing for add or sub instructions on the rsp,
and for push and pop instructions, thereby determining the framesize
of the function at the point of the call. This is similar to what is
done for x86, but requires far less hackery. You will notice a slight
hitch the first time a "bt" is done on a task, but for each text
return address in any backtrace, its framesize is cached for all
subsequent instances. It also accounts for backtrace text return
addresses from the .text.lock section, by appending "(via function)"
to the end of the frame line. Also, because it layers on top of the
current backtrace code, it does not compromise the capability of
switching between the process, IRQ, and exception stacks. That all
being said, 100% accuracy cannot be guaranteed. But for the ~30
sample dumpfiles I keep around for x86_64 testing, I cannot find any
obviously invalid backtraces. However, if there is any doubt, the
"bt -o" option will perform backtraces using the "old" manner; and
"bt -O" will force the old manner to always be used. Of course the
"bt -t" and "bt -T" options are still available. It's interesting to
redirect the output of "foreach bt" to a file using this version, and
then compare it with the output from an older version.
(anderson@redhat.com)
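The framesize calculation described above can be illustrated with a simplified sketch. It assumes the function's instructions have already been disassembled into a hypothetical struct insn array (the disassembly step itself is not shown), counts 8-byte push/pop operations and immediate add/sub adjustments of %rsp up to the call site, and caches the result per text return address in a small direct-mapped table. All names are illustrative, not crash's own.

```c
#include <stddef.h>

/* Hypothetical, already-decoded view of one x86_64 instruction. */
enum insn_kind { INSN_PUSH, INSN_POP, INSN_SUB_RSP, INSN_ADD_RSP, INSN_OTHER };

struct insn {
    enum insn_kind kind;
    long imm;               /* immediate for "sub $imm,%rsp" / "add $imm,%rsp" */
};

/*
 * Return how many bytes the function has moved %rsp below its entry value
 * by the time it reaches instruction index call_idx (the call site).
 * Each 64-bit push or pop moves %rsp by 8 bytes.
 */
long framesize_at_call(const struct insn *insns, size_t call_idx)
{
    long size = 0;

    for (size_t i = 0; i < call_idx; i++) {
        switch (insns[i].kind) {
        case INSN_PUSH:    size += 8;            break;
        case INSN_POP:     size -= 8;            break;
        case INSN_SUB_RSP: size += insns[i].imm; break;
        case INSN_ADD_RSP: size -= insns[i].imm; break;
        default:                                 break;
        }
    }
    return size;
}

/* Hypothetical direct-mapped cache: one framesize per text return address. */
#define FS_CACHE_SIZE 256

struct fs_cache_entry {
    unsigned long retaddr;   /* 0 means "empty slot" */
    long framesize;
};

static struct fs_cache_entry fs_cache[FS_CACHE_SIZE];

/* Return 1 and store the cached framesize in *fs on a hit, else 0. */
int fs_cache_lookup(unsigned long retaddr, long *fs)
{
    const struct fs_cache_entry *e = &fs_cache[retaddr % FS_CACHE_SIZE];

    if (retaddr != 0 && e->retaddr == retaddr) {
        *fs = e->framesize;
        return 1;
    }
    return 0;
}

/* Record (or overwrite) the framesize computed for retaddr. */
void fs_cache_store(unsigned long retaddr, long fs)
{
    struct fs_cache_entry *e = &fs_cache[retaddr % FS_CACHE_SIZE];

    e->retaddr = retaddr;
    e->framesize = fs;
}
```

For a prologue of two pushes followed by "sub $0x18,%rsp", the framesize at the following call is 8 + 8 + 0x18 = 40 bytes. Caching by return address is what limits the disassembly cost to the first "bt" that encounters a given return address.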
- Fix for s390 and s390x backtrace commands to recognize the kernel
structure name change from "runqueue" to "rq".
(holzheu@linux.vnet.ibm.com)
- Merged fourth round of "xencrash" patches, which allows a crash
session to alternatively be brought up against the xen-syms
binary instead of a vmlinux kernel. This patch enhances the
"doms" command display contents, and adds support to access the
ia64 frame table virtual address space so that the page_info table
can be accessed. (oda@valinux.co.jp)
- 4.0-3.22 to 4.0-4.1 incremental patch
(4/27/07)
4.0-3.22 - In kernel version 2.6.20 a "__bug_table" section has been added
to the kernel for x86 and x86_64, which contains the encoding for
the filename and line number information associated with each
instance of a kernel BUG(). Prior to that, x86 and x86_64 kernels
may have contained the filename/line-number encoding in the bytes
following the BUG()'s "ud2a" instruction. When disassembled, the
output would display a series of nonsensical instructions, or perhaps
one or more "(bad)" instruction lines, before eventually getting
back in sync with the actual instruction stream. Whether the
encoded bytes were included depends upon the kernel version,
whether CONFIG_DEBUG_BUGVERBOSE was configured, and whether an
"#if 1" surrounding the BUG() definition was manually changed.
This version of crash determines whether the encoded bytes exist,
and if so, the embedded gdb disassembler has been modified to
skip over those bytes, resulting in correct "dis" command output.
If necessary, a "dis -b" option has been added to override the
pre-calculated encoded byte count value. (anderson@redhat.com)
- Fix for the x86 backtrace code to also recognize the encoded
filename and line number information potentially following
"ud2a" instructions generated by kernel BUG() calls. In order
to determine the framesize of a function, the backtrace code
does its own text disassembly to count instances of push, pop,
and stack register increments/decrements. Without this patch,
the framesize calculation may either be too small or too large,
depending upon the contents of the encoded data following the
BUG()'s ud2a instruction. Therefore, it is possible that one or
more bogus frames are selected and displayed, and/or one or more
legitimate frames are skipped over. For example, when it affected
the framesize calculation of schedule(), backtraces of all non-active
tasks ending up in schedule() would be invalid. Here's an example in
which the schedule() framesize was miscalculated:
PID: 1292 TASK: ed78a000 CPU: 0 COMMAND: "setroubleshootd"
#0 [c07fdba8] schedule at c05f370e
#1 [c07fdcb4] __journal_file_buffer at ee05126d
#2 [c07fdcd8] __journal_file_buffer at ee05126d
#3 [c07fdd08] ext3_mark_iloc_dirty at ee08837d
#4 [c07fdd38] journal_dirty_metadata at ee052a13
#5 [c07fdd80] __find_get_block at c0463f59
#6 [c07fddac] __find_get_block at c0463f59
#7 [c07fddf0] find_get_page at c0444294
#8 [c07fddfc] filemap_nopage at c0446cf5
#9 [c07fde6c] find_extend_vma at c0454132
#10 [c07fde7c] get_futex_key at c042f9f6
#11 [c07fde94] futex_wake at c042fe2a
#12 [c07fdeb8] do_futex at c0430a19
#13 [c07fdfac] sys_poll at c047254b
#14 [c07fdfb8] system_call at c0404cf8
EAX: ffffffda EBX: 09f3da18 ECX: 00000002 EDX: 00000064
DS: 007b ESI: 00000064 ES: 007b EDI: 00342ff4
SS: 007b ESP: bfe76d04 EBP: bfe76d18
CS: 0073 EIP: 0094a402 ERR: 000000a8 EFLAGS: 00200246
With the fix, it looks like this:
PID: 1292 TASK: ed78a000 CPU: 0 COMMAND: "setroubleshootd"
#0 [c07fdba8] schedule at c05f370e
#1 [c07fdc0c] schedule_timeout at c05f3e7c
#2 [c07fdc30] do_sys_poll at c047243e
#3 [c07fdfac] sys_poll at c047254b
#4 [c07fdfb8] system_call at c0404cf8
EAX: ffffffda EBX: 09f3da18 ECX: 00000002 EDX: 00000064
DS: 007b ESI: 00000064 ES: 007b EDI: 00342ff4
SS: 007b ESP: bfe76d04 EBP: bfe76d18
CS: 0073 EIP: 0094a402 ERR: 000000a8 EFLAGS: 00200246
In the example above, the schedule() framesize was miscalculated
because the post-ud2a text contained the filename pointer address
c060fe0b, and the "60" was decoded as a "pusha" instruction; that
occurred twice, each time incrementing the framesize by 32 bytes.
(anderson@redhat.com)
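The arithmetic behind this miscalculation can be checked in a few lines of C. On little-endian x86, the 32-bit filename pointer c060fe0b is stored as the bytes 0b fe 60 c0, so a 0x60 byte appears in the text stream; 0x60 is the opcode of "pusha", which in 32-bit mode pushes eight 32-bit general registers, or 32 bytes. The sketch below only reproduces that arithmetic with hypothetical helper names; it is not crash's disassembly code.

```c
#include <stddef.h>
#include <stdint.h>

#define OPCODE_PUSHA  0x60      /* x86 "pusha": pushes 8 general registers */
#define PUSHA_BYTES   (8 * 4)   /* 8 registers x 4 bytes in 32-bit mode */

/* Lay out a 32-bit value in x86 (little-endian) memory order,
 * independent of the host's own byte order. */
void le32_bytes(uint32_t val, uint8_t out[4])
{
    for (int i = 0; i < 4; i++)
        out[i] = (uint8_t)(val >> (8 * i));
}

/* Index of the first byte equal to OPCODE_PUSHA, or -1 if none. */
int find_pusha_byte(const uint8_t *bytes, size_t len)
{
    for (size_t i = 0; i < len; i++)
        if (bytes[i] == OPCODE_PUSHA)
            return (int)i;
    return -1;
}

/* Framesize overestimate caused by n spurious "pusha" decodes. */
long spurious_pusha_bytes(int n)
{
    return (long)n * PUSHA_BYTES;
}
```

With the two occurrences described above, the schedule() framesize was overstated by 2 x 32 = 64 bytes.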
- Added preparations for an upcoming version update to kdump's
associated makedumpfile utility, which will return an error if a
read attempt of a page that has been explicitly excluded is made.
Until now, a zero-filled page was returned. To maintain the
current behavior of returning a zero-filled page when accessing
an excluded page, three options are available:
1) use the "--zero_excluded" crash command line option.
2) during runtime, enter "set zero_excluded on".
3) enter "set zero_excluded on" in your .crashrc file.
(anderson@redhat.com, oomichi@mxs.nes.nec.co.jp, bob.montgomery@hp.com)
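The two read behaviors described above can be sketched as follows. This is a minimal illustration under stated assumptions: the per-page exclusion array, the flat page-data buffer, the 4096-byte PAGE_SIZE, read_dump_page() and the READ_OK/READ_ERROR codes are all hypothetical, not the actual crash or makedumpfile interfaces. Only the zero_excluded setting corresponds to the option named above.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define PAGE_SIZE 4096          /* assumed page size for this sketch */

enum read_status { READ_OK = 0, READ_ERROR = -1 };

/* Session setting toggled by --zero_excluded or "set zero_excluded on". */
static bool zero_excluded = false;

/*
 * Read page pfn from a filtered dumpfile into buf. excluded[pfn] marks
 * pages removed by the filtering utility; data holds the contents of all
 * pages back to back (PAGE_SIZE bytes each).
 */
enum read_status read_dump_page(const bool *excluded, const unsigned char *data,
                                size_t pfn, unsigned char *buf)
{
    if (excluded[pfn]) {
        if (!zero_excluded)
            return READ_ERROR;          /* new default: fail the read */
        memset(buf, 0, PAGE_SIZE);      /* previous behavior: zero-filled page */
        return READ_OK;
    }
    memcpy(buf, data + pfn * PAGE_SIZE, PAGE_SIZE);
    return READ_OK;
}
```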
- Implemented "help -n" debug output function for compressed diskdump
and compressed kdump dumpfiles. As is done for the other dumpfile
formats, the core file's header information along with any other
run-time dumpfile data is displayed. (anderson@redhat.com)
- If the page-exclusion "dump_level" of a compressed diskdump, a
compressed kdump, or an ELF diskdump dumpfile exists and can be
determined, its value and bitmask translation will be displayed as
part of the "help -n" dumpfile debug output. Also, as has been done
with partial ELF diskdumps, if a compressed diskdump or compressed
kdump can be confirmed as a partial dump, the "[PARTIAL DUMP]"
indicator will follow the dumpfile name during initialization and by
the "sys" command. (anderson@redhat.com, oomichi@mxs.nes.nec.co.jp,
indou.takao@jp.fujitsu.com, akiyama.nobuyuk@jp.fujitsu.com)
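A dump_level bitmask translation can be sketched as a simple table walk. The bit meanings below follow makedumpfile's documented dump_level values (1: zero pages, 2: cache pages without private data, 4: cache pages with private data, 8: user process data pages, 16: free pages). The label strings and translate_dump_level() are illustrative and are not necessarily what "help -n" prints.

```c
#include <stdio.h>
#include <string.h>

/* makedumpfile dump_level bits with illustrative labels. */
static const struct {
    unsigned int bit;
    const char *label;
} dump_level_bits[] = {
    { 1,  "ZERO_PAGES"    },
    { 2,  "CACHE_PAGES"   },   /* cache pages without private data */
    { 4,  "CACHE_PRIVATE" },   /* cache pages with private data */
    { 8,  "USER_DATA"     },
    { 16, "FREE_PAGES"    },
};

/*
 * Write a "|"-separated list of the page types excluded by dump_level into
 * buf (bufsize bytes). Unknown bits are ignored; 0 yields "(none)".
 */
void translate_dump_level(unsigned int dump_level, char *buf, size_t bufsize)
{
    size_t n = sizeof(dump_level_bits) / sizeof(dump_level_bits[0]);

    buf[0] = '\0';
    for (size_t i = 0; i < n; i++) {
        if (!(dump_level & dump_level_bits[i].bit))
            continue;
        if (buf[0] != '\0')
            strncat(buf, "|", bufsize - strlen(buf) - 1);
        strncat(buf, dump_level_bits[i].label, bufsize - strlen(buf) - 1);
    }
    if (buf[0] == '\0')
        snprintf(buf, bufsize, "(none)");
}
```

For example, a dump_level of 17 translates to "ZERO_PAGES|FREE_PAGES". Any nonzero dump_level means some pages were filtered out, which is what makes the dumpfile a partial dump.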
- Support for xendumps of fully-virtualized x86_64 relocatable
kernels. Without the patch, the physical base address was not
being determined, and the session would fail during initialization
with the error message: "crash: vmlinux and core do not match!"
(anderson@redhat.com)
- Fix for 4.0-3.21 "BOOKE" ppc.c patch, which failed to compile.
(antipov@ru.mvista.com)
- 4.0-3.21 to 4.0-3.22 incremental patch
(04/10/07)
4.0-3.21 - Introduced support for upstream xensource ELF format dumpfiles,
which will replace the current xendump format in xen 3.0.5. The
new xen format uses ELF in a non-standard manner such that memory
contents are defined in section headers instead of the traditional
manner of using program headers. Testing has been completed on
paravirtualized x86, x86 PAE, x86_64 and ia64 dumpfiles. Fully-
virtualized dumpfiles have not been tested. (anderson@redhat.com)
- A number of "xencrash" (where the session is run against a xen-syms
binary) fixes have been applied:
1) "bt" did not switch from the ia64 MCA stack to the vcpu stack.
2) "bt" caused an infinite loop if ar_bspstore contained an illegal
value.
3) "bt" showed an unnecessary unwind warning message. (ia64)
4) "man log" caused crash to fail with a segmentation violation.
5) "man log" did not have an example.
(oda@valinux.co.jp)
- Fix for "vtop" on x86 PAE kernels, which could abort upon reaching
the PTE translation section, showing the error message: "vtop:
cannot determine the swap location". (anderson@redhat.com)
- Fix for "vm -p" or "vtop" on 2.6 x86 PAE kernels, which could show
incorrect swap offsets, because the swap type/offset encoding was
moved to the high word of the 64-bit PTE. (anderson@redhat.com)
- Fix for "vm -p" on x86_64 kernels, which would show "(not mapped)"
instead of the swap location when a PTE referenced a swap location.
(anderson@redhat.com)
- In current 2.6 kernels, it is now possible to determine whether a
ppc processor is a BOOKE processor. BOOKE remains the default in
crash; if the processor is confirmed not to be BOOKE, page table
translation is done differently. (antipov@ru.mvista.com)
- Fix for live system analysis of Ubuntu kernels due to a mismatch
between /proc/version and the linux_banner string. This was due
to an appendage to the linux_banner string in Ubuntu kernels.
(asid@hp.com)
- Fix for 2.6.21 kernels that fail during initialization with the
message: "crash: invalid (optional) structure member offsets:
zone_struct_free_pages or zone_free_pages". This was due to the
removal of the zone struct's "free_pages" member; instead the
zone struct's "vm_stat[NR_FREE_PAGES]" value is used.
(anderson@redhat.com)
- 4.0-3.20 to 4.0-3.21 incremental patch
(03/16/07)
4.0-3.20 - Merged third round of "xencrash" patches, which allows a crash
session to alternatively be brought up against the xen-syms
binary instead of a vmlinux kernel. This update introduces
support for ia64. (oda@valinux.co.jp)
- Verified support of live system analysis of ia64 xen kernels, and
removed unnecessary EFI memory verification warning message during
their initialization. (anderson@redhat.com)
- Added gdb's "shell" command to the prohibited gdb command list, and
updated the "help output" page to describe shell escape usage.
(anderson@redhat.com)
- Fix for the x86 "bt" command for the 2.6.20 kernel, which has added
the "xgs" field to the pt_regs structure. Without this patch, the
exception frame dump in "bt" would show invalid contents for several
registers; the fix also shows the GS register contents.
(anderson@redhat.com)
- Fix for the "mount" command for the 2.6.20 kernel to recognize the
new "nsproxy" field in the task_struct and the contents of the
nsproxy and mnt_namespace structures, in order to find the root
mount namespace. Without the patch, the command would fail with:
"mount: invalid kernel virtual address: 69 type: first list entry".
(anderson@redhat.com)
- Fix for the "files" command for the 2.6.20 kernel to handle the
removal of the fdtable "max_fdset" member. Without the patch, the
command would fail with: "files: invalid structure member offset:
fdtable_max_fdset". (anderson@redhat.com)
- Fix for the "net -[sS]" command options for the 2.6.20 kernel to
handle the removal of the fdtable "max_fdset" member. Without the
patch, the command would fail with: "net: invalid structure member
offset: fdtable_max_fdset". (anderson@redhat.com)
- Fix for the "vm" command for the 2.6.20 kernel to handle the removal
of the file structure's "f_dentry" member, and its placement inside
the embedded "path" structure. Without the patch the command would
fail with: "vm: invalid structure member offset: file_f_dentry".
(anderson@redhat.com)
- Fix for the "swap" command for the 2.6.20 kernel to handle the removal
of the file structure's "f_vfsmnt" member, and its placement inside
the embedded "path" structure. Without the patch the command would
fail with: "swap: invalid structure member offset: file_f_vfsmnt".
(anderson@redhat.com)
- 4.0-3.19 to 4.0-3.20 incremental patch
(02/21/07)
4.0-3.19 - Fix for support of paravirtual x86 xendumps that were:
1) created on host machines with greater than 4GB of memory, and
2) the active guest task at crash-time had been assigned a page
directory page (cr3) with a machine address greater than 4GB.
If both of the above apply, the crash session would fail with one of
two error messages, either "crash: cannot read/find cr3 page", or
"crash: cannot create xen pfn-to-mfn mapping". (anderson@redhat.com)
- Fix for the "kmem -p [page-struct-address]" command construct, which
would cause a segmentation violation when run on SPARSEMEM kernels.
(anderson@redhat.com)
- Added a new "struct -u" option, which indicates that the subsequent
address argument is a user virtual address in the current context.
This option could be used, for example, if a known kernel data
structure exists at a user virtual address in the current context,
or if the debuginfo data of a user program were loaded into the
crash session via the gdb "add-symbol-file" command.
(anderson@redhat.com)
- Added new "rd -f" and "struct -f" options, which indicate that the
subsequent address argument is a dumpfile file offset. These options
could be used, for example, to print a known kernel data structure
that exists in the dumpfile header, or to simply dump data directly
from the dumpfile. (anderson@redhat.com)
- Cosmetic fix to prevent double-printing of "kmem -p" and "kmem -v"
headers when those commands are passed multiple address arguments.
(anderson@redhat.com)
- 4.0-3.18 to 4.0-3.19 incremental patch
(02/07/07)
4.0-3.18 - Enhancement to the "mod" command to expand the number of section
arguments to the internal "add-symbol-file" command issued to gdb to
load the debug data for module objects. On most architectures, this
allows the usage of the command construct "p [module-symbol-name]" to
print out the module data structure in the same way that is done for
kernel proper data structure names. (castor.fu@3pardata.com)
- Two enhancements to significantly speed up the initialization of
crash sessions when running against multi-gigabyte xen kernels or
xendumps. The cache that maps mfns to phys_to_machine_mapping pages
has been changed from storing a single mfn per entry to storing a
contiguous range of mfns per entry.
This benefit is primarily seen during the "gathering module symbol
data" phase. The second change simply increases the size of the
pfn-to-xendump-page-offset cache. (anderson@redhat.com)
- Fix for a segmentation violation during the "gathering task table
data" phase of initialization if the thread_info structure of the
runqueue-advertised active task has been freed. This has only ever
been seen in a xendump created by "xm dump-core -L [guest-domain]".
(anderson@redhat.com)
- Cosmetic fix to prepend newlines to messages that happen to be
generated during any of the "please wait" segments of initialization.
(anderson@redhat.com)
- Addressed several compiler warnings when using -D_FORTIFY_SOURCE=2.
Some are in gdb code that is never exercised, and others were
legitimate but could only be triggered by impossible code paths;
however, one of them could result in runaway "help -t" output if
the kernel was built without IKCONFIG. (bwalle@suse.de)
- Fix for the s390x "bt -f" command option, which was displaying the
stack as a sequence of 32-bit words which were dumped "backwards",
i.e., at the wrong offset. (krader@us.ibm.com)
- 4.0-3.17 to 4.0-3.18 incremental patch
(02/01/07)
4.0-3.17 - Two fixes for "dev -p" command option:
1) The head entry of the PCI device list was being skipped.
2) For systems with no PCI devices, exit gracefully rather than
failing the command due to the use of an invalid virtual
address.
(rachita@in.ibm.com, anderson@redhat.com)
- Fix to recognize "linux_banner" symbol type change from 'R'
to 'r' in 2.6.20-rc2 kernels. Without the patch, the session
fails during initialization with the error message "WARNING:
invalid linux_banner pointer: 756e694c", and then "crash: vmlinux
and vmcore do not match!" (vgoyal@in.ibm.com)
- Fix to recognize "__per_cpu_start" and "__per_cpu_end" symbol
type change from 'A' to 'D' in relocatable kernels. Without
the patch, SMP kernels running on uniprocessor systems may fail
during initialization with the message "crash: cannot resolve
init_task_union". (sachinp@in.ibm.com)
- Fix for the xencrash "dumpinfo -t" command to properly cycle
through the ELF_timeval structures for each cpu.
(anderson@redhat.com)
- Fix for x86_64 backtraces that may end prematurely at either a
stale "schedule" or "schedule_timeout" reference when doing a
"bt" on an active task in a dumpfile. (anderson@redhat.com)
- Fix for a possible empty panic message in 2.6 kernels both during
initialization and when running the "sys" command, because of
the change of the kernel panic() string from "Kernel panic: " to
"Kernel panic -- not syncing: ". If the panic message was not
recognized in another manner, such as by an oops message, by a
kernel BUG message, or sysrq-generated crash, the "PANIC:" status
would be empty. (anderson@redhat.com)
(01/12/07)
4.0-3.16 - Recognize new XC_CORE_MAGIC_HVM xendump magic number, which in turn
introduces support for xendumps of fully-virtualized ia64 kernels.
(oda@valinux.co.jp)
- Recognize an INVALID_MFN marker in the indexed mfn list of a xendump,
and if found, fail the read attempt on the associated pfn.
(oda@valinux.co.jp, anderson@redhat.com)
(12/21/06)
4.0-3.15 - Introduced support for xendumps of fully-virtualized x86 kernels
taken while running on an x86 Xen host (32-bit on 32-bit host).
(anderson@redhat.com)
- Introduced support for xendumps of fully-virtualized x86 kernels
taken while running on an x86_64 Xen host (32-bit on 64-bit host).
(anderson@redhat.com)
- Introduced support for xendumps of fully-virtualized x86_64 kernels.
(anderson@redhat.com)
- Introduced support for xendumps of para-virtualized ia64 kernels.
It should be noted that currently the ia64 Xen kernel does not
lay down a switch_stack for the panic task, so only raw "bt -t"
backtraces can be done on the panic task. (anderson@redhat.com)
- Introduced support for "xm save" dumpfiles of para-virtualized ia64
kernels, which use a completely different format than that used for
x86 and x86_64. (anderson@redhat.com)
- Additional support for the current kexec/kdump patch for Xen:
1) Merged second round of "xencrash" patches, which allows a crash
session to be alternatively brought up against the xen-syms
binary instead of a vmlinux kernel. (oda@valinux.co.jp)
2) Using the xencrash feature above, the pfn_to_mfn_list_list value
of any guest domain that was running when the dom0 or hypervisor
crashed can be determined; that pfn value can in turn be used
as an argument to a new "--p2m_mfn [pfn]" crash command line
option. That will allow a crash session to be run against any
guest domain. Therefore, with a single dom0/hypervisor vmcore,
the following types of crash sessions may be initiated:
$ crash vmlinux-dom0 vmcore
$ crash xen-syms vmcore
$ crash --p2m_mfn [pfn] vmlinux-guest-#1 vmcore
$ crash --p2m_mfn [pfn] vmlinux-guest-#2 vmcore
$ ...
(anderson@redhat.com)
3) Fixed "help -n" debug output to properly display the contents
of the new XEN_ELFNOTE_CRASH_INFO and XEN_ELFNOTE_CRASH_REGS
ELF note types. (anderson@redhat.com)
- Turn off the LKCD dumpfile-access "spinner" when "crash -s" is used.
(castor.fu@3pardata.com)
- Update to MODULES_IN_CWD code segment so that it will work on 2.6
kernels where modules end with ".ko". This requires that kernel.c
is compiled with -DMODULES_IN_CWD. (castor.fu@3pardata.com)
- Support LKCD "map" files in lieu of standard System.map files.
Without this patch, crash would fail with an error message of the
sort: "crash: map.4: not a supported file format". (bwalle@suse.de)
- The ia64 PR_UNALIGN_NOPRINT and PR_FPEMU_NOPRINT prctl() settings are
now issued earlier during initialization, in order to prevent "unaligned access"
messages when accessing ELF header contents. (anderson@redhat.com)
- The dlopen() call used by the "extensions" facility has been changed
to use the RTLD_GLOBAL flag, so that symbols from an extension object
will be visible to subsequently loaded modules. (asid@hp.com)
(12/20/06)
4.0-3.14 - Tentatively scheduled for RHEL5-GA
- Added support for Magnus Damm's latest kexec/kdump patch for Xen.
The ELF header of the vmcore, which is a full memory dump of the
dom0/hypervisor combination, contains a XEN_ELFNOTE_CRASH_INFO note
that contains the pfn_to_mfn_list_list value for dom0, allowing
pfn-to-mfn translations to be made for crash analysis of the dom0
linux kernel. (anderson@redhat.com)
- Added support for recognizing the zero-fill segments in ELF vmcore
files created by the makedumpfile command from kdump /proc/vmcore files.
Without this patch, ELF vmcore files generated by makedumpfile could
only be used by gdb. (anderson@redhat.com)
- Updated the 4.0-3.4 patch that addressed the bogus kernel-/proc/version
mismatch initialization failures using recent s390x vmlinux files that
contain an ASCII character just preceding the Linux version string.
That patch fixed the problem when the vmlinux file name was placed on
the crash command line; this version also fixes it when "crash" is
entered alone on the command line, and it has to search for the vmlinux
file. (anderson@redhat.com)
(12/01/06)
4.0-3.13 - Adapted the "xencrash-0.2" patch described here:
https://www.redhat.com/archives/crash-utility/2006-November/msg00036.html
This functionality consists of three inter-dependent parts, all of
which are still under development:
1) the kexec-tools user package
2) the kdump kernel patch for Xen
3) the crash utility
The end result will be a single crash binary that can be used with
either the Xen dom0 vmlinux kernel, or with the xen-syms hypervisor binary,
with the common vmcore created when either of those two entities crash.
(oda@valinux.co.jp, anderson@redhat.com)
- Fixed the display of the system memory size, both during
initialization and by the "sys" command, when memory nodes have
holes. Without this patch, more memory than what is installed may
be displayed. (anderson@redhat.com)
(11/27/06)
4.0-3.12 - For 2.6.14 and later ia64 kdumps, taken either as a result of the
INIT switch, or when an MCA exception has occurred, several problems
needed to be addressed. First, the "pseudo-task" that handles the
kdump operation due to an INIT or MCA was not being recognized as
the "panic" task. Secondly, the backtraces of the per-cpu INIT
or MCA handling pseudo-tasks only went back as far as their entry
onto their own per-cpu stacks, and did not show the backtrace of
the task that was running on that cpu when the INIT or MCA event
occurred. This version recognizes the pseudo-task that handles the
kdump operation; and for each cpu, the active tasks' backtraces now
also show a transition back to the task that was running on that cpu
when the INIT or MCA event occurred. (j-nomura@ce.jp.nec.com)
- To address the need to display per-cpu variables, the "p"
command has been modified to recognize "per_cpu__xxx" arguments
when the kernel is SMP, in order to prevent the attempt to display
the contents of a variable whose symbol value does not represent
the actual location of its data. In that case, the data type of
the per-cpu variable will be displayed, followed by the addresses
of each per-cpu instance. Given that information, a proper command
can be utilized in order to display the data. For example, to look
at the per-cpu buffer_head accounting for cpu 2:
crash> p per_cpu__bh_accounting
PER-CPU DATA TYPE:
struct bh_accounting per_cpu__bh_accounting;
PER-CPU ADDRESSES:
[0]: c5405a80
[1]: c540da80
[2]: c5415a80
[3]: c541da80
crash> bh_accounting c5415a80
struct bh_accounting {
nr = 434,
ratelimit = 2216
}
Note that "p" on the first command line above is optional, because
whenever a data variable is entered alone, crash will recognize it
as such, and pass it to the "p" command by default. I had thought
of putting this functionality into the "struct" command, but many
of the per-cpu variables are pointers, arrays, etc. So for the
non-structure cases, the "rd" command would be more appropriate,
or alternatively a cobbled-together gdb print command.
(anderson@redhat.com)
- A consolidated cleanup and minor fixes patch has been applied to
the experimental x86_64 dwarf CFI unwind facility.
(rachita@in.ibm.com)
- Also related to the experimental x86_64 dwarf CFI unwind facility,
fixed a problem where if a "set unwind on" was done, and followed
by a subsequent "set unwind off", then the "bt" output could either
cause a segmentation violation, or display backtrace data that was
different from the original. (anderson@redhat.com)
(11/15/06)
4.0-3.11 - Tentatively scheduled for RHEL5-B2
- Updated fix for 2.6.18 x86_64 kernels to address the change in
the IRQ-stack-to-process stack linkage; the fix introduced in
4.0-3.9 could fail depending upon the crash session's display
window size, due to a behind-the-scenes gdb line-wrap of text
disassembly. (anderson@redhat.com)
(11/09/06)
4.0-3.10 - [Red Hat internal -- identical to 4.0-3.9]
4.0-3.9 - Tentatively scheduled as errata version for RHEL4-U5.
- The current 2.6.18 x86_64 kernel has changed the IRQ-stack-to-
process-stack linkage, where until now the link value was a pointer
to the exception frame on the process stack, but has been changed
to point to a location on the process stack above the exception
frame. Because of that, after displaying the trace data from the
IRQ stack, "bt" would then display an invalid exception frame,
which was reported as a "possibly bogus exception frame".
(anderson@redhat.com)
- Also in x86_64 kernels, fix for the "bt" command. When the backtrace
started on the NMI exception stack, it was displaying the correct
exception frame data, but was erroneously reporting that it was a
"possibly bogus exception frame". (anderson@redhat.com)
- And again in x86_64 kernels, fix for the "bt" command when making
the transition from the IRQ stack back to the process stack, in cases
where the IRQ stack entry was made via the relatively new "call_softirq"
entry point. In that case, there is no exception frame on the
process stack, because it's essentially just a cross-stack call
from do_softirq(). However, a bogus exception frame was being
displayed, along with a "possibly bogus exception frame" message;
and if the RIP value in the truly bogus exception frame happened
to fall in the user virtual address range, the remainder of the
process stack trace was not displayed at all. (anderson@redhat.com)
- Fix for 2.6.18-era ia64 DISCONTIGMEM kernels, which would fail
during initialization with the error message: "crash: invalid
(optional) structure member offsets: pglist_data_node_next or
pglist_data_pgdat_next". (anderson@redhat.com)
- Adapted Olivier Daudel's nifty enhancement to the "struct" command,
which allows the single "struct.member" argument to optionally be
expressed in a "struct.member[,member,member]" format, in order to
display multiple members of a given structure. This also applies to
the "union" and "*" commands, as all three functions have now been
combined into one behind the scenes. Also fixed the display when a
minus count is applied, and given that the new format opened the door
to a number of entry errors, I also added error-catching/handling to
avoid the display of incorrect structure data.
(olivier.daudel@u-paris10.fr, anderson@redhat.com)
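The argument format accepted by this enhancement can be illustrated with a small parser that splits "struct.member[,member,member]" into a structure name and a list of member names, rejecting empty or oversized names as a simple form of the entry-error checking mentioned above. parse_struct_members(), struct member_request and MAX_MEMBERS are hypothetical names for this sketch, not crash's own code.

```c
#include <stddef.h>
#include <string.h>

#define MAX_MEMBERS 16

struct member_request {
    char structname[64];
    char members[MAX_MEMBERS][64];
    int nmembers;
};

/*
 * Parse "struct.member[,member,...]" into req. A bare "struct" with no '.'
 * yields zero members (display the whole structure).
 * Returns 0 on success, -1 on malformed input.
 */
int parse_struct_members(const char *arg, struct member_request *req)
{
    const char *dot = strchr(arg, '.');
    size_t len = dot ? (size_t)(dot - arg) : strlen(arg);

    req->nmembers = 0;
    if (len == 0 || len >= sizeof(req->structname))
        return -1;                          /* missing or oversized struct name */
    memcpy(req->structname, arg, len);
    req->structname[len] = '\0';

    if (!dot)
        return 0;

    const char *p = dot + 1;
    for (;;) {
        const char *comma = strchr(p, ',');
        size_t mlen = comma ? (size_t)(comma - p) : strlen(p);

        if (mlen == 0 || mlen >= sizeof(req->members[0]) ||
            req->nmembers == MAX_MEMBERS)
            return -1;                      /* empty, oversized, or too many members */
        memcpy(req->members[req->nmembers], p, mlen);
        req->members[req->nmembers][mlen] = '\0';
        req->nmembers++;

        if (!comma)
            return 0;
        p = comma + 1;
    }
}
```

For example, an argument such as "task_struct.pid,comm" parses into the structure name "task_struct" with the members "pid" and "comm", while "task_struct.pid," is rejected because of its empty trailing member.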
- Fixed three sources of potential segmentation violations when using
the "bt" command when the experimental dwarf CFI unwind backtrace
facility was turned on. (anderson@redhat.com)
- Added a new machdep_init(POST_VM) call, which is currently only being
used by the x86_64 architecture; it calls init_unwind_table(), which
has to be done after vm_init() in order to access the unwind tables
of kernel modules. (anderson@redhat.com)
- Prevent ia64 "floating-point assist fault" and "unaligned access"
console messages by issuing PR_FPEMU_NOPRINT and PR_UNALIGN_NOPRINT
prctl() settings. (anderson@redhat.com)
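The prctl() settings mentioned above can be issued as in this minimal sketch. PR_SET_FPEMU/PR_FPEMU_NOPRINT and PR_SET_UNALIGN/PR_UNALIGN_NOPRINT are real Linux prctl options, but the wrapper quiet_fault_messages() is hypothetical. On architectures that do not implement these options, the calls simply fail with EINVAL.

```c
#include <assert.h>
#include <sys/prctl.h>

/*
 * Sketch: ask the kernel not to log console messages for floating-point
 * assist faults and unaligned accesses taken by this process.  These
 * options are only implemented on some architectures (such as ia64);
 * elsewhere prctl() fails, which is tolerated here rather than fatal.
 * Returns the number of settings the kernel accepted (0, 1 or 2).
 */
static int quiet_fault_messages(void)
{
    int accepted = 0;

    if (prctl(PR_SET_FPEMU, (unsigned long)PR_FPEMU_NOPRINT, 0UL, 0UL, 0UL) == 0)
        accepted++;
    if (prctl(PR_SET_UNALIGN, (unsigned long)PR_UNALIGN_NOPRINT, 0UL, 0UL, 0UL) == 0)
        accepted++;

    return accepted;
}
```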
(11/02/06)
4.0-3.8 - Fix for the "irq" command when run on 2.6.17 and later kernels, which
replaced the hw_interrupt_type structure with the irq_chip structure.
Without the patch, the command would fail with the error message
"irq: invalid structure member offset: irq_desc_t_handler".
(rachita@in.ibm.com)
- Phased in the first stage of support for the use of dwarf CFI data to
produce accurate x86_64 back traces, and to eventually improve the
reliability of x86 back traces. The code is very much still under
development, and is not turned on as of yet; for x86_64 only, its
usage can be toggled on and off with the set command, by entering
"set unwind on" or "set unwind off". It will only work if dwarf CFI
information exists in the kernel memory, or if the vmlinux file
contains an .eh_frame section. Expect multiple iterations before
this feature is ready for prime-time.
(rachita@in.ibm.com, anderson@redhat.com)
- Prevent a stream of invalid "WARNING: possibly bogus exception frame"
messages during initialization when run against x86_64 xendump
dumpfiles created with the new "xm dump-core" facility.
(anderson@redhat.com)
- Fix for the "struct -o" option to print structure member offsets if
the member type is a function pointer. (anderson@redhat.com)
(10/20/06)
4.0-3.7 - Support for paravirtualized x86_64 RHEL4 Xen kernels, which require
the use of unique hardwired kernel VM addresses, as well as a new
user vtop function. Without the patch, crash would report several
read errors during invocation, and then eventually die with this
message: "crash: cannot access phys_to_machine_mapping page".
(anderson@redhat.com)
- Fix for accessing user space stack addresses in ia64 kernels with
3-level page tables. This was a regression introduced in 4.0-3.1,
and would cause the new "ps -a" option to fail with an error message
such as: "ps: cannot access user stack address: 60000fffffffbe28".
Also, if the user stack address was given as the argument to the
"vtop" command, it would indicate "(not mapped)".
(anderson@redhat.com)
- Implemented a new "sig -g" option, which breaks down the signal
information into a common per-thread group section, followed by
the signal information relevant to each task in the thread group.
Added the capability of using the option via "foreach sig -g".
(olivier.daudel@u-paris10.fr, anderson@redhat.com)
- Update to allow the entry of multiple "list -s struct.member"
arguments, in order to display multiple members from each structure.
Added the capability of entering a single "-s" option with multiple
members entered in a comma-separated list, i.e., using the option
format "-s struct.member1,member2,member3".
(olivier.daudel@u-paris10.fr, anderson@redhat.com)
- The refresh_hlist_task_table() and refresh_hlist_task_table_v2()
functions now recognize when the number of running tasks exceeds their
internal table size, and reallocate task space as required. Without
the patch, some tasks on a live system could become inaccessible if
the number of tasks increased (rather dramatically) from the time
that the crash session started. (anderson@redhat.com)
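The grow-on-demand behavior described above follows the usual realloc() pattern, sketched below. The task_table structure and task_table_add() function are hypothetical illustrations and not the layout used by refresh_hlist_task_table().

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical task table; not crash's internal structure. */
struct task_table {
    unsigned long *tasks;   /* task_struct addresses found so far */
    size_t count;           /* entries in use */
    size_t capacity;        /* entries allocated */
};

/*
 * Append a task address, doubling the table with realloc() when it is full
 * instead of silently dropping tasks that do not fit.
 * Returns 0 on success, -1 if memory could not be allocated (in which case
 * the existing table is left intact).
 */
static int task_table_add(struct task_table *tt, unsigned long task)
{
    if (tt->count == tt->capacity) {
        size_t newcap = tt->capacity ? tt->capacity * 2 : 64;
        unsigned long *p = realloc(tt->tasks, newcap * sizeof(*p));

        if (!p)
            return -1;
        tt->tasks = p;
        tt->capacity = newcap;
    }
    tt->tasks[tt->count++] = task;
    return 0;
}
```

Doubling keeps the number of realloc() calls logarithmic in the task count, which matters little for a one-time refresh but avoids repeated copying when a live system's task count jumps sharply.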
- Added a new hash queue tool called hq_entry_exists(). The function
may be helpful in an extension, or future patch, to query for the
existence of an entry in the current hash queue. (jmoyer@redhat.com)
(10/13/06)
4.0-3.6 - Workaround for pre-2.6.17 kernels whose vmlinux file does not
contain debug information for the "pid_hash" array. Without this
patch, the crash session would fail during initialization with the
error message: "crash: cannot determine pid_hash array dimensions".
This problem appears to be limited to kernels built with gcc
version 4.0.0, which had a known regression that omitted debug
information for uninitialized variables. (anderson@redhat.com)
(10/05/06)
4.0-3.5 - Implemented new "ps -a" option which, when available, displays the
complete command line and environment variables of selected, or all,
tasks. For example:
crash> ps -a automount
PID: 3948 TASK: f722ee30 CPU: 0 COMMAND: "automount"
ARG: /usr/sbin/automount --timeout=60 /net program /etc/auto.net
ENV: SELINUX_INIT=YES
CONSOLE=/dev/console
TERM=linux
INIT_VERSION=sysvinit-2.85
PATH=/sbin:/usr/sbin:/bin:/usr/bin
LC_MESSAGES=en_US
RUNLEVEL=3
runlevel=3
PWD=/
LANG=ja_JP.UTF-8
PREVLEVEL=N
previous=N
HOME=/
SHLVL=2
_=/usr/sbin/automount
Individual tasks may be selected in the same manner as always;
"ps -a" alone lists all tasks. (anderson@redhat.com)
- Implemented new "ps -g" option, which lists tasks by thread group,
for selected, or all, tasks. For example, to display the tasks
in the thread group containing task c20ab0b0:
crash> ps -g c20ab0b0
PID: 6425 TASK: f72f50b0 CPU: 0 COMMAND: "firefox-bin"
  PID: 6516   TASK: f71bf1b0  CPU: 0   COMMAND: "firefox-bin"
  PID: 6518   TASK: d394b930  CPU: 0   COMMAND: "firefox-bin"
  PID: 6520   TASK: c20aa030  CPU: 0   COMMAND: "firefox-bin"
  PID: 6523   TASK: c20ab0b0  CPU: 0   COMMAND: "firefox-bin"
  PID: 6614   TASK: f1f181b0  CPU: 0   COMMAND: "firefox-bin"
The thread group leader will be shown first, with the other threads
indented. Individual tasks may be selected in the same manner as
always; "ps -g" alone lists all thread groups. (anderson@redhat.com)
- Fix for "timer" display; although the timer_list entries for each cpu
are correct, the "TVEC_BASES[cpu]" output was displaying incorrect
addresses for each cpu's tvec_base_t structure. (anderson@redhat.com)
(10/02/06)
4.0-3.4 - Implemented support for x86_64 and ia64 compressed kdump dumpfiles
created by the makedumpfile command, which need to pass their
respective physical address load locations in a kdump-specific
dumpfile sub-header. (oomichi@mxs.nes.nec.co.jp)
- Fix for the "timer" command on 2.6.17 and later kernels. Without this
patch, the command would spew out error messages of the sort:
timer: invalid list entry: 0
timer: ignoring faulty timer list at index 0 of timer array
This was due to the kernel's tvec_bases data structures being moved
out of the per-cpu memory regions, and replaced with just per-cpu
pointers to the data. (anderson@redhat.com)
- Fix for ia64 machines whose kernel's text and static data region 5
segment is not loaded at physical address 64MB; live systems get
the physical load address from /proc/iomem, while kdump dumpfiles
contain the load address in the ELF header. Without this patch,
the crash session would fail during initialization with a "crash:
invalid kernel virtual address: [address] type: xtime" error message.
The physical address may still be forcibly set using the command line
option "--machdep phys_start=[address]". (anderson@redhat.com)
- Fix for an irrelevant "WARNING: invalid vm= option" message that
would be displayed when using the "--machdep phys_start=[address]"
option on an ia64 machine. (anderson@redhat.com)
- Updated the ppc64 page size determination from always using
getpagesize() on the host machine to symbolically determining
whether 64k page sizes are in use. (sachinp@in.ibm.com)
- Enhancement of the "sig" command to display the lists of private
and/or shared queued signals, if any. (olivier.daudel@u-paris10.fr)
- Adapted "mount [-n pid|task]" patch, which displays the mounted
filesystems with respect to the namespace of a given pid or task.
(olivier.daudel@u-paris10.fr)
- Fix for running crash without parameters on a live system that does
not have a "/usr/src" directory, which would result in a segmentation
violation. (holzheu@de.ibm.com)
- The /proc/version check against vmlinux "strings" output needed to be
made aware that some other character may be adjacent to the "L" in the
"Linux version..." string. This would lead to erroneous "vmlinux and
/proc/version do not match!" errors during initialization.
(holzheu@de.ibm.com)
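The tolerant version-string match described above can be illustrated with the sketch below. It is an assumption about the general approach and not crash's actual verification code: the banner is located with strstr() anywhere in a line of "strings" output, so a stray byte adjacent to the "L" no longer causes a mismatch. The helper name banner_matches() is hypothetical.

```c
#include <assert.h>
#include <string.h>

/*
 * Sketch: find the "Linux version " banner anywhere inside one line of
 * strings(1) output, even when other printable bytes precede the "L", and
 * compare it with the /proc/version text up to that text's first newline.
 * Returns 1 on a match, 0 otherwise.
 */
static int banner_matches(const char *strings_line, const char *proc_version)
{
    const char *banner = strstr(strings_line, "Linux version ");
    size_t len = strcspn(proc_version, "\n");   /* ignore trailing newline */

    return banner != NULL && len > 0 &&
           strncmp(banner, proc_version, len) == 0;
}
```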
- gdb-6.1.patch update for gdb-6.1/sim/ppc/debug.c to compile in SUSE
build environment. (olh@suse.de)
(9/19/06)
4.0-3.3 - Addressed a number of issues associated with CONFIG_SPARSEMEM
kernels, and with kernels that use updated methods for linking their
pglist_data structures and their pointers to mem_map arrays.
(anderson@redhat.com)
- Implemented "kmem -n" for CONFIG_SPARSEMEM kernels; in addition
to the pgdat- and zone-related data command output, it also
displays a list of the SPARSEMEM mem_sections. Here is an
example from an ia64:
crash> kmem -n
NODE SIZE PGLIST_DATA BOOTMEM_DATA NODE_ZONES
0 2359296 e000000008c00000 a000000100749b70 e000000008c00000
e000000008c02400
e000000008c04800
e000000008c06c00
MEM_MAP START_PADDR START_MAPNR
e0000001040a3f00 0 0
ZONE NAME SIZE MEM_MAP START_PADDR START_MAPNR
0 DMA 262144 e0000001040a3f00 0 0
1 DMA32 0 0 0 0
2 Normal 2097152 e0000001048a3f00 100000000 262144
3 HighMem 0 0 0 0
-------------------------------------------------------------------
NR SECTION CODED_MEM_MAP MEM_MAP PFN
0 e00000010409ff00 e0000001040a3f00 e0000001040a3f00 0
1 e00000010409ff08 e0000001040a3f00 e0000001044a3f00 65536
4 e00000010409ff20 e0000001038a3f00 e0000001048a3f00 262144
5 e00000010409ff28 e0000001038a3f00 e000000104ca3f00 327680
6 e00000010409ff30 e0000001038a3f00 e0000001050a3f00 393216
7 e00000010409ff38 e0000001038a3f00 e0000001054a3f00 458752
8 e00000010409ff40 e0000001038a3f00 e0000001058a3f00 524288
9 e00000010409ff48 e0000001038a3f00 e000000105ca3f00 589824
10 e00000010409ff50 e0000001038a3f00 e0000001060a3f00 655360
11 e00000010409ff58 e0000001038a3f00 e0000001064a3f00 720896
12 e00000010409ff60 e0000001038a3f00 e0000001068a3f00 786432
13 e00000010409ff68 e0000001038a3f00 e000000106ca3f00 851968
14 e00000010409ff70 e0000001038a3f00 e0000001070a3f00 917504
15 e00000010409ff78 e0000001038a3f00 e0000001074a3f00 983040
16 e00000010409ff80 e0000001038a3f00 e0000001078a3f00 1048576
17 e00000010409ff88 e0000001038a3f00 e000000107ca3f00 1114112
18 e00000010409ff90 e0000001038a3f00 e0000001080a3f00 1179648
19 e00000010409ff98 e0000001038a3f00 e0000001084a3f00 1245184
20 e00000010409ffa0 e0000001038a3f00 e0000001088a3f00 1310720
21 e00000010409ffa8 e0000001038a3f00 e000000108ca3f00 1376256
22 e00000010409ffb0 e0000001038a3f00 e0000001090a3f00 1441792
23 e00000010409ffb8 e0000001038a3f00 e0000001094a3f00 1507328
34 e0000001040a0010 e0000001010a3f00 e0000001098a3f00 2228224
35 e0000001040a0018 e0000001010a3f00 e000000109ca3f00 2293760
crash>
(anderson@redhat.com)
- Fix for "kmem -i" failure in CONFIG_SPARSEMEM kernels that would
typically fail with the error message: "kmem: invalid kernel virtual
address: 0 type: node_zones free_pages". (anderson@redhat.com)
- Fix for "kmem -f" failure in CONFIG_SPARSEMEM kernels that would
typically fail with the error message: "kmem: invalid kernel virtual
address: ab8 type: node_zones name". (anderson@redhat.com)
- Fix for "kmem -f" failure in 2.6.17 kernels (possibly earlier) that
would fail with the error message: "kmem: invalid structure member
offset: zone_zone_mem_map". (anderson@redhat.com)
- Fix for "kmem [address]" failure in 2.6.17 kernels (possibly earlier)
that would fail with the error message: "kmem: invalid structure
member offset: zone_zone_mem_map". (anderson@redhat.com)
- Fix for "kmem -i" that resulted in a bogus "CACHED" page count
value. (anderson@redhat.com)
- As a result of the last "kmem -i" fix, I've added a new "kmem -V"
option that dumps the kernel's new vm_stat[] array contents by
their enum values:
crash> kmem -V
NR_ANON_PAGES: 38656
NR_FILE_MAPPED: 3116
NR_FILE_PAGES: 141106
NR_SLAB: 58605
NR_PAGETABLE: 1059
NR_FILE_DIRTY: 7
NR_WRITEBACK: 0
NR_UNSTABLE_NFS: 0
NR_BOUNCE: 0
NUMA_HIT: 86475467
NUMA_MISS: 0
NUMA_FOREIGN: 0
NUMA_INTERLEAVE_HIT: 31523
NUMA_LOCAL: 86475467
NUMA_OTHER: 0
crash>
Internally, a new dump_vm_stat() function has been added to access
any of the items in the list. (anderson@redhat.com)
- Implemented support for relocatable x86_64 live kernels and kdump
generated vmcores. Without this patch, attempts to analyze those
kernels would fail during initialization with the error message:
"crash: vmlinux and vmcore do not match!" (anderson@redhat.com)
- Support for recognizing real-time signals in the "sig" command.
(olivier.daudel@u-paris10.fr)
- Fix for "sys -c" display of "sys_ni_syscall" entries that showed
different system call names that have the same (W) symbol value
as the (T) symbol "sys_ni_syscall". For example:
crash> sym -l | grep ffffffff802a38b6
ffffffff802a38b6 (W) compat_sys_ipc
ffffffff802a38b6 (W) compat_sys_keyctl
ffffffff802a38b6 (W) compat_sys_sysctl
ffffffff802a38b6 (W) ppc_rtas
ffffffff802a38b6 (T) sys_ni_syscall
ffffffff802a38b6 (W) sys_pciconfig_iobase
ffffffff802a38b6 (W) sys_pciconfig_read
ffffffff802a38b6 (W) sys_pciconfig_write
ffffffff802a38b6 (W) sys_spu_create
ffffffff802a38b6 (W) sys_spu_run
ffffffff802a38b6 (W) sys_vm86
ffffffff802a38b6 (W) sys_vm86old
crash>
Depending upon the kernel, one of those symbols would be displayed
instead of sys_ni_syscall. (olivier.daudel@u-paris10.fr)
- Fix for "sig" command where in later 2.6 kernels, the queued signal
list at the end of the display would loop back on itself, repeatedly
displaying the same queued signal(s). (olivier.daudel@u-paris10.fr)
(09/07/06)
4.0-3.2 - Enabled CONFIG_SPARSEMEM support for ia64 kernels; tested on
RHEL5-alpha (2.6.17-1.2519.4.5.el5). Without this fix, crash
would fail during initialization with error message indicating:
"crash: CONFIG_SPARSEMEM kernels not supported for this architecture"
(anderson@redhat.com, dwilder@us.ibm.com)
- Moved read_in_kernel_config() to just after the internal gdb
module gets initialized. Without this fix, Xen kernels built
with CONFIG_IKCONFIG would fail during initialization indicating:
"crash: gdb_interface: gdb not initialized?"
(anderson@redhat.com, moriwaka@valinux.co.jp)
- Implemented a new s390/s390x "s390dbf" command to print out
kernel traces from the s390 debug feature (s390dbf). The debug
feature is an s390 kernel trace API which uses wraparound buffers
to store trace records in memory. Many of the s390 device drivers
use this feature. There is some documentation of the s390dbf in
the kernel sources under /Documentation/s390/s390dbf.txt.
(holzheu@de.ibm.com)
- RHEL5-alpha kernel modules (only x86_64 confirmed) may possibly
fail to be loaded with the "mod" command due to dwarf2 errors
associated with the split module.ko/module.ko.debug debuginfo
facility used by RHEL kernels. Bugzillas have been filed to
address those problems, but the crash utility's error-reporting
mechanism has been modified to properly reflect that the internal
gdb module has failed to load the kernel module's debug data.
Without this fix, the "mod -[sS]" commands would silently return
without loading the module data because the "add-symbol-file"
operation inside the gdb module failed, did a longjmp(), and ended
up back at the crash prompt. That behaviour has been changed
to report the module name and the gdb error like so:
crash> mod -S
mod: /lib/modules/2.6.17-1.2564.1/kernel/drivers/scsi/scsi_mod.ko
gdb add-symbol-file command failed
crash>
Note that this problem occurs in all post-RHEL4 kernels, i.e.,
FC4, FC5, and now FC6 and RHEL5.
(anderson@redhat.com)
- Fix for runaway unkillable "repeat" command output that can happen
when scrolling is turned off and the command that was entered is
bogus. (anderson@redhat.com)
- Fix for "struct structure.member address" output when the member
is an array; additional members beyond the array contents would
get displayed. (anderson@redhat.com)
- Fix to internal gdb module to properly handle relocatable kernel
virtual addresses; this will be required for upcoming relocatable
RHEL5 kernels needed by the kexec/kdump facility.
(anderson@redhat.com)
- Combined kernel_init(PRE_GDB) and kernel_init(POST_GDB) into a
single call to kernel_init() that is done after gdb is initialized;
verify_version() is now called by kernel_init(). This is just a code
re-work, and does not change any functionality. (anderson@redhat.com)
(8/23/06)
4.0-3.1 - Fix to address 2.6.18 and later Fedora 2.6.17-based kernel data
structure name change from "runqueue" to "rq". This would cause
crash to fail during initialization with a "crash: cannot determine
idle task addresses from init_tasks[] or runqueues[]" message,
followed by a red herring message: "crash: cannot resolve
init_task_union". (haren@us.ibm.com)
- Added 4-level pagetable support for ia64. Since this is based
upon whether the kernel was built with CONFIG_PGTABLE_4, the
determination of whether the crash utility uses 4-level page
tables is based upon one of two possibilities: the "automatic"
manner depends upon the kernel also being configured with
CONFIG_IKCONFIG; otherwise it will require the command line
option "--machdep vm=4l". (troy.heber@hp.com)
- Leveraging Troy Heber's addition of code to dig out and uncompress
in-kernel CONFIG_IKCONFIG data, a new "sys config" command option has
been added, which dumps all of the kernel configuration data.
(anderson@redhat.com, troy.heber@hp.com)
- Also leveraging the new CONFIG_IKCONFIG data access, the value of HZ
can now be absolutely determined by reading CONFIG_HZ. If the config
data is not available, then the current use of the HZ #define will
be replaced by the use of sysconf(_SC_CLK_TCK) to account for the
upcoming removal of HZ from glibc header files.
(anderson@redhat.com, olh@suse.de)
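The HZ fallback order described above might look like the following sketch. The helper determine_hz() is hypothetical, and its argument stands in for the CONFIG_HZ value parsed from the IKCONFIG data (0 when that data is unavailable). Note that sysconf(_SC_CLK_TCK) reports the user-visible clock tick rate (USER_HZ), which is not necessarily the kernel's internal HZ; that is one reason to prefer CONFIG_HZ when it can be read.

```c
#include <assert.h>
#include <unistd.h>

/*
 * Sketch: prefer the kernel's CONFIG_HZ when the in-kernel config data
 * could be read (config_hz > 0); otherwise fall back to the C library's
 * clock tick rate.  sysconf() returns -1 if the value cannot be determined.
 */
static long determine_hz(long config_hz)
{
    if (config_hz > 0)
        return config_hz;

    return sysconf(_SC_CLK_TCK);
}
```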
- Added a new "--cpus [number]" command line option to work around any
situations where the number of cpus cannot be correctly determined.
This is unlikely to ever be needed, but it was necessary for an ia64
kexec/kdump development kernel issue that has since been addressed.
However, it has been left in place as a workaround in case the same
thing occurs due to some other circumstance. (anderson@redhat.com)
(8/04/06)
4.0-2.33 - Fix for possible compilation error in x86_xen_kdump_load_page_PAE()
function in 4.0-2.32 version of x86.c. (anderson@redhat.com)
(7/13/06)
4.0-2.32 - Implemented and tested code to create the Xen kdump p2m table from
the mfn value found in the "pfn_to_mfn_frame_list_list" member
contained within the shared per-domain "arch_shared_info" structure,
which is contained within the architecture-neutral "shared_info"
structure. However, the use of this capability will require that:
(1) the Xen kdump implementation pass this mfn value in the vmcore
ELF header, and
(2) the crash utility will need additional updating to access this
value from the vmcore ELF header.
The current test version of the Xen kdump code passes the dom0 cr3
value in the ELF header, but that only works for Xen kernels with
writable pagetables. Using the pfn_to_mfn_frame_list_list mfn will
work for both writable- and shared-pagetable Xen kernels.
(anderson@redhat.com)
- Support for kernels with no vmalloc addresses, i.e., with an empty
"vmlist", fixing an initialization-time session failure indicating:
"crash: invalid kernel virtual address: 0 type: first vmlist addr"
(moriwaka@valinux.co.jp)
- Fix that allows the "wr" command to accept a 64-bit value.
(castor.fu@3pardata.com)
- Fix for "vtop" on user/kernel virtual addresses that showed the page
offset value on the "PAGE:" output line on x86 PAE kernels.
(anderson@redhat.com)
- Added "rd -x" option to avoid the display of the ASCII translation at
the end of each line. (anderson@redhat.com)
- Fix for unnecessary double-printing of the "mount" command header
when a directory argument is referenced by two different vfsmounts.
(harihare@vnet.ibm.com, shenlinf@cn.ibm.com)
- Fix to recognize equivalent directory arguments to the "mount"
command, i.e., "/boot" is the same as "/boot/".
(shenlinf@cn.ibm.com)
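The "/boot" versus "/boot/" equivalence described above amounts to comparing paths with trailing slashes ignored, as in this sketch. The helpers trimmed_len() and same_dir_arg() are hypothetical, and they deliberately do not resolve "." or ".." components or symbolic links.

```c
#include <assert.h>
#include <string.h>

/* Length of path with trailing '/' characters removed ("/" stays "/"). */
static size_t trimmed_len(const char *path)
{
    size_t len = strlen(path);

    while (len > 1 && path[len - 1] == '/')
        len--;
    return len;
}

/* Return nonzero if two directory arguments differ only in trailing '/'s. */
static int same_dir_arg(const char *a, const char *b)
{
    size_t la = trimmed_len(a), lb = trimmed_len(b);

    return la == lb && strncmp(a, b, la) == 0;
}
```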
- Fix for "swap" command that dropped "/dev" from swap device pathnames
in 2.6 kernels. (shenlinf@cn.ibm.com)
- Fix for potential segmentation violation when running "bt -f" command
on s390 and s390x. (holzheu@de.ibm.com)
- Added a "rd -m machine-address" option to read Xen machine
addresses if they are accessible; also a general cleanup of the
m2p functionality. (anderson@redhat.com)
(7/12/06)
4.0-2.31 - Bumped crash-internal NR_CPUS for x86 and ia64; added a warning
message to "recompile crash" and forced an initialization failure
when the kernel's configured NR_CPUS is greater than the maximum
allowed NR_CPUS value compiled into crash.
(maneesh@in.ibm.com, anderson@redhat.com)
- Fix for an initialization failure indicating a kernel/memory-source
mismatch when an x86 kernel configures its physical memory start
address higher than the traditional 1MB starting point.
(anderson@redhat.com)
- Fix for kernels that have replaced the "system_utsname" data
structure with contents of the "init_uts_ns" data structure.
This fixes a "crash: cannot resolve system_utsname" initialization
failure. (pbadari@us.ibm.com, anderson@redhat.com)
- Fix for large LKCD dumpfiles that resulted in an initialization
time failure indicating "fixme, need to add more zones (ZONE_ALLOC)".
When the statically-defined ZONE_ALLOC value is too small, the fix
expands the zone size dynamically. (indou.takao@jp.fujitsu.com)
- Fix for "kmem -i" failure when the "all_bdevs" block_device list
is empty. Part of the command output would be displayed, followed by
"kmem: invalid kernel virtual address: 0 type: inode buffer".
(anderson@redhat.com)
- First pass at supporting a Xen hypervisor kexec/kdump vmcore as the
dumpfile format for the dom0 vmlinux. Developed/tested OK on an x86
vmlinux/vmcore set supplied by horms@verge.net.au. Code for x86_64
is in place, but untested. (anderson@redhat.com)
- Also in place, but untested, is initial support for Xen x86 PAE
kernels. (anderson@redhat.com)
(6/27/06)
4.0-2.30 - RHEL4-U4 version. RHEL3-U8 will be (identical) version 4.0-2.29.
- Fix for x86_64-only "vm -p" failure due to "pml page" read error
on kernels with 3-level user page tables; regression was introduced
by x86_64 Xen support in 4.0-2.24. (anderson@redhat.com)
- Fedora, and future RHEL, build procedure requires the removal of
the inclusion of certain kernel header files; removed inclusion(s)
of page.h, list.h, and segment.h. (anderson@redhat.com)
(6/06/06)
4.0-2.24 - Fix for 2.6.17 kernels that do not use "pgdat_list" memory node
list header, which would cause crash to fail during initialization
with a "crash: cannot resolve: pgdat_list" error message.
(anderson@redhat.com)
- Fix for 2.6.17 kernels that have re-worked the kernel pid_hash
handling, which would cause crash to fail during initialization
with a "crash: cannot determine pid_hash array dimensions" error
message. (anderson@redhat.com)
- If the vmlinux file and /proc/version do not match, and crash tries
to find an appropriate System.map file to use for symbol addresses,
a new "WARNING: vmlinux and /proc/version do not match!" message
will be displayed. Note that the System.map file that crash finds
will be appropriate for data symbols, but may not necessarily be
correct for text regions. When this happens, kernel text disassembly
may be incorrect, and this in turn leads to other problems, such as
incorrect back-tracing. (anderson@redhat.com)
- Fix for recent 2.6 kernels "sys" UPTIME display and for "ps -t" RUN
TIME displays due to change to HZ value. (anderson@redhat.com)
- Continued Xen support: this version runs on live x86_64 xen0 and xenU
kernels, on x86_64 xenU core dumps, and x86_64 xenU "xm save" files.
As is the case for x86 Xen, this support is only for x86_64 kernels
with writable page tables. (anderson@redhat.com)
- Fix for x86_64 IS_LAST_PML4_READ() macro, which (harmlessly) never
worked, but caused the PML4 page to be re-read each time. Added a
per-arch clear_machdep_cache() function for processors needing to do
their own virtual-to-physical page table cache clearing; so far only
ppc64 and x86_64 need it for the top-most of their 4-level page table
pages. (anderson@redhat.com)
(5/01/06)
4.0-2.23 - Fix for "kmem -[sS]" command in 2.6.15 kernels which introduced
per-NUMA node slab chains. Without this patch the command fails
with a "kmem: invalid structure member offset: kmem_cache_s_lists"
error message. (sharyath@in.ibm.com)
- Fix for this initialization error on 2.6.16 kernels indicating:
"crash: cannot determine idle task addresses from init_tasks[]
or runqueues[]" followed by "crash: cannot resolve init_task_union"
error messages. This was due to the introduction of a runqueue.cpu
member that conflicted with an old cpu member in RHEL3-specific O(1)
scheduler code. (anderson@redhat.com)
- Fix for "kmem -i" in newer 2.6 kernels where the new ZONE_DMA32 bumps
up the value of ZONE_HIGHMEM, causing a potential segmentation
violation. (anderson@redhat.com)
- Fix for "kmem -i", where the PG_slab bit determination has been fixed
so that the correct number of slab pages is displayed.
(anderson@redhat.com)
- Fix for "swap" command and "kmem -i" option on 64-bit 2.6.15 kernels
which could fail with a crash internal buffer dump followed by these
messages: "swap: cannot allocate any more memory!" or "kmem: cannot
allocate any more memory!". This was due to the swap_info_struct.max
member being downsized from a long to an int. (anderson@redhat.com)
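The failure mode described above, where a member shrinks from a long to an int, can be illustrated by reading the member with the size reported by the debug data instead of a size assumed at compile time. The helper read_int_member() is hypothetical and assumes the host byte order matches the dumpfile's. If an 8-byte read is applied to a 4-byte member, the extra bytes come from whatever follows it, which can produce a huge bogus value such as an oversized allocation request.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Sketch: fetch an integer structure member from a raw buffer copied out
 * of the dumpfile, honoring the member's actual size (4 or 8 bytes).
 */
static long read_int_member(const void *buf, size_t offset, size_t size)
{
    const unsigned char *p = (const unsigned char *)buf + offset;

    switch (size) {
    case sizeof(int32_t): { int32_t v; memcpy(&v, p, sizeof(v)); return v; }
    case sizeof(int64_t): { int64_t v; memcpy(&v, p, sizeof(v)); return (long)v; }
    default:
        assert(!"unexpected member size");
        return 0;
    }
}
```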
- Continued Xen support: this version runs on live x86 xen0 and xenU
kernels, on xenU core dumps, and on xenU "xm save" files. This
support is for x86 kernels with writable page tables only. Minimal
support for running on live x86_64 xen0 kernels with writable page
tables is also in place, but does not allow access to user virtual
memory as of yet. (anderson@redhat.com)
(4/12/06)
4.0-2.22 - Incorporated initial patch-set to implement support for kernels built
with CONFIG_SPARSEMEM. (dwilder@us.ibm.com)
- Fix for post-2.6.15 ppc64 kernels to use cpu_online_map when perusing
the paca array for the per_cpu_offsets. (haren@us.ibm.com)
- Fix for ppc64 "bt" command for active tasks that were running in
user space at the time of crash. (haren@us.ibm.com)
- Fix to remove dependencies upon any kernel header files so as to
allow crash to build in an Ubuntu environment. (aquynh@gmail.com)
- Fix size of x86_64 "cpu_khz" variable to match that of the kernel.
(sharyath@in.ibm.com)
- Created framework for support of Xen kernel dumpfiles and live Xen
kernels; this is going to be a long-period work-in-progress affair,
and the code added in this release is being done now primarily to aid
in future patch integration efforts. (anderson@redhat.com)
(3/23/06)
4.0-2.21 - Fix to recognize post-2.6.15 ppc64 kernels moving the per_cpu_offsets
to the "paca" structure. Without this patch, crash fails with the
following error messages: "crash: cannot determine idle task addresses
from init_tasks[] or runqueues[]" and "crash: cannot resolve
init_task_union". (pbadari@us.ibm.com)
- Incorporated a patch containing ppc64 specific changes when reading
kdump vmcores. Kdump vmcores contain pt_regs for all cpus in the ELF
header, so they are read from there rather than from the active tasks'
kernel stacks; also, the register contents are printed before any
active task backtrace. (haren@us.ibm.com)
- If pglist_data.node_mem_map structure member does not exist, as in a
ppc64 kernel built with CONFIG_SPARSEMEM, print an init-time warning
message instead of failing with "crash: invalid structure member
offset: pglist_data.node_mem_map" message. (haren@us.ibm.com,
anderson@redhat.com)
(2/16/06)
4.0-2.20 - Fix to recognize 2.6.16 change that removed the x86_64 cpu_pda[]
array of x8664_pda structures and replaced it with a _cpu_pda[]
array of pointers to those structures. Without the patch, crash
failed during initialization of 2.6.16 x86_64 kernels with a
"crash: cannot resolve cpu_pda" error. (rachita@in.ibm.com)
- Added a minor enhancement to the "list" command to allow the
"start" argument to also be an (expression) that evaluates to the
address of the starting list_head; previously it only allowed
a symbol or a virtual address. (anderson@redhat.com)
(2/03/06)
4.0-2.19 - Fix for the "bt" command on ia64 kernels with 64K page size.
(1/11/06)
4.0-2.18 - Fix for the "files" command for 2.6.14 and later kernels, in which
the files_struct data structure contains the new fdtable data
structure. (rachita@in.ibm.com)
- Fix for an "invalid lvalue in assignment" compile-time error
generated from gdb-6.1/bfd/coff-alpha.c that prevents the embedded
gdb from building with newer compilers. (troy.heber@hp.com)
(1/5/06)
4.0-2.17 - Fix to resurrect LKCD version 8 support, inadvertently broken in
4.0-2.15. (troy.heber@hp.com)
- Fix for "net -S" failures in certain 2.6 kernels that failed with
"net: cannot determine what an inet_sock structure is" message;
shows embedded sock structure instead of failing. (anonymous donor)
- Fix for erroneous "net -s" source/destination address and port
values in certain 2.6 kernels; added "net -s" source/destination
address and port values for IPv6 sockets. (anderson@redhat.com)
(12/16/05)
4.0-2.16 - Fix for the x86_64 backtrace code to search all of the exception
stacks for the origin of the active tasks' backtrace when the
information is not available in the dumpfile header. Up until now,
the search was made in the process stack, the per-cpu IRQ stack,
and the per-cpu NMI exception stack; this patch looks at all 3
exception stacks in 2.4 kernels (NMI, STACKFAULT and DOUBLEFAULT),
and all 5 exception stacks in 2.6 kernels (NMI, STACKFAULT,
DOUBLEFAULT, DEBUG and MCE).
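Searching all of the exception stacks, as described above, reduces to checking which stack's address range contains a given stack pointer. This sketch uses a hypothetical stack_range table and find_stack() helper, not crash's internal data structures. A per-stack size field allows stacks of different sizes to be described.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical description of one per-cpu stack. */
struct stack_range {
    const char *name;       /* e.g. "NMI", "STACKFAULT", "DEBUG" */
    unsigned long base;     /* lowest address of the stack */
    unsigned long size;     /* stack size in bytes */
};

/*
 * Return the first stack whose [base, base + size) range contains addr,
 * or NULL if addr lies on none of them.
 */
static const struct stack_range *find_stack(const struct stack_range *stacks,
                                            size_t n, unsigned long addr)
{
    for (size_t i = 0; i < n; i++)
        if (addr >= stacks[i].base && addr - stacks[i].base < stacks[i].size)
            return &stacks[i];
    return NULL;
}
```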
- Fix to remove erroneous warning message re: the task cpu not being
the same as the IRQ or exception stack cpu, which was displayed when
doing a non-context-sensitive "bt -E" on an x86_64.
(12/12/05)
4.0-2.15 - Applied Kurt Rader's (kdrader@us.ibm.com) patch for SUSE SLES 9
"bigsmp" kernel LKCD dumpfiles, to fix "conflicting page" abort
caused by a dumpfile header that is larger than the formerly
hard-wired header size.
- Fix for ppc64-only segmentation violation when running "bt" on the
panic task when run against a dumpfile created by the diskdump
facility's new compressed format.
(12/02/05)
4.0-2.13 - Adapted Takao Indoh of Fujitsu's patch for determining proper size
of the ia64_init_stack; fixes empty ia64 "bt -a" output for cpu 8 and
above for diskdumps generated via OS_INIT.
- Applied a patch to address a "net -s" error due to the inet_opt
structure being dropped between 2.6.10 and 2.6.11, which led to a
"net: invalid structure member offset: inet_opt_daddr" failure.
- Made the initialization-time rule such that if "bt -O" is contained
in any or all of the 3 possible initialization-time input files
($HOME/.crashrc, ./.crashrc, or "-i inputfile" files), the setting
will remain idempotent. Fixed the redundant running of $HOME/.crashrc
and ./.crashrc files if they are the same file.
- Added a gdb work-around/hack for ia64 initialization-time warning
"WARNING: cannot determine unw.tables offset" on rebuilt RHEL3 ia64
kernels that would prevent "bt" from working.
- Backed out 4.0-2.11 x86_64 pseudo-backtrace patch to show in-kernel
exception frame RIP and RSP values as a unique frame following the
register dump; instead, the exception RIP address is translated
and displayed prior to the register dump.
(11/23/05)
4.0-2.12 - Update to diskdump page_desc struct, required for ongoing support
of the diskdump facility's compression feature, currently under
development.
- Applied patch from Ken'ichi Ohmichi of NEC to prevent a segmentation
violation during a "bt -f" on an x86_64 task that had taken an NMI
during cpu_idle().
- Adapted Badari Pulavarty's patch for recognition of recent 2.6.14
kernel structure/member name changes: mm_struct._rss to _file_rss,
and the kmem_cache_s structure's renaming to kmem_cache. Without
the patch, crash sessions would fail during initialization with a
"crash: invalid structure member offset: kmem_cache_s_num" error,
and the "ps" command would fail with a "ps: invalid structure member
offset: mm_struct_rss" error.
(11/15/05)
4.0-2.11 - Adapted a number of proposed patches:
- Badari Pulavarty of IBM's implementation of support for 2.6.14
ppc64 kernel's use of 4-level page tables.
- Added a new "extensions" sub-directory for collecting crash
command extension libraries; initially populated with the sample
"echo.c" from the extend help page, along with a device-mapper
related "dminfo.c" module from NEC.
- Castor Fu of 3PAR's implementation of support for LKCD version 10,
as well as the handling of single-bit errors in LKCD compressed
pages by trying out each possible single-bit correction. Also his
fixes for better recognizing -fomit-frame-pointer kernel builds,
a stronger defense against potentially bogus processor numbers
associated with tasks in dumpfiles, and a fix to re-allow crash
builds with gcc 2.x compilers.
- Fix for a potential "vmcore: initialization failed" fatal error during
initialization when using more than just the vmlinux and vmcore command
line arguments.
- Fix for diskdump.c compile failures using gcc 2.96.
- Update to the x86_64 pseudo-backtrace code to show, as a frame, the
RSP, RIP and name of the function causing a kernel-mode exception
frame.
- Fix for the x86_64 pseudo-backtrace code so that it no longer neglects
to show the user-mode exception frame when the task subsequently took
a kernel-mode exception.
- Exported the load_extension() and unload_extension() functions so
that they can be called from an extension library.
(11/10/05)
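The single-bit recovery idea in Castor Fu's LKCD patch amounts to a
brute-force search: flip each bit of a compressed page in turn, keep the
first flip that lets the page decode, and undo every other flip. The
sketch below is hypothetical C, not the crash or LKCD code:
repair_single_bit() and toy_rle_check() are invented names, and the toy
(count, byte) format merely stands in for the real LKCD decompressor.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical validator: nonzero if buf[0..len) decodes to a whole page.
   Real code would call the actual LKCD decompression routine here. */
typedef int (*page_check_fn)(const unsigned char *buf, size_t len);

/* Called only after a page has failed to decode. Tries every single-bit
   flip; on success the buffer is left corrected and the flipped bit index
   (byte * 8 + bit) is returned. Otherwise the buffer is left unchanged and
   -1 is returned. */
long repair_single_bit(unsigned char *buf, size_t len, page_check_fn check)
{
    size_t byte;
    int bit;

    assert(check != NULL);
    for (byte = 0; byte < len; byte++) {
        for (bit = 0; bit < 8; bit++) {
            buf[byte] ^= (unsigned char)(1u << bit);   /* flip one bit */
            if (check(buf, len))
                return (long)(byte * 8 + bit);         /* keep this fix */
            buf[byte] ^= (unsigned char)(1u << bit);   /* undo the flip */
        }
    }
    return -1;
}

/* Toy stand-in for a decompressor (hypothetical format): buf holds
   (count, byte) pairs and is "valid" only if the counts add up to
   exactly TOY_PAGE_SIZE bytes. */
#define TOY_PAGE_SIZE 8

int toy_rle_check(const unsigned char *buf, size_t len)
{
    size_t i, total = 0;

    if (len % 2 != 0)
        return 0;
    for (i = 0; i < len; i += 2)
        total += buf[i];
    return total == TOY_PAGE_SIZE;
}
```

With the toy format, a corrupted page {5, 'A', 4, 'B'} is repaired by
flipping bit 0 of byte 0 back to 4. Note that a plain "does it decode"
check can accept a flip other than the one that actually occurred, so
this approach trades certainty for recoverability.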
4.0-2.10 - Adapted a patch set created by Badari Pulavarty of IBM that
addresses a fatal initialization-time error, which displays
"crash: invalid structure member offset: x8664_pda_level4_pgt"
when run against post-2.6.10 x86_64 kernels. More importantly,
Badari's patch adds support for the following x86_64 kernel changes
that were introduced in 2.6.11:
- x86_64 kernel virtual address range changes, and
- x86_64 user virtual address space usage of 4-level page tables
(11/07/05)
4.0-2.9 - Adapted a patch set from NEC and Fujitsu that introduces support
for an alternative compressed dumpfile format created by the
diskdump facility. When the diskdump facility is configured to
use compression, the dumpfile will not be an ELF vmcore file,
but rather a compressed dumpfile image, derived from the LKCD
dumpfile format.
(11/03/05)
4.0-2.8 - Adapted a patch sent by Jun'ichi Nomura of NEC that addresses
a problem with the "mod" command: when trying to load the debug
data of a module whose kernel name differs from its module object
filename, a manual "mod -s" command line containing the full
pathname to the module's object file was required. This typically
happens when a module's name string contains an underscore, while
its object filename contains a dash. Jun'ichi's patch simply retries
any unsuccessful module object file search after replacing the
underscores with dashes.
(10/21/05)
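The retry described in the 4.0-2.8 entry can be sketched as follows.
This is an illustration only, not crash's code: find_module_object(),
the ".ko" suffix (2.6-style module files), and the in-memory list of
filenames standing in for crash's walk of the module tree are all
assumptions.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_MODNAME 256

/* Copy name into out, replacing each '_' with '-'.
   Returns 1 if anything was replaced, 0 otherwise. */
int underscore_to_dash(const char *name, char *out, size_t outlen)
{
    size_t i;
    int changed = 0;

    assert(outlen > 0);
    for (i = 0; name[i] != '\0' && i + 1 < outlen; i++) {
        if (name[i] == '_') {
            out[i] = '-';
            changed = 1;
        } else {
            out[i] = name[i];
        }
    }
    out[i] = '\0';
    return changed;
}

/* Return the index of "<modname>.ko" in files[], or -1 if it is absent. */
static int lookup_object(const char *modname, const char *const *files,
                         size_t nfiles)
{
    char want[MAX_MODNAME + 4];
    size_t i;

    if (snprintf(want, sizeof(want), "%s.ko", modname) >= (int)sizeof(want))
        return -1;                      /* name too long to match */
    for (i = 0; i < nfiles; i++)
        if (strcmp(files[i], want) == 0)
            return (int)i;
    return -1;
}

/* Search by the kernel's module name first; if that fails and the name
   contains underscores, retry once with them replaced by dashes. */
int find_module_object(const char *modname, const char *const *files,
                       size_t nfiles)
{
    char alt[MAX_MODNAME];
    int idx = lookup_object(modname, files, nfiles);

    if (idx < 0 && underscore_to_dash(modname, alt, sizeof(alt)))
        idx = lookup_object(alt, files, nfiles);
    return idx;
}
```

For example, a kernel module named "dm_mod" is not found as "dm_mod.ko",
but the retry matches "dm-mod.ko". Doing the exact-name search first
keeps the common case unchanged.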
4.0-2.7 - Fixed x86_64 backtrace code to recognize 32-bit user code kernel
entry exception frames (code segment selectors of 0x23) without
issuing a "bt: WARNING: possibly bogus exception frame" message.
- Fixed x86_64 backtrace code to recognize in-kernel exception
frames generated from module text in situations where the module
data was not included in the dumpfile, such as in a netdump which
resulted in a vmcore-incomplete file.
(10/19/05)
4.0-2.6 - Backed out support for the proposed NT_KDUMPINFO ELF notes section
in kexec/kdump vmcores (which has been rejected upstream for now).
- Fix for faulty backtrace display of exception frames coming out of
either "nmi" or the generic "error_code" fault handlers, as seen in
later 2.6 kernels.
- Restored "dev -i" and "dev -p" options for x86, which I mistakenly
removed when the s390[x] support was added in 3.10-13.2.
(9/29/05)
4.0-2.5 - Continued support for kexec/kdump-generated vmcore files, with
this release running against SMP i386 dumpfiles (32-bit ELF),
which contain multiple per-cpu NT_PRSTATUS sections, and also
adding support for the proposed NT_KDUMPINFO ELF notes section.
- Implemented a new "bt -T" option to supplement "bt -t"; the difference
is that the -T option dumps all text addresses in a process stack
starting just above the task_struct or thread_info structure,
whichever applies, whereas "bt -t" starts at what it determines to be
the lowest depth that the stack reached during the task's last
entry into the kernel.
- Fix for "bt -r" output on 2.6.13 kernels where certain addresses
were recognized as kernel addresses, but could not be translated
symbolically (also affected "rd -s" output).
(9/20/05)
4.0-2.4 - Initial support for kexec/kdump-generated vmcore files. So far,
testing has only been done on uniprocessor i386 32-bit and 64-bit
ELF header dumpfiles; expect ongoing kdump support updates.
- Fix for "ps: invalid structure member offset: mm_struct_rss"
command failures on 2.6.13 kernels.
(9/09/05)
4.0-2.3 - Update to recognize the contents of the 2.6 kernel's shared
array_cache in each kmem_cache_s header:
- kmem -S will show an individual object in a slab cache's shared
cache as a free object tagged with "(shared cache)", instead
of indicating that the object is allocated.
- kmem -[sS] statistics will count the shared objects as free
instead of allocated.
- kmem -[sS] error checking has been updated to recognize when
an object is erroneously on more than one of the three possible
list types, i.e., a slab's free list, any of the cache's per-cpu
lists, and the cache's shared list.
(8/30/05)
4.0-2.2 - Fixes inadvertent breakage of the kmem_cache initialization
code on 2.4 UP kernels, which would indicate "crash: unable to
initialize kmem slab cache subsystem" during crash session init.
The bug was introduced in 4.0-2.1, and would disallow subsequent
"kmem -s" operations.
(8/11/05)
4.0-2.1 - This update consists of a set of SUSE-kernel related changes.
Based upon a suggestion from Kurtis Rader of IBM, we made the
initialization of the kmem_cache slab subsystem able to more
gracefully handle:
- missing pages in dumpfiles which could cause the crash
session to bail out prematurely with a "seek" error on
an "array cache limit" access.
- x86_64 dumpfiles from kernels that have NR_CPUS set to
greater than 32, which would cause a segmentation violation.
- Kurtis Rader also sent in a patch for kmem-related commands that
recognizes SUSE's replacement of the zone_struct.zone_start_paddr
field with the zone_struct.zone_start_pfn field.
- Kenneth Sumrall of MontaVista Software sent in a patch that makes
the "timer" command continue dumping the remaining timer lists
after it comes across a corrupted list.
(8/10/05)
3.10-13.11 - Adapted a patch set forwarded by Bob Bell of EMC to support
LKCD dumpfile version 9 format.
(7/7/05)
3.10-13.10 - Several x86_64 "bt" command fixes that address the following
occasionally-seen bugs:
- Double display of user entry exception frame.
- Skipping of an in-kernel exception frame display.
- Invalid "possibly bogus exception frame" associated with a
valid in-kernel exception frame.
- Potential premature stopping of backtrace frame display due to
a stale "schedule_timeout" return address.
(6/16/05)
3.10-13.9 - Michael Holzheu of IBM forwarded a set of patches that address
the following issues:
- Fix for "ps -t" stime and utime for recent 2.6 kernels.
- Align the output of the "mod" command correctly for both
32-bit and 64-bit systems.
- If a search pattern specified for the "mount" command matches
multiple mount entries, print all matching entries. Fix for a
parsing problem when "mount -f" or "mount -i" is specified along
with a search pattern.
- Align the output of the address field of the "sig" command.
- Print an error message if 0 is entered as the argument to "bt -S".
- Added recognition of the new diskdump ELF note section, and based upon
its contents, indicate whether a diskdump-generated dumpfile is
a partial dump (i.e., with configured-out user pages, page cache
pages and/or zero-filled pages).
(6/08/05)
3.10-13.8 - Fix for a possible "cannot determine mount list location!" error
when running the "mount" command on a 64-bit system, because gdb cannot
find debug data for the namespace structure. Update to the "struct"
command to allow a negative argument to the -c option; if a negative
count value is entered, that (positive) count of structures leading
up to and including the target structure will be displayed.
(5/17/05)
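The negative "struct -c" count in 3.10-13.8 is simple address arithmetic:
with a count of -N and a structure size of S, the first structure shown
sits at addr - (N - 1) * S, so that the last one shown is the target.
The helper below is a hypothetical illustration of that calculation, not
crash's implementation; struct_count_range() and struct struct_range are
invented names.

```c
#include <assert.h>

/* Hypothetical result: where the display starts and how many
   structures it covers. */
struct struct_range {
    unsigned long start;
    unsigned long count;
};

/* Given the target structure address, the structure size and a nonzero
   "-c" count, compute the first address to display and how many
   structures to display. */
struct struct_range struct_count_range(unsigned long addr,
                                       unsigned long size, long count)
{
    struct struct_range r;

    assert(count != 0);
    if (count > 0) {
        /* Positive count: the target and the structures that follow it. */
        r.start = addr;
        r.count = (unsigned long)count;
    } else {
        /* Negative count: |count| structures ending at, and including,
           the target. 0UL - (unsigned long)count avoids overflow. */
        r.count = 0UL - (unsigned long)count;
        r.start = addr - (r.count - 1) * size;
    }
    return r;
}
```

For example, a count of -3 on a 0x40-byte structure at 0x1000 starts the
display at 0xf80 and covers the structures at 0xf80, 0xfc0 and 0x1000.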
3.10-13.7 - Michael Holzheu of IBM forwarded a set of patches that address
the following issues:
- No "bt" is allowed on s390[x] running tasks on a live system.
- The "bt -I [eip]" option is not allowed on s390[x] systems.
- The help page for "bt" indicates that multiple pid and/or task
arguments may be entered.
- Fix for possible segmentation violation if "files -d [dentry]"
is given a random dentry address argument.
- "kmem -v" formatting fix for s390x.
- Fix for division by zero violation when "kmem -i" is run on a
system with no swap.
(5/06/05)
3.10-13.6 - Two fixes for "kmem -s [slab-address]": one where the slab's inuse
count would be reported as incorrect (bug introduced in 3.10-13.5),
and a second fix to display the proper slab statistics. Fixed both
btop and ptob commands to handle 64-bit values on 32-bit systems.
Fix for "net -s" on 2.6 kernels in which gdb cannot properly determine
the contents of an inet_sock structure.
(5/02/05)
3.10-13.5 - Restored gdb-6.1 as the embedded gdb; a fix to gdb's dwarf2read.c
allows proper access to debug data from vmlinux kernels built with
gcc-3.4.*, and the proper loading of module debug data. The
init-time warning message regarding --readnow is no longer displayed
by default; it will only be shown if a required structure member or
structure size cannot be determined.
- First pass at handling diskdump-generated 64-bit vmcore files with
multiple PT_LOAD segments.
- Fix for x86_64 kernel exception frame recognition; bt was showing
"error_exit" symbol in trace without accompanying exception frame.
- Additional "slabs_full" contents verification of c_num field in
2.4 kernels; kmem -s would bypass a bogus slab_s with an invalid
inuse field.
(4/26/05)
3.10-13.3 - Fix for case in which a netdump's panic task is dead, had
called do_exit(), which in turn had called schedule(). It is a
kernel bug for the task to be rescheduled and return back to
do_exit() from schedule(), and if it does, the kernel does a
BUG() to force an oops. The crash utility never expected to
see this anomaly, and would bail out during initialization
with a "crash: task does not exist: [task address]" message.
- Fix to allow running with 2.6 x86_64 kernels in which CONFIG_NR_CPUS
is 8, on a system with 8 cpus. The crash session would fail initialization
with a message of the sort: "crash: read error: kernel virtual
address: 20000800403acd23 type: tss_struct ist array".
(4/12/05)
3.10-13.2 - Introduces s390 and s390x support, submitted by Michael Holzheu
of IBM. Both LKCD and s390 standalone dumpfile formats are
supported.
(3/23/05)
3.10-13.1 - Fix for 2.6 kernels with "linux_banner" located in the read-only
data section, which caused an initialization failure indicated by
a "WARNING: invalid linux_banner pointer: 756e694c" message,
followed by an "invalid kernel virtual address: 756e694c",
and then a "bad match" failure.
- Addressed several type-check warnings generated when compiling
with gcc 4.
- Clean up %build-root after an rpmbuild without the install step.
(3/18/05)
3.10-11 - Enhanced x86, x86_64 and ia64 module text disassembly output
to symbolically display call targets without requiring module
debuginfo data.
- Fixed hole where an ia64 vmcore could be mistakenly accepted
as a usable dumpfile on an x86_64 host, leading eventually
to an unrelated error message.
- Fixed potential "bt -a" hang on dumpfile where diskdump/netdump
IPI interrupted an x86 process on a 4g/4g kernel while executing
the instructions just after it had entered the kernel for a
syscall, but before calling the handler.
- Updated to handle backtraces from x86_64 dumpfiles generated
while running on the NMI exception stack.
- Applied a patch from Troy Heber of HP to fix faulty ia64 module
text disassembly output.
(2/21/05)
3.10-3.1 - Fortified "kmem -[sS]" verification of slab chain linkage and
slab structure contents, and made it report any errors found; this
prevents potential segmentation violations or command hangs
when performing a kmem -[sS] command on a dumpfile with slab
corruption.
- Adapted a patch from Jun'ichi Nomura of NEC that properly displays
backtraces from netdump/diskdump dumpfiles generated from INIT
switches on ia64 machines; the kernel must have per-cpu INIT stacks.
(12/17/04)
3.10-1 - Fix for a segmentation violation during initialization seen when a
2.6 x86_64 SMP kernel is run on a system booted with "maxcpus=1".
Fix for "bt" on the panic task in a 2.6 x86_64 netdump, due to
the user_regs_struct debug data not being gathered -- even with
the retrofitted gdb-6.0 (yet another issue associated with gdb
and kernels built with gcc-3.4.*).
- Fix for the "mod -[sS]" command for ppc64 on kernels built with
gcc-3.4.*. It should be noted that "mod -[sS]" does not work on
ia64 and x86_64 kernels built with gcc-3.4.*, as there is no
version of gdb that can properly handle ia64 and x86_64 kernel
module objects built with compiler versions starting with 3.4.*.
Hopefully that shortcoming will be addressed in a future version
of gdb.
(11/24/04)
3.9-1 - In an interim attempt to deal with the current version of gdb
not being able to properly access debug data from vmlinux
kernels built with gcc-3.4.*, I have reverted the embedded
gdb version from gdb-6.1 back to gdb-6.0. (I will keep version
3.8-5.11 available in old_versions/, which is the last version
with gdb-6.1 embedded.) The gdb team at Red Hat is aware of the
problem, and when a new version of gdb that works is available
either at the FSF site or internal to Red Hat, I will upgrade
again. In any case, with gdb-6.0, the initialization-time
"using an invalid structure member offset" errors should not
occur; if they do, the warning message concerning the use of
the command line "--readnow" option should be applied.
- Other fixes in this release address:
- the occasional failures of the "timer" command with
"zero-size memory allocation!" errors.
- a failure of "bt" on the 2.6 migration_thread.
- ia64 /dev/mem read failures when a page straddles an EFI memory
segment.
- the removal of unnecessary ia64 "unwind" warning messages when
running "bt" on 2.6 kernels.
(11/18/04)
3.8-5.11 - No functional changes or fixes are in this release. However,
the error reporting mechanism for attempts to use an invalid
structure member offset or invalid structure size has been
beefed up to additionally report the invalid data type item,
along with the function, filename and line number. Up until
now, only a rudimentary back trace has been displayed.
(10/29/04)
3.8-5.10 - Fix for a failure to properly determine the correct number
of cpus in an ia64 netdump dumpfile from an SMP kernel running
on a single-processor system.
(10/26/04)
3.8-5.9 - Fix for potential segmentation violation during "mod -S" due
to an overrun of what was a statically-defined bfd section data
array. Cleaned up bogus error message appearing at the top
of the README file.
- Fixed two uninitialized variable usages.
(10/26/04)
3.8-5.8 - Fix for newer 2.6 ia64 kernels whose in-kernel "unw" structure
contains the new "r0" member, which was causing backtraces to
fail with "kernel and crash unwind data structures are out of sync"
messages.
- Fixed ia64 register dumps in backtraces to properly display the
floating point register contents, as well as the additional
F9 and F10 registers from kernels with newer pt_regs structures
that contain them.
(10/15/04)
3.8-5.7 - Introduced support for ia64 LKCD dumpfiles, as well as several
other LKCD-related fixes, submitted by Troy Heber of Hewlett-Packard.
- Fix for x86 backtrace if an IRQ is received on a CPU that has just
entered the kernel via system_call but has not yet called the system
call handler, running on a 4g/4g kernel.
- Updated ExclusiveArch in crash.spec file to include both ppc64pseries
and ppc64iseries.
(10/14/04)
3.8-5.6 - Continued ppc64 support work for 2.6 diskdump and netdump
facilities handling, submitted by Haren Myneni of IBM.
(10/01/04)
3.8-5.5 - Continued ppc64 support, submitted by Haren Myneni of IBM, to
handle ppc64 netdump-generated dumpfiles, to find and display
the active backtraces from dumpfile info, to handle 2.6 IRQ stacks,
and to fix an endian issue associated with the kmem command.
- Fix for x86_64 to deal with recent 2.6 removal of the init_tss
array, which has been replaced with per-cpu tss_structs; fixes
"cannot resolve init_tss" error during initialization.
(9/29/04)
3.8-5.4 - Implement support for recently-introduced PID hashing scheme in
which pid_hash[] is now an array of hlist_head pointers that
head a list of hlist_node structures; fixes "using an invalid
structure member offset" crash initialization failure.
(9/10/04)
3.8-5.3 - Makes NR_CPUS processor-specific, for the most part based upon
their configurable maximums; fixes segmentation violation during
initialization when crash's NR_CPUS was less than the kernel's
configured value.
- Updated time-related handling to deal with the 2.6 change of task_struct's
start_time from an unsigned long to a u64, and the kernel HZ difference
from the user-space view; there are still problems to address, such as
the system's uptime display, and the ps -t output of the initial
swapper process.
(9/3/04)
3.8-5.2 - Accept "--readnow" crash command line argument, which gets passed
on to the embedded gdb module. This may help alleviate the
gdb-6.1/gcc-3.4.x debug data problem on some architectures.
(8/31/04)
3.8-5.1 - Introduces ppc64 support, submitted by Haren Myneni of IBM.
(8/24/04)
3.8-5 - Snapshot of the tentative target for RHEL4/FC3:
- Fix for possible ia64 "bt -al" segmentation violation on the
idle task ("swapper") running on other than the boot processor.
- Fix for ia64 build issues when compiled in a 2.6 environment.
- Fix for incorrect presumption that a vmlinux file has no
debugging data when compiled with gcc 3.4.x.
- Clean up of lkcd_x86_trace.c's gcc version-specific kludgery.
NOTE: The fix for gdb 6.1's inability to gather debug data from vmlinux
files built with gcc 3.4.x does not work in all cases; the gdb team
is looking into the issue now.
(7/14/04)
3.8-3 - No change -- version 3.8-3 is a Red Hat internal snapshot of 3.8-2.2.
Version 3.8-3 is tentatively targeted for the RHEL3-U3 release,
so this is simply a version-sync.
(6/28/04)
3.8-2.2 - Minor changes:
- Fixes gcc 3.3.3 compiler warning regarding the use of 64-bit
bitmap #define's that didn't have "ULL" appended.
- Presumes backtrace compatibility with gcc 3.3.2 for now.
- Added another Red Hat kernel search directory; the current one for
AS2.1/RHEL3 is: "/usr/src/redhat/BUILD/kernel-2.x.x/linux".
For RHEL4 kernels it has been changed to be of the form:
"/usr/src/redhat/BUILD/kernel-2.x.x/linux-2.x.x". This will
allow a no-argument live crash session to be initiated on a
system hosting the kernel's build environment without having
to install the kernel debuginfo package.
(6/25/04)
3.8-2.1 - Fixes a flurry of bugs related to 2.6 kernel changes:
- The initialization of the kmem_cache subsystem was causing a crash
invocation failure on UP systems; fixed bug, but also added a
"--no_kmem_cache" command line option to skip over kmem_cache
slab subsystem initialization, since there's no need to die if
the slab subsystem undergoes changes.
- Fix to handle new kmem_bufctl_t typedef, which has always been
an int, but is now a processor-dependent int or short. This
resulted in "kmem -S" causing a segmentation violation.
- Fix to deal with structure member name change from page->count
to page->_count. This was causing various kmem command options
to fail.
(6/22/04)
3.8-2 - Introduces support for the diskdump facility on ia64 platforms
(diskdump support for x86 has always worked). There is preliminary
diskdump support for x86_64 platforms as well, but it is untested
on SMP because diskdump has not been developed for SMP systems
due to unresolved CONFIG_NUMA issues.
- Several bug fixes, primarily related to ia64 commands, were
forwarded by Takao Indoh of Fujitsu and applied.
- The "mod -S" command now takes an optional directory argument,
which overrides the default module tree search location
/lib/modules/[release].
(6/17/04)
3.8-1 - Introduces x86_64 support.
- Also, updates from Corey Minyard to support the 2.6 LKCD dumpfile
format were applied.
(6/02/04)
3.8-0 - First crash release containing FSF gdb-6.1, replacing Red Hat version
gdb-5.3post-0.20021129.36rh.
(5/04/04)
3.7-5.4 - First crash release that supports the 2.6 kernel.
(4/22/04)