Tuesday, November 18, 2008

ORA-10388: parallel query server interrupt (failure)

ORA-10388: parallel query server interrupt (failure) is a error that may occur with parallel query server. But we have got this error one day when SMON was trying to do parallel recovery but failed. We face a instance crash for this error. We figure out this from trace file output which is like bellow.

*** 2008-10-24 08:56:26.057
*** SERVICE NAME:(SYS$BACKGROUND) 2008-10-24 08:56:26.054
*** SESSION ID:(1094.336) 2008-10-24 08:56:26.054
Parallel Transaction recovery server caught exception 10388

Then I used following query to find out all the parent and related child processes running and found all the child processes SMON spawns at that time. This sql is very helpful to find parallel query process and their slaves.


select
decode(px.qcinst_id,NULL,username,
' - '||lower(substr(s.program,length(s.program)-4,4) ) ) "Username",
decode(px.qcinst_id,NULL, 'QC', '(Slave)') "QC/Slave" ,
to_char( px.server_set) "Slave Set",
to_char(s.sid) "SID",
decode(px.qcinst_id, NULL ,to_char(s.sid) ,px.qcsid) "QC SID",
px.req_degree "Requested DOP",
px.degree "Actual DOP"
from
gv$px_session px,
gv$session s
where
px.sid=s.sid (+)
and
px.serial#=s.serial#
order by 5 , 1 desc ;

So take SID from the output of the query and find out from gv$active_session_history what is the object name that child processes tryig to access. We found the object name from CURRENT_OBJ# column of gv$active_session_history which was a recylebin object.

This is wired that SMON trying to to access a recylebin object. So we decided to purge the object from recycle bin. After doing that SMON finally stop to do recover and failing with error.

1 comment:

Zahid Hossain said...

Very interesting article..

Thanks
Zahid Hossain