

Fault Manager is the part of the self-healing functionality that provides fault isolation and component restart for hardware components (SMF takes care of software components). Make sure that the service is running and that the required packages are installed.
# pkginfo | grep fmd
system SUNWfmd  Fault Management Daemon and Utilities
system SUNWfmdr Fault Management Daemon and Utilities (Root)
# svcs fmd
STATE          STIME    FMRI
online         Jun_29   svc:/system/fmd:default
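The service check above can also be scripted. The following is a minimal sketch, assuming a POSIX shell. The function name `fmd_is_online` and the `SVCS` override variable are hypothetical helpers introduced here for illustration; `svcs -H -o state` prints only the state column without a header.

```shell
# Hypothetical helper: succeeds only if the fmd service reports "online".
# SVCS may point at another command for testing; it defaults to svcs.
fmd_is_online() {
    fmd_state=$("${SVCS:-svcs}" -H -o state svc:/system/fmd:default 2>/dev/null)
    [ "$fmd_state" = "online" ]
}
```

For example, `fmd_is_online || echo "fmd is not online"` prints a warning when the service is in any other state.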
Display Fault Manager Configuration:
# fmadm config
MODULE                   VERSION STATUS  DESCRIPTION
cpumem-diagnosis         1.6     active  CPU/Memory Diagnosis
cpumem-retire            1.1     active  CPU/Memory Retire Agent
eft                      1.16    active  eft diagnosis engine
fmd-self-diagnosis       1.0     active  Fault Manager Self-Diagnosis
io-retire                1.0     active  I/O Retire Agent
sysevent-transport       1.0     active  SysEvent Transport Agent
syslog-msgs              1.0     active  Syslog Messaging Agent
zfs-diagnosis            1.0     active  ZFS Diagnosis Engine
zfs-retire               1.0     active  ZFS Retire Agent
For example, the kernel sends an error to FMD, and FMD forwards the error to a module. There are two types of modules:
1. Diagnosis engines: provide a diagnosis based on symptoms
2. Agents: respond to a given diagnosis and take action, for example offlining a faulty CPU

The fault manager maintains two log files:
1. error log - list of errors sent to the fault manager daemon
2. fault log - list of diagnosed and repaired problems

See the fault log with:
# fmdump
See the error log with:
# fmdump -e

Tips:
-u - limits the output to a specific UUID
-T - displays events that occurred BEFORE a specific time (yyyy-mm-dd)
-t - displays events that occurred AFTER a specific time (yyyy-mm-dd)
-V - verbose output

Run the command below to see whether Fault Manager reports any failed resources. In this example we see that memory module DIMM 3 has failed.
# fmadm faulty
--------------- ------------------------------------ -------------- ---------
TIME EVENT-ID MSG-ID SEVERITY
--------------- ------------------------------------ -------------- ---------
Jun 23 02:30:30 2578e639-38cd-4cd8-9c16-87e96116f41e AMD-8000-2F Major
Fault class : fault.memory.dimm_sb
Affects : mem:///motherboard=0/chip=1/memory-controller=0/dimm=3/rank=0
degraded but still in service
FRU : "CPU 1 DIMM 3" (hc://:product-id=Sun-Fire-X4200-Server:chassis-id=0000000000:server-id=oryx/motherboard=0
/chip=1/memory-controller=0/dimm=3)
Description : The number of errors associated with this memory module has
exceeded acceptable levels. Refer to
http://sun.com/msg/AMD-8000-2F for more information.
Response : Pages of memory associated with this memory module are being
removed from service as errors are reported.
Impact : Total system memory capacity will be reduced as pages are
retired.
Action : Schedule a repair procedure to replace the affected memory
module. Use fmdump -v -u <EVENT_ID> to identify the module.
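The Action line in the output above suggests running fmdump -v -u <EVENT_ID>. Here is a minimal sketch of how that step could be scripted, assuming a POSIX shell with awk. The names `faulty_event_ids`, `fault_details`, and the `FMDUMP` override variable are hypothetical helpers, not part of the Fault Manager tools. The optional date arguments map to the -t and -T options from the tips above.

```shell
# Hypothetical helper: read "fmadm faulty" output on stdin and print
# every field that looks like an event UUID (36 chars, five hex groups).
faulty_event_ids() {
    awk '{
        for (i = 1; i <= NF; i++)
            if (length($i) == 36 &&
                $i ~ /^[0-9a-f]+-[0-9a-f]+-[0-9a-f]+-[0-9a-f]+-[0-9a-f]+$/)
                print $i
    }'
}

# Hypothetical helper: show the fault log entry for one event UUID,
# optionally limited to events after $2 and before $3 (yyyy-mm-dd).
# FMDUMP may point at another command for testing; it defaults to fmdump.
fault_details() {
    fd_uuid=$1
    fd_after=$2
    fd_before=$3
    set -- -v -u "$fd_uuid"
    if [ -n "$fd_after" ]; then
        set -- "$@" -t "$fd_after"
    fi
    if [ -n "$fd_before" ]; then
        set -- "$@" -T "$fd_before"
    fi
    "${FMDUMP:-fmdump}" "$@"
}
```

A possible way to combine the two: `fmadm faulty | faulty_event_ids | while read id; do fault_details "$id"; done`.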
Note the link in the description: it points to a knowledge article with more information, including the recommended resolution. Say you are replacing the DIMM now. Once the DIMM is replaced, you need to update the resource cache to indicate that the issue no longer exists.
# fmadm repair 2578e639-38cd-4cd8-9c16-87e96116f41e
fmadm: recorded repair to 2578e639-38cd-4cd8-9c16-87e96116f41e
Reset the Fault Manager module. If you don't know which one to reset, the web link mentioned earlier will tell you.
# fmadm reset eft
fmadm: eft module has been reset
Verify that there are no more faulty resources:
# fmadm faulty
No output, super! That means there is no h/w issue!
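The repair, reset, and verify steps above can be sketched as one function. This is a minimal sketch, assuming a POSIX shell. The function name `repair_and_verify` and the `FMADM` override variable are hypothetical, and the module to reset still has to come from the knowledge article for your fault.

```shell
# Hypothetical helper: record the repair for event UUID $1, reset
# module $2, then check that "fmadm faulty" prints nothing.
# FMADM may point at another command for testing; it defaults to fmadm.
repair_and_verify() {
    rv_uuid=$1
    rv_module=$2
    "${FMADM:-fmadm}" repair "$rv_uuid" || return 1
    "${FMADM:-fmadm}" reset "$rv_module" || return 1
    rv_remaining=$("${FMADM:-fmadm}" faulty)
    if [ -n "$rv_remaining" ]; then
        # Something is still marked faulty: show it and signal failure.
        printf '%s\n' "$rv_remaining"
        return 2
    fi
    echo "no faulty resources reported"
}
```

With the values from this example the call would be `repair_and_verify 2578e639-38cd-4cd8-9c16-87e96116f41e eft`.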
Developers needed access to some objects in another schema through a database link. To enable the database link, one developer tried to create an entry in the tnsnames.ora file but ran into insufficient permissions. As a developer he has limited privileges on the Unix machines, so he can't edit and save the tnsnames.ora file.
But there is a solution for this little problem.
You can create a functional database link without editing the tnsnames.ora file.
Little Demo Case:
system@TEST11> select * from v$version;

BANNER
--------------------------------------------------------------------------------
Oracle Database 11g Enterprise Edition Release 11.1.0.7.0 - Production
PL/SQL Release 11.1.0.7.0 - Production
CORE 11.1.0.7.0 Production
TNS for Linux: Version 11.1.0.7.0 - Production
NLSRTL Version 11.1.0.7.0 - Production

5 rows selected.

system@TEST11> select * from dba_db_links;

no rows selected
Create the database link testlink_db2 using the full TNS entry:
system@TEST11> create database link testlink_db2
  connect to system identified by oracle
  using '(DESCRIPTION=
    (ADDRESS= (PROTOCOL=TCP) (HOST=10.2.10.18) (PORT=1525))
    (CONNECT_DATA= (SID=test10)))'
/

Database link created.
Now a little check and cleanup:
system@TEST11> select * from v$version@testlink_db2;

BANNER
----------------------------------------------------------------
Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - Prod
PL/SQL Release 10.2.0.4.0 - Production
CORE 10.2.0.4.0 Production
TNS for Linux: Version 10.2.0.4.0 - Production
NLSRTL Version 10.2.0.4.0 - Production

5 rows selected.

-- cleanup
system@TEST11> drop database link testlink_db2;

Database link dropped.
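If you create such links often, the statement from the demo can be generated instead of typed. This is a minimal sketch, assuming a POSIX shell. The function name `db_link_sql` and its argument order are hypothetical, and it only prints the SQL text; it does not connect to any database.

```shell
# Hypothetical helper: print a CREATE DATABASE LINK statement that embeds
# the full connect descriptor, so no tnsnames.ora entry is needed.
# Arguments: link name, user, password, host, port, SID.
db_link_sql() {
    dl_descriptor="(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=$4)(PORT=$5))(CONNECT_DATA=(SID=$6)))"
    printf "create database link %s connect to %s identified by %s using '%s';\n" \
        "$1" "$2" "$3" "$dl_descriptor"
}
```

Calling `db_link_sql testlink_db2 system oracle 10.2.10.18 1525 test10` prints a statement equivalent to the one in the demo, ready to paste into a SQL*Plus session.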
http://download.oracle.com/docs/html/B13951_01/net.htm#i1153728
https://docs.oracle.com/cd/E18283_01/server.112/e17118/statements_5005.htm
server_name = (DESCRIPTION=
  (ADDRESS=
    (PROTOCOL=TCP)
    (PORT=port_number)
    (HOST=host_name)
  )
  (CONNECT_DATA=(SERVICE_NAME=service_name)
  )
)
where:
server_name is the name of an Oracle server that matches an entry in the RDB directory. An entry in the RDB directory can be added using the ADDRDBDIRE command.
TCP is the TCP protocol used for TCP/IP connections.
port_number is the port number of the Oracle Net listener. This is usually port number 1521.
host_name is the name that defines the system where the target Oracle server resides. This name must be in the local host definition on the AS/400 or in a name server on your network. The host name can also be entered as an IP address, for example, 161.14.10.12.
service_name is the service name of the Oracle server.
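The entry format above can be filled in from its parts with a small script. This is a minimal sketch, assuming a POSIX shell. The function name `tns_entry` and the sample alias and service name in the usage note are hypothetical; the port falls back to 1521, the usual listener port mentioned above.

```shell
# Hypothetical helper: print an entry in the format shown above.
# Arguments: server_name, host_name, service_name, optional port_number.
tns_entry() {
    printf '%s = (DESCRIPTION=\n' "$1"
    printf '  (ADDRESS=\n'
    printf '    (PROTOCOL=TCP)\n'
    printf '    (PORT=%s)\n' "${4:-1521}"
    printf '    (HOST=%s)\n' "$2"
    printf '  )\n'
    printf '  (CONNECT_DATA=(SERVICE_NAME=%s))\n' "$3"
    printf ')\n'
}
```

For example, `tns_entry orcl_remote 161.14.10.12 orcl` prints an entry for the IP address used above with the default port.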