Note: This web page was automatically created from a PalmOS "pedit32" memo.
2005-04-08 solaris 10 bootcamp notes
Solaris 10 includes a new cryptography framework
includes openssl under /usr/sfw.
development of x86 and x86-64 support did not stop during the period sun was
not making sol/x86 available to customers.
fma: fault management architecture
smf: service management facility
big focus on lowering TCO (total cost of ownership) in solaris 10
they eat their own dogfood, and they start eating it early in the OS
release lifecycle
sun engineers use solaris on their desktops
lots of innovation in sol 10 (the os); Linus is quoted as saying most innovation
in linux is above the os layer
This section will cover little-known features in versions of Solaris
prior to Solaris 10:
observability:
truss:
-m machine faults
-u function calls (like sotruss?)
stop process: -T -S -M -U
-c syscall stats
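a quick sketch of typical truss invocations (the command and pid here are hypothetical):
    truss -c ls /tmp         # run a command, print per-syscall counts at exit
    truss -u libc -p 1234    # attach to pid 1234 and trace its libc function calls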
pmap
-s
-x
pfiles
shows files, sockets, doors of a process
replaces lsof, should not need lsof on sol 10
pstack can be run on a core file. Applies to each thread
pargs print command arguments. Security implications?
psig show signal disposition
nohup -p - nohup for an already-running process
preap - forces parent to reap child zombies
prstat - like top. "Don't need top anymore." Pretty detailed, see the man
page. -L for threads, -z/-Z relate to zones
ptree
gcore - force a running process to core dump, but don't kill the process
pldd - ldd for running processes
pwdx - print working dir for a running process
pcred - shows effective, real and set uid and gid
pwait - waits for a process to terminate
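a few of the p-tools in action (pids and core-file name are hypothetical):
    pfiles 1234           # open files, sockets and doors of pid 1234
    pstack core.1234      # per-thread stack traces from a core file
    pargs 1234            # command-line arguments
    prstat -L -p 1234     # one line per thread of that process
    pwdx 1234             # its current working directory
    preap 1234            # reap pid 1234 if it is a zombie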
resource management:
processor set- added in sol 8
bind processors to processes
free, dynamic, easy
change without reboot
stateless (lost at reboot), which is not good, so script it
sol 9 - resource pools: stateful processor sets
fair share scheduler
fixed slice scheduler (fx)
once a cpu is in a processor set, only processes bound to that set will run
on that cpu
don't use processor sets on 1 cpu machines
can interact poorly with heavy interrupts, say, due to gigabit or 10
gigabit nic. Can disable interrupts for a processor set
psradm
psrset
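a sketch of processor-set management with psrset/psradm (cpu ids, set id and pid are hypothetical):
    psrset -c 1 2         # create a set from cpus 1 and 2; prints the new set id
    psrset -b 1 1234      # bind pid 1234 to set 1
    psrset -e 1 ./myapp   # or launch a command directly in set 1
    psrset -f 1           # stop directing interrupts to the set's cpus
    psradm -f 3           # separately, psradm can take cpu 3 offline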
projects: a persistent namespace
/etc/project
bind applications together for managing them collectively
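a sketch of defining and using a project (names, ids and paths are hypothetical):
    # /etc/project line - fields are name:id:comment:users:groups:attributes
    dbwork:200:database:oracle::
    # start a workload inside that project
    newtask -p dbwork /opt/db/bin/start_db
    # summarize activity per project
    prstat -J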
poolbind
rctladm
rcapd
e.g. cap an app at no more than 10 GB of RAM
harder than cpu management
ability will be added to future sol 10 release
"resource cap" for memory
libumem:
heap allocator, intended to work well for more kinds of workloads (memory usage)
like malloc()
like a unified theory in physics :)
kma: kernel memory allocator
duplicated in some linux, *bsd
libumem may help efficiency a lot (in applications that use the heap?),
lecturer has seen 3x improvement
LD_PRELOAD'able
includes nice set of debugging features accessed via mdb debugger
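a sketch of preloading libumem and using its mdb debugging support (program name and pid hypothetical):
    LD_PRELOAD=libumem.so.1 UMEM_DEBUG=default UMEM_LOGGING=transaction ./myapp
    gcore 1234           # grab a core of the running process (writes core.1234)
    mdb core.1234
    > ::findleaks        # libumem dcmd: report leaked allocations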
real time:
priocntl
fully preemptive kernel
see CLOCK_HIGHRES in timer_create(3RT)
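a sketch of priocntl usage (command and pid are hypothetical):
    priocntl -l                        # list the scheduling classes configured on this system
    priocntl -e -c RT -p 10 ./rtapp    # run a command in the real-time class at rt priority 10
    priocntl -s -c FX -i pid 1234      # move an existing process into the fixed-priority class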
end of little known features in prev releases of solaris
Covering Solaris 10 information now
containers (zones) - like FreeBSD jails
dynamic resource pools
dtrace
predictive self healing: fma, smf
process rights management: finer grained than root/nonroot. /etc/user_attr
zettabyte filesystem (ZFS). Not in first rel of sol 10, hopefully in 2nd,
not sure. Hopefully in "solaris 10 update 2".
resource management
project - task - process - thread
resource pools
kernel requires at least one cpu for itself
srm - Solaris Resource Manager
sol 9 - no need to buy srm, can use fss - not priorities, but shares -
allocate by proportions more or less
a project may get more than its share (when others are idle), but never less than allocated
define the total number of shares and how many each project gets
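a hedged sketch of putting fss to work (project name and share count hypothetical):
    dispadmin -d FSS                                        # make fss the default class, effective at next boot
    priocntl -s -c FSS -i all                               # or move existing processes into fss right away
    prctl -n project.cpu-shares -r -v 20 -i project dbwork  # sol 10: adjust a project's shares on the fly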
rctladm(1M) sol 9
see man page
resource limits
ulimit -a
/etc/system no longer required for installing a commercial database.
Will be used if present, but the native mechanism is resource controls. Defaults are larger.
sun wants us doing less and less with /etc/system
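example of the native resource controls that replace the old /etc/system tunables (project name and value hypothetical):
    prctl -n project.max-shm-memory $$                                # view the shared-memory cap for your current project
    prctl -n project.max-shm-memory -r -v 8gb -i project user.oracle  # raise it for a database project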
sol 10: dynamic resource pools
poold - if a project is not getting the resources it needs, poold will shuffle resources
to satisfy it
sol 10: containers
zones
lecturer not sure if same thing
some say containers are more than zones, but how?
sol 10 always boots into global zone
zone creation takes 20 minutes on the lecturer's laptop
zoneadm list -iv
zlogin to log into a specific zone
no pid overlap between zones. Only global zone sees all pids on system
single instance of solaris in multizone scenario
nonglobal zones don't see /dev, for example
cannot run dtrace in nonglobal zone yet, they are working on this
can config what filesystems are visible
can inherit filesystems, can make filesystems readonly or rw
from global zone, can cd into local zone's hierarchies. root, dev.
df output different in local zones. Does not show /dev info
every zone can have a unique network identity
ip traffic routed through global zone's NIC
much like virtual interfaces
apps can bind to INADDR_ANY, but still only get connects for that zone
ports unique to each zone
breakins: may make a mess of a local zone but should not be able to get
to global zone (or other local zones?)
don't nfs export from global zone to a local zone "there are issues with that"
/dev/*mem /dev/dsk /dev/rdsk &c not visible in local zones
zonecfg(1M) - give name, path, network, autoboot, pool (which resource pool)
config data in xml format
packages and patches + zones
install in global zone, shows up in local zones
can also install only in global zone
also in local zones
pkgadd and friends changed to control this behavior
inherit-pkg-dir: dir in local zone that should inherit from global zone
loopback and readonly by default
zonecfg -z zone
create
set zonepath=
set autoboot=false
...
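a fuller, hypothetical zonecfg session for a zone named webzone, including a network interface and an inherited package directory (names and addresses are made up):
    zonecfg -z webzone
        create
        set zonepath=/zones/webzone
        set autoboot=false
        add net
            set address=192.168.1.50
            set physical=hme0
        end
        add inherit-pkg-dir
            set dir=/opt/sfw
        end
        verify
        commit
        exit
    zoneadm -z webzone install    # then boot and zlogin as shown below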
/etc/zones holds config data
can be vi'd but...
zoneadm list -cv (opts?)
zoneadm -z name halt
zoneadm -z name boot
boots in a couple seconds
zlogin zonename
www.solarisinternals.com
pdf from talk should be there, in current version, by tonight
dtrace -l list all probes, point of data collection
42000+ probes
every probe has a unique int identifier, module, function, name
docs.sun.com
download dtrace doc
mdb -k (kernel target)
ufs_read::dis
disassembles ufs_read
dtrace -n ufs_read:entry
disassembly changes when you set a probe
code changes to enable probe
disable the probe, orig code is restored
one of primary goals was to keep dtrace safe enough to use on a production
system. Should not be possible to cause a crash with dtrace
very thorough error checking
can aggregate data to cut userspace postprocessing
dtrace -n 'syscall:::entry { @sc[execname,probefunc] = count(); }'
blank fields match all occurrences
shows commands and their syscalls
probefunc is name of function
execname is name of program
@ denotes an aggregation (keyed like an associative array)
'instant gratification'
instrumentation uses language similar to c or awk
lockstat has been around a long time; it looks at kernel locks. Uses dtrace in sol 10
plockstat looks at locks in user apps, added in sol 10
both are built on dtrace
dtrace intermediate format (dif): a virtual risc architecture, used for executing
dtrace commands
dtrace -l -P sched
list all probes related to scheduler
can sometimes guess meanings of dtrace providers without being os
internals expert
fbt - big dtrace provider: can set a probe on every function in the kernel
"pid" dtrace "provider"
dtrace -n 'pid111:::entry' 111 is pid number
no action spec'd
probe fires and lists
dtrace -n 'pid10161:::entry { @s[probefunc] = count() }'
show all in pid 10161
if you do not restrict what to instrument, dtrace will attempt to
instrument every instruction, and often fail to do so due to lack
of memory
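restricting the pid provider to one library keeps the probe count manageable; a sketch (the pid follows the example above, and $target with -p expands to the attached pid):
    dtrace -n 'pid$target:libc::entry { @[probefunc] = count(); }' -p 10161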
no recompilation necessary for using dtrace
sdt provider can get more info from progs you have source to using dtrace
With interpreted languages (python, java, ruby, bash, perl...), you
have to run against the interpreter, not the script (of course), but
they are working on java script support for dtrace.
dtrace language is called "d", borrows from c and awk
many example d scripts are available
dtrace -s dscriptname.d
quantize(n) gives a power-of-two (log base 2) distribution, useful for weeding out hordes of
insignificantly small items and making the big items show up. count()
is linear, on the other hand.
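a small d script showing quantize() (and a /self->ts/ predicate) on read(2) latency - illustrative only:
    #!/usr/sbin/dtrace -s
    /* distribution of read(2) latency per call, in nanoseconds */

    syscall::read:entry
    {
            self->ts = timestamp;
    }

    syscall::read:return
    /self->ts/
    {
            @["read latency (ns)"] = quantize(timestamp - self->ts);
            self->ts = 0;
    }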
/ cond / is a predicate - d's form of "if": the clause only runs when the condition holds
no explicit loops in d, for safety - otherwise you could accidentally
create an infinite loop, crashing the system. But there are implicit
loops, kind of like in a database query language. Most related
technologies can crash systems if used "incorrectly".
dtrace can identify all ioctl()'s used by a given process
dtrace can delve into what happens inside a system call
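for example, a one-liner to tally the ioctl request codes issued by one program ("myapp" is hypothetical):
    dtrace -n 'syscall::ioctl:entry /execname == "myapp"/ { @[arg1] = count(); }'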
tfork.d can essentially sotruss the kernel during fork()
docs.sun.com
blogs.sun.com
dtrace
not just for:
diagnosis
kernel engineers
service personnel
system administrators
developers
zonename command
there's a zone variable in dtrace, so even though you cannot currently
dtrace in a local zone, you can dtrace only things in a particular zone
from the global zone
should add zonename to PS1 if available :)
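sketch: using the zonename variable as a predicate from the global zone (zone name hypothetical):
    dtrace -n 'syscall:::entry /zonename == "webzone"/ { @[execname, probefunc] = count(); }'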
dtrace (d language) has awk-like BEGIN END
also has c-like printa like printf?
supports c-like operators
printa: print aggregation
d has printf too
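a tiny d script pulling these pieces together - BEGIN, an aggregation, and printa in END (illustrative only):
    #!/usr/sbin/dtrace -s

    BEGIN
    {
            printf("counting syscalls per program... ctrl-c to stop\n");
    }

    syscall:::entry
    {
            @calls[execname] = count();
    }

    END
    {
            printa("%-20s %@d\n", @calls);
    }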
Predictive self healing
fma
smf - AKA "green line", named after a train in Boston
fma: fault management architecture
The point of FMA is to associate every error with a corrective action.
Automate if possible, otherwise notify admin. Also attempts to name
errors, for easy lookup on Sun's website.
eg: a cpu starts generating soft errors - fma took the cpu offline rather than panicking
fma:
detect errors
data capture - describe
naming errors with fmri
event protocol
diagnosis
dependency
action
history
no big block of nonsense to sift through and use for diagnosis
go to www.sun.com, cut and paste error message id into form, get info
did the lecturer say that sol 10 can offline individual memory pages?
fault diagnosis tech in sol 10 called "eversholt"
diagnose
fault tree
language "eversholt"
compiled
simulation environment
You're unlikely to need to know eversholt, as an admin. It sounds like
kernel engineers might be interested though.
fma is part of the solaris kernel
error handler > fault manager > ...
fmadm
fmdump - check logs
fmstat
maybe add some of this to oacstats
fmstat -a
takes a few minutes
hardware specific: fma
fma components are specific to pieces of hardware, but many parts of
fma are generic
fmstat -a hangs in local zones: run it in the global zone
to look up fma errors:
http://sun.com/msg/
fmadm config
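typical fma commands to poke at after a fault (a sketch; run from the global zone):
    fmadm faulty     # list resources the fault manager currently considers faulty
    fmdump           # one line per diagnosis in the fault log, with the message id to look up
    fmdump -e        # the underlying error telemetry log
    fmstat           # per-module statistics for the fault manager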
/etc/rc*.d, init scripts and inetd: "ad hoc"
smf replaces these
dependencies across a network for SMF: no not yet, but it's the next
logical step
smf and fma both use fmri
"fault management resource identifier", or a name for a resource
svcs
gives fmri , time service started, whether it's running
not -everything- from sun has migrated to smf yet, sounds like they ran
out of time for doing so before sol 10 was frozen
right now smf is process oriented, but later you'll be able to give a
verification method that could test for an active tcp port, etc (Actually,
this would be trivial to add with some shell or python scripting....)
svcs -p fmri
fmri e.g. network/smtp:sendmail
svcs -D network/physical
lists all services that depend on the physical network
svcs -l metainit (check on Solaris Volume Manager initialization)
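managing services with svcadm, a sketch (fmris abbreviated):
    svcadm disable network/smtp:sendmail    # stop it, and keep it off across reboots
    svcadm enable -r network/smtp:sendmail  # enable it along with everything it depends on
    svcadm restart system/system-log        # bounce a service by fmri
    svcs -x                                 # explain services that are not running, and what they block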
blogs.sun.com
examples for doing it by hand
later there'll be a framework for creating them
process rights management
each process has four privilege sets
integrated with security framework
list privs available for processes
ppriv -l
complementary to rbac: role-based access control
"effective privilege set"
can define what's inherited and what's not
ppriv -v $$
lots of info about privs for a given process (your current shell is $$)
zfs
not in sol 10 fcs
hopefully in sol 10 update 2
zettabyte fs - not spelled consistently in lecture: sometimes one t,
sometimes two
zfs external beta will be available very soon
limits
integrity checks
...
128 bits, whether you count bits, bytes or blocks
a 65th bit would be needed within ~12 years
zettabyte = 2^70 bytes
zfs goes to 256 quadrillion zettabytes
quantum limit of earth-based storage
striped across devices- keeps track of device response times, and tries
to go to fastest disks
self-healing data
dd to a dev in a zfs mirror
zfs is fine, gives back good data, tells you there's an error
if zfs can mirror, what does that mean for SVM?
DMU treats disks as interchangeable parts for disk-space allocation purposes
contrasting ufs+svm with zfs
ufs+svm many steps, zfs easier.
zpool create "home" mirror(disk1,disk2)
zfs mount -c home/ann
zpool add "home" mirror(disk3,disk4)
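the lecture's syntax was pre-release; for comparison, a sketch of the same steps in the zpool/zfs command syntax as later released (disk names hypothetical):
    zpool create home mirror c0t0d0 c0t1d0    # pooled storage from a two-disk mirror
    zfs create home/ann                       # a filesystem in the pool, mounted automatically
    zfs set quota=10g home/ann                # per-filesystem quota
    zpool add home mirror c0t2d0 c0t3d0       # grow the pool with another mirror pair
    zfs snapshot home/ann@today               # copy-on-write snapshot, no mirror split needed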
zfs will not be supported for root fs initially
zfs supports (user) quotas
cannot layer ufs on top of zfs
zfs has snapshot capability
read-only root (like a cdrom) combined with writeable zfs
group quotas?
pool has fixed size, but any fs in pool can consume entire pool, barring quotas
copy on write for zfs snapshots - don't have to break a mirror to
accomplish this
extra blocks only from same pool?
zfs: no silent data corruption: uses checksums for data integrity
changed threads model (the new 1:1 model): optional in sol 8, default in sol 9
Atlas project: small-systems tuning
janus project: run linux binaries on same-cpu solaris
niagara: multicore, multithreaded per core
low heat
vertical threading: multiple threads multiplexed on a pipeline
2005-04-10: 1.1 million downloads of Solaris 10 since January