Note: This web page was automatically created from a PalmOS "pedit32" memo.
2005-04-08 solaris 10 bootcamp notes
Solaris 10 includes a new cryptography framework
includes openssl under /usr/sfw.
development of x86 and x86-64 support did not stop during the period sun was
not making sol/x86 available to customers.
fma: fault management architecture
smf: service management facility
big focus on lowering TCO (total cost of ownership) in solaris 10
they eat their own dogfood, and they start eating it early in the OS
release lifecycle
sun engineers use solaris on their desktops
lots of innovation in sol 10 (the os); Linus is quoted as saying most innovation
in linux is above the os layer
This section will cover little-known features in versions of Solaris
prior to Solaris 10:
observability:
truss:
-m machine faults
-u function calls (like sotruss?)
stop process: -T -S -M -U
-c syscall stats
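a quick sketch of typical truss invocations (the command and pid here are hypothetical):
    truss -c ls /tmp         # run a command, print per-syscall counts at exit
    truss -u libc -p 1234    # attach to pid 1234 and trace its libc function calls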
pmap
-s
-x
pfiles
shows files, sockets, doors of a process
replaces lsof, should not need lsof on sol 10
pstack can be run on a core file. Applies to each thread
pargs print command arguments. Security implications?
psig show signal disposition
nohup -p - nohup for an already-running process
preap - forces parent to reap child zombies
prstat - like top. "Don't need top anymore." Pretty detailed, see the man
page. -L for threads, -z/-Z relate to zones
ptree
gcore - force a running process to core dump, but don't kill the process
pldd - ldd for running processes
pwdx - print working dir for a running process
pcred - shows effective, real and set uid and gid
pwait - waits for a process to terminate
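a few of the p-tools in action (pids and core-file name are hypothetical):
    pfiles 1234           # open files, sockets and doors of pid 1234
    pstack core.1234      # per-thread stack traces from a core file
    pargs 1234            # command-line arguments
    prstat -L -p 1234     # one line per thread of that process
    pwdx 1234             # its current working directory
    preap 1234            # reap pid 1234 if it is a zombie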
resource management:
processor set- added in sol 8
bind processors to processes
free, dynamic, easy
change without reboot
stateless (lost at reboot), which is not good, so script it
sol 9 - resource pools: stateful processor sets
fair share scheduler
fixed slice scheduler (fx)
once a cpu is in a processor set, only processes bound to that set will run
on that cpu
don't use processor sets on 1 cpu machines
can interact poorly with heavy interrupts, say, due to gigabit or 10
gigabit nic. Can disable interrupts for a processor set
psradm
psrset
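a sketch of processor-set management with psrset/psradm (cpu ids, set id and pid are hypothetical):
    psrset -c 1 2         # create a set from cpus 1 and 2; prints the new set id
    psrset -b 1 1234      # bind pid 1234 to set 1
    psrset -e 1 ./myapp   # or launch a command directly in set 1
    psrset -f 1           # stop directing interrupts to the set's cpus
    psradm -f 3           # separately, psradm can take cpu 3 offline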
projects: a persistent namespace
/etc/project
bind applications together for managing them collectively
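a sketch of defining and using a project (names, ids and paths are hypothetical):
    # /etc/project line - fields are name:id:comment:users:groups:attributes
    dbwork:200:database:oracle::
    # start a workload inside that project
    newtask -p dbwork /opt/db/bin/start_db
    # summarize activity per project
    prstat -J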
poolbind
rctladm
rcapd
e.g. cap an app at no more than 10 GB of RAM
harder than cpu management
ability will be added to future sol 10 release
"resource cap" for memory
libumem:
heap allocator, intended to work well for more kinds of workloads (memory usage)
like malloc()
like a unified theory in physics :)
kma: kernel memory allocator
duplicated in some linux, *bsd
libumem may help efficiency a lot (in applications that use the heap?),
lecturer has seen 3x improvement
LD_PRELOAD'able
includes nice set of debugging features accessed via mdb debugger
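a sketch of preloading libumem and using its mdb debugging support (program name and pid hypothetical):
    LD_PRELOAD=libumem.so.1 UMEM_DEBUG=default UMEM_LOGGING=transaction ./myapp
    gcore 1234           # grab a core of the running process (writes core.1234)
    mdb core.1234
    > ::findleaks        # libumem dcmd: report leaked allocations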
real time:
priocntl
fully preemptive kernel
see CLOCK_HIGHRES in timer_create(3RT)
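a sketch of priocntl usage (command and pid are hypothetical):
    priocntl -l                        # list the scheduling classes configured on this system
    priocntl -e -c RT -p 10 ./rtapp    # run a command in the real-time class at rt priority 10
    priocntl -s -c FX -i pid 1234      # move an existing process into the fixed-priority class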
end of little known features in prev releases of solaris
Covering Solaris 10 information now
containers (zones) - like FreeBSD jails
dynamic resource pools
dtrace
predictive self healing: fma, smf
process rights management: finer grained than root/nonroot. /etc/user_attr
zettabyte filesystem (ZFS). Not in first rel of sol 10, hopefully in 2nd,
not sure. Hopefully in "solaris 10 update 2".
resource management
project - task - process - thread
resource pools
kernel requires at least one cpu for itself
srm - Solaris Resource Manager
sol 9 - no need to buy srm, can use fss - not priorities, but shares -
allocate by proportions more or less
a project may get more than its share (when others are idle), but never less than allocated
define the total number of shares and how many each project gets
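a hedged sketch of putting fss to work (project name and share count hypothetical):
    dispadmin -d FSS                                        # make fss the default class, effective at next boot
    priocntl -s -c FSS -i all                               # or move existing processes into fss right away
    prctl -n project.cpu-shares -r -v 20 -i project dbwork  # sol 10: adjust a project's shares on the fly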
rctladm(1M) sol 9
see man page
resource limits
ulimit -a
/etc/system no longer required for installing a commercial database.
Will be used if present, but the native mechanism is resource controls. Defaults are larger.
sun wants us doing less and less with /etc/system
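example of the native resource controls that replace the old /etc/system tunables (project name and value hypothetical):
    prctl -n project.max-shm-memory $$                                # view the shared-memory cap for your current project
    prctl -n project.max-shm-memory -r -v 8gb -i project user.oracle  # raise it for a database project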
sol 10: dynamic resource pools
poold - if a project is not getting the resources it needs, poold will shuffle resources
to satisfy it
sol 10: containers
zones
lecturer not sure if same thing
some say containers are more than zones, but how?
sol 10 always boots into global zone
zone creation takes 20 minutes on the lecturer's laptop
zoneadm list -iv
zlogin to log into a specific zone
no pid overlap between zones. Only global zone sees all pids on system
single instance of solaris in multizone scenario
nonglobal zones don't see /dev, for example
cannot run dtrace in nonglobal zone yet, they are working on this
can config what filesystems are visible
can inherit filesystems, can make filesystems readonly or rw
from global zone, can cd into local zone's hierarchies. root, dev.
df output different in local zones. Does not show /dev info
every zone can have a unique network identity
ip traffic routed through global zone's NIC
much like virtual interfaces
apps can bind to INADDR_ANY, but still only get connects for that zone
ports unique to each zone
breakins: may make a mess of a local zone but should not be able to get
to global zone (or other local zones?)
don't nfs export from global zone to a local zone "there are issues with that"
/dev/*mem /dev/dsk /dev/rdsk &c not visible in local zones
zonecfg(1M) - give name, path, network, autoboot, pool (which resource pool)
config data in xml format
packages and patches + zones
install in global zone, shows up in local zones
can also install only in global zone
also in local zones
pkgadd and friends changed to control this behavior
inherit-pkg-dir: dir in local zone that should inherit from global zone
loopback and readonly by default
zonecfg -z zone
create
set zonepath=
set autoboot=false
...
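a fuller, hypothetical zonecfg session for a zone named webzone, including a network interface and an inherited package directory (names and addresses are made up):
    zonecfg -z webzone
        create
        set zonepath=/zones/webzone
        set autoboot=false
        add net
            set address=192.168.1.50
            set physical=hme0
        end
        add inherit-pkg-dir
            set dir=/opt/sfw
        end
        verify
        commit
        exit
    zoneadm -z webzone install    # then boot and zlogin as shown below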
/etc/zones holds config data
can be vi'd but...
zoneadm list -cv (opts?)
zoneadm -z name halt
zoneadm -z name boot
boots in a couple seconds
zlogin zonename
www.solarisinternals.com
pdf from talk should be there, in current version, by tonight
dtrace -l list all probes, point of data collection
42000+ probes
every probe has a unique int identifier, module, function, name
docs.sun.com
download dtrace doc
mdb -k (kernel target)
ufs_read::dis
disassembles ufs_read
dtrace -n ufs_read:entry
disassembly changes when you set a probe
code changes to enable probe
disable the probe, orig code is restored
one of primary goals was to keep dtrace safe enough to use on a production
system. Should not be possible to cause a crash with dtrace
very thorough error checking
can aggregate data to cut userspace postprocessing
dtrace -n 'syscall:::entry { @sc[execname,probefunc] = count(); }'
blank fields match all occurrences
shows commands and their syscalls
probefunc is name of function
execname is name of program
@ denotes an aggregation (keyed like an associative array)
'instant gratification'
instrumentation uses language similar to c or awk
lockstat has been around a long time; it looks at kernel locks. Uses dtrace in sol 10
plockstat looks at locks in user apps, added in sol 10
both are built on dtrace
dtrace intermediate format (dif): a virtual risc architecture, used for executing
dtrace commands
dtrace -l -P sched
list all probes related to scheduler
can sometimes guess meanings of dtrace providers without being os
internals expert
fbt - big dtrace provider: can set a probe on every function in the kernel
"pid" dtrace "provider"
dtrace -n 'pid111:::entry' 111 is pid number
no action spec'd
probe fires and lists
dtrace -n 'pid10161:::entry { @s[probefunc] = count() }'
show all in pid 10161
if you do not restrict what to instrument, dtrace will attempt to
instrument every instruction, and often fail to do so due to lack
of memory
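restricting the pid provider to one library keeps the probe count manageable; a sketch (the pid follows the example above, and $target with -p expands to the attached pid):
    dtrace -n 'pid$target:libc::entry { @[probefunc] = count(); }' -p 10161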
no recompilation necessary for using dtrace
sdt provider can get more info from progs you have source to using dtrace
With interpreted languages (python, java, ruby, bash, perl...), you
have to run against the interpreter, not the script (of course), but
they are working on java script support for dtrace.
dtrace language is called "d", borrows from c and awk
many example d scripts are available
dtrace -s dscriptname.d
quantize(n) gives a power-of-two (log base 2) distribution, useful for weeding out hordes of
insignificantly small items and making the big items show up. count()
is linear, on the other hand.
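a small d script showing quantize() (and a /self->ts/ predicate) on read(2) latency - illustrative only:
    #!/usr/sbin/dtrace -s
    /* distribution of read(2) latency per call, in nanoseconds */

    syscall::read:entry
    {
            self->ts = timestamp;
    }

    syscall::read:return
    /self->ts/
    {
            @["read latency (ns)"] = quantize(timestamp - self->ts);
            self->ts = 0;
    }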
/ cond / is a predicate - d's form of "if": the clause only runs when the condition holds
no explicit loops in d, for safety - otherwise you could accidentally
create an infinite loop, crashing the system. But there are implicit
loops, kind of like in a database query language. Most related
technologies can crash systems if used "incorrectly".
dtrace can identify all ioctl()'s used by a given process
dtrace can delve into what happens inside a system call
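for example, a one-liner to tally the ioctl request codes issued by one program ("myapp" is hypothetical):
    dtrace -n 'syscall::ioctl:entry /execname == "myapp"/ { @[arg1] = count(); }'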
tfork.d can essentially sotruss the kernel during fork()
docs.sun.com
blogs.sun.com
dtrace
not just for:
diagnosis
kernel engineers
service personnel
system administrators
developers
zonename command
there's a zone variable in dtrace, so even though you cannot currently
dtrace in a local zone, you can dtrace only things in a particular zone
from the global zone
should add zonename to PS1 if available :)
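sketch: using the zonename variable as a predicate from the global zone (zone name hypothetical):
    dtrace -n 'syscall:::entry /zonename == "webzone"/ { @[execname, probefunc] = count(); }'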
dtrace (d language) has awk-like BEGIN END
also has c-like printa like printf?
supports c-like operators
printa: print aggregation
d has printf too
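a tiny d script pulling these pieces together - BEGIN, an aggregation, and printa in END (illustrative only):
    #!/usr/sbin/dtrace -s

    BEGIN
    {
            printf("counting syscalls per program... ctrl-c to stop\n");
    }

    syscall:::entry
    {
            @calls[execname] = count();
    }

    END
    {
            printa("%-20s %@d\n", @calls);
    }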
Predictive self healing
fma
smf - AKA "green line", named after a train in Boston
fma: fault management architecture
The point of FMA is to associate every error with a corrective action.
Automate if possible, otherwise notify admin. Also attempts to name
errors, for easy lookup on Sun's website.
eg: a cpu starts generating soft errors - fma took the cpu offline rather than panicking
fma:
detect errors
data capture - describe
naming errors with fmri
event protocol
diagnosis
dependency
action
history
no big block of nonsense to sift through and use for diagnosis
go to www.sun.com, cut and paste error message id into form, get info
did the lecturer say that sol 10 can offline individual memory pages?
fault diagnosis tech in sol 10 called "eversholt"
diagnose
fault tree
language "eversholt"
compiled
simulation environment
You're unlikely to need to know eversholt, as an admin. It sounds like
kernel engineers might be interested though.
fma is part of the solaris kernel
error handler > fault manager > ...
fmadm
fmdump - check logs
fmstat
maybe add some of this to oacstats
fmstat -a
takes a few minutes
hardware specific: fma
fma components are specific to pieces of hardware, but many parts of
fma are generic
fmstat -a hangs in local zones: run it in the global zone
to look up fma errors:
http://sun.com/msg/
fmadm config
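typical fma commands to poke at after a fault (a sketch; run from the global zone):
    fmadm faulty     # list resources the fault manager currently considers faulty
    fmdump           # one line per diagnosis in the fault log, with the message id to look up
    fmdump -e        # the underlying error telemetry log
    fmstat           # per-module statistics for the fault manager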
/etc/rc*.d, init scripts and inetd: "ad hoc"
smf replaces these
dependencies across a network for SMF: no not yet, but it's the next
logical step
smf and fma both use fmri
"fault management resource identifier", or a name for a resource
svcs
gives fmri , time service started, whether it's running
not -everything- from sun has migrated to smf yet, sounds like they ran
out of time for doing so before sol 10 was frozen
right now smf is process oriented, but later you'll be able to give a
verification method that could test for an active tcp port, etc (Actually,
this would be trivial to add with some shell or python scripting....)
svcs -p fmri
fmri e.g. network/smtp:sendmail
svcs -D network/physical
lists all services that depend on the physical network
svcs -l metainit (check on Solaris Volume Manager initialization)
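managing services with svcadm, a sketch (fmris abbreviated):
    svcadm disable network/smtp:sendmail    # stop it, and keep it off across reboots
    svcadm enable -r network/smtp:sendmail  # enable it along with everything it depends on
    svcadm restart system/system-log        # bounce a service by fmri
    svcs -x                                 # explain services that are not running, and what they block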
blogs.sun.com
examples for doing it by hand
later there'll be a framework for creating them
process rights management
each process has four privilege sets
integrated with security framework
list privs available for processes
ppriv -l
complementary to rbac: role-based access control
"effective privilege set"
can define what's inherited and what's not
ppriv -v $$
lots of info about privs for a given process (your current shell is $$)
zfs
not in sol 10 fcs
hopefully in sol 10 update 2
zettabyte fs - not spelled consistently in lecture: sometimes one t,
sometimes two
zfs external beta will be available very soon
limits
integrity checks
...
128 bits, whether you count bits, bytes or blocks
a 65th bit would be needed within ~12 years
zettabyte = 2^70 bytes
zfs goes to 256 quadrillion zettabytes
quantum limit of earth-based storage
striped across devices- keeps track of device response times, and tries
to go to fastest disks
self-healing data
dd to a dev in a zfs mirror
zfs is fine, gives back good data, tells you there's an error
if zfs can mirror, what does that mean for SVM?
DMU treats disks as interchangeable parts for disk-space allocation purposes
contrasting ufs+svm with zfs
ufs+svm many steps, zfs easier.
zpool create "home" mirror(disk1,disk2)
zfs mount -c home/ann
zpool add "home" mirror(disk3,disk4)
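the lecture's syntax was pre-release; for comparison, a sketch of the same steps in the zpool/zfs command syntax as later released (disk names hypothetical):
    zpool create home mirror c0t0d0 c0t1d0    # pooled storage from a two-disk mirror
    zfs create home/ann                       # a filesystem in the pool, mounted automatically
    zfs set quota=10g home/ann                # per-filesystem quota
    zpool add home mirror c0t2d0 c0t3d0       # grow the pool with another mirror pair
    zfs snapshot home/ann@today               # copy-on-write snapshot, no mirror split needed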
zfs will not be supported for root fs initially
zfs supports (user) quotas
cannot layer ufs on top of zfs
zfs has snapshot capability
read-only root (like a cdrom) combined with writeable zfs
group quotas?
pool has fixed size, but any fs in pool can consume entire pool, barring quotas
copy on write for zfs snapshots - don't have to break a mirror to
accomplish this
extra blocks only from same pool?
zfs: no silent data corruption: uses checksums for data integrity
changed threads model (the new 1:1 model): optional in sol 8, default in sol 9
Atlas project: small-systems tuning
janus project: run linux binaries on same-cpu solaris
niagara: multicore, multithreaded per core
low heat
vertical threading: multiple threads multiplexed on a pipeline
2005-04-10: 1.1 million downloads of Solaris 10 since January