Before you can do anything, you must initialize a database storage
area on disk. We call this a database cluster.
(The SQL standard uses the term catalog cluster.) A
database cluster is a collection of databases that is managed by a
single instance of a running database server. After initialization, a
database cluster will contain a database named postgres,
which is meant as a default database for use by utilities, users, and
third-party applications. The database server itself does not require the
postgres database to exist, but many external utility
programs assume it exists. Another database created within each cluster
during initialization is called template1. As the name suggests, this will be used
as a template for subsequently created databases; it should not be
used for actual work. (See Chapter 19, Managing Databases for
information about creating new databases within a cluster.)
In file system terms, a database cluster will be a single directory
under which all data will be stored. We call this the data
directory or data area. It is
completely up to you where you choose to store your data. There is no
default, although locations such as
/usr/local/pgsql/data
or
/var/lib/pgsql/data
are popular. To initialize a
database cluster, use the command initdb, which is
installed with PostgreSQL. The desired
file system location of your database cluster is indicated by the
-D option, for example:
$ initdb -D /usr/local/pgsql/data
Note that you must execute this command while logged into the PostgreSQL user account, which is described in the previous section.
initdb will attempt to create the directory you
specify if it does not already exist. It is likely that it will not
have the permission to do so (if you followed our advice and created
an unprivileged account). In that case you should create the
directory yourself (as root) and change the owner to be the
PostgreSQL user. Here is how this might
be done:
root# mkdir /usr/local/pgsql/data
root# chown postgres /usr/local/pgsql/data
root# su postgres
postgres$ initdb -D /usr/local/pgsql/data
initdb will refuse to run if the data directory
looks like it has already been initialized.
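A provisioning script may want to make the same check itself before calling initdb. The following is only a minimal sketch: it assumes that the presence of a PG_VERSION file marks an initialized data directory, and the function names looks_initialized and init_cluster_once are hypothetical helpers, not part of PostgreSQL.

```shell
# Sketch: treat a directory as initialized if it contains a PG_VERSION file.
# Assumption: this marker file is a good enough signal for the script's purposes.
looks_initialized() {
    [ -f "$1/PG_VERSION" ]
}

# Hypothetical wrapper: call initdb only when the directory is not yet initialized.
init_cluster_once() {
    datadir=$1
    if looks_initialized "$datadir"; then
        echo "skip: $datadir already initialized"
        return 0
    fi
    initdb -D "$datadir"
}
```

Called as init_cluster_once /usr/local/pgsql/data, the wrapper leaves an existing cluster untouched and otherwise hands the directory to initdb, which still applies its own checks.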
Because the data directory contains all the data stored in the
database, it is essential that it be secured from unauthorized
access. initdb therefore revokes access
permissions from everyone but the
PostgreSQL user.
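To confirm that a data directory is accessible only to its owner, the mode string printed by ls can be inspected. This is a sketch under assumptions: dir_is_private is a hypothetical helper, and it checks only the permission bits, not the owner or any access control lists.

```shell
# Sketch: succeed only if the directory is readable, writable, and searchable
# by its owner and by nobody else (mode string "drwx------").
dir_is_private() {
    # ls -ld prints the mode string in the first ten characters of the line.
    mode=$(ls -ld "$1" | cut -c1-10)
    [ "$mode" = "drwx------" ]
}
```

For example, dir_is_private /usr/local/pgsql/data || echo "data directory is too permissive" could be run as part of a routine check.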
However, while the directory contents are secure, the default
client authentication setup allows any local user to connect to the
database and even become the database superuser. If you do not
trust other local users, we recommend you use one of
initdb's -W, --pwprompt, or --pwfile
options to assign a password to the
database superuser. Also, specify -A md5 or -A password
so that the default trust authentication
mode is not used; or modify the generated pg_hba.conf
file after running initdb,
before you start the server for the first time. (Other
reasonable approaches include using ident authentication
or file system permissions to restrict connections. See Chapter 20, Client Authentication for more information.)
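For unattended setups, the --pwfile and -A options can be combined. The sketch below is illustrative only: make_pwfile is a hypothetical helper that stores a password read from standard input in a private temporary file, and the initdb call is shown as a comment because the right authentication method depends on your environment.

```shell
# Hypothetical helper: store the superuser password in a private temporary file.
# The password is read from standard input so it does not appear in the
# process list as a command-line argument.
make_pwfile() {
    pwfile=$(mktemp) || return 1
    chmod 600 "$pwfile"
    cat > "$pwfile"
    echo "$pwfile"
}

# Usage sketch (not run here; the password literal is a placeholder):
#   pwfile=$(printf '%s\n' 'example-password' | make_pwfile)
#   initdb -D /usr/local/pgsql/data -A md5 --pwfile="$pwfile"
#   rm -f "$pwfile"
```

Removing the password file once initdb has finished keeps the secret from lingering on disk.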
initdb also initializes the default
locale for the database cluster.
Normally, it will just take the locale settings in the environment
and apply them to the initialized database. It is possible to
specify a different locale for the database; more information about
that can be found in Section 21.1, “Locale Support”. The sort order used
within a particular database cluster is set by
initdb and cannot be changed later, short of
dumping all data, rerunning initdb, and reloading
the data. There is also a performance impact for using locales
other than C or POSIX. Therefore, it is
important to make this choice correctly the first time.
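Because the choice is hard to undo, it can help to check which locale the environment would supply before running initdb. The sketch below assumes the usual POSIX precedence of LC_ALL over LC_COLLATE over LANG; effective_collate is a hypothetical helper, and it only approximates what initdb would pick up from the environment.

```shell
# Sketch: print the collation locale the current environment would supply,
# following the POSIX precedence LC_ALL > LC_COLLATE > LANG, with C as fallback.
effective_collate() {
    if [ -n "${LC_ALL:-}" ]; then
        echo "$LC_ALL"
    elif [ -n "${LC_COLLATE:-}" ]; then
        echo "$LC_COLLATE"
    elif [ -n "${LANG:-}" ]; then
        echo "$LANG"
    else
        echo "C"
    fi
}

# Usage sketch (not run here): override the environment explicitly if needed.
#   effective_collate
#   initdb -D /usr/local/pgsql/data --locale=C
```

Passing --locale explicitly makes the choice visible in the command itself rather than leaving it to whatever environment the command happens to run in.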
initdb also sets the default character set encoding
for the database cluster. Normally this should be chosen to match the
locale setting. For details see Section 21.2, “Character Set Support”.