This document is a user's guide for GNU PSPP statistical analysis software version 0.8.4. It provides information on invoking and using PSPP, including preparing data files, performing statistical tests and analyses, and the PSPP command language. The authors thank Network Theory Ltd for financial support in producing this manual.
This manual is for GNU PSPP version 0.8.4-g267362, software for statistical analysis.
Copyright © 1997, 1998, 2004, 2005, 2009, 2012, 2013, 2014 Free Software Foundation, Inc.
Permission is granted to copy, distribute and/or modify this document under the
terms of the GNU Free Documentation License, Version 1.3 or any later version
published by the Free Software Foundation; with no Invariant Sections, no
Front-Cover Texts, and no Back-Cover Texts. A copy of the license is included
in the section entitled "GNU Free Documentation License".
The authors wish to thank Network Theory Ltd http://www.network-theory.co.uk
for their financial support in the production of this manual.
1 Introduction
pspp is a tool for statistical analysis of sampled data. It reads the data, analyzes the data
according to commands provided, and writes the results to a listing file, to the standard
output or to a window of the graphical display.
The language accepted by pspp is similar to those accepted by SPSS statistical products.
The details of pspp’s language are given later in this manual.
pspp produces tables and charts as output, which it can produce in several formats;
currently, ASCII, PostScript, PDF, HTML, and DocBook are supported.
The current version of pspp, 0.8.4-g267362, is incomplete in terms of its statistical
procedure support. pspp is a work in progress. The authors hope to fully support all features
in the products that pspp replaces, eventually. The authors welcome questions, comments,
donations, and code submissions. See Chapter 20 [Submitting Bug Reports], page 174, for
instructions on contacting the authors.
2 Your rights and obligations
pspp is not in the public domain. It is copyrighted and there are restrictions on its
distribution, but these restrictions are designed to permit everything that a good cooperating
citizen would want to do. What is not allowed is to try to prevent others from further
sharing any version of this program that they might get from you.
Specifically, we want to make sure that you have the right to give away copies of pspp,
that you receive source code or else can get it if you want it, that you can change these
programs or use pieces of them in new free programs, and that you know you can do these
things.
To make sure that everyone has such rights, we have to forbid you to deprive anyone else
of these rights. For example, if you distribute copies of pspp, you must give the recipients
all the rights that you have. You must make sure that they, too, receive or can get the
source code. And you must tell them their rights.
Also, for our own protection, we must make certain that everyone finds out that there
is no warranty for pspp. If these programs are modified by someone else and passed on, we
want their recipients to know that what they have is not what we distributed, so that any
problems introduced by others will not reflect on our reputation.
Finally, any free program is threatened constantly by software patents. We wish to avoid
the danger that redistributors of a free program will individually obtain patent licenses, in
effect making the program proprietary. To prevent this, we have made it clear that any
patent must be licensed for everyone’s free use or not licensed at all.
The precise conditions of the license for pspp are found in the GNU General Public
License. You should have received a copy of the GNU General Public License along with
this program; if not, write to the Free Software Foundation, Inc., 51 Franklin Street, Fifth
Floor, Boston, MA 02110-1301 USA. This manual specifically is covered by the GNU Free
Documentation License (see Appendix A [GNU Free Documentation License], page 186).
3 Invoking pspp
pspp has two separate user interfaces. This chapter describes pspp, pspp’s command-line
driven text-based user interface. The following chapter briefly describes PSPPIRE, the
graphical user interface to pspp.
The sections below describe the pspp program’s command-line interface.
3.1 Main Options
Here is a summary of all the options, grouped by type, followed by explanations in the same
order.
In the table, arguments to long options also apply to any corresponding short options.
Non-option arguments
syntax-file
Output options
-o, --output=output-file
-O option=value
-O format=format
-O device={terminal|listing}
--no-output
-e, --error-file=error-file
Language options
-I, --include=dir
-I-, --no-include
-b, --batch
-i, --interactive
-r, --no-statrc
-a, --algorithm={compatible|enhanced}
-x, --syntax={compatible|enhanced}
--syntax-encoding=encoding
Informational options
-h, --help
-V, --version
Other options
-s, --safer
--testing-mode
syntax-file Read and execute the named syntax file. If no syntax files are specified, pspp
prompts for commands. If any syntax files are specified, pspp by default exits
after it runs them, but you may make it prompt for commands by specifying
‘-’ as an additional syntax file.
-o output-file
Write output to output-file. pspp has several different output drivers that
support output in various formats (use --help to list the available formats).
Specify this option more than once to produce multiple output files, presumably
in different formats.
Use ‘-’ as output-file to write output to standard output.
If no -o option is used, then pspp writes text and CSV output to standard
output and other kinds of output to a file whose name is based on the format, e.g.
pspp.pdf for PDF output.
-O option=value
Sets an option for the output file configured by a preceding -o. Most options
are specific to particular output formats. A few options that apply generically
are listed below.
-O format=format
pspp uses the extension of the file name given on -o to select an output format.
Use this option to override this choice by specifying an alternate format, e.g.
-o pspp.out -O format=html to write HTML to a file named pspp.out. Use --help
to list the available formats.
-O device={terminal|listing}
Sets whether pspp considers the output device configured by the preceding -o
to be a terminal or a listing device. This affects what output will be sent to
the device, as configured by the SET command’s output routing subcommands
(see Section 16.20 [SET], page 157). By default, output written to standard
output is considered a terminal device and other output is considered a listing
device.
--no-output
Disables output entirely, if neither -o nor -O is also used. If one of those options
is used, --no-output has no effect.
-e error-file
--error-file=error-file
Configures a file to receive pspp error, warning, and note messages in plain
text format. Use ‘-’ as error-file to write messages to standard output. The
default error file is standard output in the absence of these options, but this is
suppressed if an output device writes to standard output (or another terminal),
to avoid printing every message twice. Use ‘none’ as error-file to explicitly
suppress the default.
-I dir
--include=dir
Appends dir to the set of directories searched by the INCLUDE (see Section 16.15
[INCLUDE], page 155) and INSERT (see Section 16.16 [INSERT], page 155)
commands.
-I-
--no-include
Clears all directories from the include path, including directories inserted in
the include path by default. The default include path is . (the current
directory), followed by .pspp in the user’s home directory, followed by pspp’s system
configuration directory (usually /etc/pspp or /usr/local/etc/pspp).
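The interaction between the default include path, -I, and -I- can be modeled in a few lines of Python. This is only a sketch of the behavior described above, not pspp's own code: the function name, its parameters, and the assumption that options are processed left to right are all hypothetical.

```python
def resolve_include_path(options, home, sysconfdir="/etc/pspp"):
    """Model the include path that results from a list of -I style options.

    options    : strings such as "-Idir", "--include=dir", "-I-", "--no-include"
    home       : the user's home directory
    sysconfdir : pspp's system configuration directory (usually /etc/pspp)
    Processing the options in left-to-right order is an assumption of this sketch.
    """
    # Documented default: current directory, ~/.pspp, system configuration directory.
    path = [".", f"{home}/.pspp", sysconfdir]
    for opt in options:
        if opt in ("-I-", "--no-include"):
            path = []  # clears everything, including the defaults
        elif opt.startswith("--include="):
            path.append(opt[len("--include="):])
        elif opt.startswith("-I"):
            path.append(opt[2:])
    return path


default_path = resolve_include_path([], "/home/user")
custom_path = resolve_include_path(["-I-", "-Isyntax"], "/home/user")
print(default_path)
print(custom_path)
```

Under these assumptions, clearing the path and then appending a directory leaves only that directory to be searched by INCLUDE and INSERT.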
-b
--batch
-i
--interactive
These options force syntax files to be interpreted in batch mode or interactive
mode, respectively, rather than the default “auto” mode. See Section 6.3
[Syntax Variants], page 30, for a description of the differences.
-r
--no-statrc
Disables running rc at pspp startup time.
-a {enhanced|compatible}
--algorithm={enhanced|compatible}
With enhanced, the default, pspp uses the best implemented algorithms for
statistical procedures. With compatible, however, pspp will in some cases
use inferior algorithms to produce the same results as the proprietary program
SPSS.
Some commands have subcommands that override this setting on a per-command
basis.
-x {enhanced|compatible}
--syntax={enhanced|compatible}
With enhanced, the default, pspp accepts its own extensions beyond those
compatible with the proprietary program SPSS. With compatible, pspp rejects
syntax that uses these extensions.
--syntax-encoding=encoding
Specifies encoding as the encoding for syntax files named on the command
line. The encoding also becomes the default encoding for other syntax files
read during the pspp session by the INCLUDE and INSERT commands. See
Section 16.16 [INSERT], page 155, for the accepted forms of encoding.
--help Prints a message describing pspp command-line syntax and the available device
formats, then exits.
-V
--version
Prints a brief message listing pspp’s version, warranties you don’t have, copying
conditions and copyright, and e-mail address for bug reports, then exits.
-s
--safer Disables certain unsafe operations. This includes the ERASE and HOST com-
mands, as well as use of pipes as input and output files.
--testing-mode
Invoke heuristics to assist with testing pspp. For use by make check and similar
scripts.
3.2 PDF, PostScript, and SVG Output Options
To produce output in PDF, PostScript, and SVG formats, specify -o file on the pspp com-
mand line, optionally followed by any of the options shown in the table below to customize
the output format.
PDF, PostScript, and SVG output is only available if your installation of pspp was
compiled with the Cairo library.
-O format={pdf|ps|svg}
Specify the output format. This is only necessary if the file name given on -o
does not end in .pdf, .ps, or .svg.
-O paper-size=paper-size
Paper size, as a name (e.g. a4, letter) or measurements (e.g. 210x297,
8.5x11in).
The default paper size is taken from the PAPERSIZE environment variable or the
file indicated by the PAPERCONF environment variable, if either variable is set.
If not, and your system supports the LC_PAPER locale category, then the default
paper size is taken from the locale. Otherwise, if /etc/papersize exists, the
default paper size is read from it. As a last resort, A4 paper is assumed.
-O foreground-color=color
-O background-color=color
Sets color as the color to be used for the background or foreground. Color should
be given in the format #RRRRGGGGBBBB, where RRRR, GGGG and BBBB are
4 character hexadecimal representations of the red, green and blue components
respectively.
-O orientation=orientation
Either portrait or landscape. Default: portrait.
-O left-margin=dimension
-O right-margin=dimension
-O top-margin=dimension
-O bottom-margin=dimension
Sets the margins around the page. See below for the allowed forms of dimension.
Default: 0.5in.
-O prop-font=font-name
-O emph-font=font-name
-O fixed-font=font-name
Sets the font used for proportional, emphasized, or fixed-pitch text. Most systems
support CSS-like font names such as “serif” and “monospace”, but a wide
range of system-specific fonts are likely to be supported as well.
Default: proportional font serif, emphasis font serif italic, fixed-pitch font
monospace.
-O font-size=font-size
Sets the size of the default fonts, in thousandths of a point. Default: 10000 (10
point).
-O line-gutter=dimension
Sets the width of white space on either side of lines that border text or graphics
objects. Default: 1pt.
-O line-spacing=dimension
Sets the spacing between the lines in a double line in a table. Default: 1pt.
-O line-width=dimension
Sets the width of the lines used in tables. Default: 0.5pt.
Each dimension value above may be specified in various units based on its suffix: ‘mm’
for millimeters, ‘in’ for inches, or ‘pt’ for points. Lacking a suffix, numbers below 50 are
assumed to be in inches and those above 50 are assumed to be in millimeters.
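The suffix rule for dimension values can be illustrated with a short Python sketch. It is not pspp's own code: the function name is hypothetical, converting everything to points is a choice made for the illustration, and treating a suffix-less value of exactly 50 as millimeters is an assumption, since the text above only covers values on either side of 50.

```python
def dimension_to_points(text):
    """Convert a dimension such as '0.5in', '10mm', '12pt' or '210' to points."""
    # 1 inch = 72 points = 25.4 millimeters.
    points_per_unit = {"mm": 72.0 / 25.4, "in": 72.0, "pt": 1.0}
    text = text.strip()
    for suffix, factor in points_per_unit.items():
        if text.endswith(suffix):
            return float(text[: -len(suffix)]) * factor
    value = float(text)
    # No suffix: small numbers are read as inches, large ones as millimeters.
    # Assumption: exactly 50 is treated as millimeters.
    factor = points_per_unit["in"] if value < 50 else points_per_unit["mm"]
    return value * factor
```

Under these assumptions, ‘8.5’ is read as 8.5 inches while ‘210’ is read as 210 millimeters, which matches the two ways of writing paper measurements shown earlier in this section.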
3.3 Plain Text Output Options
pspp can produce plain text output, drawing boxes using ASCII or Unicode line drawing
characters. To produce plain text output, specify -o file on the pspp command line,
optionally followed by options from the table below to customize the output format.
Plain text output is encoded in UTF-8.
-O format=txt
Specify the output format. This is only necessary if the file name given on -o
does not end in .txt or .list.
-O charts={template.png|none}
Name for chart files included in output. The value should be a file name that
includes a single ‘#’ and ends in png. When a chart is output, the ‘#’ is replaced
by the chart number. The default is the file name specified on -o with the
extension stripped off and replaced by -#.png.
Specify none to disable chart output. Charts are always disabled if your instal-
lation of pspp was compiled without the Cairo library.
-O foreground-color=color
-O background-color=color
Sets color as the background or foreground color used for charts. Color should
be given in the format #RRRRGGGGBBBB, where RRRR, GGGG and BBBB are
4 character hexadecimal representations of the red, green and blue components
respectively. If charts are disabled, this option has no effect.
-O paginate=boolean
If set, pspp writes an ASCII formfeed at the end of every page. Default: off.
-O headers=boolean
If enabled, pspp prints two lines of header information, giving title and subtitle,
page number, date and time, and pspp version, at the top of every page. These
two lines are in addition to any top margin requested. Default: off.
-O length=line-count
Physical length of a page. Headers and margins are subtracted from this value.
You may specify the number of lines as a number, or for screen output you may
specify auto to track the height of the terminal as it changes. Default: 66.
-O width=character-count
Width of a page, in characters. Margins are subtracted from this value. For
screen output you may specify auto in place of a number to track the width of
the terminal as it changes. Default: 79.
-O top-margin=top-margin-lines
Length of the top margin, in lines. pspp subtracts this value from the page
length. Default: 0.
-O bottom-margin=bottom-margin-lines
Length of the bottom margin, in lines. pspp subtracts this value from the page
length. Default: 0.
-O box={ascii|unicode}
Sets the characters used for lines in tables. If set to ascii the characters ‘-’, ‘|’,
and ‘+’ for single-width lines and ‘=’ and ‘#’ for double-width lines are used. If
set to unicode then Unicode box drawing characters will be used. The default
is unicode if the locale’s character encoding is "UTF-8" or ascii otherwise.
-O emphasis={none|bold|underline}
How to emphasize text. Bold and underline emphasis are achieved with over-
striking, which may not be supported by all the software to which you might
pass the output. Default: none.
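How the length, top-margin, bottom-margin and headers options combine can be shown with a small Python sketch. It is only an illustration of the arithmetic described above, not pspp's implementation; the function name and the error raised for an over-full page are assumptions.

```python
def body_lines(length=66, top_margin=0, bottom_margin=0, headers=False):
    """Number of lines left for output on each page of plain text output."""
    # The two header lines are counted in addition to the top margin.
    header_lines = 2 if headers else 0
    usable = length - top_margin - bottom_margin - header_lines
    if usable <= 0:
        # Assumption: a page with no room for output is reported as an error.
        raise ValueError("margins and headers leave no room for output")
    return usable
```

With the defaults, all 66 lines are available; asking for headers and two-line margins at the top and bottom leaves 60.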
3.4 HTML Output Options
To produce output in HTML format, specify -o file on the pspp command line, optionally
followed by any of the options shown in the table below to customize the output format.
-O format=html
Specify the output format. This is only necessary if the file name given on -o
does not end in .html.
-O charts={template.png|none}
Sets the name used for chart files. See Section 3.3 [Plain Text Output Options],
page 8, for details.
-O borders=boolean
Decorate the tables with borders. If set to false, the tables produced will have
no borders. The default value is true.
-O css=boolean
Use cascading style sheets. Cascading style sheets give an improved appearance
and can be used to produce pages which fit a certain web site’s style. The default
value is true.
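The chart file naming shared by the plain text and HTML output options can be sketched in Python as follows. The function names are hypothetical, and the sketch makes no claim about which number pspp assigns to the first chart.

```python
import os


def default_chart_template(output_file):
    """Default template: the -o file name with its extension replaced by -#.png."""
    stem, _extension = os.path.splitext(output_file)
    return stem + "-#.png"


def chart_file_name(template, chart_number):
    """Replace the single '#' in the template with the chart number."""
    if template.count("#") != 1:
        raise ValueError("the template must contain exactly one '#'")
    return template.replace("#", str(chart_number))
```

For example, with -o report.html the default template is report-#.png, so chart number 3 would be written to report-3.png.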
3.5 OpenDocument Output Options
To produce output as an OpenDocument text (ODT) document, specify -o file on the
pspp command line. If file does not end in .odt, you must also specify -O format=odt.
ODT support is only available if your installation of pspp was compiled with the libxml2
library.
The OpenDocument output format does not have any configurable options.
3.6 Comma-Separated Value Output Options
To produce output in comma-separated value (CSV) format, specify -o file on the pspp
command line, optionally followed by any of the options shown in the table below to cus-
tomize the output format.
-O format=csv
Specify the output format. This is only necessary if the file name given on -o
does not end in .csv.
-O separator=field-separator
Sets the character used to separate fields. Default: a comma (‘,’).
-O quote=qualifier
Sets qualifier as the character used to quote fields that contain white space,
the separator (or any of the characters in the separator, if it contains more
than one character), or the quote character itself. If qualifier is longer than one
character, only the first character is used; if qualifier is the empty string, then
fields are never quoted.
-O titles=boolean
Whether table titles (brief descriptions) should be printed. Default: on.
-O captions=boolean
Whether table captions (more extensive descriptions) should be printed. De-
fault: on.
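The quoting rule controlled by the separator and quote options can be sketched in Python. This is an illustration rather than pspp's own code: the function name is hypothetical, the default qualifier of a double quote is an assumption (no default is stated above), and doubling an embedded quote character follows RFC 4180 rather than anything stated in this section.

```python
def csv_field(value, separator=",", quote='"'):
    """Return value as it might appear as one field of CSV output."""
    qualifier = quote[:1]  # only the first character of the qualifier is used
    if not qualifier:
        return value       # empty qualifier: fields are never quoted
    needs_quoting = (
        any(ch.isspace() for ch in value)          # white space
        or any(ch in value for ch in separator)    # any separator character
        or qualifier in value                      # the quote character itself
    )
    if not needs_quoting:
        return value
    # Assumption: an embedded quote character is doubled, as in RFC 4180.
    return qualifier + value.replace(qualifier, qualifier * 2) + qualifier
```

A field such as a,b is quoted with the default separator but left alone when the separator is changed to a semicolon.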
The CSV format used is an extension to that specified in RFC 4180:
Tables Each table row is output on a separate line, and each column is output as a
field. The contents of a cell that spans multiple rows or columns are output only
for the top-left row and column; the rest are output as empty fields.
Titles When a table has a title and titles are enabled, the title is output just above
the table as a single field prefixed by ‘Table:’.
Captions When a table has a caption and captions are enabled, the caption is output
just below the table as a single field prefixed by ‘Caption:’.
Footnotes Within a table, footnote markers are output as bracketed letters following the
cell’s contents, e.g. ‘[a]’, ‘[b]’, . . . The footnotes themselves are output following
the body of the table, as a separate two-column table introduced with a
line that says ‘Footnotes:’. Each row in the table represents one footnote: the
first column is the marker, the second column is the text.
Text Text in output is printed as a field on a line by itself. The TITLE and SUBTITLE
commands produce similar output, prefixed by ‘Title:’ or ‘Subtitle:’, respectively.
Messages Errors, warnings, and notes are printed the same way as text.
Charts Charts are not included in CSV output.
Successive output items are separated by a blank line.
4 Invoking psppire
4.1 The graphic user interface
The PSPPIRE graphic user interface for pspp can perform all functionality of the command
line interface. In addition it gives an instantaneous view of the data, variables and statistical
output.
The graphic user interface can be started by typing psppire at a command prompt.
Alternatively many systems have a system of interactive menus or buttons from which
psppire can be started by a series of mouse clicks.
Once the principles of the pspp system are understood, the graphic user interface is
designed to be largely intuitive, and for this reason is covered only very briefly by this
manual.
5 Using pspp
pspp is a tool for the statistical analysis of sampled data. You can use it to discover patterns
in the data, to explain differences in one subset of data in terms of another subset and to
find out whether certain beliefs about the data are justified. This chapter does not attempt
to introduce the theory behind the statistical analysis, but it shows how such analysis can
be performed using pspp.
For the purposes of this tutorial, it is assumed that you are using pspp in its interactive
mode from the command line. However, the example commands can also be typed into a
file and executed in a post-hoc mode by typing ‘pspp filename’ at a shell prompt, where
filename is the name of the file containing the commands. Alternatively, from the graphical
interface, you can select File → New → Syntax to open a new syntax window and use the
Run menu when a syntax fragment is ready to be executed. Whichever method you choose,
the syntax is identical.
When using the interactive method, pspp tells you that it’s waiting for your data with
a string like PSPP> or data>. In the examples of this chapter, whenever you see text like
this, it indicates the prompt displayed by pspp, not something that you should type.
Throughout this chapter reference is made to a number of sample data files. So that
you can try the examples for yourself, you should have received these files along with your
copy of pspp (see footnote 1).
Please note: Normally these files are installed in the directory
/usr/local/share/pspp/examples. If however your system administrator or
operating system vendor has chosen to install them in a different location, you
will have to adjust the examples accordingly.
5.1 Preparation of Data Files
Before analysis can commence, the data must be loaded into pspp and arranged such that
both pspp and humans can understand what the data represents. There are two aspects of
data:
• The variables — these are the parameters of a quantity which has been measured or
estimated in some way. For example height, weight and geographic location are all
variables.
• The observations (also called ‘cases’) of the variables — each observation represents an
instance when the variables were measured or observed.
For example, a data set which has the variables height, weight, and name, might have the
observations:
1881 89.2 Ahmed
1192 107.01 Frank
1230 67 Julie
The following sections explain how to define a dataset.
Footnote 1: These files contain purely fictitious data. They should not be used for research purposes.
(i.e. when the data prompt is current) since it is appropriate only for terminating
commands.
5.1.2 Listing the data
Once the data has been entered, you could type
PSPP> list /format=numbered.
to list the data. The optional text ‘/format=numbered’ requests the case numbers to be
shown along with the data. It should show the following output:
Case#     forename   height
----- ------------ --------
    1 Ahmed          188.00
    2 Bertram        167.00
    3 Catherine      134.23
    4 David          109.10
Note that the numeric variable height is displayed to 2 decimal places, because the format
for that variable is ‘F8.2’. For a complete description of the LIST command, see Section 8.10
[LIST], page 76.
5.1.3 Reading data from a text file
The previous example showed how to define a set of variables and to manually enter the
data for those variables. Manual entering of data is tedious work, and often a file containing
the data will have been previously prepared. Let us assume that you have a file called
mydata.dat containing the ASCII-encoded data:
Ahmed 188.00
Bertram 167.00
Catherine 134.23
David 109.10
.
.
.
Zachariah 113.02
You can tell the DATA LIST command to read the data directly from this file instead of
by manual entry, with a command like:
PSPP> data list file='mydata.dat' list /forename (A12) height.
Notice however, that it is still necessary to specify the names of the variables and their
formats, since this information is not contained in the file. It is also possible to specify
the file’s character encoding and other parameters. For full details, see Section 8.5
[DATA LIST], page 66.
5.1.4 Reading data from a pre-prepared pspp file
When working with other pspp users, or users of other software which uses the pspp data
format, you may be given the data in a pre-prepared pspp file. Such files contain not only
the data, but the variable definitions, along with their formats, labels and other meta-data.
Conventionally, these files (sometimes called “system” files) have the suffix .sav, but that
is not mandatory. The following syntax loads a file called my-file.sav.
PSPP> get file='my-file.sav'.
You will encounter several instances of this in future examples.
5.1.5 Saving data to a pspp file.
If you want to save your data, along with the variable definitions so that you or other pspp
users can use it later, you can do this with the SAVE command.
The following syntax will save the existing data and variables to a file called
my-new-file.sav.
PSPP> save outfile='my-new-file.sav'.
If my-new-file.sav already exists, then it will be overwritten. Otherwise it will be created.
5.1.6 Reading data from other sources
Sometimes it’s useful to be able to read data from comma separated text, from spreadsheets,
databases or other sources. In these instances you should use the GET DATA command (see
Section 9.4 [GET DATA], page 83).
5.1.7 Exiting PSPP
Use the FINISH command to exit PSPP:
PSPP> finish.
5.2 Data Screening and Transformation
Once data has been entered, it is often desirable, or even necessary, to transform it in some
way before performing analysis upon it. At the very least, it’s good practice to check for
errors.
5.2.1 Identifying incorrect data
Data from real sources is rarely error free. pspp has a number of procedures which can be
used to help identify data which might be incorrect.
The DESCRIPTIVES command (see Section 15.1 [DESCRIPTIVES], page 127) is used
to generate simple linear statistics for a dataset. It is also useful for identifying potential
problems in the data. The example file physiology.sav contains a number of physiological
measurements of a sample of healthy adults selected at random. However, the data entry
clerk made a number of mistakes when entering the data. Example 5.2 illustrates the use
of DESCRIPTIVES to screen this data and identify the erroneous values.
If you now re-run the DESCRIPTIVES or EXAMINE commands in Example 5.2 and
Example 5.3 you will see a data summary with more plausible parameters. You will also
notice that the data summaries indicate the two missing values.
5.2.3 Inverting negatively coded variables
Data entry errors are not the only reason for wanting to recode data. The sample file
hotel.sav comprises data gathered from a customer satisfaction survey of clients at a
particular hotel. In Example 5.4, this file is loaded for analysis. The line ‘display dictionary.’
tells pspp to display the variables and associated data. The output from this command has
been omitted from the example for the sake of clarity, but you will notice that each of the
variables v1, v2 . . . v5 is measured on a 5 point Likert scale, with 1 meaning “Strongly
disagree” and 5 meaning “Strongly agree”. Whilst variables v1, v2 and v4 record responses
to positively posed questions, variables v3 and v5 are responses to negatively worded
questions. In order to perform meaningful analysis, we need to recode the variables so that they
all measure in the same direction. We could use the RECODE command, with syntax such
as:
recode v3 (1 = 5) (2 = 4) (4 = 2) (5 = 1).
However an easier and more elegant way uses the COMPUTE command (see Section 12.3
[COMPUTE], page 113). Since the variables are Likert variables in the range (1 . . . 5),
subtracting their value from 6 has the effect of inverting them:
compute var = 6 - var.
Example 5.4 uses this technique to recode the variables v3 and v5. After applying COMPUTE
for both variables, all subsequent commands will use the inverted values.
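To see why subtracting from 6 inverts a 5 point scale, the same arithmetic can be tried outside pspp. The Python sketch below is only an illustration: the function name and the response values are hypothetical and are not taken from hotel.sav.

```python
def invert_likert(value, low=1, high=5):
    """Mirror a response on a low..high scale; for 1..5 this is 6 - value."""
    if not low <= value <= high:
        raise ValueError("response outside the scale")
    return low + high - value


# Hypothetical responses to the two negatively worded questions.
responses = {"v3": [1, 2, 3, 4, 5], "v5": [5, 5, 2, 1, 3]}
inverted = {
    name: [invert_likert(value) for value in values]
    for name, values in responses.items()
}
```

The midpoint 3 maps to itself, which is why the RECODE form shown earlier needs no rule for it.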
5.2.4 Testing data consistency
A sensible check to perform on survey data is the calculation of reliability. This gives
the statistician some confidence that the questionnaires have been completed thoughtfully.
If you examine the labels of variables v1, v3 and v4, you will notice that they ask very
similar questions. One would therefore expect the values of these variables (after recoding)
to closely follow one another, and we can test that with the RELIABILITY command (see
Section 15.16 [RELIABILITY], page 151). Example 5.4 shows a pspp session where the
user (after recoding negatively scaled variables) requests reliability statistics for v1, v3 and
v4.
compensated for by applying a logarithmic transformation. This is done with the COMPUTE
command in the line
compute mtbf_ln = ln (mtbf).
Rather than redefining the existing variable, this use of COMPUTE defines a new variable
mtbf_ln which is the natural logarithm of mtbf. The final command in this example calls
EXAMINE on this new variable, and it can be seen from the results that both the skewness
and kurtosis for mtbf_ln are very close to zero. This provides some confidence that the
mtbf_ln variable is normally distributed and thus safe for linear analysis. In the event that
no suitable transformation can be found, then it would be worth considering an appropriate
non-parametric test instead of a linear one. See Section 15.10 [NPAR TESTS], page 141,
for information about non-parametric tests.
5.3 Hypothesis Testing
One of the most fundamental purposes of statistical analysis is hypothesis testing.
Researchers commonly need to test hypotheses about a set of data. For example, a researcher
might want to test whether one set of data comes from the same distribution as another, or
whether the mean of a dataset significantly differs from a particular value. This section
presents just some of the possible tests that pspp offers.
The researcher starts by making a null hypothesis. Often this is a hypothesis which the
researcher suspects to be false. For example, a researcher who suspects that A is greater
than B will state the null hypothesis as A = B (see footnote 2).
The p-value is a recurring concept in hypothesis testing. It is the highest acceptable
probability that evidence implying that the null hypothesis is false could have been obtained
when the null hypothesis is in fact true. Note that this is not the same as “the probability
of making an error” nor is it the same as “the probability of rejecting a hypothesis when it
is true”.
5.3.1 Testing for differences of means
A common statistical test involves hypotheses about means. The T-TEST command is used
to find out whether or not two separate subsets have the same mean.
Example 5.6 uses the file physiology.sav previously encountered. A researcher sus-
pected that the heights and core body temperature of persons might be different depending
upon their sex. To investigate this, he posed two null hypotheses:
• The mean heights of males and females in the population are equal.
• The mean body temperatures of males and females in the population are equal.
For the purposes of the investigation the researcher decided to use a p-value of 0.05.
In addition to the T-test, the T-TEST command also performs the Levene test for equal
variances. If the variances are equal, then a more powerful form of the T-test can be
used. However if it is unsafe to assume equal variances, then an alternative calculation is
necessary. pspp performs both calculations.
For the height variable, the output shows the significance of the Levene test to be 0.33
which means there is a 33% probability that the Levene test produces this outcome when
the variances are equal. Had the significance been less than 0.05, then it would have been
unsafe to assume that the variances were equal. However, because the value is higher than
0.05 the homogeneity of variances assumption is safe and the “Equal Variances” row (the
more powerful test) can be used. Examining this row, the two tailed significance for the
height t-test is less than 0.05, so it is safe to reject the null hypothesis and conclude that
the mean heights of males and females are unequal.
For the temperature variable, the significance of the Levene test is 0.58 so again, it is
safe to use the row for equal variances. The equal variances row indicates that the two
tailed significance for temperature is 0.20. Since this is greater than 0.05 we cannot reject
the null hypothesis, and must conclude that there is insufficient evidence to suggest that the
body temperatures of male and female persons are different.
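The way the T-TEST output is read in this example can be summarized as a two-step decision rule. The Python sketch below is only an illustration of that reading, not part of pspp: the function names and the label for the second row are hypothetical, and treating a significance exactly equal to the threshold as “not significant” is an assumption.

```python
def choose_row(levene_sig, alpha=0.05):
    """Pick the row of the t-test table to read, based on the Levene test."""
    # A Levene significance below the threshold makes it unsafe to assume
    # that the variances are equal.
    if levene_sig < alpha:
        return "unequal variances"  # hypothetical label for the other row
    return "equal variances"


def reject_null(two_tailed_sig, alpha=0.05):
    """True if the null hypothesis of equal means can be rejected."""
    return two_tailed_sig < alpha


# Temperature variable from the example: Levene 0.58, two tailed 0.20.
temperature_row = choose_row(0.58)
temperature_rejected = reject_null(0.20)
```

For the temperature variable this selects the equal variances row and does not reject the null hypothesis; for height (Levene significance 0.33) the same row is selected.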
Footnote 2: This example assumes that it is already proven that B is not greater than A.
not only tests if the variables are related, but also identifies the potential linear relationship.
See Section 15.15 [REGRESSION], page 149.
The coefficients in the first table suggest that the formula mttr = 9.81 + 3.1 × mtbf +
1.09 × duty_cycle can be used to predict the time to repair. However, the significance value
for the duty_cycle coefficient is very high, which would make this an unsafe predictor. For
this reason, the test was repeated, but omitting the duty_cycle variable. This time, the
significance of all coefficients is no higher than 0.06, suggesting that at the 0.06 level, the
formula mttr = 10.5 + 3.11 × mtbf is a reliable predictor of the time to repair.
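Once the coefficients are known, applying the fitted formula is plain arithmetic. The Python sketch below evaluates the second model; the function name and the input value are hypothetical, and the coefficients are simply those quoted above.

```python
def predict_mttr(mtbf, intercept=10.5, slope=3.11):
    """Predicted time to repair from the model mttr = 10.5 + 3.11 * mtbf."""
    return intercept + slope * mtbf


# Hypothetical mean time between failures of 10 units.
predicted = predict_mttr(10)
```

The result is about 41.6, in the same units as mttr. Such a prediction is only as trustworthy as the fit itself, and is most reasonable for mtbf values within the range of the data used for the regression.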
6 The pspp language
This chapter discusses elements common to many pspp commands. Later chapters will
describe individual commands in detail.
6.1 Tokens
pspp divides most syntax file lines into series of short chunks called tokens. Tokens are then
grouped to form commands, each of which tells pspp to take some action—read in data,
write out data, perform a statistical procedure, etc. Each type of token is described below.
Identifiers Identifiers are names that typically specify variables, commands, or subcom-
mands. The first character in an identifier must be a letter, ‘#’, or ‘@’. The
remaining characters in the identifier must be letters, digits, or one of the fol-
lowing special characters:
. _ $ # @
Identifiers may be any length, but only the first 64 bytes are significant. Iden-
tifiers are not case-sensitive: foobar, Foobar, FooBar, FOOBAR, and FoObaR are
different representations of the same identifier.
Some identifiers are reserved. Reserved identifiers may not be used in any con-
text besides those explicitly described in this manual. The reserved identifiers
are:
ALL AND BY EQ GE GT LE LT NE NOT OR TO WITH
Keywords Keywords are a subclass of identifiers that form a fixed part of command syntax.
For example, command and subcommand names are keywords. Keywords may
be abbreviated to their first 3 characters if this abbreviation is unambiguous.
(Unique abbreviations of 3 or more characters are also accepted: ‘FRE’, ‘FREQ’,
and ‘FREQUENCIES’ are equivalent when the last is a keyword.)
Reserved identifiers are always used as keywords. Other identifiers may be used
both as keywords and as user-defined identifiers, such as variable names.
Numbers Numbers are expressed in decimal. A decimal point is optional. Numbers may
be expressed in scientific notation by adding ‘e’ and a base-10 exponent, so that
‘1.234e3’ has the value 1234. Here are some more examples of valid numbers:
-5 3.14159265359 1e100 -.707 8945.
Negative numbers are expressed with a ‘-’ prefix. However, in situations where
a literal ‘-’ token is expected, what appears to be a negative number is treated
as ‘-’ followed by a positive number.
No white space is allowed within a number token, except for horizontal white
space between ‘-’ and the rest of the number.
The last example above, ‘8945.’ will be interpreted as two tokens, ‘8945’ and
‘.’, if it is the last token on a line. See Section 6.2 [Forming commands of
tokens], page 29.
Strings Strings are literal sequences of characters enclosed in pairs of single quotes (‘'’)
or double quotes (‘"’). To include the character used for quoting in the string,
double it, e.g. ‘'it''s an apostrophe'’. White space and case of letters are
significant inside strings.
Strings can be concatenated using ‘+’, so that ‘'a' + 'b' + 'c'’ is equivalent
to ‘'abc'’. So that a long string may be broken across lines, a line break may
precede or follow, or both precede and follow, the ‘+’. (However, an entirely
blank line preceding or following the ‘+’ is interpreted as ending the current
command.)
Strings may also be expressed as hexadecimal character values by prefixing
the initial quote character by ‘x’ or ‘X’. Regardless of the syntax file or ac-
tive dataset’s encoding, the hexadecimal digits in the string are interpreted as
Unicode characters in UTF-8 encoding.
Individual Unicode code points may also be expressed by specifying the hex-
adecimal code point number in single or double quotes preceded by ‘u’ or ‘U’.
For example, Unicode code point U+1D11E, the musical G clef character, could
be expressed as U’1D11E’. Invalid Unicode code points (above U+10FFFF or
in between U+D800 and U+DFFF) are not allowed.
When strings are concatenated with ‘+’, each segment’s prefix is considered
individually. For example, ‘'The G clef symbol is:' + u"1d11e" + "."’ inserts
a G clef symbol in the middle of an otherwise plain text string.
Punctuators and Operators
These tokens are the punctuators and operators:
, / = ( ) + - * / ** < <= <> > >= ~= & | .
Most of these appear within the syntax of commands, but the period (‘.’)
punctuator is used only at the end of a command. It is a punctuator only as
the last character on a line (except white space). When it is the last non-space
character on a line, a period is not treated as part of another token, even if it
would otherwise be part of, e.g., an identifier or a floating-point number.
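The rules for identifiers and keyword abbreviations given earlier in this section can be sketched in Python. This is an illustration under stated assumptions, not pspp's lexer: all function names are hypothetical, Python's isalpha and isdigit only approximate pspp's notion of letters and digits, and measuring the 64 significant bytes in UTF-8 is an assumption.

```python
RESERVED = {"ALL", "AND", "BY", "EQ", "GE", "GT", "LE", "LT",
            "NE", "NOT", "OR", "TO", "WITH"}


def is_valid_identifier(name):
    """First character: letter, '#' or '@'; the rest: letters, digits, . _ $ # @"""
    if not name:
        return False
    if not (name[0].isalpha() or name[0] in "#@"):
        return False
    return all(ch.isalpha() or ch.isdigit() or ch in "._$#@" for ch in name[1:])


def is_reserved(name):
    """Reserved identifiers are matched without regard to case."""
    return name.upper() in RESERVED


def same_identifier(a, b):
    """Compare case-insensitively on the first 64 bytes (UTF-8 assumed)."""
    return a.upper().encode("utf-8")[:64] == b.upper().encode("utf-8")[:64]


def resolve_keyword(token, keywords):
    """Return the keyword that token names or abbreviates, or None."""
    token = token.upper()
    matches = [k for k in keywords if k.upper() == token]
    if not matches and len(token) >= 3:
        # Abbreviations need at least 3 characters and must be unambiguous.
        matches = [k for k in keywords if k.upper().startswith(token)]
    return matches[0] if len(matches) == 1 else None
```

With the keyword list FREQUENCIES and DESCRIPTIVES, both ‘FRE’ and ‘freq’ resolve to FREQUENCIES, while ‘FR’ is too short to resolve to anything.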
6.2 Forming commands of tokens
Most pspp commands share a common structure. A command begins with a command
name, such as FREQUENCIES, DATA LIST, or N OF CASES. The command name may be ab-
breviated to its first word, and each word in the command name may be abbreviated to its
first three or more characters, where these abbreviations are unambiguous.
The command name may be followed by one or more subcommands. Each subcommand
begins with a subcommand name, which may be abbreviated to its first three letters. Some
subcommands accept a series of one or more specifications, which follow the subcommand
name, optionally separated from it by an equals sign (‘=’). Specifications may be separated
from each other by commas or spaces. Each subcommand must be separated from the next
(if any) by a forward slash (‘/’).
There are multiple ways to mark the end of a command. The most common way is to
end the last line of the command with a period (‘.’) as described in the previous section
(see Section 6.1 [Tokens], page 28). A blank line, or one that consists only of white space
or comments, also ends a command.
6.3 Syntax Variants
There are three variants of command syntax, which vary only in how they detect the end
of one command and the start of the next.
In interactive mode, which is the default for syntax typed at a command prompt, a
period as the last non-blank character on a line ends a command. A blank line also ends a
command.
In batch mode, an end-of-line period or a blank line also ends a command. Additionally,
it treats any line that has a non-blank character in the leftmost column as beginning a new
command. Thus, in batch mode the second and subsequent lines in a command must be
indented.
Regardless of the syntax mode, a plus sign, minus sign, or period in the leftmost column
of a line is ignored and causes that line to begin a new command. This is most useful in
batch mode, in which the first line of a new command could not otherwise be indented, but
it is accepted regardless of syntax mode.
The default mode for reading commands from a file is auto mode. It is the same as
batch mode, except that a line with a non-blank in the leftmost column only starts a new
command if that line begins with the name of a pspp command. This correctly interprets
most valid pspp syntax files regardless of the syntax mode for which they are intended.
The --interactive (or -i) or --batch (or -b) options set the syntax mode for files
listed on the pspp command line. See Section 3.1 [Main Options], page 4, for more details.
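The difference between the three syntax modes is easiest to see by splitting the same lines under each of them. The Python sketch below is a simplified model, not pspp's parser: the function name is hypothetical, comment lines are not handled, and auto mode looks only at the first word of a line, so multi-word command names are not recognized.

```python
def split_commands(lines, mode="interactive", command_names=()):
    """Group syntax lines into commands; mode is interactive, batch or auto."""
    commands, current = [], []

    def flush():
        # Close the command collected so far, if any.
        if current:
            commands.append(" ".join(current))
            current.clear()

    for raw in lines:
        line = raw.rstrip()
        if not line:                      # a blank line ends a command
            flush()
            continue
        starts_new = False
        if line[0] in "+-.":
            # Ignored in any mode, but the line begins a new command.
            line = line[1:]
            starts_new = True
        elif not line[0].isspace():
            if mode == "batch":
                starts_new = True
            elif mode == "auto":
                # Simplification: only the first word is checked.
                starts_new = line.split()[0].upper() in command_names
        if starts_new:
            flush()
        text = line.strip()
        ends_here = text.endswith(".")    # end-of-line period ends a command
        if ends_here:
            text = text[:-1].rstrip()
        if text:
            current.append(text)
        if ends_here:
            flush()
    flush()
    return commands
```

Given the three unindented lines DESCRIPTIVES, /VARIABLES=height and FREQUENCIES height, this model yields one command in interactive mode, three in batch mode, and two in auto mode when DESCRIPTIVES and FREQUENCIES are supplied as command names.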
6.4 Types of Commands
Commands in pspp are divided roughly into six categories:
Utility commands
Set or display various global options that affect pspp operations. May appear
anywhere in a syntax file. See Chapter 16 [Utility commands], page 153.
File definition commands
Give instructions for reading data from text files or from special binary “system
files”. Most of these commands replace any previous data or variables with new
data or variables. At least one file definition command must appear before the
first command in any of the categories below. See Chapter 8 [Data Input and
Output], page 64.
Input program commands
Though rarely used, these provide tools for reading data files in arbitrary textual
or binary formats. See Section 8.9 [INPUT PROGRAM], page 73.
Transformations
Perform operations on data and write data to output files. Transformations are
not carried out until a procedure is executed.
Restricted transformations
Transformations that cannot appear in certain contexts. See Section 6.5 [Order
of Commands], page 31, for details.
Procedures
Analyze data, writing results of analyses to the listing file. Cause transforma-
tions specified earlier in the file to be performed. In a more general sense, a
procedure is any command that causes the active dataset (the data) to be read.
6.5 Order of Commands
pspp does not place many restrictions on ordering of commands. The main restriction is
that variables must be defined before they are otherwise referenced. This section describes
the details of command ordering, but most users will have no need to refer to them.
pspp possesses five internal states, called initial, input-program, file-type, transformation,
and procedure states. (Please note the distinction between the INPUT PROGRAM and FILE
TYPE commands and the input-program and file-type states.)
pspp starts in the initial state. Each successful completion of a command may cause a
state transition. Each type of command has its own rules for state transitions:
Utility commands
• Valid in any state.
• Do not cause state transitions. Exception: when N OF CASES is executed in
the procedure state, it causes a transition to the transformation state.
DATA LIST
• Valid in any state.
• When executed in the initial or procedure state, causes a transition to the
transformation state.
• Clears the active dataset if executed in the procedure or transformation
state.
INPUT PROGRAM
• Invalid in input-program and file-type states.
• Causes a transition to the input-program state.
• Clears the active dataset.
FILE TYPE
• Invalid in input-program and file-type states.
• Causes a transition to the file-type state.
• Clears the active dataset.
Other file definition commands
• Invalid in input-program and file-type states.
• Cause a transition to the transformation state.
• Clear the active dataset, except for ADD FILES, MATCH FILES, and UPDATE.
Transformations
• Invalid in initial and file-type states.
• Cause a transition to the transformation state.
Restricted transformations
• Invalid in initial, input-program, and file-type states.
• Cause a transition to the transformation state.
Procedures
• Invalid in initial, input-program, and file-type states.
• Cause a transition to the procedure state.
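The transition rules listed above amount to a small state machine. The sketch below encodes them literally as a lookup table; it is an illustration only, the names `INVALID_IN`, `next_state`, and `run` are hypothetical, and the N OF CASES exception for utility commands is deliberately not modeled.

```python
# For each command category, the states in which it is invalid,
# copied from the rules listed above.
INVALID_IN = {
    "utility": set(),
    "DATA LIST": set(),
    "INPUT PROGRAM": {"input-program", "file-type"},
    "FILE TYPE": {"input-program", "file-type"},
    "other file definition": {"input-program", "file-type"},
    "transformation": {"initial", "file-type"},
    "restricted transformation": {"initial", "input-program", "file-type"},
    "procedure": {"initial", "input-program", "file-type"},
}


def next_state(state, category):
    """Return the state reached after a successful command of this category."""
    if state in INVALID_IN[category]:
        raise ValueError(f"{category} is not valid in the {state} state")
    if category == "utility":
        return state  # the N OF CASES exception is not modeled
    if category == "DATA LIST":
        # Only the initial and procedure states cause a transition.
        return "transformation" if state in ("initial", "procedure") else state
    if category == "INPUT PROGRAM":
        return "input-program"
    if category == "FILE TYPE":
        return "file-type"
    if category == "procedure":
        return "procedure"
    # Other file definition commands and both kinds of transformations.
    return "transformation"


def run(categories, state="initial"):
    """Apply a sequence of command categories, starting in the initial state."""
    for category in categories:
        state = next_state(state, category)
    return state
```

Under this model, `run(["DATA LIST", "transformation", "procedure"])` ends in the procedure state, while a procedure issued as the very first command raises an error because procedures are invalid in the initial state.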
6.6 Handling missing observations
pspp includes special support for unknown numeric data values. Missing observations are
assigned a special value, called the system-missing value. This “value” actually indicates the
absence of a value; it means that the actual value is unknown. Procedures automatically
exclude from analyses those observations or cases that have missing values. Details of
missing value exclusion depend on the procedure and can often be controlled by the user;
refer to descriptions of individual procedures for details.
The system-missing value exists only for numeric variables. String variables always have
a defined value, even if it is only a string of spaces.
Variables, whether numeric or string, can have designated user-missing values. Every
user-missing value is an actual value for that variable. However, most of the time user-
missing values are treated in the same way as the system-missing value.
For more information on missing values, see the following sections: Section 6.7 [Datasets],
page 32, Section 11.6 [MISSING VALUES], page 102, Chapter 7 [Expressions], page 46. See
also the documentation on individual procedures for information on how they handle missing
values.
6.7 Datasets
pspp works with data organized into datasets. A dataset consists of a set of variables, which
taken together are said to form a dictionary, and one or more cases, each of which has one
value for each variable.
At any given time pspp has exactly one distinguished dataset, called the active dataset.
Most pspp commands work only with the active dataset. In addition to the active dataset,
pspp also supports any number of additional open datasets. The DATASET commands can
choose a new active dataset from among those that are open, as well as create and destroy
datasets (see Section 8.4 [DATASET], page 65).
The sections below describe variables in more detail.
6.7.1 Attributes of Variables
Each variable has a number of attributes, including:
Name An identifier, up to 64 bytes long. Each variable must have a different name.
See Section 6.1 [Tokens], page 28.
Some system variable names begin with ‘$’, but user-defined variables’ names
may not begin with ‘$’.
The final character in a variable name should not be ‘.’, because such an iden-
tifier will be misinterpreted when it is the final token on a line: FOO. will be
divided into two separate tokens, ‘FOO’ and ‘.’, indicating end-of-command. See
Section 6.1 [Tokens], page 28.
The final character in a variable name should not be ‘_’, because some such
identifiers are used for special purposes by pspp procedures.
As with all pspp identifiers, variable names are not case-sensitive. pspp capi-
talizes variable names on output the same way they were capitalized at their
point of definition in the input.
Type Numeric or string.
Width (string variables only) String variables with a width of 8 characters or fewer
are called short string variables. Short string variables may be used in a few
contexts where long string variables (those with widths greater than 8) are not
allowed.
Position Variables in the dictionary are arranged in a specific order. DISPLAY can be
used to show this order: see Section 11.3 [DISPLAY], page 100.
Initialization
Either reinitialized to 0 or spaces for each case, or left at its existing value. See
Section 11.5 [LEAVE], page 101.
Missing values
Optionally, up to three values, or a range of values, or a specific value plus a
range, can be specified as user-missing values. There is also a system-missing
value that is assigned to an observation when there is no other obvious value for
that observation. Observations with missing values are automatically excluded
from analyses. User-missing values are actual data values, while the system-
missing value is not a value at all. See Section 6.6 [Missing Observations],
page 32.
Variable label
A string that describes the variable. See Section 11.15 [VARIABLE LABELS],
page 107.
Value label
Optionally, these associate each possible value of the variable with a string. See
Section 11.12 [VALUE LABELS], page 105.
Print format
Display width, format, and (for numeric variables) number of decimal places.
This attribute does not affect how data are stored, just how they are displayed.
Example: a width of 8, with 2 decimal places. See Section 6.7.4 [Input and
Output Formats], page 34.
Write format
Similar to print format, but used by the WRITE command (see Section 8.17
[WRITE], page 80).
Custom attributes
User-defined associations between names and values. See Section 11.14 [VARI-
ABLE ATTRIBUTE], page 106.
Role The intended role of a variable for use in dialog boxes in graphical user inter-
faces. See Section 11.19 [VARIABLE ROLE], page 108.
6.7.2 Variables Automatically Defined by pspp
There are seven system variables. These are not like ordinary variables because system
variables are not always stored. They can be used only in expressions. These system
variables, whose values and output formats cannot be modified, are described below.
$CASENUM Case number of the current case. This changes as cases are shuffled
around.
$DATE Date the pspp process was started, in format A9, following the pattern DD MMM
YY.
$JDATE Number of days between 15 Oct 1582 and the time the pspp process was started.
$LENGTH Page length, in lines, in format F11.
$SYSMIS System missing value, in format F1.
$TIME Number of seconds between midnight 14 Oct 1582 and the time the active
dataset was read, in format F20.
$WIDTH Page width, in characters, in format F3.
6.7.3 Lists of variable names
To refer to a set of variables, list their names one after another. Optionally, their names
may be separated by commas. To include a range of variables from the dictionary in the
list, write the name of the first and last variable in the range, separated by TO. For instance,
if the dictionary contains six variables with the names ID, X1, X2, GOAL, MET, and NEXTGOAL,
in that order, then X2 TO MET would include variables X2, GOAL, and MET.
Commands that define variables, such as DATA LIST, give TO an alternate meaning. With
these commands, TO defines sequences of variables whose names end in consecutive integers.
The syntax is two identifiers that begin with the same root and end with numbers, separated
by TO. The syntax X1 TO X5 defines 5 variables, named X1, X2, X3, X4, and X5. The
syntax ITEM0008 TO ITEM0013 defines 6 variables, named ITEM0008, ITEM0009, ITEM0010,
ITEM0011, ITEM0012, and ITEM0013. The syntaxes QUES001 TO QUES9 and QUES6 TO QUES3
are invalid.
After a set of variables has been defined with DATA LIST or another command with this
method, the same set can be referenced on later commands using the same syntax.
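The two meanings of TO can be illustrated with a short Python sketch. Both function names are hypothetical. The exact validity checks used by pspp are not spelled out above, so the sketch assumes that a numbered range is rejected when the roots differ, when the numbers decrease, or when the second number is written with fewer digits than the first; these assumptions are consistent with the examples given but may not match pspp in every case.

```python
import re


def expand_dictionary_range(dictionary, first, last):
    """TO on a variable list: every variable from first to last, in dictionary order."""
    names = [name.upper() for name in dictionary]   # names are not case-sensitive
    i, j = names.index(first.upper()), names.index(last.upper())
    if i > j:
        raise ValueError(f"{first} follows {last} in the dictionary")
    return list(dictionary[i:j + 1])


def expand_numbered_range(first, last):
    """TO on a defining command: names with a common root and consecutive integers."""
    m1 = re.fullmatch(r"(.*\D)(\d+)", first)
    m2 = re.fullmatch(r"(.*\D)(\d+)", last)
    if not m1 or not m2:
        raise ValueError("both names must end in a number")
    (root1, digits1), (root2, digits2) = m1.groups(), m2.groups()
    if root1.upper() != root2.upper():
        raise ValueError("the names do not share a root")
    # Assumed checks (see the paragraph above).
    if len(digits2) < len(digits1) or int(digits1) > int(digits2):
        raise ValueError(f"{first} TO {last} is not a valid range")
    width = len(digits1)                             # keep the leading zeros
    return [f"{root1}{n:0{width}d}" for n in range(int(digits1), int(digits2) + 1)]
```

With the dictionary from the example above, `expand_dictionary_range(["ID", "X1", "X2", "GOAL", "MET", "NEXTGOAL"], "X2", "MET")` gives X2, GOAL, and MET, and `expand_numbered_range("ITEM0008", "ITEM0013")` gives the six names ITEM0008 through ITEM0013.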
6.7.4 Input and Output Formats
An input format describes how to interpret the contents of an input field as a number or
a string. It might specify that the field contains an ordinary decimal number, a time or
date, a number in binary or hexadecimal notation, or one of several other notations. Input
formats are used by commands such as DATA LIST that read data or syntax files into the
pspp active dataset.
Every input format corresponds to a default output format that specifies the formatting
used when the value is output later. It is always possible to explicitly specify an output
format that resembles the input format. Usually, this is the default, but in cases where the
input format is unfriendly to human readability, such as binary or hexadecimal formats, the
default output format is an easier-to-read decimal format.
Every variable has two output formats, called its print format and write format. Print
formats are used in most output contexts; write formats are used only by WRITE (see
Section 8.17 [WRITE], page 80). Newly created variables have identical print and write
formats, and FORMATS, the most commonly used command for changing formats (see
Section 11.4 [FORMATS], page 101), sets both of them to the same value as well. Thus,
most of the time, the distinction between print and write formats is unimportant.
Input and output formats are specified to pspp with a format specification of the form
TYPEw or TYPEw.d, where TYPE is one of the format types described later, w is a field
width measured in columns, and d is an optional number of decimal places. If d is omitted,
a value of 0 is assumed. Some formats do not allow a nonzero d to be specified.
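As an illustration of the TYPEw.d notation, the following Python sketch splits a format specification into its three parts. The function name `parse_format` is hypothetical, and the sketch does not check that the type is one pspp supports or that w and d are within the limits for that type.

```python
import re


def parse_format(spec):
    """Split a specification such as 'F8.2' into (type, width, decimals)."""
    match = re.fullmatch(r"([A-Za-z]+)(\d+)(?:\.(\d+))?", spec.strip())
    if not match:
        raise ValueError(f"not a format specification: {spec!r}")
    type_name, width, decimals = match.groups()
    # If d is omitted, a value of 0 is assumed.
    return type_name.upper(), int(width), int(decimals) if decimals is not None else 0
```

For instance, `parse_format("F8.2")` returns `("F", 8, 2)` and `parse_format("COMMA9")` returns `("COMMA", 9, 0)`.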
The following sections describe the input and output formats supported by pspp.
6.7.4.1 Basic Numeric Formats
The basic numeric formats are used for input and output of real numbers in standard or
scientific notation. The following table shows an example of how each format displays
positive and negative numbers with the default decimal point setting:
Format       3141.59     -3141.59
F8.2         3141.59     -3141.59
COMMA9.2     3,141.59    -3,141.59
DOT9.2       3.141,59    -3.141,59
DOLLAR10.2   $3,141.59   -$3,141.59
PCT9.2       3141.59%    -3141.59%
E8.1         3.1E+003    -3.1E+003
On output, numbers in F format are expressed in standard decimal notation with the
requested number of decimal places. The other formats output some variation on this style:
• Numbers in COMMA format are additionally grouped every three digits by inserting
a grouping character. The grouping character is ordinarily a comma, but it can be
changed to a period (see [SET DECIMAL], page 159).
• DOT format is like COMMA format, but it interchanges the role of the decimal point
and grouping characters. That is, the current grouping character is used as a decimal
point and vice versa.
• DOLLAR format is like COMMA format, but it prefixes the number with ‘$’.
• PCT format is like F format, but adds ‘%’ after the number.
• The E format always produces output in scientific notation.
On input, the basic numeric formats accept positive and negative numbers in standard decimal
notation or scientific notation. Leading and trailing spaces are allowed. An empty or all-
spaces field, or one that contains only a single period, is treated as the system missing
value.
In scientific notation, the exponent may be introduced by a sign (‘+’ or ‘-’), or by one of
the letters ‘e’ or ‘d’ (in uppercase or lowercase), or by a letter followed by a sign. A single
space may follow the letter or the sign or both.
On fixed-format DATA LIST (see Section 8.5.1 [DATA LIST FIXED], page 66) and in a
few other contexts, decimals are implied when the field does not contain a decimal point.
In F6.5 format, for example, the field 314159 is taken as the value 3.14159 with implied
decimals. Decimals are never implied if an explicit decimal point is present or if scientific
notation is used.
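The implied-decimal rule can be sketched in Python as follows. This is a simplified model, not pspp's reader: the function name `read_f_field` is hypothetical, only plain decimal notation is handled (scientific notation is rejected rather than parsed), and the system-missing value is represented by `None`.

```python
import re
from decimal import Decimal


def read_f_field(field, decimals):
    """Interpret one fixed-format F field with the given number of implied decimals."""
    text = field.strip()                 # leading and trailing spaces are allowed
    if text in ("", "."):
        return None                      # stands in for the system-missing value
    if not re.fullmatch(r"[+-]?(\d+\.?\d*|\.\d+)", text):
        raise ValueError(f"unsupported field in this sketch: {field!r}")
    value = Decimal(text)
    if "." not in text:
        # No explicit decimal point: shift by the implied decimal places.
        value = value.scaleb(-decimals)
    return value
```

With 5 implied decimals, as in F6.5, the field `314159` is read as 3.14159, whereas the field `3.1415` keeps its explicit decimal point and is read unchanged.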
E and F formats accept the basic syntax already described. The other formats allow
some additional variations:
• COMMA, DOLLAR, and DOT formats ignore grouping characters within the integer
part of the input field. The identity of the grouping character depends on the format.
• DOLLAR format allows a dollar sign to precede the number. In a negative number,
the dollar sign may precede or follow the minus sign.
• PCT format allows a percent sign to follow the number.
All of the basic number formats have a maximum field width of 40 and accept no more
than 16 decimal places, on both input and output. Some additional restrictions apply:
• As input formats, the basic numeric formats allow no more decimal places than the field
width. As output formats, the field width must be greater than the number of decimal
places; that is, large enough to allow for a decimal point and the number of requested
decimal places. DOLLAR and PCT formats must allow an additional column for ‘$’
or ‘%’.
• The default output format for a given input format increases the field width enough to
make room for optional input characters. If an input format calls for decimal places,
the width is increased by 1 to make room for an implied decimal point. COMMA,
DOT, and DOLLAR formats also increase the output width to make room for grouping
characters. DOLLAR and PCT further increase the output field width by 1 to make
room for ‘$’ or ‘%’. The increased output width is capped at 40, the maximum field
width.
• The E format is exceptional. For output, E format has a minimum width of 7 plus the
number of decimal places. The default output format for an E input format is an E
format with at least 3 decimal places and thus a minimum width of 10.
More details of basic numeric output formatting are given below:
• Output rounds to nearest, with ties rounded away from zero. Thus, 2.5 is output as 3
in F1.0 format, and -1.125 as -1.13 in F5.2 format.
• The system-missing value is output as a period in a field of spaces, placed in the
decimal point’s position, or in the rightmost column if no decimal places are requested.
A period is used even if the decimal point character is a comma.
• A number that does not fill its field is right-justified within the field.
• A number that is too large for its field causes decimal places to be dropped to make room.
If dropping decimals does not make enough room, scientific notation is used if the field
is wide enough. If a number does not fit in the field, even in scientific notation, the
overflow is indicated by filling the field with asterisks (‘*’).
• COMMA, DOT, and DOLLAR formats insert grouping characters only if space is
available for all of them. Grouping characters are never inserted when all decimal
places must be dropped. Thus, 1234.56 in COMMA5.2 format is output as ‘ 1235’
without a comma, even though there is room for one, because all decimal places were
dropped.
• DOLLAR and PCT formats drop the ‘$’ or ‘%’ only if the number would not fit at all
without it. Scientific notation with ‘$’ or ‘%’ is preferred to ordinary decimal notation
without it.
• Except in scientific notation, a decimal point is included only when it is followed by
a digit. If the integer part of the number being output is 0, and a decimal point is
included, then the zero before the decimal point is dropped.
In scientific notation, the number always includes a decimal point, even if it is not
followed by a digit.
• A negative number includes a minus sign only in the presence of a nonzero digit: -0.01
is output as ‘-.01’ in F4.2 format but as ‘  .0’ in F4.1 format. Thus, a “negative
zero” never includes a minus sign.
• In negative numbers output in DOLLAR format, the dollar sign follows the negative
sign. Thus, -9.99 in DOLLAR6.2 format is output as -$9.99.
• In scientific notation, the exponent is output as ‘E’ followed by ‘+’ or ‘-’ and exactly
three digits. Numbers with magnitude less than 10**-999 or larger than 10**999 are not
supported by most computers, but if they are supported then their output is considered
to overflow the field and will be output as asterisks.
• On most computers, no more than 15 decimal digits are significant in output, even
if more are printed. In any case, output precision cannot be any higher than input
precision; few data sets are accurate to 15 digits of precision. Unavoidable loss of
precision in intermediate calculations may also reduce precision of output.
• Special values such as infinities and “not a number” values are usually converted to the
system-missing value before printing. In a few circumstances, these values are output
directly. In fields of width 3 or greater, special values are output as however many
characters will fit from +Infinity or -Infinity for infinities, from NaN for “not a
number,” or from Unknown for other values (if any are supported by the system). In
fields under 3 columns wide, special values are output as asterisks.
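Several of the rules listed above (rounding ties away from zero, right justification, dropping decimals to make room, omitting the leading zero, and suppressing the sign of a negative zero) can be combined into a short Python sketch for plain F format. It is an approximation written for illustration: the function name `format_f` is hypothetical, the scientific-notation fallback is skipped so that any number that still does not fit goes straight to asterisks, and grouping characters, the system-missing value, and special values are not handled.

```python
from decimal import Decimal, ROUND_HALF_UP


def format_f(value, width, decimals):
    """Approximate F-format output of a finite number in a field of the given width."""
    number = Decimal(str(value))
    # Try the requested decimals first, then drop them one at a time.
    for d in range(decimals, -1, -1):
        quantum = Decimal(1).scaleb(-d)              # 10 ** -d
        rounded = number.quantize(quantum, rounding=ROUND_HALF_UP)  # ties away from zero
        text = format(abs(rounded), "f")
        if d > 0 and text.startswith("0."):
            text = text[1:]                          # drop the zero before the point
        if number < 0 and rounded != 0:
            text = "-" + text                        # no sign without a nonzero digit
        if len(text) <= width:
            return text.rjust(width)                 # right-justify within the field
    return "*" * width                               # overflow
```

This reproduces the examples given above: `format_f("2.5", 1, 0)` gives `3`, `format_f("-0.01", 4, 2)` gives `-.01`, and `format_f("-0.01", 4, 1)` gives `.0` preceded by two spaces.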
6.7.4.2 Custom Currency Formats
The custom currency formats are closely related to the basic numeric formats, but they
allow users to customize the output format. The SET command configures custom currency
formats, using the syntax
SET CCx=string.
where x is A, B, C, D, or E, and string is no more than 16 characters long.
string must contain exactly three commas or exactly three periods (but not both), except
that a single quote character may be used to “escape” a following comma, period, or single
quote. If three commas are used, commas will be used for grouping in output, and a period
will be used as the decimal point. Using periods instead reverses these roles.
The commas or periods divide string into four fields, called the negative prefix, prefix,
suffix, and negative suffix, respectively. The prefix and suffix are added to output whenever
space is available. The negative prefix and negative suffix are always added to a negative
number when the output includes a nonzero digit.
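To make the structure of the string concrete, here is a Python sketch that splits a custom currency string into its four fields. It is only a model of the description above, not pspp's own parser; the function name `parse_cc` and the dictionary keys are hypothetical, and the sketch assumes that the 16-character limit applies to the string as written, including escape characters.

```python
def parse_cc(string):
    """Split a custom currency string into its four fields."""
    if len(string) > 16:
        raise ValueError("string must be no more than 16 characters long")
    # Tokenize into (character, may_be_separator) pairs, resolving escapes.
    tokens, i = [], 0
    while i < len(string):
        ch = string[i]
        if ch == "'" and i + 1 < len(string) and string[i + 1] in ",.'":
            tokens.append((string[i + 1], False))    # escaped: always literal
            i += 2
        else:
            tokens.append((ch, ch in ",."))
            i += 1
    commas = sum(1 for ch, sep in tokens if sep and ch == ",")
    periods = sum(1 for ch, sep in tokens if sep and ch == ".")
    # Exactly three commas or exactly three periods, but not both.
    if (commas == 3) == (periods == 3):
        raise ValueError("need exactly three commas or exactly three periods")
    grouping = "," if commas == 3 else "."
    fields = [""]
    for ch, sep in tokens:
        if sep and ch == grouping:
            fields.append("")                        # start the next field
        else:
            fields[-1] += ch
    negative_prefix, prefix, suffix, negative_suffix = fields
    return {
        "negative_prefix": negative_prefix,
        "prefix": prefix,
        "suffix": suffix,
        "negative_suffix": negative_suffix,
        "grouping": grouping,
        "decimal": "." if grouping == "," else ",",
    }
```

For example, the string `-,$,,` yields a negative prefix of `-`, a prefix of `$`, empty suffixes, comma grouping, and a period as the decimal point. In `-,US',$,,` the escaped comma becomes part of the prefix `US,$`.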