Graphs typically have a high information density. We routinely design graphs to distill insights from raw data and to convey ideas effectively. I used to rely heavily on fancy tools such as matplotlib, ggplot2, Matlab, and R to produce graphs. Eventually I settled on gnuplot (unrelated to GNU, despite the name) for its simplicity, its decoupling from data preparation (which I typically do in Python), and its flexibility in tailoring fine-grained style details. It is now an indispensable part of my toolbox and my Swiss Army knife for visualizing numerical data.

Notes

gnuplot is NOT a statistical package

gnuplot makes relatively strong assumptions about the format of the input file. That is actually nice: it cleanly separates data preprocessing from plotting, lets gnuplot focus exclusively on the plotting itself, and makes it interact well with other programming languages such as Python and Perl. It does support calculations and inline processing over the data, e.g., plot [][-2:2] sin(x), x, x-(x**3)/6 or plot exp(-x**2), but the scope of such processing is relatively simple and limited. I strongly agree with the viewpoint that gnuplot's limitations are actually strengths in disguise (Gnuplot in Action: Understanding Data with Graphs). It echoes the Unix philosophy: do one thing, and do it really well.

Data file structure

Besides functions, gnuplot’s plot command also accepts data from a file. The basic data file structure is column oriented: \(L\) non-empty lines represent \(R=L\) records, each having \(C\) whitespace-delimited columns (1-indexed). One can change the delimiter, e.g., with set datafile separator ",". Column 0 is a pseudo-column holding the record number within the current data set (it starts from 0, and gnuplot resets it automatically when a new data set begins), while column -2 holds the index of the current data set. Example usage: plot "test.data" using 0:1, plot "test.data" using 1:-2. If there is a single blank line, the records above and below it belong to the same data set but are considered discontinuous: the records immediately before and after the blank line will not be connected by a line.

If there is a double blank line, the records above and below it belong to different data sets (test2.data), which can be addressed with the index directive (0-indexed): index {int:start}[:{int:end}][:{int:step}].
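Since the data file is usually produced by a separate script, here is a minimal Python sketch of writing several data sets in this layout. The function name format_data_sets and the sample values are made up for illustration; only the "two blank lines between data sets" convention comes from the text above.

```python
def format_data_sets(data_sets):
    """Render a list of data sets as gnuplot-style text.

    Each data set is a list of records and each record is a sequence of
    numbers. Records become whitespace-delimited lines, and consecutive
    data sets are separated by two blank lines.
    """
    chunks = []
    for records in data_sets:
        lines = [" ".join(str(value) for value in record) for record in records]
        chunks.append("\n".join(lines))
    # Two blank lines between chunks means three newline characters in a row.
    return "\n\n\n".join(chunks) + "\n"


# Hypothetical sample values, chosen only for illustration.
sample_sets = [[(0, 1), (1, 2)], [(0, 3), (1, 5)]]
print(format_data_sets(sample_sets), end="")
```

Saved to a file, the second data set in this output would be the one addressed by index 1.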

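To cherry-pick a subset of the records (test3.data), use the every directive (0-indexed), whose syntax is: every {int:step}[::{int:start}[::{int:end}]]. Note that, unlike the index directive, it uses double colons to separate the arguments. The colons are doubled because the full form of every interleaves block-level arguments between the point-level ones, and those are left empty here.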

# plot "test3.data" ev 2 using 1:2 w lp
0 1
10 -1
1 1
10 -1
2 2
10 -1
3 3
10 -1
4 5
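To make the selection rule concrete, the sketch below mimics every step::start::end in Python on the records of test3.data listed above. The helper name every and its keyword arguments are my own; it assumes a single block of records, 0-based indices, and an inclusive end.

```python
def every(records, step=1, start=0, end=None):
    """Mimic `every step::start::end` on a single block of records.

    Indices are 0-based and `end` is inclusive.
    """
    stop = len(records) if end is None else end + 1
    return records[start:stop:step]


# The nine records of test3.data listed above.
test3 = [(0, 1), (10, -1), (1, 1), (10, -1), (2, 2),
         (10, -1), (3, 3), (10, -1), (4, 5)]

# Same selection as `ev 2`: keep records 0, 2, 4, 6, 8.
print(every(test3, step=2))
```

With step 2 the (10, -1) records are all skipped, which is why the plot command in the comment line of test3.data draws a clean curve.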

Supported data manipulation in gnuplot

The smooth directive of the plot command supports multiple modes for manipulating the records: unique, frequency, bezier, sbezier, csplines, acsplines. Example usages: plot "test.data" u 1:2 smooth unique with lines, plot "test.data" u 1:2 with linespoints, "" u 1:2 title "bezier" s bezier. For example, unique mode sorts the records by their x value and replaces all data points sharing the same x value with a single point whose y value is their mean. Frequency mode also sorts the records, then merges points with the same x value into a single point whose y value is the sum of their values (which is useful when plotting histograms). The other modes, such as bezier and sbezier, draw a smooth curve through or near noisy data using different algorithms.
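The two simplest modes are easy to restate in Python, which also shows what an external preprocessing step would have to do to replace them. This is only a sketch of the behaviour described above, not gnuplot's actual implementation; the function names and the sample records are hypothetical.

```python
from collections import defaultdict


def group_by_x(records):
    """Collect the y values for each x key, returned in ascending x order."""
    groups = defaultdict(list)
    for x, y in records:
        groups[x].append(y)
    return sorted(groups.items())


def smooth_unique(records):
    """Sort by x and replace points sharing an x key with their mean."""
    return [(x, sum(ys) / len(ys)) for x, ys in group_by_x(records)]


def smooth_frequency(records):
    """Sort by x and replace points sharing an x key with their sum."""
    return [(x, sum(ys)) for x, ys in group_by_x(records)]


# Hypothetical records, not taken from test.data.
sample = [(2, 4.0), (1, 1.0), (2, 6.0), (1, 3.0), (3, 5.0)]
print(smooth_unique(sample))     # one mean per x key
print(smooth_frequency(sample))  # one sum per x key
```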

gnuplot also provides handy built-in (actually quite comprehensive) math functions and user-defined variables/functions (all global scope). The following gnuplot script shows the representative usages.

# Gnuplot in Action: Understanding Data with Graphs
sqrt3 = sqrt(3)
a = "Double quote\" escaped."
b = a[2:4]
c = a[3:] # 1-indexing

# Examples of defining functions
f(x) = -x * log(x)
gauss(x,m,s) = exp(-0.5*((x-m)/s)**2)/sqrt(2*pi*s**2) # define a 1D gaussian
binom(n,k) = n!/(k!*(n-k)!) 
min(a,b) = (a<b) ? a : b
step(x) = (x<0) ? 0 : 1
g(x) = cos(a*x)/a
head(s) = s[1:3] # Besides built-in string functions, one could define custom ones too

y = f(1)
# x is the default independent dummy variable when plot
plot gauss(x, 0, 1)
# one could change x to t
set dummy t
plot f(t)
# reassign the parameter a between curves (definitions are comma-separated)
plot a=1, g(t), a=2, g(t)

z = { 1, 1 }
z0 = real(z)
z1 = imag(z)

show variables
show functions

One can also apply inline transformations to each record through the using directive. Arguments wrapped in parentheses are treated as expressions to evaluate, and column values can be accessed inside the parentheses with a dollar sign. Examples:

  • plot "test.data" u 1:(($2+$3)/2.0) w l
  • plot "test.data" u (log($1)):(log($2)) w l
  • plot "test.data" u 1:($2+sqrt($3)) w l, "" using 1:($2-sqrt($3)) w l
  • plot "test.data" u 1:($2>0?log($2):1/0) w l (the undefined value 1/0 is a trick to drop invalid data)

However, my personal take is that it is better to forget about these features and offload such manipulations, as much as possible, to external (and more powerful) data preprocessing programs that produce a clean, straightforward data file. gnuplot then does minimal manipulation, which is appropriate since it is not designed for such purposes.
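As a rough illustration of that offloading, the Python sketch below precomputes two of the inline expressions shown earlier (the mean of columns 2 and 3, and the logarithm of column 2) and emits a file that gnuplot can plot with a plain using 1:2 or using 1:3. The function name preprocess, the three-column input, and the output column order are assumptions made for this example. Skipping a record here plays roughly the role of the 1/0 trick, although it drops the whole record rather than a single point.

```python
import math


def preprocess(rows):
    """Turn (x, a, b) rows into clean "x mean log_a" lines.

    Rows whose second column is not positive are dropped, because
    log(a) would be undefined for them.
    """
    lines = []
    for x, a, b in rows:
        if a <= 0:
            continue  # skip records that gnuplot would otherwise have to mask
        mean = (a + b) / 2.0
        lines.append(f"{x} {mean} {math.log(a)}")
    return "\n".join(lines) + "\n"


# Hypothetical input rows: (x, column 2, column 3).
raw_rows = [(1, 1.0, 3.0), (2, -1.0, 5.0), (3, 1.0, 7.0)]
print(preprocess(raw_rows), end="")
```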

There are many other interesting gnuplot features, with details in Gnuplot in Action and the gnuplot manual.

Demo

I maintain a store of gnuplot scripts to reuse and keep polishing over time.