Program Package: Pajek / Pajek-XXL

How to use Pajek-XXL and Pajek-3XL


Pajek-XXL main window

What is Pajek-XXL

Pajek-XXL is a special edition of the program Pajek with much lower memory consumption: for the same network it needs considerably less physical memory than Pajek (in the example below, only about 50-60% as much). Memory-intensive operations (e.g. generating random networks, extracting, shrinking, ...) are also faster. The table below compares the space needed to store, and the time needed to generate, a random network with 10,000,000 vertices and 40,000,000 lines (1.73 GHz processor) in Pajek and Pajek-XXL.

Program	Space, 32-bit Windows (G RAM)	Space, 64-bit Windows (G RAM)	Time, 32-bit Windows (s)	Time, 64-bit Windows (s)
Pajek	3.25	4.35	15	15
Pajek-XXL	1.64	2.64	12	12

Calculating space needed to store a huge network in Pajek-XXL

The internal data structure of Pajek-XXL is optimized to use memory very efficiently, with a fixed cost per vertex and per line, so the space needed to store a network can be calculated precisely.
Let n be the number of vertices and m the number of lines in a network. The network can be stored if (all quantities in bytes):

4n + 40m < min(4·10^9, available RAM)    for Pajek32-XXL
8n + 64m < available RAM (e.g. 16·10^9)    for Pajek64-XXL

Note: Pajek32-XXL, like any other 32-bit program, can use at most 4G RAM.
Of course, some memory must also be left free for the results of Pajek-XXL operations (e.g. to store the Partitions and/or Vectors obtained as results).
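The two formulas are easy to turn into a small calculator. The Python sketch below is not part of Pajek-XXL; the function names and the `reserve` argument (memory kept free for Partitions, Vectors and other results) are hypothetical, and memory is counted in decimal bytes as in the text.

```python
def xxl_network_bytes(n: int, m: int, bits: int = 64) -> int:
    """Bytes needed to store a network with n vertices and m lines:
    4n + 40m in Pajek32-XXL, 8n + 64m in Pajek64-XXL."""
    if bits == 32:
        return 4 * n + 40 * m
    if bits == 64:
        return 8 * n + 64 * m
    raise ValueError("bits must be 32 or 64")


def fits_in_memory(n: int, m: int, available_ram: int,
                   bits: int = 64, reserve: int = 0) -> bool:
    """True if the network plus `reserve` bytes kept free for results
    stays below the usable memory limit."""
    limit = available_ram
    if bits == 32:
        # A 32-bit program can use at most about 4,000,000,000 bytes.
        limit = min(4_000_000_000, available_ram)
    return xxl_network_bytes(n, m, bits) + reserve < limit


# Benchmark network from the first table: 10 million vertices, 40 million lines.
print(xxl_network_bytes(10_000_000, 40_000_000, bits=32))  # 1640000000
print(xxl_network_bytes(10_000_000, 40_000_000, bits=64))  # 2640000000
```

The two results, 1.64 and 2.64 billion bytes, agree with the Pajek-XXL row of the first table.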

Rough estimates:

sparse networks with some tens of millions of vertices can be analysed on computers with 4G RAM,
sparse networks with around a hundred million vertices need 16G RAM or more,
networks with close to a billion vertices need 128G RAM or more.
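These estimates can be checked by solving the formulas for n. Describing a sparse network by its average number of lines per vertex, d = m/n, the largest network that fits is n = RAM / (4 + 40d) in Pajek32-XXL and n = RAM / (8 + 64d) in Pajek64-XXL. The value d = 2 used below is an assumption for illustration, not a figure from the text, and the function name is hypothetical.

```python
def max_vertices(available_ram: int, lines_per_vertex: float, bits: int = 64) -> int:
    """Largest n whose network (m = lines_per_vertex * n) fits in memory,
    using 4n + 40m bytes (32-bit) or 8n + 64m bytes (64-bit)."""
    if bits == 32:
        available_ram = min(available_ram, 4_000_000_000)  # 32-bit address limit
        per_vertex = 4 + 40 * lines_per_vertex
    else:
        per_vertex = 8 + 64 * lines_per_vertex
    return int(available_ram // per_vertex)


# Sparse networks with about two lines per vertex (an assumed density):
for ram_gb, bits in [(4, 32), (16, 64), (128, 64)]:
    n = max_vertices(ram_gb * 10**9, 2, bits)
    print(f"{ram_gb:>3}G RAM, {bits}-bit: about {n / 1e6:,.0f} million vertices")
```

With d = 2 this gives roughly 48, 118 and 941 million vertices, in line with the three estimates; denser networks lower these limits quickly, because each line costs as much as 8-10 vertices.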

It is important to stress again: in Pajek-XXL the number of lines is much more important than the number of vertices. As the two formulas above show:
· 1 line needs the same amount of space as 10 vertices in Pajek32-XXL (40 : 4)
· 1 line needs the same amount of space as 8 vertices in Pajek64-XXL (64 : 8)
Pajek-XXL therefore performs especially well when networks are huge and very sparse. The table below compares the space needed in Pajek64-XXL to store two networks with the same number of vertices (100,000,000). The first network has 200,000,000 edges and the second one 100,000,000. The second network occupies only slightly more than half the space of the first.

	First network	Second network
Vertices (millions)	100	100
Edges (millions)	200	100
Space (G RAM)	13.6	7.2

Pajek-XXL vs. Pajek-3XL

Pajek-XXL uses 32-bit (4-byte) integers for vertex numbers. That is why the highest number of vertices that Pajek-XXL can handle is two billion (2^31, approx. 2·10^9). If a network contains more than two billion vertices, Pajek-3XL must be used.
Pajek-3XL uses 64-bit (8-byte) integers for vertex numbers. The highest number of vertices that Pajek-3XL can handle is currently set to 10 billion (10·10^9), but this limit can easily be increased further.

The formula for calculating the space needed to store a huge network in Pajek-3XL is exactly the same as the formula for Pajek-XXL described in the previous section (8n + 64m bytes on a 64-bit OS). That means that a network occupies exactly the same space in Pajek-3XL as in Pajek-XXL, which is really good news.

However, there is one important difference:
· Any Partition on n vertices takes 8n bytes in Pajek-3XL but only 4n bytes in Pajek-XXL.

Example:
A network with one billion vertices and one billion lines occupies 72G RAM (8·10^9 + 64·10^9 bytes) in both Pajek-XXL and Pajek-3XL. But each Partition on these vertices occupies 8G RAM in Pajek-3XL and only 4G RAM in Pajek-XXL, so the network and ten partitions occupy altogether 152G RAM in Pajek-3XL but only 112G RAM in Pajek-XXL. On a computer with 128G RAM you might therefore be able to load this network and later analyze it by producing 10 additional Partitions with Pajek-XXL, but not with Pajek-3XL.

The recommendation on when to use Pajek-3XL instead of Pajek-XXL is therefore straightforward:
Use Pajek-3XL only for networks that cannot be loaded into Pajek-XXL, i.e. networks with more than 2 billion vertices.
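The recommendation amounts to a simple rule. In this sketch the exact Pajek-XXL cut-off is an assumption: the text says two billion (2^31), and the code uses the largest signed 32-bit integer, 2^31 - 1. The Pajek-3XL limit is the current 10 billion quoted above. The function name is hypothetical.

```python
PAJEK_XXL_MAX_VERTICES = 2**31 - 1   # assumed: largest signed 32-bit integer
PAJEK_3XL_MAX_VERTICES = 10 * 10**9  # current limit quoted in the text


def choose_edition(n: int) -> str:
    """Prefer Pajek-XXL (smaller Partitions); use Pajek-3XL only when n is too large."""
    if n <= PAJEK_XXL_MAX_VERTICES:
        return "Pajek-XXL"
    if n <= PAJEK_3XL_MAX_VERTICES:
        return "Pajek-3XL"
    raise ValueError(f"{n} vertices exceeds the current Pajek-3XL limit")


print(choose_edition(1_000_000_000))  # Pajek-XXL
print(choose_edition(3_000_000_000))  # Pajek-3XL
```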

Regarding Vectors, another object often used in all versions of Pajek:
there is no difference in the space needed to store Vectors in Pajek-3XL and Pajek-XXL. More precisely:
any Vector on n vertices takes 8n bytes in both Pajek-3XL and Pajek-XXL.
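To compare the two editions including result objects, the sizes above can be added up: the network (8n + 64m on a 64-bit OS), 4n or 8n bytes per Partition, and 8n bytes per Vector. This hypothetical helper reproduces the one-billion-vertex example.

```python
def total_bytes(edition: str, n: int, m: int,
                partitions: int = 0, vectors: int = 0) -> int:
    """Network (8n + 64m bytes, 64-bit OS) plus result objects on n vertices.
    A Partition takes 4n bytes in Pajek-XXL and 8n in Pajek-3XL;
    a Vector takes 8n bytes in both."""
    partition_bytes = {"Pajek-XXL": 4, "Pajek-3XL": 8}[edition] * n
    return 8 * n + 64 * m + partitions * partition_bytes + vectors * 8 * n


n = m = 10**9  # one billion vertices and one billion lines
for edition in ("Pajek-XXL", "Pajek-3XL"):
    gb = total_bytes(edition, n, m, partitions=10) / 10**9
    print(f"{edition}: {gb:.0f}G RAM, fits in 128G: {gb < 128}")
```

It prints 112G for Pajek-XXL (fits in 128G) and 152G for Pajek-3XL (does not fit).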

The following sections explain the typical sequences of steps used when analyzing huge networks. The explanations refer to Pajek-XXL, but the same commands are also available in Pajek-3XL.

Uses of Pajek-XXL

Pajek-XXL is used for the analysis of huge networks, i.e. networks that cannot be loaded into physical memory using 'ordinary' Pajek. The intended use is to extract smaller, interesting parts of a huge network with Pajek-XXL, which can later be analysed further (and visualized) with the more sophisticated methods available in Pajek.

Another possible use of Pajek-XXL is the analysis of huge networks where the identity of vertices is not important, for example simulation studies on random networks, where we are interested in general properties such as distributions (e.g. of degrees, triads, ...). Using Repeat Last Command we can generate thousands of large random networks and compute mean values, variances and other statistics of the properties of interest. For details see Chapter 13, Monte Carlo Simulation.
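The idea behind such simulation studies can be sketched outside Pajek. The Python code below does not drive Pajek or its Repeat Last Command; it simply generates many random networks from the uniform G(n, m) model and summarizes one property, the maximum degree, by its mean and variance. The network size, number of runs and chosen statistic are arbitrary illustrative values, and the function names are hypothetical.

```python
import random
import statistics


def random_network_degrees(n: int, m: int, rng: random.Random) -> list[int]:
    """Degrees of a simple undirected G(n, m) random graph:
    m distinct vertex pairs, no loops (requires m <= n*(n-1)/2)."""
    edges: set[tuple[int, int]] = set()
    while len(edges) < m:
        u, v = rng.randrange(n), rng.randrange(n)
        if u != v:
            edges.add((min(u, v), max(u, v)))
    degree = [0] * n
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return degree


def monte_carlo_max_degree(n: int, m: int, runs: int, seed: int = 1) -> list[int]:
    """Generate `runs` random networks and record the maximum degree of each."""
    rng = random.Random(seed)
    return [max(random_network_degrees(n, m, rng)) for _ in range(runs)]


samples = monte_carlo_max_degree(n=1000, m=2000, runs=200)
print("mean:", statistics.mean(samples), "variance:", statistics.variance(samples))
```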

At first sight Pajek-XXL looks very similar to Pajek, but some menu options are not available. Permutation does not exist as a data object in Pajek-XXL; instead there is a new data object called Vertex ID, with an additional Vertex ID item in the main menu. The network data structure in Pajek-XXL is limited to the graph itself and contains none of the strings available in 'ordinary' Pajek, e.g.:

vertices have no labels or coordinates, and no additional information such as time intervals, shapes, colors, sizes...
lines are described only by a relation number and a line value, again without additional information such as time intervals, patterns, colors...

Therefore not all menu options available in Pajek are also available in Pajek-XXL. Options that require additional information on vertices or lines are missing (e.g. the Draw menu option is not available).

Typical steps from Pajek-XXL to Pajek

Here are the typical steps for analysing a huge network with Pajek-XXL and storing the result in a form that can be further analysed with 'ordinary' Pajek.

Running Pajek-XXL

Run Pajek-XXL and check how much physical memory is available (Info / Memory).

Recall that vertex labels are no longer part of the network in Pajek-XXL; the Vertex ID object is used to keep track of vertices instead. At the beginning, vertex labels are equal to vertex sequential numbers, so no Vertex ID is generated yet (to save space). Each operation that produces a smaller part of a huge network (e.g. Operations / Network+Partition / Extract SubNetwork) also produces the corresponding Vertex ID, containing the original vertex numbers from the starting huge network. Vertex ID is used, for example, in Network / Info, Partition / Info, Vector / Info, Networks / Multiply Networks and when saving a subnetwork to a NET file.

Reading a huge network in Pajek-XXL

Read the huge network in the usual way. Afterwards you may check Info / Memory again to see how much space the loaded network actually occupies.
In general, it is good to check Info / Memory (F11) often to see how much memory is available at any moment, especially after loading or disposing of a huge network. It is also instructive to open Windows Task Manager / Performance and watch memory usage increase and decrease while an operation is executed (e.g. reading a network, generating a random network, disposing of a network...).

Analyzing networks in Pajek-XXL

Execute any fast operation that finds interesting groups or dense parts in a huge network, e.g.

components,
communities,
cores,
islands,
3-rings and 4-rings,
fragment searching,
citation weights and CPM for acyclic networks,
degrees,
clustering coefficient,
important vertices, hubs and authorities,
...
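To show what the Partition produced by such an operation looks like, the sketch below computes weak components with a union-find structure. It is a generic algorithm, not Pajek's implementation; representing a network as a vertex count plus a list of (from, to) pairs is an assumption made for brevity, and clusters are numbered by first appearance, which may differ from Pajek's numbering.

```python
def components_partition(n: int, lines: list[tuple[int, int]]) -> list[int]:
    """Weak components as a Partition: entry v-1 is the cluster of vertex v.
    Vertices are numbered 1..n as in Pajek."""
    parent = list(range(n + 1))

    def find(v: int) -> int:
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving keeps trees shallow
            v = parent[v]
        return v

    for u, v in lines:  # direction is ignored for weak components
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv

    cluster_of_root: dict[int, int] = {}
    partition = []
    for v in range(1, n + 1):
        root = find(v)
        if root not in cluster_of_root:
            cluster_of_root[root] = len(cluster_of_root) + 1
        partition.append(cluster_of_root[root])
    return partition


print(components_partition(6, [(1, 2), (2, 3), (5, 6)]))  # [1, 1, 1, 2, 3, 3]
```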

Almost always, one of the results of these operations is also a Partition. The subnetwork induced by the Partition can be extracted using Operations / Network+Partition / Extract SubNetwork. It is recommended to extract the subnetwork without generating a new network: answer No to the question Create a new Network as a result? The obtained subnetwork then replaces the old huge network, which is disposed, so a lot of memory becomes free again. The threshold for a large network (the number of vertices that triggers the additional question Create a new Network as a result?) can be set in Options / ReadWrite / Large Network (Vertices).
When the extraction is finished, you also get as a result a compatible Vertex ID containing the original vertex numbers of the extracted vertices.
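The bookkeeping behind Extract SubNetwork and Vertex ID can be sketched as follows. The function is hypothetical and only illustrates the idea described above: the kept vertices are renumbered 1..k, lines with both endpoints kept are retained, and the new Vertex ID records each vertex's number in the original huge network. Passing the current Vertex ID composes the mappings, so after several extractions the IDs still refer to the original network; `None` stands for the not-yet-generated Vertex ID of a freshly loaded network.

```python
from typing import Optional


def extract_subnetwork(n: int, lines: list[tuple[int, int]], partition: list[int],
                       cluster: int, vertex_id: Optional[list[int]] = None):
    """Subnetwork induced by the vertices v with partition[v-1] == cluster.

    Returns (k, sub_lines, sub_vertex_id), where sub_vertex_id[i-1] is the
    number of new vertex i in the ORIGINAL huge network."""
    kept = [v for v in range(1, n + 1) if partition[v - 1] == cluster]
    new_number = {old: i for i, old in enumerate(kept, start=1)}
    sub_lines = [(new_number[u], new_number[v]) for u, v in lines
                 if u in new_number and v in new_number]
    if vertex_id is None:
        sub_vertex_id = kept  # current numbers are still the original ones
    else:
        sub_vertex_id = [vertex_id[v - 1] for v in kept]  # compose mappings
    return len(kept), sub_lines, sub_vertex_id


# Two successive extractions; the second passes the Vertex ID of the first.
k, sub, vid = extract_subnetwork(6, [(1, 2), (2, 3), (3, 6), (5, 6)], [1, 2, 1, 2, 1, 1], 1)
k2, sub2, vid2 = extract_subnetwork(k, sub, [1, 1, 2, 1], 1, vid)
print(k2, sub2, vid2)  # 3 [(2, 3)] [1, 3, 6]
```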

If the extracted subnetwork is still too large to be loaded into Pajek, you can perform additional operations to further reduce the size of the resulting sub-subnetwork.
For example, if in the first step we extracted just the largest component of the huge network and it is still too large, we can compute cores and extract only the densest part of the network (cores of some level or higher).
Before executing the command, make sure that the Vertex ID corresponding to the active Network is selected in the Vertex ID drop-down menu; otherwise the obtained sub-subnetwork will not get the right vertex IDs.
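Cores, used above to find the densest part, can be sketched with the standard peeling procedure: repeatedly remove a vertex of minimum remaining degree; the largest minimum degree seen so far is the core number of the removed vertex. This O(m log n) heap version is an illustration, not Pajek's implementation (a linear-time bucket variant exists); it assumes an undirected network given as (from, to) pairs and ignores line values.

```python
import heapq


def cores_partition(n: int, lines: list[tuple[int, int]]) -> list[int]:
    """Core number of each vertex 1..n of an undirected network."""
    neighbours: list[set[int]] = [set() for _ in range(n + 1)]
    for u, v in lines:
        if u != v:  # ignore loops; sets also ignore multiple lines
            neighbours[u].add(v)
            neighbours[v].add(u)
    degree = [len(nb) for nb in neighbours]
    heap = [(degree[v], v) for v in range(1, n + 1)]
    heapq.heapify(heap)
    removed = [False] * (n + 1)
    core = [0] * (n + 1)
    k = 0
    while heap:
        d, v = heapq.heappop(heap)
        if removed[v] or d != degree[v]:
            continue  # stale heap entry from before a degree decrease
        k = max(k, d)
        core[v] = k
        removed[v] = True
        for w in neighbours[v]:
            if not removed[w]:
                degree[w] -= 1
                heapq.heappush(heap, (degree[w], w))
    return core[1:]


cores = cores_partition(5, [(1, 2), (2, 3), (1, 3), (3, 4)])
print(cores)  # [2, 2, 2, 1, 0]
# Partition selecting cores of level 2 or higher (cluster 1) for extraction:
dense = [1 if c >= 2 else 2 for c in cores]
```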

How to bring extracted subnetwork from Pajek-XXL to Pajek?

Two options are available:

calling Pajek directly from Pajek-XXL,
saving the network in Pajek-XXL first and then loading it into Pajek manually.

Option 1: Calling ordinary Pajek directly from Pajek-XXL
In Pajek-XXL 4.10 or higher you can call ordinary Pajek directly with an extracted subnetwork. Make sure that Options / Read-Write / Save VertexIDs as Vertex Labels? is checked.
Pajek is called via Tools / Pajek. In addition to the extracted subnetwork, the corresponding Partition and/or Vector can also be sent directly from Pajek-XXL to Pajek. Each of the options (sending the network with or without a partition and/or vector) has three suboptions:
1. with Default Vertex Labels: default vertex labels (labels like v????) are sent to Pajek. See below how to replace the default labels with the real labels later in Pajek.
2. +Add Vertex Labels from File(s): default vertex labels are replaced with the real vertex labels from the selected NET file and sent to Pajek.
3. +Add Vertex Labels and Descriptions from File(s): default vertex labels and descriptions (e.g. coordinates, shapes, ...) are replaced with the real vertex labels and descriptions from the selected NET file and sent to Pajek.
If for some reason you do not want to call Pajek directly from Pajek-XXL (e.g. there is not enough memory to run Pajek-XXL and Pajek at the same time), you can save the obtained subnetwork first, then run Pajek, load the network and replace the labels with the real ones. The sequence of steps is explained in the next paragraph (Option 2).
Option 2: Saving the Network in Pajek-XXL for later use in Pajek
After you obtain a smaller part that 'ordinary' Pajek can also manage, save the network with the corresponding IDs stored in Vertex ID using File / Network / Save. Again, make sure that the Network and the corresponding Vertex ID are selected before saving. If no Vertex ID is selected, or its size does not match the Network, the network is saved without vertex labels.
Make sure that Options / Read-Write / Save VertexIDs as Vertex Labels? is checked before saving (otherwise the definition of vertices is not written to the NET file). If it is not checked, or you do not want vertex labels in the same file as the network, you can save the labels to a separate NET file using:
File / VertexID / Save to NET File as Vertex Labels
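To make such a NET file concrete, the sketch below writes a subnetwork in Pajek's NET format: a *Vertices section with one quoted label per vertex, followed by *Edges (or *Arcs for directed networks). The label form v<original number> is an assumption based on the default labels mentioned in the text; the function is hypothetical and gives every line value 1. As described above for Pajek-XXL, labels are omitted when no Vertex ID is given or its size does not match the network.

```python
import io
from typing import Callable, Optional, TextIO


def write_net(out: TextIO, k: int, lines: list[tuple[int, int]],
              vertex_id: Optional[list[int]] = None, directed: bool = False,
              label: Callable[[int], str] = lambda orig: f"v{orig}") -> None:
    """Write a k-vertex subnetwork in Pajek NET format."""
    out.write(f"*Vertices {k}\n")
    if vertex_id is not None and len(vertex_id) == k:
        # Vertex i gets a label built from its number in the original network.
        for i, orig in enumerate(vertex_id, start=1):
            out.write(f'{i} "{label(orig)}"\n')
    out.write("*Arcs\n" if directed else "*Edges\n")
    for u, v in lines:
        out.write(f"{u} {v} 1\n")  # from, to, line value


buf = io.StringIO()
write_net(buf, 2, [(1, 2)], vertex_id=[3, 6])
print(buf.getvalue())
```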
Now run 'ordinary' Pajek. If the vertex labels are stored in the same file as the network, just load the saved network in the usual way.
If the labels are stored in a separate file, first load the subnetwork (without labels) and then, to obtain the right default vertex labels (v????), run
Network / Create New Network / Transform / Add / Vertex Labels / from File(s)
and select the NET file produced from Vertex ID in Pajek-XXL.
If the real vertex labels (not just v????) are stored in yet another NET file, apply
Network / Create New Network / Transform / Add / Vertex Labels / from File(s)
again.
For 2-mode networks you can select two files, with labels for the vertices in the first and in the second mode respectively. This is needed for 2-mode networks obtained by multiplying networks, since in this case the labels for the two modes are stored in two different files.
If you also want to import other vertex descriptions (e.g. coordinates, shapes...) from a NET file, apply
Network / Create New Network / Transform / Add / Vertex Labels / and other Descriptions from File(s)
instead.
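The label replacement performed by Add / Vertex Labels / from File(s) can be approximated as follows: read the *Vertices section of a NET file with real labels, then look up each subnetwork vertex by its original number from Vertex ID. This sketch is hypothetical; how Pajek matches labels internally is not described here, and only the label (not coordinates or other descriptions) is read. For 2-mode networks the same reader would be applied to each of the two label files.

```python
import io
import re
from typing import Iterable

# A vertex line: number, then a quoted or unquoted label, then optional descriptions.
VERTEX_LINE = re.compile(r'^\s*(\d+)\s+(?:"([^"]*)"|(\S+))')


def read_vertex_labels(lines: Iterable[str]) -> dict[int, str]:
    """Map vertex number -> label from the *Vertices section of a NET file."""
    labels: dict[int, str] = {}
    in_vertices = False
    for line in lines:
        stripped = line.strip()
        if stripped.startswith("*"):  # section header such as *Vertices, *Edges
            in_vertices = stripped.lower().startswith("*vertices")
            continue
        if not in_vertices:
            continue
        match = VERTEX_LINE.match(stripped)
        if match:
            quoted, bare = match.group(2), match.group(3)
            labels[int(match.group(1))] = quoted if quoted is not None else bare
    return labels


def real_labels(vertex_id: list[int], labels: dict[int, str]) -> list[str]:
    """Labels for subnetwork vertices 1..k; fall back to the default v<number>."""
    return [labels.get(orig, f"v{orig}") for orig in vertex_id]


net = '*Vertices 6\n3 "Alice Smith" 0.1 0.2\n6 Bob\n*Edges\n3 6 1\n'
print(real_labels([3, 6, 5], read_vertex_labels(io.StringIO(net))))
```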

Continue the demonstration with some examples.

Download Pajek-XXL; Andrej Mrvar.