How to use Pajek-XXL
Examples; Download Pajek-XXL.
Pajek-XXL main window
What is Pajek-XXL
Pajek-XXL is a special edition of the program Pajek with much lower memory consumption.
For the same network it needs at least 2-3 times less physical memory than Pajek.
Memory-intensive operations (e.g. generating random networks, extracting, shrinking, ...) are also faster.
The table below compares the space needed to store, and the time needed to generate, a random network
with 10,000,000 vertices and 40,000,000 lines (1.73 GHz processor) in Pajek and Pajek-XXL.
| Resources | Space (GB RAM), 32-bit Windows | Space (GB RAM), 64-bit Windows | Time (secs), 32-bit Windows | Time (secs), 64-bit Windows |
| Pajek | 3.25 | 4.35 | 16 | 18 |
| Pajek-XXL | 1.64 | 2.64 | 12 | 13 |
Calculating the space needed to store a huge network in Pajek-XXL
The internal data structure of Pajek-XXL is optimized to use all available memory very efficiently,
so the space needed to store a network in Pajek-XXL can be calculated precisely.
Let n be the number of vertices and m the number of lines in a network. Then (in bytes):
- 4n + 40m < 4,000,000,000 for Pajek32-XXL running on 64-bit Windows (Pajek32-XXL-4G)
- 4n + 40m < 2,000,000,000 for Pajek32-XXL running on 32-bit Windows (Pajek32-XXL-2G)
- 8n + 64m < available RAM (e.g. 16,000,000,000) for Pajek64-XXL
Of course, some memory must also be left free for the results of Pajek-XXL operations
(e.g. to store partitions and/or vectors that are obtained as results).
Rough estimate: sparse networks with some tens of millions of vertices can be analysed on computers with up to 4 GB of RAM,
while networks whose number of vertices is close to a billion need 16 GB of RAM or more.
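The three inequalities above can be turned into a quick check before loading a network. The following Python sketch is only an illustration and is not part of Pajek-XXL: the function names and dictionary keys are made up for this example, and the byte counts and limits are copied from the formulas above.

```python
# Bytes per vertex and per line, taken from the three formulas above.
# The dictionary keys are labels chosen for this sketch only.
BYTES_PER_ITEM = {
    "Pajek32-XXL": (4, 40),
    "Pajek64-XXL": (8, 64),
}

# Limits of the two 32-bit variants, from the formulas above.
LIMIT_32BIT = {
    "Pajek32-XXL-4G": 4_000_000_000,  # Pajek32-XXL on 64-bit Windows
    "Pajek32-XXL-2G": 2_000_000_000,  # Pajek32-XXL on 32-bit Windows
}


def network_bytes(n, m, edition):
    """Return the estimated bytes needed to store n vertices and m lines."""
    per_vertex, per_line = BYTES_PER_ITEM[edition]
    return per_vertex * n + per_line * m


def fits(n, m, edition, limit_bytes):
    """True if the network alone stays below limit_bytes.

    Results of operations (partitions, vectors) need additional room.
    """
    return network_bytes(n, m, edition) < limit_bytes


# Random network with 10,000,000 vertices and 40,000,000 lines.
n, m = 10_000_000, 40_000_000
print(network_bytes(n, m, "Pajek32-XXL") / 1e9)  # 1.64
print(network_bytes(n, m, "Pajek64-XXL") / 1e9)  # 2.64
print(fits(n, m, "Pajek32-XXL", LIMIT_32BIT["Pajek32-XXL-2G"]))  # True
```

The first two values agree with the Pajek-XXL row of the comparison table near the top of this page (1.64 and 2.64).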
It is important to stress again: in Pajek-XXL the number of lines matters much more than the number
of vertices. As the three formulas above show, storing 1 line takes the same memory as storing
10 vertices in Pajek32-XXL and 8 vertices in Pajek64-XXL. This means that
Pajek-XXL performs especially well on huge, very sparse networks.
The table below compares the space needed in Pajek64-XXL to store two networks with the same number of vertices (200,000,000).
The first has 200,000,000 edges and the second 100,000,000. The second network occupies only a little more than half the space of the first.
| Vertices (millions) | 200 | 200 |
| Edges (millions) | 200 | 100 |
| Space (GB RAM) | 14.4 | 8.0 |
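As a check, both space values in this table follow from the Pajek64-XXL formula 8n + 64m, with the result in bytes and 1 G taken as 10^9 bytes (an assumption about the unit used in the table):

```latex
\begin{align*}
n = 2\times10^{8},\ m = 2\times10^{8}: \quad 8n + 64m &= 1.6\times10^{9} + 12.8\times10^{9} = 14.4\times10^{9} \\
n = 2\times10^{8},\ m = 1\times10^{8}: \quad 8n + 64m &= 1.6\times10^{9} + 6.4\times10^{9} = 8.0\times10^{9}
\end{align*}
```

Halving the number of edges saves 6.4 of the 14.4 G, whereas halving the number of vertices instead would save only 0.8 G.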
Uses of Pajek-XXL
Pajek-XXL is used for the analysis of huge networks,
that is, networks that cannot be loaded into physical memory using 'ordinary' Pajek.
The intended use of Pajek-XXL is to extract some smaller, interesting parts of a huge network
that can later be analysed further (and visualized) with the more sophisticated methods available in Pajek.
The other possible use of Pajek-XXL is the analysis of huge networks where the identity of
vertices is not important. Examples are simulation studies on random networks,
where we are interested in general properties, such as distributions
(e.g. of degrees, triads, ...).
Using Repeat Last Command we can generate some thousands of large random networks and compute the mean values and variances of interesting properties.
For details see
Chapter 13, Monte Carlo Simulation.
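The simulation idea can also be illustrated outside Pajek. The following Python sketch is not Pajek-XXL functionality and does not reproduce its random network generators. It assumes a simple random undirected network with a fixed number of lines, generates several such networks, and reports the mean and variance of one property (the maximum degree). All function names and parameter values are chosen for this illustration only.

```python
import random
import statistics


def random_network_degrees(n, m, rng):
    """Degrees of a random undirected network with n vertices and m distinct lines.

    Lines are drawn uniformly among vertex pairs; loops and multiple
    lines are rejected. Only degrees are kept, since vertex identity
    does not matter for this kind of study.
    """
    if m > n * (n - 1) // 2:
        raise ValueError("too many lines for a simple network on n vertices")
    degree = [0] * n
    seen = set()
    while len(seen) < m:
        u = rng.randrange(n)
        v = rng.randrange(n)
        if u == v:
            continue  # reject loops
        line = (u, v) if u < v else (v, u)
        if line in seen:
            continue  # reject multiple lines
        seen.add(line)
        degree[u] += 1
        degree[v] += 1
    return degree


def simulate(n, m, repetitions, seed=0):
    """Generate `repetitions` networks and summarize their maximum degrees."""
    rng = random.Random(seed)
    max_degrees = [
        max(random_network_degrees(n, m, rng)) for _ in range(repetitions)
    ]
    return statistics.mean(max_degrees), statistics.variance(max_degrees)


mean_max, var_max = simulate(n=1000, m=4000, repetitions=50)
print(f"mean of maximum degree: {mean_max:.2f}, variance: {var_max:.2f}")
```

The networks here are tiny compared with those Pajek-XXL is meant for; the sketch only shows the repeat-and-summarize pattern.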
At first sight Pajek-XXL looks very similar to Pajek, but some menu options are not available.
Permutation no longer exists as a data object; instead there is a new data object called Vertex ID.
There is also an additional item labeled Vertex ID in the main menu.
The network data structure in Pajek-XXL is limited to the graph only; it contains none of the strings that are available
in 'ordinary' Pajek, e.g.:
- vertices do not have labels or coordinates, nor any additional information such as time intervals, shapes, colors, sizes...
- lines are described by a relation number and a line value, but again there is no additional information such as time intervals, patterns, colors...
Therefore all menu options that require additional information on vertices or lines are not available in Pajek-XXL
(e.g. the Draw menu option is not available).
Typical steps from Pajek-XXL to Pajek
Here are some typical steps for analysing a huge network with Pajek-XXL and storing the result in a
form that can be analysed further with 'ordinary' Pajek.
Running Pajek-XXL
Run Pajek-XXL and check how much physical memory is available (Info/Memory).
Recall that vertex labels are no longer part of a network in Pajek-XXL.
The Vertex ID object is used to keep track of vertices.
At the beginning vertex labels are equal to the vertex sequential numbers, so
no Vertex ID is generated yet (to save space).
After each operation that produces
a smaller part of a huge network (e.g. Operations/Network+Partition/Extract SubNetwork),
the corresponding Vertex ID with the original vertex numbers from the starting huge network
is produced. Vertex ID is used, for example, in Network/Info,
Partition/Info, Vector/Info, Networks/Multiply Networks and when saving a subnetwork to a NET file.
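The bookkeeping role of Vertex ID can be sketched in a few lines of Python. This is only a model of the idea described above, not Pajek-XXL's actual data structure: networks are plain lists of vertex-number pairs, vertices are numbered from 1, and the function name is invented for this example.

```python
def extract_subnetwork(lines, keep, vertex_id=None):
    """Extract the subnetwork induced by the vertices in `keep`.

    lines     -- list of (u, v) pairs using the current numbering 1..n
    keep      -- current vertex numbers to retain
    vertex_id -- original numbers of the current vertices (vertex_id[i-1]
                 belongs to current vertex i), or None while the current
                 numbers still equal the original ones
    Returns the renumbered lines and the Vertex ID of the result.
    """
    kept = sorted(set(keep))
    new_number = {old: new for new, old in enumerate(kept, start=1)}
    new_lines = [
        (new_number[u], new_number[v])
        for u, v in lines
        if u in new_number and v in new_number
    ]
    if vertex_id is None:
        new_vertex_id = kept  # first extraction: numbers are still original
    else:
        new_vertex_id = [vertex_id[old - 1] for old in kept]
    return new_lines, new_vertex_id


# A small network with vertices 1..6.
lines = [(1, 2), (2, 3), (3, 4), (4, 5), (5, 6), (3, 5)]

# First extraction: keep vertices 3, 4, 5, 6.
sub, ids = extract_subnetwork(lines, keep=[3, 4, 5, 6])
print(sub)  # [(1, 2), (2, 3), (3, 4), (1, 3)]
print(ids)  # [3, 4, 5, 6]

# Second extraction on the result: keep its vertices 1, 2, 3.
subsub, ids2 = extract_subnetwork(sub, keep=[1, 2, 3], vertex_id=ids)
print(subsub)  # [(1, 2), (2, 3), (1, 3)]
print(ids2)    # [3, 4, 5]
```

In this model, passing a Vertex ID that does not belong to the current network would still yield a subnetwork, but its original vertex numbers would be wrong.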
Reading huge network in Pajek-XXL
Read the huge network in the usual way. After reading you may check Info/Memory again to see
how much space the loaded network actually occupies.
In general, it is good to check Info/Memory (F11) often to see how much memory is available at any moment,
especially after loading or disposing of a huge network.
It is also instructive to open Windows Task Manager/Performance
and watch memory usage increase and decrease.
Analyzing networks in Pajek-XXL
Execute any operation that is fast and finds some interesting groups or dense parts in huge networks, e.g.
- components,
- communities,
- cores,
- islands,
- 3-rings and 4-rings,
- fragment searching,
- citation weights and CPM for acyclic networks,
- degrees,
- clustering coefficient,
- important vertices, hubs and authorities,
- ...
One of the results of these operations is almost always a Partition.
We can extract the subnetwork induced by the Partition using
Operations/Network+Partition/Extract SubNetwork.
It is recommended to extract the subnetwork without generating a new network:
answer No to the question Create a new Network as a result?
(the obtained subnetwork will replace the old, huge network, which will be
disposed of, freeing a lot of memory).
The threshold for a large network (the number of vertices at which the additional question Create a new Network as a result? appears) can be set in Options/ReadWrite/Large Network (Vertices).
When the extraction is finished you will also get a compatible
Vertex ID with the original-network vertex numbers of the extracted vertices.
If the extracted subnetwork is still too large to be loaded into Pajek,
you can perform some additional operation to further reduce the size of the resulting
sub-subnetwork.
For example, if in the first step we extracted just the largest component of the huge network
and it is still too large, we can compute cores and extract only the densest part of the
network (the core of some level or higher).
Be sure that the Vertex ID that corresponds to the active Network is selected in the Vertex ID drop-down menu
before executing the command, so that the resulting sub-subnetwork gets the right vertex IDs.
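The core-based reduction step mentioned above can be illustrated as follows. This Python sketch computes core numbers of an undirected network by repeatedly removing a vertex of minimum remaining degree. It is a simple, unoptimized illustration of the concept, not Pajek-XXL's implementation, and the function name and the small example network are made up.

```python
def core_numbers(n, lines):
    """Core number of each vertex 1..n of an undirected network (loops ignored)."""
    neighbours = {v: set() for v in range(1, n + 1)}
    for u, v in lines:
        if u != v:
            neighbours[u].add(v)
            neighbours[v].add(u)
    degree = {v: len(nb) for v, nb in neighbours.items()}
    core = {}
    remaining = set(neighbours)
    k = 0
    while remaining:
        # Remove a vertex with the smallest remaining degree.
        v = min(remaining, key=degree.__getitem__)
        k = max(k, degree[v])  # core levels never decrease during peeling
        core[v] = k
        remaining.remove(v)
        for w in neighbours[v]:
            if w in remaining:
                degree[w] -= 1
    return core


# A 4-clique on vertices 1..4, a tail 4-5-6 and an isolated vertex 7.
lines = [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4), (4, 5), (5, 6)]
core = core_numbers(7, lines)

# Vertices of the core of level 3 or higher: the densest part.
densest = sorted(v for v, c in core.items() if c >= 3)
print(densest)  # [1, 2, 3, 4]
```

The core numbers play the role of the Partition here; the vertices in `densest` are the ones that would be kept when extracting the core of level 3 or higher.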
Saving Network in Pajek-XXL for later use in Pajek
Once you obtain a smaller part that 'ordinary' Pajek can also handle,
save the network with the corresponding IDs stored in Vertex ID using File/Network/Save.
Again, be sure that the Network and the corresponding Vertex ID are selected
before saving the network. If the Vertex ID is not selected or
does not match the Network in size, the network will be saved without vertex labels.
Be sure that Options/Read-Write/Save VertexIDs as Vertex Labels? is checked before
saving (otherwise the definition of vertices is not written to the NET file).
If it is not checked, or you do not want vertex labels in the same file as the network, you can save the labels in a separate NET file using:
File/VertexID/Save to NET File as Vertex Labels
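For orientation, the Python sketch below writes a small undirected network in the general shape of a Pajek NET file with vertex labels. The labels consist of v followed by the original vertex number, which is an assumption based on the default labels (v????) mentioned in this text. The exact output written by Pajek-XXL may differ in details, relation numbers are left out, and the function name is hypothetical.

```python
import io


def write_net(out, lines, vertex_id):
    """Write an undirected network with Vertex-ID-based labels in NET format.

    out       -- a writable text stream
    lines     -- list of (u, v, value) using the numbering 1..n
    vertex_id -- original vertex numbers; vertex_id[i-1] belongs to vertex i
    """
    out.write(f"*Vertices {len(vertex_id)}\n")
    for number, original in enumerate(vertex_id, start=1):
        # Assumed label convention: "v" + original vertex number.
        out.write(f'{number} "v{original}"\n')
    out.write("*Edges\n")
    for u, v, value in lines:
        out.write(f"{u} {v} {value}\n")


buffer = io.StringIO()
write_net(buffer, [(1, 2, 1), (2, 3, 1), (1, 3, 1)], vertex_id=[3, 4, 5])
print(buffer.getvalue())
```

Writing only the `*Vertices` part would correspond to keeping the labels in a separate file from the lines.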
Now run 'ordinary' Pajek.
If the vertex labels are stored in the same file as the network, you just need to load the saved network in the usual way.
If the labels are stored in a separate file you must
first load the subnetwork (without labels) and then
(to obtain the right default vertex labels (v????)) run
Network/Create New Network/Transform/Add/Vertex Labels/from File(s)
and select the NET file produced from Vertex ID in Pajek-XXL.
You may have other, real vertex labels (not just v????) stored
in another NET file; in this case you must apply
Network/Create New Network/Transform/Add/Vertex Labels/from File(s)
again.
For 2-mode networks you can select two files with labels, for vertices in the first
and second mode respectively (this is needed for 2-mode networks obtained by multiplication of networks,
since in that case the labels for the two modes are stored in two different files).
If you also want to import other vertex descriptions (e.g. coordinates, shapes...)
from a NET file, you must apply
Network/Create New Network/Transform/Add/Vertex Labels/and other Descriptions from File(s)
instead.
Continue the demonstration with some examples.
Download Pajek-XXL; Andrej Mrvar.