SxGeo v2.2 File Format Specification

The Sypex Geo version 2.2 database file (abbreviated as SxGeo v2.2) consists of:

  1. Header
  2. Index of first bytes (octets)
  3. Main index
  4. IP ranges
  5. Region directory (optional)
  6. Country directory (optional)
  7. Directory of cities (optional)

Let's take a closer look at them.

1. Header

The header contains 40 bytes of information about the contents of the database. The header structure and example values are shown in the following table.

No.  Offset  Size  Value      Description
1    0       3     SxG        File ID, "SxG"
2    3       1     21         File version (21 => 2.1)
3    4       4     0          Creation time (Unix timestamp)
4    8       1     2          Parser (0 - Universal, 1 - SxGeo Country, 2 - SxGeo City, 11 - GeoIP Country, 12 - GeoIP City, 21 - ipgeobase)
5    9       1     0          Encoding (0 - UTF-8, 1 - latin1, 2 - cp1251)
6    10      1     224        Elements in the index of the first bytes (up to 255)
7    11      2     1756       Elements in the main index (up to 65,535)
8    13      2     1034       Blocks in one index element (up to 65,535)
9    15      4     1,877,608  Number of ranges (up to 4,294,967,295)
10   19      1                ID block size in bytes (1 for countries, 3 for cities)
11   20      2     46         Maximum region entry size (up to 64 KB)
12   22      2     51         Maximum city record size (up to 64 KB)
13   24      4     30,936     Region directory size
14   28      4     2,820,691  City directory size
15   32      2     62         Maximum country record size (up to 64 KB)
16   34      4     4,959      Country directory size
17   38      2     140        Size of the city/region/country packaging format description
18   40      N                Descriptions of the city/region/country packaging format

Note: numbers are stored in big-endian byte order (most significant byte first). All numbers are unsigned integers.
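
As an illustration of the table above, the fixed 40-byte part of the header can be decoded with a single big-endian format string. This is a minimal sketch, not a reference implementation: the field names and the function parse_header are hypothetical, and only the field order and sizes come from the table.

```python
import struct
from collections import namedtuple

# ">" selects big-endian with no padding; 3s = 3-byte string,
# B/H/I = unsigned 1/2/4-byte integers, in the order of the header table.
HEADER_FORMAT = ">3sBIBBBHHIBHHIIHIH"

# Hypothetical field names, one per table row 1..17.
Header = namedtuple("Header", [
    "file_id", "version", "created", "parser", "encoding",
    "first_byte_index_len", "main_index_len", "blocks_per_element",
    "range_count", "id_size",
    "max_region_size", "max_city_size", "region_dir_size", "city_dir_size",
    "max_country_size", "country_dir_size", "pack_format_size",
])


def parse_header(data):
    """Decode the fixed 40-byte header found at the start of `data`."""
    if len(data) < struct.calcsize(HEADER_FORMAT):
        raise ValueError("header is shorter than 40 bytes")
    header = Header._make(struct.unpack_from(HEADER_FORMAT, data, 0))
    if header.file_id != b"SxG":
        raise ValueError("file ID is not 'SxG'")
    return header
```

The packaging format descriptions (row 18) follow at offset 40 and occupy pack_format_size bytes; they are not decoded here.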

2. Index of first bytes (octets)

The first-byte index consists of 4-byte numbers, one per first byte of an IP address, each containing the offset in the database for that first byte. Four bytes are used instead of three because 4-byte numbers are significantly faster to convert, at the cost of only a slight increase in size. The index contains 224 values, for first bytes 0 to 223, since addresses from 224 and above are reserved (the number of values can be increased up to 255).
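
A minimal sketch of reading this index follows. The function names and the `offset` parameter are hypothetical; where the index starts in the file is left to the caller, since it depends on the size of the packaging format descriptions that follow the header.

```python
import struct


def read_first_byte_index(data, offset, count=224):
    """Return `count` big-endian unsigned 4-byte values starting at `offset`."""
    return struct.unpack_from(">%dI" % count, data, offset)


def first_byte(ip):
    """Return the first (most significant) byte of a 32-bit IPv4 address."""
    return (ip >> 24) & 0xFF
```

With the index loaded, the entry for an address is index[first_byte(ip)], provided the first byte is below the number of elements given in the header.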

3. Main index

The main index divides the database into equal parts. The number of these fragments can vary and is equal to the "Elements in the main index" value in the header. Each index element is a 4-byte number holding the first IP of the corresponding database fragment. Each complete fragment contains "Blocks in one index element" ranges; the last fragment may be incomplete and contain fewer ranges.

The size of the main index is chosen so that the combined size of the header, the index of first bytes and the main index is a multiple of 4096, and so that the number of blocks in one index element does not exceed 2000. By default, the main index of a city database has 1756 elements. These values are recommended but not required.
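
To show how the main index narrows a lookup, the sketch below finds the fragment whose first IP is the largest one not exceeding the address, then converts the fragment number into a span of range numbers. The function names and parameters are hypothetical, and the IP is assumed to be given as a 32-bit integer.

```python
import bisect
import struct


def read_main_index(data, offset, count):
    """Return `count` big-endian unsigned 4-byte first-IP values."""
    return struct.unpack_from(">%dI" % count, data, offset)


def fragment_bounds(main_index, ip, blocks_per_element, range_count):
    """Return (first, end) range numbers of the fragment that holds `ip`.

    `end` is exclusive. The last fragment may hold fewer ranges, so `end`
    is capped at the total number of ranges.
    """
    # Last element whose first IP is <= ip; clamp to fragment 0.
    fragment = max(bisect.bisect_right(main_index, ip) - 1, 0)
    first = fragment * blocks_per_element
    return first, min(first + blocks_per_element, range_count)
```

Only the ranges inside the returned span then need to be searched.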

4. IP ranges

Each range is stored as 3 bytes (the starting IP of the range without its first byte) followed by 1 to 4 bytes that store the object ID or the offset of the entry in the directory. Range end addresses are not stored. During database creation, if “holes” are detected between two ranges, an empty range with an ID of 0 is automatically inserted between them. Thus the database covers every IP, and the end of the current range is always the start of the next one.

The ID size is given in the header (ID block size). Whether the ID is returned as is or used to look up a directory entry depends on the database type specified in the header.
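
The record layout above can be sketched as follows. The names read_range and find_id are hypothetical, and the search assumes the caller has already narrowed the candidates to records lo..hi-1 that share the first byte of the IP, with record lo starting at or below it.

```python
def read_range(ranges, number, id_size):
    """Decode range record `number` into (start_low24, object_id).

    Each record is 3 bytes of start IP (without the first byte)
    followed by `id_size` bytes of ID, both big-endian.
    """
    size = 3 + id_size
    pos = number * size
    start = int.from_bytes(ranges[pos:pos + 3], "big")
    object_id = int.from_bytes(ranges[pos + 3:pos + size], "big")
    return start, object_id


def find_id(ranges, lo, hi, ip, id_size):
    """Binary-search records lo..hi-1 for the last start <= ip's low 24 bits."""
    target = ip & 0xFFFFFF
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if read_range(ranges, mid, id_size)[0] <= target:
            lo = mid   # record mid starts at or before the address
        else:
            hi = mid   # record mid starts after the address
    return read_range(ranges, lo, id_size)[1]
```

Because range ends are not stored, the match is simply the last record that starts at or before the address; an ID of 0 means the address fell into an inserted empty range.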

5. Region directory

Regions are stored in the universal data packaging format. Starting at the received offset, a string of the maximum region entry size (specified in the header) is read and then unpacked using the format specified in the header.

The region directory is optional.

6. Country directory

Countries are stored in the universal data packaging format. Starting at the received offset, a string of the maximum country record size is read and then unpacked using the format specified in the header. The directory of countries is combined with the directory of cities (since there may be IPs for which the city is not defined).

The country directory is optional.

7. Directory of cities

Cities are stored in the universal data packaging format. Starting at the received offset, a string of the maximum city record size is read and then unpacked using the format specified in the header.

The city directory is optional.

Universal data packaging format

In version 2.2, a special format has been created for packaging data about cities, regions and countries. The following data types are supported.

Code  Type               Size  Description
t     tinyint signed     1     Integer from -128 to 127
T     tinyint unsigned   1     Integer from 0 to 255
s     smallint signed    2     Integer from -32,768 to 32,767
S     smallint unsigned  2     Integer from 0 to 65,535
m     mediumint signed   3     Integer from -8,388,608 to 8,388,607
M     mediumint unsigned 3     Integer from 0 to 16,777,215
i     integer signed     4     Integer from -2,147,483,648 to 2,147,483,647
I     integer unsigned   4     Integer from 0 to 4,294,967,295
f     float              4     4-byte single precision floating point number
d     double             8     8-byte double precision floating point number
n#    number 16bit       2     Number (2 bytes) with a fixed number of decimal places. The number of decimal places follows n
N#    number 32bit       4     Number (4 bytes) with a fixed number of decimal places. The number of decimal places follows N
c#    char               N     Fixed-size string. The number of characters follows c
b     blob               N     Null-terminated string
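
A decoder for these type codes could look like the sketch below. Several points are assumptions rather than part of this specification: the function name unpack_record, passing the record layout as a list of (code, name) pairs instead of parsing the description string from the header, treating n# and N# as signed integers divided by a power of ten, counting the c# length in bytes, and taking the byte order of packed fields as an explicit argument.

```python
import struct

# Integer sizes by type code; lower case is signed, upper case unsigned.
INT_SIZES = {"t": 1, "s": 2, "m": 3, "i": 4}


def unpack_record(fields, data, byteorder, encoding="utf-8"):
    """Decode `data` according to `fields`, a list of (code, name) pairs.

    `code` is a type code from the table, for example "T", "n2" or "c3".
    `byteorder` is "little" or "big" (an assumption left to the caller).
    """
    result, pos = {}, 0
    prefix = "<" if byteorder == "little" else ">"
    for code, name in fields:
        kind, arg = code[0], code[1:]
        if kind.lower() in INT_SIZES:
            size = INT_SIZES[kind.lower()]
            value = int.from_bytes(data[pos:pos + size], byteorder,
                                   signed=kind.islower())
        elif kind in ("f", "d"):
            size = 4 if kind == "f" else 8
            value = struct.unpack_from(prefix + kind, data, pos)[0]
        elif kind in ("n", "N"):
            size = 2 if kind == "n" else 4
            # Assumed: signed integer scaled by 10 ** (decimal places).
            raw = int.from_bytes(data[pos:pos + size], byteorder, signed=True)
            value = raw / 10 ** int(arg)
        elif kind == "c":
            size = int(arg)                  # assumed to be a byte count
            value = data[pos:pos + size].decode(encoding)
        elif kind == "b":
            end = data.index(b"\0", pos)     # position of the terminator
            size = end - pos + 1             # also skip the terminator
            value = data[pos:end].decode(encoding)
        else:
            raise ValueError("unknown type code: %r" % code)
        result[name] = value
        pos += size
    return result
```

Since a directory read returns a string of the maximum record size, the data passed in may be longer than the record itself; the decoder simply stops after the last listed field.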