The Sypex Geo database file version 2.1 (abbreviated as SxGeo v2.1) consists of:
- Header
- Index of first octets (bytes)
- Main index
- IP ranges
- Region directory (optional)
- Directory of cities (optional)
Let's take a closer look at them.
1. Header
The header holds 32 bytes of information about the contents of the database. Its structure, along with example values, is shown in the following table.
No. | Offset | Size (bytes) | Example value | Description |
---|---|---|---|---|
1 | 0 | 3 | SxG | File ID, "SxG" |
2 | 3 | 1 | 21 | File version (21 => 2.1) |
3 | 4 | 4 | 0 | Creation time (Unix timestamp) |
4 | 8 | 1 | 2 | Parser (0 - Universal, 1 - SxGeo Country, 2 - SxGeo City, 11 - GeoIP Country, 12 - GeoIP City, 21 - ipgeobase) |
5 | 9 | 1 | 0 | Encoding (0 - UTF-8, 1 - latin1, 2 - cp1251) |
6 | 10 | 1 | 224 | Elements in the index of the first bytes (up to 255) |
7 | 11 | 2 | 2840 | Elements in the main index (up to 65,535) |
8 | 13 | 2 | 1034 | Blocks in one index element (up to 65,535) |
9 | 15 | 4 | 1 877 608 | Number of ranges (up to 4,294,967,295) |
10 | 19 | 1 | 3 | ID block size in bytes (1 for countries, 3 for cities) |
11 | 20 | 2 | 46 | Maximum region entry size (up to 64 KB) |
12 | 22 | 2 | 51 | Maximum city record size (up to 64 KB) |
13 | 24 | 4 | 30 936 | Region directory size |
14 | 28 | 4 | 2 820 691 | City directory size |
Note: numbers are stored in big-endian byte order (most significant byte first). All numbers are unsigned integers.
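The table above maps directly onto a fixed binary layout. The following is a minimal sketch of decoding it in Python, assuming the 32 header bytes have already been read from the file. The field names in `SxGeoHeader` and the function name `parse_header` are illustrative choices and are not part of the format.

```python
import struct
from typing import NamedTuple

# ">" selects big-endian with no padding; the field sizes follow the table:
# 3s B I B B B H H I B H H I I  ->  3+1+4+1+1+1+2+2+4+1+2+2+4+4 = 32 bytes.
HEADER_FORMAT = ">3sBIBBBHHIBHHII"
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)


class SxGeoHeader(NamedTuple):
    """Field names are hypothetical; the order matches the header table."""
    file_id: bytes
    version: int
    created: int
    parser: int
    encoding: int
    first_byte_index_len: int
    main_index_len: int
    blocks_per_index_element: int
    range_count: int
    id_size: int
    max_region_size: int
    max_city_size: int
    region_dir_size: int
    city_dir_size: int


def parse_header(data: bytes) -> SxGeoHeader:
    """Decode the first 32 bytes of a database file."""
    if len(data) < HEADER_SIZE:
        raise ValueError("header is shorter than 32 bytes")
    header = SxGeoHeader(*struct.unpack(HEADER_FORMAT, data[:HEADER_SIZE]))
    if header.file_id != b"SxG":
        raise ValueError("missing 'SxG' file ID")
    return header


# Pack the example values from the table and decode them again.
example = struct.pack(HEADER_FORMAT, b"SxG", 21, 0, 2, 0, 224, 2840, 1034,
                      1877608, 3, 46, 51, 30936, 2820691)
header = parse_header(example)
print(header.version, header.range_count)
```

With a real file, the same function could be called on the result of `open(path, "rb").read(32)`.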
2. Index of first bytes (octets)
The first-byte index consists of 4-byte numbers, each giving the offset in the database for one value of the first byte of an IP address. 4 bytes were chosen instead of 3 because 4-byte numbers convert significantly faster, at the cost of only a slight increase in size. The index contains 224 values, for first bytes 0 to 223, since addresses from 224 upward are reserved (the number of values can be raised, up to 255).
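As an illustration, the first-byte index can be decoded in one call. This sketch assumes the index starts immediately after the 32-byte header, following the section order listed at the top of this document. It only decodes the values and does not interpret their unit, which is not stated here. The function name is hypothetical.

```python
import struct

HEADER_SIZE = 32  # bytes, per the header description


def read_first_byte_index(data: bytes, count: int) -> tuple:
    """Decode `count` big-endian 4-byte unsigned values that follow the header.

    `count` is the header field "Elements in the index of the first bytes".
    """
    end = HEADER_SIZE + 4 * count
    if len(data) < end:
        raise ValueError("data is too short for the first-byte index")
    return struct.unpack(f">{count}I", data[HEADER_SIZE:end])


# Tiny synthetic example: a zeroed header followed by three index values.
sample = bytes(HEADER_SIZE) + struct.pack(">3I", 0, 5, 9)
first_byte_index = read_first_byte_index(sample, 3)
print(first_byte_index)
```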
3. Main index
The main index divides the database into equal fragments. The number of fragments can vary and is equal to the value of "Elements in the main index" in the Header. Each index element consists of 4 bytes representing the first IP of the corresponding database fragment. Each complete fragment contains "Blocks in one index element" ranges; the last fragment may be incomplete and contain fewer ranges.
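Because each element holds the first IP of its fragment, the fragment that may contain a given address can be found with a binary search. The sketch below is one possible way to do this, not necessarily how the reference implementation works. The function names are hypothetical, and the index values are invented for the example.

```python
import bisect
import ipaddress
import struct


def parse_main_index(raw: bytes) -> list:
    """Decode the main index: consecutive big-endian 4-byte first IPs."""
    count = len(raw) // 4
    return list(struct.unpack(f">{count}I", raw[:count * 4]))


def find_fragment(main_index: list, ip: str) -> int:
    """Return the number of the last fragment whose first IP is <= ip."""
    ip_int = int(ipaddress.IPv4Address(ip))
    position = bisect.bisect_right(main_index, ip_int) - 1
    return max(position, 0)  # addresses below the first element map to fragment 0


# Synthetic index with three fragments starting at 1.0.0.0, 10.0.0.0, 192.0.0.0.
main_index = parse_main_index(struct.pack(">3I", 0x01000000, 0x0A000000, 0xC0000000))
print(find_fragment(main_index, "10.1.2.3"))
```

The ranges of fragment `n` would then start at range number `n` multiplied by "Blocks in one index element", under the assumption that fragments are laid out consecutively.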
The size of the main index is chosen so that the combined size of the Header, the index of first bytes, and the main index is a multiple of 4096, and so that the number of blocks in one index element is no more than 2000. By default, the main index of a database with cities has 2840 elements. These values are recommended but not required.
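The multiple-of-4096 property can be checked with the example values from the header table. This is a small arithmetic sketch; the constant and function names are chosen for illustration.

```python
HEADER_SIZE = 32       # bytes, from the header description
INDEX_ENTRY_SIZE = 4   # both indexes use 4-byte elements
PAGE = 4096


def index_area_size(first_byte_elements: int, main_index_elements: int) -> int:
    """Size in bytes of Header + index of first bytes + main index."""
    return HEADER_SIZE + INDEX_ENTRY_SIZE * (first_byte_elements + main_index_elements)


size = index_area_size(224, 2840)
print(size, size % PAGE == 0)  # 12288 True
```

With 224 and 2840 elements the three parts occupy 32 + 896 + 11360 = 12288 bytes, which is exactly 3 × 4096.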
4. Ranges
Each range is stored as 3 bytes (the starting IP of the range without its first byte) followed by 1 to 4 bytes holding either the object ID or the offset of the entry in a directory. Range end addresses are not stored. If “holes” are detected between two ranges while the database is being built, an empty range with an ID of 0 is automatically inserted between them. The database therefore covers every IP address, and the end of the current range is always the beginning of the next one.
The ID size is given in the Header (ID block size). Whether this ID is returned directly or used to look up an entry in a directory depends on the database type specified in the Header.
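Since only range starts are stored and the ranges cover every address, a lookup reduces to finding the last record whose start is not greater than the address. The following sketch assumes `ranges` is a slice of consecutive records that all belong to the same first byte as the address being looked up (the first byte itself is not stored). The function names and the sample records are hypothetical.

```python
def decode_range(record: bytes, id_size: int) -> tuple:
    """Split one record into (3-byte range start, ID or directory offset)."""
    start = int.from_bytes(record[:3], "big")
    value = int.from_bytes(record[3:3 + id_size], "big")
    return start, value


def search_ranges(ranges: bytes, ip_tail: bytes, id_size: int) -> int:
    """Binary search for the last record whose start is <= ip_tail.

    `ip_tail` is the last 3 bytes of the IPv4 address. Equal-length
    big-endian byte strings compare in the same order as their numbers.
    """
    block = 3 + id_size
    lo, hi = 0, len(ranges) // block
    while lo < hi:
        mid = (lo + hi) // 2
        if ranges[mid * block:mid * block + 3] <= ip_tail:
            lo = mid + 1
        else:
            hi = mid
    if lo == 0:
        return 0  # no range starts at or before the address
    return decode_range(ranges[(lo - 1) * block:lo * block], id_size)[1]


# Invented records with a 3-byte ID; ID 0 marks an automatically inserted gap.
records = [(0x000000, 0), (0x010200, 7), (0x050000, 0), (0x090909, 300)]
ranges = b"".join(s.to_bytes(3, "big") + v.to_bytes(3, "big") for s, v in records)
print(search_ranges(ranges, bytes([1, 2, 5]), 3))  # 7
```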
5. Directory of Regions
Regions are stored as null-terminated strings. Starting at the offset obtained during lookup, a string of the maximum region record size (given in the Header) is read; it is then split at the null character, and the required number of elements is returned.
The region directory is optional.
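The read-then-split procedure for region records can be sketched as follows. The function name, the number of fields, and the sample data are assumptions made for illustration; this document does not specify which fields a region record contains.

```python
def read_region(directory: bytes, offset: int, max_size: int,
                field_count: int, encoding: str = "utf-8") -> list:
    """Read up to `max_size` bytes at `offset` and return the first
    `field_count` null-terminated strings found there."""
    chunk = directory[offset:offset + max_size]
    parts = chunk.split(b"\x00")
    # Anything after the requested fields may belong to the next record
    # and is ignored.
    return [part.decode(encoding) for part in parts[:field_count]]


# Two invented records of two strings each, stored back to back.
region_directory = b"RU\x00Moscow\x00" + b"DE\x00Berlin\x00"
print(read_region(region_directory, 10, 46, 2))
```

Reading a fixed maximum size avoids scanning for terminators before reading, at the cost of sometimes reading a few bytes of the following record.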
6. Directory of cities
Cities are stored as numbers in binary form, followed by null-terminated strings. Starting at the offset obtained during lookup, a string of the maximum city record size (given in the Header) is read; a specified number of numbers is decoded from the beginning of the string, and the rest of the string is split at the null character.
The city directory is optional.
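A city record can be decoded in the same spirit: fixed-size numbers first, then strings. Since the exact set and width of the numeric fields is not given in this document, the sketch below takes the layout as a caller-supplied `struct` format string. The format `">IH"`, the function name, and the sample values are purely hypothetical.

```python
import struct


def read_city(directory: bytes, offset: int, max_size: int,
              numbers_format: str, string_count: int,
              encoding: str = "utf-8") -> tuple:
    """Read up to `max_size` bytes at `offset`, decode the leading binary
    numbers, then split the remainder at null characters."""
    chunk = directory[offset:offset + max_size]
    numbers_size = struct.calcsize(numbers_format)
    numbers = struct.unpack(numbers_format, chunk[:numbers_size])
    strings = chunk[numbers_size:].split(b"\x00")[:string_count]
    return numbers, [s.decode(encoding) for s in strings]


# Invented record: a 4-byte and a 2-byte big-endian number, then two strings.
city_directory = struct.pack(">IH", 524901, 77) + b"Moscow\x00Moskva\x00" + b"\x00\x00\x01"
numbers, names = read_city(city_directory, 0, 51, ">IH", 2)
print(numbers, names)
```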