Basic Documents — Fundamental Fields

The basic metadata values common to all documents collected by Bilby.

Field NameTypeExample/Possible ValuesDescription
id (basic)stringa4954fc1-0ffb-9b08-b092-f288d3ad51044772e669-953e-c7b6-9b1d-2b4d7e72b8daThe unique identifier of the document.
id (commodities)stringa4954fc1-0ffb-9b08-b092-f288d3ad51044772e669-953e-c7b6-9b1d-2b4d7e72b8da-aluminumThe unique identifier of the document.
id (gics)stringa4954fc1-0ffb-9b08-b092-f288d3ad51044772e669-953e-c7b6-9b1d-2b4d7e72b8da-35-health-careThe unique identifier of the document.
source_linestringofficial_lineThe source line of the document.
source_countrystringChinaThe source country of the document.
source_languagestringChineseThe source language of the document.
utc_datedate2023-03-04T00:00:00.000ZThe UTC date of the document.
published_attimestamp2023-03-04T00:00:00.000ZThe published date of the document.

id

Definition: A unique identifier for the document.

Example Values:

  1. For the basic dataset: a4954fc1-0ffb-9b08-b092-f288d3ad51044772e669-953e-c7b6-9b1d-2b4d7e72b8da
  2. For the commodities dataset: a4954fc1-0ffb-9b08-b092-f288d3ad51044772e669-953e-c7b6-9b1d-2b4d7e72b8da-aluminum
  3. For the gics dataset: a4954fc1-0ffb-9b08-b092-f288d3ad51044772e669-953e-c7b6-9b1d-2b4d7e72b8da-aluminum-60-real-estate

source_line

Each row in the API corresponds to a document. The source of the document is the official name of its publisher, e.g. People’s Daily. Bilby groups these sources into six source lines, listed in the table below. Sources within a common line play a similar role in the governance of China. For example, all sources within the ministry line are published by ministries, and all sources within the regulatory_line are published by regulation agencies.

LineDescription
official_lineOfficial media sources of the country.
regulatory_lineRegulatory agencies of the country that produce legal documents.
private_linePrivate media sources of the country.
ministryMinistries of the country.
SOEState-Owned Enterprises.
partyPolitical parties.
bankBanks.

source_country

Definition: The country that the source belongs to.

Possible values: Currently, the only value for this field is China.


source_language

Definition: The language of the original document.

Possible values: Currently, the only values for this field are Chinese and English.

Note: Currently, fewer than one percent of the documents are in English.


utc_date

Definition: The UTC date of the document is the date in UTC time zone.

Example value: 2023-03-04T00:00:00.000Z.


published_at

Definition: The published date of the document is the date in the original time zone of the document.

Example value: 2023-03-04T00:00:00.000Z.