Basic Documents — Fundamental Fields

These are the basic metadata fields common to all documents collected by Bilby.

| Field Name | Type | Example/Possible Values | Description |
| --- | --- | --- | --- |
| id (basic) | string | 614836f6732855d8b931dca9141665d7b405f8e402a1c6e68f008fe35e22414a | The unique identifier of the document. |
| id (commodities) | string | f627348a244f879e43d41bede462a8ecdca944836d1663160c19e4bfb3641a01-aluminum | The unique identifier of the document. |
| id (gics) | string | 65b5d92b7ec8dcf015ffdc0cf88a8fc3d47084422c3feab75fa7207c915d58b0-60-real-estate | The unique identifier of the document. |
| source_line | string | official_line | The source line of the document. |
| source_country | string | China | The source country of the document. |
| source_language | string | Chinese | The source language of the document. |
| utc_date | date | 2023-03-04T00:00:00.000Z | The UTC date of the document. |
| published_at | timestamp | 2023-03-04T00:00:00.000Z | The published date of the document. |

id

Definition: A unique identifier for the document.

Example Values:

  1. For the basic dataset: 614836f6732855d8b931dca9141665d7b405f8e402a1c6e68f008fe35e22414a
  2. For the commodities dataset: f627348a244f879e43d41bede462a8ecdca944836d1663160c19e4bfb3641a01-aluminum
  3. For the gics dataset: 65b5d92b7ec8dcf015ffdc0cf88a8fc3d47084422c3feab75fa7207c915d58b0-60-real-estate
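
The three example values share a visible shape: 64 hexadecimal characters, optionally followed by a hyphen and a dataset-specific suffix (aluminum, 60-real-estate). The following is a minimal sketch for splitting an id into those two parts. The pattern is inferred from the examples above only and is not a documented guarantee; the function name split_document_id is hypothetical.

```python
import re
from typing import Optional, Tuple

# Assumption: an id is 64 lowercase hex characters, optionally followed by
# "-<suffix>". This shape is inferred from the three example values only.
ID_PATTERN = re.compile(r"^([0-9a-f]{64})(?:-(.+))?$")


def split_document_id(doc_id: str) -> Tuple[str, Optional[str]]:
    """Return (hash_part, suffix); suffix is None when the id has none."""
    match = ID_PATTERN.match(doc_id)
    if match is None:
        raise ValueError(f"unrecognised id format: {doc_id!r}")
    return match.group(1), match.group(2)
```

Under this assumption, a basic id yields no suffix, while commodities and gics ids yield the text after the first hyphen.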

source_line

Each row returned by the API corresponds to a document. The source of a document is the official name of its publisher, e.g. People’s Daily. Bilby groups these sources into the seven source lines listed in the table below. Sources within the same line play a similar role in the governance of China. For example, all sources in the ministry line are published by ministries, and all sources in the regulatory_line line are published by regulatory agencies.

| Line | Description |
| --- | --- |
| official_line | Official media sources of the country. |
| regulatory_line | Regulatory agencies of the country that produce legal documents. |
| private_line | Private media sources of the country. |
| ministry | Ministries of the country. |
| SOE | State-Owned Enterprises. |
| party | Political parties. |
| bank | Banks. |
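
Because source_line takes one of a small set of values, it is convenient to check a requested value before filtering on it. The following is a minimal sketch, assuming documents have already been fetched and are held as a list of dictionaries keyed by field name. That in-memory representation and the function name filter_by_source_line are assumptions for illustration, not part of the API.

```python
from typing import Dict, Iterable, List

# The source line values listed in the table above.
SOURCE_LINES = frozenset(
    {
        "official_line",
        "regulatory_line",
        "private_line",
        "ministry",
        "SOE",
        "party",
        "bank",
    }
)


def filter_by_source_line(
    documents: Iterable[Dict[str, str]], source_line: str
) -> List[Dict[str, str]]:
    """Keep only the documents whose source_line equals the requested value."""
    if source_line not in SOURCE_LINES:
        raise ValueError(f"unknown source_line: {source_line!r}")
    return [doc for doc in documents if doc.get("source_line") == source_line]
```

Rejecting unknown values early catches typos such as "ministry_line", which would otherwise silently return an empty result.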

source_country

Definition: The country that the source belongs to.

Possible values: Currently, the only value for this field is China.


source_language

Definition: The language of the original document.

Possible values: Currently, the only values for this field are Chinese and English.

Note: Currently, fewer than one percent of the documents are in English.


utc_date

Definition: The date of the document, expressed in the UTC time zone.

Example value: 2023-03-04T00:00:00.000Z.


published_at

Definition: The publication date of the document, expressed in the document's original time zone.

Example value: 2023-03-04T00:00:00.000Z.
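
Both utc_date and published_at example values use the same text format, with millisecond precision and a trailing Z. The following is a minimal sketch for parsing such a value with the Python standard library. It assumes every value follows the format of the example shown and treats the trailing Z as the ISO 8601 UTC designator; the function name parse_timestamp is hypothetical.

```python
from datetime import datetime, timezone

# Assumption: values always look like the example, e.g. 2023-03-04T00:00:00.000Z.
TIMESTAMP_FORMAT = "%Y-%m-%dT%H:%M:%S.%fZ"


def parse_timestamp(value: str) -> datetime:
    """Parse a timestamp string into a timezone-aware datetime."""
    parsed = datetime.strptime(value, TIMESTAMP_FORMAT)
    # strptime matches the literal "Z" but returns a naive datetime,
    # so the UTC offset is attached explicitly.
    return parsed.replace(tzinfo=timezone.utc)
```

A value in any other format raises ValueError, which makes departures from the assumed format visible instead of silently misparsed.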