source string | english_cnl string | spanish_cnl string | french_cnl string | mandarin_cnl string | hindi_cnl string | portuguese_cnl string | ids dict | roundtrip_python_by_language dict | source_stats dict | source_metadata dict | semantic_family dict | engine dict | row_sha256 string |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
import numpy as np
import pandas as pd
from pandas import Series, DataFrame
from scipy.spatial import distance
import matplotlib.pyplot as plt
from sklearn.cluster import DBSCAN
from sklearn import metrics
from sklearn.datasets.samples_generator import make_blobs
from sklearn.preprocessing import StandardScaler
from s... | Load numpy, referred to as np.
Load pandas, referred to as pd.
Load Series, DataFrame from pandas.
Load distance from scipy.spatial.
Load matplotlib.pyplot, referred to as plt.
Load DBSCAN from sklearn.cluster.
Load metrics from sklearn.
Load make_blobs from sklearn.datasets.samples_generator.
Load StandardScaler from ... | Importar numpy, referido como np.
Importar pandas, referido como pd.
Importar Series, DataFrame desde pandas.
Importar distance desde scipy.spatial.
Importar matplotlib.pyplot, referido como plt.
Importar DBSCAN desde sklearn.cluster.
Importar metrics desde sklearn.
Importar make_blobs desde sklearn.datasets.samples_ge... | Charger numpy, référé comme np.
Charger pandas, référé comme pd.
Charger Series, DataFrame depuis pandas.
Charger distance depuis scipy.spatial.
Charger matplotlib.pyplot, référé comme plt.
Charger DBSCAN depuis sklearn.cluster.
Charger metrics depuis sklearn.
Charger make_blobs depuis sklearn.datasets.samples_generato... | 导入 numpy, 别名为 np.
导入 pandas, 别名为 pd.
导入 Series, DataFrame 从 pandas.
导入 distance 从 scipy 点 spatial.
导入 matplotlib 点 pyplot, 别名为 plt.
导入 DBSCAN 从 sklearn 点 cluster.
导入 metrics 从 sklearn.
导入 make_blobs 从 sklearn 点 datasets 点 samples_generator.
导入 StandardScaler 从 sklearn 点 preprocessing.
导入 decomposition 从 sklearn. # PCA
... | आयात करें numpy को np के रूप में.
आयात करें pandas को pd के रूप में.
pandas से आयात करें Series, DataFrame.
scipy बिंदु spatial से आयात करें distance.
आयात करें matplotlib बिंदु pyplot को plt के रूप में.
sklearn बिंदु cluster से आयात करें DBSCAN.
sklearn से आयात करें metrics.
sklearn बिंदु datasets बिंदु samples_genera... | Importar numpy, com o nome np.
Importar pandas, com o nome pd.
Importar Series, DataFrame de pandas.
Importar distance de scipy atributo spatial.
Importar matplotlib atributo pyplot, com o nome plt.
Importar DBSCAN de sklearn atributo cluster.
Importar metrics de sklearn.
Importar make_blobs de sklearn atributo dataset... | {
"row_id": "pc_cnl100k_000000_a0e561623699",
"python_source_id": "pc_src100k_000000_a0e561623699",
"python_source_index": 0,
"run_id": "run_20260530_225930_utc",
"created_utc": "2026-05-30T22:59:48"
} | {
"en": "import numpy as np\nimport pandas as pd\nfrom pandas import Series, DataFrame\nfrom scipy.spatial import distance\nimport matplotlib.pyplot as plt\n\nfrom sklearn.cluster import DBSCAN\nfrom sklearn import metrics\nfrom sklearn.datasets.samples_generator import make_blobs\nfrom sklearn.preprocessing import S... | {
"sha256": "a0e56162369928d35e38e19a24ee57376c55158ef8f01353e0c45273aa9f3ab9",
"char_count": 1450,
"byte_count": 1450,
"line_count": 51,
"compile_ok": true,
"compile_error": null
} | {
"repo_name": "banacer/door-wiz",
"path": "src/identification/Identifier.py",
"license": "mit",
"language": "Python",
"size": 1449,
"source_dataset": "codeparrot/github-code-clean",
"source_config": "Python-mit",
"source_split": "train",
"source_label": "github_code_clean_python_mit_single",
"sourc... | {
"families": [
"classes",
"functions",
"imports"
],
"primary_family": "functions",
"ast_node_count": 123,
"ast_top_counts": {
"Name": 18,
"Constant": 17,
"alias": 15,
"Load": 14,
"ImportFrom": 9,
"Assign": 9,
"Store": 9,
"Import": 5,
"arg": 5,
"keyword": 4,... | {
"projection_api": "plaincode.api.project_python_dataset_rows",
"requested_languages": [
"en",
"es",
"fr",
"pt",
"zh",
"hi"
],
"row_schema": "source-six-cnl-columns-selected-metadata-v2",
"public_cnl_contains_private_cst_transport": false
} | a98953afebb6e20b13b963bcc135c1d2440cfe77a5fd95c67d6e0724447719ad |
"""
********************************************************************
Test file for implementation check of CR3BP library.
********************************************************************
Last update: 21/01/2022
Description
-----------
Contains a few sample orbit propagations to test the CR3BP library.
... | Text block:
""
"********************************************************************"
" Test file for implementation check of CR3BP library."
"********************************************************************"
""
"Last update: 21/01/2022"
""
"Description"
"-----------"
"Contains a few sample orbit propagations ... | Texto literal:
""
"********************************************************************"
" Test file for implementation check of CR3BP library."
"********************************************************************"
""
"Last update: 21/01/2022"
""
"Description"
"-----------"
"Contains a few sample orbit propagatio... | Texte littéral:
""
"********************************************************************"
" Test file for implementation check of CR3BP library."
"********************************************************************"
""
"Last update: 21/01/2022"
""
"Description"
"-----------"
"Contains a few sample orbit propagati... | 文本块:
""
"********************************************************************"
" Test file for implementation check of CR3BP library."
"********************************************************************"
""
"Last update: 21/01/2022"
""
"Description"
"-----------"
"Contains a few sample orbit propagations to test... | मूल्यांकन करें पाठ खंड:
""
"********************************************************************"
" Test file for implementation check of CR3BP library."
"********************************************************************"
""
"Last update: 21/01/2022"
""
"Description"
"-----------"
"Contains a few sample orbit p... | Expressão bloco de texto:
""
"********************************************************************"
" Test file for implementation check of CR3BP library."
"********************************************************************"
""
"Last update: 21/01/2022"
""
"Description"
"-----------"
"Contains a few sample orbit... | {
"row_id": "pc_cnl100k_000001_12984d458a6d",
"python_source_id": "pc_src100k_000001_12984d458a6d",
"python_source_index": 1,
"run_id": "run_20260530_225930_utc",
"created_utc": "2026-05-30T22:59:48"
} | {
"en": "\"\"\"\n********************************************************************\n Test file for implementation check of CR3BP library.\n********************************************************************\n\nLast update: 21/01/2022\n\nDescription\n-----------\nContains a few sample orbit propagations to te... | {
"sha256": "12984d458a6d72940bdc4d5140cb9b3d103421964bd8595bfe443f2a5f378e9a",
"char_count": 6058,
"byte_count": 6058,
"line_count": 219,
"compile_ok": true,
"compile_error": null
} | {
"repo_name": "poliastro/poliastro",
"path": "contrib/CR3BP/test_run_CR3BP.py",
"license": "mit",
"language": "Python",
"size": 6277,
"source_dataset": "codeparrot/github-code-clean",
"source_config": "Python-mit",
"source_split": "train",
"source_label": "github_code_clean_python_mit_single",
"sou... | {
"families": [
"imports"
],
"primary_family": "imports",
"ast_node_count": 1029,
"ast_top_counts": {
"Load": 285,
"Name": 198,
"Constant": 114,
"Attribute": 79,
"Call": 77,
"Store": 59,
"Assign": 42,
"Expr": 35,
"Tuple": 34,
"Subscript": 24,
"Slice": 24,
"B... | {
"projection_api": "plaincode.api.project_python_dataset_rows",
"requested_languages": [
"en",
"es",
"fr",
"pt",
"zh",
"hi"
],
"row_schema": "source-six-cnl-columns-selected-metadata-v2",
"public_cnl_contains_private_cst_transport": false
} | 938613df3cfd0f2fc6ab847dfadc87401a5cec072b62af0fb3fd8a3953f852ad |
"#!/usr/bin/env python\n# -*- coding: utf-8 -*-\n\nfrom runner.koan import *\n\nclass AboutIteration(...TRUNCATED) | "# !/usr/bin/env python\n# -*- coding: utf-8 -*-\nLoad everything from runner.koan.\nDefine class Ab(...TRUNCATED) | "# !/usr/bin/env python\n# -*- coding: utf-8 -*-\nImportar todo desde runner.koan.\nDefinir clase Ab(...TRUNCATED) | "# !/usr/bin/env python\n# -*- coding: utf-8 -*-\nCharger tout depuis runner.koan.\nDéfinir classe (...TRUNCATED) | "# !/usr/bin/env python\n# -*- coding: utf-8 -*-\n从以下内容导入全部 runner 点 koan.\n定(...TRUNCATED) | "# !/usr/bin/env python\n# -*- coding: utf-8 -*-\nrunner बिंदु koan से सब कु(...TRUNCATED) | "# !/usr/bin/env python\n# -*- coding: utf-8 -*-\nImportar tudo de runner atributo koan.\nDefinir cl(...TRUNCATED) | {"row_id":"pc_cnl100k_000002_1b67b0b77bce","python_source_id":"pc_src100k_000002_1b67b0b77bce","pyth(...TRUNCATED) | {"en":"#!/usr/bin/env python\n# -*- coding: utf-8 -*-\n\nfrom runner.koan import *\n\nclass AboutIte(...TRUNCATED) | {"sha256":"1b67b0b77bce8b6371a7a96811177081e402c030c72bccb2aa3d57426ade105e","char_count":3923,"byte(...TRUNCATED) | {"repo_name":"bohdan7/python_koans","path":"python3/koans/about_iteration.py","license":"mit","langu(...TRUNCATED) | {"families":["classes","exceptions","functions","imports","loops"],"primary_family":"functions","ast(...TRUNCATED) | {"projection_api":"plaincode.api.project_python_dataset_rows","requested_languages":["en","es","fr",(...TRUNCATED) | 0e11b0b4997340cff972d393c6c55832f6059176e2b49346c0e579d052526ebf |
"from api_request import Api\nfrom util import Util\nfrom twocheckout import Twocheckout\n\n\nclass (...TRUNCATED) | "Load Api from api_request.\nLoad Util from util.\nLoad Twocheckout from twocheckout.\nDefine class (...TRUNCATED) | "Importar Api desde api_request.\nImportar Util desde util.\nImportar Twocheckout desde twocheckout.(...TRUNCATED) | "Charger Api depuis api_request.\nCharger Util depuis util.\nCharger Twocheckout depuis twocheckout.(...TRUNCATED) | "导入 Api 从 api_request.\n导入 Util 从 util.\n导入 Twocheckout 从 twocheckout.\n定义类 (...TRUNCATED) | "api_request से आयात करें Api.\nutil से आयात करें Util.\ntwo(...TRUNCATED) | "Importar Api de api_request.\nImportar Util de util.\nImportar Twocheckout de twocheckout.\nDefinir(...TRUNCATED) | {"row_id":"pc_cnl100k_000003_fc5f29b7278c","python_source_id":"pc_src100k_000003_fc5f29b7278c","pyth(...TRUNCATED) | {"en":"from api_request import Api\nfrom util import Util\nfrom twocheckout import Twocheckout\n\n\n(...TRUNCATED) | {"sha256":"fc5f29b7278cd68344637d8a0a6ece80950a2b3987bde96c2976b225b8fae0e9","char_count":3388,"byte(...TRUNCATED) | {"repo_name":"2Checkout/2checkout-python","path":"twocheckout/sale.py","license":"mit","language":"P(...TRUNCATED) | {"families":["classes","functions","imports","loops"],"primary_family":"functions","ast_node_count":(...TRUNCATED) | {"projection_api":"plaincode.api.project_python_dataset_rows","requested_languages":["en","es","fr",(...TRUNCATED) | 1b1b7dd19b28d1aada2a1589230b1ed2879a7b1e2dbdc19e9af4cab9efcda79c |
"import json\nimport os\n\nfrom flask import request, g, render_template, make_response, jsonify, Re(...TRUNCATED) | "Load json.\nLoad os.\nLoad request, g, render_template, make_response, jsonify, Response from flask(...TRUNCATED) | "Importar json.\nImportar os.\nImportar request, g, render_template, make_response, jsonify, Respons(...TRUNCATED) | "Charger json.\nCharger os.\nCharger request, g, render_template, make_response, jsonify, Response d(...TRUNCATED) | "导入 json.\n导入 os.\n导入 request, g, render_template, make_response, jsonify, Response 从 (...TRUNCATED) | "आयात करें json.\nआयात करें os.\nflask से आयात करे(...TRUNCATED) | "Importar json.\nImportar os.\nImportar request, g, render_template, make_response, jsonify, Respons(...TRUNCATED) | {"row_id":"pc_cnl100k_000004_632cf16b365e","python_source_id":"pc_src100k_000004_632cf16b365e","pyth(...TRUNCATED) | {"en":"import json\nimport os\n\nfrom flask import request, g, render_template, make_response, jsoni(...TRUNCATED) | {"sha256":"632cf16b365e99c71fc66362442c2d8899ecfe30076aaf203488d4fc3f1fa59f","char_count":4697,"byte(...TRUNCATED) | {"repo_name":"CenterForOpenScience/scinet","path":"scinet/views.py","license":"mit","language":"Pyth(...TRUNCATED) | {"families":["exceptions","functions","imports"],"primary_family":"functions","ast_node_count":517,"(...TRUNCATED) | {"projection_api":"plaincode.api.project_python_dataset_rows","requested_languages":["en","es","fr",(...TRUNCATED) | 823e27051e5e5217a7ad6988bc723e7d4a534f02319bc5955187a812ba7c6e02 |
"from corecat.constants import OBJECT_CODES, MODEL_VERSION\nfrom ._sqlalchemy import Base, CoreCatBa(...TRUNCATED) | "Load OBJECT_CODES, MODEL_VERSION from corecat.constants.\nLoad Base, CoreCatBaseMixin from the curr(...TRUNCATED) | "Importar OBJECT_CODES, MODEL_VERSION desde corecat.constants.\nImportar Base, CoreCatBaseMixin desd(...TRUNCATED) | "Charger OBJECT_CODES, MODEL_VERSION depuis corecat.constants.\nCharger Base, CoreCatBaseMixin depui(...TRUNCATED) | "导入 OBJECT_CODES, MODEL_VERSION 从 corecat 点 constants.\n导入 Base, CoreCatBaseMixin 从当(...TRUNCATED) | "corecat बिंदु constants से आयात करें OBJECT_CODES, MODEL_VERSION.\nth(...TRUNCATED) | "Importar OBJECT_CODES, MODEL_VERSION de corecat atributo constants.\nImportar Base, CoreCatBaseMixi(...TRUNCATED) | {"row_id":"pc_cnl100k_000005_c6ce1032252b","python_source_id":"pc_src100k_000005_c6ce1032252b","pyth(...TRUNCATED) | {"en":"from corecat.constants import OBJECT_CODES, MODEL_VERSION\nfrom ._sqlalchemy import Base, Cor(...TRUNCATED) | {"sha256":"c6ce1032252b4239564dee0b9077e8c9f1960fb1fdb37a9e5878b75ad2e48d19","char_count":1533,"byte(...TRUNCATED) | {"repo_name":"DanceCats/CoreCat","path":"corecat/models/project.py","license":"mit","language":"Pyth(...TRUNCATED) | {"families":["classes","functions","imports"],"primary_family":"functions","ast_node_count":104,"ast(...TRUNCATED) | {"projection_api":"plaincode.api.project_python_dataset_rows","requested_languages":["en","es","fr",(...TRUNCATED) | 9684f850160ef18de57b25a1307ae5c116bc2e947cba48987355777be216c768 |
"#!/usr/bin/env python\nfrom ansible.module_utils.hashivault import hashivault_argspec\nfrom ansible(...TRUNCATED) | "# !/usr/bin/env python\nLoad hashivault_argspec from ansible.module_utils.hashivault.\nLoad hashiva(...TRUNCATED) | "# !/usr/bin/env python\nImportar hashivault_argspec desde ansible.module_utils.hashivault.\nImporta(...TRUNCATED) | "# !/usr/bin/env python\nCharger hashivault_argspec depuis ansible.module_utils.hashivault.\nCharger(...TRUNCATED) | "# !/usr/bin/env python\n导入 hashivault_argspec 从 ansible 点 module_utils 点 hashivault.\n导(...TRUNCATED) | "# !/usr/bin/env python\nansible बिंदु module_utils बिंदु hashivault से आ(...TRUNCATED) | "# !/usr/bin/env python\nImportar hashivault_argspec de ansible atributo module_utils atributo hashi(...TRUNCATED) | {"row_id":"pc_cnl100k_000006_a0e9729e2c0c","python_source_id":"pc_src100k_000006_a0e9729e2c0c","pyth(...TRUNCATED) | {"en":"#!/usr/bin/env python\nfrom ansible.module_utils.hashivault import hashivault_argspec\nfrom a(...TRUNCATED) | {"sha256":"a0e9729e2c0c522c08ad9bbb0675686f62c4c54042284552560d733a8e82c311","char_count":1659,"byte(...TRUNCATED) | {"repo_name":"TerryHowe/ansible-modules-hashivault","path":"ansible/modules/hashivault/hashivault_ap(...TRUNCATED) | {"families":["functions","imports"],"primary_family":"functions","ast_node_count":162,"ast_top_count(...TRUNCATED) | {"projection_api":"plaincode.api.project_python_dataset_rows","requested_languages":["en","es","fr",(...TRUNCATED) | 93ac92aa70628792793e51544ef38d84a841abef4152d37a7ef45a60c993efa8 |
"from flask import Blueprint, request, render_template\nfrom ..load import processing_results\nfrom (...TRUNCATED) | "Load Blueprint, request, render_template from flask.\nLoad processing_results from the parent packa(...TRUNCATED) | "Importar Blueprint, request, render_template desde flask.\nImportar processing_results desde el paq(...TRUNCATED) | "Charger Blueprint, request, render_template depuis flask.\nCharger processing_results depuis le paq(...TRUNCATED) | "导入 Blueprint, request, render_template 从 flask.\n导入 processing_results 从父包 点 load(...TRUNCATED) | "flask से आयात करें Blueprint, request, render_template.\nthe parent पैक(...TRUNCATED) | "Importar Blueprint, request, render_template de flask.\nImportar processing_results do pacote pai a(...TRUNCATED) | {"row_id":"pc_cnl100k_000007_3169340bb559","python_source_id":"pc_src100k_000007_3169340bb559","pyth(...TRUNCATED) | {"en":"from flask import Blueprint, request, render_template\nfrom ..load import processing_results\(...TRUNCATED) | {"sha256":"3169340bb5595134af8e94f6e05283fa2a406fb147f6574dcb864b98d5df2328","char_count":1108,"byte(...TRUNCATED) | {"repo_name":"griimick/feature-mlsite","path":"app/liner/views.py","license":"mit","language":"Pytho(...TRUNCATED) | {"families":["functions","imports","loops"],"primary_family":"functions","ast_node_count":180,"ast_t(...TRUNCATED) | {"projection_api":"plaincode.api.project_python_dataset_rows","requested_languages":["en","es","fr",(...TRUNCATED) | 3d752d2079d186e624d73ab69e0e98d82c01e3889fb3903aa1ee2695a690f15d |
"import asyncio\nimport discord\nimport datetime\nimport pytz\nfrom discord.ext import commands\nf(...TRUNCATED) | "Load asyncio.\nLoad discord.\nLoad datetime.\nLoad pytz.\nLoad commands from discord.ext.\nLoad Fuz(...TRUNCATED) | "Importar asyncio.\nImportar discord.\nImportar datetime.\nImportar pytz.\nImportar commands desde d(...TRUNCATED) | "Charger asyncio.\nCharger discord.\nCharger datetime.\nCharger pytz.\nCharger commands depuis disco(...TRUNCATED) | "导入 asyncio.\n导入 discord.\n导入 datetime.\n导入 pytz.\n导入 commands 从 discord 点 e(...TRUNCATED) | "आयात करें asyncio.\nआयात करें discord.\nआयात करें d(...TRUNCATED) | "Importar asyncio.\nImportar discord.\nImportar datetime.\nImportar pytz.\nImportar commands de disc(...TRUNCATED) | {"row_id":"pc_cnl100k_000008_ae3a9eb785a6","python_source_id":"pc_src100k_000008_ae3a9eb785a6","pyth(...TRUNCATED) | {"en":"import asyncio\nimport discord\nimport datetime\nimport pytz\nfrom discord.ext import comma(...TRUNCATED) | {"sha256":"ae3a9eb785a6d7efce90a5bd045df7fdb9aa0b84eaa746434172fb1e49618b98","char_count":8197,"byte(...TRUNCATED) | {"repo_name":"TheMasterGhost/CorpBot","path":"Cogs/Time.py","license":"mit","language":"Python","siz(...TRUNCATED) | {"families":["async","classes","exceptions","functions","imports","loops"],"primary_family":"functio(...TRUNCATED) | {"projection_api":"plaincode.api.project_python_dataset_rows","requested_languages":["en","es","fr",(...TRUNCATED) | 0ab3975b008a70fd4edaf1e6702c9cae4528eabeff74ea14470f7506327646ee |
"import unittest\n\nfrom katas.beta.what_color_is_your_name import string_color\n\n\nclass StringCol(...TRUNCATED) | "Load unittest.\nLoad string_color from katas.beta.what_color_is_your_name.\nDefine class StringColo(...TRUNCATED) | "Importar unittest.\nImportar string_color desde katas.beta.what_color_is_your_name.\nDefinir clase (...TRUNCATED) | "Charger unittest.\nCharger string_color depuis katas.beta.what_color_is_your_name.\nDéfinir classe(...TRUNCATED) | "导入 unittest.\n导入 string_color 从 katas 点 beta 点 what_color_is_your_name.\n定义类 St(...TRUNCATED) | "आयात करें unittest.\nkatas बिंदु beta बिंदु what_color_is_your_(...TRUNCATED) | "Importar unittest.\nImportar string_color de katas atributo beta atributo what_color_is_your_name.\(...TRUNCATED) | {"row_id":"pc_cnl100k_000009_94d665abb9d5","python_source_id":"pc_src100k_000009_94d665abb9d5","pyth(...TRUNCATED) | {"en":"import unittest\n\nfrom katas.beta.what_color_is_your_name import string_color\n\n\nclass Str(...TRUNCATED) | {"sha256":"94d665abb9d572969769335cb2a36992d39b1abfb05af081fe72fc666d6cb428","char_count":656,"byte_(...TRUNCATED) | {"repo_name":"the-zebulan/CodeWars","path":"tests/beta_tests/test_what_color_is_your_name.py","licen(...TRUNCATED) | {"families":["classes","functions","imports"],"primary_family":"functions","ast_node_count":93,"ast_(...TRUNCATED) | {"projection_api":"plaincode.api.project_python_dataset_rows","requested_languages":["en","es","fr",(...TRUNCATED) | 686146bd6bdc22b3dfb6eb3455e21ddc1141d2039a42b718d1dd488af4c6d151 |
Plaincode CNL 100k
This dataset is generated with Plaincode by streaming Python source files, projecting each accepted Python module or function into the current controlled-natural-language (CNL) surfaces, and enforcing a strict, exact Python round-trip proof for every emitted language. The public row schema is one wide row per accepted Python source: the source first, then the English, Spanish, French, Mandarin, Hindi, and Portuguese CNL columns, then selected metadata.
Included surfaces
- Python source
- English CNL (en)
- Spanish CNL (es)
- French CNL (fr)
- Portuguese CNL (pt)
- Mandarin CNL (zh)
- Hindi CNL (hi)
Acceptance gate
A row is accepted only when every requested CNL surface reverses back to the exact original Python bytes through Plaincode proof. The proof details are used as an internal gate, but they are not stored in each public JSON row.
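The gate described above can be sketched as a comparison between the original source and the Python text recovered from each CNL surface. This is only an illustration: the actual Plaincode proof is internal, and the function names (`roundtrip_proof`, `accept_row`) and result keys below are assumptions made for this example. The step that turns CNL back into Python is not shown; the sketch takes the recovered text as input.

```python
import hashlib


def roundtrip_proof(source: str, reconstructed: str) -> dict:
    """Compare original Python text with text recovered from one CNL surface.

    Hypothetical helper: the key names are illustrative, not Plaincode's API.
    """
    src_bytes = source.encode("utf-8")
    rev_bytes = reconstructed.encode("utf-8")
    proof = {
        "char_exact_ok": source == reconstructed,
        "byte_exact_ok": src_bytes == rev_bytes,
        "sha_exact_ok": hashlib.sha256(src_bytes).hexdigest()
        == hashlib.sha256(rev_bytes).hexdigest(),
    }
    # A surface passes only if every individual check passes.
    proof["exact_ok"] = all(proof.values())
    return proof


def accept_row(source: str, reconstructed_by_language: dict) -> bool:
    """Accept a row only if every requested language reverses exactly."""
    return all(
        roundtrip_proof(source, text)["exact_ok"]
        for text in reconstructed_by_language.values()
    )


source = "import unittest\n"
good = {"en": source, "es": source}
bad = {"en": source, "es": "import unittest"}  # trailing newline lost
print(accept_row(source, good), accept_row(source, bad))
```

Because the check is all-or-nothing across languages, a single surface that drops even a trailing newline rejects the whole row.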
Schema
Each row is ordered for readability and training use:
- source: the original Python source
- english_cnl: English Plaincode CNL
- spanish_cnl: Spanish Plaincode CNL
- french_cnl: French Plaincode CNL
- mandarin_cnl: Mandarin Plaincode CNL
- hindi_cnl: Hindi Plaincode CNL
- portuguese_cnl: Portuguese Plaincode CNL
- ids: row/source/run identifiers
- roundtrip_python_by_language: exact reconstructed Python per CNL language
- source_stats: source hash/count/compile summary
- source_metadata: upstream source metadata where available
- semantic_family: broad code-shape label
- engine: Plaincode generation metadata
- row_sha256: stable hash for the public row
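As a rough sketch of how rows with this schema might be read, the snippet below parses a gzip-compressed JSONL stream using only the standard library and re-checks that each reconstructed Python text matches `source`. The shard is built in memory so the example runs without downloading anything. The helper name `iter_rows` and the sample row are assumptions for illustration, and the sample carries only a few of the documented columns.

```python
import gzip
import io
import json


def iter_rows(fileobj):
    """Yield one dict per non-empty line of a gzip-compressed JSONL stream."""
    with gzip.open(fileobj, mode="rt", encoding="utf-8") as lines:
        for line in lines:
            if line.strip():
                yield json.loads(line)


# Hypothetical one-row shard; a real shard row has more columns.
sample_row = {
    "source": "import unittest\n",
    "english_cnl": "Load unittest.",
    "spanish_cnl": "Importar unittest.",
    "roundtrip_python_by_language": {
        "en": "import unittest\n",
        "es": "import unittest\n",
    },
}
payload = gzip.compress((json.dumps(sample_row) + "\n").encode("utf-8"))

for row in iter_rows(io.BytesIO(payload)):
    recovered = row["roundtrip_python_by_language"]
    # Languages whose reconstructed Python differs from the source (expected: none).
    mismatches = [lang for lang, text in recovered.items() if text != row["source"]]
    print(row["english_cnl"], mismatches)
```

The same `iter_rows` helper should also accept a path to a local shard file, since `gzip.open` takes either a filename or a file object.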
Generation stats
{
"accepted_python_sources": 100000,
"accepted_rows": 100000,
"completed": true,
"completed_shards": [
"train-00000.jsonl.gz",
"train-00001.jsonl.gz",
"train-00002.jsonl.gz",
"train-00003.jsonl.gz",
"train-00004.jsonl.gz",
"train-00005.jsonl.gz",
"train-00006.jsonl.gz",
"train-00007.jsonl.gz",
"train-00008.jsonl.gz",
"train-00009.jsonl.gz",
"train-00010.jsonl.gz",
"train-00011.jsonl.gz",
"train-00012.jsonl.gz",
"train-00013.jsonl.gz",
"train-00014.jsonl.gz",
"train-00015.jsonl.gz",
"train-00016.jsonl.gz",
"train-00017.jsonl.gz",
"train-00018.jsonl.gz",
"train-00019.jsonl.gz"
],
"duplicates": 0,
"family_counts": {
"async": 1201,
"classes": 51727,
"comprehensions": 16570,
"context_managers": 11998,
"exceptions": 25579,
"functions": 76419,
"generators": 3523,
"imports": 91065,
"lambda": 6296,
"loops": 38775,
"simple_module": 2564
},
"filter_reasons": {
"auto_generated": 6812,
"compile_error:UnicodeEncodeError": 2,
"skipped_path_pattern": 6786,
"syntax_error:\"\\ \" is an invalid escape sequence. Did you mean \"\\\\ \"? A raw string is also an option.": 84,
"syntax_error:\"\\!\" is an invalid escape sequence. Did you mean \"\\\\!\"? A raw string is also an option.": 5,
"syntax_error:\"\\#\" is an invalid escape sequence. Did you mean \"\\\\#\"? A raw string is also an option.": 11,
"syntax_error:\"\\$\" is an invalid escape sequence. Did you mean \"\\\\$\"? A raw string is also an option.": 27,
"syntax_error:\"\\%\" is an invalid escape sequence. Did you mean \"\\\\%\"? A raw string is also an option.": 13,
"syntax_error:\"\\&\" is an invalid escape sequence. Did you mean \"\\\\&\"? A raw string is also an option.": 7,
"syntax_error:\"\\(\" is an invalid escape sequence. Did you mean \"\\\\(\"? A raw string is also an option.": 124,
"syntax_error:\"\\)\" is an invalid escape sequence. Did you mean \"\\\\)\"? A raw string is also an option.": 4,
"syntax_error:\"\\*\" is an invalid escape sequence. Did you mean \"\\\\*\"? A raw string is also an option.": 60,
"syntax_error:\"\\+\" is an invalid escape sequence. Did you mean \"\\\\+\"? A raw string is also an option.": 19,
"syntax_error:\"\\,\" is an invalid escape sequence. Did you mean \"\\\\,\"? A raw string is also an option.": 9,
"syntax_error:\"\\-\" is an invalid escape sequence. Did you mean \"\\\\-\"? A raw string is also an option.": 55,
"syntax_error:\"\\.\" is an invalid escape sequence. Did you mean \"\\\\.\"? A raw string is also an option.": 239,
"syntax_error:\"\\/\" is an invalid escape sequence. Did you mean \"\\\\/\"? A raw string is also an option.": 66,
"syntax_error:\"\\9\" is an invalid escape sequence. Did you mean \"\\\\9\"? A raw string is also an option.": 1,
"syntax_error:\"\\:\" is an invalid escape sequence. Did you mean \"\\\\:\"? A raw string is also an option.": 15,
"syntax_error:\"\\;\" is an invalid escape sequence. Did you mean \"\\\\;\"? A raw string is also an option.": 11,
"syntax_error:\"\\<\" is an invalid escape sequence. Did you mean \"\\\\<\"? A raw string is also an option.": 13,
"syntax_error:\"\\=\" is an invalid escape sequence. Did you mean \"\\\\=\"? A raw string is also an option.": 5,
"syntax_error:\"\\>\" is an invalid escape sequence. Did you mean \"\\\\>\"? A raw string is also an option.": 2,
"syntax_error:\"\\?\" is an invalid escape sequence. Did you mean \"\\\\?\"? A raw string is also an option.": 23,
"syntax_error:\"\\@\" is an invalid escape sequence. Did you mean \"\\\\@\"? A raw string is also an option.": 4,
"syntax_error:\"\\A\" is an invalid escape sequence. Did you mean \"\\\\A\"? A raw string is also an option.": 13,
"syntax_error:\"\\B\" is an invalid escape sequence. Did you mean \"\\\\B\"? A raw string is also an option.": 2,
"syntax_error:\"\\C\" is an invalid escape sequence. Did you mean \"\\\\C\"? A raw string is also an option.": 3,
"syntax_error:\"\\D\" is an invalid escape sequence. Did you mean \"\\\\D\"? A raw string is also an option.": 23,
"syntax_error:\"\\E\" is an invalid escape sequence. Did you mean \"\\\\E\"? A raw string is also an option.": 5,
"syntax_error:\"\\F\" is an invalid escape sequence. Did you mean \"\\\\F\"? A raw string is also an option.": 3,
"syntax_error:\"\\G\" is an invalid escape sequence. Did you mean \"\\\\G\"? A raw string is also an option.": 4,
"syntax_error:\"\\H\" is an invalid escape sequence. Did you mean \"\\\\H\"? A raw string is also an option.": 3,
"syntax_error:\"\\I\" is an invalid escape sequence. Did you mean \"\\\\I\"? A raw string is also an option.": 6,
"syntax_error:\"\\L\" is an invalid escape sequence. Did you mean \"\\\\L\"? A raw string is also an option.": 3,
"syntax_error:\"\\M\" is an invalid escape sequence. Did you mean \"\\\\M\"? A raw string is also an option.": 6,
"syntax_error:\"\\O\" is an invalid escape sequence. Did you mean \"\\\\O\"? A raw string is also an option.": 1,
"syntax_error:\"\\P\" is an invalid escape sequence. Did you mean \"\\\\P\"? A raw string is also an option.": 20,
"syntax_error:\"\\R\" is an invalid escape sequence. Did you mean \"\\\\R\"? A raw string is also an option.": 1,
"syntax_error:\"\\S\" is an invalid escape sequence. Did you mean \"\\\\S\"? A raw string is also an option.": 49,
"syntax_error:\"\\T\" is an invalid escape sequence. Did you mean \"\\\\T\"? A raw string is also an option.": 8,
"syntax_error:\"\\V\" is an invalid escape sequence. Did you mean \"\\\\V\"? A raw string is also an option.": 3,
"syntax_error:\"\\W\" is an invalid escape sequence. Did you mean \"\\\\W\"? A raw string is also an option.": 57,
"syntax_error:\"\\Z\" is an invalid escape sequence. Did you mean \"\\\\Z\"? A raw string is also an option.": 3,
"syntax_error:\"\\[\" is an invalid escape sequence. Did you mean \"\\\\[\"? A raw string is also an option.": 104,
"syntax_error:\"\\]\" is an invalid escape sequence. Did you mean \"\\\\]\"? A raw string is also an option.": 10,
"syntax_error:\"\\^\" is an invalid escape sequence. Did you mean \"\\\\^\"? A raw string is also an option.": 3,
"syntax_error:\"\\_\" is an invalid escape sequence. Did you mean \"\\\\_\"? A raw string is also an option.": 35,
"syntax_error:\"\\`\" is an invalid escape sequence. Did you mean \"\\\\`\"? A raw string is also an option.": 4,
"syntax_error:\"\\c\" is an invalid escape sequence. Did you mean \"\\\\c\"? A raw string is also an option.": 27,
"syntax_error:\"\\d\" is an invalid escape sequence. Did you mean \"\\\\d\"? A raw string is also an option.": 436,
"syntax_error:\"\\e\" is an invalid escape sequence. Did you mean \"\\\\e\"? A raw string is also an option.": 9,
"syntax_error:\"\\g\" is an invalid escape sequence. Did you mean \"\\\\g\"? A raw string is also an option.": 18,
"syntax_error:\"\\h\" is an invalid escape sequence. Did you mean \"\\\\h\"? A raw string is also an option.": 11,
"syntax_error:\"\\i\" is an invalid escape sequence. Did you mean \"\\\\i\"? A raw string is also an option.": 25,
"syntax_error:\"\\j\" is an invalid escape sequence. Did you mean \"\\\\j\"? A raw string is also an option.": 1,
"syntax_error:\"\\k\" is an invalid escape sequence. Did you mean \"\\\\k\"? A raw string is also an option.": 1,
"syntax_error:\"\\l\" is an invalid escape sequence. Did you mean \"\\\\l\"? A raw string is also an option.": 33,
"syntax_error:\"\\m\" is an invalid escape sequence. Did you mean \"\\\\m\"? A raw string is also an option.": 43,
"syntax_error:\"\\o\" is an invalid escape sequence. Did you mean \"\\\\o\"? A raw string is also an option.": 18,
"syntax_error:\"\\p\" is an invalid escape sequence. Did you mean \"\\\\p\"? A raw string is also an option.": 28,
"syntax_error:\"\\q\" is an invalid escape sequence. Did you mean \"\\\\q\"? A raw string is also an option.": 1,
"syntax_error:\"\\s\" is an invalid escape sequence. Did you mean \"\\\\s\"? A raw string is also an option.": 371,
"syntax_error:\"\\u\" is an invalid escape sequence. Did you mean \"\\\\u\"? A raw string is also an option.": 2,
"syntax_error:\"\\w\" is an invalid escape sequence. Did you mean \"\\\\w\"? A raw string is also an option.": 180,
"syntax_error:\"\\y\" is an invalid escape sequence. Did you mean \"\\\\y\"? A raw string is also an option.": 1,
"syntax_error:\"\\z\" is an invalid escape sequence. Did you mean \"\\\\z\"? A raw string is also an option.": 1,
"syntax_error:\"\\{\" is an invalid escape sequence. Did you mean \"\\\\{\"? A raw string is also an option.": 31,
"syntax_error:\"\\|\" is an invalid escape sequence. Did you mean \"\\\\|\"? A raw string is also an option.": 19,
"syntax_error:\"\\}\" is an invalid escape sequence. Did you mean \"\\\\}\"? A raw string is also an option.": 1,
"syntax_error:\"is not\" with 'bytes' literal. Did you mean \"!=\"?": 1,
"syntax_error:\"is not\" with 'int' literal. Did you mean \"!=\"?": 55,
"syntax_error:\"is not\" with 'str' literal. Did you mean \"!=\"?": 54,
"syntax_error:\"is not\" with 'tuple' literal. Did you mean \"!=\"?": 2,
"syntax_error:\"is\" with 'bytes' literal. Did you mean \"==\"?": 3,
"syntax_error:\"is\" with 'int' literal. Did you mean \"==\"?": 150,
"syntax_error:\"is\" with 'str' literal. Did you mean \"==\"?": 157,
"syntax_error:\"is\" with 'tuple' literal. Did you mean \"==\"?": 2,
"syntax_error:'(' was never closed": 329,
"syntax_error:':' expected after dictionary key": 1,
"syntax_error:'[' was never closed": 9,
"syntax_error:'await' outside async function": 1,
"syntax_error:'bool' object is not callable; perhaps you missed a comma?": 2,
"syntax_error:'break' in a 'finally' block": 3,
"syntax_error:'return' in a 'finally' block": 54,
"syntax_error:'return' outside function": 7,
"syntax_error:'str' object is not callable; perhaps you missed a comma?": 5,
"syntax_error:'tuple' object is not callable; perhaps you missed a comma?": 2,
"syntax_error:'u' and 'r' prefixes are incompatible": 74,
"syntax_error:'yield from' inside async function": 1,
"syntax_error:(unicode error) 'unicodeescape' codec can't decode bytes in position 0-1: malformed \\N character escape": 2,
"syntax_error:(unicode error) 'unicodeescape' codec can't decode bytes in position 0-1: truncated \\UXXXXXXXX escape": 1,
"syntax_error:(unicode error) 'unicodeescape' codec can't decode bytes in position 0-1: truncated \\uXXXX escape": 2,
"syntax_error:(unicode error) 'unicodeescape' codec can't decode bytes in position 11-22: unknown Unicode character name": 1,
"syntax_error:(unicode error) 'unicodeescape' codec can't decode bytes in position 114-115: truncated \\xXX escape": 1,
"syntax_error:(unicode error) 'unicodeescape' codec can't decode bytes in position 13-14: malformed \\N character escape": 1,
"syntax_error:(unicode error) 'unicodeescape' codec can't decode bytes in position 14-15: truncated \\UXXXXXXXX escape": 1,
"syntax_error:(unicode error) 'unicodeescape' codec can't decode bytes in position 2-3: truncated \\UXXXXXXXX escape": 11,
"syntax_error:(unicode error) 'unicodeescape' codec can't decode bytes in position 23-24: truncated \\UXXXXXXXX escape": 1,
"syntax_error:(unicode error) 'unicodeescape' codec can't decode bytes in position 280-282: truncated \\xXX escape": 1,
"syntax_error:(unicode error) 'unicodeescape' codec can't decode bytes in position 294-295: truncated \\uXXXX escape": 1,
"syntax_error:(unicode error) 'unicodeescape' codec can't decode bytes in position 40-41: truncated \\uXXXX escape": 2,
"syntax_error:(unicode error) 'unicodeescape' codec can't decode bytes in position 41-42: truncated \\uXXXX escape": 1,
"syntax_error:(unicode error) 'unicodeescape' codec can't decode bytes in position 62-63: truncated \\UXXXXXXXX escape": 1,
"syntax_error:* argument may appear only once": 1,
"syntax_error:Expected one or more names after 'import'": 3,
"syntax_error:Function parameters cannot be parenthesized": 83,
"syntax_error:Generator expression must be parenthesized": 2,
"syntax_error:Lambda expression parameters cannot be parenthesized": 60,
"syntax_error:Missing parentheses in call to 'exec'. Did you mean exec(...)?": 51,
"syntax_error:Missing parentheses in call to 'print'. Did you mean print(...)?": 13128,
"syntax_error:assertion is always true, perhaps remove parentheses?": 6,
"syntax_error:bytes can only contain ASCII literal characters": 1,
"syntax_error:can't use starred expression here": 1,
"syntax_error:cannot assign to True": 5,
"syntax_error:cannot assign to attribute here. Maybe you meant '==' instead of '='?": 2,
"syntax_error:cannot assign to expression here. Maybe you meant '==' instead of '='?": 1,
"syntax_error:cannot mix bytes and nonbytes literals": 2,
"syntax_error:cannot use except statement with tuple": 2,
"syntax_error:closing parenthesis ')' does not match opening parenthesis '['": 1,
"syntax_error:closing parenthesis ')' does not match opening parenthesis '[' on line 184": 1,
"syntax_error:closing parenthesis ')' does not match opening parenthesis '{' on line 125": 1,
"syntax_error:closing parenthesis ')' does not match opening parenthesis '{' on line 201": 1,
"syntax_error:closing parenthesis ')' does not match opening parenthesis '{' on line 53": 1,
"syntax_error:closing parenthesis ')' does not match opening parenthesis '{' on line 64": 1,
"syntax_error:closing parenthesis ']' does not match opening parenthesis '('": 4,
"syntax_error:closing parenthesis ']' does not match opening parenthesis '(' on line 10": 1,
"syntax_error:closing parenthesis ']' does not match opening parenthesis '(' on line 21": 1,
"syntax_error:closing parenthesis ']' does not match opening parenthesis '(' on line 27": 1,
"syntax_error:closing parenthesis ']' does not match opening parenthesis '(' on line 42": 1,
"syntax_error:closing parenthesis '}' does not match opening parenthesis '(' on line 126": 1,
"syntax_error:closing parenthesis '}' does not match opening parenthesis '(' on line 13": 1,
"syntax_error:closing parenthesis '}' does not match opening parenthesis '(' on line 202": 1,
"syntax_error:closing parenthesis '}' does not match opening parenthesis '(' on line 22": 1,
"syntax_error:closing parenthesis '}' does not match opening parenthesis '(' on line 255": 1,
"syntax_error:closing parenthesis '}' does not match opening parenthesis '(' on line 52": 1,
"syntax_error:closing parenthesis '}' does not match opening parenthesis '(' on line 85": 1,
"syntax_error:closing parenthesis '}' does not match opening parenthesis '['": 1,
"syntax_error:did you forget parentheses around the comprehension target?": 1,
"syntax_error:expected '('": 11,
"syntax_error:expected ':'": 42,
"syntax_error:expected 'else' after 'if' expression": 1,
"syntax_error:expected 'except' or 'finally' block": 3,
"syntax_error:expected an indented block after 'elif' statement on line 13": 1,
"syntax_error:expected an indented block after 'else' statement on line 86": 1,
"syntax_error:expected an indented block after 'for' statement on line 1": 1,
"syntax_error:expected an indented block after 'for' statement on line 135": 1,
"syntax_error:expected an indented block after 'for' statement on line 21": 2,
"syntax_error:expected an indented block after 'for' statement on line 238": 1,
"syntax_error:expected an indented block after 'for' statement on line 27": 1,
"syntax_error:expected an indented block after 'for' statement on line 32": 1,
"syntax_error:expected an indented block after 'for' statement on line 42": 1,
"syntax_error:expected an indented block after 'for' statement on line 45": 1,
"syntax_error:expected an indented block after 'for' statement on line 5": 2,
"syntax_error:expected an indented block after 'for' statement on line 57": 1,
"syntax_error:expected an indented block after 'for' statement on line 88": 1,
"syntax_error:expected an indented block after 'for' statement on line 9": 1,
"syntax_error:expected an indented block after 'if' statement on line 1": 1,
"syntax_error:expected an indented block after 'if' statement on line 10": 1,
"syntax_error:expected an indented block after 'if' statement on line 12": 2,
"syntax_error:expected an indented block after 'if' statement on line 20": 1,
"syntax_error:expected an indented block after 'if' statement on line 209": 1,
"syntax_error:expected an indented block after 'if' statement on line 24": 1,
"syntax_error:expected an indented block after 'if' statement on line 32": 1,
"syntax_error:expected an indented block after 'if' statement on line 33": 1,
"syntax_error:expected an indented block after 'if' statement on line 39": 1,
"syntax_error:expected an indented block after 'if' statement on line 42": 1,
"syntax_error:expected an indented block after 'if' statement on line 49": 1,
"syntax_error:expected an indented block after 'if' statement on line 5": 2,
"syntax_error:expected an indented block after 'if' statement on line 50": 1,
"syntax_error:expected an indented block after 'if' statement on line 78": 1,
"syntax_error:expected an indented block after 'try' statement on line 23": 1,
"syntax_error:expected an indented block after 'while' statement on line 105": 1,
"syntax_error:expected an indented block after 'while' statement on line 25": 1,
"syntax_error:expected an indented block after 'with' statement on line 72": 1,
"syntax_error:expected an indented block after class definition on line 19": 1,
"syntax_error:expected an indented block after class definition on line 3": 2,
"syntax_error:expected an indented block after class definition on line 5": 1,
"syntax_error:expected an indented block after class definition on line 6": 2,
"syntax_error:expected an indented block after class definition on line 7": 3,
"syntax_error:expected an indented block after class definition on line 8": 1,
"syntax_error:expected an indented block after class definition on line 82": 1,
"syntax_error:expected an indented block after function definition on line 1": 1,
"syntax_error:expected an indented block after function definition on line 11": 2,
"syntax_error:expected an indented block after function definition on line 13": 2,
"syntax_error:expected an indented block after function definition on line 154": 1,
"syntax_error:expected an indented block after function definition on line 16": 1,
"syntax_error:expected an indented block after function definition on line 17": 3,
"syntax_error:expected an indented block after function definition on line 2": 1,
"syntax_error:expected an indented block after function definition on line 21": 1,
"syntax_error:expected an indented block after function definition on line 25": 2,
"syntax_error:expected an indented block after function definition on line 251": 1,
"syntax_error:expected an indented block after function definition on line 3": 1,
"syntax_error:expected an indented block after function definition on line 30": 1,
"syntax_error:expected an indented block after function definition on line 32": 1,
"syntax_error:expected an indented block after function definition on line 33": 1,
"syntax_error:expected an indented block after function definition on line 35": 1,
"syntax_error:expected an indented block after function definition on line 38": 1,
"syntax_error:expected an indented block after function definition on line 4": 5,
"syntax_error:expected an indented block after function definition on line 47": 1,
"syntax_error:expected an indented block after function definition on line 5": 2,
"syntax_error:expected an indented block after function definition on line 6": 3,
"syntax_error:expected an indented block after function definition on line 63": 2,
"syntax_error:expected an indented block after function definition on line 68": 1,
"syntax_error:expected an indented block after function definition on line 7": 1,
"syntax_error:expected an indented block after function definition on line 73": 1,
"syntax_error:expected an indented block after function definition on line 74": 1,
"syntax_error:expected an indented block after function definition on line 9": 1,
"syntax_error:expected argument value expression": 3,
"syntax_error:expression cannot contain assignment, perhaps you meant \"==\"?": 1,
"syntax_error:expression expected after dictionary key and ':'": 1,
"syntax_error:from __future__ imports must occur at the beginning of the file": 5,
"syntax_error:future feature relative_imports is not defined": 1,
"syntax_error:import * only allowed at module level": 6,
"syntax_error:inconsistent use of tabs and spaces in indentation": 572,
"syntax_error:invalid character '§' (U+00A7)": 1,
"syntax_error:invalid character '©' (U+00A9)": 1,
"syntax_error:invalid character '·' (U+00B7)": 2,
"syntax_error:invalid character '–' (U+2013)": 1,
"syntax_error:invalid character '‘' (U+2018)": 2,
"syntax_error:invalid character '’' (U+2019)": 1,
"syntax_error:invalid character '…' (U+2026)": 1,
"syntax_error:invalid character '→' (U+2192)": 1,
"syntax_error:invalid character ',' (U+FF0C)": 1,
"syntax_error:invalid character ':' (U+FF1A)": 1,
"syntax_error:invalid decimal literal": 125,
"syntax_error:invalid hexadecimal literal": 17,
"syntax_error:invalid non-printable character U+00A0": 2,
"syntax_error:invalid syntax": 1043,
"syntax_error:invalid syntax. Is this intended to be part of the string?": 4,
"syntax_error:invalid syntax. Maybe you meant '==' or ':=' instead of '='?": 14,
"syntax_error:invalid syntax. Perhaps you forgot a comma?": 99,
"syntax_error:leading zeros in decimal integer literals are not permitted; use an 0o prefix for octal integers": 139,
"syntax_error:list indices must be integers or slices, not tuple; perhaps you missed a comma?": 1,
"syntax_error:name 'CONFIG' is assigned to before global declaration": 1,
"syntax_error:name 'i' is assigned to before global declaration": 1,
"syntax_error:name 'imageSizeX' is used prior to global declaration": 1,
"syntax_error:name 'logo_box' is assigned to before global declaration": 1,
"syntax_error:name 'time' is used prior to global declaration": 1,
"syntax_error:named arguments must follow bare *": 1,
"syntax_error:parameter without a default follows parameter with a default": 5,
"syntax_error:positional argument follows keyword argument": 3,
"syntax_error:trailing comma not allowed without surrounding parentheses": 1,
"syntax_error:unexpected EOF while parsing": 9,
"syntax_error:unexpected character after line continuation character": 2,
"syntax_error:unexpected indent": 60,
"syntax_error:unindent does not match any outer indentation level": 24,
"syntax_error:unmatched ')'": 104,
"syntax_error:unmatched ']'": 2,
"syntax_error:unmatched '}'": 3,
"syntax_error:unterminated string literal (detected at line 10)": 3,
"syntax_error:unterminated string literal (detected at line 104)": 1,
"syntax_error:unterminated string literal (detected at line 107); perhaps you escaped the end quote?": 1,
"syntax_error:unterminated string literal (detected at line 116)": 1,
"syntax_error:unterminated string literal (detected at line 12); perhaps you escaped the end quote?": 1,
"syntax_error:unterminated string literal (detected at line 13)": 3,
"syntax_error:unterminated string literal (detected at line 13); perhaps you escaped the end quote?": 1,
"syntax_error:unterminated string literal (detected at line 14)": 2,
"syntax_error:unterminated string literal (detected at line 15)": 2,
"syntax_error:unterminated string literal (detected at line 16)": 1,
"syntax_error:unterminated string literal (detected at line 178)": 1,
"syntax_error:unterminated string literal (detected at line 191)": 1,
"syntax_error:unterminated string literal (detected at line 22)": 1,
"syntax_error:unterminated string literal (detected at line 226)": 1,
"syntax_error:unterminated string literal (detected at line 241)": 1,
"syntax_error:unterminated string literal (detected at line 29)": 1,
"syntax_error:unterminated string literal (detected at line 3)": 2,
"syntax_error:unterminated string literal (detected at line 6)": 1,
"syntax_error:unterminated string literal (detected at line 64)": 1,
"syntax_error:unterminated string literal (detected at line 7)": 2,
"syntax_error:unterminated string literal (detected at line 70)": 1,
"syntax_error:unterminated string literal (detected at line 78)": 1,
"syntax_error:unterminated string literal (detected at line 79)": 1,
"syntax_error:unterminated string literal (detected at line 8)": 1,
"syntax_error:unterminated string literal (detected at line 91)": 1,
"syntax_error:unterminated string literal (detected at line 92)": 1,
"syntax_error:unterminated string literal (detected at line 93)": 1,
"syntax_error:unterminated string literal (detected at line 98)": 2,
"syntax_error:unterminated triple-quoted f-string literal (detected at line 145)": 1,
"syntax_error:unterminated triple-quoted string literal (detected at line 103)": 1,
"syntax_error:unterminated triple-quoted string literal (detected at line 107)": 1,
"syntax_error:unterminated triple-quoted string literal (detected at line 108)": 1,
"syntax_error:unterminated triple-quoted string literal (detected at line 119)": 1,
"syntax_error:unterminated triple-quoted string literal (detected at line 132)": 1,
"syntax_error:unterminated triple-quoted string literal (detected at line 145)": 1,
"syntax_error:unterminated triple-quoted string literal (detected at line 15)": 1,
"syntax_error:unterminated triple-quoted string literal (detected at line 16)": 2,
"syntax_error:unterminated triple-quoted string literal (detected at line 170)": 1,
"syntax_error:unterminated triple-quoted string literal (detected at line 171)": 2,
"syntax_error:unterminated triple-quoted string literal (detected at line 185)": 1,
"syntax_error:unterminated triple-quoted string literal (detected at line 19)": 1,
"syntax_error:unterminated triple-quoted string literal (detected at line 193)": 1,
"syntax_error:unterminated triple-quoted string literal (detected at line 215)": 1,
"syntax_error:unterminated triple-quoted string literal (detected at line 260)": 1,
"syntax_error:unterminated triple-quoted string literal (detected at line 316)": 1,
"syntax_error:unterminated triple-quoted string literal (detected at line 383)": 1,
"syntax_error:unterminated triple-quoted string literal (detected at line 389)": 1,
"syntax_error:unterminated triple-quoted string literal (detected at line 394)": 3,
"syntax_error:unterminated triple-quoted string literal (detected at line 41)": 1,
"syntax_error:unterminated triple-quoted string literal (detected at line 43)": 1,
"syntax_error:unterminated triple-quoted string literal (detected at line 430)": 1,
"syntax_error:unterminated triple-quoted string literal (detected at line 57)": 1,
"syntax_error:unterminated triple-quoted string literal (detected at line 80)": 1,
"syntax_error:unterminated triple-quoted string literal (detected at line 81)": 1,
"syntax_error:unterminated triple-quoted string literal (detected at line 87)": 1,
"syntax_error:unterminated triple-quoted string literal (detected at line 90)": 1,
"syntax_error:unterminated triple-quoted string literal (detected at line 92)": 1,
"syntax_error:unterminated triple-quoted string literal (detected at line 96)": 1,
"too_large": 9890,
"too_many_ast_nodes": 39,
"too_many_lines": 677,
"too_small": 3022
},
"filtered_candidates": 46439,
"finished_utc": "2026-05-31T01:47:30+00:00",
"hf_repo_id": "CircularBalls/plaincode-cnl-100k",
"languages": [
"en",
"es",
"fr",
"pt",
"zh",
"hi"
],
"projection_failures": 82,
"resume_accepted_rows": 0,
"resume_skipped_accepted_rows": 0,
"resume_start_shard": 0,
"row_schema": "source-six-cnl-columns-selected-metadata-v2",
"run_id": "run_20260530_225930_utc",
"seen_candidates": 146521,
"shards_completed": 20,
"source_label_counts": {
"github_code_clean_python_mit_single": 100000
},
"source_specs": [
{
"config": "Python-mit",
"dataset": "codeparrot/github-code-clean",
"enabled": true,
"label": "github_code_clean_python_mit_single",
"split": "train",
"trust_remote_code": false,
"weight": 1
}
],
"started_utc": "2026-05-30T22:59:32+00:00",
"strict_gate_failures": 0,
"target_accepted_rows": 100000,
"updated_utc": "2026-05-31T01:47:30+00:00"
}
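A few of the counters above can be cross-checked with simple arithmetic. The sketch below copies selected values from the stats and confirms that accepted rows, filtered candidates, projection failures, and strict-gate failures add up to the number of candidates seen. Treat that identity as an observation about these particular numbers rather than a documented invariant of the generator. It also sums `family_counts`, which exceeds the row count, which is what one would expect if a row can carry more than one family label.

```python
# Values copied from the generation stats; only the fields needed here.
stats = {
    "seen_candidates": 146521,
    "accepted_rows": 100000,
    "filtered_candidates": 46439,
    "projection_failures": 82,
    "strict_gate_failures": 0,
    "family_counts": {
        "async": 1201,
        "classes": 51727,
        "comprehensions": 16570,
        "context_managers": 11998,
        "exceptions": 25579,
        "functions": 76419,
        "generators": 3523,
        "imports": 91065,
        "lambda": 6296,
        "loops": 38775,
        "simple_module": 2564,
    },
}

# Every seen candidate appears to end up in exactly one outcome bucket.
accounted = (
    stats["accepted_rows"]
    + stats["filtered_candidates"]
    + stats["projection_failures"]
    + stats["strict_gate_failures"]
)
acceptance_rate = stats["accepted_rows"] / stats["seen_candidates"]

# Family labels overlap, so their total is larger than the number of rows.
family_total = sum(stats["family_counts"].values())
labels_per_row = family_total / stats["accepted_rows"]

print(accounted == stats["seen_candidates"])
print(f"acceptance rate: {acceptance_rate:.3f}")
print(f"family labels per row: {labels_per_row:.2f}")
```

Roughly two thirds of the streamed candidates were accepted; most rejections came from files that do not parse under the Python version used for the run, with the missing-parentheses `print` error alone accounting for 13,128 of them.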
Intended use
Training/evaluating semantic projection, code-to-CNL, CNL-to-code, multilingual controlled-language alignment, and exact roundtrip experiments.
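For the code-to-CNL and CNL-to-code uses, one possible way to turn a wide row into training pairs is sketched below. The column names follow the schema listed in this card; the `make_pairs` helper, the `direction` values, and the output keys (`task`, `input`, `target`) are assumptions chosen for this example, not part of the dataset.

```python
CNL_COLUMNS = [
    "english_cnl",
    "spanish_cnl",
    "french_cnl",
    "mandarin_cnl",
    "hindi_cnl",
    "portuguese_cnl",
]


def make_pairs(row: dict, direction: str = "code_to_cnl") -> list:
    """Expand one wide row into one (input, target) pair per available CNL column."""
    if direction not in ("code_to_cnl", "cnl_to_code"):
        raise ValueError(f"unknown direction: {direction}")
    pairs = []
    for column in CNL_COLUMNS:
        cnl = row.get(column)
        if not cnl:
            continue  # skip columns that are missing or empty in this row
        if direction == "code_to_cnl":
            pairs.append({"task": f"python->{column}", "input": row["source"], "target": cnl})
        else:
            pairs.append({"task": f"{column}->python", "input": cnl, "target": row["source"]})
    return pairs


# Hypothetical row with only two CNL columns filled in.
row = {
    "source": "import unittest\n",
    "english_cnl": "Load unittest.",
    "spanish_cnl": "Importar unittest.",
}
for pair in make_pairs(row, direction="cnl_to_code"):
    print(pair["task"], repr(pair["input"]))
```

Since every CNL column in a row is stated to reverse to the same source, pairs built from different languages share one Python target, which is convenient for multilingual alignment experiments.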
Source/license note
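Rows retain source metadata where available, including license fields exposed by upstream datasets. The default generator configuration prioritizes permissively licensed Python source configs; according to the generation stats, all 100,000 rows in this run come from the Python-mit config of codeparrot/github-code-clean. Review upstream licenses before redistribution or commercial use.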