The model is trained on a large Parquet dataset over a long run, instead of on a single text file for a few minutes. The Llama3 tokenizer is used. Generate Llama3 weights that can be loaded by any ...