The model is trained on a large Parquet dataset over an extended period, rather than on a single text file for a few minutes. The Llama 3 tokenizer is used. Generates Llama 3 weights which can be loaded by any ...