HILANCO

Hungarian Intelligent Language Applications Consortium


HIL-ALBERT

ALBERT is a language model that applies parameter-reduction techniques to BERT in order to speed up training and reduce memory consumption; it was introduced by Lan et al. (2020). We present two pre-trained uncased ALBERT models: one was trained on the Hungarian Wikipedia subset of the Webcorpus 2.0 dataset, while the other was trained on a sample of the NYTI-BERT corpus containing approximately 10% of the whole dataset.

We used Google's SentencePiece for tokenization with a vocabulary size of 30,000 tokens. The models were trained with Masked Language Modeling but without Next Sentence Prediction, and our code was based on the Hugging Face library. Training was performed on 4 GTX 1080Ti GPU cards with the batch size set to 32. The model trained on the NYTI-BERT sample was trained for a single epoch, which took approximately 85 hours; the model trained on the Wikipedia corpus was trained for 2 epochs, which took about 54 hours.
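
For illustration, below is a minimal sketch of such a pre-training setup using the Hugging Face Transformers and Datasets libraries. The file names (hu_sentencepiece.model, hu_corpus.txt) and the output directory are hypothetical placeholders, and the actual HIL-ALBERT training scripts may differ in detail.

from datasets import load_dataset
from transformers import (
    AlbertConfig,
    AlbertForMaskedLM,
    AlbertTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

# SentencePiece model with a 30,000-token vocabulary; the path is a placeholder.
tokenizer = AlbertTokenizer(vocab_file="hu_sentencepiece.model")

# Randomly initialized ALBERT trained with Masked Language Modeling only
# (no next-sentence objective).
config = AlbertConfig(vocab_size=30000)
model = AlbertForMaskedLM(config)

# Plain-text training corpus (e.g. the Wikipedia part of Webcorpus 2.0); placeholder path.
dataset = load_dataset("text", data_files={"train": "hu_corpus.txt"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = dataset["train"].map(tokenize, batched=True, remove_columns=["text"])

# MLM collator that masks 15% of the input tokens.
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=True, mlm_probability=0.15)

args = TrainingArguments(
    output_dir="hil-albert",          # placeholder output directory
    per_device_train_batch_size=32,   # batch size of 32, as reported above
    num_train_epochs=1,               # 1 epoch for the NYTI-BERT sample, 2 for Wikipedia
)

trainer = Trainer(
    model=model,
    args=args,
    data_collator=collator,
    train_dataset=tokenized,
)
trainer.train()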

To DOWNLOAD the models, please fill out the registration form: » REGISTRATION FORM «
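
Once a model has been downloaded, it can be loaded through the standard from_pretrained interface of the Hugging Face library; the local directory name below is a hypothetical placeholder for wherever the checkpoint is unpacked.

from transformers import AlbertForMaskedLM, AlbertTokenizer, pipeline

# Placeholder path; point it at the downloaded checkpoint directory.
tokenizer = AlbertTokenizer.from_pretrained("./hil-albert")
model = AlbertForMaskedLM.from_pretrained("./hil-albert")

# Quick sanity check with the fill-mask pipeline.
fill_mask = pipeline("fill-mask", model=model, tokenizer=tokenizer)
print(fill_mask("Budapest Magyarország [MASK]."))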

References

Lan, Z., Chen, M., Goodman, S., Gimpel, K., Sharma, P. and Soricut, R. (2020). ALBERT: A Lite BERT for Self-supervised Learning of Language Representations. In International Conference on Learning Representations (ICLR).
