Filling a Wikibase Instance with Millions of Data Items

As more and more Wikibase instances come into existence, we are seeing attempts to seed them with large volumes of data from existing databases that are switching to the new software.

While experimenting, I tried to find a faster way to insert a large number of items into a Wikibase instance. Using the ‘official’ tools, such as QuickStatements or the WDI library, I was not able to insert more than two or three statements per second.

Therefore, I am inserting the data directly into the MySQL database used by Wikibase.

The process consists of these steps:

  • generate the data for an item in JSON
  • determine the next Q number and update the JSON item data accordingly


  • [...]
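
The first two steps listed above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the code used for the import described here: the helper names `build_item` and `assign_next_id` are hypothetical, the highest Q number already in use is passed in as a plain integer rather than read from the database, and the dictionary only follows the general shape of Wikibase entity JSON (labels and descriptions keyed by language code). The remaining steps, including the actual database writes, are not shown.

```python
import json


def build_item(label, description, language="en"):
    """Build a minimal item in the general shape of Wikibase entity JSON.

    The ID is left empty because the Q number is not known yet.
    (Hypothetical helper, for illustration only.)
    """
    return {
        "type": "item",
        "id": None,
        "labels": {language: {"language": language, "value": label}},
        "descriptions": {language: {"language": language, "value": description}},
        "aliases": {},
        "claims": {},
        "sitelinks": {},
    }


def assign_next_id(item, current_max):
    """Return a copy of the item carrying the next free Q number.

    current_max is the highest numeric item ID already in use; how it is
    obtained (for example from the database) is outside this sketch.
    """
    next_number = current_max + 1
    updated = dict(item)  # shallow copy, so the input dictionary stays unchanged
    updated["id"] = f"Q{next_number}"
    return updated, next_number


if __name__ == "__main__":
    item = build_item("Example person", "hypothetical sample record")
    item, counter = assign_next_id(item, current_max=41)
    print(json.dumps(item, indent=2))
```

Passing the counter in explicitly keeps the ID logic testable without a database. For a bulk import, the returned counter would be fed back in for the next item.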

Source: https://blog.factgrid.de/archives/2013


2018-06-11/12: Filling data into the FactGrid: a hands-on workshop of the Gotha Forschungsstelle Illuminatenforschung (Illuminati research unit)

Dear FactGrid enthusiasts in Erfurt and Gotha,

Over the past few weeks a small group of us has been working on setting up a Wikibase instance (the software behind the Wikidata project) on the University of Erfurt's server: https://database.factgrid.de/

There were some technical difficulties to overcome in adapting the tools to the new server environment, but for the past few days the basic setup has been running: QuickStatements is available for bulk data import. The Query Service is running, so we can run SPARQL queries against the data. The design (logo etc.) is still undecided, which for the moment has the advantage that everything looks like Wikidata. Matti Blume is working on loading the records of the Illuminati project into the database for the project launch.



[...]

Source: https://blog.factgrid.de/archives/884


The (sobering) status report of Friday, 13 April 2018

[A version of this was originally posted here]

[Postscript, Friday, 4 May 2018: SPARQL is on, and we are in the middle of our first larger data import]

Four months have passed since the kick-off workshop, and the FactGrid project has run into its first unexpected problems. We are confident that we will solve the issues, which are primarily technical, but one of the lessons we have learned so far is that we will need the support of a larger community in order to give the FactGrid project more impact within the Wikidata community.

What do we want to achieve? We are still trying to launch a Wikibase installation with the aim of offering a platform for original research. Data hosted on the FactGrid will be free to be used by Wikidata. Data will, however, leave the FactGrid database carrying the personal authorisation of researchers, which Wikidata is not able to generate.

Digital humanities projects interested in working on the collective FactGrid platform will sponsor software development with their respective DH funding.

[...]

Source: https://blog.factgrid.de/archives/817
