Tom McIntyre
Member
The NAWCC has a very large archive of PDF files from our publications, starting in 1943.
I have experimented with modernizing some of the documents by compressing them and extracting either the embedded text or OCR text, depending on the age of the document. I was looking at making each document a post and each issue a thread in XenForo, with an embedded image of the first page and a hidden comment containing the text portion.
It works pretty well, but I looked around a bit more and found FSCrawler for Elasticsearch. I was wondering how much work it might be to add an additional content type that uses FSCrawler to create an index that would find and provide links to the documents in our archive. I am sure I could find someone in the Elastic community to work on this, but you seem to be pretty familiar with ES, so I thought I would ask here first.
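To make the idea concrete, here is a rough sketch in Python of what "find and provide links" might look like against an index built by FSCrawler. It only builds the search request body and turns a search response into links; it does not talk to a server. The field names `content`, `file.filename`, and `path.virtual` are what I understand FSCrawler to write by default and should be checked against the actual index mapping. `ARCHIVE_BASE_URL`, `build_query`, and `hits_to_links` are hypothetical names made up for illustration, not part of any add-on.

```python
from urllib.parse import quote

# Hypothetical public location of the PDF archive (an assumption, not a real URL).
ARCHIVE_BASE_URL = "https://example.org/archive"


def build_query(text, size=10):
    """Build an Elasticsearch search body that matches `text` in the extracted text.

    Assumes the crawler stored the extracted text in a `content` field.
    """
    return {
        "size": size,
        "_source": ["file.filename", "path.virtual"],
        "query": {"match": {"content": text}},
        "highlight": {"fields": {"content": {}}},
    }


def hits_to_links(response):
    """Convert a search response (as a dict) into a list of title/url pairs."""
    links = []
    for hit in response.get("hits", {}).get("hits", []):
        source = hit.get("_source", {})
        # Path of the file relative to the crawled root, e.g. "/1943/issue.pdf".
        virtual_path = source.get("path", {}).get("virtual", "")
        title = source.get("file", {}).get("filename", virtual_path)
        # quote() keeps "/" and percent-encodes spaces and other unsafe characters.
        links.append({"title": title, "url": ARCHIVE_BASE_URL + quote(virtual_path)})
    return links
```

The body returned by `build_query("mainspring")` would be sent to the index's `_search` endpoint, and the JSON that comes back would be passed to `hits_to_links` to produce the links shown in the search results.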
By the way, thank you for adding the simple search to search expressions. Now I only need to figure out how to teach my mostly 70-year-old audience to use it.