Parsing an M3U file for curl checks using AWK

July 28, 2016

I recently had reason to make use of an M3U playlist file for an IPTV device. I found one on http://www.sattvhelp.com, a great resource for all kinds of satellite and IPTV issues. IIRC I came across it after finding a post about a filmon.tv plugin at http://iptvlivestream.com/iptv/filmon-tv/.

In any case, the M3U file contained lots of links to IPTV stations that were no longer available or not responding. So I wrote an AWK script to parse the M3U and, based on an expression, execute an action: a system() call to curl. AWK is a great tool for applying search expressions and logic to records in files, but it is a PITA to debug, and my attempts at using getline didn't help either. M3U files are a sequence of paired records (an #EXTINF description line followed by the stream URL). I think that makes grep and plain shell scripting inappropriate (I did try that first). AWK seems a better fit, even if it needs a bunch of calls out to another process/shell.

See my post on sattvhelp.com for more (http://www.sattvhelp.com/forum/technomate-non-linux-chat/54223-iptv-channels-tm-f3-5-tm5402-m3-33.html#post149601). Here's the script if you need to parse/validate any other kind of M3U playlist from time to time.

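#!/usr/bin/awk -f
# Rebuild an extended M3U playlist, keeping only entries whose URL
# responds to curl. Usage: awk -f <this script> playlist.m3u > working.m3u
BEGIN {
    print "#EXTM3U"
}

# Strip DOS (CRLF) line endings so they don't end up in the URL
{ sub(/\r$/, "") }

# Remember the channel description; its URL follows on the next line
/^#EXTINF/ {
    ITM = $0
    next
}

# Skip the #EXTM3U header, any other # directives and blank lines
/^#/ || /^[ \t]*$/ { next }

# Anything else is a stream URL: check it with curl
{
    URL = $0
    # Single-quote the URL for the shell (escaping embedded quotes) so
    # characters such as & or ? in it are not interpreted by the shell
    q = URL
    gsub(/'/, "'\\''", q)
    cmd = "curl --head --location --fail --max-time 10 --connect-timeout 5 --silent --output /dev/null '" q "'"
    RC = system(cmd)    # curl exits with 0 only if the request succeeded
    if (RC == 0) {
        if (ITM != "") print ITM
        print URL
    }
    ITM = ""            # an #EXTINF line describes exactly one URL
}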
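The same "paired records" idea can be expressed in other languages. Below is a rough Python sketch, not the script I actually used.

- Each URL line is paired with the #EXTINF line before it.
- A playlist is rebuilt from the URLs that pass a check.
- `parse_m3u`, `url_responds` and `live_playlist` are hypothetical names.
- `url_responds` only approximates `curl --head --location --fail`: it sends a HEAD request with urllib, which follows redirects by default, and treats any error as a dead link.

```python
import http.client
import urllib.request
from typing import Callable, Iterable, List, Optional, Tuple

Entry = Tuple[Optional[str], str]  # (#EXTINF line or None, stream URL)


def parse_m3u(lines: Iterable[str]) -> List[Entry]:
    """Pair every URL line with the #EXTINF line that precedes it."""
    entries: List[Entry] = []
    info: Optional[str] = None
    for raw in lines:
        line = raw.strip()  # also drops DOS (CRLF) line endings
        if line.startswith("#EXTINF"):
            info = line
        elif not line or line.startswith("#"):
            continue  # #EXTM3U header, other directives, blank lines
        else:
            entries.append((info, line))
            info = None  # one #EXTINF describes exactly one URL
    return entries


def url_responds(url: str, timeout: float = 10.0) -> bool:
    """Roughly mirror `curl --head --location --fail`: send a HEAD request,
    follow redirects, and treat any error (including HTTP >= 400) as dead."""
    try:
        req = urllib.request.Request(url, method="HEAD")  # ValueError if not a URL
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status < 400
    except (OSError, ValueError, http.client.HTTPException):
        return False


def live_playlist(entries: Iterable[Entry], is_alive: Callable[[str], bool]) -> List[str]:
    """Rebuild an M3U playlist keeping only entries whose URL passes is_alive."""
    out = ["#EXTM3U"]
    for info, url in entries:
        if is_alive(url):
            if info is not None:
                out.append(info)
            out.append(url)
    return out
```

Typical use would be `print("\n".join(live_playlist(parse_m3u(open("playlist.m3u")), url_responds)))`, where the file name is just an example.

Passing the checker in as a function keeps the parsing testable without network access. It also avoids the shell-quoting concerns of calling system(). As with `curl --head`, some streaming servers may reject HEAD requests, so a "dead" result is not conclusive.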

Pork and Noodles

June 11, 2016

Marinate (or just coat, if you don't have time) minced pork in a mixture of:

- 2 tablespoons each of cornflour, fish sauce, soy sauce (dark and light, to give it some colour) and oyster sauce
- a chopped chili
- a couple of cloves of garlic
- some chopped ginger

It should be fairly wet and not too thick, perhaps the consistency of single cream.

Boil water for noodles.

Wok on high, start cooking the mince; it will clump together because of the cornflour. Add some thinly sliced white cabbage, bean sprouts, even broccoli: whatever you like with pork. Garnish with chopped red pepper, coriander, sesame seeds and oil, and lime juice.

Mix with noodles, serve.

EAT YUM

Open Semantic Desktop Search – EXIF, IPTC etc

June 11, 2016

Open Semantic Desktop Search (OSDS), Tika and image/photo metadata

OSDS doesn’t handle image metadata particularly well out of the box; it seems to need some massaging and some code changes.

Tika has a JPEG parser that I’ve configured in (see previous OSDS post [1]). It uses an ImageMetadataExtractor class to pull metadata from JPEG files. It can also handle TIFF (perhaps RAW/NEF photo files too) and WebP.

To do this it uses Drew Noakes’ metadata library, so it can extract all sorts of metadata from photo or image files, not just Exif, IPTC or XMP. The Tika image parsers then map only a selection of these fields, as defined in the Tika metadata Java interfaces. Any other metadata is also passed into Tika by a catch-all Drew metadata handler. Generally this seems to mean that such “unknown” fields appear in the map Tika processes during output without a prefix like exif: or psd:.

So I’ve pulled together those two sets of metadata as lists below.
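Telling the two sets apart in Tika’s output can be roughly approximated by checking for a namespace prefix. Here is a small Python sketch.

- `split_by_prefix` is a hypothetical helper, not part of Tika or OSDS.
- It assumes you already have the flat list of metadata names, e.g. the keys of Tika’s output.
- Some standard Tika keys also have no prefix (the HttpHeaders ones such as Content-Type, listed below). So "bare" is only a rough proxy for "unknown Drew tag".

```python
from typing import Dict, Iterable, List, Tuple


def split_by_prefix(names: Iterable[str]) -> Tuple[Dict[str, List[str]], List[str]]:
    """Group metadata names by namespace prefix ("exif:FNumber" -> "exif");
    names without a prefix (e.g. catch-all Drew tags) are returned separately."""
    prefixed: Dict[str, List[str]] = {}
    bare: List[str] = []
    for name in names:
        prefix, sep, _ = name.partition(":")
        # Only treat the part before ':' as a namespace if it is a single word
        if sep and prefix and " " not in prefix:
            prefixed.setdefault(prefix, []).append(name)
        else:
            bare.append(name)
    return prefixed, bare
```

Nested XMP names such as xmpMM:History:When are grouped under their first prefix (xmpMM).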

These will then have to be processed by OSDS and Solr. This means updating the enhance_* OSDS scripts to map them into Solr fields, and updating the Solr schema.xml so that it can pick them up and index them. Phew. (It might be nice just for OSDS to process ANY metadata that Tika extracts, and for Solr to process ANY metadata that OSDS posts to it. But it doesn’t look like Solr can dynamically ingest unknown fields, although you can train it in schemaless mode.)
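Many of the Drew tag names contain spaces, slashes and brackets, which make awkward Solr field names. A first step for both the enhance_* mapping and the schema.xml update might be to normalize those names. Here is a rough Python sketch.

- `to_solr_field` and `schema_fields` are hypothetical helpers, not OSDS code.
- The `text_general` field type and the multiValued setting are assumptions: `text_general` exists in Solr’s stock example schemas, but your schema.xml may differ.

```python
import re
from typing import Iterable, List


def to_solr_field(name: str, prefix: str = "") -> str:
    """Turn a metadata name such as 'Date/Time Original' or 'exif:FNumber'
    into a lowercase, underscore-separated Solr field name."""
    cleaned = re.sub(r"[^0-9A-Za-z]+", "_", name).strip("_").lower()
    return prefix + cleaned


def schema_fields(names: Iterable[str], field_type: str = "text_general",
                  prefix: str = "") -> List[str]:
    """Emit one schema.xml <field/> line per distinct normalized name."""
    seen: List[str] = []
    for name in names:
        field = to_solr_field(name, prefix)
        if field and field not in seen:
            seen.append(field)
    return [
        f'<field name="{f}" type="{field_type}" indexed="true" stored="true" multiValued="true"/>'
        for f in seen
    ]
```

Deduplicating matters because the same tag name (e.g. “White Balance”) turns up in several makernote directories in the list below. All of them would map to one Solr field, so makernote-specific meanings get merged.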

More later…

 

[1] https://uoccou.wordpress.com/2016/04/22/open-semantic-desktop-search-good-but/

 

Drew metadata tags

“DCT Encode Version”
“Flags 0”
“Flags 1”
“Color Transform”
“Header Size”
“Image Height”
“Image Width”
“Planes”
“Bits Per Pixel”
“Compression”
“X Pixels per Meter”
“Y Pixels per Meter”
“Palette Colour Count”
“Important Colour Count”
“Firmware Version”
“Image Number”
“Image Type”
“Owner Name”
“Camera Serial Number”
“Camera Info Array”
“File Length”
“Custom Functions”
“Canon Model ID”
“Movie Info Array”
“AF Point Selected”
“Continuous Drive Mode”
“Contrast”
“Easy Shooting Mode”
“Exposure Mode”
“Flash Details”
“Flash Mode”
“Focal Units per mm”
“Focus Mode”
“Focus Mode”
“Image Size”
“Iso”
“Long Focal Length”
“Macro Mode”
“Metering Mode”
“Saturation”
“Self Timer Delay”
“Sharpness”
“Short Focal Length”
“Quality”
“Unknown Camera Setting 2”
“Unknown Camera Setting 3”
“Unknown Camera Setting 4”
“Digital Zoom”
“Focus Type”
“Unknown Camera Setting 7”
“Lens Type”
“Unknown Camera Setting 9”
“Unknown Camera Setting 10”
“Flash Activity”
“Unknown Camera Setting 12”
“Unknown Camera Setting 13”
“White Balance”
“Sequence Number”
“AF Point Used”
“Flash Bias”
“Auto Exposure Bracketing”
“AEB Bracket Value”
“Subject Distance”
“Auto ISO”
“Base ISO”
“Measured EV”
“Target Aperture”
“Target Exposure Time”
“Exposure Compensation”
“White Balance”
“Slow Shutter”
“Sequence Number”
“Optical Zoom Code”
“Camera Temperature”
“Flash Guide Number”
“AF Points in Focus”
“Flash Exposure Compensation”
“Auto Exposure Bracketing”
“AEB Bracket Value”
“Control Mode”
“Focus Distance Upper”
“Focus Distance Lower”
“F Number”
“Exposure Time”
“Measured EV 2”
“Bulb Duration”
“Camera Type”
“Auto Rotate”
“ND Filter”
“Self Timer 2”
“Flash Output”
“Panorama Frame Number”
“Panorama Direction”
“AF Point Count”
“Valid AF Point Count”
“Image Width”
“Image Height”
“AF Image Width”
“AF Image Height”
“AF Area Width”
“AF Area Height”
“AF Area X Positions”
“AF Area Y Positions”
“AF Points in Focus Count”
“Primary AF Point 1”
“Primary AF Point 2”
“Long Exposure Noise Reduction”
“Shutter/Auto Exposure-lock Buttons”
“Mirror Lockup”
“Tv/Av And Exposure Level”
“AF-Assist Light”
“Shutter Speed in Av Mode”
“Auto-Exposure Bracketing Sequence/Auto Cancellation”
“Shutter Curtain Sync”
“Lens Auto-Focus Stop Button Function Switch”
“Auto Reduction of Fill Flash”
“Menu Button Return Position”
“SET Button Function When Shooting”
“Sensor Cleaning”
“Thumbnail Image Valid Area”
“Serial Number Format”
“Super Macro”
“Date Stamp Mode”
“My Colors”
“Firmware Revision”
“Categories”
“Face Detect Array 1”
“Face Detect Array 2”
“AF Info Array 2”
“Image Unique ID”
“Raw Data Offset”
“Original Decision Data Offset”
“Custom Functions (1D) Array”
“Personal Functions Array”
“Personal Function Values Array”
“File Info Array”
“AF Points in Focus (1D)”
“Lens Model”
“Serial Info Array”
“Dust Removal Data”
“Crop Info”
“Custom Functions Array 2”
“Aspect Information Array”
“Processing Information Array”
“Tone Curve Table”
“Sharpness Table”
“Sharpness Frequency Table”
“White Balance Table”
“Color Balance Array”
“Measured Color Array”
“Color Temperature”
“Canon Flags Array”
“Modified Information Array”
“Tone Curve Matching”
“White Balance Matching”
“Color Space”
“Preview Image Info Array”
“VRD Offset”
“Sensor Information Array”
“Color Data Array 1”
“CRW Parameters”
“Color Data Array 2”
“Black Level”
“Custom Picture Style File Name”
“Color Info Array”
“Vignetting Correction Array 1”
“Vignetting Correction Array 2”
“Lighting Optimizer Array”
“Lens Info Array”
“Ambiance Info Array”
“Filter Info Array”
“CCD Sensitivity”
“Contrast”
“Digital Zoom”
“Flash Intensity”
“Flash Mode”
“Focusing Mode”
“Object Distance”
“Quality”
“Recording Mode”
“Saturation”
“Sharpness”
“Makernote Unknown 1”
“Makernote Unknown 2”
“Makernote Unknown 3”
“Makernote Unknown 4”
“Makernote Unknown 5”
“Makernote Unknown 6”
“Makernote Unknown 7”
“Makernote Unknown 8”
“White Balance”
“Thumbnail Dimensions”
“Thumbnail Size”
“Thumbnail Offset”
“Quality Mode”
“Image Size”
“Focus Mode”
“ISO Sensitivity”
“White Balance”
“Focal Length”
“Saturation”
“Contrast”
“Sharpness”
“Print Image Matching (PIM) Info”
“Casio Preview Thumbnail”
“White Balance Bias”
“White Balance”
“Object Distance”
“Flash Distance”
“Record Mode”
“Self Timer”
“Quality”
“Focus Mode”
“Time Zone”
“BestShot Mode”
“CCD ISO Sensitivity”
“Colour Mode”
“Enhancement”
“Filter”
“Makernote Version”
“Serial Number”
“Quality”
“Sharpness”
“White Balance”
“Color Saturation”
“Tone (Contrast)”
“Color Temperature”
“Contrast”
“White Balance Fine Tune”
“Noise Reduction”
“High ISO Noise Reduction”
“Flash Mode”
“Flash Strength”
“Macro”
“Focus Mode”
“Focus Pixel”
“Slow Sync”
“Picture Mode”
“EXR Auto”
“EXR Mode”
“Auto Bracketing”
“Sequence Number”
“FinePix Color Setting”
“Blur Warning”
“Focus Warning”
“AE Warning”
“GE Image Size”
“Dynamic Range”
“Film Mode”
“Dynamic Range Setting”
“Development Dynamic Range”
“Minimum Focal Length”
“Maximum Focal Length”
“Maximum Aperture at Minimum Focal Length”
“Maximum Aperture at Maximum Focal Length”
“Auto Dynamic Range”
“Faces Detected”
“Face Positions”
“Face Detection Data”
“File Source”
“Order Number”
“Frame Number”
“Parallax”
“Kodak Model”
“Quality”
“Burst Mode”
“Image Width”
“Image Height”
“Year Created”
“Month/Day Created”
“Time Created”
“Burst Mode 2”
“Shutter Speed”
“Metering Mode”
“Sequence Number”
“F Number”
“Exposure Time”
“Exposure Compensation”
“Focus Mode”
“White Balance”
“Flash Mode”
“Flash Fired”
“ISO Setting”
“ISO”
“Total Zoom”
“Date/Time Stamp”
“Color Mode”
“Digital Zoom”
“Sharpness”
“Proprietary Thumbnail Format Data”
“Print Image Matching (PIM) Info”
“Quality”
“User Profile”
“Serial Number”
“White Balance”
“Lens Type”
“External Sensor Brightness Value”
“Measured LV”
“Approximate F Number”
“Camera Temperature”
“Color Temperature”
“WB Red Level”
“WB Green Level”
“WB Blue Level”
“CCD Version”
“CCD Board Version”
“Controller Board Version”
“M16 C Version”
“Image ID Number”
“CCD Sensitivity”
“Color Mode”
“Digital Zoom”
“Fisheye Converter”
“Focus”
“Image Adjustment”
“Quality”
“Makernote Unknown 1”
“Makernote Unknown 2”
“Makernote Unknown 3”
“White Balance”
“Firmware Version”
“ISO”
“Quality & File Format”
“White Balance”
“Sharpening”
“AF Type”
“White Balance Fine”
“White Balance RB Coefficients”
“ISO”
“ISO Mode”
“Data Dump”
“Program Shift”
“Exposure Difference”
“Preview IFD”
“Lens Type”
“Flash Used”
“AF Focus Position”
“Shooting Mode”
“Lens Stops”
“Contrast Curve”
“Light source”
“Shot Info”
“Color Balance”
“Lens Data”
“NEF Thumbnail Size”
“Sensor Pixel Size”
“Unknown 10”
“Scene Assist”
“Unknown 11”
“Retouch History”
“Unknown 12”
“Flash Sync Mode”
“Auto Flash Mode”
“Auto Flash Compensation”
“Exposure Sequence Number”
“Color Mode”
“Unknown 20”
“Image Boundary”
“Flash Exposure Compensation”
“Flash Bracket Compensation”
“AE Bracket Compensation”
“Flash Mode”
“Crop High Speed”
“Exposure Tuning”
“Camera Serial Number”
“Color Space”
“VR Info”
“Image Authentication”
“Unknown 35”
“Active D-Lighting”
“Picture Control”
“World Time”
“ISO Info”
“Unknown 36”
“Unknown 37”
“Unknown 38”
“Unknown 39”
“Vignette Control”
“Unknown 40”
“Unknown 41”
“Unknown 42”
“Unknown 43”
“Unknown 44”
“Unknown 45”
“Unknown 46”
“Unknown 47”
“Scene Mode”
“Camera Serial Number”
“Image Data Size”
“Unknown 27”
“Unknown 28”
“Image Count”
“Deleted Image Count”
“Saturation”
“Digital Vari Program”
“Image Stabilisation”
“AF Response”
“Unknown 29”
“Unknown 30”
“Multi Exposure”
“High ISO Noise Reduction”
“Unknown 31”
“Unknown 32”
“Unknown 33”
“Unknown 48”
“Power Up Time”
“AF Info 2”
“File Info”
“AF Tune”
“Flash Info”
“Image Optimisation”
“Image Adjustment”
“Tone Compensation”
“Adapter”
“Lens”
“Manual Focus Distance”
“Digital Zoom”
“Colour Mode”
“Camera Hue Adjustment”
“NEF Compression”
“Saturation”
“Noise Reduction”
“Linearization Table”
“Nikon Capture Data”
“Unknown 49”
“Unknown 50”
“Unknown 51”
“Print IM”
“Unknown 52”
“Unknown 53”
“Nikon Capture Version”
“Nikon Capture Offsets”
“Nikon Scan”
“Unknown 54”
“NEF Bit Depth”
“Unknown 55”
“Camera Settings Version”
“Preview Image Valid”
“Preview Image Start”
“Preview Image Length”
“Exposure Mode”
“AE Lock”
“Metering Mode”
“Exposure Shift”
“ND Filter”
“Macro Mode”
“Focus Mode”
“Focus Process”
“AF Search”
“AF Areas”
“AF Point Selected”
“AF Fine Tune”
“AF Fine Tune Adj”
“Flash Mode”
“Flash Exposure Comp”
“Flash Remote Control”
“Flash Control Mode”
“Flash Intensity”
“Manual Flash Strength”
“White Balance 2”
“White Balance Temperature”
“White Balance Bracket”
“Custom Saturation”
“Modified Saturation”
“Contrast Setting”
“Sharpness Setting”
“Color Space”
“Scene Mode”
“Noise Reduction”
“Distortion Correction”
“Shading Compensation”
“Compression Factor”
“Gradation”
“Picture Mode”
“Picture Mode Saturation”
“Picture Mode Hue”
“Picture Mode Contrast”
“Picture Mode Sharpness”
“Picture Mode BW Filter”
“Picture Mode Tone”
“Noise Filter”
“Art Filter”
“Magic Filter”
“Picture Mode Effect”
“Tone Level”
“Art Filter Effect”
“Drive Mode”
“Panorama Mode”
“Image Quality 2”
“Image Stabilization”
“Stacked Image”
“Manometer Pressure”
“Manometer Reading”
“Extended WB Detect”
“Roll Angle”
“Pitch Angle”
“Date Time UTC”
“Equipment Version”
“Camera Type 2”
“Serial Number”
“Internal Serial Number”
“Focal Plane Diagonal”
“Body Firmware Version”
“Lens Type”
“Lens Serial Number”
“Lens Model”
“Lens Firmware Version”
“Max Aperture At Min Focal”
“Max Aperture At Max Focal”
“Min Focal Length”
“Max Focal Length”
“Max Aperture”
“Lens Properties”
“Extender”
“Extender Serial Number”
“Extender Model”
“Extender Firmware Version”
“Conversion Lens”
“Flash Type”
“Flash Model”
“Flash Firmware Version”
“Flash Serial Number”
“Makernote Version”
“Camera Settings”
“Camera Settings”
“Compressed Image Size”
“Thumbnail Offset”
“Thumbnail Offset”
“Thumbnail Length”
“Thumbnail Image”
“Colour Mode”
“Image Quality”
“Image Quality”
“Body Firmware Version”
“Special Mode”
“JPEG Quality”
“Macro”
“BW Mode”
“DigiZoom Ratio”
“Focal Plane Diagonal”
“Lens Distortion Parameters”
“Firmware Version”
“Pict Info”
“Camera Id”
“Image Width”
“Image Height”
“Original Manufacturer Model”
“Preview Image”
“Pre Capture Frames”
“White Board”
“One Touch WB”
“White Balance Bracket”
“White Balance Bias”
“Scene Mode”
“Firmware”
“Print Image Matching (PIM) Info”
“Data Dump”
“Data Dump 2”
“Shutter Speed Value”
“ISO Value”
“Aperture Value”
“Brightness Value”
“Flash Mode”
“Flash Device”
“Bracket”
“Sensor Temperature”
“Lens Temperature”
“Light Condition”
“Focus Range”
“Focus Mode”
“Focus Distance”
“Zoom”
“Macro Focus”
“Sharpness”
“Flash Charge Level”
“Colour Matrix”
“Black Level”
“White Balance”
“Red Bias”
“Blue Bias”
“Color Matrix Number”
“Serial Number”
“Flash Bias”
“External Flash Bounce”
“External Flash Zoom”
“External Flash Mode”
“Contrast”
“Sharpness Factor”
“Colour Control”
“Valid Bits”
“Coring Filter”
“Final Width”
“Final Height”
“Compression Ratio”
“Thumbnail”
“Thumbnail Offset”
“Thumbnail Length”
“CCD Scan Mode”
“Noise Reduction”
“Infinity Lens Step”
“Near Lens Step”
“Equipment”
“Camera Settings”
“Raw Development”
“Raw Development 2”
“Image Processing”
“Focus Info”
“Raw Info”
“Exposure Mode”
“Flash Mode”
“White Balance”
“Image Size”
“Image Quality”
“Shooting Mode”
“Metering Mode”
“Apex Film Speed Value”
“Apex Shutter Speed Time Value”
“Apex Aperture Value”
“Macro Mode”
“Digital Zoom”
“Exposure Compensation”
“Bracket Step”
“Interval Length”
“Interval Number”
“Focal Length”
“Focus Distance”
“Flash Fired”
“Date”
“Time”
“Max Aperture at Focal Length”
“File Number Memory”
“Last File Number”
“White Balance Red”
“White Balance Green”
“White Balance Blue”
“Saturation”
“Contrast”
“Sharpness”
“Subject Program”
“Flash Compensation”
“ISO Setting”
“Camera Model”
“Interval Mode”
“Folder Name”
“Color Mode”
“Color Filter”
“Black and White Filter”
“Internal Flash”
“Apex Brightness Value”
“Spot Focus Point X Coordinate”
“Spot Focus Point Y Coordinate”
“Wide Focus Zone”
“Focus Mode”
“Focus Area”
“DEC Switch Position”
“Quality Mode”
“Version”
“White Balance”
“Focus Mode”
“AF Area Mode”
“Image Stabilization”
“Macro Mode”
“Record Mode”
“Audio”
“Internal Serial Number”
“Unknown Data Dump”
“Easy Mode”
“White Balance Bias”
“Flash Bias”
“Exif Version”
“Color Effect”
“Camera Uptime”
“Burst Mode”
“Sequence Number”
“Contrast Mode”
“Noise Reduction”
“Self Timer”
“Rotation”
“AF Assist Lamp”
“Color Mode”
“Baby Age”
“Optical Zoom Mode”
“Conversion Lens”
“Travel Day”
“Contrast”
“World Time Location”
“Text Stamp”
“Program ISO”
“Advanced Scene Mode”
“Print Image Matching (PIM) Info”
“Number of Detected Faces”
“Saturation”
“Sharpness”
“Film Mode”
“White Balance Adjust (AB)”
“White Balance Adjust (GM)”
“Af Point Position”
“Face Detection Info”
“Lens Type”
“Lens Serial Number”
“Accessory Type”
“Transform”
“Intelligent Exposure”
“Face Recognition Info”
“Flash Warning”
“Recognized Face Flags”
“Title”
“Baby Name”
“Location”
“Country”
“State”
“City”
“Landmark”
“Intelligent Resolution”
“Makernote Version”
“Scene Mode”
“White Balance (Red)”
“White Balance (Green)”
“White Balance (Blue)”
“Flash Fired”
“Text Stamp 1”
“Text Stamp 2”
“Text Stamp 3”
“Baby Age 1”
“Transform 1”
“Capture Mode”
“Quality Level”
“Focus Mode”
“Flash Mode”
“White Balance”
“Digital Zoom”
“Sharpness”
“Contrast”
“Saturation”
“ISO Speed”
“Colour”
“Print Image Matching (PIM) Info”
“Time Zone”
“Daylight Savings”
“Makernote Data Type”
“Version”
“Print Image Matching (PIM) Info”
“Ricoh Camera Info Makernote Sub-IFD”
“Makernote Offset”
“Sanyo Thumbnail”
“Special Mode”
“Sanyo Quality”
“Macro”
“Digital Zoom”
“Software Version”
“Pict Info”
“Camera ID”
“Sequential Shot”
“Wide Range”
“Color Adjustment Node”
“Quick Shot”
“Self Timer”
“Voice Memo”
“Record Shutter Release”
“Flicker Reduce”
“Optical Zoom On”
“Digital Zoom On”
“Light Source Special”
“Resaved”
“Scene Select”
“Manual Focus Distance or Face Info”
“Sequence Shot Interval”
“Flash Mode”
“Print IM”
“Data Dump”
“Serial Number”
“Drive Mode”
“Resolution Mode”
“Auto Focus Mode”
“Focus Setting”
“White Balance”
“Exposure Mode”
“Metering Mode”
“Lens Range”
“Color Space”
“Exposure”
“Contrast”
“Shadow”
“Highlight”
“Saturation”
“Sharpness”
“Fill Light”
“Color Adjustment”
“Adjustment Mode”
“Quality”
“Firmware”
“Software”
“Auto Bracket”
“Camera Info”
“Focus Info”
“Image Quality”
“Flash Exposure Compensation”
“Teleconverter Model”
“White Balance Fine Tune Value”
“Camera Settings”
“White Balance”
“Extra Info”
“Print Image Matching Info”
“Multi Burst Mode”
“Multi Burst Image Width”
“Multi Burst Image Height”
“Panorama”
“Preview Image”
“Rating”
“Contrast”
“Saturation”
“Sharpness”
“Brightness”
“Long Exposure Noise Reduction”
“High ISO Noise Reduction”
“HDR”
“Multi Frame Noise Reduction”
“Picture Effect”
“Soft Skin Effect”
“Vignetting Correction”
“Lateral Chromatic Aberration”
“Distortion Correction”
“WB Shift Amber/Magenta”
“Auto Portrait Framing”
“Focus Mode”
“AF Point Selected”
“Shot Info”
“File Format”
“Sony Model ID”
“Color Mode Setting”
“Color Temperature”
“Color Compensation Filter”
“Scene Mode”
“Zone Matching”
“Dynamic Range Optimizer”
“Image Stabilisation”
“Lens ID”
“Minolta Makernote”
“Color Mode”
“Lens Spec”
“Full Image Size”
“Preview Image Size”
“Macro”
“Exposure Mode”
“Focus Mode”
“AF Mode”
“AF Illuminator”
“Quality”
“Flash Level”
“Release Mode”
“Sequence Number”
“Anti Blur”
“Long Exposure Noise Reduction”
“Dynamic Range Optimizer”
“High ISO Noise Reduction”
“Intelligent Auto”
“White Balance 2”
“No Print”
“Makernote Thumb Offset”
“Makernote Thumb Length”
“Sony-6-0x0203”
“Makernote Thumb Version”
“Thumbnail Offset”
“Thumbnail Length”
“GPS Version ID”
“GPS Latitude Ref”
“GPS Latitude”
“GPS Longitude Ref”
“GPS Longitude”
“GPS Altitude Ref”
“GPS Altitude”
“GPS Time-Stamp”
“GPS Satellites”
“GPS Status”
“GPS Measure Mode”
“GPS DOP”
“GPS Speed Ref”
“GPS Speed”
“GPS Track Ref”
“GPS Track”
“GPS Img Direction Ref”
“GPS Img Direction”
“GPS Map Datum”
“GPS Dest Latitude Ref”
“GPS Dest Latitude”
“GPS Dest Longitude Ref”
“GPS Dest Longitude”
“GPS Dest Bearing Ref”
“GPS Dest Bearing”
“GPS Dest Distance Ref”
“GPS Dest Distance”
“GPS Processing Method”
“GPS Area Information”
“GPS Date Stamp”
“GPS Differential”
“File Name”
“File Size”
“File Modified Date”
“GIF Format Version”
“Image Height”
“Image Width”
“Color Table Size”
“Is Color Table Sorted”
“Bits per Pixel”
“Has Global Color Table”
“Background Color Index”
“Pixel Aspect Ratio”
“Profile Size”
“CMM Type”
“Version”
“Class”
“Color space”
“Profile Connection Space”
“Profile Date/Time”
“Signature”
“Primary Platform”
“CMM Flags”
“Device manufacturer”
“Device model”
“Device attributes”
“Rendering Intent”
“XYZ values”
“Profile Creator”
“Tag Count”
“AToB 0”
“AToB 1”
“AToB 2”
“Blue Colorant”
“Blue TRC”
“BToA 0”
“BToA 1”
“BToA 2”
“Calibration Date/Time”
“Char Target”
“Chromatic Adaptation”
“Chromaticity”
“Copyright”
“CrdInfo”
“Device Mfg Description”
“Device Model Description”
“Device Settings”
“Gamut”
“Gray TRC”
“Green Colorant”
“Green TRC”
“Luminance”
“Measurement”
“Media Black Point”
“Media White Point”
“Named Color”
“Named Color 2”
“Output Response”
“Preview 0”
“Preview 1”
“Preview 2”
“Profile Description”
“Profile Sequence Description”
“Ps2 CRD 0”
“Ps2 CRD 1”
“Ps2 CRD 2”
“Ps2 CRD 3”
“Ps2 CSA”
“Ps2 Rendering Intent”
“Red Colorant”
“Red TRC”
“Screening Desc”
“Screening”
“Technology”
“Ucrbg”
“Viewing Conditions Description”
“Viewing Conditions”
“Apple Multi-language Profile Name”
“Image Type”
“Image Width”
“Image Height”
“Colour Palette Size”
“Colour Planes”
“Hotspot X”
“Bits Per Pixel”
“Hotspot Y”
“Image Size Bytes”
“Image Offset Bytes”
“Enveloped Record Version”
“Destination”
“File Format”
“File Version”
“Service Identifier”
“Envelope Number”
“Product Identifier”
“Envelope Priority”
“Date Sent”
“Time Sent”
“Coded Character Set”
“Unique Object Name”
“ARM Identifier”
“ARM Version”
“Application Record Version”
“Object Type Reference”
“Object Attribute Reference”
“Object Name”
“Edit Status”
“Editorial Update”
“Urgency”
“Subject Reference”
“Category”
“Supplemental Category(s)”
“Fixture Identifier”
“Keywords”
“Content Location Code”
“Content Location Name”
“Release Date”
“Release Time”
“Expiration Date”
“Expiration Time”
“Special Instructions”
“Action Advised”
“Reference Service”
“Reference Date”
“Reference Number”
“Date Created”
“Time Created”
“Digital Date Created”
“Digital Time Created”
“Originating Program”
“Program Version”
“Object Cycle”
“By-line”
“By-line Title”
“City”
“Sub-location”
“Province/State”
“Country/Primary Location Code”
“Country/Primary Location Name”
“Original Transmission Reference”
“Headline”
“Credit”
“Source”
“Copyright Notice”
“Contact”
“Caption/Abstract”
“Local Caption”
“Caption Writer/Editor”
“Rasterized Caption”
“Image Type”
“Image Orientation”
“Language Identifier”
“Audio Type”
“Audio Sampling Rate”
“Audio Sampling Resolution”
“Audio Duration”
“Audio Outcue”
“Job Identifier”
“Master Document Identifier”
“Short Document Identifier”
“Unique Document Identifier”
“Owner Identifier”
“Object Data Preview File Format”
“Object Data Preview File Format Version”
“Object Data Preview Data”
“Version”
“Resolution Units”
“Y Resolution”
“X Resolution”
“Thumbnail Width Pixels”
“Thumbnail Height Pixels”
“Extension Code”
“JPEG Comment”
“Compression Type”
“Data Precision”
“Image Width”
“Image Height”
“Number of Components”
“Component 1”
“Component 2”
“Component 3”
“Component 4”
“Version”
“Bits Per Pixel”
“X Min”
“Y Min”
“X Max”
“Y Max”
“Horizontal DPI”
“Vertical DPI”
“Palette”
“Color Planes”
“Bytes Per Line”
“Palette Type”
“H Scr Size”
“V Scr Size”
“Quality”
“Comment”
“Copyright”
“Channels, Rows, Columns, Depth, Mode”
“Mac Print Info”
“XML Data”
“Indexed Color Table”
“Resolution Info”
“Alpha Channels”
“Display Info (Obsolete)”
“Caption”
“Border Information”
“Background Color”
“Print Flags”
“Grayscale and Multichannel Halftoning Information”
“Color Halftoning Information”
“Duotone Halftoning Information”
“Grayscale and Multichannel Transfer Function”
“Color Transfer Functions”
“Duotone Transfer Functions”
“Duotone Image Information”
“Effective Black and White Values”
“EPS Options”
“Quick Mask Information”
“Layer State Information”
“Layers Group Information”
“IPTC-NAA Record”
“Image Mode for Raw Format Files”
“JPEG Quality”
“Grid and Guides Information”
“Photoshop 4.0 Thumbnail”
“Copyright Flag”
“URL”
“Thumbnail Data”
“Global Angle”
“ICC Profile Bytes”
“Watermark”
“ICC Untagged Profile”
“Effects Visible”
“Spot Halftone”
“Seed Number”
“Unicode Alpha Names”
“Indexed Color Table Count”
“Transparency Index”
“Global Altitude”
“Slices”
“Workflow URL”
“Jump To XPEP”
“Alpha Identifiers”
“URL List”
“Version Info”
“EXIF Data 1”
“EXIF Data 3”
“XMP Data”
“Caption Digest”
“Print Scale”
“Pixel Aspect Ratio”
“Layer Comps”
“Alternate Duotone Colors”
“Alternate Spot Colors”
“Layer Selection IDs”
“HDR Toning Info”
“Print Info”
“Layer Groups Enabled ID”
“Color Samplers”
“Measurement Scale”
“Timeline Information”
“Sheet Disclosure”
“Display Info”
“Onion Skins”
“Count information”
“Print Info 2”
“Print Style”
“Mac NSPrintInfo”
“Win DEVMODE”
“Auto Save File Path”
“Auto Save Format”
“Path Selection State”
“Clipping Path Name”
“Origin Path Info”
“Image Ready Variables XML”
“Image Ready Data Sets”
“Lightroom Workflow”
“Print Flags Information”
“Plug-in %d Data”
“Channel Count”
“Image Height”
“Image Width”
“Bits Per Channel”
“Color Mode”
“White Point X”
“White Point Y”
“Red X”
“Red Y”
“Green X”
“Green Y”
“Blue X”
“Blue Y”
“Image Height”
“Image Width”
“Bits Per Sample”
“Color Type”
“Compression Type”
“Filter Method”
“Interlace Method”
“Palette Size”
“Palette Has Transparency”
“sRGB Rendering Intent”
“Image Gamma”
“ICC Profile Name”
“Textual Data”
“Last Modification Time”
“Background Color”
“Pixels Per Unit X”
“Pixels Per Unit Y”
“Unit Specifier”
“Significant Bits”
“Image Height”
“Image Width”
“Has Alpha”
“Is Animation”
“XMP Value Count”
“Make”
“Model”
“Exposure Time”
“Shutter Speed Value”
“F-Number”
“Lens Information”
“Lens”
“Serial Number”
“Firmware”
“Focal Length”
“Aperture Value”
“Exposure Program”
“Date/Time Original”
“Date/Time Digitized”
“Base URL”
“Create Date”
“Creator Tool”
“Identifier”
“Metadata Date”
“Modify Date”
“Nickname”
“Rating”
“Label”
“Title”
“Subject”
“Date”
“Type”
“Description”
“Relation”
“Coverage”
“Creator”
“Publisher”
“Contributor”
“Rights”
“Format”
“Identifier”
“Language”
“Audience”
“Provenance”
“Rights Holder”
“Instructional Method”
“Accrual Method”
“Accrual Periodicity”
“Accrual Policy”

Tika metadata tags

From the org.apache.tika.metadata interface files:

access_permission:
can_modify
extract_content
extract_for_accessibility
assemble_document
fill_in_form
modify_annotations
can_print
can_print_degraded

(climate forecast, no prefix)
prg_ID
cmd_ln
history
table_id
institution
source
contact
project_id
Conventions
references
acknowledgement
realization
experiment_id
comment
model_name_english

(Creative Commons, no prefix)
License-Url
License-Location
Work-Type

(database – no prefix)
table_name
column_count
column_name

DublinCore dc:
dc:format
dc:identifier
dc:modified
dc:contributor
dc:coverage
dc:creator
dc:date
dc:description
dc:language
dc:publisher
dc:relation
dc:rights
dc:source
dc:subject
dc:title
dc:type

dcterms:
dcterms:created

geo:
geo:lat
geo:long
geo:alt

HttpHeaders – no prefix
Content-Encoding
Content-Language
Content-Length
Content-Location
Content-Disposition
Content-MD5
Content-Type
Content-Type-Hint
Last-Modified
Location

IPTC Iptc4xmpCore:
Iptc4xmpCore:CountryCode
Iptc4xmpCore:IntellectualGenre
Iptc4xmpCore:Scene
Iptc4xmpCore:SubjectCode
Iptc4xmpCore:Location
Iptc4xmpCore:CreatorContactInfo
Iptc4xmpCore:CiAdrExtadr
Iptc4xmpCore:CiAdrCity
Iptc4xmpCore:CiAdrCtry
Iptc4xmpCore:CiEmailWork
Iptc4xmpCore:CiTelWork
Iptc4xmpCore:CiAdrPcode
Iptc4xmpCore:CiAdrRegion
Iptc4xmpCore:CiUrlWork

IPTC Iptc4xmpExt:
Iptc4xmpExt:AddlModelInfo
Iptc4xmpExt:ArtworkOrObject
Iptc4xmpExt:OrganisationInImageCode
Iptc4xmpExt:CVterm
Iptc4xmpExt:LocationShown
Iptc4xmpExt:ModelAge
Iptc4xmpExt:OrganisationInImageName
Iptc4xmpExt:PersonInImage
Iptc4xmpExt:DigImageGUID
Iptc4xmpExt:DigitalSourcefileType
Iptc4xmpExt:DigitalSourceType
Iptc4xmpExt:Event
Iptc4xmpExt:RegistryId
Iptc4xmpExt:IptcLastEdited
Iptc4xmpExt:LocationCreated
Iptc4xmpExt:MaxAvailHeight
Iptc4xmpExt:MaxAvailWidth
Iptc4xmpExt:AOCopyrightNotice
Iptc4xmpExt:AOCreator
Iptc4xmpExt:AODateCreated
Iptc4xmpExt:AOSource
Iptc4xmpExt:AOSourceInvNo
Iptc4xmpExt:AOTitle
Iptc4xmpExt:LocationShownCity
Iptc4xmpExt:LocationShownCountryCode
Iptc4xmpExt:LocationShownCountryName
Iptc4xmpExt:LocationShownProvinceState
Iptc4xmpExt:LocationShownSublocation
Iptc4xmpExt:LocationShownWorldRegion
Iptc4xmpExt:LocationCreatedCity
Iptc4xmpExt:LocationCreatedCountryCode
Iptc4xmpExt:LocationCreatedCountryName
Iptc4xmpExt:LocationCreatedProvinceState
Iptc4xmpExt:LocationCreatedSublocation
Iptc4xmpExt:LocationCreatedWorldRegion
Iptc4xmpExt:RegItemId
Iptc4xmpExt:RegOrgId

IPTC plus:
plus:ImageSupplier
plus:ImageSupplierID
plus:ImageSupplierId
plus:ImageSupplierName
plus:ImageSupplierImageID
plus:Version
plus:CopyrightOwner
plus:CopyrightOwnerID
plus:CopyrightOwnerId
plus:CopyrightOwnerName
plus:ImageCreator
plus:ImageCreatorID
plus:ImageCreatorName
plus:Licensor
plus:LicensorID
plus:LicensorId
plus:LicensorName
plus:LicensorCity
plus:LicensorCountry
plus:LicensorEmail
plus:LicensorExtendedAddress
plus:LicensorPostalCode
plus:LicensorRegion
plus:LicensorStreetAddress
plus:LicensorTelephone1
plus:LicensorTelephone2
plus:LicensorURL
plus:MinorModelAgeDisclosure
plus:ModelReleaseID
plus:ModelReleaseStatus
plus:PropertyReleaseID
plus:PropertyReleaseStatus

Message – no prefix
Message-Recipient-Address
Message-From
Message-To
Message-Cc
Message-Bcc

MSOffice – no prefix (being replaced by Office)
Keywords
Comments
Last-Author
Author
Application-Name
Revision-Number
Template
Total-Time
Presentation-Format
Notes
Manager
Application-Version
Version
Content-Status
Category
Company
Security
Slide-Count
Page-Count
Paragraph-Count
Line-Count
Word-Count
Character Count
Character-Count-With-Spaces
Table-Count
Image-Count
Object-Count
Edit-Time
Creation-Date
Last-Save-Date
Last-Printed

Office – meta:
meta:keyword
meta:initial-author
meta:last-author
meta:author
meta:creation-date
meta:save-date
meta:print-date
meta:slide-count
meta:page-count
meta:paragraph-count
meta:line-count
meta:word-count
meta:character-count
meta:character-count-with-spaces
meta:table-count
meta:image-count
meta:object-count

OfficeOpenXMLCore – cp:
cp:category
cp:contentStatus
cp:lastModifiedBy
cp:lastPrinted
cp:revision
cp:version
cp:subject

OfficeOpenXMLExtended – extended-properties:
extended-properties:Template
extended-properties:Manager
extended-properties:Company
extended-properties:PresentationFormat
extended-properties:Notes
extended-properties:TotalTime
extended-properties:HiddedSlides
extended-properties:Application
extended-properties:AppVersion
extended-properties:DocSecurity
w:comments

PagedText – xmpTPg:
xmpTPg:NPages

Photoshop – photoshop:
photoshop:AuthorsPosition
photoshop:ColorMode
photoshop:CaptionWriter
photoshop:Category
photoshop:City
photoshop:Country
photoshop:Credit
photoshop:DateCreated
photoshop:Headline
photoshop:Instructions
photoshop:Source
photoshop:State
photoshop:SupplementalCategories
photoshop:TransmissionReference
photoshop:Urgency

RTF rtf_meta:
rtf_meta:thumbnail
rtf_meta:emb_app_version
rtf_meta:emb_class
rtf_meta:emb_topic
rtf_meta:emb_item

TIFF tiff:
tiff:BitsPerSample
tiff:ImageLength
tiff:ImageWidth
tiff:SamplesPerPixel
tiff:Make
tiff:Model
tiff:Software
tiff:Orientation
tiff:XResolution
tiff:YResolution
tiff:ResolutionUnit

TIFF exif:
exif:Flash
exif:ExposureTime
exif:FNumber
exif:FocalLength
exif:IsoSpeedRatings
exif:DateTimeOriginal

Tika – X-TIKA:
X-TIKA:EXCEPTION
X-TIKA:warn

TikaMetadata – no prefix
resourceName
protected
embeddedRelationshipId
embeddedStorageClassId
embeddedResourceType

XMP – xmp:
xmp:CreateDate
xmp:CreatorTool
xmp:Identifier
xmp:Label
xmp:MetadataDate
xmp:ModifyDate
xmp:Rating

XMPDM – xmpDM:
xmpDM:album
xmpDM:absPeakAudioFilePath
xmpDM:altTapeName
xmpDM:artist
xmpDM:audioModDate
xmpDM:albumArtist
xmpDM:audioSampleRate
xmpDM:audioSampleType
xmpDM:compilation
xmpDM:audioChannelType
xmpDM:audioCompressor
xmpDM:composer
xmpDM:copyright
xmpDM:discNumber
xmpDM:duration
xmpDM:engineer
xmpDM:fileDataRate
xmpDM:genre
xmpDM:instrument
xmpDM:key
xmpDM:logComment
xmpDM:loop
xmpDM:numberOfBeats
xmpDM:metadataModDate
xmpDM:pullDown
xmpDM:relativePeakAudioFilePath
xmpDM:releaseDate
xmpDM:scaleType
xmpDM:scene
xmpDM:shotDate
xmpDM:shotLocation
xmpDM:shotName
xmpDM:speakerPlacement
xmpDM:stretchMode
xmpDM:tapeName
xmpDM:tempo
xmpDM:timeSignature
xmpDM:trackNumber
xmpDM:videoAlphaMode
xmpDM:videoAlphaUnityIsTransparent
xmpDM:videoColorSpace
xmpDM:videoCompressor
xmpDM:videoFieldOrder
xmpDM:videoFrameRate
xmpDM:videoModDate
xmpDM:videoPixelDepth
xmpDM:videoPixelAspectRatio

XMPIdq – xmpidq:
xmpidq:Scheme

XMPMM – xmpMM:
xmpMM:DocumentID
xmpMM:InstanceID
xmpMM:OriginalDocumentID
xmpMM:RenditionClass
xmpMM:RenditionParams
xmpMM:History:InstanceID
xmpMM:History:Action
xmpMM:History:When
xmpMM:History:SoftwareAgent
xmpMM:DerivedFrom:DocumentID
xmpMM:DerivedFrom:InstanceID

XMPRights – xmpRights:
xmpRights:Certificate
xmpRights:Marked
xmpRights:Owner
xmpRights:UsageTerms
xmpRights:WebStatement
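To work out which of these fields a particular file actually produces, it can help to group the keys Tika returns by their namespace prefix, the same way the lists above are grouped. This is a minimal Python 3 sketch. It assumes you already have the keys as a list of strings, for example from the JSON that tika-server returns. `group_by_prefix` is a hypothetical helper name and is not part of Tika.

```python
from collections import defaultdict


def group_by_prefix(keys):
    """Group Tika metadata keys by namespace prefix (the text before the first ':').

    Keys with no colon (e.g. resourceName) go under the empty-string prefix,
    like the 'no prefix' TikaMetadata group in the list above.
    Nested keys such as xmpMM:History:When are grouped under their first prefix.
    """
    groups = defaultdict(list)
    for key in keys:
        prefix, sep, _ = key.partition(":")
        groups[prefix if sep else ""].append(key)
    return dict(groups)


# Example: a few keys copied from the lists above
sample = ["exif:FNumber", "tiff:Make", "resourceName", "xmpMM:History:When", "exif:Flash"]
for prefix, names in sorted(group_by_prefix(sample).items()):
    print(prefix or "(no prefix)", names)
```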

Categories: technology

Smoked whitefish salad

May 24, 2016 Comments off

Fish

poach – water and star anise

Salad

lettuce/green leaf, shredded

red pepper, thin slice matchstick
black olive, sliced
orange/nectarine/peach, thin slice/mandolin
one lemon – zest

Dressing

yoghurt
mustard powder
salt

honey
half lemon juice

Table

combine, done.

EAT YUM

Feta Spinach Pastry

May 12, 2016 Comments off

Supermarket puff pastry, spinach, feta, nutmeg

Oven at 180-200°C


Sweat a bunch of spinach – or about 200g frozen

A 200-250g pack of feta, diced.

Mix, add nutmeg to taste

lay on unrolled pastry in 2 lines, fold over with egg wash etc

poke with fork for air

egg wash

in oven for 15-20 mins

EAT YUM

Anchovy Asian salad

May 12, 2016 Comments off

Aide memoire recipe

Dressing

Olive oil, lemon juice to volume and taste
1 crushed garlic clove
a couple of dried bird's-eye chillies

Veg body

Blanch some sliced and sized green stuff – I used broccoli and fennel
I added some egg left over from the egg wash for the Feta-Spinach pastries I made, fried into an omelette then diced, mini size
Anchovies and oil chopped mini size all over. (2 x 15g tins ?!)
Some sesame seed and oil

EAT YUM.

Open Semantic Desktop Search – good but….

April 22, 2016 4 comments

….needs more administration documentation I think, or maybe an idiot's guide for the likes of me. I installed the Desktop version VirtualBox image and it all went fairly smoothly. After setting up doc shares to an archive of about 800k docs on a NAS, things started indexing. Cool ! Facets ! Keywords ! Metadata ! But all was not right – it was slowish. But hey, it's a VM and my host is not super-top-of-the-range (a Haswell Pentium G3258), so I made sure it was running with 2 CPUs and had 4GB RAM and about 40GB of disk to play with at the start. Monitoring it is easy with the search GUI or using the XML REST response at http://localhost:8983/solr/admin/cores?action=status. But things seem to halt at times, or the CPU spikes and not much appears to be happening – where do you find any info about what OSDS is doing right now ?

So – the usual places: web server, syslog, etc. Only trouble is I can't get a desktop terminal to run in the VM – it seems to start, then nothing. So ctrl-f2 into a console. What user id ? Turns out it's “user”. What's the password ? Turns out it's “live”. I found the log4j.properties for Solr in /var/solr and adjusted it to INFO level with console and file output, restarted Solr and…no more info. Messages and syslog need root access – sudo of course – but you have to add user “user” to sudoers. What's the root password then ? I found it somewhere in the documentation but now (ironically) I can't re-find it there. So if you find it let me know – and when you do, you can update sudoers to include user “user”. Turns out the other place to look for clues is the /tmp dir – it contains the tesseract OCR and Tika temp copies of things, so you can monitor the number of files in there and see progress.
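A rough way to watch that /tmp activity from the console is to count the files in the directory at regular intervals. This is a minimal Python 3 sketch, which is an assumption because the VM's own scripts use Python 2.7. `count_files` and `watch` are hypothetical helpers. I don't know the exact names tesseract and Tika give their temp files, so the `prefix` filter is empty by default; set it once you've seen what turns up in /tmp.

```python
import os
import time


def count_files(directory, prefix=""):
    """Count regular files directly inside `directory` whose names start with `prefix`."""
    count = 0
    with os.scandir(directory) as entries:
        for entry in entries:
            # Skip subdirectories and symlinks; only plain files are counted
            if entry.is_file(follow_symlinks=False) and entry.name.startswith(prefix):
                count += 1
    return count


def watch(directory="/tmp", prefix="", interval=60):
    """Print a timestamped file count every `interval` seconds until Ctrl-C."""
    try:
        while True:
            print(time.strftime("%H:%M:%S"), count_files(directory, prefix))
            time.sleep(interval)
    except KeyboardInterrupt:
        pass
```

Call `watch()` from a Python 3 prompt in the console. If the count stays flat for a long time while the CPU is busy, that is a hint that a single file is stuck in OCR.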

But I still can't find out what exactly is going on right now (or maybe this is all there is), and importantly I cannot really guess when the detection, extraction, OCR and indexing will finish. I have a file count from my archive and can see the number of docs currently indexed, but that doesn't give me much help in terms of timing. Tesseract seems pretty sensitive, and some quick blog and forum searching seems to confirm that. Still – despite this and the occasional crash or VM abort (with no real way to understand why except removing the most recently active folder share from the VM and wading thru /var/log; making only 1 CPU available to the VM seems to help the crash frequency, it turns out) – it's still going to be better than Recoll, I think, which won't have facets or the possibilities of RDF enrichment, vocabularies etc. I'd also like to try out ElasticSearch with it – soon.
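Given the archive file count and the current indexed count, a straight-line estimate is about the best guess available. It assumes the remaining docs are indexed at the average rate seen so far. This is a minimal Python 3 sketch, and `estimate_remaining` is a hypothetical helper name:

```python
from datetime import timedelta


def estimate_remaining(total_docs, indexed_docs, elapsed):
    """Linear ETA: assume the remaining docs are indexed at the average rate so far.

    total_docs   -- file count of the archive
    indexed_docs -- numDocs reported by Solr
    elapsed      -- timedelta since indexing started
    """
    if indexed_docs <= 0:
        raise ValueError("need at least one indexed doc to estimate a rate")
    remaining = max(total_docs - indexed_docs, 0)
    # elapsed * remaining / indexed is the same as remaining / (indexed / elapsed),
    # written this way to keep the float arithmetic exact for round numbers
    return timedelta(seconds=elapsed.total_seconds() * remaining / indexed_docs)


print(estimate_remaining(800_000, 400_000, timedelta(days=25)))
```

With the figures from the May 4 update further down (about half of 800k docs after 25 days), this gives another 25 days. A straight line ignores the fact that OCR-heavy PDFs are much slower than plain text, so treat it as a rough guide only.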

So –

  • zenity at 50% CPU ? – kill the pid – it's just a GUI notification that something's running, and not really needed
  • Nautilus seems to misbehave, and if you leave it open on /tmp it also seems to take 50% CPU – kill it
  • Give the VM plenty of RAM. I seem to have come across some SMP bug in Debian on my system, so I've tuned the VM down to 1 CPU, which seems to help
  • Important dirs on your travels…
    • /opt/solr
    • /var/solr/logs
    • /var/log/messages
    • /tmp
    • ~
    • /var/opensemanticdesktopsearch
    • /var/lib/opensemanticsearch (*.py files for UI)
    • /usr/share/solr-php-ui/templates/view.index.topbar.php (more UI files – eg header)
    • /usr/share/python-django-common/django/
    • /etc/defaults/solr.sh
    • /var/solr/data/core1/conf
    • /usr/share/solr-php-ui/config.php
    • /usr/share/solr-php-ui/config/config.facets.php (add facets to this list – even though it says not to because the UI will overwrite it, it doesn't, so they appear in the UI)
    • ./opensemanticsearch/enhancer-rdf (map facets to purls)
    • ./opensemanticsearch-django-webapps/apache.conf
  • tika on localhost:9998
  • default logging for tika and tesseract appears to be system.out
  • sudo apt-get update !
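To check from inside the VM that Solr and tika-server are at least accepting connections, here is a minimal Python 3 sketch. The ports (8983 for Solr, 9998 for tika-server) are the ones mentioned in this post. A successful TCP connect only shows that something is listening, not that indexing is healthy. `is_listening` and `check_services` are hypothetical helpers.

```python
import socket

# Ports mentioned in this post: Solr admin on 8983, tika-server on 9998
SERVICES = {"solr": 8983, "tika": 9998}


def is_listening(port, host="localhost", timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, unreachable, ...
        return False


def check_services(services=SERVICES, host="localhost"):
    """Map each service name to whether its port accepts connections."""
    return {name: is_listening(port, host) for name, port in services.items()}
```

Running `print(check_services())` after a reboot tells you quickly whether the init scripts actually brought tika-server back up.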

Adding facets

Note: editing and uploading facets via a text file at http://localhost/search-apps/admin/thesaurus/facet/ attempts to overwrite config.facets.php, but fails !

'Facet' object has no attribute 'title'

but the facet is created in Django and appears under “ontologies”, though without any named entities. Some debugging and rooting around shows that the Python (Django) code in /var/lib/opensemanticsearch/ontologies.views is looking for facet.title from the form, when it is in fact called facet.label. Changing line 287 in this file to

""".format(     facet.facet.encode('utf-8'),    facet.label.encode('utf-8')

means that you can now upload text files with concepts for facets. These then show up under the facet name in the right-hand column, but don't show up as “concepts” that you can alias, for instance.
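Before uploading, it may be worth tidying the concepts file so that stray blank lines and duplicates don't end up as entries. This is a minimal Python 3 sketch. It assumes one concept per line, which I haven't confirmed as the exact format the upload form expects. `parse_concepts` is a hypothetical helper.

```python
def parse_concepts(text):
    """Turn the contents of a concepts text file into an ordered list of unique concepts.

    Assumes one concept per line. Surrounding whitespace and blank lines are
    dropped, and a repeated concept is kept only once (first occurrence wins).
    """
    seen = set()
    concepts = []
    for line in text.splitlines():
        concept = line.strip()
        if concept and concept not in seen:
            seen.add(concept)
            concepts.append(concept)
    return concepts
```

Write `"\n".join(parse_concepts(open("concepts.txt").read()))` back to a new file and upload that. The file name here is only an example.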

Collapsing facet menus

If the list of concepts, or even the list of facets, gets long then putting them in an accordion might be a good idea. I used Daniel Stocks' jQuery plugin (https://github.com/danielstocks/jQuery-Collapse). Download and then include the plugin, eg in

/usr/share/solr-php-ui/templates/view.index.php:

add the following script include

<script src="js/jQuery-Collapse-master/src/jquery.collapse.js"></script>

Then change line 229 (in function print_fact) of the following file to:

/usr/share/solr-php-ui/ndex.php
<div id="<?= $facet_field ?>" class="facet" data-collapse="accordion">


Counting docs

A quick script to show the number of processed docs on the cmdline

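FILE=/tmp/numdocs.log
echo "Outputting to $FILE"
# Fetch the Solr core status (XML); wget's own messages go to /tmp/status_msg.log
wget -o /tmp/status_msg.log -O "$FILE" "http://localhost:8983/solr/admin/cores?action=status"
# Show the numDocs line for each core, then clean up
grep --color numDocs "$FILE"
rm "$FILE"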
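If you want the counts as numbers rather than grepped XML, for example to feed an ETA calculation, the same status page can be parsed. This is a minimal Python 3 sketch. It assumes the default XML response layout, in which each core is a `<lst>` inside `<lst name="status">` with an element named `numDocs` somewhere below it. The function names are hypothetical.

```python
import urllib.request
import xml.etree.ElementTree as ET

STATUS_URL = "http://localhost:8983/solr/admin/cores?action=status"


def num_docs_by_core(xml_text):
    """Return {core_name: numDocs} from a Solr core admin STATUS XML response."""
    root = ET.fromstring(xml_text)
    counts = {}
    # Each child <lst> of <lst name="status"> describes one core
    for core in root.findall("./lst[@name='status']/lst"):
        node = core.find(".//*[@name='numDocs']")
        if node is not None:
            counts[core.get("name")] = int(node.text)
    return counts


def fetch_num_docs(url=STATUS_URL, timeout=10):
    """Fetch the status page and return the per-core document counts."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return num_docs_by_core(resp.read())
```

On the OSDS VM, `print(fetch_num_docs())` should print something like `{'core1': ...}`, since core1 is the core name used elsewhere in this post.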

Any more tips ?


 

  • Update (May 4 2016) – about halfway through the volume of 800k docs now, after 25 days of processing. Still crashing out, but not so often now it seems. About 20GB of disk used in the VM now.
  • Update (June 1 2016) – finished, but only after I disabled PDF OCR at about 700k docs – I have to come back to this
  • Update (June 7 2016) – I've been trying to get EXIF data into Solr from all the JPEGs that I have, without much success until now. After much head scratching, debugging and trying to work it out, I have had to
    • provide a tika config file :
      <?xml version="1.0" encoding="UTF-8"?>
      <properties>
       <parsers>
         <!-- Most things can use the default -->
         <parser class="org.apache.tika.parser.DefaultParser">
           <!-- Don't use DefaultParser for these mimetypes, alternate config below -->
           <mime-exclude>image/jpeg</mime-exclude>
         </parser>
      
         <!-- JPEG needs special handling - try+combine everything -->
         <parser class="org.apache.tika.parser.jpeg.JpegParser" >
            <mime>image/jpeg</mime>
         </parser>
       </parsers>
      </properties>
    • update/fix the start/respawn command in the /etc/init.d/tika script to correctly use that file (and reboot the VM, as an init restart doesn't seem to work and systemctl daemon-reload doesn't either – or maybe it's just my dud config):
      daemon --respawn --user=tika --name=tika --verbose -o /tmp/tika.log -O /tmp/tika.err -- \
      java -jar /usr/share/java/tika-server.jar -c /home/user/osds-config.xml
    • try to work out whether the /usr/lib/python2.7/etl/enhance_extract_text_tika_server.py script was working or not – lots of extra print statements and verbose = True. The long and the short of it is that it is working, but the extracted metadata fields defined in the script don't include much in the way of EXIF fields, and even if they did we'd also have to update /var/solr/data/core1/conf/schema.xml to include them as fields. That's the next job…
    • A handy cmdline test of the tika-server is to post a JPEG to it using curl. If your init script isn't working you won't get much back, and likewise if the file you think you are posting doesn't actually exist. If you are getting a 415 Unsupported Media Type back in the verbose curl response, it probably means your Tika config file is screwed, like mine was, but I kept ignoring that – fool ! I went back to unit level and defined a single test dir in /etc/opensemanticsearch/filemonitoring/files and put one test JPEG in there. Using the curl cmd you can then test that the tika-server is working (you'll get back a JSON blob with EXIF fields), and then using ‘touch’ and /usr/bin/opensemanticsearch-index-dir you can test the pipeline in full.
      curl -vX POST -H "Accept: application/json" -F file=@exif.jpg http://localhost:9998/rmeta/form -H "Content-type: multipart/form-data"
  • (Update Sept 2016) – a new version of OSDS is available that seems to work better out of the box. Interface changes, Django defaults to English, and adding named entities and facets doesn't barf.
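The EXIF steps above (checking what tika-server returns for a JPEG, then adding matching fields to schema.xml) can be scripted roughly as follows. This is a minimal Python 3 sketch under several assumptions. It PUTs the file to tika-server's `/rmeta` endpoint, rather than posting a form as the curl command does. It treats keys starting with `exif:` or `tiff:` as the interesting ones. The `exif:FNumber` to `exif_FNumber` field naming and the `string` field type are hypothetical choices; use whatever names the enhancer script writes and whatever types your schema defines.

```python
import json
import urllib.request
from xml.sax.saxutils import quoteattr

TIKA_RMETA = "http://localhost:9998/rmeta"  # tika-server address from this post


def fetch_rmeta(path, url=TIKA_RMETA, timeout=30):
    """PUT a file to tika-server's /rmeta endpoint and return the parsed JSON list."""
    with open(path, "rb") as fh:
        req = urllib.request.Request(url, data=fh.read(), method="PUT",
                                     headers={"Accept": "application/json"})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def exif_fields(records, prefixes=("exif:", "tiff:")):
    """Pick EXIF/TIFF keys out of the first (container) record of an /rmeta response."""
    if not records:
        return {}
    return {k: v for k, v in records[0].items() if k.startswith(prefixes)}


def schema_field(key, field_type="string"):
    """Build a schema.xml <field/> line for a Tika key.

    Hypothetical naming: ':' becomes '_', so 'exif:FNumber' -> 'exif_FNumber'.
    """
    name = key.replace(":", "_")
    return '<field name=%s type=%s indexed="true" stored="true"/>' % (
        quoteattr(name), quoteattr(field_type))
```

For example, `for key in exif_fields(fetch_rmeta("exif.jpg")): print(schema_field(key))` prints candidate field lines for the test JPEG, which you can review before pasting any into schema.xml.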

 

http://www.opensemanticsearch.org/doc/tutorial

http://www.lesbonscomptes.com/recoll/usermanual/webhelp/docs/index.html

https://www.elastic.co/products/elasticsearch

https://www.kernel.org/doc/Documentation/sysrq.txt (although this doesn't seem to be usable during the crash, as the system is completely unresponsive)

https://help.ubuntu.com/community/DebuggingSystemCrash